Skip to content

Python Optimization

Paul Talbot edited this page Sep 14, 2020 · 5 revisions

Optimization

Table of Contents

  1. Where to optimize
  2. for loops
    1. List Comprehensions
    2. Iterators
    3. Vectors
  3. List Lookups
  4. Assertions

Introduction

While RAVEN takes great advantage of the power and flexibility of Python, sometimes the most efficient methods in Python are unintuitive. To that end, we gather here a list of suggestions for rapid development.

If you find additional tips, let us know!

Where to optimize

In general, python is quite efficient and it is seldom worth spending significant time optimizing every line of code. However, there are a few instances where optimization is very worthwhile:

  • A line of code could be evaluated millions of times per RAVEN run,
  • A line of code is in the serial portions of a RAVEN run, instead of the parallel portion.

In these cases, it may be worth trying to improve the speed of the code. For finding bottlenecks in the code, see Line Profiling below.

Line Profiling

As of PR #1318, line profiling through the RAVEN command line has been automated. By running raven_framework with the flag --profile, a printout will be provided at the end of the RAVEN run showing several useful pieces of information about select methods.

NOTE: To use this feature, the RAVEN-optional library line_profiler needs to be installed via the RAVEN library installation tools.

To enable timings and printouts for a method in the code, import Decorators in the module containing the method, and decorate the desired method with @Decorators.timingProfile. Both module-level and class-level methods can be decorated. For example, in Simulation.py:

import Decorators

[...]

  @Decorators.timingProfile
  def setInputFiles(self,inputFiles):
    """
      Method that can be used to set the input files that the program received.
      These are currently used for cluster running where the program
      needs to be restarted on a different node.
      @ In, inputFiles, list, input files list
      @ Out, None
    """
    self.runInfoDict['SimulationFiles'   ] = inputFiles

Then, running any given input with the --profile flag will give line-by-line information about all tagged methods, such as:

$ cd ~/projects/raven/tests/framework/Samplers/Restart
$ ~/projects/raven/raven_framework --profile test_restart_MC.xml

provides the following printout:

Wrote profile results to Driver.py.lprof
Timer unit: 1e-06 s

Total time: 8e-06 s
File: /Users/talbpw/projects/raven/framework/Simulation.py
Function: setInputFiles at line 348

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   348                                             @Decorators.timingProfile
   349                                             def setInputFiles(self,inputFiles):
   350                                               """
   351                                                 Method that can be used to set the input files that the program received.
   352                                                 These are currently used for cluster running where the program
   353                                                 needs to be restarted on a different node.
   354                                                 @ In, inputFiles, list, input files list
   355                                                 @ Out, None
   356                                               """
   357         1          8.0      8.0    100.0      self.runInfoDict['SimulationFiles'   ] = inputFiles

Note in particular the columns, which show the line number in the module, the number of calls to each line during the run, the total time (in microseconds) spent on each line, the time-per-call, and the percentage time in that method spent on each line. In this uninspiring example, there is only one line of actionable code, so 100% of the time is spent on that line.

Once a bottleneck is discovered, the developer can add the decorator to bottleneck methods in the hierarchy until improvements, if any, are found.

for loops

In general, loops in Python are quite slow, and most optimization techniques revolve around avoiding them. Any time operations can be done in vectors or without looping they should be. Here are some ways to mitigate looping.

List Comprehensions

If we want to populate a list with items, we might be tempted to use an algorithm such as the following:

a = []
for x in range(100):
  a.append(x)
# time: 1.307e-5

Obviously this specific code is poor and not very useful; however, the same structure is seen frequently. On a particular test machine, averaging over 100,000 samples, this code took 1.307e-5 seconds per evaluation. Using list comprehensions, instead, we could do

a = list(x for x in range(100))
# time: 1.286e-6

On the same test machine, averaging over 100,000 samples, this code took 1.286e-6 seconds per evaluation, or roughly an order of magnitude improvement in speed.

Iterators

Furthermore, if a list is being constructed only for iterating over, an iterator is faster than an actual list, as it doesn't evaluate until it is needed. Keeping with our previous example, consider the following:

a = list(x for x in range(100))
for x in a:
  print x
# time: 3.230e-4

versus

a = (x for x in range(100))
for x in a:
  print x
# time: 3.194e-4

Again, not particularly useful code, but the pattern gets used. On the same test machine, the first took 3.230e-4 seconds, while the second took 3.194e-4 seconds. This saves less time than comprehensions, but does save on memory as well.

Vector Operations

If we want to find the element-wise product of two lists, we might try

a = range(1,1001)
b = range(1000,2000)
c = list(a[i]*b[i] for i in range(len(a)))
# time: 1.406e-4

Using numpy arrays,

import numpy as np
a = np.arange(1,1001)
b = np.arange(1000,2000)
c = a*b
# time: 4.81e-6

For vector or matrix operations of floats, numpy methods tend to be far superior to python loops.

List lookups

While in small dictionaries the lookup if key in myDict.keys() looks innocuous, it is rather expensive to look up the key. One option is to store the lookup as sorted tuples and use the bisect algorithm. For example, searching the list keys for the entry find:

keys = itertools.product('abcd',repeat=10) #creates all combinations of these letters in 10-letter words, sorted
find = 'c'*10 # 'cccccccccc'

Naively,

if find in keys:
  print 'yes'
# time: 8.048e-3 seconds

Using bisect,

i = bisect.bisect_left(keys,find)
if i != len(keys) and keys[i] == find:
  print 'yes'
# time: 1.044e-6 seconds

However, bisect only works on sorted lists.

Assertions

Python natively uses assert statements to check boolean evaluations that are necessary when developing or debugging, but not necessary during run time. For example:

if True: pass
# time: 7.701e-8 seconds

assert(True)
# time: 5.984e-8 seconds

Furthermore, running Python with the -O option erases these assertion statements,

# using "python -O test.py"
assert(True)
# time: 3.600e-8 seconds

By default, running raven_framework uses the -O option. This is the expected run for RAVEN users. For developers, however, assert statements can help debug development. To run RAVEN in debug mode, pass the -D flag to raven_framework, as

raven_framework -D test.py