README

This library defines two new decorator, 'persistent_locals' and
'persistent_locals2' that expose the local variables in the inner
scope of a function through a function attribute, 'locals'.

The problem
===========

In scientific development, functions often represent complex data
processing algorithm that transform input data into a desired output.
Internally, the function typically requires several intermediate
results to be computed and stored in local variables.

As a simple toy example, we consider the following function, that
takes three arguments and returns True if the sum of the arguments is
smaller than the product:

def is_sum_lt_prod(a,b,c):
    sum = a+b+c
    prod = a*b*c
    return sum<prod

A frequently occurring problem is that the developer/final user may
need to access the intermediate results at a later stage, because of
the need of analyzing the detailed behavior of the algorithm, or in
order to write more comprehensive tests for the algorithm.

A possible solution would be to re-define the function and return the
needed internal variables, but this would break the existing code. A
better solution is to add a keyword argument to return more
information:

def is_sum_lt_prod(a,b,c, internals=False):
    sum = a+b+c
    prod = a*b*c
    if internals:
         return sum<prod, {'sum': sum, 'prod': prod}
    else:
         return sum<prod

Returning a dictionary is important here in order to avoid breaking
the code at later stages, should one want to access even more local
variables, and also in order to avoid ugly code like

res, _, _, _, var1, _, var3 = f(x)

where most of the returned values are irrelevant.

This solution would keep the existing code intact, but requires
frequent rewriting of the source code, which might not be feasible
(e.g., if the function is defined in a third-party library). Moreover,
the local variables are not accessible in case the function raises an
exception, which is often desirable for debugging.

Proposed solution
=================

The proposed solution consists in a decorator that makes the local
variables accessible from a read-only property of the function,
'locals'. For example:

    @persistent_locals
    def is_sum_lt_prod(a,b,c):
        sum = a+b+c
        prod = a*b*c
        return sum<prod

after calling the function, e.g. is_sum_lt_prod(2,1,2), we can analyze
the intermediate results as

    is_sum_lt_prod.locals

which returns

    {'a': 2, 'b': 1, 'c': 2, 'prod': 4, 'sum': 5}

This style is cleaner, is consistent with the principle of identifying
the value returned by a function as the output of an algorithm, and is
robust to changes in the needs of the researcher.

The local variables are saved even in case of an exception, which
turns out to be quite useful for debugging.

How it works
============

Unfortunately, the local variables in the inner scope of a function
are not easily accessible. Moreover, a candidate decorator should not
break under these conditions:

1) When the function to be decorated is defined in a closure

2) When the original function is deleted, as in
    @persistent_locals
    def f():
        pass
    
    g=f
    del f

1 and 2 imply that the decorator cannot refer to the global name of the
function

3) When the function raises an exception (it should return the locals
   computed up to the point the exception was raised)

4) When the function is defined using an '*args' argument (i.e., the
   decorator cannot add a keyword argument)

Solution 1 (persistent_locals)
------------------------------

The first proposed approach is to wrap the function in a callable
object, and modify its bytecode by adding an external try...finally
statement as follows:

   def f(self, *args, **kwargs):
       try:
           ... old code ...
       finally:
           self.locals = locals().copy()
           del self.locals['self']

The implementation requires the lightweight library byteplay.py
(http://code.google.com/p/byteplay/).

The reason for wrapping the function in a class, instead of saving the
locals in a function attribute directly, is that there are all sorts
of complications in referring from within a function to the function
itself. For example, referring to the attribute as f.locals results in
the bytecode looking for the name 'f' in the namespace, and therefore
moving the function, e.g. with
g = f
del f
would break 'g'. There are even more problems for functions defined in
a closure.

Solution 2 (persistent_locals2)
-------------------------------

This second solution is based on an idea by Andrea Maffezzoli: A
simpler and arguably cleaner approach is to define a tracer function
and use it to trace the code with 'sys.setprofile'. The tracer
function is called at the entry and exit of functions and when an
exception is called, and gives access to the frame of the function:

    def __call__(self, *args, **kwargs):
        def tracer(frame, event, arg):
            if event=='return':
                self._locals = frame.f_locals
                
        # tracer is activated on next call, return or exception
        sys.setprofile(tracer)
        try:
            # trace the function call
            res = self.func(*args, **kwargs)
        finally:
            # disable tracer
            sys.setprofile(None)
        return res

Calling the tracer function at the entry and exit of functions comes
at a cost in execution time. This cost is usually very small, but
might become significant if the decorator function makes many calls to
other functions.

Moreover, this second solution breaks any attempt to profile the
decorated code.

Cost
====

The increase in execution time of the decorated function is minimal.
Given its domain of application, most of the functions will take a
significant amount of time to complete, making the cost the decoration
negligible:

import time
def f(x):
   time.sleep(0.5)
   return 2*x

df = deco.persistent_locals(f)
df2 = deco.persistent_locals2(f)

%timeit f(1)
10 loops, best of 3: 500 ms per loop
%timeit df(1)
10 loops, best of 3: 500 ms per loop
%timeit df2(1)
10 loops, best of 3: 500 ms per loop

This does not work
==================

I tried modifying f.func_globals with a custom dictionary which keeps
a reference to f.func_globals, adding a static element to 'f', but
this does not work as the Python interpreter does not call the
func_globals dictionary with Python calls but directly with
PyDict_GetItem (see
http://osdir.com/ml/python.ideas/2007-11/msg00092.html). It is thus
impossible to re-define __getitem__ to return 'f' as needed. Ideally,
one would like to define a new closure for the function with a cell
variable containing the reference, but this is impossible as far as I
can tell.

Conclusion
==========

The problem of needing to access a different subset of intermediate
results in an algorithm is a recurrent one in my research. The use of
the proposed decorated makes the code cleaner, and successive analysis
of data much easier.