This library defines two new decorators, 'persistent_locals' and
'persistent_locals2', that expose the local variables in the inner
scope of a function through a function attribute, 'locals'.
The problem
===========
In scientific development, functions often represent complex data
processing algorithms that transform input data into a desired output.
Internally, such a function typically requires several intermediate
results to be computed and stored in local variables.
As a simple toy example, we consider the following function, that
takes three arguments and returns True if the sum of the arguments is
smaller than the product:

    def is_sum_lt_prod(a, b, c):
        sum = a + b + c
        prod = a * b * c
        return sum < prod

A frequently occurring problem is that the developer or end user may
need to access the intermediate results at a later stage, either to
analyze the detailed behavior of the algorithm or to write more
comprehensive tests for it.
A possible solution would be to re-define the function and return the
needed internal variables, but this would break the existing code. A
better solution is to add a keyword argument to return more
information:

    def is_sum_lt_prod(a, b, c, internals=False):
        sum = a + b + c
        prod = a * b * c
        if internals:
            return sum < prod, {'sum': sum, 'prod': prod}
        else:
            return sum < prod

Returning a dictionary is important here in order to avoid breaking
the code at later stages, should one want to access even more local
variables, and also in order to avoid ugly code like

    res, _, _, _, var1, _, var3 = f(x)

where most of the returned values are irrelevant.
This solution would keep the existing code intact, but requires
frequent rewriting of the source code, which might not be feasible
(e.g., if the function is defined in a third-party library). Moreover,
the local variables are not accessible in case the function raises an
exception, which is often desirable for debugging.
Proposed solution
=================
The proposed solution consists of a decorator that makes the local
variables accessible from a read-only property of the function,
'locals'. For example:

    @persistent_locals
    def is_sum_lt_prod(a, b, c):
        sum = a + b + c
        prod = a * b * c
        return sum < prod

After calling the function, e.g. is_sum_lt_prod(2,1,2), we can analyze
the intermediate results as

    is_sum_lt_prod.locals

which returns

    {'a': 2, 'b': 1, 'c': 2, 'prod': 4, 'sum': 5}

This style is cleaner, is consistent with the principle of identifying
the value returned by a function as the output of an algorithm, and is
robust to changes in the needs of the researcher.
The local variables are saved even in case of an exception, which
turns out to be quite useful for debugging.
How it works
============
Unfortunately, the local variables in the inner scope of a function
are not easily accessible. Moreover, a candidate decorator should not
break under these conditions:
1) When the function to be decorated is defined in a closure
2) When the original function is deleted, as in

       @persistent_locals
       def f():
           pass

       g = f
       del f

   Conditions 1 and 2 imply that the decorator cannot refer to the
   global name of the function
3) When the function raises an exception (it should return the locals
computed up to the point the exception was raised)
4) When the function is defined using an '*args' argument (i.e., the
decorator cannot add a keyword argument)
Solution 1 (persistent_locals)
------------------------------
The first proposed approach is to wrap the function in a callable
object, and modify its bytecode by adding an external try...finally
statement as follows:

    def f(self, *args, **kwargs):
        try:
            ... old code ...
        finally:
            self.locals = locals().copy()
            del self.locals['self']

The implementation requires the lightweight library byteplay.py
(http://code.google.com/p/byteplay/).
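
The effect of the rewrite can be reproduced by hand at the source
level. The following is only an illustrative sketch in Python 3, not the
library's bytecode-based implementation: the class name 'SumLtProd' is
hypothetical, and the variable 'sum' is renamed 'total' to avoid
shadowing the builtin. It shows that the 'finally' clause records the
locals both on a normal return and when an exception interrupts the body.

```python
class SumLtProd:
    """Hand-written analogue of the rewritten function (hypothetical name)."""

    def __call__(self, a, b, c):
        try:
            # ... old code ...
            total = a + b + c
            prod = a * b * c
            return total < prod
        finally:
            # Runs on normal return and on exception alike.
            self.locals = locals().copy()
            del self.locals['self']


is_sum_lt_prod = SumLtProd()
result = is_sum_lt_prod(2, 1, 2)
locals_after_return = is_sum_lt_prod.locals

try:
    # 1 + 2 + 'x' raises TypeError before 'total' is assigned.
    is_sum_lt_prod(1, 2, 'x')
except TypeError:
    pass
locals_after_error = is_sum_lt_prod.locals
```

After the failing call only the arguments are recorded, since the
exception is raised before any intermediate result is stored.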
The reason for wrapping the function in a class, instead of saving the
locals in a function attribute directly, is that there are all sorts
of complications in referring from within a function to the function
itself. For example, referring to the attribute as f.locals results in
the bytecode looking for the name 'f' in the namespace, and therefore
moving the function, e.g. with

    g = f
    del f

would break 'g'. There are even more problems for functions defined in
a closure.
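
The name-lookup problem described above can be seen in a few lines. This
is a minimal sketch in Python 3 with hypothetical names ('Recorder',
'last'), not code from the library: a function that stores an attribute
on itself through its global name fails once that name is deleted, while
a callable object that stores state on 'self' keeps working.

```python
def f():
    # The name 'f' is looked up in the enclosing namespace at call time.
    f.last = 'called'
    return f.last

g = f
del f

try:
    g()
    outcome = 'ok'
except NameError:
    # 'f' no longer exists, so the body of g cannot find it.
    outcome = 'NameError'


class Recorder:
    """Callable wrapper that stores state on the instance (hypothetical)."""

    def __init__(self, func):
        self.func = func

    def __call__(self, *args, **kwargs):
        # 'self' is passed in explicitly; no global name is involved.
        self.last = 'called'
        return self.func(*args, **kwargs)


h = Recorder(lambda: 42)
k = h
del h
result = k()
```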
Solution 2 (persistent_locals2)
-------------------------------
This second solution is based on an idea by Andrea Maffezzoli: A
simpler and arguably cleaner approach is to define a tracer function
and use it to trace the code with 'sys.setprofile'. The tracer
function is called at the entry and exit of functions, including when a
function exits because of an exception, and gives access to the frame
of the function:

    def __call__(self, *args, **kwargs):
        def tracer(frame, event, arg):
            if event == 'return':
                self._locals = frame.f_locals

        # tracer is activated on next call, return or exception
        sys.setprofile(tracer)
        try:
            # trace the function call
            res = self.func(*args, **kwargs)
        finally:
            # disable tracer
            sys.setprofile(None)
        return res

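
The excerpt above can be fleshed out into a self-contained sketch. The
version below is written for Python 3 and is an assumption about how
such a class might look, not the library's actual code: the class name
'PersistentLocals' is hypothetical, and two details are added for
illustration (the tracer only records the frame of the decorated
function itself, and it stores a copy of the frame locals).

```python
import sys


class PersistentLocals:
    """Profile-based sketch of the decorator (hypothetical class name)."""

    def __init__(self, func):
        self.func = func
        self._locals = {}

    def __call__(self, *args, **kwargs):
        target = self.func.__code__

        def tracer(frame, event, arg):
            # Ignore the frames of functions called by the decorated one.
            if event == 'return' and frame.f_code is target:
                self._locals = dict(frame.f_locals)

        sys.setprofile(tracer)
        try:
            return self.func(*args, **kwargs)
        finally:
            # Always disable the tracer, even if func raised.
            sys.setprofile(None)

    @property
    def locals(self):
        return self._locals


@PersistentLocals
def is_sum_lt_prod(a, b, c):
    total = a + b + c
    prod = a * b * c
    return total < prod


@PersistentLocals
def fails(x):
    y = x + 1
    raise ValueError('stop here')


result = is_sum_lt_prod(2, 1, 2)
try:
    fails(1)
except ValueError:
    pass
```

After these calls, 'is_sum_lt_prod.locals' holds the arguments together
with 'total' and 'prod', and 'fails.locals' holds 'x' and 'y' as they
were when the exception was raised.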
Calling the tracer function at the entry and exit of functions comes
at a cost in execution time. This cost is usually very small, but
might become significant if the decorated function makes many calls to
other functions.
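
To see where this cost comes from, one can count how often the tracer
runs. The following Python 3 sketch uses hypothetical helper names
('count_profile_events', 'helper', 'many_calls') and is not part of the
library; it counts the 'call' and 'return' events reported while a
function that calls a small helper many times is running.

```python
import sys


def count_profile_events(func, *args):
    """Count 'call' and 'return' profile events while func runs."""
    counts = {'call': 0, 'return': 0}

    def tracer(frame, event, arg):
        # Other events (e.g. 'c_call') are ignored here.
        if event in counts:
            counts[event] += 1

    sys.setprofile(tracer)
    try:
        func(*args)
    finally:
        sys.setprofile(None)
    return counts


def helper(i):
    return i * i


def many_calls(n):
    total = 0
    for i in range(n):
        total += helper(i)
    return total


counts = count_profile_events(many_calls, 1000)
```

Each call to 'helper' triggers the tracer at least twice, once on entry
and once on exit, so the overhead grows with the number of calls made
inside the decorated function rather than with its running time.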
Moreover, this second solution breaks any attempt to profile the
decorated code.
Cost
====
The increase in execution time of the decorated function is minimal.
Given the intended domain of application, most of the decorated
functions will take a significant amount of time to complete, making
the cost of the decoration negligible:

    import time

    def f(x):
        time.sleep(0.5)
        return 2*x

    df = deco.persistent_locals(f)
    df2 = deco.persistent_locals2(f)

    %timeit f(1)
    10 loops, best of 3: 500 ms per loop
    %timeit df(1)
    10 loops, best of 3: 500 ms per loop
    %timeit df2(1)
    10 loops, best of 3: 500 ms per loop

This does not work
==================
I tried replacing f.func_globals with a custom dictionary that keeps
a reference to f.func_globals and adds a static element referring to
'f', but this does not work: the Python interpreter does not access
the func_globals dictionary through Python-level calls but directly
through PyDict_GetItem (see
http://osdir.com/ml/python.ideas/2007-11/msg00092.html). It is thus
impossible to re-define __getitem__ to return 'f' as needed. Ideally,
one would like to define a new closure for the function with a cell
variable containing the reference, but this is impossible as far as I
can tell.
Conclusion
==========
The problem of needing to access a different subset of intermediate
results in an algorithm is a recurrent one in my research. The use of
the proposed decorators makes the code cleaner, and subsequent analysis
of data much easier.