Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix[tool]: update VarAccess pickle implementation #4270

Merged
merged 9 commits into from
Oct 7, 2024

Conversation

charles-cooper
Copy link
Member

@charles-cooper charles-cooper commented Oct 3, 2024

What I did

fix a bug where unpickling annotated_vyper_module would lead to a crash:

AttributeError: 'VarAccess' object has no attribute 'variable'

a repro can be found at charles-cooper@2a5facf.

this is a blocker for tooling, for instance titanoboa relies on pickle/unpickle to cache CompilerData objects:
https:/vyperlang/titanoboa/blob/86df8936654db20686410488738d7abaf165a4c9/boa/util/disk_cache.py#L65-L66

the underlying issue is that pickle.loads() calls obj.__hash__() for objects that are keys in a hashed data structure - namely, dicts, sets and frozensets. this causes a crash when there is a cycle in the object graph, because the object is not fully instantiated at the time that __hash__() is called. this is a cpython bug, reported at python/cpython#124937.

@serhiy-storchaka suggested the approach taken in this PR, which breaks the loop before pickling: python/cpython#124937 (comment).

How I did it

How to verify it

Commit message

fix a bug where unpickling `annotated_vyper_module` would lead to
a crash:

```
AttributeError: 'VarAccess' object has no attribute 'variable'
```

this is a blocker for tooling, for instance, titanoboa
relies on pickle/unpickle to cache `CompilerData` objects:
https:/vyperlang/titanoboa/blob/86df8936654db2068641/boa/util/disk_cache.py#L65-L66

the underlying issue is that `pickle.loads()` calls `obj.__hash__()`
for objects that are keys in a hashed data structure - namely, dicts,
sets and frozensets. this causes a crash when there is a cycle in
the object graph, because the object is not fully instantiated at the
time that `__hash__()` is called. this is a cpython issue, reported at
python/cpython#124937.

@serhiy-storchaka suggested the approach taken in this PR, which breaks
the loop before pickling:
https:/python/cpython/issues/124937#issuecomment-2392227290

note that the implementation of `__reduce__()` in this PR is safe, since
there is no cycle in the hash function itself, since the recursion
breaks in `VarInfo.__hash__()`. in other words, there is no possibility
of `VarAccess.__hash__()` changing mid-way through reconstructing the
object.

Description for the changelog

Cute Animal Picture

Put a link to a cute animal picture inside the parenthesis-->

@charles-cooper charles-cooper marked this pull request as ready for review October 4, 2024 20:00
@cyberthirst
Copy link
Collaborator

can we add an explainer comment on why's this necessary?

@charles-cooper
Copy link
Member Author

can we add an explainer comment on why's this necessary?

will get to it

vyper/semantics/analysis/base.py Outdated Show resolved Hide resolved
# see https:/python/cpython/issues/124937#issuecomment-2392227290
def __reduce__(self):
dict_obj = {f.name: getattr(self, f.name) for f in fields(self)}
return self.__class__._produce, (dict_obj,)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could also use the following trick if all fields can be passed as keyword arguments to constructor in subclasses:

Suggested change
return self.__class__._produce, (dict_obj,)
return partial(self.__class__, **dict_obj), ()

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

interesting, is there a performance difference?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may be slower. The advantage is that you do not need to add private method _produce().

@charles-cooper
Copy link
Member Author

copying in some offline discussion with @cyberthirst about why the hash is safe even though the data structure is cyclic:

There is no cycle in the hash function, it breaks at VarInfo.

return hash(id(self))

So the contents of the _expr_info._reads/_writes can't affect the value of VarAccess hash.

Copy link

codecov bot commented Oct 5, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 90.07%. Comparing base (c02d2d8) to head (15efb2e).
Report is 1 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #4270      +/-   ##
==========================================
- Coverage   91.30%   90.07%   -1.23%     
==========================================
  Files         110      110              
  Lines       15764    15770       +6     
  Branches     3461     3463       +2     
==========================================
- Hits        14393    14205     -188     
- Misses        935     1095     +160     
- Partials      436      470      +34     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@charles-cooper charles-cooper merged commit d079562 into vyperlang:master Oct 7, 2024
158 of 159 checks passed
@charles-cooper charles-cooper deleted the fix/pickle branch October 7, 2024 15:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants