Skip to content

fix[tool]: update VarAccess pickle implementation#4270

Merged
charles-cooper merged 9 commits intovyperlang:masterfrom
charles-cooper:fix/pickle
Oct 7, 2024
Merged

fix[tool]: update VarAccess pickle implementation#4270
charles-cooper merged 9 commits intovyperlang:masterfrom
charles-cooper:fix/pickle

Conversation

@charles-cooper
Copy link
Copy Markdown
Member

@charles-cooper charles-cooper commented Oct 3, 2024

What I did

fix a bug where unpickling annotated_vyper_module would lead to a crash:

AttributeError: 'VarAccess' object has no attribute 'variable'

a repro can be found at charles-cooper@2a5facf.

this is a blocker for tooling, for instance titanoboa relies on pickle/unpickle to cache CompilerData objects:
https://github.com/vyperlang/titanoboa/blob/86df8936654db20686410488738d7abaf165a4c9/boa/util/disk_cache.py#L65-L66

the underlying issue is that pickle.loads() calls obj.__hash__() for objects that are keys in a hashed data structure - namely, dicts, sets and frozensets. this causes a crash when there is a cycle in the object graph, because the object is not fully instantiated at the time that __hash__() is called. this is a cpython bug, reported at python/cpython#124937.

@serhiy-storchaka suggested the approach taken in this PR, which breaks the loop before pickling: python/cpython#124937 (comment).

How I did it

How to verify it

Commit message

fix a bug where unpickling `annotated_vyper_module` would lead to
a crash:

```
AttributeError: 'VarAccess' object has no attribute 'variable'
```

this is a blocker for tooling, for instance, titanoboa
relies on pickle/unpickle to cache `CompilerData` objects:
https://github.com/vyperlang/titanoboa/blob/86df8936654db2068641/boa/util/disk_cache.py#L65-L66

the underlying issue is that `pickle.loads()` calls `obj.__hash__()`
for objects that are keys in a hashed data structure - namely, dicts,
sets and frozensets. this causes a crash when there is a cycle in
the object graph, because the object is not fully instantiated at the
time that `__hash__()` is called. this is a cpython issue, reported at
python/cpython#124937.

@serhiy-storchaka suggested the approach taken in this PR, which breaks
the loop before pickling:
https://github.com/python/cpython/issues/124937#issuecomment-2392227290

note that the implementation of `__reduce__()` in this PR is safe, since
there is no cycle in the hash function itself, since the recursion
breaks in `VarInfo.__hash__()`. in other words, there is no possibility
of `VarAccess.__hash__()` changing mid-way through reconstructing the
object.

Description for the changelog

Cute Animal Picture

Put a link to a cute animal picture inside the parenthesis-->

@charles-cooper charles-cooper marked this pull request as ready for review October 4, 2024 20:00
@cyberthirst
Copy link
Copy Markdown
Collaborator

can we add an explainer comment on why's this necessary?

@charles-cooper
Copy link
Copy Markdown
Member Author

can we add an explainer comment on why's this necessary?

will get to it

# see https://github.com/python/cpython/issues/124937#issuecomment-2392227290
def __reduce__(self):
dict_obj = {f.name: getattr(self, f.name) for f in fields(self)}
return self.__class__._produce, (dict_obj,)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could also use the following trick if all fields can be passed as keyword arguments to constructor in subclasses:

Suggested change
return self.__class__._produce, (dict_obj,)
return partial(self.__class__, **dict_obj), ()

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

interesting, is there a performance difference?

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may be slower. The advantage is that you do not need to add private method _produce().

@charles-cooper
Copy link
Copy Markdown
Member Author

copying in some offline discussion with @cyberthirst about why the hash is safe even though the data structure is cyclic:

There is no cycle in the hash function, it breaks at VarInfo.

return hash(id(self))

So the contents of the _expr_info._reads/_writes can't affect the value of VarAccess hash.

@codecov
Copy link
Copy Markdown

codecov bot commented Oct 5, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 90.07%. Comparing base (c02d2d8) to head (15efb2e).
Report is 69 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #4270      +/-   ##
==========================================
- Coverage   91.30%   90.07%   -1.23%     
==========================================
  Files         110      110              
  Lines       15764    15770       +6     
  Branches     3461     3463       +2     
==========================================
- Hits        14393    14205     -188     
- Misses        935     1095     +160     
- Partials      436      470      +34     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@charles-cooper charles-cooper merged commit d079562 into vyperlang:master Oct 7, 2024
@charles-cooper charles-cooper deleted the fix/pickle branch October 7, 2024 15:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants