Clear cache hash on de-serialization#489
Conversation
Sync to main attrs repo
Sync up to attrs master
Because the hash code cache field gets serialized and deserialized by Pickle, previously when you deserialize a cache_hash=True attrs object, the hashcode will be the hashcode the object had at serialization-time. However, if your object had fields with hash codes which were not deterministic between interpreter runs, then on a new interpreter run your deserialized object would have a hash code which differs from a newly created identical object. This commit fixes that by clearing the cache on deserialization. It needs to override the __setstate__ method to do so, so this commit also forbids using a custom __setstate__ on a cache_hash=True object. Closes python-attrs#482 .
4ea79f3 to
42e8653
Compare
hynek
left a comment
There was a problem hiding this comment.
This is a pretty good first shot! Needs some more polish, but we should be able to merge it soon.
I'm not happy with the way we patch __setstate__ but what can we do…
| @@ -0,0 +1,2 @@ | |||
| Fixes a bug where deserialized objects with ``cache_hash=True`` could | |||
There was a problem hiding this comment.
-
Please check out: https://github.com/python-attrs/attrs/blob/master/.github/CONTRIBUTING.rst#changelog
Most notably: simple past + semantic newlines.
-
Also this is specifically about
pickleso that should be mentioned. -
This is a breaking change, see below for an explanation.
| # Clearing the hash code cache on deserialization is needed | ||
| # because hash codes can change from run to run. See issue | ||
| # https://github.com/python-attrs/attrs/issues/482 . | ||
| # Note that this code only handles setstate for slots classes. |
There was a problem hiding this comment.
same about nomenclature, also in the next line.
| # Note that this code only handles setstate for non-slots classes. | ||
| # For slots classes, see similar code in _create_slots_class . | ||
| if self._cache_hash: | ||
| if hasattr(cls, "__setstate__"): |
There was a problem hiding this comment.
Ugh, thanks for enlightening me on the awfulness of hasattr :-)
| # For slots classes, see similar code in _create_slots_class . | ||
| if self._cache_hash: | ||
| if hasattr(cls, "__setstate__"): | ||
| raise NotImplementedError( |
There was a problem hiding this comment.
I guess this should be marked as a breaking change? 🤔 It should be mentioned that this isn't possible anymore in the newsfragment.
|
|
||
| # these are for use in TestAddHash.test_cache_hash_serialization | ||
| # they need to be out here so they can be un-pickled | ||
| HashCacheSerializationTestUncached = make_class( |
There was a problem hiding this comment.
Any particular reason why you use make_class here instead of defining the classes the usual way?
There was a problem hiding this comment.
No good reason - I've switched it to the usual way
| https://github.com/python-attrs/attrs/issues/482 . | ||
| """ | ||
|
|
||
| # first, check that our fix didn't break serialization without |
There was a problem hiding this comment.
I love the elaborate explanations here because this problem is indeed rather obscure and easy to forget about. But – and I do realize this is a bit petty 😔 – could you use proper capitalization throughout it please? 😇
There was a problem hiding this comment.
No problem - I've fixed it.
|
@hynek : Thanks for the review! I'll get to these Monday or Tuesday. |
|
@hynek : ready for a second review pass. I believe I've addressed all your comments. |
|
Thanks! |
|
@hynek : Do you know when the next release is expected? This bug is causing us trouble in several places, so we'd be very appreciative if there were an updated release which had the fix. :-) |
|
It's a nice change of pace from the usual situation of my own software inconveniencing me with the bugs I put there. :-) Good motivation for me to address #503 quickly! |
This fixes a bug ( #482 ) where deserialized objects could have stale cached hash values. There is some remaining work for a rare case which I have shifted to ( #494 ) because it is out-of-scope for my needs and tricky to fix.
Pull Request Check List
This is just a reminder about the most common mistakes. Please make sure that you tick all appropriate boxes. But please read our contribution guide at least once, it will save you unnecessary review cycles!
If an item doesn't apply to your pull request, check it anyway to make it apparent that there's nothing to do.
.pyi). n/atests/typing_example.py.docs/api.rstby hand.@attr.s()have to be added by hand too.versionadded,versionchanged, ordeprecateddirectives..rstfiles is written using semantic newlines. n/achangelog.d.If you have any questions to any of the points above, just submit and ask! This checklist is here to help you, not to deter you from contributing!