Skip to content

cache_hash breaks copy and pickle for non-slots classes #613

@pganssle

Description

@pganssle

After "fixing" #611, I realized that my test in #612 was incomplete - I was not asserting that the copy.deepcopy worked, and it turns out it did not. The test, modified as below, fails because b.x is never set (the attribute doesn't even exist in the copied object):

    def test_copy_roundtrip(self):
        @attr.s(frozen=True, cache_hash=True)
        class C(object):
            x = attr.ib()

        a = C(1)
        b = copy.deepcopy(C(1))

        assert a == b

I may be missing something, but it seems like #489 actually broke serialization entirely for any class with cache_hash:

>>> import attr
>>> import copy
>>> @attr.s(hash=True, cache_hash=True)
... class SomeClass:
...     x = attr.ib()
...
>>> SomeClass(1)
SomeClass(x=1)
>>> copy.deepcopy(SomeClass(1))
SomeClass(x=NOTHING)

I believe the reason for this is that copy and pickle don't do whatever their default behavior is if __setstate__ is set - they just create a new object and then call __setstate__, which means that when __setstate__ doesn't actually initialize the object, the object remains uninitialized.

I am assuming this went unnoticed because @gabbard (who had the problem in the first place) is, I'm assuming, using a slots class, which doesn't have this problem (slots defines a __setstate__).

I think there are two options here:

  1. Define a custom __getstate__ and __setstate__ for classes with cache_hash=True to duplicate what pickle and copy were doing anyway.
  2. Define a default __reduce__ method that removes the hash cache.

I don't like the first option very much, because it means that we have to re-implement copy and pickle's default behavior (which may even diverge from one another)! I like the second one a lot more, particularly because this will just be the default __reduce__. People implementing their own custom __reduce__ can choose to include or not include the cached hash (though I'm not sure if there's a public variable anywhere they can access to tell what member it would be - maybe exposing such a public member should be part of this)?

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions