
Performance issues with hashing attrs objects #261

@DRMacIver

Backstory: I got a bug report about a performance regression in Hypothesis (HypothesisWorks/hypothesis#919). In the profiling data in the linked issue, most of the extra time (10 out of 13 seconds) is spent in attrs hashing.

This makes sense, since the enclosing method where most of the time is spent hashes a lot of attrs objects, but it was still a bit surprising that it was this slow.

The class being hashed looks as follows:

import attr

@attr.s(slots=True, frozen=True)
class Arc(object):
    filename = attr.ib()
    source = attr.ib()
    target = attr.ib()

I have worked around this for now by removing attrs from this class and memoizing its creation, so that all value-equal objects are reference-equal. That is a more extreme solution than I would expect attrs to support, but it means this bug is in no way critical for me, since attrs is no longer on the hot path of this code. It will, however, probably limit attrs uptake inside Hypothesis for now: I am likely to add a number of classes with a similar usage pattern, and will probably follow the workaround I used here rather than use attrs.
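For illustration, here is a minimal sketch of what such a workaround could look like. It is not the actual Hypothesis code; the `_cache` table and the `__new__`-based interning are assumptions made for this example.

```python
class Arc(object):
    """Plain class whose instances are interned by field values."""

    __slots__ = ("filename", "source", "target")

    # Hypothetical interning table: field tuple -> the one Arc for those fields.
    # It grows without bound in this sketch.
    _cache = {}

    def __new__(cls, filename, source, target):
        key = (filename, source, target)
        try:
            return cls._cache[key]
        except KeyError:
            self = super().__new__(cls)
            self.filename = filename
            self.source = source
            self.target = target
            cls._cache[key] = self
            return self

    # __eq__ and __hash__ are deliberately not defined: because value-equal
    # instances are the same object, the default identity-based versions
    # inherited from object are sufficient.
```

The field tuple is still built and hashed once per constructor call, but every later `hash()` or `==` on an existing instance uses the identity-based defaults and never touches the fields.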

The two obvious things attrs could do to help are:

  • Have faster hashing. Almost all of that hashing time is in _attrs_to_tuple. Not creating these intermediate tuples (or creating them faster somehow) might be a significant win here.
  • Add support for caching the hash on the object. In my usage pattern the same objects were being hashed over and over again, paying the full hashing cost each time.
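To make the second suggestion concrete, here is a hand-written sketch of hash caching on a plain class. It is not attrs code: the class name `CachedHashArc` and the `_hash` slot are hypothetical, and the approach is only safe if the fields are never changed after the first `hash()` call, as with a frozen class.

```python
class CachedHashArc(object):
    """Value-equal class that computes its hash once and reuses it."""

    __slots__ = ("filename", "source", "target", "_hash")

    def __init__(self, filename, source, target):
        self.filename = filename
        self.source = source
        self.target = target
        self._hash = None  # filled in lazily on the first hash() call

    def __eq__(self, other):
        if not isinstance(other, CachedHashArc):
            return NotImplemented
        return (self.filename, self.source, self.target) == (
            other.filename, other.source, other.target)

    def __hash__(self):
        h = self._hash
        if h is None:
            # The intermediate tuple is built only once per object.
            h = self._hash = hash((self.filename, self.source, self.target))
        return h
```

With this pattern, repeated set or dict lookups on the same object pay for the tuple construction only on the first hash; later calls return the stored integer.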
