Remove hashing by key from CacheDataset (hash_as_key) #5390

@myron

In CacheDataset we have an option to hash by key (hash_as_key with hash_func=pickle_hashing). It looks like it is not used anywhere in the code or in any tutorials.

Historically, the option was added at @dongyang0122's request:
#3734
#3739
I've just synced with @dongyang0122, and he is not using this option either (and never has).

Hashing by key also seems impractical: computing a hash of a 3D image incurs overhead (and it has to be done every time the hash is accessed, for both reads and writes). Finally, the other Dataset subclasses don't have this option at all, which makes it inconsistent.
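
To illustrate the per-access cost, here is a minimal toy sketch. It is not MONAI's CacheDataset; the class HashKeyedCache, its methods, and the use of SHA-256 over pickled data are all assumptions made for illustration. It only counts how often the content hash has to be recomputed when the cache key is derived from the item itself.

```python
import hashlib
import pickle


class HashKeyedCache:
    """Toy cache keyed by a content hash (hypothetical, not MONAI code)."""

    def __init__(self):
        self.store = {}
        self.hash_calls = 0  # how many times the content had to be hashed

    def _key(self, item):
        # The whole item is serialized and hashed on every lookup.
        self.hash_calls += 1
        return hashlib.sha256(pickle.dumps(item)).hexdigest()

    def put(self, item, value):
        self.store[self._key(item)] = value

    def get(self, item):
        return self.store.get(self._key(item))


# Four small stand-ins for images; real 3D volumes would be far larger.
items = [bytes([i]) * 1024 for i in range(4)]

cache = HashKeyedCache()
for it in items:
    cache.put(it, len(it))  # one hash per write
for it in items:
    cache.get(it)  # one more hash per read

print(cache.hash_calls)  # 8: every write and every read hashed the full item
```

With an index-keyed cache the key is already known, so none of these eight hash computations would be needed.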

I don't really see a good use case for keeping it. If there are duplicate images in the dataset (and they have to be identical down to the voxel for this to work), it will probably be faster to just process a duplicate image than to check the hash for all images. Or, even better, run a check for duplicates beforehand.
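
A one-off duplicate check before building the dataset could look roughly like the sketch below. The function name find_duplicate_files and its chunk_size parameter are made up for illustration, and it only detects byte-identical files on disk. Two files holding the same voxels with different headers or compression would not be flagged.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def find_duplicate_files(paths, chunk_size=1 << 20):
    """Group byte-identical files by SHA-256 digest (hypothetical helper)."""
    groups = defaultdict(list)
    for path in paths:
        digest = hashlib.sha256()
        with open(path, "rb") as f:
            # Read in chunks so large volumes are never fully held in memory.
            for chunk in iter(lambda: f.read(chunk_size), b""):
                digest.update(chunk)
        groups[digest.hexdigest()].append(Path(path))
    # Keep only digests shared by more than one file.
    return [group for group in groups.values() if len(group) > 1]
```

The returned groups can be used to drop the extra copies from the data list once, so nothing needs to be hashed at cache read or write time.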

If we remove it completely, we'll simplify our code in CacheDataset quite a bit. There is already a refactoring PR, #5365; perhaps we can remove this hashing by key too, @Nic-Ma.
