Remove hashing by key from CacheDataset (hash_as_key) #5390
Description
In CacheDataset we have an option to hash by key (hash_as_key with hash_func=pickle_hashing). It looks like it is not used anywhere in the code or in any tutorials.
Historically, the option was added at @dongyang0122's request:
#3734
#3739
I've just synced with @dongyang0122, and he is not using this option either (and has never used it).
Hashing by key also seems impractical: computing a hash on a 3D image incurs overhead, and it has to be done every time the hash is accessed, for both reads and writes. Finally, other Dataset subclasses don't have this option at all, which makes it inconsistent.
I don't really see a good use case for keeping it. Duplicate images have to be identical down to the voxel for the hash to match, so it will probably be faster to just process a duplicate image than to compute the hash for every image. Better still, run a check for duplicates beforehand.
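To illustrate the point, here is a minimal sketch that uses only the standard library. It is not MONAI's implementation: `content_hash`, `build_hash_keyed_cache` and `find_duplicates` are hypothetical names, SHA-256 over the pickled bytes is an assumed stand-in for the hash function, and raw byte buffers stand in for 3D images.

```python
import hashlib
import pickle


def content_hash(item) -> str:
    # Hypothetical stand-in for a pickle-based hash_func; not MONAI's implementation.
    payload = pickle.dumps(item, protocol=pickle.HIGHEST_PROTOCOL)
    return hashlib.sha256(payload).hexdigest()


def build_hash_keyed_cache(items):
    # Every item is serialized and hashed in full, duplicate or not.
    cache, keys = {}, []
    for item in items:
        key = content_hash(item)
        keys.append(key)
        cache.setdefault(key, item)  # identical content shares one entry
    return cache, keys


def find_duplicates(items):
    # One-off check: (index, index of first identical item) pairs.
    first_seen, duplicates = {}, []
    for idx, item in enumerate(items):
        key = content_hash(item)
        if key in first_seen:
            duplicates.append((idx, first_seen[key]))
        else:
            first_seen[key] = idx
    return duplicates


# Stand-ins for 3D volumes: 64^3 float32 voxels stored as raw bytes.
volume = bytearray(64 * 64 * 64 * 4)
exact_copy = bytearray(volume)
near_copy = bytearray(volume)
near_copy[0] = 1  # a single differing byte yields a different key

volumes = [volume, exact_copy, near_copy]
cache, keys = build_hash_keyed_cache(volumes)
```

Only the exact copy collapses into a shared cache entry, and every volume still has to be serialized and hashed in full to find that out. A one-off pass such as `find_duplicates(volumes)` gives the same information without keeping the hashing logic inside the dataset class.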
If we remove it completely, we'll simplify the code in CacheDataset quite a bit. There is already a refactoring PR, #5365; perhaps we can remove this hashing by key there too, @Nic-Ma?