-
Notifications
You must be signed in to change notification settings - Fork 1.5k
Control Consistency of PersistentDataset #999
Description
Is your feature request related to a problem? Please describe.
Already included as a warning in the source code, the PersistentDataset does not check if any of the transformations changed, and thus after modifying the transformations, the cache is outdated.
Describe the solution you'd like
Calculate the hash of the very first element of the dataset (after calling _pre_first_random_transform) and store this in the cache folder. In case the stored hash matches with the one calculated during __init__ of PersistentDataset the cache is still valid and no deterministic transformations have changed.
However, if the hashes do not match, either a new cache folder could be created or the old cache can be overwritten.
Describe alternatives you've considered
Alternatively, the deterministic transformations could be hashed, but I am not sure how that would work. Might be more memory efficient, but potentially harder to implement due to TypeError: Object of type *** is not JSON serializable for most transforms.