Open
Description
Cloudpickle seems to produce non-deterministic dumps when the file's formatting is "innocuously" modified (e.g., formatting changes outside of the pickled object's definition), whereas dill and pickle produce deterministic dumps.
For example, after inserting a blank line anywhere below the definition of the pickled function foo, the first run yields one hash and later runs yield a different, stable hash:
import cloudpickle
import dill
import pickle

def foo():
    pass

def get_cpickle():
    return cloudpickle.dumps(foo)

def get_dill():
    return dill.dumps(foo)

def get_pickle():
    return pickle.dumps(foo)

if __name__ == '__main__':
    print('Cpickle:', hash(get_cpickle()))
    print('Dill:', hash(get_dill()))
    print('Pickle:', hash(get_pickle()))
Command:
PYTHONHASHSEED=1 python bad_pickle.py
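The command pins PYTHONHASHSEED because the built-in hash() of a bytes object is otherwise randomized per interpreter run. A minimal sketch of an alternative that does not depend on the seed, using a SHA-256 digest instead; the helper name stable_digest is hypothetical and not part of the original script:

```python
import hashlib
import pickle


def stable_digest(payload: bytes) -> str:
    """Return a hex SHA-256 digest of the payload.

    Unlike the built-in hash(), this does not depend on PYTHONHASHSEED,
    so digests can be compared across runs and machines.
    """
    return hashlib.sha256(payload).hexdigest()


if __name__ == "__main__":
    # Any bytes produced by pickle.dumps, dill.dumps or cloudpickle.dumps
    # can be passed in; a plain dict is used here only for illustration.
    print("Pickle:", stable_digest(pickle.dumps({"answer": 42})))
```

In bad_pickle.py this would presumably amount to replacing each hash(...) call with stable_digest(...), e.g. print('Cpickle:', stable_digest(get_cpickle())).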
First run:
Cpickle: -185195056977094428
Dill: 1827482599472099751
Pickle: -2221802750934099445
Second run:
Cpickle: 5072829361071368526
Dill: 1827482599472099751
Pickle: -2221802750934099445
Blank line inserted after print('Cpickle:', ...) (third run):
Cpickle: -185195056977094428
Dill: 1827482599472099751
Pickle: -2221802750934099445
Fourth run:
Cpickle: 5072829361071368526
Dill: 1827482599472099751
Pickle: -2221802750934099445
This was tested on the following versions:
Cloudpickle version: 1.2.2
Dill version: 0.2.7.1
Python version: 3.6.10 (default, Jan 1 2020, 00:00:00)
Perhaps Cloudpickle's output also depends on some eventually cached version of the source file (e.g., a .pyc).
This is also somewhat related to #120.
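One way to narrow down what actually differs between the two Cloudpickle dumps is to disassemble both byte strings with the standard library's pickletools and diff the opcode listings. The following is only a sketch under the assumption that the two dumps have been captured as bytes (for instance written to files on the first and second run); disassemble and diff_pickles are hypothetical helper names:

```python
import difflib
import io
import pickle
import pickletools


def disassemble(payload: bytes) -> list:
    """Return the pickletools opcode listing of a pickle as a list of lines."""
    out = io.StringIO()
    pickletools.dis(payload, out=out)
    return out.getvalue().splitlines()


def diff_pickles(first: bytes, second: bytes) -> list:
    """Return a unified diff of the opcode listings of two pickles.

    An empty list means both pickles disassemble to the same listing.
    """
    return list(
        difflib.unified_diff(
            disassemble(first),
            disassemble(second),
            fromfile="first",
            tofile="second",
            lineterm="",
        )
    )


if __name__ == "__main__":
    # Illustration with two small, deliberately different pickles.
    for line in diff_pickles(pickle.dumps(("foo", 1)), pickle.dumps(("foo", 2))):
        print(line)
```

Each listing line includes its byte offset, so a size change early in the stream will also mark the lines after it as different; the first differing line is usually the interesting one.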