This question arose in the following context. I have multiple Python processes, and some classes are defined in each process. Sometimes class definitions are shipped from one process to another (using cloudpickle). Sometimes classes may be shipped multiple times or in multiple ways to a given process, and I'd like to deduplicate them based on the output of cloudpickle: if cloudpickle.dumps(class1) == cloudpickle.dumps(class2), then the classes are the "same" and I can throw one of them away. This works, but there are far too many false negatives; that is, two classes really are the same (in some sense), but cloudpickle.dumps gives different results for them.
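To make the deduplication scheme concrete, here is a minimal sketch of what I mean. The helper name `deduplicate` and its `dumps` parameter are hypothetical, chosen only for illustration; in my setup `dumps` would be `cloudpickle.dumps`, but any function returning bytes works.

```python
import hashlib


def deduplicate(objects, dumps):
    """Keep the first object seen for each distinct serialized payload.

    `dumps` is any callable returning bytes (hypothetical parameter;
    cloudpickle.dumps is the intended argument in this issue).
    """
    unique = {}
    for obj in objects:
        # Hash the payload so the registry does not hold the full bytes.
        key = hashlib.sha256(dumps(obj)).hexdigest()
        unique.setdefault(key, obj)
    return list(unique.values())
```

Usage would be something like `deduplicate(received_classes, cloudpickle.dumps)`. The scheme is only as good as the serializer is deterministic, which is exactly the problem described here.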
Here's one example that sort of illustrates the issue (although there are a number of ways this kind of thing can arise).
Suppose I do the following.
import cloudpickle
class Foo1(object):
    def __init__(self):
        pass

serialized1 = cloudpickle.dumps(Foo1)
Foo2 = cloudpickle.loads(serialized1)
serialized2 = cloudpickle.dumps(Foo2)
assert serialized1 == serialized2  # This assertion fails.

I'd love for this kind of assertion to succeed. Does anyone know if this is achievable or what the main obstacles are?
Interestingly, if I iterate this a third time,
Foo3 = cloudpickle.loads(serialized2)
serialized3 = cloudpickle.dumps(Foo3)
assert serialized2 == serialized3  # This succeeds.

then the assert succeeds, so maybe it suffices to use cloudpickle.dumps(cloudpickle.loads(cloudpickle.dumps(cls))) to deduplicate classes (though this seems kind of insane, and I haven't tested this extensively). Would you expect this to work?
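For concreteness, the round-trip idea can be written as a small helper that re-serializes until the payload stops changing. This is only a sketch: `stable_payload`, `dumps`, `loads`, and `max_rounds` are hypothetical names, and nothing here guarantees that cloudpickle output converges in general, so the helper gives up after a fixed number of rounds.

```python
def stable_payload(obj, dumps, loads, max_rounds=3):
    """Round-trip `obj` until two consecutive payloads are identical.

    `dumps`/`loads` are any matching serializer pair (hypothetical
    parameters; cloudpickle.dumps and cloudpickle.loads are intended).
    Raises ValueError if no fixed point is reached in `max_rounds`.
    """
    payload = dumps(obj)
    for _ in range(max_rounds):
        next_payload = dumps(loads(payload))
        if next_payload == payload:
            return payload
        payload = next_payload
    raise ValueError("payload did not stabilize after %d rounds" % max_rounds)
```

With this, the deduplication key would be `stable_payload(cls, cloudpickle.dumps, cloudpickle.loads)` rather than a single `cloudpickle.dumps(cls)`, at the cost of extra serialization work per class.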
One thing that may be related/revealing is the outputs I get if I do something similar in an IPython interpreter (instead of a regular Python interpreter).
First copy and paste this block into IPython.
import cloudpickle
class Foo(object):
    def __init__(self):
        pass

serialized1 = cloudpickle.dumps(Foo)

Then copy and paste this block into IPython.

class Foo(object):
    def __init__(self):
        pass

serialized2 = cloudpickle.dumps(Foo)

Comparing serialized1 and serialized2 side by side, they are:
serialized1 # b'\x80\x02ccloudpickle.cloudpickle\n_rehydrate_skeleton_class\nq\x00(ccloudpickle.cloudpickle\n_builtin_type\nq\x01X\t\x00\x00\x00ClassTypeq\x02\x85q\x03Rq\x04X\x03\x00\x00\x00Fooq\x05c__builtin__\nobject\nq\x06\x85q\x07}q\x08X\x07\x00\x00\x00__doc__q\tNs\x87q\nRq\x0b}q\x0c(X\n\x00\x00\x00__module__q\rX\x08\x00\x00\x00__main__q\x0eX\x08\x00\x00\x00__init__q\x0fccloudpickle.cloudpickle\n_fill_function\nq\x10(ccloudpickle.cloudpickle\n_make_skel_func\nq\x11h\x01X\x08\x00\x00\x00CodeTypeq\x12\x85q\x13Rq\x14(K\x01K\x00K\x01K\x01KCc_codecs\nencode\nq\x15X\x04\x00\x00\x00d\x00S\x00q\x16X\x06\x00\x00\x00latin1q\x17\x86q\x18Rq\x19N\x85q\x1a)X\x04\x00\x00\x00selfq\x1b\x85q\x1cX\x1e\x00\x00\x00<ipython-input-1-d9b5c81388ae>q\x1dh\x0fK\x04h\x15X\x02\x00\x00\x00\x00\x01q\x1eh\x17\x86q\x1fRq ))tq!Rq"J\xff\xff\xff\xff}q#\x87q$Rq%}q&N}q\'NtRutR.'
serialized2 # b'\x80\x02ccloudpickle.cloudpickle\n_rehydrate_skeleton_class\nq\x00(ccloudpickle.cloudpickle\n_builtin_type\nq\x01X\t\x00\x00\x00ClassTypeq\x02\x85q\x03Rq\x04X\x03\x00\x00\x00Fooq\x05c__builtin__\nobject\nq\x06\x85q\x07}q\x08X\x07\x00\x00\x00__doc__q\tNs\x87q\nRq\x0b}q\x0c(X\n\x00\x00\x00__module__q\rX\x08\x00\x00\x00__main__q\x0eX\x08\x00\x00\x00__init__q\x0fccloudpickle.cloudpickle\n_fill_function\nq\x10(ccloudpickle.cloudpickle\n_make_skel_func\nq\x11h\x01X\x08\x00\x00\x00CodeTypeq\x12\x85q\x13Rq\x14(K\x01K\x00K\x01K\x01KCc_codecs\nencode\nq\x15X\x04\x00\x00\x00d\x00S\x00q\x16X\x06\x00\x00\x00latin1q\x17\x86q\x18Rq\x19N\x85q\x1a)X\x04\x00\x00\x00selfq\x1b\x85q\x1cX\x1e\x00\x00\x00<ipython-input-2-a08e1f07615d>q\x1dh\x0fK\x02h\x15X\x02\x00\x00\x00\x00\x01q\x1eh\x17\x86q\x1fRq ))tq!Rq"J\xff\xff\xff\xff}q#\x87q$Rq%}q&N}q\'NtRutR.'
They seem to be the same everywhere except that the first includes the string <ipython-input-1-d9b5c81388ae>q\x1dh\x0fK\x04h and the second includes the string <ipython-input-2-a08e1f07615d>q\x1dh\x0fK\x02h. Any idea where these strings come from or if it is possible to remove them?
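Reading the two pickle streams, the differing pieces appear to line up with metadata on the code object of `__init__`: a filename-like string (`co_filename`) followed by a small integer that looks like the first line number (`co_firstlineno`). That is my interpretation of the dump rather than something I have confirmed in the cloudpickle source. The sketch below uses only the standard library to show that the same class source compiled under two different filenames and line offsets has identical bytecode but different metadata; the `<cell-1>` and `<cell-2>` names and the `define_foo` helper are made up for illustration.

```python
SOURCE = (
    "class Foo(object):\n"
    "    def __init__(self):\n"
    "        pass\n"
)


def define_foo(filename, leading_blank_lines=0):
    """Compile and run SOURCE as if it came from `filename`."""
    namespace = {}
    code = compile("\n" * leading_blank_lines + SOURCE, filename, "exec")
    exec(code, namespace)
    return namespace["Foo"]


# Hypothetical cell names standing in for IPython's generated ones.
code_a = define_foo("<cell-1>", leading_blank_lines=2).__init__.__code__
code_b = define_foo("<cell-2>").__init__.__code__

# The executable bytecode is the same in both definitions...
same_bytecode = code_a.co_code == code_b.co_code
# ...but the metadata recording where the code was defined is not.
filenames = (code_a.co_filename, code_b.co_filename)
first_lines = (code_a.co_firstlineno, code_b.co_firstlineno)
```

If that reading is right, any byte-for-byte comparison of the serialized classes would need to ignore or normalize these fields.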