ARROW-1758: [Python] Remove pickle=True option for object serialization#1347
ARROW-1758: [Python] Remove pickle=True option for object serialization#1347Licht-T wants to merge 2 commits intoapache:masterfrom
Conversation
python/pyarrow/serialization.py
Outdated
There was a problem hiding this comment.
If no custom serializer/deserializer is passed, should this default to pickle?
There was a problem hiding this comment.
Currently it defaults to using __dict__, which is more efficient so it seems a good default, we use that in Ray for pretty much all types (however rarely I have seen cases where it doesn't work). I'd prefer to keep __dict__ the default but no big deals since it can easily be changed.
There was a problem hiding this comment.
In Ray, lambdas are handled with cloudpickle as are builtin types like "type"; most user defined types are handled via dict; namedtuples are special cased.
One reasonable way to handle this is to provide default callbacks for types we encounter that don't work with dict (using cloudpickle or a custom serializer/deserializer), there are already examples of this in the repo, and use dict for anything else. That worked well for us so far. If this strategy doesn't work for the user, they have full flexibility by providing their own serialization context.
What do you think?
There was a problem hiding this comment.
This is sufficiently advanced stuff that I think it's not unreasonable to ask people to explicitly use cloudpickle in a case like this. I would be OK if we changed this patch to do that to get the build passing, and we can always move on later
ebfc414 to
4e71bd3
Compare
|
The tests fail here, need to use cloudpickle, I think |
|
Okay, I'll check. |
Change-Id: Id4423a228ae2388c3e3f75d5650f0f0126fa9cc8
4e71bd3 to
927f154
Compare
|
@pcmoritz @robertnishihara does this look ok? |
|
Looks good to me. |
|
+1 LGTM |
|
Thanks @wesm! |
This closes ARROW-1758.