feat(bindings/python): support pickle [de]serialization for Operator#5324
feat(bindings/python): support pickle [de]serialization for Operator#5324Zheaoli merged 13 commits intoapache:mainfrom
Conversation
|
It seems like we should add a new capability |
| """ | ||
| serialized = pickle.dumps(operator) | ||
| deserialized = pickle.loads(serialized) | ||
| assert repr(operator) == repr(deserialized) |
There was a problem hiding this comment.
For me, I think the ideal behavior for pickle [de]serialization is that the object has the same behavior/operation result.
So I suggest the test can be
data="abc"
operator.write("/abc","abc")
serialized = pickle.dumps(operator)
del operator
gc.collect()
deserialized = pickle.loads(serialized)
assert "abc"==deserialized.read("/abc")| }) | ||
| } | ||
|
|
||
| fn __repr__(&self) -> String { |
There was a problem hiding this comment.
I think we need add extra __hash__ method to represent two object are equal.
There was a problem hiding this comment.
It's hard to implement, we can't get internal information of Operator.
There was a problem hiding this comment.
how about hash the scheme and the map?
repr is used for show what the object is looking at. Maybe not a good method for identify
There was a problem hiding this comment.
I prefer don’t implement hash and eq for Operator, and remove the equal assertion based on repr.
(schema, map) can’t identify an Operator, leave eq and empty empty is better. Only the ID of PyObject can identify the Operator.
There was a problem hiding this comment.
I prefer don’t implement hash and eq for Operator, and remove the equal assertion based on repr.
Agreed. ocore::Operator doesn't implement Eq ahd Hash too.
SGTM, maybe |
IMO all operators can be serialized as pickle. In most Python libraries, all objects can be serialized as pickle, but it may not be identical when deserialized. The behavior is acceptable. We need the capability only for tests. We can ignore backends which doesn't support the capability in the test. |
|
I'm considering adding a Based on this definition:
I believe this capability is better than the following:
I don't like the idea that:
I view it as a design failure for OpenDAL to include something in the public API solely for testing purposes. |
SGTM |
There was a problem hiding this comment.
Copilot reviewed 4 out of 4 changed files in this pull request and generated no suggestions.
Comments skipped due to low confidence (2)
bindings/python/src/operator.rs:230
- The method
__getnewargs_ex__might not correctly handle the conversion ofkwargsto a Python dictionary. Ensure thatkwargsare correctly converted to a Python dictionary.
fn __getnewargs_ex__(&self, py: Python) -> PyResult<PyObject> {
bindings/python/src/operator.rs:51
- [nitpick] The naming of the private attributes
__schemeand__mapshould be consistent with the convention used in the rest of the codebase.
__scheme: ocore::Scheme,
Please note that What we only can do is disable the tests for unique operator. |
That's correct. As you mentioned, we just need to avoid testing it in a shared manner (which implies that we can access the same storage service after unpickling) if this storage service doesn't I prefer to add the |
|
In fact, there are two types of capabilities in all backends.
Of course, we don't need to be so strict and complicated, I support temporarily adding only shared capability. |
Would you like to add this capability? I believe we can include them in the next 0.51 release. |
I can add it. Please note that However, I guess we will not support that in a while. |
Currently, we don't offer any services that aren't |
|
Will test another PR use the CI flow, just ignore it. |
Zheaoli
left a comment
There was a problem hiding this comment.
LGTM about the Python binding part
I will merge this after all ci green
Xuanwo
left a comment
There was a problem hiding this comment.
LGTM. Thank you @TennyZhuang for working on this and thanks @Zheaoli's review.
Please merge when you feel this PR is ok, @Zheaoli
Which issue does this PR close?
Partially finished #5316
Rationale for this change
Now an
Operatorcan be serialized and deserialized via pickle, as a result, theOperatorclass can be used in multiprocessing context.What changes are included in this PR?
In the PR, we store the arguments for
Operatorconstructor, so that we can recover theOperatorfrom pickle deserialization.Are there any user-facing changes?
Operatorin python bindingOperatorto be used in multiprocessing context@Zheaoli I'm not very familiar with Python binding, PTAL for the PR, thanks!