Copy Snapshot operation arguments in a distributed task #10391
vepadulano merged 1 commit into root-project:master
Build failed on ROOT-ubuntu2004/soversion. Failing tests:
Build failed on mac11/cxx17. Failing tests:
Build failed on mac1015/python3. Failing tests:
The file name of the Snapshot operation is modified in place to append the range id of the current task. As a result, a task can receive the operation object already used by a previous task, with a file name that has already been modified. The current task would then build a wrong file name containing more than one range id. Solve this by creating a deep copy of the Snapshot operation arguments in each task, so that the file name is changed in isolation.
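The effect described above can be reproduced with a minimal sketch. The `Operation` class, the `name_for_task` helper, and the `out.root` file name are hypothetical stand-ins for illustration, not the actual DistRDF code; the sketch assumes the file name sits in a mutable list of arguments. It also shows why a shallow copy would not be enough under that assumption: the copied object would still share the same argument list.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class Operation:
    """Hypothetical stand-in for a DistRDF operation: a name plus call arguments."""
    name: str
    args: list = field(default_factory=list)


def name_for_task(op, range_id, copier):
    """Append the range id to the file name held in args[1] of the (copied) operation."""
    in_task_op = copier(op)
    stem, ext = in_task_op.args[1].rsplit(".", 1)
    in_task_op.args[1] = f"{stem}_{range_id}.{ext}"
    return in_task_op.args[1]


def run_two_tasks(copier):
    """Run two consecutive tasks against one shared operation object."""
    shared = Operation("Snapshot", ["tree", "out.root"])
    return [name_for_task(shared, range_id, copier) for range_id in (0, 1)]


in_place = run_two_tasks(lambda op: op)  # no copy: the shared object is mutated
shallow = run_two_tasks(copy.copy)       # new object, but the args list is still shared
deep = run_two_tasks(copy.deepcopy)      # fully isolated arguments per task

print(in_place)  # ['out_0.root', 'out_0_1.root']
print(shallow)   # ['out_0.root', 'out_0_1.root']
print(deep)      # ['out_0.root', 'out_1.root']
```

Only the deep copy gives each task a file name with exactly one range id.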
9fb282e to fc76271
Build failed on mac1015/python3. Failing tests:
@phsft-bot build also on mac12/default
Build failed on windows10/cxx14. Errors:
And 309 more
Build failed on mac12/default. Warnings:
And 349 more
Failing tests:
@phsft-bot build
@phsft-bot build
Build failed on ROOT-performance-centos8-multicore/default. Failing tests:
rdf_operation = getattr(previous_rdf_node, distrdf_node.operation.name)
_make_op_lazy_if_needed(distrdf_node.operation, range_id)
pyroot_node = rdf_operation(*distrdf_node.operation.args, **distrdf_node.operation.kwargs)
in_task_op = _create_lazy_op_if_needed(distrdf_node.operation, range_id)
So two tasks can invoke _call_rdf_operation on the same distrdf_node object? How does this happen? I thought every task generates its own graph.
Every task generates its own RDataFrame C++ graph, but the DistRDF Python graph is a single object that gets serialized and deserialized. On a single machine, a single Python process can receive two (or more) tasks. When the first task starts, it deserializes the DistRDF graph, modifies the operation object in place, creates the RDF C++ calls, and sends everything to the mapper function that executes them. When the second task starts, it gets the same DistRDF graph objects, but by this point their operation attributes have already been modified by the previous task, which leads to the errors described in the PR.
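A simplified model of that sequence, for illustration only: the module-level cache, the `get_graph` and `run_task` helpers, and the dictionary-based graph are assumptions made for this sketch and do not reflect how DistRDF or its backends actually hold the deserialized graph. The sketch only assumes that two tasks in one Python process end up with the same deserialized objects.

```python
import copy
import pickle

# Hypothetical serialized graph: node id -> operation description.
payload = pickle.dumps({1: {"name": "Snapshot", "args": ["tree", "out.root"]}})

# Assumption for this sketch: the process keeps one deserialized graph
# and hands the same objects to every task it runs.
_graph_cache = {}


def get_graph():
    """Deserialize the graph on first use, then return the same objects."""
    if "graph" not in _graph_cache:
        _graph_cache["graph"] = pickle.loads(payload)
    return _graph_cache["graph"]


def run_task(range_id, isolate):
    """Build the per-task Snapshot file name, optionally on a deep copy."""
    op = get_graph()[1]
    if isolate:
        op = copy.deepcopy(op)  # per-task copy, the shared graph stays intact
    stem, ext = op["args"][1].rsplit(".", 1)
    op["args"][1] = f"{stem}_{range_id}.{ext}"
    return op["args"][1]


# Two tasks in the same process, mutating the shared operation.
without_copy = [run_task(0, isolate=False), run_task(1, isolate=False)]

# Reset the process state, then run the same two tasks with isolation.
_graph_cache.clear()
with_copy = [run_task(0, isolate=True), run_task(1, isolate=True)]

print(without_copy)  # ['out_0.root', 'out_0_1.root']
print(with_copy)     # ['out_0.root', 'out_1.root']
```

In this model the second task sees the file name left behind by the first one unless each task works on its own copy.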
This PR fixes #10390