Distributed RDataFrame doesn't respect lazy instant actions #9993

@vepadulano

Describe the bug

If an instant action is purposely made lazy by the user, distributed RDataFrame does not respect it. For example:

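import ROOT
from dask.distributed import Client, LocalCluster

RDataFrame = ROOT.RDF.Experimental.Distributed.Dask.RDataFrame

if __name__ == "__main__":
    client = Client(LocalCluster(n_workers=2, threads_per_worker=1, processes=True))

    # Request a lazy Snapshot: booking it should not start the computation.
    opts = ROOT.RDF.RSnapshotOptions()
    opts.fLazy = True
    snap_ptr = RDataFrame(10, daskclient=client).Define("a", "1.").Snapshot(
        "dummy_distributed", "dummy_distributed.root", ["a"], opts)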

When executed, the code above results in:

$: python test_lazy_distributed.py
$: ls dummy_distributed*
dummy_distributed_0.root  dummy_distributed_1.root

Similarly, a lazy AsNumpy call still triggers the computation graph right away.

Expected behavior

If an instant action is made lazy by the user, it should not trigger distributed execution.
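The expectation above could be turned into an automated check. The following is only a minimal sketch: `triggered_outputs` and `assert_still_lazy` are hypothetical helpers, not part of ROOT, and they assume the output files sit in a known directory and share the snapshot file's name stem, as in the `dummy_distributed_0.root` listing shown earlier.

```python
import glob
import os


def triggered_outputs(prefix, directory="."):
    """Return the sorted .root files in `directory` whose names start with `prefix`.

    Hypothetical helper: it only inspects the filesystem and knows nothing
    about ROOT or the distributed backend.
    """
    # Escape the literal part so characters such as '[' are not read as glob syntax.
    pattern = glob.escape(os.path.join(directory, prefix)) + "*.root"
    return sorted(glob.glob(pattern))


def assert_still_lazy(prefix, directory="."):
    """Raise AssertionError if any snapshot output file already exists."""
    found = triggered_outputs(prefix, directory)
    assert not found, f"lazy Snapshot already wrote: {found}"
```

Calling `assert_still_lazy("dummy_distributed")` right after booking the lazy Snapshot would be expected to pass once the issue is fixed; with the behavior reported here it would presumably fail and list the two partial files.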

To Reproduce

ROOT 6.24 and above (6.26 and above to use the Dask backend, but it's unrelated to this issue)
