Distributed RDataFrame doesn't respect lazy instant actions #9993
Closed
Description
Describe the bug
If an instant action is purposely made lazy by the user, distributed RDataFrame does not respect it. For example:
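```python
import ROOT
from dask.distributed import Client, LocalCluster
RDataFrame = ROOT.RDF.Experimental.Distributed.Dask.RDataFrame  # Dask backend, ROOT 6.26+

if __name__ == "__main__":
    client = Client(LocalCluster(n_workers=2, threads_per_worker=1, processes=True))
    opts = ROOT.RDF.RSnapshotOptions()
    opts.fLazy = True  # the Snapshot should not run until its result is accessed
    snap_ptr = RDataFrame(10, daskclient=client).Define("a", "1.").Snapshot(
        "dummy_distributed", "dummy_distributed.root", ["a"], opts)
```

When executed, the code above results in:

```
$: python test_lazy_distributed.py
$: ls dummy_distributed*
dummy_distributed_0.root dummy_distributed_1.root
```

The two output files show that the Snapshot was executed immediately, even though `fLazy = True` was set.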
Similarly, a lazy `AsNumpy` call still triggers the computation graph right away.
Expected behavior
If an instant action is made lazy by the user, it should not trigger distributed execution.
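To make the expected semantics concrete, here is a minimal toy model in plain Python. It is not ROOT code: `ToyGraph`, `ToyResult` and the `lazy` flag of `action()` are hypothetical names chosen for illustration, and `GetValue()` is only named after `RResultPtr::GetValue`. Instant actions trigger a run of the graph right away; lazy actions are merely booked, and the first `GetValue()` call (or the next instant action) runs everything booked so far in a single pass, similar to how one event loop fills all booked results in local RDataFrame.

```python
"""Toy model of lazy vs. instant actions (hypothetical, not ROOT's API)."""
from typing import Callable, List, Optional


class ToyResult:
    """Hypothetical handle to a booked action's result."""

    def __init__(self, graph: "ToyGraph", compute: Callable[[], object]) -> None:
        self.graph = graph
        self.compute = compute
        self.value: Optional[object] = None
        self.ready = False

    def GetValue(self) -> object:
        # Accessing the result is what triggers execution of a lazy action
        if not self.ready:
            self.graph.run()
        return self.value


class ToyGraph:
    """Hypothetical stand-in for a (distributed) computation graph."""

    def __init__(self) -> None:
        self.pending: List[ToyResult] = []
        self.runs = 0  # how many times execution was triggered

    def action(self, compute: Callable[[], object], lazy: bool = False) -> ToyResult:
        result = ToyResult(self, compute)
        self.pending.append(result)
        if not lazy:
            # Instant action: execute the graph immediately
            self.run()
        return result

    def run(self) -> None:
        # A single pass fills every booked result, lazy or not
        self.runs += 1
        for res in self.pending:
            res.value = res.compute()
            res.ready = True
        self.pending.clear()


if __name__ == "__main__":
    graph = ToyGraph()
    snap = graph.action(lambda: "snapshot written", lazy=True)
    print("runs after booking a lazy action:", graph.runs)
    print(snap.GetValue())
    print("runs after GetValue():", graph.runs)
```

In this model, the bug reported above corresponds to the distributed frontend ignoring `lazy=True` and calling `run()` as soon as the action is booked.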
To Reproduce
ROOT 6.24 and above (6.26 and above to use the Dask backend, but it's unrelated to this issue)