Use RDatasetSpec in distrdf task creation #10939

Merged
vepadulano merged 2 commits into root-project:master from vepadulano:distrdf-fix-friends
Jul 26, 2022

@vepadulano vepadulano commented Jul 10, 2022

Fixes #10872

lgtm-com bot commented Jul 10, 2022

This pull request introduces 1 alert when merging 06becc0 into 103525b - view on LGTM.com

new alerts:

  • 1 for Unused import

@eguiraud eguiraud left a comment

reviewed all changes except those in Ranges.py, which are very large and had me stumped for a while until I had to move on. the rest looks good to me.

the commit message should probably mention #10872 explicitly.

@vepadulano vepadulano force-pushed the distrdf-fix-friends branch from 06becc0 to 99036ce Compare July 19, 2022 10:49
lgtm-com bot commented Jul 19, 2022

This pull request introduces 1 alert when merging 99036ce into dcf49e4 - view on LGTM.com

new alerts:

  • 1 for Unused import

@eguiraud eguiraud added this to the 6.26/06 milestone Jul 20, 2022
@vepadulano vepadulano marked this pull request as ready for review July 26, 2022 15:30
@vepadulano vepadulano requested a review from etejedor as a code owner July 26, 2022 15:30
@etejedor etejedor left a comment

Very nice! Thanks!

@vepadulano vepadulano changed the title [skip-ci] Use RDatasetSpec in distrdf task creation Use RDatasetSpec in distrdf task creation Jul 26, 2022
At the beginning of a distributed task, move from creating a TChain to
creating an RDatasetSpec to pass to the RDataFrame constructor. This
simplifies the previous logic by avoiding the need for a TEntryList to
restrict the reading of the chain entries to those assigned to the task.

This commit also fixes root-project#10872.
Previously, each task received a copy of the names of all
files of the friend dataset. When restricting the processing to a
certain range of entries of the main chain, the friend chain was always
being read starting from the first file. Now, in case there are friend
trees, each task will receive information about the full dataset and
will have to open all files in order to retrieve the number of entries
in all trees. This in turn allows proper alignment w.r.t. the friend
chain.
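
The alignment issue described in the commit message can be illustrated with a small pure-Python sketch. It does not use the ROOT or DistRDF API: the helper `locate_global_range` and the per-file entry counts are hypothetical, chosen only to show why a task needs the entry counts of all files before it can find the friend entries matching its range of the main chain.

```python
from bisect import bisect_right
from itertools import accumulate


def locate_global_range(entries_per_file, start, end):
    """Map a global entry range [start, end) of a chain onto per-file pieces.

    Returns a list of (file_index, local_start, local_end) tuples.
    Hypothetical helper, not part of DistRDF.
    """
    # offsets[i] is the global index of the first entry of file i;
    # computing it requires the number of entries of every file.
    offsets = [0] + list(accumulate(entries_per_file))
    if not 0 <= start <= end <= offsets[-1]:
        raise ValueError("range outside of the dataset")

    pieces = []
    # Index of the file that contains the global entry `start`.
    idx = bisect_right(offsets, start) - 1
    while start < end and idx < len(entries_per_file):
        file_begin, file_end = offsets[idx], offsets[idx + 1]
        local_start = start - file_begin
        local_end = min(end, file_end) - file_begin
        if local_end > local_start:
            pieces.append((idx, local_start, local_end))
        start = min(end, file_end)
        idx += 1
    return pieces


# Assumed layout: main chain has 3 files of 10 entries, friend chain 2 files of 15.
main_pieces = locate_global_range([10, 10, 10], 10, 20)
friend_pieces = locate_global_range([15, 15], 10, 20)
print(main_pieces)    # [(1, 0, 10)]
print(friend_pieces)  # [(0, 10, 15), (1, 0, 5)]
```

In this assumed layout, a task assigned global entries 10 to 20 reads the whole second file of the main chain, while the matching friend entries straddle both friend files. Reading the friend chain from its first entry, as in the behaviour described for #10872, would pair the task's main entries with friend entries 0 to 10 instead.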
@vepadulano vepadulano force-pushed the distrdf-fix-friends branch from 58b9f38 to d427515 Compare July 26, 2022 16:41
@phsft-bot
Starting build on ROOT-debian10-i386/soversion, ROOT-performance-centos8-multicore/cxx17, ROOT-ubuntu18.04/nortcxxmod, ROOT-ubuntu2004/python3, mac1015/cxx17, mac11/cxx14, windows10/cxx14
How to customize builds

@vepadulano vepadulano merged commit ed38eb8 into root-project:master Jul 26, 2022
@phsft-bot
Build failed on mac1015/cxx17.
Running on macitois21.dyndns.cern.ch:/Users/sftnight/build/workspace/root-pullrequests-build
See console output.

Failing tests:

Development

Successfully merging this pull request may close these issues.

[DF] Wrong entries are loaded from friend trees with distributed RDF

4 participants