Skip to content

[DF] Fix snapshot of array branch with out-of-order size branch#10934

Merged
eguiraud merged 7 commits intoroot-project:masterfrom
eguiraud:fix-snapshot-size-columns
Jul 24, 2022
Merged

[DF] Fix snapshot of array branch with out-of-order size branch#10934
eguiraud merged 7 commits intoroot-project:masterfrom
eguiraud:fix-snapshot-size-columns

Conversation

@eguiraud
Copy link
Copy Markdown
Contributor

@eguiraud eguiraud commented Jul 8, 2022

This PR fixes #10920 and #6932 (see those for more information).

@eguiraud eguiraud self-assigned this Jul 8, 2022
@phsft-bot
Copy link
Copy Markdown

Starting build on ROOT-debian10-i386/soversion, ROOT-performance-centos8-multicore/cxx17, ROOT-ubuntu18.04/nortcxxmod, ROOT-ubuntu2004/python3, mac1015/cxx17, mac11/cxx14, windows10/cxx14
How to customize builds

@phsft-bot
Copy link
Copy Markdown

Starting build on ROOT-debian10-i386/soversion, ROOT-performance-centos8-multicore/cxx17, ROOT-ubuntu18.04/nortcxxmod, ROOT-ubuntu2004/python3, mac1015/cxx17, mac11/cxx14, windows10/cxx14
How to customize builds

eguiraud added 5 commits July 20, 2022 12:29
Sometimes, when users specify an explicit list of columns to be
written out by a Snapshot, they might forget to list the names of
branches the store the sizes of array branches (e.g. `sz` for a
branch `x` with leaflist `x[sz]/F`).

We now explicitly check for this case and provide a (hopefully)
helpful error message.
Even when users do not specify them explicitly, we now add size
branches required by array branches to the list of branches that
will be written out by a Snapshot.
@eguiraud eguiraud force-pushed the fix-snapshot-size-columns branch from ee61235 to 1b9e9ca Compare July 20, 2022 13:09
@phsft-bot
Copy link
Copy Markdown

Starting build on ROOT-debian10-i386/soversion, ROOT-performance-centos8-multicore/cxx17, ROOT-ubuntu18.04/nortcxxmod, ROOT-ubuntu2004/python3, mac1015/cxx17, mac11/cxx14, windows10/cxx14
How to customize builds

@eguiraud
Copy link
Copy Markdown
Contributor Author

@vepadulano @Axel-Naumann with the last commit the situation is now the following:

explicit column list passed, a necessary size branch was omitted:

  • compiled Snapshot throws an exception with a (hopefully) clear error message
  • jitted Snapshot silently adds the required size branches

explicit column list passed, a necessary size branch is listed after the branch that needs it:

  • compiled and jitted Snapshot both work

no explicit column list passed:

  • this is only possible with jitted Snapshots and this now works despite the fact that Snapshot reorders the column names in alphabetical order (which might put a size branch after the branch that needs it). In a subsequent PR I'll try to go back to Snapshot using the same ordering for the output columns as the ordering of the input columns -- it's not super straightforward.

@eguiraud eguiraud added this to the 6.26/06 milestone Jul 20, 2022
@phsft-bot
Copy link
Copy Markdown

Build failed on ROOT-debian10-i386/soversion.
Running on pcepsft10.dyndns.cern.ch:/build/workspace/root-pullrequests-build
See console output.

Failing tests:

@phsft-bot
Copy link
Copy Markdown

Build failed on mac11/cxx14.
Running on macphsft23.dyndns.cern.ch:/Users/sftnight/build/workspace/root-pullrequests-build
See console output.

Failing tests:

@phsft-bot
Copy link
Copy Markdown

Build failed on windows10/cxx14.
Running on null:C:\build\workspace\root-pullrequests-build
See console output.

Failing tests:

@phsft-bot
Copy link
Copy Markdown

Build failed on mac1015/cxx17.
Running on macitois22.dyndns.cern.ch:/Users/sftnight/build/workspace/root-pullrequests-build
See console output.

Failing tests:

Copy link
Copy Markdown
Member

@Axel-Naumann Axel-Naumann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

const auto counterStr =
leaf->GetLeafCount() ? std::string(leaf->GetLeafCount()->GetName()) : std::to_string(leaf->GetLenStatic());
auto *sizeLeaf = leaf->GetLeafCount();
const auto sizeLeafName = sizeLeaf ? std::string(sizeLeaf->GetName()) : std::to_string(leaf->GetLenStatic());
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI, for a leaf [4][4], leaf->GetLenStatic() returns 16. I guess that's good?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I...don't know, we don't have test coverage for 2-dimensional arrays in RDF

continue; // not a variable-sized array or the size branch is already there, nothing to do

// otherwise we must insert the size in colsWithoutAliases _and_ colsWithAliases
colsWithoutAliases.insert(colsWithoutAliases.begin() + i, countLeaf->GetName());
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This means you insert the size column for branch X right before branch X, and then you check that branch X again in the next for loop iteration. Is that intentional?

Copy link
Copy Markdown
Contributor Author

@eguiraud eguiraud Jul 22, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah thank you , I guess we can also increment the counter to avoid the redundant work, will do

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done


// make input tree
{
TFile f(inFile, "recreate");
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to use in-memory trees or TMemFile for these kind of tests?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes in principle it makes sense, in practice I don't know if it works e.g. with TTreeProcessorMT (which has to re-open the file in each thread). I noted it down to try and convert all our RDF tests to use TMemFiles whenever possible, but I'd address this separately.

Co-authored-by: Axel Naumann <[email protected]>
@phsft-bot
Copy link
Copy Markdown

Starting build on ROOT-debian10-i386/soversion, ROOT-performance-centos8-multicore/cxx17, ROOT-ubuntu18.04/nortcxxmod, ROOT-ubuntu2004/python3, mac1015/cxx17, mac11/cxx14, windows10/cxx14
How to customize builds

@phsft-bot
Copy link
Copy Markdown

Starting build on ROOT-debian10-i386/soversion, ROOT-performance-centos8-multicore/cxx17, ROOT-ubuntu18.04/nortcxxmod, ROOT-ubuntu2004/python3, mac1015/cxx17, mac11/cxx14, windows10/cxx14
How to customize builds

@phsft-bot
Copy link
Copy Markdown

Build failed on ROOT-performance-centos8-multicore/cxx17.
Running on olbdw-01.cern.ch:/data/sftnight/workspace/root-pullrequests-build
See console output.

Failing tests:

@phsft-bot
Copy link
Copy Markdown

Build failed on ROOT-ubuntu2004/python3.
Running on root-ubuntu-2004-1.cern.ch:/home/sftnight/build/workspace/root-pullrequests-build
See console output.

Failing tests:

@eguiraud eguiraud merged commit 56c28ff into root-project:master Jul 24, 2022
@eguiraud eguiraud deleted the fix-snapshot-size-columns branch July 24, 2022 03:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[DF] RDataFrame confused by array variables in 6.26/04

4 participants