Skip to content

Fix Display output when friend trees are present#8529

Merged
eguiraud merged 4 commits intoroot-project:masterfrom
eguiraud:fix_display
Jun 30, 2021
Merged

Fix Display output when friend trees are present#8529
eguiraud merged 4 commits intoroot-project:masterfrom
eguiraud:fix_display

Conversation

@eguiraud
Copy link
Copy Markdown
Contributor

Before this patch, friend branches or leaves were listed in the output of GetColumnNames twice, as friendname.bname and as bname. Now we only list the longer version.

This fixes #8450 as a side-effect.

This PR also adds a test and improves the related docs.

Rationale for the change in behavior of GetColumnNames: currently, in RDataFrame we have (fairly convoluted) logic to retrieve 3 different lists of branch names given a TTree/TChain:

  1. a user-readable of list of available column names returned by df.GetColumnNames()
  2. a larger list of all valid spellings for all column names, used to validate user inputs (this includes all names returned by df.GetColumnNames() plus alternative spellings such as branchname.leafname when branchname == leafname, shorthands for friendname.branchname as just branchname, etc.
  3. a list of only top-level branches that we use as the list of branches to Snapshot by default

Before this patch df.GetColumnNames() returned multiple valid spellings for the same friend branch. That can be confusing, so I'd rather (try to) return only one valid spelling for each available branch/leaf. For consistency with Display and to not withhold information from users, among the two valid spellings we always show the "fully qualified friendname.branchname". Users can still use the shorthand "branchname" if it's not ambiguous, as the relevant documentation now points out.
I think this solution is the sweet spot between not being surprising (to users and to RDF developers), being easy to implement without further complicating or completely refactoring the branch-retrieval logic and being somewhat backward-compatible.

eguiraud added 4 commits June 23, 2021 12:46
Before this patch, friend branches or leaves were listed in the output
of `GetColumnNames` twice, as `friendname.bname` and as `bname`.
Now we only list the longer version.

This fixes root-project#8450 ("[DF] Display gets confused by friend columns, prints
them twice") as a side-effect.
@eguiraud eguiraud requested a review from Axel-Naumann June 24, 2021 11:20
@eguiraud eguiraud self-assigned this Jun 24, 2021
@phsft-bot
Copy link
Copy Markdown

Starting build on ROOT-debian10-i386/cxx14, ROOT-performance-centos8-multicore/default, ROOT-ubuntu16/nortcxxmod, mac1014/python3, mac11.0/cxx17, windows10/cxx14
How to customize builds

@phsft-bot
Copy link
Copy Markdown

@eguiraud eguiraud requested a review from etejedor June 30, 2021 09:37

Friend TTrees with a TTreeIndex are supported from ROOT v6.24.
Columns coming from the friend trees can be referred to by their full name, like in the example above,
or the friend tree name can be omitted in case the branch name is not ambiguous (e.g. "MyCol" could be used instead of
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this is already tested elsewhere? The fact that one can refer to a non-ambiguous friend column omitting the friend tree name or not.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes dataframe_friends.cxx tests it

@eguiraud eguiraud merged commit fe0045d into root-project:master Jun 30, 2021
@eguiraud eguiraud deleted the fix_display branch June 30, 2021 14:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[DF] Display gets confused by friend columns, prints them twice

3 participants