Skip to content

[DF] Make Aliases specific to the computation graph branch#9496

Merged
eguiraud merged 6 commits intoroot-project:masterfrom
eguiraud:per-branch-alias
Jan 10, 2022
Merged

[DF] Make Aliases specific to the computation graph branch#9496
eguiraud merged 6 commits intoroot-project:masterfrom
eguiraud:per-branch-alias

Conversation

@eguiraud
Copy link
Copy Markdown
Contributor

@eguiraud eguiraud commented Jan 5, 2022

As there is one RLoopManager per computation graph, when aliases
were managed by RLoopManager they were computation-graph-wide.
It is desirable to make Alias definitions behave coherently with
Defines instead, i.e. have Aliases be only accessible in the branch
of the computation graph in which they were defined, and only in
nodes that are downstream of the one where the alias is added.

This resolves #7381, "[DF] Let Aliases be defined per computation graph
branch, not globally".

In particular, in this commit:

  • move alias management from RLoopManager to RBookedColumns
  • remove alias-managing logic from RLoopManager
  • refactor several functions so they only take RBookedColumns as input
    rather than a list of defined names plus a map of aliases
    (CheckForDefinition, CheckForRedefinition, FindUnknownColumns,
    GetValidatedColumnNames, BookFilterJit, FindUsedColumns,
    ParseRDFExpression)
  • use RBookedColumns::ResolveAlias instead of ResolveAlias helper function
  • adapt dataframe_utils test to the new signature of FindUnknownColumns

Note that this is a backward-incompatible change for user code that
relied on this global definition of Aliases. We expect this to be an
exceedingly rare usecase, and if such code exists it will produce a
clear error message.

@eguiraud eguiraud requested a review from Axel-Naumann January 5, 2022 12:36
@eguiraud eguiraud self-assigned this Jan 5, 2022
@phsft-bot
Copy link
Copy Markdown

Starting build on ROOT-debian10-i386/cxx14, ROOT-performance-centos8-multicore/default, ROOT-ubuntu16/nortcxxmod, ROOT-ubuntu2004/soversion, mac1015/python3, mac11/cxx17, windows10/cxx14
How to customize builds

@phsft-bot
Copy link
Copy Markdown

Build failed on ROOT-debian10-i386/cxx14.
Running on pcepsft11.dyndns.cern.ch:/home/sftnight/build/workspace/root-pullrequests-build
See console output.

Errors:

  • [2022-01-05T12:38:31.217Z] CMake Error at /home/sftnight/build/workspace/root-pullrequests-build/rootspi/jenkins/root-build.cmake:1064 (message):

Copy link
Copy Markdown
Member

@Axel-Naumann Axel-Naumann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, just minor comments. Very nice improvement: it's indeed a more consistent, simpler design now! I am a bit worried that the pair HasAlias() / ResolveAlias() has a perf impact because the lookup needs to happen twice - but so far I see that mostly for errors where it shouldn't matter. An alternative might be to have ResolveAlias() return an empty string, if that can never be a valid column name, or to have it return a pair of "it's a valid alias, this is the aliased name".

@phsft-bot
Copy link
Copy Markdown

Starting build on ROOT-debian10-i386/cxx14, ROOT-performance-centos8-multicore/default, ROOT-ubuntu16/nortcxxmod, ROOT-ubuntu2004/soversion, mac1015/python3, mac11/cxx17, windows10/cxx14
How to customize builds

@phsft-bot
Copy link
Copy Markdown

Starting build on ROOT-debian10-i386/cxx14, ROOT-performance-centos8-multicore/default, ROOT-ubuntu16/nortcxxmod, ROOT-ubuntu2004/soversion, mac1015/python3, mac11/cxx17, windows10/cxx14
How to customize builds

@phsft-bot
Copy link
Copy Markdown

Build failed on mac1015/python3.
Running on macitois19.dyndns.cern.ch:/Users/sftnight/build/workspace/root-pullrequests-build
See console output.

Failing tests:

As there is one RLoopManager per computation graph, when aliases
were managed by RLoopManager they were computation-graph-wide.
It is desirable to make Alias definitions behave coherently with
Defines instead, i.e. have Aliases be only accessible in the branch
of the computation graph in which they were defined, and only in
nodes that are downstream of the one where the alias is added.

This resolves root-project#7381, "[DF] Let Aliases be defined per computation graph
branch, not globally".

In particular, in this commit:
- move alias management from RLoopManager to RBookedColumns
- remove alias-managing logic from RLoopManager
- refactor several functions so they only take RBookedColumns as input
  rather than a list of defined names plus a map of aliases
  (CheckForDefinition, CheckForRedefinition, FindUnknownColumns,
  GetValidatedColumnNames, BookFilterJit, FindUsedColumns,
  ParseRDFExpression)
- use RBookedColumns::ResolveAlias instead of ResolveAlias helper function
- adapt dataframe_utils test to the new signature of FindUnknownColumns

Note that this is a backward-incompatible change for user code that
relied on this global definition of Aliases. We expect this to be an
exceedingly rare usecase, and if such code exists it will produce a
clear error message.
RBookedColumns has a better API now, and client code does not need
to "manually" add names to its internal list.
It used to be impossible to Define (or Redefine, or Alias) a column
that had already been Aliased from anywhere in the computation graph.

Now that Aliases are specific to a branch of the computation graph,
we can add a certain Alias in one branch and still Define a column
with the same name in another.
@phsft-bot
Copy link
Copy Markdown

Starting build on ROOT-debian10-i386/cxx14, ROOT-performance-centos8-multicore/default, ROOT-ubuntu16/nortcxxmod, ROOT-ubuntu2004/soversion, mac1015/python3, mac11/cxx17, windows10/cxx14
How to customize builds

@eguiraud eguiraud merged commit a887414 into root-project:master Jan 10, 2022
@eguiraud eguiraud deleted the per-branch-alias branch January 10, 2022 08:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[DF] Let Aliases be defined per computation graph branch, not globally

3 participants