[DF] Make Aliases specific to the computation graph branch#9496
[DF] Make Aliases specific to the computation graph branch#9496eguiraud merged 6 commits intoroot-project:masterfrom
Conversation
|
Starting build on |
|
Build failed on ROOT-debian10-i386/cxx14. Errors:
|
Axel-Naumann
left a comment
There was a problem hiding this comment.
LGTM, just minor comments. Very nice improvement: it's indeed a more consistent, simpler design now! I am a bit worried that the pair HasAlias() / ResolveAlias() has a perf impact because the lookup needs to happen twice - but so far I see that mostly for errors where it shouldn't matter. An alternative might be to have ResolveAlias() return an empty string, if that can never be a valid column name, or to have it return a pair of "it's a valid alias, this is the aliased name".
|
Starting build on |
|
Starting build on |
|
Build failed on mac1015/python3. Failing tests: |
As there is one RLoopManager per computation graph, when aliases were managed by RLoopManager they were computation-graph-wide. It is desirable to make Alias definitions behave coherently with Defines instead, i.e. have Aliases be only accessible in the branch of the computation graph in which they were defined, and only in nodes that are downstream of the one where the alias is added. This resolves root-project#7381, "[DF] Let Aliases be defined per computation graph branch, not globally". In particular, in this commit: - move alias management from RLoopManager to RBookedColumns - remove alias-managing logic from RLoopManager - refactor several functions so they only take RBookedColumns as input rather than a list of defined names plus a map of aliases (CheckForDefinition, CheckForRedefinition, FindUnknownColumns, GetValidatedColumnNames, BookFilterJit, FindUsedColumns, ParseRDFExpression) - use RBookedColumns::ResolveAlias instead of ResolveAlias helper function - adapt dataframe_utils test to the new signature of FindUnknownColumns Note that this is a backward-incompatible change for user code that relied on this global definition of Aliases. We expect this to be an exceedingly rare usecase, and if such code exists it will produce a clear error message.
RBookedColumns has a better API now, and client code does not need to "manually" add names to its internal list.
It used to be impossible to Define (or Redefine, or Alias) a column that had already been Aliased from anywhere in the computation graph. Now that Aliases are specific to a branch of the computation graph, we can add a certain Alias in one branch and still Define a column with the same name in another.
d2c02c5 to
27c0fb5
Compare
|
Starting build on |
As there is one RLoopManager per computation graph, when aliases
were managed by RLoopManager they were computation-graph-wide.
It is desirable to make Alias definitions behave coherently with
Defines instead, i.e. have Aliases be only accessible in the branch
of the computation graph in which they were defined, and only in
nodes that are downstream of the one where the alias is added.
This resolves #7381, "[DF] Let Aliases be defined per computation graph
branch, not globally".
In particular, in this commit:
rather than a list of defined names plus a map of aliases
(CheckForDefinition, CheckForRedefinition, FindUnknownColumns,
GetValidatedColumnNames, BookFilterJit, FindUsedColumns,
ParseRDFExpression)
Note that this is a backward-incompatible change for user code that
relied on this global definition of Aliases. We expect this to be an
exceedingly rare usecase, and if such code exists it will produce a
clear error message.