Directly store needed attributes in TreeHeadNode#8816
Directly store needed attributes in TreeHeadNode#8816vepadulano merged 2 commits intoroot-project:masterfrom
TreeHeadNode#8816Conversation
|
Starting build on |
| # RDataFrame(tree, defaultBranches = {}) | ||
| self.tree = args[0] | ||
| # Retrieve information about friend trees when user passes a TTree | ||
| # or TChain object. |
There was a problem hiding this comment.
this comment belongs to the previous commit
| hn = create_dummy_headnode(tree) | ||
|
|
||
| self.assertEqual(hn.get_num_entries(), 4) | ||
| self.assertEqual(hn.tree.GetEntries(), 4) |
There was a problem hiding this comment.
what are you testing here?
There was a problem hiding this comment.
This and the above tests in class NumEntriesTest were previously testing that the method get_num_entries of TreeHeadNode was returning the correct amount. Now, they test that the attribute tree of the class is the correct TTree/TChain that we expect by retrieving the number of entries.
There was a problem hiding this comment.
I may agree that it's not a super useful test, but I didn't want to remove it as a first step
|
This pull request introduces 1 alert when merging 27f893d into 68c7ac9 - view on LGTM.com new alerts:
|
* Remove the superfluous getters plus properties logic for attributes in TreeHeadNode. * Instead of storing the user arguments and parse them later, parse them directly in the __init__ method and store a `tree` class attribute with a TTree or TChain that will be processed. In general, the `npartitions` and `tree` attributes are always needed to setup the distributed execution. The `defaultbranches` and `friendinfo` attributes are initialized to `None` and changed if some conditions apply: * `defaultbranches` needs to be explicitly passed by the user to the RDataFrame constructor * Information about friend trees is retrieved only if the user passed a `TTree &` to the RDataFrame constructor This commit retrieves and stores the `treename` and `inputfiles` attributes to keep compatibility with the current signature of `DistRDF.Ranges.get_clusters`. In a future commit, this signature may be changed to accept directly a TTree or TChain and thus these two attributes could be removed.
27f893d to
8c64c8a
Compare
|
Starting build on |
|
|
Build failed on ROOT-ubuntu16/nortcxxmod. Errors:
|
|
Build failed on ROOT-debian10-i386/cxx14. Errors:
|
| The head node of a computation graph where the RDataFrame data source is | ||
| a TTree. This head node is responsible for the following RDataFrame constructor:: | ||
| a TTree or a TChain. This head node is responsible for the following | ||
| RDataFrame constructors:: |
There was a problem hiding this comment.
Double colon is how Sphinx parses a code block (doxygen would expect a ~~~{.py}) . It's still here and there in DistRDF, when we'll move to fully utilise doxygen will be changed
This PR addresses the TODO comments to remove the superfluous public getters / properties logic in
TreeHeadNode. Currently, just parses the user arguments in the__init__method and stores the needed attributes. My idea is that in the end we'll only need to storenpartitionsandtree(which are always needed) plusdefaultbranchesandfriendinfoas "optional" class attributes (meaning they are initialized toNoneand changed if some conditions apply).For simplicity, in this PR I still leave
treenameandinputfilesattributes which are needed in theRanges.get_clustersfunction.The next PRs will address:
Ranges.get_clustersto accept aTTreeinstance as argument, pairing it with new C++ functions inROOT::Internal::TreeUtilsto retrieve a vector of clusters (plus some other metadata) depending if it's a TTree or TChainIf the logic for this PR is approved I'll add more docs and commit messages