Possible way of supporting multiple files with same name with TEntryList#8660
Conversation
|
The latest commit reintroduces the
|
|
Starting build on |
|
Build failed on ROOT-ubuntu16/nortcxxmod. Failing tests: |
|
Build failed on ROOT-performance-centos8-multicore/default. Failing tests: |
|
Build failed on mac1014/python3. Failing tests: |
|
Build failed on mac11.0/cxx17. Failing tests: |
|
Build failed on ROOT-debian10-i386/cxx14. Failing tests: |
|
Starting build on |
|
Build failed on ROOT-debian10-i386/cxx14. Errors:
Failing tests: |
The new scheme indeed seems like a reasonable solution. |
|
Starting build on |
Thanks Philippe! I've uploaded a few tests. I'd like to have also @eguiraud 's opinion |
eguiraud
left a comment
There was a problem hiding this comment.
I like the AddSubList idea. With that, what is something that still goes wrong unless "sync" is also used?
| fLists = new TList(); | ||
| } | ||
| fLists->Add(elistcopy); | ||
| } |
There was a problem hiding this comment.
this logic is much simpler that what is done in Add. One thing I see it's missing is handling the case in which elist has sublists (totally ok on my side, maybe it warrants a mention in the method description).
@pcanal do we need to take care of something else that Add takes care of?
There was a problem hiding this comment.
With the current signature there is nothing more to do. The complexity in the regular case is the "or merge with existing" part. Since here, by contract, it is just an add, we are done. (One would need to reintroduce that complexity, if this routine was 'AddAt' to merge with an existing entry list as the same index.
|
Build failed on ROOT-debian10-i386/cxx14. Failing tests: |
|
As per our discussion:
|
|
Starting build on |
|
Build failed on ROOT-debian10-i386/cxx14. Errors:
Failing tests: |
bellenot
left a comment
There was a problem hiding this comment.
Please look for the No newline at end of file warnings. Thanks
|
Starting build on |
|
Build failed on ROOT-debian10-i386/cxx14. Errors:
Failing tests: |
TEntryList does not support the case of a TChain that holds two or more trees with same name and filename. This is due to two reasons:
1. TEntryList::Add merges the input TEntryList in the current one if they share the same treename and filename, instead of adding the input as a sub entry list (which would be the intended behaviour with TChain).
2. TChain::SetEntryList ultimately calls TEntryList::GetEntryList(treename, filename) to retrieve the correct sub entry list for the current tree in the chain. This means that two trees with same name would retrieve the same TEntryList thus leading to wrong reads.
This commit enables support for such usecases, which can be especially useful in test and benchmark scenarios, by addressing the two issues above:
1. The new method TEntryList::AddSubList allows to add a TEntryList to another one directly as an element of the fLists data member.
2. A new option has been added to TChain::SetEntryList, called "sync", such that the TChain will go through the TEntryList in lockstep with the trees in the chain rather than performing a lookup based on treename and filename.
Both need to be activated in order for the use case to work properly. Here is an example ( The file "file_20_entries.root" holds a tree with name "entries" and 20 entries in total). The expected output is for RDataFrame to read 30 entries (20 from the first tree and 10 from the second).
```
int main(){
auto start_1{0};
auto start_2{0};
auto end_1{20};
auto end_2{10};
TEntryList elists;
TEntryList elist1{"","","entries","file_20entries.root"};
TEntryList elist2{"","","entries","file_20entries.root"};
for(auto entry = start_1; entry < end_1; entry++){
elist1.Enter(entry);
}
for(auto entry = start_2; entry < end_2; entry++){
elist2.Enter(entry);
}
elists.AddSubList(&elist1);
elists.AddSubList(&elist2);
TChain chain{"entries"};
chain.Add("file_20entries.root");
chain.Add("file_20entries.root");
chain.SetEntryList(&elists, "sync");
ROOT::RDataFrame df{chain};
std::cout << df.Count().GetValue() << "\n";
}
```
cc515f9 to
9e5a201
Compare
|
Starting build on |
|
I squashed and rebased the commits. Will merge if tests are green 👍 |
|
Build failed on ROOT-debian10-i386/cxx14. Failing tests: |
This PR shows the ingredients needed to provide a fix (or more like a workaround) to #8505
TEntryList::AddSubListwhich purpose is specifically of adding a sub list to the main TEntryList. With this we can avoid the logic ofTEntryList::Addthat merges multiple TEntryList referring to the same tree into a global list.TChain::SetEntryListto grab the sub-TEntryList with the same index as the current sub-tree in the chain. Currently, this is done instead withGetEntryList(treename, filename, opt)that will always report the same sub-TEntryList referring to the "first" file (because all files of the sublists of the global TEntryList have the same name).The changes are not definitive at all. I wanted to ask your opinion if it's a valid way forward. If so, I will polish the code and prepare it for a real PR.
With this commit, the following snippet
Outputs the correct result: