Skip to content

Possible way of supporting multiple files with same name with TEntryList#8660

Merged
vepadulano merged 3 commits intoroot-project:masterfrom
vepadulano:tentrylist-addsublist
Jul 26, 2021
Merged

Possible way of supporting multiple files with same name with TEntryList#8660
vepadulano merged 3 commits intoroot-project:masterfrom
vepadulano:tentrylist-addsublist

Conversation

@vepadulano
Copy link
Copy Markdown
Member

@vepadulano vepadulano commented Jul 13, 2021

This PR shows the ingredients needed to provide a fix (or more like a workaround) to #8505

  • A new function TEntryList::AddSubList which purpose is specifically of adding a sub list to the main TEntryList. With this we can avoid the logic of TEntryList::Add that merges multiple TEntryList referring to the same tree into a global list.
  • A slight modification in TChain::SetEntryList to grab the sub-TEntryList with the same index as the current sub-tree in the chain. Currently, this is done instead with GetEntryList(treename, filename, opt) that will always report the same sub-TEntryList referring to the "first" file (because all files of the sublists of the global TEntryList have the same name).

The changes are not definitive at all. I wanted to ask your opinion if it's a valid way forward. If so, I will polish the code and prepare it for a real PR.

With this commit, the following snippet

int main(){
    auto start_1{0};
    auto start_2{0};
    auto end_1{20};
    auto end_2{10};

    TEntryList elists;
    TEntryList elist1{"e","e","entries","file_20entries_1.root"};
    TEntryList elist2{"e","e","entries","file_20entries_1.root"};

    for(auto entry = start_1; entry < end_1; entry++){
        elist1.Enter(entry);
    }

    for(auto entry = start_2; entry < end_2; entry++){
        elist2.Enter(entry);
    }

    elists.AddSubList(&elist1);
    elists.AddSubList(&elist2);

    TChain chain{"entries"};
    chain.Add("file_20entries_1.root");
    chain.Add("file_20entries_1.root");

    chain.SetEntryList(&elists, "sync");

    ROOT::RDataFrame df{chain};

    std::cout << df.Count().GetValue() << "\n";
}

Outputs the correct result:

$: ./tentrylist_emptysource_addsublist 
30

@vepadulano vepadulano requested review from eguiraud and pcanal July 13, 2021 13:28
@vepadulano vepadulano self-assigned this Jul 13, 2021
@vepadulano vepadulano changed the title Possible way of supporting multiple files with same name with TEntryList [skip-ci]Possible way of supporting multiple files with same name with TEntryList Jul 13, 2021
@root-project root-project deleted a comment from phsft-bot Jul 13, 2021
@vepadulano
Copy link
Copy Markdown
Member Author

The latest commit reintroduces the Option_t* parameter in TChain::SetEntryList and adds another possible value for it "sync". With this new option, we can support the usecase described in this PR while still relying on the preexisting default behaviour. The option assumes that there is a 1:1 mapping between the trees in the chain and the sub lists in the entry list. This assumption is verified in two ways:

  1. The number of sub lists is equal to the number of trees in the chain. If not, the following error is thrown
terminate called after throwing an instance of 'std::runtime_error'
  what():  In 'TChain::SetEntryList': the number of sub entry lists in the input TEntryList (1) is not equal to the number of files in the chain (2)
Aborted (core dumped)
  1. That each sub list corresponds to the correct treename, filename at the same index in the chain. If not, the following error is thrown:
terminate called after throwing an instance of 'std::runtime_error'
  what():  In 'TChain::SetEntryList': the sub entry list at index 1 doesn't correspond to treename 'entries' and filename 'file_20entries_1.root': it has treename 'entries' and filename 'file_20entries_2.root'
Aborted (core dumped)

@vepadulano vepadulano changed the title [skip-ci]Possible way of supporting multiple files with same name with TEntryList Possible way of supporting multiple files with same name with TEntryList Jul 15, 2021
@phsft-bot
Copy link
Copy Markdown

Starting build on ROOT-debian10-i386/cxx14, ROOT-performance-centos8-multicore/default, ROOT-ubuntu16/nortcxxmod, mac1014/python3, mac11.0/cxx17, windows10/cxx14
How to customize builds

@phsft-bot
Copy link
Copy Markdown

Build failed on ROOT-ubuntu16/nortcxxmod.
Running on sft-ubuntu-1604-1.cern.ch:/build/workspace/root-pullrequests-build
See console output.

Failing tests:

@phsft-bot
Copy link
Copy Markdown

Build failed on ROOT-performance-centos8-multicore/default.
Running on olbdw-01.cern.ch:/data/sftnight/workspace/root-pullrequests-build
See console output.

Failing tests:

@phsft-bot
Copy link
Copy Markdown

Build failed on mac1014/python3.
Running on macitois21.dyndns.cern.ch:/Users/sftnight/build/workspace/root-pullrequests-build
See console output.

Failing tests:

@phsft-bot
Copy link
Copy Markdown

Build failed on mac11.0/cxx17.
Running on macphsft20.dyndns.cern.ch:/Users/sftnight/build/workspace/root-pullrequests-build
See console output.

Failing tests:

@phsft-bot
Copy link
Copy Markdown

Build failed on ROOT-debian10-i386/cxx14.
Running on pcepsft11.dyndns.cern.ch:/home/sftnight/build/workspace/root-pullrequests-build
See console output.

Failing tests:

@vepadulano vepadulano marked this pull request as ready for review July 17, 2021 07:23
@phsft-bot
Copy link
Copy Markdown

Starting build on ROOT-debian10-i386/cxx14, ROOT-performance-centos8-multicore/default, ROOT-ubuntu16/nortcxxmod, mac1014/python3, mac11.0/cxx17, windows10/cxx14
How to customize builds

@phsft-bot
Copy link
Copy Markdown

Build failed on ROOT-debian10-i386/cxx14.
Running on pcepsft10.dyndns.cern.ch:/build/workspace/root-pullrequests-build
See console output.

Errors:

  • [2021-07-17T07:25:39.750Z] stderr: error: could not read '.git/rebase-apply/head-name': No such file or directory

Failing tests:

@pcanal
Copy link
Copy Markdown
Member

pcanal commented Jul 19, 2021

With this new option, we can support the usecase described in this PR while still relying on the preexisting default behaviour.

The new scheme indeed seems like a reasonable solution.

@vepadulano vepadulano requested a review from bellenot as a code owner July 20, 2021 13:01
@phsft-bot
Copy link
Copy Markdown

Starting build on ROOT-debian10-i386/cxx14, ROOT-performance-centos8-multicore/default, ROOT-ubuntu16/nortcxxmod, mac1014/python3, mac11.0/cxx17, windows10/cxx14
How to customize builds

@vepadulano
Copy link
Copy Markdown
Member Author

The new scheme indeed seems like a reasonable solution.

Thanks Philippe! I've uploaded a few tests. I'd like to have also @eguiraud 's opinion

Copy link
Copy Markdown
Contributor

@eguiraud eguiraud left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the AddSubList idea. With that, what is something that still goes wrong unless "sync" is also used?

fLists = new TList();
}
fLists->Add(elistcopy);
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this logic is much simpler that what is done in Add. One thing I see it's missing is handling the case in which elist has sublists (totally ok on my side, maybe it warrants a mention in the method description).

@pcanal do we need to take care of something else that Add takes care of?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the current signature there is nothing more to do. The complexity in the regular case is the "or merge with existing" part. Since here, by contract, it is just an add, we are done. (One would need to reintroduce that complexity, if this routine was 'AddAt' to merge with an existing entry list as the same index.

@phsft-bot
Copy link
Copy Markdown

Build failed on ROOT-debian10-i386/cxx14.
Running on pcepsft11.dyndns.cern.ch:/home/sftnight/build/workspace/root-pullrequests-build
See console output.

Failing tests:

@eguiraud
Copy link
Copy Markdown
Contributor

eguiraud commented Jul 21, 2021

As per our discussion:

  • add an \attention or similar to TEntryList::Add's docs to tell users to use AddSublist instead if they want multiple sublists for the same treename and filename

@phsft-bot
Copy link
Copy Markdown

Starting build on ROOT-debian10-i386/cxx14, ROOT-performance-centos8-multicore/default, ROOT-ubuntu16/nortcxxmod, mac1014/python3, mac11.0/cxx17, windows10/cxx14
How to customize builds

@phsft-bot
Copy link
Copy Markdown

Build failed on ROOT-debian10-i386/cxx14.
Running on pcepsft10.dyndns.cern.ch:/build/workspace/root-pullrequests-build
See console output.

Errors:

  • [2021-07-21T09:13:10.090Z] stderr: error: could not read '.git/rebase-apply/head-name': No such file or directory

Failing tests:

Copy link
Copy Markdown
Member

@bellenot bellenot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please look for the No newline at end of file warnings. Thanks

@phsft-bot
Copy link
Copy Markdown

Starting build on ROOT-debian10-i386/cxx14, ROOT-performance-centos8-multicore/default, ROOT-ubuntu16/nortcxxmod, mac1014/python3, mac11.0/cxx17, windows10/cxx14
How to customize builds

@phsft-bot
Copy link
Copy Markdown

Build failed on ROOT-debian10-i386/cxx14.
Running on pcepsft10.dyndns.cern.ch:/build/workspace/root-pullrequests-build
See console output.

Errors:

  • [2021-07-21T12:44:39.245Z] stderr: error: could not read '.git/rebase-apply/head-name': No such file or directory

Failing tests:

TEntryList does not support the case of a TChain that holds two or more trees with same name and filename. This is due to two reasons:

1. TEntryList::Add merges the input TEntryList in the current one if they share the same treename and filename, instead of adding the input as a sub entry list (which would be the intended behaviour with TChain).
2. TChain::SetEntryList ultimately calls TEntryList::GetEntryList(treename, filename) to retrieve the correct sub entry list for the current tree in the chain. This means that two trees with same name would retrieve the same TEntryList thus leading to wrong reads.

This commit enables support for such usecases, which can be especially useful in test and benchmark scenarios, by addressing the two issues above:
1. The new method TEntryList::AddSubList allows to add a TEntryList to another one directly as an element of the fLists data member.
2. A new option has been added to TChain::SetEntryList, called "sync", such that the TChain will go through the TEntryList in lockstep with the trees in the chain rather than performing a lookup based on treename and filename.

Both need to be activated in order for the use case to work properly. Here is an example ( The file "file_20_entries.root" holds a tree with name "entries" and 20 entries in total). The expected output is for RDataFrame to read 30 entries (20 from the first tree and 10 from the second).

```
int main(){
    auto start_1{0};
    auto start_2{0};
    auto end_1{20};
    auto end_2{10};

    TEntryList elists;
    TEntryList elist1{"","","entries","file_20entries.root"};
    TEntryList elist2{"","","entries","file_20entries.root"};

    for(auto entry = start_1; entry < end_1; entry++){
        elist1.Enter(entry);
    }

    for(auto entry = start_2; entry < end_2; entry++){
        elist2.Enter(entry);
    }

    elists.AddSubList(&elist1);
    elists.AddSubList(&elist2);

    TChain chain{"entries"};
    chain.Add("file_20entries.root");
    chain.Add("file_20entries.root");

    chain.SetEntryList(&elists, "sync");

    ROOT::RDataFrame df{chain};

    std::cout << df.Count().GetValue() << "\n";
}
```
@vepadulano vepadulano force-pushed the tentrylist-addsublist branch from cc515f9 to 9e5a201 Compare July 23, 2021 15:28
@phsft-bot
Copy link
Copy Markdown

Starting build on ROOT-debian10-i386/cxx14, ROOT-performance-centos8-multicore/default, ROOT-ubuntu16/nortcxxmod, mac1014/python3, mac11.0/cxx17, windows10/cxx14
How to customize builds

@vepadulano
Copy link
Copy Markdown
Member Author

I squashed and rebased the commits. Will merge if tests are green 👍

@phsft-bot
Copy link
Copy Markdown

Build failed on ROOT-debian10-i386/cxx14.
Running on pcepsft11.dyndns.cern.ch:/home/sftnight/build/workspace/root-pullrequests-build
See console output.

Failing tests:

@vepadulano vepadulano merged commit 6de8aef into root-project:master Jul 26, 2021
@vepadulano vepadulano deleted the tentrylist-addsublist branch October 25, 2021 15:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants