[ntuple] Make cluster entries and elements per page configurable#8113
[ntuple] Make cluster entries and elements per page configurable#8113jblomer merged 6 commits intoroot-project:masterfrom
Conversation
|
Starting build on |
|
Build failed on mac11.0/cxx17. Warnings:
And 1476 more Failing tests:
|
jalopezg-git
left a comment
There was a problem hiding this comment.
Thanks! I have just left an small comment. :-)
| value.GetField()->Append(value); | ||
| } | ||
| fNEntries++; | ||
| if ((fNEntries % fClusterSizeEntries) == 0) |
There was a problem hiding this comment.
Maybe the fClusterSizeEntries data member is unused now? I would either remove it or initialize it in the RNTupleWriter constructor using the value of fSink->GetWriteOptions().GetNClusterEntries().
| : fSink(std::move(sink)) | ||
| , fModel(std::move(model)) | ||
| , fMetrics("RNTupleWriter") | ||
| , fClusterSizeEntries(kDefaultClusterSizeEntries) |
|
Starting build on |
|
Future extension: add soft limits to cluster memory sizes, since we don't want to have a case where a single entry is 1MB and the number of cluster entries is the default 64K. |
|
Build failed on mac11.0/cxx17. Warnings:
And 1497 more Failing tests:
|
jblomer
left a comment
There was a problem hiding this comment.
Very useful! Thanks a lot!
|
|
||
| bool GetUseBufferedWrite() const { return fUseBufferedWrite; } | ||
| void SetUseBufferedWrite(bool val) { fUseBufferedWrite = val; } | ||
| NTupleSize_t GetNClusterEntries() const { return fNClusterEntries; } |
There was a problem hiding this comment.
Let's rename to GetNEntriesPerCluster so that it is consistent with GetNElementsPerPage (also for the setter and the data member)
| } | ||
|
|
||
| auto ntuple = RNTupleReader::Open("ntuple", fileGuard.GetPath()); | ||
| auto col0_pages = ntuple->GetDescriptor().GetClusterDescriptor(0).GetPageRange(0).Clone(); |
There was a problem hiding this comment.
Couldn't this be a const auto & to avoid the cloning?
There was a problem hiding this comment.
I think so (changed in 557854b). I know similar style is used throughout the other tutorials too but for whatever reason on Friday I was being paranoid about dangling references with so many chained calls (ubsan misses easy mistakes here too). On second thought, I think we're OK because every method returns a const reference, which in turn points to a member of the ntuple's descriptor which is always in scope.
|
@phsft-bot build with flags -DCTEST_TEST_EXCLUDE_NONE=On |
|
Starting build on |
We were using Clone to avoid any possibility of a dangling reference, but in this case we are OK as everything in the call chain returns a const ref to a member of the ntuple's descriptor.
|
Build failed on ROOT-ubuntu16/nortcxxmod. Failing tests: |
|
Build failed on windows10/cxx14. Errors:
And 28 more |
|
Starting build on |
|
Build failed on mac11.0/cxx17. Warnings:
And 1497 more Failing tests:
|
Allow users to adjust the number of entries per cluster and the number of elements per page using
RNTupleWriteOptions.I also took the opportunity to fix some whitespace errors in
RNTupleOptions.hxx. Fixes #7853.Usage (see tests as well):
As was pointed out in #7112 (comment), there are some
NElementsPerPageinputs that could cause compression problems (i.e. those where the total page memory is larger than0xffffff). Should we have some error checking at theRNTupleWriteOptionslevel? Or maybe this should be considered a bug on the compression side of things.cc @jalopezg-r00t