[ntuple] Overhaul tuning and default settings when writing #8703
jblomer merged 46 commits into root-project:master
Axel-Naumann left a comment:
It would be good to have the PR description as documentation somewhere. Maybe split for cluster and page flushing?
pcanal left a comment:
Approved pending resolution of 2 left over conversations.
This Pull request:
The PR sets new defaults for the cluster size and page size of RNTuple. The defaults should work well in the majority of cases but can be adjusted if needed. The idea is to give target sizes for clusters and pages, measured in bytes; RNTuple will try to make good decisions and approximate the target sizes. The PR replaces the previous defaults, which were given in number of entries for the cluster size and in number of elements for the page size.
Changes or fixes:
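The PR sets three new defaults: the target size of a compressed cluster, the maximum size of an uncompressed cluster, and the target size of a page.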
Given the three settings, writing works as follows: when the current cluster is larger than the maximum uncompressed size, it is flushed unconditionally. It is also flushed when its estimated compressed size reaches the target compressed cluster size. The estimated compression ratio for the first cluster is 0.5 if compression is used, and 1 otherwise. Each following cluster uses the compression ratio of the previous cluster as its estimate.
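The cluster flushing rule can be sketched as follows. This is a minimal illustration, not the RNTuple implementation: `ClusterFlushPolicy` and its members are hypothetical names, and the compression ratio is assumed to mean compressed size divided by uncompressed size.

```cpp
#include <cstddef>

// Hypothetical helper (not part of the RNTuple API): decides when the
// cluster that is currently being filled should be flushed.
class ClusterFlushPolicy {
public:
   ClusterFlushPolicy(std::size_t targetCompressedSize, std::size_t maxUncompressedSize, bool useCompression)
      : fTargetCompressedSize(targetCompressedSize),
        fMaxUncompressedSize(maxUncompressedSize),
        // First cluster: assume a ratio of 0.5 with compression, 1 without.
        fRatioEstimate(useCompression ? 0.5 : 1.0)
   {
   }

   // Returns true if a cluster holding `uncompressedSize` bytes should be flushed now.
   bool ShouldFlush(std::size_t uncompressedSize) const
   {
      if (uncompressedSize > fMaxUncompressedSize)
         return true; // hard limit: flush unconditionally
      // Estimated compressed size = uncompressed size * estimated ratio.
      const double estimatedCompressed = static_cast<double>(uncompressedSize) * fRatioEstimate;
      return estimatedCompressed >= static_cast<double>(fTargetCompressedSize);
   }

   // After a flush, the measured ratio becomes the estimate for the next cluster.
   void OnClusterFlushed(std::size_t uncompressedSize, std::size_t compressedSize)
   {
      if (uncompressedSize > 0)
         fRatioEstimate = static_cast<double>(compressedSize) / static_cast<double>(uncompressedSize);
   }

   double RatioEstimate() const { return fRatioEstimate; }

private:
   std::size_t fTargetCompressedSize;
   std::size_t fMaxUncompressedSize;
   double fRatioEstimate;
};
```

With a target of 50 bytes and the initial estimate of 0.5, this sketch flushes the first cluster at 100 uncompressed bytes. If that cluster actually compressed to 25 bytes, the next one would be flushed at 200 uncompressed bytes.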
Pages are filled up to the target size and then flushed. If a column has enough elements to fill at least half a page, a mechanism prevents undersized tail pages: writing alternates between two page buffers and flushes the previous buffer only once the next buffer is at least 50% full. If the cluster gets flushed with an undersized tail page, the small page is appended to the previous page before flushing. Therefore, tail page sizes are between
[0.5 * target size .. 1.5 * target size].
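The two-buffer scheme can be illustrated with a small simulation that tracks only byte counts. This is a sketch under simplifying assumptions, not the actual page writing code: `PageWriterSketch` is a hypothetical name, a single column is modelled, and the target size must be greater than zero.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical simulation (not the RNTuple code): records the sizes of the
// pages that would be flushed for one column when two buffers are used in turns.
class PageWriterSketch {
public:
   // targetSize must be greater than zero.
   explicit PageWriterSketch(std::size_t targetSize) : fTarget(targetSize) {}

   // Append nBytes of column data to the buffer that is currently being filled.
   void Append(std::size_t nBytes)
   {
      while (nBytes > 0) {
         const std::size_t n = std::min(nBytes, fTarget - fCurrent);
         fCurrent += n;
         nBytes -= n;
         // The held-back page is released once the current buffer is at least half full.
         if (fPending > 0 && 2 * fCurrent >= fTarget) {
            fFlushed.push_back(fPending);
            fPending = 0;
         }
         // A full buffer is held back and the other buffer takes over.
         if (fCurrent == fTarget) {
            fPending = fCurrent;
            fCurrent = 0;
         }
      }
   }

   // End of cluster: an undersized tail is merged into the held-back page.
   void FlushCluster()
   {
      if (fPending > 0)
         fFlushed.push_back(fPending + fCurrent); // current buffer is below 50% here
      else if (fCurrent > 0)
         fFlushed.push_back(fCurrent);
      fPending = 0;
      fCurrent = 0;
   }

   const std::vector<std::size_t> &FlushedPages() const { return fFlushed; }

private:
   std::size_t fTarget;
   std::size_t fPending = 0; // size of the full page that is held back (0 = none)
   std::size_t fCurrent = 0; // bytes in the buffer that is being filled
   std::vector<std::size_t> fFlushed;
};
```

With a target of 100 bytes, writing 230 bytes and then flushing the cluster yields pages of 100 and 130 bytes in this sketch, while writing 260 bytes yields pages of 100, 100 and 60 bytes. In both cases the tail page stays within the stated range.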