
[ntuple] Overhaul tuning and default settings when writing #8703

Merged
jblomer merged 46 commits into root-project:master from jblomer:ntuple-write-autotuning
Sep 21, 2021

Conversation

@jblomer
Contributor

@jblomer jblomer commented Jul 20, 2021

This Pull request:

The PR sets new defaults for the cluster size and page size of RNTuple. The defaults should work well in most cases but can be adjusted if needed. The idea is to specify target sizes for clusters and pages in bytes; RNTuple tries to make good decisions and approximate these targets. This replaces the previous defaults, which gave the cluster size as a number of entries and the page size as a number of elements.

Changes or fixes:

The PR sets three new defaults:

  • Target size for compressed clusters of 50 MB. In general, larger clusters provide room for more and larger pages and should improve compression ratio and speed. However, clusters also need to be buffered during writing and (partially) during reading, so larger clusters increase the memory footprint.
  • Maximum size for uncompressed clusters of 512 MiB. This prevents highly compressible clusters from growing too large, which is mostly a problem when writing.
  • Target size for uncompressed pages of 64KiB. In general, larger pages give better compression ratios. Smaller pages, however, reduce the memory footprint. When reading, every active column requires at least one page buffer. For the number of read requests, the page size does not matter because pages of the same column are written consecutively and therefore read in one go.
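For reference, the three defaults expressed in bytes, as a minimal C++ sketch. The constant names and the `MinReadPageBufferBytes` helper are hypothetical, not RNTuple identifiers; the helper only captures the lower bound implied by "every active column requires at least one page buffer".

```cpp
#include <cstddef>

// Hypothetical constants mirroring the new defaults; the names are
// illustrative and need not match the identifiers used in ROOT.
constexpr std::size_t kTargetZippedClusterSize = std::size_t{50} * 1000 * 1000;   // 50 MB (decimal)
constexpr std::size_t kMaxUnzippedClusterSize  = std::size_t{512} * 1024 * 1024;  // 512 MiB
constexpr std::size_t kTargetUnzippedPageSize  = std::size_t{64} * 1024;          // 64 KiB

// Lower bound on page-buffer memory when reading: every active column
// needs at least one uncompressed page buffer of (roughly) the page size.
constexpr std::size_t MinReadPageBufferBytes(std::size_t nActiveColumns,
                                             std::size_t pageSize = kTargetUnzippedPageSize)
{
   return nActiveColumns * pageSize;
}
```

For instance, reading 1024 active columns needs at least 1024 × 64 KiB = 64 MiB of page buffers under this bound.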

Given these three settings, writing works as follows: when the current cluster is larger than the maximum uncompressed size, it is flushed unconditionally. When the estimated compressed size of the current cluster (its uncompressed size times the estimated compression ratio) reaches the target compressed cluster size, it is flushed as well. The estimated compression ratio for the first cluster is 0.5 if compression is used, and 1 otherwise. Subsequent clusters use the compression ratio of the previous cluster as the estimate.
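The cluster flushing rule can be sketched as follows. `ClusterFlushPolicy` is a hypothetical class, not part of the RNTuple API, and the ratio is taken here as compressed bytes divided by uncompressed bytes, so that the start values 0.5 and 1 match the description.

```cpp
#include <cassert>
#include <cstddef>

// Sketch of the cluster flushing rule (hypothetical names, not RNTuple API).
class ClusterFlushPolicy {
public:
   ClusterFlushPolicy(std::size_t targetZipped, std::size_t maxUnzipped, bool useCompression)
      : fTargetZipped(targetZipped), fMaxUnzipped(maxUnzipped),
        fRatio(useCompression ? 0.5 : 1.0) // start estimate for the first cluster
   {
      assert(targetZipped > 0 && maxUnzipped > 0);
   }

   // Decide whether the cluster currently being filled should be committed.
   bool ShouldFlush(std::size_t unzippedSize) const
   {
      if (unzippedSize > fMaxUnzipped)
         return true; // hard limit on the uncompressed cluster size
      const double estimatedZipped = static_cast<double>(unzippedSize) * fRatio;
      return estimatedZipped >= static_cast<double>(fTargetZipped);
   }

   // After a commit, the next cluster uses the last cluster's ratio
   // (compressed bytes / uncompressed bytes).
   void OnClusterCommitted(std::size_t unzippedSize, std::size_t zippedSize)
   {
      if (unzippedSize > 0)
         fRatio = static_cast<double>(zippedSize) / static_cast<double>(unzippedSize);
   }

   double EstimatedRatio() const { return fRatio; }

private:
   std::size_t fTargetZipped;
   std::size_t fMaxUnzipped;
   double fRatio;
};
```

With compression enabled and the 50 MB target, the 0.5 start value means the first cluster is committed at about 100 MB of uncompressed data; a very compressible cluster is instead cut by the 512 MiB limit.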

Pages are filled up to the target size and then flushed. If a column has enough elements to fill at least half a page, a mechanism prevents undersized tail pages: writing alternates between two page buffers and flushes the previous buffer only once the next buffer is at least 50% full. If the cluster is flushed while the tail page is undersized, the small page is appended to the previous page before flushing. Therefore, tail page sizes range from 0.5 to 1.5 times the target page size.
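The two-buffer page scheme can be simulated for a single column within one cluster. This is a simplified model assuming fixed-size elements; `SketchPageSizes` is a hypothetical helper that only tracks buffer fill levels in bytes, not actual page data.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns the sizes (in bytes) of the pages written for one column within one
// cluster, following the two-buffer scheme. Hypothetical helper for illustration.
std::vector<std::size_t> SketchPageSizes(std::size_t nElements, std::size_t elemSize,
                                         std::size_t targetPageSize)
{
   assert(elemSize > 0 && targetPageSize > 0);
   std::vector<std::size_t> pages;
   std::size_t previous = 0; // full buffer held back until `current` is half full (0 = none)
   std::size_t current = 0;  // buffer currently being filled

   for (std::size_t i = 0; i < nElements; ++i) {
      if (current > 0 && current + elemSize > targetPageSize) {
         // `current` reached the target size: hold it and switch buffers.
         // `previous` was already flushed when `current` reached 50%.
         assert(previous == 0);
         previous = current;
         current = 0;
      }
      current += elemSize;
      if (previous > 0 && 2 * current >= targetPageSize) {
         pages.push_back(previous); // next buffer is at least half full: flush the held page
         previous = 0;
      }
   }

   // Cluster commit: a held page implies an undersized tail (< 50%), so merge them.
   if (previous > 0)
      pages.push_back(previous + current);
   else if (current > 0)
      pages.push_back(current);
   return pages;
}
```

With 8-byte elements and a 64 KiB target, 16484 elements yield one full page of 65536 bytes followed by a merged tail page of 66336 bytes, inside the 0.5 to 1.5 bound.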

@jblomer jblomer self-assigned this Jul 20, 2021
@jblomer jblomer requested a review from pcanal as a code owner July 20, 2021 06:20
@phsft-bot

Starting build on ROOT-debian10-i386/cxx14, ROOT-performance-centos8-multicore/default, ROOT-ubuntu16/nortcxxmod, mac1014/python3, mac11.0/cxx17, windows10/cxx14

@jblomer jblomer requested review from Axel-Naumann and eguiraud July 20, 2021 06:21
@ghost

ghost commented Jul 20, 2021

DeepCode's analysis on #9dbf29 found:

  • ⚠️ 1 warning 👇

Top issues

Potential nullptr dereference. Null flows from nullptr literal. Consider adding a check.


@phsft-bot

Build failed on ROOT-debian10-i386/cxx14.
Running on pcepsft10.dyndns.cern.ch:/build/workspace/root-pullrequests-build
See console output.

Errors:

  • [2021-07-20T06:21:55.108Z] stderr: error: could not read '.git/rebase-apply/head-name': No such file or directory

Failing tests:

@phsft-bot

Starting build on ROOT-debian10-i386/cxx14, ROOT-performance-centos8-multicore/default, ROOT-ubuntu16/nortcxxmod, mac1014/python3, mac11.0/cxx17, windows10/cxx14

@phsft-bot

Build failed on ROOT-debian10-i386/cxx14.
Running on pcepsft11.dyndns.cern.ch:/home/sftnight/build/workspace/root-pullrequests-build
See console output.

Failing tests:

Member

@Axel-Naumann Axel-Naumann left a comment


It would be good to have the PR description as documentation somewhere. Maybe split for cluster and page flushing?

@phsft-bot

Starting build on ROOT-debian10-i386/cxx14, ROOT-performance-centos8-multicore/default, ROOT-ubuntu16/nortcxxmod, mac1014/python3, mac11.0/cxx17, windows10/cxx14

@phsft-bot

Starting build on ROOT-debian10-i386/cxx14, ROOT-performance-centos8-multicore/default, ROOT-ubuntu16/nortcxxmod, mac1014/python3, mac11.0/cxx17, windows10/cxx14

@phsft-bot

Starting build on ROOT-debian10-i386/cxx14, ROOT-performance-centos8-multicore/default, ROOT-ubuntu16/nortcxxmod, mac1014/python3, mac11.0/cxx17, windows10/cxx14

@phsft-bot

Build failed on ROOT-debian10-i386/cxx14.
Running on pcepsft10.dyndns.cern.ch:/build/workspace/root-pullrequests-build
See console output.

Errors:

  • [2021-07-22T12:51:42.350Z] stderr: error: could not read '.git/rebase-apply/head-name': No such file or directory

Failing tests:

@phsft-bot

Starting build on ROOT-debian10-i386/cxx14, ROOT-performance-centos8-multicore/default, ROOT-ubuntu16/nortcxxmod, mac1014/python3, mac11.0/cxx17, windows10/cxx14

@phsft-bot

Build failed on windows10/cxx14.
Running on null:C:\build\workspace\root-pullrequests-build
See console output.

Errors:

  • [2021-09-17T07:44:24.243Z] CMake Error at cmake/modules/RootBuildOptions.cmake:407 (message):
  • [2021-09-17T07:44:24.243Z] CMake Error at C:/build/workspace/root-pullrequests-build/rootspi/jenkins/root-build.cmake:1117 (message):

@phsft-bot

Starting build on ROOT-debian10-i386/cxx14, ROOT-performance-centos8-multicore/default, ROOT-ubuntu16/nortcxxmod, mac1014/python3, mac11.0/cxx17, windows10/cxx14

@phsft-bot

Build failed on mac11.0/cxx17.
Running on macphsft20.dyndns.cern.ch:/Users/sftnight/build/workspace/root-pullrequests-build
See console output.

Failing tests:

@phsft-bot

Starting build on ROOT-debian10-i386/cxx14, ROOT-performance-centos8-multicore/default, ROOT-ubuntu16/nortcxxmod, mac1014/python3, mac11.0/cxx17, windows10/cxx14

@phsft-bot

Starting build on ROOT-debian10-i386/cxx14, ROOT-performance-centos8-multicore/default, ROOT-ubuntu16/nortcxxmod, mac1014/python3, mac11.0/cxx17, windows10/cxx14

@phsft-bot

Build failed on windows10/cxx14.
Running on null:C:\build\workspace\root-pullrequests-build
See console output.

Errors:

  • [2021-09-17T10:15:29.154Z] CMake Error at cmake/modules/RootBuildOptions.cmake:407 (message):
  • [2021-09-17T10:15:29.154Z] CMake Error at C:/build/workspace/root-pullrequests-build/rootspi/jenkins/root-build.cmake:1117 (message):

@phsft-bot

Build failed on mac11.0/cxx17.
Running on macphsft20.dyndns.cern.ch:/Users/sftnight/build/workspace/root-pullrequests-build
See console output.

Failing tests:

@phsft-bot

Starting build on ROOT-debian10-i386/cxx14, ROOT-performance-centos8-multicore/default, ROOT-ubuntu16/nortcxxmod, mac1014/python3, mac11.0/cxx17, windows10/cxx14

@phsft-bot

Starting build on ROOT-debian10-i386/cxx14, ROOT-performance-centos8-multicore/default, ROOT-ubuntu16/nortcxxmod, mac1014/python3, mac11.0/cxx17, windows10/cxx14

@jblomer
Contributor Author

jblomer commented Sep 19, 2021

@pcanal As discussed, the compressed cluster size is now estimated using the average compression ratio of all clusters written so far. I also added a stub checklist for the future RNTuple validation.
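A possible shape of the updated estimator, assuming "average compression ratio" means total compressed bytes over total uncompressed bytes of all committed clusters (a size-weighted average). An unweighted mean of per-cluster ratios would be another reading; the two differ when cluster sizes differ. `CompressionRatioEstimator` is a hypothetical name, not RNTuple API.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical estimator: ratio of all compressed bytes to all uncompressed
// bytes committed so far, with the documented start value before the first cluster.
class CompressionRatioEstimator {
public:
   explicit CompressionRatioEstimator(bool useCompression) : fInitial(useCompression ? 0.5 : 1.0) {}

   void AddCluster(std::size_t unzippedBytes, std::size_t zippedBytes)
   {
      fUnzipped += unzippedBytes;
      fZipped += zippedBytes;
   }

   // Size-weighted average ratio (compressed / uncompressed).
   double Estimate() const
   {
      if (fUnzipped == 0)
         return fInitial;
      return static_cast<double>(fZipped) / static_cast<double>(fUnzipped);
   }

   // Uncompressed size at which the next cluster is expected to reach the target.
   double UnzippedFlushThreshold(std::size_t targetZippedBytes) const
   {
      const double ratio = Estimate();
      assert(ratio > 0.0);
      return static_cast<double>(targetZippedBytes) / ratio;
   }

private:
   double fInitial;
   std::size_t fUnzipped = 0;
   std::size_t fZipped = 0;
};
```

Compared with using only the last cluster's ratio, the cumulative estimate reacts less to a single unusually (in)compressible cluster.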

@phsft-bot

Build failed on windows10/cxx14.
Running on null:C:\build\workspace\root-pullrequests-build
See console output.

Errors:

  • [2021-09-19T11:33:11.882Z] CMake Error at cmake/modules/RootBuildOptions.cmake:407 (message):
  • [2021-09-19T11:33:11.882Z] CMake Error at C:/build/workspace/root-pullrequests-build/rootspi/jenkins/root-build.cmake:1117 (message):

@phsft-bot

Build failed on mac11.0/cxx17.
Running on macphsft20.dyndns.cern.ch:/Users/sftnight/build/workspace/root-pullrequests-build
See console output.

Failing tests:

Member

@pcanal pcanal left a comment


Approved pending resolution of 2 leftover conversations.

@jblomer jblomer merged commit 61124f2 into root-project:master Sep 21, 2021
@jblomer jblomer deleted the ntuple-write-autotuning branch September 21, 2021 12:44
5 participants