[ntuple] Change to the version 1 RC1 binary format #8897

Merged
jalopezg-git merged 73 commits into root-project:master from jblomer:ntuple-binary-format-v1
Dec 10, 2021
@jblomer jblomer commented Aug 25, 2021

This pull request:

Adds the version 1 binary format specification and upgrades the implementation from version 0 to version 1. This is a backwards-incompatible change. As of version 1, RNTuple is supposed to stay backwards compatible.

The new binary format is a precondition for, among other things, the following required features:

  • Support for incremental loading of meta-data for very large files (>100G)
  • Sharded clusters, needed for backfilling
  • Forward-compatibility
  • Meta-data support

Compared to the v0 format, the header is ~40% smaller and the footer ~100% smaller (after zstd compression).

This PR sets the pre-release tag 1 in the binary format, so files written in this version trigger a warning on reading. The pre-release tag might increase in follow-up PRs. Once stable, the pre-release tag will be set to 0.
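The pre-release rule described above can be sketched as follows. This is a minimal illustration, not the RNTuple implementation: the names `kStableTag`, `IsPreRelease`, and `PreReleaseWarning` are hypothetical, and the wording of the warning text is an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical illustration; none of these names come from the RNTuple code base.
// A pre-release tag of 0 denotes a stable format, any other value a release candidate.
constexpr std::uint32_t kStableTag = 0;

bool IsPreRelease(std::uint32_t preReleaseTag)
{
   return preReleaseTag != kStableTag;
}

// Returns an empty string for stable files, otherwise a warning message
// that a reader could emit when opening the file.
std::string PreReleaseWarning(std::uint32_t preReleaseTag)
{
   if (!IsPreRelease(preReleaseTag))
      return "";
   return "file written with pre-release format RC" + std::to_string(preReleaseTag);
}
```

With this sketch, a file carrying tag 1 would produce a warning on reading, while a file carrying tag 0 would be read silently.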

Other follow-up PRs:

  • Use v1 serialization in DAOS backend
  • Remove v0 serialization code

@jblomer jblomer self-assigned this Aug 25, 2021
@jblomer jblomer requested a review from bellenot as a code owner August 25, 2021 08:26
@jblomer jblomer marked this pull request as draft August 25, 2021 08:26
@phsft-bot

Starting build on ROOT-debian10-i386/cxx14, ROOT-performance-centos8-multicore/default, ROOT-ubuntu16/nortcxxmod, mac1014/python3, mac11.0/cxx17, windows10/cxx14
How to customize builds

@phsft-bot

Build failed on ROOT-debian10-i386/cxx14.
Running on pcepsft10.dyndns.cern.ch:/build/workspace/root-pullrequests-build
See console output.

Errors:

  • [2021-08-25T08:29:11.816Z] stderr: error: could not read '.git/rebase-apply/head-name': No such file or directory

@phsft-bot

Build failed on mac11.0/cxx17.
Running on macphsft23.dyndns.cern.ch:/Users/sftnight/build/workspace/root-pullrequests-build
See console output.

Failing tests:

@phsft-bot

Starting build on ROOT-debian10-i386/cxx14, ROOT-performance-centos8-multicore/default, ROOT-ubuntu16/nortcxxmod, mac1014/python3, mac11.0/cxx17, windows10/cxx14
How to customize builds

@phsft-bot

Build failed on mac11.0/cxx17.
Running on macphsft23.dyndns.cern.ch:/Users/sftnight/build/workspace/root-pullrequests-build
See console output.

Failing tests:

@phsft-bot

Starting build on ROOT-debian10-i386/cxx14, ROOT-performance-centos8-multicore/default, ROOT-ubuntu16/nortcxxmod, mac1014/python3, mac11.0/cxx17, windows10/cxx14
How to customize builds

@phsft-bot

Build failed on windows10/cxx14.
Running on null:C:\build\workspace\root-pullrequests-build
See console output.

Errors:

  • [2021-09-08T12:23:34.779Z] CMake Error at cmake/modules/RootBuildOptions.cmake:407 (message):
  • [2021-09-08T12:23:34.779Z] CMake Error at C:/build/workspace/root-pullrequests-build/rootspi/jenkins/root-build.cmake:1117 (message):

@phsft-bot

Build failed on mac11.0/cxx17.
Running on macphsft23.dyndns.cern.ch:/Users/sftnight/build/workspace/root-pullrequests-build
See console output.

Failing tests:

@phsft-bot

Starting build on ROOT-debian10-i386/cxx14, ROOT-performance-centos8-multicore/default, ROOT-ubuntu16/nortcxxmod, mac1014/python3, mac11.0/cxx17, windows10/cxx14
How to customize builds

@phsft-bot

Build failed on windows10/cxx14.
Running on null:C:\build\workspace\root-pullrequests-build
See console output.

Errors:

  • [2021-09-08T20:18:49.946Z] CMake Error at cmake/modules/RootBuildOptions.cmake:407 (message):
  • [2021-09-08T20:18:49.946Z] CMake Error at C:/build/workspace/root-pullrequests-build/rootspi/jenkins/root-build.cmake:1117 (message):

The first 32bit integer references the physical column ID.
The second 32bit integer references a field that needs to have the "alias field" flag set.
The ID of the alias column itself is given implicitly by the serialization order.
In particular, alias columns have larger IDs than physical columns.
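The implicit ID assignment in the excerpt above can be sketched as follows. This is an illustration under stated assumptions, not the specification's layout: the struct and function names are hypothetical, and the sketch assumes physical columns occupy the zero-based IDs 0 to n-1 so that alias columns continue directly after them in serialization order.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical record; member names are illustrative only.
struct AliasColumnRecord {
   std::uint32_t fPhysicalColumnId; // first 32bit integer: the referenced physical column
   std::uint32_t fFieldId;          // second 32bit integer: a field with the "alias field" flag set
};

// Derives the ID of each alias column from its position in the serialized list.
// Assumption: physical columns use IDs [0, nPhysicalColumns), so every alias
// column ends up with a larger ID than any physical column.
std::vector<std::uint32_t> AssignAliasColumnIds(std::uint32_t nPhysicalColumns,
                                                const std::vector<AliasColumnRecord> &aliases)
{
   std::vector<std::uint32_t> ids;
   ids.reserve(aliases.size());
   for (std::size_t i = 0; i < aliases.size(); ++i) {
      // An alias must point at an existing physical column
      assert(aliases[i].fPhysicalColumnId < nPhysicalColumns);
      ids.push_back(nPhysicalColumns + static_cast<std::uint32_t>(i));
   }
   return ids;
}
```

Because the ID follows from the position alone, the record itself only needs the two references and no explicit ID field.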
Member

Can't you intertwine alias field and physical field?

Member

What is the purpose of having the alias column being on disk (since the alias field is already there)?

Contributor Author

You can make an alias field the child of a physical field. That should be as far as it goes in terms of mixing.

The motivation for alias fields comes from the conversion of TTree files with leaf count branches (e.g. njets plus jet_eta, jet_pt, etc.). This would be translated into an (anonymous) collection in RNTuple, but that implies that the field names change, e.g. to jet.eta, jet.pt. To make sure existing RDF analysis code continues working on the converted file, I thought it would be useful to remember their original names.
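A rough sketch of the name lookup this implies, using the jet example from above. The `AliasTable` type and the `ResolveFieldName` function are hypothetical and only illustrate the idea of remembering original names; they are not how RNTuple or RDF resolve fields.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical alias table: original TTree branch name -> field name in the converted RNTuple
using AliasTable = std::map<std::string, std::string>;

// Returns the converted field name if the requested name is a known alias,
// otherwise returns the requested name unchanged.
std::string ResolveFieldName(const AliasTable &aliases, const std::string &requested)
{
   auto it = aliases.find(requested);
   return it == aliases.end() ? requested : it->second;
}
```

Analysis code that still asks for `jet_eta` would then be directed to `jet.eta`, while code already using the new names is unaffected.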

Member

What I mean is: why is the alias represented on disk as a column, rather than the column being recreated from the alias field when reading the RNTuple meta-data?

@phsft-bot

Build failed on mac11.0/cxx17.
Running on macphsft20.dyndns.cern.ch:/Users/sftnight/build/workspace/root-pullrequests-build
See console output.

Failing tests:

jblomer and others added 26 commits December 9, 2021 21:42
…:GetBitsOnStorage()` (NFC)

Co-authored-by: Javier Lopez-Gomez <[email protected]>
@jalopezg-git jalopezg-git force-pushed the ntuple-binary-format-v1 branch from 51148cc to 10de5d2 Compare December 9, 2021 20:46
@phsft-bot

Starting build on ROOT-debian10-i386/cxx14, ROOT-performance-centos8-multicore/default, ROOT-ubuntu16/nortcxxmod, ROOT-ubuntu2004/soversion, mac1015/python3, mac11/cxx17, windows10/cxx14
How to customize builds

5 participants