[ntuple] Add support for "friend ntuples" by jblomer · Pull Request #6979 · root-project/root

jblomer · 2021-01-04T08:23:38Z

Adds a new, virtual page source, RPageSourceFriends, that takes a list of other page sources in order to combine them horizontally. The friends page source constructs a virtual descriptor and maps field, column, and cluster IDs from virtual to physical ones and back.
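The mapping between virtual and physical IDs described above can be pictured with a small, standalone sketch. It is not the code from this PR: `PhysicalId` and `IdBiMap` are hypothetical names, and it assumes that virtual IDs are simply handed out consecutively as the IDs of each underlying source are registered.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical types for illustration only; not part of ROOT.
struct PhysicalId {
   std::size_t sourceIdx; // which underlying page source the ID belongs to
   std::uint64_t id;      // the ID as known to that source
};

// Two-way map: virtual ID -> (source, physical ID) and back.
class IdBiMap {
   std::vector<PhysicalId> fToPhysical; // index == virtual ID
   std::vector<std::unordered_map<std::uint64_t, std::uint64_t>> fToVirtual; // one map per source

public:
   explicit IdBiMap(std::size_t nSources) : fToVirtual(nSources) {}

   // Registers a physical ID and returns the newly assigned virtual ID.
   std::uint64_t Insert(std::size_t sourceIdx, std::uint64_t physicalId)
   {
      assert(sourceIdx < fToVirtual.size());
      assert(fToVirtual[sourceIdx].count(physicalId) == 0); // each ID is registered once
      const std::uint64_t virtualId = fToPhysical.size();
      fToPhysical.push_back({sourceIdx, physicalId});
      fToVirtual[sourceIdx][physicalId] = virtualId;
      return virtualId;
   }

   // Both lookups throw std::out_of_range for unknown IDs.
   PhysicalId GetPhysical(std::uint64_t virtualId) const { return fToPhysical.at(virtualId); }

   std::uint64_t GetVirtual(std::size_t sourceIdx, std::uint64_t physicalId) const
   {
      return fToVirtual.at(sourceIdx).at(physicalId);
   }
};
```

Two sources that both use physical column ID 0 would receive distinct virtual IDs here, which is the property a combined descriptor needs so that IDs from different ntuples do not collide.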

Note that this PR introduces a change to the cluster semantics: clusters do not need to cover all the columns for an event range but they can cover only a part of it (a shard). Columns that are linked (e.g. offset column and value column, columns belonging to the same field subtree) should still be part of only a single cluster.
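To make the shard constraint concrete, here is a minimal sketch of the invariant stated above: linked columns must not be split across shards. `ClusterShard` and `LinkedColumnsShareShard` are hypothetical names, not descriptor classes from ROOT, and the check assumes the caller passes each linked column ID once.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical model of a cluster that stores only some columns
// of an entry range (a shard). Not a ROOT class.
struct ClusterShard {
   std::uint64_t firstEntry;
   std::uint64_t nEntries;
   std::set<std::uint64_t> columnIds;
};

// Returns true if every shard contains either all of the linked columns
// or none of them. `linked` must hold unique column IDs.
bool LinkedColumnsShareShard(const std::vector<ClusterShard> &shards,
                             const std::vector<std::uint64_t> &linked)
{
   assert(!linked.empty()); // an empty group is treated as a caller error in this sketch
   for (const auto &shard : shards) {
      const auto nPresent =
         std::count_if(linked.begin(), linked.end(),
                       [&shard](std::uint64_t c) { return shard.columnIds.count(c) > 0; });
      if (nPresent != 0 && static_cast<std::size_t>(nPresent) != linked.size())
         return false; // the group is split across shards
   }
   return true;
}
```

With two shards over the same entry range, one holding columns {0, 1} and the other holding column {2}, the group {0, 1} passes the check, while the group {1, 2} does not.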

Remaining todos:

Improve comments
Implement RPageSourceFriends::Clone()

phsft-bot · 2021-01-04T08:23:46Z

Starting build on ROOT-debian10-i386/cxx14, ROOT-fedora30/cxx14, ROOT-fedora31/noimt, ROOT-ubuntu16/nortcxxmod, mac1014/python3, mac11.0/cxx17, windows10/cxx14
How to customize builds

phsft-bot · 2021-01-04T08:24:26Z

Build failed on ROOT-debian10-i386/cxx14.
Running on pcepsft10.dyndns.cern.ch:/build/workspace/root-pullrequests-build
See console output.

Errors:

[2021-01-04T08:24:25.267Z] CMake Error at /home/sftnight/build/workspace/root-pullrequests-build/rootspi/jenkins/root-build.cmake:1009 (message):

phsft-bot · 2021-01-04T08:24:27Z

Build failed on ROOT-fedora30/cxx14.
Running on root-fedora30-1.cern.ch:/build/workspace/root-pullrequests-build
See console output.

Errors:

[2021-01-04T08:24:26.149Z] CMake Error at /build/workspace/root-pullrequests-build/rootspi/jenkins/root-build.cmake:1009 (message):

phsft-bot · 2021-01-04T08:24:28Z

Build failed on ROOT-fedora31/noimt.
Running on root-fedora-31-2.cern.ch:/home/sftnight/build/workspace/root-pullrequests-build
See console output.

Errors:

[2021-01-04T08:24:26.146Z] CMake Error at /home/sftnight/build/workspace/root-pullrequests-build/rootspi/jenkins/root-build.cmake:1009 (message):

phsft-bot · 2021-01-04T08:24:29Z

Build failed on mac11.0/cxx17.
Running on macphsft20.dyndns.cern.ch:/Users/sftnight/build/workspace/root-pullrequests-build
See console output.

Errors:

[2021-01-04T08:24:25.387Z] CMake Error at /Users/sftnight/build/workspace/root-pullrequests-build/rootspi/jenkins/root-build.cmake:1009 (message):

phsft-bot · 2021-01-04T08:24:39Z

Build failed on ROOT-ubuntu16/nortcxxmod.
Running on sft-ubuntu-1604-1.cern.ch:/build/workspace/root-pullrequests-build
See console output.

Errors:

[2021-01-04T08:24:38.522Z] CMake Error at /mnt/build/workspace/root-pullrequests-build/rootspi/jenkins/root-build.cmake:1009 (message):

phsft-bot · 2021-01-04T08:24:57Z

Build failed on windows10/cxx14.
Running on null:C:\build\workspace\root-pullrequests-build
See console output.

Errors:

[2021-01-04T08:24:56.830Z] CMake Error at C:/build/workspace/root-pullrequests-build/rootspi/jenkins/root-build.cmake:1009 (message):

phsft-bot · 2021-01-04T08:25:29Z

Build failed on mac1014/python3.
Running on macitois21.dyndns.cern.ch:/Users/sftnight/build/workspace/root-pullrequests-build
See console output.

Errors:

[2021-01-04T08:25:27.809Z] CMake Error at /Volumes/HD2/build/workspace/root-pullrequests-build/rootspi/jenkins/root-build.cmake:1009 (message):

phsft-bot · 2021-01-04T08:35:50Z

Starting build on ROOT-debian10-i386/cxx14, ROOT-fedora30/cxx14, ROOT-fedora31/noimt, ROOT-ubuntu16/nortcxxmod, mac1014/python3, mac11.0/cxx17, windows10/cxx14
How to customize builds

phsft-bot · 2021-01-04T09:36:52Z

Build failed on mac11.0/cxx17.
Running on macphsft20.dyndns.cern.ch:/Users/sftnight/build/workspace/root-pullrequests-build
See console output.

Failing tests:

projectroot.roottest.python.JupyROOT.roottest_python_JupyROOT_tpython_notebook

eguiraud · 2021-01-04T16:06:47Z

[are we set on the (super-ambiguous, imho) "friend" terminology? some possible alternatives: horizontal stack, column-wise concat, field-wise concat, horizontal concat...]

mxxo · 2021-01-06T18:08:20Z

> some possible alternatives: horizontal stack, column-wise concat, field-wise concat, horizontal concat...]

I'm not sure if it lines up with the SQL term exactly but we could use "join" (column-join, field-join, etc.)

pcanal · 2021-01-06T18:58:34Z

Is there a provision/design for the case where the entries in the "friend" are not aligned (either because the friend is missing entries, or has more entries and/or they are in a different order)?

tree/ntuple/v7/inc/ROOT/RPageSourceFriends.hxx

phsft-bot · 2021-03-23T17:10:12Z

Starting build on ROOT-debian10-i386/cxx14, ROOT-performance-centos8-multicore/default, ROOT-fedora30/cxx14, ROOT-fedora31/noimt, ROOT-ubuntu16/nortcxxmod, mac1014/python3, mac11.0/cxx17, windows10/cxx14
How to customize builds

jblomer · 2021-03-23T17:14:57Z

@pcanal This should be ready for review. I think the way to later allow for unaligned friends is through combining friends with another virtual page source that gives access to the underlying page source with an entry list (or another mechanism for shuffling & skimming the original entries).
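The entry-list idea can be sketched as a thin indirection layer. This is only an illustration of the concept, not an existing or planned ROOT class: `EntryListView` is a hypothetical name, and the sketch assumes the entry list is a plain vector of original entry numbers.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical view that presents a skimmed and/or shuffled sequence of
// entries on top of an underlying ntuple. Not a ROOT class.
class EntryListView {
   std::vector<std::uint64_t> fEntries; // position == virtual entry, value == original entry

public:
   explicit EntryListView(std::vector<std::uint64_t> entries) : fEntries(std::move(entries)) {}

   std::uint64_t GetNEntries() const { return fEntries.size(); }

   // Translates an entry number of the view into one of the underlying ntuple.
   std::uint64_t ToOriginal(std::uint64_t virtualEntry) const
   {
      assert(virtualEntry < fEntries.size());
      return fEntries[virtualEntry];
   }
};
```

A view built from the list {4, 0, 2} exposes three entries and serves original entries 4, 0, and 2, in that order. Wrapping an unaligned friend in such a view would let the friends source keep assuming aligned inputs.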

I like @mxxo suggestion of renaming friends to "joins" or joined ntuples. @eguiraud @Axel-Naumann what do you think?

eguiraud · 2021-03-25T09:16:13Z

About the name of the feature, my two cents: I think "join" has the huge advantage that it is what users (both database-savvy and not) would instinctively search for in a lot of cases. "Joined ntuples" is a better wording if what you can do is basically a horizontal "paste". If you can actually build relations between two ntuples based on entry values, then it's basically an analog of the SQL "join" so even just "join" is not misleading.
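The difference between a horizontal "paste" and a value-based join can be shown on entry numbers alone. The sketch below is purely illustrative; `PasteByPosition` and `JoinByKey` are hypothetical helpers, and the join is assumed to be an inner join where the first occurrence of a key on the right-hand side wins.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

// (entry number in the left ntuple, entry number in the right ntuple)
using EntryPair = std::pair<std::size_t, std::size_t>;

// Horizontal "paste": entry i on the left goes with entry i on the right.
std::vector<EntryPair> PasteByPosition(std::size_t nLeft, std::size_t nRight)
{
   assert(nLeft == nRight); // only aligned inputs make sense here
   std::vector<EntryPair> result;
   for (std::size_t i = 0; i < nLeft; ++i)
      result.emplace_back(i, i);
   return result;
}

// Value-based inner join: entries are matched on a key column.
std::vector<EntryPair> JoinByKey(const std::vector<std::int64_t> &leftKeys,
                                 const std::vector<std::int64_t> &rightKeys)
{
   std::unordered_map<std::int64_t, std::size_t> rightIndex;
   for (std::size_t j = 0; j < rightKeys.size(); ++j)
      rightIndex.emplace(rightKeys[j], j); // emplace keeps the first occurrence of a key

   std::vector<EntryPair> result;
   for (std::size_t i = 0; i < leftKeys.size(); ++i) {
      const auto it = rightIndex.find(leftKeys[i]);
      if (it != rightIndex.end())
         result.emplace_back(i, it->second);
   }
   return result;
}
```

The paste needs no data from the ntuples at all, only matching entry counts, whereas the join has to read a key column from both sides and may drop or reorder entries.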

pcanal · 2021-04-06T00:23:38Z

tree/ntuple/v7/inc/ROOT/RNTupleDescriptor.hxx

+   FindNextClusterId and FindPrevClusterId to travers clusters by entry number.
+   */
+   // clang-format on
+   class RClusterDescriptorRange {


Do you mean the suffix 'Range'? If so, would you have some sort of range start and end that is different in each instance of RClusterDescriptorRange? (i.e. see std::span or even RNTupleClusterRange :) )

Maybe RClusterDescriptorIteratable and then instead of GetClusterRange something like make_iteratable or GetClusterIteratable? (Another alternative I can think of is to use `Collection` instead of `Iteratable`, but that might be over-promising.)

As discussed, I added a corresponding TODO because there are other *Range classes similar to RClusterDescriptorRange that should be renamed consistently.
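For context on the naming question, the following is a minimal sketch of the kind of helper under discussion: a view that exists only to make a range-based for loop work and that always covers all clusters, so it carries no per-instance start or end. `Descriptor`, `ClusterIterable`, and `SumClusterIds` are hypothetical stand-ins, not the classes from RNTupleDescriptor.hxx.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a descriptor that owns a list of cluster IDs.
struct Descriptor {
   std::vector<unsigned> clusterIds;
};

// View over all clusters of a descriptor; it has no configurable bounds.
class ClusterIterable {
   const Descriptor &fDesc;

public:
   class Iterator {
      const Descriptor &fDesc;
      std::size_t fIndex;

   public:
      Iterator(const Descriptor &desc, std::size_t index) : fDesc(desc), fIndex(index) {}
      Iterator &operator++()
      {
         ++fIndex;
         return *this;
      }
      unsigned operator*() const
      {
         assert(fIndex < fDesc.clusterIds.size());
         return fDesc.clusterIds[fIndex];
      }
      bool operator!=(const Iterator &rhs) const { return fIndex != rhs.fIndex; }
   };

   explicit ClusterIterable(const Descriptor &desc) : fDesc(desc) {}
   Iterator begin() const { return Iterator(fDesc, 0); }
   Iterator end() const { return Iterator(fDesc, fDesc.clusterIds.size()); }
};

// Example consumer: the loop is all a caller ever does with the view.
unsigned SumClusterIds(const Descriptor &desc)
{
   unsigned sum = 0;
   for (auto id : ClusterIterable(desc))
      sum += id;
   return sum;
}
```

Since every instance spans the whole collection, a name along the lines of "Iterable" describes such a class more closely than "Range", which suggests bounds that differ per instance as in std::span.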

phsft-bot · 2021-04-23T13:35:54Z

Starting build on ROOT-debian10-i386/cxx14, ROOT-performance-centos8-multicore/default, ROOT-fedora30/cxx14, ROOT-fedora31/noimt, ROOT-ubuntu16/nortcxxmod, mac1014/python3, mac11.0/cxx17, windows10/cxx14
How to customize builds

phsft-bot · 2021-04-27T14:59:09Z

Starting build on ROOT-debian10-i386/cxx14, ROOT-performance-centos8-multicore/default, ROOT-fedora30/cxx14, ROOT-fedora31/noimt, ROOT-ubuntu16/nortcxxmod, mac1014/python3, mac11.0/cxx17, windows10/cxx14
How to customize builds

jblomer self-assigned this Jan 4, 2021

jblomer requested review from couet, oshadura and pcanal as code owners January 4, 2021 08:23

jblomer marked this pull request as draft January 4, 2021 08:23

jblomer changed the title ~~[ntuple] Add support for friends~~ [ntuple] Add support for friend ntuples Jan 4, 2021

jblomer changed the title ~~[ntuple] Add support for friend ntuples~~ [ntuple] Add support for befriending ntuples Jan 4, 2021

jblomer changed the title ~~[ntuple] Add support for befriending ntuples~~ [ntuple] Add support for "friend ntuples" Jan 4, 2021

jblomer force-pushed the ntuple-friends branch from 6f1e260 to 2212937 Compare January 4, 2021 08:35

pcanal reviewed Jan 6, 2021

tree/ntuple/v7/inc/ROOT/RPageSourceFriends.hxx (resolved)

couet removed their request for review February 12, 2021 07:21

jblomer force-pushed the ntuple-friends branch from 2212937 to 408dc4f Compare March 23, 2021 17:10

jblomer marked this pull request as ready for review March 23, 2021 17:10

jblomer requested a review from pcanal March 23, 2021 17:16

pcanal reviewed Apr 6, 2021

jblomer requested a review from pcanal April 23, 2021 13:36

pcanal approved these changes Apr 26, 2021

jblomer added 15 commits April 27, 2021 09:57

[ntuple] Add skeleton for RPageSourceFriends

87ea983

[ntuple] Merge origin fields for friend descriptor

4a0a9b7

[ntuple] merge origin clusters into friend descriptor (WIP)

8254364

[ntuple] implement page redirection in friend source

a2be869

[ntuple] add basic unit tests for friends

cb32dd4

[ntuple] Fix column registration for friends

9d41411

[ntuple] Add friends tutorial

efa215e

[ntuple] Fix loading of friend pages with cluster index

889b6ad

[ntuple] Remove unnecessary map from friend page source

5544ce4

[ntuple] Remove unnecessary parameter from RPageSourceFriends::AddVirtualField

1ae0c86

[nutple] Improve code comments for friend sources (NFC)

8f775fc

[ntuple] Add unit test for the empty friend ntuple

e1841a2

[ntuple] Implement RPageSourceFriends::Clone()

95e606a

[ntuple] Add TODO on naming of *Range classes in descriptor (NFC)

b656927

[ntuple] Fix-up after rebase

13809dd

jblomer force-pushed the ntuple-friends branch from 55e04fa to 13809dd Compare April 27, 2021 14:59

jblomer merged commit 8dc8e8f into root-project:master Apr 27, 2021

jblomer deleted the ntuple-friends branch April 27, 2021 21:25

jblomer mentioned this pull request Apr 30, 2021

[ntuple] Rename *DescriptorRange to *DescriptorIterable #8054

Merged
