[DF] Add DefinePerSample by eguiraud · Pull Request #8841 · root-project/root

eguiraud · 2021-08-13T09:58:56Z

This patch adds df.DefinePerSample, a method that lets users define
new columns that are only updated per "data-block" rather than per
entry, where a "data-block" is made of several entries that have the
same data-block ID (e.g. that belong to the same TTree in a TChain).

The data-block ID is passed as an argument to the callback, so that
quantities can be defined based on the sample being processed.

Currently a jitted version is not available and RDataSources have
no way to hook into the mechanism (they get one data-block per task
with empty data-block ID). Support for these cases will be added by
later commits.
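The mechanism described above can be illustrated without ROOT. The following is a minimal sketch, not the implementation in this patch: `SampleInfo`, `PerSampleColumn` and `RunToyLoop` are hypothetical names invented for the example, and the toy loop is single-threaded. It only shows the core idea: the callback receives the slot and the data-block ID and runs once per data-block, while per-entry reads return a cached value.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the data-block ID handed to the callback.
struct SampleInfo {
   std::string fID; // e.g. "file.root/treename"
   bool Contains(const std::string &s) const { return fID.find(s) != std::string::npos; }
};

// A column whose value is recomputed once per data-block, not once per entry.
template <typename T>
class PerSampleColumn {
   std::function<T(unsigned int, const SampleInfo &)> fCallback;
   T fValue{};
   int fEvaluations = 0;

public:
   explicit PerSampleColumn(std::function<T(unsigned int, const SampleInfo &)> f) : fCallback(std::move(f)) {}

   // Called by the event loop whenever a new data-block starts.
   void OnNewSample(unsigned int slot, const SampleInfo &info)
   {
      fValue = fCallback(slot, info);
      ++fEvaluations;
   }

   // Called for every entry: returns the cached value without re-evaluating.
   const T &Get() const { return fValue; }
   int Evaluations() const { return fEvaluations; }
};

// Toy single-threaded event loop over (data-block ID, number of entries) pairs.
// Returns the sum of the column value over all entries.
inline double RunToyLoop(PerSampleColumn<double> &col, const std::vector<std::pair<std::string, int>> &samples)
{
   double sum = 0.;
   for (const auto &s : samples) {
      assert(s.second >= 0 && "number of entries must be non-negative");
      col.OnNewSample(/*slot=*/0u, SampleInfo{s.first});
      for (int entry = 0; entry < s.second; ++entry)
         sum += col.Get();
   }
   return sum;
}
```

With a callback such as `[](unsigned int, const SampleInfo &id) { return id.Contains("MC") ? 42. : 8.; }`, a loop over two data-blocks evaluates the callback twice, regardless of how many entries each block contains.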

This resolves #6745.

This PR should make @stwunsch happy.

To do:

- test RDataBlockID with entry ranges
- naming: RDataBlockID -> RDataBlockInfo? DefinePerSample -> DefinePerDataBlock?
- add support for jitted `df.DefinePerSample("myconstant", "rdfdatablock_.Contains(\"MC\") ? 42. : 8.")`
- add release notes

phsft-bot · 2021-08-16T13:14:00Z

Starting build on ROOT-debian10-i386/cxx14, ROOT-performance-centos8-multicore/default, ROOT-ubuntu16/nortcxxmod, mac1014/python3, mac11.0/cxx17, windows10/cxx14
How to customize builds

phsft-bot · 2021-09-01T11:49:10Z

Starting build on ROOT-debian10-i386/cxx14, ROOT-performance-centos8-multicore/default, ROOT-ubuntu16/nortcxxmod, mac1014/python3, mac11.0/cxx17, windows10/cxx14
How to customize builds

This commit adds the infrastructure needed to pass a data-block id as well as the slot number to data-block callbacks. At this point we still never actually _set_ the data-block id argument to something meaningful (see next commits).

This leaves out the case of RDataSources, for which data-block IDs will always be empty at the moment.


phsft-bot · 2021-09-01T12:13:52Z

Build failed on ROOT-performance-centos8-multicore/default.
Running on olbdw-01.cern.ch:/data/sftnight/workspace/root-pullrequests-build
See console output.

Failing tests:

projectroot.tree.dataframe.test.gtest_tree_dataframe_test_dataframe_definepersample

phsft-bot · 2021-09-01T13:07:25Z

Build failed on mac11.0/cxx17.
Running on macphsft20.dyndns.cern.ch:/Users/sftnight/build/workspace/root-pullrequests-build
See console output.

Failing tests:

tree/dataframe/src/RJittedAction.cxx

Axel-Naumann

Very nice! Is this PR changing a public and possibly used interface? Is this a conscious decision? If so, please update the release notes to call out this change.

eguiraud · 2021-09-01T16:13:55Z

> Is this PR changing a public and possibly used interface? Is this a conscious decision?

I guess you are referring to the change in the signature of the callbacks (from `unsigned int` to `unsigned int, RDataBlockID&`). That interface was never released.


phsft-bot · 2021-09-07T16:35:58Z

Starting build on ROOT-debian10-i386/cxx14, ROOT-performance-centos8-multicore/default, ROOT-ubuntu16/nortcxxmod, mac1014/python3, mac11.0/cxx17, windows10/cxx14
How to customize builds

eguiraud · 2021-09-07T16:39:10Z

@Axel-Naumann I was wrong: `RActionImpl::GetDataBlockCallback` was released in v6.24, and this PR could break user-defined action helpers that use the feature (it's likely that none exist, but still...). I'll try to think of a deprecation strategy to move users to the new callback signature and the new name. If I can't come up with anything, I'll add the breaking change to the release notes. This is an expert feature with rare applications outside of RDF internals, if any.
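One possible shape for such a deprecation strategy, sketched without ROOT: keep the old accessor available but marked `[[deprecated]]`, and adapt old-style callbacks to the new two-argument signature. This is an illustration under assumptions, not what ROOT does. `SampleInfo`, `LegacyCallback_t`, `AdaptLegacyCallback` and `ToyHelper` are hypothetical names; only `GetDataBlockCallback`, `GetSampleCallback` and `SampleCallback_t` are taken from this thread, and their real declarations may differ.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>

// Hypothetical stand-in for the sample information passed to new-style callbacks.
struct SampleInfo {
   std::string fID;
};

using LegacyCallback_t = std::function<void(unsigned int)>;                     // old signature: slot only
using SampleCallback_t = std::function<void(unsigned int, const SampleInfo &)>; // new signature

// Wrap an old-style callback so it can be registered through the new interface.
// The sample information is simply ignored by the wrapped callback.
inline SampleCallback_t AdaptLegacyCallback(LegacyCallback_t legacy)
{
   assert(legacy && "legacy callback must be callable");
   return [legacy](unsigned int slot, const SampleInfo &) { legacy(slot); };
}

// Hypothetical helper exposing both names during a deprecation window.
class ToyHelper {
   LegacyCallback_t fLegacy;

public:
   explicit ToyHelper(LegacyCallback_t cb) : fLegacy(std::move(cb)) {}

   // Old name: still compiles, but callers get a compiler warning.
   [[deprecated("use GetSampleCallback instead")]]
   LegacyCallback_t GetDataBlockCallback() { return fLegacy; }

   // New name: same behaviour, new signature.
   SampleCallback_t GetSampleCallback() { return AdaptLegacyCallback(fLegacy); }
};
```

The design choice here is that existing helpers keep compiling (with a warning) while the framework itself only ever calls the new-style callback.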

phsft-bot · 2021-09-07T17:24:55Z

Build failed on windows10/cxx14.
Running on null:C:\build\workspace\root-pullrequests-build
See console output.

Errors:

[2021-09-07T17:24:55.116Z] CMake Error at cmake/modules/RootBuildOptions.cmake:407 (message):
[2021-09-07T17:24:55.116Z] CMake Error at C:/build/workspace/root-pullrequests-build/rootspi/jenkins/root-build.cmake:1117 (message):

phsft-bot · 2021-09-07T18:01:11Z

Build failed on mac11.0/cxx17.
Running on macphsft20.dyndns.cern.ch:/Users/sftnight/build/workspace/root-pullrequests-build
See console output.

Failing tests:

phsft-bot · 2021-09-08T16:09:14Z

Starting build on ROOT-debian10-i386/cxx14, ROOT-performance-centos8-multicore/default, ROOT-ubuntu16/nortcxxmod, mac1014/python3, mac11.0/cxx17, windows10/cxx14
How to customize builds

phsft-bot · 2021-09-08T16:16:49Z

Build failed on windows10/cxx14.
Running on null:C:\build\workspace\root-pullrequests-build
See console output.

Errors:

[2021-09-08T16:16:48.753Z] CMake Error at cmake/modules/RootBuildOptions.cmake:407 (message):
[2021-09-08T16:16:48.753Z] CMake Error at C:/build/workspace/root-pullrequests-build/rootspi/jenkins/root-build.cmake:1117 (message):

phsft-bot · 2021-09-08T16:17:49Z

Build failed on ROOT-ubuntu16/nortcxxmod.
Running on sft-ubuntu-1604-3.cern.ch:/build/workspace/root-pullrequests-build
See console output.

Errors:

[2021-09-08T16:17:14.719Z] FAILED: /usr/bin/ccache /usr/bin/c++ -DVECCORE_ENABLE_VC -I/mnt/build/workspace/root-pullrequests-build/root/tree/dataframe/inc -I/mnt/build/workspace/root-pullrequests-build/root/core/unix/inc -I/mnt/build/workspace/root-pullrequests-build/root/core/foundation/v7/inc -I/mnt/build/workspace/root-pullrequests-build/root/core/base/v7/inc -I/mnt/build/workspace/root-pullrequests-build/root/core/clingutils/inc -I/mnt/build/workspace/root-pullrequests-build/root/core/textinput/inc -I/mnt/build/workspace/root-pullrequests-build/root/core/thread/inc -I/mnt/build/workspace/root-pullrequests-build/root/core/zip/inc -I/mnt/build/workspace/root-pullrequests-build/root/core/rint/inc -I/mnt/build/workspace/root-pullrequests-build/root/core/clib/inc -I/mnt/build/workspace/root-pullrequests-build/root/core/meta/inc -I/mnt/build/workspace/root-pullrequests-build/root/core/gui/inc -I/mnt/build/workspace/root-pullrequests-build/root/core/cont/inc -I/mnt/build/workspace/root-pullrequests-build/root/core/foundation/inc -I/mnt/build/workspace/root-pullrequests-build/root/core/base/inc -Iginclude -I/mnt/build/workspace/root-pullrequests-build/root/tree/tree/inc -I/mnt/build/workspace/root-pullrequests-build/root/core/imt/inc -I/mnt/build/workspace/root-pullrequests-build/root/core/multiproc/inc -I/mnt/build/workspace/root-pullrequests-build/root/math/mathcore/inc -I/mnt/build/workspace/root-pullrequests-build/root/math/mathcore/v7/inc -Iexternals/mnt/build/workspace/root-pullrequests-build/install/include -I/mnt/build/workspace/root-pullrequests-build/root/tree/treeplayer/inc -I/mnt/build/workspace/root-pullrequests-build/root/hist/hist/inc -I/mnt/build/workspace/root-pullrequests-build/root/math/matrix/inc -I/mnt/build/workspace/root-pullrequests-build/root/math/vecops/inc -Itree/dataframe/test -I/mnt/build/workspace/root-pullrequests-build/root/net/net/inc -I/mnt/build/workspace/root-pullrequests-build/root/io/io/inc 
-I/mnt/build/workspace/root-pullrequests-build/root/graf2d/gpad/inc -I/mnt/build/workspace/root-pullrequests-build/root/graf2d/graf/inc -I/mnt/build/workspace/root-pullrequests-build/root/graf3d/g3d/inc -I/mnt/build/workspace/root-pullrequests-build/root/test/unit_testing_support -isystem googletest-prefix/src/googletest/googletest/include -isystem googletest-prefix/src/googletest/googlemock/include -fdiagnostics-color=always -std=c++14 -pipe -Wshadow -Wall -W -Woverloaded-virtual -fsigned-char -pthread -O3 -std=c++14 -MD -MT tree/dataframe/test/CMakeFiles/dataframe_definepersample.dir/dataframe_definepersample.cxx.o -MF tree/dataframe/test/CMakeFiles/dataframe_definepersample.dir/dataframe_definepersample.cxx.o.d -o tree/dataframe/test/CMakeFiles/dataframe_definepersample.dir/dataframe_definepersample.cxx.o -c /mnt/build/workspace/root-pullrequests-build/root/tree/dataframe/test/dataframe_definepersample.cxx
[2021-09-08T16:17:14.719Z] /mnt/build/workspace/root-pullrequests-build/root/tree/dataframe/test/dataframe_definepersample.cxx:57:30: error: use of deleted function ‘std::atomic<int>::atomic(const std::atomic<int>&)’
[2021-09-08T16:17:14.719Z] /mnt/build/workspace/root-pullrequests-build/root/tree/dataframe/test/dataframe_definepersample.cxx:77:30: error: use of deleted function ‘std::atomic<int>::atomic(const std::atomic<int>&)’
[2021-09-08T16:17:14.994Z] /mnt/build/workspace/root-pullrequests-build/root/tree/dataframe/test/dataframe_definepersample.cxx:97:30: error: use of deleted function ‘std::atomic<int>::atomic(const std::atomic<int>&)’
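For context on the error above: `std::atomic<int>` has a deleted copy constructor, so any construct that copies one fails to compile. The log does not show the offending source lines, so the cause in the test is an assumption. Two common triggers under `-std=c++14` are copy-initialization (`std::atomic<int> counter = 0;`, accepted only from C++17 on) and capturing an atomic by value in a lambda. A minimal sketch that avoids both, with the hypothetical function name `CountCalls`:

```cpp
#include <atomic>
#include <cassert>

// Counts how many times a callback is invoked, using an atomic counter.
inline int CountCalls(int nCalls)
{
   assert(nCalls >= 0);

   // Direct-initialization is valid in C++11/14/17.
   // `std::atomic<int> counter = 0;` is copy-initialization, which before C++17
   // formally requires the (deleted) copy constructor.
   std::atomic<int> counter{0};

   // Capture by reference: a by-value capture would try to copy the atomic.
   auto callback = [&counter] { ++counter; };

   // Invoked serially here; the increment itself is safe to call from several threads.
   for (int i = 0; i < nCalls; ++i)
      callback();

   // load() yields a plain int, so nothing downstream needs to copy the atomic.
   return counter.load();
}
```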

phsft-bot · 2021-09-08T16:20:37Z

Starting build on ROOT-debian10-i386/cxx14, ROOT-performance-centos8-multicore/default, ROOT-ubuntu16/nortcxxmod, mac1014/python3, mac11.0/cxx17, windows10/cxx14
How to customize builds

phsft-bot · 2021-09-08T18:34:31Z

Starting build on ROOT-debian10-i386/cxx14, ROOT-performance-centos8-multicore/default, ROOT-ubuntu16/nortcxxmod, mac1014/python3, mac11.0/cxx17, windows10/cxx14
How to customize builds

phsft-bot · 2021-09-08T20:27:46Z

Build failed on mac11.0/cxx17.
Running on macphsft20.dyndns.cern.ch:/Users/sftnight/build/workspace/root-pullrequests-build
See console output.

Failing tests:


phsft-bot · 2021-09-28T09:29:53Z

Starting build on ROOT-debian10-i386/cxx14, ROOT-performance-centos8-multicore/default, ROOT-ubuntu16/nortcxxmod, mac1014/python3, mac11.0/cxx17, windows10/cxx14
How to customize builds

phsft-bot · 2021-09-28T10:38:47Z

Build failed on mac11.0/cxx17.
Running on macphsft23.dyndns.cern.ch:/Users/sftnight/build/workspace/root-pullrequests-build
See console output.

Failing tests:

tree/dataframe/inc/ROOT/RDF/RInterface.hxx

tree/dataframe/inc/ROOT/RDF/RDataBlockID.hxx

hageboeck · 2021-09-28T12:13:01Z

tree/dataframe/src/RDFInterfaceUtils.cxx

   return s.str();
 }

+// Book the jitting of a Filter call


Was this supposed to be doxygen readable? --> ///

Nope (EDIT: I mean, it could be, but this function is so internal that this was really meant as a normal code comment for developers reading the code).

tree/dataframe/inc/ROOT/RDF/RSampleInfo.hxx


phsft-bot · 2021-09-28T12:47:48Z

Starting build on ROOT-debian10-i386/cxx14, ROOT-performance-centos8-multicore/default, ROOT-ubuntu16/nortcxxmod, mac1014/python3, mac11.0/cxx17, windows10/cxx14
How to customize builds

phsft-bot · 2021-09-28T13:59:53Z

Build failed on mac11.0/cxx17.
Running on macphsft23.dyndns.cern.ch:/Users/sftnight/build/workspace/root-pullrequests-build
See console output.

Failing tests:

eguiraud · 2021-09-28T15:43:13Z

@hageboeck I will address those two NFC changes in an upcoming PR together with other small NFC things, did not want to trigger jenkins again :)

hageboeck · 2021-09-28T15:45:20Z

Sure, whatever works best!

eguiraud self-assigned this Aug 13, 2021

eguiraud force-pushed the df-definepersample branch from 8331e14 to 2b0e122 Compare August 16, 2021 13:13

root-project deleted a comment from phsft-bot Aug 16, 2021

eguiraud force-pushed the df-definepersample branch from 2b0e122 to 64242a3 Compare September 1, 2021 11:49

eguiraud marked this pull request as ready for review September 1, 2021 11:50

eguiraud requested a review from bellenot as a code owner September 1, 2021 11:50

eguiraud requested review from Axel-Naumann and etejedor and removed request for bellenot September 1, 2021 11:50

eguiraud added 7 commits September 1, 2021 13:50

[DF] Generate data-block IDs when running over no data or over TTrees

ec22bee

This leaves out the case of RDataSources, for which data-block IDs will always be empty at the moment.

[DF] Add test for data-block IDs

6682fc0

[DF] Add test suite for DefinePerSample

4fce49e

[DF] Add entry range info to RDataBlockID

0b39f47

[DF] Test entry ranges in RDataBlockID

64242a3

Axel-Naumann reviewed Sep 1, 2021


tree/dataframe/src/RJittedAction.cxx (outdated)

Axel-Naumann approved these changes Sep 1, 2021


[DF] Consistently use "sample" instead of "data-block"

aaba203

Rename:
- DataBlockCallback_t -> SampleCallback_t
- RDataBlockID -> RSampleInfo
- RDataBlockNotifier -> RNewSampleNotifier
- RDataBlockFlag -> RNewSampleFlag
- GetDataBlockCallback -> GetSampleCallback

etc.

eguiraud added 5 commits September 28, 2021 11:27

[DF] Make counters atomic for MT tests

f4d0eab

[DF] Deprecate GetDataBlockCallback instead of removing it

a57a08e

GetSampleCallback is the superior alternative.

[DF][NFC] Improve comments

c8974c3

[DF] Add jitting support to DefinePerSample + test

3f3b340

[DF][NFC] Add docs for DefinePerSample, RSampleInfo

df5513a

eguiraud force-pushed the df-definepersample branch from 66b6b59 to df5513a Compare September 28, 2021 09:29

hageboeck reviewed Sep 28, 2021


[DF][NFC] Add a comma

59b5752

Co-authored-by: Stephan Hageboeck

eguiraud mentioned this pull request Sep 28, 2021

Implement an RDataFrame::Merge(RDataFrame &df) function. #9030

Closed

eguiraud merged commit f145169 into root-project:master Sep 28, 2021

eguiraud deleted the df-definepersample branch September 28, 2021 15:39
