Skip to content

[DF] Add DefinePerSample #8841

Merged
eguiraud merged 14 commits intoroot-project:masterfrom
eguiraud:df-definepersample
Sep 28, 2021
Merged

[DF] Add DefinePerSample #8841
eguiraud merged 14 commits intoroot-project:masterfrom
eguiraud:df-definepersample

Conversation

@eguiraud
Copy link
Copy Markdown
Contributor

@eguiraud eguiraud commented Aug 13, 2021

This patch adds df.DefinePerSample, a method that lets user define
new columns that are only updated per "data-block" rather than per
entry, where a "data-block" is made of several entries that have the
same data-block ID (e.g. that belong to the same TTree in a TChain).

The data-block ID is passed as an argument to the callback, so that
quantities can be defined based on the sample being processed.

Currently a jitted version is not available and RDataSources have
no way to hook into the mechanism (they get one data-block per task
with empty data-block ID). Support for these cases will be added by
later commits.

This resolves #6745.

This PR should make @stwunsch happy.

To do:

  • test RDataBlockID with entry ranges
  • naming: RDataBlockID -> RDataBlockInfo? DefinePerSample -> DefinePerDataBlock?
  • add support for jitted df.DefinePerSample("myconstant", "rdfdatablock_.Contains(\"MC\") ? 42. : 8.")
  • add release notes

@eguiraud eguiraud self-assigned this Aug 13, 2021
@eguiraud eguiraud force-pushed the df-definepersample branch from 8331e14 to 2b0e122 Compare August 16, 2021 13:13
@phsft-bot
Copy link
Copy Markdown

Starting build on ROOT-debian10-i386/cxx14, ROOT-performance-centos8-multicore/default, ROOT-ubuntu16/nortcxxmod, mac1014/python3, mac11.0/cxx17, windows10/cxx14
How to customize builds

@root-project root-project deleted a comment from phsft-bot Aug 16, 2021
@root-project root-project deleted a comment from phsft-bot Aug 16, 2021
@root-project root-project deleted a comment from phsft-bot Aug 16, 2021
@root-project root-project deleted a comment from phsft-bot Aug 16, 2021
@root-project root-project deleted a comment from phsft-bot Aug 16, 2021
@root-project root-project deleted a comment from phsft-bot Aug 16, 2021
@root-project root-project deleted a comment from phsft-bot Aug 16, 2021
@phsft-bot
Copy link
Copy Markdown

Starting build on ROOT-debian10-i386/cxx14, ROOT-performance-centos8-multicore/default, ROOT-ubuntu16/nortcxxmod, mac1014/python3, mac11.0/cxx17, windows10/cxx14
How to customize builds

@eguiraud eguiraud marked this pull request as ready for review September 1, 2021 11:50
@eguiraud eguiraud requested a review from bellenot as a code owner September 1, 2021 11:50
@eguiraud eguiraud requested review from Axel-Naumann and etejedor and removed request for bellenot September 1, 2021 11:50
This commit adds the infrastructure needed to pass a data-block id
as well as the slot number to data-block callbacks.

At this point we still never actually _set_ the data-block id
argument to something meaningful (see next commits).
This leaves out the case of RDataSources, for which data-block IDs
will always be empty at the moment.
This patch adds `df.DefinePerSample`, a method that lets user define
new columns that are only updated per "data-block" rather than per
entry, where a "data-block" is made of several entries that have the
same data-block ID (e.g. that belong to the same TTree in a TChain).

The data-block ID is passed as an argument to the callback, so that
quantities can be defined based on the sample being processed.

Currently a jitted version is not available and RDataSources have
no way to hook into the mechanism (they get one data-block per task
with empty data-block ID). Support for these cases will be added by
later commits.

This resolves root-project#6745.
@phsft-bot
Copy link
Copy Markdown

Build failed on ROOT-performance-centos8-multicore/default.
Running on olbdw-01.cern.ch:/data/sftnight/workspace/root-pullrequests-build
See console output.

Failing tests:

@phsft-bot
Copy link
Copy Markdown

Build failed on mac11.0/cxx17.
Running on macphsft20.dyndns.cern.ch:/Users/sftnight/build/workspace/root-pullrequests-build
See console output.

Failing tests:

Copy link
Copy Markdown
Member

@Axel-Naumann Axel-Naumann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice! Is this PR is changing a public and possibly used interface? Is this a conscious decision? If so, please update the release notes calling out this change.

@eguiraud
Copy link
Copy Markdown
Contributor Author

eguiraud commented Sep 1, 2021

Is this PR is changing a public and possibly used interface? Is this a conscious decision?

I guess you refer to the change in signature of the callbacks (from unsigned int to unsigned int, RDataBlockId&): that interface was never released.

Rename:
- DataBlockCallback_t -> SampleCallback_t
- RDataBlockID -> RSampleInfo
- RDataBlockNotifier -> RNewSampleNotifier
- RDataBlockFlag -> RNewSampleFlag
- GetDataBlockCallback -> GetSampleCallback

etc.
@phsft-bot
Copy link
Copy Markdown

Starting build on ROOT-debian10-i386/cxx14, ROOT-performance-centos8-multicore/default, ROOT-ubuntu16/nortcxxmod, mac1014/python3, mac11.0/cxx17, windows10/cxx14
How to customize builds

@eguiraud
Copy link
Copy Markdown
Contributor Author

eguiraud commented Sep 7, 2021

@Axel-Naumann I was wrong, RActionImpl::GetDataBlockCallback was released in v6.24 and this PR will potentially break user-defined action helpers that use the feature (it's likely that none exist, but still...). I'll try to think of a deprecation strategy to move users to the new callback signature and the new name, if I can't come up with anything I'll add the breaking change to the release notes -- this is an expert feature with rare applications outside of RDF internals, if at all.

@phsft-bot
Copy link
Copy Markdown

Build failed on windows10/cxx14.
Running on null:C:\build\workspace\root-pullrequests-build
See console output.

Errors:

  • [2021-09-07T17:24:55.116Z] CMake Error at cmake/modules/RootBuildOptions.cmake:407 (message):
  • [2021-09-07T17:24:55.116Z] CMake Error at C:/build/workspace/root-pullrequests-build/rootspi/jenkins/root-build.cmake:1117 (message):

@phsft-bot
Copy link
Copy Markdown

Build failed on mac11.0/cxx17.
Running on macphsft20.dyndns.cern.ch:/Users/sftnight/build/workspace/root-pullrequests-build
See console output.

Failing tests:

@phsft-bot
Copy link
Copy Markdown

Starting build on ROOT-debian10-i386/cxx14, ROOT-performance-centos8-multicore/default, ROOT-ubuntu16/nortcxxmod, mac1014/python3, mac11.0/cxx17, windows10/cxx14
How to customize builds

@phsft-bot
Copy link
Copy Markdown

Build failed on windows10/cxx14.
Running on null:C:\build\workspace\root-pullrequests-build
See console output.

Errors:

  • [2021-09-08T16:16:48.753Z] CMake Error at cmake/modules/RootBuildOptions.cmake:407 (message):
  • [2021-09-08T16:16:48.753Z] CMake Error at C:/build/workspace/root-pullrequests-build/rootspi/jenkins/root-build.cmake:1117 (message):

@phsft-bot
Copy link
Copy Markdown

Build failed on ROOT-ubuntu16/nortcxxmod.
Running on sft-ubuntu-1604-3.cern.ch:/build/workspace/root-pullrequests-build
See console output.

Errors:

  • [2021-09-08T16:17:14.719Z] FAILED: /usr/bin/ccache /usr/bin/c++ -DVECCORE_ENABLE_VC -I/mnt/build/workspace/root-pullrequests-build/root/tree/dataframe/inc -I/mnt/build/workspace/root-pullrequests-build/root/core/unix/inc -I/mnt/build/workspace/root-pullrequests-build/root/core/foundation/v7/inc -I/mnt/build/workspace/root-pullrequests-build/root/core/base/v7/inc -I/mnt/build/workspace/root-pullrequests-build/root/core/clingutils/inc -I/mnt/build/workspace/root-pullrequests-build/root/core/textinput/inc -I/mnt/build/workspace/root-pullrequests-build/root/core/thread/inc -I/mnt/build/workspace/root-pullrequests-build/root/core/zip/inc -I/mnt/build/workspace/root-pullrequests-build/root/core/rint/inc -I/mnt/build/workspace/root-pullrequests-build/root/core/clib/inc -I/mnt/build/workspace/root-pullrequests-build/root/core/meta/inc -I/mnt/build/workspace/root-pullrequests-build/root/core/gui/inc -I/mnt/build/workspace/root-pullrequests-build/root/core/cont/inc -I/mnt/build/workspace/root-pullrequests-build/root/core/foundation/inc -I/mnt/build/workspace/root-pullrequests-build/root/core/base/inc -Iginclude -I/mnt/build/workspace/root-pullrequests-build/root/tree/tree/inc -I/mnt/build/workspace/root-pullrequests-build/root/core/imt/inc -I/mnt/build/workspace/root-pullrequests-build/root/core/multiproc/inc -I/mnt/build/workspace/root-pullrequests-build/root/math/mathcore/inc -I/mnt/build/workspace/root-pullrequests-build/root/math/mathcore/v7/inc -Iexternals/mnt/build/workspace/root-pullrequests-build/install/include -I/mnt/build/workspace/root-pullrequests-build/root/tree/treeplayer/inc -I/mnt/build/workspace/root-pullrequests-build/root/hist/hist/inc -I/mnt/build/workspace/root-pullrequests-build/root/math/matrix/inc -I/mnt/build/workspace/root-pullrequests-build/root/math/vecops/inc -Itree/dataframe/test -I/mnt/build/workspace/root-pullrequests-build/root/net/net/inc -I/mnt/build/workspace/root-pullrequests-build/root/io/io/inc -I/mnt/build/workspace/root-pullrequests-build/root/graf2d/gpad/inc -I/mnt/build/workspace/root-pullrequests-build/root/graf2d/graf/inc -I/mnt/build/workspace/root-pullrequests-build/root/graf3d/g3d/inc -I/mnt/build/workspace/root-pullrequests-build/root/test/unit_testing_support -isystem googletest-prefix/src/googletest/googletest/include -isystem googletest-prefix/src/googletest/googlemock/include -fdiagnostics-color=always -std=c++14 -pipe -Wshadow -Wall -W -Woverloaded-virtual -fsigned-char -pthread -O3 -std=c++14 -MD -MT tree/dataframe/test/CMakeFiles/dataframe_definepersample.dir/dataframe_definepersample.cxx.o -MF tree/dataframe/test/CMakeFiles/dataframe_definepersample.dir/dataframe_definepersample.cxx.o.d -o tree/dataframe/test/CMakeFiles/dataframe_definepersample.dir/dataframe_definepersample.cxx.o -c /mnt/build/workspace/root-pullrequests-build/root/tree/dataframe/test/dataframe_definepersample.cxx
  • [2021-09-08T16:17:14.719Z] /mnt/build/workspace/root-pullrequests-build/root/tree/dataframe/test/dataframe_definepersample.cxx:57:30: error: use of deleted function ‘std::atomic<int>::atomic(const std::atomic<int>&)’
  • [2021-09-08T16:17:14.719Z] /mnt/build/workspace/root-pullrequests-build/root/tree/dataframe/test/dataframe_definepersample.cxx:77:30: error: use of deleted function ‘std::atomic<int>::atomic(const std::atomic<int>&)’
  • [2021-09-08T16:17:14.994Z] /mnt/build/workspace/root-pullrequests-build/root/tree/dataframe/test/dataframe_definepersample.cxx:97:30: error: use of deleted function ‘std::atomic<int>::atomic(const std::atomic<int>&)’

@phsft-bot
Copy link
Copy Markdown

Starting build on ROOT-debian10-i386/cxx14, ROOT-performance-centos8-multicore/default, ROOT-ubuntu16/nortcxxmod, mac1014/python3, mac11.0/cxx17, windows10/cxx14
How to customize builds

1 similar comment
@phsft-bot
Copy link
Copy Markdown

Starting build on ROOT-debian10-i386/cxx14, ROOT-performance-centos8-multicore/default, ROOT-ubuntu16/nortcxxmod, mac1014/python3, mac11.0/cxx17, windows10/cxx14
How to customize builds

@phsft-bot
Copy link
Copy Markdown

Build failed on mac11.0/cxx17.
Running on macphsft20.dyndns.cern.ch:/Users/sftnight/build/workspace/root-pullrequests-build
See console output.

Failing tests:

@phsft-bot
Copy link
Copy Markdown

Starting build on ROOT-debian10-i386/cxx14, ROOT-performance-centos8-multicore/default, ROOT-ubuntu16/nortcxxmod, mac1014/python3, mac11.0/cxx17, windows10/cxx14
How to customize builds

@phsft-bot
Copy link
Copy Markdown

Build failed on mac11.0/cxx17.
Running on macphsft23.dyndns.cern.ch:/Users/sftnight/build/workspace/root-pullrequests-build
See console output.

Failing tests:

return s.str();
}

// Book the jitting of a Filter call
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was this supposed to be doxygen readable? --> ///

Copy link
Copy Markdown
Contributor Author

@eguiraud eguiraud Sep 28, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nope (EDIT: i mean, it could be, but this is function is so internal that this was really meant as a normal code comment for developers reading the code)

Co-authored-by: Stephan Hageboeck <[email protected]>
@phsft-bot
Copy link
Copy Markdown

Starting build on ROOT-debian10-i386/cxx14, ROOT-performance-centos8-multicore/default, ROOT-ubuntu16/nortcxxmod, mac1014/python3, mac11.0/cxx17, windows10/cxx14
How to customize builds

@phsft-bot
Copy link
Copy Markdown

Build failed on mac11.0/cxx17.
Running on macphsft23.dyndns.cern.ch:/Users/sftnight/build/workspace/root-pullrequests-build
See console output.

Failing tests:

@eguiraud eguiraud merged commit f145169 into root-project:master Sep 28, 2021
@eguiraud eguiraud deleted the df-definepersample branch September 28, 2021 15:39
@eguiraud
Copy link
Copy Markdown
Contributor Author

eguiraud commented Sep 28, 2021

@hageboeck I will address those two NFC changes in an upcoming PR together with other small NFC things, did not want to trigger jenkins again :)

@hageboeck
Copy link
Copy Markdown
Member

Sure, whatever works best!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add RDataFrame::DefinePerSample

4 participants