[DF] Add DefinePerSample #8841
8331e14 to 2b0e122
Starting build on
2b0e122 to 64242a3
Starting build on
This commit adds the infrastructure needed to pass a data-block ID, as well as the slot number, to data-block callbacks. At this point we still never actually _set_ the data-block ID argument to anything meaningful (see the next commits).
This leaves out the case of RDataSources, for which data-block IDs will always be empty at the moment.
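The callback infrastructure described above (callbacks receiving both the slot number and the data-block ID) can be illustrated with a minimal, self-contained C++17 sketch. All names here (`BlockCallback`, `BlockDispatcher`, `OnEntry`) are hypothetical and are not the classes introduced by this PR. As a simplification, the sketch detects a new data-block by comparing IDs entry by entry; the PR's actual detection mechanism is not shown. An empty ID, as RDataSources currently produce, is treated like any other ID.

```cpp
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical callback type: receives the processing slot and the ID of the
// data-block being processed (e.g. identifying one TTree of a TChain).
using BlockCallback = std::function<void(unsigned int slot, const std::string &blockId)>;

// Hypothetical dispatcher: remembers, per slot, which data-block is being
// processed and invokes every registered callback whenever that changes.
class BlockDispatcher {
public:
   explicit BlockDispatcher(unsigned int nSlots) : fCurrentId(nSlots), fHasBlock(nSlots, false) {}

   void Register(BlockCallback cb) { fCallbacks.push_back(std::move(cb)); }

   // Called once per entry by a (hypothetical) event loop.
   void OnEntry(unsigned int slot, const std::string &blockId)
   {
      if (fHasBlock[slot] && fCurrentId[slot] == blockId)
         return; // same data-block as the previous entry in this slot: nothing to do
      fHasBlock[slot] = true;
      fCurrentId[slot] = blockId;
      for (auto &cb : fCallbacks)
         cb(slot, blockId); // a new data-block started in this slot
   }

private:
   std::vector<BlockCallback> fCallbacks;
   std::vector<std::string> fCurrentId; // last data-block ID seen, per slot
   std::vector<bool> fHasBlock;         // whether the slot has seen any entry yet
};
```

Because the state is kept per slot, each thread can switch data-blocks independently and no locking is needed inside `OnEntry`, as long as each slot is only used by one thread at a time.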
This patch adds `df.DefinePerSample`, a method that lets users define new columns that are only updated per "data-block" rather than per entry, where a "data-block" is made of several entries that have the same data-block ID (e.g. that belong to the same TTree in a TChain). The data-block ID is passed as an argument to the callback, so that quantities can be defined based on the sample being processed. Currently a jitted version is not available and RDataSources have no way to hook into the mechanism (they get one data-block per task with an empty data-block ID). Support for these cases will be added by later commits. This resolves root-project#6745.
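To make the per-data-block semantics concrete, here is a minimal C++17 sketch of a column whose value is computed once per data-block (and per slot) and then reused for every entry of that block. `PerSampleColumn`, `Get` and `NEvaluations` are hypothetical names, not the ROOT API, and the data-block ID is modelled as a plain `std::string`. The weighting rule mirrors the `Contains("MC") ? 42. : 8.` example from the PR's to-do list.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical "per-sample" column: the user expression is evaluated only when
// a slot starts a new data-block; the cached result is returned for every
// other entry of that block.
template <typename T>
class PerSampleColumn {
public:
   using Expr = std::function<T(unsigned int slot, const std::string &blockId)>;

   PerSampleColumn(unsigned int nSlots, Expr expr)
      : fExpr(std::move(expr)), fValues(nSlots), fIds(nSlots), fValid(nSlots, false), fNEvaluations(0) {}

   // Returns the column value for an entry that belongs to `blockId`.
   const T &Get(unsigned int slot, const std::string &blockId)
   {
      if (!fValid[slot] || fIds[slot] != blockId) {
         fValues[slot] = fExpr(slot, blockId); // recompute once per data-block
         fIds[slot] = blockId;
         fValid[slot] = true;
         ++fNEvaluations;
      }
      return fValues[slot];
   }

   // How many times the user expression has actually been called.
   std::size_t NEvaluations() const { return fNEvaluations; }

private:
   Expr fExpr;
   std::vector<T> fValues;          // cached value, per slot
   std::vector<std::string> fIds;   // data-block ID the cached value belongs to
   std::vector<bool> fValid;        // whether the slot has a cached value
   std::size_t fNEvaluations;
};
```

For example, `PerSampleColumn<double> weight(nSlots, [](unsigned int, const std::string &id) { return id.find("MC") != std::string::npos ? 42. : 8.; });` calls the lambda only when a slot moves on to a new file, not for every entry.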
Build failed on ROOT-performance-centos8-multicore/default. Failing tests:
Build failed on mac11.0/cxx17. Failing tests:
I guess you refer to the change in signature of the callbacks (from
Rename:
- `DataBlockCallback_t` -> `SampleCallback_t`
- `RDataBlockID` -> `RSampleInfo`
- `RDataBlockNotifier` -> `RNewSampleNotifier`
- `RDataBlockFlag` -> `RNewSampleFlag`
- `GetDataBlockCallback` -> `GetSampleCallback`
- etc.
Starting build on
@Axel-Naumann I was wrong,
Build failed on windows10/cxx14. Errors:
Build failed on mac11.0/cxx17. Failing tests:
Starting build on
Build failed on windows10/cxx14. Errors:
Build failed on ROOT-ubuntu16/nortcxxmod. Errors:
Starting build on
Build failed on mac11.0/cxx17. Failing tests:
GetSampleCallback is the superior alternative.
66b6b59 to df5513a
Starting build on
Build failed on mac11.0/cxx17. Failing tests:
       return s.str();
    }

    // Book the jitting of a Filter call
Was this supposed to be Doxygen-readable (i.e. `///`)?
nope (EDIT: I mean, it could be, but this function is so internal that this was really meant as a normal code comment for developers reading the code)
Co-authored-by: Stephan Hageboeck <[email protected]>
Starting build on
Build failed on mac11.0/cxx17. Failing tests:
@hageboeck I will address those two NFC changes in an upcoming PR together with other small NFC things; I did not want to trigger Jenkins again :)
Sure, whatever works best!
This patch adds `df.DefinePerSample`, a method that lets users define new columns that are only updated per "data-block" rather than per entry, where a "data-block" is made of several entries that have the same data-block ID (e.g. that belong to the same TTree in a TChain). The data-block ID is passed as an argument to the callback, so that quantities can be defined based on the sample being processed. Currently a jitted version is not available and RDataSources have no way to hook into the mechanism (they get one data-block per task with an empty data-block ID). Support for these cases will be added by later commits.
This resolves #6745.
This PR should make @stwunsch happy.
To do:
- `RDataBlockID` with entry ranges
- `RDataBlockID` -> `RDataBlockInfo`?
- `DefinePerSample` -> `DefinePerDataBlock`?
- `df.DefinePerSample("myconstant", "rdfdatablock_.Contains(\"MC\") ? 42. : 8.")`