Skip to content

Add N-dimensional and more flexible histogram filling to RDataFrame#7499

Merged
eguiraud merged 8 commits intoroot-project:masterfrom
bendavid:rdf_ndhist
Dec 16, 2021
Merged

Add N-dimensional and more flexible histogram filling to RDataFrame#7499
eguiraud merged 8 commits intoroot-project:masterfrom
bendavid:rdf_ndhist

Conversation

@bendavid
Copy link
Copy Markdown
Contributor

@bendavid bendavid commented Mar 14, 2021

This PR does a few things

  1. Extends the RDF Fill functionality to support arbitrary types and number of columns and an arbitrary mix of individual objects and containers with variadic templates.

Note that this will likely result in slower code being generated in case of compiling/jitting without optimization.

  1. Adds a HistoND function to RDF to fill a THnD with arbitrary number of dimensions

The main issue here was actually that THnT does not have a publicly accessible copy constructor or assignment operator, which are needed for use with RDF. I didn't have the patience to implement this by hand for all the classes in the inheritance chain, so the relevant classes have been migrated from C-style arrays to std::vector such that default copy (and move) constructors and assignment operators can be automatically generated.

  1. Appropriate constructors have been added to allow THnT to be used with variable binning.

@phsft-bot
Copy link
Copy Markdown

Can one of the admins verify this patch?

@eguiraud eguiraud self-assigned this Mar 15, 2021
@eguiraud
Copy link
Copy Markdown
Contributor

Hi @bendavid , thank you for the contribution, I will take a look as soon as possible

@eguiraud
Copy link
Copy Markdown
Contributor

We need @lmoneta to review the changes in THn and TNDArray

@eguiraud
Copy link
Copy Markdown
Contributor

@bendavid about a), a move to C++14 for ROOT6 is foreseen but the timescale is too long to matter for this PR, I'm afraid.

Copy link
Copy Markdown
Member

@lmoneta lmoneta left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. I leave to @Axel-Naumann author of the THn classes to give the final approval.

Copy link
Copy Markdown
Contributor

@eguiraud eguiraud left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great!

  • addition of a copy-constructor in THn: good (a move-constructor should also be autogenerated right?)
  • generalization of the different FillParHelper::Exec overloads into a single implementation: good, I actually find it pretty readable considering the inherent complexity of what it is doing. No fold expressions and constexpr if unfortunately, not much we can do about that
  • addition of a FillN method to RDF: great, certainly useful and it even reuses a lot of the existing infrastructure

As mentioned above, we need @Axel-Naumann to review the THn-related changes. However, I don't know if the "temporary hack" to allow copy-construction of THn's is really temporary, what are the drawbacks?

See below a bunch of minor comments.

Only strictly required change: we need a test for FillN (jitted, non-jitted, single-thread, multi-thread): it should be easy to duplicate and adapt one of the tests we have for Fill.

@eguiraud
Copy link
Copy Markdown
Contributor

(will start a PR build after merge conflicts are resolved)

@bendavid
Copy link
Copy Markdown
Contributor Author

bendavid commented Mar 24, 2021

The main drawback of the "temporary hack" is that there is some duplication of information and operations now between the size of the vector and separate integer data members in the class which were previously needed to keep track of the size of the arrays. If migrating from bare arrays to std::vector to allow the compiler to autogenerate the constructors is the preferred solution (rather than implementing them by hand) then I can of course clean this up.

For what concerns the readability of the code in C++11, one thing which could help a little bit and might be feasible is if std::apply were added to the STL backports.

Another concern I had is that this implementation might be a bit slower than the previous individual cases if compiled without optimization (e.g. when jitted).

@oshadura
Copy link
Copy Markdown
Collaborator

@bendavid whenever you will have time can you please rebase on master? I will retrigger CI after...

@eguiraud
Copy link
Copy Markdown
Contributor

@Axel-Naumann is it ok to switch THn to std::vectors instead of C-style arrays as it's done in this PR? We get compiler-defined constructors out of it.

@bendavid can you please rebase on master to fix the conflicts when you have time?

@bendavid
Copy link
Copy Markdown
Contributor Author

Yes I would like some feedback on the use of std::vector so I can take care of that appropriately together with rebasing. If this strategy is agreed I can also clean up a bit more to avoid the redundant size counters, etc.

Copy link
Copy Markdown
Member

@Axel-Naumann Axel-Naumann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The THn part LGTM! A couple of minor comments, but I'm happy with the change from C-style array to vector.

⚠️ NOTE that this change means we need to increase the ClassDef version for all classes with changes base or data members! This should happen before this PR gets merged. Assuming that: ✔️

@bendavid bendavid changed the title [RFC] Add N-dimensional and more flexible histogram filling to RDataFrame Add N-dimensional and more flexible histogram filling to RDataFrame Nov 11, 2021
@bendavid
Copy link
Copy Markdown
Contributor Author

Alright I've rebased and hopefully addressed all of the comments.

@bendavid bendavid force-pushed the rdf_ndhist branch 3 times, most recently from 9af8629 to b07914a Compare November 11, 2021 01:40
@bendavid
Copy link
Copy Markdown
Contributor Author

There's something weird going on with clang-format here, where running locally seems to give a different result from the online checks for e.g.

void Exec(unsigned int slot, const Xs &...xs)

vs

void Exec(unsigned int slot, const Xs &... xs)

@eguiraud
Copy link
Copy Markdown
Contributor

Thanks Josh, I'll do one last review pass as soon as I have spare cycles. Re: clang-format, probably different versions disagree on some edge cases, you can ignore the CI complaints.

@eguiraud
Copy link
Copy Markdown
Contributor

@phsft-bot build

@phsft-bot
Copy link
Copy Markdown

Starting build on ROOT-debian10-i386/cxx14, ROOT-performance-centos8-multicore/default, ROOT-ubuntu16/nortcxxmod, ROOT-ubuntu2004/soversion, mac1014/python3, mac11.0/cxx17, windows10/cxx14
How to customize builds

@phsft-bot
Copy link
Copy Markdown

Build failed on mac11.0/cxx17.
See console output.

@bendavid
Copy link
Copy Markdown
Contributor Author

Small update to this to make the new Fill method for THnBase also work with integers (ie allowing the normal set of implicit type conversions, rather than excluding narrowing ones).

@eguiraud
Copy link
Copy Markdown
Contributor

@phsft-bot build

@phsft-bot
Copy link
Copy Markdown

Starting build on ROOT-debian10-i386/cxx14, ROOT-performance-centos8-multicore/default, ROOT-ubuntu16/nortcxxmod, ROOT-ubuntu2004/soversion, mac1015/python3, mac11/cxx17, windows10/cxx14
How to customize builds

@phsft-bot
Copy link
Copy Markdown

Build failed on ROOT-ubuntu2004/soversion.
See console output.

Copy link
Copy Markdown
Contributor

@eguiraud eguiraud left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Green for me if green for jenkins, thank you very much @bendavid , this is great

@phsft-bot
Copy link
Copy Markdown

Build failed on ROOT-ubuntu16/nortcxxmod.
Running on sft-ubuntu-1604-2.cern.ch:/build/workspace/root-pullrequests-build
See console output.

Failing tests:

@phsft-bot
Copy link
Copy Markdown

Build failed on ROOT-performance-centos8-multicore/default.
Running on olbdw-01.cern.ch:/data/sftnight/workspace/root-pullrequests-build
See console output.

Failing tests:

@phsft-bot
Copy link
Copy Markdown

Build failed on ROOT-debian10-i386/cxx14.
Running on pcepsft11.dyndns.cern.ch:/home/sftnight/build/workspace/root-pullrequests-build
See console output.

Warnings:

  • [2021-12-15T15:56:58.236Z] /home/sftnight/build/workspace/root-pullrequests-build/root/hist/hist/src/THnBase.cxx:71:24: warning: format ‘%lu’ expects argument of type ‘long unsigned int’, but argument 5 has type ‘std::vector<std::vector<double> >::size_type’ {aka ‘unsigned int’} [-Wformat=]
  • [2021-12-15T15:56:58.236Z] /home/sftnight/build/workspace/root-pullrequests-build/root/hist/hist/src/THnBase.cxx:76:27: warning: format ‘%lu’ expects argument of type ‘long unsigned int’, but argument 5 has type ‘std::vector<double>::size_type’ {aka ‘unsigned int’} [-Wformat=]

@phsft-bot
Copy link
Copy Markdown

Build failed on mac11/cxx17.
Running on macphsft23.dyndns.cern.ch:/Users/sftnight/build/workspace/root-pullrequests-build
See console output.

Failing tests:

@phsft-bot
Copy link
Copy Markdown

Build failed on mac1015/python3.
Running on macitois22.dyndns.cern.ch:/Users/sftnight/build/workspace/root-pullrequests-build
See console output.

Failing tests:

@phsft-bot
Copy link
Copy Markdown

Build failed on windows10/cxx14.
Running on null:C:\build\workspace\root-pullrequests-build
See console output.

Failing tests:

@eguiraud
Copy link
Copy Markdown
Contributor

eguiraud commented Dec 15, 2021

Uuuh is Jenkins not picking up the latest roottest? Will take a look tomorrow afternoon.

@eguiraud
Copy link
Copy Markdown
Contributor

@phsft-bot build

@phsft-bot
Copy link
Copy Markdown

Starting build on ROOT-debian10-i386/cxx14, ROOT-performance-centos8-multicore/default, ROOT-ubuntu16/nortcxxmod, ROOT-ubuntu2004/soversion, mac1015/python3, mac11/cxx17, windows10/cxx14
How to customize builds

@eguiraud
Copy link
Copy Markdown
Contributor

@Axel-Naumann
Copy link
Copy Markdown
Member

@phsft-bot build
Should be fixed (caused by server migration)

@phsft-bot
Copy link
Copy Markdown

Starting build on ROOT-debian10-i386/cxx14, ROOT-performance-centos8-multicore/default, ROOT-ubuntu16/nortcxxmod, ROOT-ubuntu2004/soversion, mac1015/python3, mac11/cxx17, windows10/cxx14
How to customize builds

@phsft-bot
Copy link
Copy Markdown

Build failed on ROOT-debian10-i386/cxx14.
Running on pcepsft11.dyndns.cern.ch:/home/sftnight/build/workspace/root-pullrequests-build
See console output.

Warnings:

  • [2021-12-16T09:45:30.650Z] /home/sftnight/build/workspace/root-pullrequests-build/root/hist/hist/src/THnBase.cxx:71:24: warning: format ‘%lu’ expects argument of type ‘long unsigned int’, but argument 5 has type ‘std::vector<std::vector<double> >::size_type’ {aka ‘unsigned int’} [-Wformat=]
  • [2021-12-16T09:45:30.650Z] /home/sftnight/build/workspace/root-pullrequests-build/root/hist/hist/src/THnBase.cxx:76:27: warning: format ‘%lu’ expects argument of type ‘long unsigned int’, but argument 5 has type ‘std::vector<double>::size_type’ {aka ‘unsigned int’} [-Wformat=]

@phsft-bot
Copy link
Copy Markdown

Build failed on mac1015/python3.
Running on macitois21.dyndns.cern.ch:/Users/sftnight/build/workspace/root-pullrequests-build
See console output.

Failing tests:

@phsft-bot
Copy link
Copy Markdown

Build failed on windows10/cxx14.
Running on null:C:\build\workspace\root-pullrequests-build
See console output.

Failing tests:

@eguiraud
Copy link
Copy Markdown
Contributor

@bendavid the remaining failures are real (ignoring tmva), can you please take a look?

@bendavid
Copy link
Copy Markdown
Contributor Author

Should be fixed now. The ND histogram test was using a huge number of bins by mistake.

@eguiraud
Copy link
Copy Markdown
Contributor

@phsft-bot build

@phsft-bot
Copy link
Copy Markdown

Starting build on ROOT-debian10-i386/cxx14, ROOT-performance-centos8-multicore/default, ROOT-ubuntu16/nortcxxmod, ROOT-ubuntu2004/soversion, mac1015/python3, mac11/cxx17, windows10/cxx14
How to customize builds

@phsft-bot
Copy link
Copy Markdown

Build failed on windows10/cxx14.
Running on null:C:\build\workspace\root-pullrequests-build
See console output.

Errors:

  • [2021-12-16T15:19:07.801Z] C:\build\workspace\root-pullrequests-build\root\core\imt\src\TTaskGroup.cxx(18,10): fatal error C1083: Cannot open include file: 'tbb/task_group.h': No such file or directory [C:\build\workspace\root-pullrequests-build\build\core\imt\Imt.vcxproj]
  • [2021-12-16T15:19:08.392Z] C:\build\workspace\root-pullrequests-build\root\core\imt\src\ROpaqueTaskArena.hxx(1,10): fatal error C1083: Cannot open include file: 'tbb/task_arena.h': No such file or directory [C:\build\workspace\root-pullrequests-build\build\core\imt\Imt.vcxproj]
  • [2021-12-16T15:19:08.392Z] C:\build\workspace\root-pullrequests-build\root\core\imt\src\ROpaqueTaskArena.hxx(1,10): fatal error C1083: Cannot open include file: 'tbb/task_arena.h': No such file or directory [C:\build\workspace\root-pullrequests-build\build\core\imt\Imt.vcxproj]

@eguiraud
Copy link
Copy Markdown
Contributor

Windows failure is unrelated, merging.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants