RooFit::MultiProcess & TestStatistics part 6: add RooFit::TestStatistics by egpbos · Pull Request #8700 · root-project/root

egpbos · 2021-07-19T14:57:45Z

Changes:

This PR introduces a major refactoring of the RooAbs(Opt)TestStatistic-RooNLLVar inheritance tree into:

statistics-based classes on the one hand;
calculation/evaluation/optimization based classes on the other hand.

The likelihood is the central unit on the statistics side. The RooAbsL class is implemented for four kinds of likelihoods: binned, unbinned, "subsidiary" (an optimization for numerical stability that gathers components like global observables) and "sum" (over multiple components of the other types). These classes provide ways to compute their components in parallelizable chunks that can be used by the calculator classes as they see fit. On top of the likelihood classes, we also provide for convenience a set of likelihood builders.

The calculator "...Wrapper" classes are abstract interfaces. These can be implemented for different kinds of algorithms, or with different kinds of optimization "back-ends" in mind. In an upcoming PR, we will introduce the fork-based multi-processing implementation based on RooFit::MultiProcess. Other possible implementations could use the GPU or external tools like TensorFlow.

The coupling of all these classes to RooMinimizer is made via the MinuitFcnGrad class, which owns the ...Wrappers that calculate the likelihood components.

Todo:

There are still a couple of things that require attention, which I hope the reviewers can help me with:

I have a Kahan summation helper class. This should probably be replaced with the new Kahan summation class in ROOT itself, but I didn't get to this yet. Maybe it is not urgent to do now, up for discussion.
likelihood_builders.cxx is still missing the top level entry point function. This should be quite a simple function: it should just check whether the pdf is binned, unbinned or sum and has subsidiary terms. All the building blocks are already present, so it is just a matter of building it, but didn't get to it yet.
optional_parameter_types.h: maybe we could replace this with @guitargeek's new configuration structs.
testRooRealL.getValRooAddition fails because it doesn't know RooFormulaVar... I'm not sure what this means, but I vaguely remember that (I think) @hageboeck and @cburgard were talking about this, so I was hoping you may be able to help me out with fixing that test.

Finally, one thing that in an ideal world I would have liked to do is add a serial implementation of the LikelihoodGradientWrapper class. Probably, we can reuse a lot of RooGradMinimizerFcn for this and it should be rather straightforward. However, I'm rather short on time right now, and these things almost always tend to take longer than one expects, so I'm hesitant to build it now. The advantage would be that we could add a test to this PR that covers all that is added, basically by just running a minimization. Note that I do have such a full-coverage test, but only for the MultiProcess minimization, where I implemented LikelihoodGradientWrapper in the parallel LikelihoodGradientJob class. So this test will come in "PR 7" (see overview). So, up for discussion, two choices: A: add "LikelihoodGradientSerial" class (based on RooGradMinimizerFcn) so we can add an integration test in this PR; B: wait for the integration test in PR 7.

hageboeck

Hello!

I understand it's a big thing, but unfortunately it's far from ready. Please excuse that the comments become shorter and shorter, but I literally went through 4,008 additions.

I now also realise that there is a very nice description of what's going on in the commit message, which should be findable in doxygen. I recommend to copy those paragraphs to a findable location in doxygen. The class documentation of the base classes would be one option or the (not yet existing) docstrings for the TestStatistics namespace.

A really big problem while maintaining RooFit was the lack of documentation in many critical places. I will hold you to the higher standard that I tried to establish, since that will help the future maintainers a lot. 🙂

Furthermore, there's some things that should be done "in the new way", i.e.

Almost no TODOs, FIXMEs, xxx HACK
No TIterator-style loops, because the range-based loops are now 30% faster, less leaks and easier to read
I don't accept code that's commented out and destined to rot, as nobody is compiling and checking it.
Includes from within ROOT should be in " ".
And the Kahan summation should be unified as you already anticipated.
There's code that has been modernised / sped up while this feature here was in development, and it seems that one should look at both worlds, and take the best of both. Committing it like this would set us back two years, losing possible improvements, e.g. due to vectorisation.

For hot loops, one should consider to either make them auto-vectorisable by e.g. using ROOT's KahanSum (but this requires that users compile ROOT with vectorisation support) or (better) to put them in Manos' mini computation library that gets compiled for multiple architecture. This will get the best speed on every platform. That's of course much more work, though ... 🙁

roofit/roofitcore/inc/RooMinimizer.h

roofit/roofitcore/inc/TestStatistics/LikelihoodGradientWrapper.h

roofit/roofitcore/inc/TestStatistics/LikelihoodSerial.h

roofit/roofitcore/test/TestStatistics/testLikelihoodSerial.cxx

egpbos · 2021-07-21T09:36:56Z

@hageboeck thanks for another Herculean reviewing effort :) Your comments make total sense; indeed, all the copy-pasted stuff still has to be merged with all the modernization and optimization work that was done in the past two years.

I will go through as much of your suggestions as I can before I will be on leave after tomorrow for three weeks.

After that, if anyone else has time to work on some of the issues, I'd of course welcome the help. As you know, my time on the project is running out, so I probably won't have time to integrate everything. For instance, Manos' mini computation library I have only learned about in the last few months and don't know it in enough detail to make any kind of sensible estimate of how to do it, let alone how much time this would take.

Let's discuss detailed plans in our meeting at 14:00 today.

@hageboeck

Based on review by @hageboeck in PR root-project#8700. - Cleaned remaining TODOs, commented out code, etc. - Replaced TIterator and RooFIter with range-based for-loops. - Include cleanup. - Replace raw with unique_ptrs where possible, also thereby fixing many memory leaks in tests. - Remove unused functions (processEmptyDataSets) - Use auto. Note: work in progress. Please amend further modernization work to this commit (and remove this WIP notice).

egpbos · 2021-07-22T15:07:43Z

I went through all comments, closed the ones I fixed and left some further comments for follow up discussion. The fixes I made so far are summarized in the commit message. The obvious big things still are KahanSum and adding lots of documentation. I will do these (and the other remaining todo's in the first post above this PR, #8700 (comment)) after my holidays.

egpbos · 2021-09-01T13:36:30Z

During our meeting just now we decided that the highest priority issues for me to work on the coming days are:

Documentation
The top-level likelihood builder interface / PDF analysis (together with @wverkerke).

If time permits, after that I will try to harmonize the Kahan sum usages and the other remaining issues, but we may need to postpone these to another PR.

@hageboeck

Based on review by @hageboeck in PR root-project#8700. - Cleaned remaining TODOs, commented out code, etc. - Replaced TIterator and RooFIter with range-based for-loops. - Include cleanup. - Replace raw with unique_ptrs where possible, also thereby fixing many memory leaks in tests. - Remove unused functions (processEmptyDataSets) - Use auto. Note: work in progress. Please amend further modernization work to this commit (and remove this WIP notice).

phsft-bot · 2021-09-13T13:33:14Z

Build failed on ROOT-ubuntu16/nortcxxmod.
Running on sft-ubuntu-1604-2.cern.ch:/build/workspace/root-pullrequests-build
See console output.

Failing tests:

phsft-bot · 2021-09-13T14:28:54Z

Build failed on mac11.0/cxx17.
Running on macphsft23.dyndns.cern.ch:/Users/sftnight/build/workspace/root-pullrequests-build
See console output.

Failing tests:

phsft-bot · 2021-09-13T15:08:31Z

Build failed on windows10/cxx14.
Running on null:C:\build\workspace\root-pullrequests-build
See console output.

Failing tests:

egpbos · 2021-09-14T07:35:33Z

All remaining CI issues are not caused by this PR.

If someone wants to review again, feel free. If not, I think we can merge this now.

Note that I tried to keep commit history clean and used fixup commits + rebase to also make sure all commits compile and pass tests so it should be possible to merge without squashing. Since I made a few changes in the calculation area (Kahan sums, batch computation) which could affect precision and hence results (and hence tests) it would be nice if the commit history of this PR could be included fully.

lmoneta · 2021-09-14T09:23:19Z

@phsft-bot build with flags -DCTEST_TEST_EXCLUDE_NONE=On

phsft-bot · 2021-09-14T09:23:25Z

Starting build on ROOT-debian10-i386/cxx14, ROOT-performance-centos8-multicore/default, ROOT-ubuntu16/nortcxxmod, mac1014/python3, mac11.0/cxx17, windows10/cxx14 with flags -DCTEST_TEST_EXCLUDE_NONE=On
How to customize builds

lmoneta · 2021-09-14T09:24:50Z

Thank you Patrick for the update. I am testing it now again with all tests and tutorials

phsft-bot · 2021-09-14T10:17:00Z

Build failed on ROOT-ubuntu16/nortcxxmod.
Running on sft-ubuntu-1604-1.cern.ch:/build/workspace/root-pullrequests-build
See console output.

Failing tests:

phsft-bot · 2021-09-14T10:43:44Z

Build failed on mac11.0/cxx17.
Running on macphsft23.dyndns.cern.ch:/Users/sftnight/build/workspace/root-pullrequests-build
See console output.

Failing tests:

phsft-bot · 2021-09-14T11:40:30Z

Build failed on windows10/cxx14.
Running on null:C:\build\workspace\root-pullrequests-build
See console output.

Failing tests:

guitargeek · 2021-09-14T12:30:56Z

roofit/roofitcore/src/RooNLLVar.cxx

+}
+
+// static function, also used from TestStatistics::RooUnbinnedL
+RooNLLVar::ComputeResult RooNLLVar::computeBatchedFunc(const RooAbsPdf *pdfClone, RooAbsData *dataClone,


Suggested change

RooNLLVar::ComputeResult RooNLLVar::computeBatchedFunc(const RooAbsPdf *pdfClone, RooAbsData *dataClone,

RooNLLVar::ComputeResult RooNLLVar::computeBatchedFunc(const RooAbsPdf & pdf, RooAbsData & data,

It's not relevant that these are *Clone at this point, and it's better to use references and not pointers if they can't be null.

guitargeek · 2021-09-14T12:31:36Z

roofit/roofitcore/src/RooNLLVar.cxx

+}

+// static function, also used from TestStatistics::RooUnbinnedL
+RooNLLVar::ComputeResult RooNLLVar::computeScalarFunc(const RooAbsPdf *pdfClone, RooAbsData *dataClone,


Suggested change

RooNLLVar::ComputeResult RooNLLVar::computeScalarFunc(const RooAbsPdf *pdfClone, RooAbsData *dataClone,

RooNLLVar::ComputeResult RooNLLVar::computeScalarFunc(const RooAbsPdf & pdf, RooAbsData const& data,

guitargeek · 2021-09-14T12:41:17Z

roofit/roofitcore/inc/RooDataHist.h

+  // for access into copied dataset:
+  friend class RooFit::TestStatistics::RooAbsL;


The changes in this file are not necessary, RooAbsL is not using any non-public functions in RooDataHist.

Ah, weird. I thought this was necessary for RooAbsL::initClones, but maybe I've edited the dependency out.

guitargeek · 2021-09-14T12:49:14Z

roofit/roofitcore/inc/TestStatistics/MinuitFcnGrad.h

+MinuitFcnGrad *
+MinuitFcnGrad::create(const std::shared_ptr<RooFit::TestStatistics::RooAbsL> &likelihood, RooMinimizer *context,
+                      std::vector<ROOT::Fit::ParameterSettings> &parameters, bool verbose)
+{


Suggested change

MinuitFcnGrad *

MinuitFcnGrad::create(const std::shared_ptr<RooFit::TestStatistics::RooAbsL> &likelihood, RooMinimizer *context,

std::vector<ROOT::Fit::ParameterSettings> &parameters, bool verbose)

{

std::unique_ptr<MinuitFcnGrad>

MinuitFcnGrad::create(const std::shared_ptr<RooFit::TestStatistics::RooAbsL> &likelihood, RooMinimizer *context,

std::vector<ROOT::Fit::ParameterSettings> &parameters, bool verbose)

{

Can we return a smart pointer instead?

guitargeek

Thanks Patrick, this PR looks good as it is!

new test statistic classes are introduced to RooFit
review comment by @hageboeck were addressed
the things I don't like myself will be addressed by me in a follow up PR so @egpbos doesn't have to spent time on this he doesn't have
I don't agree that the test failures on Windows are necessarily unrelated to this PR, because the failures are related to RooFit. It seems the RooFit library is unexpectedly loaded now, because the RooFit banner appears in PyROOT even if RooFit is not used. But given that it's difficult to debug windows failures, this can be investigated after merging this PR (and maybe it's a good excuse to get rid of the banner).

Comments were addressed by Patrick

@hageboeck

Based on review by @hageboeck and @guitargeek in PR #8700. - Cleaned remaining TODOs, commented out code, etc. - Replaced TIterator and RooFIter with range-based for-loops. - Include cleanup. - Replace raw with unique_ptrs where possible, also thereby fixing many memory leaks in tests. - Remove unused functions (processEmptyDataSets) - Use auto. - Break circular dependency between RooMinimizer & MinuitFcnGrad * Also fixes some minor ensuing include errors and reorders RooMinimizer.cxx includes. - Some minor automatic style fixes and corrects to debugging/warning prints in RooAbsMinimizerFcn. - Fixes RooFormulaVar failure in RooRealL test, thanks to @guitargeek.

@hageboeck

Based on review by @hageboeck in PR #8700.

This is a shot in the dark to fix the following test failures in the nightly: ``` projectroot.roottest.python.cling.roottest_python_cling_class projectroot.roottest.python.cling.roottest_python_cling_api projectroot.roottest.root.meta.tclass.regression.roottest_root_meta_tclass_regression_execNormalizationInfPy projectroot.roottest.python.cling.roottest_python_cling_cling projectroot.roottest.root.meta.enumPayloadManipulation.roottest_root_meta_enumPayloadManipulation_checkEnumFwdDecl ``` For sure we know that these failures got introduced by root-project#8700. I narrowed down the origin of this regression to a small part of the diff of the full PR, and the `main()` function in `testLikelihoodSerial.cxx` is the most suspicious thing in that diff.

This is a shot in the dark to fix the following test failures in the nightly: ``` projectroot.roottest.python.cling.roottest_python_cling_class projectroot.roottest.python.cling.roottest_python_cling_api projectroot.roottest.root.meta.tclass.regression.roottest_root_meta_tclass_regression_execNormalizationInfPy projectroot.roottest.python.cling.roottest_python_cling_cling projectroot.roottest.root.meta.enumPayloadManipulation.roottest_root_meta_enumPayloadManipulation_checkEnumFwdDecl ``` For sure we know that these failures got introduced by root-project#8700. I narrowed down the origin of this regression to a small part of the diff of the full PR. The bad guy is some change in one of these files: * roofit/roofitcore/inc/RooMinimizer.h * roofit/roofitcore/src/RooMinimizer.cxx * roofit/roofitcore/test/CMakeLists.txt * roofit/roofitcore/test/TestStatistics/testLikelihoodSerial.cxx The `main()` function in `testLikelihoodSerial.cxx` is the most suspicious part of that diff.

This is a shot in the dark to fix the following test failures in the nightly: ``` projectroot.roottest.python.cling.roottest_python_cling_class projectroot.roottest.python.cling.roottest_python_cling_api projectroot.roottest.root.meta.tclass.regression.roottest_root_meta_tclass_regression_execNormalizationInfPy projectroot.roottest.python.cling.roottest_python_cling_cling projectroot.roottest.root.meta.enumPayloadManipulation.roottest_root_meta_enumPayloadManipulation_checkEnumFwdDecl ``` For sure we know that these failures got introduced by root-project#8700. I narrowed down the origin of this regression to a small part of the diff of the full PR. The bad guy is some change in one of these files: * roofit/roofitcore/inc/RooMinimizer.h * roofit/roofitcore/src/RooMinimizer.cxx * roofit/roofitcore/test/CMakeLists.txt * roofit/roofitcore/test/TestStatistics/testLikelihoodSerial.cxx The the call to `RooSentinel::activate()` in a templated function in the header is very suspicious. Since it's useless to call in `RooMinimizer` (and also `RooMinuit` for that matter), these calls are removed. The only thing that the RooSentinel is used for is to make sure that the memory arenas for `RooArgSet` and `RooDataSet` are cleaned up.

@egpbos

Avoid using forward-declared class as default template arguments in `RooMinimizer` and `MinuitFcnGrad`. This is the fix for the following test failures in the nightlies: ``` projectroot.roottest.python.cling.roottest_python_cling_class projectroot.roottest.python.cling.roottest_python_cling_api projectroot.roottest.root.meta.tclass.regression.roottest_root_meta_tclass_regression_execNormalizationInfPy projectroot.roottest.python.cling.roottest_python_cling_cling projectroot.roottest.root.meta.enumPayloadManipulation.roottest_root_meta_enumPayloadManipulation_checkEnumFwdDecl ``` For sure we know that these failures got introduced by root-project#8700. The failures that we see since [root-project#8700](root-project#8700), here are the comments where the bot reported them first: First Ubuntu 16 fail: [root-project#8700 (comment)](root-project#8700 (comment)) First Windows 10 fail: [root-project#8700 (comment)](root-project#8700 (comment)) I narrowed down the origin of this regression to a small part of the diff of the full PR. The bad guy is some change in one of these files: * roofit/roofitcore/inc/RooMinimizer.h * roofit/roofitcore/test/CMakeLists.txt * roofit/roofitcore/test/TestStatistics/testLikelihoodSerial.cxx What was fishy in `RooMinimizer.h` was the usage of a forward-declared class as default template argument. The default template arguments are commented out now, because these will only become relevant in later developments by @egpbos.

@egpbos

Avoid using forward-declared class as default template arguments in `RooMinimizer` and `MinuitFcnGrad`. This is the fix for the following test failures in the nightlies: ``` projectroot.roottest.python.cling.roottest_python_cling_class projectroot.roottest.python.cling.roottest_python_cling_api projectroot.roottest.root.meta.tclass.regression.roottest_root_meta_tclass_regression_execNormalizationInfPy projectroot.roottest.python.cling.roottest_python_cling_cling projectroot.roottest.root.meta.enumPayloadManipulation.roottest_root_meta_enumPayloadManipulation_checkEnumFwdDecl ``` For sure we know that these failures got introduced by #8700. The failures that we see since [#8700](#8700), here are the comments where the bot reported them first: First Ubuntu 16 fail: [#8700 (comment)](#8700 (comment)) First Windows 10 fail: [#8700 (comment)](#8700 (comment)) I narrowed down the origin of this regression to a small part of the diff of the full PR. The bad guy is some change in one of these files: * roofit/roofitcore/inc/RooMinimizer.h * roofit/roofitcore/test/CMakeLists.txt * roofit/roofitcore/test/TestStatistics/testLikelihoodSerial.cxx What was fishy in `RooMinimizer.h` was the usage of a forward-declared class as default template argument. The default template arguments are commented out now, because these will only become relevant in later developments by @egpbos.

Axel-Naumann · 2021-10-28T13:45:49Z

roofit/roofitcore/inc/TestStatistics/ConstantTermsOptimizer.h

+ * Copyright (c) 2000-2021, Regents of the University of California          *
+ *                          and Stanford University. All rights reserved.    *


@egpbos @lmoneta Could this be fixed to state (c) CERN, please? I don't think we should donate this code to Stanford, there's no need :-)

I'll do this, about to open a PR with some follow-ups to the multiprocessing

egpbos requested review from bellenot, guitargeek, hageboeck and lmoneta as code owners July 19, 2021 14:57