RooFit::MultiProcess & TestStatistics part 6: add RooFit::TestStatistics#8700
Conversation
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
hageboeck
left a comment
There was a problem hiding this comment.
Hello!
I understand it's a big thing, but unfortunately it's far from ready. Please excuse that the comments become shorter and shorter, but I literally went through 4,008 additions.
I now also realise that there is a very nice description of what's going on in the commit message, which should be findable in doxygen. I recommend to copy those paragraphs to a findable location in doxygen. The class documentation of the base classes would be one option or the (not yet existing) docstrings for the TestStatistics namespace.
A really big problem while maintaining RooFit was the lack of documentation in many critical places. I will hold you to the higher standard that I tried to establish, since that will help the future maintainers a lot. 🙂
Furthermore, there's some things that should be done "in the new way", i.e.
- Almost no
TODOs,FIXMEs,xxx HACK - No TIterator-style loops, because the range-based loops are now 30% faster, less leaks and easier to read
- I don't accept code that's commented out and destined to rot, as nobody is compiling and checking it.
- Includes from within ROOT should be in
" ". - And the Kahan summation should be unified as you already anticipated.
- There's code that has been modernised / sped up while this feature here was in development, and it seems that one should look at both worlds, and take the best of both. Committing it like this would set us back two years, losing possible improvements, e.g. due to vectorisation.
For hot loops, one should consider to either make them auto-vectorisable by e.g. using ROOT's KahanSum (but this requires that users compile ROOT with vectorisation support) or (better) to put them in Manos' mini computation library that gets compiled for multiple architecture. This will get the best speed on every platform. That's of course much more work, though ... 🙁
|
@hageboeck thanks for another Herculean reviewing effort :) Your comments make total sense; indeed, all the copy-pasted stuff still has to be merged with all the modernization and optimization work that was done in the past two years. I will go through as much of your suggestions as I can before I will be on leave after tomorrow for three weeks. After that, if anyone else has time to work on some of the issues, I'd of course welcome the help. As you know, my time on the project is running out, so I probably won't have time to integrate everything. For instance, Manos' mini computation library I have only learned about in the last few months and don't know it in enough detail to make any kind of sensible estimate of how to do it, let alone how much time this would take. Let's discuss detailed plans in our meeting at 14:00 today. |
Based on review by @hageboeck in PR root-project#8700. - Cleaned remaining TODOs, commented out code, etc. - Replaced TIterator and RooFIter with range-based for-loops. - Include cleanup. - Replace raw with unique_ptrs where possible, also thereby fixing many memory leaks in tests. - Remove unused functions (processEmptyDataSets) - Use auto. Note: work in progress. Please amend further modernization work to this commit (and remove this WIP notice).
This comment has been minimized.
This comment has been minimized.
|
I went through all comments, closed the ones I fixed and left some further comments for follow up discussion. The fixes I made so far are summarized in the commit message. The obvious big things still are |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
|
During our meeting just now we decided that the highest priority issues for me to work on the coming days are:
If time permits, after that I will try to harmonize the Kahan sum usages and the other remaining issues, but we may need to postpone these to another PR. |
Based on review by @hageboeck in PR root-project#8700. - Cleaned remaining TODOs, commented out code, etc. - Replaced TIterator and RooFIter with range-based for-loops. - Include cleanup. - Replace raw with unique_ptrs where possible, also thereby fixing many memory leaks in tests. - Remove unused functions (processEmptyDataSets) - Use auto. Note: work in progress. Please amend further modernization work to this commit (and remove this WIP notice).
676fb7d to
c3f2300
Compare
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
|
Build failed on ROOT-ubuntu16/nortcxxmod. Failing tests: |
|
Build failed on mac11.0/cxx17. Failing tests: |
|
Build failed on windows10/cxx14. Failing tests:
|
|
All remaining CI issues are not caused by this PR. If someone wants to review again, feel free. If not, I think we can merge this now. Note that I tried to keep commit history clean and used fixup commits + rebase to also make sure all commits compile and pass tests so it should be possible to merge without squashing. Since I made a few changes in the calculation area (Kahan sums, batch computation) which could affect precision and hence results (and hence tests) it would be nice if the commit history of this PR could be included fully. |
|
@phsft-bot build with flags -DCTEST_TEST_EXCLUDE_NONE=On |
|
Starting build on |
|
Thank you Patrick for the update. I am testing it now again with all tests and tutorials |
|
Build failed on ROOT-ubuntu16/nortcxxmod. Failing tests: |
|
Build failed on mac11.0/cxx17. Failing tests: |
|
Build failed on windows10/cxx14. Failing tests:
|
| } | ||
|
|
||
| // static function, also used from TestStatistics::RooUnbinnedL | ||
| RooNLLVar::ComputeResult RooNLLVar::computeBatchedFunc(const RooAbsPdf *pdfClone, RooAbsData *dataClone, |
There was a problem hiding this comment.
| RooNLLVar::ComputeResult RooNLLVar::computeBatchedFunc(const RooAbsPdf *pdfClone, RooAbsData *dataClone, | |
| RooNLLVar::ComputeResult RooNLLVar::computeBatchedFunc(const RooAbsPdf & pdf, RooAbsData & data, |
It's not relevant that these are *Clone at this point, and it's better to use references and not pointers if they can't be null.
| } | ||
|
|
||
| // static function, also used from TestStatistics::RooUnbinnedL | ||
| RooNLLVar::ComputeResult RooNLLVar::computeScalarFunc(const RooAbsPdf *pdfClone, RooAbsData *dataClone, |
There was a problem hiding this comment.
| RooNLLVar::ComputeResult RooNLLVar::computeScalarFunc(const RooAbsPdf *pdfClone, RooAbsData *dataClone, | |
| RooNLLVar::ComputeResult RooNLLVar::computeScalarFunc(const RooAbsPdf & pdf, RooAbsData const& data, |
| // for access into copied dataset: | ||
| friend class RooFit::TestStatistics::RooAbsL; |
There was a problem hiding this comment.
The changes in this file are not necessary, RooAbsL is not using any non-public functions in RooDataHist.
There was a problem hiding this comment.
Ah, weird. I thought this was necessary for RooAbsL::initClones, but maybe I've edited the dependency out.
| MinuitFcnGrad * | ||
| MinuitFcnGrad::create(const std::shared_ptr<RooFit::TestStatistics::RooAbsL> &likelihood, RooMinimizer *context, | ||
| std::vector<ROOT::Fit::ParameterSettings> ¶meters, bool verbose) | ||
| { |
There was a problem hiding this comment.
| MinuitFcnGrad * | |
| MinuitFcnGrad::create(const std::shared_ptr<RooFit::TestStatistics::RooAbsL> &likelihood, RooMinimizer *context, | |
| std::vector<ROOT::Fit::ParameterSettings> ¶meters, bool verbose) | |
| { | |
| std::unique_ptr<MinuitFcnGrad> | |
| MinuitFcnGrad::create(const std::shared_ptr<RooFit::TestStatistics::RooAbsL> &likelihood, RooMinimizer *context, | |
| std::vector<ROOT::Fit::ParameterSettings> ¶meters, bool verbose) | |
| { |
Can we return a smart pointer instead?
guitargeek
left a comment
There was a problem hiding this comment.
Thanks Patrick, this PR looks good as it is!
- new test statistic classes are introduced to RooFit
- review comment by @hageboeck were addressed
- the things I don't like myself will be addressed by me in a follow up PR so @egpbos doesn't have to spent time on this he doesn't have
- I don't agree that the test failures on Windows are necessarily unrelated to this PR, because the failures are related to RooFit. It seems the RooFit library is unexpectedly loaded now, because the RooFit banner appears in PyROOT even if RooFit is not used. But given that it's difficult to debug windows failures, this can be investigated after merging this PR (and maybe it's a good excuse to get rid of the banner).
Comments were addressed by Patrick
Based on review by @hageboeck and @guitargeek in PR #8700. - Cleaned remaining TODOs, commented out code, etc. - Replaced TIterator and RooFIter with range-based for-loops. - Include cleanup. - Replace raw with unique_ptrs where possible, also thereby fixing many memory leaks in tests. - Remove unused functions (processEmptyDataSets) - Use auto. - Break circular dependency between RooMinimizer & MinuitFcnGrad * Also fixes some minor ensuing include errors and reorders RooMinimizer.cxx includes. - Some minor automatic style fixes and corrects to debugging/warning prints in RooAbsMinimizerFcn. - Fixes RooFormulaVar failure in RooRealL test, thanks to @guitargeek.
Based on review by @hageboeck in PR #8700.
This is a shot in the dark to fix the following test failures in the
nightly:
```
projectroot.roottest.python.cling.roottest_python_cling_class
projectroot.roottest.python.cling.roottest_python_cling_api
projectroot.roottest.root.meta.tclass.regression.roottest_root_meta_tclass_regression_execNormalizationInfPy
projectroot.roottest.python.cling.roottest_python_cling_cling
projectroot.roottest.root.meta.enumPayloadManipulation.roottest_root_meta_enumPayloadManipulation_checkEnumFwdDecl
```
For sure we know that these failures got introduced by
root-project#8700.
I narrowed down the origin of this regression to a small part of the
diff of the full PR, and the `main()` function in
`testLikelihoodSerial.cxx` is the most suspicious thing in that diff.
This is a shot in the dark to fix the following test failures in the
nightly:
```
projectroot.roottest.python.cling.roottest_python_cling_class
projectroot.roottest.python.cling.roottest_python_cling_api
projectroot.roottest.root.meta.tclass.regression.roottest_root_meta_tclass_regression_execNormalizationInfPy
projectroot.roottest.python.cling.roottest_python_cling_cling
projectroot.roottest.root.meta.enumPayloadManipulation.roottest_root_meta_enumPayloadManipulation_checkEnumFwdDecl
```
For sure we know that these failures got introduced by
root-project#8700.
I narrowed down the origin of this regression to a small part of the
diff of the full PR. The bad guy is some change in one of these files:
* roofit/roofitcore/inc/RooMinimizer.h
* roofit/roofitcore/src/RooMinimizer.cxx
* roofit/roofitcore/test/CMakeLists.txt
* roofit/roofitcore/test/TestStatistics/testLikelihoodSerial.cxx
The `main()` function in `testLikelihoodSerial.cxx` is the most
suspicious part of that diff.
This is a shot in the dark to fix the following test failures in the
nightly:
```
projectroot.roottest.python.cling.roottest_python_cling_class
projectroot.roottest.python.cling.roottest_python_cling_api
projectroot.roottest.root.meta.tclass.regression.roottest_root_meta_tclass_regression_execNormalizationInfPy
projectroot.roottest.python.cling.roottest_python_cling_cling
projectroot.roottest.root.meta.enumPayloadManipulation.roottest_root_meta_enumPayloadManipulation_checkEnumFwdDecl
```
For sure we know that these failures got introduced by
root-project#8700.
I narrowed down the origin of this regression to a small part of the
diff of the full PR. The bad guy is some change in one of these files:
* roofit/roofitcore/inc/RooMinimizer.h
* roofit/roofitcore/src/RooMinimizer.cxx
* roofit/roofitcore/test/CMakeLists.txt
* roofit/roofitcore/test/TestStatistics/testLikelihoodSerial.cxx
The the call to `RooSentinel::activate()` in a templated function in the
header is very suspicious. Since it's useless to call in `RooMinimizer`
(and also `RooMinuit` for that matter), these calls are removed. The
only thing that the RooSentinel is used for is to make sure that the
memory arenas for `RooArgSet` and `RooDataSet` are cleaned up.
This is a shot in the dark to fix the following test failures in the
nightly:
```
projectroot.roottest.python.cling.roottest_python_cling_class
projectroot.roottest.python.cling.roottest_python_cling_api
projectroot.roottest.root.meta.tclass.regression.roottest_root_meta_tclass_regression_execNormalizationInfPy
projectroot.roottest.python.cling.roottest_python_cling_cling
projectroot.roottest.root.meta.enumPayloadManipulation.roottest_root_meta_enumPayloadManipulation_checkEnumFwdDecl
```
For sure we know that these failures got introduced by
root-project#8700.
I narrowed down the origin of this regression to a small part of the
diff of the full PR. The bad guy is some change in one of these files:
* roofit/roofitcore/inc/RooMinimizer.h
* roofit/roofitcore/src/RooMinimizer.cxx
* roofit/roofitcore/test/CMakeLists.txt
* roofit/roofitcore/test/TestStatistics/testLikelihoodSerial.cxx
The the call to `RooSentinel::activate()` in a templated function in the
header is very suspicious. Since it's useless to call in `RooMinimizer`
(and also `RooMinuit` for that matter), these calls are removed. The
only thing that the RooSentinel is used for is to make sure that the
memory arenas for `RooArgSet` and `RooDataSet` are cleaned up.
Avoid using forward-declared class as default template arguments in
`RooMinimizer` and `MinuitFcnGrad`.
This is the fix for the following test failures in the nightlies:
```
projectroot.roottest.python.cling.roottest_python_cling_class
projectroot.roottest.python.cling.roottest_python_cling_api
projectroot.roottest.root.meta.tclass.regression.roottest_root_meta_tclass_regression_execNormalizationInfPy
projectroot.roottest.python.cling.roottest_python_cling_cling
projectroot.roottest.root.meta.enumPayloadManipulation.roottest_root_meta_enumPayloadManipulation_checkEnumFwdDecl
```
For sure we know that these failures got introduced by
root-project#8700.
The failures that we see since [root-project#8700](root-project#8700), here are the comments where the bot reported them first:
First Ubuntu 16 fail:
[root-project#8700 (comment)](root-project#8700 (comment))
First Windows 10 fail:
[root-project#8700 (comment)](root-project#8700 (comment))
I narrowed down the origin of this regression to a small part of the
diff of the full PR. The bad guy is some change in one of these files:
* roofit/roofitcore/inc/RooMinimizer.h
* roofit/roofitcore/test/CMakeLists.txt
* roofit/roofitcore/test/TestStatistics/testLikelihoodSerial.cxx
What was fishy in `RooMinimizer.h` was the usage of a forward-declared
class as default template argument. The default template arguments are
commented out now, because these will only become relevant in later
developments by @egpbos.
Avoid using forward-declared class as default template arguments in
`RooMinimizer` and `MinuitFcnGrad`.
This is the fix for the following test failures in the nightlies:
```
projectroot.roottest.python.cling.roottest_python_cling_class
projectroot.roottest.python.cling.roottest_python_cling_api
projectroot.roottest.root.meta.tclass.regression.roottest_root_meta_tclass_regression_execNormalizationInfPy
projectroot.roottest.python.cling.roottest_python_cling_cling
projectroot.roottest.root.meta.enumPayloadManipulation.roottest_root_meta_enumPayloadManipulation_checkEnumFwdDecl
```
For sure we know that these failures got introduced by
#8700.
The failures that we see since [#8700](#8700), here are the comments where the bot reported them first:
First Ubuntu 16 fail:
[#8700 (comment)](#8700 (comment))
First Windows 10 fail:
[#8700 (comment)](#8700 (comment))
I narrowed down the origin of this regression to a small part of the
diff of the full PR. The bad guy is some change in one of these files:
* roofit/roofitcore/inc/RooMinimizer.h
* roofit/roofitcore/test/CMakeLists.txt
* roofit/roofitcore/test/TestStatistics/testLikelihoodSerial.cxx
What was fishy in `RooMinimizer.h` was the usage of a forward-declared
class as default template argument. The default template arguments are
commented out now, because these will only become relevant in later
developments by @egpbos.
| * Copyright (c) 2000-2021, Regents of the University of California * | ||
| * and Stanford University. All rights reserved. * |
There was a problem hiding this comment.
I'll do this, about to open a PR with some follow-ups to the multiprocessing
Changes:
This PR introduces a major refactoring of the
RooAbs(Opt)TestStatistic-RooNLLVarinheritance tree into:The likelihood is the central unit on the statistics side. The
RooAbsLclass is implemented for four kinds of likelihoods: binned, unbinned, "subsidiary" (an optimization for numerical stability that gathers components like global observables) and "sum" (over multiple components of the other types). These classes provide ways to compute their components in parallelizable chunks that can be used by the calculator classes as they see fit. On top of the likelihood classes, we also provide for convenience a set of likelihood builders.The calculator "
...Wrapper" classes are abstract interfaces. These can be implemented for different kinds of algorithms, or with different kinds of optimization "back-ends" in mind. In an upcoming PR, we will introduce the fork-based multi-processing implementation based onRooFit::MultiProcess. Other possible implementations could use the GPU or external tools like TensorFlow.The coupling of all these classes to
RooMinimizeris made via theMinuitFcnGradclass, which owns the...Wrappersthat calculate the likelihood components.Todo:
There are still a couple of things that require attention, which I hope the reviewers can help me with:
likelihood_builders.cxxis still missing the top level entry point function. This should be quite a simple function: it should just check whether the pdf is binned, unbinned or sum and has subsidiary terms. All the building blocks are already present, so it is just a matter of building it, but didn't get to it yet.optional_parameter_types.h: maybe we could replace this with @guitargeek's new configuration structs.testRooRealL.getValRooAdditionfails because it doesn't knowRooFormulaVar... I'm not sure what this means, but I vaguely remember that (I think) @hageboeck and @cburgard were talking about this, so I was hoping you may be able to help me out with fixing that test.Finally, one thing that in an ideal world I would have liked to do is add a serial implementation of the
LikelihoodGradientWrapperclass. Probably, we can reuse a lot ofRooGradMinimizerFcnfor this and it should be rather straightforward. However, I'm rather short on time right now, and these things almost always tend to take longer than one expects, so I'm hesitant to build it now. The advantage would be that we could add a test to this PR that covers all that is added, basically by just running a minimization. Note that I do have such a full-coverage test, but only for theMultiProcessminimization, where I implementedLikelihoodGradientWrapperin the parallelLikelihoodGradientJobclass. So this test will come in "PR 7" (see overview). So, up for discussion, two choices: A: add "LikelihoodGradientSerial" class (based onRooGradMinimizerFcn) so we can add an integration test in this PR; B: wait for the integration test in PR 7.