Skip to content

RooFit::MultiProcess & TestStatistics part 6: add RooFit::TestStatistics#8700

Merged
guitargeek merged 6 commits intoroot-project:masterfrom
roofit-dev:RooFit_MultiProcess_PR_6_TestStatistics
Sep 14, 2021
Merged

RooFit::MultiProcess & TestStatistics part 6: add RooFit::TestStatistics#8700
guitargeek merged 6 commits intoroot-project:masterfrom
roofit-dev:RooFit_MultiProcess_PR_6_TestStatistics

Conversation

@egpbos
Copy link
Copy Markdown
Contributor

@egpbos egpbos commented Jul 19, 2021

Changes:

This PR introduces a major refactoring of the RooAbs(Opt)TestStatistic-RooNLLVar inheritance tree into:

  1. statistics-based classes on the one hand;
  2. calculation/evaluation/optimization based classes on the other hand.

The likelihood is the central unit on the statistics side. The RooAbsL class is implemented for four kinds of likelihoods: binned, unbinned, "subsidiary" (an optimization for numerical stability that gathers components like global observables) and "sum" (over multiple components of the other types). These classes provide ways to compute their components in parallelizable chunks that can be used by the calculator classes as they see fit. On top of the likelihood classes, we also provide for convenience a set of likelihood builders.

The calculator "...Wrapper" classes are abstract interfaces. These can be implemented for different kinds of algorithms, or with different kinds of optimization "back-ends" in mind. In an upcoming PR, we will introduce the fork-based multi-processing implementation based on RooFit::MultiProcess. Other possible implementations could use the GPU or external tools like TensorFlow.

The coupling of all these classes to RooMinimizer is made via the MinuitFcnGrad class, which owns the ...Wrappers that calculate the likelihood components.

Todo:

There are still a couple of things that require attention, which I hope the reviewers can help me with:

  1. I have a Kahan summation helper class. This should probably be replaced with the new Kahan summation class in ROOT itself, but I didn't get to this yet. Maybe it is not urgent to do now, up for discussion.
  2. likelihood_builders.cxx is still missing the top level entry point function. This should be quite a simple function: it should just check whether the pdf is binned, unbinned or sum and has subsidiary terms. All the building blocks are already present, so it is just a matter of building it, but didn't get to it yet.
  3. optional_parameter_types.h: maybe we could replace this with @guitargeek's new configuration structs.
  4. testRooRealL.getValRooAddition fails because it doesn't know RooFormulaVar... I'm not sure what this means, but I vaguely remember that (I think) @hageboeck and @cburgard were talking about this, so I was hoping you may be able to help me out with fixing that test.

Finally, one thing that in an ideal world I would have liked to do is add a serial implementation of the LikelihoodGradientWrapper class. Probably, we can reuse a lot of RooGradMinimizerFcn for this and it should be rather straightforward. However, I'm rather short on time right now, and these things almost always tend to take longer than one expects, so I'm hesitant to build it now. The advantage would be that we could add a test to this PR that covers all that is added, basically by just running a minimization. Note that I do have such a full-coverage test, but only for the MultiProcess minimization, where I implemented LikelihoodGradientWrapper in the parallel LikelihoodGradientJob class. So this test will come in "PR 7" (see overview). So, up for discussion, two choices: A: add "LikelihoodGradientSerial" class (based on RooGradMinimizerFcn) so we can add an integration test in this PR; B: wait for the integration test in PR 7.

@phsft-bot

This comment has been minimized.

@phsft-bot

This comment has been minimized.

@phsft-bot

This comment has been minimized.

@phsft-bot

This comment has been minimized.

@phsft-bot

This comment has been minimized.

@phsft-bot

This comment has been minimized.

@phsft-bot

This comment has been minimized.

Copy link
Copy Markdown
Member

@hageboeck hageboeck left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello!

I understand it's a big thing, but unfortunately it's far from ready. Please excuse that the comments become shorter and shorter, but I literally went through 4,008 additions.

I now also realise that there is a very nice description of what's going on in the commit message, which should be findable in doxygen. I recommend to copy those paragraphs to a findable location in doxygen. The class documentation of the base classes would be one option or the (not yet existing) docstrings for the TestStatistics namespace.

A really big problem while maintaining RooFit was the lack of documentation in many critical places. I will hold you to the higher standard that I tried to establish, since that will help the future maintainers a lot. 🙂

Furthermore, there's some things that should be done "in the new way", i.e.

  • Almost no TODOs, FIXMEs, xxx HACK
  • No TIterator-style loops, because the range-based loops are now 30% faster, less leaks and easier to read
  • I don't accept code that's commented out and destined to rot, as nobody is compiling and checking it.
  • Includes from within ROOT should be in " ".
  • And the Kahan summation should be unified as you already anticipated.
  • There's code that has been modernised / sped up while this feature here was in development, and it seems that one should look at both worlds, and take the best of both. Committing it like this would set us back two years, losing possible improvements, e.g. due to vectorisation.

For hot loops, one should consider to either make them auto-vectorisable by e.g. using ROOT's KahanSum (but this requires that users compile ROOT with vectorisation support) or (better) to put them in Manos' mini computation library that gets compiled for multiple architecture. This will get the best speed on every platform. That's of course much more work, though ... 🙁

@egpbos
Copy link
Copy Markdown
Contributor Author

egpbos commented Jul 21, 2021

@hageboeck thanks for another Herculean reviewing effort :) Your comments make total sense; indeed, all the copy-pasted stuff still has to be merged with all the modernization and optimization work that was done in the past two years.

I will go through as much of your suggestions as I can before I will be on leave after tomorrow for three weeks.

After that, if anyone else has time to work on some of the issues, I'd of course welcome the help. As you know, my time on the project is running out, so I probably won't have time to integrate everything. For instance, Manos' mini computation library I have only learned about in the last few months and don't know it in enough detail to make any kind of sensible estimate of how to do it, let alone how much time this would take.

Let's discuss detailed plans in our meeting at 14:00 today.

egpbos added a commit to roofit-dev/root that referenced this pull request Jul 22, 2021
Based on review by @hageboeck in PR root-project#8700.

- Cleaned remaining TODOs, commented out code, etc.
- Replaced TIterator and RooFIter with range-based for-loops.
- Include cleanup.
- Replace raw with unique_ptrs where possible, also thereby fixing many memory leaks in tests.
- Remove unused functions (processEmptyDataSets)
- Use auto.

Note: work in progress. Please amend further modernization work to this commit (and remove this WIP notice).
@phsft-bot

This comment has been minimized.

@egpbos
Copy link
Copy Markdown
Contributor Author

egpbos commented Jul 22, 2021

I went through all comments, closed the ones I fixed and left some further comments for follow up discussion. The fixes I made so far are summarized in the commit message. The obvious big things still are KahanSum and adding lots of documentation. I will do these (and the other remaining todo's in the first post above this PR, #8700 (comment)) after my holidays.

@phsft-bot

This comment has been minimized.

@phsft-bot

This comment has been minimized.

@phsft-bot

This comment has been minimized.

@phsft-bot

This comment has been minimized.

@phsft-bot

This comment has been minimized.

@phsft-bot

This comment has been minimized.

@egpbos
Copy link
Copy Markdown
Contributor Author

egpbos commented Sep 1, 2021

During our meeting just now we decided that the highest priority issues for me to work on the coming days are:

  1. Documentation
  2. The top-level likelihood builder interface / PDF analysis (together with @wverkerke).

If time permits, after that I will try to harmonize the Kahan sum usages and the other remaining issues, but we may need to postpone these to another PR.

egpbos added a commit to roofit-dev/root that referenced this pull request Sep 2, 2021
Based on review by @hageboeck in PR root-project#8700.

- Cleaned remaining TODOs, commented out code, etc.
- Replaced TIterator and RooFIter with range-based for-loops.
- Include cleanup.
- Replace raw with unique_ptrs where possible, also thereby fixing many memory leaks in tests.
- Remove unused functions (processEmptyDataSets)
- Use auto.

Note: work in progress. Please amend further modernization work to this commit (and remove this WIP notice).
@egpbos egpbos force-pushed the RooFit_MultiProcess_PR_6_TestStatistics branch from 676fb7d to c3f2300 Compare September 2, 2021 11:21
@phsft-bot

This comment has been minimized.

@phsft-bot

This comment has been minimized.

@phsft-bot
Copy link
Copy Markdown

Build failed on ROOT-ubuntu16/nortcxxmod.
Running on sft-ubuntu-1604-2.cern.ch:/build/workspace/root-pullrequests-build
See console output.

Failing tests:

@phsft-bot
Copy link
Copy Markdown

Build failed on mac11.0/cxx17.
Running on macphsft23.dyndns.cern.ch:/Users/sftnight/build/workspace/root-pullrequests-build
See console output.

Failing tests:

@egpbos
Copy link
Copy Markdown
Contributor Author

egpbos commented Sep 14, 2021

All remaining CI issues are not caused by this PR.

If someone wants to review again, feel free. If not, I think we can merge this now.

Note that I tried to keep commit history clean and used fixup commits + rebase to also make sure all commits compile and pass tests so it should be possible to merge without squashing. Since I made a few changes in the calculation area (Kahan sums, batch computation) which could affect precision and hence results (and hence tests) it would be nice if the commit history of this PR could be included fully.

@lmoneta
Copy link
Copy Markdown
Member

lmoneta commented Sep 14, 2021

@phsft-bot build with flags -DCTEST_TEST_EXCLUDE_NONE=On

@phsft-bot
Copy link
Copy Markdown

Starting build on ROOT-debian10-i386/cxx14, ROOT-performance-centos8-multicore/default, ROOT-ubuntu16/nortcxxmod, mac1014/python3, mac11.0/cxx17, windows10/cxx14 with flags -DCTEST_TEST_EXCLUDE_NONE=On
How to customize builds

@lmoneta
Copy link
Copy Markdown
Member

lmoneta commented Sep 14, 2021

Thank you Patrick for the update. I am testing it now again with all tests and tutorials

@phsft-bot
Copy link
Copy Markdown

Build failed on ROOT-ubuntu16/nortcxxmod.
Running on sft-ubuntu-1604-1.cern.ch:/build/workspace/root-pullrequests-build
See console output.

Failing tests:

@phsft-bot
Copy link
Copy Markdown

Build failed on mac11.0/cxx17.
Running on macphsft23.dyndns.cern.ch:/Users/sftnight/build/workspace/root-pullrequests-build
See console output.

Failing tests:

}

// static function, also used from TestStatistics::RooUnbinnedL
RooNLLVar::ComputeResult RooNLLVar::computeBatchedFunc(const RooAbsPdf *pdfClone, RooAbsData *dataClone,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
RooNLLVar::ComputeResult RooNLLVar::computeBatchedFunc(const RooAbsPdf *pdfClone, RooAbsData *dataClone,
RooNLLVar::ComputeResult RooNLLVar::computeBatchedFunc(const RooAbsPdf & pdf, RooAbsData & data,

It's not relevant that these are *Clone at this point, and it's better to use references and not pointers if they can't be null.

}

// static function, also used from TestStatistics::RooUnbinnedL
RooNLLVar::ComputeResult RooNLLVar::computeScalarFunc(const RooAbsPdf *pdfClone, RooAbsData *dataClone,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
RooNLLVar::ComputeResult RooNLLVar::computeScalarFunc(const RooAbsPdf *pdfClone, RooAbsData *dataClone,
RooNLLVar::ComputeResult RooNLLVar::computeScalarFunc(const RooAbsPdf & pdf, RooAbsData const& data,

Comment on lines +230 to +231
// for access into copied dataset:
friend class RooFit::TestStatistics::RooAbsL;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes in this file are not necessary, RooAbsL is not using any non-public functions in RooDataHist.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, weird. I thought this was necessary for RooAbsL::initClones, but maybe I've edited the dependency out.

Comment on lines +155 to +158
MinuitFcnGrad *
MinuitFcnGrad::create(const std::shared_ptr<RooFit::TestStatistics::RooAbsL> &likelihood, RooMinimizer *context,
std::vector<ROOT::Fit::ParameterSettings> &parameters, bool verbose)
{
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
MinuitFcnGrad *
MinuitFcnGrad::create(const std::shared_ptr<RooFit::TestStatistics::RooAbsL> &likelihood, RooMinimizer *context,
std::vector<ROOT::Fit::ParameterSettings> &parameters, bool verbose)
{
std::unique_ptr<MinuitFcnGrad>
MinuitFcnGrad::create(const std::shared_ptr<RooFit::TestStatistics::RooAbsL> &likelihood, RooMinimizer *context,
std::vector<ROOT::Fit::ParameterSettings> &parameters, bool verbose)
{

Can we return a smart pointer instead?

Copy link
Copy Markdown
Contributor

@guitargeek guitargeek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Patrick, this PR looks good as it is!

  • new test statistic classes are introduced to RooFit
  • review comment by @hageboeck were addressed
  • the things I don't like myself will be addressed by me in a follow up PR so @egpbos doesn't have to spent time on this he doesn't have
  • I don't agree that the test failures on Windows are necessarily unrelated to this PR, because the failures are related to RooFit. It seems the RooFit library is unexpectedly loaded now, because the RooFit banner appears in PyROOT even if RooFit is not used. But given that it's difficult to debug windows failures, this can be investigated after merging this PR (and maybe it's a good excuse to get rid of the banner).

@guitargeek guitargeek dismissed hageboeck’s stale review September 14, 2021 13:54

Comments were addressed by Patrick

@guitargeek guitargeek merged commit f0e5258 into root-project:master Sep 14, 2021
guitargeek pushed a commit that referenced this pull request Sep 14, 2021
Based on review by @hageboeck and @guitargeek in PR #8700.

- Cleaned remaining TODOs, commented out code, etc.
- Replaced TIterator and RooFIter with range-based for-loops.
- Include cleanup.
- Replace raw with unique_ptrs where possible, also thereby fixing many memory leaks in tests.
- Remove unused functions (processEmptyDataSets)
- Use auto.
- Break circular dependency between RooMinimizer & MinuitFcnGrad
  * Also fixes some minor ensuing include errors and reorders RooMinimizer.cxx includes.
- Some minor automatic style fixes and corrects to debugging/warning prints in RooAbsMinimizerFcn.
- Fixes RooFormulaVar failure in RooRealL test, thanks to @guitargeek.
guitargeek pushed a commit that referenced this pull request Sep 14, 2021
guitargeek added a commit to guitargeek/root that referenced this pull request Sep 15, 2021
This is a shot in the dark to fix the following test failures in the
nightly:

```
    projectroot.roottest.python.cling.roottest_python_cling_class
    projectroot.roottest.python.cling.roottest_python_cling_api
    projectroot.roottest.root.meta.tclass.regression.roottest_root_meta_tclass_regression_execNormalizationInfPy
    projectroot.roottest.python.cling.roottest_python_cling_cling
    projectroot.roottest.root.meta.enumPayloadManipulation.roottest_root_meta_enumPayloadManipulation_checkEnumFwdDecl
```

For sure we know that these failures got introduced by
root-project#8700.

I narrowed down the origin of this regression to a small part of the
diff of the full PR, and the `main()` function in
`testLikelihoodSerial.cxx` is the most suspicious thing in that diff.
guitargeek added a commit to guitargeek/root that referenced this pull request Sep 15, 2021
This is a shot in the dark to fix the following test failures in the
nightly:

```
    projectroot.roottest.python.cling.roottest_python_cling_class
    projectroot.roottest.python.cling.roottest_python_cling_api
    projectroot.roottest.root.meta.tclass.regression.roottest_root_meta_tclass_regression_execNormalizationInfPy
    projectroot.roottest.python.cling.roottest_python_cling_cling
    projectroot.roottest.root.meta.enumPayloadManipulation.roottest_root_meta_enumPayloadManipulation_checkEnumFwdDecl
```

For sure we know that these failures got introduced by
root-project#8700.

I narrowed down the origin of this regression to a small part of the
diff of the full PR. The bad guy is some change in one of these files:

  * roofit/roofitcore/inc/RooMinimizer.h
  * roofit/roofitcore/src/RooMinimizer.cxx
  * roofit/roofitcore/test/CMakeLists.txt
  * roofit/roofitcore/test/TestStatistics/testLikelihoodSerial.cxx

The `main()` function in `testLikelihoodSerial.cxx` is the most
suspicious part of that diff.
guitargeek added a commit to guitargeek/root that referenced this pull request Sep 15, 2021
This is a shot in the dark to fix the following test failures in the
nightly:

```
    projectroot.roottest.python.cling.roottest_python_cling_class
    projectroot.roottest.python.cling.roottest_python_cling_api
    projectroot.roottest.root.meta.tclass.regression.roottest_root_meta_tclass_regression_execNormalizationInfPy
    projectroot.roottest.python.cling.roottest_python_cling_cling
    projectroot.roottest.root.meta.enumPayloadManipulation.roottest_root_meta_enumPayloadManipulation_checkEnumFwdDecl
```

For sure we know that these failures got introduced by
root-project#8700.

I narrowed down the origin of this regression to a small part of the
diff of the full PR. The bad guy is some change in one of these files:

  * roofit/roofitcore/inc/RooMinimizer.h
  * roofit/roofitcore/src/RooMinimizer.cxx
  * roofit/roofitcore/test/CMakeLists.txt
  * roofit/roofitcore/test/TestStatistics/testLikelihoodSerial.cxx

The the call to `RooSentinel::activate()` in a templated function in the
header is very suspicious. Since it's useless to call in `RooMinimizer`
(and also `RooMinuit` for that matter), these calls are removed. The
only thing that the RooSentinel is used for is to make sure that the
memory arenas for `RooArgSet` and `RooDataSet` are cleaned up.
guitargeek added a commit to guitargeek/root that referenced this pull request Sep 15, 2021
This is a shot in the dark to fix the following test failures in the
nightly:

```
    projectroot.roottest.python.cling.roottest_python_cling_class
    projectroot.roottest.python.cling.roottest_python_cling_api
    projectroot.roottest.root.meta.tclass.regression.roottest_root_meta_tclass_regression_execNormalizationInfPy
    projectroot.roottest.python.cling.roottest_python_cling_cling
    projectroot.roottest.root.meta.enumPayloadManipulation.roottest_root_meta_enumPayloadManipulation_checkEnumFwdDecl
```

For sure we know that these failures got introduced by
root-project#8700.

I narrowed down the origin of this regression to a small part of the
diff of the full PR. The bad guy is some change in one of these files:

  * roofit/roofitcore/inc/RooMinimizer.h
  * roofit/roofitcore/src/RooMinimizer.cxx
  * roofit/roofitcore/test/CMakeLists.txt
  * roofit/roofitcore/test/TestStatistics/testLikelihoodSerial.cxx

The the call to `RooSentinel::activate()` in a templated function in the
header is very suspicious. Since it's useless to call in `RooMinimizer`
(and also `RooMinuit` for that matter), these calls are removed. The
only thing that the RooSentinel is used for is to make sure that the
memory arenas for `RooArgSet` and `RooDataSet` are cleaned up.
guitargeek added a commit to guitargeek/root that referenced this pull request Sep 15, 2021
Avoid using forward-declared class as default template arguments in
`RooMinimizer` and `MinuitFcnGrad`.

This is the fix for the following test failures in the nightlies:

```
    projectroot.roottest.python.cling.roottest_python_cling_class
    projectroot.roottest.python.cling.roottest_python_cling_api
    projectroot.roottest.root.meta.tclass.regression.roottest_root_meta_tclass_regression_execNormalizationInfPy
    projectroot.roottest.python.cling.roottest_python_cling_cling
    projectroot.roottest.root.meta.enumPayloadManipulation.roottest_root_meta_enumPayloadManipulation_checkEnumFwdDecl
```

For sure we know that these failures got introduced by
root-project#8700.

The failures that we see since [root-project#8700](root-project#8700), here are the comments where the bot reported them first:
First Ubuntu 16 fail:
[root-project#8700 (comment)](root-project#8700 (comment))

First Windows 10 fail:
[root-project#8700 (comment)](root-project#8700 (comment))

I narrowed down the origin of this regression to a small part of the
diff of the full PR. The bad guy is some change in one of these files:

  * roofit/roofitcore/inc/RooMinimizer.h
  * roofit/roofitcore/test/CMakeLists.txt
  * roofit/roofitcore/test/TestStatistics/testLikelihoodSerial.cxx

What was fishy in `RooMinimizer.h` was the usage of a forward-declared
class as default template argument. The default template arguments are
commented out now, because these will only become relevant in later
developments by @egpbos.
guitargeek added a commit that referenced this pull request Sep 15, 2021
Avoid using forward-declared class as default template arguments in
`RooMinimizer` and `MinuitFcnGrad`.

This is the fix for the following test failures in the nightlies:

```
    projectroot.roottest.python.cling.roottest_python_cling_class
    projectroot.roottest.python.cling.roottest_python_cling_api
    projectroot.roottest.root.meta.tclass.regression.roottest_root_meta_tclass_regression_execNormalizationInfPy
    projectroot.roottest.python.cling.roottest_python_cling_cling
    projectroot.roottest.root.meta.enumPayloadManipulation.roottest_root_meta_enumPayloadManipulation_checkEnumFwdDecl
```

For sure we know that these failures got introduced by
#8700.

The failures that we see since [#8700](#8700), here are the comments where the bot reported them first:
First Ubuntu 16 fail:
[#8700 (comment)](#8700 (comment))

First Windows 10 fail:
[#8700 (comment)](#8700 (comment))

I narrowed down the origin of this regression to a small part of the
diff of the full PR. The bad guy is some change in one of these files:

  * roofit/roofitcore/inc/RooMinimizer.h
  * roofit/roofitcore/test/CMakeLists.txt
  * roofit/roofitcore/test/TestStatistics/testLikelihoodSerial.cxx

What was fishy in `RooMinimizer.h` was the usage of a forward-declared
class as default template argument. The default template arguments are
commented out now, because these will only become relevant in later
developments by @egpbos.
Comment on lines +9 to +10
* Copyright (c) 2000-2021, Regents of the University of California *
* and Stanford University. All rights reserved. *
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@egpbos @lmoneta Could this be fixed to state (c) CERN, please? I don't think we should donate this code to Stanford, there's no need :-)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll do this, about to open a PR with some follow-ups to the multiprocessing

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants