RooFit MultiProcess and TestStatistics#8294
RooFit MultiProcess and TestStatistics#8294egpbos wants to merge 518 commits intoroot-project:masterfrom
Conversation
preexisting BidirMMapPipe instances this allows more complex communication topologies
checks whether process pid is equal to the earlier determined parent id, this ensures the waitpid can actually be done (needs child)
The test currently gets stuck for some reason.
|
Build failed on mac11.0/cxx17. Errors:
|
|
Build failed on mac1014/python3. Errors:
|
|
Build failed on windows10/cxx14. Errors:
|
As highlighted by @lmoneta in PR review comments (root-project#8369 (review)), the second derivative and step size information is not necessary at this point. The floating point exact minuit-external (roofit-internal) derivative duplication is currently done using ExternalInternalGradientCalculator and further made possible by the long double parameter transformation functions. This is all we need for now.
…Fit_MultiProcess_PR
|
Starting build on |
|
Build failed on ROOT-performance-centos8-multicore/default. Errors:
|
|
Build failed on mac11.0/cxx17. Errors:
|
|
Build failed on ROOT-ubuntu16/nortcxxmod. Errors:
|
|
Build failed on mac1014/python3. Errors:
|
|
Build failed on windows10/cxx14. Errors:
|
|
Build failed on ROOT-debian10-i386/cxx14. Errors:
|
| void LikelihoodJob::update_bool(std::size_t ix, bool value) { | ||
| if (get_manager()->process_manager().is_master()) { | ||
| auto msg = RooFit::MultiProcess::M2Q::update_bool; | ||
| get_manager()->messenger().send_from_queue_to_master(msg, ix, value); |
There was a problem hiding this comment.
Wrong! Should be master_to_queue.
| //} | ||
|
|
||
| void LikelihoodJob::send_back_task_result_from_worker(std::size_t /*task*/) { | ||
| get_manager()->messenger().send_from_worker_to_master(result, carry); |
There was a problem hiding this comment.
This all doesn't work, e.g. this doesn't send task ID at the moment, see LikelihoodGradientJob for a better implementation.
| } | ||
|
|
||
| void LikelihoodGradientJob::receive_task_result_on_queue(std::size_t task, std::size_t worker_id) | ||
| { |
There was a problem hiding this comment.
Remove, old plumbing.
|
Closing because all these developments are merged, with #9349 as the last PR. |
This PR adds to RooFit:
TestStatistics/likelihood_buildersstill has to be finished, this will be done in the coming few weeks.RooFitZMQ maybe still needs some attention, because in its current form it includes a big part of the libzmq source tree (needed for ppoll, see below), which I'm sure causes licensing issues (it's LGPLv3). I'm open to suggestions on how to handle this.
To make the above additions possible, some modifications to both RooFit and non-RooFit code were made as well:
Minuit2:NumericalDerivatorMinuit2, was based on earlier work by @lmoneta who already had separated out the bulk of the gradient calculation code from Minuit2.mathcore: Some additions toIFunctionwere made to allow Minuit2 to probe functions for their ability to generate gradients and second derivatives. Similar additions were made to function adapter classes in Minuit2.CPUAffinity. In Unix systems (not macOS), this makes the MPFE based parallelization a lot faster by pinning processes to physical CPU cores.createtemplate factory function. This template function allows users to pass in their own calculation back-ends, e.g. for calculating on GPUs or in autograd enabled frameworks.The commit history also contains the proof of concept version, the benchmark results of which were presented at ACAT19 and CHEP19 (and preliminary results at the 2018 ROOT Users workshop in Sarajevo). That version was redesigned starting from 2019 to better integrate with the rest of the code and at the same time untangle the test statistics classes to conceptually bring them closer to the math, instead of the more implementation-detail oriented existing design (RooAbsTestStatistic et al.).
The new packages include the following tests, which should probably still be added to the testing infrastructure somehow:
From my side (and that of the NL eScience Center), the project has ended and time has run out to make any further major contributions to it, except, of course finishing this PR and providing help to get it working and to possibly hand over further development :)
Here are some notes for possible future work:
roofitcore/MultiProcess), but have only been partially maintained since we started with the final version. Probably the best thing to do there is to remove that, but maybe people disagree and want to keep it for comparison while benchmarking and reproducing the results of the proof-of-concept benchmarks. Note: BidirMMapPipe is in there as well, since it was moved there. This class is used in the RooRealMPFE event-based parallelization method that was present already before I started.RooGaussMinimizerFcnandRooTaskSpecwere also part of our proof-of-concept exploration work.RooTimerandRooJSONListFile, but also strewn around the code there are still somechronoincludes or other timing remnants.This work was done over the past 5 years at the initiative of Wouter Verkerke @wverkerke under a Netherlands eScience Center grant, with direct code contributions from @vincecr0ft and @ipelupessy on the RooFit side and @roelaaij on ZeroMQ, lots of support from @cburgard, Lydia Brenner and @jiskattema, invaluable design input from @hageboeck and @lmoneta in the final stage of moving from proof of concept version to the version before you.