[RF] Low-statistics fits terminate with BatchMode and NumCPU arguments #9406
Description
ROOT 6.24/06
I have observed in several circumstances that fits to very small datasets (< 50 events) occasionally terminate when the arguments BatchMode(1) and NumCPU(X) are used, especially when X is large. This also happens in simultaneous fits where at least one dataset is very small.
A minimal reproducible example is the following:
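#include "RooDataSet.h"
#include "RooExponential.h"
#include "RooRealVar.h"

void test_crash(){
   using namespace RooFit;
   Int_t to_gen = 10; // number of events to generate (deliberately tiny)
   RooRealVar m("m","m",5000,5500);
   RooRealVar slope("slope", "slope", -0.001, -1., 1.);
   RooExponential* exp_pdf = new RooExponential("exp", "exp", m, slope);
   // Generate a toy dataset with only to_gen events
   RooDataSet* ds = (RooDataSet*) exp_pdf->generate(RooArgSet(m), to_gen);
   // Batch mode combined with more worker processes than events triggers the crash
   exp_pdf->fitTo(*ds, BatchMode(1), NumCPU(20));
}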
Here I fit a dataset of 10 events, which produces the following output:
...
NOW USING STRATEGY 1: TRY TO BALANCE SPEED AGAINST RELIABILITY
**********
** 6 **MIGRAD 500 1
**********
FIRST CALL TO USER FUNCTION AT NEW START POINT, WITH IFLAG=4.
terminate called after throwing an instance of 'std::length_error'
terminate called after throwing an instance of 'terminate called after throwing an instance of 'std::length_error'
std::length_error what(): '
what(): what(): vector::_M_fill_insert
terminate called after throwing an instance of 'terminate called after throwing an instance of 'std::length_errorstd::length_error'
'
what(): what(): vector::_M_fill_insertvector::_M_fill_insertvector::_M_fill_insertvector::_M_fill_insert
terminate called after throwing an instance of '
terminate called after throwing an instance of 'std::length_errorstd::length_error'
'
what(): what(): vector::_M_fill_insert
vector::_M_fill_insert
terminate called after throwing an instance of 'terminate called after throwing an instance of 'std::length_error'
terminate called after throwing an instance of 'std::length_error'
terminate called after throwing an instance of 'terminate called after throwing an instance of 'std::length_errorstd::length_errorstd::length_error'
what(): vector::_M_fill_insert
what(): vector::_M_fill_insert
'
'
what(): vector::_M_fill_insert what():
vector::_M_fill_insert
what(): vector::_M_fill_insert
terminate called after throwing an instance of 'std::length_error'
what(): vector::_M_fill_insert
terminate called after throwing an instance of 'std::length_error'
what(): vector::_M_fill_insert
terminate called after throwing an instance of 'std::length_error'
what(): vector::_M_fill_insert
terminate called after throwing an instance of 'std::length_error'
terminate called after throwing an instance of ' what(): vector::_M_fill_insert
std::length_error'
terminate called after throwing an instance of 'std::length_error what(): terminate called after throwing an instance of ''
terminate called after throwing an instance of 'vector::_M_fill_insertstd::length_error
'
std::length_error'
what(): vector::_M_fill_insert
what(): vector::_M_fill_insert
what(): vector::_M_fill_insert
RooRealMPFE::evaluate(nll_exp_expData_55d734b4c5e0_MPFE0) ERROR: unexpected message from server process: 8
Either setting BatchMode(0) or reducing the number of requested CPU cores avoids this misbehavior. I have also encountered a case (a complex simultaneous fit) where BatchMode(1) alone led to this termination, even without any NumCPU request.
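As a user-side stopgap, the request for fewer CPU cores can be automated. The sketch below is only an illustration: `cappedNumCPU` is a hypothetical helper, not part of RooFit, and it assumes that never asking for more workers than there are events is enough to stay clear of the problem, which this report does not establish.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical helper (not a RooFit function): limit the requested number of
// worker processes so that it never exceeds the number of events.
// Always returns at least 1, so the result can be passed on unconditionally.
inline int cappedNumCPU(std::size_t nEvents, int requested)
{
   if (requested < 1 || nEvents == 0) {
      return 1; // fall back to a single process
   }
   const std::size_t req = static_cast<std::size_t>(requested);
   return static_cast<int>(std::min(nEvents, req));
}
```

In the macro above this would be used as `NumCPU(cappedNumCPU(ds->numEntries(), 20))`, which requests 10 workers for the 10-event dataset instead of 20.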
I believe RooFit can handle this more carefully to avoid such terminations.
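For context on the message in the log: std::length_error is what std::vector throws when asked to grow beyond max_size(), and vector::_M_fill_insert appears to be the libstdc++ internal routine behind fill-style insert and resize calls. The standalone sketch below reproduces only that exception type. It does not use RooFit, the helper names are made up for illustration, and the idea that a negative event count converted to an unsigned size is involved is purely an assumption, not something this report confirms.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Returns true if resizing an empty vector to n elements throws std::length_error.
inline bool resizeThrowsLengthError(std::size_t n)
{
   std::vector<double> buffer;
   try {
      buffer.resize(n, 0.0); // fill-resize; throws if n exceeds max_size()
   } catch (const std::length_error &) {
      return true;
   }
   return false;
}

// Hypothetical scenario: a signed count that has gone negative is converted
// to an unsigned size, which turns it into a huge value.
inline std::size_t asUnsigned(long long signedCount)
{
   return static_cast<std::size_t>(signedCount);
}
```

Calling `resizeThrowsLengthError(asUnsigned(-1))` returns true, while a small valid size such as 10 resizes without throwing. A size check before the resize would turn this kind of failure into a recoverable error instead of a terminate.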