[RF] Low-statistics fits terminate with BatchMode and NumCPU arguments #9406

@vlisovsk

ROOT 6.24/06

I have observed in several situations that fits to very small datasets (< 50 events) occasionally abort with an uncaught std::length_error when the arguments BatchMode(1) and NumCPU(X) are used together, especially when X is large. The same happens in simultaneous fits where at least one dataset is very small.

A minimal reproducible example is the following:

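// Headers needed when compiling the macro (e.g. with ACLiC); the interpreter autoloads them.
#include "RooRealVar.h"
#include "RooExponential.h"
#include "RooDataSet.h"

void test_crash(){
  using namespace RooFit;
  Int_t to_gen = 10;  // deliberately tiny dataset
  RooRealVar m("m","m",5000,5500);
  RooRealVar slope("slope", "slope", -0.001, -1., 1.);
  RooExponential* exp_pdf = new RooExponential("exp", "exp", m, slope);
  RooDataSet* ds = (RooDataSet*) exp_pdf->generate(RooArgSet(m), to_gen);
  // BatchMode(1) combined with many worker processes triggers the termination.
  exp_pdf->fitTo(*ds, BatchMode(1), NumCPU(20));
}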

Fitting this 10-event dataset produces the following output (the messages from the worker processes are interleaved):

...
 NOW USING STRATEGY  1: TRY TO BALANCE SPEED AGAINST RELIABILITY
 **********
 **    6 **MIGRAD         500           1
 **********
 FIRST CALL TO USER FUNCTION AT NEW START POINT, WITH IFLAG=4.
terminate called after throwing an instance of 'std::length_error'
terminate called after throwing an instance of 'terminate called after throwing an instance of 'std::length_error'
std::length_error  what():  '
  what():    what():  vector::_M_fill_insert
terminate called after throwing an instance of 'terminate called after throwing an instance of 'std::length_errorstd::length_error'
'
  what():    what():  vector::_M_fill_insertvector::_M_fill_insertvector::_M_fill_insertvector::_M_fill_insert
terminate called after throwing an instance of '

terminate called after throwing an instance of 'std::length_errorstd::length_error'

'
  what():    what():  vector::_M_fill_insert
vector::_M_fill_insert
terminate called after throwing an instance of 'terminate called after throwing an instance of 'std::length_error'
terminate called after throwing an instance of 'std::length_error'
terminate called after throwing an instance of 'terminate called after throwing an instance of 'std::length_errorstd::length_errorstd::length_error'
  what():  vector::_M_fill_insert
  what():  vector::_M_fill_insert
'
'
  what():  vector::_M_fill_insert  what():  
vector::_M_fill_insert
  what():  vector::_M_fill_insert
terminate called after throwing an instance of 'std::length_error'
  what():  vector::_M_fill_insert
terminate called after throwing an instance of 'std::length_error'
  what():  vector::_M_fill_insert
terminate called after throwing an instance of 'std::length_error'
  what():  vector::_M_fill_insert
terminate called after throwing an instance of 'std::length_error'
terminate called after throwing an instance of '  what():  vector::_M_fill_insert
std::length_error'
terminate called after throwing an instance of 'std::length_error  what():  terminate called after throwing an instance of ''
terminate called after throwing an instance of 'vector::_M_fill_insertstd::length_error
'
std::length_error'
  what():  vector::_M_fill_insert
  what():  vector::_M_fill_insert
  what():  vector::_M_fill_insert
RooRealMPFE::evaluate(nll_exp_expData_55d734b4c5e0_MPFE0) ERROR: unexpected message from server process: 8

Either setting BatchMode(0) or reducing the number of requested CPU cores avoids the problem. I have also encountered a case (a complex simultaneous fit) where BatchMode(1) alone led to the same termination, without any NumCPU argument.
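Until this is fixed, the second workaround (requesting fewer cores for small datasets) could be automated on the user side. The following is only a minimal sketch: safeNumCPU is a hypothetical helper, not part of RooFit, and the default threshold of 50 events per worker is an assumption borrowed from the dataset sizes mentioned above, not a documented limit. It also does not cover the case where BatchMode(1) alone triggers the termination.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical helper (not part of RooFit): cap the number of worker
// processes so that each one receives at least `minEventsPerWorker` events.
// The default threshold is an assumption for illustration only.
inline int safeNumCPU(std::size_t nEvents, int requestedCPU,
                      std::size_t minEventsPerWorker = 50) {
  assert(minEventsPerWorker > 0);
  // Largest worker count that still leaves minEventsPerWorker per process.
  const std::size_t byData = nEvents / minEventsPerWorker;
  // Treat non-positive requests as a request for a single process.
  const std::size_t requested =
      requestedCPU > 0 ? static_cast<std::size_t>(requestedCPU) : 1;
  // Never exceed the request, and never drop below one process.
  const std::size_t capped = std::min(byData, requested);
  return static_cast<int>(std::max<std::size_t>(capped, 1));
}
```

In the example above, the fit call would then read exp_pdf->fitTo(*ds, BatchMode(1), NumCPU(safeNumCPU(ds->numEntries(), 20))), which falls back to a single process for the 10-event dataset.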

I believe RooFit could handle this case more carefully, so that the process does not terminate.
