Spectator type in TMVACrossValidationApplication #9115

@liyimingihep

Expected behavior

$ROOTSYS/tutorial/tmva/TMVACrossValidationApplication.C provides a nice example of applying weights trained with $ROOTSYS/tutorial/tmva/TMVACrossValidation.C. It assumes the data file has variables x, y and eventID, and assigns the fold deterministically using the split expression int([eventID])%int([numFolds]), where in this case numFolds = 2.
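
For reference, the arithmetic that the split expression is expected to perform can be sketched in plain C++, without ROOT. The helper name `foldOf` is hypothetical and is not part of TMVA. It only mirrors int([eventID])%int([numFolds]) under the assumption that both placeholders are truncated to integers before the modulo is taken.

```cpp
#include <cassert>

// Hypothetical helper (not a TMVA function): mirrors the split expression
// int([eventID]) % int([numFolds]). Both arguments are taken as floating
// point and truncated, as the int(...) calls in the expression suggest.
int foldOf(double eventID, double numFolds) {
    const int folds = static_cast<int>(numFolds);
    assert(folds > 0);  // a modulo by zero would be undefined
    return static_cast<int>(eventID) % folds;
}
```

With numFolds = 2, odd eventID values should land in fold 1 and even values in fold 0, so an eventID of 2 behaves like an eventID of 0.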

Describe the bug

The spectator eventID has integer type in both training and application. However, TMVA seems to treat all variables as float, and there is a problem converting the integer to float when the split expression is evaluated. My observation is that the split expression returns 0 every time: instead of using the weights from fold int([eventID])%int([numFolds]), the event is always assigned to fold 0 (equivalently, numFolds). A simple workaround is to require that eventID in the data has type float.
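
One caveat about storing eventID as float, sketched below in plain C++: assuming the usual IEEE-754 single-precision format, a float represents consecutive integers exactly only up to 2^24 = 16777216, so larger event IDs may be rounded and end up in the wrong fold. The helper names `fitsInFloat` and `foldFromFloat` are hypothetical and are not taken from ROOT or TMVA.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: true if this event ID survives a round trip
// through a 32-bit float unchanged (assumes IEEE-754 single precision).
bool fitsInFloat(std::int64_t eventID) {
    const float asFloat = static_cast<float>(eventID);
    return static_cast<std::int64_t>(asFloat) == eventID;
}

// Hypothetical helper: the fold obtained when the ID is first stored as a
// float and then passed through int(eventID) % numFolds.
int foldFromFloat(std::int64_t eventID, int numFolds) {
    assert(numFolds > 0);
    const float stored = static_cast<float>(eventID);
    return static_cast<int>(stored) % numFolds;
}
```

For small IDs such as those in this report the float representation is exact, so the workaround is safe there. A check like `fitsInFloat` is only relevant for datasets whose event numbers exceed 2^24.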

To Reproduce

The attached files contain a modified TMVACrossValidationApplication.C that demonstrates the issue.
It generates a tree with x, y and eventID, and uses the weights saved in dataset/weights/ to evaluate the BDTG output.
For each event it decides which fold to use with the split expression int([eventID])%int([numFolds]), in each of the cases below:

  • each event with a unique eventID as integer (as in the official ROOT tutorial);
  • all eventID fixed to 1 as integer;
  • all eventID fixed to 2 as integer (equivalent to fixing it to 0);
  • all eventID fixed to 1 as float;
  • all eventID fixed to 2 as float (equivalent to fixing it to 0).

It can be run with root -l TMVACrossValidationApplication.C, and the output for ten events is listed here:

eventID | BDT int(eventID) | BDT int(1) | BDT float(1) | BDT int(2) | BDT float(2)
1       | 0.478264         | 0.478264   | 0.81558      | 0.478264   | 0.478264
2       | -0.626303        | -0.626303  | -0.796553    | -0.626303  | -0.626303
3       | -0.612484        | -0.612484  | -0.335053    | -0.612484  | -0.612484
4       | 0.981251         | 0.981251   | 0.939638     | 0.981251   | 0.981251
5       | 0.964202         | 0.964202   | 0.696889     | 0.964202   | 0.964202
6       | 0.992213         | 0.992213   | 0.989813     | 0.992213   | 0.992213
7       | 0.948738         | 0.948738   | 0.971397     | 0.948738   | 0.948738
8       | 0.927619         | 0.927619   | 0.932366     | 0.927619   | 0.927619
9       | 0.478264         | 0.478264   | 0.778263     | 0.478264   | 0.478264
10      | 0.994266         | 0.994266   | 0.996632     | 0.994266   | 0.994266

The BDT output should differ between eventID fixed to 1 and eventID fixed to 2, which is the case when the values are floats (columns 4 and 6).
But if eventID has integer type, then whatever its value, the BDT output is the same as if it were 0 (or, equivalently, numFolds = 2).

Setup

The test above was run with ROOT 6.22 installed on macOS Catalina, though the platform does not seem to matter much.

demo.zip
