Add RDataFrame::DefinePerSample #6745

@stwunsch

In previous meetings (see here and here) we identified that a typical HEP analysis would benefit from a variant of Define that is evaluated only once per "dataset" rather than once per event. How a "dataset" should be identified is not yet settled. An example scenario is given below (per-sample event weights, as is typical for simulated datasets):

// Construct RDF
RDataFrame df(tree, files);

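// Declare computations
// Identifier_t stands in for the dataset identifier type, which is still to be decided;
// it is assumed here to be string-like and to provide contains().
auto get_scale = [](const Identifier_t& dataset)
{
   // e.g. dataset = "filename.root/treename"
   if (dataset.contains("Data")) return 1.0;
   else if (dataset.contains("DY")) return 0.9;
   else if (dataset.contains("WJets")) return 1.1;
   else throw std::runtime_error("Unknown dataset");
};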
auto h = df.DefinePerSample("weight", get_scale)
           .Histo1D("nMuon", "weight");

// Access result
h->Draw();
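
To make the requested "evaluate once per dataset" semantics concrete, here is a minimal standalone sketch that does not use ROOT at all. It assumes the identifier is a plain `std::string` of the form `filename.root/treename`, as in the comment above. `PerSampleValue` is a hypothetical helper invented for illustration, not an existing or proposed RDataFrame class. It caches the result of the user function and only re-evaluates when the dataset identifier changes.

```cpp
#include <functional>
#include <stdexcept>
#include <string>
#include <utility>

// Assumption: the dataset identifier is a string like "filename.root/treename".
using Identifier_t = std::string;

// Same mapping as get_scale in the example, written with find() so that it
// does not depend on C++23's std::string::contains().
inline double get_scale(const Identifier_t &dataset)
{
   auto has = [&](const char *tag) { return dataset.find(tag) != std::string::npos; };
   if (has("Data")) return 1.0;
   if (has("DY")) return 0.9;
   if (has("WJets")) return 1.1;
   throw std::runtime_error("Unknown dataset: " + dataset);
}

// Hypothetical helper: wraps a per-dataset function and re-evaluates it only
// when the identifier differs from the one seen on the previous call.
class PerSampleValue {
public:
   explicit PerSampleValue(std::function<double(const Identifier_t &)> func)
      : fFunc(std::move(func)) {}

   // Intended to be called once per event with the identifier of the
   // dataset the event belongs to.
   double operator()(const Identifier_t &dataset)
   {
      if (!fValid || dataset != fCurrent) {
         fValue = fFunc(dataset); // may throw; the cached state is then left unchanged
         fCurrent = dataset;
         fValid = true;
         ++fEvaluations;
      }
      return fValue;
   }

   // Number of times the wrapped function was actually evaluated.
   unsigned int Evaluations() const { return fEvaluations; }

private:
   std::function<double(const Identifier_t &)> fFunc;
   Identifier_t fCurrent;
   double fValue = 0.0;
   bool fValid = false;
   unsigned int fEvaluations = 0;
};
```

With this sketch, calling a `PerSampleValue` built from `get_scale` for many consecutive events of the same file evaluates `get_scale` only once. A real implementation inside RDataFrame would additionally have to handle multi-threaded event loops (for example one cache per processing slot), which this sketch deliberately ignores.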
