[DF] Distributed RDataFrame doesn't handle friend trees correctly #7584

@vepadulano

  • Checked for duplicates

Describe the bug

Distributed RDataFrame supports friend trees, but something appears to be missing. The gist at https://gist.github.com/vepadulano/b42343bff7297958c46675577bce46a9 reproduces the problem:

  1. Two RDataFrames are created, each with one column filled
  2. Both are snapshotted to disk and merged into a single file with hadd
  3. A TChain is created with one column from the merged file, and a friend TChain holding the second column from the merged file is attached to it
  4. A distributed RDataFrame with the Spark backend is created from the TChain, then one histogram per column is booked and drawn to a canvas
  5. The operation fails with:
    TypeError: Template method resolution failed:
      none of the 4 overloaded methods succeeded. Full details:
      ROOT::RDF::RResultPtr<TH1D> ROOT::RDF::RInterface<ROOT::Detail::RDF::RRange<ROOT::Detail::RDF::RLoopManager>,void>::Histo1D(experimental::basic_string_view<char,char_traits<char> > vName) =>
        runtime_error: Unknown column: myfriend.rnd
    

Expected behavior

The program should not fail. Replacing the distributed RDataFrame object with a plain RDataFrame produces the correct output image.
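
For readers without the gist at hand, the friend setup and the two histogram bookings might look roughly like the sketch below. This is not the gist's code: the helper names, the tree names passed in, and the main column name "rnd" are assumptions for illustration. Only the alias and column in "myfriend.rnd" come from the error message above.

```python
def build_chain_with_friend(merged_file, main_tree, friend_tree, alias="myfriend"):
    """Build a TChain on main_tree and attach friend_tree from the same file."""
    import ROOT  # deferred import: only needed when the chains are actually built

    main = ROOT.TChain(main_tree)
    main.Add(merged_file)

    friend = ROOT.TChain(friend_tree)
    friend.Add(merged_file)

    # Columns of the friend become addressable as "<alias>.<column>"
    main.AddFriend(friend, alias)

    # Return both so the friend chain stays referenced while the main one is in use
    return main, friend


def book_histograms(df, main_column="rnd", alias="myfriend", friend_column="rnd"):
    """Book one histogram on the main column and one on the friend column."""
    h_main = df.Histo1D(main_column)
    h_friend = df.Histo1D(f"{alias}.{friend_column}")  # e.g. "myfriend.rnd"
    return h_main, h_friend
```

With a plain RDataFrame built from the main chain, e.g. `book_histograms(ROOT.RDataFrame(main))`, the second booking is the one that works locally. The same booking on the distributed dataframe is where "Unknown column: myfriend.rnd" is raised.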

To Reproduce

  1. Source an environment with ROOT master
  2. Download the linked gist
  3. Run python friendtrees_spark.py

Setup

Fedora 32
ROOT version: master
Built from source

Additional context

Thanks to @Zeguivert for originally reporting this issue
