[I/O] Race condition when reading vectors with custom allocators with TTreeProcessorMT #10357

@eguiraud

This is a reproducer (segfaults frequently but not always):

#include <ROOT/TTreeProcessorMT.hxx>
#include <TROOT.h>
#include <TTreeReader.h>
#include <TTreeReaderArray.h>

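void workload(TTreeReader &r) {
  TTreeReaderArray<double> ra(r, "truthCaloPt");
  // Querying the array size on every entry is enough to hit the crash
  // (see frames #1-#3 of Thread 1 in the backtrace below).
  while (r.Next())
    ra.GetSize();
}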

int main() {
  ROOT::EnableImplicitMT(2);
  ROOT::TTreeProcessorMT mt({"f1.root", "f2.root", "f3.root", "f4.root", "f5.root"}, "t");
  mt.Process(workload);
}

With these files: files.zip

The problem seems to be at the level of TGenCollectionProxy: multiple threads end up sharing the same TGenCollectionProxy objects, which is not thread safe, e.g. because of this line in TEmulatedCollectionProxy:

// FIXME: This is not thread safe.
TVirtualCollectionProxy::TPushPop env(const_cast<TEmulatedCollectionProxy*>(this), p);

In principle, however, since each thread uses its own TChain/TTreeReader, each thread should also be accessing its own TGenCollectionProxy instances.
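To illustrate why sharing matters, here is a minimal sketch that does not use ROOT at all. `ToyProxy`, `ToyPushPop`, `SizeOf` and `RunWithPerThreadProxies` are hypothetical names made up for this example; they only mimic the push/pop pattern suggested by the `TPushPop` guard and the `PopProxy` frames in the backtraces, and are not the actual ROOT implementation. The sketch shows the pattern that is expected to be safe (one proxy per thread). Passing the same `ToyProxy` to several threads would make them mutate the same unsynchronized stack, which is a data race.

```cpp
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for a collection proxy: it remembers which object it
// is currently "bound" to by keeping an unsynchronized stack of pointers.
class ToyProxy {
public:
  void Push(void *obj) { fStack.push_back(obj); }
  void Pop() { fStack.pop_back(); }
  void *Current() const { return fStack.empty() ? nullptr : fStack.back(); }
  std::size_t Depth() const { return fStack.size(); }

private:
  std::vector<void *> fStack; // mutable state: unsafe if shared across threads
};

// RAII guard in the spirit of a push/pop helper: bind on construction,
// unbind on destruction.
struct ToyPushPop {
  ToyProxy &fProxy;
  ToyPushPop(ToyProxy &proxy, void *obj) : fProxy(proxy) { fProxy.Push(obj); }
  ~ToyPushPop() { fProxy.Pop(); }
};

// Bind the proxy to a collection, query it through the proxy, then unbind.
std::size_t SizeOf(ToyProxy &proxy, std::vector<double> &v) {
  ToyPushPop env(proxy, &v);
  return static_cast<std::vector<double> *>(proxy.Current())->size();
}

// Safe pattern: every thread owns its proxy, so no mutable state is shared.
bool RunWithPerThreadProxies(int nThreads, int nIters) {
  std::vector<char> ok(nThreads, 0); // one slot per thread, no resizing later
  std::vector<std::thread> threads;
  for (int t = 0; t < nThreads; ++t) {
    threads.emplace_back([t, nIters, &ok] {
      ToyProxy proxy;                       // thread-local instance
      std::vector<double> data(t + 1, 0.0); // thread-local collection
      bool good = true;
      for (int i = 0; i < nIters; ++i)
        good = good && (SizeOf(proxy, data) == data.size());
      // After every guard has been destroyed the stack must be empty again.
      ok[t] = good && proxy.Depth() == 0;
    });
  }
  for (auto &th : threads)
    th.join();
  for (char c : ok)
    if (!c)
      return false;
  return true;
}
```

Calling `RunWithPerThreadProxies(2, 100000)` should return `true`. The crash reported here corresponds to the other situation, where two threads reach the same proxy instance (same `this` pointer at frame 0 in both backtraces).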

Example backtraces at the point of crash (this is one of several failure modes, but it's the one where the problem is clear -- both threads, at frame 0, are accessing the same TGenCollectionProxy instance):

>>> thread apply all bt 10

Thread 2 (Thread 0x7fffdc0e2640 (LWP 312745) "repro_ttreeproc"):
#0  0x00007ffff767d973 in TGenCollectionProxy::PopProxy (this=0x7fffd4016090) at ../io/io/src/TGenCollectionProxy.cxx:1333
#1  0x00007ffff7d57a15 in TVirtualCollectionProxy::TPushPop::~TPushPop (this=0x7fffdc0dad20, __in_chrg=<optimized out>) at ../core/cont/inc/TVirtualCollectionProxy.h:65
#2  0x00007ffff76274b1 in TEmulatedCollectionProxy::Destructor (this=0x7fffd4016090, p=0x7fffd40156e0, dtorOnly=false) at ../io/io/src/TEmulatedCollectionProxy.cxx:87
#3  0x00007ffff7d4f8c2 in TClass::Destructor (this=0x7fffd40152c0, obj=0x7fffd40156e0, dtorOnly=false) at ../core/meta/src/TClass.cxx:5417
#4  0x00007ffff676afdb in TBranchElement::ReleaseObject (this=0x7fffd4017590) at ../tree/tree/src/TBranchElement.cxx:4743
#5  0x00007ffff676b265 in TBranchElement::ResetAddress (this=0x7fffd4017590) at ../tree/tree/src/TBranchElement.cxx:4806
#6  0x00007ffff675b10b in TBranchElement::~TBranchElement (this=0x7fffd4017590, __in_chrg=<optimized out>) at ../tree/tree/src/TBranchElement.cxx:982
#7  0x00007ffff675b338 in TBranchElement::~TBranchElement (this=0x7fffd4017590, __in_chrg=<optimized out>) at ../tree/tree/src/TBranchElement.cxx:1003
#8  0x00007ffff7ceae9f in TCollection::GarbageCollect (obj=0x7fffd4017590) at ../core/cont/src/TCollection.cxx:736
#9  0x00007ffff7cfbe70 in TObjArray::Delete (this=0x7fffd4011ab8) at ../core/cont/src/TObjArray.cxx:376
(More stack frames follow...)

Thread 1 (Thread 0x7ffff42bec00 (LWP 312681) "repro_ttreeproc"):
#0  0x00007ffff767d973 in TGenCollectionProxy::PopProxy (this=0x7fffd4016090) at ../io/io/src/TGenCollectionProxy.cxx:1333
#1  0x00007ffff656b78d in (anonymous namespace)::TCollectionLessSTLReader::GetSize (this=0x5555577ccb80, proxy=0x5555577cdde0) at ../tree/treeplayer/src/TTreeReaderArray.cxx:130
#2  0x0000555555561837 in ROOT::Internal::TTreeReaderArrayBase::GetSize (this=0x7fffffffc1c0) at /home/blue/ROOT/master/cmake-build-foo/include/TTreeReaderArray.h:35
#3  0x00005555555612bc in workload (r=...) at repro_ttreeprocmt.cpp:10
#4  0x0000555555563ef5 in std::__invoke_impl<void, void (*&)(TTreeReader&), TTreeReader&> (__f=@0x7fffffffde50: 0x555555561269 <workload(TTreeReader&)>) at /usr/include/c++/11.2.0/bits/invoke.h:61
#5  0x0000555555563784 in std::__invoke_r<void, void (*&)(TTreeReader&), TTreeReader&> (__fn=@0x7fffffffde50: 0x555555561269 <workload(TTreeReader&)>) at /usr/include/c++/11.2.0/bits/invoke.h:111
#6  0x0000555555562df8 in std::_Function_handler<void (TTreeReader&), void (*)(TTreeReader&)>::_M_invoke(std::_Any_data const&, TTreeReader&) (__functor=..., __args#0=...) at /usr/include/c++/11.2.0/bits/std_function.h:291
#7  0x00007ffff659e8a9 in std::function<void (TTreeReader&)>::operator()(TTreeReader&) const (this=0x7fffffffde50, __args#0=...) at /usr/include/c++/11.2.0/bits/std_function.h:560
#8  0x00007ffff659881c in operator() (__closure=0x7fffffffcf10, c=...) at ../tree/treeplayer/src/TTreeProcessorMT.cxx:555
#9  0x00007ffff6599d8c in operator() (__closure=0x7fffffffceb0, i=0) at ../core/imt/inc/ROOT/TThreadExecutor.hxx:231
(More stack frames follow...)

First reported at https://root-forum.cern.ch/t/root-6-26-00-issue-with-multi-threaded-rdataframe-and-rvec/49310 .
