[I/O] Race condition when reading vectors with custom allocators with TTreeProcessorMT #10357
Description
This is a reproducer (segfaults frequently but not always):
#include <ROOT/TTreeProcessorMT.hxx>
#include <TROOT.h>
#include <TTreeReader.h>
#include <TTreeReaderArray.h>
void workload(TTreeReader &r) {
   TTreeReaderArray<double> ra(r, "truthCaloPt");
   while (r.Next())
      ra.GetSize();
}
int main() {
   ROOT::EnableImplicitMT(2);
   ROOT::TTreeProcessorMT mt({"f1.root", "f2.root", "f3.root", "f4.root", "f5.root"}, "t");
   mt.Process(workload);
}
With these files: files.zip
The problem seems to be at the level of TGenCollectionProxy: multiple threads end up sharing the same TGenCollectionProxy objects, which is not thread safe, e.g. because of these lines in root/io/io/src/TEmulatedCollectionProxy.cxx (lines 84 to 85 in bce5777):
   // FIXME: This is not thread safe.
   TVirtualCollectionProxy::TPushPop env(const_cast<TEmulatedCollectionProxy*>(this), p);
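To illustrate why a push/pop pair on a shared proxy is fragile, here is a minimal standalone sketch. It is not ROOT code and not a proposed fix: `ToyProxy`, `ToyPushPop`, `RunWithPerThreadProxies` and `RunWithLockedSharedProxy` are hypothetical names made up for this example, and the proxy's "environment stack" is only assumed to resemble what PushProxy/PopProxy manipulate. The sketch shows two generic ways to keep such a stack consistent: one proxy per thread, or a lock held around each push/pop pair on a shared proxy.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for a collection proxy: it keeps an unsynchronized
// stack of the objects it currently operates on.
class ToyProxy {
public:
   void Push(void *obj) { fEnv.push_back(obj); }
   void Pop()
   {
      assert(!fEnv.empty()); // a pop without a matching push is a logic error
      fEnv.pop_back();
   }
   std::size_t Depth() const { return fEnv.size(); }

private:
   std::vector<void *> fEnv; // mutable state with no internal locking
};

// RAII helper in the style of a push/pop guard: pushes on construction,
// pops on destruction.
struct ToyPushPop {
   ToyProxy &fProxy;
   ToyPushPop(ToyProxy &proxy, void *obj) : fProxy(proxy) { fProxy.Push(obj); }
   ~ToyPushPop() { fProxy.Pop(); }
};

// Option 1 (assumption: proxies are cheap to duplicate): every thread gets
// its own proxy, so no two threads ever touch the same stack.
bool RunWithPerThreadProxies(unsigned nThreads, unsigned nIterations)
{
   std::vector<ToyProxy> proxies(nThreads);
   std::vector<std::thread> workers;
   for (unsigned t = 0; t < nThreads; ++t) {
      workers.emplace_back([&proxies, t, nIterations] {
         int dummy = 0;
         for (unsigned i = 0; i < nIterations; ++i) {
            ToyPushPop guard(proxies[t], &dummy);
         }
      });
   }
   for (auto &w : workers)
      w.join();
   for (const auto &p : proxies)
      if (p.Depth() != 0)
         return false;
   return true;
}

// Option 2: one shared proxy, with a mutex held for the whole lifetime of
// each push/pop guard. The guard is destroyed before the lock is released.
bool RunWithLockedSharedProxy(unsigned nThreads, unsigned nIterations)
{
   ToyProxy shared;
   std::mutex m;
   std::vector<std::thread> workers;
   for (unsigned t = 0; t < nThreads; ++t) {
      workers.emplace_back([&shared, &m, nIterations] {
         int dummy = 0;
         for (unsigned i = 0; i < nIterations; ++i) {
            std::lock_guard<std::mutex> lock(m);
            ToyPushPop guard(shared, &dummy);
         }
      });
   }
   for (auto &w : workers)
      w.join();
   return shared.Depth() == 0;
}
```

Calling either function with two threads, e.g. `RunWithPerThreadProxies(2, 1000)`, should return `true` because every push is matched by a pop on a stack that only one thread modifies at a time. Removing the lock in the second variant would make the concurrent `push_back`/`pop_back` calls a data race, which is the kind of failure the backtraces in this report point at.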
Example backtraces at the point of crash (this is one of several failure modes, but it's the one where the problem is clear -- both threads, at frame 0, are accessing the same TGenCollectionProxy instance):
>>> thread apply all bt 10
Thread 2 (Thread 0x7fffdc0e2640 (LWP 312745) "repro_ttreeproc"):
#0 0x00007ffff767d973 in TGenCollectionProxy::PopProxy (this=0x7fffd4016090) at ../io/io/src/TGenCollectionProxy.cxx:1333
#1 0x00007ffff7d57a15 in TVirtualCollectionProxy::TPushPop::~TPushPop (this=0x7fffdc0dad20, __in_chrg=<optimized out>) at ../core/cont/inc/TVirtualCollectionProxy.h:65
#2 0x00007ffff76274b1 in TEmulatedCollectionProxy::Destructor (this=0x7fffd4016090, p=0x7fffd40156e0, dtorOnly=false) at ../io/io/src/TEmulatedCollectionProxy.cxx:87
#3 0x00007ffff7d4f8c2 in TClass::Destructor (this=0x7fffd40152c0, obj=0x7fffd40156e0, dtorOnly=false) at ../core/meta/src/TClass.cxx:5417
#4 0x00007ffff676afdb in TBranchElement::ReleaseObject (this=0x7fffd4017590) at ../tree/tree/src/TBranchElement.cxx:4743
#5 0x00007ffff676b265 in TBranchElement::ResetAddress (this=0x7fffd4017590) at ../tree/tree/src/TBranchElement.cxx:4806
#6 0x00007ffff675b10b in TBranchElement::~TBranchElement (this=0x7fffd4017590, __in_chrg=<optimized out>) at ../tree/tree/src/TBranchElement.cxx:982
#7 0x00007ffff675b338 in TBranchElement::~TBranchElement (this=0x7fffd4017590, __in_chrg=<optimized out>) at ../tree/tree/src/TBranchElement.cxx:1003
#8 0x00007ffff7ceae9f in TCollection::GarbageCollect (obj=0x7fffd4017590) at ../core/cont/src/TCollection.cxx:736
#9 0x00007ffff7cfbe70 in TObjArray::Delete (this=0x7fffd4011ab8) at ../core/cont/src/TObjArray.cxx:376
(More stack frames follow...)
Thread 1 (Thread 0x7ffff42bec00 (LWP 312681) "repro_ttreeproc"):
#0 0x00007ffff767d973 in TGenCollectionProxy::PopProxy (this=0x7fffd4016090) at ../io/io/src/TGenCollectionProxy.cxx:1333
#1 0x00007ffff656b78d in (anonymous namespace)::TCollectionLessSTLReader::GetSize (this=0x5555577ccb80, proxy=0x5555577cdde0) at ../tree/treeplayer/src/TTreeReaderArray.cxx:130
#2 0x0000555555561837 in ROOT::Internal::TTreeReaderArrayBase::GetSize (this=0x7fffffffc1c0) at /home/blue/ROOT/master/cmake-build-foo/include/TTreeReaderArray.h:35
#3 0x00005555555612bc in workload (r=...) at repro_ttreeprocmt.cpp:10
#4 0x0000555555563ef5 in std::__invoke_impl<void, void (*&)(TTreeReader&), TTreeReader&> (__f=@0x7fffffffde50: 0x555555561269 <workload(TTreeReader&)>) at /usr/include/c++/11.2.0/bits/invoke.h:61
#5 0x0000555555563784 in std::__invoke_r<void, void (*&)(TTreeReader&), TTreeReader&> (__fn=@0x7fffffffde50: 0x555555561269 <workload(TTreeReader&)>) at /usr/include/c++/11.2.0/bits/invoke.h:111
#6 0x0000555555562df8 in std::_Function_handler<void (TTreeReader&), void (*)(TTreeReader&)>::_M_invoke(std::_Any_data const&, TTreeReader&) (__functor=..., __args#0=...) at /usr/include/c++/11.2.0/bits/std_function.h:291
#7 0x00007ffff659e8a9 in std::function<void (TTreeReader&)>::operator()(TTreeReader&) const (this=0x7fffffffde50, __args#0=...) at /usr/include/c++/11.2.0/bits/std_function.h:560
#8 0x00007ffff659881c in operator() (__closure=0x7fffffffcf10, c=...) at ../tree/treeplayer/src/TTreeProcessorMT.cxx:555
#9 0x00007ffff6599d8c in operator() (__closure=0x7fffffffceb0, i=0) at ../core/imt/inc/ROOT/TThreadExecutor.hxx:231
(More stack frames follow...)
First reported at https://root-forum.cern.ch/t/root-6-26-00-issue-with-multi-threaded-rdataframe-and-rvec/49310