Conversation
This fixes a crash that looks like:
```
Thread 1 "nix-build" received signal SIGSEGV, Segmentation fault.
0x00007ffff7ad22a0 in GC_push_all_eager () from /nix/store/p1z58l18klf88iijpd0qi8yd2n9lhlk4-boehm-gc-8.0.4/lib/libgc.so.1
(gdb) bt
0 0x00007ffff7ad22a0 in GC_push_all_eager () from /nix/store/p1z58l18klf88iijpd0qi8yd2n9lhlk4-boehm-gc-8.0.4/lib/libgc.so.1
1 0x00007ffff7adeefb in GC_push_all_stacks () from /nix/store/p1z58l18klf88iijpd0qi8yd2n9lhlk4-boehm-gc-8.0.4/lib/libgc.so.1
2 0x00007ffff7ad5ac7 in GC_mark_some () from /nix/store/p1z58l18klf88iijpd0qi8yd2n9lhlk4-boehm-gc-8.0.4/lib/libgc.so.1
3 0x00007ffff7ad77bd in GC_stopped_mark () from /nix/store/p1z58l18klf88iijpd0qi8yd2n9lhlk4-boehm-gc-8.0.4/lib/libgc.so.1
4 0x00007ffff7adbe3a in GC_try_to_collect_inner.part.0 () from /nix/store/p1z58l18klf88iijpd0qi8yd2n9lhlk4-boehm-gc-8.0.4/lib/libgc.so.1
5 0x00007ffff7adc2a2 in GC_collect_or_expand () from /nix/store/p1z58l18klf88iijpd0qi8yd2n9lhlk4-boehm-gc-8.0.4/lib/libgc.so.1
6 0x00007ffff7adc4f8 in GC_allocobj () from /nix/store/p1z58l18klf88iijpd0qi8yd2n9lhlk4-boehm-gc-8.0.4/lib/libgc.so.1
7 0x00007ffff7adc88f in GC_generic_malloc_inner () from /nix/store/p1z58l18klf88iijpd0qi8yd2n9lhlk4-boehm-gc-8.0.4/lib/libgc.so.1
8 0x00007ffff7ae1a04 in GC_generic_malloc_many () from /nix/store/p1z58l18klf88iijpd0qi8yd2n9lhlk4-boehm-gc-8.0.4/lib/libgc.so.1
9 0x00007ffff7ae1c72 in GC_malloc_kind () from /nix/store/p1z58l18klf88iijpd0qi8yd2n9lhlk4-boehm-gc-8.0.4/lib/libgc.so.1
10 0x00007ffff7e003d6 in nix::EvalState::allocValue() () from /nix/store/hzdzcv9d3bc8rlsaphh7x54zsf0x8nx6-nix-2.4pre20210601_5985b8b/lib/libnixexpr.so
11 0x00007ffff7e04b9c in nix::EvalState::callPrimOp(nix::Value&, nix::Value&, nix::Value&, nix::Pos const&) () from /nix/store/hzdzcv9d3bc8rlsaphh7x54zsf0x8nx6-nix-2.4pre20210601_5985b8b/lib/libnixexpr.so
12 0x00007ffff7e0a773 in nix::EvalState::callFunction(nix::Value&, nix::Value&, nix::Value&, nix::Pos const&) () from /nix/store/hzdzcv9d3bc8rlsaphh7x54zsf0x8nx6-nix-2.4pre20210601_5985b8b/lib/libnixexpr.so
13 0x00007ffff7e0a91d in nix::ExprApp::eval(nix::EvalState&, nix::Env&, nix::Value&) () from /nix/store/hzdzcv9d3bc8rlsaphh7x54zsf0x8nx6-nix-2.4pre20210601_5985b8b/lib/libnixexpr.so
14 0x00007ffff7e0a8f8 in nix::ExprApp::eval(nix::EvalState&, nix::Env&, nix::Value&) () from /nix/store/hzdzcv9d3bc8rlsaphh7x54zsf0x8nx6-nix-2.4pre20210601_5985b8b/lib/libnixexpr.so
15 0x00007ffff7e0e0e8 in nix::ExprOpNEq::eval(nix::EvalState&, nix::Env&, nix::Value&) () from /nix/store/hzdzcv9d3bc8rlsaphh7x54zsf0x8nx6-nix-2.4pre20210601_5985b8b/lib/libnixexpr.so
16 0x00007ffff7e0d708 in nix::ExprOpOr::eval(nix::EvalState&, nix::Env&, nix::Value&) () from /nix/store/hzdzcv9d3bc8rlsaphh7x54zsf0x8nx6-nix-2.4pre20210601_5985b8b/lib/libnixexpr.so
17 0x00007ffff7e0d695 in nix::ExprOpOr::eval(nix::EvalState&, nix::Env&, nix::Value&) () from /nix/store/hzdzcv9d3bc8rlsaphh7x54zsf0x8nx6-nix-2.4pre20210601_5985b8b/lib/libnixexpr.so
18 0x00007ffff7e0d695 in nix::ExprOpOr::eval(nix::EvalState&, nix::Env&, nix::Value&) () from /nix/store/hzdzcv9d3bc8rlsaphh7x54zsf0x8nx6-nix-2.4pre20210601_5985b8b/lib/libnixexpr.so
19 0x00007ffff7e0d695 in nix::ExprOpOr::eval(nix::EvalState&, nix::Env&, nix::Value&) () from /nix/store/hzdzcv9d3bc8rlsaphh7x54zsf0x8nx6-nix-2.4pre20210601_5985b8b/lib/libnixexpr.so
20 0x00007ffff7e0d695 in nix::ExprOpOr::eval(nix::EvalState&, nix::Env&, nix::Value&) () from /nix/store/hzdzcv9d3bc8rlsaphh7x54zsf0x8nx6-nix-2.4pre20210601_5985b8b/lib/libnixexpr.so
21 0x00007ffff7e09e19 in nix::ExprOpNot::eval(nix::EvalState&, nix::Env&, nix::Value&) () from /nix/store/hzdzcv9d3bc8rlsaphh7x54zsf0x8nx6-nix-2.4pre20210601_5985b8b/lib/libnixexpr.so
22 0x00007ffff7e0a792 in nix::EvalState::callFunction(nix::Value&, nix::Value&, nix::Value&, nix::Pos const&) () from /nix/store/hzdzcv9d3bc8rlsaphh7x54zsf0x8nx6-nix-2.4pre20210601_5985b8b/lib/libnixexpr.so
23 0x00007ffff7e8cba0 in nix::addPath(nix::EvalState&, nix::Pos const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, nix::Value*, nix::FileIngestionMethod, std::optional<nix::Hash>, nix::Value&)::{lambda(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)#1}::operator()(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) const () from /nix/store/hzdzcv9d3bc8rlsaphh7x54zsf0x8nx6-nix-2.4pre20210601_5985b8b/lib/libnixexpr.so
24 0x00007ffff752e6f9 in nix::dump(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, nix::Sink&, std::function<bool (std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)>&) () from /nix/store/hzdzcv9d3bc8rlsaphh7x54zsf0x8nx6-nix-2.4pre20210601_5985b8b/lib/libnixutil.so
25 0x00007ffff752e8e2 in nix::dump(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, nix::Sink&, std::function<bool (std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)>&) () from /nix/store/hzdzcv9d3bc8rlsaphh7x54zsf0x8nx6-nix-2.4pre20210601_5985b8b/lib/libnixutil.so
26 0x00007ffff752e8e2 in nix::dump(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, nix::Sink&, std::function<bool (std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)>&) () from /nix/store/hzdzcv9d3bc8rlsaphh7x54zsf0x8nx6-nix-2.4pre20210601_5985b8b/lib/libnixutil.so
27 0x00007ffff752e8e2 in nix::dump(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, nix::Sink&, std::function<bool (std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)>&) () from /nix/store/hzdzcv9d3bc8rlsaphh7x54zsf0x8nx6-nix-2.4pre20210601_5985b8b/lib/libnixutil.so
28 0x00007ffff752e8e2 in nix::dump(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, nix::Sink&, std::function<bool (std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)>&) () from /nix/store/hzdzcv9d3bc8rlsaphh7x54zsf0x8nx6-nix-2.4pre20210601_5985b8b/lib/libnixutil.so
29 0x00007ffff752e8e2 in nix::dump(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, nix::Sink&, std::function<bool (std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)>&) () from /nix/store/hzdzcv9d3bc8rlsaphh7x54zsf0x8nx6-nix-2.4pre20210601_5985b8b/lib/libnixutil.so
30 0x00007ffff757f8c0 in void boost::context::detail::fiber_entry<boost::context::detail::fiber_record<boost::context::fiber, nix::VirtualStackAllocator, boost::coroutines2::detail::pull_coroutine<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >::control_block::control_block<nix::VirtualStackAllocator, nix::sinkToSource(std::function<void (nix::Sink&)>, std::function<void ()>)::SinkToSource::read(char*, unsigned long)::{lambda(boost::coroutines2::detail::push_coroutine<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >&)#1}>(boost::context::preallocated, nix::VirtualStackAllocator&&, nix::sinkToSource(std::function<void (nix::Sink&)>, std::function<void ()>)::SinkToSource::read(char*, unsigned long)::{lambda(boost::coroutines2::detail::push_coroutine<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >&)#1}&&)::{lambda(boost::context::fiber&&)#1}> >(boost::context::detail::transfer_t) () from /nix/store/hzdzcv9d3bc8rlsaphh7x54zsf0x8nx6-nix-2.4pre20210601_5985b8b/lib/libnixutil.so
31 0x00007ffff6f331ef in make_fcontext () from /nix/store/hzdzcv9d3bc8rlsaphh7x54zsf0x8nx6-nix-2.4pre20210601_5985b8b/lib/libboost_context.so.1.69.0
32 0x0000000000000000 in ?? ()
```
Fixes the problem where a stack pointer outside the original thread causes the collector to crash. It could be made more accurate by recording the stack pointer every time we switch to a coroutine. We can use this information to update our own coroutine stacks like normal data. When the stack pointer is on a thread, we can add a field to GC_thread "fallback_sp" to be used when the thread sp is outside the original thread range.
|
should the boehm-gc patch be upstreamed? looks like it addresses some of the |
The existing fixmes in boehm are for an unrelated feature on very rare architectures where the stack grows in the other direction. I did not implement those fixmes because we don't need it. Otoh, I couldn't resist writing the code to support these rare archs in my own part. I'll remove it, so the next person won't think it matters to Nix. |
As it is, it is part of a very specific solution that works for Nix, which isn't ideal because it scans garbage stack sections sometimes. That's ok because it's a conservative garbage collector and in the case of Nix those garbage sections will not be scanned anymore when the coroutine has ended. This is the main reason I have doubts about upstreaming. |
|
Can confirm that this fixes a segfault we experience when building |
That's fair. Could we open an issue and explain what we did, so they're aware of our use case? |
|
Really great work, thanks! Yeah it would be good to get upstream's ideas about this. |
|
While formulating my question I've found issues I hadn't seen before, with some recommendations. I'll summarize my findings here. |
As has been done in NixOS/nix#4944 This introduces the boehmgc_nix and boehmgc_nixUnstable attributes which are useful for external packages that link with Nix and its boehmgc. (cherry picked from commit 596ac24)
|
The boehmgc patch only applies to Linux. Boehm GC has a separate "stop the world and push thread stacks" implementation for Darwin (see |
This was an oversight. See #4919
It's more frequent than that. The main cause is that an OS thread has its stack pointer in a coroutine, which is not supported by an unpatched GC. Which thread (or stack) invokes the GC doesn't really matter beyond that. |
I've figured out another cause of gc crashes, related to coroutines.
This now solves
I'll run
nixUnstablewith this PR and #4895 to see if any other crashes occur.