Skip to content

Python thread state segfaults#3853

Merged
MridulS merged 6 commits intoscipp:mainfrom
MridulS:segfaults
Mar 2, 2026
Merged

Python thread state segfaults#3853
MridulS merged 6 commits intoscipp:mainfrom
MridulS:segfaults

Conversation

@MridulS
Copy link
Copy Markdown
Member

@MridulS MridulS commented Feb 24, 2026

Cherry picks c95645f from #3852

The wheel builds for cp14t were failing https://github.com/scipp/scipp/actions/runs/22346477173

This is Claude's reasoning while trying to push it:

  Root Cause Chain

  1. PyEval_SaveThread() detaches thread state (even on free-threaded Python)

  From CPython Thread State docs:
  "Detach the attached thread state and return it. The thread will have no thread state upon returning."

  "For free-threaded builds, the GIL is normally out of the question, but detaching the thread state is still
  required for blocking I/O and long operations."

  So py::call_guard<py::gil_scoped_release>() calls PyEval_SaveThread() which sets the thread state to NULL, even on
   free-threaded Python.

  2. PyCriticalSection_Begin requires an attached thread state

  From CPython's pycore_critical_section.h, the macro expands to:
  PyThreadState *_cs_tstate = _PyThreadState_GET();
  _PyCriticalSection_Begin(_cs_tstate, &_py_cs, op);

  And _PyCriticalSection_BeginMutex then accesses tstate->critical_section with no NULL check. After
  PyEval_SaveThread(), _PyThreadState_GET() returns NULL, causing the segfault at tstate->critical_section.

  3. pybind11's cast() triggers critical sections on free-threaded Python

  pybind11 3.x uses PyCriticalSection internally (via scoped_critical_section in critical_section.h) when
  Py_GIL_DISABLED is defined. The a.cast<T&>() call in the in-place operators accesses Python object internals,
  which triggers PyCriticalSection_Begin on free-threaded builds.

  4. PEP 703 Critical Sections

  CPython issue #111569 documents the implementation of critical sections from PEP 703, confirming they are the
  per-object locking mechanism replacing the GIL in free-threaded builds.

  Why the fix works

  The non-inplace operators (__add__, etc.) take const T &a — pybind11 converts Python→C++ before the call guard, so
   no Python API calls happen after PyEval_SaveThread(). The in-place operators take py::object &a and call
  a.cast<T&>() after the GIL release. The fix extracts the C++ reference while thread state is still attached, then
  releases the GIL only around the pure C++ arithmetic.

  Sources:
  - CPython Thread States and GIL docs
  - CPython pycore_critical_section.h
  - CPython Synchronization Primitives docs
  - PEP 703 Critical Sections implementation
  - pybind11 changelog - free-threaded Python fixes
  - pybind11 critical_section.h

@MridulS
Copy link
Copy Markdown
Member Author

MridulS commented Feb 24, 2026

The wheel builds for free threaded python is fixed with these changes https://github.com/scipp/scipp/actions/runs/22350551565

@MridulS MridulS enabled auto-merge (squash) February 24, 2026 14:05
@MridulS MridulS requested a review from jl-wynen February 24, 2026 14:05
Copy link
Copy Markdown
Member

@jl-wynen jl-wynen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really understand the C API and the documentation is lacking. But I don't see how this PR could be harmful and since it seems to solve the issue, I think we can merge this after some small improvements.

[](py::object &a, Other &b) -> py::object & {
a.cast<T &>() += RHSSetup{}(b);
auto &self = a.cast<T &>();
auto &&rhs = RHSSetup{}(b);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does RHSSetup need the thread state?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope, it's in cpp land. Updated and pushed again.

// thread state, and a.cast<T &>() may trigger PyCriticalSection_Begin
// which requires the thread state, causing a segfault. Instead, we
// extract the C++ reference first, then release the GIL only around the
// pure C++ operation.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You lost the original comment. I think explaining why we return by reference here is still relevant.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is now a lot of repeated, non-trivial code here. Can you extract that into a reusable function? E.g., using a callback like

c.def(
    "__iadd__",
    inplace_operator([](auto &&lhs, auto &&rhs) { lhs += rhs; }),
    py::is_operator());

};
inplace_logical("__ior__", [](auto &a, const auto &b) { a |= b; });
inplace_logical("__ixor__", [](auto &a, const auto &b) { a ^= b; });
inplace_logical("__iand__", [](auto &a, const auto &b) { a &= b; });
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you use the same pattern for logical and arithmetic operators? That is, either use a function like inplace_logical for all or the c.def("...", inplace_op...) pattern for all.

Also, how come arithmetic uses (auto &a, auto &&b) and logical uses (auto &a, const auto &b)?

@MridulS MridulS merged commit 0c024a3 into scipp:main Mar 2, 2026
37 of 54 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants