Skip to content

Redo fix lookup for current Thread in async signal handler#122513

Merged
jkotas merged 5 commits intodotnet:mainfrom
janvorli:redo-fix-async-thread-by-threadid-lookup
Dec 14, 2025
Merged

Redo fix lookup for current Thread in async signal handler#122513
jkotas merged 5 commits intodotnet:mainfrom
janvorli:redo-fix-async-thread-by-threadid-lookup

Conversation

@janvorli
Copy link
Member

This is a new version of the fix. The code in HijackCallback was using the fact whether the
pThreadToHijack is NULL or not as an indicator of whether it should
suspend inline or redirect the thread. So passing in the pThreadToHijack
without other changes caused it to never suspend inline on Unix and it
was causing hangs in System.Collections.Concurrent tests.

There are two commits - one is the original change and the second it
the fix.

The current CheckActivationSafePoint uses thread local storage to
get the current Thread instance. But this function is called from
async signal handler (the activation signal handler) and it is not
allowed to access TLS variables there because the access can allocate
and if the interrupted code was running in an allocation code, it
could crash.
There was no problem with this since .NET 1.0, but a change in the
recent glibc version has broken this. We've got reports of crashes
in this code due to the reason mentioned above.

This change introduces an async safe mechanism for accessing the
current Thread instance from async signal handlers. It uses a
segmented array that can grow, but never shrink. Entries for
threads are added when runtime creates a thread / attaches to an
external thread and removed when the thread dies.

The check for safety of the activation injection was further enhanced
to make sure that the ScanReaderLock is not taken. In cases it would
need to be taken, we just reject the location.

Since NativeAOT is subject to the same issue, the code to maintain the
thread id to thread instance map is placed to the minipal and shared
between coreclr and NativeAOT.

Closes #121581

The code in HijackCallback was using the fact whether the
pThreadToHijack is NULL or not as an indicator of whether it should
suspend inline or redirect the thread. So passing in the pThreadToHijack
without other changes caused it to never suspend inline on Unix.
@janvorli janvorli added this to the 11.0.0 milestone Dec 12, 2025
@janvorli janvorli requested a review from jkotas December 12, 2025 23:38
@janvorli janvorli self-assigned this Dec 12, 2025
Copilot AI review requested due to automatic review settings December 12, 2025 23:38
@janvorli
Copy link
Member Author

cc: @agocke

@dotnet-policy-service
Copy link
Contributor

Tagging subscribers to this area: @mangod9
See info in area-owners.md if you want to be subscribed.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes a critical issue where accessing Thread-Local Storage (TLS) in async signal handlers could crash on recent glibc versions. The fix introduces an async-safe mechanism using a lock-free segmented array to map OS thread IDs to Thread instances, avoiding TLS access in signal handlers.

Key changes:

  • Added async-safe thread map infrastructure using lock-free atomics
  • Enhanced CheckActivationSafePoint to use async-safe thread lookup and verify reader lock safety
  • Updated HijackCallback signature to explicitly control inline vs redirect suspension behavior

Reviewed changes

Copilot reviewed 17 out of 17 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
src/native/minipal/thread.h Added minipal_get_current_thread_id_no_cache() function that retrieves thread ID without TLS caching for async-safe usage
src/coreclr/runtime/asyncsafethreadmap.h New header defining async-safe thread map API for coreclr
src/coreclr/runtime/asyncsafethreadmap.cpp New implementation of lock-free segmented thread map using atomic operations
src/coreclr/vm/threads.h Added GetThreadAsyncSafe() declaration for async-safe thread retrieval
src/coreclr/vm/threads.cpp Implemented GetThreadAsyncSafe() and integrated thread map operations into SetThread()
src/coreclr/vm/threadsuspend.cpp Updated CheckActivationSafePoint to use async-safe thread lookup and verify scan lock safety
src/coreclr/vm/codeman.h Added GetScanFlags() thread parameter and IsManagedCodeNoLock() declaration
src/coreclr/vm/codeman.cpp Implemented IsManagedCodeNoLock() and updated GetScanFlags() to accept optional thread parameter
src/coreclr/vm/CMakeLists.txt Added asyncsafethreadmap source files to coreclr build for Unix non-WASM targets
src/coreclr/pal/src/exception/signal.cpp Restructured signal handler control flow (cosmetic change)
src/coreclr/nativeaot/Runtime/threadstore.h Added GetCurrentThreadIfAvailableAsyncSafe() declaration
src/coreclr/nativeaot/Runtime/threadstore.cpp Implemented GetCurrentThreadIfAvailableAsyncSafe() and integrated thread map into attach/detach
src/coreclr/nativeaot/Runtime/thread.h Updated HijackCallback signature to include doInlineSuspend parameter
src/coreclr/nativeaot/Runtime/thread.cpp Updated HijackCallback implementation to use doInlineSuspend parameter instead of null check
src/coreclr/nativeaot/Runtime/unix/PalUnix.cpp Updated ActivationHandler to use async-safe thread lookup and pass doInlineSuspend flag
src/coreclr/nativeaot/Runtime/windows/PalMinWin.cpp Updated Windows activation and hijack handlers to pass doInlineSuspend flag
src/coreclr/nativeaot/Runtime/CMakeLists.txt Added asyncsafethreadmap source files to NativeAOT build for Unix non-WASM targets

@jkotas
Copy link
Member

jkotas commented Dec 13, 2025

/azp run runtime-nativeaot-outerloop

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@jkotas
Copy link
Member

jkotas commented Dec 14, 2025

/ba-g generic wasm failure - unrelated to the changes in the PR

@jkotas jkotas merged commit 8d796d8 into dotnet:main Dec 14, 2025
157 of 162 checks passed
@github-actions github-actions bot locked and limited conversation to collaborators Jan 13, 2026
@janvorli
Copy link
Member Author

/backport to release/10.0

@github-actions github-actions bot unlocked this conversation Feb 12, 2026
@github-actions
Copy link
Contributor

Started backporting to release/10.0 (link to workflow run)

@github-actions
Copy link
Contributor

@janvorli backporting to release/10.0 failed, the patch most likely resulted in conflicts. Please backport manually!

git am output
$ git am --3way --empty=keep --ignore-whitespace --keep-non-patch changes.patch

Applying: Redo fix lookup for current Thread in async signal handler
.git/rebase-apply/patch:233: trailing whitespace.
            ThreadSegment* pExpected = NULL;           
warning: 1 line adds whitespace errors.
Using index info to reconstruct a base tree...
M	src/coreclr/nativeaot/Runtime/CMakeLists.txt
M	src/coreclr/nativeaot/Runtime/threadstore.cpp
M	src/coreclr/nativeaot/Runtime/threadstore.h
M	src/coreclr/nativeaot/Runtime/unix/PalUnix.cpp
M	src/coreclr/pal/src/exception/signal.cpp
M	src/coreclr/vm/CMakeLists.txt
M	src/coreclr/vm/codeman.cpp
M	src/coreclr/vm/codeman.h
M	src/coreclr/vm/threads.cpp
M	src/coreclr/vm/threads.h
M	src/coreclr/vm/threadsuspend.cpp
M	src/native/minipal/thread.h
Falling back to patching base and 3-way merge...
Auto-merging src/coreclr/nativeaot/Runtime/CMakeLists.txt
Auto-merging src/coreclr/nativeaot/Runtime/threadstore.cpp
Auto-merging src/coreclr/nativeaot/Runtime/threadstore.h
Auto-merging src/coreclr/nativeaot/Runtime/unix/PalUnix.cpp
Auto-merging src/coreclr/pal/src/exception/signal.cpp
Auto-merging src/coreclr/vm/CMakeLists.txt
Auto-merging src/coreclr/vm/codeman.cpp
Auto-merging src/coreclr/vm/codeman.h
Auto-merging src/coreclr/vm/threads.cpp
Auto-merging src/coreclr/vm/threads.h
Auto-merging src/coreclr/vm/threadsuspend.cpp
Auto-merging src/native/minipal/thread.h
CONFLICT (content): Merge conflict in src/native/minipal/thread.h
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
hint: When you have resolved this problem, run "git am --continue".
hint: If you prefer to skip this patch, run "git am --skip" instead.
hint: To restore the original branch and stop patching, run "git am --abort".
hint: Disable this message with "git config set advice.mergeConflict false"
Patch failed at 0001 Redo fix lookup for current Thread in async signal handler
Error: The process '/usr/bin/git' failed with exit code 128

Link to workflow output

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Feb 12, 2026
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Address sanitizer: Crash/deadlock in inject signal handler on Linux with glibc >= 2.40

3 participants