Adjust how we access thread ID for header thin locks#124878
Adjust how we access thread ID for header thin locks#124878jkoritzinsky wants to merge 2 commits intodotnet:mainfrom
Conversation
Pass the `Thread*` object instead of the thread ID and only compute thread ID in the paths that need it to convince the compiler to better separate memory accesses for better pipelining. This brings the emitted assembly of the .NET 11 implementation closer to the .NET 10 implementation, which I think should be enough to get performance back (as the assembly is darn near identical at this point, arguably better for the new implementation).
|
Tagging subscribers to this area: @agocke |
|
@MihuBot benchmark System.Collections.TryAddGiventSize |
There was a problem hiding this comment.
Pull request overview
This PR optimizes the thin lock implementation by adjusting how thread IDs are accessed in AcquireHeaderThinLock and ReleaseHeaderThinLock. Instead of passing a pre-computed thread ID (DWORD tid), these methods now accept a Thread* pointer and only call GetThreadId() in code paths where the thread ID is actually needed. This change helps the compiler better separate memory accesses for improved CPU pipelining, bringing the .NET 11 implementation's generated assembly closer to .NET 10's performance characteristics.
Changes:
- Modified function signatures to accept
Thread* pCurThreadinstead ofDWORD tid - Moved
GetThreadId()calls inside conditional branches that actually use the thread ID - Updated call sites to pass
GetThread()instead ofGetThread()->GetThreadId()
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| src/coreclr/vm/syncblk.h | Updated method signatures for AcquireHeaderThinLock and ReleaseHeaderThinLock to accept Thread* parameter |
| src/coreclr/vm/syncblk.inl | Modified implementations to defer GetThreadId() calls to only the code paths that need the thread ID value |
| src/coreclr/vm/comsynchronizable.cpp | Updated call sites to pass GetThread() instead of computing thread ID upfront |
System.Collections.TryAddGiventSize_String_
|
|
@MihuBot benchmark Microsoft.Extensions.DependencyInjection.GetService.Scoped |
|
From the benchmark run:
I believe we have a fix for the regressions now. |
|
|
||
| // Here we know we have the "thin lock" layout, but the lock is not free. | ||
| // It could still be the recursion case - compare the thread id to check | ||
| if (tid != (DWORD)(oldValue & SBLK_MASK_LOCK_THREADID)) |
There was a problem hiding this comment.
Could the current thread have changed here? Why do we need to refetch the id?
|
|
||
| if ((syncBlockValue & (BIT_SBLK_SPIN_LOCK + BIT_SBLK_IS_HASH_OR_SYNCBLKINDEX)) == 0) | ||
| { | ||
| if (tid > SBLK_MASK_LOCK_THREADID) |
There was a problem hiding this comment.
Why isn't this early-out important?
Pass the
Thread*object instead of the thread ID and only compute thread ID in the paths that need it to convince the compiler to better separate memory accesses for better pipelining.This brings the emitted assembly of the .NET 11 implementation closer to the .NET 10 implementation, which I think should be enough to get performance back (as the assembly is darn near identical at this point, arguably better for the new implementation).
Fixes dotnet/perf-autofiling-issues#64673
Fixes #121350