-
Notifications
You must be signed in to change notification settings - Fork 5.3k
[NativeAOT] Use QueueUserAPC2 in GC suspension, if available. #80087
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Tagging subscribers to this area: @agocke, @MichalStrehovsky, @jkotas Issue DetailsNot completed yet, but should be functional. There will be some refactoring. Current test run will fail if lab machines have
|
At minimum the ARM64 legs are Win11. From the console log of one of the tests in this PR: 22621 is the 22H2 Windows 11 build. |
|
I am testing locally on 22623.1037 (x64 insider preview in Beta/Slow ring) and it has the feature. It is not that much newer than 22621. |
|
/azp run runtime-extra-platforms |
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
@jkotas - anything else we might want to add to this change? |
|
|
||
| #ifdef FEATURE_SPECIAL_USER_MODE_APC | ||
| typedef BOOL (WINAPI* QueueUserAPC2Proc)(PAPCFUNC ApcRoutine, HANDLE Thread, ULONG_PTR Data, QUEUE_USER_APC_FLAGS Flags); | ||
| static QueueUserAPC2Proc pfnQueueUserAPC2Proc; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| static QueueUserAPC2Proc pfnQueueUserAPC2Proc; | |
| static QueueUserAPC2Proc g_pfnQueueUserAPC2Proc; |
| // See if QueueUserAPC2 supports the special user-mode APC with a callback that includes the interrupted CONTEXT | ||
| if (pfnQueueUserAPC2Proc != NULL) | ||
| { | ||
| if (!(*pfnQueueUserAPC2Proc)(EmptyActivationHandler, GetCurrentThread(), 0, SpecialUserModeApcWithContextFlags)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we delay this until we actually need to suspend the thread? We should not even need a dummy call for this. We can just try to call it for real and see whether it failed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On Win10 this will always fail, but the API is present. We would need to prevent trying over and over.
With some sentinel value for "unknown" state and NULL for "does not work" this might be possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, we can have a bool flag that says the API failed with "SpecialUserModeApcWithContextFlags not supported error". Once the flag gets set, we will stop trying.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've uses a sentinel value for uninitialized state. Hijacking pass is single-threaded, so initializing on demand is simple.
Maybe it will save us some ticks at startup.
| { | ||
| g_pfnQueueUserAPC2Proc = (QueueUserAPC2Proc)GetProcAddress(LoadKernel32dll(), "QueueUserAPC2"); | ||
| } | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not thread safe in C/C++.
It is valid for C/C++ compiler to rewrite this as:
QueueUserAPC2Proc pfn = g_pfnQueueUserAPC2Proc;
if (g_pfnQueueUserAPC2Proc == QUEUE_USER_APC2_UNINITIALIZED)
{
g_pfnQueueUserAPC2Proc = pfn = (QueueUserAPC2Proc)GetProcAddress(LoadKernel32dll(), "QueueUserAPC2");
}
if (pfn)
{
....
}
It needs to use volatile or something like that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only one thread may perform suspension/hijacking at a time. This code is single threaded, protected by the thread store lock.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It needs comment at least to make this assumption explicit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I’ve made the variable ‘volatile’ initially n this change, but then realized that there is no need.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I’ll add a comment. It was not obvious.
I even had ‘Interlocked’, etc, before realizing it is single threaded and can be a lot simpler.
| return; | ||
| } | ||
|
|
||
| // queuing an APC failed. we will not try again |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we check for specific "not suppported" last error here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not think API documents particular errors. They just say the error could be fetched via last error.
If we recognize some errors, what would we do with errors that we do not recognize?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we recognize some errors, what would we do with errors that we do not recognize?
Ignore the error and try again a bit later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I’ll check what OS with no support for QUEUE_USER_APC_CALLBACK_DATA_CONTEXT sets the last error to.
jkotas
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you!
|
Looks like the tests are failing with "failed to queue an APC for unusual reason." |
|
/azp run runtime-extra-platforms |
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
Thanks!! |
|
The build is failing locally for me after this PR: C:\dev\dotnet\runtime\src\coreclr\nativeaot\Runtime\windows\PalRedhawkMinWin.cpp(452): error C2061: syntax error: identifier 'QUEUE_USER_APC_FLAGS'
C:\dev\dotnet\runtime\src\coreclr\nativeaot\Runtime\windows\PalRedhawkMinWin.cpp(457): error C4430: missing type specifier - int assumed. Note: C++ does not support default-int
C:\dev\dotnet\runtime\src\coreclr\nativeaot\Runtime\windows\PalRedhawkMinWin.cpp(457): error C2146: syntax error: missing ';' before identifier 'SpecialUserModeApcWithContextFlags'
C:\dev\dotnet\runtime\src\coreclr\nativeaot\Runtime\windows\PalRedhawkMinWin.cpp(457): error C2059: syntax error: '='
C:\dev\dotnet\runtime\src\coreclr\nativeaot\Runtime\windows\PalRedhawkMinWin.cpp(463): error C2065: 'APC_CALLBACK_DATA': undeclared identifier
C:\dev\dotnet\runtime\src\coreclr\nativeaot\Runtime\windows\PalRedhawkMinWin.cpp(463): error C2065: 'data': undeclared identifier
C:\dev\dotnet\runtime\src\coreclr\nativeaot\Runtime\windows\PalRedhawkMinWin.cpp(463): error C2065: 'APC_CALLBACK_DATA': undeclared identifier
C:\dev\dotnet\runtime\src\coreclr\nativeaot\Runtime\windows\PalRedhawkMinWin.cpp(463): error C2059: syntax error: ')'
C:\dev\dotnet\runtime\src\coreclr\nativeaot\Runtime\windows\PalRedhawkMinWin.cpp(464): error C2065: 'data': undeclared identifier
C:\dev\dotnet\runtime\src\coreclr\nativeaot\Runtime\windows\PalRedhawkMinWin.cpp(466): error C2065: 'data': undeclared identifier
C:\dev\dotnet\runtime\src\coreclr\nativeaot\Runtime\windows\PalRedhawkMinWin.cpp(507): error C2065: 'SpecialUserModeApcWithContextFlags': undeclared identifier |
|
@jakobbotsch What is the Windows SDK version on your machine? ( |
|
As I understand 8.0 main builds with Windows 11 SDK 10.0.22621, which defines these. |
|
Although vsconfig (https://github.com/dotnet/runtime/blob/main/.vsconfig) lists Windows10SDK.19041 |
|
This is confusing. When I tried defining these the lab build failed because of duplicate definitions and I only could reproduce the lab behavior by installing 10.0.22621 locally |
FWIW, this is the SDK version that I have on my machines as well. I always use the recommended Workloads installation approach (https://github.com/dotnet/runtime/blob/main/docs/workflow/requirements/windows-requirements.md#visual-studio). |
|
It is possible to redefine the types/constants under different name. It looks like this is what we did in coreclr CET support. |
|
Here is a PR that should build without 22621 #80333 |
|
With that PR I can build even if I remove 22621. I could not remove Windows 11 22000 SDK though, because, according to VS installer, c++ v143 depends on it and would need to be removed too. |
The version was 10.0.19041. It indeed builds fine for me after updating the SDK, so no worries from my side at least. |
No description provided.