Skip to content

Conversation

@VSadov
Copy link
Member

@VSadov VSadov commented Dec 31, 2022

No description provided.

@ghost ghost assigned VSadov Dec 31, 2022
@ghost
Copy link

ghost commented Dec 31, 2022

Tagging subscribers to this area: @agocke, @MichalStrehovsky, @jkotas
See info in area-owners.md if you want to be subscribed.

Issue Details

Not completed yet, but should be functional. There will be some refactoring.

Current test run will fail if lab machines have QueueUserAPC2, intentionally - I want to find out if the OS builds on lab machines have this API, especially on ARM64, where I did not have a way to test.

Author: VSadov
Assignees: -
Labels:

area-NativeAOT-coreclr

Milestone: -

@MichalStrehovsky
Copy link
Member

Most likely everything runs on Win10.

At minimum the ARM64 legs are Win11. From the console log of one of the tests in this PR:

Console log: 'Microsoft.Extensions.Configuration.Functional.Tests' from job a5783e36-675c-4d1c-bd66-912b1b2cd73c workitem fd692c4c-6e4e-464a-bffa-2e49f0d3d415 (windows.11.arm64.open) executed on machine a0006UM running Windows-10-10.0.22621-SP0

22621 is the 22H2 Windows 11 build.

@VSadov
Copy link
Member Author

VSadov commented Jan 2, 2023

I am testing locally on 22623.1037 (x64 insider preview in Beta/Slow ring) and it has the feature. It is not that much newer than 22621.
Perhaps it is different with ARM64? It should not be though. The API is not architecture-specific.

@VSadov VSadov marked this pull request as ready for review January 2, 2023 03:04
@VSadov
Copy link
Member Author

VSadov commented Jan 2, 2023

/azp run runtime-extra-platforms

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@dotnet dotnet deleted a comment from azure-pipelines bot Jan 2, 2023
@VSadov
Copy link
Member Author

VSadov commented Jan 6, 2023

@jkotas - anything else we might want to add to this change?


#ifdef FEATURE_SPECIAL_USER_MODE_APC
typedef BOOL (WINAPI* QueueUserAPC2Proc)(PAPCFUNC ApcRoutine, HANDLE Thread, ULONG_PTR Data, QUEUE_USER_APC_FLAGS Flags);
static QueueUserAPC2Proc pfnQueueUserAPC2Proc;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
static QueueUserAPC2Proc pfnQueueUserAPC2Proc;
static QueueUserAPC2Proc g_pfnQueueUserAPC2Proc;

// See if QueueUserAPC2 supports the special user-mode APC with a callback that includes the interrupted CONTEXT
if (pfnQueueUserAPC2Proc != NULL)
{
if (!(*pfnQueueUserAPC2Proc)(EmptyActivationHandler, GetCurrentThread(), 0, SpecialUserModeApcWithContextFlags))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we delay this until we actually need to suspend the thread? We should not even need a dummy call for this. We can just try to call it for real and see whether it failed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On Win10 this will always fail, but the API is present. We would need to prevent trying over and over.
With some sentinel value for "unknown" state and NULL for "does not work" this might be possible.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, we can have a bool flag that says the API failed with "SpecialUserModeApcWithContextFlags not supported error". Once the flag gets set, we will stop trying.

Copy link
Member Author

@VSadov VSadov Jan 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've uses a sentinel value for uninitialized state. Hijacking pass is single-threaded, so initializing on demand is simple.
Maybe it will save us some ticks at startup.

{
g_pfnQueueUserAPC2Proc = (QueueUserAPC2Proc)GetProcAddress(LoadKernel32dll(), "QueueUserAPC2");
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not thread safe in C/C++.

It is valid for C/C++ compiler to rewrite this as:

QueueUserAPC2Proc pfn = g_pfnQueueUserAPC2Proc;
if (g_pfnQueueUserAPC2Proc == QUEUE_USER_APC2_UNINITIALIZED)
{
   g_pfnQueueUserAPC2Proc = pfn = (QueueUserAPC2Proc)GetProcAddress(LoadKernel32dll(), "QueueUserAPC2");
}

if (pfn)
{
....
}

It needs to use volatile or something like that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only one thread may perform suspension/hijacking at a time. This code is single threaded, protected by the thread store lock.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It needs comment at least to make this assumption explicit.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’ve made the variable ‘volatile’ initially n this change, but then realized that there is no need.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’ll add a comment. It was not obvious.
I even had ‘Interlocked’, etc, before realizing it is single threaded and can be a lot simpler.

return;
}

// queuing an APC failed. we will not try again
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we check for specific "not suppported" last error here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not think API documents particular errors. They just say the error could be fetched via last error.

If we recognize some errors, what would we do with errors that we do not recognize?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we recognize some errors, what would we do with errors that we do not recognize?

Ignore the error and try again a bit later.

Copy link
Member Author

@VSadov VSadov Jan 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’ll check what OS with no support for QUEUE_USER_APC_CALLBACK_DATA_CONTEXT sets the last error to.

Copy link
Member

@jkotas jkotas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

@jkotas
Copy link
Member

jkotas commented Jan 6, 2023

Looks like the tests are failing with "failed to queue an APC for unusual reason."

@VSadov
Copy link
Member Author

VSadov commented Jan 6, 2023

/azp run runtime-extra-platforms

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@VSadov
Copy link
Member Author

VSadov commented Jan 6, 2023

Thanks!!

@VSadov VSadov merged commit 7a6f33b into dotnet:main Jan 6, 2023
@VSadov VSadov deleted the quapc2 branch January 6, 2023 23:45
@jakobbotsch
Copy link
Member

The build is failing locally for me after this PR:

  C:\dev\dotnet\runtime\src\coreclr\nativeaot\Runtime\windows\PalRedhawkMinWin.cpp(452): error C2061: syntax error: identifier 'QUEUE_USER_APC_FLAGS'
  C:\dev\dotnet\runtime\src\coreclr\nativeaot\Runtime\windows\PalRedhawkMinWin.cpp(457): error C4430: missing type specifier - int assumed. Note: C++ does not support default-int
  C:\dev\dotnet\runtime\src\coreclr\nativeaot\Runtime\windows\PalRedhawkMinWin.cpp(457): error C2146: syntax error: missing ';' before identifier 'SpecialUserModeApcWithContextFlags'
  C:\dev\dotnet\runtime\src\coreclr\nativeaot\Runtime\windows\PalRedhawkMinWin.cpp(457): error C2059: syntax error: '='
  C:\dev\dotnet\runtime\src\coreclr\nativeaot\Runtime\windows\PalRedhawkMinWin.cpp(463): error C2065: 'APC_CALLBACK_DATA': undeclared identifier
  C:\dev\dotnet\runtime\src\coreclr\nativeaot\Runtime\windows\PalRedhawkMinWin.cpp(463): error C2065: 'data': undeclared identifier
  C:\dev\dotnet\runtime\src\coreclr\nativeaot\Runtime\windows\PalRedhawkMinWin.cpp(463): error C2065: 'APC_CALLBACK_DATA': undeclared identifier
  C:\dev\dotnet\runtime\src\coreclr\nativeaot\Runtime\windows\PalRedhawkMinWin.cpp(463): error C2059: syntax error: ')'
  C:\dev\dotnet\runtime\src\coreclr\nativeaot\Runtime\windows\PalRedhawkMinWin.cpp(464): error C2065: 'data': undeclared identifier
  C:\dev\dotnet\runtime\src\coreclr\nativeaot\Runtime\windows\PalRedhawkMinWin.cpp(466): error C2065: 'data': undeclared identifier
  C:\dev\dotnet\runtime\src\coreclr\nativeaot\Runtime\windows\PalRedhawkMinWin.cpp(507): error C2065: 'SpecialUserModeApcWithContextFlags': undeclared identifier

@jkotas
Copy link
Member

jkotas commented Jan 7, 2023

@jakobbotsch What is the Windows SDK version on your machine? (dir C:\Program Files (x86)\Windows Kits\10\Include)

@VSadov
Copy link
Member Author

VSadov commented Jan 7, 2023

As I understand 8.0 main builds with Windows 11 SDK 10.0.22621, which defines these.

@VSadov
Copy link
Member Author

VSadov commented Jan 7, 2023

@VSadov
Copy link
Member Author

VSadov commented Jan 7, 2023

This is confusing. When I tried defining these the lab build failed because of duplicate definitions and I only could reproduce the lab behavior by installing 10.0.22621 locally

@jkotas
Copy link
Member

jkotas commented Jan 7, 2023

@jkotas
Copy link
Member

jkotas commented Jan 7, 2023

As I understand 8.0 main builds with Windows 11 SDK 10.0.22621

FWIW, this is the SDK version that I have on my machines as well. I always use the recommended Workloads installation approach (https://github.com/dotnet/runtime/blob/main/docs/workflow/requirements/windows-requirements.md#visual-studio).

@VSadov
Copy link
Member Author

VSadov commented Jan 7, 2023

It is possible to redefine the types/constants under different name. It looks like this is what we did in coreclr CET support.

@VSadov
Copy link
Member Author

VSadov commented Jan 7, 2023

Here is a PR that should build without 22621 #80333

@VSadov
Copy link
Member Author

VSadov commented Jan 7, 2023

With that PR I can build even if I remove 22621.

I could not remove Windows 11 22000 SDK though, because, according to VS installer, c++ v143 depends on it and would need to be removed too.

@jakobbotsch
Copy link
Member

jakobbotsch commented Jan 8, 2023

@jakobbotsch What is the Windows SDK version on your machine? (dir C:\Program Files (x86)\Windows Kits\10\Include)

The version was 10.0.19041. It indeed builds fine for me after updating the SDK, so no worries from my side at least.

@ghost ghost locked as resolved and limited conversation to collaborators Feb 7, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants