Skip to content

Fix x64 and x86 emulation on arm64 Windows#72693

Merged
janvorli merged 3 commits intodotnet:mainfrom
janvorli:fix-x64-x86-emulation-2
Aug 3, 2022
Merged

Fix x64 and x86 emulation on arm64 Windows#72693
janvorli merged 3 commits intodotnet:mainfrom
janvorli:fix-x64-x86-emulation-2

Conversation

@janvorli
Copy link
Member

The runtime was hanging or crashing when running on x64 and x86
emulation on Windows ARM64. Most of these failures were caused by
missing cache flushes after code updates where the emulator kept
executing arm64 code that it has jitted from the previous state of
the code.

There were also two CONTEXT related issues. One was that we were
assuming that AVX state is always supported. However, the emulator
doesn't support it, so SetXStateFeaturesMask(pCtx, XSTATE_MASK_AVX)
was failing and we have considered that a failure to capture thread
context.
The other one was that we have assumed that the CONTEXT we get in
Thread::RedirectCurrentThreadAtHandledJITCase is CONTEXT_COMPLETE,
but with the emulation, we one get CONTEXT_FULL, that means
CONTEXT_COMPLETE without the debug registers.

There were also two CoreCLR tests issues that caused failures under the
emulation, I have fixed those.

Close #71856

The runtime was hanging or crashing when running on x64 and x86
emulation on Windows ARM64. Most of these failures were caused by
missing cache flushes after code updates where the emulator kept
executing arm64 code that it has jitted from the previous state of
the code.

There were also two CONTEXT related issues. One was that we were
assuming that AVX state is always supported. However, the emulator
doesn't support it, so SetXStateFeaturesMask(pCtx, XSTATE_MASK_AVX)
was failing and we have considered that a failure to capture thread
context.
The other one was that we have assumed that the CONTEXT we get in
Thread::RedirectCurrentThreadAtHandledJITCase is CONTEXT_COMPLETE,
but with the emulation, we one get CONTEXT_FULL, that means
CONTEXT_COMPLETE without the debug registers.

There were also two CoreCLR tests issues that caused failures under the
emulation, I have fixed those.
@janvorli janvorli added this to the 7.0.0 milestone Jul 22, 2022
@janvorli janvorli requested review from VSadov and jkotas July 22, 2022 19:04
@ghost ghost assigned janvorli Jul 22, 2022
//DbgPrintf("Jitted Entry at" FMT_ADDR "method %s::%s %s size %08x\n", DBG_ADDR(nativeEntry),
// pszDebugClassName, pszDebugMethodName, pszDebugMethodSignature, sizeOfCode);

ClrFlushInstructionCache(nativeEntry, sizeOfCode);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to fix the implementation of ClrFlushInstructionCache instead?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would be too big hammer. Almost all usages of it don't require flushing of cache on intel.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Btw, this is needed only because we reuse code heap memory for dynamic methods. It took me a while to figure it out.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Almost all usages of it don't require flushing of cache on intel.

So it soft flush can be only used in places where the code was never executed before. I think that we have more of those, for example Precode::Reset should use also use the hard flush.

Would it make sense to add bool codeNeverExecutedBefore argument to ClrFlushInstructionCache and switch all places to call ClrFlushInstructionCache with the right true/false argument? We can then audit it more easily. Also, it would allow us to centralize the policy and do things like only do the hard flushes when emulating if it has material overhead.

{
ExecutableWriterHolder<void> writeBarrierWriterHolder(GetWriteBarrierCodeLocation((void*)JIT_WriteBarrier), GetCurrentWriteBarrierSize());
memcpy(writeBarrierWriterHolder.GetRW(), (LPVOID)GetCurrentWriteBarrierCode(), GetCurrentWriteBarrierSize());
stompWBCompleteActions |= SWB_ICACHE_FLUSH;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It sounds like SWB_ICACHE_FLUSH is required on all write barrier updates now. Are there any situations where it is not required/set?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is needed whenever we change the code of the barrier. So for this ChangeWriteBarrierTo, it is needed always, because we always switch the write barrier kind. Only the WriteBarrierManager::UpdateEphemeralBounds and WriteBarrierManager::UpdateWriteWatchAndCardTableLocations have it optional, since if the bounds and barrier kind doesn't change, there is no code change.

I am actually considering making most of these changes conditional on W^X enabled. Since without W^X enabled, things work fine even with the emulation. That's because the emulator doesn't see the connection between the write location and execute location.

I just wonder - why were we flushing the instruction cache for some of the changes, but not for all of them before?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WriteBarrierManager::UpdateEphemeralBounds and WriteBarrierManager::UpdateWriteWatchAndCardTableLocations have it optional, since if the bounds and barrier kind doesn't change, there is no code change.

UpdateEphemeralBounds and UpdateWriteWatchAndCardTableLocations will change immediates baked into the code. The emulator JIT has to flush the caches and rejit the code.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just wonder - why were we flushing the instruction cache for some of the changes, but not for all of them before?

This is not the first time we are trying to make the updateable code work on some sort of emulator. Some of the existing hard flushes are left-overs from the earlier efforts.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, makes sense

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

UpdateEphemeralBounds and UpdateWriteWatchAndCardTableLocations will change immediates baked into the code. The emulator JIT has to flush the caches and rejit the code.

Only if the limits really change. That's not always the case.

@janvorli
Copy link
Member Author

@jkotas btw, we could also just disable W^X when we detect emulation (like we do on Rosetta) instead of all these changes. I just felt it was worth the effort to find and fix the real culprits.

// The system silently ignores any feature specified in the FeatureMask
// which is not enabled on the processor.
bRes &= SetXStateFeaturesMask(pCtx, XSTATE_MASK_AVX);
SetXStateFeaturesMask(pCtx, XSTATE_MASK_AVX);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did this actually return a failure?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the actual implementation would filter the mask with available features and then apply, thus it would not fail for unavailable features.
Maybe that is different under the emulation.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it always returned FALSE and GetLastError returned ERROR_INVALID_PARAMETER.

Copy link
Member

@jkotas jkotas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@janvorli janvorli merged commit 82f7100 into dotnet:main Aug 3, 2022
@janvorli janvorli deleted the fix-x64-x86-emulation-2 branch August 3, 2022 08:49
@ghost ghost locked as resolved and limited conversation to collaborators Sep 2, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Emulated x86 and x64 .NET 7 on ARM64 can't create new project

3 participants