Fix x64 and x86 emulation on arm64 Windows#72693
Conversation
The runtime was hanging or crashing when running on x64 and x86 emulation on Windows ARM64. Most of these failures were caused by missing cache flushes after code updates where the emulator kept executing arm64 code that it has jitted from the previous state of the code. There were also two CONTEXT related issues. One was that we were assuming that AVX state is always supported. However, the emulator doesn't support it, so SetXStateFeaturesMask(pCtx, XSTATE_MASK_AVX) was failing and we have considered that a failure to capture thread context. The other one was that we have assumed that the CONTEXT we get in Thread::RedirectCurrentThreadAtHandledJITCase is CONTEXT_COMPLETE, but with the emulation, we one get CONTEXT_FULL, that means CONTEXT_COMPLETE without the debug registers. There were also two CoreCLR tests issues that caused failures under the emulation, I have fixed those.
| //DbgPrintf("Jitted Entry at" FMT_ADDR "method %s::%s %s size %08x\n", DBG_ADDR(nativeEntry), | ||
| // pszDebugClassName, pszDebugMethodName, pszDebugMethodSignature, sizeOfCode); | ||
|
|
||
| ClrFlushInstructionCache(nativeEntry, sizeOfCode); |
There was a problem hiding this comment.
Do we need to fix the implementation of ClrFlushInstructionCache instead?
There was a problem hiding this comment.
That would be too big hammer. Almost all usages of it don't require flushing of cache on intel.
There was a problem hiding this comment.
Btw, this is needed only because we reuse code heap memory for dynamic methods. It took me a while to figure it out.
There was a problem hiding this comment.
Almost all usages of it don't require flushing of cache on intel.
So it soft flush can be only used in places where the code was never executed before. I think that we have more of those, for example Precode::Reset should use also use the hard flush.
Would it make sense to add bool codeNeverExecutedBefore argument to ClrFlushInstructionCache and switch all places to call ClrFlushInstructionCache with the right true/false argument? We can then audit it more easily. Also, it would allow us to centralize the policy and do things like only do the hard flushes when emulating if it has material overhead.
| { | ||
| ExecutableWriterHolder<void> writeBarrierWriterHolder(GetWriteBarrierCodeLocation((void*)JIT_WriteBarrier), GetCurrentWriteBarrierSize()); | ||
| memcpy(writeBarrierWriterHolder.GetRW(), (LPVOID)GetCurrentWriteBarrierCode(), GetCurrentWriteBarrierSize()); | ||
| stompWBCompleteActions |= SWB_ICACHE_FLUSH; |
There was a problem hiding this comment.
It sounds like SWB_ICACHE_FLUSH is required on all write barrier updates now. Are there any situations where it is not required/set?
There was a problem hiding this comment.
It is needed whenever we change the code of the barrier. So for this ChangeWriteBarrierTo, it is needed always, because we always switch the write barrier kind. Only the WriteBarrierManager::UpdateEphemeralBounds and WriteBarrierManager::UpdateWriteWatchAndCardTableLocations have it optional, since if the bounds and barrier kind doesn't change, there is no code change.
I am actually considering making most of these changes conditional on W^X enabled. Since without W^X enabled, things work fine even with the emulation. That's because the emulator doesn't see the connection between the write location and execute location.
I just wonder - why were we flushing the instruction cache for some of the changes, but not for all of them before?
There was a problem hiding this comment.
WriteBarrierManager::UpdateEphemeralBounds and WriteBarrierManager::UpdateWriteWatchAndCardTableLocations have it optional, since if the bounds and barrier kind doesn't change, there is no code change.
UpdateEphemeralBounds and UpdateWriteWatchAndCardTableLocations will change immediates baked into the code. The emulator JIT has to flush the caches and rejit the code.
There was a problem hiding this comment.
I just wonder - why were we flushing the instruction cache for some of the changes, but not for all of them before?
This is not the first time we are trying to make the updateable code work on some sort of emulator. Some of the existing hard flushes are left-overs from the earlier efforts.
There was a problem hiding this comment.
UpdateEphemeralBounds and UpdateWriteWatchAndCardTableLocations will change immediates baked into the code. The emulator JIT has to flush the caches and rejit the code.
Only if the limits really change. That's not always the case.
|
@jkotas btw, we could also just disable W^X when we detect emulation (like we do on Rosetta) instead of all these changes. I just felt it was worth the effort to find and fix the real culprits. |
| // The system silently ignores any feature specified in the FeatureMask | ||
| // which is not enabled on the processor. | ||
| bRes &= SetXStateFeaturesMask(pCtx, XSTATE_MASK_AVX); | ||
| SetXStateFeaturesMask(pCtx, XSTATE_MASK_AVX); |
There was a problem hiding this comment.
Did this actually return a failure?
There was a problem hiding this comment.
I think the actual implementation would filter the mask with available features and then apply, thus it would not fail for unavailable features.
Maybe that is different under the emulation.
There was a problem hiding this comment.
Yes, it always returned FALSE and GetLastError returned ERROR_INVALID_PARAMETER.
The runtime was hanging or crashing when running on x64 and x86
emulation on Windows ARM64. Most of these failures were caused by
missing cache flushes after code updates where the emulator kept
executing arm64 code that it has jitted from the previous state of
the code.
There were also two
CONTEXTrelated issues. One was that we wereassuming that AVX state is always supported. However, the emulator
doesn't support it, so
SetXStateFeaturesMask(pCtx, XSTATE_MASK_AVX)was failing and we have considered that a failure to capture thread
context.
The other one was that we have assumed that the
CONTEXTwe get inThread::RedirectCurrentThreadAtHandledJITCaseisCONTEXT_COMPLETE,but with the emulation, we one get
CONTEXT_FULL, that meansCONTEXT_COMPLETEwithout the debug registers.There were also two CoreCLR tests issues that caused failures under the
emulation, I have fixed those.
Close #71856