Update SKIP_ALLOC_FRAME to match 3.1, 5.0 and 6.0 instruction sequences in function prolog#45184
Update SKIP_ALLOC_FRAME to match 3.1, 5.0 and 6.0 instruction sequences in function prolog#45184echesakov merged 12 commits intodotnet:masterfrom
Conversation
|
I couldn't figure out the best area label to add to this PR. If you have write-permissions please help me learn by adding exactly one area label. |
src/coreclr/src/vm/eetwain.cpp
Outdated
There was a problem hiding this comment.
Should this be hard-coded number instead of the OS page size to avoid problem with OS page size changing between AOT time and JIT time?
There was a problem hiding this comment.
Yes, it should. Thanks for catching this.
There was a problem hiding this comment.
Hmm, if I take an approach as discussed in another PR - that this should match target page size for JIT-ed code and 0x1000 for R2R code, right?
There was a problem hiding this comment.
The runtime assumes that the unwinding algorithm is same for all code flavors.
src/coreclr/src/vm/eetwain.cpp
Outdated
There was a problem hiding this comment.
The unwinding algorithm is part of R2R contract. This will require bumping the major R2R format version.
There was a problem hiding this comment.
I never changed the R2R format version in the past. Is there any guideline I should follow to accomplish that?
There was a problem hiding this comment.
It is about bumping the version number in https://github.com/dotnet/runtime/blob/master/src/coreclr/inc/readytorun.h and other places this file points to.
There are two ways how to evolve the R2R contract:
- Backward compatible: It means keeping both old and new version of the unwinding algorithm, and choosing the right algortihm based on the R2R format version in the given image
- Non-backward compatible (ie breaking): We will fallback to JIT for the older R2R images.
We prefer the backward compatible evolution when we have a choice.
@dotnet/crossgen-contrib Do you have thoughts on whether we want to do R2R version break for 6.0?
There was a problem hiding this comment.
Actually, the simplest thing to do for this one may be to do support both old and new unwind sequences. The unwinding inspects the instruction stream and so it should be possible to handle both without passing any explicit versions around.
You will still want to change these numbers to:
#define READYTORUN_MAJOR_VERSION 0x0005
#define READYTORUN_MINOR_VERSION 0x0001
but that will still be backward compatible change.
There was a problem hiding this comment.
Thank you for suggestion, Jan! I will rework this and add support for the old unwind sequence.
There was a problem hiding this comment.
I am still validating the changes but so far I haven't seen any issues with 3.1 and 5.0 R2R images.
There was a problem hiding this comment.
Don't forget to also include READYTORUN_MAJOR_VERSION bump that I have mentioned above.
There was a problem hiding this comment.
Thanks for reminding me to do the bump - I updated both readytorun.h and ModuleHeaders.cs.
There was a problem hiding this comment.
What is special about this test that it cannot be written in C#?
There was a problem hiding this comment.
It probably can. But it was simpler for me to express conditions for AllocLocal in order to have esp-based frame.
There was a problem hiding this comment.
I rewrote the test in C# - turned out that it was enough to have a method as virtual to make inliner cooperate and not inline the method while keeping its frame esp based (I couldn't use NoInlining attribute since the method would have JIT_FLAG_FRAMED flag set and becomes ebp based).
|
The test seem to expose some limitation in Mono runtime |
You can try to build a test that hits this crash even without GC stress. |
@jkotas Do you have any suggestion how to approach this? I was thinking about spawning a large number of threads all running |
|
Yes, this sounds like the right approach. |
|
// Auto-generated message 69e114c which was merged 12/7 removed the intermediate src/coreclr/src/ folder. This PR needs to be updated as it touches files in that directory which causes conflicts. To update your commits you can use this bash script: https://gist.github.com/ViktorHofer/6d24f62abdcddb518b4966ead5ef3783. Feel free to use the comment section of the gist to improve the script for others. |
fca8c7b to
4539436
Compare
I tried this approach by running methods allocating a local frame in different threads while triggering GC at the same time. I also extended the test cases and I believe I covered all the different frame allocation methods that JIT can use on win-x86: G_M21059_IG01: ;; offset=0000H
50 push eax
33C0 xor eax, eax
890424 mov dword ptr [esp], eaxNoProbe G_M27846_IG01: ;; offset=0000H
57 push edi
81EC00010000 sub esp, 256InlineProbe G_M31598_IG01: ;; offset=0000H
57 push edi
81EC000E0000 sub esp, 0xE00
850424 test eax, dword ptr [esp]HelperProbe G_M56481_IG01: ;; offset=0000H
57 push edi
8D842400F0FFFF lea eax, [esp-1000H]
E849E90673 call CORINFO_HELP_STACK_PROBE
8BE0 mov esp, eax |
|
Waiting for #10184 fix for full validation. |
|
Given that GCStress=8 doesn't currently work with R2R methods I tested the change manually. Test methodology is to call one of the implementations of |
|
Triggered runtime-coreclr gcstress0x3-gcstress0xc pipeline to validate the change and that it fixes #45090 |
|
gcstress windows x86 is clean - suggesting that the change fixes the original issue. There are failures in gcstress windows x64 and gcstress linux arm pipelines - I triggered another gcstress on top of master. |
src/coreclr/jit/codegenxarch.cpp
Outdated
There was a problem hiding this comment.
Is JIT_FLAG_PROF_REJIT_NOPS still going to work with this change?
JIT_FLAG_PROF_REJIT_NOPS is not used currently. It may be best to delete it instead of keeping partially broken implementation around.
There was a problem hiding this comment.
JIT_FLAG_PROF_REJIT_NOPS is not used currently. It may be best to delete it instead of keeping partially broken implementation around.
@jkotas Would it be fine with you if I undo the change to the jit file in this PR and come back with another PR removing JIT_FLAG_PROF_REJIT_NOPS flag and all the related infrastructure?
…OC_FRAME in src/coreclr/vm/eetwain.cpp
…f R2R code in src/coreclr/vm/eetwain.cpp
…for 3.1 stack probing loop
…/tools/Common/Internal/Runtime/ModuleHeaders.cs
…coreclr/jit/codegenxarch.cpp" This reverts commit 0b88d71.
9221151 to
a2f2cc7
Compare
|
Rebased on top of latest master and reverted the change src/coreclr/jit/codegenxarch.cpp. |
First, let's confirm that the added test fails.It doesThe added test is designed in a way that exposes an issue as in #45090.
In order to do this,
AllocLocalmethods must haveesp-based frames. Most of the methods would require to call a stack probe helper, but some would use inlined stack probing sequence as it is demonstrated in an example below.There are three cases that should be considered in
SKIP_ALLOC_FRAMEin order for the unwinder to work properly:frameSize <= 0xC00- a function prolog containssub esp, frameSize- no probing necessary0xC00<frameSize < 0x1000- a function prolog containssub esp, frameSizefollowed bytest eax, [esp]- the latter ensure that[esp+1024]always points to either already committed page or guard page.frameSize > 0x1000- a function prolog contains a call to stack probe helper