-
Notifications
You must be signed in to change notification settings - Fork 5.3k
Allow larger than 0x1000 bytes allocation in genStackPointerConstantAdjustment() #47862
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
… size in src/coreclr/vm/jitinterface.cpp
…2 and Arm64 The temporary register is going to be used when sub sp,sp,#spDelta can not be encoded. For example, this happens when spDelta corresponds to OsPageSize and the OsPageSize is larger than 0x1000 bytes. The following code needs to be generated in such cases mov regTmp,#spDelta sub sp,sp,regTmp
|
I did a smoke test yesterday with the increased reported page size on top of current master and the following are the links to the test runs:
The test originally failing in #42023 was failing in those test runs as well. |
…ister to encode amount of sp adjustment in src/coreclr/jit/codegenarm64.cpp
837d458 to
feffd99
Compare
|
@dotnet/jit-contrib The change is ready to be reviewed, please take a look. Since the CI testing was non-functioning yesterday I manually validated the change by running superpmi on libraries and tests collections with win_arm64_x64 clrjit. In order to simulate larger page size I made I think it might be beneficial to add a complus variable (e.g. |
feffd99 to
d0fdc3e
Compare
sandreenko
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thank you.
I think it might be beneficial to add a complus variable (e.g. COMPlus_ForcePageSize) that would be active during altjits and force the JIT to use a different page size than reported by VM.
Sounds like a rare scenario to me and I don't think we have an appropriate pipeline for such testing, because if I understand correctly we can't run code generated with a bigger page size?
You are right, we don't run altjit pipelines anymore. But we could try running with an increased page size on Linux anyway. And if we had M1 machines in CI those, presumably, would run with 16KB page size. Given the recent developments with the SuperPMI collections pipelines I wonder if it's worth adding a pipeline replaying the collections with different altjits that would allow not to only validate the SuperPMI replay function/harness but also validate altjit infrastructure. |
…StackPointerConstantAdjustment in src/coreclr/jit/codegenarmarch.cpp
|
arm64 mac tests passed (3 failures are expected): |
|
@sandreenko Thank you for confirming! |
|
I am not convinced this is the right fix. Why aren't we just hard coding the probing size to 4K for local alloc. How will we handle when crossgen occurs on a 64K page machine and generates code to run on a 4K page machine. Is the extra complexity and register usage really worth it? |
The crossgen will always use the hardcoded 4K page - see the changes in src/coreclr/vm/jitinterface.cpp
The temporary register usage would happen at most once. On a 4K page machine the register is never going to be used. On a 16K or 64K page machine allocation of a local heap would be done in a page size decrements (that size can encoded by |
Root Cause: The issue #42023 is due to inability to allocate a local heap (localloc) using decrements of sp with immediate offset larger than 0x1000 bytes - this happens if OsPageSize is greater than 0x1000.
I originally suspected the issue was due to prolog allocation/stack probing as I mentioned in #42023 (comment) but after locally reproducing and analyzing the failure I identified the issue to be not related to prolog execution.
Solution: Instead of directly encoding the decrement value in
sub sp, sp, #imminstruction immediate the JIT would instead use a temporary registertmpRegto store the value withmov tmpReg, #immand usesub sp, sp, tmpRegto do the allocation.Validation: The testing is the most challenging part of the change - given the fact that we don't have a machine with non-4KB page size.
Hence, I try to simulate the larger page size making VM to report 0x10000 as OsPageSize with a hope that a failure would occur earlier during the JIT/AOT compilation test and not during the tests running time (due to not illegal memory access below stack guard page). In fact, I believe that the test run on linux-arm64 should succeed even with the increased OsPageSize value since on Linux the default value of
stack_guard_gap=is 256 pages according to Linux kernel documentation.