Skip to content

Conversation

@echesakov
Copy link
Contributor

@echesakov echesakov commented Feb 4, 2021

Root Cause: The issue #42023 is due to inability to allocate a local heap (localloc) using decrements of sp with immediate offset larger than 0x1000 bytes - this happens if OsPageSize is greater than 0x1000.

I originally suspected the issue was due to prolog allocation/stack probing as I mentioned in #42023 (comment) but after locally reproducing and analyzing the failure I identified the issue to be not related to prolog execution.

Solution: Instead of directly encoding the decrement value in sub sp, sp, #imm instruction immediate the JIT would instead use a temporary register tmpReg to store the value with mov tmpReg, #imm and use sub sp, sp, tmpReg to do the allocation.

Validation: The testing is the most challenging part of the change - given the fact that we don't have a machine with non-4KB page size.
Hence, I try to simulate the larger page size making VM to report 0x10000 as OsPageSize with a hope that a failure would occur earlier during the JIT/AOT compilation test and not during the tests running time (due to not illegal memory access below stack guard page). In fact, I believe that the test run on linux-arm64 should succeed even with the increased OsPageSize value since on Linux the default value of stack_guard_gap= is 256 pages according to Linux kernel documentation.

…2 and Arm64

The temporary register is going to be used when sub sp,sp,#spDelta can not be encoded.
For example, this happens when spDelta corresponds to OsPageSize and the
OsPageSize is larger than 0x1000 bytes.

The following code needs to be generated in such cases

mov regTmp,#spDelta
sub sp,sp,regTmp
@ghost ghost added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Feb 4, 2021
@dotnet dotnet deleted a comment from azure-pipelines bot Feb 4, 2021
@echesakov echesakov added the NO-MERGE The PR is not ready for merge yet (see discussion for detailed reasons) label Feb 4, 2021
@dotnet dotnet deleted a comment from azure-pipelines bot Feb 4, 2021
@echesakov
Copy link
Contributor Author

echesakov commented Feb 4, 2021

I did a smoke test yesterday with the increased reported page size on top of current master and the following are the links to the test runs:

The test originally failing in #42023 was failing in those test runs as well.

…ister to encode amount of sp adjustment in src/coreclr/jit/codegenarm64.cpp
@dotnet dotnet deleted a comment from azure-pipelines bot Feb 5, 2021
@dotnet dotnet deleted a comment from azure-pipelines bot Feb 5, 2021
@echesakov echesakov closed this Feb 5, 2021
@echesakov echesakov reopened this Feb 5, 2021
@echesakov echesakov marked this pull request as ready for review February 5, 2021 17:52
@echesakov
Copy link
Contributor Author

echesakov commented Feb 5, 2021

@dotnet/jit-contrib The change is ready to be reviewed, please take a look.

Since the CI testing was non-functioning yesterday I manually validated the change by running superpmi on libraries and tests collections with win_arm64_x64 clrjit. In order to simulate larger page size I made eeGetPageSize() returning 0x10000 (since the VM changes would not affect superpmi replay). I confirmed that without the change there are two assertions in the JIT related to un-encodable immediate value (one is it seen in #42023) and there are no assertions with the change.

I think it might be beneficial to add a complus variable (e.g. COMPlus_ForcePageSize) that would be active during altjits and force the JIT to use a different page size than reported by VM. Having such variable and a corresponding test scenario would've allowed to catch an issue like #42023 earlier in product cycle.

@dotnet dotnet deleted a comment from azure-pipelines bot Feb 5, 2021
@echesakov echesakov removed the NO-MERGE The PR is not ready for merge yet (see discussion for detailed reasons) label Feb 5, 2021
@dotnet dotnet deleted a comment from azure-pipelines bot Feb 5, 2021
@sandreenko sandreenko self-requested a review February 6, 2021 00:04
Copy link
Contributor

@sandreenko sandreenko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thank you.

I think it might be beneficial to add a complus variable (e.g. COMPlus_ForcePageSize) that would be active during altjits and force the JIT to use a different page size than reported by VM.

Sounds like a rare scenario to me and I don't think we have an appropriate pipeline for such testing, because if I understand correctly we can't run code generated with a bigger page size?

@echesakov
Copy link
Contributor Author

Sounds like a rare scenario to me and I don't think we have an appropriate pipeline for such testing, because if I understand correctly we can't run code generated with a bigger page size?

You are right, we don't run altjit pipelines anymore. But we could try running with an increased page size on Linux anyway. And if we had M1 machines in CI those, presumably, would run with 16KB page size.

Given the recent developments with the SuperPMI collections pipelines I wonder if it's worth adding a pipeline replaying the collections with different altjits that would allow not to only validate the SuperPMI replay function/harness but also validate altjit infrastructure.

@JulieLeeMSFT JulieLeeMSFT linked an issue Feb 8, 2021 that may be closed by this pull request
@JulieLeeMSFT JulieLeeMSFT added this to the 6.0.0 milestone Feb 8, 2021
…StackPointerConstantAdjustment in src/coreclr/jit/codegenarmarch.cpp
@sandreenko
Copy link
Contributor

arm64 mac tests passed (3 failures are expected):

Total tests run    : 10383
Total passing tests: 10380
Total failed tests : 3
Total skipped tests: 0

@echesakov echesakov merged commit bf9f899 into dotnet:master Feb 9, 2021
@echesakov echesakov deleted the Runtime_42023 branch February 9, 2021 03:04
@echesakov
Copy link
Contributor Author

@sandreenko Thank you for confirming!

@sdmaclea
Copy link
Contributor

sdmaclea commented Feb 9, 2021

I am not convinced this is the right fix. Why aren't we just hard coding the probing size to 4K for local alloc. How will we handle when crossgen occurs on a 64K page machine and generates code to run on a 4K page machine.

Is the extra complexity and register usage really worth it?

@echesakov
Copy link
Contributor Author

echesakov commented Feb 9, 2021

How will we handle when crossgen occurs on a 64K page machine and generates code to run on a 4K page machine.

The crossgen will always use the hardcoded 4K page - see the changes in src/coreclr/vm/jitinterface.cpp

Is the extra complexity and register usage really worth it?

The temporary register usage would happen at most once. On a 4K page machine the register is never going to be used. On a 16K or 64K page machine allocation of a local heap would be done in a page size decrements (that size can encoded by sub) until the remaining space is smaller than a page. Now, if the remaining space is larger than allowed range of sub instruction immediate then this is when the register is going to be used.

@ghost ghost locked as resolved and limited conversation to collaborators Mar 11, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

arch-arm64 area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI

Projects

Archived in project

Development

Successfully merging this pull request may close these issues.

arm64 skippage6.sh test fails to JIT code when page size > 4KB

5 participants