
Fix Arm64: for 124357 #124765

Merged: dhartglassMSFT merged 5 commits into dotnet:main from dhartglassMSFT:fix_for_124357 on Feb 25, 2026


@dhartglassMSFT

The JIT runs out of consecutive registers on Arm64 under JIT register stress. LSRA is trying to place C380, C381, V363, and V364 into four consecutive registers (d31, d0, d1, d2) for a VectorTableLookup (TBL), and it runs out of registers when trying to assign d2 to V364.

LSRA can normally handle this wraparound. The issue is that the C381 def also occupies d2, so d2 is incorrectly considered unavailable for V364. Normally this conflict is handled by the LinearScan::consecutiveRegsInUseThisLocation mask, but the way the mask is calculated does not handle the d31->d0 wraparound properly.
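The following is a minimal sketch of why a shift-based mask can miss the wrapped registers. It is not the JIT's code: the 32-bit mask layout (bit i stands for d<i>), the constant `kFPRegCount`, and the functions `shiftedMask` and `wrappedMask` are all hypothetical names chosen for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model: bit i of a 32-bit mask stands for FP register d<i>.
// The real JIT register mask type and register numbering differ.
constexpr unsigned kFPRegCount = 32;

// Shift-based mask: build a run of `count` set bits and shift it up to the
// first register. Bits shifted past d31 are dropped, so a run that should
// wrap around to d0 is silently truncated.
uint32_t shiftedMask(unsigned firstReg, unsigned count)
{
    assert(firstReg < kFPRegCount && count <= kFPRegCount);
    uint32_t run = (count == kFPRegCount) ? 0xFFFFFFFFu : ((1u << count) - 1u);
    return run << firstReg;
}

// Wraparound-aware mask: visit each register in the run one at a time,
// stepping from d31 back to d0 when the end of the register file is reached.
uint32_t wrappedMask(unsigned firstReg, unsigned count)
{
    assert(firstReg < kFPRegCount && count <= kFPRegCount);
    uint32_t mask = 0;
    for (unsigned i = 0; i < count; i++)
    {
        mask |= 1u << ((firstReg + i) % kFPRegCount);
    }
    return mask;
}
```

In this model, `shiftedMask(31, 4)` covers only d31, while `wrappedMask(31, 4)` covers d31, d0, d1, and d2. A run that does not cross d31 gives the same result from both functions.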

fixes #124357

Copilot AI review requested due to automatic review settings February 23, 2026 20:46
@github-actions github-actions bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Feb 23, 2026
@dotnet-policy-service

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.


Copilot AI left a comment


Pull request overview

This PR fixes an ARM64 JIT register allocation bug where LSRA (Linear Scan Register Allocator) fails under JIT register stress when allocating four consecutive registers that wrap around the end of the register file (d31, d0, d1, d2) for a VectorTableLookup (TBL) instruction.

Changes:

  • Introduces a helper function getNextFPRegWraparound() to correctly handle register wraparound from REG_FP_LAST to REG_FP_FIRST
  • Replaces bit-shift mask calculation with an iterative loop that properly handles the d31→d0 wraparound case
  • Consolidates three instances of inline wraparound logic into calls to the new helper function
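The helper described in the list above might look roughly like the sketch below. This is an assumption about its shape, not the merged code: the PR names `getNextFPRegWraparound()`, `REG_FP_FIRST`, and `REG_FP_LAST`, but the enum values, the `int` register type, and the `nextFPRegWraparound` and `consecutiveFPRegs` functions here are hypothetical stand-ins.

```cpp
#include <cassert>
#include <vector>

// Hypothetical register numbering for illustration only: d0..d31 map to 0..31.
// The real JIT uses its own regNumber enum with different values.
enum FPReg : int
{
    FP_FIRST = 0,
    FP_LAST  = 31
};

// Returns the register that follows `reg`, wrapping from the last FP
// register back to the first one.
int nextFPRegWraparound(int reg)
{
    assert(reg >= FP_FIRST && reg <= FP_LAST);
    return (reg == FP_LAST) ? FP_FIRST : reg + 1;
}

// Lists `count` consecutive FP registers starting at `firstReg`,
// using the helper so every caller shares one wraparound rule.
std::vector<int> consecutiveFPRegs(int firstReg, int count)
{
    std::vector<int> regs;
    int reg = firstReg;
    for (int i = 0; i < count; i++)
    {
        regs.push_back(reg);
        reg = nextFPRegWraparound(reg);
    }
    return regs;
}
```

Calling `consecutiveFPRegs(31, 4)` in this sketch yields 31, 0, 1, 2, which corresponds to the d31, d0, d1, d2 sequence from the failing case. Keeping the wraparound rule in one function is what lets several call sites agree on it.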

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| src/coreclr/jit/lsraarm64.cpp | Adds the getNextFPRegWraparound() helper function and replaces the broken bit-shift mask calculation with a loop that correctly builds the consecutiveRegsInUseThisLocation mask by iterating through registers using the wraparound helper |
| src/coreclr/jit/lsra.h | Declares the new getNextFPRegWraparound() helper function in the LinearScan class under the TARGET_ARM64 conditional block |

@dhartglassMSFT dhartglassMSFT marked this pull request as ready for review February 24, 2026 19:42
Copilot AI review requested due to automatic review settings February 24, 2026 19:42

Copilot AI left a comment


Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

Copilot AI review requested due to automatic review settings February 24, 2026 20:14
@dhartglassMSFT dhartglassMSFT enabled auto-merge (squash) February 24, 2026 20:16

Copilot AI left a comment


Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated no new comments.

@dhartglassMSFT dhartglassMSFT merged commit 40f1da5 into dotnet:main Feb 25, 2026
133 of 137 checks passed


Development

Successfully merging this pull request may close these issues.

Test failure: Assertion in spmi replay for windows-arm64
