[RISC-V] Fix 64-bit offset handling in emitLoadImmediate #122501

Merged: jakobbotsch merged 8 commits into dotnet:main from vmolab:riscv64-emitLoadImm-offset-type-correction on Feb 3, 2026

credo-quia-absurdum (Contributor) commented Dec 12, 2025

Summary

This change addresses a potential hazard in offset handling in emitLoadImmediate. The current implementation computes the offset using a 32-bit representation and stores it in a uint32_t, even though the offset boundary (x) can exceed 32.

From my investigation, this does not currently break correctness in the generated code. However, this can lead to suboptimal instruction sequences and fragile behavior if the logic is reused or extended in the future.

This change resolves the issue by performing offset computation and masking at 64-bit width, making the implementation more robust and future-proof.

@clamp03 @tomeksowi @SkyShield, @namu-lee @fuad1502
part of #84834, cc @dotnet/samsung

Details

In C++, the behavior of the right-shift operator is undefined if the shift count is negative or greater than or equal to the bit width of the left-hand operand. (cppreference)

> If the value of rhs is negative or is not less than the number of bits in lhs, the behavior is undefined.
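To make the hazard concrete, here is a minimal C++ sketch of a right shift that stays well defined for any count. The helper names `shiftRight32` and `shiftRight64` are hypothetical and do not come from the JIT sources; they only illustrate why widening the operand to 64 bits removes the problem for counts between 32 and 63.

```cpp
#include <cstdint>

// Hypothetical helper: right shift of a 32-bit value that is defined for any
// count. A plain `value >> count` is undefined behavior once count >= 32,
// so those counts are mapped to 0 explicitly.
inline uint32_t shiftRight32(uint32_t value, unsigned count)
{
    return (count < 32) ? (value >> count) : 0u;
}

// Hypothetical helper: the same shift on a 64-bit operand. Counts from 32 to
// 63 are now below the operand width, so only count >= 64 needs a guard.
inline uint64_t shiftRight64(uint64_t value, unsigned count)
{
    return (count < 64) ? (value >> count) : 0u;
}
```

With these helpers, `shiftRight32(1u, 32)` yields 0 by construction, whereas an unguarded `uint32_t` shift by 32 may yield anything, including the unshifted value.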

The current implementation computes the offset using a 32-bit mask and stores it as a 32-bit unsigned integer. Based on this offset, the emitter repeatedly generates slli and addi instructions using 11-bit non-zero chunks. When the chunk is zero, the shift amount is accumulated and applied later.

This behavior is generally not problematic as long as the offset boundary (x) does not exceed 32. However, when x is larger than 32, the 32-bit representation can lead to suboptimal and fragile behavior. One such example is the constant 0xFFFF'F7FF'FFFF'FFFF, whose emission sequence is shown below.

Emission for 0xFFFF'F7FF'FFFF'FFFF (current implementation)

offset: 0x0000'0001, x: 43

| # | Instruction | Immediate | Register Value After | x |
| --- | --- | --- | --- | --- |
| 1 | addiw | 0xFFF | 0xFFFF'FFFF'FFFF'FFFF | |
| 2 | slli | 21 | 0xFFFF'FFFF'FFE0'0000 | 43 -> 32 -> 22 |
| 3 | addi | 0x000 | 0xFFFF'FFFF'FFE0'0000 | |
| 4 | slli | 22 | 0xFFFF'F800'0000'0000 | 22 -> 11 -> 0 |
| 5 | addi | 0xFFF | 0xFFFF'F7FF'FFFF'FFFF | |

In this case, a redundant addi a0, a0, 0 (equivalently mv a0, a0) is generated. This happens because offset is a uint32_t, so offset >> 32 is undefined behavior in C++. The observed result is compiler-dependent and, in this case, evaluates to the same value as offset itself (0x0000'0001). The emitter therefore treats the chunk as non-zero and emits an unnecessary slli/addi pair.

During the subsequent removal of leading zeros, the offset boundary is effectively shifted, and the correct chunk (0x0000'0000), offset >> 22, is eventually recomputed. While this process happens to recover the correct value, it relies on fragile implementation details and produces suboptimal instruction sequences.

Emission with 64-bit offset handling

With the offset computed and masked at 64-bit width, the same constant can be materialized more efficiently as below.

offset: 0x0000'0000'0000'0001, x: 43

| # | Instruction | Immediate | Register Value After | x |
| --- | --- | --- | --- | --- |
| 1 | addiw | 0xFFF | 0xFFFF'FFFF'FFFF'FFFF | |
| 2 | slli | 43 | 0xFFFF'F800'0000'0000 | 43 -> 32 -> 21 -> 10 -> 0 |
| 3 | addi | 0xFFF | 0xFFFF'F7FF'FFFF'FFFF | |

This sequence avoids the redundant instructions and more directly reflects the intended offset handling.
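The chunk walk described above can be sketched as a small standalone model. This is a simplified illustration, not the emitter's actual loop: the names `Step`, `emitOffsetChunks`, and `applySteps` are hypothetical, and the model assumes chunks of up to 11 bits are taken from bit x downward (43 -> 32 -> 21 -> 10 -> 0), with shifts for zero chunks accumulated until the next non-zero chunk.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// One modeled instruction: "slli" with a shift amount, or "addi" with a signed immediate.
struct Step
{
    std::string op;
    int64_t     imm;
};

// Walk a 64-bit offset in chunks of up to 11 bits, starting below bit x.
// Zero chunks only accumulate shift; a non-zero chunk flushes the pending
// shift as one slli followed by an addi (negated in subtract mode).
std::vector<Step> emitOffsetChunks(uint64_t offset, int x, bool subtract)
{
    assert(x >= 0 && x < 64);
    std::vector<Step> steps;
    int pendingShift = 0;
    int pos          = x; // number of offset bits not yet consumed
    while (pos > 0)
    {
        int width = (pos >= 11) ? 11 : pos;
        pos -= width;
        pendingShift += width;
        uint64_t chunk = (offset >> pos) & ((uint64_t{1} << width) - 1);
        if (chunk != 0)
        {
            steps.push_back({"slli", pendingShift});
            steps.push_back({"addi", subtract ? -static_cast<int64_t>(chunk) : static_cast<int64_t>(chunk)});
            pendingShift = 0;
        }
    }
    if (pendingShift != 0)
    {
        steps.push_back({"slli", pendingShift}); // trailing zero chunks
    }
    return steps;
}

// Apply the modeled steps to a register value using wrapping 64-bit arithmetic.
uint64_t applySteps(uint64_t reg, const std::vector<Step>& steps)
{
    for (const Step& s : steps)
    {
        if (s.op == "slli")
            reg <<= s.imm;
        else
            reg += static_cast<uint64_t>(s.imm);
    }
    return reg;
}
```

Under this model, `emitOffsetChunks(1, 43, true)` produces one slli by 43 and one addi of -1. Applied to a register holding all ones (the result of the addiw in row 1), that reconstructs 0xFFFF'F7FF'FFFF'FFFF.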

This is not necessarily the only scenario where the 32-bit offset representation can lead to unintended behavior. From my investigation, I did not observe any cases that break correctness in the currently generated instructions. However, the behavior depends on coincidental recovery steps, which makes the implementation fragile. Relying on such an assumption may become problematic if the logic is reused or extended in the future.

The case where x exceeds 32 only arises when the constant contains more than 32 trailing zeros or ones. In such cases, the 32-bit offset representation is limited to either 0x0000'0000 or 0xFFFF'FFFF. For the latter, the subtract-offset form (0x0000'0001) is selected, which can trigger the behavior described above when x is greater than 32. Although the current logic eventually recovers and produces correct code, it relies on coincidental properties of the shifting and the zero-removal process for chunks.
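As a rough illustration of the paragraph above, the sketch below computes the trailing run length and both offset forms at 64-bit width. The names `trailingRunLength`, `Offsets`, and `splitOffsets` are hypothetical. The sketch assumes x equals the trailing run of zeros or ones, and that the subtract form is derived from the identity imm = (high << x) + add = ((high + 1) << x) - sub.

```cpp
#include <cassert>
#include <cstdint>

// Length of the run of identical bits starting at bit 0
// (trailing zeros or trailing ones, whichever bit 0 selects).
int trailingRunLength(uint64_t imm)
{
    uint64_t lowBit = imm & 1;
    int      n      = 0;
    while (n < 64 && ((imm >> n) & 1) == lowBit)
    {
        n++;
    }
    return n;
}

struct Offsets
{
    uint64_t add; // offset used in add mode
    uint64_t sub; // offset used in subtract mode
};

// Split the low x bits of imm into the two candidate offsets.
Offsets splitOffsets(uint64_t imm, int x)
{
    assert(x > 0 && x < 64);
    uint64_t mask = (uint64_t{1} << x) - 1;
    uint64_t add  = imm & mask;
    uint64_t sub  = (~add + 1) & mask; // (2^x - add) mod 2^x
    return {add, sub};
}
```

For 0xFFFF'F7FF'FFFF'FFFF the run length is 43, the add-mode offset is 0x7FF'FFFF'FFFF, and the subtract-mode offset is 1. Truncating the add-mode offset to 32 bits gives 0xFFFF'FFFF, which is the limited representation the paragraph refers to.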

The diff below shows the changes introduced by this update for the example above and across System.*.dll. As seen in System.*.dll, this pattern appears to be rare and does not commonly occur in managed code. While the performance impact of this change is minimal, improving the robustness of offset handling is important for long-term maintainability, which is the motivation for this PR.

@@ -20,18 +20,16 @@ G_M48308_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
 						;; size=16 bbWeight=1 PerfScore 9.00
 G_M48308_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
             addiw          a0, zero, 0xD1FFAB1E
-            slli           a0, a0, 21
-            mv             a0, a0
-            slli           a0, a0, 22
+            slli           a0, a0, 43
             addi           a0, a0, 0xD1FFAB1E
-						;; size=20 bbWeight=1 PerfScore 5.00
+						;; size=12 bbWeight=1 PerfScore 3.00
 G_M48308_IG03:        ; bbWeight=1, epilog, nogc, extend
             ld             ra, 8(sp)
             ld             fp, 0(sp)
             addi           sp, sp, 16
             ret						;; size=16 bbWeight=1 PerfScore 7.50
 
-; Total bytes of code 52, prolog size 16, PerfScore 21.50, instruction count 9, allocated bytes for code 52 (MethodHash=4897434b) for method ConstTest:LoadConst():ulong (FullOpts)
+; Total bytes of code 44, prolog size 16, PerfScore 19.50, instruction count 9, allocated bytes for code 44 (MethodHash=4897434b) for method ConstTest:LoadConst():ulong (FullOpts)
 ; ============================================================

SuperPMI asmdiffs for 0xFFFF'F7FF'FFFF'FFFF

Diffs are based on 1,809 contexts (613 MinOpts, 1,196 FullOpts).

Overall (-8 bytes)

| Collection | Base size (bytes) | Diff size (bytes) | PerfScore in Diffs |
|---|--:|--:|--:|
| emitLoadImmediate.mch | 775,644 | -8 | -0.08% |

MinOpts (+0 bytes)

| Collection | Base size (bytes) | Diff size (bytes) | PerfScore in Diffs |
|---|--:|--:|--:|
| emitLoadImmediate.mch | 278,096 | +0 | 0.00% |

FullOpts (-8 bytes)

| Collection | Base size (bytes) | Diff size (bytes) | PerfScore in Diffs |
|---|--:|--:|--:|
| emitLoadImmediate.mch | 497,548 | -8 | -0.12% |

Example diffs
emitLoadImmediate.mch
-8 (-15.38%) : 1779.dasm - ConstTest:LoadConst():ulong (FullOpts)
@@ -20,18 +20,16 @@ G_M48308_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
 						;; size=16 bbWeight=1 PerfScore 9.00
 G_M48308_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
             addiw          a0, zero, 0xD1FFAB1E
-            slli           a0, a0, 21
-            mv             a0, a0
-            slli           a0, a0, 22
+            slli           a0, a0, 43
             addi           a0, a0, 0xD1FFAB1E
-						;; size=20 bbWeight=1 PerfScore 5.00
+						;; size=12 bbWeight=1 PerfScore 3.00
 G_M48308_IG03:        ; bbWeight=1, epilog, nogc, extend
             ld             ra, 8(sp)
             ld             fp, 0(sp)
             addi           sp, sp, 16
             ret						;; size=16 bbWeight=1 PerfScore 7.50
 
-; Total bytes of code 52, prolog size 16, PerfScore 21.50, instruction count 9, allocated bytes for code 52 (MethodHash=4897434b) for method ConstTest:LoadConst():ulong (FullOpts)
+; Total bytes of code 44, prolog size 16, PerfScore 19.50, instruction count 9, allocated bytes for code 44 (MethodHash=4897434b) for method ConstTest:LoadConst():ulong (FullOpts)
 ; ============================================================
 
 Unwind Info:
@@ -42,7 +40,7 @@ Unwind Info:
   E bit             : 0
   X bit             : 0
   Vers              : 0
-  Function Length   : 26 (0x0001a) Actual length = 52 (0x000034)
+  Function Length   : 22 (0x00016) Actual length = 44 (0x00002c)
   ---- Epilog scopes ----
   ---- Scope 0
   Epilog Start Offset        : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)
+0 (0.00%) : 1729.dasm - System.RuntimeType+RuntimeTypeCache+MemberInfoCache`1[System.__Canon]:PopulateMethods(System.RuntimeType+RuntimeTypeCache+Filter):System.Reflection.RuntimeMethodInfo[]:this (FullOpts)

No diffs found?

+0 (0.00%) : 1474.dasm - System.Diagnostics.ProcessWaitState:WaitForExit(int):bool:this (MinOpts)

No diffs found?

+0 (0.00%) : 1792.dasm - System.Number:Dragon4(ulong,int,uint,bool,int,bool,System.Span`1[byte],byref):uint (FullOpts)

No diffs found?

+0 (0.00%) : 895.dasm - Interop+Sys:Open(System.String,int,int):Microsoft.Win32.SafeHandles.SafeFileHandle (MinOpts)

No diffs found?

+0 (0.00%) : 447.dasm - System.Text.RegularExpressions.RegexParser:ParseReplacement(System.String,int,System.Collections.Hashtable,int,System.Collections.Hashtable):System.Text.RegularExpressions.RegexReplacement (MinOpts)

No diffs found?

Details

Size improvements/regressions per collection

| Collection | Contexts with diffs | Improvements | Regressions | Same size | Improvements (bytes) | Regressions (bytes) |
|---|--:|--:|--:|--:|--:|--:|
| emitLoadImmediate.mch | 130 | 1 | 0 | 129 | -8 | +0 |

PerfScore improvements/regressions per collection

| Collection | Contexts with diffs | Improvements | Regressions | Same PerfScore | Improvements (PerfScore) | Regressions (PerfScore) | PerfScore Overall in FullOpts |
|---|--:|--:|--:|--:|--:|--:|--:|
| emitLoadImmediate.mch | 130 | 1 | 0 | 129 | -9.30% | 0.00% | -0.0082% |

Context information

| Collection | Diffed contexts | MinOpts | FullOpts | Missed, base | Missed, diff |
|---|--:|--:|--:|--:|--:|
| emitLoadImmediate.mch | 1,809 | 613 | 1,196 | 0 (0.00%) | 0 (0.00%) |

jit-analyze output

SuperPMI asmdiffs for System.*.dll

Diffs are based on 87,783 contexts (86,131 MinOpts, 1,652 FullOpts).

Overall (+0 bytes)

| Collection | Base size (bytes) | Diff size (bytes) | PerfScore in Diffs |
|---|--:|--:|--:|
| system.mch | 32,228,202 | +0 | 0.00% |

MinOpts (+0 bytes)

| Collection | Base size (bytes) | Diff size (bytes) | PerfScore in Diffs |
|---|--:|--:|--:|
| system.mch | 31,550,092 | +0 | 0.00% |

FullOpts (+0 bytes)

| Collection | Base size (bytes) | Diff size (bytes) | PerfScore in Diffs |
|---|--:|--:|--:|
| system.mch | 678,110 | +0 | 0.00% |

Example diffs
system.mch
+0 (0.00%) : 1217.dasm - System.Security.Cryptography.Cose.CoseKey:Sign(System.ReadOnlySpan`1[byte],System.Span`1[byte]):int:this (MinOpts)

No diffs found?

+0 (0.00%) : 3329.dasm - System.IO.Hashing.XxHash3:HashLength0To16(ptr,uint,ulong):ulong (MinOpts)

No diffs found?

+0 (0.00%) : 5825.dasm - System.Linq.ParallelEnumerable:ElementAt[byte](System.Linq.ParallelQuery`1[byte],int):byte (MinOpts)

No diffs found?

+0 (0.00%) : 86848.dasm - System.ConsolePal:get_WindowWidth():int (MinOpts)

No diffs found?

+0 (0.00%) : 84032.dasm - System.Net.ServerSentEvents.Helpers:WriteUtf8Number(System.Buffers.IBufferWriter`1[byte],long) (MinOpts)

No diffs found?

+0 (0.00%) : 82240.dasm - System.Linq.Expressions.Compiler.LambdaCompiler:.ctor(System.Linq.Expressions.Compiler.AnalyzedTree,System.Linq.Expressions.LambdaExpression):this (MinOpts)

No diffs found?

Details

Size improvements/regressions per collection

| Collection | Contexts with diffs | Improvements | Regressions | Same size | Improvements (bytes) | Regressions (bytes) |
|---|--:|--:|--:|--:|--:|--:|
| system.mch | 2,716 | 0 | 0 | 2,716 | -0 | +0 |

PerfScore improvements/regressions per collection

| Collection | Contexts with diffs | Improvements | Regressions | Same PerfScore | Improvements (PerfScore) | Regressions (PerfScore) | PerfScore Overall in FullOpts |
|---|--:|--:|--:|--:|--:|--:|--:|
| system.mch | 2,716 | 0 | 0 | 2,716 | 0.00% | 0.00% | 0.0000% |

Context information

| Collection | Diffed contexts | MinOpts | FullOpts | Missed, base | Missed, diff |
|---|--:|--:|--:|--:|--:|
| system.mch | 87,783 | 86,131 | 1,652 | 0 (0.00%) | 0 (0.00%) |

jit-analyze output

Notes

I added BitMask64 alongside the existing WordMask to handle 64-bit masks. The inconsistency between these utility helpers will be addressed in a follow-up refactoring PR to minimize the scope of changes and review overhead in this PR.
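For readers unfamiliar with this kind of helper, a 64-bit mask function could look roughly like the sketch below. The signature of the real BitMask64 is not shown in this thread, so the name `lowBitsMask64` and its behavior are assumptions made purely for illustration. The one detail worth noting is that a width of 64 needs its own branch, because shifting a 64-bit value by 64 is itself undefined.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 64-bit mask helper: returns a value with the low nBits bits set.
// nBits == 64 is handled separately because `1 << 64` would be undefined.
inline uint64_t lowBitsMask64(unsigned nBits)
{
    assert(nBits <= 64);
    return (nBits == 64) ? ~uint64_t{0} : ((uint64_t{1} << nBits) - 1);
}
```

For example, `lowBitsMask64(43)` gives 0x7FF'FFFF'FFFF, which covers the whole offset range in the example above without truncation.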

Copilot AI review requested due to automatic review settings December 12, 2025 18:06
@dotnet-policy-service dotnet-policy-service bot added the community-contribution Indicates that the PR has been added by a community member label Dec 12, 2025
@github-actions github-actions bot added the needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners label Dec 12, 2025
Copilot AI (Contributor) left a comment

Pull request overview

This PR fixes a potential hazard in offset handling for RISC-V's emitLoadImmediate function by upgrading the implementation from 32-bit to 64-bit offset computation. While the current implementation does not break correctness, it can produce suboptimal instruction sequences and relies on fragile, compiler-dependent behavior when the offset boundary exceeds 32 bits.

Key changes:

  • Introduces a new BitMask64 helper function for 64-bit mask generation
  • Upgrades offset computation variables from uint32_t to uint64_t
  • Fixes leading zero count calculation to use 64-bit width

fuad1502 (Contributor) commented Dec 13, 2025

Hi, thank you very much for this PR 🙏

I've reviewed this carefully and I realize that I have incorrectly assumed that after extending (or capping) the y:x range to 32 bits optimally, x will be less than or equal to 32:

if (y < 32)
{
    y = 31;
    x = 0;
}
else if ((y - x) < 31)
{
    y = x + 31;
}
else
{
    x = y - 31;
}

You can see my assumption in the comments:

* Where high32 = imm[y:x] and imm[63:y] are all zeroes or all ones.

That is, y is at most 63, the y:x range is exactly 32 bits, and therefore x is less than or equal to 32.

In your example, y - x is less than 31, therefore it will enter the second conditional block. However, since x is larger than 32, it sets y to be larger than 63. That is, y:x range is not within the immediate bits. We could cap y to 63 and set x to 32, but that would yield sub-optimal instructions (for example, 0x0000'1800'0000'0000). This is because high32 would then need to be loaded with both lui and addiw instead of just addiw.

So, instead, we should cap y to 63 while keeping x unmodified (therefore, y:x is not necessarily 32 bits anymore) and update the comments:

Where high32 = sext(imm[y:x]) and imm[63:y] are all zeroes or all ones.
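The trade-off for 0x0000'1800'0000'0000 can be checked with a small sketch. The cost model here is an assumption for illustration only: a value costs one instruction if it fits a signed 12-bit immediate (addiw alone) or has its low 12 bits clear (lui alone), and two instructions otherwise (lui plus addiw). The names `arithmeticShiftRight` and `high32LoadCost` are hypothetical.

```cpp
#include <cstdint>

// Sign-extending right shift of a 64-bit immediate, written on unsigned
// values so it does not depend on signed-shift behavior. Requires 0 <= x < 64.
int64_t arithmeticShiftRight(uint64_t imm, int x)
{
    uint64_t v = imm >> x;
    if ((imm >> 63) != 0)
    {
        v |= ~(~uint64_t{0} >> x); // fill the vacated high bits with ones
    }
    return static_cast<int64_t>(v);
}

// Assumed cost model for loading high32 (see lead-in):
// 1 instruction for addiw-only or lui-only values, otherwise 2.
int high32LoadCost(int64_t high32)
{
    bool fitsSimm12 = (high32 >= -2048) && (high32 <= 2047);
    bool luiOnly    = (high32 & 0xFFF) == 0;
    return (fitsSimm12 || luiOnly) ? 1 : 2;
}
```

With x kept at 43, the shifted value is 3 and costs a single addiw. With x forced to 32, the shifted value is 0x1800, which exceeds the signed 12-bit range and has non-zero low bits, so it costs two instructions under this model.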

Again, thank you very much for discovering this bug 🙏

fuad1502 (Contributor) commented Dec 13, 2025

BTW, in your PR description, section "Emission with 64-bit offset handling" I think you mistyped the second instruction's immediate, it should be 43, right? 🙏

fuad1502 (Contributor) left a comment

LGTM, thank you very much for your detailed attention to the emitLoadImmediate implementation. I apologize to everyone that this bug slipped through 🙏

fuad1502 (Contributor) commented Dec 13, 2025

FYI, I just realized, the y and x determination is also still sub-optimal.

Consider, for example, the 64-bit immediate 0xff0007ffffffffff. Currently, y will be 74, and x will be 43, therefore high32 (0xffffe001) will be loaded with both lui and addiw instructions. If instead y is 56 and x is 25, high32 (0x80040000) can be loaded with a single lui instruction.

For a later PR, modifying the y:x range extension code to the following should yield fewer instructions in cases like those:

  if (y < 32) {
    y = 31;
    x = 0;
  } else if (y - x <= 11) {
    y = x + 31;
  } else {
    x = y - 31;
  }

Edit: The idea is to place the high32 y:x range maximally to the left if it can be loaded with a single addiw, otherwise, maximally place it to the right.
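The two range policies can be compared on 0xff0007ffffffffff with the sketch below. It assumes the initial values y = 56 and x = 43 (derived here from the 8 leading ones and 43 trailing ones), and that the subtract-offset form is selected, which matches the high32 values quoted above. The names `Range`, `extendCurrent`, `extendProposed`, and `high32Subtract` are hypothetical.

```cpp
#include <cstdint>

struct Range
{
    int y;
    int x;
};

// Range extension with the threshold quoted earlier in the thread: (y - x) < 31.
Range extendCurrent(int y, int x)
{
    if (y < 32)            { y = 31; x = 0; }
    else if ((y - x) < 31) { y = x + 31; }
    else                   { x = y - 31; }
    return {y, x};
}

// Proposed variant: move the range left only when (y - x) <= 11.
Range extendProposed(int y, int x)
{
    if (y < 32)             { y = 31; x = 0; }
    else if ((y - x) <= 11) { y = x + 31; }
    else                    { x = y - 31; }
    return {y, x};
}

// high32 in subtract mode: low 32 bits of the sign-extended (imm >> x), plus one.
// Requires 0 <= x < 64.
uint32_t high32Subtract(uint64_t imm, int x)
{
    uint64_t v = imm >> x;
    if ((imm >> 63) != 0)
    {
        v |= ~(~uint64_t{0} >> x); // sign fill
    }
    return static_cast<uint32_t>(v) + 1u;
}
```

Under these assumptions, the current policy yields y = 74, x = 43 and high32 = 0xffffe001, whose low 12 bits are non-zero. The proposed policy yields y = 56, x = 25 and high32 = 0x80040000, whose low 12 bits are clear, so a single lui suffices.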

credo-quia-absurdum (Contributor Author)

> BTW, in your PR description, section "Emission with 64-bit offset handling" I think you mistyped the second instruction's immediate, it should be 43, right? 🙏

Thanks for pointing it out! I updated the PR description.

@am11 am11 added area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI arch-riscv Related to the RISC-V architecture and removed needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners labels Dec 14, 2025
credo-quia-absurdum (Contributor Author)

> I've reviewed this carefully and I realize that I have incorrectly assumed that after extending (or capping) the y:x range to 32 bits optimally, x will be less than or equal to 32:

@fuad1502 Thanks for your careful and detailed review.

Before your explanation, my understanding was that allowing y to become greater than 63 was intentional, as it did not lead to correctness issues in my investigation. However, for clarity, we can explicitly clamp y to at most 63 as shown below, together with the comment update you suggested.

// Where high32 = sext(imm[y:x]) and imm[63:y] are all zeroes or all ones.

if (y < 32)
{
    y = 31;
    x = 0;
}
else if ((y - x) < 31)
{
    y = x + 31;
    y = (y > 63) ? 63 : y;    // explicit upper bound
}
else
{
    x = y - 31;
}
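The clamped range and the updated comment can be exercised with a short standalone sketch. It wraps the logic above in a hypothetical function `extendAndClamp` and adds a hypothetical `signExtendField` that computes sext(imm[y:x]). The starting values y = 44 and x = 43 for 0xFFFF'F7FF'FFFF'FFFF are an assumption derived from its 20 leading ones and 43 trailing ones.

```cpp
#include <cassert>
#include <cstdint>

struct BitRange
{
    int y;
    int x;
};

// Range extension with the explicit upper bound on y.
BitRange extendAndClamp(int y, int x)
{
    if (y < 32)
    {
        y = 31;
        x = 0;
    }
    else if ((y - x) < 31)
    {
        y = x + 31;
        if (y > 63)
        {
            y = 63; // explicit upper bound
        }
    }
    else
    {
        x = y - 31;
    }
    return {y, x};
}

// sext(imm[y:x]): bits y..x of imm, sign-extended from bit y.
int64_t signExtendField(uint64_t imm, int y, int x)
{
    assert(0 <= x && x <= y && y <= 63);
    int      width   = y - x + 1;
    uint64_t mask    = (width == 64) ? ~uint64_t{0} : ((uint64_t{1} << width) - 1);
    uint64_t field   = (imm >> x) & mask;
    uint64_t signBit = uint64_t{1} << (width - 1);
    // Flip the sign bit, then subtract it: the usual sign-extension identity.
    return static_cast<int64_t>((field ^ signBit) - signBit);
}
```

For the example constant, the range becomes y = 63, x = 43 (21 bits rather than 32), and the sign-extended field is -2. Adding one for the subtract form gives -1, which agrees with the addiw immediate 0xFFF in the emission tables.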

credo-quia-absurdum (Contributor Author)

> For a later PR, modifying the y:x range extension code to the following should yield fewer instructions in cases like those:
>
>   if (y < 32) {
>     y = 31;
>     x = 0;
>   } else if (y - x <= 11) {
>     y = x + 31;
>   } else {
>     x = y - 31;
>   }

@fuad1502 Thanks for your suggestion. I'll follow up with a separate PR to keep this one focused, and will make sure to validate this with appropriate regression tests.

tomeksowi (Member) left a comment

LGTM

credo-quia-absurdum (Contributor Author)

@jakobbotsch I’ve resolved the merge conflicts in this PR as well and updated it to reflect the latest main.
When you have a chance, I’d appreciate it if you could take a look.

credo-quia-absurdum (Contributor Author) commented Feb 3, 2026

@jakobbotsch Quick ping — the PR is up to date with main and already approved by relevant reviewers.
Whenever you have time, please let me know if anything else is needed. Thanks!

jakobbotsch (Member)

/ba-g Infra issues

@jakobbotsch jakobbotsch merged commit 7bdcad9 into dotnet:main Feb 3, 2026
120 of 125 checks passed
jakobbotsch (Member)

> @jakobbotsch Quick ping — the PR is up to date with main and already approved by relevant reviewers. Whenever you have time, please let me know if anything else is needed. Thanks!

Sorry for the wait. Merged! Thanks.

lewing pushed a commit to lewing/runtime that referenced this pull request Feb 9, 2026
…2501)

Labels

arch-riscv Related to the RISC-V architecture area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI community-contribution Indicates that the PR has been added by a community member
