[RISC-V] Fix 64-bit offset handling in emitLoadImmediate #122501
jakobbotsch merged 8 commits into dotnet:main from
Pull request overview
This PR fixes a potential hazard in offset handling for RISC-V's emitLoadImmediate function by upgrading the implementation from 32-bit to 64-bit offset computation. While the current implementation does not break correctness, it can produce suboptimal instruction sequences and relies on fragile, compiler-dependent behavior when the offset boundary exceeds 32 bits.
Key changes:
- Introduces a new `BitMask64` helper function for 64-bit mask generation
- Upgrades offset computation variables from `uint32_t` to `uint64_t`
- Fixes leading zero count calculation to use 64-bit width
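The overview names a `BitMask64` helper, but its body is not shown on this page. As a minimal sketch, assuming the helper is meant to return a value with the low `n` bits set, a 64-bit mask can be built without ever shifting by the full operand width, which C++ leaves undefined. The name `lowBitMask64` and its signature are hypothetical and are not taken from the PR.

```cpp
#include <cstdint>

// Hypothetical helper (not the PR's BitMask64): returns a mask with the low `n` bits set.
// Shifting a 64-bit value by 64 or more is undefined behavior, so n >= 64 is handled explicitly.
constexpr uint64_t lowBitMask64(unsigned n)
{
    return (n >= 64) ? ~uint64_t{0} : ((uint64_t{1} << n) - 1);
}

// Compile-time checks of a few representative widths.
static_assert(lowBitMask64(0) == 0, "empty mask");
static_assert(lowBitMask64(11) == 0x7FF, "one 11-bit chunk");
static_assert(lowBitMask64(43) == 0x000007FFFFFFFFFFull, "mask wider than 32 bits");
static_assert(lowBitMask64(64) == 0xFFFFFFFFFFFFFFFFull, "full-width mask");
```

The explicit `n >= 64` branch is the design point: it keeps the helper total over its whole input range instead of relying on what a particular compiler happens to do with an oversized shift.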
Hi, thank you very much for this PR 🙏

I've reviewed this carefully and I realize that I have incorrectly assumed that after extending (or capping) the

runtime/src/coreclr/jit/emitriscv64.cpp Lines 1582 to 1594 in a0f3fea

You can see my assumption in the comments:

runtime/src/coreclr/jit/emitriscv64.cpp Line 1484 in a0f3fea

That is, In your example, So, instead, we should cap

Again, thank you very much for discovering this bug 🙏
BTW, in your PR description, in the section "Emission with 64-bit offset handling", I think you mistyped the second instruction's immediate. It should be 43, right? 🙏
fuad1502
left a comment
LGTM, thank you very much for your detailed attention to the emitLoadImmediate implementation. I apologize to everyone that this bug went through 🙏
FYI, I just realized, the For example, the 64-bit immediate For a later PR, modifying the

    if (y < 32) {
        y = 31;
        x = 0;
    } else if (y - x <= 11) {
        y = x + 31;
    } else {
        x = y - 31;
    }

Edit: The idea is to place the
Thanks for pointing it out! I updated the PR description.

@fuad1502 Thanks for your careful and detailed review. Before your explanation, my understanding was that allowing cases where

    // Where high32 = sext(imm[y:x]) and imm[63:y] are all zeroes or all ones.
    if (y < 32)
    {
        y = 31;
        x = 0;
    }
    else if ((y - x) < 31)
    {
        y = x + 31;
        y = (y > 63) ? 63 : y; // explicit upper bound
    }
    else
    {
        x = y - 31;
    }
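Read as a standalone function, the clamped variant above could look as follows. This is only a sketch: the `Window` struct and the `selectWindow` name are hypothetical, and the reading of `x` and `y` as the low and high bit indices of the field that is sign-extended into `high32` is inferred from the comment in the snippet, not confirmed by the surrounding discussion.

```cpp
// Hypothetical container for the bit range imm[y:x].
struct Window
{
    int x; // lowest bit index of the selected field
    int y; // highest bit index of the selected field
};

// Mirrors the three branches of the snippet, including the explicit cap of y at 63.
constexpr Window selectWindow(int x, int y)
{
    if (y < 32)
    {
        // The field fits in the low word: take bits [31:0].
        return {0, 31};
    }
    if ((y - x) < 31)
    {
        // Field narrower than 32 bits: extend the top, but never past bit 63.
        int top = x + 31;
        return {x, (top > 63) ? 63 : top};
    }
    // Field at least 32 bits wide: keep the top and take the 32 bits below it.
    return {y - 31, y};
}

static_assert(selectWindow(3, 20).x == 0 && selectWindow(3, 20).y == 31, "low-word case");
static_assert(selectWindow(20, 40).x == 20 && selectWindow(20, 40).y == 51, "extended case");
static_assert(selectWindow(40, 50).x == 40 && selectWindow(40, 50).y == 63, "capped case");
static_assert(selectWindow(5, 60).x == 29 && selectWindow(5, 60).y == 60, "wide case");
```

The capped case shows what the extra bound buys: without it, `y` would become 71 for `x = 40`, an index outside a 64-bit immediate.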
    if (y < 32) {
        y = 31;
        x = 0;
    } else if (y - x <= 11) {
        y = x + 31;
    } else {
        x = y - 31;
    }

@fuad1502 Thanks for your suggestion. I'll follow up with a separate PR to keep this one focused, and will make sure to validate this with appropriate regression tests.
@jakobbotsch I’ve resolved the merge conflicts in this PR as well and updated it to reflect the latest main.

@jakobbotsch Quick ping: the PR is up to date with main and already approved by relevant reviewers.

/ba-g Infra issues

Sorry for the wait. Merged! Thanks.
…2501)

# Summary

This change addresses a potential hazard in offset handling in `emitLoadImmediate`. The current implementation computes the offset using a 32-bit representation and stores it in a `uint32_t`, even though the offset boundary (`x`) can exceed 32.

From my investigation, this does not currently break correctness in the generated code. However, it can lead to suboptimal instruction sequences and fragile behavior if the logic is reused or extended in the future.

This change resolves the issue by performing offset computation and masking at 64-bit width, making the implementation more robust and future-proof.

@clamp03 @tomeksowi @SkyShield, @namu-lee @fuad1502
part of dotnet#84834, cc @dotnet/samsung

# Details

In C++, the behavior of the right-shift operator is undefined if the shift count is negative or greater than or equal to the bit width of the left-hand operand. ([cppreference](https://en.cppreference.com/w/cpp/language/operator_arithmetic.html))

> If the value of rhs is negative or is not less than the number of bits in lhs, the behavior is undefined.

The current implementation computes the offset using a 32-bit mask and stores it as a 32-bit unsigned integer. Based on this offset, the emitter repeatedly generates `slli` and `addi` instructions using 11-bit non-zero chunks. When the chunk is zero, the shift amount is accumulated and applied later.

This behavior is generally not problematic as long as the offset boundary (`x`) does not exceed 32. However, when `x` is larger than 32, the 32-bit representation can lead to suboptimal and fragile behavior. One such example is the constant `0xFFFF'F7FF'FFFF'FFFF`, whose emission sequence is shown below.

### Emission for `0xFFFF'F7FF'FFFF'FFFF` (current implementation)

`offset`: `0x0000'0001`, `x`: `43`

| # | Instruction | Immediate | Register Value After | `x` |
| --- | --- | --- | --- | --- |
| 1 | `addiw` | `0xFFF` | `0xFFFF'FFFF'FFFF'FFFF` | |
| 2 | `slli` | `21` | `0xFFFF'FFFF'FFE0'0000` | 43 -> 32 -> 22 |
| 3 | `addi` | `0x000` | `0xFFFF'FFFF'FFE0'0000` | |
| 4 | `slli` | `22` | `0xFFFF'F800'0000'0000` | 22 -> 11 -> 0 |
| 5 | `addi` | `0xFFF` | `0xFFFF'F7FF'FFFF'FFFF` | |

In this case, a redundant instruction such as `addi a0, a0, 0` (equivalently `mv a0, a0`) is generated. This happens because `offset >> 32` is undefined in C++ for a 32-bit `offset`; the observed behavior is compiler-dependent and, in this case, evaluates to the same value as `offset` itself (`0x0000'0001`). The emitter therefore incorrectly treats the chunk as non-zero and emits an unnecessary `slli`/`addi` pair.

During the subsequent removal of leading zeros, the offset boundary is effectively shifted, and the correct chunk (`0x0000'0000`), `offset >> 22`, is eventually recomputed. While this process happens to recover the correct value, it relies on fragile implementation details and produces suboptimal instruction sequences.

### Emission with 64-bit offset handling

With the offset computed and masked at 64-bit width, the same constant can be materialized more efficiently as below.

`offset`: `0x0000'0000'0000'0001`, `x`: `43`

| # | Instruction | Immediate | Register Value After | `x` |
| --- | --- | --- | --- | --- |
| 1 | `addiw` | `0xFFF` | `0xFFFF'FFFF'FFFF'FFFF` | |
| 2 | `slli` | `43` | `0xFFFF'F800'0000'0000` | 43 -> 32 -> 21 -> 10 -> 0 |
| 3 | `addi` | `0xFFF` | `0xFFFF'F7FF'FFFF'FFFF` | |

This sequence avoids the redundant instructions and more directly reflects the intended offset handling.

This is not necessarily the only scenario where the 32-bit offset representation can lead to unintended behavior. From my investigation, I did not observe any cases that break correctness in the currently generated instructions. However, the behavior depends on incidental recovery steps, which makes the implementation fragile. Such an assumption may become problematic if the logic is reused or extended in the future.

The case where `x` exceeds 32 only arises when the constant contains more than 32 trailing zeros or ones. In such cases, the 32-bit offset representation is limited to either `0x0000'0000` or `0xFFFF'FFFF`. For the latter, the subtract-offset form (`0x0000'0001`) is selected, which can trigger the behavior described above when `x` is greater than 32. Although the current logic eventually recovers and produces correct code, it relies on coincidental properties of the shifting and the zero-removal process for chunks.

The diff below shows the changes introduced by this update for the example above and across `System.*.dll`. As seen in `System.*.dll`, this pattern appears to be rare and does not commonly occur in managed code. While the performance impact of this change is minimal, improving the robustness of offset handling is important for long-term maintainability, which is the motivation for this PR.
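Both emission sequences described for `0xFFFF'F7FF'FFFF'FFFF` can be checked by hand with a tiny model of the three instructions involved. The sketch below is not JIT code: `sext12`, `addiw`, `slli` and `addi` are hypothetical helper names that model the usual RV64 semantics (12-bit sign-extended immediates, `addiw` sign-extending its 32-bit result), written only to confirm that the five-instruction and three-instruction sequences materialize the same constant.

```cpp
#include <cstdint>

// Sign-extend the low 12 bits of `imm`, as I-type immediates are interpreted.
constexpr uint64_t sext12(uint32_t imm)
{
    return (imm & 0x800) ? (static_cast<uint64_t>(imm & 0xFFF) | ~uint64_t{0xFFF})
                         : static_cast<uint64_t>(imm & 0xFFF);
}

// addi: 64-bit add of a sign-extended 12-bit immediate (wraps modulo 2^64).
constexpr uint64_t addi(uint64_t reg, uint32_t imm)
{
    return reg + sext12(imm);
}

// addiw: 32-bit add, then sign-extension of the 32-bit result to 64 bits.
constexpr uint64_t addiw(uint64_t reg, uint32_t imm)
{
    return ((static_cast<uint32_t>(reg) + static_cast<uint32_t>(sext12(imm))) & 0x80000000u)
               ? (static_cast<uint64_t>(static_cast<uint32_t>(reg) + static_cast<uint32_t>(sext12(imm))) |
                  0xFFFFFFFF00000000ull)
               : static_cast<uint64_t>(static_cast<uint32_t>(reg) + static_cast<uint32_t>(sext12(imm)));
}

// slli: logical left shift; callers must keep the shift amount below 64.
constexpr uint64_t slli(uint64_t reg, unsigned shamt)
{
    return reg << shamt;
}

constexpr uint64_t kTarget = 0xFFFFF7FFFFFFFFFFull;

// Sequence from the 32-bit offset path: addiw, slli 21, addi 0, slli 22, addi 0xFFF.
constexpr uint64_t kOldSequence = addi(slli(addi(slli(addiw(0, 0xFFF), 21), 0x000), 22), 0xFFF);

// Sequence from the 64-bit offset path: addiw, slli 43, addi 0xFFF.
constexpr uint64_t kNewSequence = addi(slli(addiw(0, 0xFFF), 43), 0xFFF);

static_assert(kOldSequence == kTarget, "five-instruction sequence yields the constant");
static_assert(kNewSequence == kTarget, "three-instruction sequence yields the constant");
```

Because the two shifts in the longer sequence add up to 43 and the intermediate `addi` adds zero, the model makes the redundancy visible: dropping the middle pair changes nothing in the final register value.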
```diff
@@ -20,18 +20,16 @@ G_M48308_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
                         ;; size=16 bbWeight=1 PerfScore 9.00
 G_M48308_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
             addiw          a0, zero, 0xD1FFAB1E
-            slli           a0, a0, 21
-            mv             a0, a0
-            slli           a0, a0, 22
+            slli           a0, a0, 43
             addi           a0, a0, 0xD1FFAB1E
-                        ;; size=20 bbWeight=1 PerfScore 5.00
+                        ;; size=12 bbWeight=1 PerfScore 3.00
 G_M48308_IG03:        ; bbWeight=1, epilog, nogc, extend
             ld             ra, 8(sp)
             ld             fp, 0(sp)
             addi           sp, sp, 16
             ret
                         ;; size=16 bbWeight=1 PerfScore 7.50
-; Total bytes of code 52, prolog size 16, PerfScore 21.50, instruction count 9, allocated bytes for code 52 (MethodHash=4897434b) for method ConstTest:LoadConst():ulong (FullOpts)
+; Total bytes of code 44, prolog size 16, PerfScore 19.50, instruction count 9, allocated bytes for code 44 (MethodHash=4897434b) for method ConstTest:LoadConst():ulong (FullOpts)
 ; ============================================================
```

## SuperPMI asmdiffs for `0xFFFF'F7FF'FFFF'FFFF`

Diffs are based on <span style="color:#1460aa">1,809</span> contexts (<span style="color:#1460aa">613</span> MinOpts, <span style="color:#1460aa">1,196</span> FullOpts).
<details>
<summary>Overall (<span style="color:green">-8</span> bytes)</summary>
<div style="margin-left:1em">

|Collection|Base size (bytes)|Diff size (bytes)|PerfScore in Diffs
|---|--:|--:|--:|
|emitLoadImmediate.mch|775,644|<span style="color:green">-8</span>|<span style="color:green">-0.08%</span>|

</div></details>

<details>
<summary>MinOpts (+0 bytes)</summary>
<div style="margin-left:1em">

|Collection|Base size (bytes)|Diff size (bytes)|PerfScore in Diffs
|---|--:|--:|--:|
|emitLoadImmediate.mch|278,096|+0|0.00%|

</div></details>

<details>
<summary>FullOpts (<span style="color:green">-8</span> bytes)</summary>
<div style="margin-left:1em">

|Collection|Base size (bytes)|Diff size (bytes)|PerfScore in Diffs
|---|--:|--:|--:|
|emitLoadImmediate.mch|497,548|<span style="color:green">-8</span>|<span style="color:green">-0.12%</span>|

</div></details>

<details>
<summary>Example diffs</summary>
<div style="margin-left:1em">
<details>
<summary>emitLoadImmediate.mch</summary>
<div style="margin-left:1em">
<details>
<summary><span style="color:green">-8</span> (<span style="color:green">-15.38%</span>) : 1779.dasm - ConstTest:LoadConst():ulong (FullOpts)</summary>
<div style="margin-left:1em">

```diff
@@ -20,18 +20,16 @@ G_M48308_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
                         ;; size=16 bbWeight=1 PerfScore 9.00
 G_M48308_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
             addiw          a0, zero, 0xD1FFAB1E
-            slli           a0, a0, 21
-            mv             a0, a0
-            slli           a0, a0, 22
+            slli           a0, a0, 43
             addi           a0, a0, 0xD1FFAB1E
-                        ;; size=20 bbWeight=1 PerfScore 5.00
+                        ;; size=12 bbWeight=1 PerfScore 3.00
 G_M48308_IG03:        ; bbWeight=1, epilog, nogc, extend
             ld             ra, 8(sp)
             ld             fp, 0(sp)
             addi           sp, sp, 16
             ret
                         ;; size=16 bbWeight=1 PerfScore 7.50
-; Total bytes of code 52, prolog size 16, PerfScore 21.50, instruction count 9, allocated bytes for code 52 (MethodHash=4897434b) for method ConstTest:LoadConst():ulong (FullOpts)
+; Total bytes of code 44, prolog size 16, PerfScore 19.50, instruction count 9, allocated bytes for code 44 (MethodHash=4897434b) for method ConstTest:LoadConst():ulong (FullOpts)
 ; ============================================================
 Unwind Info:
@@ -42,7 +40,7 @@ Unwind Info:
   E bit             : 0
   X bit             : 0
   Vers              : 0
-  Function Length   : 26 (0x0001a) Actual length = 52 (0x000034)
+  Function Length   : 22 (0x00016) Actual length = 44 (0x00002c)
   ---- Epilog scopes ----
   ---- Scope 0
   Epilog Start Offset        : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)
```

</div></details>
<details>
<summary>+0 (0.00%) : 1729.dasm - System.RuntimeType+RuntimeTypeCache+MemberInfoCache`1[System.__Canon]:PopulateMethods(System.RuntimeType+RuntimeTypeCache+Filter):System.Reflection.RuntimeMethodInfo[]:this (FullOpts)</summary>
<div style="margin-left:1em">
No diffs found?
</div></details>
<details>
<summary>+0 (0.00%) : 1474.dasm - System.Diagnostics.ProcessWaitState:WaitForExit(int):bool:this (MinOpts)</summary>
<div style="margin-left:1em">
No diffs found?
</div></details>
<details>
<summary>+0 (0.00%) : 1792.dasm - System.Number:Dragon4(ulong,int,uint,bool,int,bool,System.Span`1[byte],byref):uint (FullOpts)</summary>
<div style="margin-left:1em">
No diffs found?
</div></details>
<details>
<summary>+0 (0.00%) : 895.dasm - Interop+Sys:Open(System.String,int,int):Microsoft.Win32.SafeHandles.SafeFileHandle (MinOpts)</summary>
<div style="margin-left:1em">
No diffs found?
</div></details>
<details>
<summary>+0 (0.00%) : 447.dasm - System.Text.RegularExpressions.RegexParser:ParseReplacement(System.String,int,System.Collections.Hashtable,int,System.Collections.Hashtable):System.Text.RegularExpressions.RegexReplacement (MinOpts)</summary>
<div style="margin-left:1em">
No diffs found?
</div></details>
</div></details>
</div></details>

<details>
<summary>Details</summary>
<div style="margin-left:1em">

#### Size improvements/regressions per collection

|Collection|Contexts with diffs|Improvements|Regressions|Same size|Improvements (bytes)|Regressions (bytes)|
|---|--:|--:|--:|--:|--:|--:|
|emitLoadImmediate.mch|130|<span style="color:green">1</span>|<span style="color:red">0</span>|<span style="color:blue">129</span>|<span style="color:green">-8</span>|<span style="color:red">+0</span>|

---

#### PerfScore improvements/regressions per collection

|Collection|Contexts with diffs|Improvements|Regressions|Same PerfScore|Improvements (PerfScore)|Regressions (PerfScore)|PerfScore Overall in FullOpts|
|---|--:|--:|--:|--:|--:|--:|--:|
|emitLoadImmediate.mch|130|<span style="color:green">1</span>|<span style="color:red">0</span>|<span style="color:blue">129</span>|<span style="color:green">-9.30%</span>|0.00%|<span style="color:green">-0.0082%</span>|

---

#### Context information

|Collection|Diffed contexts|MinOpts|FullOpts|Missed, base|Missed, diff|
|---|--:|--:|--:|--:|--:|
|emitLoadImmediate.mch|1,809|613|1,196|0 (0.00%)|0 (0.00%)|

---

#### jit-analyze output

</div></details>

## SuperPMI asmdiffs for System.*.dll

Diffs are based on <span style="color:#1460aa">87,783</span> contexts (<span style="color:#1460aa">86,131</span> MinOpts, <span style="color:#1460aa">1,652</span> FullOpts).
<details>
<summary>Overall (+0 bytes)</summary>
<div style="margin-left:1em">

|Collection|Base size (bytes)|Diff size (bytes)|PerfScore in Diffs
|---|--:|--:|--:|
|system.mch|32,228,202|+0|0.00%|

</div></details>

<details>
<summary>MinOpts (+0 bytes)</summary>
<div style="margin-left:1em">

|Collection|Base size (bytes)|Diff size (bytes)|PerfScore in Diffs
|---|--:|--:|--:|
|system.mch|31,550,092|+0|0.00%|

</div></details>

<details>
<summary>FullOpts (+0 bytes)</summary>
<div style="margin-left:1em">

|Collection|Base size (bytes)|Diff size (bytes)|PerfScore in Diffs
|---|--:|--:|--:|
|system.mch|678,110|+0|0.00%|

</div></details>

<details>
<summary>Example diffs</summary>
<div style="margin-left:1em">
<details>
<summary>system.mch</summary>
<div style="margin-left:1em">
<details>
<summary>+0 (0.00%) : 1217.dasm - System.Security.Cryptography.Cose.CoseKey:Sign(System.ReadOnlySpan`1[byte],System.Span`1[byte]):int:this (MinOpts)</summary>
<div style="margin-left:1em">
No diffs found?
</div></details>
<details>
<summary>+0 (0.00%) : 3329.dasm - System.IO.Hashing.XxHash3:HashLength0To16(ptr,uint,ulong):ulong (MinOpts)</summary>
<div style="margin-left:1em">
No diffs found?
</div></details>
<details>
<summary>+0 (0.00%) : 5825.dasm - System.Linq.ParallelEnumerable:ElementAt[byte](System.Linq.ParallelQuery`1[byte],int):byte (MinOpts)</summary>
<div style="margin-left:1em">
No diffs found?
</div></details>
<details>
<summary>+0 (0.00%) : 86848.dasm - System.ConsolePal:get_WindowWidth():int (MinOpts)</summary>
<div style="margin-left:1em">
No diffs found?
</div></details>
<details>
<summary>+0 (0.00%) : 84032.dasm - System.Net.ServerSentEvents.Helpers:WriteUtf8Number(System.Buffers.IBufferWriter`1[byte],long) (MinOpts)</summary>
<div style="margin-left:1em">
No diffs found?
</div></details>
<details>
<summary>+0 (0.00%) : 82240.dasm - System.Linq.Expressions.Compiler.LambdaCompiler:.ctor(System.Linq.Expressions.Compiler.AnalyzedTree,System.Linq.Expressions.LambdaExpression):this (MinOpts)</summary>
<div style="margin-left:1em">
No diffs found?
</div></details>
</div></details>
</div></details>

<details>
<summary>Details</summary>
<div style="margin-left:1em">

#### Size improvements/regressions per collection

|Collection|Contexts with diffs|Improvements|Regressions|Same size|Improvements (bytes)|Regressions (bytes)|
|---|--:|--:|--:|--:|--:|--:|
|system.mch|2,716|<span style="color:green">0</span>|<span style="color:red">0</span>|<span style="color:blue">2,716</span>|<span style="color:green">-0</span>|<span style="color:red">+0</span>|

---

#### PerfScore improvements/regressions per collection

|Collection|Contexts with diffs|Improvements|Regressions|Same PerfScore|Improvements (PerfScore)|Regressions (PerfScore)|PerfScore Overall in FullOpts|
|---|--:|--:|--:|--:|--:|--:|--:|
|system.mch|2,716|<span style="color:green">0</span>|<span style="color:red">0</span>|<span style="color:blue">2,716</span>|0.00%|0.00%|0.0000%|

---

#### Context information

|Collection|Diffed contexts|MinOpts|FullOpts|Missed, base|Missed, diff|
|---|--:|--:|--:|--:|--:|
|system.mch|87,783|86,131|1,652|0 (0.00%)|0 (0.00%)|

---

#### jit-analyze output

</div></details>

## Notes

I added `BitMask64` alongside the existing `WordMask` to handle 64-bit masks. The inconsistency between these utility helpers will be addressed in a follow-up refactoring PR to minimize the scope of changes and review overhead in this PR.