Improve Math.BigMul performance on x64#117261
Improve Math.BigMul performance on x64#117261Daniel-Svensson wants to merge 13 commits intodotnet:mainfrom
Conversation
There was a problem hiding this comment.
Pull Request Overview
This PR backports the Math.BigMul hardware intrinsic support on x64 without the mulx instruction extension. It adds new BigMul overloads in the X86Base APIs, hooks them into Math.BigMul, and extends the JIT to lower, schedule, and codegen these multi-register intrinsics.
- Introduces
BigMulmethods for 32-, 64-, and pointer-size integers inX86Baseand their platform-not-supported stubs. - Updates
Math.BigMulto prefer the newX86Base.X64.BigMulpath on non-MONO x64, falling back as before. - Enhances JIT (linear scan, lowering, import, list, codegen, tree layout) to recognize and generate
BigMulintrinsics.
Reviewed Changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
| src/System.Private.CoreLib/src/System/Runtime/Intrinsics/X86/X86Base.cs | Added BigMul intrinsics for various operand widths |
| src/System.Private.CoreLib/src/System/Runtime/Intrinsics/X86/X86Base.PlatformNotSupported.cs | Added BigMul stubs throwing on unsupported platforms |
| src/System.Private.CoreLib/src/System/Math.cs | Routed Math.BigMul to use the new intrinsics on x64 |
| src/coreclr/jit/lsraxarch.cpp | Updated register allocator for BigMul multi-reg defs |
| src/coreclr/jit/lowerxarch.cpp | Enabled containment checks for BigMul |
| src/coreclr/jit/hwintrinsicxarch.cpp | Imported BigMul as a multi-register HW intrinsic |
| src/coreclr/jit/hwintrinsiclistxarch.h | Listed BigMul in the x86 and x64 HW intrinsic tables |
| src/coreclr/jit/hwintrinsiccodegenxarch.cpp | Emitting MUL/IMUL sequence for BigMul |
| src/coreclr/jit/hwintrinsic.h | Updated multi-reg return count for BigMul |
| src/coreclr/jit/gentree.cpp | Defined struct layout for BigMul return |
Comments suppressed due to low confidence (3)
src/libraries/System.Private.CoreLib/src/System/Math.cs:205
- Consider adding unit tests that validate the new
Math.BigMul(ulong, ulong, out ulong)path on x64 and ensure correct behavior whenX86Base.X64.IsSupportedis true/false.
#if !MONO // X64.BigMul is not yet implemented in MONO
|
@EgorBo, PTAL. |
|
Rerunning failed test. |
| /// <remarks> | ||
| /// <para>Its functionality is exposed in the public <see cref="Math" /> class.</para> | ||
| /// </remarks> | ||
| internal static (uint Lower, uint Upper) BigMul(uint left, uint right) { throw new PlatformNotSupportedException(); } |
There was a problem hiding this comment.
Why do we need these if we don't use them and they're internal?
There was a problem hiding this comment.
no, they (int and nint overloads) can probably be removed since we do not have any such public api unless the corresponding API proposal is approved (#58263)
They were there to follow the pattern of other hardware methods and because 32bit bigmul used them before implementing it in JIT instead.
Comment on the other PR:
The nint and 32 bit version never gets called, should I remove them or keep them for the symmetry ?
I do no longer think it makes much sense to make this methods public >(#58263) since you can just call existing BigMul methods, and the nint version was solved by a simple *if * in #114731
There was a problem hiding this comment.
I guess it means that NI_X86Base_BigMul should be removed as well ?
There was a problem hiding this comment.
I'd remove, it's basically a dead/untested code. Btw, I've pushed some AI-generated tests 🙂
There was a problem hiding this comment.
I've removed the 32bit and IntPtr (nint) overloads as well as NI_X86Base_BigMul .
I did not look that much on the tests but thy seems fine (during developmen one error where register was swapped was detected by xxhash beeing used in tests and not by existing BigMul test. It seems like it should be covered by one of the new tests)
|
Sorry for the slow feedback loop, I think it's looking good, let's see what CI thinks |
| // Licensed to the .NET Foundation under one or more agreements. | ||
| // The .NET Foundation licenses this file to you under the MIT license. | ||
|
|
||
| using System; | ||
| using Xunit; | ||
|
|
||
| namespace JIT.HardwareIntrinsics.X86 | ||
| { | ||
| public class BigMulTestSuite | ||
| { |
There was a problem hiding this comment.
The test file BigMul.cs is not referenced in any .csproj file, which means it won't be compiled or executed as part of the test suite. You need to either:
- Add a new test project (e.g., X86Base.X64_BigMul.csproj) that includes this file, or
- Add this file to an existing X86Base.X64 test project
Looking at similar tests in the X86Base.X64 directory, you would typically need to create a .csproj file that includes both this test file and the Program.cs files.
| /// <summary> | ||
| /// Supplementary test to ensure that the 'out' parameter behavior | ||
| /// handles stack-allocated variables correctly. | ||
| /// </summary> | ||
| private static void TestOutParameterConsistency() | ||
| { | ||
| long a = 123456789; | ||
| long b = 987654321; | ||
| long lowExpected = a * b; | ||
|
|
||
| Math.BigMul(a, b, out long lowActual); | ||
|
|
||
| Assert.Equal(lowExpected, lowActual); | ||
| } | ||
|
|
||
| /// <summary> | ||
| /// Test using properties instead of fields to ensure getter logic | ||
| /// doesn't interfere with the intrinsic mapping. | ||
| /// </summary> | ||
| private class PropertyContainer | ||
| { | ||
| public long Val => 0x1122334455667788; | ||
| } | ||
|
|
||
| private static void TestPropertyInputs() | ||
| { | ||
| var p = new PropertyContainer(); | ||
| long low; | ||
| long high = Math.BigMul(p.Val, 2, out low); | ||
|
|
||
| Assert.Equal(0L, high); | ||
| Assert.Equal(p.Val * 2, low); | ||
| } |
There was a problem hiding this comment.
The test methods TestOutParameterConsistency and TestPropertyInputs are defined but never called from TestEntryPoint, so they won't be executed during test runs. Either add calls to these methods in TestEntryPoint (lines 30-42), or remove these methods if they're not needed.
| /// <summary> | |
| /// Supplementary test to ensure that the 'out' parameter behavior | |
| /// handles stack-allocated variables correctly. | |
| /// </summary> | |
| private static void TestOutParameterConsistency() | |
| { | |
| long a = 123456789; | |
| long b = 987654321; | |
| long lowExpected = a * b; | |
| Math.BigMul(a, b, out long lowActual); | |
| Assert.Equal(lowExpected, lowActual); | |
| } | |
| /// <summary> | |
| /// Test using properties instead of fields to ensure getter logic | |
| /// doesn't interfere with the intrinsic mapping. | |
| /// </summary> | |
| private class PropertyContainer | |
| { | |
| public long Val => 0x1122334455667788; | |
| } | |
| private static void TestPropertyInputs() | |
| { | |
| var p = new PropertyContainer(); | |
| long low; | |
| long high = Math.BigMul(p.Val, 2, out low); | |
| Assert.Equal(0L, high); | |
| Assert.Equal(p.Val * 2, low); | |
| } |
| var_types targetType = node->GetSimdBaseType(); | ||
| GenTree* regOp = node->Op(1); | ||
| GenTree* rmOp = node->Op(2); | ||
| instruction ins = HWIntrinsicInfo::lookupIns(intrinsicId, targetType, m_compiler); | ||
|
|
||
| emitAttr attr = emitTypeSize(targetType); | ||
| emitter* emit = GetEmitter(); |
There was a problem hiding this comment.
Local variable 'targetType' shadows the function parameter of the same name (line 2484). Consider using a different name like 'operandType' or 'mulTargetType' to avoid confusion. The same applies to 'emit' which shadows the function-scope variable at line 2486.
| var_types targetType = node->GetSimdBaseType(); | |
| GenTree* regOp = node->Op(1); | |
| GenTree* rmOp = node->Op(2); | |
| instruction ins = HWIntrinsicInfo::lookupIns(intrinsicId, targetType, m_compiler); | |
| emitAttr attr = emitTypeSize(targetType); | |
| emitter* emit = GetEmitter(); | |
| GenTree* regOp = node->Op(1); | |
| GenTree* rmOp = node->Op(2); | |
| instruction ins = HWIntrinsicInfo::lookupIns(intrinsicId, baseType, m_compiler); | |
| emitAttr attr = emitTypeSize(baseType); |
This #115966 but without the mulx support.
See the old PR for benchmarks results and generated code
The reason for opening a separate PR is
Feel free to close it if you prefer to work with the original PR