Description
Background and Motivation
In .NET Core 3.1 we introduced the following intrinsics to expose the x86 mulx instruction:
uint MultiplyNoFlags(uint left, uint right, uint* low)
ulong MultiplyNoFlags(ulong left, ulong right, ulong* low)
where low is an output parameter (a pointer) that receives the lower 32-bit/64-bit half of the 64-bit/128-bit product left * right, while the return value contains the upper half.
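To make the upper/lower split concrete, here is a minimal sketch that compares the existing 32-bit intrinsic against ordinary 64-bit arithmetic. The class name MulxDemo and the helper Reference are hypothetical names used only for illustration. The intrinsic call is guarded by Bmi2.IsSupported, and the project is assumed to be compiled with unsafe code enabled.

```csharp
using System;
using System.Runtime.Intrinsics.X86;

public static class MulxDemo
{
    // Hypothetical reference helper (not part of any API): computes both
    // halves of the 64-bit product with ordinary 64-bit arithmetic.
    public static (uint Lower, uint Upper) Reference(uint left, uint right)
    {
        ulong product = (ulong)left * right;
        return ((uint)product, (uint)(product >> 32));
    }

    public static unsafe void Main()
    {
        uint a = 0xFFFF_FFFF, b = 2;
        var expected = Reference(a, b); // product is 0x1_FFFF_FFFE

        if (Bmi2.IsSupported)
        {
            uint low;
            // The return value is the upper half; the lower half is written through the pointer.
            uint high = Bmi2.MultiplyNoFlags(a, b, &low);
            Console.WriteLine(high == expected.Upper && low == expected.Lower);
        }
        else
        {
            Console.WriteLine("BMI2 is not supported on this machine.");
        }
    }
}
```

For these inputs the upper half is 1 and the lower half is 0xFFFFFFFE.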
When the intrinsics are used, the JIT produces suboptimal code because the local that low points to is marked as address-taken.
For example, the following C# methods
static unsafe uint mulx(uint a, uint b)
{
    uint r;
    return Bmi2.MultiplyNoFlags(a, b, &r) + r;
}
static unsafe ulong mulx_64(ulong a, ulong b)
{
    ulong r;
    return Bmi2.X64.MultiplyNoFlags(a, b, &r) + r;
}
will be compiled down to the following code by the current implementation of the JIT:
mulx
G_M48748_IG01: ;; offset=0000H
50 push rax
33C0 xor rax, rax
89442404 mov dword ptr [rsp+04H], eax
89542418 mov dword ptr [rsp+18H], edx
;; bbWeight=1 PerfScore 3.25
G_M48748_IG02: ;; offset=000BH
488D442404 lea rax, bword ptr [rsp+04H]
448B442418 mov r8d, dword ptr [rsp+18H]
8BD1 mov edx, ecx
C4C233F6D0 mulx edx, r9d, r8d
448908 mov dword ptr [rax], r9d
8BC2 mov eax, edx
03442404 add eax, dword ptr [rsp+04H]
;; bbWeight=1 PerfScore 7.00
G_M48748_IG03: ;; offset=0025H
4883C408 add rsp, 8
C3 ret
mulx_64
G_M55976_IG01: ;; offset=0000H
50 push rax
33C0 xor rax, rax
48890424 mov qword ptr [rsp], rax
4889542418 mov qword ptr [rsp+18H], rdx
;; bbWeight=1 PerfScore 3.25
G_M55976_IG02: ;; offset=000CH
488D0424 lea rax, bword ptr [rsp]
4C8B442418 mov r8, qword ptr [rsp+18H]
488BD1 mov rdx, rcx
C4C2B3F6D0 mulx rdx, r9, r8
4C8908 mov qword ptr [rax], r9
488BC2 mov rax, rdx
48030424 add rax, qword ptr [rsp]
;; bbWeight=1 PerfScore 7.00
G_M55976_IG03: ;; offset=0027H
4883C408 add rsp, 8
C3 ret
;; bbWeight=1 PerfScore 1.25
However, if Bmi2.MultiplyNoFlags were implemented instead as
public static unsafe uint MultiplyNoFlags(uint left, uint right, uint* low)
{
    var result = MultiplyNoFlags2(left, right);
    *low = result.Item1;
    return result.Item2;
}
public static unsafe ulong MultiplyNoFlags(ulong left, ulong right, ulong* low)
{
    var result = MultiplyNoFlags2(left, right);
    *low = result.Item1;
    return result.Item2;
}
then the JIT (as in #37928) would inline MultiplyNoFlags and be able to remove the address-taken attribute from the local corresponding to low:
mulx
G_M48748_IG01: ;; offset=0000H
;; bbWeight=1 PerfScore 0.00
G_M48748_IG02: ;; offset=0000H
C4E27BF6D1 mulx edx, eax, ecx
03C2 add eax, edx
;; bbWeight=1 PerfScore 3.25
G_M48748_IG03: ;; offset=0007H
C3 ret
;; bbWeight=1 PerfScore 1.00
mulx_64
G_M55976_IG01: ;; offset=0000H
;; bbWeight=1 PerfScore 0.00
G_M55976_IG02: ;; offset=0000H
C4E2FBF6D1 mulx rdx, rax, rcx
4803C2 add rax, rdx
;; bbWeight=1 PerfScore 3.25
G_M55976_IG03: ;; offset=0008H
C3 ret
;; bbWeight=1 PerfScore 1.00
Proposed API
namespace System.Runtime.Intrinsics.X86
{
    public abstract class Bmi2 : X86Base
    {
        public static (uint Lower, uint Upper) MultiplyNoFlags2(uint left, uint right);
        public abstract class X64 : X86Base.X64
        {
            public static (ulong Lower, ulong Upper) MultiplyNoFlags2(ulong left, ulong right);
        }
    }
}
Based on work Carol did in #37928
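As a rough illustration of how calling code might look with a tuple-returning shape, the sketch below defines a stand-in with the same signature as the proposed 64-bit method. Bmi2Sketch and Mulx64 are hypothetical names, and the stand-in is built on the existing pointer-based intrinsic with a Math.BigMul fallback (assuming .NET 5 or later). It only shows the call shape, not the codegen improvement the proposal is after.

```csharp
using System;
using System.Runtime.Intrinsics.X86;

public static class Bmi2Sketch
{
    // Hypothetical stand-in for the proposed Bmi2.X64.MultiplyNoFlags2;
    // the real method would be a JIT intrinsic, not a wrapper like this.
    public static unsafe (ulong Lower, ulong Upper) MultiplyNoFlags2(ulong left, ulong right)
    {
        ulong lower;
        ulong upper;
        if (Bmi2.X64.IsSupported)
        {
            // Existing API: returns the upper half, writes the lower half through the pointer.
            upper = Bmi2.X64.MultiplyNoFlags(left, right, &lower);
        }
        else
        {
            // Portable fallback: Math.BigMul also returns the upper half.
            upper = Math.BigMul(left, right, out lower);
        }
        return (lower, upper);
    }

    // Counterpart of mulx_64 from the example above, with no pointer or unsafe code at the call site.
    public static ulong Mulx64(ulong a, ulong b)
    {
        var (lower, upper) = MultiplyNoFlags2(a, b);
        return upper + lower;
    }
}
```

With this shape the caller never takes the address of a local, which is the property the proposal relies on to keep both halves in registers.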