Conversation
|
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch |
| @@ -8,36 +8,35 @@ namespace Microsoft.Win32.SafeHandles | |||
| { | |||
| public abstract partial class CriticalHandleMinusOneIsInvalid : System.Runtime.InteropServices.CriticalHandle | |||
| { | |||
| protected CriticalHandleMinusOneIsInvalid() : base (default(System.IntPtr)) { } | |||
There was a problem hiding this comment.
There shouldn't be any changes to System.Runtime.cs with this PR
| InstructionSet_AVX512BMM=46, | ||
| InstructionSet_AVX512BMM_X64=47, |
There was a problem hiding this comment.
Looks like this may have been added manually.
You rather want to modify src/coreclr/tools/Common/JitInterface/ThunkGenerator/InstructionSetDesc.txt and then run src/coreclr/tools/Common/JitInterface/ThunkGenerator/gen.bat
| case NI_AVX512BMM_BitMultiplyMatrix16x16WithOrReduction: | ||
| case NI_AVX512BMM_BitMultiplyMatrix16x16WithXorReduction: |
There was a problem hiding this comment.
These could be put in the same grouping as AvxVnni just above, right?
Looks like they're identical and should also have the same assertions.
src/coreclr/jit/instrsxarch.h
Outdated
| INST3(vp2intersectd, "vp2intersectd", IUM_WR, BAD_CODE, BAD_CODE, PSSE38(0xF2, 0x68), ILLEGAL, ILLEGAL, INS_TT_FULL, Input_32Bit | KMask_Base4 | REX_W0 | Encoding_EVEX) // Compute Intersection Between DWORDS to a Pair of Mask Registers | ||
| INST3(vp2intersectq, "vp2intersectq", IUM_WR, BAD_CODE, BAD_CODE, PSSE38(0xF2, 0x68), ILLEGAL, ILLEGAL, INS_TT_FULL, Input_64Bit | KMask_Base2 | REX_W1 | Encoding_EVEX) // Compute Intersection Between QWORDS to a Pair of Mask Registers | ||
|
|
||
| #define FIRST_AVX512BMM_INSTRUCTION INS_vbmacor16x16x16 |
There was a problem hiding this comment.
We should probably put these below the AvxVnni group. Just so its not splitting the Avx10v2 and Avx10v2.x64 group
| /// <para>__m512i _mm512_bmacxor16x16x16 (__m512i left, __m512i right, __m512i addend)</para> | ||
| /// <para> VBMACXOR16x16x16 zmm1, zmm2, zmm3/m256</para> | ||
| /// </summary> | ||
| public static Vector512<ushort> BitMultiplyMatrix16x16WithXorReduction(Vector512<ushort> left, Vector512<ushort> right, Vector512<ushort> addend) => BitMultiplyMatrix16x16WithXorReduction(left, right, addend); |
There was a problem hiding this comment.
After double checking the instruction, it looks like the parameter order is VBMACXOR16x16x16 ymm1, ymm2, ymm3/m256 where ymm1: addend, ymm2: left, ymm3: right
and so the order here should be: addend, left, right instead
There was a problem hiding this comment.
-- we want to ensure parameter order matches instruction order, so we don't have to move things around
| HARDWARE_INTRINSIC(AVX512BMM, BitMultiplyMatrix16x16WithOrReduction, -1, -1, {INS_invalid, INS_invalid, INS_vbmacor16x16x16, INS_vbmacor16x16x16, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid}, HW_Category_SimpleSIMD, HW_Flag_BaseTypeFromFirstArg) | ||
| HARDWARE_INTRINSIC(AVX512BMM, BitMultiplyMatrix16x16WithXorReduction, -1, -1, {INS_invalid, INS_invalid, INS_vbmacxor16x16x16, INS_vbmacxor16x16x16, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid}, HW_Category_SimpleSIMD, HW_Flag_BaseTypeFromFirstArg) |
There was a problem hiding this comment.
These two are "RMW" (read-modify write) and so should be marked as IsRmwIntrinsic and have some specialized handling like Fma and AvxVnni intrinsics have (in lsra and lower). This will ensure better codegen since the first operand is both a source and destination as far as the register allocator is concerned
There was a problem hiding this comment.
I'd also expect some minimal handling in lower to handle the fact that left and right are commutative and so either can be the "from memory" operand.
This reverts commit 1fc5374.
…e memory args, correct API argument order
5ac552e to
60f7879
Compare
This PR implements the AVX512 BMM API.
Disasm Samples
BitMultiplyMatrix16x16WithOrReduction
BitMultiplyMatrix16x16WithXorReduction
ReverseBits
ReverseBits (Merge Masking)
ReverseBits (Zero Masking)