Implement scalar Sse2 hardware intrinsics#16237
Conversation
src/jit/instrsxarch.h
Outdated
| INST3( movd, "movd" , 0, IUM_WR, 0, 0, PCKDBL(0x7E), BAD_CODE, PCKDBL(0x6E)) | ||
| INST3( movq, "movq" , 0, IUM_WR, 0, 0, PCKDBL(0xD6), BAD_CODE, SSEFLT(0x7E)) | ||
| INST3( movd, "movd" , 0, IUM_WR, 0, 0, PCKDBL(0x6E), BAD_CODE, PCKDBL(0xFE)) // Move doubleword from r/m32 to xmm or move doubleword from xmm register to r/m32 | ||
| INST3( movq, "movq" , 0, IUM_WR, 0, 0, PCKDBL(0xD6), BAD_CODE, SSEFLT(0x7E)) // Move quadword from r/m64 to xmm or move quadword from xmm register to r/m64 |
|
Just a general nit. It is helpful to break these PRs into two logical commits (one containing the product code changes and one containing the test changes). The GitHub UI (and even various external code review tools) tend to not do so well with large commits, and making it easier to review the product code is generally desirable (IMO). |
| { | ||
| // special case vzeroupper as it requires 2-byte VEX prefix | ||
| if (ins == INS_vzeroupper) | ||
| switch (ins) |
There was a problem hiding this comment.
this seems to be in a weird state. The sfence and prefetch instructions are missing.
There was a problem hiding this comment.
Yeah, rebasing to latest head which includes commit containing memory instructions - will fix
src/jit/emitxarch.cpp
Outdated
| switch (ins) | ||
| { | ||
| return true; | ||
| case INS_cvttsd2si: |
There was a problem hiding this comment.
cvttss2si is missing.
Additionally, given that you aren't adding anything new here, it would be generally desirable to move it into its own refactoring PR.
| break; | ||
| } | ||
|
|
||
| case NI_SSE2_CompareEqualOrderedScalar: |
There was a problem hiding this comment.
These should be merged with the SSE implementations.
There was a problem hiding this comment.
Agree, but since there quite a lot refactoring opportunities still coming I have created issue to track this work item #16014 and I plan to work on it once all SSE2 and other similar SSE intrinsics are in.
| baseType = JITtype2varType(corType); | ||
|
|
||
| #ifdef _TARGET_X86_ | ||
| if (varTypeIsLong(JITtype2varType(corType))) |
There was a problem hiding this comment.
This case should already be handled by impHWIntrinsic
There was a problem hiding this comment.
This case should already be handled by impHWIntrinsic
Currently, impHWIntrinsic only handle r64 for return types. Maybe we can make the change latter.
There was a problem hiding this comment.
Do we have a bug tracking this yet?
Do we also have any mechanism for determining when the intrinsic takes/returns a 64-bit value but also works on 32-bit machines (there is at least a couple of them)?
There was a problem hiding this comment.
Do we have a bug tracking this yet?
I do not think this is a bug. We can introduce more flags to hnit the framework that the intrinsic may have an r64 oprand on first/second/.. position.
There was a problem hiding this comment.
Let me make this change after this PR merged.
Do you prefer to break this PR or next ones? |
Breaking this one into two should be fairly trivial, since the split is only along a directory: |
43bf732 to
dc4e7cd
Compare
src/jit/emitxarch.cpp
Outdated
| ======= | ||
| // these are the only ISA extension instructions taking zero operands | ||
| || ins == INS_vzeroupper | ||
| >>>>>>> Implement scalar Sse2 hardware intrinsics |
There was a problem hiding this comment.
Thanks will fix - next bad merge
| baseType = JITtype2varType(corType); | ||
|
|
||
| #ifdef _TARGET_X86_ | ||
| if (varTypeIsLong(JITtype2varType(corType))) |
There was a problem hiding this comment.
This case should already be handled by impHWIntrinsic
Currently, impHWIntrinsic only handle r64 for return types. Maybe we can make the change latter.
src/jit/hwintrinsicxarch.cpp
Outdated
|
|
||
| if (baseType == TYP_STRUCT) | ||
| { | ||
| baseType = getBaseTypeOfSIMDType(argClass); |
There was a problem hiding this comment.
I believe that the baseType is always double on Sse2.ConvertScalarToVector128Double.
There was a problem hiding this comment.
Yes, that's the case however I try to write code which will be easy to refactor or abstract to handle more cases later see issue #16014. If it is acceptable I would prefer to defer to refactoring PR
There was a problem hiding this comment.
Never mind. CVTSS2SD xmm, xmm/m32 and CVTSI2SD xmm, reg/m64 need different encoding.
There was a problem hiding this comment.
In this case I use argType as baseType to use it later during instruction emission
src/jit/hwintrinsicxarch.cpp
Outdated
| CorInfoType corType = | ||
| strip(info.compCompHnd->getArgType(sig, argList, &argClass)); // type of the second argument | ||
|
|
||
| baseType = getBaseTypeOfSIMDType(argClass); |
There was a problem hiding this comment.
The baseType is float (retType's base type) on ConvertScalarToVector128Single.
There was a problem hiding this comment.
I need arg types to use correct encoding later (r64 versus r32) during instruction emitting.
There was a problem hiding this comment.
Are you talking about ConvertScalarToVector128Double?
Sse2.ConvertScalarToVector128Single just has one signature Vector128<float> ConvertScalarToVector128Single(Vector128<float> upper, Vector128<double> value)
There was a problem hiding this comment.
Once I will work on refactoring I will have overloaded - where I need to use second argType as base type
ConvertScalarToVector128Single(Vector128<float> upper, int value)
ConvertScalarToVector128Single(Vector128<float> upper, long value)
If you prefer to simplify it for this PR and not defer changes to refactoring I will do it.
There was a problem hiding this comment.
I see. However, I do not think we should combine Sse.ConvertScalarToVector128Single and Sse2.ConvertScalarToVector128Single because they have different signatures and codgen consideration.
Furthermore, combining HW intrinsic code cross-ISA should rely on the current category/flag mechanism rather than "intrinsic names". Meanwhile, a little bit duplicated code is fine to me, it may be not worthwhile to complicate the framework just for a few special cases. We can implement these "special cases" individually, then add categories/flags to combine them if there are some obvious patterns.
There was a problem hiding this comment.
OK than I will simplify it and leave any changes during refactoring to the result of discussion of its scope.
| assert(sig->numArgs == 1); | ||
| op1 = impSIMDPopStack(TYP_SIMD16); | ||
| retType = JITtype2varType(sig->retType); | ||
| baseType = getBaseTypeOfSIMDType(info.compCompHnd->getArgClass(sig, sig->args)); |
There was a problem hiding this comment.
We can directly use the return type as base type.
There was a problem hiding this comment.
There are overloads where it is necessary to select different instruction based on argument type - I use baseType to handle this. AFAIR during debugging I have seen problems with baseType detection and that's why I have explicitly recovered it from sig->retType.
There was a problem hiding this comment.
Ah, I see there are two special guys
/// <summary>
/// int _mm_cvtsd_si32 (__m128d a)
/// CVTSD2SI r32, xmm/m64
/// </summary>
public static int ConvertToInt32(Vector128<double> value) => ConvertToInt32(value);
/// <summary>
/// int _mm_cvtsi128_si32 (__m128i a)
/// MOVD reg/m32, xmm
/// </summary>
public static int ConvertToInt32(Vector128<int> value) => ConvertToInt32(value);
/// <summary>
/// __int64 _mm_cvtsd_si64 (__m128d a)
/// CVTSD2SI r64, xmm/m64
/// </summary>
public static long ConvertToInt64(Vector128<double> value) => ConvertToInt64(value);
/// <summary>
/// __int64 _mm_cvtsi128_si64 (__m128i a)
/// MOVQ reg/m64, xmm
/// </summary>
public static long ConvertToInt64(Vector128<long> value) => ConvertToInt64(value);Perhasps, we can separate these two groups
case NI_SSE2_ConvertToInt32:
case NI_SSE2_ConvertToInt64:
// get baseType from arg
case NI_SSE2_ConvertToUInt32:
case NI_SSE2_ConvertToUInt64:
// use the return type as base typebecause calling getBaseTypeOfSIMDType and info.compCompHnd->getArgClass are relatively expensive
1b76e1a to
b7d95c7
Compare
| case NI_SSE2_ConvertScalarToVector128Int64: | ||
| case NI_SSE2_ConvertScalarToVector128UInt64: | ||
| { | ||
| assert(sig->numArgs == 1); |
There was a problem hiding this comment.
Due to above I have had to handle above conversion intrinsics as HW_Category_Special
a7c3f15 to
14fab5b
Compare
|
@tannergooding There are strange test results. Could you help verify if I should report them as issues to resolve or dig in and look for the reasons myself? All OSX and Ubuntu stress jobs failed due to the following failure: While |
|
Most of these appear to be @jkotas, are you aware of any changes that went in recently that could impact this? |
|
No idea what can be causing this. |
8a1c97f to
0f0e6a4
Compare
|
@dotnet-bot test Alpine.3.6 x64 Debug Build please |
|
@dotnet-bot test OSX10.12 x64 Checked Innerloop Build and Test please |
| break; | ||
| } | ||
|
|
||
| case NI_SSE2_CompareEqualOrderedScalar: |
There was a problem hiding this comment.
It would be good to have an explicit bug that tracks merging this with the SSE implementation.
There was a problem hiding this comment.
The issue #16330 tracks this work item. Could you assign it to me within hardware intrinsics project?
src/jit/hwintrinsiccodegenxarch.cpp
Outdated
| assert(op2 != nullptr); | ||
| op2Reg = op2->gtRegNum; | ||
| instruction ins = Compiler::insOfHWIntrinsic(intrinsicID, baseType); | ||
| emit->emitIns_SIMD_R_R_R(ins, emitTypeSize(TYP_SIMD16), targetReg, op1Reg, op2Reg); |
There was a problem hiding this comment.
Can't this one go through genHWIntrinsic_R_R_RM?
src/jit/hwintrinsiccodegenxarch.cpp
Outdated
| assert(op2 != nullptr); | ||
| op2Reg = op2->gtRegNum; | ||
| instruction ins = Compiler::insOfHWIntrinsic(intrinsicID, baseType); | ||
| emit->emitIns_SIMD_R_R_R(ins, emitTypeSize(TYP_SIMD16), targetReg, op1Reg, op2Reg); |
There was a problem hiding this comment.
Same here, genHWIntrinsic_R_R_RM?
| } | ||
| else | ||
| { | ||
| emit->emitIns_R_R(INS_mov_xmm2i, emitActualTypeSize(baseType), op1Reg, targetReg); |
There was a problem hiding this comment.
Why isn't this one using insOfHWIntrinsic?
There was a problem hiding this comment.
It's due to #16322 which I plan to fix in #16329
There was a problem hiding this comment.
Not sure I follow. We are using INS_mov_xmm2i in both the TYP_LONG and TYP_ULONG case, so it shouldn't matter which base type is used in the lookup...
That is, if only TYP_LONG is being set, we can fill that column of the ConvertToUInt64 table entry with a TODO-XArch-Bug note that it is a workaround.
There was a problem hiding this comment.
if only
TYP_LONGis being set, we can fill that column of theConvertToUInt64table entry with aTODO-XArch-Bugnote that it is a workaround
It should be done for TYP_UINT as well. In principle proposed change is equivalent workaround to that used in current code as we always select INS_mov_xmm2i. The workaround currently requires one change but in case we would do it in table we should change instruction selection for multiple intrinsics having unsigned integral base types.
If you agree that current solution is simpler I will add TODO-XArch-Bug comment here.
There was a problem hiding this comment.
Looks like this problem is from baseType = JITtype2varType(sig->retType);. We should have a new sign-awared function (e.g., JITtype2BaseVarType, or other name) that only used for HW intrinsics.
There was a problem hiding this comment.
We should have a new sign-awared function (e.g., JITtype2BaseVarType, or other name) that only used for HW intrinsics.
Yes, it is tracked by #16329
There was a problem hiding this comment.
@tannergooding Could you reply to my comment #16237 (comment)
src/jit/lsraxarch.cpp
Outdated
| switch (intrinsicID) | ||
| { | ||
| case NI_SSE_CompareEqualOrderedScalar: | ||
| case NI_SSE2_CompareEqualOrderedScalar: |
There was a problem hiding this comment.
nit: It might be better to keep these grouped as NI_SSE, then NI_SSE2
2947cb9 to
8d8ea6b
Compare
|
@dotnet-bot test Tizen armel Cross Checked Innerloop Build and Test please |
src/jit/hwintrinsiccodegenxarch.cpp
Outdated
| break; | ||
| } | ||
|
|
||
| case NI_SSE2_ConvertScalarToVector128Single: |
There was a problem hiding this comment.
nit: This path could be merged with ConvertScalarToVector128Double above. The only difference is the baseType assertion.
| } | ||
| #endif // _TARGET_X86_ | ||
|
|
||
| if (baseType == TYP_STRUCT) |
There was a problem hiding this comment.
I think we could simplify this to use getArgForHWIntrinsic
There was a problem hiding this comment.
Unfortunately, getArgForHWIntrinsic does not handle this logic properly
There was a problem hiding this comment.
What about it isn't being handled properly?
There was a problem hiding this comment.
It returns TYP_STRUCT instead of TYP_FLOAT - didn't dig deeper to solve it. Plan to have closer look while working on #16329
|
Couple more pieces of feedback/questions. Will probably be good for sign-off after the next round. |
8d8ea6b to
cb767d2
Compare
|
All OSX and Ubuntu jitstress jobs timed out. |
|
test Ubuntu x64 Checked jitincompletehwintrinsic |
|
Again 2 Ubuntu jitstress jobs timeouts |
| static const wchar_t *coreCLRInstallDirectory = W("%windir%\\system32\\"); | ||
|
|
||
| // Encapsulates the environment that CoreCLR will run in, including the TPALIST | ||
| class HostEnvironment |
There was a problem hiding this comment.
This file only contains whitespace changes and isn't related to the PR. It should probably be dropped.
There was a problem hiding this comment.
Yes, it is my workflow error, as I use DebugBreak() in corerun.cpp to start debugging and my editor automatically corrects all whitespace errors. Just forgot to undo changes before commit. Will revert.
tannergooding
left a comment
There was a problem hiding this comment.
Overall this LGTM.
It would be good to undo the corerun.cpp changes, since they are unrelated and whitespace changes only.
It would also probably be good (long term) to have explicit comments as to why particular intrinsics aren't/can't be table driven.
cb767d2 to
f8fc2fb
Compare
|
test Windows_NT x64 Checked jitincompletehwintrinsic test Windows_NT x86 Checked jitincompletehwintrinsic test Ubuntu x64 Checked jitincompletehwintrinsic test OSX10.12 x64 Checked jitincompletehwintrinsic |
CarolEidt
left a comment
There was a problem hiding this comment.
LGTM. Thanks @4creators and also thanks @tannergooding for the reviews!
fiigii
left a comment
There was a problem hiding this comment.
Thank you so much for the work. I logged the PNSE issue at https://github.com/dotnet/coreclr/issues/16342.
|
test Windows_NT x86 Checked jitx86hwintrinsicnosimd |
|
test Windows_NT x64 Checked jitx86hwintrinsicnosimd |
|
Thanks @4creators |
No description provided.