ARM64-SVE: Add Not, InsertIntoShiftedVector#103725
ARM64-SVE: Add Not, InsertIntoShiftedVector#103725amanasifkhalid merged 4 commits intodotnet:mainfrom
Conversation
|
Note regarding the |
1 similar comment
|
Note regarding the |
|
Tagging subscribers to this area: @dotnet/area-system-runtime-intrinsics |
kunalspathak
left a comment
There was a problem hiding this comment.
checks if the instruction is scalable before checking if it has RMW semantics (which this one does), and thus ends up calling the wrong emitIns* method.
Can you confirm the exact place where this is happening?
| /// INSR Ztied1.D, Xop2 | ||
| /// INSR Ztied1.D, Dop2 | ||
| /// </summary> | ||
| public static unsafe Vector<double> InsertIntoShiftedVector(Vector<double> left, double right) { throw new PlatformNotSupportedException(); } |
There was a problem hiding this comment.
This API, as per the docs is implementing both INSR - SIMD&FP and INSR - scalar, but it is not clear to me, how we decide which one to pick. @a74nh - any idea?
| if (targetReg != op1Reg) | ||
| { | ||
| assert(targetReg != op2Reg); | ||
| GetEmitter()->emitIns_Mov(INS_mov, emitTypeSize(node), targetReg, op1Reg, |
There was a problem hiding this comment.
Does this work with both op1Reg being gpr or SIMD/FP?
There was a problem hiding this comment.
I think so, though I'm struggling to get the JIT to use a gpr for op1Reg under the stress modes -- I'm only ever getting the "easy" case, where when op1Reg and targetReg differ, op1Reg is already a vector register, so we're moving from a vector reg to a vector reg. Since the first argument in Sve.InsertIntoShiftedVector<T> is of type Vector<T>, we'd expect op1Reg to always be a vector register, right? I could add an assert here clarify this. (Though looking at emitIns_Mov, it does have a path for emitting mov instructions from a gpr to a vector register.)
There was a problem hiding this comment.
Actually I meant that for op2Reg. Sorry. So ideally, where you have double or float as 2nd argument, we should have op2Reg as SIMD/floating point and otherwise should be gpr. Can you verify that please? For op1Reg it should always be scalable register and we already assert that in emitter.
There was a problem hiding this comment.
No worries, I've added an assert to check this. Stress tests for unoptimized and optimized tests are still passing.
There was a problem hiding this comment.
Do you mind sharing small section of disassembly for both the categories?
There was a problem hiding this comment.
Not at all. Here's a snippet from the double tests:
ldr q16, [x0]
ldr d17, [fp, #0x30] // [V01 loc0]
insr z16.d, d17
And from the uint tests:
ldr q16, [x0]
ldr w0, [fp, #0x34] // [V01 loc0]
insr z16.s, w0
Sure; right here. Notice how in the if-else statement, we check if the instruction is scalable before we check if it has RMW semantics. |
|
I made a small change to |
|
@kunalspathak @a74nh does this PR need anything else? |
Your change looks good except that I realized that after #103620, we do not need condition anymore for below code because both branches are doing same thing. if (HWIntrinsicInfo::IsExplicitMaskedOperation(intrin.id))
{
GetEmitter()->emitIns_R_R_R(ins, emitSize, targetReg, op1Reg, op2Reg, opt);
}
else
{
// This generates an unpredicated version
// Implicitly predicated should be taken care above `intrin.op2->IsEmbMaskOp()`
GetEmitter()->emitIns_R_R_R(ins, emitSize, targetReg, op1Reg, op2Reg, opt);
} |
Got it, I'll fix that in my next PR. |
|
/ba-g Build Analysis blocked by timeouts |
Part of #99957.
InsertIntoShiftedVectorrequires a new test template,_SveVecAndScalarOpTest.template. This template is a clone of_SveMasklessUnaryOpTestTemplate.template, except that a second argument of typeTis passed toSve.InsertIntoShiftedVector<T>and handled elsewhere accordingly. I could've implementedInsertIntoShiftedVectorwithout any special codegen, but the path for handling unpredicated instructions with two operands inCodeGen::genHWIntrinsicchecks if the instruction is scalable before checking if it has RMW semantics (which this one does), and thus ends up calling the wrongemitIns*method. Existing instructions seem to be dependent on this order of checking, so I cannot flip the order of the if-elseif-else statement without breaking existing tests, and I wasn't sure if duplicating the RMW logic on the scalable path would be ideal if it's only for one instruction -- if we prefer to do that now, I can get rid of the special path for this intrinsic, and handle the scalable RMW case on the general path.Not tests:
InsertIntoShiftedVector tests:
cc @dotnet/arm64-contrib