Implement the SSE2 MaskMove intrinsic#16669
Implement the SSE2 MaskMove intrinsic#16669CarolEidt merged 2 commits intodotnet:masterfrom tannergooding:hwintrin-sse2-maskmove
Conversation
| { | ||
| // SSE2 MaskMove hardcodes the destination (op3) in DI/EDI/RDI | ||
| LocationInfoListNode* op3Info = useList.Begin()->Next()->Next(); | ||
| op3Info->info.setSrcCandidates(this, RBM_EDI); |
There was a problem hiding this comment.
@CarolEidt, what is the proper way to do this? The LSRA doesn't seem to respect this (or the similar pattern used by NI_SSE41_BlendVariable for XMM0 below)...
There was a problem hiding this comment.
The codgen should insert register copy if op3Reg is not EDI.
There was a problem hiding this comment.
@fiigii, I don't think that is quite correct (and would like confirmation if it is).
I would think that the register allocator needs to explicitly give us the required register, otherwise we would be consuming it without its knowledge and potentially overwriting whatever value the JIT thinks is in there.
There was a problem hiding this comment.
Looks like this comment on checkConflictingDefUse is relevant: https://github.com/dotnet/coreclr/blob/master/src/jit/lsrabuild.cpp#L365
It would be helpful if there was a "hack" we could do for this to enforce that the required register is actually used.
There was a problem hiding this comment.
Man, this was discussed to death in the past. The RA will allocate the required register for you but you have to move the value to the required register if it's not already there. There's little reason for the RA to automatically insert a copy node for this.
There was a problem hiding this comment.
I don't know of a good way to indicate that on the node without adding extra fields to GenTree
What about tracking it as an internal register, that seems to have all the "necessary" hookups already...
Users could then validate that the desired register is actually allocated to the node, as expected:
if (op3Reg != REG_EDI)
{
assert((node->gtRsvdRegs & RBM_EDI) != 0);
emit->emitIns_R_R(INS_mov, EA_PTRSIZE, REG_EDI, op3Reg);
}
emit->emitIns_R_R(ins, simdSize, op1Reg, op2Reg);
There was a problem hiding this comment.
Why do you need to check? You asked for register EDI. Provided that LSRA isn't broken you'll get that register. It's as simple as that. And if LSRA is broken, well, that isn't your problem.
Maybe its just me being a paranoid coder, but I like to be able to actually assert that my assumptions are correct. I don't want to blindly assume that REG_EDI is available for use, I want to assert that it is actually available for use before I use it.
This is one of the cases where, if the assumption ever did break, we don't have a good way to catch it outside of tests that fail in possibly obscure manners. Having the ability to assert that it is correct makes it infinitely easier to determine why something is failing because it is clearly called out.
There was a problem hiding this comment.
That would require special handling in all nodes that require internal registers, as they would have to explicitly exclude that register. Honestly, this has just not been an issue so far so it seems unnecessary. You could assert that it is not in rsMaskVars (I think that's the name), which would at least ensure that no lclVar is in that register.
There was a problem hiding this comment.
I don't want to blindly assume that REG_EDI is available for use, I want to assert that it is actually available for use before I use it.
I don't know, I'd say that it's the norm to assume that another component of the system does its job. I like a few asserts here and there myself, but not because I don't trust other components but because it tends to add more clarity to the code.
This JIT's IR is rather convoluted, with all sorts of corner cases and odd balls, partly a result of it being pushed beyond its original design and partly a result of it trying to be efficient. Asserts help a bit but a better solution would be to have IR validation checks that run after each phase (that modifies IR). There are some but they're not sufficient. I don't know if LSRA has anything like this.
This is one of the cases where, if the assumption ever did break, we don't have a good way to catch it outside of tests that fail in possibly obscure manners.
Welcome to the optimization and code generation world. There are an infinite number of inputs and all kinds of interactions between various compiler transforms. No matter how many tests you write, no matter how many asserts and checks you have in code you won't be able to catch all bugs. The JIT has even this "stress mode" thing that's intended to catch more bugs by triggering scenarios that are more difficult to trigger using normal tests. Yet bugs slip through and most of the time they happen because someone simply did not think that some particular scenario can happen and thus there are neither tests nor asserts for it.
There was a problem hiding this comment.
a better solution would be to have IR validation checks that run after each phase (that modifies IR). There are some but they're not sufficient. I don't know if LSRA has anything like this.
LSRA does have a validation phase (LinearScan::verifyFinalAllocation()), though I'm sure it could be improved.
| { | ||
| // SSE2 MaskMove hardcodes the destination (op3) in DI/EDI/RDI | ||
| LocationInfoListNode* op3Info = useList.Begin()->Next()->Next(); | ||
| op3Info->info.setSrcCandidates(this, RBM_EDI); |
There was a problem hiding this comment.
If you look at the allocation table in the jitdump, you should be able to see that RDI isn't occupied at the time of the use. You can search for "final allocation" to see what the register allocator thinks is in each register at each virtual cycle (LsraLocation).
src/jit/hwintrinsiccodegenxarch.cpp
Outdated
| assert(targetReg == REG_NA); | ||
|
|
||
| // SSE2 MaskMove hardcodes the destination (op3) in DI/EDI/RDI | ||
| assert((op3Reg & RBM_EDI) != 0); |
There was a problem hiding this comment.
Even if op3Reg was guaranteed to be in the correct register, this would not be correct, since op3Reg is a regNumber and RBM_EDI is a regMaskTP. In any case, you need to check to see if it is REG_EDI and if not generate a move.
|
test Windows_NT x64 Checked jitincompletehwintrinsic test Windows_NT x86 Checked jitincompletehwintrinsic test Ubuntu x64 Checked jitincompletehwintrinsic test OSX10.12 x64 Checked jitincompletehwintrinsic |
|
@CarolEidt. I believe all feedback has been addressed, could you give another pass when you have the chance? |
No description provided.