Skip to content
This repository was archived by the owner on Jan 23, 2023. It is now read-only.

Implement the SSE2 MaskMove intrinsic#16669

Merged
CarolEidt merged 2 commits intodotnet:masterfrom
tannergooding:hwintrin-sse2-maskmove
Mar 2, 2018
Merged

Implement the SSE2 MaskMove intrinsic#16669
CarolEidt merged 2 commits intodotnet:masterfrom
tannergooding:hwintrin-sse2-maskmove

Conversation

@tannergooding
Copy link
Member

No description provided.

{
// SSE2 MaskMove hardcodes the destination (op3) in DI/EDI/RDI
LocationInfoListNode* op3Info = useList.Begin()->Next()->Next();
op3Info->info.setSrcCandidates(this, RBM_EDI);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@CarolEidt, what is the proper way to do this? The LSRA doesn't seem to respect this (or the similar pattern used by NI_SSE41_BlendVariable for XMM0 below)...

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The codgen should insert register copy if op3Reg is not EDI.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fiigii, I don't think that is quite correct (and would like confirmation if it is).

I would think that the register allocator needs to explicitly give us the required register, otherwise we would be consuming it without its knowledge and potentially overwriting whatever value the JIT thinks is in there.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like this comment on checkConflictingDefUse is relevant: https://github.com/dotnet/coreclr/blob/master/src/jit/lsrabuild.cpp#L365

It would be helpful if there was a "hack" we could do for this to enforce that the required register is actually used.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Man, this was discussed to death in the past. The RA will allocate the required register for you but you have to move the value to the required register if it's not already there. There's little reason for the RA to automatically insert a copy node for this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know of a good way to indicate that on the node without adding extra fields to GenTree

What about tracking it as an internal register, that seems to have all the "necessary" hookups already...

Users could then validate that the desired register is actually allocated to the node, as expected:

if (op3Reg != REG_EDI)
{
    assert((node->gtRsvdRegs & RBM_EDI) != 0);
    emit->emitIns_R_R(INS_mov, EA_PTRSIZE, REG_EDI, op3Reg);
}

emit->emitIns_R_R(ins, simdSize, op1Reg, op2Reg);

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you need to check? You asked for register EDI. Provided that LSRA isn't broken you'll get that register. It's as simple as that. And if LSRA is broken, well, that isn't your problem.

Maybe its just me being a paranoid coder, but I like to be able to actually assert that my assumptions are correct. I don't want to blindly assume that REG_EDI is available for use, I want to assert that it is actually available for use before I use it.

This is one of the cases where, if the assumption ever did break, we don't have a good way to catch it outside of tests that fail in possibly obscure manners. Having the ability to assert that it is correct makes it infinitely easier to determine why something is failing because it is clearly called out.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would require special handling in all nodes that require internal registers, as they would have to explicitly exclude that register. Honestly, this has just not been an issue so far so it seems unnecessary. You could assert that it is not in rsMaskVars (I think that's the name), which would at least ensure that no lclVar is in that register.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't want to blindly assume that REG_EDI is available for use, I want to assert that it is actually available for use before I use it.

I don't know, I'd say that it's the norm to assume that another component of the system does its job. I like a few asserts here and there myself, but not because I don't trust other components but because it tends to add more clarity to the code.

This JIT's IR is rather convoluted, with all sorts of corner cases and odd balls, partly a result of it being pushed beyond its original design and partly a result of it trying to be efficient. Asserts help a bit but a better solution would be to have IR validation checks that run after each phase (that modifies IR). There are some but they're not sufficient. I don't know if LSRA has anything like this.

This is one of the cases where, if the assumption ever did break, we don't have a good way to catch it outside of tests that fail in possibly obscure manners.

Welcome to the optimization and code generation world. There are an infinite number of inputs and all kinds of interactions between various compiler transforms. No matter how many tests you write, no matter how many asserts and checks you have in code you won't be able to catch all bugs. The JIT has even this "stress mode" thing that's intended to catch more bugs by triggering scenarios that are more difficult to trigger using normal tests. Yet bugs slip through and most of the time they happen because someone simply did not think that some particular scenario can happen and thus there are neither tests nor asserts for it.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a better solution would be to have IR validation checks that run after each phase (that modifies IR). There are some but they're not sufficient. I don't know if LSRA has anything like this.

LSRA does have a validation phase (LinearScan::verifyFinalAllocation()), though I'm sure it could be improved.

{
// SSE2 MaskMove hardcodes the destination (op3) in DI/EDI/RDI
LocationInfoListNode* op3Info = useList.Begin()->Next()->Next();
op3Info->info.setSrcCandidates(this, RBM_EDI);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you look at the allocation table in the jitdump, you should be able to see that RDI isn't occupied at the time of the use. You can search for "final allocation" to see what the register allocator thinks is in each register at each virtual cycle (LsraLocation).

assert(targetReg == REG_NA);

// SSE2 MaskMove hardcodes the destination (op3) in DI/EDI/RDI
assert((op3Reg & RBM_EDI) != 0);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even if op3Reg was guaranteed to be in the correct register, this would not be correct, since op3Reg is a regNumber and RBM_EDI is a regMaskTP. In any case, you need to check to see if it is REG_EDI and if not generate a move.

@tannergooding tannergooding changed the title [WIP] Implement the SSE2 MaskMove intrinsic Implement the SSE2 MaskMove intrinsic Mar 1, 2018
@tannergooding
Copy link
Member Author

test Windows_NT x64 Checked jitincompletehwintrinsic
test Windows_NT x64 Checked jitx86hwintrinsicnoavx
test Windows_NT x64 Checked jitx86hwintrinsicnoavx2
test Windows_NT x64 Checked jitx86hwintrinsicnosimd
test Windows_NT x64 Checked jitnox86hwintrinsic

test Windows_NT x86 Checked jitincompletehwintrinsic
test Windows_NT x86 Checked jitx86hwintrinsicnoavx
test Windows_NT x86 Checked jitx86hwintrinsicnoavx2
test Windows_NT x86 Checked jitx86hwintrinsicnosimd
test Windows_NT x86 Checked jitnox86hwintrinsic

test Ubuntu x64 Checked jitincompletehwintrinsic
test Ubuntu x64 Checked jitx86hwintrinsicnoavx
test Ubuntu x64 Checked jitx86hwintrinsicnoavx2
test Ubuntu x64 Checked jitx86hwintrinsicnosimd
test Ubuntu x64 Checked jitnox86hwintrinsic

test OSX10.12 x64 Checked jitincompletehwintrinsic
test OSX10.12 x64 Checked jitx86hwintrinsicnoavx
test OSX10.12 x64 Checked jitx86hwintrinsicnoavx2
test OSX10.12 x64 Checked jitx86hwintrinsicnosimd
test OSX10.12 x64 Checked jitnox86hwintrinsic

@tannergooding
Copy link
Member Author

@CarolEidt. I believe all feedback has been addressed, could you give another pass when you have the chance?

Copy link

@CarolEidt CarolEidt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - thanks!

@CarolEidt CarolEidt merged commit 9302df9 into dotnet:master Mar 2, 2018
@tannergooding tannergooding deleted the hwintrin-sse2-maskmove branch May 30, 2018 04:16
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants