Skip to content

Conversation

@EgorBo
Copy link
Member

@EgorBo EgorBo commented Apr 3, 2022

A quick fix for #67500

bool Test(Vector<ushort> v) => Vector<ushort>.Zero.Equals(v);

codegen diff:

; Assembly listing for method Tests:Test(System.Numerics.Vector`1[UInt16]):bool:this
G_M10888_IG01:  
       C5F877               vzeroupper 
G_M10888_IG02:   
-      C5FC57C0             vxorps   ymm0, ymm0, ymm0
-      C5FDEF02             vpxor    ymm0, ymm0, ymmword ptr[rdx]
+      C5FD1002             vmovupd  ymm0, ymmword ptr[rdx]
       C4E27D17C0           vptest   ymm0, ymm0
       0F94C0               sete     al
       0FB6C0               movzx    rax, al
G_M10888_IG03:
       C5F877               vzeroupper 
       C3                   ret     
-; Total bytes of code 26
+; Total bytes of code 22

Diffs

@ghost ghost assigned EgorBo Apr 3, 2022
@ghost ghost added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Apr 3, 2022
@ghost
Copy link

ghost commented Apr 3, 2022

Tagging subscribers to this area: @JulieLeeMSFT
See info in area-owners.md if you want to be subscribed.

Issue Details

A quick fix for #67500

bool Test(Vector<ushort> v) => Vector<ushort>.Zero.Equals(v);

codegen diff:

; Assembly listing for method Tests:Test(System.Numerics.Vector`1[UInt16]):bool:this
G_M10888_IG01:  
       C5F877               vzeroupper 
G_M10888_IG02:   
-      C5FC57C0             vxorps   ymm0, ymm0, ymm0
-      C5FDEF02             vpxor    ymm0, ymm0, ymmword ptr[rdx]
+      C5FD1002             vmovupd  ymm0, ymmword ptr[rdx]
       C4E27D17C0           vptest   ymm0, ymm0
       0F94C0               sete     al
       0FB6C0               movzx    rax, al
G_M10888_IG03:
       C5F877               vzeroupper 
       C3                   ret     
-; Total bytes of code 26
+; Total bytes of code 22
Author: EgorBo
Assignees: EgorBo
Labels:

area-CodeGen-coreclr

Milestone: -

Comment on lines +14538 to +14539
// Move constant vectors from op1 to op2 for commutative and compare operations
// For now we only do it for zero vector
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels like something we really should just be handling in lowering when doing the relevant checks...

Why is it needed in morph instead?

Copy link
Contributor

@SingleAccretion SingleAccretion Apr 3, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Morph is the place where we perform canonicalizations like this one. It benefits the rest of the compiler (e. g. VN).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels like something we really should just be handling in lowering when doing the relevant checks...

Why is it needed in morph instead?

I tried that first but it was more verbose (LIR changes are always quite verbose) and yeah, just like @SingleAccretion mentioned it's a sort of an canonization that we also do for scalars in morph already

Copy link
Member

@tannergooding tannergooding Apr 3, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried that first but it was more verbose (LIR changes are always quite verbose) and yeah, just like @SingleAccretion mentioned it's a sort of an canonization that we also do for scalars in morph already

I'm not sure why it would be more verbose? I would expect we just add the relevant bit right here: https://github.com/dotnet/runtime/blob/main/src/coreclr/jit/lowerxarch.cpp#L1243

That is, today we are only checking op2. We should just need to insert if (op1->IsIntegralConstVector(0)) { std::swap(op1, op2); }

just like @SingleAccretion mentioned it's a sort of an canonization that we also do for scalars in morph already

I understand why we might want to do some canonization in morph. However, this is going to miss opts and other transforms that might happen later.

We really should have something in lowering that tries to handle it as well, particularly where its trivial to do so. And particularly since lowering already needs to be concerned with commutativity, with swapping operands where appropriate, etc

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However, this is going to miss opts and other transforms that might happen later.

I think until we have cases that show this, we should not be duplicating code between morph and lower.

Morph is already called from pretty much every optimization phase that can introduce new constants into the IR (currently for vectors that can only happen in global assertion propagation).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

morph is the transform pass for HIR. lowering is basically the equivalent for LIR and does transforms necessary or appropriate for lsra/codegen.

It will itself introduce new constant nodes, swap operands, do containment checks, etc.

I'm not saying everything needs to do this. Just that both morph and lowering should be and we are likely going to be missing optimizations without it. At the very least, if its not also done in lowering, an assert should exist in lowering to help validate that people don't accidentally introduce Zero as the LHS elsewhere or in the future.

Copy link
Contributor

@SingleAccretion SingleAccretion Apr 3, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not saying everything needs to do this. Just that both morph and lowering should be and we are likely going to be missing optimizations without it

Lowering is "just another" downstream phase. There is nothing special about it that does not apply to other post-morph phases. For example, the optimization phases are potentially much more impactful to what the end code will look like, and they rely on canonicalization just fine. There is no reason lowering should not.

If we see that optimizations are missed in some cases, we should ensure whatever is creating/transforming IR does that in a canonicality-friendly fashion.

I would be ok with an assert, however, we should be mindful of the fact that such asserts tend to fail under stress where we want non-canonical IR to exist and be tested.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lowering is "just another" downstream phase

Lowering is, by definition, a transform phase that deals with converting high level IR into the low level IR needed for generation.

As part of this, it has to deal with swapping operands and dealing with various IR concepts that almost no other phase sees. It's likewise the last phase before register allocation at which point the only thing left we can do is minor peephole optimizations, so nothing else runs after it really.

Its not just another phase IMO, its the place that codegen optimizations like this happen traditionally speaking.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Morph will never be able to get them all. It runs far too early and there is no guarantee canonicalization actually occurs.

We've had a ton of good codegen improvements in recent releases specifically because lowering is doing them and catching them. I expect that we'd end up regressing things if we removed those opts and tried moving them up to morph instead.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Morph will never be able to get them all.

This argument can apply to any phase, all of them are potentially "missing out".

I am saying that's a problem for which the solution is not to teach each and every phase to work with each and every IR shape, but to ensure they all see a single one, unless not possible.

Special-casing lowering is not needed unless it is un-canonicalizing the IR, which it does not do in this case.

Morph will never be able to get them all.

True. But we're not after perfection, we should just ensure the cases we care about work as expected. For example: we do not care if a stress mode makes us swap operands backwards and now PTEST isn't recognized.

Practically, for this change, I do not see a possibility that we'd miss canonicalization, for the reason I mentioned above (the only phase that can discover new vector constants after morph is global assertion propagation and that calls morph).

{
GenTreeHWIntrinsic* hw = multiOp->AsHWIntrinsic();

// Move constant vectors from op1 to op2 for commutative and compare operations
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be better to move this inside fgOptimizeHWIntrinsic (along with the XOR transform).

@EgorBo
Copy link
Member Author

EgorBo commented Apr 13, 2022

@tannergooding can we merge it as is for now - it's 4 lines change and it has some diffs already 🙂

@tannergooding
Copy link
Member

tannergooding commented Apr 13, 2022

I don't see any blockers. I just still think that we should be also handling this in lowering (and that could be handled/investigated in a separate PR).

@tannergooding
Copy link
Member

Just realized the reason this couldn't be merged is because no one had ever given the green check. Didn't mean to block this with my feedback.

This should probably rerun CI before getting merged

@EgorBo EgorBo merged commit 3d63499 into dotnet:main May 30, 2022
@ghost ghost locked as resolved and limited conversation to collaborators Jun 29, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants