-
Notifications
You must be signed in to change notification settings - Fork 5.3k
JIT: Move constant vectors to op2 from op1 #67502
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Tagging subscribers to this area: @JulieLeeMSFT Issue DetailsA quick fix for #67500 bool Test(Vector<ushort> v) => Vector<ushort>.Zero.Equals(v);codegen diff: ; Assembly listing for method Tests:Test(System.Numerics.Vector`1[UInt16]):bool:this
G_M10888_IG01:
C5F877 vzeroupper
G_M10888_IG02:
- C5FC57C0 vxorps ymm0, ymm0, ymm0
- C5FDEF02 vpxor ymm0, ymm0, ymmword ptr[rdx]
+ C5FD1002 vmovupd ymm0, ymmword ptr[rdx]
C4E27D17C0 vptest ymm0, ymm0
0F94C0 sete al
0FB6C0 movzx rax, al
G_M10888_IG03:
C5F877 vzeroupper
C3 ret
-; Total bytes of code 26
+; Total bytes of code 22
|
| // Move constant vectors from op1 to op2 for commutative and compare operations | ||
| // For now we only do it for zero vector |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This feels like something we really should just be handling in lowering when doing the relevant checks...
Why is it needed in morph instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Morph is the place where we perform canonicalizations like this one. It benefits the rest of the compiler (e. g. VN).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This feels like something we really should just be handling in
loweringwhen doing the relevant checks...Why is it needed in morph instead?
I tried that first but it was more verbose (LIR changes are always quite verbose) and yeah, just like @SingleAccretion mentioned it's a sort of an canonization that we also do for scalars in morph already
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried that first but it was more verbose (LIR changes are always quite verbose) and yeah, just like @SingleAccretion mentioned it's a sort of an canonization that we also do for scalars in morph already
I'm not sure why it would be more verbose? I would expect we just add the relevant bit right here: https://github.com/dotnet/runtime/blob/main/src/coreclr/jit/lowerxarch.cpp#L1243
That is, today we are only checking op2. We should just need to insert if (op1->IsIntegralConstVector(0)) { std::swap(op1, op2); }
just like @SingleAccretion mentioned it's a sort of an canonization that we also do for scalars in morph already
I understand why we might want to do some canonization in morph. However, this is going to miss opts and other transforms that might happen later.
We really should have something in lowering that tries to handle it as well, particularly where its trivial to do so. And particularly since lowering already needs to be concerned with commutativity, with swapping operands where appropriate, etc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
However, this is going to miss opts and other transforms that might happen later.
I think until we have cases that show this, we should not be duplicating code between morph and lower.
Morph is already called from pretty much every optimization phase that can introduce new constants into the IR (currently for vectors that can only happen in global assertion propagation).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
morph is the transform pass for HIR. lowering is basically the equivalent for LIR and does transforms necessary or appropriate for lsra/codegen.
It will itself introduce new constant nodes, swap operands, do containment checks, etc.
I'm not saying everything needs to do this. Just that both morph and lowering should be and we are likely going to be missing optimizations without it. At the very least, if its not also done in lowering, an assert should exist in lowering to help validate that people don't accidentally introduce Zero as the LHS elsewhere or in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not saying everything needs to do this. Just that both morph and lowering should be and we are likely going to be missing optimizations without it
Lowering is "just another" downstream phase. There is nothing special about it that does not apply to other post-morph phases. For example, the optimization phases are potentially much more impactful to what the end code will look like, and they rely on canonicalization just fine. There is no reason lowering should not.
If we see that optimizations are missed in some cases, we should ensure whatever is creating/transforming IR does that in a canonicality-friendly fashion.
I would be ok with an assert, however, we should be mindful of the fact that such asserts tend to fail under stress where we want non-canonical IR to exist and be tested.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lowering is "just another" downstream phase
Lowering is, by definition, a transform phase that deals with converting high level IR into the low level IR needed for generation.
As part of this, it has to deal with swapping operands and dealing with various IR concepts that almost no other phase sees. It's likewise the last phase before register allocation at which point the only thing left we can do is minor peephole optimizations, so nothing else runs after it really.
Its not just another phase IMO, its the place that codegen optimizations like this happen traditionally speaking.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Morph will never be able to get them all. It runs far too early and there is no guarantee canonicalization actually occurs.
We've had a ton of good codegen improvements in recent releases specifically because lowering is doing them and catching them. I expect that we'd end up regressing things if we removed those opts and tried moving them up to morph instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Morph will never be able to get them all.
This argument can apply to any phase, all of them are potentially "missing out".
I am saying that's a problem for which the solution is not to teach each and every phase to work with each and every IR shape, but to ensure they all see a single one, unless not possible.
Special-casing lowering is not needed unless it is un-canonicalizing the IR, which it does not do in this case.
Morph will never be able to get them all.
True. But we're not after perfection, we should just ensure the cases we care about work as expected. For example: we do not care if a stress mode makes us swap operands backwards and now PTEST isn't recognized.
Practically, for this change, I do not see a possibility that we'd miss canonicalization, for the reason I mentioned above (the only phase that can discover new vector constants after morph is global assertion propagation and that calls morph).
| { | ||
| GenTreeHWIntrinsic* hw = multiOp->AsHWIntrinsic(); | ||
|
|
||
| // Move constant vectors from op1 to op2 for commutative and compare operations |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be better to move this inside fgOptimizeHWIntrinsic (along with the XOR transform).
|
@tannergooding can we merge it as is for now - it's 4 lines change and it has some diffs already 🙂 |
|
I don't see any blockers. I just still think that we should be also handling this in lowering (and that could be handled/investigated in a separate PR). |
|
Just realized the reason this couldn't be merged is because no one had ever given the green check. Didn't mean to block this with my feedback. This should probably rerun CI before getting merged |
A quick fix for #67500
codegen diff:
; Assembly listing for method Tests:Test(System.Numerics.Vector`1[UInt16]):bool:this G_M10888_IG01: C5F877 vzeroupper G_M10888_IG02: - C5FC57C0 vxorps ymm0, ymm0, ymm0 - C5FDEF02 vpxor ymm0, ymm0, ymmword ptr[rdx] + C5FD1002 vmovupd ymm0, ymmword ptr[rdx] C4E27D17C0 vptest ymm0, ymm0 0F94C0 sete al 0FB6C0 movzx rax, al G_M10888_IG03: C5F877 vzeroupper C3 ret -; Total bytes of code 26 +; Total bytes of code 22Diffs