Skip to content

Conversation

@SingleAccretion
Copy link
Contributor

@SingleAccretion SingleAccretion commented Aug 26, 2021

Replace ChangeOperConst with BashToConst.

The nature of bashing to a constant is that the code doing it practically always wants to set the constant value and the type so it makes sense to do all three at once - it also enables some more stringent asserts.

But mostly this method is for convenience.

Unfortunately, the second commit causes diffs, because now containment is enabled where it was not previously (for float constants with the upper bit set). Even more unfortunately, this causes the following regression in System.Numerics.Tests.Perf_Quaternion:ConjugateBenchmark (and two sister benchmarks):

-       mov      eax, 0xD1FFAB1E
-       mov      dword ptr [rdx], eax
-       mov      dword ptr [rdx+4], eax
-       mov      dword ptr [rdx+8], eax
+       mov      dword ptr [rdx], 0xD1FFAB1E
+       mov      dword ptr [rdx+4], 0xD1FFAB1E
+       mov      dword ptr [rdx+8], 0xD1FFAB1E

I have built the benchmarks locally and can measure ~20-40% increase in time, though the accuracy of such measurements is highly questionable as the benchmarks "take" 0.1 ns, with errors of the same magnitude. It is a CQ regression, so, given fixing it properly is out of scope for this PR, I see a few ways to address the problem:

  1. Abandon the change.
  2. Abandon the new assert.
  3. Quirk the lowering transform to retain the previous behavior.

Without the second commit, there is one other source of diffs, which has to do with the fact that the old code in assertion propagation for relops left nodes with small types unchanged, and gtFoldExprConst fails to fold things like EQ(bool 0, int 0). The new code (correctly) retypes the node to TYP_INT, thus enabling the folding. I have checked that using "actual" types in gtFoldExprConst does not yield much beyond what this PR enables already.

Diffs with the second commit: win-x64, win-arm64, win-x86, linux-x64.

@ghost ghost added community-contribution Indicates that the PR has been added by a community member area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI labels Aug 26, 2021
@ghost
Copy link

ghost commented Aug 26, 2021

Tagging subscribers to this area: @JulieLeeMSFT
See info in area-owners.md if you want to be subscribed.

Issue Details

Replace ChangeOperConst with BashToConst.

The nature of bashing to a constant is that the code doing it practically always wants to set the constant value and the type so it makes sense to do all three at once - it also enables some more stringent asserts.

But mostly this method is for convenience.

Unfortunately, the second commit causes diffs, because now containment is enabled where it was not previously (for float constants with the upper bit set). Even more unfortunately, this causes the following regression in System.Numerics.Tests.Perf_Quaternion:ConjugateBenchmark (and two sister benchmarks):

-       mov      eax, 0xD1FFAB1E
-       mov      dword ptr [rdx], eax
-       mov      dword ptr [rdx+4], eax
-       mov      dword ptr [rdx+8], eax
+       mov      dword ptr [rdx], 0xD1FFAB1E
+       mov      dword ptr [rdx+4], 0xD1FFAB1E
+       mov      dword ptr [rdx+8], 0xD1FFAB1E

I have built the benchmarks locally and can measure ~20-40% increase in time, though the accuracy of such measurements is highly questionable as the benchmarks "take" 0.1 ns, with errors of the same magnitude. It is a CQ regression, so, given fixing it properly is out of scope for this PR, I see a few ways to address the problem:

  1. Abandon the change.
  2. Abandon the new assert.
  3. Quirk the lowering transform to retain the previous behavior.

Without the second commit, there is one other source of diffs, which has to do with the fact that the old code in assertion propagation for relops left nodes with small types unchanged, and gtFoldExprConst fails to fold things like EQ(bool 0, int 0). The new code (correctly) retypes the node to TYP_INT, thus enabling the folding. I have checked that using "actual" types in gtFoldExprConst does not yield much beyond what this PR enables already.

Diffs with the second commit: win-x64.

Author: SingleAccretion
Assignees: -
Labels:

area-CodeGen-coreclr, community-contribution

Milestone: -

@SingleAccretion SingleAccretion force-pushed the Universal-ChangeOperConst branch 4 times, most recently from b2d5353 to 6ec74c4 Compare August 26, 2021 21:46
@SingleAccretion SingleAccretion marked this pull request as ready for review August 27, 2021 09:34
The nature of bashing to a constant is that
the code doing it practically always wants to
set the constant value and the type so it makes
sense to do all three at once - it also enables
some more stringent asserts.

But mostly this method is for convenience.
It is a minor "bug" in lowering where zero-extension was used for
a TYP_INT constant instead of sign extension.
@JulieLeeMSFT
Copy link
Member

cc @dotnet/jit-contrib @AndyAyersMS please review the community PR.

@AndyAyersMS
Copy link
Member

For the regression, seems like we ought to extend #44419 to cover float literals as well?

@SingleAccretion
Copy link
Contributor Author

The tricky part may be that the encoding for floating point operations is larger, so we'll still not be able to get "the best" result.

@AndyAyersMS
Copy link
Member

The tricky part may be that the encoding for floating point operations is larger,

Can you elaborate? I'm not sure I follow.

If there is a tradeoff, presumably the CSE cost heuristics would try and balance this somehow?

@SingleAccretion
Copy link
Contributor Author

Can you elaborate? I'm not sure I follow.

Sure. The existing codegen is "ideal" since it uses an integer register for the stores. Were we to CSE the floating point constant, we'd end up using the floating point stores (because we don't want to play tricks with types in the optimizer - or do we...?), which are a bit bigger. It is still better than not using any register at all for the stores though.

If there is a tradeoff, presumably the CSE cost heuristics would try and balance this somehow?

I would say it is a bit complicated case to handle "cleanly"... Basically the ideal outcome requires us to:

  1. See how many uses a constant float has.
  2. If it has "enough" uses where the type doesn't matter (in indirect stores say), retype them to integers.
  3. CSE the integer constant.

I am not convinced the gains would justify the complexity in this case. I was also thinking of some way to handle this later, in lowering, by keeping tabs on stores as we lower the range and re-evaluating containment if we see a situation where LSRA could reuse a register with a constant (would also open the door for some coalescing I suppose). But it requires more careful design and I worry about the TP implications.

@AndyAyersMS
Copy link
Member

These benchmarks are perhaps not all that valuable. @tannergooding any way we can measure something more meaningful?

newplot (70)

Copy link
Member

@AndyAyersMS AndyAyersMS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the change, and since we were already containing some float constants it seems odd to hold back here because of a few regressions and not allow containing them all.

@AndyAyersMS AndyAyersMS merged commit 6901262 into dotnet:main Sep 13, 2021
@SingleAccretion SingleAccretion deleted the Universal-ChangeOperConst branch September 14, 2021 18:55
@ghost ghost locked as resolved and limited conversation to collaborators Nov 3, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI community-contribution Indicates that the PR has been added by a community member

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants