-
Notifications
You must be signed in to change notification settings - Fork 5.3k
Have CSE correctly track float/simd and mask registers for their own enregistration #117118
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
3352e17 to
6707408
Compare
ee5c6bb to
8cde536
Compare
|
CC. @dotnet/jit-contrib. This should be ready for review. There's a minor (+0.02% for most, but up to +0.04% on x64 Windows) throughput hit, but this gets us more into a correct shape for CSE considerations and removes some historical quirks where we were eating the integer CSE budget due to floating-point and mask types being in the method. So I think it's worth taking and then doing some follow-up targeted fixes (likely around not CSE'ing certain containable constants) where possible. |
|
@dotnet/samsung Could you please take a look? These changes may be related to riscv64. |
RISC-V Release-CLR-QEMU: 9086 / 9116 (99.67%)report.xml, report.md, failures.xml, testclr_details.tar.zst RISC-V Release-CLR-VF2: 9087 / 9117 (99.67%)report.xml, report.md, failures.xml, testclr_details.tar.zst RISC-V Release-FX-QEMU: 335773 / 336836 (99.68%)report.xml, report.md, failures.xml, testclr_details.tar.zst RISC-V Release-FX-VF2: 499778 / 501520 (99.65%)report.xml, report.md, failures.xml, testclr_details.tar.zst Build information and commandsGIT: |
|
@jakobbotsch I requested your review since you touch CSE most frequently. |
jakobbotsch
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks reasonable to me. Pinged @AndyAyersMS on a question related to the CSE ML experiment.
bed117e is being scheduled for building and testingGIT: |
This updates CSE to correctly track float/simd and mask registers for their own enregistration.
Previously the logic attempted to skip the handling of these types or even misclassified simd/mask under the integer enregistration limits. This could lead to weird diffs and register selection in some cases due to suboptimal CSE considerations.
As is typical with these changes, it's "difficult" to assess whether its an improvement or not just by looking at the diffs, because the CSE decisions impact other optimizations, register selection, etc. The diffs are primarily impacting methods that use floating-point or SIMD and generally look positive overall.
In general, we're ending up with fewer things marked as
CSE moderateand instead more being marked asCSE aggressiveinstead. This is because we're no longer eating up the CSE budget by classifying everything as integer. This leads to many cases with more total CSEs:This in turn shifts some temp numbers and impacts everything else in the method. Most of which look to be positive changes, such as:
Which is because we no longer reload some constants and do other computations unnecessarily:
; gcrRegs +[x19] ldr d16, [@RWD00] ldr d17, [@RWD08] - fmin d16, d16, d17 - str d16, [fp, #0x30] // [V08 cse0] - str d16, [x19, #0x08] - ldr d17, [@RWD16] - ldr d18, [@RWD24] - fmin d17, d17, d18 - str d17, [fp, #0x28] // [V09 cse1] - str d17, [x19, #0x10] - ldr d18, [@RWD00] - ldr d19, [@RWD08] - fmax d18, d18, d19 - str d18, [fp, #0x20] // [V10 cse2] - str d18, [x19, #0x18] + fmin d18, d16, d17 + str d18, [fp, #0x30] // [V08 cse0] + str d18, [x19, #0x08] ldr d19, [@RWD16] ldr d20, [@RWD24] - fmax d19, d19, d20 - str d19, [fp, #0x18] // [V11 cse3] - str d19, [x19, #0x20] + fmin d21, d19, d20 + str d21, [fp, #0x28] // [V09 cse1] + str d21, [x19, #0x10] + fmax d16, d16, d17 + str d16, [fp, #0x20] // [V10 cse2] + str d16, [x19, #0x18] + fmax d17, d19, d20 + str d17, [fp, #0x18] // [V11 cse3] + str d17, [x19, #0x20]There's some tuning to be done, however, because there's some places where things like zero constants are being CSE'd where it'd be better to have
morphmark against that so we can still optimize accordingly (which should be a simple fix):mov w3, wzr ;; size=16 bbWeight=0.50 PerfScore 1.00 G_M48160_IG04: ; bbWeight=3.96, gcrefRegs=0000 {}, byrefRegs=0005 {x0 x2}, byref, isz - ldr d18, [x0, w3, UXTW #3] - ldr d19, [x2, w3, UXTW #3] + mov w4, w3 + ldr d18, [x0, x4, LSL #3] + ldr d19, [x2, x4, LSL #3]There are many similar cases on x64 and with some similar fine tuning that could be done.