Fix SQRT, RCP, RSQRT PerfScores #50813

pentp · 2021-04-06T21:42:52Z

These were incorrectly grouped and thus sqrt PerfScores were way too low.

tannergooding · 2021-04-07T23:08:19Z

src/coreclr/jit/emitxarch.cpp

-            result.insThroughput = PERFSCORE_THROUGHPUT_3C;
-            result.insLatency += PERFSCORE_LATENCY_12C;
+            result.insThroughput = PERFSCORE_THROUGHPUT_1C;
+            result.insLatency += PERFSCORE_LATENCY_4C;


On https://uops.info/table.html, I see the following for Skylake:

sqrtsd 64, TP 1.00 / 4.50, LAT [≤13.0;≤19]
sqrtpd 128, TP 1.00 / 4.50, LAT [≤13.0;≤19]
sqrtpd 256, TP 1.00 / 9.00, LAT [≤13.0;≤20]
sqrtss 32, TP 1.00 / 3.00, LAT [≤12.0;≤13]
sqrtps 128, TP 1.00 / 3.00, LAT [≤12.0;≤13]
sqrtps 256, TP 1.00 / 6.00, LAT [≤12.0;≤13]
rsqrtss 32, TP 1.00 / 1.00, LAT [≤4.0;≤5]
rsqrtps 128, TP 1.00 / 1.00, LAT [≤4.0;≤5]
rsqrtps 256, TP 1.00 / 1.00, LAT [≤4.0;≤5]
rcpss 32, TP 1.00 / 1.00, LAT 4
rcpps 128, TP 1.00 / 1.00, LAT 4
rcpps 256, TP 1.00 / 1.00, LAT 4

In particular, it looks like the 256-bit variants can be twice as expensive on throughput (at least for the non-reciprocal cases).

Yes, I saw that too when I checked the values. I even thought of coding the argument size check, but decided against it because it wouldn't have any practical effect: the PerfScore is dominated by the larger latency value.
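For illustration only, an operand-size check of the kind mentioned here might look roughly like the sketch below. The `TimingEstimate` struct and `EstimateSqrtPs` function are hypothetical names, not the JIT's actual types or constants; the numbers are the Skylake sqrtps figures quoted above, with the lower latency bound of 12 cycles assumed.

```cpp
#include <cassert>

// Hypothetical sketch: these names do not exist in emitxarch.cpp.
struct TimingEstimate
{
    float throughput; // reciprocal throughput, in cycles
    float latency;    // latency, in cycles
};

// Uses the Skylake sqrtps figures quoted from uops.info above:
// TP 3.00 at 128 bits and 6.00 at 256 bits; latency assumed to be 12.
inline TimingEstimate EstimateSqrtPs(unsigned vectorBits)
{
    assert(vectorBits == 128 || vectorBits == 256);

    TimingEstimate result{3.0f, 12.0f};
    if (vectorBits == 256)
    {
        // The 256-bit form is listed with twice the reciprocal throughput.
        result.throughput *= 2.0f;
    }
    return result;
}
```

Calling `EstimateSqrtPs(256)` doubles only the throughput field and leaves the latency at 12, which is consistent with the point that the larger latency value dominates.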

That's fair, just thought I'd call it out 😄

It might be worth an up-for-grabs issue to get these correctly tracked. Given that uops.info has everything available in an XML file (https://uops.info/xml.html), it would likely be possible to have a small generator that fills in a metadata header file (much like the one used for hwintrinsiclistxarch.h), so this can be automated and is trivial to update for newer or older micro-architectures.
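As a rough sketch of the output side of such a generator, the snippet below formats one timing record as a line of a generated header. Everything here is an assumption made for illustration: the `InstrTiming` record, the `FormatHeaderEntry` function, and the `PERFSCORE_ENTRY` macro name are hypothetical, and parsing the uops.info XML is left out entirely.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical record; the field names are not taken from the uops.info schema.
struct InstrTiming
{
    std::string name;       // instruction mnemonic, e.g. "rcpps"
    double      throughput; // reciprocal throughput, in cycles
    int         latency;    // latency, in cycles
};

// Formats one line of a hypothetical generated header, for example:
//   PERFSCORE_ENTRY(rcpps, 1.00, 4)
inline std::string FormatHeaderEntry(const InstrTiming& timing)
{
    assert(!timing.name.empty());

    char buffer[128];
    std::snprintf(buffer, sizeof(buffer), "PERFSCORE_ENTRY(%s, %.2f, %d)",
                  timing.name.c_str(), timing.throughput, timing.latency);
    return std::string(buffer);
}
```

A generator along these lines would emit one such line per instruction and per target micro-architecture, and the JIT would then include the resulting header instead of hard-coding each value.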

I'm not too familiar with the PerfScore code, but such an approach might be quite complicated: the uops.info measurements are made for each variant of encoding, reg-vs-mem, and operand size, and latencies are given for each combination of input and output parameters. Currently PerfScore values are more of an art than a science. For example, the latency for DIV is just set to the arithmetic mean of min/max, while in practice it might actually be on the lower end for 128:64 div, as the upper 64 bits are basically always zero. Similarly, the latency for MUL rdx:rax,reg is set to 4 (high half of output used) as it's the common use case, not 3 (only low half used), which is only rarely emitted.
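The "arithmetic mean of min/max" choice described for DIV can be written as the small helper below. `MeanLatency` is a hypothetical name, and any values passed to it are placeholders rather than the numbers the JIT actually uses.

```cpp
#include <cassert>

// Hypothetical helper: collapses a measured latency range into one
// estimate by taking the midpoint of its bounds.
inline float MeanLatency(float minCycles, float maxCycles)
{
    assert(minCycles <= maxCycles);
    return (minCycles + maxCycles) / 2.0f;
}
```

A midpoint treats every input as equally likely, so it overestimates the cost when, as described for 128:64 div, real inputs cluster at the fast end of the range.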

AndyAyersMS · 2021-04-09T16:27:08Z

@briansull PTAL

briansull

Looks Good,
Thanks for your contribution

Fix SQRT, RCP, RSQRT PerfScores

213a5d8

ghost added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Apr 6, 2021

tannergooding reviewed Apr 7, 2021


briansull approved these changes Apr 14, 2021


briansull merged commit 0cf4f15 into dotnet:main Apr 15, 2021

JulieLeeMSFT assigned pentp Apr 15, 2021

JulieLeeMSFT added this to the 6.0.0 milestone Apr 15, 2021

pentp deleted the sqrt-perfscore branch April 15, 2021 23:58

ghost locked as resolved and limited conversation to collaborators May 16, 2021
