-
Notifications
You must be signed in to change notification settings - Fork 5.3k
Add support for cloning loops with <= conditions
#60148
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for cloning loops with <= conditions
#60148
Conversation
|
Tagging subscribers to this area: @JulieLeeMSFT Issue DetailsLoop cloning currently only handles loops using There are perf wins and losses, with the usual existing issues for loop cloning.
Even so, there is one fewer bounds check than the baseline. We could remove one more,
|
|
Diffs are a code size regression, because loop cloning kicks in more, which is code expanding. benchmarks.run.windows.x64.checked.mch:Detail diffscoreclr_tests.pmi.windows.x64.checked.mch:Detail diffslibraries.crossgen2.windows.x64.checked.mch:Detail diffslibraries.pmi.windows.x64.checked.mch:Detail diffslibraries_tests.pmi.windows.x64.checked.mch:Detail diffs |
|
Note: I did touch the multi-dimensional code path, to make it more like the jagged array path, but multi-dimensional arrays are not supported in loop cloning, so that code isn't currently used. |
|
/azp run runtime-coreclr outerloop |
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
@AndyAyersMS @dotnet/jit-contrib PTAL |
AndyAyersMS
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
Do we have good "negative" tests for cloning -- ones where the fast path conditions are almost but not quite satisfied?
I don't think so. In fact, I'm not sure we have tests where none of the conditions succeed (versus some subset). It's something that should be added. |
src/coreclr/jit/loopcloning.cpp
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think some of the other common cases might be this simple? For example limit <= (arrLen - cns) or reverse loops: for (int i = arrLen - 1; i >= 0; i--), or native integer as the tpye for (nuint i = (nuint)(arrLen - 1); i != nuint.MaxValue; i--), etc
Not asking they be handled in this PR or anything; just wondering if they might be "low hanging fruit".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I doubt that other cases would be this simple. However, "count down" loops have been mentioned before as something to consider, and there are plenty of other generalizations that we could look into. This particular generalization came up because I saw it in some benchmarks I was investigating, so I knew there was code out there that would benefit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. I know we have a couple places in the BCL where we are using ref arithmetic, Unsafe, and other tricks to workaround the other scenarios not working
So supporting them, while not a priority, would allow us to simplify some core perf-critical code and make it more readable/maintainable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I opened #60188 to track this.
interested in reading about the rationale on why is it a good change despite the 78.57% regression in top case? |
It's a code size regression, not necessarily a run-time regression. Loop cloning literally duplicates the body of a loop, making a "fast path" with fewer bounds checks, and a "slow path" with all the bounds checks. In the best case, the "loop choice conditions" are quick, the "fast path loop" is entered, and the missing bounds checks makes the loop run faster. |
Loop cloning currently only handles loops using `<` limit comparison, e.g., in loops like `for (i = c; i < n; i++)`. This change allows for loop cloning for loops using `<=`. This happens frequently in the BenchF benchmarks (which maybe had their origin long ago in 1-based array Fortran code?), and some in the BenchI benchmarks. In real world code, it happens in Visual Basic code. There are perf wins and losses, with the usual existing issues for loop cloning. For one case, BenchI.XPosMatrix, I see a loss due to various things, including: 1. Unfortunate register allocation and/or missing copy prop, leading to lots of unnecessary copies. 2. Hitting jcc erratum 3. CSE leads to un-contained array length load in array bounds check Even so, there is one fewer bounds check than the baseline. We could remove one more, but a limitation in recognizing multi-dimensional jagged array components prevents loop cloning from recognizing a legal-to-remove bounds check (dotnet#54074).
This change activated existing unused code that statically determined
a loop choice condition was false, leading to aborting the loop
cloning (and at runtime to an almost certain exception). (The test
case was `for (i = 1; i <= a.Length; i++) { a[i] = 1; }`)
If any choice condition is false, we shouldn't also check if all conditions
are true (that value is unreliable in this case).
There are no SPMI asm diffs from this change.
cda27b2 to
f046668
Compare
|
/azp run runtime-coreclr outerloop, runtime-coreclr jitstress |
|
Azure Pipelines successfully started running 2 pipeline(s). |
|
@DrewScoggins - I am wondering why this one was not flagged by auto filer? |

Loop cloning currently only handles loops using
<limit comparison,e.g., in loops like
for (i = c; i < n; i++). This change allowsfor loop cloning for loops using
<=. This happens frequently in the BenchFbenchmarks (which maybe had their origin long ago in 1-based array Fortran code?),
and some in the BenchI benchmarks. In real world code, it happens
in Visual Basic code.
There are perf wins and losses, with the usual existing issues for loop cloning.
For one case, BenchI.XPosMatrix, I see a loss due to various things, including:
unnecessary copies.
Even so, there is one fewer bounds check than the baseline. We could remove one more,
but a limitation in recognizing multi-dimensional jagged array components prevents loop
cloning from recognizing a legal-to-remove bounds check (#54074).