Conversation
- Remove its ability to replace locals by constants while cloning. Only loop unrolling uses this capability and it can be replaced by a simple visit over the tree after cloning, so that everyone else does not need to pay for it. - Remove its ability to add flags. This is only used by hoisting to add `GTF_MAKE_CSE` flag recursively to the cloned tree. I do not see any good reason not just to only put this on the root node. - Simplify how it propagates flags; simply copy them from the source node instead of having a separate `gtUpdateNodeSideEffects` call for the cloned tree. I do not see why the cloned version should have its flags updated like this when the original shouldn't.
|
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch Issue Details
A few diffs expected due to not marking hoisted clones with
|
Diff results for #97411Assembly diffsAssembly diffs for linux/arm64 ran on windows/x64Diffs are based on 2,501,261 contexts (1,003,806 MinOpts, 1,497,455 FullOpts). MISSED contexts: 3,956 (0.16%) Overall (-664 bytes)
FullOpts (-664 bytes)
Assembly diffs for linux/x64 ran on windows/x64Diffs are based on 2,595,036 contexts (1,052,329 MinOpts, 1,542,707 FullOpts). MISSED contexts: 3,599 (0.14%) Overall (-1,262 bytes)
FullOpts (-1,262 bytes)
Assembly diffs for osx/arm64 ran on windows/x64Diffs are based on 2,262,764 contexts (930,876 MinOpts, 1,331,888 FullOpts). MISSED contexts: 3,201 (0.14%) Overall (-672 bytes)
FullOpts (-672 bytes)
Assembly diffs for windows/arm64 ran on windows/x64Diffs are based on 2,318,293 contexts (931,543 MinOpts, 1,386,750 FullOpts). MISSED contexts: 2,601 (0.11%) Overall (-616 bytes)
FullOpts (-616 bytes)
Assembly diffs for windows/x64 ran on windows/x64Diffs are based on 2,492,949 contexts (983,689 MinOpts, 1,509,260 FullOpts). MISSED contexts: 3,862 (0.15%) Overall (-2,177 bytes)
FullOpts (-2,177 bytes)
Details here Assembly diffs for linux/arm ran on windows/x86Diffs are based on 2,237,703 contexts (827,812 MinOpts, 1,409,891 FullOpts). MISSED contexts: 74,543 (3.22%) Overall (-328 bytes)
FullOpts (-328 bytes)
Assembly diffs for windows/x86 ran on windows/x86Diffs are based on 2,298,783 contexts (841,817 MinOpts, 1,456,966 FullOpts). MISSED contexts: base: 2,552 (0.11%), diff: 2,555 (0.11%) Overall (-1,172 bytes)
FullOpts (-1,172 bytes)
Details here Throughput diffsThroughput diffs for linux/arm64 ran on windows/x64Overall (-0.27% to -0.11%)
MinOpts (-0.10% to -0.02%)
FullOpts (-0.27% to -0.11%)
Throughput diffs for linux/x64 ran on windows/x64Overall (-0.24% to -0.12%)
MinOpts (-0.12% to -0.02%)
FullOpts (-0.24% to -0.12%)
Throughput diffs for osx/arm64 ran on windows/x64Overall (-0.18% to -0.11%)
MinOpts (-0.13% to -0.02%)
FullOpts (-0.20% to -0.11%)
Throughput diffs for windows/arm64 ran on windows/x64Overall (-0.25% to -0.11%)
MinOpts (-0.13% to -0.02%)
FullOpts (-0.25% to -0.11%)
Throughput diffs for windows/x64 ran on windows/x64Overall (-0.23% to -0.13%)
MinOpts (-0.15% to -0.02%)
FullOpts (-0.23% to -0.13%)
Details here Throughput diffs for linux/arm ran on windows/x86Overall (-0.19% to -0.13%)
MinOpts (-0.14% to -0.03%)
FullOpts (-0.22% to -0.13%)
Throughput diffs for windows/x86 ran on windows/x86Overall (-0.22% to -0.14%)
MinOpts (-0.20% to -0.03%)
FullOpts (-0.26% to -0.14%)
Details here Throughput diffs for linux/arm64 ran on linux/x64Overall (-0.26% to -0.11%)
MinOpts (-0.08% to 0.00%)
FullOpts (-0.26% to -0.11%)
Throughput diffs for linux/x64 ran on linux/x64Overall (-0.23% to -0.11%)
MinOpts (-0.10% to 0.00%)
FullOpts (-0.23% to -0.11%)
Details here |
|
/azp run runtime-coreclr jitstress, runtime-coreclr libraries-jitstress, Fuzzlyn |
|
Azure Pipelines successfully started running 3 pipeline(s). |
|
cc @dotnet/jit-contrib PTAL @EgorBo, maybe @AndyAyersMS too since you've been looking at Should be ready pending CI. Diffs. Some from the |
| // This is used to replace the loop iterator with the constant value when | ||
| // unrolling. | ||
| // | ||
| void Compiler::optReplaceScalarUsesWithConst(BasicBlock* block, unsigned lclNum, ssize_t cnsVal) |
There was a problem hiding this comment.
I am not familair with the existing logic - can you explain why it's fine to do propagation like this? can the locals be address exposed?
There was a problem hiding this comment.
FlowGraphNaturalLoop::AnalyzeIteration checks for this (it guarantees that the iterator local is only defined once within the loop and that the loop invariant is always true).
Diff results for #97411Assembly diffsAssembly diffs for linux/arm64 ran on windows/x64Diffs are based on 2,501,157 contexts (1,003,806 MinOpts, 1,497,351 FullOpts). MISSED contexts: 4,060 (0.16%) Overall (-1,372 bytes)
FullOpts (-1,372 bytes)
Assembly diffs for linux/x64 ran on windows/x64Diffs are based on 2,595,007 contexts (1,052,329 MinOpts, 1,542,678 FullOpts). MISSED contexts: 3,628 (0.14%) Overall (-2,033 bytes)
FullOpts (-2,033 bytes)
Assembly diffs for osx/arm64 ran on windows/x64Diffs are based on 2,262,709 contexts (930,876 MinOpts, 1,331,833 FullOpts). MISSED contexts: 3,256 (0.14%) Overall (-1,300 bytes)
FullOpts (-1,300 bytes)
Assembly diffs for windows/arm64 ran on windows/x64Diffs are based on 2,318,207 contexts (931,543 MinOpts, 1,386,664 FullOpts). MISSED contexts: 2,687 (0.12%) Overall (-1,208 bytes)
FullOpts (-1,208 bytes)
Assembly diffs for windows/x64 ran on windows/x64Diffs are based on 2,492,912 contexts (983,689 MinOpts, 1,509,223 FullOpts). MISSED contexts: 3,899 (0.16%) Overall (-3,190 bytes)
FullOpts (-3,190 bytes)
Details here Assembly diffs for linux/arm ran on windows/x86Diffs are based on 2,237,690 contexts (827,812 MinOpts, 1,409,878 FullOpts). MISSED contexts: 74,588 (3.23%) Overall (-724 bytes)
FullOpts (-724 bytes)
Assembly diffs for windows/x86 ran on windows/x86Diffs are based on 2,296,274 contexts (841,817 MinOpts, 1,454,457 FullOpts). MISSED contexts: base: 5,093 (0.22%), diff: 5,096 (0.22%) Overall (-1,264 bytes)
FullOpts (-1,264 bytes)
Details here Throughput diffsThroughput diffs for linux/arm64 ran on windows/x64Overall (-0.26% to -0.11%)
MinOpts (-0.10% to -0.02%)
FullOpts (-0.26% to -0.11%)
Throughput diffs for linux/x64 ran on windows/x64Overall (-0.23% to -0.12%)
MinOpts (-0.12% to -0.02%)
FullOpts (-0.23% to -0.12%)
Throughput diffs for osx/arm64 ran on windows/x64Overall (-0.18% to -0.11%)
MinOpts (-0.13% to -0.02%)
FullOpts (-0.20% to -0.11%)
Throughput diffs for windows/arm64 ran on windows/x64Overall (-0.24% to -0.11%)
MinOpts (-0.13% to -0.02%)
FullOpts (-0.24% to -0.11%)
Throughput diffs for windows/x64 ran on windows/x64Overall (-0.23% to -0.12%)
MinOpts (-0.15% to -0.02%)
FullOpts (-0.23% to -0.12%)
Details here Throughput diffs for linux/arm64 ran on linux/x64Overall (-0.25% to -0.10%)
MinOpts (-0.08% to 0.00%)
FullOpts (-0.25% to -0.10%)
Throughput diffs for linux/x64 ran on linux/x64Overall (-0.23% to -0.11%)
MinOpts (-0.10% to 0.00%)
FullOpts (-0.23% to -0.11%)
Details here Throughput diffs for linux/arm ran on windows/x86Overall (-0.19% to -0.13%)
MinOpts (-0.14% to -0.03%)
FullOpts (-0.20% to -0.13%)
Throughput diffs for windows/x86 ran on windows/x86Overall (-0.22% to -0.14%)
MinOpts (-0.20% to -0.03%)
FullOpts (-0.25% to -0.14%)
Details here |
|
GTF_MAKE_CSE change is good, I had the same thing in the branch I was using to investigate why we skip doing some of them. |
Diff results for #97411Assembly diffsAssembly diffs for linux/arm ran on windows/x86Diffs are based on 2,237,690 contexts (827,812 MinOpts, 1,409,878 FullOpts). MISSED contexts: 74,588 (3.23%) Overall (-724 bytes)
FullOpts (-724 bytes)
Details here |
|
Failures are #97437 |
GTF_MAKE_CSEflag recursively to the cloned tree. I do not see any good reason not just to only put this on the root node.gtUpdateNodeSideEffectscall for the cloned tree. I do not see why the cloned version should have its flags updated like this when the original shouldn't.A few diffs expected due to not marking hoisted clones with
GTF_MAKE_CSErecursively, which sometimes allows us to optimize them away. Some other diffs from the removal ofgtUpdateNodeSideEffects(copy)ingtCloneExpr, which can impact cases where we get our flag maintenance wrong (e.g. #13758).