Also get add nuw from uN::checked_add#126852
Conversation
|
@bors try @rust-timer queue |
This comment has been minimized.
This comment has been minimized.
|
☀️ Try build successful - checks-actions |
This comment has been minimized.
This comment has been minimized.
|
Finished benchmarking commit (ffcb98e): comparison URL. Overall result: ❌✅ regressions and improvements - ACTION NEEDEDBenchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf. Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @bors rollup=never Instruction countThis is a highly reliable metric that was used to determine the overall result at the top of this comment.
Max RSS (memory usage)Results (primary -0.5%)This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
CyclesResults (primary 1.8%)This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Binary sizeResults (primary -0.1%, secondary -0.1%)This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Bootstrap: 692.678s -> 694.501s (0.26%) |
|
@bors try @rust-timer queue |
This comment has been minimized.
This comment has been minimized.
|
☀️ Try build successful - checks-actions |
This comment has been minimized.
This comment has been minimized.
|
Finished benchmarking commit (3c488df): comparison URL. Overall result: ❌✅ regressions and improvements - ACTION NEEDEDBenchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf. Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @bors rollup=never Instruction countThis is a highly reliable metric that was used to determine the overall result at the top of this comment.
Max RSS (memory usage)Results (primary 0.2%, secondary -4.8%)This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
CyclesThis benchmark run did not return any relevant results for this metric. Binary sizeResults (primary -0.1%, secondary -0.1%)This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Bootstrap: 691.47s -> 694.315s (0.41%) |
a53dd2f to
ec9e356
Compare
|
(Last force-push didn't change anything, just cleaned up history, so the #126852 (comment) run is still accurate) Ok, I think this is ready. It's maybe a small instruction regression, but shows neutral on cycles, while also being a broad size win. So overall I think that's completely acceptable for improved optimization potential. |
| // CHECK: %[[IS_MAX:.+]] = icmp eq i32 %x, -1 | ||
| // CHECK: br i1 %[[IS_MAX]], label %[[NONE_BB:.+]], label %[[SOME_BB:.+]] | ||
| // CHECK: [[SOME_BB]]: | ||
| // CHECK: %[[R:.+]] = add nuw i32 %x, 1 |
There was a problem hiding this comment.
Compare to the current codegen in https://rust.godbolt.org/z/vcPv8xse3 where this returns a wrapping addition, via llvm.uadd.with.overflow.
|
@bors r+ |
|
The job Click to see the possible cause of the failure (guessed by this bot) |
|
💔 Test failed - checks-actions |
|
@bors retry |
|
It seems that this actually pessimizes the result; see https://rust.godbolt.org/z/c1qebWbxM. LLVM is not good at recombining the (The corresponding PR for |
|
☀️ Test successful - checks-actions |
|
Finished benchmarking commit (fc555cd): comparison URL. Overall result: ❌✅ regressions and improvements - ACTION NEEDEDNext Steps: If you can justify the regressions found in this perf run, please indicate this with @rustbot label: +perf-regression Instruction countThis is a highly reliable metric that was used to determine the overall result at the top of this comment.
Max RSS (memory usage)Results (primary 1.4%, secondary 2.2%)This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
CyclesResults (primary 3.9%, secondary -3.5%)This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Binary sizeResults (primary -0.1%, secondary -0.1%)This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Bootstrap: 693.576s -> 693.565s (-0.00%) |
Hey @scottmcm , did you see this comment about this PR? Do you have thoughts about the claim that this is a pessimization, at least on x86 it seems? |
|
visiting for weekly rustc-perf triage
|
When I was doing this for
checked_{sub,shl,shr}, it was mentioned #124114 (comment) that it'd be worth trying forchecked_addtoo.It makes a particularly-big difference for
x.checked_add(C), as doing this means that LLVM removes the intrinsic and does it as a normalx <= MAX - Cinstead.cc @dianqk who had commented about
checked_addrelated to rust-lang/hashbrown#509 beforecc llvm/llvm-project#80637 for how LLVM is unlikely to do this itself