Make bitset would_modify_words more vectorizer-friendly#153640

Open
Zalathar wants to merge 3 commits into rust-lang:main from Zalathar:subchunk

Conversation

@Zalathar
Member

@Zalathar Zalathar commented Mar 10, 2026

Currently this function compares a single pair of u64 at a time, which is potentially slower than comparing multiple words before each early-exit check, especially for the large chunks used by ChunkedBitSet.

Perf shows a notable improvement in cranelift-codegen, which is the one benchmark that is known to stress these code paths.
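
As a rough illustration of the idea (a sketch, not the code in this PR), the two functions below contrast a word-at-a-time check with one that combines a small group of words before each early-exit test. The function names, the fixed union operation, and the `SUBCHUNK_WORDS` constant are assumptions made for the example; the real function may be generic over the bitwise operation.

```rust
/// Hypothetical baseline: would `out |= inp` change any word of `out`?
/// Performs one early-exit check per pair of words.
fn would_modify_words_scalar(out: &[u64], inp: &[u64]) -> bool {
    assert_eq!(out.len(), inp.len());
    out.iter().zip(inp).any(|(&o, &i)| (o | i) != o)
}

/// Assumed group size for this sketch: 4 words = 32 bytes.
const SUBCHUNK_WORDS: usize = 4;

/// Hypothetical subchunked variant: OR together the "would change" bits of
/// a whole group of words, then branch once per group. The inner loop has a
/// fixed trip count and no branches, which gives the optimizer a chance to
/// vectorize it.
fn would_modify_words_subchunked(out: &[u64], inp: &[u64]) -> bool {
    assert_eq!(out.len(), inp.len());
    let mut out_chunks = out.chunks_exact(SUBCHUNK_WORDS);
    let mut in_chunks = inp.chunks_exact(SUBCHUNK_WORDS);
    for (o, i) in out_chunks.by_ref().zip(in_chunks.by_ref()) {
        let mut diff = 0u64;
        for k in 0..SUBCHUNK_WORDS {
            // Bits that the union would newly set in this word.
            diff |= (o[k] | i[k]) ^ o[k];
        }
        if diff != 0 {
            return true;
        }
    }
    // Words that do not fill a whole group fall back to the scalar check.
    would_modify_words_scalar(out_chunks.remainder(), in_chunks.remainder())
}
```

Both functions return the same answer for any pair of equal-length slices; only the number of branches differs. Whether the grouped loop is actually vectorized depends on the target and optimization level.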

@rustbot rustbot added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Mar 10, 2026
@Zalathar
Member Author

@bors try @rust-timer queue

rust-bors Bot pushed a commit that referenced this pull request Mar 10, 2026
Make bitset `would_modify_words` more vectorizer-friendly
@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Mar 10, 2026
@rust-bors
Contributor

rust-bors Bot commented Mar 10, 2026

☀️ Try build successful (CI)
Build commit: af612eb (af612eb844c6b669b58fe4697341f163b33d231b, parent: 2d76d9bc76f27b03b4899e72ce561c7ac2c5cf6b)

@rustbot rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Mar 10, 2026
@Zalathar
Member Author

Let's see what happens if we double the subchunk length from 32 bytes (4 words) to 64 bytes (8 words).
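
To make the size arithmetic concrete, here is a minimal sketch in which the subchunk length is a const generic parameter, so the 4-word and 8-word variants can be compared side by side. The names `would_gain_bits` and `subchunk_bytes` are hypothetical and the union operation is assumed; this is not the code under test.

```rust
use std::mem::size_of;

/// Size in bytes of a subchunk made of `words` 64-bit words.
const fn subchunk_bytes(words: usize) -> usize {
    words * size_of::<u64>()
}

/// Hypothetical check: would `out |= inp` set any new bit in `out`?
/// `N` is the number of words combined before each early-exit test
/// and must be nonzero (`chunks_exact` panics on a chunk size of 0).
fn would_gain_bits<const N: usize>(out: &[u64], inp: &[u64]) -> bool {
    assert_eq!(out.len(), inp.len());
    let mut out_it = out.chunks_exact(N);
    let mut in_it = inp.chunks_exact(N);
    for (o, i) in out_it.by_ref().zip(in_it.by_ref()) {
        // Branch-free accumulation over one subchunk.
        let diff = o.iter().zip(i).fold(0u64, |acc, (&a, &b)| acc | (b & !a));
        if diff != 0 {
            return true;
        }
    }
    // Leftover words that do not fill a whole subchunk.
    out_it
        .remainder()
        .iter()
        .zip(in_it.remainder())
        .any(|(&a, &b)| (b & !a) != 0)
}
```

With this shape, `would_gain_bits::<4>` checks 32 bytes per branch and `would_gain_bits::<8>` checks 64 bytes per branch. A larger subchunk means fewer branches but more wasted work when a difference appears early in the group.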

@bors try @rust-timer queue

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Mar 10, 2026
rust-bors Bot pushed a commit that referenced this pull request Mar 10, 2026
Make bitset `would_modify_words` more vectorizer-friendly
@rust-bors
Contributor

rust-bors Bot commented Mar 11, 2026

☀️ Try build successful (CI)
Build commit: b3e83b4 (b3e83b46f88ed6694cef6eadfcffb7f5b6d8e9d7, parent: 0c68443b0a0469e4211acca7e7b06e14f256ada8)

@rust-timer
Collaborator

Finished benchmarking commit (b3e83b4): comparison URL.

Overall result: ✅ improvements - no action needed

Benchmarking this pull request means it may be perf-sensitive – we'll automatically label it not fit for rolling up. You can override this, but we strongly advise not to, due to possible changes in compiler perf.

@bors rollup=never
@rustbot label: -S-waiting-on-perf -perf-regression

Instruction count

Our most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.

| | mean | range | count |
| --- | --- | --- | --- |
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | -1.5% | [-2.3%, -0.8%] | 4 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | -1.5% | [-2.3%, -0.8%] | 4 |

Max RSS (memory usage)

Results (secondary 7.5%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

| | mean | range | count |
| --- | --- | --- | --- |
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | 7.5% | [7.5%, 7.5%] | 1 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | - | - | 0 |

Cycles

Results (secondary -2.3%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

| | mean | range | count |
| --- | --- | --- | --- |
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | -2.3% | [-2.4%, -2.1%] | 2 |
| All ❌✅ (primary) | - | - | 0 |

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 480.034s -> 484.684s (0.97%)
Artifact size: 394.90 MiB -> 396.92 MiB (0.51%)

@rustbot rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Mar 11, 2026
@Zalathar
Member Author

Zalathar commented Mar 11, 2026

I was initially disappointed to see this only affect cranelift-codegen, but I guess that's the only benchmark in our suite where these paths are actually hot.

I wonder what code patterns cause these paths to be relevant.

@lqd
Member

lqd commented Mar 11, 2026

Probably its huge functions with a bunch of locals, exercising the move/init dataflow a lot?

@Zalathar
Member Author

If it has large functions with thousands of locals, then yeah I can imagine that stressing MixedBitSet in ways that most crates never come close to.

@lqd
Member

lqd commented Mar 11, 2026

My recollection is that it has indeed, with the usual suspect of using machine-generated code.

@Zalathar Zalathar marked this pull request as ready for review March 12, 2026 10:53
@rustbot rustbot added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Mar 12, 2026
@rustbot rustbot removed the S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. label Mar 12, 2026
@rustbot
Collaborator

rustbot commented Mar 12, 2026

r? @madsmtm

rustbot has assigned @madsmtm.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

Why was this reviewer chosen?

The reviewer was selected based on:

  • Owners of files modified in this PR: compiler
  • compiler expanded to 69 candidates
  • Random selection from 16 candidates

@Zalathar
Member Author

No substantial changes since last time, but let's have another perf run to make sure everything's still good.

@bors try @rust-timer queue

rust-bors Bot pushed a commit that referenced this pull request Mar 12, 2026
Make bitset `would_modify_words` more vectorizer-friendly
@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Mar 12, 2026
@rust-bors
Contributor

rust-bors Bot commented Mar 12, 2026

☀️ Try build successful (CI)
Build commit: 55f9125 (55f9125ede638db1aa6ecb7e07941e7911b33e18, parent: d1ee5e59a964a419b84b760812a35075034f4861)

@rust-timer
Collaborator

Finished benchmarking commit (55f9125): comparison URL.

Overall result: ❌✅ regressions and improvements - please read the text below

Benchmarking this pull request means it may be perf-sensitive – we'll automatically label it not fit for rolling up. You can override this, but we strongly advise not to, due to possible changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please do so in sufficient writing along with @rustbot label: +perf-regression-triaged. If not, please fix the regressions and do another perf run. If its results are neutral or positive, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

Our most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.

| | mean | range | count |
| --- | --- | --- | --- |
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | 0.3% | [0.3%, 0.3%] | 1 |
| Improvements ✅ (primary) | -1.6% | [-2.3%, -0.9%] | 4 |
| Improvements ✅ (secondary) | -0.0% | [-0.0%, -0.0%] | 2 |
| All ❌✅ (primary) | -1.6% | [-2.3%, -0.9%] | 4 |

Max RSS (memory usage)

Results (secondary -2.8%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

| | mean | range | count |
| --- | --- | --- | --- |
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | -2.8% | [-4.7%, -0.9%] | 2 |
| All ❌✅ (primary) | - | - | 0 |

Cycles

Results (secondary 0.0%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

| | mean | range | count |
| --- | --- | --- | --- |
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | 4.6% | [4.6%, 4.6%] | 1 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | -4.5% | [-4.5%, -4.5%] | 1 |
| All ❌✅ (primary) | - | - | 0 |

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 480.488s -> 481.781s (0.27%)
Artifact size: 394.87 MiB -> 394.97 MiB (0.02%)

@rustbot rustbot added perf-regression Performance regression. and removed S-waiting-on-perf Status: Waiting on a perf run to be completed. labels Mar 12, 2026
@nnethercote
Contributor

I know that some of the dataflow analysis for cranelift reaches a fixpoint very slowly, and many of the basic blocks get reprocessed over and over. A while back I tried tweaking the basic block traversals and got up to 20% improvements(!) on some cranelift builds, but I also got similar slowdowns on some other benchmarks, so I never filed any PRs about it.
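
For readers unfamiliar with why blocks get reprocessed, here is a minimal worklist sketch of a forward dataflow analysis over bitsets. It is an illustration under assumptions, not rustc's dataflow framework: `union_into`, `run_to_fixpoint`, the `defs` sets, and the FIFO visiting order are all hypothetical. A block is re-queued whenever joining a predecessor's exit state changes its entry state, so a back-edge can force repeated visits.

```rust
use std::collections::VecDeque;

/// Union `src` into `dst`; returns whether any bit of `dst` changed.
fn union_into(dst: &mut [u64], src: &[u64]) -> bool {
    let mut changed = false;
    for (d, &s) in dst.iter_mut().zip(src) {
        let new = *d | s;
        changed |= new != *d;
        *d = new;
    }
    changed
}

/// Runs a forward union-based analysis to a fixpoint.
/// `succs[b]` lists the successors of block `b`; `defs[b]` holds the bits
/// that block `b` sets. Returns each block's entry state and the total
/// number of block visits.
fn run_to_fixpoint(
    succs: &[Vec<usize>],
    defs: &[Vec<u64>],
    words: usize,
) -> (Vec<Vec<u64>>, usize) {
    let n = succs.len();
    let mut entry = vec![vec![0u64; words]; n];
    let mut queue: VecDeque<usize> = (0..n).collect();
    let mut queued = vec![true; n];
    let mut visits = 0;
    while let Some(b) = queue.pop_front() {
        queued[b] = false;
        visits += 1;
        // Exit state of `b` = entry state plus the bits `b` sets.
        let mut exit = entry[b].clone();
        union_into(&mut exit, &defs[b]);
        for &s in &succs[b] {
            // Re-queue a successor only if its entry state grew.
            if union_into(&mut entry[s], &exit) && !queued[s] {
                queued[s] = true;
                queue.push_back(s);
            }
        }
    }
    (entry, visits)
}
```

On a graph with a loop, the visit count exceeds the block count, and every visit performs one "did this join change anything?" check per successor. That check is the kind of operation whose cost grows with the width of the bitsets.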

Labels

perf-regression Performance regression. S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

6 participants