Make bitset `would_modify_words` more vectorizer-friendly #153640

Zalathar wants to merge 3 commits into rust-lang:main

Conversation
@bors try @rust-timer queue
Let's see what happens if we double the subchunk length from 32 bytes (4 words) to 64 bytes (8 words). @bors try @rust-timer queue
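For context, here is a minimal sketch of what that subchunk size means in bytes. The constant names are hypothetical illustrations, not the actual rustc identifiers:

```rust
// Hypothetical constant names for illustration; not the actual rustc code.
const WORD_BYTES: usize = std::mem::size_of::<u64>(); // 8 bytes per word

// Before this experiment: 4 words (32 bytes) per subchunk.
// After doubling: 8 words (64 bytes), i.e. a full cache line on most
// x86-64 CPUs, and one AVX-512 register's worth of data.
const SUBCHUNK_WORDS: usize = 8;
const SUBCHUNK_BYTES: usize = SUBCHUNK_WORDS * WORD_BYTES;

fn main() {
    assert_eq!(SUBCHUNK_BYTES, 64);
    println!("{SUBCHUNK_BYTES} bytes per subchunk");
}
```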
Finished benchmarking commit (b3e83b4): comparison URL.

Overall result: ✅ improvements - no action needed

Benchmarking this pull request means it may be perf-sensitive – we'll automatically label it not fit for rolling up. You can override this, but we strongly advise not to, due to possible changes in compiler perf.

@bors rollup=never

Instruction count: our most reliable metric, used to determine the overall result above. However, even this metric can be noisy.
Max RSS (memory usage): Results (secondary 7.5%). A less reliable metric. May be of interest, but not used to determine the overall result above.
Cycles: Results (secondary -2.3%). A less reliable metric. May be of interest, but not used to determine the overall result above.
Binary size: this benchmark run did not return any relevant results for this metric.

Bootstrap: 480.034s -> 484.684s (0.97%)
I was initially disappointed to see this only affect `cranelift-codegen`. I wonder what code patterns cause these paths to be relevant.
Probably it's huge functions with a bunch of locals, exercising the move/init dataflow a lot?
If it has large functions with thousands of locals, then yeah, I can imagine that stressing MixedBitSet in ways that most crates never come close to.
My recollection is that it does indeed, with the usual suspect being machine-generated code.
r? @madsmtm

rustbot has assigned @madsmtm.
No substantial changes since last time, but let's have another perf run to make sure everything's still good. @bors try @rust-timer queue
Finished benchmarking commit (55f9125): comparison URL.

Overall result: ❌✅ regressions and improvements - please read the text below

Benchmarking this pull request means it may be perf-sensitive – we'll automatically label it not fit for rolling up. You can override this, but we strongly advise not to, due to possible changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please do so in sufficient writing along with

@bors rollup=never

Instruction count: our most reliable metric, used to determine the overall result above. However, even this metric can be noisy.
Max RSS (memory usage): Results (secondary -2.8%). A less reliable metric. May be of interest, but not used to determine the overall result above.
Cycles: Results (secondary 0.0%). A less reliable metric. May be of interest, but not used to determine the overall result above.
Binary size: this benchmark run did not return any relevant results for this metric.

Bootstrap: 480.488s -> 481.781s (0.27%)
I know that some of the dataflow analyses for cranelift reach a fixpoint very slowly, and many of the basic blocks get reprocessed over and over. A while back I tried tweaking the basic-block traversal order and got up to 20% improvements(!) on some cranelift builds, but I also got similar slowdowns on some other benchmarks, so I never filed any PRs about it.
Currently this function compares a single pair of `u64` at a time, which is potentially slower than comparing multiple words before each early-exit check, especially for the large chunks used by `ChunkedBitSet`.

Perf shows a notable improvement in `cranelift-codegen`, which is the one benchmark that is known to stress these code paths.
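To illustrate the idea, here is a minimal sketch of comparing words one fixed-size subchunk at a time instead of one word at a time. This is not the actual rustc implementation; the function signature and the `SUBCHUNK_WORDS` constant are assumptions. The point is that the inner loop over a subchunk has no branches, so LLVM can auto-vectorize it, and the early-exit check runs once per subchunk rather than once per word:

```rust
const SUBCHUNK_WORDS: usize = 4; // 4 words = 32 bytes per subchunk (hypothetical value)

/// Returns true if OR-ing `other` into `self_words` would change any word.
/// Sketch only: the real rustc function may differ in signature and details.
fn would_modify_words(self_words: &[u64], other: &[u64]) -> bool {
    debug_assert_eq!(self_words.len(), other.len());

    let mut chunks_a = self_words.chunks_exact(SUBCHUNK_WORDS);
    let mut chunks_b = other.chunks_exact(SUBCHUNK_WORDS);
    for (a, b) in chunks_a.by_ref().zip(chunks_b.by_ref()) {
        // Accumulate "would change" bits across the whole subchunk in a
        // branch-free loop, then do a single early-exit check per subchunk.
        let mut changed = 0u64;
        for (&x, &y) in a.iter().zip(b) {
            changed |= y & !x; // bits set in `other` but not in `self_words`
        }
        if changed != 0 {
            return true;
        }
    }
    // Handle any trailing words (fewer than a full subchunk) one at a time.
    chunks_a
        .remainder()
        .iter()
        .zip(chunks_b.remainder())
        .any(|(&x, &y)| y & !x != 0)
}

fn main() {
    let a = [0b1010u64; 8];
    let b = [0b0101u64; 8];
    assert!(would_modify_words(&a, &b)); // OR would set new bits
    assert!(!would_modify_words(&b, &b)); // OR-ing a set into itself changes nothing
    println!("ok");
}
```

The trade-off being benchmarked in this thread is the subchunk width: wider subchunks mean fewer early-exit branches and wider SIMD loads, but more wasted work when a difference appears early in a subchunk.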