Skip to content

Conversation

@blitz
Copy link
Member

@blitz blitz commented Nov 7, 2025

This is fallout from #30. I've noticed that the function that turns dirty bitmaps during migration into MemoryRangeTable is very inefficient. It creates useless temporary vectors. It has a very basic method of finding the 1 bits in the bitmap.

I've transformed the dirty bitmap scanning using iterators (without temporaries). To do that I introduced BitposIterator which takes an iterator over a bitmap and emits the positions of all 1 bits. With some Itertools magic (coalesce) we can then easily merge the resulting bit ranges.

The code now:

  • handles large patches of unchanged memory much more efficiently (by efficiently skipping over 0 in the bitmap)
  • creates no unneeded temporaries
  • has (some) unit tests

Improvements

I've benchmarked this for a 384 MB dirty bitmap (equivalent to 12TB VM). I got the following results:

Dirty Rate Old Code New Code Improvement
0% (artificial) 1.8512s 16ms 115x
0.1% 1.8761s 58.885ms 32x
1% 2.1668s 382.63ms 5.6x
5% 3.1310s 1.1993ms 1.5x

The 0% means that we are scanning the empty bitmap at 24 GB/s (on my system). I should have 56GB/s memory bandwidth, so this is pretty good without any SIMD magic.

For even dirtier bitmaps, the performance will be more equal, because it's dominated by constructing the huge vector inside of MemoryRangeTable. This can also be optimized away (https://github.com/cobaltcore-dev/cobaltcore/issues/280) and the optimization in this PR is a first step.

These numbers were generated using criterion using this harness.

Testing

  • Ran libvirt test suite.

@blitz blitz force-pushed the fast-dirty-log branch 2 times, most recently from b315ec0 to 2d294ef Compare November 7, 2025 14:36
blitz added 6 commits November 7, 2025 15:39
Instead of using ad-hoc code, just write an extension to the Iterator
trait that we can easily unit test.

On-behalf-of: SAP [email protected]
Signed-off-by: Julian Stecklina <[email protected]>
It improves iteration code a lot! I'll use it in the upcoming commit
to speed up dirty bitmap scanning.

On-behalf-of: SAP [email protected]
Signed-off-by: Julian Stecklina <[email protected]>
With this change, we don't need a copy of the vector. Just something
that can be coerced into an iterator. We also use the bit position
iterator to make the code somewhat clearer. The new code is much
faster, because it will not iterate over every bit, just each 1 bit in
the input.

The next commit will complete this optimization and have some concrete
numbers.

On-behalf-of: SAP [email protected]
Signed-off-by: Julian Stecklina <[email protected]>
This would be a good opportunity to optimize another pointless vector
away, but I don't have a good way to test this at the moment. But
maybe someone else gives it a shot.

On-behalf-of: SAP [email protected]
Signed-off-by: Julian Stecklina <[email protected]>
... by passing the slice along instead.

On-behalf-of: SAP [email protected]
Signed-off-by: Julian Stecklina <[email protected]>
... by just passing the iterator along. For large VMs this bitmap is
gigantic. A 12TB VM has 384MB of dirty bitmap.

With all these optimizations from the previous commits in place, we
see quite the improvement when it comes to scanning the dirty bitmap.

For a bitmap with 1% bits (randomly) set, dirty_log() takes:

Original code: 2166ms (100.0%)
New code:       382ms ( 17.6%)

on my system. The sparser the dirty bitmap the faster. Scanning an
empty bitmap is 100x faster. For a 5% populated bitmap we are still 3x
faster.

If someone wants to play with this, there is a benchmark harness here:
https://github.com/blitz/chv-bitmap-bench

On-behalf-of: SAP [email protected]
Signed-off-by: Julian Stecklina <[email protected]>
Copy link

@olivereanderson olivereanderson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Love these changes!

The new code definitely avoids large copies and is at least as readable as the old.

It would be interesting to see some benchmarks with real world data, but I doubt this will ever be slower than what we currently have (except perhaps for extremely dense bitmaps which should not be a common occurrence).

@phip1611
Copy link
Member

Copy link
Member

@phip1611 phip1611 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Love it, thanks! Please have some more cookies 🍪 🍪

@phip1611 phip1611 merged commit 484d4f3 into cyberus-technology:gardenlinux Nov 10, 2025
15 checks passed
@phip1611 phip1611 mentioned this pull request Nov 10, 2025
13 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants