Conversation


@ljrcore ljrcore commented Mar 18, 2025

The larger the VM memory and the higher the memory pressure, the greater the stop_dirty_log() overhead. For example, for a 32c64g virtual machine with memory compression, stop_dirty_log() takes 36 ms. Moving stop_dirty_log() outside the downtime period can reduce downtime.
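
To make the reordering concrete, here is a minimal, self-contained sketch (not the actual Cloud Hypervisor migration code): `FakeVm`, `pause()`, `send_remaining_dirty_pages()`, `signal_migration_complete()` and the sleep durations are hypothetical stand-ins, used only to show where stop_dirty_log() sits relative to the downtime window.

```rust
use std::time::{Duration, Instant};

struct FakeVm;

impl FakeVm {
    fn pause(&self) {}
    fn send_remaining_dirty_pages(&self) {
        std::thread::sleep(Duration::from_millis(5));
    }
    fn signal_migration_complete(&self) {}
    fn stop_dirty_log(&self) {
        // Stand-in for the real work; its cost grows with guest memory size.
        std::thread::sleep(Duration::from_millis(36));
    }
}

fn main() {
    let vm = FakeVm;

    // Before: stop_dirty_log() runs while the guest is paused, so its cost
    // is added to the downtime.
    let start = Instant::now();
    vm.pause();
    vm.send_remaining_dirty_pages();
    vm.stop_dirty_log();
    vm.signal_migration_complete();
    println!("downtime with stop_dirty_log inside:  {:?}", start.elapsed());

    // After: completion is signalled first; stop_dirty_log() becomes part of
    // the source-side teardown and no longer counts toward downtime.
    let start = Instant::now();
    vm.pause();
    vm.send_remaining_dirty_pages();
    vm.signal_migration_complete();
    println!("downtime with stop_dirty_log outside: {:?}", start.elapsed());
    vm.stop_dirty_log(); // source-side cleanup, outside the downtime window
}
```

With the call moved after the completion signal, its cost is paid as ordinary source-side teardown rather than as guest-visible downtime.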

Jinrong Liang added 2 commits March 17, 2025 17:39
Starting and stopping dirty-page logging only occurs during cross-host
migrations.

Signed-off-by: Jinrong Liang <[email protected]>
The larger the VM memory and the higher the memory pressure, the greater
the stop_dirty_log() overhead. Moving stop_dirty_log() outside the
downtime period can reduce downtime.

Signed-off-by: Jinrong Liang <[email protected]>
@ljrcore ljrcore requested a review from a team as a code owner March 18, 2025 02:31
@ljrcore ljrcore changed the title from "Optimize downtime by moving stop_dirty_log()" to "vm-migration: Optimize downtime by moving stop_dirty_log()" Mar 19, 2025
@likebreath likebreath (Member) left a comment


@ljrcore Thank you for the contributions. It makes sense, e.g. the tear-down process on the source VM can be delayed to reduce downtime on the destination VM side.

Have you measured the downtime improvements from this change? It would be helpful to include some data here for future reference if you have it. To my understanding, there are only a few places where stop_dirty_log() is not a no-op, such as guest memory (KVM dirty-bit tracking), vhost-user and vdpa. It would be good to know how much time we saved from these. Thank you.

```rust
// Stop logging dirty pages only for non-local migrations
if !send_data_migration.local {
    if let Err(e) = vm.stop_dirty_log() {
        return e;
    }
}
```
Member


How much overhead does the ioctl really add? Hopefully we never started dirty logging in the local case?

@ljrcore ljrcore (Author) commented Mar 20, 2025


> How much overhead does the ioctl really add? Hopefully we never started dirty logging in the local case?

The detailed stop_dirty_log() overhead is shown in another comment.

The code marked here (the first patch) fixes a defect in the stop_dirty_log() usage: dirty-page logging is only started for non-local migrations, but stop_dirty_log() could still be called during a local migration. Therefore stop_dirty_log() should be limited to the non-local case as well.
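
As a rough illustration of the symmetry this fix restores, here is a small self-contained sketch; `Vm`, `SendMigrationData`, `start_dirty_log()` and `MigrationError` are simplified stand-ins rather than the real vmm types, and only the `!send_data_migration.local` guard mirrors the actual change.

```rust
#[derive(Debug)]
struct MigrationError(&'static str);

struct SendMigrationData {
    local: bool,
}

struct Vm {
    dirty_log_active: bool,
}

impl Vm {
    fn start_dirty_log(&mut self) -> Result<(), MigrationError> {
        self.dirty_log_active = true;
        Ok(())
    }
    fn stop_dirty_log(&mut self) -> Result<(), MigrationError> {
        if !self.dirty_log_active {
            return Err(MigrationError("dirty log was never started"));
        }
        self.dirty_log_active = false;
        Ok(())
    }
}

fn migrate(vm: &mut Vm, send_data_migration: &SendMigrationData) -> Result<(), MigrationError> {
    // Dirty-page logging is only started for cross-host (non-local) migrations ...
    if !send_data_migration.local {
        vm.start_dirty_log()?;
    }

    // ... send memory / state ...

    // ... so stopping it must be gated the same way, otherwise a local
    // migration would try to stop logging that was never started.
    if !send_data_migration.local {
        vm.stop_dirty_log()?;
    }
    Ok(())
}

fn main() {
    let mut vm = Vm { dirty_log_active: false };
    assert!(migrate(&mut vm, &SendMigrationData { local: true }).is_ok());
    assert!(migrate(&mut vm, &SendMigrationData { local: false }).is_ok());
}
```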

@ljrcore ljrcore (Author) commented Mar 20, 2025

> @ljrcore Thank you for the contributions. It makes sense, e.g. the tear-down process on the source VM can be delayed to reduce downtime on the destination VM side.
>
> Have you measured the downtime improvements from this change? It would be helpful to include some data here for future reference if you have it. To my understanding, there are only a few places where stop_dirty_log() is not a no-op, such as guest memory (KVM dirty-bit tracking), vhost-user and vdpa. It would be good to know how much time we saved from these. Thank you.

VM size and memory pressure both have a significant impact on the stop_dirty_log() overhead: the larger the VM memory and the higher the memory pressure, the longer stop_dirty_log() takes. The data below was obtained by measuring the time difference before and after stop_dirty_log() executes (a minimal measurement sketch follows the table):

| VM specifications | Memory pressure | stop_dirty_log() overhead (ms) |
|---|---|---|
| 4c8g | none | 3 |
| 16c32g | none | 6 |
| 64c128g | none | 17 |
| 128c256g | none | 40 |
| 4c8g | stress -m 2 --vm-bytes 256M | 6 |
| 16c32g | stress -m 8 --vm-bytes 256M | 16 |
| 64c128g | stress -m 32 --vm-bytes 256M | 56 |
| 128c256g | stress -m 64 --vm-bytes 256M | 139 |
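
For reference, a minimal sketch of the measurement method described above (time difference around the call); `FakeVm` and the sleep are stand-ins, and in the real measurement the call is the vmm's own vm.stop_dirty_log().

```rust
use std::time::{Duration, Instant};

struct FakeVm;

impl FakeVm {
    fn stop_dirty_log(&self) {
        // Stand-in for the real call being timed.
        std::thread::sleep(Duration::from_millis(36));
    }
}

fn main() {
    let vm = FakeVm;
    let before = Instant::now();
    vm.stop_dirty_log();
    println!("stop_dirty_log overhead: {} ms", before.elapsed().as_millis());
}
```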

@likebreath likebreath added this pull request to the merge queue Mar 27, 2025
@likebreath likebreath moved this from 🆕 New to ✅ Done in Cloud Hypervisor Roadmap Mar 27, 2025
@likebreath likebreath moved this from ✅ Done to 👀 In review in Cloud Hypervisor Roadmap Mar 27, 2025
Merged via the queue into cloud-hypervisor:main with commit 9d93df4 Mar 27, 2025
38 of 39 checks passed
@github-project-automation github-project-automation bot moved this from 👀 In review to ✅ Done in Cloud Hypervisor Roadmap Mar 27, 2025