vm-migration: Optimize downtime by moving stop_dirty_log() #6987
Conversation
Starting and stopping dirty-page logging only occurs during cross-host migrations. Signed-off-by: Jinrong Liang <[email protected]>
The larger the VM memory and the greater the memory pressure, the greater the stop_dirty_log() overhead. Moving stop_dirty_log() outside the downtime window can reduce downtime. Signed-off-by: Jinrong Liang <[email protected]>
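For readers of this thread, a minimal sketch of the reordering the second patch proposes follows. All names here (`Vm`, `Dest`, the helper functions) are hypothetical stand-ins, not Cloud Hypervisor's actual API; only the relative order of `stop_dirty_log()` and the destination resume reflects the change.

```rust
// Hypothetical stand-in types; the real flow lives in the vmm crate's
// send-migration path.
struct Vm;
struct Dest;
type MigResult = Result<(), String>;

impl Vm {
    fn pause(&mut self) -> MigResult { Ok(()) }                           // downtime starts
    fn send_final_dirty_pages(&mut self, _d: &mut Dest) -> MigResult { Ok(()) }
    fn stop_dirty_log(&mut self) -> MigResult { Ok(()) }                  // costly for large VMs
}
impl Dest {
    fn signal_resume(&mut self) -> MigResult { Ok(()) }                   // downtime ends here
}

// Before this patch: stop_dirty_log() sits inside the downtime window,
// so the destination cannot resume until the teardown finishes.
fn send_migration_before(vm: &mut Vm, dest: &mut Dest) -> MigResult {
    vm.pause()?;
    vm.send_final_dirty_pages(dest)?;
    vm.stop_dirty_log()?;
    dest.signal_resume()
}

// After this patch: the destination resumes first, and the dirty-log
// teardown on the source overlaps with the already-running destination.
fn send_migration_after(vm: &mut Vm, dest: &mut Dest) -> MigResult {
    vm.pause()?;
    vm.send_final_dirty_pages(dest)?;
    dest.signal_resume()?;
    vm.stop_dirty_log()
}
```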
likebreath
left a comment
@ljrcore Thank you for the contributions. It makes sense, e.g. the tearing-down process on the source VM can be delayed to reduce downtime on the destination VM side.
Have you measured the downtime improvements from this change? It would be good to include some data here for future reference if you have it. To my understanding, there are only a few places where stop_dirty_log() is not a no-op, such as guest memory (KVM dirty-bit tracking), vhost-user, and vDPA. It would be good to know how much time we saved from these. Thank you.
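As a rough illustration of the reviewer's point, here is a hedged sketch of the trait-level shape: a no-op default makes stop_dirty_log() free for most devices, and only a few components (guest memory, vhost-user, vDPA) override it. The signatures below are simplified, not the vm-migration crate's exact API.

```rust
#[derive(Debug)]
struct MigratableError;

// Simplified trait: the real Migratable trait in the vm-migration crate
// carries additional bounds and methods.
trait Migratable {
    // No-op default: most devices have nothing to tear down.
    fn stop_dirty_log(&mut self) -> Result<(), MigratableError> {
        Ok(())
    }
}

struct SerialDevice; // a typical device: inherits the free no-op
impl Migratable for SerialDevice {}

struct MemoryManager; // guest memory: must disable KVM dirty-bit tracking
impl Migratable for MemoryManager {
    fn stop_dirty_log(&mut self) -> Result<(), MigratableError> {
        // In KVM, disabling dirty logging means re-registering each memory
        // region without the KVM_MEM_LOG_DIRTY_PAGES flag, one ioctl per
        // region, which is the part that scales with guest memory size.
        Ok(())
    }
}
```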
```rust
return e;
// Stop logging dirty pages only for non-local migrations
if !send_data_migration.local {
    if let Err(e) = vm.stop_dirty_log() {
```
How much overhead does the ioctl really add? Hopefully we never started dirty logging in the local case?
> How much overhead does the ioctl really add? Hopefully we never started dirty logging in the local case?
The detailed stop_dirty_log() overhead is shown in another comment.
The code marked here (the first patch) fixes a defect in the stop_dirty_log() usage. Dirty-page logging is started only for non-local migrations, but stop_dirty_log() could previously be executed even for local migrations. Therefore, we should call stop_dirty_log() only in the non-local migration case.
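Putting that together, the guarded call might look like the following once the truncated hunk above is completed; the error-handling body is an assumption for illustration, not a verbatim copy of the upstream code.

```rust
// Stop logging dirty pages only for non-local migrations: dirty-page
// logging is started only in the non-local path, so the teardown must be
// guarded the same way. (Error handling below is assumed, not verbatim.)
if !send_data_migration.local {
    if let Err(e) = vm.stop_dirty_log() {
        // A failure here should not undo an otherwise completed
        // migration; report it and continue tearing down the source VM.
        eprintln!("Error stopping dirty-page logging: {e:?}");
    }
}
```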
The VM size and memory pressure have a significant impact on the stop_dirty_log() overhead: the larger the VM memory and the greater the memory pressure, the greater the overhead. The data, obtained by calculating the time difference before and after stop_dirty_log() executes, is as follows:
[measurement table not preserved in this extract]
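The measurement method described above can be reproduced with a simple Instant-based wrapper. The `Vm` stub and function name below are hypothetical; only the before/after time difference mirrors what was measured.

```rust
use std::time::Instant;

// Hypothetical stub standing in for the real VM handle.
struct Vm;
impl Vm {
    fn stop_dirty_log(&mut self) -> Result<(), String> { Ok(()) }
}

// Time stop_dirty_log() exactly as described: take a timestamp before and
// after the call and report the difference.
fn timed_stop_dirty_log(vm: &mut Vm) -> Result<(), String> {
    let start = Instant::now();
    let result = vm.stop_dirty_log();
    println!("stop_dirty_log() took {:?}", start.elapsed());
    result
}

fn main() {
    let mut vm = Vm;
    // On the PR author's 32c64g guest under memory compression this
    // reportedly reached ~36 ms.
    timed_stop_dirty_log(&mut vm).unwrap();
}
```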
9d93df4
The larger the VM memory and the greater the memory pressure, the greater the stop_dirty_log() overhead. For example, for a 32c64g virtual machine (32 vCPUs, 64 GB of memory) with memory compression, stop_dirty_log() takes 36 ms. Moving stop_dirty_log() outside the downtime window can reduce downtime.