@phip1611 phip1611 commented Oct 7, 2025

This PR prepares the ground for live-migration statistics. For that, we need to ensure that SendMigration no longer blocks the VMM thread's event loop for the whole duration of a migration ("asynchronization").

Further, this PR lays the groundwork to prevent VM change events (resize, hotplug) when a migration is ongoing.

Changes

  • EventFD logic to sync VMM from the VM Migration Thread
  • Put migration into dedicated thread that takes ownership of the VM from the VMM
  • Let all endpoints know when a migration is ongoing (the VMM knows who currently owns the VM)
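
The first point (the migration thread notifying the VMM without blocking its event loop) can be sketched roughly as follows. This is only an illustration under stated assumptions: the PR uses an EventFD, whereas this sketch swaps in a `std::sync::mpsc` channel so that it compiles without extra crates. `VmmEvent`, `spawn_migration_worker` and `run_event_loop` are hypothetical names, not identifiers from the PR.

```rust
use std::sync::mpsc::{self, Receiver, RecvTimeoutError};
use std::thread;
use std::time::Duration;

/// Hypothetical event the migration thread reports to the VMM.
#[derive(Debug, PartialEq)]
enum VmmEvent {
    MigrationFinished { success: bool },
}

/// Starts a worker that pretends to migrate and then notifies the VMM.
/// The channel is a stand-in for the EventFD used in the PR.
fn spawn_migration_worker() -> Receiver<VmmEvent> {
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // Placeholder for the actual migration work.
        thread::sleep(Duration::from_millis(20));
        // Ignore the error case where the receiver is already gone.
        let _ = tx.send(VmmEvent::MigrationFinished { success: true });
    });
    rx
}

/// Simplified VMM loop: it keeps iterating (and could serve other events)
/// until the migration worker reports a result.
/// Returns the number of idle iterations and the received event.
fn run_event_loop(events: Receiver<VmmEvent>) -> (u32, VmmEvent) {
    let mut idle_iterations = 0;
    loop {
        match events.recv_timeout(Duration::from_millis(5)) {
            Ok(event) => return (idle_iterations, event),
            // No notification yet: the loop is free to do other work.
            Err(RecvTimeoutError::Timeout) => idle_iterations += 1,
            // The worker vanished without reporting: treat it as a failure.
            Err(RecvTimeoutError::Disconnected) => {
                return (idle_iterations, VmmEvent::MigrationFinished { success: false })
            }
        }
    }
}
```

Calling `run_event_loop(spawn_migration_worker())` returns once the worker has reported its result; until then, the loop is not blocked by the migration itself.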

Please review this commit-by-commit.

Closes https://github.com/cobaltcore-dev/cobaltcore/issues/293 and https://github.com/cobaltcore-dev/cobaltcore/issues/230

Steps Before Merge

  • passes libvirt-tests
  • Team agrees on the design

@phip1611 phip1611 changed the title vmm: live migration asynchronization (dedicated thread) [WIP] vmm: live migration asynchronization (dedicated thread) Oct 27, 2025
@phip1611 phip1611 changed the title [WIP] vmm: live migration asynchronization (dedicated thread) vmm: live migration asynchronization (dedicated thread) Oct 30, 2025
@phip1611 phip1611 self-assigned this Oct 30, 2025
@phip1611 phip1611 requested review from amphi and tpressure October 30, 2025 11:50
@phip1611 phip1611 force-pushed the asynchronization branch 4 times, most recently from c18c399 to c1055f8 Compare November 11, 2025 13:53
@phip1611 phip1611 force-pushed the asynchronization branch 2 times, most recently from 7a7e248 to 7d1e566 Compare November 18, 2025 11:47
@phip1611 phip1611 marked this pull request as ready for review November 18, 2025 11:47
@phip1611 phip1611 force-pushed the asynchronization branch 3 times, most recently from 18baf46 to ec84b5d Compare November 18, 2025 12:21
@olivereanderson olivereanderson left a comment

This looks quite good.

I mostly have some nits and there are a few comments that I think can be improved.

I am wondering whether a design using the scoped thread API could also work here, but that does not need to be explored now.

@phip1611 phip1611 marked this pull request as draft November 20, 2025 12:34
@phip1611 phip1611 marked this pull request as ready for review November 21, 2025 11:22
@phip1611

@olivereanderson since your review, I've merged VmMigrationGuard and MigrationThread into a single struct. Further, the channel used to transfer the VM ownership is gone; I now return the VM via the JoinHandle. Thanks for the suggestion! It makes things much simpler. Last but not least, I've split the first commit into two.
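
For readers unfamiliar with the pattern, returning ownership via the JoinHandle can be sketched like this. It is a minimal illustration, not the code from the PR: `Vm`, `MigrationOutcome`, `spawn_migration` and `reclaim_vm` are hypothetical names, and the migration work is a placeholder.

```rust
use std::thread::{self, JoinHandle};

/// Hypothetical stand-in for the real VM type.
#[derive(Debug)]
struct Vm {
    name: String,
    migrated_pages: u64,
}

/// What the migration thread hands back: the VM itself plus the result
/// of the migration attempt (the error type is simplified to a String).
type MigrationOutcome = (Vm, Result<(), String>);

/// Moves the VM into a dedicated thread. While the thread runs, the
/// caller no longer owns the VM and therefore cannot modify it.
fn spawn_migration(mut vm: Vm) -> JoinHandle<MigrationOutcome> {
    thread::spawn(move || {
        // Placeholder for the actual migration work.
        vm.migrated_pages += 1024;
        (vm, Ok(()))
    })
}

/// Joins the thread and takes the VM back. No extra channel is needed,
/// because the thread's return value travels through the JoinHandle.
fn reclaim_vm(handle: JoinHandle<MigrationOutcome>) -> MigrationOutcome {
    handle.join().expect("migration thread panicked")
}
```

A caller would keep the handle returned by `spawn_migration(vm)` and call `reclaim_vm(handle)` once it has been notified that the migration is over.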

Code diff should be visible here: https://github.com/cyberus-technology/cloud-hypervisor/compare/6eab06b350aa0b504874a652735a328e23beaa21..937f8804a84ffb96ca8fcb9c2e8d279dde49933d

@phip1611 phip1611 force-pushed the asynchronization branch 3 times, most recently from 2aa0c4f to 4a372b4 Compare November 21, 2025 11:39
@phip1611 phip1611 requested review from scholzp and removed request for tpressure November 21, 2025 12:52
@scholzp scholzp left a comment

LGTM! Only one thing I have a question about.

@scholzp scholzp self-requested a review November 21, 2025 14:42
@phip1611 phip1611 force-pushed the asynchronization branch 2 times, most recently from f9db2a3 to 1ae2ba4 Compare November 24, 2025 12:26
@amphi amphi left a comment

Looks good to me now.

This is a pre-requisite for the following commit which puts the
migration into a dedicated thread. It allows the VMM to react to
migration events (success/failure).

The commit series was inspired by @ljcore [0] but was changed quite
significantly.

[0] cloud-hypervisor#7038

Signed-off-by: Philipp Schuster <[email protected]>
On-behalf-of: SAP [email protected]
This puts the send-migration action into a dedicated thread. This
means:

1. The send-migration call will return sooner (it only triggers the
   migration).
2. Other API calls will not be possible, as the VM's ownership is
   transferred from the VMM to the migration thread. E.g., hotplugging
   won't work (which is good).
3. If the migration causes the VMM process to crash, this currently
   can't be observed. A mechanism to query the migration status doesn't
   exist.

Signed-off-by: Philipp Schuster <[email protected]>
On-behalf-of: SAP [email protected]
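
The second point of the commit message above (API calls fail while the migration thread owns the VM) could look roughly like the following sketch. All names (`VmOwnership`, `ApiError`, `Vmm::resize`, and so on) are assumptions made for illustration and are not taken from the PR.

```rust
/// Hypothetical stand-in for the real VM type.
#[derive(Debug, PartialEq)]
struct Vm {
    vcpus: u32,
}

/// Hypothetical API errors.
#[derive(Debug, PartialEq)]
enum ApiError {
    NoVm,
    MigrationOngoing,
}

/// Who currently owns the VM, from the VMM's point of view.
#[derive(Debug)]
enum VmOwnership {
    /// No VM has been created.
    None,
    /// The VMM owns the VM and may modify it.
    Vmm(Vm),
    /// The VM was handed to the migration thread.
    MigrationThread,
}

struct Vmm {
    vm: VmOwnership,
}

impl Vmm {
    /// Central access point: every modifying endpoint goes through here,
    /// so all of them are rejected during a migration.
    fn vm_mut(&mut self) -> Result<&mut Vm, ApiError> {
        match &mut self.vm {
            VmOwnership::Vmm(vm) => Ok(vm),
            VmOwnership::MigrationThread => Err(ApiError::MigrationOngoing),
            VmOwnership::None => Err(ApiError::NoVm),
        }
    }

    /// Example of a VM change event (resize).
    fn resize(&mut self, vcpus: u32) -> Result<(), ApiError> {
        self.vm_mut()?.vcpus = vcpus;
        Ok(())
    }

    /// Takes the VM out of the VMM so it can be moved into a thread.
    fn start_migration(&mut self) -> Result<Vm, ApiError> {
        match std::mem::replace(&mut self.vm, VmOwnership::MigrationThread) {
            VmOwnership::Vmm(vm) => Ok(vm),
            other => {
                let err = if matches!(other, VmOwnership::MigrationThread) {
                    ApiError::MigrationOngoing
                } else {
                    ApiError::NoVm
                };
                // Restore the previous state; nothing was handed out.
                self.vm = other;
                Err(err)
            }
        }
    }

    /// Hands the VM back to the VMM, e.g. after a failed migration.
    fn finish_migration(&mut self, vm: Vm) {
        self.vm = VmOwnership::Vmm(vm);
    }
}
```

Because the VM is moved out of the VMM, the "migration ongoing" state needs no separate flag that could get out of sync: the missing VM is the flag.
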
This commit prepares proper handling of API events during ongoing
live-migrations. The VmInfo call currently does not work while a VM is
migrating. This will be addressed in a follow-up as part of the
statistics about ongoing live-migrations.

Signed-off-by: Philipp Schuster <[email protected]>
On-behalf-of: SAP [email protected]
Once we have a mechanism to query the progress of an ongoing
live-migration, we can remove this workaround.

Signed-off-by: Philipp Schuster <[email protected]>
On-behalf-of: SAP [email protected]
Signed-off-by: Philipp Schuster <[email protected]>
On-behalf-of: SAP [email protected]
@phip1611 phip1611 enabled auto-merge (rebase) November 24, 2025 12:48
@phip1611 phip1611 merged commit dc905d9 into cyberus-technology:gardenlinux Nov 24, 2025
14 checks passed
@phip1611 phip1611 deleted the asynchronization branch November 24, 2025 13:12

4 participants