Conversation

@phip1611
Member

@phip1611 phip1611 commented Aug 18, 2025

This adds the PoC for vCPU-throttling/auto-converge for pre-copy migration.

vm-migration: add vCPU throttling (auto-converge) for pre-copy

About

auto-converge (vCPU throttling) is a crucial technique to migrate
VMs with a high dirty rate (high working set with intense usage).
It is an alternative to postcopy migration, which is not yet
implemented in Cloud Hypervisor.

Implementation

vCPU throttling is implemented with a dedicated thread and a
manager for that thread. vCPU throttling can be aborted, for
example when a live-migration is aborted.

The rather complex thread state management is covered by unit
tests that ensure liveness throughout all possible scenarios.

The throttling itself is implemented by utilizing the CpuManager's
pause() and resume() functions.

Effective Behaviour

  • auto-converging always starts after four memory delta transfer
    iterations (not configurable yet)
  • after that, the throttling percentage is increased every two
    iterations
  • step size is 10%
  • maximum is 99%
  • time window for throttling is 100ms. So with 99% throttling,
    the vCPU is paused for 99ms and runs for 1ms, then the cycle
    repeats.
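
The window arithmetic from the last bullet can be sketched as follows. This is a minimal illustration, not code from the PR: `split_window` and `WINDOW` are hypothetical names, and the fixed 100ms window and the 99% cap are taken from the list above.

```rust
use std::time::Duration;

/// Length of one throttling window, as stated in the description above.
const WINDOW: Duration = Duration::from_millis(100);

/// Splits one window into (paused, running) durations for a given throttle
/// percentage. Hypothetical helper for illustration, not taken from the PR.
fn split_window(throttle_percent: u32) -> (Duration, Duration) {
    // The description caps throttling at 99%.
    let percent = throttle_percent.min(99);
    let paused = WINDOW * percent / 100;
    (paused, WINDOW - paused)
}
```

With these assumptions, `split_window(99)` yields 99ms paused and 1ms running, matching the example in the list.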

Signed-off-by: Philipp Schuster [email protected]
On-behalf-of: SAP [email protected]

Hints for Reviewers

  • Review this commit by commit
  • We did this so that SAP can migrate their memory-intensive workloads in the pilot at all.

Steps to Undraft

@phip1611 phip1611 self-assigned this Aug 18, 2025
@phip1611 phip1611 force-pushed the cyberus-fork-poc-precopy-autoconverge branch from 86db5eb to f2978b1 Compare August 18, 2025 15:03
@phip1611 phip1611 force-pushed the cyberus-fork-poc-precopy-autoconverge branch from f2978b1 to d7003ea Compare August 18, 2025 15:10
@phip1611 phip1611 marked this pull request as draft August 18, 2025 15:11
@phip1611 phip1611 changed the title from "PoC: vCPU throttling/auto-converge" to "gardenlinux: PoC: vCPU throttling/auto-converge" Aug 18, 2025
@tpressure

I just stress-tested this and ran into an issue after 174 successful migrations. I'm now in a state where the receiver side is at 100% CPU whereas the sender side is idling, and the migration does not continue.

@phip1611 phip1611 force-pushed the cyberus-fork-poc-precopy-autoconverge branch 2 times, most recently from 5554499 to 36985cb Compare August 19, 2025 13:46
olivereanderson

This comment was marked as outdated.

@phip1611

This comment was marked as outdated.

@phip1611 phip1611 force-pushed the cyberus-fork-poc-precopy-autoconverge branch from 36985cb to 7b8e12a Compare August 20, 2025 06:21
@olivereanderson

This comment was marked as outdated.

@olivereanderson

This comment was marked as outdated.

@phip1611

This comment was marked as outdated.

@phip1611 phip1611 force-pushed the cyberus-fork-poc-precopy-autoconverge branch 6 times, most recently from b291803 to 464bc74 Compare August 22, 2025 11:14
@phip1611 phip1611 marked this pull request as ready for review August 22, 2025 11:46
@phip1611 phip1611 force-pushed the cyberus-fork-poc-precopy-autoconverge branch from 464bc74 to fc98bdb Compare August 22, 2025 11:49
vmm/src/vm.rs Outdated
VmState::Created
};

let vcpu_throttle_thread_handle = ThrottleThreadHandle::new_from_cpu_manager(&cpu_manager);
Member Author

Suggested change
- let vcpu_throttle_thread_handle = ThrottleThreadHandle::new_from_cpu_manager(&cpu_manager);
+ let vcpu_throttler = ThrottleThreadHandle::new_from_cpu_manager(&cpu_manager);

@phip1611 phip1611 force-pushed the cyberus-fork-poc-precopy-autoconverge branch 3 times, most recently from 502eaff to f4bd5b4 Compare August 22, 2025 13:04

@olivereanderson olivereanderson left a comment

Thank you for the changes and especially for the awesome documentation that made it relatively easy to follow what the code in this commit does!

I have added a few comments below, which you can either address now or defer until you start preparing this feature for production. See also the list on this issue.

Moreover, once we get to making this production ready: if we decide to spawn the throttling thread per (full) migration attempt instead (potentially with a seccomp setup), then, as already mentioned in direct messages, it should be possible to simplify the code a bit by having only a single loop in the throttling thread.

Comment on lines 454 to 603
fn test_vcpu_throttling_thread_lifecycle() {
    for _ in 0..5 {
        // State transitions: Waiting -> Exit
        {
            let mut handler = ThrottleThreadHandle::new(Box::new(|| {}), Box::new(|| {}));

            // The test is successful if it does not get stuck.
            handler.shutdown();
        }

        // Dummy CpuManager
        let cpus_throttled = Arc::new(AtomicBool::new(false));
        let callback_pause_vcpus = {
            let cpus_running = cpus_throttled.clone();
            Box::new(move || {
                let old = cpus_running.swap(true, Ordering::SeqCst);
                assert!(!old);
            })
        };
        let callback_resume_vcpus = {
            let cpus_running = cpus_throttled.clone();
            Box::new(move || {
                let old = cpus_running.swap(false, Ordering::SeqCst);
                assert!(old);
            })
        };

        // State transitions: Waiting -> Throttle -> Waiting -> Throttle -> Exit
        {
            let mut handler =
                ThrottleThreadHandle::new(callback_pause_vcpus, callback_resume_vcpus);
            handler.set_throttle_percent(5);
            sleep(Duration::from_millis(ThrottleWorker::TIMESLICE_MS));
            handler.set_throttle_percent(10);
            sleep(Duration::from_millis(ThrottleWorker::TIMESLICE_MS));

            // Assume we aborted vCPU throttling (or the live-migration at all).
            handler.set_throttle_percent(0 /* reset to waiting */);
            handler.set_throttle_percent(5);
            sleep(Duration::from_millis(ThrottleWorker::TIMESLICE_MS));
            handler.set_throttle_percent(10);
            sleep(Duration::from_millis(ThrottleWorker::TIMESLICE_MS));

            // The test is successful if we don't have a panic here due to a
            // closed channel.
            for _ in 0..10 {
                handler.shutdown();
                sleep(Duration::from_millis(1));
            }

            // The test is successful if it does not get stuck.
            drop(handler);
        }
    }
}

This will suffice for the PoC, but when you start preparing this for production, it would be cool if you instead inserted callbacks that record certain events (e.g. the timestamps when they get called, how many times they are called, etc.) and checked that this (roughly) corresponds to what one would expect.
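
A rough sketch of what such recording callbacks could look like. Everything here is hypothetical (`Recorder`, `Event`, and the `Box<dyn Fn() + Send>` callback type are assumptions for illustration); the callback type accepted by `ThrottleThreadHandle::new` may differ.

```rust
use std::sync::{Arc, Mutex};
use std::time::{Duration, Instant};

/// Kind of callback invocation that was observed.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Event {
    Pause,
    Resume,
}

/// Shared log of callback invocations with their timestamps.
/// Hypothetical test helper, not part of the PR.
#[derive(Clone, Default)]
struct Recorder {
    events: Arc<Mutex<Vec<(Event, Instant)>>>,
}

impl Recorder {
    /// Builds a callback that appends `event` and the current time to the log.
    fn callback(&self, event: Event) -> Box<dyn Fn() + Send> {
        let events = Arc::clone(&self.events);
        Box::new(move || {
            events.lock().unwrap().push((event, Instant::now()));
        })
    }

    /// Number of recorded invocations of the given kind.
    fn count(&self, event: Event) -> usize {
        let events = self.events.lock().unwrap();
        events.iter().filter(|(e, _)| *e == event).count()
    }

    /// True if the log is Pause, Resume, Pause, Resume, ...
    fn strictly_alternating(&self) -> bool {
        let events = self.events.lock().unwrap();
        events.iter().enumerate().all(|(i, (e, _))| {
            *e == if i % 2 == 0 { Event::Pause } else { Event::Resume }
        })
    }

    /// Total time between each Pause and the Resume that follows it.
    /// Only meaningful if `strictly_alternating()` holds.
    fn paused_time(&self) -> Duration {
        let events = self.events.lock().unwrap();
        events
            .chunks_exact(2)
            .map(|pair| pair[1].1.duration_since(pair[0].1))
            .sum()
    }
}
```

A test could hand `rec.callback(Event::Pause)` and `rec.callback(Event::Resume)` to the throttle thread, let it run for a few timeslices, and then compare `rec.paused_time()` against the expected share of the elapsed time with a generous tolerance.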

/// function must not perform any artificial delay itself.
/// - `callback_resume_vcpus`: Function putting all vCPUs back into running
/// state. The function must not perform any artificial delay itself.
fn new(

Nit:

Suggested change
- fn new(
+ fn spawn(

///
/// # Parameters
/// - `cpu_manager`: CPU manager to pause and resume vCPUs
pub fn new_from_cpu_manager(cpu_manager: &Arc<Mutex<CpuManager>>) -> Self {

Nit:

Suggested change
- pub fn new_from_cpu_manager(cpu_manager: &Arc<Mutex<CpuManager>>) -> Self {
+ pub fn spawn_with_cpu_manager(cpu_manager: Arc<Mutex<CpuManager>>) -> Self {

@phip1611 phip1611 force-pushed the cyberus-fork-poc-precopy-autoconverge branch from f4bd5b4 to 55a1549 Compare August 25, 2025 09:16
@phip1611 phip1611 marked this pull request as draft August 27, 2025 14:24
@phip1611 phip1611 force-pushed the cyberus-fork-poc-precopy-autoconverge branch 4 times, most recently from c2bc313 to ff9d41e Compare September 3, 2025 10:55
@phip1611 phip1611 force-pushed the cyberus-fork-poc-precopy-autoconverge branch 3 times, most recently from 3dd8f15 to 0b82a8f Compare September 12, 2025 11:55
auto-converge (vCPU throttling) is a technique combined with precopy
live-migration flows to migrate VMs with a high dirty rate
(high working set with many writes). It is an alternative to postcopy
migration, which is not yet implemented in Cloud Hypervisor.

By throttling the vCPUs incrementally, the dirty rate drops and the
VM migrates (converges) eventually. More specifically, the reduced
dirty rate ensures that the configured downtime can be reached.

The implementation is inspired by QEMU, but adapted to Cloud
Hypervisor. Various discussions, intermediate steps, and experiments
led to this final result.

vCPU throttling was implemented with a dedicated thread and a
manager for that thread. This thread utilizes the CpuManager's
pause() and resume() in conjunction with (interruptible) sleeps
to apply the current throttling percentage to the vCPUs, and thus
to the VM. The implementation is designed to not block or delay
normal operation any longer than necessary.
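
As a rough illustration of the idea (not the PR's code), a throttle thread with interruptible sleeps could look like the sketch below. It assumes a fixed 100ms timeslice and uses `recv_timeout` on a channel as the interruptible sleep; `Throttler`, `Command`, `worker`, and the callback type are hypothetical names, and the real implementation drives `CpuManager` instead of plain closures.

```rust
use std::sync::mpsc::{self, Receiver, RecvTimeoutError, Sender};
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// Hypothetical callback type; the PR's actual signature may differ.
type Callback = Box<dyn Fn() + Send + 'static>;

/// Assumed fixed window, for simplicity of the sketch.
const TIMESLICE: Duration = Duration::from_millis(100);

enum Command {
    SetPercent(u32),
    Shutdown,
}

fn worker(rx: Receiver<Command>, pause: Callback, resume: Callback) {
    let mut percent = 0u32;
    loop {
        let cmd = if percent == 0 {
            // Waiting state: block until a command arrives.
            rx.recv().map_err(|_| RecvTimeoutError::Disconnected)
        } else {
            let paused_for = TIMESLICE * percent / 100;
            pause();
            // recv_timeout doubles as an interruptible sleep.
            let cmd = rx.recv_timeout(paused_for);
            // Always resume before handling a command, so the vCPUs are
            // never left paused on shutdown or reset.
            resume();
            match cmd {
                Err(RecvTimeoutError::Timeout) => rx.recv_timeout(TIMESLICE - paused_for),
                other => other,
            }
        };
        match cmd {
            Ok(Command::SetPercent(p)) => percent = p.min(99),
            Err(RecvTimeoutError::Timeout) => {} // start the next cycle
            Ok(Command::Shutdown) | Err(RecvTimeoutError::Disconnected) => return,
        }
    }
}

struct Throttler {
    tx: Sender<Command>,
    thread: Option<JoinHandle<()>>,
}

impl Throttler {
    fn spawn(pause: Callback, resume: Callback) -> Self {
        let (tx, rx) = mpsc::channel();
        let thread = thread::spawn(move || worker(rx, pause, resume));
        Self { tx, thread: Some(thread) }
    }

    fn set_throttle_percent(&self, percent: u32) {
        // Ignoring a send error keeps this safe to call after shutdown.
        let _ = self.tx.send(Command::SetPercent(percent));
    }

    fn shutdown(&mut self) {
        let _ = self.tx.send(Command::Shutdown);
        if let Some(thread) = self.thread.take() {
            let _ = thread.join();
        }
    }
}
```

Because every command interrupts the current sleep, a changed percentage or a shutdown takes effect without waiting for the rest of the timeslice.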

The proposed design relies on the recent improvements
and fixes for CpuManager's pause() and resume(). For correctness,
on each pause/resume cycle, the time for these actions is measured.
This way, a dynamic timeslice can be used, guaranteeing the VM
is indeed throttled at the intended percentage.
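
One way such a dynamic timeslice could be derived is sketched below. This is an assumption for illustration only: the commit message does not spell out the formula, `plan_cycle`, `Cycle`, and `BASE_TIMESLICE` are hypothetical, and the sketch assumes the vCPUs are stopped for the whole measured pause()/resume() time.

```rust
use std::time::Duration;

/// Assumed nominal length of one pause/run cycle.
const BASE_TIMESLICE: Duration = Duration::from_millis(100);

/// Plan for one cycle: how long to sleep while the vCPUs are paused, and
/// how long to let them run afterwards. Hypothetical type.
#[derive(Debug, PartialEq)]
struct Cycle {
    sleep_while_paused: Duration,
    run: Duration,
}

/// Computes a cycle so that the stopped share (measured overhead plus sleep)
/// matches `percent`. `overhead` is the measured time spent in pause() and
/// resume() during the previous cycle.
fn plan_cycle(percent: u32, overhead: Duration) -> Cycle {
    let percent = percent.clamp(1, 99);
    let target_stopped = BASE_TIMESLICE * percent / 100;
    if overhead <= target_stopped {
        // The overhead fits into the paused share: sleep only for the rest.
        Cycle {
            sleep_while_paused: target_stopped - overhead,
            run: BASE_TIMESLICE - target_stopped,
        }
    } else {
        // The overhead alone exceeds the paused share: skip the sleep and
        // stretch the run phase so the ratio still holds.
        Cycle {
            sleep_while_paused: Duration::ZERO,
            run: overhead * (100 - percent) / percent,
        }
    }
}
```

For example, at 10% throttling with a measured overhead of 20ms, this plan skips the sleep and runs the vCPUs for 180ms, so they are stopped for 20ms out of 200ms.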

Although not supported yet by Cloud Hypervisor, this thread will
support throttling cancellation when live-migrations are cancelled.

This was intensively tested in an automated setup with thousands
of live-migrations with VMs under load.

- auto-converging always starts after two memory delta transfer
  iterations
- after that, throttling is increased every two iterations
  (step size is 10%)
- maximum throttling is 99%
- the VM will get slower. At 99% throttling, it will be unsurprisingly
  barely usable. This is something users have to accept if they want to
  migrate their VMs running heavy workloads.
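
The schedule described in the list above can be sketched as a small function. This is a hedged illustration: `throttle_percent` is a hypothetical helper, and the counting convention (completed iterations, first step applied once exactly two have finished) is an assumption.

```rust
/// Throttle percentage to apply once `completed_iterations` memory delta
/// transfer iterations have finished. Hypothetical helper, not from the PR.
fn throttle_percent(completed_iterations: u32) -> u8 {
    const START_AFTER: u32 = 2; // throttling starts after two iterations
    const INCREASE_EVERY: u32 = 2; // then steps up every two iterations
    const STEP: u32 = 10; // step size in percent
    const MAX: u32 = 99; // maximum throttling in percent

    if completed_iterations < START_AFTER {
        return 0;
    }
    let steps = (completed_iterations - START_AFTER) / INCREASE_EVERY + 1;
    steps.saturating_mul(STEP).min(MAX) as u8
}
```

Under these assumptions, throttling is 10% after two iterations, 20% after four, and reaches the 99% cap after twenty.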

Signed-off-by: Philipp Schuster <[email protected]>
Reviewed-by: Stefan Kober <[email protected]>
Reviewed-by: Oliver Anderson <[email protected]>
Reviewed-by: Thomas Prescher <[email protected]>
On-behalf-of: SAP [email protected]
@phip1611 phip1611 force-pushed the cyberus-fork-poc-precopy-autoconverge branch from 0b82a8f to 1b1cabb Compare September 12, 2025 12:52
@phip1611 phip1611 marked this pull request as ready for review September 15, 2025 07:58
@phip1611 phip1611 merged commit 5c96d5f into cyberus-technology:gardenlinux Sep 15, 2025
20 of 21 checks passed
@phip1611 phip1611 deleted the cyberus-fork-poc-precopy-autoconverge branch September 15, 2025 07:59