vmm: cpu_manager: massively accelerate .pause() #7290
Merged: alyssais merged 1 commit into cloud-hypervisor:main from phip1611:accelerate-vcpus-pause on Aug 20, 2025
Conversation
Force-pushed: 454da2c to 7c6dffb
alyssais (Member) reviewed on Aug 20, 2025 and left a comment:
Nice find!
With 254 vCPUs, pausing now takes ~4ms instead of >254ms. This improvement is visible when running `ch-remote pause` and is particularly important for live migration, where every millisecond of downtime matters.

For the wait logic, it is fine to stick to the approach of sleeping 1ms on the first missed ACK as:
1) we have to wait anyway
2) we give time to the OS, enabling it to schedule a vCPU thread next

Signed-off-by: Philipp Schuster <[email protected]>
On-behalf-of: SAP [email protected]
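The speedup comes from broadcasting the pause request to every vCPU thread first and only then waiting for the ACKs, so the per-vCPU latencies overlap instead of adding up. The following is a minimal, self-contained sketch of that pattern; all names (`pause_all`, the flag layout) are illustrative, not Cloud Hypervisor's actual code:

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::Duration;

// Broadcast the pause request once, then wait for every worker's ACK.
// Sleeping 1ms on a missed ACK is fine: we have to wait anyway, and the
// sleep gives the OS a chance to schedule the vCPU thread next.
fn pause_all(pause_requested: &AtomicBool, acks: &[Arc<AtomicBool>]) {
    pause_requested.store(true, Ordering::SeqCst); // one broadcast for all
    for ack in acks {
        while !ack.load(Ordering::SeqCst) {
            thread::sleep(Duration::from_millis(1));
        }
    }
}

fn main() {
    let pause_requested = Arc::new(AtomicBool::new(false));
    let acks: Vec<Arc<AtomicBool>> =
        (0..8).map(|_| Arc::new(AtomicBool::new(false))).collect();

    let handles: Vec<_> = acks
        .iter()
        .map(|ack| {
            let req = Arc::clone(&pause_requested);
            let ack = Arc::clone(ack);
            thread::spawn(move || {
                // "vCPU" loop: run until a pause is requested, then ACK it.
                while !req.load(Ordering::SeqCst) {
                    thread::yield_now();
                }
                ack.store(true, Ordering::SeqCst);
            })
        })
        .collect();

    pause_all(&pause_requested, &acks);
    assert!(acks.iter().all(|a| a.load(Ordering::SeqCst)));
    for h in handles {
        h.join().unwrap();
    }
    println!("all vCPUs paused");
}
```

With a sequential signal-then-wait per vCPU, a 1ms wait per thread accumulates to >254ms for 254 vCPUs; with broadcast-then-wait, the waits run concurrently.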
Force-pushed: 7c6dffb to 59eabe4
alyssais approved these changes on Aug 20, 2025
phip1611 added a commit to phip1611/cloud-hypervisor that referenced this pull request on Sep 1, 2025:
It is odd that for pause(), the CpuManager waited via `state.paused` for the vCPU thread to ACK the state change, but not for `resume()`. In the `resume()` case, the CpuManager oddly "owned" the state change in `state.paused`. This commit changes this so that the vCPU's ACK of its state change from pause->run is also gracefully recognized by `CpuManager::resume()`.

This change ensures proper synchronization and prevents situations in which park() follows right after unpark(), causing deadlocks and other weird behavior due to race conditions. Calling resume() now takes slightly longer, very similar to pause(). This is, however, even for 254 vCPUs, in the range of less than 10ms.

## Reproducer

`ch-remote --api-socket ... pause`

```patch
diff --git a/vmm/src/vm.rs b/vmm/src/vm.rs
index d7bba25..35557d58f 100644
--- a/vmm/src/vm.rs
+++ b/vmm/src/vm.rs
@@ -2687,6 +2687,10 @@ impl Pausable for Vm {
             MigratableError::Pause(anyhow!("Error activating pending virtio devices: {:?}", e))
         })?;

+        for _ in 0..1000 {
+            self.cpu_manager.lock().unwrap().pause()?;
+            self.cpu_manager.lock().unwrap().resume()?;
+        }
         self.cpu_manager.lock().unwrap().pause()?;
         self.device_manager.lock().unwrap().pause()?;
```

Since [0] is merged, this fix can be tested, for example, by modifying the pause() API call to run pause() and resume() in a loop a thousand times. With this change, things no longer get stuck.

## Outlook

Decades of experience in VMM development have shown us that using many AtomicBools is a footgun. They are not synchronized with each other at all. In the long term, we might want to refactor things to have a single AtomicU64 with different bits having different meanings.

[0] cloud-hypervisor#7290
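The essence of the fix is that the vCPU thread owns its `paused` flag for both transitions, and `resume()` waits for the flag to clear just as `pause()` waits for it to set. A minimal sketch of that idea follows; names like `resume_all` and the flag layout are ours, not Cloud Hypervisor's:

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::Duration;

// On resume, the manager only withdraws the pause request; the vCPU
// clears `paused` itself, and the manager waits for that ACK before
// returning. A pause() issued immediately afterwards can therefore no
// longer race a half-finished resume (park() right after unpark()).
fn resume_all(pause_requested: &AtomicBool, paused_flags: &[Arc<AtomicBool>]) {
    pause_requested.store(false, Ordering::SeqCst);
    for paused in paused_flags {
        while paused.load(Ordering::SeqCst) {
            thread::sleep(Duration::from_millis(1)); // same 1ms backoff as pause()
        }
    }
}

fn main() {
    let pause_requested = Arc::new(AtomicBool::new(true)); // start paused
    let paused = Arc::new(AtomicBool::new(true));

    let req = Arc::clone(&pause_requested);
    let flag = Arc::clone(&paused);
    let vcpu = thread::spawn(move || {
        // vCPU side: wait until the pause request is withdrawn, then ACK
        // the transition back to running by clearing its own flag.
        while req.load(Ordering::SeqCst) {
            thread::yield_now();
        }
        flag.store(false, Ordering::SeqCst);
    });

    resume_all(&pause_requested, &[Arc::clone(&paused)]);
    assert!(!paused.load(Ordering::SeqCst)); // returns only after the ACK
    vcpu.join().unwrap();
    println!("resumed");
}
```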
github-merge-queue bot pushed a commit that referenced this pull request on Oct 22, 2025:
Fix a race condition that happens in resume()-pause() cycles.

It is odd that for pause(), the CpuManager waited via `state.paused` for the vCPU thread to ACK the state change, but not for `resume()`. In the `resume()` case, the CpuManager oddly "owned" the state change in `state.paused`. This commit changes this so that the vCPU ACKs its state change itself in `state.paused` when it transitions from pause->run. Further, `CpuManager::resume()` now gracefully waits for the vCPU to be resumed.

More technically: this change ensures proper synchronization and prevents situations in which park() follows right after unpark(), causing deadlocks and other weird behavior due to race conditions. Calling resume() now takes slightly longer, very similar to pause(). This is, however, even for 254 vCPUs, in the range of less than 10ms, and ultimately we now have correct behaviour.

## Reproducer

Since [0] is merged, the underlying problem can be tested without this commit by modifying the pause() API call to run `CpuManager::pause()` and `CpuManager::resume()` in a loop a thousand times.

`ch-remote --api-socket ... pause`

```patch
diff --git a/vmm/src/vm.rs b/vmm/src/vm.rs
index d7bba25..35557d58f 100644
--- a/vmm/src/vm.rs
+++ b/vmm/src/vm.rs
@@ -2687,6 +2687,10 @@ impl Pausable for Vm {
             MigratableError::Pause(anyhow!("Error activating pending virtio devices: {:?}", e))
         })?;

+        for _ in 0..1000 {
+            self.cpu_manager.lock().unwrap().pause()?;
+            self.cpu_manager.lock().unwrap().resume()?;
+        }
         self.cpu_manager.lock().unwrap().pause()?;
         self.device_manager.lock().unwrap().pause()?;
```

## Outlook

Decades of experience in VMM development have shown us that using many AtomicBools is a footgun. They are not synchronized with each other at all. In the long term, we might want to refactor things to have a single shared AtomicU64 with different bits having different meanings.

[0] #7290

Signed-off-by: Philipp Schuster <[email protected]>
On-behalf-of: SAP [email protected]
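The "Outlook" idea of replacing many AtomicBools with one AtomicU64 can be sketched as follows; the bit assignments and helper names here are illustrative, not an API Cloud Hypervisor defines:

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Pack related state bits into one word so they are updated and observed
// together. Two separate AtomicBools can be read out of sync with each
// other; a single AtomicU64 load sees one consistent snapshot.
const PAUSE_REQUESTED: u64 = 1 << 0;
const PAUSED_ACK: u64 = 1 << 1;

fn request_pause(state: &AtomicU64) {
    state.fetch_or(PAUSE_REQUESTED, Ordering::SeqCst);
}

fn ack_pause(state: &AtomicU64) {
    state.fetch_or(PAUSED_ACK, Ordering::SeqCst);
}

fn is_fully_paused(state: &AtomicU64) -> bool {
    // Both bits come from one atomic load.
    let s = state.load(Ordering::SeqCst);
    s & (PAUSE_REQUESTED | PAUSED_ACK) == (PAUSE_REQUESTED | PAUSED_ACK)
}

fn main() {
    let state = AtomicU64::new(0);
    request_pause(&state);
    assert!(!is_fully_paused(&state)); // requested but not yet ACKed
    ack_pause(&state);
    assert!(is_fully_paused(&state));
    println!("ok");
}
```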