vmm: cpu: Alternative pause race avoidance using immediate_exit #7455
Conversation
rbradford commented on Nov 3, 2025
- vmm: properly unset immediate_exit on -EINTR
- vmm: cpu: Remove (now) unnecessary clearing of KVM set_immediate_exit()
- vmm: Use set_immediate_exit to avoid a race on pausing the vCPU
- vmm: cpu: Clear immediate_exit flag on resume
Also see [0] for more info. [0] https://docs.kernel.org/virt/kvm/api.html#the-kvm-run-structure Signed-off-by: Philipp Schuster <[email protected]> On-behalf-of: SAP [email protected]
This is now done as part of handling exits that result in an EINTR error. Signed-off-by: Rob Bradford <[email protected]>
A race can occur if the signal to the vCPU thread is sent between the shared state atomic being checked (e.g. for paused) and entering KVM_RUN. As such we can use KVM's immediate_exit boolean to make the KVM_RUN exit immediately. We only conditionally try to take the lock, since we can end up in a deadlock from a priority inversion situation where the ordering of signals doesn't match the ordering in which we take the locks. Fixes: cloud-hypervisor#7427 Signed-off-by: Rob Bradford <[email protected]>
This avoids an extra KVM_RUN exit upon resume where the immediate_exit flag is still set. Signed-off-by: Rob Bradford <[email protected]>
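The race the commits above describe can be modeled with standard-library primitives alone. The sketch below is illustrative, not the real kvm-ioctls API: `immediate_exit` stands in for KVM's `kvm_run.immediate_exit` flag, and a short sleep stands in for a blocking KVM_RUN ioctl. The point is that the VMM sets the flag *before* requesting the pause, so the vCPU loop cannot block indefinitely after having missed the pause check.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::Duration;

// Minimal model of the pause race, using std primitives only.
// `demo_pause` is a hypothetical name for this sketch, not a Cloud Hypervisor API.
fn demo_pause() -> bool {
    let immediate_exit = Arc::new(AtomicBool::new(false));
    let paused = Arc::new(AtomicBool::new(false));

    let ie = Arc::clone(&immediate_exit);
    let p = Arc::clone(&paused);
    let vcpu = thread::spawn(move || {
        loop {
            // The race window lives here: a pause request arriving after this
            // check but before the blocking "KVM_RUN" below would be missed.
            if p.load(Ordering::Acquire) {
                break;
            }
            if ie.swap(false, Ordering::AcqRel) {
                // immediate_exit was set: return as if KVM_RUN gave -EINTR
                // and re-check the pause state at the top of the loop.
                continue;
            }
            thread::sleep(Duration::from_millis(1)); // simulated KVM_RUN
        }
    });

    // VMM side: set immediate_exit *before* flipping the paused flag.
    immediate_exit.store(true, Ordering::Release);
    paused.store(true, Ordering::Release);
    vcpu.join().is_ok()
}

fn main() {
    assert!(demo_pause());
    println!("vCPU loop exited cleanly");
}
```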
Force-pushed from a5786b5 to 8b1ac03
```rust
Ok(mut vcpu) => {
    vcpu.vcpu.set_immediate_exit(true);
}
Err(std::sync::TryLockError::WouldBlock) => {}
```
Would it be better (e.g. reducing the race window further) if we call into signal_thread() here, rather than waiting until we have looped over all the vCPUs? Essentially, we would use vcpu.try_lock() to detect from the VMM thread whether the vCPU thread is executing KVM_RUN, and pick either set_immediate_exit(true) or signal_thread() to kick out the vCPU thread.
I think picking between two paths would actually increase the chance of a race (and setting immediate_exit or signalling are both cheap). One of the reasons we have to do it like this is that these loops are split. If we recombine them like this, I think we can avoid the race entirely (this reduces the chance further, but I think it could still theoretically happen):
```rust
fn signal_vcpus(&mut self) {
    // If the lock is unavailable it is almost certainly due to being in KVM_RUN itself so the
    // signal will kick it out.
    for vcpu_id in 0..self.vcpus.len() {
        match self.vcpus[vcpu_id].try_lock() {
            Ok(mut vcpu) => {
                vcpu.vcpu.set_immediate_exit(true);
            }
            Err(std::sync::TryLockError::WouldBlock) => {}
            Err(e) => {
                unreachable!("{e:?}")
            }
        }
        self.vcpu_states[vcpu_id].signal_thread();
        // In this case it's fine to wait for the vCPU run to make the lock available for the taking
        self.vcpus[vcpu_id]
            .lock()
            .unwrap()
            .vcpu
            .set_immediate_exit(true);
        self.vcpu_states[vcpu_id].wait_until_signal_acknowledged();
    }
}
```

We can't do these two attempts (one non-blocking, one blocking) with the split-loop variant, as you get a deadlock due to the signals not being handled in vCPU ordering.
These were split into two loops in this commit - c1f4df6
> I think picking between two paths would actually increase the chance of a race
Hmm... my view is the opposite. I thought that with such a change, we would further reduce the time window between the VMM thread detecting that the vCPU thread is in KVM_RUN and the VMM thread kicking the vCPU thread with a signal, which is the racing window. Reducing the race window would reduce the chances of a race, wouldn't it?
The change I am thinking is:
```rust
fn signal_vcpus(&self) {
    // If the lock is unavailable it is almost certainly due to being in KVM_RUN itself so the
    // signal will kick it out.
    for vcpu_id in 0..self.vcpus.len() {
        match self.vcpus[vcpu_id].try_lock() {
            Ok(mut vcpu) => {
                // vCPU thread is not in KVM_RUN, kick it out with set_immediate_exit
                vcpu.vcpu.set_immediate_exit(true);
            }
            Err(std::sync::TryLockError::WouldBlock) => {
                // vCPU thread is in KVM_RUN, kick it out with signal
                self.vcpu_states[vcpu_id].signal_thread();
            }
            Err(e) => {
                unreachable!("{e:?}")
            }
        }
    }
    for state in self.vcpu_states.iter() {
        state.wait_until_signal_acknowledged();
    }
}
```
Right - but the thing is that KVM_RUN isn't the only place that the lock is taken - it's the most likely. I think conditionally sending the signal could result in situations where the wait_until_signal_acknowledged spins forever.
I've gone ahead and added even more "defence in depth" - by making the interrupted signalling check and retry.
> Right - but the thing is that KVM_RUN isn't the only place that the lock is taken
I see. How about the following change? It should help with reducing the race further, while not assuming KVM_RUN is the only place where the vcpu lock is held.
```rust
fn signal_vcpus(&self) {
    // If the lock is unavailable it is almost certainly due to being in KVM_RUN itself so the
    // signal will kick it out.
    for vcpu_id in 0..self.vcpus.len() {
        match self.vcpus[vcpu_id].try_lock() {
            Ok(mut vcpu) => {
                // Try to avoid the race mentioned above, when the vcpu lock is not taken.
                vcpu.vcpu.set_immediate_exit(true);
            }
            Err(std::sync::TryLockError::WouldBlock) => {}
            Err(e) => {
                unreachable!("{e:?}")
            }
        }
        // Always send signal to kick out vCPU thread, as KVM_RUN is not the only place
        // that the vCPU lock could be taken.
        self.vcpu_states[vcpu_id].signal_thread();
    }
    for state in self.vcpu_states.iter() {
        state.wait_until_signal_acknowledged();
    }
}
```
Sure - let me incorporate that change - will tackle that tomorrow.
Force-pushed from 6061b65 to 273b3ec
vmm/src/cpu.rs (Outdated)

```rust
///
/// Returns whether the vCPU was interrupted within the 1 second timeout
fn wait_until_signal_acknowledged(&self) -> bool {
    let start = Instant::now();
```
Let's split such a commit into a separate PR, as this change addresses a different topic.
Also, we should report at least a warning if a timeout occurs here, as this likely suggests abnormal behavior that warrants our attention.
I've actually improved it since - but yes, let me split it out (and tbh, this change on its own would probably fix the lost signal issue in #7427 itself.)
By retrying the signalling on the vCPU thread if it doesn't respond within 1 second the race between pause being requested and the KVM_RUN ioctl returning can be mitigated. Signed-off-by: Rob Bradford <[email protected]>
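The retry-on-timeout idea in that commit can be sketched with std primitives. The names below (`wait_until_acknowledged`, `signal_with_retry`) and the bounded-retry shape are hypothetical, modeled on the `wait_until_signal_acknowledged` helper shown in the diff excerpt above; the warning line follows the reviewer's suggestion to report timeouts.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::time::{Duration, Instant};

/// Poll an "acknowledged" flag for up to `timeout`, returning whether the
/// vCPU thread responded in time (hypothetical sketch, not the real code).
fn wait_until_acknowledged(acked: &AtomicBool, timeout: Duration) -> bool {
    let start = Instant::now();
    while start.elapsed() < timeout {
        if acked.load(Ordering::Acquire) {
            return true;
        }
        std::thread::yield_now();
    }
    false
}

/// Re-deliver the signal if the vCPU doesn't acknowledge within 1 second,
/// warning on each timeout; bounded so a stuck vCPU can't spin us forever.
fn signal_with_retry(acked: &AtomicBool, max_attempts: u32, signal: impl Fn()) -> bool {
    for _ in 0..max_attempts {
        signal(); // e.g. deliver the kick signal to the vCPU thread
        if wait_until_acknowledged(acked, Duration::from_secs(1)) {
            return true;
        }
        // The signal may have been lost (the race in #7427): warn and retry.
        eprintln!("warning: vCPU did not acknowledge signal within 1s; retrying");
    }
    false
}

fn main() {
    let acked = AtomicBool::new(false);
    // An already-delivered "signal" acknowledges immediately here.
    let ok = signal_with_retry(&acked, 3, || acked.store(true, Ordering::Release));
    assert!(ok);
}
```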
Force-pushed from 273b3ec to b8a5afc
Just to make sure that I get it right: your high-level idea is that instead of doing (unsafe) stuff in the signal handler, we stick to the NOOP signal handler, and you instead tackle the challenge at the places that cause the signal to be sent? I think this could be a possible way forward, but we need some time to think about it and evaluate it. Thanks for working on this!

I've added our (= @cyberus-technology) analysis, statement, and recommendation in the issue #7427 (comment).

Closing as @phip1611 is going to raise a new PR with a proposed solution.