Skip to content

Conversation

@phip1611
Copy link
Member

@phip1611 phip1611 commented Sep 3, 2025

About

vmm: fix kicking vCPU out of kvm run vCPU from signal handler

More details in the commit messages!

Checklist (for Author)

  • libvirt-tests
    pipeline succeeded (currently this must be done manually locally)
  • PR associated with ticket

Hints for Reviewers

  • please note that we need this to unblock the vCPU auto-converge feature
  • we nevertheless can improve this code here, but we should be pragmatic

Steps to Undraft (if draft)

None.

@phip1611 phip1611 self-assigned this Sep 3, 2025
@phip1611 phip1611 force-pushed the cyberus-fork-poc-signal-delivery-fix branch 2 times, most recently from 88f62ce to 9cf2a8d Compare September 3, 2025 10:45
@phip1611 phip1611 changed the title vmm: fix kicking vCPU out of kvm run vCPU from signal handler vmm: fix kicking vCPU out of KVM_RUN vCPU from signal handler Sep 3, 2025
@phip1611 phip1611 force-pushed the cyberus-fork-poc-signal-delivery-fix branch 4 times, most recently from 215c4c7 to 58691e2 Compare September 3, 2025 11:39
@phip1611
Copy link
Member Author

phip1611 commented Sep 3, 2025

Any idea how to abstract away the things we have to do here more nicely?

I am thinking about a new function on the Vcpu trait that does something like on_signal_received() or so that does the necessary stuff behind the scenes?

@phip1611 phip1611 force-pushed the cyberus-fork-poc-signal-delivery-fix branch 3 times, most recently from b00df93 to 867af0a Compare September 3, 2025 12:01
Copy link

@olivereanderson olivereanderson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can make this simpler and more efficient if we simply set kvm_run in a thread local as I suggest in my comments below:

@phip1611 phip1611 force-pushed the cyberus-fork-poc-signal-delivery-fix branch from 867af0a to 5246c44 Compare September 3, 2025 18:36
@olivereanderson
Copy link

Any idea how to abstract away the things we have to do here more nicely?

I am thinking about a new function on the Vcpu trait that does something like on_signal_received() or so that does the necessary stuff behind the scenes?

I think we should make the PoC "good enough" to merge first and then answer this question together with what to do with the single vCPU removal case when we start preparing this for production usage/upstream.

@phip1611
Copy link
Member Author

phip1611 commented Sep 4, 2025

I think we should make the PoC "good enough" to merge first and then answer this question together with what to do with the single vCPU removal case when we start preparing this for production usage/upstream.

perhaps we could add a todo!() to the API handler that (un)plugs CPUs to be more fail-stop.

@phip1611 phip1611 force-pushed the cyberus-fork-poc-signal-delivery-fix branch from cf011a3 to 24f2291 Compare September 10, 2025 14:55
@phip1611
Copy link
Member Author

Please re-check and also review this please commit-by-commit and read the commit messages.

@phip1611 phip1611 force-pushed the cyberus-fork-poc-signal-delivery-fix branch from 24f2291 to 151a827 Compare September 10, 2025 15:00
Copy link

@olivereanderson olivereanderson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review of the first commit.

Copy link

@olivereanderson olivereanderson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review of commit 2: LGTM

@phip1611 phip1611 force-pushed the cyberus-fork-poc-signal-delivery-fix branch 4 times, most recently from 1a59d0e to 81897a2 Compare September 10, 2025 16:25
@phip1611 phip1611 force-pushed the cyberus-fork-poc-signal-delivery-fix branch 2 times, most recently from be0303e to dc79b66 Compare September 11, 2025 05:48
Copy link

@olivereanderson olivereanderson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please move the code that sets KVM_RUN into the spawned thread. Otherwise each thread will have a null pointer for kvm_run. Apart from that this is good.

@phip1611 phip1611 force-pushed the cyberus-fork-poc-signal-delivery-fix branch from dc79b66 to c8b6a7b Compare September 11, 2025 06:44
@phip1611 phip1611 force-pushed the cyberus-fork-poc-signal-delivery-fix branch 4 times, most recently from c41c34d to 39948f3 Compare September 11, 2025 07:48
olivereanderson

This comment was marked as outdated.

No need to grab the lock multiple times in this short period
of time. The lock is anyway held for the duration of the long
operation (KVM_RUN).
These are the prerequisites for the upcoming (quick and dirty)
solution to the problem that we might miss some events.
A common scenario for a VMM to regain control over the vCPU thread from
the hypervisor is to interrupt the vCPU. A use-case might be the `pause`
API call of CHV.

VMMs using KVM as hypervisor must use signals for this interception, i.e., a
thread sends a signal to the vCPU thread. Sending and handling these signals
is inherently racy because the signal sender does not know if the receiving
thread is currently in the RUN_VCPU [0] call, or executing userspace VMM
code.

If we are in kernel space in KVM_RUN, things are easy as KVM just exits with
-EINTR. For user-space this is more complicated. For example, it might
happen that we receive a signal but the vCPU thread was about to go into the
KVM_RUN system call as next instruction. There is no more opportunity to
check for any pending signal flag or similar.

KVM offers the `immediate_exit` flag [1] as part of the KVM_RUN structure
for that. The signal handler of a vCPU is supposed to set this flag, to
ensure that we do not miss any events. If the flag is set, KVM_RUN will
exit immediately [2].

We will miss signals to the vCPU if the vCPU thread is in userspace VMM
code and we do not use the `immediate_exit` flag.

We must have access to the KVM_RUN data structure when the signal
handler executes in a vCPU thread's context and set the
`immediate_exit` [1] flag. This way, the next invocation of KVM_RUN
exits immediately and the userspace VMM code can do the normal event
handling.

We must not use any shared locks between the normal vCPU thread VMM
code and the signal handler, as otherwise we might end up in deadlocks.

The signal handler therefore needs its dedicated mutable version of
KVM_RUN.

This commit introduces a (very hacky but good enough for a PoC) solution
to this problem.

[0] https://docs.kernel.org/virt/kvm/api.html#kvm-run
[1] https://docs.kernel.org/virt/kvm/api.html#the-kvm-run-structure
[2] https://elixir.bootlin.com/linux/v6.12/source/arch/x86/kvm/x86.c#L11566
@phip1611 phip1611 force-pushed the cyberus-fork-poc-signal-delivery-fix branch from 3557949 to 5877ee0 Compare September 12, 2025 09:38
Copy link

@olivereanderson olivereanderson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

Thanks for the changes!

@phip1611 phip1611 merged commit 7586272 into cyberus-technology:gardenlinux Sep 12, 2025
21 of 22 checks passed
@phip1611 phip1611 deleted the cyberus-fork-poc-signal-delivery-fix branch September 12, 2025 11:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants