-
Notifications
You must be signed in to change notification settings - Fork 565
[WIP] add support for live-migration of virtio-net devices & network FDs #7150
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
2771036 to
07dbecf
Compare
dce61a3 to
4943a69
Compare
|
I'm not sure how to address the live-migration of virtio-net devices in In the case of libvirt, things are working as there is a UNIX domain socket where one can send FDs using a SCS_RIGHTS message next to the HTTP REST JSON Request. My current preferred approach: When a user provides |
f4c1a89 to
85c47f5
Compare
6434161 to
6e6888f
Compare
Why not? |
Well, how do you open these FDs in ch-remote?
The only way I can think of is the following bash script to send FDs via ch-remote to Cloud Hypervisor: # Open tap0 as FD 44
exec 44<>/tap/tap0
# Note that the API Receive Magic in CHV might replace this FD
# with an FD that is available in the FD space of the cloud
# hypervisor process; semantically it is the same FD tho
ch-remote --api-socket /tmp/chv-dst.sock \
receive-migration receiver_url=/tmp/chv-migration.sock,,net_fds=[net1@[44]]but I don't know how practical or common this use-case is. |
I wouldn't do it in bash, but I'd be very interested in doing exactly this: invoking ch-remote with file descriptors set up, and referring to those descriptors on its command line. |
|
But how would you get these FDs to ch-remote? Why would a management layer does that when it can directly with CHV via the socket? |
|
Because it might be a script (I'd likely use execline, which is basically designed for doing this sort of thing, but could be a shell), or it could be some other simple thing where the overhead of doing JSON over HTTP is just higher than opening a file and then running a binary. |
|
Interesting, never heard of it. Thanks for the pointer! IMHO: I think what you proposed should be solved independently of this PR in a dedicated PR, as
Does that sound like a fair plan? |
|
Yeah, fine by me. |
720eb85 to
07103d3
Compare
This comment was marked as outdated.
This comment was marked as outdated.
07103d3 to
2277085
Compare
During development this new message helps to quickly parse the logs for success. In case this message is not shown but the last message is not an error, one can assume that likely a livelock (locking a contended lock) is the cause of the problem. Signed-off-by: Philipp Schuster <[email protected]> On-behalf-of: SAP [email protected]
One can call `to_vec()` anyway if one needs an owned copy. This change further helps to prevent needless copies in upcoming changes. Signed-off-by: Philipp Schuster <[email protected]> On-behalf-of: SAP [email protected]
Signed-off-by: Philipp Schuster <[email protected]> On-behalf-of: SAP [email protected]
Signed-off-by: Philipp Schuster <[email protected]> On-behalf-of: SAP [email protected]
To ease debugging of networking in the field, especially in context of libvirt, state save/resume, and live-migration, more logging helps to identify what happens behind the scenes in certain corner-cases as well as (apparently) normal operation. For example: There are multiple ways to create virtio-net devices: - with vhost-user backend - from provided network FDs - from provided interface name of Tap device is given - Tap device is created by CHV (fallback) To confirm that the expected behavior occurs—especially in the more complex case of network file descriptors—these logging statements provide valuable insight into the system's internal operations. Signed-off-by: Philipp Schuster <[email protected]> On-behalf-of: SAP [email protected]
Live migration, state save/resume, and hotplug are not trivial when it comes to virtio-net devices backed by network FDs. As the mechanism behind it can be considered as quite "multi-step magic" even for experienced programmers, it makes sense to thoroughly document this to ease debugging and to improve the mental model of developers working on this in the future. Signed-off-by: Philipp Schuster <[email protected]> On-behalf-of: SAP [email protected]
Signed-off-by: Philipp Schuster <[email protected]> On-behalf-of: SAP [email protected]
(1) The old messages are missing the "why" part. With this change, users
of Cloud Hypervisor have somehow more context and people looking at
the code perfectly know what's going on.
(2) Using warn! implies that the user should take action, but in this
case, there’s nothing the user can do. If the API is used correctly
and file descriptors are passed via an SCM_RIGHTS message over a
UNIX domain socket, everything works as intended. In that case,
there's no need to issue a warning — debug! is sufficient.
Signed-off-by: Philipp Schuster <[email protected]>
On-behalf-of: SAP [email protected]
Deserializing values as `-1` makes sense to prevent errors, so let's keep it. However, serializing them differently adds confusion. For example, a `ch-remote info` call should not report `-1` but the actual FDs. Signed-off-by: Philipp Schuster <[email protected]> On-behalf-of: SAP [email protected]
This unifies error handling, the implementation, and logging. Otherwise, we have code repetition, especially with the upcoming support for the VmReceiveMigration patch. Signed-off-by: Philipp Schuster <[email protected]> On-behalf-of: SAP [email protected]
This adds support for the live migration of virtio-net devices backed by network FDs. So far, this was tested successfully with a patched libvirt setup. `receive-migration` via ch-remote is not supported as transferring any FDs from there to Cloud Hypervisor isn't really sensible in typical workflows. Signed-off-by: Philipp Schuster <[email protected]> On-behalf-of: SAP [email protected]
2277085 to
2674da9
Compare
This adds support for the live migration of virtio-net devices backed by
network FDs along with some pre-requisites and QoL improvements.
So far, this was tested successfully with a patched libvirt setup (patches will be upstreamed soon). @hertrste is the responsible person here.
Closes #7054 #7291.
Hints for Reviewers
Steps to Undraft