Skip to content

Conversation

@phip1611
Copy link
Member

@phip1611 phip1611 commented Sep 22, 2025

For a graceful resource management of externally provided FDs in Cloud
Hypervisor, corresponding FDs need to be closed on a device removal.
This is the case for virtio-net devices using external FDs, for example.

With the fix introduced in this commit, we allow management software to
properly clean up resources, e.g., libvirt can clean up tap devices.

PS: CHV uses "added" and "removed", which has the same meaning as
hot device attach/hotplug and hot device detach/unplug.


Part of #7291.

@phip1611 phip1611 requested a review from a team as a code owner September 22, 2025 16:01
@phip1611 phip1611 force-pushed the productize-network_fd_livemig_prep-1 branch from 014ccc5 to 99e72c1 Compare September 22, 2025 16:03
fds.append(&mut preserved_fds.clone());
for fd in preserved_fds {
if fds.contains(fd) {
log::warn!("fd {fd} is already preserved, skipping");
Copy link
Member Author

@phip1611 phip1611 Sep 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually, this should be an error I think. Otherwise, we have non-unique resources which is undefined behavior anyway

Copy link
Member

@rbradford rbradford left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't use the terminology of attached/detached - we use added/removed. Please update the commit message, comment, etc to be consistent with that (otherwise it might be confusing!.)

.unwrap()
.device_type(),
);
// When the device is hot-detached, we close all file descriptors
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(We also use "hotplugged" as well as added.)

@phip1611 phip1611 force-pushed the productize-network_fd_livemig_prep-1 branch 3 times, most recently from f415134 to 82070e1 Compare September 25, 2025 10:47
@phip1611 phip1611 changed the title devices: gracefully close preserved FDs on hot device detach devices: gracefully close preserved FDs on hot device remove Sep 25, 2025
@phip1611 phip1611 force-pushed the productize-network_fd_livemig_prep-1 branch from 82070e1 to f9d6055 Compare September 25, 2025 14:10
For a graceful resource management of externally provided FDs in Cloud
Hypervisor, corresponding FDs need to be closed on a device removal.
This is the case for virtio-net devices using external FDs, for example.

With the fix introduced in this commit, we allow management software to
properly clean up resources, e.g., libvirt can clean up tap devices.

PS: CHV uses "added" and "removed", which has the same meaning as
hot device attach/hotplug and hot device detach/unplug.

Signed-off-by: Philipp Schuster <[email protected]>
On-behalf-of: SAP [email protected]
@phip1611 phip1611 force-pushed the productize-network_fd_livemig_prep-1 branch from f9d6055 to e261ecf Compare September 25, 2025 14:22
@rbradford rbradford added this pull request to the merge queue Sep 25, 2025
Merged via the queue into cloud-hypervisor:main with commit a6426e3 Sep 25, 2025
38 of 40 checks passed
@phip1611 phip1611 deleted the productize-network_fd_livemig_prep-1 branch September 29, 2025 07:32
phip1611 added a commit to phip1611/cloud-hypervisor that referenced this pull request Oct 23, 2025
Allowing the same file descriptor (FD) to be preserved more than once
can lead to severe issues. Consider the case where management software adds
two virtio-net devices at runtime, both backed by the same externally
provided FD.

When the first device is dropped, the FD for the second device becomes
invalid [0].

To avoid this misconfiguration, we now prevent multiple preservation attempts
of the same FD.

[0] cloud-hypervisor#7371
phip1611 added a commit to phip1611/cloud-hypervisor that referenced this pull request Oct 23, 2025
Allowing the same file descriptor (FD) to be preserved more than once
can lead to severe issues. Consider the case where management software
adds two virtio-net devices at runtime, both backed by the same
externally provided FD.

When the first device is removed during runtime, the FD for the second
device becomes invalid [0].

To avoid misconfigurations, we now prevent multiple preservation
attempts of the same FD.

[0] cloud-hypervisor#7371

Signed-off-by: Philipp Schuster <[email protected]>
On-behalf-of: SAP [email protected]
phip1611 added a commit to phip1611/cloud-hypervisor that referenced this pull request Oct 23, 2025
Allowing the same file descriptor (FD) to be preserved more than once
can lead to severe issues. Consider the case where management software
adds two virtio-net devices at runtime, both backed by the same
externally provided FD.

When the first device is removed during runtime, the FD for the second
device becomes invalid [0].

To avoid misconfigurations, we now prevent multiple preservation
attempts of the same FD.

[0] cloud-hypervisor#7371

Signed-off-by: Philipp Schuster <[email protected]>
On-behalf-of: SAP [email protected]
phip1611 added a commit to phip1611/cloud-hypervisor that referenced this pull request Oct 23, 2025
Allowing the same file descriptor (FD) to be preserved more than once
can lead to severe issues. Consider the case where management software
adds two virtio-net devices at runtime, both backed by the same
externally provided FD.

When the first device is removed during runtime, the FD for the second
device becomes invalid [0].

To avoid misconfigurations, we now prevent multiple preservation
attempts of the same FD.

[0] cloud-hypervisor#7371

Signed-off-by: Philipp Schuster <[email protected]>
On-behalf-of: SAP [email protected]
phip1611 added a commit to phip1611/cloud-hypervisor that referenced this pull request Oct 23, 2025
Allowing the same file descriptor (FD) to be preserved more than once
can lead to severe issues. Consider the case where management software
adds two virtio-net devices at runtime, both backed by the same
externally provided FD.

When the first device is removed during runtime, the FD for the second
device becomes invalid [0].

To avoid misconfigurations, we now prevent multiple preservation
attempts of the same FD.

This however has some caveats: In the current model, external FDs are
added to the list of preserved FDs after the corresponding device has
been created successfully. Now assume the following:
- VM boots
  - VM config has no preserved FDs
  - We add a device to the VM that uses an external FD for its Tap dev
  - The device is successfully initialized
  - VM config is extended with the preserved FDs
- VM reboots
  - VM config already has preserved FDs
  - Same as above
  - CRASH

Therefore, we need to know for every preserved FD whether it is "cold",
i.e., only has been part of a former VmConfig without associated device,
or "hot", i.e., it is currently also actively in use. This allows state
transitions from cold->hot in which case we won't throw an error for a
reused FD. Otherwise, we throw errors.

This requires that on shutdown, all preserved FDs are marked as cold.

[0] cloud-hypervisor#7371

Signed-off-by: Philipp Schuster <[email protected]>
On-behalf-of: SAP [email protected]
phip1611 added a commit to phip1611/cloud-hypervisor that referenced this pull request Oct 23, 2025
Allowing the same file descriptor (FD) to be preserved more than once
can lead to severe issues. Consider the case where management software
adds two virtio-net devices at runtime, both backed by the same
externally provided FD.

When the first device is removed during runtime, the FD for the second
device becomes invalid [0].

To avoid misconfigurations, we now prevent multiple preservation
attempts of the same FD.

This however has some caveats: In the current model, external FDs are
added to the list of preserved FDs after the corresponding device has
been created successfully. Now assume the following:
- VM boots
  - VM config has no preserved FDs
  - We add a device to the VM that uses an external FD for its Tap dev
  - The device is successfully initialized
  - VM config is extended with the preserved FDs
- VM reboots
  - VM config already has preserved FDs
  - Same as above
  - CRASH

Therefore, we need to know for every preserved FD whether it is "cold",
i.e., only has been part of a former VmConfig without associated device,
or "hot", i.e., it is currently also actively in use. This allows state
transitions from cold->hot in which case we won't throw an error for a
reused FD. Otherwise, we throw errors.

This requires that on shutdown, all preserved FDs are marked as cold.

[0] cloud-hypervisor#7371

Signed-off-by: Philipp Schuster <[email protected]>
On-behalf-of: SAP [email protected]
phip1611 added a commit to phip1611/cloud-hypervisor that referenced this pull request Oct 23, 2025
Allowing the same file descriptor (FD) to be preserved more than once
can lead to severe issues. Consider the case where management software
adds two virtio-net devices at runtime, both backed by the same
externally provided FD.

When the first device is removed during runtime, the FD for the second
device becomes invalid [0].

To avoid misconfigurations, we now prevent multiple preservation
attempts of the same FD.

This however has some caveats: In the current model, external FDs are
added to the list of preserved FDs after the corresponding device has
been created successfully. Now assume the following:
- VM boots
  - VM config has no preserved FDs
  - We add a device to the VM that uses an external FD for its Tap dev
  - The device is successfully initialized
  - VM config is extended with the preserved FDs
- VM reboots
  - VM config already has preserved FDs
  - Same as above
  - CRASH

Therefore, we need to know for every preserved FD whether it is "cold",
i.e., only has been part of a former VmConfig without associated device,
or "hot", i.e., it is currently also actively in use. This allows state
transitions from cold->hot in which case we won't throw an error for a
reused FD. Otherwise, we throw errors.

This requires that on shutdown, all preserved FDs are marked as cold.

[0] cloud-hypervisor#7371

Signed-off-by: Philipp Schuster <[email protected]>
On-behalf-of: SAP [email protected]
phip1611 added a commit to phip1611/cloud-hypervisor that referenced this pull request Oct 27, 2025
Allowing the same file descriptor (FD) to be preserved more than once
can lead to severe issues. Consider the case where management software
adds two virtio-net devices at runtime, both backed by the same
externally provided FD.

When the first device is removed during runtime, the FD for the second
device becomes invalid [0].

To avoid misconfigurations, we now prevent multiple preservation
attempts of the same FD.

This however has some caveats: In the current model, external FDs are
added to the list of preserved FDs after the corresponding device has
been created successfully. Now assume the following:
- VM boots
  - VM config has no preserved FDs
  - We add a device to the VM that uses an external FD for its Tap dev
  - The device is successfully initialized
  - VM config is extended with the preserved FDs
- VM reboots
  - VM config already has preserved FDs
  - Same as above
  - CRASH

Therefore, we need to know for every preserved FD whether it is "cold",
i.e., only has been part of a former VmConfig without associated device,
or "hot", i.e., it is currently also actively in use. This allows state
transitions from cold->hot in which case we won't throw an error for a
reused FD. Otherwise, we throw errors.

This requires that on shutdown, all preserved FDs are marked as cold.

[0] cloud-hypervisor#7371

Signed-off-by: Philipp Schuster <[email protected]>
On-behalf-of: SAP [email protected]
phip1611 added a commit to phip1611/cloud-hypervisor that referenced this pull request Oct 27, 2025
Allowing the same file descriptor (FD) to be preserved more than once
can lead to severe issues. Consider the case where management software
adds two virtio-net devices at runtime, both backed by the same
externally provided FD.

When the first device is removed during runtime, the FD for the second
device becomes invalid [0].

To avoid misconfigurations, we now prevent multiple preservation
attempts of the same FD.

This however has some caveats: In the current model, external FDs are
added to the list of preserved FDs after the corresponding device has
been created successfully. Now assume the following:
- VM boots
  - VM config has no preserved FDs
  - We add a device to the VM that uses an external FD for its Tap dev
  - The device is successfully initialized
  - VM config is extended with the preserved FDs
- VM reboots
  - VM config already has preserved FDs
  - Same as above
  - CRASH

Therefore, we need to know for every preserved FD whether it is "cold",
i.e., only has been part of a former VmConfig without associated device,
or "hot", i.e., it is currently also actively in use. This allows state
transitions from cold->hot in which case we won't throw an error for a
reused FD. Otherwise, we throw errors.

This requires that on shutdown, all preserved FDs are marked as cold.

[0] cloud-hypervisor#7371

Signed-off-by: Philipp Schuster <[email protected]>
On-behalf-of: SAP [email protected]
@likebreath likebreath moved this to ✅ Done in Cloud Hypervisor Roadmap Nov 6, 2025
@likebreath likebreath added the bug-fix Bug fix to include in release notes label Nov 6, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug-fix Bug fix to include in release notes

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

3 participants