Skip to content

Conversation

@phip1611
Copy link
Member

@phip1611 phip1611 commented Jun 25, 2025

Currently (if I'm not missing something!), the following scenarios are supported by Cloud Hypervisor to bootstrap a VM (across all archs):

  1. Provide firmware
  2. Provide kernel [+ cmdline] [+ initrd]

As the difference between --firmware and --kernel is not very clear currently, especially as both use/support a Xen PVH entry, adding this helps to identify the cause of misconfiguration. Further, I've updated the documentation.

@phip1611 phip1611 requested a review from a team as a code owner June 25, 2025 11:51
@phip1611 phip1611 force-pushed the cleanup-firmware branch 6 times, most recently from c11d5bc to 0838d78 Compare June 25, 2025 12:21
@phip1611 phip1611 changed the title vmm: add enum PayloadError to improve error reporting vmm: add enum PayloadError to improve error reporting + doc update Jun 25, 2025
@rbradford
Copy link
Member

Currently (if I'm not missing something!), the following scenarios are supported by Cloud Hypervisor to bootstrap a VM (across all archs):

  1. provide firmware
  2. provide kernel
  3. provide kernel + cmdline
  4. provide kernel + initrd
  5. provide kernel + cmdline + initrd

As the difference between --firmware and --kernel is not very clear currently, especially as both use/support a Xen PVH entry, adding this helps to identify the cause of misconfiguration. Further, I've updated the documentation.

I think this list will get more confusing with #7117 where when you have firmware you can also specify kernel, command line and initrd.

@phip1611
Copy link
Member Author

phip1611 commented Jun 25, 2025

  1. Are you in favor of the new error type, tho?
  2. How close is devices: Add fw_cfg device #7117 to being merged? I think you are referring to --fw_cfg "kernel=./path" etc. I raised the concern if we want this at all, see here.

I made the list a little more compact; this allows to extend the list in the future for fw_cfg, hopefully a little less confusing

Copy link
Member

@rbradford rbradford left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me - this codifies what we have. But I just wanted to highlight that #7117 might change this.

@phip1611
Copy link
Member Author

phip1611 commented Jun 26, 2025

But I just wanted to highlight that #7117 might change this.

Would it be a good idea to edit the original issue with an additional checkbox (section: "steps to undraft this" or similar) to look for this specific aspect? Assuming the other PR will be rebased onto this

@phip1611 phip1611 force-pushed the cleanup-firmware branch 5 times, most recently from 03269f4 to 29c7af4 Compare July 9, 2025 12:30
@phip1611 phip1611 changed the title vmm: add enum PayloadError to improve error reporting + doc update vmm: add PayloadConfig::validate to improve error reporting + doc update Jul 9, 2025
Comment on lines 730 to 816
match (&self.firmware, &self.kernel, &self.initramfs, &self.cmdline) {
(Some(_firmware), None, None, None) => Ok(()),
(None, Some(_kernel), _, _) => Ok(()),
(Some(_firmware), Some(_kernel), _, _) => {
Err(PayloadConfigError::FirmwarePlusOtherPayloads)
}
(Some(_firmware), _, Some(_initrd), _) => {
Err(PayloadConfigError::FirmwarePlusOtherPayloads)
}
(_, None, _, Some(_cmdline)) => Err(PayloadConfigError::CmdlineWithoutKernel),
(None, None, _, _) => Err(PayloadConfigError::MissingBootitem),
}
Copy link
Member

@likebreath likebreath Jul 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks right and aligns with our current behavior on x86_64. But I think we should take the chance to simply it.

As I read the code, the payload requirement on aarch64 (also risc-v) is cleaner though it is a bit more relax. Pretty much something like:

        match (&self.firmware, &self.kernel) {
            (Some(_firmware), None) => Ok(()),
            (None, Some(_kernel)) => Ok(()),
            (Some(_firmware), Some(_kernel)) => {
                Err(PayloadConfigError::FirmwarePlusOtherPayloads)
            }
            (None, None) => Err(PayloadConfigError::MissingBootitem),
        }

I'd change the payload requirement on x86_64 and making it consistent on all platforms. If we worry about silently ignoring kernel comdline and initrd with firmware boot, adding a warning message would be good to me.

Copy link
Member Author

@phip1611 phip1611 Aug 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried the relaxed approach. However, I discovered a problem with warn messages. After using this patch:

Show Code
diff --git a/vmm/src/vm.rs b/vmm/src/vm.rs
index e2e046ad7..c9cb0a376 100644
--- a/vmm/src/vm.rs
+++ b/vmm/src/vm.rs
@@ -1235,7 +1235,7 @@ impl Vm {
             &payload.initramfs,
             &payload.cmdline,
         ) {
-            (Some(firmware), None, None, None) => {
+            (Some(firmware), None, _, _) => {
                 let firmware = File::open(firmware).map_err(Error::FirmwareFile)?;
                 Self::load_kernel(firmware, None, memory_manager)
             }
diff --git a/vmm/src/vm_config.rs b/vmm/src/vm_config.rs
index 4619fcc8d..14570b26e 100644
--- a/vmm/src/vm_config.rs
+++ b/vmm/src/vm_config.rs
@@ -694,9 +694,6 @@ pub enum PayloadConfigError {
     /// When a firmware is specified, providing kernel, initrd, or cmdline is not supported.
     #[error("When a firmware is specified, providing kernel, initrd, or cmdline is not supported")]
     FirmwarePlusOtherPayloads,
-    /// Provided cmdline without kernel.
-    #[error("Provided cmdline without kernel")]
-    CmdlineWithoutKernel,
     /// No bootitem provided: neither firmware nor kernel.
     #[error("No bootitem provided: neither firmware nor kernel")]
     MissingBootitem,
@@ -726,19 +723,29 @@ pub struct PayloadConfig {
 impl PayloadConfig {
     /// Validates the payload config and succeeds if Cloud Hypervisor will be
     /// able to boot the configuration.
+    ///
+    /// Some odd yet non-critical configurations will result in a warning
+    /// message rather than an error.
     pub fn validate(&self) -> Result<(), PayloadConfigError> {
-        match (&self.firmware, &self.kernel, &self.initramfs, &self.cmdline) {
-            (Some(_firmware), None, None, None) => Ok(()),
-            (None, Some(_kernel), _, _) => Ok(()),
-            (Some(_firmware), Some(_kernel), _, _) => {
-                Err(PayloadConfigError::FirmwarePlusOtherPayloads)
-            }
-            (Some(_firmware), _, Some(_initrd), _) => {
+        // Hard validation
+        let res = match (&self.firmware, &self.kernel) {
+            (Some(_firmware), None) => Ok(()),
+            (None, Some(_kernel)) => Ok(()),
+            (Some(_firmware), Some(_kernel)) => {
                 Err(PayloadConfigError::FirmwarePlusOtherPayloads)
             }
-            (_, None, _, Some(_cmdline)) => Err(PayloadConfigError::CmdlineWithoutKernel),
-            (None, None, _, _) => Err(PayloadConfigError::MissingBootitem),
+            (None, None) => Err(PayloadConfigError::MissingBootitem),
+        };
+
+        // After filtering out illegal configurations, no warn about odd
+        // configurations.
+        match (&self.firmware, &self.kernel, &self.initramfs, &self.cmdline) {
+            (_, None, _, Some(_cmdline)) => log::warn!("provided cmdline without kernel"),
+            (_, None, Some(_initramfs), _) => log::warn!("provided initramfs without kernel"),
+            _ => {},
         }
+
+        res
     }
 }

I noticed that each warning is printed twice - at least. Multiple code paths actually perform some validations multiple times (such as: change config (add device) and re-validate). Therefore, I'm in favor of being more restrictive here - for all platforms: no warning messages but hard errors.

Otherwise people will also get spammed with warn messages every time a device is hot-plugged which causes a re-validation of the config. And moving the check out of PayloadConfig::validate() also doesn't feel right.

I'm looking forward to your thoughts on that! Perhaps, you have ideas how to tackle this even differently!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for sharing the findings - I can certainly see the problem of spamming warning messages.

I think we can solve this problem by:

  1. Make pub fn validate(&mut self) for Payload;
  2. Warn it the first time, and overwrite the unused field to None (as that's the actual config we are using with the VM anway);

Also, I'd use nested match statement, which reads much clearer to me.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to confirm we’re aligned: you want to allow users to specify --cmdline and --initrd together with --firmware. This has no real effect, and we prefer printing a warning rather than rejecting the invocation of cloud Hypervisor. Correct? The latest state of the PR should reflect that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, exactly. That's the behavior we have for aarch64 (and riscv) before this change. Making x86_64 to follow the same behavior is a good change to me.

I find large match patterns with more than three components can be difficult to read and error-prone. How about:

        match (&self.firmware, &self.kernel) {
            (Some(_firmware), None) => {
                if self.cmdline.is_some() || self.initramfs.is_some() {
                    log::warn!(
                        "Ignoring cmdline and initramfs as firmware is provided as the payload"
                    );
                    self.cmdline = None;
                    self.initramfs = None;
                }
                Ok(())
            }
            (None, Some(_kernel)) => Ok(()),
            (Some(_firmware), Some(_kernel)) => Err(PayloadConfigError::FirmwarePlusOtherPayloads),
            (None, None) => Err(PayloadConfigError::MissingBootitem),
        }

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That looks better indeed, thanks!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool. Can you adopt the changes above so that we can land this one? Thanks.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I think/hope that we are finally ready to go :)

@phip1611
Copy link
Member Author

phip1611 commented Aug 1, 2025

FYI: Technically speaking, I am back from vacation. Unfortunately, I injured myself while exercising and can hardly type. I hope to resume work in mid-August.

@likebreath
Copy link
Member

FYI: Technically speaking, I am back from vacation. Unfortunately, I injured myself while exercising and can hardly type. I hope to resume work in mid-August.

@phip1611 Sorry to hear about that. Getting better soon. Take care.

@likebreath
Copy link
Member

Convert this PR to a draft. Feel free to bring it back when it is ready. Thank you.

@likebreath likebreath marked this pull request as draft August 4, 2025 16:28
@phip1611 phip1611 marked this pull request as ready for review August 11, 2025 09:42
@phip1611 phip1611 requested a review from likebreath August 11, 2025 09:50
@phip1611
Copy link
Member Author

phip1611 commented Aug 12, 2025

@likebreath @rbradford

I've noticed there is room for improvement for harmonizing load_payload(), load_kernel(), and load_firmware() between the three architectures. Then the boot path for all three architectures will be more streamlined.

I however would prefer this in a dedicated PR. That being said, is the current PR ready to go?

@phip1611 phip1611 force-pushed the cleanup-firmware branch 2 times, most recently from 4257630 to 466fe53 Compare August 12, 2025 09:52
@rbradford
Copy link
Member

@likebreath @rbradford

I've noticed there is room for improvement for harmonizing load_payload(), load_config(), and load_firmware() between the three architectures. Then the boot path for all three architectures will be more streamlined.

Sounds good.

I however would prefer this in a dedicated PR. That being said, is the current PR ready to go?

Yes - I looked through this and was happy. I was waiting for @likebreath to re-review.

@phip1611 phip1611 mentioned this pull request Aug 12, 2025
2 tasks
@likebreath
Copy link
Member

I was waiting for @likebreath to re-review.

Overall looks good. One comment added above.

@phip1611 phip1611 force-pushed the cleanup-firmware branch 2 times, most recently from bc16e6f to cdee05a Compare August 15, 2025 07:50
Currently, the following scenarios are supported by Cloud Hypervisor to
bootstrap a VM:

1. provide firmware
2. provide kernel
3. provide kernel + cmdline
4. provide kernel + initrd
5. provide kernel + cmdline + initrd

As the difference between `--firmware` and `--kernel` is not very clear
currently, especially as both use/support a Xen PVH entry, adding this
helps to identify the cause of misconfiguration.

Signed-off-by: Philipp Schuster <[email protected]>
On-behalf-of: SAP [email protected]
@likebreath likebreath added this pull request to the merge queue Aug 15, 2025
Merged via the queue into cloud-hypervisor:main with commit dd8687a Aug 15, 2025
39 checks passed
@phip1611 phip1611 deleted the cleanup-firmware branch November 25, 2025 13:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants