
Conversation

@lisongqian
Contributor

hypervisor: AMX state snapshot and restore support

The AMX TILE data state may require 8 KB or more of space, so calling the
legacy KVM_GET_XSAVE ioctl fails because it can only return 4 KB of data.
This patch adds KVM_GET_XSAVE2 support so that the larger state can be
snapshotted.

Fixes: #7533

Signed-off-by: Songqian Li [email protected]
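
For context on the size problem: the XSAVE area a guest needs can be queried from CPUID leaf 0xD, and once the AMX TILEDATA component (state component 18, 8 KB on its own) is enabled, the reported size exceeds the fixed 4 KB region of the legacy kvm_xsave struct. A minimal sketch of that check (illustration only, not part of this patch):

```rust
// Illustration only: compare the XSAVE size reported by CPUID against the
// fixed 4 KiB `region` of the legacy `kvm_xsave` struct.
#[cfg(target_arch = "x86_64")]
fn xsave_size_exceeds_legacy_buffer() -> bool {
    const LEGACY_KVM_XSAVE_SIZE: u32 = 4096;

    // CPUID.(EAX=0Dh, ECX=0): EBX reports the XSAVE area size required by the
    // state components currently enabled in XCR0.
    let leaf = unsafe { core::arch::x86_64::__cpuid_count(0xd, 0) };

    // With AMX TILEDATA enabled this exceeds 4096, so KVM_GET_XSAVE can no
    // longer return the full state and KVM_GET_XSAVE2 is needed instead.
    leaf.ebx > LEGACY_KVM_XSAVE_SIZE
}
```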

@rbradford
Member

Your code handles a very tricky API situation in the neatest way I can think of. Good work! I can't think of a better way to handle this myself - but maybe @phip1611 or @tpressure would have an alternative idea. Otherwise I think we can go ahead with this.

@phip1611
Member

phip1611 commented Dec 3, 2025

My colleague @olivereanderson just solved this in a different way, and his change was merged into our fork yesterday. We'll raise a PR with his solution in the next few hours and then we can discuss :)

It seems to happen more frequently now that, by coincidence, two parties work on the same thing at the same time.

@lisongqian thanks for your contribution! Before we continue here, I'd love to wait until @olivereanderson has opened a PR for his solution, which solves things in a different way - we can then discuss which solution is best (or combine ideas from both PRs).

@olivereanderson
Contributor

olivereanderson commented Dec 3, 2025

To expand a bit on @phip1611's comment: my PR for our fork is here: cyberus-technology#46, and it has already been extensively reviewed. The main idea is similar, but there are some subtle differences. I believe my version aligns better with our ongoing work on CPU templates/profiles, and it also avoids keeping the KVM_GET_XSAVE ioctl around (simpler seccomp rules! which I see we both forgot to update 😅).

I will open my PR here as well today (or tomorrow at the latest) and let the maintainers decide which variant they want to go with.

I want to add that I did comment on #5800 that I have a fix about to be upstreamed, but I am sympathetic to the fact that one cannot be aware of every single issue in this repository, and snapshotting isn't 100% the same as live migration.

Please don't get me wrong - I am very happy to see more people working to improve Cloud Hypervisor. It would just be great if we could avoid duplicating work to the greatest extent possible 🙂

EDIT: On the other hand, the version presented here avoids global state, which tends to be a good thing 👍

@lisongqian
Contributor Author

> My colleague @olivereanderson just solved this in a different way, and his change was merged into our fork yesterday. We'll raise a PR with his solution in the next few hours and then we can discuss :)

I'm glad we have the same use case. AMX support for Cloud Hypervisor will be better!

> @lisongqian thanks for your contribution! Before we continue here, I'd love to wait until @olivereanderson has opened a PR for his solution, which solves things in a different way - we can then discuss which solution is best (or combine ideas from both PRs).

I just looked at @olivereanderson's code; making the AMX feature a single component is a good idea. 👍🏻 As for removing KVM_GET_XSAVE, I would prefer to keep it, since KVM_GET_XSAVE2 is only supported from Linux 5.17. Keeping KVM_GET_XSAVE preserves compatibility with machines running older kernels that don't enable AMX.

Overall, I would prefer to combine our code. We can discuss further after the new PR is ready. 😄 @phip1611 @olivereanderson
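
To make the fallback concrete, here is one possible selection policy (a sketch, not this PR's code): the kernel returns 0 from KVM_CHECK_EXTENSION(KVM_CAP_XSAVE2) on pre-5.17 kernels, and otherwise the buffer size in bytes that KVM_GET_XSAVE2 needs (at least 4096), so the decision can be driven entirely by that value. The exact kvm-ioctls accessor used to query the capability is deliberately left out here.

```rust
// Sketch of one possible fallback policy (not necessarily what this PR does).
#[derive(Debug, PartialEq)]
enum XsaveIoctl {
    /// Legacy KVM_GET_XSAVE: fixed 4 KiB buffer, available on all kernels.
    Legacy,
    /// KVM_GET_XSAVE2 with the required buffer size in bytes (Linux >= 5.17).
    Xsave2 { size: usize },
}

/// `cap_xsave2` is the value returned by checking KVM_CAP_XSAVE2 on the VM fd.
fn choose_xsave_ioctl(cap_xsave2: i32) -> XsaveIoctl {
    if cap_xsave2 > 4096 {
        // More than 4 KiB of state is in play (e.g. AMX TILEDATA): the legacy
        // ioctl would lose data, so KVM_GET_XSAVE2 is required.
        XsaveIoctl::Xsave2 {
            size: cap_xsave2 as usize,
        }
    } else {
        // Capability absent (0, pre-5.17 kernel) or the state still fits in
        // 4 KiB: sticking with KVM_GET_XSAVE keeps older hosts working.
        XsaveIoctl::Legacy
    }
}
```

With this policy, a host without AMX (or without KVM_CAP_XSAVE2 at all) keeps taking the legacy path, which is exactly what keeps pre-5.17 kernels working.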

@phip1611
Member

phip1611 commented Dec 3, 2025

> KVM_GET_XSAVE2 is only supported from Linux 5.17.

Is there a Cloud Hypervisor guideline on the minimum Linux kernel version we support? 5.17 is already fairly old (March 2022).

@rbradford
Member

> KVM_GET_XSAVE2 is only supported from Linux 5.17.
>
> Is there a Cloud Hypervisor guideline on the minimum Linux kernel version we support? 5.17 is already fairly old (March 2022).

A good rule of thumb is the oldest supported Ubuntu LTS release - jammy in this case. It shipped with 5.15.

@olivereanderson
Contributor

olivereanderson commented Dec 3, 2025

@lisongqian @rbradford my PR is up now: #7536

I will give you some time to review both versions and then we can proceed in the best way possible from there. As mentioned in my PR description, the most important thing for me is the standalone enable_amx_state_components function/method. Apart from that, I am quite happy with whichever approach we land on.
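
For readers who haven't looked at #7536: on Linux, exposing the dynamically enabled AMX state to a guest requires the VMM to request guest permission for XFEATURE_XTILEDATA via arch_prctl(ARCH_REQ_XCOMP_GUEST_PERM) before the first vCPU is created. The sketch below shows that step in isolation; it is only a guess at the shape such a standalone helper could take, not the code from #7536.

```rust
// Hedged sketch (not the code from cyberus-technology#46 / #7536): request
// guest permission for the AMX TILEDATA state component. This arch_prctl has
// to be issued before the first vCPU is created, otherwise KVM refuses to
// expose AMX to the guest.
#[cfg(all(target_arch = "x86_64", target_os = "linux"))]
fn enable_amx_state_components() -> std::io::Result<()> {
    // Values from the Linux UAPI headers (asm/prctl.h, asm/fpu definitions).
    const ARCH_REQ_XCOMP_GUEST_PERM: libc::c_int = 0x1025;
    const XFEATURE_XTILEDATA: libc::c_ulong = 18;

    // SAFETY: arch_prctl with these arguments only updates the permission
    // bitmap of the calling process; it does not touch memory we own.
    let ret = unsafe {
        libc::syscall(
            libc::SYS_arch_prctl,
            ARCH_REQ_XCOMP_GUEST_PERM,
            XFEATURE_XTILEDATA,
        )
    };
    if ret == -1 {
        return Err(std::io::Error::last_os_error());
    }
    Ok(())
}
```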

@olivereanderson
Contributor

olivereanderson commented Dec 3, 2025

Aside: regarding the seccomp rules, in the longer term I would recommend that we consider moving away from statically set seccomp rules and instead introduce a TOML/JSON/YAML file where operators can set their desired seccomp rules, which Cloud Hypervisor would then parse at runtime.

@rbradford
Member

> Aside: regarding the seccomp rules, in the longer term I would recommend that we consider moving away from statically set seccomp rules and instead introduce a TOML/JSON/YAML file where operators can set their desired seccomp rules, which Cloud Hypervisor would then parse at runtime.

We rejected that approach, as it would mean you can't just deploy a single binary to run CH - instead you would need to decide on paths to find the files, deal with people installing in different places, and it would make it too hard to just run from the source tree.

@olivereanderson
Contributor

olivereanderson commented Dec 3, 2025

> > Aside: regarding the seccomp rules, in the longer term I would recommend that we consider moving away from statically set seccomp rules and instead introduce a TOML/JSON/YAML file where operators can set their desired seccomp rules, which Cloud Hypervisor would then parse at runtime.
>
> We rejected that approach, as it would mean you can't just deploy a single binary to run CH - instead you would need to decide on paths to find the files, deal with people installing in different places, and it would make it too hard to just run from the source tree.

That is fair, but it could be handled by having a default: if no config file is given or found, everything the VMM might possibly need is enabled (essentially the current behaviour). This way, users who want a single binary get what they want, and so do users who know that their use cases don't require certain syscalls.
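
Purely to illustrate that default-fallback idea (hypothetical; nothing like this exists in the tree): an optional operator-supplied list, with the built-in rules as the fallback. The file format and the function names below are made up for illustration.

```rust
use std::fs;
use std::path::Path;

// Hypothetical sketch of the "config file with a built-in default" idea;
// the format here is one syscall name per line, '#' for comments.
fn load_allowed_syscalls(config: Option<&Path>) -> Vec<String> {
    match config.map(|p| fs::read_to_string(p)) {
        // Operator provided a file: only allow what it lists.
        Some(Ok(contents)) => contents
            .lines()
            .map(str::trim)
            .filter(|l| !l.is_empty() && !l.starts_with('#'))
            .map(String::from)
            .collect(),
        // No file given, or it could not be read: keep today's behaviour and
        // allow everything the VMM might possibly need.
        _ => builtin_default_allowlist(),
    }
}

fn builtin_default_allowlist() -> Vec<String> {
    // Stand-in for the statically defined rules compiled into the binary.
    vec!["ioctl".into(), "read".into(), "write".into(), "close".into()]
}
```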

@phip1611
Member

phip1611 commented Dec 3, 2025

What's the pragmatic way forward? We keep @olivereanderson's patches that streamline the AMX CPUID handling code and combine them with this approach, right?

Who will be working on this?

@olivereanderson
Contributor

> What's the pragmatic way forward? We keep @olivereanderson's patches that streamline the AMX CPUID handling code and combine them with this approach, right?
>
> Who will be working on this?

See my comment here: #7536 (comment)

I think we can go with the approach in this PR (almost as is, but please remember to update the seccomp rules to also allow KVM_GET_XSAVE2), and I will open a separate PR for the enable_amx_state_components function.
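
On the seccomp reminder: allowing KVM_GET_XSAVE2 means adding one more ioctl request number to the ioctl(2) argument allow-list for the vCPU threads. A rough sketch of what that amounts to is below; Cloud Hypervisor's actual seccomp_filters helpers express this differently, and the constants are derived from the kernel's ioctl encoding rather than copied from this PR, so double-check them against the kernel headers.

```rust
// Sketch only, not the PR's diff. Request numbers follow the kernel's
// _IOR(KVMIO, nr, struct kvm_xsave) encoding, with KVMIO = 0xAE and
// struct kvm_xsave being 4096 bytes.
const KVM_GET_XSAVE: u64 = 0x9000_aea4; // _IOR(KVMIO, 0xa4, struct kvm_xsave)
const KVM_GET_XSAVE2: u64 = 0x9000_aecf; // _IOR(KVMIO, 0xcf, struct kvm_xsave)

// Minimal stand-in for the vCPU-thread ioctl allow-list: both requests have
// to stay in it - XSAVE2 for hosts with AMX state, and the legacy XSAVE for
// pre-5.17 hosts that never report KVM_CAP_XSAVE2.
fn vcpu_ioctl_allowed(request: u64) -> bool {
    const ALLOWED: &[u64] = &[KVM_GET_XSAVE, KVM_GET_XSAVE2];
    ALLOWED.contains(&request)
}
```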

@lisongqian
Contributor Author

The reason for the failed CI doesn't seem to be due to this PR: https://github.com/cloud-hypervisor/cloud-hypervisor/actions/runs/19899422217/job/57038687035?pr=7534

@likebreath
Member

> The reason for the failed CI doesn't seem to be due to this PR: https://github.com/cloud-hypervisor/cloud-hypervisor/actions/runs/19899422217/job/57038687035?pr=7534

@lisongqian The failure appears to be related to the changes and is likely due to seccomp issues (as @olivereanderson called out earlier). Here is the relevant test failure log (from the first run):

    cloud-hypervisor: 250.741516s: <vmm> INFO:event_monitor/src/lib.rs:113 -- Event: source = vm event = paused 
    cloud-hypervisor: 250.742874s: <vmm> INFO:vmm/src/api/mod.rs:1329 -- API request event: VmSnapshot VmSnapshotConfig { destination_url: "file:///tmp/ch9FXt0U/snapshot" }
    cloud-hypervisor: 250.742874s: <vmm> INFO:event_monitor/src/lib.rs:113 -- Event: source = vm event = snapshotting 
    cloud-hypervisor: 250.742889s: <vmm> INFO:arch/src/x86_64/mod.rs:572 -- Running under nested virtualisation. Hypervisor string: Microsoft Hv
    cloud-hypervisor: 250.742905s: <vmm> INFO:arch/src/x86_64/mod.rs:578 -- Generating guest CPUID for with physical address size: 46

    ==== Possible seccomp violation ====
    Try running with `strace -ff` to identify the cause and open an issue: https://github.com/cloud-hypervisor/cloud-hypervisor/issues/new

Member

@likebreath likebreath left a comment


Great work. We can now migrate VMs from hosts missing KVM_CAP_XSAVE2 to hosts that support it.

Some comments below that should be quick to address. The main one is that we need to explicitly block migration from hosts using KVM_CAP_XSAVE2 to those that do not support it.

@lisongqian lisongqian force-pushed the xsave2 branch 3 times, most recently from 78907c9 to e68fac2 Compare December 4, 2025 04:34
Contributor

@olivereanderson olivereanderson left a comment


Thank you for improving the comments and taking the time to explain all your reasoning.

Some of the comments here do, however, not seem to be entirely consistent with the official Linux kernel documentation, and it would be great to clarify this before merging.

@lisongqian lisongqian force-pushed the xsave2 branch 3 times, most recently from 4e3f765 to cb7f4d1 Compare December 7, 2025 15:56
Comment on lines 306 to 318
impl TryFrom<kvm_xsave> for XsaveState {
    type Error = XsaveStateError;
    fn try_from(s: kvm_xsave) -> Result<Self, Self::Error> {
        // Check if the kvm_xsave struct size is larger than its region size,
        // indicating that extra data exists.
        if std::mem::size_of_val(&s) != std::mem::size_of_val(&s.region) {
            error!("kvm_xsave extra field is not empty");
            return Err(XsaveStateError::XsaveExtraFieldNotEmpty);
        }

        Ok(Self {
            region: s.region,
            extra: Vec::new(),
        })
    }
}
Member

@likebreath likebreath Dec 9, 2025


You might have misread my previous comment: #7534 (comment).

We need to error out for impl From<XsaveState> for kvm_xsave, to avoid truncating XsaveState data on the destination VM side. (We are fine on the source VM side, i.e. no change is needed for the existing impl From<kvm_xsave> for XsaveState.)
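
To make the requested destination-side behaviour concrete, here is a rough sketch of a conversion that errors out instead of truncating. The XsaveState shape (fixed 4 KiB region plus an extra vector) and the error variant name are inferred from the diff in this thread; the merged code may differ, e.g. it may keep From and assert instead of using TryFrom.

```rust
use kvm_bindings::kvm_xsave;

// Shape inferred from the diff in this thread; the element type of `extra`
// is an assumption.
pub struct XsaveState {
    pub region: [u32; 1024],
    pub extra: Vec<u32>,
}

#[derive(Debug)]
pub enum XsaveStateError {
    /// The snapshot carries extended state (e.g. AMX TILEDATA) that the
    /// legacy 4 KiB kvm_xsave buffer on the destination cannot hold.
    XsaveExtraFieldNotEmpty,
}

// Destination side: refuse to build a legacy kvm_xsave if that would silently
// drop `extra` data captured with KVM_GET_XSAVE2 on the source host.
impl TryFrom<&XsaveState> for kvm_xsave {
    type Error = XsaveStateError;

    fn try_from(s: &XsaveState) -> Result<Self, Self::Error> {
        if !s.extra.is_empty() {
            return Err(XsaveStateError::XsaveExtraFieldNotEmpty);
        }
        // SAFETY: kvm_xsave is a plain C struct for which an all-zero bit
        // pattern is a valid value.
        let mut xsave: kvm_xsave = unsafe { std::mem::zeroed() };
        xsave.region = s.region;
        Ok(xsave)
    }
}
```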

Contributor Author


Done.

Member


Thanks for the fast turn-around. The implementation of impl From<XsaveState> for kvm_xsave looks good.

Perhaps my previous comment was not clear - we should stick with the original impl From<kvm_xsave> for XsaveState rather than switching to TryFrom. Practically, the condition std::mem::size_of_val(&s) != std::mem::size_of_val(&s.region) will always be false, leaving the error path unreachable (dead code).

Contributor Author


In fact, kvm_xsave stores the value obtained by KVM_GET_XSAVE2, and kvm_xsave.extra holds the data beyond 4KB. kvm-ioctls wraps this structure into Xsave and provides it to the VMM. I am concerned that a future update of the hypervisor crate may fetch the value by calling the ioctl directly, or that a future update of kvm-ioctls may return kvm_xsave to the VMM directly, so I kept the code that currently appears to be dead code as a reminder to future developers.

This follows the idea of adding an assert for get_xsave2/set_xsave2. If you feel it's unnecessary, I will remove it.

Member


> I am concerned that a future update of the hypervisor crate may fetch the value by calling the ioctl directly, or that a future update of kvm-ioctls may return kvm_xsave to the VMM directly, so I kept the code that currently appears to be dead code as a reminder to future developers.

I see your point - the usage of kvm_xsave, kvm_xsave2, and Xsave from kvm-bindings can be confusing, which may lead to future programming errors. For this case, the best approach is to add an assertion to the original impl From<kvm_xsave> for XsaveState.
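
For completeness, the suggested assertion could look roughly like this (a sketch of the idea, reusing the XsaveState shape inferred from the diff, not the merged change):

```rust
use kvm_bindings::kvm_xsave;

// `XsaveState` as sketched earlier in this thread: the fixed 4 KiB region
// plus an `extra` vector for state captured with KVM_GET_XSAVE2.
pub struct XsaveState {
    pub region: [u32; 1024],
    pub extra: Vec<u32>,
}

// Source side: kvm_xsave can never carry more than its fixed region (its
// `extra` flexible-array member is zero-sized in the Rust bindings), so the
// conversion stays infallible. The assertion documents that invariant and
// fires loudly if a future kvm-bindings/kvm-ioctls update changes it.
impl From<kvm_xsave> for XsaveState {
    fn from(s: kvm_xsave) -> Self {
        assert_eq!(
            std::mem::size_of_val(&s),
            std::mem::size_of_val(&s.region),
            "kvm_xsave unexpectedly carries data beyond its fixed region; \
             use the KVM_GET_XSAVE2 path instead"
        );
        XsaveState {
            region: s.region,
            extra: Vec::new(),
        }
    }
}
```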

Contributor Author


Changed to an assertion.

@lisongqian lisongqian force-pushed the xsave2 branch 2 times, most recently from f6ba7ea to 36abd64 Compare December 9, 2025 12:21
@olivereanderson
Contributor

#7552 should take care of the failing pipeline.

@likebreath likebreath requested a review from phip1611 December 9, 2025 17:49
@likebreath likebreath dismissed phip1611’s stale review December 9, 2025 17:51

I believe all previous comments are addressed. Please take another look.

lisongqian and others added 2 commits December 11, 2025 12:16
The AMX TILE data state may require 8 KB or more of space, so calling the
legacy KVM_GET_XSAVE ioctl fails because it can only return 4 KB of data.
This patch adds KVM_GET_XSAVE2 support so that the larger state can be
snapshotted.

Fixes: cloud-hypervisor#7533

Signed-off-by: Songqian Li <[email protected]>
Signed-off-by: Oliver Anderson <[email protected]>
Signed-off-by: Songqian Li <[email protected]>
@likebreath likebreath added this pull request to the merge queue Dec 11, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Dec 11, 2025
@rbradford rbradford added this pull request to the merge queue Dec 11, 2025
Merged via the queue into cloud-hypervisor:main with commit bcc6ac6 Dec 11, 2025
43 checks passed
@github-project-automation github-project-automation bot moved this from 🆕 New to ✅ Done in Cloud Hypervisor Roadmap Dec 11, 2025
@likebreath likebreath added the bug-fix Bug fix to include in release notes label Dec 11, 2025
@lisongqian lisongqian deleted the xsave2 branch December 12, 2025 01:54


Development

Successfully merging this pull request may close these issues.

Cannot snapshot for VMM when enabling AMX
