hypervisor: AMX state snapshot and restore support #7534
Conversation
Your code handles a very tricky API situation in the neatest way I can think of. Good work! I can't think of a better way to handle this myself - but maybe @phip1611 or @tpressure would have an alternative idea. Otherwise I think we can go ahead with this.
My colleague @olivereanderson just solved this in a different way, which was merged yesterday into our fork. We'll raise a PR with his solution in the next few hours and then we can discuss :) It seems to happen more frequently now that, by coincidence, two parties work on the same thing at the same time. @lisongqian thanks for your contribution! Before we continue here, I'd love to wait until @olivereanderson has opened a PR for his solution, which solves things in a different way - we can then discuss which solution is best (or combine ideas from both PRs).
To expand a bit on @phip1611's comment: my PR for our fork is here: cyberus-technology#46, and it has already been extensively reviewed. The main idea is similar, but there are some subtle differences. I believe my version aligns better with our ongoing work on CPU templates/profiles, and it also avoids keeping the KVM_GET_XSAVE syscall around (simpler seccomp rules, which I see we both forgot to update 😅). I will open my PR here as well today (or tomorrow at the latest) and let the maintainers decide which variant they want to go with. I want to add that I did comment on #5800 that I have a fix about to be upstreamed, but I am sympathetic to the fact that one cannot be aware of every single issue in this repository, and snapshot isn't 100% the same as live migration. Please don't get me wrong: I am very happy to see more people working to improve Cloud Hypervisor. It would just be great if we can avoid duplicating work to the greatest extent possible 🙂 EDIT: On the other hand, the version presented here avoids global state, which tends to be a good thing 👍
I'm glad we have the same use case. AMX support for cloud-hypervisor will be better!
I just looked at @olivereanderson's code - it's a good idea to make the AMX feature a standalone component. 👍🏻 About removing ...: overall, I prefer to combine our code. We can discuss further once the new PR is ready. 😄 @phip1611 @olivereanderson
Is there a Cloud Hypervisor guideline about the minimum Linux kernel version we support? 5.17 is kinda old already (March 2022).
A good rule of thumb is the oldest supported Ubuntu LTS release - jammy in this case. It shipped with 5.15.
@lisongqian @rbradford my PR is up now: #7536. I will give you some time to review both versions, and then we can proceed in the best way possible from there. As mentioned in my PR description, the most important thing for me is the standalone ...
Aside, regarding the seccomp rules: in the longer term I would recommend that we consider moving away from statically set seccomp rules and instead introduce a toml/json/yaml file where operators can set their desired seccomp rules, which cloud hypervisor would then parse at runtime.
We rejected that approach as it would mean you can't just deploy a single binary to run CH - instead you would need to decide on paths to find the files, deal with people installing in different places, and it would make it too hard to just run from the source tree.
That is fair, but it could be handled by having a default, i.e. if no config file is given/found, it defaults to enabling everything it might possibly need (essentially the current behaviour). This way users who want a single binary get what they want, but so do users who know that their use cases don't require certain syscalls.
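For illustration only, a minimal sketch of that default-fallback idea - this is not Cloud Hypervisor's actual configuration code; the struct, field, and function names (SeccompConfig, allowed_syscalls, load_seccomp_config) are hypothetical, and the serde and toml crates are assumed:

```rust
use std::fs;

use serde::Deserialize;

#[derive(Deserialize, Default)]
struct SeccompConfig {
    /// Syscalls the operator wants to allow, by name.
    /// Empty means "use the built-in defaults".
    #[serde(default)]
    allowed_syscalls: Vec<String>,
}

fn load_seccomp_config(path: Option<&str>) -> SeccompConfig {
    match path.and_then(|p| fs::read_to_string(p).ok()) {
        // A file was provided and readable: honour the operator's choices.
        // (A real implementation would surface parse errors instead of
        // silently falling back to the defaults.)
        Some(contents) => toml::from_str(&contents).unwrap_or_default(),
        // No file given or found: behave exactly like the statically
        // compiled rules do today, so single-binary deployment still works.
        None => SeccompConfig::default(),
    }
}
```

The design point is that the config file is strictly opt-in: operators who never create it keep the current behaviour.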
What's the pragmatic way forward? We keep @olivereanderson's patches that streamline the AMX CPUID handling code and combine them with this approach, right? Who will be working on this?
See my comment here: #7536 (comment). I think we can go with the approach in this PR (almost as is, but please remember to update the seccomp rules to also enable KVM_GET_XSAVE2), and I will open a separate PR for the ...
The CI failure doesn't seem to be caused by this PR:
@lisongqian The failure appears to be related to the changes and is likely due to seccomp issues (as @olivereanderson called out earlier). Here is the relevant test failure log (from the first run):
likebreath left a comment
Great work. We can now migrate VMs from hosts missing KVM_CAP_XSAVE2 to hosts that support it.
Some comments below that should be quick to address. The main one is that we need to explicitly block migration from hosts using KVM_CAP_XSAVE2 to those that do not support it.
(force-pushed from 78907c9 to e68fac2)
olivereanderson left a comment
Thank you for improving the comments and taking the time to explain all your reasoning.
Some of the comments here do, however, seem not to be entirely consistent with the official Linux kernel documentation, and it would be great to clarify this before merging.
(force-pushed from 4e3f765 to cb7f4d1)
hypervisor/src/kvm/x86_64/mod.rs (Outdated)
impl TryFrom<kvm_xsave> for XsaveState {
    type Error = XsaveStateError;

    fn try_from(s: kvm_xsave) -> Result<Self, Self::Error> {
        // Check if the kvm_xsave struct size is larger than the region size, indicating extra data exists
        if std::mem::size_of_val(&s) != std::mem::size_of_val(&s.region) {
            error!("kvm_xsave extra field is not empty");
            return Err(XsaveStateError::XsaveExtraFieldNotEmpty);
        }

        Ok(Self {
            region: s.region,
            extra: Vec::new(),
        })
    }
}
You might have misread my previous comment: #7534 (comment).
We need to error out for impl From<XsaveState> for kvm_xsave, to avoid truncating XsaveState data on the destination VM side. (We are fine on the source VM side, i.e. no change needed for impl From<kvm_xsave> for XsaveState.)
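For illustration, a minimal sketch of that destination-side guard using simplified stand-in types - LegacyKvmXsave and the field/variant names are made up for this example, and TryFrom is used here purely to show the error path; the PR's real kvm_xsave/XsaveState conversion may look different:

```rust
#[derive(Debug)]
enum XsaveStateError {
    /// The saved state carries data beyond the legacy 4KB region.
    XsaveExtraDataWouldBeTruncated,
}

struct XsaveState {
    region: [u32; 1024],
    extra: Vec<u32>, // populated via KVM_GET_XSAVE2 on the source host
}

struct LegacyKvmXsave {
    region: [u32; 1024], // all a pre-XSAVE2 destination can accept
}

impl TryFrom<&XsaveState> for LegacyKvmXsave {
    type Error = XsaveStateError;

    fn try_from(s: &XsaveState) -> Result<Self, Self::Error> {
        if !s.extra.is_empty() {
            // Restoring through the legacy ioctl would silently lose AMX state.
            return Err(XsaveStateError::XsaveExtraDataWouldBeTruncated);
        }
        Ok(Self { region: s.region })
    }
}
```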
Done.
Thanks for the fast turn-around. The implementation of impl From<XsaveState> for kvm_xsave looks good.
Perhaps my previous comment was not clear - we should stick with the original impl From<kvm_xsave> for XsaveState rather than switching to TryFrom. Practically, the condition std::mem::size_of_val(&s) != std::mem::size_of_val(&s.region) will always be false, leaving the error path unreachable (dead code).
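To see why that condition can never be true, here is a small self-contained illustration; XsaveLike is a made-up stand-in, with a zero-length array taking the place of kvm-bindings' __IncompleteArrayField<u32>:

```rust
// A trailing zero-sized array adds nothing to the struct's size, so the
// struct and its `region` field have the same size.
#[repr(C)]
struct XsaveLike {
    region: [u32; 1024],
    extra: [u32; 0],
}

fn main() {
    assert_eq!(
        std::mem::size_of::<XsaveLike>(),
        std::mem::size_of::<[u32; 1024]>(),
    );
    // Hence `size_of_val(&s) != size_of_val(&s.region)` is always false.
}
```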
In fact, kvm_xsave stores the value obtained by KVM_GET_XSAVE2, and kvm_xsave.extra stores the data beyond 4KB. kvm-ioctls wraps this structure into Xsave and provides it to the VMM. I am concerned that future updates of the hypervisor may get the value by calling the ioctl directly, or that future updates of kvm-ioctls may return kvm_xsave to the VMM directly, so I kept the code that currently appears to be dead to remind future developers.
This refers to the idea of adding an assert for get-xsave2/set-xsave2. If you feel it's unnecessary, I will remove it.
> I am concerned that future updates of the hypervisor may get the value by calling the ioctl directly, or that future updates of kvm-ioctls may return kvm_xsave to the VMM directly, so I kept the code that currently appears to be dead to remind future developers.
I see your point - the usage of kvm_xsave, kvm_xsave2, and Xsave from kvm_bindings could be confusing, which may lead to future programming errors. For this case, the best way is to add an assertion to the original impl From<kvm_xsave> for XsaveState.
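A hedged sketch of what that suggestion could look like, assuming the kvm_xsave binding and the PR's XsaveState type shown in the diff above (the actual commit may differ):

```rust
impl From<kvm_xsave> for XsaveState {
    fn from(s: kvm_xsave) -> Self {
        // With the current kvm-bindings layout, the trailing `extra` member is
        // zero-sized, so this always holds; it only fires if that ever changes.
        assert_eq!(
            std::mem::size_of_val(&s),
            std::mem::size_of_val(&s.region),
            "kvm_xsave carries extra data that this conversion would drop"
        );
        Self {
            region: s.region,
            extra: Vec::new(),
        }
    }
}
```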
Changed to an assertion.
(force-pushed from f6ba7ea to 36abd64)
#7552 should take care of the failing pipeline.
I believe all previous comments are addressed. Please take another look.
The TILE data state of AMX may require 8KB+ of space; calling the legacy KVM_GET_XSAVE will encounter an error since KVM_GET_XSAVE can only retrieve 4KB. This patch adds KVM_GET_XSAVE2 support to allow snapshotting more data. Fixes: cloud-hypervisor#7533 Signed-off-by: Songqian Li <[email protected]>
Signed-off-by: Oliver Anderson <[email protected]> Signed-off-by: Songqian Li <[email protected]>
hypervisor: AMX state snapshot and restore support
The TILE data state of AMX may require 8KB+ of space; calling the legacy
KVM_GET_XSAVE will encounter an error since KVM_GET_XSAVE can only retrieve
4KB. This patch adds KVM_GET_XSAVE2 support to allow snapshotting more
data.
Fixes: #7533
Signed-off-by: Songqian Li [email protected]