Skip to content

Conversation

@olivereanderson
Copy link
Contributor

@olivereanderson olivereanderson commented Dec 3, 2025

Closes #5800.

This PR was developed independently from #7534, but they both end up essentially solving the same problem. Thus issue #7533 will also be closed if this gets merged.

I would be quite happy to combine code and ideas from @lisongqian's PR (as already suggested by @lisongqian here) so we get the best of both the approaches. The most important thing for me is that we keep the standalone enable_amx_state_components (static) method introduced here as we want to use it in the ongoing work we are doing on CPU profiles/templates (#7068 and #7111).

Details about this PR itself

Variable length Xsave state

When XSAVE state components are dynamically enabled, the de-facto length of kvm_xsave changes and this needs to be accounted for. I do this by updating the pre-existing XSaveState struct to wrap the FAM-struct Xsave from the kvm_bindings crate directly and I track the FAM length in a Oncelocked static variable.

How this relates to the upcoming CPU profiles feature

This fix is necessary to properly test the upcoming CPU Profiles feature, but the XSaveState::enable_amx_state_components method introduced here will also be called during CPU profile generation and also prior to checking CPU compatibility when receiving a live migration command.

Risks

There are a few things we should keep in mind with regards to this fix:

  1. It assumes that the FAM (flexible array member) struct approach from the kvm_bindings crate is sound. I am not entirely convinced that this will be the case in the eyes of the Rust abstract machine, but I think we can investigate that more in the future.
  2. It uses static variables. This is fine as long as there are no plans to introduce unit tests, some with and some without AMX (or other dynamically enabled state components), running in the same process. If I recall correctly cargo nextest runs each test in a separate process hence as long as that is used for running the unit/integration tests this should not be a problem.
  3. Heap allocations: The XsaveState will now heap allocate whenever AMX is enabled. Each initialization will require roughly 1KB of memory on the heap. We have however determined that this will never be called in any hot paths.

How this PR has been tested

This was tested by running an AMX workload (loading and mutiplying some matrices, then printing the result in a loop)
inside a guest on a Sapphire rapids server and then migrating it to a Granite rapids server. The migration was successful and the program kept running.

Hints for reviewers

I recommend reviewing one commit at a time.

@olivereanderson olivereanderson requested a review from a team as a code owner December 3, 2025 13:01
@olivereanderson olivereanderson changed the title Fix live migration amx enabled Fix live migration when amx features are enabled Dec 3, 2025
@phip1611
Copy link
Member

phip1611 commented Dec 3, 2025

I think you have to change your commit messages from Signed-Off-by to Signed-off-by for the CI. Not sure why the CI complains otherwise.

In order for guests to use AMX it is necessary to ask the kernel to
enable the related state components for guests. While cloud hypervisor
already does this, we would prefer to extract the logic into a stand
alone (reusable) function.

In this commit we only introduce the error type that will later be part
of the enable_amx_state_components function's signature.

Signed-off-by: Oliver Anderson <[email protected]>
On-behalf-of: SAP [email protected]
We introduce a static enable_amx_state_components method on the
XSaveState struct that will be used in a follow up commit. We will
also extend the logic of what this method does in the near future.

Signed-off-by: Oliver Anderson <[email protected]>
On-behalf-of: SAP [email protected]
We use the central amx_state_components enabling function in the
CpuManager constructor. This way we can make changes to the AMX related
state components functionality without needing to update the
CpuManager's constructor.

Signed-off-by: Oliver Anderson <[email protected]>
On-behalf-of: SAP <[email protected]>
We will need to query KVM for the size of the xsave struct in a follow
up commit. This commit introduces the necessary method on the
hypervisor trait to do that.

Signed-off-by: Oliver Anderson <[email protected]>
On-behalf-of: SAP [email protected]
As we are going to replace all KVM_GET_XSAVE calls with KVM_GET_XSAVE2
we need to update the seccomp filters accordingly.

Signed-off-by: Oliver Anderson <[email protected]>
On-behalf-of: SAP [email protected]
AMX requires dynamically enabling certain large state components which
leads to an increase in the size of kvm_xsave. This was not taken into
account by Cloud hypervisor until now.

We solve this by refactoring `XSaveState` to directly wrap `kvm::Xsave`
and always ensure (via a OnceLocked static variable) that all operations
on the wrapped xsave state obtain an instance of the right size.

Signed-off-by: Oliver Anderson <[email protected]>
On-behalf-of: SAP [email protected]
@olivereanderson olivereanderson force-pushed the fix-live-migration-amx-enabled branch from e94c360 to a66dc72 Compare December 3, 2025 13:19
@olivereanderson olivereanderson changed the title Fix live migration when amx features are enabled hypervisor: Fix live migration when amx features are enabled Dec 3, 2025
@rbradford
Copy link
Member

I don't like the use of the static variable for the length - I think how #7534 handled that aspect was cleaner.

F: FnMut(&mut kvm_bindings::Xsave) -> Result<(), E>,
E: Into<Box<dyn std::error::Error + Send + Sync + 'static>>,
{
let fam_length = XSAVE_FAM_LENGTH.get().unwrap_or(&0);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For safety we should probably use get_or_init here instead.

@olivereanderson
Copy link
Contributor Author

@rbradford @phip1611 @lisongqian what do you think about me moving the three first commits from this PR into a dedicated PR and then we go with #7534?

@lisongqian
Copy link
Contributor

IMO, I prefer to create a new struct like XsaveMgr instead of changing XsaveState. Some items named XXXState are usually only used to save data to the snapshot.

@olivereanderson
Copy link
Contributor Author

olivereanderson commented Dec 3, 2025

IMO, I prefer to create a new struct like XsaveMgr instead of changing XsaveState. Some items named XXXState are usually only used to save data to the snapshot.

I mean we can have it as a free standing function, it does not have to be tied to any struct at all.

All I am trying to say is that your changes are good (at least if I am understanding the kernel documentation correctly), but we would like to extract the preexisting state component enabling logic into a separate function. With my new understanding that is an orthogonal issue and can be done independently of your PR.

If everyone agrees I think I will go ahead and close this (my) open PR and open another one with a standalone function (that will not have any effect on your PR).

if result != 0 || (mask & XFEATURE_XTILEDATA_MASK) != XFEATURE_XTILEDATA_MASK {
return Err(Error::AmxEnable(anyhow!("Guest AMX usage not supported")));
}
hypervisor::arch::x86::XsaveState::enable_amx_state_components(hypervisor.as_ref())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeh, I don't think should be on XsaveState - like @lisongqian said. That struct is really for storing the register state. Can we have this on hypervisor instead?

@likebreath likebreath closed this Dec 3, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Live migration failed with AMX feature enabled.

5 participants