
[RFC] devices: pvmemcontrol, control guest physical memory properties #6318


Description

@Dummyc0m

I'm working on memory passthrough for lightweight VMs. We've come up with an approach that's guest-driven and proactively keeps the VM slim. Memctl is the name of the device/driver that communicates between the guest and the VMM to control the host backing of guest memory.

Yuanchu Xie [email protected]
Pasha Tatashin [email protected] @soleen


Memctl provides a way for the guest to control its physical memory
properties, and enables optimizations and security features. For
example, the guest can tell the host which parts of a hugepage may be
unbacked, or that sensitive data should not be swapped out, etc.

Memctl allows a guest to manipulate its gPTE entries in the SLAT, as
well as other properties of the host memory mapping that backs the
guest's memory. This is achieved using the KVM_CAP_SYNC_MMU capability.
When this capability is available, changes to the backing of a memory
region on the host are automatically reflected into the guest. For
example, an mmap() or madvise() that affects the region is made visible
immediately.

The implementation has two components: the guest Linux driver and the
Virtual Machine Monitor (VMM) device. A guest-allocated shared buffer is
negotiated per-cpu through a few PCI MMIO registers, and the VMM device
assigns a unique command to each per-cpu buffer. The guest writes its
memctl request into the per-cpu buffer, then writes the corresponding
command into the command register, calling into the VMM device to
perform the memctl request.

The synchronous per-cpu shared buffer approach avoids the kick and
busy-waiting that the guest would have to do with a virtio virtqueue
transport.
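
To make the flow above concrete, here is a minimal sketch of the doorbell write, assuming hypothetical register offsets and per-cpu bookkeeping (MEMCTL_REG_* and memctl_command are illustrative names, not the actual device layout):

#include <linux/io.h>
#include <linux/percpu.h>

/* Hypothetical MMIO register offsets used during per-cpu negotiation. */
#define MEMCTL_REG_BUF_GPA   0x00  /* guest publishes this CPU's buffer address */
#define MEMCTL_REG_COMMAND   0x08  /* VMM-assigned command doubles as a doorbell */

/* One command token per CPU, handed out by the VMM during negotiation. */
static DEFINE_PER_CPU(u32, memctl_command);

static void memctl_kick(void __iomem *regs)
{
	/*
	 * The request has already been written into this CPU's shared buffer;
	 * writing the assigned command causes a synchronous exit to the VMM,
	 * which services the request and writes the result back into the
	 * buffer before returning to the guest.
	 */
	writel(this_cpu_read(memctl_command), regs + MEMCTL_REG_COMMAND);
}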

We provide both kernel and userspace APIs.

Kernel API

long memctl_vmm_call(__u64 func_code, __u64 addr, __u64 length, __u64 arg,
                     struct memctl_buf *buf);

Kernel drivers can take advantage of the memctl calls to provide
paravirtualization of kernel stacks or page zeroing.
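
For illustration, a kernel-side caller might look like the sketch below; memctl_vmm_call() and MEMCTL_DONTNEED are named in this RFC, while the header path, helper name, and error handling are assumptions:

#include <linux/errno.h>
#include <linux/memctl.h>	/* hypothetical header declaring memctl_vmm_call() and struct memctl_buf */

/* Ask the host to drop the backing of a freed, hugepage-backed kernel stack range. */
static int memctl_release_stack(unsigned long addr, unsigned long len)
{
	struct memctl_buf buf = {};
	long ret;

	ret = memctl_vmm_call(MEMCTL_DONTNEED, addr, len, 0, &buf);
	return ret ? -EIO : 0;
}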

User API
From userland, the memctl guest driver is controlled via an ioctl(2)
call, which requires CAP_SYS_ADMIN.

ioctl(fd, MEMCTL_IOCTL, struct memctl_buf *buf);

Guest userland applications can tag VMAs and guest hugepages, or advise
the host on how to handle sensitive guest pages.
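
As a hedged example of the ioctl path, the program below maps an anonymous region and asks the host to drop the backing of its first half with MEMCTL_DONTNEED. The device node path, the UAPI header name, and the address semantics are assumptions for illustration; only MEMCTL_IOCTL, MEMCTL_DONTNEED, and the CAP_SYS_ADMIN requirement come from the description above.

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <unistd.h>
#include <linux/memctl.h>   /* hypothetical UAPI header: struct memctl_buf, MEMCTL_IOCTL, ... */

int main(void)
{
	size_t len = 2UL << 20;   /* a 2 MiB region, e.g. one guest hugepage */
	void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
	               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	struct memctl_buf buf;
	int fd;

	if (p == MAP_FAILED)
		return 1;
	memset(p, 0xab, len);     /* touch the region so it is actually backed */

	fd = open("/dev/memctl", O_RDWR);   /* hypothetical device node */
	if (fd < 0) {
		perror("open");
		return 1;
	}

	memset(&buf, 0, sizeof(buf));
	buf.func_code = MEMCTL_DONTNEED;    /* drop the host backing...        */
	buf.addr = (unsigned long)p;        /* ...of the first half of the map */
	buf.length = len / 2;

	if (ioctl(fd, MEMCTL_IOCTL, &buf) < 0)
		perror("MEMCTL_IOCTL");

	close(fd);
	return 0;
}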

Supported function codes and their use cases:
MEMCTL_FREE/REMOVE/DONTNEED/PAGEOUT. In the guest, one can reduce the
struct page and page table lookup overhead by using hugepages backed by
smaller pages on the host. These memctl commands allow partial freeing
of private guest hugepages to save memory. They also allow kernel
memory, such as kernel stacks and task_structs, to be paravirtualized.

MEMCTL_UNMERGEABLE is useful for security when the VM does not want to
share its backing pages. The same goes for MADV_DONTDUMP, which keeps
sensitive pages out of host dumps. MLOCK/UNLOCK can advise the host not
to swap out sensitive guest memory.

MEMCTL_MPROTECT_NONE/R/W/RW. For guest stacks backed by hugepages, stack
guard pages can be handled on the host, saving memory within the
hugepage (see the sketch after this list).

MEMCTL_SET_VMA_ANON_NAME is useful for observability and debugging how
guest memory is being mapped on the host.
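
As a sketch of the guard-page case above, a guest kernel helper could revoke access to the lowest page of a hugepage-backed stack; only MEMCTL_MPROTECT_NONE and memctl_vmm_call() come from this RFC, while the helper name and stack layout are assumptions:

#include <linux/errno.h>
#include <linux/mm.h>		/* PAGE_SIZE */
#include <linux/memctl.h>	/* hypothetical header, as in the kernel example above */

/* Make the lowest page of a hugepage-backed stack inaccessible on the host,
 * so an overflow faults without dedicating a whole guest page as a guard. */
static int memctl_guard_stack(unsigned long stack_base)
{
	struct memctl_buf buf = {};
	long ret;

	ret = memctl_vmm_call(MEMCTL_MPROTECT_NONE, stack_base, PAGE_SIZE, 0, &buf);
	return ret ? -EIO : 0;
}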

Sample program making use of MEMCTL_SET_VMA_ANON_NAME and
MEMCTL_DONTNEED:
https://github.com/Dummyc0m/memctl-set-anon-vma-name/tree/main
https://github.com/Dummyc0m/memctl-set-anon-vma-name/tree/dontneed

Guest kernel driver
https://github.com/Dummyc0m/linux-memctl

The VMM implementation is being proposed for Cloud Hypervisor:
https://github.com/Dummyc0m/cloud-hypervisor/
