
Conversation

@jiangliu (Member) commented Jun 5, 2019

The VirtioDevice trait derived from the firecracker project is heavily
shaped by the MMIO transport layer, so enhance the VirtioDevice trait
to prepare for the virtio PCI transport layer. The key changes are to
support multiple interrupts per virtio device and to encapsulate the
parameters of VirtioDevice::activate() into a VirtioDeviceConfig struct
for readability.

Signed-off-by: Liu Jiang <[email protected]>

@bonzini (Member) commented Jun 5, 2019

I'm a bit wary of adding something like VirtioDeviceInterrupt here; it seems like a generic trait that should be in vm-device. What is the state there? Should we start that crate by just adding

use vmm_sys_util::eventfd::EventFd; // assuming the EventFd wrapper from vmm-sys-util

// not sure why Sized + Send in @jiangliu's patch
pub trait ConsumeInterrupt {
    /// Inject the specified interrupt into the guest.
    fn trigger(&self, interrupt_id: u32) -> std::io::Result<()>;
}

impl ConsumeInterrupt for EventFd {
    fn trigger(&self, _interrupt_id: u32) -> std::io::Result<()> {
        self.write(1)
    }
}

impl ConsumeInterrupt for Vec<EventFd> {
    fn trigger(&self, interrupt_id: u32) -> std::io::Result<()> {
        self[interrupt_id as usize].trigger(0)
    }
}

?

@jiangliu (Member Author) commented Jun 5, 2019

I'm a bit wary of adding something like VirtioDeviceInterrupt here; it seems like a generic trait that should be in vm-device. What is the state there?

The VirtioDeviceInterrupt will have concrete implementations for both legacy IRQ and PCI MSI. For legacy IRQ, it needs to set status bits in the interrupt status register before writing to the eventfd, so we can't simply implement the ConsumeInterrupt trait for EventFd/Vec<EventFd>.

And it's a good point that VirtioDeviceInterrupt should be renamed and moved into vm-device; there's no special treatment for VirtioDevice at all.

The Sized + Send may be an over-claim :(
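
To illustrate the legacy-IRQ point above, a minimal sketch of what such an implementor might look like (the type, its fields, and the status-bit value are hypothetical, not part of this PR):

use std::io;
use std::sync::atomic::{AtomicU32, Ordering};
use std::sync::Arc;

use vmm_sys_util::eventfd::EventFd; // assuming the EventFd wrapper from vmm-sys-util

pub trait ConsumeInterrupt {
    fn trigger(&self, interrupt_id: u32) -> io::Result<()>;
}

// Hypothetical legacy-IRQ implementor: latch the ISR status bit
// before notifying the guest through the eventfd.
pub struct LegacyIrq {
    interrupt_status: Arc<AtomicU32>,
    interrupt_evt: EventFd,
}

impl ConsumeInterrupt for LegacyIrq {
    fn trigger(&self, _interrupt_id: u32) -> io::Result<()> {
        self.interrupt_status.fetch_or(0x1, Ordering::SeqCst);
        self.interrupt_evt.write(1)
    }
}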

@sameo commented Jun 5, 2019

I'm a bit wary of adding something like VirtioDeviceInterrupt here; it seems like a generic trait that should be in vm-device. What is the state there?

I agree, and this is exactly what we're currently talking about here: cloud-hypervisor/cloud-hypervisor#38 (comment), also for adding MSI-X support to cloud-hypervisor.

A generic trait makes sense to me, and it should indeed be part of the vm-device crate.
As for the latter, there's a PR currently pending for it; feel free to chime in so that we can pile this kind of addition on top of the initial PR once it's merged.

@jiangliu (Member Author) commented Jun 5, 2019

Seems I should review the vm-device crate first :)
It's a good point to let the vm-device crate handle interrupts.

@bonzini (Member) commented Jun 5, 2019

For legacy IRQ, it needs to set status bits in the interrupt status register before writing to the eventfd.

That should be handled by the PCI transport. The PCI transport should provide its own implementation of ConsumeInterrupt, with code like "if !self.msi_active { self.set_interrupt_status(); } interrupt_controller.trigger(interrupt_id);". Here, interrupt_controller is also a ConsumeInterrupt implementor, probably provided by the PCI bus, which in turn will percolate all the way down to a Vec<EventFd> or something that invokes the KVM_SIGNAL_MSI ioctl.
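
Something along these lines, purely as a sketch (the type and field names are made up for illustration):

use std::io;
use std::sync::atomic::{AtomicU32, Ordering};
use std::sync::Arc;

pub trait ConsumeInterrupt {
    fn trigger(&self, interrupt_id: u32) -> io::Result<()>;
}

// Hypothetical PCI-transport implementor that cascades to an
// interrupt controller provided by the PCI bus.
pub struct VirtioPciInterrupt {
    msi_active: bool,
    interrupt_status: Arc<AtomicU32>,
    interrupt_controller: Arc<dyn ConsumeInterrupt + Send + Sync>,
}

impl ConsumeInterrupt for VirtioPciInterrupt {
    fn trigger(&self, interrupt_id: u32) -> io::Result<()> {
        if !self.msi_active {
            // Legacy INTx path: latch the ISR status bit first.
            self.interrupt_status.fetch_or(0x1, Ordering::SeqCst);
        }
        // The controller may write an eventfd or issue KVM_SIGNAL_MSI;
        // this level doesn't need to know which.
        self.interrupt_controller.trigger(interrupt_id)
    }
}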

@sboeuf commented Jun 5, 2019

@bonzini I feel we need two levels here, since we want to keep the KVM-specific bits at the VMM code level, while in between, at the PCI level, we need to distinguish between IRQ and MSI.
I'm sure what I'm trying to say is not very clear, but take a look here for more details.

@bonzini (Member) commented Jun 5, 2019

I feel we need two levels here, since we want to keep the KVM-specific bits at the VMM code level, while in between, at the PCI level, we need to distinguish between IRQ and MSI.

Right, a trait gives you all the levels you need, for example:

  • virtio devices just see ConsumeInterrupt

  • virtio-pci devices provide an implementation that distinguishes between IRQ and MSI

  • the PCI bus provides two implementations, one for IRQ and one for MSI

  • the VMM uses the closure mechanism you have in cloud-hypervisor/cloud-hypervisor#38 (Add MSI-X support) to invoke KVM ioctls or write to eventfds
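
For the last point, the VMM-level implementor could be as simple as a closure wrapper, roughly like this (illustrative names, not an existing API):

use std::io;

pub trait ConsumeInterrupt {
    fn trigger(&self, interrupt_id: u32) -> io::Result<()>;
}

// Wrap any closure as an interrupt consumer, so the VMM can hide an
// eventfd write or a KVM_SIGNAL_MSI ioctl behind it.
pub struct InterruptClosure<F: Fn(u32) -> io::Result<()>>(pub F);

impl<F: Fn(u32) -> io::Result<()>> ConsumeInterrupt for InterruptClosure<F> {
    fn trigger(&self, interrupt_id: u32) -> io::Result<()> {
        (self.0)(interrupt_id)
    }
}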

@jiangliu (Member Author) commented Jun 5, 2019

For legacy IRQ, it needs to set status bits in the interrupt status register before writing to the eventfd.

That should be handled by the PCI transport. The PCI transport should provide its own implementation of ConsumeInterrupt, with code like "if !self.msi_active { self.set_interrupt_status(); } interrupt_controller.trigger(interrupt_id);". Here, interrupt_controller is also a ConsumeInterrupt implementor, probably provided by the PCI bus, which in turn will percolate all the way down to a Vec<EventFd> or something that invokes the KVM_SIGNAL_MSI ioctl.

Good point. Originally I planned to share the code dealing with legacy IRQs between the PCI and MMIO transport layers.

@jiangliu (Member Author) commented Jun 5, 2019

I feel we need two levels here, since we want to keep the KVM-specific bits at the VMM code level, while in between, at the PCI level, we need to distinguish between IRQ and MSI.

Right, a trait gives you all the levels you need, for example:

  • virtio devices just see ConsumeInterrupt
  • virtio-pci devices provide an implementation that distinguishes between IRQ and MSI
  • the PCI bus provides two implementations, one for IRQ and one for MSI
  • the VMM uses the closure mechanism you have in intel/cloud-hypervisor#38 to invoke KVM ioctls or write to eventfds

I like the idea of a cascaded ConsumeInterrupt; it gives us a chance to play some magic :)

@sboeuf commented Jun 5, 2019

@bonzini ok I'm really trying to understand :)

virtio devices just see ConsumeInterrupt

Right now, the EventFd, or the callback, is passed through VirtioDevice.activate(). How does that work with ConsumeInterrupt? What type of object would you pass to VirtioDevice so that, later on, VirtioDevice can call into object.trigger() because the object implements ConsumeInterrupt?

virtio-pci devices provide an implementation that distinguishes between IRQ and MSI

Are you suggesting something like:

impl ConsumeInterrupt for VirtioPciDevice {
    fn trigger(&self, interrupt_id: u32) -> std::io::Result<()> {
        ...
    }
}

But we're not supposed to pass the VirtioPciDevice down to the VirtioDevice, so who would call into this implementation?

the PCI bus provides two implementations, one for IRQ and one for MSI

PciDevice is a trait. Are you suggesting that PciDevice could also implement ConsumeInterrupt with some default implementation?

the VMM uses the closure mechanism you have in cloud-hypervisor/cloud-hypervisor#38 to invoke KVM ioctls or write to eventfds

Ok cool!

@jiangliu (Member Author) commented Jun 5, 2019 via email

@bonzini (Member) commented Jun 5, 2019

But we're not supposed to pass the VirtioPciDevice down to the VirtioDevice, so who would call into this implementation?

VirtioDevice could implement the trait, or alternatively it could be passed to activate.

the PCI bus provides two implementations, one for IRQ and one for MSI

PciDevice is a trait. Are you suggesting that PciDevice could also implement ConsumeInterrupt with some default implementation?

Either that, or it could have methods

    fn intx_handler(&self) -> &LevelTriggeredInterrupt;
    fn msi_handler(&self) -> &EdgeTriggeredInterrupt;

(What I wrote as ConsumeInterrupt above has now become EdgeTriggeredInterrupt - sorry about that.)
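
To make the "passed to activate" option concrete, a minimal sketch (the trait and method names are illustrative only, not the actual vm-virtio API):

use std::io;
use std::sync::Arc;

pub trait ConsumeInterrupt: Send + Sync {
    fn trigger(&self, interrupt_id: u32) -> io::Result<()>;
}

// Hypothetical trimmed-down device trait: the device only sees the
// interrupt abstraction, never eventfds or transport details.
pub trait VirtioDevice {
    fn activate(&mut self, interrupt: Arc<dyn ConsumeInterrupt>) -> io::Result<()>;
}

The device then just calls interrupt.trigger(vector) when it needs to notify the guest, and the same device code works for MMIO, legacy INTx, or MSI-X depending on what the transport passed in.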

@alexandruag (Collaborator) commented

Hi everyone, wall of text incoming :( It looks like the functionality that supports Virtio devices in Firecracker and other VMMs can be split into three major components:

  • There's the transport, that handles discovery (when necessary) & configuration in terms of device status, feature negotiation, queue setup, and device-specific configuration space (and also keeps the associated state).

  • Then, we have the queue handler (based on the Virtio queue implementation), which performs actions in response to events/notifications associated with queues and/or the device backend. I think this part should be totally agnostic to the mechanism used to deliver notifications; it should just expose functionality such as handle_this_queue or handle_backend, etc., which is then invoked by the event driver (the following point). The handler must notify the guest on certain occasions, but this can be enabled, for example, via an externally provided closure (I think someone already mentioned this somewhere); a rough sketch follows at the end of this comment.

  • Finally, there's an event driver/device state machine of sorts, which handles things like registering io/irq eventfds, other similar things, triggering functionality in response to events, etc. This basically connects the other two components among themselves and to the rest of the VMM.

Whenever we'd want to use a Virtio device, we pick a transport, a queue handler + backend, and the event driver part to actuate the rest. The event driver is a bit more difficult to precisely define as an interface, because it's naturally related to other rust-vmm high level concepts such as buses and resource allocators, but it looks doable.

I think there's value in clearly separating the transport and queue handler (for example, right now the activate method from the VirtioDevice trait assumes EventFds are used no matter what). If we look at an MMIO device, in the separation approach, the event driver passes the appropriate events (MMIO exits) to the transport part, and keeps tabs on its state. When the device is ready to activate, the event driver instantiates the queue handler, and drives it in response to its relevant events (which may arrive via EventFd, VM exits, etc). It also propagates any configuration changes, or situations like the elusive device reset. When looking from above, things can get wrapped into different traits and interfaces, but it seems these three are good candidates for the basic building blocks. What do you think?
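
As a rough sketch of the second bullet above (the names here are made up), the queue handler could take the guest-notification mechanism as a closure and stay agnostic to how the notification is actually delivered:

use std::io;

// Hypothetical queue handler: it reacts to queue/backend events and
// notifies the guest through an externally supplied closure, so it
// never touches eventfds, VM exits, or any particular transport.
pub struct QueueHandler<N: Fn() -> io::Result<()>> {
    notify_guest: N,
}

impl<N: Fn() -> io::Result<()>> QueueHandler<N> {
    pub fn handle_queue_event(&mut self) -> io::Result<()> {
        // ... pop available descriptors, hand them to the backend,
        // place used entries back on the ring ...
        (self.notify_guest)()
    }
}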

@jiangliu (Member Author) commented Jun 16, 2019 via email

@alexandruag (Collaborator) commented

Hi Gerry, my choice of words was poor regarding what I meant by transport, because I was referring precisely to the logic which handles the initial configuration/negotiation/setup of the device. However, I included transport in there because I was under the impression there are at least some slight differences in the configuration space itself depending on whether we're dealing with MMIO or PCI devices. You're right that we should keep in mind that packed virtqueues are a thing now; this is probably done by having a Queue trait implemented by both the split and packed queue handling logic.
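
For example, something as small as this (purely illustrative, not an existing rust-vmm interface) would let the queue handling logic stay oblivious to the ring layout:

// Hypothetical abstraction over the split and packed virtqueue layouts;
// both ring implementations would implement it, so the code above them
// doesn't care which one the driver negotiated.
pub trait Queue {
    /// Returns true if the driver has made at least one buffer available.
    fn has_available(&self) -> bool;
    /// Mark the descriptor chain with head `index` as used, with `len` bytes written.
    fn add_used(&mut self, index: u16, len: u32);
}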

Some code would probably be much better at illustrating what I was trying to say, so I'm planning to write a very simple mock design/interface to get feedback from people. Btw, is there some overall top-down example/discussion about how the current device/bus/etc. proposals interact? Basically, if someone wants to build a VMM with these parts from rust-vmm, which are the generic reusable components and what has to be VMM-specific? I think this is an important perspective to consider, and I'll try to touch upon it as well.

@jiangliu (Member Author) commented

It's out of date now.

@jiangliu jiangliu closed this Mar 29, 2020