
Conversation

@jiangliu (Member) commented Jun 5, 2019

The VirtioDevice trait derived from the firecracker project is heavily
shaped by the MMIO transport layer, so enhance the VirtioDevice trait
to prepare for the virtio PCI transport layer. The key changes are to
support multiple interrupts per virtio device and to encapsulate the
parameters of VirtioDevice::activate() into a VirtioDeviceConfig struct
for readability.

Signed-off-by: Liu Jiang <[email protected]>

@bonzini (Member) commented Jun 5, 2019

I'm a bit wary of adding something like VirtioDeviceInterrupt here; it seems like a generic trait that should be in vm-device. What is the state there? Should we start that crate by just adding

use vmm_sys_util::eventfd::EventFd; // assuming the EventFd wrapper from vmm-sys-util

// not sure why Sized + Send in @jiangliu's patch
pub trait ConsumeInterrupt {
    /// Inject the specified interrupt into the guest.
    fn trigger(&self, interrupt_id: u32) -> std::io::Result<()>;
}

impl ConsumeInterrupt for EventFd {
    fn trigger(&self, _interrupt_id: u32) -> std::io::Result<()> {
        self.write(1)
    }
}

impl ConsumeInterrupt for Vec<EventFd> {
    fn trigger(&self, interrupt_id: u32) -> std::io::Result<()> {
        self[interrupt_id as usize].trigger(0)
    }
}

?

@jiangliu (Member Author) commented Jun 5, 2019

I'm a bit wary of adding something like VirtioDeviceInterrupt here; it seems like a generic trait that should be in vm-device. What is the state there?

The VirtioDeviceInterrupt will have concrete implementations for both legacy IRQ and PCI MSI. For legacy IRQ, it needs to set status bits in the interrupt status register before writing to the eventfd, so we can't simply implement the ConsumeInterrupt trait for EventFd/Vec<EventFd>.

And it's a good point that VirtioDeviceInterrupt should be renamed and moved into vm-device; there's no special treatment for VirtioDevice at all.

The Sized + Send may be an over-claim :(
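
To illustrate the legacy-IRQ point above, a minimal sketch of what such an implementor might look like (the type, its fields, and the status-bit value are hypothetical, not part of this PR):

use std::io;
use std::sync::atomic::{AtomicU32, Ordering};
use std::sync::Arc;

use vmm_sys_util::eventfd::EventFd; // assuming the EventFd wrapper from vmm-sys-util

pub trait ConsumeInterrupt {
    fn trigger(&self, interrupt_id: u32) -> io::Result<()>;
}

// Hypothetical legacy-IRQ implementor: latch the ISR status bit
// before notifying the guest through the eventfd.
pub struct LegacyIrq {
    interrupt_status: Arc<AtomicU32>,
    interrupt_evt: EventFd,
}

impl ConsumeInterrupt for LegacyIrq {
    fn trigger(&self, _interrupt_id: u32) -> io::Result<()> {
        self.interrupt_status.fetch_or(0x1, Ordering::SeqCst);
        self.interrupt_evt.write(1)
    }
}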

@sameo commented Jun 5, 2019

I'm a bit wary of adding something like VirtioDeviceInterrupt here; it seems like a generic trait that should be in vm-device. What is the state there?

I agree, and this is exactly what we're currently talking about here: cloud-hypervisor/cloud-hypervisor#38 (comment), also for adding MSI-X support to cloud-hypervisor.

A generic trait makes sense to me, and it should indeed be part of the vm-device crate.
As for the latter, there's a PR currently pending for it; feel free to chime in so that we can pile this kind of addition on top of the initial PR once it's merged.

@jiangliu (Member Author) commented Jun 5, 2019

Seems I should review the vm-device crate first :)
It's a good point to let the vm-device crate handle interrupts.

@bonzini (Member) commented Jun 5, 2019

For legacy IRQ, it needs to set status bits in the interrupt status register before writing to the eventfd.

That should be handled by the PCI transport. The PCI transport should provide its own implementation of ConsumeInterrupt, with code like "if !self.msi_active { self.set_interrupt_status(); } interrupt_controller.trigger(interrupt_id);". Here, interrupt_controller is also a ConsumeInterrupt implementor, probably provided by the PCI bus, which in turn will percolate all the way down to a Vec<EventFd> or something that invokes the KVM_SIGNAL_MSI ioctl.
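
Something along these lines, purely as a sketch (the type and field names are made up for illustration):

use std::io;
use std::sync::atomic::{AtomicU32, Ordering};
use std::sync::Arc;

pub trait ConsumeInterrupt {
    fn trigger(&self, interrupt_id: u32) -> io::Result<()>;
}

// Hypothetical PCI-transport implementor that cascades to an
// interrupt controller provided by the PCI bus.
pub struct VirtioPciInterrupt {
    msi_active: bool,
    interrupt_status: Arc<AtomicU32>,
    interrupt_controller: Arc<dyn ConsumeInterrupt + Send + Sync>,
}

impl ConsumeInterrupt for VirtioPciInterrupt {
    fn trigger(&self, interrupt_id: u32) -> io::Result<()> {
        if !self.msi_active {
            // Legacy INTx path: latch the ISR status bit first.
            self.interrupt_status.fetch_or(0x1, Ordering::SeqCst);
        }
        // The controller may write an eventfd or issue KVM_SIGNAL_MSI;
        // this level doesn't need to know which.
        self.interrupt_controller.trigger(interrupt_id)
    }
}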

@sboeuf commented Jun 5, 2019

@bonzini I feel we need two levels here, since we want to keep the KVM-specific bits at the VMM code level, while in between, at the PCI level, we need to distinguish between IRQ and MSI.
I'm sure what I'm trying to say is not very clear, but take a look here for more details.

@bonzini (Member) commented Jun 5, 2019

I feel we need two levels here, since we want to keep the KVM-specific bits at the VMM code level, while in between, at the PCI level, we need to distinguish between IRQ and MSI.

Right, a trait gives you all the levels you need, for example:

  • virtio devices just see ConsumeInterrupt

  • virtio-pci devices provide an implementation that distinguishes between IRQ and MSI

  • the PCI bus provides two implementations, one for IRQ and one for MSI

  • the VMM uses the closure mechanism you have in cloud-hypervisor/cloud-hypervisor#38 (Add MSI-X support) to invoke KVM ioctls or write to eventfds
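
For the last point, the VMM-level implementor could be as simple as a closure wrapper, roughly like this (illustrative names, not an existing API):

use std::io;

pub trait ConsumeInterrupt {
    fn trigger(&self, interrupt_id: u32) -> io::Result<()>;
}

// Wrap any closure as an interrupt consumer, so the VMM can hide an
// eventfd write or a KVM_SIGNAL_MSI ioctl behind it.
pub struct InterruptClosure<F: Fn(u32) -> io::Result<()>>(pub F);

impl<F: Fn(u32) -> io::Result<()>> ConsumeInterrupt for InterruptClosure<F> {
    fn trigger(&self, interrupt_id: u32) -> io::Result<()> {
        (self.0)(interrupt_id)
    }
}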

@jiangliu (Member Author) commented Jun 5, 2019

For legacy IRQ, it needs to set status bits in the interrupt status register before writing to the eventfd.

That should be handled by the PCI transport. The PCI transport should provide its own implementation of ConsumeInterrupt, with code like "if !self.msi_active { self.set_interrupt_status(); } interrupt_controller.trigger(interrupt_id);". Here, interrupt_controller is also a ConsumeInterrupt implementor, probably provided by the PCI bus, which in turn will percolate all the way down to a Vec<EventFd> or something that invokes the KVM_SIGNAL_MSI ioctl.

Good point. Originally I planned to share the code dealing with legacy IRQs between the PCI and MMIO transport layers.

@jiangliu (Member Author) commented Jun 5, 2019

I feel we need two levels here, since we want to keep the KVM-specific bits at the VMM code level, while in between, at the PCI level, we need to distinguish between IRQ and MSI.

Right, a trait gives you all the levels you need, for example:

  • virtio devices just see ConsumeInterrupt
  • virtio-pci devices provide an implementation that distinguishes between IRQ and MSI
  • the PCI bus provides two implementations, one for IRQ and one for MSI
  • the VMM uses the closure mechanism you have in intel/cloud-hypervisor#38 to invoke KVM ioctls or write to eventfds

I like the idea of a cascaded ConsumeInterrupt; it gives us a chance to play some magic :)

@sboeuf commented Jun 5, 2019

@bonzini ok I'm really trying to understand :)

virtio devices just see ConsumeInterrupt

Right now, the EventFd, or the callback, is passed through VirtioDevice.activate(). How does that work with ConsumeInterrupt? What type of object would you pass to VirtioDevice so that, later on, VirtioDevice can call into object.trigger() because the object implements ConsumeInterrupt?

virtio-pci devices provide an implementation that distinguishes between IRQ and MSI

Are you suggesting something like:

impl ConsumeInterrupt for VirtioPciDevice {
    fn trigger(&self, interrupt_id: u32) -> std::io::Result<()> {
        ...
    }
}

But we're not supposed to pass the VirtioPciDevice down to the VirtioDevice, so who would call into this implementation?

the PCI bus provides two implementations, one for IRQ and one for MSI

PciDevice is a trait. Are you suggesting that PciDevice could also implement ConsumeInterrupt with some default implementation?

the VMM uses the closure mechanism you have in cloud-hypervisor/cloud-hypervisor#38 to invoke KVM ioctls or write to eventfds

Ok cool!

@jiangliu (Member Author) commented Jun 5, 2019 via email

@bonzini (Member) commented Jun 5, 2019

But we're not supposed to pass the VirtioPciDevice down to the VirtioDevice, so who would call into this implementation?

VirtioDevice could implement the trait, or alternatively it could be passed to activate.

the PCI bus provides two implementations, one for IRQ and one for MSI

PciDevice is a trait. Are you suggesting that PciDevice could also implement ConsumeInterrupt with some default implementation?

Either that, or it could have methods

    fn intx_handler(&self) -> &LevelTriggeredInterrupt;
    fn msi_handler(&self) -> &EdgeTriggeredInterrupt;

(What I wrote as ConsumeInterrupt above has now become EdgeTriggeredInterrupt - sorry about that.)
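
To make the "passed to activate" option concrete, a minimal sketch (the trait and method names are illustrative only, not the actual vm-virtio API):

use std::io;
use std::sync::Arc;

pub trait ConsumeInterrupt: Send + Sync {
    fn trigger(&self, interrupt_id: u32) -> io::Result<()>;
}

// Hypothetical trimmed-down device trait: the device only sees the
// interrupt abstraction, never eventfds or transport details.
pub trait VirtioDevice {
    fn activate(&mut self, interrupt: Arc<dyn ConsumeInterrupt>) -> io::Result<()>;
}

The device then just calls interrupt.trigger(vector) when it needs to notify the guest, and the same device code works for MMIO, legacy INTx, or MSI-X depending on what the transport passed in.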

@alexandruag (Collaborator) commented

Hi everyone, wall of text incoming :( It looks like the functionality that supports Virtio devices in Firecracker and other VMMs can be split into three major components:

  • There's the transport, that handles discovery (when necessary) & configuration in terms of device status, feature negotiation, queue setup, and device-specific configuration space (and also keeps the associated state).

  • Then, we have the queue handler (based on the Virtio queue implementation), which performs actions in response to events/notifications associated with queues and/or the device backend. I think this part should be totally agnostic to the mechanism used to deliver notifications; it should just expose functionality such as handle_this_queue or handle_backend, etc., which is then invoked by the event driver (the following point). The handler must notify the guest on certain occasions, but this can be enabled, for example, via an externally provided closure (I think someone already mentioned this somewhere); a rough sketch follows at the end of this comment.

  • Finally, there's an event driver/device state machine of sorts, which handles things like registering io/irq eventfds, other similar things, triggering functionality in response to events, etc. This basically connects the other two components among themselves and to the rest of the VMM.

Whenever we'd want to use a Virtio device, we pick a transport, a queue handler + backend, and the event driver part to actuate the rest. The event driver is a bit more difficult to precisely define as an interface, because it's naturally related to other rust-vmm high level concepts such as buses and resource allocators, but it looks doable.

I think there's value in clearly separating the transport and queue handler (for example, right now the activate method from the VirtioDevice trait assumes EventFds are used no matter what). If we look at an MMIO device, in the separation approach, the event driver passes the appropriate events (MMIO exits) to the transport part, and keeps tabs on its state. When the device is ready to activate, the event driver instantiates the queue handler, and drives it in response to its relevant events (which may arrive via EventFd, VM exits, etc). It also propagates any configuration changes, or situations like the elusive device reset. When looking from above, things can get wrapped into different traits and interfaces, but it seems these three are good candidates for the basic building blocks. What do you think?
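
As a rough sketch of the second bullet above (the names here are made up), the queue handler could take the guest-notification mechanism as a closure and stay agnostic to how the notification is actually delivered:

use std::io;

// Hypothetical queue handler: it reacts to queue/backend events and
// notifies the guest through an externally supplied closure, so it
// never touches eventfds, VM exits, or any particular transport.
pub struct QueueHandler<N: Fn() -> io::Result<()>> {
    notify_guest: N,
}

impl<N: Fn() -> io::Result<()>> QueueHandler<N> {
    pub fn handle_queue_event(&mut self) -> io::Result<()> {
        // ... pop available descriptors, hand them to the backend,
        // place used entries back on the ring ...
        (self.notify_guest)()
    }
}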

@jiangliu (Member Author) commented Jun 16, 2019 via email

@alexandruag (Collaborator) commented

Hi Gerry, my choice of words was poor regarding what I meant by transport, because I was referring precisely to the logic which handles the initial configuration/negotiation/setup of the device. However, I included transport in there because I was under the impression there are at least some slight differences in the configuration space itself depending on whether we're dealing with MMIO or PCI devices. You're right that we should keep in mind that packed virtqueues are a thing now; this is probably done by having a Queue trait implemented by both the split and packed queue handling logic.
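
For example, something as small as this (purely illustrative, not an existing rust-vmm interface) would let the queue handling logic stay oblivious to the ring layout:

// Hypothetical abstraction over the split and packed virtqueue layouts;
// both ring implementations would implement it, so the code above them
// doesn't care which one the driver negotiated.
pub trait Queue {
    /// Returns true if the driver has made at least one buffer available.
    fn has_available(&self) -> bool;
    /// Mark the descriptor chain with head `index` as used, with `len` bytes written.
    fn add_used(&mut self, index: u16, len: u32);
}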

Some code would probably be much better at illustrating what I was trying to say, so I'm planning to write a very simple mock design/interface to get feedback from people. Btw, is there some overall top-down example/discussion about how the current device/bus/etc. proposals interact? Basically, if someone wants to build a VMM with these parts from rust-vmm, which are the generic reusable components and what has to be VMM-specific? I think this is an important perspective to consider, and I'll try to touch upon it as well.

@jiangliu (Member Author) commented

It's out of date now.

@jiangliu jiangliu closed this Mar 29, 2020