Implement EVENT_IDX and polling for vhost_user_block #774
Conversation
VIRTIO_RING_F_EVENT_IDX is a virtio feature that allows avoiding device <-> driver notifications under some circumstances, most notably when actively polling the queue. This commit implements support for it in the vm-virtio crate. Consumers of this crate will also need to add support for it by exposing the feature and calling update_avail_event() and get_used_event() accordingly.
Signed-off-by: Sergio Lopez <[email protected]>
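For reference, the decision EVENT_IDX lets the device make is the vring_need_event rule from the virtio spec. A minimal sketch of that check in Rust (illustrative only, not the exact vm-virtio code, which exposes it through get_used_event()):

// Sketch of the virtio EVENT_IDX suppression rule (vring_need_event):
// notify the driver only if the used_event index it published falls within
// the entries added since the last notification. Ring indices are
// free-running u16 counters, so wrapping arithmetic is required.
fn needs_notification(used_event: u16, new_used_idx: u16, old_used_idx: u16) -> bool {
    new_used_idx.wrapping_sub(used_event).wrapping_sub(1)
        < new_used_idx.wrapping_sub(old_used_idx)
}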
Add helpers to Vring and VhostUserSlaveReqHandler for EVENT_IDX, so consumers of this crate can make use of this feature.
Signed-off-by: Sergio Lopez <[email protected]>
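As a rough illustration of how a backend consumes these helpers (the struct and method names below are placeholders, not the actual vhost_user_backend API), the feature bit is advertised and only honored once the driver acknowledges it:

// VIRTIO_RING_F_EVENT_IDX is bit 29 per the virtio spec.
const VIRTIO_RING_F_EVENT_IDX: u64 = 1 << 29;

// Placeholder backend type for illustration.
struct Backend {
    acked_features: u64,
}

impl Backend {
    fn features(&self) -> u64 {
        // Advertise EVENT_IDX alongside whatever else the backend supports.
        VIRTIO_RING_F_EVENT_IDX
    }

    fn event_idx_enabled(&self) -> bool {
        // Only use EVENT_IDX if the driver actually negotiated it.
        self.acked_features & VIRTIO_RING_F_EVENT_IDX != 0
    }
}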
Now that vhost_user_backend and vm-virtio support EVENT_IDX, use it in vhost_user_block to reduce the number of notifications sent between the driver and the device. This is especially useful when using active polling on the virtqueue, as will be implemented by a future patch.

This is a snapshot of kvm_stat while generating ~60K IOPS with fio on the guest, without EVENT_IDX:

Event                    Total   %Total   CurAvg/s
kvm_entry               393454     20.3      62494
kvm_exit                393446     20.3      62494
kvm_apic_accept_irq     378146     19.5      60268
kvm_msi_set_irq         369720     19.0      58881
kvm_fast_mmio           370497     19.1      58817
kvm_hv_timer_state       10197      0.5       1715
kvm_msr                   8770      0.5       1443
kvm_wait_lapic_expire     7018      0.4       1118
kvm_apic                  2768      0.1        538
kvm_pv_tlb_flush          2028      0.1        360
kvm_vcpu_wakeup           1453      0.1        278
kvm_apic_ipi              1384      0.1        269
kvm_fpu                   1148      0.1        164
kvm_pio                    574      0.0         82
kvm_userspace_exit         574      0.0         82
kvm_halt_poll_ns            24      0.0          3

And this is the snapshot while doing the same thing with EVENT_IDX:

Event                    Total   %Total   CurAvg/s
kvm_entry                35506     26.0       3873
kvm_exit                 35499     26.0       3873
kvm_hv_timer_state       14740     10.8       1672
kvm_apic_accept_irq      13017      9.5       1438
kvm_msr                  12845      9.4       1421
kvm_wait_lapic_expire    10422      7.6       1118
kvm_apic                  3788      2.8        502
kvm_pv_tlb_flush          2708      2.0        340
kvm_vcpu_wakeup           1992      1.5        258
kvm_apic_ipi              1894      1.4        251
kvm_fpu                   1476      1.1        164
kvm_pio                    738      0.5         82
kvm_userspace_exit         738      0.5         82
kvm_msi_set_irq            701      0.5         69
kvm_fast_mmio              238      0.2          4
kvm_halt_poll_ns            50      0.0          1
kvm_ple_window_update       28      0.0          0
kvm_page_fault               4      0.0          0

It can be clearly seen that the number of vm exits per second, especially the ones related to notifications (kvm_fast_mmio and kvm_msi_set_irq), is drastically lower.

Signed-off-by: Sergio Lopez <[email protected]>
rbradford left a comment
@slp lgtm. I'm not sure I see the value in making the number configurable; could we just set a sensible default and be done with it?
@rbradford That can be complicated, as the best value for
Sounds good.
@rbradford OK, I'll turn
@slp Can we just turn it on by default?
@rbradford why do you want to do polling by default?
EVENT_IDX is designed to improve performance so we should turn it on by default.
@sameo @rbradford EVENT_IDX will always be enabled (as long as it's negotiated by the guest); the question is whether to have active polling in vhost_user_block enabled by default, with the CPU cost this implies, or to make it a tunable. Personally, I don't have a problem with enabling it by default, as I always enable it in my VMs ;-) On the other hand, even if we decide to have it enabled by default, should we also still have a parameter to disable it?
Can you get some simple numbers to show the improvement it makes? (bonnie? or dd or something like that?) I would lean towards making it a boolean, on by default.
Same here, a boolean knob enabled by default. It will help reduce our number of vm exits, but if users don't want to pay the CPU utilization price they can turn it off.
Sure, I've put some numbers in the fourth commit's message:
Sounds good to me too.
@slp whoops. Sorry I missed that, the improvement looks great.
Actively polling the virtqueue significantly reduces the latency of
each I/O operation, at the expense of using more CPU time. This
feature is especially useful when using low-latency devices (SSD,
NVMe) as the backend.
This change implements static polling. When a request arrives after
being idle, vhost_user_block will keep checking the virtqueue for new
requests, until POLL_QUEUE_US (50us) has passed without finding one.
POLL_QUEUE_US is defined to be 50us, based on the current latency of
enterprise SSDs (< 30us) and the overhead of the emulation.
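For illustration, the static-polling loop described above behaves roughly as in the sketch below (process_queue() and the BlockBackend trait are hypothetical stand-ins, not the actual vhost_user_block code):

use std::time::{Duration, Instant};

// Hypothetical stand-in for the backend's queue-processing routine.
trait BlockBackend {
    // Process pending requests; returns true if at least one was handled.
    fn process_queue(&mut self) -> bool;
}

const POLL_QUEUE_US: Duration = Duration::from_micros(50);

fn handle_kick(backend: &mut dyn BlockBackend) {
    let mut last_activity = Instant::now();
    loop {
        if backend.process_queue() {
            // Found work: reset the idle timer and keep polling.
            last_activity = Instant::now();
        } else if last_activity.elapsed() > POLL_QUEUE_US {
            // Idle for longer than POLL_QUEUE_US: stop polling and go back
            // to waiting for the next notification.
            break;
        }
    }
}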
This feature is enabled by default, and can be disabled by using the
"poll_queue" parameter of "block-backend".
This is a test using null_blk as a backend for the image, with the
following parameters:
- null_blk gb=20 nr_devices=1 irqmode=2 completion_nsec=0 no_sched=1
With "poll_queue=false":
fio --ioengine=sync --bs=4k --rw randread --name randread --direct=1
--filename=/dev/vdb --time_based --runtime=10
randread: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=sync, iodepth=1
fio-3.14
Starting 1 process
Jobs: 1 (f=1): [r(1)][100.0%][r=169MiB/s][r=43.2k IOPS][eta 00m:00s]
randread: (groupid=0, jobs=1): err= 0: pid=433: Tue Feb 18 11:12:59 2020
read: IOPS=43.2k, BW=169MiB/s (177MB/s)(1688MiB/10001msec)
clat (usec): min=17, max=836, avg=21.64, stdev= 3.81
lat (usec): min=17, max=836, avg=21.77, stdev= 3.81
clat percentiles (nsec):
| 1.00th=[19328], 5.00th=[19840], 10.00th=[20352], 20.00th=[21120],
| 30.00th=[21376], 40.00th=[21376], 50.00th=[21376], 60.00th=[21632],
| 70.00th=[21632], 80.00th=[21888], 90.00th=[22144], 95.00th=[22912],
| 99.00th=[28544], 99.50th=[30336], 99.90th=[39168], 99.95th=[42752],
| 99.99th=[71168]
bw ( KiB/s): min=168440, max=188496, per=100.00%, avg=172912.00, stdev=3975.63, samples=19
iops : min=42110, max=47124, avg=43228.00, stdev=993.91, samples=19
lat (usec) : 20=5.90%, 50=94.08%, 100=0.02%, 250=0.01%, 500=0.01%
lat (usec) : 750=0.01%, 1000=0.01%
cpu : usr=10.35%, sys=25.82%, ctx=432417, majf=0, minf=10
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=432220,0,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=1
Run status group 0 (all jobs):
READ: bw=169MiB/s (177MB/s), 169MiB/s-169MiB/s (177MB/s-177MB/s), io=1688MiB (1770MB), run=10001-10001msec
Disk stats (read/write):
vdb: ios=427867/0, merge=0/0, ticks=7346/0, in_queue=0, util=99.04%
With "poll_queue=true" (default):
fio --ioengine=sync --bs=4k --rw randread --name randread --direct=1
--filename=/dev/vdb --time_based --runtime=10
randread: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=sync, iodepth=1
fio-3.14
Starting 1 process
Jobs: 1 (f=1): [r(1)][100.0%][r=260MiB/s][r=66.7k IOPS][eta 00m:00s]
randread: (groupid=0, jobs=1): err= 0: pid=422: Tue Feb 18 11:14:47 2020
read: IOPS=68.5k, BW=267MiB/s (280MB/s)(2674MiB/10001msec)
clat (usec): min=10, max=966, avg=13.60, stdev= 3.49
lat (usec): min=10, max=966, avg=13.70, stdev= 3.50
clat percentiles (nsec):
| 1.00th=[11200], 5.00th=[11968], 10.00th=[11968], 20.00th=[12224],
| 30.00th=[12992], 40.00th=[13504], 50.00th=[13760], 60.00th=[13888],
| 70.00th=[14016], 80.00th=[14144], 90.00th=[14272], 95.00th=[14656],
| 99.00th=[20352], 99.50th=[23936], 99.90th=[35072], 99.95th=[36096],
| 99.99th=[47872]
bw ( KiB/s): min=265456, max=296456, per=100.00%, avg=274229.05, stdev=13048.14, samples=19
iops : min=66364, max=74114, avg=68557.26, stdev=3262.03, samples=19
lat (usec) : 20=98.84%, 50=1.15%, 100=0.01%, 250=0.01%, 500=0.01%
lat (usec) : 750=0.01%, 1000=0.01%
cpu : usr=8.24%, sys=21.15%, ctx=684669, majf=0, minf=10
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=684611,0,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=1
Run status group 0 (all jobs):
READ: bw=267MiB/s (280MB/s), 267MiB/s-267MiB/s (280MB/s-280MB/s), io=2674MiB (2804MB), run=10001-10001msec
Disk stats (read/write):
vdb: ios=677855/0, merge=0/0, ticks=7026/0, in_queue=0, util=99.04%
Signed-off-by: Sergio Lopez <[email protected]>
Done, I've replaced
| "vhost-user-block backend parameters \"image=<image_path>,\ | ||
| sock=<socket_path>,num_queues=<number_of_queues>,\ | ||
| readonly=true|false,direct=true|false\"", | ||
| readonly=true|false,direct=true|false,poll_queue=true|false\"", |
Could you please update the OpenAPI definition here: https://github.com/cloud-hypervisor/cloud-hypervisor/blob/master/vmm/src/api/openapi/cloud-hypervisor.yaml
Sure, I'll do that in a separate patch.
Thanks :)
Actually @rbradford pointed out that you don't need to do anything, since you're not modifying any of the VMM's devices. Sorry I got confused, because the main.rs CLI now includes options for vhost-user backends.
I was about to ask about this. Should I add poll_queue to DiskConfig and CH's disk argument? Reading the code, it looks to me like block-backend and disk need to be kept in sync.
This patch is nice @slp as it'll let the user choose perf at a CPU cost.
Good question. IIRC, virtio-net has its own interrupt suppression mechanism, so it may need a slightly different approach. It's probably worth creating an issue to track this, though.
// Actively poll the queue until POLL_QUEUE_US has passed
// without seeing a new request.
let mut now = Instant::now();
loop {
All the events that come here are from the epoll in vhost_user_backend; is it necessary to add a loop here?
This is for polling the queue for new elements without having to wait for an ioeventfd signal.
}
}

if self.event_idx {
This branch is already present in process_queue; why check it again here?
The check in process_queue is to determine whether the EVENT_IDX feature can be used to omit sending an interrupt to the guest, while the one here, as the comment says, is to tackle the fact that the Queue iterator only checks avail_index once.
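Roughly, the pattern under discussion is the following (a sketch using a stand-in Vring trait; update_avail_event() mirrors the helper named in the commits, the exact signatures may differ):

// Stand-in trait for illustration, not the real vhost_user_backend type.
trait Vring {
    // Publish the last avail index the device has seen (EVENT_IDX helper).
    fn update_avail_event(&mut self);
    // Drain pending descriptors; returns true if any were processed.
    fn process_queue(&mut self) -> bool;
}

fn handle_queue_event(vring: &mut dyn Vring, event_idx: bool) {
    if event_idx {
        // The avail ring iterator samples avail_idx only once, and with
        // EVENT_IDX the driver may not kick again for requests queued while
        // we were processing, so publish what we've seen and re-check until
        // the ring is empty.
        loop {
            vring.update_avail_event();
            if !vring.process_queue() {
                break;
            }
        }
    } else {
        vring.process_queue();
    }
}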
This patch series implements EVENT_IDX, a virtio feature that allows reducing the number of driver <-> device notifications and thus the number of vm exits needed while processing requests. This is especially useful when actively polling the virtqueues.