Skip to content

Conversation

@rbradford
Copy link
Member

@rbradford rbradford commented Nov 5, 2024

This improves sequential write performance using fio (2888MiB/s ->
3293MiB/s)

VM config: cloud-hypervisor --disk path=/workloads/jammy.raw,direct=on path=/workloads/big-disk.img,direct=on --cpus boot=1 --memory size=2G,shared=on --serial tty --console off --seccomp log --kernel ~/workloads/hypervisor-fw

Host: fio --filename=big-disk.img --direct=1 --rw=write --bs=256k --ioengine=libaio --iodepth=64 --runtime=120 --numjobs=1 --time_based --group_reporting --name=throughput-test-job --eta-newline=1

VM: fio --filename=/dev/vdb --direct=1 --rw=write --bs=256k --ioengine=libaio --iodepth=64 --runtime=120 --numjobs=1 --time_based --group_reporting --name=throughput-test-job --eta-newline=1

Baseline (file on filesystem on host used as backing store for block
device):

throughput-test-job: (groupid=0, jobs=1): err= 0: pid=10169: Tue Nov 5 09:31:55 2024
write: IOPS=13.5k, BW=3385MiB/s (3549MB/s)(397GiB/120008msec); 0 zone resets
slat (usec): min=4, max=10222, avg=20.25, stdev=29.01
clat (usec): min=984, max=45599, avg=4706.01, stdev=2278.11
lat (usec): min=1002, max=45610, avg=4726.27, stdev=2278.77
clat percentiles (usec):
| 1.00th=[ 3195], 5.00th=[ 3228], 10.00th=[ 3261], 20.00th=[ 3261],
| 30.00th=[ 3261], 40.00th=[ 3261], 50.00th=[ 3294], 60.00th=[ 3916],
| 70.00th=[ 5014], 80.00th=[ 7308], 90.00th=[ 7635], 95.00th=[ 7898],
| 99.00th=[ 8586], 99.50th=[ 8979], 99.90th=[36439], 99.95th=[36963],
| 99.99th=[43779]
bw ( MiB/s): min= 1934, max= 4821, per=100.00%, avg=3391.67, stdev=1266.42, samples=239
iops : min= 7738, max=19286, avg=13566.67, stdev=5065.65, samples=239
lat (usec) : 1000=0.01%
lat (msec) : 2=0.03%, 4=61.10%, 10=38.62%, 20=0.11%, 50=0.15%
cpu : usr=17.13%, sys=14.38%, ctx=1352501, majf=0, minf=11
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
issued rwts: total=0,1624829,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=64

Run status group 0 (all jobs):
WRITE: bw=3385MiB/s (3549MB/s), 3385MiB/s-3385MiB/s (3549MB/s-3549MB/s), io=397GiB (426GB), run=120008-120008msec

Disk stats (read/write):
dm-2: ios=129/1624787, sectors=1872/831364040, merge=0/0, ticks=185/6960387, in_queue=6960572, util=100.00%, aggrios=130/1626025, aggsectors=1880/831915888, aggrmerge=0/0, aggrticks=194/6967818, aggrin_queue=6968012, aggrutil=99.97%
dm-0: ios=130/1626025, sectors=1880/831915888, merge=0/0, ticks=194/6967818, in_queue=6968012, util=99.97%, aggrios=130/1606095, aggsectors=1880/831915888, aggrmerge=0/19930, aggrticks=204/6634488, aggrin_queue=6635288, aggrutil=58.59%
nvme0n1: ios=130/1606095, sectors=1880/831915888, merge=0/19930, ticks=204/6634488, in_queue=6635288, util=58.59%

On block device in VM:

throughput-test-job: (groupid=0, jobs=1): err= 0: pid=667: Tue Nov 5 09:53:19 2024
write: IOPS=13.2k, BW=3293MiB/s (3453MB/s)(386GiB/120008msec); 0 zone resets
slat (usec): min=4, max=3518, avg=27.77, stdev=35.32
clat (usec): min=723, max=44252, avg=4829.82, stdev=2222.41
lat (usec): min=735, max=44270, avg=4857.85, stdev=2223.45
clat percentiles (usec):
| 1.00th=[ 3097], 5.00th=[ 3195], 10.00th=[ 3195], 20.00th=[ 3228],
| 30.00th=[ 3261], 40.00th=[ 3294], 50.00th=[ 3621], 60.00th=[ 4555],
| 70.00th=[ 5997], 80.00th=[ 7242], 90.00th=[ 7570], 95.00th=[ 7898],
| 99.00th=[ 8586], 99.50th=[ 8848], 99.90th=[36439], 99.95th=[36963],
| 99.99th=[40633]
bw ( MiB/s): min= 1914, max= 4857, per=100.00%, avg=3299.46, stdev=1180.81, samples=239
iops : min= 7658, max=19430, avg=13197.77, stdev=4723.22, samples=239
lat (usec) : 750=0.01%, 1000=0.01%
lat (msec) : 2=0.01%, 4=52.79%, 10=46.95%, 20=0.10%, 50=0.14%
cpu : usr=25.95%, sys=16.71%, ctx=1111821, majf=0, minf=10
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
issued rwts: total=0,1580693,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=64

Run status group 0 (all jobs):
WRITE: bw=3293MiB/s (3453MB/s), 3293MiB/s-3293MiB/s (3453MB/s-3453MB/s), io=386GiB (414GB), run=120008-120008msec

Disk stats (read/write):
vdb: ios=60/1953213, merge=0/0, ticks=14/8229134, in_queue=8229149, util=100.00%

Prior to change:

throughput-test-job: (groupid=0, jobs=1): err= 0: pid=667: Tue Nov 5 09:37:45 2024
write: IOPS=11.6k, BW=2888MiB/s (3028MB/s)(338GiB/120008msec); 0 zone resets
slat (usec): min=3, max=3200, avg=18.48, stdev=24.54
clat (usec): min=1237, max=46575, avg=5521.41, stdev=2641.99
lat (usec): min=1249, max=46591, avg=5540.06, stdev=2643.54
clat percentiles (usec):
| 1.00th=[ 2999], 5.00th=[ 3163], 10.00th=[ 3195], 20.00th=[ 3261],
| 30.00th=[ 3294], 40.00th=[ 3359], 50.00th=[ 6063], 60.00th=[ 7111],
| 70.00th=[ 7373], 80.00th=[ 7570], 90.00th=[ 7832], 95.00th=[ 8094],
| 99.00th=[ 8717], 99.50th=[ 9241], 99.90th=[36963], 99.95th=[37487],
| 99.99th=[41157]
bw ( MiB/s): min= 1936, max= 4826, per=100.00%, avg=2892.43, stdev=1202.99, samples=239
iops : min= 7746, max=19306, avg=11569.68, stdev=4811.98, samples=239
lat (msec) : 2=0.01%, 4=46.26%, 10=53.38%, 20=0.09%, 50=0.26%
cpu : usr=14.20%, sys=8.59%, ctx=1246257, majf=0, minf=12
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
issued rwts: total=0,1386102,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=64

Run status group 0 (all jobs):
WRITE: bw=2888MiB/s (3028MB/s), 2888MiB/s-2888MiB/s (3028MB/s-3028MB/s), io=338GiB (363GB), run=120008-120008msec

Signed-off-by: Rob Bradford [email protected]

@rbradford rbradford requested a review from a team as a code owner November 5, 2024 10:14
Copy link
Member

@liuw liuw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh wow, so there was code to support those features but they were never advertised to the guest. What a great improvement.

This improves sequential write performance using fio (2888MiB/s ->
3293MiB/s)

VM config: cloud-hypervisor --disk path=~/workloads/jammy.raw,direct=on path=~/workloads/big-disk.img,direct=on --cpus boot=1 --memory size=2G,shared=on --serial tty --console off --seccomp log --kernel ~/workloads/hypervisor-fw

Host: fio --filename=big-disk.img --direct=1 --rw=write --bs=256k --ioengine=libaio --iodepth=64 --runtime=120 --numjobs=1 --time_based --group_reporting --name=throughput-test-job --eta-newline=1

VM:  fio --filename=/dev/vdb --direct=1 --rw=write --bs=256k --ioengine=libaio --iodepth=64 --runtime=120 --numjobs=1 --time_based --group_reporting --name=throughput-test-job --eta-newline=1

Baseline (file on filesystem on host used as backing store for block
device):

throughput-test-job: (groupid=0, jobs=1): err= 0: pid=10169: Tue Nov  5 09:31:55 2024
  write: IOPS=13.5k, BW=3385MiB/s (3549MB/s)(397GiB/120008msec); 0 zone resets
    slat (usec): min=4, max=10222, avg=20.25, stdev=29.01
    clat (usec): min=984, max=45599, avg=4706.01, stdev=2278.11
     lat (usec): min=1002, max=45610, avg=4726.27, stdev=2278.77
    clat percentiles (usec):
     |  1.00th=[ 3195],  5.00th=[ 3228], 10.00th=[ 3261], 20.00th=[ 3261],
     | 30.00th=[ 3261], 40.00th=[ 3261], 50.00th=[ 3294], 60.00th=[ 3916],
     | 70.00th=[ 5014], 80.00th=[ 7308], 90.00th=[ 7635], 95.00th=[ 7898],
     | 99.00th=[ 8586], 99.50th=[ 8979], 99.90th=[36439], 99.95th=[36963],
     | 99.99th=[43779]
   bw (  MiB/s): min= 1934, max= 4821, per=100.00%, avg=3391.67, stdev=1266.42, samples=239
   iops        : min= 7738, max=19286, avg=13566.67, stdev=5065.65, samples=239
  lat (usec)   : 1000=0.01%
  lat (msec)   : 2=0.03%, 4=61.10%, 10=38.62%, 20=0.11%, 50=0.15%
  cpu          : usr=17.13%, sys=14.38%, ctx=1352501, majf=0, minf=11
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
     issued rwts: total=0,1624829,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=64

Run status group 0 (all jobs):
  WRITE: bw=3385MiB/s (3549MB/s), 3385MiB/s-3385MiB/s (3549MB/s-3549MB/s), io=397GiB (426GB), run=120008-120008msec

Disk stats (read/write):
    dm-2: ios=129/1624787, sectors=1872/831364040, merge=0/0, ticks=185/6960387, in_queue=6960572, util=100.00%, aggrios=130/1626025, aggsectors=1880/831915888, aggrmerge=0/0, aggrticks=194/6967818, aggrin_queue=6968012, aggrutil=99.97%
    dm-0: ios=130/1626025, sectors=1880/831915888, merge=0/0, ticks=194/6967818, in_queue=6968012, util=99.97%, aggrios=130/1606095, aggsectors=1880/831915888, aggrmerge=0/19930, aggrticks=204/6634488, aggrin_queue=6635288, aggrutil=58.59%
  nvme0n1: ios=130/1606095, sectors=1880/831915888, merge=0/19930, ticks=204/6634488, in_queue=6635288, util=58.59%

On block device in VM:

throughput-test-job: (groupid=0, jobs=1): err= 0: pid=667: Tue Nov  5 09:53:19 2024
  write: IOPS=13.2k, BW=3293MiB/s (3453MB/s)(386GiB/120008msec); 0 zone resets
    slat (usec): min=4, max=3518, avg=27.77, stdev=35.32
    clat (usec): min=723, max=44252, avg=4829.82, stdev=2222.41
     lat (usec): min=735, max=44270, avg=4857.85, stdev=2223.45
    clat percentiles (usec):
     |  1.00th=[ 3097],  5.00th=[ 3195], 10.00th=[ 3195], 20.00th=[ 3228],
     | 30.00th=[ 3261], 40.00th=[ 3294], 50.00th=[ 3621], 60.00th=[ 4555],
     | 70.00th=[ 5997], 80.00th=[ 7242], 90.00th=[ 7570], 95.00th=[ 7898],
     | 99.00th=[ 8586], 99.50th=[ 8848], 99.90th=[36439], 99.95th=[36963],
     | 99.99th=[40633]
   bw (  MiB/s): min= 1914, max= 4857, per=100.00%, avg=3299.46, stdev=1180.81, samples=239
   iops        : min= 7658, max=19430, avg=13197.77, stdev=4723.22, samples=239
  lat (usec)   : 750=0.01%, 1000=0.01%
  lat (msec)   : 2=0.01%, 4=52.79%, 10=46.95%, 20=0.10%, 50=0.14%
  cpu          : usr=25.95%, sys=16.71%, ctx=1111821, majf=0, minf=10
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
     issued rwts: total=0,1580693,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=64

Run status group 0 (all jobs):
  WRITE: bw=3293MiB/s (3453MB/s), 3293MiB/s-3293MiB/s (3453MB/s-3453MB/s), io=386GiB (414GB), run=120008-120008msec

Disk stats (read/write):
  vdb: ios=60/1953213, merge=0/0, ticks=14/8229134, in_queue=8229149, util=100.00%

Prior to change:

throughput-test-job: (groupid=0, jobs=1): err= 0: pid=667: Tue Nov  5 09:37:45 2024
  write: IOPS=11.6k, BW=2888MiB/s (3028MB/s)(338GiB/120008msec); 0 zone resets
    slat (usec): min=3, max=3200, avg=18.48, stdev=24.54
    clat (usec): min=1237, max=46575, avg=5521.41, stdev=2641.99
     lat (usec): min=1249, max=46591, avg=5540.06, stdev=2643.54
    clat percentiles (usec):
     |  1.00th=[ 2999],  5.00th=[ 3163], 10.00th=[ 3195], 20.00th=[ 3261],
     | 30.00th=[ 3294], 40.00th=[ 3359], 50.00th=[ 6063], 60.00th=[ 7111],
     | 70.00th=[ 7373], 80.00th=[ 7570], 90.00th=[ 7832], 95.00th=[ 8094],
     | 99.00th=[ 8717], 99.50th=[ 9241], 99.90th=[36963], 99.95th=[37487],
     | 99.99th=[41157]
   bw (  MiB/s): min= 1936, max= 4826, per=100.00%, avg=2892.43, stdev=1202.99, samples=239
   iops        : min= 7746, max=19306, avg=11569.68, stdev=4811.98, samples=239
  lat (msec)   : 2=0.01%, 4=46.26%, 10=53.38%, 20=0.09%, 50=0.26%
  cpu          : usr=14.20%, sys=8.59%, ctx=1246257, majf=0, minf=12
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
     issued rwts: total=0,1386102,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=64

Run status group 0 (all jobs):
  WRITE: bw=2888MiB/s (3028MB/s), 2888MiB/s-2888MiB/s (3028MB/s-3028MB/s), io=338GiB (363GB), run=120008-120008msec

Signed-off-by: Rob Bradford <[email protected]>
@rbradford rbradford force-pushed the 2024-11-05-virtio-block-features branch from 8970df4 to 15d9f6c Compare November 5, 2024 14:28
@rbradford rbradford changed the title virtio-devices: Enable indirect descriptors and in order features virtio-devices: Enable VIRTIO_RING_F_INDIRECT_DESC Nov 5, 2024
@rbradford
Copy link
Member Author

Oh wow, so there was code to support those features but they were never advertised to the guest. What a great improvement.

I did a bit more digging - and Linux doesn't currently use VIRTIO_F_IN_ORDER so i've removed that for now - the small difference I saw with that enabled must have been run-to-run variance.

@rbradford rbradford added this pull request to the merge queue Nov 5, 2024
Merged via the queue into cloud-hypervisor:main with commit df1d6ea Nov 5, 2024
@rbradford rbradford deleted the 2024-11-05-virtio-block-features branch December 18, 2024 16:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

2 participants