Introduce VIRTIO_BLK_F_SEG_MAX #6885

liuw · 2024-12-21T01:40:06Z

Significant improvements in block device performance across the board.

With this new feature:

Test 'block_read_MiBps' running .. (control: test_timeout = 10s, test_iterations = 5, num_queues = 1, queue_size = 128, fio_ops = read, bandwidth = true, overrides: )
Test 'block_read_MiBps' .. ok: mean = 1041.1836760485135, std_dev = 55.7625072234085
Test 'block_write_MiBps' running .. (control: test_timeout = 10s, test_iterations = 5, num_queues = 1, queue_size = 128, fio_ops = write, bandwidth = true, overrides: )
Test 'block_write_MiBps' .. ok: mean = 630.6574363756009, std_dev = 58.07112046007685
Test 'block_random_read_MiBps' running .. (control: test_timeout = 10s, test_iterations = 5, num_queues = 1, queue_size = 128, fio_ops = randread, bandwidth = true, overrides: )
Test 'block_random_read_MiBps' .. ok: mean = 1049.6611271606039, std_dev = 12.955092699638199
Test 'block_random_write_MiBps' running .. (control: test_timeout = 10s, test_iterations = 5, num_queues = 1, queue_size = 128, fio_ops = randwrite, bandwidth = true, overrides: )
Test 'block_random_write_MiBps' .. ok: mean = 649.828933515884, std_dev = 15.113753535445566
Test 'block_multi_queue_read_MiBps' running .. (control: test_timeout = 10s, test_iterations = 5, num_queues = 2, queue_size = 128, fio_ops = read, bandwidth = true, overrides: )
Test 'block_multi_queue_read_MiBps' .. ok: mean = 1045.9945313128374, std_dev = 5.687116997857165
Test 'block_multi_queue_write_MiBps' running .. (control: test_timeout = 10s, test_iterations = 5, num_queues = 2, queue_size = 128, fio_ops = write, bandwidth = true, overrides: )
Test 'block_multi_queue_write_MiBps' .. ok: mean = 1001.8954735340525, std_dev = 17.82514962812694
Test 'block_multi_queue_random_read_MiBps' running .. (control: test_timeout = 10s, test_iterations = 5, num_queues = 2, queue_size = 128, fio_ops = randread, bandwidth = true, overrides: )
Test 'block_multi_queue_random_read_MiBps' .. ok: mean = 1029.3417347532672, std_dev = 8.32247648480339
Test 'block_multi_queue_random_write_MiBps' running .. (control: test_timeout = 10s, test_iterations = 5, num_queues = 2, queue_size = 128, fio_ops = randwrite, bandwidth = true, overrides: )
Test 'block_multi_queue_random_write_MiBps' .. ok: mean = 771.0545246611134, std_dev = 5.895151767945272
Test 'block_read_IOPS' running .. (control: test_timeout = 10s, test_iterations = 5, num_queues = 1, queue_size = 128, fio_ops = read, bandwidth = false, overrides: )
Test 'block_read_IOPS' .. ok: mean = 272408.07766566763, std_dev = 6170.61954484717
Test 'block_write_IOPS' running .. (control: test_timeout = 10s, test_iterations = 5, num_queues = 1, queue_size = 128, fio_ops = write, bandwidth = false, overrides: )
Test 'block_write_IOPS' .. ok: mean = 164097.1681134429, std_dev = 7045.576954582243
Test 'block_random_read_IOPS' running .. (control: test_timeout = 10s, test_iterations = 5, num_queues = 1, queue_size = 128, fio_ops = randread, bandwidth = false, overrides: )
Test 'block_random_read_IOPS' .. ok: mean = 270026.4975599876, std_dev = 3818.1133536678954
Test 'block_random_write_IOPS' running .. (control: test_timeout = 10s, test_iterations = 5, num_queues = 1, queue_size = 128, fio_ops = randwrite, bandwidth = false, overrides: )
Test 'block_random_write_IOPS' .. ok: mean = 167925.33109767124, std_dev = 4540.00892989962
Test 'block_multi_queue_read_IOPS' running .. (control: test_timeout = 10s, test_iterations = 5, num_queues = 2, queue_size = 128, fio_ops = read, bandwidth = false, overrides: )
Test 'block_multi_queue_read_IOPS' .. ok: mean = 267630.6297761863, std_dev = 1144.5182710512386
Test 'block_multi_queue_write_IOPS' running .. (control: test_timeout = 10s, test_iterations = 5, num_queues = 2, queue_size = 128, fio_ops = write, bandwidth = false, overrides: )
Test 'block_multi_queue_write_IOPS' .. ok: mean = 249247.9256193595, std_dev = 2903.4336638199043
Test 'block_multi_queue_random_read_IOPS' running .. (control: test_timeout = 10s, test_iterations = 5, num_queues = 2, queue_size = 128, fio_ops = randread, bandwidth = false, overrides: )
Test 'block_multi_queue_random_read_IOPS' .. ok: mean = 261706.90908901737, std_dev = 1360.5256601287383
Test 'block_multi_queue_random_write_IOPS' running .. (control: test_timeout = 10s, test_iterations = 5, num_queues = 2, queue_size = 128, fio_ops = randwrite, bandwidth = false, overrides: )
Test 'block_multi_queue_random_write_IOPS' .. ok: mean = 196864.48090243965, std_dev = 1069.8654182837279

Without:

Test 'block_read_MiBps' running .. (control: test_timeout = 10s, test_iterations = 5, num_queues = 1, queue_size = 128, fio_ops = read, bandwidth = true, overrides: )
Test 'block_read_MiBps' .. ok: mean = 485.82088019858895, std_dev = 1.7447280725526355
Test 'block_write_MiBps' running .. (control: test_timeout = 10s, test_iterations = 5, num_queues = 1, queue_size = 128, fio_ops = write, bandwidth = true, overrides: )
Test 'block_write_MiBps' .. ok: mean = 487.8825615445379, std_dev = 31.092624291944347
Test 'block_random_read_MiBps' running .. (control: test_timeout = 10s, test_iterations = 5, num_queues = 1, queue_size = 128, fio_ops = randread, bandwidth = true, overrides: )
Test 'block_random_read_MiBps' .. ok: mean = 155.45657135073682, std_dev = 0.5834015665376312
Test 'block_random_write_MiBps' running .. (control: test_timeout = 10s, test_iterations = 5, num_queues = 1, queue_size = 128, fio_ops = randwrite, bandwidth = true, overrides: )
Test 'block_random_write_MiBps' .. ok: mean = 190.13974678009396, std_dev = 2.979427312202239
Test 'block_multi_queue_read_MiBps' running .. (control: test_timeout = 10s, test_iterations = 5, num_queues = 2, queue_size = 128, fio_ops = read, bandwidth = true, overrides: )
Test 'block_multi_queue_read_MiBps' .. ok: mean = 882.0916764400447, std_dev = 26.671466624352554
Test 'block_multi_queue_write_MiBps' running .. (control: test_timeout = 10s, test_iterations = 5, num_queues = 2, queue_size = 128, fio_ops = write, bandwidth = true, overrides: )
Test 'block_multi_queue_write_MiBps' .. ok: mean = 865.7831747164018, std_dev = 108.55131155781514
Test 'block_multi_queue_random_read_MiBps' running .. (control: test_timeout = 10s, test_iterations = 5, num_queues = 2, queue_size = 128, fio_ops = randread, bandwidth = true, overrides: )
Test 'block_multi_queue_random_read_MiBps' .. ok: mean = 166.38560575436063, std_dev = 1.18464999551714
Test 'block_multi_queue_random_write_MiBps' running .. (control: test_timeout = 10s, test_iterations = 5, num_queues = 2, queue_size = 128, fio_ops = randwrite, bandwidth = true, overrides: )
Test 'block_multi_queue_random_write_MiBps' .. ok: mean = 185.04609699916395, std_dev = 4.761066952685287
Test 'block_read_IOPS' running .. (control: test_timeout = 10s, test_iterations = 5, num_queues = 1, queue_size = 128, fio_ops = read, bandwidth = false, overrides: )
Test 'block_read_IOPS' .. ok: mean = 124633.69637797368, std_dev = 1161.9433928482986
Test 'block_write_IOPS' running .. (control: test_timeout = 10s, test_iterations = 5, num_queues = 1, queue_size = 128, fio_ops = write, bandwidth = false, overrides: )
Test 'block_write_IOPS' .. ok: mean = 128261.38608432449, std_dev = 436.4849919404479
Test 'block_random_read_IOPS' running .. (control: test_timeout = 10s, test_iterations = 5, num_queues = 1, queue_size = 128, fio_ops = randread, bandwidth = false, overrides: )
Test 'block_random_read_IOPS' .. ok: mean = 39719.41820628812, std_dev = 198.91780571227557
Test 'block_random_write_IOPS' running .. (control: test_timeout = 10s, test_iterations = 5, num_queues = 1, queue_size = 128, fio_ops = randwrite, bandwidth = false, overrides: )
Test 'block_random_write_IOPS' .. ok: mean = 48465.39677010305, std_dev = 1094.3443985519684
Test 'block_multi_queue_read_IOPS' running .. (control: test_timeout = 10s, test_iterations = 5, num_queues = 2, queue_size = 128, fio_ops = read, bandwidth = false, overrides: )
Test 'block_multi_queue_read_IOPS' .. ok: mean = 234775.41672233236, std_dev = 6278.984603821063
Test 'block_multi_queue_write_IOPS' running .. (control: test_timeout = 10s, test_iterations = 5, num_queues = 2, queue_size = 128, fio_ops = write, bandwidth = false, overrides: )
Test 'block_multi_queue_write_IOPS' .. ok: mean = 231800.0319106508, std_dev = 27757.126228905294
Test 'block_multi_queue_random_read_IOPS' running .. (control: test_timeout = 10s, test_iterations = 5, num_queues = 2, queue_size = 128, fio_ops = randread, bandwidth = false, overrides: )
Test 'block_multi_queue_random_read_IOPS' .. ok: mean = 42492.338109083124, std_dev = 520.181025050384
Test 'block_multi_queue_random_write_IOPS' running .. (control: test_timeout = 10s, test_iterations = 5, num_queues = 2, queue_size = 128, fio_ops = randwrite, bandwidth = false, overrides: )
Test 'block_multi_queue_random_write_IOPS' .. ok: mean = 47620.581976033485, std_dev = 1079.9372604029465
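To read the two result sets side by side, this small Rust sketch computes the with/without ratio of the mean IOPS for each test. It uses only the means copied from the logs above and ignores std_dev, so treat it as a rough summary rather than a statistical comparison. `IOPS_MEANS` and `iops_speedups` are illustrative names, not part of the test harness.

```rust
/// Mean IOPS copied from the logs above: (test name, with SEG_MAX, without).
const IOPS_MEANS: [(&str, f64, f64); 8] = [
    ("block_read_IOPS", 272408.07766566763, 124633.69637797368),
    ("block_write_IOPS", 164097.1681134429, 128261.38608432449),
    ("block_random_read_IOPS", 270026.4975599876, 39719.41820628812),
    ("block_random_write_IOPS", 167925.33109767124, 48465.39677010305),
    ("block_multi_queue_read_IOPS", 267630.6297761863, 234775.41672233236),
    ("block_multi_queue_write_IOPS", 249247.9256193595, 231800.0319106508),
    ("block_multi_queue_random_read_IOPS", 261706.90908901737, 42492.338109083124),
    ("block_multi_queue_random_write_IOPS", 196864.48090243965, 47620.581976033485),
];

/// Ratio of the two means for each test (> 1.0 means faster with SEG_MAX).
/// Standard deviations are ignored, so this is only a rough summary.
fn iops_speedups() -> Vec<(&'static str, f64)> {
    IOPS_MEANS
        .iter()
        .map(|&(name, with, without)| (name, with / without))
        .collect()
}
```

Every ratio comes out above 1, and the two random-read tests show the largest gains, roughly six to seven times the IOPS measured without the feature.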

Cc @russell-islam @xietou

block/src/lib.rs

liuw · 2024-12-21T03:10:35Z

The VHDX test on ARM64 is broken by this. Need to investigate.

rbradford · 2024-12-21T22:43:49Z

> The VHDX test on ARM64 is broken by this. Need to investigate.

And on x86-64 too - so I don't think it's architecture specific!

liuw · 2024-12-22T01:36:28Z

> > The VHDX test on ARM64 is broken by this. Need to investigate.
>
> And on x86-64 too - so I don't think it's architecture specific!

Just noticed that.

QCOW tests are passing, so this is probably a latent bug in the VHDX implementation.

virtio-devices/src/block.rs

liuw · 2024-12-24T22:31:34Z

I will resend this PR after #6890 is merged.

liuw · 2024-12-25T02:07:54Z

There is another check we should add to all the virtio device config validation functions: the queue size should be a power of 2.

This PR introduces a new InvalidQueueSize error. It can be used later for the new check.
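A minimal Rust sketch of what such validation could look like. Only the `InvalidQueueSize` name comes from the comment above; `ConfigError`, `MIN_BLK_QUEUE_SIZE`, and `validate_blk_queue_size` are hypothetical. The minimum of 3 assumes the "at least one segment" rule (one header descriptor, one data descriptor, one status descriptor), and the power-of-two rule follows the virtio requirement for split virtqueue sizes.

```rust
/// Hypothetical config error type; only the `InvalidQueueSize` name comes
/// from the PR discussion.
#[derive(Debug, PartialEq)]
enum ConfigError {
    InvalidQueueSize(u16),
}

/// Assumed smallest queue that fits one request with a single data segment:
/// header descriptor + one data descriptor + status descriptor.
const MIN_BLK_QUEUE_SIZE: u16 = 3;

/// Sketch of block queue-size validation:
/// 1. room for at least one data segment (assumed threshold, see above);
/// 2. the proposed follow-up: split virtqueue sizes must be a power of two.
/// Note that 0 fails both checks (`0u16.is_power_of_two()` is false).
fn validate_blk_queue_size(queue_size: u16) -> Result<(), ConfigError> {
    if queue_size < MIN_BLK_QUEUE_SIZE || !queue_size.is_power_of_two() {
        return Err(ConfigError::InvalidQueueSize(queue_size));
    }
    Ok(())
}
```

With both rules combined, the smallest block queue size this sketch accepts is 4.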

liuw · 2024-12-31T21:00:57Z

@cloud-hypervisor/cloud-hypervisor-reviewers any more comments on this PR?

russell-islam · 2024-12-31T23:11:46Z

LGTM

likebreath

Good work. I can see that the single-queue tests benefit more than the multi-queue tests, which is somewhat expected.

The random read/write tests see the most significant improvements and basically match the sequential tests' performance. Do you think this is expected?

likebreath · 2025-01-01T00:19:13Z

virtio-devices/src/block.rs

                    physical_block_exp,
                    min_io_size: (topology.minimum_io_size / logical_block_size) as u16,
                    opt_io_size: (topology.optimal_io_size / logical_block_size) as u32,
+                    seg_max: (queue_size - 2) as u32,


Can you please explain why the seg_max is set to this value?

A request consists of at least one out header and one in header, IIRC. What's left in the queue can be used for data segments.

Thank you for the explanation. Can you please share some pointers to more detailed context? I always find the virtio spec far too concise to understand on its own.

I looked at the QEMU code. It has been like that since the beginning, with no explanation.

The closest I can find is virtblk_add_req in Linux.
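Following the explanation above, the `queue_size - 2` arithmetic can be spelled out in a small Rust sketch. It assumes the request does not use indirect descriptors, so its whole chain (one device-readable header, the data segments, one device-writable status byte) must fit within the queue's descriptors. `descriptors_needed` and `seg_max` are hypothetical helpers, not Cloud Hypervisor functions; unlike the plain subtraction in the diff, this version returns `None` when the queue cannot hold a single data segment.

```rust
/// One descriptor for the device-readable request header (type, sector).
const HEADER_DESCRIPTORS: u32 = 1;
/// One descriptor for the device-writable status byte.
const STATUS_DESCRIPTORS: u32 = 1;

/// Descriptors consumed by one request with `data_segments` data buffers,
/// assuming no indirect descriptors.
fn descriptors_needed(data_segments: u32) -> u32 {
    HEADER_DESCRIPTORS + data_segments + STATUS_DESCRIPTORS
}

/// Largest number of data segments that still fits in a queue of
/// `queue_size` descriptors; equivalent to `queue_size - 2` in the diff.
/// Returns None when the queue cannot hold even one data segment.
fn seg_max(queue_size: u16) -> Option<u32> {
    let available = u32::from(queue_size)
        .checked_sub(HEADER_DESCRIPTORS + STATUS_DESCRIPTORS)?;
    if available == 0 {
        None
    } else {
        Some(available)
    }
}
```

For the benchmark configuration (`queue_size = 128`), this gives 126 data segments, and a request using all of them occupies exactly 128 descriptors.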

vmm/src/config.rs

liuw · 2025-01-01T01:59:21Z

> The random read/write tests see the most significant improvements and basically match the sequential tests' performance. Do you think this is expected?

My only expectation is that the performance will improve a lot. I cannot say one way or another whether random reads/writes can be better or worse than sequential ones.

This allows the guest to put more than one segment in a request, which can improve the throughput of the system. Introduce a new check to make sure the queue size configured by the user is large enough to hold at least one segment.

Signed-off-by: Wei Liu <[email protected]>

The SmallVec size was set to one because, without VIRTIO_BLK_F_SEG_MAX, the guest only used one data descriptor per request. The value 32 was derived empirically by booting a guest; it eliminates all SmallVec allocations observable by DHAT.

Signed-off-by: Wei Liu <[email protected]>
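The inline-capacity trade-off in the commit message above can be illustrated with a std-only toy small vector. This is a swapped-in stand-in, not the smallvec crate's `SmallVec`: `InlineVec`, `Storage`, and the `(address, length)` element type are hypothetical. Elements live in a fixed inline array until it fills up, then move to a heap `Vec`, which is the kind of allocation a heap profiler such as DHAT would report.

```rust
/// Where the elements currently live.
enum Storage<T: Copy + Default, const N: usize> {
    /// Fixed inline buffer plus the number of slots in use.
    Inline([T; N], usize),
    /// Heap-allocated storage after the inline buffer overflowed.
    Heap(Vec<T>),
}

/// Hypothetical toy small vector; not the smallvec crate's API.
struct InlineVec<T: Copy + Default, const N: usize> {
    storage: Storage<T, N>,
}

impl<T: Copy + Default, const N: usize> InlineVec<T, N> {
    fn new() -> Self {
        InlineVec { storage: Storage::Inline([T::default(); N], 0) }
    }

    fn push(&mut self, value: T) {
        let spilled = match &mut self.storage {
            Storage::Inline(buf, len) => {
                if *len < N {
                    // Still room inline: no heap allocation.
                    buf[*len] = value;
                    *len += 1;
                    return;
                }
                // Inline buffer is full: copy everything to the heap.
                // This is the allocation a heap profiler would observe.
                let mut v = Vec::with_capacity(N * 2);
                v.extend_from_slice(&buf[..]);
                v.push(value);
                v
            }
            Storage::Heap(v) => {
                v.push(value);
                return;
            }
        };
        self.storage = Storage::Heap(spilled);
    }

    fn len(&self) -> usize {
        match &self.storage {
            Storage::Inline(_, len) => *len,
            Storage::Heap(v) => v.len(),
        }
    }

    /// True once the elements have been moved to the heap.
    fn spilled(&self) -> bool {
        matches!(self.storage, Storage::Heap(_))
    }
}
```

With an inline capacity of 1, a single-segment request stays inline but a four-segment request spills to the heap; with a capacity of 32, the same four-segment request stays inline.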

likebreath · 2025-01-01T16:55:37Z

> > The random read/write tests see the most significant improvements and basically match the sequential tests' performance. Do you think this is expected?
>
> My only expectation is that the performance will improve a lot. I cannot say one way or another whether random reads/writes can be better or worse than sequential ones.

Yes, the performance improvements are substantial and awesome. No doubt on that.

It was the random read/write results matching the sequential read/write results across the board that puzzled me. I wanted to see if you have any insights. (I always thought random reads/writes were supposed to be much slower.)

likebreath · 2025-01-01T17:02:27Z

@TimePrinciple @rbradford The risc-v runner is offline. Would you please take a look? Thanks.

rbradford · 2025-01-01T18:50:28Z

test_virtio_block_vhdx is failing on musl on AMD - this might just be a timing flake

TimePrinciple · 2025-01-02T02:05:32Z

> @TimePrinciple @rbradford The risc-v runner is offline. Would you please take a look? Thanks.

I've shared the message in our Slack channel. Unfortunately, it looks like the process will take much longer than expected (our lab is building a new server room). I have moved that machine to my office and brought it online for now (it will be moved into the server room once that is completed). 🙂

liuw requested a review from a team as a code owner December 21, 2024 01:40

liuw force-pushed the blk-seg-max branch from 187f07d to 6d90368 December 21, 2024 01:41

russell-islam approved these changes Dec 21, 2024

block/src/lib.rs (resolved)

liuw force-pushed the blk-seg-max branch from 6d90368 to 0e72d9b December 21, 2024 20:36

xietuo reviewed Dec 23, 2024

virtio-devices/src/block.rs (outdated, resolved)

liuw force-pushed the blk-seg-max branch from 0e72d9b to 9d6c785 December 25, 2024 01:57

liuw force-pushed the blk-seg-max branch 2 times, most recently from c69522a to ba31f83 December 29, 2024 07:49

likebreath reviewed Jan 1, 2025

liuw added 2 commits January 1, 2025 02:06

liuw force-pushed the blk-seg-max branch from ba31f83 to a31a5ea January 1, 2025 02:07

likebreath approved these changes Jan 1, 2025

likebreath added this pull request to the merge queue Jan 1, 2025

github-merge-queue bot removed this pull request from the merge queue due to failed status checks Jan 1, 2025

rbradford added this pull request to the merge queue Jan 1, 2025

Merged via the queue into cloud-hypervisor:main with commit 1f7b809 Jan 1, 2025
34 of 38 checks passed

liuw deleted the blk-seg-max branch January 1, 2025 20:26

likebreath mentioned this pull request Jan 2, 2025

Flaky test test_virtio_block_vhdx #6897

Closed

6 participants