Conversation

@russell-islam
Contributor

@russell-islam russell-islam commented Jun 18, 2025

Compared to QEMU, CLH block performance is worse at smaller block sizes. At those sizes multiple requests sit in the queue, and batching them improves I/O performance while not regressing the performance of larger block sizes. Attaching the FIO benchmarking here (SEQ read and SEQ write), comparing reg (the current implementation) with batch (this PR).
This pull request introduces batch request submission functionality to the asynchronous I/O subsystem, improves error handling, and refactors existing code to support the new feature. The changes primarily affect the block and virtio-devices modules, with updates to enums, traits, and implementations to enable caching and submission of batched I/O requests.

[seq-write FIO benchmark chart]
[Seq-Read FIO benchmark chart]
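
To make the mechanism concrete, here is a minimal, hedged sketch of what batching changes in the hot path: instead of submitting each request as it is parsed, requests are cached and then submitted in one go. All names below (BatchIo, IoRequest, ToyBackend, cache_request, submit_batch) are illustrative stand-ins, not the PR's actual API:

```rust
/// Illustrative request descriptor; a real one would carry buffers, etc.
#[derive(Debug, Clone, PartialEq)]
struct IoRequest {
    offset: u64,
    len: usize,
}

trait BatchIo {
    /// Cache a parsed request without submitting it yet.
    fn cache_request(&mut self, req: IoRequest);
    /// Submit all cached requests at once; returns how many were submitted.
    fn submit_batch(&mut self) -> usize;
}

/// Toy backend that just records submissions, standing in for an
/// io_uring-backed raw-file implementation.
struct ToyBackend {
    pending: Vec<IoRequest>,
    submitted: Vec<IoRequest>,
}

impl ToyBackend {
    fn new() -> Self {
        Self { pending: Vec::new(), submitted: Vec::new() }
    }
}

impl BatchIo for ToyBackend {
    fn cache_request(&mut self, req: IoRequest) {
        self.pending.push(req);
    }
    fn submit_batch(&mut self) -> usize {
        let n = self.pending.len();
        // A real backend would push all SQEs and call submit() once here.
        self.submitted.append(&mut self.pending);
        n
    }
}

fn main() {
    let mut dev = ToyBackend::new();
    // Parse the whole available queue first, caching each request...
    for i in 0..4u64 {
        dev.cache_request(IoRequest { offset: i * 4096, len: 4096 });
    }
    // ...then submit everything with a single call.
    let n = dev.submit_batch();
    println!("submitted {n} requests in one batch");
}
```

The win for small block sizes comes from paying the submission overhead once per queue drain rather than once per request.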

@russell-islam russell-islam requested a review from a team as a code owner June 18, 2025 03:59
@russell-islam russell-islam force-pushed the muislam/block-batch-req branch 2 times, most recently from 5f5cc77 to 6cd2808 Compare June 18, 2025 04:21
Member

@rbradford rbradford left a comment


Commits need some restructuring to make more sense.

@russell-islam russell-islam force-pushed the muislam/block-batch-req branch from 6cd2808 to 8a58caa Compare June 18, 2025 16:16
Member

@liuw liuw left a comment


Can you confirm each individual commit builds fine? We cannot break bisection.

I asked GitHub Copilot to write me a script to check. It is fine.

@russell-islam russell-islam force-pushed the muislam/block-batch-req branch 2 times, most recently from c228e3b to feca574 Compare July 10, 2025 21:34
@russell-islam russell-islam marked this pull request as draft July 10, 2025 22:29
@russell-islam russell-islam force-pushed the muislam/block-batch-req branch 3 times, most recently from cea5167 to 57ed9bf Compare July 11, 2025 23:16
@russell-islam russell-islam marked this pull request as ready for review July 14, 2025 19:23
@russell-islam
Contributor Author

@liuw @rbradford PTAL

liuw previously approved these changes Jul 21, 2025
Member

@liuw liuw left a comment


Judging from the numbers, this is an improvement over the status quo.

@russell-islam russell-islam force-pushed the muislam/block-batch-req branch from 57ed9bf to 873ca81 Compare July 21, 2025 20:33
@liuw liuw self-requested a review July 22, 2025 04:11
@liuw
Member

liuw commented Jul 22, 2025

There is one problem. Although each commit builds, they are not functional. This still breaks bisection.

For example, the first commit only collects the requests into a vector, but they are never submitted. If we run purely that commit, guest I/O will stall. This is still bad.

Given the complexity (or lack thereof) of this change, you should just squash everything into one commit.

@liuw liuw dismissed their stale review July 22, 2025 04:20

Commits break bisection

@russell-islam russell-islam force-pushed the muislam/block-batch-req branch 2 times, most recently from bc75b5c to 78ef10a Compare July 23, 2025 19:35
Member

@likebreath likebreath left a comment


@russell-islam Thank you for the good work and the detailed results. I like the idea and believe this is a good improvement to have.

Some comments below about the implementation. Also, can you share the comparison with QEMU before and after the change?

I am thinking such a feature should be useful for other async backends too, say the aio backend. If that's the case, it might be better added at a higher layer, perhaps by refactoring execute_async(). This would potentially avoid duplicating the logic across backends and also solve the thread-safety issue with the new struct. I still need to put more thought into it. Let me know your thoughts.

likebreath added a commit to likebreath/cloud-hypervisor that referenced this pull request Aug 1, 2025
The idea of request submission in batch was proposed and measured by
@russell-islam from cloud-hypervisor#7146. This presents a different design to support
such a feature, aiming to address the following issues from cloud-hypervisor#7146:

* Make the implementation thread safe
* Support multiple block backend
* Reduce heap allocation
* Properly handle errors from request submission in batch (todo)

Signed-off-by: Bo Chen <[email protected]>
@likebreath
Member

likebreath commented Aug 1, 2025

I am thinking such a feature should be useful for other async backends too, say the aio backend. If that's the case, it might be better added at a higher layer, perhaps by refactoring execute_async(). This would potentially avoid duplicating the logic across backends and also solve the thread-safety issue with the new struct. I still need to put more thought into it. Let me know your thoughts.

Okay. I did a quick PoC based on this idea and here is the implementation: https://github.com/likebreath/cloud-hypervisor/commits/0801/redesign_block_batch_submit/, particularly likebreath@97afe58 and likebreath@241483d. I tested with a raw file using the io_uring backend.

It tries to address the following issues (including ones raised in previous comments):

  • Make the implementation thread safe
  • Support multiple block backend
  • Reduce heap allocation
  • Properly handle errors from request submission in batch (see todo in comments)

@russell-islam If we agree on this design, I'd love to hand it over to you for further improvements and measurements. Please take a look and let me know your thoughts.
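
A minimal sketch of that higher-layer idea: a generic batcher owned by the queue-handling thread that sits above any AsyncIo backend, so io_uring, aio, and friends all benefit without duplicating the logic (all names here are illustrative, not the actual cloud-hypervisor types):

```rust
trait AsyncIoBackend {
    /// Submit a slice of (offset, len) requests; returns the number accepted.
    fn submit(&mut self, reqs: &[(u64, usize)]) -> usize;
}

/// Generic batcher owned by the queue-handling thread. Because it is not
/// shared between threads, no extra locking is needed (the thread-safety
/// concern raised above).
struct Batcher<B: AsyncIoBackend> {
    backend: B,
    pending: Vec<(u64, usize)>,
}

impl<B: AsyncIoBackend> Batcher<B> {
    fn new(backend: B) -> Self {
        // Pre-allocate once to reduce per-iteration heap allocation.
        Self { backend, pending: Vec::with_capacity(256) }
    }

    /// Called once per parsed request, in place of an immediate submit.
    fn queue(&mut self, offset: u64, len: usize) {
        self.pending.push((offset, len));
    }

    /// Called after draining the available ring: one submit for the lot.
    fn flush(&mut self) -> usize {
        if self.pending.is_empty() {
            return 0;
        }
        let n = self.backend.submit(&self.pending);
        self.pending.clear();
        n
    }
}

/// Stand-in backend that counts how many submit calls it receives.
struct CountingBackend {
    calls: usize,
}

impl AsyncIoBackend for CountingBackend {
    fn submit(&mut self, reqs: &[(u64, usize)]) -> usize {
        self.calls += 1; // one backend call per flush, however many requests
        reqs.len()
    }
}

fn main() {
    let mut b = Batcher::new(CountingBackend { calls: 0 });
    for i in 0..8u64 {
        b.queue(i * 4096, 4096);
    }
    let n = b.flush();
    println!("{} requests, {} backend call(s)", n, b.backend.calls);
}
```

Since the batcher only depends on the `AsyncIoBackend` trait, adding batch support to another backend means implementing one method rather than repeating the caching logic.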

likebreath added a commit to likebreath/cloud-hypervisor that referenced this pull request Aug 1, 2025
The idea of request submission in batch was proposed and measured by
@russell-islam from cloud-hypervisor#7146. This presents a different design to support
such feature, aiming to address the following issues from cloud-hypervisor#7146:

* Make the implementation thread safe
* Support multiple block backend
* Reduce heap allocation
* Properly handle errors from request submission in batch

Signed-off-by: Bo Chen <[email protected]>
likebreath added a commit to likebreath/cloud-hypervisor that referenced this pull request Aug 1, 2025
The idea of request submission in batch was proposed and measured by
@russell-islam from cloud-hypervisor#7146. This presents a different design to support
such feature, aiming to address the following issues from cloud-hypervisor#7146:

* Make the implementation thread safe
* Support multiple block backend
* Reduce heap allocation
* Properly handle errors from request submission in batch

Signed-off-by: Bo Chen <[email protected]>
@russell-islam
Contributor Author

@likebreath This week I am on call. I will resume this work next week.

@likebreath
Member

@likebreath This week I am on call. I will resume this work next week.

No problem. Thanks for the heads-up.

@likebreath
Member

CLH Command:

sudo ./cloud-hypervisor_reg --kernel ./bzImage --disk path=focal-server-cloudimg-amd64-ch.raw,direct=on path=big-disk-ch.img,direct=on --cmdline "console=hvc0 root=/dev/vda1 rw" --cpus boot=$1 --memory size=32G --net tap=ich0,mac=00:11:22:33:44:55,ip=192.168.4.2,mask=255.255.255.0 --api-socket /tmp/cloud-hypervisor.sock --console pty

The num_queues for the virtio-blk device should match --numjobs and the vCPU count. Otherwise, multiple I/O jobs compete for a single virtqueue, which is why you don't see performance improvements as --numjobs increases.

@russell-islam
Contributor Author

CLH Command:
sudo ./cloud-hypervisor_reg --kernel ./bzImage --disk path=focal-server-cloudimg-amd64-ch.raw,direct=on path=big-disk-ch.img,direct=on --cmdline "console=hvc0 root=/dev/vda1 rw" --cpus boot=$1 --memory size=32G --net tap=ich0,mac=00:11:22:33:44:55,ip=192.168.4.2,mask=255.255.255.0 --api-socket /tmp/cloud-hypervisor.sock --console pty

The num_queues for the virtio-blk device should match --numjobs and the vCPU count. Otherwise, multiple I/O jobs compete for a single virtqueue, which is why you don't see performance improvements as --numjobs increases.

If num_queues is not provided on the CLH command line, it defaults to the number of vCPUs, right? In the guest I used num-jobs equal to the number of CPUs. Did I miss anything?

@likebreath
Member

CLH Command:
sudo ./cloud-hypervisor_reg --kernel ./bzImage --disk path=focal-server-cloudimg-amd64-ch.raw,direct=on path=big-disk-ch.img,direct=on --cmdline "console=hvc0 root=/dev/vda1 rw" --cpus boot=$1 --memory size=32G --net tap=ich0,mac=00:11:22:33:44:55,ip=192.168.4.2,mask=255.255.255.0 --api-socket /tmp/cloud-hypervisor.sock --console pty

The num_queues for the virtio-blk device should match --numjobs and the vCPU count. Otherwise, multiple I/O jobs compete for a single virtqueue, which is why you don't see performance improvements as --numjobs increases.

If num_queues is not provided on the CLH command line, it defaults to the number of vCPUs, right? In the guest I used num-jobs equal to the number of CPUs. Did I miss anything?

I think the default is 1, with both the CLI and the HTTP API:

    pub const DEFAULT_DISK_NUM_QUEUES: usize = 1;

    pub fn default_diskconfig_num_queues() -> usize {
        DEFAULT_DISK_NUM_QUEUES
    }

@russell-islam
Contributor Author

[Seq_write benchmark chart]

@russell-islam
Contributor Author

[Seq_Read benchmark chart]

@russell-islam
Contributor Author

russell-islam commented Aug 27, 2025

@likebreath Numbers are way better now. We can improve further.

@russell-islam russell-islam force-pushed the muislam/block-batch-req branch from e6f93bc to c343843 Compare August 27, 2025 19:55
@russell-islam
Contributor Author

@rbradford Could you please take another look? It looks like your requested changes might block the merge.

@likebreath likebreath self-requested a review August 27, 2025 20:54
@likebreath likebreath dismissed their stale review August 27, 2025 20:54

Code changed

@likebreath
Member

@likebreath Numbers are way better now. We can still improve further.

Yes, I see it. It looks good and is an improvement for smaller block sizes.

One more finding: I believe we have now saturated the limit of the actual device on the host, around 2735 MB/s for write and 6600 MB/s for read. This happens for basically all tests with block sizes larger than 128 KB. With that in mind, the comparison won't provide much actual insight between batch vs. regular or CH vs. QEMU.

@russell-islam
Contributor Author

Code rebased.

@likebreath
Member

Code rebased.

@russell-islam I think you pushed the wrong commit: the two comments (#7146 (review)) you addressed were discarded by the current push.

russell-islam force-pushed the muislam/block-batch-req branch from e6f93bc to c343843

Instead of returning a boolean, return a struct of completion status
so that it can be cached for batch submission.

Signed-off-by: Bo Chen <[email protected]>
Signed-off-by: Muminul Islam <[email protected]>
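
As a hedged illustration of what that commit describes (type and field names here are guesses, not the actual cloud-hypervisor types), a completion result that can be cached might look like:

```rust
// Illustrative only: a cacheable submission status instead of a bare
// boolean, so the caller can defer the request into a batch.
#[derive(Debug, Clone, Copy, PartialEq)]
enum SubmitStatus {
    Submitted, // already handed to the backend
    Cached,    // held back for a later batch submit
}

#[derive(Debug, Clone, Copy, PartialEq)]
struct CompletionStatus {
    user_data: u64, // tag identifying the request
    status: SubmitStatus,
}

fn main() {
    // A boolean could only say "done or not"; a struct carries enough
    // context to submit or complete the request later.
    let c = CompletionStatus { user_data: 7, status: SubmitStatus::Cached };
    println!("request {} -> {:?}", c.user_data, c.status);
}
```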
Cache and batch IO requests after parsing all
items in the queue, improving performance, especially
for small block sizes, by reducing per-request overhead.

Introduce two methods in the AsyncIo trait for batch
submission, with an implementation in the raw disk backend.
These methods should be called during/after parsing all block IO
requests in the available queue. If batch submission is not
enabled, requests are submitted the old way by default.

Signed-off-by: Bo Chen <[email protected]>
Signed-off-by: Muminul Islam <[email protected]>
Implement the batch submission function for raw disk; it is
enabled by default. After parsing the requests, this method is
called for better IO latency and bandwidth.

Signed-off-by: Bo Chen <[email protected]>
Signed-off-by: Muminul Islam <[email protected]>
Updated VHD async implementation to call the batch submit
method via the raw async IO layer.

Signed-off-by: Muminul Islam <[email protected]>
@russell-islam russell-islam force-pushed the muislam/block-batch-req branch from c343843 to f1281fe Compare August 27, 2025 21:53
@russell-islam
Contributor Author

russell-islam commented Aug 27, 2025

Code rebased.

@russell-islam I think you pushed the wrong commit: the two comments (#7146 (review)) you addressed were discarded by the current push.

russell-islam force-pushed the muislam/block-batch-req branch from e6f93bc to c343843

Thank you. Sorry, too many environments to develop and test on my side. Fixed now.

@rbradford rbradford dismissed their stale review August 28, 2025 12:30

Code updated.

@likebreath likebreath added this pull request to the merge queue Aug 29, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Aug 29, 2025
@likebreath likebreath added this pull request to the merge queue Sep 2, 2025
Merged via the queue into cloud-hypervisor:main with commit a9d6807 Sep 2, 2025
40 checks passed
@likebreath likebreath moved this from 🆕 New to ✅ Done in Cloud Hypervisor Roadmap Sep 10, 2025
