Batch submit block Async IO Requests #7146
Conversation
Force-pushed from 5f5cc77 to 6cd2808.
rbradford left a comment:
Commits need some restructuring to make more sense.
Force-pushed from 6cd2808 to 8a58caa.
Can you confirm each individual commit builds fine? We cannot break bisection.
I asked GitHub Copilot to write me a script to check. Each commit builds fine.
Force-pushed from c228e3b to feca574.
Force-pushed from cea5167 to 57ed9bf.
@liuw @rbradford PTAL
liuw left a comment:
Judging from the numbers, this is an improvement over the status quo.
Force-pushed from 57ed9bf to 873ca81.
There is one problem: although each commit builds, the intermediate states are not functional, so this still breaks bisection. For example, the first commit only collects the requests into a vector; they are never submitted. Running just that commit on its own would stall guest I/O, which is still bad. Given the complexity (or lack thereof) of this change, you should just squash everything into one commit.
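To make the bisection concern concrete, here is a minimal Rust sketch of the batching pattern under discussion (illustrative names, not cloud-hypervisor's real types). Parsed requests are only cached; nothing reaches the device until the batch-submit step runs, which is why a commit that adds the caching without the final submit call stalls guest I/O.

```rust
// Illustrative stand-in for the async block backend: requests parsed from
// the virtio queue are cached in `pending` and only reach the device when
// submit_batch() is called.
struct AsyncDisk {
    pending: Vec<u64>,   // cached request ids (stand-ins for real iocbs)
    submitted: Vec<u64>, // requests actually handed to the kernel
}

impl AsyncDisk {
    fn new() -> Self {
        Self { pending: Vec::new(), submitted: Vec::new() }
    }

    /// Cache a parsed request without issuing it.
    fn queue_request(&mut self, id: u64) {
        self.pending.push(id);
    }

    /// Issue every cached request in one step; returns how many were sent.
    fn submit_batch(&mut self) -> usize {
        let n = self.pending.len();
        self.submitted.append(&mut self.pending);
        n
    }
}

fn main() {
    let mut disk = AsyncDisk::new();
    disk.queue_request(1);
    disk.queue_request(2);
    // Without this call, the two requests would sit in `pending` forever:
    assert_eq!(disk.submit_batch(), 2);
    assert!(disk.pending.is_empty());
    assert_eq!(disk.submitted.len(), 2);
}
```

A commit containing only `queue_request` (the first commit's situation, per the review) compiles fine but never drains `pending`, so it cannot sit alone in the history.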
Force-pushed from bc75b5c to 78ef10a.
likebreath left a comment:
@russell-islam Thank you for the good work and the detailed results. I like the idea and believe this is a good improvement to have.
Some comments below about the implementation. Also, can you share the comparison with QEMU before and after the change?
I am thinking such a feature should be useful for other async backends too, say the aio backend. If that's the case, it might be better added at a higher layer, perhaps by refactoring execute_async(). This would potentially avoid multiple duplications and also solve the thread-safety issue with the new struct. I still need to put more thought into it. Let me know your thoughts.
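One way to read the "higher layer" suggestion is a trait-level batch entry point with a default fallback; this is a hedged sketch (the trait and method names are illustrative, not cloud-hypervisor's actual AsyncIo API), so every async backend inherits batching support without duplicating the caching logic.

```rust
// Hypothetical trait mirroring the idea of a shared async I/O interface:
// submit_batch() gets a default implementation that degrades to the
// existing one-by-one path, so backends opt in only if they can do better.
trait AsyncIo {
    /// Submit a single request immediately (the pre-existing path).
    fn submit(&mut self, req: u64);

    /// Default batch path: backends that cannot batch just loop.
    fn submit_batch(&mut self, reqs: &mut Vec<u64>) {
        for req in reqs.drain(..) {
            self.submit(req);
        }
    }
}

// A backend that only implements the single-submit path still works
// unchanged through the batch entry point.
struct AioBackend {
    issued: Vec<u64>, // stand-in for requests passed to io_submit(2)
}

impl AsyncIo for AioBackend {
    fn submit(&mut self, req: u64) {
        self.issued.push(req);
    }
}

fn main() {
    let mut backend = AioBackend { issued: Vec::new() };
    let mut batch = vec![10, 11, 12];
    backend.submit_batch(&mut batch); // default impl loops over submit()
    assert!(batch.is_empty());
    assert_eq!(backend.issued, vec![10, 11, 12]);
}
```

Because the batch state lives with the caller rather than in a shared struct, this shape also sidesteps the thread-safety concern raised above.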
The idea of request submission in batch was proposed and measured by @russell-islam in cloud-hypervisor#7146. This presents a different design to support the feature, aiming to address the following issues from cloud-hypervisor#7146:
* Make the implementation thread safe
* Support multiple block backends
* Reduce heap allocation
* Properly handle errors from request submission in batch (todo)

Signed-off-by: Bo Chen <[email protected]>
Okay. I did a quick PoC based on this idea; here is the implementation: https://github.com/likebreath/cloud-hypervisor/commits/0801/redesign_block_batch_submit/. In particular, it tries to address the following issues (including ones raised in previous comments):
@russell-islam If we agree on this design, I'd love to hand it over to you for further improvements and measurements. Please take a look and let me know your thoughts.
The idea of request submission in batch was proposed and measured by @russell-islam in cloud-hypervisor#7146. This presents a different design to support the feature, aiming to address the following issues from cloud-hypervisor#7146:
* Make the implementation thread safe
* Support multiple block backends
* Reduce heap allocation
* Properly handle errors from request submission in batch

Signed-off-by: Bo Chen <[email protected]>
@likebreath This week I am on call. I will resume this work next week.
No problem. Thanks for the heads-up.
If num_queues is not provided on the clh command line, it defaults to the number of vCPUs, right? In the guest I set num-jobs equal to the number of CPUs. Did I miss anything?
I think the default is set here: cloud-hypervisor/vmm/src/vm_config.rs, lines 290 to 294 (at 91d15c3).
@likebreath Numbers are way better now. We can improve further.
Force-pushed from e6f93bc to c343843.
@rbradford Could you please take another look? It looks like your requested change might block the merge.
Yes, I see it. It looks good and is an improvement for smaller block sizes. One more finding: I believe we have now saturated the limit of the actual device on the host, around 2735 MB/s for writes and 6600 MB/s for reads. This happens for basically all tests with a block size larger than 128KB. With this in mind, those comparison results won't provide much actual insight between batch vs. regular or CH vs. QEMU.
Code rebased.
@russell-islam I think you pushed the wrong commit - my two comments (#7146 (review)) you addressed got discarded with the current push.
Instead of returning a boolean, return a struct of completion status so that it can be cached for batch submission.

Signed-off-by: Bo Chen <[email protected]>
Signed-off-by: Muminul Islam <[email protected]>
Cache and batch IO requests after parsing all items in the queue, improving performance, especially for small block sizes, by reducing per-request overhead. Introduce two methods in the AsyncIo trait for batch submission, with an implementation in the raw disk backend. The batch-submit method should be called after parsing all block IO requests in the available queue. If batch submission is not enabled, requests are submitted the old way by default.

Signed-off-by: Bo Chen <[email protected]>
Signed-off-by: Muminul Islam <[email protected]>
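The fallback described in this commit message can be sketched as follows (a hedged illustration with made-up names, not the PR's actual code): when batching is disabled, queuing a request degrades to the old immediate submission, so the caller's flow is the same either way.

```rust
// Hypothetical raw-disk backend with a batching switch: requests are
// either cached for an end-of-queue flush or issued immediately,
// mimicking the "default to the old way" behavior from the commit message.
struct RawBackend {
    batching: bool,
    cached: Vec<u64>, // requests waiting for the end-of-queue flush
    issued: Vec<u64>, // requests already handed to the device
}

impl RawBackend {
    fn queue_request(&mut self, req: u64) {
        if self.batching {
            self.cached.push(req); // submitted later, in one go
        } else {
            self.issued.push(req); // old behavior: submit immediately
        }
    }

    /// Called once after all requests in the available queue are parsed.
    fn flush_batch(&mut self) {
        self.issued.append(&mut self.cached);
    }
}

fn main() {
    let mut raw = RawBackend { batching: true, cached: Vec::new(), issued: Vec::new() };
    raw.queue_request(1);
    assert!(raw.issued.is_empty()); // nothing issued until the flush
    raw.flush_batch();
    assert_eq!(raw.issued, vec![1]);
}
```

The point of the design is that `flush_batch` runs once per queue drain rather than once per request, which is where the per-request overhead savings for small block sizes come from.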
Implement the batch submission function for the raw disk backend; it is enabled by default. After parsing the requests, this method is called for better IO latency and bandwidth.

Signed-off-by: Bo Chen <[email protected]>
Signed-off-by: Muminul Islam <[email protected]>
Updated VHD async implementation to call the batch submit method via the raw async IO layer. Signed-off-by: Muminul Islam <[email protected]>
Force-pushed from c343843 to f1281fe.
Thank you. Sorry, I have too many places to develop and test on my side. Fixed now.


Compared to QEMU, CLH block performance is worse at smaller block sizes. For smaller block sizes, multiple requests are in a queue; batching those requests improves IO performance while not regressing the performance of larger block sizes. Attaching the FIO benchmarking here (SEQ read and SEQ write), comparing reg (the current implementation) with batch (this PR).
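A fio job file along these lines would drive this kind of sequential-read run inside the guest; note that the block size, runtime, target device, and job count here are assumptions for illustration, since the PR does not show its exact parameters.

```ini
; Hypothetical fio job approximating the SEQ read benchmark described above.
; All values are illustrative, not the PR's actual settings.
[global]
ioengine=libaio
direct=1
time_based=1
runtime=30
group_reporting=1
filename=/dev/vdb

[seqread]
rw=read
bs=4k
numjobs=8
iodepth=32
```

Sweeping `bs` from 4k up past 128KB would reproduce the small-vs-large block-size comparison the charts illustrate.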
This pull request introduces batch request submission functionality to the asynchronous I/O subsystem, improves error handling, and refactors existing code to support the new feature. The changes primarily affect the block and virtio-devices modules, with updates to enums, traits, and implementations to enable caching and submission of batched I/O requests.