-
Notifications
You must be signed in to change notification settings - Fork 565
block: virtio-blk: report IO errors to the guest #7107
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
likebreath
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the contribution. Besides the comment below, I believe this patch only handles the async block backend, while we also need to have the same behavior for the synchronous backend. Please take a look at process_queue_submit(), particularly the call to request.execut_async() which returns false on no error when using synchronous backend.
753d09b to
59af0df
Compare
|
As far as I can tell, the only synchronous operation handled by |
Please checkout the different implementations of |
59af0df to
39f1335
Compare
likebreath
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes now look good to me. Just kicked off the CI pipeline.
Would you please look into if it is possible to add an integration test for this, e.g. manually trigger such error on the host and check the guest is still alive with expected errors reported to the guest kernel log? Thank you.
|
@likebreath thank you for your review. I am working on the integration tests. I have a few questions:
|
They won't hold up merging your PR (if you rebase you will see the fuzz failures gone I think.)
That's fine - the check is to ensure that normal paragraphs are appropriately wrapped.
It's not easy to do but you can use |
rbradford
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I imagine this would be very tricky to test with an integration test - the code looks okay to me.
virtio-devices/src/block.rs
Outdated
| let status = match result { | ||
| Ok(_) => VIRTIO_BLK_S_OK, | ||
| Err(e) => { | ||
| error!("Request failed: {:x?} {:?}", request, e); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we think error! might be too aggressive here - it's no longer fatal for the VMM so maybe it should be warn! @likebreath WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that makes sense, both for the sync and async code path.
My reproducer setup involves NBD and qemu-nbd, which seems to require root. So it's not as simple to add to the tests as I originally thought. |
@gjolly Thanks for looking into this. That's where I was not sure and wanted you to help and find out. For such case, I am fine with not adding a integration test. Can you please manually validate both the async and sync cases? To enforce the sync backend of block devices, you can either use the internal options of cloud-hypervisor/tests/integration.rs Lines 3254 to 3257 in 59e11f1
Can you please also rebase that should resolve the fuzz build error? Thank you. |
|
For the commit message, I believe you meant to say:
|
I agree - we don't need an integration test if you've manually verified it. |
39f1335 to
4c9713d
Compare
Instead of exiting on IO errors, report the errors to the guest with VIRTIO_BLK_S_IOERR. For example, the guest kernel will log something similar to this if the nbd behind /dev/vdc is unexpectedly disconnected: [ 166.033957] I/O error, dev vdc, sector 264 op 0x1:(WRITE) flags 0x9800 phys_seg 1 prio class 2 [ 166.035083] Aborting journal on device vdc-8. [ 166.037307] Buffer I/O error on dev vdc, logical block 9, lost sync page write [ 166.038471] JBD2: I/O error when updating journal superblock for vdc-8. [...] [ 174.234470] EXT4-fs (vdc): I/O error while writing superblock In case the rootfs is not located on the affected block device, this will not crash the guest. Fixes: cloud-hypervisor#6995 Signed-off-by: Gauthier Jolly <[email protected]>
4c9713d to
fdf7339
Compare
|
I rebased and I think I fixed all the things that were mentioned (commit message and error level) truncate -s 1M disk.raw
qemu-nbd --connect /dev/nbd0 disk.raw
./cloud-hypervisor -disk path=/dev/nbd0 ...Then when the VM is running I simply do: (guest) echo "foo" > /dev/vdc # that should not raise any error
(host) qemu-nbd --disconnect /dev/nbd0
(guest) echo "foo" > /dev/vdc # that should trigger the kernel errors + the hypervisor warningsAt the end the guest should still be running. The same process can be repeated but with |
|
@likebreath If you're happy with the changes could you approve so this can be merged. |
3d78662
Instead of exiting on IO errors, report the errors to the guest with VIRTIO_BLK_S_IOERR. For example, the guest kernel will log something similar to this if the nbd behind /dev/vdc is unexpectedly disconnected:
[ 166.033957] I/O error, dev vdc, sector 264 op 0x1:(WRITE) flags 0x9800 phys_seg 1 prio class 2
[ 166.035083] Aborting journal on device vdc-8.
[ 166.037307] Buffer I/O error on dev vdc, logical block 9, lost sync page write
[ 166.038471] JBD2: I/O error when updating journal superblock for vdc-8.
[...]
[ 174.234470] EXT4-fs (vdc): I/O error while writing superblock
In case the rootfs is not located on the affected block device, this will not crash the host.
Fixes: #6995