Skip to content

Conversation

@weltling
Copy link
Member

…ance

The OFLAG_COPIED bit (bit 63) indicates a cluster's refcount is exactly 1 and doesn't need copy-on-write. This bit must be set in L1 entries when their referenced L2 clusters have refcount=1.

Previously, L1 entries were always written as raw addresses without the OFLAG_COPIED bit, violating the QCOW2 specification and causing qemu-img check to report errors like

ERROR OFLAG_COPIED L2 cluster: l1_index=X .... refcount=1

The implementation queries each L2 cluster's refcount in sync_caches() and sets OFLAG_COPIED appropriately when writing the L1 table. This ensures QCOW2 images are specification compliant and maintain correct COW semantics to avoid data corruption.

@weltling weltling requested a review from a team as a code owner November 27, 2025 15:41
@phip1611
Copy link
Member

phip1611 commented Nov 27, 2025

Thanks for your contribution! @ all: Shouldn't #7506 help us to discover such issues?

@weltling weltling force-pushed the qcow-fixes-copied-flag branch from 11079a0 to c478e3e Compare November 27, 2025 15:48
The OFLAG_COPIED bit (bit 63) indicates a cluster's refcount is exactly
1 and doesn't need copy-on-write. This bit must be set in L1 entries
when their referenced L2 clusters have refcount=1.

Previously, L1 entries were always written as raw addresses without the
OFLAG_COPIED bit, violating the QCOW2 specification and causing qemu-img
check to report errors like

`ERROR OFLAG_COPIED L2 cluster: l1_index=X .... refcount=1`

The implementation queries each L2 cluster's refcount in sync_caches()
and sets OFLAG_COPIED appropriately when writing the L1 table. This
ensures QCOW2 images are specification compliant and maintain correct
COW semantics to avoid data corruption.

Signed-off-by: Anatol Belski <[email protected]>
@weltling
Copy link
Member Author

Thanks for your contribution! @ all: Shouldn't #7506 help us to discover such issues?

Absolutely. There is yet one issue I'm tracking. I'll plan to extend the tests with the corresponding test cases to cover the mentioned issue.

Thanks

@weltling weltling force-pushed the qcow-fixes-copied-flag branch from c478e3e to 5b6c14b Compare November 27, 2025 15:55
@liuw
Copy link
Member

liuw commented Nov 28, 2025

Thanks for your contribution! @ all: Shouldn't #7506 help us to discover such issues?

Absolutely. There is yet one issue I'm tracking. I'll plan to extend the tests with the corresponding test cases to cover the mentioned issue.

Thanks

Yes, please extend the test in the same PR if possible.

@weltling
Copy link
Member Author

Thanks for your contribution! @ all: Shouldn't #7506 help us to discover such issues?

Absolutely. There is yet one issue I'm tracking. I'll plan to extend the tests with the corresponding test cases to cover the mentioned issue.
Thanks

Yes, please extend the test in the same PR if possible.

Here's the actual output from qemu-img check ...

Leaked cluster 748 refcount=1 reference=0
ERROR OFLAG_COPIED L2 cluster: l1_index=0 l1_entry=890000 refcount=1
ERROR OFLAG_COPIED L2 cluster: l1_index=2 l1_entry=3180000 refcount=1
ERROR OFLAG_COPIED L2 cluster: l1_index=4 l1_entry=3200000 refcount=1
ERROR OFLAG_COPIED L2 cluster: l1_index=5 l1_entry=30b0000 refcount=1

4 errors were found on the image.
Data may be corrupted, or further writes to the image may corrupt it.

1 leaked clusters were found on the image.
This means waste of disk space, but no harm to data.
790/163840 = 0.48% allocated, 22.53% fragmented, 0.00% compressed clusters
Image end offset: 52494336

There are actually 2 distinct issues here:

  • Data corruption (OFLAG_COPIED errors) - addressed by this PR
  • Cluster leak - I'm tracking this as a separate issue to fix next

Adding tests in this PR would still fail because qemu-img check would report the cluster leak issue even after this fix. Therefore, I'd prefer to either:

  • Add tests in a separate PR dedicated to testing, or
  • Combine the tests with the cluster leak fix in the next PR

This way the tests will pass cleanly once both issues are resolved.

Thanks

@weltling
Copy link
Member Author

Strangely, I don't get a seccomp violation while running this locally :/

@weltling weltling force-pushed the qcow-fixes-copied-flag branch from 6df537d to 3c22199 Compare November 29, 2025 16:12
@liuw
Copy link
Member

liuw commented Nov 29, 2025

Strangely, I don't get a seccomp violation while running this locally :/

Your new code calls try_clone which probably calls dup2 or dup3.

If you want to make the write function take an iterator, why not keep most of the original code and not call the collect function?

@liuw
Copy link
Member

liuw commented Nov 29, 2025

Well, it looks like the BufWriter in the write_pointer_table function allocates a vector as well. It is fine to keep using the vector for now. We should spend some dedicated effort in eliminating allocations in the future if necessary.

@weltling
Copy link
Member Author

Strangely, I don't get a seccomp violation while running this locally :/

Your new code calls try_clone which probably calls dup2 or dup3.

If you want to make the write function take an iterator, why not keep most of the original code and not call the collect function?

collect() would create a temporary vector, that's what we wanted to avoid.

Thanks

@weltling
Copy link
Member Author

Well, it looks like the BufWriter in the write_pointer_table function allocates a vector as well. It is fine to keep using the vector for now. We should spend some dedicated effort in eliminating allocations in the future if necessary.

Yeah, it uses BufWriter::with_capacity which will pre-allocate the buffer. But here, it is clearly justified, as otherwise using self.file.write() would be an alternative and it would be an unbuffered write very unefficient. That's why I went for dup'ing the file handle to keep the BufWriter piece. As usual, the I/O is the biggest part in the perf ottleneck.

Thanks

@weltling weltling force-pushed the qcow-fixes-copied-flag branch from 3c22199 to 5b6c14b Compare November 29, 2025 22:36
@weltling
Copy link
Member Author

Well, it looks like the BufWriter in the write_pointer_table function allocates a vector as well. It is fine to keep using the vector for now. We should spend some dedicated effort in eliminating allocations in the future if necessary.

ACK. I reverted back to the previous version. It's a good experiment and I agree rewriting this part to reduce temp allocations should be a separate concerted effort. Thanks!

@weltling
Copy link
Member Author

BTW I also had some considerations about the memory impart in this case. The size of the L1 table depends on the disk and cluster size. Say, with a disk of 100GB L1 table could grow to something like 2KB. I guess, a vec<64> would even fit on stack, but in this case would need to investigate the Rust compiler.

While doing the callback solution, I also had an idea about how to make it not requiring passing the file handle. As the file handle is needed for refcounts, one colud gather refcounts and pass it to that callback. In that case, it would be a vec<16> and for a 100GB disk it colud take something like 400 bytes. As it's a temp allocation only at a flush time, it would be sort a mid solution between zero allocation vs. temporary vec<64>.

Of course, not having a temporary vector is the best for production case. In the future we colud consider to either minimize the memory usage by worknig out a iterator+callback based solution or moving to a simple solution with vec<16> for just refcounts. For now, keeping this temporary vector solution for a quick patch iteration would seem to be indeed ok.

Thanks

@rbradford rbradford added this pull request to the merge queue Dec 1, 2025
Merged via the queue into cloud-hypervisor:main with commit 6897c2a Dec 1, 2025
85 of 86 checks passed
@weltling weltling deleted the qcow-fixes-copied-flag branch December 1, 2025 22:26
@likebreath likebreath added the bug-fix Bug fix to include in release notes label Dec 11, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug-fix Bug fix to include in release notes

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

5 participants