rgw: add radosgw-admin bucket check olh/unlinked commands#52576
cbodley
left a comment
very nicely done with the coroutines!
can you please share some example output so i can review the formatting?
what guidance should we provide for users that might be affected? do you think a release note would suffice?
f612d4c to 5e4ddc5
I'm not sure; what have you guys done for this sort of thing in the past? I could write something up to provide an explanation, but I'm not sure where it should live.
d197ecb to bcf9046
@ivancich could you give these changes some review, as well?
ivancich
left a comment
Looks really good. There's a lot of iterated I/O (e.g., check on listable), but there's really no obvious way around it.
I would love to see really nice comments above each of the functions added to rgw/driver/rados/rgw_bucket.cc explaining what they do and a little bit of how they do it.
Thank you!
@cfsnyder a recent
Thanks for the review, @ivancich! I've added some doc comments, let me know if you have any feedback.
@cbodley how about this: RGW: New tools have been added to radosgw-admin for identifying and correcting
Adds commands to radosgw-admin for checking for and fixing leftover entries in the bucket index (and associated RADOS objects).
Fixes: https://tracker.ceph.com/issues/62075
Signed-off-by: Cory Snyder <[email protected]>
… bugs Signed-off-by: Cory Snyder <[email protected]>
If a call to bucket_index_link_olh or bucket_index_unlink_instance fails, its associated pending xattr may have prevented the olh object from being removed by another thread. We should do a best effort cleanup attempt for this case by calling update_olh before returning an error to the caller.
Signed-off-by: Cory Snyder <[email protected]>
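The control flow this commit message describes can be sketched in isolation. This is a minimal, hypothetical model and not the RGW implementation: `with_best_effort_cleanup`, `index_op`, and `cleanup` are illustrative names, where `index_op` stands in for a call such as bucket_index_link_olh and `cleanup` stands in for update_olh. Only the error path is modelled. The point it shows is that the cleanup result is logged and discarded, so the caller still receives the original error.

```cpp
#include <cassert>
#include <functional>
#include <iostream>

// Hypothetical helper (not part of Ceph): runs an index operation and, if it
// fails, makes one best-effort cleanup attempt before returning the error.
// Return values follow the negative-errno convention: 0 on success, < 0 on error.
int with_best_effort_cleanup(const std::function<int()>& index_op,
                             const std::function<int()>& cleanup) {
  // Both callables must be set; calling an empty std::function would throw.
  assert(index_op && cleanup);

  int ret = index_op();
  if (ret < 0) {
    // Best effort: a failure here is only reported, never propagated,
    // so it cannot mask the original error code.
    int r = cleanup();
    if (r < 0) {
      std::cerr << "best-effort cleanup failed: r=" << r << std::endl;
    }
    return ret;
  }
  return 0;
}
```

A caller would pass the failing operation and the cleanup step as lambdas. If the operation returns -5 and the cleanup returns -2, the helper returns -5.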
FYI, I added another small commit to handle another possible scenario where olh objects get left behind when concurrent requests are interleaved in a particular way.
The failures look unrelated to me, but I'll let you double-check the results, @cbodley.
…--check-objects flag
Printing all index entries can be very time consuming for large buckets and the inability to switch this behavior off makes it cumbersome to use the command for fixing bucket stats. This was also preventing the command from outputting recalculated bucket stats when the --fix flag wasn't specified.
Signed-off-by: Cory Snyder <[email protected]>
Actually, I added one more tiny change in a new commit. I don't think it should require a new QA run. Based upon the error message here, it looks like this was the intended use of the --check-objects flag anyhow, so this may have been a regression.
// it's possible that the pending xattr from this op prevented the olh
// object from being cleaned by another thread that was deleting the last
// existing version. We invoke a best-effort update_olh here to handle this case.
int r = update_olh(dpp, obj_ctx, state, bucket_info, olh_obj, y);
@cfsnyder unfortunately, that commit broke the ragweed tests which use
@cbodley, I noticed that at EOD on Friday while testing a Pacific backport of these changes. What I found was that this admin API doesn't return valid JSON. It coincidentally returned valid JSON previously, given the right combination of query params and the bug that was fixed here. I just added a commit in #53607 to fix the output format of the API. I'll look at updating the Ragweed tests accordingly this morning, but I'm wondering whether you know of any other clients that may need to be updated to account for this fix?