rgw: fix consistency bug with OLH objects #51700
Conversation
force-pushed from 8f1880c to e43a991
force-pushed from e43a991 to b01b017
I still want to investigate and add test cases for the scenario where an object transitions from unversioned to versioned, because I suspect that these changes have the potential to cause problems there.
force-pushed from 60a2da6 to d34a36c

I added https://github.com/ceph/ceph/pull/51700/files#diff-fb30258ed43305d97a0fdeb02fdb91bc0e1613985e4dfbeab46381c4f1719bc4R6818-R6829 to handle that case properly and added a test case for it.
force-pushed from eadb206 to e44e12d
force-pushed from e44e12d to 8b0038c
force-pushed from 8b0038c to 534032d
the new "apply_olh_log should clear_olh before trimming olh log" commit looks great 👍
cbodley left a comment:
high-level feedback for the admin commands:
- this adds too much to rgw_admin.cc. please move the logic to driver/rados/rgw_bucket.* similar to check_index(). please create separate functions for "check olh" vs "check unlinked" instead of branching internally
- instead of just logging the number of matches, can we dump a json array with all of their keys? i believe that's how 'bucket check' works
- please consider using coroutines instead of threads. a coroutine function that handles a single shard would be much easier to read and maintain. if you want to limit how many shards we process at a time, spawn N coroutines that each process a different value of (shard % N)
src/rgw/rgw_admin.cc (outdated):
```cpp
  return;
}
RGWRados::BucketShard bs(store);
string marker = opt_cmd == OPT::BUCKET_CHECK_OLH ? "\x80" "1001_" : "\x80" "1000_";
```
ok. these strings are an implementation detail of cls_rgw, but i don't think the existing bi_list API gives us a way to list a specific index otherwise
Yeah I know. I don't feel great about that either, but it's a significant optimization to skip listing all of the irrelevant parts of the keyspace and I didn't think it would be a great idea to add new cls methods just for these specialty listing cases.
src/rgw/rgw_admin.cc (outdated):
```cpp
listable = false;
int ret;
do {
  ret = store->bi_list(bs, key.name, marker, 1000, &entries, &is_truncated);
```
consider using cls_rgw_bucket_list_op() instead, which should only be returning the plain entries. bi_list() would search each of the special namespaces too
I'm breaking out of the loop when it encounters a non-plain entry, so it won't actually iterate over the special namespaces. The issue with using cls_rgw_bucket_list_op is that it's susceptible to high latencies and errors when a bunch of these bogus entries are in the index (the ones that we're trying to clean up). That said, I acknowledge that I do need to do something different here because breaking out of the loop when I hit the first non-plain entry means that I'm not considering any plain entries that come after the special namespaces. I'll review this.
Ok I agree, I will refactor accordingly.
Yep that sounds good.
I did consider that initially, but since rgw_admin doesn't currently use coroutines or set up an asio io_context, I wasn't sure if it made sense to include in a bugfix PR. I'll give it a try and see what it looks like.

you can find an example in rgw_sync_checkpoint.cc
I'm questioning whether this is a good idea now that I've got around to implementing it. I'm currently outputting progress information via stderr logs and then the JSON output goes to stdout. When dumping all of the keys in a JSON array, the command output with no redirection ends up looking like this: That obviously isn't very nice for the user experience. Unfortunately, we have cases where there are millions of bogus entries per shard, so it isn't practical to buffer everything in memory and only print all of the keys to stdout at the end. Maybe I should add another cli flag so that a file path can be specified for output of the keys? What are your thoughts?

are the per-shard messages really necessary? to avoid excessive buffering, you could keep a counter and call
the merge of #50206 introduced several conflicts in rgw_rados.*. hopefully that will make it easier to get real

This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved

jenkins test make check

i've been unable to get successful builds for this in https://shaman.ceph.com/builds/ceph/pr-51700/, trying again..

testing with the

I'll look into this today. Tested briefly this morning and see that it is still passing for me with a vstart cluster.

rescheduled the rgw/verify suite with existing builds against your updated suite-branch in https://pulpito.ceph.com/cbodley-2023-07-12_19:51:14-rgw:verify-main-distro-default-smithi

same failures again
Signed-off-by: Cory Snyder <[email protected]>
force-pushed from 635295f to aa1f40e
jenkins test signed

lots of dead jobs due to lab issues. scheduled a --rerun in https://pulpito.ceph.com/cbodley-2023-07-17_14:52:30-rgw-main-distro-default-smithi

one run-reshard.sh failure from http://qa-proxy.ceph.com/teuthology/cbodley-2023-07-17_14:52:30-rgw-main-distro-default-smithi/7341683/teuthology.log:

i don't think that's due to anything in this pr, it was just an unlikely race between the and the reshard logging from rgw.ceph.client.0.log: i scheduled a re-rerun in https://pulpito.ceph.com/cbodley-2023-07-17_20:43:19-rgw-main-distro-default-smithi to see if it goes away

lab issues causing too many dead jobs. i'll try again tomorrow

down to one dead job in https://pulpito.ceph.com/cbodley-2023-07-17_21:21:12-rgw-main-distro-default-smithi/, and the failures all look unrelated. good enough for me!

@cfsnyder i closed https://tracker.ceph.com/issues/59663 as resolved, and moved https://tracker.ceph.com/issues/61359 to Pending Backport. could you please prepare the reef backport asap so i can schedule tests? separately, we'll follow up on the radosgw-admin command for cleanup. can you please create a new tracker issue for that part?

@cbodley thanks, yeah I'll work on the backport this morning

Added this tracker for the radosgw-admin commands: https://tracker.ceph.com/issues/62075
Relates to:
https://tracker.ceph.com/issues/61359
https://tracker.ceph.com/issues/59663