rgw: fix consistency bug with OLH objects #51700
Conversation
force-pushed from 8f1880c to e43a991
force-pushed from e43a991 to b01b017
I still want to investigate and add test cases for the scenario where an object transitions from unversioned to versioned, because I suspect that these changes have the potential to cause problems there.
force-pushed from 60a2da6 to d34a36c

I added https://github.com/ceph/ceph/pull/51700/files#diff-fb30258ed43305d97a0fdeb02fdb91bc0e1613985e4dfbeab46381c4f1719bc4R6818-R6829 to handle that case properly and added a test case for it.
force-pushed from eadb206 to e44e12d
force-pushed from e44e12d to 8b0038c
force-pushed from 8b0038c to 534032d
the new "apply_olh_log should clear_olh before trimming olh log" commit looks great 👍
cbodley left a comment:
high-level feedback for the admin commands:
- this adds too much to rgw_admin.cc. please move the logic to driver/rados/rgw_bucket.* similar to check_index(). please create separate functions for "check olh" vs "check unlinked" instead of branching internally
- instead of just logging the number of matches, can we dump a json array with all of their keys? i believe that's how 'bucket check' works
- please consider using coroutines instead of threads. a coroutine function that handles a single shard would be much easier to read and maintain. if you want to limit how many shards we process at a time, spawn N coroutines that each process a different value of (shard % N)
src/rgw/rgw_admin.cc (outdated):
```cpp
  return;
}
RGWRados::BucketShard bs(store);
string marker = opt_cmd == OPT::BUCKET_CHECK_OLH ? "\x80" "1001_" : "\x80" "1000_";
```
ok. these strings are an implementation detail of cls_rgw, but i don't think the existing bi_list API gives us a way to list a specific index otherwise
Yeah I know. I don't feel great about that either, but it's a significant optimization to skip listing all of the irrelevant parts of the keyspace and I didn't think it would be a great idea to add new cls methods just for these specialty listing cases.
src/rgw/rgw_admin.cc (outdated):
```cpp
listable = false;
int ret;
do {
  ret = store->bi_list(bs, key.name, marker, 1000, &entries, &is_truncated);
```
consider using cls_rgw_bucket_list_op() instead, which should only be returning the plain entries. bi_list() would search each of the special namespaces too
I'm breaking out of the loop when it encounters a non-plain entry, so it won't actually iterate over the special namespaces. The issue with using cls_rgw_bucket_list_op is that it's susceptible to high latencies and errors when a bunch of these bogus entries are in the index (the ones that we're trying to clean up). That said, I acknowledge that I do need to do something different here because breaking out of the loop when I hit the first non-plain entry means that I'm not considering any plain entries that come after the special namespaces. I'll review this.
Ok I agree, I will refactor accordingly.
Yep that sounds good.
I did consider that initially, but since rgw_admin doesn't currently use coroutines or set up an asio io_context, I wasn't sure if it made sense to include in a bugfix PR. I'll give it a try and see what it looks like.

you can find an example in rgw_sync_checkpoint.cc
I'm questioning whether this is a good idea now that I've got around to implementing it. I'm currently outputting progress information via stderr logs and then the JSON output goes to stdout. When dumping all of the keys in a JSON array, the command output with no redirection ends up looking like this: That obviously isn't very nice for the user experience. Unfortunately, we have cases where there are millions of bogus entries per shard, so it isn't practical to buffer everything in memory and only print all of the keys to stdout at the end. Maybe I should add another cli flag so that a file path can be specified for output of the keys? What are your thoughts?

are the per-shard messages really necessary? to avoid excessive buffering, you could keep a counter and call
the merge of #50206 introduced several conflicts in rgw_rados.*. hopefully that will make it easier to get real

This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved

jenkins test make check

i've been unable to get successful builds for this in https://shaman.ceph.com/builds/ceph/pr-51700/, trying again..

testing with the

I'll look into this today. Tested briefly this morning and see that it is still passing for me with a vstart cluster.

rescheduled the rgw/verify suite with existing builds against your updated suite-branch in https://pulpito.ceph.com/cbodley-2023-07-12_19:51:14-rgw:verify-main-distro-default-smithi

same failures again
Signed-off-by: Cory Snyder <[email protected]>
force-pushed from 635295f to aa1f40e
jenkins test signed

lots of dead jobs due to lab issues. scheduled a --rerun in https://pulpito.ceph.com/cbodley-2023-07-17_14:52:30-rgw-main-distro-default-smithi

one run-reshard.sh failure from http://qa-proxy.ceph.com/teuthology/cbodley-2023-07-17_14:52:30-rgw-main-distro-default-smithi/7341683/teuthology.log:

i don't think that's due to anything in this pr, it was just an unlikely race between the and the reshard logging from rgw.ceph.client.0.log: i scheduled a re-rerun in https://pulpito.ceph.com/cbodley-2023-07-17_20:43:19-rgw-main-distro-default-smithi to see if it goes away

lab issues causing too many dead jobs. i'll try again tomorrow

down to one dead job in https://pulpito.ceph.com/cbodley-2023-07-17_21:21:12-rgw-main-distro-default-smithi/, and the failures all look unrelated. good enough for me!

@cfsnyder i closed https://tracker.ceph.com/issues/59663 as resolved, and moved https://tracker.ceph.com/issues/61359 to Pending Backport. could you please prepare the reef backport asap so i can schedule tests? separately, we'll follow up on the radosgw-admin command for cleanup. can you please create a new tracker issue for that part?

@cbodley thanks, yeah I'll work on the backport this morning

Added this tracker for the radosgw-admin commands: https://tracker.ceph.com/issues/62075
Relates to:
https://tracker.ceph.com/issues/61359
https://tracker.ceph.com/issues/59663