rgw: fix consistency bug with OLH objects #51700

Merged: cbodley merged 8 commits into ceph:main from cfsnyder:wip-cfsnyder-put-404, Jul 18, 2023

Conversation

@cfsnyder cfsnyder commented May 23, 2023

Relates to:
https://tracker.ceph.com/issues/61359
https://tracker.ceph.com/issues/59663


cfsnyder requested a review from a team as a code owner, May 23, 2023 09:56
cfsnyder force-pushed the wip-cfsnyder-put-404 branch 3 times, most recently from 8f1880c to e43a991, May 26, 2023 08:36
cfsnyder force-pushed the wip-cfsnyder-put-404 branch from e43a991 to b01b017, May 26, 2023 19:19

cfsnyder commented Jun 1, 2023

I still want to investigate and add test cases for the scenario where an object transitions from unversioned to versioned, because I suspect that these changes have the potential to cause problems there

cfsnyder force-pushed the wip-cfsnyder-put-404 branch 2 times, most recently from 60a2da6 to d34a36c, June 2, 2023 10:03

cfsnyder commented Jun 2, 2023

> I still want to investigate and add test cases for the scenario where an object transitions from unversioned to versioned, because I suspect that these changes have the potential to cause problems there

I added https://github.com/ceph/ceph/pull/51700/files#diff-fb30258ed43305d97a0fdeb02fdb91bc0e1613985e4dfbeab46381c4f1719bc4R6818-R6829 to handle that case properly and added a test case for it

cfsnyder requested a review from cbodley, June 5, 2023 08:45
cfsnyder force-pushed the wip-cfsnyder-put-404 branch 3 times, most recently from eadb206 to e44e12d, June 5, 2023 10:42
cfsnyder force-pushed the wip-cfsnyder-put-404 branch from e44e12d to 8b0038c, June 6, 2023 20:31
cfsnyder force-pushed the wip-cfsnyder-put-404 branch from 8b0038c to 534032d, June 14, 2023 14:45

cbodley commented Jun 15, 2023

the new "apply_olh_log should clear_olh before trimming olh log" commit looks great 👍

cbodley left a review comment:

high-level feedback for the admin commands:

  • this adds too much to rgw_admin.cc. please move the logic to driver/rados/rgw_bucket.* similar to check_index(). please create separate functions for "check olh" vs "check unlinked" instead of branching internally
  • instead of just logging the number of matches, can we dump a json array with all of their keys? i believe that's how 'bucket check' works
  • please consider using coroutines instead of threads. a coroutine function that handles a single shard would be much easier to read and maintain. if you want to limit how many shards we process at a time, spawn N coroutines that each process a different value of (shard % N)

return;
}
RGWRados::BucketShard bs(store);
string marker = opt_cmd == OPT::BUCKET_CHECK_OLH ? "\x80" "1001_" : "\x80" "1000_";
cbodley:

ok. these strings are an implementation detail of cls_rgw, but i don't think the existing bi_list API gives us a way to list a specific index otherwise

cfsnyder:

Yeah I know. I don't feel great about that either, but it's a significant optimization to skip listing all of the irrelevant parts of the keyspace and I didn't think it would be a great idea to add new cls methods just for these specialty listing cases.
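The two marker literals in the snippet above rely on C++ adjacent-string-literal concatenation, so each marker is the raw byte 0x80 followed by an ASCII namespace tag. A small Python sketch (not Ceph code; the tags are treated as opaque prefixes) of why seeking to such a marker skips all the irrelevant plain keyspace: index keys compare as raw bytes, and 0x80 sorts after every 7-bit ASCII name.

```python
# Sketch (not Ceph code): why seeking to the "\x80..." marker skips plain keys.
# In C++, "\x80" "1001_" concatenates to the 6 bytes 0x80 '1' '0' '0' '1' '_'.
olh_marker = b"\x80" + b"1001_"       # BUCKET_CHECK_OLH branch of the ternary above
unlinked_marker = b"\x80" + b"1000_"  # the other branch

# Plain object names are ASCII/UTF-8 and never begin with byte 0x80, so every
# plain key sorts strictly before the special-namespace markers.
plain_keys = [b"a-object", b"zzz-last-object", b"~tilde"]
assert all(k < unlinked_marker for k in plain_keys)

# Within the special region, "1000_..." entries sort before "1001_..." ones.
assert unlinked_marker < olh_marker
```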

listable = false;
int ret;
do {
ret = store->bi_list(bs, key.name, marker, 1000, &entries, &is_truncated);
cbodley:

consider using cls_rgw_bucket_list_op() instead, which should only be returning the plain entries. bi_list() would search each of the special namespaces too

cfsnyder:

I'm breaking out of the loop when it encounters a non-plain entry, so it won't actually iterate over the special namespaces. The issue with using cls_rgw_bucket_list_op is that it's susceptible to high latencies and errors when a bunch of these bogus entries are in the index (the ones that we're trying to clean up). That said, I acknowledge that I do need to do something different here because breaking out of the loop when I hit the first non-plain entry means that I'm not considering any plain entries that come after the special namespaces. I'll review this.
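One possible shape for the fix being discussed can be sketched with a toy paged listing (a mock, not the real RGWRados::bi_list() signature): instead of breaking on the first 0x80-prefixed entry, advance the marker past that namespace so entries sorting after it are still visited.

```python
# Toy sorted index (hypothetical keys, not real cls_rgw encoding): plain keys,
# a special 0x80-prefixed namespace, and a key that sorts after that namespace.
NS_PREFIX = b"\x80"
INDEX = sorted([b"apple", b"pear",
                NS_PREFIX + b"1001_olh1", NS_PREFIX + b"1001_olh2",
                NS_PREFIX + b"\xffafter-ns"])

def mock_bi_list(marker, max_entries=2):
    """Return (entries, truncated): up to max_entries keys strictly after marker."""
    entries = [k for k in INDEX if k > marker][:max_entries]
    truncated = bool(entries) and entries[-1] != INDEX[-1]
    return entries, truncated

def list_skipping_namespace(ns=NS_PREFIX + b"1001_"):
    out, marker = [], b""
    while True:
        entries, truncated = mock_bi_list(marker)
        for e in entries:
            if e.startswith(ns):
                marker = ns + b"\xff"  # seek past the namespace instead of breaking
                break
            out.append(e)
            marker = e
        else:
            if not truncated:
                return out
```

With this toy index, `list_skipping_namespace()` returns the two plain keys plus the key sorting after the skipped namespace, rather than stopping at the first special entry.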

cfsnyder replied:

> high-level feedback for the admin commands:
>
> * this adds too much to rgw_admin.cc. please move the logic to driver/rados/rgw_bucket.* similar to check_index(). please create separate functions for "check olh" vs "check unlinked" instead of branching internally

Ok, I agree; I will refactor accordingly.

> * instead of just logging the number of matches, can we dump a json array with all of their keys? i believe that's how 'bucket check' works

Yep, that sounds good.

> * please consider using coroutines instead of threads. a coroutine function that handles a single shard would be much easier to read and maintain. if you want to limit how many shards we process at a time, spawn N coroutines that each process a different value of (shard % N)

I did consider that initially, but since rgw_admin doesn't currently use coroutines or set up an asio io_context, I wasn't sure it made sense to include in a bugfix PR. I'll give it a try and see what it looks like.
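The (shard % N) partition cbodley describes can be illustrated in a few lines (the shard and worker counts here are made up): worker i owns every shard s with s % N == i, so N concurrent workers cover all shards exactly once.

```python
NUM_SHARDS = 11  # hypothetical bucket index shard count
N = 3            # concurrency limit: N coroutines/workers

# worker i processes the shards congruent to i mod N
partitions = [[s for s in range(NUM_SHARDS) if s % N == i] for i in range(N)]

# every shard is owned by exactly one worker, none skipped or duplicated
owned = sorted(s for part in partitions for s in part)
assert owned == list(range(NUM_SHARDS))
```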


cbodley commented Jun 16, 2023

> I'll give it a try and see what it looks like.

you can find an example in rgw_sync_checkpoint.cc: rgw_bucket_sync_checkpoint(), used for BUCKET_SYNC_CHECKPOINT

cfsnyder replied:

@cbodley

> instead of just logging the number of matches, can we dump a json array with all of their keys? i believe that's how 'bucket check' works

I'm questioning whether this is a good idea now that I've got around to implementing it. I'm currently outputting progress information via stderr logs and then the JSON output goes to stdout. When dumping all of the keys in a JSON array, the command output with no redirection ends up looking like this:

2023-06-20T09:34:46.444+0000 7f4491b96f40  1 NOTICE: finished shard 0 (0 entries found)
2023-06-20T09:34:46.444+0000 7f4491b96f40  1 NOTICE: finished shard 1 (0 entries found)
2023-06-20T09:34:46.448+0000 7f4491b96f40  1 NOTICE: finished shard 2 (0 entries found)
2023-06-20T09:34:46.452+0000 7f4491b96f40  1 NOTICE: finished shard 3 (0 entries found)
2023-06-20T09:34:46.452+0000 7f4491b96f40  1 NOTICE: finished shard 4 (0 entries found)
[
    "d"2023-06-20T09:34:46.456+0000 7f4491b96f40  1 NOTICE: finished shard 5 (1 entries found)
2023-06-20T09:34:46.460+0000 7f4491b96f40  1 NOTICE: finished shard 6 (0 entries found)
2023-06-20T09:34:46.460+0000 7f4491b96f40  1 NOTICE: finished shard 7 (0 entries found)
,
    "c"2023-06-20T09:34:46.464+0000 7f4491b96f40  1 NOTICE: finished shard 8 (1 entries found)
2023-06-20T09:34:46.468+0000 7f4491b96f40  1 NOTICE: finished shard 9 (0 entries found)

]
2023-06-20T09:34:46.468+0000 7f4491b96f40  1 NOTICE: finished shard 10 (0 entries found)
2023-06-20T09:34:46.468+0000 7f4491b96f40  1 NOTICE: finished all shards (2 entries found)

That obviously isn't very nice for the user experience. Unfortunately, we have cases where there are millions of bogus entries per shard, so it isn't practical to buffer everything in memory and only print all of the keys to stdout at the end.

Maybe I should add another cli flag so that a file path can be specified for output of the keys? What are your thoughts?


cbodley commented Jun 21, 2023

> That obviously isn't very nice for the user experience. Unfortunately, we have cases where there are millions of bogus entries per shard, so it isn't practical to buffer everything in memory and only print all of the keys to stdout at the end.

are the per-shard messages really necessary? to avoid excessive buffering, you could keep a counter and call formatter->flush(std::cout) regularly
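cbodley's suggestion can be sketched as follows: keep progress messages on stderr, stream the JSON array to stdout incrementally, and flush every so many entries to bound memory. The helper and the flush interval below are hypothetical, not radosgw-admin code.

```python
import io
import json
import sys

FLUSH_EVERY = 1000  # hypothetical flush interval

def dump_keys(keys, out=sys.stdout, err=sys.stderr):
    """Stream a JSON array of keys to `out`, flushing periodically."""
    out.write("[")
    for i, key in enumerate(keys):
        if i:
            out.write(",")
        out.write("\n    " + json.dumps(key))
        if (i + 1) % FLUSH_EVERY == 0:
            out.flush()                                   # bound memory use
            print(f"dumped {i + 1} keys", file=err)       # progress stays on stderr
    out.write("\n]\n")
    out.flush()

# the streamed output is valid JSON, and stderr never interleaves with it
buf, errbuf = io.StringIO(), io.StringIO()
dump_keys(["d", "c"], out=buf, err=errbuf)
assert json.loads(buf.getvalue()) == ["d", "c"]
```

Because stdout and stderr are separate streams, a user can still redirect one away (e.g. send stderr to a file) and get clean JSON on the other.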


cbodley commented Jun 21, 2023

the merge of #50206 introduced several conflicts in rgw_rados.*. hopefully that will make it easier to get real optional_yields everywhere

github-actions bot:

This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved

cfsnyder requested a review from cbodley, July 4, 2023 10:44

cfsnyder commented Jul 5, 2023

jenkins test make check


cbodley commented Jul 7, 2023

i've been unable to get successful builds for this in https://shaman.ceph.com/builds/ceph/pr-51700/, trying again..

cbodley commented Jul 11, 2023

testing with the versioning.yaml rename is consistently failing here:

2023-07-11T18:57:35.658 INFO:tasks.workunit.client.0.smithi101.stdout:connected to http://localhost:80
2023-07-11T18:57:35.665 INFO:tasks.workunit.client.0.smithi101.stderr:Traceback (most recent call last):
2023-07-11T18:57:35.665 INFO:tasks.workunit.client.0.smithi101.stderr:  File "/home/ubuntu/cephtest/clone.client.0/qa/workunits/rgw/test_rgw_versioning.py", line 106, in <module>
2023-07-11T18:57:35.666 INFO:tasks.workunit.client.0.smithi101.stderr:    main()
2023-07-11T18:57:35.666 INFO:tasks.workunit.client.0.smithi101.stderr:  File "/home/ubuntu/cephtest/clone.client.0/qa/workunits/rgw/test_rgw_versioning.py", line 79, in main
2023-07-11T18:57:35.666 INFO:tasks.workunit.client.0.smithi101.stderr:    json_out = json.loads(out)
2023-07-11T18:57:35.666 INFO:tasks.workunit.client.0.smithi101.stderr:  File "/usr/lib64/python3.6/json/__init__.py", line 349, in loads
2023-07-11T18:57:35.667 INFO:tasks.workunit.client.0.smithi101.stderr:    s = s.decode(detect_encoding(s), 'surrogatepass')
2023-07-11T18:57:35.667 INFO:tasks.workunit.client.0.smithi101.stderr:UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 1954: invalid start byte
2023-07-11T18:57:35.705 DEBUG:teuthology.orchestra.run:got remote process result: 1

cfsnyder replied:

> testing with the versioning.yaml rename is consistently failing here: [log quoted above]
I'll look into this today. Tested briefly this morning and see that it is still passing for me with a vstart cluster.
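The failure above is reproducible in isolation: 0x80 is only valid as a UTF-8 continuation byte and can never begin a character, so if a raw index key carrying the \x80 namespace prefix leaks into the command output undecoded, json.loads() on the test side will raise exactly this error (a minimal sketch):

```python
# 0x80 can never start a UTF-8 sequence, matching the test's traceback:
# UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 ... invalid start byte
raised = False
try:
    b"\x80".decode("utf-8")
except UnicodeDecodeError as e:
    raised = True
    assert "invalid start byte" in str(e)
assert raised
```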


cbodley commented Jul 12, 2023

rescheduled the rgw/verify suite with existing builds against your updated suite-branch in https://pulpito.ceph.com/cbodley-2023-07-12_19:51:14-rgw:verify-main-distro-default-smithi


cbodley commented Jul 13, 2023

same failures again

cfsnyder force-pushed the wip-cfsnyder-put-404 branch from 635295f to aa1f40e, July 14, 2023 12:51
cfsnyder requested a review from cbodley, July 14, 2023 12:52
cfsnyder commented:

jenkins test signed

cbodley commented Jul 17, 2023

one run-reshard.sh failure from http://qa-proxy.ceph.com/teuthology/cbodley-2023-07-17_14:52:30-rgw-main-distro-default-smithi/7341683/teuthology.log:

2023-07-17T16:34:00.338 INFO:tasks.workunit.client.0.smithi105.stdout:connected to http://localhost:80
2023-07-17T16:34:00.338 INFO:tasks.workunit.client.0.smithi105.stderr:Traceback (most recent call last):
2023-07-17T16:34:00.338 INFO:tasks.workunit.client.0.smithi105.stderr:  File "/home/ubuntu/cephtest/clone.client.0/qa/workunits/rgw/test_rgw_reshard.py", line 288, in <module>
2023-07-17T16:34:00.339 INFO:tasks.workunit.client.0.smithi105.stderr:    main()
2023-07-17T16:34:00.339 INFO:tasks.workunit.client.0.smithi105.stderr:  File "/home/ubuntu/cephtest/clone.client.0/qa/workunits/rgw/test_rgw_reshard.py", line 166, in main
2023-07-17T16:34:00.339 INFO:tasks.workunit.client.0.smithi105.stderr:    log.debug('bucket name {}'.format(json_op[0]['bucket_name']))
2023-07-17T16:34:00.339 INFO:tasks.workunit.client.0.smithi105.stderr:IndexError: list index out of range
2023-07-17T16:34:00.411 DEBUG:teuthology.orchestra.run:got remote process result: 1
2023-07-17T16:34:00.411 INFO:tasks.workunit:Stopping ['rgw/run-reshard.sh'] on client.0...


cbodley commented Jul 17, 2023

i don't think that's due to anything in this pr, it was just an unlikely race between the reshard add/reshard list commands and radosgw's resharding thread. note the timestamps from teuthology.log:

2023-07-17T16:33:59.989 INFO:tasks.workunit.client.0.smithi105.stderr:radosgw-admin reshard add --bucket a-bucket --num-shards 2
2023-07-17T16:34:00.025 INFO:tasks.workunit.client.0.smithi105.stderr:ignoring --setuser ceph since I am not root
2023-07-17T16:34:00.025 INFO:tasks.workunit.client.0.smithi105.stderr:ignoring --setgroup ceph since I am not root
...
2023-07-17T16:34:00.153 INFO:tasks.workunit.client.0.smithi105.stderr:2023-07-17T16:34:00.130+0000 7f73b1615f40 20 remove_watcher() i=7
2023-07-17T16:34:00.165 INFO:tasks.workunit.client.0.smithi105.stderr:radosgw-admin reshard list

and the reshard logging from rgw.ceph.client.0.log:

2023-07-17T16:34:00.122+0000 7f845fc16700 20 rgw reshard worker thread: process_entry resharding a-bucket
...
2023-07-17T16:34:00.142+0000 7f845fc16700  1 rgw reshard worker thread: execute INFO: reshard of bucket "a-bucket" completed successfully

i scheduled a re-run in https://pulpito.ceph.com/cbodley-2023-07-17_20:43:19-rgw-main-distro-default-smithi to see if it goes away


cbodley commented Jul 17, 2023

lab issues causing too many dead jobs. i'll try again tomorrow

cbodley merged commit 21ad525 into ceph:main, Jul 18, 2023

cbodley commented Jul 18, 2023

@cfsnyder i closed https://tracker.ceph.com/issues/59663 as resolved, and moved https://tracker.ceph.com/issues/61359 to Pending Backport. could you please prepare the reef backport asap so i can schedule tests?

separately, we'll follow up on the radosgw-admin command for cleanup. can you please create a new tracker issue for that part?

cfsnyder replied:

@cbodley thanks, yeah I'll work on the backport this morning
