Skip to content

Comments

rgw: add radosgw-admin bucket check olh/unlinked commands#52576

Merged
cbodley merged 5 commits intoceph:mainfrom
cfsnyder:wip-62075
Sep 22, 2023
Merged

rgw: add radosgw-admin bucket check olh/unlinked commands#52576
cbodley merged 5 commits intoceph:mainfrom
cfsnyder:wip-62075

Conversation

@cfsnyder
Copy link
Contributor

@cfsnyder cfsnyder commented Jul 21, 2023

Adds commands to radosgw-admin for checking for and fixing leftover entries in the bucket index (and associated RADOS objects).

Fixes: https://tracker.ceph.com/issues/62075
Signed-off-by: Cory Snyder [email protected]

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)
Show available Jenkins commands
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test dashboard cephadm
  • jenkins test api
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox
  • jenkins test windows

@cfsnyder cfsnyder requested a review from a team as a code owner July 21, 2023 15:49
@cfsnyder cfsnyder requested a review from cbodley July 21, 2023 15:58
Copy link
Contributor

@cbodley cbodley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

very nicely done with the coroutines!

can you please share some example output so i can review the formatting?

@cbodley
Copy link
Contributor

cbodley commented Aug 2, 2023

what guidance should we provide for users that might be affected? do you think a release note would suffice?

@cfsnyder
Copy link
Contributor Author

cfsnyder commented Sep 1, 2023

very nicely done with the coroutines!

can you please share some example output so i can review the formatting?

radosgw-admin bucket check olh --bucket check-bucket --fix
2023-09-01T09:10:56.191+0000 7f77b1ab9f40 -1 WARNING: all dangerous and experimental features are enabled.
2023-09-01T09:10:56.199+0000 7f77b1ab9f40 -1 WARNING: all dangerous and experimental features are enabled.
2023-09-01T09:10:56.199+0000 7f77b1ab9f40 -1 WARNING: all dangerous and experimental features are enabled.
2023-09-01T09:10:56.399+0000 7f77b1ab9f40  1 NOTICE: finished shard 0 (0 entries removed)
2023-09-01T09:10:56.399+0000 7f77b1ab9f40  1 NOTICE: finished shard 1 (0 entries removed)
2023-09-01T09:10:56.399+0000 7f77b1ab9f40  1 NOTICE: finished shard 2 (0 entries removed)
2023-09-01T09:10:56.403+0000 7f77b1ab9f40  1 NOTICE: finished shard 3 (0 entries removed)
2023-09-01T09:10:56.403+0000 7f77b1ab9f40  1 NOTICE: finished shard 4 (0 entries removed)
2023-09-01T09:10:56.407+0000 7f77b1ab9f40  1 NOTICE: finished shard 6 (0 entries removed)
2023-09-01T09:10:56.411+0000 7f77b1ab9f40  1 NOTICE: finished shard 7 (0 entries removed)
2023-09-01T09:10:56.411+0000 7f77b1ab9f40  1 NOTICE: finished shard 9 (0 entries removed)
2023-09-01T09:10:56.415+0000 7f77b1ab9f40  1 NOTICE: finished shard 10 (0 entries removed)
2023-09-01T09:10:56.451+0000 7f77b1ab9f40  1 NOTICE: finished shard 5 (1 entries removed)
2023-09-01T09:10:56.455+0000 7f77b1ab9f40  1 NOTICE: finished shard 8 (1 entries removed)
2023-09-01T09:10:56.455+0000 7f77b1ab9f40  1 NOTICE: finished all shards (2 entries removed)
radosgw-admin bucket check olh --bucket check-bucket
2023-09-01T09:10:55.611+0000 7f05c8fa0f40 -1 WARNING: all dangerous and experimental features are enabled.
2023-09-01T09:10:55.619+0000 7f05c8fa0f40 -1 WARNING: all dangerous and experimental features are enabled.
2023-09-01T09:10:55.619+0000 7f05c8fa0f40 -1 WARNING: all dangerous and experimental features are enabled.
2023-09-01T09:10:55.811+0000 7f05c8fa0f40  1 NOTICE: finished shard 0 (0 entries found)
2023-09-01T09:10:55.815+0000 7f05c8fa0f40  1 NOTICE: finished shard 1 (0 entries found)
2023-09-01T09:10:55.815+0000 7f05c8fa0f40  1 NOTICE: finished shard 2 (0 entries found)
2023-09-01T09:10:55.815+0000 7f05c8fa0f40  1 NOTICE: finished shard 3 (0 entries found)
2023-09-01T09:10:55.819+0000 7f05c8fa0f40  1 NOTICE: finished shard 4 (0 entries found)
2023-09-01T09:10:55.819+0000 7f05c8fa0f40  1 NOTICE: finished shard 5 (1 entries found)
2023-09-01T09:10:55.819+0000 7f05c8fa0f40  1 NOTICE: finished shard 6 (0 entries found)
2023-09-01T09:10:55.823+0000 7f05c8fa0f40  1 NOTICE: finished shard 7 (0 entries found)
2023-09-01T09:10:55.823+0000 7f05c8fa0f40  1 NOTICE: finished shard 8 (1 entries found)
2023-09-01T09:10:55.827+0000 7f05c8fa0f40  1 NOTICE: finished shard 9 (0 entries found)
2023-09-01T09:10:55.827+0000 7f05c8fa0f40  1 NOTICE: finished shard 10 (0 entries found)
2023-09-01T09:10:55.827+0000 7f05c8fa0f40  1 NOTICE: finished all shards (2 entries found)
radosgw-admin bucket check olh --bucket check-bucket --dump-keys --hide-progress
2023-09-01T09:14:54.523+0000 7fcdc6e0af40 -1 WARNING: all dangerous and experimental features are enabled.
2023-09-01T09:14:54.531+0000 7fcdc6e0af40 -1 WARNING: all dangerous and experimental features are enabled.
2023-09-01T09:14:54.531+0000 7fcdc6e0af40 -1 WARNING: all dangerous and experimental features are enabled.
[
    "d",
    "c"
]
radosgw-admin bucket check unlinked --bucket check-bucket --min-age-hours 0
2023-09-01T09:16:29.506+0000 7f74b2919f40 -1 WARNING: all dangerous and experimental features are enabled.
2023-09-01T09:16:29.510+0000 7f74b2919f40 -1 WARNING: all dangerous and experimental features are enabled.
2023-09-01T09:16:29.514+0000 7f74b2919f40 -1 WARNING: all dangerous and experimental features are enabled.
2023-09-01T09:16:29.698+0000 7f74b2919f40  1 NOTICE: finished shard 0 (0 entries found)
2023-09-01T09:16:29.706+0000 7f74b2919f40  1 NOTICE: finished shard 2 (0 entries found)
2023-09-01T09:16:29.706+0000 7f74b2919f40  1 NOTICE: finished shard 3 (0 entries found)
2023-09-01T09:16:29.706+0000 7f74b2919f40  1 NOTICE: finished shard 4 (0 entries found)
2023-09-01T09:16:29.714+0000 7f74b2919f40  1 NOTICE: finished shard 6 (0 entries found)
2023-09-01T09:16:29.714+0000 7f74b2919f40  1 NOTICE: finished shard 7 (0 entries found)
2023-09-01T09:16:29.718+0000 7f74b2919f40  1 NOTICE: finished shard 9 (0 entries found)
2023-09-01T09:16:29.718+0000 7f74b2919f40  1 NOTICE: finished shard 1 (1 entries found)
2023-09-01T09:16:29.718+0000 7f74b2919f40  1 NOTICE: finished shard 5 (1 entries found)
2023-09-01T09:16:29.722+0000 7f74b2919f40  1 NOTICE: finished shard 8 (1 entries found)
2023-09-01T09:16:29.722+0000 7f74b2919f40  1 NOTICE: finished shard 10 (1 entries found)
2023-09-01T09:16:29.722+0000 7f74b2919f40  1 NOTICE: finished all shards (4 entries found)
radosgw-admin bucket check unlinked --bucket check-bucket --min-age-hours 0 --dump-keys --hide-progress
2023-09-01T09:17:15.511+0000 7f03e0f56f40 -1 WARNING: all dangerous and experimental features are enabled.
2023-09-01T09:17:15.519+0000 7f03e0f56f40 -1 WARNING: all dangerous and experimental features are enabled.
2023-09-01T09:17:15.519+0000 7f03e0f56f40 -1 WARNING: all dangerous and experimental features are enabled.
[
    {
        "name": "e",
        "instance": "ojyZwRsIalwzxD70bhPvXk8r7K4huNz"
    },
    {
        "name": "d",
        "instance": "Z5X.BaZV.ojkThuajQpB98mjzZpocCx"
    },
    {
        "name": "c",
        "instance": "KGw9c1D-FfFh2mLT-76wzx1qyqCz0fm"
    },
    {
        "name": "f",
        "instance": "lbJGb.boc7p2VfBq9gW7ISWg3NbCIzT"
    }
]

@cfsnyder cfsnyder force-pushed the wip-62075 branch 3 times, most recently from f612d4c to 5e4ddc5 Compare September 1, 2023 09:23
@cfsnyder
Copy link
Contributor Author

cfsnyder commented Sep 1, 2023

what guidance should we provide for users that might be affected? do you think a release note would suffice?

I'm not sure, what have you guys done for this sort of thing in the past? I could write something up to provide an explanation but not sure where it should live?

@cfsnyder cfsnyder requested a review from cbodley September 1, 2023 09:27
@cfsnyder cfsnyder force-pushed the wip-62075 branch 2 times, most recently from d197ecb to bcf9046 Compare September 11, 2023 19:41
@mattbenjamin
Copy link
Contributor

@ivancich could you give these changes some review, as well?

Copy link
Member

@ivancich ivancich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks really good. There's a lot of iterated I/O (e.g., check on listable), but there's really no obvious way around it.

I would love to see really nice comments above each of the functions added to rgw/driver/rados/rgw_bucket.cc explaining what they do and a little bit of how they do it.

Thank you!

@cbodley
Copy link
Contributor

cbodley commented Sep 12, 2023

what guidance should we provide for users that might be affected? do you think a release note would suffice?

I'm not sure, what have you guys done for this sort of thing in the past? I could write something up to provide an explanation but not sure where it should live?

@cfsnyder a recent PendingReleaseNotes example from #52248:

RGW: S3 multipart uploads using Server-Side Encryption now replicate correctly in
multi-site. Previously, the replicas of such objects were corrupted on decryption.
A new tool, radosgw-admin bucket resync encrypted multipart, can be used to
identify these original multipart uploads. The LastModified timestamp of any
identified object is incremented by 1ns to cause peer zones to replicate it again.
For multi-site deployments that make any use of Server-Side Encryption, we
recommended running this command against every bucket in every zone after all
zones have upgraded.

@cfsnyder
Copy link
Contributor Author

Looks really good. There's a lot of iterated I/O (e.g., check on listable), but there's really no obvious way around it.

I would love to see really nice comments above each of the functions added to rgw/driver/rados/rgw_bucket.cc explaining what they do and a little bit of how they do it.

Thank you!

Thanks for the review, @ivancich! I've added some doc comments, let me know if you have any feedback.

@cfsnyder
Copy link
Contributor Author

what guidance should we provide for users that might be affected? do you think a release note would suffice?

I'm not sure, what have you guys done for this sort of thing in the past? I could write something up to provide an explanation but not sure where it should live?

@cfsnyder a recent PendingReleaseNotes example from #52248:

RGW: S3 multipart uploads using Server-Side Encryption now replicate correctly in
multi-site. Previously, the replicas of such objects were corrupted on decryption.
A new tool, radosgw-admin bucket resync encrypted multipart, can be used to
identify these original multipart uploads. The LastModified timestamp of any
identified object is incremented by 1ns to cause peer zones to replicate it again.
For multi-site deployments that make any use of Server-Side Encryption, we
recommended running this command against every bucket in every zone after all
zones have upgraded.

@cbodley how about this:

RGW: New tools have been added to radosgw-admin for identifying and correcting
issues with versioned bucket indexes. Historical bugs with the versioned bucket
index transaction workflow made it possible for the index to accumulate
extraneous "book-keeping" olh entries and plain placeholder entries. In some
specific scenarios where clients made concurrent requests referencing the same
object key, it was likely that a lot of extra index entries would accumulate. When
a significant number of these entries are present in a single bucket index shard,
they can cause high bucket listing latencies and lifecycle processing failures. To
check whether a versioned bucket has unnecessary olh entries, users can
now run radosgw-admin bucket check olh. If the --fix flag is used, the
extra entries will be safely removed. A distinct issue from the one described thus
far, it is also possible that some versioned buckets are maintaining extra unlinked
objects that are not listable from the S3/Swift APIs. These extra objects are
typically a result of PUT requests that exited abnormally, in the middle of a
bucket index transaction - so the client would not have received a successful
response. Bugs in prior releases made these unlinked objects easy to
reproduce with any PUT request that was made on a bucket that was actively
resharding. Besides the extra space that these hidden, unlinked objects consume,
there can be another side effect in certain scenarios, caused by the nature of the
failure mode that produced them, where a client of a bucket that was a victim
of this bug may find the object associated with the key to be in an inconsistent
state. To check whether a versioned bucket has unlinked entries, users can
now run radosgw-admin bucket check unlinked. If the --fix flag is used, the
unlinked objects will be safely removed. Finally, a third issue made it possible
for versioned bucket index stats to be accounted inaccurately. The tooling
for recalculating versioned bucket stats also had a bug, and was not previously
capable of fixing these inaccuracies. This release resolves those issues and
users can now expect that the existing radosgw-admin bucket check command
will produce correct results. We recommend that users with versioned buckets,
especially those that existed on prior releases, use these new tools to check
whether their buckets are affected and to clean them up accordingly.

@cbodley
Copy link
Contributor

cbodley commented Sep 15, 2023

@cbodley how about this:

@cfsnyder that looks great. can you please add that to the PendingReleaseNotes file?

If a call to bucket_index_link_olh or bucket_index_unlink_instance
fails, its associated pending xattr may have prevented the olh object
from being removed by another thread. We should do a best effort
cleanup attempt for this case by calling update_olh before returning
an error to the caller.

Signed-off-by: Cory Snyder <[email protected]>
@cfsnyder
Copy link
Contributor Author

FYI added another small commit to handle another possible scenario where olh objects get left behind when concurrent requests are interleaved in a particular way

@cfsnyder
Copy link
Contributor Author

Looks like failures are unrelated to me, but I'll let you double check the results @cbodley .

…--check-objects flag

Printing all index entries can be very time consuming for large
buckets and the inability to switch this behavior off makes it
cumbersome to use the command for fixing bucket stats. This was
also preventing the command from outputting recalculated bucket
stats when the --fix flag wasn't specified.

Signed-off-by: Cory Snyder <[email protected]>
@cfsnyder
Copy link
Contributor Author

Actually added one more tiny change in a new commit. I don't think it should require a new QA run.

radosgw-admin bucket check should only print index entries with --check-objects flag

Printing all index entries can be very time consuming for large
buckets and the inability to switch this behavior off makes it
cumbersome to use the command for fixing bucket stats. This was
also preventing the command from outputting recalculated bucket
stats when the --fix flag wasn't specified.

Based upon the error message here, it looks like this was the intended use of the --check-objects flag anyhow, so this may have been a regression

Comment on lines +8105 to +8108
// it's possible that the pending xattr from this op prevented the olh
// object from being cleaned by another thread that was deleting the last
// existing version. We invoke a best-effort update_olh here to handle this case.
int r = update_olh(dpp, obj_ctx, state, bucket_info, olh_obj, y);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@cbodley cbodley merged commit 9b7cd7f into ceph:main Sep 22, 2023
@cbodley
Copy link
Contributor

cbodley commented Sep 23, 2023

Actually added one more tiny change in a new commit. I don't think it should require a new QA run.

@cfsnyder unfortunately, that commit broke the ragweed tests which use RGWOp_Check_Bucket_Index (the admin API version of 'bucket check'): https://tracker.ceph.com/issues/62956

@cfsnyder
Copy link
Contributor Author

@cbodley , I noticed that at the EOD on Friday while testing a Pacific backport of these changes. What I found was that this admin API doesn't return valid JSON. It coincidentally did return valid JSON previously, with the right combination of query params and due to the bug that was fixed here. I just added a commit here #53607 to fix the output format of the API. I'll look at updating the Ragweed tests accordingly this morning, but wondering if you know of any other clients that may need to be updated to account for this fix?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants