radosgw-admin: add command to resync encrypted multipart objects#51842

Closed
cbodley wants to merge 3 commits intoceph:mainfrom
cbodley:wip-rgw-resync-encrypted-multipart

@cbodley
Contributor

@cbodley cbodley commented May 30, 2023

a recovery command for encrypted multipart objects that were corrupted on replication by https://tracker.ceph.com/issues/46062. this depends on a separate fix that makes the replication logic remember the source object's part boundaries for decryption

the radosgw-admin bucket resync encrypted multipart command lists all of the object versions in a given bucket and checks each head object's xattrs for encryption and a multipart manifest. if both are found, it rewrites the head object with an updated mtime so that other zones will resync the data
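a minimal sketch of the selection logic described above, written in Python rather than the C++ of the actual change. The `HeadObject` class, its `encrypted` and `multipart` flags, and the 1ns bump are illustrative assumptions; the real command reads head object xattrs and is not shown in this thread.

```python
from dataclasses import dataclass


@dataclass
class HeadObject:
    """Hypothetical stand-in for the head object of one object version."""
    name: str
    version: str
    mtime_ns: int
    # Assumed flags; the real command derives these from head object xattrs.
    encrypted: bool = False
    multipart: bool = False


def needs_resync(obj):
    """Only objects that are both encrypted and multipart qualify."""
    return obj.encrypted and obj.multipart


def resync_encrypted_multipart(versions):
    """Bump the mtime of each qualifying version and return those versions."""
    modified = []
    for obj in versions:
        if needs_resync(obj):
            # Assumption: any strictly newer mtime is enough to make
            # other zones fetch the object again.
            obj.mtime_ns += 1
            modified.append(obj)
    return modified


versions = [
    HeadObject("6m", "v1", 1_000, encrypted=True, multipart=True),
    HeadObject("small", "v1", 2_000, encrypted=True, multipart=False),
    HeadObject("plain", "v1", 3_000, encrypted=False, multipart=True),
]
modified = resync_encrypted_multipart(versions)
print([obj.name for obj in modified])
```

objects that are encrypted but not multipart, or multipart but not encrypted, are left untouched, so their mtime does not change and they would not be rescheduled.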

TODO:

  • clean up json format
  • test with versioned buckets

@github-actions github-actions bot added the rgw label May 30, 2023
@cbodley cbodley requested a review from mdw-at-linuxbox May 30, 2023 18:01
@mattbenjamin
Contributor

@cbodley I assume there is logic to detect that the current site is the originating location of each object?

@cbodley
Contributor Author

cbodley commented May 30, 2023

@cbodley I assume there is logic to detect that the current site is the originating location of each object?

@mattbenjamin the check for a multipart manifest should cover that, since the replicas aren't multipart. hopefully @mdw-at-linuxbox's fix doesn't change that, but we'll see

@cbodley cbodley force-pushed the wip-rgw-resync-encrypted-multipart branch 3 times, most recently from 1254441 to 9fd0a4d Compare May 30, 2023 20:09
@cbodley
Contributor Author

cbodley commented May 30, 2023

tested in a 2-zone multisite setup with default encryption support from #51786

after creating a testbucket and uploading a 6m multipart object, i ran the command once on the source zone:

$ bin/radosgw-admin -c run/c1/ceph.conf bucket resync encrypted multipart --bucket testbucket
{
    "bucket": "testbucket",
    "bucket_id": "550f5338-009b-4020-b8d7-7be0d7d7da98.4190.1",
    "progress": [
        {
            "name": "6m",
            "mtime": "2023-05-30T20:03:16.621151Z"
        },
        {
            "processed": 1,
            "marker": "6m"
        }
    ]
}

and verified that the secondary zone re-fetched the object:

2023-05-30T16:05:03.643-0400 7fb7ce7886c0 20 HTTP_IF_MODIFIED_SINCE=2023-05-30T20:03:16.621151420Z
2023-05-30T16:05:03.652-0400 7fb7c176e6c0  2 req 2113387870146149013 0.009000324s s3:get_obj http status=200
2023-05-30T16:05:03.652-0400 7fb7c176e6c0  1 ====== req done req=0x7fb6e5db6710 op status=0 http_status=200 latency=0.009000324s ======

the request returns the expected 200 OK instead of 304 Not Modified, because the source mtime was 1ns higher than the destination's mtime
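the 200-versus-304 outcome can be sketched as a nanosecond-precision comparison. This is an illustration only: `parse_ns` and `conditional_get_status` are hypothetical helpers, not radosgw functions, and the source mtime ending in `...421` is inferred from the logged `HTTP_IF_MODIFIED_SINCE` values rather than stated directly.

```python
from datetime import datetime, timezone


def parse_ns(ts):
    """Convert 'YYYY-MM-DDTHH:MM:SS[.fraction]Z' to nanoseconds since the epoch."""
    head, _, frac = ts.rstrip("Z").partition(".")
    whole = datetime.strptime(head, "%Y-%m-%dT%H:%M:%S").replace(tzinfo=timezone.utc)
    # Pad or truncate the fractional part to exactly nine digits.
    nanos = int(frac.ljust(9, "0")[:9]) if frac else 0
    return int(whole.timestamp()) * 1_000_000_000 + nanos


def conditional_get_status(source_mtime, if_modified_since):
    """Return 200 if the source is strictly newer, otherwise 304."""
    if parse_ns(source_mtime) > parse_ns(if_modified_since):
        return 200
    return 304


destination = "2023-05-30T20:03:16.621151420Z"  # value logged by the secondary zone
source = "2023-05-30T20:03:16.621151421Z"       # assumed: destination mtime plus 1ns

print(conditional_get_status(source, destination))
print(conditional_get_status(destination, destination))
```

the JSON output prints mtime with microsecond precision, which would explain why a 1ns change is visible in the request logs but not in the command output.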

i then ran the command again:


$ bin/radosgw-admin -c run/c1/ceph.conf bucket resync encrypted multipart --bucket testbucket
{
    "bucket": "testbucket",
    "bucket_id": "550f5338-009b-4020-b8d7-7be0d7d7da98.4190.1",
    "progress": [
        {
            "name": "6m",
            "mtime": "2023-05-30T20:03:16.621151Z"
        },
        {
            "processed": 1,
            "marker": "6m"
        }
    ]
}

and saw the remote zone fetch it again successfully:

2023-05-30T16:06:23.646-0400 7fb77e6e86c0 20 HTTP_IF_MODIFIED_SINCE=2023-05-30T20:03:16.621151421Z
2023-05-30T16:06:23.654-0400 7fb7736d26c0  2 req 5608249310521457762 0.008000289s s3:get_obj http status=200
2023-05-30T16:06:23.654-0400 7fb7736d26c0  1 ====== req done req=0x7fb6e5db6710 op status=0 http_status=200 latency=0.008000289s ======

@cbodley cbodley force-pushed the wip-rgw-resync-encrypted-multipart branch 2 times, most recently from b591656 to bd92e2b Compare May 30, 2023 21:24
@cbodley
Contributor Author

cbodley commented Jun 1, 2023

verified that it works for versioned objects, and that set_attrs() updates the existing version without creating a new one

updated json format:

{
    "bucket": "testbucket",
    "bucket_id": "d5f7d998-2bbd-4336-a7bd-91a387238d21.4190.1",
    "progress": [
        {
            "modified": [
                {
                    "name": "7m",
                    "version": "XSY3vYkehzdtAcrRQFRsTyoTyc3pkqQ",
                    "mtime": "2023-05-30T20:44:03.708808Z"
                },
                {
                    "name": "6m",
                    "version": "VZaeMU6GdgbJxSPRhwyDSgT5EMA1na.",
                    "mtime": "2023-05-30T20:36:48.161519Z"
                },
                {
                    "name": "6m",
                    "version": "-jRwboscFog-0-xYmd.GNYdwSM08bNx",
                    "mtime": "2023-05-30T20:36:47.041855Z"
                }
            ],
            "total processed": 3,
            "marker": "6m"
        }
    ]
}
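for anyone scripting around this output, a short Python sketch that summarizes it. The field names come from the JSON above; treating "total processed" as a running total and keeping the largest value is an assumption, and the format may still change.

```python
import json
from collections import Counter

# Trimmed copy of the output shown above.
report_text = """
{
    "bucket": "testbucket",
    "bucket_id": "d5f7d998-2bbd-4336-a7bd-91a387238d21.4190.1",
    "progress": [
        {
            "modified": [
                {"name": "7m", "version": "XSY3vYkehzdtAcrRQFRsTyoTyc3pkqQ",
                 "mtime": "2023-05-30T20:44:03.708808Z"},
                {"name": "6m", "version": "VZaeMU6GdgbJxSPRhwyDSgT5EMA1na.",
                 "mtime": "2023-05-30T20:36:48.161519Z"},
                {"name": "6m", "version": "-jRwboscFog-0-xYmd.GNYdwSM08bNx",
                 "mtime": "2023-05-30T20:36:47.041855Z"}
            ],
            "total processed": 3,
            "marker": "6m"
        }
    ]
}
"""


def summarize(report):
    """Count modified versions per object name and find the processed total."""
    per_object = Counter()
    total = 0
    for batch in report["progress"]:
        for entry in batch.get("modified", []):
            per_object[entry["name"]] += 1
        # Assumption: "total processed" is cumulative across batches.
        total = max(total, batch.get("total processed", 0))
    return per_object, total


per_object, total = summarize(json.loads(report_text))
print(dict(per_object), total)
```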

@zap51
Contributor

zap51 commented Jun 18, 2023

Hello Devs,
I'm assuming it might take a while for this to be merged, but please let me know if the question at https://lists.ceph.io/hyperkitty/list/[email protected]/thread/PDDZB6XFB5M2JTDXQPX6WLYLGYUYDOOI/ can be achieved.

Thanks!

@cbodley cbodley mentioned this pull request Jun 29, 2023
14 tasks
@cbodley
Contributor Author

cbodley commented Jun 29, 2023

after reproducing the checksum mismatch error with the same steps from https://tracker.ceph.com/issues/46062#note-6, i upgraded radosgws with the draft fix in #52248, and verified that this tool reschedules its replication, and that replication corrects the md5 mismatch

object on secondary zone is corrupted:

~/ceph/build $ s3cmd -c ../work/c2.s3cfg get s3://testbucket/6m 6m.c2
download: 's3://testbucket/6m' -> '6m.c2'  [1 of 1]    
 6291456 of 6291456   100% in    0s   176.78 MB/s  done
WARNING: MD5 signatures do not match: computed=4101694a589baca05b76afb00a53206c, received=cca06bdd97b45abef3ac0f28f182ab69

repair tool on primary zone detects the encrypted multipart object:

~/ceph/build $ bin/radosgw-admin -c run/c1/ceph.conf bucket resync encrypted multipart --bucket testbucket
{
    "bucket": "testbucket",
    "bucket_id": "58fd8dfe-4738-4885-bede-f684962fe59f.4190.1",
    "progress": [
        {
            "modified": [
                {
                    "name": "6m",
                    "mtime": "2023-06-29T17:06:53.860185Z"
                }
            ],
            "total processed": 2,
            "marker": "6m"
        }
    ]
}

the re-replicated object has correct checksum:

~/ceph/build $ s3cmd -c ../work/c2.s3cfg get s3://testbucket/6m 6m.c2
download: 's3://testbucket/6m' -> '6m.c2'  [1 of 1]
 6291456 of 6291456   100% in    0s   128.03 MB/s  done
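an independent way to confirm the repair is to download the object from both zones and compare digests locally. The Python sketch below assumes two downloaded copies; the name `6m.c1` for the primary zone's copy is hypothetical, and temporary files stand in for real downloads so the example is self-contained.

```python
import hashlib
import os
import tempfile


def file_md5(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    primary = os.path.join(tmp, "6m.c1")    # hypothetical copy from the primary zone
    repaired = os.path.join(tmp, "6m.c2")   # copy from the secondary zone after resync
    corrupted = os.path.join(tmp, "6m.bad") # stand-in for the corrupted replica

    payload = b"x" * 4096
    for path, data in ((primary, payload), (repaired, payload),
                       (corrupted, payload[:-1] + b"y")):
        with open(path, "wb") as f:
            f.write(data)

    repaired_matches = file_md5(primary) == file_md5(repaired)
    corrupted_matches = file_md5(primary) == file_md5(corrupted)

print(repaired_matches, corrupted_matches)
```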

@github-actions

github-actions bot commented Jul 3, 2023

This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved

@cbodley cbodley force-pushed the wip-rgw-resync-encrypted-multipart branch from bd92e2b to b8942dc Compare July 3, 2023 18:33
@cbodley cbodley marked this pull request as ready for review July 31, 2023 13:39
@cbodley cbodley requested a review from a team as a code owner July 31, 2023 13:39
@cbodley
Contributor Author

cbodley commented Aug 2, 2023

jenkins test make check

@cbodley
Contributor Author

cbodley commented Aug 2, 2023

jenkins test api

@cbodley
Contributor Author

cbodley commented Aug 3, 2023

@cbodley
Contributor Author

cbodley commented Aug 3, 2023

jenkins test api

@cbodley
Contributor Author

cbodley commented Aug 3, 2023

jenkins build docs

@cbodley
Contributor Author

cbodley commented Aug 3, 2023

jenkins test docs

@cbodley
Contributor Author

cbodley commented Aug 3, 2023

moved commits into #52248 to simplify backports

@cbodley cbodley closed this Aug 3, 2023