qa/multisite: enable two zonegroup yaml#60172

Merged
smanjara merged 1 commit into ceph:main from smanjara:rgw-multisite-two-zonegroups
Dec 3, 2024

@smanjara smanjara commented Oct 7, 2024


@adamemerson
Contributor

Is this something to merge or are you testing something?

@smanjara
Contributor Author

> Is this something to merge or are you testing something?

oh just testing. adding a DNM

@smanjara smanjara added the DNM label Oct 12, 2024
@adamemerson adamemerson marked this pull request as draft October 12, 2024 01:41
@cbodley
Contributor

cbodley commented Oct 16, 2024

@smanjara this PR removes the file two-zonegroup.yaml.disabled, but should instead rename it back to two-zonegroup.yaml

@smanjara smanjara force-pushed the rgw-multisite-two-zonegroups branch from 1aa28e8 to 001f815 Compare October 23, 2024 18:28
@smanjara
Contributor Author

> @smanjara this PR removes the file two-zonegroup.yaml.disabled, but should instead rename it back to two-zonegroup.yaml

oops

@smanjara smanjara force-pushed the rgw-multisite-two-zonegroups branch from 001f815 to 2290ddf Compare November 13, 2024 23:31
@smanjara
Contributor Author

smanjara commented Nov 21, 2024

http://qa-proxy.ceph.com/teuthology/smanjara-2024-11-20_17:01:24-rgw:multisite-rgw-multisite-two-zonegroups-distro-default-smithi/8002132/teuthology.log

Time skew errors found; they do not reproduce on every run:

2024-11-20T18:52:33.760 INFO:tasks.rgw_multisite_tests:======================================================================
2024-11-20T18:52:33.760 INFO:tasks.rgw_multisite_tests:ERROR: rgw_multi.tests.test_bucket_versioning
2024-11-20T18:52:33.760 INFO:tasks.rgw_multisite_tests:----------------------------------------------------------------------
2024-11-20T18:52:33.760 INFO:tasks.rgw_multisite_tests:Traceback (most recent call last):
2024-11-20T18:52:33.760 INFO:tasks.rgw_multisite_tests:  File "/home/teuthworker/src/git.ceph.com_teuthology_029bda193b1d38b67b279eb5c1037caa8408be24/virtualenv/lib/python3.10/site-packages/nose/case.py", line 170, in runTest
2024-11-20T18:52:33.760 INFO:tasks.rgw_multisite_tests:    self.test(*self.arg)
2024-11-20T18:52:33.761 INFO:tasks.rgw_multisite_tests:  File "/home/teuthworker/src/git.ceph.com_ceph-c_80d20f023a635a342ed280c3a62c9c6c83db1351/qa/../src/test/rgw/rgw_multi/tests.py", line 1071, in test_bucket_versioning
2024-11-20T18:52:33.761 INFO:tasks.rgw_multisite_tests:    bucket.configure_versioning(True)
2024-11-20T18:52:33.761 INFO:tasks.rgw_multisite_tests:  File "/home/teuthworker/src/git.ceph.com_teuthology_029bda193b1d38b67b279eb5c1037caa8408be24/virtualenv/lib/python3.10/site-packages/boto/s3/bucket.py", line 1308, in configure_versioning
2024-11-20T18:52:33.761 INFO:tasks.rgw_multisite_tests:    raise self.connection.provider.storage_response_error(
2024-11-20T18:52:33.761 INFO:tasks.rgw_multisite_tests:boto.exception.S3ResponseError: S3ResponseError: 403 Forbidden
2024-11-20T18:52:33.761 INFO:tasks.rgw_multisite_tests:<?xml version="1.0" encoding="UTF-8"?><Error><Code>RequestTimeTooSkewed</Code><Message></Message><RequestId>tx0000087bc35e38bfd5508-00673e21be-4397-b1</RequestId><HostId>4397-b1-b</HostId></Error>

Logs on the secondary zonegroup show:

/plcdnh-24/?versioning
2024-11-20T17:51:58.002+0000 7f1f5e743640 10 NOTICE: request time skew too big.
2024-11-20T17:51:58.002+0000 7f1f5e743640 10 req_tp=2024-11-20T17:36:58.000000+0000, cur_tp=2024-11-20T17:51:58.002170+0000
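
The two timestamps in the log line above are enough to quantify the skew. A minimal sketch, assuming the `req_tp`/`cur_tp` format shown in this log and assuming the 15-minute tolerance that S3 documents for `RequestTimeTooSkewed` (the exact limit RGW enforces is not stated in this thread; `ASSUMED_MAX_SKEW`, `parse_tp`, and `request_skew` are illustrative names):

```python
from datetime import datetime, timedelta

# Timestamps copied verbatim from the RGW log line above.
REQ_TP = "2024-11-20T17:36:58.000000+0000"
CUR_TP = "2024-11-20T17:51:58.002170+0000"

# Assumption: a 15-minute tolerance, as documented for S3 RequestTimeTooSkewed.
ASSUMED_MAX_SKEW = timedelta(minutes=15)


def parse_tp(value: str) -> datetime:
    """Parse a timestamp in the format printed by the log line."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%f%z")


def request_skew(req_tp: str, cur_tp: str) -> timedelta:
    """Absolute difference between the request time and the server time."""
    return abs(parse_tp(cur_tp) - parse_tp(req_tp))


skew = request_skew(REQ_TP, CUR_TP)
too_skewed = skew > ASSUMED_MAX_SKEW
print(f"skew = {skew.total_seconds():.6f}s, exceeds assumed limit: {too_skewed}")
```

Under that assumption, the request timestamp was about 2 ms past the limit: almost exactly 15 minutes behind the server clock.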

@smanjara
Contributor Author

smanjara commented Nov 22, 2024

http://qa-proxy.ceph.com/teuthology/smanjara-2024-11-20_17:01:24-rgw:multisite-rgw-multisite-two-zonegroups-distro-default-smithi/8002132/teuthology.log

Logs on the secondary zonegroup show:

/plcdnh-24/?versioning
2024-11-20T17:51:58.002+0000 7f1f5e743640 10 NOTICE: request time skew too big.
2024-11-20T17:51:58.002+0000 7f1f5e743640 10 req_tp=2024-11-20T17:36:58.000000+0000, cur_tp=2024-11-20T17:51:58.002170+0000

A local reproducer creates a bucket on the secondary zonegroup and tries to enable versioning. The errors below are from the set_bucket_versioning failure:

2024-11-21T18:41:04.277-0500 7fac935ae6c0  0 req 14950580262357201806 0.017000088s s3:set_bucket_versioning NOTICE: request for data in a different zonegroup (887bc69b-9548-4ae7-b12f-3c478ac19e11 != ef470422-c850-4b0d-91e4-772f21d476e8)
2024-11-21T18:41:04.277-0500 7fac935ae6c0 10 req 14950580262357201806 0.017000088s s3:set_bucket_versioning init_permissions on :bucket1[59f4524c-2f39-4044-a628-855823695e92.4283.1]) failed, ret=-2024
2024-11-21T18:41:04.277-0500 7fac935ae6c0 20 req 14950580262357201806 0.017000088s op->ERRORHANDLER: err_no=-2024 new_err_no=-2024
2024-11-21T18:41:04.277-0500 7fac935ae6c0 10 req 14950580262357201806 0.017000088s cache get: name=zg2-3.rgw.log++script.postrequest. : hit (negative entry)
2024-11-21T18:41:04.277-0500 7fac935ae6c0  2 req 14950580262357201806 0.017000088s s3:set_bucket_versioning op status=0
2024-11-21T18:41:04.277-0500 7fac935ae6c0  2 req 14950580262357201806 0.017000088s s3:set_bucket_versioning http status=301
2024-11-21T18:41:04.277-0500 7fac935ae6c0  1 ====== req done req=0x7fac5a4185a0 op=set_bucket_versioning bucket=bucket1 status=0 http_status=301 latency=0.017000088s request_id=tx00000cf7b21897bc4738e-00673fc510-4299-zg2-3 ======
2024-11-21T18:41:04.277-0500 7fac935ae6c0  1 beast: 0x7fac5a4185a0: ::1 - testuser [21/Nov/2024:18:41:04.260 -0500] "PUT /bucket1?versioning HTTP/1.1" 301 0 - "aws-cli/2.15.2 Python/3.11.7 Linux/6.6.9-100.fc38.x86_64 source/x86_64.fedora.38 prompt/off command/s3api.put-bucket-versioning" - latency=0.017000088s
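
The log lines above show a pattern: the two zonegroup IDs differ, `init_permissions` fails with `ret=-2024`, and the client receives HTTP 301. A toy model of that decision follows. It is not RGW's code; `Bucket`, `check_zonegroup`, `http_status`, and `ERR_REDIRECT` are hypothetical names, and which of the two logged IDs belongs to the bucket is an assumption.

```python
from dataclasses import dataclass

ERR_REDIRECT = -2024          # value taken from "ret=-2024" in the log
HTTP_MOVED_PERMANENTLY = 301  # value taken from "http status=301" in the log
HTTP_OK = 200


@dataclass(frozen=True)
class Bucket:
    name: str
    zonegroup_id: str


def check_zonegroup(bucket: Bucket, local_zonegroup_id: str) -> int:
    """Return 0 if the bucket belongs to the local zonegroup, else ERR_REDIRECT."""
    if bucket.zonegroup_id != local_zonegroup_id:
        return ERR_REDIRECT
    return 0


def http_status(ret: int) -> int:
    """Map the internal return code to the HTTP status seen by the client."""
    return HTTP_MOVED_PERMANENTLY if ret == ERR_REDIRECT else HTTP_OK


# IDs copied from the log; assigning the first one to the bucket is an assumption.
bucket1 = Bucket("bucket1", "887bc69b-9548-4ae7-b12f-3c478ac19e11")
local_zonegroup = "ef470422-c850-4b0d-91e4-772f21d476e8"

ret = check_zonegroup(bucket1, local_zonegroup)
print(ret, http_status(ret))
```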

@clwluvw you've been working on zonegroup-related fixes. Do any of your fixes solve the above issue?

@clwluvw
Member

clwluvw commented Nov 22, 2024

test_bucket_versioning passes for me. basically, 301 happens when the request is hitting the wrong zonegroup. Is your local reproduce pointing to the right zone?

The RequestTimeTooSkewed we see as the original in the logs sounds like a setup issue to me rather than the code... I haven't faced such failure basically and is passing for me locally. maybe deserves a retry?

@smanjara
Contributor Author

> aws-cli/2.15.2
>
> test_bucket_versioning passes for me but what is odd in your output is the user-agent. Are you sure this is coming from the rgw_multi suite? basically, 301 happens when the request is hitting the wrong zonegroup but the rgw_multi is/should use the right zone conn.
>
> The RequestTimeTooSkewed we see as the original in the logs sounds like a setup issue to me rather than the code... I haven't faced such failure basically and is passing for me locally. maybe deserves a retry?

RequestTimeTooSkewed comes only from teuthology runs. Locally, when you try a simple test that creates a bucket and enables versioning, the aws cli command throws "maximum recursion depth exceeded in comparison" and then when you

> test_bucket_versioning passes for me. basically, 301 happens when the request is hitting the wrong zonegroup. Is your local reproduce pointing to the right zone?
>
> The RequestTimeTooSkewed we see as the original in the logs sounds like a setup issue to me rather than the code... I haven't faced such failure basically and is passing for me locally. maybe deserves a retry?

it has been consistently reproducible in teuthology at least.

@clwluvw
Member

clwluvw commented Nov 22, 2024

Ah I see the problem now... You need to backport #59305 otherwise the buckets created on zonegroup B will be created for zonegroup A and you will keep getting 301 as the response and boto2 (haven't tested aws-cli) will keep retrying until it reaches the max recursion.
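
The failure mode described above can be reproduced in miniature. This is a sketch with a hypothetical client and server (`always_redirect`, `send_unbounded`, and `send_bounded` are illustrative names, not boto2 or aws-cli APIs). A client that re-sends on every 301 by calling itself never stops on its own when the server keeps answering 301, so Python eventually raises `RecursionError`.

```python
def always_redirect(request: str) -> int:
    """Stand-in for a gateway that answers every request with HTTP 301."""
    return 301


def send_unbounded(request: str, server) -> int:
    """Retry on 301 by recursing, with no limit on the number of hops."""
    status = server(request)
    if status == 301:
        return send_unbounded(request, server)
    return status


def send_bounded(request: str, server, max_redirects: int = 5) -> int:
    """Retry on 301 at most max_redirects times, then surface the last status."""
    status = server(request)
    for _ in range(max_redirects):
        if status != 301:
            break
        status = server(request)
    return status


try:
    send_unbounded("PUT /bucket1?versioning", always_redirect)
    outcome = "returned"
except RecursionError:
    outcome = "RecursionError"

print(outcome)
print(send_bounded("PUT /bucket1?versioning", always_redirect))
```

Capping the number of hops turns the crash into an ordinary 301 result, which makes the underlying zonegroup mismatch easier to see.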

@smanjara
Contributor Author

> Ah I see the problem now... You need to backport #59305 otherwise the buckets created on zonegroup B will be created for zonegroup A and you will keep getting 301 as the response and boto2 (haven't tested aws-cli) will keep retrying until it reaches the max recursion.

yeah, I used that pr too. there is a different problem there. create_bucket() path crashes on primary. will provide more details in that pr.

@clwluvw
Member

clwluvw commented Nov 22, 2024

> create_bucket() path crashes on primary. will provide more details in that pr.

Right, I had one crash fix also here: #60254

@smanjara
Contributor Author

> > create_bucket() path crashes on primary. will provide more details in that pr.
>
> Right, I had one crash fix also here: #60254

great! could you make those two commits part of a single pr? and close #59305 because running qa against it will catch this crash again.

@clwluvw
Member

clwluvw commented Nov 22, 2024

> > > create_bucket() path crashes on primary. will provide more details in that pr.
> >
> > Right, I had one crash fix also here: #60254
>
> great! could you make those two commits part of a single pr? and close #59305 because running qa against it will catch this crash again.

Done. I thought for QA we could import multiple changes so we can keep these distinct to PRs.

@smanjara
Contributor Author

> > > > create_bucket() path crashes on primary. will provide more details in that pr.
> > >
> > > Right, I had one crash fix also here: #60254
> >
> > great! could you make those two commits part of a single pr? and close #59305 because running qa against it will catch this crash again.
>
> Done. I thought for QA we could import multiple changes so we can keep these distinct to PRs.

thanks, seena!
with #60254 pulled in, the tests pass: https://pulpito.ceph.com/smanjara-2024-11-22_22:48:55-rgw:multisite-rgw-multisite-two-zonegroups-distro-default-smithi/.

@smanjara smanjara removed the DNM label Nov 23, 2024
@smanjara smanjara marked this pull request as ready for review November 23, 2024 00:41
@smanjara smanjara requested a review from cbodley November 25, 2024 20:20
@clwluvw
Member

clwluvw commented Nov 27, 2024

jenkins test api

@clwluvw
Member

clwluvw commented Nov 27, 2024

jenkins test make check

@clwluvw
Member

clwluvw commented Nov 27, 2024

jenkins test submodules

@clwluvw
Member

clwluvw commented Nov 28, 2024

jenkins test make check

@clwluvw
Member

clwluvw commented Nov 28, 2024

jenkins test submodules

@clwluvw
Member

clwluvw commented Nov 29, 2024

jenkins test make check

@clwluvw
Member

clwluvw commented Dec 2, 2024

@smanjara
Contributor Author

smanjara commented Dec 2, 2024

> > with #60254 pulled in, the tests pass: https://pulpito.ceph.com/smanjara-2024-11-22_22:48:55-rgw:multisite-rgw-multisite-two-zonegroups-distro-default-smithi/.
>
> Hi @cbodley, @smanjara - Can we ship this based on the QA result?

@clwluvw yes, i'd love to! but the tests passed with #60254, #60589 and #60591. let's get those merged too.

@smanjara smanjara force-pushed the rgw-multisite-two-zonegroups branch 2 times, most recently from c28e8ca to 9db83bc Compare December 2, 2024 23:26
@smanjara smanjara force-pushed the rgw-multisite-two-zonegroups branch from 9db83bc to 254dad2 Compare December 2, 2024 23:42
@clwluvw
Member

clwluvw commented Dec 3, 2024

> the tests passed with #60254, #60589 and #60591. let's get those merged too.

I guess this one is also needed. #59960

@smanjara
Contributor Author

smanjara commented Dec 3, 2024

> > the tests passed with #60254, #60589 and #60591. let's get those merged too.
>
> I guess this one is also needed. #59960

passes without it because there are no location-constraint based tests. but that pr is also ready to be merged.

@clwluvw
Member

clwluvw commented Dec 3, 2024

Do we need @cbodley's approval (someone from ceph/rgw team basically) to merge this? :D

@smanjara
Contributor Author

smanjara commented Dec 3, 2024

@cbodley could you approve please? results here:
https://pulpito.ceph.com/smanjara-2024-11-22_22:48:55-rgw:multisite-rgw-multisite-two-zonegroups-distro-default-smithi/

I already merged the dependent prs

@cbodley
Contributor

cbodley commented Dec 3, 2024

> Do we need @cbodley's approval (someone from ceph/rgw team basically) to merge this? :D

@clwluvw as a member of Ceph team, you also have the power to approve PRs for merge

the ceph/rgw team is just the list of members that get notified about every rgw pr

4 participants