RGW/standalone: refactor rgw_zone.h with configstore #62398

Merged
cbodley merged 12 commits into ceph:main from AliMasarweh:wip-alimasa-rgw-standalone-zone
May 5, 2025

Conversation

@AliMasarweh AliMasarweh commented Mar 19, 2025

https://tracker.ceph.com/issues/57167

Contribution Guidelines

  • To sign and title your commits, please refer to Submitting Patches to Ceph.

  • If you are submitting a fix for a stable branch (e.g. "quincy"), please refer to Submitting Patches to Ceph - Backports for the proper workflow.

  • When filling out the below checklist, you may click boxes directly in the GitHub web UI. When entering or editing the entire PR message in the GitHub web UI editor, you may also select a checklist item by adding an x between the brackets: [x]. Spaces and capitalization matter when checking off items this way.

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)

@AliMasarweh AliMasarweh requested a review from a team as a code owner March 19, 2025 14:26
@AliMasarweh AliMasarweh requested review from cbodley and dang March 19, 2025 14:27
Contributor

@cbodley cbodley left a comment

this is wonderful, thanks @AliMasarweh

@AliMasarweh AliMasarweh force-pushed the wip-alimasa-rgw-standalone-zone branch from fe1da35 to d041edf Compare April 1, 2025 01:50
Contributor

@cbodley cbodley left a comment

good progress 👍 RGWZoneGroup and RGWZoneParams still have several read/write member functions and inherit from RGWSystemMetaObj - will we be able to get rid of that entirely?

@cbodley
Contributor

cbodley commented Apr 1, 2025

i think the rgw/multisite suite in teuthology will be the "final boss" for this pr, but an easy way to get started with smoke testing is:

~/ceph/build $ MON=1 OSD=1 RGW=1 MDS=0 MGR=0 ../src/test/rgw/test-rgw-multisite.sh 2 1

this runs vstart in two different subdirectories and sets up a simple multisite configuration between them (a realm with one zonegroup and two zones). that should give reasonable coverage of the admin commands and rest api

@github-actions

github-actions bot commented Apr 3, 2025

This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved

@AliMasarweh AliMasarweh force-pushed the wip-alimasa-rgw-standalone-zone branch 2 times, most recently from 33bc46f to 265d7d9 Compare April 4, 2025 23:54
@AliMasarweh AliMasarweh force-pushed the wip-alimasa-rgw-standalone-zone branch from 265d7d9 to 8db9e09 Compare April 7, 2025 07:13
@AliMasarweh
Member Author

vstart is failing on:

setting up user testid
2025-04-07T07:17:22.395+0000 7f8164e0fa40  0 zone cannot have an empty id
2025-04-07T07:17:22.395+0000 7f8164e0fa40 -1 failed reading zone info: ret -22 (22) Invalid argument
2025-04-07T07:17:22.395+0000 7f8164e0fa40  0 ERROR: failed to start notify service ((22) Invalid argument
2025-04-07T07:17:22.395+0000 7f8164e0fa40  0 ERROR: failed to init services (ret=(22) Invalid argument)

@AliMasarweh AliMasarweh force-pushed the wip-alimasa-rgw-standalone-zone branch from 8db9e09 to 8ef1ea0 Compare April 7, 2025 14:08
@AliMasarweh
Member Author

Normal vstart now works without running into this issue:

vstart is failing on:

setting up user testid
2025-04-07T07:17:22.395+0000 7f8164e0fa40  0 zone cannot have an empty id
2025-04-07T07:17:22.395+0000 7f8164e0fa40 -1 failed reading zone info: ret -22 (22) Invalid argument
2025-04-07T07:17:22.395+0000 7f8164e0fa40  0 ERROR: failed to start notify service ((22) Invalid argument
2025-04-07T07:17:22.395+0000 7f8164e0fa40  0 ERROR: failed to init services (ret=(22) Invalid argument)

running MON=1 OSD=1 RGW=1 MDS=0 MGR=0 ../src/test/rgw/test-rgw-multisite.sh 2 1 hits this issue:

2025-04-07T14:13:18.284+0000 7f71e1d21a40  0 ERROR: current period 6f777de9-f055-4d58-ba6b-422ee813fb67 does not contain zone id 26e96fb4-c055-49af-851b-c1e87f7bac7c
2025-04-07T14:13:18.296+0000 7f71e1d21a40  0 WARNING: period init failed: (2) No such file or directory ... skipping
2025-04-07T14:13:18.296+0000 7f71e1d21a40  0 ERROR: search_realm_conf() failed: ret=-2
2025-04-07T14:13:18.296+0000 7f71e1d21a40  0 ERROR: failed to start notify service ((2) No such file or directory
2025-04-07T14:13:18.296+0000 7f71e1d21a40  0 ERROR: failed to init services (ret=(2) No such file or directory)
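One way to narrow down the "does not contain zone id" error is to pull both ids out of the log line and check them against a dump of the period. The sketch below is only illustrative: it assumes the period JSON nests zones as period_map -> zonegroups -> zones -> id, and the sample period document is hypothetical, not taken from this run.

```python
import re

# Pattern for the radosgw startup error quoted above.
MISSING_ZONE = re.compile(
    r"current period (?P<period>[0-9a-f-]+) "
    r"does not contain zone id (?P<zone>[0-9a-f-]+)"
)


def parse_missing_zone(line):
    """Return (period_id, zone_id) from the error line, or None if it does not match."""
    m = MISSING_ZONE.search(line)
    return (m.group("period"), m.group("zone")) if m else None


def zone_ids_in_period(period):
    """Collect zone ids from a period document.

    Assumption: zones are nested as period_map -> zonegroups -> zones -> id.
    Adjust the lookups if the JSON you have is shaped differently.
    """
    ids = set()
    for zonegroup in period.get("period_map", {}).get("zonegroups", []):
        for zone in zonegroup.get("zones", []):
            ids.add(zone["id"])
    return ids


line = (
    "2025-04-07T14:13:18.284+0000 7f71e1d21a40  0 ERROR: current period "
    "6f777de9-f055-4d58-ba6b-422ee813fb67 does not contain zone id "
    "26e96fb4-c055-49af-851b-c1e87f7bac7c"
)
period_id, zone_id = parse_missing_zone(line)

# Hypothetical period document, for illustration only.
sample_period = {
    "id": period_id,
    "period_map": {
        "zonegroups": [
            {"name": "zg1", "zones": [{"id": "some-other-zone-id", "name": "zg1-1"}]}
        ]
    },
}
missing = zone_id not in zone_ids_in_period(sample_period)
```

If `missing` is true for the real period dump as well, the zone was likely never committed into that period, which would be consistent with the error above.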


@AliMasarweh
Member Author

Running the multisite tests locally, using these instructions.
I hit this error:

======================================================================
ERROR: test suite for <module 'test_multi' from '/home/alimasa/ceph/src/test/rgw/test_multi.py'>
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/nose/suite.py", line 210, in run
    self.setUp()
  File "/usr/local/lib/python3.9/site-packages/nose/suite.py", line 293, in setUp
    self.setupContext(ancestor)
  File "/usr/local/lib/python3.9/site-packages/nose/suite.py", line 316, in setupContext
    try_run(context, names)
  File "/usr/local/lib/python3.9/site-packages/nose/util.py", line 471, in try_run
    return func()
  File "/home/alimasa/ceph/src/test/rgw/test_multi.py", line 426, in setup_module
    init(False)
  File "/home/alimasa/ceph/src/test/rgw/test_multi.py", line 241, in init
    c1.start()
  File "/home/alimasa/ceph/src/test/rgw/test_multi.py", line 76, in start
    bash(cmd, env=env)
  File "/home/alimasa/ceph/src/test/rgw/test_multi.py", line 46, in bash
    assert(process.returncode == 0)
AssertionError: 
-------------------- >> begin captured stdout << ---------------------
WARNING: error reading test config. Path can be set through the RGW_MULTI_TEST_CONF env variable

--------------------- >> end captured stdout << ----------------------
-------------------- >> begin captured logging << --------------------
rgw_multi.tests: DEBUG: running cmd: /home/alimasa/ceph/src/mstart.sh c1 -n
rgw_multi.tests: DEBUG: command returned status=1 stdout=Cluster dest path: /home/alimasa/ceph/build/run/c1
monitors base port: 6820
rgw base port: 8001
hostname folio11
ip 172.21.5.161
port 6820
creating /home/alimasa/ceph/build/run/c1/keyring
/home/alimasa/ceph/build/bin/monmaptool: monmap file /tmp/ceph_monmap.1119608
/home/alimasa/ceph/build/bin/monmaptool: generated fsid df840448-d257-4102-86d2-d8e974991811
setting min_mon_release = tentacle
epoch 0
fsid df840448-d257-4102-86d2-d8e974991811
last_changed 2025-04-09T06:10:39.243606+0000
created 2025-04-09T06:10:39.243606+0000
min_mon_release 20 (tentacle)
election_strategy: 1
0: [v2:172.21.5.161:6820/0,v1:172.21.5.161:6821/0] mon.a
1: [v2:172.21.5.161:6822/0,v1:172.21.5.161:6823/0] mon.b
2: [v2:172.21.5.161:6824/0,v1:172.21.5.161:6825/0] mon.c
/home/alimasa/ceph/build/bin/monmaptool: writing epoch 0 to /tmp/ceph_monmap.1119608 (3 monitors)

--------------------- >> end captured logging << ---------------------

----------------------------------------------------------------------

@AliMasarweh AliMasarweh force-pushed the wip-alimasa-rgw-standalone-zone branch from c0d43e4 to 643d71f Compare April 10, 2025 03:42
@AliMasarweh
Member Author

jenkins test api

@AliMasarweh
Member Author

jenkins test make check arm64

@AliMasarweh
Member Author

jenkins test make check

@AliMasarweh
Member Author

running RGW_MULTI_TEST_CONF=./test_multi.conf nosetests test_multi.py -a '!fails_with_rgw' is passing:

----------------------------------------------------------------------
Ran 60 tests in 2330.366s

OK (SKIP=16)
[alimasa@folio11 rgw]$ 

@AliMasarweh
Member Author

AliMasarweh commented Apr 14, 2025

@AliMasarweh
Member Author

AliMasarweh commented Apr 17, 2025

looks like this is the issue that we hit in teuthology:

======================================================================
ERROR: test_multi.test_multi_object_delete
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/nose/case.py", line 198, in runTest
    self.test(*self.arg)
  File "/usr/local/lib/python3.9/site-packages/nose/util.py", line 620, in newfunc
    return func(*arg, **kw)
  File "/home/alimasa/ceph/src/test/rgw/rgw_multi/tests.py", line 928, in test_multi_object_delete
    bucket.delete_keys(objnames)
  File "/home/alimasa/.local/lib/python3.9/site-packages/boto/s3/bucket.py", line 728, in delete_keys
    while delete_keys2(headers):
  File "/home/alimasa/.local/lib/python3.9/site-packages/boto/s3/bucket.py", line 722, in delete_keys2
    xml.sax.parseString(body, h)
  File "/usr/lib64/python3.9/xml/sax/__init__.py", line 48, in parseString
    parser.parse(inpsrc)
  File "/usr/lib64/python3.9/xml/sax/expatreader.py", line 111, in parse
    xmlreader.IncrementalParser.parse(self, source)
  File "/usr/lib64/python3.9/xml/sax/xmlreader.py", line 125, in parse
    self.feed(buffer)
  File "/usr/lib64/python3.9/xml/sax/expatreader.py", line 221, in feed
    self._err_handler.fatalError(exc)
  File "/usr/lib64/python3.9/xml/sax/handler.py", line 38, in fatalError
    raise exception
xml.sax._exceptions.SAXParseException: <unknown>:1:426: mismatched tag

this issue arises when we run test_multi.py:test_multi_object_delete with two zonegroups
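The traceback only reports the position of the parse failure (1:426), not the response body itself. A small helper like the sketch below can show the bytes around the failing column when a body is captured. It uses only the standard xml.sax module; the malformed body is a made-up example, not the actual response from this test.

```python
import xml.sax
from xml.sax.handler import ContentHandler


def describe_xml_error(body, context=40):
    """Parse body with SAX; return None if well-formed, else details of the error."""
    try:
        xml.sax.parseString(body, ContentHandler())
        return None
    except xml.sax.SAXParseException as exc:
        col = exc.getColumnNumber() or 0
        start = max(col - context, 0)
        return {
            "message": exc.getMessage(),
            "line": exc.getLineNumber(),
            "column": col,
            # Bytes surrounding the reported column, to inspect by eye.
            "snippet": body[start:col + context],
        }


# Hypothetical malformed body: <Deleted> is never closed.
bad_body = b"<DeleteResult><Deleted><Key>a</Key></DeleteResult>"
err = describe_xml_error(bad_body)
ok = describe_xml_error(b"<DeleteResult><Deleted><Key>a</Key></Deleted></DeleteResult>")
```

Here `err["message"]` is the same "mismatched tag" text that appears in the traceback, and `err["snippet"]` gives the surrounding bytes to look at.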

@cbodley
Contributor

cbodley commented Apr 17, 2025

teuthology run https://pulpito.ceph.com/alimasa-2025-04-12_18:57:25-rgw:multisite-wip-alimasa-rgw-standalone-zone-test-distro-default-smithi/

not sure what's wrong in test_multi_object_delete, but i looked through the logs for the linked teuthology results. that job fails with failed meta checkpoint for zone=a2 before tests even start running

i think the issue is that the radosgw for this zone a2 isn't actually running as zone a2. teuthology.log shows us starting radosgw c1.client.1 with --rgw-zone a2:

2025-04-12T19:25:05.428 INFO:tasks.rgw.c1.client.1:Restarting daemon
2025-04-12T19:25:05.428 DEBUG:teuthology.orchestra.run.smithi119:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage daemon-helper term radosgw --rgw-frontends 'beast port=8001' -n client.1 --cluster c1 -k /etc/ceph/c1.client.1.keyring --log-file /var/log/ceph/rgw.c1.client.1.log --rgw_ops_log_socket_path /home/ubuntu/cephtest/rgw.opslog.c1.client.1.sock --foreground --rgw-zone a2 --rgw-zonegroup a --rgw-realm test-realm | sudo tee /var/log/ceph/rgw.c1.client.1.stdout 2>&1
2025-04-12T19:25:05.431 INFO:tasks.rgw.c1.client.1:Started

but rgw.c1.client.1.log for that radosgw shows it starting as zone a1:

2025-04-12T19:25:05.545+0000 7f218143bb40 20 rgw main: realm  test-realm 38f18758-c2ed-43e8-a197-ac4d79875e7d
2025-04-12T19:25:05.549+0000 7f218143bb40 20 rgw main: searching for the correct realm
2025-04-12T19:25:05.569+0000 7f218143bb40 20 rgw main: RGWRados::pool_iterate: got default.zone.38f18758-c2ed-43e8-a197-ac4d79875e7d
2025-04-12T19:25:05.569+0000 7f218143bb40 20 rgw main: RGWRados::pool_iterate: got zonegroup_info.1b07bf21-a804-4bb3-9e3e-73964cad1e9d
2025-04-12T19:25:05.569+0000 7f218143bb40 20 rgw main: RGWRados::pool_iterate: got periods.5837ad13-e5ff-4107-9382-c4093e811cfd.latest_epoch
2025-04-12T19:25:05.569+0000 7f218143bb40 20 rgw main: RGWRados::pool_iterate: got zone_info.81d50c94-4340-48db-bee8-c5f1e78d215e
2025-04-12T19:25:05.569+0000 7f218143bb40 20 rgw main: RGWRados::pool_iterate: got zone_names.a2
2025-04-12T19:25:05.569+0000 7f218143bb40 20 rgw main: RGWRados::pool_iterate: got period_config.38f18758-c2ed-43e8-a197-ac4d79875e7d
2025-04-12T19:25:05.569+0000 7f218143bb40 20 rgw main: RGWRados::pool_iterate: got default.realm
2025-04-12T19:25:05.569+0000 7f218143bb40 20 rgw main: RGWRados::pool_iterate: got realms.38f18758-c2ed-43e8-a197-ac4d79875e7d
2025-04-12T19:25:05.569+0000 7f218143bb40 20 rgw main: RGWRados::pool_iterate: got periods.29287bcf-3c0e-4744-824b-1a82404ff4e2.2
2025-04-12T19:25:05.569+0000 7f218143bb40 20 rgw main: RGWRados::pool_iterate: got periods.29287bcf-3c0e-4744-824b-1a82404ff4e2.1
2025-04-12T19:25:05.569+0000 7f218143bb40 20 rgw main: RGWRados::pool_iterate: got default.zonegroup.
2025-04-12T19:25:05.569+0000 7f218143bb40 20 rgw main: RGWRados::pool_iterate: got periods.38f18758-c2ed-43e8-a197-ac4d79875e7d:staging.latest_epoch
2025-04-12T19:25:05.569+0000 7f218143bb40 20 rgw main: RGWRados::pool_iterate: got zone_info.75c178d5-1017-4699-a6f6-754e74789584
2025-04-12T19:25:05.569+0000 7f218143bb40 20 rgw main: RGWRados::pool_iterate: got default.zone.
2025-04-12T19:25:05.569+0000 7f218143bb40 20 rgw main: RGWRados::pool_iterate: got default.zonegroup.38f18758-c2ed-43e8-a197-ac4d79875e7d
2025-04-12T19:25:05.569+0000 7f218143bb40 20 rgw main: RGWRados::pool_iterate: got realms.38f18758-c2ed-43e8-a197-ac4d79875e7d.control
2025-04-12T19:25:05.569+0000 7f218143bb40 20 rgw main: RGWRados::pool_iterate: got periods.38f18758-c2ed-43e8-a197-ac4d79875e7d:staging
2025-04-12T19:25:05.569+0000 7f218143bb40 20 rgw main: RGWRados::pool_iterate: got zone_names.a1
2025-04-12T19:25:05.569+0000 7f218143bb40 20 rgw main: RGWRados::pool_iterate: got zonegroup_info.a
2025-04-12T19:25:05.569+0000 7f218143bb40 20 rgw main: RGWRados::pool_iterate: got realms_names.test-realm
2025-04-12T19:25:05.569+0000 7f218143bb40 20 rgw main: RGWRados::pool_iterate: got periods.5837ad13-e5ff-4107-9382-c4093e811cfd.1
2025-04-12T19:25:05.569+0000 7f218143bb40 20 rgw main: RGWRados::pool_iterate: got periods.29287bcf-3c0e-4744-824b-1a82404ff4e2.latest_epoch
2025-04-12T19:25:05.569+0000 7f218143bb40 20 rgw main: RGWRados::pool_iterate: got zonegroups_names.a
2025-04-12T19:25:05.569+0000 7f218143bb40 20 rgw main: RGWRados::pool_iterate: got zone_names.default
2025-04-12T19:25:05.569+0000 7f218143bb40 20 rgw main: RGWRados::pool_iterate: got zonegroups_names.default
2025-04-12T19:25:05.569+0000 7f218143bb40 20 rgw main: RGWRados::pool_iterate: got zone_info.a95c8816-4471-484f-a97f-da8680801a58
2025-04-12T19:25:05.569+0000 7f218143bb40 20 rgw main: search_realm_with_zone(): found realm_id= realm_name=test-realm
2025-04-12T19:25:05.569+0000 7f218143bb40 20 zone a1 found
2025-04-12T19:25:05.569+0000 7f218143bb40  4 rgw main: Realm:     test-realm           (38f18758-c2ed-43e8-a197-ac4d79875e7d)
2025-04-12T19:25:05.569+0000 7f218143bb40  4 rgw main: ZoneGroup: a                    (a)
2025-04-12T19:25:05.569+0000 7f218143bb40  4 rgw main: Zone:      a1                   (75c178d5-1017-4699-a6f6-754e74789584)
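The comparison described above (zone requested on the command line versus zone reported in the radosgw log) can be scripted when checking several daemons. This is a minimal sketch that assumes the log contains the `rgw main: Zone:` startup line in the format shown; the command line is shortened from the teuthology.log excerpt.

```python
import re


def requested_zone(command_line):
    """Return the value passed to --rgw-zone in a radosgw command line, or None."""
    # "[ =]" keeps this from matching --rgw-zonegroup.
    m = re.search(r"--rgw-zone[ =](\S+)", command_line)
    return m.group(1) if m else None


def reported_zone(log_text):
    """Return the zone name from the 'Zone:' startup line of a radosgw log, or None."""
    m = re.search(r"rgw main: Zone:\s+(\S+)", log_text)
    return m.group(1) if m else None


cmd = (
    "radosgw --rgw-frontends 'beast port=8001' -n client.1 --cluster c1 "
    "--foreground --rgw-zone a2 --rgw-zonegroup a --rgw-realm test-realm"
)
log = (
    "2025-04-12T19:25:05.569+0000 7f218143bb40  4 rgw main: ZoneGroup: a"
    "                    (a)\n"
    "2025-04-12T19:25:05.569+0000 7f218143bb40  4 rgw main: Zone:      a1"
    "                   (75c178d5-1017-4699-a6f6-754e74789584)\n"
)

mismatch = requested_zone(cmd) != reported_zone(log)
```

For the excerpts quoted here the requested zone is a2 and the reported zone is a1, so `mismatch` is true.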

@AliMasarweh AliMasarweh force-pushed the wip-alimasa-rgw-standalone-zone branch from 643d71f to aa607ce Compare April 22, 2025 14:03
@AliMasarweh
Member Author

jenkins test api

@AliMasarweh AliMasarweh force-pushed the wip-alimasa-rgw-standalone-zone branch from 617a0f4 to b229903 Compare April 29, 2025 12:31
@AliMasarweh
Member Author

jenkins test api

2 similar comments
@AliMasarweh
Member Author

jenkins test api

@AliMasarweh
Member Author

jenkins test api

@AliMasarweh AliMasarweh requested a review from cbodley May 4, 2025 06:19
Contributor

@cbodley cbodley left a comment

ship it! 🚀

@cbodley cbodley added the TESTED label May 5, 2025
@cbodley cbodley merged commit f733a87 into ceph:main May 5, 2025
12 checks passed
@cbodley
Contributor

cbodley commented May 5, 2025

thanks @AliMasarweh, great work!

@AliMasarweh
Member Author

thanks @AliMasarweh, great work!

You are welcome, thanks for your help!
