Skip to content

Comments

rgw/sts: correcting authentication in case s3 ops are directed to a primary from secondary after assumerole.#56576

Merged
smanjara merged 2 commits intoceph:mainfrom
pritha-srivastava:wip-rgw-assume-role-multisite
Apr 28, 2025
Merged

rgw/sts: correcting authentication in case s3 ops are directed to a primary from secondary after assumerole.#56576
smanjara merged 2 commits intoceph:mainfrom
pritha-srivastava:wip-rgw-assume-role-multisite

Conversation

@pritha-srivastava
Copy link
Contributor

@pritha-srivastava pritha-srivastava commented Mar 29, 2024

rgw/sts: by-passing authentication using temp creds
in case the request is forwarded from secondary in a multi-site setup. authenticating with the system user creds of which are used to sign the request.
Permissions are still derived from the role.

fixes https://tracker.ceph.com/issues/71112

Contribution Guidelines

  • To sign and title your commits, please refer to Submitting Patches to Ceph.

  • If you are submitting a fix for a stable branch (e.g. "quincy"), please refer to Submitting Patches to Ceph - Backports for the proper workflow.

  • When filling out the below checklist, you may click boxes directly in the GitHub web UI. When entering or editing the entire PR message in the GitHub web UI editor, you may also select a checklist item by adding an x between the brackets: [x]. Spaces and capitalization matter when checking off items this way.

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)
Show available Jenkins commands
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test dashboard cephadm
  • jenkins test api
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox
  • jenkins test windows
  • jenkins test rook e2e

@pritha-srivastava
Copy link
Contributor Author

@cbodley , @mattbenjamin @smanjara : requesting a quick feedback on the changes in this PR. Doc changes and teuthology changes still to be done.

@pritha-srivastava
Copy link
Contributor Author

The changes have been tested locally and work fine.

@github-actions github-actions bot added the tests label Apr 2, 2024
@pritha-srivastava pritha-srivastava force-pushed the wip-rgw-assume-role-multisite branch from 8fdc0ec to 167e89f Compare April 3, 2024 09:57
@pritha-srivastava pritha-srivastava marked this pull request as ready for review April 3, 2024 09:57
@pritha-srivastava pritha-srivastava requested a review from a team as a code owner April 3, 2024 09:57
@pritha-srivastava
Copy link
Contributor Author

The test case also works fine locally. I want to schedule a teuthology run, where do I add rgw config options for multisite tests? @smanjara @cbodley @alimaredia

@pritha-srivastava pritha-srivastava changed the title [DNM] rgw/sts: correcting authentication in case s3 ops are directed to a primary from secondary after assumerole. rgw/sts: correcting authentication in case s3 ops are directed to a primary from secondary after assumerole. Apr 3, 2024
@smanjara
Copy link
Contributor

smanjara commented Apr 3, 2024

The test case also works fine locally. I want to schedule a teuthology run, where do I add rgw config options for multisite tests? @smanjara @cbodley @alimaredia

hi @pritha-srivastava you could add them in qa/suites/rgw/multisite/overrides.yaml

@pritha-srivastava
Copy link
Contributor Author

The test case also works fine locally. I want to schedule a teuthology run, where do I add rgw config options for multisite tests? @smanjara @cbodley @alimaredia

hi @pritha-srivastava you could add them in qa/suites/rgw/multisite/overrides.yaml

Thank you @smanjara

@pritha-srivastava pritha-srivastava force-pushed the wip-rgw-assume-role-multisite branch from 167e89f to 644eb75 Compare April 4, 2024 05:18
leonid-s-usov pushed a commit to ceph/ceph-ci that referenced this pull request Apr 8, 2024
in case the request is forwarded from secondary in
a multi-site setup. authenticating with the system
user creds of which are used to sign the request.
Permissions are still derived from the role.

Fixes: ceph/ceph#56576
Resolves: rhbz#2271595

Signed-off-by: Pritha Srivastava <[email protected]>
(cherry picked from commit 1b3e1a6)
@smanjara
Copy link
Contributor

jenkins test make check

@github-actions
Copy link

This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved

mkogan1 pushed a commit to mkogan1/ceph that referenced this pull request May 8, 2024
in case the request is forwarded from secondary in
a multi-site setup. authenticating with the system
user creds of which are used to sign the request.
Permissions are still derived from the role.

Fixes: ceph#56576
Resolves: rhbz#2271595

Signed-off-by: Pritha Srivastava <[email protected]>
(cherry picked from commit 1b3e1a6)
@github-actions
Copy link

This pull request has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs for another 30 days.
If you are a maintainer or core committer, please follow-up on this pull request to identify what steps should be taken by the author to move this proposed change forward.
If you are the author of this pull request, thank you for your proposed contribution. If you believe this change is still appropriate, please ensure that any feedback has been addressed and ask for a code review.

@github-actions github-actions bot added the stale label Jun 11, 2024
@github-actions
Copy link

This pull request has been automatically closed because there has been no activity for 90 days. Please feel free to reopen this pull request (or open a new one) if the proposed change is still appropriate. Thank you for your contribution!

@mattbenjamin
Copy link
Contributor

@pritha-srivastava @smanjara what is this PR stalled on?

@pritha-srivastava
Copy link
Contributor Author

@pritha-srivastava @smanjara what is this PR stalled on?

I will rebase and update the PR.

@pritha-srivastava pritha-srivastava force-pushed the wip-rgw-assume-role-multisite branch from 644eb75 to 6bff1be Compare August 13, 2024 07:15
@cbodley
Copy link
Contributor

cbodley commented Apr 21, 2025

I looked at her test case, the difference in my test case is that the user credentials is needed for creating an sts connection. so the user has to be created elsewhere. And I need the alternate user to be present in the same zone as the other user.

i think the unnecessary "zone conn" abstraction is what's making this so difficult. that abstraction assumes there's only ever one identity, which isn't good enough for the test coverage we need

in the test_bucket_create_with_tenant() example, it's running radosgw-admin user create and constructing its own boto connection with those credentials

could you do similar with the sts and iam connections, instead of relying on changes to zone_rados.py? you can still use the get_gateway_sts_connection() helper function there, as long as it doesn't cache the result in gateway.sts_connection

Can non_account_alt_user be used for this purpose? but I see that there are different zone connections for non_acct_user and non_account_alt_user

i'm not familiar with those, but cc @clwluvw who added them

log.debug('ZonegroupConns::__init__ alt_user=%s', alt_user.name)
zone_conn = z.get_conn(user.credentials, alt_user.credentials)
non_account_zone_conn = z.get_conn(non_account_user.credentials)
non_account_alt_zone_conn = z.get_conn(non_account_alt_user.credentials)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the changes to get_conn(), the current usage of non_account_* calls will be the issue.
I'm not sure about the motivation for passing alternate credentials to ZoneConn—it appears they are primarily used for creating the STS client. However, if we can extend ZoneConn to include support for an alternate user context, it might be possible to unify those independent non_account_* calls under the same framework.

@pritha-srivastava pritha-srivastava force-pushed the wip-rgw-assume-role-multisite branch 12 times, most recently from 171cbaf to 1293b8c Compare April 25, 2025 12:04
@pritha-srivastava
Copy link
Contributor Author

pritha-srivastava commented Apr 25, 2025

@cbodley @smanjara : the teuthology results are here: the tests are passing in both the jobs:

https://pulpito.ceph.com/prsrivas-2025-04-25_12:09:31-rgw:multisite-wip-rgw-assume-role-multisite-distro-default-smithi/

2025-04-25T13:01:19.581 INFO:tasks.rgw_multisite_tests:rgw_multi.tests.test_assume_role_after_sync ... ok

2025-04-25T13:03:06.512 INFO:tasks.rgw_multisite_tests:rgw_multi.tests.test_assume_role_after_sync ... ok

Copy link
Contributor

@cbodley cbodley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

approved regardless, lets get this through the full suite 🚀

Comment on lines 257 to 260
alt_user_creds = gen_credentials()
log.debug('created alt_user_creds access key=%s secret=%s', alt_user_creds.access_key, alt_user_creds.secret)
alt_user = multisite.User('alt_tester', tenant=args.tenant, account='RGW11111111111111111')
log.debug('created alt_user=%s', alt_user.name)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are we still using stuff? or can these changes in test_multi.py be reverted?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have some code cleanup to do, will do so and update the PR

@pritha-srivastava
Copy link
Contributor Author

alt_user_creds = gen_credentials()

yes, scheduled one just now

@pritha-srivastava pritha-srivastava force-pushed the wip-rgw-assume-role-multisite branch from 1293b8c to 9366433 Compare April 25, 2025 15:04
in case the request is forwarded from secondary in
a multi-site setup. authenticating with the system
user creds of which are used to sign the request.
Permissions are still derived from the role.

Signed-off-by: Pritha Srivastava <[email protected]>
syncs, and then creating a bucket on both primary and secondary.
The test name is test_assume_role_after_sync.

Signed-off-by: Pritha Srivastava <[email protected]>
@pritha-srivastava pritha-srivastava force-pushed the wip-rgw-assume-role-multisite branch from 9366433 to 855db87 Compare April 26, 2025 09:44
@pritha-srivastava
Copy link
Contributor Author

Teuthology tests have passed, except two: https://pulpito.ceph.com/prsrivas-2025-04-27_07:35:37-rgw-wip-rgw-assume-role-multisite-distro-default-smithi/

  1. d4n suite failure (which is known) -
Command failed (workunit test rgw/run-d4n.sh) on smithi033 with status 1: 'mkdir -p -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && cd -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && CEPH_CLI_TEST_DUP_COMMAND=1 CEPH_REF=855db87f4addec8576708d56b6f6d6554caf8b37 TESTDIR="/home/ubuntu/cephtest" CEPH_ARGS="--cluster ceph" CEPH_ID="0" PATH=$PATH:/usr/sbin CEPH_BASE=/home/ubuntu/cephtest/clone.client.0 CEPH_ROOT=/home/ubuntu/cephtest/clone.client.0 CEPH_MNT=/home/ubuntu/cephtest/mnt.0 adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 3h /home/ubuntu/cephtest/clone.client.0/qa/workunits/rgw/run-d4n.sh'
  1. multisite failure -
FAIL: rgw_multi.tests.test_bucket_log_trim_after_delete_bucket_secondary_reshard
2025-04-27T11:18:26.234 INFO:tasks.rgw_multisite_tests:----------------------------------------------------------------------
2025-04-27T11:18:26.234 INFO:tasks.rgw_multisite_tests:Traceback (most recent call last):
2025-04-27T11:18:26.235 INFO:tasks.rgw_multisite_tests:  File "/home/teuthworker/src/git.ceph.com_teuthology_dc15ac4a651ab5968cb4ffaa9ef0ff1a02484ea2/virtualenv/lib/python3.10/site-packages/nose/case.py", line 170, in runTest
2025-04-27T11:18:26.235 INFO:tasks.rgw_multisite_tests:    self.test(*self.arg)
2025-04-27T11:18:26.235 INFO:tasks.rgw_multisite_tests:  File "/home/teuthworker/src/github.com_pritha-srivastava_ceph_855db87f4addec8576708d56b6f6d6554caf8b37/qa/../src/test/rgw/rgw_multi/tests.py", line 1897, in test_bucket_log_trim_after_delete_bucket_secondary_reshard
2025-04-27T11:18:26.235 INFO:tasks.rgw_multisite_tests:    assert check_bucket_instance_metadata(zone.zone, test_bucket.name)
2025-04-27T11:18:26.235 INFO:tasks.rgw_multisite_tests:AssertionError

In the same run, the test in this PR has passed:

2025-04-27T11:17:17.370 INFO:tasks.rgw_multisite_tests:rgw_multi.tests.test_assume_role_after_sync ... ok

@pritha-srivastava
Copy link
Contributor Author

hence this PR is ready to be merged @cbodley @smanjara

@smanjara smanjara merged commit 1f95ea1 into ceph:main Apr 28, 2025
12 checks passed
@cbodley
Copy link
Contributor

cbodley commented Apr 28, 2025

does this not need backports? with account support for roles in squid, i think it would be nice to have there at least

@smanjara
Copy link
Contributor

does this not need backports? with account support for roles in squid, i think it would be nice to have there at least

@pritha-srivastava I created a tracker https://tracker.ceph.com/issues/71113 for squid. would you please do the backport?

@cbodley
Copy link
Contributor

cbodley commented Apr 28, 2025

@pritha-srivastava I created a tracker https://tracker.ceph.com/issues/71113 for squid. would you please do the backport?

thanks @smanjara. i added 'tentacle' to https://tracker.ceph.com/issues/71112 as well, since we forked on friday before this merged

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants