[DNM] rgw: allow zone to sync from itself (cross bucket sync)#56563
Signed-off-by: Yehuda Sadeh <[email protected]>
|
without tests or examples, it's hard to tell exactly what this does. maybe you could share a couple examples of replication policies that are enabled by this? if you have two active-active zones A and B, i would have guessed that cross-bucket policies already work because objects uploaded to A:bucket1 would replicate to B:bucket2, then back to A:bucket2 from there. so is the primary motivation to enable these replication policies in single-zone configurations? |
|
@cbodley the scenario that you describe would work only if there's bidirectional sync between A and B. Also, as you pointed out, it doesn't work at all if there's only a single zone. From the perspective of S3 API users this can be very confusing: the topology might not even be visible to them, and the expectation is that when you replicate a bucket into another one, the data shows up there. |
|
what kind of configuration management changes would this entail? a user might just want to replicate buckets within the same zone and not opt for multi-zone env at all. they may also expect to access the replicated buckets with separate accounts, as it seems like the main use case for SRR. |
|
is it safe to assume that these sync policy changes cause us to spawn a RGWDataSyncProcessorThread against the local zone's endpoints, consult local sync status when trimming datalogs/bilogs, report on local status in 'radosgw-admin sync status', etc?
two thoughts here:
perhaps we could add a special case to RGWDefaultDataSyncModule::sync_object() for src_zone==dst_zone that skips all calls with src_obj==dst_obj, and calls RGWRados::copy_obj() otherwise? hopefully there's a way to prevent copy_obj() from overwriting new objects with older ones.
that could go a long way to reduce the per-object cost of this feature, but the additional cost of polling seems inescapable. considering a zonegroup with two active-active zones, this would double the number of datalog/bilog listing requests overall |
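For illustration, the special case suggested above could look roughly like the sketch below. Everything in it is hypothetical: `ObjRef`, `ObjState`, `SyncAction` and `choose_sync_action()` are simplified stand-ins, not the real RGW types or the actual signature of `sync_object()`, and comparing mtimes is only one assumed way to keep an older source from overwriting a newer destination.

```cpp
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical, simplified stand-ins; these are NOT the real RGW types.
struct ObjRef {
  std::string zone;
  std::string bucket;
  std::string key;
};

struct ObjState {
  std::uint64_t mtime_ns;  // last-modified time of the stored object
};

enum class SyncAction { Skip, LocalCopy, RemoteFetch };

inline bool same_object(const ObjRef& a, const ObjRef& b) {
  return a.zone == b.zone && a.bucket == b.bucket && a.key == b.key;
}

// Decide how a single sync entry should be handled.
// dst_state is empty when the destination object does not exist yet.
inline SyncAction choose_sync_action(const ObjRef& src,
                                     const ObjState& src_state,
                                     const ObjRef& dst,
                                     const std::optional<ObjState>& dst_state) {
  if (src.zone != dst.zone) {
    return SyncAction::RemoteFetch;  // unchanged multi-zone path
  }
  if (same_object(src, dst)) {
    return SyncAction::Skip;  // an object never needs to sync onto itself
  }
  // Assumed guard: never replace a destination that is at least as new.
  if (dst_state && dst_state->mtime_ns >= src_state.mtime_ns) {
    return SyncAction::Skip;
  }
  return SyncAction::LocalCopy;  // same zone, different object: copy locally
}
```

With these stand-ins, an entry from A:bucket1 to A:bucket2 maps to `LocalCopy` unless the destination is already at least as new, while A:bucket1 to A:bucket1 is always skipped. This only covers the per-object cost; it does nothing about the extra polling.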
|
@smanjara we would need to figure out what the correct configuration is. As it is right now this PR is missing the configuration part. It should still work on a single zone (that's how I tested it), but we'll need to create a way to enable/disable this feature. |
|
@cbodley copying the object instead of fetching it sounds like a relatively easy optimization. The RGWObjFetchCR does a bit more than just fetching the object: in certain cases it also stats the remote object to get its tags, so that it can match it with the appropriate replication policy rules. I'm not sure copy-pasting all this logic would be the best way forward; we can probably add the logic into the RGWObjFetchCR itself. |
does the sync policy framework have a way to add a unidirectional pipe from one zone to another? adding such a pipe with src_zone==dst_zone seems like a reasonable way for users to opt in to this |
I'm not sure we allow it explicitly, but that's what this PR does implicitly. We can bake some syntactic sugar into the policy that would be easier for the users to digest. |
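To make the opt-in idea concrete, here is a minimal sketch of how a pipe with src_zone==dst_zone could be recognized. It assumes a flattened, hypothetical `SyncPipe` struct with one source and one destination; the real sync policy types are richer, and the helper names below are invented for this example.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical flattened view of a sync pipe; NOT the real policy type.
struct SyncPipe {
  std::string src_zone;
  std::string src_bucket;
  std::string dst_zone;
  std::string dst_bucket;
};

// A pipe whose source and destination zones match asks for local sync.
inline bool is_self_sync_pipe(const SyncPipe& p) {
  return p.src_zone == p.dst_zone;
}

// Same zone and same bucket would copy every object onto itself.
inline bool is_noop_pipe(const SyncPipe& p) {
  return is_self_sync_pipe(p) && p.src_bucket == p.dst_bucket;
}

// True if any pipe makes `zone` a useful sync source for itself.
inline bool zone_opts_into_self_sync(const std::vector<SyncPipe>& pipes,
                                     const std::string& zone) {
  return std::any_of(pipes.begin(), pipes.end(), [&](const SyncPipe& p) {
    return p.src_zone == zone && is_self_sync_pipe(p) && !is_noop_pipe(p);
  });
}
```

Under this assumption a zone would only need to poll its own logs when `zone_opts_into_self_sync()` returns true for it, which keeps the feature off by default.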
Remove unneeded params and refactor internal calls Signed-off-by: Yehuda Sadeh <[email protected]>
When doing local sync use local stat operation, rather than go through the remote calls. Signed-off-by: Yehuda Sadeh <[email protected]>
|
This pull request has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs for another 30 days. |
|
This pull request has been automatically closed because there has been no activity for 90 days. Please feel free to reopen this pull request (or open a new one) if the proposed change is still appropriate. Thank you for your contribution! |
Enable a zone to sync from itself. This is probably not the most efficient way to achieve cross-bucket sync within the same zone; however, it is a very minimal change.
Would need to be conditionally enabled.