rgw: add support for bucket replication between zonegroups #59911
Conversation
If I understood correctly, currently the only overhead is that every zone will process all logged buckets from the source zone, but only by the shard entries, and they will stop at ceph/src/rgw/driver/rados/rgw_data_sync.cc lines 5392 to 5395 (in cbbddfd). I'm not sure how expensive this could be.
This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved |
I believe the write path is now fully optimized, as it only logs when an enabled pipe is associated with the object. For the polling phase, I propose two possible approaches:
I think I misunderstood the purpose of
thanks @clwluvw. does replication between zonegroups work with the changes from df1867f alone? if it does, we might want to do some testing to see how expensive the polling would be under a reasonable load and a reasonable number of zones before we optimize for its efficiency.
Hi @smanjara. Thanks for your interest. I'll break my commits down as below:
So, to summarise: the first three commits should let you replicate without any manual work, but the last one optimizes logging to occur only if the rule has a prefix/tag filter.
With the current implementation, as I understand from (#59911 (comment)), other zones in different zonegroups will just list the datalog and ignore entries for buckets they are not interested in. This would likely affect the

There's one incompatibility with AWS BucketReplication: the priority field in the ReplicationRule. AWS only replicates an object to one bucket, and if the rules defined on the source bucket collide, it selects the one with the highest priority. Currently, RGW does not respect this priority, which might confuse users who rely on it. However, with the proposed approach outlined here (#59911 (comment)), I believe we can address this issue while also reducing log processing by destination zones. In the logging phase, we can define where the object should be replicated. If I'm not mistaken, this might also replace the need for the
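For reference, the AWS-style rule selection mentioned above can be sketched as follows. When several replication rules match the same object, S3 replicates it only once, using the rule with the highest Priority value; the rule layout here is illustrative, not the actual RGW or AWS data structures.

```python
def select_rule(matching_rules):
    """Pick the winning ReplicationRule among rules that match an object.

    Higher Priority wins; rules without a Priority are treated as 0.
    """
    if not matching_rules:
        return None
    return max(matching_rules, key=lambda r: r.get("Priority", 0))
```

Under this model, respecting priority during the logging phase would mean resolving the winning destination once, rather than having every destination zone re-evaluate the rules.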
@clwluvw sorry it took me so long to get back. the commit df1867f establishes connection objects for zones across zonegroups when a zonegroup policy is set. this alone, along with the bucket location constraint fixes #59305 and #59960, should be sufficient to set up replication between two zonegroups, along with zonegroup policy set. documenting a configuration example here for reference.

at the zonegroup level: enable bucket1 replication between zg1-2 and zg1-2 belonging to zonegroup zg1. enable bucket3 replication to sync between the zonegroups zg and zg2 involving all zones. please note that setting zonegroup sync policy, there is a commit in #60018 that sets up multiple zonegroups for you to test with.

the other commits deal with changing the way we log data, which I am not very comfortable with. the most common multisite configuration is the one where we sync to/from all zones within a zonegroup. adding conditional checks for sync pipes or for specific objects may not work, and adds overhead for configurations that do not care about sync policies.
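A minimal configuration fragment in the spirit of the example above, using the radosgw-admin sync policy CLI. Group/flow/pipe IDs and the bucket name are illustrative; the flags follow the documented multisite sync policy commands.

```shell
# zonegroup level: allow sync, with a symmetrical flow across all zones
radosgw-admin sync group create --group-id=group1 --status=allowed
radosgw-admin sync group flow create --group-id=group1 \
    --flow-id=flow-mirror --flow-type=symmetrical --zones='*'
radosgw-admin sync group pipe create --group-id=group1 --pipe-id=pipe1 \
    --source-zones='*' --source-bucket='*' \
    --dest-zones='*' --dest-bucket='*'
radosgw-admin period update --commit

# bucket level: enable replication for bucket1 only
radosgw-admin sync group create --bucket=bucket1 \
    --group-id=bucket1-group --status=enabled
radosgw-admin sync group pipe create --bucket=bucket1 \
    --group-id=bucket1-group --pipe-id=pipe1 \
    --source-zones='*' --dest-zones='*'
```

Bucket-level sync policies take effect without a period commit, which is why only the zonegroup-level changes are committed here.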
Thank you for your time on this @smanjara.
I'm not sure if this would be enough. Basically, as long as ceph/src/rgw/driver/rados/rgw_rados.cc lines 7319 to 7325 (in 20af41d). Currently, log_data will be enabled when we have more than one zone within the RGW's zonegroup (ceph/src/rgw/driver/rados/rgw_zone.cc line 234 in 20af41d) with log_data set to true. That's why the other two commits are needed, I believe.
Right; conceptually I was thinking of the logging and polling mechanisms independently, replaying them based on the filters and the bucket's zonegroup availability. So I guess it should not matter whether the source or the dest is the master.
Right; currently AWS is also designed in a way that you can only replicate an object to a single bucket (zonegroup) and not concurrently to more. But still, that is not enough here: as you said before, the entry processing would be done by all zones no matter what the policy says, while only one will do the actual replication in terms of data.
Currently, the same per-object check based on the available sync policies is done here: ceph/src/rgw/driver/rados/rgw_datalog.cc lines 662 to 664 (in 20af41d). The only difference I'm making now is to limit the filter to the object's properties (prefix and tags) and only log if the pipe has any interest in that particular object (which can be an enabled pipe on the bucket itself or at the zonegroup level). So, the load of processing the sync pipe is already being incurred (the pipe loading has a cache already, though); I'm just adding a minor check based on the filters, which I guess shouldn't cause a significant performance drop. Do you see it another way?
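The per-object filter check described above can be sketched roughly like this. Names and structures are illustrative, not the actual RGW types: an entry is logged only if some enabled pipe's rule filter (prefix plus tag set) matches the object.

```python
def pipe_matches(pipe_filter, obj_key, obj_tags):
    """Return True if the pipe's rule filter is interested in this object."""
    prefix = pipe_filter.get("prefix", "")
    if not obj_key.startswith(prefix):
        return False
    # every tag in the rule's tag set must be present on the object
    for key, value in pipe_filter.get("tags", {}).items():
        if obj_tags.get(key) != value:
            return False
    return True


def should_log(enabled_pipes, obj_key, obj_tags):
    """Log the entry only if at least one enabled pipe matches the object."""
    return any(pipe_matches(p, obj_key, obj_tags) for p in enabled_pipes)
```

Since the pipes are already loaded (and cached) for the existing policy check, the added cost is just the prefix comparison and tag lookups per object.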
yeah, I am looking at it as an extension of multisite where we simply add new zones from other zonegroups seamlessly while 'log_data' is true on all zones. I have added your PR as a topic of discussion here: https://pad.ceph.com/p/rgw-weekly. hopefully you can make it.
There is also another challenge regarding replicating an already-replicated object. Currently, we log and replicate even when an object is being replicated again, which increases the load and adds complexity in managing The main issue is that when an object is replicated via AWS S3 doesn't support this type of replication, as described here: https://docs.aws.amazon.com/AmazonS3/latest/userguide/replication-what-is-isnot-replicated.html

For performance and efficiency, could we consider dropping this replication? Or at least introduce a configuration option to disable this functionality.
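A minimal sketch of the chain-replication guard being proposed, assuming an option like the rgw_data_sync_allow_chain_replication flag added later in this PR. The replication-status marker and attribute names are illustrative: when chain replication is disabled, objects that are themselves replicas are not logged again, so they never fan out to further zones.

```python
REPLICA_STATUS = "REPLICA"  # illustrative marker set on a replicated copy


def should_log_for_sync(obj_attrs, allow_chain_replication):
    """Skip logging an object that arrived via replication when chain
    replication is disabled; otherwise log as usual."""
    is_replica = obj_attrs.get("replication_status") == REPLICA_STATUS
    if is_replica and not allow_chain_replication:
        return False
    return True
```

This mirrors the AWS behavior linked above, where replica copies are not themselves replicated onward.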
This bucket sync state is used for buckets that cannot be replicated to the zonegroup or that have been deleted in the middle of the sync. Since we don't want to keep the bilogs, the zonegroup needs to report a bilog status for them, as the pipe for the source bucket still points to that zone. On bilog trimming, peers that got this status will be excluded from the min generation and min position calculation, as they are not interested in replicating that bucket. Signed-off-by: Seena Fallah <[email protected]>
Indexless buckets do not have bilogs. Signed-off-by: Seena Fallah <[email protected]>
The RGWOp_BILog_List API now reports the last processed marker. This allows destination zones that have no bilog entries to process to update their bucket sync status to the last marker, preventing them from reporting an obsolete marker and blocking trimming. This scenario can occur when multiple rules with filters exist on the source bucket, where some zones may not receive all entries due to log_zones limiting entry processing. These zones still need to follow the markers to stay up-to-date, even if they don't process the actual entries. Signed-off-by: Seena Fallah <[email protected]>
With the introduction of log_zones, a zone might not receive certain datalog entries if they are irrelevant to that zone. To support proper trimming, we now return the last processed marker, allowing zones to report this marker even when they haven't processed those irrelevant entries. This ensures that the source zone can proceed with trimming. Signed-off-by: Seena Fallah <[email protected]>
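The trimming interaction described in the two commits above can be sketched as follows. The marker format and structures are illustrative: the source trims up to the minimum marker reported across all peer zones, and a zone whose entries were all filtered out (via log_zones) still advances to the last processed marker returned by the listing API, so it doesn't pin trimming at an obsolete position.

```python
def advance_marker(entries, last_processed_marker):
    """A zone updates its sync status to the marker of the last entry it
    processed, or jumps to the last processed marker when every entry
    was irrelevant to it (filtered out by log_zones)."""
    if entries:
        return entries[-1]["marker"]
    return last_processed_marker


def trim_position(reported_markers):
    """The source can trim the log up to the minimum marker reported by
    its peers (markers here are comparable, zero-padded strings)."""
    return min(reported_markers.values())
```

Without the last-processed-marker mechanism, a filtered-out zone would keep reporting an old marker and the min() here would never move forward.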
Consider all sources (including resolved sources) in sync info, as some sources, like the ones in another zonegroup, are only included in the resolved sources. Signed-off-by: Seena Fallah <[email protected]>
Rule ID in ReplicationConfiguration is not required; therefore, the pipe id can be empty. This happens mostly on the PutBucketReplication API, as the user would not provide an ID, so a sync pipe is created with an empty ID and radosgw-admin doesn't allow modifying the pipe because of the check. Signed-off-by: Seena Fallah <[email protected]>
With zonegroup replication, buckets can have zones from other zonegroups in the sync as well. This allows considering all available zones defined in the sync pipe, rather than only the ones in the zonegroup. Signed-off-by: Seena Fallah <[email protected]>
When running `radosgw-admin bucket sync run` with only the target bucket specified and no source bucket, RGWGetBucketPeersCR doesn't account for resolved sources from the sync pipe, resulting in no pipes being returned and causing the command to fail. This change ensures that hint sources are considered to avoid this issue. Signed-off-by: Seena Fallah <[email protected]>
Signed-off-by: Seena Fallah <[email protected]>
Per the destination bucket existence check before sync policy creation, we can be sure that there are no policies pointing to my zonegroup from other zonegroups, so we can safely skip this bucket instance if it's not in my zonegroup for full sync. Signed-off-by: Seena Fallah <[email protected]>
This pull request has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs for another 30 days. |
e->exists = true;
e->meta = *m;
e->tag = "tag";
e->log_zones = { rgw_zone_id("1588bb2c-439a-4b75-91ef-f0b31d02563b"), rgw_zone_id("1f9654c2-3a66-4407-b07c-0f6727c9df17") };
Is it intentional to hardcode these UUIDs?
This is a function that generates test data.
type: bool
level: advanced
default: true
desc: This option controls whether replication of already replicated objects (chain replication)
suggest
already-replicated
This pull request has been automatically closed because there has been no activity for 90 days. Please feel free to reopen this pull request (or open a new one) if the proposed change is still appropriate. Thank you for your contribution! |
Config Diff Tool Output
+ added: rgw_data_sync_allow_chain_replication (rgw.yaml.in)
The above configuration changes are found in the PR. Please update the relevant release documentation if necessary.
The current implementation of bucket replication is limited to replication within a zonegroup and does not account for bucket location constraints. To align with AWS's model, this proposal introduces cross-zonegroup bucket replication, respecting location constraints and only replicating based on user requests through the PutBucketReplication API (https://docs.aws.amazon.com/AmazonS3/latest/API/API_PutBucketReplication.html).

In addition, the existing DataLogChanges and BiLogs systems are inefficient for multi-zonegroup scenarios, as they require all zones to process every entry. To improve this, a new property, log_zones, has been introduced for both DataLogChanges and BiLogs. For BiLogs, log_zones will include either all target_zones from the sync policy or the zones selected from the available rules for the bucket that match the criteria (prefix and tagset). For DataLogChanges, it will always include all target zones from the bucket's sync policy.

Both the BiLogs Listing and DataLogChanges Listing APIs now report the last processed marker, allowing zones to report the actual last marker so that the source can trim logs efficiently. DataLogChanges are global across zones, so all zones must sync to the last marker for trimming. BiLogs, however, are specific to the bucket's interest, and zones listing BiLogs must report the correct marker, even in cases where some zones may miss entries (due to the rules' filters).

The rgwx-zonegroup property has been deprecated in favor of rgwx-zone, allowing log filtering based on the requesting zone. If a zonegroup reference is still required, it can be derived from the zone.

Additionally, logging has been optimized to occur only when an active sync pipe exists for the corresponding object. To prevent performance issues caused by circular replication, a new configuration option, rgw_data_sync_allow_chain_replication, has been introduced, allowing control over chain replication and reducing redundant logging.

Buckets created in other zonegroups will now operate in an indexless mode, avoiding unnecessary operations and reporting on non-owned buckets.

Finally, since sync pipes can now encompass all zones across zonegroups, the wildcard (*) configuration for pipes is no longer effective. When pipes are bucket-specific, the wildcard is automatically translated into the available zones from the bucket's zonegroup, while still allowing updates to the policy if the bucket is recreated in a different zonegroup.

Fixes: https://tracker.ceph.com/issues/66649
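The log_zones filtering and last-marker reporting described in this summary can be sketched together. Structures and field names are illustrative, not the actual RGW types: a zone identifies itself (via the proposed rgwx-zone parameter), the source returns only the entries whose log_zones include that zone, and it also returns the last marker seen so the caller can advance its sync status even when everything was filtered out.

```python
def list_log_for_zone(entries, requesting_zone):
    """Filter log entries by log_zones for the requesting zone.

    Returns (relevant_entries, last_marker), where last_marker is the
    marker of the final entry in the full log (or None if empty), so a
    zone with no relevant entries can still report progress for trimming.
    """
    relevant = [e for e in entries if requesting_zone in e["log_zones"]]
    last_marker = entries[-1]["marker"] if entries else None
    return relevant, last_marker
```

A zone in a different zonegroup that matches no rules would receive an empty list plus the last marker, and could report that marker immediately instead of blocking log trimming on the source.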
Related PRs: