rgw: allow controlling chain data replication #61140
When chain replication is enabled, an object that has been replicated to a destination zone can be replicated further to other zones. Disabling it prevents chain replication, which avoids the potential performance issues and redundant logging caused by circular replication patterns between zones. This is useful where chain replication is unnecessary or undesirable for optimizing sync operations.

Introduce a new zonegroup feature called "data-sync-disable-chain-replication" that disables this type of replication for new deployments. AWS currently does not support this kind of replication either: https://docs.aws.amazon.com/AmazonS3/latest/userguide/replication-what-is-isnot-replicated.html

Fixes: https://tracker.ceph.com/issues/69310
Signed-off-by: Seena Fallah <[email protected]>

Only add the replica replication status when fetch_remote_obj() is called by data sync, not by copy obj.

Signed-off-by: Seena Fallah <[email protected]>
Extracted from #59911
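To make the intended behavior concrete, here is a minimal Python sketch of the decision the description implies. It is not the RGW implementation: only the feature name comes from the commit message, while `ObjectEntry`, `is_replica`, and `should_replicate` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import AbstractSet

# Feature name as given in the commit message; everything else is hypothetical.
DISABLE_CHAIN = "data-sync-disable-chain-replication"


@dataclass
class ObjectEntry:
    key: str
    # True if this copy arrived in the zone via data sync (it is a replica),
    # False if a client wrote it to this zone directly.
    is_replica: bool


def should_replicate(entry: ObjectEntry, zonegroup_features: AbstractSet[str]) -> bool:
    """Return True if the zone should replicate this object onward."""
    if entry.is_replica and DISABLE_CHAIN in zonegroup_features:
        # Chain replication is disabled: a replica is never forwarded.
        return False
    return True


original = ObjectEntry(key="photo.jpg", is_replica=False)
replica = ObjectEntry(key="photo.jpg", is_replica=True)

# Feature off: replicas may be forwarded to further zones (chaining).
print(should_replicate(replica, set()))
# Feature on: only objects written directly to the zone are replicated.
print(should_replicate(replica, {DISABLE_CHAIN}))
print(should_replicate(original, {DISABLE_CHAIN}))
```

The sketch also shows why the second commit matters under this assumption: if a plain copy operation marked its result as a replica, that object would wrongly be excluded from replication once the feature is on.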
anthonyeleven
left a comment
Since AWS does not support chain replication, this feature is enabled by default for new deployments but requires existing deployments to opt-in manually.
One might read this and be unsure if "the feature" means chain replication, or preventing chain replication.
I might suggest naming the feature feature_data_sync_enable_chain_replication to be more congruent with others and to avoid double negatives. With appropriate defaults (on for new deployments, off for existing?)
I also felt uncomfortable with the naming, but the problem is that chain replication is already enabled for existing deployments. We want to disable it by default for new deployments and let existing ones opt in manually.
on the subject of cross-zonegroup replication, one of our scaling challenges relates to the need for every zone to sync from every other, leading to quadratic growth in sync relationships as you add zones. one of my ideas to handle this was to run cross-zonegroup replication only between the master zones of each zonegroup. this model would rely heavily on the chaining behavior

consider two zonegroups with two zones each (A1 and A2 in zonegroup A, B1 and B2 in zonegroup B), and some bucket located on zonegroup A with a bucket replication policy pointing to a destination bucket located on zonegroup B

objects uploaded to A1 would replicate normally to A2 and cross-zonegroup replicate to B1, then chain replicate to B2

the advantage of this model is that we greatly reduce the number of zone sync relationships (and associated log polling traffic). the disadvantage is that extra hops increase the time-to-sync to some zones
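The trade-off in this comment can be checked with a small Python sketch. It assumes A1 and B1 are the master zones and models each sync relationship as a directed (source, destination) pair. The function names are hypothetical and nothing here reflects actual RGW code.

```python
from collections import deque
from itertools import permutations

# Topology from the comment: two zonegroups with two zones each.
# Treating A1 and B1 as the master zones is an assumption.
zonegroups = {"A": ["A1", "A2"], "B": ["B1", "B2"]}
masters = {"A": "A1", "B": "B1"}


def full_mesh_links(zonegroups):
    """Every zone syncs from every other zone: n * (n - 1) directed links."""
    zones = [z for members in zonegroups.values() for z in members]
    return set(permutations(zones, 2))


def master_only_links(zonegroups, masters):
    """Full mesh inside each zonegroup, cross-zonegroup links between masters only."""
    links = set()
    for members in zonegroups.values():
        links.update(permutations(members, 2))
    links.update(permutations(masters.values(), 2))
    return links


def hops_from(source, links):
    """Breadth-first search: number of replication hops from source to each zone."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        current = queue.popleft()
        for src, dst in links:
            if src == current and dst not in dist:
                dist[dst] = dist[current] + 1
                queue.append(dst)
    return dist


mesh = full_mesh_links(zonegroups)
reduced = master_only_links(zonegroups, masters)

print(len(mesh), len(reduced))    # 12 6
print(hops_from("A1", reduced))   # B2 is two hops away, all other zones one
```

For this four-zone example the master-only model halves the number of sync relationships (6 instead of 12), while an object uploaded to A1 needs two hops to reach B2. That second hop is exactly the chain replication step the model depends on.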
i tend to think we should adopt this AWS behavior for bucket replication policy only
I had a helper here (https://github.com/clwluvw/ceph/blob/rgw-zonegroup-replication/src/rgw/rgw_common.cc#L3231-L3244) to determine the destination, but it is very specific to the cross-zonegroup feature. I may need to introduce a similar helper under a different name, then deprecate it there later if this lands first.

On the other hand, it might not make much sense to deliver this before the cross-zonegroup feature. I guess we should either include it there or land it after the cross-zonegroup feature.
This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved.

I'm happy with either. Please either close this PR in favor of incorporating it into the cross-zonegroup feature or convert this to a draft, add a note explaining what it's waiting on, and tag it pinned until that merges in.