rgw: allow controlling chain data replication #61140

Draft
clwluvw wants to merge 2 commits into ceph:main from clwluvw:rgw-chain-replication

@clwluvw
Member

@clwluvw clwluvw commented Dec 19, 2024

When chain replication is enabled, an object that has been replicated to a destination zone can be further replicated to other zones. Disabling this option prevents chain replication, avoiding potential performance issues and redundant logging caused by circular replication patterns between zones. This can be useful in scenarios where chain replication is unnecessary or undesirable for optimizing sync operations.

Introduced a new zonegroup feature called "data-sync-disable-chain-replication" to disable this type of replication for the new deployments. Currently AWS also doesn't support such a replication: https://docs.aws.amazon.com/AmazonS3/latest/userguide/replication-what-is-isnot-replicated.html
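
To illustrate the intended behavior, here is a minimal sketch of the decision described above. It is not the RGW implementation: `ObjectEntry`, `is_replica`, and `should_replicate` are hypothetical names, and the sketch assumes a sync source can tell whether its copy of an object arrived via replication.

```python
from dataclasses import dataclass


@dataclass
class ObjectEntry:
    """Hypothetical view of an object as seen by a sync source."""
    key: str
    is_replica: bool  # True if this copy arrived via replication, not a client write


def should_replicate(entry: ObjectEntry, chain_disabled: bool) -> bool:
    """Return True if this entry should be synced onward to other zones.

    With chain replication disabled, replicas are never forwarded again,
    so an object cannot bounce between zones in a cycle.
    """
    if chain_disabled and entry.is_replica:
        return False
    return True


client_write = ObjectEntry(key="photo.jpg", is_replica=False)
replica = ObjectEntry(key="photo.jpg", is_replica=True)
```

Under these assumptions, a client write is always forwarded, while a replica is forwarded only when chaining is left enabled.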

Fixes: https://tracker.ceph.com/issues/69310

When chain replication is enabled, an object that has been
replicated to a destination zone can be further replicated
to other zones. Disabling this option prevents chain
replication, avoiding potential performance issues and
redundant logging caused by circular replication patterns
between zones. This can be useful in scenarios where chain
replication is unnecessary or undesirable for optimizing
sync operations.

Introduced a new zonegroup feature called "data-sync-disable-chain-replication"
to disable this type of replication for the new deployments.
Currently AWS also doesn't support such a replication:
https://docs.aws.amazon.com/AmazonS3/latest/userguide/replication-what-is-isnot-replicated.html

Fixes: https://tracker.ceph.com/issues/69310
Signed-off-by: Seena Fallah <[email protected]>
Only add the replication status of replica when fetch_remote_obj() was
called by data sync not copy obj.

Signed-off-by: Seena Fallah <[email protected]>
@clwluvw clwluvw requested review from a team as code owners December 19, 2024 00:58
@clwluvw clwluvw changed the title Rgw chain replication rgw: allow controlling chain data replication Dec 19, 2024
@clwluvw
Member Author

clwluvw commented Dec 19, 2024

Extracted from #59911

Contributor

@anthonyeleven anthonyeleven left a comment

Since AWS does not support chain replication, this feature is enabled by default for new deployments but requires existing deployments to opt-in manually.

One might read this and be unsure if "the feature" means chain replication, or preventing chain replication.

I might suggest naming the feature feature_data_sync_enable_chain_replication to be more congruent with others and to avoid double negatives. With appropriate defaults (on for new deployments, off for existing?)

@clwluvw
Member Author

clwluvw commented Dec 19, 2024

I might suggest naming the feature feature_data_sync_enable_chain_replication to be more congruent with others and to avoid double negatives. With appropriate defaults (on for new deployments, off for existing?)

I also felt uncomfortable with the naming, but the problem is that chain replication is already the behavior in existing deployments. We want to disable it by default for new deployments and let existing ones opt in manually.
When checking for a feature, what matters is mostly whether the feature "exists" in the feature set. So, for backward compatibility and to avoid breaking existing deployments, we should go with something whose default is false and that can become true later on.
So I guess we need either something negative here or entirely different wording that explains the feature (like one-shot-replication).
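
A rough sketch of why the flag ends up negative, assuming features are tracked as a set of enabled names. The helper `chain_replication_allowed` and the placeholder `some-other-feature` are hypothetical; only the flag name comes from the PR description.

```python
from typing import AbstractSet

# Flag name taken from the PR description.
DISABLE_CHAIN = "data-sync-disable-chain-replication"


def chain_replication_allowed(enabled_features: AbstractSet[str]) -> bool:
    """Chaining stays on unless the zonegroup explicitly carries the flag.

    An upgraded zonegroup has never heard of the flag, so it is absent
    from the set and the old behavior is preserved without any migration.
    """
    return DISABLE_CHAIN not in enabled_features


# Hypothetical feature sets for illustration only.
existing_zonegroup = {"some-other-feature"}
new_zonegroup = {"some-other-feature", DISABLE_CHAIN}
```

With a positive flag such as "enable chain replication", the absence of the flag on an upgraded zonegroup would silently turn chaining off, which is the breakage described above.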

@cbodley
Contributor

cbodley commented Dec 19, 2024

on the subject of cross-zonegroup replication, one of our scaling challenges relates to the need for every zone to sync from every other, leading to quadratic scaling as you add zones. one of my ideas to handle this was to run cross-zonegroup replication only between the master zones of each zonegroup. this model would rely heavily on the chaining behavior

consider two zonegroups with two zones each:

zonegroup A: (master)
  zone A1 (master)
  zone A2
zonegroup B:
  zone B1 (master)
  zone B2

and some bucket located on zonegroup A with a bucket replication policy pointing to a destination bucket located on zonegroup B

objects uploaded to A1 would replicate normally to A2 and cross-zonegroup replicate to B1, then chain replicate to B2
objects uploaded to A2 would replicate normally to A1, then chain cross-zonegroup replicate to B1, then chain again to B2

the advantage of this model is that we greatly reduce the number of zone sync relationships (and associated log polling traffic). the disadvantage is that extra hops increase the time-to-sync to some zones
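
as a rough illustration of that trade-off, the sketch below counts directed sync relationships for the four-zone example and measures hops with a breadth-first search. it is a toy model, not RGW code: `zonegroups`, `full_mesh`, `master_only` and `hops` are hypothetical names, and it assumes cross-zonegroup sync runs in both directions between masters.

```python
from collections import deque
from itertools import permutations

# Topology from the example above; the first zone in each list is the master.
zonegroups = {"A": ["A1", "A2"], "B": ["B1", "B2"]}
masters = [zones[0] for zones in zonegroups.values()]
all_zones = [z for zones in zonegroups.values() for z in zones]

# Full mesh: every zone syncs from every other zone, as (source, destination) pairs.
full_mesh = set(permutations(all_zones, 2))

# Master-only model: full mesh inside each zonegroup, masters only across zonegroups.
master_only = set()
for zones in zonegroups.values():
    master_only |= set(permutations(zones, 2))
master_only |= set(permutations(masters, 2))


def hops(source, edges):
    """Breadth-first search: number of sync hops from source to each reachable zone."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        cur = queue.popleft()
        for src, dst in edges:
            if src == cur and dst not in dist:
                dist[dst] = dist[cur] + 1
                queue.append(dst)
    return dist


print(len(full_mesh), len(master_only))  # 12 6
print(hops("A2", master_only))
```

in this toy model the relationship count drops from 12 to 6, while an object uploaded to A2 needs three hops to reach B2 (A2, A1, B1, B2) instead of one.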

@cbodley
Contributor

cbodley commented Dec 19, 2024

Currently AWS also doesn't support such a replication

i tend to think we should adopt this AWS behavior for bucket replication policy only

@clwluvw
Member Author

clwluvw commented Dec 19, 2024

i tend to think we should adopt this AWS behavior for bucket replication policy only

I had a helper here (https://github.com/clwluvw/ceph/blob/rgw-zonegroup-replication/src/rgw/rgw_common.cc#L3231-L3244) to determine the destination, but it is specific to the cross-zonegroup feature. I probably need to introduce a similar helper under a different name and deprecate the original later if this lands first.

@clwluvw
Member Author

clwluvw commented Dec 19, 2024

On the other hand, it might not make much sense to deliver this before the cross-zonegroup feature. I guess we should either include it there or land it after the cross-zonegroup feature.

@github-actions

github-actions bot commented Feb 5, 2025

This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved

@adamemerson
Contributor

On the other hand, it might not make much sense to deliver this before the cross-zonegroup feature. I guess we should either include it there or land it after the cross-zonegroup feature.

I'm happy with either. Please either close this PR in favor of incorporating it into the cross-zonegroup feature or convert this to a draft, add a note explaining what it's waiting on, and tag it pinned until that merges in.

@clwluvw clwluvw marked this pull request as draft March 5, 2025 21:56
@clwluvw clwluvw added the pinned label Mar 5, 2025
Labels

documentation, needs-rebase, pinned, rgw