Skip to content

Comments

rgw: only skip versioned to non-versioned buckets#62642

Closed
clwluvw wants to merge 5 commits intoceph:mainfrom
clwluvw:mismatch-versioning-fix
Closed

rgw: only skip versioned to non-versioned buckets#62642
clwluvw wants to merge 5 commits intoceph:mainfrom
clwluvw:mismatch-versioning-fix

Conversation

@clwluvw
Copy link
Member

@clwluvw clwluvw commented Apr 3, 2025

Since replication entries are compared against the current state of the bucket, delays—regardless of the cause—can lead to inconsistencies when a bucket transitions from non-versioned to versioned. Entries created before versioning was enabled may not have version IDs, which could result in discrepancies.

To address this, we will allow replication of non-versioned objects to versioned buckets as null versions while skipping only those entries that are versioned when the destination bucket is non-versioned.

The same approach applies to suspended versioning. Since suspended buckets generate objects with a null version ID, replication will be permitted to both versioned and non-versioned destinations.

Fixes: https://tracker.ceph.com/issues/70728 & https://tracker.ceph.com/issues/70486
Will be rebased with #62637

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)

@clwluvw clwluvw requested a review from a team as a code owner April 3, 2025 02:20
@clwluvw clwluvw requested review from cbodley and smanjara April 3, 2025 02:22
Copy link
Contributor

@cbodley cbodley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, the pr description sounds very reasonable. it looks like the tests have solid coverage of creates, deletes, delete markers 👍

# by default, wait up to 5 minutes before giving up on a sync checkpoint
self.checkpoint_retries = kwargs.get('checkpoint_retries', 60)
self.checkpoint_delay = kwargs.get('checkpoint_delay', 5)
self.checkpoint_delay = kwargs.get('checkpoint_delay', 15)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why was this part necessary? it seems like it has the potential to slow down the multisite tests in general

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I observed an unexpected state during delete replication, where it seemed that two stages of data sync were required, and invoking zone_data_checkpoint() alone wasn't sufficient.

I haven't investigated it in detail yet, but it might be related to the presence of two sync pipelines—one between zones and another between buckets—which could affect how shards are processed or updated. Increasing the timeout might have compensated for the first checkpoint.

For now, I've reverted the timeout to the default value of 5 to see how Teuthology handles it. If it fails there as well, I'll dive deeper to identify the root cause and explore a proper fix without relying on timeout adjustments.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

perhaps a zonegroup_bucket_checkpoint() would work better? the data checkpoints are based on data sync status, which doesn't take the 'error repo' into account if retries are necessary

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

didn’t help. The errors in the ‘error repo’ seem to be mostly related to resharding if I'm not mistaken, so it might not be the case here.

@github-actions
Copy link

github-actions bot commented Apr 4, 2025

This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved

@clwluvw
Copy link
Member Author

clwluvw commented Apr 7, 2025

@clwluvw
Copy link
Member Author

clwluvw commented Apr 7, 2025

jenkins test make check

@clwluvw
Copy link
Member Author

clwluvw commented Apr 7, 2025

jenkins test make check arm64

@github-actions
Copy link

This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved

Comment on lines +4478 to +4438
// make sure versioned object only lands on versioned bucket
if (!key.instance.empty() && !sync_pipe.dest_bucket_info.versioned()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we know whether key.instance.empty() matches null versions? is this behavior the same for full- vs incremental sync?

i see RGWBucketSyncSingleEntryCR has a bool versioned that comes from rgw_bi_log_entry::is_versioned(), along with a bool null_verid from rgw_bi_log_entry::is_null_verid()

maybe those are relevant here, although RGWBucketFullSyncCR passes false for both

Copy link
Member Author

@clwluvw clwluvw Apr 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we know whether key.instance.empty() matches null versions?

yes it does.

there was a discussion around those vars here: #62594
but what i could understand eventually was, is_null_verid is only implemented for delete (#62594 (comment)), and for is_versioned() for null version id it results in false.

@github-actions
Copy link

This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved

clwluvw added 5 commits April 29, 2025 17:08
Reject PutBucketReplication calls if versioning is not identical
between the source and destination buckets. This check also applies
to object lock configurations to ensure consistency.

Fixes: https://tracker.ceph.com/issues/70486
Signed-off-by: Seena Fallah <[email protected]>
replicating lock enabled objects require the destination bucket to
have object lock enabled.

Fixes: https://tracker.ceph.com/issues/70486
Signed-off-by: Seena Fallah <[email protected]>
…d buckets

When a bucket transitions from non-versioned to versioned, there
may be a delay in synchronizing this state and the already ongoing
data replication of non-versioned objects. During this period, some
older non-versioned objects may still need to be replicated to the
now-versioned destination bucket.

To handle this, non-versioned objects can be replicated to a
versioned bucket with a null version, ensuring no data is lost
during the transition. However, replication should still skip
versioned objects when syncing to a non-versioned bucket.

Signed-off-by: Seena Fallah <[email protected]>
…ed buckets

To properly replicate delete requests from a time when a bucket was
non-versioned before transitioning to versioned, such deletes should
be applied to the null version in the destination bucket.

we no need to set version to true for CLS_RGW_OP_UNLINK_INSTANCE as
after 92002be those bilogs will have RGW_BILOG_FLAG_VERSIONED_OP flag.

Signed-off-by: Seena Fallah <[email protected]>
@github-actions
Copy link

This pull request has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs for another 30 days.
If you are a maintainer or core committer, please follow-up on this pull request to identify what steps should be taken by the author to move this proposed change forward.
If you are the author of this pull request, thank you for your proposed contribution. If you believe this change is still appropriate, please ensure that any feedback has been addressed and ask for a code review.

@github-actions github-actions bot added the stale label Jun 28, 2025
@github-actions
Copy link

This pull request has been automatically closed because there has been no activity for 90 days. Please feel free to reopen this pull request (or open a new one) if the proposed change is still appropriate. Thank you for your contribution!

@github-actions github-actions bot closed this Jul 28, 2025
@clwluvw clwluvw reopened this Jul 28, 2025
@clwluvw clwluvw requested a review from cbodley July 28, 2025 18:04
@clwluvw clwluvw removed the stale label Jul 28, 2025
@github-actions
Copy link

This pull request has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs for another 30 days.
If you are a maintainer or core committer, please follow-up on this pull request to identify what steps should be taken by the author to move this proposed change forward.
If you are the author of this pull request, thank you for your proposed contribution. If you believe this change is still appropriate, please ensure that any feedback has been addressed and ask for a code review.

@github-actions github-actions bot added the stale label Sep 26, 2025
@clwluvw clwluvw removed the stale label Oct 6, 2025
@github-actions
Copy link

github-actions bot commented Dec 5, 2025

This pull request has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs for another 30 days.
If you are a maintainer or core committer, please follow-up on this pull request to identify what steps should be taken by the author to move this proposed change forward.
If you are the author of this pull request, thank you for your proposed contribution. If you believe this change is still appropriate, please ensure that any feedback has been addressed and ask for a code review.

@github-actions github-actions bot added the stale label Dec 5, 2025
@github-actions
Copy link

github-actions bot commented Jan 4, 2026

This pull request has been automatically closed because there has been no activity for 90 days. Please feel free to reopen this pull request (or open a new one) if the proposed change is still appropriate. Thank you for your contribution!

@github-actions github-actions bot closed this Jan 4, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants