Bug #61359

closed

Consistency bugs with OLH objects

Added by Cory Snyder almost 3 years ago. Updated 8 months ago.

Status: Resolved
Priority: Urgent
Assignee:
Target version: -
% Done: 100%
Pull request ID:
Tags (freeform):
Fixed In: v18.0.0-5005-g21ad52551d3
Released In: v19.2.0~1988
Upkeep Timestamp: 2025-07-12T10:40:43+00:00

Description

When a PUT request is waiting on a reshard, it does not properly update its bucket reference after the reshard completes and fails after storing the object instance, but before linking it into the bucket index. The object is left stored on disk and accounted for in the bucket stats, but it is not visible in bucket listings. The request also initializes the OLH RADOS object but never adds the user.rgw.olh.info xattr (which informs the is_olh() predicate). As a result, future GET requests for that key return a 200 with an empty object, since the OLH is treated as a plain unversioned object. This can wreak havoc on clients that use well-known keys to store formatted data and fail to parse an unexpectedly empty object.
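The failure sequence described above can be sketched with a toy in-memory model. This is only an illustration of the ordering of steps, not RGW code: the ToyBucket class, the "key@version" instance naming, the num_objects counter, and the fail_before_link switch are all assumptions made for the example. Only the user.rgw.olh.info xattr name and the is_olh() predicate come from the description.

```python
from dataclasses import dataclass, field

OLH_INFO_XATTR = "user.rgw.olh.info"  # xattr named in the bug description


@dataclass
class RadosObject:
    """Toy stand-in for a RADOS object: a payload plus xattrs."""
    data: bytes = b""
    xattrs: dict = field(default_factory=dict)


def is_olh(obj: RadosObject) -> bool:
    # Per the description, the predicate is informed by this xattr.
    return OLH_INFO_XATTR in obj.xattrs


class ToyBucket:
    """Hypothetical model of a versioned bucket (not the RGW implementation)."""

    def __init__(self):
        self.pool = {}        # RADOS object name -> RadosObject
        self.index = {}       # key -> linked instance name (drives listings)
        self.num_objects = 0  # toy bucket stats counter

    def put(self, key, version, body, fail_before_link=False):
        instance = f"{key}@{version}"                  # assumed naming scheme
        self.pool[instance] = RadosObject(data=body)   # 1. store the instance
        self.num_objects += 1                          #    stats are updated
        self.pool.setdefault(key, RadosObject())       # 2. initialize the OLH
        if fail_before_link:
            # Models the request failing on its stale bucket reference.
            raise FileNotFoundError("ENOENT: stale bucket reference")
        self.pool[key].xattrs[OLH_INFO_XATTR] = instance.encode()  # 3. link
        self.index[key] = instance

    def get(self, key):
        head = self.pool[key]
        if is_olh(head):
            target = head.xattrs[OLH_INFO_XATTR].decode()
            return 200, self.pool[target].data
        # Without the xattr the head is served as a plain unversioned object.
        return 200, head.data

    def list_keys(self):
        return sorted(self.index)


if __name__ == "__main__":
    bucket = ToyBucket()
    try:
        bucket.put("config.json", "v1", b'{"a": 1}', fail_before_link=True)
    except FileNotFoundError:
        pass
    print(bucket.get("config.json"))  # (200, b'')
    print(bucket.list_keys())         # []
    print(bucket.num_objects)         # 1
```

In this model the interrupted PUT leaves one instance object in the pool and one object counted in the stats, while the listing stays empty and the GET returns a 200 with an empty body.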

This was fixed on master and in Reef as part of the multi-site changes [1], but we could use a test case to ensure there are no future regressions on those branches. We need backports of [1] for Quincy and Pacific.

There is also a need for index cleanup tooling since buckets affected by this issue have inconsistent stats, inconsistent OLH RADOS objects, and dark data instance objects.
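As a rough sketch of what such tooling would have to detect, the three symptoms listed above can be expressed as set comparisons. The function name, its inputs, and the report keys below are hypothetical; a real tool would have to gather this information from the data pool and the bucket index, which is not shown here.

```python
def find_inconsistencies(pool_instances, index_entries, olh_heads, stats_num_objects):
    """Classify the three symptoms from the description (toy model).

    pool_instances:    set of instance object names present in the data pool
    index_entries:     dict of key -> set of instance names linked in the index
    olh_heads:         dict of key -> True if the head carries the OLH info xattr
    stats_num_objects: object count recorded in the bucket stats
    """
    linked = set().union(*index_entries.values())
    return {
        # Instance objects on disk that no index entry points at (dark data).
        "unlinked_instances": sorted(pool_instances - linked),
        # OLH heads that were initialized but never received the xattr.
        "olh_missing_xattr": sorted(k for k, ok in olh_heads.items() if not ok),
        # Objects counted in the stats but not reachable through the index.
        "stats_delta": stats_num_objects - len(linked),
    }


if __name__ == "__main__":
    report = find_inconsistencies(
        pool_instances={"a@v1", "a@v2", "b@v1"},
        index_entries={"a": {"a@v1"}},
        olh_heads={"a": True, "b": False},
        stats_num_objects=3,
    )
    print(report)
```

Keeping detection separate from repair, as in this sketch, lets an operator review the report before anything is deleted or rewritten.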

[1] https://github.com/ceph/ceph/commit/f57973725feeaa84321884c8eebc048989446572


Related issues 8 (1 open, 7 closed)

Related to rgw - Bug #50552: rgw: set_olh return -2 when resharding (Triaged, Mark Kogan)
Related to rgw - Bug #59663: rgw: expired delete markers created by deleting non-existant object multiple times are not being removed from data pool after deletion from bucket (Resolved, Cory Snyder)
Related to rgw - Bug #59164: LC rules cause latency spikes (Can't reproduce)
Related to rgw - Bug #61710: quincy/pacific: PUT requests during reshard of versioned bucket fail with 404 and leave behind dark data (Won't Fix, Cory Snyder)
Related to rgw - Bug #62075: New radosgw-admin commands to cleanup leftover OLH index entries and unlinked instance objects (Resolved, Cory Snyder)
Copied to rgw - Backport #62064: pacific: Consistency bugs with OLH objects (Resolved, Cory Snyder)
Copied to rgw - Backport #62065: reef: Consistency bugs with OLH objects (Resolved, Cory Snyder)
Copied to rgw - Backport #62066: quincy: Consistency bugs with OLH objects (Resolved, Cory Snyder)

Actions #1

Updated by Cory Snyder almost 3 years ago

  • Affected Versions v16.0.0, v16.0.1, v16.1.0, v16.1.1, v16.2.0, v16.2.1, v16.2.10, v16.2.11, v16.2.12, v16.2.13, v16.2.2, v16.2.3, v16.2.4, v16.2.5, v16.2.6, v16.2.7, v16.2.8, v16.2.9, v17.0.0, v17.2.1, v17.2.2, v17.2.3, v17.2.4, v17.2.5 added
Actions #2

Updated by Cory Snyder almost 3 years ago

  • Pull request ID set to 51700
Actions #3

Updated by Casey Bodley almost 3 years ago

  • Related to Bug #50552: rgw: set_olh return -2 when resharding added
Actions #4

Updated by Cory Snyder over 2 years ago

  • Related to Bug #59663: rgw: expired delete markers created by deleting non-existant object multiple times are not being removed from data pool after deletion from bucket added
Actions #5

Updated by Cory Snyder over 2 years ago

  • Related to Bug #59164: LC rules cause latency spikes added
Actions #6

Updated by Casey Bodley over 2 years ago

  • Status changed from New to Fix Under Review
  • Backport changed from pacific quincy to pacific quincy reef

tagged for reef since we'll at least want the recovery command there

Actions #7

Updated by Cory Snyder over 2 years ago

On further investigation, I found that the previously referenced commit [1] was not responsible for fixing this scenario on main/reef. That commit actually resolved a different PUT 404 scenario that did not affect earlier releases.

The actual reason this issue isn't observed on main/reef is [2]. Because the bucket instance ID no longer changes during resharding, there is no old bucket instance metadata object to remove. When the ID did change, the attempt to retrieve the bucket instance metadata object associated with the old bucket instance was what caused the ENOENT error.

[1] https://github.com/ceph/ceph/commit/f57973725feeaa84321884c8eebc048989446572
[2] https://github.com/ceph/ceph/pull/39002/commits/7348a8397af99752fd64ce0a44a95a405c6b9e3e
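
The difference between the two resharding behaviours described in this comment can be illustrated with a small model. Everything here is an assumption made for the example: MetadataStore, the "bucket:N" instance IDs, the shard counts, and both reshard functions are hypothetical and only mirror the explanation above, in which a request holding a reference to the old bucket instance looks up metadata that has been removed.

```python
class MetadataStore:
    """Toy store of bucket instance metadata, keyed by instance ID."""

    def __init__(self):
        self.instances = {}  # instance ID -> number of index shards

    def get(self, instance_id):
        if instance_id not in self.instances:
            raise FileNotFoundError(f"ENOENT: {instance_id}")
        return self.instances[instance_id]


def reshard_new_instance(store, old_id, new_id, shards):
    """Earlier behaviour as described: a new instance ID is created
    and the old bucket instance metadata object is removed."""
    store.instances[new_id] = shards
    del store.instances[old_id]
    return new_id


def reshard_in_place(store, instance_id, shards):
    """Behaviour after [2] as described: the instance ID does not change,
    so there is no old metadata object to remove."""
    store.instances[instance_id] = shards
    return instance_id


if __name__ == "__main__":
    held_id = "bucket:1"  # reference held by a PUT waiting on the reshard

    old_style = MetadataStore()
    old_style.instances[held_id] = 11
    reshard_new_instance(old_style, held_id, "bucket:2", 23)
    try:
        old_style.get(held_id)
    except FileNotFoundError as err:
        print("stale reference fails:", err)

    in_place = MetadataStore()
    in_place.instances[held_id] = 11
    reshard_in_place(in_place, held_id, 23)
    print("same ID still resolves:", in_place.get(held_id))
```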

Actions #8

Updated by Cory Snyder over 2 years ago

  • Subject changed from PUT requests during reshard of versioned bucket fail with 404 and leave behind dark data to Consistency bugs with OLH objects
Actions #9

Updated by Cory Snyder over 2 years ago

  • Related to Bug #61710: quincy/pacific: PUT requests during reshard of versioned bucket fail with 404 and leave behind dark data added
Actions #10

Updated by Casey Bodley over 2 years ago

  • Status changed from Fix Under Review to Pending Backport
Actions #11

Updated by Upkeep Bot over 2 years ago

  • Copied to Backport #62064: pacific: Consistency bugs with OLH objects added
Actions #12

Updated by Upkeep Bot over 2 years ago

Actions #13

Updated by Upkeep Bot over 2 years ago

  • Copied to Backport #62066: quincy: Consistency bugs with OLH objects added
Actions #15

Updated by Cory Snyder over 2 years ago

  • Related to Bug #62075: New radosgw-admin commands to cleanup leftover OLH index entries and unlinked instance objects added
Actions #16

Updated by Konstantin Shalygin almost 2 years ago

  • Status changed from Pending Backport to Resolved
  • % Done changed from 0 to 100
Actions #17

Updated by Upkeep Bot 8 months ago

  • Merge Commit set to 21ad52551d325ba83705d2c18247dddb177a798e
  • Fixed In set to v18.0.0-5005-g21ad52551d3
  • Released In set to v19.2.0~1988
  • Upkeep Timestamp set to 2025-07-12T10:40:43+00:00