Skip to content

osd/scrub: use an AsyncReserver to handle scrub reservations on the replica side#55340

Merged
ronen-fr merged 10 commits intoceph:mainfrom
ronen-fr:wip-rf-reserver2
Jan 31, 2024
Merged

osd/scrub: use an AsyncReserver to handle scrub reservations on the replica side#55340
ronen-fr merged 10 commits intoceph:mainfrom
ronen-fr:wip-rf-reserver2

Conversation

@ronen-fr
Copy link
Contributor

@ronen-fr ronen-fr commented Jan 28, 2024

That means that when trying to reserve its replicas for scrubbing, the primary sends a reservation request
with a 'queue this request' flag set. That request is queued at the scrub-reserver, and granted once
the number of concurrent 'remote reservations' falls below the configured threshold.

@ljflores
Copy link
Member

jenkins test make check

@ljflores
Copy link
Member

jenkins test api

@github-actions
Copy link

This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved

@ronen-fr ronen-fr force-pushed the wip-rf-reserver2 branch 2 times, most recently from 769d16e to 22d93ae Compare January 29, 2024 12:35
@ronen-fr
Copy link
Contributor Author

jenkins test make check

@ronen-fr
Copy link
Contributor Author

jenkins retest this please

Copy link
Contributor

@athanatos athanatos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comments are mainly related to callback invalidation, pretty close. PR structure was super easy to review, thanks!

ronen-fr and others added 4 commits January 30, 2024 02:56
Based on Sam's Crimson FSM's 'value-event'.

Co-authored-by: Sam Just <[email protected]>
Signed-off-by: Ronen Friedman <[email protected]>
As no callbacks are needed for request_reservation_or_fail(),
the synchronous request API.

Signed-off-by: Ronen Friedman <[email protected]>
@ronen-fr ronen-fr force-pushed the wip-rf-reserver2 branch 4 times, most recently from 74cc8bb to 3364a92 Compare January 30, 2024 13:24
@ronen-fr ronen-fr requested a review from athanatos January 30, 2024 13:33
@ronen-fr
Copy link
Contributor Author

@athanatos - ready for re-review.

@ronen-fr
Copy link
Contributor Author

jenkins retest this please

DECLARE_LOCALS; // 'scrbr' & 'pg_id' aliases
const auto m = ev.m_op->get_req<MOSDScrubReserve>();
const auto msg_nonce = m->reservation_nonce;
pending_reservation_nonce = m->reservation_nonce;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ceph_assert(pending_reservation_nonce == 0);

Setting pending_reservation_nonce is appropriate for the if (async_request) branch below, but not the else branch. Move this into the if branch to preserve the invariant that pending_reservation_nonce != 0 if and only if there is an pending reservation in the reserver.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. A bug introduced by too many changes in a short time

reply = new MOSDScrubReserve(
spg_t(pg_id.pgid, m_pg->get_primary().shard), ev.m_op->sent_epoch,
MOSDScrubReserve::GRANT, m_pg->pg_whoami, msg_nonce);
MOSDScrubReserve::GRANT, m_pg->pg_whoami, pending_reservation_nonce);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See above, use m->reservation_nonce as pending_reservation_nonce should not have been set in this branch.

"{}: reservation_nonce mismatch: {} != {}", __func__, reservation.nonce,
pending_reservation_nonce);
dout(1) << msg << dendl;
scrbr->get_clog()->error() << msg;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't a necessarily a bug. Depending on timing, a cancellation may occur between when the reserver puts the callback into the finisher and when the callback gets the PG lock. That callback will then see a different pending_reservation_nonce (likely though not necessarily 0) by design. Let's not log this to the central logger -- dout(5) perhaps, but not more.

(reservation_granted ? "yes" : "no"),
pending_reservation_nonce)
<< dendl;
if (!reservation_granted && pending_reservation_nonce) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The !reservation_granted is superfluous -- reservation_granted must be false if pending_reservation_nonce != 0.

dout(10) << __func__ << dendl;
// if we exit the 'pending a reservation' state while still waiting for
// the reservation, we must cancel the request.
context<ReplicaActive>().clear_pending_remote_reservation();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

~ReplicaActive does

  DECLARE_LOCALS;  // 'scrbr' & 'pg_id' aliases
  if (reservation_granted) {
    dout(10) << "ReplicaActive::~ReplicaActive(): clearing reservation"
	     << dendl;
    clear_reservation_by_remote_primary(false);
  }

Without the if (reservation_granted), it would also handle this case in ~ReplicaWaitingReservation. I think it would be simpler for ReplicaActive to own clearing reservation state upon exit -- not least because it owns the pending_reservation_nonce member -- rather than splitting the responsibility between these two destructors, both of which will run in the event of an interval change. Is there another case where ReplicaWaitingReservation can exit early?

* If set, we log a failure if the state is not as expected
* (none of reservation_granted or pending_reservation_nonce is set).
*/
void clear_remote_reservation(bool registration_expected);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rename registration_expected to warn_if_no_reservation.

  /**
   * cancel a granted or pending reservation
   *
   * warn_if_no_reservation is set to true if the call is in response to a
   * cancellation from the primary.  In that event, we *must* find a
   * a granted or pending reservation and failing to do so warrants
   * a warning to clog as it is a bug.
   */

"reserved? {}",
(reserved_by_my_primary ? "yes" : "no"))
"ReplicaActive::clear_remote_reservation(): was "
"reserved? {}; had pending reservation? {}",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use the actual member names in the log line.

"ReplicaActive::clear_remote_reservation(): "
"pending_reservation_nonce {}, reservation_granted {}",
reservation_granted,
pending_reservation_nonce);

@ronen-fr ronen-fr force-pushed the wip-rf-reserver2 branch 3 times, most recently from ccd93a8 to b458851 Compare January 30, 2024 18:22
@ronen-fr ronen-fr requested a review from athanatos January 30, 2024 18:27
Copy link
Contributor

@athanatos athanatos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remaining comments are straightforward, no need for another review once addressed.

service.snap_reserver.set_max(cct->_conf->osd_max_trimming_pgs);
}
if (changed.count("osd_max_scrubs")) {
service.scrub_reserver.set_max(cct->_conf->osd_max_scrubs);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: with this change we effectively allow osd_max_scrubs primary scrubs and osd_max_scrubs replica scrubs, right? That's reasonable to me, but it's probably worth noting that in the config description.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That change was introduced in another PR, on November 2023.
And - yes - I should change the documentation.

* pending cb in the AsyncReserver for a request with nonce
* 'pending_reservation_nonce'
* - pending_reservation_nonce == 0 && reservation_granted -- we have sent
* a response to the primary granting the reservation
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's state the invariant explicitly:

 * invariant: !((pending_reservation_nonce != 0) && reservation_granted)

It's also worth noting how these states relate to the synchronous/compat case:

 * Note that in the event that the primary is too old to support asynchronous reservation,
 * MOSDScrubReserve::wait_for_resources will be set to false by the decoder and we
 * bypass the 2nd case above.  See ReplicaActive::on_reserve_request.

struct RtReservationCB : public Context {
PGRef pg;
AsyncScrubResData res_data;
bool canceled{false};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No remaining users, right? Let's remove it.

spg_t pgid;
pg_shard_t from;
epoch_t request_epoch;
MOSDScrubReserve::reservation_nonce_t nonce;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Initialize nonce and request_epoch to 0 or disallow the default constructor and ensure that all constructors initialize them to a parameter. We don't want it to be possible to construct an AsyncScrubResData with uninitialized fields.

"ReplicaActive::on_reserve_req() from {} request:{} is "
"async?{} (reservation_nonce:{})",
ev.m_from, ev, async_request, msg_nonce)
"ReplicaActive::on_reserve_req() request:{} is async?{} "
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

async_request rather than async in the log line.

The FSM now interacts with the scrub_reserver directly.

Signed-off-by: Ronen Friedman <[email protected]>
As ScrubResources is no longer involved in remote reservations, some
of the data listed by 'dump_scrub_reservations' is now collected by
OsdScrub itself (prior to this change, OsdScrub just forwarded the
request to ScrubResources).

Signed-off-by: Ronen Friedman <[email protected]>
@ronen-fr
Copy link
Contributor Author

merging based on my Teuthology runs

@ronen-fr ronen-fr merged commit 52a1268 into ceph:main Jan 31, 2024
ronen-fr added a commit to ronen-fr/ceph that referenced this pull request Dec 26, 2024
Since ceph#55340, the
osd_max_scrubs (also) affects the parameters of the
async scrub reserver used by the replicas. Thus,
the code must notice and acknowledge changes to this config.

Fixes: https://tracker.ceph.com/issues/69362

Signed-off-by: Ronen Friedman <[email protected]>
ronen-fr added a commit to ronen-fr/ceph that referenced this pull request Dec 26, 2024
Since ceph#55340, the
osd_max_scrubs (also) affects the parameters of the
async scrub reserver used by the replicas. Thus,
the code must notice and acknowledge changes to this config.

Fixes: https://tracker.ceph.com/issues/69362

Signed-off-by: Ronen Friedman <[email protected]>
(cherry picked from commit 31e6bac)
ronen-fr added a commit to ronen-fr/ceph that referenced this pull request Dec 29, 2024
Since ceph#55340, the
osd_max_scrubs (also) affects the parameters of the
async scrub reserver used by the replicas. Thus,
the code must notice and acknowledge changes to this config.

Fixes: https://tracker.ceph.com/issues/69362

Signed-off-by: Ronen Friedman <[email protected]>
vshankar pushed a commit to vshankar/ceph that referenced this pull request Dec 30, 2024
Since ceph#55340, the
osd_max_scrubs (also) affects the parameters of the
async scrub reserver used by the replicas. Thus,
the code must notice and acknowledge changes to this config.

Fixes: https://tracker.ceph.com/issues/69362

Signed-off-by: Ronen Friedman <[email protected]>
piyushagarwal1411 pushed a commit to piyushagarwal1411/ceph that referenced this pull request Jan 7, 2025
Since ceph#55340, the
osd_max_scrubs (also) affects the parameters of the
async scrub reserver used by the replicas. Thus,
the code must notice and acknowledge changes to this config.

Fixes: https://tracker.ceph.com/issues/69362

Signed-off-by: Ronen Friedman <[email protected]>
Jayaprakash-ibm pushed a commit to Jayaprakash-ibm/ceph that referenced this pull request Jan 9, 2025
Since ceph#55340, the
osd_max_scrubs (also) affects the parameters of the
async scrub reserver used by the replicas. Thus,
the code must notice and acknowledge changes to this config.

Fixes: https://tracker.ceph.com/issues/69362

Signed-off-by: Ronen Friedman <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants