osd/scrub: use an AsyncReserver to handle scrub reservations on the replica side#55340
osd/scrub: use an AsyncReserver to handle scrub reservations on the replica side#55340
Conversation
acc1609 to
a88e7d3
Compare
|
jenkins test make check |
|
jenkins test api |
Signed-off-by: Ronen Friedman <[email protected]>
Signed-off-by: Ronen Friedman <[email protected]>
|
This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved |
9950d8d to
b90b8ad
Compare
769d16e to
22d93ae
Compare
|
jenkins test make check |
|
jenkins retest this please |
athanatos
left a comment
There was a problem hiding this comment.
Comments are mainly related to callback invalidation, pretty close. PR structure was super easy to review, thanks!
Signed-off-by: Ronen Friedman <[email protected]>
Based on Sam's Crimson FSM's 'value-event'. Co-authored-by: Sam Just <[email protected]> Signed-off-by: Ronen Friedman <[email protected]>
As no callbacks are needed for request_reservation_or_fail(), the synchronous request API. Signed-off-by: Ronen Friedman <[email protected]>
Signed-off-by: Ronen Friedman <[email protected]>
74cc8bb to
3364a92
Compare
|
@athanatos - ready for re-review. |
3364a92 to
b12ade4
Compare
|
jenkins retest this please |
src/osd/scrubber/scrub_machine.cc
Outdated
| DECLARE_LOCALS; // 'scrbr' & 'pg_id' aliases | ||
| const auto m = ev.m_op->get_req<MOSDScrubReserve>(); | ||
| const auto msg_nonce = m->reservation_nonce; | ||
| pending_reservation_nonce = m->reservation_nonce; |
There was a problem hiding this comment.
ceph_assert(pending_reservation_nonce == 0);
Setting pending_reservation_nonce is appropriate for the if (async_request) branch below, but not the else branch. Move this into the if branch to preserve the invariant that pending_reservation_nonce != 0 if and only if there is an pending reservation in the reserver.
There was a problem hiding this comment.
Yes. A bug introduced by too many changes in a short time
src/osd/scrubber/scrub_machine.cc
Outdated
| reply = new MOSDScrubReserve( | ||
| spg_t(pg_id.pgid, m_pg->get_primary().shard), ev.m_op->sent_epoch, | ||
| MOSDScrubReserve::GRANT, m_pg->pg_whoami, msg_nonce); | ||
| MOSDScrubReserve::GRANT, m_pg->pg_whoami, pending_reservation_nonce); |
There was a problem hiding this comment.
See above, use m->reservation_nonce as pending_reservation_nonce should not have been set in this branch.
src/osd/scrubber/scrub_machine.cc
Outdated
| "{}: reservation_nonce mismatch: {} != {}", __func__, reservation.nonce, | ||
| pending_reservation_nonce); | ||
| dout(1) << msg << dendl; | ||
| scrbr->get_clog()->error() << msg; |
There was a problem hiding this comment.
This isn't a necessarily a bug. Depending on timing, a cancellation may occur between when the reserver puts the callback into the finisher and when the callback gets the PG lock. That callback will then see a different pending_reservation_nonce (likely though not necessarily 0) by design. Let's not log this to the central logger -- dout(5) perhaps, but not more.
src/osd/scrubber/scrub_machine.cc
Outdated
| (reservation_granted ? "yes" : "no"), | ||
| pending_reservation_nonce) | ||
| << dendl; | ||
| if (!reservation_granted && pending_reservation_nonce) { |
There was a problem hiding this comment.
The !reservation_granted is superfluous -- reservation_granted must be false if pending_reservation_nonce != 0.
src/osd/scrubber/scrub_machine.cc
Outdated
| dout(10) << __func__ << dendl; | ||
| // if we exit the 'pending a reservation' state while still waiting for | ||
| // the reservation, we must cancel the request. | ||
| context<ReplicaActive>().clear_pending_remote_reservation(); |
There was a problem hiding this comment.
~ReplicaActive does
DECLARE_LOCALS; // 'scrbr' & 'pg_id' aliases
if (reservation_granted) {
dout(10) << "ReplicaActive::~ReplicaActive(): clearing reservation"
<< dendl;
clear_reservation_by_remote_primary(false);
}
Without the if (reservation_granted), it would also handle this case in ~ReplicaWaitingReservation. I think it would be simpler for ReplicaActive to own clearing reservation state upon exit -- not least because it owns the pending_reservation_nonce member -- rather than splitting the responsibility between these two destructors, both of which will run in the event of an interval change. Is there another case where ReplicaWaitingReservation can exit early?
src/osd/scrubber/scrub_machine.h
Outdated
| * If set, we log a failure if the state is not as expected | ||
| * (none of reservation_granted or pending_reservation_nonce is set). | ||
| */ | ||
| void clear_remote_reservation(bool registration_expected); |
There was a problem hiding this comment.
Rename registration_expected to warn_if_no_reservation.
/**
* cancel a granted or pending reservation
*
* warn_if_no_reservation is set to true if the call is in response to a
* cancellation from the primary. In that event, we *must* find a
* a granted or pending reservation and failing to do so warrants
* a warning to clog as it is a bug.
*/
src/osd/scrubber/scrub_machine.cc
Outdated
| "reserved? {}", | ||
| (reserved_by_my_primary ? "yes" : "no")) | ||
| "ReplicaActive::clear_remote_reservation(): was " | ||
| "reserved? {}; had pending reservation? {}", |
There was a problem hiding this comment.
Use the actual member names in the log line.
"ReplicaActive::clear_remote_reservation(): "
"pending_reservation_nonce {}, reservation_granted {}",
reservation_granted,
pending_reservation_nonce);
ccd93a8 to
b458851
Compare
athanatos
left a comment
There was a problem hiding this comment.
Remaining comments are straightforward, no need for another review once addressed.
| service.snap_reserver.set_max(cct->_conf->osd_max_trimming_pgs); | ||
| } | ||
| if (changed.count("osd_max_scrubs")) { | ||
| service.scrub_reserver.set_max(cct->_conf->osd_max_scrubs); |
There was a problem hiding this comment.
Nit: with this change we effectively allow osd_max_scrubs primary scrubs and osd_max_scrubs replica scrubs, right? That's reasonable to me, but it's probably worth noting that in the config description.
There was a problem hiding this comment.
That change was introduced in another PR, on November 2023.
And - yes - I should change the documentation.
| * pending cb in the AsyncReserver for a request with nonce | ||
| * 'pending_reservation_nonce' | ||
| * - pending_reservation_nonce == 0 && reservation_granted -- we have sent | ||
| * a response to the primary granting the reservation |
There was a problem hiding this comment.
Let's state the invariant explicitly:
* invariant: !((pending_reservation_nonce != 0) && reservation_granted)
It's also worth noting how these states relate to the synchronous/compat case:
* Note that in the event that the primary is too old to support asynchronous reservation,
* MOSDScrubReserve::wait_for_resources will be set to false by the decoder and we
* bypass the 2nd case above. See ReplicaActive::on_reserve_request.
src/osd/scrubber/scrub_machine.h
Outdated
| struct RtReservationCB : public Context { | ||
| PGRef pg; | ||
| AsyncScrubResData res_data; | ||
| bool canceled{false}; |
There was a problem hiding this comment.
No remaining users, right? Let's remove it.
| spg_t pgid; | ||
| pg_shard_t from; | ||
| epoch_t request_epoch; | ||
| MOSDScrubReserve::reservation_nonce_t nonce; |
There was a problem hiding this comment.
Initialize nonce and request_epoch to 0 or disallow the default constructor and ensure that all constructors initialize them to a parameter. We don't want it to be possible to construct an AsyncScrubResData with uninitialized fields.
src/osd/scrubber/scrub_machine.cc
Outdated
| "ReplicaActive::on_reserve_req() from {} request:{} is " | ||
| "async?{} (reservation_nonce:{})", | ||
| ev.m_from, ev, async_request, msg_nonce) | ||
| "ReplicaActive::on_reserve_req() request:{} is async?{} " |
There was a problem hiding this comment.
async_request rather than async in the log line.
The FSM now interacts with the scrub_reserver directly. Signed-off-by: Ronen Friedman <[email protected]>
Signed-off-by: Ronen Friedman <[email protected]>
As ScrubResources is no longer involved in remote reservations, some of the data listed by 'dump_scrub_reservations' is now collected by OsdScrub itself (prior to this change, OsdScrub just forwarded the request to ScrubResources). Signed-off-by: Ronen Friedman <[email protected]>
Signed-off-by: Ronen Friedman <[email protected]>
b458851 to
6fa0fa3
Compare
|
merging based on my Teuthology runs |
Since ceph#55340, the osd_max_scrubs (also) affects the parameters of the async scrub reserver used by the replicas. Thus, the code must notice and acknowledge changes to this config. Fixes: https://tracker.ceph.com/issues/69362 Signed-off-by: Ronen Friedman <[email protected]>
Since ceph#55340, the osd_max_scrubs (also) affects the parameters of the async scrub reserver used by the replicas. Thus, the code must notice and acknowledge changes to this config. Fixes: https://tracker.ceph.com/issues/69362 Signed-off-by: Ronen Friedman <[email protected]> (cherry picked from commit 31e6bac)
Since ceph#55340, the osd_max_scrubs (also) affects the parameters of the async scrub reserver used by the replicas. Thus, the code must notice and acknowledge changes to this config. Fixes: https://tracker.ceph.com/issues/69362 Signed-off-by: Ronen Friedman <[email protected]>
Since ceph#55340, the osd_max_scrubs (also) affects the parameters of the async scrub reserver used by the replicas. Thus, the code must notice and acknowledge changes to this config. Fixes: https://tracker.ceph.com/issues/69362 Signed-off-by: Ronen Friedman <[email protected]>
Since ceph#55340, the osd_max_scrubs (also) affects the parameters of the async scrub reserver used by the replicas. Thus, the code must notice and acknowledge changes to this config. Fixes: https://tracker.ceph.com/issues/69362 Signed-off-by: Ronen Friedman <[email protected]>
Since ceph#55340, the osd_max_scrubs (also) affects the parameters of the async scrub reserver used by the replicas. Thus, the code must notice and acknowledge changes to this config. Fixes: https://tracker.ceph.com/issues/69362 Signed-off-by: Ronen Friedman <[email protected]>
That means that when trying to reserve its replicas for scrubbing, the primary sends a reservation request
with a 'queue this request' flag set. That request is queued at the scrub-reserver, and granted once
the number of concurrent 'remote reservations' falls below the configured threshold.