Feature #61863

open

mds: issue a health warning with estimated time to complete replay

Added by Patrick Donnelly over 2 years ago. Updated 4 months ago.

Status: Pending Backport
Priority: Urgent
Assignee:
Category: Administration/Usability
Target version:
% Done: 0%
Source: Development
Backport: squid,reef
Reviewed:
Affected Versions:
Component(FS): MDS
Labels (FS):
Pull request ID:
Tags (freeform): backport_processed
Fixed In: v19.3.0-6631-g2215d554be
Released In: v20.2.0~1435
Upkeep Timestamp: 2025-11-01T00:58:33+00:00

Description

When the MDS is in up:replay, it gives the operator no indication of when replay will complete, even though the information is available: the MDS knows the end of the journal, the write position, and how quickly it is consuming (replaying) events. Issue a periodic health warning with the current completion percentage (e.g. 40% of journal read), the time spent in replay, and the expected time remaining.
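
As a rough illustration of the arithmetic this request implies, here is a minimal Python sketch. It is not the MDS implementation (which is C++ and may compute this differently); the names `estimate_replay`, `format_warning`, `start_pos`, `read_pos` and `end_pos` are hypothetical, and it assumes progress is measured in journal byte offsets with a replay rate averaged over the whole replay so far.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReplayProgress:
    percent_done: float           # share of the journal replayed so far, 0..100
    elapsed_s: float              # time spent in replay, in seconds
    remaining_s: Optional[float]  # None until a replay rate can be measured


def estimate_replay(start_pos: int, read_pos: int, end_pos: int,
                    elapsed_s: float) -> ReplayProgress:
    """Estimate replay progress from byte offsets into the journal.

    Hypothetical parameters: start_pos is where replay began, read_pos is
    how far replay has got, end_pos is the end of the journal.
    """
    total = end_pos - start_pos
    if total <= 0:
        # Empty journal: nothing left to replay.
        return ReplayProgress(100.0, elapsed_s, 0.0)
    # Clamp so a stale or overshooting read position cannot give <0% or >100%.
    done = max(0, min(read_pos - start_pos, total))
    percent = 100.0 * done / total
    if done == 0 or elapsed_s <= 0:
        # No progress observed yet, so no rate and no estimate.
        return ReplayProgress(percent, elapsed_s, None)
    rate = done / elapsed_s  # bytes per second, averaged since replay began
    return ReplayProgress(percent, elapsed_s, (total - done) / rate)


def format_warning(p: ReplayProgress) -> str:
    """Render the three figures asked for in the description."""
    eta = "unknown" if p.remaining_s is None else f"{p.remaining_s:.0f}s"
    return (f"{p.percent_done:.0f}% of journal read, "
            f"{p.elapsed_s:.0f}s in replay, estimated {eta} remaining")
```

For example, `format_warning(estimate_replay(0, 400, 1000, 120.0))` yields "40% of journal read, 120s in replay, estimated 180s remaining". An average rate is the simplest choice; a real implementation might prefer a rate over a recent window, since replay speed can vary between journal segments.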


Related issues 3 (3 open, 0 closed)

Related to CephFS - Feature #65637: mds: continue sending heartbeats during recovery when MDS journal is large (In Progress, Venky Shankar)
Copied to CephFS - Backport #69370: squid: mds: issue a health warning with estimated time to complete replay (Deferred, Venky Shankar)
Copied to CephFS - Backport #69371: reef: mds: issue a health warning with estimated time to complete replay (Deferred, Venky Shankar)
Actions #1

Updated by Greg Farnum over 2 years ago

  • Assignee set to Manish Yathnalli
Actions #2

Updated by Manish Yathnalli over 2 years ago

  • Status changed from New to In Progress
Actions #3

Updated by Venky Shankar over 2 years ago

Patrick, the "MDS behind trimming" warning during up:replay is to be expected when there are a large number of journal events/segments to replay. I think it makes sense to show this warning only when the MDS is up:active; during up:replay, the MDS could instead warn about long replay times based on the expected replay completion time/percentage.

Actions #4

Updated by Patrick Donnelly over 2 years ago

Venky Shankar wrote:

Patrick, the "MDS behind trimming" warning during up:replay is to be expected when there are a large number of journal events/segments to replay.

Sorry, I don't understand where the "MDS behind trimming" warning fits into this particular issue (beyond causing longer replay).

I think it makes sense to have this warning show up when the MDS is up:active and during up:replay the MDS could warn for longer replay times depending on the expected replay completion time/percentage.

If "this warning" means "MDS behind on trimming", I think we're on the same page.

Actions #5

Updated by Greg Farnum over 2 years ago

  • Pull request ID set to 52527
Actions #7

Updated by Venky Shankar over 2 years ago

Manish Yathnalli wrote:

https://github.com/ceph/ceph/pull/52527

Manish, the PR id is linked in the "Pull request ID" field.

Actions #8

Updated by Venky Shankar about 2 years ago

  • Assignee changed from Manish Yathnalli to Venky Shankar
  • Backport changed from reef,quincy,pacific to reef,quincy
  • Pull request ID changed from 52527 to 55616

Manish, I'm taking ownership of this one. I retained your contribution tags in the new pull request (of course!).

Actions #9

Updated by Patrick Donnelly almost 2 years ago

  • Related to Feature #65637: mds: continue sending heartbeats during recovery when MDS journal is large added
Actions #10

Updated by Patrick Donnelly almost 2 years ago

  • Target version changed from v19.0.0 to v20.0.0
Actions #11

Updated by Venky Shankar about 1 year ago

  • Status changed from Fix Under Review to Pending Backport
  • Backport changed from reef,quincy to squid,reef
Actions #12

Updated by Upkeep Bot about 1 year ago

  • Copied to Backport #69370: squid: mds: issue a health warning with estimated time to complete replay added
Actions #13

Updated by Upkeep Bot about 1 year ago

  • Copied to Backport #69371: reef: mds: issue a health warning with estimated time to complete replay added
Actions #14

Updated by Upkeep Bot about 1 year ago

  • Tags (freeform) set to backport_processed
Actions #15

Updated by Upkeep Bot 8 months ago

  • Merge Commit set to 2215d554bea0d47fc131e90f2c5a0e6494f471c0
  • Fixed In set to v19.3.0-6631-g2215d554bea
  • Upkeep Timestamp set to 2025-07-09T16:43:50+00:00
Actions #16

Updated by Upkeep Bot 8 months ago

  • Fixed In changed from v19.3.0-6631-g2215d554bea to v19.3.0-6631-g2215d554be
  • Upkeep Timestamp changed from 2025-07-09T16:43:50+00:00 to 2025-07-14T17:41:56+00:00
Actions #17

Updated by Upkeep Bot 4 months ago

  • Released In set to v20.2.0~1435
  • Upkeep Timestamp changed from 2025-07-14T17:41:56+00:00 to 2025-11-01T00:58:33+00:00