Require explicit permission to start a recovery after 30 generations #2796

@etschannen

Description

A long run of consecutive unfinished recoveries (recoveries that never make it to the fully_recovered state) is very bad for cluster performance. The metadata for all of these generations of TLogs is stored in the coordinated state, so each recovery adds more work for the next recovery.

This makes problems that cause repeated failures more dangerous, because if no action is taken, the cluster will degrade as more and more recoveries are attempted.

To prevent this failure cascade, after a certain number of recoveries the user should be required to tell the system it is okay to attempt a recovery. This will give administrators a chance to fix the root cause before doing more recoveries.
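
The gating behavior proposed above could look roughly like the sketch below. It is illustrative only and is not FoundationDB code: the `RecoveryGate` class, its method names, and the rule that one approval covers exactly one recovery attempt are all assumptions made for the example. The default threshold of 30 comes from the issue title.

```python
class RecoveryGate:
    """Hypothetical gate on recovery attempts; not FoundationDB's implementation."""

    def __init__(self, max_unfinished=30):
        # Threshold taken from the issue title (30 generations).
        self.max_unfinished = max_unfinished
        # Consecutive recoveries that never reached fully_recovered.
        self.unfinished = 0
        # Set when an administrator explicitly allows another attempt.
        self.approved = False

    def may_start_recovery(self):
        # Below the threshold, recoveries proceed automatically.
        # At or above it, an explicit approval is required.
        return self.unfinished < self.max_unfinished or self.approved

    def start_recovery(self):
        if not self.may_start_recovery():
            raise PermissionError(
                "too many unfinished recoveries; explicit approval required"
            )
        self.unfinished += 1
        # Assumption: each approval covers a single recovery attempt.
        self.approved = False

    def fully_recovered(self):
        # Reaching fully_recovered ends the run of unfinished recoveries.
        self.unfinished = 0
        self.approved = False

    def approve(self):
        # Called on behalf of an administrator after fixing the root cause.
        self.approved = True
```

Consuming the approval on each attempt means that if the root cause is still present, the cluster stops again after one more failed recovery instead of resuming the cascade.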
