Skip to content

Some merges may stuck #10368

@alexey-milovidov

Description

@alexey-milovidov

This bug is discovered on Yandex.Metrica servers.

If there is assigned merge but some parts in between of the range of parts to merge get lost on all replicas, the merge cannot proceed and the following messages will be printed in log:

Executing log entry to merge parts ...
Don't have all parts for merge ...; will try to fetch it instead
No active replica has part ...
Checking part ...
Checking if anyone has a part covering ...
Found parts with the same min block and with the same max block as the missing part ... Hoping that it will eventually appear as a result of a merge.

And in system.replication_log you will see the following entries:

Not executing log entry for part ... because it is covered by part ... that is currently executing
No active replica has part ... or covering part

Actually this logic exists as a safety measure when automated action is not possible: #1251

And it's unclear how to fix it in code.

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugConfirmed user-visible misbehaviour in official releasecomp-replicationReplicatedMergeTree* + replication log/state transitions, eventual consistency mechanics.

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions