Avoid too long waiting for inactive replicas by tavplubix · Pull Request #27931 · ClickHouse/ClickHouse

tavplubix · 2021-08-20T13:16:53Z

I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en

Changelog category (leave one):

Improvement

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Added the replication_wait_for_inactive_replica_timeout setting. It specifies how long to wait for inactive replicas to execute an ALTER/OPTIMIZE/TRUNCATE query (default is 120 seconds). If replication_alter_partitions_sync is 2 and some replicas are not active for more than replication_wait_for_inactive_replica_timeout seconds, then UNFINISHED will be thrown.

Detailed description / Documentation draft:
Users may expect that with replication_alter_partitions_sync=2 the query will wait for all replicas, but waiting for an inactive replica usually does not make sense. It's better to wait for a short period of time and throw an exception if the replica did not restore its connection to ZooKeeper (because it probably will not restore it soon anyway).
This PR also unifies the behavior of queries that may wait for other replicas.
Related: #27178
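
A minimal sketch of the waiting policy described above, not the code from this PR. It assumes the wait can be modelled as discrete polling "ticks" (a stand-in for seconds), and that the countdown restarts whenever the replica becomes active again. ReplicaState, waitWithInactiveTimeout and the poll callback are hypothetical names introduced only for illustration.

```cpp
#include <functional>

// Hypothetical snapshot of one replica's state, as seen by the waiting query.
struct ReplicaState
{
    bool is_active;        // replica currently holds a live ZooKeeper session
    bool entry_processed;  // replica has already executed the log entry
};

// Calls `poll` once per tick. An active replica is waited for without a limit;
// an inactive one is tolerated for at most `inactive_timeout_ticks` consecutive ticks.
// Returns true if the entry was processed, false if the wait gave up
// (the point where a caller would report UNFINISHED).
// Assumption: the countdown resets when the replica becomes active again.
bool waitWithInactiveTimeout(const std::function<ReplicaState()> & poll, int inactive_timeout_ticks)
{
    int inactive_ticks = 0;
    while (true)
    {
        const ReplicaState state = poll();
        if (state.entry_processed)
            return true;
        if (state.is_active)
            inactive_ticks = 0;
        else if (++inactive_ticks > inactive_timeout_ticks)
            return false;
    }
}
```

In this model a replica that never comes back is abandoned after the timeout, while a replica that reconnects keeps the query waiting until the entry is processed.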

Algunenano

First of all, thanks a lot for looking into this.

I've left some questions I had while looking at the PR code. Aside from that, do you know if there is any place where I can learn about what should be waited for and the possible issues when it isn't (like when it throws an ErrorCodes::UNFINISHED, as it will do now)?

Algunenano · 2021-08-20T14:05:27Z

src/Storages/StorageReplicatedMergeTree.cpp

      */

    bool waiting_itself = replica == replica_name;
+    bool wait_for_inactive = 0 <= wait_for_inactive_timeout;


The different 0 <= wait_for_inactive_timeout or 0 < wait_for_inactive_timeout checks feel super unnatural to me and forced me to read things several times. It's a personal preference, but to me wait_for_inactive_timeout >= 0 is much clearer.

Algunenano · 2021-08-20T14:15:20Z

src/Storages/StorageReplicatedMergeTree.cpp

-        /// NOTE Table lock must not be held while waiting. Some combination of R-W-R locks from different threads will yield to deadlock.
-        for (auto & merge_entry : merge_entries)
-            waitForAllReplicasToProcessLogEntry(merge_entry, false);
+        if (query_context->getSettingsRef().replication_alter_partitions_sync == 1)


This if (replication_alter_partitions_sync == 1) ... else if (== 2) code seems to be repeated 6-7 times. Maybe a common function would be better; it would also be in line with the PR's goal of unifying the behaviour.
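
The suggestion above could look roughly like the sketch below. It is an illustration only: FakeStorage and waitIfNecessary are hypothetical, the method signatures are simplified (no timeout argument, entries are plain strings), and the mode values follow the documented meaning of replication_alter_partitions_sync (0: do not wait, 1: wait for own replica, 2: wait for all replicas).

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for the storage; it only records which waits were requested.
struct FakeStorage
{
    std::string replica_name = "r1";
    std::vector<std::string> calls;

    void waitForReplicaToProcessLogEntry(const std::string & replica, const std::string & entry)
    {
        calls.push_back("one:" + replica + ":" + entry);
    }

    void waitForAllReplicasToProcessLogEntry(const std::string & entry)
    {
        calls.push_back("all:" + entry);
    }
};

// Hypothetical helper: the single place that interprets replication_alter_partitions_sync,
// so call sites no longer repeat the if / else if chain.
void waitIfNecessary(FakeStorage & storage, std::uint64_t alter_sync, const std::string & entry)
{
    if (alter_sync == 1)
        storage.waitForReplicaToProcessLogEntry(storage.replica_name, entry);
    else if (alter_sync == 2)
        storage.waitForAllReplicasToProcessLogEntry(entry);
    // Any other value: return immediately without waiting.
}
```

Each call site then reduces to a single waitIfNecessary(storage, alter_sync, entry) call.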

Algunenano · 2021-08-20T14:21:00Z

src/Storages/StorageReplicatedMergeTree.cpp

+    for (const auto & entry : entries_to_wait)
+    {
+        if (query_context->getSettingsRef().replication_alter_partitions_sync == 1)
+            waitForReplicaToProcessLogEntry(replica_name, *entry, wait_for_inactive_timeout);


As far as I can see this won't throw if it's waiting on itself and it times out (not a shutdown). Should it? Maybe it should use some tryWaitForReplicaToProcessLogEntry instead?

I think it should always throw if entry was not processed on specified replica[s] for any reason.
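
One way to get the "always throw" behaviour discussed here is to pair a non-throwing try variant with a throwing wrapper. The sketch below is an assumption-laden illustration, not the PR's code: UnfinishedError stands in for ErrorCodes::UNFINISHED, and tryWaitForReplica, waitForReplica and the wait_impl callback are hypothetical names.

```cpp
#include <functional>
#include <stdexcept>
#include <string>

// Hypothetical exception standing in for ErrorCodes::UNFINISHED.
struct UnfinishedError : std::runtime_error
{
    using std::runtime_error::runtime_error;
};

// Placeholder type for the real waiting logic: returns true if the entry was processed.
using WaitImpl = std::function<bool(const std::string &)>;

// Non-throwing variant: reports whether the replica processed the entry.
bool tryWaitForReplica(const std::string & replica, const WaitImpl & wait_impl)
{
    return wait_impl(replica);
}

// Throwing variant: any failure to process the entry (timeout, shutdown, inactive replica)
// becomes an exception, including when the replica is waiting on itself.
void waitForReplica(const std::string & replica, const WaitImpl & wait_impl)
{
    if (!tryWaitForReplica(replica, wait_impl))
        throw UnfinishedError("Log entry was not processed on replica " + replica);
}
```

Callers that can tolerate an unprocessed entry use the try variant and inspect the result; everything else goes through the throwing wrapper.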

Algunenano · 2021-08-23T13:14:30Z

src/Storages/MergeTree/PartMovesBetweenShardsOrchestrator.cpp

                /// better to have some notification which will call `step`
                /// function when all replicated will finish. TODO.
-                storage.waitForAllReplicasToProcessLogEntry(log_entry, true);
+                storage.waitForAllReplicasToProcessLogEntry(zookeeper_path, log_entry, 0);


Should these (the 5 calls in this file) now be -1 to wait for an unlimited amount of time?

tavplubix · 2021-08-23T13:30:16Z

do you know if there is any place where I can learn about what should be waited for and possible issues when it isn't (like when it throws an ErrorCodes::UNFINISHED as it will do now).

It's documented here. As for ErrorCodes::UNFINISHED, it used to throw the exception for unfinished ALTER ... COLUMN ... and ALTER ... UPDATE|DELETE ... (it's not documented), now it throws for OPTIMIZE, TRUNCATE and all ALTER queries. We need to update documentation.

sevirov · 2021-08-27T16:35:46Z

Internal documentation ticket: DOCSUP-13875

avoid too long waiting for inactive replicas

59eb3aa

robot-clickhouse added the pr-improvement label Aug 20, 2021

tavplubix mentioned this pull request Aug 20, 2021

Break some tests #27529

Merged

fix config

9ef0b00

Algunenano reviewed Aug 20, 2021


make code better

cc9c2fd

Algunenano reviewed Aug 23, 2021


fix

4a4a0b4

tavplubix added the doc-alert label Aug 23, 2021

tavplubix merged commit 703101f into master Aug 27, 2021

tavplubix deleted the wait_for_all_replicas_timeouts branch August 27, 2021 11:31

tavplubix mentioned this pull request Aug 30, 2021

Maybe fix livelock in ZooKeeper client #28195

Merged

sevirov mentioned this pull request Sep 1, 2021

DOCSUP-13875: Document the replication_wait_for_inactive_replica_timeout setting #28464

Merged

ianton-ru mentioned this pull request Nov 2, 2021

TRUNCATE TABLE with ReplicatedMergeTree and S3 storage causes server to hang. #28094

Closed

filimonov mentioned this pull request Nov 30, 2021

TRUNCATE waits for offline replicas. #31989

Closed

Algunenano mentioned this pull request Nov 30, 2021

Stuck DDL worker on replica down #27178

Closed

genzgd mentioned this pull request Apr 9, 2022

Modifying TTL causes distributed DDL to block for a long time #36092

Open

den-crane mentioned this pull request Oct 4, 2022

TRUNCATE query stuck if replica is down #42051

Closed

tavplubix merged 4 commits into master from wait_for_all_replicas_timeouts


4 participants
