Skip to content

Avoid too long waiting for inactive replicas#27931

Merged
tavplubix merged 4 commits intomasterfrom
wait_for_all_replicas_timeouts
Aug 27, 2021
Merged

Avoid too long waiting for inactive replicas#27931
tavplubix merged 4 commits intomasterfrom
wait_for_all_replicas_timeouts

Conversation

@tavplubix
Copy link
Copy Markdown
Member

I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en

Changelog category (leave one):

  • Improvement

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Added replication_wait_for_inactive_replica_timeout setting. It allows to specify how long to wait for inactive replicas to execute ALTER/OPTIMZE/TRUNCATE query (default is 120 seconds). If replication_alter_partitions_sync is 2 and some replicas are not active for more than replication_wait_for_inactive_replica_timeout seconds, then UNFINISHED will be thrown.

Detailed description / Documentation draft:
User may expect that with replication_alter_partitions_sync=2 query will wait for all replicas, but waiting for inactive replica does not make sense usually. It's better to wait for a short period of time and throw exception if replica did not restore connection to ZooKeeper (because it probably will not restore it soon anyway).
This PR also unifies behavior of queries that may wait for other replicas.
Related: #27178

@robot-clickhouse robot-clickhouse added the pr-improvement Pull request with some product improvements label Aug 20, 2021
@tavplubix tavplubix mentioned this pull request Aug 20, 2021
Copy link
Copy Markdown
Member

@Algunenano Algunenano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First of all, thanks a lot for looking into this.

I've left some questions I had while looking at the PR code. Aside from that, do you know if there is any place where I can learn about what should be waited for and possible issues when it isn't (like when it throws an ErrorCodes::UNFINISHED as it will do now).

*/

bool waiting_itself = replica == replica_name;
bool wait_for_inactive = 0 <= wait_for_inactive_timeout;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The different 0 <= wait_for_inactive_timeout or 0 < wait_for_inactive_timeout checks feel super unnatural to me and forced me to read things several times. It's personal preference, but to me wait_for_inactive_timeout => 0 is much clearer.

/// NOTE Table lock must not be held while waiting. Some combination of R-W-R locks from different threads will yield to deadlock.
for (auto & merge_entry : merge_entries)
waitForAllReplicasToProcessLogEntry(merge_entry, false);
if (query_context->getSettingsRef().replication_alter_partitions_sync == 1)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code if (replication_alter_partitions_sync == 1) else if == 2 seems to be repeated 6-7 times. Maybe a common function would be better and goes in line with the changes in the PR to unify the behaviour.

for (const auto & entry : entries_to_wait)
{
if (query_context->getSettingsRef().replication_alter_partitions_sync == 1)
waitForReplicaToProcessLogEntry(replica_name, *entry, wait_for_inactive_timeout);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I can see this won't throw if it's waiting on itself and it times out (not a shutdown). Should it? Maybe it should use some tryWaitForReplicaToProcessLogEntry instead?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it should always throw if entry was not processed on specified replica[s] for any reason.

/// better to have some notification which will call `step`
/// function when all replicated will finish. TODO.
storage.waitForAllReplicasToProcessLogEntry(log_entry, true);
storage.waitForAllReplicasToProcessLogEntry(zookeeper_path, log_entry, 0);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should these (the 5 calls in this file) now be -1 to wait for an unlimited amount of time?

@tavplubix
Copy link
Copy Markdown
Member Author

tavplubix commented Aug 23, 2021

do you know if there is any place where I can learn about what should be waited for and possible issues when it isn't (like when it throws an ErrorCodes::UNFINISHED as it will do now).

It's documented here. As for ErrorCodes::UNFINISHED, it used to throw the exception for unfinished ALTER ... COLUMN ... and ALTER ... UPDATE|DELETE ... (it's not documented), now it throws for OPTIMIZE, TRUNCATE and all ALTER queries. We need to update documentation.

@tavplubix tavplubix merged commit 703101f into master Aug 27, 2021
@tavplubix tavplubix deleted the wait_for_all_replicas_timeouts branch August 27, 2021 11:31
@sevirov
Copy link
Copy Markdown
Contributor

sevirov commented Aug 27, 2021

Internal documentation ticket: DOCSUP-13875

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-improvement Pull request with some product improvements

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants