Avoid too long waiting for inactive replicas#27931
Conversation
Algunenano
left a comment
There was a problem hiding this comment.
First of all, thanks a lot for looking into this.
I've left some questions I had while looking at the PR code. Aside from that, do you know if there is any place where I can learn about what should be waited for and possible issues when it isn't (like when it throws an ErrorCodes::UNFINISHED as it will do now).
| */ | ||
|
|
||
| bool waiting_itself = replica == replica_name; | ||
| bool wait_for_inactive = 0 <= wait_for_inactive_timeout; |
There was a problem hiding this comment.
The different 0 <= wait_for_inactive_timeout or 0 < wait_for_inactive_timeout checks feel super unnatural to me and forced me to read things several times. It's personal preference, but to me wait_for_inactive_timeout => 0 is much clearer.
| /// NOTE Table lock must not be held while waiting. Some combination of R-W-R locks from different threads will yield to deadlock. | ||
| for (auto & merge_entry : merge_entries) | ||
| waitForAllReplicasToProcessLogEntry(merge_entry, false); | ||
| if (query_context->getSettingsRef().replication_alter_partitions_sync == 1) |
There was a problem hiding this comment.
This code if (replication_alter_partitions_sync == 1) else if == 2 seems to be repeated 6-7 times. Maybe a common function would be better and goes in line with the changes in the PR to unify the behaviour.
| for (const auto & entry : entries_to_wait) | ||
| { | ||
| if (query_context->getSettingsRef().replication_alter_partitions_sync == 1) | ||
| waitForReplicaToProcessLogEntry(replica_name, *entry, wait_for_inactive_timeout); |
There was a problem hiding this comment.
As far as I can see this won't throw if it's waiting on itself and it times out (not a shutdown). Should it? Maybe it should use some tryWaitForReplicaToProcessLogEntry instead?
There was a problem hiding this comment.
I think it should always throw if entry was not processed on specified replica[s] for any reason.
| /// better to have some notification which will call `step` | ||
| /// function when all replicated will finish. TODO. | ||
| storage.waitForAllReplicasToProcessLogEntry(log_entry, true); | ||
| storage.waitForAllReplicasToProcessLogEntry(zookeeper_path, log_entry, 0); |
There was a problem hiding this comment.
Should these (the 5 calls in this file) now be -1 to wait for an unlimited amount of time?
It's documented here. As for |
|
Internal documentation ticket: DOCSUP-13875 |
I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Added
replication_wait_for_inactive_replica_timeoutsetting. It allows to specify how long to wait for inactive replicas to executeALTER/OPTIMZE/TRUNCATEquery (default is 120 seconds). Ifreplication_alter_partitions_syncis 2 and some replicas are not active for more thanreplication_wait_for_inactive_replica_timeoutseconds, thenUNFINISHEDwill be thrown.Detailed description / Documentation draft:
User may expect that with
replication_alter_partitions_sync=2query will wait for all replicas, but waiting for inactive replica does not make sense usually. It's better to wait for a short period of time and throw exception if replica did not restore connection to ZooKeeper (because it probably will not restore it soon anyway).This PR also unifies behavior of queries that may wait for other replicas.
Related: #27178