Improve the time to recover keeper connections#42541
Improve the time to recover keeper connections#42541nikitamikhaylov merged 15 commits intoClickHouse:masterfrom
Conversation
| static String generateActiveNodeIdentifier() | ||
| { | ||
| return "pid: " + toString(getpid()) + ", random: " + toString(randomSeed()); | ||
| return Field(ServerUUID::get()).dump(); |
There was a problem hiding this comment.
I decided to change this to use the server UUID only so it's faster to recover also after unclean shutdowns. There might be better approaches
INSERT retries should cover it, see #39764 |
| task->scheduleAfter(immediately_ms); | ||
| } | ||
| } | ||
| catch (...) | ||
| { | ||
| task->scheduleAfter(immediately_ms); |
There was a problem hiding this comment.
It will retry each 100-1000ms even if current replica is completely partitioned from a cluster. Should we increase this value a bit on each failure until it reaches 10s (the original value of retry_period_ms)?
There was a problem hiding this comment.
Sure. I went for 1s because it was reduced to it before, but it should be better to have some exponential backoff (or some other simpler backoff) for up to 10 seconds.
| storage.partial_shutdown_event.set(); | ||
| storage.replica_is_active_node = nullptr; | ||
|
|
||
| LOG_TRACE(log, "Waiting for threads to finish"); |
There was a problem hiding this comment.
These messages were useful sometimes
There was a problem hiding this comment.
It was a mistake on my part. I had added a bunch of extra logs to help me understand what was going on and deleted some that I didn't add.
|
Stress tests (thread) - Azure thread leek. #42640 |
|
Hm, there's an issue: quorum inserts rely on content of ClickHouse/src/Storages/MergeTree/ReplicatedMergeTreeSink.cpp Lines 678 to 683 in c53f7cd Maybe we should compare node version instead |
I'm not familiar with that part of the code. Do you mind explaining what the issue is? Before the change the value changed between restarts and stable inside the process (AFAICS), and now the value is also stable between restarts too. Is this a problem? |
|
According to the code I posted above, the value is supposed to change between sessions (so probably this code did not work correctly even before this PR) |
Fixed in #42878 |
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Based on #42323
Closes #42323
Closes #42251
2 main changes:
ReplicatedMergeTreeRestartingThread) is now relaunched on failure. Before it re-scheduled sometimes immediately and sometimes after 1 second (10 seconds in previous releases), but it was buggy and in some situations it would fail to re-schedule it (thus falling back to the automated checks). Now failures are re-scheduled after 100-10000ms.After this is discussed and reviewed, I'm thinking it would make sense for some queries to be able to wait for ZK recovery instead of immediately fail; it would be something similar to
lock_acquire_timeoutbut just for this. For example, with the new improvements anINSERTcould avoid throwing an exception if it waited for 100-300 ms for a new ZK connection to be ready.