Let's also test it by doing a rolling restart of a keeper cluster in an integration test? Inserts should not fail in this case, right?
+ no break is necessary in retry loop
+ zk -> keeper renaming
Changelog entry is required.
+ integration test (not working yet)
+ integration tests with disconnect and keeper restart
Some notes about usability:
In any case, this new feature is great and helps to reduce complexity on the user side.
The try*() methods return an error code, but throw in case of connectivity errors. This could be improved, since different try*() methods return (rather than throw) errors for different sets of error codes, i.e. some can throw on non-connectivity errors as well (tryGet(), for example).
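A minimal C++ sketch of the mixed contract described above, using hypothetical names (`Code`, `KeeperError`, `tryGetNode`, `tryGetNoThrow`) rather than ClickHouse's actual keeper client API: the try*() method returns `NoNode` but throws on connectivity errors, and a thin wrapper folds both paths into a returned code, so the caller has a single place to decide what is retriable.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical keeper error codes (illustration only, not ClickHouse's enum).
enum class Code { Ok, NoNode, ConnectionLoss, SessionExpired };

// Connectivity errors: the ones that are thrown under the mixed contract.
bool isConnectivityError(Code c) {
    return c == Code::ConnectionLoss || c == Code::SessionExpired;
}

// Hypothetical exception carrying the error code.
struct KeeperError : std::runtime_error {
    Code code;
    explicit KeeperError(Code c) : std::runtime_error("keeper error"), code(c) {}
};

// Hypothetical try*() method with the mixed contract from the note:
// NoNode is returned, connectivity errors are thrown.
// `simulated` stands in for whatever the keeper server would answer.
Code tryGetNode(const std::string & path, Code simulated, std::string & out) {
    assert(!path.empty());  // paths must be non-empty in this sketch
    if (isConnectivityError(simulated))
        throw KeeperError(simulated);
    if (simulated == Code::NoNode)
        return Code::NoNode;
    out = "data at " + path;
    return Code::Ok;
}

// Uniform wrapper: every failure comes back as a returned code,
// so the caller never has to handle two error channels.
Code tryGetNoThrow(const std::string & path, Code simulated, std::string & out) {
    try {
        return tryGetNode(path, simulated, out);
    } catch (const KeeperError & e) {
        return e.code;
    }
}
```

With the wrapper, `ConnectionLoss` and `NoNode` both arrive as return values, and the caller can check `isConnectivityError()` to pick the ones worth retrying.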
to check:
The initial implementation was different: it changed the entire ReplicatedMergeTreeSink::commitPart(), which would change the history shown by git blame. RetriesControl.retryLoop() was introduced later, and it significantly reduces the diff, since it works like the while() loop used before. So checking out the current version keeps more of the original history in git blame, which is useful here.
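To illustrate why a retryLoop() keeps the diff small, here is a minimal sketch, assuming a hypothetical `RetriesControl` class and `RetriableError` exception (not the PR's actual code): the loop body is the same code that used to sit inside while(), and a retriable error simply starts the next iteration.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <stdexcept>

// Hypothetical marker for errors worth retrying (e.g. a lost keeper connection).
struct RetriableError : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Hypothetical retries controller: a sketch of "retryLoop() works like the
// while() used before", not ClickHouse's actual class.
class RetriesControl {
public:
    explicit RetriesControl(std::size_t max_attempts) : max_attempts_(max_attempts) {
        assert(max_attempts_ > 0);
    }

    // Runs `body` until it returns normally. Each RetriableError consumes one
    // attempt; once attempts are exhausted, the last error is rethrown.
    // Any other exception propagates immediately.
    void retryLoop(const std::function<void()> & body) {
        while (true) {
            ++attempts_;
            try {
                body();
                return;  // success ends the loop; the body needs no `break`
            } catch (const RetriableError &) {
                if (attempts_ >= max_attempts_)
                    throw;
            }
        }
    }

    std::size_t attempts() const { return attempts_; }

private:
    std::size_t max_attempts_;
    std::size_t attempts_ = 0;
};
```

A successful body ends the loop on its own, so it needs no explicit break. With a limit of 3 attempts, a body that always throws `RetriableError` is called 3 times before the error is rethrown.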
When a keeper failure is injected, session expiration is emulated. Session expiration should lead to removal of ephemeral nodes, so we remove them too.
+ addressing some review comments
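A minimal sketch of the idea, assuming a hypothetical in-memory `FakeKeeper` rather than a real keeper client: an injected failure emulates session expiration, so every ephemeral node owned by the current session is removed before the error is reported, as would happen after a real session expiration. Switching to a new session id afterwards is an assumption of the sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical in-memory keeper used only to illustrate the idea; a real
// keeper tracks sessions and ephemeral node owners on the server side.
class FakeKeeper {
public:
    void create(const std::string & path, bool ephemeral) {
        assert(!path.empty());
        nodes_[path] = Node{ephemeral, session_id_};
    }

    bool exists(const std::string & path) const { return nodes_.count(path) > 0; }

    // Emulated session expiration: drop every ephemeral node owned by the
    // current session, then move on to a new session (sketch assumption).
    void expireSession() {
        for (auto it = nodes_.begin(); it != nodes_.end();) {
            if (it->second.ephemeral && it->second.owner == session_id_)
                it = nodes_.erase(it);
            else
                ++it;
        }
        ++session_id_;
    }

    std::uint64_t sessionId() const { return session_id_; }

private:
    struct Node { bool ephemeral; std::uint64_t owner; };
    std::map<std::string, Node> nodes_;
    std::uint64_t session_id_ = 1;
};

// Hypothetical injected failure: emulate session expiration first, then
// report the failure to the caller as an exception.
[[noreturn]] void injectSessionExpiration(FakeKeeper & keeper) {
    keeper.expireSession();
    throw std::runtime_error("injected fault: session expired");
}
```

Persistent nodes survive the injected failure, while ephemeral ones disappear, so the code under test sees the same state it would see after a real expiration.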
+ enable fault injection for functional tests
+ fix some review comments
Further changes will be done (here; please check the description for the reasoning behind this).
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Support for keeper request retries during insert into replicated merge trees. Apart from fault tolerance, it aims to provide a better user experience: avoid returning an error to the user during insert if keeper is restarted (for example, due to an upgrade).
It also introduces a failure injection mode for keeper requests during insert into replicated merge trees, so it's possible to inject a failure with a certain probability when accessing the keeper interface (see insert_keeper_fault_injection_probability). It's also possible to reproduce a particular insert execution with failure injections by providing the same seed (see insert_keeper_fault_injection_seed).

Closes #39764
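A minimal sketch of probability-based fault injection with a reproducible seed, assuming a hypothetical `KeeperFaultInjector` class whose `probability` and `seed` arguments play the roles of insert_keeper_fault_injection_probability and insert_keeper_fault_injection_seed (this is not ClickHouse's implementation): each keeper request draws from a seeded `std::mt19937_64`, so two injectors built with the same seed fail the same requests.

```cpp
#include <cassert>
#include <cstdint>
#include <random>
#include <stdexcept>
#include <string>

// Hypothetical fault injector; `probability` and `seed` mirror the roles of
// insert_keeper_fault_injection_probability / insert_keeper_fault_injection_seed.
class KeeperFaultInjector {
public:
    KeeperFaultInjector(double probability, std::uint64_t seed)
        : rng_(seed), fail_(checked(probability)) {}

    // Called before each keeper request: decides whether this request fails.
    bool shouldFail() { return fail_(rng_); }

    // Wraps a keeper call; throws instead of calling it when a fault is injected.
    template <typename F>
    auto call(const std::string & op, F && request) {
        if (shouldFail())
            throw std::runtime_error("injected keeper fault in " + op);
        return request();
    }

private:
    // bernoulli_distribution requires 0 <= p <= 1, so check before constructing it.
    static double checked(double p) {
        assert(p >= 0.0 && p <= 1.0);
        return p;
    }

    std::mt19937_64 rng_;
    std::bernoulli_distribution fail_;
};
```

The output of `std::mt19937_64` is fully specified by the C++ standard, but `std::bernoulli_distribution` is not, so in this sketch the same seed reproduces the same sequence of failures only with the same standard library implementation.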