
Keeper retries during insert (clean)#42607

Merged
devcrafter merged 32 commits into master from igor/insert_zk_retries_retry
Nov 10, 2022

Conversation

@devcrafter
Member

@devcrafter devcrafter commented Oct 24, 2022

Changelog category (leave one):

  • New Feature

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Support for Keeper request retries during inserts into replicated merge trees. Apart from fault tolerance, it aims to provide a better user experience: avoid returning an error to the user during an insert if Keeper is restarted (for example, due to an upgrade).

It also introduces a failure injection mode for Keeper requests during inserts into replicated merge trees, making it possible to inject a failure with a certain probability when accessing the Keeper interface (see insert_keeper_fault_injection_probability). A particular insert execution with failure injections can also be reproduced by providing the same seed, see insert_keeper_fault_injection_seed.
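
For example, failure injection can be enabled per query via the settings named above (a minimal sketch; the table name and values are made up for illustration):

```sql
-- Inject a failure with 1% probability on each Keeper request made
-- during this insert; the fixed seed makes a failing run reproducible.
INSERT INTO test_table VALUES (1, 'a')
SETTINGS
    insert_keeper_fault_injection_probability = 0.01,
    insert_keeper_fault_injection_seed = 42;
```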

Closes #39764

It should be merged instead of #41734. This PR contains all the changes from #41734, but its commit history does less harm to the git blame output (some details in the comment to 5401300).

The initial implementation was different: it changed the entire
ReplicatedMergeTreeSink::commitPart(), which broke the change history provided by git blame.
RetriesControl.retryLoop() was introduced later, which significantly reduces
the diff since it resembles the while() loop used before.
So checking out the current version keeps more of the original history in
git blame, which is useful here.
+ enable fault injection for functional tests
@robot-clickhouse robot-clickhouse added the pr-feature Pull request with new product feature label Oct 24, 2022
@tavplubix tavplubix self-assigned this Oct 26, 2022
@devcrafter devcrafter requested a review from tavplubix October 28, 2022 15:13
+ throw immediately if table has no metadata in zk
+ stop retries in case of shutdown
+ check if table is readonly at the beginning of every retry
@Algunenano
Member

Super nice feature.

I just wanted to let you know I did some basic tests and didn't find any issues. Basically I've run 600 inserts (one every 0.1 seconds) while I applied chaos engineering (and I was the monkey restarting random ZK instances in a cluster of 3) and it worked without flaws.

All inserts finished successfully, with the max delay being ~2200 ms:

SELECT
    event_time,
    count(),
    min(query_duration_ms),
    max(query_duration_ms)
FROM system.query_log
WHERE (event_time > CAST('2022-11-02 15:40:00', 'DateTime')) AND (query_kind = 'Insert') AND (type != 'QueryStart')
GROUP BY event_time
    WITH TOTALS
ORDER BY event_time ASC

Query id: c07a130a-91d3-4b73-ab16-9269f0bd9e5d

┌──────────event_time─┬─count()─┬─min(query_duration_ms)─┬─max(query_duration_ms)─┐
│ 2022-11-02 15:43:16 │       4 │                     10 │                     21 │
│ 2022-11-02 15:43:17 │      10 │                      9 │                     20 │
│ 2022-11-02 15:43:18 │      10 │                     10 │                     20 │
│ 2022-11-02 15:43:19 │      10 │                     10 │                     22 │
│ 2022-11-02 15:43:20 │      10 │                      8 │                     21 │
│ 2022-11-02 15:43:21 │      10 │                      9 │                     17 │
│ 2022-11-02 15:43:22 │      10 │                     12 │                     25 │
│ 2022-11-02 15:43:24 │      19 │                     14 │                   1037 │
│ 2022-11-02 15:43:25 │       7 │                     10 │                     14 │
│ 2022-11-02 15:43:26 │      13 │                     11 │                    387 │
│ 2022-11-02 15:43:28 │      20 │                     14 │                   1629 │
│ 2022-11-02 15:43:29 │      10 │                     10 │                     20 │
│ 2022-11-02 15:43:30 │       8 │                     11 │                     20 │
│ 2022-11-02 15:43:31 │       8 │                     12 │                    786 │
│ 2022-11-02 15:43:32 │      10 │                      9 │                   1143 │
│ 2022-11-02 15:43:33 │      13 │                     19 │                   1017 │
│ 2022-11-02 15:43:34 │       4 │                     10 │                     26 │
│ 2022-11-02 15:43:35 │      16 │                     12 │                   1051 │
│ 2022-11-02 15:43:36 │      10 │                     11 │                     17 │
│ 2022-11-02 15:43:37 │       9 │                     10 │                     16 │
│ 2022-11-02 15:43:38 │       6 │                     11 │                     18 │
│ 2022-11-02 15:43:39 │      14 │                     20 │                   1233 │
│ 2022-11-02 15:43:40 │       5 │                     10 │                     18 │
│ 2022-11-02 15:43:41 │      15 │                     11 │                   1003 │
│ 2022-11-02 15:43:42 │       4 │                      9 │                     18 │
│ 2022-11-02 15:43:43 │       8 │                     18 │                    593 │
│ 2022-11-02 15:43:44 │      17 │                     12 │                   2269 │
│ 2022-11-02 15:43:45 │       2 │                     12 │                     14 │
│ 2022-11-02 15:43:46 │      18 │                     11 │                   1128 │
│ 2022-11-02 15:47:07 │       4 │                     12 │                     18 │
│ 2022-11-02 15:47:08 │      10 │                     11 │                     25 │
│ 2022-11-02 15:47:09 │      10 │                     12 │                     22 │
│ 2022-11-02 15:47:10 │      10 │                     11 │                     26 │
│ 2022-11-02 15:47:11 │       4 │                     15 │                     25 │
│ 2022-11-02 15:47:12 │      16 │                     11 │                   1125 │
│ 2022-11-02 15:47:13 │      10 │                     12 │                     21 │
│ 2022-11-02 15:47:14 │      10 │                     14 │                    401 │
│ 2022-11-02 15:47:15 │       9 │                     12 │                     18 │
│ 2022-11-02 15:47:16 │       2 │                     23 │                     36 │
│ 2022-11-02 15:47:18 │      28 │                     10 │                   2227 │
│ 2022-11-02 15:47:19 │      10 │                     10 │                     18 │
│ 2022-11-02 15:47:20 │       5 │                     10 │                     20 │
│ 2022-11-02 15:47:21 │      15 │                     12 │                   1006 │
│ 2022-11-02 15:47:22 │      10 │                     10 │                     89 │
│ 2022-11-02 15:47:23 │       8 │                     10 │                     20 │
│ 2022-11-02 15:47:24 │       7 │                     41 │                   1089 │
│ 2022-11-02 15:47:25 │       8 │                     12 │                    763 │
│ 2022-11-02 15:47:26 │       3 │                     31 │                    727 │
│ 2022-11-02 15:47:27 │      23 │                     10 │                   2051 │
│ 2022-11-02 15:47:28 │       7 │                      9 │                     15 │
│ 2022-11-02 15:47:30 │      21 │                     15 │                   1876 │
│ 2022-11-02 15:47:31 │      11 │                     10 │                   1550 │
│ 2022-11-02 15:47:32 │      10 │                     11 │                     25 │
│ 2022-11-02 15:47:33 │      10 │                     10 │                     30 │
│ 2022-11-02 15:47:34 │       6 │                     16 │                     24 │
│ 2022-11-02 15:47:35 │       2 │                   1217 │                   1308 │
│ 2022-11-02 15:47:36 │      21 │                      8 │                   1130 │
│ 2022-11-02 15:47:37 │      10 │                     11 │                     21 │
└─────────────────────┴─────────┴────────────────────────┴────────────────────────┘

Totals:
┌──────────event_time─┬─count()─┬─min(query_duration_ms)─┬─max(query_duration_ms)─┐
│ 1970-01-01 00:00:00 │     600 │                      8 │                   2269 │
└─────────────────────┴─────────┴────────────────────────┴────────────────────────┘

58 rows in set. Elapsed: 0.006 sec. Processed 11.30 thousand rows, 87.17 KB (1.80 million rows/s., 13.90 MB/s.)

As expected, there were a lot of errors in the background as connections were failing:

SELECT
    name,
    code,
    value
FROM system.errors

Query id: bbe18fab-cd06-4e79-85cd-67107b1da59b

┌─name─────────────────┬─code─┬─value─┐
│ CANNOT_READ_ALL_DATA │   33 │  8164 │
│ NETWORK_ERROR        │  210 │    14 │
│ TABLE_IS_READ_ONLY   │  242 │   547 │
│ KEEPER_EXCEPTION     │  999 │  3292 │
└──────────────────────┴──────┴───────┘

4 rows in set. Elapsed: 0.001 sec.

@devcrafter
Member Author

I just wanted to let you know I did some basic tests and didn't find any issues. Basically I've run 600 inserts (one every 0.1 seconds) while I applied chaos engineering (and I was the monkey restarting random ZK instances in a cluster of 3) and it worked without flaws.

@Algunenano Thanks. Your testing is much appreciated.

I just wanted to mention that the insert retries are best-effort, so if you're especially "lucky" the insert can still fail: for example, when the Keeper request that commits a part to Keeper (see request) fails and no data was written. In that case, we return Insert failed due to zookeeper error. Please retry. Reason: ... from here (note: the links above are part of a file with a large diff, so you'd need to unfold the file first to see the corresponding code lines).
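
The retry budget on the server side is controlled by the insert_keeper_max_retries setting introduced alongside this feature; a client can raise it per query (a sketch, assuming the backoff setting names and the table name; treat them as illustrative rather than authoritative):

```sql
-- Allow up to 20 Keeper retries for this insert, with exponential
-- backoff between attempts (initial and maximum backoff in ms).
INSERT INTO test_table VALUES (1, 'a')
SETTINGS
    insert_keeper_max_retries = 20,
    insert_keeper_retry_initial_backoff_ms = 100,
    insert_keeper_retry_max_backoff_ms = 10000;
```

If the best-effort retries are exhausted and the error above is returned, the client still needs its own retry of the whole INSERT statement.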

@devcrafter
Member Author

devcrafter commented Nov 5, 2022

Test failures to investigate:

  • flaky 00502_custom_partitioning_replicated_zookeeper_long here
  • flaky 01346_alter_enum_partition_key_replicated_zookeeper_long here
  • test_restore_replica/test.py::test_restore_replica_sequential, test_restore_replica/test.py::test_restore_replica_parallel release and tsan
  • 01532_execute_merges_on_single_replica_long here

@tavplubix @alesapin, I need your help here. It's not really clear to me what's going on there

@alesapin alesapin self-assigned this Nov 9, 2022
@devcrafter
Member Author

An integration test for Keeper has failed. It's not related to this change, but it's worth checking. @antonio2368 would you mind taking a look?
test_keeper_four_word_command/test.py::test_cmd_stat here:

>           assert int(result["Zxid"]) > 14
E           AssertionError: assert 11 > 14
E            +  where 11 = int('11')

@devcrafter devcrafter merged commit 1b9be27 into master Nov 10, 2022
@devcrafter devcrafter deleted the igor/insert_zk_retries_retry branch November 10, 2022 11:01
@tavplubix
Member

@devcrafter, you forgot to take a look at Stress Tests failures, some of them are related

@devcrafter
Member Author

devcrafter commented Nov 12, 2022

@devcrafter, you forgot to take a look at Stress Tests failures, some of them are related

Stress test failures were caused by the config for functional tests, which includes the newly introduced settings. The settings are not known to previous ClickHouse versions, so the previous version failed to start.

The settings are reintroduced in #43117

@zhyhang

zhyhang commented Nov 6, 2023

@devcrafter
Version 23.8.5.16, settings: insert_keeper_max_retries = 20 (default value).
I inserted 72,000 rows (about 1,000 partitions) into a replicated merge tree table. When I killed a ZooKeeper node, the insert finished normally, but I observed duplicated rows (about 85,000). Why?
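
One thing worth checking in a case like this (my assumption, not a confirmed diagnosis): retried inserts rely on block-level deduplication via block hashes stored in Keeper, and the number of retained hashes is bounded, so an insert that produces on the order of 1,000 blocks may exceed the window and get re-inserted rather than deduplicated on retry. The relevant table setting can be inspected like this:

```sql
-- Check how many recent block hashes are kept for insert deduplication.
SELECT name, value
FROM system.merge_tree_settings
WHERE name LIKE '%deduplication_window%';
```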



Development

Successfully merging this pull request may close these issues.

Retries on ZooKeeper errors inside INSERT

6 participants