
Revert "Revert " Keeper retries during insert (clean)""#43122

Merged
devcrafter merged 2 commits into master from revert-43116-revert-42607-igor/insert_zk_retries_retry
Nov 11, 2022

Conversation

@devcrafter
Member

@devcrafter devcrafter commented Nov 10, 2022

Reverts #43116
Resurrects #42607

@robot-ch-test-poll robot-ch-test-poll added the pr-not-for-changelog This PR should not be mentioned in the changelog label Nov 10, 2022
- remove unused config for now
@devcrafter devcrafter force-pushed the revert-43116-revert-42607-igor/insert_zk_retries_retry branch from 1972b25 to c00f713 on November 10, 2022 12:31
@devcrafter
Member Author

devcrafter commented Nov 11, 2022

Integration test failures:

Stress tests:

2022.11.11 11:10:46.078370 [ 304294 ] {} <Error> virtual void DB::MetadataStorageFromDiskTransaction::commit(): Code: 521. DB::ErrnoException: Paths cannot be exchanged because /var/lib/clickhouse/disks/s3/data/system/part_log/202211_1_243_106 or /var/lib/clickhouse/disks/s3/data/system/part_log/delete_tmp_202211_1_243_106 does not exist, errno: 2, strerror: No such file or directory. (ATOMIC_RENAME_FAIL), Stack trace (when copying this message, always include the lines below):
2022.11.11 11:10:57.823019 [ 303780 ] {} <Error> Application: Caught exception while loading metadata: Code: 521. DB::ErrnoException: Paths cannot be exchanged because /var/lib/clickhouse/disks/s3/data/system/part_log/delete_tmp_202211_1_243_106 or /var/lib/clickhouse/disks/s3/data/system/part_log/202211_1_243_106 does not exist, errno: 2, strerror: No such file or directory: While rolling back operation #0. (ATOMIC_RENAME_FAIL), Stack trace (when copying this message, always include the lines below):
2022.11.11 11:11:37.809876 [ 303780 ] {} <Error> Application: DB::ErrnoException: Paths cannot be exchanged because /var/lib/clickhouse/disks/s3/data/system/part_log/delete_tmp_202211_1_243_106 or /var/lib/clickhouse/disks/s3/data/system/part_log/202211_1_243_106 does not exist, errno: 2, strerror: No such file or directory: While rolling back operation #0

@tavplubix doesn't this look like something familiar?

@tavplubix
Member

@tavplubix doesn't this look like something familiar?

No, it's something new. It seems like the directory just disappeared somehow, but I don't see who could have removed it:

2022.11.11 11:09:33.816618 [ 304178 ] {} <Trace> system.part_log: Loading part 202211_1_243_106 (Compact) from disk s3
2022.11.11 11:09:34.872752 [ 304178 ] {} <Trace> system.part_log: Finished part 202211_1_243_106 load on disk s3
2022.11.11 11:09:38.018187 [ 304046 ] {} <Test> VersionMetadata: Trying to lock removal_tid by (1, 1, 00000000-0000-0000-0000-000000000000), table: system.part_log, part: 202211_1_243_106
2022.11.11 11:10:41.972273 [ 304294 ] {} <Debug> system.part_log: Removing 52 parts from filesystem: 202211_1_222_90, 202211_1_223_91, 202211_1_224_92, 202211_1_225_93, 202211_1_231_94, 202211_1_232_95, 202211_1_233_96, 202211_1_234_97, 202211_1_235_98, 202211_1_236_99, 202211_1_237_100, 202211_1_238_101, 202211_1_239_102, 202211_1_240_103, 202211_1_241_104, 202211_1_242_105, 202211_1_243_106, 202211_1_244_107, 202211_1_245_108, 202211_1_246_109, 202211_1_247_110, 202211_223_223_0, 202211_224_224_0, 202211_225_225_0, 202211_226_226_0, 202211_226_231_1, 202211_227_227_0, 202211_228_228_0, 202211_229_229_0, 202211_230_230_0, 202211_231_231_0, 202211_232_232_0, 202211_233_233_0, 202211_234_234_0, 202211_235_235_0, 202211_236_236_0, 202211_237_237_0, 202211_238_238_0, 202211_239_239_0, 202211_240_240_0, 202211_241_241_0, 202211_242_242_0, 202211_243_243_0, 202211_244_244_0, 202211_245_245_0, 202211_246_246_0, 202211_247_247_0, 202211_248_248_0, 202211_249_249_0, 202211_250_250_0, 202211_251_251_0, 202211_252_252_0
2022.11.11 11:10:46.078370 [ 304294 ] {} <Error> virtual void DB::MetadataStorageFromDiskTransaction::commit(): Code: 521. DB::ErrnoException: Paths cannot be exchanged because /var/lib/clickhouse/disks/s3/data/system/part_log/202211_1_243_106 or /var/lib/clickhouse/disks/s3/data/system/part_log/delete_tmp_202211_1_243_106 does not exist, errno: 2, strerror: No such file or directory. (ATOMIC_RENAME_FAIL), Stack trace (when copying this message, always include the lines below):
2022.11.11 11:10:57.823019 [ 303780 ] {} <Error> Application: Caught exception while loading metadata: Code: 521. DB::ErrnoException: Paths cannot be exchanged because /var/lib/clickhouse/disks/s3/data/system/part_log/delete_tmp_202211_1_243_106 or /var/lib/clickhouse/disks/s3/data/system/part_log/202211_1_243_106 does not exist, errno: 2, strerror: No such file or directory: While rolling back operation #0. (ATOMIC_RENAME_FAIL), Stack trace (when copying this message, always include the lines below):
2022.11.11 11:11:37.809876 [ 303780 ] {} <Error> Application: DB::ErrnoException: Paths cannot be exchanged because /var/lib/clickhouse/disks/s3/data/system/part_log/delete_tmp_202211_1_243_106 or /var/lib/clickhouse/disks/s3/data/system/part_log/202211_1_243_106 does not exist, errno: 2, strerror: No such file or directory: While rolling back operation #0
2022.11.11 11:11:54.685395 [ 305224 ] {} <Trace> system.part_log_0: Loading part 202211_1_243_106 (Compact) from disk s3
2022.11.11 11:11:55.010476 [ 305224 ] {} <Trace> system.part_log_0: Finished part 202211_1_243_106 load on disk s3
2022.11.11 11:11:55.223057 [ 305144 ] {} <Test> VersionMetadata: Trying to lock removal_tid by (1, 1, 00000000-0000-0000-0000-000000000000), table: system.part_log_0, part: 202211_1_243_106
2022.11.11 11:12:58.868184 [ 305298 ] {} <Debug> system.part_log_0: Removing 87 parts from filesystem: 202211_1_243_106, 202211_1_244_107, 202211_1_245_108, 202211_1_246_109, 202211_1_247_110, 202211_1_248_111, 202211_1_249_112, 202211_1_250_113, 202211_1_251_114, 202211_1_252_115, 202211_1_253_116, 202211_1_254_117, 202211_1_255_118, 202211_1_256_119, 202211_1_262_120, 202211_1_263_121, 202211_1_264_122, 202211_1_265_123, 202211_1_266_124, 202211_1_267_125, 202211_1_268_126, 202211_1_269_127, 202211_1_275_128, 202211_1_281_129, 202211_223_223_0, 202211_224_224_0, 202211_225_225_0, 202211_226_226_0, 202211_226_231_1, 202211_227_227_0, 202211_228_228_0, 202211_229_229_0, 202211_230_230_0, 202211_231_231_0, 202211_232_232_0, 202211_233_233_0, 202211_234_234_0, 202211_235_235_0, 202211_236_236_0, 202211_237_237_0, 202211_238_238_0, 202211_239_239_0, 202211_240_240_0, 202211_241_241_0, 202211_242_242_0, 202211_243_243_0, 202211_244_244_0, 202211_245_245_0, 202211_246_246_0, 202211_247_247_0, 202211_248_248_0, 202211_249_249_0, 202211_250_250_0, 202211_251_251_0, 202211_252_252_0, 202211_253_253_0, 202211_254_254_0, 202211_255_255_0, 202211_256_256_0, 202211_257_257_0, 202211_257_262_1, 202211_258_258_0, 202211_259_259_0, 202211_260_260_0, 202211_261_261_0, 202211_262_262_0, 202211_263_263_0, 202211_264_264_0, 202211_265_265_0, 202211_266_266_0, 202211_267_267_0, 202211_268_268_0, 202211_269_269_0, 202211_270_270_0, 202211_270_275_1, 202211_271_271_0, 202211_272_272_0, 202211_273_273_0, 202211_274_274_0, 202211_275_275_0, 202211_276_276_0, 202211_277_277_0, 202211_278_278_0, 202211_279_279_0, 202211_280_280_0, 202211_281_281_0, 202211_282_282_0

@devcrafter
Member Author

@tavplubix doesn't this look like something familiar?

No, it's something new. It seems like the directory just disappeared somehow, but I don't see who could have removed it:

Created #43185

}
- else if (multi_code == Coordination::Error::ZCONNECTIONLOSS
-     || multi_code == Coordination::Error::ZOPERATIONTIMEOUT)
+ else if (Coordination::isHardwareError(multi_code))
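For context on the hunk above: the old branch retried only on connection loss and operation timeout, while Coordination::isHardwareError widens this to the whole class of errors where the session state is in doubt. A minimal standalone sketch of roughly what that predicate covers follows; the exact list of error codes is an assumption here, and the authoritative definition lives in ClickHouse's ZooKeeper client sources.

#include <cstdio>

// Standalone approximation for illustration; not the real ClickHouse header.
enum class Error
{
    ZOK,
    ZCONNECTIONLOSS,
    ZOPERATIONTIMEOUT,
    ZSESSIONEXPIRED,
    ZSESSIONMOVED,
    ZINVALIDSTATE,
    ZMARSHALLINGERROR,
    ZNODEEXISTS,
};

// "Hardware" errors: the connection/session is in doubt, so the outcome of the
// multi request is unknown and the caller must inspect ZooKeeper state itself.
bool isHardwareError(Error code)
{
    return code == Error::ZINVALIDSTATE
        || code == Error::ZSESSIONEXPIRED
        || code == Error::ZSESSIONMOVED
        || code == Error::ZCONNECTIONLOSS
        || code == Error::ZMARSHALLINGERROR
        || code == Error::ZOPERATIONTIMEOUT;
}

int main()
{
    // ZNODEEXISTS is a logical error, not a hardware one: the request outcome is known.
    std::printf("%d %d\n", isHardwareError(Error::ZCONNECTIONLOSS), isHardwareError(Error::ZNODEEXISTS));
}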
Member


I wonder if it would be possible to improve this error and the use of part_committed_locally_but_zookeeper.

Currently, if there is a hardware error here, CH checks whether the part was written to ZK; if it was, it returns success, and if it wasn't, it returns an UNEXPECTED_ZOOKEEPER_ERROR failure, asking the customer to retry and leaving the part to be cleaned up later.

Knowing that one of the operations done in the process is to unlock the block_number_lock, could we use that to retry these operations if the lock still exists? The idea would be to retry only this request and not the whole function. WDYT @devcrafter?

One thing that might be necessary to do this would be to break down the function a bit (it's over 600 lines with lots of branches) so it's simpler to decide whether a ZK retry is possible, and for which part. It might require a refactor of the function and of the retries (to share retries across different blocks).
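To make the behaviour described above concrete, here is a simplified standalone sketch of that decision; the helper name part_exists_in_zookeeper and the exception text are illustrative assumptions, not the actual ReplicatedMergeTreeSink::commitPart code.

#include <stdexcept>
#include <string>

// Hypothetical probe: does the part's znode exist after the failed multi request?
bool part_exists_in_zookeeper(const std::string & /*part_path*/)
{
    return false;  // stubbed for the sketch
}

// Sketch of the current behaviour on a hardware error during commit:
// if the part node made it into ZooKeeper the insert is reported as successful;
// otherwise the client is asked to retry and the local part is left for cleanup.
void handle_hardware_error_on_commit(const std::string & part_path)
{
    if (part_exists_in_zookeeper(part_path))
        return;  // the commit actually went through, report success

    throw std::runtime_error(
        "Insert failed due to zookeeper error. Please retry. (UNEXPECTED_ZOOKEEPER_ERROR)");
}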

Member

@Algunenano Algunenano Aug 31, 2023


@devcrafter Any thoughts on this?

When it happens, the backtrace is something like:

2023.07.26 11:52:28.793411 [ 7396 ] {91b00065-31e7-4129-be83-521f34038017} <Error> executeQuery: Code: 244. DB::Exception: Insert failed due to zookeeper error. Please retry. Reason: Connection loss: while pushing to view XXXXXXXXXXXXXXXXXXXXXX (5f9f5456-e71b-41f7-ac1e-4cee888817a9): while pushing to view YYYYYYYYYYYYYYYYYYYYYY (6f1e9b40-f141-4df6-b7c2-de8863ef7561). (UNEXPECTED_ZOOKEEPER_ERROR) (version 23.5.3.24 (official build)) (from 10.150.15.221:50962) (
in query: INSERT INTO ZZZZZZZZZZZZZZZZZZZZZZZZ(oid, pid, id, h, u, v, t, c, p, n, s, bn, bv, en, ev, on, ov, ovn, pt, pv, pm) FORMAT RowBinary ), Stack trace (when copying this message, always include the lines below):

0. DB::Exception::Exception(DB::Exception::MessageMasked&&, int, bool) @ 0x000000000e348c95 in /usr/bin/clickhouse
1. ? @ 0x0000000008b89b4d in /usr/bin/clickhouse
2. ? @ 0x00000000129b7e9e in /usr/bin/clickhouse
3. ? @ 0x00000000129b705e in /usr/bin/clickhouse
4. DB::ReplicatedMergeTreeSinkImpl<false>::commitPart(std::shared_ptr<DB::ZooKeeperWithFaultInjection> const&, std::shared_ptr<DB::IMergeTreeDataPart>&, String const&, unsigned long, bool) @ 0x0000000014b1bdf9 in /usr/bin/clickhouse
5. DB::ReplicatedMergeTreeSinkImpl<false>::finishDelayedChunk(std::shared_ptr<DB::ZooKeeperWithFaultInjection> const&) @ 0x0000000014b18a1a in /usr/bin/clickhouse
6. DB::ReplicatedMergeTreeSinkImpl<false>::consume(DB::Chunk) @ 0x0000000014b17bdd in /usr/bin/clickhouse
7. DB::SinkToStorage::onConsume(DB::Chunk) @ 0x0000000014ff6566 in /usr/bin/clickhouse
8. ? @ 0x0000000014f3dbeb in /usr/bin/clickhouse
9. ? @ 0x0000000014f3d91c in /usr/bin/clickhouse
10. DB::ExceptionKeepingTransform::work() @ 0x0000000014f3d1d8 in /usr/bin/clickhouse
11. DB::ExecutionThreadContext::executeTask() @ 0x0000000014d37d89 in /usr/bin/clickhouse
12. DB::PipelineExecutor::executeStepImpl(unsigned long, std::atomic<bool>*) @ 0x0000000014d2ea10 in /usr/bin/clickhouse
13. DB::PipelineExecutor::executeImpl(unsigned long) @ 0x0000000014d2df30 in /usr/bin/clickhouse
14. DB::PipelineExecutor::execute(unsigned long) @ 0x0000000014d2dbd1 in /usr/bin/clickhouse
15. DB::CompletedPipelineExecutor::execute() @ 0x0000000014d2c4dd in /usr/bin/clickhouse
16. DB::executeQuery(DB::ReadBuffer&, DB::WriteBuffer&, bool, std::shared_ptr<DB::Context>, std::function<void (DB::QueryResultDetails const&)>, std::optional<DB::FormatSettings> const&) @ 0x0000000013ee210b in /usr/bin/clickhouse
17. DB::HTTPHandler::processQuery(DB::HTTPServerRequest&, DB::HTMLForm&, DB::HTTPServerResponse&, DB::HTTPHandler::Output&, std::optional<DB::CurrentThread::QueryScope>&) @ 0x0000000014c7ee35 in /usr/bin/clickhouse
18. DB::HTTPHandler::handleRequest(DB::HTTPServerRequest&, DB::HTTPServerResponse&) @ 0x0000000014c830c4 in /usr/bin/clickhouse
19. DB::HTTPServerConnection::run() @ 0x0000000014cf2bf2 in /usr/bin/clickhouse
20. Poco::Net::TCPServerConnection::start() @ 0x0000000017c2fe34 in /usr/bin/clickhouse
21. Poco::Net::TCPServerDispatcher::run() @ 0x0000000017c31051 in /usr/bin/clickhouse
22. Poco::PooledThread::run() @ 0x0000000017dadd07 in /usr/bin/clickhouse
23. Poco::ThreadImpl::runnableEntry(void*) @ 0x0000000017dab73c in /usr/bin/clickhouse
24. start_thread @ 0x00000000000076db in /lib/x86_64-linux-gnu/libpthread-2.27.so
25. __clone @ 0x000000000012161f in /lib/x86_64-linux-gnu/libc-2.27.so

Member Author


Knowing that one of the operations done in the process is to unlock the block_number_lock, could we use that to retry these operations if the lock still exists? The idea would be to retry only this request and not the whole function.

block_number_lock creates an ephemeral node in ZooKeeper. After the connection/session to ZooKeeper is lost/closed, the node is no longer valid, and neither is block_number. So this way is not viable (unless I'm missing something).

Member


Usually the ephemeral lock will still be alive, as ZK keeps it for a set amount of time, so we can try to recover it. IIRC, for the table locks we check whether the node exists and, if it does, delete it and recreate it to acquire it again, thus avoiding the need to wait until deletion happens normally. We could do something similar here, as long as we keep it concurrency-safe.
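A rough standalone sketch of the recovery idea being floated here; the KeeperStub interface and the delete-then-recreate flow are assumptions for illustration, not the existing table-lock code.

#include <string>

// Hypothetical minimal ZooKeeper-like interface, for illustration only.
struct KeeperStub
{
    bool exists(const std::string &) { return true; }
    void remove(const std::string &) {}
    void createEphemeral(const std::string &) {}
};

// After a session loss, the ephemeral lock from the old session can linger until that
// session times out. The idea: instead of waiting, check for it and, if it is still
// there, delete it and recreate it under the new session so the retry can proceed.
// This would have to be made concurrency-safe (e.g. done atomically in a multi request)
// so another replica cannot grab the lock in between.
void try_reacquire_block_lock(KeeperStub & zk, const std::string & lock_path)
{
    if (zk.exists(lock_path))
        zk.remove(lock_path);       // drop the stale lock left by the expired session
    zk.createEphemeral(lock_path);  // own it again from the current session
}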

