
Revert "Revert " Keeper retries during insert (clean)""#43122

Merged
devcrafter merged 2 commits into master from revert-43116-revert-42607-igor/insert_zk_retries_retry
Nov 11, 2022

Conversation

@devcrafter
Member

@devcrafter devcrafter commented Nov 10, 2022

Reverts #43116
Resurrects #42607

@robot-ch-test-poll robot-ch-test-poll added the pr-not-for-changelog This PR should not be mentioned in the changelog label Nov 10, 2022
- remove unused config for now
@devcrafter devcrafter force-pushed the revert-43116-revert-42607-igor/insert_zk_retries_retry branch from 1972b25 to c00f713 on November 10, 2022 12:31
@devcrafter
Member Author

devcrafter commented Nov 11, 2022

Integration test failures:

Stress tests:

2022.11.11 11:10:46.078370 [ 304294 ] {} <Error> virtual void DB::MetadataStorageFromDiskTransaction::commit(): Code: 521. DB::ErrnoException: Paths cannot be exchanged because /var/lib/clickhouse/disks/s3/data/system/part_log/202211_1_243_106 or /var/lib/clickhouse/disks/s3/data/system/part_log/delete_tmp_202211_1_243_106 does not exist, errno: 2, strerror: No such file or directory. (ATOMIC_RENAME_FAIL), Stack trace (when copying this message, always include the lines below):
2022.11.11 11:10:57.823019 [ 303780 ] {} <Error> Application: Caught exception while loading metadata: Code: 521. DB::ErrnoException: Paths cannot be exchanged because /var/lib/clickhouse/disks/s3/data/system/part_log/delete_tmp_202211_1_243_106 or /var/lib/clickhouse/disks/s3/data/system/part_log/202211_1_243_106 does not exist, errno: 2, strerror: No such file or directory: While rolling back operation #0. (ATOMIC_RENAME_FAIL), Stack trace (when copying this message, always include the lines below):
2022.11.11 11:11:37.809876 [ 303780 ] {} <Error> Application: DB::ErrnoException: Paths cannot be exchanged because /var/lib/clickhouse/disks/s3/data/system/part_log/delete_tmp_202211_1_243_106 or /var/lib/clickhouse/disks/s3/data/system/part_log/202211_1_243_106 does not exist, errno: 2, strerror: No such file or directory: While rolling back operation #0

@tavplubix doesn't this look like something familiar?

@tavplubix
Member

@tavplubix doesn't this look like something familiar?

No, it's something new. It seems like the directory just disappeared somehow, but I don't see who could have removed it:

2022.11.11 11:09:33.816618 [ 304178 ] {} <Trace> system.part_log: Loading part 202211_1_243_106 (Compact) from disk s3
2022.11.11 11:09:34.872752 [ 304178 ] {} <Trace> system.part_log: Finished part 202211_1_243_106 load on disk s3
2022.11.11 11:09:38.018187 [ 304046 ] {} <Test> VersionMetadata: Trying to lock removal_tid by (1, 1, 00000000-0000-0000-0000-000000000000), table: system.part_log, part: 202211_1_243_106
2022.11.11 11:10:41.972273 [ 304294 ] {} <Debug> system.part_log: Removing 52 parts from filesystem: 202211_1_222_90, 202211_1_223_91, 202211_1_224_92, 202211_1_225_93, 202211_1_231_94, 202211_1_232_95, 202211_1_233_96, 202211_1_234_97, 202211_1_235_98, 202211_1_236_99, 202211_1_237_100, 202211_1_238_101, 202211_1_239_102, 202211_1_240_103, 202211_1_241_104, 202211_1_242_105, 202211_1_243_106, 202211_1_244_107, 202211_1_245_108, 202211_1_246_109, 202211_1_247_110, 202211_223_223_0, 202211_224_224_0, 202211_225_225_0, 202211_226_226_0, 202211_226_231_1, 202211_227_227_0, 202211_228_228_0, 202211_229_229_0, 202211_230_230_0, 202211_231_231_0, 202211_232_232_0, 202211_233_233_0, 202211_234_234_0, 202211_235_235_0, 202211_236_236_0, 202211_237_237_0, 202211_238_238_0, 202211_239_239_0, 202211_240_240_0, 202211_241_241_0, 202211_242_242_0, 202211_243_243_0, 202211_244_244_0, 202211_245_245_0, 202211_246_246_0, 202211_247_247_0, 202211_248_248_0, 202211_249_249_0, 202211_250_250_0, 202211_251_251_0, 202211_252_252_0
2022.11.11 11:10:46.078370 [ 304294 ] {} <Error> virtual void DB::MetadataStorageFromDiskTransaction::commit(): Code: 521. DB::ErrnoException: Paths cannot be exchanged because /var/lib/clickhouse/disks/s3/data/system/part_log/202211_1_243_106 or /var/lib/clickhouse/disks/s3/data/system/part_log/delete_tmp_202211_1_243_106 does not exist, errno: 2, strerror: No such file or directory. (ATOMIC_RENAME_FAIL), Stack trace (when copying this message, always include the lines below):
2022.11.11 11:10:57.823019 [ 303780 ] {} <Error> Application: Caught exception while loading metadata: Code: 521. DB::ErrnoException: Paths cannot be exchanged because /var/lib/clickhouse/disks/s3/data/system/part_log/delete_tmp_202211_1_243_106 or /var/lib/clickhouse/disks/s3/data/system/part_log/202211_1_243_106 does not exist, errno: 2, strerror: No such file or directory: While rolling back operation #0. (ATOMIC_RENAME_FAIL), Stack trace (when copying this message, always include the lines below):
2022.11.11 11:11:37.809876 [ 303780 ] {} <Error> Application: DB::ErrnoException: Paths cannot be exchanged because /var/lib/clickhouse/disks/s3/data/system/part_log/delete_tmp_202211_1_243_106 or /var/lib/clickhouse/disks/s3/data/system/part_log/202211_1_243_106 does not exist, errno: 2, strerror: No such file or directory: While rolling back operation #0
2022.11.11 11:11:54.685395 [ 305224 ] {} <Trace> system.part_log_0: Loading part 202211_1_243_106 (Compact) from disk s3
2022.11.11 11:11:55.010476 [ 305224 ] {} <Trace> system.part_log_0: Finished part 202211_1_243_106 load on disk s3
2022.11.11 11:11:55.223057 [ 305144 ] {} <Test> VersionMetadata: Trying to lock removal_tid by (1, 1, 00000000-0000-0000-0000-000000000000), table: system.part_log_0, part: 202211_1_243_106
2022.11.11 11:12:58.868184 [ 305298 ] {} <Debug> system.part_log_0: Removing 87 parts from filesystem: 202211_1_243_106, 202211_1_244_107, 202211_1_245_108, 202211_1_246_109, 202211_1_247_110, 202211_1_248_111, 202211_1_249_112, 202211_1_250_113, 202211_1_251_114, 202211_1_252_115, 202211_1_253_116, 202211_1_254_117, 202211_1_255_118, 202211_1_256_119, 202211_1_262_120, 202211_1_263_121, 202211_1_264_122, 202211_1_265_123, 202211_1_266_124, 202211_1_267_125, 202211_1_268_126, 202211_1_269_127, 202211_1_275_128, 202211_1_281_129, 202211_223_223_0, 202211_224_224_0, 202211_225_225_0, 202211_226_226_0, 202211_226_231_1, 202211_227_227_0, 202211_228_228_0, 202211_229_229_0, 202211_230_230_0, 202211_231_231_0, 202211_232_232_0, 202211_233_233_0, 202211_234_234_0, 202211_235_235_0, 202211_236_236_0, 202211_237_237_0, 202211_238_238_0, 202211_239_239_0, 202211_240_240_0, 202211_241_241_0, 202211_242_242_0, 202211_243_243_0, 202211_244_244_0, 202211_245_245_0, 202211_246_246_0, 202211_247_247_0, 202211_248_248_0, 202211_249_249_0, 202211_250_250_0, 202211_251_251_0, 202211_252_252_0, 202211_253_253_0, 202211_254_254_0, 202211_255_255_0, 202211_256_256_0, 202211_257_257_0, 202211_257_262_1, 202211_258_258_0, 202211_259_259_0, 202211_260_260_0, 202211_261_261_0, 202211_262_262_0, 202211_263_263_0, 202211_264_264_0, 202211_265_265_0, 202211_266_266_0, 202211_267_267_0, 202211_268_268_0, 202211_269_269_0, 202211_270_270_0, 202211_270_275_1, 202211_271_271_0, 202211_272_272_0, 202211_273_273_0, 202211_274_274_0, 202211_275_275_0, 202211_276_276_0, 202211_277_277_0, 202211_278_278_0, 202211_279_279_0, 202211_280_280_0, 202211_281_281_0, 202211_282_282_0

@devcrafter
Member Author

@tavplubix doesn't this look like something familiar?

No, it's something new. It seems like the directory just disappeared somehow, but I don't see who could have removed it:

Created #43185

}
- else if (multi_code == Coordination::Error::ZCONNECTIONLOSS
-     || multi_code == Coordination::Error::ZOPERATIONTIMEOUT)
+ else if (Coordination::isHardwareError(multi_code))
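For context on the hunk above: the old branch retried only on connection loss and operation timeout, while Coordination::isHardwareError widens this to the whole class of errors where the session state is in doubt. A minimal standalone sketch of roughly what that predicate covers follows; the exact list of error codes is an assumption here, and the authoritative definition lives in ClickHouse's ZooKeeper client sources.

#include <cstdio>

// Standalone approximation for illustration; not the real ClickHouse header.
enum class Error
{
    ZOK,
    ZCONNECTIONLOSS,
    ZOPERATIONTIMEOUT,
    ZSESSIONEXPIRED,
    ZSESSIONMOVED,
    ZINVALIDSTATE,
    ZMARSHALLINGERROR,
    ZNODEEXISTS,
};

// "Hardware" errors: the connection/session is in doubt, so the outcome of the
// multi request is unknown and the caller must inspect ZooKeeper state itself.
bool isHardwareError(Error code)
{
    return code == Error::ZINVALIDSTATE
        || code == Error::ZSESSIONEXPIRED
        || code == Error::ZSESSIONMOVED
        || code == Error::ZCONNECTIONLOSS
        || code == Error::ZMARSHALLINGERROR
        || code == Error::ZOPERATIONTIMEOUT;
}

int main()
{
    // ZNODEEXISTS is a logical error, not a hardware one: the request outcome is known.
    std::printf("%d %d\n", isHardwareError(Error::ZCONNECTIONLOSS), isHardwareError(Error::ZNODEEXISTS));
}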
Member


I wonder if it would be possible to improve this error and the use of part_committed_locally_but_zookeeper.

Currently, if there is a hardware error here, CH checks whether the part was written to ZK; if it was, it returns success, and if it wasn't, it returns an UNEXPECTED_ZOOKEEPER_ERROR failure, asking the customer to retry and leaving the part to be cleaned up later.

Knowing that one of the operations done in the process is to unlock the block_number_lock, could we use that to retry these operations if the lock still exists? The idea would be to retry only this request and not the whole function. WDYT @devcrafter?

One thing that might be necessary to do this would be to break down the function a bit (it's over 600 lines with lots of branches) so it's simpler to decide whether a ZK retry is possible, and for which part. It might require a refactor of the function and of the retries (to share retries across different blocks).
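To make the behaviour described above concrete, here is a simplified standalone sketch of that decision; the helper name part_exists_in_zookeeper and the exception text are illustrative assumptions, not the actual ReplicatedMergeTreeSink::commitPart code.

#include <stdexcept>
#include <string>

// Hypothetical probe: does the part's znode exist after the failed multi request?
bool part_exists_in_zookeeper(const std::string & /*part_path*/)
{
    return false;  // stubbed for the sketch
}

// Sketch of the current behaviour on a hardware error during commit:
// if the part node made it into ZooKeeper the insert is reported as successful;
// otherwise the client is asked to retry and the local part is left for cleanup.
void handle_hardware_error_on_commit(const std::string & part_path)
{
    if (part_exists_in_zookeeper(part_path))
        return;  // the commit actually went through, report success

    throw std::runtime_error(
        "Insert failed due to zookeeper error. Please retry. (UNEXPECTED_ZOOKEEPER_ERROR)");
}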

Member

@Algunenano Algunenano Aug 31, 2023


@devcrafter Any thoughts on this?

When it happens, the backtrace is something like:

2023.07.26 11:52:28.793411 [ 7396 ] {91b00065-31e7-4129-be83-521f34038017} <Error> executeQuery: Code: 244. DB::Exception: Insert failed due to zookeeper error. Please retry. Reason: Connection loss: while pushing to view XXXXXXXXXXXXXXXXXXXXXX (5f9f5456-e71b-41f7-ac1e-4cee888817a9): while pushing to view YYYYYYYYYYYYYYYYYYYYYY (6f1e9b40-f141-4df6-b7c2-de8863ef7561). (UNEXPECTED_ZOOKEEPER_ERROR) (version 23.5.3.24 (official build)) (from 10.150.15.221:50962) (
in query: INSERT INTO ZZZZZZZZZZZZZZZZZZZZZZZZ(oid, pid, id, h, u, v, t, c, p, n, s, bn, bv, en, ev, on, ov, ovn, pt, pv, pm) FORMAT RowBinary ), Stack trace (when copying this message, always include the lines below):

0. DB::Exception::Exception(DB::Exception::MessageMasked&&, int, bool) @ 0x000000000e348c95 in /usr/bin/clickhouse
1. ? @ 0x0000000008b89b4d in /usr/bin/clickhouse
2. ? @ 0x00000000129b7e9e in /usr/bin/clickhouse
3. ? @ 0x00000000129b705e in /usr/bin/clickhouse
4. DB::ReplicatedMergeTreeSinkImpl<false>::commitPart(std::shared_ptr<DB::ZooKeeperWithFaultInjection> const&, std::shared_ptr<DB::IMergeTreeDataPart>&, String const&, unsigned long, bool) @ 0x0000000014b1bdf9 in /usr/bin/clickhouse
5. DB::ReplicatedMergeTreeSinkImpl<false>::finishDelayedChunk(std::shared_ptr<DB::ZooKeeperWithFaultInjection> const&) @ 0x0000000014b18a1a in /usr/bin/clickhouse
6. DB::ReplicatedMergeTreeSinkImpl<false>::consume(DB::Chunk) @ 0x0000000014b17bdd in /usr/bin/clickhouse
7. DB::SinkToStorage::onConsume(DB::Chunk) @ 0x0000000014ff6566 in /usr/bin/clickhouse
8. ? @ 0x0000000014f3dbeb in /usr/bin/clickhouse
9. ? @ 0x0000000014f3d91c in /usr/bin/clickhouse
10. DB::ExceptionKeepingTransform::work() @ 0x0000000014f3d1d8 in /usr/bin/clickhouse
11. DB::ExecutionThreadContext::executeTask() @ 0x0000000014d37d89 in /usr/bin/clickhouse
12. DB::PipelineExecutor::executeStepImpl(unsigned long, std::atomic<bool>*) @ 0x0000000014d2ea10 in /usr/bin/clickhouse
13. DB::PipelineExecutor::executeImpl(unsigned long) @ 0x0000000014d2df30 in /usr/bin/clickhouse
14. DB::PipelineExecutor::execute(unsigned long) @ 0x0000000014d2dbd1 in /usr/bin/clickhouse
15. DB::CompletedPipelineExecutor::execute() @ 0x0000000014d2c4dd in /usr/bin/clickhouse
16. DB::executeQuery(DB::ReadBuffer&, DB::WriteBuffer&, bool, std::shared_ptr<DB::Context>, std::function<void (DB::QueryResultDetails const&)>, std::optional<DB::FormatSettings> const&) @ 0x0000000013ee210b in /usr/bin/clickhouse
17. DB::HTTPHandler::processQuery(DB::HTTPServerRequest&, DB::HTMLForm&, DB::HTTPServerResponse&, DB::HTTPHandler::Output&, std::optional<DB::CurrentThread::QueryScope>&) @ 0x0000000014c7ee35 in /usr/bin/clickhouse
18. DB::HTTPHandler::handleRequest(DB::HTTPServerRequest&, DB::HTTPServerResponse&) @ 0x0000000014c830c4 in /usr/bin/clickhouse
19. DB::HTTPServerConnection::run() @ 0x0000000014cf2bf2 in /usr/bin/clickhouse
20. Poco::Net::TCPServerConnection::start() @ 0x0000000017c2fe34 in /usr/bin/clickhouse
21. Poco::Net::TCPServerDispatcher::run() @ 0x0000000017c31051 in /usr/bin/clickhouse
22. Poco::PooledThread::run() @ 0x0000000017dadd07 in /usr/bin/clickhouse
23. Poco::ThreadImpl::runnableEntry(void*) @ 0x0000000017dab73c in /usr/bin/clickhouse
24. start_thread @ 0x00000000000076db in /lib/x86_64-linux-gnu/libpthread-2.27.so
25. __clone @ 0x000000000012161f in /lib/x86_64-linux-gnu/libc-2.27.so

Member Author


Knowing that one of the operations done in the process is to unlock the block_number_lock, could we use that to retry these operations if the lock still exists? The idea would be to retry only this request and not the whole function.

block_number_lock creates an ephemeral node in ZooKeeper. After the connection/session to ZooKeeper is lost/closed, the node is no longer valid, and neither is block_number. So this way is not viable (unless I'm missing something).

Member


Usually the ephemeral lock will still be alive, as ZK keeps it for a set amount of time, so we can try to recover it. IIRC, for the table locks we check whether the node exists and, if it does, delete it and recreate it to acquire it again, thus avoiding the need to wait until deletion happens normally. We could do something similar here, as long as we keep it concurrency-safe.
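A rough standalone sketch of the recovery idea being floated here; the KeeperStub interface and the delete-then-recreate flow are assumptions for illustration, not the existing table-lock code.

#include <string>

// Hypothetical minimal ZooKeeper-like interface, for illustration only.
struct KeeperStub
{
    bool exists(const std::string &) { return true; }
    void remove(const std::string &) {}
    void createEphemeral(const std::string &) {}
};

// After a session loss, the ephemeral lock from the old session can linger until that
// session times out. The idea: instead of waiting, check for it and, if it is still
// there, delete it and recreate it under the new session so the retry can proceed.
// This would have to be made concurrency-safe (e.g. done atomically in a multi request)
// so another replica cannot grab the lock in between.
void try_reacquire_block_lock(KeeperStub & zk, const std::string & lock_path)
{
    if (zk.exists(lock_path))
        zk.remove(lock_path);       // drop the stale lock left by the expired session
    zk.createEphemeral(lock_path);  // own it again from the current session
}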

