Skip to content

do not clear old parts at shutdown#43760

Merged
CheSema merged 14 commits intoClickHouse:masterfrom
CheSema:do-not-run-clear-old-parts-at-shutdown
Dec 12, 2022
Merged

do not clear old parts at shutdown#43760
CheSema merged 14 commits intoClickHouse:masterfrom
CheSema:do-not-run-clear-old-parts-at-shutdown

Conversation

@CheSema
Copy link
Copy Markdown
Member

@CheSema CheSema commented Nov 28, 2022

Shutdown at tests will be much faster.

Thus is allow to decrease back to the default value max-tries at stop. With was increased up to 3 times here #43277

Changelog category (leave one):

  • Build/Testing/Packaging Improvement

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Shutdown will be much faster if do not call clearOldPartsFromFilesystem. Especially this is right for tests with zero-copy due to single thread deletion parts. clearOldPartsFromFilesystem is unnecessary after #41145

Information about CI checks: https://clickhouse.com/docs/en/development/continuous-integration/

@robot-clickhouse robot-clickhouse added the pr-build Pull request with build/testing/packaging improvement label Nov 28, 2022
@tavplubix tavplubix self-assigned this Nov 28, 2022
Comment on lines 134 to 137
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should keep increased timeout for BC check (because old versions do not have this improvement)

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Btw. I foresee that bc check will need more time to start. It will delete old parts from previous run at start now.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is that old version in BC check does not support --max-tries argument:


+ clickhouse stop --max-tries 180 --do-not-kill
std::exception. Code: 1001, type: boost::wrapexcept<boost::program_options::unknown_option>, e.what() = unrecognised option '--max-tries' (version 22.11.2.30 (official build))

https://s3.amazonaws.com/clickhouse-test-reports/44157/d54f739836fa998f0e32458bcaf3dad545e25526/stress_test__msan_.html

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And seems like it triggers gdb in BC check all the time and sometimes gdb hangs and it leads to "stress test without check_status.tsv "

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@CheSema CheSema force-pushed the do-not-run-clear-old-parts-at-shutdown branch from 4579746 to 9a2a2dd Compare November 28, 2022 15:28
Copy link
Copy Markdown
Member

@alexey-milovidov alexey-milovidov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reverts #8602.

@CheSema CheSema force-pushed the do-not-run-clear-old-parts-at-shutdown branch 2 times, most recently from 76a5896 to 6dab740 Compare December 2, 2022 16:03
@CheSema CheSema force-pushed the do-not-run-clear-old-parts-at-shutdown branch from 84a18d6 to 949a8e2 Compare December 5, 2022 17:07
@CheSema
Copy link
Copy Markdown
Member Author

CheSema commented Dec 7, 2022

Strange fail. It seems to be rare, according to the stats. But after the changes from this PR could be more cases like this. Because clickhouse re-reads old parts after detach/attach.

Error log
2022.12.06 23:07:07.671890 [ 14129 ] {c9248060-b770-4bfd-9be2-f5131c90711a} <Error> test_3wsfqacq.t_sparse_alter (7255bf95-b5b8-4ddf-8adb-a904e2f424b8): while loading part all_1_1_0 on path store/725/7255bf95-b5b8-4ddf-8adb-a904e2f424b8/all_1_1_0: Code: 228. DB::Exception: /var/lib/clickhouse/store/725/7255bf95-b5b8-4ddf-8adb-a904e2f424b8/all_1_1_0/serialization.json has unexpected size: 154 instead of 219. (BAD_SIZE_OF_FILE_IN_DATA_PART), Stack trace (when copying this message, always include the lines below):

0. ./build_docker/../src/Common/Exception.cpp:77: DB::Exception::Exception(DB::Exception::MessageMasked const&, int, bool) @ 0xe5179fa in /usr/lib/debug/usr/bin/clickhouse.debug
1. DB::Exception::Exception(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&, int, bool) @ 0x7e886ad in /usr/lib/debug/usr/bin/clickhouse.debug
2. ./build_docker/../src/Storages/MergeTree/MergeTreeDataPartChecksum.cpp:0: DB::MergeTreeDataPartChecksum::checkSize(std::__1::shared_ptr<DB::IDisk> const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&) const @ 0x14c426a2 in /usr/lib/debug/usr/bin/clickhouse.debug
3. ./build_docker/../contrib/llvm-project/libcxx/include/string:1499: DB::MergeTreeDataPartChecksums::checkSizes(std::__1::shared_ptr<DB::IDisk> const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&) const @ 0x14c42c90 in /usr/lib/debug/usr/bin/clickhouse.debug
4. ./build_docker/../contrib/llvm-project/libcxx/include/string:1499: DB::DataPartStorageOnDisk::checkConsistency(DB::MergeTreeDataPartChecksums const&) const @ 0x14adf51e in /usr/lib/debug/usr/bin/clickhouse.debug
5. ./build_docker/../contrib/llvm-project/libcxx/include/__memory/shared_ptr.h:701: DB::IMergeTreeDataPart::checkConsistencyBase() const @ 0x14b5317f in /usr/lib/debug/usr/bin/clickhouse.debug
6. ./build_docker/../src/Storages/MergeTree/MergeTreeDataPartWide.cpp:187: DB::MergeTreeDataPartWide::checkConsistency(bool) const @ 0x14c54698 in /usr/lib/debug/usr/bin/clickhouse.debug
7. ./build_docker/../src/Storages/MergeTree/IMergeTreeDataPart.cpp:659: DB::IMergeTreeDataPart::loadColumnsChecksumsIndexes(bool, bool) @ 0x14b41935 in /usr/lib/debug/usr/bin/clickhouse.debug
8. ./build_docker/../contrib/llvm-project/libcxx/include/__memory/shared_ptr.h:815: DB::MergeTreeData::loadDataPartsFromDisk(std::__1::vector<std::__1::shared_ptr<DB::IMergeTreeDataPart>, std::__1::allocator<std::__1::shared_ptr<DB::IMergeTreeDataPart>>>&, std::__1::vector<std::__1::shared_ptr<DB::IMergeTreeDataPart>, std::__1::allocator<std::__1::shared_ptr<DB::IMergeTreeDataPart>>>&, ThreadPoolImpl<ThreadFromGlobalPoolImpl<false>>&, unsigned long, std::__1::queue<std::__1::vector<std::__1::pair<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, std::__1::shared_ptr<DB::IDisk>>, std::__1::allocator<std::__1::pair<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, std::__1::shared_ptr<DB::IDisk>>>>, std::__1::deque<std::__1::vector<std::__1::pair<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, std::__1::shared_ptr<DB::IDisk>>, std::__1::allocator<std::__1::pair<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, std::__1::shared_ptr<DB::IDisk>>>>, std::__1::allocator<std::__1::vector<std::__1::pair<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, std::__1::shared_ptr<DB::IDisk>>, std::__1::allocator<std::__1::pair<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, std::__1::shared_ptr<DB::IDisk>>>>>>>&, bool, std::__1::shared_ptr<DB::MergeTreeSettings const> const&)::$_15::operator()(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&, std::__1::shared_ptr<DB::IDisk> const&) const @ 0x14c307ee in /usr/lib/debug/usr/bin/clickhouse.debug
9. ./build_docker/../src/Storages/MergeTree/MergeTreeData.cpp:1173: void std::__1::__function::__policy_invoker<void ()>::__call_impl<std::__1::__function::__default_alloc_func<DB::MergeTreeData::loadDataPartsFromDisk(std::__1::vector<std::__1::shared_ptr<DB::IMergeTreeDataPart>, std::__1::allocator<std::__1::shared_ptr<DB::IMergeTreeDataPart>>>&, std::__1::vector<std::__1::shared_ptr<DB::IMergeTreeDataPart>, std::__1::allocator<std::__1::shared_ptr<DB::IMergeTreeDataPart>>>&, ThreadPoolImpl<ThreadFromGlobalPoolImpl<false>>&, unsigned long, std::__1::queue<std::__1::vector<std::__1::pair<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, std::__1::shared_ptr<DB::IDisk>>, std::__1::allocator<std::__1::pair<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, std::__1::shared_ptr<DB::IDisk>>>>, std::__1::deque<std::__1::vector<std::__1::pair<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, std::__1::shared_ptr<DB::IDisk>>, std::__1::allocator<std::__1::pair<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, std::__1::shared_ptr<DB::IDisk>>>>, std::__1::allocator<std::__1::vector<std::__1::pair<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, std::__1::shared_ptr<DB::IDisk>>, std::__1::allocator<std::__1::pair<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, std::__1::shared_ptr<DB::IDisk>>>>>>>&, bool, std::__1::shared_ptr<DB::MergeTreeSettings const> const&)::$_16, void ()>>(std::__1::__function::__policy_storage const*) @ 0x14c2f640 in /usr/lib/debug/usr/bin/clickhouse.debug
10. ./build_docker/../base/base/wide_integer_impl.h:774: ThreadPoolImpl<ThreadFromGlobalPoolImpl<false>>::worker(std::__1::__list_iterator<ThreadFromGlobalPoolImpl<false>, void*>) @ 0xe5d4756 in /usr/lib/debug/usr/bin/clickhouse.debug
11. ./build_docker/../src/Common/ThreadPool.cpp:0: void std::__1::__function::__policy_invoker<void ()>::__call_impl<std::__1::__function::__default_alloc_func<ThreadFromGlobalPoolImpl<false>::ThreadFromGlobalPoolImpl<void ThreadPoolImpl<ThreadFromGlobalPoolImpl<false>>::scheduleImpl<void>(std::__1::function<void ()>, long, std::__1::optional<unsigned long>, bool)::'lambda0'()>(void&&)::'lambda'(), void ()>>(std::__1::__function::__policy_storage const*) @ 0xe5d6f75 in /usr/lib/debug/usr/bin/clickhouse.debug
12. ./build_docker/../base/base/wide_integer_impl.h:774: ThreadPoolImpl<std::__1::thread>::worker(std::__1::__list_iterator<std::__1::thread, void*>) @ 0xe5d0d76 in /usr/lib/debug/usr/bin/clickhouse.debug
13. ./build_docker/../contrib/llvm-project/libcxx/include/__memory/unique_ptr.h:302: void* std::__1::__thread_proxy[abi:v15000]<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct>>, void ThreadPoolImpl<std::__1::thread>::scheduleImpl<void>(std::__1::function<void ()>, long, std::__1::optional<unsigned long>, bool)::'lambda0'()>>(void*) @ 0xe5d5f41 in /usr/lib/debug/usr/bin/clickhouse.debug
14. ? @ 0x7f9c38fec609 in ?
15. clone @ 0x7f9c38f11133 in ?
 (version 22.12.1.1254)
2022.12.06 23:07:07.672138 [ 14129 ] {c9248060-b770-4bfd-9be2-f5131c90711a} <Error> test_3wsfqacq.t_sparse_alter (7255bf95-b5b8-4ddf-8adb-a904e2f424b8): Detaching broken part /var/lib/clickhouse/store/725/7255bf95-b5b8-4ddf-8adb-a904e2f424b8/all_1_1_0 (size: 10.48 KiB). If it happened after update, it is likely because of backward incompatibility. You need to resolve this manually

@CheSema
Copy link
Copy Markdown
Member Author

CheSema commented Dec 9, 2022

Integration tests asan, failed all cases in test_distributed_queries_stress/test.py::test_stress_distributed.
Test is flaky. #41776

@CheSema CheSema force-pushed the do-not-run-clear-old-parts-at-shutdown branch from 8c14e22 to 6aa0b2a Compare December 9, 2022 11:37
@CheSema
Copy link
Copy Markdown
Member Author

CheSema commented Dec 12, 2022

stress test without check_status.tsv #41850

@CheSema
Copy link
Copy Markdown
Member Author

CheSema commented Dec 12, 2022

02481_async_insert_dedup new test, seems to be flaky.

@CheSema
Copy link
Copy Markdown
Member Author

CheSema commented Dec 12, 2022

test_merge_tree_hdfs/test.py::test_attach_detach_partition
Actually test needs to be adjusted. There is wrong way how to wait part deletion.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-build Pull request with build/testing/packaging improvement

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants