Skip to content

Fix removing of parts in a Temporary state#28221

Merged
alesapin merged 1 commit intoClickHouse:masterfrom
azat:remove-Temporary-part-fix
Aug 30, 2021
Merged

Fix removing of parts in a Temporary state#28221
alesapin merged 1 commit intoClickHouse:masterfrom
azat:remove-Temporary-part-fix

Conversation

@azat
Copy link
Copy Markdown
Member

@azat azat commented Aug 27, 2021

Changelog category (leave one):

  • Improvement

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Fix removing of parts in a Temporary state which can lead to an unexpected exception (Part %name% doesn't exist). Fixes #23661.

Detailed description / Documentation draft:
Parts in a temporary state does not exists in data_parts_by_info, so do
not try to search there and throw LOGICAL_ERROR in case of failure:

Details
    02:45:49.037546 [ 5890 ] {} <Error> test_iy9rta.concurrent_kill_4 (526aa7c8-db2a-4f0e-926a-a7c8db2a9f0e): Code: 40. DB::Exception: Part all_0_0_0_1 from r11 has different columns hash. (CHECKSUM_DOESNT_MATCH) (version 21.10.1.7910 (official build)). Data after mutation is not byte-identical to data on another replicas. We will download merged part from replica to force byte-identical result.
    02:45:49.049422 [ 5890 ] {} <Trace> test_iy9rta.concurrent_kill_4 (526aa7c8-db2a-4f0e-926a-a7c8db2a9f0e): Trying to immediately remove part all_0_0_0_1 (state Temporary)
    02:45:49.060210 [ 5890 ] {} <Fatal> : Logical error: 'Part all_0_0_0_1 doesn't exist'.
    02:47:01.572508 [ 29208 ] {} <Fatal> BaseDaemon: (version 21.10.1.7910 (official build), build id: 9309CECED9A0D32CBB00BB8EC77B91456679868E) (from thread 5890) (no query) Received signal Aborted (6)
    ...
    02:47:06.815000 [ 29208 ] {} <Fatal> BaseDaemon: 8. ./obj-x86_64-linux-gnu/../src/Storages/MergeTree/MergeTreeData.cpp:2593: DB::MergeTreeData::tryRemovePartImmediately(std::__1::shared_ptr<DB::IMergeTreeDataPart const>&&) @ 0x1f959e0d in /usr/bin/clickhouse
    02:47:06.816309 [ 29209 ] {} <Fatal> BaseDaemon: 8. ./obj-x86_64-linux-gnu/../src/Storages/MergeTree/MergeTreeData.cpp:2593: DB::MergeTreeData::tryRemovePartImmediately(std::__1::shared_ptr<DB::IMergeTreeDataPart const>&&) @ 0x1f959e0d in /usr/bin/clickhouse
    02:47:09.455665 [ 29208 ] {} <Fatal> BaseDaemon: 9. ./obj-x86_64-linux-gnu/../src/Storages/StorageReplicatedMergeTree.cpp:1939: DB::StorageReplicatedMergeTree::tryExecutePartMutation(DB::ReplicatedMergeTreeLogEntry const&) @ 0x1f5f2bf6 in /usr/bin/clickhouse
    02:47:09.468738 [ 29209 ] {} <Fatal> BaseDaemon: 9. ./obj-x86_64-linux-gnu/../src/Storages/StorageReplicatedMergeTree.cpp:1939: DB::StorageReplicatedMergeTree::tryExecutePartMutation(DB::ReplicatedMergeTreeLogEntry const&) @ 0x1f5f2bf6 in /usr/bin/clickhouse
    02:47:11.776857 [ 29208 ] {} <Fatal> BaseDaemon: 10. ./obj-x86_64-linux-gnu/../src/Storages/StorageReplicatedMergeTree.cpp:1581: DB::StorageReplicatedMergeTree::executeLogEntry(DB::ReplicatedMergeTreeLogEntry&) @ 0x1f5e484c in /usr/bin/clickhouse
    02:47:11.904232 [ 29209 ] {} <Fatal> BaseDaemon: 10. ./obj-x86_64-linux-gnu/../src/Storages/StorageReplicatedMergeTree.cpp:1581: DB::StorageReplicatedMergeTree::executeLogEntry(DB::ReplicatedMergeTreeLogEntry&) @ 0x1f5e484c in /usr/bin/clickhouse
    02:47:13.941811 [ 29208 ] {} <Fatal> BaseDaemon: 11. ./obj-x86_64-linux-gnu/../src/Storages/StorageReplicatedMergeTree.cpp:3176: DB::StorageReplicatedMergeTree::processQueueEntry(std::__1::shared_ptr<DB::ReplicatedMergeTreeQueue::SelectedEntry>)::$_18::operator() shared_ptr<DB::ReplicatedMergeTreeLogEntry>&) const @ 0x1f65faa5 in /usr/bin/clickhouse
    02:47:14.477158 [ 29209 ] {} <Fatal> BaseDaemon: 11. ./obj-x86_64-linux-gnu/../src/Storages/StorageReplicatedMergeTree.cpp:3176: DB::StorageReplicatedMergeTree::processQueueEntry(std::__1::shared_ptr<DB::ReplicatedMergeTreeQueue::SelectedEntry>)::$_18::operator() shared_ptr<DB::ReplicatedMergeTreeLogEntry>&) const @ 0x1f65faa5 in /usr/bin/clickhouse
    02:47:16.475373 [ 29208 ] {} <Fatal> BaseDaemon: 12. ./obj-x86_64-linux-gnu/../contrib/libcxx/include/type_traits:3676: decltype(std::__1::forward<DB::StorageReplicatedMergeTree::processQueueEntry(std::__1::shared_ptr<DB::ReplicatedMergeTreeQueue::SelectedEntry>))(std::__1::forward<std::__1::shared_ptr<DB::ReplicatedMergeTreeLogEntry>&>(fp0))) std::__1::__invoke<DB::StorageReplicatedMergeTree::processQueueEntry(std::__1::shared_ptr<DB::ReplicatedMergeTreeQueue::SelectedEntry>)::$_18&, std::__1::shared_ptr<DB::ReplicatedMEntry>&>(DB::StorageReplicatedMergeTree::processQueueEntry(std::__1::shared_ptr<DB::ReplicatedMergeTreeQueue::SelectedEntry>)::$_18&, std::__1::shared_ptr<DB::ReplicatedMergeTreeLogEntry>&) @ 0x1f65fa32 in /usr/bin/clickhouse
    02:47:16.970325 [ 29209 ] {} <Fatal> BaseDaemon: 12. ./obj-x86_64-linux-gnu/../contrib/libcxx/include/type_traits:3676: decltype(std::__1::forward<DB::StorageReplicatedMergeTree::processQueueEntry(std::__1::shared_ptr<DB::ReplicatedMergeTreeQueue::SelectedEntry>))(std::__1::forward<std::__1::shared_ptr<DB::ReplicatedMergeTreeLogEntry>&>(fp0))) std::__1::__invoke<DB::StorageReplicatedMergeTree::processQueueEntry(std::__1::shared_ptr<DB::ReplicatedMergeTreeQueue::SelectedEntry>)::$_18&, std::__1::shared_ptr<DB::ReplicatedMEntry>&>(DB::StorageReplicatedMergeTree::processQueueEntry(std::__1::shared_ptr<DB::ReplicatedMergeTreeQueue::SelectedEntry>)::$_18&, std::__1::shared_ptr<DB::ReplicatedMergeTreeLogEntry>&) @ 0x1f65fa32 in /usr/bin/clickhouse
    02:47:18.979481 [ 29208 ] {} <Fatal> BaseDaemon: 13. ./obj-x86_64-linux-gnu/../contrib/libcxx/include/__functional_base:317: bool std::__1::__invoke_void_return_wrapper<bool>::__call<DB::StorageReplicatedMergeTree::processQueueEntry(std::__1::shared_ptr<DB::ReplireeQueue::SelectedEntry>)::$_18&, std::__1::shared_ptr<DB::ReplicatedMergeTreeLogEntry>&>(DB::StorageReplicatedMergeTree::processQueueEntry(std::__1::shared_ptr<DB::ReplicatedMergeTreeQueue::SelectedEntry>)::$_18&, std::__1::shared_ptr<DB::ReplicatedMergeTreeLogE x1f65f9f2 in /usr/bin/clickhouse
    02:47:19.450807 [ 29209 ] {} <Fatal> BaseDaemon: 13. ./obj-x86_64-linux-gnu/../contrib/libcxx/include/__functional_base:317: bool std::__1::__invoke_void_return_wrapper<bool>::__call<DB::StorageReplicatedMergeTree::processQueueEntry(std::__1::shared_ptr<DB::ReplireeQueue::SelectedEntry>)::$_18&, std::__1::shared_ptr<DB::ReplicatedMergeTreeLogEntry>&>(DB::StorageReplicatedMergeTree::processQueueEntry(std::__1::shared_ptr<DB::ReplicatedMergeTreeQueue::SelectedEntry>)::$_18&, std::__1::shared_ptr<DB::ReplicatedMergeTreeLogE x1f65f9f2 in /usr/bin/clickhouse
    02:47:21.055007 [ 29208 ] {} <Fatal> BaseDaemon: 14. ./obj-x86_64-linux-gnu/../contrib/libcxx/include/functional:1608: std::__1::__function::__default_alloc_func<DB::StorageReplicatedMergeTree::processQueueEntry(std::__1::shared_ptr<DB::ReplicatedMergeTreeQueue::ry>)::$_18, bool (std::__1::shared_ptr<DB::ReplicatedMergeTreeLogEntry>&)>::operator()(std::__1::shared_ptr<DB::ReplicatedMergeTreeLogEntry>&) @ 0x1f65f9b0 in /usr/bin/clickhouse
    02:47:23.546946 [ 413 ] {} <Fatal> Application: Child process was terminated by signal 6.

CI: https://clickhouse-test-reports.s3.yandex.net/0/4a8b82232c11512232df3ecdf4ffaec287116ad5/stress_test_(debug).html#fail1
CI: https://clickhouse-test-reports.s3.yandex.net/0/df1fe27791f82c2a917390faa30716effbd32b8f/stress_test_(debug)/test_run.txt.out.log

Cc: @alesapin

Parts in a temporary state is not exists in data_parts_by_info, so do
not try to search there and throw LOGICAL_ERROR in case of failure:

<details>

```
    02:45:49.037546 [ 5890 ] {} <Error> test_iy9rta.concurrent_kill_4 (526aa7c8-db2a-4f0e-926a-a7c8db2a9f0e): Code: 40. DB::Exception: Part all_0_0_0_1 from r11 has different columns hash. (CHECKSUM_DOESNT_MATCH) (version 21.10.1.7910 (official build)). Data after mutation is not byte-identical to data on another replicas. We will download merged part from replica to force byte-identical result.
    02:45:49.049422 [ 5890 ] {} <Trace> test_iy9rta.concurrent_kill_4 (526aa7c8-db2a-4f0e-926a-a7c8db2a9f0e): Trying to immediately remove part all_0_0_0_1 (state Temporary)
    02:45:49.060210 [ 5890 ] {} <Fatal> : Logical error: 'Part all_0_0_0_1 doesn't exist'.
    02:47:01.572508 [ 29208 ] {} <Fatal> BaseDaemon: (version 21.10.1.7910 (official build), build id: 9309CECED9A0D32CBB00BB8EC77B91456679868E) (from thread 5890) (no query) Received signal Aborted (6)
    ...
    02:47:06.815000 [ 29208 ] {} <Fatal> BaseDaemon: 8. ./obj-x86_64-linux-gnu/../src/Storages/MergeTree/MergeTreeData.cpp:2593: DB::MergeTreeData::tryRemovePartImmediately(std::__1::shared_ptr<DB::IMergeTreeDataPart const>&&) @ 0x1f959e0d in /usr/bin/clickhouse
    02:47:06.816309 [ 29209 ] {} <Fatal> BaseDaemon: 8. ./obj-x86_64-linux-gnu/../src/Storages/MergeTree/MergeTreeData.cpp:2593: DB::MergeTreeData::tryRemovePartImmediately(std::__1::shared_ptr<DB::IMergeTreeDataPart const>&&) @ 0x1f959e0d in /usr/bin/clickhouse
    02:47:09.455665 [ 29208 ] {} <Fatal> BaseDaemon: 9. ./obj-x86_64-linux-gnu/../src/Storages/StorageReplicatedMergeTree.cpp:1939: DB::StorageReplicatedMergeTree::tryExecutePartMutation(DB::ReplicatedMergeTreeLogEntry const&) @ 0x1f5f2bf6 in /usr/bin/clickhouse
    02:47:09.468738 [ 29209 ] {} <Fatal> BaseDaemon: 9. ./obj-x86_64-linux-gnu/../src/Storages/StorageReplicatedMergeTree.cpp:1939: DB::StorageReplicatedMergeTree::tryExecutePartMutation(DB::ReplicatedMergeTreeLogEntry const&) @ 0x1f5f2bf6 in /usr/bin/clickhouse
    02:47:11.776857 [ 29208 ] {} <Fatal> BaseDaemon: 10. ./obj-x86_64-linux-gnu/../src/Storages/StorageReplicatedMergeTree.cpp:1581: DB::StorageReplicatedMergeTree::executeLogEntry(DB::ReplicatedMergeTreeLogEntry&) @ 0x1f5e484c in /usr/bin/clickhouse
    02:47:11.904232 [ 29209 ] {} <Fatal> BaseDaemon: 10. ./obj-x86_64-linux-gnu/../src/Storages/StorageReplicatedMergeTree.cpp:1581: DB::StorageReplicatedMergeTree::executeLogEntry(DB::ReplicatedMergeTreeLogEntry&) @ 0x1f5e484c in /usr/bin/clickhouse
    02:47:13.941811 [ 29208 ] {} <Fatal> BaseDaemon: 11. ./obj-x86_64-linux-gnu/../src/Storages/StorageReplicatedMergeTree.cpp:3176: DB::StorageReplicatedMergeTree::processQueueEntry(std::__1::shared_ptr<DB::ReplicatedMergeTreeQueue::SelectedEntry>)::$_18::operator() shared_ptr<DB::ReplicatedMergeTreeLogEntry>&) const @ 0x1f65faa5 in /usr/bin/clickhouse
    02:47:14.477158 [ 29209 ] {} <Fatal> BaseDaemon: 11. ./obj-x86_64-linux-gnu/../src/Storages/StorageReplicatedMergeTree.cpp:3176: DB::StorageReplicatedMergeTree::processQueueEntry(std::__1::shared_ptr<DB::ReplicatedMergeTreeQueue::SelectedEntry>)::$_18::operator() shared_ptr<DB::ReplicatedMergeTreeLogEntry>&) const @ 0x1f65faa5 in /usr/bin/clickhouse
    02:47:16.475373 [ 29208 ] {} <Fatal> BaseDaemon: 12. ./obj-x86_64-linux-gnu/../contrib/libcxx/include/type_traits:3676: decltype(std::__1::forward<DB::StorageReplicatedMergeTree::processQueueEntry(std::__1::shared_ptr<DB::ReplicatedMergeTreeQueue::SelectedEntry>))(std::__1::forward<std::__1::shared_ptr<DB::ReplicatedMergeTreeLogEntry>&>(fp0))) std::__1::__invoke<DB::StorageReplicatedMergeTree::processQueueEntry(std::__1::shared_ptr<DB::ReplicatedMergeTreeQueue::SelectedEntry>)::$_18&, std::__1::shared_ptr<DB::ReplicatedMEntry>&>(DB::StorageReplicatedMergeTree::processQueueEntry(std::__1::shared_ptr<DB::ReplicatedMergeTreeQueue::SelectedEntry>)::$_18&, std::__1::shared_ptr<DB::ReplicatedMergeTreeLogEntry>&) @ 0x1f65fa32 in /usr/bin/clickhouse
    02:47:16.970325 [ 29209 ] {} <Fatal> BaseDaemon: 12. ./obj-x86_64-linux-gnu/../contrib/libcxx/include/type_traits:3676: decltype(std::__1::forward<DB::StorageReplicatedMergeTree::processQueueEntry(std::__1::shared_ptr<DB::ReplicatedMergeTreeQueue::SelectedEntry>))(std::__1::forward<std::__1::shared_ptr<DB::ReplicatedMergeTreeLogEntry>&>(fp0))) std::__1::__invoke<DB::StorageReplicatedMergeTree::processQueueEntry(std::__1::shared_ptr<DB::ReplicatedMergeTreeQueue::SelectedEntry>)::$_18&, std::__1::shared_ptr<DB::ReplicatedMEntry>&>(DB::StorageReplicatedMergeTree::processQueueEntry(std::__1::shared_ptr<DB::ReplicatedMergeTreeQueue::SelectedEntry>)::$_18&, std::__1::shared_ptr<DB::ReplicatedMergeTreeLogEntry>&) @ 0x1f65fa32 in /usr/bin/clickhouse
    02:47:18.979481 [ 29208 ] {} <Fatal> BaseDaemon: 13. ./obj-x86_64-linux-gnu/../contrib/libcxx/include/__functional_base:317: bool std::__1::__invoke_void_return_wrapper<bool>::__call<DB::StorageReplicatedMergeTree::processQueueEntry(std::__1::shared_ptr<DB::ReplireeQueue::SelectedEntry>)::$_18&, std::__1::shared_ptr<DB::ReplicatedMergeTreeLogEntry>&>(DB::StorageReplicatedMergeTree::processQueueEntry(std::__1::shared_ptr<DB::ReplicatedMergeTreeQueue::SelectedEntry>)::$_18&, std::__1::shared_ptr<DB::ReplicatedMergeTreeLogE x1f65f9f2 in /usr/bin/clickhouse
    02:47:19.450807 [ 29209 ] {} <Fatal> BaseDaemon: 13. ./obj-x86_64-linux-gnu/../contrib/libcxx/include/__functional_base:317: bool std::__1::__invoke_void_return_wrapper<bool>::__call<DB::StorageReplicatedMergeTree::processQueueEntry(std::__1::shared_ptr<DB::ReplireeQueue::SelectedEntry>)::$_18&, std::__1::shared_ptr<DB::ReplicatedMergeTreeLogEntry>&>(DB::StorageReplicatedMergeTree::processQueueEntry(std::__1::shared_ptr<DB::ReplicatedMergeTreeQueue::SelectedEntry>)::$_18&, std::__1::shared_ptr<DB::ReplicatedMergeTreeLogE x1f65f9f2 in /usr/bin/clickhouse
    02:47:21.055007 [ 29208 ] {} <Fatal> BaseDaemon: 14. ./obj-x86_64-linux-gnu/../contrib/libcxx/include/functional:1608: std::__1::__function::__default_alloc_func<DB::StorageReplicatedMergeTree::processQueueEntry(std::__1::shared_ptr<DB::ReplicatedMergeTreeQueue::ry>)::$_18, bool (std::__1::shared_ptr<DB::ReplicatedMergeTreeLogEntry>&)>::operator()(std::__1::shared_ptr<DB::ReplicatedMergeTreeLogEntry>&) @ 0x1f65f9b0 in /usr/bin/clickhouse
    02:47:23.546946 [ 413 ] {} <Fatal> Application: Child process was terminated by signal 6.
```

</details>

CI: https://clickhouse-test-reports.s3.yandex.net/0/4a8b82232c11512232df3ecdf4ffaec287116ad5/stress_test_(debug).html#fail1
@robot-clickhouse robot-clickhouse added the pr-bugfix Pull request with bugfix, not backported by default label Aug 27, 2021
@alesapin alesapin self-assigned this Aug 27, 2021
Copy link
Copy Markdown
Member

@alesapin alesapin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like these changes are correct. But there are several unclear moments:

  1. Temporary parts removed in their destructors https://github.com/ClickHouse/ClickHouse/blob/master/src/Storages/MergeTree/IMergeTreeDataPart.cpp#L418-L420. So it seems like we have a race condition here or at least unclear responsibility....
  2. Why do we have part stateTemporary and part flag is_temp? What is the difference?

@azat
Copy link
Copy Markdown
Member Author

azat commented Aug 28, 2021

@alesapin
Copy link
Copy Markdown
Member

Other failures unrelated to changes. I'm not sure about fetches test, but let's check in master.

@tavplubix
Copy link
Copy Markdown
Member

Seems like it didn't help: https://clickhouse-test-reports.s3.yandex.net/0/855a53ff8160c4638fe345b0d26e062804ba790a/stress_test_(debug).html#fail1

/var/log/clickhouse-server/clickhouse-server.log.2:2021.08.30 15:21:17.683297 [ 1242 ] {} <Fatal> : Logical error: 'Deleting data part all_0_0_0_3 doesn't exist'.
/var/log/clickhouse-server/clickhouse-server.log.2:2021.08.30 15:21:17.705345 [ 5135 ] {} <Fatal> : Logical error: 'Deleting data part all_0_0_0_3 doesn't exist'.
/var/log/clickhouse-server/clickhouse-server.log.2:2021.08.30 15:22:02.239592 [ 14783 ] {} <Fatal> BaseDaemon: ########################################
/var/log/clickhouse-server/clickhouse-server.log.2:2021.08.30 15:22:02.242827 [ 14783 ] {} <Fatal> BaseDaemon: (version 21.10.1.7940 (official build), build id: B0DD119D1B7152631927F24BC0B1B5FC66335294) (from thread 1242) (no query) Received signal Aborted (6)
/var/log/clickhouse-server/clickhouse-server.log.2:2021.08.30 15:22:02.246357 [ 14783 ] {} <Fatal> BaseDaemon: 
/var/log/clickhouse-server/clickhouse-server.log.2:2021.08.30 15:22:02.246629 [ 14783 ] {} <Fatal> BaseDaemon: Stack trace: 0x7f2516fca18b 0x7f2516fa9859 0x137aaa66 0x137aab75 0x1f94dffa 0x1f9631c0 0x1f5f9af6 0x1f5eb74c 0x1f666845 0x1f6667d2 0x1f666792 0x1f666750 0x1f6666f8 0x1fc2743e 0x1fc10830 0x1fc01425 0x1f61bc63 0x1f66845d 0x1f6683dd 0x1f66839d 0x1f668375 0x1f668340 0x1e271609 0x1e2715d5 0x1f8a438a 0x1f8a431d 0x1f8a42dd 0x1f8a42b5 0x1f8a4280 0x137f98a9 0x137f89d5 0x13821d0e 0x1382b764 0x1382b71d 0x1382b6a1 0x1382b5b2 0x1382b487 0x1382b39d 0x1382b35d 0x1382b335 0x1382b300
/var/log/clickhouse-server/clickhouse-server.log.2:2021.08.30 15:22:02.251633 [ 14783 ] {} <Fatal> BaseDaemon: 4. gsignal @ 0x4618b in /usr/lib/x86_64-linux-gnu/libc-2.31.so
/var/log/clickhouse-server/clickhouse-server.log.2:2021.08.30 15:22:02.251845 [ 14783 ] {} <Fatal> BaseDaemon: 5. abort @ 0x25859 in /usr/lib/x86_64-linux-gnu/libc-2.31.so
/var/log/clickhouse-server/clickhouse-server.log.2:2021.08.30 15:22:02.269264 [ 14782 ] {} <Fatal> BaseDaemon: ########################################
/var/log/clickhouse-server/clickhouse-server.log.2:2021.08.30 15:22:02.275405 [ 14782 ] {} <Fatal> BaseDaemon: (version 21.10.1.7940 (official build), build id: B0DD119D1B7152631927F24BC0B1B5FC66335294) (from thread 5135) (no query) Received signal Aborted (6)
/var/log/clickhouse-server/clickhouse-server.log.2:2021.08.30 15:22:02.279121 [ 14782 ] {} <Fatal> BaseDaemon: 
/var/log/clickhouse-server/clickhouse-server.log.2:2021.08.30 15:22:02.279384 [ 14782 ] {} <Fatal> BaseDaemon: Stack trace: 0x7f2516fca18b 0x7f2516fa9859 0x137aaa66 0x137aab75 0x1f94dffa 0x1f9631c0 0x1f5f9af6 0x1f5eb74c 0x1f666845 0x1f6667d2 0x1f666792 0x1f666750 0x1f6666f8 0x1fc2743e 0x1fc10830 0x1fc01425 0x1f61bc63 0x1f66845d 0x1f6683dd 0x1f66839d 0x1f668375 0x1f668340 0x1e271609 0x1e2715d5 0x1f8a438a 0x1f8a431d 0x1f8a42dd 0x1f8a42b5 0x1f8a4280 0x137f98a9 0x137f89d5 0x13821d0e 0x1382b764 0x1382b71d 0x1382b6a1 0x1382b5b2 0x1382b487 0x1382b39d 0x1382b35d 0x1382b335 0x1382b300
/var/log/clickhouse-server/clickhouse-server.log.2:2021.08.30 15:22:02.299855 [ 14782 ] {} <Fatal> BaseDaemon: 4. gsignal @ 0x4618b in /usr/lib/x86_64-linux-gnu/libc-2.31.so
/var/log/clickhouse-server/clickhouse-server.log.2:2021.08.30 15:22:02.300150 [ 14782 ] {} <Fatal> BaseDaemon: 5. abort @ 0x25859 in /usr/lib/x86_64-linux-gnu/libc-2.31.so
/var/log/clickhouse-server/clickhouse-server.log.2:2021.08.30 15:22:02.628776 [ 14783 ] {} <Fatal> BaseDaemon: 6. ./obj-x86_64-linux-gnu/../src/Common/Exception.cpp:53: DB::handle_error_code(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int, bool, std::__1::vector<void*, std::__1::allocator<void*> > const&) @ 0x137aaa66 in /usr/bin/clickhouse
/var/log/clickhouse-server/clickhouse-server.log.2:2021.08.30 15:22:02.653743 [ 14782 ] {} <Fatal> BaseDaemon: 6. ./obj-x86_64-linux-gnu/../src/Common/Exception.cpp:53: DB::handle_error_code(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int, bool, std::__1::vector<void*, std::__1::allocator<void*> > const&) @ 0x137aaa66 in /usr/bin/clickhouse
/var/log/clickhouse-server/clickhouse-server.log.2:2021.08.30 15:22:02.840573 [ 14783 ] {} <Fatal> BaseDaemon: 7. ./obj-x86_64-linux-gnu/../src/Common/Exception.cpp:60: DB::Exception::Exception(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int, bool) @ 0x137aab75 in /usr/bin/clickhouse
/var/log/clickhouse-server/clickhouse-server.log.2:2021.08.30 15:22:02.888479 [ 14782 ] {} <Fatal> BaseDaemon: 7. ./obj-x86_64-linux-gnu/../src/Common/Exception.cpp:60: DB::Exception::Exception(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int, bool) @ 0x137aab75 in /usr/bin/clickhouse
/var/log/clickhouse-server/clickhouse-server.log.2:2021.08.30 15:22:09.920365 [ 14783 ] {} <Fatal> BaseDaemon: 8. ./obj-x86_64-linux-gnu/../src/Storages/MergeTree/MergeTreeData.cpp:1254: DB::MergeTreeData::removePartsFinally(std::__1::vector<std::__1::shared_ptr<DB::IMergeTreeDataPart const>, std::__1::allocator<std::__1::shared_ptr<DB::IMergeTreeDataPart const> > > const&) @ 0x1f94dffa in /usr/bin/clickhouse
/var/log/clickhouse-server/clickhouse-server.log.2:2021.08.30 15:22:09.921135 [ 14782 ] {} <Fatal> BaseDaemon: 8. ./obj-x86_64-linux-gnu/../src/Storages/MergeTree/MergeTreeData.cpp:1254: DB::MergeTreeData::removePartsFinally(std::__1::vector<std::__1::shared_ptr<DB::IMergeTreeDataPart const>, std::__1::allocator<std::__1::shared_ptr<DB::IMergeTreeDataPart const> > > const&) @ 0x1f94dffa in /usr/bin/clickhouse
/var/log/clickhouse-server/clickhouse-server.log.2:2021.08.30 15:22:12.107053 [ 14782 ] {} <Fatal> BaseDaemon: 9. ./obj-x86_64-linux-gnu/../src/Storages/MergeTree/MergeTreeData.cpp:2627: DB::MergeTreeData::tryRemovePartImmediately(std::__1::shared_ptr<DB::IMergeTreeDataPart const>&&) @ 0x1f9631c0 in /usr/bin/clickhouse
/var/log/clickhouse-server/clickhouse-server.log.2:2021.08.30 15:22:12.107740 [ 14783 ] {} <Fatal> BaseDaemon: 9. ./obj-x86_64-linux-gnu/../src/Storages/MergeTree/MergeTreeData.cpp:2627: DB::MergeTreeData::tryRemovePartImmediately(std::__1::shared_ptr<DB::IMergeTreeDataPart const>&&) @ 0x1f9631c0 in /usr/bin/clickhouse
/var/log/clickhouse-server/clickhouse-server.log.2:2021.08.30 15:22:14.923952 [ 14783 ] {} <Fatal> BaseDaemon: 10. ./obj-x86_64-linux-gnu/../src/Storages/StorageReplicatedMergeTree.cpp:1939: DB::StorageReplicatedMergeTree::tryExecutePartMutation(DB::ReplicatedMergeTreeLogEntry const&) @ 0x1f5f9af6 in /usr/bin/clickhouse
/var/log/clickhouse-server/clickhouse-server.log.2:2021.08.30 15:22:14.924804 [ 14782 ] {} <Fatal> BaseDaemon: 10. ./obj-x86_64-linux-gnu/../src/Storages/StorageReplicatedMergeTree.cpp:1939: DB::StorageReplicatedMergeTree::tryExecutePartMutation(DB::ReplicatedMergeTreeLogEntry const&) @ 0x1f5f9af6 in /usr/bin/clickhouse
/var/log/clickhouse-server/clickhouse-server.log.2:2021.08.30 15:22:16.983413 [ 14783 ] {} <Fatal> BaseDaemon: 11. ./obj-x86_64-linux-gnu/../src/Storages/StorageReplicatedMergeTree.cpp:1581: DB::StorageReplicatedMergeTree::executeLogEntry(DB::ReplicatedMergeTreeLogEntry&) @ 0x1f5eb74c in /usr/bin/clickhouse
/var/log/clickhouse-server/clickhouse-server.log.2:2021.08.30 15:22:17.047274 [ 14782 ] {} <Fatal> BaseDaemon: 11. ./obj-x86_64-linux-gnu/../src/Storages/StorageReplicatedMergeTree.cpp:1581: DB::StorageReplicatedMergeTree::executeLogEntry(DB::ReplicatedMergeTreeLogEntry&) @ 0x1f5eb74c in /usr/bin/clickhouse
/var/log/clickhouse-server/clickhouse-server.log.2:2021.08.30 15:22:19.440360 [ 14783 ] {} <Fatal> BaseDaemon: 12. ./obj-x86_64-linux-gnu/../src/Storages/StorageReplicatedMergeTree.cpp:3176: DB::StorageReplicatedMergeTree::processQueueEntry(std::__1::shared_ptr<DB::ReplicatedMergeTreeQueue::SelectedEntry>)::$_18::operator()(std::__1::shared_ptr<DB::ReplicatedMergeTreeLogEntry>&) const @ 0x1f666845 in /usr/bin/clickhouse
/var/log/clickhouse-server/clickhouse-server.log.2:2021.08.30 15:22:19.923363 [ 14782 ] {} <Fatal> BaseDaemon: 12. ./obj-x86_64-linux-gnu/../src/Storages/StorageReplicatedMergeTree.cpp:3176: DB::StorageReplicatedMergeTree::processQueueEntry(std::__1::shared_ptr<DB::ReplicatedMergeTreeQueue::SelectedEntry>)::$_18::operator()(std::__1::shared_ptr<DB::ReplicatedMergeTreeLogEntry>&) const @ 0x1f666845 in /usr/bin/clickhouse
/var/log/clickhouse-server/clickhouse-server.log.2:2021.08.30 15:22:24.274328 [ 415 ] {} <Fatal> Application: Child process was terminated by signal 6.

@azat
Copy link
Copy Markdown
Member Author

azat commented Aug 30, 2021

azat added a commit to azat/ClickHouse that referenced this pull request Aug 30, 2021
CI reports [1]:

    <Fatal> : Logical error: 'Deleting data part all_0_0_0_3 doesn't exist'.
    ...
    <Fatal> BaseDaemon: 8. ./obj-x86_64-linux-gnu/../src/Storages/MergeTree/MergeTreeData.cpp:1254: DB::MergeTreeData::removePartsFinally() @ 0x1f94dffa in /usr/bin/clickhouse

  [1]: https://clickhouse-test-reports.s3.yandex.net/0/855a53ff8160c4638fe345b0d26e062804ba790a/stress_test_(debug).html#fail1

Follow-up for: ClickHouse#28221
@azat azat deleted the remove-Temporary-part-fix branch August 30, 2021 18:10
@robot-clickhouse robot-clickhouse added pr-improvement Pull request with some product improvements and removed pr-bugfix Pull request with bugfix, not backported by default labels Aug 31, 2021
@alexey-milovidov
Copy link
Copy Markdown
Member

Classification of this issue was incorrect.
It cannot be classificated as "bug fix" because there was no user visible scenario reproducing this issue.
It was found only by fuzzing (stress test).

I changed to improvement.

@azat
Copy link
Copy Markdown
Member Author

azat commented Aug 31, 2021

It was found only by fuzzing (stress test).

But under certain circumstances it may be triggered by user too, so I'm voting for a Bug fix

@tavplubix
Copy link
Copy Markdown
Member

It cannot be classificated as "bug fix" because there was no user visible scenario reproducing this issue.

KILL MUTATION may lead replicas to having a temporary part with the same name, but different data and checksums. The first replica will successfully commit part to ZooKeeper and others will throw Part ... doesn't exist on attempt to remove temporary part with mismatching checksums. Seems like in very rare cases (e.g. when the first replica have applied mutation commands to the part before mutation was killed and there are no covering parts) it may cause MUTATE_PART entry to stuck with Part ... doesn't exist error.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-improvement Pull request with some product improvements

Projects

None yet

Development

Successfully merging this pull request may close these issues.

01079_parallel_alter_add_drop_column_zookeeper.sh: Logical error: Part doesn't exist.

5 participants