LXC containers: Crashes after update/hardware server restart #32806

@vitamin-caig

Hello!

I've had this problem for about a year.
After updating ClickHouse and/or rebooting the hardware server (ClickHouse is installed in an LXC container), I get a bunch of random crashes. Initially, they appear right at startup. Some time later, they appear only under fairly high load (multiple parallel queries). Later still, they disappear until the next reboot or update.
memcheck reports no RAM errors (the machine has no ECC, though), and there are no random crashes or errors in other software (10+ LXC containers and KVMs are running; ZFS uses most of the RAM).

2021.12.15 16:04:10.923799 [ 9351 ] {} <Fatal> BaseDaemon: ########################################
2021.12.15 16:04:10.923890 [ 9351 ] {} <Fatal> BaseDaemon: (version 21.8.12.29 (official build), build id: 89CB735EABD0B424DF213861E4D0FD666E2A0CF1) (from thread 8643) (no query) Received signal Segmentation fault (11)
2021.12.15 16:04:10.923946 [ 9351 ] {} <Fatal> BaseDaemon: Address: 0x7f3300007f1c Access: read. Address not mapped to object.
2021.12.15 16:04:10.923970 [ 9351 ] {} <Fatal> BaseDaemon: Stack trace: 0x10d2b0b0 0x10d5dd2e 0x10f39120 0x10f3dd57 0x10c8d6d0 0x10052768 0x10054797 0x10055514 0x9024b1f 0x9028403 0x7f33ac9d6ea7 0x7f33ac8f5def
2021.12.15 16:04:10.924078 [ 9351 ] {} <Fatal> BaseDaemon: 1. DB::MergeTreeData::getDataPartsVector(std::initializer_list<DB::IMergeTreeDataPart::State> const&, std::__1::vector<DB::IMergeTreeDataPart::State, std::__1::allocator<DB::IMergeTreeDataPart::State> >*, bool) const @ 0x10d2b0b0 in /usr/bin/clickhouse
2021.12.15 16:04:10.924136 [ 9351 ] {} <Fatal> BaseDaemon: 2. DB::MergeTreeDataMergerMutator::selectPartsToMerge(DB::FutureMergedMutatedPart&, bool, unsigned long, std::__1::function<bool (std::__1::shared_ptr<DB::IMergeTreeDataPart const> const&, std::__1::shared_ptr<DB::IMergeTreeDataPart const> const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >*)> const&, bool, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>*) @ 0x10d5dd2e in /usr/bin/clickhouse
2021.12.15 16:04:10.924200 [ 9351 ] {} <Fatal> BaseDaemon: 3. DB::StorageMergeTree::selectPartsToMerge(std::__1::shared_ptr<DB::StorageInMemoryMetadata const> const&, bool, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, bool, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >*, std::__1::shared_ptr<DB::RWLockImpl::LockHolderImpl>&, std::__1::unique_lock<std::__1::mutex>&, bool, DB::SelectPartsDecision*) @ 0x10f39120 in /usr/bin/clickhouse
2021.12.15 16:04:10.924225 [ 9351 ] {} <Fatal> BaseDaemon: 4. DB::StorageMergeTree::scheduleDataProcessingJob(DB::IBackgroundJobExecutor&) @ 0x10f3dd57 in /usr/bin/clickhouse
2021.12.15 16:04:10.924258 [ 9351 ] {} <Fatal> BaseDaemon: 5. DB::IBackgroundJobExecutor::backgroundTaskFunction() @ 0x10c8d6d0 in /usr/bin/clickhouse
2021.12.15 16:04:10.924851 [ 9351 ] {} <Fatal> BaseDaemon: 6. DB::BackgroundSchedulePoolTaskInfo::execute() @ 0x10052768 in /usr/bin/clickhouse
2021.12.15 16:04:10.924882 [ 9351 ] {} <Fatal> BaseDaemon: 7. DB::BackgroundSchedulePool::threadFunction() @ 0x10054797 in /usr/bin/clickhouse
2021.12.15 16:04:10.924900 [ 9351 ] {} <Fatal> BaseDaemon: 8. ? @ 0x10055514 in /usr/bin/clickhouse
2021.12.15 16:04:10.924933 [ 9351 ] {} <Fatal> BaseDaemon: 9. ThreadPoolImpl<std::__1::thread>::worker(std::__1::__list_iterator<std::__1::thread, void*>) @0x9024b1f in /usr/bin/clickhouse
2021.12.15 16:04:10.924951 [ 9351 ] {} <Fatal> BaseDaemon: 10. ? @ 0x9028403 in /usr/bin/clickhouse
2021.12.15 16:04:10.924988 [ 9351 ] {} <Fatal> BaseDaemon: 11. start_thread @ 0x8ea7 in /lib/x86_64-linux-gnu/libpthread-2.31.so
2021.12.15 16:04:10.925012 [ 9351 ] {} <Fatal> BaseDaemon: 12. clone @ 0xfddef in /lib/x86_64-linux-gnu/libc-2.31.so
2021.12.15 16:04:11.037995 [ 9351 ] {} <Fatal> BaseDaemon: Checksum of the binary: A8C5BDC5B60DE1251EAACB4D0E110F95, integrity check passed.
2021.12.15 16:04:30.981745 [ 8604 ] {} <Fatal> Application: Child process was terminated by signal 11.

and

2021.12.15 16:05:02.470615 [ 9544 ] {} <Fatal> BaseDaemon: ########################################
2021.12.15 16:05:02.470699 [ 9544 ] {} <Fatal> BaseDaemon: (version 21.8.12.29 (official build), build id: 89CB735EABD0B424DF213861E4D0FD666E2A0CF1) (from thread 9440) (no query) Received signal Segmentation fault (11)
2021.12.15 16:05:02.470740 [ 9544 ] {} <Fatal> BaseDaemon: Address: NULL pointer. Access: read. Unknown si_code.
2021.12.15 16:05:02.470783 [ 9544 ] {} <Fatal> BaseDaemon: Stack trace: 0x10d50d30 0x10d2b832 0x10f3ae37 0x10f3ddf0 0x10c8d6d0 0x10052768 0x10054797 0x10055514 0x9024b1f 0x9028403 0x7f91d1000ea7 0x7f91d0f1fdef
2021.12.15 16:05:02.470937 [ 9544 ] {} <Fatal> BaseDaemon: 1. std::__1::back_insert_iterator<std::__1::vector<std::__1::shared_ptr<DB::IMergeTreeDataPart const>, std::__1::allocator<std::__1::shared_ptr<DB::IMergeTreeDataPart const> > > > std::__1::__merge<DB::MergeTreeData::LessDataPart&, boost::multi_index::detail::bidir_node_iterator<boost::multi_index::detail::ordered_index_node<boost::multi_index::detail::null_augment_policy, boost::multi_index::detail::index_node_base<std::__1::shared_ptr<DB::IMergeTreeDataPart const>, std::__1::allocator<std::__1::shared_ptr<DB::IMergeTreeDataPart const> > > > >, std::__1::__wrap_iter<std::__1::shared_ptr<DB::IMergeTreeDataPart const>*>, std::__1::back_insert_iterator<std::__1::vector<std::__1::shared_ptr<DB::IMergeTreeDataPart const>, std::__1::allocator<std::__1::shared_ptr<DB::IMergeTreeDataPart const> > > > >(boost::multi_index::detail::bidir_node_iterator<boost::multi_index::detail::ordered_index_node<boost::multi_index::detail::null_augment_policy, boost::multi_index::detail::index_node_base<std::__1::shared_ptr<DB::IMergeTreeDataPart const>, std::__1::allocator<std::__1::shared_ptr<DB::IMergeTreeDataPart const> > > > >, boost::multi_index::detail::bidir_node_iterator<boost::multi_index::detail::ordered_index_node<boost::multi_index::detail::null_augment_policy, boost::multi_index::detail::index_node_base<std::__1::shared_ptr<DB::IMergeTreeDataPart const>, std::__1::allocator<std::__1::shared_ptr<DB::IMergeTreeDataPart const> > > > >, std::__1::__wrap_iter<std::__1::shared_ptr<DB::IMergeTreeDataPart const>*>, std::__1::__wrap_iter<std::__1::shared_ptr<DB::IMergeTreeDataPart const>*>, std::__1::back_insert_iterator<std::__1::vector<std::__1::shared_ptr<DB::IMergeTreeDataPart const>, std::__1::allocator<std::__1::shared_ptr<DB::IMergeTreeDataPart const> > > >, DB::MergeTreeData::LessDataPart&) @ 0x10d50d30 in /usr/bin/clickhouse
2021.12.15 16:05:02.470997 [ 9544 ] {} <Fatal> BaseDaemon: 2. DB::MergeTreeData::getDataPartsVector(std::initializer_list<DB::IMergeTreeDataPart::State> const&, std::__1::vector<DB::IMergeTreeDataPart::State, std::__1::allocator<DB::IMergeTreeDataPart::State> >*, bool) const @ 0x10d2b832 in /usr/bin/clickhouse
2021.12.15 16:05:02.471037 [ 9544 ] {} <Fatal> BaseDaemon: 3. DB::StorageMergeTree::selectPartsToMutate(std::__1::shared_ptr<DB::StorageInMemoryMetadata const> const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >*, std::__1::shared_ptr<DB::RWLockImpl::LockHolderImpl>&) @ 0x10f3ae37 in /usr/bin/clickhouse
2021.12.15 16:05:02.471064 [ 9544 ] {} <Fatal> BaseDaemon: 4. DB::StorageMergeTree::scheduleDataProcessingJob(DB::IBackgroundJobExecutor&) @ 0x10f3ddf0 in /usr/bin/clickhouse
2021.12.15 16:05:02.471093 [ 9544 ] {} <Fatal> BaseDaemon: 5. DB::IBackgroundJobExecutor::backgroundTaskFunction() @ 0x10c8d6d0 in /usr/bin/clickhouse
2021.12.15 16:05:02.471116 [ 9544 ] {} <Fatal> BaseDaemon: 6. DB::BackgroundSchedulePoolTaskInfo::execute() @ 0x10052768 in /usr/bin/clickhouse
2021.12.15 16:05:02.471137 [ 9544 ] {} <Fatal> BaseDaemon: 7. DB::BackgroundSchedulePool::threadFunction() @ 0x10054797 in /usr/bin/clickhouse
2021.12.15 16:05:02.471156 [ 9544 ] {} <Fatal> BaseDaemon: 8. ? @ 0x10055514 in /usr/bin/clickhouse
2021.12.15 16:05:02.471182 [ 9544 ] {} <Fatal> BaseDaemon: 9. ThreadPoolImpl<std::__1::thread>::worker(std::__1::__list_iterator<std::__1::thread, void*>) @0x9024b1f in /usr/bin/clickhouse
2021.12.15 16:05:02.471201 [ 9544 ] {} <Fatal> BaseDaemon: 10. ? @ 0x9028403 in /usr/bin/clickhouse
2021.12.15 16:05:02.471228 [ 9544 ] {} <Fatal> BaseDaemon: 11. start_thread @ 0x8ea7 in /lib/x86_64-linux-gnu/libpthread-2.31.so
2021.12.15 16:05:02.471254 [ 9544 ] {} <Fatal> BaseDaemon: 12. clone @ 0xfddef in /lib/x86_64-linux-gnu/libc-2.31.so
2021.12.15 16:05:02.582671 [ 9544 ] {} <Fatal> BaseDaemon: Checksum of the binary: A8C5BDC5B60DE1251EAACB4D0E110F95, integrity check passed.
2021.12.15 16:05:22.491631 [ 9394 ] {} <Fatal> Application: Child process was terminated by signal 11.
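
Both traces above pass through `DB::MergeTreeData::getDataPartsVector` and `DB::StorageMergeTree::scheduleDataProcessingJob`. In case it helps when comparing further crash reports, here is a minimal Python sketch that pulls the numbered frames out of `BaseDaemon` lines and lists the symbols two traces share. The helper names `parse_frames` and `shared_symbols` are made up for illustration, and the regular expression only assumes the line layout seen in the logs above.

```python
import re

# Matches numbered frame lines such as:
#   ... BaseDaemon: 4. DB::Foo::bar() @ 0x10f3dd57 in /usr/bin/clickhouse
# The space after "@" is optional because some frames above are printed as "@0x...".
FRAME_RE = re.compile(r"BaseDaemon: (\d+)\. (.+?) ?@ ?(0x[0-9a-f]+) in (\S+)")


def parse_frames(log_text):
    """Return a list of (index, symbol, address, binary) tuples, one per frame line."""
    frames = []
    for line in log_text.splitlines():
        match = FRAME_RE.search(line)
        if match:
            index, symbol, address, binary = match.groups()
            frames.append((int(index), symbol.strip(), address, binary))
    return frames


def shared_symbols(trace_a, trace_b):
    """Return symbols present in both traces, in trace_a order, skipping unresolved '?' frames."""
    symbols_in_b = {symbol for _, symbol, _, _ in trace_b}
    return [
        symbol
        for _, symbol, _, _ in trace_a
        if symbol != "?" and symbol in symbols_in_b
    ]
```

Passing the text of each crash report to `parse_frames` and then both results to `shared_symbols` gives the common call path; header lines such as the version banner and the raw `Stack trace:` address list are ignored because they do not match the frame pattern.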

Crash reporting is enabled.

Labels

crash: Crash / segfault / abort
potential bug: To be reviewed by developers and confirmed/rejected.
st-need-info: We need extra data to continue (waiting for response). Either some details or a repro of the issue.
