Skip to content

Fix stack overflow in recursive Buffer tables.#40643

Merged
alexey-milovidov merged 7 commits intomasterfrom
fix-recursive-buffer
Aug 27, 2022
Merged

Fix stack overflow in recursive Buffer tables.#40643
alexey-milovidov merged 7 commits intomasterfrom
fix-recursive-buffer

Conversation

@alexey-milovidov
Copy link
Copy Markdown
Member

Changelog category (leave one):

  • Bug Fix (user-visible misbehavior in official stable or prestable release)

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Fix stack overflow in recursive Buffer tables. This closes #40637

@robot-ch-test-poll2 robot-ch-test-poll2 added the pr-bugfix Pull request with bugfix, not backported by default label Aug 25, 2022
if (underlying)
underlying_rows = underlying->totalRows(settings);
if (!underlying_rows)
return underlying_rows;
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here was another bug, CC @azat.

Copy link
Copy Markdown
Member

@azat azat Aug 25, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, nice catch! (Thread can be resolved).

{
std::optional<UInt64> underlying_rows;
auto underlying = DatabaseCatalog::instance().tryGetTable(destination_id, getContext());
auto destination = DatabaseCatalog::instance().tryGetTable(destination_id, getContext());
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add a helper to get destination, like in #40640?

DROP TABLE IF EXISTS test;
CREATE TABLE test (key UInt32) Engine = Buffer(currentDatabase(), test, 16, 10, 100, 10000, 1000000, 10000000, 100000000);
SELECT * FROM test; -- { serverError 269 }
SELECT * FROM system.tables WHERE table = 'test' AND database = currentDatabase() FORMAT Null; -- { serverError 269 }
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note, that concurrent run this test with select * from system.tables queries (in some other tests) may produce INFINITE_LOOP errors there, but maybe this test should be run in parallel, and instead tests that do select * from system.tables should not.

@azat
Copy link
Copy Markdown
Member

azat commented Aug 26, 2022

Stateless tests (ubsan) — Tests are not finished, fail: 1, passed: 63, skipped: 2

Simple queries takes too much time, I'm pretty sure that there are some hung tasks reported by the kernel, maybe dmesg should be captured for all tests (just in case).

Stress test (debug) — Backward compatibility check: Error message in clickhouse-server.log (see bc_check_error_messages.txt)

Will take a look at OOM

Stress test (thread) — Sanitizer assert (in stderr.log)

==================
WARNING: ThreadSanitizer: data race (pid=666)
  Read of size 8 at 0x7b44023c74e0 by thread T2702 (mutexes: write M0):
    #0 <null> <null> (clickhouse+0x19f9e75b) (BuildId: 9ba41c32de4e62863b82c2387fcabcd793540e90)

Bad report, #40655 should help

There is a data-race between reading in system.filesystem_cache, and changing the segments for FileSegment::range().
But the fix will conflict with #40586, so I will wait then (cc @kssenii).

  Read of size 8 at 0x7b44023c74e0 by thread T2702 (mutexes: write M0):
azat@s1:~/ch/tmp/40643$ llvm-symbolizer --exe ./clickhouse $(awk -F '[(+)]' '{print $3}' tsan-read.log) | head -n15
DB::FileSegment::Range::size() const
./build_docker/../src/Common/FileSegment.h:98:38
DB::FileSegment::getSnapshot(std::__1::shared_ptr<DB::FileSegment> const&, std::__1::lock_guard<std::__1::mutex>&)
./build_docker/../src/Common/FileSegment.cpp:667:31

DB::FileCache::getSnapshot() const
./build_docker/../src/Common/FileCache.cpp:1067:37

DB::StorageSystemFilesystemCache::fillData(std::__1::vector<COW<DB::IColumn>::mutable_ptr<DB::IColumn>, std::__1::allocator<COW<DB::IColumn>::mutable_ptr<DB::IColumn> > >&, std::__1::shared_ptr<DB::Context const>, DB::SelectQueryInfo const&) const
./build_docker/../src/Storages/System/StorageSystemFilesystemCache.cpp:43:37

DB::IStorageSystemOneBlock<DB::StorageSystemFilesystemCache>::read(std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > > const&, std::__1::shared_ptr<DB::StorageSnapshot> const&, DB::SelectQueryInfo&, std::__1::shared_ptr<DB::Context const>, DB::QueryProcessingStage::Enum, unsigned long, unsigned int)
./build_docker/../src/Storages/System/IStorageSystemOneBlock.h:54:9

DB::IStorage::read(DB::QueryPlan&, std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > > const&, std::__1::shared_ptr<DB::StorageSnapshot> const&, DB::SelectQueryInfo&, std::__1::shared_ptr<DB::Context const>, DB::QueryProcessingStage::Enum, unsigned long, unsigned int)

  Previous write of size 8 at 0x7b44023c74e0 by thread T1807:

azat@s1:~/ch/tmp/40643$ llvm-symbolizer --exe ./clickhouse $(awk -F '[(+)]' '{print $3}' tsan-write.log) | head -n15
DB::FileSegmentRangeWriter::completeFileSegment(DB::FileSegment&)
./build_docker/../src/Common/FileSegment.cpp:827:36

DB::FileSegmentRangeWriter::finalize()
./build_docker/../src/Common/FileSegment.cpp:917:5

DB::CachedOnDiskWriteBufferFromFile::finalizeImpl()
./build_docker/../src/Disks/IO/CachedOnDiskWriteBufferFromFile.cpp:175:23

DB::WriteBuffer::finalize()
./build_docker/../src/IO/WriteBuffer.h:125:13

DB::MergedBlockOutputStream::Finalizer::Impl::finish()
./build_docker/../src/Storages/MergeTree/MergedBlockOutputStream.cpp:102:15
DB::MergedBlockOutputStream::Finalizer::finish()

@kssenii
Copy link
Copy Markdown
Member

kssenii commented Aug 26, 2022

There is a data-race between reading in system.filesystem_cache, and changing the segments for FileSegment::range().
But the fix will conflict with #40586, so I will wait then (cc @kssenii).

yeah, I fixed it in a different pr #40608.

@azat
Copy link
Copy Markdown
Member

azat commented Aug 27, 2022

@alexey-milovidov alexey-milovidov merged commit 7569cbb into master Aug 27, 2022
@alexey-milovidov alexey-milovidov deleted the fix-recursive-buffer branch August 27, 2022 16:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-bugfix Pull request with bugfix, not backported by default

Projects

None yet

Development

Successfully merging this pull request may close these issues.

SegFault with "recursive" buffer table

4 participants