Skip to content

Small simplification of MergeTreeDataWriter#17943

Merged
alesapin merged 33 commits intomasterfrom
try_rewrite_writer
Dec 18, 2020
Merged

Small simplification of MergeTreeDataWriter#17943
alesapin merged 33 commits intomasterfrom
try_rewrite_writer

Conversation

@alesapin
Copy link
Copy Markdown
Member

@alesapin alesapin commented Dec 9, 2020

I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en

Changelog category (leave one):

  • Not for changelog (changelog entry is not required)

@robot-clickhouse robot-clickhouse added the pr-not-for-changelog This PR should not be mentioned in the changelog label Dec 9, 2020
@alesapin alesapin changed the title Try rewrite writer Mechanical refactoring of MergeTreeDataWriter Dec 9, 2020
@alesapin alesapin changed the title Mechanical refactoring of MergeTreeDataWriter [WIP] Trying to refactoring MergeTreeDataWriter Dec 14, 2020
@alesapin alesapin changed the title [WIP] Trying to refactoring MergeTreeDataWriter [WIP] Trying to refactor MergeTreeDataWriter Dec 14, 2020
@alesapin alesapin changed the title [WIP] Trying to refactor MergeTreeDataWriter Small simplification of MergeTreeDataWriter Dec 15, 2020
@alesapin alesapin requested a review from CurtizJ December 15, 2020 10:39
using SerializationState = IDataType::SerializeBinaryBulkStatePtr;
using SerializationStates = std::unordered_map<String, SerializationState>;
/// Count index_granularity for block and store in `index_granularity`
size_t computeIndexGranularity(const Block & block) const;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By the way, index granularity was somewhat confusing to me now that we have secondary indexes. Maybe order_by_granularity or something would be better...

Copy link
Copy Markdown
Member Author

@alesapin alesapin Dec 17, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, order_by_granularity is more confusing because of not all columns from ORDER BY expression stored in primary.idx on disk. Secondary indices granularity calculated in primary index granules. I agree that these interchangeable words like mark, granule, granularity, rows in mark, index granularity and so on are very confusing at first glance.

Copy link
Copy Markdown
Member

@CurtizJ CurtizJ left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general looks ok.

@alesapin
Copy link
Copy Markdown
Member Author

Saved preprocessed configuration to '/var/lib/clickhouse/preprocessed_configs/config.xml'.
../src/Common/PODArray.h:497:29: runtime error: null pointer passed as argument 2, which is declared to never be null
../contrib/libc-headers/string.h:43:28: note: nonnull attribute specified here
    #0 0xebee778 in void DB::PODArray<double, 32ul, DB::MixedArenaAllocator<4096ul, Allocator<false, false>, DB::AlignedArenaAllocator<8ul>, 8ul>, 0ul, 0ul>::insert_assume_reserved<double const*, double const*>(double const*, double const*) /build/obj-x86_64-linux-gnu/../src/Common/PODArray.h:497:9
    #1 0x1021e945 in void DB::PODArray<double, 32ul, DB::MixedArenaAllocator<4096ul, Allocator<false, false>, DB::AlignedArenaAllocator<8ul>, 8ul>, 0ul, 0ul>::insert<double const*, double const*, DB::Arena*&>(double const*, double const*, DB::Arena*&) /build/obj-x86_64-linux-gnu/../src/Common/PODArray.h:457:9
    #2 0x1021e945 in DB::StatisticalSample<double, double>::merge(DB::StatisticalSample<double, double> const&, DB::Arena*) /build/obj-x86_64-linux-gnu/../src/AggregateFunctions/StatCommon.h:90:11
    #3 0x1945ed13 in DB::Aggregator::mergeWithoutKeyDataImpl(std::__1::vector<std::__1::shared_ptr<DB::AggregatedDataVariants>, std::__1::allocator<std::__1::shared_ptr<DB::AggregatedDataVariants> > >&) const /build/obj-x86_64-linux-gnu/../src/Interpreters/Aggregator.cpp:1699:37
    #4 0x1a724ba7 in DB::ConvertingAggregatedToChunksTransform::initialize() /build/obj-x86_64-linux-gnu/../src/Processors/Transforms/AggregatingTransform.cpp:340:32
    #5 0x1a499098 in DB::executeJob(DB::IProcessor*) /build/obj-x86_64-linux-gnu/../src/Processors/Executors/PipelineExecutor.cpp:79:20
    #6 0x1a498f86 in DB::PipelineExecutor::addJob(DB::ExecutingGraph::Node*)::$_0::operator()() const /build/obj-x86_64-linux-gnu/../src/Processors/Executors/PipelineExecutor.cpp:96:13
    #7 0x1a498f86 in decltype(std::__1::forward<DB::PipelineExecutor::addJob(DB::ExecutingGraph::Node*)::$_0&>(fp)()) std::__1::__invoke<DB::PipelineExecutor::addJob(DB::ExecutingGraph::Node*)::$_0&>(DB::PipelineExecutor::addJob(DB::ExecutingGraph::Node*)::$_0&) /build/obj-x86_64-linux-gnu/../contrib/libcxx/include/type_traits:3519:1
    #8 0x1a49751a in std::__1::__function::__value_func<void ()>::operator()() const /build/obj-x86_64-linux-gnu/../contrib/libcxx/include/functional:1867:16
    #9 0x1a49751a in std::__1::function<void ()>::operator()() const /build/obj-x86_64-linux-gnu/../contrib/libcxx/include/functional:2473:12
    #10 0x1a49751a in DB::PipelineExecutor::executeStepImpl(unsigned long, unsigned long, std::__1::atomic<bool>*) /build/obj-x86_64-linux-gnu/../src/Processors/Executors/PipelineExecutor.cpp:562:17
    #11 0x1a49a1e6 in DB::PipelineExecutor::executeSingleThread(unsigned long, unsigned long) /build/obj-x86_64-linux-gnu/../src/Processors/Executors/PipelineExecutor.cpp:478:5
    #12 0x1a49a1e6 in DB::PipelineExecutor::executeImpl(unsigned long)::$_4::operator()() const /build/obj-x86_64-linux-gnu/../src/Processors/Executors/PipelineExecutor.cpp:739:21
    #13 0x1a49a1e6 in decltype(std::__1::forward<DB::PipelineExecutor::executeImpl(unsigned long)::$_4&>(fp)()) std::__1::__invoke_constexpr<DB::PipelineExecutor::executeImpl(unsigned long)::$_4&>(DB::PipelineExecutor::executeImpl(unsigned long)::$_4&) /build/obj-x86_64-linux-gnu/../contrib/libcxx/include/type_traits:3525:1
    #14 0x1a49a0a9 in decltype(auto) std::__1::__apply_tuple_impl<DB::PipelineExecutor::executeImpl(unsigned long)::$_4&, std::__1::tuple<>&>(DB::PipelineExecutor::executeImpl(unsigned long)::$_4&, std::__1::tuple<>&, std::__1::__tuple_indices<>) /build/obj-x86_64-linux-gnu/../contrib/libcxx/include/tuple:1415:1
    #15 0x1a49a0a9 in decltype(auto) std::__1::apply<DB::PipelineExecutor::executeImpl(unsigned long)::$_4&, std::__1::tuple<>&>(DB::PipelineExecutor::executeImpl(unsigned long)::$_4&, std::__1::tuple<>&) /build/obj-x86_64-linux-gnu/../contrib/libcxx/include/tuple:1424:1
    #16 0x1a49a0a9 in ThreadFromGlobalPool::ThreadFromGlobalPool<DB::PipelineExecutor::executeImpl(unsigned long)::$_4>(DB::PipelineExecutor::executeImpl(unsigned long)::$_4&&)::'lambda'()::operator()() /build/obj-x86_64-linux-gnu/../src/Common/ThreadPool.h:178:13
    #17 0x1a49a0a9 in decltype(std::__1::forward<DB::PipelineExecutor::executeImpl(unsigned long)::$_4>(fp)()) std::__1::__invoke<ThreadFromGlobalPool::ThreadFromGlobalPool<DB::PipelineExecutor::executeImpl(unsigned long)::$_4>(DB::PipelineExecutor::executeImpl(unsigned long)::$_4&&)::'lambda'()&>(DB::PipelineExecutor::executeImpl(unsigned long)::$_4&&) /build/obj-x86_64-linux-gnu/../contrib/libcxx/include/type_traits:3519:1
    #18 0xe69c3ac in std::__1::function<void ()>::operator()() const /build/obj-x86_64-linux-gnu/../contrib/libcxx/include/functional:2473:12
    #19 0xe69c3ac in ThreadPoolImpl<std::__1::thread>::worker(std::__1::__list_iterator<std::__1::thread, void*>) /build/obj-x86_64-linux-gnu/../src/Common/ThreadPool.cpp:236:17
    #20 0xe69ffa5 in void ThreadPoolImpl<std::__1::thread>::scheduleImpl<void>(std::__1::function<void ()>, int, std::__1::optional<unsigned long>)::'lambda1'()::operator()() const /build/obj-x86_64-linux-gnu/../src/Common/ThreadPool.cpp:117:73
    #21 0xe69ffa5 in decltype(std::__1::forward<void>(fp)(std::__1::forward<void ThreadPoolImpl<std::__1::thread>::scheduleImpl<void>(std::__1::function<void ()>, int, std::__1::optional<unsigned long>)::'lambda1'()>(fp0)...)) std::__1::__invoke<void ThreadPoolImpl<std::__1::thread>::scheduleImpl<void>(std::__1::function<void ()>, int, std::__1::optional<unsigned long>)::'lambda1'()>(void&&, void ThreadPoolImpl<std::__1::thread>::scheduleImpl<void>(std::__1::function<void ()>, int, std::__1::optional<unsigned long>)::'lambda1'()&&...) /build/obj-x86_64-linux-gnu/../contrib/libcxx/include/type_traits:3519:1
    #22 0xe69ffa5 in void std::__1::__thread_execute<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, void ThreadPoolImpl<std::__1::thread>::scheduleImpl<void>(std::__1::function<void ()>, int, std::__1::optional<unsigned long>)::'lambda1'()>(std::__1::tuple<void, void ThreadPoolImpl<std::__1::thread>::scheduleImpl<void>(std::__1::function<void ()>, int, std::__1::optional<unsigned long>)::'lambda1'()>&, std::__1::__tuple_indices<>) /build/obj-x86_64-linux-gnu/../contrib/libcxx/include/thread:273:5
    #23 0xe69ffa5 in void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, void ThreadPoolImpl<std::__1::thread>::scheduleImpl<void>(std::__1::function<void ()>, int, std::__1::optional<unsigned long>)::'lambda1'()> >(void*) /build/obj-x86_64-linux-gnu/../contrib/libcxx/include/thread:284:5
    #24 0x7f0ac5227608 in start_thread (/lib/x86_64-linux-gnu/libpthread.so.0+0x9608)
    #25 0x7f0ac514e292 in clone (/lib/x86_64-linux-gnu/libc.so.6+0x122292)

SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior ../src/Common/PODArray.h:497:29 in 

Known error.

@alesapin alesapin merged commit 26637bd into master Dec 18, 2020
@alesapin alesapin deleted the try_rewrite_writer branch December 18, 2020 09:41
@CurtizJ CurtizJ self-assigned this Dec 22, 2020
azat added a commit to azat/ClickHouse that referenced this pull request Jan 5, 2021
In ClickHouse#17120 (that was reverted in ClickHouse#17918 already) marks adjustemnt was
introduced, but it may lead to corruption of marks files, that may cause
two things:
1) Incorrect marks files, which lead to reading more rows than there are
   (can be reproduced with `max_threads`>1 or/and `PREWHERE`, and
    `optimize_move_to_prewhere`)
2) Can't adjust last granule because it has X rows, but try to subtract Y
   rows (ClickHouse#9260)

And 1) is pretty hard to diagnosis, since it does not throw any error,
it just may return wrong result.

Fortunately both problems can be fixed with `OPTIMIZE TABLE ... [ FINAL]`.

Cc: @alesapin
Refs: ClickHouse#17943
Refs: ClickHouse#18223
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-not-for-changelog This PR should not be mentioned in the changelog

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants