Skip to content

Projections#20202

Merged
KochetovNicolai merged 34 commits intoClickHouse:masterfrom
amosbird:projection
May 12, 2021
Merged

Projections#20202
KochetovNicolai merged 34 commits intoClickHouse:masterfrom
amosbird:projection

Conversation

@amosbird
Copy link
Copy Markdown
Collaborator

@amosbird amosbird commented Feb 8, 2021

I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en

Changelog category (leave one):

  • New Feature

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Add projection support for MergeTree* tables.

Detailed description / Documentation draft:

The documentation is almost listed in #14730

Possible TODOs.

  • Complete query analysis for projections.
  • Normal projections with different order by clause
  • ProjectionMergeTree to store projection data only can be done via CollapsingMergeTree with default sign = 1
  • Secondary indices via projection

@robot-clickhouse robot-clickhouse added doc-alert pr-feature Pull request with new product feature labels Feb 8, 2021
@alexey-milovidov
Copy link
Copy Markdown
Member

Some exception occurred :(

GithubException: 500 null
This is GitHub downtime, https://www.githubstatus.com/history

Need to restart the tests.

@zhaojintaozhao
Copy link
Copy Markdown

I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en

Changelog category (leave one):

  • New Feature

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Add projection support for MergeTree* tables.

Detailed description / Documentation draft:

The documentation is almost listed in #14730

Possible TODOs.

  • Complete query routing for projections. Currently alias in group-by will not work.
  • Normal projections with different order by clause
  • ProjectionMergeTree to store projection data only
  • Secondary indices via projection

@amosbird :
This is a very valuable feature;
I have some questions about projects.

  1. What is the difference between projections and materialized views?
  2. Does projections also use the idea of kylin cube? Pre-aggregate the query results in advance. The aggregated results are directly queried during query. Projections: Are the combinations of all dimensions calculated in advance?

@itzikiusa
Copy link
Copy Markdown

Long expected feature!
Waiting for PR to be approved, and the following steps, specially different order by
Thank you very much for your work!

@amosbird
Copy link
Copy Markdown
Collaborator Author

amosbird commented May 6, 2021

2021-05-05 23:51:05 00992_system_parts_race_condition_zookeeper_long:                       Received exception from server (version 21.6.1):
2021-05-05 23:51:05 Code: 999. DB::Exception: Received from localhost:9000. DB::Exception: Run time inconsistency, path: /clickhouse/task_queue/ddl/query-0000003330. 
2021-05-05 23:51:06 Received exception from server (version 21.6.1):
2021-05-05 23:51:06 Code: 81. DB::Exception: Received from localhost:9000. DB::Exception: There was an error on [localhost:19000]: Code: 81, e.displayText() = DB::Exception: Database test_06rrwt doesn't exist (version 21.6.1.6765). 
2021-05-05 23:51:06 [ FAIL ] 4.53 sec. - return code 81
2021-05-05 23:51:06 Code: 81. DB::Exception: Received from localhost:9000. DB::Exception: Database `default` doesn't exist. 
2021-05-05 23:51:06 
2021-05-05 23:51:06 , result:
2021-05-05 23:51:06 
2021-05-05 23:51:06 

@amosbird
Copy link
Copy Markdown
Collaborator Author

amosbird commented May 6, 2021

Integration tests (asan) — Timeout, fail: 0, passed: 1314, flaky: 1

What does this mean? (stably reproduced in last three commits)

@amosbird amosbird force-pushed the projection branch 2 times, most recently from 8d7eb80 to b258cbc Compare May 11, 2021 08:47
amosbird and others added 7 commits May 11, 2021 18:12
TODO (suggested by Nikolai)

1. Build query plan fro current query (inside storage::read) up to WithMergableState
2. Check, that plan is simple enough: Aggregating - Expression - Filter - ReadFromStorage (or simplier)
3. Check, that filter is the same as filter in projection, and also expression calculates the same aggregation keys as in projection
4. Return WithMergableState if projection applies

3 will be easier to do with ActionsDAG, cause it sees all functions, and dependencies are direct (but it is possible with ExpressionActions also)

Also need to figure out how prewhere works for projections, and
row_filter_policies.

wip
@KochetovNicolai
Copy link
Copy Markdown
Member

KochetovNicolai commented May 11, 2021

Currently, most of mentioned in #14730 (comment) is implemented. Exceptions are: WHERE in projection definition and everything from Projection Analysis and Settings.

This feature is experimental. Enable allow_experimental_projection_optimization to use projections in SELECT.
Projection with ORDER BY uses it's own primary key when selected. Read-in-order optimization also supported.
Projection with GROUP BY uses stored pre-aggregated data.

Alter update/delete will rebuild projection if any column used by projection is affected.

There is still not enough tests. Also, not all the code was thoughtfully reviewed. However, I am going to merge this pr, because the feature works in general (and to avoid major conflicts).

@amosbird
Copy link
Copy Markdown
Collaborator Author

@KochetovNicolai KochetovNicolai merged commit a1ba67e into ClickHouse:master May 12, 2021
@alexey-milovidov
Copy link
Copy Markdown
Member

@KochetovNicolai There is no comment about MergeTreeDataSelectCache.

You've missed the case when PVS-Studio argued about uncomprehensible code:

PVS check — Have 7 new errors, 21 total errors

https://clickhouse-test-reports.s3.yandex.net/20202/817bc1377c131e83d13bbd90260f8f2d97856bfd/pvs_studio_report/pvs-studio-html-report/sources/MergeTreeDataSelectExecutor.cpp_13.html#ln583

The code is correct but it becomes too tangled, which PVS-Studio does not like.

@alexey-milovidov
Copy link
Copy Markdown
Member

We need clear explanation - what MergeTreeDataSelectCache is doing, what's going on in the code...

@olgarev
Copy link
Copy Markdown
Contributor

olgarev commented Aug 15, 2021

Internal ticket DOCSUP-12894

@zhouerlin
Copy link
Copy Markdown

zhouerlin commented Jan 11, 2022

Currently, projection does not support the where clause,
What is the difficulty in implementing it or for what purpose it was designed to do so? @amosbird

azat added a commit to azat/ClickHouse that referenced this pull request Aug 9, 2023
In ClickHouse#20202 DROP COLUMN had been made no-op, i.e. it simply hardlinked
everything and no columns had been removed. In most of the cases this is
OK, however Vertical merge may inject another column instead of current
if it does not exists
(injectRequiredColumns(LoadedMergeTreeDataPartInfoForReader{}) in
MergeTreeSequentialSource ctor), and it may select this already removed
column and it will fail with an error:

    DB::Exception: There is no column legacy_features_Map.count in table. (NO_SUCH_COLUMN_IN_TABLE), Stack trace (when copying this message, always include the lines below):
    0. src/Common/Exception.cpp:91: DB::Exception::Exception(DB::Exception::MessageMasked&&, int, bool) @ 0xe0c67d5 in /usr/lib/debug/usr/bin/clickhouse.debug
    2. src/Storages/StorageSnapshot.cpp:111: DB::StorageSnapshot::getColumn(DB::GetColumnsOptions const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&) const @ 0x13ff1cb3 in /usr/lib/debug/usr/bin/clickhouse.debug
    3. contrib/llvm-project/libcxx/include/new:246: DB::StorageSnapshot::getColumnsByNames(DB::GetColumnsOptions const&, std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>>> const&) const @ 0x13ff19f1 in /usr/lib/debug/usr/bin/clickhouse.debug
    4. src/Storages/MergeTree/MergeTreeSequentialSource.cpp:0: DB::MergeTreeSequentialSource::MergeTreeSequentialSource(DB::MergeTreeData const&, std::__1::shared_ptr<DB::StorageSnapshot> const&, std::__1::shared_ptr<DB::IMergeTreeDataPart const>, std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>>>, std::__1::optional<DB::MarkRanges>, bool, bool, bool, bool) @ 0x143d108c in /usr/lib/debug/usr/bin/clickhouse.debug
    5. contrib/llvm-project/libcxx/include/__memory/construct_at.h:35: DB::createMergeTreeSequentialSource(DB::MergeTreeData const&, std::__1::shared_ptr<DB::StorageSnapshot> const&, std::__1::shared_ptr<DB::IMergeTreeDataPart const>, std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>>>, bool, bool, bool, std::__1::shared_ptr<std::__1::atomic<unsigned long>>) @ 0x143d25ee in /usr/lib/debug/usr/bin/clickhouse.debug
    6. contrib/llvm-project/libcxx/include/vector:434: DB::MergeTask::VerticalMergeStage::prepareVerticalMergeForOneColumn() const @ 0x1422c525 in /usr/lib/debug/usr/bin/clickhouse.debug

Such merges can be distinguished with the following message in logs:

    Part X doesn't change up to mutation version Y (optimized)
                                                   ^^^^^^^^^^^

Funny that this has been accidentally fixed in the PR with the title
"Remove strange code from MutateTask" (ClickHouse#48666)

Signed-off-by: Azat Khuzhin <[email protected]>
azat added a commit to azat/ClickHouse that referenced this pull request Aug 17, 2023
In ClickHouse#20202 DROP COLUMN had been made no-op, i.e. it simply hardlinked
everything and no columns had been removed. In most of the cases this is
OK, however Vertical merge may inject another column instead of current
if it does not exists
(injectRequiredColumns(LoadedMergeTreeDataPartInfoForReader{}) in
MergeTreeSequentialSource ctor), and it may select this already removed
column and it will fail with an error:

    DB::Exception: There is no column legacy_features_Map.count in table. (NO_SUCH_COLUMN_IN_TABLE), Stack trace (when copying this message, always include the lines below):
    0. src/Common/Exception.cpp:91: DB::Exception::Exception(DB::Exception::MessageMasked&&, int, bool) @ 0xe0c67d5 in /usr/lib/debug/usr/bin/clickhouse.debug
    2. src/Storages/StorageSnapshot.cpp:111: DB::StorageSnapshot::getColumn(DB::GetColumnsOptions const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&) const @ 0x13ff1cb3 in /usr/lib/debug/usr/bin/clickhouse.debug
    3. contrib/llvm-project/libcxx/include/new:246: DB::StorageSnapshot::getColumnsByNames(DB::GetColumnsOptions const&, std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>>> const&) const @ 0x13ff19f1 in /usr/lib/debug/usr/bin/clickhouse.debug
    4. src/Storages/MergeTree/MergeTreeSequentialSource.cpp:0: DB::MergeTreeSequentialSource::MergeTreeSequentialSource(DB::MergeTreeData const&, std::__1::shared_ptr<DB::StorageSnapshot> const&, std::__1::shared_ptr<DB::IMergeTreeDataPart const>, std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>>>, std::__1::optional<DB::MarkRanges>, bool, bool, bool, bool) @ 0x143d108c in /usr/lib/debug/usr/bin/clickhouse.debug
    5. contrib/llvm-project/libcxx/include/__memory/construct_at.h:35: DB::createMergeTreeSequentialSource(DB::MergeTreeData const&, std::__1::shared_ptr<DB::StorageSnapshot> const&, std::__1::shared_ptr<DB::IMergeTreeDataPart const>, std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>>>, bool, bool, bool, std::__1::shared_ptr<std::__1::atomic<unsigned long>>) @ 0x143d25ee in /usr/lib/debug/usr/bin/clickhouse.debug
    6. contrib/llvm-project/libcxx/include/vector:434: DB::MergeTask::VerticalMergeStage::prepareVerticalMergeForOneColumn() const @ 0x1422c525 in /usr/lib/debug/usr/bin/clickhouse.debug

Such merges can be distinguished with the following message in logs:

    Part X doesn't change up to mutation version Y (optimized)
                                                   ^^^^^^^^^^^

Funny that this has been accidentally fixed in the PR with the title
"Remove strange code from MutateTask" (ClickHouse#48666)

Signed-off-by: Azat Khuzhin <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-feature Pull request with new product feature

Projects

None yet

Development

Successfully merging this pull request may close these issues.

9 participants