Polymorphic parts (compact format). #8290

Merged
alexey-milovidov merged 99 commits into ClickHouse:master from CurtizJ:polymorphic-parts
Feb 23, 2020
Conversation

@CurtizJ
Member

@CurtizJ CurtizJ commented Dec 19, 2019

I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en

Changelog category (leave one):

  • New Feature

Changelog entry (up to a few sentences, required except for Non-significant/Documentation categories):

Add a new compact format of parts in MergeTree-family tables, in which all columns are stored in one file. It helps to improve the performance of small and frequent inserts. The old format (one file per column) is now called wide. The data storage format is controlled by the settings min_bytes_for_wide_part and min_rows_for_wide_part.
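To illustrate the changelog entry, here is a DDL sketch using the two settings it names (the table and column names are hypothetical; the threshold values are arbitrary examples):

```sql
-- Parts below both thresholds are written in the compact format
-- (all columns in one data file); larger parts use the wide format.
CREATE TABLE t_compact  -- hypothetical table name
(
    id UInt64,
    s String
)
ENGINE = MergeTree
ORDER BY id
SETTINGS min_bytes_for_wide_part = 10485760,  -- 10 MiB
         min_rows_for_wide_part = 100000;
```

With settings like these, small frequent inserts produce compact parts, and parts that grow past either threshold (for example, through merges) are written in the wide format.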

@CurtizJ CurtizJ requested a review from alesapin February 14, 2020 21:34
Member

@alesapin alesapin left a comment


Need one more test with compact parts and skip indices. Also, we need at least one detailed comment about compact parts.

}


static bool arrayHasNoElementsRead(const IColumn & column)
Member


ToRead? Maybe add a comment?

set min_insert_block_size_rows=1;
insert into mt_compact select number, 'aaa' from numbers(100);

select count() from system.parts where table = 'mt_compact' and database = currentDatabase() and active;
Contributor


I'm wondering: will system.columns and system.parts_columns be able to show proper sizes for those parts?

Shouldn't those parts be marked in system.parts somehow?

Member Author


I'm wondering: will system.columns and system.parts_columns be able to show proper sizes for those parts?

No, per-column sizes won't be counted for compact parts. Counting the compressed size of every column is almost impossible, since all columns share a single compressed file. The uncompressed size could be counted, but it's not very useful, so it's not implemented, for simplicity.

Shouldn't those parts be marked in system.parts somehow?

The system.parts table will have a part_type column.
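Given that column, the format of each part could be inspected with a query like this (the mt_compact table name follows the test snippet quoted earlier in the thread; I'm assuming part_type reports values such as 'Compact' and 'Wide'):

```sql
SELECT name, part_type
FROM system.parts
WHERE table = 'mt_compact' AND database = currentDatabase() AND active;
```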

Contributor


That will definitely be reported as a bug ("size on disk doesn't match the size in system.columns", etc.) :\ AFAIK, a lot of users look at system.columns.

Member


Compact parts are intended to make up only a small share of the total size of the table.

@CurtizJ
Member Author

CurtizJ commented Feb 19, 2020

Need one more test with compact parts and skip indices.

I've added a smoke test to check basic functionality with compact parts and skip indices.

@zhang2014
Contributor

👍 Great feature!!

Will the compact format get an append mode in the future (always writing to the same file across multiple INSERT queries)?

@CurtizJ CurtizJ changed the title from "Polymorphic parts." to "Polymorphic parts (compact format)." Feb 21, 2020
@CurtizJ
Member Author

CurtizJ commented Feb 21, 2020

Will the compact format get an append mode in the future (always writing to the same file across multiple INSERT queries)?

It will be implemented as in-memory parts with a WAL, which is an append-only file in Native format.

@alexey-milovidov
Member

But the parts themselves remain atomic and immutable.


void read(ReadBuffer & buffer, size_t from, size_t count)
{
buffer.readStrict(reinterpret_cast<char *>(data() + from), count * sizeof(MarkInCompressedFile));
Member


Looks dangerous, because it neither checks the available data nor resizes the array.
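A possible safer variant is sketched below, with a raw byte span standing in for ClickHouse's ReadBuffer (the class shape is simplified and the signature is hypothetical): validate the source length and grow the array to fit the requested range before copying, instead of writing past the end unconditionally.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <stdexcept>
#include <vector>

/// Simplified stand-in for the real struct: two offsets per mark.
struct MarkInCompressedFile
{
    size_t offset_in_compressed_file;
    size_t offset_in_decompressed_block;
};

/// Hypothetical safer read(): checks the source length and resizes the
/// array before copying into the range [from, from + count).
struct MarksInCompressedFile : std::vector<MarkInCompressedFile>
{
    void read(const char * src, size_t src_bytes, size_t from, size_t count)
    {
        const size_t required = count * sizeof(MarkInCompressedFile);
        if (src_bytes < required)
            throw std::runtime_error("Not enough bytes to read marks");
        if (size() < from + count)
            resize(from + count);  /// grow instead of overflowing
        std::memcpy(reinterpret_cast<char *>(data() + from), src, required);
    }
};
```

This trades a small amount of bookkeeping for bounds safety; the original one-liner assumes the caller has already sized the array and that the buffer holds enough bytes.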

@alexey-milovidov alexey-milovidov merged commit d0fea62 into ClickHouse:master Feb 23, 2020
@alexey-milovidov
Member

Now let's enable them for the system log tables, for sizes less than 1 MB.

azat added a commit to azat/ClickHouse that referenced this pull request Jan 11, 2021
But reduce scope, to avoid leaking too much memory, since there are old
values in last_block_index_columns.

The scope of the MemoryTracker::BlockerInThread has been increased in ClickHouse#8290
Labels

pr-feature Pull request with new product feature
