Sparse serialization and ColumnSparse #22535

Merged
alesapin merged 102 commits into ClickHouse:master from
CurtizJ:sparse-serialization
Dec 17, 2021

Conversation

CurtizJ (Member) commented Apr 3, 2021

I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en

Changelog category (leave one):

  • New Feature

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Implemented sparse serialization. It can reduce disk space usage and improve the performance of some queries on columns that contain many default (zero) values. It can be enabled by setting ratio_of_defaults_for_sparse_serialization. Sparse serialization is chosen dynamically for a column if its ratio of default values to all values is above that threshold. The serialization (default or sparse) is fixed for each column within a part, but may vary between parts.
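
The selection rule described in the changelog entry can be sketched as follows. This is a minimal illustration, not the code from this PR: `SerializationKind`, `ratioOfDefaults`, and `chooseSerialization` are hypothetical names, the column is simplified to a vector of integers whose default value is zero, and the strict "greater than" comparison is an assumption based on the word "above".

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical names throughout; this is not ClickHouse's actual code.
enum class SerializationKind { Default, Sparse };

// Fraction of values equal to the type's default (zero for integers).
inline double ratioOfDefaults(const std::vector<int> & values)
{
    if (values.empty())
        return 0.0;

    std::size_t defaults = 0;
    for (int v : values)
        if (v == 0)
            ++defaults;

    return static_cast<double>(defaults) / static_cast<double>(values.size());
}

// Pick sparse serialization when the ratio of defaults is above the threshold.
// The strict comparison is an assumption of this sketch.
inline SerializationKind chooseSerialization(const std::vector<int> & values, double threshold)
{
    assert(threshold >= 0.0 && threshold <= 1.0);
    return ratioOfDefaults(values) > threshold ? SerializationKind::Sparse
                                               : SerializationKind::Default;
}
```

In this sketch the decision is made once per column for a given batch of values, which mirrors the statement that the serialization kind is fixed for a column within a part but can differ between parts.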

Detailed description / Documentation draft:

Second part of #19953.

TODO:

  • Sorting by sparse columns.
  • Aggregating by sparse columns.
  • Unit tests for ColumnSparse.

robot-clickhouse added the doc-alert and pr-feature (Pull request with new product feature) labels Apr 3, 2021
alesapin self-assigned this Nov 23, 2021
alesapin (Member) left a comment

In general LGTM.

/// Convert to full column, because sparse column has
/// access to element in O(log(K)), where K is number of non-default rows,
/// which can be inefficient.
convertToFullIfSparse(chunk);

I think it's better to add such comments at each place where we use convertToFullIfSparse, because it looks like the code here works with some column internals but is just not ready for the sparse format.
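
To illustrate the O(log(K)) access cost mentioned in the code comment above, here is a minimal sketch of a sparse column. It is not the ColumnSparse implementation from this PR: `SparseIntColumn` and its members are hypothetical, and the layout (sorted positions of non-default rows plus their values) is an assumption made for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch, not ClickHouse's ColumnSparse.
// Stores only non-default (non-zero) values and their row positions.
class SparseIntColumn
{
public:
    explicit SparseIntColumn(const std::vector<int> & full) : size_(full.size())
    {
        for (std::size_t i = 0; i < full.size(); ++i)
        {
            if (full[i] != 0)
            {
                offsets_.push_back(i); // positions are appended in increasing order
                values_.push_back(full[i]);
            }
        }
    }

    std::size_t size() const { return size_; }
    std::size_t nonDefaultCount() const { return offsets_.size(); }

    // Random access needs a binary search over the K stored positions: O(log K).
    int at(std::size_t row) const
    {
        assert(row < size_);
        auto it = std::lower_bound(offsets_.begin(), offsets_.end(), row);
        if (it != offsets_.end() && *it == row)
            return values_[static_cast<std::size_t>(it - offsets_.begin())];
        return 0; // row holds the default value
    }

    // One O(N) pass materializes a full column with O(1) access per row.
    std::vector<int> toFull() const
    {
        std::vector<int> full(size_, 0);
        for (std::size_t i = 0; i < offsets_.size(); ++i)
            full[offsets_[i]] = values_[i];
        return full;
    }

private:
    std::size_t size_;
    std::vector<std::size_t> offsets_;
    std::vector<int> values_;
};
```

Code that reads many individual rows would pay the binary search on every access in this layout, which is the kind of cost a one-time conversion to a full column avoids.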

alesapin (Member) commented

Tests Ok, let's merge!

sevirov (Contributor) commented Dec 18, 2021

Internal documentation ticket: DOCSUP-20369

Labels

pr-feature Pull request with new product feature

9 participants