Skip to content

StorageEmbeddedRocksDB#15073

Merged
alexey-milovidov merged 62 commits intoClickHouse:masterfrom
sundy-li:storage-rocksdb
Nov 12, 2020
Merged

StorageEmbeddedRocksDB#15073
alexey-milovidov merged 62 commits intoClickHouse:masterfrom
sundy-li:storage-rocksdb

Conversation

@sundy-li
Copy link
Copy Markdown
Contributor

@sundy-li sundy-li commented Sep 21, 2020

I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en

Changelog category (leave one):

  • New Feature

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Add StorageEmbeddedRocksdb Engine

Detailed description / Documentation draft:

StorageEmbeddedRocksdb. As the name suggests, it introduces rocksdb as an embedded database with ClickHouse.
This is just a simple implementation.


Motivations for adding this feature

  • If we low down the setting index_granularity_bytes, ClickHouse
    may work kind like a key-value LSM-Tree database(key, value are in separated files), yet the pure key-value store can reduce much more extra IO READ than column store.

  • In bytedance, bigo, suning , they used rocksdb for caching the MergeTreeParts, it significantly reduces the loading time for large set tables. Because io sequence reads perform better than random reads in HDD.

  • In bigo, we introduced pilosa to ClickHouse, large bitmapState will be stored in rocksdb. It performs better than count distinct in UV metrics calculation, and we can join the results with normal OLAP queries.

  • Why not StorageRocksdb instead of StorageEmbeddedRocksdb? I think It's ok for both names.

@robot-clickhouse robot-clickhouse added doc-alert pr-feature Pull request with new product feature labels Sep 21, 2020
@sundy-li sundy-li changed the title Storagerocksdb StorageEmbeddedRocksdb Sep 21, 2020
@robot-clickhouse robot-clickhouse added the submodule changed At least one submodule changed in this PR. label Sep 21, 2020
@filimonov
Copy link
Copy Markdown
Contributor

#14863
#8624

@sundy-li
Copy link
Copy Markdown
Contributor Author

How to make fast test check run with the new added submodule?

@filimonov
Copy link
Copy Markdown
Contributor

filimonov commented Sep 22, 2020

The script which is executed for fast builds is here:
https://github.com/ClickHouse/ClickHouse/blob/master/docker/test/fasttest/run.sh

If cmake for Rocksdb will respect ENABLE_LIBRARIES as a default value - it will just work automatically

export CMAKE_LIBS_CONFIG="-DENABLE_LIBRARIES=0 -DENABLE_TESTS=0 -DENABLE_UTILS=0 -DENABLE_EMBEDDED_COMPILER=0 -DENABLE_THINLTO=0 -DUSE_UNWIND=1"

See also #9226

@alexey-milovidov alexey-milovidov self-assigned this Sep 24, 2020
@alexey-milovidov
Copy link
Copy Markdown
Member

alexey-milovidov commented Sep 24, 2020

It will be better to allow arbitrary set of columns and PRIMARY KEY specification.
Then primary key columns and the rest columns will be serialized and deserialized with IDataType::serialize/deserializeBinary into single key/value blobs.

It will greater improve the usability of this feature!

@sundy-li sundy-li marked this pull request as draft September 24, 2020 07:30
@sundy-li sundy-li force-pushed the storage-rocksdb branch 7 times, most recently from 28dcdcd to 3cec63c Compare October 2, 2020 12:13
Copy link
Copy Markdown
Member

@azat azat Oct 2, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Allow passing extra options will be very useful (since there are tons of them, compression, threads, bloom filters, hash index), and not only rocksdb options but also options for column family, there are API that create options from map:

  • rocksdb::GetDBOptionsFromMap
  • rocksdb::GetColumnFamilyOptionsFromMap
  • And a separate option for TTL, i.e. rocksdb::DBWithTTL::Open.

Plus rocksdb exports tons of statistics:

  • rocksdb::CreateDBStatistics
  • pus there are histograms (rocksdb::StatsLevel::kExceptHistogramOrTimers), but they a little bit heavy
    worth export it via some metrics or system.rocksdb

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can split the task to two steps and add support for options in the second step.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, since there are lots of options in rocksdb, I'd like to continue with options support in another PR.

@sundy-li
Copy link
Copy Markdown
Contributor Author

sundy-li commented Oct 3, 2020

@alexey-milovidov
can you help resolve the Compatibility check in spare time? I still have no idea about it :( .

@azat
Copy link
Copy Markdown
Member

azat commented Oct 3, 2020

Compatibility check — glibc check failed

AFAIR understand rocksdb uses sync_file_range while there is no support in glibc-compatibility layer, so you can:

@sundy-li sundy-li marked this pull request as ready for review October 4, 2020 12:24
@sundy-li sundy-li force-pushed the storage-rocksdb branch 2 times, most recently from ecda5cf to b104217 Compare October 6, 2020 06:34
@azat
Copy link
Copy Markdown
Member

azat commented Oct 15, 2020

@sundy-li BTW do you have any numbers? (performance related)

@sundy-li
Copy link
Copy Markdown
Contributor Author

sundy-li commented Nov 11, 2020

alexey-milovidov: I wonder why don't allow multi-column primary key?
eg: primary key (a, b, c)

  • I think it's useless, we can define multi-column as a single Tuple column instead.
  • It's hard to apply index for multi-column primary key, such as where a = 1 and b in (2, 3) and c in (4, 5), this may rewrite to (a, b, c) in [ (1, 2, 4), (1, 3, 5) ...], but it's not an easy way to implement
  • If it's really needed in the future for some reasons, we can add it later.

It's strange that this PR view in github involves lots of changes when I pull rebase from origin/master. But it's normal when I git diff origin/master locally. Maybe it's cached by github.

@alexey-milovidov
Copy link
Copy Markdown
Member

@sundy-li

It's hard to apply index for multi-column primary key
If it's really needed in the future for some reasons, we can add it later.

Ok.

It's strange that this PR view in github involves lots of changes when I pull rebase from origin/master.

I don't know the reason but don't worry...

@alexey-milovidov alexey-milovidov merged commit ff29b2d into ClickHouse:master Nov 12, 2020
M(550, CONDITIONAL_TREE_PARENT_NOT_FOUND) \
M(551, ILLEGAL_PROJECTION_MANIPULATOR) \
M(552, UNRECOGNIZED_ARGUMENTS) \
M(553, ROCKSDB_ERROR) \
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alexey-milovidov Error code is duplicated (after syncing with upstream.)

@amosbird
Copy link
Copy Markdown
Collaborator

It's hard to apply index for multi-column primary key, such as where a = 1 and b in (2, 3) and c in (4, 5), this may rewrite to (a, b, c) in [ (1, 2, 4), (1, 3, 5) ...], but it's not an easy way to implement

It looks like https://github.com/ClickHouse/ClickHouse/blob/master/src/Storages/MergeTree/KeyCondition.cpp#L1800 .

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-feature Pull request with new product feature submodule changed At least one submodule changed in this PR.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants