Skip to content

Explicitly specify storage features#8830

Merged
alexey-milovidov merged 7 commits intoClickHouse:masterfrom
zlobober:storage_properties
Jan 27, 2020
Merged

Explicitly specify storage features#8830
alexey-milovidov merged 7 commits intoClickHouse:masterfrom
zlobober:storage_properties

Conversation

@zlobober
Copy link
Copy Markdown
Contributor

@zlobober zlobober commented Jan 25, 2020

I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en

There are several features that are supported by a certain subset of storages like support of TTL, key/sort order specification, settings specification or skipping indices support. As for today, ClickHouse checks if given table engine supports these features in storage factory by hard-coded rules like "if ORDER BY is specified then storage name should end with MergeTree". Such logic blocks third-party storage engines to support similar features, for example, if ClickHouse is used as a library in third-party application which provides its own storage engine supporting TTLs, there is no way to use it as StorageFactory will throw an exception.

This PR does the following:

  • A notion of storage property introduced; namely, there are currently four properties supports_settings, supports_skipping_indices, supports_ttl and supports_sort_order;
  • Storage registration is extended by optional specification of supported properties;
  • System table system.table_engines is extended with information about feature support in following manner:
max42-dev.sas.yp-c.yandex.net :) select * from system.table_engines

SELECT *
FROM system.table_engines

┌─name───────────────────────────────────┬─supports_settings─┬─supports_skipping_indices─┬─supports_sort_order─┬─supports_ttl─┬─supports_replication─┬─supports_deduplication──┐
│ Kafka                                  │                 1 │                         0 │                   0 │            0 │                    0 │                       0 │
│ MySQL                                  │                 0 │                         0 │                   0 │            0 │                    0 │                       0 │
│ HDFS                                   │                 0 │                         0 │                   0 │            0 │                    0 │                       0 │
│ S3                                     │                 0 │                         0 │                   0 │            0 │                    0 │                       0 │
│ MaterializedView                       │                 0 │                         0 │                   0 │            0 │                    0 │                       0 │
│ View                                   │                 0 │                         0 │                   0 │            0 │                    0 │                       0 │
│ JDBC                                   │                 0 │                         0 │                   0 │            0 │                    0 │                       0 │
│ Join                                   │                 1 │                         0 │                   0 │            0 │                    0 │                       0 │
│ Set                                    │                 0 │                         0 │                   0 │            0 │                    0 │                       0 │
│ Dictionary                             │                 0 │                         0 │                   0 │            0 │                    0 │                       0 │
│ LiveView                               │                 0 │                         0 │                   0 │            0 │                    0 │                       0 │
│ MergeTree                              │                 1 │                         1 │                   1 │            1 │                    0 │                       0 │
│ Memory                                 │                 0 │                         0 │                   0 │            0 │                    0 │                       0 │
│ Buffer                                 │                 0 │                         0 │                   0 │            0 │                    0 │                       0 │
│ URL                                    │                 0 │                         0 │                   0 │            0 │                    0 │                       0 │
│ ReplicatedVersionedCollapsingMergeTree │                 1 │                         1 │                   1 │            1 │                    1 │                       1 │
│ ReplacingMergeTree                     │                 1 │                         1 │                   1 │            1 │                    0 │                       0 │
│ ReplicatedSummingMergeTree             │                 1 │                         1 │                   1 │            1 │                    1 │                       1 │
│ ReplicatedAggregatingMergeTree         │                 1 │                         1 │                   1 │            1 │                    1 │                       1 │
│ ReplicatedCollapsingMergeTree          │                 1 │                         1 │                   1 │            1 │                    1 │                       1 │
│ File                                   │                 0 │                         0 │                   0 │            0 │                    0 │                       0 │
│ ReplicatedGraphiteMergeTree            │                 1 │                         1 │                   1 │            1 │                    1 │                       1 │
│ ReplicatedMergeTree                    │                 1 │                         1 │                   1 │            1 │                    1 │                       1 │
│ ReplicatedReplacingMergeTree           │                 1 │                         1 │                   1 │            1 │                    1 │                       1 │
│ VersionedCollapsingMergeTree           │                 1 │                         1 │                   1 │            1 │                    0 │                       0 │
│ SummingMergeTree                       │                 1 │                         1 │                   1 │            1 │                    0 │                       0 │
│ Distributed                            │                 0 │                         0 │                   0 │            0 │                    0 │                       0 │
│ TinyLog                                │                 0 │                         0 │                   0 │            0 │                    0 │                       0 │
│ GraphiteMergeTree                      │                 1 │                         1 │                   1 │            1 │                    0 │                       0 │
│ CollapsingMergeTree                    │                 1 │                         1 │                   1 │            1 │                    0 │                       0 │
│ Merge                                  │                 0 │                         0 │                   0 │            0 │                    0 │                       0 │
│ AggregatingMergeTree                   │                 1 │                         1 │                   1 │            1 │                    0 │                       0 │
│ ODBC                                   │                 0 │                         0 │                   0 │            0 │                    0 │                       0 │
│ Null                                   │                 0 │                         0 │                   0 │            0 │                    0 │                       0 │
│ StripeLog                              │                 0 │                         0 │                   0 │            0 │                    0 │                       0 │
│ Log                                    │                 0 │                         0 │                   0 │            0 │                    0 │                       0 │
└────────────────────────────────────────┴───────────────────┴───────────────────────────┴─────────────────────┴──────────────┴──────────────────────┴─────────────────────────┘

36 rows in set. Elapsed: 0.006 sec. 

Disclaimer: I divided features into these four groups according to my understanding of their logic. Feel free to suggest a better division into groups or better naming.

Specify supported properties for each storage engine, expose them via system.table_engines table, properly check property support in storage factory.

Changelog category (leave one):

  • Improvement

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
System table system.table_engine now provides information about feature support (like supports_ttl or supports_sort_order).

Detailed description / Documentation draft:

System table system.table_engines is currently not documented at all, so a documentation draft for this table is included as a second commit.

…ystem.table_engines table, properly check feature support in storage factory.
@alexey-milovidov
Copy link
Copy Markdown
Member

Ok.

@alexey-milovidov
Copy link
Copy Markdown
Member

We have some properties in virtual methods of IStorage. Some of them depends on concrete table instance (not only on engine type) but maybe some of them worth moving out to factory.

@zlobober zlobober marked this pull request as ready for review January 25, 2020 20:24
@zlobober
Copy link
Copy Markdown
Contributor Author

zlobober commented Jan 25, 2020

We have some properties in virtual methods of IStorage. Some of them depends on concrete table instance (not only on engine type) but maybe some of them worth moving out to factory.

This PR mainly focuses on fixing the issue of not being able to specify some clauses in engine specification for third-party engines, the rest of work is done on my personal "sense of beauty" basis and presented here for the sake of PR completeness :)

Though I agree that moving some of the IStorage per-query features to factory is definitely a good idea, I think it is more an informativeness (and documentation) improvement rather than what is done here.

Also, I do not feel 100% sure which features are worth exposing them in the system.table_engines table. I would be glad if anybody suggests me which of the IStorage::supports*() are worth exposing and documenting; also maybe it is a good idea to delegate this part of work to somebody who is more competent than me in ClickHouse storage engine features :)

@alexey-milovidov
Copy link
Copy Markdown
Member

which of the IStorage::supports*() are worth exposing

Those that always return true/false.

@zlobober
Copy link
Copy Markdown
Contributor Author

Those that always return true/false.

This plan excludes almost all features (except for supportsReplcation() and supportsDeduplication()) as there is MaterializedView storage whose features depend on the target table.

Maybe we should introduce special value of 2 (or null) for this flag indicating that certain feature may be supported or not depending on the particular table?

@alexey-milovidov
Copy link
Copy Markdown
Member

Maybe we should introduce special value of 2 (or null) for this flag indicating that certain feature may be supported or not depending on the particular table?

Not necessary. Just move only those features that are constantly true or false...

@alexey-milovidov
Copy link
Copy Markdown
Member

2020.01.26 01:50:02.420571 [ 77 ] {dd0f13ee-adb3-4350-85f1-4b510eb3af55} <Debug> executeQuery: (from [::1]:34264) CREATE VIEW test.view AS SELECT CounterID, count() AS c FROM test.hits GROUP BY CounterID
2020.01.26 01:50:02.423212 [ 247 ] {} <Fatal> BaseDaemon: ########################################
2020.01.26 01:50:02.423351 [ 247 ] {} <Fatal> BaseDaemon: (version 20.2.1.2216) (from thread 77) (query_id: dd0f13ee-adb3-4350-85f1-4b510eb3af55) Received signal Segmentation fault (11).
2020.01.26 01:50:02.423423 [ 247 ] {} <Fatal> BaseDaemon: Address: 0x60 Access: read. Address not mapped to object.
2020.01.26 01:50:02.423511 [ 247 ] {} <Fatal> BaseDaemon: Stack trace: 0x18cf1b07 0x17c9483e 0x17c92bfb 0x17c98406 0x189e9ed5 0x189e7316 0xa9c4ad3 0xa9e6303 0x1a28be6f 0x1a28cb09 0x2000a975 0x2000522e 0x200090be 0x7f9d329446db 0x7f9d3226188f
2020.01.26 01:50:02.426100 [ 247 ] {} <Fatal> BaseDaemon: 3. 0x18cf1b07 DB::StorageFactory::get(DB::ASTCreateQuery const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, DB::Context&, DB::Context&, DB::ColumnsDescription const&, DB::ConstraintsDescription const&, bool) const /build/obj-x86_64-linux-gnu/../dbms/src/Storages/StorageFactory.cpp:146 in /usr/lib/debug/usr/bin/clickhouse
2020.01.26 01:50:02.429432 [ 247 ] {} <Fatal> BaseDaemon: 4. 0x17c9483e DB::InterpreterCreateQuery::doCreateTable(DB::ASTCreateQuery const&, DB::InterpreterCreateQuery::TableProperties const&) /build/obj-x86_64-linux-gnu/../dbms/src/Interpreters/InterpreterCreateQuery.cpp:639 in /usr/lib/debug/usr/bin/clickhouse
2020.01.26 01:50:02.432518 [ 247 ] {} <Fatal> BaseDaemon: 5. 0x17c92bfb DB::InterpreterCreateQuery::createTable(DB::ASTCreateQuery&) /build/obj-x86_64-linux-gnu/../dbms/src/Interpreters/InterpreterCreateQuery.cpp:580 in /usr/lib/debug/usr/bin/clickhouse
2020.01.26 01:50:02.435774 [ 247 ] {} <Fatal> BaseDaemon: 6. 0x17c98406 DB::InterpreterCreateQuery::execute() /build/obj-x86_64-linux-gnu/../dbms/src/Interpreters/InterpreterCreateQuery.cpp:743 in /usr/lib/debug/usr/bin/clickhouse
2020.01.26 01:50:02.438218 [ 247 ] {} <Fatal> BaseDaemon: 7. 0x189e9ed5 DB::executeQueryImpl(char const*, char const*, DB::Context&, bool, DB::QueryProcessingStage::Enum, bool, DB::ReadBuffer*, bool) /build/obj-x86_64-linux-gnu/../dbms/src/Interpreters/executeQuery.cpp:0 in /usr/lib/debug/usr/bin/clickhouse
2020.01.26 01:50:02.440730 [ 247 ] {} <Fatal> BaseDaemon: 8. 0x189e7316 DB::executeQuery(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, DB::Context&, bool, DB::QueryProcessingStage::Enum, bool, bool) /build/obj-x86_64-linux-gnu/../dbms/src/Interpreters/executeQuery.cpp:576 in /usr/lib/debug/usr/bin/clickhouse
2020.01.26 01:50:02.440987 [ 247 ] {} <Fatal> BaseDaemon: 9. 0xa9c4ad3 DB::TCPHandler::runImpl() /build/obj-x86_64-linux-gnu/../dbms/programs/server/TCPHandler.cpp:248 in /usr/lib/debug/usr/bin/clickhouse
2020.01.26 01:50:02.442694 [ 247 ] {} <Fatal> BaseDaemon: 10. 0xa9e6303 DB::TCPHandler::run() /build/obj-x86_64-linux-gnu/../dbms/programs/server/TCPHandler.cpp:0 in /usr/lib/debug/usr/bin/clickhouse
2020.01.26 01:50:02.445794 [ 247 ] {} <Fatal> BaseDaemon: 11. 0x1a28be6f Poco::Net::TCPServerConnection::start() /build/obj-x86_64-linux-gnu/../contrib/poco/Net/src/TCPServerConnection.cpp:57 in /usr/lib/debug/usr/bin/clickhouse
2020.01.26 01:50:02.449056 [ 247 ] {} <Fatal> BaseDaemon: 12. 0x1a28cb09 Poco::Net::TCPServerDispatcher::run() /build/obj-x86_64-linux-gnu/../contrib/libcxx/include/atomic:1036 in /usr/lib/debug/usr/bin/clickhouse
2020.01.26 01:50:02.452507 [ 247 ] {} <Fatal> BaseDaemon: 13. 0x2000a975 Poco::PooledThread::run() /build/obj-x86_64-linux-gnu/../contrib/poco/Foundation/include/Poco/Mutex_STD.h:132 in /usr/lib/debug/usr/bin/clickhouse
2020.01.26 01:50:02.455910 [ 247 ] {} <Fatal> BaseDaemon: 14. 0x2000522e Poco::ThreadImpl::runnableEntry(void*) /build/obj-x86_64-linux-gnu/../contrib/poco/Foundation/include/Poco/SharedPtr.h:256 in /usr/lib/debug/usr/bin/clickhouse
2020.01.26 01:50:02.459665 [ 247 ] {} <Fatal> BaseDaemon: 15. 0x200090be void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, void* (*)(void*), Poco::ThreadImpl*> >(void*) /build/obj-x86_64-linux-gnu/../contrib/libcxx/include/memory:2615 in /usr/lib/debug/usr/bin/clickhouse
2020.01.26 01:50:02.459756 [ 247 ] {} <Fatal> BaseDaemon: 16. 0x76db start_thread  in /lib/x86_64-linux-gnu/libpthread-2.27.so
2020.01.26 01:50:02.459818 [ 247 ] {} <Fatal> BaseDaemon: 17. 0x12188f clone  in /lib/x86_64-linux-gnu/libc-2.27.so

@filimonov
Copy link
Copy Markdown
Contributor

filimonov commented Jan 27, 2020

Maybe just array?

features: ['settings' , 'skipping_indices' , 'ttl', 'sort_order'] 

It's easier to extend and horizontal output is not so wide. Checking with select * from where has(features, 'ttl') looks very natural

@alexey-milovidov
Copy link
Copy Markdown
Member

@filimonov Separate columns is more ClickHouse style.

@zlobober
Copy link
Copy Markdown
Contributor Author

zlobober commented Jan 27, 2020

Fixed segfault, added supports_replication and supports_deduplication.

Maybe just array?

I have no strong opinion on this, but n columns seems OK for me.

Copy link
Copy Markdown
Member

@alexey-milovidov alexey-milovidov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok.

There is no functional test for new columns. And virtual methods from IStorage were not removed.

@alexey-milovidov alexey-milovidov merged commit 45dce57 into ClickHouse:master Jan 27, 2020
@CurtizJ CurtizJ added the pr-improvement Pull request with some product improvements label Jan 30, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-improvement Pull request with some product improvements

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants