Roadmap 2021 (discussion) #17623
This is the ClickHouse roadmap for 2021.
Descriptions and links are still to be filled in.
It will be published in the documentation in December.
Main tasks
✔️ Provide an alternative to ZooKeeper
Implementation of a server with ZooKeeper interface inside ClickHouse.
Done, @alesapin
#15090 #16877 #19580 #20585 #21425 #21677 #21593 #21690 #22274 #26150 #28981 #31150 #30880 #30678 #30372 #30170 #29417 #29367 #29268 #29223 #29071 #29030 #28526 #28519 #28360 #28152 #28190 #28197 #28143 #28080 #27818 #27125 #26874 #25428 #25421 #24533 #24499 #24448 #24412 #24059 #24017 #23077 #23038 #22992 #22743 #22707 #22470 #22373
✔️ Nested and semistructured data
In progress, @CurtizJ
Reading of subcolumns from tables. Nested type with arbitrary nesting level. Unify Nested and named tuples. Better support for nested and named tuples in syntax. Naturally map Nested to JSON format. Map datatype. Move ser/de methods from DataType to Column. Allow different column representations to store the same DataType. ColumnSparse. Codec inference from data. Dynamic columns in tables.
#21562
#21157
#21699
#14196
#17310
#14963
#15806
#1841
Limited support for transactions
Atomic inserts into table and all dependent materialized views. Atomic inserts of more than one block.
Acquire a snapshot to use in multiple SELECT queries.
In progress, @tavplubix
✔️ Backups
✔️ Hedged requests
✔️ Window functions
Experimental support, @akuzm
#18222
#18455
#19022
#19299
#19921
#19951
#20041
#20060
#20111
#20284
#20293
#20337
#21895
✔️ Separation of storage and compute
✔️ Object storage for Replicated tables: #16240
✔️ Support for partitions in file-like engines
✔️ Distributed INSERT and SELECT over file-like engines, @nikitamikhaylov #22012
✔️ Remove ugliness and general inefficiencies from reading from remote storage.
Remote filesystem over ClickHouse server
✔️ Distributed SELECT over MergeTree on shared filesystem, @nikitamikhaylov #29279
✔️ Short-circuit evaluation
Done, @Avogar
✔️ Projections
Experimental stage.
#20202
@amosbird
ALTER PRIMARY KEY
In progress, @amosbird
✔️ Lightweight DELETE/UPDATE
✔️ Workload management
Add async method for processors. Shared thread pool for all queries.
@KochetovNicolai
✔️ User-Defined Functions
Done, @kitaisreal
SQL UDFs - done!
Executable UDFs - done!
Simplify replication
JOIN improvements
Embedded documentation
In progress, @FArthur-cmd
Pluggable auth with tokens
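As an illustration of the "Hedged requests" item above, here is a minimal Python sketch of the general technique: send a request to one replica and, if it has not answered within a short delay, send the same request to the next one and take whichever answer arrives first. This is not ClickHouse's implementation; the names `hedged_call`, `replicas`, and `hedge_delay` are hypothetical, and error handling is deliberately omitted.

```python
import concurrent.futures as cf
import time


def hedged_call(replicas, hedge_delay):
    """Return the first result produced by any replica.

    `replicas` is a list of zero-argument callables (hypothetical stand-ins
    for "run this query on replica N"). A new replica is only contacted if
    none of the earlier ones has answered within `hedge_delay` seconds.
    """
    pool = cf.ThreadPoolExecutor(max_workers=len(replicas))
    pending = set()
    try:
        for replica in replicas:
            pending.add(pool.submit(replica))
            # Wait a little for any in-flight request before hedging again.
            done, pending = cf.wait(
                pending, timeout=hedge_delay, return_when=cf.FIRST_COMPLETED
            )
            if done:
                return next(iter(done)).result()
        # Every replica has been contacted; block until the first one answers.
        done, pending = cf.wait(pending, return_when=cf.FIRST_COMPLETED)
        return next(iter(done)).result()
    finally:
        # Do not wait for the slower requests; their results are discarded.
        pool.shutdown(wait=False)


def slow_replica():
    time.sleep(0.5)
    return "slow"


def fast_replica():
    return "fast"


print(hedged_call([slow_replica, fast_replica], hedge_delay=0.05))
```

The trade-off is extra load on the replicas in exchange for lower tail latency, so the delay is usually chosen so that only a small fraction of requests are hedged.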
Experimental and intern tasks
🗑️ Calculation of test coverage on a per-query basis
Limited support for correlated subqueries
Postponed.
✔️ PostgreSQL table engine.
✔️ Streaming replication from PostgreSQL.
✔️ Implement SQL/JSON standard.
Done, #24148
✔️ Table constraints and hypothesis on data for query optimization
✔️ Schema inference for text formats
Done, @Avogar
🗑️ Advanced compression methods
Cancelled.
🗑️ Integration of ClickHouse with Tensorflow
Cancelled.
✔️ Integration of more streaming data sketches in ClickHouse
Two new sketches are added.
✔️ Data processing with external tools in a streaming fashion, a.k.a. ClickHouse MR
Done, @kitaisreal
🗑️ Caching of deserialized data in memory on MergeTree part level
Cancelled.
✔️ Subquery operators: INTERSECT/EXCEPT, ANY/ALL/EXISTS.
Done.
✔️ Implementation of GROUPING SETS.
In progress.
✔️ Refreshable materialized views and cron jobs.
In progress.
User-defined data types
In progress.
Limited support for unique key constraints.
✔️ YAML configuration
Done, scheduled for release in 21.7.
#21858
@BoloniniD
Incremental data aggregation in memory
In progress.
✔️ Natural language processing functions
Done, @evillique.
✔️ Implementation of a table engine to consume application log files
✔️ Collection of common system metrics
Done, @alexey-milovidov
✔️ Integration of S2 geometry library
Done.
SQL functions for compatibility with MySQL
A few functions were added. Review stage.
Data formats for fast import of nested JSON and XML
In progress.
✔️ Text classification
Done, @evillique
✔️ Data encryption at rest
🗑️ NEAR modifier for GROUP BY
Cancelled.
🗑️ Specialized precompression codecs
Moved to 2022.
✔️ Integration of SQLite as database engine and data format
Done.
✔️ Query cache for result datasets
Postponed.
✔️ Support for INFORMATION SCHEMA
Done by @tavplubix
🗑️ Arrow Flight interface
Cancelled.
✔️ Functions and data types for geospatial data
Experimental stage.
✔️ User-Agent parsing functions
Integrate novel optimization for GROUP BY
✔️ Descriptive analysis of datasets
Done.
🗑️ Learning of vector embeddings for table rows
Cancelled.
🗑️ Userspace RAID
Postponed.
✔️ VFS over HDFS
🗑️ Etcd instead of ZooKeeper
#17495 Cancelled.
🗑️ GPU accelerated aggregate functions
NVIDIA
Cancelled.
✔️ Rewrite type inference and identifiers analysis
E.g. a way to analyze this query
WITH b + 1 AS c
SELECT a AS b, *, t.*, n.b, a -> a = b + 1 AS func, arrayMap(func, n.c)
FROM mysql(...) RIGHT JOIN (SELECT ...) t ARRAY JOIN nest AS n
in a generic, not ad-hoc fashion.
In progress, @kitaisreal
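For readers unfamiliar with the GROUPING SETS item listed above, the following Python sketch shows what the SQL construct computes: one aggregation per listed set of keys, returned together. It only illustrates the semantics on in-memory rows and says nothing about how ClickHouse implements the feature; `grouping_sets`, `rows`, and the column names are hypothetical.

```python
from collections import defaultdict


def grouping_sets(rows, key_sets, value):
    """Sum `value` once per key set, mimicking GROUP BY GROUPING SETS (...).

    `rows` is a list of dicts, `key_sets` is a list of tuples of column
    names, and the empty tuple () stands for the grand total.
    """
    result = {}
    for keys in key_sets:
        totals = defaultdict(int)
        for row in rows:
            group = tuple(row[k] for k in keys)
            totals[group] += row[value]
        result[keys] = dict(totals)
    return result


rows = [
    {"city": "A", "year": 2020, "n": 1},
    {"city": "A", "year": 2021, "n": 2},
    {"city": "B", "year": 2020, "n": 3},
]
out = grouping_sets(rows, [("city",), ("year",), ()], "n")
for keys, totals in out.items():
    print(keys, totals)
```

A real engine would try to share work between the key sets instead of rescanning the rows for each one; the loop here is kept naive for clarity.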
Tech debt and small tasks
✔️ Fix low performance of encrypt/decrypt functions
Done. @alexey-milovidov
✔️ Fix the remaining issues with in-memory parts and WAL
@CurtizJ
We removed in-memory parts and WAL.
✔️ Continue to support play.clickhouse.com
There is no source code. The version of ClickHouse is too old. There are multiple bugs.
Or remove it completely. @qoega
✔️ Fix issues with Postgres via ODBC
Done, @kssenii
✔️ User roles from LDAP
✔️ Remove DataStreams
Done, @KochetovNicolai
🗑️ Incremental data clustering
Cancelled, @KochetovNicolai
✔️ Min-hash, Sim-hash support
Done. @KochetovNicolai, @alexey-milovidov
✔️ Enable compile_expressions by default
Done. @kitaisreal
✔️ Z-order indexing
In creeping progress.
✔️ Low performance of ser/de functions of DataType
Due to introduction of "Data type domains".
✔️ Library dictionary bridge
✔️ Versioning of aggregate function states
Done.
@kssenii
✔️ Type conversions for IN, JOIN
✔️ Support for all types in CASE operator with values
✔️ Extended range for DateTime64
Done, @Enmk, @alexey-milovidov
#9404
✔️ Improve logic of priorities of background merges
@nikitamikhaylov #22381
Done.
✔️ Better criteria for Too Many Parts
✔️ Speed-up ODBC table engine
Done, @kssenii
✔️ Replace OpenSSL with BoringSSL
Done. @alexey-milovidov
#16043
#18129
Enable pk-aware GROUP BY by default
✔️ Deduplication for non-replicated MergeTree on block level
Done, @yuzhichang, @alesapin: #8467
✔️ Pre-configured named connections in config
To avoid specifying user/password for external storages.
Done, @kssenii
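The "Deduplication for non-replicated MergeTree on block level" item above can be pictured with a small Python sketch: hash every inserted block, remember the most recent hashes, and skip a block whose hash has already been seen, so that a retried INSERT does not duplicate data. This is an assumption-laden illustration rather than the actual ClickHouse code; `DedupLog`, `window`, and `should_insert` are hypothetical names, and hashing `repr()` of the rows is a stand-in for hashing the real block contents.

```python
import hashlib
from collections import OrderedDict


class DedupLog:
    """Remember the hashes of the last `window` inserted blocks."""

    def __init__(self, window):
        self.window = window
        self._seen = OrderedDict()  # digest -> None, oldest first

    def should_insert(self, block_rows):
        """Return False if an identical block was inserted recently."""
        digest = hashlib.sha256(repr(block_rows).encode("utf-8")).hexdigest()
        if digest in self._seen:
            return False
        self._seen[digest] = None
        if len(self._seen) > self.window:
            # Forget the oldest block so that memory use stays bounded.
            self._seen.popitem(last=False)
        return True


log = DedupLog(window=2)
block = [(1, "a"), (2, "b")]
print(log.should_insert(block))  # first attempt is accepted
print(log.should_insert(block))  # an identical retry is skipped
```

Because only a bounded window of hashes is kept, a block that is re-sent after many other inserts would be accepted again; that is the usual price of keeping the log small.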
Testing improvements
✔️ Automated tests for AArch64 builds
#15174
#22534
#22580
#22582
#22590
#22595
#22596
#22632
✔️ Add Query Fuzzer for Stress Tests
Done.
✔️ Add Thread Fuzzer for checking flaky tests
Done, #18299
Import obfuscated queries from Yandex.Metrica production
Fuzzing of cluster configurations
Fuzzing of ClickHouse versions for tests with distributed queries for compatibility
✔️ Integrate SQLancer
But it is abandoned and does not work anymore.
Integrate SQLLogicTest
✔️ More intense fuzzing of newly added tests
Done, @alexey-milovidov
#18916
🗑️ Network replay server
Moved to next year.