Roadmap 2023 #44767
This is the ClickHouse roadmap for 2023.
Descriptions and links are yet to be filled in.
This roadmap does not cover the tasks related to infrastructure, orchestration, documentation, marketing, external integrations, drivers, etc.
See also:
Roadmap 2022: #32513
Roadmap 2021: #17623
Roadmap 2020: link
Testing and hardening
Fuzzer of data formats.
Fuzzer of network protocols.
✔️ Fuzzer of MergeTree settings.
✔️ Server-side AST query fuzzer.
Generic fuzzer for query text.
Randomization of DETACH/ATTACH in tests.
✔️ Integrate with SQLogicTest.
✔️ GWP-ASan integration.
✔️ Fully-static ClickHouse builds.
✔️ Embedded documentation.
✔️ Green CI: https://aretestsgreenyet.com/
Query analysis and optimization
✔️ Enable Analyzer by default.
Remove old predicate pushdown mechanics.
✔️ Support for all types of CASE operators with values.
Transforming anti-join: LEFT JOIN ... WHERE ... IS NULL to NOT IN.
Deriving index condition from the right-hand side of INNER JOIN.
✔️ JOINs reordering and extended pushdown.
✔️ Correlated subqueries (with decorrelation).
Parallel in-order GROUP BY.
GROUP BY optimizations based on query rewrite.
Use table sorting for DISTINCT optimization.
✔️ Use table sorting for merge JOIN.
Alias columns in views.
✔️ Recursive CTE.
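The anti-join item above (LEFT JOIN ... WHERE ... IS NULL to NOT IN) can be illustrated with a minimal sketch. It uses SQLite through Python's standard `sqlite3` module only to stay self-contained; it is not ClickHouse and says nothing about how the planned rewrite is implemented. The table names `l` and `r`, the column `k`, and the helper `anti_join_results` are hypothetical. The two forms agree here only under the assumption that the join key holds no NULLs on either side.

```python
import sqlite3


def anti_join_results(left_keys, right_keys):
    """Run the same anti-join two ways and return both result lists.

    Assumption: keys are non-NULL on both sides; with NULLs the two
    query forms can return different rows.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE l (k INTEGER NOT NULL)")
    conn.execute("CREATE TABLE r (k INTEGER NOT NULL)")
    conn.executemany("INSERT INTO l VALUES (?)", [(k,) for k in left_keys])
    conn.executemany("INSERT INTO r VALUES (?)", [(k,) for k in right_keys])

    # Form 1: LEFT JOIN, then keep only the rows that found no match.
    via_join = conn.execute(
        "SELECT l.k FROM l LEFT JOIN r ON l.k = r.k "
        "WHERE r.k IS NULL ORDER BY l.k"
    ).fetchall()

    # Form 2: the NOT IN form the roadmap item names as the rewrite target.
    via_not_in = conn.execute(
        "SELECT l.k FROM l WHERE l.k NOT IN (SELECT k FROM r) ORDER BY l.k"
    ).fetchall()

    conn.close()
    return [row[0] for row in via_join], [row[0] for row in via_not_in]


if __name__ == "__main__":
    joined, not_in = anti_join_results([1, 2, 3, 4], [2, 4, 4])
    print(joined, not_in)
```

The NOT IN form only needs a set of right-hand keys, while the join form materializes joined rows before filtering them, which is one plausible reason such a rewrite is attractive.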
Separation of storage and compute ☁️
✔️ Increased prefetching for VFS.
✔️ Lazy initialization of tables and data parts.
✔️ Cooperation of temporary data for query processing with VFS cache.
✔️ Multiplexed data parts format for lowering the number of writes ☁️
✔️ Parallel replicas with task callbacks (production readiness).
✔️ Parallel replicas with custom sharding key.
✔️ SharedMergeTree - a table engine with a shared set of data parts. ☁️
✔️ Shared metadata storage - no local metadata on replicas. ☁️
Optimization of high-cardinality operations with the help of parallel replicas.
Query offloading to dynamic stateless workers.
✔️ Instant attaching tables from backups.
Security and access control
✔️ Untangle auth methods for S3.
✔️ Managing named collections with SQL.
Secure storage for named collections. ☁️
Dynamic managing of custom query handlers.
✔️ Default password hashing scheme.
✔️ Bcrypt or other PBKDF.
✔️ Resource scheduler.
✔️ TLS for the PostgreSQL compatibility protocol.
✔️ Data masking in row-level security.
✔️ JWT support. ☁️
Formats and integrations
✔️ Headers autodetect for -WithNames formats.
✔️ Support s3-style URLs along with gs and others.
✔️ Support for Apache Iceberg.
✔️ Optimization of reading for Parquet.
✔️ Overlay databases and File database engine.
✔️ Streaming consumption from a bunch of files.
✔️ Support for embedded indices inside Parquet.
Schema inference for table function mongodb.
Table function zookeeper.
Simple HTTP PUT or form file API for data insertion.
Asynchronous inserts by default.
Data storage
✔️ Lightweight deletes - production readiness.
Transactions for ReplicatedMergeTree.
✔️ Indexing with space-filling curves.
Uniform treatment of LowCardinality, Sparse, and Const columns.
✔️ Grouping a huge number of columns together (packed part format). ☁️
✔️ Deprecate in-memory data parts or make them production ready.
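As background for the space-filling curves item above, here is a minimal sketch of one such curve, the Z-order (Morton) curve, which interleaves the bits of two coordinates into a single sortable key. The choice of Z-order, the function name `morton_encode_2d`, and the `bits` parameter are assumptions for illustration; the roadmap does not state which curves are used or how they are implemented.

```python
def morton_encode_2d(x, y, bits=16):
    """Interleave the low `bits` bits of x and y into one integer.

    Bits of x land in the even positions and bits of y in the odd
    positions, so points close in (x, y) tend to get nearby codes.
    """
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)
        code |= ((y >> i) & 1) << (2 * i + 1)
    return code


if __name__ == "__main__":
    points = [(3, 5), (0, 1), (1, 0), (2, 2)]
    # Sorting by the code gives a one-dimensional order over 2D points.
    for p in sorted(points, key=lambda p: morton_encode_2d(*p)):
        print(p, morton_encode_2d(*p))
```

Sorting rows by such a code lets a single sort key serve range conditions on both coordinates to some degree, which is the general idea behind indexing with space-filling curves.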
Experimental features and research
✔️ Query results cache.
✔️ Regexp-Tree dictionaries.
✔️ Batch jobs and refreshable materialized views.
Streaming queries.
Freeform text format.
✔️ Semistructured data support - production readiness.
WebSocket protocol for ClickHouse.
✔️ SSH protocol for ClickHouse.
Key-value data marts.
Unique Key Constraint.
✔️ PRQL as a dialect.
✔️ Kusto support.