Intern Tasks 2021/2022 #29601
This is the list of proposed tasks. It will be extended, and you can propose more tasks.
You can also find the previous lists here:
2020/2021: #15065
2019/2020: https://gist.github.com/alexey-milovidov/4251f71275f169d8fd0867e2051715e9
2018/2019: https://gist.github.com/alexey-milovidov/6735d193762cab1ad3b6e6af643e3a43
2017/2018: https://gist.github.com/alexey-milovidov/26cc3862eb87e52869b9dac64ab99156
The tasks should:
- be not too hard (doable within about a month), but usually take no less than a week;
- not alter core components of the system;
- be mostly isolated and not require full knowledge of the system;
- be somewhat interesting to implement or have some research aspect;
- not be on the critical path of our roadmap (it is OK if they are thrown away after a year);
- be mostly for C++ developers, but there should also be tasks for frontend developers and tools/research tasks that only require Go/Python/whatever;
- in some cases allow teamwork;
- cover various skills, e.g. system programming, algorithm knowledge, etc.
This is a draft. Descriptions are to be filled in.
⚙️ Aggregate functions for graph processing
Booked by @ElderlyPassionFruit and @Hattonuri
⚙️ Aggregate functions for statistical tests
Booked by @trenin17
Normality tests and similar.
✔️ Representation of ZooKeeper data model as a flat table in ClickHouse
Booked by @punkmunk
But implemented by completely different people.
🗑️ Network replay server for testing
Booked by @zhukowladimir
✔️ Evaluation and testing of non-cryptographic hash functions in ClickHouse
Booked by @olevino
Integrate wyhash, meowhash, aquahash, farsh, t1ha and highwayhash.
⚙️ Parallel compression for data export
Booked by @kavladst
Allow parallelizing data export into gz, xz and bzip2 formats.
🗑️ ClickHouse in a web browser with WebAssembly
Booked by @Alucardik
An experiment has been finished successfully and the outcome is a demonstration of why this task is unrealistic.
⚙️ Implementation of Graphite Carbon API (Graphite Web) in ClickHouse
Booked by @qwertBR
🗑️ Implementation of Prometheus querying API in ClickHouse
Booked by @gitnabi
✔️ Minimal plotting capabilities in ClickHouse
Booked by @vlerdman
Reimplemented by @alexey-milovidov as /dashboard UI.
⚙️ Collecting of Linux Perf data in ClickHouse
Booked by @rubin-do
A prototype has been demonstrated, but it has low applicability.
✔️ Integration of ClickHouse with MeiliSearch
Booked by @Michicosun
🗑️ Time series analysis with window functions
Booked by @mathalex (Alexey Boykov).
Simple moving average. Holt-Winters forecast. ARIMA. Discovery of "shock events".
A modifier for ORDER BY WITH FILL or similar to fill data with extrapolation.
🗑️ API endpoints based on parametrized views in ClickHouse
Booked by @Fancy2000
Manage HTTP handlers (API endpoints) with SQL queries (creating parametrized views a.k.a. table functions).
⚙️ Embedded ClickHouse as a Python module
Booked by @LGrishin
⚙️ Compilation of expressions to GPU code
Booked by @evillique
✔️ Integrating Rust code into ClickHouse
Booked by @BoloniniD
With BLAKE3 hash function as an example.
⚙️ Key value data marts in ClickHouse
Booked by @dankondr
⚙️ Improvements of ClickHouse integration with foreign databases
Booked by @aapetrenko and @kate1mag
Table functions to access MongoDB, Redis, and Cassandra. Integration with ElasticSearch.
Unrestricted reads from ZooKeeper.
⚙️ Improvements of ClickHouse integration with data streams
Booked by @tchepavel
Integration with Apache Pulsar, Redis Streams, NATS or Kinesis, SQS.
NATS was successfully merged and is used in production.
Redis Streams is at the pull request stage.
🗑️ Integration of ClickHouse with embedded key-value stores
Booked by @nautaa
Integration with TerarkDB, libfpta, FASTER.
🗑️ Integration of ClickHouse with MADLib
Booked by @antikvist, @sabinadayanova
#4425
✔️ Schema inference for data formats. Support for new input/output formats in ClickHouse
Booked by @Avogar
Flatbuffers, HDF5 and sas7bdat.
🗑️ Advanced compression methods in ClickHouse
Booked by @takashirei
bsc, csc and bcm.
✔️ ClickHouse as a backend for Istio Telemetry.
Booked by @Romanchenko
https://github.com/Romanchenko/telemetry_broker
Limited applicability.
✔️ Versions Playground for ClickHouse
Booked by @darkkeks
https://fiddle.clickhouse.com/
⚙️ Integration of ClickHouse with MySQL Parser
Booked by @mrworker27
🗑️ Tamper-proof data storage with blockchain
Booked by @Justarone
🗑️ Isolation of user-defined functions with Firecracker VM
Booked by @ivolff
🗑️ Direct import from files inside tar/zip/7z archives
Booked by @0442A403
⚙️ SQL functions for compatibility with MySQL dialect
Booked by @Shuba-Buba, @evlampiy-lavrentiev, @psevdoinsaf
✔️ Porting ClickHouse SIMD optimizations to ARM NEON
Booked by @chalice19, preliminary
🗑️ Limited support for correlated subqueries in ClickHouse
Booked by @Amesaru
🗑️ User Defined Functions with Julia, R or Scipy
Booked by @vvd170501
⚙️ Functions to extract data from HTML with CSS selectors
Booked by @zdikov
⚙️ Improvements of PREWHERE operator in ClickHouse.
Booked by @nikvas0
⚙️ Implicit user credentials and TOTP for authentication
Booked by @kam3nskii
🗑️ Parallel execution of Distributed DDL queries
Booked by @shaprunovk
🗑️ Fuzzy GROUP BY for data clustering
Booked by @umchemurziev
✔️ Optimization of caching strategies in ClickHouse
Booked by @alexX512
⚙️ Optimization of queries with ordering by sublinear aggregate functions
Booked by @dimarub2000
✔️ Extensions of ZooKeeper protocol for transactions.
Booked by @asokol123
Finished by @antonio2368, #41410.
⚙️ Improvements for CASE operator and transform function
Booked by @pmimanukyan
⚙️ Comparison of Snap, AppImage and Flatpak formats on ClickHouse builds
Booked by @TrueAstralpirate
✔️ Integration of ClickHouse with Observable and Falcon
Booked by @DotJason
⚙️ Implementation of GWP-Asan and comparison of memory allocators in ClickHouse
**Booked by **
⚙️ Implementation of aggregate function combinators: TOTAL, BY and ORDER BY.
**Booked by **
🗑️ Improvements of ClickHouse fuzzing
Booked by @mark-polokhov
✔️ Optimizations of ClickHouse for cloud infrastructure
Booked by @nikitamikhaylov
⚙️ Extending Date and Time Functions in ClickHouse
Booked by @elevankoff
🗑️ Extended Temporary Tables in ClickHouse
**Booked by **
✔️ Specialized compression codecs for floating point data
Booked by @koloshmet
⚙️ Grace hash JOIN
Booked by Sergei Skvortsov, @BigRedEye
Probabilistic data structures for approximate (range) filtering in ClickHouse queries.
For example: SuRF: Practical Range Query Filtering with Fast Succinct Tries (2018) and Proteus: A Self-Designing Range Filter (2022)
Contact: @rschu1ze
Booked by @ruct
Investigate last-level cache partitioning for ClickHouse queries (Intel Cache Allocation Technology)
For example, Accelerating Concurrent Workloads with CPU Cache Partitioning (2018) and Data Processing on Modern Hardware
Contact: @rschu1ze
Entropy-learned Hashing
Try out "Entropy-Learned Hashing: Constant Time Hashing with Controllable Uniformity" (2022) in ClickHouse's hash aggregation.
Contact: @rschu1ze
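For orientation, a rough sketch of the core idea as commonly summarized: instead of hashing every byte of a key, hash only a small set of byte positions that were found to carry enough entropy in the data. Everything below is an assumption made for illustration: the function name `partialKeyHash`, the `positions` parameter (which would come from a separate learning step that is not shown), and the use of FNV-1a as a stand-in hash. It is not ClickHouse code and not the paper's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string_view>
#include <vector>

// Hypothetical helper: FNV-1a computed over only the selected byte positions of the key.
// `positions` is assumed to be a pre-learned list of high-entropy byte offsets.
inline std::uint64_t partialKeyHash(std::string_view key, const std::vector<std::size_t> & positions)
{
    // With no positions the hash would depend on the key length only.
    assert(!positions.empty());

    std::uint64_t h = 14695981039346656037ULL; // FNV-1a 64-bit offset basis
    for (std::size_t pos : positions)
    {
        // A position past the end of a short key contributes a zero byte.
        unsigned char byte = pos < key.size() ? static_cast<unsigned char>(key[pos]) : 0;
        h ^= byte;
        h *= 1099511628211ULL; // FNV-1a 64-bit prime
    }

    // Mix in the length so that keys of different lengths with equal selected bytes differ.
    h ^= static_cast<std::uint64_t>(key.size());
    h *= 1099511628211ULL;
    return h;
}
```

With `positions = {0, 3}`, the keys `"abcd"` and `"aXXd"` hash to the same value because they agree on the selected bytes and on length. That is the trade-off such a scheme makes: less work per key in exchange for more collisions when the chosen positions do not separate the keys well.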