Support for Writing to Apache Iceberg Tables in ClickHouse #49973
Description
An (albeit edited) ChatGPT'ed request @tbragin
Is your feature request related to a problem? Please describe.
While ClickHouse supports querying various data formats, including Apache Iceberg, there is currently no way to write data directly into Iceberg or other data lake formats. This is a limitation for users who rely heavily on data lakes and Iceberg tables for their diverse workflows. Like ClickHouse, tools such as Python, Spark, and DuckDB can read Iceberg tables directly, and get better performance by doing so.
Describe the solution you'd like
I propose that ClickHouse add support for writing data into Iceberg tables. Using a common, agreed-upon table format like Iceberg could make ClickHouse a more inclusive and appealing solution for data lake-centric organizations.
Describe alternatives you've considered
Currently, the only viable way to write Iceberg tables is through platforms like Dremio, Snowflake, or Spark. However, these are either too complex, outdated, or not ideal for various reasons. Having the ability to write to Iceberg directly from ClickHouse would simplify the process greatly.
Additional context
Adding this feature could have several benefits:
- It would eliminate the need to copy data into different systems, thus reducing redundancy and potential inconsistencies.
- ClickHouse could potentially become the metadata owner for data lakes, making it an indispensable service for data lake management. Every org needs a data lake; not every org needs ClickHouse. ClickHouse already has rock-solid ETL tools in other respects, so if it can control the metadata, it becomes a necessity for the data lake.
- It could encourage more seamless collaboration between different data tools (Spark, Trino, DuckDB, etc.), as they could all perform their unique operations directly on the shared Iceberg data.
AFAIK right now there is no good way to write Iceberg tables except via Spark.