Introduce `STALENESS` for Iceberg engine

An Iceberg table consists of metadata files (`.json` and `.avro`) and data files (in 99% of cases `.parquet`). All files in an Iceberg table are immutable. This property makes them very convenient to cache, which ClickHouse already does extensively:
1. Cache for parsed metadata files: https://github.com/ClickHouse/ClickHouse/pull/77156
2. Cache for parquet footers: https://github.com/ClickHouse/ClickHouse/pull/89750
3. Filesystem on-disk cache for all objects in Object Storage

However, when we execute a new `SELECT` query, we still cannot avoid touching object storage (or the catalog), because we need to check for new `metadata.json` files. So even if all the data is already cached, we still spend a lot of time on a round trip to an external service. It is, however, quite a common scenario that querying the most recent data is not required and some staleness is acceptable. In fact, for the ReplicatedMergeTree table engine, some unpredictable staleness (replication lag) is the default mode of `SELECT` query execution.

The idea is to introduce a setting `iceberg_read_staleness_seconds=xxx` for the `Iceberg` table engine or table function. If this setting is specified, the table will have background thread(s) that periodically and proactively check the table state and put metadata files into the cache. The time of the last check is recorded, and if the time elapsed since that check is less than the value specified in the setting, we serve the query fully from the latest cached `metadata.json`.
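To make the intended behaviour concrete, here is a minimal sketch of the staleness gate in Python. It is an illustration of the idea only, not ClickHouse code: `StaleMetadataCache`, `fetch_latest`, and `run_background` are hypothetical names, and the real implementation would live in the C++ Iceberg metadata layer.

```python
import threading
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Snapshot:
    metadata_path: str  # path of the latest known metadata.json
    checked_at: float   # clock reading taken when the check started


class StaleMetadataCache:
    """Hypothetical sketch: serve cached metadata while the last check is fresh enough."""

    def __init__(self, fetch_latest: Callable[[], str], staleness_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch_latest = fetch_latest    # assumed to hit object storage or the catalog
        self._staleness = staleness_seconds  # plays the role of iceberg_read_staleness_seconds
        self._clock = clock
        self._lock = threading.Lock()
        self._snapshot: Optional[Snapshot] = None

    def refresh(self) -> Snapshot:
        # Record the time before the remote call so the reported age is never understated.
        started = self._clock()
        snapshot = Snapshot(self._fetch_latest(), started)
        with self._lock:
            self._snapshot = snapshot
        return snapshot

    def get(self) -> str:
        with self._lock:
            snapshot = self._snapshot
        if snapshot is not None and self._clock() - snapshot.checked_at < self._staleness:
            return snapshot.metadata_path    # served without any remote call
        return self.refresh().metadata_path  # too old or never checked: go remote

    def run_background(self, interval_seconds: float, stop: threading.Event) -> None:
        # Proactive periodic check; on failure the old snapshot is kept until it expires.
        while not stop.wait(interval_seconds):
            try:
                self.refresh()
            except Exception:
                pass
```

In this sketch a query calls `get()`, which only falls back to a remote check when the background thread has not refreshed the snapshot within the allowed staleness window. If the background interval is shorter than the staleness value, queries would normally never wait on object storage.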

Reference: https://www.firebolt.io/blog/querying-apache-iceberg-with-sub-second-performance


Introduce `STALENESS` for Iceberg engine #90387
