Read optimization for Iceberg tables using Iceberg metadata #90001
ianton-ru wants to merge 6 commits into ClickHouse:master from
Conversation
@ianton-ru I understand the optimization, but does it really optimize some real-world workloads? To me it looks like a very rare situation. Even if a user has such columns in some files, very likely they will read some other columns in the same query, and in that case the benefit will be minuscule. I'm not really sure it's worth some specific and quite complex code. Maybe you can provide some context on it?

@alesapin

Workflow [PR], commit [8513c3a] Summary: ❌
Removed conditional compilation for Avro support in IcebergIterator.
divanik left a comment
I briefly skimmed the PR and left some comments, but the main question is here: #90001 (comment)
std::vector<Iceberg::ManifestFileEntry> equality_deletes_files;
std::exception_ptr exception;
std::mutex exception_mutex;
Int32 table_schema_id;
Could you clarify why we need this new field here?
When the schema is changed, some Parquet files can have the old schema, some the new one.
Say, initially the schema had a column `name`, and file1 was created with this schema.
Later that column was renamed to `old_name`, a new column `name` was added, and file2 was created with the new schema.
But the column index stays the same.
Iceberg metadata with min/max values does not contain column names, only indexes.
The query, on the contrary, contains only names, so I need a map column name => metadata.
So I took the following steps to get the correct column metadata:
- get the current schema (with this `table_schema_id`)
- get the column index for each column in the current schema (again with `table_schema_id`)
- create a map `column name => metadata` instead of `column index => metadata` from the Iceberg metadata.

All this happens in the new `DataFileMetaInfo` constructor; the parameter `schema_id` is this `table_schema_id`.
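The remapping described in this reply can be sketched roughly like this. All type and function names below are hypothetical, simplified stand-ins, not the actual ClickHouse or Iceberg classes:

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical, simplified stand-in for per-column Iceberg statistics.
struct ColumnStats
{
    long null_count;
    std::string min;
    std::string max;
};

// Iceberg file metadata keys stats by column *index* (field id) ...
using StatsByIndex = std::map<int, ColumnStats>;
// ... but the query refers to columns by *name*, so we remap through the
// current table schema (looked up via table_schema_id).
using StatsByName = std::map<std::string, ColumnStats>;

StatsByName remapStatsToNames(const StatsByIndex & by_index,
                              const std::map<std::string, int> & current_schema)
{
    StatsByName by_name;
    for (const auto & [name, index] : current_schema)
    {
        // A file written before a rename keeps the same index, so the
        // lookup stays correct across schema evolution: the stats for
        // index 1 resolve to whatever that column is *currently* named.
        if (auto it = by_index.find(index); it != by_index.end())
            by_name.emplace(name, it->second);
    }
    return by_name;
}
```

In the rename example above, a file written with the old schema keeps its stats under index 1; after the rename, those stats surface under `old_name`, while the re-added `name` (index 2) simply has no stats for that file.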
/// Object metadata: size, modification time, etc.
std::optional<ObjectMetadata> metadata;
/// Information about columns
std::optional<DataFileMetaInfoPtr> file_meta_info;
Do we have any reason to store the field file_meta_info in the class RelativePathWithMetadata? If not, please move it to ObjectInfo.
Let's move it either to DataLakeObjectMetadata or IcebergDataObject
No, just historical reasons. Will move into DataLakeObjectMetadata
Moved to ObjectInfo. data_lake_metadata in ObjectInfo is optional and can be absent independently.
std::unique_ptr<PullingPipelineExecutor> reader;

public:
    std::map<size_t, ConstColumnWithValue> constant_columns_with_values;
At least this field should be constant
It can be changed in ReaderHolder::operator=.
struct DataFileInfo
{
    std::string file_path;
    std::optional<DataFileMetaInfoPtr> file_meta_info;

    explicit DataFileInfo(const std::string & file_path_)
        : file_path(file_path_) {}

    explicit DataFileInfo(std::string && file_path_)
        : file_path(std::move(file_path_)) {}

    bool operator==(const DataFileInfo & rhs) const
    {
        return file_path == rhs.file_path;
    }
};
This seems like an odd abstraction level; let's remove it completely.
constexpr size_t FIELD_MASK_RECT = 0x4;
constexpr size_t FIELD_MASK_ALL = 0x7;

void DataFileMetaInfo::serialize(WriteBuffer & out) const
You need to make these serialization/deserialization functions versioned. There are two approaches: either you introduce a separate protocol version for this structure, or you inherit the protocol version from the enclosing protocol. In the latter case you name the functions serializeForClusterFunction/deserializeForClusterFunction and take the protocol version from the functions above.
Sorry, I don't understand. serialize/deserialize is already called only when the protocol version is 5 or greater. Why do we need to check it inside one more time?
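For illustration, the "separate protocol version" option suggested above could look roughly like this. This is a sketch under assumptions: std::ostream/std::istream stand in for ClickHouse's WriteBuffer/ReadBuffer, and all names are hypothetical:

```cpp
#include <cstdint>
#include <sstream>
#include <stdexcept>

// The structure writes its own version tag first, so fields can be added
// later without breaking readers of data produced by older servers,
// independently of the enclosing cluster-function protocol version.
constexpr uint8_t DATA_FILE_META_INFO_VERSION = 1;

void serializeMeta(std::ostream & out, uint32_t field_mask)
{
    out.put(static_cast<char>(DATA_FILE_META_INFO_VERSION));
    out.write(reinterpret_cast<const char *>(&field_mask), sizeof(field_mask));
}

uint32_t deserializeMeta(std::istream & in)
{
    auto version = static_cast<uint8_t>(in.get());
    if (version > DATA_FILE_META_INFO_VERSION)
        throw std::runtime_error("Unknown DataFileMetaInfo version");
    uint32_t field_mask = 0;
    in.read(reinterpret_cast<char *>(&field_mask), sizeof(field_mask));
    return field_mask;
}
```

The point of the reviewer's request is that the outer protocol-version check only gates whether the structure is present at all; a version tag (or a version parameter threaded through) is what lets the structure's own layout evolve later.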
/// Not empty when allow_experimental_iceberg_read_optimization=true
/// and some columns were removed from read list as columns with constant values.
/// Restore data for these columns.
for (const auto & constant_column : reader.constant_columns_with_values)
std::map<size_t, ConstColumnWithValue> constant_columns_with_values;
std::unordered_set<String> constant_columns;

NamesAndTypesList requested_columns_copy = read_from_format_info.requested_columns;
Make the copy later, when you indeed need a copy.
It can be changed in the next block: constant columns are removed from the list.
I can rename it to non_constant_requested_columns.
columns.emplace_back(type->createColumn(), type, name);
builder.init(Pipe(std::make_shared<ConstChunkGenerator>(
    std::make_shared<const Block>(columns), *num_rows_from_cache, max_block_size)));
if (!constant_columns.empty())
Could you explain these lines of code?
Here I remove constant columns from the requested list so that only non-constant ones are read. Constant columns are restored later in StorageObjectStorageSource::generate() from metadata, without reading from the source file.
  builder.addSimpleTransform([&](const SharedHeader & header)
  {
-     return std::make_shared<ExtractColumnsTransform>(header, read_from_format_info.requested_columns);
+     return std::make_shared<ExtractColumnsTransform>(header, requested_columns_copy);
I am not sure it is indeed an optimization. Does this code really ensure that we don't read the constant columns from the file?
Here requested_columns_copy contains only non-constant columns; the constant columns were removed from this list above.
Found a major problem, so converting the PR to draft until it is resolved.

Feel free to reopen the PR and tag me in a DM when the major problem is resolved.
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):
Read optimization for Iceberg tables using Iceberg metadata
Documentation entry for user-facing changes
When an Iceberg table comes from an Iceberg catalog, the catalog returns additional metadata about the table, including the row count, the null count, and min/max values for columns in each file.
If some column in some file has zero nulls and its min value equals its max value, we can conclude that this column has the same value for every row in that specific file.
Then we can avoid reading this column from the file and create its values based on metadata.
This reduces the number of requests to remote storage like S3 or Azure, and as a result reduces query execution time.
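Under the rule stated in the description, the per-file constancy check boils down to something like the sketch below (hypothetical, simplified types; the real stats come from Iceberg manifest metadata):

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical per-file column statistics, as exposed by an Iceberg catalog.
struct FileColumnStats
{
    long value_count = 0;
    long null_count = 0;
    std::optional<std::string> min;
    std::optional<std::string> max;
};

// The rule from the description above: if a column in a file has zero nulls
// and its min value equals its max value, every row in that file carries the
// same value, so the column does not need to be read from remote storage.
bool isConstantInFile(const FileColumnStats & s)
{
    return s.null_count == 0 && s.min && s.max && *s.min == *s.max;
}
```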
This PR introduces the setting allow_experimental_iceberg_read_optimization to turn on this optimization.