Add parquet format check to metadata cache #99230
Merged
grantholly-clickhouse merged 3 commits into ClickHouse:master on Mar 11, 2026
…tching to a metadata-aware parquet reader
…the v3 parquet reader
LLVM Coverage Report
PR changed-lines coverage: 100.00% (7/7)
alesapin approved these changes on Mar 11, 2026
Merged via the queue into ClickHouse:master with commit 0249456 on Mar 11, 2026
320 of 321 checks passed
arthurpassos pushed a commit to Altinity/ClickHouse that referenced this pull request on Mar 25, 2026:
…rquet_format_check_to_metadata_cache Add parquet format check to metadata cache
zvonand added a commit to Altinity/ClickHouse that referenced this pull request on Mar 31, 2026:
…data_cache_261 Antalya 26.1 Backport of ClickHouse#98140, ClickHouse#99230, ClickHouse#99231 and ClickHouse#96545 - Parquet metadata cache (upstream impl) and arrow library version bump
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):
...
Documentation entry for user-facing changes
This format check ensures that the metadata-cache-enabled Parquet format creator is used only when both of the following settings are enabled:
1. use_parquet_metadata_cache=1
2. input_format_parquet_use_native_reader_v3=1

Otherwise, we should throw a logical error.
This change makes check number 2 stronger. Here is a comment from Cursor bot, which also detected that we could wrongly throw a logical error: #98140 (comment)
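
The gating described above can be sketched roughly as follows. This is a simplified, hypothetical model and not the actual FormatFactory code: the struct, enum, and function names are invented for illustration, and only the two setting names come from this PR. It assumes the factory selects the metadata-aware creator only when object metadata is present and the native v3 reader is enabled, and that the metadata-aware creator itself rejects any other combination of settings.

```cpp
#include <stdexcept>

// Hypothetical stand-in for the relevant format settings.
// Only the two field names are taken from the PR description.
struct ParquetReadSettings
{
    bool use_parquet_metadata_cache = false;
    bool input_format_parquet_use_native_reader_v3 = false;
};

// Hypothetical labels for the two kinds of input creators.
enum class InputCreatorKind
{
    MetadataAware, // random-access creator that can use cached Parquet metadata
    Regular,       // fallback creator that ignores the metadata cache
};

// Assumed selection rule: the metadata-aware creator is chosen only if
// object metadata is available AND the native v3 reader is enabled.
InputCreatorKind chooseInputCreator(const ParquetReadSettings & settings, bool has_object_metadata)
{
    if (has_object_metadata && settings.input_format_parquet_use_native_reader_v3)
        return InputCreatorKind::MetadataAware;
    return InputCreatorKind::Regular;
}

// Assumed invariant inside the metadata-aware creator: both settings must be on.
// std::logic_error stands in for a ClickHouse logical error here.
void checkMetadataAwareCreatorPreconditions(const ParquetReadSettings & settings)
{
    if (!settings.use_parquet_metadata_cache || !settings.input_format_parquet_use_native_reader_v3)
        throw std::logic_error("metadata-aware Parquet creator requires the metadata cache and the native v3 reader");
}
```

Under these assumptions, a read with the v3 reader disabled never reaches the metadata-aware creator, so the precondition check cannot fire for that case, which is the wrongly thrown logical error the stronger check is meant to avoid.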
Note

Medium Risk
Touches core FormatFactory input-format selection logic; incorrect gating could change which reader implementation is used for some Parquet/S3 reads and impact performance or caching behavior.

Overview
Tightens when FormatFactory uses the metadata-aware random-access input creator by additionally requiring input_format_parquet_use_native_reader_v3 to be enabled (alongside presence of object metadata).

Extends 03707_parquet_metadata_cache to assert that Parquet metadata cache hits occur with the native v3 reader, and that disabling v3 results in no cache hit; updates expected output accordingly.

Written by Cursor Bugbot for commit ab4c618.