Add parquet format check to metadata cache #99230

Merged
grantholly-clickhouse merged 3 commits into ClickHouse:master from grantholly-clickhouse:add_parquet_format_check_to_metadata_cache
Mar 11, 2026
grantholly-clickhouse commented Mar 10, 2026

Changelog category (leave one):

  • Not for changelog (changelog entry is not required)

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

...

Documentation entry for user-facing changes

  • Documentation is written (mandatory for new features)

This added format check ensures that we use the Parquet metadata cache-enabled format creator only if all of the following are true:

  1. The metadata cache is enabled: use_parquet_metadata_cache=1
  2. The native v3 Parquet reader is in use: input_format_parquet_use_native_reader_v3=1
  3. The file format is Parquet
  4. We have an etag from the object store response

Otherwise, we should throw a logical error.

This change makes check number 2 stronger. A comment from Cursor bot also pointed out that we could wrongly throw a logical error: #98140 (comment)
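
The four conditions above can be sketched as a single predicate. This is a minimal, self-contained illustration and not the actual FormatFactory code: `ReadSettings`, `ObjectInfo`, `canUseParquetMetadataCache`, and `exampleUsage` are hypothetical names invented for this sketch, and treating an empty etag as "no etag" is an assumption. Where the logical error is raised is not modeled here, since the description does not say.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical, simplified stand-ins for the two settings named in the
// description. These are not ClickHouse types.
struct ReadSettings
{
    bool use_parquet_metadata_cache = false;
    bool input_format_parquet_use_native_reader_v3 = false;
};

// Hypothetical view of what is known about the object being read.
struct ObjectInfo
{
    std::string format;               // e.g. "Parquet"
    std::optional<std::string> etag;  // unset when the object store returned none
};

// True only when all four conditions from the description hold.
// Assumption: an empty etag string counts as a missing etag.
inline bool canUseParquetMetadataCache(const ReadSettings & settings, const ObjectInfo & object)
{
    return settings.use_parquet_metadata_cache
        && settings.input_format_parquet_use_native_reader_v3
        && object.format == "Parquet"
        && object.etag.has_value()
        && !object.etag->empty();
}

// Usage example: disabling the v3 reader must rule out the cache path,
// which is the condition this PR tightens.
inline void exampleUsage()
{
    ReadSettings settings{true, true};
    ObjectInfo object{"Parquet", std::string("abc123")};
    assert(canUseParquetMetadataCache(settings, object));

    settings.input_format_parquet_use_native_reader_v3 = false;
    assert(!canUseParquetMetadataCache(settings, object));
}
```

Keeping the gate in one predicate makes it easier to see that the cache-enabled creator is never chosen for a reader that cannot use it, which is the kind of mismatch that could otherwise surface as a spurious logical error.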


Note

Medium Risk
Touches core FormatFactory input-format selection logic; incorrect gating could change which reader implementation is used for some Parquet/S3 reads and impact performance or caching behavior.

Overview
Tightens when FormatFactory uses the metadata-aware random-access input creator by additionally requiring input_format_parquet_use_native_reader_v3 to be enabled (alongside presence of object metadata).

Extends 03707_parquet_metadata_cache to assert that Parquet metadata cache hits occur with the native v3 reader, and that disabling v3 results in no cache hit; updates expected output accordingly.

Written by Cursor Bugbot for commit ab4c618.

clickhouse-gh bot commented Mar 10, 2026

Workflow [PR], commit [ab4c618]

Summary:

clickhouse-gh bot added the pr-not-for-changelog label (This PR should not be mentioned in the changelog) Mar 10, 2026
clickhouse-gh bot commented Mar 11, 2026

LLVM Coverage Report

Metric      Baseline   Current   Δ
Lines       83.80%     83.80%    +0.00%
Functions   23.80%     23.80%    +0.00%
Branches    76.30%     76.30%    +0.00%

PR changed-lines coverage: 100.00% (7/7)

alesapin self-assigned this Mar 11, 2026
grantholly-clickhouse added this pull request to the merge queue Mar 11, 2026
Merged via the queue into ClickHouse:master with commit 0249456 Mar 11, 2026
320 of 321 checks passed
grantholly-clickhouse deleted the add_parquet_format_check_to_metadata_cache branch March 11, 2026 23:44
robot-clickhouse-ci-2 added the pr-synced-to-cloud label (The PR is synced to the cloud repo) Mar 11, 2026
arthurpassos pushed a commit to Altinity/ClickHouse that referenced this pull request Mar 25, 2026
…rquet_format_check_to_metadata_cache

Add parquet format check to metadata cache
zvonand added a commit to Altinity/ClickHouse that referenced this pull request Mar 31, 2026
…data_cache_261

Antalya 26.1 Backport of ClickHouse#98140, ClickHouse#99230, ClickHouse#99231 and ClickHouse#96545 - Parquet metadata cache (upstream impl) and arrow library version bump
Labels

pr-not-for-changelog: This PR should not be mentioned in the changelog
pr-synced-to-cloud: The PR is synced to the cloud repo

3 participants