
Parquet metadata cache v2#98140

Merged
divanik merged 21 commits intoClickHouse:masterfrom
grantholly-clickhouse:parquet_metadata_cache_v2
Mar 9, 2026

Conversation

@grantholly-clickhouse
Contributor

@grantholly-clickhouse grantholly-clickhouse commented Feb 26, 2026

Changelog category (leave one):

  • New Feature

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Added a new SLRU cache for Parquet metadata to improve read performance by removing the need to re-download files just to read metadata.

Resolves #89102.

Documentation entry for user-facing changes

  • Documentation is written (mandatory for new features)

This pull request adds a new SLRU cache for Parquet file metadata. When a file is read for the first time, we capture its metadata, which includes the schema information. On subsequent reads, we can look the file up in the cache and access its metadata without re-downloading it. We use the filename and the etag from the cloud storage response as cache keys, so we can cache updated schemas for the same file.

The cache is enabled with use_parquet_metadata_cache=1 and can be dropped with SYSTEM CLEAR PARQUET METADATA CACHE. Dropping a table backed by a Parquet file will not drop the corresponding cache entry; all cache invalidation is handled by the SLRU policy in the parent class CacheBase.h. Even when the cache is enabled, it only works with the native v3 Parquet reader.
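As a rough illustration of the keying scheme described above (the names here are hypothetical stand-ins, not the actual ClickHouse types), the cache key pairs the object path with the storage etag, so a rewritten object naturally maps to a fresh entry:

```cpp
#include <string>

// Hypothetical sketch of an etag-keyed cache key: the same path with a new
// etag (i.e. the object was rewritten in cloud storage) produces a different
// key, so stale metadata is never returned and simply ages out of the SLRU.
struct MetadataCacheKey
{
    std::string file_path;
    std::string etag;

    bool operator==(const MetadataCacheKey & other) const
    {
        return file_path == other.file_path && etag == other.etag;
    }
};
```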


Note

Medium Risk
Introduces a new global cache and threads it through Parquet read paths (especially object storage), which can affect memory usage and correctness of metadata invalidation (etag-keyed) as well as format factory selection logic.

Overview
Adds a new global Parquet metadata SLRU cache (defaults + server settings) and wires it into Parquet native v3 reads so metadata can be reused instead of re-reading file footers, keyed by object path + etag.

Extends FormatFactory with metadata-aware random-access creators (getInputWithMetadata / registerRandomAccessInputFormatWithMetadata) and updates object-storage reads to pass object metadata when use_parquet_metadata_cache=1 (and v3 reader is enabled). Adds observability/ops hooks: new ProfileEvents/CurrentMetrics, new privilege + parser support for SYSTEM DROP PARQUET METADATA CACHE, and documentation/tests covering cache hit/miss and disabling in existing S3 Parquet tests.

Written by Cursor Bugbot for commit 9ae55b9.

@clickhouse-gh
Contributor

clickhouse-gh bot commented Feb 26, 2026

Workflow [PR], commit [9ae55b9]


@clickhouse-gh clickhouse-gh bot added the pr-feature Pull request with new product feature label Feb 26, 2026
@divanik divanik self-assigned this Mar 3, 2026
size_t max_parsing_threads = parser_shared_resources->getParsingThreadsPerReader();
bool parallel_parsing = max_parsing_threads > 1 && settings[Setting::input_format_parallel_parsing]
-    && creators.file_segmentation_engine_creator && !creators.random_access_input_creator && !need_only_count;
+    && creators.file_segmentation_engine_creator && !(creators.random_access_input_creator && creators.random_access_input_creator_with_metadata)
Member

Let's replace creators.random_access_input_creator_with_metadata with (creators.random_access_input_creator_with_metadata && metadata.has_value())

Contributor Author

I see, we could make this check more strict like this

    bool parallel_parsing = max_parsing_threads > 1 
        && settings[Setting::input_format_parallel_parsing]
        && creators.file_segmentation_engine_creator 
        && !(creators.random_access_input_creator 
            && creators.random_access_input_creator_with_metadata
            && metadata.has_value())
        && !need_only_count;

We are testing whether we should use parallel parsing. I'm not sure the presence or absence of object metadata makes that decision any clearer or more correct.

Comment on lines +1446 to +1453
LOG_TRACE(lambda_logger, "using arrow reader in ParquetBlockInputFormat without metadata cache");
return std::make_shared<ParquetBlockInputFormat>(
buf,
std::make_shared<const Block>(sample),
settings,
std::move(parser_shared_resources),
std::move(format_filter_info),
min_bytes_for_seek);
min_bytes_for_seek
Member

The problem is not resolved by removing the logical error. We still call the function that doesn't use object metadata in registerRandomAccessInputFormatWithMetadata. It is redundant; we have the same code below. We should still have the logical error if we are in this branch.

Member

The problem is presumably:

  • here you use settings.parquet.use_native_reader_v3
  • in the code above you use context_->getSettingsRef()[Setting::input_format_parquet_use_native_reader_v3]

Member

I will try to fix and reproduce

Contributor Author

Revert #89750 (parquet footer cache) which causes a LOGICAL_ERROR exception when reading Parquet from S3 with the default settings (use_native_reader_v3 = false). The metadata-aware format creator only handled the use_native_reader_v3 path and threw a LOGICAL_ERROR for the default Arrow reader path, triggered whenever S3 storage provides blob metadata

I thought Alexey reverted this because we need a usable format creator in the case that the parquet metadata cache is turned on, but we don't have object store metadata. That makes sense to me. If, for some reason, we don't get object store metadata in the response, we still need to read the parquet file, and thus, we need some kind of format creator for reading.

Contributor Author

@alexey-milovidov Is this correct?

Member

@divanik divanik Mar 6, 2026

I suppose Alexey reverted it just because we encountered the logical error. Why can we have the metadata cache without object_metadata? It seems redundant, because it is unusable without metadata.

Member

Yes, it was only due to a logical error.

@divanik
Member

divanik commented Mar 5, 2026

There is a really strange diff in SettingsChangesHistory; my bad that I overlooked it. I will try to fix it.

…check more restrictive, and setting previous value for use_parquet_metadata_cache to false

size_t ParquetMetadataCacheKeyHash::operator()(const ParquetMetadataCacheKey & key) const
{
    return std::hash<String>{}(key.file_path) ^ std::hash<String>{}(key.etag);
}
Member

While it's not too wrong right now, the code smells of bad practice.
See slide 11 here: https://presentations.clickhouse.com/2017-hash_tables/?full#11

Also, we shouldn't be using std::hash. Use the methods from Hash.h.
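To make the problem concrete: XOR-combining two hashes is symmetric, and equal fields cancel to zero. A boost-style combiner mixes the two values asymmetrically instead. This sketch uses std::hash purely for illustration; per the review, the actual fix should use the utilities from Hash.h.

```cpp
#include <cstddef>
#include <functional>
#include <string>

// Illustrative sketch only. With the XOR scheme, hash(path) ^ hash(etag)
// equals hash(etag) ^ hash(path), and if path == etag the result is always 0.
// A boost-style combiner avoids both problems by mixing asymmetrically.
inline std::size_t hashCombine(std::size_t seed, std::size_t value)
{
    return seed ^ (value + 0x9e3779b97f4a7c15ULL + (seed << 6) + (seed >> 2));
}

inline std::size_t hashKey(const std::string & file_path, const std::string & etag)
{
    std::size_t seed = std::hash<std::string>{}(file_path);
    return hashCombine(seed, std::hash<std::string>{}(etag));
}
```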

Contributor Author

…and improving hash calculation of cache keys
ProfileEvents::increment(ProfileEvents::ParquetMetadataCacheHits);
}
return result.first->metadata;
}

Cache returns large metadata struct by value

Medium Severity

getOrSetMetadata returns parquet::format::FileMetaData by value, causing a full deep copy of the thrift metadata struct on every cache hit. For files with many row groups and columns, this metadata can be substantial. The whole point of the cache is to avoid re-downloading, but the copy overhead on each access partially defeats the purpose.
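One possible direction (a sketch with stand-in types, not the actual ClickHouse interfaces): store the metadata behind a shared_ptr<const T>, so a cache hit shares one immutable copy instead of deep-copying the struct.

```cpp
#include <functional>
#include <memory>
#include <string>
#include <unordered_map>

// Stand-in for parquet::format::FileMetaData; the real struct is much larger.
struct FileMetaData { std::string schema; };

// Hypothetical sketch: entries are shared_ptr<const FileMetaData>, so a hit
// returns a reference-counted pointer to a single immutable copy rather than
// a deep copy of the thrift struct. (A real implementation would be an SLRU
// cache with proper locking, not a plain map.)
class MetadataCacheSketch
{
public:
    using Entry = std::shared_ptr<const FileMetaData>;

    Entry getOrSet(const std::string & key, const std::function<FileMetaData()> & load)
    {
        auto it = entries.find(key);
        if (it != entries.end())
            return it->second;  // hit: share the existing copy
        auto entry = std::make_shared<const FileMetaData>(load());
        entries.emplace(key, entry);
        return entry;
    }

private:
    std::unordered_map<std::string, Entry> entries;
};
```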


Contributor Author

I guess this is true, but it seems like a pretty large refactor. Maybe we could attempt this in another PR? The part that gives me pause is here https://github.com/ClickHouse/ClickHouse/pull/98140/changes/BASE..b8ba83a5537d6c235d33919156cee5868f93fba6#diff-0b5a953b45a537794f28b2c4c4822e1f884b496c68fc5d05697d512958dd2791R101 in the native reader's initializeIfNeeded:

        {
            std::lock_guard lock(reader_mutex);
            reader.emplace();
            reader->reader.prefetcher.init(in, read_options, parser_shared_resources);
            reader->reader.file_metadata = getFileMetadata(reader->reader.prefetcher);
            reader->reader.init(read_options, getPort().getHeader(), format_filter_info);
            reader->init(parser_shared_resources, buckets_to_read ? std::optional(buckets_to_read->row_group_ids) : std::nullopt);
        }

We are assigning the file_metadata from our cache lookup. I think we would need to refactor the reader to deal with a pointer as well.

@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

min_bytes_for_seek);
throw Exception(
ErrorCodes::LOGICAL_ERROR,
"Implementation of ParquetBlockInputFormat using arrow reader didn't require blob metadata for initialization");

Metadata-aware creator throws for non-v3 Parquet reader

Medium Severity

The random_access_input_creator_with_metadata lambda for Parquet throws LOGICAL_ERROR when use_native_reader_v3 is false. In getInputImpl, this creator is unconditionally preferred whenever object_with_metadata has a value, with no check on the reader version. The current single call site in StorageObjectStorageSource guards against this, but the public getInputWithMetadata API has no such protection, making any future caller that passes metadata for non-v3 Parquet reads crash.
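A sketch of the kind of guard the report implies (all names here are stand-ins, not the actual FormatFactory code): prefer the metadata-aware creator only when metadata is present AND the v3 reader is enabled, and fall back to the plain creator otherwise instead of reaching a throwing branch.

```cpp
#include <functional>
#include <optional>
#include <string>

// Stand-in for the format creator; the real one returns an input format.
using Creator = std::function<std::string()>;

// Hypothetical guard: only prefer the metadata-aware creator when blob
// metadata is actually available AND the native v3 reader is in use;
// every other combination falls back to the plain creator rather than
// hitting a LOGICAL_ERROR for the default (arrow) reader path.
std::string createInput(
    const std::optional<std::string> & object_metadata,
    bool use_native_reader_v3,
    const Creator & plain_creator,
    const Creator & creator_with_metadata)
{
    if (object_metadata.has_value() && use_native_reader_v3 && creator_with_metadata)
        return creator_with_metadata();
    return plain_creator();
}
```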

Additional Locations (1)


@divanik divanik added this pull request to the merge queue Mar 9, 2026
Merged via the queue into ClickHouse:master with commit aa00dcb Mar 9, 2026
294 of 295 checks passed
@robot-clickhouse-ci-1 robot-clickhouse-ci-1 added the pr-synced-to-cloud The PR is synced to the cloud repo label Mar 9, 2026
if (settings.parquet.use_native_reader_v3)
{
LOG_TRACE(lambda_logger, "using native reader v3 in ParquetBlockInputFormat with metadata cache");
ParquetMetadataCachePtr metadata_cache = CurrentThread::getQueryContext()->getParquetMetadataCache();

arthurpassos pushed a commit to Altinity/ClickHouse that referenced this pull request Mar 24, 2026
…t_metadata_cache_v2

Parquet metadata cache v2
zvonand added a commit to Altinity/ClickHouse that referenced this pull request Mar 31, 2026
…data_cache_261

Antalya 26.1 Backport of ClickHouse#98140, ClickHouse#99230, ClickHouse#99231 and ClickHouse#96545 - Parquet metadata cache (upstream impl) and arrow library version bump

Labels

pr-feature Pull request with new product feature
pr-synced-to-cloud The PR is synced to the cloud repo


Development

Successfully merging this pull request may close these issues.

Cache Parquet footer in LRU Cache

6 participants