Conversation
…ress the v3 reader
…and factoring out extractFilePathAndETagFromReadBuffer
…ng separate options for v3 caches
| if (metadata_cache && metadata.has_value()) | ||
| { | ||
| String file_name = metadata->getFileName(); | ||
| String file_name = metadata->getPath(); |
There was a problem hiding this comment.
What is the important difference here?
There was a problem hiding this comment.
Full path on a blob_storage opposed to a last file part (after the last slash) in a file path. We can have potential problems with files which have one file name but lay in different folders
…se/ClickHouse into parquet_footer_cache
| std::lock_guard lock(reader_mutex); | ||
| reader.emplace(); | ||
| reader->reader.prefetcher.init(in, read_options, parser_shared_resources); | ||
| reader->reader.file_metadata = getFileMetadata(reader->reader.prefetcher); |
There was a problem hiding this comment.
I wonder if this line of code was added (I refactored the block of code which was done before, so I don't know).
There was a problem hiding this comment.
That line is correct. This is where we need to look up the file metadata.
|
The PR seems ok to merge, but there is some strange stuff with CI, creating new commit and restarting CI not to spend time on dealing with CI infrastructure |
b4886cc to
d4179cb
Compare
|
Sorry, I merged dirty master to the PR, already fixed the problem |
divanik
left a comment
There was a problem hiding this comment.
Small question if the change is related
5243ef7
Revert "Merge pull request #89750 - parquet_footer_cache"
…se/parquet_footer_cache" This reverts commit 223e53c.
|
The problem is clear: firstly we took info about use_parquet_v3_reader from the context, after that we took it from the format_settings. For some reason we have different values in two different sources (that's obviously mess, but out of the scope of this PR). @grantholly-clickhouse, can you try to reproduce the problem, adding a test, fixing and opening the new PR (I suggest not dealing with rerevert, it can be dangerous) |
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):
Added a new SLRU cache for Parquet metadata to improve reading performance by removing the need to re-download files just to read metadata. The cache can be dropped with
SYSTEM DROP PARQUET METADATA CACHE. Resolves #89102.Documentation entry for user-facing changes
Details
This adds a new SLRU cache for parquet metadata (#89102). This aims to improve parquet reading performance by removing the need to re-download a file just to read the metadata. I tried to closely follow the pattern of the Iceberg file metadata cache as was suggested in the related issue.
All the configuration for both caches falls under
parquet_metadata_cache_*.