[GLUTEN-8623][CH] Support File meta and row index for parquet#8624
Merged
baibaichen merged 14 commits into apache:main on Feb 5, 2025
What changes were proposed in this pull request?
(Fixes: #8623)
This PR adds file meta support for all supported formats and row index support for parquet.
Supporting File Meta
To support File Meta, FileReaderWrapper is renamed to BaseReader, and a member function DB::Columns addVirtualColumn(DB::Chunk dataChunk, size_t rowNum = 0) const is added; it is called in XXFileReader::pull. NormalFileReader::pull is responsible for reading real data from the file, and ConstColumnsFileReader::pull is responsible for generating n rows of meta data when there is no need to read real data. After data is read from the file, the file meta columns are added.
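The reader flow described above can be sketched roughly as follows. This is a simplified, self-contained model and not the code in this PR: Column, Chunk, and FileMeta are hypothetical stand-ins for the ClickHouse types, the metadata fields (path and size) are assumptions, and the reading of rowNum as "rows to generate when there are no data columns" is also an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for DB::Columns / DB::Chunk; every value is a string here.
using Column = std::vector<std::string>;
struct Chunk
{
    std::vector<Column> columns;
    size_t rows = 0;
};

// Per-file metadata exposed as constant columns (field names are assumptions).
struct FileMeta
{
    std::string path;
    size_t size;
};

class BaseReader
{
public:
    explicit BaseReader(FileMeta meta_) : meta(std::move(meta_)) {}
    virtual ~BaseReader() = default;

    // Returns the next chunk; rows == 0 means the reader is exhausted.
    virtual Chunk pull() = 0;

protected:
    // Appends one constant column per metadata field after the data columns.
    // Assumption: rowNum is only used when the chunk carries no data columns.
    std::vector<Column> addVirtualColumn(Chunk dataChunk, size_t rowNum = 0) const
    {
        const size_t rows = dataChunk.columns.empty() ? rowNum : dataChunk.rows;
        for (const Column & col : dataChunk.columns)
            assert(col.size() == rows);
        std::vector<Column> result = std::move(dataChunk.columns);
        result.emplace_back(rows, meta.path);
        result.emplace_back(rows, std::to_string(meta.size));
        return result;
    }

    FileMeta meta;
};

// Reads real data first, then appends the file meta columns.
class NormalFileReader : public BaseReader
{
public:
    NormalFileReader(FileMeta meta_, Column data_)
        : BaseReader(std::move(meta_)), data(std::move(data_)) {}

    Chunk pull() override
    {
        if (consumed)
            return Chunk{};
        consumed = true;
        Chunk chunk;
        chunk.rows = data.size();
        chunk.columns.push_back(data); // the in-memory "file" is one column
        Chunk out;
        out.rows = chunk.rows;
        out.columns = addVirtualColumn(std::move(chunk));
        return out;
    }

private:
    Column data;
    bool consumed = false;
};

// Generates n rows of metadata without touching any file data.
class ConstColumnsFileReader : public BaseReader
{
public:
    ConstColumnsFileReader(FileMeta meta_, size_t rows_)
        : BaseReader(std::move(meta_)), remaining(rows_) {}

    Chunk pull() override
    {
        Chunk out;
        out.rows = remaining;
        out.columns = addVirtualColumn(Chunk{}, remaining);
        remaining = 0;
        return out;
    }

private:
    size_t remaining;
};
```

In this sketch both readers share the single addVirtualColumn helper, so the metadata columns always come after the data columns and are produced the same way whether or not real data was read.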
Supporting Row index for parquet
To support row index for parquet, I refactored FormatFile::InputFormat and created a new child class, ParquetInputFormat. In ParquetInputFormat::generate, we do the same as in XXFileReader::pull: read real data from the parquet file first, and then add the row index.
How was this patch tested?
Spark 3.5 (spark35) tests are added.
See https://opencicd.kyligence.com/blue/rest/organizations/jenkins/pipelines/gluten/pipelines/gluten-ci/runs/14511/nodes/151/steps/205/log/?start=0