Enable string-based column projections from Parquet files #6871
Merged
alamb merged 2 commits into apache:main on Dec 18, 2024
alamb approved these changes on Dec 16, 2024
message test_schema {
    OPTIONAL INT32 a;
    OPTIONAL INT32 b;
    OPTIONAL INT32 a;
Contributor
This seems a nasty thing to do (repeating the name of a field in the Parquet file), but it seems to be allowed and your code handles it.
Contributor (Author)
Yeah, I'm not a fan of this behavior, but I think some query engines (Spark, perhaps) will produce duplicate names when joining tables. Necessary evil, I guess.
Contributor
🚀 -- thanks again @etseidl
CurtHagenlocher pushed a commit to CurtHagenlocher/arrow-rs that referenced this pull request on Dec 28, 2024
* add function to create ProjectionMask from column names
* add some more tests
Which issue does this PR close?
Closes #182.
It's an old issue, so perhaps this change is not wanted, in which case this can be closed.
Rationale for this change
Allows projecting columns by name rather than index.
What changes are included in this PR?
Adds a new method, ProjectionMask::columns, which takes a list of column names and returns a ProjectionMask.
Are there any user-facing changes?
New API call.
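To illustrate the idea of projecting by name rather than by index, here is a minimal, std-only sketch. It is not the code from this PR or from the parquet crate: `mask_from_names` and `selected_indices` are hypothetical helpers, and the sketch assumes flat leaf names matched exactly. It shows how a list of requested names can be turned into a per-leaf boolean mask, and how a repeated field name (as in the test schema discussed in the review) selects every leaf that carries that name.

```rust
/// Hypothetical helper (not the parquet crate's implementation): build a
/// boolean mask over the leaf columns, marking every leaf whose name appears
/// in `names`. Duplicate leaf names are all marked.
fn mask_from_names(leaf_names: &[&str], names: &[&str]) -> Vec<bool> {
    leaf_names
        .iter()
        .map(|leaf| names.contains(leaf))
        .collect()
}

/// Hypothetical helper: the indices of the selected leaves, in schema order.
fn selected_indices(mask: &[bool]) -> Vec<usize> {
    mask.iter()
        .enumerate()
        .filter_map(|(idx, &keep)| if keep { Some(idx) } else { None })
        .collect()
}
```

With the three leaves shown in the test schema (a, b, a), requesting "a" yields the mask [true, false, true], that is, leaf indices 0 and 2. A name that matches no leaf simply leaves the mask all false in this sketch; how the real method treats unknown names is not stated in this conversation.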