ARROW-10100: [C++][Python][Dataset] Add ParquetFileFragment::Subset method #8301
@bkietz this is what I had in mind for https://issues.apache.org/jira/browse/ARROW-10100. Thoughts? (This is only a POC; if we add it, it will need proper tests, etc.)
Here I would still need to check that all row_group_ids exist in the fragment.
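For illustration, the missing check could look roughly like the sketch below. It is written in plain Python rather than against the Arrow code base; `validate_row_group_ids` and its parameters are hypothetical names, and raising `IndexError` is an assumption, not what the PR does.

```python
from typing import Iterable, List


def validate_row_group_ids(requested: Iterable[int],
                           available: Iterable[int]) -> List[int]:
    """Return the requested ids unchanged, or raise if any is unknown.

    Hypothetical helper: `available` stands in for the row group ids
    that the fragment actually contains.
    """
    requested = list(requested)
    # Collect every requested id that the fragment does not contain.
    missing = sorted(set(requested) - set(available))
    if missing:
        raise IndexError(
            f"row groups {missing} are not present in this fragment")
    return requested
```

Reporting all missing ids at once, rather than failing on the first one, should make a bad call easier to diagnose.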
288a329 to a4faba0
@bkietz I now think it is actually fine to return a fragment with 0 row groups (it can then be the user's responsibility to check for this, if they want to filter the empty fragments out of their list of fragments). The problem is that an empty row group vector currently already means "all row groups" ...
bkietz left a comment
Being more explicit about "lack of subselection" vs "empty subselection" sounds fine to me. I'll push a patch
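As a rough sketch of that distinction (plain Python, not the actual patch): an absent selection can be modelled as `None`, while an empty list is taken literally. `select_row_groups` and its parameters are hypothetical names, and the real change may represent the two states differently.

```python
from typing import List, Optional, Sequence


def select_row_groups(all_ids: Sequence[int],
                      selection: Optional[Sequence[int]]) -> List[int]:
    """Resolve a row group selection against the ids in a fragment."""
    # None means "no subselection was made": keep every row group.
    if selection is None:
        return list(all_ids)
    # Any sequence, including an empty one, is an explicit subselection.
    wanted = set(selection)
    return [i for i in all_ids if i in wanted]


print(select_row_groups([0, 1, 2], None))  # lack of subselection
print(select_row_groups([0, 1, 2], []))    # empty subselection
```

With this split, a fragment that ends up with 0 row groups is no longer confused with one that was never subset, and callers who want to drop empty fragments can test for an empty result.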
@bkietz Thanks for the update! I further expanded the tests somewhat for the empty case.
kszucs left a comment
LGTM, the build error is unrelated.