ARROW-9827: [C++][Dataset] Skip parsing RowGroup metadata statistics when there is no filter#8037
Conversation
There was a problem hiding this comment.
@bkietz ideally we would also skip this if the flilter only involves the partition_expression and not any actual columns of the file. What is the best way to check this? (simplify the filter first with partition_expression ?)
There was a problem hiding this comment.
Simplifying the filter by the partition expression would work, but it's not safe without loading the physical schema since we need to validate the filter against that. Otherwise a filter without implicit casts (for example) would result in an assertion failure. Let's leave that optimization for a follow up; the right way to do this is probably to rewrite Expression::Assume to return a Result (allowing it to fail non-catastrophically even without a schema against which to validate), but that's a much larger project
bkietz
left a comment
There was a problem hiding this comment.
Looks good to me, please add a comment explaining the purpose of the new code inline
There was a problem hiding this comment.
Simplifying the filter by the partition expression would work, but it's not safe without loading the physical schema since we need to validate the filter against that. Otherwise a filter without implicit casts (for example) would result in an assertion failure. Let's leave that optimization for a follow up; the right way to do this is probably to rewrite Expression::Assume to return a Result (allowing it to fail non-catastrophically even without a schema against which to validate), but that's a much larger project
There was a problem hiding this comment.
| } else { | |
| } else { | |
| // since we are not scanning this fragment with a filter, don't bother loading statistics |
0fc54ca to
99d3aec
Compare
|
@bkietz with some delay, added the suggested comment |
bkietz
left a comment
There was a problem hiding this comment.
LGTM
https://issues.apache.org/jira/browse/ARROW-9945 for follow up to add the more general case
|
Thanks for opening the follow-up issue! |
No description provided.