[feat] Support using offset index in ParquetRecordBatchStream when pu…#3616
[feat] Support using offset index in ParquetRecordBatchStream when pu…#3616tustvold merged 2 commits intoapache:masterfrom
Conversation
Codecov Report
@@ Coverage Diff @@
## master #3616 +/- ##
==========================================
- Coverage 86.03% 86.02% -0.01%
==========================================
Files 300 300
Lines 56253 56456 +203
==========================================
+ Hits 48395 48564 +169
- Misses 7858 7892 +34
📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
|
Thanks @Ted-Jiang. I think it may be better to just pass the option to |
Thanks for your advice! @thinkharderdev , IMOP, if we want to read
the reason why i put the index in If i am wrong or misunderstand plz correct me 😊? |
Correct, the advantage of what |
@thinkharderdev Thanks! If am right, we should make the read index api into I prefer keep read page_index in because it can reduce the code change , |
I think we may be talking about different things :). I'm saying the code to fetch the indexes already exists in arrow-rs so we don't need to duplicate the code in datafusion. You can just construct the |
oh! i miss this part 😂 , using the order version arrow-rs |
…shing down RowFilter. Signed-off-by: yangjiang <[email protected]>
232429f to
c849a30
Compare
|
@thinkharderdev Sorry for the mistake, i think its ready. |
|
This is modifying submodules, is this intentional? |
update the submodules for using the new added test parquet files |
tustvold
left a comment
There was a problem hiding this comment.
Looks good to me, only a minor suggestion.
Thank you for working on this, do you intend to work on hooking up the column index as well?
Co-authored-by: Raphael Taylor-Davies <[email protected]>
yes, i plan to do using |
|
Benchmark runs are scheduled for baseline = 451e441 and contender = 87faf86. 87faf86 is a master commit associated with this PR. Results will be available as each benchmark for each run completes. |
alamb
left a comment
There was a problem hiding this comment.
This is great -- thank you @Ted-Jiang and @tustvold
…shing down RowFilter.
Signed-off-by: yangjiang [email protected]
Which issue does this PR close?
Closes #3456
Rationale for this change
enable read
page indexinParquetScanOptionsIf true, the reader will read pageIndex, If exit:
RowSelectorbefore read the file (like row-group pruning avoid I/O)RowSelector.What changes are included in this PR?
update submodule
testingandparquet-testingAre there any user-facing changes?