Extends parquet fuzz tests to also test nulls, dictionaries and row groups with multiple pages (#1053) #1110
parquet/src/arrow/arrow_reader.rs
This didn't seem to serve a purpose, as it was always set in such a way as to read all the data, so I removed it
I agree that it is redundant when record_batch_size is provided (which means the data is not all read in one big chunk, but is read in record_batch_size chunks)
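To illustrate the chunking described above, here is a minimal std-only sketch. `batch_lengths` is a hypothetical helper, not part of the parquet crate; it only models how many rows each batch would hold when a reader yields at most `record_batch_size` rows at a time.

```rust
/// Hypothetical helper (not a parquet crate API): returns the number of rows
/// in each batch when `total_rows` rows are read at most `record_batch_size`
/// rows at a time.
fn batch_lengths(total_rows: usize, record_batch_size: usize) -> Vec<usize> {
    assert!(record_batch_size > 0, "batch size must be non-zero");
    let mut lengths = Vec::new();
    let mut remaining = total_rows;
    while remaining > 0 {
        // Each batch holds a full `record_batch_size` rows, except possibly the last.
        let n = remaining.min(record_batch_size);
        lengths.push(n);
        remaining -= n;
    }
    lengths
}
```

Under this model, every row is eventually read regardless of the batch size, which is consistent with a separate "how much to read" setting being redundant.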
Codecov Report
@@ Coverage Diff @@
## master #1110 +/- ##
==========================================
+ Coverage 82.55% 82.56% +0.01%
==========================================
Files 169 169
Lines 50456 50535 +79
==========================================
+ Hits 41655 41726 +71
- Misses 8801 8809 +8
Thanks to @yordan-pavlov's work on #1130, this now passes on master 🎉
Which issue does this PR close?
Closes #1053.
Rationale for this change
See ticket
What changes are included in this PR?
This extends the parquet fuzz tests to also test nulls, dictionaries and row groups with multiple pages.
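As a rough illustration of the kind of input such a fuzz test exercises, the following std-only sketch generates a column with randomly placed nulls and splits it into several pages. It is not the test code from this PR: `XorShift64`, `random_optional_values` and `split_into_pages` are hypothetical names, and the PRNG is hand-rolled only so that the example needs no external crates.

```rust
/// Minimal xorshift64 PRNG so the sketch needs no external crates.
struct XorShift64(u64);

impl XorShift64 {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Generate `len` values; each is null when a draw in 0..100 falls below `null_percent`.
fn random_optional_values(seed: u64, len: usize, null_percent: u64) -> Vec<Option<i32>> {
    let mut rng = XorShift64(seed.max(1)); // xorshift state must be non-zero
    (0..len)
        .map(|_| {
            if rng.next_u64() % 100 < null_percent {
                None
            } else {
                Some((rng.next_u64() % 1000) as i32)
            }
        })
        .collect()
}

/// Split a column into pages of at most `page_size` values.
/// Panics if `page_size` is zero, because `chunks` requires a non-zero size.
fn split_into_pages<T: Clone>(values: &[T], page_size: usize) -> Vec<Vec<T>> {
    values.chunks(page_size).map(|c| c.to_vec()).collect()
}
```

A round-trip check in this style would write the pages, read them back, and compare the result with the original values, nulls included. Using a fixed seed keeps any failure reproducible.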
Currently this runs into what appears to be a bug in the null handling for ArrowArrayReader. This is likely the same as in apache/datafusion#1441 - I have temporarily switched back to ComplexObjectArrayReader to get the test to pass, and will look into a fix prior to marking this ready for review.
This has been fixed by #1130.
Are there any user-facing changes?
No, this only adds tests.