Consolidate ParquetExec tests in parquet_exec integration test#4130
Consolidate ParquetExec tests in parquet_exec integration test#4130alamb merged 3 commits intoapache:masterfrom
ParquetExec tests in parquet_exec integration test#4130Conversation
| mod tests { | ||
| // See also `parquet_exec` integration test | ||
|
|
||
| use super::*; |
There was a problem hiding this comment.
@Ted-Jiang / @tustvold / @thinkharderdev what do you think about moving the remaining tests in this module into the integration test?
There was a problem hiding this comment.
I think some of them may make use of crate private functions, otherwise I have no strong feeling either way
There was a problem hiding this comment.
👍 yes I think the pruning ones in particular do so -- I would likely need to refactor somehow
There was a problem hiding this comment.
No objection from me. I think we can probably leave the pruning tests as unit tests here and just move the more "integration-y" tests?
There was a problem hiding this comment.
Yeah the only real reason to consolidate them is that my sense of order is somewhat violated that the three kinds of parquet predicate pushdown are not in the same pattern
There was a problem hiding this comment.
Maybe I will pursue the idea as a small follow on refactor
| // under the License. | ||
|
|
||
| /// Run all tests that are found in the `parquet` directory | ||
| mod parquet; |
There was a problem hiding this comment.
The idea is that this has become a single integration test that runs rather than three independent ones (that each need to be compiled / linked / run serially)
| // under the License. | ||
|
|
||
| /// Run all tests that are found in the `parquet` directory | ||
| mod parquet; |
| mod tests { | ||
| // See also `parquet_exec` integration test | ||
|
|
||
| use super::*; |
There was a problem hiding this comment.
No objection from me. I think we can probably leave the pruning tests as unit tests here and just move the more "integration-y" tests?
|
Benchmark runs are scheduled for baseline = 175adbd and contender = ebc279c. ebc279c is a master commit associated with this PR. Results will be available as each benchmark for each run completes. |
Which issue does this PR close?
re #3463
Rationale for this change
The parquet reader is both important and well tested in DataFusion. However, currently the tests are split in several places so it may not be all that clear what is covered.
Also, the parquet integration test currently takes 24 seconds so more parallelism would be better
What changes are included in this PR?
Changes:
Move most ParquetExec tests into the
parquet_execintegration test:So now we can run the parquet tests like:
cargo test --test parquet_execAre there any user-facing changes?
No