alamb/parquet_cmp

Parquet Implementation Compare Tool

This program tests proposed changes to the parquet crate against an existing corpus of parquet files.

It reads each file into arrow RecordBatches using two different parquet implementations and checks that the two results are equal.

It was initially created to verify apache/arrow-rs#1284.
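The per-file comparison can be sketched in plain Rust. This is a minimal, hypothetical sketch, not the tool's actual code. It assumes each parquet implementation can be modeled as a function that returns either decoded data (standing in for the RecordBatches) or an error string. `Outcome`, `compare_file` and the toy readers are illustrative names only. The three outcomes mirror the messages in the example output: both readers failing means the file is skipped, equal data means success, and anything else is a mismatch.

```rust
/// Result of comparing one file across two reader implementations.
/// (Hypothetical type for illustration; not part of the parquet crate.)
#[derive(Debug, PartialEq)]
enum Outcome {
    /// Both readers decoded the file and produced identical data.
    Equal,
    /// Both readers failed on the file, so it is skipped.
    BothFailed,
    /// The readers disagree: different data, or only one of them failed.
    Mismatch(String),
}

/// Read `file` with two readers and compare the results.
/// `T` stands in for the decoded data (e.g. the file's RecordBatches).
fn compare_file<T, A, B>(file: &str, reader_a: A, reader_b: B) -> Outcome
where
    T: PartialEq + std::fmt::Debug,
    A: Fn(&str) -> Result<T, String>,
    B: Fn(&str) -> Result<T, String>,
{
    match (reader_a(file), reader_b(file)) {
        (Ok(a), Ok(b)) if a == b => Outcome::Equal,
        (Ok(a), Ok(b)) => Outcome::Mismatch(format!("data differs: {:?} vs {:?}", a, b)),
        // Assumption: any failure by both readers counts as a skip.
        (Err(_), Err(_)) => Outcome::BothFailed,
        (Ok(_), Err(e)) => Outcome::Mismatch(format!("only reader B failed: {}", e)),
        (Err(e), Ok(_)) => Outcome::Mismatch(format!("only reader A failed: {}", e)),
    }
}

// Toy readers (hypothetical): the "data" is just a vector of row values.
fn reader_ok(_file: &str) -> Result<Vec<i64>, String> {
    Ok(vec![1, 2, 3])
}
fn reader_other(_file: &str) -> Result<Vec<i64>, String> {
    Ok(vec![1, 2, 4])
}
fn reader_err(_file: &str) -> Result<Vec<i64>, String> {
    Err("unsupported encoding".to_string())
}

fn main() {
    println!("{:?}", compare_file("a.parquet", reader_ok, reader_ok)); // Equal
    println!("{:?}", compare_file("b.parquet", reader_err, reader_err)); // BothFailed
    println!("{:?}", compare_file("c.parquet", reader_ok, reader_other)); // Mismatch(..)
}
```

Keeping the readers generic means the same harness works whether the two implementations are two versions of the parquet crate or two entirely different code paths. The example output's wording ("had same problem") suggests the real tool may also compare the two errors, which this sketch does not do.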

Usage:

./parquet_cmp --path <directory_with_parquet_files>
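Finding the files to compare can be sketched with the Rust standard library alone. This is a hypothetical sketch, not the tool's code: `parquet_files` is an illustrative name, it only looks at files directly inside the directory (no recursion), and it treats any `.parquet` extension (case-insensitive) as a parquet file. The argument handling accepts `--path <dir>` as in the example run, or a bare directory.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Collect the `.parquet` files directly inside `dir`, sorted by path so
/// repeated runs visit files in the same order. Subdirectories are ignored.
fn parquet_files(dir: &Path) -> io::Result<Vec<PathBuf>> {
    let mut files = Vec::new();
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        let is_parquet = path
            .extension()
            .map_or(false, |ext| ext.eq_ignore_ascii_case("parquet"));
        if path.is_file() && is_parquet {
            files.push(path);
        }
    }
    files.sort();
    Ok(files)
}

fn main() -> io::Result<()> {
    let args: Vec<String> = std::env::args().collect();
    // Accept either `--path <dir>` (as in the example run) or a bare directory.
    let dir = match args.get(1).map(String::as_str) {
        Some("--path") => args.get(2).cloned(),
        Some(other) => Some(other.to_string()),
        None => None,
    }
    .unwrap_or_else(|| ".".to_string());

    for file in parquet_files(Path::new(&dir))? {
        println!("would compare {}", file.display());
    }
    Ok(())
}
```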

Example output:

$ cargo run --release -- --path ~/Documents/prod_dbs/
...
Both readers had same problem reading "010f0bd7-080f-4bbd-bcbf-a5c1048ef93a.parquet"; skipping file.
...
file "6f0333f2-c50c-463d-a09a-01396bb73504.parquet" compared successfully
...
107 files read with different readers compared successfully
