[Go][Parquet] Trouble using the C++ reader to read a Parquet file written with the Go writer #38503
Description
Describe the bug, including details regarding any error messages, version, and platform.
Version: 7ef517e31ec3
OS: macOS 13 arm64
I'm uncertain if this is user error, an issue with the Go packages, or an issue with the C++ reader. I've put together a test that demonstrates the issue here: https://github.com/tschaub/parquet-issue-38503
I'm trying to use the pqarrow package to read an input Parquet file, transform some of the data, and write an output Parquet file. In the linked test case, there is no transformation step. So the test uses a pqarrow.FileReader, gets a pqarrow.RowGroupReader for each row group, reads each column as an arrow.Chunked, and uses a pqarrow.ArrowColumnWriter to write out the same.
When I try to use the C++ parquet-reader to read in the output file, I see the following error:
# parquet-reader output.parquet > /dev/null
Parquet error: Malformed levels. min: 2 max: 2 out of range. Max Level: 1

This same test passes for other Parquet files. I originally encountered the problem with one of the Overture Maps Parquet files, and the linked test case is based on a subset of that data using only two columns and a single row.
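For context on the message above: Parquet stores a definition level and a repetition level with each value, and every level for a column is expected to fall between 0 and that column's maximum level. The error text suggests the reader found a level of 2 in a column whose maximum level is 1. The sketch below illustrates that kind of range check using only the Go standard library. `checkLevels` is a hypothetical helper written for this illustration; it is not part of the Arrow Go packages and is not the C++ reader's actual code.

```go
package main

import "fmt"

// checkLevels is a hypothetical helper (not an Arrow or Parquet API).
// It reports an error when any level falls outside [0, maxLevel],
// which is the kind of condition the "Malformed levels" message describes.
func checkLevels(levels []int16, maxLevel int16) error {
	if len(levels) == 0 {
		return nil
	}
	// Track the smallest and largest level seen.
	lo, hi := levels[0], levels[0]
	for _, l := range levels[1:] {
		if l < lo {
			lo = l
		}
		if l > hi {
			hi = l
		}
	}
	if lo < 0 || hi > maxLevel {
		return fmt.Errorf("malformed levels: min %d, max %d out of range, max level %d", lo, hi, maxLevel)
	}
	return nil
}

func main() {
	// Levels within range for a column whose max level is 1.
	fmt.Println(checkLevels([]int16{0, 1, 1}, 1))
	// A single level of 2 against a max level of 1, as in the reported error.
	fmt.Println(checkLevels([]int16{2}, 1))
}
```

If this reading is right, the output file may contain levels written against a different column than the one the reader associates them with, but that is an assumption and not something the test case confirms.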
Summarizing
| file | C++ reader | Go reader |
|---|---|---|
| input.parquet | ✅ | ✅ |
| output.parquet | ❌ | ✅ |
Component(s)
Go, Parquet