fix(parquet): fix parquet export bug resulting in "more than one dictionary" error#6715
[PR Coverage check] 😍 pass: 231 / 372 (62.10%)
@CodeRabbit pls review this PR critically
🚨 Critical Issues
1. Debug Statement in Production Code
Line 1273:
    println!(
        "column_chunk_to_pages_with_dict_state: column={}, chunk_offset={}, include_dict={}",
        column.name, chunk_offset, include_dict
    );
This will pollute production logs and should either be removed or converted to proper logging using a logging framework.
2. CI Failures Need Investigation
3. Low Test Coverage on Critical Code
The new functions like
Fixes #6692
Summary
Details
Problem: When exporting tables with symbol columns spanning multiple row groups (or multiple partitions merged into one row group), QuestDB was writing multiple dictionary pages per column chunk. This violates the Parquet specification, which allows at most one dictionary page per column chunk, and causes PyArrow to fail with "Column cannot have more than one dictionary."
Root cause: The previous implementation wrote a dictionary page for each partition's data within a row group, rather than a single unified dictionary for the entire row group.
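To illustrate the idea of a single unified dictionary per row group, here is a minimal sketch, not the code from this PR. It assumes, purely for illustration, that each partition carries its own local symbol table and per-row keys; `PartitionSymbols`, `EncodedChunk`, and `encode_row_group` are hypothetical names that do not come from the QuestDB codebase.

```rust
use std::collections::HashMap;

/// Hypothetical input: one partition's symbol column, as a local symbol
/// table plus per-row keys into that table.
pub struct PartitionSymbols {
    pub symbols: Vec<String>,
    pub keys: Vec<u32>,
}

/// Hypothetical output: what a column chunk needs, namely one dictionary
/// and the row indices that point into it.
#[derive(Debug, PartialEq)]
pub struct EncodedChunk {
    pub dictionary: Vec<String>,
    pub indices: Vec<u32>,
}

/// Merge every partition in a row group into ONE dictionary, remapping
/// each partition's local keys to positions in the unified dictionary.
pub fn encode_row_group(partitions: &[PartitionSymbols]) -> EncodedChunk {
    let mut dictionary: Vec<String> = Vec::new();
    let mut positions: HashMap<String, u32> = HashMap::new();
    let mut indices: Vec<u32> = Vec::new();

    for partition in partitions {
        // Build a local-key -> unified-index table for this partition.
        let mut remap: Vec<u32> = Vec::with_capacity(partition.symbols.len());
        for symbol in &partition.symbols {
            let index = match positions.get(symbol) {
                Some(&existing) => existing,
                None => {
                    let next = dictionary.len() as u32;
                    dictionary.push(symbol.clone());
                    positions.insert(symbol.clone(), next);
                    next
                }
            };
            remap.push(index);
        }
        // Rewrite the partition's rows against the unified dictionary.
        indices.extend(partition.keys.iter().map(|&key| remap[key as usize]));
    }

    EncodedChunk { dictionary, indices }
}
```

With two partitions whose local tables are `["a", "b"]` and `["b", "c"]`, the sketch yields one dictionary `["a", "b", "c"]` for the whole row group, so a writer built on it would emit a single dictionary page followed by data pages, rather than one dictionary page per partition.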
Solution: