feat(parquet): add decimal type support for parquet read/write#6725
feat(parquet): add decimal type support for parquet read/write#6725bluestreak01 merged 43 commits intomasterfrom
Conversation
Add support for all 6 decimal types (DECIMAL8/16/32/64/128/256) in parquet format. Write support: - Add decimal native types in Rust for all sizes - Implement parquet encoding storing decimals as fixed-length byte arrays in big-endian format per Parquet specification Read support: - Add decoders for DECIMAL8/16/32/64 using ReverseFixedColumnSink - Add WordSwapDecimalColumnSink for DECIMAL128/256 that correctly converts from Parquet's big-endian format by reversing each 8-byte word independently Testing: - Add comprehensive fuzz tests with random precision/scale values - Add col-tops tests for all decimal types - Add round-trip tests (native -> parquet -> native -> parquet) - Add Rust unit tests for WordSwapDecimalColumnSink Co-Authored-By: Claude Opus 4.5 <[email protected]>
|
Important Review skippedAuto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the You can disable this status message by setting the Use the checkbox below for a quick retry:
WalkthroughAdds end-to-end support for fixed-length decimal types (Decimal8–Decimal256) across Parquet read/write and core column typing: new decimal type wrappers, Parquet encoding/decoding paths and sinks, byte-representation API changes, ColumnType decimal constructors/accessors, and extensive tests across Rust and Java layers. Changes
Estimated code review effort🎯 5 (Critical) | ⏱️ ~120 minutes Possibly related PRs
Suggested labels
Suggested reviewers
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
@CodeRabbit pls review critically |
|
I'll provide detailed feedback on any issues, potential improvements, or concerns I identify. ✅ Actions performedFull review triggered. |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@core/rust/qdbr/src/parquet_write/decimal.rs`:
- Around line 49-81: The ord() implementations for Decimal128 and Decimal256
currently only compare the high-part fields (Decimal128::ord and
Decimal256::ord) causing values with equal high parts but different low parts to
be treated as equal; update both ord() methods to first compare the high parts
(self.0 vs other.0), and if Equal compare the low parts treating them as
unsigned (e.g., cast to u64 for Decimal128 low part, and appropriate unsigned
types for Decimal256 low parts) so the ordering reflects the full multi-word
value while keeping the same null handling and NativeType semantics.
🧹 Nitpick comments (2)
core/rust/qdbr/src/parquet_write/schema.rs (1)
287-312: Add a guard for scale > precision before emitting Decimal metadata.This prevents constructing invalid decimal logical types if upstream encodes a bad scale/precision pair.
🔧 Suggested defensive check
let scale = column_type.decimal_scale() as usize; let precision = column_type.decimal_precision() as usize; + if scale > precision { + return Err(fmt_err!( + InvalidType, + "decimal scale {scale} exceeds precision {precision}" + )); + } Ok(ParquetType::try_from_primitive( name, PhysicalType::FixedLenByteArray(size),core/rust/qdbr/src/parquet_read/column_sink/fixed.rs (1)
651-653: Redundant.clone()call onParquetResult<()>.The
result()method returnsParquetResult<()>which isResult<(), ParquetError>. Since()isCopyand the method signature suggests returning the result directly, the.clone()is unnecessary. This appears to be a copy-paste artifact fromReverseFixedColumnSink(line 241).🔧 Suggested fix
fn result(&self) -> ParquetResult<()> { - self.slicer.result().clone() + self.slicer.result() }
kafka1991
left a comment
There was a problem hiding this comment.
- pls add compat tests in
src/test/java/io/questdb/compat/ParquetTest.java - It's better to introduce some decimal parquet sqlLogicTests from
https://github.com/duckdb/duckdb/tree/main/data/parquet-testing/decimal
…rd_parquet_decimal
|
Thanks @kafka1991 |
|
@CodeRabbit There has been many changes, review it again critically |
|
I'll thoroughly examine:
I'll also note the Enterprise CI failure that needs attention. ✅ Actions performedFull review triggered. |
…rd_parquet_decimal
…rd_parquet_decimal
…rd_parquet_decimal
…rd_parquet_decimal
|
@coderabbitai please review again critically |
|
Tip For best results, initiate chat on the files or code changes. 🧩 Analysis chain
🏁 Script executed: Length of output: 3644 🧠 Learnings used |
[PR Coverage check]😍 pass : 2738 / 3522 (77.74%) file detail
|
Summary
WordSwapDecimalColumnSinkfor DECIMAL128/256 that correctly converts from Parquet's big-endian format by reversing each 8-byte word independentlyTest plan
Supported physical types
The following Parquet decimal physical types are supported: