Skip to content

feat(parquet): add decimal type support for parquet read/write#6725

Merged
bluestreak01 merged 43 commits intomasterfrom
rd_parquet_decimal
Feb 18, 2026
Merged

feat(parquet): add decimal type support for parquet read/write#6725
bluestreak01 merged 43 commits intomasterfrom
rd_parquet_decimal

Conversation

@RaphDal
Copy link
Copy Markdown
Contributor

@RaphDal RaphDal commented Jan 29, 2026

Summary

  • Add support for all 6 decimal types (DECIMAL8/16/32/64/128/256) in parquet format
  • Implement write support storing decimals as fixed-length byte arrays in big-endian format per Parquet specification
  • Add read support with WordSwapDecimalColumnSink for DECIMAL128/256 that correctly converts from Parquet's big-endian format by reversing each 8-byte word independently
  • Add comprehensive fuzz tests with random precision/scale values and round-trip tests

Test plan

  • Fuzz tests with random precision/scale values running multiple iterations
  • Col-tops tests for all decimal types
  • Round-trip tests (native -> parquet -> native -> parquet)
  • Rust unit tests for WordSwapDecimalColumnSink

Supported physical types

The following Parquet decimal physical types are supported:

  • INT32 physical type for small decimals (DECIMAL8/16/32)
  • INT64 physical type for DECIMAL64
  • FixedLenByteArray for all decimals

Add support for all 6 decimal types (DECIMAL8/16/32/64/128/256) in
parquet format.

Write support:
- Add decimal native types in Rust for all sizes
- Implement parquet encoding storing decimals as fixed-length byte
  arrays in big-endian format per Parquet specification

Read support:
- Add decoders for DECIMAL8/16/32/64 using ReverseFixedColumnSink
- Add WordSwapDecimalColumnSink for DECIMAL128/256 that correctly
  converts from Parquet's big-endian format by reversing each 8-byte
  word independently

Testing:
- Add comprehensive fuzz tests with random precision/scale values
- Add col-tops tests for all decimal types
- Add round-trip tests (native -> parquet -> native -> parquet)
- Add Rust unit tests for WordSwapDecimalColumnSink

Co-Authored-By: Claude Opus 4.5 <[email protected]>
@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Jan 29, 2026

Important

Review skipped

Auto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review

Walkthrough

Adds end-to-end support for fixed-length decimal types (Decimal8–Decimal256) across Parquet read/write and core column typing: new decimal type wrappers, Parquet encoding/decoding paths and sinks, byte-representation API changes, ColumnType decimal constructors/accessors, and extensive tests across Rust and Java layers.

Changes

Cohort / File(s) Summary
Core Type Accessors
core/rust/qdb-core/src/col_type.rs
Adds pub const fn new_decimal(precision: u8, scale: u8) -> Option<Self> plus pub fn decimal_scale(&self) -> u8 and pub fn decimal_precision(&self) -> u8 to encode/extract precision/scale in ColumnType.
Parquet native bytes API
core/rust/qdbr/parquet2/src/types.rs, core/rust/qdbr/parquet2/src/bloom_filter/hash.rs, core/rust/qdbr/parquet2/src/indexes/index.rs, core/rust/qdbr/parquet2/src/statistics/primitive.rs
Renames public trait methods from to_le_bytes/from_le_bytes to to_bytes/from_bytes and updates callsites: hash/index/statistics now use to_bytes/from_bytes (canonical byte representation) instead of LE-specific helpers.
Parquet write: decimal types
core/rust/qdbr/src/parquet_write/decimal.rs, core/rust/qdbr/src/parquet_write/mod.rs
Introduces Decimal8/16/32/64/128/256 newtypes, NULL sentinels, NativeType/Nullable implementations (big-endian fixed-length byte-array semantics), and registers pub(crate) mod decimal.
Parquet write: encoding integration
core/rust/qdbr/src/parquet_write/primitive.rs, core/rust/qdbr/src/parquet_write/file.rs, core/rust/qdbr/src/parquet_write/schema.rs
Adds decimal_slice_to_page_plain and integrates DecimalN branches into chunk-to-page encoding and schema mapping (FixedLenByteArray with decimal metadata); serializes statistics using to_bytes().
Parquet read: sinks & decode
core/rust/qdbr/src/parquet_read/column_sink/fixed.rs, core/rust/qdbr/src/parquet_read/column_sink/tests.rs, core/rust/qdbr/src/parquet_read/decode.rs
Adds WordSwapDecimalColumnSink (per-8-byte-word BE→LE swap) and SignExtendDecimalColumnSink (BE sign-extend to target width) with extensive unit tests; extends decode logic to route FixedLenByteArray decimal cases to new sinks and uses decimal null constants.
Parquet metadata inference
core/rust/qdbr/src/parquet_read/meta.rs
Extends descriptor_to_column_type to infer decimal ColumnType for Int32/Int64 and FixedLenByteArray when decimal logical/converted metadata is present.
Java read path: Parquet record cursor
core/src/main/java/io/questdb/griffin/engine/functions/table/ReadParquetRecordCursor.java
Adds getters for decimal types (getDecimal8/16/32/64 and getDecimal128/256 with sink objects) to expose decimal reading from Parquet records.
SQL tests & resources
core/src/test/java/io/questdb/test/griffin/ParquetTest.java, compat/src/test/java/io/questdb/compat/ParquetTest.java, core/src/test/resources/sqllogictest/test/parquet/*
Adds extensive Java Parquet tests for decimal sizes/variants, updates compat tests to validate decimals, and adds multiple sqllogictest resources for fixed-length and int32/int64 decimal Parquet files.
Misc small change
core/src/main/java/io/questdb/griffin/SqlParser.java
Uses thread-local StringSink to append "_ignore_nulls" when generating a column alias for a window expression with ignore-nulls.

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Possibly related PRs

Suggested labels

Enhancement

Suggested reviewers

  • bluestreak01
🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 70.51% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title 'feat(parquet): add decimal type support for parquet read/write' accurately and concisely describes the primary change—adding comprehensive decimal type support to Parquet read/write operations.
Description check ✅ Passed The description is directly related to the changeset, providing a clear summary of decimal type support implementation, test coverage, and supported physical types.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch rd_parquet_decimal

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@bluestreak01
Copy link
Copy Markdown
Member

@CodeRabbit pls review critically

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Jan 30, 2026

@bluestreak01, I'll perform a critical review of this PR, carefully examining:

  • Correctness of decimal encoding/decoding: Verification of big-endian byte array handling and word-swap logic for DECIMAL128/256 types
  • Parquet specification compliance: Ensuring the implementation follows Parquet format standards for decimal types
  • Edge cases and boundary conditions: Precision and scale handling, overflow scenarios, and extreme values
  • Test coverage: Adequacy of fuzz tests, round-trip tests, and unit tests for all decimal types
  • Error handling: Proper validation and error reporting
  • Performance implications: Efficiency of the conversion operations
  • Code quality: Readability, maintainability, and potential bugs

I'll provide detailed feedback on any issues, potential improvements, or concerns I identify.

✅ Actions performed

Full review triggered.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@core/rust/qdbr/src/parquet_write/decimal.rs`:
- Around line 49-81: The ord() implementations for Decimal128 and Decimal256
currently only compare the high-part fields (Decimal128::ord and
Decimal256::ord) causing values with equal high parts but different low parts to
be treated as equal; update both ord() methods to first compare the high parts
(self.0 vs other.0), and if Equal compare the low parts treating them as
unsigned (e.g., cast to u64 for Decimal128 low part, and appropriate unsigned
types for Decimal256 low parts) so the ordering reflects the full multi-word
value while keeping the same null handling and NativeType semantics.
🧹 Nitpick comments (2)
core/rust/qdbr/src/parquet_write/schema.rs (1)

287-312: Add a guard for scale > precision before emitting Decimal metadata.

This prevents constructing invalid decimal logical types if upstream encodes a bad scale/precision pair.

🔧 Suggested defensive check
             let scale = column_type.decimal_scale() as usize;
             let precision = column_type.decimal_precision() as usize;
+            if scale > precision {
+                return Err(fmt_err!(
+                    InvalidType,
+                    "decimal scale {scale} exceeds precision {precision}"
+                ));
+            }
             Ok(ParquetType::try_from_primitive(
                 name,
                 PhysicalType::FixedLenByteArray(size),
core/rust/qdbr/src/parquet_read/column_sink/fixed.rs (1)

651-653: Redundant .clone() call on ParquetResult<()>.

The result() method returns ParquetResult<()> which is Result<(), ParquetError>. Since () is Copy and the method signature suggests returning the result directly, the .clone() is unnecessary. This appears to be a copy-paste artifact from ReverseFixedColumnSink (line 241).

🔧 Suggested fix
     fn result(&self) -> ParquetResult<()> {
-        self.slicer.result().clone()
+        self.slicer.result()
     }

Copy link
Copy Markdown
Collaborator

@kafka1991 kafka1991 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. pls add compat tests in src/test/java/io/questdb/compat/ParquetTest.java
  2. It's better to introduce some decimal parquet sqlLogicTests from https://github.com/duckdb/duckdb/tree/main/data/parquet-testing/decimal

@RaphDal
Copy link
Copy Markdown
Contributor Author

RaphDal commented Feb 2, 2026

Thanks @kafka1991
Added the compat tests in c63dacd
Added some tests using duckdb parquet files in 29ab7b6

@RaphDal
Copy link
Copy Markdown
Contributor Author

RaphDal commented Feb 3, 2026

@CodeRabbit There has been many changes, review it again critically

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Feb 3, 2026

@RaphDal, I'll perform a fresh critical review of all the changes, including the new compatibility tests and DuckDB parquet file tests you've added.

I'll thoroughly examine:

  • Decimal encoding/decoding correctness: Big-endian byte array handling, word-swap logic for DECIMAL128/256, and the new WordSwapDecimalColumnSink implementation
  • Parquet specification compliance: Fixed-length byte array encoding for all decimal types per spec
  • Endianness handling changes: The refactoring from to_le_bytes/from_le_bytes to to_bytes/from_bytes in the NativeType trait and its implications across the codebase
  • New test coverage: Compatibility tests and DuckDB integration tests alongside existing fuzz/round-trip/unit tests
  • Edge cases: Precision/scale boundaries, null handling (especially the new DECIMAL128_NULL and DECIMAL256_NULL constants), overflow scenarios
  • Error handling: Validation logic for decimal types during read/write operations
  • Performance implications: Word-swapping operations and conversion efficiency
  • Code quality: Implementation correctness, maintainability, and potential bugs

I'll also note the Enterprise CI failure that needs attention.

✅ Actions performed

Full review triggered.

@RaphDal
Copy link
Copy Markdown
Contributor Author

RaphDal commented Feb 17, 2026

@coderabbitai please review again critically

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Feb 17, 2026

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

@RaphDal, I'll perform a thorough critical review of the decimal support implementation. Let me examine the key areas in detail.


🏁 Script executed:

#!/bin/bash

# Examine the WordSwapDecimalColumnSink implementation for potential issues
echo "=== WordSwapDecimalColumnSink implementation ==="
ast-grep --pattern $'impl$$$WordSwapDecimalColumnSink$$$push_slice$$$'

echo -e "\n=== SignExtendDecimalColumnSink unsafe code ==="
ast-grep --pattern $'unsafe fn sign_extend_and_convert($$$) {
  $$$
}'

# Check for any TODOs or FIXMEs in the decimal implementation
echo -e "\n=== TODOs/FIXMEs in decimal code ==="
rg -n 'TODO|FIXME|XXX|HACK' core/rust/qdbr/src/parquet_write/decimal.rs core/rust/qdbr/src/parquet_read/column_sink/fixed.rs core/rust/qdbr/src/parquet_read/decode.rs

# Check null value consistency across the codebase
echo -e "\n=== Decimal NULL constant definitions ==="
rg -n 'DECIMAL.*_NULL' core/rust/qdbr/src/parquet_write/decimal.rs

# Look for panic/unwrap in decimal handling code
echo -e "\n=== Potential panics in decimal code ==="
rg -n 'unwrap\(|panic!|expect\(' core/rust/qdbr/src/parquet_write/decimal.rs core/rust/qdbr/src/parquet_read/column_sink/fixed.rs core/rust/qdbr/src/parquet_read/decode.rs | head -20

# Check the endianness handling changes in NativeType
echo -e "\n=== NativeType trait changes ==="
cat core/rust/qdbr/parquet2/src/types.rs | grep -A 5 'fn to_bytes\|fn from_bytes' | head -30

# Verify error handling in decimal decoding
echo -e "\n=== Error handling in decode_fixed_decimal ==="
rg -n 'Err\(|bail!|error!' core/rust/qdbr/src/parquet_read/decode.rs | grep -i decimal | head -20

Length of output: 3644



🧠 Learnings used
Learnt from: javier
Repo: questdb/questdb PR: 0
File: :0-0
Timestamp: 2026-02-06T18:22:28.130Z
Learning: In questdb/questdb, for PRs requested by javier, provide line-cited, push-back style reviews with concrete follow-ups (tests, diffs, and performance considerations).

Learnt from: javier
Repo: questdb/questdb PR: 0
File: :0-0
Timestamp: 2026-02-06T18:22:28.130Z
Learning: For questdb/questdb, javier expects a rigorous, push-back style review; always include concrete line-cited findings and propose targeted tests/fixes when reviewing javier’s PRs.

Learnt from: jerrinot
Repo: questdb/questdb PR: 0
File: :0-0
Timestamp: 2026-02-04T11:18:36.492Z
Learning: For PRs by jerrinot, always use a thorough and critical review mode: question design decisions, challenge assumptions, scrutinize implementations for edge cases and potential issues, and avoid superficial validation. They prefer hard-core reviews that push back on the code.

@glasstiger
Copy link
Copy Markdown
Contributor

[PR Coverage check]

😍 pass : 2738 / 3522 (77.74%)

file detail

path covered line new line coverage
🔵 qdbr/parquet2/src/bloom_filter/hash.rs 0 1 00.00%
🔵 qdbr/parquet2/src/indexes/index.rs 0 1 00.00%
🔵 qdbr/src/parquet_read/slicer/mod.rs 14 31 45.16%
🔵 io/questdb/griffin/engine/functions/table/ReadParquetRecordCursor.java 8 16 50.00%
🔵 qdbr/parquet2/src/types.rs 3 5 60.00%
🔵 qdbr/src/parquet_read/decimal.rs 678 1089 62.26%
🔵 qdbr/src/parquet_read/decode.rs 796 1082 73.57%
🔵 qdb-core/src/col_type.rs 26 33 78.79%
🔵 qdbr/src/parquet_read/column_sink/fixed.rs 179 203 88.18%
🔵 qdbr/src/parquet_write/schema.rs 16 18 88.89%
🔵 qdbr/parquet2/src/write/statistics.rs 43 48 89.58%
🔵 io/questdb/griffin/SqlParser.java 34 37 91.89%
🔵 qdbr/src/parquet_read/meta.rs 41 44 93.18%
🔵 qdbr/src/parquet_read/column_sink/tests.rs 535 548 97.63%
🔵 qdbr/src/parquet_write/primitive.rs 57 58 98.28%
🔵 qdbr/src/parquet_write/file.rs 36 36 100.00%
🔵 qdbr/parquet2/src/statistics/primitive.rs 2 2 100.00%
🔵 qdbr/src/parquet_write/decimal.rs 270 270 100.00%

@bluestreak01 bluestreak01 merged commit ce4c49b into master Feb 18, 2026
46 checks passed
@bluestreak01 bluestreak01 deleted the rd_parquet_decimal branch February 18, 2026 14:42
maciulis pushed a commit to maciulis/questdb that referenced this pull request Feb 19, 2026
@RaphDal RaphDal linked an issue Mar 23, 2026 that may be closed by this pull request
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Full Decimal type support in Parquet read/write

4 participants