fix(parquet): fix crash when parquet partition statistics are missing #6827
Conversation
Walkthrough

Replaced row-group statistics buffer usage with on-demand, file-backed timestamp decoding. Added new decoder helpers and JNI/native methods for row-group min/max timestamps, updated `findRowGroupByTimestamp` to accept a file pointer/size, and removed `RowGroupStatBuffers` from Java paths and tests.
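As a conceptual illustration of the lookup the walkthrough describes, here is a minimal sketch with assumed names and simplified semantics; the real logic lives in `decode.rs` and works against file-backed Parquet metadata, not an in-memory slice.

```rust
// Sketch only: find the first row group whose [min, max] timestamp
// bounds contain the probe timestamp. In the real code the bounds come
// either from Parquet column statistics or, when those are absent, from
// decoding the boundary rows of the chunk.
fn find_row_group_by_timestamp(bounds: &[(i64, i64)], ts: i64) -> Option<usize> {
    bounds.iter().position(|&(min, max)| ts >= min && ts <= max)
}

fn main() {
    let bounds = [(0, 9), (10, 19), (20, 29)];
    assert_eq!(find_row_group_by_timestamp(&bounds, 12), Some(1));
    assert_eq!(find_row_group_by_timestamp(&bounds, 99), None);
    println!("ok");
}
```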
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
@coderabbitai review
✅ Actions performed: Review triggered.
Actionable comments posted: 4
🧹 Nitpick comments (2)
core/src/test/java/io/questdb/test/cairo/fuzz/O3ParquetPartitionFuzzTest.java (1)
261-264: Stabilize the parquet-statistics toggle per test instance.

At Line 263, using `rnd.nextBoolean()` directly can flip configuration between calls in the same run. Sample once and return that fixed value so a single run tests one coherent mode.

💡 Suggested patch
```diff
 public class O3ParquetPartitionFuzzTest extends AbstractO3Test {
+    private final boolean parquetStatisticsEnabled = rnd.nextBoolean();
+
@@
     @Override
     protected boolean isParquetStatisticsEnabled() {
-        return rnd.nextBoolean();
+        return parquetStatisticsEnabled;
     }
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@core/src/test/java/io/questdb/test/cairo/fuzz/O3ParquetPartitionFuzzTest.java` around lines 261 - 264: the isParquetStatisticsEnabled() method currently returns rnd.nextBoolean() each call, causing the parquet-statistics mode to flip during a test run; instead, sample rnd.nextBoolean() once per test instance into a private final boolean field (e.g., parquetStatisticsEnabled) initialized in the test constructor or a `@Before` method and have isParquetStatisticsEnabled() return that field so the mode remains stable for the entire test instance; reference the isParquetStatisticsEnabled() method and the rnd RNG when making this change.

core/rust/qdbr/src/parquet_read/decode.rs (1)
802-811: Missing-stats path decodes bounds twice per row group

Line 809 and Line 835 can each trigger a separate `decode_single_timestamp` call when both stats are absent. On stats-disabled files, this doubles setup/page traversal work per row group.

Consider a single helper that returns `(min, max)` in one path (or reuses one `DecodeContext`/buffers across both probes) to reduce fallback overhead.
Based on learnings: provide line-cited, push-back style reviews with concrete follow-ups (tests, diffs, and performance considerations).

Also applies to: 828-844
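A minimal sketch of the single-pass idea, under a hypothetical helper name and operating on an already-decoded chunk rather than the real `DecodeContext` machinery: since the designated timestamp column is sorted within a row group (the code decodes the first row for min and the last for max), one decode can yield both bounds.

```rust
// Sketch only: with the timestamp column sorted inside a row group,
// the first and last values of one decoded chunk give (min, max).
fn row_group_timestamp_bounds(decoded: &[i64]) -> Result<(i64, i64), String> {
    match (decoded.first(), decoded.last()) {
        (Some(&min), Some(&max)) => Ok((min, max)),
        _ => Err("row group has zero rows for timestamp column".to_string()),
    }
}

fn main() {
    assert_eq!(row_group_timestamp_bounds(&[10, 20, 30]), Ok((10, 30)));
    assert!(row_group_timestamp_bounds(&[]).is_err()); // empty group is a controlled error
    println!("ok");
}
```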
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@core/rust/qdbr/src/parquet_read/decode.rs` around lines 802 - 811, The current fallback path calls decode_single_timestamp twice per row group when stats are missing (see row_group_timestamp_stat::<false>, validate_timestamp_sorting_key, and decode_single_timestamp), doubling decoding/setup cost; fix by creating a single helper (e.g., decode_row_group_timestamp_bounds or reuse a single DecodeContext/buffers) that returns both min and max in one traversal and call validate_timestamp_sorting_key before that helper as needed; update the branches that currently call decode_single_timestamp at lines around the min and max probes to use this helper/reused context, and add a unit/perf test that exercises stats-disabled files to ensure only one decode occurs per row group.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@core/rust/qdbr/src/parquet_read/decode.rs`:
- Around line 835-842: The timestamp decode paths perform size - 1 without
guarding against zero, causing usize underflow; before calling
decode_single_timestamp ensure column_chunk_size != 0 and handle the zero case
(e.g., return an Err with context) instead of passing column_chunk_size - 1;
similarly, in the code paths around row_group_size use an explicit check if
row_group_size == 0 then return Err(InvalidLayout, "row group {} has zero rows
for timestamp column {}", ...) in the function that handles row group layout,
and at the loop near the nested loops add if row_group_size == 0 { continue; }
to skip processing empty groups; update the functions referenced
(decode_single_timestamp call site, the function validating row_group_size, and
the loop around the nested timestamp decode logic) and add a regression test
that feeds an empty row group with stats disabled and asserts a controlled error
(no panic/underflow).
- Around line 924-951: decode_single_timestamp currently casts
ColumnChunkBuffers.data_vec (u8) to *const i64 and unsafely dereferences it,
risking empty-buffer deref and alignment UB; also code that computes
column_chunk_size - 1 can underflow when num_values == 0. Fix by making
decode_single_timestamp perform safe bounds and alignment-checked extraction:
verify bufs.data_vec.len() >= size_of::<i64>(), read the first 8 bytes with a
safe conversion (e.g., copy into an [u8;8] and use i64::from_le_bytes or
equivalent) rather than casting to *const i64, and return a descriptive
ParquetResult error if buffer is empty or too short. Also audit the code that
computes column_chunk_size - 1 (the place using column_chunk_size) to guard
against num_values == 0 and handle that branch explicitly (return empty/zero
result or an error) to avoid underflow. Add regression tests exercising
decode_single_timestamp / decode_column_chunk for (1) missing stats with empty
fallback decode and (2) column_chunk_size == 0 to ensure the new checks and
error paths behave as expected.
In `@core/src/main/java/io/questdb/cairo/ParquetTimestampFinder.java`:
- Around line 144-146: minTimestampExact() currently calls
partitionDecoder.rowGroupMinTimestamp(0, timestampIdAndType.get(0)) without
checking for empty metadata; add a guard like the existing max-path precheck to
detect zero row groups (or empty metadata) and throw a clear CairoException if
none exist, otherwise proceed to call partitionDecoder.rowGroupMinTimestamp(0,
timestampIdAndType.get(0)); ensure you reference the same metadata/row-group
count check used elsewhere so the behavior is symmetric with the max-timestamp
path.
In `@core/src/main/java/io/questdb/cairo/TableWriter.java`:
- Line 8879: The code calls parquetDecoder.rowGroupMinTimestamp(0,
timestampIndex) (and similarly rowGroupMaxTimestamp) without checking
rowGroupCount, risking invalid native reads on empty/corrupt metadata; update
the logic in TableWriter to first check rowGroupCount == 0 and throw a
CairoException with a clear message when zero, and only then call
rowGroupMinTimestamp/rowGroupMaxTimestamp with validated indices, and add a
regression unit test that constructs/loads a parquet file with zero row groups
and asserts the CairoException is thrown.
---
Nitpick comments:
In `@core/rust/qdbr/src/parquet_read/decode.rs`:
- Around line 802-811: The current fallback path calls decode_single_timestamp
twice per row group when stats are missing (see
row_group_timestamp_stat::<false>, validate_timestamp_sorting_key, and
decode_single_timestamp), doubling decoding/setup cost; fix by creating a single
helper (e.g., decode_row_group_timestamp_bounds or reuse a single
DecodeContext/buffers) that returns both min and max in one traversal and call
validate_timestamp_sorting_key before that helper as needed; update the branches
that currently call decode_single_timestamp at lines around the min and max
probes to use this helper/reused context, and add a unit/perf test that
exercises stats-disabled files to ensure only one decode occurs per row group.
In
`@core/src/test/java/io/questdb/test/cairo/fuzz/O3ParquetPartitionFuzzTest.java`:
- Around line 261-264: The isParquetStatisticsEnabled() method currently returns
rnd.nextBoolean() each call causing the parquet-statistics mode to flip during a
test run; instead, sample rnd.nextBoolean() once per test instance into a
private final boolean field (e.g., parquetStatisticsEnabled) initialized in the
test constructor or a `@Before` method and have isParquetStatisticsEnabled()
return that field so the mode remains stable for the entire test instance;
reference the isParquetStatisticsEnabled() method and the rnd RNG when making
this change.
ℹ️ Review info
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (5)
- `core/src/main/resources/io/questdb/bin/darwin-aarch64/libquestdbr.dylib` is excluded by `!**/*.dylib`
- `core/src/main/resources/io/questdb/bin/darwin-x86-64/libquestdbr.dylib` is excluded by `!**/*.dylib`
- `core/src/main/resources/io/questdb/bin/linux-aarch64/libquestdbr.so` is excluded by `!**/*.so`
- `core/src/main/resources/io/questdb/bin/linux-x86-64/libquestdbr.so` is excluded by `!**/*.so`
- `core/src/main/resources/io/questdb/bin/windows-x86-64/questdbr.dll` is excluded by `!**/*.dll`
📒 Files selected for processing (11)
- core/rust/qdbr/src/parquet_read/decode.rs
- core/rust/qdbr/src/parquet_read/jni.rs
- core/rust/qdbr/src/parquet_read/slicer/mod.rs
- core/rust/qdbr/src/parquet_write/decimal.rs
- core/src/main/java/io/questdb/cairo/O3PartitionJob.java
- core/src/main/java/io/questdb/cairo/ParquetTimestampFinder.java
- core/src/main/java/io/questdb/cairo/TableWriter.java
- core/src/main/java/io/questdb/griffin/engine/table/parquet/PartitionDecoder.java
- core/src/test/java/io/questdb/test/cairo/TimestampFinderTest.java
- core/src/test/java/io/questdb/test/cairo/fuzz/O3ParquetPartitionFuzzTest.java
- core/src/test/java/io/questdb/test/cairo/o3/AbstractO3Test.java
Claude review:

- Bug 1 (Medium): Unsafe read from potentially empty buffer in `decode_single_timestamp`. decode.rs:949-950: after `decode_column_chunk` returns, the code reads from `bufs.data_vec` without verifying it contains at least 8 bytes. If the row group is corrupt or zero-size, `data_vec` could be empty.
- Bug 2 (Low): usize underflow on zero-size row group. decode.rs:1022-1030 in `row_group_max_timestamp`, and similarly in `find_row_group_by_timestamp` at line 840: if a row group has 0 rows, `row_group_size - 1` wraps to `usize::MAX`. An empty row group is probably invalid per the Parquet spec, but this would cause a spectacular failure rather than a controlled error.
- Issue 3 (Medium): Unnecessary allocations in fallback path. decode.rs:924-951: `decode_single_timestamp` creates a fresh `DecodeContext` and `ColumnChunkBuffers` on every call. In `find_row_group_by_timestamp`, this can be called up to 2x per row group (min + max), for every row group in the file, and each call allocates a decompression buffer internally. Consider hoisting the `DecodeContext` and `ColumnChunkBuffers` out of the loop in `find_row_group_by_timestamp` and passing them into `decode_single_timestamp`.
- Issue 4 (Nit): Redundant metadata access in `find_row_group_by_timestamp`. Lines 786-795 already fetch `columns_meta` and `column_chunk_meta` for the current row group; then `row_group_timestamp_stat` (line 802) re-fetches the same metadata.
- Issue 5 (Nit): Redundant condition check. decode.rs:846: the `timestamp >= min_value` part is always true at this point, because the `timestamp < min_value` case returns on line 823. The check is logically correct but redundant; minor readability issue.
- Test coverage: adequate. Gap: the `TableWriter.readMinTimestampParquet` and `readParquetMinMaxTimestamps` paths are only exercised indirectly. A direct unit test writing a Parquet partition with statistics disabled would close this.
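The first two bugs above can be addressed with standard safe-Rust patterns. A sketch under assumed names, not the actual decode.rs signatures:

```rust
// Sketch only: bounds-checked little-endian i64 read, replacing an
// unchecked `*(data_vec.as_ptr() as *const i64)` dereference (bug 1).
fn read_first_i64_le(data: &[u8]) -> Result<i64, String> {
    let bytes: [u8; 8] = data
        .get(..8)
        .ok_or_else(|| format!("timestamp buffer too short: {} bytes", data.len()))?
        .try_into()
        .unwrap(); // infallible: the slice is exactly 8 bytes long
    Ok(i64::from_le_bytes(bytes))
}

// Sketch only: checked_sub turns the `size - 1` underflow (bug 2) into
// a controlled error for zero-row groups instead of wrapping to usize::MAX.
fn last_row_index(row_group_size: usize) -> Result<usize, String> {
    row_group_size
        .checked_sub(1)
        .ok_or_else(|| "row group has zero rows".to_string())
}

fn main() {
    assert_eq!(read_first_i64_le(&42i64.to_le_bytes()), Ok(42));
    assert!(read_first_i64_le(&[1, 2, 3]).is_err());
    assert_eq!(last_row_index(5), Ok(4));
    assert!(last_row_index(0).is_err());
    println!("ok");
}
```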
@coderabbitai review

✅ Actions performed: Review triggered.
♻️ Duplicate comments (1)
core/src/main/java/io/questdb/cairo/ParquetTimestampFinder.java (1)
133-136: ⚠️ Potential issue | 🟡 Minor

Replace `assert` with an explicit runtime guard for empty row-group metadata.

`assert rowGroupCount > 0` is not a production-safe precondition. With assertions off, `maxTimestampExact()` can call `rowGroupMaxTimestamp(rowGroupCount - 1, ...)` with `-1`. Please keep the assert if you want, but add a real guard and throw `CairoException` in both methods.

🛠️ Proposed fix
```diff
 @Override
 public long maxTimestampExact() {
     final int rowGroupCount = partitionDecoder.metadata().getRowGroupCount();
-    assert rowGroupCount > 0;
+    if (rowGroupCount < 1) {
+        throw CairoException.critical(0)
+                .put("parquet partition has no row groups [table=").put(tableToken)
+                .put(", partitionIndex=").put(partitionIndex)
+                .put(']');
+    }
+    assert rowGroupCount > 0;
     return partitionDecoder.rowGroupMaxTimestamp(rowGroupCount - 1, timestampIdAndType.get(0));
 }
@@
 @Override
 public long minTimestampExact() {
     final int rowGroupCount = partitionDecoder.metadata().getRowGroupCount();
-    assert rowGroupCount > 0;
+    if (rowGroupCount < 1) {
+        throw CairoException.critical(0)
+                .put("parquet partition has no row groups [table=").put(tableToken)
+                .put(", partitionIndex=").put(partitionIndex)
+                .put(']');
+    }
+    assert rowGroupCount > 0;
     return partitionDecoder.rowGroupMinTimestamp(0, timestampIdAndType.get(0));
 }
```

Run this to verify where assert-only guards exist and whether `-ea` is configured in repo-managed runtime configs:

```bash
#!/bin/bash
rg -nP 'maxTimestampExact|minTimestampExact|assert\s+rowGroupCount\s*>\s*0|rowGroup(Max|Min)Timestamp' core/src/main/java/io/questdb/cairo/ParquetTimestampFinder.java -C2
rg -nP '(^|\s)-ea(\s|$)|enableassertions' -g '*.sh' -g '*.service' -g '*.md' -g '*.xml' -g '*.yml' -g '*.yaml' .
```

Also applies to: 145-147
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@core/src/main/java/io/questdb/cairo/ParquetTimestampFinder.java` around lines 133 - 136, Replace the assert with an explicit runtime check in both maxTimestampExact() and minTimestampExact(): read rowGroupCount = partitionDecoder.metadata().getRowGroupCount(), and if rowGroupCount <= 0 throw a new CairoException with a clear message (e.g. "no row groups in partition metadata") instead of relying on assert, then call partitionDecoder.rowGroupMaxTimestamp(rowGroupCount - 1, timestampIdAndType.get(0)) / partitionDecoder.rowGroupMinTimestamp(rowGroupCount - 1, timestampIdAndType.get(0)) as before; keep or remove the existing assert as you prefer but ensure the CairoException guard exists so production runs cannot pass a -1 index to rowGroupMaxTimestamp/rowGroupMinTimestamp.
🧹 Nitpick comments (2)
core/rust/qdbr/src/parquet_read/decode.rs (2)
979-1014: Add focused regression tests for direct `row_group_min_timestamp`/`row_group_max_timestamp` fallback behavior.

Please add a unit test that disables stats and validates: (1) the first/last-row decode path returns expected values, and (2) zero-row-group/zero-row error paths return controlled `InvalidLayout` errors. This will lock the new public API semantics independent of fuzz coverage.

Based on learnings: in questdb/questdb PR reviews, include concrete test follow-ups.

Also applies to: 1047-1065
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@core/rust/qdbr/src/parquet_read/decode.rs` around lines 979 - 1014, Add focused unit tests for the public fallback behavior of row_group_min_timestamp and row_group_max_timestamp: disable statistics (so row_group_timestamp_stat returns None) and assert that the first/last-row decode path via validate_timestamp_sorting_key + decode_single_timestamp returns expected timestamps, and separately assert that requesting min/max on an empty row group or a file with zero rows produces a controlled InvalidLayout error; target tests to cover the functions row_group_min_timestamp and row_group_max_timestamp and the fallback code paths (row_group_timestamp_stat::<false>, validate_timestamp_sorting_key, decode_single_timestamp) so the API semantics are locked in independent of fuzzing.
805-818: Reuse decode scratch objects in the fallback scan path to avoid repeated allocations.

When statistics are disabled, `find_row_group_by_timestamp` may probe min/max per row group, and each probe currently allocates a fresh `DecodeContext` + `ColumnChunkBuffers` via `decode_single_timestamp`. Hoisting these out of the loop will cut allocator churn on hot read paths.

♻️ Proposed refactor

```diff
-    fn decode_single_timestamp(
-        &self,
-        file_ptr: *const u8,
-        file_size: u64,
+    fn decode_single_timestamp(
+        &self,
+        ctx: &mut DecodeContext,
+        bufs: &mut ColumnChunkBuffers,
         row_group_index: usize,
         timestamp_column_index: usize,
         row_lo: usize,
         row_hi: usize,
     ) -> ParquetResult<i64> {
-        let mut ctx = DecodeContext::new(file_ptr, file_size);
-        let mut bufs = ColumnChunkBuffers::new(self.allocator.clone());
         let col_info = QdbMetaCol {
             column_type: ColumnType::new(ColumnTypeTag::Timestamp, 0),
             column_top: 0,
             format: None,
         };
         self.decode_column_chunk(
-            &mut ctx,
-            &mut bufs,
+            ctx,
+            bufs,
             row_group_index,
             row_lo,
             row_hi,
             timestamp_column_index,
             col_info,
         )?;
```

```diff
 pub fn find_row_group_by_timestamp(...) -> ParquetResult<u64> {
+    let mut decode_ctx = DecodeContext::new(file_ptr, file_size);
+    let mut decode_bufs = ColumnChunkBuffers::new(self.allocator.clone());
     ...
-    self.decode_single_timestamp(file_ptr, file_size, row_group_idx, ts, 0, 1)?
+    self.decode_single_timestamp(&mut decode_ctx, &mut decode_bufs, row_group_idx, ts, 0, 1)?
     ...
-    self.decode_single_timestamp(file_ptr, file_size, row_group_idx, ts, column_chunk_size - 1, column_chunk_size)?
+    self.decode_single_timestamp(&mut decode_ctx, &mut decode_bufs, row_group_idx, ts, column_chunk_size - 1, column_chunk_size)?
```

Based on learnings: in questdb/questdb PR reviews, include concrete performance follow-ups.
Also applies to: 835-855, 952-953
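The hoisting pattern proposed above, reduced to a minimal self-contained sketch; the types and names here are stand-ins, not the real `DecodeContext`/`ColumnChunkBuffers`:

```rust
// Stand-in for DecodeContext + ColumnChunkBuffers: a reusable scratch buffer.
struct Scratch {
    buf: Vec<i64>,
}

// Sketch only: decode into caller-provided scratch instead of allocating.
fn decode_first(scratch: &mut Scratch, row_group: &[i64]) -> Option<i64> {
    scratch.buf.clear(); // keeps capacity; no reallocation after warm-up
    scratch.buf.extend_from_slice(row_group);
    scratch.buf.first().copied()
}

// One allocation before the probe loop, reused for every row group.
fn probe_all(row_groups: &[Vec<i64>]) -> Vec<Option<i64>> {
    let mut scratch = Scratch { buf: Vec::new() };
    row_groups
        .iter()
        .map(|rg| decode_first(&mut scratch, rg))
        .collect()
}

fn main() {
    let groups = vec![vec![1, 2], vec![], vec![3]];
    assert_eq!(probe_all(&groups), vec![Some(1), None, Some(3)]);
    println!("ok");
}
```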
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@core/rust/qdbr/src/parquet_read/decode.rs` around lines 805 - 818, The fallback path in find_row_group_by_timestamp repeatedly allocates DecodeContext and ColumnChunkBuffers by calling decode_single_timestamp for each probe; hoist reusable scratch objects out of the loop and reuse them to avoid allocator churn. Concretely, allocate a single DecodeContext and ColumnChunkBuffers before the row-group probe loop (or add an overload like decode_single_timestamp_with_scratch that accepts &mut DecodeContext and &mut ColumnChunkBuffers), then call that variant from find_row_group_by_timestamp’s fallback branch (the code that currently calls decode_single_timestamp and validate_timestamp_sorting_key) so each probe reuses the same buffers; apply the same hoist/reuse change for the other probe locations referenced around the blocks at the ranges corresponding to 835-855 and 952-953.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@core/src/main/java/io/questdb/cairo/ParquetTimestampFinder.java`:
- Around line 133-136: Replace the assert with an explicit runtime check in both
maxTimestampExact() and minTimestampExact(): read rowGroupCount =
partitionDecoder.metadata().getRowGroupCount(), and if rowGroupCount <= 0 throw
a new CairoException with a clear message (e.g. "no row groups in partition
metadata") instead of relying on assert, then call
partitionDecoder.rowGroupMaxTimestamp(rowGroupCount - 1,
timestampIdAndType.get(0)) / partitionDecoder.rowGroupMinTimestamp(rowGroupCount
- 1, timestampIdAndType.get(0)) as before; keep or remove the existing assert as
you prefer but ensure the CairoException guard exists so production runs cannot
pass a -1 index to rowGroupMaxTimestamp/rowGroupMinTimestamp.
---
Nitpick comments:
In `@core/rust/qdbr/src/parquet_read/decode.rs`:
- Around line 979-1014: Add focused unit tests for the public fallback behavior
of row_group_min_timestamp and row_group_max_timestamp: disable statistics (so
row_group_timestamp_stat returns None) and assert that the first/last-row decode
path via validate_timestamp_sorting_key + decode_single_timestamp returns
expected timestamps, and separately assert that requesting min/max on an empty
row group or a file with zero rows produces a controlled InvalidLayout error;
target tests to cover the functions row_group_min_timestamp and
row_group_max_timestamp and the fallback code paths
(row_group_timestamp_stat::<false>, validate_timestamp_sorting_key,
decode_single_timestamp) so the API semantics are locked in independent of
fuzzing.
- Around line 805-818: The fallback path in find_row_group_by_timestamp
repeatedly allocates DecodeContext and ColumnChunkBuffers by calling
decode_single_timestamp for each probe; hoist reusable scratch objects out of
the loop and reuse them to avoid allocator churn. Concretely, allocate a single
DecodeContext and ColumnChunkBuffers before the row-group probe loop (or add an
overload like decode_single_timestamp_with_scratch that accepts &mut
DecodeContext and &mut ColumnChunkBuffers), then call that variant from
find_row_group_by_timestamp’s fallback branch (the code that currently calls
decode_single_timestamp and validate_timestamp_sorting_key) so each probe reuses
the same buffers; apply the same hoist/reuse change for the other probe
locations referenced around the blocks at the ranges corresponding to 835-855
and 952-953.
ℹ️ Review info
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (5)
- `core/src/main/resources/io/questdb/bin/darwin-aarch64/libquestdbr.dylib` is excluded by `!**/*.dylib`
- `core/src/main/resources/io/questdb/bin/darwin-x86-64/libquestdbr.dylib` is excluded by `!**/*.dylib`
- `core/src/main/resources/io/questdb/bin/linux-aarch64/libquestdbr.so` is excluded by `!**/*.so`
- `core/src/main/resources/io/questdb/bin/linux-x86-64/libquestdbr.so` is excluded by `!**/*.so`
- `core/src/main/resources/io/questdb/bin/windows-x86-64/questdbr.dll` is excluded by `!**/*.dll`
📒 Files selected for processing (3)
- core/rust/qdbr/src/parquet_read/decode.rs
- core/src/main/java/io/questdb/cairo/ParquetTimestampFinder.java
- core/src/main/java/io/questdb/cairo/TableWriter.java
🚧 Files skipped from review as they are similar to previous changes (1)
- core/src/main/java/io/questdb/cairo/TableWriter.java
[PR Coverage check] 😍 pass: 217 / 293 (74.06%)
The `cairo.partition.encoder.parquet.statistics.enabled` configuration allows users to disable Parquet statistics, but the read path (`ParquetTimestampFinder`, `TableWriter`) and the O3 merge path (`O3PartitionJob.processParquetPartition`) hard-depend on timestamp column statistics. When statistics are absent, `getMinValueLong` hits `assert size == Long.BYTES` (crash with `-ea`) or reads garbage memory (silent data corruption without `-ea`). This PR removes that hard dependency.

- `PartitionDecoder` adds `rowGroupMinTimestamp`/`rowGroupMaxTimestamp` methods that try Parquet column statistics first (zero cost), then fall back to decoding the first/last row from actual data pages when statistics are absent.
- `findRowGroupByTimestamp` also falls back to decoding when statistics are missing, instead of reading garbage memory.
- `O3PartitionJob`, `ParquetTimestampFinder`, and `TableWriter` migrate from `readRowGroupStats` + `getMinValueLong`/`getMaxValueLong` to the new methods.
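The statistics-first fallback contract described above can be sketched as follows. This is a hypothetical shape, not the actual `PartitionDecoder`/JNI signatures: try the statistics, and only pay the page-decode cost when they are absent.

```rust
// Sketch only: statistics-first min-timestamp lookup with a decode fallback.
fn row_group_min_timestamp<F>(stat_min: Option<i64>, decode_first_row: F) -> Result<i64, String>
where
    F: FnOnce() -> Result<i64, String>,
{
    match stat_min {
        Some(min) => Ok(min),       // zero cost: statistics were written
        None => decode_first_row(), // stats disabled: decode row 0 of the chunk
    }
}

fn main() {
    // Statistics present: the fallback closure is never invoked.
    assert_eq!(row_group_min_timestamp(Some(5), || panic!("not called")), Ok(5));
    // Statistics absent: the decode fallback supplies the value.
    assert_eq!(row_group_min_timestamp(None, || Ok(7)), Ok(7));
    println!("ok");
}
```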