
perf(sql): recognize data sorted by timestamp in read_parquet() SQL function #6079

Merged
bluestreak01 merged 33 commits into master from puzpuzpuz_designated_ts_in_read_parquet
Oct 10, 2025

Conversation

@puzpuzpuz
Contributor

@puzpuzpuz puzpuzpuz commented Aug 26, 2025

Closes #5293

Also updates zstd, snappy and other compression algo library versions in parquet2 and regenerates cargo lock files.

@puzpuzpuz puzpuzpuz self-assigned this Aug 26, 2025
@coderabbitai

coderabbitai bot commented Aug 26, 2025

Important

Review skipped

Auto reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

Adds designated-timestamp support across Parquet paths: ColumnType designation API, writer marks designated timestamps, reader records/clears designated flag, infers timestamp when absent, exposes timestamp_index through Rust/JNI/Java, and refactors Java metadata accessors to getters; updates tests and sqllogictest.

Changes

Cohort / File(s) / Summary

  • ColumnType designated API (core/rust/qdb-core/src/col_type.rs): Add pub fn into_non_designated(self) -> CoreResult<ColumnType> and a unit test for designated/non-designated timestamp handling.
  • Parquet metadata, Rust writer (core/rust/qdbr/src/parquet/qdb_metadata.rs, core/rust/qdbr/src/parquet_write/schema.rs): Change QdbMeta::new() → QdbMeta::new(capacity); preallocate the schema; the writer marks the designated timestamp via into_designated() and propagates errors.
  • Parquet decoder, Rust read/meta (core/rust/qdbr/src/parquet_read/meta.rs, core/rust/qdbr/src/parquet_read/mod.rs): Add timestamp_index: i32 to ParquetDecoder; record the designated timestamp index and clear the designated flag before storing types; infer the timestamp by scanning row groups if none is designated; return the decoder with timestamp_index set.
  • Rust JNI surface (core/rust/qdbr/src/parquet_read/jni.rs): Rename the timestamp parameter to timestamp_index in the findRowGroupByTimestamp native binding; add Java_...PartitionDecoder_timestampIndexOffset to return the timestamp_index field offset for JNI.
  • Java PartitionDecoder & Metadata (core/src/main/java/io/questdb/griffin/engine/table/parquet/PartitionDecoder.java): Add TIMESTAMP_INDEX_OFFSET and a timestampIndexOffset() native accessor; rename the native param to timestampIndex; refactor the Metadata API to getters (getColumnCount, getColumnId, getColumnName, getRowCount, getRowGroupCount, getRowGroupSize); add getTimestampIndex() and setTimestampIndex(); update init/copy flows to propagate the timestamp index.
  • Java call sites / engine integration (core/src/main/java/io/questdb/cairo/O3PartitionJob.java, .../TableWriter.java, .../ParquetTimestampFinder.java, .../sql/PageFrameMemoryPool.java, .../engine/table/*PageFrameCursor.java, .../functions/table/ReadParquetPageFrameCursor.java, .../functions/table/ReadParquetRecordCursor.java): Replace field-style metadata accessors with the new getters (getRowGroupCount, getRowGroupSize, getColumnCount, getColumnId, getColumnName, getRowCount, getColumnType); add assertions for non-zero counts.
  • Java tests and SQL tests (core/src/test/java/io/questdb/test/.../PartitionDecoderTest.java, .../ReadParquetFunctionTest.java, .../cairo/IntervalBwdPartitionFrameCursorTest.java, .../cairo/IntervalFwdPartitionFrameCursorTest.java, core/src/test/resources/sqllogictest/test/parquet/designated_ts_detection.test): Update tests to the getter API; add testDesignatedTimestamp() and a sqllogictest validating designated timestamp detection and SAMPLE BY behavior; adjust expected outputs to include the timestamp column.
  • Parquet encoder API change (core/rust/qdbr/parquet2/src/encoding/hybrid_rle/mod.rs, .../encoder.rs): Change the encode_u32 signature to accept an explicit len: usize parameter; update tests to pass len.
  • Misc Rust test API updates (core/rust/qdbr/src/parquet2/Cargo.toml, core/rust/qdb-core/Cargo.toml, core/rust/qdbr/Cargo.toml, core/rust/qdbr/src/allocator.rs, core/rust/qdbr/src/parquet_read/decode.rs, core/rust/qdbr/parquet2/src/encoding/hybrid_rle/encoder.rs): Dev-dependency version bumps; test RNG API changes (thread_rng/gen_range → rng/random_range, etc.); minor test refactors (len capture).
  • CI / trivial (ci/github-release-pipeline.yml, core/src/main/java/io/questdb/griffin/engine/table/parquet/PartitionDescriptor.java): CI displayName fix and trailing newline; no functional change.
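The flag round-trip described above (the writer marks the designated timestamp via into_designated(), the reader records the index and clears the flag via into_non_designated()) can be sketched as follows. This is a simplified, hypothetical model: the u32 encoding, tag value, and bit position are assumptions; only the function names and the TYPE_FLAG_DESIGNATED_TIMESTAMP flag name come from the PR.

```rust
// Simplified sketch of the designated-timestamp flag round-trip.
// The u32 encoding, tag value, and bit position are illustrative
// assumptions, not QuestDB's actual ColumnType layout.
const TYPE_FLAG_DESIGNATED_TIMESTAMP: u32 = 1 << 8; // assumed bit position
const TIMESTAMP_TAG: u32 = 17; // assumed tag for the TIMESTAMP column type

fn into_designated(col_type: u32) -> Result<u32, String> {
    // Only timestamp columns may carry the designated flag.
    if col_type & 0xff != TIMESTAMP_TAG {
        return Err(format!("not a timestamp column type: {col_type}"));
    }
    Ok(col_type | TYPE_FLAG_DESIGNATED_TIMESTAMP)
}

fn into_non_designated(col_type: u32) -> u32 {
    // Clearing the flag restores the plain column type that the reader
    // stores after recording timestamp_index.
    col_type & !TYPE_FLAG_DESIGNATED_TIMESTAMP
}

fn main() {
    let designated = into_designated(TIMESTAMP_TAG).unwrap();
    assert!(designated & TYPE_FLAG_DESIGNATED_TIMESTAMP != 0);
    // Round-trip: clearing the flag yields the original tag.
    assert_eq!(into_non_designated(designated), TIMESTAMP_TAG);
    assert!(into_designated(42).is_err()); // non-timestamp tag is rejected
}
```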

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant SQL as SQL read_parquet()
  participant J as Java PartitionDecoder
  participant JNI as JNI bridge
  participant R as Rust ParquetDecoder
  participant FS as Parquet file scanning

  SQL->>J: open parquet path -> create decoder
  J->>JNI: call native init (path)
  JNI->>R: construct ParquetDecoder (read schema)
  note right of R: For each column<br/>if ColumnType is designated Timestamp:<br/>- record timestamp_index<br/>- clear designated flag before storing ColumnMeta
  R->>FS: scan row groups if timestamp not designated (infer by sorted column)
  FS-->>R: row-group sort info
  R-->>JNI: return decoder pointer (contains timestamp_index)
  JNI-->>J: decoder initialized
  J->>J: metadata.getTimestampIndex()
  alt timestamp_index >= 0
    J->>J: use designated timestamp for TS-specific ops (findRowGroupByTimestamp)
  else
    J->>J: fallback behavior (no designated timestamp)
  end
  J-->>SQL: produce frames/records using discovered timestamp info
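A hedged sketch of the detection order shown in the diagram: a column flagged as designated wins; otherwise the decoder falls back to the result of row-group scanning, with -1 as the "no designated timestamp" sentinel. The column encoding and helper names below are illustrative assumptions, not the actual decoder types.

```rust
// Illustrative model of the timestamp detection order from the diagram:
// a flagged column wins; otherwise an inference result (here passed in as
// a precomputed Option) is used; -1 means "no designated timestamp".
const DESIGNATED_FLAG: u32 = 1 << 8; // assumed flag bit

fn detect_timestamp_index(column_types: &[u32], inferred: Option<usize>) -> i32 {
    for (i, &t) in column_types.iter().enumerate() {
        if t & DESIGNATED_FLAG != 0 {
            return i as i32; // metadata explicitly marks the column
        }
    }
    // Fallback path: row-group scanning produced a candidate (or not).
    inferred.map(|i| i as i32).unwrap_or(-1)
}

fn main() {
    let flagged = [5, 17 | DESIGNATED_FLAG, 9];
    assert_eq!(detect_timestamp_index(&flagged, None), 1);
    let plain = [5, 17, 9];
    assert_eq!(detect_timestamp_index(&plain, Some(1)), 1);
    assert_eq!(detect_timestamp_index(&plain, None), -1);
}
```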

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Suggested reviewers

  • puzpuzpuz

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings)

  • Out of Scope Changes Check (⚠️ Warning): The change to ci/github-release-pipeline.yml, which fixes a display-name typo and alters the OS matrix, is unrelated to the designated timestamp detection and dependency updates documented in the pull request objectives. Resolution: remove the CI pipeline adjustments or split them into a dedicated pull request so this change stays focused on the timestamp detection feature and dependency updates.
  • Docstring Coverage (⚠️ Warning): Docstring coverage is 42.86%, below the required threshold of 80.00%. Resolution: run @coderabbitai generate docstrings to improve coverage.

✅ Passed checks (3 passed)

  • Title Check (✅ Passed): The title clearly and concisely summarizes the main functional change, recognizing timestamp-sorted data in the read_parquet() SQL function, directly reflecting the objective of detecting a designated timestamp column for performance optimization.
  • Linked Issues Check (✅ Passed): The pull request implements automatic detection of a designated timestamp from Parquet metadata flags with fallback scanning of row groups, propagates the timestamp index through the native decoder and the Java PartitionDecoder.Metadata API, updates read_parquet() planning to use this index in both parallel and sequential execution paths, and updates tests to verify the feature, fulfilling the objectives of issue #5293.
  • Description Check (✅ Passed): The description mentions closing issue #5293 and notes the compression library updates and Cargo.lock regeneration, accurately reflecting the core feature implementation and dependency changes.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.

@CLAassistant

CLAassistant commented Aug 26, 2025

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
2 out of 3 committers have signed the CLA.

✅ puzpuzpuz
✅ bluestreak01
❌ GitHub Actions - Rebuild Native Libraries


GitHub Actions - Rebuild Native Libraries seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

@puzpuzpuz puzpuzpuz marked this pull request as ready for review August 28, 2025 09:38
@puzpuzpuz
Contributor Author

@coderabbitai review

@coderabbitai

coderabbitai bot commented Aug 28, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 6

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
core/src/main/java/io/questdb/cairo/O3PartitionJob.java (1)

1526-1534: Guard against zero-length Parquet row groups
Before calling decodeRowGroup, skip any row group whose size is zero to prevent invalid merge ranges ([0, -1]):

 core/src/main/java/io/questdb/cairo/O3PartitionJob.java:1526
-        final int rowGroupSize = decoder.metadata().getRowGroupSize(rowGroupIndex);
+        final int rowGroupSize = decoder.metadata().getRowGroupSize(rowGroupIndex);
+        if (rowGroupSize <= 0) {
+            // skip empty Parquet row group to avoid invalid merge ranges
+            continue;
+        }
         decoder.decodeRowGroup(rowGroupBuffers, parquetColumns, rowGroupIndex, 0, rowGroupSize);
core/src/main/java/io/questdb/griffin/engine/functions/table/ReadParquetRecordCursor.java (1)

100-125: Detect metadata drift when designated timestamp mapping changes.

Currently, metadataHasChanged ignores changes in the designated timestamp’s local index (after skipping unsupported columns). This can cause stale cursors when parquet’s timestamp_index shifts.

     public static boolean metadataHasChanged(RecordMetadata metadata, PartitionDecoder decoder) {
         final PartitionDecoder.Metadata parquetMetadata = decoder.metadata();
         if (metadata.getColumnCount() > parquetMetadata.getColumnCount()) {
             return true;
         }
 
-        int metadataIndex = 0;
+        int metadataIndex = 0;
+        final int parquetTsIndex = parquetMetadata.getTimestampIndex();
+        int expectedLocalTsIndex = -1;
         for (int parquetIndex = 0, n = parquetMetadata.getColumnCount(); parquetIndex < n; parquetIndex++) {
             final int parquetType = parquetMetadata.getColumnType(parquetIndex);
             // If the column is not recognized by the decoder, we have to skip it.
             if (ColumnType.isUndefined(parquetType)) {
                 continue;
             }
             if (!Chars.equalsIgnoreCase(parquetMetadata.getColumnName(parquetIndex), metadata.getColumnName(metadataIndex))) {
                 return true;
             }
             final int metadataType = metadata.getColumnType(metadataIndex);
             final boolean symbolRemappingDetected = (metadataType == ColumnType.VARCHAR && parquetType == ColumnType.SYMBOL);
             // No need to compare column types if we deal with symbol remapping.
             if (!symbolRemappingDetected && metadataType != parquetType) {
                 return true;
             }
+            // Track the local index of the designated timestamp after skipping unsupported columns.
+            if (ColumnType.isTimestamp(parquetType) && parquetIndex == parquetTsIndex) {
+                expectedLocalTsIndex = metadataIndex;
+            }
             metadataIndex++;
         }
-        return metadataIndex != metadata.getColumnCount();
+        if (metadataIndex != metadata.getColumnCount()) {
+            return true;
+        }
+        // Recompile if the local timestamp index differs.
+        return expectedLocalTsIndex != metadata.getTimestampIndex();
     }
core/src/main/java/io/questdb/griffin/engine/table/parquet/PartitionDecoder.java (1)

199-206: Reorder native signature to match wrapper call

- private static native long findRowGroupByTimestamp(
-         long decoderPtr,
-         long rowLo,
-         long rowHi,
-         long timestamp,
-         int timestampIndex
- );
+ private static native long findRowGroupByTimestamp(
+         long decoderPtr,
+         long timestamp,
+         long rowLo,
+         long rowHi,
+         int timestampIndex
+ );
🧹 Nitpick comments (14)
core/rust/qdbr/src/parquet/qdb_metadata.rs (2)

112-118: Prefer rustdoc on field; clarify designated-bit note

Use a rustdoc comment on the field so the note surfaces in generated docs and is colocated with the API.

-    // designated timestamp has TYPE_FLAG_DESIGNATED_TIMESTAMP bit set
+    /// Designated timestamp marker: `column_type` has TYPE_FLAG_DESIGNATED_TIMESTAMP set.
     pub column_type: ColumnType,

134-139: Add Default for QdbMetaV1

Implement Default to preserve zero-arg construction ergonomics without downstream churn:

impl Default for QdbMetaV1 {
    fn default() -> Self {
        Self::new(0)
    }
}
core/rust/qdbr/src/parquet_write/schema.rs (1)

379-386: Minor style: avoid mutable temp for column_type

Slightly tighter by constructing in one expression.

-        let mut column_type = column.data_type;
-        if column.designated_timestamp {
-            column_type = column_type.into_designated()?;
-        }
-        qdb_meta
-            .schema
-            .push(QdbMetaCol { column_type, column_top: column.column_top, format });
+        let column_type = if column.designated_timestamp {
+            column.data_type.into_designated()?
+        } else {
+            column.data_type
+        };
+        qdb_meta.schema.push(QdbMetaCol {
+            column_type,
+            column_top: column.column_top,
+            format,
+        });
core/src/main/java/io/questdb/cairo/ParquetTimestampFinder.java (1)

71-106: Minor: add defensive assertions to aid diagnostics

  • findTimestamp(...): optionally assert rowGroupCount > 0 and timestampIdAndType.size() >= 2.
  • timestampAt(...): early-check for metadata.getRowGroupCount() == 0 to fail fast with a clearer message.

Also applies to: 187-194, 209-215

core/src/test/resources/sqllogictest/test/parquet/designated_ts_detection.test (1)

1-1: Nit: keep test name header consistent with file path

Align the # name: with the actual location to avoid confusion in tooling.

Apply this diff:

-# name: test/sql/copy/parquet/designated_ts_detection.test
+# name: test/parquet/designated_ts_detection.test
core/rust/qdbr/src/parquet_read/meta.rs (1)

93-129: Fallback detection via SortingColumns: add nullability and consistency tests

Logic correctly:

  • requires same first sorting column across all row groups,
  • requires ascending,
  • validates Timestamp tag and Required repetition.

Recommend adding tests for:

  • inconsistent first sorting column across row groups,
  • first sorting column not Timestamp,
  • Optional repetition (nulls present),
  • files without sorting columns (expect -1).

If available, also consider validating that sorting columns refer to leaf column indices aligned with schema_descr.columns() to avoid surprises with nested schemas.
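The acceptance conditions listed in this comment (same first sorting column in every row group, ascending order) can be sketched as a standalone function. SortingColumn below is a simplified stand-in for the Parquet metadata struct; the Timestamp-tag and Required-repetition checks from the actual code are omitted.

```rust
// Sketch of the fallback inference: accept a column only when every row
// group names the same first sorting column and sorts it ascending.
// (The real code additionally validates the Timestamp tag and Required
// repetition; those checks are elided here.)
#[derive(Clone, Copy, PartialEq, Debug)]
struct SortingColumn {
    column_idx: usize,
    descending: bool,
}

fn infer_sorted_column(row_groups: &[Vec<SortingColumn>]) -> Option<usize> {
    let first = *row_groups.first()?.first()?;
    if first.descending {
        return None; // sort order must be ascending
    }
    // Every row group must lead with the same sorting column.
    for rg in row_groups {
        if rg.first() != Some(&first) {
            return None;
        }
    }
    Some(first.column_idx)
}

fn main() {
    let asc = SortingColumn { column_idx: 2, descending: false };
    assert_eq!(infer_sorted_column(&[vec![asc], vec![asc]]), Some(2));
    // Inconsistent first sorting column across row groups: no inference.
    let other = SortingColumn { column_idx: 0, descending: false };
    assert_eq!(infer_sorted_column(&[vec![asc], vec![other]]), None);
    // No sorting metadata at all: no inference.
    assert_eq!(infer_sorted_column(&[]), None);
}
```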

core/src/test/java/io/questdb/test/griffin/engine/table/parquet/ReadParquetFunctionTest.java (1)

161-191: Add a result-equality assertion to complement the plan check

Plan text can drift; also validate ORDER BY elimination by asserting result equality against the native table sorted on ts.

Apply after the existing assertPlanNoLeakCheck:

                 assertPlanNoLeakCheck(
                         "select * from read_parquet('x.parquet') order by ts",
                         expectedPlan
                 );
+                // Also verify result correctness when relying on designated timestamp sorting.
+                sink.clear();
+                sink.put("select * from read_parquet('x.parquet') order by ts");
+                assertSqlCursors0("select * from x order by ts");
core/src/main/java/io/questdb/cairo/O3PartitionJob.java (1)

2534-2536: Skip decoding zero-sized row groups when rebuilding symbol indexes

Defensive skip avoids needless decode calls and edge cases when rowGroupSize == 0.

-                        final int rowGroupCount = parquetMetadata.getRowGroupCount();
+                        final int rowGroupCount = parquetMetadata.getRowGroupCount();
                         for (int rowGroupIndex = 0; rowGroupIndex < rowGroupCount; rowGroupIndex++) {
-                            final int rowGroupSize = parquetMetadata.getRowGroupSize(rowGroupIndex);
+                            final int rowGroupSize = parquetMetadata.getRowGroupSize(rowGroupIndex);
+                            if (rowGroupSize == 0) {
+                                continue;
+                            }
core/src/test/java/io/questdb/test/griffin/engine/table/parquet/PartitionDecoderTest.java (3)

57-84: Stabilize assertion against column order assumptions.

Good end-to-end setup. To make the test resilient if column ordering changes, also assert the timestamp column’s name at the reported index.

             execute(
                 "create table x as (select" +
 ...
                 " from long_sequence(" + rows + ")) timestamp(designated_ts) partition by month"
             );
+
+            // Sanity: the designated timestamp column must be named "designated_ts".
+            // This keeps the test robust even if column ordering changes in the future.
+            // (Verified again below after decoding metadata.)

103-108: Also assert the timestamp column name at the detected index.

You already verify the index; checking the name strengthens the test and decouples it from strict ordering.

-                // designated timestamp is the last column
-                Assert.assertEquals(partitionDecoder.metadata().getColumnCount() - 1, partitionDecoder.metadata().getTimestampIndex());
+                // designated timestamp is the last column
+                Assert.assertEquals(partitionDecoder.metadata().getColumnCount() - 1, partitionDecoder.metadata().getTimestampIndex());
+                // and its name matches the designated column used in DDL
+                TestUtils.assertEquals(
+                        "designated_ts",
+                        partitionDecoder.metadata().getColumnName(partitionDecoder.metadata().getTimestampIndex())
+                );

110-115: LGTM on getter migration; consider avoiding hardcoded column count.

Getter usage looks correct. Minor: using readerMeta.getColumnCount() as the loop bound would avoid relying on the local constant ‘columns’.

core/src/main/java/io/questdb/griffin/engine/functions/table/ReadParquetRecordCursor.java (1)

175-181: Pre-size columns to reduce resizes on wide schemas.

When many columns are present, ensuring capacity avoids multiple reallocations.

-            for (int i = 0, n = parquetMetadata.getColumnCount(); i < n; i++) {
+            columns.ensureCapacity(parquetMetadata.getColumnCount() * 2);
+            for (int i = 0, n = parquetMetadata.getColumnCount(); i < n; i++) {
                 final int columnType = parquetMetadata.getColumnType(i);
                 if (!ColumnType.isUndefined(columnType)) {
                     columns.add(i);
                     columns.add(columnType);
                 }
             }
core/src/main/java/io/questdb/griffin/engine/table/parquet/PartitionDecoder.java (2)

243-279: Avoid shadowing the outer ‘metadata’ with the method parameter.

The inner method parameter name ‘metadata’ shadows the outer field ‘metadata’, which is error-prone. Rename the parameter for clarity.

-        public void copyToSansUnsupported(GenericRecordMetadata metadata, boolean treatSymbolsAsVarchar) {
-            metadata.clear();
+        public void copyToSansUnsupported(GenericRecordMetadata dst, boolean treatSymbolsAsVarchar) {
+            dst.clear();
             final int timestampIndex = getTimestampIndex();
             int copyTimestampIndex = -1;
             for (int i = 0, n = getColumnCount(); i < n; i++) {
                 final String columnName = Chars.toString(getColumnName(i));
                 final int columnType = getColumnType(i);
 
                 if (ColumnType.isUndefined(columnType)) {
                     LOG.info().$("unsupported column type, skipping [column=").$(columnName).I$();
                     continue;
                 }
 
                 if (ColumnType.isSymbol(columnType)) {
                     if (treatSymbolsAsVarchar) {
-                        metadata.add(new TableColumnMetadata(columnName, ColumnType.VARCHAR));
+                        dst.add(new TableColumnMetadata(columnName, ColumnType.VARCHAR));
                     } else {
-                        metadata.add(new TableColumnMetadata(
+                        dst.add(new TableColumnMetadata(
                                 columnName,
                                 columnType,
                                 false,
                                 64,
                                 true,
                                 null
                         ));
                     }
                 } else {
-                    metadata.add(new TableColumnMetadata(columnName, columnType));
+                    dst.add(new TableColumnMetadata(columnName, columnType));
                     // Check for designated timestamp's local index.
                     if (ColumnType.isTimestamp(columnType) && i == timestampIndex) {
-                        copyTimestampIndex = metadata.getColumnCount() - 1;
+                        copyTimestampIndex = dst.getColumnCount() - 1;
                     }
                 }
             }
 
-            metadata.setTimestampIndex(copyTimestampIndex);
+            dst.setTimestampIndex(copyTimestampIndex);
         }

317-327: Minor: use int consistently for columnCount.

getColumnCount() returns int; using int locally avoids unnecessary widening.

-            final long columnCount = getColumnCount();
+            final int columnCount = getColumnCount();
             long currentColumnPtr = columnsPtr;
-            for (long i = 0; i < columnCount; i++) {
+            for (int i = 0; i < columnCount; i++) {
                 DirectString str = directStringPool.next();
                 int len = Unsafe.getUnsafe().getInt(currentColumnPtr + COLUMN_RECORD_NAME_SIZE_OFFSET);
                 long colNamePtr = Unsafe.getUnsafe().getLong(currentColumnPtr + COLUMN_RECORD_NAME_PTR_OFFSET);
                 str.of(colNamePtr, len);
                 columnNames.add(str);
                 currentColumnPtr += COLUMN_STRUCT_SIZE;
             }
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 3d345f1 and 25aaf34.

⛔ Files ignored due to path filters (6)
  • core/src/main/resources/io/questdb/bin/darwin-aarch64/libquestdbr.dylib is excluded by !**/*.dylib
  • core/src/main/resources/io/questdb/bin/darwin-x86-64/libquestdbr.dylib is excluded by !**/*.dylib
  • core/src/main/resources/io/questdb/bin/linux-aarch64/libquestdbr.so is excluded by !**/*.so
  • core/src/main/resources/io/questdb/bin/linux-x86-64/libquestdbr.so is excluded by !**/*.so
  • core/src/main/resources/io/questdb/bin/windows-x86-64/questdbr.dll is excluded by !**/*.dll
  • core/src/test/resources/sqllogictest/data/parquet-testing/sorted.parquet is excluded by !**/*.parquet
📒 Files selected for processing (21)
  • core/rust/qdb-core/src/col_type.rs (2 hunks)
  • core/rust/qdbr/src/parquet/qdb_metadata.rs (2 hunks)
  • core/rust/qdbr/src/parquet_read/jni.rs (2 hunks)
  • core/rust/qdbr/src/parquet_read/meta.rs (5 hunks)
  • core/rust/qdbr/src/parquet_read/mod.rs (2 hunks)
  • core/rust/qdbr/src/parquet_write/schema.rs (1 hunks)
  • core/src/main/java/io/questdb/cairo/O3PartitionJob.java (4 hunks)
  • core/src/main/java/io/questdb/cairo/ParquetTimestampFinder.java (5 hunks)
  • core/src/main/java/io/questdb/cairo/TableWriter.java (4 hunks)
  • core/src/main/java/io/questdb/cairo/sql/PageFrameMemoryPool.java (2 hunks)
  • core/src/main/java/io/questdb/griffin/engine/functions/table/ReadParquetPageFrameCursor.java (2 hunks)
  • core/src/main/java/io/questdb/griffin/engine/functions/table/ReadParquetRecordCursor.java (4 hunks)
  • core/src/main/java/io/questdb/griffin/engine/table/BwdTableReaderPageFrameCursor.java (1 hunks)
  • core/src/main/java/io/questdb/griffin/engine/table/FwdTableReaderPageFrameCursor.java (1 hunks)
  • core/src/main/java/io/questdb/griffin/engine/table/parquet/PartitionDecoder.java (7 hunks)
  • core/src/main/java/io/questdb/griffin/engine/table/parquet/PartitionDescriptor.java (1 hunks)
  • core/src/test/java/io/questdb/test/cairo/IntervalBwdPartitionFrameCursorTest.java (1 hunks)
  • core/src/test/java/io/questdb/test/cairo/IntervalFwdPartitionFrameCursorTest.java (1 hunks)
  • core/src/test/java/io/questdb/test/griffin/engine/table/parquet/PartitionDecoderTest.java (3 hunks)
  • core/src/test/java/io/questdb/test/griffin/engine/table/parquet/ReadParquetFunctionTest.java (2 hunks)
  • core/src/test/resources/sqllogictest/test/parquet/designated_ts_detection.test (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (26)
  • GitHub Check: New pull request (SelfHosted Griffin And Fuzz tests on linux-x64-zfs)
  • GitHub Check: New pull request (SelfHosted Other tests on linux-x64-zfs)
  • GitHub Check: New pull request (SelfHosted Griffin And Fuzz tests on linux-arm64)
  • GitHub Check: New pull request (SelfHosted Other tests on linux-arm64)
  • GitHub Check: New pull request (SelfHosted Other tests Start X64Zfs Agent)
  • GitHub Check: New pull request (Hosted Running tests on windows-other)
  • GitHub Check: New pull request (Hosted Running tests on windows-pgwire)
  • GitHub Check: New pull request (Hosted Running tests on windows-cairo)
  • GitHub Check: New pull request (Hosted Running tests on windows-fuzz2)
  • GitHub Check: New pull request (Hosted Running tests on windows-fuzz1)
  • GitHub Check: New pull request (Hosted Running tests on windows-griffin)
  • GitHub Check: New pull request (Hosted Running tests with cover on linux-other)
  • GitHub Check: New pull request (Hosted Running tests on mac-other)
  • GitHub Check: New pull request (Hosted Running tests with cover on linux-pgwire)
  • GitHub Check: New pull request (Hosted Running tests on mac-pgwire)
  • GitHub Check: New pull request (Hosted Running tests with cover on linux-cairo)
  • GitHub Check: New pull request (Hosted Running tests on mac-cairo-fuzz)
  • GitHub Check: New pull request (Hosted Running tests with cover on linux-fuzz)
  • GitHub Check: New pull request (Hosted Running tests on mac-cairo)
  • GitHub Check: New pull request (Hosted Running tests with cover on linux-griffin)
  • GitHub Check: New pull request (SelfHosted Griffin And Fuzz tests Start X64Zfs Agent)
  • GitHub Check: New pull request (Hosted Running tests on mac-griffin)
  • GitHub Check: New pull request (SelfHosted Other tests Start ARM Agent)
  • GitHub Check: New pull request (SelfHosted Griffin And Fuzz tests Start ARM Agent)
  • GitHub Check: New pull request (Rust Test and Lint on linux-jdk17)
  • GitHub Check: New pull request (Check Changes Check changes)
🔇 Additional comments (29)
core/src/main/java/io/questdb/griffin/engine/table/parquet/PartitionDescriptor.java (1)

182-182: No-op change acknowledged

Trailing newline at EOF only. Nothing to action.

core/rust/qdbr/src/parquet_write/schema.rs (1)

371-386: LGTM: write-path sets designated bit and preallocates metadata

Preallocation + explicit push is clean; designated-flag propagation aligns with repetition change for timestamps.

core/src/main/java/io/questdb/griffin/engine/table/BwdTableReaderPageFrameCursor.java (2)

253-253: Getter migration is correct

Using metadata.getRowGroupCount() matches the new API.


259-259: Getter migration is correct

Using metadata.getRowGroupSize(i) matches the new API.

core/src/main/java/io/questdb/griffin/engine/table/FwdTableReaderPageFrameCursor.java (1)

254-266: Getter migration in computeParquetFrame looks correct; edge-case check for empty row-group sets?

API switch to getRowGroupCount()/getRowGroupSize() preserves behavior. Please confirm upstream guarantees rowGroupCount > 0 for PARQUET partitions; otherwise adjustedHi will be 0 and reenterPartitionFrame may spin if partitionHi > 0.

core/src/main/java/io/questdb/cairo/sql/PageFrameMemoryPool.java (1)

270-299: LGTM: Parquet metadata getters adoption is consistent.

The migration to getColumnCount()/getColumnId()/getColumnName() aligns with the new API and keeps mapping logic intact.

core/src/main/java/io/questdb/griffin/engine/functions/table/ReadParquetPageFrameCursor.java (1)

139-141: Getter migration for rowCount/rowGroupCount is correct

No functional changes; values are cached in fields and used consistently in size()/next().

core/src/test/resources/sqllogictest/test/parquet/designated_ts_detection.test (1)

7-11: Test intent is clear and valuable

Good coverage for designated timestamp detection via SAMPLE BY over parquet_scan().

core/src/test/java/io/questdb/test/cairo/IntervalBwdPartitionFrameCursorTest.java (2)

646-648: Getter rename alignment looks good

Updated calls to parquetMetadata.getRowGroupCount()/getRowGroupSize(i) correctly reflect the new API.


646-653: Confirm indexing semantics when iterating decoded row groups

Unrelated to the rename: after each decodeRowGroup(..., 0, size), addr points to the current row-group buffer. The loop indexes r using absolute frame rows (rowHi/rowLo), which may exceed the current row-group buffer unless the decoder materializes absolute offsets. Please verify buffer indexing matches decoder semantics or constrain r to the current row group range.

core/src/test/java/io/questdb/test/cairo/IntervalFwdPartitionFrameCursorTest.java (2)

672-674: Getter rename alignment looks good

Updated to parquetMetadata.getRowGroupCount()/getRowGroupSize(i); consistent with the new API.


672-679: Double-check per-row-group iteration bounds

Same concern as the backward test: r iterates over frame row bounds while addr references the just-decoded row-group buffer. Please confirm or adjust to iterate 0..size per group if the buffer is per-group.

core/rust/qdbr/src/parquet_read/mod.rs (2)

25-27: Adding timestamp_index to ParquetDecoder is sound; ensure Java reads offset at runtime

Field is #[repr(C)] and exported via a dedicated JNI offset accessor; ABI remains stable as Java retrieves offsets dynamically. Confirm Java code uses the new timestampIndexOffset() when reading metadata and honors -1 as “absent”.


102-118: Test update to QdbMeta::new(1) looks correct

Capacity matches a single column; negative test still asserts the intended error path.

core/rust/qdbr/src/parquet_read/jni.rs (2)

201-207: New timestampIndexOffset accessor is correct

Using offset_of! on #[repr(C)] keeps Java side resilient to struct layout changes.


153-160: find_row_group_by_timestamp already rejects out-of-range timestamp_index
The implementation in decode.rs checks

```rust
if timestamp_column_index >= self.col_count {
    return Err(...);
}
```

so passing u32::MAX (from Java's -1) is caught and turned into a thrown exception rather than causing an out-of-bounds access. No additional guard is needed.

Likely an incorrect or invalid review comment.

core/rust/qdbr/src/parquet_read/meta.rs (4)

54-55: Explicitly initializing timestamp_index to -1 is correct

Clear sentinel for “no designated timestamp.”


66-74: Designated bit handling is correct; clears flag after capturing index

Good: capture the designated column via index, then normalize the ColumnType using into_non_designated() so downstream users don’t see a double designation.

Please confirm into_non_designated() cannot fail when is_designated() is true.


82-83: Comment wording improvement acknowledged

“The column is not supported” is clearer.


139-147: Propagating timestamp_index into the returned decoder is correct

Field exposure matches JNI offset accessor and Java metadata reads.

core/src/test/java/io/questdb/test/griffin/engine/table/parquet/ReadParquetFunctionTest.java (1)

445-447: LGTM: expected header updated with designated timestamp column

Passing "a_ts" into assertQueryNoLeakCheck aligns the test with the new designated timestamp propagation in read_parquet().

core/src/main/java/io/questdb/cairo/O3PartitionJob.java (1)

2511-2512: Getter migration LGTM; confirm column-id semantics

Switch to getColumnCount()/getColumnId(idx) looks correct. Please confirm getColumnId returns the writer-index used by tableWriterMetadata (not the physical field order), otherwise the parquetColumnIndex lookup can mismatch.

core/src/main/java/io/questdb/cairo/TableWriter.java (2)

1659-1667: Getter migration in convertPartitionParquetToNative looks correct.

Swapping to getRowGroupCount()/getRowGroupSize() matches the new API; no functional changes introduced here.


5616-5617: No stale Parquet metadata calls remain
Search only hit source.columnCount() in FrameAlgebra.java (unrelated to Parquet metadata); all Parquet metadata accesses now use the updated get* getters.

core/src/test/java/io/questdb/test/griffin/engine/table/parquet/PartitionDecoderTest.java (1)

130-135: LGTM on minimal OOM setup.

DDL reflects the designated timestamp correctly and aligns with later decode usage.

core/src/main/java/io/questdb/griffin/engine/functions/table/ReadParquetRecordCursor.java (2)

195-196: LGTM on getter migration for row count.

Accessor updates are consistent with the new Metadata API.


215-217: LGTM on row-group getters migration.

The new getRowGroupCount()/getRowGroupSize() usage is correct.

core/src/main/java/io/questdb/griffin/engine/table/parquet/PartitionDecoder.java (2)

309-312: LGTM: timestamp index exposure.

getTimestampIndex() correctly reads from the mapped decoder struct via TIMESTAMP_INDEX_OFFSET.


342-344: LGTM: static wiring for TIMESTAMP_INDEX_OFFSET.

Static init calls timestampIndexOffset(); matches the new metadata surface.

@puzpuzpuz puzpuzpuz added SQL Issues or changes relating to SQL execution Performance Performance improvements labels Sep 26, 2025
@bluestreak01
Member

@coderabbitai review

GitHub Actions - Rebuild Native Libraries and others added 2 commits October 2, 2025 07:33
@bluestreak01
Member

fix for glibc failures: #6216

mtopolnik
mtopolnik previously approved these changes Oct 2, 2025
@puzpuzpuz
Contributor Author

@amunra I've removed the Undefined column type from the Rust code in 92f4d05

@mtopolnik I've converted timestamp_index to Option<NonMaxU32> in 3fa6698 and ecf70ae

GitHub Actions - Rebuild Native Libraries and others added 2 commits October 3, 2025 08:21
@puzpuzpuz
Contributor Author

@mtopolnik thanks for the suggestions!

@glasstiger
Contributor

[PR Coverage check]

😍 pass : 139 / 172 (80.81%)

file detail

| path | covered lines | new lines | coverage |
|---|---|---|---|
| 🔵 parquet_read/meta.rs | 23 | 46 | 50.00% |
| 🔵 parquet_read/decode.rs | 14 | 23 | 60.87% |
| 🔵 io/questdb/cairo/sql/PageFrameMemoryPool.java | 5 | 6 | 83.33% |
| 🔵 io/questdb/griffin/engine/table/parquet/PartitionDecoder.java | 18 | 18 | 100.00% |
| 🔵 io/questdb/griffin/engine/functions/table/ReadParquetRecordCursor.java | 8 | 8 | 100.00% |
| 🔵 io/questdb/griffin/engine/table/FwdTableReaderPageFrameCursor.java | 2 | 2 | 100.00% |
| 🔵 io/questdb/griffin/engine/functions/table/ReadParquetPageFrameCursor.java | 3 | 3 | 100.00% |
| 🔵 allocator.rs | 13 | 13 | 100.00% |
| 🔵 io/questdb/cairo/O3PartitionJob.java | 7 | 7 | 100.00% |
| 🔵 io/questdb/cairo/ParquetTimestampFinder.java | 10 | 10 | 100.00% |
| 🔵 io/questdb/griffin/engine/table/BwdTableReaderPageFrameCursor.java | 2 | 2 | 100.00% |
| 🔵 parquet_read/jni.rs | 8 | 8 | 100.00% |
| 🔵 io/questdb/cairo/TableWriter.java | 9 | 9 | 100.00% |
| 🔵 parquet/qdb_metadata.rs | 5 | 5 | 100.00% |
| 🔵 parquet_write/schema.rs | 11 | 11 | 100.00% |
| 🔵 parquet_read/mod.rs | 1 | 1 | 100.00% |


Labels

Performance Performance improvements SQL Issues or changes relating to SQL execution

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Automatically recognize designated timestamp column in read_parquet() SQL function

5 participants