
perf(sql): recognize data sorted by timestamp in read_parquet() SQL function #6079

Merged
bluestreak01 merged 33 commits into master from puzpuzpuz_designated_ts_in_read_parquet
Oct 10, 2025

Conversation

@puzpuzpuz
Contributor

@puzpuzpuz puzpuzpuz commented Aug 26, 2025

Closes #5293

Also updates zstd, snappy and other compression algo library versions in parquet2 and regenerates cargo lock files.

@puzpuzpuz puzpuzpuz self-assigned this Aug 26, 2025
@coderabbitai

coderabbitai bot commented Aug 26, 2025

Important

Review skipped

Auto reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

Adds designated-timestamp support across Parquet paths: ColumnType designation API, writer marks designated timestamps, reader records/clears designated flag, infers timestamp when absent, exposes timestamp_index through Rust/JNI/Java, and refactors Java metadata accessors to getters; updates tests and sqllogictest.

Changes

Cohort / File(s) / Summary

  • ColumnType designated API (core/rust/qdb-core/src/col_type.rs): Add pub fn into_non_designated(self) -> CoreResult<ColumnType> and a unit test for designated/non-designated timestamp handling.
  • Parquet metadata, Rust writer (core/rust/qdbr/src/parquet/qdb_metadata.rs, core/rust/qdbr/src/parquet_write/schema.rs): Change QdbMeta::new() → QdbMeta::new(capacity); preallocate the schema; the writer marks the designated timestamp via into_designated() and propagates errors.
  • Parquet decoder, Rust read/meta (core/rust/qdbr/src/parquet_read/meta.rs, core/rust/qdbr/src/parquet_read/mod.rs): Add timestamp_index: i32 to ParquetDecoder; record the designated timestamp index and clear the designated flag before storing types; infer the timestamp by scanning row groups if none is designated; return the decoder with timestamp_index set.
  • Rust JNI surface (core/rust/qdbr/src/parquet_read/jni.rs): Rename the timestamp parameter to timestamp_index in the findRowGroupByTimestamp native binding; add Java_...PartitionDecoder_timestampIndexOffset to return the timestamp_index field offset for JNI.
  • Java PartitionDecoder & Metadata (core/src/main/java/io/questdb/griffin/engine/table/parquet/PartitionDecoder.java): Add TIMESTAMP_INDEX_OFFSET and a timestampIndexOffset() native accessor; rename the native param to timestampIndex; refactor the Metadata API to getters (getColumnCount, getColumnId, getColumnName, getRowCount, getRowGroupCount, getRowGroupSize); add getTimestampIndex() and setTimestampIndex(); update init/copy flows to propagate the timestamp index.
  • Java call sites / engine integration (core/src/main/java/io/questdb/cairo/O3PartitionJob.java, .../TableWriter.java, .../ParquetTimestampFinder.java, .../sql/PageFrameMemoryPool.java, .../engine/table/*PageFrameCursor.java, .../functions/table/ReadParquetPageFrameCursor.java, .../functions/table/ReadParquetRecordCursor.java): Replace field-style metadata accessors with the new getters (getRowGroupCount, getRowGroupSize, getColumnCount, getColumnId, getColumnName, getRowCount, getColumnType); add assertions for non-zero counts.
  • Java tests and SQL tests (core/src/test/java/io/questdb/test/.../PartitionDecoderTest.java, .../ReadParquetFunctionTest.java, .../cairo/IntervalBwdPartitionFrameCursorTest.java, .../cairo/IntervalFwdPartitionFrameCursorTest.java, core/src/test/resources/sqllogictest/test/parquet/designated_ts_detection.test): Update tests to the getter API; add testDesignatedTimestamp() and a sqllogictest validating designated timestamp detection and SAMPLE BY behavior; adjust expected outputs to include the timestamp column.
  • Parquet encoder API change (core/rust/qdbr/parquet2/src/encoding/hybrid_rle/mod.rs, .../encoder.rs): Change the encode_u32 signature to accept an explicit len: usize parameter; update tests to pass len.
  • Misc Rust test API updates (core/rust/qdbr/src/parquet2/Cargo.toml, core/rust/qdb-core/Cargo.toml, core/rust/qdbr/Cargo.toml, core/rust/qdbr/src/allocator.rs, core/rust/qdbr/src/parquet_read/decode.rs, core/rust/qdbr/parquet2/src/encoding/hybrid_rle/encoder.rs): Dev-dependency version bumps; test RNG API changes (thread_rng/gen_range → rng/random_range, etc.); minor test refactors (len capture).
  • CI / trivial (ci/github-release-pipeline.yml, core/src/main/java/io/questdb/griffin/engine/table/parquet/PartitionDescriptor.java): CI displayName fix and trailing newline; no functional change.
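The flag round-trip described above (the writer marks the designated timestamp via into_designated(), the reader records the index and clears the flag via into_non_designated()) can be sketched as follows. This is a simplified, hypothetical model: the u32 encoding, tag value, and bit position are assumptions; only the function names and the TYPE_FLAG_DESIGNATED_TIMESTAMP flag name come from the PR.

```rust
// Simplified sketch of the designated-timestamp flag round-trip.
// The u32 encoding, tag value, and bit position are illustrative
// assumptions, not QuestDB's actual ColumnType layout.
const TYPE_FLAG_DESIGNATED_TIMESTAMP: u32 = 1 << 8; // assumed bit position
const TIMESTAMP_TAG: u32 = 17; // assumed tag for the TIMESTAMP column type

fn into_designated(col_type: u32) -> Result<u32, String> {
    // Only timestamp columns may carry the designated flag.
    if col_type & 0xff != TIMESTAMP_TAG {
        return Err(format!("not a timestamp column type: {col_type}"));
    }
    Ok(col_type | TYPE_FLAG_DESIGNATED_TIMESTAMP)
}

fn into_non_designated(col_type: u32) -> u32 {
    // Clearing the flag restores the plain column type that the reader
    // stores after recording timestamp_index.
    col_type & !TYPE_FLAG_DESIGNATED_TIMESTAMP
}

fn main() {
    let designated = into_designated(TIMESTAMP_TAG).unwrap();
    assert!(designated & TYPE_FLAG_DESIGNATED_TIMESTAMP != 0);
    // Round-trip: clearing the flag yields the original tag.
    assert_eq!(into_non_designated(designated), TIMESTAMP_TAG);
    assert!(into_designated(42).is_err()); // non-timestamp tag is rejected
}
```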

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant SQL as SQL read_parquet()
  participant J as Java PartitionDecoder
  participant JNI as JNI bridge
  participant R as Rust ParquetDecoder
  participant FS as Parquet file scanning

  SQL->>J: open parquet path -> create decoder
  J->>JNI: call native init (path)
  JNI->>R: construct ParquetDecoder (read schema)
  note right of R: For each column<br/>if ColumnType is designated Timestamp:<br/>- record timestamp_index<br/>- clear designated flag before storing ColumnMeta
  R->>FS: scan row groups if timestamp not designated (infer by sorted column)
  FS-->>R: row-group sort info
  R-->>JNI: return decoder pointer (contains timestamp_index)
  JNI-->>J: decoder initialized
  J->>J: metadata.getTimestampIndex()
  alt timestamp_index >= 0
    J->>J: use designated timestamp for TS-specific ops (findRowGroupByTimestamp)
  else
    J->>J: fallback behavior (no designated timestamp)
  end
  J-->>SQL: produce frames/records using discovered timestamp info
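A hedged sketch of the detection order shown in the diagram: a column flagged as designated wins; otherwise the decoder falls back to the result of row-group scanning, with -1 as the "no designated timestamp" sentinel. The column encoding and helper names below are illustrative assumptions, not the actual decoder types.

```rust
// Illustrative model of the timestamp detection order from the diagram:
// a flagged column wins; otherwise an inference result (here passed in as
// a precomputed Option) is used; -1 means "no designated timestamp".
const DESIGNATED_FLAG: u32 = 1 << 8; // assumed flag bit

fn detect_timestamp_index(column_types: &[u32], inferred: Option<usize>) -> i32 {
    for (i, &t) in column_types.iter().enumerate() {
        if t & DESIGNATED_FLAG != 0 {
            return i as i32; // metadata explicitly marks the column
        }
    }
    // Fallback path: row-group scanning produced a candidate (or not).
    inferred.map(|i| i as i32).unwrap_or(-1)
}

fn main() {
    let flagged = [5, 17 | DESIGNATED_FLAG, 9];
    assert_eq!(detect_timestamp_index(&flagged, None), 1);
    let plain = [5, 17, 9];
    assert_eq!(detect_timestamp_index(&plain, Some(1)), 1);
    assert_eq!(detect_timestamp_index(&plain, None), -1);
}
```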

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Suggested reviewers

  • puzpuzpuz

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings)

  • Out of Scope Changes Check (⚠️ Warning): The change to ci/github-release-pipeline.yml, which fixes a display-name typo and alters the OS matrix, is unrelated to the designated timestamp detection and dependency updates documented in the pull request objectives. Resolution: remove the CI pipeline adjustments or split them into a dedicated pull request so this change stays focused on the timestamp detection feature and dependency updates.
  • Docstring Coverage (⚠️ Warning): Docstring coverage is 42.86%, below the required threshold of 80.00%. Resolution: run @coderabbitai generate docstrings to improve coverage.

✅ Passed checks (3 passed)

  • Title Check (✅ Passed): The title clearly and concisely summarizes the main functional change, recognizing timestamp-sorted data in the read_parquet() SQL function, directly reflecting the objective of detecting a designated timestamp column for performance optimization.
  • Linked Issues Check (✅ Passed): The pull request implements automatic detection of a designated timestamp from Parquet metadata flags with fallback scanning of row groups, propagates the timestamp index through the native decoder and the Java PartitionDecoder.Metadata API, updates read_parquet() planning to use this index in both parallel and sequential execution paths, and updates tests to verify the feature, fulfilling the objectives of issue #5293.
  • Description Check (✅ Passed): The description mentions closing issue #5293 and notes the compression library updates and Cargo.lock regeneration, accurately reflecting the core feature implementation and dependency changes.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.

@CLAassistant

CLAassistant commented Aug 26, 2025

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
2 out of 3 committers have signed the CLA.

✅ puzpuzpuz
✅ bluestreak01
❌ GitHub Actions - Rebuild Native Libraries


GitHub Actions - Rebuild Native Libraries seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

@puzpuzpuz puzpuzpuz marked this pull request as ready for review August 28, 2025 09:38
@puzpuzpuz
Contributor Author

@coderabbitai review

@coderabbitai

coderabbitai bot commented Aug 28, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 6

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
core/src/main/java/io/questdb/cairo/O3PartitionJob.java (1)

1526-1534: Guard against zero-length Parquet row groups
Before calling decodeRowGroup, skip any row group whose size is zero to prevent invalid merge ranges ([0, -1]):

 core/src/main/java/io/questdb/cairo/O3PartitionJob.java:1526
-        final int rowGroupSize = decoder.metadata().getRowGroupSize(rowGroupIndex);
+        final int rowGroupSize = decoder.metadata().getRowGroupSize(rowGroupIndex);
+        if (rowGroupSize <= 0) {
+            // skip empty Parquet row group to avoid invalid merge ranges
+            continue;
+        }
         decoder.decodeRowGroup(rowGroupBuffers, parquetColumns, rowGroupIndex, 0, rowGroupSize);
core/src/main/java/io/questdb/griffin/engine/functions/table/ReadParquetRecordCursor.java (1)

100-125: Detect metadata drift when designated timestamp mapping changes.

Currently, metadataHasChanged ignores changes in the designated timestamp’s local index (after skipping unsupported columns). This can cause stale cursors when parquet’s timestamp_index shifts.

     public static boolean metadataHasChanged(RecordMetadata metadata, PartitionDecoder decoder) {
         final PartitionDecoder.Metadata parquetMetadata = decoder.metadata();
         if (metadata.getColumnCount() > parquetMetadata.getColumnCount()) {
             return true;
         }
 
-        int metadataIndex = 0;
+        int metadataIndex = 0;
+        final int parquetTsIndex = parquetMetadata.getTimestampIndex();
+        int expectedLocalTsIndex = -1;
         for (int parquetIndex = 0, n = parquetMetadata.getColumnCount(); parquetIndex < n; parquetIndex++) {
             final int parquetType = parquetMetadata.getColumnType(parquetIndex);
             // If the column is not recognized by the decoder, we have to skip it.
             if (ColumnType.isUndefined(parquetType)) {
                 continue;
             }
             if (!Chars.equalsIgnoreCase(parquetMetadata.getColumnName(parquetIndex), metadata.getColumnName(metadataIndex))) {
                 return true;
             }
             final int metadataType = metadata.getColumnType(metadataIndex);
             final boolean symbolRemappingDetected = (metadataType == ColumnType.VARCHAR && parquetType == ColumnType.SYMBOL);
             // No need to compare column types if we deal with symbol remapping.
             if (!symbolRemappingDetected && metadataType != parquetType) {
                 return true;
             }
+            // Track the local index of the designated timestamp after skipping unsupported columns.
+            if (ColumnType.isTimestamp(parquetType) && parquetIndex == parquetTsIndex) {
+                expectedLocalTsIndex = metadataIndex;
+            }
             metadataIndex++;
         }
-        return metadataIndex != metadata.getColumnCount();
+        if (metadataIndex != metadata.getColumnCount()) {
+            return true;
+        }
+        // Recompile if the local timestamp index differs.
+        return expectedLocalTsIndex != metadata.getTimestampIndex();
     }
core/src/main/java/io/questdb/griffin/engine/table/parquet/PartitionDecoder.java (1)

199-206: Reorder native signature to match wrapper call

- private static native long findRowGroupByTimestamp(
-         long decoderPtr,
-         long rowLo,
-         long rowHi,
-         long timestamp,
-         int timestampIndex
- );
+ private static native long findRowGroupByTimestamp(
+         long decoderPtr,
+         long timestamp,
+         long rowLo,
+         long rowHi,
+         int timestampIndex
+ );
🧹 Nitpick comments (14)
core/rust/qdbr/src/parquet/qdb_metadata.rs (2)

112-118: Prefer rustdoc on field; clarify designated-bit note

Use a rustdoc comment on the field so the note surfaces in generated docs and is colocated with the API.

-    // designated timestamp has TYPE_FLAG_DESIGNATED_TIMESTAMP bit set
+    /// Designated timestamp marker: `column_type` has TYPE_FLAG_DESIGNATED_TIMESTAMP set.
     pub column_type: ColumnType,

134-139: Add Default for QdbMetaV1

Implement Default to preserve zero-arg construction ergonomics without downstream churn:

impl Default for QdbMetaV1 {
    fn default() -> Self {
        Self::new(0)
    }
}
core/rust/qdbr/src/parquet_write/schema.rs (1)

379-386: Minor style: avoid mutable temp for column_type

Slightly tighter by constructing in one expression.

-        let mut column_type = column.data_type;
-        if column.designated_timestamp {
-            column_type = column_type.into_designated()?;
-        }
-        qdb_meta
-            .schema
-            .push(QdbMetaCol { column_type, column_top: column.column_top, format });
+        let column_type = if column.designated_timestamp {
+            column.data_type.into_designated()?
+        } else {
+            column.data_type
+        };
+        qdb_meta.schema.push(QdbMetaCol {
+            column_type,
+            column_top: column.column_top,
+            format,
+        });
core/src/main/java/io/questdb/cairo/ParquetTimestampFinder.java (1)

71-106: Minor: add defensive assertions to aid diagnostics

  • findTimestamp(...): optionally assert rowGroupCount > 0 and timestampIdAndType.size() >= 2.
  • timestampAt(...): early-check for metadata.getRowGroupCount() == 0 to fail fast with a clearer message.

Also applies to: 187-194, 209-215

core/src/test/resources/sqllogictest/test/parquet/designated_ts_detection.test (1)

1-1: Nit: keep test name header consistent with file path

Align the # name: with the actual location to avoid confusion in tooling.

Apply this diff:

-# name: test/sql/copy/parquet/designated_ts_detection.test
+# name: test/parquet/designated_ts_detection.test
core/rust/qdbr/src/parquet_read/meta.rs (1)

93-129: Fallback detection via SortingColumns: add nullability and consistency tests

Logic correctly:

  • requires same first sorting column across all row groups,
  • requires ascending,
  • validates Timestamp tag and Required repetition.

Recommend adding tests for:

  • inconsistent first sorting column across row groups,
  • first sorting column not Timestamp,
  • Optional repetition (nulls present),
  • files without sorting columns (expect -1).

If available, also consider validating that sorting columns refer to leaf column indices aligned with schema_descr.columns() to avoid surprises with nested schemas.
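The acceptance conditions listed in this comment (same first sorting column in every row group, ascending order) can be sketched as a standalone function. SortingColumn below is a simplified stand-in for the Parquet metadata struct; the Timestamp-tag and Required-repetition checks from the actual code are omitted.

```rust
// Sketch of the fallback inference: accept a column only when every row
// group names the same first sorting column and sorts it ascending.
// (The real code additionally validates the Timestamp tag and Required
// repetition; those checks are elided here.)
#[derive(Clone, Copy, PartialEq, Debug)]
struct SortingColumn {
    column_idx: usize,
    descending: bool,
}

fn infer_sorted_column(row_groups: &[Vec<SortingColumn>]) -> Option<usize> {
    let first = *row_groups.first()?.first()?;
    if first.descending {
        return None; // sort order must be ascending
    }
    // Every row group must lead with the same sorting column.
    for rg in row_groups {
        if rg.first() != Some(&first) {
            return None;
        }
    }
    Some(first.column_idx)
}

fn main() {
    let asc = SortingColumn { column_idx: 2, descending: false };
    assert_eq!(infer_sorted_column(&[vec![asc], vec![asc]]), Some(2));
    // Inconsistent first sorting column across row groups: no inference.
    let other = SortingColumn { column_idx: 0, descending: false };
    assert_eq!(infer_sorted_column(&[vec![asc], vec![other]]), None);
    // No sorting metadata at all: no inference.
    assert_eq!(infer_sorted_column(&[]), None);
}
```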

core/src/test/java/io/questdb/test/griffin/engine/table/parquet/ReadParquetFunctionTest.java (1)

161-191: Add a result-equality assertion to complement the plan check

Plan text can drift; also validate ORDER BY elimination by asserting result equality against the native table sorted on ts.

Apply after the existing assertPlanNoLeakCheck:

                 assertPlanNoLeakCheck(
                         "select * from read_parquet('x.parquet') order by ts",
                         expectedPlan
                 );
+                // Also verify result correctness when relying on designated timestamp sorting.
+                sink.clear();
+                sink.put("select * from read_parquet('x.parquet') order by ts");
+                assertSqlCursors0("select * from x order by ts");
core/src/main/java/io/questdb/cairo/O3PartitionJob.java (1)

2534-2536: Skip decoding zero-sized row groups when rebuilding symbol indexes

Defensive skip avoids needless decode calls and edge cases when rowGroupSize == 0.

-                        final int rowGroupCount = parquetMetadata.getRowGroupCount();
+                        final int rowGroupCount = parquetMetadata.getRowGroupCount();
                         for (int rowGroupIndex = 0; rowGroupIndex < rowGroupCount; rowGroupIndex++) {
-                            final int rowGroupSize = parquetMetadata.getRowGroupSize(rowGroupIndex);
+                            final int rowGroupSize = parquetMetadata.getRowGroupSize(rowGroupIndex);
+                            if (rowGroupSize == 0) {
+                                continue;
+                            }
core/src/test/java/io/questdb/test/griffin/engine/table/parquet/PartitionDecoderTest.java (3)

57-84: Stabilize assertion against column order assumptions.

Good end-to-end setup. To make the test resilient if column ordering changes, also assert the timestamp column’s name at the reported index.

             execute(
                 "create table x as (select" +
 ...
                 " from long_sequence(" + rows + ")) timestamp(designated_ts) partition by month"
             );
+
+            // Sanity: the designated timestamp column must be named "designated_ts".
+            // This keeps the test robust even if column ordering changes in the future.
+            // (Verified again below after decoding metadata.)

103-108: Also assert the timestamp column name at the detected index.

You already verify the index; checking the name strengthens the test and decouples it from strict ordering.

-                // designated timestamp is the last column
-                Assert.assertEquals(partitionDecoder.metadata().getColumnCount() - 1, partitionDecoder.metadata().getTimestampIndex());
+                // designated timestamp is the last column
+                Assert.assertEquals(partitionDecoder.metadata().getColumnCount() - 1, partitionDecoder.metadata().getTimestampIndex());
+                // and its name matches the designated column used in DDL
+                TestUtils.assertEquals(
+                        "designated_ts",
+                        partitionDecoder.metadata().getColumnName(partitionDecoder.metadata().getTimestampIndex())
+                );

110-115: LGTM on getter migration; consider avoiding hardcoded column count.

Getter usage looks correct. Minor: using readerMeta.getColumnCount() as the loop bound would avoid relying on the local constant ‘columns’.

core/src/main/java/io/questdb/griffin/engine/functions/table/ReadParquetRecordCursor.java (1)

175-181: Pre-size columns to reduce resizes on wide schemas.

When many columns are present, ensuring capacity avoids multiple reallocations.

-            for (int i = 0, n = parquetMetadata.getColumnCount(); i < n; i++) {
+            columns.ensureCapacity(parquetMetadata.getColumnCount() * 2);
+            for (int i = 0, n = parquetMetadata.getColumnCount(); i < n; i++) {
                 final int columnType = parquetMetadata.getColumnType(i);
                 if (!ColumnType.isUndefined(columnType)) {
                     columns.add(i);
                     columns.add(columnType);
                 }
             }
core/src/main/java/io/questdb/griffin/engine/table/parquet/PartitionDecoder.java (2)

243-279: Avoid shadowing the outer ‘metadata’ with the method parameter.

The inner method parameter name ‘metadata’ shadows the outer field ‘metadata’, which is error-prone. Rename the parameter for clarity.

-        public void copyToSansUnsupported(GenericRecordMetadata metadata, boolean treatSymbolsAsVarchar) {
-            metadata.clear();
+        public void copyToSansUnsupported(GenericRecordMetadata dst, boolean treatSymbolsAsVarchar) {
+            dst.clear();
             final int timestampIndex = getTimestampIndex();
             int copyTimestampIndex = -1;
             for (int i = 0, n = getColumnCount(); i < n; i++) {
                 final String columnName = Chars.toString(getColumnName(i));
                 final int columnType = getColumnType(i);
 
                 if (ColumnType.isUndefined(columnType)) {
                     LOG.info().$("unsupported column type, skipping [column=").$(columnName).I$();
                     continue;
                 }
 
                 if (ColumnType.isSymbol(columnType)) {
                     if (treatSymbolsAsVarchar) {
-                        metadata.add(new TableColumnMetadata(columnName, ColumnType.VARCHAR));
+                        dst.add(new TableColumnMetadata(columnName, ColumnType.VARCHAR));
                     } else {
-                        metadata.add(new TableColumnMetadata(
+                        dst.add(new TableColumnMetadata(
                                 columnName,
                                 columnType,
                                 false,
                                 64,
                                 true,
                                 null
                         ));
                     }
                 } else {
-                    metadata.add(new TableColumnMetadata(columnName, columnType));
+                    dst.add(new TableColumnMetadata(columnName, columnType));
                     // Check for designated timestamp's local index.
                     if (ColumnType.isTimestamp(columnType) && i == timestampIndex) {
-                        copyTimestampIndex = metadata.getColumnCount() - 1;
+                        copyTimestampIndex = dst.getColumnCount() - 1;
                     }
                 }
             }
 
-            metadata.setTimestampIndex(copyTimestampIndex);
+            dst.setTimestampIndex(copyTimestampIndex);
         }

317-327: Minor: use int consistently for columnCount.

getColumnCount() returns int; using int locally avoids unnecessary widening.

-            final long columnCount = getColumnCount();
+            final int columnCount = getColumnCount();
             long currentColumnPtr = columnsPtr;
-            for (long i = 0; i < columnCount; i++) {
+            for (int i = 0; i < columnCount; i++) {
                 DirectString str = directStringPool.next();
                 int len = Unsafe.getUnsafe().getInt(currentColumnPtr + COLUMN_RECORD_NAME_SIZE_OFFSET);
                 long colNamePtr = Unsafe.getUnsafe().getLong(currentColumnPtr + COLUMN_RECORD_NAME_PTR_OFFSET);
                 str.of(colNamePtr, len);
                 columnNames.add(str);
                 currentColumnPtr += COLUMN_STRUCT_SIZE;
             }
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 3d345f1 and 25aaf34.

⛔ Files ignored due to path filters (6)
  • core/src/main/resources/io/questdb/bin/darwin-aarch64/libquestdbr.dylib is excluded by !**/*.dylib
  • core/src/main/resources/io/questdb/bin/darwin-x86-64/libquestdbr.dylib is excluded by !**/*.dylib
  • core/src/main/resources/io/questdb/bin/linux-aarch64/libquestdbr.so is excluded by !**/*.so
  • core/src/main/resources/io/questdb/bin/linux-x86-64/libquestdbr.so is excluded by !**/*.so
  • core/src/main/resources/io/questdb/bin/windows-x86-64/questdbr.dll is excluded by !**/*.dll
  • core/src/test/resources/sqllogictest/data/parquet-testing/sorted.parquet is excluded by !**/*.parquet
📒 Files selected for processing (21)
  • core/rust/qdb-core/src/col_type.rs (2 hunks)
  • core/rust/qdbr/src/parquet/qdb_metadata.rs (2 hunks)
  • core/rust/qdbr/src/parquet_read/jni.rs (2 hunks)
  • core/rust/qdbr/src/parquet_read/meta.rs (5 hunks)
  • core/rust/qdbr/src/parquet_read/mod.rs (2 hunks)
  • core/rust/qdbr/src/parquet_write/schema.rs (1 hunks)
  • core/src/main/java/io/questdb/cairo/O3PartitionJob.java (4 hunks)
  • core/src/main/java/io/questdb/cairo/ParquetTimestampFinder.java (5 hunks)
  • core/src/main/java/io/questdb/cairo/TableWriter.java (4 hunks)
  • core/src/main/java/io/questdb/cairo/sql/PageFrameMemoryPool.java (2 hunks)
  • core/src/main/java/io/questdb/griffin/engine/functions/table/ReadParquetPageFrameCursor.java (2 hunks)
  • core/src/main/java/io/questdb/griffin/engine/functions/table/ReadParquetRecordCursor.java (4 hunks)
  • core/src/main/java/io/questdb/griffin/engine/table/BwdTableReaderPageFrameCursor.java (1 hunks)
  • core/src/main/java/io/questdb/griffin/engine/table/FwdTableReaderPageFrameCursor.java (1 hunks)
  • core/src/main/java/io/questdb/griffin/engine/table/parquet/PartitionDecoder.java (7 hunks)
  • core/src/main/java/io/questdb/griffin/engine/table/parquet/PartitionDescriptor.java (1 hunks)
  • core/src/test/java/io/questdb/test/cairo/IntervalBwdPartitionFrameCursorTest.java (1 hunks)
  • core/src/test/java/io/questdb/test/cairo/IntervalFwdPartitionFrameCursorTest.java (1 hunks)
  • core/src/test/java/io/questdb/test/griffin/engine/table/parquet/PartitionDecoderTest.java (3 hunks)
  • core/src/test/java/io/questdb/test/griffin/engine/table/parquet/ReadParquetFunctionTest.java (2 hunks)
  • core/src/test/resources/sqllogictest/test/parquet/designated_ts_detection.test (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (26)
  • GitHub Check: New pull request (SelfHosted Griffin And Fuzz tests on linux-x64-zfs)
  • GitHub Check: New pull request (SelfHosted Other tests on linux-x64-zfs)
  • GitHub Check: New pull request (SelfHosted Griffin And Fuzz tests on linux-arm64)
  • GitHub Check: New pull request (SelfHosted Other tests on linux-arm64)
  • GitHub Check: New pull request (SelfHosted Other tests Start X64Zfs Agent)
  • GitHub Check: New pull request (Hosted Running tests on windows-other)
  • GitHub Check: New pull request (Hosted Running tests on windows-pgwire)
  • GitHub Check: New pull request (Hosted Running tests on windows-cairo)
  • GitHub Check: New pull request (Hosted Running tests on windows-fuzz2)
  • GitHub Check: New pull request (Hosted Running tests on windows-fuzz1)
  • GitHub Check: New pull request (Hosted Running tests on windows-griffin)
  • GitHub Check: New pull request (Hosted Running tests with cover on linux-other)
  • GitHub Check: New pull request (Hosted Running tests on mac-other)
  • GitHub Check: New pull request (Hosted Running tests with cover on linux-pgwire)
  • GitHub Check: New pull request (Hosted Running tests on mac-pgwire)
  • GitHub Check: New pull request (Hosted Running tests with cover on linux-cairo)
  • GitHub Check: New pull request (Hosted Running tests on mac-cairo-fuzz)
  • GitHub Check: New pull request (Hosted Running tests with cover on linux-fuzz)
  • GitHub Check: New pull request (Hosted Running tests on mac-cairo)
  • GitHub Check: New pull request (Hosted Running tests with cover on linux-griffin)
  • GitHub Check: New pull request (SelfHosted Griffin And Fuzz tests Start X64Zfs Agent)
  • GitHub Check: New pull request (Hosted Running tests on mac-griffin)
  • GitHub Check: New pull request (SelfHosted Other tests Start ARM Agent)
  • GitHub Check: New pull request (SelfHosted Griffin And Fuzz tests Start ARM Agent)
  • GitHub Check: New pull request (Rust Test and Lint on linux-jdk17)
  • GitHub Check: New pull request (Check Changes Check changes)
🔇 Additional comments (29)
core/src/main/java/io/questdb/griffin/engine/table/parquet/PartitionDescriptor.java (1)

182-182: No-op change acknowledged

Trailing newline at EOF only. Nothing to action.

core/rust/qdbr/src/parquet_write/schema.rs (1)

371-386: LGTM: write-path sets designated bit and preallocates metadata

Preallocation + explicit push is clean; designated-flag propagation aligns with repetition change for timestamps.

core/src/main/java/io/questdb/griffin/engine/table/BwdTableReaderPageFrameCursor.java (2)

253-253: Getter migration is correct

Using metadata.getRowGroupCount() matches the new API.


259-259: Getter migration is correct

Using metadata.getRowGroupSize(i) matches the new API.

core/src/main/java/io/questdb/griffin/engine/table/FwdTableReaderPageFrameCursor.java (1)

254-266: Getter migration in computeParquetFrame looks correct; edge-case check for empty row-group sets?

API switch to getRowGroupCount()/getRowGroupSize() preserves behavior. Please confirm upstream guarantees rowGroupCount > 0 for PARQUET partitions; otherwise adjustedHi will be 0 and reenterPartitionFrame may spin if partitionHi > 0.

core/src/main/java/io/questdb/cairo/sql/PageFrameMemoryPool.java (1)

270-299: LGTM: Parquet metadata getters adoption is consistent.

The migration to getColumnCount()/getColumnId()/getColumnName() aligns with the new API and keeps mapping logic intact.

core/src/main/java/io/questdb/griffin/engine/functions/table/ReadParquetPageFrameCursor.java (1)

139-141: Getter migration for rowCount/rowGroupCount is correct

No functional changes; values are cached in fields and used consistently in size()/next().

core/src/test/resources/sqllogictest/test/parquet/designated_ts_detection.test (1)

7-11: Test intent is clear and valuable

Good coverage for designated timestamp detection via SAMPLE BY over parquet_scan().

core/src/test/java/io/questdb/test/cairo/IntervalBwdPartitionFrameCursorTest.java (2)

646-648: Getter rename alignment looks good

Updated calls to parquetMetadata.getRowGroupCount()/getRowGroupSize(i) correctly reflect the new API.


646-653: Confirm indexing semantics when iterating decoded row groups

Unrelated to the rename: after each decodeRowGroup(..., 0, size), addr points to the current row-group buffer. The loop indexes r using absolute frame rows (rowHi/rowLo), which may exceed the current row-group buffer unless the decoder materializes absolute offsets. Please verify buffer indexing matches decoder semantics or constrain r to the current row group range.

core/src/test/java/io/questdb/test/cairo/IntervalFwdPartitionFrameCursorTest.java (2)

672-674: Getter rename alignment looks good

Updated to parquetMetadata.getRowGroupCount()/getRowGroupSize(i); consistent with the new API.


672-679: Double-check per-row-group iteration bounds

Same concern as the backward test: r iterates over frame row bounds while addr references the just-decoded row-group buffer. Please confirm or adjust to iterate 0..size per group if the buffer is per-group.

core/rust/qdbr/src/parquet_read/mod.rs (2)

25-27: Adding timestamp_index to ParquetDecoder is sound; ensure Java reads offset at runtime

Field is #[repr(C)] and exported via a dedicated JNI offset accessor; ABI remains stable as Java retrieves offsets dynamically. Confirm Java code uses the new timestampIndexOffset() when reading metadata and honors -1 as “absent”.


102-118: Test update to QdbMeta::new(1) looks correct

Capacity matches a single column; negative test still asserts the intended error path.

core/rust/qdbr/src/parquet_read/jni.rs (2)

201-207: New timestampIndexOffset accessor is correct

Using offset_of! on #[repr(C)] keeps Java side resilient to struct layout changes.


153-160: find_row_group_by_timestamp already rejects out-of-range timestamp_index
The implementation in decode.rs checks

```rust
if timestamp_column_index >= self.col_count {
    return Err(...);
}
```

so passing u32::MAX (from Java's -1) is caught and turned into a thrown exception rather than causing an out-of-bounds access. No additional guard is needed.

Likely an incorrect or invalid review comment.

core/rust/qdbr/src/parquet_read/meta.rs (4)

54-55: Explicitly initializing timestamp_index to -1 is correct

Clear sentinel for “no designated timestamp.”


66-74: Designated bit handling is correct; clears flag after capturing index

Good: capture the designated column via index, then normalize the ColumnType using into_non_designated() so downstream users don’t see a double designation.

Please confirm into_non_designated() cannot fail when is_designated() is true.


82-83: Comment wording improvement acknowledged

“The column is not supported” is clearer.


139-147: Propagating timestamp_index into the returned decoder is correct

Field exposure matches JNI offset accessor and Java metadata reads.

core/src/test/java/io/questdb/test/griffin/engine/table/parquet/ReadParquetFunctionTest.java (1)

445-447: LGTM: expected header updated with designated timestamp column

Passing "a_ts" into assertQueryNoLeakCheck aligns the test with the new designated timestamp propagation in read_parquet().

core/src/main/java/io/questdb/cairo/O3PartitionJob.java (1)

2511-2512: Getter migration LGTM; confirm column-id semantics

Switch to getColumnCount()/getColumnId(idx) looks correct. Please confirm getColumnId returns the writer-index used by tableWriterMetadata (not the physical field order), otherwise the parquetColumnIndex lookup can mismatch.

core/src/main/java/io/questdb/cairo/TableWriter.java (2)

1659-1667: Getter migration in convertPartitionParquetToNative looks correct.

Swapping to getRowGroupCount()/getRowGroupSize() matches the new API; no functional changes introduced here.


5616-5617: No stale Parquet metadata calls remain
Search only hit source.columnCount() in FrameAlgebra.java (unrelated to Parquet metadata); all Parquet metadata accesses now use the updated get* getters.

core/src/test/java/io/questdb/test/griffin/engine/table/parquet/PartitionDecoderTest.java (1)

130-135: LGTM on minimal OOM setup.

DDL reflects the designated timestamp correctly and aligns with later decode usage.

core/src/main/java/io/questdb/griffin/engine/functions/table/ReadParquetRecordCursor.java (2)

195-196: LGTM on getter migration for row count.

Accessor updates are consistent with the new Metadata API.


215-217: LGTM on row-group getters migration.

The new getRowGroupCount()/getRowGroupSize() usage is correct.

core/src/main/java/io/questdb/griffin/engine/table/parquet/PartitionDecoder.java (2)

309-312: LGTM: timestamp index exposure.

getTimestampIndex() correctly reads from the mapped decoder struct via TIMESTAMP_INDEX_OFFSET.


342-344: LGTM: static wiring for TIMESTAMP_INDEX_OFFSET.

Static init calls timestampIndexOffset(); matches the new metadata surface.

@puzpuzpuz puzpuzpuz added SQL Issues or changes relating to SQL execution Performance Performance improvements labels Sep 26, 2025
@bluestreak01
Member

@coderabbitai review

GitHub Actions - Rebuild Native Libraries and others added 2 commits October 2, 2025 07:33
@bluestreak01
Member

fix for glibc failures: #6216

mtopolnik
mtopolnik previously approved these changes Oct 2, 2025
@puzpuzpuz
Contributor Author

@amunra I've removed the Undefined column type from the Rust code in 92f4d05

@mtopolnik I've converted timestamp_index to Option<NonMaxU32> in 3fa6698 and ecf70ae

GitHub Actions - Rebuild Native Libraries and others added 2 commits October 3, 2025 08:21
@puzpuzpuz
Contributor Author

@mtopolnik thanks for the suggestions!

@glasstiger
Contributor

[PR Coverage check]

😍 pass : 139 / 172 (80.81%)

file detail

| path | covered lines | new lines | coverage |
|---|---|---|---|
| 🔵 parquet_read/meta.rs | 23 | 46 | 50.00% |
| 🔵 parquet_read/decode.rs | 14 | 23 | 60.87% |
| 🔵 io/questdb/cairo/sql/PageFrameMemoryPool.java | 5 | 6 | 83.33% |
| 🔵 io/questdb/griffin/engine/table/parquet/PartitionDecoder.java | 18 | 18 | 100.00% |
| 🔵 io/questdb/griffin/engine/functions/table/ReadParquetRecordCursor.java | 8 | 8 | 100.00% |
| 🔵 io/questdb/griffin/engine/table/FwdTableReaderPageFrameCursor.java | 2 | 2 | 100.00% |
| 🔵 io/questdb/griffin/engine/functions/table/ReadParquetPageFrameCursor.java | 3 | 3 | 100.00% |
| 🔵 allocator.rs | 13 | 13 | 100.00% |
| 🔵 io/questdb/cairo/O3PartitionJob.java | 7 | 7 | 100.00% |
| 🔵 io/questdb/cairo/ParquetTimestampFinder.java | 10 | 10 | 100.00% |
| 🔵 io/questdb/griffin/engine/table/BwdTableReaderPageFrameCursor.java | 2 | 2 | 100.00% |
| 🔵 parquet_read/jni.rs | 8 | 8 | 100.00% |
| 🔵 io/questdb/cairo/TableWriter.java | 9 | 9 | 100.00% |
| 🔵 parquet/qdb_metadata.rs | 5 | 5 | 100.00% |
| 🔵 parquet_write/schema.rs | 11 | 11 | 100.00% |
| 🔵 parquet_read/mod.rs | 1 | 1 | 100.00% |


Labels

Performance Performance improvements SQL Issues or changes relating to SQL execution

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Automatically recognize designated timestamp column in read_parquet() SQL function

5 participants