fix(core): fix read_parquet() crash on SYMBOL columns from native parquet files #6865

Merged
bluestreak01 merged 6 commits into master from fix-read-parquet-symbol-crash on Mar 11, 2026
ideoma (Collaborator) commented on Mar 10, 2026

Summary

  • read_parquet() crashes with SIGSEGV when reading parquet files containing SYMBOL columns encoded by QuestDB's PartitionEncoder
  • canProjectMetadata() passed the actual column type (SYMBOL) to the Rust decoder instead of the expected type (VARCHAR), causing the Rust decoder to write INT32 symbol keys that Java then read as VARCHAR pointers
  • Pass the expected type (VARCHAR) for SYMBOL-to-VARCHAR conversions so the Rust decoder resolves dictionary entries to UTF-8 strings

Test plan

  • Add testNativeSymbolColumnReadBack that creates a table with SYMBOL columns, encodes it to parquet via PartitionEncoder, and reads it back with read_parquet()
  • Parameterized test runs both parallel=true and parallel=false variants (the bug only affected the serial path)
  • Before fix: SIGSEGV crash on parallel=false
  • After fix: both variants pass, returning correct VARCHAR values

🤖 Generated with Claude Code

canProjectMetadata() passed the parquet file's actual column type (SYMBOL)
to the Rust decoder instead of the expected type (VARCHAR) when converting
SYMBOL to VARCHAR. The Rust decoder then used the SYMBOL decode path, which
writes INT32 symbol keys into the output buffer. The Java side expected
VARCHAR data (UTF-8 string pointers), causing a segfault.

Pass expectedType (VARCHAR) for SYMBOL-to-VARCHAR conversions so the Rust
decoder selects the correct BaseVarDictDecoder path that resolves dictionary
entries to UTF-8 strings.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
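
The type selection described in the commit message can be sketched as below. This is only an illustration under stated assumptions: the class name, the method names, and the numeric type codes are hypothetical and are not QuestDB's actual `ColumnType` constants or the real `canProjectMetadata()` code. Returning the actual type for every other combination is also an assumption made to keep the sketch small.

```java
public final class ProjectedTypeSketch {
    // Hypothetical type codes, chosen only for this sketch.
    static final int INT = 1;
    static final int SYMBOL = 2;
    static final int VARCHAR = 3;

    // Behaviour described as the bug: the file's actual type (SYMBOL) reaches
    // the decoder even though the reader expects VARCHAR.
    static int decoderTypeBeforeFix(int actualType, int expectedType) {
        return actualType;
    }

    // Behaviour described as the fix: for a SYMBOL-to-VARCHAR conversion,
    // hand the expected type to the decoder so it produces UTF-8 strings.
    static int decoderTypeAfterFix(int actualType, int expectedType) {
        if (actualType == SYMBOL && expectedType == VARCHAR) {
            return expectedType;
        }
        // Assumption: other combinations keep the actual type.
        return actualType;
    }

    public static void main(String[] args) {
        System.out.println(decoderTypeBeforeFix(SYMBOL, VARCHAR) == SYMBOL);
        System.out.println(decoderTypeAfterFix(SYMBOL, VARCHAR) == VARCHAR);
    }
}
```

In the sketch, the "before" variant asks the decoder for SYMBOL output (INT32 keys) while the reader interprets the buffer as VARCHAR, which is the mismatch the PR describes as leading to the segfault.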
ideoma added the Bug (Incorrect or unexpected behavior) and SQL (Issues or changes relating to SQL execution) labels on Mar 10, 2026
coderabbitai (bot) commented on Mar 10, 2026

Important

Review skipped

Auto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 8643396f-4ae1-4100-9d8d-06d0ad68c508

Walkthrough

Adjusts Parquet metadata/type handling for SYMBOL→VARCHAR promotion, fixes 64-bit masking when composing column metadata for exports, and adds a test validating native SYMBOL column decoding via read_parquet().

Changes

| Cohort / File(s) | Summary |
|---|---|
| Parquet metadata projection: core/src/main/java/io/questdb/griffin/engine/functions/table/ReadParquetRecordCursor.java | In canProjectMetadata, record expectedType (VARCHAR) instead of actualType when a SYMBOL column is promoted to VARCHAR for metadata mapping. |
| Parquet SYMBOL read test: core/src/test/java/io/questdb/test/griffin/engine/table/parquet/ReadParquetFunctionTest.java | Added testNativeSymbolColumnReadBack() to verify read_parquet() decodes QuestDB native Parquet SYMBOL columns to UTF‑8 strings. |
| Export column metadata masking: core/src/main/java/io/questdb/cutlass/parquet/CopyExportRequestTask.java | Mask symbol and column type values to 32 bits before OR'ing with a 64-bit writer index to avoid sign-extension clobbering upper bits when composing columnMetadata. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Suggested reviewers

  • bluestreak01

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 40.00%, which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title accurately summarizes the main fix: addressing a crash in read_parquet() when handling SYMBOL columns from native parquet files. |
| Description check | ✅ Passed | The description clearly relates to the changeset by explaining the root cause (actual type vs expected type mismatch), the solution (passing expected type), and test validation approach. |


ideoma (Collaborator, Author) commented on Mar 10, 2026

@coderabbitai review

coderabbitai (bot) commented on Mar 10, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai (bot) left a comment

🧹 Nitpick comments (1)
core/src/test/java/io/questdb/test/griffin/engine/table/parquet/ReadParquetFunctionTest.java (1)

74-77: Consolidate INSERT statements into a single multi-row INSERT.

As per coding guidelines, use a single INSERT statement to insert multiple rows in tests.

♻️ Proposed fix
-            execute("CREATE TABLE x (id SYMBOL, val INT, ts TIMESTAMP) TIMESTAMP(ts) PARTITION BY DAY");
-            execute("INSERT INTO x VALUES ('AAA', 1, '2024-01-01T00:00:00.000000Z')");
-            execute("INSERT INTO x VALUES ('BBB', 2, '2024-01-01T01:00:00.000000Z')");
-            execute("INSERT INTO x VALUES ('AAA', 3, '2024-01-01T02:00:00.000000Z')");
+            execute("CREATE TABLE x (id SYMBOL, val INT, ts TIMESTAMP) TIMESTAMP(ts) PARTITION BY DAY");
+            execute(
+                    """
+                    INSERT INTO x VALUES
+                        ('AAA', 1, '2024-01-01T00:00:00.000000Z'),
+                        ('BBB', 2, '2024-01-01T01:00:00.000000Z'),
+                        ('AAA', 3, '2024-01-01T02:00:00.000000Z')
+                    """
+            );
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@core/src/test/java/io/questdb/test/griffin/engine/table/parquet/ReadParquetFunctionTest.java`
around lines 74 - 77, Combine the three separate execute("INSERT INTO x VALUES
...") calls into a single multi-row INSERT by replacing the three execute calls
that insert ('AAA', 1, ...), ('BBB', 2, ...), ('AAA', 3, ...) with one
execute("INSERT INTO x VALUES ('AAA', 1, '2024-01-01T00:00:00.000000Z'), ('BBB',
2, '2024-01-01T01:00:00.000000Z'), ('AAA', 3, '2024-01-01T02:00:00.000000Z')");
keep the existing execute method and the surrounding CREATE TABLE call intact so
only the INSERTs are consolidated in ReadParquetFunctionTest.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 15a4d520-7528-48cc-87ae-2e02faed6eeb

📥 Commits

Reviewing files that changed from the base of the PR and between bf956c6 and 5280cc9.

📒 Files selected for processing (2)
  • core/src/main/java/io/questdb/griffin/engine/functions/table/ReadParquetRecordCursor.java
  • core/src/test/java/io/questdb/test/griffin/engine/table/parquet/ReadParquetFunctionTest.java

ideoma (Collaborator, Author) commented on Mar 10, 2026

@coderabbitai review

coderabbitai (bot) commented on Mar 10, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

bluestreak01 and others added 2 commits March 10, 2026 19:35
StreamingParquetBenchmarkTest has the same bit-packing pattern as
CopyExportRequestTask: when symbolColumnType has bit 31 set (no-null
flag), Java sign-extends it to 64 bits before the OR, clobbering the
writerIdx in the upper 32 bits. Apply the same 0xFFFFFFFFL mask.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
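
The sign-extension problem described in this commit message can be reproduced in isolation. The sketch below is not the code from CopyExportRequestTask or StreamingParquetBenchmarkTest: the class and method names and the type code 12 are hypothetical. It assumes only what the message states, namely that the writer index lives in the upper 32 bits, the column type in the lower 32 bits, and bit 31 of the type may be set as a no-null flag.

```java
public final class ColumnMetadataPackingSketch {
    // Bit 31 of the column type, described in the commit message as the no-null flag.
    static final int NO_NULL_FLAG = 1 << 31;

    // Buggy pattern: the int is widened to long with sign extension before the OR,
    // so a type with bit 31 set fills the upper 32 bits with ones.
    static long packUnmasked(int writerIdx, int columnType) {
        return ((long) writerIdx << 32) | columnType;
    }

    // Fixed pattern: mask to 32 bits first so only the lower half is written.
    static long packMasked(int writerIdx, int columnType) {
        return ((long) writerIdx << 32) | (columnType & 0xFFFFFFFFL);
    }

    // Read the writer index back from the upper 32 bits.
    static int writerIdx(long packed) {
        return (int) (packed >>> 32);
    }

    // Read the column type back from the lower 32 bits.
    static int columnType(long packed) {
        return (int) packed;
    }

    public static void main(String[] args) {
        int type = NO_NULL_FLAG | 12; // 12 is an arbitrary, hypothetical type code
        System.out.println(writerIdx(packUnmasked(5, type))); // prints -1 (index clobbered)
        System.out.println(writerIdx(packMasked(5, type)));   // prints 5
    }
}
```

Without the flag, both variants give the same result, which may explain why the pattern went unnoticed until a column type with bit 31 set was packed.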
glasstiger (Contributor) commented

[PR Coverage check]

😍 pass : 3 / 3 (100.00%)

file detail

| path | covered line | new line | coverage |
|---|---|---|---|
| 🔵 io/questdb/griffin/engine/functions/table/ReadParquetRecordCursor.java | 1 | 1 | 100.00% |
| 🔵 io/questdb/cutlass/parquet/CopyExportRequestTask.java | 2 | 2 | 100.00% |

questdb-butler commented

⚠️ Enterprise CI Failed

The enterprise test suite failed for this PR.

Build: View Details
Tested Commit: 8c40b6103d652537db89a08fd00ae104a9a2c668

Please investigate the failure before merging.

bluestreak01 merged commit 8bb20f7 into master on Mar 11, 2026 (50 of 51 checks passed)
bluestreak01 deleted the fix-read-parquet-symbol-crash branch on March 11, 2026 19:55
maciulis pushed a commit to maciulis/questdb that referenced this pull request on Mar 16, 2026