fix(core): fix read_parquet() crash on SYMBOL columns from native parquet files #6865
bluestreak01 merged 6 commits into master
Conversation
canProjectMetadata() passed the parquet file's actual column type (SYMBOL) to the Rust decoder instead of the expected type (VARCHAR) when converting SYMBOL to VARCHAR. The Rust decoder then used the SYMBOL decode path, which writes INT32 symbol keys into the output buffer. The Java side expected VARCHAR data (UTF-8 string pointers), causing a segfault.

Pass expectedType (VARCHAR) for SYMBOL-to-VARCHAR conversions so the Rust decoder selects the correct BaseVarDictDecoder path, which resolves dictionary entries to UTF-8 strings.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
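As a rough illustration of the failure mode (the class, method, and enum names below are invented for this sketch, not QuestDB's actual API): a dictionary-backed decoder picks its output format from the type code it receives, so passing the file's stored type (SYMBOL) makes it emit raw INT32 dictionary keys where the caller expects resolved UTF-8 strings.

```java
import java.util.List;

// Illustrative sketch only: all names here are hypothetical, not QuestDB's real API.
public class DecodeDispatch {
    enum ColumnType { SYMBOL, VARCHAR }

    // A dictionary-encoded column stores INT32 keys plus a dictionary of strings.
    // The output format depends entirely on the type code passed in: SYMBOL
    // returns the raw keys, VARCHAR resolves each key through the dictionary.
    static Object decode(ColumnType requestedType, int[] keys, List<String> dict) {
        if (requestedType == ColumnType.SYMBOL) {
            return keys; // INT32 keys: treating these as string pointers segfaults
        }
        String[] out = new String[keys.length];
        for (int i = 0; i < keys.length; i++) {
            out[i] = dict.get(keys[i]); // resolved UTF-8 strings
        }
        return out;
    }
}
```

The fix described above amounts to handing the decoder the expected type (VARCHAR) rather than the stored type (SYMBOL) at this dispatch point.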
Important: Review skipped — auto reviews are disabled on this repository; check the settings in the CodeRabbit UI or the ⚙️ Run configuration.
Walkthrough: Adjusts Parquet metadata/type handling for SYMBOL→VARCHAR promotion, fixes 64-bit masking when composing column metadata for exports, and adds a test validating native SYMBOL column decoding via read_parquet().
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
@coderabbitai review

✅ Actions performed: Review triggered.
🧹 Nitpick comments (1)
core/src/test/java/io/questdb/test/griffin/engine/table/parquet/ReadParquetFunctionTest.java (1)
74-77: Consolidate INSERT statements into a single multi-row INSERT. As per coding guidelines, use a single INSERT statement to insert multiple rows in tests.
♻️ Proposed fix

```diff
- execute("CREATE TABLE x (id SYMBOL, val INT, ts TIMESTAMP) TIMESTAMP(ts) PARTITION BY DAY");
- execute("INSERT INTO x VALUES ('AAA', 1, '2024-01-01T00:00:00.000000Z')");
- execute("INSERT INTO x VALUES ('BBB', 2, '2024-01-01T01:00:00.000000Z')");
- execute("INSERT INTO x VALUES ('AAA', 3, '2024-01-01T02:00:00.000000Z')");
+ execute("CREATE TABLE x (id SYMBOL, val INT, ts TIMESTAMP) TIMESTAMP(ts) PARTITION BY DAY");
+ execute(
+     """
+     INSERT INTO x VALUES
+         ('AAA', 1, '2024-01-01T00:00:00.000000Z'),
+         ('BBB', 2, '2024-01-01T01:00:00.000000Z'),
+         ('AAA', 3, '2024-01-01T02:00:00.000000Z')
+     """
+ );
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@core/src/test/java/io/questdb/test/griffin/engine/table/parquet/ReadParquetFunctionTest.java` around lines 74 - 77, Combine the three separate execute("INSERT INTO x VALUES ...") calls into a single multi-row INSERT by replacing the three execute calls that insert ('AAA', 1, ...), ('BBB', 2, ...), ('AAA', 3, ...) with one execute("INSERT INTO x VALUES ('AAA', 1, '2024-01-01T00:00:00.000000Z'), ('BBB', 2, '2024-01-01T01:00:00.000000Z'), ('AAA', 3, '2024-01-01T02:00:00.000000Z')"); keep the existing execute method and the surrounding CREATE TABLE call intact so only the INSERTs are consolidated in ReadParquetFunctionTest.
ℹ️ Review info

⚙️ Run configuration
- Configuration used: Path: .coderabbit.yaml
- Review profile: CHILL
- Plan: Pro
- Run ID: 15a4d520-7528-48cc-87ae-2e02faed6eeb

📒 Files selected for processing (2)
- core/src/main/java/io/questdb/griffin/engine/functions/table/ReadParquetRecordCursor.java
- core/src/test/java/io/questdb/test/griffin/engine/table/parquet/ReadParquetFunctionTest.java
@coderabbitai review

✅ Actions performed: Review triggered.
StreamingParquetBenchmarkTest has the same bit-packing pattern as CopyExportRequestTask: when symbolColumnType has bit 31 set (no-null flag), Java sign-extends it to 64 bits before the OR, clobbering the writerIdx in the upper 32 bits. Apply the same 0xFFFFFFFFL mask. Co-Authored-By: Claude Opus 4.6 <[email protected]>
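A minimal sketch of the sign-extension hazard described in this commit (method and constant names are hypothetical, not QuestDB's code): when an int with bit 31 set is OR-ed into a long, Java sign-extends it first, filling the upper 32 bits with 1s and clobbering whatever was packed there.

```java
// Hypothetical names; demonstrates the masking fix, not QuestDB's actual code.
public class ColumnMetaMask {
    static final int NO_NULL_FLAG = 1 << 31; // bit 31 set => negative as an int

    // Buggy: symbolColumnType is sign-extended to 64 bits before the OR,
    // so the upper 32 bits become all 1s and overwrite writerIdx.
    static long composeBuggy(long writerIdx, int symbolColumnType) {
        return (writerIdx << 32) | symbolColumnType;
    }

    // Fixed: mask to the low 32 bits first, as the commit does with 0xFFFFFFFFL.
    static long composeFixed(long writerIdx, int symbolColumnType) {
        return (writerIdx << 32) | (symbolColumnType & 0xFFFFFFFFL);
    }
}
```

With the mask applied, the upper 32 bits hold writerIdx intact while the lower 32 bits carry the full type word, flag bit included.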
core/src/main/java/io/questdb/griffin/engine/functions/table/ReadParquetRecordCursor.java (review thread resolved)
[PR Coverage check] 😍 pass: 3 / 3 (100.00%)
Summary
- read_parquet() crashes with SIGSEGV when reading parquet files containing SYMBOL columns encoded by QuestDB's PartitionEncoder
- canProjectMetadata() passed the actual column type (SYMBOL) to the Rust decoder instead of the expected type (VARCHAR), causing the Rust decoder to write INT32 symbol keys that Java then read as VARCHAR pointers

Test plan
- Adds testNativeSymbolColumnReadBack, which creates a table with SYMBOL columns, encodes it to parquet via PartitionEncoder, and reads it back with read_parquet()

🤖 Generated with Claude Code