perf(parquet): breaking change 💥 - improve strings decoding performance for parquet files by RaphDal · Pull Request #6809 · questdb/questdb

RaphDal · 2026-02-24T11:06:54Z

This PR brings performance improvements to varchar decoding from parquet files by adding a new internal column type: varchar slice.

VarcharSlice aux entries store (length, pointer) pairs pointing directly into mmapped Parquet pages or per-page decompression buffers, eliminating byte copies on the read path (no inlining/asciiness check).

Breaking change

This PR changes the default encoding for Varchar from Delta Length Byte Array to RLE Dictionary.

Benchmark

Benchmark                 (cardinality)  (rowCount)  (stringLength)  Mode  Cnt   Score   Error  Units
VarcharLegacyChecksum               256      500000               8  avgt    5   5.568 ± 0.554  ms/op
VarcharLegacyChecksum               256      500000              50  avgt    5  17.848 ± 0.860  ms/op
VarcharLegacyChecksum               256      500000             200  avgt    5  80.316 ± 1.340  ms/op
VarcharLegacyDecodeOnly             256      500000               8  avgt    5   4.251 ± 0.096  ms/op
VarcharLegacyDecodeOnly             256      500000              50  avgt    5   7.128 ± 0.128  ms/op
VarcharLegacyDecodeOnly             256      500000             200  avgt    5  16.349 ± 0.221  ms/op
VarcharSliceChecksum                256      500000               8  avgt    5   2.042 ± 0.019  ms/op
VarcharSliceChecksum                256      500000              50  avgt    5  11.010 ± 0.186  ms/op
VarcharSliceChecksum                256      500000             200  avgt    5  64.445 ± 1.199  ms/op
VarcharSliceDecodeOnly              256      500000               8  avgt    5   0.746 ± 0.008  ms/op
VarcharSliceDecodeOnly              256      500000              50  avgt    5   0.722 ± 0.085  ms/op
VarcharSliceDecodeOnly              256      500000             200  avgt    5   0.707 ± 0.018  ms/op

coderabbitai · 2026-02-24T11:07:07Z

Important

Review skipped

Auto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 9c13acaf-6480-49de-b0d9-f6826c422b97

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

🔍 Trigger review

Walkthrough

Adds a new in-memory VARCHAR_SLICE type and end-to-end Parquet slice encoding/decoding support across Rust and Java: new decoders/sinks, schema/metadata ascii flag, writer dictionary path, Java runtime accessors and type mappings, benchmarks, and extensive tests validating aux-pointer semantics and Parquet read/write roundtrips.

Changes

Cohort / File(s)	Summary
New benchmark `benchmarks/src/main/java/org/questdb/VarcharSliceBenchmark.java`	Adds a JMH benchmark that builds Parquet-backed varchar dictionary, opens memory-mapped partition decoder, and compares legacy vs slice decode paths with checksum/ decode-only variants.
Column type & Java integration `core/src/main/java/io/questdb/cairo/ColumnType.java`, `core/src/main/java/io/questdb/cairo/VarcharTypeDriver.java`, `core/src/main/java/io/questdb/griffin/FunctionParser.java`, `core/src/main/java/io/questdb/griffin/SqlCodeGenerator.java`	Introduces `VARCHAR_SLICE` in Java type system; adds slice accessors (getSliceValue/getSliceValueSize); adjusts parser/memoizer and cast matrix to include the new type.
Parquet write: varchar dict encoding & helpers `core/rust/qdbr/src/parquet_write/varchar.rs`, `core/rust/qdbr/src/parquet_write/schema.rs`, `core/rust/qdbr/src/parquet_write/file.rs`, `core/rust/qdbr/src/parquet_write/mod.rs`	Implements dictionary-based varchar_to_dict_pages, append_varchar_slice/append_varchar_slice_null helpers, propagates optional `ascii` metadata, routes Varchar to dictionary path.
Parquet read: decoders, sinks, integration `core/rust/qdbr/src/parquet_read/decode.rs`, `.../row_groups.rs`, `.../column_sink/var.rs`, `.../decoders/delta_length_array.rs`, `.../decoders/rle_dict_varchar_slice.rs`, `.../decoders/mod.rs`, `.../decoders/delta_binary_packed.rs`, `.../decoders/dictionary.rs`, `.../slicer/mod.rs`	Adds VarcharSlice-specific decoders and RLE-dict decoder, new VarcharSliceColumnSink/VarcharSliceSpillSink, spill-pointer fixup, page buffer fields, ascii-aware decode branches, and visibility/API adjustments needed by new decoders.
Decode context & buffers `core/rust/qdbr/src/parquet_read/mod.rs`, `core/rust/qdbr/src/parquet_read/decoders/plain.rs`, `.../rle.rs`, `.../rle_dictionary.rs`	Extends DecodeContext and ColumnChunkBuffers with varchar_slice buffer pools/`page_buffers` to manage per-page buffers and lifetimes for slice decoding.
Row group handling & pointer fixups `core/rust/qdbr/src/parquet_read/row_groups.rs`, `core/rust/qdbr/src/parquet_read/qdb_metadata.rs`	Detects VarcharSlice pages, chooses appropriate buffer strategy, threads `ascii` into metadata (QdbMetaCol.ascii), and performs spill-pointer fixups after decode.
Rust tests & helpers `core/rust/qdbr/tests/varchar_slice.rs`, `core/rust/qdbr/tests/common/mod.rs`, `core/rust/qdbr/benches/decode_page.rs`, `core/rust/qdbr/tests/strings.rs`	Adds comprehensive VarcharSlice tests and bench scaffolding; test helpers for ASCII metadata; expands bench cases for sized/ascii varchar generation.
Java runtime & parquet read usage `core/src/main/java/io/questdb/cairo/sql/PageFrameMemoryPool.java`, `.../PageFrameMemoryRecord.java`, `.../PageFrameFilteredMemoryRecord.java`, `.../ReadParquetRecordCursor.java`, `.../HybridColumnMaterializer.java`	Remaps VARCHAR→VARCHAR_SLICE for PARQUET frames and switches runtime varchar access to slice-based getters when frameFormat == PARQUET.
Maps, deferred key-copy & group-by `core/src/main/java/io/questdb/cairo/map/MapFactory.java`, `core/src/main/java/io/questdb/cairo/map/UnorderedVarcharMap.java`, `core/src/main/java/io/questdb/griffin/engine/groupby/GroupByRecordCursorFactory.java`, `.../GroupByMapFragment.java`	Adds isDeferredKeyCopy flag and overloads to MapFactory; UnorderedVarcharMap supports deferred key-copy and uses makePackComparable for comparisons; call sites updated to pass `true`.
Java tests & test utils `core/src/test/java/.../ReadParquetFunctionTest.java`, `.../UnorderedVarcharMapTest.java`, `.../ColumnTypeTest.java`, `.../TypeOfFunctionFactoryTest.java`, `.../CopyExportTest.java`, `.../TestUtils.java`	Adds many Parquet VARCHAR read tests, deferred-copy tests, a COPY roundtrip test, adjusts TestUtils ASCII assertion and excludes VARCHAR_SLICE where appropriate in unit tests.
Rust core type registration `core/rust/qdb-core/src/col_type.rs`, `core/rust/qdb-core/src/col_driver/mod.rs`	Adds ColumnTypeTag::VarcharSlice (code 40) with name mapping and TryFrom support; col_driver reports formatted error for on-disk driver lookup of VarcharSlice (transient in-memory only).

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

fix(sql): support more Parquet field type combinations in read_parquet #6069 — benchmark and Java-side VARCHAR_SLICE support touching Parquet slice decode and runtime accessors.
perf(sql): optimize parquet decode rowgroup performance #6632 — Parquet read/write decoder and column_sink changes that overlap with VarcharSlice decoder and spill handling.
feat(core): optimize parquet partition read with late materialization, zero-copy page reading, and use raw array encoding #6675 — Introduces VarcharSlice Parquet read path and decoders that this PR benchmarks and tests.

Suggested labels

Performance, SQL

Suggested reviewers

bluestreak01
puzpuzpuz
glasstiger

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 60.08% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description check	✅ Passed	The PR description clearly relates to the changeset, explaining the introduction of VarcharSlice for varchar decoding performance improvements from Parquet files, with supporting benchmarks.
Title check	✅ Passed	The title 'perf(parquet): breaking change 💥 - improve strings decoding performance for parquet files' clearly describes the main change: a breaking-change performance improvement for Parquet string/varchar decoding via the new VarcharSlice type.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch rd_parquet_varchar

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

… across encoders and decoders

…MemoryRecord

… and update related methods

…ssion strategy

RaphDal · 2026-03-04T08:49:18Z

@coderabbitai please review

coderabbitai · 2026-03-04T08:49:39Z

@RaphDal Sure! I'll start reviewing the PR now. I also noticed that the Enterprise CI failed on commit 48a931f8cd66e5938a7c1a7d50978b7f96f7d4f9 — worth investigating that failure before merging.

🧠 Learnings used

Learnt from: javier
Repo: questdb/questdb PR: 0
File: :0-0
Timestamp: 2026-02-06T18:22:28.130Z
Learning: In questdb/questdb, for PRs requested by javier, provide line-cited, push-back style reviews with concrete follow-ups (tests, diffs, and performance considerations).

Learnt from: javier
Repo: questdb/questdb PR: 0
File: :0-0
Timestamp: 2026-02-06T18:22:28.130Z
Learning: For questdb/questdb, javier expects a rigorous, push-back style review; always include concrete line-cited findings and propose targeted tests/fixes when reviewing javier’s PRs.

✅ Actions performed

Full review triggered.

coderabbitai

Actionable comments posted: 17

🧹 Nitpick comments (6)

core/rust/qdbr/tests/strings.rs (1)

178-179: Turn this note into executable coverage (avoid comment drift).

Lines 178-179 document unsupported encodings, but this can silently go stale. Please add explicit tests that assert Encoding::RleDictionary and Encoding::DeltaByteArray return the expected “unsupported” failure for ColumnTypeTag::String (or mark as #[ignore] with TODO if the error contract is still being finalized).

Proposed test scaffold

 #[test]
 fn test_string_delta_length_byte_array() {
     run_string_test("String", Encoding::DeltaLengthByteArray);
 }

-// Note: String type only supports Plain and DeltaLengthByteArray encodings.
-// RleDictionary and DeltaByteArray are not implemented for String decode.
+#[test]
+fn test_string_rle_dictionary_unsupported() {
+    // TODO: replace with exact error assertion once contract is finalized.
+    // e.g. assert!(decode_result.is_err_and(|e| e.to_string().contains("unsupported encoding")));
+}
+
+#[test]
+fn test_string_delta_byte_array_unsupported() {
+    // TODO: replace with exact error assertion once contract is finalized.
+}

Based on learnings: in questdb/questdb reviews requested by javier, provide line-cited push-back with concrete follow-ups (tests/diffs/perf considerations).

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@core/rust/qdbr/tests/strings.rs` around lines 178 - 179, Add executable tests
in core/rust/qdbr/tests/strings.rs that assert attempting to decode
ColumnTypeTag::String with Encoding::RleDictionary and Encoding::DeltaByteArray
fails with the expected “unsupported” error; locate the test module near the
existing String tests and add two cases that call the same decode/reader entry
point used elsewhere in this file (e.g. the routine that maps Encoding ->
decoder) and assert it returns the unsupported/Err variant, or mark each test
with #[ignore] and a TODO comment if the exact error contract is still unstable;
reference Encoding::RleDictionary, Encoding::DeltaByteArray and
ColumnTypeTag::String in the assertions so the test will fail if the
unsupported-list comment drifts.

core/rust/qdbr/benches/decode_page.rs (1)

603-613: VarcharSlice benchmark matrix is currently ASCII-fast-path only.

Line 1845 and Line 1907 force ascii: Some(true), and the generator in Lines 603-613 emits ASCII-only payloads. This misses non-ASCII decode behavior and can overstate generalized gains.

📈 Suggested extension to cover both ASCII and non-ASCII paths

-fn make_varchar_data_sized(row_count: usize, null_pct: u8, str_len: usize) -> VarcharData {
+fn make_varchar_data_sized(
+    row_count: usize,
+    null_pct: u8,
+    str_len: usize,
+    ascii: bool,
+) -> VarcharData {
@@
-            while v.len() < str_len {
-                v.push(b'a' + (v.len() % 26) as u8);
-            }
+            while v.len() < str_len {
+                if ascii {
+                    v.push(b'a' + (v.len() % 26) as u8);
+                } else {
+                    v.extend_from_slice("é".as_bytes());
+                }
+            }
             v.truncate(str_len);
             v
         })
         .collect();
@@
-            let header =
-                ((size as u8) << HEADER_FLAGS_WIDTH as u8) | HEADER_FLAG_INLINED | HEADER_FLAG_ASCII;
+            let header = ((size as u8) << HEADER_FLAGS_WIDTH as u8)
+                | HEADER_FLAG_INLINED
+                | if ascii { HEADER_FLAG_ASCII } else { 0 };
@@
-            let header = ((size as u32) << HEADER_FLAGS_WIDTH) | HEADER_FLAG_ASCII_32;
+            let header = ((size as u32) << HEADER_FLAGS_WIDTH)
+                | if ascii { HEADER_FLAG_ASCII_32 } else { 0 };

-for &str_len in &[2usize, 200] {
+for &(str_len, ascii) in &[(2usize, true), (200usize, true), (200usize, false)] {
     for &null_pct in null_pcts(true) {
-        let data = make_varchar_data_sized(ROW_COUNT, null_pct, str_len);
+        let data = make_varchar_data_sized(ROW_COUNT, null_pct, str_len, ascii);
@@
-            format!("varchar_slice_delta_len_s{str_len}_n{null_pct}"),
+            format!("varchar_slice_delta_len_a{ascii}_s{str_len}_n{null_pct}"),
@@
-            Some(true),
+            Some(ascii),
             ROW_COUNT,
         ));
     }
 }

Mirror the same ascii matrix in the dictionary-encoded varchar_slice_dict_* block.

Also applies to: 1839-1847, 1901-1908

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@core/rust/qdbr/benches/decode_page.rs` around lines 603 - 613, The benchmark
currently only generates ASCII payloads via the values Vec< Vec<u8> > generator
and forces ascii: Some(true) in the varchar_slice and varchar_slice_dict
benchmarks, missing non-ASCII decode paths; update the values generator in
decode_page.rs to produce both ASCII and non-ASCII variants (e.g., include at
least one multibyte UTF-8 character such as "ñ" or a CJK character in some
entries or alternate rows) and extend the benchmark matrix for the
dictionary-encoded varchar_slice_dict_* cases to mirror the same ascii:
Some(true/false) variants (i.e., add ascii: Some(false) runs) so both ASCII and
non-ASCII decode paths are exercised. Ensure you modify the generator that
builds values and the varchar_slice_dict_* benchmark configuration to include
non-ASCII cases.

core/rust/qdb-core/src/col_driver/mod.rs (1)

85-88: Add a targeted test for the new VarcharSlice rejection branch.

You introduced a new error path in try_lookup_driver, but test_lookup_driver currently only validates success cases. A small negative test will lock this behavior down.

✅ Suggested test addition

 #[test]
 fn test_lookup_driver() {
@@
 }

+#[test]
+fn test_lookup_driver_rejects_varchar_slice() {
+    let err = try_lookup_driver(ColumnTypeTag::VarcharSlice.into_type()).unwrap_err();
+    let msg = err.to_string();
+    assert!(msg.contains("transient in-memory type"));
+}

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@core/rust/qdb-core/src/col_driver/mod.rs` around lines 85 - 88, Add a
negative unit test that asserts try_lookup_driver returns the InvalidType error
for ColumnTypeTag::VarcharSlice: update the test suite that contains
test_lookup_driver to include a case calling
try_lookup_driver(ColumnTypeTag::VarcharSlice, ...) and assert it Errs with the
fmt_err InvalidType message (or matches the error variant), so the new rejection
branch in mod.rs is covered; use the same helper/setup used by
test_lookup_driver to obtain any necessary context/arguments.

core/src/test/java/io/questdb/test/cairo/map/UnorderedVarcharMapTest.java (1)

264-292: Add a deferred copyFrom() scenario with unstable source keys.

testDeferredKeyCopyCopyFrom currently validates stable-source behavior. A companion case with unstable source pointers would pin the deferred-copy branch in Key.copyFrom().

✅ Minimal test addition sketch

+    `@Test`
+    public void testDeferredKeyCopyCopyFromUnstable() throws Exception {
+        TestUtils.assertMemoryLeak(() -> {
+            SingleColumnType valueTypes = new SingleColumnType(ColumnType.INT);
+            try (
+                    UnorderedVarcharMap mapA = newDeferredKeyCopyMap(valueTypes);
+                    UnorderedVarcharMap mapB = newDeferredKeyCopyMap(valueTypes)
+            ) {
+                final int n = 1_000;
+                for (int i = 0; i < n; i++) {
+                    MapKey keyA = mapA.withKey();
+                    keyA.putVarchar(new Utf8String("u" + i)); // unstable/on-heap
+
+                    MapKey keyB = mapB.withKey();
+                    keyB.copyFrom(keyA);
+                    MapValue valueB = keyB.createValue();
+                    Assert.assertTrue(valueB.isNew());
+                    valueB.putInt(0, i);
+                }
+                for (int i = 0; i < n; i++) {
+                    Assert.assertEquals(i, get("u" + i, mapB));
+                }
+            }
+        });
+    }

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@core/src/test/java/io/questdb/test/cairo/map/UnorderedVarcharMapTest.java`
around lines 264 - 292, Add a companion test that exercises MapKey.copyFrom()
when the source key is unstable (unpinned) so the deferred-copy branch in
Key.copyFrom() is covered: create a new test (e.g.,
testDeferredKeyCopyCopyFrom_unstable) mirroring testDeferredKeyCopyCopyFrom but
use the unstable-key insertion helper (the counterpart to putStable, e.g.,
putUnstable or the method that leaves source pointers non-pinned) when
populating mapA, then perform the same copyFrom into mapB (using
UnorderedVarcharMap, MapKey.copyFrom, newDeferredKeyCopyMap, DirectUtf8Sink) and
assert sizes and values as in the existing test to verify correctness.

core/rust/qdbr/src/parquet_read/decode.rs (1)

2376-2381: Consider a small test helper for QdbMetaCol construction.

QdbMetaCol { ..., ascii: None } is now repeated across many tests; a helper will reduce churn and future field-drift errors.

♻️ Minimal refactor sketch

+    fn test_meta_col(column_type: ColumnType, format: Option<QdbMetaColFormat>) -> QdbMetaCol {
+        QdbMetaCol {
+            column_type,
+            column_top: 0,
+            format,
+            ascii: None,
+        }
+    }
...
-            let col_info = QdbMetaCol {
-                column_type,
-                column_top: 0,
-                format: None,
-                ascii: None,
-            };
+            let col_info = test_meta_col(column_type, None);

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@core/rust/qdbr/src/parquet_read/decode.rs` around lines 2376 - 2381, Multiple
tests repeat QdbMetaCol { column_type, column_top: 0, format: None, ascii: None
}; add a small test helper to centralize construction (e.g., a function like
test_qdb_meta_col(column_type: Type) or a builder like
QdbMetaCol::test_with(column_type) in the tests/util module) that returns a
QdbMetaCol with column_top=0, format=None, ascii=None so tests call that helper
instead of repeating the literal; update existing tests to use the helper to
reduce duplication and prevent future field-drift.

core/rust/qdbr/src/parquet_read/row_groups.rs (1)

130-147: Extract remapping rules into one helper.

Symbol/Varchar/VarcharSlice compatibility rules are implemented three times. Consolidating this into one function will reduce drift risk between full decode, filtered decode, and stats validation.

Also applies to: 257-270, 779-786

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@core/rust/qdbr/src/parquet_read/row_groups.rs` around lines 130 - 147, The
symbol/varchar/varcharslice compatibility logic is duplicated across
read_parquet, filtered decode, and stats validation (e.g., the blocks comparing
ColumnTypeTag::Symbol, ::Varchar and ::VarcharSlice that reassign column_type =
to_column_type); extract this logic into a single helper function (e.g.,
normalize_parquet_column_type or remap_symbol_varchar) that takes (column_type,
to_column_type) and returns the remapped ColumnType; replace the three
duplicated blocks (including the occurrences around read_parquet and the ranges
noted) with calls to that helper so all callers use the same remapping behavior
and avoid drift.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@benchmarks/src/main/java/org/questdb/VarcharSliceBenchmark.java`:
- Line 85: Update the numeric literal in the JMH `@Param` annotation to use a
grouped numeric literal for readability: change the value in the `@Param`
annotation in VarcharSliceBenchmark (the field annotated with
`@Param`({"500000"})) to use an underscore separator (e.g., "500_000") so it
follows the project's coding guideline for numbers with 5+ digits.

In `@core/rust/qdbr/benches/decode_page.rs`:
- Around line 1857-1877: The loop over varchar_to_dict_pages currently
overwrites data_page on every Page::Data and allows dict to remain None, which
yields incorrect benchmarking; change the loop in the block that builds the
"varchar_dict_s{str_len}_n{null_pct}" case (and the analogous
varchar_slice_dict_* block) to capture the first data page and require a dict:
iterate pages from varchar_to_dict_pages, set dict = Some(p) on the first
Page::Dict and set data_page = Some(p) only if data_page.is_none() on Page::Data
(i.e., keep the first data page encountered), then call build_case with
data_page.expect("data page") and dict.expect("dict page") (or return/error if
dict is missing) so ROW_COUNT reporting aligns with the actual data page used.

In `@core/rust/qdbr/src/parquet_read/column_sink/var.rs`:
- Around line 497-508: The push method in VarcharSliceSpillSink currently builds
the header with (len << 4) using value.len() directly, which can overflow the
28-bit length field; mask or clamp the length to 28 bits before shifting to
preserve the header format. In VarcharSliceSpillSink::push (around the
self.slicer.next() usage and header/combined packing), compute a safe_len = (len
as u32) & 0x0FFF_FFFF (or otherwise cap at 0x0FFF_FFFF) and then form header =
(safe_len << 4) | flags so oversized values do not corrupt the header, keeping
the rest of the spill packing with SPILL_MARKER and offset unchanged.

In `@core/rust/qdbr/src/parquet_read/decode.rs`:
- Around line 1205-1258: Add direct regression tests that exercise
ColumnTypeTag::VarcharSlice through the decode paths shown: DeltaLengthByteArray
(using DeltaLAVarcharSliceDecoder), RleDictionary/PlainDictionary
(RleDictVarcharSliceDecoder), Plain (PlainVarSlicer -> VarcharSliceColumnSink)
and DeltaByteArray (DeltaBytesArraySlicer -> VarcharSliceSpillSink), covering
both FILTERED and FILL_NULLS=true/false permutations; each test should call
decode_page0_mode (or decode_column_chunk for multi-page cases) and assert
correct decoded output and pointer lifetime safety (including the end-of-chunk
fixup behavior and reuse of decompression buffers across pages) so the code
paths and pointer lifetimes in decode.rs are actually exercised.
- Around line 1247-1257: The code constructs a DeltaBytesArraySlicer and
VarcharSliceSpillSink and then calls decode_page0_mode, but
fixup_varchar_slice_spill_pointers is only invoked after a successful return; if
decode_page0_mode errors the spill pointers remain un-fixed. Change the flow in
the DeltaByteArray / VarcharSlice branch so that you capture the result of
decode_page0_mode (e.g. let res = decode_page0_mode::<_, FILTERED,
FILL_NULLS>(...);), then always call fixup_varchar_slice_spill_pointers(&mut
bufs.aux_vec) (or the existing fixup function) before propagating the result
(returning res? or res). Alternatively implement a small scope guard/RAII that
runs fixup_varchar_slice_spill_pointers when the VarcharSliceSpillSink is
dropped; ensure the fixup is invoked unconditionally even when decode_page0_mode
returns Err, referencing DeltaBytesArraySlicer, VarcharSliceSpillSink,
decode_page0_mode, and fixup_varchar_slice_spill_pointers.

In `@core/rust/qdbr/src/parquet_read/decoders/delta_binary_packed.rs`:
- Around line 121-143: The get_end_pointer() implementation currently does
unchecked indexing/slicing (page_data[...] and page_data =
&page_data[miniblock_offset..]) and returns a raw pointer computed from
unchecked offsets; change these to bounds-checked operations: use get() or
checked slicing on page_data before accessing block_bitwidths_offset +
miniblock_index and before slicing with miniblock_offset, validate
zigzag_leb128::decode consumed bytes are within page_data, perform checked
arithmetic when updating miniblock_offset and ensure miniblock_offset <=
original_page_data.len() before computing the pointer, and return a
ParquetResult::Err (e.g., via fmt_err!(Layout, "truncated/corrupt page"))
whenever any bound check fails instead of panicking or producing an
out-of-bounds pointer.

In `@core/rust/qdbr/src/parquet_read/decoders/delta_length_array.rs`:
- Around line 125-135: The decoder must reject string lengths that exceed the
28-bit header capacity before writing VarcharSlice headers; add a check against
MAX_VARCHAR_LEN = (1 << 28) - 1 in advance_data (and the other length-checking
sites referenced around the 239-245 and 347-370 regions) so any len >
MAX_VARCHAR_LEN returns Err(fmt_err!(Layout, "string length exceeds 28-bit
header capacity")), ensure you perform the comparison safely by casting to usize
(or using a const usize) and keep the existing negative/overflow and page-bound
checks in place.

In `@core/rust/qdbr/src/parquet_read/decoders/rle_dict_varchar_slice.rs`:
- Around line 42-46: The loop that builds dict_aux in rle_dict_varchar_slice.rs
packs value length into a 28-bit field (see dict_decoder.dict_values, header and
dict_aux.push), so add an explicit cap check before packing: compute len as u32,
compare to MAX_LEN = 1 << 28 and if len >= MAX_LEN return an error (or propagate
a Result Err) with a clear message about oversized dictionary value instead of
packing it; only proceed to create header and push to dict_aux when len <
MAX_LEN.

In `@core/rust/qdbr/src/parquet_read/row_groups.rs`:
- Around line 580-586: Remove the conditional guard that skips calling
fixup_varchar_slice_spill_pointers when column_chunk_bufs.data_vec.is_empty();
always invoke fixup_varchar_slice_spill_pointers for VarcharSlice DeltaByteArray
pages so SPILL_MARKER entries in aux are cleared even when data_vec.len() == 0.
Locate the two decode paths where column_chunk_bufs.page_buffers is set to
varchar_slice_page_bufs (the block that checks is_varchar_slice and the other
similar branch) and delete the if !column_chunk_bufs.data_vec.is_empty() check
so fixup_varchar_slice_spill_pointers(column_chunk_bufs) runs unconditionally;
optionally add a unit test covering the all-empty-strings DeltaByteArray case to
confirm SPILL_MARKER bytes (4-7) are zeroed.

In `@core/rust/qdbr/src/parquet_write/file.rs`:
- Around line 920-946: The current branch only handles ColumnTypeTag::Varchar
and omits ColumnTypeTag::VarcharSlice, leaving slice-backed varchar columns
unhandled and causing runtime errors; update the condition or add a preceding
branch to also detect ColumnTypeTag::VarcharSlice and route it to the same
varchar dict path (varchar::varchar_to_dict_pages) using the appropriate
auxiliary slice from column.secondary_data, preserving the same bounds logic
that uses column.primary_data, column.column_top, chunk_offset, chunk_length,
options, and primitive_type so both Varchar and VarcharSlice are processed
safely.

In `@core/rust/qdbr/src/parquet_write/varchar.rs`:
- Around line 369-376: The function append_varchar_slice must validate that
value.len() fits in the 28-bit length field before packing into header: compute
len as u64 or usize, check it is <= 0x0FFF_FFFF (28 bits), and if not return an
appropriate ParquetResult::Err with a clear message; only after that cast to u32
and compute header = (len << 4) | (...) as currently done. Update the validation
near the start of append_varchar_slice (before the header calculation and
reserve call) and use the existing ParquetResult error path to signal overflow.

In `@core/rust/qdbr/tests/varchar_slice.rs`:
- Around line 71-73: The contains method currently does unchecked pointer
arithmetic which can overflow; change it to use checked addition: convert len to
u64, call ptr.checked_add(len_u64) and if that returns Some(end) verify ptr >=
self.start && end <= self.end (return false on None) so the range guard cannot
be bypassed by wrapping; update the contains function (referenced by the
contains(&self, ptr: u64, len: usize) -> bool signature) to use checked_add and
short-circuit on overflow before any unsafe dereference.

In `@core/src/main/java/io/questdb/cairo/map/UnorderedVarcharMap.java`:
- Around line 479-484: The hex mask literal in makePackComparable(long
packedHashSizeFlags) should be reformatted with underscores for
readability—replace the current 0x7fffffffffffffffL literal with a grouped
version (e.g., 0x7fff_ffff_ffff_ffffL) while preserving the long suffix and
exact bit pattern so behavior of makePackComparable and any callers remains
unchanged.

In `@core/src/main/java/io/questdb/cairo/VarcharTypeDriver.java`:
- Around line 246-268: Add a defensive check for negative rowNum at the start of
both getSliceValue and getSliceValueSize to avoid auxEntry underflow: in
getSliceValue(long auxAddr, long rowNum, Utf8SplitString utf8SplitView) return
null if rowNum < 0 before computing auxEntry, and in getSliceValueSize(long
auxAddr, long rowNum) return TableUtils.NULL_LEN if rowNum < 0; keep all
existing behavior (VARCHAR_AUX_WIDTH_BYTES, VARCHAR_HEADER_FLAG_NULL, header bit
logic, and Utf8SplitString.of call) otherwise.

In `@core/src/main/java/io/questdb/griffin/SqlCodeGenerator.java`:
- Around line 390-431: The UNION matrix was updated to allow casts involving
VARCHAR_SLICE but generateCastFunctions() lacks handling for fromTag ==
ColumnType.VARCHAR_SLICE in the STRING and VARCHAR branches, so add cases in
generateCastFunctions() to produce cast functions when source is
ColumnType.VARCHAR_SLICE targeting STRING and VARCHAR (and vice-versa if
missing), ensuring the cast table entries match the new matrix; then add
regression tests covering UNION ALL between VARCHAR_SLICE<->VARCHAR and
VARCHAR_SLICE<->STRING and mixed-type INTERSECT/EXCEPT cases to guard against
missing cast implementations.

In `@core/src/main/java/io/questdb/jit/CompiledFilterIRSerializer.java`:
- Around line 607-609: Rename the boolean method TypesObserver.requiresScalar()
to follow the is/has prefix convention (e.g., isScalarRequired() or
hasScalarRequirement()), update all call sites including the call in
CompiledFilterIRSerializer that currently checks typesObserver.requiresScalar(),
and move the renamed method declaration into the proper alphabetical location
among the other instance methods in TypesObserver so member ordering follows the
project's sort rules; ensure method signature and javadoc remain unchanged
except for the name so callers compile cleanly.

In
`@core/src/test/java/io/questdb/test/griffin/engine/table/parquet/ReadParquetFunctionTest.java`:
- Around line 596-599: In the test ReadParquetFunctionTest (the SQL passed to
the execute(...) call), replace the SQL cast expression using CAST(NULL AS
VARCHAR) with the preferred QuestDB cast operator NULL::VARCHAR; update the
string inside the execute("CREATE TABLE x AS (SELECT ...") invocation so the
column definition reads NULL::VARCHAR for v to match project cast syntax.

---

Nitpick comments:
In `@core/rust/qdb-core/src/col_driver/mod.rs`:
- Around line 85-88: Add a negative unit test that asserts try_lookup_driver
returns the InvalidType error for ColumnTypeTag::VarcharSlice: update the test
suite that contains test_lookup_driver to include a case calling
try_lookup_driver(ColumnTypeTag::VarcharSlice, ...) and assert it Errs with the
fmt_err InvalidType message (or matches the error variant), so the new rejection
branch in mod.rs is covered; use the same helper/setup used by
test_lookup_driver to obtain any necessary context/arguments.

In `@core/rust/qdbr/benches/decode_page.rs`:
- Around line 603-613: The benchmark currently only generates ASCII payloads via
the values Vec< Vec<u8> > generator and forces ascii: Some(true) in the
varchar_slice and varchar_slice_dict benchmarks, missing non-ASCII decode paths;
update the values generator in decode_page.rs to produce both ASCII and
non-ASCII variants (e.g., include at least one multibyte UTF-8 character such as
"ñ" or a CJK character in some entries or alternate rows) and extend the
benchmark matrix for the dictionary-encoded varchar_slice_dict_* cases to mirror
the same ascii: Some(true/false) variants (i.e., add ascii: Some(false) runs) so
both ASCII and non-ASCII decode paths are exercised. Ensure you modify the
generator that builds values and the varchar_slice_dict_* benchmark
configuration to include non-ASCII cases.

In `@core/rust/qdbr/src/parquet_read/decode.rs`:
- Around line 2376-2381: Multiple tests repeat QdbMetaCol { column_type,
column_top: 0, format: None, ascii: None }; add a small test helper to
centralize construction (e.g., a function like test_qdb_meta_col(column_type:
Type) or a builder like QdbMetaCol::test_with(column_type) in the tests/util
module) that returns a QdbMetaCol with column_top=0, format=None, ascii=None so
tests call that helper instead of repeating the literal; update existing tests
to use the helper to reduce duplication and prevent future field-drift.

In `@core/rust/qdbr/src/parquet_read/row_groups.rs`:
- Around line 130-147: The symbol/varchar/varcharslice compatibility logic is
duplicated across read_parquet, filtered decode, and stats validation (e.g., the
blocks comparing ColumnTypeTag::Symbol, ::Varchar and ::VarcharSlice that
reassign column_type = to_column_type); extract this logic into a single helper
function (e.g., normalize_parquet_column_type or remap_symbol_varchar) that
takes (column_type, to_column_type) and returns the remapped ColumnType; replace
the three duplicated blocks (including the occurrences around read_parquet and
the ranges noted) with calls to that helper so all callers use the same
remapping behavior and avoid drift.

In `@core/rust/qdbr/tests/strings.rs`:
- Around line 178-179: Add executable tests in core/rust/qdbr/tests/strings.rs
that assert attempting to decode ColumnTypeTag::String with
Encoding::RleDictionary and Encoding::DeltaByteArray fails with the expected
“unsupported” error; locate the test module near the existing String tests and
add two cases that call the same decode/reader entry point used elsewhere in
this file (e.g. the routine that maps Encoding -> decoder) and assert it returns
the unsupported/Err variant, or mark each test with #[ignore] and a TODO comment
if the exact error contract is still unstable; reference
Encoding::RleDictionary, Encoding::DeltaByteArray and ColumnTypeTag::String in
the assertions so the test will fail if the unsupported-list comment drifts.

In `@core/src/test/java/io/questdb/test/cairo/map/UnorderedVarcharMapTest.java`:
- Around line 264-292: Add a companion test that exercises MapKey.copyFrom()
when the source key is unstable (unpinned) so the deferred-copy branch in
Key.copyFrom() is covered: create a new test (e.g.,
testDeferredKeyCopyCopyFrom_unstable) mirroring testDeferredKeyCopyCopyFrom but
use the unstable-key insertion helper (the counterpart to putStable, e.g.,
putUnstable or the method that leaves source pointers non-pinned) when
populating mapA, then perform the same copyFrom into mapB (using
UnorderedVarcharMap, MapKey.copyFrom, newDeferredKeyCopyMap, DirectUtf8Sink) and
assert sizes and values as in the existing test to verify correctness.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 8d85cb9e-c65d-4e21-a859-5ec0dfa10743

📥 Commits

Reviewing files that changed from the base of the PR and between 5c98ed5 and 6015090.

⛔ Files ignored due to path filters (8)

core/src/main/bin/linux-aarch64/libjemalloc.so is excluded by !**/*.so
core/src/main/resources/io/questdb/bin/darwin-aarch64/libquestdbr.dylib is excluded by !**/*.dylib
core/src/main/resources/io/questdb/bin/darwin-x86-64/libquestdbr.dylib is excluded by !**/*.dylib
core/src/main/resources/io/questdb/bin/linux-aarch64/libquestdb.so is excluded by !**/*.so
core/src/main/resources/io/questdb/bin/linux-aarch64/libquestdbr.so is excluded by !**/*.so
core/src/main/resources/io/questdb/bin/linux-x86-64/libquestdbr.so is excluded by !**/*.so
core/src/main/resources/io/questdb/bin/windows-x86-64/libquestdb.dll is excluded by !**/*.dll
core/src/main/resources/io/questdb/bin/windows-x86-64/questdbr.dll is excluded by !**/*.dll

📒 Files selected for processing (44)

benchmarks/src/main/java/org/questdb/VarcharSliceBenchmark.java
core/rust/qdb-core/src/col_driver/mod.rs
core/rust/qdb-core/src/col_type.rs
core/rust/qdbr/benches/decode_page.rs
core/rust/qdbr/src/parquet/qdb_metadata.rs
core/rust/qdbr/src/parquet_read/column_sink/var.rs
core/rust/qdbr/src/parquet_read/decode.rs
core/rust/qdbr/src/parquet_read/decoders/delta_binary_packed.rs
core/rust/qdbr/src/parquet_read/decoders/delta_length_array.rs
core/rust/qdbr/src/parquet_read/decoders/dictionary.rs
core/rust/qdbr/src/parquet_read/decoders/mod.rs
core/rust/qdbr/src/parquet_read/decoders/plain.rs
core/rust/qdbr/src/parquet_read/decoders/rle.rs
core/rust/qdbr/src/parquet_read/decoders/rle_dict_varchar_slice.rs
core/rust/qdbr/src/parquet_read/decoders/rle_dictionary.rs
core/rust/qdbr/src/parquet_read/mod.rs
core/rust/qdbr/src/parquet_read/row_groups.rs
core/rust/qdbr/src/parquet_read/slicer/mod.rs
core/rust/qdbr/src/parquet_write/file.rs
core/rust/qdbr/src/parquet_write/mod.rs
core/rust/qdbr/src/parquet_write/schema.rs
core/rust/qdbr/src/parquet_write/varchar.rs
core/rust/qdbr/tests/common/mod.rs
core/rust/qdbr/tests/strings.rs
core/rust/qdbr/tests/varchar_slice.rs
core/src/main/java/io/questdb/cairo/ColumnType.java
core/src/main/java/io/questdb/cairo/VarcharTypeDriver.java
core/src/main/java/io/questdb/cairo/map/MapFactory.java
core/src/main/java/io/questdb/cairo/map/UnorderedVarcharMap.java
core/src/main/java/io/questdb/cairo/sql/PageFrameFilteredMemoryRecord.java
core/src/main/java/io/questdb/cairo/sql/PageFrameMemoryPool.java
core/src/main/java/io/questdb/cairo/sql/PageFrameMemoryRecord.java
core/src/main/java/io/questdb/griffin/FunctionParser.java
core/src/main/java/io/questdb/griffin/SqlCodeGenerator.java
core/src/main/java/io/questdb/griffin/engine/functions/table/ReadParquetRecordCursor.java
core/src/main/java/io/questdb/griffin/engine/groupby/GroupByRecordCursorFactory.java
core/src/main/java/io/questdb/griffin/engine/table/GroupByMapFragment.java
core/src/main/java/io/questdb/jit/CompiledFilterIRSerializer.java
core/src/main/java/io/questdb/std/str/Utf8SplitString.java
core/src/test/java/io/questdb/test/cairo/ColumnTypeTest.java
core/src/test/java/io/questdb/test/cairo/map/UnorderedVarcharMapTest.java
core/src/test/java/io/questdb/test/griffin/engine/functions/catalogue/TypeOfFunctionFactoryTest.java
core/src/test/java/io/questdb/test/griffin/engine/table/parquet/ReadParquetFunctionTest.java
core/src/test/java/io/questdb/test/tools/TestUtils.java

benchmarks/src/main/java/org/questdb/VarcharSliceBenchmark.java

core/rust/qdbr/benches/decode_page.rs

core/rust/qdbr/src/parquet_read/column_sink/var.rs

core/rust/qdbr/src/parquet_read/decode.rs

core/src/main/java/io/questdb/cairo/map/UnorderedVarcharMap.java

core/src/main/java/io/questdb/cairo/VarcharTypeDriver.java

core/src/main/java/io/questdb/griffin/SqlCodeGenerator.java

core/src/main/java/io/questdb/jit/CompiledFilterIRSerializer.java

core/src/test/java/io/questdb/test/griffin/engine/table/parquet/ReadParquetFunctionTest.java

…ad filter condition

…rd_parquet_varchar

…pParquetExportTest

…s in LimitedSizeLongTreeChain

… filtered fill-nulls tests

…st cases

bluestreak01 · 2026-03-11T20:35:27Z

@RaphDal I reviewed the PR in detail — memory lifetime, Rust decoders, Java type integration, and parquet compatibility. The pointer lifecycle design is sound (no use-after-free found), and backward/forward compat looks good. A few findings below:

Confirmed Issues

1. (HIGH) Dictionary size overflow in write path — `varchar.rs:202,232`

let key = dict_entries.len() as u32;  // line 202: silent truncation if > 2^32 entries
// ...
(dict_entries.len() - 1) as u32       // line 232: wraps to u32::MAX

If a column has more than 2^32 unique strings, dict_entries.len() as u32 silently wraps. This corrupts dictionary indices and bits_per_key in the output parquet file. The read side (rle_dict_varchar_slice.rs:42) validates dict value lengths, but the write side doesn't check NDV.

Suggested fix: Add if dict_entries.len() >= u32::MAX as usize { return Err(...) } before the cast.

2. (MEDIUM) No alignment guard in `fixup_varchar_slice_spill_pointers` — `var.rs:587`

let entry_count = aux.len() / VARCHAR_AUX_SIZE;

If aux_vec.len() is ever not a multiple of 16 (e.g. due to a bug upstream), the integer division silently drops the remainder. The truncated tail entry's spill offset would never be fixed up — silent data corruption.

Suggested fix: Add debug_assert!(aux.len() % VARCHAR_AUX_SIZE == 0).

3. (MEDIUM) Unchecked pointer arithmetic in fixup — `var.rs:617`

let ptr = data_base + offset;

If a malformed parquet file produces a very large offset, data_base + offset wraps on u64, producing an invalid pointer stored in the aux entry and dereferenced later.

Suggested fix: Use data_base.checked_add(offset).ok_or_else(|| fmt_err!(...)) or at minimum a debug assertion.

4. (LOW) `RleDictVarcharSliceDecoder::result()` always returns `Ok` — `rle_dict_varchar_slice.rs:388`

Unlike RleDictionaryDecoder which checks accumulated errors in result(), this decoder's result() is a no-op. Individual push/push_slice calls do propagate errors, so this is defense-in-depth rather than a correctness bug, but the inconsistency with the existing decoder pattern is worth noting.

Things that look good

Memory lifetime: No use-after-free. The flyweight pattern, PageFrameMemoryPool usage-bit tracking, LimitedSizeLongTreeChain frame-change detection, and unstable-string flags all work correctly together.
Backward compat: Old DLBA-encoded files are correctly read via DeltaLAVarcharSliceDecoder.
Java integration: VARCHAR_SLICE is properly normalized to VARCHAR in all code paths that feed into MapFactory, GroupByUtils, JOIN key types, and UNION cast resolution. The UNION_CAST_MATRIX row 40 correctly mirrors row 26.
No panics in production paths: All reserve() contracts are satisfied by callers in decode.rs. The unpacked array in the bitpacked iterator is fixed-size ([u32; 32]), so the pointer from as_ref().as_ptr() stays valid across decode_next_pack() calls.

bluestreak01 · 2026-03-11T20:39:40Z

Correction to my review above — after double-checking, I need to retract/downgrade several findings:

Retracted (false positives)

Finding Abstract configuration builder #3 (data_base + offset overflow): The offset is always data_vec.len() written by the spill sink (line 525 of var.rs), not from untrusted parquet input. Since data_base + offset stays within a valid allocation, the addition cannot overflow. Not a real issue.
Finding Parameterize TxListener.onCommit() #4 (result() always Ok): This is actually correct by design. The existing RleDictionaryDecoder needs deferred error storage because its inner Slicer::skip() returns (), so it can't propagate errors inline. The new RleDictVarcharSliceDecoder has skip() -> ParquetResult<()> — all errors propagate immediately via ?. There's nothing to accumulate, so result() -> Ok(()) is correct.

Downgraded

Finding Event Appender #1 (dict overflow): HIGH → LOW. The write path processes one row group at a time (default 100K rows). Overflowing u32 requires >4 billion unique strings in a single row group, which would need >100 GB of HashMap memory. Practically unreachable, though an as u32 guard would be textbook defensive coding.
Finding Ignore unsupported field types #2 (alignment guard): MEDIUM → LOW. Every writer adds aux entries in multiples of AUX_ENTRY_SIZE = 16 via extend_from_slice or set_len, so aux.len() % 16 == 0 by construction. A debug_assert! would be nice but this isn't a real bug.

Net: no real bugs found. The implementation is solid. Apologies for the false alarms — the initial review was too aggressive on edge cases without properly tracing the data origins.

glasstiger · 2026-03-11T21:35:08Z

[PR Coverage check]

😍 pass : 4962 / 5117 (96.97%)

file detail

	path	covered line	new line	coverage
🔵	qdb-core/src/col_driver/mod.rs	0	4	00.00%
🔵	io/questdb/griffin/SqlCodeGenerator.java	1	2	50.00%
🔵	qdb-core/src/col_type.rs	1	2	50.00%
🔵	io/questdb/cairo/sql/PageFrameFilteredMemoryRecord.java	2	4	50.00%
🔵	io/questdb/griffin/RecordToRowCopierUtils.java	4	7	57.14%
🔵	io/questdb/griffin/LoopingRecordToRowCopier.java	3	5	60.00%
🔵	qdbr/src/parquet_write/file.rs	22	34	64.71%
🔵	io/questdb/cutlass/parquet/HybridColumnMaterializer.java	2	3	66.67%
🔵	io/questdb/cairo/map/MapFactory.java	3	4	75.00%
🔵	qdbr/src/parquet_read/row_groups.rs	102	129	79.07%
🔵	qdbr/tests/common/mod.rs	84	93	90.32%
🔵	qdbr/src/parquet_read/decode.rs	67	72	93.06%
🔵	qdbr/src/parquet_write/varchar.rs	277	295	93.90%
🔵	qdbr/src/parquet_read/column_sink/var.rs	113	119	94.96%
🔵	qdbr/tests/row_groups.rs	114	121	94.21%
🔵	qdbr/src/parquet_read/decoders/delta_binary_packed.rs	531	556	95.50%
🔵	qdbr/src/parquet_write/schema.rs	19	20	95.00%
🔵	qdbr/tests/varchar_slice.rs	1161	1185	97.97%
🔵	qdbr/src/parquet_read/decoders/rle_dict_varchar_slice.rs	1503	1504	99.93%
🔵	qdbr/src/parquet_read/decoders/delta_length_array.rs	845	850	99.41%
🔵	io/questdb/griffin/engine/functions/table/ReadParquetRecordCursor.java	9	9	100.00%
🔵	qdbr/src/parquet_read/mod.rs	3	3	100.00%
🔵	io/questdb/cairo/sql/PageFrameMemoryRecord.java	4	4	100.00%
🔵	io/questdb/std/str/Utf8SplitString.java	7	7	100.00%
🔵	qdbr/src/parquet/qdb_metadata.rs	11	11	100.00%
🔵	io/questdb/griffin/engine/table/GroupByMapFragment.java	2	2	100.00%
🔵	io/questdb/griffin/SqlCompilerImpl.java	1	1	100.00%
🔵	qdbr/src/parquet_read/decoders/plain.rs	1	1	100.00%
🔵	io/questdb/cairo/map/UnorderedVarcharMap.java	16	16	100.00%
🔵	qdbr/src/parquet_read/decoders/rle_dictionary.rs	1	1	100.00%
🔵	qdbr/src/parquet_read/decoders/rle.rs	1	1	100.00%
🔵	io/questdb/jit/CompiledFilterIRSerializer.java	1	1	100.00%
🔵	io/questdb/griffin/engine/orderby/LimitedSizeLongTreeChain.java	12	12	100.00%
🔵	qdbr/src/parquet_read/slicer/mod.rs	2	2	100.00%
🔵	io/questdb/cairo/ColumnType.java	15	15	100.00%
🔵	io/questdb/griffin/engine/groupby/GroupByRecordCursorFactory.java	1	1	100.00%
🔵	io/questdb/cairo/VarcharTypeDriver.java	15	15	100.00%
🔵	io/questdb/cairo/sql/PageFrameMemoryPool.java	6	6	100.00%

…ce for parquet files (#6809)

…ce for parquet files (questdb#6809)

RaphDal added the DO NOT MERGE These changes should not be merged to main branch label Feb 24, 2026

RaphDal force-pushed the rd_parquet_varchar branch 5 times, most recently from 6a21f4b to 0481828 Compare March 3, 2026 15:57

RaphDal mentioned this pull request Mar 3, 2026

feat(sql): add per-column parquet encoding/compression config #6843

Merged

RaphDal and others added 14 commits March 4, 2026 09:24

feat: add varchar slice type for zero-copy parquet decoding

829dc7a

feat: add VARCHAR_SLICE to supported column types in ColumnTypeTest

8dd4dd4

feat: add support for VARCHAR retrieval in PARQUET format

cf795f5

Rebuild CXX libraries

b6c862e

refactor: update VarcharSlice handling to use a unified header format…

4b2d667

… across encoders and decoders

Rebuild CXX libraries

c382921

fix: correct Varchar handling for PARQUET format in PageFrameFiltered…

8aaac46

…MemoryRecord

fix: add VARCHAR_SLICE type handling in TypeOfFunctionFactoryTest

230c4a4

fix: restore AARCH64 architecture support in JitUtil

60d3125

fix: enhance ASCII detection for Varchar columns in to_parquet_schema

9f267c1

fix: implement deferred key copy functionality in UnorderedVarcharMap…

41add8b

… and update related methods

fix: enhance varchar slice handling in DecodeContext and add decompre…

dbc47fb

…ssion strategy

fix: add ascii field to ParquetDecoder for enhanced column handling

e3fe67a

Rebuild Rust libraries

8554e85

RaphDal force-pushed the rd_parquet_varchar branch from 48a931f to 8554e85 Compare March 4, 2026 08:24

Rebuild Rust libraries

6015090

coderabbitai bot reviewed Mar 4, 2026

View reviewed changes

RaphDal added 3 commits March 4, 2026 11:07

fix: benchmark should fail if some data pages are ignored

cf6df70

fix: remove unnecessary scalar execution checks and update parquet re…

8975473

…ad filter condition

fix: enhance error handling for varchar_slice spill value length

47c20ca

RaphDal and others added 19 commits March 6, 2026 13:33

tests: adding unit-tests to decoders on top of existing e2e tests

d0ccb55

Rebuild Rust libraries

17df687

tests: more unit tests

dd02ad9

Merge branch 'rd_parquet_varchar' of github.com:questdb/questdb into …

a64052e

…rd_parquet_varchar

clippy

66db2f0

formatting

5eaa762

Merge branch 'master' into rd_parquet_varchar

4606bf5

fix: Fix use-after-free from in parallel top-k

cfc3e34

Merge branch 'master' into rd_parquet_varchar

f147ebe

Rebuild Rust libraries

f19293b

Merge branch 'master' into rd_parquet_varchar

c266a73

fix(test): handle varying error messages for query cancellation in Ex…

fda59f3

…pParquetExportTest

Merge branch 'master' into rd_parquet_varchar

93ee1a3

Rebuild Rust libraries

e8985b0

Merge branch 'master' into rd_parquet_varchar

1c25c30

fix(orderby): ensure comparator's left side is valid for frame change…

828854d

…s in LimitedSizeLongTreeChain

fix(varchar): improve error handling in varchar_to_dict_pages and add…

2d00323

… filtered fill-nulls tests

test(varchar): add tests for VARCHAR handling and improve existing te…

4a11aa7

…st cases

Rebuild Rust libraries

5419720

RaphDal changed the title ~~perf(parquet): improve strings decoding performance for parquet files~~ perf(parquet): breaking change 💥 - improve strings decoding performance for parquet files Mar 11, 2026

Merge remote-tracking branch 'origin/master' into rd_parquet_varchar

0313740

fix merge issue

78170c8

bluestreak01 approved these changes Mar 11, 2026

View reviewed changes

bluestreak01 merged commit c5dc7a1 into master Mar 11, 2026
53 checks passed

bluestreak01 deleted the rd_parquet_varchar branch March 11, 2026 23:28

mtopolnik pushed a commit that referenced this pull request Mar 12, 2026

perf(parquet): breaking change 💥 - improve strings decoding performan…

d124163

…ce for parquet files (#6809)

maciulis pushed a commit to maciulis/questdb that referenced this pull request Mar 16, 2026

perf(parquet): breaking change 💥 - improve strings decoding performan…

4c85d32

…ce for parquet files (questdb#6809)

Conversation

RaphDal commented Feb 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Breaking change

Benchmark

Uh oh!

coderabbitai bot commented Feb 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review skipped

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Suggested labels

Suggested reviewers

❌ Failed checks (1 warning)

Uh oh!

RaphDal commented Mar 4, 2026

Uh oh!

coderabbitai bot commented Mar 4, 2026

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

bluestreak01 commented Mar 11, 2026

Confirmed Issues

1. (HIGH) Dictionary size overflow in write path — varchar.rs:202,232

2. (MEDIUM) No alignment guard in fixup_varchar_slice_spill_pointers — var.rs:587

3. (MEDIUM) Unchecked pointer arithmetic in fixup — var.rs:617

4. (LOW) RleDictVarcharSliceDecoder::result() always returns Ok — rle_dict_varchar_slice.rs:388

Things that look good

Uh oh!

bluestreak01 commented Mar 11, 2026

Retracted (false positives)

Downgraded

Uh oh!

glasstiger commented Mar 11, 2026

[PR Coverage check]

file detail

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

RaphDal commented Feb 24, 2026 •

edited

Loading

coderabbitai bot commented Feb 24, 2026 •

edited

Loading

1. (HIGH) Dictionary size overflow in write path — `varchar.rs:202,232`

2. (MEDIUM) No alignment guard in `fixup_varchar_slice_spill_pointers` — `var.rs:587`

3. (MEDIUM) Unchecked pointer arithmetic in fixup — `var.rs:617`

4. (LOW) `RleDictVarcharSliceDecoder::result()` always returns `Ok` — `rle_dict_varchar_slice.rs:388`