
perf(sql): optimize parquet decode rowgroup performance#6632

Merged
bluestreak01 merged 53 commits into master from parquet_decode_opt on Jan 28, 2026

Conversation

@kafka1991
Collaborator

@kafka1991 kafka1991 commented Jan 13, 2026

Optimizes Parquet partition read performance through four improvements:

1. Reduce memory allocations during Parquet page decoding

  • Eliminate intermediate copies for fixed-width column types during decodePage
  • Batch memory copy for definition-level bitpack decoding instead of per-value iteration
  • ~2x read performance improvement for affected workloads

2. Switch default Parquet compression to LZ4_RAW

  • Change from ZSTD(level=9) to LZ4_RAW
  • Faster decompression at the cost of slightly larger file size
  • ~2x additional read performance improvement compared with ZSTD

3. Use REQUIRED repetition for non-null Symbol columns

  • Skip definition-level decoding when Parquet Symbol column contains no nulls
  • ~10% read performance improvement

4. Skip redundant decode on ParquetBuffers cache hit

  • Benefits ASOF JOIN scenarios where timeFrame accesses the same Parquet frame
  • ~25% read performance improvement

Combined improvement: ~6x read performance vs master
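As a rough illustration of the first optimization, the idea behind the batched copy can be sketched as follows (hypothetical shapes, not the actual decodePage code): when the definition levels mark a run of `count` consecutive non-null values, the whole fixed-width run is copied in one go instead of one push per value.

```rust
// Hypothetical sketch of the batched-copy idea: a run of `count`
// non-null fixed-width values is appended with a single bulk copy
// rather than a per-value loop with a null-check branch each time.
fn copy_run(src: &[u8], dst: &mut Vec<u8>, row: usize, count: usize, width: usize) {
    let start = row * width;
    dst.extend_from_slice(&src[start..start + count * width]);
}

fn main() {
    let src: Vec<u8> = (0..32u8).collect(); // eight 4-byte values
    let mut dst = Vec::new();
    copy_run(&src, &mut dst, 1, 3, 4); // rows 1..4, width 4
    assert_eq!(dst, (4..16u8).collect::<Vec<u8>>());
}
```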

Test Environment: M4 Pro

Test query:

WITH
buy AS
( SELECT timestamp, symbol, price FROM trades_parquet WHERE side = 'buy' ),
sell AS
( SELECT timestamp, symbol, price FROM trades_parquet WHERE side = 'sell' )
SELECT
   buy.timestamp timestamp,
   buy.symbol symbol,
   (buy.price - sell.price) spread
FROM buy ASOF JOIN sell ON (symbol) WHERE buy.timestamp BETWEEN '2025-01-01T01:30:00Z' AND '2025-01-08T20:45:00Z';

Results:

| Partition Format | Parquet File Size (per partition) | Query Time |
| --- | --- | --- |
| Native | 63.9 MB | 300 ms |
| Master ZSTD(9) | 34.8 MB | ~17 s |
| Patch ZSTD(9) | 34.8 MB | ~8.11 s |
| Patch ZSTD(1) | 35.2 MB | ~7.92 s |
| Patch SNAPPY | 43.7 MB | ~8.21 s |
| Patch LZ4_RAW (REQUIRED symbol) | 43.9 MB | ~4.46 s |
| Patch LZ4_RAW (OPTIONAL symbol) | 43.9 MB | ~4.81 s |
| Patch LZ4_RAW (REQUIRED symbol) + cache hit opt | 43.9 MB | ~3.56 s |

Note: All performance numbers are from hot runs.

Final improvement: Master branch ~17 s → Patch ~2.3 s

Why LZ4_RAW as default?

LZ4_RAW produces ~26% larger files than ZSTD(9) (43.9 MB vs 34.8 MB), but delivers 2x faster read performance.

Notes & Future Work

  1. REQUIRED definition level potential: The REQUIRED repetition shows modest improvement for Symbol columns. Extending this to other non-null columns (e.g., price) could yield significant cumulative gains — worth exploring.

  2. Compression algorithm discussion: For more context on Parquet compression trade-offs, see Parquet Compression Benchmark.

  3. Broad applicability: These optimizations benefit all Parquet reads in QuestDB since they target the low-level decodePage path.
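Item 1 above can be made concrete with a minimal sketch (assumed shapes, not the actual reader code): a column whose schema repetition is REQUIRED carries no definition levels, so every row is known to be non-null up front and the definition-level pass can be skipped entirely.

```rust
// Minimal sketch of the REQUIRED fast path: with no definition levels
// to consult, the reader treats the page as dense and skips the
// per-row null check. Function and argument shapes are illustrative.
fn non_null_count(required: bool, def_levels: &[u8], num_rows: usize) -> usize {
    if required {
        num_rows // no def levels exist; all rows are present
    } else {
        def_levels.iter().filter(|&&d| d == 1).count()
    }
}

fn main() {
    assert_eq!(non_null_count(true, &[], 4), 4);
    assert_eq!(non_null_count(false, &[1, 0, 1, 1], 4), 3);
}
```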

@kafka1991 kafka1991 marked this pull request as draft January 13, 2026 05:47
@coderabbitai

coderabbitai bot commented Jan 13, 2026

Important

Review skipped

Auto reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


Walkthrough

Adds bulk-skip/advance APIs and sink-based streaming to Parquet decoders/slicers, bulk in-place writes for column sinks, required-field propagation in writers/schema, symbol nullability encoding, cache-hit avoidance for cached parquet frames, and numerous test updates and defaults changes (codec and cache sizes).

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Bitpacked / unpack / bitmap<br>`core/rust/qdbr/parquet2/src/encoding/bitpacked/decode.rs`, `core/rust/qdbr/parquet2/src/encoding/bitpacked/unpack.rs`, `core/rust/qdbr/parquet2/src/encoding/hybrid_rle/bitmap.rs` | Added `advance()` on decoders/bitmap iterator to skip items efficiently; added `#[inline]` to unpack implementations; small fast paths for 1–4 skips. |
| Allocator helper<br>`core/rust/qdbr/src/allocator.rs` | New `AcVecSetLen` trait and impl to allow unsafe in-place `set_len` for AcVec-backed Vecs. |
| Slicer & RLE refactor<br>`core/rust/qdbr/src/parquet_read/slicer/mod.rs`, `core/rust/qdbr/src/parquet_read/slicer/rle.rs` | Introduced `ByteSink` and `Converter<const N>` abstractions; added `next_into`/`next_slice_into` streaming APIs and skip/`RepeatN` for the RLE iterator; updated slicer implementations to stream into sinks and propagate results. |
| Column sinks (fixed/var)<br>`core/rust/qdbr/src/parquet_read/column_sink/fixed.rs`, `core/rust/qdbr/src/parquet_read/column_sink/var.rs`, `core/rust/qdbr/src/parquet_read/column_sink/mod.rs`, `core/rust/qdbr/src/parquet_read/column_sink/tests.rs` | Bulk push and bulk-null paths added (0–4 fast paths plus unsafe bulk using `AcVecSetLen`); added tests for sinks. |
| Decode path & boolean aggregation<br>`core/rust/qdbr/src/parquet_read/decode.rs` | Use `DaysToMillisConverter` via slicer generic; replaced `slicer.next()` with `slicer.next_into()` and added boolean run grouping to flush runs in blocks. |
| Slicer tests<br>`core/rust/qdbr/src/parquet_read/slicer/tests.rs` | Large test suite added covering slicers, skip/next semantics, converters, and sink paths. |
| Column writing / bulk nulls / converters<br>`core/rust/qdbr/src/parquet_write/*` (`array.rs`, `binary.rs`, `boolean.rs`, `fixed_len_bytes.rs`, `primitive.rs`, `string.rs`, `varchar.rs`, `symbol.rs`, `util.rs`) | `build_plain_page()` signature now takes `required: bool`; many writers pass the new flag; bulk null write paths added (match 0–4 fast paths plus unsafe bulk using `AcVecSetLen`); `symbol_to_pages` gains a `required` parameter; descriptor `max_def_level` set from `required`. |
| Schema / Column struct<br>`core/rust/qdbr/src/parquet_write/schema.rs` | `Column` gains `required: bool`; `column_type_to_parquet_type` takes `required` to set repetition/definition levels. |
| Rust decoding small changes<br>`core/rust/qdbr/parquet2/src/encoding/hybrid_rle/encoder.rs`, `core/rust/qdbr/parquet2/src/deserialize/utils.rs` | Minor header-bit adjustment and formatting; no behaviorally disruptive changes beyond the header-bit setting. |
| Error macro<br>`core/rust/qdbr/src/parquet/error.rs` | `fmt_err!` macro extended with a pattern supporting variants that take an inner expression. |
| Page frame cache / cache hit<br>`core/src/main/java/io/questdb/cairo/sql/PageFrameMemoryPool.java` | Added `cacheHit` flag in `ParquetBuffers` and logic to skip decode when the cache is reused. |
| Symbol encoding & partition descriptors<br>`core/src/main/java/io/questdb/cairo/O3PartitionJob.java`, `core/src/main/java/io/questdb/cairo/TableWriter.java`, `core/src/main/java/io/questdb/griffin/engine/table/parquet/PartitionEncoder.java`, `core/src/main/java/io/questdb/griffin/engine/table/parquet/PartitionDescriptor.java`, `core/src/main/java/io/questdb/griffin/engine/table/parquet/OwnedMemoryPartitionDescriptor.java` | Symbol column type gets high-bit encoding when the symbol map has no nulls; `PartitionDescriptor` packing adjusted to avoid sign extension; symbol cleanup checks use `ColumnType.tagOf`. |
| Configuration & defaults<br>`core/src/main/java/io/questdb/PropServerConfiguration.java`, `core/src/main/java/io/questdb/cairo/DefaultCairoConfiguration.java`, `core/src/main/resources/io/questdb/site/conf/server.conf`, `pkg/ami/marketplace/assets/server.conf` | Default Parquet compression: ZSTD → LZ4_RAW; partition encoder compression level from 9 → 0; frame cache capacity 3 → 8; test expectations updated. |
| Java tests & expectations<br>`core/src/test/java/...` (multiple) | Updated expected Parquet payload sizes and configuration-derived expectations; added/modified Parquet-related unit tests (including new symbol tests). |

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Possibly related issues

Possibly related PRs

Suggested labels

SQL

Suggested reviewers

  • bluestreak01
  • puzpuzpuz
🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 47.08%, which is insufficient; the required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly summarizes the main change: optimizing Parquet decode rowgroup performance through multiple targeted improvements. |
| Description check | ✅ Passed | The description is comprehensive and directly related to the changeset, covering all four optimization areas with detailed explanations, performance metrics, and test results. |




@kafka1991 kafka1991 changed the title from "perf(sql): optimize parquet decode perfmance" to "perf(sql): optimize parquet decode performance" on Jan 13, 2026
@kafka1991 kafka1991 marked this pull request as ready for review January 14, 2026 03:28
@kafka1991 kafka1991 changed the title from "perf(sql): optimize parquet decode performance" to "perf(sql): optimize parquet decode rowgroup performance" on Jan 14, 2026
@kafka1991 kafka1991 added the Performance (Performance improvements) label on Jan 14, 2026
@bluestreak01
Member

@kafka1991 can we leverage bloom filters for these queries when those are present?

@bluestreak01
Member

also, lets compare hot perf to duck

@kafka1991
Collaborator Author

> also, lets compare hot perf to duck

hey @bluestreak01 This optimization is only one part of the picture. My idea is to wait until we’ve finished all our internal optimizations before doing a performance comparison with duck. WDYT?

@bluestreak01
Member

> also, lets compare hot perf to duck
>
> hey @bluestreak01 This optimization is only one part of the picture. My idea is to wait until we’ve finished all our internal optimizations before doing a performance comparison with duck. WDYT?

sure


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 9

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (6)
core/src/main/java/io/questdb/cairo/DefaultCairoConfiguration.java (1)

689-696: Update getParquetExportCompressionLevel() to return 0 for LZ4_RAW.

The method returns 9 unconditionally, but the codec is now LZ4_RAW, which does not support compression levels (unlike ZSTD). This is inconsistent with PropServerConfiguration, which correctly defaults to 0 for non-ZSTD codecs, and with ExportModel.java, which explicitly treats LZ4_RAW as a codec that doesn't use compression level. The test expectations in ServerMainTest.java also confirm the compression level should be 0 for LZ4_RAW export.

Change line 695 to match the conditional logic used in PropServerConfiguration:

return parquetExportCompressionCodec == ParquetCompression.COMPRESSION_ZSTD ? 9 : 0;

Or, more directly, return 0 since the codec is fixed to LZ4_RAW at this configuration level.

core/rust/qdbr/src/parquet_write/fixed_len_bytes.rs (1)

68-82: Pass column.required through to bytes_to_page to avoid schema-encoding mismatch.

The Column.required field is available and correctly used for Symbol columns (line 765 in file.rs), but is not propagated to chunk_to_primitive_page or bytes_to_page. Since Long128, Uuid, and Long256 columns can be marked as required in the schema (via column_type_to_parquet_type), hardcoding required=false in build_plain_page creates a mismatch: the schema descriptor will advertise Repetition::Required while the page encoding includes definition levels and required=false. Update chunk_to_primitive_page to pass column.required, and refactor bytes_to_page to accept and use this flag to skip definition level encoding when appropriate.

core/rust/qdbr/src/parquet_write/varchar.rs (1)

232-277: Guard against usize overflow before unsafe bulk writes.

At Line 266 and Line 276, count * ENTRY_SIZE and base + ... can overflow, which would under-allocate and then write past the buffer inside the unsafe block. Add checked arithmetic and fail early.

✅ Suggested fix
         _ => {
             const ENTRY_SIZE: usize = 16; // 10 bytes header + 6 bytes offset
             let offset = data_mem.len();
             assert!(offset < VARCHAR_MAX_COLUMN_SIZE);

             let mut null_entry = [0u8; ENTRY_SIZE];
             null_entry[..10].copy_from_slice(&VARCHAR_HEADER_FLAG_NULL);
             null_entry[10..12].copy_from_slice(&(offset as u16).to_le_bytes());
             null_entry[12..16].copy_from_slice(&((offset >> 16) as u32).to_le_bytes());

             let base = aux_mem.len();
-            aux_mem.reserve(count * ENTRY_SIZE)?;
+            let total = count
+                .checked_mul(ENTRY_SIZE)
+                .ok_or_else(|| fmt_err!(OutOfBounds, "varchar null batch too large"))?;
+            let new_len = base
+                .checked_add(total)
+                .ok_or_else(|| fmt_err!(OutOfBounds, "varchar null batch too large"))?;
+            aux_mem.reserve(total)?;
             unsafe {
                 let ptr = aux_mem.as_mut_ptr().add(base);
                 for i in 0..count {
                     std::ptr::copy_nonoverlapping(
                         null_entry.as_ptr(),
                         ptr.add(i * ENTRY_SIZE),
                         ENTRY_SIZE,
                     );
                 }
-                aux_mem.set_len(base + count * ENTRY_SIZE);
+                aux_mem.set_len(new_len);
             }
             Ok(())
         }
core/rust/qdbr/src/parquet_write/symbol.rs (1)

159-221: Required fast‑path must exclude column_top/nulls.
If required is true while column_top > 0 or any value is -1, def levels are skipped and null_count remains 0, which corrupts page encoding (and non_null_len). Ensure required is only true when there are truly no nulls, or fall back to the optional path in those cases.

🐛 Suggested fix
-    let mut data_buffer = vec![];
+    let mut data_buffer = vec![];
+    let required_no_nulls = required && column_top == 0;
 
-    let definition_levels_byte_length = if required {
+    let definition_levels_byte_length = if required_no_nulls {
         0
     } else {
         // TODO(amunra): Optimize if there's no column top.
         let deflevels_iter = (0..num_rows).map(|i| {
             if i < column_top {
                 false
             } else {
                 let key = column_values[i - column_top];
                 // negative denotes a null value
                 if key > -1 {
                     true
                 } else {
                     null_count += 1;
                     false
                 }
             }
         });
         encode_primitive_def_levels(&mut data_buffer, deflevels_iter, num_rows, options.version)?;
         data_buffer.len()
     };
@@
-    let data_page = build_plain_page(
+    let data_page = build_plain_page(
         data_buffer,
         num_rows,
         null_count,
         definition_levels_byte_length,
         if options.write_statistics {
             Some(stats.into_parquet_stats(null_count))
         } else {
             None
         },
         primitive_type,
         options,
         Encoding::RleDictionary,
-        required,
+        required_no_nulls,
     )?;
core/src/main/java/io/questdb/cairo/sql/PageFrameMemoryPool.java (1)

126-135: Guard cache against addressCache changes: buffers with the same frameIndex across different parquet files will cause stale data reuse.

The cacheHit mechanism is keyed only by frameIndex. When PageFrameMemoryPool.of() switches to a new PageFrameAddressCache, it frees the parquetDecoder but does not clear cachedParquetBuffers. If the new addressCache has a frame with the same frameIndex as one cached from the previous addressCache, decode() will be skipped due to the cache hit, reusing buffers that contain data from the old parquet file.

Pools are reused across different addressCache instances (visible in TimeFrameCursorImpl, LatestByTask, PageFrameReduceTask), making frameIndex collisions possible. Add releaseParquetBuffers() call in PageFrameMemoryPool.of() to clear stale buffers when switching addressCache, or call it when parquetDecoder switches files in openParquet().

🧹 Suggested fix
     public void of(PageFrameAddressCache addressCache) {
         this.addressCache = addressCache;
         frameMemory.clear();
+        releaseParquetBuffers();
         Misc.free(parquetDecoder);
     }
core/src/main/java/io/questdb/cairo/TableWriter.java (1)

1469-1502: Mask the high-bit nullability flag before any ColumnType operations—this is a critical bug.

Lines 1470–1473 set encodeColumnType |= 1 << 31 for symbol columns with no nulls, making the stored type negative. However, PartitionDecoder.getColumnType() (line 350–351) returns this raw int without masking, and the decoder immediately passes it to ColumnType.isSymbol(), ColumnType.isUndefined(), etc. (lines 298–303) which use direct equality checks.

When the high bit is set, these equality checks fail. For example, SYMBOL (20) | (1 << 31) = -2147483628, and -2147483628 == 20 is false, so the decoder misidentifies symbol columns with no nulls as undefined or fails type validation.

The fix requires masking the high bit before any ColumnType.*() call. Apply columnType & 0x7FFFFFFF in PartitionDecoder.getColumnType() or immediately after calling it in the decoder loop.
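The round-trip of that high-bit flag can be sketched in a few lines (illustrative constant name; this mirrors the masking the review asks for, not the actual QuestDB code):

```rust
// Illustrative sketch of the sign-bit nullability flag: the writer ORs
// the top bit into the column type tag, so the reader must mask it off
// before comparing against plain type tags with equality checks.
const REQUIRED_FLAG: i32 = i32::MIN; // the sign bit, i.e. 1 << 31

fn encode(column_type: i32, no_nulls: bool) -> i32 {
    if no_nulls { column_type | REQUIRED_FLAG } else { column_type }
}

fn decode(encoded: i32) -> (i32, bool) {
    // mask the high bit; negativity alone signals "required"
    (encoded & !REQUIRED_FLAG, encoded < 0)
}

fn main() {
    let encoded = encode(20, true); // 20 stands in for the SYMBOL tag
    assert!(encoded < 0);           // naive `== 20` checks now fail
    assert_eq!(decode(encoded), (20, true));
    assert_eq!(decode(encode(20, false)), (20, false));
}
```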

🤖 Fix all issues with AI agents
In `@core/rust/qdbr/parquet2/src/encoding/hybrid_rle/bitmap.rs`:
- Around line 86-101: The branch that handles advancing the mask uses `if count
<= bits_left_in_byte` which fails to advance the byte when `count ==
bits_left_in_byte`; change the condition to `if count < bits_left_in_byte` so an
exact-byte consume falls through to the else branch that advances `self.iter`,
updates `self.current_byte`, and sets `self.mask` (the logic around
`self.mask.rotate_left`, `self.iter`, `self.current_byte`, and `self.mask =
1u8.rotate_left(final_bits as u32)` should remain as-is in the else path).

In `@core/rust/qdbr/src/parquet_read/column_sink/fixed.rs`:
- Around line 67-96: The unsafe bulk-write in push_nulls (inside the _ arm) can
exceed the vector capacity because there is no runtime check before calling
ptr::copy_nonoverlapping and AcVecSetLen::set_len; add a guard to ensure
capacity: either call self.buffers.data_vec.reserve(total_bytes) (or
reserve_exact) right before the unsafe block or add a debug_assert!(base +
total_bytes <= self.buffers.data_vec.capacity()) to validate the invariant;
update the same pattern wherever similar unsafe bulk writes occur (e.g., other
methods that use self.buffers.data_vec, AcVecSetLen::set_len, null_value, and N)
so the pointer writes are guaranteed safe at the write site.
- Around line 344-357: In push_int96_as_epoch_nanos, fix endianness by decoding
the 8-byte nanoseconds and 4-byte Julian day explicitly as little-endian instead
of using ptr::read_unaligned (which preserves host endianness); copy bytes[0..8]
into a [u8;8] and use u64::from_le_bytes for nanos, copy bytes[8..12] into a
[u8;4] and use u32::from_le_bytes for julian_date, then compute days_since_epoch
using JULIAN_UNIX_EPOCH_OFFSET and NANOS_PER_DAY and extend data_vec with
nanos_since_epoch.to_le_bytes() as before.

In `@core/rust/qdbr/src/parquet_read/column_sink/var.rs`:
- Around line 199-214: The unsafe write into self.buffers.data_vec using
ptr::write_bytes can overflow because capacity isn't ensured; before calling
ptr::write_bytes and set_len in the match arm (where ELEM = size_of::<i32>() and
base = self.buffers.data_vec.len()), reserve enough space for base + count *
ELEM (e.g., call reserve or reserve_exact on self.buffers.data_vec for count *
ELEM) so the pointer write is safe, then perform the ptr::write_bytes and
set_len, and finally call write_offset_sequence(&mut self.buffers.aux_vec, base
+ ELEM, ELEM, count); ensure you reference data_vec, ELEM, base, and
write_offset_sequence when applying the fix.
- Around line 311-325: The null-fill branch in var.rs writes count * ELEM bytes
into self.buffers.data_vec without ensuring capacity, risking a buffer overflow;
before the unsafe ptr::write_bytes and set_len on data_vec (inside
BinaryColumnSink::push_nulls / this match arm), call reserve/reserve_exact to
allocate at least count * ELEM additional bytes (or ensure capacity >= base +
count * ELEM), and likewise ensure aux_vec has enough capacity for
write_offset_sequence; then perform the unsafe write_bytes and set_len as
before.

In `@core/rust/qdbr/src/parquet_read/slicer/mod.rs`:
- Around line 49-89: The unsafe ByteSink::extend_from_slice implementations (for
AcVec<u8> and Vec<u8>) assume reserved capacity but append_array currently
writes shape and padding via the unsafe extend_from_slice before reserving for
them (buffers.data_vec.reserve(slicer.data_size()) only covers element values),
so either reserve the extra bytes up-front or use the safe path; modify
append_array to call data_mem.reserve(...) (or buffers.data_vec.reserve(...)
including shape/padding size) before any calls to extend_from_slice that write
shape or padding metadata, or switch those initial writes to
extend_from_slice_safe() to ensure capacity checks; reference the ByteSink impls
(extend_from_slice / extend_from_slice_safe) and the append_array call site to
make this change.

In `@core/rust/qdbr/src/parquet_write/array.rs`:
- Around line 745-785: The unsafe bulk write can overflow because count *
ENTRY_SIZE and base + total_bytes are used unchecked; before calling
aux_mem.reserve(...) and AcVecSetLen::set_len(...) in the default branch,
perform checked arithmetic (use checked_mul for count and ENTRY_SIZE and
checked_add for base and total_bytes), return an Err on overflow, and only then
proceed with reserve and the unsafe copy; reference the identifiers count,
ENTRY_SIZE, total_bytes, base, aux_mem, data_mem, append_array_null, and
AcVecSetLen::set_len when applying these checks.

In `@core/rust/qdbr/src/parquet_write/schema.rs`:
- Around line 26-35: The current blanket is_notnull_type check (matching
ColumnTypeTag::Boolean | Byte | Short | Char) incorrectly forces
Repetition::Required and drops definition levels; change the logic so those
types are only treated as Required when the incoming required parameter is true
(i.e., remove or gate the is_notnull_type condition), so compute repetition from
designated_timestamp || required (and not from type alone) when setting the
repetition variable used for Parquet schema generation; update any references to
is_notnull_type, repetition, designated_timestamp, and required in this function
to reflect this corrected gating.

In `@core/src/main/resources/io/questdb/site/conf/server.conf`:
- Around line 608-612: The server.conf default for
cairo.partition.encoder.parquet.compression.codec (currently commented as
LZ4_RAW) is inconsistent with the Java default returned by
DefaultCairoConfiguration.getPartitionEncoderParquetCompressionCodec()
(ParquetCompression.COMPRESSION_ZSTD); update the commented default in
server.conf to match the Java default (set to ZSTD and ensure the allowed codec
list/comments include ZSTD), or alternatively change the Java default to
LZ4_RAW—pick one consistent canonical default and make the config key
(cairo.partition.encoder.parquet.compression.codec) and
DefaultCairoConfiguration.getPartitionEncoderParquetCompressionCodec() agree.
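The bitmap `advance` fix above (strict `<` instead of `<=`) can be illustrated with a standalone bit-cursor sketch (hypothetical struct, not the parquet2 code): when the skip count exactly consumes the bits left in the current byte, the cursor must move to the next byte rather than leave a stale position.

```rust
// Hypothetical bit-cursor demonstrating the off-by-one: with `count ==
// bits_left`, the strict `<` sends us through the byte-advancing branch,
// so an exact-byte consume lands cleanly at the start of the next byte.
struct BitCursor {
    byte_idx: usize,
    bit: u32, // bits already consumed in the current byte (0..8)
}

impl BitCursor {
    fn advance(&mut self, count: u32) {
        let bits_left = 8 - self.bit;
        if count < bits_left {
            // stays within the current byte
            self.bit += count;
        } else {
            // exact or larger consume: cross into the next byte(s)
            let rem = count - bits_left;
            self.byte_idx += 1 + (rem / 8) as usize;
            self.bit = rem % 8;
        }
    }
}

fn main() {
    let mut c = BitCursor { byte_idx: 0, bit: 5 };
    c.advance(3); // exactly consumes the remaining 3 bits of byte 0
    assert_eq!((c.byte_idx, c.bit), (1, 0));
}
```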
🧹 Nitpick comments (7)
core/src/main/java/io/questdb/cutlass/parquet/CopyExportRequestTask.java (1)

396-407: Inconsistent column type retrieval in non-symbol branch.

Line 396 extracts columnType into a final local variable, but line 406 re-fetches it via metadata.getColumnType(i) instead of reusing columnType. This is functionally correct but inconsistent with the symbol branch (line 400) which uses the local variable.

♻️ Suggested fix for consistency
                 } else {
-                    columnMetadata.add((long) metadata.getWriterIndex(i) << 32 | metadata.getColumnType(i));
+                    columnMetadata.add((long) metadata.getWriterIndex(i) << 32 | columnType);
                 }
core/rust/qdbr/parquet2/src/encoding/bitpacked/decode.rs (1)

62-95: LGTM — minor optimization opportunity.

The advance method correctly maintains invariants:

  • remaining is decremented first, ensuring next() won't read stale data if packs are exhausted
  • Pack boundary crossing is handled correctly
  • The edge case where packed.next() returns None is safe since remaining would be zero
♻️ Optional: Use `nth()` instead of loop for skipping packs
-            let packs_to_skip = to_skip / T::Unpacked::LENGTH;
-            for _ in 0..packs_to_skip {
-                self.packed.next();
-            }
+            let packs_to_skip = to_skip / T::Unpacked::LENGTH;
+            if packs_to_skip > 0 {
+                self.packed.nth(packs_to_skip - 1);
+            }

Using nth(n-1) skips n elements more efficiently than a loop, as Chunks iterator may optimize this.
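The `nth()` semantics this relies on: `iter.nth(k)` consumes `k + 1` elements and returns the last of them, so `nth(n - 1)` skips exactly `n`:

```rust
fn main() {
    let mut it = 0..10;
    // nth(2) consumes three elements (0, 1, 2) and returns the last one
    assert_eq!(it.nth(2), Some(2));
    // the iterator resumes right after the skipped prefix
    assert_eq!(it.next(), Some(3));
}
```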

core/src/main/java/io/questdb/cairo/O3PartitionJob.java (2)

1774-1781: Use a named flag for required symbol encoding.
The sign-bit mask in Line 1775 is correct but opaque; a named constant makes the intent explicit and reduces the chance of accidental misuse.

♻️ Suggested tweak
-                        if (!symbolMapWriter.getNullFlag()) {
-                            encodeColumnType |= 1 << 31;
-                        }
+                        if (!symbolMapWriter.getNullFlag()) {
+                            encodeColumnType |= Integer.MIN_VALUE; // required (sign bit)
+                        }

2496-2511: Treat ENOENT as benign during phantom-dir cleanup.
If the directory is already gone, logging an error is noisy. Consider skipping the error log when errno indicates “not found.”

core/rust/qdbr/src/parquet_write/schema.rs (1)

302-339: Make the required‑bit mask explicit.
column_type < 0 and 0x7FFFFFFF are magic values; a named mask improves readability and reduces sign‑bit confusion.

♻️ Suggested refactor
-        let required = column_type < 0;
-        let column_type: ColumnType = (column_type & 0x7FFFFFFF).try_into()?;
+        const REQUIRED_FLAG: i32 = 0x8000_0000;
+        let required = (column_type & REQUIRED_FLAG) != 0;
+        let column_type: ColumnType = (column_type & !REQUIRED_FLAG).try_into()?;
core/src/main/java/io/questdb/griffin/engine/table/parquet/PartitionEncoder.java (1)

164-175: Prefer a named constant for the symbol‑required flag.

Using 1 << 31 is clear for us now but easy to miss later. A named constant keeps intent explicit and avoids duplication if reused elsewhere.

♻️ Suggested tweak
 public class PartitionEncoder {
+    private static final int SYMBOL_REQUIRED_FLAG = 1 << 31;
     ...
-                    if (!symbolMapReader.containsNullValue()) {
-                        encodeColumnType |= 1 << 31;
-                    }
+                    if (!symbolMapReader.containsNullValue()) {
+                        encodeColumnType |= SYMBOL_REQUIRED_FLAG;
+                    }
core/rust/qdbr/src/parquet_read/slicer/mod.rs (1)

12-31: SliceSink overwrites instead of appending.

ByteSink::extend_from_slice implies append semantics. The current SliceSink always writes at offset 0, so any converter that writes in multiple chunks will clobber prior bytes. Adding a cursor keeps behavior consistent with other ByteSink impls.

♻️ Proposed fix
-pub struct SliceSink<'a>(pub &'a mut [u8]);
+pub struct SliceSink<'a> {
+    buf: &'a mut [u8],
+    pos: usize,
+}
 
 impl ByteSink for SliceSink<'_> {
     #[inline]
     fn extend_from_slice(&mut self, data: &[u8]) -> ParquetResult<()> {
-        self.0[..data.len()].copy_from_slice(data);
+        let end = self.pos + data.len();
+        self.buf[self.pos..end].copy_from_slice(data);
+        self.pos = end;
         Ok(())
     }
 
     #[inline]
     fn extend_from_slice_safe(&mut self, data: &[u8]) -> ParquetResult<()> {
-        self.0[..data.len()].copy_from_slice(data);
+        let end = self.pos + data.len();
+        if end > self.buf.len() {
+            return Err(fmt_err!(Layout, "SliceSink overflow"));
+        }
+        self.buf[self.pos..end].copy_from_slice(data);
+        self.pos = end;
         Ok(())
     }
 }
@@
-        C::convert(slice, &mut SliceSink(&mut self.buffer));
+        C::convert(slice, &mut SliceSink { buf: &mut self.buffer, pos: 0 });

@kafka1991
Collaborator Author

@CodeRabbit review again

@coderabbitai

coderabbitai bot commented Jan 26, 2026

@kafka1991: I'll perform another comprehensive review of the PR, focusing on the Parquet optimization changes including the memory management improvements, cache hit optimization, and the REQUIRED repetition handling for non-null Symbol columns.

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

GitHub Actions - Rebuild Native Libraries and others added 2 commits January 26, 2026 02:01
@kafka1991
Collaborator Author

kafka1991 commented Jan 26, 2026

Note: Some of the code is covered by tests in #6675

@kafka1991
Collaborator Author

@coderabbitai I addressed your comments and added tests, pls review again

@coderabbitai

coderabbitai bot commented Jan 26, 2026

@kafka1991: Thank you for addressing the comments and adding tests! I'll perform another comprehensive review of the PR, focusing on the changes you've made and the new test coverage.

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
core/src/main/java/io/questdb/cairo/TableWriter.java (1)

1469-1502: Symbol type classification fails in PartitionDecoder for non-nullable symbols.

The encoding in TableWriter.java:1472 sets Integer.MIN_VALUE on symbol types lacking null values. However, PartitionDecoder.copyToSansUnsupported() calls ColumnType.isSymbol(columnType) without masking (line 303), and ColumnType.isSymbol() performs direct equality: columnType == SYMBOL. When the flag is set, a symbol becomes negative (e.g., 0x80000000 | 12 = -2147483636), which fails the equality check and returns false. Lines 298 (isUndefined) and 319 (isTimestamp) have the same issue.

Apply the pattern shown in OwnedMemoryPartitionDescriptor.java:47: call ColumnType.tagOf(columnType) to strip the flag before any ColumnType API check:

// In PartitionDecoder.copyToSansUnsupported() around line 296-303
final int columnType = getColumnType(i);
final int taggedType = ColumnType.tagOf(columnType);

if (ColumnType.isUndefined(taggedType)) { ... }
if (ColumnType.isSymbol(taggedType)) { ... }
// and line 319: if (ColumnType.isTimestamp(taggedType) && ...)

This affects the TableWriter.convertPartitionNativeToParquet() flow at line 1683 where metadata is read from Parquet.

core/src/main/java/io/questdb/cairo/DefaultCairoConfiguration.java (1)

688-695: Fix compression level for LZ4_RAW codec.

Returning 9 is incorrect. LZ4_RAW does not support compression levels in the Parquet specification. The codebase itself confirms this: PropServerConfiguration uses level 0 as the default for LZ4_RAW (only ZSTD gets 9), ExportModel explicitly documents that LZ4_RAW doesn't use compression levels, and test expectations set this to 0. The partition encoder path correctly returns 0 for the same codec. Change this to return 0 to align with the codec and other configuration paths.

Proposed change
     @Override
     public int getParquetExportCompressionLevel() {
-        return 9;
+        return 0;
     }
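The level convention described above can be sketched as a codec-to-level mapping. This is a hypothetical helper for illustration only; codec names are plain strings here, not the real configuration enum:

```java
public class CompressionLevelSketch {
    // Only ZSTD takes a meaningful level; LZ4_RAW (and SNAPPY, etc.) have none,
    // so their configured level should be 0, matching PropServerConfiguration.
    static int defaultCompressionLevel(String codec) {
        if ("ZSTD".equals(codec)) {
            return 9;
        }
        return 0;
    }

    public static void main(String[] args) {
        System.out.println(defaultCompressionLevel("ZSTD"));    // 9
        System.out.println(defaultCompressionLevel("LZ4_RAW")); // 0
    }
}
```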
🤖 Fix all issues with AI agents
In `@core/src/main/java/io/questdb/cutlass/parquet/CopyExportRequestTask.java`:
- Around lines 398-408: In CopyExportRequestTask, inside the block handling symbol columns (where ColumnType.isSymbol(columnType) is true), add a null check or assert for the symbolTable returned by pageFrameCursor.getSymbolTable(i) (e.g., assert symbolTable != null : "Symbol table expected for symbol column " + i) before calling symbolTable.containsNullValue(), to avoid an NPE. Also replace the magic bit mask 1 << 31 with a named constant (e.g., SYMBOL_NON_NULL_FLAG) and use that constant when setting symbolColumnType, so the encoding is clear and maintainable.
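The suggested change might look like the sketch below. The nested SymbolTable stub and the method name are hypothetical stand-ins for the real QuestDB types, kept minimal so the sketch is self-contained:

```java
public class SymbolEncodeSketch {
    // Minimal stand-in for QuestDB's SymbolTable; only the method used here.
    interface SymbolTable {
        boolean containsNullValue();
    }

    // Named constant instead of the magic 1 << 31, as the review suggests.
    static final int SYMBOL_NON_NULL_FLAG = 1 << 31;

    static int encodeSymbolColumnType(int columnType, SymbolTable symbolTable) {
        assert symbolTable != null : "Symbol table expected for symbol column";
        return symbolTable.containsNullValue()
                ? columnType
                : columnType | SYMBOL_NON_NULL_FLAG;
    }

    public static void main(String[] args) {
        SymbolTable noNulls = () -> false;
        // 12 | (1 << 31) == -2147483636
        System.out.println(encodeSymbolColumnType(12, noNulls));
    }
}
```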
🧹 Nitpick comments (3)
core/rust/qdbr/src/parquet_read/column_sink/fixed.rs (1)

136-148: Consider adding debug_assert for consistency with bulk paths.

The single-element push() performs an unsafe write without a capacity check, relying on the caller having called reserve(). While this follows the same pattern as other single-push methods and is functionally correct when the contract is honored, adding a debug_assert! would provide consistency with the bulk paths and catch contract violations during development.

♻️ Optional: Add debug_assert for consistency
     fn push(&mut self) -> ParquetResult<()> {
         let slice = self.slicer.next();
         let base = self.buffers.data_vec.len();
+        debug_assert!(base + N <= self.buffers.data_vec.capacity());

         unsafe {
             let ptr = self.buffers.data_vec.as_mut_ptr().add(base);
core/rust/qdbr/src/parquet_read/slicer/tests.rs (1)

499-502: Minor: Consider bounds checking in test helper.

The get_dict_value method uses direct indexing which will panic on out-of-bounds. This is acceptable for test code as it will catch bugs, but you could optionally use .get() with a more descriptive panic message for easier debugging.
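The bounds-checked variant with a descriptive message could look like this Java sketch of the idea (in the Rust test helper itself, the equivalent would be `.get()` with `expect`):

```java
import java.util.List;

public class DictLookupSketch {
    // Explicit bounds check with a descriptive message,
    // instead of a bare out-of-bounds panic on direct indexing.
    static String getDictValue(List<String> dict, int key) {
        if (key < 0 || key >= dict.size()) {
            throw new IllegalStateException(
                    "dictionary key " + key + " out of bounds (dict size " + dict.size() + ")");
        }
        return dict.get(key);
    }

    public static void main(String[] args) {
        System.out.println(getDictValue(List.of("buy", "sell"), 1)); // sell
    }
}
```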

core/rust/qdbr/src/parquet_read/slicer/mod.rs (1)

619-666: Consider removing unused error field from ValueConvertSlicer.

The error field is initialized to Ok(()) and never modified in the current implementation. The result() method includes it in the chain, but it always returns Ok(()). Either remove it or document the intended future use.

♻️ Optional: Remove unused error field
 pub struct ValueConvertSlicer<const N: usize, T: DataPageSlicer, C: Converter<N>> {
     inner_slicer: T,
-    error: ParquetResult<()>,
     buffer: [u8; N],
     _converter: std::marker::PhantomData<C>,
 }

 // ...

     fn result(&self) -> ParquetResult<()> {
-        self.error.clone().or(self.inner_slicer.result())
+        self.inner_slicer.result()
     }
 }

 impl<const N: usize, T: DataPageSlicer, C: Converter<N>> ValueConvertSlicer<N, T, C> {
     pub fn new(inner_slicer: T) -> Self {
         Self {
             inner_slicer,
-            error: Ok(()),
             buffer: [0; N],
             _converter: std::marker::PhantomData,
         }
     }
 }

@glasstiger
Contributor

[PR Coverage check]

😍 pass : 1376 / 1594 (86.32%)

file detail

| path | covered lines | new lines | coverage |
|------|--------------:|----------:|---------:|
| 🔵 qdbr/parquet2/src/encoding/hybrid_rle/bitmap.rs | 0 | 39 | 00.00% |
| 🔵 qdbr/parquet2/src/deserialize/utils.rs | 0 | 3 | 00.00% |
| 🔵 qdbr/src/parquet_write/array.rs | 2 | 32 | 06.25% |
| 🔵 qdbr/src/parquet_read/slicer/rle.rs | 72 | 109 | 66.06% |
| 🔵 qdbr/src/parquet_read/slicer/mod.rs | 123 | 175 | 70.29% |
| 🔵 qdbr/src/parquet_read/column_sink/fixed.rs | 119 | 162 | 73.46% |
| 🔵 qdbr/parquet2/src/encoding/bitpacked/decode.rs | 21 | 24 | 87.50% |
| 🔵 io/questdb/cairo/O3PartitionJob.java | 8 | 9 | 88.89% |
| 🔵 qdbr/src/parquet_read/column_sink/var.rs | 76 | 79 | 96.20% |
| 🔵 qdbr/src/parquet_write/varchar.rs | 29 | 30 | 96.67% |
| 🔵 qdbr/src/parquet_read/slicer/tests.rs | 550 | 556 | 98.92% |
| 🔵 qdbr/parquet2/src/encoding/hybrid_rle/encoder.rs | 1 | 1 | 100.00% |
| 🔵 io/questdb/griffin/engine/table/parquet/OwnedMemoryPartitionDescriptor.java | 1 | 1 | 100.00% |
| 🔵 qdbr/src/parquet_write/symbol.rs | 16 | 16 | 100.00% |
| 🔵 io/questdb/griffin/engine/table/parquet/PartitionDescriptor.java | 10 | 10 | 100.00% |
| 🔵 io/questdb/cairo/DefaultCairoConfiguration.java | 8 | 8 | 100.00% |
| 🔵 qdbr/src/parquet_write/file.rs | 1 | 1 | 100.00% |
| 🔵 io/questdb/griffin/engine/table/parquet/PartitionEncoder.java | 3 | 3 | 100.00% |
| 🔵 io/questdb/cairo/TableWriter.java | 5 | 5 | 100.00% |
| 🔵 qdbr/src/parquet_write/util.rs | 3 | 3 | 100.00% |
| 🔵 io/questdb/PropServerConfiguration.java | 5 | 5 | 100.00% |
| 🔵 qdbr/src/allocator.rs | 4 | 4 | 100.00% |
| 🔵 qdbr/src/parquet_read/decode.rs | 21 | 21 | 100.00% |
| 🔵 io/questdb/cutlass/parquet/CopyExportRequestTask.java | 12 | 12 | 100.00% |
| 🔵 io/questdb/cairo/sql/PageFrameMemoryPool.java | 13 | 13 | 100.00% |
| 🔵 qdbr/src/parquet_read/column_sink/tests.rs | 243 | 243 | 100.00% |
| 🔵 qdbr/src/parquet_write/schema.rs | 30 | 30 | 100.00% |


Labels

Performance Performance improvements

4 participants