perf(sql): optimize parquet decode rowgroup performance#6632
bluestreak01 merged 53 commits into master
Conversation
Important: Review skipped. Auto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI.
Walkthrough: Adds bulk-skip/advance APIs and sink-based streaming to Parquet decoders/slicers, bulk in-place writes for column sinks, required-field propagation in writers/schema, symbol nullability encoding, cache-hit avoidance for cached parquet frames, and numerous test updates and defaults changes (codec and cache sizes).
Estimated code review effort: 🎯 5 (Critical) | ⏱️ ~120 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
@kafka1991 can we leverage bloom filters for these queries when those are present?
also, let's compare hot perf to duck
hey @bluestreak01 This optimization is only one part of the picture. My idea is to wait until we’ve finished all our internal optimizations before doing a performance comparison with duck. WDYT?
sure |
Actionable comments posted: 9
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (6)
core/src/main/java/io/questdb/cairo/DefaultCairoConfiguration.java (1)
689-696: Update `getParquetExportCompressionLevel()` to return 0 for LZ4_RAW.

The method returns `9` unconditionally, but the codec is now `LZ4_RAW`, which does not support compression levels (unlike ZSTD). This is inconsistent with `PropServerConfiguration`, which correctly defaults to `0` for non-ZSTD codecs, and with `ExportModel.java`, which explicitly treats `LZ4_RAW` as a codec that doesn't use a compression level. The test expectations in `ServerMainTest.java` also confirm the compression level should be `0` for LZ4_RAW export.

Change line 695 to match the conditional logic used in `PropServerConfiguration`: `return parquetExportCompressionCodec == ParquetCompression.COMPRESSION_ZSTD ? 9 : 0;` Or, more directly, return `0`, since the codec is fixed to `LZ4_RAW` at this configuration level.

core/rust/qdbr/src/parquet_write/fixed_len_bytes.rs (1)
68-82: Pass `column.required` through to `bytes_to_page` to avoid a schema-encoding mismatch.

The `Column.required` field is available and correctly used for Symbol columns (line 765 in file.rs), but is not propagated to `chunk_to_primitive_page` or `bytes_to_page`. Since Long128, Uuid, and Long256 columns can be marked as required in the schema (via `column_type_to_parquet_type`), hardcoding `required=false` in `build_plain_page` creates a mismatch: the schema descriptor will advertise `Repetition::Required` while the page encoding includes definition levels and `required=false`. Update `chunk_to_primitive_page` to pass `column.required`, and refactor `bytes_to_page` to accept and use this flag to skip definition-level encoding when appropriate.

core/rust/qdbr/src/parquet_write/varchar.rs (1)
232-277: Guard against usize overflow before unsafe bulk writes.

At line 266 and line 276, `count * ENTRY_SIZE` and `base + ...` can overflow, which would under-allocate and then write past the buffer inside the unsafe block. Add checked arithmetic and fail early.

✅ Suggested fix

```diff
 _ => {
     const ENTRY_SIZE: usize = 16; // 10 bytes header + 6 bytes offset
     let offset = data_mem.len();
     assert!(offset < VARCHAR_MAX_COLUMN_SIZE);
     let mut null_entry = [0u8; ENTRY_SIZE];
     null_entry[..10].copy_from_slice(&VARCHAR_HEADER_FLAG_NULL);
     null_entry[10..12].copy_from_slice(&(offset as u16).to_le_bytes());
     null_entry[12..16].copy_from_slice(&((offset >> 16) as u32).to_le_bytes());
     let base = aux_mem.len();
-    aux_mem.reserve(count * ENTRY_SIZE)?;
+    let total = count
+        .checked_mul(ENTRY_SIZE)
+        .ok_or_else(|| fmt_err!(OutOfBounds, "varchar null batch too large"))?;
+    let new_len = base
+        .checked_add(total)
+        .ok_or_else(|| fmt_err!(OutOfBounds, "varchar null batch too large"))?;
+    aux_mem.reserve(total)?;
     unsafe {
         let ptr = aux_mem.as_mut_ptr().add(base);
         for i in 0..count {
             std::ptr::copy_nonoverlapping(
                 null_entry.as_ptr(),
                 ptr.add(i * ENTRY_SIZE),
                 ENTRY_SIZE,
             );
         }
-        aux_mem.set_len(base + count * ENTRY_SIZE);
+        aux_mem.set_len(new_len);
     }
     Ok(())
 }
```

core/rust/qdbr/src/parquet_write/symbol.rs (1)
159-221: Required fast-path must exclude `column_top`/nulls.

If `required` is true while `column_top > 0` or any value is `-1`, def levels are skipped and `null_count` remains 0, which corrupts page encoding (and `non_null_len`). Ensure `required` is only true when there are truly no nulls, or fall back to the optional path in those cases.

🐛 Suggested fix

```diff
-    let mut data_buffer = vec![];
+    let mut data_buffer = vec![];
+    let required_no_nulls = required && column_top == 0;
-    let definition_levels_byte_length = if required {
+    let definition_levels_byte_length = if required_no_nulls {
         0
     } else {
         // TODO(amunra): Optimize if there's no column top.
         let deflevels_iter = (0..num_rows).map(|i| {
             if i < column_top {
                 false
             } else {
                 let key = column_values[i - column_top];
                 // negative denotes a null value
                 if key > -1 {
                     true
                 } else {
                     null_count += 1;
                     false
                 }
             }
         });
         encode_primitive_def_levels(&mut data_buffer, deflevels_iter, num_rows, options.version)?;
         data_buffer.len()
     };
@@
-    let data_page = build_plain_page(
+    let data_page = build_plain_page(
         data_buffer,
         num_rows,
         null_count,
         definition_levels_byte_length,
         if options.write_statistics {
             Some(stats.into_parquet_stats(null_count))
         } else {
             None
         },
         primitive_type,
         options,
         Encoding::RleDictionary,
-        required,
+        required_no_nulls,
     )?;
```

core/src/main/java/io/questdb/cairo/sql/PageFrameMemoryPool.java (1)
126-135: Guard the cache against addressCache changes: buffers with the same frameIndex across different parquet files will cause stale data reuse.

The `cacheHit` mechanism is keyed only by `frameIndex`. When `PageFrameMemoryPool.of()` switches to a new `PageFrameAddressCache`, it frees the parquetDecoder but does not clear `cachedParquetBuffers`. If the new addressCache has a frame with the same frameIndex as one cached from the previous addressCache, `decode()` will be skipped due to the cache hit, reusing buffers that contain data from the old parquet file.

Pools are reused across different addressCache instances (visible in TimeFrameCursorImpl, LatestByTask, PageFrameReduceTask), making frameIndex collisions possible. Add a `releaseParquetBuffers()` call in `PageFrameMemoryPool.of()` to clear stale buffers when switching addressCache, or call it when parquetDecoder switches files in `openParquet()`.

🧹 Suggested fix

```diff
 public void of(PageFrameAddressCache addressCache) {
     this.addressCache = addressCache;
     frameMemory.clear();
+    releaseParquetBuffers();
     Misc.free(parquetDecoder);
 }
```

core/src/main/java/io/questdb/cairo/TableWriter.java (1)
1469-1502: Mask the high-bit nullability flag before any ColumnType operations—this is a critical bug.

Lines 1470–1473 set `encodeColumnType |= 1 << 31` for symbol columns with no nulls, making the stored type negative. However, `PartitionDecoder.getColumnType()` (lines 350–351) returns this raw int without masking, and the decoder immediately passes it to `ColumnType.isSymbol()`, `ColumnType.isUndefined()`, etc. (lines 298–303), which use direct equality checks.

When the high bit is set, these equality checks fail. For example, `SYMBOL (20) | (1 << 31) = -2147483628`, and `-2147483628 == 20` is false, so the decoder misidentifies symbol columns with no nulls as undefined or fails type validation.

The fix requires masking the high bit before any `ColumnType.*()` call. Apply `columnType & 0x7FFFFFFF` in `PartitionDecoder.getColumnType()` or immediately after calling it in the decoder loop.
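The encode/decode symmetry discussed here can be sketched generically. This is a standalone illustration, not QuestDB's actual `ColumnType` API; the tag value 20 and the helper names are hypothetical:

```rust
const REQUIRED_FLAG: i32 = i32::MIN; // sign bit, same value as 1 << 31
const TYPE_MASK: i32 = 0x7FFF_FFFF;

fn encode_column_type(tag: i32, required: bool) -> i32 {
    if required { tag | REQUIRED_FLAG } else { tag }
}

fn decode_column_type(encoded: i32) -> (i32, bool) {
    // Mask first; the raw value is negative whenever the flag is set.
    (encoded & TYPE_MASK, encoded < 0)
}

fn main() {
    let symbol_tag = 20; // hypothetical tag value
    let encoded = encode_column_type(symbol_tag, true);
    // Without masking, a plain equality check against the tag fails:
    assert_ne!(encoded, symbol_tag);
    assert_eq!(encoded, -2147483628);
    // Masking first recovers both the tag and the flag:
    let (tag, required) = decode_column_type(encoded);
    assert_eq!(tag, symbol_tag);
    assert!(required);
    println!("tag={} required={}", tag, required);
}
```

Any consumer that compares the raw encoded value against tag constants sees the failure shown by the `assert_ne!` above, which is exactly the decoder bug described.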
🤖 Fix all issues with AI agents
In `@core/rust/qdbr/parquet2/src/encoding/hybrid_rle/bitmap.rs`:
- Around line 86-101: The branch that handles advancing the mask uses `if count
<= bits_left_in_byte` which fails to advance the byte when `count ==
bits_left_in_byte`; change the condition to `if count < bits_left_in_byte` so an
exact-byte consume falls through to the else branch that advances `self.iter`,
updates `self.current_byte`, and sets `self.mask` (the logic around
`self.mask.rotate_left`, `self.iter`, `self.current_byte`, and `self.mask =
1u8.rotate_left(final_bits as u32)` should remain as-is in the else path).
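The boundary condition is easier to see in a simplified model. The `Cursor` below tracks only a byte index and a bit offset; it is not the parquet2 `Bitmap` implementation, but it shows why an exact-byte consume must take the byte-advancing branch:

```rust
/// Simplified model of a bitmap cursor: a byte index plus a bit offset in 0..8.
struct Cursor {
    byte: usize,
    bit: usize,
}

impl Cursor {
    fn advance(&mut self, count: usize) {
        let bits_left_in_byte = 8 - self.bit;
        // Must be `<`, not `<=`: consuming exactly the remaining bits has to
        // move the cursor into the next byte. With `<=`, `bit` would become 8,
        // an invalid position still pointing at the old byte.
        if count < bits_left_in_byte {
            self.bit += count;
        } else {
            let rest = count - bits_left_in_byte;
            self.byte += 1 + rest / 8;
            self.bit = rest % 8;
        }
    }
}

fn main() {
    let mut c = Cursor { byte: 0, bit: 3 };
    c.advance(5); // exactly the remaining bits of byte 0
    assert_eq!((c.byte, c.bit), (1, 0));
    c.advance(13); // all of byte 1 plus 5 bits of byte 2
    assert_eq!((c.byte, c.bit), (2, 5));
    println!("byte={} bit={}", c.byte, c.bit);
}
```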
In `@core/rust/qdbr/src/parquet_read/column_sink/fixed.rs`:
- Around line 67-96: The unsafe bulk-write in push_nulls (inside the _ arm) can
exceed the vector capacity because there is no runtime check before calling
ptr::copy_nonoverlapping and AcVecSetLen::set_len; add a guard to ensure
capacity: either call self.buffers.data_vec.reserve(total_bytes) (or
reserve_exact) right before the unsafe block or add a debug_assert!(base +
total_bytes <= self.buffers.data_vec.capacity()) to validate the invariant;
update the same pattern wherever similar unsafe bulk writes occur (e.g., other
methods that use self.buffers.data_vec, AcVecSetLen::set_len, null_value, and N)
so the pointer writes are guaranteed safe at the write site.
- Around line 344-357: In push_int96_as_epoch_nanos, fix endianness by decoding
the 8-byte nanoseconds and 4-byte Julian day explicitly as little-endian instead
of using ptr::read_unaligned (which preserves host endianness); copy bytes[0..8]
into a [u8;8] and use u64::from_le_bytes for nanos, copy bytes[8..12] into a
[u8;4] and use u32::from_le_bytes for julian_date, then compute days_since_epoch
using JULIAN_UNIX_EPOCH_OFFSET and NANOS_PER_DAY and extend data_vec with
nanos_since_epoch.to_le_bytes() as before.
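A standalone sketch of the requested decode, assuming the standard Parquet INT96 timestamp layout (8 little-endian bytes of nanoseconds-within-day followed by 4 little-endian bytes of Julian day); the function name is illustrative:

```rust
const JULIAN_UNIX_EPOCH_OFFSET: i64 = 2_440_588; // Julian day number of 1970-01-01
const NANOS_PER_DAY: i64 = 86_400_000_000_000;

fn int96_to_epoch_nanos(bytes: &[u8; 12]) -> i64 {
    // Decode explicitly as little-endian; ptr::read_unaligned would silently
    // reinterpret the bytes in host endianness instead.
    let nanos = u64::from_le_bytes(bytes[0..8].try_into().unwrap()) as i64;
    let julian_day = u32::from_le_bytes(bytes[8..12].try_into().unwrap()) as i64;
    let days_since_epoch = julian_day - JULIAN_UNIX_EPOCH_OFFSET;
    days_since_epoch * NANOS_PER_DAY + nanos
}

fn main() {
    // 1970-01-01T00:00:00: zero nanos on Julian day 2_440_588.
    let mut bytes = [0u8; 12];
    bytes[8..12].copy_from_slice(&2_440_588u32.to_le_bytes());
    assert_eq!(int96_to_epoch_nanos(&bytes), 0);

    // One day and one nanosecond later.
    bytes[0..8].copy_from_slice(&1u64.to_le_bytes());
    bytes[8..12].copy_from_slice(&2_440_589u32.to_le_bytes());
    assert_eq!(int96_to_epoch_nanos(&bytes), 86_400_000_000_001);
    println!("ok");
}
```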
In `@core/rust/qdbr/src/parquet_read/column_sink/var.rs`:
- Around line 199-214: The unsafe write into self.buffers.data_vec using
ptr::write_bytes can overflow because capacity isn't ensured; before calling
ptr::write_bytes and set_len in the match arm (where ELEM = size_of::<i32>() and
base = self.buffers.data_vec.len()), reserve enough space for base + count *
ELEM (e.g., call reserve or reserve_exact on self.buffers.data_vec for count *
ELEM) so the pointer write is safe, then perform the ptr::write_bytes and
set_len, and finally call write_offset_sequence(&mut self.buffers.aux_vec, base
+ ELEM, ELEM, count); ensure you reference data_vec, ELEM, base, and
write_offset_sequence when applying the fix.
- Around line 311-325: The null-fill branch in var.rs writes count * ELEM bytes
into self.buffers.data_vec without ensuring capacity, risking a buffer overflow;
before the unsafe ptr::write_bytes and set_len on data_vec (inside
BinaryColumnSink::push_nulls / this match arm), call reserve/reserve_exact to
allocate at least count * ELEM additional bytes (or ensure capacity >= base +
count * ELEM), and likewise ensure aux_vec has enough capacity for
write_offset_sequence; then perform the unsafe write_bytes and set_len as
before.
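The reserve-then-write discipline these fixes ask for can be shown on a plain `Vec<u8>` (the real code uses the crate's `AcVec` and also writes the aux offset sequence, which is omitted here; `push_null_ints` is a hypothetical helper):

```rust
use std::mem::size_of;

/// Zero-fill `count` i32-sized slots via raw pointer writes.
/// Reserving before the unsafe block is what makes `write_bytes` and
/// `set_len` sound; without it the write can run past capacity.
fn push_null_ints(data: &mut Vec<u8>, count: usize) {
    const ELEM: usize = size_of::<i32>();
    let base = data.len();
    let total = count * ELEM; // real code should also use checked arithmetic
    data.reserve(total); // guarantees capacity >= base + total
    unsafe {
        std::ptr::write_bytes(data.as_mut_ptr().add(base), 0, total);
        data.set_len(base + total);
    }
}

fn main() {
    let mut data = vec![0xFFu8; 2];
    push_null_ints(&mut data, 3);
    assert_eq!(data.len(), 2 + 3 * 4);
    assert!(data[2..].iter().all(|&b| b == 0));
    println!("{}", data.len());
}
```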
In `@core/rust/qdbr/src/parquet_read/slicer/mod.rs`:
- Around line 49-89: The unsafe ByteSink::extend_from_slice implementations (for
AcVec<u8> and Vec<u8>) assume reserved capacity but append_array currently
writes shape and padding via the unsafe extend_from_slice before reserving for
them (buffers.data_vec.reserve(slicer.data_size()) only covers element values),
so either reserve the extra bytes up-front or use the safe path; modify
append_array to call data_mem.reserve(...) (or buffers.data_vec.reserve(...)
including shape/padding size) before any calls to extend_from_slice that write
shape or padding metadata, or switch those initial writes to
extend_from_slice_safe() to ensure capacity checks; reference the ByteSink impls
(extend_from_slice / extend_from_slice_safe) and the append_array call site to
make this change.
In `@core/rust/qdbr/src/parquet_write/array.rs`:
- Around line 745-785: The unsafe bulk write can overflow because count *
ENTRY_SIZE and base + total_bytes are used unchecked; before calling
aux_mem.reserve(...) and AcVecSetLen::set_len(...) in the default branch,
perform checked arithmetic (use checked_mul for count and ENTRY_SIZE and
checked_add for base and total_bytes), return an Err on overflow, and only then
proceed with reserve and the unsafe copy; reference the identifiers count,
ENTRY_SIZE, total_bytes, base, aux_mem, data_mem, append_array_null, and
AcVecSetLen::set_len when applying these checks.
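The checked-arithmetic guard requested above can be sketched independently of the buffer types; `checked_span` is a hypothetical helper name:

```rust
/// Compute the byte span for `count` fixed-size entries, failing on overflow
/// instead of wrapping (which would under-allocate and corrupt memory).
fn checked_span(
    base: usize,
    count: usize,
    entry_size: usize,
) -> Result<(usize, usize), String> {
    let total = count
        .checked_mul(entry_size)
        .ok_or_else(|| "entry byte count overflows usize".to_string())?;
    let new_len = base
        .checked_add(total)
        .ok_or_else(|| "buffer length overflows usize".to_string())?;
    Ok((total, new_len))
}

fn main() {
    // Normal case: 4 entries of 16 bytes appended after 100 existing bytes.
    assert_eq!(checked_span(100, 4, 16), Ok((64, 164)));
    // Overflow cases return Err instead of wrapping to a tiny allocation:
    assert!(checked_span(0, usize::MAX, 16).is_err());
    assert!(checked_span(usize::MAX, 1, 1).is_err());
    println!("ok");
}
```

Only after `checked_span` succeeds would the caller `reserve(total)` and perform the unsafe copy followed by `set_len(new_len)`.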
In `@core/rust/qdbr/src/parquet_write/schema.rs`:
- Around line 26-35: The current blanket is_notnull_type check (matching
ColumnTypeTag::Boolean | Byte | Short | Char) incorrectly forces
Repetition::Required and drops definition levels; change the logic so those
types are only treated as Required when the incoming required parameter is true
(i.e., remove or gate the is_notnull_type condition), so compute repetition from
designated_timestamp || required (and not from type alone) when setting the
repetition variable used for Parquet schema generation; update any references to
is_notnull_type, repetition, designated_timestamp, and required in this function
to reflect this corrected gating.
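The corrected gating reduces to: repetition comes from the caller-supplied `required` flag or the designated timestamp, never from the type tag alone. A minimal sketch (the enum mirrors Parquet's `Repetition`, simplified):

```rust
#[derive(Debug, PartialEq)]
enum Repetition {
    Required,
    Optional,
}

/// Fixed gating: a not-null *type* (Boolean, Byte, ...) no longer forces
/// Required on its own, because such a column can still carry column-top
/// nulls; only the caller-provided flag or the designated timestamp does.
fn repetition_for(designated_timestamp: bool, required: bool) -> Repetition {
    if designated_timestamp || required {
        Repetition::Required
    } else {
        Repetition::Optional
    }
}

fn main() {
    // e.g. a Boolean column that may still have column-top nulls:
    assert_eq!(repetition_for(false, false), Repetition::Optional);
    assert_eq!(repetition_for(false, true), Repetition::Required);
    assert_eq!(repetition_for(true, false), Repetition::Required);
    println!("ok");
}
```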
In `@core/src/main/resources/io/questdb/site/conf/server.conf`:
- Around line 608-612: The server.conf default for
cairo.partition.encoder.parquet.compression.codec (currently commented as
LZ4_RAW) is inconsistent with the Java default returned by
DefaultCairoConfiguration.getPartitionEncoderParquetCompressionCodec()
(ParquetCompression.COMPRESSION_ZSTD); update the commented default in
server.conf to match the Java default (set to ZSTD and ensure the allowed codec
list/comments include ZSTD), or alternatively change the Java default to
LZ4_RAW—pick one consistent canonical default and make the config key
(cairo.partition.encoder.parquet.compression.codec) and
DefaultCairoConfiguration.getPartitionEncoderParquetCompressionCodec() agree.
🧹 Nitpick comments (7)
core/src/main/java/io/questdb/cutlass/parquet/CopyExportRequestTask.java (1)
396-407: Inconsistent column type retrieval in non-symbol branch.

Line 396 extracts `columnType` into a `final` local variable, but line 406 re-fetches it via `metadata.getColumnType(i)` instead of reusing `columnType`. This is functionally correct but inconsistent with the symbol branch (line 400), which uses the local variable.

♻️ Suggested fix for consistency

```diff
 } else {
-    columnMetadata.add((long) metadata.getWriterIndex(i) << 32 | metadata.getColumnType(i));
+    columnMetadata.add((long) metadata.getWriterIndex(i) << 32 | columnType);
 }
```

core/rust/qdbr/parquet2/src/encoding/bitpacked/decode.rs (1)
62-95: LGTM — minor optimization opportunity.

The `advance` method correctly maintains invariants:

- `remaining` is decremented first, ensuring `next()` won't read stale data if packs are exhausted
- Pack boundary crossing is handled correctly
- The edge case where `packed.next()` returns `None` is safe since `remaining` would be zero

♻️ Optional: use `nth()` instead of a loop for skipping packs

```diff
-    let packs_to_skip = to_skip / T::Unpacked::LENGTH;
-    for _ in 0..packs_to_skip {
-        self.packed.next();
-    }
+    let packs_to_skip = to_skip / T::Unpacked::LENGTH;
+    if packs_to_skip > 0 {
+        self.packed.nth(packs_to_skip - 1);
+    }
```

Using `nth(n - 1)` skips `n` elements more efficiently than a loop, as the `Chunks` iterator may optimize this.

core/src/main/java/io/questdb/cairo/O3PartitionJob.java (2)
1774-1781: Use a named flag for required symbol encoding.

The sign-bit mask in line 1775 is correct but opaque; a named constant makes the intent explicit and reduces the chance of accidental misuse.

♻️ Suggested tweak

```diff
-    if (!symbolMapWriter.getNullFlag()) {
-        encodeColumnType |= 1 << 31;
-    }
+    if (!symbolMapWriter.getNullFlag()) {
+        encodeColumnType |= Integer.MIN_VALUE; // required (sign bit)
+    }
```
2496-2511: Treat ENOENT as benign during phantom-dir cleanup.
If the directory is already gone, logging an error is noisy. Consider skipping the error log when `errno` indicates “not found.”

core/rust/qdbr/src/parquet_write/schema.rs (1)
302-339: Make the required‑bit mask explicit.
`column_type < 0` and `0x7FFFFFFF` are magic values; a named mask improves readability and reduces sign-bit confusion.

♻️ Suggested refactor

```diff
-    let required = column_type < 0;
-    let column_type: ColumnType = (column_type & 0x7FFFFFFF).try_into()?;
+    const REQUIRED_FLAG: i32 = 0x8000_0000u32 as i32;
+    let required = (column_type & REQUIRED_FLAG) != 0;
+    let column_type: ColumnType = (column_type & !REQUIRED_FLAG).try_into()?;
```

core/src/main/java/io/questdb/griffin/engine/table/parquet/PartitionEncoder.java (1)
164-175: Prefer a named constant for the symbol-required flag.

Using `1 << 31` is clear for us now but easy to miss later. A named constant keeps intent explicit and avoids duplication if reused elsewhere.

♻️ Suggested tweak

```diff
 public class PartitionEncoder {
+    private static final int SYMBOL_REQUIRED_FLAG = 1 << 31;
     ...
-    if (!symbolMapReader.containsNullValue()) {
-        encodeColumnType |= 1 << 31;
-    }
+    if (!symbolMapReader.containsNullValue()) {
+        encodeColumnType |= SYMBOL_REQUIRED_FLAG;
+    }
```

core/rust/qdbr/src/parquet_read/slicer/mod.rs (1)
12-31: `SliceSink` overwrites instead of appending.

`ByteSink::extend_from_slice` implies append semantics. The current `SliceSink` always writes at offset 0, so any converter that writes in multiple chunks will clobber prior bytes. Adding a cursor keeps behavior consistent with other `ByteSink` impls.

♻️ Proposed fix

```diff
-pub struct SliceSink<'a>(pub &'a mut [u8]);
+pub struct SliceSink<'a> {
+    buf: &'a mut [u8],
+    pos: usize,
+}

 impl ByteSink for SliceSink<'_> {
     #[inline]
     fn extend_from_slice(&mut self, data: &[u8]) -> ParquetResult<()> {
-        self.0[..data.len()].copy_from_slice(data);
+        let end = self.pos + data.len();
+        self.buf[self.pos..end].copy_from_slice(data);
+        self.pos = end;
         Ok(())
     }

     #[inline]
     fn extend_from_slice_safe(&mut self, data: &[u8]) -> ParquetResult<()> {
-        self.0[..data.len()].copy_from_slice(data);
+        let end = self.pos + data.len();
+        if end > self.buf.len() {
+            return Err(fmt_err!(Layout, "SliceSink overflow"));
+        }
+        self.buf[self.pos..end].copy_from_slice(data);
+        self.pos = end;
         Ok(())
     }
 }
@@
-    C::convert(slice, &mut SliceSink(&mut self.buffer));
+    C::convert(slice, &mut SliceSink { buf: &mut self.buffer, pos: 0 });
```
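The cursor-based sink from the proposed fix can be exercised standalone; this sketch swaps `ParquetResult` for a plain error string but keeps the append and bounds-check semantics:

```rust
/// Append-semantics sink over a fixed slice, tracking a write position.
struct SliceSink<'a> {
    buf: &'a mut [u8],
    pos: usize,
}

impl<'a> SliceSink<'a> {
    fn extend_from_slice(&mut self, data: &[u8]) -> Result<(), &'static str> {
        let end = self.pos + data.len();
        if end > self.buf.len() {
            return Err("SliceSink overflow");
        }
        self.buf[self.pos..end].copy_from_slice(data);
        self.pos = end; // advance so the next write appends instead of clobbering
        Ok(())
    }
}

fn main() {
    let mut backing = [0u8; 8];
    let mut sink = SliceSink { buf: &mut backing, pos: 0 };
    // Two chunked writes must concatenate, not overwrite each other:
    sink.extend_from_slice(b"abc").unwrap();
    sink.extend_from_slice(b"def").unwrap();
    assert_eq!(&sink.buf[..6], b"abcdef");
    // And an oversized write is rejected instead of panicking:
    assert!(sink.extend_from_slice(b"ghij").is_err());
    println!("ok");
}
```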
@CodeRabbit review again
✅ Actions performed: Review triggered.
Note: Some of the code is covered by tests in #6675
@coderabbitai I addressed your comments and added tests, pls review again
✅ Actions performed: Review triggered.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
core/src/main/java/io/questdb/cairo/TableWriter.java (1)
1469-1502: Symbol type classification fails in PartitionDecoder for non-nullable symbols.

The encoding in TableWriter.java:1472 sets `Integer.MIN_VALUE` on symbol types lacking null values. However, `PartitionDecoder.copyToSansUnsupported()` calls `ColumnType.isSymbol(columnType)` without masking (line 303), and `ColumnType.isSymbol()` performs direct equality: `columnType == SYMBOL`. When the flag is set, a symbol becomes negative (e.g., `0x80000000 | 12 = -2147483636`), which fails the equality check and returns false. Lines 298 (isUndefined) and 319 (isTimestamp) have the same issue.

Apply the pattern shown in OwnedMemoryPartitionDescriptor.java:47: call `ColumnType.tagOf(columnType)` to strip the flag before any ColumnType API check:

```java
// In PartitionDecoder.copyToSansUnsupported() around lines 296-303
final int columnType = getColumnType(i);
final int taggedType = ColumnType.tagOf(columnType);
if (ColumnType.isUndefined(taggedType)) { ... }
if (ColumnType.isSymbol(taggedType)) { ... }
// and line 319:
if (ColumnType.isTimestamp(taggedType) && ...)
```

This affects the TableWriter.convertPartitionNativeToParquet() flow at line 1683 where metadata is read from Parquet.
core/src/main/java/io/questdb/cairo/DefaultCairoConfiguration.java (1)
688-695: Fix compression level for LZ4_RAW codec.

Returning `9` is incorrect. LZ4_RAW does not support compression levels in the Parquet specification. The codebase itself confirms this: `PropServerConfiguration` uses level `0` as the default for LZ4_RAW (only ZSTD gets `9`), `ExportModel` explicitly documents that LZ4_RAW doesn't use compression levels, and test expectations set this to `0`. The partition encoder path correctly returns `0` for the same codec. Change this to return `0` to align with the codec and other configuration paths.

Proposed change

```diff
 @Override
 public int getParquetExportCompressionLevel() {
-    return 9;
+    return 0;
 }
```
🤖 Fix all issues with AI agents
In `@core/src/main/java/io/questdb/cutlass/parquet/CopyExportRequestTask.java`:
- Around line 398-408: In CopyExportRequestTask inside the block handling symbol
columns (where ColumnType.isSymbol(columnType) is true), add a null check/assert
for symbolTable returned by pageFrameCursor.getSymbolTable(i) (e.g., assert
symbolTable != null : "Symbol table expected for symbol column " + i) before
calling symbolTable.containsNullValue() to avoid an NPE; also replace the magic
bit mask 1 << 31 with a named constant (e.g., SYMBOL_NON_NULL_FLAG) and use that
constant when setting symbolColumnType so the encoding is clear and
maintainable.
🧹 Nitpick comments (3)
core/rust/qdbr/src/parquet_read/column_sink/fixed.rs (1)
136-148: Consider adding debug_assert for consistency with bulk paths.

The single-element `push()` performs an unsafe write without a capacity check, relying on the caller having called `reserve()`. While this follows the same pattern as other single-push methods and is functionally correct when the contract is honored, adding a `debug_assert!` would provide consistency with the bulk paths and catch contract violations during development.

♻️ Optional: add debug_assert for consistency

```diff
 fn push(&mut self) -> ParquetResult<()> {
     let slice = self.slicer.next();
     let base = self.buffers.data_vec.len();
+    debug_assert!(base + N <= self.buffers.data_vec.capacity());
     unsafe {
         let ptr = self.buffers.data_vec.as_mut_ptr().add(base);
```

core/rust/qdbr/src/parquet_read/slicer/tests.rs (1)
499-502: Minor: Consider bounds checking in test helper.

The `get_dict_value` method uses direct indexing, which will panic on out-of-bounds. This is acceptable for test code as it will catch bugs, but you could optionally use `.get()` with a more descriptive panic message for easier debugging.

core/rust/qdbr/src/parquet_read/slicer/mod.rs (1)
619-666: Consider removing the unused `error` field from `ValueConvertSlicer`.

The `error` field is initialized to `Ok(())` and never modified in the current implementation. The `result()` method includes it in the chain, but it always returns `Ok(())`. Either remove it or document the intended future use.

♻️ Optional: remove unused error field

```diff
 pub struct ValueConvertSlicer<const N: usize, T: DataPageSlicer, C: Converter<N>> {
     inner_slicer: T,
-    error: ParquetResult<()>,
     buffer: [u8; N],
     _converter: std::marker::PhantomData<C>,
 }

 // ...

     fn result(&self) -> ParquetResult<()> {
-        self.error.clone().or(self.inner_slicer.result())
+        self.inner_slicer.result()
     }
 }

 impl<const N: usize, T: DataPageSlicer, C: Converter<N>> ValueConvertSlicer<N, T, C> {
     pub fn new(inner_slicer: T) -> Self {
         Self {
             inner_slicer,
-            error: Ok(()),
             buffer: [0; N],
             _converter: std::marker::PhantomData,
         }
     }
 }
```
core/src/main/java/io/questdb/cutlass/parquet/CopyExportRequestTask.java
[PR Coverage check] 😍 pass: 1376 / 1594 (86.32%) file detail
Optimizes Parquet partition read performance through four improvements:

1. Reduce memory allocations during Parquet page decoding (`decodePage`)
2. Switch default Parquet compression from `ZSTD(level=9)` to `LZ4_RAW`
3. Use REQUIRED repetition for non-null Symbol columns
4. Skip redundant decode on ParquetBuffers cache hit
Combined improvement: ~6x read performance vs master
Test Environment: M4 Pro
Test query:
Results:
Note: All performance numbers are from hot runs.
Final improvement: master branch ~17 s → patch ~2.3 s
Why LZ4_RAW as default?
LZ4_RAW produces ~26% larger files than ZSTD(9) (43.9 MB vs 34.8 MB), but delivers 2x faster read performance.
Notes & Future Work
- REQUIRED definition level potential: The `REQUIRED` repetition shows modest improvement for Symbol columns. Extending this to other non-null columns (e.g., `price`) could yield significant cumulative gains — worth exploring.
- Compression algorithm discussion: For more context on Parquet compression trade-offs, see the Parquet Compression Benchmark.
- Broad applicability: These optimizations benefit all Parquet reads in QuestDB since they target the low-level `decodePage` path.