feat(core): parquet row group pruning with min/max statistics and bloom filters #6739
bluestreak01 merged 215 commits into master from
Conversation
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
core/src/main/java/io/questdb/std/Chars.java (1)
138-162: ⚠️ Potential issue | 🟡 Minor

Inconsistent string comparison semantics between `compare()` and `greaterThan()`/`lessThan()` may cause ordering bugs.

The `compare()` method uses per-character (UTF-16 code unit) comparison, while `greaterThan()` and `lessThan()` use proper code point comparison via `Character.codePointAt()`. For strings with supplementary characters (e.g., emoji, rare CJK), this creates inconsistent ordering. This matters because both are used for overlapping operations:

- `compare()` powers `MinStrGroupByFunction` and `MaxStrGroupByFunction` (aggregation)
- `lessThan()` powers `LtStrFunctionFactory` (SQL `<` filter predicates)

If a query aggregates min/max with `compare()` then applies filter predicates with `lessThan()`, results may be inconsistent for supplementary characters. Consider updating `compare()` to use code point comparison for consistency, or document why UTF-16 semantics are intentional.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@core/src/main/java/io/questdb/std/Chars.java` around lines 138 - 162, The compare(CharSequence l, CharSequence r) implementation currently does UTF-16 code unit subtraction causing inconsistent ordering vs. greaterThan()/lessThan() which use code points; update compare() to perform code point-aware comparison (use Character.codePointAt and advance indices by Character.charCount(codePoint)) so it matches the semantics used by lessThan()/greaterThan(); ensure edge cases (null identity, length/remaining code points) and behavior relied on by MinStrGroupByFunction and MaxStrGroupByFunction remain consistent with LtStrFunctionFactory.
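The code point-aware comparison the prompt above describes can be sketched as a standalone helper. This is a hypothetical illustration (the class name `CodePointCompare` is not QuestDB's; the real method lives in `Chars`), showing the `Character.codePointAt` / `Character.charCount` loop the review asks for:

```java
// Hypothetical sketch of a code point-aware compare(), mirroring the
// semantics the review asks Chars.compare() to adopt. Not QuestDB code.
public class CodePointCompare {
    public static int compare(CharSequence l, CharSequence r) {
        if (l == r) {
            return 0; // identity also covers null == null
        }
        if (l == null) {
            return -1;
        }
        if (r == null) {
            return 1;
        }
        int i = 0, j = 0;
        final int ll = l.length(), rl = r.length();
        while (i < ll && j < rl) {
            // read full code points so surrogate pairs order correctly
            int cpL = Character.codePointAt(l, i);
            int cpR = Character.codePointAt(r, j);
            if (cpL != cpR) {
                return Integer.compare(cpL, cpR);
            }
            i += Character.charCount(cpL);
            j += Character.charCount(cpR);
        }
        // common prefix exhausted: the shorter sequence sorts first
        return Integer.compare(ll - i, rl - j);
    }
}
```

Under this ordering, `"\uFFFF"` (U+FFFF) sorts below `"\uD800\uDC00"` (U+10000), whereas naive per-`char` subtraction would invert that pair.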
♻️ Duplicate comments (6)
core/src/main/java/io/questdb/cairo/TableWriter.java (2)
196-196: ⚠️ Potential issue | 🟠 Major

Avoid per-call free/realloc of `bloomFilterIndexes`; keep one lifecycle and free in `doClose()`.

At Line 1611, `Misc.free(bloomFilterIndexes)` inside the conversion path forces repetitive native free/alloc cycles across partition conversions, and it also leaves cleanup ambiguous when conversion is never invoked. Reuse the list per call (reopen + clear) and release it once in `doClose()`.

♻️ Suggested fix

```diff
@@
-        try {
-            if (bloomFilterColumns != null && !bloomFilterColumns.isEmpty()) {
-                bloomFilterIndexes.reopen();
-                parseBloomFilterColumnIndexes(bloomFilterColumns, bloomFilterIndexes);
-                bloomFilterColumnIndexesPtr = bloomFilterIndexes.getAddress();
-                bloomFilterColumnCount = (int) bloomFilterIndexes.size();
-            }
+        if (bloomFilterColumns != null && !bloomFilterColumns.isEmpty()) {
+            bloomFilterIndexes.reopen();
+            bloomFilterIndexes.clear();
+            parseBloomFilterColumnIndexes(bloomFilterColumns, bloomFilterIndexes);
+            bloomFilterColumnIndexesPtr = bloomFilterIndexes.getAddress();
+            bloomFilterColumnCount = (int) bloomFilterIndexes.size();
+        }
-            PartitionEncoder.encodeWithOptions(
-                    partitionDescriptor,
-                    other,
-                    ParquetCompression.packCompressionCodecLevel(compressionCodec, compressionLevel),
-                    statisticsEnabled,
-                    rawArrayEncoding,
-                    rowGroupSize,
-                    dataPageSize,
-                    parquetVersion,
-                    bloomFilterColumnIndexesPtr,
-                    bloomFilterColumnCount,
-                    fpp
-            );
-        } finally {
-            Misc.free(bloomFilterIndexes);
-        }
+        PartitionEncoder.encodeWithOptions(
+                partitionDescriptor,
+                other,
+                ParquetCompression.packCompressionCodecLevel(compressionCodec, compressionLevel),
+                statisticsEnabled,
+                rawArrayEncoding,
+                rowGroupSize,
+                dataPageSize,
+                parquetVersion,
+                bloomFilterColumnIndexesPtr,
+                bloomFilterColumnCount,
+                fpp
+        );

// In doClose():
//     Misc.free(bloomFilterIndexes);
```

Also applies to: 1589-1612
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@core/src/main/java/io/questdb/cairo/TableWriter.java` at line 196, The field bloomFilterIndexes in TableWriter is being freed per conversion call causing repeated native free/alloc; instead keep it allocated for the TableWriter lifecycle, clear and reuse it during conversions (use reopen/clear on bloomFilterIndexes where conversion logic runs) and move the Misc.free(bloomFilterIndexes) call into TableWriter.doClose() so it is released once when the writer is closed; update conversion code to remove per-call Misc.free and replace with bloomFilterIndexes.clear() or equivalent reuse logic.
1587-1587: ⚠️ Potential issue | 🟠 Major

Validate `fpp` before passing it into native encoding.

Line 1587 accepts infinities and out-of-range values; these should be rejected early to avoid undefined native behavior.

🛡️ Suggested fix

```diff
 double fpp = Double.isNaN(bloomFilterFpp) ? config.getPartitionEncoderParquetBloomFilterFpp() : bloomFilterFpp;
+if (!Double.isFinite(fpp) || fpp <= 0.0 || fpp >= 1.0) {
+    throw CairoException.nonCritical()
+            .put("bloom_filter_fpp must be in (0,1), got: ")
+            .put(fpp);
+}
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@core/src/main/java/io/questdb/cairo/TableWriter.java` at line 1587, Validate the computed fpp in TableWriter (the local variable fpp computed from bloomFilterFpp or config.getPartitionEncoderParquetBloomFilterFpp()) before passing it into native encoding: ensure it's finite (not NaN or infinite) and within the valid probability range (0.0 < fpp < 1.0, or use your project's accepted bounds), and if not throw an IllegalArgumentException with a clear message; update the code path that calls the native encoder to use this validated fpp so invalid values are rejected early.

core/src/main/java/io/questdb/griffin/engine/table/ParquetRowGroupFilter.java (1)
77-79: ⚠️ Potential issue | 🟡 Minor

Rename boolean locals to `is...`/`has...` prefixes.

Line 77 (`skip`), Line 145 (`supported`), and Line 212 (`allCompatible`) should follow the repository boolean naming rule.

As per coding guidelines `**/*.java`: “When choosing a name for a boolean variable, field or method, always use the is... or has... prefix, as appropriate”.

Also applies to: 145-146, 212-221
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@core/src/main/java/io/questdb/griffin/engine/table/ParquetRowGroupFilter.java` around lines 77 - 79, Rename the boolean local variables in ParquetRowGroupFilter to follow the is.../has... convention: change skip (used with decoder.canSkipRowGroup) to isSkippableRowGroup or isSkippable, change supported to isSupported, and change allCompatible to areAllCompatible (or isAllCompatible if you prefer singular). Update every use site within the method(s) (including the conditional checks and any increments like rowGroupsSkipped.incrementAndGet()) to reference the new names so compilation and logic remain unchanged.

core/rust/qdbr/src/parquet_write/file.rs (1)
158-164: ⚠️ Potential issue | 🟠 Major

Use runtime FPP validation in release builds, not only `debug_assert!`.

At Line 159 this check disappears in release builds, so invalid `fpp` can still reach bloom sizing math. Please enforce `0.0 < fpp < 1.0` at runtime (`assert!`/error return) in the public setter.

```shell
#!/bin/bash
# Verify whether FPP validation in Rust is debug-only vs runtime-enforced.
rg -n "with_bloom_filter_fpp|debug_assert!|assert!|bloom_filter_fpp\\s*[<>]" \
  core/rust/qdbr/src/parquet_write/file.rs \
  core/rust/qdbr/src/parquet_write/jni.rs
```

Expected result: if `with_bloom_filter_fpp` only has `debug_assert!`, runtime validation is not guaranteed in release.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@core/rust/qdbr/src/parquet_write/file.rs` around lines 158 - 164, The debug-only check in with_bloom_filter_fpp allows invalid fpp in release builds; replace the debug_assert! with runtime validation: either use assert!(fpp > 0.0 && fpp < 1.0, "...") to panic on invalid input at runtime or (preferable for a public setter) change with_bloom_filter_fpp signature to return Result<Self, Error> and return an Err when fpp is out of range, then only set self.bloom_filter_fpp = fpp and return Ok(self) when valid; update callers accordingly. Ensure you reference the with_bloom_filter_fpp method and the bloom_filter_fpp field when making the change.

core/src/test/java/io/questdb/test/griffin/AlterTableConvertPartitionTest.java (1)
115-179: ⚠️ Potential issue | 🟠 Major

Bloom-option positive-path tests still don’t prove bloom options were applied.

Line 126, Line 142, Line 158, and Line 175 currently only prove conversion succeeded (`assertPartitionExists`). These would still pass if `bloom_filter_columns`/`fpp` were parsed but ignored in the convert path.

Suggested follow-up assertion pattern

```diff
+import io.questdb.griffin.engine.table.ParquetRowGroupFilter;
 ...
 execute("ALTER TABLE " + tableName + " CONVERT PARTITION TO PARQUET LIST '2024-06-10' WITH (bloom_filter_columns = 'id')");
 assertPartitionExists(tableName, "2024-06-10.3");
+ParquetRowGroupFilter.resetRowGroupsSkipped();
+assertQueryNoLeakCheck(
+        "cnt\n0\n",
+        "SELECT COUNT(*) cnt FROM " + tableName + " WHERE id = -1",
+        null,
+        false,
+        true
+);
+Assert.assertTrue(ParquetRowGroupFilter.getRowGroupsSkipped() > 0);
```
Verify each finding against the current code and only fix it if needed. In `@core/src/test/java/io/questdb/test/griffin/AlterTableConvertPartitionTest.java` around lines 115 - 179, Tests like testConvertAllPartitionsWithBloomFilterColumns / testConvertAllPartitionsWithBloomFilterAndFpp / testConvertAllPartitionsWithBloomFilterFpp / testConvertAllPartitionsWithBloomFilterWhere only assert conversion succeeded; update each test to also verify that bloom options were applied by inspecting the converted partition output after execute(...) and assertPartitionExists(...). Concretely: locate the created partition (use the same tableName and partition name used in assertPartitionExists), read the partition’s Parquet/metadata or bloom artifacts (e.g., parquet file footer metadata keys for bloom_filter_columns and fpp or presence/content of bloom filter files for column "id"), and add assertions that the bloom_filter_columns contains "id" and that fpp matches the provided value (e.g., "0.05", "0.01", "0.1"). Ensure these checks run in each corresponding test immediately after assertPartitionExists so the tests fail if the options are parsed but not applied.

core/src/main/java/io/questdb/cutlass/parquet/CopyExportRequestTask.java (1)
589-619: ⚠️ Potential issue | 🟠 Major

Validate bloom filter FPP at the last pre-native boundary.

`fpp` is passed into `createStreamingParquetWriter(...)` without any local range/finite validation (Line 591-618). Even if some call sites validate, this is the final JVM boundary before JNI/native allocation logic, so it’s worth failing fast here too.

This was raised earlier; re-flagging because it still appears unvalidated in the current snippet.

Proposed fix (finite + (0,1) check)

```diff
diff --git a/core/src/main/java/io/questdb/cutlass/parquet/CopyExportRequestTask.java b/core/src/main/java/io/questdb/cutlass/parquet/CopyExportRequestTask.java
@@
 long bloomFilterIndexesPtr = 0;
 int bloomFilterCount = 0;
 double fpp = Double.isNaN(bloomFilterFpp) ? DEFAULT_BLOOM_FILTER_FPP : bloomFilterFpp;
+if (!Double.isFinite(fpp) || fpp <= 0.0d || fpp >= 1.0d) {
+    throw CairoException.nonCritical()
+            .put("bloom_filter_fpp must be between 0 and 1 (exclusive): ")
+            .put(fpp);
+}
 if (bloomFilterColumns != null && !bloomFilterColumns.isEmpty()) {
     bloomFilterColumnIndexes.reopen();
     parseBloomFilterColumnIndexes(bloomFilterColumns, metadata, bloomFilterColumnIndexes, bloomFilterColumnsPosition);
     if (bloomFilterColumnIndexes.size() > 0) {
         bloomFilterIndexesPtr = bloomFilterColumnIndexes.getAddress();
         bloomFilterCount = (int) bloomFilterColumnIndexes.size();
     }
 }
```
Verify each finding against the current code and only fix it if needed. In `@core/src/main/java/io/questdb/cutlass/parquet/CopyExportRequestTask.java` around lines 589 - 619, The code uses fpp (derived from bloomFilterFpp) and passes it to createStreamingParquetWriter without validating it's a finite probability in (0,1); update CopyExportRequestTask to validate fpp after computing it (the local variable fpp) to ensure Double.isFinite(fpp) and fpp > 0.0 && fpp < 1.0, and if not throw an IllegalArgumentException (or similar) with a clear message including the invalid value and parameter name before calling createStreamingParquetWriter; this protects the native boundary that expects a valid false-positive probability.
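As background for why both the TableWriter and CopyExportRequestTask comments insist on the open interval (0,1): the standard bloom filter sizing formula divides by logarithm terms that blow up or change sign at the boundaries, so an unvalidated `fpp` feeds nonsense into native allocation math. A minimal standalone sketch of the validation plus that formula (class and method names here are hypothetical, not QuestDB API):

```java
// Hypothetical helper illustrating the validation both reviews ask for,
// plus the classic bloom filter sizing formula that misbehaves when
// fpp is outside (0, 1): ln(p) is 0 at p = 1 and -infinity at p = 0.
public class BloomFpp {
    // NaN means "not set": fall back to the configured default,
    // then reject non-finite or out-of-range probabilities.
    public static double validateFpp(double fpp, double defaultFpp) {
        double v = Double.isNaN(fpp) ? defaultFpp : fpp;
        if (!Double.isFinite(v) || v <= 0.0 || v >= 1.0) {
            throw new IllegalArgumentException(
                    "bloom_filter_fpp must be in (0,1), got: " + v);
        }
        return v;
    }

    // Optimal bits per inserted key for a target false-positive
    // probability p: m/n = -ln(p) / (ln 2)^2
    public static double bitsPerEntry(double p) {
        double ln2 = Math.log(2);
        return -Math.log(p) / (ln2 * ln2);
    }
}
```

For example, a 1% target FPP costs roughly 9.6 bits per key, which is the kind of figure the native sizing math derives from `fpp`.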
🧹 Nitpick comments (8)
core/rust/qdbr/src/parquet_write/util.rs (1)
428-442: Tighten the boundary test wording and final assertion.

Line 428’s comment is slightly misleading (`i32::MIN` is not the largest `u32`), and adding one final max assertion would better lock the invariant.

Suggested test refinement

```diff
-    // i32::MIN (0x80000000) is largest as unsigned (2^31)
+    // i32::MIN (0x80000000) is larger than i32::MAX under u32 ordering (2^31)
     // i32::MAX (0x7FFFFFFF) is 2^31 - 1 as unsigned
@@
     mm.update_unsigned(i32::MIN); // 0x80000000 = 2147483648u32
     assert_eq!(mm.min, Some(0)); // 0 is still the smallest unsigned
+    assert_eq!(mm.max, Some(-1)); // 0xFFFFFFFF remains the largest unsigned
```
Verify each finding against the current code and only fix it if needed. In `@core/rust/qdbr/src/parquet_write/util.rs` around lines 428 - 442, The boundary test comments and final assertion around mm.update_unsigned are misleading: update the comment for the i32::MIN case to state that i32::MIN maps to 0x80000000 (2147483648u32) rather than calling it the largest u32, and add one more assertion after the i32::MIN update to assert mm.max is still Some(-1) so the invariant is fully checked; locate the test block using mm.update_unsigned, mm.min and mm.max and adjust the comment text and append the final max assertion accordingly.

core/src/test/java/io/questdb/test/std/str/CharsTest.java (1)
386-422: Good test coverage for the code point ordering change.

The tests comprehensively cover null handling, equality, basic ordering, prefix comparisons, empty strings, and supplementary characters. The surrogate pair test case (`"\uD834\uDD1E" > "z"`) correctly validates that code point ordering is used rather than UTF-16 code unit ordering.

Consider adding a test case that would fail under the old per-char comparison but passes under code point comparison, to document the behavioral change:

💡 Optional: Add test for BMP vs supplementary ordering

```java
// U+FFFF (BMP) vs U+10000 (first supplementary)
// Under char comparison: '\uFFFF' (65535) > '\uD800' (55296) → wrong
// Under code point comparison: 65535 < 65536 → correct
Assert.assertTrue(Chars.lessThan("\uFFFF", "\uD800\uDC00"));
Assert.assertTrue(Chars.greaterThan("\uD800\uDC00", "\uFFFF"));
```
Verify each finding against the current code and only fix it if needed. In `@core/src/test/java/io/questdb/test/std/str/CharsTest.java` around lines 386 - 422, Add an explicit BMP vs supplementary comparison to testGreaterThanAndLessThan that would fail under per-char ordering but pass under code point ordering: in the CharsTest.testGreaterThanAndLessThan method, add assertions using Chars.lessThan("\uFFFF", "\uD800\uDC00") and Chars.greaterThan("\uD800\uDC00", "\uFFFF") (i.e., compare U+FFFF to the surrogate pair for U+10000) to document and verify the intended code point semantics.

core/src/main/java/io/questdb/cairo/TableWriter.java (1)
7114-7154: Optional perf refactor: precompute descriptor indexes once in `parseBloomFilterColumnIndexes`.

The current implementation recomputes descriptor positions by scanning prior columns for each token (Line 7132 loop), which is avoidable overhead on wide schemas / large column lists. Consider building a one-pass metadataIndex→descriptorIndex map before token parsing.
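The one-pass mapping suggested here can be sketched as follows. This is a hypothetical standalone illustration: the real code would read column types from QuestDB's `RecordMetadata` rather than a plain `int[]`, and the convention that a non-positive type marks a dropped column is assumed from the review text:

```java
// Hypothetical sketch of the suggested refactor: build the
// metadataIndex -> descriptorIndex map in a single pass, instead of
// rescanning all prior columns for every bloom filter column token.
public class DescriptorIndexMap {
    // columnTypes stands in for metadata.getColumnType(i);
    // type <= 0 is assumed to mean a dropped column with no descriptor.
    public static int[] build(int[] columnTypes) {
        int[] map = new int[columnTypes.length];
        int descriptorIndex = 0;
        for (int i = 0; i < columnTypes.length; i++) {
            if (columnTypes[i] > 0) {
                map[i] = descriptorIndex++;
            } else {
                map[i] = -1; // dropped column: no descriptor position
            }
        }
        return map;
    }
}
```

With the map in hand, the per-token lookup inside the parse loop becomes a constant-time `map[metadataIndex]` instead of an O(columns) rescan, turning the overall cost from O(tokens × columns) into O(tokens + columns).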
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@core/src/main/java/io/questdb/cairo/TableWriter.java` around lines 7114 - 7154, The parseBloomFilterColumnIndexes method is recomputing descriptorIndex by scanning prior columns for every token; precompute a metadataIndex→descriptorIndex mapping once and use it during token parsing to avoid the inner loop. Add a small int[] or DirectIntList (size metadata.getColumnCount()) and populate it in a single pass over metadata (incrementing a counter when metadata.getColumnType(i) > 0) to record descriptorIndex for each metadata index, then replace the per-token loop that computes descriptorIndex with a constant-time lookup into that map inside parseBloomFilterColumnIndexes (keep existing checks for metadataIndex >= 0 and duplicate detection).

core/rust/qdbr/src/parquet_write/file.rs (2)
37-55: `WriteOptions` cloning is now on the hot path after adding `HashSet`.

Line 850 clones `WriteOptions` per page; after Line 52 this deep-clones `bloom_filter_columns` repeatedly. Consider making that field cheap-to-clone (e.g., `Arc<HashSet<usize>>`) or keeping bloom-column selection out of per-page options.

💡 Minimal direction to reduce clone cost

```diff
-    pub bloom_filter_columns: HashSet<usize>,
+    pub bloom_filter_columns: Arc<HashSet<usize>>,
```

```diff
-    pub fn with_bloom_filter_columns(mut self, columns: HashSet<usize>) -> Self {
-        self.bloom_filter_columns = columns;
+    pub fn with_bloom_filter_columns(mut self, columns: HashSet<usize>) -> Self {
+        self.bloom_filter_columns = Arc::new(columns);
         self
     }
```

Also applies to: 841-853
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@core/rust/qdbr/src/parquet_write/file.rs` around lines 37 - 55, WriteOptions is being cloned on the hot path and its bloom_filter_columns (HashSet<usize>) causes expensive deep clones; change bloom_filter_columns to a cheap-to-clone type (e.g., Arc<HashSet<usize>>) or remove it from per-page WriteOptions and pass bloom column selection separately to the per-page writer. Update the WriteOptions struct (symbol: WriteOptions) to use Arc<HashSet<usize>> for bloom_filter_columns and adjust all usages/constructors and clones (including where WriteOptions is cloned around line ~850) to account for Arc, or refactor the code that calls WriteOptions.clone() to accept a separate &HashSet/Arc<HashSet> parameter instead.
530-551: Avoid eager page materialization across partitions to cap peak memory.
`collect_multi_partition_pages` builds a full `Vec` of pages before compression. For large multi-partition row groups this can inflate RSS and add latency. Prefer streaming/flattening iterators directly into `Compressor`.
Verify each finding against the current code and only fix it if needed. In `@core/rust/qdbr/src/parquet_write/file.rs` around lines 530 - 551, collect_multi_partition_pages currently eagerly materializes all Page values into the all_pages Vec causing high memory use; change it to stream pages instead by removing the Vec allocation and either (A) change the function to return a streaming iterator (e.g., impl Iterator<Item = ParquetResult<Page>> or a boxed Box<dyn Iterator...>) that yields the results of chaining/flattening each column_chunk_to_pages(...) iterator (use Iterator::chain or flat_map), or (B) change the API to accept a sink/Compressor callback and directly feed each pages_iter into the Compressor as you iterate. Update callers to consume the iterator or pass the compressor so pages are processed incrementally rather than collected into all_pages. Ensure references to collect_multi_partition_pages, column_chunk_to_pages, and Compressor are updated accordingly.

core/src/main/java/io/questdb/griffin/engine/table/PushdownFilterExtractor.java (1)
103-107: Boolean identifiers should follow the `is...`/`has...` convention.

`allConstant` and `containsQuery(...)` don’t follow the project boolean naming rule.

As per coding guidelines "When choosing a name for a boolean variable, field or method, always use the is... or has... prefix, as appropriate".
Also applies to: 136-152
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@core/src/main/java/io/questdb/griffin/engine/table/PushdownFilterExtractor.java` around lines 103 - 107, Rename boolean identifiers to follow the is/has convention: change the local variable allConstant to a boolean name like areAllConstants (or isAllConstant) and rename the method containsQuery(...) to hasQuery(...). Update all references inside PushdownFilterExtractor (search for allConstant and containsQuery), including the related occurrences around lines 136-152, plus any call sites, imports, and tests to use the new names so compilation succeeds.

core/src/main/java/io/questdb/griffin/SqlCodeGenerator.java (1)
3188-3196: Boolean local names in this block should follow the `is...`/`has...` convention.

`enableParallelFilter` and `enablePreTouch` should be renamed to match repo naming rules for booleans.

As per coding guidelines: "When choosing a name for a boolean variable, field or method, always use the is... or has... prefix, as appropriate".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@core/src/main/java/io/questdb/griffin/SqlCodeGenerator.java` around lines 3188 - 3196, Rename the two local boolean variables to follow the is/has convention: change enableParallelFilter to isParallelFilterEnabled and enablePreTouch to isPreTouchEnabled (assignments remain the same: isParallelFilterEnabled = executionContext.isParallelFilterEnabled(); isPreTouchEnabled = SqlHints.hasEnablePreTouchHint(model, model.getName());), and update all subsequent references in this block (including the if check that uses factory.supportsPageFrameCursor() and the later logic that uses these booleans alongside useJit and canCompile) to use the new names.

core/src/main/java/io/questdb/cutlass/parquet/CopyExportRequestTask.java (1)
85-113: De-duplicate bloom-filter column tokenization + keep static helpers grouped.
`validateBloomFilterColumns()` and `parseBloomFilterColumnIndexes()` implement the same comma-splitting + trimming loop. It’s easy for these to drift again (e.g., one starts accepting whitespace-only tokens, the other rejects). Recommend extracting a shared private static tokenizer/helper so both callers share the exact semantics while still throwing `SqlException` vs `CairoException`.

Also, since both helpers are `static`, consider co-locating them (and keeping static members sorted) to match the repo’s member ordering rules. As per coding guidelines, `**/*.java`: “Group Java class members by kind (static vs. instance) and visibility, sorted alphabetically.”

Also applies to: 347-375
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@core/src/main/java/io/questdb/cutlass/parquet/CopyExportRequestTask.java` around lines 85 - 113, Extract the comma-splitting + trimming logic used in validateBloomFilterColumns and parseBloomFilterColumnIndexes into a single private static helper (e.g., tokenizeColumnList or splitAndTrimColumns) that returns an iterable/list of CharSequence tokens and accepts the original CharSequence input and a base position; then update validateBloomFilterColumns to iterate that helper's tokens and perform meta.getColumnIndexQuiet checks, throwing SqlException with the appropriate position, and update parseBloomFilterColumnIndexes to reuse the same helper but translate missing-column cases into CairoException as before; finally, move and group these new static helper(s) alongside other static members and sort alphabetically to satisfy the class member ordering rules.
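A shared tokenizer of the kind the comment above suggests might look like this. It is a hypothetical standalone sketch (the class name `ColumnListTokenizer` and the choice to skip empty tokens are assumptions, not QuestDB code); in the real class the helper would be private static and its output fed to both the `SqlException` and `CairoException` paths:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical shared tokenizer so validateBloomFilterColumns and
// parseBloomFilterColumnIndexes keep identical split/trim semantics.
public class ColumnListTokenizer {
    // Splits on commas, trims surrounding whitespace from each token,
    // and skips tokens that are empty or whitespace-only.
    public static List<String> tokenize(CharSequence columns) {
        List<String> tokens = new ArrayList<>();
        int start = 0;
        final int n = columns.length();
        for (int i = 0; i <= n; i++) {
            if (i == n || columns.charAt(i) == ',') {
                int lo = start, hi = i;
                while (lo < hi && Character.isWhitespace(columns.charAt(lo))) lo++;
                while (hi > lo && Character.isWhitespace(columns.charAt(hi - 1))) hi--;
                if (hi > lo) {
                    tokens.add(columns.subSequence(lo, hi).toString());
                }
                start = i + 1;
            }
        }
        return tokens;
    }
}
```

Both callers would then iterate the returned tokens and diverge only in how a missing column is reported, which removes the drift risk the review describes.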
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@core/rust/qdbr/src/parquet_read/jni.rs`:
- Around line 124-138: The code currently casts filtered_rows_size to
filtered_rows_count and calls slice::from_raw_parts(filtered_rows_ptr,
filtered_rows_count) without validating magnitude, which allows negative i64
values to become huge usize lengths; add a bounds check before constructing the
slice: verify filtered_rows_count is not larger than the actual row group span
(or total rows available from row_group_bufs/row_group_span) and that
filtered_rows_count is non-negative (i.e., input i64 is >= 0) and that
filtered_rows_ptr is non-null when count > 0; if the check fails return
Err(fmt_err!(InvalidLayout, "...")) rather than creating the slice. Ensure you
reference and use filtered_rows_size/filtered_rows_count, filtered_rows_ptr,
row_group_bufs (or the variable that holds the row span) and avoid calling
slice::from_raw_parts until validation passes.
In `@core/rust/qdbr/src/parquet_read/row_groups.rs`:
- Around line 1048-1070: Parquet metadata can declare FixedLenByteArray(0) which
causes panics when code (e.g., compare_signed_be) indexes a[0]; update the
FixedLenByteArray handling (the match arm using validate_filter_span and the
loops that call is_fixed_len_null / is_fixed_len_null_be and
parquet2::bloom_filter::hash_byte) to first guard for size == 0 and
short-circuit (e.g., return Ok(true) or otherwise skip pruning) to avoid slicing
zero-length values, and make compare_signed_be defensive by returning a safe
non-indexing result when given an empty slice; apply the same zero-length guard
to the other affected blocks that use FixedLenByteArray and decimal compare
paths (references: compare_signed_be, is_fixed_len_null, is_fixed_len_null_be,
validate_filter_span).
In
`@core/src/main/java/io/questdb/griffin/engine/table/PushdownFilterExtractor.java`:
- Around line 80-82: The clear() method in PushdownFilterExtractor only clears
conditions but leaves orValues populated, which can retain stale AST nodes;
update the clear() implementation to also reset orValues (e.g., call
orValues.clear() or reinitialize it) so any reused PushdownFilterExtractor
instance won't hold old expression references; ensure you modify the clear()
method in the PushdownFilterExtractor class and verify that both conditions and
orValues are emptied after clear() is called.
In
`@core/src/test/java/io/questdb/test/griffin/AlterTableConvertPartitionTest.java`:
- Line 476: Two test SQL strings use lowercase SQL keywords; update the
execute(...) calls in AlterTableConvertPartitionTest so keywords are uppercase —
change "list" to "LIST" in the ALTER TABLE CONVERT PARTITION statement (the
execute call containing "ALTER TABLE x CONVERT PARTITION TO NATIVE list
'1970-01'") and change any lowercase "and" to "AND" in the other execute(...)
test statement referenced around line 530 so all SQL keywords conform to the
UPPERCASE test convention.
---
Outside diff comments:
In `@core/src/main/java/io/questdb/std/Chars.java`:
- Around line 138-162: The compare(CharSequence l, CharSequence r)
implementation currently does UTF-16 code unit subtraction causing inconsistent
ordering vs. greaterThan()/lessThan() which use code points; update compare() to
perform code point-aware comparison (use Character.codePointAt and advance
indices by Character.charCount(codePoint)) so it matches the semantics used by
lessThan()/greaterThan(); ensure edge cases (null identity, length/remaining
code points) and behavior relied on by MinStrGroupByFunction and
MaxStrGroupByFunction remain consistent with LtStrFunctionFactory.
---
Duplicate comments:
In `@core/rust/qdbr/src/parquet_write/file.rs`:
- Around line 158-164: The debug-only check in with_bloom_filter_fpp allows
invalid fpp in release builds; replace the debug_assert! with runtime
validation: either use assert!(fpp > 0.0 && fpp < 1.0, "...") to panic on
invalid input at runtime or (preferable for a public setter) change
with_bloom_filter_fpp signature to return Result<Self, Error> and return an Err
when fpp is out of range, then only set self.bloom_filter_fpp = fpp and return
Ok(self) when valid; update callers accordingly. Ensure you reference the
with_bloom_filter_fpp method and the bloom_filter_fpp field when making the
change.
In `@core/src/main/java/io/questdb/cairo/TableWriter.java`:
- Line 196: The field bloomFilterIndexes in TableWriter is being freed per
conversion call causing repeated native free/alloc; instead keep it allocated
for the TableWriter lifecycle, clear and reuse it during conversions (use
reopen/clear on bloomFilterIndexes where conversion logic runs) and move the
Misc.free(bloomFilterIndexes) call into TableWriter.doClose() so it is released
once when the writer is closed; update conversion code to remove per-call
Misc.free and replace with bloomFilterIndexes.clear() or equivalent reuse logic.
- Line 1587: Validate the computed fpp in TableWriter (the local variable fpp
computed from bloomFilterFpp or
config.getPartitionEncoderParquetBloomFilterFpp()) before passing it into native
encoding: ensure it's finite (not NaN or infinite) and within the valid
probability range (0.0 < fpp < 1.0, or use your project's accepted bounds), and
if not throw an IllegalArgumentException with a clear message; update the code
path that calls the native encoder to use this validated fpp so invalid values
are rejected early.
In `@core/src/main/java/io/questdb/cutlass/parquet/CopyExportRequestTask.java`:
- Around line 589-619: The code uses fpp (derived from bloomFilterFpp) and
passes it to createStreamingParquetWriter without validating it's a finite
probability in (0,1); update CopyExportRequestTask to validate fpp after
computing it (the local variable fpp) to ensure Double.isFinite(fpp) and fpp >
0.0 && fpp < 1.0, and if not throw an IllegalArgumentException (or similar) with
a clear message including the invalid value and parameter name before calling
createStreamingParquetWriter; this protects the native boundary that expects a
valid false-positive probability.
In
`@core/src/main/java/io/questdb/griffin/engine/table/ParquetRowGroupFilter.java`:
- Around line 77-79: Rename the boolean local variables in ParquetRowGroupFilter
to follow the is.../has... convention: change skip (used with
decoder.canSkipRowGroup) to isSkippableRowGroup or isSkippable, change supported
to isSupported, and change allCompatible to areAllCompatible (or isAllCompatible
if you prefer singular). Update every use site within the method(s) (including
the conditional checks and any increments like
rowGroupsSkipped.incrementAndGet()) to reference the new names so compilation
and logic remain unchanged.
In
`@core/src/test/java/io/questdb/test/griffin/AlterTableConvertPartitionTest.java`:
- Around line 115-179: Tests like testConvertAllPartitionsWithBloomFilterColumns
/ testConvertAllPartitionsWithBloomFilterAndFpp /
testConvertAllPartitionsWithBloomFilterFpp /
testConvertAllPartitionsWithBloomFilterWhere only assert conversion succeeded;
update each test to also verify that bloom options were applied by inspecting
the converted partition output after execute(...) and
assertPartitionExists(...). Concretely: locate the created partition (use the
same tableName and partition name used in assertPartitionExists), read the
partition’s Parquet/metadata or bloom artifacts (e.g., parquet file footer
metadata keys for bloom_filter_columns and fpp or presence/content of bloom
filter files for column "id"), and add assertions that the bloom_filter_columns
contains "id" and that fpp matches the provided value (e.g., "0.05", "0.01",
"0.1"). Ensure these checks run in each corresponding test immediately after
assertPartitionExists so the tests fail if the options are parsed but not
applied.
---
Nitpick comments:
In `@core/rust/qdbr/src/parquet_write/file.rs`:
- Around line 37-55: WriteOptions is being cloned on the hot path and its
bloom_filter_columns (HashSet<usize>) causes expensive deep clones; change
bloom_filter_columns to a cheap-to-clone type (e.g., Arc<HashSet<usize>>) or
remove it from per-page WriteOptions and pass bloom column selection separately
to the per-page writer. Update the WriteOptions struct (symbol: WriteOptions) to
use Arc<HashSet<usize>> for bloom_filter_columns and adjust all
usages/constructors and clones (including where WriteOptions is cloned around
line ~850) to account for Arc, or refactor the code that calls
WriteOptions.clone() to accept a separate &HashSet/Arc<HashSet> parameter
instead.
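The suggested `Arc<HashSet<usize>>` change amounts to sharing one immutable set by reference instead of deep-copying it on every clone. A minimal Java analogy of the same idea (the `PageWriteOptions` type here is hypothetical, not the actual `WriteOptions` struct):

```java
import java.util.Set;

// Hypothetical analogue of WriteOptions: the column set is immutable and
// shared by reference, so copying the options object is O(1) rather than a
// deep clone of the set — mirroring what Arc::clone does via a refcount bump.
final class PageWriteOptions {
    final Set<Integer> bloomFilterColumns;

    PageWriteOptions(Set<Integer> bloomFilterColumns) {
        this.bloomFilterColumns = bloomFilterColumns;
    }

    PageWriteOptions copy() {
        // shares the same immutable set instead of copying its contents
        return new PageWriteOptions(bloomFilterColumns);
    }
}
```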
- Around line 530-551: collect_multi_partition_pages currently eagerly
materializes all Page values into the all_pages Vec causing high memory use;
change it to stream pages instead by removing the Vec allocation and either (A)
change the function to return a streaming iterator (e.g., impl Iterator<Item =
ParquetResult<Page>> or a boxed Box<dyn Iterator...>) that yields the results of
chaining/flattening each column_chunk_to_pages(...) iterator (use
Iterator::chain or flat_map), or (B) change the API to accept a sink/Compressor
callback and directly feed each pages_iter into the Compressor as you iterate.
Update callers to consume the iterator or pass the compressor so pages are
processed incrementally rather than collected into all_pages. Ensure references
to collect_multi_partition_pages, column_chunk_to_pages, and Compressor are
updated accordingly.
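Option (A) above is essentially a lazy flat-map over per-chunk page iterators. A small Java sketch with integers standing in for pages (`pagesOf` and `consume` are illustrative names, not the real API):

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class StreamingPagesSketch {
    // Stand-in for column_chunk_to_pages: yields one chunk's pages lazily.
    static Stream<Integer> pagesOf(List<Integer> chunk) {
        return chunk.stream();
    }

    // Instead of collecting every page into one big Vec up front, chain the
    // per-chunk iterators so each page can be handed to the compressor as it
    // is produced; here we collect only to observe the result.
    static List<Integer> consume(List<List<Integer>> chunks) {
        return chunks.stream()
                .flatMap(StreamingPagesSketch::pagesOf)
                .collect(Collectors.toList());
    }
}
```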
In `@core/rust/qdbr/src/parquet_write/util.rs`:
- Around line 428-442: The boundary test comments and final assertion around
mm.update_unsigned are misleading: update the comment for the i32::MIN case to
state that i32::MIN maps to 0x80000000 (2147483648u32) rather than calling it
the largest u32, and add one more assertion after the i32::MIN update to assert
mm.max is still Some(-1) so the invariant is fully checked; locate the test
block using mm.update_unsigned, mm.min and mm.max and adjust the comment text
and append the final max assertion accordingly.
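For context on why the original comment is misleading: under signed-to-unsigned reinterpretation, `i32::MIN` maps to 0x80000000 (2147483648), while -1 maps to 0xFFFFFFFF, the actual largest u32. Java's integer helpers demonstrate the same mapping (a sketch of the invariant, not the Rust test itself):

```java
class UnsignedMappingDemo {
    // Reinterpret a signed 32-bit value's bit pattern as an unsigned value.
    static long asUnsigned(int v) {
        return Integer.toUnsignedLong(v);
    }

    // True if a sorts after b under unsigned ordering.
    static boolean unsignedGreater(int a, int b) {
        return Integer.compareUnsigned(a, b) > 0;
    }
}
```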
In `@core/src/main/java/io/questdb/cairo/TableWriter.java`:
- Around line 7114-7154: The parseBloomFilterColumnIndexes method is recomputing
descriptorIndex by scanning prior columns for every token; precompute a
metadataIndex→descriptorIndex mapping once and use it during token parsing to
avoid the inner loop. Add a small int[] or DirectIntList (size
metadata.getColumnCount()) and populate it in a single pass over metadata
(incrementing a counter when metadata.getColumnType(i) > 0) to record
descriptorIndex for each metadata index, then replace the per-token loop that
computes descriptorIndex with a constant-time lookup into that map inside
parseBloomFilterColumnIndexes (keep existing checks for metadataIndex >= 0 and
duplicate detection).
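The suggested single-pass mapping can be sketched as follows (array-based metadata and the `build` method are hypothetical simplifications; the real code would read types from the table metadata):

```java
class DescriptorIndexMap {
    // One pass over column types: active columns (type > 0) receive
    // consecutive descriptor indexes; dropped columns map to -1.
    // Per-token lookups then become constant-time array reads.
    static int[] build(int[] columnTypes) {
        int[] map = new int[columnTypes.length];
        int descriptorIndex = 0;
        for (int i = 0; i < columnTypes.length; i++) {
            map[i] = columnTypes[i] > 0 ? descriptorIndex++ : -1;
        }
        return map;
    }
}
```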
In `@core/src/main/java/io/questdb/cutlass/parquet/CopyExportRequestTask.java`:
- Around line 85-113: Extract the comma-splitting + trimming logic used in
validateBloomFilterColumns and parseBloomFilterColumnIndexes into a single
private static helper (e.g., tokenizeColumnList or splitAndTrimColumns) that
returns an iterable/list of CharSequence tokens and accepts the original
CharSequence input and a base position; then update validateBloomFilterColumns
to iterate that helper's tokens and perform meta.getColumnIndexQuiet checks,
throwing SqlException with the appropriate position, and update
parseBloomFilterColumnIndexes to reuse the same helper but translate
missing-column cases into CairoException as before; finally, move and group
these new static helper(s) alongside other static members and sort
alphabetically to satisfy the class member ordering rules.
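A minimal version of the proposed shared helper (the name `tokenizeColumnList` follows the reviewer's suggestion; position tracking and QuestDB's zero-GC string types are omitted for brevity):

```java
import java.util.ArrayList;
import java.util.List;

class ColumnListTokenizer {
    // Split a comma-separated column list and trim surrounding whitespace;
    // empty tokens are dropped so "a,,b" yields two tokens.
    static List<String> tokenizeColumnList(CharSequence input) {
        List<String> tokens = new ArrayList<>();
        for (String raw : input.toString().split(",")) {
            String token = raw.trim();
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }
}
```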
In
`@core/src/main/java/io/questdb/griffin/engine/table/PushdownFilterExtractor.java`:
- Around line 103-107: Rename boolean identifiers to follow the is/has
convention: change the local variable allConstant to a boolean name like
areAllConstants (or isAllConstant) and rename the method containsQuery(...) to
hasQuery(...). Update all references inside PushdownFilterExtractor (search for
allConstant and containsQuery), including the related occurrences around lines
136-152, plus any call sites, imports, and tests to use the new names so
compilation succeeds.
In `@core/src/main/java/io/questdb/griffin/SqlCodeGenerator.java`:
- Around line 3188-3196: Rename the two local boolean variables to follow the
is/has convention: change enableParallelFilter to isParallelFilterEnabled and
enablePreTouch to isPreTouchEnabled (assignments remain the same:
isParallelFilterEnabled = executionContext.isParallelFilterEnabled();
isPreTouchEnabled = SqlHints.hasEnablePreTouchHint(model, model.getName());),
and update all subsequent references in this block (including the if check that
uses factory.supportsPageFrameCursor() and the later logic that uses these
booleans alongside useJit and canCompile) to use the new names.
In `@core/src/test/java/io/questdb/test/std/str/CharsTest.java`:
- Around line 386-422: Add an explicit BMP vs supplementary comparison to
testGreaterThanAndLessThan that would fail under per-char ordering but pass
under code point ordering: in the CharsTest.testGreaterThanAndLessThan method,
add assertions using Chars.lessThan("\uFFFF", "\uD800\uDC00") and
Chars.greaterThan("\uD800\uDC00", "\uFFFF") (i.e., compare U+FFFF to the
surrogate pair for U+10000) to document and verify the intended code point
semantics.
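Plain Java shows why these assertions are worth adding: `String.compareTo` (UTF-16 code unit order) and code point order disagree exactly on this BMP-vs-supplementary boundary. A sketch of the two orderings (not QuestDB's `Chars` implementation):

```java
class CodePointOrderDemo {
    // Compare two one-character strings by their first code point,
    // decoding surrogate pairs, rather than by raw UTF-16 code units.
    static int compareByCodePoint(String a, String b) {
        return Integer.compare(a.codePointAt(0), b.codePointAt(0));
    }
}
```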
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 1a3e7343-df1f-4aca-92ce-b6a43dd5f874
⛔ Files ignored due to path filters (5)
- core/src/main/resources/io/questdb/bin/darwin-aarch64/libquestdbr.dylib is excluded by `!**/*.dylib`
- core/src/main/resources/io/questdb/bin/darwin-x86-64/libquestdbr.dylib is excluded by `!**/*.dylib`
- core/src/main/resources/io/questdb/bin/linux-aarch64/libquestdbr.so is excluded by `!**/*.so`
- core/src/main/resources/io/questdb/bin/linux-x86-64/libquestdbr.so is excluded by `!**/*.so`
- core/src/main/resources/io/questdb/bin/windows-x86-64/questdbr.dll is excluded by `!**/*.dll`
📒 Files selected for processing (35)
- core/rust/qdbr/src/parquet_read/decode.rs
- core/rust/qdbr/src/parquet_read/decoders/converters.rs
- core/rust/qdbr/src/parquet_read/decoders/dictionary.rs
- core/rust/qdbr/src/parquet_read/jni.rs
- core/rust/qdbr/src/parquet_read/mod.rs
- core/rust/qdbr/src/parquet_read/row_groups.rs
- core/rust/qdbr/src/parquet_write/file.rs
- core/rust/qdbr/src/parquet_write/jni.rs
- core/rust/qdbr/src/parquet_write/mod.rs
- core/rust/qdbr/src/parquet_write/symbol.rs
- core/rust/qdbr/src/parquet_write/util.rs
- core/src/main/java/io/questdb/PropertyKey.java
- core/src/main/java/io/questdb/cairo/CairoConfiguration.java
- core/src/main/java/io/questdb/cairo/TableWriter.java
- core/src/main/java/io/questdb/cairo/sql/PartitionFrameCursorFactory.java
- core/src/main/java/io/questdb/cutlass/http/processors/ExportQueryProcessor.java
- core/src/main/java/io/questdb/cutlass/parquet/CopyExportRequestTask.java
- core/src/main/java/io/questdb/griffin/SqlCodeGenerator.java
- core/src/main/java/io/questdb/griffin/SqlCompilerImpl.java
- core/src/main/java/io/questdb/griffin/SqlParser.java
- core/src/main/java/io/questdb/griffin/engine/functions/table/ReadParquetPageFrameCursor.java
- core/src/main/java/io/questdb/griffin/engine/table/BwdTableReaderPageFrameCursor.java
- core/src/main/java/io/questdb/griffin/engine/table/FwdTableReaderPageFrameCursor.java
- core/src/main/java/io/questdb/griffin/engine/table/ParquetRowGroupFilter.java
- core/src/main/java/io/questdb/griffin/engine/table/PushdownFilterExtractor.java
- core/src/main/java/io/questdb/griffin/engine/table/parquet/PartitionDecoder.java
- core/src/main/java/io/questdb/griffin/engine/table/parquet/PartitionEncoder.java
- core/src/main/java/io/questdb/std/Chars.java
- core/src/main/resources/io/questdb/site/conf/server.conf
- core/src/test/java/io/questdb/test/griffin/AlterTableConvertPartitionTest.java
- core/src/test/java/io/questdb/test/griffin/CopyExportTest.java
- core/src/test/java/io/questdb/test/griffin/engine/table/parquet/ReadParquetFunctionTest.java
- core/src/test/java/io/questdb/test/std/str/CharsTest.java
- core/src/test/resources/sqllogictest/test/parquet/test_parquet_filter_pushdown.test
- pkg/ami/marketplace/assets/server.conf
🚧 Files skipped from review as they are similar to previous changes (10)
- core/src/test/java/io/questdb/test/griffin/CopyExportTest.java
- core/rust/qdbr/src/parquet_read/decoders/converters.rs
- core/src/main/java/io/questdb/cairo/CairoConfiguration.java
- core/src/test/resources/sqllogictest/test/parquet/test_parquet_filter_pushdown.test
- core/rust/qdbr/src/parquet_read/decode.rs
- core/src/main/java/io/questdb/PropertyKey.java
- core/src/main/java/io/questdb/griffin/SqlParser.java
- pkg/ami/marketplace/assets/server.conf
- core/src/main/resources/io/questdb/site/conf/server.conf
- core/src/main/java/io/questdb/griffin/engine/table/parquet/PartitionEncoder.java
Resolved review threads:
- core/src/main/java/io/questdb/griffin/engine/table/PushdownFilterExtractor.java
- core/src/test/java/io/questdb/test/griffin/AlterTableConvertPartitionTest.java (outdated)

Resolve conflicts from upstream clippy fixes and binary rebuilds. Accept upstream binaries, adopt combined #[allow(clippy::...)] attrs and removed redundant doc comment. Keep branch CI fixes (extra path pattern and object-count check) while adopting upstream's find-based chmod. Co-Authored-By: Claude Opus 4.6 <[email protected]>
The write-side Parquet JNI functions used panic!/assert!/expect() which would crash the JVM via undefined behavior if triggered. This commit replaces them with proper error returns, matching the pattern already used by the read-side JNI functions.

Changes by category:

Java-guaranteed invariants (null pointer checks from Java callers):
- PartitionUpdater destroy/updateFileMetadata/updateRowGroup: replace panic!/assert! with if-null checks that throw CairoException or silently return (for destroy, matching free(NULL) semantics).

External-data-dependent panics (corrupted Parquet/column data):
- update.rs: replace panic on invalid Parquet version with Result error.
- string.rs: replace expect() on invalid offsets and UTF-16 data with map_err error propagation. Change encode_plain/encode_delta to return ParquetResult.
- binary.rs: replace expect() on invalid offsets with map_err. Add upfront offset validation in encode_delta to protect all downstream as-usize casts. Change encode_plain/encode_delta to return ParquetResult.
- varchar.rs: replace assert! on data corruption with Err return.
- column_chunk.rs: replace unwrap() on bloom filter mutex with map_err to handle poisoned lock gracefully.

Also changes create_partition_descriptor's assert_eq! on col_data_len alignment to a proper Err return, since the function already returns ParquetResult.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Co-Authored-By: Claude Opus 4.6 <[email protected]>
…nto parquet_filter_pushdown_bloom
[PR Coverage check] 😍 pass: 4877 / 6029 (80.89%)
Depends on #6675.
Summary
Queries on Parquet partitions can now skip entire row groups that contain no matching rows. Row group pruning uses three strategies:
- Min/max statistics: row groups whose min/max range excludes the filter value are skipped.
- Bloom filters: for columns listed in the `bloom_filter_columns` option (see DDL/API details below).
- Null counts: row groups with no nulls (for `IS NULL`) or where all values are null (for `IS NOT NULL`) are skipped.

Pruning applies to all Parquet read paths (forward scan, backward scan, `read_parquet()`, parallel page frame execution).

Supported filter operations

- `col = value`
- `col IN (v1, v2, ...)`
- `col < value`
- `col <= value`
- `col > value`
- `col >= value`
- `col BETWEEN v1 AND v2`
- `col IS NULL`
- `col IS NOT NULL`
- `col = v1 OR col = v2 OR ...`

Supported column types
Pushdown filtering supports: BYTE, SHORT, CHAR, INT, LONG, FLOAT, DOUBLE, TIMESTAMP, DATE, IPv4, UUID, LONG128, STRING, SYMBOL, VARCHAR, and all DECIMAL widths (8/16/32/64/128/256).
Not supported (pruning silently skipped): BOOLEAN, LONG256, GEOBYTE/GEOSHORT/GEOINT/GEOLONG, BINARY, INTERVAL.
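For the supported operations, the min/max strategy reduces to interval checks against the row group's statistics. A hedged sketch for a long column (illustrative only, not the native `canSkipRowGroup` implementation):

```java
class MinMaxPruneSketch {
    // col = value: the row group can be skipped when value lies
    // outside the group's [min, max] range.
    static boolean canSkipEquals(long min, long max, long value) {
        return value < min || value > max;
    }

    // col BETWEEN lo AND hi: skip when [lo, hi] does not overlap [min, max].
    static boolean canSkipBetween(long min, long max, long lo, long hi) {
        return hi < min || lo > max;
    }
}
```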
Filter extraction
Top-level AND-connected predicates are extracted; multiple AND conditions form multiple pushdown filter entries. OR-connected equalities on the same column (`col = v1 OR col = v2 OR ...`) are also extracted and treated as an IN-list for bloom filter and min/max checks.

Value expressions must be constant or runtime-constant (e.g., bind variables, `now()`). Non-constant expressions are silently skipped.

Implementation notes
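One implementation note worth illustrating: the conjunct split described under Filter extraction is a plain walk over the expression tree. A toy sketch with a hypothetical `Node` type (not QuestDB's actual ExpressionNode):

```java
import java.util.ArrayList;
import java.util.List;

class ConjunctSketch {
    static final class Node {
        final String op;  // "AND" for conjunctions; anything else is a leaf predicate
        final Node left, right;

        Node(String op, Node left, Node right) {
            this.op = op;
            this.left = left;
            this.right = right;
        }
    }

    // Recursively split top-level AND nodes; every non-AND subtree
    // becomes one pushdown filter candidate.
    static void collectConjuncts(Node n, List<Node> out) {
        if (n == null) {
            return;
        }
        if ("AND".equals(n.op)) {
            collectConjuncts(n.left, out);
            collectConjuncts(n.right, out);
        } else {
            out.add(n);
        }
    }
}
```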
`Chars.lessThan()`/`Chars.greaterThan()` now use Unicode code point comparison instead of Java's UTF-16 code unit comparison. This aligns STRING/SYMBOL ordering with the SQL-standard byte-order collation and enables range filter pushdown for string columns.

Bloom filter generation
Bloom filter columns and false positive probability (FPP) can be specified on three paths:
- `ALTER TABLE t CONVERT PARTITION TO PARQUET WITH (bloom_filter_columns = 'col1,col2', fpp = 0.01)`
- `COPY t TO '/path/file.parquet' WITH (bloom_filter_columns = 'col1,col2', bloom_filter_fpp = 0.01)`
- HTTP export (`/exp`): query parameters `bloom_filter_columns=col1,col2` and `bloom_filter_fpp=0.01`

The bloom filter bitset is sized optimally using the exact NDV (number of distinct values) collected during encoding via a `HashSet<u64>`. The sizing formula follows the Parquet Split Block Bloom Filter spec: `m = -8 * ndv / ln(1 - fpp^(1/8))`, clamped to [32 bytes, 4 MB].

Configuration
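The fpp settings below feed the sizing formula above: a lower fpp yields a larger bitset for the same NDV. As a numeric sanity check, a sketch of the stated formula (assuming it yields bits, hence the division by 8 before the byte clamp):

```java
class BloomSizeSketch {
    static final long MIN_BYTES = 32L;
    static final long MAX_BYTES = 4L * 1024 * 1024;

    // m = -8 * ndv / ln(1 - fpp^(1/8)) bits, following the Parquet
    // split-block bloom filter sizing with 8 hash functions;
    // the resulting byte count is clamped to [32 B, 4 MB].
    static long bloomFilterBytes(long ndv, double fpp) {
        double bits = -8.0 * ndv / Math.log(1.0 - Math.pow(fpp, 1.0 / 8.0));
        long bytes = (long) Math.ceil(bits / 8.0);
        return Math.min(Math.max(bytes, MIN_BYTES), MAX_BYTES);
    }
}
```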
- `cairo.sql.parquet.row.group.pruning.enabled` (default: `true`)
- `cairo.partition.encoder.parquet.bloom.filter.fpp` (default: `0.01`)
- `cairo.parquet.export.bloom.filter.fpp` (default: `0.01`)
- `cairo.parquet.export.statistics.enabled` (default: `true`)

Other changes
- `PartitionDecoder.Metadata.getColumnIndex()` now uses case-insensitive column name matching (`Chars.equalsIgnoreCase`) to align with QuestDB's case-insensitive column semantics and improve interoperability with Parquet files generated by other tools.
- The O3 merge path (`O3PartitionJob`) now propagates the global `bloomFilterFpp` configuration to `PartitionUpdater`, but does not propagate per-column bloom filter indexes. O3-merged Parquet partitions do not preserve the original per-column bloom filter specification.

Benchmark
Using the ClickBench
`hits.parquet` dataset (~100M rows, ~15GB):

This query benefits from min/max statistics alone — no bloom filters needed. Most row groups are skipped because the target value falls outside their min/max range.
TODO
- `canSkipRowGroup` native function
- `CopyExportFactory` and `ExportQueryProcessor` Parquet export paths
- Config option to disable `canSkipRowGroup` behavior, allowing the skip logic to be disabled if bugs arise
- Range filters (`<`, `<=`, `>`, `>=`, `BETWEEN`)
- Null filters (`IS NULL`, `IS NOT NULL`)
- OR-connected equalities (`col = v1 OR col = v2`)
- Tests for `skipRowGroup` covering all supported column types
- Tests for `skipRowGroup` covering all supported column types, including column top scenarios
- `CopyExportFactory` and `CONVERT TO PARQUET` DDL with `bloom_filter_columns` option