feat(core): parquet row group pruning with min/max statistics and bloom filters #6739
bluestreak01 merged 215 commits into master from
Conversation
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
core/src/main/java/io/questdb/std/Chars.java (1)
138-162: ⚠️ Potential issue | 🟡 Minor

Inconsistent string comparison semantics between `compare()` and `greaterThan()`/`lessThan()` may cause ordering bugs.

The `compare()` method uses per-character (UTF-16 code unit) comparison, while `greaterThan()` and `lessThan()` use proper code point comparison via `Character.codePointAt()`. For strings with supplementary characters (e.g., emoji, rare CJK), this creates inconsistent ordering. This matters because both are used for overlapping operations:

- `compare()` powers `MinStrGroupByFunction` and `MaxStrGroupByFunction` (aggregation)
- `lessThan()` powers `LtStrFunctionFactory` (SQL `<` filter predicates)

If a query aggregates min/max with `compare()` then applies filter predicates with `lessThan()`, results may be inconsistent for supplementary characters. Consider updating `compare()` to use code point comparison for consistency, or document why UTF-16 semantics are intentional.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@core/src/main/java/io/questdb/std/Chars.java` around lines 138 - 162, The compare(CharSequence l, CharSequence r) implementation currently does UTF-16 code unit subtraction causing inconsistent ordering vs. greaterThan()/lessThan() which use code points; update compare() to perform code point-aware comparison (use Character.codePointAt and advance indices by Character.charCount(codePoint)) so it matches the semantics used by lessThan()/greaterThan(); ensure edge cases (null identity, length/remaining code points) and behavior relied on by MinStrGroupByFunction and MaxStrGroupByFunction remain consistent with LtStrFunctionFactory.
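The code point-aware comparison the prompt above describes can be sketched as a standalone helper. This is a hypothetical illustration (the class name `CodePointCompare` is not QuestDB's; the real method lives in `Chars`), showing the `Character.codePointAt` / `Character.charCount` loop the review asks for:

```java
// Hypothetical sketch of a code point-aware compare(), mirroring the
// semantics the review asks Chars.compare() to adopt. Not QuestDB code.
public class CodePointCompare {
    public static int compare(CharSequence l, CharSequence r) {
        if (l == r) {
            return 0; // identity also covers null == null
        }
        if (l == null) {
            return -1;
        }
        if (r == null) {
            return 1;
        }
        int i = 0, j = 0;
        final int ll = l.length(), rl = r.length();
        while (i < ll && j < rl) {
            // read full code points so surrogate pairs order correctly
            int cpL = Character.codePointAt(l, i);
            int cpR = Character.codePointAt(r, j);
            if (cpL != cpR) {
                return Integer.compare(cpL, cpR);
            }
            i += Character.charCount(cpL);
            j += Character.charCount(cpR);
        }
        // common prefix exhausted: the shorter sequence sorts first
        return Integer.compare(ll - i, rl - j);
    }
}
```

Under this ordering, `"\uFFFF"` (U+FFFF) sorts below `"\uD800\uDC00"` (U+10000), whereas naive per-`char` subtraction would invert that pair.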
♻️ Duplicate comments (6)
core/src/main/java/io/questdb/cairo/TableWriter.java (2)
196-196: ⚠️ Potential issue | 🟠 Major

Avoid per-call free/realloc of `bloomFilterIndexes`; keep one lifecycle and free in `doClose()`.

At Line 1611, `Misc.free(bloomFilterIndexes)` inside the conversion path forces repetitive native free/alloc cycles across partition conversions, and it also leaves cleanup ambiguous when conversion is never invoked. Reuse the list per call (reopen + clear) and release it once in `doClose()`.

♻️ Suggested fix

```diff
@@
-        try {
-            if (bloomFilterColumns != null && !bloomFilterColumns.isEmpty()) {
-                bloomFilterIndexes.reopen();
-                parseBloomFilterColumnIndexes(bloomFilterColumns, bloomFilterIndexes);
-                bloomFilterColumnIndexesPtr = bloomFilterIndexes.getAddress();
-                bloomFilterColumnCount = (int) bloomFilterIndexes.size();
-            }
+        if (bloomFilterColumns != null && !bloomFilterColumns.isEmpty()) {
+            bloomFilterIndexes.reopen();
+            bloomFilterIndexes.clear();
+            parseBloomFilterColumnIndexes(bloomFilterColumns, bloomFilterIndexes);
+            bloomFilterColumnIndexesPtr = bloomFilterIndexes.getAddress();
+            bloomFilterColumnCount = (int) bloomFilterIndexes.size();
+        }
-            PartitionEncoder.encodeWithOptions(
-                    partitionDescriptor,
-                    other,
-                    ParquetCompression.packCompressionCodecLevel(compressionCodec, compressionLevel),
-                    statisticsEnabled,
-                    rawArrayEncoding,
-                    rowGroupSize,
-                    dataPageSize,
-                    parquetVersion,
-                    bloomFilterColumnIndexesPtr,
-                    bloomFilterColumnCount,
-                    fpp
-            );
-        } finally {
-            Misc.free(bloomFilterIndexes);
-        }
+        PartitionEncoder.encodeWithOptions(
+                partitionDescriptor,
+                other,
+                ParquetCompression.packCompressionCodecLevel(compressionCodec, compressionLevel),
+                statisticsEnabled,
+                rawArrayEncoding,
+                rowGroupSize,
+                dataPageSize,
+                parquetVersion,
+                bloomFilterColumnIndexesPtr,
+                bloomFilterColumnCount,
+                fpp
+        );

// In doClose():
//     Misc.free(bloomFilterIndexes);
```

Also applies to: 1589-1612
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@core/src/main/java/io/questdb/cairo/TableWriter.java` at line 196, The field bloomFilterIndexes in TableWriter is being freed per conversion call causing repeated native free/alloc; instead keep it allocated for the TableWriter lifecycle, clear and reuse it during conversions (use reopen/clear on bloomFilterIndexes where conversion logic runs) and move the Misc.free(bloomFilterIndexes) call into TableWriter.doClose() so it is released once when the writer is closed; update conversion code to remove per-call Misc.free and replace with bloomFilterIndexes.clear() or equivalent reuse logic.
1587-1587: ⚠️ Potential issue | 🟠 Major

Validate `fpp` before passing it into native encoding.

Line 1587 accepts infinities and out-of-range values; these should be rejected early to avoid undefined native behavior.

🛡️ Suggested fix

```diff
 double fpp = Double.isNaN(bloomFilterFpp) ? config.getPartitionEncoderParquetBloomFilterFpp() : bloomFilterFpp;
+if (!Double.isFinite(fpp) || fpp <= 0.0 || fpp >= 1.0) {
+    throw CairoException.nonCritical()
+            .put("bloom_filter_fpp must be in (0,1), got: ")
+            .put(fpp);
+}
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@core/src/main/java/io/questdb/cairo/TableWriter.java` at line 1587, Validate the computed fpp in TableWriter (the local variable fpp computed from bloomFilterFpp or config.getPartitionEncoderParquetBloomFilterFpp()) before passing it into native encoding: ensure it's finite (not NaN or infinite) and within the valid probability range (0.0 < fpp < 1.0, or use your project's accepted bounds), and if not throw an IllegalArgumentException with a clear message; update the code path that calls the native encoder to use this validated fpp so invalid values are rejected early.

core/src/main/java/io/questdb/griffin/engine/table/ParquetRowGroupFilter.java (1)
77-79: ⚠️ Potential issue | 🟡 Minor

Rename boolean locals to `is...`/`has...` prefixes.

Line 77 (`skip`), Line 145 (`supported`), and Line 212 (`allCompatible`) should follow the repository boolean naming rule.

As per coding guidelines `**/*.java`: “When choosing a name for a boolean variable, field or method, always use the is... or has... prefix, as appropriate”.

Also applies to: 145-146, 212-221
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@core/src/main/java/io/questdb/griffin/engine/table/ParquetRowGroupFilter.java` around lines 77 - 79, Rename the boolean local variables in ParquetRowGroupFilter to follow the is.../has... convention: change skip (used with decoder.canSkipRowGroup) to isSkippableRowGroup or isSkippable, change supported to isSupported, and change allCompatible to areAllCompatible (or isAllCompatible if you prefer singular). Update every use site within the method(s) (including the conditional checks and any increments like rowGroupsSkipped.incrementAndGet()) to reference the new names so compilation and logic remain unchanged.

core/rust/qdbr/src/parquet_write/file.rs (1)
158-164: ⚠️ Potential issue | 🟠 Major

Use runtime FPP validation in release builds, not only `debug_assert!`.

At Line 159 this check disappears in release builds, so invalid `fpp` can still reach bloom sizing math. Please enforce `0.0 < fpp < 1.0` at runtime (`assert!`/error return) in the public setter.

```shell
#!/bin/bash
# Verify whether FPP validation in Rust is debug-only vs runtime-enforced.
rg -n "with_bloom_filter_fpp|debug_assert!|assert!|bloom_filter_fpp\\s*[<>]" \
  core/rust/qdbr/src/parquet_write/file.rs \
  core/rust/qdbr/src/parquet_write/jni.rs
```

Expected result: if `with_bloom_filter_fpp` only has `debug_assert!`, runtime validation is not guaranteed in release.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@core/rust/qdbr/src/parquet_write/file.rs` around lines 158 - 164, The debug-only check in with_bloom_filter_fpp allows invalid fpp in release builds; replace the debug_assert! with runtime validation: either use assert!(fpp > 0.0 && fpp < 1.0, "...") to panic on invalid input at runtime or (preferable for a public setter) change with_bloom_filter_fpp signature to return Result<Self, Error> and return an Err when fpp is out of range, then only set self.bloom_filter_fpp = fpp and return Ok(self) when valid; update callers accordingly. Ensure you reference the with_bloom_filter_fpp method and the bloom_filter_fpp field when making the change.

core/src/test/java/io/questdb/test/griffin/AlterTableConvertPartitionTest.java (1)
115-179: ⚠️ Potential issue | 🟠 Major

Bloom-option positive-path tests still don’t prove bloom options were applied.

Line 126, Line 142, Line 158, and Line 175 currently only prove conversion succeeded (`assertPartitionExists`). These would still pass if `bloom_filter_columns`/`fpp` were parsed but ignored in the convert path.

Suggested follow-up assertion pattern

```diff
+import io.questdb.griffin.engine.table.ParquetRowGroupFilter;
 ...
 execute("ALTER TABLE " + tableName + " CONVERT PARTITION TO PARQUET LIST '2024-06-10' WITH (bloom_filter_columns = 'id')");
 assertPartitionExists(tableName, "2024-06-10.3");
+ParquetRowGroupFilter.resetRowGroupsSkipped();
+assertQueryNoLeakCheck(
+        "cnt\n0\n",
+        "SELECT COUNT(*) cnt FROM " + tableName + " WHERE id = -1",
+        null,
+        false,
+        true
+);
+Assert.assertTrue(ParquetRowGroupFilter.getRowGroupsSkipped() > 0);
```
Verify each finding against the current code and only fix it if needed. In `@core/src/test/java/io/questdb/test/griffin/AlterTableConvertPartitionTest.java` around lines 115 - 179, Tests like testConvertAllPartitionsWithBloomFilterColumns / testConvertAllPartitionsWithBloomFilterAndFpp / testConvertAllPartitionsWithBloomFilterFpp / testConvertAllPartitionsWithBloomFilterWhere only assert conversion succeeded; update each test to also verify that bloom options were applied by inspecting the converted partition output after execute(...) and assertPartitionExists(...). Concretely: locate the created partition (use the same tableName and partition name used in assertPartitionExists), read the partition’s Parquet/metadata or bloom artifacts (e.g., parquet file footer metadata keys for bloom_filter_columns and fpp or presence/content of bloom filter files for column "id"), and add assertions that the bloom_filter_columns contains "id" and that fpp matches the provided value (e.g., "0.05", "0.01", "0.1"). Ensure these checks run in each corresponding test immediately after assertPartitionExists so the tests fail if the options are parsed but not applied.

core/src/main/java/io/questdb/cutlass/parquet/CopyExportRequestTask.java (1)
589-619: ⚠️ Potential issue | 🟠 Major

Validate bloom filter FPP at the last pre-native boundary.

`fpp` is passed into `createStreamingParquetWriter(...)` without any local range/finite validation (Line 591-618). Even if some call sites validate, this is the final JVM boundary before JNI/native allocation logic, so it’s worth failing fast here too.

This was raised earlier; re-flagging because it still appears unvalidated in the current snippet.

Proposed fix (finite + (0,1) check)

```diff
diff --git a/core/src/main/java/io/questdb/cutlass/parquet/CopyExportRequestTask.java b/core/src/main/java/io/questdb/cutlass/parquet/CopyExportRequestTask.java
@@
 long bloomFilterIndexesPtr = 0;
 int bloomFilterCount = 0;
 double fpp = Double.isNaN(bloomFilterFpp) ? DEFAULT_BLOOM_FILTER_FPP : bloomFilterFpp;
+if (!Double.isFinite(fpp) || fpp <= 0.0d || fpp >= 1.0d) {
+    throw CairoException.nonCritical()
+            .put("bloom_filter_fpp must be between 0 and 1 (exclusive): ")
+            .put(fpp);
+}
 if (bloomFilterColumns != null && !bloomFilterColumns.isEmpty()) {
     bloomFilterColumnIndexes.reopen();
     parseBloomFilterColumnIndexes(bloomFilterColumns, metadata, bloomFilterColumnIndexes, bloomFilterColumnsPosition);
     if (bloomFilterColumnIndexes.size() > 0) {
         bloomFilterIndexesPtr = bloomFilterColumnIndexes.getAddress();
         bloomFilterCount = (int) bloomFilterColumnIndexes.size();
     }
 }
```
Verify each finding against the current code and only fix it if needed. In `@core/src/main/java/io/questdb/cutlass/parquet/CopyExportRequestTask.java` around lines 589 - 619, The code uses fpp (derived from bloomFilterFpp) and passes it to createStreamingParquetWriter without validating it's a finite probability in (0,1); update CopyExportRequestTask to validate fpp after computing it (the local variable fpp) to ensure Double.isFinite(fpp) and fpp > 0.0 && fpp < 1.0, and if not throw an IllegalArgumentException (or similar) with a clear message including the invalid value and parameter name before calling createStreamingParquetWriter; this protects the native boundary that expects a valid false-positive probability.
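As background for why both the TableWriter and CopyExportRequestTask comments insist on the open interval (0,1): the standard bloom filter sizing formula divides by logarithm terms that blow up or change sign at the boundaries, so an unvalidated `fpp` feeds nonsense into native allocation math. A minimal standalone sketch of the validation plus that formula (class and method names here are hypothetical, not QuestDB API):

```java
// Hypothetical helper illustrating the validation both reviews ask for,
// plus the classic bloom filter sizing formula that misbehaves when
// fpp is outside (0, 1): ln(p) is 0 at p = 1 and -infinity at p = 0.
public class BloomFpp {
    // NaN means "not set": fall back to the configured default,
    // then reject non-finite or out-of-range probabilities.
    public static double validateFpp(double fpp, double defaultFpp) {
        double v = Double.isNaN(fpp) ? defaultFpp : fpp;
        if (!Double.isFinite(v) || v <= 0.0 || v >= 1.0) {
            throw new IllegalArgumentException(
                    "bloom_filter_fpp must be in (0,1), got: " + v);
        }
        return v;
    }

    // Optimal bits per inserted key for a target false-positive
    // probability p: m/n = -ln(p) / (ln 2)^2
    public static double bitsPerEntry(double p) {
        double ln2 = Math.log(2);
        return -Math.log(p) / (ln2 * ln2);
    }
}
```

For example, a 1% target FPP costs roughly 9.6 bits per key, which is the kind of figure the native sizing math derives from `fpp`.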
🧹 Nitpick comments (8)
core/rust/qdbr/src/parquet_write/util.rs (1)
428-442: Tighten the boundary test wording and final assertion.

Line 428’s comment is slightly misleading (`i32::MIN` is not the largest `u32`), and adding one final max assertion would better lock the invariant.

Suggested test refinement

```diff
-    // i32::MIN (0x80000000) is largest as unsigned (2^31)
+    // i32::MIN (0x80000000) is larger than i32::MAX under u32 ordering (2^31)
     // i32::MAX (0x7FFFFFFF) is 2^31 - 1 as unsigned
@@
     mm.update_unsigned(i32::MIN); // 0x80000000 = 2147483648u32
     assert_eq!(mm.min, Some(0)); // 0 is still the smallest unsigned
+    assert_eq!(mm.max, Some(-1)); // 0xFFFFFFFF remains the largest unsigned
```
Verify each finding against the current code and only fix it if needed. In `@core/rust/qdbr/src/parquet_write/util.rs` around lines 428 - 442, The boundary test comments and final assertion around mm.update_unsigned are misleading: update the comment for the i32::MIN case to state that i32::MIN maps to 0x80000000 (2147483648u32) rather than calling it the largest u32, and add one more assertion after the i32::MIN update to assert mm.max is still Some(-1) so the invariant is fully checked; locate the test block using mm.update_unsigned, mm.min and mm.max and adjust the comment text and append the final max assertion accordingly.

core/src/test/java/io/questdb/test/std/str/CharsTest.java (1)
386-422: Good test coverage for the code point ordering change.

The tests comprehensively cover null handling, equality, basic ordering, prefix comparisons, empty strings, and supplementary characters. The surrogate pair test case (`"\uD834\uDD1E" > "z"`) correctly validates that code point ordering is used rather than UTF-16 code unit ordering.

Consider adding a test case that would fail under the old per-char comparison but passes under code point comparison, to document the behavioral change:

💡 Optional: Add test for BMP vs supplementary ordering

```java
// U+FFFF (BMP) vs U+10000 (first supplementary)
// Under char comparison: '\uFFFF' (65535) > '\uD800' (55296) → wrong
// Under code point comparison: 65535 < 65536 → correct
Assert.assertTrue(Chars.lessThan("\uFFFF", "\uD800\uDC00"));
Assert.assertTrue(Chars.greaterThan("\uD800\uDC00", "\uFFFF"));
```
Verify each finding against the current code and only fix it if needed. In `@core/src/test/java/io/questdb/test/std/str/CharsTest.java` around lines 386 - 422, Add an explicit BMP vs supplementary comparison to testGreaterThanAndLessThan that would fail under per-char ordering but pass under code point ordering: in the CharsTest.testGreaterThanAndLessThan method, add assertions using Chars.lessThan("\uFFFF", "\uD800\uDC00") and Chars.greaterThan("\uD800\uDC00", "\uFFFF") (i.e., compare U+FFFF to the surrogate pair for U+10000) to document and verify the intended code point semantics.

core/src/main/java/io/questdb/cairo/TableWriter.java (1)
7114-7154: Optional perf refactor: precompute descriptor indexes once in `parseBloomFilterColumnIndexes`.

The current implementation recomputes descriptor positions by scanning prior columns for each token (Line 7132 loop), which is avoidable overhead on wide schemas / large column lists. Consider building a one-pass metadataIndex→descriptorIndex map before token parsing.
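The one-pass mapping suggested here can be sketched as follows. This is a hypothetical standalone illustration: the real code would read column types from QuestDB's `RecordMetadata` rather than a plain `int[]`, and the convention that a non-positive type marks a dropped column is assumed from the review text:

```java
// Hypothetical sketch of the suggested refactor: build the
// metadataIndex -> descriptorIndex map in a single pass, instead of
// rescanning all prior columns for every bloom filter column token.
public class DescriptorIndexMap {
    // columnTypes stands in for metadata.getColumnType(i);
    // type <= 0 is assumed to mean a dropped column with no descriptor.
    public static int[] build(int[] columnTypes) {
        int[] map = new int[columnTypes.length];
        int descriptorIndex = 0;
        for (int i = 0; i < columnTypes.length; i++) {
            if (columnTypes[i] > 0) {
                map[i] = descriptorIndex++;
            } else {
                map[i] = -1; // dropped column: no descriptor position
            }
        }
        return map;
    }
}
```

With the map in hand, the per-token lookup inside the parse loop becomes a constant-time `map[metadataIndex]` instead of an O(columns) rescan, turning the overall cost from O(tokens × columns) into O(tokens + columns).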
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@core/src/main/java/io/questdb/cairo/TableWriter.java` around lines 7114 - 7154, The parseBloomFilterColumnIndexes method is recomputing descriptorIndex by scanning prior columns for every token; precompute a metadataIndex→descriptorIndex mapping once and use it during token parsing to avoid the inner loop. Add a small int[] or DirectIntList (size metadata.getColumnCount()) and populate it in a single pass over metadata (incrementing a counter when metadata.getColumnType(i) > 0) to record descriptorIndex for each metadata index, then replace the per-token loop that computes descriptorIndex with a constant-time lookup into that map inside parseBloomFilterColumnIndexes (keep existing checks for metadataIndex >= 0 and duplicate detection).

core/rust/qdbr/src/parquet_write/file.rs (2)
37-55: `WriteOptions` cloning is now on the hot path after adding `HashSet`.

Line 850 clones `WriteOptions` per page; after Line 52 this deep-clones `bloom_filter_columns` repeatedly. Consider making that field cheap-to-clone (e.g., `Arc<HashSet<usize>>`) or keeping bloom-column selection out of per-page options.

💡 Minimal direction to reduce clone cost

```diff
-    pub bloom_filter_columns: HashSet<usize>,
+    pub bloom_filter_columns: Arc<HashSet<usize>>,
```

```diff
-    pub fn with_bloom_filter_columns(mut self, columns: HashSet<usize>) -> Self {
-        self.bloom_filter_columns = columns;
+    pub fn with_bloom_filter_columns(mut self, columns: HashSet<usize>) -> Self {
+        self.bloom_filter_columns = Arc::new(columns);
         self
     }
```

Also applies to: 841-853
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@core/rust/qdbr/src/parquet_write/file.rs` around lines 37 - 55, WriteOptions is being cloned on the hot path and its bloom_filter_columns (HashSet<usize>) causes expensive deep clones; change bloom_filter_columns to a cheap-to-clone type (e.g., Arc<HashSet<usize>>) or remove it from per-page WriteOptions and pass bloom column selection separately to the per-page writer. Update the WriteOptions struct (symbol: WriteOptions) to use Arc<HashSet<usize>> for bloom_filter_columns and adjust all usages/constructors and clones (including where WriteOptions is cloned around line ~850) to account for Arc, or refactor the code that calls WriteOptions.clone() to accept a separate &HashSet/Arc<HashSet> parameter instead.
530-551: Avoid eager page materialization across partitions to cap peak memory.
`collect_multi_partition_pages` builds a full `Vec` of pages before compression. For large multi-partition row groups this can inflate RSS and add latency. Prefer streaming/flattening iterators directly into `Compressor`.
Verify each finding against the current code and only fix it if needed. In `@core/rust/qdbr/src/parquet_write/file.rs` around lines 530 - 551, collect_multi_partition_pages currently eagerly materializes all Page values into the all_pages Vec causing high memory use; change it to stream pages instead by removing the Vec allocation and either (A) change the function to return a streaming iterator (e.g., impl Iterator<Item = ParquetResult<Page>> or a boxed Box<dyn Iterator...>) that yields the results of chaining/flattening each column_chunk_to_pages(...) iterator (use Iterator::chain or flat_map), or (B) change the API to accept a sink/Compressor callback and directly feed each pages_iter into the Compressor as you iterate. Update callers to consume the iterator or pass the compressor so pages are processed incrementally rather than collected into all_pages. Ensure references to collect_multi_partition_pages, column_chunk_to_pages, and Compressor are updated accordingly.

core/src/main/java/io/questdb/griffin/engine/table/PushdownFilterExtractor.java (1)
103-107: Boolean identifiers should follow the `is...`/`has...` convention.

`allConstant` and `containsQuery(...)` don’t follow the project boolean naming rule.

As per coding guidelines "When choosing a name for a boolean variable, field or method, always use the is... or has... prefix, as appropriate".
Also applies to: 136-152
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@core/src/main/java/io/questdb/griffin/engine/table/PushdownFilterExtractor.java` around lines 103 - 107, Rename boolean identifiers to follow the is/has convention: change the local variable allConstant to a boolean name like areAllConstants (or isAllConstant) and rename the method containsQuery(...) to hasQuery(...). Update all references inside PushdownFilterExtractor (search for allConstant and containsQuery), including the related occurrences around lines 136-152, plus any call sites, imports, and tests to use the new names so compilation succeeds.

core/src/main/java/io/questdb/griffin/SqlCodeGenerator.java (1)
3188-3196: Boolean local names in this block should follow the `is...`/`has...` convention.

`enableParallelFilter` and `enablePreTouch` should be renamed to match repo naming rules for booleans.

As per coding guidelines: "When choosing a name for a boolean variable, field or method, always use the is... or has... prefix, as appropriate".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@core/src/main/java/io/questdb/griffin/SqlCodeGenerator.java` around lines 3188 - 3196, Rename the two local boolean variables to follow the is/has convention: change enableParallelFilter to isParallelFilterEnabled and enablePreTouch to isPreTouchEnabled (assignments remain the same: isParallelFilterEnabled = executionContext.isParallelFilterEnabled(); isPreTouchEnabled = SqlHints.hasEnablePreTouchHint(model, model.getName());), and update all subsequent references in this block (including the if check that uses factory.supportsPageFrameCursor() and the later logic that uses these booleans alongside useJit and canCompile) to use the new names.

core/src/main/java/io/questdb/cutlass/parquet/CopyExportRequestTask.java (1)
85-113: De-duplicate bloom-filter column tokenization + keep static helpers grouped.
`validateBloomFilterColumns()` and `parseBloomFilterColumnIndexes()` implement the same comma-splitting + trimming loop. It’s easy for these to drift again (e.g., one starts accepting whitespace-only tokens, the other rejects). Recommend extracting a shared private static tokenizer/helper so both callers share the exact semantics while still throwing `SqlException` vs `CairoException`.

Also, since both helpers are `static`, consider co-locating them (and keeping static members sorted) to match the repo’s member ordering rules. As per coding guidelines, `**/*.java`: “Group Java class members by kind (static vs. instance) and visibility, sorted alphabetically.”

Also applies to: 347-375
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@core/src/main/java/io/questdb/cutlass/parquet/CopyExportRequestTask.java` around lines 85 - 113, Extract the comma-splitting + trimming logic used in validateBloomFilterColumns and parseBloomFilterColumnIndexes into a single private static helper (e.g., tokenizeColumnList or splitAndTrimColumns) that returns an iterable/list of CharSequence tokens and accepts the original CharSequence input and a base position; then update validateBloomFilterColumns to iterate that helper's tokens and perform meta.getColumnIndexQuiet checks, throwing SqlException with the appropriate position, and update parseBloomFilterColumnIndexes to reuse the same helper but translate missing-column cases into CairoException as before; finally, move and group these new static helper(s) alongside other static members and sort alphabetically to satisfy the class member ordering rules.
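A shared tokenizer of the kind the comment above suggests might look like this. It is a hypothetical standalone sketch (the class name `ColumnListTokenizer` and the choice to skip empty tokens are assumptions, not QuestDB code); in the real class the helper would be private static and its output fed to both the `SqlException` and `CairoException` paths:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical shared tokenizer so validateBloomFilterColumns and
// parseBloomFilterColumnIndexes keep identical split/trim semantics.
public class ColumnListTokenizer {
    // Splits on commas, trims surrounding whitespace from each token,
    // and skips tokens that are empty or whitespace-only.
    public static List<String> tokenize(CharSequence columns) {
        List<String> tokens = new ArrayList<>();
        int start = 0;
        final int n = columns.length();
        for (int i = 0; i <= n; i++) {
            if (i == n || columns.charAt(i) == ',') {
                int lo = start, hi = i;
                while (lo < hi && Character.isWhitespace(columns.charAt(lo))) lo++;
                while (hi > lo && Character.isWhitespace(columns.charAt(hi - 1))) hi--;
                if (hi > lo) {
                    tokens.add(columns.subSequence(lo, hi).toString());
                }
                start = i + 1;
            }
        }
        return tokens;
    }
}
```

Both callers would then iterate the returned tokens and diverge only in how a missing column is reported, which removes the drift risk the review describes.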
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@core/rust/qdbr/src/parquet_read/jni.rs`:
- Around line 124-138: The code currently casts filtered_rows_size to
filtered_rows_count and calls slice::from_raw_parts(filtered_rows_ptr,
filtered_rows_count) without validating magnitude, which allows negative i64
values to become huge usize lengths; add a bounds check before constructing the
slice: verify filtered_rows_count is not larger than the actual row group span
(or total rows available from row_group_bufs/row_group_span) and that
filtered_rows_count is non-negative (i.e., input i64 is >= 0) and that
filtered_rows_ptr is non-null when count > 0; if the check fails return
Err(fmt_err!(InvalidLayout, "...")) rather than creating the slice. Ensure you
reference and use filtered_rows_size/filtered_rows_count, filtered_rows_ptr,
row_group_bufs (or the variable that holds the row span) and avoid calling
slice::from_raw_parts until validation passes.
In `@core/rust/qdbr/src/parquet_read/row_groups.rs`:
- Around line 1048-1070: Parquet metadata can declare FixedLenByteArray(0) which
causes panics when code (e.g., compare_signed_be) indexes a[0]; update the
FixedLenByteArray handling (the match arm using validate_filter_span and the
loops that call is_fixed_len_null / is_fixed_len_null_be and
parquet2::bloom_filter::hash_byte) to first guard for size == 0 and
short-circuit (e.g., return Ok(true) or otherwise skip pruning) to avoid slicing
zero-length values, and make compare_signed_be defensive by returning a safe
non-indexing result when given an empty slice; apply the same zero-length guard
to the other affected blocks that use FixedLenByteArray and decimal compare
paths (references: compare_signed_be, is_fixed_len_null, is_fixed_len_null_be,
validate_filter_span).
In
`@core/src/main/java/io/questdb/griffin/engine/table/PushdownFilterExtractor.java`:
- Around line 80-82: The clear() method in PushdownFilterExtractor only clears
conditions but leaves orValues populated, which can retain stale AST nodes;
update the clear() implementation to also reset orValues (e.g., call
orValues.clear() or reinitialize it) so any reused PushdownFilterExtractor
instance won't hold old expression references; ensure you modify the clear()
method in the PushdownFilterExtractor class and verify that both conditions and
orValues are emptied after clear() is called.
In
`@core/src/test/java/io/questdb/test/griffin/AlterTableConvertPartitionTest.java`:
- Line 476: Two test SQL strings use lowercase SQL keywords; update the
execute(...) calls in AlterTableConvertPartitionTest so keywords are uppercase —
change "list" to "LIST" in the ALTER TABLE CONVERT PARTITION statement (the
execute call containing "ALTER TABLE x CONVERT PARTITION TO NATIVE list
'1970-01'") and change any lowercase "and" to "AND" in the other execute(...)
test statement referenced around line 530 so all SQL keywords conform to the
UPPERCASE test convention.
---
Outside diff comments:
In `@core/src/main/java/io/questdb/std/Chars.java`:
- Around line 138-162: The compare(CharSequence l, CharSequence r)
implementation currently does UTF-16 code unit subtraction causing inconsistent
ordering vs. greaterThan()/lessThan() which use code points; update compare() to
perform code point-aware comparison (use Character.codePointAt and advance
indices by Character.charCount(codePoint)) so it matches the semantics used by
lessThan()/greaterThan(); ensure edge cases (null identity, length/remaining
code points) and behavior relied on by MinStrGroupByFunction and
MaxStrGroupByFunction remain consistent with LtStrFunctionFactory.
---
Duplicate comments:
In `@core/rust/qdbr/src/parquet_write/file.rs`:
- Around line 158-164: The debug-only check in with_bloom_filter_fpp allows
invalid fpp in release builds; replace the debug_assert! with runtime
validation: either use assert!(fpp > 0.0 && fpp < 1.0, "...") to panic on
invalid input at runtime or (preferable for a public setter) change
with_bloom_filter_fpp signature to return Result<Self, Error> and return an Err
when fpp is out of range, then only set self.bloom_filter_fpp = fpp and return
Ok(self) when valid; update callers accordingly. Ensure you reference the
with_bloom_filter_fpp method and the bloom_filter_fpp field when making the
change.
In `@core/src/main/java/io/questdb/cairo/TableWriter.java`:
- Line 196: The field bloomFilterIndexes in TableWriter is being freed per
conversion call causing repeated native free/alloc; instead keep it allocated
for the TableWriter lifecycle, clear and reuse it during conversions (use
reopen/clear on bloomFilterIndexes where conversion logic runs) and move the
Misc.free(bloomFilterIndexes) call into TableWriter.doClose() so it is released
once when the writer is closed; update conversion code to remove per-call
Misc.free and replace with bloomFilterIndexes.clear() or equivalent reuse logic.
- Line 1587: Validate the computed fpp in TableWriter (the local variable fpp
computed from bloomFilterFpp or
config.getPartitionEncoderParquetBloomFilterFpp()) before passing it into native
encoding: ensure it's finite (not NaN or infinite) and within the valid
probability range (0.0 < fpp < 1.0, or use your project's accepted bounds), and
if not throw an IllegalArgumentException with a clear message; update the code
path that calls the native encoder to use this validated fpp so invalid values
are rejected early.
In `@core/src/main/java/io/questdb/cutlass/parquet/CopyExportRequestTask.java`:
- Around line 589-619: The code uses fpp (derived from bloomFilterFpp) and
passes it to createStreamingParquetWriter without validating it's a finite
probability in (0,1); update CopyExportRequestTask to validate fpp after
computing it (the local variable fpp) to ensure Double.isFinite(fpp) and fpp >
0.0 && fpp < 1.0, and if not throw an IllegalArgumentException (or similar) with
a clear message including the invalid value and parameter name before calling
createStreamingParquetWriter; this protects the native boundary that expects a
valid false-positive probability.
In
`@core/src/main/java/io/questdb/griffin/engine/table/ParquetRowGroupFilter.java`:
- Around line 77-79: Rename the boolean local variables in ParquetRowGroupFilter
to follow the is.../has... convention: change skip (used with
decoder.canSkipRowGroup) to isSkippableRowGroup or isSkippable, change supported
to isSupported, and change allCompatible to areAllCompatible (or isAllCompatible
if you prefer singular). Update every use site within the method(s) (including
the conditional checks and any increments like
rowGroupsSkipped.incrementAndGet()) to reference the new names so compilation
and logic remain unchanged.
In
`@core/src/test/java/io/questdb/test/griffin/AlterTableConvertPartitionTest.java`:
- Around line 115-179: Tests like testConvertAllPartitionsWithBloomFilterColumns
/ testConvertAllPartitionsWithBloomFilterAndFpp /
testConvertAllPartitionsWithBloomFilterFpp /
testConvertAllPartitionsWithBloomFilterWhere only assert conversion succeeded;
update each test to also verify that bloom options were applied by inspecting
the converted partition output after execute(...) and
assertPartitionExists(...). Concretely: locate the created partition (use the
same tableName and partition name used in assertPartitionExists), read the
partition’s Parquet/metadata or bloom artifacts (e.g., parquet file footer
metadata keys for bloom_filter_columns and fpp or presence/content of bloom
filter files for column "id"), and add assertions that the bloom_filter_columns
contains "id" and that fpp matches the provided value (e.g., "0.05", "0.01",
"0.1"). Ensure these checks run in each corresponding test immediately after
assertPartitionExists so the tests fail if the options are parsed but not
applied.
---
Nitpick comments:
In `@core/rust/qdbr/src/parquet_write/file.rs`:
- Around line 37-55: WriteOptions is being cloned on the hot path and its
bloom_filter_columns (HashSet<usize>) causes expensive deep clones; change
bloom_filter_columns to a cheap-to-clone type (e.g., Arc<HashSet<usize>>) or
remove it from per-page WriteOptions and pass bloom column selection separately
to the per-page writer. Update the WriteOptions struct (symbol: WriteOptions) to
use Arc<HashSet<usize>> for bloom_filter_columns and adjust all
usages/constructors and clones (including where WriteOptions is cloned around
line ~850) to account for Arc, or refactor the code that calls
WriteOptions.clone() to accept a separate &HashSet/Arc<HashSet> parameter
instead.
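The suggested `Arc<HashSet<usize>>` change amounts to sharing one immutable set by reference instead of deep-copying it on every clone. A minimal Java analogy of the same idea (the `PageWriteOptions` type here is hypothetical, not the actual `WriteOptions` struct):

```java
import java.util.Set;

// Hypothetical analogue of WriteOptions: the column set is immutable and
// shared by reference, so copying the options object is O(1) rather than a
// deep clone of the set — mirroring what Arc::clone does via a refcount bump.
final class PageWriteOptions {
    final Set<Integer> bloomFilterColumns;

    PageWriteOptions(Set<Integer> bloomFilterColumns) {
        this.bloomFilterColumns = bloomFilterColumns;
    }

    PageWriteOptions copy() {
        // shares the same immutable set instead of copying its contents
        return new PageWriteOptions(bloomFilterColumns);
    }
}
```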
- Around line 530-551: collect_multi_partition_pages currently eagerly
materializes all Page values into the all_pages Vec causing high memory use;
change it to stream pages instead by removing the Vec allocation and either (A)
change the function to return a streaming iterator (e.g., impl Iterator<Item =
ParquetResult<Page>> or a boxed Box<dyn Iterator...>) that yields the results of
chaining/flattening each column_chunk_to_pages(...) iterator (use
Iterator::chain or flat_map), or (B) change the API to accept a sink/Compressor
callback and directly feed each pages_iter into the Compressor as you iterate.
Update callers to consume the iterator or pass the compressor so pages are
processed incrementally rather than collected into all_pages. Ensure references
to collect_multi_partition_pages, column_chunk_to_pages, and Compressor are
updated accordingly.
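Option (A) above is essentially a lazy flat-map over per-chunk page iterators. A small Java sketch with integers standing in for pages (`pagesOf` and `consume` are illustrative names, not the real API):

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class StreamingPagesSketch {
    // Stand-in for column_chunk_to_pages: yields one chunk's pages lazily.
    static Stream<Integer> pagesOf(List<Integer> chunk) {
        return chunk.stream();
    }

    // Instead of collecting every page into one big Vec up front, chain the
    // per-chunk iterators so each page can be handed to the compressor as it
    // is produced; here we collect only to observe the result.
    static List<Integer> consume(List<List<Integer>> chunks) {
        return chunks.stream()
                .flatMap(StreamingPagesSketch::pagesOf)
                .collect(Collectors.toList());
    }
}
```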
In `@core/rust/qdbr/src/parquet_write/util.rs`:
- Around line 428-442: The boundary test comments and final assertion around
mm.update_unsigned are misleading: update the comment for the i32::MIN case to
state that i32::MIN maps to 0x80000000 (2147483648u32) rather than calling it
the largest u32, and add one more assertion after the i32::MIN update to assert
mm.max is still Some(-1) so the invariant is fully checked; locate the test
block using mm.update_unsigned, mm.min and mm.max and adjust the comment text
and append the final max assertion accordingly.
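For context on why the original comment is misleading: under signed-to-unsigned reinterpretation, `i32::MIN` maps to 0x80000000 (2147483648), while -1 maps to 0xFFFFFFFF, the actual largest u32. Java's integer helpers demonstrate the same mapping (a sketch of the invariant, not the Rust test itself):

```java
class UnsignedMappingDemo {
    // Reinterpret a signed 32-bit value's bit pattern as an unsigned value.
    static long asUnsigned(int v) {
        return Integer.toUnsignedLong(v);
    }

    // True if a sorts after b under unsigned ordering.
    static boolean unsignedGreater(int a, int b) {
        return Integer.compareUnsigned(a, b) > 0;
    }
}
```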
In `@core/src/main/java/io/questdb/cairo/TableWriter.java`:
- Around line 7114-7154: The parseBloomFilterColumnIndexes method is recomputing
descriptorIndex by scanning prior columns for every token; precompute a
metadataIndex→descriptorIndex mapping once and use it during token parsing to
avoid the inner loop. Add a small int[] or DirectIntList (size
metadata.getColumnCount()) and populate it in a single pass over metadata
(incrementing a counter when metadata.getColumnType(i) > 0) to record
descriptorIndex for each metadata index, then replace the per-token loop that
computes descriptorIndex with a constant-time lookup into that map inside
parseBloomFilterColumnIndexes (keep existing checks for metadataIndex >= 0 and
duplicate detection).
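The suggested single-pass mapping can be sketched as follows (array-based metadata and the `build` method are hypothetical simplifications; the real code would read types from the table metadata):

```java
class DescriptorIndexMap {
    // One pass over column types: active columns (type > 0) receive
    // consecutive descriptor indexes; dropped columns map to -1.
    // Per-token lookups then become constant-time array reads.
    static int[] build(int[] columnTypes) {
        int[] map = new int[columnTypes.length];
        int descriptorIndex = 0;
        for (int i = 0; i < columnTypes.length; i++) {
            map[i] = columnTypes[i] > 0 ? descriptorIndex++ : -1;
        }
        return map;
    }
}
```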
In `@core/src/main/java/io/questdb/cutlass/parquet/CopyExportRequestTask.java`:
- Around line 85-113: Extract the comma-splitting + trimming logic used in
validateBloomFilterColumns and parseBloomFilterColumnIndexes into a single
private static helper (e.g., tokenizeColumnList or splitAndTrimColumns) that
returns an iterable/list of CharSequence tokens and accepts the original
CharSequence input and a base position; then update validateBloomFilterColumns
to iterate that helper's tokens and perform meta.getColumnIndexQuiet checks,
throwing SqlException with the appropriate position, and update
parseBloomFilterColumnIndexes to reuse the same helper but translate
missing-column cases into CairoException as before; finally, move and group
these new static helper(s) alongside other static members and sort
alphabetically to satisfy the class member ordering rules.
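A minimal version of the proposed shared helper (the name `tokenizeColumnList` follows the reviewer's suggestion; position tracking and QuestDB's zero-GC string types are omitted for brevity):

```java
import java.util.ArrayList;
import java.util.List;

class ColumnListTokenizer {
    // Split a comma-separated column list and trim surrounding whitespace;
    // empty tokens are dropped so "a,,b" yields two tokens.
    static List<String> tokenizeColumnList(CharSequence input) {
        List<String> tokens = new ArrayList<>();
        for (String raw : input.toString().split(",")) {
            String token = raw.trim();
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }
}
```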
In
`@core/src/main/java/io/questdb/griffin/engine/table/PushdownFilterExtractor.java`:
- Around line 103-107: Rename boolean identifiers to follow the is/has
convention: change the local variable allConstant to a boolean name like
areAllConstants (or isAllConstant) and rename the method containsQuery(...) to
hasQuery(...). Update all references inside PushdownFilterExtractor (search for
allConstant and containsQuery), including the related occurrences around lines
136-152, plus any call sites, imports, and tests to use the new names so
compilation succeeds.
In `@core/src/main/java/io/questdb/griffin/SqlCodeGenerator.java`:
- Around line 3188-3196: Rename the two local boolean variables to follow the
is/has convention: change enableParallelFilter to isParallelFilterEnabled and
enablePreTouch to isPreTouchEnabled (assignments remain the same:
isParallelFilterEnabled = executionContext.isParallelFilterEnabled();
isPreTouchEnabled = SqlHints.hasEnablePreTouchHint(model, model.getName());),
and update all subsequent references in this block (including the if check that
uses factory.supportsPageFrameCursor() and the later logic that uses these
booleans alongside useJit and canCompile) to use the new names.
In `@core/src/test/java/io/questdb/test/std/str/CharsTest.java`:
- Around line 386-422: Add an explicit BMP vs supplementary comparison to
testGreaterThanAndLessThan that would fail under per-char ordering but pass
under code point ordering: in the CharsTest.testGreaterThanAndLessThan method,
add assertions using Chars.lessThan("\uFFFF", "\uD800\uDC00") and
Chars.greaterThan("\uD800\uDC00", "\uFFFF") (i.e., compare U+FFFF to the
surrogate pair for U+10000) to document and verify the intended code point
semantics.
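Plain Java shows why these assertions are worth adding: `String.compareTo` (UTF-16 code unit order) and code point order disagree exactly on this BMP-vs-supplementary boundary. A sketch of the two orderings (not QuestDB's `Chars` implementation):

```java
class CodePointOrderDemo {
    // Compare two one-character strings by their first code point,
    // decoding surrogate pairs, rather than by raw UTF-16 code units.
    static int compareByCodePoint(String a, String b) {
        return Integer.compare(a.codePointAt(0), b.codePointAt(0));
    }
}
```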
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 1a3e7343-df1f-4aca-92ce-b6a43dd5f874
⛔ Files ignored due to path filters (5)
- core/src/main/resources/io/questdb/bin/darwin-aarch64/libquestdbr.dylib is excluded by `!**/*.dylib`
- core/src/main/resources/io/questdb/bin/darwin-x86-64/libquestdbr.dylib is excluded by `!**/*.dylib`
- core/src/main/resources/io/questdb/bin/linux-aarch64/libquestdbr.so is excluded by `!**/*.so`
- core/src/main/resources/io/questdb/bin/linux-x86-64/libquestdbr.so is excluded by `!**/*.so`
- core/src/main/resources/io/questdb/bin/windows-x86-64/questdbr.dll is excluded by `!**/*.dll`
📒 Files selected for processing (35)
- core/rust/qdbr/src/parquet_read/decode.rs
- core/rust/qdbr/src/parquet_read/decoders/converters.rs
- core/rust/qdbr/src/parquet_read/decoders/dictionary.rs
- core/rust/qdbr/src/parquet_read/jni.rs
- core/rust/qdbr/src/parquet_read/mod.rs
- core/rust/qdbr/src/parquet_read/row_groups.rs
- core/rust/qdbr/src/parquet_write/file.rs
- core/rust/qdbr/src/parquet_write/jni.rs
- core/rust/qdbr/src/parquet_write/mod.rs
- core/rust/qdbr/src/parquet_write/symbol.rs
- core/rust/qdbr/src/parquet_write/util.rs
- core/src/main/java/io/questdb/PropertyKey.java
- core/src/main/java/io/questdb/cairo/CairoConfiguration.java
- core/src/main/java/io/questdb/cairo/TableWriter.java
- core/src/main/java/io/questdb/cairo/sql/PartitionFrameCursorFactory.java
- core/src/main/java/io/questdb/cutlass/http/processors/ExportQueryProcessor.java
- core/src/main/java/io/questdb/cutlass/parquet/CopyExportRequestTask.java
- core/src/main/java/io/questdb/griffin/SqlCodeGenerator.java
- core/src/main/java/io/questdb/griffin/SqlCompilerImpl.java
- core/src/main/java/io/questdb/griffin/SqlParser.java
- core/src/main/java/io/questdb/griffin/engine/functions/table/ReadParquetPageFrameCursor.java
- core/src/main/java/io/questdb/griffin/engine/table/BwdTableReaderPageFrameCursor.java
- core/src/main/java/io/questdb/griffin/engine/table/FwdTableReaderPageFrameCursor.java
- core/src/main/java/io/questdb/griffin/engine/table/ParquetRowGroupFilter.java
- core/src/main/java/io/questdb/griffin/engine/table/PushdownFilterExtractor.java
- core/src/main/java/io/questdb/griffin/engine/table/parquet/PartitionDecoder.java
- core/src/main/java/io/questdb/griffin/engine/table/parquet/PartitionEncoder.java
- core/src/main/java/io/questdb/std/Chars.java
- core/src/main/resources/io/questdb/site/conf/server.conf
- core/src/test/java/io/questdb/test/griffin/AlterTableConvertPartitionTest.java
- core/src/test/java/io/questdb/test/griffin/CopyExportTest.java
- core/src/test/java/io/questdb/test/griffin/engine/table/parquet/ReadParquetFunctionTest.java
- core/src/test/java/io/questdb/test/std/str/CharsTest.java
- core/src/test/resources/sqllogictest/test/parquet/test_parquet_filter_pushdown.test
- pkg/ami/marketplace/assets/server.conf
🚧 Files skipped from review as they are similar to previous changes (10)
- core/src/test/java/io/questdb/test/griffin/CopyExportTest.java
- core/rust/qdbr/src/parquet_read/decoders/converters.rs
- core/src/main/java/io/questdb/cairo/CairoConfiguration.java
- core/src/test/resources/sqllogictest/test/parquet/test_parquet_filter_pushdown.test
- core/rust/qdbr/src/parquet_read/decode.rs
- core/src/main/java/io/questdb/PropertyKey.java
- core/src/main/java/io/questdb/griffin/SqlParser.java
- pkg/ami/marketplace/assets/server.conf
- core/src/main/resources/io/questdb/site/conf/server.conf
- core/src/main/java/io/questdb/griffin/engine/table/parquet/PartitionEncoder.java
Resolved review threads:
- core/src/main/java/io/questdb/griffin/engine/table/PushdownFilterExtractor.java
- core/src/test/java/io/questdb/test/griffin/AlterTableConvertPartitionTest.java (outdated)

Resolve conflicts from upstream clippy fixes and binary rebuilds. Accept upstream binaries, adopt combined #[allow(clippy::...)] attrs and removed redundant doc comment. Keep branch CI fixes (extra path pattern and object-count check) while adopting upstream's find-based chmod. Co-Authored-By: Claude Opus 4.6 <[email protected]>
The write-side Parquet JNI functions used panic!/assert!/expect() which would crash the JVM via undefined behavior if triggered. This commit replaces them with proper error returns, matching the pattern already used by the read-side JNI functions.

Changes by category:

Java-guaranteed invariants (null pointer checks from Java callers):
- PartitionUpdater destroy/updateFileMetadata/updateRowGroup: replace panic!/assert! with if-null checks that throw CairoException or silently return (for destroy, matching free(NULL) semantics).

External-data-dependent panics (corrupted Parquet/column data):
- update.rs: replace panic on invalid Parquet version with Result error.
- string.rs: replace expect() on invalid offsets and UTF-16 data with map_err error propagation. Change encode_plain/encode_delta to return ParquetResult.
- binary.rs: replace expect() on invalid offsets with map_err. Add upfront offset validation in encode_delta to protect all downstream as-usize casts. Change encode_plain/encode_delta to return ParquetResult.
- varchar.rs: replace assert! on data corruption with Err return.
- column_chunk.rs: replace unwrap() on bloom filter mutex with map_err to handle poisoned lock gracefully.

Also changes create_partition_descriptor's assert_eq! on col_data_len alignment to a proper Err return, since the function already returns ParquetResult.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Co-Authored-By: Claude Opus 4.6 <[email protected]>
…nto parquet_filter_pushdown_bloom
[PR Coverage check] 😍 pass: 4877 / 6029 (80.89%)
Depends on #6675.
Summary
Queries on Parquet partitions can now skip entire row groups that contain no matching rows. Row group pruning uses three strategies:
- Min/max statistics: row groups whose min/max range excludes the filter value are skipped.
- Bloom filters: for columns listed in the `bloom_filter_columns` option (see DDL/API details below).
- Null counts: row groups with no nulls (for `IS NULL`) or where all values are null (for `IS NOT NULL`) are skipped.

Pruning applies to all Parquet read paths (forward scan, backward scan, `read_parquet()`, parallel page frame execution).

Supported filter operations

- `col = value`
- `col IN (v1, v2, ...)`
- `col < value`
- `col <= value`
- `col > value`
- `col >= value`
- `col BETWEEN v1 AND v2`
- `col IS NULL`
- `col IS NOT NULL`
- `col = v1 OR col = v2 OR ...`

Supported column types
Pushdown filtering supports: BYTE, SHORT, CHAR, INT, LONG, FLOAT, DOUBLE, TIMESTAMP, DATE, IPv4, UUID, LONG128, STRING, SYMBOL, VARCHAR, and all DECIMAL widths (8/16/32/64/128/256).
Not supported (pruning silently skipped): BOOLEAN, LONG256, GEOBYTE/GEOSHORT/GEOINT/GEOLONG, BINARY, INTERVAL.
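For the supported operations, the min/max strategy reduces to interval checks against the row group's statistics. A hedged sketch for a long column (illustrative only, not the native `canSkipRowGroup` implementation):

```java
class MinMaxPruneSketch {
    // col = value: the row group can be skipped when value lies
    // outside the group's [min, max] range.
    static boolean canSkipEquals(long min, long max, long value) {
        return value < min || value > max;
    }

    // col BETWEEN lo AND hi: skip when [lo, hi] does not overlap [min, max].
    static boolean canSkipBetween(long min, long max, long lo, long hi) {
        return hi < min || lo > max;
    }
}
```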
Filter extraction
Top-level AND-connected predicates are extracted; multiple AND conditions form multiple pushdown filter entries. OR-connected equalities on the same column (`col = v1 OR col = v2 OR ...`) are also extracted and treated as an IN-list for bloom filter and min/max checks.

Value expressions must be constant or runtime-constant (e.g., bind variables, `now()`). Non-constant expressions are silently skipped.

Implementation notes
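One implementation note worth illustrating: the conjunct split described under Filter extraction is a plain walk over the expression tree. A toy sketch with a hypothetical `Node` type (not QuestDB's actual ExpressionNode):

```java
import java.util.ArrayList;
import java.util.List;

class ConjunctSketch {
    static final class Node {
        final String op;  // "AND" for conjunctions; anything else is a leaf predicate
        final Node left, right;

        Node(String op, Node left, Node right) {
            this.op = op;
            this.left = left;
            this.right = right;
        }
    }

    // Recursively split top-level AND nodes; every non-AND subtree
    // becomes one pushdown filter candidate.
    static void collectConjuncts(Node n, List<Node> out) {
        if (n == null) {
            return;
        }
        if ("AND".equals(n.op)) {
            collectConjuncts(n.left, out);
            collectConjuncts(n.right, out);
        } else {
            out.add(n);
        }
    }
}
```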
`Chars.lessThan()`/`Chars.greaterThan()` now use Unicode code point comparison instead of Java's UTF-16 code unit comparison. This aligns STRING/SYMBOL ordering with the SQL-standard byte-order collation and enables range filter pushdown for string columns.

Bloom filter generation
Bloom filter columns and false positive probability (FPP) can be specified on three paths:
- `ALTER TABLE t CONVERT PARTITION TO PARQUET WITH (bloom_filter_columns = 'col1,col2', fpp = 0.01)`
- `COPY t TO '/path/file.parquet' WITH (bloom_filter_columns = 'col1,col2', bloom_filter_fpp = 0.01)`
- HTTP export (`/exp`): query parameters `bloom_filter_columns=col1,col2` and `bloom_filter_fpp=0.01`

The bloom filter bitset is sized optimally using the exact NDV (number of distinct values) collected during encoding via a `HashSet<u64>`. The sizing formula follows the Parquet Split Block Bloom Filter spec: `m = -8 * ndv / ln(1 - fpp^(1/8))`, clamped to [32 bytes, 4 MB].

Configuration
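The fpp settings below feed the sizing formula above: a lower fpp yields a larger bitset for the same NDV. As a numeric sanity check, a sketch of the stated formula (assuming it yields bits, hence the division by 8 before the byte clamp):

```java
class BloomSizeSketch {
    static final long MIN_BYTES = 32L;
    static final long MAX_BYTES = 4L * 1024 * 1024;

    // m = -8 * ndv / ln(1 - fpp^(1/8)) bits, following the Parquet
    // split-block bloom filter sizing with 8 hash functions;
    // the resulting byte count is clamped to [32 B, 4 MB].
    static long bloomFilterBytes(long ndv, double fpp) {
        double bits = -8.0 * ndv / Math.log(1.0 - Math.pow(fpp, 1.0 / 8.0));
        long bytes = (long) Math.ceil(bits / 8.0);
        return Math.min(Math.max(bytes, MIN_BYTES), MAX_BYTES);
    }
}
```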
- `cairo.sql.parquet.row.group.pruning.enabled` (default: `true`)
- `cairo.partition.encoder.parquet.bloom.filter.fpp` (default: `0.01`)
- `cairo.parquet.export.bloom.filter.fpp` (default: `0.01`)
- `cairo.parquet.export.statistics.enabled` (default: `true`)

Other changes
- `PartitionDecoder.Metadata.getColumnIndex()` now uses case-insensitive column name matching (`Chars.equalsIgnoreCase`) to align with QuestDB's case-insensitive column semantics and improve interoperability with Parquet files generated by other tools.
- The O3 merge path (`O3PartitionJob`) now propagates the global `bloomFilterFpp` configuration to `PartitionUpdater`, but does not propagate per-column bloom filter indexes. O3-merged Parquet partitions do not preserve the original per-column bloom filter specification.

Benchmark
Using the ClickBench
`hits.parquet` dataset (~100M rows, ~15GB):

This query benefits from min/max statistics alone — no bloom filters needed. Most row groups are skipped because the target value falls outside their min/max range.
TODO
- `canSkipRowGroup` native function
- `CopyExportFactory` and `ExportQueryProcessor` Parquet export paths
- Config option to disable `canSkipRowGroup` behavior, allowing the skip logic to be disabled if bugs arise
- Range filters (`<`, `<=`, `>`, `>=`, `BETWEEN`)
- Null filters (`IS NULL`, `IS NOT NULL`)
- OR-connected equalities (`col = v1 OR col = v2`)
- Tests for `skipRowGroup` covering all supported column types
- Tests for `skipRowGroup` covering all supported column types, including column top scenarios
- `CopyExportFactory` and `CONVERT TO PARQUET` DDL with `bloom_filter_columns` option