
feat(core): parquet row group pruning with min/max statistics and bloom filters#6739

Merged
bluestreak01 merged 215 commits into master from parquet_filter_pushdown_bloom
Mar 8, 2026

Conversation

@kafka1991
Collaborator

@kafka1991 kafka1991 commented Feb 3, 2026

Depends on #6675.

Summary

Queries on Parquet partitions can now skip entire row groups that contain no matching rows. Row group pruning uses three strategies:

  • Min/max statistics: row groups whose per-column value range does not overlap the filter values are skipped. This uses metadata already present in every Parquet file — no configuration needed.
  • Bloom filters: row groups whose bloom filter reports no match for any filter value are skipped. Bloom filter generation is opt-in via the bloom_filter_columns option (see DDL/API details below).
  • Null count statistics: row groups where null count is zero (for IS NULL) or where all values are null (for IS NOT NULL) are skipped.

Pruning applies to all Parquet read paths (forward scan, backward scan, read_parquet(), parallel page frame execution).
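The three strategies above can be sketched as plain predicate checks against per-row-group statistics. This is an illustrative sketch, not QuestDB's actual API; the class and method names are ours:

```java
// Illustrative sketch of row group pruning decisions. A row group can be
// skipped when the statistics prove it cannot contain any matching row.
final class MinMaxPruningSketch {

    // Equality filter col = value: skip when value lies outside [min, max].
    static boolean canSkipEquality(long min, long max, long value) {
        return value < min || value > max;
    }

    // IS NULL filter: skip when the row group's null count is zero.
    static boolean canSkipIsNull(long nullCount) {
        return nullCount == 0;
    }

    public static void main(String[] args) {
        // Row group with values in [100, 200] cannot contain 50.
        System.out.println(canSkipEquality(100, 200, 50));   // true
        System.out.println(canSkipEquality(100, 200, 150));  // false
        System.out.println(canSkipIsNull(0));                // true
    }
}
```

The key property is that pruning is conservative: a row group is only skipped when the statistics prove no row can match, so false positives (scanning a group with no matches) are possible, but false negatives (skipping a group with matches) are not.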

Supported filter operations

| Operation | Example | Pushdown Method |
| --- | --- | --- |
| Equality | `col = value` | Bloom filter + min/max |
| In list | `col IN (v1, v2, ...)` | Bloom filter + min/max |
| Less than | `col < value` | Min/max statistics |
| Less than or equal | `col <= value` | Min/max statistics |
| Greater than | `col > value` | Min/max statistics |
| Greater than or equal | `col >= value` | Min/max statistics |
| Between | `col BETWEEN v1 AND v2` | Min/max statistics (decomposed into `>=` + `<=`) |
| Is null | `col IS NULL` | Null count statistics |
| Is not null | `col IS NOT NULL` | Null count statistics |
| OR-connected equalities | `col = v1 OR col = v2 OR ...` | Bloom filter + min/max (same column only) |
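The range operations reduce to simple overlap checks against the row group's min/max. A hedged sketch of the skip conditions (illustrative names, not the QuestDB implementation):

```java
// Skip conditions for range filters against row group [min, max] statistics.
final class RangePruningSketch {

    // col < v: every value in the group is >= min, so skip when min >= v.
    static boolean canSkipLessThan(long min, long v) {
        return min >= v;
    }

    // col > v: every value in the group is <= max, so skip when max <= v.
    static boolean canSkipGreaterThan(long max, long v) {
        return max <= v;
    }

    // col BETWEEN lo AND hi decomposes into col >= lo AND col <= hi:
    // skip when [min, max] and [lo, hi] do not overlap.
    static boolean canSkipBetween(long min, long max, long lo, long hi) {
        return max < lo || min > hi;
    }
}
```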

Supported column types

Pushdown filtering supports: BYTE, SHORT, CHAR, INT, LONG, FLOAT, DOUBLE, TIMESTAMP, DATE, IPv4, UUID, LONG128, STRING, SYMBOL, VARCHAR, and all DECIMAL widths (8/16/32/64/128/256).

Not supported (pruning silently skipped): BOOLEAN, LONG256, GEOBYTE/GEOSHORT/GEOINT/GEOLONG, BINARY, INTERVAL.

Filter extraction

Top-level AND-connected predicates are extracted. Multiple AND conditions form multiple pushdown filter entries. OR-connected equalities on the same column (col = v1 OR col = v2 OR ...) are also extracted and treated as an IN-list for bloom filter and min/max checks.

Value expressions must be constant or runtime-constant (e.g., bind variables, now()). Non-constant expressions are silently skipped.
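Top-level AND extraction amounts to recursively flattening the expression tree at AND nodes, with each non-AND subtree becoming one pushdown candidate. A minimal sketch, where `Node` is a stand-in for QuestDB's expression node class:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of top-level AND extraction; Node is a stand-in
// for the real expression tree type.
final class AndExtractorSketch {

    static final class Node {
        final String token;
        final Node lhs, rhs;

        Node(String token, Node lhs, Node rhs) {
            this.token = token;
            this.lhs = lhs;
            this.rhs = rhs;
        }
    }

    // Recursively split top-level ANDs; each leaf is one pushdown candidate.
    static void extract(Node node, List<Node> out) {
        if (node == null) {
            return;
        }
        if ("and".equalsIgnoreCase(node.token)) {
            extract(node.lhs, out);
            extract(node.rhs, out);
        } else {
            out.add(node);
        }
    }

    static int countCandidates(Node tree) {
        List<Node> out = new ArrayList<>();
        extract(tree, out);
        return out.size();
    }

    public static void main(String[] args) {
        // (a = 1 AND b < 2) AND c IS NULL -> three pushdown candidates
        Node tree = new Node("and",
                new Node("and",
                        new Node("a = 1", null, null),
                        new Node("b < 2", null, null)),
                new Node("c IS NULL", null, null));
        System.out.println(countCandidates(tree)); // 3
    }
}
```

Note that OR nodes stop the descent: an OR subtree is only usable when all its branches are equalities on the same column, in which case it is treated as an IN-list as described above.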

Implementation notes

  • IPv4 uses unsigned comparison for correct ordering of high IP addresses (> 128.x.x.x)
  • Third-party Parquet files with unsigned integer types (UInt8/16/32/64) skip min/max range filtering, as QuestDB filter values are signed and cannot be correctly compared against unsigned statistics
  • Breaking change 💥: Chars.lessThan() / Chars.greaterThan() now use Unicode code point comparison instead of Java UTF-16 code unit comparison. This aligns STRING/SYMBOL ordering with the SQL standard (byte-order collation) and enables range filter pushdown for string columns
  • Parquet DATE columns (physical type Int32, days since epoch) are handled by converting QuestDB's millisecond filter values to days before comparing with Parquet statistics
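The IPv4 and DATE notes above can be illustrated with two one-liners (helper names are ours, not the QuestDB API):

```java
// Sketch of the two value conversions described in the notes above.
final class ConversionSketch {

    // IPv4 addresses are 32-bit ints; unsigned comparison keeps addresses
    // with the sign bit set (>= 128.0.0.0) ordered after 127.255.255.255.
    static int compareIpv4(int a, int b) {
        return Integer.compareUnsigned(a, b);
    }

    // Parquet DATE is Int32 days since epoch; QuestDB filter values are
    // epoch millis, so divide (flooring, for pre-1970 dates) before
    // comparing against Parquet statistics.
    static int millisToDays(long millis) {
        return (int) Math.floorDiv(millis, 86_400_000L);
    }
}
```

`Math.floorDiv` matters for negative timestamps: `-1` ms is late December 31, 1969, which floors to day `-1` rather than truncating to day `0`.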

Bloom filter generation

Bloom filter columns and false positive probability (FPP) can be specified on three paths:

  • ALTER TABLE DDL: ALTER TABLE t CONVERT PARTITION TO PARQUET WITH (bloom_filter_columns = 'col1,col2', fpp = 0.01)
  • COPY TO export: COPY t TO '/path/file.parquet' WITH (bloom_filter_columns = 'col1,col2', bloom_filter_fpp = 0.01)
  • HTTP export (/exp): query parameters bloom_filter_columns=col1,col2 and bloom_filter_fpp=0.01

The bloom filter bitset is sized optimally using the exact NDV (number of distinct values) collected during encoding via HashSet<u64>. The sizing formula follows the Parquet Split Block Bloom Filter spec: m = -8 * ndv / ln(1 - fpp^(1/8)), clamped to [32 bytes, 4 MB].
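The sizing formula can be sketched directly. This is a minimal rendering of the quoted formula, not the encoder's code; the helper name is ours, and real implementations may additionally round the size up to a power of two:

```java
// Split Block Bloom Filter sizing: m bits = -8 * ndv / ln(1 - fpp^(1/8)),
// with the resulting byte count clamped to [32 bytes, 4 MB].
final class BloomSizingSketch {

    static long bloomFilterBytes(long ndv, double fpp) {
        double bits = -8.0 * ndv / Math.log(1.0 - Math.pow(fpp, 1.0 / 8.0));
        long bytes = (long) Math.ceil(bits / 8.0);
        return Math.min(Math.max(bytes, 32L), 4L * 1024 * 1024);
    }
}
```

For tiny NDVs the lower clamp dominates (a single distinct value at FPP 0.01 needs only a couple of bytes by the formula, so the filter is sized at the 32-byte floor), while very high NDVs hit the 4 MB ceiling.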

Configuration

| Property | Default | Description |
| --- | --- | --- |
| `cairo.sql.parquet.row.group.pruning.enabled` | `true` | Enable/disable row group pruning at query time |
| `cairo.partition.encoder.parquet.bloom.filter.fpp` | `0.01` | Default bloom filter FPP for partition encoding |
| `cairo.parquet.export.bloom.filter.fpp` | `0.01` | Default bloom filter FPP for Parquet exports |
| `cairo.parquet.export.statistics.enabled` | `true` | Enable/disable statistics generation for Parquet exports |

Other changes

  • PartitionDecoder.Metadata.getColumnIndex() now uses case-insensitive column name matching (Chars.equalsIgnoreCase) to align with QuestDB's case-insensitive column semantics and improve interoperability with Parquet files generated by other tools.
  • The O3 merge path (O3PartitionJob) now propagates the global bloomFilterFpp configuration to PartitionUpdater, but does not propagate per-column bloom filter indexes. O3-merged Parquet partitions do not preserve the original per-column bloom filter specification.

Benchmark

Using the ClickBench hits.parquet dataset (~100M rows, ~15GB):

```sql
SELECT UserID FROM read_parquet('hits.parquet') WHERE UserID = 9223371425449051681;
```

| Branch | Time |
| --- | --- |
| master | 8 s |
| patch | 50 ms |

This query benefits from min/max statistics alone — no bloom filters needed. Most row groups are skipped because the target value falls outside their min/max range.

TODO

  • Add canSkipRowGroup native function
  • Write bloom filter during Parquet encoding (configurable)
  • Extract filter predicates from SQL and invoke row group skipping
  • Reduce temporary CharSequence object creation in CopyExportFactory and ExportQueryProcessor Parquet export paths
  • Add configuration switch to control canSkipRowGroup behavior, allowing the skip logic to be disabled if bugs arise
  • Add configurable default value for bloom filter FPP (false positive probability)
  • Add memory limit for HashSet when collecting values for bloom filter generation during Parquet write; skip bloom filter creation if limit is exceeded to avoid excessive memory usage
  • Add range comparison support (<, <=, >, >=, BETWEEN)
  • Add null check support (IS NULL, IS NOT NULL)
  • Add OR-connected equality extraction (col = v1 OR col = v2)
  • Fix STRING/SYMBOL comparison to use Unicode code point order (UTF-8 consistent)
  • Fix DATE filter pushdown for Parquet Int32 physical type (days vs millis conversion)
  • Add RleDictionary encoding support for DATE columns
  • Add decimal type support (after decimal PR is merged)
  • Tests - Add min/max filter and bloom filter skip tests for skipRowGroup covering all supported column types
  • Tests - Add null value tests for skipRowGroup covering all supported column types, including column top scenarios
  • Tests - Add syntax tests for CopyExportFactory and CONVERT TO PARQUET DDL with bloom_filter_columns option

@coderabbitai

coderabbitai bot commented Mar 5, 2026

✅ Actions performed

Review triggered.


@questdb-butler

⚠️ Enterprise CI Failed

The enterprise test suite failed for this PR.

Build: View Details
Tested Commit: 07b39675746d70bfd2cb5ad067f44f41f9f88586

Please investigate the failure before merging.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
core/src/main/java/io/questdb/std/Chars.java (1)

138-162: ⚠️ Potential issue | 🟡 Minor

Inconsistent string comparison semantics between compare() and greaterThan()/lessThan() may cause ordering bugs.

The compare() method uses per-character (UTF-16 code unit) comparison, while greaterThan() and lessThan() use proper code point comparison via Character.codePointAt(). For strings with supplementary characters (e.g., emoji, rare CJK), this creates inconsistent ordering.

This matters because both are used for overlapping operations:

  • compare() powers MinStrGroupByFunction and MaxStrGroupByFunction (aggregation)
  • lessThan() powers LtStrFunctionFactory (SQL < filter predicates)

If a query aggregates min/max with compare() then applies filter predicates with lessThan(), results may be inconsistent for supplementary characters. Consider updating compare() to use code point comparison for consistency, or document why UTF-16 semantics are intentional.
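The difference between the two orderings only shows up for supplementary characters, where a UTF-16 code unit comparison sees the leading surrogate (0xD800–0xDBFF) instead of the real code point. A minimal sketch of the code-point-aware comparison (illustrative, not the `Chars` implementation):

```java
// Code point ordering: decode full code points rather than comparing
// individual UTF-16 code units.
final class CodePointCompareSketch {

    static boolean lessThan(CharSequence l, CharSequence r) {
        int i = 0, j = 0;
        while (i < l.length() && j < r.length()) {
            int cl = Character.codePointAt(l, i);
            int cr = Character.codePointAt(r, j);
            if (cl != cr) {
                return cl < cr;
            }
            i += Character.charCount(cl);
            j += Character.charCount(cr);
        }
        // Equal prefix: the shorter sequence sorts first.
        return i >= l.length() && j < r.length();
    }
}
```

For example, U+FFFF vs U+10000 (encoded as the surrogate pair `"\uD800\uDC00"`): a code unit comparison sees 0xFFFF > 0xD800 and orders U+FFFF after U+10000, while code point comparison correctly yields 0xFFFF < 0x10000.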

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@core/src/main/java/io/questdb/std/Chars.java` around lines 138 - 162, The
compare(CharSequence l, CharSequence r) implementation currently does UTF-16
code unit subtraction causing inconsistent ordering vs. greaterThan()/lessThan()
which use code points; update compare() to perform code point-aware comparison
(use Character.codePointAt and advance indices by
Character.charCount(codePoint)) so it matches the semantics used by
lessThan()/greaterThan(); ensure edge cases (null identity, length/remaining
code points) and behavior relied on by MinStrGroupByFunction and
MaxStrGroupByFunction remain consistent with LtStrFunctionFactory.
♻️ Duplicate comments (6)
core/src/main/java/io/questdb/cairo/TableWriter.java (2)

196-196: ⚠️ Potential issue | 🟠 Major

Avoid per-call free/realloc of bloomFilterIndexes; keep one lifecycle and free in doClose().

At Line 1611, Misc.free(bloomFilterIndexes) inside the conversion path forces repetitive native free/alloc cycles across partition conversions, and it also leaves cleanup ambiguous when conversion is never invoked. Reuse the list per call (reopen + clear) and release it once in doClose().

♻️ Suggested fix
@@
-                try {
-                    if (bloomFilterColumns != null && !bloomFilterColumns.isEmpty()) {
-                        bloomFilterIndexes.reopen();
-                        parseBloomFilterColumnIndexes(bloomFilterColumns, bloomFilterIndexes);
-                        bloomFilterColumnIndexesPtr = bloomFilterIndexes.getAddress();
-                        bloomFilterColumnCount = (int) bloomFilterIndexes.size();
-                    }
+                if (bloomFilterColumns != null && !bloomFilterColumns.isEmpty()) {
+                    bloomFilterIndexes.reopen();
+                    bloomFilterIndexes.clear();
+                    parseBloomFilterColumnIndexes(bloomFilterColumns, bloomFilterIndexes);
+                    bloomFilterColumnIndexesPtr = bloomFilterIndexes.getAddress();
+                    bloomFilterColumnCount = (int) bloomFilterIndexes.size();
+                }
 
-                    PartitionEncoder.encodeWithOptions(
-                            partitionDescriptor,
-                            other,
-                            ParquetCompression.packCompressionCodecLevel(compressionCodec, compressionLevel),
-                            statisticsEnabled,
-                            rawArrayEncoding,
-                            rowGroupSize,
-                            dataPageSize,
-                            parquetVersion,
-                            bloomFilterColumnIndexesPtr,
-                            bloomFilterColumnCount,
-                            fpp
-                    );
-                } finally {
-                    Misc.free(bloomFilterIndexes);
-                }
+                PartitionEncoder.encodeWithOptions(
+                        partitionDescriptor,
+                        other,
+                        ParquetCompression.packCompressionCodecLevel(compressionCodec, compressionLevel),
+                        statisticsEnabled,
+                        rawArrayEncoding,
+                        rowGroupSize,
+                        dataPageSize,
+                        parquetVersion,
+                        bloomFilterColumnIndexesPtr,
+                        bloomFilterColumnCount,
+                        fpp
+                );
// In doClose():
Misc.free(bloomFilterIndexes);

Also applies to: 1589-1612

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@core/src/main/java/io/questdb/cairo/TableWriter.java` at line 196, The field
bloomFilterIndexes in TableWriter is being freed per conversion call causing
repeated native free/alloc; instead keep it allocated for the TableWriter
lifecycle, clear and reuse it during conversions (use reopen/clear on
bloomFilterIndexes where conversion logic runs) and move the
Misc.free(bloomFilterIndexes) call into TableWriter.doClose() so it is released
once when the writer is closed; update conversion code to remove per-call
Misc.free and replace with bloomFilterIndexes.clear() or equivalent reuse logic.

1587-1587: ⚠️ Potential issue | 🟠 Major

Validate fpp before passing it into native encoding.

Line 1587 accepts infinities and out-of-range values; these should be rejected early to avoid undefined native behavior.

🛡️ Suggested fix
                 double fpp = Double.isNaN(bloomFilterFpp) ? config.getPartitionEncoderParquetBloomFilterFpp() : bloomFilterFpp;
+                if (!Double.isFinite(fpp) || fpp <= 0.0 || fpp >= 1.0) {
+                    throw CairoException.nonCritical()
+                            .put("bloom_filter_fpp must be in (0,1), got: ")
+                            .put(fpp);
+                }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@core/src/main/java/io/questdb/cairo/TableWriter.java` at line 1587, Validate
the computed fpp in TableWriter (the local variable fpp computed from
bloomFilterFpp or config.getPartitionEncoderParquetBloomFilterFpp()) before
passing it into native encoding: ensure it's finite (not NaN or infinite) and
within the valid probability range (0.0 < fpp < 1.0, or use your project's
accepted bounds), and if not throw an IllegalArgumentException with a clear
message; update the code path that calls the native encoder to use this
validated fpp so invalid values are rejected early.
core/src/main/java/io/questdb/griffin/engine/table/ParquetRowGroupFilter.java (1)

77-79: ⚠️ Potential issue | 🟡 Minor

Rename boolean locals to is.../has... prefixes.

Line 77 (skip), Line 145 (supported), and Line 212 (allCompatible) should follow the repository boolean naming rule.

As per coding guidelines **/*.java: “When choosing a name for a boolean variable, field or method, always use the is... or has... prefix, as appropriate”.

Also applies to: 145-146, 212-221

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@core/src/main/java/io/questdb/griffin/engine/table/ParquetRowGroupFilter.java`
around lines 77 - 79, Rename the boolean local variables in
ParquetRowGroupFilter to follow the is.../has... convention: change skip (used
with decoder.canSkipRowGroup) to isSkippableRowGroup or isSkippable, change
supported to isSupported, and change allCompatible to areAllCompatible (or
isAllCompatible if you prefer singular). Update every use site within the
method(s) (including the conditional checks and any increments like
rowGroupsSkipped.incrementAndGet()) to reference the new names so compilation
and logic remain unchanged.
core/rust/qdbr/src/parquet_write/file.rs (1)

158-164: ⚠️ Potential issue | 🟠 Major

Use runtime FPP validation in release builds, not only debug_assert!.

At Line 159 this check disappears in release builds, so invalid fpp can still reach bloom sizing math. Please enforce 0.0 < fpp < 1.0 at runtime (assert!/error return) in the public setter.

#!/bin/bash
# Verify whether FPP validation in Rust is debug-only vs runtime-enforced.
rg -n "with_bloom_filter_fpp|debug_assert!|assert!|bloom_filter_fpp\\s*[<>]" \
  core/rust/qdbr/src/parquet_write/file.rs \
  core/rust/qdbr/src/parquet_write/jni.rs

Expected result: if with_bloom_filter_fpp only has debug_assert!, runtime validation is not guaranteed in release.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@core/rust/qdbr/src/parquet_write/file.rs` around lines 158 - 164, The
debug-only check in with_bloom_filter_fpp allows invalid fpp in release builds;
replace the debug_assert! with runtime validation: either use assert!(fpp > 0.0
&& fpp < 1.0, "...") to panic on invalid input at runtime or (preferable for a
public setter) change with_bloom_filter_fpp signature to return Result<Self,
Error> and return an Err when fpp is out of range, then only set
self.bloom_filter_fpp = fpp and return Ok(self) when valid; update callers
accordingly. Ensure you reference the with_bloom_filter_fpp method and the
bloom_filter_fpp field when making the change.
core/src/test/java/io/questdb/test/griffin/AlterTableConvertPartitionTest.java (1)

115-179: ⚠️ Potential issue | 🟠 Major

Bloom-option positive-path tests still don’t prove bloom options were applied.

Line 126, Line 142, Line 158, and Line 175 currently only prove conversion succeeded (assertPartitionExists). These would still pass if bloom_filter_columns / fpp were parsed but ignored in the convert path.

Suggested follow-up assertion pattern
+import io.questdb.griffin.engine.table.ParquetRowGroupFilter;
...
 execute("ALTER TABLE " + tableName + " CONVERT PARTITION TO PARQUET LIST '2024-06-10' WITH (bloom_filter_columns = 'id')");
 assertPartitionExists(tableName, "2024-06-10.3");
+ParquetRowGroupFilter.resetRowGroupsSkipped();
+assertQueryNoLeakCheck(
+        "cnt\n0\n",
+        "SELECT COUNT(*) cnt FROM " + tableName + " WHERE id = -1",
+        null,
+        false,
+        true
+);
+Assert.assertTrue(ParquetRowGroupFilter.getRowGroupsSkipped() > 0);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@core/src/test/java/io/questdb/test/griffin/AlterTableConvertPartitionTest.java`
around lines 115 - 179, Tests like
testConvertAllPartitionsWithBloomFilterColumns /
testConvertAllPartitionsWithBloomFilterAndFpp /
testConvertAllPartitionsWithBloomFilterFpp /
testConvertAllPartitionsWithBloomFilterWhere only assert conversion succeeded;
update each test to also verify that bloom options were applied by inspecting
the converted partition output after execute(...) and
assertPartitionExists(...). Concretely: locate the created partition (use the
same tableName and partition name used in assertPartitionExists), read the
partition’s Parquet/metadata or bloom artifacts (e.g., parquet file footer
metadata keys for bloom_filter_columns and fpp or presence/content of bloom
filter files for column "id"), and add assertions that the bloom_filter_columns
contains "id" and that fpp matches the provided value (e.g., "0.05", "0.01",
"0.1"). Ensure these checks run in each corresponding test immediately after
assertPartitionExists so the tests fail if the options are parsed but not
applied.
core/src/main/java/io/questdb/cutlass/parquet/CopyExportRequestTask.java (1)

589-619: ⚠️ Potential issue | 🟠 Major

Validate bloom filter FPP at the last pre-native boundary.

fpp is passed into createStreamingParquetWriter(...) without any local range/finite validation (Line 591-618). Even if some call sites validate, this is the final JVM boundary before JNI/native allocation logic, so it’s worth failing fast here too.

This was raised earlier; re-flagging because it still appears unvalidated in the current snippet.

Proposed fix (finite + (0,1) check)
diff --git a/core/src/main/java/io/questdb/cutlass/parquet/CopyExportRequestTask.java b/core/src/main/java/io/questdb/cutlass/parquet/CopyExportRequestTask.java
@@
             long bloomFilterIndexesPtr = 0;
             int bloomFilterCount = 0;
             double fpp = Double.isNaN(bloomFilterFpp) ? DEFAULT_BLOOM_FILTER_FPP : bloomFilterFpp;
+            if (!Double.isFinite(fpp) || fpp <= 0.0d || fpp >= 1.0d) {
+                throw CairoException.nonCritical()
+                        .put("bloom_filter_fpp must be between 0 and 1 (exclusive): ")
+                        .put(fpp);
+            }
 
             if (bloomFilterColumns != null && !bloomFilterColumns.isEmpty()) {
                 bloomFilterColumnIndexes.reopen();
                 parseBloomFilterColumnIndexes(bloomFilterColumns, metadata, bloomFilterColumnIndexes, bloomFilterColumnsPosition);
                 if (bloomFilterColumnIndexes.size() > 0) {
                     bloomFilterIndexesPtr = bloomFilterColumnIndexes.getAddress();
                     bloomFilterCount = (int) bloomFilterColumnIndexes.size();
                 }
             }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@core/src/main/java/io/questdb/cutlass/parquet/CopyExportRequestTask.java`
around lines 589 - 619, The code uses fpp (derived from bloomFilterFpp) and
passes it to createStreamingParquetWriter without validating it's a finite
probability in (0,1); update CopyExportRequestTask to validate fpp after
computing it (the local variable fpp) to ensure Double.isFinite(fpp) and fpp >
0.0 && fpp < 1.0, and if not throw an IllegalArgumentException (or similar) with
a clear message including the invalid value and parameter name before calling
createStreamingParquetWriter; this protects the native boundary that expects a
valid false-positive probability.
🧹 Nitpick comments (8)
core/rust/qdbr/src/parquet_write/util.rs (1)

428-442: Tighten the boundary test wording and final assertion.

Line 428’s comment is slightly misleading (i32::MIN is not the largest u32), and adding one final max assertion would better lock the invariant.

Suggested test refinement
-        // i32::MIN (0x80000000) is largest as unsigned (2^31)
+        // i32::MIN (0x80000000) is larger than i32::MAX under u32 ordering (2^31)
         // i32::MAX (0x7FFFFFFF) is 2^31 - 1 as unsigned
@@
         mm.update_unsigned(i32::MIN); // 0x80000000 = 2147483648u32
         assert_eq!(mm.min, Some(0)); // 0 is still the smallest unsigned
+        assert_eq!(mm.max, Some(-1)); // 0xFFFFFFFF remains the largest unsigned
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@core/rust/qdbr/src/parquet_write/util.rs` around lines 428 - 442, The
boundary test comments and final assertion around mm.update_unsigned are
misleading: update the comment for the i32::MIN case to state that i32::MIN maps
to 0x80000000 (2147483648u32) rather than calling it the largest u32, and add
one more assertion after the i32::MIN update to assert mm.max is still Some(-1)
so the invariant is fully checked; locate the test block using
mm.update_unsigned, mm.min and mm.max and adjust the comment text and append the
final max assertion accordingly.
core/src/test/java/io/questdb/test/std/str/CharsTest.java (1)

386-422: Good test coverage for the code point ordering change.

The tests comprehensively cover null handling, equality, basic ordering, prefix comparisons, empty strings, and supplementary characters. The surrogate pair test case (\uD834\uDD1E > "z") correctly validates that code point ordering is used rather than UTF-16 code unit ordering.

Consider adding a test case that would fail under the old per-char comparison but passes under code point comparison, to document the behavioral change:

💡 Optional: Add test for BMP vs supplementary ordering
// U+FFFF (BMP) vs U+10000 (first supplementary)
// Under char comparison: '\uFFFF' (65535) > '\uD800' (55296) → wrong
// Under code point comparison: 65535 < 65536 → correct
Assert.assertTrue(Chars.lessThan("\uFFFF", "\uD800\uDC00"));
Assert.assertTrue(Chars.greaterThan("\uD800\uDC00", "\uFFFF"));
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@core/src/test/java/io/questdb/test/std/str/CharsTest.java` around lines 386 -
422, Add an explicit BMP vs supplementary comparison to
testGreaterThanAndLessThan that would fail under per-char ordering but pass
under code point ordering: in the CharsTest.testGreaterThanAndLessThan method,
add assertions using Chars.lessThan("\uFFFF", "\uD800\uDC00") and
Chars.greaterThan("\uD800\uDC00", "\uFFFF") (i.e., compare U+FFFF to the
surrogate pair for U+10000) to document and verify the intended code point
semantics.
core/src/main/java/io/questdb/cairo/TableWriter.java (1)

7114-7154: Optional perf refactor: precompute descriptor indexes once in parseBloomFilterColumnIndexes.

The current implementation recomputes descriptor positions by scanning prior columns for each token (Line 7132 loop), which is avoidable overhead on wide schemas / large column lists. Consider building a one-pass metadataIndex→descriptorIndex map before token parsing.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@core/src/main/java/io/questdb/cairo/TableWriter.java` around lines 7114 -
7154, The parseBloomFilterColumnIndexes method is recomputing descriptorIndex by
scanning prior columns for every token; precompute a
metadataIndex→descriptorIndex mapping once and use it during token parsing to
avoid the inner loop. Add a small int[] or DirectIntList (size
metadata.getColumnCount()) and populate it in a single pass over metadata
(incrementing a counter when metadata.getColumnType(i) > 0) to record
descriptorIndex for each metadata index, then replace the per-token loop that
computes descriptorIndex with a constant-time lookup into that map inside
parseBloomFilterColumnIndexes (keep existing checks for metadataIndex >= 0 and
duplicate detection).
core/rust/qdbr/src/parquet_write/file.rs (2)

37-55: WriteOptions cloning is now on the hot path after adding HashSet.

Line 850 clones WriteOptions per page; after Line 52 this deep-clones bloom_filter_columns repeatedly. Consider making that field cheap-to-clone (e.g., Arc<HashSet<usize>>) or keeping bloom-column selection out of per-page options.

💡 Minimal direction to reduce clone cost
- pub bloom_filter_columns: HashSet<usize>,
+ pub bloom_filter_columns: Arc<HashSet<usize>>,
- pub fn with_bloom_filter_columns(mut self, columns: HashSet<usize>) -> Self {
-     self.bloom_filter_columns = columns;
+ pub fn with_bloom_filter_columns(mut self, columns: HashSet<usize>) -> Self {
+     self.bloom_filter_columns = Arc::new(columns);
      self
  }

Also applies to: 841-853

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@core/rust/qdbr/src/parquet_write/file.rs` around lines 37 - 55, WriteOptions
is being cloned on the hot path and its bloom_filter_columns (HashSet<usize>)
causes expensive deep clones; change bloom_filter_columns to a cheap-to-clone
type (e.g., Arc<HashSet<usize>>) or remove it from per-page WriteOptions and
pass bloom column selection separately to the per-page writer. Update the
WriteOptions struct (symbol: WriteOptions) to use Arc<HashSet<usize>> for
bloom_filter_columns and adjust all usages/constructors and clones (including
where WriteOptions is cloned around line ~850) to account for Arc, or refactor
the code that calls WriteOptions.clone() to accept a separate
&HashSet/Arc<HashSet> parameter instead.

530-551: Avoid eager page materialization across partitions to cap peak memory.

collect_multi_partition_pages builds a full Vec of pages before compression. For large multi-partition row groups this can inflate RSS and add latency. Prefer streaming/flattening iterators directly into Compressor.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@core/rust/qdbr/src/parquet_write/file.rs` around lines 530 - 551,
collect_multi_partition_pages currently eagerly materializes all Page values
into the all_pages Vec causing high memory use; change it to stream pages
instead by removing the Vec allocation and either (A) change the function to
return a streaming iterator (e.g., impl Iterator<Item = ParquetResult<Page>> or
a boxed Box<dyn Iterator...>) that yields the results of chaining/flattening
each column_chunk_to_pages(...) iterator (use Iterator::chain or flat_map), or
(B) change the API to accept a sink/Compressor callback and directly feed each
pages_iter into the Compressor as you iterate. Update callers to consume the
iterator or pass the compressor so pages are processed incrementally rather than
collected into all_pages. Ensure references to collect_multi_partition_pages,
column_chunk_to_pages, and Compressor are updated accordingly.
core/src/main/java/io/questdb/griffin/engine/table/PushdownFilterExtractor.java (1)

103-107: Boolean identifiers should follow the is... / has... convention.

allConstant and containsQuery(...) don’t follow the project boolean naming rule.

As per coding guidelines "When choosing a name for a boolean variable, field or method, always use the is... or has... prefix, as appropriate".

Also applies to: 136-152

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@core/src/main/java/io/questdb/griffin/engine/table/PushdownFilterExtractor.java`
around lines 103 - 107, Rename boolean identifiers to follow the is/has
convention: change the local variable allConstant to a boolean name like
areAllConstants (or isAllConstant) and rename the method containsQuery(...) to
hasQuery(...). Update all references inside PushdownFilterExtractor (search for
allConstant and containsQuery), including the related occurrences around lines
136-152, plus any call sites, imports, and tests to use the new names so
compilation succeeds.
core/src/main/java/io/questdb/griffin/SqlCodeGenerator.java (1)

3188-3196: Boolean local names in this block should follow is.../has... convention.

enableParallelFilter and enablePreTouch should be renamed to match repo naming rules for booleans.

As per coding guidelines: "When choosing a name for a boolean variable, field or method, always use the is... or has... prefix, as appropriate".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@core/src/main/java/io/questdb/griffin/SqlCodeGenerator.java` around lines
3188 - 3196, Rename the two local boolean variables to follow the is/has
convention: change enableParallelFilter to isParallelFilterEnabled and
enablePreTouch to isPreTouchEnabled (assignments remain the same:
isParallelFilterEnabled = executionContext.isParallelFilterEnabled();
isPreTouchEnabled = SqlHints.hasEnablePreTouchHint(model, model.getName());),
and update all subsequent references in this block (including the if check that
uses factory.supportsPageFrameCursor() and the later logic that uses these
booleans alongside useJit and canCompile) to use the new names.
core/src/main/java/io/questdb/cutlass/parquet/CopyExportRequestTask.java (1)

85-113: De-duplicate bloom-filter column tokenization + keep static helpers grouped.

validateBloomFilterColumns() and parseBloomFilterColumnIndexes() implement the same comma-splitting + trimming loop. It’s easy for these to drift again (e.g., one starts accepting whitespace-only tokens, the other rejects). Recommend extracting a shared private static tokenizer/helper so both callers share the exact semantics while still throwing SqlException vs CairoException.

Also, since both helpers are static, consider co-locating them (and keeping static members sorted) to match the repo’s member ordering rules. As per coding guidelines, **/*.java: “Group Java class members by kind (static vs. instance) and visibility, sorted alphabetically.”

Also applies to: 347-375

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@core/src/main/java/io/questdb/cutlass/parquet/CopyExportRequestTask.java`
around lines 85-113, extract the comma-splitting + trimming logic used in
validateBloomFilterColumns and parseBloomFilterColumnIndexes into a single
private static helper (e.g., tokenizeColumnList or splitAndTrimColumns) that
returns an iterable/list of CharSequence tokens and accepts the original
CharSequence input and a base position; then update validateBloomFilterColumns
to iterate that helper's tokens and perform meta.getColumnIndexQuiet checks,
throwing SqlException with the appropriate position, and update
parseBloomFilterColumnIndexes to reuse the same helper but translate
missing-column cases into CairoException as before; finally, move and group
these new static helper(s) alongside other static members and sort
alphabetically to satisfy the class member ordering rules.
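The extraction this comment asks for can be sketched as follows — the helper name `tokenizeColumnList` and the `CharSequence`-based signature are illustrative, not the actual QuestDB code; the point is that both callers iterate the exact same tokens:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of a shared comma-splitting + trimming tokenizer. Both the
// SqlException-throwing validator and the CairoException-throwing index
// resolver would iterate these tokens, so the semantics cannot drift.
public class ColumnListTokenizer {
    static List<String> tokenizeColumnList(CharSequence input) {
        List<String> tokens = new ArrayList<>();
        int start = 0;
        int len = input.length();
        for (int i = 0; i <= len; i++) {
            if (i == len || input.charAt(i) == ',') {
                // trim surrounding whitespace; skip empty/whitespace-only tokens
                String token = input.subSequence(start, i).toString().trim();
                if (!token.isEmpty()) {
                    tokens.add(token);
                }
                start = i + 1;
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenizeColumnList(" id , ts ,,name "));
    }
}
```

Each caller then maps tokens to column indexes and chooses its own exception type for the missing-column case.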
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@core/rust/qdbr/src/parquet_read/jni.rs`:
- Around line 124-138: The code currently casts filtered_rows_size to
filtered_rows_count and calls slice::from_raw_parts(filtered_rows_ptr,
filtered_rows_count) without validating magnitude, which allows negative i64
values to become huge usize lengths; add a bounds check before constructing the
slice: verify filtered_rows_count is not larger than the actual row group span
(or total rows available from row_group_bufs/row_group_span) and that
filtered_rows_count is non-negative (i.e., input i64 is >= 0) and that
filtered_rows_ptr is non-null when count > 0; if the check fails return
Err(fmt_err!(InvalidLayout, "...")) rather than creating the slice. Ensure you
reference and use filtered_rows_size/filtered_rows_count, filtered_rows_ptr,
row_group_bufs (or the variable that holds the row span) and avoid calling
slice::from_raw_parts until validation passes.

In `@core/rust/qdbr/src/parquet_read/row_groups.rs`:
- Around line 1048-1070: Parquet metadata can declare FixedLenByteArray(0) which
causes panics when code (e.g., compare_signed_be) indexes a[0]; update the
FixedLenByteArray handling (the match arm using validate_filter_span and the
loops that call is_fixed_len_null / is_fixed_len_null_be and
parquet2::bloom_filter::hash_byte) to first guard for size == 0 and
short-circuit (e.g., return Ok(true) or otherwise skip pruning) to avoid slicing
zero-length values, and make compare_signed_be defensive by returning a safe
non-indexing result when given an empty slice; apply the same zero-length guard
to the other affected blocks that use FixedLenByteArray and decimal compare
paths (references: compare_signed_be, is_fixed_len_null, is_fixed_len_null_be,
validate_filter_span).
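The defensive-comparison idea (the review targets Rust's `compare_signed_be`, but the pattern is language-agnostic) can be sketched in Java terms — `compareSignedBe` is a hypothetical stand-in, not the actual function:

```java
// Sketch of a defensive big-endian signed comparison that never indexes
// element 0 of an empty array (the FixedLenByteArray(0) case from the review).
public class SignedBeCompare {
    static int compareSignedBe(byte[] a, byte[] b) {
        if (a.length == 0 || b.length == 0) {
            // zero-length fixed-len values carry no ordering information;
            // treat as equal so pruning short-circuits instead of panicking
            return 0;
        }
        // the sign bit of the first byte separates negative from non-negative
        boolean aNeg = a[0] < 0;
        boolean bNeg = b[0] < 0;
        if (aNeg != bNeg) {
            return aNeg ? -1 : 1;
        }
        // same sign: unsigned lexicographic byte comparison decides
        for (int i = 0; i < Math.min(a.length, b.length); i++) {
            int cmp = Integer.compare(a[i] & 0xFF, b[i] & 0xFF);
            if (cmp != 0) {
                return cmp;
            }
        }
        return Integer.compare(a.length, b.length);
    }

    public static void main(String[] args) {
        System.out.println(compareSignedBe(new byte[0], new byte[]{1}));   // no panic on empty input
        System.out.println(compareSignedBe(new byte[]{-1}, new byte[]{1})); // negative sorts first
    }
}
```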

In
`@core/src/main/java/io/questdb/griffin/engine/table/PushdownFilterExtractor.java`:
- Around line 80-82: The clear() method in PushdownFilterExtractor only clears
conditions but leaves orValues populated, which can retain stale AST nodes;
update the clear() implementation to also reset orValues (e.g., call
orValues.clear() or reinitialize it) so any reused PushdownFilterExtractor
instance won't hold old expression references; ensure you modify the clear()
method in the PushdownFilterExtractor class and verify that both conditions and
orValues are emptied after clear() is called.
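A minimal sketch of the reuse hazard this comment describes — class and field names are illustrative, not the actual `PushdownFilterExtractor`:

```java
import java.util.ArrayList;
import java.util.List;

// A clear() that resets only one of two collections leaves stale references
// behind when the instance is reused; resetting both avoids the leak.
class Extractor {
    final List<Object> conditions = new ArrayList<>();
    final List<Object> orValues = new ArrayList<>();

    void clear() {
        conditions.clear();
        orValues.clear(); // without this line, stale AST nodes would be retained
    }
}

public class ClearDemo {
    public static void main(String[] args) {
        Extractor e = new Extractor();
        e.conditions.add("condition-node");
        e.orValues.add("or-value-node");
        e.clear();
        System.out.println(e.conditions.isEmpty() && e.orValues.isEmpty());
    }
}
```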

In
`@core/src/test/java/io/questdb/test/griffin/AlterTableConvertPartitionTest.java`:
- Line 476: Two test SQL strings use lowercase SQL keywords; update the
execute(...) calls in AlterTableConvertPartitionTest so keywords are uppercase —
change "list" to "LIST" in the ALTER TABLE CONVERT PARTITION statement (the
execute call containing "ALTER TABLE x CONVERT PARTITION TO NATIVE list
'1970-01'") and change any lowercase "and" to "AND" in the other execute(...)
test statement referenced around line 530 so all SQL keywords conform to the
UPPERCASE test convention.

---

Outside diff comments:
In `@core/src/main/java/io/questdb/std/Chars.java`:
- Around line 138-162: The compare(CharSequence l, CharSequence r)
implementation currently does UTF-16 code unit subtraction causing inconsistent
ordering vs. greaterThan()/lessThan() which use code points; update compare() to
perform code point-aware comparison (use Character.codePointAt and advance
indices by Character.charCount(codePoint)) so it matches the semantics used by
lessThan()/greaterThan(); ensure edge cases (null identity, length/remaining
code points) and behavior relied on by MinStrGroupByFunction and
MaxStrGroupByFunction remain consistent with LtStrFunctionFactory.

---

Duplicate comments:
In `@core/rust/qdbr/src/parquet_write/file.rs`:
- Around line 158-164: The debug-only check in with_bloom_filter_fpp allows
invalid fpp in release builds; replace the debug_assert! with runtime
validation: either use assert!(fpp > 0.0 && fpp < 1.0, "...") to panic on
invalid input at runtime or (preferable for a public setter) change
with_bloom_filter_fpp signature to return Result<Self, Error> and return an Err
when fpp is out of range, then only set self.bloom_filter_fpp = fpp and return
Ok(self) when valid; update callers accordingly. Ensure you reference the
with_bloom_filter_fpp method and the bloom_filter_fpp field when making the
change.

In `@core/src/main/java/io/questdb/cairo/TableWriter.java`:
- Line 196: The field bloomFilterIndexes in TableWriter is being freed per
conversion call causing repeated native free/alloc; instead keep it allocated
for the TableWriter lifecycle, clear and reuse it during conversions (use
reopen/clear on bloomFilterIndexes where conversion logic runs) and move the
Misc.free(bloomFilterIndexes) call into TableWriter.doClose() so it is released
once when the writer is closed; update conversion code to remove per-call
Misc.free and replace with bloomFilterIndexes.clear() or equivalent reuse logic.
- Line 1587: Validate the computed fpp in TableWriter (the local variable fpp
computed from bloomFilterFpp or
config.getPartitionEncoderParquetBloomFilterFpp()) before passing it into native
encoding: ensure it's finite (not NaN or infinite) and within the valid
probability range (0.0 < fpp < 1.0, or use your project's accepted bounds), and
if not throw an IllegalArgumentException with a clear message; update the code
path that calls the native encoder to use this validated fpp so invalid values
are rejected early.

In `@core/src/main/java/io/questdb/cutlass/parquet/CopyExportRequestTask.java`:
- Around line 589-619: The code uses fpp (derived from bloomFilterFpp) and
passes it to createStreamingParquetWriter without validating it's a finite
probability in (0,1); update CopyExportRequestTask to validate fpp after
computing it (the local variable fpp) to ensure Double.isFinite(fpp) and fpp >
0.0 && fpp < 1.0, and if not throw an IllegalArgumentException (or similar) with
a clear message including the invalid value and parameter name before calling
createStreamingParquetWriter; this protects the native boundary that expects a
valid false-positive probability.
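The validation both fpp comments ask for is small enough to sketch in one helper — `validateFpp` is an illustrative name, not QuestDB API:

```java
// Sketch of validating a bloom-filter false-positive probability before it
// crosses the native boundary: it must be finite and strictly inside (0, 1).
public class FppValidation {
    static double validateFpp(double fpp) {
        if (!Double.isFinite(fpp) || fpp <= 0.0 || fpp >= 1.0) {
            throw new IllegalArgumentException(
                    "bloom filter fpp must be a finite value in (0, 1), got: " + fpp);
        }
        return fpp;
    }

    public static void main(String[] args) {
        System.out.println(validateFpp(0.05)); // valid, passes through
        try {
            validateFpp(Double.NaN);            // rejected before the native call
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Note that `!Double.isFinite(fpp)` also catches `NaN`, since every comparison with `NaN` is false and `fpp <= 0.0 || fpp >= 1.0` alone would let it through.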

In
`@core/src/main/java/io/questdb/griffin/engine/table/ParquetRowGroupFilter.java`:
- Around line 77-79: Rename the boolean local variables in ParquetRowGroupFilter
to follow the is.../has... convention: change skip (used with
decoder.canSkipRowGroup) to isSkippableRowGroup or isSkippable, change supported
to isSupported, and change allCompatible to areAllCompatible (or isAllCompatible
if you prefer singular). Update every use site within the method(s) (including
the conditional checks and any increments like
rowGroupsSkipped.incrementAndGet()) to reference the new names so compilation
and logic remain unchanged.

In
`@core/src/test/java/io/questdb/test/griffin/AlterTableConvertPartitionTest.java`:
- Around line 115-179: Tests like testConvertAllPartitionsWithBloomFilterColumns
/ testConvertAllPartitionsWithBloomFilterAndFpp /
testConvertAllPartitionsWithBloomFilterFpp /
testConvertAllPartitionsWithBloomFilterWhere only assert conversion succeeded;
update each test to also verify that bloom options were applied by inspecting
the converted partition output after execute(...) and
assertPartitionExists(...). Concretely: locate the created partition (use the
same tableName and partition name used in assertPartitionExists), read the
partition’s Parquet/metadata or bloom artifacts (e.g., parquet file footer
metadata keys for bloom_filter_columns and fpp or presence/content of bloom
filter files for column "id"), and add assertions that the bloom_filter_columns
contains "id" and that fpp matches the provided value (e.g., "0.05", "0.01",
"0.1"). Ensure these checks run in each corresponding test immediately after
assertPartitionExists so the tests fail if the options are parsed but not
applied.

---

Nitpick comments:
In `@core/rust/qdbr/src/parquet_write/file.rs`:
- Around line 37-55: WriteOptions is being cloned on the hot path and its
bloom_filter_columns (HashSet<usize>) causes expensive deep clones; change
bloom_filter_columns to a cheap-to-clone type (e.g., Arc<HashSet<usize>>) or
remove it from per-page WriteOptions and pass bloom column selection separately
to the per-page writer. Update the WriteOptions struct (symbol: WriteOptions) to
use Arc<HashSet<usize>> for bloom_filter_columns and adjust all
usages/constructors and clones (including where WriteOptions is cloned around
line ~850) to account for Arc, or refactor the code that calls
WriteOptions.clone() to accept a separate &HashSet/Arc<HashSet> parameter
instead.
- Around line 530-551: collect_multi_partition_pages currently eagerly
materializes all Page values into the all_pages Vec causing high memory use;
change it to stream pages instead by removing the Vec allocation and either (A)
change the function to return a streaming iterator (e.g., impl Iterator<Item =
ParquetResult<Page>> or a boxed Box<dyn Iterator...>) that yields the results of
chaining/flattening each column_chunk_to_pages(...) iterator (use
Iterator::chain or flat_map), or (B) change the API to accept a sink/Compressor
callback and directly feed each pages_iter into the Compressor as you iterate.
Update callers to consume the iterator or pass the compressor so pages are
processed incrementally rather than collected into all_pages. Ensure references
to collect_multi_partition_pages, column_chunk_to_pages, and Compressor are
updated accordingly.
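The streaming alternative can be sketched in Java terms (the review targets Rust iterators, but the collect-vs-stream pattern is the same): flat-map each chunk's page stream into one lazy pipeline instead of materializing every page into a list first. `pagesFor` stands in for `column_chunk_to_pages(...)`:

```java
import java.util.List;
import java.util.stream.Stream;

// Pages are produced lazily, one at a time, as the consumer pulls them —
// peak memory holds one page, not the whole partition's worth.
public class StreamingPages {
    static Stream<String> pagesFor(String chunk) {
        // stand-in for column_chunk_to_pages(...): two pages per chunk
        return Stream.of(chunk + "-page0", chunk + "-page1");
    }

    public static void main(String[] args) {
        List<String> chunks = List.of("c0", "c1");
        chunks.stream()
              .flatMap(StreamingPages::pagesFor) // lazy: no all_pages equivalent
              .forEach(System.out::println);     // the "Compressor" sink
    }
}
```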

In `@core/rust/qdbr/src/parquet_write/util.rs`:
- Around line 428-442: The boundary test comments and final assertion around
mm.update_unsigned are misleading: update the comment for the i32::MIN case to
state that i32::MIN maps to 0x80000000 (2147483648u32) rather than calling it
the largest u32, and add one more assertion after the i32::MIN update to assert
mm.max is still Some(-1) so the invariant is fully checked; locate the test
block using mm.update_unsigned, mm.min and mm.max and adjust the comment text
and append the final max assertion accordingly.

In `@core/src/main/java/io/questdb/cairo/TableWriter.java`:
- Around line 7114-7154: The parseBloomFilterColumnIndexes method is recomputing
descriptorIndex by scanning prior columns for every token; precompute a
metadataIndex→descriptorIndex mapping once and use it during token parsing to
avoid the inner loop. Add a small int[] or DirectIntList (size
metadata.getColumnCount()) and populate it in a single pass over metadata
(incrementing a counter when metadata.getColumnType(i) > 0) to record
descriptorIndex for each metadata index, then replace the per-token loop that
computes descriptorIndex with a constant-time lookup into that map inside
parseBloomFilterColumnIndexes (keep existing checks for metadataIndex >= 0 and
duplicate detection).
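The single-pass mapping this comment describes can be sketched as follows — `buildDescriptorIndexMap` and the plain `int[]` of column types are illustrative stand-ins for the metadata API:

```java
// Sketch of precomputing a metadataIndex -> descriptorIndex map in one pass.
// A positive column type marks a live column (as in the review's
// metadata.getColumnType(i) > 0 check); dropped columns map to -1.
public class DescriptorIndexMap {
    static int[] buildDescriptorIndexMap(int[] columnTypes) {
        int[] map = new int[columnTypes.length];
        int descriptorIndex = 0;
        for (int i = 0; i < columnTypes.length; i++) {
            // live columns take the next descriptor slot; dropped columns get -1
            map[i] = columnTypes[i] > 0 ? descriptorIndex++ : -1;
        }
        return map;
    }

    public static void main(String[] args) {
        // metadata indexes 1 and 3 are dropped columns
        int[] map = buildDescriptorIndexMap(new int[]{5, -1, 7, 0, 3});
        System.out.println(java.util.Arrays.toString(map));
    }
}
```

Token parsing then becomes a constant-time `map[metadataIndex]` lookup instead of re-scanning all prior columns per token.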

In `@core/src/main/java/io/questdb/cutlass/parquet/CopyExportRequestTask.java`:
- Around line 85-113: Extract the comma-splitting + trimming logic used in
validateBloomFilterColumns and parseBloomFilterColumnIndexes into a single
private static helper (e.g., tokenizeColumnList or splitAndTrimColumns) that
returns an iterable/list of CharSequence tokens and accepts the original
CharSequence input and a base position; then update validateBloomFilterColumns
to iterate that helper's tokens and perform meta.getColumnIndexQuiet checks,
throwing SqlException with the appropriate position, and update
parseBloomFilterColumnIndexes to reuse the same helper but translate
missing-column cases into CairoException as before; finally, move and group
these new static helper(s) alongside other static members and sort
alphabetically to satisfy the class member ordering rules.

In
`@core/src/main/java/io/questdb/griffin/engine/table/PushdownFilterExtractor.java`:
- Around line 103-107: Rename boolean identifiers to follow the is/has
convention: change the local variable allConstant to a boolean name like
areAllConstants (or isAllConstant) and rename the method containsQuery(...) to
hasQuery(...). Update all references inside PushdownFilterExtractor (search for
allConstant and containsQuery), including the related occurrences around lines
136-152, plus any call sites, imports, and tests to use the new names so
compilation succeeds.

In `@core/src/main/java/io/questdb/griffin/SqlCodeGenerator.java`:
- Around line 3188-3196: Rename the two local boolean variables to follow the
is/has convention: change enableParallelFilter to isParallelFilterEnabled and
enablePreTouch to isPreTouchEnabled (assignments remain the same:
isParallelFilterEnabled = executionContext.isParallelFilterEnabled();
isPreTouchEnabled = SqlHints.hasEnablePreTouchHint(model, model.getName());),
and update all subsequent references in this block (including the if check that
uses factory.supportsPageFrameCursor() and the later logic that uses these
booleans alongside useJit and canCompile) to use the new names.

In `@core/src/test/java/io/questdb/test/std/str/CharsTest.java`:
- Around line 386-422: Add an explicit BMP vs supplementary comparison to
testGreaterThanAndLessThan that would fail under per-char ordering but pass
under code point ordering: in the CharsTest.testGreaterThanAndLessThan method,
add assertions using Chars.lessThan("\uFFFF", "\uD800\uDC00") and
Chars.greaterThan("\uD800\uDC00", "\uFFFF") (i.e., compare U+FFFF to the
surrogate pair for U+10000) to document and verify the intended code point
semantics.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 1a3e7343-df1f-4aca-92ce-b6a43dd5f874

📥 Commits

Reviewing files that changed from the base of the PR and between 80f1f43 and bdf9550.

⛔ Files ignored due to path filters (5)
  • core/src/main/resources/io/questdb/bin/darwin-aarch64/libquestdbr.dylib is excluded by !**/*.dylib
  • core/src/main/resources/io/questdb/bin/darwin-x86-64/libquestdbr.dylib is excluded by !**/*.dylib
  • core/src/main/resources/io/questdb/bin/linux-aarch64/libquestdbr.so is excluded by !**/*.so
  • core/src/main/resources/io/questdb/bin/linux-x86-64/libquestdbr.so is excluded by !**/*.so
  • core/src/main/resources/io/questdb/bin/windows-x86-64/questdbr.dll is excluded by !**/*.dll
📒 Files selected for processing (35)
  • core/rust/qdbr/src/parquet_read/decode.rs
  • core/rust/qdbr/src/parquet_read/decoders/converters.rs
  • core/rust/qdbr/src/parquet_read/decoders/dictionary.rs
  • core/rust/qdbr/src/parquet_read/jni.rs
  • core/rust/qdbr/src/parquet_read/mod.rs
  • core/rust/qdbr/src/parquet_read/row_groups.rs
  • core/rust/qdbr/src/parquet_write/file.rs
  • core/rust/qdbr/src/parquet_write/jni.rs
  • core/rust/qdbr/src/parquet_write/mod.rs
  • core/rust/qdbr/src/parquet_write/symbol.rs
  • core/rust/qdbr/src/parquet_write/util.rs
  • core/src/main/java/io/questdb/PropertyKey.java
  • core/src/main/java/io/questdb/cairo/CairoConfiguration.java
  • core/src/main/java/io/questdb/cairo/TableWriter.java
  • core/src/main/java/io/questdb/cairo/sql/PartitionFrameCursorFactory.java
  • core/src/main/java/io/questdb/cutlass/http/processors/ExportQueryProcessor.java
  • core/src/main/java/io/questdb/cutlass/parquet/CopyExportRequestTask.java
  • core/src/main/java/io/questdb/griffin/SqlCodeGenerator.java
  • core/src/main/java/io/questdb/griffin/SqlCompilerImpl.java
  • core/src/main/java/io/questdb/griffin/SqlParser.java
  • core/src/main/java/io/questdb/griffin/engine/functions/table/ReadParquetPageFrameCursor.java
  • core/src/main/java/io/questdb/griffin/engine/table/BwdTableReaderPageFrameCursor.java
  • core/src/main/java/io/questdb/griffin/engine/table/FwdTableReaderPageFrameCursor.java
  • core/src/main/java/io/questdb/griffin/engine/table/ParquetRowGroupFilter.java
  • core/src/main/java/io/questdb/griffin/engine/table/PushdownFilterExtractor.java
  • core/src/main/java/io/questdb/griffin/engine/table/parquet/PartitionDecoder.java
  • core/src/main/java/io/questdb/griffin/engine/table/parquet/PartitionEncoder.java
  • core/src/main/java/io/questdb/std/Chars.java
  • core/src/main/resources/io/questdb/site/conf/server.conf
  • core/src/test/java/io/questdb/test/griffin/AlterTableConvertPartitionTest.java
  • core/src/test/java/io/questdb/test/griffin/CopyExportTest.java
  • core/src/test/java/io/questdb/test/griffin/engine/table/parquet/ReadParquetFunctionTest.java
  • core/src/test/java/io/questdb/test/std/str/CharsTest.java
  • core/src/test/resources/sqllogictest/test/parquet/test_parquet_filter_pushdown.test
  • pkg/ami/marketplace/assets/server.conf
🚧 Files skipped from review as they are similar to previous changes (10)
  • core/src/test/java/io/questdb/test/griffin/CopyExportTest.java
  • core/rust/qdbr/src/parquet_read/decoders/converters.rs
  • core/src/main/java/io/questdb/cairo/CairoConfiguration.java
  • core/src/test/resources/sqllogictest/test/parquet/test_parquet_filter_pushdown.test
  • core/rust/qdbr/src/parquet_read/decode.rs
  • core/src/main/java/io/questdb/PropertyKey.java
  • core/src/main/java/io/questdb/griffin/SqlParser.java
  • pkg/ami/marketplace/assets/server.conf
  • core/src/main/resources/io/questdb/site/conf/server.conf
  • core/src/main/java/io/questdb/griffin/engine/table/parquet/PartitionEncoder.java

kafka1991 and others added 20 commits March 5, 2026 12:33
Resolve conflicts from upstream clippy fixes and binary rebuilds.
Accept upstream binaries, adopt combined #[allow(clippy::...)] attrs
and removed redundant doc comment. Keep branch CI fixes (extra path
pattern and object-count check) while adopting upstream's find-based
chmod.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
The write-side Parquet JNI functions used panic!/assert!/expect() which
would crash the JVM via undefined behavior if triggered. This commit
replaces them with proper error returns, matching the pattern already
used by the read-side JNI functions.

Changes by category:

Java-guaranteed invariants (null pointer checks from Java callers):
- PartitionUpdater destroy/updateFileMetadata/updateRowGroup: replace
  panic!/assert! with if-null checks that throw CairoException or
  silently return (for destroy, matching free(NULL) semantics).

External-data-dependent panics (corrupted Parquet/column data):
- update.rs: replace panic on invalid Parquet version with Result error.
- string.rs: replace expect() on invalid offsets and UTF-16 data with
  map_err error propagation. Change encode_plain/encode_delta to return
  ParquetResult.
- binary.rs: replace expect() on invalid offsets with map_err. Add
  upfront offset validation in encode_delta to protect all downstream
  as-usize casts. Change encode_plain/encode_delta to return
  ParquetResult.
- varchar.rs: replace assert! on data corruption with Err return.
- column_chunk.rs: replace unwrap() on bloom filter mutex with map_err
  to handle poisoned lock gracefully.

Also changes create_partition_descriptor assert_eq! on col_data_len
alignment to a proper Err return, since the function already returns
ParquetResult.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Co-Authored-By: Claude Opus 4.6 <[email protected]>
@glasstiger
Contributor

[PR Coverage check]

😍 pass : 4877 / 6029 (80.89%)

file detail

path | covered lines | new lines | coverage
🔵 io/questdb/cairo/TableReader.java 0 4 00.00%
🔵 io/questdb/cutlass/text/CopyExportContext.java 0 1 00.00%
🔵 qdbr/src/parquet_read/jni.rs 0 166 00.00%
🔵 qdbr/src/parquet_write/binary.rs 0 41 00.00%
🔵 qdbr/src/parquet_write/jni.rs 0 60 00.00%
🔵 qdbr/src/parquet_write/update.rs 12 41 29.27%
🔵 qdbr/src/parquet_write/varchar.rs 21 59 35.59%
🔵 qdbr/src/parquet_write/string.rs 17 36 47.22%
🔵 io/questdb/cairo/sql/RecordCursorFactory.java 1 2 50.00%
🔵 qdbr/src/parquet_read/row_groups.rs 471 853 55.22%
🔵 qdbr/parquet2/src/write/file.rs 28 42 66.67%
🔵 qdbr/parquet2/src/bloom_filter/read.rs 31 45 68.89%
🔵 qdbr/src/parquet_write/primitive.rs 91 132 68.94%
🔵 qdbr/src/parquet_read/decoders/dictionary.rs 3 4 75.00%
🔵 qdbr/src/parquet_write/file.rs 167 218 76.61%
🔵 io/questdb/cairo/MetadataCache.java 17 22 77.27%
🔵 io/questdb/PropServerConfiguration.java 8 10 80.00%
🔵 io/questdb/cutlass/http/processors/ExportQueryProcessor.java 20 24 83.33%
🔵 io/questdb/griffin/engine/table/ParquetRowGroupFilter.java 174 205 84.88%
🔵 io/questdb/griffin/engine/table/PushdownFilterExtractor.java 207 244 84.84%
🔵 qdbr/src/parquet_write/symbol.rs 40 47 85.11%
🔵 qdbr/src/parquet_write/util.rs 24 28 85.71%
🔵 qdbr/src/parquet_write/simd.rs 815 942 86.52%
🔵 io/questdb/griffin/SqlParser.java 19 21 90.48%
🔵 io/questdb/griffin/SqlCompilerImpl.java 45 50 90.00%
🔵 io/questdb/cairo/TableWriter.java 50 55 90.91%
🔵 qdbr/src/parquet_write/fixed_len_bytes.rs 21 23 91.30%
🔵 qdbr/parquet2/src/write/row_group.rs 12 13 92.31%
🔵 io/questdb/cutlass/parquet/CopyExportRequestTask.java 62 67 92.54%
🔵 io/questdb/griffin/engine/table/BwdTableReaderPageFrameCursor.java 109 116 93.97%
🔵 qdbr/parquet2/src/write/column_chunk.rs 47 50 94.00%
🔵 qdbr/src/parquet_read/decode.rs 131 136 96.32%
🔵 io/questdb/griffin/engine/table/FwdTableReaderPageFrameCursor.java 80 82 97.56%
🔵 io/questdb/griffin/SqlCodeGenerator.java 250 256 97.66%
🔵 qdbr/src/parquet_read/mod.rs 911 926 98.38%
🔵 qdbr/src/parquet_write/mod.rs 754 769 98.05%
🔵 io/questdb/griffin/engine/table/parquet/PartitionDecoder.java 5 5 100.00%
🔵 io/questdb/cutlass/http/HttpConstants.java 2 2 100.00%
🔵 io/questdb/griffin/engine/functions/table/ReadParquetRecordCursor.java 21 21 100.00%
🔵 qdbr/parquet2/src/encoding/hybrid_rle/mod.rs 1 1 100.00%
🔵 io/questdb/cutlass/parquet/SQLSerialParquetExporter.java 13 13 100.00%
🔵 io/questdb/std/Chars.java 20 20 100.00%
🔵 io/questdb/cairo/TxReader.java 4 4 100.00%
🔵 io/questdb/cairo/DefaultCairoConfiguration.java 3 3 100.00%
🔵 io/questdb/cairo/O3PartitionJob.java 1 1 100.00%
🔵 io/questdb/griffin/SqlExecutionContextImpl.java 4 4 100.00%
🔵 qdbr/parquet2/src/encoding/mod.rs 1 1 100.00%
🔵 io/questdb/griffin/model/ExportModel.java 16 16 100.00%
🔵 io/questdb/cairo/TableWriterMetadata.java 3 3 100.00%
🔵 io/questdb/PropertyKey.java 3 3 100.00%
🔵 qdbr/parquet2/src/write/page.rs 1 1 100.00%
🔵 io/questdb/griffin/engine/functions/table/ReadParquetRecordCursorFactory.java 6 6 100.00%
🔵 io/questdb/griffin/engine/table/parquet/PartitionEncoder.java 5 5 100.00%
🔵 io/questdb/griffin/engine/table/PageFrameRecordCursorFactory.java 4 4 100.00%
🔵 qdbr/parquet2/src/bloom_filter/split_block.rs 17 17 100.00%
🔵 io/questdb/cairo/CairoTable.java 4 4 100.00%
🔵 io/questdb/cairo/CairoConfigurationWrapper.java 3 3 100.00%
🔵 qdbr/parquet2/src/encoding/bitpacked/encode.rs 1 1 100.00%
🔵 io/questdb/griffin/SqlKeywords.java 25 25 100.00%
🔵 io/questdb/cairo/AbstractPartitionFrameCursorFactory.java 7 7 100.00%
🔵 io/questdb/griffin/engine/ops/AlterOperationBuilder.java 3 3 100.00%
🔵 io/questdb/griffin/engine/functions/table/ReadParquetPageFrameRecordCursorFactory.java 8 8 100.00%
🔵 io/questdb/cutlass/parquet/CopyExportRequestJob.java 4 4 100.00%
🔵 io/questdb/griffin/engine/functions/table/ReadParquetPageFrameCursor.java 25 25 100.00%
🔵 io/questdb/cairo/TableStructure.java 1 1 100.00%
🔵 io/questdb/griffin/engine/table/AbstractPageFrameRecordCursorFactory.java 3 3 100.00%
🔵 io/questdb/griffin/engine/ops/CopyExportFactory.java 15 15 100.00%
🔵 io/questdb/griffin/engine/ops/AlterOperation.java 9 9 100.00%
🔵 qdbr/src/parquet_write/schema.rs 1 1 100.00%

@bluestreak01 bluestreak01 merged commit 834d0fe into master Mar 8, 2026
45 checks passed
@bluestreak01 bluestreak01 deleted the parquet_filter_pushdown_bloom branch March 8, 2026 00:42
maciulis pushed a commit to maciulis/questdb that referenced this pull request Mar 16, 2026
@RaphDal RaphDal linked an issue Mar 23, 2026 that may be closed by this pull request

Development

Successfully merging this pull request may close these issues.

Bloom filters for high-cardinality column lookups

4 participants