(Test) Advanced adaptive filter selectivity evaluation #20363

adriangb wants to merge 4 commits into apache:main from
Conversation
run benchmark tpcds

run benchmark clickbench_partitioned

run benchmark tpch

🤖: Benchmark completed Details
show benchmark queue

🤖 Hi @Dandandan, you asked to view the benchmark queue (#20363 (comment)).

Hm it seems stuck again
FYI @alamb
@Dandandan this is mostly vibe coded, I'm only 50% confident it even makes sense without reviewing the code fwiw

Force-pushed from e0240af to 09cdb0b
show benchmark queue

🤖 Hi @adriangb, you asked to view the benchmark queue (#20363 (comment)).

Wonder if I'm infinite looping it or something :(

Yes, I think previously it got stuck during infinite loops / extremely long-running tasks.

My bad, I'll try to add a PR to add timeouts and a cancel command

show benchmark queue

🤖 Hi @adriangb, you asked to view the benchmark queue (#20363 (comment)).
run benchmark tpch

I think it is stuck again 😆

@alamb could you take a look? Somehow the result is also empty.

Yeah, seems like it's always tpcds? I don't think it's this branch necessarily; it got stuck on your branch earlier, and this one has been pretty much completely rewritten since the last time it got stuck here.

Hmmm, could be...

run benchmark clickbench_partitioned

show benchmark queue

🤖 Hi @adriangb, you asked to view the benchmark queue (#20363 (comment)).
🤖: Benchmark completed Details

run benchmark clickbench_partitioned

show benchmark queue

🤖 Hi @adriangb, you asked to view the benchmark queue (#20363 (comment)).

🤖: Benchmark completed Details

run benchmark clickbench_partitioned

🤖: Benchmark completed Details
run benchmark clickbench_extended

run benchmark tcph

🤖 Hi @adriangb, thanks for the request (#20363 (comment)). Unsupported benchmarks: tcph. Please choose one or more of the supported benchmarks; you can also set environment variables on subsequent lines.

run benchmark tpch

🤖: Benchmark completed Details

🤖: Benchmark completed Details

run benchmark clickbench_extended

🤖: Benchmark completed Details
Which issue does this PR close?
Related to filter pushdown performance optimization work.
Rationale for this change
Currently, when `pushdown_filters = true`, DataFusion pushes all filter predicates into the Parquet reader as row-level filters (`ArrowPredicate`s) unconditionally. This is suboptimal because:

- The `reorder_filters` heuristic was static. It used compressed column size as a proxy for cost and sorted filters by that metric, but never measured actual runtime selectivity or evaluation cost. It could not adapt to data skew or runtime conditions.
- Dynamic filters (e.g. from `HashJoinExec`) cannot be dropped even when they provide no benefit. Without a way to mark filters as optional, the system was forced to always evaluate them.

This PR introduces an adaptive filter selectivity tracking system that observes filter behavior at runtime and makes data-driven decisions about whether each filter should be pushed down as a row-level predicate or applied post-scan.
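The core trade-off can be sketched in a few lines: a filter is worth keeping as a row-level predicate only if the bytes of decoding it saves outpace the CPU it costs. The struct and field names below are illustrative, not this PR's actual API:

```rust
// Hypothetical sketch (not this PR's API): decide whether a filter "pays for
// itself" by comparing bytes of decoding it avoids per second of CPU spent.
struct FilterObservation {
    rows_in: u64,       // rows the filter saw
    rows_out: u64,      // rows that passed
    eval_seconds: f64,  // CPU time spent evaluating the filter
    bytes_per_row: f64, // projected bytes a skipped row avoids decoding
}

impl FilterObservation {
    /// Fraction of rows that pass (lower = more selective).
    fn selectivity(&self) -> f64 {
        self.rows_out as f64 / self.rows_in as f64
    }

    /// Bytes of decoding avoided per second of filter CPU.
    fn bytes_saved_per_sec(&self) -> f64 {
        let rows_skipped = (self.rows_in - self.rows_out) as f64;
        rows_skipped * self.bytes_per_row / self.eval_seconds
    }

    /// Keep as a row filter only above a threshold, analogous to
    /// `filter_pushdown_min_bytes_per_sec`.
    fn keep_as_row_filter(&self, min_bytes_per_sec: f64) -> bool {
        self.bytes_saved_per_sec() >= min_bytes_per_sec
    }
}
```

Note how `0.0` keeps every filter and `f64::INFINITY` keeps none, matching the semantics described for the new config option.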
What changes are included in this PR?
1. New module: `selectivity.rs` (1,554 lines)

The core of this PR. Introduces `SelectivityTracker`, a shared, lock-guarded structure that:

- Moves each filter through `New -> RowFilter | PostScan -> (promoted/demoted/dropped)` states based on measured runtime behavior.
- Uses a byte-ratio heuristic (`filter_bytes / projection_bytes`) to cheaply decide whether a new filter starts as a row filter or post-scan filter.
- Compares measured effectiveness against `filter_pushdown_min_bytes_per_sec` to decide promotion and demotion.
- Detects filters wrapped in `OptionalFilterPhysicalExpr`, which can be dropped entirely when ineffective.
- Watches `snapshot_generation()`, resetting statistics when a filter's predicate changes (e.g., when a `DynamicFilterPhysicalExpr` from a hash join updates its value set).

Key types:

- `SelectivityTracker` -- cross-file tracker shared by all `ParquetOpener` instances
- `TrackerConfig` -- immutable configuration (built from `ParquetOptions`)
- `SelectivityStats` -- per-filter Welford statistics with confidence interval methods
- `FilterState` -- `RowFilter | PostScan | Dropped` enum
- `PartitionedFilters` -- output of `partition_filters()`, consumed by the opener
- `FilterId` -- stable `usize` identifier assigned by `ParquetSource::with_predicate`
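As a rough sketch of the statistics side (illustrative names, not the exact `SelectivityStats` code): Welford's algorithm maintains a running mean and variance in O(1) per observation, and a z-score bound, the role `filter_confidence_z` plays, gives a confidence interval on the measured quantity:

```rust
// Illustrative sketch of Welford's online mean/variance plus a z-score
// confidence bound; not the exact SelectivityStats implementation.
#[derive(Default)]
struct Welford {
    count: u64,
    mean: f64,
    m2: f64, // running sum of squared deviations from the mean
}

impl Welford {
    /// Incorporate one observation (e.g. one batch's measured selectivity).
    fn update(&mut self, x: f64) {
        self.count += 1;
        let delta = x - self.mean;
        self.mean += delta / self.count as f64;
        self.m2 += delta * (x - self.mean);
    }

    /// Sample variance (0.0 until we have at least two observations).
    fn variance(&self) -> f64 {
        if self.count < 2 { 0.0 } else { self.m2 / (self.count - 1) as f64 }
    }

    /// Upper end of a z-based confidence interval on the mean.
    fn upper_bound(&self, z: f64) -> f64 {
        self.mean + z * (self.variance() / self.count.max(1) as f64).sqrt()
    }
}
```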
2. New wrapper: `OptionalFilterPhysicalExpr` (in `physical_expr_common`)

A transparent `PhysicalExpr` wrapper that marks a filter as optional -- droppable without affecting query correctness. All `PhysicalExpr` trait methods delegate to the inner expression. The selectivity tracker detects this via `downcast_ref::<OptionalFilterPhysicalExpr>()` and can drop the filter entirely when it is ineffective, rather than demoting it to post-scan.

`HashJoinExec` now wraps its dynamic join filters in `OptionalFilterPhysicalExpr` before pushing them down. This is why plan output now shows `Optional(DynamicFilter [...])` instead of `DynamicFilter [...]`.
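The delegation-plus-downcast pattern looks roughly like this. The trait and types below are simplified stand-ins, not DataFusion's real `PhysicalExpr`:

```rust
use std::any::Any;
use std::sync::Arc;

// Simplified stand-in for PhysicalExpr, to show the pattern only.
trait Expr: Any {
    fn evaluate(&self, row: i64) -> bool;
    fn as_any(&self) -> &dyn Any;
}

struct GtZero;
impl Expr for GtZero {
    fn evaluate(&self, row: i64) -> bool { row > 0 }
    fn as_any(&self) -> &dyn Any { self }
}

// The wrapper adds no behavior of its own: every call delegates to the
// inner expression, so correctness is unaffected.
struct OptionalFilter {
    inner: Arc<dyn Expr>,
}
impl Expr for OptionalFilter {
    fn evaluate(&self, row: i64) -> bool { self.inner.evaluate(row) }
    fn as_any(&self) -> &dyn Any { self }
}

// The tracker only needs a downcast to learn that a filter may be dropped.
fn is_droppable(expr: &dyn Expr) -> bool {
    expr.as_any().downcast_ref::<OptionalFilter>().is_some()
}
```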
3. Removal of the `reorder_filters` config option

The old static `reorder_filters` boolean and its associated heuristic (sort by `required_bytes`, then `can_use_index`) are removed entirely. The adaptive system subsumes this:

- `FilterCandidate` no longer stores `required_bytes` or `can_use_index` fields.
- The `size_of_columns()` and `columns_sorted()` helper functions in `row_filter.rs` are removed.
- Filter ordering is now decided by `SelectivityTracker::partition_filters()` based on measured effectiveness or the byte-ratio fallback.
4. Three new configuration options (in `ParquetOptions`)

- `filter_pushdown_min_bytes_per_sec` -- effectiveness threshold for keeping a filter at row level: `0.0` = all promoted, `INFINITY` = none promoted (feature disabled).
- `filter_collecting_byte_ratio_threshold` -- byte-ratio (`filter_bytes / projection_bytes`) cutoff used for a new filter's initial placement.
- `filter_confidence_z` -- z-score used for the confidence intervals in `SelectivityStats`.
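For reference, the three options can be set at the session level like any other Parquet option; this is a sketch using the defaults listed later in this description:

```sql
-- Illustrative session-level settings (values shown are the defaults):
SET datafusion.execution.parquet.filter_pushdown_min_bytes_per_sec = 52428800;
SET datafusion.execution.parquet.filter_collecting_byte_ratio_threshold = 0.15;
SET datafusion.execution.parquet.filter_confidence_z = 2.0;
```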
5. Changes to `ParquetOpener` / opener.rs

- Receives `Vec<(FilterId, Arc<dyn PhysicalExpr>)>` instead of a single combined `Arc<dyn PhysicalExpr>`.
- Calls `selectivity_tracker.partition_filters()` to split filters into row-level vs. post-scan.
- Row-level filters go through `build_row_filter()` (updated signature).
- Post-scan filters go through `apply_post_scan_filters_with_stats()`, a new function that evaluates each filter individually, reports per-filter timing and selectivity back to the tracker, and combines results into a single boolean mask.
- `limit` is only applied to the Parquet reader when there are no post-scan filters (otherwise limiting would cut off rows before the filter could find matches).
- A new `filter_apply_time` metric tracks post-scan filter evaluation time.
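Conceptually, the post-scan path evaluates each filter over the whole batch so its individual selectivity and timing can be reported, then ANDs the per-filter masks. In this sketch a plain `Vec<bool>` stands in for Arrow's `BooleanArray`, and the function and callback names are illustrative:

```rust
use std::time::Instant;

// Conceptual sketch of what apply_post_scan_filters_with_stats() does:
// evaluate each post-scan filter individually (so per-filter stats can be
// reported to the tracker), then fold the results into one combined mask.
fn apply_post_scan_filters(
    batch: &[i64],
    filters: &[(usize, Box<dyn Fn(i64) -> bool>)], // (FilterId, predicate)
    mut report: impl FnMut(usize, f64, f64),       // (id, selectivity, seconds)
) -> Vec<bool> {
    let mut mask = vec![true; batch.len()];
    for (id, pred) in filters {
        let start = Instant::now();
        let mut passed = 0usize;
        for (i, v) in batch.iter().enumerate() {
            let keep = pred(*v);
            passed += keep as usize;
            mask[i] &= keep; // fold into the single combined mask
        }
        let selectivity = passed as f64 / batch.len().max(1) as f64;
        report(*id, selectivity, start.elapsed().as_secs_f64());
    }
    mask
}
```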
6. Changes to `ParquetSource` / source.rs

- Predicate storage changes from `Option<Arc<dyn PhysicalExpr>>` to `Option<Vec<(FilterId, Arc<dyn PhysicalExpr>)>>`.
- `with_predicate()` now splits the predicate into conjuncts and assigns stable `FilterId`s (indices).
- A `SelectivityTracker` is stored as a shared `Arc` on `ParquetSource` and passed to all openers.
- `with_table_parquet_options()` now builds a fresh `SelectivityTracker` from the three new config values.
- The `with_reorder_filters()` and `reorder_filters()` methods are removed.
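The stable-`FilterId` idea is just "index of the conjunct", so statistics stay attached to the same filter across all `ParquetOpener` instances. A toy sketch, with a string standing in for `Arc<dyn PhysicalExpr>` (the real code splits expression trees, not text):

```rust
// Toy sketch of the with_predicate() idea: split a predicate into its
// top-level AND conjuncts and give each a stable FilterId (its index).
type FilterId = usize;

fn split_conjunction(predicate: &str) -> Vec<(FilterId, String)> {
    predicate
        .split(" AND ")
        .map(str::trim)
        .enumerate()
        .map(|(id, conjunct)| (id, conjunct.to_string()))
        .collect()
}
```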
7. Changes to `build_row_filter()` / row_filter.rs

- Takes `Vec<(FilterId, Arc<dyn PhysicalExpr>)>` + `&Arc<SelectivityTracker>` instead of `&Arc<dyn PhysicalExpr>` + `reorder_predicates: bool`.
- Returns `RowFilterWithMetrics` (new struct) containing both the `RowFilter` and any unbuildable filters that must be applied post-scan.
- `DatafusionArrowPredicate` now carries a `FilterId` and `Arc<SelectivityTracker>`, reporting per-batch evaluation metrics back to the tracker after each `evaluate()` call.
- Filter reordering is removed from `build_row_filter` -- filters arrive pre-ordered by the tracker.
8. Changes to `HashJoinExec`

- Dynamic join filters are wrapped in `OptionalFilterPhysicalExpr` before being pushed down.
- The filter-update path now unwraps `OptionalFilterPhysicalExpr` to find the inner `DynamicFilterPhysicalExpr`.
9. Protobuf schema updates

- The `reorder_filters` field (tag 6) is marked as `reserved` in `datafusion_common.proto`.
- New fields: `filter_pushdown_min_bytes_per_sec` (tag 35), `filter_collecting_byte_ratio_threshold` (tag 40), `filter_confidence_z` (tag 41).
- Corresponding updates in `pbjson.rs`, `prost.rs`, `from_proto`, `to_proto`, and `file_formats.rs`.
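Reserving the old tag prevents it from ever being reused with a different meaning. A sketch of what the `.proto` change looks like (field types and surrounding message layout here are illustrative, only the tag numbers come from this PR):

```proto
// Illustrative fragment of the datafusion_common.proto change:
message ParquetOptions {
  reserved 6;  // was: reorder_filters
  double filter_pushdown_min_bytes_per_sec = 35;
  double filter_collecting_byte_ratio_threshold = 40;
  double filter_confidence_z = 41;
}
```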
10. Test and benchmark updates

- `reorder_filters` removed from tests and benchmarks.
- Tests set `filter_pushdown_min_bytes_per_sec = 0.0` to preserve deterministic behavior (all filters always pushed down).
- Plan snapshots change from `DynamicFilter [...]` to `Optional(DynamicFilter [...])`.
- New unit tests in `selectivity.rs` covering: effectiveness calculation, Welford's algorithm, confidence intervals, state machine transitions (initial placement, promotion, demotion, dropping), dynamic filter generation tracking, filter ordering, and integration lifecycle tests.
- An expected-row-count update in `explain_analyze.rs` (`output_rows=8` -> `output_rows=5`), because the adaptive system now places some previously row-level filters post-scan, causing slight row count differences in EXPLAIN ANALYZE output.

Are these changes tested?
Yes:
- All existing `pushdown_filters` and filter pushdown SLT tests pass (with `filter_pushdown_min_bytes_per_sec = 0.0` to force all filters to row-level for deterministic behavior).
- New unit tests in `selectivity.rs` (~450 lines of tests) covering the `SelectivityStats` calculator, the `TrackerConfig` builder, state machine transitions (initial placement, promotion, demotion, dropping, reset on generation change), filter ordering, and full promotion/demotion lifecycle integration tests.
- Plan snapshots verify the `Optional(...)` wrapper on dynamic filters.
- SLT files `dynamic_filter_pushdown_config.slt`, `information_schema.slt`, `preserve_file_partitioning.slt`, `projection_pushdown.slt`, `push_down_filter.slt`, and `repartition_subset_satisfaction.slt` updated.
- `benchmarks/results.txt` shows TPC-H (13 faster, 6 slower, 3 unchanged), TPC-DS (33 faster, 31 slower, 35 unchanged, with a notable 24x improvement on Q64), and ClickBench (18 faster, 12 slower, 13 unchanged) results.

Are there any user-facing changes?
Yes:
- `reorder_filters` config option removed. This is a breaking change: users who set `SET datafusion.execution.parquet.reorder_filters = true` will get an error. The adaptive system replaces this functionality automatically.

- Three new config options added under `datafusion.execution.parquet`:
  - `filter_pushdown_min_bytes_per_sec` (default: 52428800)
  - `filter_collecting_byte_ratio_threshold` (default: 0.15)
  - `filter_confidence_z` (default: 2.0)

- Changed default behavior when `pushdown_filters = true`. Previously, all filters were unconditionally pushed into the Parquet reader. Now, the adaptive system decides per-filter based on byte-ratio thresholds and runtime effectiveness measurements. To restore the old behavior of pushing all filters unconditionally, set `filter_pushdown_min_bytes_per_sec = 0.0`.

- EXPLAIN plan output changes. Dynamic join filters now display as `Optional(DynamicFilter [...])` instead of `DynamicFilter [...]`, reflecting their new optional wrapper.

- Deprecated `predicate()` method signature changed. `ParquetSource::predicate()` now returns `Option<Arc<dyn PhysicalExpr>>` (owned) instead of `Option<&Arc<dyn PhysicalExpr>>` (a reference). This method was already deprecated in favor of `filter()`.
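For anyone migrating, the user-facing changes above boil down to one substitution:

```sql
-- The removed option now errors:
--   SET datafusion.execution.parquet.reorder_filters = true;  -- error
-- To restore the old "push every filter down" behavior instead:
SET datafusion.execution.parquet.filter_pushdown_min_bytes_per_sec = 0.0;
```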