chore: Use DataFusion's optimizer in the join scan #4009
Merged
pg_search 'logs' Query Performance
| Benchmark suite | Current: a7dcac7 | Previous: c0c03ea | Ratio |
|---|---|---|---|
| bucket-expr-filter | 7978.3885 median ms | 7988.855 median ms | 1.00 |
| bucket-expr-filter - alternative 1 | 7904.639499999999 median ms | 7895.584 median ms | 1.00 |
| bucket-numeric-filter | 2083.355 median ms | 2138.3869999999997 median ms | 0.97 |
| bucket-numeric-filter - alternative 1 | 275.3455 median ms | 282.9895 median ms | 0.97 |
| bucket-numeric-filter - alternative 2 | 104.254 median ms | 93.3485 median ms | 1.12 |
| bucket-numeric-filter - alternative 3 | 275.9875 median ms | 286.312 median ms | 0.96 |
| bucket-numeric-filter - alternative 4 | 413.4015 median ms | 421.4335 median ms | 0.98 |
| bucket-numeric-nofilter | 2083.3 median ms | 2126.266 median ms | 0.98 |
| bucket-numeric-nofilter - alternative 1 | 274.188 median ms | 284.9495 median ms | 0.96 |
| bucket-numeric-nofilter - alternative 2 | 103.832 median ms | 91.10050000000001 median ms | 1.14 |
| bucket-numeric-nofilter - alternative 3 | 276.20500000000004 median ms | 285.5245 median ms | 0.97 |
| bucket-numeric-nofilter - alternative 4 | 414.61800000000005 median ms | 501.655 median ms | 0.83 |
| bucket-string-filter | 3220.92 median ms | 3145.9365 median ms | 1.02 |
| bucket-string-filter - alternative 1 | 234.222 median ms | 251.256 median ms | 0.93 |
| bucket-string-filter - alternative 2 | 67.6215 median ms | 63.38 median ms | 1.07 |
| bucket-string-filter - alternative 3 | 235.495 median ms | 256.0325 median ms | 0.92 |
| bucket-string-filter - alternative 4 | 363.77099999999996 median ms | 372.4565 median ms | 0.98 |
| bucket-string-nofilter | 3217.401 median ms | 3152.049 median ms | 1.02 |
| bucket-string-nofilter - alternative 1 | 233.7165 median ms | 253.49200000000002 median ms | 0.92 |
| bucket-string-nofilter - alternative 2 | 67.3075 median ms | 62.6375 median ms | 1.07 |
| bucket-string-nofilter - alternative 3 | 238.3095 median ms | 256.286 median ms | 0.93 |
| bucket-string-nofilter - alternative 4 | 362.72299999999996 median ms | 372.3595 median ms | 0.97 |
| cardinality | 15036.7125 median ms | 15078.462 median ms | 1.00 |
| cardinality - alternative 1 | 1985.4895000000001 median ms | 2040.4175 median ms | 0.97 |
| cardinality - alternative 2 | 272.46299999999997 median ms | 284.881 median ms | 0.96 |
| cardinality - alternative 3 | 103.939 median ms | 92.691 median ms | 1.12 |
| cardinality - alternative 4 | 274.2915 median ms | 286.605 median ms | 0.96 |
| cardinality - alternative 5 | 276.2255 median ms | 286.668 median ms | 0.96 |
| count-filter | 211.2425 median ms | 214.6745 median ms | 0.98 |
| count-filter - alternative 1 | 125.467 median ms | 126.6285 median ms | 0.99 |
| count-filter - alternative 2 | 88.71100000000001 median ms | 86.721 median ms | 1.02 |
| count-filter - alternative 3 | 125.9745 median ms | 128.86950000000002 median ms | 0.98 |
| count-filter - alternative 4 | 125.016 median ms | 126.41149999999999 median ms | 0.99 |
| count-nofilter | 746.6859999999999 median ms | 712.9960000000001 median ms | 1.05 |
| count-nofilter - alternative 1 | 280.2415 median ms | 300.068 median ms | 0.93 |
| count-nofilter - alternative 2 | 155.8965 median ms | 151.713 median ms | 1.03 |
| count-nofilter - alternative 3 | 280.07849999999996 median ms | 302.578 median ms | 0.93 |
| count-nofilter - alternative 4 | 280.8695 median ms | 302.63300000000004 median ms | 0.93 |
| filtered-highcard | 5.867 median ms | 5.919499999999999 median ms | 0.99 |
| filtered-lowcard | 5.8095 median ms | 5.777 median ms | 1.01 |
| filtered_json-range | 7.4495000000000005 median ms | 7.3085 median ms | 1.02 |
| filtered_json | 5.8405000000000005 median ms | 5.955500000000001 median ms | 0.98 |
| highlighting | 8.508 median ms | 8.3375 median ms | 1.02 |
| regex-and-heap | 6009.5064999999995 median ms | 6083.242 median ms | 0.99 |
| top_n-agg-avg | 426.5395 median ms | 431.987 median ms | 0.99 |
| top_n-agg-bucket-string | 388.071 median ms | 392.05949999999996 median ms | 0.99 |
| top_n-agg-count | 431.08349999999996 median ms | 433.543 median ms | 0.99 |
| top_n-compound | 76.6485 median ms | 72.4365 median ms | 1.06 |
| top_n-numeric-highcard | 59.083 median ms | 55.4505 median ms | 1.07 |
| top_n-numeric-lowcard | 43.952 median ms | 40.815 median ms | 1.08 |
| top_n-score-asc | 93.7445 median ms | 91.1795 median ms | 1.03 |
| top_n-score-desc | 90.79599999999999 median ms | 79.4495 median ms | 1.14 |
| top_n-string | 44.8765 median ms | 43.634 median ms | 1.03 |
This comment was automatically generated by workflow using github-action-benchmark.
pg_search 'docs' Query Performance
| Benchmark suite | Current: a7dcac7 | Previous: c0c03ea | Ratio |
|---|---|---|---|
| aggregate_sort | 13753.4915 median ms | 13782.232499999998 median ms | 1.00 |
| aggregate_sort - alternative 1 | 13774.423999999999 median ms | 13900.938999999998 median ms | 0.99 |
| disjunctive_search | 562.0955 median ms | 559.8895 median ms | 1.00 |
| disjunctive_search - alternative 1 | 565.6645 median ms | 594.2165 median ms | 0.95 |
| distinct_parent_sort | 3421.5685000000003 median ms | 3622.2825000000003 median ms | 0.94 |
| distinct_parent_sort - alternative 1 | 3457.191 median ms | 3614.009 median ms | 0.96 |
| foreign_filter_local_sort | 147.711 median ms | 142.542 median ms | 1.04 |
| foreign_filter_local_sort - alternative 1 | 4591.700999999999 median ms | 7086.736000000001 median ms | 0.65 |
| hierarchical_content-no-scores-large | 1194.5214999999998 median ms | 1179.8355 median ms | 1.01 |
| hierarchical_content-no-scores-small | 647.4580000000001 median ms | 660.4815 median ms | 0.98 |
| hierarchical_content-scores-large | 1470.1855 median ms | 1458.6545 median ms | 1.01 |
| hierarchical_content-scores-large - alternative 1 | 713.669 median ms | 722.751 median ms | 0.99 |
| hierarchical_content-scores-small | 683.6655000000001 median ms | 689.8810000000001 median ms | 0.99 |
| paging-string-max | 20.837 median ms | 19.355 median ms | 1.08 |
| paging-string-median | 42.73950000000001 median ms | 42.199 median ms | 1.01 |
| paging-string-min | 53.372 median ms | 53.232 median ms | 1.00 |
| permissioned_search | 704.25 median ms | 719.5085 median ms | 0.98 |
| permissioned_search - alternative 1 | 27483.391 median ms | 1560.5149999999999 median ms | 17.61 |
| semi_join_filter | 593.473 median ms | 588.9515 median ms | 1.01 |
| semi_join_filter - alternative 1 | 50969.813500000004 median ms | 31780.788999999997 median ms | 1.60 |
b0043b7 to b7ae6b8
b7ae6b8 to 72d412b
⚠️ Performance Alert ⚠️
Possible performance regression was detected for benchmark 'pg_search 'docs' Query Performance'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.10.
| Benchmark suite | Current: a7dcac7 | Previous: c0c03ea | Ratio |
|---|---|---|---|
| permissioned_search - alternative 1 | 27483.391 median ms | 1560.5149999999999 median ms | 17.61 |
| semi_join_filter - alternative 1 | 50969.813500000004 median ms | 31780.788999999997 median ms | 1.60 |
CC: @stuhood
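The Ratio column in these tables is consistent with dividing the current median by the previous median, and the alert fires when that ratio exceeds the stated threshold of 1.10. A minimal sketch of that arithmetic follows; the function names `ratio` and `is_regression` are hypothetical and are not taken from github-action-benchmark, and the strict `>` comparison is an assumption based on the word "exceeding".

```rust
/// Ratio as it appears in the tables: current median divided by previous median.
/// Values above 1.0 mean the current commit is slower.
/// (Hypothetical helper, not part of github-action-benchmark.)
fn ratio(current_ms: f64, previous_ms: f64) -> f64 {
    current_ms / previous_ms
}

/// Returns true when the ratio exceeds the alert threshold.
/// The strict comparison is an assumption, not confirmed by the source.
fn is_regression(current_ms: f64, previous_ms: f64, threshold: f64) -> bool {
    ratio(current_ms, previous_ms) > threshold
}
```

For example, `ratio(27483.391, 1560.5149999999999)` rounds to the 17.61 reported for `permissioned_search - alternative 1`, while `foreign_filter_local_sort` (147.711 ms against 142.542 ms) stays under the 1.10 threshold.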
72d412b to a7dcac7
… sorts and limits.
a7dcac7 to 9ba22cd
stuhood commented Jan 29, 2026
stuhood commented Jan 29, 2026
mdashti reviewed Jan 30, 2026
mdashti approved these changes Jan 30, 2026
rebasedming approved these changes Jan 30, 2026
stuhood commented Feb 2, 2026
stuhood added a commit that referenced this pull request Feb 3, 2026
…4039)

## What

Refactored the join scan to support pushing down nested joins (e.g., `(A JOIN B) JOIN C`).

## Why

To enable multi-table joins to be executed in a columnar fashion by DataFusion, and to avoid materializing tuples until after a LIMIT can be safely applied (for TopN).

## How

* Replaced (mostly) explicit use of binary outer/inner sides with collections of `JoinSource`s, to support arbitrary numbers of joined relations.
* Extracted fast field pullup into a new `customscan/pullup.rs` module.
* Applied leftover review feedback from #4009

## Tests

Expanded tests to cover multi-table scenarios, and added "alternatives" which use the join scan to our existing three-table benchmarks.
## Ticket(s) Closed

joinscan#3987.

## What

This change migrates the `joinscan` from manual physical plan construction to using DataFusion logical plans, which are then optimized into physical plans. To do so, it adds a `TableProvider` implementation to allow DataFusion to natively scan and filter using the previously extracted scan over fast fields.
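To make the filter side of this concrete, here is a minimal, standard-library-only sketch of how a table source might tell an optimizer which filters it can evaluate during the scan. Every type here (`Filter`, `Pushdown`, `FastFieldTable`) is hypothetical and stands in for the real thing: none of it is DataFusion's API or this PR's code, and the assumption that only equality on a fast field is pushable is purely illustrative.

```rust
// Hypothetical model of filter pushdown; these are NOT DataFusion's types.

/// A simplified predicate.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum Filter {
    /// `column = value` on an integer column.
    Eq { column: String, value: i64 },
    /// Any predicate the scan does not know how to evaluate.
    Opaque(String),
}

/// Per-filter answer returned to the optimizer.
#[derive(Debug, Clone, PartialEq)]
enum Pushdown {
    /// The scan applies the filter fully; no re-check is needed afterwards.
    Exact,
    /// The scan ignores the filter; a later operator must apply it.
    Unsupported,
}

/// A table whose scan can only filter on columns stored as fast fields.
struct FastFieldTable {
    fast_fields: Vec<String>,
}

impl FastFieldTable {
    /// Classifies each filter, in order, as pushable or not.
    fn supports_filters_pushdown(&self, filters: &[Filter]) -> Vec<Pushdown> {
        filters
            .iter()
            .map(|filter| match filter {
                Filter::Eq { column, .. } if self.fast_fields.contains(column) => Pushdown::Exact,
                _ => Pushdown::Unsupported,
            })
            .collect()
    }
}
```

In this sketch, a table with `severity` as its only fast field would report `Exact` for an equality filter on `severity` and `Unsupported` for everything else, leaving those remaining filters to be applied above the scan.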
It additionally rearranges the code a bit to ensure separation between planning and execution: execution state and DataFusion logical planning are isolated to `scan_state.rs` (over time we might frontload and serialize more of DataFusion's logical plan during planning time, but for now none of that information is needed in the `CustomPath` that we produce). `JoinSideInfo` was lifted up into `pg_search/src/scan` as `ScanInfo`.

## Why

In followup changes, we will give DataFusion's optimizer a lot more to work with:

supports_filters_pushdown), and dynamic filtering

## Tests

Existing tests pass, and were lightly expanded to cover some new edge cases.

One join benchmark is marginally faster, one is massively slower, others are unaffected. This is expected for now: more performance work is on the way.