Enforce JOIN plan to require condition#15334
Conversation
| } | ||
|
|
||
| #[test] | ||
| fn limit_should_push_down_join_without_condition() -> Result<()> { |
There was a problem hiding this comment.
This case has never occurred now.
| 08)--------------LeftAnti Join: customer.c_custkey = __correlated_sq_1.o_custkey | ||
| 09)----------------Filter: substr(customer.c_phone, Int64(1), Int64(2)) IN ([Utf8View("13"), Utf8View("31"), Utf8View("23"), Utf8View("29"), Utf8View("30"), Utf8View("18"), Utf8View("17")]) | ||
| 10)------------------TableScan: customer projection=[c_custkey, c_phone, c_acctbal], partial_filters=[substr(customer.c_phone, Int64(1), Int64(2)) IN ([Utf8View("13"), Utf8View("31"), Utf8View("23"), Utf8View("29"), Utf8View("30"), Utf8View("18"), Utf8View("17")])] | ||
| 10)------------------TableScan: customer projection=[c_custkey, c_phone, c_acctbal], partial_filters=[substr(customer.c_phone, Int64(1), Int64(2)) IN ([Utf8View("13"), Utf8View("31"), Utf8View("23"), Utf8View("29"), Utf8View("30"), Utf8View("18"), Utf8View("17")]), Boolean(true)] |
There was a problem hiding this comment.
I benchmarked to ensure the plan change won't impact the performance.
--------------------
Benchmark tpch_sf1.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┓
┃ Query ┃ main ┃ fix_join-check ┃ Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━┩
│ QQuery 1 │ 68.97ms │ 70.14ms │ no change │
│ QQuery 2 │ 18.72ms │ 18.55ms │ no change │
│ QQuery 3 │ 30.47ms │ 29.46ms │ no change │
│ QQuery 4 │ 21.45ms │ 21.77ms │ no change │
│ QQuery 5 │ 43.53ms │ 41.56ms │ no change │
│ QQuery 6 │ 14.39ms │ 14.88ms │ no change │
│ QQuery 7 │ 55.03ms │ 55.30ms │ no change │
│ QQuery 8 │ 41.24ms │ 39.45ms │ no change │
│ QQuery 9 │ 50.84ms │ 53.19ms │ no change │
│ QQuery 10 │ 44.85ms │ 44.83ms │ no change │
│ QQuery 11 │ 13.42ms │ 13.22ms │ no change │
│ QQuery 12 │ 28.49ms │ 28.20ms │ no change │
│ QQuery 13 │ 29.56ms │ 29.38ms │ no change │
│ QQuery 14 │ 23.03ms │ 24.40ms │ 1.06x slower │
│ QQuery 15 │ 33.59ms │ 34.98ms │ no change │
│ QQuery 16 │ 13.07ms │ 13.08ms │ no change │
│ QQuery 17 │ 58.00ms │ 58.67ms │ no change │
│ QQuery 18 │ 76.00ms │ 75.06ms │ no change │
│ QQuery 19 │ 39.99ms │ 38.40ms │ no change │
│ QQuery 20 │ 29.04ms │ 29.79ms │ no change │
│ QQuery 21 │ 62.14ms │ 64.14ms │ no change │
│ QQuery 22 │ 13.82ms │ 14.01ms │ no change │
└──────────────┴─────────┴────────────────┴──────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━┓
┃ Benchmark Summary ┃ ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━┩
│ Total Time (main) │ 809.65ms │
│ Total Time (fix_join-check) │ 812.45ms │
│ Average Time (main) │ 36.80ms │
│ Average Time (fix_join-check) │ 36.93ms │
│ Queries Faster │ 0 │
│ Queries Slower │ 1 │
│ Queries with No Change │ 21 │
└───────────────────────────────┴──────────┘
|
|
||
| query IIII | ||
| select * from testSubQueryLimit as t1 join (select * from testSubQueryLimit limit 1) limit 10; | ||
| select * from testSubQueryLimit as t1, (select * from testSubQueryLimit limit 1) limit 10; |
There was a problem hiding this comment.
why this change is needed?
|
|
||
| # join condition is required | ||
| query error join condition should not be empty | ||
| SELECT * FROM t1 FULL JOIN t2 |
There was a problem hiding this comment.
Please include also a cross join without filters like
SELECT * FROM t1 CROSS JOIN t2
1f0a00e to
7765900
Compare
* add check for missing join condition * modify the tests * handle the cross join case * remove invald sql case * fix sqllogictests * fix for cross join case * revert the change for limit.slt * add cross join test
|
@goldmedal I don't understand this change. which actually corresponds to the following: The generated plan has a Left Join with a |
The I think the result is expected. |
* add check for missing join condition * modify the tests * handle the cross join case * remove invald sql case * fix sqllogictests * fix for cross join case * revert the change for limit.slt * add cross join test
Which issue does this PR close?
JOINshould requireONcondition #13486Rationale for this change
When working on unparsing the plan optimized by
ScalarSubqueryToJoin, I noticed it could generate a LEFT JOIN without the join condition.The unparsing result will look like
Most databases don't allow the join without the condition besides the cross join.
DuckDB
Postgres
DataFusion
I prefer to align this behavior with others. It makes the logical plan created by DataFusion reliable.
#13486 also mentioned that
joinwithout condition isn't a valid SQL.Because the cross-join has the same IR as the inner join without the condition, #13486 should be prevented from the SQL layer. Maybe we can consider to reopen apache/datafusion-sqlparser-rs#1552
What changes are included in this PR?
LEFT JOINplan with the conditionTrue.Are these changes tested?
yes
Are there any user-facing changes?
Besides
INNER JOIN, the join syntax without condition is not allowed anymore.