Move conditions from JOIN ON to WHERE#19685
Conversation
|
Could you please merge/rebase with upstream master? Because this:
is already fixed. |
tests/queries/0_stateless/01653_move_conditions_from_join_on_to_where.sql
Show resolved
Hide resolved
|
|
||
| SELECT '---------Q5----------'; | ||
| SELECT table1.a, table2.b FROM table1 JOIN table2 ON (table1.a = table2.a) AND (table1.b = 6) AND (table2.b > 20) AND (10 < 6); | ||
| EXPLAIN SYNTAX SELECT table1.a, table2.b FROM table1 JOIN table2 ON (table1.a = table2.a) AND (table1.b = 6) AND (table2.b > 20) AND (10 < 6); |
There was a problem hiding this comment.
I think we should also add tests with WHERE present in query
| ASTPtr asof_right_key{}; | ||
| ASTPtr new_on_expression{}; | ||
| ASTPtr new_where_conditions{}; | ||
| bool move_to_where{false}; |
There was a problem hiding this comment.
Can we use new_where_conditions != nullptr instead of move_to_where?
src/Interpreters/TreeRewriter.cpp
Outdated
| data.asofToJoinKeys(); | ||
| else if (data.move_to_where) | ||
| { | ||
| table_join.on_expression = (data.new_on_expression)->clone(); |
src/Interpreters/TreeRewriter.cpp
Outdated
| { | ||
| if (select_query->where()) | ||
| select_query->setExpression(ASTSelectQuery::Expression::WHERE, | ||
| makeASTFunction("and", new_where_conditions->clone(), select_query->where()->clone())); |
| ASTPtr left = func.arguments->children.at(0); | ||
| ASTPtr right = func.arguments->children.at(1); | ||
| auto table_numbers = getTableNumbers(left, right, data); | ||
| if (table_numbers.first != 0) |
There was a problem hiding this comment.
Maybe it is better to make this case more explicit. E.g. return pair with table_numbers.first == table_numbers.second and check this condition or add some special return value for it.
As I understand 0, 0 can be returned from getTableNumbers (but will be rejected later). Can we ensure that we don't we mix some cases? I'm trying to find counterexample, but it works correctly, but handling that case explicitly may be more readable at least.
There was a problem hiding this comment.
Let's only change check to table_numbers.first == table_numbers.second without passing pointer to bool
There was a problem hiding this comment.
ok, I have change it in the new commit.
And I think getTableForIdentifiers() should not return 0, because if there is a table that is not related to the joinedTables, it will be checked in TreeRewrite.cpp:appendSelect function before collectJoinOnKeysVisitors. So I add an assert(table_number != 0) in getTableForIdentifiers().
There was a problem hiding this comment.
Assert shouldn't check this condition because it means server fail (it actually fails server in debug), but exception only show error to user. So, if it's checked in appendSelect and throws an exception it is enough.
Example of such situation found by fuzzer, see AST fuzzer (debug) — Assertion table_number != 0' failed` https://clickhouse-test-reports.s3.yandex.net/19685/e1359b01a1cc34c7a6e5fead6568b6ecae5ba0a9/fuzzer_debug/report.html#fail1
There was a problem hiding this comment.
fuzzer is an amazing tool. How can I run fuzzer tests on my server?
And I found the sql SELECT * FROM table1 INNER JOIN table2 ON (table1.a = table2.a) AND (table1.a AND (1048575 < 65536) AND 2 AND (1048575 < -1)) AND (table2.b < toUInt32(table1, 10)) WHERE table1.b < 1 that table_number != 0' failed.
But I try it, it make another error:
dell123 :) SELECT * FROM table1 INNER JOIN table2 ON (table1.a = table2.a) AND (table1.a AND (1048575 < 65536) AND 2 AND (1048575 < -1)) AND (table2.b < toUInt32(table1, 10)) WHERE table1.b <1
SELECT *
FROM table1
INNER JOIN table2 ON (table1.a = table2.a) AND (table1.a AND (1048575 < 65536) AND 2 AND (1048575 < -1)) AND (table2.b < toUInt32(table1, 10))
WHERE table1.b < 1
Query id: e8d1f0c5-01de-4ae5-81e0-c539af9d34cf
0 rows in set. Elapsed: 0.019 sec.
Received exception from server (version 21.3.1):
Code: 48. DB::Exception: Received from localhost:9000. DB::Exception: JOIN ON inequalities are not supported. Unexpected 'table2.b < toUInt32(table1, 10)': While processing table2.b < toUInt32(table1, 10).
dell123 :)
There was a problem hiding this comment.
it make another error
Is it release build?
assert works only in debug build. Also we have similar behavior with LOGICAL_ERROR that fail server in debug, but in release only throws error to user.
How can I run fuzzer tests on my server?
Queries from particular run you can get from logs and run locally as you do it now.
Generating fuzzed queries you can run locally for any base query with passing option --query-fuzzer-runs=n to clickhouse-client
ClickHouse/programs/client/Client.cpp
Line 2482 in f7dbdc6
There was a problem hiding this comment.
@vdimir yes, it is release mode. and I build a debug mode, assert works.
Thank you for your guidance^^
| EXPLAIN SYNTAX SELECT * FROM table1 JOIN table2 ON (table1.a = table2.a) AND (table2.b < toUInt32(40)) where table1.b < 1; | ||
| SELECT * FROM table1 JOIN table2 ON (table1.a = table2.a) AND (table2.b < toUInt32(40)) where table1.b > 10; | ||
|
|
||
| DROP TABLE table1; |
There was a problem hiding this comment.
We need a test to check that this optimization is not applied for outer joins, because ON and WHERE conditions have different meaning there.
There was a problem hiding this comment.
Something like this:
postgres=# select * from (select 1 x) l full join (select 2 x) r on l.x = r.x and l.x = 1;
x | x
---+---
1 |
| 2
(2 rows)
postgres=# select * from (select 1 x) l full join (select 2 x) r on l.x = r.x where l.x = 1;
x | x
---+---
1 |
(1 row)
There was a problem hiding this comment.
@akuzm
Oh, there is a problem.
In this pr the result of executing :
dell123 :) select * from (select 1 x) l full join (select 2 x) r on l.x = r.x and l.x = 1;
┌─x─┬─r.x─┐
│ 1 │ 0 │
└───┴─────┘
dell123 :) explain syntax select * from (select 1 x) l full join (select 2 x) r on l.x = r.x and l.x = 1;
┌─explain──────────────┐
│ SELECT │
│ x, │
│ r.x │
│ FROM │
│ ( │
│ SELECT 1 AS x │
│ ) AS l │
│ ALL FULL OUTER JOIN │
│ ( │
│ SELECT 2 AS x │
│ ) AS r ON x = r.x │
│ WHERE x = 1 │
└──────────────────────┘
So the state is wrong. In the previous CH version, it just throw error:
dell126 :) select * from (select 1 x) l full join (select 2 x) r on l.x = r.x and l.x = 1;
Received exception from server (version 21.1.1):
Code: 403. DB::Exception: Received from localhost:9002. DB::Exception: Not equi-join ON expression: x = 1. No columns in one of equality side.: While processing x = 1.
So I think I should modify the implementation to check whether this is an inner join, if it is not an inner join , I will do nothing. otherwise I will try to optimize it, right?
There was a problem hiding this comment.
So I think I should modify the implementation to check whether this is an inner join, if it is not an inner join , I will do nothing. otherwise I will try to optimize it, right?
Yes. Also I suppose we can move conditions for left table in RIGHT join and conditions for right table for LEFT join. But it's up to you, we can do it later.
There was a problem hiding this comment.
ok, I will try it.
There was a problem hiding this comment.
@vdimir it seems that only inner join can move condition in JOIN ON to WHERE.
Because WHERE is executed after join and will remove LEFT/RIGHT join's unmatched result.
Am I right?
There was a problem hiding this comment.
Yes we don't want to remove unmatched columns. But for LEFT join there will not be unmatched rows from right table, so conditions for right table is safe. But it's just an idea, maybe there are better solutions and now we can only do for INNER, because it's clear.
There was a problem hiding this comment.
select * from t1 left join t2 on t1.a = t2.a and t2.b = 20;
should optimized to
select * from t1 left join (select * form t2 where t2.b = 20) t2 on t1.a = t2.a
but not to
select * from t1 left join t2 on t1.a = t2.a where t2.b = 20;
Right??? @vdimir
There was a problem hiding this comment.
Yes, I missed up this when wrote previous comment. Not sure how easy difficult to do it in this way.
|
It's slightly related to #19856, not I'm trying to realize can it be solved independently or we should make changes common for both cases. |
So In which situations should cross join optimized to inner join? And what kind of changes should we do? Any details? |
|
@hexiaoting I've thought about how this issue can be related to your changes. I haven't found reasons to mix this features, I think we can merge your code now. I've only performed minor refactoring. Could you, please, look at my last commit and check that I haven't omit some important part? |
|
|
I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
move Conditions that are not related to JOIN to where clause. #18720
Detailed description / Documentation draft:
Identify expression related to one table or non-table , and move them to where clause