Combine multiple IN lists in ExprSimplifier#8949
Conversation
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
alamb
left a comment
There was a problem hiding this comment.
Thank you for this contribution @jayzhan211 -- I think this one is pretty close. I think the tests need to be moved with the other simplification tests and a few more cases added but all in all 🦾 -- really nice
| // specific language governing permissions and limitations | ||
| // under the License. | ||
|
|
||
| //! This module implements a rule that simplifies expressions that is guaranteed to be true or false at planning time |
There was a problem hiding this comment.
| //! This module implements a rule that simplifies expressions that is guaranteed to be true or false at planning time | |
| //! This module implements a rule that simplifies the values for `InList`s |
| /// Simplify expressions that is guaranteed to be true or false to a literal boolean expression | ||
| /// | ||
| /// Rules: | ||
| /// If both expressions are positive or negative, then we can apply intersection of both inlist expressions |
There was a problem hiding this comment.
I think the
| /// If both expressions are positive or negative, then we can apply intersection of both inlist expressions | |
| /// If both expressions are `IN` or `NOT IN`, then we can apply intersection of both lists |
| /// | ||
| /// Rules: | ||
| /// If both expressions are positive or negative, then we can apply intersection of both inlist expressions | ||
| /// 1. `a in (1,2,3) AND a in (4,5) -> a in (1,2,3,4,5)` |
There was a problem hiding this comment.
Maybe we can update this example to show the lists having overlap (like a in (1,2,3) AND a in (2,3,4) --> a in (1,2,3,4)
| /// 1. `a in (1,2,3) AND a in (4,5) -> a in (1,2,3,4,5)` | ||
| /// 2. `a not int (1,2,3) AND a not in (4,5,6) -> a not in (1,2,3,4,5,6)` | ||
| /// If one of the expressions is negated, then we apply exception on positive expression | ||
| /// 1. `a in (1,2,3) AND a not in (1,2,3,4,5) -> a in ()` |
There was a problem hiding this comment.
Shouldn't this be rewritten to false?
| /// 1. `a in (1,2,3) AND a not in (1,2,3,4,5) -> a in ()` | |
| /// 1. `a in (1,2,3) AND a not in (1,2,3,4,5) -> false` |
There was a problem hiding this comment.
Yes, I thought a in () is more understandable in this context
| } | ||
| } | ||
|
|
||
| fn inlist_intersection(l1: &InList, l2: &InList, negated: bool) -> Result<Expr> { |
There was a problem hiding this comment.
I think you could avoid a clone here potentially because the mutate function gets passed an owed expr, like:
fn inlist_intersection(l1: InList, l2: InList, negated: bool) -> Result<Expr> {
(we could do this as a follow on PR too)
There was a problem hiding this comment.
But we still need expr or Binaray Expr if failed to match if let statement. I did not find out how to get the cloned value if we consume the value at first hand.
There was a problem hiding this comment.
I spent some time fooling with it this afternoon and here is what I came up with #8971
| ----------RepartitionExec: partitioning=Hash([ps_partkey@0], 4), input_partitions=1 | ||
| ------------MemoryExec: partitions=1, partition_sizes=[1] | ||
|
|
||
| # Inlist simplification |
There was a problem hiding this comment.
I recommend one or two tests in slt, and then adding the other tests in https://github.com/apache/arrow-datafusion/blob/795c71f3e8249847593a58dafe578ffcbcd3e012/datafusion/optimizer/src/simplify_expressions/expr_simplifier.rs#L1302-L1307 because:
- It would keep the tests closer to the other simplification tests
- It makes it easier to validate what code is causing the rewrite
| create table t(x int) as values (1), (2), (3); | ||
|
|
||
| query TT | ||
| explain select x from t where x IN (1,2,3) AND x IN (4,5); |
There was a problem hiding this comment.
i have some additional suggested tests:
- Negative tests like
x IN (1,2,3) OR x IN (4,5)(maybe that could actually be combined intoX IN (1,2,3,4,5)🤔 - Tests with more than 2 clauses:
x IN (1,2,3) AND x IN (3,4) AND x IN (2,3)) - Tests with more than 2 clauses that are not all IN lists but the inlists could be combined
x IN (1,2,3) AND y = 2 AND x IN (2,3))
Maybe also some tests that could in theory be simplified like
`x IN (1,2,3) AND x = 2 AND x IN (2,3))` --> x=2
(we don't have to actually implement that simplification in this PR, but it would be good to document the potential further improvement in tests)
IN lists IN lists in ExprSimplifier
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
|
In the following on PR, I think we can move inlist related simplifier logic to
In this way, when the logic grows more complex, we can easily find out related conversion about specific Expr |
alamb
left a comment
There was a problem hiding this comment.
This looks great -- thank you @jayzhan211 very nice 👏
Great idea -- filed #8970 to track this |
Which issue does this PR close?
Closes #8687.
Rationale for this change
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?