Enhance: simplify x=x --> x IS NOT NULL OR NULL#15589
Conversation
|
This optimizer transformation doesn't look right for projected columns like However it makes sense if it is only applied on filter expressions. |
Thank you for review. I'll make the change |
18e4bf0 to
7a17897
Compare
|
@2010YOUY01 Instead of applying transformation on filter expression, I adjusted the rule to transform x=x into About the performance, I'm not 100% sure whether this rule worth the change, but I made this change because |
I recommend writing a simple benchmark by criterion or other tools to get more reliable results instead of measuring the CLI execution. But I agree that this should be more performant |
|
We could also use a CASE CASE x IS NOT NULL THEN true ELSE null END |
|
I expect this to make a large performance difference when x is a string type (as string comparisons are fairly expensive) Thank you for this PR @ding-young and the great reviews @2010YOUY01 and @berkaysynnada |
|
I wrote a simple criterion bench (https://github.com/ding-young/datafusion/blob/simplify-trivial-eq-bench/datafusion/core/benches/simplify_trivial_eq.rs) and the result is as follows. Plus, since Thanks for the review! Trying out the bench was also a great experience for me. |
Thank you for sharing the results. It doesn't effect much here but next time you can exclude the create_context() part out of bench_function :) |
datafusion/optimizer/src/simplify_expressions/expr_simplifier.rs
Outdated
Show resolved
Hide resolved
7a17897 to
ce1089b
Compare
kosiew
left a comment
There was a problem hiding this comment.
👍 👍 @ding-young
Left a few suggestions.
ce1089b to
44906b2
Compare
alamb
left a comment
There was a problem hiding this comment.
Thank you @ding-young @kosiew @2010YOUY01 and @berkaysynnada -- go team 🦾 !
| 06)----------Filter: __common_expr_3 = __common_expr_3 | ||
| 07)------------Projection: substr(CAST(md5(CAST(tmp_table.value AS Utf8)) AS Utf8), Int64(1), Int64(32)) AS __common_expr_3 | ||
| 08)--------------TableScan: tmp_table projection=[value] | ||
| 05)--------Projection: |
There was a problem hiding this comment.
There is a comment a few lines above that says:
TODO: this should probably be possible to completely remove the filter as always true?
We can probably update that too -- but we could do it in a follow on PR as well
x=x --> x IS NOT NULL OR NULL
- if x is not nullable, x=x -> true - else, x=x -> x is NOT NULL OR NULL
44906b2 to
115c8a2
Compare
| # The optimizer does not currently eliminate the filter; | ||
| # Instead, it's rewritten as `IS NULL OR NOT NULL` due to SQL null semantics |
There was a problem hiding this comment.
I updated the comment in slt as well :)
|
Thanks again @ding-young 🙏 |
- if x is not nullable, x=x -> true - else, x=x -> x is NOT NULL OR NULL

Which issue does this PR close?
Rationale for this change
What changes are included in this PR?
This pr adds a rule in ExprSimplifier to simplify x=x as follows.
Are these changes tested?
Yes, via unit test and slt
Are there any user-facing changes?