Feat: Support arrays_overlap function#1312
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #1312 +/- ##
=============================================
- Coverage 56.12% 39.09% -17.04%
- Complexity 976 2065 +1089
=============================================
Files 119 260 +141
Lines 11743 60269 +48526
Branches 2251 12834 +10583
=============================================
+ Hits 6591 23562 +16971
- Misses 4012 32226 +28214
- Partials 1140 4481 +3341 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
| expr.children(1), | ||
| inputs, | ||
| (builder, binaryExpr) => builder.setArrayAppend(binaryExpr)) | ||
| case ArraysOverlap(leftArrayExpr, rightArrayExpr) => |
There was a problem hiding this comment.
Do we support this for all input data types?
There was a problem hiding this comment.
Because the PR only contains basic tests, could you add a check to enable this expression only if CometConf.COMET_CAST_ALLOW_INCOMPATIBLE is enabled? We can remove this check in a future PR that adds comprehensive tests and demonstrates that we have Spark-compatible behavior for all supported data types.`
There was a problem hiding this comment.
Yes, makes sense and COMET_CAST_ALLOW_INCOMPATIBLE check has been added.
| } | ||
| } | ||
|
|
||
| test("arrays_overlap") { |
There was a problem hiding this comment.
Could you add tests where one or both input expressions evaluate to null on some rows? This can be achieved with a CASE expression as in some of the other array tests
There was a problem hiding this comment.
+1 for more coverage
There was a problem hiding this comment.
Case Expression based test case has been added.
|
@erenavsarogullari could you rebase this PR when you have time? We can go ahead and merge this and then improve tests as part of #1269 |
spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala
Outdated
Show resolved
Hide resolved
32b78fd to
86d8f57
Compare
86d8f57 to
96f776e
Compare
Thanks @andygrove and @kazuyukitanimura for the review. PR has been rebased and addressed the comments. FYI, Currently, DataFusion does not support |
andygrove
left a comment
There was a problem hiding this comment.
Thanks @erenavsarogullari. LGTM.
| checkSparkAnswerAndOperator(sql( | ||
| "SELECT arrays_overlap(array('a', null), array('b', null)) from t1 where _1 is not null")) | ||
| checkSparkAnswerAndOperator(spark.sql( | ||
| "SELECT arrays_overlap((CASE WHEN _2 =_3 THEN array(_6, _7) END), array(_6, _7)) FROM t1")); |
There was a problem hiding this comment.
I think _2 = _3 may always be true. The problem with makeParquetFileAllTypes is that every for each row, each column contains the same integer value cast to the column's type, so it is not ideal for tests like this. We can improve as part of #1269
Which issue does this PR close?
Related to Epic: #1042
arrays_overlap:select arrays_overlap(array('hello', '-', 'world'), array('hello'))=>trueDataFusion' s array_has_any has same behavior with Spark 's arrays_overlap function
Spark:https://docs.databricks.com/en/sql/language-manual/functions/arrays_overlap.htmlDataFusion:https://datafusion.apache.org/user-guide/sql/scalar_functions.html#array-has-anyRationale for this change
Defined under Epic: #1042
What changes are included in this PR?
planner.rs:Maps Spark 'sarrays_overlapfunction to DataFusionarray_has_anyphysical expression from Spark physical expression with return type: DataType::Boolean,expr.proto:arrays_overlap message has been added,QueryPlanSerde.scala:arrays_overlap pattern matching case has been added,CometExpressionSuite.scala:A new UT has been added for arrays_overlap function.How are these changes tested?
A new UT has been added.