feat: Implement ANSI support for UnaryMinus #471
Conversation
andygrove
left a comment
This is looking good @vaibhawvipul. I think all that is needed now are unit tests perhaps in CometExpressionSuite to show that this is all working as intended.
I also left some minor feedback.
@andygrove we have a test case now which shows that we have parity.
Resolved review threads (now outdated) on:
- spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala
- spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala
I have made all the requested changes; it looks ready for CI to me. Please let me know. cc @kazuyukitanimura @parthchandra
kazuyukitanimura
left a comment
Sorry for asking so many things, but here are a few more comments.
No problem, it greatly improved my contribution. I learned a lot and I am happy. This exercise will ensure that my next PR won't have this much back-and-forth :)

@kazuyukitanimura this is ready for review.
```scala
withTable("t_interval") {
  spark.sql("CREATE TABLE t_interval(a STRING) USING PARQUET")
  spark.sql("INSERT INTO t_interval VALUES ('INTERVAL 10000000000 YEAR')")
  withAnsiMode(enabled = true) {
    spark
      .sql("SELECT CAST(a AS INTERVAL) AS a FROM t_interval")
      .createOrReplaceTempView("t_interval_casted")
    checkOverflow("SELECT a, -a FROM t_interval_casted", "interval")
  }
}
```
It looks like this does not hit native code, since CAST(a AS INTERVAL) is not supported yet. So it will fall back to Spark, and checkOverflow ends up comparing Spark results on both sides.
Perhaps there is no good way of creating an interval right now...
```scala
withTable("t") {
  sql("create table t(a int) using parquet")
  sql("insert into t values (-2147483648)")
  withAnsiMode(enabled = true) {
    checkOverflow("select a, -a from t", "integer")
  }
}
```
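The integer test above exercises the one value where unary minus overflows: in two's complement, `Int.MinValue` (-2147483648) has no positive counterpart. A minimal Rust sketch of that underlying behavior (an illustration, not Comet's actual code):

```rust
fn main() {
    let v: i32 = i32::MIN;

    // checked_neg returns None on overflow instead of wrapping,
    // which is the behavior ANSI mode needs to detect.
    assert_eq!(v.checked_neg(), None);

    // wrapping_neg silently wraps back to i32::MIN, matching
    // Spark's legacy (non-ANSI) semantics.
    assert_eq!(v.wrapping_neg(), i32::MIN);

    println!("ok");
}
```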
```scala
withTable("t_float") {
  sql("create table t_float(a float) using parquet")
  sql("insert into t_float values (3.4128235E38)")
  withAnsiMode(enabled = true) {
    checkOverflow("select a, -a from t_float", "float")
  }
}
```
BTW this will not test the scalar case unless we do something like:

```scala
withSQLConf(
  "spark.sql.optimizer.excludedRules" ->
    "org.apache.spark.sql.catalyst.optimizer.ConstantFolding") {
  checkOverflow("select a, -(a) from t_float", "float")
}
```

I think the current test creates a single-item array. That said, unless that option is set, it is unlikely to hit the scalar scenario. It would be ideal to test scalar cases because user jobs may use that option, but it can be a follow-up fix.
```rust
arrow::datatypes::IntervalUnit::DayTime => check_overflow!(
    array,
    arrow::array::IntervalDayTimeArray,
    i64::MIN,
    "interval"
),
```
I was expecting testing this to fail, because DataFusion's neg_wrapping splits the i64 into two i32 values.
Then I realized there is no good way of testing this right now, as mentioned above.
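For context, Arrow's IntervalDayTime layout packs a days field and a milliseconds field as two i32 halves of one i64, so a correct checked negation has to operate on each half separately rather than on the packed i64 as a whole. A hedged sketch of the idea (the function names here are illustrative, not DataFusion's actual API):

```rust
// IntervalDayTime packs days in the high 32 bits and milliseconds
// in the low 32 bits of an i64 (Arrow's layout).
fn pack(days: i32, millis: i32) -> i64 {
    ((days as i64) << 32) | ((millis as i64) & 0xFFFF_FFFF)
}

// Negate each 32-bit half with overflow checking; None signals an
// ANSI-mode arithmetic overflow.
fn checked_neg_day_time(v: i64) -> Option<i64> {
    let days = (v >> 32) as i32;
    let millis = v as i32;
    Some(pack(days.checked_neg()?, millis.checked_neg()?))
}

fn main() {
    // Normal case: both halves negate cleanly.
    assert_eq!(checked_neg_day_time(pack(1, 500)), Some(pack(-1, -500)));
    // Overflow case: i32::MIN days cannot be negated.
    assert_eq!(checked_neg_day_time(pack(i32::MIN, 0)), None);
    // Negating the packed i64 as a single value gives a different
    // (wrong) bit pattern than negating the two halves.
    assert_ne!(pack(1, 500).wrapping_neg(), pack(-1, -500));
    println!("ok");
}
```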
andygrove
left a comment
Thanks again @vaibhawvipul and thank you for the reviews @kazuyukitanimura and @parthchandra
Hi @vaibhawvipul, I don't know why, but it seems that a test included in this PR is failing on the main branch in the Spark 4.0 tests. Any thoughts on this? Thanks in advance! (- unary negative integer overflow test *** FAILED *** (739 milliseconds))
Hmm.. I think there was another PR which got merged after mine that is breaking the tests.

Oops, looks like this is a merge conflict, I will fix it. Sorry for the inconvenience.
@kazuyukitanimura @vaibhawvipul This is already fixed in main. I fixed it as part of #505.

Ah, thank you @andygrove cc @planga82
* checking for invalid inputs for unary minus
* adding eval mode to expressions and proto message
* extending evaluate function for negative expression
* remove print statements
* fix format errors
* removing units
* fix clippy errors
* expect instead of unwrap, map_err instead of match and removing Float16
* adding test case for unary negative integer overflow
* added a function to make the code more readable
* adding comet sql ansi config
* using withTempDir and checkSparkAnswerAndOperator
* adding macros to improve code readability
* using withParquetTable
* adding scalar tests
* adding more test cases and bug fix
* using failOnError and removing eval_mode
* bug fix
* removing checks for float64 and monthdaynano
* removing checks of float and monthday nano
* adding checks while evaluating bounds
* IntervalDayTime splitting i64 and then checking
* Adding interval test
* fix ci errors
Which issue does this PR close?
Closes #465.
Rationale for this change
Improves compatibility with Spark.
What changes are included in this PR?
Adds ANSI support for UnaryMinus by performing input checks when ANSI mode is enabled.
How are these changes tested?
The added tests verify that overflow returns the expected error:

```
Caused by: org.apache.spark.SparkArithmeticException: [ARITHMETIC_OVERFLOW] integer overflow. If necessary set "spark.sql.ansi.enabled" to "false" to bypass this error.
```
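The failOnError behavior this PR wires through can be sketched in a few lines of Rust (a simplified model for illustration, not the actual Comet/DataFusion code; the function name and error string are mine):

```rust
// Simplified model of ANSI vs. legacy unary minus for an i32 input.
// fail_on_error mirrors Spark's ANSI flag carried in the plan.
fn unary_minus(v: i32, fail_on_error: bool) -> Result<i32, String> {
    if fail_on_error {
        // ANSI mode: overflow becomes an error, analogous to Spark's
        // [ARITHMETIC_OVERFLOW] SparkArithmeticException.
        v.checked_neg()
            .ok_or_else(|| "ARITHMETIC_OVERFLOW: integer overflow".to_string())
    } else {
        // Legacy mode: wrap around silently.
        Ok(v.wrapping_neg())
    }
}

fn main() {
    assert_eq!(unary_minus(5, true), Ok(-5));
    assert!(unary_minus(i32::MIN, true).is_err());
    assert_eq!(unary_minus(i32::MIN, false), Ok(i32::MIN));
    println!("ok");
}
```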