feat: Implement ANSI support for UnaryMinus #471
Conversation
andygrove
left a comment
This is looking good @vaibhawvipul. I think all that is needed now are unit tests perhaps in CometExpressionSuite to show that this is all working as intended.
I also left some minor feedback.
@andygrove we have a test case now which shows that we have parity.
Resolved review threads (now outdated) on:
- spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala
- spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala
I have made all the requested changes; it looks ready for CI to me. Please let me know. cc @kazuyukitanimura @parthchandra
kazuyukitanimura
left a comment
Sorry for asking so many things, but here are a few more comments.
No problem, it greatly improved my contribution. I learned a lot and I am happy. This exercise will ensure that my next PR won't have this much back-and-forth :)

@kazuyukitanimura this is ready for review.
```scala
withTable("t_interval") {
  spark.sql("CREATE TABLE t_interval(a STRING) USING PARQUET")
  spark.sql("INSERT INTO t_interval VALUES ('INTERVAL 10000000000 YEAR')")
  withAnsiMode(enabled = true) {
    spark
      .sql("SELECT CAST(a AS INTERVAL) AS a FROM t_interval")
      .createOrReplaceTempView("t_interval_casted")
    checkOverflow("SELECT a, -a FROM t_interval_casted", "interval")
  }
}
```
It looks like this does not hit native code, since CAST(a AS INTERVAL) is not supported yet. So it will fall back to Spark, and checkOverflow ends up comparing Spark results on both sides.
Perhaps there is no good way of creating an interval right now...
```scala
withTable("t") {
  sql("create table t(a int) using parquet")
  sql("insert into t values (-2147483648)")
  withAnsiMode(enabled = true) {
    checkOverflow("select a, -a from t", "integer")
  }
}
```
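The integer test above exercises the one value where unary minus overflows: in two's complement, `Int.MinValue` (-2147483648) has no positive counterpart. A minimal Rust sketch of that underlying behavior (an illustration, not Comet's actual code):

```rust
fn main() {
    let v: i32 = i32::MIN;

    // checked_neg returns None on overflow instead of wrapping,
    // which is the behavior ANSI mode needs to detect.
    assert_eq!(v.checked_neg(), None);

    // wrapping_neg silently wraps back to i32::MIN, matching
    // Spark's legacy (non-ANSI) semantics.
    assert_eq!(v.wrapping_neg(), i32::MIN);

    println!("ok");
}
```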
```scala
withTable("t_float") {
  sql("create table t_float(a float) using parquet")
  sql("insert into t_float values (3.4128235E38)")
  withAnsiMode(enabled = true) {
    checkOverflow("select a, -a from t_float", "float")
  }
}
```
BTW this will not test the scalar case unless we do something like:

```scala
withSQLConf(
  "spark.sql.optimizer.excludedRules" ->
    "org.apache.spark.sql.catalyst.optimizer.ConstantFolding") {
  checkOverflow("select a, -(a) from t_float", "float")
}
```

I think the current test creates a single-item array. That said, unless that option is set, it is unlikely to hit the scalar scenario. It would be ideal to test scalar cases because user jobs may use that option, but it can be a follow-up fix.
```rust
arrow::datatypes::IntervalUnit::DayTime => check_overflow!(
    array,
    arrow::array::IntervalDayTimeArray,
    i64::MIN,
    "interval"
),
```
I was expecting testing this to fail, because DataFusion's neg_wrapping splits the i64 into two i32 values.
Then I realized there is no good way of testing this right now, as mentioned above.
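For context, Arrow's IntervalDayTime layout packs a days field and a milliseconds field as two i32 halves of one i64, so a correct checked negation has to operate on each half separately rather than on the packed i64 as a whole. A hedged sketch of the idea (the function names here are illustrative, not DataFusion's actual API):

```rust
// IntervalDayTime packs days in the high 32 bits and milliseconds
// in the low 32 bits of an i64 (Arrow's layout).
fn pack(days: i32, millis: i32) -> i64 {
    ((days as i64) << 32) | ((millis as i64) & 0xFFFF_FFFF)
}

// Negate each 32-bit half with overflow checking; None signals an
// ANSI-mode arithmetic overflow.
fn checked_neg_day_time(v: i64) -> Option<i64> {
    let days = (v >> 32) as i32;
    let millis = v as i32;
    Some(pack(days.checked_neg()?, millis.checked_neg()?))
}

fn main() {
    // Normal case: both halves negate cleanly.
    assert_eq!(checked_neg_day_time(pack(1, 500)), Some(pack(-1, -500)));
    // Overflow case: i32::MIN days cannot be negated.
    assert_eq!(checked_neg_day_time(pack(i32::MIN, 0)), None);
    // Negating the packed i64 as a single value gives a different
    // (wrong) bit pattern than negating the two halves.
    assert_ne!(pack(1, 500).wrapping_neg(), pack(-1, -500));
    println!("ok");
}
```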
andygrove
left a comment
Thanks again @vaibhawvipul and thank you for the reviews @kazuyukitanimura and @parthchandra
Hi @vaibhawvipul, I don't know why, but it seems that a test included in this PR is failing on the main branch in the Spark 4.0 tests. Any thoughts on this? Thanks in advance! (- unary negative integer overflow test *** FAILED *** (739 milliseconds))
Hmm.. I think there was another PR which got merged after mine that is breaking the tests.

Oops, looks like this is a merge conflict, I will fix it. Sorry for the inconvenience.
@kazuyukitanimura @vaibhawvipul This is already fixed in main. I fixed it as part of #505.

Ah, thank you @andygrove cc @planga82
* checking for invalid inputs for unary minus
* adding eval mode to expressions and proto message
* extending evaluate function for negative expression
* remove print statements
* fix format errors
* removing units
* fix clippy errors
* expect instead of unwrap, map_err instead of match and removing Float16
* adding test case for unary negative integer overflow
* added a function to make the code more readable
* adding comet sql ansi config
* using withTempDir and checkSparkAnswerAndOperator
* adding macros to improve code readability
* using withParquetTable
* adding scalar tests
* adding more test cases and bug fix
* using failOnError and removing eval_mode
* bug fix
* removing checks for float64 and monthdaynano
* removing checks of float and monthday nano
* adding checks while evaluating bounds
* IntervalDayTime splitting i64 and then checking
* Adding interval test
* fix ci errors
Which issue does this PR close?
Closes #465.
Rationale for this change
Improves compatibility with Spark.
What changes are included in this PR?
Adds ANSI support for UnaryMinus by performing input checks when ANSI mode is enabled.
How are these changes tested?
The added tests verify that overflow returns the expected error:

```
Caused by: org.apache.spark.SparkArithmeticException: [ARITHMETIC_OVERFLOW] integer overflow. If necessary set "spark.sql.ansi.enabled" to "false" to bypass this error.
```
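The failOnError behavior this PR wires through can be sketched in a few lines of Rust (a simplified model for illustration, not the actual Comet/DataFusion code; the function name and error string are mine):

```rust
// Simplified model of ANSI vs. legacy unary minus for an i32 input.
// fail_on_error mirrors Spark's ANSI flag carried in the plan.
fn unary_minus(v: i32, fail_on_error: bool) -> Result<i32, String> {
    if fail_on_error {
        // ANSI mode: overflow becomes an error, analogous to Spark's
        // [ARITHMETIC_OVERFLOW] SparkArithmeticException.
        v.checked_neg()
            .ok_or_else(|| "ARITHMETIC_OVERFLOW: integer overflow".to_string())
    } else {
        // Legacy mode: wrap around silently.
        Ok(v.wrapping_neg())
    }
}

fn main() {
    assert_eq!(unary_minus(5, true), Ok(-5));
    assert!(unary_minus(i32::MIN, true).is_err());
    assert_eq!(unary_minus(i32::MIN, false), Ok(i32::MIN));
    println!("ok");
}
```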