feat: Support Ansi mode in abs function #500
Conversation
- case Abs(child, _) =>
+ case Abs(child, failOnErr) =>
Since we are already using failOnErr, can we simply use this boolean instead of the EvalMode struct? What are your thoughts?
Hi, thanks for your review! The intention behind doing it this way is that, in the Rust code, ANSI mode is always treated the same way whether the expression supports all three eval modes or only two. Otherwise we would need different handling for the two cases.
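As a hedged illustration of that design point (all names and types here are hypothetical stand-ins, not the PR's actual code): passing the full enum rather than a boolean lets an expression that only supports two modes share the same match-based handling as one that supports three.

```rust
// Hypothetical sketch: by always passing the full EvalMode enum (rather than
// a failOnError boolean), expressions supporting two modes and expressions
// supporting three are handled by the same kind of match logic.
#[derive(Debug, Clone, Copy, PartialEq)]
enum EvalMode {
    Legacy,
    Ansi,
    Try,
}

// abs only supports Legacy and Ansi, but it still consumes the enum; the
// unsupported mode is rejected rather than being re-encoded as a bool.
fn abs_i32(value: i32, eval_mode: EvalMode) -> Result<i32, String> {
    match value.checked_abs() {
        Some(v) => Ok(v),
        None => match eval_mode {
            // Legacy Spark behavior: overflow silently wraps around.
            EvalMode::Legacy => Ok(value.wrapping_abs()),
            // ANSI mode: overflow is an error.
            EvalMode::Ansi => Err("arithmetic overflow".to_string()),
            // TRY mode is not supported by abs.
            EvalMode::Try => Err("abs does not support TRY mode".to_string()),
        },
    }
}

fn main() {
    // i32::MIN has no positive counterpart, so abs overflows on it.
    assert_eq!(abs_i32(i32::MIN, EvalMode::Legacy), Ok(i32::MIN)); // wraps
    assert!(abs_i32(i32::MIN, EvalMode::Ansi).is_err());
    assert_eq!(abs_i32(-5, EvalMode::Ansi), Ok(5));
    println!("ok");
}
```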
test("abs Overflow ansi mode") {
  val data: Seq[(Int, Int)] = Seq((Int.MaxValue, Int.MinValue))
Can we have tests for all numerical values?
Codecov Report: All modified and coverable lines are covered by tests ✅

Additional details and impacted files:

@@ Coverage Diff @@
##              main     #500      +/-   ##
============================================
+ Coverage     34.05%   34.08%    +0.02%
+ Complexity      859      812       -47
============================================
  Files           116      105       -11
  Lines         38679    38516      -163
  Branches       8567     8555       -12
============================================
- Hits          13173    13127       -46
+ Misses        22745    22644      -101
+ Partials       2761     2745       -16

View full report in Codecov by Sentry.
match self.inner_abs_func.invoke(args) {
    Ok(result) => Ok(result),
    Err(DataFusionError::ArrowError(ArrowError::ComputeError(msg), trace))
        if msg.contains("overflow") =>
It would be nice if Arrow/DataFusion threw a specific overflow error so that we didn't have to look for a string within the error message, but I guess that isn't available.
I am going to file a feature request in DataFusion. I will post the link here later.
Great! I understand that we have to wait for the next version of arrow-rs to be released and integrated here before we can make the changes, right?
We can continue with this and do a follow up once the arrow-rs change is available. Or you can wait; the arrow-rs community is very responsive.
Probably better to do it in a follow-up PR, I think. Thanks!
It would help to log an issue to keep track of this.
Apparently my PR in Arrow will not be available for 3 months because it is an API change 😞
When we upgrade to the version of arrow with the improved overflow error reporting, the tests in this PR will fail (because we are looking for ComputeError but will instead get ArithmeticOverflow), so I don't think we need to file an issue.
fn invoke(&self, args: &[ColumnarValue]) -> Result<ColumnarValue, DataFusionError> {
    match self.inner_abs_func.invoke(args) {
        Ok(result) => Ok(result),
Technically there is no need to match on an Ok result, because the catch-all arm already handles everything else.
Ok(result) => Ok(result),
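A minimal illustration of the reviewer's point, with simplified stand-in types (the real code operates on ColumnarValue and DataFusionError): a catch-all arm passes through both Ok values and unrelated errors, so no dedicated Ok arm is needed.

```rust
// Sketch of the simplification: the Ok arm is redundant because the
// catch-all `other` arm already forwards any value that is not the
// specific overflow error we want to rewrite.
fn handle(result: Result<i64, String>) -> Result<i64, String> {
    match result {
        // Rewrite only the overflow case into a Spark-style error.
        Err(msg) if msg.contains("overflow") => Err("ARITHMETIC_OVERFLOW".to_string()),
        // Covers Ok(_) and every other Err(_) without a dedicated Ok arm.
        other => other,
    }
}

fn main() {
    assert_eq!(handle(Ok(7)), Ok(7));
    assert_eq!(
        handle(Err("overflow on abs".to_string())),
        Err("ARITHMETIC_OVERFLOW".to_string())
    );
    assert_eq!(handle(Err("io error".to_string())), Err("io error".to_string()));
    println!("ok");
}
```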
fn arithmetic_overflow_error(from_type: &str) -> CometError {
    CometError::ArithmeticOverflow {
        from_type: from_type.to_string(),
    }
}
This seems to duplicate the same function from negative.rs. Perhaps that one could be moved so that it can be reused here.
let eval_mode = match spark_expression::EvalMode::try_from(expr.eval_mode)? {
    spark_expression::EvalMode::Legacy => EvalMode::Legacy,
    spark_expression::EvalMode::Ansi => EvalMode::Ansi,
    spark_expression::EvalMode::Try => {
        return Err(ExecutionError::GeneralError(
            "Invalid EvalMode: \"TRY\"".to_string(),
        ))
    }
};
I see that we have this code block duplicated for abs and cast now. Perhaps this could be extracted into a function. Another option would be to implement TryFrom or TryInto for this conversion.
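A sketch of the TryFrom option the reviewer mentions (the surrounding types are assumptions mirroring the snippet above, not the actual project code): implementing the trait centralizes the conversion and gives callers `try_from`/`try_into` for free.

```rust
// Hypothetical sketch of extracting the duplicated conversion into a single
// TryFrom impl. ProtoEvalMode and ExecutionError are stand-ins for the
// protobuf-generated enum and the project's error type.
#[derive(Debug, PartialEq)]
enum EvalMode {
    Legacy,
    Ansi,
}

#[derive(Debug)]
struct ExecutionError(String);

// Stand-in for the generated spark_expression::EvalMode.
#[derive(Clone, Copy)]
enum ProtoEvalMode {
    Legacy,
    Ansi,
    Try,
}

impl TryFrom<ProtoEvalMode> for EvalMode {
    type Error = ExecutionError;

    fn try_from(value: ProtoEvalMode) -> Result<Self, Self::Error> {
        match value {
            ProtoEvalMode::Legacy => Ok(EvalMode::Legacy),
            ProtoEvalMode::Ansi => Ok(EvalMode::Ansi),
            // abs does not support TRY mode, so reject it at plan time.
            ProtoEvalMode::Try => {
                Err(ExecutionError("Invalid EvalMode: \"TRY\"".to_string()))
            }
        }
    }
}

fn main() {
    assert_eq!(EvalMode::try_from(ProtoEvalMode::Ansi).unwrap(), EvalMode::Ansi);
    assert!(EvalMode::try_from(ProtoEvalMode::Try).is_err());
    println!("ok");
}
```

With the impl in place, call sites shrink to `let eval_mode: EvalMode = proto_mode.try_into()?;`, which matches the try_into suggestion made later in this review.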
I have refactored based on all the comments, thanks for the review!
The test failures seem to be because the branch was not aligned with main. After rebasing on main, the tests pass in my repo.
-     spark_expression::EvalMode::Try => EvalMode::Try,
-     spark_expression::EvalMode::Ansi => EvalMode::Ansi,
- };
+ let eval_mode = EvalMode::try_from(expr.eval_mode)?;
It would be more idiomatic to use try_into here.
- let eval_mode = EvalMode::try_from(expr.eval_mode)?;
+ let eval_mode = expr.eval_mode.try_into()?;
let return_type = child.data_type(&input_schema)?;
let args = vec![child];
let scalar_def = ScalarFunctionDefinition::UDF(math::abs());
let eval_mode = EvalMode::try_from(expr.eval_mode)?;
It would be more idiomatic to use try_into here.
- let eval_mode = EvalMode::try_from(expr.eval_mode)?;
+ let eval_mode = expr.eval_mode.try_into()?;
andygrove left a comment
Thanks again @planga82. I left one more nit, but will be happy to merge after that unless @parthchandra has more feedback
Thanks!! I am a Rust beginner, I appreciate any comments!
* change proto msg
* QueryPlanSerde with eval mode
* Move eval mode
* Add abs in planner
* CometAbsFunc wrapper
* Add error management
* Add tests
* Add license
* spotless apply
* format
* Fix clippy
* error msg for all spark versions
* Fix benches
* Use enum to ansi mode
* Fix format
* Add more tests
* Format
* Refactor
* refactor
* fix merge
* fix merge
Which issue does this PR close?
Closes #464.
Rationale for this change
This PR adds support for ANSI mode in the abs function. This is done by wrapping the DataFusion abs function to account for the behavioral differences between Spark and DataFusion. The main differences are the overflow behavior in legacy mode and the exception message in ANSI mode.
What changes are included in this PR?
In addition to introducing the wrapper, some minor refactoring is done to move some code to a more general location.
In Spark, the abs function does not support try execution mode, only ansi or legacy.
How are these changes tested?
The new tests verify correct behavior in the event of an overflow.