feat: Support Ansi mode in abs function #500
Conversation
- case Abs(child, _) =>
+ case Abs(child, failOnErr) =>
Since we are already using failOnErr, can we simply use this boolean instead of the EvalMode struct? What are your thoughts?
Hi, thanks for your review! The intention behind doing it this way is that, in the Rust code, ANSI mode is always treated the same way whether the expression supports all three eval modes or only two. Otherwise we would need different handling for the two cases.
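As a hedged illustration of that design point (all names and types here are hypothetical stand-ins, not the PR's actual code): passing the full enum rather than a boolean lets an expression that only supports two modes share the same match-based handling as one that supports three.

```rust
// Hypothetical sketch: by always passing the full EvalMode enum (rather than
// a failOnError boolean), expressions supporting two modes and expressions
// supporting three are handled by the same kind of match logic.
#[derive(Debug, Clone, Copy, PartialEq)]
enum EvalMode {
    Legacy,
    Ansi,
    Try,
}

// abs only supports Legacy and Ansi, but it still consumes the enum; the
// unsupported mode is rejected rather than being re-encoded as a bool.
fn abs_i32(value: i32, eval_mode: EvalMode) -> Result<i32, String> {
    match value.checked_abs() {
        Some(v) => Ok(v),
        None => match eval_mode {
            // Legacy Spark behavior: overflow silently wraps around.
            EvalMode::Legacy => Ok(value.wrapping_abs()),
            // ANSI mode: overflow is an error.
            EvalMode::Ansi => Err("arithmetic overflow".to_string()),
            // TRY mode is not supported by abs.
            EvalMode::Try => Err("abs does not support TRY mode".to_string()),
        },
    }
}

fn main() {
    // i32::MIN has no positive counterpart, so abs overflows on it.
    assert_eq!(abs_i32(i32::MIN, EvalMode::Legacy), Ok(i32::MIN)); // wraps
    assert!(abs_i32(i32::MIN, EvalMode::Ansi).is_err());
    assert_eq!(abs_i32(-5, EvalMode::Ansi), Ok(5));
    println!("ok");
}
```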
test("abs Overflow ansi mode") {
  val data: Seq[(Int, Int)] = Seq((Int.MaxValue, Int.MinValue))
Can we have tests for all numerical values?
Codecov Report: All modified and coverable lines are covered by tests ✅

Additional details and impacted files:

@@ Coverage Diff @@
##              main     #500      +/-   ##
============================================
+ Coverage     34.05%   34.08%    +0.02%
+ Complexity      859      812       -47
============================================
  Files           116      105       -11
  Lines         38679    38516      -163
  Branches       8567     8555       -12
============================================
- Hits          13173    13127       -46
+ Misses        22745    22644      -101
+ Partials       2761     2745       -16

View full report in Codecov by Sentry.
match self.inner_abs_func.invoke(args) {
    Ok(result) => Ok(result),
    Err(DataFusionError::ArrowError(ArrowError::ComputeError(msg), trace))
        if msg.contains("overflow") =>
It would be nice if Arrow/DataFusion threw a specific overflow error so that we didn't have to look for a string within the error message, but I guess that isn't available.
I am going to file a feature request in DataFusion. I will post the link here later.
Great! I understand that we have to wait for the next version of arrow-rs to be released and integrated here before we can make the changes, right?
We can continue with this and do a follow up once the arrow-rs change is available. Or you can wait; the arrow-rs community is very responsive.
Probably better to do it in a follow-up PR, I think. Thanks!
It would help to log an issue to keep track of this.
Apparently my PR in Arrow will not be available for 3 months because it is an API change 😞
When we upgrade to the version of arrow with the improved overflow error reporting, the tests in this PR will fail (because we are looking for ComputeError but will instead get ArithmeticOverflow), so I don't think we need to file an issue.
fn invoke(&self, args: &[ColumnarValue]) -> Result<ColumnarValue, DataFusionError> {
    match self.inner_abs_func.invoke(args) {
        Ok(result) => Ok(result),
Technically there is no need to match on an Ok result, because the catch-all arm already handles everything else.
Ok(result) => Ok(result),
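A minimal illustration of the reviewer's point, with simplified stand-in types (the real code operates on ColumnarValue and DataFusionError): a catch-all arm passes through both Ok values and unrelated errors, so no dedicated Ok arm is needed.

```rust
// Sketch of the simplification: the Ok arm is redundant because the
// catch-all `other` arm already forwards any value that is not the
// specific overflow error we want to rewrite.
fn handle(result: Result<i64, String>) -> Result<i64, String> {
    match result {
        // Rewrite only the overflow case into a Spark-style error.
        Err(msg) if msg.contains("overflow") => Err("ARITHMETIC_OVERFLOW".to_string()),
        // Covers Ok(_) and every other Err(_) without a dedicated Ok arm.
        other => other,
    }
}

fn main() {
    assert_eq!(handle(Ok(7)), Ok(7));
    assert_eq!(
        handle(Err("overflow on abs".to_string())),
        Err("ARITHMETIC_OVERFLOW".to_string())
    );
    assert_eq!(handle(Err("io error".to_string())), Err("io error".to_string()));
    println!("ok");
}
```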
fn arithmetic_overflow_error(from_type: &str) -> CometError {
    CometError::ArithmeticOverflow {
        from_type: from_type.to_string(),
    }
}
This seems to duplicate the same function from negative.rs. Perhaps that one could be moved so that it can be reused here.
let eval_mode = match spark_expression::EvalMode::try_from(expr.eval_mode)? {
    spark_expression::EvalMode::Legacy => EvalMode::Legacy,
    spark_expression::EvalMode::Ansi => EvalMode::Ansi,
    spark_expression::EvalMode::Try => {
        return Err(ExecutionError::GeneralError(
            "Invalid EvalMode: \"TRY\"".to_string(),
        ))
    }
};
I see that we have this code block duplicated for abs and cast now. Perhaps this could be extracted into a function. Another option would be to implement TryFrom or TryInto for this conversion.
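A sketch of the TryFrom option the reviewer mentions (the surrounding types are assumptions mirroring the snippet above, not the actual project code): implementing the trait centralizes the conversion and gives callers `try_from`/`try_into` for free.

```rust
// Hypothetical sketch of extracting the duplicated conversion into a single
// TryFrom impl. ProtoEvalMode and ExecutionError are stand-ins for the
// protobuf-generated enum and the project's error type.
#[derive(Debug, PartialEq)]
enum EvalMode {
    Legacy,
    Ansi,
}

#[derive(Debug)]
struct ExecutionError(String);

// Stand-in for the generated spark_expression::EvalMode.
#[derive(Clone, Copy)]
enum ProtoEvalMode {
    Legacy,
    Ansi,
    Try,
}

impl TryFrom<ProtoEvalMode> for EvalMode {
    type Error = ExecutionError;

    fn try_from(value: ProtoEvalMode) -> Result<Self, Self::Error> {
        match value {
            ProtoEvalMode::Legacy => Ok(EvalMode::Legacy),
            ProtoEvalMode::Ansi => Ok(EvalMode::Ansi),
            // abs does not support TRY mode, so reject it at plan time.
            ProtoEvalMode::Try => {
                Err(ExecutionError("Invalid EvalMode: \"TRY\"".to_string()))
            }
        }
    }
}

fn main() {
    assert_eq!(EvalMode::try_from(ProtoEvalMode::Ansi).unwrap(), EvalMode::Ansi);
    assert!(EvalMode::try_from(ProtoEvalMode::Try).is_err());
    println!("ok");
}
```

With the impl in place, call sites shrink to `let eval_mode: EvalMode = proto_mode.try_into()?;`, which matches the try_into suggestion made later in this review.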
I have refactored based on all the comments, thanks for the review!
The test failures seem to be because the branch was not aligned with main. After rebasing on main, the tests pass in my repo.
-     spark_expression::EvalMode::Try => EvalMode::Try,
-     spark_expression::EvalMode::Ansi => EvalMode::Ansi,
- };
+ let eval_mode = EvalMode::try_from(expr.eval_mode)?;
It would be more idiomatic to use try_into here.
- let eval_mode = EvalMode::try_from(expr.eval_mode)?;
+ let eval_mode = expr.eval_mode.try_into()?;
let return_type = child.data_type(&input_schema)?;
let args = vec![child];
let scalar_def = ScalarFunctionDefinition::UDF(math::abs());
let eval_mode = EvalMode::try_from(expr.eval_mode)?;
It would be more idiomatic to use try_into here.
- let eval_mode = EvalMode::try_from(expr.eval_mode)?;
+ let eval_mode = expr.eval_mode.try_into()?;
andygrove left a comment
Thanks again @planga82. I left one more nit, but will be happy to merge after that unless @parthchandra has more feedback
Thanks!! I am a Rust beginner, I appreciate any comments!
* change proto msg
* QueryPlanSerde with eval mode
* Move eval mode
* Add abs in planner
* CometAbsFunc wrapper
* Add error management
* Add tests
* Add license
* spotless apply
* format
* Fix clippy
* error msg for all spark versions
* Fix benches
* Use enum to ansi mode
* Fix format
* Add more tests
* Format
* Refactor
* refactor
* fix merge
* fix merge
Which issue does this PR close?
Closes #464.
Rationale for this change
This PR adds support for ANSI mode in the abs function. This is done by wrapping the DataFusion abs function to account for the behavioral differences between Spark and DataFusion. The main differences are the overflow behavior in legacy mode and the exception message in ANSI mode.
What changes are included in this PR?
In addition to introducing the wrapper, some minor refactoring is done to move some code to a more general location.
In Spark, the abs function does not support try execution mode, only ansi or legacy.
How are these changes tested?
The new tests verify correct behavior in the event of an overflow.