-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-35162][SQL] New SQL functions: TRY_ADD/TRY_DIVIDE #32292
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will fix the sql/toString method. This is still WIP.
|
Test build #137800 has finished for PR 32292 at commit
|
|
Kubernetes integration test unable to build dist. exiting with code: 1 |
|
Kubernetes integration test starting |
|
Kubernetes integration test status failure |
|
Test build #137813 has finished for PR 32292 at commit
|
|
Kubernetes integration test starting |
|
Kubernetes integration test status failure |
|
Kubernetes integration test starting |
|
Kubernetes integration test status failure |
|
Test build #138391 has finished for PR 32292 at commit
|
|
Test build #138392 has finished for PR 32292 at commit
|
|
Kubernetes integration test starting |
|
Kubernetes integration test status failure |
|
@cloud-fan @maropu Sorry for the delay. This one is ready for review now. |
|
Test build #138453 has finished for PR 32292 at commit
|
|
Kubernetes integration test unable to build dist. exiting with code: 1 |
|
Kubernetes integration test starting |
|
Test build #138455 has finished for PR 32292 at commit
|
|
Kubernetes integration test status failure |
|
Test build #138468 has finished for PR 32292 at commit
|
maropu
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just out of curiosity; Any reason to pick up try_add+try_divide instead of try_add+try_multiple?
| expression[CaseWhen]("when"), | ||
|
|
||
| expression[TryAdd]("try_add"), | ||
| expression[TryDivide]("try_divide"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If these functions exist in the math function group, how about moving them after L376?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
| import org.apache.spark.sql.catalyst.expressions.codegen.Block._ | ||
| import org.apache.spark.sql.types.DataType | ||
|
|
||
| private[catalyst] case class TryEval(child: Expression) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we need private[catalyst] here? I think catalyst package itself looks private.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
| examples = """ | ||
| Examples: | ||
| > SELECT _FUNC_(1, 2); | ||
| 3 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about adding an overflow example here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
| > SELECT _FUNC_(3, 2); | ||
| 1.5 | ||
| > SELECT _FUNC_(2L, 2L); | ||
| 1.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about adding a null example here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
IMO, divide by 0 error is more common in ETL/ML jobs than integral multiply overflow. I also talked to @bart-samwel, from his experience on Bigquery, the most desired functions are |
maropu
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the explanation, @gengliangwang . Looks fine to me.
|
Kubernetes integration test starting |
|
Kubernetes integration test status failure |
|
Test build #138490 has finished for PR 32292 at commit
|
|
Kubernetes integration test starting |
|
Kubernetes integration test status failure |
|
Test build #138502 has finished for PR 32292 at commit
|
|
Merging to master. @maropu Thanks for the review. |
…e as well ### What changes were proposed in this pull request? In #32292, `try_arithmetic.sql` is only tested with ANSI mode off. We should test it under ANSI mode as well. ### Why are the changes needed? Improve test coverage ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Unit test Closes #34795 from gengliangwang/try_arithmetic.sql. Authored-by: Gengliang Wang <[email protected]> Signed-off-by: Gengliang Wang <[email protected]>
What changes were proposed in this pull request?
Add New SQL functions:
These expressions are identical to the following expression under ANSI mode except that it returns null if error occurs:
Note: it is easy to add other expressions like
TRY_SUBTRACT/TRY_MULTIPLYbut let's control the number of these new expressions and just addTRY_ADDandTRY_DIVIDEfor now.Why are the changes needed?
For example, the behavior of the following SQL operations is unreasonable:
With the new safe version SQL functions:
Note: We should only add new expressions to important operators, instead of adding new safe expressions for all the expressions that can throw errors.
Does this PR introduce any user-facing change?
Yes, new SQL functions: TRY_ADD/TRY_DIVIDE
How was this patch tested?
Unit test