Skip to content

Conversation

@gengliangwang
Copy link
Member

@gengliangwang gengliangwang commented Apr 22, 2021

What changes were proposed in this pull request?

Add New SQL functions:

  • TRY_ADD
  • TRY_DIVIDE

These expressions are identical to the following expression under ANSI mode except that it returns null if error occurs:

  • ADD
  • DIVIDE

Note: it is easy to add other expressions like TRY_SUBTRACT/TRY_MULTIPLY but let's control the number of these new expressions and just add TRY_ADD and TRY_DIVIDE for now.

Why are the changes needed?

  1. Users can manage to finish queries without interruptions in ANSI mode.
  2. Users can get NULLs instead of unreasonable results if overflow occurs when ANSI mode is off.
    For example, the behavior of the following SQL operations is unreasonable:
2147483647 + 2 => -2147483647

With the new safe version SQL functions:

TRY_ADD(2147483647, 2) => null

Note: We should only add new expressions to important operators, instead of adding new safe expressions for all the expressions that can throw errors.

Does this PR introduce any user-facing change?

Yes, new SQL functions: TRY_ADD/TRY_DIVIDE

How was this patch tested?

Unit test

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will fix the sql/toString method. This is still WIP.

@SparkQA
Copy link

SparkQA commented Apr 22, 2021

Test build #137800 has finished for PR 32292 at commit eb5a07d.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • s\"$
  • case class Try(child: Expression) extends UnaryExpression

@SparkQA
Copy link

SparkQA commented Apr 22, 2021

@SparkQA
Copy link

SparkQA commented Apr 22, 2021

@SparkQA
Copy link

SparkQA commented Apr 22, 2021

@SparkQA
Copy link

SparkQA commented Apr 22, 2021

Test build #137813 has finished for PR 32292 at commit 153f147.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gengliangwang gengliangwang changed the title [WIP][SPARK-35162][SQL] New SQL functions: TRY_ADD/TRY_SUBTRACT/TRY_MULTIPLY/TRY_DIVIDE/TRY_DIV [WIP][SPARK-35162][SQL] New SQL functions: TRY_ADD/TRY_DIVIDE Apr 23, 2021
@SparkQA
Copy link

SparkQA commented May 11, 2021

@SparkQA
Copy link

SparkQA commented May 11, 2021

@SparkQA
Copy link

SparkQA commented May 11, 2021

@SparkQA
Copy link

SparkQA commented May 11, 2021

@SparkQA
Copy link

SparkQA commented May 11, 2021

Test build #138391 has finished for PR 32292 at commit a1d5742.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented May 11, 2021

Test build #138392 has finished for PR 32292 at commit 75dee4a.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented May 12, 2021

@SparkQA
Copy link

SparkQA commented May 12, 2021

@gengliangwang gengliangwang changed the title [WIP][SPARK-35162][SQL] New SQL functions: TRY_ADD/TRY_DIVIDE [SPARK-35162][SQL] New SQL functions: TRY_ADD/TRY_DIVIDE May 12, 2021
@gengliangwang
Copy link
Member Author

@cloud-fan @maropu Sorry for the delay. This one is ready for review now.

@SparkQA
Copy link

SparkQA commented May 12, 2021

Test build #138453 has finished for PR 32292 at commit 9b671f1.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented May 12, 2021

@SparkQA
Copy link

SparkQA commented May 12, 2021

@SparkQA
Copy link

SparkQA commented May 12, 2021

Test build #138455 has finished for PR 32292 at commit d1c4e92.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class TryAdd(left: Expression, right: Expression, child: Expression)
  • case class TryDivide(left: Expression, right: Expression, child: Expression)

@SparkQA
Copy link

SparkQA commented May 12, 2021

@SparkQA
Copy link

SparkQA commented May 12, 2021

Test build #138468 has finished for PR 32292 at commit 60b4514.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Member

@maropu maropu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just out of curiosity; Any reason to pick up try_add+try_divide instead of try_add+try_multiple?

expression[CaseWhen]("when"),

expression[TryAdd]("try_add"),
expression[TryDivide]("try_divide"),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If these functions exist in the math function group, how about moving them after L376?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

import org.apache.spark.sql.catalyst.expressions.codegen.Block._
import org.apache.spark.sql.types.DataType

private[catalyst] case class TryEval(child: Expression)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we need private[catalyst] here? I think catalyst package itself looks private.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

examples = """
Examples:
> SELECT _FUNC_(1, 2);
3
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about adding an overflow example here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

> SELECT _FUNC_(3, 2);
1.5
> SELECT _FUNC_(2L, 2L);
1.0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about adding a null example here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@gengliangwang
Copy link
Member Author

Just out of curiosity; Any reason to pick up try_add+try_divide instead of try_add+try_multiple?

IMO, divide by 0 error is more common in ETL/ML jobs than integral multiply overflow. I also talked to @bart-samwel, from his experience on Bigquery, the most desired functions are try_cast and try_divide.
We can add TRY_SUBTRACT/TRY_MULTIPLY if many users want it. Before that, let's be cautious in adding such new functions.

Copy link
Member

@maropu maropu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the explanation, @gengliangwang . Looks fine to me.

@SparkQA
Copy link

SparkQA commented May 13, 2021

@SparkQA
Copy link

SparkQA commented May 13, 2021

@SparkQA
Copy link

SparkQA commented May 13, 2021

Test build #138490 has finished for PR 32292 at commit 774bda1.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class TryEval(child: Expression) extends UnaryExpression with NullIntolerant

@SparkQA
Copy link

SparkQA commented May 13, 2021

@SparkQA
Copy link

SparkQA commented May 13, 2021

@SparkQA
Copy link

SparkQA commented May 13, 2021

Test build #138502 has finished for PR 32292 at commit 91bdc4c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gengliangwang
Copy link
Member Author

Merging to master. @maropu Thanks for the review.

gengliangwang added a commit that referenced this pull request Dec 3, 2021
…e as well

### What changes were proposed in this pull request?

In #32292, `try_arithmetic.sql` is only tested with ANSI mode off. We should test it under ANSI mode as well.

### Why are the changes needed?

Improve test coverage

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Unit test

Closes #34795 from gengliangwang/try_arithmetic.sql.

Authored-by: Gengliang Wang <[email protected]>
Signed-off-by: Gengliang Wang <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants