
Conversation

@gengliangwang
Member

What changes were proposed in this pull request?

Remove SQL configuration spark.sql.legacy.allowCastNumericToTimestamp

Why are the changes needed?

In the current master branch, there is a new configuration, spark.sql.legacy.allowCastNumericToTimestamp, which controls whether casting numeric types to the Timestamp type is allowed. The default value is true.

After #30260, type conversion between the Timestamp type and numeric types is disallowed in ANSI mode. So, we don't need a separate configuration spark.sql.legacy.allowCastNumericToTimestamp to disallow the conversion. Users just need to set spark.sql.ansi.enabled to get the behavior.

As the configuration has not been included in any release yet, we should remove it to keep things simple.
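For reference, a sketch of the intended behavior after this change. This is illustrative, assuming the ANSI-mode cast rules introduced in #30260; the exact error message and the timestamp rendering depend on the Spark version and session time zone:

```sql
-- ANSI mode on: casting a numeric to TIMESTAMP is disallowed
SET spark.sql.ansi.enabled = true;
SELECT CAST(1586318188 AS TIMESTAMP);  -- fails analysis: numeric-to-timestamp cast is not allowed

-- Default (non-ANSI) behavior: the numeric is read as seconds since the Unix epoch
SET spark.sql.ansi.enabled = false;
SELECT CAST(1586318188 AS TIMESTAMP);  -- 2020-04-08 03:56:28 when the session time zone is UTC
```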

Does this PR introduce any user-facing change?

No, since the configuration is not released yet.

How was this patch tested?

Existing test cases

@SparkQA

SparkQA commented Nov 25, 2020

Test build #131731 has finished for PR 30493 at commit fcc1f12.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Member

@dongjoon-hyun dongjoon-hyun left a comment

Sorry, but users cannot simply enable spark.sql.ansi.enabled because it enforces many other behaviors as a side effect. I'm not sure about removing this. Is it really okay?

spark.sql.legacy.allowCastNumericToTimestamp for disallowing the conversion. Users just need to set spark.sql.ansi.enabled for the behavior.

cc @gatorsmile

@gengliangwang
Member Author

gengliangwang commented Nov 25, 2020

@dongjoon-hyun I can understand your concern. But if we keep this configuration, then by the same logic we would have to add a new feature flag for each conversion disallowed in ANSI mode (#30260).
Overall, I think it is better to remove this configuration in 3.1.0 before it's too late.

@dongjoon-hyun
Member

According to the report of SPARK-31710, it was a correctness issue. This should be considered seriously. Are you really using ANSI mode in production?

@github-actions github-actions bot added the SQL label Nov 25, 2020
@gengliangwang
Member Author

gengliangwang commented Nov 25, 2020

According to the report of SPARK-31710, it was a correctness issue.

I just tried the following in Spark 3.0.0

create table test(id bigint);
insert into test select 1586318188000;
create table test1(id bigint) partitioned by (year string);
set hive.exec.dynamic.partition.mode=nonstrict;
insert overwrite table test1 partition(year) select 234,cast(id as TIMESTAMP) from test;

and there is no exception as reported in https://issues.apache.org/jira/browse/SPARK-31710. (The affected version there is 2.4.5.)
Casting from the Long type to the Timestamp type can be ambiguous: by default, Spark interprets the Long value as the number of seconds since the Unix epoch, and that interpretation is what is reported as a "wrong result" in https://issues.apache.org/jira/browse/SPARK-31710.
(That's why we should disallow it in ANSI mode.)
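The ambiguity is easy to see outside Spark. A minimal sketch in plain Python (the value 1586318188000 is taken from the SPARK-31710 repro above; the comparison here is illustrative, not Spark's actual code):

```python
from datetime import datetime, timezone

# Value from SPARK-31710's repro: it is an epoch value in *milliseconds*,
# but a numeric-to-timestamp cast reads a number as epoch *seconds*.
raw = 1586318188000

# Interpreted as milliseconds -- what the reporter expected:
expected = datetime.fromtimestamp(raw / 1000, tz=timezone.utc)
print(expected.isoformat())  # 2020-04-08T03:56:28+00:00

# Interpreted as seconds -- what the cast actually does -- the result lands
# roughly 50,000 years in the future, far outside datetime's supported range:
years_after_epoch = raw / (365.2425 * 24 * 3600)
print(f"~{years_after_epoch:.0f} years after 1970")
```

Neither reading is wrong in isolation; the cast's seconds-based interpretation simply differs from what users who store epoch milliseconds expect, which is why ANSI mode rejects the cast outright.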

Are you really using ANSI mode in production?

It will be possible once ANSI mode is ready.

@dongjoon-hyun
Member

Got it, @gengliangwang .

@SparkQA

SparkQA commented Nov 25, 2020

Test build #131732 has finished for PR 30493 at commit f98a5da.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Member

HiveCompatibilitySuite.scala still seems to reference it:

/home/jenkins/workspace/SparkPullRequestBuilder@2/sql/hive/compatibility/src/test/scala/org/apache/spark/sql/hive/execution/HiveCompatibilitySuite.scala:44:19: value legacyAllowCastNumericToTimestamp is not a member of org.apache.spark.sql.internal.SQLConf
[error]     TestHive.conf.legacyAllowCastNumericToTimestamp

Member

@dongjoon-hyun dongjoon-hyun left a comment

+1 for the removal according to the discussion.

Member

@HyukjinKwon HyukjinKwon left a comment

I am okay with this too.

}
}

test("SPARK-31710: fail casting from numeric to timestamp if it is forbidden") {
Member

Users just need to set spark.sql.ansi.enabled for the behavior.

Just in case, should we keep the test and check these cases under ^^ that config?


@cloud-fan
Contributor

GA passed, merging to master, thanks!

@cloud-fan cloud-fan closed this in 19f3b89 Nov 25, 2020
@SparkQA

SparkQA commented Nov 25, 2020

Test build #131742 has finished for PR 30493 at commit 6d3e7e2.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


6 participants