Skip to content
This repository was archived by the owner on Nov 15, 2024. It is now read-only.

Commit 9c8109e

Browse files
viiryagatorsmile
authored andcommitted
[SPARK-21555][SQL] RuntimeReplaceable should be compared semantically by its canonicalized child
## What changes were proposed in this pull request? When there are aliases (these aliases were added for nested fields) as parameters in `RuntimeReplaceable`, as they are not in the children expression, those aliases can't be cleaned up in analyzer rule `CleanupAliases`. An expression `nvl(foo.foo1, "value")` can be resolved to two semantically different expressions in a group by query because they contain different aliases. Because those aliases are not children of `RuntimeReplaceable` which is an `UnaryExpression`. So we can't trim the aliases out by simple transforming the expressions in `CleanupAliases`. If we want to replace the non-children aliases in `RuntimeReplaceable`, we need to add more codes to `RuntimeReplaceable` and modify all expressions of `RuntimeReplaceable`. It makes the interface ugly IMO. Consider those aliases will be replaced later at optimization and so they're no harm, this patch chooses to simply override `canonicalized` of `RuntimeReplaceable`. One concern is about `CleanupAliases`. Because it actually cannot clean up ALL aliases inside a plan. To make caller of this rule notice that, this patch adds a comment to `CleanupAliases`. ## How was this patch tested? Added test. Author: Liang-Chi Hsieh <[email protected]> Closes apache#18761 from viirya/SPARK-21555.
1 parent 60e9b2b commit 9c8109e

File tree

4 files changed

+28
-2
lines changed

4 files changed

+28
-2
lines changed

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala

Lines changed: 3 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -2234,7 +2234,9 @@ object EliminateUnions extends Rule[LogicalPlan] {
22342234
/**
22352235
* Cleans up unnecessary Aliases inside the plan. Basically we only need Alias as a top level
22362236
* expression in Project(project list) or Aggregate(aggregate expressions) or
2237-
* Window(window expressions).
2237+
* Window(window expressions). Notice that if an expression has other expression parameters which
2238+
* are not in its `children`, e.g. `RuntimeReplaceable`, the transformation for Aliases in this
2239+
* rule can't work for those parameters.
22382240
*/
22392241
object CleanupAliases extends Rule[LogicalPlan] {
22402242
private def trimAliases(e: Expression): Expression = {

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala

Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -241,6 +241,10 @@ trait RuntimeReplaceable extends UnaryExpression with Unevaluable {
241241
override def nullable: Boolean = child.nullable
242242
override def foldable: Boolean = child.foldable
243243
override def dataType: DataType = child.dataType
244+
// As this expression gets replaced at optimization with its `child" expression,
245+
// two `RuntimeReplaceable` are considered to be semantically equal if their "child" expressions
246+
// are semantically equal.
247+
override lazy val canonicalized: Expression = child.canonicalized
244248
}
245249

246250

sql/core/src/test/resources/sql-tests/inputs/sql-compatibility-functions.sql

Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -23,3 +23,7 @@ SELECT float(1), double(1), decimal(1);
2323
SELECT date("2014-04-04"), timestamp(date("2014-04-04"));
2424
-- error handling: only one argument
2525
SELECT string(1, 2);
26+
27+
-- SPARK-21555: RuntimeReplaceable used in group by
28+
CREATE TEMPORARY VIEW tempView1 AS VALUES (1, NAMED_STRUCT('col1', 'gamma', 'col2', 'delta')) AS T(id, st);
29+
SELECT nvl(st.col1, "value"), count(*) FROM from tempView1 GROUP BY nvl(st.col1, "value");

sql/core/src/test/resources/sql-tests/results/sql-compatibility-functions.sql.out

Lines changed: 17 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,5 +1,5 @@
11
-- Automatically generated by SQLQueryTestSuite
2-
-- Number of queries: 13
2+
-- Number of queries: 15
33

44

55
-- !query 0
@@ -122,3 +122,19 @@ struct<>
122122
-- !query 12 output
123123
org.apache.spark.sql.AnalysisException
124124
Function string accepts only one argument; line 1 pos 7
125+
126+
127+
-- !query 13
128+
CREATE TEMPORARY VIEW tempView1 AS VALUES (1, NAMED_STRUCT('col1', 'gamma', 'col2', 'delta')) AS T(id, st)
129+
-- !query 13 schema
130+
struct<>
131+
-- !query 13 output
132+
133+
134+
135+
-- !query 14
136+
SELECT nvl(st.col1, "value"), count(*) FROM from tempView1 GROUP BY nvl(st.col1, "value")
137+
-- !query 14 schema
138+
struct<nvl(tempview1.`st`.`col1` AS `col1`, 'value'):string,FROM:bigint>
139+
-- !query 14 output
140+
gamma 1

0 commit comments

Comments
 (0)