Commit 2ed6e7b
[SPARK-36677][SQL] NestedColumnAliasing should not push down aggregate functions into projections
### What changes were proposed in this pull request?
This PR filters out `ExtractValues`s that contains any aggregation function in the `NestedColumnAliasing` rule to prevent cases where aggregations are pushed down into projections.
### Why are the changes needed?
To handle a corner/missed case in `NestedColumnAliasing` that can cause users to encounter a runtime exception.
Consider the following schema:
```
root
|-- a: struct (nullable = true)
| |-- c: struct (nullable = true)
| | |-- e: string (nullable = true)
| |-- d: integer (nullable = true)
|-- b: string (nullable = true)
```
and the query:
`SELECT MAX(a).c.e FROM (SELECT a, b FROM test_aggregates) GROUP BY b`
Executing the query before this PR will result in the error:
```
java.lang.UnsupportedOperationException: Cannot generate code for expression: max(input[0, struct<c:struct<e:string>,d:int>, true])
at org.apache.spark.sql.errors.QueryExecutionErrors$.cannotGenerateCodeForExpressionError(QueryExecutionErrors.scala:83)
at org.apache.spark.sql.catalyst.expressions.Unevaluable.doGenCode(Expression.scala:312)
at org.apache.spark.sql.catalyst.expressions.Unevaluable.doGenCode$(Expression.scala:311)
at org.apache.spark.sql.catalyst.expressions.aggregate.AggregateExpression.doGenCode(interfaces.scala:99)
...
```
The optimised plan before this PR is:
```
'Aggregate [b#1], [_extract_e#5 AS max(a).c.e#3]
+- 'Project [max(a#0).c.e AS _extract_e#5, b#1]
+- Relation default.test_aggregates[a#0,b#1] parquet
```
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
A new unit test in `NestedColumnAliasingSuite`. The test consists of the repro mentioned earlier.
The produced optimized plan is checked for equivalency with a plan of the form:
```
Aggregate [b#452], [max(a#451).c.e AS max('a)[c][e]#456]
+- LocalRelation <empty>, [a#451, b#452]
```
Closes #33921 from vicennial/spark-36677.
Authored-by: Venkata Sai Akhil Gudesa <[email protected]>
Signed-off-by: Liang-Chi Hsieh <[email protected]>1 parent 5a0ae69 commit 2ed6e7b
File tree
2 files changed
+40
-2
lines changed- sql/catalyst/src
- main/scala/org/apache/spark/sql/catalyst/optimizer
- test/scala/org/apache/spark/sql/catalyst/optimizer
2 files changed
+40
-2
lines changedLines changed: 13 additions & 2 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
21 | 21 | | |
22 | 22 | | |
23 | 23 | | |
| 24 | + | |
24 | 25 | | |
25 | 26 | | |
26 | 27 | | |
| |||
258 | 259 | | |
259 | 260 | | |
260 | 261 | | |
| 262 | + | |
| 263 | + | |
| 264 | + | |
| 265 | + | |
| 266 | + | |
| 267 | + | |
| 268 | + | |
261 | 269 | | |
262 | 270 | | |
263 | 271 | | |
| |||
268 | 276 | | |
269 | 277 | | |
270 | 278 | | |
271 | | - | |
| 279 | + | |
| 280 | + | |
| 281 | + | |
| 282 | + | |
272 | 283 | | |
273 | 284 | | |
274 | 285 | | |
275 | 286 | | |
276 | 287 | | |
277 | 288 | | |
278 | 289 | | |
279 | | - | |
| 290 | + | |
280 | 291 | | |
281 | 292 | | |
282 | 293 | | |
| |||
Lines changed: 27 additions & 0 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
20 | 20 | | |
21 | 21 | | |
22 | 22 | | |
| 23 | + | |
23 | 24 | | |
24 | 25 | | |
25 | 26 | | |
| |||
763 | 764 | | |
764 | 765 | | |
765 | 766 | | |
| 767 | + | |
| 768 | + | |
| 769 | + | |
| 770 | + | |
| 771 | + | |
| 772 | + | |
| 773 | + | |
| 774 | + | |
| 775 | + | |
| 776 | + | |
| 777 | + | |
| 778 | + | |
| 779 | + | |
| 780 | + | |
| 781 | + | |
| 782 | + | |
| 783 | + | |
| 784 | + | |
| 785 | + | |
| 786 | + | |
| 787 | + | |
| 788 | + | |
| 789 | + | |
| 790 | + | |
| 791 | + | |
| 792 | + | |
766 | 793 | | |
767 | 794 | | |
768 | 795 | | |
| |||
0 commit comments