Commit f0cad7a
[SPARK-40588] FileFormatWriter materializes AQE plan before accessing outputOrdering
### What changes were proposed in this pull request?
`FileFormatWriter` now materializes an `AdaptiveQueryPlan` before accessing the plan's `outputOrdering`. This is required for Spark 3.0 to 3.3. Spark 3.4 does not need this because there `FileFormatWriter` already gets the final plan.
### Why are the changes needed?
`FileFormatWriter` adds its own sort if the plan being written does not already provide the required ordering. In Spark 3.0 to 3.3, an `AdaptiveQueryPlan` does not know its final ordering, so `FileFormatWriter` enforces the required ordering (e.g. by column `"a"`) even if the plan provides a compatible ordering (e.g. by columns `"a", "b"`). In case of spilling, the ordering provided by the plan (e.g. by columns `"a", "b"`) gets broken (see SPARK-40588).
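The decision described above can be illustrated with a small, simplified model. This is only a sketch, not Spark's code: the function names are hypothetical, orderings are modelled as plain lists of column names, and it is an assumption of this sketch that a not yet materialized adaptive plan reports an empty ordering.

```python
from typing import Sequence


def ordering_satisfied(required: Sequence[str], actual: Sequence[str]) -> bool:
    """Return True if `actual` starts with `required`.

    Data sorted by ("a", "b") is also sorted by ("a",), so a required
    ordering is satisfied when it is a prefix of the actual ordering.
    """
    n = len(required)
    return n <= len(actual) and list(actual[:n]) == list(required)


def needs_extra_sort(required: Sequence[str], reported: Sequence[str]) -> bool:
    """Simplified stand-in for the writer's decision to add its own sort."""
    return not ordering_satisfied(required, reported)


required = ["a"]
final_ordering = ["a", "b"]   # ordering of the materialized plan
unmaterialized_ordering = []  # assumption: the non-final plan reports none

# Before the fix: the writer only sees the unknown ordering and adds a sort.
print(needs_extra_sort(required, unmaterialized_ordering))  # True

# After the fix: the writer sees the final ordering and adds no sort.
print(needs_extra_sort(required, final_ordering))  # False
```

In this model, materializing the plan first is what turns the first case into the second.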
### Does this PR introduce _any_ user-facing change?
This fixes SPARK-40588, which was introduced in 3.0. This restores behaviour from Spark 2.4.
### How was this patch tested?
The final plan that is written to files cannot be extracted from `FileFormatWriter`. The bug explained in [SPARK-40588](https://issues.apache.org/jira/browse/SPARK-40588) can only be asserted on the result files when spilling occurs. This is very hard to control in a unit test scenario.
Therefore, this was tested manually. The [example to reproduce this issue](https://issues.apache.org/jira/browse/SPARK-40588?focusedCommentId=17621032&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17621032) given in SPARK-40588 now produces sorted files.
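Since the bug is only visible in the written files, a manual check could look roughly like the following sketch. It is not part of this patch. It assumes the result files are header-less CSV with integer columns only, and the helper names are hypothetical.

```python
import csv
from typing import Iterable, Iterator, Tuple


def parse_rows(lines: Iterable[str]) -> Iterator[Tuple[int, ...]]:
    """Parse header-less CSV lines into tuples of integers."""
    for record in csv.reader(lines):
        if record:  # skip empty lines
            yield tuple(int(value) for value in record)


def is_sorted(rows: Iterable[Tuple[int, ...]]) -> bool:
    """Return True if rows appear in non-decreasing lexicographic order."""
    previous = None
    for row in rows:
        if previous is not None and row < previous:
            return False
        previous = row
    return True


def file_is_sorted(path: str) -> bool:
    """Check a single result file; `path` points at one written part file."""
    with open(path, newline="") as handle:
        return is_sorted(parse_rows(handle))


# In-memory examples standing in for file contents.
print(is_sorted(parse_rows(["0,1", "0,2", "1,0"])))  # True
print(is_sorted(parse_rows(["0,2", "0,1", "1,0"])))  # False
```

Running `file_is_sorted` over every part file of the output directory would flag files whose rows are out of order.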
The actual plan written into the files changed from
```
Sort [input[0, bigint, false] ASC NULLS FIRST], false, 0
+- AdaptiveSparkPlan isFinalPlan=false
   +- Sort [day#2L ASC NULLS FIRST, id#4L ASC NULLS FIRST], false, 0
      +- Exchange hashpartitioning(day#2L, 2), REPARTITION_BY_NUM, [id=#30]
         +- BroadcastNestedLoopJoin BuildLeft, Inner
            :- BroadcastExchange IdentityBroadcastMode, [id=#28]
            :  +- Project [id#0L AS day#2L]
            :     +- Range (0, 2, step=1, splits=2)
            +- Range (0, 10000000, step=1, splits=2)
```
where `FileFormatWriter` enforces order with `Sort [input[0, bigint, false] ASC NULLS FIRST], false, 0`, to
```
*(3) Sort [day#2L ASC NULLS FIRST, id#4L ASC NULLS FIRST], false, 0
+- AQEShuffleRead coalesced
   +- ShuffleQueryStage 1
      +- Exchange hashpartitioning(day#2L, 200), REPARTITION_BY_COL, [id=#68]
         +- *(2) BroadcastNestedLoopJoin BuildLeft, Inner
            :- BroadcastQueryStage 0
            :  +- BroadcastExchange IdentityBroadcastMode, [id=#42]
            :     +- *(1) Project [id#0L AS day#2L]
            :        +- *(1) Range (0, 2, step=1, splits=2)
            +- *(2) Range (0, 1000000, step=1, splits=2)
```
where the sort given by the user is the outermost sort now.
Closes #38358 from EnricoMi/branch-3.3-materialize-aqe-plan.
Authored-by: Enrico Minack <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>

1 parent ef74381 commit f0cad7a
File tree: 2 files changed (+16, -4 lines)
- sql/core/src/main/scala/org/apache/spark/sql/execution
  - adaptive
  - datasources

Lines changed: 2 additions & 0 deletions
Lines changed: 14 additions & 4 deletions