Commit e2ce0ca
[SPARK-17618] Fix invalid comparisons between UnsafeRow and other row formats
## What changes were proposed in this pull request?
This patch addresses a correctness bug in Spark 1.6.x in where `coalesce()` declares that it can process `UnsafeRows` but mis-declares that it always outputs safe rows. If UnsafeRow and other Row types are compared for equality then we will get spurious `false` comparisons, leading to wrong answers in operators which perform whole-row comparison (such as `distinct()` or `except()`). An example of a query impacted by this bug is given in the [JIRA ticket](https://issues.apache.org/jira/browse/SPARK-17618).
The problem is that the validity of our row format conversion rules depends on operators which handle `unsafeRows` (signalled by overriding `canProcessUnsafeRows`) correctly reporting their output row format (which is done by overriding `outputsUnsafeRows`). In apache#9024, we overrode `canProcessUnsafeRows` but forgot to override `outputsUnsafeRows`, leading to the incorrect `equals()` comparison.
Our interface design is flawed because correctness depends on operators correctly overriding multiple methods this problem could have been prevented by a design which coupled row format methods / metadata into a single method / class so that all three methods had to be overridden at the same time.
This patch addresses this issue by adding missing `outputsUnsafeRows` overrides. In order to ensure that bugs in this logic are uncovered sooner, I have modified `UnsafeRow.equals()` to throw an `IllegalArgumentException` if it is called with an object that is not an `UnsafeRow`.
## How was this patch tested?
I believe that the stronger misuse-checking in `UnsafeRow.equals()` is sufficient to detect and prevent this class of bug.
Author: Josh Rosen <[email protected]>
Closes apache#15185 from JoshRosen/SPARK-17618.1 parent 7aded55 commit e2ce0ca
File tree
3 files changed
+12
-3
lines changed- sql
- catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions
- core/src/main/scala/org/apache/spark/sql/execution
3 files changed
+12
-3
lines changedLines changed: 6 additions & 1 deletion
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
30 | 30 | | |
31 | 31 | | |
32 | 32 | | |
| 33 | + | |
33 | 34 | | |
34 | 35 | | |
35 | 36 | | |
| |||
610 | 611 | | |
611 | 612 | | |
612 | 613 | | |
| 614 | + | |
| 615 | + | |
| 616 | + | |
| 617 | + | |
| 618 | + | |
613 | 619 | | |
614 | | - | |
615 | 620 | | |
616 | 621 | | |
617 | 622 | | |
| |||
Lines changed: 1 addition & 0 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
96 | 96 | | |
97 | 97 | | |
98 | 98 | | |
| 99 | + | |
99 | 100 | | |
100 | 101 | | |
101 | 102 | | |
| |||
Lines changed: 5 additions & 2 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
251 | 251 | | |
252 | 252 | | |
253 | 253 | | |
| 254 | + | |
254 | 255 | | |
255 | 256 | | |
256 | 257 | | |
| |||
319 | 320 | | |
320 | 321 | | |
321 | 322 | | |
| 323 | + | |
322 | 324 | | |
323 | 325 | | |
324 | 326 | | |
325 | 327 | | |
326 | 328 | | |
327 | 329 | | |
328 | 330 | | |
329 | | - | |
| 331 | + | |
330 | 332 | | |
331 | | - | |
| 333 | + | |
332 | 334 | | |
| 335 | + | |
333 | 336 | | |
334 | 337 | | |
335 | 338 | | |
| |||
0 commit comments