Do not rename struct fields when coercing types in CASE#14384
Do not rename struct fields when coercing types in CASE#14384alamb merged 5 commits intoapache:mainfrom
CASE#14384Conversation
CASE
| } | ||
| } | ||
|
|
||
| /// returns the result of coercing two fields to a common type |
There was a problem hiding this comment.
here is the fix. I suspect this doesn't matter in most cases as the result of a comparison is boolean so the names of the intermediate fields don't matter.
However, in a CASE (and union for that matter) the names do matter, so this PR preserves them
| { 'xxx': 'e' } -- column4: Struct with field xxx (no second field) | ||
| ); | ||
|
|
||
| # Note field names are in different orders |
There was a problem hiding this comment.
This test documents some strange behavior that I don't really know if is a bug or not
DataFusion treats the field orders as important, and when coercing two structs, it does it in field order (not by name)
So in this case { 'foo': 'a', 'xxx': 'b' }, and { 'xxx': 'c', 'foo': 'd' } are coerced to compare the values a /c and b / d (not the values a/d and b/c
This PR doesn't change this behavior, but I felt that was an important behavior to document in tests, so I did so
There was a problem hiding this comment.
- While working on Fix regression list Type Coercion List with inner type struct which has large/view types #14385 I found that this behaior behavior (using struct position rather than field name) is inconsistent
Specifically the basic case comparison goes through "comparison_coercion" (fixed in this PR) which uses the list field order
But when coercing list elements, that goes through type_union_resolution, (link) which matches fields by name (link)
@jayzhan211 is the eventual plan to unify thee code paths? Or maybe the case coercion should use type_union_resolution 🤔
There was a problem hiding this comment.
We follow how DuckDB works, I think both field order and name should be the same
There was a problem hiding this comment.
To be clear this PR doesn't change the existing behavior of comparison_coercion other than not renaming field names
|
@jayzhan211 as our resident coercion expert can you possibly review this bug? It is similar to #14385 |
| /// returns the result of coercing two fields to a common type | ||
| fn coerce_fields(common_type: DataType, lhs: &FieldRef, rhs: &FieldRef) -> FieldRef { | ||
| let is_nullable = lhs.is_nullable() || rhs.is_nullable(); | ||
| let name = lhs.name(); // pick the name from the left field |
There was a problem hiding this comment.
if the name is not the same, return error
There was a problem hiding this comment.
I agree that would be better -- can I do it in a follow on PR?
In my opinion this PR makes things better (doesn't lose field names) even though it is not perfect and I would like to fix the regression since 43
What I think we should do is switch comparison_coercion to use type_union_coercion eventually and then implement the correct coercion there
There was a problem hiding this comment.
Yes, I think type union resolution is the correct on for CASE
There was a problem hiding this comment.
I filed
to track the issue and plan to merge this one in when ready
Thank you @andygrove What I plan to do is to file a follow on ticket to address @jayzhan211 's (very valid) concern that the coercion should error if the fields have different names and merge this PR in tomorrow |
* Fix field name during struct equality coercion * fix bug * Add more tests * Update tests per apache#14396
Which issue does this PR close?
Casecoercion of Structs loses field names #14383Rationale for this change
When coercing fields for a struct, we shouldn't rename the fields to
c0,c1, etc. See #14383What changes are included in this PR?
Preserve the Struct field names
Are these changes tested?
Yes, sqllogictests are added
Are there any user-facing changes?
coerced struct field names do not revever to
c0, `c1, etc