Conversation
959a210 to
808c285
Compare
arrow-cast/src/cast/mod.rs
Outdated
| @@ -221,12 +221,34 @@ pub fn can_cast_types(from_type: &DataType, to_type: &DataType) -> bool { | |||
| Decimal32(_, _) | Decimal64(_, _) | Decimal128(_, _) | Decimal256(_, _), | |||
| ) => true, | |||
| (Struct(from_fields), Struct(to_fields)) => { | |||
| from_fields.len() == to_fields.len() | |||
| && from_fields.iter().zip(to_fields.iter()).all(|(f1, f2)| { | |||
| // fast path, all field names are in the same order and same number of fields | |||
There was a problem hiding this comment.
This comment should be on line 228, I believe
There was a problem hiding this comment.
indeed, good catch the order was a bit different at the start when I wrote that comment haha
808c285 to
001f740
Compare
| }); | ||
| } | ||
|
|
||
| // slow path, we match the fields by name |
There was a problem hiding this comment.
I think one idea that has come up in the past is to do this mapping calculation once and then use it for both can_cast_types and cast
However, this seems to be strictly better than current main (doesn't slow down existing code and allows more uses, so 👍 to me)
There was a problem hiding this comment.
I was also going back and forth, but I decided that an additional allocation in the average case would be far worse in perf cost than comparisons. We're heavily profiling all these code paths, if it's significant, we will come back and improve the perf! 😄
| let struct_array = StructArray::from(vec![ | ||
| ( | ||
| Arc::new(Field::new("b", DataType::Boolean, false)), | ||
| Arc::new(Field::new("a", DataType::Boolean, false)), |
There was a problem hiding this comment.
It turns out these tests were actually wrong to begin with, have a look at the names of the columns, how can a/b be cast to b/c? They only ever worked by accident, and now that we test whether they match, they needed to be fixed.
There was a problem hiding this comment.
Would other people find this a regression (aka that they expect struct fields to be treated in order, rather than by name) 🤔
There was a problem hiding this comment.
It’s possible someone had it incorrect with it accidentally working but I don’t think that should stop fixing what’s clearly a bug. The previous behavior can actually cause very hidden unexpected behavior when it does accidentally work but field names mismatch or have a different order. I can’t see a valid use case for the incorrect previous behavior and all valid behaviors can be represented still and on total mismatches a user will even get a proper error now instead of potentially silently continuing.
There was a problem hiding this comment.
I believe this actually caused a regression downstream in DataFusion when I was testing:
I think we need to fix it prior to release
| let struct_array = StructArray::from(vec![ | ||
| ( | ||
| Arc::new(Field::new("b", DataType::Boolean, false)), | ||
| Arc::new(Field::new("a", DataType::Boolean, false)), |
There was a problem hiding this comment.
likewise why was this test changed?
|
Thanks again @brancz |
# Which issue does this PR close? Closes #9005 # Rationale for this change Not break something in a patch release. # What changes are included in this PR? Bring back in-order casting for structs that have equal field numbers. # Are these changes tested? Yes, the tests that were modified in #8871 were reverted back. # Are there any user-facing changes? It brings back functionality.
Which issue does this PR close?
Closes #8870.
What changes are included in this PR?
Check if field order in from/to casting matches, and if not, attempt to find the fields by name.
Are these changes tested?
Added unit tests (that previously failed, so I separated them in a commit).
Are there any user-facing changes?
No, it's strictly additive functionality.
@alamb @vegarsti