Struct casting field order by brancz · Pull Request #8871 · apache/arrow-rs

brancz · 2025-11-19T09:51:20Z

Which issue does this PR close?

Closes #8870.

What changes are included in this PR?

Check whether the field order of the source and target struct types matches; if it does not, attempt to match the fields by name.
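A minimal sketch of the two-step check described above. It uses a simplified stand-in for the arrow-schema types: `Field` here is a hypothetical struct with a name and a toy `Ty` enum, and `can_cast_scalar` is a hypothetical placeholder for calling `can_cast_types` on the child types. This is not the code merged in this PR. It only illustrates the fast path (same names, same order) and the slow path (look each target field up by name), and it assumes both paths require the same number of fields.

```rust
/// Toy stand-in for arrow's `DataType` (hypothetical, for illustration only).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Ty {
    Boolean,
    Int32,
    Int64,
    Utf8,
}

/// Toy stand-in for arrow's `Field` (hypothetical).
pub struct Field {
    pub name: &'static str,
    pub ty: Ty,
}

/// Hypothetical placeholder for `can_cast_types` on child types:
/// identical types, Int32 -> Int64, and anything -> Utf8 are castable.
pub fn can_cast_scalar(from: Ty, to: Ty) -> bool {
    from == to || matches!((from, to), (Ty::Int32, Ty::Int64) | (_, Ty::Utf8))
}

/// Sketch of the struct rule: fast path by position, slow path by name.
pub fn can_cast_struct(from: &[Field], to: &[Field]) -> bool {
    // Assumption: both paths require the same number of fields.
    if from.len() != to.len() {
        return false;
    }
    // Fast path: all field names are in the same order, so pair fields
    // by position without any lookups.
    if from.iter().zip(to).all(|(f, t)| f.name == t.name) {
        return from.iter().zip(to).all(|(f, t)| can_cast_scalar(f.ty, t.ty));
    }
    // Slow path: find each target field in the source by name.
    to.iter().all(|t| {
        from.iter()
            .find(|f| f.name == t.name)
            .is_some_and(|f| can_cast_scalar(f.ty, t.ty))
    })
}
```

Because the positional comparison runs first, casts whose names already line up never pay for the name lookups.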

Are these changes tested?

Yes. Added unit tests (they failed before this change, so I put them in a separate commit).

Are there any user-facing changes?

No, it's strictly additive functionality.

@alamb @vegarsti

vegarsti

Nice! This looks good to me. Nice feature improvement.

vegarsti · 2025-11-19T09:52:17Z

arrow-cast/src/cast/mod.rs

@@ -221,12 +221,34 @@ pub fn can_cast_types(from_type: &DataType, to_type: &DataType) -> bool {
            Decimal32(_, _) | Decimal64(_, _) | Decimal128(_, _) | Decimal256(_, _),
        ) => true,
        (Struct(from_fields), Struct(to_fields)) => {
-            from_fields.len() == to_fields.len()
-                && from_fields.iter().zip(to_fields.iter()).all(|(f1, f2)| {
+            // fast path, all field names are in the same order and same number of fields


This comment should be on line 228, I believe

Indeed, good catch! The order was a bit different when I first wrote that comment, haha.

alamb

Thanks @brancz -- my only concern is the changes to the existing tests. Otherwise it looks good to me.

alamb · 2025-11-19T18:06:45Z

arrow-cast/src/cast/mod.rs

+                });
+            }
+
+            // slow path, we match the fields by name


I think one idea that has come up in the past is to compute this field mapping once and then use it for both can_cast_types and cast.

However, this seems strictly better than current main (it doesn't slow down existing code and allows more uses), so 👍 from me.
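A minimal sketch of the "compute the mapping once" idea, under simplifying assumptions: fields are plain name slices, a struct's child columns are `Vec<i64>` values, and child type casting is elided. `field_mapping`, `can_cast` and `cast_columns` are hypothetical helpers, not arrow-rs APIs. The mapping records, for each target field, the index of the source field it reads from; `can_cast` only checks that a mapping exists, and the cast reuses the same mapping to reorder children.

```rust
/// For each target field, the index of the source field it is read from.
/// Returns `None` if some target field has no source field (hypothetical helper).
pub fn field_mapping(from: &[&str], to: &[&str]) -> Option<Vec<usize>> {
    if from.len() != to.len() {
        return None;
    }
    // Fast path: identical order gives the identity mapping.
    if from == to {
        return Some((0..to.len()).collect());
    }
    // Slow path: look up each target name in the source; any miss yields None.
    to.iter()
        .map(|t| from.iter().position(|f| f == t))
        .collect()
}

/// The capability check only needs to know whether a mapping exists.
pub fn can_cast(from: &[&str], to: &[&str]) -> bool {
    field_mapping(from, to).is_some()
}

/// The cast reuses the same mapping to reorder child columns
/// (child type casting is elided in this sketch).
pub fn cast_columns(columns: &[Vec<i64>], mapping: &[usize]) -> Vec<Vec<i64>> {
    mapping.iter().map(|&i| columns[i].clone()).collect()
}
```

The price of this design is that the mapping is a heap-allocated `Vec`, even when the names already line up.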

I was also going back and forth, but I decided that an additional allocation in the average case would cost far more than the extra name comparisons. We're heavily profiling all these code paths; if it turns out to be significant, we will come back and improve the performance! 😄
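To make the allocation-versus-comparison trade-off concrete, here is a hedged sketch of two ways to check that every target field name exists in the source. Neither is taken from arrow-rs, and both function names are hypothetical. The scan performs up to n × m string comparisons but allocates nothing, while the index pays for a `HashSet` allocation up front to get constant-time lookups. For the small field counts common in structs the scan is plausibly the cheaper option, but only profiling can confirm that for a given workload.

```rust
use std::collections::HashSet;

/// No heap allocation: for each target name, scan the source names linearly.
pub fn all_found_by_scan(from: &[&str], to: &[&str]) -> bool {
    to.iter().all(|t| from.contains(t))
}

/// Builds a `HashSet` of source names first: one allocation, then
/// constant-time lookups for each target name.
pub fn all_found_by_index(from: &[&str], to: &[&str]) -> bool {
    let names: HashSet<&str> = from.iter().copied().collect();
    to.iter().all(|t| names.contains(t))
}
```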

alamb · 2025-11-19T18:07:41Z

arrow-cast/src/cast/mod.rs

        let struct_array = StructArray::from(vec![
            (
-                Arc::new(Field::new("b", DataType::Boolean, false)),
+                Arc::new(Field::new("a", DataType::Boolean, false)),


why this change?

It turns out these tests were wrong to begin with. Have a look at the column names: how can a/b be cast to b/c? They only ever worked by accident, and now that we check whether the names match, they needed to be fixed.

Would other people consider this a regression (i.e. do they expect struct fields to be matched in order rather than by name)? 🤔

It's possible someone relied on the incorrect behavior because it happened to work, but I don't think that should stop us from fixing what's clearly a bug. When the previous behavior happens to work even though the field names mismatch or appear in a different order, it can cause well-hidden, unexpected results. I can't see a valid use case for it: all valid casts can still be expressed, and on a total mismatch the user now gets a proper error instead of the cast potentially continuing silently.
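A small sketch of the difference being discussed, using hypothetical helpers on plain name slices (not arrow-rs code): positional pairing silently matches `a` to `b` and `b` to `c`, whereas name-based pairing reports the missing field as an error.

```rust
/// Positional pairing: pairs fields by index and never looks at names.
pub fn pair_by_position<'a>(from: &[&'a str], to: &[&'a str]) -> Vec<(&'a str, &'a str)> {
    from.iter().copied().zip(to.iter().copied()).collect()
}

/// Name-based pairing: every target field must exist in the source,
/// otherwise the first missing field is reported as an error.
pub fn pair_by_name<'a>(
    from: &[&'a str],
    to: &[&'a str],
) -> Result<Vec<(&'a str, &'a str)>, String> {
    to.iter()
        .map(|&t| {
            from.iter()
                .copied()
                .find(|&f| f == t)
                .map(|f| (f, t))
                .ok_or_else(|| format!("field '{t}' not found in source struct"))
        })
        .collect()
}
```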

I believe this actually caused a regression downstream in DataFusion when I was testing:

Regression in struct casting in 57.2.0 (not yet released) #9005

I think we need to fix it prior to release

alamb · 2025-11-19T18:07:55Z

arrow-cast/src/cast/mod.rs

        let struct_array = StructArray::from(vec![
            (
-                Arc::new(Field::new("b", DataType::Boolean, false)),
+                Arc::new(Field::new("a", DataType::Boolean, false)),


Likewise, why was this test changed?

alamb

Thank you for this contribution @brancz

alamb · 2025-11-25T11:27:57Z

Thanks again @brancz

Which issue does this PR close?

Closes #9005.

Rationale for this change

Avoid breaking something in a patch release.

What changes are included in this PR?

Bring back in-order casting for structs that have the same number of fields.

Are these changes tested?

Yes, the tests that were modified in #8871 were reverted.

Are there any user-facing changes?

It brings back functionality.
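One possible reading of the restored behavior, as a sketch: require equal field counts, try name-based matching first, and if that fails, fall back to positional casting as before #8871. The precedence here is an assumption inferred from the description above, not taken from the #9007 diff; `Plan` and `plan_struct_cast` are hypothetical names.

```rust
/// How the children of a struct cast would be paired (hypothetical).
#[derive(Debug, PartialEq)]
pub enum Plan {
    /// For each target field, the index of the same-named source field.
    ByName(Vec<usize>),
    /// Pair fields by index, ignoring names (the pre-#8871 behavior).
    ByPosition,
}

pub fn plan_struct_cast(from: &[&str], to: &[&str]) -> Option<Plan> {
    // Assumption: field counts must match for any struct cast.
    if from.len() != to.len() {
        return None;
    }
    // First try to find every target field in the source by name.
    let by_name: Option<Vec<usize>> = to
        .iter()
        .map(|t| from.iter().position(|f| f == t))
        .collect();
    if let Some(mapping) = by_name {
        return Some(Plan::ByName(mapping));
    }
    // Assumed fallback: names do not line up, so cast in order as before.
    Some(Plan::ByPosition)
}
```

Under this reading, the a/b to b/c tests that #8871 had modified would pass again through the positional fallback.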

arrow-cast: Add test cases for various order variants of structs

a01a465

github-actions bot added the arrow label Nov 19, 2025

brancz force-pushed the struct-casting-field-order branch from 959a210 to 808c285 November 19, 2025 09:55

vegarsti approved these changes Nov 19, 2025

arrow-cast: Attempt finding struct field on inconsistent order

001f740

brancz force-pushed the struct-casting-field-order branch from 808c285 to 001f740 November 19, 2025 10:08

alamb reviewed Nov 19, 2025

alamb approved these changes Nov 19, 2025

alamb merged commit 7e637a7 into apache:main Nov 25, 2025
26 checks passed

brancz mentioned this pull request Nov 27, 2025

fix: correctly handle schema evolution in DF vortex-data/vortex#5555

Closed

brancz deleted the struct-casting-field-order branch December 16, 2025 14:41

alamb mentioned this pull request Dec 16, 2025

Regression in struct casting in 57.2.0 (not yet released) #9005

Closed

brancz mentioned this pull request Dec 16, 2025

arrow-cast: Bring back in-order field casting for StructArray #9007

Merged

alamb mentioned this pull request Dec 18, 2025

Upgrade DataFusion to arrow-rs/parquet 57.2.0 apache/datafusion#19355

Merged

alamb mentioned this pull request Jan 6, 2026

Struct casting requires same order of fields #8870

Closed
