Skip to content

Comments

Sort table columns by name when dumping schema#53281

Merged
byroot merged 1 commit intorails:mainfrom
jduff:schema_dumper_sort_columns
Jan 7, 2025
Merged

Sort table columns by name when dumping schema#53281
byroot merged 1 commit intorails:mainfrom
jduff:schema_dumper_sort_columns

Conversation

@jduff
Copy link
Contributor

@jduff jduff commented Oct 12, 2024

Motivation / Background

In projects when multiple people are creating migrations they can often be run out of order from one another, this results in each developer machine having columns added in a different order. Each time we run migrations the schema is dumped based on our local database base so the schema output has columns out of order from machine to machine resulting in noisy diffs and potentially conflicts in the schema file that make no difference overall. ex:

     t.string "county"
-    t.string "country_code"
     t.datetime "created_at", null: false
     t.datetime "updated_at", null: false
+    t.string "country_code"
     t.string "urn"

Detail

This PR adds an option to sort the columns before outputting to the table dump. I found this previous PR which does this just for MySQL, but I decided to do this at the base schema dumper level as an option because there could be cases where teams don't want to do this (MySQL being one of them since you can add columns relative to existing columns).

Additional information

If this approach is accepted I think making a config option to set this would be a good idea and it could be useful to make enabling this the default depending on the database you ere using (ex enabled by default for postgresql and sqlite3, not for mysql).

Checklist

Before submitting the PR make sure the following are checked:

  • This Pull Request is related to one change. Unrelated changes should be opened in separate PRs.
  • Commit message has a detailed description of what changed and why. If this PR fixes a related issue include it in the commit message. Ex: [Fix #issue-number]
  • Tests are added or updated if you fix a bug or add a feature.
  • CHANGELOG files are updated for the changed libraries if there is a behavior change or additional feature. Minor bug fixes and documentation changes should not be included.

@byroot
Copy link
Member

byroot commented Nov 16, 2024

I think it makes sense, anything that helps with schema.rb flip-floping is welcome, but:

  • I'm not sure if it should be even be an option?
  • If it's an option it probably should be part of load_defaults
  • I think I'd like a second opinion, as it's a big user facing change. @matthewd or @dhh perhaps? Feels like it something you'd care about.

I also wonder if ignoring the column other could result in bug. I think as long as you use higher level Active Record APIs you're fine, but there might be some really intricate cases where code is dependent on columns ordering?

@jduff
Copy link
Contributor Author

jduff commented Jan 3, 2025

@fatkodima i saw your change to reduce merge conflicts in schema #53797 and wondering what you think about the change I have here. Have you noticed issues with columns changing order in the schema.rb and would the change I've made here help your team with that? Happy to make changes here too if you had suggestions.

@byroot
Copy link
Member

byroot commented Jan 5, 2025

#53797 is about structure.sql, not schema.rb.

@byroot
Copy link
Member

byroot commented Jan 5, 2025

I didn't get some other opinion, but I think this change is OK. It just need to be added in load_defaults, have doc in configuring.md and all that.

@jduff
Copy link
Contributor Author

jduff commented Jan 5, 2025

@byroot ah, sorry, I misread the PR and didn't look at the diff itself.

I'll update with those changes and then ask for another review, thanks!

@matthewd
Copy link
Member

matthewd commented Jan 6, 2025

Sorry for the delayed reply 😅

Yeah, I worry partly about the pluck("*") case 1, and partly about people who've used MySQL's ADD COLUMN .. AFTER .. syntax to be very conscious of their column order... but mostly just the aesthetic loss: most of the time, most tables' most important columns all appear together, in a semi-intelligent cluster, at the start of the list (because they were added first, and in groups). And even ignoring the affect on DBs populated by future schema loads, this file is itself generally the human reference point for what columns [and thus attributes] are available on tables/models.

Footnotes

  1. notably, we're not just hiding the fact that two existing databases have different ordering between two semi-concurrently added columns: over time, we'd be guaranteeing that every DB instance has a column ordering unique to the moment it was loaded from the schema.

@jduff
Copy link
Contributor Author

jduff commented Jan 6, 2025

@matthewd thanks for the feedback, I kind of worry about that too but I also think it's already happening and people just ignore those diffs. Either way I'll make it something that can be disabled so that people can opt out of it if they want.

@dhh
Copy link
Member

dhh commented Jan 6, 2025

This should not be an option. This should just be default behavior. Schema dumping should be deterministic regardless of the platform.

@jduff jduff force-pushed the schema_dumper_sort_columns branch from 48a539c to d430b58 Compare January 7, 2025 00:49
@jduff
Copy link
Contributor Author

jduff commented Jan 7, 2025

@dhh sounds good! Just updated the PR to remove the option and always sort the columns when dumping. If you want any other changes let me know.

@byroot byroot force-pushed the schema_dumper_sort_columns branch from d430b58 to e1adacc Compare January 7, 2025 10:39
@byroot byroot force-pushed the schema_dumper_sort_columns branch from e1adacc to e403adf Compare January 7, 2025 10:56
@byroot
Copy link
Member

byroot commented Jan 7, 2025

I rebased your branch because it was quite outdated. I also added a changelog entry, I'll merge on green.

@byroot byroot merged commit 1edb808 into rails:main Jan 7, 2025
1 check passed
@jduff
Copy link
Contributor Author

jduff commented Jan 7, 2025

Thanks @byroot!

@matthewd
Copy link
Member

matthewd commented Jan 7, 2025

Agree there's no need for a separate option here: this can easily be left to the category of low-level schema details where, if you need to reproduce your database more precisely than our schema dump, that's what the structure.sql dump is for.

@jakeonrails
Copy link

Hi! I wrote the original fix-db-schema-conflicts gem to solve this problem 11 years ago.

Over time we ended up adding several other sorts:

  • columns
  • extensions
  • foreign keys
  • indexes

Should we add sorts to those other things too?

@jduff
Copy link
Contributor Author

jduff commented Jan 18, 2025

@jakeonrails I had looked at a few of those when I made my change and it seemed like indexes and foreign keys were already sorted. Feel free to take another look though in case I missed something (I didn't look at extensions). Since we're doing columns now I don't see any reason not to do the others.

@mackuba
Copy link
Contributor

mackuba commented Mar 30, 2025

Hmm… so I wanted to write up a PR with a bug fix since I discovered the schema re-sorts my table columns, but I've found this was changed on purpose…

The description says this adds an option to change the behavior like this - I would strongly vote against making this non-configurable, otherwise I think I'm just gonna have to monkey-patch this method. I want my table columns to be in the order I specified, not alphabetical. Yes, this makes the order deterministic, but it's now deterministically *wrong* ¯\_(ツ)_/¯

If this stays non-configurable, then I think you'd also need to remove support for after / before column options in migrations? Because otherwise they just don't work if schema overrides whatever you put there anyway…

mugitti9 added a commit to mugitti9/ridgepole that referenced this pull request Oct 23, 2025
…abetically in all test SQL and create_table definitions
mugitti9 added a commit to mugitti9/ridgepole that referenced this pull request Oct 29, 2025
…ls/rails#53281 by sorting columns alphabetically in all test SQL and create_table definitions
@tachyons
Copy link

tachyons commented Nov 9, 2025

Ordering column in the table is a known performance optimization in postgresql

https://www.percona.com/blog/postgresql-column-alignment-and-padding-how-to-improve-performance-with-smarter-table-design/

@repoles
Copy link

repoles commented Nov 10, 2025

This PR appears to diverge from the original proposal. Ideally, sorting columns alphabetically in schema.rb should be an optional behavior. After nearly two decades of allowing this flexibility, introducing a new default standard raises concerns.

For many developers, the order of columns in relational database tables is more than a stylistic preference. It often serves practical purposes. While some ordering choices are subjective, consistent patterns can significantly improve readability and workflow efficiency.

For example, I find it helpful to place foreign keys at the beginning. Knowing that id will always be the first column, and that created_at and updated_at will appear last, provides a predictable structure. Similarly, when reviewing address-related records, having fields follow a logical sequence — zip code, street name, number, neighborhood, city — makes navigation easier and reduces cognitive load.

Altering this established behavior could be seen as a step backward for many users who rely on this structure for clarity and productivity. It may be worth reconsidering whether enforcing alphabetical sorting by default is necessary or whether it could remain configurable to accommodate different workflows.

@mackuba
Copy link
Contributor

mackuba commented Nov 11, 2025

FYI, I have a PR here adding an option to restore the previous behavior of the dumper, I rebased it now on the latest version: #55414

@jweir
Copy link

jweir commented Jan 16, 2026

A real world example: we interact with customers that dump CSV files from their databases using a SELECT *. This is a breaking change as the development environment no longer reflects the production environment and different results are produced.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

10 participants