Skip to content

Comments

Add optional setting to SchemaDumper to disable sorting of table columns#55414

Open
mackuba wants to merge 1 commit intorails:mainfrom
mackuba:schema-dumper-column-sort
Open

Add optional setting to SchemaDumper to disable sorting of table columns#55414
mackuba wants to merge 1 commit intorails:mainfrom
mackuba:schema-dumper-column-sort

Conversation

@mackuba
Copy link
Contributor

@mackuba mackuba commented Jul 28, 2025

Motivation / Background

Pull request #53281, merged in January, has changed the behavior of ActiveRecord::SchemaDumper to print table columns inside a create_table block in schema.db sorted alphabetically by name, instead of the actual order in the database.

The rationale was that when multiple people are adding migrations to the same table in parallel, they can end up with different order of the columns in their local databases, which then generate different versions of the schema, causing confusing diffs or merge conflicts. Putting columns in alphabetical order makes the order deterministic.

That is true, but it's also true that the order is now deterministically different than the actual order in the database… and sometimes, the order of the columns matters to some degree (otherwise, why have an :after / :before options in migrations at all?).

This PR intends to add a way to restore the behavior present in AR 8.0 and earlier, while keeping the new behavior as a default.

Detail

The PR adds a class field sort_table_columns in ActiveRecord::SchemaDumper, defaulting to true. If set to false, the table columns are printed in original order instead of sorted by name.

Additional information

I've added one additional test and checked the tests with rake test:sqlite3.

Alternative name I've considered was sort_columns_by_name, so we could go with that if you prefer.

@fatkodima
Copy link
Member

and sometimes, the order of the columns matters to some degree (otherwise, why have an :after / :before options in migrations at all?).

Can you provide real-world examples when it matters?

@mackuba
Copy link
Contributor Author

mackuba commented Jul 28, 2025

It's not about a specific use case, but rather each developer's/team's preference - when you're viewing data using queries in a console like psql, it's printed in the table with columns in the order as they are in the database, so you may want to see them in that order, and not a completely different one.

@ngan
Copy link
Contributor

ngan commented Aug 5, 2025

I don't have a opinion on sorting vs not sorting, but to answer your question, @fatkodima:

Can you provide real-world examples when it matters?

Does this count? paper-trail-gem/paper_trail#1457

@fatkodima
Copy link
Member

Does this count? paper-trail-gem/paper_trail#1457

No, because schema.rb is for development use only, while production schemas can have different order.

@ngan
Copy link
Contributor

ngan commented Aug 5, 2025

I see. So if you wanted to have specific ordering for production, you'd write a migration to change the column ordering and know that this would yield no changes to your schema.rb. Make sense! Although I can see how this could confuse people.

@mackuba
Copy link
Contributor Author

mackuba commented Aug 6, 2025

Hmm, schema.rb is for development use only? That's the first time I'm hearing this… so what is supposed to be used in production?

@fatkodima
Copy link
Member

Migrations are used to change db in production. schema.rb file is not needed there.

@mackuba
Copy link
Contributor Author

mackuba commented Aug 6, 2025

But db:migrate was just changed in 8.0 so that when you initialize a new database, it skips running all migrations and loads schema.rb instead (#52830), I think that change applies to both dev and production?

@bensheldon
Copy link
Contributor

I support this:

  • in Postgres, column ordering affects storage size. I do consider this when laying out my database tables.
  • For single-tenant and “stamped”/duplicatable applications , I do use schema to install the database and I think that’s a valid use for it. Otherwise one would have to maintain migrations longterm and that also isn’t appropriate imo.

@pbstriker38
Copy link
Contributor

pbstriker38 commented Oct 27, 2025

But db:migrate was just changed in 8.0 so that when you initialize a new database, it skips running all migrations and loads schema.rb instead (#52830), I think that change applies to both dev and production?

This is definitely going to effect new apps that spend some time in development before being pushed to production for the first time.

This is also going to make it so that you need to be explicit with the column names when copying data between databases. Even in development, everyone could have a different layout depending on when was the last time they dropped their local DB and did a schema:load.

@bensheldon
Copy link
Contributor

Even in development, everyone could have a different layout depending on when was the last time they dropped their local DB and did a schema:load.

I think it's a good practice to regularly sync the production schema with the committed codebase schema. That goes along with encouraging developers to limit usage of bespoke local dev-data and invest time making seeds effective. Which makes local dev environments less precious and more consistent.

@stevequinlan
Copy link

I think this really is a team preference thing. We find benefit in being able to group "like with like" columns together in our larger tables. It's much easier for us to spot inconsistencies in naming for example. We rely on reading schema.rb for reference, and having the schema dumper undo all this curation by automatically sorting columns was unfortunate for us. So we've disabled the sorting for our project (~360 tables, ~5600 columns).

Hopefully this PR gets merged. Since there's a merge conflict, here is a commit which may be of use, based on this PR, which updates configuring.md instead of the changelog.md file.
ClinchIO@e4d176d

@mackuba mackuba force-pushed the schema-dumper-column-sort branch from a4ae7ed to ff3eb88 Compare November 11, 2025 14:15
@github-actions github-actions bot added the docs label Nov 11, 2025
@mackuba
Copy link
Contributor Author

mackuba commented Nov 11, 2025

Hey! I've rebased the PR based on @stevequinlan's version.

@afittro
Copy link

afittro commented Jan 7, 2026

👀 following this. I made a patch, for now, to give me what I need. hopefully this will get approved.

@tachyons
Copy link

tachyons commented Jan 17, 2026

We use explicit column order at GitLab. you can read about it here. But we aren't affected by this change as we use structure.sql instead of schema.rb

Column sorting should be opt-in. Not opt out

##
# :singleton-method:
# Specifies if columns in create_table block should be ordered alphabetically by name.
cattr_accessor :sort_table_columns, default: true

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
cattr_accessor :sort_table_columns, default: true
cattr_accessor :sort_table_columns, default: false

We should be keep this false by default to restore original behaviour

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorting by default is already a new standard. We should not change it.
But adding the ability to disable it is great idea.

@mackuba
Copy link
Contributor Author

mackuba commented Jan 17, 2026

@tachyons yeah I wouldn't mind that at all, but for now I'm having trouble convincing the team to accept even the opt-out version…

@Shuiei
Copy link

Shuiei commented Jan 28, 2026

I second this change. I don't mind opting out of this feature, but I should have the option to prevent columns from being sorted in my schema.rb.

Few example:

  • Doing a SELECT * and using the index of a column or any script that requires a specific column order will not work anymore (good practice, maybe not, but it can still happen),
  • Having my columns ordered based on their business commonality is impossible.
  • created_at and updated_at in the middle of my table, instead of at the end, is something I really dislike.
  • Not reflecting the order of my column in production and development breaks the dev-parity logic.
  • Column order can affect DB size/performance for PostgreSQL.
  • It makes the usage of add_column, after: and before: confusing since it will not be properly reflected in the schema.rb

Overall, I do believe devs should have the option to opt out of this until it starts causing issues, and, most importantly, having this configurable is harmless, so I don't understand why there's a battle over whether it should be enforced.

@jasonperrone
Copy link

I have a real world example of why I need these columns in database order. I have automated tests which dump a mysql database, uses schema.rb to create a new database and then mysql loads in the data I just dumped... which thanks to this change no longer matches the table columns.

@2called-chaos
Copy link
Contributor

If we can't get this merged (I do hope so) can we consider a mechanism that at least dumps both, sql and ruby schema? Because the ruby version is a good reference but now unusable to initialize.

config.active_record.schema_format obviously only accepts either or.

I know that I can use SCHEMA_FORMAT env variable but obviously that would require manual care as migrations will only update one of them automatically.

@mackuba
Copy link
Contributor Author

mackuba commented Feb 11, 2026

Looks like this is the no.1 most upvoted currently open PR, if I'm reading this correctly… https://github.com/rails/rails/pulls?q=is%3Apr+is%3Aopen+sort%3Areactions-%2B1-desc

@bert-mccutchen
Copy link

@mackuba one thing I noticed is that you're missing an entry in the CHANGELOG. I'm not sure if this is dissuading the maintainers from merging, or if they are just busy - most likely the latter.
https://edgeguides.rubyonrails.org/contributing_to_ruby_on_rails.html#updating-the-changelog

e.g. PR #53281 has a CHANGELOG entry.

I'd also just like to throw in my 2 cents into this while I'm already commenting. Having sorted a schema makes things really weird for open source self-hosted applications. When someone spins up one of my applications, all of their columns are sorted and illogical.

@mackuba
Copy link
Contributor Author

mackuba commented Feb 13, 2026

@eileencodes hey, any chance someone could look at this again? 🙏🏻

@eileencodes
Copy link
Member

I don't think we want a config option for this, it should either be sorted or not. If there's a problem caused by sorting other than dev preference, then we should revert. Otherwise the behavior should stay as-is.

Also please don't ping individuals who haven't interacted with the PR, getting attention on a PR is what the discord is for. Thanks for understanding.

@jasonperrone
Copy link

jasonperrone commented Feb 13, 2026

Well it was fine for 8 major versions, I say it needs to be reverted.

And I don't understand this whole "schema.rb is for dev only" stuff. Nonsense. I should be able to create brand new databases from my schema.rb and I expect them to look exactly like my other databases.

@bert-mccutchen
Copy link

Let's step back and revisit how we got here.

The main point is that for teams developing overlapping features in parallel, the schema becomes difficult to work with as the dump is not deterministic. This is a valid issue; as DHH points out "Schema dumping should be deterministic regardless of the platform." While I do think this is more of an issue in developer workflow, DHH's point about being deterministic still rings true.

The second point is that it's not guaranteed that every database platform will provide the table schema in an ordinal pattern. At least that's what I inferred from the previous PRs/issues.

What if instead we move forward and include the ordinal information in the schema alongside the columns, much like we do with null: {bool}, unique: {bool}, etc. e.g. position: {int}

PS. Please respect the maintainers, and be sure to follow the contribution guidelines when looking for feedback.

@mackuba
Copy link
Contributor Author

mackuba commented Feb 14, 2026

What if instead we move forward and include the ordinal information in the schema alongside the columns, much like we do with null: {bool}, unique: {bool}, etc. e.g. position: {int}

But I think then we're back at the same problem, that dev 1 adds a field :tags, position: 5 and dev 2 adds a field :draft, position: 5 on a parallel branch, and then when they merge, they have two fields at position 5?

@bert-mccutchen
Copy link

bert-mccutchen commented Feb 14, 2026

That's true, it doesn't fix the problem of development workflow and merges. However, it is still deterministic, and I don't believe that it is the responsibility of Rails to make merges easier anyways.

Update: I've come full circle in realizing that the original schema actually was deterministic in nature when it considers ordinal positions of columns for a given platform. It's just that for #53281, ordinal positions between the two development databases were truly different. So if the claim in #32111 that different platforms do not guarantee returning accurate ordinal positions of columns is true. Then it can't be deterministic across different platforms.

Though, if it simply only caused by development workflow... then I'm on the fence that we revert.

@bert-mccutchen
Copy link

Ok so I did a test, and I have found no evidence among the supported ActiveRecord adapters that they produce non-deterministic results due to ordinal columns. This means that even with ordinal columns, schema dumps are deterministic across platforms! (excluding extensions)

MySQL Test Script
# frozen_string_literal: true

require "bundler/inline"

gemfile(true) do
  source "https://rubygems.org"
  gem "rails", "~> 7.2"
  gem "mysql2"
end

require "active_record/railtie"
require "rake"

ENV["DATABASE_URL"] = "mysql://root@localhost/ar_ordinal_schema_test"

class TestApp < Rails::Application
  config.load_defaults Rails::VERSION::STRING.to_f
  config.eager_load = false
  config.logger = Logger.new($stdout)
  config.secret_key_base = "secret_key_base"

  config.active_record.encryption.primary_key = "primary_key"
  config.active_record.encryption.deterministic_key = "deterministic_key"
  config.active_record.encryption.key_derivation_salt = "key_derivation_salt"

  config.paths["db"] = ["mysql_db"]
end
Rails.application.initialize!
Rails.application.load_tasks

Rake::Task['db:prepare'].invoke

ActiveRecord::Schema.define do
  create_table :posts, force: true do |t|
    t.string "token", null: false
    t.string "name", null: false
    t.string "description"
    t.datetime "created_at", null: false
    t.datetime "updated_at", null: false
    t.datetime "deleted_at", null: false
    t.index ["token"], name: "index_posts_on_token", unique: true
  end
end

Rake::Task['db:schema:dump'].invoke
PostgreSQL Test Script
# frozen_string_literal: true

require "bundler/inline"

gemfile(true) do
  source "https://rubygems.org"
  gem "rails", "~> 7.2"
  gem "pg"
end

require "active_record/railtie"
require "rake"

ENV["DATABASE_URL"] = "postgresql://localhost/ar_ordinal_schema_test"

class TestApp < Rails::Application
  config.load_defaults Rails::VERSION::STRING.to_f
  config.eager_load = false
  config.logger = Logger.new($stdout)
  config.secret_key_base = "secret_key_base"

  config.active_record.encryption.primary_key = "primary_key"
  config.active_record.encryption.deterministic_key = "deterministic_key"
  config.active_record.encryption.key_derivation_salt = "key_derivation_salt"

  config.paths["db"] = ["pgsql_db"]
end
Rails.application.initialize!
Rails.application.load_tasks

Rake::Task['db:prepare'].invoke

ActiveRecord::Schema.define do
  create_table :posts, force: true do |t|
    t.string "token", null: false
    t.string "name", null: false
    t.string "description"
    t.datetime "created_at", null: false
    t.datetime "updated_at", null: false
    t.datetime "deleted_at", null: false
    t.index ["token"], name: "index_posts_on_token", unique: true
  end
end

Rake::Task['db:schema:dump'].invoke
SQLite Test Script
# frozen_string_literal: true

require "bundler/inline"

gemfile(true) do
  source "https://rubygems.org"
  gem "rails", "~> 7.2"
  gem "sqlite3"
end

require "active_record/railtie"
require "rake"

ENV["DATABASE_URL"] = "sqlite3::memory:"

class TestApp < Rails::Application
  config.load_defaults Rails::VERSION::STRING.to_f
  config.eager_load = false
  config.logger = Logger.new($stdout)
  config.secret_key_base = "secret_key_base"

  config.active_record.encryption.primary_key = "primary_key"
  config.active_record.encryption.deterministic_key = "deterministic_key"
  config.active_record.encryption.key_derivation_salt = "key_derivation_salt"

  config.paths["db"] = ["sqlite_db"]
end
Rails.application.initialize!
Rails.application.load_tasks

Rake::Task['db:prepare'].invoke

ActiveRecord::Schema.define do
  create_table :posts, force: true do |t|
    t.string "token", null: false
    t.string "name", null: false
    t.string "description"
    t.datetime "created_at", null: false
    t.datetime "updated_at", null: false
    t.datetime "deleted_at", null: false
    t.index ["token"], name: "index_posts_on_token", unique: true
  end
end

Rake::Task['db:schema:dump'].invoke

Meanwhile, sorting columns alphabetically produces an incorrect column ordering in production when spinning up a Rails application a new region for data residency purposes via db:prepare. Something that my company needs to do with every new region we begin to support. I believe this satisfies the condition to revert mentioned by Eileen above.

If there's a problem caused by sorting other than dev preference, then we should revert.


This essentially means that the original change to sort columns in the schema alphabetically was included due to developer preference and workflows, and has a real-world impact on production environments as a result.

@mackuba
Copy link
Contributor Author

mackuba commented Feb 14, 2026

I think that what the original PR meant by non-deterministic results is that:

  1. dev A adds field_a to a table
  2. dev B adds field_b to the same table
  3. someone merges their branches
  4. schema becomes ..., field_a, field_b
  5. dev B pulls the latest version, runs migration; field_a gets added, field_b was already there
  6. dev B's database now has ..., field_b, field_a
  7. Rails on dev B's computer dumps the schema and shows a change in the schema file, trying to change the order from field_a, field_b to field_b, field_a

@bert-mccutchen
Copy link

Yes, and that that result makes complete sense since their schema was dumped from a database that had columns in a different order than the previous schema dump. This does not mean that it's the responsibility of the schema dumper to resolve this failing - especially at the cost of incorrect column ordering in production when using db:prepare when spinning up new regions.

So then how do we properly solve this?

Everyone having this problem is looking in the wrong spot. Trying to fix the symptom (schema.rb), when they should be focusing on the cause.

The cause is ActiveRecord's migration process. When ActiveRecord processes old migrations, it does not consider what columns were added after the one it is inserting. To fix this issue, it needs to make sure that it inserts them before and not after.

Here is a debug script that outlines this issue.
# frozen_string_literal: true

require "bundler/inline"

gemfile(true) do
  source "https://rubygems.org"

  gem "rails"
  gem "sqlite3"
end

require "active_record/railtie"
require "minitest/autorun"

# This connection will do for database-independent bug reports.
ENV["DATABASE_URL"] = "sqlite3::memory:"

class TestApp < Rails::Application
  config.load_defaults Rails::VERSION::STRING.to_f
  config.eager_load = false
  config.logger = Logger.new($stdout)
  config.secret_key_base = "secret_key_base"
end
Rails.application.initialize!

ActiveRecord::Schema.define do
  create_table :users, force: true do |t|
    t.string :email, null: false
  end
end

# db/migrate/1771075560_add_name_to_users.rb
class AddNameToUsers < ActiveRecord::Migration::Current
  def change
    add_column :users, :first_name, :string
    add_column :users, :last_name, :string
  end
end

# db/migrate/1771075634_add_display_name_to_users.rb
class AddDisplayNameToUsers < ActiveRecord::Migration::Current
  def change
    add_column :users, :display_name, :string
  end
end

class User < ActiveRecord::Base
end

class BugTest < ActiveSupport::TestCase
  setup do
    # Reset between.
    AddNameToUsers.migrate(:down)
    AddDisplayNameToUsers.migrate(:down)
  end

  ##
  # Assume the following migration structure
  # db/
  # └── migrate/
  #     ├── 1771075543_create_users.rb
  #     ├── 1771075560_add_name_to_users.rb
  #     └── 1771075634_add_display_name_to_users.rb

  # Developer 1 is adding first_name/last_name.
  def test_migration_in_order_for_developer_1
    AddNameToUsers.migrate(:up)
    AddDisplayNameToUsers.migrate(:up)
    User.reset_column_information

    # Everything is fine since the display name migration is after their migration.
    assert_equal %w[id email first_name last_name display_name], User.columns_hash.keys
  end

  # Developer 2 is adding display_name and has just updated their branch with developer 1's changes.
  def test_migration_out_of_order_for_developer_2
    AddDisplayNameToUsers.migrate(:up)
    AddNameToUsers.migrate(:up)
    User.reset_column_information

    # ActiveRecord migrations fail to recognize that the columns added in migration 1771075560
    # should be inserted before the columns added in migration 1771075634.
    assert_equal %w[id email first_name last_name display_name], User.columns_hash.keys
  end
end

To fix this, ActiveRecord would need to store (and check) metadata about each column in regards to what migration created each column. Even though this could be done by yet another migration metadata table, I don't think we should. The overhead for such a small developer inconvenience is not worth it.

e.g. schema_column_migrations

table column version
users id 1771075543
users email 1771075543
users first_name 1771075560
users last_name 1771075560
users display_name 1771075634

So to summarize everything again, for the busy folks.

  1. The original issue with schema conflicts (which is a symptom of another issue entirely) is strictly a minor developer inconvenience.
  2. The fix (alphabetical schemas) for this issue is causing real-world problems in production when creating a new database in a new region for regionalized Rails applications (data residency) via db:prepare.
  3. There was chatter in previous issues about schema dumping not returning columns in a deterministic way across platforms. Which I have disproved.
  4. The real issue is with how ActiveRecord handles adding columns from previous migrations - rather than the schema dump.

With all of this information, we should revert. Alphabetical columns in schemas is not the right answer to the problem.

@eileencodes
Copy link
Member

And I don't understand this whole "schema.rb is for dev only" stuff

Both are kind of only for dev stuff. You can load up a schema for a new db in production but you'd only do that once. Regardless that's not what we're really discussing here.

From my perspective, the schema and structure files are for computers to rebuild the database, not for humans to read, so there should be no applied order to the output.

With all of this information, we should revert

Can someone open a PR reverting so Rails core can discuss a final decision?

@mackuba
Copy link
Contributor Author

mackuba commented Feb 18, 2026

From my perspective, the schema and structure files are for computers to rebuild the database, not for humans to read

Yes, but the whole point is not about how a schema.rb file looks, the point is how the resulting database table looks. Right now this files creates a table that looks different than originally intended.

@bert-mccutchen
Copy link

I created a PR per Eileen's request: #56842

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.