Add optional setting to SchemaDumper to disable sorting of table columns#55414
Add optional setting to SchemaDumper to disable sorting of table columns#55414mackuba wants to merge 1 commit intorails:mainfrom
SchemaDumper to disable sorting of table columns#55414Conversation
Can you provide real-world examples when it matters? |
|
It's not about a specific use case, but rather each developer's/team's preference - when you're viewing data using queries in a console like |
|
I don't have a opinion on sorting vs not sorting, but to answer your question, @fatkodima:
Does this count? paper-trail-gem/paper_trail#1457 |
No, because schema.rb is for development use only, while production schemas can have different order. |
|
I see. So if you wanted to have specific ordering for production, you'd write a migration to change the column ordering and know that this would yield no changes to your schema.rb. Make sense! Although I can see how this could confuse people. |
|
Hmm, schema.rb is for development use only? That's the first time I'm hearing this… so what is supposed to be used in production? |
|
Migrations are used to change db in production. |
|
But |
|
I support this:
|
This is definitely going to effect new apps that spend some time in development before being pushed to production for the first time. This is also going to make it so that you need to be explicit with the column names when copying data between databases. Even in development, everyone could have a different layout depending on when was the last time they dropped their local DB and did a schema:load. |
I think it's a good practice to regularly sync the production schema with the committed codebase schema. That goes along with encouraging developers to limit usage of bespoke local dev-data and invest time making seeds effective. Which makes local dev environments less precious and more consistent. |
|
I think this really is a team preference thing. We find benefit in being able to group "like with like" columns together in our larger tables. It's much easier for us to spot inconsistencies in naming for example. We rely on reading schema.rb for reference, and having the schema dumper undo all this curation by automatically sorting columns was unfortunate for us. So we've disabled the sorting for our project (~360 tables, ~5600 columns). Hopefully this PR gets merged. Since there's a merge conflict, here is a commit which may be of use, based on this PR, which updates configuring.md instead of the changelog.md file. |
a4ae7ed to
ff3eb88
Compare
|
Hey! I've rebased the PR based on @stevequinlan's version. |
|
👀 following this. I made a patch, for now, to give me what I need. hopefully this will get approved. |
|
We use explicit column order at GitLab. you can read about it here. But we aren't affected by this change as we use structure.sql instead of schema.rb Column sorting should be opt-in. Not opt out |
| ## | ||
| # :singleton-method: | ||
| # Specifies if columns in create_table block should be ordered alphabetically by name. | ||
| cattr_accessor :sort_table_columns, default: true |
There was a problem hiding this comment.
| cattr_accessor :sort_table_columns, default: true | |
| cattr_accessor :sort_table_columns, default: false |
We should be keep this false by default to restore original behaviour
There was a problem hiding this comment.
Sorting by default is already a new standard. We should not change it.
But adding the ability to disable it is great idea.
|
@tachyons yeah I wouldn't mind that at all, but for now I'm having trouble convincing the team to accept even the opt-out version… |
|
I second this change. I don't mind opting out of this feature, but I should have the option to prevent columns from being sorted in my Few example:
Overall, I do believe devs should have the option to opt out of this until it starts causing issues, and, most importantly, having this configurable is harmless, so I don't understand why there's a battle over whether it should be enforced. |
ff3eb88 to
06a18cc
Compare
|
I have a real world example of why I need these columns in database order. I have automated tests which dump a mysql database, uses schema.rb to create a new database and then mysql loads in the data I just dumped... which thanks to this change no longer matches the table columns. |
|
If we can't get this merged (I do hope so) can we consider a mechanism that at least dumps both, sql and ruby schema? Because the ruby version is a good reference but now unusable to initialize.
I know that I can use SCHEMA_FORMAT env variable but obviously that would require manual care as migrations will only update one of them automatically. |
|
Looks like this is the no.1 most upvoted currently open PR, if I'm reading this correctly… https://github.com/rails/rails/pulls?q=is%3Apr+is%3Aopen+sort%3Areactions-%2B1-desc |
|
@mackuba one thing I noticed is that you're missing an entry in the CHANGELOG. I'm not sure if this is dissuading the maintainers from merging, or if they are just busy - most likely the latter. e.g. PR #53281 has a CHANGELOG entry. I'd also just like to throw in my 2 cents into this while I'm already commenting. Having sorted a schema makes things really weird for open source self-hosted applications. When someone spins up one of my applications, all of their columns are sorted and illogical. |
|
@eileencodes hey, any chance someone could look at this again? 🙏🏻 |
|
I don't think we want a config option for this, it should either be sorted or not. If there's a problem caused by sorting other than dev preference, then we should revert. Otherwise the behavior should stay as-is. Also please don't ping individuals who haven't interacted with the PR, getting attention on a PR is what the discord is for. Thanks for understanding. |
|
Well it was fine for 8 major versions, I say it needs to be reverted. And I don't understand this whole "schema.rb is for dev only" stuff. Nonsense. I should be able to create brand new databases from my schema.rb and I expect them to look exactly like my other databases. |
|
Let's step back and revisit how we got here. The main point is that for teams developing overlapping features in parallel, the schema becomes difficult to work with as the dump is not deterministic. This is a valid issue; as DHH points out "Schema dumping should be deterministic regardless of the platform." While I do think this is more of an issue in developer workflow, DHH's point about being deterministic still rings true. The second point is that it's not guaranteed that every database platform will provide the table schema in an ordinal pattern. At least that's what I inferred from the previous PRs/issues. What if instead we move forward and include the ordinal information in the schema alongside the columns, much like we do with PS. Please respect the maintainers, and be sure to follow the contribution guidelines when looking for feedback. |
But I think then we're back at the same problem, that dev 1 adds a field |
|
That's true, it doesn't fix the problem of development workflow and merges. However, it is still deterministic, and I don't believe that it is the responsibility of Rails to make merges easier anyways. Update: I've come full circle in realizing that the original schema actually was deterministic in nature when it considers ordinal positions of columns for a given platform. It's just that for #53281, ordinal positions between the two development databases were truly different. So if the claim in #32111 that different platforms do not guarantee returning accurate ordinal positions of columns is true. Then it can't be deterministic across different platforms. Though, if it simply only caused by development workflow... then I'm on the fence that we revert. |
|
Ok so I did a test, and I have found no evidence among the supported ActiveRecord adapters that they produce non-deterministic results due to ordinal columns. This means that even with ordinal columns, schema dumps are deterministic across platforms! (excluding extensions) MySQL Test Script# frozen_string_literal: true
require "bundler/inline"
gemfile(true) do
source "https://rubygems.org"
gem "rails", "~> 7.2"
gem "mysql2"
end
require "active_record/railtie"
require "rake"
ENV["DATABASE_URL"] = "mysql://root@localhost/ar_ordinal_schema_test"
class TestApp < Rails::Application
config.load_defaults Rails::VERSION::STRING.to_f
config.eager_load = false
config.logger = Logger.new($stdout)
config.secret_key_base = "secret_key_base"
config.active_record.encryption.primary_key = "primary_key"
config.active_record.encryption.deterministic_key = "deterministic_key"
config.active_record.encryption.key_derivation_salt = "key_derivation_salt"
config.paths["db"] = ["mysql_db"]
end
Rails.application.initialize!
Rails.application.load_tasks
Rake::Task['db:prepare'].invoke
ActiveRecord::Schema.define do
create_table :posts, force: true do |t|
t.string "token", null: false
t.string "name", null: false
t.string "description"
t.datetime "created_at", null: false
t.datetime "updated_at", null: false
t.datetime "deleted_at", null: false
t.index ["token"], name: "index_posts_on_token", unique: true
end
end
Rake::Task['db:schema:dump'].invokePostgreSQL Test Script# frozen_string_literal: true
require "bundler/inline"
gemfile(true) do
source "https://rubygems.org"
gem "rails", "~> 7.2"
gem "pg"
end
require "active_record/railtie"
require "rake"
ENV["DATABASE_URL"] = "postgresql://localhost/ar_ordinal_schema_test"
class TestApp < Rails::Application
config.load_defaults Rails::VERSION::STRING.to_f
config.eager_load = false
config.logger = Logger.new($stdout)
config.secret_key_base = "secret_key_base"
config.active_record.encryption.primary_key = "primary_key"
config.active_record.encryption.deterministic_key = "deterministic_key"
config.active_record.encryption.key_derivation_salt = "key_derivation_salt"
config.paths["db"] = ["pgsql_db"]
end
Rails.application.initialize!
Rails.application.load_tasks
Rake::Task['db:prepare'].invoke
ActiveRecord::Schema.define do
create_table :posts, force: true do |t|
t.string "token", null: false
t.string "name", null: false
t.string "description"
t.datetime "created_at", null: false
t.datetime "updated_at", null: false
t.datetime "deleted_at", null: false
t.index ["token"], name: "index_posts_on_token", unique: true
end
end
Rake::Task['db:schema:dump'].invokeSQLite Test Script# frozen_string_literal: true
require "bundler/inline"
gemfile(true) do
source "https://rubygems.org"
gem "rails", "~> 7.2"
gem "sqlite3"
end
require "active_record/railtie"
require "rake"
ENV["DATABASE_URL"] = "sqlite3::memory:"
class TestApp < Rails::Application
config.load_defaults Rails::VERSION::STRING.to_f
config.eager_load = false
config.logger = Logger.new($stdout)
config.secret_key_base = "secret_key_base"
config.active_record.encryption.primary_key = "primary_key"
config.active_record.encryption.deterministic_key = "deterministic_key"
config.active_record.encryption.key_derivation_salt = "key_derivation_salt"
config.paths["db"] = ["sqlite_db"]
end
Rails.application.initialize!
Rails.application.load_tasks
Rake::Task['db:prepare'].invoke
ActiveRecord::Schema.define do
create_table :posts, force: true do |t|
t.string "token", null: false
t.string "name", null: false
t.string "description"
t.datetime "created_at", null: false
t.datetime "updated_at", null: false
t.datetime "deleted_at", null: false
t.index ["token"], name: "index_posts_on_token", unique: true
end
end
Rake::Task['db:schema:dump'].invokeMeanwhile, sorting columns alphabetically produces an incorrect column ordering in production when spinning up a Rails application a new region for data residency purposes via
This essentially means that the original change to sort columns in the schema alphabetically was included due to developer preference and workflows, and has a real-world impact on production environments as a result. |
|
I think that what the original PR meant by non-deterministic results is that:
|
|
Yes, and that that result makes complete sense since their schema was dumped from a database that had columns in a different order than the previous schema dump. This does not mean that it's the responsibility of the schema dumper to resolve this failing - especially at the cost of incorrect column ordering in production when using So then how do we properly solve this?Everyone having this problem is looking in the wrong spot. Trying to fix the symptom (schema.rb), when they should be focusing on the cause. The cause is ActiveRecord's migration process. When ActiveRecord processes old migrations, it does not consider what columns were added after the one it is inserting. To fix this issue, it needs to make sure that it inserts them before and not after. Here is a debug script that outlines this issue.# frozen_string_literal: true
require "bundler/inline"
gemfile(true) do
source "https://rubygems.org"
gem "rails"
gem "sqlite3"
end
require "active_record/railtie"
require "minitest/autorun"
# This connection will do for database-independent bug reports.
ENV["DATABASE_URL"] = "sqlite3::memory:"
class TestApp < Rails::Application
config.load_defaults Rails::VERSION::STRING.to_f
config.eager_load = false
config.logger = Logger.new($stdout)
config.secret_key_base = "secret_key_base"
end
Rails.application.initialize!
ActiveRecord::Schema.define do
create_table :users, force: true do |t|
t.string :email, null: false
end
end
# db/migrate/1771075560_add_name_to_users.rb
class AddNameToUsers < ActiveRecord::Migration::Current
def change
add_column :users, :first_name, :string
add_column :users, :last_name, :string
end
end
# db/migrate/1771075634_add_display_name_to_users.rb
class AddDisplayNameToUsers < ActiveRecord::Migration::Current
def change
add_column :users, :display_name, :string
end
end
class User < ActiveRecord::Base
end
class BugTest < ActiveSupport::TestCase
setup do
# Reset between.
AddNameToUsers.migrate(:down)
AddDisplayNameToUsers.migrate(:down)
end
##
# Assume the following migration structure
# db/
# └── migrate/
# ├── 1771075543_create_users.rb
# ├── 1771075560_add_name_to_users.rb
# └── 1771075634_add_display_name_to_users.rb
# Developer 1 is adding first_name/last_name.
def test_migration_in_order_for_developer_1
AddNameToUsers.migrate(:up)
AddDisplayNameToUsers.migrate(:up)
User.reset_column_information
# Everything is fine since the display name migration is after their migration.
assert_equal %w[id email first_name last_name display_name], User.columns_hash.keys
end
# Developer 2 is adding display_name and has just updated their branch with developer 1's changes.
def test_migration_out_of_order_for_developer_2
AddDisplayNameToUsers.migrate(:up)
AddNameToUsers.migrate(:up)
User.reset_column_information
# ActiveRecord migrations fail to recognize that the columns added in migration 1771075560
# should be inserted before the columns added in migration 1771075634.
assert_equal %w[id email first_name last_name display_name], User.columns_hash.keys
end
endTo fix this, ActiveRecord would need to store (and check) metadata about each column in regards to what migration created each column. Even though this could be done by yet another migration metadata table, I don't think we should. The overhead for such a small developer inconvenience is not worth it. e.g.
So to summarize everything again, for the busy folks.
With all of this information, we should revert. Alphabetical columns in schemas is not the right answer to the problem. |
Both are kind of only for dev stuff. You can load up a schema for a new db in production but you'd only do that once. Regardless that's not what we're really discussing here. From my perspective, the schema and structure files are for computers to rebuild the database, not for humans to read, so there should be no applied order to the output.
Can someone open a PR reverting so Rails core can discuss a final decision? |
Yes, but the whole point is not about how a |
|
I created a PR per Eileen's request: #56842 |
Motivation / Background
Pull request #53281, merged in January, has changed the behavior of
ActiveRecord::SchemaDumperto print table columns inside acreate_tableblock inschema.dbsorted alphabetically by name, instead of the actual order in the database.The rationale was that when multiple people are adding migrations to the same table in parallel, they can end up with different order of the columns in their local databases, which then generate different versions of the schema, causing confusing diffs or merge conflicts. Putting columns in alphabetical order makes the order deterministic.
That is true, but it's also true that the order is now deterministically different than the actual order in the database… and sometimes, the order of the columns matters to some degree (otherwise, why have an
:after/:beforeoptions in migrations at all?).This PR intends to add a way to restore the behavior present in AR 8.0 and earlier, while keeping the new behavior as a default.
Detail
The PR adds a class field
sort_table_columnsinActiveRecord::SchemaDumper, defaulting to true. If set to false, the table columns are printed in original order instead of sorted by name.Additional information
I've added one additional test and checked the tests with
rake test:sqlite3.Alternative name I've considered was
sort_columns_by_name, so we could go with that if you prefer.