Skip to content

Comments

[ActiveRecord] Schema Dumper Columns Order#32111

Closed
Tensho wants to merge 2 commits intorails:masterfrom
Tensho:schema-dumper-sort-columns
Closed

[ActiveRecord] Schema Dumper Columns Order#32111
Tensho wants to merge 2 commits intorails:masterfrom
Tensho:schema-dumper-sort-columns

Conversation

@Tensho
Copy link
Contributor

@Tensho Tensho commented Feb 26, 2018

Summary

From project to project written with Rails I spectate the similar snag, when column definition rows in the schema are shuffled from time to time (only for schema.rb in :ruby mode). It's annoying to see the changes in the git index, despite I haven't touched any of them. It's really hard for me to define the steps to reproduce for this bug, but let me put here my thoughts and proposed changes. First consider the next example for better clarity:

     t.string "county"
-    t.string "country_code"
     t.datetime "created_at", null: false
     t.datetime "updated_at", null: false
+    t.string "country_code"
     t.string "urn"

Here "country_code" column definition changed position even though there was no any related migration code.

First of all I thought this behaviour is connected somehow with database engine version or operating system version divergence between developers in our team. So we setup the same version of Rails 4.2 (later checked on 5.2) and MySQL 5.7.20 on macOS Sierra 10.12.6 at MacBook Pro. But there was no luck – I still spectate schema shuffling.

Then I decided to dig into the schema dumper code to understand how ActiveRecord retrieves columns information and dumps it to the schema.rb file. Inside I noticed expected alphabetical table names sorting:

def tables(stream)
  sorted_tables = @connection.tables.sort

   sorted_tables.each do |table_name|
    table(table_name, stream) unless ignored?(table_name)
  end
  ...

Then I found table columns info retrievement:

def table(table, stream)
  columns = @connection.columns(table)
...

So there is no specific ordering. "Hmmn" 🤔 , – was sparkled in my – "Maybe it's desirable to show columns definition in the same manera as database holds them and don't sort in alphabetical order. But how different databases return schema info and do they comply the result order?". I moved to connection adapters:
MySQL (ActiveRecord::ConnectionAdapters::AbstractMysqlAdapter)

def column_definitions(table_name) # :nodoc:
  execute_and_free("SHOW FULL FIELDS FROM #{quote_table_name(table_name)}", "SCHEMA") do |result|
    each_hash(result)
  end
end

I haven't found any proof in the SHOW COLUMNS (synonym to SHOW FIELDS) MySQL official documentation that ordering is present. But I found the next interesting discussions – "SHOW DATABASES does not order infomation_schema correct" and "Return order of MySQL SHOW COLUMNS
"
– where people argue how MySQL handles position of columns info. Unfortunately, I'm not able to trace MySQL server source code to find the pure evidence. Let's assume all this means column_definitionsdoesn't gauranty the order. Do we have alternatives? Yes!

  • DESCRIBE (the clone of SHOW COLUMNS, that leads me to the idea MySQL follows the same "There is more than one way to do it" principle as Ruby/Perl 😄 )
  • INFORMATION_SCHEMA.COLUMNS

MySQL has ORDINAL POSITION of the columns and provides it in INFORMATION_SCHEMA.COLUMNS. But there is one note, that makes me confused:

ORDINAL_POSITION is necessary because you might want to say ORDER BY ORDINAL_POSITION. Unlike SHOW, SELECT does not have automatic ordering.

Seems like thoeretically SHOW should keep the order according to what I see in the note, but practically I see the opposite. Unlike Postgres, MySQL allows to insert column before or after specific one.
I'd be very pleasant to hear any thoughts and get any references regarding the columns positioning (logical, physical, virtual, etc.) here.

Postgres (ActiveRecord::ConnectionAdapters::PostgreSQLAdapter)

def column_definitions(table_name)
  query(<<-end_sql, "SCHEMA")
              SELECT a.attname, format_type(a.atttypid, a.atttypmod),
                     pg_get_expr(d.adbin, d.adrelid), a.attnotnull, a.atttypid, a.atttypmod,
                     c.collname, col_description(a.attrelid, a.attnum) AS comment
                FROM pg_attribute a
                LEFT JOIN pg_attrdef d ON a.attrelid = d.adrelid AND a.attnum = d.adnum
                LEFT JOIN pg_type t ON a.atttypid = t.oid
                LEFT JOIN pg_collation c ON a.attcollation = c.oid AND a.attcollation <> t.typcollation
               WHERE a.attrelid = #{quote(quote_table_name(table_name))}::regclass
                 AND a.attnum > 0 AND NOT a.attisdropped
               ORDER BY a.attnum
  end_sql
end

Schema info retrievement is pretty straightforward from pg_attribute system table. We have explicit ORDER BY a.attnum (a.attnum – the number of the column), which should guarantee the order of attributes I guess. Unlike MySQL, Postgres doesn't allow to insert column before or after specific one.

SQlite3 (ActiveRecord::ConnectionAdapters::SQLite3Adapter)

def table_structure(table_name)
  structure = exec_query("PRAGMA table_info(#{quote_table_name(table_name)})", "SCHEMA")
  ...

PRAGMA table_info lacks the required knowledge, that's why ActiveRecord uses sqlite_master readonly table to get and parse initial table creation SQL. As far as we parse the SQL statements all the time, it could be considered as columns order guarantee. Here is an excerpt from the ALTER TABLE official documentation regarding column position specification:

The ADD COLUMN syntax is used to add a new column to an existing table. The new column is always appended to the end of the list of existing columns.

Alternatives

Any sorting, especially if it's not supported in the database, could be performed at schema dumper level. For example, dumper sorts table names in the alphabetical order right now. The same approach could be applied to the columns. Moreover we may enforce ordering at both levels – database query and schema dumper table info processing.

Considertations

How do different databases consider the columns order? AFAIK, there are 2 points:

  • Columns order from fixed length to variable length provides performance boost, because it's easier to find the offset of the fixed field. That's why primary key is better to place in the beginning of the table.
  • Columns order doesn't much matter from user point of view, because columns order at the logical level could be different from how database spread columns at the physical level. That means you don't have to be bothered with column position tuning, database engine better knows how to store the data.

Proposed changes

  • Force MySQL adapter to retrieve columns definition in the same order.

Does all this make sense? Please shout out if I have to cover provided changes with specs somewhere else. Also I'm happy for any critics 🙇

@Tensho Tensho force-pushed the schema-dumper-sort-columns branch from fcc0c54 to 0732289 Compare February 26, 2018 10:32
@matthewd
Copy link
Member

Unlike SHOW, SELECT does not have automatic ordering

I agree that strongly supports the assumption that SHOW is ordered.

Does your newly added test fail without the adapter change? 😕

I don't recall ever hearing a claim that the schema dump doesn't match the database column order.

@Tensho
Copy link
Contributor Author

Tensho commented Feb 26, 2018

No, it's not failing 😞 As I wrote earlier it's hard to reproduce in a bare rails app.

Suppose MySQL SHOW returns always ordered result. What is the base for that order? ORDINAL_POSITION?

TBH, I'm lost what could issue something like this:
image
Is it possible MySQL changes the column ordinal position internally, without specific alter command?
Seems like previously ordinal position for reason_reopened was 27 and now it's 29.

mysql> SELECT * FROM INFORMATION_SCHEMA.COLUMNS WHERE table_schema = "rails_app_development" AND TABLE_NAME = "case_files" AND COLUMN_NAME = "reason_reopened" \G;
*************************** 1. row ***************************
           TABLE_CATALOG: def
            TABLE_SCHEMA: rails_app_development
              TABLE_NAME: case_files
             COLUMN_NAME: reason_reopened
        ORDINAL_POSITION: 29
          COLUMN_DEFAULT: NULL
             IS_NULLABLE: YES
               DATA_TYPE: text
CHARACTER_MAXIMUM_LENGTH: 65535
  CHARACTER_OCTET_LENGTH: 65535
       NUMERIC_PRECISION: NULL
           NUMERIC_SCALE: NULL
      DATETIME_PRECISION: NULL
      CHARACTER_SET_NAME: utf8
          COLLATION_NAME: utf8_general_ci
             COLUMN_TYPE: text
              COLUMN_KEY:
                   EXTRA:
              PRIVILEGES: select,insert,update,references
          COLUMN_COMMENT:
   GENERATION_EXPRESSION:

@matthewd
Copy link
Member

IME that happens when the schema.rb was last dumped by a different computer, using a different copy of the database, which has the columns in a different order. And I really don't think there's anything we can do about that.

@Tensho
Copy link
Contributor Author

Tensho commented Feb 26, 2018

🤔 Hmmn, maybe the problem arises from the input side, for example mysqldump invocation with some specific options leads to different column definition order... and then people run different SQL instructions for import. OK, I'll try to find more clues among my team. Will reopen in case if find anything helpful.

@matthewd, thank you for your time 🙇

P.S. Does the test for tables alphabetical order is still acceptable?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants