Skip to content

Comments

Assign auto populated columns on Active Record object creation #48241

Merged
eileencodes merged 1 commit intorails:mainfrom
Shopify:populate-autoincremented-column-for-a-model-with-cpk
Jun 1, 2023
Merged

Assign auto populated columns on Active Record object creation #48241
eileencodes merged 1 commit intorails:mainfrom
Shopify:populate-autoincremented-column-for-a-model-with-cpk

Conversation

@nvasilevski
Copy link
Contributor

@nvasilevski nvasilevski commented May 16, 2023

This PR extends Active Record creation logic to allow for a database
auto-populated attributes to be assigned on object creation.

Given a Post model represented by the following schema:

create_table :posts, id: false do |t|
  t.integer :sequential_number, auto_increment: true
  t.string :title, primary_key: true
  t.string :ruby_on_rails, default: -> { "concat('R', 'o', 'R')" }
end

where title is being used as a primary key, the table has an
integer sequential_number column populated by a sequence and
ruby_on_rails column has a default function - creation of
Post records should populate the sequential_number and
ruby_on_rails attributes:

new_post = Post.create(title: 'My first post')
new_post.sequential_number # => 1
new_post.ruby_on_rails # => 'RoR'

At this moment MySQL and SQLite adapters are limited to only one
column being populated and the column must be the auto_increment
while PostgreSQL adapter supports any number of auto-populated
columns through RETURNING statement.

Implementation details

Overall solution is trying to be generic and extensible to allow for additions while still maintains (for MySQL and SQLite) the fact that currently Rails only populates a single column on creation.

  • The most important change I'd like to bring the most attention to is the method signature change for both exec_insert and insert public methods. We are adding a new returning keyword which accepts either a single value or an array of columns to control the return value of the method and define what column values to be returned after row insertion. Currently only supported by Postgresql using RETURNING statement. The behavior of the returning argument is defined as following: nil is the default value and doesn't change the behavior of the methods - they continue to return a single primary key or LAST_INSERT_ID() value. If an array of column names is passed - an array is returned back. However non-nil value for the argument is only supported by postgresql adapter at the moment.
  • ConnectionAdapters::Column gets a new adapter-abstract attribute called auto_incremented_by_db? which indicates whether a given column is being auto-incremented by the database regardless of the way it's being incremented. For mysql we alias it to auto_increment and for postgresql we alias it to serial.
  • ConnectionAdapters::Column also gets a new auto_populated? method which allows us to identify columns auto-populated by the database regardless of the way how the column is being populated. Currently only checks for auto_incremented_by_db or a default function
  • SQLite3::Column has a new rowid property which is an implicit way to have an integer column being autoincremented
  • ModelSchema._returning_columns_for_insert has been added to gather a list of columns values for which we want to get back after insertion. Adapter defines the selection rule. postgresql will select any auto-populated column while mysql and sqlite will select only auto_incremented_by_db? which by implication limits the selection to a single column.

Long term plans

The feature is flexible enough to be extended to support not just postgresql adapter but any other database that is capable of returning column values after creation.
The closest candidates are:

We'll have to teach adapters how to build a RETURNING statement in sql_for_insert
https://github.com/rails/rails/blob/aa162cefe19b81daa5d7f8c2b9ee4fe2564e73fc/activerecord/lib/active_record/connection_adapters/abstract/database_statements.rb#L632-L633
and then just extend the return_value_after_insert? method to ask for more than just an auto-increment being returned after an insert.
https://github.com/rails/rails/blob/aa162cefe19b81daa5d7f8c2b9ee4fe2564e73fc/activerecord/lib/active_record/connection_adapters/postgresql_adapter.rb#L284-L285

Databases that don't support RETURNING statement or an alternative will require an additional SELECT query to be performed to fetch values for every auto-populated columns. An additional query may not be desirable for every application so the behavior should be configurable which brings more complexity and one of the reasons why we are not trying to tackle it all at once.

@eileencodes
Copy link
Member

Can you take a look at the test failures when you're back on Thursday?

@nvasilevski nvasilevski force-pushed the populate-autoincremented-column-for-a-model-with-cpk branch 2 times, most recently from 79473d8 to a9f0f3e Compare May 18, 2023 19:15
@nvasilevski
Copy link
Contributor Author

Can you take a look at the test failures when you're back on Thursday?

Done. Was a memoization issue so we added a reset of the @_auto_populated_column_names variable

There are a few actionview failures that may be legit since the failure is around id being nil. I'll have a look

@nvasilevski nvasilevski force-pushed the populate-autoincremented-column-for-a-model-with-cpk branch 2 times, most recently from ec4f723 to 46054c2 Compare May 24, 2023 15:57
@matthewd
Copy link
Member

fetches only one column as insert signature is only capable of returning a single value

I suspect this might be the best moment to change that: we can vary the return value by whether the new parameter is supplied.

That does mean our caller is going to need to know how to figure out which return strategy the adapter supports, but I think that's already true: prioritising the PK is sensible, but in the case of MySQL, it's not true that we can get the new value for one column of our choosing [e.g. if the PK is not auto-populated] -- we will always get the value of the autoincrement column, whatever it is, and otherwise nothing. 99% of the time that's equivalent, but for when it's not, I think we do need to know about the distinction.

(That is, by my reading, the current implementation would be wrong on MySQL when faced with a PK that defaults to RAND() * 1000 and a second integer column that's AUTO_INCREMENT. I'm not sure it's practical for us to make that not broken, but we at least shouldn't shove values into incorrect attributes.)

@nvasilevski
Copy link
Contributor Author

I suspect this might be the best moment to change that: we can vary the return value by whether the new parameter is supplied.

Right, for some reason I thought we can't do it without a deprecation cycle but we can if we only change the signature in case if the new returning_columns argument is used. Though I don't feel completely comfortable about the duality of the method return value, it reminds me of the "boolean flag argument" smell so I'm thinking of maybe defining new exec_insert_with_returning and insert_with_returning methods instead. It may result in some duplication though. I'll try both "separate methods" and "returning_columns" argument approaches to see which one makes more sense and causes less troubles but let me know if you already have preference.

My only concern here that regardless of implementation the multi-return values solution will only work with PostgreSQL unless we want to add a special case for non-postgresql databases which will make an additional query to fetch values for all auto-populated columns. Though it may be too much for a single PR

the current implementation would be wrong on MySQL when faced with a PK that defaults to RAND() * 1000 and a second integer column that's AUTO_INCREMENT.

That's correct. Not an excuse but just a clarification that current implementation on main won't be able to handle such scenario either as it explicitly assumes that whichever value is returned by LAST_INSERT_ID() is the primary key value

self.id ||= new_id if @primary_key

Though I agree that since we are making changes we have a great opportunity to fix it as well. Because it is impossible to have more than one AUTO_INCREMENT column in MySQL (I'm relying on the error ERROR 1075 (42000) at line 3: Incorrect table definition; there can be only one auto column and it must be defined as a key) the strategy for MySQL (and most likely for SQLite) will be implemented as "populate the (only) auto_increment? column while for postgresql it will be "populate everything"

I'll work on extending the return values of insert methods (either through an argument or using a new method) along with a strategy, will post an update when its ready

@nvasilevski nvasilevski force-pushed the populate-autoincremented-column-for-a-model-with-cpk branch 12 times, most recently from a14ee55 to ecb6bee Compare May 31, 2023 18:56
@nvasilevski nvasilevski changed the title Fill auto incremented column for models with CPK Assign auto populated columns on Active Record object creation May 31, 2023
@nvasilevski nvasilevski force-pushed the populate-autoincremented-column-for-a-model-with-cpk branch 2 times, most recently from aa162ce to 23ff3fd Compare June 1, 2023 13:30
This commit extends Active Record creation logic to allow for a database
auto-populated attributes to be assigned on object creation.

Given a `Post` model represented by the following schema:
```ruby
create_table :posts, id: false do |t|
  t.integer :sequential_number, auto_increment: true
  t.string :title, primary_key: true
  t.string :ruby_on_rails, default: -> { "concat('R', 'o', 'R')" }
end
```
where `title` is being used as a primary key, the table has an
integer `sequential_number` column populated by a sequence and
`ruby_on_rails` column has a default function - creation of
`Post` records should populate the `sequential_number` and
`ruby_on_rails` attributes:

```ruby
new_post = Post.create(title: 'My first post')
new_post.sequential_number # => 1
new_post.ruby_on_rails # => 'RoR'
```

* At this moment MySQL and SQLite adapters are limited to only one
column being populated and the column must be the `auto_increment`
while PostgreSQL adapter supports any number of auto-populated
columns through `RETURNING` statement.
@nvasilevski nvasilevski force-pushed the populate-autoincremented-column-for-a-model-with-cpk branch from 23ff3fd to c929332 Compare June 1, 2023 14:39
Copy link
Member

@matthewd matthewd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great!

I think it's worth some follow-up to make sure adapters that don't know how to do full-featured RETURNING will instead raise if they're asked to return a column(s) they can't... but no need to squeeze that into this PR.

@nvasilevski
Copy link
Contributor Author

I think it's worth some follow-up to make sure adapters that don't know how to do full-featured RETURNING will instead raise if they're asked to return a column(s) they can't

Thanks, that's being done right now in parallel and a PR will be ready soon!

fractaledmind added a commit to fractaledmind/rails that referenced this pull request Sep 20, 2023
Implementing the full `supports_insert_returning?` contract means the SQLite3 adapter supports auto-populated columns (rails#48241) as well as custom primary keys.
fatkodima added a commit to fatkodima/rails that referenced this pull request Oct 5, 2023
Fixes rails#49502.

This bug was introduced in rails#48241. Since `IDENTITY` columns
in PostgreSQL do not have defaults, they are not auto populated via the changes made in that PR.
Thus we need to consider them separately.
owst added a commit to bambooengineering/activerecord7-redshift-adapter that referenced this pull request Jan 12, 2024
davinlagerroos pushed a commit to umn-asr/oracle-enhanced that referenced this pull request Feb 21, 2024
See rails/rails#48241
and rails/rails#49692

48241 added support for setting additional db-controlled attributes via
a returning clause. As part of these changes, that PR removed the
[setting of the in-memory object's id](https://github.com/rails/rails/pull/48241/files#diff-11b42664eb9834972953ecd5725c75fd6020af6d4ee5da37ef6d8e0c146f7773L1242)
in Persistance#_create_record. Instead, those changes expect some
`Column`s to be true for `auto_incremented_by_db?` so they can be
matched with the id value (and possibly other database values) returned
by insert and then used to update attributes on the in memory
activerecord object in [`_insert_record`](https://github.com/rails/rails/pull/48241/files#diff-11b42664eb9834972953ecd5725c75fd6020af6d4ee5da37ef6d8e0c146f7773R1248).

Unless an adapter adds the ability to identify some Column as `auto_incremented_by_db?`
the insert happens but the newly created object does not get its id value
updated from nil to the new id value. This commit updates `column_defintions`
and `Column` to identify the primary key column and make it selectable by
`auto_incremented_by_db?` so the corresponding attribute gets updated
with the id_value from the `insert`.

49692 makes further changes to support id_value in `insert` but I still need to
explicity pass `returning: nil` and wrap the return from `super` in Array
to get the id to update in tests.

Limitations:

This approach will probably need to be reconsidered if the adapter adds
support for composite primary keys, but for now this gets `SomeModel.create`
to set the id after the record is inserted and fixes many current test
failures.

PR 48241 also supports updating more via `returning` but this commit
does not add that feature. It looks like the seeds of an
implementation are there in `DatabaseStatements#sql_for_insert` and
`DatabaseStatements.exec_insert` but the goal of this commit is to fix
setting id during create.

Performance:

Using a left join to bring in the primary key since its performance
is pretty close to the performance of the original query and better
than using a WITH clause:

using benchmark-ips:

Warming up --------------------------------------
old_column_definitions
                         1.000 i/100ms
with clause column_definitions
                         1.000 i/100ms
left join column_definitions
                         1.000 i/100ms
Calculating -------------------------------------
old_column_definitions
                         10.093 (± 9.9%) i/s -     51.000 in   5.070277s
with clause column_definitions
                          9.031 (± 0.0%) i/s -     46.000 in   5.106131s
left join column_definitions
                          9.919 (± 0.0%) i/s -     50.000 in   5.049687s

Comparison:
old_column_definitions:       10.1 i/s
left join column_definitions:        9.9 i/s - same-ish: difference falls within error
with clause column_definitions:        9.0 i/s - 1.12x  slower
JesseChavez added a commit to JesseChavez/activerecord-jdbc-adapter that referenced this pull request Jul 22, 2024
rails/rails#48241

Also relates to change to sqlite3 adapter getting support to
'supports_insert_returning?' true
owst added a commit to bambooengineering/activerecord7-redshift-adapter that referenced this pull request Sep 5, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants