Skip to content

Comments

Stop serializing columns as YAML by default#47422

Merged
byroot merged 1 commit intorails:mainfrom
Shopify:default-column-serializer
Feb 24, 2023
Merged

Stop serializing columns as YAML by default#47422
byroot merged 1 commit intorails:mainfrom
Shopify:default-column-serializer

Conversation

@casperisfine
Copy link
Contributor

@casperisfine casperisfine commented Feb 17, 2023

Followup: #47463

YAML is great for configuration and such, but for serializing arbitrary data it has many problem, both in term of performance and efficiency but also in term of security.

As such, we shouldn't let it be the default.

The open question is wether we should provide another default, or just let users chose what they want based on their own tradeoffs.

Many people would probably suggest JSON as the new default, unfortunately I don't think it's a good fit either because the parsers available in Ruby have some wonky behaviors:

>> ActiveSupport::JSON.decode(ActiveSupport::JSON.encode(Object.new))
=> {}
>> JSON.load(JSON.dump(Object.new))
=> "#<Object:0x000000012b61a068>"

If we were to select another default, I beleive it would need several properties:

  • It should only serialized a safe list of primitive types.
  • It should explictly raise if attempting to serialize complex types.

NB: I have quite a lot of documentation to update, but making sure CI passes first.

@casperisfine casperisfine force-pushed the default-column-serializer branch 2 times, most recently from 2d9da0a to 7deaa5d Compare February 17, 2023 15:28
@rails-bot rails-bot bot added the actiontext label Feb 17, 2023
@casperisfine casperisfine force-pushed the default-column-serializer branch 6 times, most recently from 313d6b3 to 768801c Compare February 20, 2023 10:08
@casperisfine casperisfine marked this pull request as ready for review February 21, 2023 08:56
@byroot byroot requested a review from rafaelfranca February 21, 2023 08:57
@casperisfine
Copy link
Contributor Author

Note to self: Let's ship the refactor without the change of default for now.

@rafaelfranca
Copy link
Member

I don't think we need to provide a default since we do provide a global configuration, that to me is good enough if people wants a default.

@casperisfine casperisfine force-pushed the default-column-serializer branch 2 times, most recently from 516df83 to da3fef6 Compare February 24, 2023 07:34
YAML is great for configuration and such, but for serializing arbitrary
data it has many problem, both in term of performance and efficiency
but also in term of security.

As such, we shouldn't let it be the default.

The open question is wether we should provide another default, or
just let users chose what they want based on their own tradeoffs.

Many people would probably suggest JSON as the new default, unfortunately
I don't think it's a good fit either because the parsers available in
Ruby have some wonky behaviors:

```ruby
>> ActiveSupport::JSON.decode(ActiveSupport::JSON.encode(Object.new))
=> {}
>> JSON.load(JSON.dump(Object.new))
=> "#<Object:0x000000012b61a068>"
```

If we were to select another default, I beleive it would need several
properties:

  - It should only serialized a safe list of primitive types.
  - It should explictly raise if attempting to serialize complex types.
@casperisfine casperisfine force-pushed the default-column-serializer branch from da3fef6 to cee1555 Compare February 24, 2023 07:55
@casperisfine
Copy link
Contributor Author

cc @tenderlove @jhawthorn as we discussed this a couple days ago.

We mentioned maybe implementing a strict JSON encoder. The performance likely wouldn't be good and it's a big can of worm I'm not sure I want to get into.

In the meantime I implemented a strict option for the json gem: ruby/json#519, if it get accepted, we may have the option to use JSON by default, but probably won't happen soon.

In the meantime I think I'm ok with no default. Serializing data inside the database is a decision with big consequences, so I think I like that users now need to make that choice themselves.

So I think I'll merge this for now, but if there are strong opinions we can revert or set another default.

@byroot byroot merged commit 2daa762 into rails:main Feb 24, 2023
byroot added a commit that referenced this pull request Feb 24, 2023
active_record.raise_on_assign_to_attr_readonly = true
active_record.belongs_to_required_validates_foreign_key = false
active_record.before_committed_on_all_records = true
active_record.default_column_serializer = nil
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be in 7.0, and setting it to YAML. Whether we want to be broken out of the box for new applications is debatable, but we definitely don't want to break for upgrades.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@matthewd the default is set here:

class_attribute :default_column_serializer, instance_accessor: false, default: Coders::YAMLColumn

It's still YAML for all existing applications, we only set it to nil if you have load_default 7.1.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This causes different behaviour of active_record when used by itself vs being used in a Rails 7.1 application. Would it not be better to have active_record default to nil also so at least the behaviour is consistent and gems which depend on active_record can have parity between their tests and behaviour when used in Rails apps (see rpush/rpush#668 for example)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@benlangfeld unfortunately no, because the default must be YAMLColumn if load_defaults is never called.

This is the way we do it for all default changes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants