Skip to content

Add delayed preloading for arrays of records#32136

Closed
dinahshi wants to merge 11 commits intorails:mainfrom
dinahshi:preload_array
Closed

Add delayed preloading for arrays of records#32136
dinahshi wants to merge 11 commits intorails:mainfrom
dinahshi:preload_array

Conversation

@dinahshi
Copy link
Copy Markdown
Contributor

@dinahshi dinahshi commented Feb 28, 2018

Summary

Adds method Relation#load_associations(arr, *args) which accepts an array
of records and args to preload. Useful for when a loaded list is filtered
in ruby to produce an Array and we want to preload the records in this Array.
Effectively exposes Preloader.preload(records, associations, preload_scope = nil)
without requiring people to know about a separate Preloader class.

Todo

  • Documentation
  • CHANGELOG

cc/ @matthewd

@rails-bot
Copy link
Copy Markdown

Thanks for the pull request, and welcome! The Rails team is excited to review your changes, and you should hear from @eileencodes (or someone else) soon.

If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes.

This repository is being automatically checked for code quality issues using Code Climate. You can see results for this analysis in the PR status below. Newly introduced issues should be fixed before a Pull Request is considered ready to review.

Please see the contribution instructions for more information.

@rafaelfranca
Copy link
Copy Markdown
Member

This is awesome. We have an implementation of this at Shopify. This is the implementation.

class AssociationPreloader
  def initialize(records)
    @records = records
  end

  def preload(*associations)
    if @records.is_a?(ActiveRecord::Relation)
      @records.preload(associations)
    else
      ActiveRecord::Associations::Preloader.new.preload(@records, associations.flatten) unless @records.empty?
      @records
    end
  end
end

Maybe we should also support passing an relation?

Also can you add documentation?

@bogdan
Copy link
Copy Markdown
Contributor

bogdan commented Mar 1, 2018

We are using similar technique in our project.
We found a major problems with this implementation:

Sometimes loaded associations.

Preloader currently determines if the association is preloaded or not by the first record https://github.com/rails/rails/blob/master/activerecord/lib/active_record/associations/preloader.rb#L180
We have encountered that practically: it results in the double load on the loaded target records. I think we need to fix that part first before exposing the power of Preloader to the audience.

Offtopic

You may see the implementation that is efficient for us:

  def preload_into_association(host_association, *preload_associations)
    # Preloading records in such a weird way
    # Because we want to cache them inside an association
    # but not have a separated scope as a result of #preload call
    preload_associations.each do |preload_association|
      collection = send(host_association)
      unpreloaded = collection.reject {|l| l.association(preload_association).loaded?}
      ActiveRecord::Associations::Preloader.new.preload(unpreloaded, preload_association)
    end
  end

Here is how it is used:

Site.has_many :campaigns
Campaign.has_many :assets

site.preload_into_association(:campaigns, :assets)
# which is better than
site.campaigns.preload(:assets) 
# because it generates new relation and site.campaigns.map(&:assets) is not cached...EVER...  

You may see that we filter out loaded records with loaded association from collection which resolves the original problem I described. The counter problem of that is that it doesn't allow to pass the nested associations... only an array of associations.

@dinahshi
Copy link
Copy Markdown
Contributor Author

dinahshi commented Mar 5, 2018

@rafaelfranca I like the Shopify implementation's straightforward way of handling relations. I'm a little concerned that the method can return different types. When passed a relation, it returns a relation so it can be chained with other query methods like where, order, etc. But when passed an Array it returns an Array.

@dinahshi
Copy link
Copy Markdown
Contributor Author

Addressed feedback from @rafaelfranca and @kaspth. aab3b41f3d360c56bcd8e72295863676d4b3ed7a fixes the issue raised in #32140 where it is assumed that an entire list is loaded when the first record in the list is loaded. 8d6e1468b6c126f7cfb96171e417d892f257bbdf fixes preloading different object ids.

Added documentation and updated changelog.

Copy link
Copy Markdown
Contributor

@lsylvester lsylvester Mar 19, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @dinahshi

These changes look good, but I think that they will conflict with some of the latest changes made in #31250.

The method preload_associations was defined by this PR so I think that this will cause conflicting behaviour.

In #31250, this method is doing the preloading of associations as part of the loading of the relation, so doesn't take any arguments for the different associations, but does accept the records as the preloading can be done on a subset of the records.

I think that this change would still be good without this alias, so maybe it could be removed?

Otherwise, maybe these methods could be moved to be class methods on ActiveRecord::Base as I am not sure if there is use case for using them on a relation (all the examples in the tests are directly on the ActiveRecord::Base subclasses), but there might be other use cases I am not aware of.

Last option would be to rename the other preload_associations method, easiest thing to do would be to prefix with _ as it is not really part of the public API.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing this out! I think it'd be nice to make preload_associations part of the public API. It doesn't seem like there's a precedent for prefixing private methods with _ so I'd like to rename the current method to be more specific—maybe preload_associations_on_records?

In terms of the method being defined on Base vs Relation, I wanted to keep the logic close to the original preload query method which is defined on Relation.

@sunnyrjuneja
Copy link
Copy Markdown

I'm super excited about this feature, thank you so much @dinahshi and @eileencodes (and everyone else who contributed). Just wanted to confirm, is this is targeting Rails 6?

@dinahshi
Copy link
Copy Markdown
Contributor Author

After a few months on the back burner, I believe this is ready for review! 😅

As suggested by @matthewd, I've removed the preload_associations alias to follow the precedent of "load" implying that an action occurs immediately, and renamed preload_associations (introduced by #31250) to execute_preload in order to distinguish that the operation occurs on a relation's previously defined associations.

Copy link
Copy Markdown
Contributor

@kaspth kaspth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking real good!

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still think a load_association (singular) alias would be nice for the single association case.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should preload be linked here? Or does RDoc do it automatically via # and I just don't remember? 😄

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other methods seem to have it linked automatically through # but I'm not sure!

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could self.klass here just be klass?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we'd just say

raise ArgumentError, "`load_associations` must be given…"

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How likely are users to pass in nested arrays? And ones where nil is part of the array?

Does the preloader do the same work on the records or does it support nested arrays and nil values?

Since we mutate records how can users properly loop through them if they're either nested or contain nil? E.g. should we punt the flatten.compact to user code instead of baking it in here?

Come to think of it, I'm curious if we even need this defensive coding method at all?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Preloader handles nil with a compact here. The docs say it should flatten nested arrays but now that I try it, I'm not seeing that behaviour. Will investigate further and see if either docs or code need to be changed.

I wanted to add this layer so that errors don't originate in Preloader, which isn't part of the public API. If we keep flatten, perhaps an addition to documentation (similar to the Preloader doc) would help explain the mutation to records.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked on master and Preloader also doesn't flatten arrays there (i.e. passing a nested array into the records parameter causes an undefined method for Array error). I'll leave the nil check in, remove the flatten, and update the Preloader docs.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice to know that we support polymorphic association loading as well!

I'm curious what happens if I keep the variables separate:

authors, posts = Author.all.to_a, Post.all.to_a

ActiveRecord::Base.load_associations(authors + posts, :comments)

authors.map(&:comments) # Will SQL queries fire here? Or are the comments loaded?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comments are loaded, no SQL queries will fire.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍, would it be clearer if we kept the variables separate in the tests and do authors + posts?

@dinahshi
Copy link
Copy Markdown
Contributor Author

dinahshi commented Aug 1, 2018

I just realized that ActiveRecord::Base.load_associations() doesn't work at the moment. It throws ActiveRecord::Base doesn't belong in a hierarchy descending from ActiveRecord during scoping since it's currently set up in the Querying module using delegate :load_associations, to: :all.

I'm not sure how to rework this, would #load_associations have to be defined on the Core module as well as the Querying module? Is this effort worthwhile given that the method could be called on any subclass (i.e. do we want to support ActiveRecord::Base.load_associations() at all)?

cc/ @matthewd

@matthewd
Copy link
Copy Markdown
Member

matthewd commented Aug 2, 2018

ActiveRecord::Base doesn't belong in a hierarchy descending from ActiveRecord

Well that's an impressively-confusing error message 😅

Yes, let's move it out of Relation and into one of the model modules. At a glance it seems like a pretty equally awkward fit for both Querying and Associations, so Core looks to be the way to go. It's just a move, so there's no added complexity -- while I wouldn't encourage people to call it that way, all class methods are accessible via the relation automatically.

My thinking here is that

  1. it's conceptually not really something that acts on the current relation: the only existing methods it calls on self are check_if_method_has_arguments! and build_preloader, which are fine to inline and duplicate;
  2. it does seem useful for third party libraries to be able to AR::Base.load_associations(..) without needing to find a suitable subclass in the application; and
  3. if we were to fix the above error, it would complain that AR::Base is an abstract class [and therefore can't have a base class]... which would mean other user-defined abstract classes become invalid too (which would be totally correct, but more annoying when documenting valid callees for this method).

@dinahshi
Copy link
Copy Markdown
Contributor Author

👋 again! Moved load_associations to Core, addressed comments from @kaspth, and tests are passing. Can I have another round of review please? cc/ @matthewd

@eileencodes eileencodes added this to the 6.0.0 milestone Jan 3, 2019
@bogdan
Copy link
Copy Markdown
Contributor

bogdan commented Feb 7, 2019

This PR has a lot of problems right now. Here is a brief list (some with failing tests):

  • records with loaded associations may cause double loading (test 1)
  • records with dirty associations will not properly merge in-memory and DB records (test 2) (see CollectionAssociation#merge_target_lists)
  • through associations improperly uses loaded state which can have its own custom scope (test 3)
  • through associations may override (with incomplete array because of through scope) and than reset the already loaded associations (no test invented yet)

My vision on fixing that is pretty blurry, but I can share these thoughts:
The handling of loaded and unloaded associations inside the same records array would need to improved. We can simply ensure association is not loaded before assigning association records to target inside #associate_records_to_owner. However this is ineffective due unnecessary loading of already loaded records. There should be a better way in generating separated preloader instances for records with loaded and unloaded associations.

Also, through associations loading should never assign records to source and through reflection when the special scope is applied (in terms of code - Association#reset should never be used).
It would require a serious refactoring of the preloader classes to ensure the code for loading records and assigning them as association targets is separated and can be used independently.

Failing test cases (pluggable to relations_test.rb):

  def test_load_associations_ignore_loaded_records
    post1 = posts(:welcome)
    post2 = posts(:thinking)
    posts = [post1, post2]
    comment_ids = post2.comments.to_a.pluck(:id)
    Post.load_associations(posts, :comments)
    assert_equal comment_ids, post2.comments.to_a.pluck(:id)
  end

  def test_load_associations_into_dirty_association_merges_targets_properly
    post = posts(:welcome)
    ids = post.comment_ids
    post.comments << post.comments.first
    Post.load_associations(posts, :comments)
    assert_equal ids, post.comments.ids
  end

  require "models/department"
  require "models/hotel"
  require "models/chef"
  require "models/cake_designer"
  require "models/drink_designer"
  def test_load_associations_doesnt_reset_loaded_through_associations
    cake_designer = CakeDesigner.create!(chef: Chef.new)
    drink_designer = DrinkDesigner.create!(chef: Chef.new)
    chefs = [cake_designer.chef, drink_designer.chef]
    department = Department.create!(chefs: chefs)
    hotel = Hotel.create!(departments: [department])
    Hotel.load_associations([hotel], :cake_designers, :drink_designers)
    assert_equal chefs, hotel.chefs
    assert_equal [cake_designer], hotel.cake_designers
    assert_equal [drink_designer], hotel.drink_designers
  end

@rails-bot
Copy link
Copy Markdown

rails-bot bot commented Dec 18, 2019

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.
Thank you for your contributions.

@rails-bot rails-bot bot added the stale label Dec 18, 2019
@rails-bot rails-bot bot closed this Dec 25, 2019
@eileencodes eileencodes reopened this Dec 25, 2019
@rails-bot rails-bot bot removed the stale label Dec 25, 2019
eileencodes added a commit to eileencodes/rails that referenced this pull request Dec 11, 2020
Part 1 of ?

The `ActiveRecord::Associations::Preloader` class was passing around
many of the same arguments to methods and was difficult to follow.

This PR does a few things:

1) Move arguments for the `Preloader` to the initializer instead of
`preload`
2) Merges the `preloaders_for_one` and `preloaders_for_hash` methods
into one `build_preloaders` method that takes no arguments. This new
method is responsible for looping through all the passed associations.
3) If there are nested associations passed, the class will build a new
`Preloader` object for each `child` in `build_child_preloader`.
4) The above changes allowed for removing most of the arguments being
passed down.
5) Implement a deprecated shim that preserves the original API. While
this API is private and undocuented, we know that lots of applications
are using the API. Generally we don't deprecate private APIs but due to
it's wide usage we decided to maintain behavior here.

Next steps:

1) Continue refactoring this class and shared classes as they make more
sense. There's a lot of `flat_map`'s in here that feel overused.
2) Once the API is more reasonable revive older Preloader PRs (ie rails#32136)
3) Expose a usable public API so that apps can build their own
preloaders when necessary (we do this a lot at GitHub) without
interacting with private methods.

Co-authored-by: Dinah Shi <[email protected]
Base automatically changed from master to main January 14, 2021 17:00
@rafaelfranca
Copy link
Copy Markdown
Member

Closing because the work is going to be done in effort to make preload public and we have a lot of conflicts.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.