Skip to content

Conversation

@tlconnor
Copy link

@tlconnor tlconnor commented Dec 9, 2020

The id_method_name is not a reliable method of determining IDs for relationships. It has built in assumptions about primary key / foreign key naming and can result in undefined method errors if the assumptions are incorrect.

Since a serializer is always the source of truth for determining how a record gets serialized, it would generally make sense to use that where possible.

This PR updates the Relationship class to use the serializer class to calculate IDs if an id_method_name is not provided, or if it does not map to an existing foreign key / primary key.

Closes #64

What is the current behavior?

  1. If an object_block is provided, the object is loaded and a method will be called to retrieve the ID using either id_method_name, or "id". This will raise an exception if the method cannot be found.
  2. If an object_block is not provided, a foreign key is calculated based on id_method_name and ActiveRecord conventions. The foreign key method is then caled on the parent record. This will raise an exception if the method cannot be found.

What is the new behavior?

  1. If an id_method_name is provided, it will attempt to use that as a foreign key
  2. If an id_method_name is not provided, or does not correspond to a method on the parent record, a foreign key is guessed based on ActiveRecord conventions.
  3. If the foreign key could still not be found, the child object is loaded and id_method_name is used as a possible method for look up the ID from the child object.
  4. If id_method_name does not map to a method available on the child record, the serializer for the relationship is used to look up the IDs.

Checklist

Please make sure the following requirements are complete:

  • Tests for the changes have been added (for bug fixes / features)
  • Docs have been reviewed and added / updated if needed (for bug fixes /
    features)
  • All automated checks pass (CI/CD)

@tlconnor tlconnor force-pushed the fixRelationshipFetchId branch 4 times, most recently from 2e1702a to 021b1e3 Compare December 10, 2020 01:56
@stas
Copy link
Collaborator

stas commented Dec 12, 2020

@tlconnor this indeed is a pretty well known bug. The inconsistencies around it are described in #53 and #45 and are one of the reasons I started #141

I don't mind merging this, but I will have to ask you to provide a test-case as well. Would you be kind?! 🙇

@christophersansone
Copy link
Contributor

christophersansone commented Dec 12, 2020

@tlconnor Thanks, agreed that something should happen here long-term. I wrote #45 to address it but couldn't get everyone behind that particular implementation.

My concern is that this will be a change in behavior (and therefore a breaking change) for people using id_method_name. If we are intentionally making a breaking change, we should follow proper semantic versioning with a major version bump and deprecation warnings in the meantime.

I'm thinking the right approach would be to remove the part in ObjectSerializer where id_method_name is being computed, so that we can differentiate between when the developer actually specifies it vs. when it's up to the serialization process to automatically determine it. In other words, the Relationship object's id_method_name should be nil if not specified, and then we can safely move the computation into the Relationship object where it belongs. This way, we can focus on improving the way it is automatically computed in a backwards compatible way, without breaking the use case where the developer specifies id_method_name directly.

Last thing. When a relationship is being included for serialization, it's no problem to get the ID from the object and serializer. When it's not being included, it gets more complicated from a performance standpoint, because we don't necessarily want to fetch all the records / models just to get the IDs if it can be avoided. ActiveRecord has a shortcut where, if a Book has_many :authors then we can get the IDs as an array by simply calling book.author_ids. The id_method_name currently leverages this here.

There are many things to consider here if we are going to try to maintain performance and backward compatibility. Happy to assist you and @stas . Some basic pseudo-code might look something like this:

def fetch_ids(record)
  if object_block.nil?
    # if an object block is specified, avoid these quick attempts in favor of fetching the objects below

    # try the id_method_name that was specified by the developer in the relationship's properties
    return record.send(id_method_name) if id_method_name.present? && record.respond_to?(id_method_name)

    # try the conventional relationship_ids method name, e.g. author_ids or publisher_id
    return record.send(relationship_id_method_name) if record.respond_to?(relationship_id_method_name)
  end

  if has_many?
    # iterate over the objects and get the IDs
    rel = fetch_associated_object(record)
    if rel.respond_to?(:map)
      return rel.map { |o| relationship_object_id_for(o) }
    end

    # not sure what else should happen if a has_many relationship does not respond to map ... error?
  end

  if belongs_to? || has_one?
    # iterate over the objects and get the IDs
    rel = fetch_associated_object(record)
    return relationship_object_id_for(rel)
  end
end

def relationship_object_id_for(rel)
   if id_method_name.present?
    return rel.send(id_method_name)
  else
    return serializer_for(rel).id_for(rel)
  end
end

def relationship_id_method_name
  @relationship_id_method_name ||= has_many? "#{name}_ids" : "#{name}_id"
end

The main point of uncertainty is, if the developer specifies id_method_name, whether he means for it to apply to the parent or the child. In my opinion, moving forward, it should only apply to the parent, because the object's serializer should be the single source of truth for determining how to determine the ID of a single object. That's why I advocated for deprecating id_method_name as is and implementing a separate foreign_key field for this purpose moving forward. I don't care what it's named, as long as what it does is clear...

@tlconnor tlconnor force-pushed the fixRelationshipFetchId branch 2 times, most recently from ff80353 to 348b23f Compare December 13, 2020 22:36
@tlconnor
Copy link
Author

Thanks @stas / @christophersansone for the feedback.

I have updated this PR to make the following changes:

  • Backwards compatibility is now maintained. If a user provides an id_method_name, it will continue to be used. This applies to both the foreign key and primary key use cases.
  • Foreign key computation is moved into the Relationship class
  • If the foreign key can not be found, the serializer is used to look up the ID.
  • Explicit test cases are added for the cases where the serializer is now used to look up related IDs.

What this means is the existing behavior is now maintained, except that instead of

undefined method `actor_ids' for #<Movie>

we now use the serializer to look up the IDs.

@tlconnor tlconnor force-pushed the fixRelationshipFetchId branch from 348b23f to 43fd2e5 Compare December 13, 2020 23:09
The `Relationship` class currently has two possible methods for determining the ID for a relationship:
1. It can use the `serializer` associated with the relationship.
2. It has an option called `id_method_name`, where `id_method_name` is the name of a method on the parent record that can used to look up the ID of the related record without loading the related record.

The `id_method_name` is not a reliable method of determining IDs for a couple of reasons:
1. A value is not always provided and the calculated default is not always accurate.
2. The parent doesn't necessarily always have a method for finding the related ID directly, and there is no guarantee that it will be performant.
3. If the method does not exist, the user will get a not very helpful error message.

Since a serializer is always the source of truth for determining how a record gets serialized, it would generally make sense to use that where possible.

This PR updates the `Relationship` class to use the serializer class to calculate IDs in cases where an `id_method_name` is not provided and the related record and serializer are both available.
@tlconnor tlconnor force-pushed the fixRelationshipFetchId branch from 43fd2e5 to 62d1861 Compare December 13, 2020 23:11
# This code path should be removed when `id_method_name` is replaced with `foreign_key`
#
if id_method_name
if has_many? && object.first.respond_to?(id_method_name)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for taking a stab at this, I think it's really close!

I'm a bit concerned about calling first here. If it's an Array, no big deal. If it's an ActiveRecord relationship, and the relationship has not been loaded yet, then first will call its own SELECT ... LIMIT 1 query. Then the full query of all records would follow when map is called. So you'd get two queries here, which we should avoid.

Also, what happens if the developer intends for id_method_name to be called on the child, but the relationship has no associated records? It will fall through to attempting the id_method_name on the parent, which may result in incorrect output.

I think we can solve for both this way:

if has_many?
  if object.length > 0 && object.first.respond_to?(id_method_name)
    return object.map { |item| item.public_send(id_method_name) }
  else
    return []
  end
...
end

Also, what should happen if id_method_name is specified but neither the parent nor child respond to it? I tend to think it should throw an exception as opposed to falling through to the next attempts. If the developer specifies it, it should tell the developer if it's wrong.

# explicit.
#
foreign_key = id_method_name
foreign_key ||= (has_many? ? "#{@name.to_s.singularize}_ids" : "#{@name}_id")
Copy link
Contributor

@christophersansone christophersansone Dec 14, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For performance, we should memoize this computation, as we may be iterating over hundreds of records or more, and it is constant once the relationship is defined. Either on init or as a memoized method.

@computed_id_method_name = has_many? ? "#{@name.to_s.singularize}_ids" : "#{@name}_id"

@christophersansone
Copy link
Contributor

@tlconnor 👏 Thanks for the updates! I added a couple comments, but in general I think it's as good as it can be if we go down this path. You definitely opened a can of worms though!

My concern is that there still will be edge cases where it causes a breaking change: the order in which the process flows, the removal of the hardcoded :id in some cases, possibly now requiring a serializer that was not required prior to this, and of course the uncertainty of whether id_method_name is intended to be called on the parent or the child.

@stas, how important is it to maintain perfect backward compatibility here, at least until a major version bump? Is there a new major release on the horizon soon?

Ultimately, my opinion is to deprecate id_method_name entirely, in favor of a new parameter that applies only to the parent. We can simply call it id_method. Its job is to simply customize the method that returns the foreign key(s), e.g. book_id, author_ids, etc. If the developer wants to customize the ID method of the child, then they would do that through the child object serializer only. To me, that gives us a nice path forward, notifies the developer of the changes, and ultimately puts things where they belong.

@stas
Copy link
Collaborator

stas commented Dec 14, 2020

@stas, how important is it to maintain perfect backward compatibility here, at least until a major version bump? Is there a new major release on the horizon soon?

Good question. #141 is pretty much ready for a beta. I'm stuck finding some time to finish up the inline-docs and general documentation. And this issue is already handled fully (:crossed_fingers:) in the that branch. Saying that, I really don't know if I have time and energy to introduce an intermediary significant release and deal with the requests for that.

Honestly, I'd push for #141 as a v.3-beta release, even the way it is now, close a bunch of issues it handles and keep the good work on top of it. But I also don't want to sound like 💩 and not consider the contributions from the community like what @tlconnor is doing (where I apologise for not taking time to look into yet 🙈)...

@christophersansone
Copy link
Contributor

@stas Sweet, I just did a quick review of #141 and see that id_method_name was removed in favor of ids_method_name. Love it! Looks like a lot of simplification all around. Great work! 👏

@tlconnor Thanks again for tackling this... twice! Because of the potential for breaking changes with certain use cases, I'm thinking #141 with the major version bump is the better option. Are you able to solve your specific issue today using the current version of this library, either with id_method_name or a Proc? If not, can you provide a specific example? Let's definitely make sure this isn't blocking you from getting your work done.

@tlconnor
Copy link
Author

@stas Sweet, I just did a quick review of #141 and see that id_method_name was removed in favor of ids_method_name. Love it! Looks like a lot of simplification all around. Great work! 👏

@tlconnor Thanks again for tackling this... twice! Because of the potential for breaking changes with certain use cases, I'm thinking #141 with the major version bump is the better option. Are you able to solve your specific issue today using the current version of this library, either with id_method_name or a Proc? If not, can you provide a specific example? Let's definitely make sure this isn't blocking you from getting your work done.

I am not blocked, I've added my 2 fixes to my app as monkey patches for now. The main area of concern for me is the non-rails use cases where relationships aren't backed by ActiveRecord.

I'm happy to drop this change in favor of #141, as long as some sort of forward progress is being made there. If you release it as a beta I would be happy to switch to the beta version.

@stas @christophersansone would one of you also be able to look at #154? It is a simpler bug fix that fixes another blocking bug for me.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

embedded relationship data does not honor relationship's object_method_name

3 participants