Skip to content

Conversation

@christophersansone
Copy link
Contributor

@christophersansone christophersansone commented Dec 13, 2019

Overview

This PR implements the following changes:

  1. The ability to specify a relationship's serializer option as a Proc, when the serializer may be dynamic or requires some custom logic. Resolves Allow a relationship's serializer to be computed based on the object #29. Example:
  class ZooSerializer
    include FastJsonapi::ObjectSerializer

    has_many :animals, serializer: -> (object) do
      if object.is_a?(Mammal)
        MammalSerializer
      elsif object.is_a?(Bird)
        BirdSerializer
      else
        AnimalSerializer
      end
    end
  end
  1. When record_type is not specified on a relationship, the outputted type is now always determined from its serializer's set_type.

  2. When record_type is not specified on a relationship, the relationship's serializer must always exist (so that it can get the type).

Is it a breaking change for anyone?

There are two edge cases where this can result in different behavior than before:

  1. A record_type is not specified, and the old way to infer the type does not match that of the relationship's serializer. In this case, this was likely invalid JSON:API output (see Polymorphic hash does not select the proper serializer for included records #30), so this change now prevents this from occurring.

  2. A record_type is not specified, and the relationship's serializer is not defined. This should not really be happening in practice... if a relationship cannot be serialized, there is no sense in defining the relationship to begin with.

TL;DR

The new serializer option behaves just like the Proc definitions for attributes and conditionals. This gives the developer full control over determining the serializer to use based on any possible use case.

Because the Proc may output different serializers, and different serializers means having different type values, I needed a way to get the correct types somehow. I chose to use the serializer's set_type value to offer a Single Source of Truth. This resolves some uncertainty as to what record_type does, whether it chooses the serializer, what type is being outputted, etc (#14). It also resolves #30, where the type in the relationships hash may differ from the type in the included list. With this change, the type will always be consistent and simplifies the API because the developer no longer needs to define it in both the serializer and on all relationships with via record_type.

There were very few changes that I had to make to the existing specs... but there were changes. I want to specifically point them out and encourage you to review the spec changes. I annotated the changeset with comments for further clarity as well.

  1. Because the type is now resolving differently, it is possible that the output may change for some people. When the object itself is serialized, as data or within included, the type always comes from the serializer's set_type. That has not changed at all. What has changed is that the relationships hash now uses that same value from set_type, as opposed to inferring it by the property name. Therefore, the only cases where the output would be different after this change than before this change is if the relationships hash contained a different type than the record itself, which to me is a bug in the output.

  2. As a result of the previous point, it now requires a serializer to be defined for each class being outputted, so that it can properly resolve the type key. In other words, previously, you could define a relationship without actually having a serializer defined for those items, and it would infer the type from the property name. I can’t imagine that this is an actual problem in practice, because it does no good to have a relationship defined without somehow being able to serialize those records. I only mention it because there were some specs that defined relationships without defining the serializers.

How do these changes impact performance?

I tried to compare the rspec performance benchmarks between the master branch and this branch. When the difference is microseconds, it is really hard to get perfectly consistent readings on my computer (late 2019 MacBook Pro). Initial load appears to be consistently affected because of the up front serializer resolution, but we are talking about 0.3ms. After that, performance is quite similar. The tests with larger recordsets were often faster with the new implementation, though it is not consistent enough to state one way or the other. I am confident saying that the performance impact is negligible on initial render and at least as fast on subsequent renders.

What should happen to record_type?

The record_type option is still usable and will override the default behavior of looking up the serializer. To me, the negatives outweigh the positives at this point. It adds to the API surface and overlaps other functionality, causing confusion. And worse, because it can result in the type in the relationships hash being different than the serialized record itself, it can result in invalid output. I believe it should be deprecated, but it still "works" for now.

attributes :compensation

has_one :account
has_one :account, serializer: EmployeeAccountSerializer
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This here was extremely unclear what the behavior should have been. There is a property called account, but the class is defined as EmployeeAccount. Should it assume AccountSerializer because of the property name? Should it read the object and see that it's an EmployeeAccount and therefore use EmployeeAccountSerializer?

FYI, the existing behavior (and the updated behavior) is that has_one :account will resolve to AccountSerializer. This has not changed. What has changed is that it now requires the serializer to exist. But, the reality is that you probably won't have an AccountSerializer when the class is EmployeeAccount. To me, this looks like a bug here in the EmployeeSerializer, where we should expect it to be EmployeeAccountSerializer and therefore the relationship would somehow need to specify the serializer.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks a bit messy, but it seems like the account serializer was never defined and actually not used except in two tests (lines 144-151). I think we can just replace the account with something that already has a defined serializer, like move the has_one :photo into the EmployeeSerializer and update the tests to reflect photo instead of the account. Plus we delete a couple of extra lines.

@christophersansone
Copy link
Contributor Author

@stas Hey, just wondering if you've had a chance to consider this one. I believe there are some important improvements here moving forward. I'd love to help address any concerns you may have.

Copy link
Collaborator

@stas stas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First of all, apologies for the delay in reviewing this. 🙈

@christophersansone I left some notes I'd love to have your input on before we move on with this.

Non-PR-related concerns (cc @jopotts @kpheasey):

  1. Should we set up a default set of fixtures, the fact that every test-case defines it's own set of fixtures seems, at least, hard to follow. I prefer us to have a default set of models/fixtures we all work and reuse?! If so, could we start with this test-case to build this fixtures library please 🙏
  2. Shall we adopt the jsonapi-rspec1 gem to help DRY up these tests instead of having these custom shared examples?! The gem is not maintained despite having a really nice API and a bunch of forks. We could even get in touch with the jsonapi-rb crew and see if we can take it over.

attributes :compensation

has_one :account
has_one :account, serializer: EmployeeAccountSerializer
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks a bit messy, but it seems like the account serializer was never defined and actually not used except in two tests (lines 144-151). I think we can just replace the account with something that already has a defined serializer, like move the has_one :photo into the EmployeeSerializer and update the tests to reflect photo instead of the account. Plus we delete a couple of extra lines.

@stas stas self-requested a review January 5, 2020 19:31
@christophersansone
Copy link
Contributor Author

I agree that the tests and fixtures can use some improvement. I opened #33 to discuss having more, clearer tests for each relationship option, but it sounds like you have some ideas to broaden it even further. I will defer to you on how to better structure everything related to tests, but I am open to whatever makes sense.

end
end

def initialize_static_serializer
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@christophersansone I'm trying to understand the reasoning behind all this caching and extra attributes. Correct me if I'm wrong, but a relationship will have just one serializer klass, which we can calculate upon the relationship initialization DSL (say as part of the create_relationship). Same is true for the polymorphic case. So why do we need the extra logic here, seems overly complicated?

Copy link
Contributor Author

@christophersansone christophersansone Jan 7, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct me if I'm wrong, but a relationship will have just one serializer klass...

This is true in some cases but not all. A relationship's serializer can either be static or dynamic.

When it is static, it is determined once up front and will not change. This is the most common use case:

belongs_to :actor
has_many :movies
belongs_to :genre, serializer: CustomGenreSerializer

The serializer is considered dynamic when it may change based on object or params. This occurs in the following ways:

has_many :movies, polymorphic: true
has_many :actors, polymorphic: { ComedyActor => :custom_comedy, DramaActor => :custom_drama }
has_many :genres, serializer: -> (object, params) {
  ...
}

Therefore, the role of initialize_static_serializer is to determine if the relationship's serializer can be static, and if so, which serializer it is. It determines this once and then caches it forever. If it determines that the serializer is dynamic, then it returns nil, stating that there is no static serializer.

Then, if there is no static serializer, it knows that it needs to dynamically determine the serializer. In some cases, like polymorphic: true, it simply looks at the object class, computes the serializer by name once and memoizes it in @serializers_for_name. Other times it haas more work to do, such as when options[:serializer] is a Proc.

The bottom line is this. If options[:serializer] is specified, that takes precedence over everything. The developer is specifically stating which serializer to use (or how to determine it if a Proc is specified). If not, then it needs to infer it, based on whether options[:polymorphic] is specified, by the relationship name, etc.

Is it complicated? Yes. Is it overly complicated? I don't think so, given that it all needed to be backwards compatible with options[:record_type] and options[:polymorphic]. I basically just consolidated all the logic into one place. Now to me the flow is quite simple:

  1. Up front, it determines if the serializer is static or dynamic.
  2. For every record being serialized, the relationship calls serializer_for to determine which serializer to use. If the serializer is static, it returns the static serializer immediately. If not, it performs the dynamic check (which also does what it can to memoize the results for performance purposes).

To get an idea of how much more consolidated the logic is, look in the original implementation about how many checks for polymorphic are all over the place. Look at how many times it needs to iteratively call serializer.to_s.constantize or object.class.name.demodulize.underscore.to_sym.to_s.constantize. And consider how much simpler the Relationship API is for determining the serializer of any given object.

Finally, to address the second part of your question:

...which we can calculate upon the relationship initialization DSL (say as part of the create_relationship)...

If the serializer is indeed static, then it is indeed determined "up front". It is not done in create_relationship for two reasons:

  1. As stated earlier, if options[:serializer] is specified, then it takes precedence over any other serializer determination such as polymorphic. Therefore, we don't want to create the relationship with its serializer defined as options[:serializer] || compute_serializer(). We could still compute it in create_relationship, but it would have to be as a separate variable. To me this is more complicated, not less, and at that point I would rather have the relationship handle it.

  2. Before, the serializer passed into Relationship was a symbol, which still has to be constantized. And it was constantized often. Now, the static serializer determined by the relationship is the actual serializer class, and THAT is memoized. Because of this, we cannot necessarily determine this when the Relationship is created, because the serializer class may not be loaded yet. Therefore, it is determined "up front", i.e. the first time it is attempted.

Hopefully this answers all your questions.

@mmkarim
Copy link

mmkarim commented Jan 8, 2020

Hi @christophersansone,
Thank you for coming up with this feature. I was trying the fixes from your branch and found an unusual bug. Here's how to recreate it.

Declare a has_many relation with polymorphic: true. In my example, user has_many documents.

class User
  include FastJsonapi::ObjectSerializer
  set_type :user 

  has_many :documents, polymorphic: true, serializer: Serializer::Document

And in the response you will see, the id field for the document is showing the id of the user.

Here's how the response looks like,

    "data": {
        "id": "1043",
        "type": "user",
        "relationships": {
            "documents": {
                "data": {
                    "id": "1043",
                    "type": "document"
                }
            }
        }
    }

Hopefully it will help you to fix the bug.

@christophersansone
Copy link
Contributor Author

christophersansone commented Jan 8, 2020

@mmkarim Thanks for the report! Can you confirm that this is not occurring in the master branch here?

I noticed some things in the existing code along the lines of properly resolving the IDs. It seems like id_method_name is not being used correctly in all cases. Maybe related to this commit.

If a Pet belongs_to :owner, then id_method_name should refer to the way to retrieve the owner's ID from the pet model, e.g. :owner_id. I think there is an issue where it is using id_method_name on the child instead of the parent. I think id_method_name can be misinterpreted two different ways: "the foreign key name of the relationship" vs. "the primary key name of the related record". I think it's being used in both ways, which is not correct. I did not go down that rabbit hole in this PR.

FWIW, id_method_name should refer to the foreign key name. If the primary key name needs to be determined from the relationship, it should look up the serializer and use the value from set_id. This could not be determined effectively prior to this PR, but now we are always looking up the serializer for relationships. So, I think we can solve the root issue effectively once this PR is merged.

If you can confirm that this problem is in master, would you mind creating a separate issue?

@mmkarim
Copy link

mmkarim commented Jan 8, 2020

@christophersansone I tried with the master branch. It's working correctly there.

@christophersansone
Copy link
Contributor Author

christophersansone commented Jan 8, 2020

@mmkarim I think this is what's going on:

  1. You are specifying a specific serializer: Serializer::Document. This will be used for every child record, therefore it is considered a "static" serializer.

  2. You are specifying polymorphic, which affects id_method_name. When polymorphic is not specified, then by default it assumes it can fetch the IDs via user.document_ids. When polymorphic is specified, it assumes that it must reach into each individual record and touch id, i.e. user.documents.map(&:id). As a result, id_method_name becomes id instead of document_ids.

  3. During serialization, the static serializer is causing it to fetch the IDs from the foreign key, i.e. user.document_ids, but because polymorphic changed the field name, it is looking up user.id instead.

I'll try to make this backward compatible with the old behavior, in case others run into this. The right solution here is this:

  • Let the Relationship determine what id_method_name should be if it is not specified in options.
  • When the serializer is static, it should assume it can do the foreign key lookup, i.e. user.document_ids.
  • When the serializer is dynamic (polymorphic or a Proc), it should iterate over the objects and use the ID field specified by its serializer's set_id.

Lastly, can you help me understand what you are expecting to accomplish when both polymorphic and serializer are specified? Before this PR, polymorphic would take precedence over serializer, which means that it would attempt to look up different serializers for each object. After this PR, serializer takes precedence over polymorphic, so it will always use Serializer::Document. Which one are you actually trying to accomplish? Is there a reason why you needed to specify both in this case?

As a rule of thumb, specifying polymorphic and serializer together is not a good idea because they have overlapping functionality, causing expectations to be unclear. @stas and I have discussed the impact of these two quite a bit.

@christophersansone
Copy link
Contributor Author

@mmkarim cc @stas I just created #45 to address the overall issues with id_method_name.

@christophersansone
Copy link
Contributor Author

@stas I created ObjectSerializer#serializer_for as we discussed. Let me know what else is needed to merge!

@mmkarim I believe I fixed your issue by giving polymorphic precedence over serializer. I would have preferred not to, but it does a better job of maintaining backward compatibility when both are specified. Can you please confirm that this is fixed?

@mmkarim
Copy link

mmkarim commented Jan 9, 2020

Hi @christophersansone, The issue with the id is solved now.
But now I'm not able to pass the serializer for the relations,

This is the error I'm getting,
Serializer::User cannot resolve a serializer class for 'Document'. Attempted to find 'Serializer::UserDocumentSerializer'. Consider specifying the serializer directly through options[:serializer].

though I'm passing the serializer already,

has_many :documents, polymorphic: true, serializer: Serializer::Document

@jopotts
Copy link
Contributor

jopotts commented Jan 9, 2020

Hi @christophersansone - great work. @mmkarim it would be good to have a test to show the failures, and prevent future regression. Not sure how easy that would be though.

And about @stas's comment about the fixtures and improved tests, I agree but as @christophersansone said, it should be separate from this PR. #33 is the start of that.

@ivan-garanin
Copy link

ivan-garanin commented Jan 13, 2020

Also it will be great opportunity to specify id_method_name / foreign_key in Proc too. As example

  id_method_name: -> (object) do
      if object.is_a?(Mammal)
        :slug
      elsif object.is_a?(Bird)
        :uuid
      else
        :id
      end
    end

I have got custom id (set_id) in several serializers and it's a headache for me right now.

Copy link
Collaborator

@stas stas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@christophersansone let's get this merged! 🙇‍♂️

@ivan-garanin please move the conversation into #45, thank you!

/cc @jopotts @kpheasey

@stas
Copy link
Collaborator

stas commented Feb 7, 2020

@christophersansone I'd like to get this merged, please let me know if you're ok me picking up the work where you left. Ty 🙇‍♂️

@christophersansone
Copy link
Contributor Author

@stas So sorry for dropping the ball. I had some time over the holidays but have been swamped since. I've been using it in production since then and it's been working great. I did find a minor race condition that I resolved, but other than that, it hasn't changed.

I resolved the merge conflict here in Github (just the README). I thought it would just create a new commit in my branch, but apparently not. I would have just rebased instead had I known. Anyway, I think it's good to go.

Is there anything else that needs to be done prior to merging? Feel free to ask or to just take the ball and run with it, whatever works best!

@stas stas merged commit 6d01bec into jsonapi-serializer:master Feb 15, 2020
@stas
Copy link
Collaborator

stas commented Feb 15, 2020

Thanks @christophersansone 🙇‍♂️

everlast240 added a commit to everlast240/grape_jsonapi-serializer that referenced this pull request Aug 21, 2020
…zer/jsonapi-serializer),

instead of fast_jsonapi (https://github.com/Netflix/fast_jsonapi).

Also contains a small fix in the GrapeSwagger parser to use the relationship `key` instead of `record_type`,
which may be caused by jsonapi-serializer/jsonapi-serializer#32

TODO: A couple of the specs are commented out for now.
@everlast240
Copy link

Hello,
It seems to me that following this PR, the record_type of relationships remains empty in some cases. This may break code, which depends on it being set and effectively could lead of having type: null in relationships.
I noticed this while updating https://github.com/EmCousin/grape_fast_jsonapi to work with the new jsonapi-serializer fork of fast_jsonapi. You can see my update / fork and what I mean here: https://github.com/everlast240/grape_jsonapi-serializer/blob/master/lib/grape_jsonapi-serializer/parser.rb#L118

      def relationships_example(relationship_data)
        # data = { id: 1, type: relationship_data[:record_type] }
        data = { id: 1, type: Dry::Inflector.new.singularize(relationship_data[:key]).to_sym }

The original implementation used record_type, but I had to rework it to use the key (that's what I decided in the end, after testing several other options).
This not only caused the gem's specs to fail with errors like:

       expected: {:data=>{:example=>{:attributes=>{}, :id=>1, :relationships=>{:user=>{:data=>{:id=>1, :type=>:user}}}...
            got: {:data=>{:example=>{:attributes=>{}, :id=>1, :relationships=>{:user=>{:data=>{:id=>1, :type=>nil}}}, ...

But also, the model documentation generated for my API by Grape Swagger started having relationship types as null too. E.g.:

{
  "data": {
    "id": 1,
    "type": "group",
    "attributes": {
      "name": "Example string",
      "cost-center": "Example string",
      "description": "Example string",
      "created-at": "Example string",
      "updated-at": "Example string"
    },
    "relationships": {
      "users": {
        "data": [
          {
            "id": 1,
            "type": null
          }
        ]
      }
    }
  }
}

PS: Such an issue doesn't exist with the original Netflix codebase

@christophersansone
Copy link
Contributor Author

@everlast240 Hi there! Yes we made some changes to #record_type in this PR for two reasons:

  1. Better support of polymorphic relationships, where different types may be output for the same relationship.

  2. Offer a "Single Source of Truth" for the type using the object serializer instead of defining them at the relationship level, which lead to duplication of code, prone to error, and difficult if not impossible for polymorphic relationships.

Now, the record_type of a relationship is inferred by the serializer of the related object. I'm not entirely sure what you're trying to accomplish, but if the relationship is static, you can look it up via Relationship#static_record_type. If the relationship is dynamic (such as a polymorphic relationship) and there is no :record_type specified, then there is no way to know up front what the type should be until you run a record through it, so static_record_type will result in nil. Note that I would consider static_record_type to be a private API, and although we don't intend to change it anytime soon, use at your own risk.

@stas
Copy link
Collaborator

stas commented Aug 26, 2020

@everlast240, actually I'd be happy to review an example and maybe put it into a test case if you're willing to share it.

@christophersansone thanks for helping us understand the internals 🙇

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Polymorphic hash does not select the proper serializer for included records Allow a relationship's serializer to be computed based on the object

6 participants