-
-
Notifications
You must be signed in to change notification settings - Fork 4.5k
fix sqs multi-protocol handling and upgrade botocore #9710
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
f01ddec to
cb8f2cf
Compare
0ebd655 to
ab02b47
Compare
c092a3c to
3b792e5
Compare
7619561 to
09a4c3f
Compare
LocalStack Community integration with Pro 2 files 2 suites 1h 17m 41s ⏱️ Results for commit 72d8615. ♻️ This comment has been updated with latest results. |
bentsku
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wow, this is an extensive set of changes. Biggest takeaway the clean separation between the protocol and the service name, made possible with the nice changes in the service router!
We've also had a sync looking at a lot of test changes, and it looks great! I like the parametrization making sure we properly test both protocols.
There's a lot of clean up in here, and it seems #9828 has been merged, which would maybe need additional snapshots and parametrization as well.
All in all, LGTM! Thank you so much for addressing this and making a nice clean fix 🚀
thrau
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall this is already a huge improvement over what we had before! I do have a few open questions/comments that I'd like to understand before approving, but overall really great work!
| def test_query_protocol_error_serialization(): | ||
| exception = InvalidMessageContents("Exception message!") | ||
| _botocore_error_serializer_integration_test( | ||
| "sqs-query", "SendMessage", exception, "InvalidMessageContents", 400, "Exception message!" | ||
| "sqs", "SendMessage", exception, "InvalidMessageContents", 400, "Exception message!" | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since this is specifically a test for the query protocol, should we maybe use a service that we know is going to use an unaltered version of the query serializer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SQS is used a lot across the parser tests. In fact, the parser and serializer tests should be refactored and structured in the future, but I would like to avoid adding this to this PR, if it's okay for you.
|
|
||
|
|
||
| def resolve_conflicts(candidates: Set[str], request: Request): | ||
| def resolve_conflicts(candidates: Set[ServiceModel], request: Request): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are we still passing a Set[ServiceModel] ? is ServiceModel even hashable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was hashable, yes, but it's a good point. The service router has now been restructured such that the different stages are working based on service model identifiers, and the main function loads and returns the model.
So in the latest revision, this specific parameter has been changed to a set of ServiceIdentifier (a named tuple containing the service name and the protocol).
| # default sqs protocol (`query`). | ||
| content_type = request.headers.get("Content-Type") | ||
| return "sqs" if content_type == "application/x-amz-json-1.0" else "sqs-query" | ||
| return "sqs-json" if content_type == "application/x-amz-json-1.0" else "sqs" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: i was under the impression we were getting rid of the virtual service names? shouldn't this then return the service model instead? if not, should we introduce some form of value type for the (service,version,protocol) tuple which should uniquely identify a service model?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for you input, this is how it's working now (we have a named tuple uniquely identifying a loadable service model), which is the return type of the conflict resolution / heuristic stage functions.
localstack/aws/spec.py
Outdated
| @cached_property | ||
| def target_prefix_index(self) -> Dict[str, List[ServiceName]]: | ||
| def target_prefix_index(self) -> Dict[str, List[ServiceModel]]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We seem to now store ServiceModel instances in the catalog index, which is the one we serialize and save to disk. How does this affect the size of the cache file and the performance of serializing/deserializing the cache?
I would be more inclined to keep cache structures to primitives types when they are stored. So something like the service,version,protocol tuple i mentioned earlier could be useful for that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a very good point, the model caching totally slipped my mind.
I changed the index such that it now only uses ServiceModelIdentifier (a NamedTuple).
This obviously increases the size of the service catalog pickle, but just by ~225kb instead of multiple megabytes: 383,9kb (old) compared to 608,6kb (new)
This results in the following building and loading times of the service catalog:
| Stage | Previous Index | New Index |
|---|---|---|
| Building | 2.0 - 2.1s | 1.8 - 2.2s |
| Loading | 14 - 15ms | 27 - 40ms |
However, the variance was quite high for my tests (on my notebook).
But we can see that the building doesn't really take considerably longer, the loading takes double the time, but is only executed once per LocalStack run and it's an impact of max. 25ms.
localstack/aws/spec.py
Outdated
| # the service name needs to be set to standard sqs | ||
| if service == "sqs-json": | ||
| service = "sqs" | ||
| return ServiceModel(service_description, service) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be really good if we could get protocol: str as an optional parameter into our load_service implementation instead of relying on the fake service names. i understand that we currently need them under the hood to know which spec file to load, but basically our api should understand protocols as first-class citizen.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a good point. The new solution does not use the sqs-json service anymore in the code.
However, using a custom service name has a big benefits: This way it integrates really nicely with botocore and our internal AWS client factory. We can just use aws_client.sqs_json to load a client which reads the spec we ship in the localstack module. This can be really helpful in the tests, and might also come in handy for other use cases. So we would like to have two ways to load a service with a specific protocol:
- Add the suffix to the service name (f.e.
sqs-json), which enables the use case mentioned above. - Add the protocol to the service loading function, to have a clean and clear way to define a specific protocol of a service to load.
I also tried to avoid patching botocore, but since the spec#load_service is the only function using the custom loader, which is why this custom logic (implementing the protocol-specific loading) is contained in that function.
2d78884 to
6fa2171
Compare
f9b30e2 to
005a20a
Compare
01734d5 to
b14b789
Compare
b14b789 to
e151e88
Compare
e151e88 to
5afd969
Compare
5afd969 to
72d8615
Compare
thrau
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fantastic job @alexrashed! thanks a lot for your patience and persistence on this long standing issue! i think we found a good solution for now, and it's exciting to see how we're really pushing the boundaries of botocore and what we need from it.
well done!
|
merging the PR to get some data over the weekend. will keep an eye on it! |
Motivation
This PR aims at cleaning up after the quick fixes with the protocol switch in SQS of
botocorefromquerytojson(and now back again).These changes of the protocol are highly problematic, because this means that clients are suddenly changing the way how they talk to AWS (and in turn also with LocalStack). There's nothing we can do on our side besides implementing the new protocol (in addition to the old protocol to remain downwards-compatible with other or older clients).
Here's a timeline of what happened:
botocoreswitched the protocol for SQS fromquerytojsonin version1.29.127(specifically this commit: boto/botocore@2395012).botocorereverts the change with Revert SQS protocol changes boto/botocore#2931 in version1.29.128.botocoreagain performs the switch in version1.31.81(specifically this commit: boto/botocore@84c7847).botocorechanges the exception handling for SQS in version1.31.85in Fix Sqs ErrorType not being parsed properly boto/botocore#3054.botocoreswitches to gzipped specs which is not compatible with moto at the time.botocoreas a first response.motosuch that it contains Techdebt: Fix compatibility with botocore 1.32.1 getmoto/moto#7030.botocoreagainbotocoreswitches the protocol for SQS back fromjsontoqueryin version1.32.6(specifically this PR: Revert SQS protocol changes boto/botocore#3071)botocoreas a first response.This is where we are at. With this PR we want to:
sqs-queryshould only be used in very specific sections explicitly handling thequeryspec.botocore, effectively switching back toqueryas the default protocol, but still supportjson.Dockerfileintroduced with pin global awscli install in Dockerfile #9767.Changes
sqs-queryservice "alias", and removes all special cases handling this service alias.serializerandparsersuch that they differentiate based on the protocol of the given service model (instead of only based on the service).service_routersuch that it delivers a service model instead of only the service name (because that would be ambiguous, there are two service models for SQS).botocoreversion (since they had to be hold back, see Update ASF APIs #9735).botocore.jsonback toqueryas default protocol for SQS.querytojson(the "non-default" protocol).serializeradjustments which are necessary due to the duality of the protocols, their divergence, and the fact that we only generate a single ASF stub based on one of the specs.