-
-
Notifications
You must be signed in to change notification settings - Fork 4.5k
ASF: implement RPC V2 CBOR parser and serializer #13125
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
281b641 to
b5db830
Compare
Test Results (amd64) - Integration, Bootstrap 5 files 5 suites 2h 38m 38s ⏱️ Results for commit 646b04c. ♻️ This comment has been updated with latest results. |
LocalStack Community integration with Pro 2 files 2 suites 1h 58m 41s ⏱️ Results for commit 646b04c. ♻️ This comment has been updated with latest results. |
9809182 to
a2853d1
Compare
b5db830 to
8806c2c
Compare
alexrashed
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Absolutely great! Another great step towards #13028 / to support CBOR for CloudWatch before the SDKs perform the switch! 💯
The next step / only step left would be to adjust the service router and the serializer / parser calls to support first-class multi-protocol support on handler-chain level! 🤩
I really want to stress how great this nice, clean, iterative approach here is! 🦸🏽
|
|
||
| ServiceName = str | ||
| ProtocolName = Literal["query", "json", "rest-json", "rest-xml", "ec2"] | ||
| ProtocolName = Literal["query", "json", "rest-json", "rest-xml", "ec2", "smithy-rpc-v2-cbor"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
praise: 🥳
| # TODO: do we want to add a check if the user-defined protocol is part of the available ones in the ServiceModel? | ||
| # or should it be checked once | ||
| service_protocol = protocol or service.protocol |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thought: I guess this here depends a bit on the question how strict or loose we want to be here. When thinking about it, I think the serializer should be loose (we want to avoid failing with the serialization, therefore using maybe another layer of fallback).
For the parser, I think this is different. We want to fail fast / be strict / only take the protocol determined in the service router, otherwise this could lead to weird parsing issues which are hard to debug.
What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I agree. I believe the protocol determination in the Service Name Parser will only be able to select the one defined in the service model, with a fallback to the default one of the service.
In this comment, I was wondering if we would need to add an additional check if the protocol is indeed again available in the service model. Technically, I don't think we could use a protocol not defined in the service model here, because the Service Name Parser would have properly detected it and can only return a protocol that is defined.
I suppose this check would only be enforced when we are manually using the parser and serializer, which can maybe still happen somewhere in the code. Having clear exceptions might be better here. It's just that this check is going to be "useful" in our normal usage I suppose.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I need to address the unix timestamp issue in this PR, I'll update this comment to be clearer and will leave it to be addressed in the multi-protocol PRs, as this where things are coming come into place 👍
| # think about a better way, this is very hacky | ||
| # Error responses in the rpcv2Cbor protocol MUST be serialized identically to standard responses with one | ||
| # additional component to distinguish which error is contained: a body field named __type. | ||
| shape_copy = copy.deepcopy(shape) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thought: Similar to this comment in #13103, I hope we might find a less expensive solution for this... 😅
|
|
||
| # the Smithy spec defines that only `application/cbor` is supported for RPC v2 CBOR | ||
| SUPPORTED_MIME_TYPES = [APPLICATION_CBOR] | ||
| # TODO: check the timestamp format for RpcV2CBOR, which might be different than Kinesis CBOR |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thought: Based on this code comment in botocore, I think unixtimestamp should be just right.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, thanks for linking it! the comment right under is talking about milli seconds: https://github.com/boto/botocore/blob/28a90a45d7de077770665462ef8e1b53a9dd57ad/botocore/parsers.py#L898C10-L898C62
and the specs is also talking about milli seconds precision, and that we need to disregard the timestampFormat trait: https://smithy.io/2.0/additional-specs/protocols/smithy-rpc-v2.html#timestamp-type-serialization
As support for half-precision floating-point values is inconsistent across implementations, timestamp types SHOULD NOT serialize into a half-precision (16 bit) numeric CBOR value for the tagged value.
This protocol uses epoch-seconds, also known as Unix timestamps, with millisecond (1/1000th of a second) resolution. The timestampFormat MUST NOT be respected to customize timestamp serialization.
So they talk about Unix timestamps but with millisecond resolution, so this is probably wrong now that I see it 😅 I'll try to manually test it against AWS and fix that for this PR + unit test it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
circling back: this is going be a little of a challenge, because ArcRegionSwitch does not define any InputShape with a date time... and is the only service supporting smithy-rpc-v2-cbor for now. We can get back to this once we add CloudWatch, I'll still check on the serializer side because it does return some timestamp data and define it in its output shapes 👍
| request body. | ||
| """ | ||
|
|
||
| # TODO: investigate datetime format for RpcV2CBOR protocol, which might be different than Kinesis CBOR |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Seems like a duplicate to a comment in the serializer, and I think unixtimestamp should just be fine for now?
bentsku
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot for the review! I'll investigate the timestamp resolution and will update the PR 👍 I'll try to open the multi-protocol one later today 🚀
|
|
||
| # the Smithy spec defines that only `application/cbor` is supported for RPC v2 CBOR | ||
| SUPPORTED_MIME_TYPES = [APPLICATION_CBOR] | ||
| # TODO: check the timestamp format for RpcV2CBOR, which might be different than Kinesis CBOR |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, thanks for linking it! the comment right under is talking about milli seconds: https://github.com/boto/botocore/blob/28a90a45d7de077770665462ef8e1b53a9dd57ad/botocore/parsers.py#L898C10-L898C62
and the specs is also talking about milli seconds precision, and that we need to disregard the timestampFormat trait: https://smithy.io/2.0/additional-specs/protocols/smithy-rpc-v2.html#timestamp-type-serialization
As support for half-precision floating-point values is inconsistent across implementations, timestamp types SHOULD NOT serialize into a half-precision (16 bit) numeric CBOR value for the tagged value.
This protocol uses epoch-seconds, also known as Unix timestamps, with millisecond (1/1000th of a second) resolution. The timestampFormat MUST NOT be respected to customize timestamp serialization.
So they talk about Unix timestamps but with millisecond resolution, so this is probably wrong now that I see it 😅 I'll try to manually test it against AWS and fix that for this PR + unit test it
| # TODO: do we want to add a check if the user-defined protocol is part of the available ones in the ServiceModel? | ||
| # or should it be checked once | ||
| service_protocol = protocol or service.protocol |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I agree. I believe the protocol determination in the Service Name Parser will only be able to select the one defined in the service model, with a fallback to the default one of the service.
In this comment, I was wondering if we would need to add an additional check if the protocol is indeed again available in the service model. Technically, I don't think we could use a protocol not defined in the service model here, because the Service Name Parser would have properly detected it and can only return a protocol that is defined.
I suppose this check would only be enforced when we are manually using the parser and serializer, which can maybe still happen somewhere in the code. Having clear exceptions might be better here. It's just that this check is going to be "useful" in our normal usage I suppose.
| # TODO: do we want to add a check if the user-defined protocol is part of the available ones in the ServiceModel? | ||
| # or should it be checked once | ||
| service_protocol = protocol or service.protocol |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I need to address the unix timestamp issue in this PR, I'll update this comment to be clearer and will leave it to be addressed in the multi-protocol PRs, as this where things are coming come into place 👍
aef8ffa to
646b04c
Compare
|
@alexrashed re-requested your review following the latest changes fixing timestamp handling and adding a different serialization format for data structures (finite vs indefinite maps & list). Had to force-push after the merge, but the changes starts at 2ba257d I think this would the proper diff of the changes: https://github.com/localstack/localstack/pull/13125/files/a02653974f9af7f5d9b178c4d204eafd278d7bab..646b04c8544e25894ad0837be801a7586d2e813e Thanks! |
alexrashed
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome! Huge kudos for this really great deep dive into the data structure length handling! The implementation is looking good, I'm curious if the fixed length mode will become necessary 😅
Hopefully not 😄 it will be easy to remove once we're certain, as the work was already done, I kept it there. It feels like different service might implement different serializers on their side, but hopefully for us it won't be the case. 🤞 |
| timestamp = float(self._convert_timestamp_to_str(value)) | ||
| initial_byte = self._get_initial_byte(self.FLOAT_AND_SIMPLE_MAJOR_TYPE, 27) | ||
| serialized.extend(initial_byte + struct.pack(">d", timestamp)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we might want to revisit this for compatibility reasons: even if it looks like most clients should be able to deserialize this timestamp, we know Botocore handles it, and the Go SDK deserializer is able to deserialize float32, float64, cbor uint (within int64 bounds) and cbor -int (within int64 bounds) as mentioned here:
We'll need to check the AWS raw response to see what kind of resolution they return, if it depends on operation/service, and if we can have any kind of compatibility issue.
We might need to serialize to an int instead, maybe depending on the resolution of the serialized datetime. Hopefully, it is the same for every datetime.
Motivation
This PR is part of a bigger initiative to implement support for the new Smithy Protocol
smithy-rpc-v2-cbor, as well as the new Botocore multi-protocol support for single services (before, a service could only define one protocol).This PR is stacked on top of #13103.
This PR properly implements the
smithy-rpc-v2-cborprotocol and adds support for it.Currently, no service that LocalStack supports implements this protocol, but we can still our unit tests to validate the serialization/deserialization works.
The logic is based on Botocore's own serializer and parser, with some adaptations due to the client/server difference.
One of the main feature of RPC v2-based protocol is its routing, following the
/service/<service-name>/operation/<operation-name>format. This is implemented in our parser.In the next steps, while implementing proper multi-protocol support, we will run integration tests with this protocol against AWS, to make sure our deserialization/serialization works end-to-end.
Because we do not support any service with
smithy-rpc-v2-cboryet, I have not implemented the Service detection yet, because we cannot test it. This will come in a follow-up when implementing the multi-protocol support (as it will also implement protocol detection, we will addrpc-v2-cborservice name parsing).Changes
BaseRpcV2RequestParserandBaseRpcV2ResponseSerializerRpcV2CBORRequestParserandRpcV2CBORResponseSerializerwhich supportssmithy-rpc-v2-cbor