-
-
Notifications
You must be signed in to change notification settings - Fork 4.5k
ASF: implement CBOR parser and serializer #13103
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
S3 Image Test Results (AMD64 / ARM64) 2 files 2 suites 8m 24s ⏱️ Results for commit a2853d1. ♻️ This comment has been updated with latest results. |
Test Results - Preflight, Unit22 149 tests 20 411 ✅ 6m 32s ⏱️ Results for commit a2853d1. ♻️ This comment has been updated with latest results. |
Test Results (amd64) - Acceptance7 tests 5 ✅ 3m 8s ⏱️ Results for commit a2853d1. ♻️ This comment has been updated with latest results. |
Test Results (amd64) - Integration, Bootstrap 5 files 5 suites 2h 35m 36s ⏱️ Results for commit a2853d1. ♻️ This comment has been updated with latest results. |
LocalStack Community integration with Pro 2 files 2 suites 1h 55m 48s ⏱️ Results for commit a2853d1. ♻️ This comment has been updated with latest results. |
81df418 to
530069d
Compare
|
Currently, only patch changes are allowed on main. Your PR labels (area: asf, semver: minor, skip-docs) indicate that it cannot be merged into the main at this time. |
9809182 to
a2853d1
Compare
alexrashed
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Absolutely great! I only have a few comments, but mostly praises 😅
This is a great first step towards a nice implementation of the cbor protocol as first class citizen in our ASF, i.e. preparing for #13028! 💯 🚀
| def test_json_protocol_cbor_serialization(headers_dict): | ||
| @pytest.mark.parametrize("serializer_factory", [create_serializer, _cbor_serializer_factory]) | ||
| def test_json_protocol_cbor_serialization(headers_dict, serializer_factory): | ||
| # TODO: test datetime serialization format for Kinesis manually |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question: Why does this have to be tested manually? Can we maybe extend this test to use an operation which has a datetime in the spec?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think because right now, Botocore is unable to parse/serialize CBOR data for Kinesis, so I would need to write a manual test somehow to verify what AWS is returning "raw" and compare it. Sorry, when I meant manually, I meant we need to verify against AWS the raw response and put it in a unit test, not "fully manual"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
context: Kinesis' CBOR is a different wire format than Smithy RPC v2 CBOR, and you're accurate in botocore not supporting it. The AWS SDK for Java has a working implementation to reference if you want to pursue it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you, I think we will park Kinesis CBOR support as its "own" protocol for now, as we rely on botocore internals, and will continue to treat it as a sub-part/different serialization format of the json protocol. We might revisit in the future. Thanks for the tips about Java support!
| [{"Content-Type": "application/cbor"}, {"Accept": "application/cbor"}], | ||
| ) | ||
| def test_json_protocol_cbor_serialization(headers_dict): | ||
| @pytest.mark.parametrize("serializer_factory", [create_serializer, _cbor_serializer_factory]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
praise: Nice way to test the serialization for both ways! 💯
| if shape: | ||
| # FIXME: we need to manually add the `__type` field to the shape as it is not part of the specs | ||
| # think about a better way, this is very hacky | ||
| shape_copy = copy.deepcopy(shape) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thought: This here is potentially very expensive. This only happens in case of an exception (which is already expensive and shouldn't happen too often), but maybe we can add this to the shape globally when it's loaded and avoid the deep copy?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. I'm a bit worried about adding it to the global shape but I will investigate it once I implement multi-protocol support. I'm especially worried for multi protocol with all of query, json and smithy-rpc-v2-cbor, because the __type shape is needed for JSON and RPC v2 CBOR, but not for Query, so I would then need to implement the deletion of __type in Query (but I saw this workaround in the RPC v2 Parser of Botocore (filtering the __type key), so I think we could have it in our serializer 🤔)
I'll also need to properly verify that adding shapes globally wouldn't mess our internal botocore clients either.
I'm planning to have a follow-up on this once everything is implemented and I have a better view to remove this hack and find a proper solution, global, or changing the signature of _serialize_data_item to accept additional optional members and serialize them
| def _handle_event_stream(self, request: Request, shape: Shape, event_name: str): | ||
| # TODO handle event streams | ||
| raise NotImplementedError |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI: There are only 7 services in botocore right now which define an eventstream property. All of them use a json protocol (either json or rest-json), and cloudwatch is none of them. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for the info! I can see that the RPC v2 CBOR protocol implements evenstream and is properly defined in the spec, so I'd expect it to be supported at some point (via both CBOR and RPC v2: https://smithy.io/2.0/additional-specs/protocols/smithy-rpc-v2.html)
But at least for now, it seems like we don't need to support it, which is great news, thanks for making sure of it 🙏
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While it's not needed right now, eventstream operations could be added to any service that supports RPC v2 CBOR and we'd likely not make significant note of it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
praise: Nice! This will hopefully also help us in the future to remove the dependency on and patching of cbor2 🤩
bentsku
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot for the PR review! 🚀 I'll wait for the merge freeze to be lifted to start merging and rebasing PRs 👍
Concerning the __type serialization, I've created a ticket to track it and make sure this is addressed before the end of the small project, once I have the full picture with multi-protocol support. Have a few ideas on how to solve it 👍
| def test_json_protocol_cbor_serialization(headers_dict): | ||
| @pytest.mark.parametrize("serializer_factory", [create_serializer, _cbor_serializer_factory]) | ||
| def test_json_protocol_cbor_serialization(headers_dict, serializer_factory): | ||
| # TODO: test datetime serialization format for Kinesis manually |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think because right now, Botocore is unable to parse/serialize CBOR data for Kinesis, so I would need to write a manual test somehow to verify what AWS is returning "raw" and compare it. Sorry, when I meant manually, I meant we need to verify against AWS the raw response and put it in a unit test, not "fully manual"
| def _handle_event_stream(self, request: Request, shape: Shape, event_name: str): | ||
| # TODO handle event streams | ||
| raise NotImplementedError |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for the info! I can see that the RPC v2 CBOR protocol implements evenstream and is properly defined in the spec, so I'd expect it to be supported at some point (via both CBOR and RPC v2: https://smithy.io/2.0/additional-specs/protocols/smithy-rpc-v2.html)
But at least for now, it seems like we don't need to support it, which is great news, thanks for making sure of it 🙏
| if shape: | ||
| # FIXME: we need to manually add the `__type` field to the shape as it is not part of the specs | ||
| # think about a better way, this is very hacky | ||
| shape_copy = copy.deepcopy(shape) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. I'm a bit worried about adding it to the global shape but I will investigate it once I implement multi-protocol support. I'm especially worried for multi protocol with all of query, json and smithy-rpc-v2-cbor, because the __type shape is needed for JSON and RPC v2 CBOR, but not for Query, so I would then need to implement the deletion of __type in Query (but I saw this workaround in the RPC v2 Parser of Botocore (filtering the __type key), so I think we could have it in our serializer 🤔)
I'll also need to properly verify that adding shapes globally wouldn't mess our internal botocore clients either.
I'm planning to have a follow-up on this once everything is implemented and I have a better view to remove this hack and find a proper solution, global, or changing the signature of _serialize_data_item to accept additional optional members and serialize them
Motivation
This PR is part of a bigger initiative to implement support for the new Smithy Protocol
smithy-rpc-v2-cbor, as well as the new Botocore multi-protocol support for single services (before, a service could only define one protocol).This PR is the first step in adding support for the
smithy-rpc-v2-cborprotocol via our parsers and serializers.By adding support for this new protocol, Botocore also implemented CBOR serialization and parsing. We can rely on that to implement our own custom body parsing/serializing in the CBOR format.
Currently in Botocore, only the services using the
smithy-rpc-v2-cborprotocol uses this implementation (likearc-region-switch, and sooncloudwatch).Internally, we know that the Kinesis service can respond using CBOR, but with a
json-protocol based type of response. This PR implements and internalize the CBOR body parsing and serializing, and try to use this protocol to parser/serialize Kinesis requests & responses. In some follow-up to this initiative, we could also implement better handling of CBOR for Kinesis.This PR is better reviewed by looking at its direct follow-up, #13125. It builds on this to properly implement the
smithy-rpc-v2-cborprotocol entirely.Changes
BaseCBORRequestParserandBaseCBORResponseSerializerBaseCBORRequestParserand theJSONResponseSerializerto try out against Kinesis