Skip to content

Conversation

@bentsku
Copy link
Contributor

@bentsku bentsku commented Sep 4, 2025

Motivation

This PR is part of a bigger initiative to implement support for the new Smithy Protocol smithy-rpc-v2-cbor, as well as the new Botocore multi-protocol support for single services (before, a service could only define one protocol).

This PR is the first step in adding support for the smithy-rpc-v2-cbor protocol via our parsers and serializers.

By adding support for this new protocol, Botocore also implemented CBOR serialization and parsing. We can rely on that to implement our own custom body parsing/serializing in the CBOR format.

Currently in Botocore, only the services using the smithy-rpc-v2-cbor protocol uses this implementation (like arc-region-switch, and soon cloudwatch).

Internally, we know that the Kinesis service can respond using CBOR, but with a json-protocol based type of response. This PR implements and internalize the CBOR body parsing and serializing, and try to use this protocol to parser/serialize Kinesis requests & responses. In some follow-up to this initiative, we could also implement better handling of CBOR for Kinesis.

This PR is better reviewed by looking at its direct follow-up, #13125. It builds on this to properly implement the smithy-rpc-v2-cbor protocol entirely.

Changes

  • internalize CBOR body parsing/serializing via the BaseCBORRequestParser and BaseCBORResponseSerializer
  • validate that the CBOR serde works properly by trying it against our known Kinesis tests
  • implements a CBOR parser based on the BaseCBORRequestParser and the JSONResponseSerializer to try out against Kinesis

@bentsku bentsku added this to the 4.9 milestone Sep 4, 2025
@bentsku bentsku self-assigned this Sep 4, 2025
@bentsku bentsku added area: asf semver: minor Non-breaking changes which can be included in minor releases, but not in patch releases docs: skip Pull request does not require documentation changes labels Sep 4, 2025
@github-actions
Copy link

github-actions bot commented Sep 4, 2025

S3 Image Test Results (AMD64 / ARM64)

    2 files    2 suites   8m 24s ⏱️
  532 tests 480 ✅  52 💤 0 ❌
1 064 runs  960 ✅ 104 💤 0 ❌

Results for commit a2853d1.

♻️ This comment has been updated with latest results.

@github-actions
Copy link

github-actions bot commented Sep 4, 2025

Test Results - Preflight, Unit

22 149 tests   20 411 ✅  6m 32s ⏱️
     1 suites   1 738 💤
     1 files         0 ❌

Results for commit a2853d1.

♻️ This comment has been updated with latest results.

@github-actions
Copy link

github-actions bot commented Sep 4, 2025

Test Results (amd64) - Acceptance

7 tests   5 ✅  3m 8s ⏱️
1 suites  2 💤
1 files    0 ❌

Results for commit a2853d1.

♻️ This comment has been updated with latest results.

@github-actions
Copy link

github-actions bot commented Sep 4, 2025

Test Results (amd64) - Integration, Bootstrap

    5 files      5 suites   2h 35m 36s ⏱️
5 035 tests 4 544 ✅ 491 💤 0 ❌
5 041 runs  4 544 ✅ 497 💤 0 ❌

Results for commit a2853d1.

♻️ This comment has been updated with latest results.

@github-actions
Copy link

github-actions bot commented Sep 4, 2025

LocalStack Community integration with Pro

    2 files      2 suites   1h 55m 48s ⏱️
4 661 tests 4 330 ✅ 331 💤 0 ❌
4 663 runs  4 330 ✅ 333 💤 0 ❌

Results for commit a2853d1.

♻️ This comment has been updated with latest results.

@bentsku bentsku force-pushed the cbor-parser-serializer branch from 81df418 to 530069d Compare September 10, 2025 21:44
@localstack-bot
Copy link
Contributor

Currently, only patch changes are allowed on main. Your PR labels (area: asf, semver: minor, skip-docs) indicate that it cannot be merged into the main at this time.

@bentsku bentsku force-pushed the cbor-parser-serializer branch from 9809182 to a2853d1 Compare September 15, 2025 21:36
@bentsku bentsku marked this pull request as ready for review September 15, 2025 22:09
Copy link
Member

@alexrashed alexrashed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Absolutely great! I only have a few comments, but mostly praises 😅
This is a great first step towards a nice implementation of the cbor protocol as first class citizen in our ASF, i.e. preparing for #13028! 💯 🚀

def test_json_protocol_cbor_serialization(headers_dict):
@pytest.mark.parametrize("serializer_factory", [create_serializer, _cbor_serializer_factory])
def test_json_protocol_cbor_serialization(headers_dict, serializer_factory):
# TODO: test datetime serialization format for Kinesis manually
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: Why does this have to be tested manually? Can we maybe extend this test to use an operation which has a datetime in the spec?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think because right now, Botocore is unable to parse/serialize CBOR data for Kinesis, so I would need to write a manual test somehow to verify what AWS is returning "raw" and compare it. Sorry, when I meant manually, I meant we need to verify against AWS the raw response and put it in a unit test, not "fully manual"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

context: Kinesis' CBOR is a different wire format than Smithy RPC v2 CBOR, and you're accurate in botocore not supporting it. The AWS SDK for Java has a working implementation to reference if you want to pursue it.

Copy link
Contributor Author

@bentsku bentsku Sep 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, I think we will park Kinesis CBOR support as its "own" protocol for now, as we rely on botocore internals, and will continue to treat it as a sub-part/different serialization format of the json protocol. We might revisit in the future. Thanks for the tips about Java support!

[{"Content-Type": "application/cbor"}, {"Accept": "application/cbor"}],
)
def test_json_protocol_cbor_serialization(headers_dict):
@pytest.mark.parametrize("serializer_factory", [create_serializer, _cbor_serializer_factory])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

praise: Nice way to test the serialization for both ways! 💯

if shape:
# FIXME: we need to manually add the `__type` field to the shape as it is not part of the specs
# think about a better way, this is very hacky
shape_copy = copy.deepcopy(shape)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thought: This here is potentially very expensive. This only happens in case of an exception (which is already expensive and shouldn't happen too often), but maybe we can add this to the shape globally when it's loaded and avoid the deep copy?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. I'm a bit worried about adding it to the global shape but I will investigate it once I implement multi-protocol support. I'm especially worried for multi protocol with all of query, json and smithy-rpc-v2-cbor, because the __type shape is needed for JSON and RPC v2 CBOR, but not for Query, so I would then need to implement the deletion of __type in Query (but I saw this workaround in the RPC v2 Parser of Botocore (filtering the __type key), so I think we could have it in our serializer 🤔)

I'll also need to properly verify that adding shapes globally wouldn't mess our internal botocore clients either.

I'm planning to have a follow-up on this once everything is implemented and I have a better view to remove this hack and find a proper solution, global, or changing the signature of _serialize_data_item to accept additional optional members and serialize them

Comment on lines +1218 to +1220
def _handle_event_stream(self, request: Request, shape: Shape, event_name: str):
# TODO handle event streams
raise NotImplementedError
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI: There are only 7 services in botocore right now which define an eventstream property. All of them use a json protocol (either json or rest-json), and cloudwatch is none of them. :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for the info! I can see that the RPC v2 CBOR protocol implements evenstream and is properly defined in the spec, so I'd expect it to be supported at some point (via both CBOR and RPC v2: https://smithy.io/2.0/additional-specs/protocols/smithy-rpc-v2.html)

But at least for now, it seems like we don't need to support it, which is great news, thanks for making sure of it 🙏

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While it's not needed right now, eventstream operations could be added to any service that supports RPC v2 CBOR and we'd likely not make significant note of it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

praise: Nice! This will hopefully also help us in the future to remove the dependency on and patching of cbor2 🤩

Copy link
Contributor Author

@bentsku bentsku left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for the PR review! 🚀 I'll wait for the merge freeze to be lifted to start merging and rebasing PRs 👍

Concerning the __type serialization, I've created a ticket to track it and make sure this is addressed before the end of the small project, once I have the full picture with multi-protocol support. Have a few ideas on how to solve it 👍

def test_json_protocol_cbor_serialization(headers_dict):
@pytest.mark.parametrize("serializer_factory", [create_serializer, _cbor_serializer_factory])
def test_json_protocol_cbor_serialization(headers_dict, serializer_factory):
# TODO: test datetime serialization format for Kinesis manually
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think because right now, Botocore is unable to parse/serialize CBOR data for Kinesis, so I would need to write a manual test somehow to verify what AWS is returning "raw" and compare it. Sorry, when I meant manually, I meant we need to verify against AWS the raw response and put it in a unit test, not "fully manual"

Comment on lines +1218 to +1220
def _handle_event_stream(self, request: Request, shape: Shape, event_name: str):
# TODO handle event streams
raise NotImplementedError
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for the info! I can see that the RPC v2 CBOR protocol implements evenstream and is properly defined in the spec, so I'd expect it to be supported at some point (via both CBOR and RPC v2: https://smithy.io/2.0/additional-specs/protocols/smithy-rpc-v2.html)

But at least for now, it seems like we don't need to support it, which is great news, thanks for making sure of it 🙏

if shape:
# FIXME: we need to manually add the `__type` field to the shape as it is not part of the specs
# think about a better way, this is very hacky
shape_copy = copy.deepcopy(shape)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. I'm a bit worried about adding it to the global shape but I will investigate it once I implement multi-protocol support. I'm especially worried for multi protocol with all of query, json and smithy-rpc-v2-cbor, because the __type shape is needed for JSON and RPC v2 CBOR, but not for Query, so I would then need to implement the deletion of __type in Query (but I saw this workaround in the RPC v2 Parser of Botocore (filtering the __type key), so I think we could have it in our serializer 🤔)

I'll also need to properly verify that adding shapes globally wouldn't mess our internal botocore clients either.

I'm planning to have a follow-up on this once everything is implemented and I have a better view to remove this hack and find a proper solution, global, or changing the signature of _serialize_data_item to accept additional optional members and serialize them

@bentsku bentsku merged commit 396f80e into main Sep 17, 2025
54 of 56 checks passed
@bentsku bentsku deleted the cbor-parser-serializer branch September 17, 2025 10:11
@alexrashed alexrashed added the notes: skip Pull request does not have to be mentioned in the release notes label Sep 24, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area: asf docs: skip Pull request does not require documentation changes notes: skip Pull request does not have to be mentioned in the release notes semver: minor Non-breaking changes which can be included in minor releases, but not in patch releases

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants