-
Notifications
You must be signed in to change notification settings - Fork 124
feat: add SOE codecs #514
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add SOE codecs #514
Conversation
|
The current state is mostly wrappers around |
|
I thought of another thing I wanted some feedback on: the basic and generic encoder/decoder pairs could benefit from being one entity, i.e Codec and GenericCodec that each can do bother encoding and decoding. Any preference on bundling there? I've noticed when writing tests for application code only doing decoding it's very handy to have an encoder for the same schema available, and vice versa. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR introduces native support for Avro Single Object Encoding (SOE) by implementing core encoding/decoding functionality, generic type-safe wrappers, and a dynamic decoder leveraging a schema resolver.
- Implements core SOE header creation, parsing, and fingerprint computation in soe/soe.go and soe/basic.go.
- Introduces generic and dynamic encoder/decoder types in soe/generic.go and soe/dynamic.go.
- Adds tests in soe/basic_test.go covering header format, error cases for short headers, bad magic, and fingerprint mismatches.
Reviewed Changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
| soe/soe.go | Defines SOE header construction, parsing, schema fingerprinting, and associated utilities. |
| soe/resolvers.go | Introduces an interface for schema resolution used by the dynamic decoder. |
| soe/generic.go | Implements generic encoder and decoder wrappers for avrogen-generated types. |
| soe/dynamic.go | Provides a dynamic decoder that resolves schemas using a provided resolver. |
| soe/basic.go | Offers the basic encoder and decoder implementation for SOE framing. |
| soe/basic_test.go | Contains tests validating header format, magic checks, and fingerprint handling. |
Comments suppressed due to low confidence (2)
soe/soe.go:32
- [nitpick] Consider defining a constant for the header length (10) to improve code readability and maintainability.
if len(data) < 10 {
soe/soe.go:42
- [nitpick] Consider defining a constant for the expected fingerprint length (8) to enhance clarity and avoid magic numbers.
if len(fingerprint) != 8 {
Bundling them seems like a good idea. |
|
Would you prefer I finish up everything in this PR, or post piecemeal PRs split into;
|
|
Actually it's probably easier to do everything together. Just worried it will turn into a megapatch, but let's worry about that if it becomes a problem. |
Cool, done. Also added tests for generic codec, mostly repeated from basic_test.go. Still haven't finished up the DynamicDecoder + Resolver. |
|
I think this has everything I wanted to do:
Might be worthwhile to add some tests for the minor utility functions in |
|
Gentle Easter week ping! |
Sorry, I have not forgotten. Will get to it as soon as I have time. |
|
I left a couple comments, but in general I think this is looking good. Will properly review once it is out of Draft. |
|
Alright, squashed the history and wrote up a more proper commit message. When you're happy with this, I'd like to try it out with some of our internal code so let's not merge until I've had a chance to run it for real. |
nrwiersma
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Will wait for your go ahead before merging.
|
Sorry for going AWOL on this PR, I ran into 6 months of unrelated chaos and still haven't had a chance to integrate this on our end. That said, it would be easier to do so if it was merged and released, so I think we can move ahead if you're comfortable with it. I'll rebase the branch and push again. |
Avro describes a very simple Single Object Encoding in https://avro.apache.org/docs/1.12.0/specification/#single-object-encoding-specification Encoders put a schema fingerprint in the header, which decoders can use to look up a schema definition for deserialization. There are two codecs: * soe.Codec: follows avro.Marshal/Unmarshal but adds/parses SOE header * soe.TypedCodec: a type-safe generic version for avrogen-generated types Both of them can encode/decode SOE, using a single schema. Also add soe.DynamicDecoder, which has a more liberal view of types and schemas: for each record, try to look up the associated schema from a configured resolver, and decode anything for which a schema is found. This can be used for full dynamic encoding of streams of records from different encoders, as long as their schemas are available for decoding. This is similar to how the Avro Java implementation does it. Include a minimal in-memory schema store.
Avro describes a very simple Single Object Encoding in
https://avro.apache.org/docs/1.12.0/specification/#single-object-encoding-specification
Encoders put a schema fingerprint in the header, which decoders can use to look
up a schema definition for deserialization.
There are two codecs:
Both of them can encode/decode SOE, using a single schema.
Also add soe.DynamicDecoder, which has a more liberal view of types and schemas:
for each record, try to look up the associated schema from a configured
resolver, and decode anything for which a schema is found. This can be used for
full dynamic encoding of streams of records from different encoders, as long as
their schemas are available for decoding. This is similar to how the Avro Java
implementation does it.
Include a minimal in-memory schema store.