Skip to content

Conversation

@kimgr
Copy link
Contributor

@kimgr kimgr commented Jan 20, 2025

The Avro specification details a Single Object Encoding using a header
to associate a schema ID with an Avro payload. The ID is defined as the
CRC64 fingerprint in little-endian encoding.

The pkg/crc64 module only provides big-endian CRC64, and the CRC64-AVRO
fingerprint type is implemented as such. The specification does not
detail endianness of the CRC64-AVRO fingerprint itself (only when
embedded in an SOE header).

To avoid breaking existing CRC64-AVRO fingerprints, add a new
fingerprint type CRC64-AVRO-LE, identical to CRC64-AVRO except
little-endian.

Add NewWithByteOrder and SumWithByteOrder top-level functions to crc64
so users can configure the hasher to use a specific byte order.

Add tests and benchmarks for the SumWithByteOrder function.

Fixes #489.

@kimgr
Copy link
Contributor Author

kimgr commented Jan 20, 2025

See also #490.

@kimgr kimgr force-pushed the sum-encoding branch 3 times, most recently from c76586d to e126cff Compare January 23, 2025 13:06
Copy link
Member

@nrwiersma nrwiersma left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One minor issue, otherwise it is looking good.

The Avro specification details a Single Object Encoding using a header
to associate a schema ID with an Avro payload. The ID is defined as the
CRC64 fingerprint in little-endian encoding.

The pkg/crc64 module only provides big-endian CRC64, and the CRC64-AVRO
fingerprint type is implemented as such. The specification does not
detail endianness of the CRC64-AVRO fingerprint itself (only when
embedded in an SOE header).

To avoid breaking existing CRC64-AVRO fingerprints, add a new
fingerprint type CRC64-AVRO-LE, identical to CRC64-AVRO except
little-endian.

Add NewWithByteOrder and SumWithByteOrder top-level functions to crc64
so users can configure the hasher to use a specific byte order.

Add tests and benchmarks for the SumWithByteOrder function.

Fixes hamba#489.
Copy link
Member

@nrwiersma nrwiersma left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@nrwiersma nrwiersma merged commit e9a408d into hamba:main Jan 30, 2025
12 checks passed
@kimgr
Copy link
Contributor Author

kimgr commented Jan 30, 2025

Thanks! Oh, I forgot about the README in all the revisions. Were you thinking something like "if you want to use a schema fingerprint for SOE, use CRC64-AVRO-LE because reasons"?

@nrwiersma
Copy link
Member

No stress, it will be documented in the release. I think that should be fine.

@kimgr
Copy link
Contributor Author

kimgr commented Jan 30, 2025

Alright, I'll leave it to you. Thanks again.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

CRC64 schema fingerprint endianness

2 participants