Description
First off, thank you for a great library!
We use Avro's Single Object Encoding [1] to add a little header to binary payloads; basically:
    magic := []byte{0xc3, 0x01}
    fingerprint, _ := schema.FingerprintUsing(avro.CRC64Avro)
    header := append(magic, fingerprint...)
In interop with the canonical Java library, we see failures to identify the schema here [2]. That code is very Java, but it essentially decodes the 8 fingerprint bytes, little endian, into a 64-bit integer. Their fingerprinting algorithm produces a little-endian byte sequence [3], so that's consistent.
Unfortunately there's a byte-order inconsistency with hamba/avro here. Your CRC64 implementation produces a big-endian fingerprint [4].
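To make the mismatch concrete, here is a minimal sketch using only the Go standard library. The fingerprint value is an arbitrary placeholder, not a real CRC-64-AVRO result, and `writeBigEndian` / `readLittleEndian` are hypothetical helpers that only mimic the two behaviours described above (big-endian output on the Go side, little-endian read on the Java side):

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// writeBigEndian (hypothetical helper) serializes a 64-bit fingerprint
// most-significant byte first, like the line cited in [4].
func writeBigEndian(fp uint64) []byte {
	b := make([]byte, 8)
	binary.BigEndian.PutUint64(b, fp)
	return b
}

// readLittleEndian (hypothetical helper) interprets 8 bytes as a
// little-endian 64-bit integer, like the Java decoder cited in [2].
func readLittleEndian(b []byte) uint64 {
	return binary.LittleEndian.Uint64(b)
}

func main() {
	// Arbitrary placeholder, not a real schema fingerprint.
	const fp uint64 = 0x0102030405060708

	got := readLittleEndian(writeBigEndian(fp))

	fmt.Printf("written: 0x%016x\n", fp)
	fmt.Printf("read:    0x%016x\n", got)
	// The two values differ, so a lookup keyed on the fingerprint misses.
	fmt.Println("match:", got == fp)
}
```

The reader ends up with the byte-reversed integer, which is why the schema lookup on the Java side fails.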
The specification doesn't say anything about byte order or representation of fingerprints in the abstract (only for Single Object Encoding), but I wonder if it would be a good idea to be consistent to avoid interop surprises?
This would be a breaking change, of course. If that's a concern, would you consider taking patches for a separate little-endian fingerprint type, e.g. avro.CRC64LE? Or would you prefer to change the default? Or leave it to clients, maybe with a doc update somewhere? I'm happy to do the work if you guide me as to which changes you'd prefer (if any).
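If it stays with clients, the workaround could be as small as reversing the fingerprint bytes before building the header. A rough sketch, assuming the fingerprint arrives as the 8 big-endian bytes described above; `singleObjectHeader` and `reverse` are hypothetical names, not part of hamba/avro, and the input in `main` is a placeholder rather than a real fingerprint:

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// singleObjectMagic is the two-byte marker for Single Object Encoding [1].
var singleObjectMagic = []byte{0xc3, 0x01}

// reverse returns a reversed copy of b and leaves the input untouched.
func reverse(b []byte) []byte {
	out := make([]byte, len(b))
	for i, v := range b {
		out[len(b)-1-i] = v
	}
	return out
}

// singleObjectHeader (hypothetical helper) builds the 10-byte header from a
// big-endian 8-byte fingerprint by flipping it to little-endian order.
func singleObjectHeader(bigEndianFP []byte) ([]byte, error) {
	if len(bigEndianFP) != 8 {
		return nil, fmt.Errorf("expected 8-byte fingerprint, got %d bytes", len(bigEndianFP))
	}
	header := make([]byte, 0, 10)
	header = append(header, singleObjectMagic...)
	return append(header, reverse(bigEndianFP)...), nil
}

func main() {
	// Placeholder bytes standing in for a big-endian fingerprint.
	fp := []byte{0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08}

	header, err := singleObjectHeader(fp)
	if err != nil {
		panic(err)
	}
	fmt.Printf("% x\n", header)

	// A little-endian read of the header tail now recovers the same integer
	// that the big-endian bytes encoded.
	fmt.Println(binary.LittleEndian.Uint64(header[2:]) == binary.BigEndian.Uint64(fp))
}
```

Building the header into a fresh slice also avoids appending onto a shared `magic` slice, which matters if the magic bytes are kept in a package-level variable.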
[1] https://avro.apache.org/docs/1.11.1/specification/#single-object-encoding
[2] https://github.com/apache/avro/blob/8e51c7e1c14116545c7b08e72f649064cbd9f1bb/lang/java/avro/src/main/java/org/apache/avro/message/BinaryMessageDecoder.java#L156
[3] https://github.com/apache/avro/blob/8e51c7e1c14116545c7b08e72f649064cbd9f1bb/lang/java/avro/src/main/java/org/apache/avro/SchemaNormalization.java#L64
[4] Line 96 in 68046a4: `return [Size]byte{byte(s >> 56), byte(s >> 48), byte(s >> 40), byte(s >> 32), byte(s >> 24), byte(s >> 16), byte(s >> 8), byte(s)}`