CRC64 schema fingerprint endianness #489

@kimgr

First off, thank you for a great library!

We use Avro's Single Object Encoding [1] to add a little header to binary payloads; basically:

magic := []byte{0xc3, 0x01}
fingerprint, _ := schema.FingerprintUsing(avro.CRC64Avro)
header := append(magic, fingerprint...)

In interop with the canonical Java library, we see failures to identify the schema here [2]. That code is very Java, but it essentially decodes the 8 fingerprint bytes, little endian, into a 64-bit integer. Their fingerprinting algorithm produces a little-endian byte sequence [3], so that's consistent.

Unfortunately there's a byte-order inconsistency with hamba/avro here. Your CRC64 implementation produces a big-endian fingerprint [4].
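To make the mismatch concrete, here is a minimal sketch using only the Go standard library. It writes a 64-bit value most significant byte first (the layout quoted in [4]) and reads it back little-endian, as the Java decoder in [2] is described above. The helper names and the sample value are illustrative assumptions, not code from either library.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// bigEndianBytes mirrors the byte layout quoted in [4]:
// most significant byte first. Hypothetical helper for illustration.
func bigEndianBytes(s uint64) [8]byte {
	var b [8]byte
	binary.BigEndian.PutUint64(b[:], s)
	return b
}

// readLittleEndian decodes 8 bytes as a little-endian 64-bit integer,
// which is how the Java side is described as reading the fingerprint.
func readLittleEndian(b [8]byte) uint64 {
	return binary.LittleEndian.Uint64(b[:])
}

func main() {
	// Arbitrary sample value, not a real schema fingerprint.
	const fp uint64 = 0x0102030405060708

	got := readLittleEndian(bigEndianBytes(fp))

	fmt.Printf("written: %016x\n", fp)  // written: 0102030405060708
	fmt.Printf("read:    %016x\n", got) // read:    0807060504030201
}
```

The reader ends up with a byte-swapped value, so a schema lookup keyed on the fingerprint would miss.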

The specification doesn't say anything about byte order or representation of fingerprints in the abstract (only for Single Object Encoding), but I wonder if it would be a good idea to be consistent to avoid interop surprises?

This is a breaking change, of course, so if that's a concern, would you consider taking patches for a separate little-endian fingerprint type, avro.CRC64LE? Or would you prefer to change the default? Or leave it to clients, maybe with a doc update somewhere? I'm happy to do the work if you guide me as to what changes you'd prefer (if any).
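For the "leave it to clients" option, a workaround could look roughly like the sketch below. It assumes the fingerprint arrives as the 8 big-endian bytes described in this issue, and reverses them before building the Single Object Encoding header. singleObjectHeader is a hypothetical helper, not part of hamba/avro, and the input bytes are a stand-in for a real fingerprint.

```go
package main

import (
	"errors"
	"fmt"
)

// soeMagic is the two-byte Single Object Encoding marker from the spec [1].
var soeMagic = []byte{0xc3, 0x01}

// singleObjectHeader is a hypothetical helper (not part of hamba/avro).
// It takes an 8-byte big-endian CRC-64-AVRO fingerprint and returns the
// 10-byte header with the fingerprint bytes in little-endian order.
func singleObjectHeader(fingerprintBE []byte) ([]byte, error) {
	if len(fingerprintBE) != 8 {
		return nil, errors.New("expected an 8-byte CRC-64-AVRO fingerprint")
	}
	header := make([]byte, 0, len(soeMagic)+8)
	header = append(header, soeMagic...)
	// Append the fingerprint bytes in reverse to flip the byte order.
	for i := len(fingerprintBE) - 1; i >= 0; i-- {
		header = append(header, fingerprintBE[i])
	}
	return header, nil
}

func main() {
	// Stand-in bytes; in real use this would be the result of
	// schema.FingerprintUsing(avro.CRC64Avro).
	fp := []byte{0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08}

	header, err := singleObjectHeader(fp)
	if err != nil {
		panic(err)
	}
	fmt.Printf("% x\n", header) // c3 01 08 07 06 05 04 03 02 01
}
```

Building the header in a fresh slice also avoids appending into the backing array of a shared magic slice.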

[1] https://avro.apache.org/docs/1.11.1/specification/#single-object-encoding
[2] https://github.com/apache/avro/blob/8e51c7e1c14116545c7b08e72f649064cbd9f1bb/lang/java/avro/src/main/java/org/apache/avro/message/BinaryMessageDecoder.java#L156
[3] https://github.com/apache/avro/blob/8e51c7e1c14116545c7b08e72f649064cbd9f1bb/lang/java/avro/src/main/java/org/apache/avro/SchemaNormalization.java#L64
[4]

return [Size]byte{byte(s >> 56), byte(s >> 48), byte(s >> 40), byte(s >> 32), byte(s >> 24), byte(s >> 16), byte(s >> 8), byte(s)}
