@dylanlauzy
Contributor

Goal of this PR

This PR adds stricter schema validation to bring hamba/avro in closer alignment with the Avro specification and behavior seen in the official Apache implementations. This ensures proper validation when parsing schemas.

Changes

  1. Reject duplicate field names in record schemas (Avro Python ref)

    • Added validation to parseRecord to reject duplicate field names using a map/set.
    • Added unit test: "Duplicate Field Names" in TestRecordSchema.
    • Updated unit test: TestConfig_ReusesDecoders_WithWriterFingerprint to remove duplicate field name in test record.
  2. Reject duplicate symbols in enum schemas (Avro Python ref)

    • Added validation in NewEnumSchema using a map/set to ensure uniqueness.
    • Added unit test: "Duplicate Symbols" in TestEnumSchema.
  3. Disallow negative fixed sizes (Avro Python ref)

    • Added validation in NewFixedSchema to reject size < 0.
    • Added unit test: "Invalid Size Value" in TestFixedSchema.
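
For readers who want a concrete picture of the three checks, here is a minimal, standalone sketch in Go. It is not the code from this PR: the helper names checkUnique and checkFixedSize and the error messages are hypothetical, chosen only for illustration, and the sketch uses only the standard library rather than hamba/avro's own types.

```go
package main

import (
	"errors"
	"fmt"
)

// checkUnique is a hypothetical helper (not part of hamba/avro).
// It returns an error if any name appears more than once, using a
// map as a set. kind is only used in the error message, for example
// "field" for record fields or "symbol" for enum symbols.
func checkUnique(kind string, names []string) error {
	seen := make(map[string]struct{}, len(names))
	for _, n := range names {
		if _, dup := seen[n]; dup {
			return fmt.Errorf("duplicate %s %q", kind, n)
		}
		seen[n] = struct{}{}
	}
	return nil
}

// checkFixedSize is a hypothetical helper (not part of hamba/avro).
// It rejects negative sizes, mirroring the "size < 0" rule above.
func checkFixedSize(size int) error {
	if size < 0 {
		return errors.New("fixed size cannot be negative")
	}
	return nil
}

func main() {
	// A record with a repeated field name is rejected.
	fmt.Println(checkUnique("field", []string{"id", "name", "id"}))
	// An enum with distinct symbols passes.
	fmt.Println(checkUnique("symbol", []string{"RED", "GREEN"}))
	// A negative fixed size is rejected.
	fmt.Println(checkFixedSize(-1))
}
```

The map-as-set approach keeps each uniqueness check linear in the number of names and reports the first offending name, which tends to make schema errors easier to locate.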

How did I test it?

make test: All existing tests pass, and the new test cases covering the added validation checks pass as well.

References

These changes address issues similar to those reported in the official Avro implementations:

@nrwiersma nrwiersma merged commit 142e304 into hamba:main Jun 5, 2025
16 checks passed
@dylanlauzy
Contributor Author

@nrwiersma thanks for merging! Is there an estimate for when the next release will be published that includes these changes?
