SchemaCompatibility.Resolve incorrectly resolves enums #556

@Rirush

Description

When schema resolution is used with enums whose symbols have been reordered or removed, the resolved schema is incorrect if the writer schema contains no symbols that are incompatible with the reader schema.

Expected behavior

When I have these two schemas, I expect to be able to correctly parse data after performing schema resolution.

Reader schema:

{"type": "enum", "name": "Enum", "symbols": ["A", "B", "C", "D"]}

Writer schema:

{"type": "enum", "name": "Enum", "symbols": ["B", "C", "D"]}

Actual behavior

The data is decoded using the reader schema's symbol positions instead of the writer schema's positions.
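
To illustrate why the positions matter: Avro encodes an enum value as the zero-based index of the symbol in the schema that wrote it. The sketch below does not use the library at all; decodeByPosition is a hypothetical helper that only mimics a lookup by index, to show what happens when the writer's index is applied to the reader's symbol list.

```go
package main

import "fmt"

// decodeByPosition is a hypothetical helper (not part of hamba/avro).
// It returns the symbol stored at idx, mimicking how an enum value
// encoded as an integer index is turned back into a symbol name.
func decodeByPosition(symbols []string, idx int) string {
	return symbols[idx]
}

func main() {
	readerSymbols := []string{"A", "B", "C", "D"}
	writerSymbols := []string{"B", "C", "D"}

	// The writer encodes "D" as its position in the writer's list: 2.
	idx := 2

	fmt.Println("writer lookup:", decodeByPosition(writerSymbols, idx)) // writer lookup: D
	fmt.Println("reader lookup:", decodeByPosition(readerSymbols, idx)) // reader lookup: C
}
```

Index 2 means "D" to the writer but "C" to the reader, which matches the observed output in the reproduction below.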

Example code to reproduce the issue

Playground link: https://go.dev/play/p/6nuMheyX0j9

package main

import (
	"fmt"

	"github.com/hamba/avro/v2"
)

func main() {
	reader := avro.MustParse(`{"type": "enum", "name": "Enum", "symbols": ["A", "B", "C", "D"]}`)
	writer := avro.MustParse(`{"type": "enum", "name": "Enum", "symbols": ["B", "C", "D"]}`)

	data, err := avro.Marshal(writer, "D")
	if err != nil {
		panic(err)
	}

	sc := avro.NewSchemaCompatibility()
	rs, err := sc.Resolve(reader, writer)
	if err != nil {
		panic(err)
	}

	var value string
	err = avro.Unmarshal(rs, data, &value)
	if err != nil {
		panic(err)
	}

	// Expected output: "Resolved schema: D"
	// Observed output: "Resolved schema: C"
	fmt.Println("Resolved schema:", value)
}

Probable root cause

I believe this issue is caused by this if statement in the resolution logic:

if err = c.checkEnumSymbols(r, w); err != nil {
(permalink copied from main at the time of the report)

The checkEnumSymbols method checks only the names of the symbols, not their positions. If all names are compatible but the positions differ, the reader schema is returned as-is, which is incorrect. A possible solution is to set the encodedSymbols field when the order is mismatched, instead of simply returning the reader schema.
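
As a rough sketch of the behavior I would expect from resolution, the index should first be translated to a name using the writer's symbol order, and only then matched against the reader's symbols. This is not the library's implementation and does not touch encodedSymbols; resolveEnumSymbol is a hypothetical helper, and it ignores enum defaults for brevity.

```go
package main

import "fmt"

// resolveEnumSymbol is a hypothetical helper (not part of hamba/avro).
// It looks up the encoded index in the writer's symbol order, then
// checks that the reader schema also declares that symbol.
func resolveEnumSymbol(readerSymbols, writerSymbols []string, idx int) (string, error) {
	if idx < 0 || idx >= len(writerSymbols) {
		return "", fmt.Errorf("index %d out of range for writer symbols", idx)
	}
	name := writerSymbols[idx]
	for _, s := range readerSymbols {
		if s == name {
			return name, nil
		}
	}
	// Assumption: no reader default is configured, so this is an error.
	return "", fmt.Errorf("symbol %q not found in reader schema", name)
}

func main() {
	reader := []string{"A", "B", "C", "D"}
	writer := []string{"B", "C", "D"}

	sym, err := resolveEnumSymbol(reader, writer, 2)
	if err != nil {
		panic(err)
	}
	fmt.Println("Resolved symbol:", sym) // Resolved symbol: D
}
```

With the schemas from this report, index 2 resolves to "D" regardless of where "D" sits in the reader's list.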
