
feat: Support uniqueItems validation for arrays of complex objects #1563

@lanej


Problem Statement

Ogen currently does not support uniqueItems: true validation for arrays whose items are complex (objects, nested arrays, or items without a declared type). When it encounters such a schema, ogen skips the whole operation and logs:

INFO  Skipping operation  {"reason_error": "complex uniqueItems not implemented"}

This limitation prevents ogen from generating clients for many real-world OpenAPI specifications, including:

  • Atlassian JIRA REST API v3 (~20 operations skipped, including workflow endpoints)
  • Mist API (issue #1507, generator failing on uniqueItems)
  • Any API using arrays of objects with uniqueness constraints

Current Behavior

Code location: gen/schema_gen.go

if schema.UniqueItems {
    item := schema.Item
    if item == nil ||
        item.Type == "" ||
        item.Type == jsonschema.Array ||
        item.Type == jsonschema.Object {
        return nil, &ErrNotImplemented{Name: "complex uniqueItems"}
    }
}

Impact: Operations containing these arrays are completely skipped, resulting in incomplete API client generation.

Root Cause Analysis

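Go's == operator only works on structs whose fields are all comparable. A struct with a slice or map field cannot be compared with == at all, and pointer fields are compared by address rather than by the values they point to. Value equality therefore has to be implemented explicitly for: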

  • Structs with multiple fields
  • Optional fields (ogen's OptT types)
  • Pointer fields
  • Nested objects and arrays
  • Maps

PR #887 (May 2023) added uniqueItems support for primitive comparable types only (string, int, bool, etc.), but complex types remain unimplemented.
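A minimal illustration of why == is not enough, using hypothetical Status and Tagged types (not ogen output). Status compiles with ==, but two values holding the same category string compare unequal because their pointers differ; Tagged would not compile with == at all because of its slice field.

```go
package main

// Status has only comparable fields, so == compiles, but the pointer
// field is compared by address, not by the string it points to.
type Status struct {
	ID       string
	Category *string
}

// Tagged has a slice field, so `x == y` on two Tagged values is a
// compile-time error: slices are not comparable in Go.
type Tagged struct {
	ID   string
	Tags []string
}

// newStatus allocates a fresh string for each call, so every Status
// it returns has a distinct Category pointer.
func newStatus(id, category string) Status {
	c := category
	return Status{ID: id, Category: &c}
}
```

This is the gap the generated Equal() is meant to close: it dereferences pointers and walks slices instead of relying on ==.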

Proposed Solution

Generate type-specific Equal() and Hash() methods for every schema type that appears in a uniqueItems array (and for the object types nested inside it), then use hash-based deduplication for O(n) average-case performance.

Architecture Overview

  1. Type Detection: Mark types needing equality methods during schema generation
  2. Method Generation: Generate Equal() and Hash() methods for each marked type
  3. Validation Integration: Use generated methods in array validation
  4. Fallback: Hash collisions handled by calling Equal() for verification

Implementation Design

1. Generated Equal() Method

// Generated for a workflow status type
func (a WorkflowReferenceStatus) Equal(b WorkflowReferenceStatus) bool {
    // Primitive fields - direct comparison
    if a.ID != b.ID { return false }
    if a.Name != b.Name { return false }
    
    // OptString fields - ogen's optional wrapper
    if a.Description.Set != b.Description.Set { return false }
    if a.Description.Set && a.Description.Value != b.Description.Value {
        return false
    }
    
    // Pointer fields
    if (a.Category == nil) != (b.Category == nil) { return false }
    if a.Category != nil && *a.Category != *b.Category {
        return false
    }
    
    // Nested objects - recursive equality
    if !a.StatusCategory.Equal(b.StatusCategory) { return false }
    
    // Arrays - length check then element comparison
    if len(a.Properties) != len(b.Properties) { return false }
    for i := range a.Properties {
        if a.Properties[i] != b.Properties[i] { return false }
    }
    
    return true
}

2. Generated Hash() Method

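import (
    "encoding/binary"
    "hash/fnv"
)

func (a WorkflowReferenceStatus) Hash() uint64 {
    h := fnv.New64a()

    // Length-prefix every string so field boundaries are unambiguous:
    // without it, ID "ab" + Name "c" hashes the same as ID "a" + Name "bc".
    writeString := func(s string) {
        binary.Write(h, binary.LittleEndian, uint64(len(s)))
        h.Write([]byte(s))
    }

    // Primitive fields
    writeString(a.ID)
    writeString(a.Name)

    // Optional fields - include presence marker
    if a.Description.Set {
        h.Write([]byte{1})
        writeString(a.Description.Value)
    } else {
        h.Write([]byte{0})
    }

    // Pointers - presence marker, then the dereferenced value
    if a.Category != nil {
        h.Write([]byte{1})
        writeString(*a.Category)
    } else {
        h.Write([]byte{0})
    }

    // Nested objects - incorporate their hash
    binary.Write(h, binary.LittleEndian, a.StatusCategory.Hash())

    // Arrays - length first, then each element in order
    binary.Write(h, binary.LittleEndian, uint64(len(a.Properties)))
    for _, prop := range a.Properties {
        writeString(prop)
    }

    return h.Sum64()
}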
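When string fields are fed into the hash back to back, field boundaries are lost, so different objects produce identical byte streams. The sketch below uses two hypothetical helpers, hashConcat and hashPrefixed, to compare plain concatenation with length-prefixing each field. Correctness does not depend on this (Equal() still rejects false matches), but avoidable collisions push validation toward its O(n²) worst case.

```go
package main

import (
	"encoding/binary"
	"hash/fnv"
)

// hashConcat writes each field's bytes back to back, with no separator.
func hashConcat(fields ...string) uint64 {
	h := fnv.New64a()
	for _, f := range fields {
		h.Write([]byte(f))
	}
	return h.Sum64()
}

// hashPrefixed writes each field's length (little-endian uint64) before
// its bytes, so different splits of the same text hash differently.
func hashPrefixed(fields ...string) uint64 {
	h := fnv.New64a()
	for _, f := range fields {
		binary.Write(h, binary.LittleEndian, uint64(len(f)))
		h.Write([]byte(f))
	}
	return h.Sum64()
}
```

Both ("ab", "c") and ("a", "bc") feed the bytes "abc" to hashConcat, so they always collide; hashPrefixed sees different inputs for the two.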

3. Validation Function

// Generated validation function for arrays with complex uniqueItems
func validateUniqueWorkflowReferenceStatus(items []WorkflowReferenceStatus) error {
    type entry struct {
        index int
        item  WorkflowReferenceStatus
    }
    seen := make(map[uint64][]entry, len(items))
    
    for i, item := range items {
        hash := item.Hash()
        
        // Check for duplicates with same hash
        if entries, exists := seen[hash]; exists {
            for _, e := range entries {
                // Verify with Equal() to handle hash collisions
                if e.item.Equal(item) {
                    return fmt.Errorf(
                        "duplicate item found at indices %d and %d",
                        e.index, i,
                    )
                }
            }
        }
        
        seen[hash] = append(seen[hash], entry{index: i, item: item})
    }
    return nil
}

Complexity: O(n) average case; O(n²) worst case when many items share the same hash and must be compared pairwise with Equal().
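To see the Equal() fallback at work, here is a self-contained sketch that reuses the same bucketing logic with a hypothetical Point type whose Hash() deliberately ignores the Y field, so distinct points collide. Colliding items that are not Equal() pass validation, while a true duplicate is still reported with both indices.

```go
package main

import "fmt"

// Point is a hypothetical stand-in for a generated type. Its Hash is
// deliberately weak (it ignores Y), so distinct points can collide.
type Point struct {
	X, Y int
}

func (a Point) Equal(b Point) bool { return a.X == b.X && a.Y == b.Y }

func (a Point) Hash() uint64 { return uint64(a.X) }

// validateUniquePoint buckets items by hash, then confirms each candidate
// with Equal so that hash collisions are never reported as duplicates.
func validateUniquePoint(items []Point) error {
	type entry struct {
		index int
		item  Point
	}
	seen := make(map[uint64][]entry, len(items))
	for i, item := range items {
		h := item.Hash()
		for _, e := range seen[h] {
			if e.item.Equal(item) {
				return fmt.Errorf("duplicate item found at indices %d and %d", e.index, i)
			}
		}
		seen[h] = append(seen[h], entry{index: i, item: item})
	}
	return nil
}
```

For example, []Point{{1, 2}, {1, 3}} is valid even though both items land in the same bucket, and []Point{{1, 2}, {5, 0}, {1, 2}} fails with "duplicate item found at indices 0 and 2".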

Field-Level Comparison Matrix

| Type pattern | Equal() logic | Hash() logic |
|---|---|---|
| Primitives (string, int64, bool) | Direct == | Write bytes to FNV |
| OptT (OptString, OptInt64) | Compare Set flag, then Value | Write presence marker + value |
| Pointers (*string) | Nil check, then dereference | Write presence marker + dereferenced value |
| Arrays | Length check, element-by-element | Hash each element in order |
| Nested objects | Recursive .Equal() call | Incorporate .Hash() result |
| Maps | Length, key existence, value comparison | Hash keys and values (order-independent) |
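The map row is the only one where iteration order matters, because Go randomizes map iteration. One possible approach, sketched below for a hypothetical map[string]string field, hashes each key/value pair independently and combines the results with addition, which is commutative. Sorting the keys first (open question 3) is the other common option; it gives the same order-independence at O(k log k) cost per map.

```go
package main

import (
	"encoding/binary"
	"hash/fnv"
)

// mapEqual compares two string maps by length, key presence, and value.
func mapEqual(a, b map[string]string) bool {
	if len(a) != len(b) {
		return false
	}
	for k, va := range a {
		vb, ok := b[k]
		if !ok || va != vb {
			return false
		}
	}
	return true
}

// entryHash hashes one key/value pair, length-prefixing both strings.
func entryHash(k, v string) uint64 {
	h := fnv.New64a()
	binary.Write(h, binary.LittleEndian, uint64(len(k)))
	h.Write([]byte(k))
	binary.Write(h, binary.LittleEndian, uint64(len(v)))
	h.Write([]byte(v))
	return h.Sum64()
}

// mapHash sums per-entry hashes (wrapping on overflow). Addition is
// commutative, so the result does not depend on iteration order.
func mapHash(m map[string]string) uint64 {
	var sum uint64
	for k, v := range m {
		sum += entryHash(k, v)
	}
	return sum
}
```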

Files to Modify/Create

Core Changes

  1. gen/schema_gen.go - Remove ErrNotImplemented for complex uniqueItems (~5 lines)
  2. gen/ir/validation.go - Track types needing equality methods (~20 lines)
  3. gen/gen_equality.go - NEW FILE: Generate Equal() and Hash() methods (~400 lines)
  4. gen/gen_validators.go - Generate validation calls for complex arrays (~50 lines)
  5. validate/array.go - Add complex uniqueItems validation logic (~30 lines)

Testing

  1. gen/gen_equality_test.go - NEW FILE: Test method generation (~200 lines)
  2. validate/array_test.go - Add complex uniqueItems test cases (~300 lines)
  3. Integration tests with real OpenAPI specs

Example OpenAPI Schema

components:
  schemas:
    WorkflowReferenceStatus:
      type: object
      properties:
        id:
          type: string
        name:
          type: string
        description:
          type: string
        statusCategory:
          $ref: '#/components/schemas/StatusCategory'
      required:
        - id
        - name
    
    Workflow:
      type: object
      properties:
        id:
          type: string
        statuses:
          type: array
          items:
            $ref: '#/components/schemas/WorkflowReferenceStatus'
          uniqueItems: true  # ← Currently causes operation skip

Current behavior: Operation containing Workflow schema is skipped
Expected behavior: Full operation generation with runtime uniqueness validation

Edge Cases to Handle

  1. Empty arrays: Valid (no duplicates possible)
  2. Single element: Valid (no duplicates possible)
  3. Null vs. unset optional fields (ogen's OptNilT types): treated as different values
  4. Hash collisions: Must verify with Equal()
  5. Nested arrays: Recursive comparison
  6. Circular references: Potential stack overflow (may need cycle detection)
  7. Map field ordering: Order-independent hashing
  8. Large objects: Performance optimization may be needed
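On edge case 6: JSON text cannot express a cycle, so values decoded from a request or response are finite trees and plain recursion terminates even for self-referencing schemas. A cycle can only appear in values built directly in Go code. If that case is worth guarding against, one option is the visited-pair technique that reflect.DeepEqual uses internally: a comparison already in progress is assumed equal when it is reached again. The Node type and nodeEqual function below are hypothetical and cover Equal() only.

```go
package main

// Node is a hypothetical recursive type, as generated for a schema that
// references itself.
type Node struct {
	Name string
	Next *Node
}

// nodePair identifies one (a, b) comparison that is in progress.
type nodePair struct{ a, b *Node }

// nodeEqual compares two possibly cyclic Node graphs. A pair already in
// visited is assumed equal, so the recursion stops instead of looping.
func nodeEqual(a, b *Node, visited map[nodePair]bool) bool {
	if a == nil || b == nil {
		return a == b
	}
	p := nodePair{a, b}
	if visited[p] {
		return true
	}
	visited[p] = true
	return a.Name == b.Name && nodeEqual(a.Next, b.Next, visited)
}
```

Hash() would need a separate safeguard, such as a depth limit, since a cyclic value has no finite byte stream to feed into FNV.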

Testing Strategy

Unit Tests

func TestGeneratedEqual(t *testing.T) {
    tests := []struct {
        name    string
        a, b    WorkflowReferenceStatus
        wantEq  bool
    }{
        {
            name: "identical objects",
            a:    WorkflowReferenceStatus{ID: "1", Name: "Open"},
            b:    WorkflowReferenceStatus{ID: "1", Name: "Open"},
            wantEq: true,
        },
        {
            name: "different IDs",
            a:    WorkflowReferenceStatus{ID: "1", Name: "Open"},
            b:    WorkflowReferenceStatus{ID: "2", Name: "Open"},
            wantEq: false,
        },
        {
            name: "optional set vs unset",
            a:    WorkflowReferenceStatus{ID: "1", Description: OptString{Set: true, Value: "A"}},
            b:    WorkflowReferenceStatus{ID: "1", Description: OptString{Set: false}},
            wantEq: false,
        },
        {
            name: "both optional unset",
            a:    WorkflowReferenceStatus{ID: "1", Description: OptString{Set: false}},
            b:    WorkflowReferenceStatus{ID: "1", Description: OptString{Set: false}},
            wantEq: true,
        },
    }
    
    for _, tt := range tests {
        t.Run(tt.name, func(t *testing.T) {
            got := tt.a.Equal(tt.b)
            if got != tt.wantEq {
                t.Errorf("Equal() = %v, want %v", got, tt.wantEq)
            }
            
            // Verify hash consistency
            if tt.wantEq && tt.a.Hash() != tt.b.Hash() {
                t.Error("Equal items must have equal hashes")
            }
        })
    }
}

Integration Tests

  • JIRA API workflow operations
  • Nested object arrays
  • Mixed primitive and complex fields
  • Performance benchmarks (1k, 10k, 100k items)

Performance Expectations

| Array size | Unique items | Performance |
|---|---|---|
| 100 | Yes | ~0.1ms |
| 1,000 | Yes | ~1ms |
| 10,000 | Yes | ~10ms |
| 100 | With duplicates | ~0.5ms (early detection) |

Migration Path

This is a non-breaking change:

  • Existing primitive uniqueItems validation continues to work
  • Complex types that were previously skipped will now be generated
  • No changes needed to existing generated code

Implementation Effort Estimate

| Phase | Description | Effort |
|---|---|---|
| 1 | Equal() generation for all type patterns | 1-2 weeks |
| 2 | Hash() generation | 1 week |
| 3 | Validation integration | 3-5 days |
| 4 | Comprehensive testing | 1 week |
| Total | | 4-5 weeks |

Open Questions for Discussion

  1. Hash algorithm: Use FNV-1a (fast, good distribution) or a different hash function?
  2. Circular reference detection: Should we add cycle detection, or document this limitation?
  3. Map field hashing: Order-independent hashing required - sort keys first?
  4. Code size concerns: Generated Equal() methods could be large - acceptable tradeoff?
  5. Opt-in flag: Should this be enabled by default or require a config flag?
  6. Generic constraints: Use Go 1.18+ generics for the validation function signature?
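On question 6, a single generic validator could replace one generated validateUniqueX function per type. The sketch below assumes a hypothetical Hashable constraint matching the proposed Equal(T) bool and Hash() uint64 methods; the names UniqueItems, uniqueEntry, and Tag are illustrative, not existing ogen APIs. It needs Go 1.18 or later.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"hash/fnv"
)

// Hashable is a hypothetical constraint: any generated type with the
// proposed Equal and Hash methods satisfies Hashable of itself.
type Hashable[T any] interface {
	Equal(T) bool
	Hash() uint64
}

// uniqueEntry remembers the index at which an item was first seen.
type uniqueEntry[T any] struct {
	index int
	item  T
}

// UniqueItems returns an error naming the first pair of equal items, or nil.
func UniqueItems[T Hashable[T]](items []T) error {
	seen := make(map[uint64][]uniqueEntry[T], len(items))
	for i, item := range items {
		h := item.Hash()
		// Items in the same bucket are only candidates; Equal decides.
		for _, e := range seen[h] {
			if e.item.Equal(item) {
				return fmt.Errorf("duplicate item found at indices %d and %d", e.index, i)
			}
		}
		seen[h] = append(seen[h], uniqueEntry[T]{index: i, item: item})
	}
	return nil
}

// Tag is a hypothetical generated type with two string fields.
type Tag struct{ Key, Value string }

func (a Tag) Equal(b Tag) bool { return a.Key == b.Key && a.Value == b.Value }

func (a Tag) Hash() uint64 {
	h := fnv.New64a()
	for _, s := range []string{a.Key, a.Value} {
		binary.Write(h, binary.LittleEndian, uint64(len(s)))
		h.Write([]byte(s))
	}
	return h.Sum64()
}
```

A call such as UniqueItems([]Tag{{"a", "1"}, {"b", "2"}, {"a", "1"}}) infers T = Tag and reports indices 0 and 2. The tradeoff is that the generator only emits Equal() and Hash() per type, while the validation loop lives once in the runtime package.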

Real-World Impact

Atlassian JIRA REST API v3

  • Operations affected: ~20 (including readWorkflows, createWorkflow, etc.)
  • Current workaround: Use openapi-generator instead of ogen
  • With this feature: the ~20 operations currently skipped for complex uniqueItems would be generated

Benefits

  • ✅ Complete OpenAPI 3.0 compliance for uniqueItems
  • ✅ Runtime validation catches duplicate items
  • ✅ Type-safe, generated code (no reflection)
  • ✅ Performance: O(n) average case
  • ✅ Enables ogen adoption for more APIs

Proposed by

This enhancement request is being created to document the design for a community-contributed implementation. The implementation will be developed in a fork and submitted as a PR for review.
