Skip to content

feat(recipe): per-Builder DataProvider isolation via WithDataProvider option #986

Description

@mchmarny

Part of #983.

Summary

recipe.SetDataProvider mutates a process-global, and both loadMetadataStore (pkg/recipe/metadata_store.go) and GetComponentRegistry (pkg/recipe/components.go) cache their loaded state via sync.Once against whatever provider was set when the cache first populated.

Two Builders in the same process — say, an in-process operator handling multiple recipe sources, or a test harness exercising two layered providers — silently share whichever cache populated first, regardless of which provider was last SetDataProvider'd. The store and registry returned to the second Builder are not the ones from its own source.

Add a WithDataProvider(dp) builder option and per-DataProvider caches so each Builder resolves against its own source without interference.

Motivation

The current shape works for AICR's shipping consumers (CLI, API server) because each is a single-tenant process with one Builder. It does not work — and cannot be made to work safely — for any in-process consumer that needs to maintain more than one recipe source over its lifetime. The hazard is dormant only because no in-tree caller triggers it.

The underlying pattern is documented as an anti-pattern in .claude/CLAUDE.md:

sync.Once caching state that depends on a settable global (e.g., a registry tied to a DataProvider): Key the cache by a generation counter the setter increments; recompute on miss so late-bound configuration takes effect.

The approach below does one better: key the cache by DataProvider identity, so concurrent in-process consumers with different providers each get the correct cached state with no eviction or recomputation needed.

Proposed change

1. WithDataProvider builder option

In pkg/recipe/builder.go:

// WithDataProvider binds the Builder to a specific DataProvider,
// isolating its metadata store and component registry from the
// process-global ones at GetDataProvider().
//
// Use this from any caller that constructs more than one Builder
// per process. When unset, the Builder falls back to the
// package-global DataProvider — preserving the CLI and API
// server behavior.
func WithDataProvider(dp DataProvider) Option

The Builder gains a private dp DataProvider field. When set, all internal calls that previously short-circuited through GetDataProvider() route through b.dp instead.

2. Per-provider metadata store cache

Replace the sync.Once-backed package globals in pkg/recipe/metadata_store.go with a sync.Map[DataProvider]*entry keyed by provider:

var storeCache sync.Map // map[DataProvider]*storeCacheEntry

type storeCacheEntry struct {
    once  sync.Once
    store *MetadataStore
    err   error
}

loadMetadataStore takes a dp DataProvider parameter. When nil, falls back to GetDataProvider() so the existing call sites (Builder without WithDataProvider) continue to hit a single cache entry.

3. Per-provider component registry cache

Same treatment for pkg/recipe/components.go. The existing GetComponentRegistry() becomes a thin shim that calls a new getComponentRegistryFor(dp) against the global provider. A new public entry point lets callers ask for the registry of a specific provider.

4. Eviction helpers

Long-running in-process consumers that retire providers over time (e.g., when a configuration source is deleted) need a way to drop the corresponding cache entries — otherwise memory grows monotonically with the number of unique providers ever observed.

// EvictCachedStore drops the cached MetadataStore for the supplied
// DataProvider. The next loadMetadataStore call for that provider
// will rebuild from scratch. No-op if no entry exists.
func EvictCachedStore(dp DataProvider)

// EvictCachedRegistry drops the cached ComponentRegistry for the
// supplied DataProvider. No-op if no entry exists.
func EvictCachedRegistry(dp DataProvider)

Both must be safe on nil providers (return without panicking).

5. Store carries its provider

MetadataStore should carry the DataProvider it was loaded from so any store-internal lookup (e.g., resolving a component registry default) uses the same provider, not the package-global. This closes a subtle drift window where the store and registry could disagree if SetDataProvider is called between their loads.

Tests

  • Add tests in pkg/recipe/builder_test.go that construct two Builders via NewBuilder(WithDataProvider(dpA)) and NewBuilder(WithDataProvider(dpB)) against fixtures with materially different component sets, then resolve the same criteria on both concurrently (with -race). Assert each result reflects its own provider's components.
  • Eviction tests: load a store, evict, load again, assert a fresh read (not a cache hit). Confirm nil-provider eviction is a no-op.
  • Regression tests for the existing single-provider path: every existing recipe test must pass unchanged, demonstrating back-compat for CLI / API server.

Acceptance

  • recipe.WithDataProvider, recipe.EvictCachedStore, recipe.EvictCachedRegistry, and (if added) recipe.GetComponentRegistryFor are public, documented, and tested.
  • make test -race ./pkg/recipe/... clean, including the new multi-provider concurrency tests.
  • CLI / API server behavior unchanged (verify via existing integration tests in tests/chainsaw/ and the KWOK matrix).
  • Doc update in docs/contributor/data.md covering the new option and when to use it.

Out of scope

  • Deprecating SetDataProvider / GetDataProvider. Covered by a separate child issue, gated on this one.
  • Migrating CLI / API server call sites to use WithDataProvider. Optional follow-up.
  • Pure performance work — the cache shape is per-provider but no benchmarking required as part of this change.

Dependencies

  • Soft dependency: the deep-clone fix (sibling child issue) should ideally land first. Per-provider caches widen the blast radius of the existing aliasing bug — any test that hits two providers concurrently and mutates a returned Validation field could expose it. Order matters more for clean review than for correctness; both can be in flight simultaneously.
  • Hard dependency for the SetDataProvider deprecation (sibling child issue, depends on this one).

Metadata

Metadata

Assignees

No one assigned

    Fields

    No fields configured for Enhancement.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions