Skip to content

feat(recipe): context-aware DataProvider interface#1121

Merged
mchmarny merged 1 commit into
mainfrom
feat/datprovider-context
May 30, 2026
Merged

feat(recipe): context-aware DataProvider interface#1121
mchmarny merged 1 commit into
mainfrom
feat/datprovider-context

Conversation

@mchmarny

Copy link
Copy Markdown
Member

Summary

  • Threads context.Context through DataProvider.ReadFile / WalkDir so callers operating against slow/hung backing stores (stalled NFS/sshfs --data mounts) can cancel mid-read.
  • Cascades ctx through catalog.LoadWithDataProvider and the validator.ValidatePhases / ValidatePhase entry points called out in the issue acceptance criteria.
  • Folds in three doc/code drift fixes left over from chore(recipe): remove back-compat shims introduced by #1107 #1118 (per-provider criteria registry, CLI/server Client model, catalog.Loadcatalog.LoadWithDataProvider).

Motivation / Context

Fixes #1109. The pre-existing exposure: a hung NFS / sshfs read on a --data directory would block the validator goroutine until process termination, since the surrounding ValidatePhases(ctx) cancellation couldn't interrupt the in-flight ReadFile. Embedded reads cannot hang; the exposure is narrow but real for network-backed --data mounts.

Type of Change

  • New feature (non-breaking change adding functionality)

Components Affected

  • Recipes / registry (pkg/recipe)
  • Validator (pkg/validator)
  • CLI commands (pkg/cli)
  • API server (pkg/api)
  • Bundler / deployers (pkg/bundler)
  • Documentation

Implementation Notes

Interface change. DataProvider.ReadFile(ctx, path) and WalkDir(ctx, root, fn) now take context.Context. Embedded checks ctx.Err() before reading and between entries; Layered forwards ctx to embedded reads and inside mergeEmbeddedAndExternal. The bounded WalkDir(ctx) guard in LayeredDataProvider short-circuits on a stalled network mount.

Catalog entry point. catalog.LoadWithDataProvider(ctx, dp, version, commit) is the new canonical entry; catalog.Load(...) becomes a back-compat wrapper that derives a defaults.FileReadTimeout-bounded context.Background().

Validator propagation. validator.ValidatePhases and ValidatePhase forward their incoming ctx directly to catalog.LoadWithDataProvider so the surrounding --timeout reaches the catalog read.

Public-API back-compat. Existing helpers without ctx in their signature (GetManifestContent, RecipeResult.GetValuesForComponent, HydrateResult) gain *WithContext variants; the no-ctx form derives a bounded context.Background() from defaults.FileReadTimeout internally so a hung mount still returns instead of parking the goroutine. Internal cascades that pass through GetComponentRegistryFor (sync.Once cached, first-load bounded internally) carry //nolint:contextcheck with a rationale comment.

Cancellation test. TestDataProvider_HonorsContextCancellation asserts both implementations return context.Canceled from ReadFile and WalkDir when the caller cancels mid-read.

Drift cleanup (folded in). The user flagged three doc/code references left over from #1118: docs/contributor/data.md still described the criteria registry as process-global (it's per-provider now); pkg/recipe/doc.go still said the CLI/API server uses SetDataProvider (they use aicr.Client with WithRecipeSource); docs/contributor/validator.md referenced the old catalog.Load entry point.

Testing

  • go test -race ./... — all packages pass (only pre-existing pkg/trust TUF/Sigstore network test fails in this sandbox; confirmed identical failure on main via git stash).
  • golangci-lint run -c .golangci.yaml ./... — 0 issues.
  • make qualify — passes except the same pre-existing pkg/trust network test.

Coverage (changed packages)

Package Delta
pkg/recipe unchanged (88.9%)
pkg/validator/catalog unchanged (97.6%)
pkg/validator unchanged (24.3%) — only call-site cascade
pkg/cli unchanged — only call-site cascade
pkg/api unchanged — only call-site cascade
pkg/bundler unchanged — only call-site cascade
pkg/mirror unchanged — only call-site cascade
pkg/client/v1 unchanged — only call-site cascade

New test (TestDataProvider_HonorsContextCancellation) covers the four new code paths added in this PR (Embedded.ReadFile + WalkDir ctx checks, Layered.ReadFile + WalkDir ctx checks).

Risk Assessment

Low.

  • API change is additive on the call sites where ctx is now required, and the back-compat wrappers (GetManifestContent, Load, HydrateResult, etc.) keep the no-ctx public shape working unchanged for external consumers.
  • Internal bounded ctx (defaults.FileReadTimeout) ensures the no-ctx call sites still get a timeout rather than the unbounded blocking that existed before.
  • The cancellation test pins the contract.
  • Doc-only sections (data.md, validator.md, doc.go) carry no runtime risk.

Checklist

  • Tests pass (make test)
  • Linter passes (make lint)
  • Code follows existing patterns (make qualify)
  • Self-review completed
  • No Co-Authored-By lines
  • Commits are signed (-S)
  • PR title ≤ 70 chars

Closes #1109.

@coderabbitai

This comment was marked as resolved.

coderabbitai[bot]

This comment was marked as resolved.

Comment thread pkg/recipe/provider.go
Comment thread pkg/recipe/provider.go
Comment thread docs/contributor/data.md Outdated
Comment thread pkg/recipe/components.go
Comment thread pkg/recipe/provider.go
Comment thread pkg/mirror/discover.go
Threads context.Context into DataProvider.ReadFile / WalkDir so callers
operating against a slow or hung backing store (e.g., a stalled
NFS/sshfs mount on --data) can cancel the operation at I/O boundaries.
Cancellation is honored before each file open and between WalkDir
entries — not mid-syscall on an in-flight read; LayeredDataProvider
reads external files through os.Open + io.ReadAll, which are not
cancelable once started. A hung mount blocked mid-readv sees the
cancellation honored on the next file the walk touches, not the one
currently blocked. Embedded reads cannot hang but honor cancellation
between entries for symmetry.

Cascades ctx through the catalog and validator entry points called out
in the issue acceptance criteria:

  - DataProvider.ReadFile(ctx, path) / WalkDir(ctx, root, fn)
  - EmbeddedDataProvider, LayeredDataProvider, in-memory test providers
  - catalog.LoadWithDataProvider(ctx, dp, version, commit)
  - validator.ValidatePhases / ValidatePhase propagate their ctx
  - recipe.HydrateResultWithContext, GetValuesForComponentWithContext,
    GetManifestContentWithContext (existing methods kept as bounded
    back-compat wrappers via defaults.FileReadTimeout)

Pre-existing public APIs that don't yet take ctx gain a *WithContext
variant; the no-ctx form derives a bounded background ctx so a hung
mount still returns instead of hanging the goroutine.

Adds TestDataProvider_HonorsContextCancellation acceptance test
verifying ReadFile/WalkDir return context.Canceled when the caller
cancels mid-read on both embedded and layered providers.

Review feedback addressed:

  - validator.ValidatePhases / ValidatePhase now use
    errors.PropagateOrWrap (preserves catalog.LoadWithDataProvider's
    structured error code) instead of errors.Wrap.
  - DataProvider interface godoc and pkg/recipe/components.go
    loadComponentRegistryFor godoc clarified to state the real
    cancellation scope (I/O boundaries, not mid-syscall).
  - GetComponentRegistryFor now evicts transient-error cache entries
    (context.Canceled / DeadlineExceeded) so a one-time first-load
    cancellation isn't cached for the provider's lifetime — mirrors
    LoadMetadataStoreFor's existing behavior.
  - getMergedRegistry / getMergedCatalog sync.Once caching invariant
    documented (latent issue: a long-lived LayeredDataProvider reused
    across requests with cancelable contexts could poison the cache).
  - docs/contributor/data.md DataProvider interface code block updated
    to the new ctx-aware signature.

Closes #1109.
@mchmarny mchmarny force-pushed the feat/datprovider-context branch from 252aacd to 8961d0d Compare May 30, 2026 10:53
@github-actions

Copy link
Copy Markdown
Contributor

@mchmarny mchmarny enabled auto-merge (squash) May 30, 2026 11:01
@mchmarny mchmarny disabled auto-merge May 30, 2026 11:02
@mchmarny mchmarny merged commit 94149a9 into main May 30, 2026
34 checks passed
@mchmarny mchmarny deleted the feat/datprovider-context branch May 30, 2026 11:03
@github-actions

Copy link
Copy Markdown
Contributor

Coverage Report ✅

Metric Value
Coverage 76.6%
Threshold 75%
Status Pass
Coverage Badge
![Coverage](https://img.shields.io/badge/coverage-76.6%25-green)

Merging this branch will decrease overall coverage

Impacted Packages Coverage Δ 🤖
github.com/NVIDIA/aicr/pkg/api 54.60% (ø)
github.com/NVIDIA/aicr/pkg/bundler 72.64% (ø)
github.com/NVIDIA/aicr/pkg/cli 65.23% (ø)
github.com/NVIDIA/aicr/pkg/client/v1 76.79% (ø)
github.com/NVIDIA/aicr/pkg/mirror 68.14% (ø)
github.com/NVIDIA/aicr/pkg/recipe 86.73% (-0.02%) 👎
github.com/NVIDIA/aicr/pkg/validator 24.31% (ø)
github.com/NVIDIA/aicr/pkg/validator/catalog 97.53% (ø)

Coverage by file

Changed files (no unit tests)

Changed File Coverage Δ Total Covered Missed 🤖
github.com/NVIDIA/aicr/pkg/api/recipe_handler.go 69.39% (ø) 98 68 30
github.com/NVIDIA/aicr/pkg/bundler/bundler.go 72.07% (ø) 605 436 169
github.com/NVIDIA/aicr/pkg/cli/query.go 43.59% (ø) 78 34 44
github.com/NVIDIA/aicr/pkg/cli/validate_evidence.go 58.33% (ø) 12 7 5
github.com/NVIDIA/aicr/pkg/client/v1/aicr.go 81.44% (ø) 334 272 62
github.com/NVIDIA/aicr/pkg/mirror/discover.go 71.13% (ø) 142 101 41
github.com/NVIDIA/aicr/pkg/recipe/adapter.go 89.61% (+1.20%) 77 (+8) 69 (+8) 8 👍
github.com/NVIDIA/aicr/pkg/recipe/components.go 85.22% (-0.37%) 115 (+4) 98 (+3) 17 (+1) 👎
github.com/NVIDIA/aicr/pkg/recipe/handler_query.go 84.21% (ø) 76 64 12
github.com/NVIDIA/aicr/pkg/recipe/metadata_store.go 85.36% (-0.47%) 362 (+2) 309 53 (+2) 👎
github.com/NVIDIA/aicr/pkg/recipe/provider.go 90.50% (-0.02%) 221 (+10) 200 (+9) 21 (+1) 👎
github.com/NVIDIA/aicr/pkg/recipe/query.go 98.41% (+0.08%) 63 (+3) 62 (+3) 1 👍
github.com/NVIDIA/aicr/pkg/validator/catalog/catalog.go 97.53% (ø) 81 79 2
github.com/NVIDIA/aicr/pkg/validator/validator.go 19.47% (ø) 190 37 153

Please note that the "Total", "Covered", and "Missed" counts above refer to code statements instead of lines of code. The value in brackets refers to the test coverage of that file in the old version of the code.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

feat(recipe): add context.Context to DataProvider interface for cancellable I/O

2 participants