feat(health): pkg/health core Compute loop, resolves signal, rollup#1291
Conversation
📝 WalkthroughWalkthroughThis PR adds the new pkg/health package implementing ADR-009 health reporting. It introduces the Compute() function that enumerates recipe catalog leaf overlays, builds each entry, classifies resolver errors into a resolves signal (pass/fail/unknown), applies dimension rollup with explicit precedence (fail > warn > unknown > pass), and returns a deterministically sorted report. The package includes data types for options, coverage tracking, per-combo health, and the overall report, along with comprehensive tests validating error classification, rollup semantics, embedded catalog correctness, deterministic serialization, and error handling. Configuration changes add ownership and labeler support. Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related issues
🚥 Pre-merge checks | ✅ 4✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
Coverage Report ✅
Coverage BadgeMerging this branch will increase overall coverage
Coverage by fileChanged files (no unit tests)
Please note that the "Total", "Covered", and "Missed" counts above refer to code statements instead of lines of code. The value in brackets refers to the test coverage of that file in the old version of the code. |
njhensley
left a comment
There was a problem hiding this comment.
Solid, well-documented foundation for ADR-009 — the generic rollup, the descriptor-vs-graded split, byte-deterministic output, and the standalone-package rationale (avoiding the future recipe → evidence → recipe import cycle) are all well-judged, and coverage is strong.
I have one correctness concern about the unknown classification being too wide (latent today — every current leaf resolves pass), one context-handling gap, and a test gap worth closing alongside the first. Details inline.
Introduce the standalone pkg/health package per ADR-009 V1: the ADR-009 §4 types, a Compute() that enumerates leaf recipes via ListCatalog and resolves each through a single shared recipe.Builder, the graded resolves signal, a generic status rollup with unknown-held semantics, and byte-deterministic output. The package is standalone (not pkg/recipe) to keep the future v1.1 evidence import acyclic. Only the resolves dimension is graded here; chart_pinned/declared_coverage (#1226) and constraints_wellformed (#1227) layer onto the same generic rollup without changing it. The held-unknown bucket holds only genuinely re-runnable errors (context cancellation/deadline and ErrCodeTimeout). ErrCodeInternal is graded fail: on the resolve path it is reserved for deterministic structural defects (e.g. a missing registry healthCheck.assertFile), which re-runs never clear — narrowing ADR-009 §2's literal wording to its stated "re-run rather than penalize" intent. The catalog loop fails loud on context cancellation so a truncated run is not mistaken for a healthy one. Wire pkg/health/** into CODEOWNERS and the area/recipes labeler glob. Closes #1225
8c77b00 to
fe7da04
Compare
njhensley
left a comment
There was a problem hiding this comment.
Approving — all review comments addressed in the force-push.
isTransientnarrowed (the High-latent finding):ErrCodeInternalnow gradesfail; onlyErrCodeTimeout+context.DeadlineExceeded/Canceledare held asunknown. Unit cases flipped accordingly.- Loop now fails loud on
ctx.Err()instead of emitting a truncated report that looks healthy. - New
TestComputeGradesDeterministicDefectAsFaildrives a real broken leaf (missinghealthCheck.assertFile→ErrCodeInternal) throughComputeand assertsfailend-to-end — exactly the trigger path the bucket fix protects, guarding against regression. - CODEOWNERS wording fixed.
The two remaining nits (always-allocated Detail map, fail-path determinism fixture) are intentionally deferred and fine for V1. Nice, well-tested foundation for ADR-009.
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@pkg/health/health.go`:
- Around line 44-45: Update the comment for the StatusUnknown constant to
reflect the actual transient logic in isTransient: remove mention of
ErrCodeInternal and only reference ErrCodeTimeout (and context
deadline/canceled) as transient; ensure the comment aligns with the rationale
already described near the isTransient function and the handling that
deliberately treats ErrCodeInternal as a non-transient failure.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Enterprise
Run ID: 41e54918-682a-49ad-b8fe-c61a6e24ddd9
📒 Files selected for processing (6)
.github/CODEOWNERS.github/labeler.ymlpkg/defaults/timeouts.gopkg/health/doc.gopkg/health/health.gopkg/health/health_test.go
Summary
Adds the standalone
pkg/healthpackage per ADR-009 V1: the core enumeration→resolution loop (Compute), the gradedresolvessignal, a generic per-recipe status rollup withunknown-held semantics, and byte-deterministic output.Motivation / Context
First building block of the Recipe health tracking epic (ADR-009 V1).
pkg/healthis standalone (not inpkg/recipe) so the future v1.1 evidence axis stays acyclic —pkg/evidence/attestationalready importspkg/recipe, so hosting health there would later close arecipe → evidence → recipecycle. This PR ships only the package core and theresolvesdimension; the read-only signals and the constraint signal land on top of it.Fixes: #1225
Related: #1224 (epic), #1226, #1227
Type of Change
Component(s) Affected
pkg/recipe) — new sibling packagepkg/healthImplementation Notes
Computeenumerates leaf overlays viaMetadataStore.ListCatalog(filter)(filtered toIsLeaf) and resolves each through a single sharedrecipe.NewBuilder(...)so the owner-stamp invariant holds across combos. Combos are sorted by criteria string (tie-broken by leaf name) for determinism.resolvesgrading: resolves →pass; non-transient error →fail(message inDetail);ErrCodeTimeout/ErrCodeInternal→unknown, held out of the fail/warn rollup (re-run rather than penalize).Dimensionsmap generically (precedencefail > warn > unknown > pass), so pkg/health: chart_pinned + declared_coverage signals #1226/pkg/health: constraints_wellformed signal (parse-only, hermetic) #1227 add dimensions without touching rollup code. A heldunknownsurfaces as the recipe status when nothing fails/warns — never silently aspass.resolves-only core leavesDeclaredCoverageat its zero value; the descriptor types are defined now so later signals slot in without a schema change.time.Now()on the computed path. The one non-deterministic input (unknown-from-real-timeout) is not exercised offline; the determinism test asserts byte-equality on the non-error embedded-catalog path viaserializer.MarshalYAMLDeterministic.Testing
classifyResolve,isTransient,rollup(incl. unknown-held precedence).resolvesdimension with a status consistent with the rollup; byte-determinism across twoComputeruns; non-matching filter → empty report; failing provider → surfaced load error.pkg/healthcoverage: 95.5% (new package; floor is 75%).Risk Assessment
Rollout notes: N/A
Checklist
make testwith-race)make lint)git commit -S)