fix(bundler): derive argocd-helm chart name from OCI artifact path#1032
Conversation
The argocd-helm deployer hardcoded `aicr-bundle` as the Chart.yaml name
and parent Application's `source.chart`, so any user who published the
bundle under a non-default OCI artifact name hit a 404 from ArgoCD when
the parent App tried to assemble `repoURL/chart:targetRevision`.
Fix flows the OCI artifact's last path segment (Reference.ChartName())
through bundler config into argocdhelm.Generator.ChartName, which now
parameterizes both Chart.yaml and the parent Application template (the
latter via `{{ .Chart.Name }}` so the chart name is sourced from the
rendered chart at install time). Local-directory output keeps the
"aicr-bundle" default.
Also corrects the README and `required` directive guidance: --set
repoURL should be the parent namespace (no chart suffix), since the
parent App appends .Chart.Name when assembling the OCI reference.
Previous text instructed users to include the chart segment, which
would double-suffix and 404.
Fixes #1019
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: ASSERTIVE Plan: Enterprise Run ID: 📒 Files selected for processing (13)
📝 WalkthroughWalkthroughThis PR fixes a bug where the argocd-helm deployer hardcodes the parent chart name as Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Suggested labels
Suggested reviewers
🚥 Pre-merge checks | ✅ 4✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
Coverage Report ✅
Coverage BadgeMerging this branch will increase overall coverage
Coverage by fileChanged files (no unit tests)
Please note that the "Total", "Covered", and "Missed" counts above refer to code statements instead of lines of code. The value in brackets refers to the test coverage of that file in the old version of the code. |
…ote chart name (NVIDIA#1034) Two argocd-helm OCI publishing bugs surfaced by Codex review against the post-NVIDIA#1032 contract. P1 — Path-based child Applications resolve to a non-existent artifact NVIDIA#1032 changed the --set repoURL contract so callers pass only the parent namespace (e.g., oci://reg/org); the parent Application appends .Chart.Name via its separate source.chart field. The parent-App template at parentAppTemplate / line 407 implements this correctly, but the path-based child-App template in injectValuesIntoSingleSource at line 703 was left emitting only .Values.repoURL. Argo CD's generic OCI source (used by path-based children since they have no source.chart) treats spec.source.repoURL as the full artifact reference and resolves it directly, so under the new contract a child source pointed at oci://reg/org:tag — an artifact that doesn't exist — and the child Application failed to sync. Fix: append /{{ .Chart.Name }} to the rendered repoURL value so the assembled URL matches the artifact the parent App's repoURL/chart:tag triple resolves to. The error-message text in the required directive is updated to say "this template appends .Chart.Name" (the path-based template is doing the appending, not the parent App). P2 — Unquoted name and version in generated Chart.yaml writeChartYAML emitted "name: %s" / "version: %s" with raw fmt.Fprintf. (*Reference).ChartName() returns path.Base(Repository), so a valid OCI artifact path whose last segment is a YAML reserved scalar — "null", "true", "false", "yes", "no", numeric strings — produced an unquoted YAML reserved word as the chart's name. Helm's loader then parsed name: null as YAML null, chart.Metadata.Name became empty, and helm package / helm push rejected the chart with "chart.metadata.name is required". Same trap for the version field ("1.0" parses as float). Fix: emit both via %q so the values round-trip as YAML strings. OCI artifact path segments are constrained to printable ASCII by the docker reference grammar, which is the safe charset for Go's %q. Tests - TestInjectValuesIntoSingleSource_AppendsChartName — asserts the rendered path-based child source.repoURL appends /{{ .Chart.Name }} after the user-supplied .Values.repoURL. - TestWriteChartYAML_QuotesYAMLReservedScalarsAsName — table-driven case across null / true / false / numeric / yes / normal name; for each, writes Chart.yaml and verifies yaml.Unmarshal round-trips name back as the expected string. - TestHelmTemplate_RendersWithSetRepoURL — updated to exercise the new contract: --set repoURL=oci://reg/org (parent namespace only). Asserts the parent App's repoURL equals the parent namespace and the child App's repoURL equals parent-namespace + /aicr-bundle. - TestGenerate_CustomChartName and the existing Chart.yaml-version assertion updated to expect quoted scalars. - Golden templates and Chart.yaml fixtures regenerated. Closes NVIDIA#1034 Related: NVIDIA#1032 (the contract change this PR completes), NVIDIA#1019, NVIDIA#965
The argocd-helm-oci wrapper script was passing the FULL bundle URL to `helm install --set repoURL=…` (including the per-recipe chart name at the end). That matched the pre-PR-NVIDIA#1032 contract where the parent Application's `source.chart` was hardcoded to `aicr-bundle`. PR NVIDIA#1032 (and NVIDIA#1035's reinforcement) changed the parent App template to expect the parent-namespace-only repoURL and to append .Chart.Name itself via the separate `source.chart` field. The wrapper script wasn't updated to match. Result on every PR with argocd-helm-oci Tier-1 KWOK coverage: the parent App resolves to `oci://registry.aicr-registry.svc.cluster.local:5000/aicr/<recipe>/<recipe>:<tag>`, the OCI artifact lookup 404s, gpu-operator-post's Application can never sync, and the whole stack times out on `GitOps sync timeout strike 1/3`. The failure was masked on `main` because the most-recent KWOK Cluster Validation run on `main` (#26469449378 at 0d3e62d, success) ran *before* PR NVIDIA#1035 merged. After NVIDIA#1035 / NVIDIA#1036 / NVIDIA#1038 all landed on main, no fresh KWOK run has triggered on `main` yet — but the next one will fail the same way every open PR's argocd-helm-oci Tier-1 jobs are currently failing. Fix is a one-line drop of the per-recipe suffix from OCI_IN_CLUSTER_REF in the argocd-helm-oci branch of generate_bundle. The flux branch keeps the per-recipe suffix because flux's OCIRepository CR consumes the FULL artifact URL (recipe segment included). Updated the surrounding comment to point at the post-NVIDIA#1032 contract so the next reader understands the asymmetry. End-to-end check (verified from PR NVIDIA#1030's debug artifact at b3f2296): repo-server log shows `registry.aicr-registry.svc.cluster.local:5000/aicr/<recipe>/<recipe>:<tag>: not found`, caused by the same double-append. With the recipe suffix dropped, Argo's resolution `<repoURL>/<chart>:<tag>` aligns with the pushed artifact at `oci://…/aicr/<recipe>:<tag>`. Refs PR NVIDIA#1030 (where this surfaced), PR NVIDIA#1032 (contract change), PR NVIDIA#1035 (parent App template enforcement).
Codex's P2 note on the previous wrapper-script fix: dropping the
per-recipe suffix from OCI_IN_CLUSTER_REF unconditionally broke the
argocd-oci lane. The two argocd-* deployers consume that value
differently:
- argocd-oci bakes it into each flat Application via `--repo`. The
rendered Applications use `spec.source.repoURL` as the FULL OCI
artifact location and never append .Chart.Name. They need the
per-recipe suffix in the value.
- argocd-helm-oci passes it through to `helm install --set repoURL`
at install time. The wrapper chart's parent Application appends
.Chart.Name via its separate `source.chart` field, and the
path-based child template appends /{{ .Chart.Name }} as a string,
so both halves of the bundle would double-resolve the recipe
segment if it were in the value already.
Fix: keep OCI_IN_CLUSTER_REF as the FULL artifact URL (its
generate_bundle-side contract, used by argocd-oci) and strip the
trailing "/<recipe>" only at the argocd-helm-oci helm-install site
via parameter expansion (helm_repo_url="${OCI_IN_CLUSTER_REF%/*}").
Documented the asymmetry inline at both sites so the next reader sees
why the two consumers disagree.
Refs PR NVIDIA#1030, PR NVIDIA#1032 (contract change), PR NVIDIA#1035 (parent App
template enforcement). Restores argocd-oci correctness while keeping
the post-NVIDIA#1032 contract fix for argocd-helm-oci.
This reverts commit 7ed2bde and the prior 5620000. Codex's second pass on the wrapper-script change pointed out the strip is unsafe on this branch: the argocd-helm wrapper chart here still uses the pre-NVIDIA#1032 contract (parent App's `chart: aicr-bundle` is hardcoded; path-based children use .Values.repoURL directly). With the strip, the parent App would resolve to `oci://.../aicr/aicr-bundle:tag` while the script pushes to `oci://.../aicr/<recipe>:tag` — wrong. The original Tier-1 argocd-helm-oci failures still need root-causing, but a different change in this PR — not the wrapper-script repoURL — must be responsible. Reverting both wrapper-script commits restores the same OCI_IN_CLUSTER_REF semantics main has been running with, and unblocks separate diagnosis without dragging the wrapper script along. Refs PR NVIDIA#1030.
The argocd-helm-oci wrapper script was passing the FULL bundle URL to `helm install --set repoURL=…` (including the per-recipe chart name at the end). That matched the pre-PR-NVIDIA#1032 contract where the parent Application's `source.chart` was hardcoded to `aicr-bundle`. PR NVIDIA#1032 (and NVIDIA#1035's reinforcement) changed the parent App template to expect the parent-namespace-only repoURL and to append .Chart.Name itself via the separate `source.chart` field. The wrapper script wasn't updated to match. Result on every PR with argocd-helm-oci Tier-1 KWOK coverage: the parent App resolves to `oci://registry.aicr-registry.svc.cluster.local:5000/aicr/<recipe>/<recipe>:<tag>`, the OCI artifact lookup 404s, gpu-operator-post's Application can never sync, and the whole stack times out on `GitOps sync timeout strike 1/3`. The failure was masked on `main` because the most-recent KWOK Cluster Validation run on `main` (#26469449378 at 0d3e62d, success) ran *before* PR NVIDIA#1035 merged. After NVIDIA#1035 / NVIDIA#1036 / NVIDIA#1038 all landed on main, no fresh KWOK run has triggered on `main` yet — but the next one will fail the same way every open PR's argocd-helm-oci Tier-1 jobs are currently failing. Fix is a one-line drop of the per-recipe suffix from OCI_IN_CLUSTER_REF in the argocd-helm-oci branch of generate_bundle. The flux branch keeps the per-recipe suffix because flux's OCIRepository CR consumes the FULL artifact URL (recipe segment included). Updated the surrounding comment to point at the post-NVIDIA#1032 contract so the next reader understands the asymmetry. End-to-end check (verified from PR NVIDIA#1030's debug artifact at b3f2296): repo-server log shows `registry.aicr-registry.svc.cluster.local:5000/aicr/<recipe>/<recipe>:<tag>: not found`, caused by the same double-append. With the recipe suffix dropped, Argo's resolution `<repoURL>/<chart>:<tag>` aligns with the pushed artifact at `oci://…/aicr/<recipe>:<tag>`. Refs PR NVIDIA#1030 (where this surfaced), PR NVIDIA#1032 (contract change), PR NVIDIA#1035 (parent App template enforcement).
Summary
Argo CD–Helm bundles published to a non-default OCI artifact name now generate a parent Application whose
source.chartmatches the actual artifact, instead of the hardcodedaicr-bundleliteral.Motivation / Context
When a user runs
aicr bundle --deployer argocd-helm --output oci://<reg>/<org>/<custom-name>:<tag>, the bundler pushes to<reg>/<org>/<custom-name>:<tag>but the generated parentaicr-stackApplication hardcodedchart: aicr-bundle. ArgoCD then tries to resolverepoURL/chart:tagagainst an artifact that does not exist, surfacing asfailed to load target state … not found. The README also instructed users to pass--set repoURL=...with the chart segment included, which would double-suffix the OCI reference.Fixes: #1019
Type of Change
Component(s) Affected
cmd/aicr,pkg/cli)pkg/bundler,pkg/component/*)Implementation Notes
oci.Reference.ChartName()returns the last path segment ofRepository(e.g.,oci://ghcr.io/org/my-bundle:v1→my-bundle).bundleCmdOptions.bundleChartName, thenconfig.WithBundleChartName(...), thenargocdhelm.Generator.ChartName.Chart.yamlnow writesname: <chartName>. The parent Application template uses{{ .Chart.Name | quote }}forsource.chartso the chart name is sourced from the rendered chart at install time — no double source of truth.--output oci://) keeps the existingaicr-bundledefault, preserving the current behavior for that path.requireddirective and README updated to clarify that--set repoURLmust be the parent namespace (no chart suffix); the parent App appends.Chart.Namewhen assembling the OCI reference.Testing
make qualify # tests + lint + e2e + scan + license headers — all passedCoverage deltas on touched packages:
pkg/bundler/deployer/argocdhelm: 81.0% → 81.6% (+0.6%)pkg/oci: 70.1% → 70.3% (+0.2%)pkg/bundler/config: 97.6% → 97.5% (-0.1%)pkg/cli: 64.8% → 64.9% (+0.1%)New tests:
Reference_ChartName— last-segment derivation across single/multi-segment OCI repositoriesWithBundleChartName— config option round-tripParseBundleCmdOptions_OCIChartNameDerivation— CLI flow for OCI and local-dir outputGenerate_CustomChartName— livehelm templateverifies the rendered parent Application'ssource.chartmatches the custom nameRisk Assessment
Rollout notes: No migration required. Users currently working around the bug by manually patching
aicr-stackafter generation can drop that step.Checklist
make testwith-race)make lint)git commit -S)