Skip to content

fix(bundler): derive argocd-helm chart name from OCI artifact path#1032

Merged
mchmarny merged 1 commit into
mainfrom
fix/argocd-helm-chart-name-1019
May 26, 2026
Merged

fix(bundler): derive argocd-helm chart name from OCI artifact path#1032
mchmarny merged 1 commit into
mainfrom
fix/argocd-helm-chart-name-1019

Conversation

@mchmarny

Copy link
Copy Markdown
Member

Summary

Argo CD–Helm bundles published to a non-default OCI artifact name now generate a parent Application whose source.chart matches the actual artifact, instead of the hardcoded aicr-bundle literal.

Motivation / Context

When a user runs aicr bundle --deployer argocd-helm --output oci://<reg>/<org>/<custom-name>:<tag>, the bundler pushes to <reg>/<org>/<custom-name>:<tag> but the generated parent aicr-stack Application hardcoded chart: aicr-bundle. ArgoCD then tries to resolve repoURL/chart:tag against an artifact that does not exist, surfacing as failed to load target state … not found. The README also instructed users to pass --set repoURL=... with the chart segment included, which would double-suffix the OCI reference.

Fixes: #1019

Type of Change

  • Bug fix (non-breaking change that fixes an issue)

Component(s) Affected

  • CLI (cmd/aicr, pkg/cli)
  • Bundlers (pkg/bundler, pkg/component/*)

Implementation Notes

  • oci.Reference.ChartName() returns the last path segment of Repository (e.g., oci://ghcr.io/org/my-bundle:v1my-bundle).
  • CLI flows that into bundleCmdOptions.bundleChartName, then config.WithBundleChartName(...), then argocdhelm.Generator.ChartName.
  • Chart.yaml now writes name: <chartName>. The parent Application template uses {{ .Chart.Name | quote }} for source.chart so the chart name is sourced from the rendered chart at install time — no double source of truth.
  • Local-directory output (no --output oci://) keeps the existing aicr-bundle default, preserving the current behavior for that path.
  • required directive and README updated to clarify that --set repoURL must be the parent namespace (no chart suffix); the parent App appends .Chart.Name when assembling the OCI reference.

Testing

make qualify   # tests + lint + e2e + scan + license headers — all passed

Coverage deltas on touched packages:

  • pkg/bundler/deployer/argocdhelm: 81.0% → 81.6% (+0.6%)
  • pkg/oci: 70.1% → 70.3% (+0.2%)
  • pkg/bundler/config: 97.6% → 97.5% (-0.1%)
  • pkg/cli: 64.8% → 64.9% (+0.1%)

New tests:

  • Reference_ChartName — last-segment derivation across single/multi-segment OCI repositories
  • WithBundleChartName — config option round-trip
  • ParseBundleCmdOptions_OCIChartNameDerivation — CLI flow for OCI and local-dir output
  • Generate_CustomChartName — live helm template verifies the rendered parent Application's source.chart matches the custom name

Risk Assessment

  • Low — Isolated change, fully unit-tested, easy to revert. Default behavior for local-directory output is unchanged.

Rollout notes: No migration required. Users currently working around the bug by manually patching aicr-stack after generation can drop that step.

Checklist

  • Tests pass locally (make test with -race)
  • Linter passes (make lint)
  • I did not skip/disable tests to make CI green
  • I added/updated tests for new functionality
  • I updated docs if user-facing behavior changed (README emitted into the bundle was corrected)
  • Changes follow existing patterns in the codebase
  • Commits are cryptographically signed (git commit -S)

The argocd-helm deployer hardcoded `aicr-bundle` as the Chart.yaml name
and parent Application's `source.chart`, so any user who published the
bundle under a non-default OCI artifact name hit a 404 from ArgoCD when
the parent App tried to assemble `repoURL/chart:targetRevision`.

Fix flows the OCI artifact's last path segment (Reference.ChartName())
through bundler config into argocdhelm.Generator.ChartName, which now
parameterizes both Chart.yaml and the parent Application template (the
latter via `{{ .Chart.Name }}` so the chart name is sourced from the
rendered chart at install time). Local-directory output keeps the
"aicr-bundle" default.

Also corrects the README and `required` directive guidance: --set
repoURL should be the parent namespace (no chart suffix), since the
parent App appends .Chart.Name when assembling the OCI reference.
Previous text instructed users to include the chart segment, which
would double-suffix and 404.

Fixes #1019
@coderabbitai

coderabbitai Bot commented May 26, 2026

Copy link
Copy Markdown

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: efef7194-efc5-4efb-99f4-703a1e339bcd

📥 Commits

Reviewing files that changed from the base of the PR and between c210874 and 15f48e4.

📒 Files selected for processing (13)
  • pkg/bundler/bundler.go
  • pkg/bundler/config/config.go
  • pkg/bundler/config/config_test.go
  • pkg/bundler/deployer/argocdhelm/argocdhelm.go
  • pkg/bundler/deployer/argocdhelm/argocdhelm_test.go
  • pkg/bundler/deployer/argocdhelm/testdata/helm_and_manifest_only/templates/aicr-stack.yaml
  • pkg/bundler/deployer/argocdhelm/testdata/helm_and_manifest_only/templates/nodewright-customizations.yaml
  • pkg/bundler/deployer/argocdhelm/testdata/mixed_component/templates/aicr-stack.yaml
  • pkg/bundler/deployer/argocdhelm/testdata/mixed_component/templates/gpu-operator-post.yaml
  • pkg/cli/bundle.go
  • pkg/cli/bundle_test.go
  • pkg/oci/reference.go
  • pkg/oci/reference_test.go

📝 Walkthrough

Walkthrough

This PR fixes a bug where the argocd-helm deployer hardcodes the parent chart name as aicr-bundle instead of using the actual OCI artifact name specified by users. The solution threads a configurable chart name through the entire system: a new ChartName() method derives the name from OCI repository paths; CLI parsing extracts and passes it through config; the bundler wires it to the Generator; and the Generator uses it dynamically in Chart.yaml, helm install examples, and rendered Application templates, replacing all hardcoded references.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested labels

size/L, area/bundler

Suggested reviewers

  • dims
🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly describes the main change: deriving the argocd-helm chart name from the OCI artifact path, which matches the primary objective of fixing the hardcoded 'aicr-bundle' chart name issue.
Description check ✅ Passed The description comprehensively explains the bug, motivation, implementation approach, and testing, directly relating to the changeset and addressing the linked issue #1019.
Linked Issues check ✅ Passed The changes fully address issue #1019 by: (1) implementing ChartName() on oci.Reference to derive chart name from OCI path [pkg/oci/reference.go], (2) flowing it through CLI options into config and generator [pkg/cli/bundle.go, pkg/bundler/config/config.go], (3) using it in templates and Chart.yaml [argocdhelm.go, test data templates], and (4) adding comprehensive tests covering the entire flow.
Out of Scope Changes check ✅ Passed All changes are scoped to the argocd-helm deployer's chart-name derivation logic and related test updates. No extraneous modifications to unrelated components were introduced.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/argocd-helm-chart-name-1019

Comment @coderabbitai help to get the list of available commands and usage tips.

@mchmarny mchmarny self-assigned this May 26, 2026
@github-actions

Copy link
Copy Markdown
Contributor

Coverage Report ✅

Metric Value
Coverage 76.9%
Threshold 75%
Status Pass
Coverage Badge
![Coverage](https://img.shields.io/badge/coverage-76.9%25-green)

Merging this branch will increase overall coverage

Impacted Packages Coverage Δ 🤖
github.com/NVIDIA/aicr/pkg/bundler 67.84% (ø)
github.com/NVIDIA/aicr/pkg/bundler/config 97.60% (+0.04%) 👍
github.com/NVIDIA/aicr/pkg/bundler/deployer/argocdhelm 81.62% (+0.67%) 👍
github.com/NVIDIA/aicr/pkg/cli 64.98% (+0.03%) 👍
github.com/NVIDIA/aicr/pkg/oci 70.29% (+0.15%) 👍

Coverage by file

Changed files (no unit tests)

Changed File Coverage Δ Total Covered Missed 🤖
github.com/NVIDIA/aicr/pkg/bundler/bundler.go 66.09% (ø) 519 343 176
github.com/NVIDIA/aicr/pkg/bundler/config/config.go 97.19% (+0.05%) 178 (+3) 173 (+3) 5 👍
github.com/NVIDIA/aicr/pkg/bundler/deployer/argocdhelm/argocdhelm.go 81.62% (+0.67%) 370 (+13) 302 (+13) 68 👍
github.com/NVIDIA/aicr/pkg/cli/bundle.go 44.72% (+0.35%) 161 (+1) 72 (+1) 89 👍
github.com/NVIDIA/aicr/pkg/oci/reference.go 54.39% (+3.41%) 57 (+6) 31 (+5) 26 (+1) 👍

Please note that the "Total", "Covered", and "Missed" counts above refer to code statements instead of lines of code. The value in brackets refers to the test coverage of that file in the old version of the code.

@mchmarny mchmarny enabled auto-merge (squash) May 26, 2026 18:08
@mchmarny mchmarny merged commit 51eb4b5 into main May 26, 2026
33 checks passed
@mchmarny mchmarny deleted the fix/argocd-helm-chart-name-1019 branch May 26, 2026 18:17
yuanchen8911 added a commit to yuanchen8911/aicr that referenced this pull request May 26, 2026
…ote chart name (NVIDIA#1034)

Two argocd-helm OCI publishing bugs surfaced by Codex review against
the post-NVIDIA#1032 contract.

P1 — Path-based child Applications resolve to a non-existent artifact

NVIDIA#1032 changed the --set repoURL contract so callers pass only the
parent namespace (e.g., oci://reg/org); the parent Application
appends .Chart.Name via its separate source.chart field. The
parent-App template at parentAppTemplate / line 407 implements this
correctly, but the path-based child-App template in
injectValuesIntoSingleSource at line 703 was left emitting only
.Values.repoURL. Argo CD's generic OCI source (used by path-based
children since they have no source.chart) treats spec.source.repoURL
as the full artifact reference and resolves it directly, so under
the new contract a child source pointed at oci://reg/org:tag — an
artifact that doesn't exist — and the child Application failed to
sync.

Fix: append /{{ .Chart.Name }} to the rendered repoURL value so the
assembled URL matches the artifact the parent App's repoURL/chart:tag
triple resolves to. The error-message text in the required directive
is updated to say "this template appends .Chart.Name" (the path-based
template is doing the appending, not the parent App).

P2 — Unquoted name and version in generated Chart.yaml

writeChartYAML emitted "name: %s" / "version: %s" with raw
fmt.Fprintf. (*Reference).ChartName() returns path.Base(Repository),
so a valid OCI artifact path whose last segment is a YAML reserved
scalar — "null", "true", "false", "yes", "no", numeric strings —
produced an unquoted YAML reserved word as the chart's name. Helm's
loader then parsed name: null as YAML null, chart.Metadata.Name
became empty, and helm package / helm push rejected the chart with
"chart.metadata.name is required". Same trap for the version field
("1.0" parses as float).

Fix: emit both via %q so the values round-trip as YAML strings. OCI
artifact path segments are constrained to printable ASCII by the
docker reference grammar, which is the safe charset for Go's %q.

Tests

- TestInjectValuesIntoSingleSource_AppendsChartName — asserts the
  rendered path-based child source.repoURL appends /{{ .Chart.Name }}
  after the user-supplied .Values.repoURL.
- TestWriteChartYAML_QuotesYAMLReservedScalarsAsName — table-driven
  case across null / true / false / numeric / yes / normal name; for
  each, writes Chart.yaml and verifies yaml.Unmarshal round-trips
  name back as the expected string.
- TestHelmTemplate_RendersWithSetRepoURL — updated to exercise the
  new contract: --set repoURL=oci://reg/org (parent namespace only).
  Asserts the parent App's repoURL equals the parent namespace and
  the child App's repoURL equals parent-namespace + /aicr-bundle.
- TestGenerate_CustomChartName and the existing Chart.yaml-version
  assertion updated to expect quoted scalars.
- Golden templates and Chart.yaml fixtures regenerated.

Closes NVIDIA#1034
Related: NVIDIA#1032 (the contract change this PR completes), NVIDIA#1019, NVIDIA#965
yuanchen8911 added a commit to yuanchen8911/aicr that referenced this pull request May 26, 2026
The argocd-helm-oci wrapper script was passing the FULL bundle URL to
`helm install --set repoURL=…` (including the per-recipe chart name
at the end). That matched the pre-PR-NVIDIA#1032 contract where the parent
Application's `source.chart` was hardcoded to `aicr-bundle`.

PR NVIDIA#1032 (and NVIDIA#1035's reinforcement) changed the parent App template
to expect the parent-namespace-only repoURL and to append .Chart.Name
itself via the separate `source.chart` field. The wrapper script
wasn't updated to match. Result on every PR with argocd-helm-oci
Tier-1 KWOK coverage: the parent App resolves to
`oci://registry.aicr-registry.svc.cluster.local:5000/aicr/<recipe>/<recipe>:<tag>`,
the OCI artifact lookup 404s, gpu-operator-post's Application can
never sync, and the whole stack times out on
`GitOps sync timeout strike 1/3`.

The failure was masked on `main` because the most-recent KWOK
Cluster Validation run on `main` (#26469449378 at 0d3e62d, success)
ran *before* PR NVIDIA#1035 merged. After NVIDIA#1035 / NVIDIA#1036 / NVIDIA#1038 all landed
on main, no fresh KWOK run has triggered on `main` yet — but the
next one will fail the same way every open PR's argocd-helm-oci
Tier-1 jobs are currently failing.

Fix is a one-line drop of the per-recipe suffix from
OCI_IN_CLUSTER_REF in the argocd-helm-oci branch of generate_bundle.
The flux branch keeps the per-recipe suffix because flux's
OCIRepository CR consumes the FULL artifact URL (recipe segment
included). Updated the surrounding comment to point at the
post-NVIDIA#1032 contract so the next reader understands the asymmetry.

End-to-end check (verified from PR NVIDIA#1030's debug artifact at
b3f2296): repo-server log shows
`registry.aicr-registry.svc.cluster.local:5000/aicr/<recipe>/<recipe>:<tag>: not found`,
caused by the same double-append. With the recipe suffix dropped,
Argo's resolution `<repoURL>/<chart>:<tag>` aligns with the pushed
artifact at `oci://…/aicr/<recipe>:<tag>`.

Refs PR NVIDIA#1030 (where this surfaced), PR NVIDIA#1032 (contract change),
PR NVIDIA#1035 (parent App template enforcement).
yuanchen8911 added a commit to yuanchen8911/aicr that referenced this pull request May 26, 2026
Codex's P2 note on the previous wrapper-script fix: dropping the
per-recipe suffix from OCI_IN_CLUSTER_REF unconditionally broke the
argocd-oci lane. The two argocd-* deployers consume that value
differently:

  - argocd-oci bakes it into each flat Application via `--repo`. The
    rendered Applications use `spec.source.repoURL` as the FULL OCI
    artifact location and never append .Chart.Name. They need the
    per-recipe suffix in the value.
  - argocd-helm-oci passes it through to `helm install --set repoURL`
    at install time. The wrapper chart's parent Application appends
    .Chart.Name via its separate `source.chart` field, and the
    path-based child template appends /{{ .Chart.Name }} as a string,
    so both halves of the bundle would double-resolve the recipe
    segment if it were in the value already.

Fix: keep OCI_IN_CLUSTER_REF as the FULL artifact URL (its
generate_bundle-side contract, used by argocd-oci) and strip the
trailing "/<recipe>" only at the argocd-helm-oci helm-install site
via parameter expansion (helm_repo_url="${OCI_IN_CLUSTER_REF%/*}").
Documented the asymmetry inline at both sites so the next reader sees
why the two consumers disagree.

Refs PR NVIDIA#1030, PR NVIDIA#1032 (contract change), PR NVIDIA#1035 (parent App
template enforcement). Restores argocd-oci correctness while keeping
the post-NVIDIA#1032 contract fix for argocd-helm-oci.
yuanchen8911 added a commit to yuanchen8911/aicr that referenced this pull request May 26, 2026
This reverts commit 7ed2bde and the prior 5620000. Codex's second
pass on the wrapper-script change pointed out the strip is unsafe on
this branch: the argocd-helm wrapper chart here still uses the
pre-NVIDIA#1032 contract (parent App's `chart: aicr-bundle` is hardcoded;
path-based children use .Values.repoURL directly). With the strip,
the parent App would resolve to `oci://.../aicr/aicr-bundle:tag`
while the script pushes to `oci://.../aicr/<recipe>:tag` — wrong.

The original Tier-1 argocd-helm-oci failures still need root-causing,
but a different change in this PR — not the wrapper-script repoURL —
must be responsible. Reverting both wrapper-script commits restores
the same OCI_IN_CLUSTER_REF semantics main has been running with,
and unblocks separate diagnosis without dragging the wrapper script
along.

Refs PR NVIDIA#1030.
yuanchen8911 added a commit to yuanchen8911/aicr that referenced this pull request May 26, 2026
The argocd-helm-oci wrapper script was passing the FULL bundle URL to
`helm install --set repoURL=…` (including the per-recipe chart name
at the end). That matched the pre-PR-NVIDIA#1032 contract where the parent
Application's `source.chart` was hardcoded to `aicr-bundle`.

PR NVIDIA#1032 (and NVIDIA#1035's reinforcement) changed the parent App template
to expect the parent-namespace-only repoURL and to append .Chart.Name
itself via the separate `source.chart` field. The wrapper script
wasn't updated to match. Result on every PR with argocd-helm-oci
Tier-1 KWOK coverage: the parent App resolves to
`oci://registry.aicr-registry.svc.cluster.local:5000/aicr/<recipe>/<recipe>:<tag>`,
the OCI artifact lookup 404s, gpu-operator-post's Application can
never sync, and the whole stack times out on
`GitOps sync timeout strike 1/3`.

The failure was masked on `main` because the most-recent KWOK
Cluster Validation run on `main` (#26469449378 at 0d3e62d, success)
ran *before* PR NVIDIA#1035 merged. After NVIDIA#1035 / NVIDIA#1036 / NVIDIA#1038 all landed
on main, no fresh KWOK run has triggered on `main` yet — but the
next one will fail the same way every open PR's argocd-helm-oci
Tier-1 jobs are currently failing.

Fix is a one-line drop of the per-recipe suffix from
OCI_IN_CLUSTER_REF in the argocd-helm-oci branch of generate_bundle.
The flux branch keeps the per-recipe suffix because flux's
OCIRepository CR consumes the FULL artifact URL (recipe segment
included). Updated the surrounding comment to point at the
post-NVIDIA#1032 contract so the next reader understands the asymmetry.

End-to-end check (verified from PR NVIDIA#1030's debug artifact at
b3f2296): repo-server log shows
`registry.aicr-registry.svc.cluster.local:5000/aicr/<recipe>/<recipe>:<tag>: not found`,
caused by the same double-append. With the recipe suffix dropped,
Argo's resolution `<repoURL>/<chart>:<tag>` aligns with the pushed
artifact at `oci://…/aicr/<recipe>:<tag>`.

Refs PR NVIDIA#1030 (where this surfaced), PR NVIDIA#1032 (contract change),
PR NVIDIA#1035 (parent App template enforcement).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

bug(bundler): argocd-helm hardcodes parent chart name as aicr-bundle instead of using the published OCI artifact name

2 participants