Skip to content

feat(bundler): add helmfile deployer (#632)#899

Merged
mchmarny merged 1 commit into
mainfrom
feat/helmfile
May 14, 2026
Merged

feat(bundler): add helmfile deployer (#632)#899
mchmarny merged 1 commit into
mainfrom
feat/helmfile

Conversation

@lockwobr

@lockwobr lockwobr commented May 14, 2026

Copy link
Copy Markdown
Contributor

Summary

Adds --deployer helmfile alongside helm / argocd / argocd-helm / flux. The bundler emits a declarative helmfile.yaml release graph + per-component chart dirs (via the shared localformat writer); operators drive rollouts with the upstream helmfile CLI (apply / diff / destroy). No bash wrapper, no pre-flight, no scripted post-install — AICR's charter is to generate validated configuration, not orchestrate deployments.

Motivation / Context

deploy.sh.tmpl (~450 lines) and undeploy.sh.tmpl (~700 lines) have accreted finalizer-handling, stale-hook cleanup, webhook-orphan detection, and per-operator post-install logic. Seven PRs on undeploy.sh.tmpl alone have chased finalizer edge cases (#253, #282, #364, #416, #477, #561, #602). That's orchestration logic, and it belongs in a tool purpose-built or it — not in a bash template AICR maintains.

This change is additive: the helm deployer stays in place; the two coexist indefinitely. If the helmfile path is well received it opens a future conversation about simplifying the bash deployer, but that's a separate decision for a later issue.

Beyond scope alignment, users pick up helmfile diff (no AICR deployer offers preview today), idempotent apply/destroy, a reviewable release graph (needs: in YAML instead of bash iteration order).

closes #632

Type of Change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation update
  • Refactoring (no functional changes)
  • Build/CI/tooling

Component(s) Affected

  • CLI (cmd/aicr, pkg/cli)
  • API server (cmd/aicrd, pkg/api, pkg/server)
  • Recipe engine / data (pkg/recipe)
  • Bundlers (pkg/bundler, pkg/component/*)
  • Collectors / snapshotter (pkg/collector, pkg/snapshotter)
  • Validator (pkg/validator)
  • Core libraries (pkg/errors, pkg/k8s)
  • Docs/examples (docs/, examples/)
  • Other: ____________

Implementation Notes

Mapping from AICR concepts to helmfile primitives:

AICR concept Helmfile mapping
DeploymentOrder release.needs (immediate predecessor)
Chart repo (HTTP/OCI) repositories[], deduped by URL
DynamicValues[name] cluster-values.yaml appended to release values:
Global HELM_TIMEOUT="10m" helmDefaults.timeout: 600
ASYNC_COMPONENTS (kai-scheduler) per-release wait: false
kai-scheduler → 20m per-release timeout: 1200
Manifest-only / Kustomize / mixed-post chart: ./NNN-name (localformat wraps as KindLocalHelm)
VendorCharts localformat pulls tarball into ./NNN-name/charts/; release still chart: ./NNN-name

Reuses existing infrastructure:

  • localformat.Write() emits the uniform NNN-<name>/ per-component layout. The helmfile deployer doesn't re-classify components; it walks the returned Folder list and emits one release per folder.
  • deployer.SortComponentRefsByDeploymentOrder for ordering.
    | ASYNC_COMPONENTS (kai-scheduler) | per-release wait: false |
    | kai-scheduler → 20m | per-release timeout: 1200 |

Non-goals (per #632):

  • No deploy.sh / undeploy.sh emitted alongside helmfile.yaml.
  • No prepare: / presync: / postsync: / postuninstall: hooks emitted.
  • No replication of today's undeploy.sh CRD/finalizer/webhook cleanup — helmfile destroy stays the simple graceful-uninstall primitive.

Testing

# Commands run (prefer `make qualify` for non-trivial changes)
make qualify

Risk Assessment

  • Low — Isolated change, well-tested, easy to revert
  • Medium — Touches multiple components or has broader impact
  • High — Breaking change, affects critical paths, or complex rollout

Rollout notes:

Checklist

  • Tests pass locally (make test with -race)
  • Linter passes (make lint)
  • I did not skip/disable tests to make CI green
  • I added/updated tests for new functionality
  • I updated docs if user-facing behavior changed
  • Changes follow existing patterns in the codebase
  • Commits are cryptographically signed (git commit -S) — GPG signing info

@github-actions

Copy link
Copy Markdown
Contributor

@coderabbitai

coderabbitai Bot commented May 14, 2026

Copy link
Copy Markdown

Review Change Stack

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds a new helmfile deployer end-to-end: new DeployerHelmfile config and tests; Helmfile data model and release-building helpers; exported Generator that writes per-component chart folders, helmfile.yaml, README, optional provenance/checksums, and data files; bundler wiring and CLI/OpenAPI updates; comprehensive unit/golden tests and a Chainsaw e2e test; embedded README template and multiple test fixtures; CI/action inputs and setup steps to install/verify a pinned helmfile binary; .settings.yaml pin and checksum tooling plus a script and Renovate rule to refresh checksums.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • NVIDIA/aicr#817: Related deployer enum and bundler integration changes that touch the same bundler/API wiring.
  • NVIDIA/aicr#856: Introduces pre/post manifest phase plumbing used by the helmfile generator.

Suggested labels

enhancement, documentation

Suggested reviewers

  • mchmarny
  • pdmack
🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The PR title clearly summarizes the main change: adding a helmfile deployer to the bundler, directly matching the changeset's primary objective.
Linked Issues check ✅ Passed The code changes fully implement the requirements from issue #632: the helmfile deployer is added alongside existing deployers [#632], AICR config/API support for helmfile is extended [#632], documentation is updated [#632], and test coverage validates the implementation [#632].
Out of Scope Changes check ✅ Passed All changes are directly scoped to implementing the helmfile deployer feature per #632. No unrelated refactoring, unrelated tool updates, or scope creep detected; build system, GitHub Actions, and tool-setup changes support the helmfile runtime dependency and CI integration.
Description check ✅ Passed The pull request description is directly related to the changeset, clearly describing the addition of a new helmfile deployer option alongside existing deployers.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/helmfile

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 8

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/contributor/api-server.md`:
- Line 26: The docs line describing value override support for the /v1/bundle
endpoint currently lists deployers as "helm, argocd-helm, flux, and helmfile"
but omits "argocd"; update the sentence referencing `/v1/bundle` and the query
params `?set=bundler:path=value` and `?dynamic=component:path` to include
"argocd" in the supported deployer list so it reads e.g. "helm, argocd,
argocd-helm, flux, and helmfile".

In `@docs/README.md`:
- Around line 86-87: Update the "Deployer Options" and "How it helps" text to
distinguish outputs per deployer: keep the list under "Deployer Options"
(`helm`, `argocd`, `argocd-helm`, `flux`, `helmfile`) but change the "How it
helps" paragraph so it states which deployers produce ready-to-run scripts or
manifests (e.g., `helm` generates a custom install.sh that pre-validates the
environment), while others (e.g., `argocd`, `flux`, `argocd-helm`) produce
Application/HelmRelease manifests and `helmfile` produces a declarative
helmfile-driven release graph — do not imply `helmfile` is script-driven and
avoid promising future features.

In `@docs/user/cli-reference.md`:
- Around line 1458-1463: The release graph example in helmfile.yaml uses the
wrong path pattern for per-release value files; update the example entries that
show values: [./NNN/values.yaml, ./NNN/cluster-values.yaml] so they match the
generated layout (use the NNN-<component> directory naming), e.g. reference
values: [./NNN-<component>/values.yaml, ./NNN-<component>/cluster-values.yaml]
so users edit the correct cluster-values.yaml per component before running
helmfile apply; adjust the example tree and any accompanying text to reflect the
NNN-<component>/ directory naming consistently.

In `@pkg/bundler/deployer/helmfile/helmfile_test.go`:
- Around line 161-163: Tests are currently matching error text substrings
instead of using errors.Is with structured error codes; update the production
errors so tests can use errors.Is: in the Generate() error paths in helmfile.go
(where the code comments reference wrapping lines 114 and 117) return pkg/errors
structured errors using errors.New or errors.Wrap that include StructuredError
with Code set to errors.ErrCodeInvalidRequest and errors.ErrCodeTimeout
respectively, and in releases.go replace fmt.Errorf calls (around the
buildHelmfile() error paths) with pkg/errors errors.Wrap or errors.New so they
produce structured errors; after that, update the tests to assert with
errors.Is(err, &errors.StructuredError{Code: errors.ErrCodeInvalidRequest}) or
errors.ErrCodeTimeout as appropriate.

In `@pkg/bundler/deployer/helmfile/releases.go`:
- Around line 79-86: The componentOverrides map duplicates special-case logic
from the helm deployer template
(pkg/bundler/deployer/helm/templates/deploy.sh.tmpl) and risks drift; replace
the hardcoded map with a single source of truth or add an automated consistency
check: either move the overrides into a shared constant/registry consumed by
both the Go code (componentOverrides / type overrides) and the template, or add
a unit/integration test that loads deploy.sh.tmpl and verifies ASYNC_COMPONENTS
/ COMPONENT_HELM_TIMEOUT entries match the componentOverrides map (e.g.,
validate "kai-scheduler" wait/timeout), ensuring changes to one place fail CI
until synchronized.
- Around line 106-165: Replace the two fmt.Errorf usages inside buildHelmfile
with the project's pkg/errors structured errors: when f.Upstream is nil
(currently "folder %s is KindUpstreamHelm but Upstream is nil") and when
encountering an unsupported folder kind (currently "unsupported folder kind %v
for %s"), construct and return a pkg/errors error value using the appropriate
error code/constructor from pkg/errors (e.g., errors.E with a suitable sentinel
like errors.Invalid or the project's defined code) and include the folder
details in the message; update the imports to use pkg/errors and remove or keep
fmt only if still needed elsewhere.

In `@tests/chainsaw/cli/bundle-helmfile/chainsaw-test.yaml`:
- Around line 129-132: The loop currently skips validation for the first release
(when idx == 1) which lets a regression add a needs: to release[0]; change the
branch so that when idx == 1 you explicitly assert the first release has no
needs entry (using the same variables used in the loop such as idx, prev, and
name) and fail the script if a needs field is present instead of just
continuing; keep the existing behavior of setting prev="$name" afterwards if the
assertion passes.

In `@tools/setup-tools`:
- Around line 410-416: Add SHA256 verification for downloaded helmfile
artifacts: extend .settings.yaml with a testing_tools.helmfile_checksums section
mirroring chainsaw_checksums, add a new helper script
tools/update-helmfile-checksums that fetches the GitHub release checksums
(modeled on tools/update-chainsaw-checksums), and register that script in
Renovate postUpgradeTasks so checksum entries auto-update on version bumps. In
the installer flow (function/section using verify_download_url and variables
HELMFILE_URL, HELMFILE_TMP, HELMFILE_TAR, HELMFILE_OS, HELMFILE_ARCH) read
HELMFILE_SHA256 from .settings.yaml
(testing_tools.helmfile_checksums.${HELMFILE_OS}_${HELMFILE_ARCH}) and call
verify_sha256 on the downloaded "${HELMFILE_TMP}/${HELMFILE_TAR}" before
extracting; keep existing cleanup and move/chmod steps unchanged.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: 1fc0b716-bab1-4c80-a629-ba346b680278

📥 Commits

Reviewing files that changed from the base of the PR and between 2331074 and 753032d.

📒 Files selected for processing (25)
  • .settings.yaml
  • api/aicr/v1/server.yaml
  • docs/README.md
  • docs/contributor/api-server.md
  • docs/contributor/component.md
  • docs/user/api-reference.md
  • docs/user/cli-reference.md
  • pkg/bundler/bundler.go
  • pkg/bundler/config/config.go
  • pkg/bundler/config/config_test.go
  • pkg/bundler/deployer/helmfile/doc.go
  • pkg/bundler/deployer/helmfile/helmfile.go
  • pkg/bundler/deployer/helmfile/helmfile_test.go
  • pkg/bundler/deployer/helmfile/releases.go
  • pkg/bundler/deployer/helmfile/templates/README.md.tmpl
  • pkg/bundler/deployer/helmfile/testdata/kai_scheduler_async/helmfile.yaml
  • pkg/bundler/deployer/helmfile/testdata/manifest_only/helmfile.yaml
  • pkg/bundler/deployer/helmfile/testdata/mixed_gpu_operator/helmfile.yaml
  • pkg/bundler/deployer/helmfile/testdata/upstream_helm_only/README.md
  • pkg/bundler/deployer/helmfile/testdata/upstream_helm_only/helmfile.yaml
  • pkg/bundler/deployer/helmfile/testdata/with_dynamic_values/helmfile.yaml
  • pkg/cli/bundle.go
  • tests/chainsaw/cli/bundle-helmfile/chainsaw-test.yaml
  • tools/check-tools
  • tools/setup-tools

Comment thread docs/contributor/api-server.md Outdated
Comment thread docs/README.md Outdated
Comment thread docs/user/cli-reference.md
Comment thread pkg/bundler/deployer/helmfile/helmfile_test.go
Comment thread pkg/bundler/deployer/helmfile/releases.go
Comment thread pkg/bundler/deployer/helmfile/releases.go
Comment thread tests/chainsaw/cli/bundle-helmfile/chainsaw-test.yaml
Comment thread tools/setup-tools Outdated
@github-actions

github-actions Bot commented May 14, 2026

Copy link
Copy Markdown
Contributor

Coverage Report ✅

Metric Value
Coverage 75.9%
Threshold 75%
Status Pass
Coverage Badge
![Coverage](https://img.shields.io/badge/coverage-75.9%25-green)

Merging this branch changes the coverage (1 decrease, 2 increase)

Impacted Packages Coverage Δ 🤖
github.com/NVIDIA/aicr/pkg/bundler 61.95% (+0.37%) 👍
github.com/NVIDIA/aicr/pkg/bundler/config 97.46% (ø)
github.com/NVIDIA/aicr/pkg/bundler/deployer/helmfile 87.50% (+87.50%) 🌟
github.com/NVIDIA/aicr/pkg/cli 63.72% (-0.05%) 👎

Coverage by file

Changed files (no unit tests)

Changed File Coverage Δ Total Covered Missed 🤖
github.com/NVIDIA/aicr/pkg/bundler/bundler.go 57.94% (+0.56%) 428 (+8) 248 (+7) 180 (+1) 👍
github.com/NVIDIA/aicr/pkg/bundler/config/config.go 97.01% (ø) 167 162 5
github.com/NVIDIA/aicr/pkg/bundler/deployer/helmfile/doc.go 0.00% (ø) 0 0 0
github.com/NVIDIA/aicr/pkg/bundler/deployer/helmfile/helmfile.go 79.55% (+79.55%) 88 (+88) 70 (+70) 18 (+18) 🌟
github.com/NVIDIA/aicr/pkg/bundler/deployer/helmfile/releases.go 100.00% (+100.00%) 56 (+56) 56 (+56) 0 🌟
github.com/NVIDIA/aicr/pkg/cli/bundle.go 41.22% (-0.28%) 148 (+1) 61 87 (+1) 👎

Please note that the "Total", "Covered", and "Missed" counts above refer to code statements instead of lines of code. The value in brackets refers to the test coverage of that file in the old version of the code.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pkg/bundler/deployer/helmfile/helmfile.go`:
- Around line 159-161: The current error handling after buildHelmfile uses
errors.Wrap with a fixed ErrCodeInternal which can overwrite an existing
structured code returned by buildHelmfile; replace that call so it propagates an
existing structured error code or wraps uncoded errors by using
errors.PropagateOrWrap(err, errors.ErrCodeInternal, "failed to build helmfile
document") (target the error handling immediately after the buildHelmfile call
where err is checked) so existing codes like ErrCodeInvalidRequest are preserved
while uncoded errors get wrapped with ErrCodeInternal and the message.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: e154aa83-1c65-45d0-ab58-b40eda47a4bf

📥 Commits

Reviewing files that changed from the base of the PR and between 753032d and 47c33ad.

📒 Files selected for processing (30)
  • .github/actions/cli-e2e/action.yml
  • .github/actions/load-versions/action.yml
  • .github/actions/setup-build-tools/action.yml
  • .github/workflows/qualification.yaml
  • .settings.yaml
  • api/aicr/v1/server.yaml
  • docs/README.md
  • docs/contributor/api-server.md
  • docs/contributor/component.md
  • docs/user/api-reference.md
  • docs/user/cli-reference.md
  • pkg/bundler/bundler.go
  • pkg/bundler/bundler_test.go
  • pkg/bundler/config/config.go
  • pkg/bundler/config/config_test.go
  • pkg/bundler/deployer/helmfile/doc.go
  • pkg/bundler/deployer/helmfile/helmfile.go
  • pkg/bundler/deployer/helmfile/helmfile_test.go
  • pkg/bundler/deployer/helmfile/releases.go
  • pkg/bundler/deployer/helmfile/templates/README.md.tmpl
  • pkg/bundler/deployer/helmfile/testdata/kai_scheduler_async/helmfile.yaml
  • pkg/bundler/deployer/helmfile/testdata/manifest_only/helmfile.yaml
  • pkg/bundler/deployer/helmfile/testdata/mixed_gpu_operator/helmfile.yaml
  • pkg/bundler/deployer/helmfile/testdata/upstream_helm_only/README.md
  • pkg/bundler/deployer/helmfile/testdata/upstream_helm_only/helmfile.yaml
  • pkg/bundler/deployer/helmfile/testdata/with_dynamic_values/helmfile.yaml
  • pkg/cli/bundle.go
  • tests/chainsaw/cli/bundle-helmfile/chainsaw-test.yaml
  • tools/check-tools
  • tools/setup-tools

Comment thread pkg/bundler/deployer/helmfile/helmfile.go
@lockwobr lockwobr force-pushed the feat/helmfile branch 2 times, most recently from d6e1089 to b7b7d85 Compare May 14, 2026 21:35

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (2)
tools/setup-tools (1)

410-414: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Verify Helmfile artifact checksum before extraction.

helmfile is downloaded and untarred without a verify_sha256 check, so this path is weaker than the neighboring Chainsaw installer and allows unnoticed artifact tampering.

Suggested fix
             if verify_download_url "$HELMFILE_URL" "helmfile ${HELMFILE_VERSION} for ${HELMFILE_OS} ${HELMFILE_ARCH}"; then
                 HELMFILE_TMP=$(mktemp -d)
                 curl -fsSL -o "${HELMFILE_TMP}/${HELMFILE_TAR}" "${HELMFILE_URL}"
+                HELMFILE_SHA256=$(yq ".testing_tools.helmfile_checksums.${HELMFILE_OS}_${HELMFILE_ARCH}" .settings.yaml)
+                verify_sha256 "${HELMFILE_TMP}/${HELMFILE_TAR}" "${HELMFILE_SHA256}"
                 tar -xzf "${HELMFILE_TMP}/${HELMFILE_TAR}" -C "${HELMFILE_TMP}"
                 sudo mv "${HELMFILE_TMP}/helmfile" /usr/local/bin/helmfile
                 sudo chmod +x /usr/local/bin/helmfile
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tools/setup-tools` around lines 410 - 414, The Helmfile download block
currently extracts the tarball without checksum verification; add a
verify_sha256 step using the expected checksum variable (e.g., HELMFILE_SHA256)
after downloading to HELMFILE_TMP and before tar -xzf and sudo mv. Update the
block around verify_download_url / HELMFILE_TMP / HELMFILE_TAR to call
verify_sha256 "${HELMFILE_TMP}/${HELMFILE_TAR}" "${HELMFILE_SHA256}" (or compute
and compare sha256sum) and fail/exit if verification fails so only verified
artifacts are extracted and installed.
pkg/bundler/deployer/helmfile/helmfile.go (1)

159-161: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Preserve structured error codes from buildHelmfile failures.

Line 160 currently wraps all buildHelmfile errors as ErrCodeInternal, which can mask existing structured codes (e.g., ErrCodeInvalidRequest). Propagate coded errors and wrap only uncoded ones.

🔧 Proposed fix
-	if err != nil {
-		return nil, errors.Wrap(errors.ErrCodeInternal, "failed to build helmfile document", err)
-	}
+	if err != nil {
+		return nil, errors.PropagateOrWrap(err, errors.ErrCodeInternal, "failed to build helmfile document")
+	}

Based on learnings: use errors.PropagateOrWrap(err, errCode, message) so structured error codes remain intact.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/bundler/deployer/helmfile/helmfile.go` around lines 159 - 161, The
current error handling after calling buildHelmfile wraps every error as
errors.ErrCodeInternal, which hides any structured codes from buildHelmfile;
update the return path in the function that calls buildHelmfile to use
errors.PropagateOrWrap(err, errors.ErrCodeInternal, "failed to build helmfile
document") so that if err already carries a code it is preserved and only
uncoded errors get wrapped with ErrCodeInternal; locate the check that reads "if
err != nil { return nil, errors.Wrap(...)" and replace that wrap call with
errors.PropagateOrWrap as described.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Duplicate comments:
In `@pkg/bundler/deployer/helmfile/helmfile.go`:
- Around line 159-161: The current error handling after calling buildHelmfile
wraps every error as errors.ErrCodeInternal, which hides any structured codes
from buildHelmfile; update the return path in the function that calls
buildHelmfile to use errors.PropagateOrWrap(err, errors.ErrCodeInternal, "failed
to build helmfile document") so that if err already carries a code it is
preserved and only uncoded errors get wrapped with ErrCodeInternal; locate the
check that reads "if err != nil { return nil, errors.Wrap(...)" and replace that
wrap call with errors.PropagateOrWrap as described.

In `@tools/setup-tools`:
- Around line 410-414: The Helmfile download block currently extracts the
tarball without checksum verification; add a verify_sha256 step using the
expected checksum variable (e.g., HELMFILE_SHA256) after downloading to
HELMFILE_TMP and before tar -xzf and sudo mv. Update the block around
verify_download_url / HELMFILE_TMP / HELMFILE_TAR to call verify_sha256
"${HELMFILE_TMP}/${HELMFILE_TAR}" "${HELMFILE_SHA256}" (or compute and compare
sha256sum) and fail/exit if verification fails so only verified artifacts are
extracted and installed.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: 4eb198d4-0004-4811-a2a1-5aa548996b3c

📥 Commits

Reviewing files that changed from the base of the PR and between 47c33ad and d6e1089.

📒 Files selected for processing (31)
  • .github/actions/cli-e2e/action.yml
  • .github/actions/load-versions/action.yml
  • .github/actions/setup-build-tools/action.yml
  • .github/workflows/qualification.yaml
  • .settings.yaml
  • api/aicr/v1/server.yaml
  • docs/README.md
  • docs/contributor/api-server.md
  • docs/contributor/component.md
  • docs/user/api-reference.md
  • docs/user/cli-reference.md
  • pkg/bundler/bundler.go
  • pkg/bundler/bundler_test.go
  • pkg/bundler/config/config.go
  • pkg/bundler/config/config_test.go
  • pkg/bundler/deployer/helmfile/doc.go
  • pkg/bundler/deployer/helmfile/helmfile.go
  • pkg/bundler/deployer/helmfile/helmfile_test.go
  • pkg/bundler/deployer/helmfile/releases.go
  • pkg/bundler/deployer/helmfile/templates/README.md.tmpl
  • pkg/bundler/deployer/helmfile/testdata/kai_scheduler_async/helmfile.yaml
  • pkg/bundler/deployer/helmfile/testdata/manifest_only/helmfile.yaml
  • pkg/bundler/deployer/helmfile/testdata/mixed_gpu_operator/helmfile.yaml
  • pkg/bundler/deployer/helmfile/testdata/upstream_helm_only/README.md
  • pkg/bundler/deployer/helmfile/testdata/upstream_helm_only/helmfile.yaml
  • pkg/bundler/deployer/helmfile/testdata/with_dynamic_values/helmfile.yaml
  • pkg/cli/bundle.go
  • tests/chainsaw/cli/bundle-helmfile/chainsaw-test.yaml
  • tools/check-tools
  • tools/setup-tools
  • tools/update-helmfile-checksums

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tools/setup-tools`:
- Around line 395-427: The script logs failures but continues when helmfile
cannot be installed; modify the branches that handle missing checksum
(HELMFILE_SHA256 null check) and the verify_download_url false branch so the
script exits non-zero after logging (e.g., call exit 1) to fail-fast; ensure the
changes are applied around the HELMFILE_SHA256 null/error handling and the else
branch after verify_download_url so that a missing checksum or failed
verification/installation does not allow the setup to continue.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: ac182feb-6a09-4605-87a0-3d609bee5b46

📥 Commits

Reviewing files that changed from the base of the PR and between d6e1089 and b7b7d85.

📒 Files selected for processing (32)
  • .github/actions/cli-e2e/action.yml
  • .github/actions/load-versions/action.yml
  • .github/actions/setup-build-tools/action.yml
  • .github/renovate.json5
  • .github/workflows/qualification.yaml
  • .settings.yaml
  • api/aicr/v1/server.yaml
  • docs/README.md
  • docs/contributor/api-server.md
  • docs/contributor/component.md
  • docs/user/api-reference.md
  • docs/user/cli-reference.md
  • pkg/bundler/bundler.go
  • pkg/bundler/bundler_test.go
  • pkg/bundler/config/config.go
  • pkg/bundler/config/config_test.go
  • pkg/bundler/deployer/helmfile/doc.go
  • pkg/bundler/deployer/helmfile/helmfile.go
  • pkg/bundler/deployer/helmfile/helmfile_test.go
  • pkg/bundler/deployer/helmfile/releases.go
  • pkg/bundler/deployer/helmfile/templates/README.md.tmpl
  • pkg/bundler/deployer/helmfile/testdata/kai_scheduler_async/helmfile.yaml
  • pkg/bundler/deployer/helmfile/testdata/manifest_only/helmfile.yaml
  • pkg/bundler/deployer/helmfile/testdata/mixed_gpu_operator/helmfile.yaml
  • pkg/bundler/deployer/helmfile/testdata/upstream_helm_only/README.md
  • pkg/bundler/deployer/helmfile/testdata/upstream_helm_only/helmfile.yaml
  • pkg/bundler/deployer/helmfile/testdata/with_dynamic_values/helmfile.yaml
  • pkg/cli/bundle.go
  • tests/chainsaw/cli/bundle-helmfile/chainsaw-test.yaml
  • tools/check-tools
  • tools/setup-tools
  • tools/update-helmfile-checksums

Comment thread tools/setup-tools

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
docs/user/cli-reference.md (1)

1148-1155: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Update the later deployer enumerations to include helmfile.

This table adds helmfile, but the Attestation section later in the same page still says attestation works with only helm, argocd, argocd-helm, and flux. That leaves the CLI reference internally contradictory about current deployer support.

As per coding guidelines, user documentation should document CLI commands and user-facing features accurately.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/user/cli-reference.md` around lines 1148 - 1155, The Attestation section
is out of sync with the deployer table — add "helmfile" to the list of deployers
that support attestation and describe its support (e.g., that attestation works
with helmfile deploys similarly to other GitOps/Helm-based deployers, and any
caveats such as using per-release cluster-values.yaml or requiring the helmfile
binary). Update the Attestation text to explicitly mention "helmfile" and mirror
the same support statement pattern used for "helm", "argocd", "argocd-helm", and
"flux" so the CLI reference is consistent.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.github/actions/load-versions/action.yml:
- Line 193: Update the echo that prints the helmfile output so it includes its
checksum like the architecture checksums: change the single echo of
steps.versions.outputs.helmfile to print both the version and the corresponding
checksum (e.g., use steps.versions.outputs.helmfile and
steps.versions.outputs.helmfile_checksum) so the action logs helmfile version
plus its checksum for consistency and easier troubleshooting.

In `@pkg/bundler/bundler_test.go`:
- Around line 829-832: The test currently only asserts helmfile.yaml exists; add
negative assertions to ensure deploy.sh and undeploy.sh are NOT generated by
checking os.Stat(filepath.Join(tmpDir, "deploy.sh")) and
os.Stat(filepath.Join(tmpDir, "undeploy.sh")) and failing the test with t.Errorf
if those files exist (i.e., if stat returns nil) so the non-goals are locked in;
reuse the same pattern as the existing helmfile check but invert the condition
(t.Errorf when err == nil) and include clear messages identifying deploy.sh and
undeploy.sh.

In `@pkg/bundler/deployer/helmfile/helmfile_test.go`:
- Around line 40-270: Consolidate the five nearly-identical tests
TestGenerate_UpstreamHelmOnly, TestGenerate_DynamicValues,
TestGenerate_KaiSchedulerAsync, TestGenerate_MixedGPUOperator, and
TestGenerate_ManifestOnly into a single table-driven test that iterates
scenarios and calls runScenario(t, g, goldenName, expectedFiles) for each entry;
create a slice of structs (name, gen *Generator, golden string, expected
[]string) and a t.Run subtest per entry, reusing the existing Generator
initializations from each test body and referencing runScenario and the
Generator fields (RecipeResult, ComponentValues, DynamicValues,
ComponentPostManifests, Version) to populate each case; leave all other tests
(TestGenerate_EmptyRecipe, TestGenerate_NilRecipeResult,
TestGenerate_WithChecksums, TestGenerate_WithDataFiles,
TestGenerate_ContextCanceled, TestGenerate_MissingDataFile) unchanged.
- Around line 253-270: Update TestGenerate_MissingDataFile to assert the
structured error code instead of only checking non-nil: after calling
g.Generate(...) keep the existing nil check but then use stderrors.Is(err,
errors.New(errors.ErrCodeInternal, "")) (consistent with
TestGenerate_ContextCanceled) to verify the error returned by Generator.Generate
indicates an internal missing-data-file failure; reference the test name
TestGenerate_MissingDataFile and the Generator.Generate call when making the
assertion.

---

Outside diff comments:
In `@docs/user/cli-reference.md`:
- Around line 1148-1155: The Attestation section is out of sync with the
deployer table — add "helmfile" to the list of deployers that support
attestation and describe its support (e.g., that attestation works with helmfile
deploys similarly to other GitOps/Helm-based deployers, and any caveats such as
using per-release cluster-values.yaml or requiring the helmfile binary). Update
the Attestation text to explicitly mention "helmfile" and mirror the same
support statement pattern used for "helm", "argocd", "argocd-helm", and "flux"
so the CLI reference is consistent.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: 01d4c143-f578-4906-8e30-35ad27a4b514

📥 Commits

Reviewing files that changed from the base of the PR and between b7b7d85 and db30777.

📒 Files selected for processing (32)
  • .github/actions/cli-e2e/action.yml
  • .github/actions/load-versions/action.yml
  • .github/actions/setup-build-tools/action.yml
  • .github/renovate.json5
  • .github/workflows/qualification.yaml
  • .settings.yaml
  • api/aicr/v1/server.yaml
  • docs/README.md
  • docs/contributor/api-server.md
  • docs/contributor/component.md
  • docs/user/api-reference.md
  • docs/user/cli-reference.md
  • pkg/bundler/bundler.go
  • pkg/bundler/bundler_test.go
  • pkg/bundler/config/config.go
  • pkg/bundler/config/config_test.go
  • pkg/bundler/deployer/helmfile/doc.go
  • pkg/bundler/deployer/helmfile/helmfile.go
  • pkg/bundler/deployer/helmfile/helmfile_test.go
  • pkg/bundler/deployer/helmfile/releases.go
  • pkg/bundler/deployer/helmfile/templates/README.md.tmpl
  • pkg/bundler/deployer/helmfile/testdata/kai_scheduler_async/helmfile.yaml
  • pkg/bundler/deployer/helmfile/testdata/manifest_only/helmfile.yaml
  • pkg/bundler/deployer/helmfile/testdata/mixed_gpu_operator/helmfile.yaml
  • pkg/bundler/deployer/helmfile/testdata/upstream_helm_only/README.md
  • pkg/bundler/deployer/helmfile/testdata/upstream_helm_only/helmfile.yaml
  • pkg/bundler/deployer/helmfile/testdata/with_dynamic_values/helmfile.yaml
  • pkg/cli/bundle.go
  • tests/chainsaw/cli/bundle-helmfile/chainsaw-test.yaml
  • tools/check-tools
  • tools/setup-tools
  • tools/update-helmfile-checksums

Comment thread .github/actions/load-versions/action.yml
Comment thread pkg/bundler/bundler_test.go
Comment thread pkg/bundler/deployer/helmfile/helmfile_test.go Outdated
Comment thread pkg/bundler/deployer/helmfile/helmfile_test.go
Comment thread pkg/bundler/deployer/helmfile/releases.go

@mchmarny mchmarny left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Solid additive change — the helmfile deployer slots into the existing localformat.Write() infrastructure cleanly, reuses SortComponentRefsByDeploymentOrder, and the golden-file scenarios cover the four shapes that actually matter (upstream-only, dynamic-values, kai-scheduler async, mixed + manifest-only). The TestComponentOverrides_ParityWithHelmDeployScript regression guard is the right call — it locks the helmfile and helm deployers together until the special cases move into the recipe schema.

CI is green where it has reported (Lint, Test, E2E, CLI E2E, GPU smoke, GPU inference, analyze, malware/grype/ClamAV); the GPU Training Test was still in progress at review time. CodeQL is neutral — confirm that's the expected outcome and not a regressed config.

Feedback is all low/nit-grade — nothing blocks merge:

  • The PR description's "parallel installs where needs: allows" claim doesn't match the implementation (every release chains to its predecessor, no fan-out).
  • valuesFilesForFolder references cluster-values.yaml for injected -pre / -post folders too — works, but worth a code comment.
  • Chainsaw cleanup is attached only to the last step.
  • One brittle count >= 2 assertion and a couple of 0755 vs 0o755 style nits.

Nice work overall.

Comment thread pkg/bundler/deployer/helmfile/releases.go
Comment thread pkg/bundler/deployer/helmfile/releases.go
Comment thread tests/chainsaw/cli/bundle-helmfile/chainsaw-test.yaml Outdated
Comment thread tests/chainsaw/cli/bundle-helmfile/chainsaw-test.yaml Outdated
Comment thread pkg/bundler/deployer/helmfile/helmfile.go Outdated
Comment thread pkg/bundler/deployer/helmfile/helmfile_test.go
Comment thread tools/update-helmfile-checksums
@mchmarny mchmarny enabled auto-merge (squash) May 14, 2026 22:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Feature]: Helmfile deployer for declarative, bash-free deployments

2 participants