fix(bundler): split helmfile bundle into CRD + main sub-helmfiles#922
Merged
Conversation
helm-diff (and therefore `helmfile apply` / `helmfile diff`) renders
every release against the live cluster's REST mapper before any
release is installed. On a fresh cluster this fails for charts whose
templates reference a CRD installed by a different chart in the same
bundle (e.g., nodewright-customizations uses kind: Skyhook, registered
by nodewright-operator). `helmfile`'s needs: directive sequences
installs but not the diff pre-pass, so the document aborts before any
mutation.
Fix: when any referenced component is registry-marked installsCRDs,
the helmfile deployer emits a multi-helmfiles document:
helmfile.yaml # helmfiles: [crds.yaml, releases.yaml]
crds.yaml # every CRD-owner release
releases.yaml # everything else
helmfile processes the helmfiles: list sequentially, so by the time
releases.yaml renders, every CRD installed by crds.yaml is registered
with the cluster's REST mapper. Single-file layout is preserved when
no marked component is present (no behavior change for non-mixed
bundles), and the path collapses to a single helmfile.yaml in the
pathological "every component is a CRD-owner" case.
Marks cert-manager, nodewright-operator, kube-prometheus-stack,
nvsentinel, agentgateway-crds, and slinky-slurm-operator-crds as
installsCRDs: true. Adds split-aware shape assertions, dedicated
unit tests for the partition + layout decision, golden files for the
mixed case, and a registry-sentinel guard. Generated README documents
the helm-diff plugin prerequisite and the helmfiles: explainer.
Chainsaw test cli-bundle-helmfile gains per-leaf iteration.
Demo cuj1-gke-config.md: drops `--skip-diff-on-install`.
Fixes #914
This comment was marked as resolved.
This comment was marked as resolved.
Contributor
Coverage Report ✅
Coverage BadgeMerging this branch will decrease overall coverage
Coverage by fileChanged files (no unit tests)
Please note that the "Total", "Covered", and "Missed" counts above refer to code statements instead of lines of code. The value in brackets refers to the test coverage of that file in the old version of the code. |
Three follow-ups from the review on the helmfile CRD-split PR: 1. Bound the test Generate calls with context.WithTimeout (testGenerateTimeout = 30s). The pkg/bundler I/O path should carry a deadline per project rules; context.Background was a gap. 2. Differentiate "file exists" from other os.Stat errors in the single-file regression guards. The prior `\!os.IsNotExist(err)` would have silently failed-open on a permission error. Extracted the duplicated check into assertSubHelmfilesAbsent. 3. Critical: fix the subshell exit in the chainsaw per-component-values-paths step. `awk | while … exit 1` only exits the subshell; the parent script kept returning 0 on broken bundles. Switched to a temp-file intermediate (process substitution would be cleaner but is bash-only — chainsaw runs scripts under /bin/sh).
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@pkg/bundler/deployer/helmfile/crd_split_test.go`:
- Around line 200-202: The test discards the error from yaml.Unmarshal into
TopHelmfile (variable top) before asserting on top.Helmfiles; change the call to
capture and check the error (e.g. err := yaml.Unmarshal(data, &top)) and fail
the test immediately on error (using t.Fatalf/t.Fatal or the same assertion
helper used for the later Helmfile unmarshal) so you don't proceed with a
zero-value top; reference the TopHelmfile unmarshalling site and mirror the
error handling used for the later Helmfile unmarshal.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Enterprise
Run ID: e89f5f98-512d-4e23-8f42-64bdc4ab79ce
📒 Files selected for processing (2)
pkg/bundler/deployer/helmfile/crd_split_test.gotests/chainsaw/cli/bundle-helmfile/chainsaw-test.yaml
lockwobr
approved these changes
May 15, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
helm-diffrenders every release against the live cluster's REST mapper before any release is installed. On a fresh cluster this fails for charts whose templates reference a CRD installed by a different chart in the same bundle (e.g.,nodewright-customizationsuseskind: Skyhook, registered bynodewright-operator).helmfile'sneeds:directive sequences installs but not the diff pre-pass, so the document aborts before any mutation withno matches for kind "Skyhook" in version "skyhook.nvidia.com/v1alpha1".This PR implements option C1 from the issue: when any referenced component is registry-marked
installsCRDs, the helmfile deployer emits a multi-helmfiles:document so CRDs land before consumers render.Fixes: #914
Related: N/A
Type of Change
Component(s) Affected
pkg/bundler,pkg/component/*)pkg/recipe)docs/,examples/)Implementation Notes
Layout when split is active:
helmfileprocesses thehelmfiles:list sequentially, so by the timereleases.yamlrenders, every CRD installed bycrds.yamlis registered with the cluster's REST mapper. The cross-sub-helmfileneeds:edge (e.g., gpu-operator → cert-manager) dissolves because sub-helmfile sequencing already enforces ordering.Decision tree in
Generator.writeHelmfileLayout:helmfile.yaml(legacy path, byte-identical for non-mixed bundles).helmfile.yaml(split would add no value).Registry marks (
installsCRDs: true) added to:cert-manager— Certificate / Issuer / ClusterIssuer / Order / Challenge / CertificateRequest.nodewright-operator— Skyhook (the originating failure).kube-prometheus-stack— ServiceMonitor / PodMonitor / Alertmanager / Prometheus / etc.nvsentinel— HealthCheck / NodeHealth.agentgateway-crds— CRDs-only chart.slinky-slurm-operator-crds— CRDs-only chart.Other components (gpu-operator, network-operator, nvidia-dra-driver-gpu, kueue, kubeflow-trainer, etc.) also install CRDs but are not consumed by other charts in the bundle — leaving them unmarked keeps the main
releases.yamlchain longer and the split surface smaller. Easy to opt in if a future consumer surfaces the same bug.Tests added:
pkg/bundler/deployer/helmfile/crd_split_test.go—TestSplitFoldersByCRD(partition shape, pre/post inheritance),TestSplitFoldersByCRD_NilCRDSet,TestGenerate_SplitLayout(end-to-end three-file layout + needs dissolution),TestGenerate_AllCRD_CollapsesToSingleFile,TestGenerate_NoCRD_KeepsSingleFile,TestRegistry_CRDOwnersStillMarked(sentinel guard against rename/unmark).pkg/bundler/deployer/helmfile/helmfile_test.go—assertHelmfileShaperefactored into split-aware top-level walker + leaf checker;upstream_helm_onlyscenario flipped to the split layout with newcrds.yaml+releases.yamlgoldens.tests/chainsaw/cli/bundle-helmfile/chainsaw-test.yaml— every step that inspectedhelmfile.yaml's leaf shape now resolves aLEAVESlist (one entry in the single-file case, two in the split case) and iterates.Generated README gains:
helm-diffplugin prerequisite (helmfile invokes it under the hood forapply/diff) and a paragraph explaining thehelmfiles:layout whenHasCRDLayeris true.Testing
All 21 chainsaw tests + go test -race + golangci-lint + yamllint pass. Coverage on
pkg/bundler/deployer/helmfileunchanged.Manual end-to-end via
chainsaw test ./tests/chainsaw/cli/bundle-helmfile --no-clusteragainst an eks-h100-ubuntu-training recipe (which triggers the split because cert-manager + kube-prometheus-stack + nodewright-operator + nvsentinel are all marked): top-levelhelmfile.yamlcarrieshelmfiles: [crds.yaml, releases.yaml], both sub-files parse,helmfile buildresolves the cross-file release graph cleanly.Risk Assessment
git revert.Rollout notes: No CLI flag, no API surface change. Operators of existing bundles will see the split layout the next time they regenerate;
helmfile applycontinues to work without flags. The README in the generated bundle explains the new layout.Checklist
make testwith-race)make lint)git commit -S)