Skip to content

refactor(kwok): migrate argocd-* and flux-oci sync gate from bash to chainsaw#1050

Merged
mchmarny merged 9 commits into
mainfrom
feat/kwok-chainsaw-sync-gate-962
May 27, 2026
Merged

refactor(kwok): migrate argocd-* and flux-oci sync gate from bash to chainsaw#1050
mchmarny merged 9 commits into
mainfrom
feat/kwok-chainsaw-sync-gate-962

Conversation

@mchmarny

@mchmarny mchmarny commented May 26, 2026

Copy link
Copy Markdown
Member

Summary

Migrates the KWOK argocd-* and flux-oci sync gate from ~500 lines of bash + jq (wait_for_argocd_sync, wait_for_argocd_root_app, dump_argocd_failures, and the symmetric flux-* peers) to idiomatic Chainsaw scenarios under tests/chainsaw/kwok/.

Final diff: 5 files, -520 / +362 LOC (net -158).

Fixes #962

Motivation / Context

#962 proposed moving the argocd-* sync gate to Chainsaw to (a) shorten the bash surface, (b) get structured per-resource failure output instead of opaque "GitOps sync timeout strike 1/3", and (c) align with the rest of the test surface. The flux-oci sync gate was added symmetrically after the issue was filed; this PR migrates both in one shot to avoid a half-bash / half-chainsaw state.

Type of Change

  • Refactoring (no functional changes)
  • Build/CI/tooling

Component(s) Affected

  • KWOK deployer matrix CI driver (kwok/scripts/validate-scheduling.sh)
  • tests/chainsaw/kwok/ (new)
  • .github/actions/kwok-test/action.yml + .github/workflows/kwok-recipes.yaml (chainsaw install wiring)

Implementation Notes

Chainsaw scenarios:

  • tests/chainsaw/kwok/argocd-sync/chainsaw-test.yaml — asserts root Application is present (30s grace) and every Application in argocd namespace reaches one of the 4 terminal-pass states. The 4-arm OR is a single JMESPath expression nested under status:. catch: block dumps per-App status + repo-server + application-controller logs (StatefulSet/Deployment fallback preserved for Argo CD chart 9.x vs. older).
  • tests/chainsaw/kwok/flux-sync/chainsaw-test.yaml — sequential per-stage assertions: OCIRepository → Kustomization → HelmReleases → ArtifactGenerators, each Ready=True. Per-step catch: blocks dump the controller logs for the failing stage. The HelmRelease predicate accepts both Ready=True AND the terminal fallback (Released=True + history[0]=='deployed' + Stalled!=True), mirroring the pre-migration bash contract so KWOK pod-readiness gaps don't false-negative.

Bash shim (thin):

  • wait_for_argocd_sync() and wait_for_flux_sync() survive as ~30-line shims that invoke chainsaw test with the right --set bindings and translate exit codes.
  • The other 4 functions (wait_for_argocd_root_app, dump_argocd_failures, wait_for_flux_root_app, dump_flux_failures) are removed.

What bash still owns:

  • Per-deployer binding selection (ARGOCD_ROOT_APP, Flux CR names) — still set in the matrix-shape preamble, passed to chainsaw via --set.
  • Exit-code translation. Chainsaw collapses pass/fail to 0/1; the shim re-emits EXIT_ARGOCD_SYNC_TIMEOUT=50 on chainsaw failure so run-all-recipes.sh's 3-strike rule still fires. Behavior difference vs. the old gate: hard errors (RBAC, CRD missing) now consume strike budget instead of failing fast (3 attempts then bail vs. 1). Acceptable tradeoff for the LOC reduction; documented in the shim comment.

CI wiring:

  • .github/actions/kwok-test/action.yml: added chainsaw_version + chainsaw_sha256 inputs, threaded through to setup-build-tools with install_chainsaw: 'true', added /usr/local/bin/chainsaw to the kwok-tools cache key.
  • .github/workflows/kwok-recipes.yaml: passes the versions from steps.versions.outputs.chainsaw / chainsaw_sha256_linux_amd64 into all three kwok-test matrix calls (Tier 1 / 2 / 3).

Diagnostic parity: every kubectl-log target in the prior dump_*_failures is reproduced in a chainsaw catch: script block, including the StatefulSet/Deployment fallback for argocd-application-controller (Argo chart 9.x).

What's deferred

Iterations during review

The migration required several CI cycles to converge on the correct chainsaw idioms. Each fix is its own commit for bisectability:

  1. refactor(kwok): migrate argocd-* and flux-oci sync gate from bash to chainsaw — initial migration.
  2. fix(kwok): address CodeRabbit review on chainsaw sync gate — three pre-CI findings:
    • --values key=value--set (the former is YAML-only; the inline override was a silent no-op).
    • HelmRelease pass criteria broadened to accept the Released+deployed+!Stalled terminal-fallback arm.
    • ArtifactGenerator apiVersion corrected to source.extensions.fluxcd.io/v1beta1 (extension API).
  3. fix(ci): install chainsaw on KWOK matrix runners — first CI run surfaced "chainsaw not found on PATH". Added install_chainsaw: 'true' + version+sha256 inputs to the kwok-test composite action and threaded the values from the workflow.
  4. fix(kwok): skip chainsaw cleanup-delete on KWOK clusters — second run surfaced cleanup ERROR: context deadline exceeded. Chainsaw auto-creates a per-test temp namespace and deletes it during cleanup; on KWOK there's no kube-controller-manager to finalize the kubernetes namespace finalizer, so the namespace stays in Terminating indefinitely. spec.skipDelete: true resolves it (the tests create no resources of their own).
  5. fix(kwok): pull --set overrides through bindings; fix OCIRepository apiVersion — third run: argocd-oci failed actual resource not found on argocd/aicr-stack (rootApp default) because chainsaw's --set populates $values, not bindings. Routed bindings through ($values.<key> || '<default>'). Same run: flux-oci failed no matches for source.toolkit.fluxcd.io/v1beta2 — Flux v2.x serves OCIRepository at /v1, not v1beta2.
  6. fix(kwok): scope HelmRelease + ArtifactGenerator asserts to flux-system — fourth run: HelmRelease assertion was implicitly scoped to chainsaw's per-test temp namespace (chainsaw-main-goblin/*) where no HRs exist. Added explicit metadata.namespace: flux-system. Also flipped ArtifactGenerator from assert: (which fails on zero-match) to error: (which fires only when a non-Ready AG exists) so pure-Helm flux bundles don't false-negative.

Testing

Final CI run on 5c00091a: all 70 KWOK matrix jobs green (helm × 14 + argocd-oci × 14 + argocd-helm-oci × 14 + flux-oci × 14 across aks/eks/gke-cos/kind/lke/oke at base/inference/training), plus the broader PR status (Qualification, ClamAV, grype, etc.) — 89 success, 0 failures, 12 skipped (Tier 2/3 lanes that only run on schedule).

Local checks:

chainsaw lint test -f tests/chainsaw/kwok/argocd-sync/chainsaw-test.yaml   # valid
chainsaw lint test -f tests/chainsaw/kwok/flux-sync/chainsaw-test.yaml     # valid
bash -n kwok/scripts/validate-scheduling.sh                                # syntax OK
shellcheck kwok/scripts/validate-scheduling.sh                             # clean (pre-existing SC1091 unrelated)
yamllint -c .yamllint.yaml tests/chainsaw/kwok/                            # clean
make lint                                                                   # 0 issues

Risk Assessment

  • Medium — touches the CI driver every PR depends on. Easy to revert (single bash file + 2 new yaml files + workflow plumbing). The full KWOK matrix is now green end-to-end on this PR, which is the strongest validation possible short of merging.

Rollout notes: no runtime impact. Affects only the KWOK CI matrix driver. After merge, open PRs rebasing onto main will see the new sync-gate path; if anything regresses, revert is one git command.

Checklist

  • Tests pass locally (make lint; chainsaw lint test; shellcheck; yamllint)
  • Full KWOK matrix green on CI (the only validation path that exercises the migration end-to-end)
  • Linter passes
  • I did not skip/disable tests to make CI green
  • I added/updated tests for new functionality (negative-path test deferred to follow-up; see "What's deferred")
  • Documentation updated (CI driver comments + shim docstring explain the new contract)
  • Changes follow existing patterns in the codebase (mirrors recipes/checks/*/health-check.yaml chainsaw idiom)
  • Commits are cryptographically signed (git commit -S)

…chainsaw

Replaces ~500 lines of bash + jq sync-wait machinery in
kwok/scripts/validate-scheduling.sh (wait_for_argocd_root_app,
wait_for_argocd_sync, dump_argocd_failures, and the symmetric flux-*
peers) with idiomatic Chainsaw scenarios under tests/chainsaw/kwok/.

What moved to Chainsaw:

  - tests/chainsaw/kwok/argocd-sync/chainsaw-test.yaml: asserts the root
    Application is present (30s grace) and every Application in the
    argocd namespace reaches one of the 4 terminal-pass states (Synced+
    Healthy, Synced+Progressing, OutOfSync+Healthy+Succeeded, Synced+
    Degraded+Succeeded). 4-arm OR is a single JMESPath expression
    nested under `status:`. catch: block dumps per-App status +
    repo-server + application-controller logs (StatefulSet/Deployment
    fallback preserved verbatim).

  - tests/chainsaw/kwok/flux-sync/chainsaw-test.yaml: sequential
    assertions on OCIRepository → Kustomization → HelmReleases →
    ArtifactGenerators, each Ready=True. Per-step catch: blocks dump
    the controller logs for the failing stage.

What stayed in bash:

  - Per-deployer binding selection: ARGOCD_ROOT_APP and the Flux CR
    names are still set in the matrix-shape preamble.
  - Exit-code translation: Chainsaw collapses pass/fail to 0/1; the
    bash shim re-emits EXIT_ARGOCD_SYNC_TIMEOUT (50) on chainsaw
    failure so run-all-recipes.sh's 3-strike rule still fires.
    Behavior difference vs. the old gate: hard errors (RBAC, CRD
    missing) consume strike budget instead of failing fast. Accepted
    tradeoff — 3 attempts then bail vs. 1.

What's deferred:

  - Negative-path test (issue #962 acceptance criterion #2). Requires
    fixture infrastructure to deploy a known-bad Application; out of
    scope for this PR. Follow-up.

Validation status:

  Cannot run KWOK + ArgoCD locally. Chainsaw `lint test` validates the
  schema (passes for both files). yamllint and shellcheck pass. The
  4-arm JMESPath expression syntax is structurally correct but its
  runtime behavior is validated only by CI's argocd-* and flux-oci
  matrix lanes. Marking PR as draft until CI confirms parity.

Fixes #962
@github-actions

github-actions Bot commented May 26, 2026

Copy link
Copy Markdown
Contributor

Coverage Report ✅

Metric Value
Coverage 77.0%
Threshold 75%
Status Pass
Coverage Badge
![Coverage](https://img.shields.io/badge/coverage-77.0%25-green)

No Go source files changed in this PR.

@coderabbitai

coderabbitai Bot commented May 26, 2026

Copy link
Copy Markdown

Review Change Stack

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

This PR replaces in-script Argo CD and Flux sync polling with Chainsaw Test manifests and updates validate-scheduling.sh to call those tests. It adds tests/tests/chainsaw/kwok/argocd-sync and tests/chainsaw/kwok/flux-sync, refactors wait_for_argocd_sync and wait_for_flux_sync to invoke chainsaw, and updates the kwok-test composite action and workflows to install and pass Chainsaw version/checksum inputs.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately and concisely summarizes the main change: migrating ArgoCD and Flux sync gate logic from bash to Chainsaw, reflecting the primary refactoring objective.
Linked Issues check ✅ Passed The PR implementation meets all acceptance criteria from #962: (1) replaces bash/jq sync logic with Chainsaw tests for argocd and flux gates, (2) encodes all three pass predicates (Synced+Healthy; Synced+Progressing; OutOfSync+Healthy+phase=Succeeded) and rejects false positives, (3) preserves 3-strike behavior via EXIT_ARGOCD_SYNC_TIMEOUT exit-code translation in bash shims, (4) reproduces failure diagnostics (per-App status, repo-server/application-controller logs) via catch blocks equivalent to dump_argocd_failures. Negative-path test deferred to follow-up as noted.
Out of Scope Changes check ✅ Passed All changes are scoped to the argocd/flux sync-gate refactoring: Chainsaw test definitions, thin bash shims, GitHub Action inputs for Chainsaw tooling setup, and workflow variable passing. No unrelated modifications to bundle logic, verify_pods, or other KWOK components.
Description check ✅ Passed The PR description comprehensively explains the migration from bash to Chainsaw, covering implementation details, objectives, testing results, and risk assessment.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/kwok-chainsaw-sync-gate-962

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@kwok/scripts/validate-scheduling.sh`:
- Around line 1200-1203: The script is passing an inline key=value to Chainsaw's
--values (which expects YAML), so update the invocation in
validate-scheduling.sh where the variable chainsaw_bin is used: replace --values
"rootApp=${ARGOCD_ROOT_APP}" with either --set "rootApp=${ARGOCD_ROOT_APP}" (or
--set-string if you need to preserve it as a string) or feed proper YAML to
--values via a heredoc or temp file; change the call around the test invocation
that references chainsaw_bin and adjust quoting accordingly so the value is
passed with --set/--set-string or as a YAML file/stdin.

In `@tests/chainsaw/kwok/flux-sync/chainsaw-test.yaml`:
- Around line 105-117: The HelmRelease assertion "assert-all-helmreleases-ready"
is too strict—change the predicate to accept either Ready==True OR the terminal
fallback (Released==True with the first history entry marked deployed and
Stalled!=True); update the assert under resource HelmRelease to check
(conditions[?type=='Ready']|[0].status == 'True') OR (status.release.status ==
'Released' AND status.history[0].deployed == true AND
(conditions[?type=='Stalled']|[0].status != 'True')) so KWOK/Flux gate contract
semantics are preserved.
- Around line 142-145: The readiness assertion for kind ArtifactGenerator uses
the wrong apiVersion; update the selector in
tests/chainsaw/kwok/flux-sync/chainsaw-test.yaml so ArtifactGenerator is
referenced as apiVersion: source.extensions.fluxcd.io/v1beta1 (not
source.toolkit.fluxcd.io/v1beta2) so the readiness check
(conditions[?type=='Ready']|[0].status == 'True') can match actual CRs; locate
the ArtifactGenerator block in the YAML and replace the apiVersion value
accordingly.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: 24293c8b-7720-4972-acc2-816cebd086c2

📥 Commits

Reviewing files that changed from the base of the PR and between b0bf12d and 4cf5b9b.

📒 Files selected for processing (3)
  • kwok/scripts/validate-scheduling.sh
  • tests/chainsaw/kwok/argocd-sync/chainsaw-test.yaml
  • tests/chainsaw/kwok/flux-sync/chainsaw-test.yaml

Comment thread kwok/scripts/validate-scheduling.sh
Comment thread tests/chainsaw/kwok/flux-sync/chainsaw-test.yaml
Comment thread tests/chainsaw/kwok/flux-sync/chainsaw-test.yaml Outdated
mchmarny and others added 3 commits May 26, 2026 16:44
Three valid findings from the initial CodeRabbit review:

1. Chainsaw `--values` is YAML-only (file/stdin/heredoc); inline
   `key=value` was a silent no-op that left bindings undefined,
   causing every argocd-* lane to assert against the default
   `aicr-stack` rootApp regardless of deployer. Switched both
   shims to `--set`, which is chainsaw's inline override flag and
   populates the same $values binding. This is the root cause of
   the failing argocd-oci and argocd-helm-oci matrix jobs.

2. HelmRelease pass criteria narrowed from the bash gate. The old
   `wait_for_flux_sync` accepted Ready=True OR the terminal
   fallback (Released=True + history[0]==deployed + Stalled\!=True)
   to avoid false negatives on KWOK lanes where helm-controller's
   install succeeded but the Ready condition lagged. Re-added the
   fallback as a JMESPath disjunction. Stalled\!=True gate prevents
   masking a genuinely failed reconcile with a stale prior
   Released.

3. ArtifactGenerator apiVersion. CR lives under
   `source.extensions.fluxcd.io/v1beta1` (extension API), not
   `source.toolkit.fluxcd.io/v1beta2` (core source-controller
   group). Matches the apiVersion the bundler emits at
   pkg/bundler/deployer/flux/templates/artifactgenerator.yaml.tmpl;
   the prior incorrect group would have silently matched zero CRs
   and no-op'd the stage-4 check.

Verification:
  - chainsaw lint test: both files valid
  - bash -n + shellcheck: clean (pre-existing SC1091 unrelated)
  - yamllint: clean

Still draft — CI will tell us whether the JMESPath expressions
behave correctly against live resources.
CI on PR #1050 surfaced "chainsaw not found on PATH" — the kwok-test
composite action installs kind/helm/kubectl/yq/flux/goreleaser but did
not install chainsaw. The bash sync gate that PR #1050 migrated now
depends on chainsaw at runtime via kwok/scripts/validate-scheduling.sh.

Three wiring changes:
- .github/actions/kwok-test/action.yml: add chainsaw_version +
  chainsaw_sha256 inputs and pass them through to setup-build-tools
  with install_chainsaw=true.
- Same file: cache /usr/local/bin/chainsaw alongside the other tool
  binaries in the kwok-tools cache.
- .github/workflows/kwok-recipes.yaml: thread the chainsaw version
  and sha256 from steps.versions.outputs into all three kwok-test
  matrix calls (Tier 1, Tier 2, Tier 3).
mchmarny and others added 4 commits May 26, 2026 17:42
CI surfaced: chainsaw's per-test temp namespace stalls in Terminating
on KWOK clusters (no kube-controller-manager to finalize the
`kubernetes` namespace finalizer), the 30s cleanup timeout fires, and
the test is reported as failed even when every assertion passed
inside 8s.

Set spec.skipDelete: true on both chainsaw scenarios. Safe because
neither test creates resources of its own — both only assert on
existing Application / HelmRelease / OCIRepository CRs in argocd or
flux-system namespaces.

Verified locally with chainsaw lint test (both valid).
…piVersion

Two issues surfaced by CI on the post-skipDelete run:

1. --set rootApp=nvidia-stack didn't override the binding. chainsaw's
   --set populates the implicit \$values binding, NOT the explicit
   spec.bindings — they're separate namespaces. Result: argocd-oci
   lane (rootApp=nvidia-stack) ran with the binding default
   (aicr-stack) and failed "actual resource not found" on every
   argocd-oci/flux-oci matrix job that didn't happen to match the
   default. Threaded the override through: bindings now read
   `(\$values.<key> || '<default>')` so the CLI --set value wins,
   with the literal fallback preserved for ad-hoc local runs.

2. OCIRepository apiVersion source.toolkit.fluxcd.io/v1beta2 — the
   stable group/version under Flux v2.x is /v1 (matches what the
   bundler emits at pkg/bundler/deployer/flux/templates/). The
   v1beta2 selector matched zero CRs and the assertion errored
   with "no matches for source.toolkit.fluxcd.io/v1beta2, Resource=".
   Switched to /v1.

Pattern from the previous CI run that confirms the diagnosis: helm
and argocd-helm-oci passed (argocd-helm-oci's rootApp default
aicr-stack matched the actual root App), argocd-oci and flux-oci
failed (rootApp/apiVersion mismatch).
Two flux-sync failures from CI run 26484518107:

1. HelmRelease assertion was implicitly scoped to chainsaw's per-test
   namespace (chainsaw-main-goblin), where no HRs exist:

     ASSERT | RUN | helm.toolkit.fluxcd.io/v2/HelmRelease @ chainsaw-main-goblin/*
     ASSERT | ERROR | "no actual resource found"

   The bundler places every HelmRelease in the Flux install namespace
   (Generator.Namespace, default "flux-system" via
   config.DefaultFluxNamespace). Added explicit metadata.namespace:
   flux-system so the selector finds them.

2. ArtifactGenerator step was `assert:` which fails on zero-match.
   Pure-Helm flux bundles don't emit ArtifactGenerators at all, so
   the step would fail on every recipe without local charts. Flipped
   to `error:` ("no AG in non-Ready state"), which:
     - PASSES on zero AGs (vacuously true)
     - PASSES when every AG is Ready=True
     - FAILS when any AG is non-Ready
   Same semantic as the pre-migration bash gate's optional check.
@mchmarny mchmarny marked this pull request as ready for review May 27, 2026 03:33
@mchmarny mchmarny requested review from a team as code owners May 27, 2026 03:33
@mchmarny mchmarny enabled auto-merge (squash) May 27, 2026 10:54
@yuanchen8911

Copy link
Copy Markdown
Contributor

Heads up — looks like the new chainsaw sync gate races ahead of child Application materialization for App-of-Apps deployers, causing intermittent (now reliable) failures on Tier 1: <recipe> (argocd-helm-oci) and (argocd-oci) KWOK jobs on PRs based on maind31e1f5d.

Filed #1061 with the analysis and a proposed minimum-invasive fix (add operationState.phase == 'Succeeded' to arms 1 and 2 of the JMESPath in tests/chainsaw/kwok/argocd-sync/chainsaw-test.yaml, so all four arms require a completed sync — guaranteeing the App-of-Apps has created its children before any Application is counted as passing).

Side-by-side pod-count evidence in the issue (35 pre-migration → 13 occasional pass → 5 reliable fail). No prior commit between 15:17 and 15:36 UTC triggered the transition, so it's a race condition that's now being lost more often, not a separate regression.

Happy to put up a PR if you'd like.

yuanchen8911 added a commit to yuanchen8911/aicr that referenced this pull request May 27, 2026
…IA#1061)

The chainsaw assertion at `assert-all-applications-pass` could pass
within ~4-5 seconds of helm install, before Argo CD had completed the
initial App-of-Apps sync that creates the child Applications. With only
the root Application present in the argocd namespace and the predicate
satisfied for that root alone, the assertion returned PASS while the
cluster was still half-deployed. The subsequent pod-scheduling check in
`validate-scheduling.sh` then tripped on KIND-default kube-system pods
(coredns, local-path-provisioner) that the AICR bundle's pods would
otherwise have rendered insignificant.

Same-SHA evidence: run 26523455927 on `8b5ef744` failed on first attempt
and passed on retry of identical content — a textbook race-condition
signature. The pre-NVIDIA#1050 bash gate avoided the race because its polling
loop took long enough that children materialized before the next
re-list.

Tighten the JMESPath predicate so all four arms require
`operationState.phase == 'Succeeded'`. An Application can only count as
passing after Argo CD has completed at least one sync operation; for the
App-of-Apps that's the gate ensuring the children exist and join the
selector's match set on subsequent polls. The four sync/health
combinations remain (canonical pass + the three KWOK/operator-mutation
relaxations the old bash gate accepted), so legitimate Application
states still resolve to PASS once Argo CD has actually run.

The sibling flux-sync chainsaw test does not need a parallel change: it
asserts `Ready=True` conditions on HelmRelease and Kustomization, which
already require their controllers to have completed reconciliation.

Also add `tests/chainsaw/kwok/**` to the `KWOK Cluster Validation`
workflow path filters. PR NVIDIA#1050 placed the new chainsaw scenarios under
`tests/chainsaw/kwok/`, but the workflow filters were not updated to
match — so changes to the scenarios silently skipped KWOK CI. Now any
edit to the chainsaw KWOK scenarios re-runs the matrix automatically.

Fixes NVIDIA#1061
Related NVIDIA#1050 (the chainsaw migration that introduced the race)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Refactor KWOK argocd-* sync gate from bash/jq to Chainsaw

3 participants