Skip to content

ci(kwok): add argocd-git lane with in-cluster gitea#1390

Merged
mchmarny merged 2 commits into
NVIDIA:mainfrom
haarchri:feat/kwok-argocd-git-lane
Jun 16, 2026
Merged

ci(kwok): add argocd-git lane with in-cluster gitea#1390
mchmarny merged 2 commits into
NVIDIA:mainfrom
haarchri:feat/kwok-argocd-git-lane

Conversation

@haarchri

Copy link
Copy Markdown
Contributor

Summary

Adds the argocd-git KWOK deployer lane: the filesystem bundle is git pushed to
the in-cluster Gitea and Argo CD's repo-server clones + reconciles the app-of-apps
from the Git repoURL the Argo CD counterpart to flux-git, completing the
Git-source round-trip matrix.

Motivation / Context

The OCI lanes (argocd-oci, argocd-helm-oci, flux-oci) and flux-git (#1290)
left one documented product surface uncovered: the Argo CD filesystem/Git output
shape
. With --output ./dir --repo <git-url>, the argocd deployer bakes the Git
URL into the app-of-apps and every per-component application.yaml (targetRevision: main, path-based) — entirely different rendering from the OCI path, and previously
exercised by no lane. This lane reuses the in-cluster Gitea infra flux-git already
established (same host-port 3300, same anonymous-public-repo contract).

Fixes: #963
Related: #1290 (flux-git), #843 (KWOK deployer matrix), #1288 (argocd sync-gate
exists-semantics, tracked separately)

Type of Change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation update
  • Refactoring (no functional changes)
  • Build/CI/tooling

Component(s) Affected

  • CLI (cmd/aicr, pkg/cli)
  • API server (cmd/aicrd, pkg/server)
  • Recipe engine / data (pkg/recipe)
  • Bundlers (pkg/bundler, pkg/component/*)
  • Collectors / snapshotter (pkg/collector, pkg/snapshotter)
  • Validator (pkg/validator)
  • Core libraries (pkg/errors, pkg/k8s)
  • Docs/examples (docs/, examples/)
  • Other: KWOK test infra (kwok/, tests/chainsaw/kwok), CI workflow (.github/)

Implementation Notes

  • No Go changes. The argocd deployer's Git filesystem shape already exists
    (resolveRepoSettings() defaulting targetRevision to main,
    app-of-apps.yaml.tmpl, per-component application.yaml repoURL); this PR adds
    the CI lane that protects it.
  • Shared Gitea helpers. Extracted compute_gitea_urls() /
    push_bundle_to_gitea() in validate-scheduling.sh and refactored the existing
    flux-git branch onto them, so the runner→Service-DNS dual-view URL rule and the
    git init/commit/push --force main block live in one place.
  • install-infra.sh argocd-git case installs Argo CD + repo secret and
    Gitea (a superset of the argocd-oci install, mirroring how flux-git composes
    install_flux + install_gitea).
  • Sync gate. argocd-git reuses the existing source-agnostic argocd-sync
    chainsaw gate (root app nvidia-stack), via a dedicated argocd-git-sync sibling
    that adds one leading step asserting the root Application's spec.source.repoURL
    is the Git URL — the "Git mode engaged" proof (analogue of flux-git-sync's
    GitRepository-Ready step). The shared steps stay byte-identical between the two
    files (SYNC NOTE headers enforce this by convention).
  • Contract under test: anonymous public Git over plain HTTP (no repo
    credentials), exactly what the bundler's argocd output emits.
  • The known exists-semantics weakness of argocd-sync (assert-all-applications-pass)
    is out of scope here and tracked in a separate issue ([Bug]: ci(kwok): chainsaw sync gates pass when ANY resource converges, not ALL #1288 argocd half); this
    lane behaves identically to the existing argocd-oci/argocd-helm-oci lanes.

Testing

# Commands run (prefer `make qualify` for non-trivial changes)
make qualify

Risk Assessment

  • Low — Isolated change, well-tested, easy to revert
  • Medium — Touches multiple components or has broader impact
  • High — Breaking change, affects critical paths, or complex rollout

Rollout notes:

CI matrix grows by one cell per generic overlay (Tier 1) and one
deployer (Tier 3), staying under GitHub's 256-config cap. No new cluster topology —
the Gitea Service + host-port 3300 mapping shipped with #1290; clusters created
after that need no recreation. Existing argocd-oci/argocd-helm-oci/flux-* lanes
are unchanged.

Checklist

  • Tests pass locally (make test with -race)
  • Linter passes (make lint)
  • I did not skip/disable tests to make CI green
  • I added/updated tests for new functionality
  • I updated docs if user-facing behavior changed
  • Changes follow existing patterns in the codebase
  • Commits are cryptographically signed (git commit -S) — GPG signing info

@haarchri haarchri requested review from a team as code owners June 16, 2026 16:43
@copy-pr-bot

copy-pr-bot Bot commented Jun 16, 2026

Copy link
Copy Markdown

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@coderabbitai

coderabbitai Bot commented Jun 16, 2026

Copy link
Copy Markdown

Review Change Stack

📝 Walkthrough

Walkthrough

This PR introduces the argocd-git deployer lane to the KWOK test matrix, completing filesystem-bundle round-trip coverage for Argo CD alongside the existing flux-git lane. The changes span CI workflow matrices (kwok-recipes.yaml, kwok-test action, Makefile), infra install logic (install-infra.sh gains an argocd-git case that deploys Argo CD and Gitea), and validate-scheduling.sh, which gains two new helper functions (compute_gitea_urls, push_bundle_to_gitea) and routes argocd-git through bundle generation, app-of-apps deployment, and a new Chainsaw sync gate (argocd-git-sync). Documentation in kwok/README.md, docs/contributor/tests.md, and ADR-010 is updated to reflect the expanded deployer matrix.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • NVIDIA/aicr#1290: Added the flux-git KWOK deployer lane using the same infra pattern (Gitea, compute_gitea_urls, push_bundle_to_gitea) that this PR reuses for argocd-git, modifying the same scripts and CI matrices.

Suggested labels

area/ci, area/tests, size/XL, area/docs, theme/ci-dx

Suggested reviewers

  • mchmarny
🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title 'ci(kwok): add argocd-git lane with in-cluster gitea' accurately and concisely describes the main change — adding the argocd-git KWOK deployer lane with in-cluster Gitea support.
Description check ✅ Passed The description clearly explains the purpose of the PR, its motivation, implementation details, testing approach, and risk assessment, all directly related to the changeset of adding the argocd-git lane.
Linked Issues check ✅ Passed The PR implements the core objectives from issue #963: Gitea installation, argocd-git lane addition to CI matrix, sync gate implementation, and documentation updates are all present and properly executed.
Out of Scope Changes check ✅ Passed All changes are directly aligned with the scope of implementing the argocd-git lane. No unrelated modifications to unrelated components or areas of the codebase were introduced.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
kwok/scripts/validate-scheduling.sh (1)

448-450: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add git to preflight checks for argocd-git.

Line 449 only requires git for flux-git, but argocd-git also calls push_bundle_to_gitea() and executes git commands. This currently fails later at runtime instead of failing fast in dependency preflight.

Suggested fix
-    [[ "$DEPLOYER" == "flux-git" ]] && required+=(git)
+    [[ "$DEPLOYER" == "flux-git" || "$DEPLOYER" == "argocd-git" ]] && required+=(git)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@kwok/scripts/validate-scheduling.sh` around lines 448 - 450, The preflight
check currently only adds `git` to the required dependencies array when the
DEPLOYER is "flux-git", but the `argocd-git` deployer also requires `git` since
it calls `push_bundle_to_gitea()` which executes git commands. Extend the
condition on line 449 to also require `git` when the DEPLOYER equals
"argocd-git" so that missing git is caught during preflight validation instead
of failing at runtime.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@kwok/scripts/validate-scheduling.sh`:
- Around line 1942-1946: The usage() function output is missing argocd-git from
the list of valid deployer options, while the validation logic at line 1942
accepts it as a valid value. Update the usage() function to include argocd-git
in both the deployer list description and any example output to ensure the CLI
help text is consistent with the actual accepted values.

---

Outside diff comments:
In `@kwok/scripts/validate-scheduling.sh`:
- Around line 448-450: The preflight check currently only adds `git` to the
required dependencies array when the DEPLOYER is "flux-git", but the
`argocd-git` deployer also requires `git` since it calls
`push_bundle_to_gitea()` which executes git commands. Extend the condition on
line 449 to also require `git` when the DEPLOYER equals "argocd-git" so that
missing git is caught during preflight validation instead of failing at runtime.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: 8bdf2e0a-f35c-4f59-bc55-82ebdfcf9a3f

📥 Commits

Reviewing files that changed from the base of the PR and between 9a741e6 and 36bd4df.

📒 Files selected for processing (11)
  • .github/actions/kwok-test/action.yml
  • .github/workflows/kwok-recipes.yaml
  • Makefile
  • docs/contributor/tests.md
  • docs/design/010-kwok-git-source-lanes.md
  • kwok/README.md
  • kwok/scripts/install-infra.sh
  • kwok/scripts/run-all-recipes.sh
  • kwok/scripts/validate-scheduling.sh
  • tests/chainsaw/kwok/argocd-git-sync/chainsaw-test.yaml
  • tests/chainsaw/kwok/argocd-sync/chainsaw-test.yaml

Comment thread kwok/scripts/validate-scheduling.sh
@github-actions

Copy link
Copy Markdown
Contributor

@mchmarny mchmarny left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clean, well-scoped addition of the argocd-git KWOK lane — completes the #963 Git-source round-trip matrix for both GitOps controllers. No Go changes; the bundler's argocd Git output shape already existed and this wires the CI lane that protects it.

The git-mode gate is genuinely fail-closed via three independent guards, which is the thing that matters most here:

  • bundle-time repoURL grep on app-of-apps.yaml (rejects a silent OCI fallback before the push),
  • the new assert-root-app-git-source chainsaw step doing an exact repoURL == $repoURL match with no default binding (an unpassed value fails loud rather than vacuously passing),
  • the driver's empty-GIT_IN_CLUSTER_URL guard before the sync gate.

Consumer/matrix parity is complete — every site that enumerates the deployer set is updated (Tier 1 matrix + Tier 3 list, both shell allowlists, action.yml, Makefile, install-infra.sh, run-all-recipes.sh, ADR-010, tests.md, kwok/README.md). The compute_gitea_urls() / push_bundle_to_gitea() extraction with flux-git refactored onto it is the right dedup, and the SYNC NOTE headers on both chainsaw files flag the byte-identical-steps invariant. The known assert-all-applications-pass exists-semantics weakness is correctly scoped out and tracked (#1288).

One non-blocking nit inline on the bundle-time grep (unescaped regex interpolation; the chainsaw assert is the real authority).

Not blocking, but worth confirming before merge: the 146 KWOK checks are still queued — for a CI-lane PR the lanes actually going green is the real proof, so I'd wait for Tier 1 … (argocd-git) to pass rather than merge on the wiring review alone.

Comment thread kwok/scripts/validate-scheduling.sh
@mchmarny mchmarny enabled auto-merge (squash) June 16, 2026 17:44
@mchmarny mchmarny merged commit f2f8ed9 into NVIDIA:main Jun 16, 2026
148 of 149 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

ci(kwok): add Gitea-in-cluster for filesystem-bundle round-trip coverage

2 participants