Skip to content

fix(ci): unblock packaging dry-run and /ok-to-test startup failures#1128

Merged
mchmarny merged 1 commit into
mainfrom
fix/ci-workflows-startup-failures
May 30, 2026
Merged

fix(ci): unblock packaging dry-run and /ok-to-test startup failures#1128
mchmarny merged 1 commit into
mainfrom
fix/ci-workflows-startup-failures

Conversation

@mchmarny

Copy link
Copy Markdown
Member

Summary

Two unrelated CI workflows have been failing on main and are fixed here together.

  • Packaging Check (packaging.yaml) — failed after PR refactor(server): merge pkg/api into pkg/server, drop legacy direct handlers #1127 because the dry-run step did not set HOMEBREW_DEPLOY_KEY, so the brews template references to .Env.HOMEBREW_DEPLOY_KEY in .goreleaser.yaml blew up with template: failed to apply "{{ .Env.HOMEBREW_DEPLOY_KEY }}": map has no entry for key "HOMEBREW_DEPLOY_KEY". Fixed by injecting an empty placeholder; publish is skipped so the value is never consumed. The real secret is still provided by the release workflow in on-tag.yaml.
  • OK to Test (ok-to-test.yaml) — has been failing with startup_failure on every issue_comment trigger since PRs refactor(ci): make /ok-to-test's unprivileged status explicit via privileged_ci input #1064 / fix(ci): drop privileged grants from /ok-to-test fork path #1067. Root cause: GitHub validates the union of permissions declared by every job in the called workflow at workflow startup, before if: conditions evaluate. qualification.yaml's cli-e2e declares id-token: write and security-scan declares security-events: write, but the caller granted neither. Even though both jobs are gated by if: inputs.privileged_ci=false, the run never gets that far. Fixed by declaring those permissions on the reusable-workflow call; the gated jobs are still skipped on the fork path, so no fork-controlled code ever executes with elevated tokens.

Motivation/Context

The Packaging Check failure was visible in https://github.com/NVIDIA/aicr/actions/runs/26692166497 right after PR #1127 merged (any push to main touching .goreleaser.yaml triggers it). The OK to Test failure is older — 100+ consecutive startup_failure runs going back to 2026-05-28 (see https://github.com/NVIDIA/aicr/actions/runs/26691845494). Both are pre-existing latent issues exposed by recent changes; this PR clears them.

Type of Change

  • Bug fix (CI workflows)

Components Affected

  • CI (.github/workflows/packaging.yaml, .github/workflows/ok-to-test.yaml)

Implementation Notes

  • The ok-to-test.yaml change documents in-line why "deliberately withheld" permissions (per PR fix(ci): drop privileged grants from /ok-to-test fork path #1067) cannot actually be withheld at the workflow-call level once the reusable workflow declares them on any job. Security intent is preserved by the if: inputs.privileged_ci runtime gating, which keeps fork-controlled code from running with the declared tokens.
  • The packaging.yaml change keeps the fix minimal and CI-local rather than complicating the goreleaser template for production releases.

Testing

  • actionlint and yamllint clean on both modified files.
  • Verified locally that HOMEBREW_DEPLOY_KEY="" lets goreleaser release --snapshot --clean --skip=publish,sbom get past template evaluation and write the brew formula (final local-only failures were unrelated gcloud/docker sandbox issues, not present in CI).
  • Post-merge, the next push to main that touches .goreleaser.yaml should keep Packaging Check green, and the next /ok-to-test comment on any PR should authorize and dispatch the qualification suite.

Risk Assessment

Low. CI-only configuration changes. No application code or release artifacts affected.

Checklist

  • Changes pass actionlint and yamllint
  • No application code changes
  • Commit is GPG-signed (-S)

Two unrelated CI workflows were broken on main:

- packaging.yaml's GoReleaser Dry Run step did not provide
  HOMEBREW_DEPLOY_KEY, so the brews template references to
  .Env.HOMEBREW_DEPLOY_KEY in .goreleaser.yaml failed with
  "map has no entry for key". Inject an empty string placeholder;
  publish is skipped so the value is never used.

- ok-to-test.yaml could not start the called qualification.yaml
  workflow because cli-e2e and security-scan declare id-token: write
  and security-events: write. GitHub validates the union of permissions
  at workflow startup, before `if: inputs.privileged_ci` gating
  evaluates, so the run failed with startup_failure. Declare the
  permissions on the reusable-workflow call; the gated jobs are still
  skipped on the fork path (privileged_ci: false), so no
  fork-controlled code ever executes with elevated tokens.
@mchmarny mchmarny requested a review from a team as a code owner May 30, 2026 19:16
@mchmarny mchmarny added the bug label May 30, 2026
@mchmarny mchmarny self-assigned this May 30, 2026
@coderabbitai

coderabbitai Bot commented May 30, 2026

Copy link
Copy Markdown

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: e9d90df3-adfa-403b-8cac-96b2b9c3d07b

📥 Commits

Reviewing files that changed from the base of the PR and between 5d45e98 and 0d18b3f.

📒 Files selected for processing (2)
  • .github/workflows/ok-to-test.yaml
  • .github/workflows/packaging.yaml

📝 Walkthrough

Walkthrough

This PR updates two GitHub Actions workflow files with configuration clarifications and new permission declarations. The ok-to-test.yaml workflow documentation is revised to explain why id-token: write and security-events: write permissions must be declared even for fork PRs—the reusable qualification.yaml workflow validates the union of permissions at startup before if: conditions evaluate, and fails with startup_failure if required permissions are missing. The tests job now explicitly includes these permissions in its permissions block. The packaging.yaml workflow's dry-run job sets HOMEBREW_DEPLOY_KEY to an empty string with documentation noting this placeholder prevents template evaluation failures during dry runs.

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~5 minutes

Possibly related PRs

  • NVIDIA/aicr#1067: Both PRs modify .github/workflows/ok-to-test.yaml to change the tests job permissions passed into qualification.yaml, specifically around handling id-token: write and security-events: write for fork /ok-to-test runs.
  • NVIDIA/aicr#1064: Updates .github/workflows/ok-to-test.yaml and its qualification.yaml call to explicitly include id-token: write/security-events: write permissions and document their required startup validation, directly overlapping with permission handling changes.

Suggested labels

size/S, area/ci

Suggested reviewers

  • lockwobr
🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately summarizes the main changes: fixing two CI workflow failures (packaging dry-run and ok-to-test startup).
Description check ✅ Passed The description is detailed and directly related to the changeset, explaining the root causes and fixes for both workflow failures.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/ci-workflows-startup-failures

Comment @coderabbitai help to get the list of available commands and usage tips.

@mchmarny mchmarny merged commit 1c82645 into main May 30, 2026
33 of 34 checks passed
@mchmarny mchmarny deleted the fix/ci-workflows-startup-failures branch May 30, 2026 19:27
@github-actions

Copy link
Copy Markdown
Contributor

Coverage Report ✅

Metric Value
Coverage 76.2%
Threshold 75%
Status Pass
Coverage Badge
![Coverage](https://img.shields.io/badge/coverage-76.2%25-green)

No Go source files changed in this PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant