Ensure ephemeral runner is deleted from the service on exit != 0#4260

Merged
nikola-jokic merged 3 commits into master from nikola-jokic/remove-er-exit-nonzero
Oct 6, 2025

@nikola-jokic nikola-jokic commented Sep 26, 2025

When a runner exits with a non-zero exit code, make sure that it is removed from the API.
The assumptions are:

  1. If the runner exits with 0, the runner entrypoint was successful, and therefore the runner successfully deregistered itself.
  2. If the exit code is non-zero, the cause is either a disruption (from the node or something else) that interrupted the runner before it could deregister itself, or a wrong entrypoint.
  3. If the entrypoint was wrong, the delete request increases the number of API requests, but that is how it previously worked, so it only leads to a slower reconcile loop. Since the entrypoint is mostly run.sh, which is shipped by the runner, this case is rare.
  4. The more likely case is that the runner couldn't finish due to an interruption. In that case, we delete the runner, and we should also clean it up from the service.

Copilot AI review requested due to automatic review settings September 26, 2025 11:33
@nikola-jokic nikola-jokic added the gha-runner-scale-set Related to the gha-runner-scale-set mode label Sep 26, 2025

Copilot AI left a comment

Pull Request Overview

This PR adds cleanup logic to ensure ephemeral runners are properly removed from the GitHub Actions service when a pod fails and exits with a non-zero code. The change prevents orphaned runners from remaining registered in the service after their associated pods have been deleted due to failure.

Key Changes

  • Added service cleanup after deleting failed ephemeral runner pods
  • Implemented proper error handling for the runner removal process

Comment on lines +345 to +349
return ctrl.Result{}, nil
}
if err := actionsClient.RemoveRunner(ctx, int64(ephemeralRunner.Status.RunnerId)); err != nil {
log.Error(err, "Failed to remove the runner from the service")
return ctrl.Result{}, nil

Copilot AI Sep 26, 2025

Both error cases return ctrl.Result{}, nil instead of the actual error. This swallows the errors and prevents proper error propagation. Consider returning the errors or logging them more prominently if the intention is to continue despite failures.

Suggested change
- return ctrl.Result{}, nil
- }
- if err := actionsClient.RemoveRunner(ctx, int64(ephemeralRunner.Status.RunnerId)); err != nil {
- log.Error(err, "Failed to remove the runner from the service")
- return ctrl.Result{}, nil
+ return ctrl.Result{}, err
+ }
+ if err := actionsClient.RemoveRunner(ctx, int64(ephemeralRunner.Status.RunnerId)); err != nil {
+ log.Error(err, "Failed to remove the runner from the service")
+ return ctrl.Result{}, err

@nikola-jokic nikola-jokic merged commit 94a6f3c into master Oct 6, 2025
31 of 33 checks passed
@nikola-jokic nikola-jokic deleted the nikola-jokic/remove-er-exit-nonzero branch October 6, 2025 09:38
Okabe-Junya added a commit to mercari/actions-runner-controller that referenced this pull request Jan 5, 2026
unpollito pushed a commit to DistruApp/actions-runner-controller that referenced this pull request Jan 21, 2026
Labels

gha-runner-scale-set Related to the gha-runner-scale-set mode

3 participants