Failed EphemeralRunners block launching new pods #3685
Closed
Labels
bug, gha-runner-scale-set, needs triage
Description
Checks
- I've already read https://docs.github.com/en/actions/hosting-your-own-runners/managing-self-hosted-runners-with-actions-runner-controller/troubleshooting-actions-runner-controller-errors and I'm sure my issue is not covered in the troubleshooting guide.
- I am using charts that are officially provided
Controller Version
0.8.3
Deployment Method
Helm
Checks
- This isn't a question or user support case (For Q&A and community support, go to Discussions).
- I've read the Changelog before submitting this issue and I'm sure it's not due to any recently-introduced backward-incompatible changes
To Reproduce
1. Trigger a `FailedScheduling` event.
2. Wait for 5 failures in pod scheduling.
3. Recover the cluster.
4. New ephemeral runner pods will not be scheduled to meet capacity.
Describe the bug
When EphemeralRunners enter the Failed state, they stay stuck in that state, which prevents new pods from being launched. This issue has been noted previously in the discussions linked below.
status:
  currentRunners: 17
  failedEphemeralRunners: 16
  pendingEphemeralRunners: 0
  runningEphemeralRunners: 1
https://github.com/actions/actions-runner-controller/discussions/3300
https://github.com/actions/actions-runner-controller/discussions/3610
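To illustrate why the status above leaves almost no usable capacity, here is a minimal sketch of the arithmetic. It assumes that `currentRunners` is the sum of the failed, pending, and running counts, which is consistent with the numbers reported (16 + 0 + 1 = 17) but is not confirmed against the controller source. The helper names `accounted_runners` and `usable_runners` are hypothetical.

```python
# Status counts copied from the report above.
status = {
    "currentRunners": 17,
    "failedEphemeralRunners": 16,
    "pendingEphemeralRunners": 0,
    "runningEphemeralRunners": 1,
}


def accounted_runners(s: dict) -> int:
    """Sum of failed, pending, and running runners (assumed to equal currentRunners)."""
    return (
        s["failedEphemeralRunners"]
        + s["pendingEphemeralRunners"]
        + s["runningEphemeralRunners"]
    )


def usable_runners(s: dict) -> int:
    """Runners that can still pick up jobs: pending plus running."""
    return s["pendingEphemeralRunners"] + s["runningEphemeralRunners"]


if __name__ == "__main__":
    print("accounted:", accounted_runners(status))
    print("usable:", usable_runners(status))
```

Under that assumption, 16 of the 17 runners counted toward the current total are failed, so only one runner is actually able to take jobs.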
Describe the expected behavior
Failed EphemeralRunners will be cleared so that scheduling can be retried.
Additional Context
https://github.com/actions/actions-runner-controller/discussions/3610
https://github.com/actions/actions-runner-controller/discussions/3300
Controller Logs
2024-06-20T19:18:03Z INFO listener-app.worker.kubernetesworker Ephemeral runner set scaled. {"namespace": "my-scaleset-ns", "name": "my-runner-6pzbd", "replicas": 3}
2024-06-20T19:18:03Z INFO listener-app.listener Getting next message {"lastMessageID": 11}
2024-06-20T19:18:11Z INFO listener-app.listener Getting next message {"lastMessageID": 14}
2024-06-20T19:18:53Z INFO listener-app.listener Getting next message {"lastMessageID": 11}
2024-06-20T19:19:01Z INFO listener-app.listener Getting next message {"lastMessageID": 14}
Runner Pod Logs
2024-06-21T16:22:44Z INFO listener-app.worker.kubernetesworker Ephemeral runner set scaled. {"namespace": "my-scaleset", "name": "my-runner-rpvp2", "replicas": 10}