
[26.1 backport] gha: add guardrails timeouts on all jobs #48647

Merged
thaJeztah merged 2 commits into moby:26.1 from
austinvazquez:cherry-pick-c68c9aed8cb3916669de6d7f2c564279ec83663f-to-26.1
Oct 12, 2024

@austinvazquez
Contributor

- What I did

- How I did it

git cherry-pick -xsS 6b7e2783d1e68c4da3764525e3e8e74b85d0d8c8
git cherry-pick -xsS c68c9aed8cb3916669de6d7f2c564279ec83663f

- Description for the changelog

n/a

- A picture of a cute animal (not mandatory but encouraged)

We had a few "runaway jobs" recently, where the job got stuck and kept
running for 6 hours (in one case even 24 hours, probably due to a GitHub
outage). Some of those jobs could not be terminated.

While running these actions on public repositories doesn't cost us, it's
still not desirable to have jobs running for that long (as they can still
hold up the queue).

This patch adds a blanket 2-hour time limit to all jobs that didn't
have a limit set. We should look at tuning those limits to match the
actual expected duration, but having a default is at least a start.

Also moved some existing timeouts so that the limit is always set in a
consistent position, making it easier to spot jobs where no limit is
defined.

Signed-off-by: Sebastiaan van Stijn <[email protected]>
(cherry picked from commit 6b7e278)
Signed-off-by: Austin Vazquez <[email protected]>
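For illustration: a job-level limit in a GitHub Actions workflow is set with the `timeout-minutes` key, and a job without it falls back to the default of 360 minutes (6 hours), which matches the runaway runs described above. The fragment below is a minimal sketch, not the actual diff from this PR; the job names `build` and `test`, the runner label, and the `make` steps are hypothetical. It places `timeout-minutes` directly after `runs-on` in every job to show what a consistent position might look like.

```yaml
jobs:
  build:                     # hypothetical job name
    runs-on: ubuntu-22.04
    timeout-minutes: 120     # blanket 2-hour guardrail; without it the default is 360 minutes
    steps:
      - uses: actions/checkout@v4
      - run: make binary     # hypothetical build step
  test:                      # hypothetical job name
    runs-on: ubuntu-22.04
    timeout-minutes: 120     # same position in every job, so a missing limit stands out
    steps:
      - uses: actions/checkout@v4
      - run: make test       # hypothetical test step
```

Keeping the key in the same spot in each job is what makes a job with no limit easy to catch during review.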

We had a couple of runs where these jobs got stuck and GitHub Actions
didn't allow terminating them, so they were only terminated after
120 minutes.

These jobs usually complete in 5 minutes, so let's give them
a shorter timeout. 20 minutes should be enough (don't @ me).

Signed-off-by: Sebastiaan van Stijn <[email protected]>
(cherry picked from commit c68c9ae)
Signed-off-by: Austin Vazquez <[email protected]>
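A matching sketch for the shorter limit. This is again a hypothetical fragment rather than the real workflow change: the job name `quick-check` and its step are placeholders for a job that usually completes in about 5 minutes. Setting `timeout-minutes: 20` means a hung run is cancelled after 20 minutes instead of sitting on a runner until the 2-hour blanket limit.

```yaml
jobs:
  quick-check:               # hypothetical job that normally finishes in ~5 minutes
    runs-on: ubuntu-22.04
    timeout-minutes: 20      # roughly 4x the usual duration; overrides the 120-minute blanket value
    steps:
      - uses: actions/checkout@v4
      - run: make check      # hypothetical step
```

The margin over the normal duration leaves room for slow runners while still cutting a stuck job short.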
@austinvazquez
Contributor Author

I was thinking it would be good to have these in the maintenance branches.

@thaJeztah
Member

> I was thinking it would be good to have these in the maintenance branches.

Yes, it is! I didn't think it was critical, but it's definitely good to have. And evidently we need to be even more aggressive: on this PR, one of the bin-image jobs hung and was only terminated after 2 hours, so it looks like we can set those a lot shorter as well: https://github.com/moby/moby/actions/runs/11301696824/job/31436480660?pr=48647

[Screenshot 2024-10-12 at 16:08:33]

Re-running it completed in less than 4 minutes.

[Screenshot 2024-10-12 at 16:16:25]

Not sure what's causing these, though; they only started showing up recently. Either something changed in the GHA runners, or there's a deadlock somewhere (but I don't think the Docker Engine versions changed in GHA).

Member

@thaJeztah thaJeztah left a comment


LGTM

@thaJeztah thaJeztah merged commit 95807d2 into moby:26.1 Oct 12, 2024
@austinvazquez austinvazquez deleted the cherry-pick-c68c9aed8cb3916669de6d7f2c564279ec83663f-to-26.1 branch October 13, 2024 03:04

2 participants