This repository was archived by the owner on Sep 30, 2024. It is now read-only.

Alert admins if containers are crashing/restarting #9793

Closed
emidoots opened this issue Apr 13, 2020 · 3 comments · Fixed by #17239

Comments

@emidoots
Member

When containers restart, alerts should fire. This should work in all deployment environments.

@uwedeportivo
Contributor

Dear all,

This is your release captain speaking. 🚂🚂🚂

Branch cut for the 3.15 release is scheduled for tomorrow.

Is this issue / PR going to make it in time? Please change the milestone accordingly.
When in doubt, reach out!

Thank you

@emidoots emidoots added the planned/3.15 Issues that were planned for the given milestone. Used by cmd/tracking-issue. label Apr 15, 2020
@emidoots emidoots modified the milestones: 3.15, 3.16 Apr 15, 2020
@emidoots emidoots added container-metrics planned/3.16 Issues that were planned for the given milestone. Used by cmd/tracking-issue. and removed planned/3.15 Issues that were planned for the given milestone. Used by cmd/tracking-issue. labels Apr 16, 2020
@emidoots emidoots removed this from the 3.16 milestone Apr 24, 2020
@emidoots emidoots removed the planned/3.16 Issues that were planned for the given milestone. Used by cmd/tracking-issue. label Apr 24, 2020
@bobheadxi
Member

bobheadxi commented May 29, 2020

This should be covered in all environments except server deployments (since https://github.com/sourcegraph/sourcegraph/issues/9791) by the cadvisor_container_restart_count metric and the associated alert, right?

@emidoots
Member Author

I think the only thing left to do here is confirm this works in k8s and docker-compose deployments. Then, we should promote the alert to a critical one (so admins would be paged) and ensure it won't fire on sourcegraph/server deployments.
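
For illustration only, the intended behaviour could be sketched roughly as below. This is not Sourcegraph's actual alert definition: the `Deployment` enum, the `restart_alert_level` function, and the "critical"/"none" strings are hypothetical names made up for this sketch, and the restart counts are assumed to be two consecutive samples of a per-container counter such as the `cadvisor_container_restart_count` metric mentioned above.

```python
from enum import Enum


class Deployment(Enum):
    """Hypothetical deployment types, named only for this sketch."""
    KUBERNETES = "kubernetes"
    DOCKER_COMPOSE = "docker-compose"
    SERVER = "server"  # single-container sourcegraph/server deployment


def restart_alert_level(deployment: Deployment,
                        previous_count: int,
                        current_count: int) -> str:
    """Return "critical" if the container restarted between two samples.

    Assumption: the counts are consecutive samples of a restart counter.
    The alert is suppressed on sourcegraph/server deployments, as the
    comment above proposes.
    """
    if deployment is Deployment.SERVER:
        return "none"
    # A higher count than the previous sample means at least one restart.
    # A lower count (for example after a counter reset) is ignored here.
    if current_count > previous_count:
        return "critical"
    return "none"


if __name__ == "__main__":
    print(restart_alert_level(Deployment.KUBERNETES, 2, 3))
    print(restart_alert_level(Deployment.SERVER, 2, 3))
```

In a real deployment this decision would more likely live in an alert rule evaluated by the monitoring stack than in application code; the function only makes the two conditions (restart detected, not a server deployment) explicit.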

@emidoots emidoots changed the title from "monitoring for container restarts" to "Alert admins when containers crash/restart" Jun 23, 2020
@emidoots emidoots changed the title from "Alert admins when containers crash/restart" to "Alert admins if containers are crashing/restarting" Jun 23, 2020
@emidoots emidoots added this to the Backlog milestone Jul 1, 2020
3 participants