Postmortem -
Feb 21, 08:53 UTC
Resolved -
## Timeline
**07:00 UTC - Issue detected**
The authentication service began failing because it could not reliably connect to its database. Unrelated workloads had driven up CPU usage on the shared database server.
**07:04 UTC - Service degraded**
All authentication requests began failing, including user logins, token grants, and token refreshes.
**07:15 UTC - Investigation started**
Engineering identified database connection pool exhaustion, triggered by the database instability, as the root cause. Kubernetes health checks were also restarting the service repeatedly before it could recover.
**08:00 UTC - Remediation applied**
Health check thresholds were adjusted and the database connection pool was resized to handle the backlog of queued requests.
**~09:30 UTC - Service restored**
The authentication service stabilized and resumed normal operation. No further restarts were observed.
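The restart loop noted at 07:15 and the threshold change at 08:00 can be illustrated with a simplified model. This is a rough sketch, not the service's actual configuration: the `LivenessProbe` class, the `is_crash_looping` helper, and all numeric values are hypothetical. Only the field names mirror the Kubernetes probe settings `initialDelaySeconds`, `periodSeconds`, and `failureThreshold`. The model ignores probe timeouts and scheduling jitter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LivenessProbe:
    """Hypothetical stand-in for a Kubernetes liveness probe's timing fields."""
    initial_delay_seconds: int
    period_seconds: int
    failure_threshold: int

    def restart_budget_seconds(self) -> int:
        # Probes fire at initial_delay, then once per period. The container
        # is restarted when the Nth consecutive probe fails, so an unhealthy
        # container gets roughly this long before it is killed.
        return self.initial_delay_seconds + self.period_seconds * (
            self.failure_threshold - 1
        )


def is_crash_looping(probe: LivenessProbe, recovery_seconds: int) -> bool:
    """True if the probe restarts the container before it can recover.

    recovery_seconds is an assumed figure: the uninterrupted uptime the
    service needs to rebuild its connection pool and drain queued requests.
    """
    return probe.restart_budget_seconds() < recovery_seconds


# Illustrative values only; the real thresholds are not stated in this report.
tight = LivenessProbe(initial_delay_seconds=5, period_seconds=10, failure_threshold=3)
relaxed = LivenessProbe(initial_delay_seconds=30, period_seconds=10, failure_threshold=12)
assumed_recovery = 60

print(is_crash_looping(tight, assumed_recovery))    # restart budget is 25 s
print(is_crash_looping(relaxed, assumed_recovery))  # restart budget is 140 s
```

Under these assumed numbers, the tight probe restarts the container after about 25 seconds, well short of the 60 seconds it needs, so the service never recovers. The relaxed probe leaves enough headroom for the pool to refill.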
Feb 21, 07:00 UTC