fix(auth): persist OAuth state for restart resilience#6034
fix(auth): persist OAuth state for restart resilience#6034clubanderson wants to merge 1 commit intofix/6022-6024-6026-securityfrom
Conversation
Replace the in-memory oauthStateStore map with a persistent oauth_states table so in-flight OAuth flows survive a backend restart between /auth/login and /auth/callback. Previously a restart between those two redirects would drop every pending state token and users would see csrf_validation_failed on the callback. - New oauth_states table + index in pkg/store/sqlite.go migrate() - Store interface gains StoreOAuthState / ConsumeOAuthState / CleanupExpiredOAuthStates (atomic single-transaction consume) - auth.go helpers become methods on *AuthHandler that delegate to the store; the in-memory map, global init goroutine, and inlineCleanup threshold are removed - NewAuthHandler starts a per-handler cleanup ticker that sweeps expired rows every oauthStateCleanupInterval - MockStore gains overridable StoreOAuthState / ConsumeOAuthState - Tests: store-level round-trip, single-use, expiry, cleanup, and a reopen-the-same-db "restart" test; handler-level restart test that instantiates two AuthHandlers against the same SQLite file Fixes #6028 Signed-off-by: Andrew Anderson <[email protected]>
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
|
👋 Hey @clubanderson — thanks for opening this PR!
This is an automated message. |
…covery, oauth_configured requires secret Fixes three backend auth edges: - #6056: /health reported oauth_configured=true when only GITHUB_CLIENT_ID was set but GITHUB_CLIENT_SECRET was missing. Such a config is unusable for the token exchange, so the probe misled the frontend into showing a GitHub login button that was guaranteed to fail. The check now requires both values. - #6063: A structurally malformed Authorization header (no Bearer prefix, 'Bearer' with no token, whitespace-only) returned 401 immediately even when a perfectly valid kc_auth cookie was attached. The middleware now treats a malformed header the same as an empty header and falls through to the cookie path. Complements #6026 (stale-but-valid-format header fallback already landed in #6031). - #6064: When validateAndConsumeOAuthState failed (stale OAuth tab, server restart that cleared the in-memory state store, back-button replay) the callback unconditionally redirected to /login?error=csrf_validation_failed even if the user already had a valid kc_auth cookie. The callback now checks for a non-expired, non-revoked JWT cookie on state failure and, if present, redirects to / so the live session is preserved. Complements #6034 which fixes the root cause of state loss on the server side. Signed-off-by: Andrew Anderson <[email protected]>
…covery, oauth_configured requires secret (#6076) Fixes three backend auth edges: - #6056: /health reported oauth_configured=true when only GITHUB_CLIENT_ID was set but GITHUB_CLIENT_SECRET was missing. Such a config is unusable for the token exchange, so the probe misled the frontend into showing a GitHub login button that was guaranteed to fail. The check now requires both values. - #6063: A structurally malformed Authorization header (no Bearer prefix, 'Bearer' with no token, whitespace-only) returned 401 immediately even when a perfectly valid kc_auth cookie was attached. The middleware now treats a malformed header the same as an empty header and falls through to the cookie path. Complements #6026 (stale-but-valid-format header fallback already landed in #6031). - #6064: When validateAndConsumeOAuthState failed (stale OAuth tab, server restart that cleared the in-memory state store, back-button replay) the callback unconditionally redirected to /login?error=csrf_validation_failed even if the user already had a valid kc_auth cookie. The callback now checks for a non-expired, non-revoked JWT cookie on state failure and, if present, redirects to / so the live session is preserved. Complements #6034 which fixes the root cause of state loss on the server side. Signed-off-by: Andrew Anderson <[email protected]>
Fixes #6028. Stacked on #6031 (base branch will auto-update to `main` once #6031 merges).
Problem
`pkg/api/handlers/auth.go` stored OAuth CSRF state tokens in an in-memory `map[string]time.Time`. If the backend restarted between `/auth/login` (where the state is generated and the user is redirected to GitHub) and `/auth/github/callback` (where the state must be validated), the map was gone and every in-flight user hit `csrf_validation_failed`.
Fix
Persist OAuth state in the backing store so in-flight flows survive restarts, as long as the user completes them within `oauthStateExpiration` (10 minutes, unchanged).
Tests
Verification
```
go build ./... # clean
go vet ./... # clean
go test ./pkg/store/... ./pkg/api/handlers/... # PASS
cd web && npm run build # PASS
```
Notes