fix(security): gitops RBAC, exec session lifecycle, auth header fallback#6031
fix(security): gitops RBAC, exec session lifecycle, auth header fallback#6031clubanderson merged 1 commit intomainfrom
Conversation
…eader fallback Bundles three backend security fixes: - #6022: Mutating GitOps endpoints (sync, helm upgrade/uninstall/rollback, ArgoCD sync) are now gated as editor-or-admin via a shared requireEditorOrAdmin helper in pkg/api/handlers/auth_helpers.go. The legacy admin-only helper in gitops.go was removed; drift detection is now gated as viewer-or-above since it is read-oriented. Tests cover viewer/editor/admin for all five mutation endpoints plus drift. - #6024: /ws/exec sessions now register their stream context cancel funcs in a per-user registry (pkg/api/handlers/exec.go). The auth Logout handler calls the new CancelUserExecSessions(userID) after wsHub.DisconnectUser, which was previously a no-op for exec because the exec handler does not participate in the WebSocket Hub. Deferred cleanup in the exec handler removes the entry on normal session end so the registry stays bounded by live sessions. - #6026: JWTAuth middleware now falls back to the kc_auth cookie when the Authorization header token fails to parse AND a valid cookie is present. Without this, in-flight requests after a silent token refresh returned 401 even though the session was valid. The fallback only engages on header parse failure — a valid header is still preferred unconditionally. Fixes #6022 Fixes #6024 Fixes #6026 Signed-off-by: Andrew Anderson <[email protected]>
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
✅ Deploy Preview for kubestellarconsole canceled.
|
|
👋 Hey @clubanderson — thanks for opening this PR!
This is an automated message. |
There was a problem hiding this comment.
Pull request overview
Backend security hardening across GitOps RBAC enforcement, exec session termination on logout, and JWT auth token source selection to avoid stale-header 401s.
Changes:
- Add shared RBAC helpers and gate GitOps mutation/drift endpoints to match the updated role policy.
- Track active
/ws/execsessions per user and cancel them during logout to prevent post-logout shells. - Update JWTAuth middleware to fall back to a valid
kc_authcookie when a stale/invalid Authorization header is present; add unit tests.
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| pkg/api/middleware/auth.go | Adds cookie fallback when Authorization header JWT parsing fails. |
| pkg/api/middleware/auth_test.go | Adds tests covering stale-header + valid-cookie fallback scenarios. |
| pkg/api/handlers/auth_helpers.go | Introduces shared RBAC helpers used by GitOps handlers. |
| pkg/api/handlers/gitops.go | Applies shared RBAC helpers to GitOps sync/helm/argocd sync and drift detection. |
| pkg/api/handlers/gitops_test.go | Refactors/expands RBAC tests for GitOps endpoints. |
| pkg/api/handlers/exec.go | Adds per-user exec session registry and registers cancel funcs for logout-driven teardown. |
| pkg/api/handlers/exec_test.go | Adds unit tests for exec session registry cancellation/unregister behavior. |
| pkg/api/handlers/auth.go | Cancels active exec sessions on logout in addition to hub-based WS disconnects. |
| // Register this cancel with the per-user exec session registry so a later | ||
| // Logout call can tear the stream down even though /ws/exec is not tracked | ||
| // by the WebSocket Hub (#6024). Only register when we have a real userID — | ||
| // in dev/demo without a valid UserID claim there is nothing to key on. | ||
| var execRegistrationID int64 |
There was a problem hiding this comment.
There is a race window where an exec session can outlive logout: the execCancel registration happens only after init parsing and k8s/executor setup. If the user logs out after JWT validation but before this registration point, Logout won’t find this session in the registry, and the exec stream can continue running even though the token has been revoked. Consider creating the execCtx/execCancel and registering it immediately after successful JWT validation (and unregistering via defer), so the registry covers the full lifetime of the WS connection from the moment it becomes authenticated.
| func gitopsMutationCases() []mutationCase { | ||
| return []mutationCase{ | ||
| { | ||
| name: "sync", | ||
| path: "/api/gitops/sync", |
There was a problem hiding this comment.
RBAC tests for mutating GitOps endpoints don’t include POST /api/gitops/argocd/sync, even though TriggerArgoSync is now gated by requireEditorOrAdmin (#6022). Adding an argocd-sync entry to gitopsMutationCases (with a per-case expected status or an assertion like “not 403” for the allowed roles, since the handler may return 503 when k8sClient is nil) would prevent regressions where that endpoint accidentally becomes viewer-accessible again.
Fixes #6029. SSE streaming endpoints (streamClusters in sse.go) ran inside SetBodyStreamWriter callbacks that blocked for up to sseOverallDeadline (~30s), so when a user logged out any in-flight SSE streams continued to emit cluster_data events until the deadline fired or the client disconnected. Mirrors the exec session cancellation pattern added in #6031 for /ws/exec sessions (#6024): - Add a per-user SSE session registry in sse.go with registerSSESession / unregisterSSESession / CancelUserSSEStreams, guarded by a plain sync.Mutex. - In streamClusters, capture the authenticated user ID before the deferred SetBodyStreamWriter callback runs (the fiber.Ctx may be reused by then) and register the stream context's cancel func in the registry, with a deferred unregister for normal stream end. - In auth.Logout, after CancelUserExecSessions, call CancelUserSSEStreams(claims.UserID) so logout tears down any live SSE streams the user had open. - Add sse_test.go with the four lifecycle tests mirroring exec_test.go: CancelsRegisteredContexts, OtherUserUnaffected, UnregisterSSESession_RemovesEntry, and NoSessions (no panic on empty map). streamDemoSSE is left unchanged — demo streams are instant, they do not block, so they need no cancellation path. Signed-off-by: Andrew Anderson <[email protected]>
|
Thank you for your contribution! Your PR has been merged. Check out what's new:
Stay connected: Slack #kubestellar-dev | Multi-Cluster Survey |
|
Post-merge build verification passed ✅ Both Go and frontend builds compiled successfully against merge commit |
Fixes #6029. SSE streaming endpoints (streamClusters in sse.go) ran inside SetBodyStreamWriter callbacks that blocked for up to sseOverallDeadline (~30s), so when a user logged out any in-flight SSE streams continued to emit cluster_data events until the deadline fired or the client disconnected. Mirrors the exec session cancellation pattern added in #6031 for /ws/exec sessions (#6024): - Add a per-user SSE session registry in sse.go with registerSSESession / unregisterSSESession / CancelUserSSEStreams, guarded by a plain sync.Mutex. - In streamClusters, capture the authenticated user ID before the deferred SetBodyStreamWriter callback runs (the fiber.Ctx may be reused by then) and register the stream context's cancel func in the registry, with a deferred unregister for normal stream end. - In auth.Logout, after CancelUserExecSessions, call CancelUserSSEStreams(claims.UserID) so logout tears down any live SSE streams the user had open. - Add sse_test.go with the four lifecycle tests mirroring exec_test.go: CancelsRegisteredContexts, OtherUserUnaffected, UnregisterSSESession_RemovesEntry, and NoSessions (no panic on empty map). streamDemoSSE is left unchanged — demo streams are instant, they do not block, so they need no cancellation path. Signed-off-by: Andrew Anderson <[email protected]>
✅ Post-Merge Verification: passedCommit: |
…covery, oauth_configured requires secret Fixes three backend auth edges: - #6056: /health reported oauth_configured=true when only GITHUB_CLIENT_ID was set but GITHUB_CLIENT_SECRET was missing. Such a config is unusable for the token exchange, so the probe misled the frontend into showing a GitHub login button that was guaranteed to fail. The check now requires both values. - #6063: A structurally malformed Authorization header (no Bearer prefix, 'Bearer' with no token, whitespace-only) returned 401 immediately even when a perfectly valid kc_auth cookie was attached. The middleware now treats a malformed header the same as an empty header and falls through to the cookie path. Complements #6026 (stale-but-valid-format header fallback already landed in #6031). - #6064: When validateAndConsumeOAuthState failed (stale OAuth tab, server restart that cleared the in-memory state store, back-button replay) the callback unconditionally redirected to /login?error=csrf_validation_failed even if the user already had a valid kc_auth cookie. The callback now checks for a non-expired, non-revoked JWT cookie on state failure and, if present, redirects to / so the live session is preserved. Complements #6034 which fixes the root cause of state loss on the server side. Signed-off-by: Andrew Anderson <[email protected]>
Two cleanups flagged on the merged security bundle PR. 1. exec.go race window — close the gap where /ws/exec could outlive logout. Previously the exec context was created and registered in the per-user registry only AFTER reading the init message, looking up the k8s client + REST config, and building the SPDY executor. If the user logged out during that ~hundreds-of-ms window, CancelUserExecSessions had nothing to cancel — the about-to-start exec session leaked past logout. Move execCtx + registerExecSession() to immediately after JWT validation, before any of the long-running setup. Registration now happens essentially synchronously with the first byte the handler receives, so a follow-up Logout always finds and cancels the in-flight session. 2. gitops_test.go — add /api/gitops/argocd/sync to gitopsMutationCases so TriggerArgoSync is exercised by the editor/admin/viewer RBAC matrix the same way the helm-* and sync handlers are. Adding the test surfaced a pre-existing ordering bug in TriggerArgoSync: the k8sClient nil check ran BEFORE body parsing and required-field validation, so a request with valid auth and a valid body shape would 503 instead of returning the 400 the matrix expects. Reordered to body-parse → required-fields → validateK8sName → k8sClient check, matching Sync / UpgradeHelmRelease conventions. Fixes #6075 Signed-off-by: Andrew Anderson <[email protected]>
Fixes #6029. SSE streaming endpoints (streamClusters in sse.go) ran inside SetBodyStreamWriter callbacks that blocked for up to sseOverallDeadline (~30s), so when a user logged out any in-flight SSE streams continued to emit cluster_data events until the deadline fired or the client disconnected. Mirrors the exec session cancellation pattern added in #6031 for /ws/exec sessions (#6024): - Add a per-user SSE session registry in sse.go with registerSSESession / unregisterSSESession / CancelUserSSEStreams, guarded by a plain sync.Mutex. - In streamClusters, capture the authenticated user ID before the deferred SetBodyStreamWriter callback runs (the fiber.Ctx may be reused by then) and register the stream context's cancel func in the registry, with a deferred unregister for normal stream end. - In auth.Logout, after CancelUserExecSessions, call CancelUserSSEStreams(claims.UserID) so logout tears down any live SSE streams the user had open. - Add sse_test.go with the four lifecycle tests mirroring exec_test.go: CancelsRegisteredContexts, OtherUserUnaffected, UnregisterSSESession_RemovesEntry, and NoSessions (no panic on empty map). streamDemoSSE is left unchanged — demo streams are instant, they do not block, so they need no cancellation path. Signed-off-by: Andrew Anderson <[email protected]>
…covery, oauth_configured requires secret (#6076) Fixes three backend auth edges: - #6056: /health reported oauth_configured=true when only GITHUB_CLIENT_ID was set but GITHUB_CLIENT_SECRET was missing. Such a config is unusable for the token exchange, so the probe misled the frontend into showing a GitHub login button that was guaranteed to fail. The check now requires both values. - #6063: A structurally malformed Authorization header (no Bearer prefix, 'Bearer' with no token, whitespace-only) returned 401 immediately even when a perfectly valid kc_auth cookie was attached. The middleware now treats a malformed header the same as an empty header and falls through to the cookie path. Complements #6026 (stale-but-valid-format header fallback already landed in #6031). - #6064: When validateAndConsumeOAuthState failed (stale OAuth tab, server restart that cleared the in-memory state store, back-button replay) the callback unconditionally redirected to /login?error=csrf_validation_failed even if the user already had a valid kc_auth cookie. The callback now checks for a non-expired, non-revoked JWT cookie on state failure and, if present, redirects to / so the live session is preserved. Complements #6034 which fixes the root cause of state loss on the server side. Signed-off-by: Andrew Anderson <[email protected]>
Two cleanups flagged on the merged security bundle PR. 1. exec.go race window — close the gap where /ws/exec could outlive logout. Previously the exec context was created and registered in the per-user registry only AFTER reading the init message, looking up the k8s client + REST config, and building the SPDY executor. If the user logged out during that ~hundreds-of-ms window, CancelUserExecSessions had nothing to cancel — the about-to-start exec session leaked past logout. Move execCtx + registerExecSession() to immediately after JWT validation, before any of the long-running setup. Registration now happens essentially synchronously with the first byte the handler receives, so a follow-up Logout always finds and cancels the in-flight session. 2. gitops_test.go — add /api/gitops/argocd/sync to gitopsMutationCases so TriggerArgoSync is exercised by the editor/admin/viewer RBAC matrix the same way the helm-* and sync handlers are. Adding the test surfaced a pre-existing ordering bug in TriggerArgoSync: the k8sClient nil check ran BEFORE body parsing and required-field validation, so a request with valid auth and a valid body shape would 503 instead of returning the 400 the matrix expects. Reordered to body-parse → required-fields → validateK8sName → k8sClient check, matching Sync / UpgradeHelmRelease conventions. Fixes #6075 Signed-off-by: Andrew Anderson <[email protected]>
Fixes #6029. SSE streaming endpoints (streamClusters in sse.go) ran inside SetBodyStreamWriter callbacks that blocked for up to sseOverallDeadline (~30s), so when a user logged out any in-flight SSE streams continued to emit cluster_data events until the deadline fired or the client disconnected. Mirrors the exec session cancellation pattern added in #6031 for /ws/exec sessions (#6024): - Add a per-user SSE session registry in sse.go with registerSSESession / unregisterSSESession / CancelUserSSEStreams, guarded by a plain sync.Mutex. - In streamClusters, capture the authenticated user ID before the deferred SetBodyStreamWriter callback runs (the fiber.Ctx may be reused by then) and register the stream context's cancel func in the registry, with a deferred unregister for normal stream end. - In auth.Logout, after CancelUserExecSessions, call CancelUserSSEStreams(claims.UserID) so logout tears down any live SSE streams the user had open. - Add sse_test.go with the four lifecycle tests mirroring exec_test.go: CancelsRegisteredContexts, OtherUserUnaffected, UnregisterSSESession_RemovesEntry, and NoSessions (no panic on empty map). streamDemoSSE is left unchanged — demo streams are instant, they do not block, so they need no cancellation path. Signed-off-by: Andrew Anderson <[email protected]>
Fixes #6029. SSE streaming endpoints (streamClusters in sse.go) ran inside SetBodyStreamWriter callbacks that blocked for up to sseOverallDeadline (~30s), so when a user logged out any in-flight SSE streams continued to emit cluster_data events until the deadline fired or the client disconnected. Mirrors the exec session cancellation pattern added in #6031 for /ws/exec sessions (#6024): - Add a per-user SSE session registry in sse.go with registerSSESession / unregisterSSESession / CancelUserSSEStreams, guarded by a plain sync.Mutex. - In streamClusters, capture the authenticated user ID before the deferred SetBodyStreamWriter callback runs (the fiber.Ctx may be reused by then) and register the stream context's cancel func in the registry, with a deferred unregister for normal stream end. - In auth.Logout, after CancelUserExecSessions, call CancelUserSSEStreams(claims.UserID) so logout tears down any live SSE streams the user had open. - Add sse_test.go with the four lifecycle tests mirroring exec_test.go: CancelsRegisteredContexts, OtherUserUnaffected, UnregisterSSESession_RemovesEntry, and NoSessions (no panic on empty map). streamDemoSSE is left unchanged — demo streams are instant, they do not block, so they need no cancellation path. Signed-off-by: Andrew Anderson <[email protected]>
Summary
Bundles three backend security fixes targeting RBAC, session lifecycle, and authentication flow. All three are small, targeted changes with unit tests; no frontend changes.
#6022 — GitOps endpoints lack RBAC (editor-or-admin for mutations)
Mutating GitOps endpoints are now gated by a shared
requireEditorOrAdminhelper extracted topkg/api/handlers/auth_helpers.go(new file). The legacy admin-only helper insidegitops.gowas removed, and drift detection is now gated as viewer-or-above (it is read-oriented but still sensitive).Gated endpoints:
POST /api/gitops/sync— editor-or-adminPOST /api/gitops/helm-upgrade— editor-or-adminPOST /api/gitops/helm-uninstall— editor-or-adminPOST /api/gitops/helm-rollback— editor-or-adminPOST /api/gitops/argocd/sync— editor-or-adminPOST /api/gitops/detect-drift— viewer-or-aboveThe shared helper takes a
store.Storeso it can be reused by any handler without refactoring existing handler structs (theCardHandler.requireEditorOrAdminmethod incards.gois left alone to avoid unrelated churn).#6024 — Exec sessions usable after logout
/ws/execdid not participate in the WebSocket Hub, soHub.DisconnectUser(userID)in Logout could not reach exec sessions. The exec handler now registers its stream context cancel func in a per-user registry (execSessionsmap guarded by async.Mutex). On normal session end a deferred cleanup removes the entry so the registry stays bounded by live session count, not lifetime count.Logoutcalls the new exportedCancelUserExecSessions(userID)afterwsHub.DisconnectUser. That unblocksexecutor.StreamWithContext, which in turn causes the WebSocket read loop to exit and close the connection.#6026 — Stale Authorization header overrides cookie
JWTAuthmiddleware unconditionally preferred theAuthorizationheader. If the header token was stale (happens after silent refresh — in-flight requests may still carry the old bearer value), the middleware returned 401 even when thekc_authcookie held a valid token.Fix: if header parse fails AND a
kc_authcookie is present AND the cookie parses successfully, use the cookie. Only engaged on header-parse failure, so the valid-header happy path is unchanged. Added an inline comment explaining the scenario for future readers.Implementation notes
pkg/api/handlers/auth_helpers.go(new) holdsrequireEditorOrAdminandrequireViewerOrAboveas package-level functions taking astore.Storeparameter. This lets gitops reuse the logic without embedding a base struct and without touching cards.go (which still has its own handler-method form).sync.Mutex(not RWMutex) as requested — add/remove/cancel are all infrequent and short. Map layout ismap[uuid.UUID]map[int64]context.CancelFuncso unregister by session id works without iterating.testExecCancelWait = 250 * time.MillisecondandstatusPastRBAC = http.StatusBadRequestare the only new constants; both named.GitOpsHandlers.requireAdminmethod was deleted, not wrapped. Its imports (middleware,models) were also pruned from gitops.go.Test plan
go build ./...passesgo vet ./...passesgo test ./pkg/... -count=1passes (all packages green, including 12 new/updated tests acrossgitops_test.go,exec_test.go,auth_test.go)cd web && npm run buildpasses including post-build safety checks/api/gitops/sync→ 403/api/gitops/sync→ reaches body validation/ws/exec,POST /auth/logoutin another tab, verify shell diesFixes #6022
Fixes #6024
Fixes #6026