Skip to content

fix(security): gitops RBAC, exec session lifecycle, auth header fallback#6031

Merged
clubanderson merged 1 commit intomainfrom
fix/6022-6024-6026-security
Apr 10, 2026
Merged

fix(security): gitops RBAC, exec session lifecycle, auth header fallback#6031
clubanderson merged 1 commit intomainfrom
fix/6022-6024-6026-security

Conversation

@clubanderson
Copy link
Copy Markdown
Collaborator

Summary

Bundles three backend security fixes targeting RBAC, session lifecycle, and authentication flow. All three are small, targeted changes with unit tests; no frontend changes.

#6022 — GitOps endpoints lack RBAC (editor-or-admin for mutations)

Mutating GitOps endpoints are now gated by a shared requireEditorOrAdmin helper extracted to pkg/api/handlers/auth_helpers.go (new file). The legacy admin-only helper inside gitops.go was removed, and drift detection is now gated as viewer-or-above (it is read-oriented but still sensitive).

Gated endpoints:

  • POST /api/gitops/sync — editor-or-admin
  • POST /api/gitops/helm-upgrade — editor-or-admin
  • POST /api/gitops/helm-uninstall — editor-or-admin
  • POST /api/gitops/helm-rollback — editor-or-admin
  • POST /api/gitops/argocd/sync — editor-or-admin
  • POST /api/gitops/detect-drift — viewer-or-above

The shared helper takes a store.Store so it can be reused by any handler without refactoring existing handler structs (the CardHandler.requireEditorOrAdmin method in cards.go is left alone to avoid unrelated churn).

#6024 — Exec sessions usable after logout

/ws/exec did not participate in the WebSocket Hub, so Hub.DisconnectUser(userID) in Logout could not reach exec sessions. The exec handler now registers its stream context cancel func in a per-user registry (execSessions map guarded by a sync.Mutex). On normal session end a deferred cleanup removes the entry so the registry stays bounded by live session count, not lifetime count.

Logout calls the new exported CancelUserExecSessions(userID) after wsHub.DisconnectUser. That unblocks executor.StreamWithContext, which in turn causes the WebSocket read loop to exit and close the connection.

#6026 — Stale Authorization header overrides cookie

JWTAuth middleware unconditionally preferred the Authorization header. If the header token was stale (happens after silent refresh — in-flight requests may still carry the old bearer value), the middleware returned 401 even when the kc_auth cookie held a valid token.

Fix: if header parse fails AND a kc_auth cookie is present AND the cookie parses successfully, use the cookie. Only engaged on header-parse failure, so the valid-header happy path is unchanged. Added an inline comment explaining the scenario for future readers.

Implementation notes

  • Shared helpers: pkg/api/handlers/auth_helpers.go (new) holds requireEditorOrAdmin and requireViewerOrAbove as package-level functions taking a store.Store parameter. This lets gitops reuse the logic without embedding a base struct and without touching cards.go (which still has its own handler-method form).
  • Exec registry: sync.Mutex (not RWMutex) as requested — add/remove/cancel are all infrequent and short. Map layout is map[uuid.UUID]map[int64]context.CancelFunc so unregister by session id works without iterating.
  • No magic numbers: testExecCancelWait = 250 * time.Millisecond and statusPastRBAC = http.StatusBadRequest are the only new constants; both named.
  • No backwards-compat shims: the old GitOpsHandlers.requireAdmin method was deleted, not wrapped. Its imports (middleware, models) were also pruned from gitops.go.

Test plan

  • go build ./... passes
  • go vet ./... passes
  • go test ./pkg/... -count=1 passes (all packages green, including 12 new/updated tests across gitops_test.go, exec_test.go, auth_test.go)
  • cd web && npm run build passes including post-build safety checks
  • Manual: login as viewer, POST /api/gitops/sync → 403
  • Manual: login as editor, POST /api/gitops/sync → reaches body validation
  • Manual: open /ws/exec, POST /auth/logout in another tab, verify shell dies
  • Manual: send request with stale bearer header + fresh cookie → 200

Fixes #6022
Fixes #6024
Fixes #6026

…eader fallback

Bundles three backend security fixes:

- #6022: Mutating GitOps endpoints (sync, helm upgrade/uninstall/rollback,
  ArgoCD sync) are now gated as editor-or-admin via a shared
  requireEditorOrAdmin helper in pkg/api/handlers/auth_helpers.go. The
  legacy admin-only helper in gitops.go was removed; drift detection is
  now gated as viewer-or-above since it is read-oriented. Tests cover
  viewer/editor/admin for all five mutation endpoints plus drift.

- #6024: /ws/exec sessions now register their stream context cancel funcs
  in a per-user registry (pkg/api/handlers/exec.go). The auth Logout
  handler calls the new CancelUserExecSessions(userID) after
  wsHub.DisconnectUser, which was previously a no-op for exec because
  the exec handler does not participate in the WebSocket Hub. Deferred
  cleanup in the exec handler removes the entry on normal session end
  so the registry stays bounded by live sessions.

- #6026: JWTAuth middleware now falls back to the kc_auth cookie when
  the Authorization header token fails to parse AND a valid cookie is
  present. Without this, in-flight requests after a silent token
  refresh returned 401 even though the session was valid. The fallback
  only engages on header parse failure — a valid header is still
  preferred unconditionally.

Fixes #6022
Fixes #6024
Fixes #6026

Signed-off-by: Andrew Anderson <[email protected]>
@clubanderson clubanderson added the ai-generated Pull request generated by AI label Apr 10, 2026
Copilot AI review requested due to automatic review settings April 10, 2026 10:55
@kubestellar-prow kubestellar-prow Bot added the dco-signoff: yes Indicates the PR's author has signed the DCO. label Apr 10, 2026
@kubestellar-prow
Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign clubanderson for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@netlify
Copy link
Copy Markdown

netlify Bot commented Apr 10, 2026

Deploy Preview for kubestellarconsole canceled.

Name Link
🔨 Latest commit 761de6d
🔍 Latest deploy log https://app.netlify.com/projects/kubestellarconsole/deploys/69d8d70799270700087ebe46

@github-actions
Copy link
Copy Markdown
Contributor

👋 Hey @clubanderson — thanks for opening this PR!

🤖 This project is developed exclusively using AI coding assistants.

Please do not attempt to code anything for this project manually.
All contributions should be authored using an AI coding tool such as:

This ensures consistency in code style, architecture patterns, test coverage,
and commit quality across the entire codebase.


This is an automated message.

@kubestellar-prow kubestellar-prow Bot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label Apr 10, 2026
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Backend security hardening across GitOps RBAC enforcement, exec session termination on logout, and JWT auth token source selection to avoid stale-header 401s.

Changes:

  • Add shared RBAC helpers and gate GitOps mutation/drift endpoints to match the updated role policy.
  • Track active /ws/exec sessions per user and cancel them during logout to prevent post-logout shells.
  • Update JWTAuth middleware to fall back to a valid kc_auth cookie when a stale/invalid Authorization header is present; add unit tests.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
pkg/api/middleware/auth.go Adds cookie fallback when Authorization header JWT parsing fails.
pkg/api/middleware/auth_test.go Adds tests covering stale-header + valid-cookie fallback scenarios.
pkg/api/handlers/auth_helpers.go Introduces shared RBAC helpers used by GitOps handlers.
pkg/api/handlers/gitops.go Applies shared RBAC helpers to GitOps sync/helm/argocd sync and drift detection.
pkg/api/handlers/gitops_test.go Refactors/expands RBAC tests for GitOps endpoints.
pkg/api/handlers/exec.go Adds per-user exec session registry and registers cancel funcs for logout-driven teardown.
pkg/api/handlers/exec_test.go Adds unit tests for exec session registry cancellation/unregister behavior.
pkg/api/handlers/auth.go Cancels active exec sessions on logout in addition to hub-based WS disconnects.

Comment thread pkg/api/handlers/exec.go
Comment on lines +389 to +393
// Register this cancel with the per-user exec session registry so a later
// Logout call can tear the stream down even though /ws/exec is not tracked
// by the WebSocket Hub (#6024). Only register when we have a real userID —
// in dev/demo without a valid UserID claim there is nothing to key on.
var execRegistrationID int64
Copy link

Copilot AI Apr 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a race window where an exec session can outlive logout: the execCancel registration happens only after init parsing and k8s/executor setup. If the user logs out after JWT validation but before this registration point, Logout won’t find this session in the registry, and the exec stream can continue running even though the token has been revoked. Consider creating the execCtx/execCancel and registering it immediately after successful JWT validation (and unregistering via defer), so the registry covers the full lifetime of the WS connection from the moment it becomes authenticated.

Copilot uses AI. Check for mistakes.
Comment on lines +144 to +148
func gitopsMutationCases() []mutationCase {
return []mutationCase{
{
name: "sync",
path: "/api/gitops/sync",
Copy link

Copilot AI Apr 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RBAC tests for mutating GitOps endpoints don’t include POST /api/gitops/argocd/sync, even though TriggerArgoSync is now gated by requireEditorOrAdmin (#6022). Adding an argocd-sync entry to gitopsMutationCases (with a per-case expected status or an assertion like “not 403” for the allowed roles, since the handler may return 503 when k8sClient is nil) would prevent regressions where that endpoint accidentally becomes viewer-accessible again.

Copilot uses AI. Check for mistakes.
clubanderson added a commit that referenced this pull request Apr 10, 2026
Fixes #6029. SSE streaming endpoints (streamClusters in sse.go) ran
inside SetBodyStreamWriter callbacks that blocked for up to
sseOverallDeadline (~30s), so when a user logged out any in-flight
SSE streams continued to emit cluster_data events until the deadline
fired or the client disconnected.

Mirrors the exec session cancellation pattern added in #6031 for
/ws/exec sessions (#6024):

- Add a per-user SSE session registry in sse.go with
  registerSSESession / unregisterSSESession / CancelUserSSEStreams,
  guarded by a plain sync.Mutex.
- In streamClusters, capture the authenticated user ID before the
  deferred SetBodyStreamWriter callback runs (the fiber.Ctx may be
  reused by then) and register the stream context's cancel func in
  the registry, with a deferred unregister for normal stream end.
- In auth.Logout, after CancelUserExecSessions, call
  CancelUserSSEStreams(claims.UserID) so logout tears down any live
  SSE streams the user had open.
- Add sse_test.go with the four lifecycle tests mirroring
  exec_test.go: CancelsRegisteredContexts, OtherUserUnaffected,
  UnregisterSSESession_RemovesEntry, and NoSessions (no panic on
  empty map).

streamDemoSSE is left unchanged — demo streams are instant, they
do not block, so they need no cancellation path.

Signed-off-by: Andrew Anderson <[email protected]>
@clubanderson clubanderson merged commit 9cf5ff8 into main Apr 10, 2026
52 of 56 checks passed
@kubestellar-prow kubestellar-prow Bot deleted the fix/6022-6024-6026-security branch April 10, 2026 12:36
@github-actions
Copy link
Copy Markdown
Contributor

Thank you for your contribution! Your PR has been merged.

Check out what's new:

Stay connected: Slack #kubestellar-dev | Multi-Cluster Survey

@github-actions
Copy link
Copy Markdown
Contributor

Post-merge build verification passed

Both Go and frontend builds compiled successfully against merge commit 9cf5ff871f90687b504625537e36b38b2b08431a.

clubanderson added a commit that referenced this pull request Apr 10, 2026
Fixes #6029. SSE streaming endpoints (streamClusters in sse.go) ran
inside SetBodyStreamWriter callbacks that blocked for up to
sseOverallDeadline (~30s), so when a user logged out any in-flight
SSE streams continued to emit cluster_data events until the deadline
fired or the client disconnected.

Mirrors the exec session cancellation pattern added in #6031 for
/ws/exec sessions (#6024):

- Add a per-user SSE session registry in sse.go with
  registerSSESession / unregisterSSESession / CancelUserSSEStreams,
  guarded by a plain sync.Mutex.
- In streamClusters, capture the authenticated user ID before the
  deferred SetBodyStreamWriter callback runs (the fiber.Ctx may be
  reused by then) and register the stream context's cancel func in
  the registry, with a deferred unregister for normal stream end.
- In auth.Logout, after CancelUserExecSessions, call
  CancelUserSSEStreams(claims.UserID) so logout tears down any live
  SSE streams the user had open.
- Add sse_test.go with the four lifecycle tests mirroring
  exec_test.go: CancelsRegisteredContexts, OtherUserUnaffected,
  UnregisterSSESession_RemovesEntry, and NoSessions (no panic on
  empty map).

streamDemoSSE is left unchanged — demo streams are instant, they
do not block, so they need no cancellation path.

Signed-off-by: Andrew Anderson <[email protected]>
@github-actions
Copy link
Copy Markdown
Contributor

✅ Post-Merge Verification: passed

Commit: 9cf5ff871f90687b504625537e36b38b2b08431a
Specs run: smoke.spec.ts
Report: https://github.com/kubestellar/console/actions/runs/24243233452

clubanderson added a commit that referenced this pull request Apr 10, 2026
…covery, oauth_configured requires secret

Fixes three backend auth edges:

- #6056: /health reported oauth_configured=true when only GITHUB_CLIENT_ID
  was set but GITHUB_CLIENT_SECRET was missing. Such a config is unusable
  for the token exchange, so the probe misled the frontend into showing
  a GitHub login button that was guaranteed to fail. The check now
  requires both values.

- #6063: A structurally malformed Authorization header (no Bearer prefix,
  'Bearer' with no token, whitespace-only) returned 401 immediately even
  when a perfectly valid kc_auth cookie was attached. The middleware now
  treats a malformed header the same as an empty header and falls through
  to the cookie path. Complements #6026 (stale-but-valid-format header
  fallback already landed in #6031).

- #6064: When validateAndConsumeOAuthState failed (stale OAuth tab,
  server restart that cleared the in-memory state store, back-button
  replay) the callback unconditionally redirected to
  /login?error=csrf_validation_failed even if the user already had a
  valid kc_auth cookie. The callback now checks for a non-expired,
  non-revoked JWT cookie on state failure and, if present, redirects to
  / so the live session is preserved. Complements #6034 which fixes the
  root cause of state loss on the server side.

Signed-off-by: Andrew Anderson <[email protected]>
clubanderson added a commit that referenced this pull request Apr 10, 2026
Two cleanups flagged on the merged security bundle PR.

1. exec.go race window — close the gap where /ws/exec could outlive
   logout. Previously the exec context was created and registered
   in the per-user registry only AFTER reading the init message,
   looking up the k8s client + REST config, and building the SPDY
   executor. If the user logged out during that ~hundreds-of-ms
   window, CancelUserExecSessions had nothing to cancel — the
   about-to-start exec session leaked past logout.

   Move execCtx + registerExecSession() to immediately after JWT
   validation, before any of the long-running setup. Registration
   now happens essentially synchronously with the first byte the
   handler receives, so a follow-up Logout always finds and cancels
   the in-flight session.

2. gitops_test.go — add /api/gitops/argocd/sync to gitopsMutationCases
   so TriggerArgoSync is exercised by the editor/admin/viewer RBAC
   matrix the same way the helm-* and sync handlers are.

   Adding the test surfaced a pre-existing ordering bug in
   TriggerArgoSync: the k8sClient nil check ran BEFORE body parsing
   and required-field validation, so a request with valid auth and
   a valid body shape would 503 instead of returning the 400 the
   matrix expects. Reordered to body-parse → required-fields →
   validateK8sName → k8sClient check, matching Sync /
   UpgradeHelmRelease conventions.

Fixes #6075

Signed-off-by: Andrew Anderson <[email protected]>
clubanderson added a commit that referenced this pull request Apr 10, 2026
Fixes #6029. SSE streaming endpoints (streamClusters in sse.go) ran
inside SetBodyStreamWriter callbacks that blocked for up to
sseOverallDeadline (~30s), so when a user logged out any in-flight
SSE streams continued to emit cluster_data events until the deadline
fired or the client disconnected.

Mirrors the exec session cancellation pattern added in #6031 for
/ws/exec sessions (#6024):

- Add a per-user SSE session registry in sse.go with
  registerSSESession / unregisterSSESession / CancelUserSSEStreams,
  guarded by a plain sync.Mutex.
- In streamClusters, capture the authenticated user ID before the
  deferred SetBodyStreamWriter callback runs (the fiber.Ctx may be
  reused by then) and register the stream context's cancel func in
  the registry, with a deferred unregister for normal stream end.
- In auth.Logout, after CancelUserExecSessions, call
  CancelUserSSEStreams(claims.UserID) so logout tears down any live
  SSE streams the user had open.
- Add sse_test.go with the four lifecycle tests mirroring
  exec_test.go: CancelsRegisteredContexts, OtherUserUnaffected,
  UnregisterSSESession_RemovesEntry, and NoSessions (no panic on
  empty map).

streamDemoSSE is left unchanged — demo streams are instant, they
do not block, so they need no cancellation path.

Signed-off-by: Andrew Anderson <[email protected]>
clubanderson added a commit that referenced this pull request Apr 10, 2026
…covery, oauth_configured requires secret (#6076)

Fixes three backend auth edges:

- #6056: /health reported oauth_configured=true when only GITHUB_CLIENT_ID
  was set but GITHUB_CLIENT_SECRET was missing. Such a config is unusable
  for the token exchange, so the probe misled the frontend into showing
  a GitHub login button that was guaranteed to fail. The check now
  requires both values.

- #6063: A structurally malformed Authorization header (no Bearer prefix,
  'Bearer' with no token, whitespace-only) returned 401 immediately even
  when a perfectly valid kc_auth cookie was attached. The middleware now
  treats a malformed header the same as an empty header and falls through
  to the cookie path. Complements #6026 (stale-but-valid-format header
  fallback already landed in #6031).

- #6064: When validateAndConsumeOAuthState failed (stale OAuth tab,
  server restart that cleared the in-memory state store, back-button
  replay) the callback unconditionally redirected to
  /login?error=csrf_validation_failed even if the user already had a
  valid kc_auth cookie. The callback now checks for a non-expired,
  non-revoked JWT cookie on state failure and, if present, redirects to
  / so the live session is preserved. Complements #6034 which fixes the
  root cause of state loss on the server side.

Signed-off-by: Andrew Anderson <[email protected]>
clubanderson added a commit that referenced this pull request Apr 10, 2026
Two cleanups flagged on the merged security bundle PR.

1. exec.go race window — close the gap where /ws/exec could outlive
   logout. Previously the exec context was created and registered
   in the per-user registry only AFTER reading the init message,
   looking up the k8s client + REST config, and building the SPDY
   executor. If the user logged out during that ~hundreds-of-ms
   window, CancelUserExecSessions had nothing to cancel — the
   about-to-start exec session leaked past logout.

   Move execCtx + registerExecSession() to immediately after JWT
   validation, before any of the long-running setup. Registration
   now happens essentially synchronously with the first byte the
   handler receives, so a follow-up Logout always finds and cancels
   the in-flight session.

2. gitops_test.go — add /api/gitops/argocd/sync to gitopsMutationCases
   so TriggerArgoSync is exercised by the editor/admin/viewer RBAC
   matrix the same way the helm-* and sync handlers are.

   Adding the test surfaced a pre-existing ordering bug in
   TriggerArgoSync: the k8sClient nil check ran BEFORE body parsing
   and required-field validation, so a request with valid auth and
   a valid body shape would 503 instead of returning the 400 the
   matrix expects. Reordered to body-parse → required-fields →
   validateK8sName → k8sClient check, matching Sync /
   UpgradeHelmRelease conventions.

Fixes #6075

Signed-off-by: Andrew Anderson <[email protected]>
clubanderson added a commit that referenced this pull request Apr 10, 2026
Fixes #6029. SSE streaming endpoints (streamClusters in sse.go) ran
inside SetBodyStreamWriter callbacks that blocked for up to
sseOverallDeadline (~30s), so when a user logged out any in-flight
SSE streams continued to emit cluster_data events until the deadline
fired or the client disconnected.

Mirrors the exec session cancellation pattern added in #6031 for
/ws/exec sessions (#6024):

- Add a per-user SSE session registry in sse.go with
  registerSSESession / unregisterSSESession / CancelUserSSEStreams,
  guarded by a plain sync.Mutex.
- In streamClusters, capture the authenticated user ID before the
  deferred SetBodyStreamWriter callback runs (the fiber.Ctx may be
  reused by then) and register the stream context's cancel func in
  the registry, with a deferred unregister for normal stream end.
- In auth.Logout, after CancelUserExecSessions, call
  CancelUserSSEStreams(claims.UserID) so logout tears down any live
  SSE streams the user had open.
- Add sse_test.go with the four lifecycle tests mirroring
  exec_test.go: CancelsRegisteredContexts, OtherUserUnaffected,
  UnregisterSSESession_RemovesEntry, and NoSessions (no panic on
  empty map).

streamDemoSSE is left unchanged — demo streams are instant, they
do not block, so they need no cancellation path.

Signed-off-by: Andrew Anderson <[email protected]>
clubanderson added a commit that referenced this pull request Apr 10, 2026
Fixes #6029. SSE streaming endpoints (streamClusters in sse.go) ran
inside SetBodyStreamWriter callbacks that blocked for up to
sseOverallDeadline (~30s), so when a user logged out any in-flight
SSE streams continued to emit cluster_data events until the deadline
fired or the client disconnected.

Mirrors the exec session cancellation pattern added in #6031 for
/ws/exec sessions (#6024):

- Add a per-user SSE session registry in sse.go with
  registerSSESession / unregisterSSESession / CancelUserSSEStreams,
  guarded by a plain sync.Mutex.
- In streamClusters, capture the authenticated user ID before the
  deferred SetBodyStreamWriter callback runs (the fiber.Ctx may be
  reused by then) and register the stream context's cancel func in
  the registry, with a deferred unregister for normal stream end.
- In auth.Logout, after CancelUserExecSessions, call
  CancelUserSSEStreams(claims.UserID) so logout tears down any live
  SSE streams the user had open.
- Add sse_test.go with the four lifecycle tests mirroring
  exec_test.go: CancelsRegisteredContexts, OtherUserUnaffected,
  UnregisterSSESession_RemovesEntry, and NoSessions (no panic on
  empty map).

streamDemoSSE is left unchanged — demo streams are instant, they
do not block, so they need no cancellation path.

Signed-off-by: Andrew Anderson <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ai-generated Pull request generated by AI dco-signoff: yes Indicates the PR's author has signed the DCO. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.

Projects

None yet

2 participants