fix: SSE cache poisoning, cluster data leak, exec audit, upgrade auth#5412
fix: SSE cache poisoning, cluster data leak, exec audit, upgrade auth#5412clubanderson merged 1 commit intomainfrom
Conversation
…udit logging, and self-upgrade admin check - SSE non-retryable errors (401/503) now clear the in-flight cache entry before resolving, preventing stale promises from blocking future requests to the same URL (#5404) - Logout now clears cluster caches (both localStorage keys and in-memory module state) so cluster names, metrics, and distributions from a previous user session do not persist across logins (#5405) - Pod exec handler now logs a SECURITY warning with user identity and target pod details on every exec session, acknowledging that authorization (RBAC) is not yet enforced — only authentication is checked (#5406) - Self-upgrade trigger endpoint now requires admin role before allowing Deployment image patches, preventing non-admin authenticated users from rolling the console to arbitrary image tags (#5409) Signed-off-by: Andrew Anderson <[email protected]>
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
✅ Deploy Preview for kubestellarconsole ready!
To edit notification comments on pull requests, go to your Netlify project configuration. |
|
👋 Hey @clubanderson — thanks for opening this PR!
This is an automated message. |
|
Thank you for your contribution! Your PR has been merged. Check out what's new:
Stay connected: Slack #kubestellar-dev | Multi-Cluster Survey |
|
Post-merge build verification passed ✅ Both Go and frontend builds compiled successfully against merge commit |
There was a problem hiding this comment.
Pull request overview
This PR hardens session-bound caching and authorization boundaries across the frontend SSE client, cluster cache lifecycle, and backend self-upgrade/exec endpoints to prevent cross-session data reuse and unauthorized operations.
Changes:
- Clear SSE in-flight cache entries on non-retryable errors (401/503) to prevent stale promise reuse.
- Wipe cluster localStorage + in-memory shared cache during logout to avoid cross-user cluster data leakage.
- Add backend admin role gate for self-upgrade trigger, and add exec security/audit logging plus documentation of missing authz.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| web/src/lib/sseClient.ts | Clears timeout + in-flight entry on non-retryable SSE errors to avoid cache poisoning. |
| web/src/lib/auth.tsx | Calls new cluster-cache logout cleanup alongside existing logout cache resets. |
| web/src/hooks/mcp/shared.ts | Introduces clearClusterCacheOnLogout() to clear localStorage + reset shared cluster cache state. |
| pkg/api/server.go | Wires store into SelfUpgradeHandler constructor for role checks. |
| pkg/api/handlers/self_upgrade.go | Adds admin role enforcement before running Kubernetes self-upgrade logic. |
| pkg/api/handlers/exec.go | Adds security warnings/audit logs and documents that exec authorization is still not implemented. |
| return c.Status(fiber.StatusForbidden).JSON(SelfUpgradeTriggerResponse{ | ||
| Error: "unable to verify user role — access denied", | ||
| }) | ||
| } |
There was a problem hiding this comment.
TriggerUpgrade dereferences user.Role without checking whether GetUser returned a nil user. Since store.Store.GetUser can return (nil, nil) (e.g., missing user or uuid.Nil), this can panic and turn the endpoint into a DoS vector. Add an explicit user == nil check and deny the request when the user record can’t be loaded.
| } | |
| } | |
| if user == nil { | |
| slog.Warn("[self-upgrade] SECURITY: user record missing for role check", | |
| "user_id", userID, | |
| "github_login", middleware.GetGitHubLogin(c)) | |
| return c.Status(fiber.StatusForbidden).JSON(SelfUpgradeTriggerResponse{ | |
| Error: "unable to verify user role — access denied", | |
| }) | |
| } |
| if h.store != nil { | ||
| user, err := h.store.GetUser(userID) | ||
| if err != nil { | ||
| slog.Warn("[self-upgrade] SECURITY: failed to look up user for role check", | ||
| "user_id", userID, "error", err) | ||
| return c.Status(fiber.StatusForbidden).JSON(SelfUpgradeTriggerResponse{ | ||
| Error: "unable to verify user role — access denied", | ||
| }) | ||
| } | ||
| if user.Role != models.UserRoleAdmin { | ||
| slog.Warn("[self-upgrade] SECURITY: non-admin user attempted self-upgrade", | ||
| "user_id", userID, | ||
| "github_login", middleware.GetGitHubLogin(c), | ||
| "role", user.Role) | ||
| return c.Status(fiber.StatusForbidden).JSON(SelfUpgradeTriggerResponse{ | ||
| Error: "self-upgrade requires admin role", | ||
| }) | ||
| } | ||
| } |
There was a problem hiding this comment.
The admin gate is conditional on h.store != nil; if the handler is ever constructed with a nil store, the role check is silently skipped and any authenticated caller could trigger a self-upgrade. Prefer failing closed here (treat nil store as misconfiguration and return 403/500) so the security check cannot be bypassed by wiring mistakes.
| if h.store != nil { | |
| user, err := h.store.GetUser(userID) | |
| if err != nil { | |
| slog.Warn("[self-upgrade] SECURITY: failed to look up user for role check", | |
| "user_id", userID, "error", err) | |
| return c.Status(fiber.StatusForbidden).JSON(SelfUpgradeTriggerResponse{ | |
| Error: "unable to verify user role — access denied", | |
| }) | |
| } | |
| if user.Role != models.UserRoleAdmin { | |
| slog.Warn("[self-upgrade] SECURITY: non-admin user attempted self-upgrade", | |
| "user_id", userID, | |
| "github_login", middleware.GetGitHubLogin(c), | |
| "role", user.Role) | |
| return c.Status(fiber.StatusForbidden).JSON(SelfUpgradeTriggerResponse{ | |
| Error: "self-upgrade requires admin role", | |
| }) | |
| } | |
| } | |
| if h.store == nil { | |
| slog.Error("[self-upgrade] SECURITY: store unavailable for role check", | |
| "user_id", userID, | |
| "github_login", middleware.GetGitHubLogin(c)) | |
| return c.Status(fiber.StatusInternalServerError).JSON(SelfUpgradeTriggerResponse{ | |
| Error: "server misconfiguration prevents role verification", | |
| }) | |
| } | |
| user, err := h.store.GetUser(userID) | |
| if err != nil { | |
| slog.Warn("[self-upgrade] SECURITY: failed to look up user for role check", | |
| "user_id", userID, "error", err) | |
| return c.Status(fiber.StatusForbidden).JSON(SelfUpgradeTriggerResponse{ | |
| Error: "unable to verify user role — access denied", | |
| }) | |
| } | |
| if user.Role != models.UserRoleAdmin { | |
| slog.Warn("[self-upgrade] SECURITY: non-admin user attempted self-upgrade", | |
| "user_id", userID, | |
| "github_login", middleware.GetGitHubLogin(c), | |
| "role", user.Role) | |
| return c.Status(fiber.StatusForbidden).JSON(SelfUpgradeTriggerResponse{ | |
| Error: "self-upgrade requires admin role", | |
| }) | |
| } |
| // Don't retry on auth (401) or service unavailable (503) — expected in demo mode | ||
| const isNonRetryable = err.message?.includes('401') || err.message?.includes('503') | ||
| if (isNonRetryable) { | ||
| console.debug('[SSE] Non-retryable error — skipping retries (demo mode)') | ||
| // Clear the in-flight entry and timers so future requests for the | ||
| // same URL start a fresh stream instead of reusing this stale | ||
| // resolved promise (#5404). | ||
| clearTimeout(timeoutId) | ||
| cleanup(/* wasAborted */ true) | ||
| resolve(accumulated) | ||
| return |
There was a problem hiding this comment.
There are existing unit tests for fetchSSE(), but none appear to cover the new non-retryable (401/503) path that should clear the in-flight cache entry. Add a test that: (1) triggers a 401/503, (2) calls fetchSSE again for the same URL, and (3) asserts a new fetch is started (no stale in-flight promise reuse).
| /** | ||
| * Clear all cluster caches on logout so data from a previous user session | ||
| * does not leak to the next login (#5405). Clears both localStorage keys | ||
| * and the module-level in-memory cache, then notifies subscribers so the | ||
| * UI resets to a loading/empty state. | ||
| */ | ||
| export function clearClusterCacheOnLogout(): void { | ||
| try { | ||
| localStorage.removeItem(CLUSTER_CACHE_KEY) | ||
| localStorage.removeItem(CLUSTER_DIST_CACHE_KEY) | ||
| } catch { | ||
| // Ignore storage errors | ||
| } | ||
|
|
||
| clusterCache = { | ||
| clusters: [], | ||
| lastUpdated: null, | ||
| isLoading: true, | ||
| isRefreshing: false, | ||
| error: null, | ||
| consecutiveFailures: 0, | ||
| isFailed: false, | ||
| lastRefresh: null, | ||
| } | ||
| notifyClusterSubscribers() | ||
| } |
There was a problem hiding this comment.
This adds new exported logout behavior that mutates the module-level shared cluster cache and localStorage, but there isn’t test coverage for it. Since hooks/mcp has an existing shared.test.ts suite, add a unit test to assert clearClusterCacheOnLogout() clears both storage keys and resets clusterCache (and notifies subscribers).
| // SECURITY WARNING (#5406): This handler validates authentication (JWT) but | ||
| // does NOT perform authorization. Any authenticated user can open a shell to | ||
| // any pod in any cluster. Full RBAC scoping requires checking the user's | ||
| // Kubernetes RBAC permissions (SubjectAccessReview with "create" verb on | ||
| // "pods/exec" in the target namespace/cluster) before establishing the exec | ||
| // session. This is a known limitation tracked in issue #5406. | ||
| slog.Warn("[Exec] SECURITY: pod exec session opened — no authorization check performed", | ||
| "user", claims.GitHubLogin, |
There was a problem hiding this comment.
This code explicitly documents that pod exec still has no authorization check (“known limitation tracked in #5406”). That conflicts with closing #5406 in the PR body; either implement the SubjectAccessReview authorization gate here, or keep #5406 open / adjust the PR description so it doesn’t claim the issue is closed.
✅ Post-Merge Verification: passedCommit: |
PR #5412 added clearClusterCacheOnLogout (from hooks/mcp/shared), clearPermissionsCache, disconnectPresence, and clearSSECache imports to auth.tsx. The auth tests did not mock these new dependencies, causing 50 test failures. The demoMode mock also lacked exports (isNetlifyDeployment, isDemoMode, isDemoToken, subscribeDemoMode) that hooks/mcp/shared requires at module load time. Closes #5418 Signed-off-by: Andrew Anderson <[email protected]>
…#5422) PR #5412 added clearClusterCacheOnLogout (from hooks/mcp/shared), clearPermissionsCache, disconnectPresence, and clearSSECache imports to auth.tsx. The auth tests did not mock these new dependencies, causing 50 test failures. The demoMode mock also lacked exports (isNetlifyDeployment, isDemoMode, isDemoToken, subscribeDemoMode) that hooks/mcp/shared requires at module load time. Closes #5418 Signed-off-by: Andrew Anderson <[email protected]>
Summary
Fixes four security and cache-consistency issues:
Non-retryable SSE failures poison the in-flight cache. #5404 — SSE in-flight cache poisoning: Non-retryable errors (401/503) now clear the in-flight cache entry and timers before resolving, so future requests start a fresh stream instead of reusing a stale resolved promise.
Cluster data from the previous user survives logout #5405 — Cluster data survives logout: Added
clearClusterCacheOnLogout()that wipes bothkubestellar-cluster-cacheandkubestellar-cluster-distributionsfrom localStorage and resets the module-level in-memory cluster state. Called during logout alongside the existing SSE/permissions/presence cleanup.Pod exec allows any authenticated user to open a shell in any pod. #5406 — Pod exec missing authorization: Added
slog.Warnsecurity audit logs with user identity, target pod, namespace, cluster, container, and command on every exec session. Added code comment documenting that authorization (Kubernetes RBAC SubjectAccessReview) is not yet implemented — only authentication is enforced.Any authenticated user can trigger self-upgrade. #5409 — Self-upgrade missing admin check:
TriggerUpgradenow looks up the user's role from the store and rejects non-admin users with 403 Forbidden before any Kubernetes API calls. TheSelfUpgradeHandlerconstructor now accepts astore.Storeparameter.Closes #5404, closes #5405, closes #5406, closes #5409
Test plan
kubestellar-cluster-cacheandkubestellar-cluster-distributionsare gone from localStorageslog.Warnentries appear in backend logs with user and pod detailsPOST /api/self-upgrade/triggeras a non-admin user — should get 403POST /api/self-upgrade/triggeras an admin user — should proceed normallygo build ./...passesnpm run buildpasses