Skip to content

fix: SSE cache poisoning, cluster data leak, exec audit, upgrade auth#5412

Merged
clubanderson merged 1 commit intomainfrom
fix/security-cache-batch
Apr 8, 2026
Merged

fix: SSE cache poisoning, cluster data leak, exec audit, upgrade auth#5412
clubanderson merged 1 commit intomainfrom
fix/security-cache-batch

Conversation

@clubanderson
Copy link
Copy Markdown
Collaborator

Summary

Fixes four security and cache-consistency issues:

  • Non-retryable SSE failures poison the in-flight cache. #5404 — SSE in-flight cache poisoning: Non-retryable errors (401/503) now clear the in-flight cache entry and timers before resolving, so future requests start a fresh stream instead of reusing a stale resolved promise.

  • Cluster data from the previous user survives logout #5405 — Cluster data survives logout: Added clearClusterCacheOnLogout() that wipes both kubestellar-cluster-cache and kubestellar-cluster-distributions from localStorage and resets the module-level in-memory cluster state. Called during logout alongside the existing SSE/permissions/presence cleanup.

  • Pod exec allows any authenticated user to open a shell in any pod. #5406 — Pod exec missing authorization: Added slog.Warn security audit logs with user identity, target pod, namespace, cluster, container, and command on every exec session. Added code comment documenting that authorization (Kubernetes RBAC SubjectAccessReview) is not yet implemented — only authentication is enforced.

  • Any authenticated user can trigger self-upgrade. #5409 — Self-upgrade missing admin check: TriggerUpgrade now looks up the user's role from the store and rejects non-admin users with 403 Forbidden before any Kubernetes API calls. The SelfUpgradeHandler constructor now accepts a store.Store parameter.

Closes #5404, closes #5405, closes #5406, closes #5409

Test plan

  • Trigger a 401 from an SSE endpoint, retry the same request — should start a fresh stream (not reuse stale promise)
  • Log in, load clusters, log out, verify kubestellar-cluster-cache and kubestellar-cluster-distributions are gone from localStorage
  • Open pod exec as authenticated user — verify slog.Warn entries appear in backend logs with user and pod details
  • Attempt POST /api/self-upgrade/trigger as a non-admin user — should get 403
  • Attempt POST /api/self-upgrade/trigger as an admin user — should proceed normally
  • go build ./... passes
  • npm run build passes

…udit logging, and self-upgrade admin check

- SSE non-retryable errors (401/503) now clear the in-flight cache entry
  before resolving, preventing stale promises from blocking future
  requests to the same URL (#5404)

- Logout now clears cluster caches (both localStorage keys and in-memory
  module state) so cluster names, metrics, and distributions from a
  previous user session do not persist across logins (#5405)

- Pod exec handler now logs a SECURITY warning with user identity and
  target pod details on every exec session, acknowledging that
  authorization (RBAC) is not yet enforced — only authentication is
  checked (#5406)

- Self-upgrade trigger endpoint now requires admin role before allowing
  Deployment image patches, preventing non-admin authenticated users
  from rolling the console to arbitrary image tags (#5409)

Signed-off-by: Andrew Anderson <[email protected]>
Copilot AI review requested due to automatic review settings April 8, 2026 06:55
@kubestellar-prow kubestellar-prow Bot added the dco-signoff: yes Indicates the PR's author has signed the DCO. label Apr 8, 2026
@kubestellar-prow
Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign mikespreitzer for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@netlify
Copy link
Copy Markdown

netlify Bot commented Apr 8, 2026

Deploy Preview for kubestellarconsole ready!

Name Link
🔨 Latest commit 9745acc
🔍 Latest deploy log https://app.netlify.com/projects/kubestellarconsole/deploys/69d5fbcc8d2ba200085f648e
😎 Deploy Preview https://deploy-preview-5412.console-deploy-preview.kubestellar.io
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify project configuration.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 8, 2026

👋 Hey @clubanderson — thanks for opening this PR!

🤖 This project is developed exclusively using AI coding assistants.

Please do not attempt to code anything for this project manually.
All contributions should be authored using an AI coding tool such as:

This ensures consistency in code style, architecture patterns, test coverage,
and commit quality across the entire codebase.


This is an automated message.

@kubestellar-prow kubestellar-prow Bot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Apr 8, 2026
@clubanderson clubanderson merged commit 8f6b4ba into main Apr 8, 2026
18 of 20 checks passed
@kubestellar-prow kubestellar-prow Bot deleted the fix/security-cache-batch branch April 8, 2026 06:55
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 8, 2026

Thank you for your contribution! Your PR has been merged.

Check out what's new:

Stay connected: Slack #kubestellar-dev | Multi-Cluster Survey

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 8, 2026

Post-merge build verification passed

Both Go and frontend builds compiled successfully against merge commit 8f6b4ba095cf25c4555e78931b9e4979b871115f.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR hardens session-bound caching and authorization boundaries across the frontend SSE client, cluster cache lifecycle, and backend self-upgrade/exec endpoints to prevent cross-session data reuse and unauthorized operations.

Changes:

  • Clear SSE in-flight cache entries on non-retryable errors (401/503) to prevent stale promise reuse.
  • Wipe cluster localStorage + in-memory shared cache during logout to avoid cross-user cluster data leakage.
  • Add backend admin role gate for self-upgrade trigger, and add exec security/audit logging plus documentation of missing authz.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
web/src/lib/sseClient.ts Clears timeout + in-flight entry on non-retryable SSE errors to avoid cache poisoning.
web/src/lib/auth.tsx Calls new cluster-cache logout cleanup alongside existing logout cache resets.
web/src/hooks/mcp/shared.ts Introduces clearClusterCacheOnLogout() to clear localStorage + reset shared cluster cache state.
pkg/api/server.go Wires store into SelfUpgradeHandler constructor for role checks.
pkg/api/handlers/self_upgrade.go Adds admin role enforcement before running Kubernetes self-upgrade logic.
pkg/api/handlers/exec.go Adds security warnings/audit logs and documents that exec authorization is still not implemented.

return c.Status(fiber.StatusForbidden).JSON(SelfUpgradeTriggerResponse{
Error: "unable to verify user role — access denied",
})
}
Copy link

Copilot AI Apr 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TriggerUpgrade dereferences user.Role without checking whether GetUser returned a nil user. Since store.Store.GetUser can return (nil, nil) (e.g., missing user or uuid.Nil), this can panic and turn the endpoint into a DoS vector. Add an explicit user == nil check and deny the request when the user record can’t be loaded.

Suggested change
}
}
if user == nil {
slog.Warn("[self-upgrade] SECURITY: user record missing for role check",
"user_id", userID,
"github_login", middleware.GetGitHubLogin(c))
return c.Status(fiber.StatusForbidden).JSON(SelfUpgradeTriggerResponse{
Error: "unable to verify user role — access denied",
})
}

Copilot uses AI. Check for mistakes.
Comment on lines +215 to +233
if h.store != nil {
user, err := h.store.GetUser(userID)
if err != nil {
slog.Warn("[self-upgrade] SECURITY: failed to look up user for role check",
"user_id", userID, "error", err)
return c.Status(fiber.StatusForbidden).JSON(SelfUpgradeTriggerResponse{
Error: "unable to verify user role — access denied",
})
}
if user.Role != models.UserRoleAdmin {
slog.Warn("[self-upgrade] SECURITY: non-admin user attempted self-upgrade",
"user_id", userID,
"github_login", middleware.GetGitHubLogin(c),
"role", user.Role)
return c.Status(fiber.StatusForbidden).JSON(SelfUpgradeTriggerResponse{
Error: "self-upgrade requires admin role",
})
}
}
Copy link

Copilot AI Apr 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The admin gate is conditional on h.store != nil; if the handler is ever constructed with a nil store, the role check is silently skipped and any authenticated caller could trigger a self-upgrade. Prefer failing closed here (treat nil store as misconfiguration and return 403/500) so the security check cannot be bypassed by wiring mistakes.

Suggested change
if h.store != nil {
user, err := h.store.GetUser(userID)
if err != nil {
slog.Warn("[self-upgrade] SECURITY: failed to look up user for role check",
"user_id", userID, "error", err)
return c.Status(fiber.StatusForbidden).JSON(SelfUpgradeTriggerResponse{
Error: "unable to verify user role — access denied",
})
}
if user.Role != models.UserRoleAdmin {
slog.Warn("[self-upgrade] SECURITY: non-admin user attempted self-upgrade",
"user_id", userID,
"github_login", middleware.GetGitHubLogin(c),
"role", user.Role)
return c.Status(fiber.StatusForbidden).JSON(SelfUpgradeTriggerResponse{
Error: "self-upgrade requires admin role",
})
}
}
if h.store == nil {
slog.Error("[self-upgrade] SECURITY: store unavailable for role check",
"user_id", userID,
"github_login", middleware.GetGitHubLogin(c))
return c.Status(fiber.StatusInternalServerError).JSON(SelfUpgradeTriggerResponse{
Error: "server misconfiguration prevents role verification",
})
}
user, err := h.store.GetUser(userID)
if err != nil {
slog.Warn("[self-upgrade] SECURITY: failed to look up user for role check",
"user_id", userID, "error", err)
return c.Status(fiber.StatusForbidden).JSON(SelfUpgradeTriggerResponse{
Error: "unable to verify user role — access denied",
})
}
if user.Role != models.UserRoleAdmin {
slog.Warn("[self-upgrade] SECURITY: non-admin user attempted self-upgrade",
"user_id", userID,
"github_login", middleware.GetGitHubLogin(c),
"role", user.Role)
return c.Status(fiber.StatusForbidden).JSON(SelfUpgradeTriggerResponse{
Error: "self-upgrade requires admin role",
})
}

Copilot uses AI. Check for mistakes.
Comment thread web/src/lib/sseClient.ts
Comment on lines 315 to 325
// Don't retry on auth (401) or service unavailable (503) — expected in demo mode
const isNonRetryable = err.message?.includes('401') || err.message?.includes('503')
if (isNonRetryable) {
console.debug('[SSE] Non-retryable error — skipping retries (demo mode)')
// Clear the in-flight entry and timers so future requests for the
// same URL start a fresh stream instead of reusing this stale
// resolved promise (#5404).
clearTimeout(timeoutId)
cleanup(/* wasAborted */ true)
resolve(accumulated)
return
Copy link

Copilot AI Apr 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are existing unit tests for fetchSSE(), but none appear to cover the new non-retryable (401/503) path that should clear the in-flight cache entry. Add a test that: (1) triggers a 401/503, (2) calls fetchSSE again for the same URL, and (3) asserts a new fetch is started (no stale in-flight promise reuse).

Copilot uses AI. Check for mistakes.
Comment on lines +259 to +284
/**
* Clear all cluster caches on logout so data from a previous user session
* does not leak to the next login (#5405). Clears both localStorage keys
* and the module-level in-memory cache, then notifies subscribers so the
* UI resets to a loading/empty state.
*/
export function clearClusterCacheOnLogout(): void {
try {
localStorage.removeItem(CLUSTER_CACHE_KEY)
localStorage.removeItem(CLUSTER_DIST_CACHE_KEY)
} catch {
// Ignore storage errors
}

clusterCache = {
clusters: [],
lastUpdated: null,
isLoading: true,
isRefreshing: false,
error: null,
consecutiveFailures: 0,
isFailed: false,
lastRefresh: null,
}
notifyClusterSubscribers()
}
Copy link

Copilot AI Apr 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This adds new exported logout behavior that mutates the module-level shared cluster cache and localStorage, but there isn’t test coverage for it. Since hooks/mcp has an existing shared.test.ts suite, add a unit test to assert clearClusterCacheOnLogout() clears both storage keys and resets clusterCache (and notifies subscribers).

Copilot uses AI. Check for mistakes.
Comment thread pkg/api/handlers/exec.go
Comment on lines +174 to +181
// SECURITY WARNING (#5406): This handler validates authentication (JWT) but
// does NOT perform authorization. Any authenticated user can open a shell to
// any pod in any cluster. Full RBAC scoping requires checking the user's
// Kubernetes RBAC permissions (SubjectAccessReview with "create" verb on
// "pods/exec" in the target namespace/cluster) before establishing the exec
// session. This is a known limitation tracked in issue #5406.
slog.Warn("[Exec] SECURITY: pod exec session opened — no authorization check performed",
"user", claims.GitHubLogin,
Copy link

Copilot AI Apr 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code explicitly documents that pod exec still has no authorization check (“known limitation tracked in #5406”). That conflicts with closing #5406 in the PR body; either implement the SubjectAccessReview authorization gate here, or keep #5406 open / adjust the PR description so it doesn’t claim the issue is closed.

Copilot uses AI. Check for mistakes.
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 8, 2026

✅ Post-Merge Verification: passed

Commit: 8f6b4ba095cf25c4555e78931b9e4979b871115f
Specs run: smoke.spec.ts
Report: https://github.com/kubestellar/console/actions/runs/24122229887

clubanderson added a commit that referenced this pull request Apr 8, 2026
PR #5412 added clearClusterCacheOnLogout (from hooks/mcp/shared),
clearPermissionsCache, disconnectPresence, and clearSSECache imports
to auth.tsx. The auth tests did not mock these new dependencies,
causing 50 test failures. The demoMode mock also lacked exports
(isNetlifyDeployment, isDemoMode, isDemoToken, subscribeDemoMode)
that hooks/mcp/shared requires at module load time.

Closes #5418

Signed-off-by: Andrew Anderson <[email protected]>
clubanderson added a commit that referenced this pull request Apr 8, 2026
…#5422)

PR #5412 added clearClusterCacheOnLogout (from hooks/mcp/shared),
clearPermissionsCache, disconnectPresence, and clearSSECache imports
to auth.tsx. The auth tests did not mock these new dependencies,
causing 50 test failures. The demoMode mock also lacked exports
(isNetlifyDeployment, isDemoMode, isDemoToken, subscribeDemoMode)
that hooks/mcp/shared requires at module load time.

Closes #5418

Signed-off-by: Andrew Anderson <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

dco-signoff: yes Indicates the PR's author has signed the DCO. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.

Projects

None yet

2 participants