🐛 fix: exec authz typed-nil guard + test coverage (#8137)#8139
🐛 fix: exec authz typed-nil guard + test coverage (#8137)#8139clubanderson merged 1 commit intomainfrom
Conversation
Follow-up to #8134 addressing three Copilot review comments on the pods/exec RBAC security fix. 1. NewExecHandlers typed-nil interface footgun (exec.go) Before: `authorizer: k8sClient` unconditionally stored the *k8s. MultiClusterClient pointer into the execAuthorizer interface. When k8sClient is nil, this produces a typed-nil interface — a value that compares non-nil via `== nil` but panics on method calls. HandleExec's `h.authorizer == nil` fail-closed guard would silently pass and the first CheckPodExecPermissionForUser call would blow up. After: explicitly leave authorizer as an untyped-nil interface when k8sClient is nil, so the guard in HandleExec stays truthful. 2. Test comment was misleading (exec_test.go) The old TestExecHandlers_AuthorizerWired comment claimed a typed-nil interface was "distinguishable from untyped nil" by `== nil`, which is the exact opposite of how Go interface equality works. With fix #1 in place, h.authorizer is now a true untyped nil when k8sClient is nil, so the test asserts `require.Nil(t, h.authorizer)` and the comment explains why the contract matters. 3. Unit tests for the authorization decision (exec.go + exec_test.go) The allow/deny/error branches of the SubjectAccessReview path in HandleExec were not unit-tested — they lived inline inside a websocket handler, which makes isolated coverage awkward. Extracted the decision into (*ExecHandlers).authorizePodExec which returns nil on allow and a sentinel error on each fail-closed branch: - errExecAuthorizerUnavailable (h.authorizer == nil) - errExecMissingUserSubject (empty GitHub login) - errExecSARFailed (wraps SAR call error) - errExecRBACDenied (allowed=false; reason wrapped) HandleExec now calls authorizePodExec after parsing the init message and dispatches on errors.Is() to write the appropriate websocket error frame. Every fail-closed branch is preserved — no part of #8134's security fix is weakened. New tests in exec_test.go cover: - TestAuthorizePodExec_Allow → nil err, SAR called once - TestAuthorizePodExec_Deny → errExecRBACDenied - TestAuthorizePodExec_NilAuthorizer → errExecAuthorizerUnavailable - TestAuthorizePodExec_SARError → errExecSARFailed wraps err - TestAuthorizePodExec_EmptyLogin → errExecMissingUserSubject, SAR NOT called (calls == 0) Fixes #8137 Signed-off-by: Andy Anderson <[email protected]>
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
✅ Deploy Preview for kubestellarconsole canceled.
|
|
👋 Hey @clubanderson — thanks for opening this PR!
This is an automated message. |
There was a problem hiding this comment.
Pull request overview
Follow-up hardening for the pods/exec RBAC enforcement: fixes the typed-nil interface pitfall in NewExecHandlers, extracts the authorization decision into a unit-testable helper, and expands unit coverage for allow/deny/error authz branches.
Changes:
- Prevent typed-nil
execAuthorizervalues by leavingauthorizeras a true untypednilwhenk8sClientisnil. - Add
authorizePodExec(...)helper + sentinel errors and routeHandleExecfail-closed behavior througherrors.Is()checks. - Add focused unit tests covering allow/deny/SAR-error/empty-login/nil-authorizer branches.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| pkg/api/handlers/exec.go | Adds authorizePodExec + sentinel errors; fixes typed-nil authorizer wiring; updates HandleExec to dispatch on errors.Is. |
| pkg/api/handlers/exec_test.go | Updates typed-nil regression guard and adds unit tests for authorizePodExec decision branches. |
| pod, | ||
| ) | ||
| if err != nil { | ||
| return fmt.Errorf("%w: %v", errExecSARFailed, err) |
There was a problem hiding this comment.
In authorizePodExec, the SAR error path formats the underlying error with %v, so callers cannot recover the underlying SAR error via errors.Is/Unwrap (despite comments implying wrapping). Consider wrapping the underlying error as well (e.g., use a %w for the underlying error) or adjust the surrounding comments/tests to avoid claiming it’s wrapped.
| return fmt.Errorf("%w: %v", errExecSARFailed, err) | |
| return fmt.Errorf("%w: %w", errExecSARFailed, err) |
| // The sentinel error is wrapped so both errExecSARFailed and the underlying | ||
| // error are recoverable via errors.Is. |
There was a problem hiding this comment.
This test comment says the underlying SAR error is recoverable via errors.Is, but authorizePodExec currently formats the underlying error with %v (so errors.Is(err, sentinel) will fail). Either update authorizePodExec to wrap the underlying error or adjust this comment to match the actual error semantics.
| // The sentinel error is wrapped so both errExecSARFailed and the underlying | |
| // error are recoverable via errors.Is. | |
| // errExecSARFailed must remain recoverable via errors.Is, and the underlying | |
| // SAR error text should still be surfaced in the returned message for | |
| // operator debugging. |
| slog.Warn("[Exec] SECURITY: pods/exec denied by RBAC", | ||
| "user", claims.GitHubLogin, | ||
| "cluster", init.Cluster, | ||
| "namespace", init.Namespace, | ||
| "pod", init.Pod, | ||
| "error", err, | ||
| ) |
There was a problem hiding this comment.
The PR description states the original log messages/fields were preserved byte-for-byte, but this branch no longer logs the SAR deny reason under the structured "reason" key (it now logs a combined wrapped error under "error"). If preserving logs is a requirement (and/or to keep the deny reason queryable as a field), consider logging the reason separately here (e.g., keep a dedicated "reason" attribute) rather than only embedding it in the error string.
|
Thank you for your contribution! Your PR has been merged. Check out what's new:
Stay connected: Slack #kubestellar-dev | Multi-Cluster Survey |
|
Post-merge build verification passed ✅ Both Go and frontend builds compiled successfully against merge commit |
✅ Post-Merge Verification: passedCommit: |
Addresses three Copilot review comments on merged PR #8139: 1. exec.go authorizePodExec: the SAR-call-failed branch formatted the underlying error with `%v`, so `errors.Is(err, underlyingSARError)` silently returned false despite surrounding comments and the test name implying the underlying error was recoverable. Switched to double-%w (Go 1.20+): `fmt.Errorf("%w: %w", errExecSARFailed, err)` so callers can recover BOTH the sentinel AND the underlying SAR error via errors.Is. 2. exec_test.go TestAuthorizePodExec_SARError: the comment claimed the underlying error was recoverable, but the test only asserted the sentinel. Added `require.ErrorIs(t, err, sentinel)` so the double-%w contract is locked in by a test, not just a comment. 3. exec.go HandleExec deny branch: pre-#8139 logged the SAR deny reason under a structured `"reason"` slog key; #8139 accidentally embedded it only in the error string, breaking operator log queries that filter by the `reason` attribute. To restore the structured field without re-parsing error strings, authorizePodExec now returns `(execAuthzResult, error)` where execAuthzResult.reason is populated ONLY on errExecRBACDenied (empty on every other branch — allow, nil authorizer, missing subject, SAR failure). HandleExec passes authzResult.reason to slog.String("reason", ...) on the deny branch, alongside the wrapped error. Added TestAuthorizePodExec_Deny assertion that res.reason is the exact apiserver reason string (non-empty, distinct from the error), and a new TestAuthorizePodExec_DenyNoReason covering the empty-reason subcase. Non-deny branches assert res.reason stays empty so callers never accidentally log a synthesized reason. No behavior change to the security decision: every fail-closed branch from #8134/#8137/#8139 still denies the exec before the executor is built. The `h.authorizer == nil` typed-nil guard from #8139 is untouched. build/vet/go test ./pkg/api/handlers/... -run Exec: all 13 tests pass. Fixes #8140 Signed-off-by: Andy Anderson <[email protected]>
) Addresses three Copilot review comments on merged PR #8139: 1. exec.go authorizePodExec: the SAR-call-failed branch formatted the underlying error with `%v`, so `errors.Is(err, underlyingSARError)` silently returned false despite surrounding comments and the test name implying the underlying error was recoverable. Switched to double-%w (Go 1.20+): `fmt.Errorf("%w: %w", errExecSARFailed, err)` so callers can recover BOTH the sentinel AND the underlying SAR error via errors.Is. 2. exec_test.go TestAuthorizePodExec_SARError: the comment claimed the underlying error was recoverable, but the test only asserted the sentinel. Added `require.ErrorIs(t, err, sentinel)` so the double-%w contract is locked in by a test, not just a comment. 3. exec.go HandleExec deny branch: pre-#8139 logged the SAR deny reason under a structured `"reason"` slog key; #8139 accidentally embedded it only in the error string, breaking operator log queries that filter by the `reason` attribute. To restore the structured field without re-parsing error strings, authorizePodExec now returns `(execAuthzResult, error)` where execAuthzResult.reason is populated ONLY on errExecRBACDenied (empty on every other branch — allow, nil authorizer, missing subject, SAR failure). HandleExec passes authzResult.reason to slog.String("reason", ...) on the deny branch, alongside the wrapped error. Added TestAuthorizePodExec_Deny assertion that res.reason is the exact apiserver reason string (non-empty, distinct from the error), and a new TestAuthorizePodExec_DenyNoReason covering the empty-reason subcase. Non-deny branches assert res.reason stays empty so callers never accidentally log a synthesized reason. No behavior change to the security decision: every fail-closed branch from #8134/#8137/#8139 still denies the exec before the executor is built. The `h.authorizer == nil` typed-nil guard from #8139 is untouched. build/vet/go test ./pkg/api/handlers/... -run Exec: all 13 tests pass. Fixes #8140 Signed-off-by: Andy Anderson <[email protected]>
…5406) Phase 3d of the pod-SA → kc-agent refactor. The /ws/exec WebSocket now runs under kc-agent using the user's own kubeconfig rather than the backend pod ServiceAccount. The target cluster's apiserver enforces RBAC natively against the caller's identity, so the SubjectAccessReview workaround the backend handler had to simulate (#8120, #5406) is no longer needed — the shell fails synchronously at stream-open time if the user's kubeconfig doesn't permit pods/exec on the target pod. Fixes #5406. pkg/agent/server_exec.go (new file, ~520 lines): handleExec is the kc-agent side of the pod exec WebSocket. The flow mirrors the backend handler minus the pod-SA-specific layers: 1. Token validation via the standard s.validateToken(r) path (Authorization header or ?token= query param fallback that kc-agent's /ws already uses) — no first-message JWT handshake. 2. Gorilla WebSocket upgrade. 3. Read the exec_init JSON frame (cluster/namespace/pod/container/ command/tty/cols/rows). 4. Resolve the clientset + REST config from the shared *k8s.MultiClusterClient (the user's kubeconfig). 5. Build the SPDY exec stream and spin up ping ticker + read loop goroutines under a shared writeMu. 6. Block on executor.StreamWithContext; on exit, drain the reader goroutine (#7048 / #7778 ordering preserved) and send the exit frame. What we kept from the backend handler because it's independent of the pod-SA problem: - The stdin drop telemetry (agentExecStdinDropCount + rate-limited WARN log on power-of-two session counts) from PR 7995. - The agentExecMaxStdinBytes frame cap (1 MiB). - The #7048 / #7778 ordering: drain reader before closing the resize queue channel to avoid send-on-closed panic. - Ping/pong keepalive so half-open TCP is detected within agentExecPongTimeout. - Terminal size queue for TTY resize events. - writeMu serializing every WriteMessage so stdout/stderr/ping/ exit/error frames never race. What we dropped because kc-agent uses the user's kubeconfig: - First-message JWT auth handshake (authMessage type). Auth is on the HTTP upgrade now. - execAuthorizer interface + authorizePodExec + the four sentinel errors errExecAuthorizerUnavailable / errExecMissingUserSubject / errExecSARFailed / errExecRBACDenied. The whole SubjectAccess- Review dance was there to simulate the user's identity against the pod SA's clientset (#8120, #8137, #8139, #8140); kc-agent's clientset IS the user's identity, so the apiserver enforces natively. - Per-user execSessionsRegistry / registerExecSession / unregisterExecSession / CancelUserExecSessions. kc-agent is a per-user local process with no logout concept — if the user closes their browser tab the WebSocket closes, the read loop exits, and execCancel fires via defer. #6024's cross-session cleanup is no longer needed. Named constants instead of literals per the repo rule: agentExecMaxStdinBytes (1 MiB), agentExecPingInterval (30s), agentExecPongTimeout (45s), agentExecWriteDeadline (10s — explicit because gorilla unlike gofiber does not have an implicit write timeout), agentExecStdinBufferSize (32), agentExecResizeBufferSize (4), agentExecDefaultCols (80), agentExecDefaultRows (24). pkg/agent/server.go: mux.HandleFunc("/ws/exec", s.handleExec) registered next to /ws with an explanatory comment. pkg/api/handlers/exec.go + exec_test.go (deleted): The entire 791-line backend handler and its 395-line test file are gone. The architecture is cleaner now — there's one code path for exec, it runs against the user's identity, and the whole class of "who is the console SA and what can it impersonate" concerns is retired alongside the code. pkg/api/handlers/auth.go: The Logout handler used to call CancelUserExecSessions (#6024) to tear down exec streams that were running as the pod SA when the user's JWT was revoked. kc-agent exec is per-user local, so there are no cross-session exec streams to cancel on logout. The SSE cancellation (#6029) stays; it's unrelated. pkg/api/server.go: The backend /ws/exec route registration is gone. A comment at the same line points readers at pkg/agent/server_exec.go and web/src/hooks/useExecSession.ts for the replacement. web/src/hooks/useExecSession.ts: The WebSocket URL is now LOCAL_AGENT_WS_URL.replace(/\/ws$/, '/ws/exec') instead of ${window.location.host}/ws/exec so the browser connects to the user's own kc-agent at 127.0.0.1:8585. The first-message auth send is removed (kc-agent validates on upgrade). STORAGE_KEY_TOKEN import dropped; LOCAL_AGENT_WS_URL added. web/src/hooks/__tests__/useExecSession.test.ts (40/40 pass): Three assertion updates — sentMessages[0] is now exec_init (not auth), sentMessages[1] is no longer used. The "no auth token on connect" test is removed (that failure mode no longer exists on the frontend; kc-agent would 401 on upgrade instead, which the "constructor throwing" test already covers). Refs #7993. Fixes #5406. Signed-off-by: Andrew Anderson <[email protected]>
…5406) (#8168) Phase 3d of the pod-SA → kc-agent refactor. The /ws/exec WebSocket now runs under kc-agent using the user's own kubeconfig rather than the backend pod ServiceAccount. The target cluster's apiserver enforces RBAC natively against the caller's identity, so the SubjectAccessReview workaround the backend handler had to simulate (#8120, #5406) is no longer needed — the shell fails synchronously at stream-open time if the user's kubeconfig doesn't permit pods/exec on the target pod. Fixes #5406. pkg/agent/server_exec.go (new file, ~520 lines): handleExec is the kc-agent side of the pod exec WebSocket. The flow mirrors the backend handler minus the pod-SA-specific layers: 1. Token validation via the standard s.validateToken(r) path (Authorization header or ?token= query param fallback that kc-agent's /ws already uses) — no first-message JWT handshake. 2. Gorilla WebSocket upgrade. 3. Read the exec_init JSON frame (cluster/namespace/pod/container/ command/tty/cols/rows). 4. Resolve the clientset + REST config from the shared *k8s.MultiClusterClient (the user's kubeconfig). 5. Build the SPDY exec stream and spin up ping ticker + read loop goroutines under a shared writeMu. 6. Block on executor.StreamWithContext; on exit, drain the reader goroutine (#7048 / #7778 ordering preserved) and send the exit frame. What we kept from the backend handler because it's independent of the pod-SA problem: - The stdin drop telemetry (agentExecStdinDropCount + rate-limited WARN log on power-of-two session counts) from PR 7995. - The agentExecMaxStdinBytes frame cap (1 MiB). - The #7048 / #7778 ordering: drain reader before closing the resize queue channel to avoid send-on-closed panic. - Ping/pong keepalive so half-open TCP is detected within agentExecPongTimeout. - Terminal size queue for TTY resize events. - writeMu serializing every WriteMessage so stdout/stderr/ping/ exit/error frames never race. What we dropped because kc-agent uses the user's kubeconfig: - First-message JWT auth handshake (authMessage type). Auth is on the HTTP upgrade now. - execAuthorizer interface + authorizePodExec + the four sentinel errors errExecAuthorizerUnavailable / errExecMissingUserSubject / errExecSARFailed / errExecRBACDenied. The whole SubjectAccess- Review dance was there to simulate the user's identity against the pod SA's clientset (#8120, #8137, #8139, #8140); kc-agent's clientset IS the user's identity, so the apiserver enforces natively. - Per-user execSessionsRegistry / registerExecSession / unregisterExecSession / CancelUserExecSessions. kc-agent is a per-user local process with no logout concept — if the user closes their browser tab the WebSocket closes, the read loop exits, and execCancel fires via defer. #6024's cross-session cleanup is no longer needed. Named constants instead of literals per the repo rule: agentExecMaxStdinBytes (1 MiB), agentExecPingInterval (30s), agentExecPongTimeout (45s), agentExecWriteDeadline (10s — explicit because gorilla unlike gofiber does not have an implicit write timeout), agentExecStdinBufferSize (32), agentExecResizeBufferSize (4), agentExecDefaultCols (80), agentExecDefaultRows (24). pkg/agent/server.go: mux.HandleFunc("/ws/exec", s.handleExec) registered next to /ws with an explanatory comment. pkg/api/handlers/exec.go + exec_test.go (deleted): The entire 791-line backend handler and its 395-line test file are gone. The architecture is cleaner now — there's one code path for exec, it runs against the user's identity, and the whole class of "who is the console SA and what can it impersonate" concerns is retired alongside the code. pkg/api/handlers/auth.go: The Logout handler used to call CancelUserExecSessions (#6024) to tear down exec streams that were running as the pod SA when the user's JWT was revoked. kc-agent exec is per-user local, so there are no cross-session exec streams to cancel on logout. The SSE cancellation (#6029) stays; it's unrelated. pkg/api/server.go: The backend /ws/exec route registration is gone. A comment at the same line points readers at pkg/agent/server_exec.go and web/src/hooks/useExecSession.ts for the replacement. web/src/hooks/useExecSession.ts: The WebSocket URL is now LOCAL_AGENT_WS_URL.replace(/\/ws$/, '/ws/exec') instead of ${window.location.host}/ws/exec so the browser connects to the user's own kc-agent at 127.0.0.1:8585. The first-message auth send is removed (kc-agent validates on upgrade). STORAGE_KEY_TOKEN import dropped; LOCAL_AGENT_WS_URL added. web/src/hooks/__tests__/useExecSession.test.ts (40/40 pass): Three assertion updates — sentMessages[0] is now exec_init (not auth), sentMessages[1] is no longer used. The "no auth token on connect" test is removed (that failure mode no longer exists on the frontend; kc-agent would 401 on upgrade instead, which the "constructor throwing" test already covers). Refs #7993. Fixes #5406. Signed-off-by: Andrew Anderson <[email protected]>
Summary
Follow-up to #8134 addressing three Copilot review comments on the
pods/exec RBAC security fix. Every fail-closed branch from #8134 is
preserved — no part of the original security fix is weakened.
1.
NewExecHandlerstyped-nil interface footgun (pkg/api/handlers/exec.go)Before:
authorizer: k8sClientstored the*k8s.MultiClusterClientpointer into the
execAuthorizerinterface unconditionally. Whenk8sClientisnil, this produces a typed-nil interface — a value thatcompares non-nil via
== nilbut panics on method calls.HandleExec'sh.authorizer == nilfail-closed guard would silentlypass and the first
CheckPodExecPermissionForUsercall would blow up.After: explicitly leave
authorizeras a true untyped-nil interfacewhen
k8sClientisnil, soHandleExec's guard stays truthful:2. Test comment was misleading (
pkg/api/handlers/exec_test.go)The old
TestExecHandlers_AuthorizerWiredcomment claimed a typed-nilinterface was "distinguishable from untyped nil by
== nil" — the exactopposite of Go interface equality semantics. With fix #1,
h.authorizeris now a true untyped nil when
k8sClientis nil. The test now assertsrequire.Nil(t, h.authorizer)and the comment explains the contractthat the fix is defending.
3. Unit tests for allow/deny/error authz branches
Extracted the decision from the inline block in
HandleExecinto asmall helper so each branch has a dedicated unit test:
Returns
nilon allow, or one of four sentinel errors on fail-closed:errExecAuthorizerUnavailable—h.authorizer == nilerrExecMissingUserSubject— empty GitHub login in JWTerrExecSARFailed— wraps the underlying SAR errorerrExecRBACDenied— SAR returnedallowed=false(reason wrapped)HandleExecnow callsauthorizePodExecafter parsing the init messageand dispatches on
errors.Is()to write the appropriate websocket errorframe. All original log messages and error frames are preserved
byte-for-byte.
New tests:
TestAuthorizePodExec_Allowgithub:prefixTestAuthorizePodExec_DenyerrExecRBACDenied, reason in error stringTestAuthorizePodExec_NilAuthorizererrExecAuthorizerUnavailableTestAuthorizePodExec_SARErrorerrExecSARFailedwraps underlying errTestAuthorizePodExec_EmptyLoginerrExecMissingUserSubject, SAR NOT calledTest plan
go build ./...passesgo vet ./...passesgo test ./pkg/api/handlers/... -run Exec -count=1— all 13 exec tests pass (5 new)Fixes #8137