🔐 fix: enforce pods/exec RBAC check before opening exec stream (#8120)#8134
🔐 fix: enforce pods/exec RBAC check before opening exec stream (#8120)#8134clubanderson merged 1 commit intomainfrom
Conversation
HandleExec previously validated the caller's JWT and then opened a pod shell without ever asking Kubernetes whether the user was allowed to create pods/exec on the target pod. Any authenticated console user could exec into any pod the backend pod's ServiceAccount could reach (privilege escalation, #8120). Enforce authorization explicitly before any stream is opened: - Add MultiClusterClient.CheckPodExecPermissionForUser, which issues a SubjectAccessReview (not SelfSAR) against the target cluster's apiserver using the end user's identity. SelfSAR would reflect the backend pod SA's permissions — which is exactly the gap the issue described. SAR asks the apiserver about the user subject instead, so the decision is made by Kubernetes RBAC against the user's own bindings. - Wire a narrow execAuthorizer interface into ExecHandlers so the handler is unit-testable without a full MultiClusterClient. - In HandleExec, after JWT validation and init parsing, call the authorizer with "github:<login>" as the user subject. Fail-closed on (allowed=false), on SAR errors, on missing identity, and on a nil authorizer. All failure paths return before the executor is built — no WebSocket stream is opened unless the SAR said yes. - All timeouts, verb/resource strings, and subject prefix are named constants (execAuthzTimeout, execUserSubjectPrefix, podExecVerb/Resource/Subresource). Tests: - pkg/k8s: TestCheckPodExecPermissionForUser covers allowed, denied, SAR-error-fails-closed, and missing-identity-skips-SAR. Verifies the SAR carries exactly the verb/resource/subresource/namespace/name tuple kubelet enforces on pods/exec. - pkg/api/handlers: TestExecAuthorizerSeam_* covers the seam's allow/deny/error paths and TestExecHandlers_AuthorizerWired is a regression guard that the default authorizer is plumbed from NewExecHandlers. Fixes #8120 Signed-off-by: Andy Anderson <[email protected]>
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
✅ Deploy Preview for kubestellarconsole ready!
To edit notification comments on pull requests, go to your Netlify project configuration. |
|
👋 Hey @clubanderson — thanks for opening this PR!
This is an automated message. |
There was a problem hiding this comment.
Pull request overview
This PR fixes a privilege-escalation gap in the backend pod exec WebSocket handler by enforcing Kubernetes RBAC (create on pods/exec) for the authenticated end user before opening any exec streams.
Changes:
- Add
MultiClusterClient.CheckPodExecPermissionForUserthat performs aSubjectAccessReviewforcreate pods/execon the target pod. - Update
/ws/exec(HandleExec) to run the SAR (fail-closed) after parsing the init message and before creating the SPDY executor/stream. - Add tests for the new SAR behavior in
pkg/k8s, and introduce an authorizer seam in handlers.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| pkg/k8s/rbac.go | Adds SAR-based permission check for pods/exec with centralized RBAC tuple constants. |
| pkg/k8s/rbac_test.go | Adds unit tests covering allowed/denied/error/missing-identity SAR outcomes and payload correctness. |
| pkg/api/handlers/exec.go | Enforces RBAC via SAR before opening exec stream; introduces execAuthorizer seam and new constants. |
| pkg/api/handlers/exec_test.go | Adds fake authorizer seam tests and wiring check, plus existing exec session registry tests. |
| // authorizer defaults to the k8sClient argument — which is nil here. The | ||
| // important invariant is that NewExecHandlers does not leave authorizer | ||
| // as a zero value of a different type; an interface holding a typed nil | ||
| // is still distinguishable from an untyped nil by HandleExec's | ||
| // `h.authorizer == nil` guard. | ||
| require.Nil(t, h.k8sClient) | ||
| // When k8sClient is nil, authorizer is a typed-nil interface holding | ||
| // (*k8s.MultiClusterClient)(nil). HandleExec's early "No Kubernetes | ||
| // client available" check fires before the authorizer is consulted, so | ||
| // the typed-nil is never invoked in practice. We assert the k8sClient | ||
| // nil-check guard exists upstream by not dereferencing authorizer here. | ||
| _ = h.authorizer |
There was a problem hiding this comment.
The comment in this test claims a typed-nil interface is distinguishable from an untyped nil by h.authorizer == nil, but in Go an interface holding a typed-nil pointer is non-nil. This is misleading and can hide the typed-nil footgun in NewExecHandlers; either fix the production code to avoid typed-nil interfaces and assert the expected nil/non-nil behavior here, or update/remove this comment accordingly.
| // authorizer defaults to the k8sClient argument — which is nil here. The | |
| // important invariant is that NewExecHandlers does not leave authorizer | |
| // as a zero value of a different type; an interface holding a typed nil | |
| // is still distinguishable from an untyped nil by HandleExec's | |
| // `h.authorizer == nil` guard. | |
| require.Nil(t, h.k8sClient) | |
| // When k8sClient is nil, authorizer is a typed-nil interface holding | |
| // (*k8s.MultiClusterClient)(nil). HandleExec's early "No Kubernetes | |
| // client available" check fires before the authorizer is consulted, so | |
| // the typed-nil is never invoked in practice. We assert the k8sClient | |
| // nil-check guard exists upstream by not dereferencing authorizer here. | |
| _ = h.authorizer | |
| // authorizer defaults to the k8sClient argument — which is nil here. | |
| // In Go, assigning a typed nil pointer to an interface yields a non-nil | |
| // interface value that still carries the concrete type. Assert that | |
| // behavior explicitly so this test documents the typed-nil footgun rather | |
| // than masking it behind an unused reference. | |
| require.Nil(t, h.k8sClient) | |
| // When k8sClient is nil, authorizer is expected to hold a typed-nil | |
| // concrete value. HandleExec's early "No Kubernetes client available" | |
| // check should prevent that value from being invoked, but the interface | |
| // itself is not nil. | |
| require.NotNil(t, h.authorizer) |
| func NewExecHandlers(k8sClient *k8s.MultiClusterClient, jwtSecret string, devMode bool) *ExecHandlers { | ||
| return &ExecHandlers{ | ||
| k8sClient: k8sClient, | ||
| jwtSecret: jwtSecret, | ||
| devMode: devMode, | ||
| k8sClient: k8sClient, | ||
| authorizer: k8sClient, | ||
| jwtSecret: jwtSecret, | ||
| devMode: devMode, | ||
| } |
There was a problem hiding this comment.
NewExecHandlers assigns authorizer: k8sClient. If k8sClient is nil, this produces a non-nil interface holding a typed-nil *k8s.MultiClusterClient, so if h.authorizer == nil in HandleExec will not detect the misconfiguration (and would panic if reached). Set authorizer to nil when k8sClient is nil (or explicitly wrap with var a execAuthorizer; if k8sClient != nil { a = k8sClient }).
| // SECURITY (#8120): Enforce Kubernetes RBAC for pods/exec BEFORE opening | ||
| // any stream. The backend's clientset talks to the cluster as the pod's | ||
| // ServiceAccount, so a SelfSubjectAccessReview would reflect the SA's | ||
| // permissions — which is exactly the privilege-escalation path described | ||
| // in the bug report. Instead, we run a SubjectAccessReview with the end | ||
| // user's identity derived from the JWT, and let the target cluster's | ||
| // apiserver make the authorization decision using its own RBAC bindings. | ||
| // | ||
| // Fail-closed on every branch: if the user has no resolvable identity, | ||
| // if the SAR returns allowed=false, or if the SAR call errors out, we | ||
| // MUST deny the exec and return before touching the executor. | ||
| if h.authorizer == nil { | ||
| slog.Error("[Exec] SECURITY: authorizer not configured — denying exec", "user", claims.GitHubLogin) | ||
| writeError(c, "server misconfiguration: authorization unavailable") | ||
| return | ||
| } | ||
|
|
||
| userSubject := execUserSubjectPrefix + claims.GitHubLogin | ||
| if claims.GitHubLogin == "" { | ||
| slog.Warn("[Exec] SECURITY: denying exec — JWT has no GitHub login", "user_id", claims.UserID) | ||
| writeError(c, "user is not authorized to exec into this pod") | ||
| return | ||
| } | ||
|
|
||
| authzCtx, authzCancel := context.WithTimeout(execCtx, execAuthzTimeout) | ||
| allowed, reason, authzErr := h.authorizer.CheckPodExecPermissionForUser( | ||
| authzCtx, | ||
| init.Cluster, | ||
| userSubject, | ||
| nil, // no extra group memberships today; policy is purely user-based | ||
| init.Namespace, | ||
| init.Pod, | ||
| ) | ||
| authzCancel() | ||
| if authzErr != nil { | ||
| slog.Error("[Exec] SECURITY: pods/exec SubjectAccessReview failed — denying exec (fail-closed)", | ||
| "user", claims.GitHubLogin, | ||
| "cluster", init.Cluster, | ||
| "namespace", init.Namespace, | ||
| "pod", init.Pod, | ||
| "error", authzErr, | ||
| ) | ||
| writeError(c, "failed to verify exec permission; request denied") | ||
| return | ||
| } | ||
| if !allowed { | ||
| slog.Warn("[Exec] SECURITY: pods/exec denied by RBAC", | ||
| "user", claims.GitHubLogin, | ||
| "cluster", init.Cluster, | ||
| "namespace", init.Namespace, | ||
| "pod", init.Pod, | ||
| "reason", reason, | ||
| ) | ||
| writeError(c, "user is not authorized to exec into this pod") | ||
| return | ||
| } |
There was a problem hiding this comment.
The new pods/exec RBAC enforcement is security-critical, but there’s no unit test that drives HandleExec through the new allow/deny/error branches (e.g., asserting a deny/error returns before building the executor / calling GetClient / opening any stream). Consider extracting the authz decision into a small helper (or adding a websocket harness) so these branches are covered.
|
Thank you for your contribution! Your PR has been merged. Check out what's new:
Stay connected: Slack #kubestellar-dev | Multi-Cluster Survey |
|
Post-merge build verification passed ✅ Both Go and frontend builds compiled successfully against merge commit |
✅ Post-Merge Verification: passedCommit: |
Follow-up to #8134 addressing three Copilot review comments on the pods/exec RBAC security fix. 1. NewExecHandlers typed-nil interface footgun (exec.go) Before: `authorizer: k8sClient` unconditionally stored the *k8s. MultiClusterClient pointer into the execAuthorizer interface. When k8sClient is nil, this produces a typed-nil interface — a value that compares non-nil via `== nil` but panics on method calls. HandleExec's `h.authorizer == nil` fail-closed guard would silently pass and the first CheckPodExecPermissionForUser call would blow up. After: explicitly leave authorizer as an untyped-nil interface when k8sClient is nil, so the guard in HandleExec stays truthful. 2. Test comment was misleading (exec_test.go) The old TestExecHandlers_AuthorizerWired comment claimed a typed-nil interface was "distinguishable from untyped nil" by `== nil`, which is the exact opposite of how Go interface equality works. With fix #1 in place, h.authorizer is now a true untyped nil when k8sClient is nil, so the test asserts `require.Nil(t, h.authorizer)` and the comment explains why the contract matters. 3. Unit tests for the authorization decision (exec.go + exec_test.go) The allow/deny/error branches of the SubjectAccessReview path in HandleExec were not unit-tested — they lived inline inside a websocket handler, which makes isolated coverage awkward. Extracted the decision into (*ExecHandlers).authorizePodExec which returns nil on allow and a sentinel error on each fail-closed branch: - errExecAuthorizerUnavailable (h.authorizer == nil) - errExecMissingUserSubject (empty GitHub login) - errExecSARFailed (wraps SAR call error) - errExecRBACDenied (allowed=false; reason wrapped) HandleExec now calls authorizePodExec after parsing the init message and dispatches on errors.Is() to write the appropriate websocket error frame. Every fail-closed branch is preserved — no part of #8134's security fix is weakened. New tests in exec_test.go cover: - TestAuthorizePodExec_Allow → nil err, SAR called once - TestAuthorizePodExec_Deny → errExecRBACDenied - TestAuthorizePodExec_NilAuthorizer → errExecAuthorizerUnavailable - TestAuthorizePodExec_SARError → errExecSARFailed wraps err - TestAuthorizePodExec_EmptyLogin → errExecMissingUserSubject, SAR NOT called (calls == 0) Fixes #8137 Signed-off-by: Andy Anderson <[email protected]>
Follow-up to #8134 addressing three Copilot review comments on the pods/exec RBAC security fix. 1. NewExecHandlers typed-nil interface footgun (exec.go) Before: `authorizer: k8sClient` unconditionally stored the *k8s. MultiClusterClient pointer into the execAuthorizer interface. When k8sClient is nil, this produces a typed-nil interface — a value that compares non-nil via `== nil` but panics on method calls. HandleExec's `h.authorizer == nil` fail-closed guard would silently pass and the first CheckPodExecPermissionForUser call would blow up. After: explicitly leave authorizer as an untyped-nil interface when k8sClient is nil, so the guard in HandleExec stays truthful. 2. Test comment was misleading (exec_test.go) The old TestExecHandlers_AuthorizerWired comment claimed a typed-nil interface was "distinguishable from untyped nil" by `== nil`, which is the exact opposite of how Go interface equality works. With fix #1 in place, h.authorizer is now a true untyped nil when k8sClient is nil, so the test asserts `require.Nil(t, h.authorizer)` and the comment explains why the contract matters. 3. Unit tests for the authorization decision (exec.go + exec_test.go) The allow/deny/error branches of the SubjectAccessReview path in HandleExec were not unit-tested — they lived inline inside a websocket handler, which makes isolated coverage awkward. Extracted the decision into (*ExecHandlers).authorizePodExec which returns nil on allow and a sentinel error on each fail-closed branch: - errExecAuthorizerUnavailable (h.authorizer == nil) - errExecMissingUserSubject (empty GitHub login) - errExecSARFailed (wraps SAR call error) - errExecRBACDenied (allowed=false; reason wrapped) HandleExec now calls authorizePodExec after parsing the init message and dispatches on errors.Is() to write the appropriate websocket error frame. Every fail-closed branch is preserved — no part of #8134's security fix is weakened. New tests in exec_test.go cover: - TestAuthorizePodExec_Allow → nil err, SAR called once - TestAuthorizePodExec_Deny → errExecRBACDenied - TestAuthorizePodExec_NilAuthorizer → errExecAuthorizerUnavailable - TestAuthorizePodExec_SARError → errExecSARFailed wraps err - TestAuthorizePodExec_EmptyLogin → errExecMissingUserSubject, SAR NOT called (calls == 0) Fixes #8137 Signed-off-by: Andy Anderson <[email protected]>
Addresses three Copilot review comments on merged PR #8139: 1. exec.go authorizePodExec: the SAR-call-failed branch formatted the underlying error with `%v`, so `errors.Is(err, underlyingSARError)` silently returned false despite surrounding comments and the test name implying the underlying error was recoverable. Switched to double-%w (Go 1.20+): `fmt.Errorf("%w: %w", errExecSARFailed, err)` so callers can recover BOTH the sentinel AND the underlying SAR error via errors.Is. 2. exec_test.go TestAuthorizePodExec_SARError: the comment claimed the underlying error was recoverable, but the test only asserted the sentinel. Added `require.ErrorIs(t, err, sentinel)` so the double-%w contract is locked in by a test, not just a comment. 3. exec.go HandleExec deny branch: pre-#8139 logged the SAR deny reason under a structured `"reason"` slog key; #8139 accidentally embedded it only in the error string, breaking operator log queries that filter by the `reason` attribute. To restore the structured field without re-parsing error strings, authorizePodExec now returns `(execAuthzResult, error)` where execAuthzResult.reason is populated ONLY on errExecRBACDenied (empty on every other branch — allow, nil authorizer, missing subject, SAR failure). HandleExec passes authzResult.reason to slog.String("reason", ...) on the deny branch, alongside the wrapped error. Added TestAuthorizePodExec_Deny assertion that res.reason is the exact apiserver reason string (non-empty, distinct from the error), and a new TestAuthorizePodExec_DenyNoReason covering the empty-reason subcase. Non-deny branches assert res.reason stays empty so callers never accidentally log a synthesized reason. No behavior change to the security decision: every fail-closed branch from #8134/#8137/#8139 still denies the exec before the executor is built. The `h.authorizer == nil` typed-nil guard from #8139 is untouched. build/vet/go test ./pkg/api/handlers/... -run Exec: all 13 tests pass. Fixes #8140 Signed-off-by: Andy Anderson <[email protected]>
) Addresses three Copilot review comments on merged PR #8139: 1. exec.go authorizePodExec: the SAR-call-failed branch formatted the underlying error with `%v`, so `errors.Is(err, underlyingSARError)` silently returned false despite surrounding comments and the test name implying the underlying error was recoverable. Switched to double-%w (Go 1.20+): `fmt.Errorf("%w: %w", errExecSARFailed, err)` so callers can recover BOTH the sentinel AND the underlying SAR error via errors.Is. 2. exec_test.go TestAuthorizePodExec_SARError: the comment claimed the underlying error was recoverable, but the test only asserted the sentinel. Added `require.ErrorIs(t, err, sentinel)` so the double-%w contract is locked in by a test, not just a comment. 3. exec.go HandleExec deny branch: pre-#8139 logged the SAR deny reason under a structured `"reason"` slog key; #8139 accidentally embedded it only in the error string, breaking operator log queries that filter by the `reason` attribute. To restore the structured field without re-parsing error strings, authorizePodExec now returns `(execAuthzResult, error)` where execAuthzResult.reason is populated ONLY on errExecRBACDenied (empty on every other branch — allow, nil authorizer, missing subject, SAR failure). HandleExec passes authzResult.reason to slog.String("reason", ...) on the deny branch, alongside the wrapped error. Added TestAuthorizePodExec_Deny assertion that res.reason is the exact apiserver reason string (non-empty, distinct from the error), and a new TestAuthorizePodExec_DenyNoReason covering the empty-reason subcase. Non-deny branches assert res.reason stays empty so callers never accidentally log a synthesized reason. No behavior change to the security decision: every fail-closed branch from #8134/#8137/#8139 still denies the exec before the executor is built. The `h.authorizer == nil` typed-nil guard from #8139 is untouched. build/vet/go test ./pkg/api/handlers/... -run Exec: all 13 tests pass. Fixes #8140 Signed-off-by: Andy Anderson <[email protected]>
Security
Fixes #8120 —
HandleExecpreviously accepted any valid JWT and opened a pod shell without asking Kubernetes whether the caller had permission tocreate pods/execon the target pod. Any authenticated console user could exec into any pod the backend pod ServiceAccount could reach (privilege escalation).Fix path
Inline
SubjectAccessReviewagainst the target cluster's apiserver using the end user's identity — the project's kc-agent has no exec endpoint today, so routing through it is out of scope for this fix. The important property is that the authorization decision is made by Kubernetes RBAC against the user's subject, not against the backend pod SA:MultiClusterClient.CheckPodExecPermissionForUserissues aSubjectAccessReview(not SelfSAR — SelfSAR would reflect the pod SA's permissions, which is exactly the gap bug: Missing RBAC check before pod exec allows unauthorized access #8120 described).HandleExecnow calls this check after parsing the init message (so we have cluster/namespace/pod) and before building the exec request or opening any stream. Fail-closed on deny, SAR error, missing identity, and nil authorizer.github:<login>— centralised asexecUserSubjectPrefixso cluster admins can bind RBAC to the same principal the console authenticated via GitHub OAuth.execAuthorizerinterface added soHandleExecis unit-testable without standing up a fullMultiClusterClient.execAuthzTimeout,execUserSubjectPrefix,podExecVerb,podExecResource,podExecSubresource) per the repo's no-magic-numbers rule.Test plan
go build ./...go vet ./...go test ./pkg/k8s/... -run TestCheckPodExecPermissionForUser— covers allowed, denied, SAR-error-fails-closed, missing-identity-skips-SAR; asserts the SAR payload carries exactly theverb/resource/subresource/namespace/nametuple kubelet enforces onpods/exec.go test ./pkg/api/handlers/... -run TestExec— covers the authorizer seam's allow/deny/error paths and confirmsNewExecHandlersplumbs the default authorizer.writeError("user is not authorized to exec into this pod")before any WS stream frames.Notes for reviewers
User: github:<login>(or a group containing that subject) for any user who should be allowed to exec. Until such bindings exist, all exec attempts will fail-closed. This is the intended behaviour for a privilege-escalation fix — the previous implicit allow-everyone was the bug.Fixes #8120