refactor: backend pod SA must not be used for user-initiated k8s actions (umbrella — was CheckCanI) #7993

@clubanderson

Description


Architectural rule (confirmed 2026-04-14)

The console running on a cluster is not supposed to give anyone elevated access to the cluster via the pod ServiceAccount. It is supposed to work just like localhost: each user brings their own kc-agent + kubeconfig for their own use — not shared. As long as GPU reservation continues to work, it is fine to break in-cluster functionality for users who don't have a local kc-agent.

Pod SA may only be used for:

  1. Bootstrapping the console as a Deployment (frontend + internal console state).
  2. GPU reservation exception — namespace create + ResourceQuota create/update/delete (namespaces.go GPU path, mcp_resources.go ResourceQuota handlers).
  3. Self-upgrade exception — `self_upgrade.go` patching its own Deployment.

Every other k8s operation against a managed cluster MUST go through kc-agent at `LOCAL_AGENT_HTTP_URL` / `LOCAL_AGENT_WS_URL` (`ws://127.0.0.1:8585`), which loads the user's kubeconfig and respects per-cluster RBAC automatically via the apiserver.

This was originally filed (from #7979) as a narrow CheckCanI SSAR identity bug. The audit revealed it's an architectural migration gap — some handlers already route through kc-agent, many don't. CheckCanI is a symptom, not the disease.

Phase 0 audit — violation inventory

pkg/api/handlers/ call sites (mutating or dynamic)

| File | Handler | Verb + resource | Classification |
|---|---|---|---|
| workloads.go:199 | DeployWorkload | create Deployment/STS/DS/Svc/CM/Secret bundle | MIGRATE |
| workloads.go:506/568/576/625 | node-label group flow | patch Node `kubestellar.io/group` | MIGRATE |
| workloads.go:1112 | ScaleWorkload | patch/scale Deployment/STS | MIGRATE |
| workloads.go:1139 | DeleteWorkload | delete Deployment bundle | MIGRATE |
| namespaces.go:98 | CreateNamespace | create Namespace | SPLIT: GPU path KEEP, general MIGRATE |
| namespaces.go:138 | DeleteNamespace | delete Namespace | SPLIT: same policy |
| namespaces.go:279 | GrantNamespaceAccess | create RoleBinding | MIGRATE |
| namespaces.go:321 | RevokeNamespaceAccess | delete RoleBinding | MIGRATE |
| rbac.go:461 | CreateServiceAccount | create ServiceAccount | MIGRATE |
| rbac.go:554 | CreateRoleBinding | create RoleBinding | MIGRATE |
| mcs.go:229 | CreateServiceExport | create ServiceExport | MIGRATE |
| mcs.go:256 | DeleteServiceExport | delete ServiceExport | MIGRATE |
| mcp_resources.go:225 | InstallGPUHealthCronJob | create CronJob+RBAC | MIGRATE |
| mcp_resources.go:261 | UninstallGPUHealthCronJob | delete CronJob+RBAC | MIGRATE |
| mcp_resources.go:892 | CreateOrUpdateResourceQuota | create/update ResourceQuota | KEEP (GPU) |
| mcp_resources.go:928 | DeleteResourceQuota | delete ResourceQuota | KEEP (GPU) |
| gitops.go (13 exec sites + 1 dyn) | helm / kubectl / argocd / git | all verbs | MIGRATE — shells out with pod kubeconfig |
| self_upgrade.go:113/126/148/388 | — | status/apply own Deployment | KEEP (self-upgrade) |
| console_persistence.go + console_resources.go (~13 write sites) | CRUD ManagedWorkload / ClusterGroup / WorkloadDeployment CRs | create/update/delete CR | MIGRATE |
| custom_resources.go:171, crds.go:89, admission_webhooks.go:113, service_exports.go:94 | — | list (read-only) | MIGRATE (view-leak) |
| exec.go:363/369 | WS exec via SPDY | pod exec | DELETE — local kc-agent WS already handles, closes #5406 |
| sse.go (15 sites) + mcp_resources.go/mcp_workloads.go/mcp_cluster.go/rbac.go/gateway.go/mcs.go/topology.go | reads (~120 sites) | list/get various | MIGRATE (view-leak, Phase 4.5) |
| k8s/rbac.go:289-312 | CheckClusterAdminAccess | SSAR via shared client | DELETE — guard used in namespaces.go is invalid; replaced by kc-agent routing |
| k8s/rbac.go:619-648 | CheckCanI | SSAR via shared client | DELETE or make GPU-specific — no general-purpose consumer after migration |

Frontend call sites to migrate

  • `hooks/useWorkloads.ts` (deploy, scale, delete)
  • `hooks/useUsers.ts` (service accounts, bindings)
  • `hooks/useMCS.ts` (service exports)
  • `hooks/useArgoCD.ts` (sync, applicationsets, detect-drift)
  • `components/gitops/SyncDialog.tsx` (sync, detect-drift)
  • `components/namespaces/{CreateNamespaceModal,NamespaceManager,GrantAccessModal}.tsx`
  • `hooks/mcp/storage.ts` — ResourceQuota paths KEEP (GPU)
  • `hooks/useCachedData.ts` — GPU health cronjob → MIGRATE
  • `components/drilldown/RemediationConsole.tsx` — MCP ops tools (per-tool review)

kc-agent coverage gaps (new routes needed in `pkg/agent/server.go`)

kc-agent currently exposes only one mutating k8s route, `POST /scale`. All of the routes below need to be added:

  • `POST /workloads/deploy` (bundle create — replaces `DeployWorkload`, ~400 LOC port)
  • `POST /workloads/delete`
  • `POST/DELETE /namespaces` (general create/delete)
  • `POST/DELETE /rolebindings` (namespace access grant/revoke + rbac.CreateRoleBinding)
  • `POST /serviceaccounts`
  • `POST/DELETE /serviceexports` (mcs)
  • `POST /gitops/helm-{rollback,uninstall,upgrade}` (new shell-out handlers)
  • `POST /gitops/detect-drift` + `POST /gitops/sync`
  • `POST /argocd/sync`
  • `POST /gpu-health-cronjob` (install/uninstall — MIGRATE classification)
  • `POST/PUT/DELETE /console-cr/*` (ManagedWorkload, ClusterGroup, WorkloadDeployment)
  • Optional: `POST /node-label` for `kubestellar.io/group` patches

Phase plan (refined against real scope)

| Phase | Scope | Size | Blocked by |
|---|---|---|---|
| 1 | workloads.go: Scale (frontend swap to existing `/scale`); Deploy + Delete (port bundling logic to kc-agent); decide node-label routing | Medium (~400 LOC port) | — |
| 1.5 | rbac.go + mcs.go: CreateServiceAccount, CreateRoleBinding, Create/Delete ServiceExport | Small — new `/rolebindings` route used in Phase 2 | — |
| 2 | namespaces.go SPLIT: keep GPU-reservation path, migrate general create/delete + grant/revoke | Medium | 1.5 |
| 2.5 | console_persistence.go: migrate CR writes via new agent `/console-cr/*` routes | Medium | — |
| 3a | kc-agent helm handlers (rollback/uninstall/upgrade) | Medium — shell-out wrappers | — |
| 3b | kc-agent drift-detect + kubectl-sync handlers | Small | — |
| 3c | kc-agent ArgoCD handlers (sync + CR update) | Medium | — |
| 3d | Delete backend exec handler (local kc-agent already serves) — closes #5406 | Small | — |
| 3e | Migrate InstallGPUHealthCronJob + node-label to kc-agent | Small | — |
| 4 | Delete backend gitops.go handlers; frontend migration to new agent routes | Medium | 3a/3b/3c |
| 4.5 | Read-leak cleanup: migrate ~150 list/get call sites in sse/mcp_*/rbac/gateway/mcs/topology/crds/admission_webhooks/custom_resources/service_exports | Large-LOC, mechanical | — |
| 5 | Rename `MultiClusterClient` → `PrivilegedClient`; add CI lint blocking new `k8sClient.*Create/Delete/Update/Patch` in `pkg/api/handlers/` outside allowlist; delete/narrow `CheckCanI` and `CheckClusterAdminAccess` | Small | all prior |

Closes on merge

This issue, plus #5406 (the documented backend exec-handler limitation; that handler is deleted in Phase 3d).

Expected user-visible effect

In-cluster users without a local kc-agent lose destructive operations as each phase lands; this is the stated architectural intent. GPU reservation continues to work throughout (the only pod-SA path for user-initiated actions). Local-mode users are unaffected because the backend falls through to `~/.kube/config`, which is what kc-agent uses anyway.

Status

  • Phase 0 audit: complete (this issue body)
  • Phase 1: awaiting user green-light to launch

    Labels

    `help wanted`, `kind/bug`
