Architectural rule (confirmed 2026-04-14)
The console running on a cluster is not supposed to give anyone elevated access to the cluster via the pod ServiceAccount. It is supposed to work just like localhost: each user brings their own kc-agent + kubeconfig for their own use — not shared. As long as GPU reservation continues to work, it is fine to break in-cluster functionality for users who don't have a local kc-agent.
Pod SA may only be used for:
- Bootstrapping the console as a Deployment (frontend + internal console state).
- GPU reservation exception — namespace create + ResourceQuota create/update/delete (`namespaces.go` GPU path, `mcp_resources.go` ResourceQuota handlers).
- Self-upgrade exception — `self_upgrade.go` patching its own Deployment.
Every other k8s operation against a managed cluster MUST go through kc-agent at `LOCAL_AGENT_HTTP_URL` / `LOCAL_AGENT_WS_URL` (`ws://127.0.0.1:8585`), which loads the user's kubeconfig and respects per-cluster RBAC automatically via the apiserver.
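Concretely, routing looks like the sketch below: instead of calling the apiserver with the pod-SA client, the backend forwards the request body to the local kc-agent over HTTP. This is a minimal illustration; the helper name `proxyToAgent` and the error-handling style are assumptions, not the console's actual code.

```go
// Hypothetical sketch: forwarding a mutating request to kc-agent instead of
// using the pod-ServiceAccount client. proxyToAgent is an illustrative name,
// not a function that exists in the console today.
package handlers

import (
	"bytes"
	"fmt"
	"net/http"
	"os"
	"time"
)

// proxyToAgent POSTs a JSON body to the user's local kc-agent. The agent
// loads the user's kubeconfig, so the apiserver enforces that user's RBAC.
func proxyToAgent(path string, body []byte) (*http.Response, error) {
	base := os.Getenv("LOCAL_AGENT_HTTP_URL") // e.g. http://127.0.0.1:8585
	if base == "" {
		return nil, fmt.Errorf("kc-agent not configured; refusing to fall back to the pod ServiceAccount")
	}
	client := &http.Client{Timeout: 30 * time.Second}
	return client.Post(base+path, "application/json", bytes.NewReader(body))
}
```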
This was originally filed (from #7979) as a narrow `CheckCanI` SSAR identity bug. The audit revealed it's an architectural migration gap — some handlers already route through kc-agent, many don't. `CheckCanI` is a symptom, not the disease.
Phase 0 audit — violation inventory
`pkg/api/handlers/` call sites (mutating or dynamic)

| file | handler | verb + resource | class |
|---|---|---|---|
| workloads.go:199 | DeployWorkload | create Deployment/STS/DS/Svc/CM/Secret bundle | MIGRATE |
| workloads.go:506/568/576/625 | node-label group flow | patch Node `kubestellar.io/group` | MIGRATE |
| workloads.go:1112 | ScaleWorkload | patch/scale Deployment/STS | MIGRATE |
| workloads.go:1139 | DeleteWorkload | delete Deployment bundle | MIGRATE |
| namespaces.go:98 | CreateNamespace | create Namespace | SPLIT: GPU path KEEP, general MIGRATE |
| namespaces.go:138 | DeleteNamespace | delete Namespace | SPLIT: same policy |
| namespaces.go:279 | GrantNamespaceAccess | create RoleBinding | MIGRATE |
| namespaces.go:321 | RevokeNamespaceAccess | delete RoleBinding | MIGRATE |
| rbac.go:461 | CreateServiceAccount | create ServiceAccount | MIGRATE |
| rbac.go:554 | CreateRoleBinding | create RoleBinding | MIGRATE |
| mcs.go:229 | CreateServiceExport | create ServiceExport | MIGRATE |
| mcs.go:256 | DeleteServiceExport | delete ServiceExport | MIGRATE |
| mcp_resources.go:225 | InstallGPUHealthCronJob | create CronJob+RBAC | MIGRATE |
| mcp_resources.go:261 | UninstallGPUHealthCronJob | delete CronJob+RBAC | MIGRATE |
| mcp_resources.go:892 | CreateOrUpdateResourceQuota | create/update ResourceQuota | KEEP (GPU) |
| mcp_resources.go:928 | DeleteResourceQuota | delete ResourceQuota | KEEP (GPU) |
| gitops.go (13 exec sites + 1 dyn) | helm / kubectl / argocd / git | all verbs | MIGRATE — shells out with pod kubeconfig |
| self_upgrade.go:113/126/148/388 | status/apply | own Deployment | KEEP (self-upgrade) |
| console_persistence.go + console_resources.go (~13 write sites) | CRUD ManagedWorkload / ClusterGroup / WorkloadDeployment CRs | create/update/delete CR | MIGRATE |
| custom_resources.go:171, crds.go:89, admission_webhooks.go:113, service_exports.go:94 | list | READ-ONLY | MIGRATE (view-leak) |
| exec.go:363/369 | WS exec via SPDY | pod exec | DELETE — local kc-agent WS already handles, closes #5406 |
| sse.go (15 sites) + mcp_resources.go/mcp_workloads.go/mcp_cluster.go/rbac.go/gateway.go/mcs.go/topology.go reads (~120 sites) | list/get | various | MIGRATE (view-leak, Phase 4.5) |
| k8s/rbac.go:289-312 | CheckClusterAdminAccess | SSAR via shared client | DELETE — guard used in namespaces.go is invalid; replaced by kc-agent routing |
| k8s/rbac.go:619-648 | CheckCanI | SSAR via shared client | DELETE or make GPU-specific — no general-purpose consumer after migration (identity problem sketched below) |
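For context on the two SSAR rows above: a SelfSubjectAccessReview is answered for whatever identity issues it, so sending it through the shared client means the apiserver evaluates the pod ServiceAccount's permissions rather than the console user's. The sketch below is illustrative of that shape only, not the exact code in `k8s/rbac.go`.

```go
// Illustrative sketch of why CheckCanI is unsound when issued through the
// shared pod-SA client: SelfSubjectAccessReview answers for the identity
// making the API call, which here is the pod ServiceAccount, not the user.
package k8s

import (
	"context"

	authzv1 "k8s.io/api/authorization/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// canI reports whether *the client's own identity* may perform the action.
// Called with a clientset built from the pod ServiceAccount, the answer
// reflects the pod's RBAC, so it cannot gate per-user access.
func canI(ctx context.Context, cs kubernetes.Interface, verb, resource, ns string) (bool, error) {
	ssar := &authzv1.SelfSubjectAccessReview{
		Spec: authzv1.SelfSubjectAccessReviewSpec{
			ResourceAttributes: &authzv1.ResourceAttributes{
				Verb: verb, Resource: resource, Namespace: ns,
			},
		},
	}
	res, err := cs.AuthorizationV1().SelfSubjectAccessReviews().Create(ctx, ssar, metav1.CreateOptions{})
	if err != nil {
		return false, err
	}
	return res.Status.Allowed, nil
}
```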
Frontend call sites to migrate
- `hooks/useWorkloads.ts` (deploy, scale, delete)
- `hooks/useUsers.ts` (service accounts, bindings)
- `hooks/useMCS.ts` (service exports)
- `hooks/useArgoCD.ts` (sync, applicationsets, detect-drift)
- `components/gitops/SyncDialog.tsx` (sync, detect-drift)
- `components/namespaces/{CreateNamespaceModal,NamespaceManager,GrantAccessModal}.tsx`
- `hooks/mcp/storage.ts` — ResourceQuota paths KEEP (GPU)
- `hooks/useCachedData.ts` — GPU health cronjob → MIGRATE
- `components/drilldown/RemediationConsole.tsx` — MCP ops tools (per-tool review)
kc-agent coverage gaps (new routes needed in `pkg/agent/server.go`)
kc-agent currently has only one mutating k8s route: `POST /scale`. All of the routes below need to be added; a registration sketch follows the list:
- `POST /workloads/deploy` (bundle create — replaces `DeployWorkload`, ~400 LOC port)
- `POST /workloads/delete`
- `POST/DELETE /namespaces` (general create/delete)
- `POST/DELETE /rolebindings` (namespace access grant/revoke + rbac.CreateRoleBinding)
- `POST /serviceaccounts`
- `POST/DELETE /serviceexports` (mcs)
- `POST /gitops/helm-{rollback,uninstall,upgrade}` (new shell-out handlers)
- `POST /gitops/detect-drift` + `POST /gitops/sync`
- `POST /argocd/sync`
- `POST /gpu-health-cronjob` (install/uninstall — MIGRATE classification)
- `POST/PUT/DELETE /console-cr/*` (ManagedWorkload, ClusterGroup, WorkloadDeployment)
- Optional: `POST /node-label` for `kubestellar.io/group` patches
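A minimal sketch of how these routes might be registered in `pkg/agent/server.go`, assuming a plain `net/http` mux; the real server may use a different router, and `stub` stands in for handlers that would build their clients from the user's kubeconfig:

```go
// Sketch only: the route list mirrors the coverage gaps above, but the mux
// type and the stub handler are assumptions about pkg/agent/server.go.
package agent

import "net/http"

// stub stands in for the real per-route handlers, which would build
// clients from the user's kubeconfig before touching the cluster.
func stub(w http.ResponseWriter, _ *http.Request) {
	http.Error(w, "not implemented", http.StatusNotImplemented)
}

func registerMutatingRoutes(mux *http.ServeMux) {
	mux.HandleFunc("/scale", stub)              // existing: POST /scale
	mux.HandleFunc("/workloads/deploy", stub)   // bundle create (~400 LOC port of DeployWorkload)
	mux.HandleFunc("/workloads/delete", stub)
	mux.HandleFunc("/namespaces", stub)         // POST/DELETE, general (non-GPU) path
	mux.HandleFunc("/rolebindings", stub)       // namespace access grant/revoke + CreateRoleBinding
	mux.HandleFunc("/serviceaccounts", stub)
	mux.HandleFunc("/serviceexports", stub)     // POST/DELETE (mcs)
	mux.HandleFunc("/gitops/sync", stub)
	mux.HandleFunc("/gitops/detect-drift", stub)
	mux.HandleFunc("/argocd/sync", stub)
	mux.HandleFunc("/gpu-health-cronjob", stub) // install/uninstall
	mux.HandleFunc("/console-cr/", stub)        // ManagedWorkload / ClusterGroup / WorkloadDeployment
	mux.HandleFunc("/node-label", stub)         // optional: kubestellar.io/group patches
}
```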
Phase plan (refined against real scope)
| Phase | Scope | Size | Blocks |
|---|---|---|---|
| 1 | workloads.go: Scale (frontend swap to existing `/scale`); Deploy + Delete (port bundling logic to kc-agent); decide node-label routing | Medium (~400 LOC port) | — |
| 1.5 | rbac.go + mcs.go: CreateServiceAccount, CreateRoleBinding, Create/Delete ServiceExport | Small | new `/rolebindings` route used in Phase 2 |
| 2 | namespaces.go SPLIT: keep GPU-reservation path, migrate general create/delete + grant/revoke | Medium | 1.5 |
| 2.5 | console_persistence.go: migrate CR writes via new agent `/console-cr/*` routes | Medium | — |
| 3a | kc-agent helm handlers (rollback/uninstall/upgrade) | Medium — shell-out wrappers | — |
| 3b | kc-agent drift-detect + kubectl-sync handlers | Small | — |
| 3c | kc-agent ArgoCD handlers (sync + CR update) | Medium | — |
| 3d | Delete backend exec handler (local kc-agent already serves) — closes #5406 | Small | — |
| 3e | Migrate InstallGPUHealthCronJob + node-label to kc-agent | Small | — |
| 4 | Delete backend gitops.go handlers; frontend migration to new agent routes | Medium | 3a/3b/3c |
| 4.5 | Read-leak cleanup: migrate ~150 list/get call sites in sse/mcp_*/rbac/gateway/mcs/topology/crds/admission_webhooks/custom_resources/service_exports | Large-LOC, mechanical | — |
| 5 | Rename `MultiClusterClient` → `PrivilegedClient`; add CI lint blocking new `k8sClient.*Create/Delete/Update/Patch` in `pkg/api/handlers/` outside allowlist (see lint sketch below); delete/narrow `CheckCanI` and `CheckClusterAdminAccess` | Small | all prior |
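One possible shape for the Phase 5 CI lint is a repository test like the one below. The `k8sClient` call pattern, the relative path, and the allowlisted file names are assumptions about the codebase, not verified identifiers.

```go
// Hedged sketch of the Phase 5 lint: fail the build when a handler outside
// the allowlist mutates the cluster through the privileged client.
package handlers_test

import (
	"os"
	"path/filepath"
	"regexp"
	"strings"
	"testing"
)

var privilegedCall = regexp.MustCompile(`k8sClient\.\w*(Create|Delete|Update|Patch)\(`)

// Files still allowed to use the pod ServiceAccount (GPU reservation + self-upgrade).
var allowlist = map[string]bool{
	"namespaces.go":    true, // GPU path only
	"mcp_resources.go": true, // ResourceQuota handlers only
	"self_upgrade.go":  true,
}

func TestNoNewPrivilegedClientCalls(t *testing.T) {
	// Path relative to this test file; an assumption about repo layout.
	err := filepath.Walk("../../pkg/api/handlers", func(path string, info os.FileInfo, err error) error {
		if err != nil || info.IsDir() || !strings.HasSuffix(path, ".go") || allowlist[filepath.Base(path)] {
			return err
		}
		src, readErr := os.ReadFile(path)
		if readErr != nil {
			return readErr
		}
		if privilegedCall.Match(src) {
			t.Errorf("%s: privileged client mutation outside allowlist; route through kc-agent instead", path)
		}
		return nil
	})
	if err != nil {
		t.Fatal(err)
	}
}
```

Running it as part of `go test ./...` would keep the allowlist enforced without extra CI tooling.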
Closes on merge
This issue + #5406 (exec backend handler documented limitation — deleted in Phase 3d).
Expected user-visible effect
In-cluster users without a local kc-agent lose destructive operations as each phase lands. This is the stated architectural intent. GPU reservation continues to work throughout (the only pod-SA path for user-initiated action). Local-mode users are unaffected because the backend falls through to `~/.kube/config` which is what kc-agent uses anyway.
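A minimal sketch of that fallback, assuming the backend uses client-go's standard loading rules: outside a cluster it ends up reading `KUBECONFIG` / `~/.kube/config`, the same file kc-agent loads, so local-mode users see identical behavior on either path.

```go
// Sketch under the assumption above; the console's actual config loading may differ.
package k8s

import (
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/clientcmd"
)

// loadConfig prefers the in-cluster ServiceAccount config and otherwise falls
// back to the user's kubeconfig (KUBECONFIG or ~/.kube/config).
func loadConfig() (*rest.Config, error) {
	if cfg, err := rest.InClusterConfig(); err == nil {
		return cfg, nil
	}
	rules := clientcmd.NewDefaultClientConfigLoadingRules() // honors KUBECONFIG, defaults to ~/.kube/config
	return clientcmd.NewNonInteractiveDeferredLoadingClientConfig(rules, &clientcmd.ConfigOverrides{}).ClientConfig()
}
```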
Status
- Phase 0 audit: complete (this issue body)
- Phase 1: awaiting user green-light to launch