🐛 fix(gpu): centralize GPU resource names + fix multi-accelerator sum + tests (Issue 9204 follow-up)#9216
Conversation
Addresses three Copilot review findings on kubestellar/console PR #9204:
### 1. Multi-accelerator undercount (real bug)
Previously, `GetPods` in `client_resources.go` did:

```go
for name, qty := range c.Resources.Requests {
	if isGPUResourceName(name) {
		ci.GPURequested = int(qty.Value())
	}
}
```
A container requesting more than one accelerator (e.g., nvidia.com/gpu=1
+ habana.ai/gaudi=2) had ci.GPURequested overwritten per matching name,
so the final value depended on Go's randomized map iteration order and
undercounted the total.
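A minimal sketch of the accumulation fix in isolation (the PR goes further and centralizes this into a shared helper, described under Fix below); `c`, `ci`, and `isGPUResourceName` are assumed from the surrounding `GetPods` code:

```go
// Sketch only: accumulate across all matching accelerator names instead of
// overwriting, so nvidia.com/gpu=1 + habana.ai/gaudi=2 yields 3 regardless
// of map iteration order.
ci.GPURequested = 0
for name, qty := range c.Resources.Requests {
	if isGPUResourceName(name) {
		ci.GPURequested += int(qty.Value())
	}
}
```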
### 2. Duplicate list between client_resources.go and client_gpu.go
Both files hard-coded the same six resource names. Any future addition
risked drifting one path behind the other — the exact class of bug Issue
9090 originally reported.
### 3. No test coverage for the new names
The new Intel and Gaudi names landed without a unit test that would fail
if a future refactor dropped one.
### Fix
Added `pkg/k8s/gpu_resources.go` as the single source of truth:
- `GPUResourceNames` — exported slice
- `IsGPUResourceName(name)` — exact-match predicate
- `SumGPURequested(rl)` — sums across ALL known names (fixes #1)
`client_resources.go` now calls `SumGPURequested` on Requests, then on Limits
as a fallback. `client_gpu.go` replaces its six repeated `if ok` blocks
with a single `SumGPURequested` call too, so both pod and node paths read
from the same list.
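As a rough sketch of what the shared helper and the Requests-then-Limits fallback could look like (the helper name and behavior come from this PR; the caller shape and its field names here are illustrative assumptions, and `GPUResourceNames` is the list defined in `gpu_resources.go`):

```go
package k8s

import corev1 "k8s.io/api/core/v1"

// SumGPURequested sums the quantities of every known GPU/accelerator resource
// name present in rl. Iterating the shared GPUResourceNames slice keeps the
// result independent of Go's randomized map iteration order.
func SumGPURequested(rl corev1.ResourceList) int {
	total := 0
	for _, name := range GPUResourceNames {
		if qty, ok := rl[name]; ok {
			total += int(qty.Value())
		}
	}
	return total
}

// Caller-side pattern (illustrative): prefer Requests, fall back to Limits
// when no accelerator request is set on the container.
func gpuRequestedForContainer(c corev1.Container) int {
	if n := SumGPURequested(c.Resources.Requests); n > 0 {
		return n
	}
	return SumGPURequested(c.Resources.Limits)
}
```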
`gpu_resources_test.go` covers:
- `IsGPUResourceName` on all six known names (true) and 7 adjacent names, including `google.com/tpu`, `intel.com/xpu`, and `nvidia.com/mig-4g.20gb` (false)
- `SumGPURequested` on empty, single-name, multi-accelerator (the regression case — returns 3 for nvidia=1 + gaudi=2; see the sketch below), all-six-summed, and non-GPU resources ignored.
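A sketch of the multi-accelerator regression case as a standalone test (the actual test in the PR is table-driven; this single-case form is just for illustration):

```go
package k8s

import (
	"testing"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
)

func TestSumGPURequestedMultiAccelerator(t *testing.T) {
	rl := corev1.ResourceList{
		"nvidia.com/gpu":  resource.MustParse("1"),
		"habana.ai/gaudi": resource.MustParse("2"),
		"cpu":             resource.MustParse("500m"), // non-GPU: must be ignored
	}
	if got := SumGPURequested(rl); got != 3 {
		t.Fatalf("SumGPURequested() = %d, want 3", got)
	}
}
```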
Signed-off-by: Andrew Anderson <[email protected]>
✅ Deploy Preview for kubestellarconsole ready!
👋 Hey @clubanderson — thanks for opening this PR!
This is an automated message.
Pull request overview
This PR follows up on GPU accounting issues by centralizing the Kubernetes GPU/accelerator resource-name list and fixing pod/node GPU requested totals when multiple accelerator resources are requested in the same container.
Changes:
- Added shared GPU resource helpers (`GPUResourceNames`, `IsGPUResourceName`, `SumGPURequested`) and updated callers to use the shared summation logic.
- Fixed multi-accelerator undercount by summing across all known GPU resource names (instead of overwriting per match).
- Added unit tests covering known/unknown GPU resource names and regression coverage for multi-accelerator sums.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| pkg/k8s/gpu_resources.go | Introduces centralized GPU resource-name list + helper functions, including deterministic summation across all known names. |
| pkg/k8s/gpu_resources_test.go | Adds tests for name classification and SumGPURequested regression coverage (multi-accelerator sum). |
| pkg/k8s/client_resources.go | Switches pod container GPURequested calculation to SumGPURequested (Requests, fallback to Limits). |
| pkg/k8s/client_gpu.go | Replaces repeated per-resource checks with a single SumGPURequested call for node allocation accounting. |
```go
var GPUResourceNames = []corev1.ResourceName{
	"nvidia.com/gpu",
	"amd.com/gpu",
	"gpu.intel.com/i915",
	"habana.ai/gaudi",
	"habana.ai/gaudi2",
	"intel.com/gaudi",
}
```
GPUResourceNames is exported as a mutable slice. Since it’s only referenced within the k8s package, consider making the list unexported (and, if needed externally, expose a function that returns a defensive copy) to prevent accidental mutation and reduce public API surface.
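If that suggestion is taken, one possible shape (illustrative only; not what this PR currently ships) is an unexported slice with a copy-returning accessor:

```go
// gpuResourceNames stays unexported so callers outside the package cannot
// mutate the canonical list in place.
var gpuResourceNames = []corev1.ResourceName{
	"nvidia.com/gpu",
	"amd.com/gpu",
	"gpu.intel.com/i915",
	"habana.ai/gaudi",
	"habana.ai/gaudi2",
	"intel.com/gaudi",
}

// GPUResourceNames returns a defensive copy, letting external packages read
// the list without being able to change it.
func GPUResourceNames() []corev1.ResourceName {
	out := make([]corev1.ResourceName, len(gpuResourceNames))
	copy(out, gpuResourceNames)
	return out
}
```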
| "nvidia.com/gpu", | ||
| "amd.com/gpu", | ||
| "gpu.intel.com/i915", | ||
| "habana.ai/gaudi", | ||
| "habana.ai/gaudi2", | ||
| "intel.com/gaudi", |
The comment says to keep GPUResourceNames “sorted by vendor”, but the current order mixes Intel/Habana entries (Intel appears both before and after the Habana names). Either adjust the ordering to match the comment or update the comment to reflect the intended grouping.
| "nvidia.com/gpu", | |
| "amd.com/gpu", | |
| "gpu.intel.com/i915", | |
| "habana.ai/gaudi", | |
| "habana.ai/gaudi2", | |
| "intel.com/gaudi", | |
| "amd.com/gpu", | |
| "habana.ai/gaudi", | |
| "habana.ai/gaudi2", | |
| "gpu.intel.com/i915", | |
| "intel.com/gaudi", | |
| "nvidia.com/gpu", |
```go
// IsGPUResourceName reports whether the given Kubernetes resource name is a
// known GPU / AI accelerator. Exact match; no vendor-prefix heuristics.
func IsGPUResourceName(name corev1.ResourceName) bool {
	for _, n := range GPUResourceNames {
		if name == n {
			return true
		}
	}
	return false
}
```
IsGPUResourceName is exported but (currently) only used by gpu_resources_test.go. If this helper is intended to be internal, consider unexporting it to avoid growing the public API; if it’s intended to be public, consider adding at least one non-test usage to justify the exported surface.
```go
// Keep each row's expected sum honest: the hand-written totals prove the
// SumGPURequested contract (sum across ALL known GPU resource names) without
// looping the implementation under test.
```
The comment about “each row's expected sum” and “SumGPURequested contract” sits above TestIsGPUResourceName, but it describes the SumGPURequested table test below. Consider moving/rewriting the comment so it documents the correct test and doesn’t confuse readers.
Suggested change:

```diff
-// Keep each row's expected sum honest: the hand-written totals prove the
-// SumGPURequested contract (sum across ALL known GPU resource names) without
-// looping the implementation under test.
+// TestIsGPUResourceName verifies that the known GPU resource names are
+// accepted and that adjacent-but-unsupported resource names are rejected.
```
Thank you for your contribution! Your PR has been merged. Check out what's new:
Stay connected: Slack #kubestellar-dev | Multi-Cluster Survey
Post-merge build verification passed ✅ Both Go and frontend builds compiled successfully against the merge commit.
Summary
Follow-up PR addressing the 3 Copilot review findings on #9204:
- `ci.GPURequested` was overwritten per matching resource name, so a container requesting both `nvidia.com/gpu` and `habana.ai/gaudi` returned whichever came last in Go's randomized map iteration.
- The same six resource names were hard-coded in both `client_resources.go` and `client_gpu.go` — the exact drift surface that Issue 9090 originally suffered.

Changes
- `pkg/k8s/gpu_resources.go` — single source of truth:
  - `GPUResourceNames` (exported slice)
  - `IsGPUResourceName(name)` predicate
  - `SumGPURequested(rl)` — sums across ALL known names (fixes #1)
- `pkg/k8s/client_resources.go` — uses `SumGPURequested` on Requests, falls back to Limits.
- `pkg/k8s/client_gpu.go` — replaces six repeated `if ok` blocks with a single `SumGPURequested` call on the Gaudi-era path, so pod and node trackers read the same list.
- `pkg/k8s/gpu_resources_test.go`:
  - `TestIsGPUResourceName` — 6 known names (true) + 7 adjacent names including `google.com/tpu`, `intel.com/xpu`, `nvidia.com/mig-4g.20gb` (false)
  - `TestSumGPURequested` — empty, single-name, multi-accelerator regression (nvidia=1 + gaudi=2 → 3), all-six-summed, non-GPU ignored

Context
Related PR: #9204 (original #9090 fix, merged 2026-04-20).
Copilot review comments addressed:
- `client_resources.go:39` — centralize shared list
- `client_resources.go:107` — add unit test coverage
- `client_resources.go:106` — sum instead of overwrite (bug)