[None][fix] Resolve NVML device index mismatch in get_numa_aware_cpu_affinity when CUDA_VISIBLE_DEVICES is set by YPxHolic · Pull Request #12985 · NVIDIA/TensorRT-LLM

YPxHolic · 2026-04-13T07:07:37Z

Summary by CodeRabbit

Bug Fixes
- Enhanced GPU device identification to properly handle CUDA_VISIBLE_DEVICES environment variable remapping, ensuring correct logical-to-physical device mapping for CPU affinity assignment in multi-GPU environments.

Description

Problem

get_numa_aware_cpu_affinity() in tensorrt_llm/llmapi/utils.py passes the logical CUDA device index directly to pynvml.nvmlDeviceGetHandleByIndex(). However, NVML always enumerates all physical GPUs on the system regardless of CUDA_VISIBLE_DEVICES.

When a user restricts GPU visibility (e.g. CUDA_VISIBLE_DEVICES=3,4), logical CUDA device 0 actually corresponds to physical GPU 3. The old code would query physical GPU 0's NUMA topology instead, causing worker threads to be bound to incorrect CPU cores on the wrong NUMA node. This leads to degraded performance in multi-GPU deployments that use GPU subsetting (e.g. disaggregated prefill/decode with TP>1).

Solution

Added a mapping from logical CUDA device index to physical NVML device index by parsing CUDA_VISIBLE_DEVICES before the NVML call:

- When CUDA_VISIBLE_DEVICES is set, parse it into a list of physical GPU IDs and use physical_ids[device_id] as the NVML index.
- When CUDA_VISIBLE_DEVICES is not set, behavior is unchanged (nvml_device_id = device_id).
- Added a warning log with graceful fallback when device_id exceeds the length of the CUDA_VISIBLE_DEVICES list.
- Updated the docstring to clarify that device_id is the logical CUDA index.

Example

CUDA_VISIBLE_DEVICES=3,4
Logical device 0 → Physical GPU 3 → correct NUMA node queried
Logical device 1 → Physical GPU 4 → correct NUMA node queried
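The mapping described in the Solution and Example can be sketched as a small pure-Python helper. This is a minimal illustration under stated assumptions, not the code merged into `tensorrt_llm/llmapi/utils.py`. The helper name `logical_to_nvml_index` and the injectable `env` argument are hypothetical, added so the logic runs without NVML or a GPU. Like the Solution text, it assumes every CUDA_VISIBLE_DEVICES entry is an integer. Treating an empty value like an unset one is also an assumption here.

```python
import logging
import os
from typing import Mapping, Optional

logger = logging.getLogger(__name__)


def logical_to_nvml_index(
    device_id: int, env: Optional[Mapping[str, str]] = None
) -> int:
    """Map a logical CUDA device index to a physical NVML index.

    Hypothetical helper: assumes every CUDA_VISIBLE_DEVICES entry is an
    integer GPU ID (UUID/MIG entries would make int() raise ValueError).
    """
    if env is None:
        env = os.environ
    cuda_visible = env.get("CUDA_VISIBLE_DEVICES")
    if cuda_visible is None or not cuda_visible.strip():
        # Unset (or empty, by assumption): NVML index == logical index.
        return device_id
    physical_ids = [int(x) for x in cuda_visible.split(",") if x.strip()]
    if 0 <= device_id < len(physical_ids):
        return physical_ids[device_id]
    # Index not covered by the list: warn and fall back to device_id.
    logger.warning(
        "device_id %d is outside CUDA_VISIBLE_DEVICES=%r; "
        "using it directly as the NVML index.", device_id, cuda_visible)
    return device_id


# Worked example from the PR description: CUDA_VISIBLE_DEVICES=3,4
env = {"CUDA_VISIBLE_DEVICES": "3,4"}
for logical in (0, 1):
    print(logical, "->", logical_to_nvml_index(logical, env))
# 0 -> 3
# 1 -> 4
```

The `0 <=` guard is included deliberately: without it, a negative `device_id` would silently wrap around to the last visible GPU, which is the issue the CodeRabbit review raises.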

coderabbitai · 2026-04-13T07:13:10Z

Walkthrough

Updated get_numa_aware_cpu_affinity() function to properly handle CUDA_VISIBLE_DEVICES environment variable remapping by interpreting the input device_id as a logical CUDA device index and mapping it to the corresponding physical GPU index for NVML operations.

Changes

Cohort / File(s)	Summary
Device ID mapping logic `tensorrt_llm/llmapi/utils.py`	Added logic to parse `CUDA_VISIBLE_DEVICES` environment variable, map logical `device_id` to physical GPU index, include fallback handling with warning for out-of-range indices, and updated docstring to clarify `device_id` parameter as logical CUDA index.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly and specifically describes the main fix: resolving NVML device index mismatch when CUDA_VISIBLE_DEVICES is set, directly matching the core change in the PR.
Description check	✅ Passed	The PR description includes Problem, Solution, and Example sections that clearly explain the issue, the fix, and its impact. However, the PR checklist items are not explicitly addressed, and the Test Coverage section is missing.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.


coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tensorrt_llm/llmapi/utils.py`:
- Around line 583-597: The CUDA_VISIBLE_DEVICES parsing assumes all tokens are integers and allows negative device_id indexing. Change the logic in the block that computes nvml_device_id so that CUDA_VISIBLE_DEVICES is parsed into physical_ids, a list of trimmed tokens (convert a token to int only when it is numeric; otherwise keep the raw token string). Then guard selection with an explicit non-negative bounds check (if 0 <= device_id < len(physical_ids)) before indexing. If in range, set nvml_device_id to physical_ids[device_id] (which may be an int or a UUID/MIG string); otherwise, log the warning and fall back to nvml_device_id = device_id. Use the existing symbols physical_ids, device_id, nvml_device_id and logger when implementing this.


ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: f810959a-468b-42d2-96f7-9f7a39eb0fbf

📥 Commits

Reviewing files that changed from the base of the PR and between 9a3dc61 and 64b7fb2.

📒 Files selected for processing (1)

tensorrt_llm/llmapi/utils.py

…y when CUDA_VISIBLE_DEVICES is set

NVML always enumerates all physical GPUs regardless of CUDA_VISIBLE_DEVICES. When a user restricts GPU visibility (e.g. CUDA_VISIBLE_DEVICES=3,4), the logical CUDA device index 0 actually corresponds to physical GPU 3. Previously, get_numa_aware_cpu_affinity() passed the logical index directly to nvmlDeviceGetHandleByIndex(), causing it to query the wrong GPU's NUMA topology and bind worker threads to incorrect CPU cores.

This commit adds a mapping from logical CUDA device index to physical NVML device index by parsing CUDA_VISIBLE_DEVICES before the NVML call, ensuring correct NUMA-aware CPU affinity in multi-GPU deployments that use GPU subsetting.

Signed-off-by: pernyyang <[email protected]>

Superjomn

LGTM

karljang · 2026-05-14T06:53:50Z

/bot run --disable-fail-fast

tensorrt-cicd · 2026-05-14T06:59:15Z

PR_Github #48320 [ run ] triggered by Bot. Commit: 87e96f9 Link to invocation

tensorrt-cicd · 2026-05-15T07:00:06Z

PR_Github #48320 [ run ] completed with state ABORTED. Commit: 87e96f9

Link to invocation

karljang · 2026-05-15T16:23:03Z

/bot run --disable-fail-fast

tensorrt-cicd · 2026-05-15T16:28:09Z

PR_Github #48604 [ run ] triggered by Bot. Commit: 87e96f9 Link to invocation

tensorrt-cicd · 2026-05-15T22:43:58Z

PR_Github #48604 [ run ] completed with state FAILURE. Commit: 87e96f9
/LLM/main/L0_MergeRequest_PR pipeline #38387 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

YPxHolic · 2026-05-21T03:44:00Z

Hi @karljang,
I don't have access to the CI logs. Could you please share the failure details so I can investigate and fix them? Thanks!

karljang · 2026-05-21T16:27:32Z

/bot run --disable-fail-fast

tensorrt-cicd · 2026-05-21T16:33:11Z

PR_Github #49736 [ run ] triggered by Bot. Commit: 87e96f9 Link to invocation

tensorrt-cicd · 2026-05-21T23:05:31Z

PR_Github #49736 [ run ] completed with state SUCCESS. Commit: 87e96f9
/LLM/main/L0_MergeRequest_PR pipeline #39340 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

YPxHolic · 2026-05-25T03:06:48Z

Hi @karljang,
Still running into an issue here. Could you take a look?

karljang · 2026-05-25T07:42:05Z

Again, the failure doesn't seem to be related.

karljang · 2026-05-25T07:42:11Z

/bot run --disable-fail-fast

tensorrt-cicd · 2026-05-25T07:47:48Z

PR_Github #50174 [ run ] triggered by Bot. Commit: 87e96f9 Link to invocation

tensorrt-cicd · 2026-05-25T14:42:15Z

PR_Github #50174 [ run ] completed with state SUCCESS. Commit: 87e96f9
/LLM/main/L0_MergeRequest_PR pipeline #39718 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

karljang · 2026-05-26T01:43:54Z

@YPxHolic ,
Thanks for your contributions!
Most CI failures look unrelated, but there is a formatting issue. Could you please check this log:

diff --git a/tensorrt_llm/llmapi/utils.py b/tensorrt_llm/llmapi/utils.py
index 9c724ff..82adc4f 100644
--- a/tensorrt_llm/llmapi/utils.py
+++ b/tensorrt_llm/llmapi/utils.py
@@ -573,7 +573,9 @@ def get_numa_aware_cpu_affinity(device_id):
         # actually corresponds to physical GPU 3.
         cuda_visible = os.environ.get("CUDA_VISIBLE_DEVICES")
         if cuda_visible is not None and cuda_visible.strip():
-            visible_tokens = [x.strip() for x in cuda_visible.split(",") if x.strip()]
+            visible_tokens = [
+                x.strip() for x in cuda_visible.split(",") if x.strip()
+            ]
             if 0 <= device_id < len(visible_tokens):
                 token = visible_tokens[device_id]
                 if token.isdigit():
@@ -581,7 +583,8 @@ def get_numa_aware_cpu_affinity(device_id):
                 else:
                     logger.warning(
                         f"CUDA_VISIBLE_DEVICES token '{token}' is non-numeric; "
-                        f"falling back to device_id ({device_id}) as NVML index.")
+                        f"falling back to device_id ({device_id}) as NVML index."
+                    )
                     nvml_device_id = device_id
             else:
                 logger.warning(


Error: pre-commit checks failed
Please refer to our coding style guidelines at: https://github.com/NVIDIA/TensorRT-LLM/blob/main/CONTRIBUTING.md#coding-style to fix this issue
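The hunks above show only fragments of the revised branch. A hedged reconstruction of the whole mapping, wrapped in a hypothetical `nvml_index_for()` helper and formatted the way the pre-commit diff expects (long comprehension and f-string call split across lines), might look like this. The lines hidden between the hunks (the `int(token)` assignment and the unset/empty branch) and the wording of the out-of-range warning are assumptions, not copied from the merged file.

```python
import logging
import os

logger = logging.getLogger(__name__)


def nvml_index_for(device_id: int) -> int:
    """Hypothetical reconstruction of the revised mapping.

    Numeric tokens map to physical NVML indices; non-numeric tokens
    (e.g. UUIDs) and out-of-range indices fall back to device_id.
    """
    nvml_device_id = device_id
    cuda_visible = os.environ.get("CUDA_VISIBLE_DEVICES")
    if cuda_visible is not None and cuda_visible.strip():
        visible_tokens = [
            x.strip() for x in cuda_visible.split(",") if x.strip()
        ]
        if 0 <= device_id < len(visible_tokens):
            token = visible_tokens[device_id]
            if token.isdigit():
                nvml_device_id = int(token)
            else:
                logger.warning(
                    f"CUDA_VISIBLE_DEVICES token '{token}' is non-numeric; "
                    f"falling back to device_id ({device_id}) as NVML index."
                )
        else:
            # Assumed wording; the diff cuts off inside this call.
            logger.warning(
                f"device_id ({device_id}) is out of range for "
                f"CUDA_VISIBLE_DEVICES='{cuda_visible}'; "
                "falling back to device_id as NVML index."
            )
    return nvml_device_id


os.environ["CUDA_VISIBLE_DEVICES"] = "3,GPU-abc"  # placeholder UUID
print(nvml_index_for(0))  # 3
print(nvml_index_for(1))  # 1 (non-numeric token: warning + fallback)
```

Unlike the review suggestion, this variant never hands a string to NVML, so the caller can keep using an index-based handle lookup.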

…aware_cpu_affinity Signed-off-by: pernyyang <[email protected]>

YPxHolic · 2026-05-27T11:59:42Z

Hi @karljang,
Thanks for the review! I've addressed the comments. Could you please take another look?

karljang · 2026-05-28T21:31:23Z

/bot run --disable-fail-fast

tensorrt-cicd · 2026-05-28T21:37:34Z

PR_Github #50893 [ run ] triggered by Bot. Commit: 7939099 Link to invocation

tensorrt-cicd · 2026-05-29T03:57:22Z

PR_Github #50893 [ run ] completed with state SUCCESS. Commit: 7939099
/LLM/main/L0_MergeRequest_PR pipeline #40359 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

karljang · 2026-05-29T05:17:17Z

/bot run --disable-fail-fast

tensorrt-cicd · 2026-05-29T05:22:49Z

PR_Github #50974 [ run ] triggered by Bot. Commit: 7939099 Link to invocation

tensorrt-cicd · 2026-05-29T06:21:35Z

PR_Github #50974 [ run ] completed with state SUCCESS. Commit: 7939099
/LLM/main/L0_MergeRequest_PR pipeline #40429 completed with status: 'SUCCESS'

CI Report

Link to invocation

YPxHolic requested a review from a team as a code owner April 13, 2026 07:07

YPxHolic requested a review from syuoni April 13, 2026 07:07

github-actions Bot assigned YPxHolic Apr 13, 2026

coderabbitai Bot reviewed Apr 13, 2026


Comment thread tensorrt_llm/llmapi/utils.py

YPxHolic force-pushed the fix/numa-affinity-cuda-visible-devices branch from 64b7fb2 to 0406292 on April 13, 2026 07:22

Merge branch 'main' into fix/numa-affinity-cuda-visible-devices

87e96f9

svc-trtllm-gh-bot added the "Community want to contribute" label (PRs initiated from Community) Apr 13, 2026

Superjomn approved these changes May 13, 2026


Merge branch 'main' into fix/numa-affinity-cuda-visible-devices

f691ac7

Merge branch 'NVIDIA:main' into fix/numa-affinity-cuda-visible-devices

ede5317

YPxHolic force-pushed the fix/numa-affinity-cuda-visible-devices branch from fa6e8e7 to ede5317 on May 27, 2026 11:46

fix: Improve formatting of CUDA_VISIBLE_DEVICES handling in get_numa_…

7939099

…aware_cpu_affinity Signed-off-by: pernyyang <[email protected]>

YPxHolic force-pushed the fix/numa-affinity-cuda-visible-devices branch from c90d446 to 7939099 on May 27, 2026 11:55

karljang merged commit ebbbec4 into NVIDIA:main May 29, 2026
7 checks passed

5 participants
