[None][fix] Resolve NVML device index mismatch in get_numa_aware_cpu_affinity when CUDA_VISIBLE_DEVICES is set#12985
Conversation
📝 WalkthroughWalkthroughUpdated Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~12 minutes 🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tensorrt_llm/llmapi/utils.py`:
- Around line 583-597: The CUDA_VISIBLE_DEVICES parsing assumes all tokens are
integers and allows negative device_id indexing; change the logic in the block
that computes nvml_device_id so CUDA_VISIBLE_DEVICES is parsed into a list of
trimmed tokens (convert tokens to int only when they are numeric, otherwise keep
the raw token string) into physical_ids, then guard selection with an explicit
non-negative bounds check (if 0 <= device_id < len(physical_ids)) before
indexing; if in-range set nvml_device_id to physical_ids[device_id] (which may
be an int or a UUID/MIG string), otherwise log the warning and fall back to
nvml_device_id = device_id. Use the existing symbols physical_ids, device_id,
nvml_device_id and logger when implementing this.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: f810959a-468b-42d2-96f7-9f7a39eb0fbf
📒 Files selected for processing (1)
tensorrt_llm/llmapi/utils.py
…y when CUDA_VISIBLE_DEVICES is set NVML always enumerates all physical GPUs regardless of CUDA_VISIBLE_DEVICES. When a user restricts GPU visibility (e.g. CUDA_VISIBLE_DEVICES=3,4), the logical CUDA device index 0 actually corresponds to physical GPU 3. Previously, get_numa_aware_cpu_affinity() passed the logical index directly to nvmlDeviceGetHandleByIndex(), causing it to query the wrong GPU's NUMA topology and bind worker threads to incorrect CPU cores. This commit adds a mapping from logical CUDA device index to physical NVML device index by parsing CUDA_VISIBLE_DEVICES before the NVML call, ensuring correct NUMA-aware CPU affinity in multi-GPU deployments that use GPU subsetting. Signed-off-by: pernyyang <[email protected]>
64b7fb2 to
0406292
Compare
|
/bot run --disable-fail-fast |
|
PR_Github #48320 [ run ] triggered by Bot. Commit: |
|
PR_Github #48320 [ run ] completed with state |
|
/bot run --disable-fail-fast |
|
PR_Github #48604 [ run ] triggered by Bot. Commit: |
|
PR_Github #48604 [ run ] completed with state
|
|
Hi @karljang , |
|
/bot run --disable-fail-fast |
|
PR_Github #49736 [ run ] triggered by Bot. Commit: |
|
PR_Github #49736 [ run ] completed with state
|
|
Hi @karljang , |
|
Again, the failure doesn't seem to be related. |
|
/bot run --disable-fail-fast |
|
PR_Github #50174 [ run ] triggered by Bot. Commit: |
|
PR_Github #50174 [ run ] completed with state
|
|
@YPxHolic , |
fa6e8e7 to
ede5317
Compare
…aware_cpu_affinity Signed-off-by: pernyyang <[email protected]>
c90d446 to
7939099
Compare
|
Hi @karljang , |
|
/bot run --disable-fail-fast |
|
PR_Github #50893 [ run ] triggered by Bot. Commit: |
|
PR_Github #50893 [ run ] completed with state
|
|
/bot run --disable-fail-fast |
|
PR_Github #50974 [ run ] triggered by Bot. Commit: |
|
PR_Github #50974 [ run ] completed with state |
Summary by CodeRabbit
CUDA_VISIBLE_DEVICESenvironment variable remapping, ensuring correct logical-to-physical device mapping for CPU affinity assignment in multi-GPU environments.Description
Problem
get_numa_aware_cpu_affinity()intensorrt_llm/llmapi/utils.pypasses the logical CUDA device index directly topynvml.nvmlDeviceGetHandleByIndex(). However, NVML always enumerates all physical GPUs on the system regardless ofCUDA_VISIBLE_DEVICES.When a user restricts GPU visibility (e.g.
CUDA_VISIBLE_DEVICES=3,4), logical CUDA device 0 actually corresponds to physical GPU 3. The old code would query physical GPU 0's NUMA topology instead, causing worker threads to be bound to incorrect CPU cores on the wrong NUMA node. This leads to degraded performance in multi-GPU deployments that use GPU subsetting (e.g. disaggregated prefill/decode with TP>1).Solution
Added a mapping from logical CUDA device index to physical NVML device index by parsing
CUDA_VISIBLE_DEVICESbefore the NVML call:CUDA_VISIBLE_DEVICESis set, parse it into a list of physical GPU IDs and usephysical_ids[device_id]as the NVML index.CUDA_VISIBLE_DEVICESis not set, behavior is unchanged (nvml_device_id = device_id).device_idexceeds theCUDA_VISIBLE_DEVICESlist length.device_idis the logical CUDA index.Example
CUDA_VISIBLE_DEVICES=3,4
Logical device 0 → Physical GPU 3 → correct NUMA node queried
Logical device 1 → Physical GPU 4 → correct NUMA node queried