[Linux] Fix GPU name collision for multi-AMD GPU systems #1297

@shm11C3

Description

Parent: #1296

Problem

get_gpu_name_from_lspci_by_vendor_id("1002") in lspci.rs returns the first VGA device matching vendor ID 0x1002. When a system has both an AMD iGPU (e.g., Renoir integrated Radeon Graphics) and a discrete AMD GPU (e.g., Radeon RX 7900), both share vendor 0x1002, so both GPUs are assigned the same name.

Since the GPU name is the key of the gpu_usage_histories / gpu_temperature_histories HashMaps and is also stored in the gpu_name column of GPU_DATA_ARCHIVE, this causes:

  • One GPU's metrics overwriting the other in real-time history
  • Both GPUs stored under the same name in the archive DB
  • Only one tab appearing in Insights even though two GPUs exist
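
A minimal sketch of the collision, assuming a simplified "latest sample per GPU name" map. The function names and the map's value type are hypothetical; the real gpu_usage_histories layout is not shown in this issue.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a name-keyed metrics map: GPU name -> latest usage (%).
fn record_usage(latest: &mut HashMap<String, f32>, gpu_name: &str, usage: f32) {
    // `insert` replaces any previous value stored under the same key.
    latest.insert(gpu_name.to_string(), usage);
}

/// Simulates one sampling tick where both AMD GPUs resolved to the same name,
/// as happens when the name comes from the first lspci match for vendor 1002.
fn colliding_tick() -> HashMap<String, f32> {
    let mut latest = HashMap::new();
    record_usage(&mut latest, "Radeon Graphics", 5.0); // iGPU sample
    record_usage(&mut latest, "Radeon Graphics", 87.0); // dGPU sample, same key
    latest
}
```

With two physical GPUs sampled, the map ends up with a single entry, which is consistent with only one tab appearing in Insights.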

Additionally, get_gpu_usage() in platform/linux/gpu.rs returns on the first AMD card found, completely ignoring the second GPU.
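
The difference between the early-return pattern and a per-card collection can be sketched as follows. CardUsage and both functions are hypothetical; the actual signature and return type of get_gpu_usage() are not shown in this issue.

```rust
/// Hypothetical per-card reading, used only for illustration.
#[derive(Debug, Clone, PartialEq)]
struct CardUsage {
    card_index: u32,
    is_amd: bool,
    usage_percent: f32,
}

/// Early-return style: stops at the first AMD card, so later cards are never read.
fn first_amd_usage(cards: &[CardUsage]) -> Option<f32> {
    cards.iter().find(|c| c.is_amd).map(|c| c.usage_percent)
}

/// Per-card style: yields one (card index, usage) pair for every AMD card.
fn all_amd_usage(cards: &[CardUsage]) -> Vec<(u32, f32)> {
    cards
        .iter()
        .filter(|c| c.is_amd)
        .map(|c| (c.card_index, c.usage_percent))
        .collect()
}
```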

Affected Files

  • src-tauri/src/infrastructure/providers/linux/lspci.rs: get_gpu_name_from_lspci_by_vendor_id() returns first match only
  • src-tauri/src/infrastructure/providers/linux/drm_sys.rs: needs BDF extraction from sysfs
  • src-tauri/src/services/monitoring_service.rs: collect_linux_gpu_metrics() uses vendor-ID-based name lookup
  • src-tauri/src/platform/linux/gpu.rs: get_amd_graphic_info() uses same broken lookup; get_gpu_usage() returns first AMD card only

Proposed Solution

  1. Add BDF extraction in drm_sys.rs: read the PCI address from the sysfs symlink at /sys/class/drm/card{N}/device (e.g., ../../0000:03:00.0) to get the bus:device.function tuple

  2. Add BDF-based lspci lookup in lspci.rs: match the specific PCI slot (e.g., 03:00.0) from lspci -nn output instead of matching by vendor ID alone

  3. Update collect_linux_gpu_metrics() to use BDF-based name resolution for each card individually

  4. Update get_amd_graphic_info() to accept and use the card's BDF for name resolution

  5. Fix get_gpu_usage() to not early-return on first AMD card (or deprecate in favor of sample_gpu() which already handles multiple GPUs)
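
Steps 1 and 2 could look roughly like the sketch below. Both function names are hypothetical, the parsing is kept pure (no filesystem or process calls) so it can be unit-tested, and it assumes a 4-digit PCI domain in sysfs and lspci -nn output that omits the domain.

```rust
use std::path::Path;

/// Extracts a short BDF such as "03:00.0" from a sysfs device link target
/// such as "../../0000:03:00.0". Returns None if the last path component
/// does not look like `domain:bus:device.function`.
fn bdf_from_link_target(target: &Path) -> Option<String> {
    let name = target.file_name()?.to_str()?;
    // Split "0000:03:00.0" into the domain and the "03:00.0" remainder.
    let mut parts = name.splitn(2, ':');
    let domain = parts.next()?;
    let bdf = parts.next()?;
    let (bus, dev_fn) = bdf.split_once(':')?;
    let (dev, func) = dev_fn.split_once('.')?;
    let is_hex = |s: &str, len: usize| s.len() == len && s.chars().all(|c| c.is_ascii_hexdigit());
    if is_hex(domain, 4) && is_hex(bus, 2) && is_hex(dev, 2) && is_hex(func, 1) {
        Some(bdf.to_string())
    } else {
        None
    }
}

/// Finds the device description for `bdf` in `lspci -nn` output.
/// Returns everything after the class code, e.g. "<name> [vvvv:dddd]".
/// How the project trims that description further is not covered here.
fn gpu_name_for_bdf(lspci_output: &str, bdf: &str) -> Option<String> {
    lspci_output.lines().find_map(|line| {
        // A line starts with the slot, e.g. "03:00.0 VGA compatible controller [0300]: ..."
        let rest = line.strip_prefix(bdf)?;
        if !rest.starts_with(' ') {
            return None;
        }
        // The description follows the first "]: ", which closes the class code.
        let (_, name) = rest.split_once("]: ")?;
        Some(name.trim().to_string())
    })
}
```

In drm_sys.rs the link target would presumably come from std::fs::read_link on /sys/class/drm/card{N}/device, and the lspci text from the existing lspci -nn invocation.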

Fallback

If BDF resolution fails (e.g., unusual kernel configuration), fall back to "AMD GPU (card{N})" format using the card ID, which at least ensures uniqueness.
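
A minimal sketch of that fallback, assuming the BDF-based lookup yields an Option<String>. The function name resolve_gpu_name is hypothetical.

```rust
/// Returns the resolved lspci name when available, otherwise a unique
/// "AMD GPU (card{N})" label built from the DRM card index.
fn resolve_gpu_name(card_index: u32, lspci_name: Option<String>) -> String {
    lspci_name
        // Treat an empty or whitespace-only name as a failed lookup.
        .filter(|name| !name.trim().is_empty())
        .unwrap_or_else(|| format!("AMD GPU (card{card_index})"))
}
```

Because the card index differs per device, two cards that both fail resolution still get distinct keys.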

Metadata

Assignees: No one assigned
Labels: backend, bug, hardware, rust
Status: Done
Milestone: No milestone
