We are running workloads on an MPS-enabled node in GKE. The node is configured with:
--accelerator="type=nvidia-a100-80gb,count=1,gpu-sharing-strategy=mps,max-shared-clients-per-gpu=4"
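For reference, the node pool is created with a command along these lines (the cluster, pool, zone, and machine-type names here are placeholders, not our exact values):

gcloud container node-pools create gpu-mps-pool \
    --cluster=my-cluster \
    --zone=us-central1-a \
    --machine-type=a2-ultragpu-1g \
    --accelerator="type=nvidia-a100-80gb,count=1,gpu-sharing-strategy=mps,max-shared-clients-per-gpu=4"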
and when scheduling the pods (the MPS clients), we set the environment variables as:
env:
- name: CUDA_MPS_ACTIVE_THREAD_PERCENTAGE
value: "30"
- name: CUDA_MPS_PINNED_DEVICE_MEM_LIMIT
value: "15Gi"
We need to monitor GPU utilization at the MPS-client / process / container level. Running `nvidia-smi` and `nvidia-smi pmon`, we only see aggregate utilization: in `pmon` the SM and memory activity is reported against the `nvidia-cuda-mps` server process, while the client processes show only dashes:
/home/kubernetes/bin/nvidia/bin/nvidia-smi pmon -c 25
# gpu pid type sm mem enc dec jpg ofa command
# Idx # C/G % % % % % % name
0 7078 C 99 21 - - - - nvidia-cuda-mps
0 10075 M+C - - - - - - python3
0 10078 M+C - - - - - - python3
0 7078 C 99 20 - - - - nvidia-cuda-mps
0 10075 M+C - - - - - - python3
0 10078 M+C - - - - - - python3
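The per-process breakdown we are after is roughly what accounting mode exposes; a sketch of how we would query it is below (we have not yet verified whether, under MPS, accounting attributes utilization to the client PIDs or only to the nvidia-cuda-mps server):

# enable per-process accounting on GPU 0 (needs admin privileges on the node)
/home/kubernetes/bin/nvidia/bin/nvidia-smi -i 0 --accounting-mode=1

# later, query the accumulated per-process statistics
/home/kubernetes/bin/nvidia/bin/nvidia-smi -i 0 \
    --query-accounted-apps=pid,gpu_utilization,mem_utilization,max_memory_usage,time \
    --format=csv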
- How can we get MPS-client / process-level GPU utilization?
- For each pod replica the device memory limit is CUDA_MPS_PINNED_DEVICE_MEM_LIMIT=15Gi, so what explains the ~20% GPU memory utilization reported by `pmon` (~16GB if read as a share of the 80GB card)?
- In `nvidia-smi` the memory appears as ~3GB for each process (output below). What is the difference between the GPU memory usage reported by `nvidia-smi` and by `nvidia-smi pmon`?
+-----------------------------------------------------------------------------------------+
| Processes:                                                                               |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory  |
|        ID   ID                                                               Usage       |
|=========================================================================================|
|    0   N/A  N/A      7078      C   nvidia-cuda-mps-server                        34MiB   |
|    0   N/A  N/A     10075    M+C   python3                                     3220MiB   |
|    0   N/A  N/A     10078    M+C   python3                                     3220MiB   |
+-----------------------------------------------------------------------------------------+