We are running workloads on an MPS-enabled node in GKE. The node is configured with:
--accelerator="type=nvidia-a100-80gb,count=1,gpu-sharing-strategy=mps,max-shared-clients-per-gpu=4"
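For reference, the node pool is created with a command along these lines (the cluster, pool, zone, and machine-type names here are placeholders, not our exact values):

gcloud container node-pools create gpu-mps-pool \
    --cluster=my-cluster \
    --zone=us-central1-a \
    --machine-type=a2-ultragpu-1g \
    --accelerator="type=nvidia-a100-80gb,count=1,gpu-sharing-strategy=mps,max-shared-clients-per-gpu=4"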
and when scheduling the pods (the MPS clients), we set the environment variables as:
env:
- name: CUDA_MPS_ACTIVE_THREAD_PERCENTAGE
value: "30"
- name: CUDA_MPS_PINNED_DEVICE_MEM_LIMIT
value: "15Gi"
We need to monitor GPU utilization at the MPS-client / process / container level. Running `nvidia-smi` and `nvidia-smi pmon`, we only see aggregate utilization: in `pmon` the SM and memory activity is reported against the `nvidia-cuda-mps` server process, while the client processes show only dashes:
/home/kubernetes/bin/nvidia/bin/nvidia-smi pmon -c 25
# gpu pid type sm mem enc dec jpg ofa command
# Idx # C/G % % % % % % name
0 7078 C 99 21 - - - - nvidia-cuda-mps
0 10075 M+C - - - - - - python3
0 10078 M+C - - - - - - python3
0 7078 C 99 20 - - - - nvidia-cuda-mps
0 10075 M+C - - - - - - python3
0 10078 M+C - - - - - - python3
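The per-process breakdown we are after is roughly what accounting mode exposes; a sketch of how we would query it is below (we have not yet verified whether, under MPS, accounting attributes utilization to the client PIDs or only to the nvidia-cuda-mps server):

# enable per-process accounting on GPU 0 (needs admin privileges on the node)
/home/kubernetes/bin/nvidia/bin/nvidia-smi -i 0 --accounting-mode=1

# later, query the accumulated per-process statistics
/home/kubernetes/bin/nvidia/bin/nvidia-smi -i 0 \
    --query-accounted-apps=pid,gpu_utilization,mem_utilization,max_memory_usage,time \
    --format=csv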
- How can we get MPS-client / process-level GPU utilization?
- For each pod replica the device memory limit is CUDA_MPS_PINNED_DEVICE_MEM_LIMIT=15Gi, so what explains the ~20% GPU memory utilization reported by `pmon` (~16GB if read as a share of the 80GB card)?
- In `nvidia-smi` the memory appears as ~3GB for each process (output below). What is the difference between the GPU memory usage reported by `nvidia-smi` and by `nvidia-smi pmon`?
+-----------------------------------------------------------------------------------------+
| Processes:                                                                               |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory  |
|        ID   ID                                                               Usage       |
|=========================================================================================|
|    0   N/A  N/A      7078      C   nvidia-cuda-mps-server                        34MiB   |
|    0   N/A  N/A     10075    M+C   python3                                     3220MiB   |
|    0   N/A  N/A     10078    M+C   python3                                     3220MiB   |
+-----------------------------------------------------------------------------------------+