Process/client-level GPU utilization observability

We are running workloads on an MPS-enabled node in GKE. The node pool is configured with:

--accelerator="type=nvidia-a100-80gb,count=1,gpu-sharing-strategy=mps,max-shared-clients-per-gpu=4

When scheduling pods (the MPS clients), we set the environment variables as follows:

env:
        - name: CUDA_MPS_ACTIVE_THREAD_PERCENTAGE
          value: "30"
        - name: CUDA_MPS_PINNED_DEVICE_MEM_LIMIT
          value: "15Gi"

We need to monitor GPU utilization at the MPS-client / process / container level. Running nvidia-smi and nvidia-smi pmon, we only see aggregate utilization, attributed to the MPS server process rather than to the individual clients:

/home/kubernetes/bin/nvidia/bin/nvidia-smi pmon -c 25
# gpu         pid   type     sm    mem    enc    dec    jpg    ofa    command 
# Idx           #    C/G      %      %      %      %      %      %    name 
    0       7078     C     99     21      -      -      -      -    nvidia-cuda-mps
    0      10075   M+C      -      -      -      -      -      -    python3        
    0      10078   M+C      -      -      -      -      -      -    python3        
    0       7078     C     99     20      -      -      -      -    nvidia-cuda-mps
    0      10075   M+C      -      -      -      -      -      -    python3        
    0      10078   M+C      -      -      -      -      -      -    python3 
  1. How can we get client/process-level GPU utilization? (A sketch of the kind of query we are looking for is included after the nvidia-smi output below.)
  2. Each pod replica sets CUDA_MPS_PINNED_DEVICE_MEM_LIMIT=15Gi, so what explains the ~20% total GPU memory utilization (~16GB of 80GB) reported by pmon?
  3. In nvidia-smi the same processes show ~3GB of memory each (output below). What is the difference between the GPU Memory Usage reported by nvidia-smi and the mem column in nvidia-smi pmon?
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A            7078      C   nvidia-cuda-mps-server                   34MiB |
|    0   N/A  N/A           10075    M+C   python3                                3220MiB |
|    0   N/A  N/A           10078    M+C   python3                                3220MiB |
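
For question 1, this is a sketch of the kind of per-process query we are looking for, using nvidia-smi accounting mode (assuming accounting mode is available on this driver; we have not confirmed whether it attributes utilization to the MPS clients or only to the MPS server):

# enable per-process accounting on GPU 0
/home/kubernetes/bin/nvidia/bin/nvidia-smi -i 0 -am 1

# query accounted per-process utilization and memory
/home/kubernetes/bin/nvidia/bin/nvidia-smi -i 0 \
    --query-accounted-apps=pid,gpu_utilization,mem_utilization,max_memory_usage \
    --format=csv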