GPU Isolation and MPS

Good morning,

I am trying to achieve GPU process isolation and prioritization on the Jetson Nano and AGX platforms. I have tested several approaches. First, I attempted to take ownership of the CUDA render device to restrict GPU execution to a single user.

Later, I explored NVIDIA MPS and configured it on my system. However, I noticed that the behaviour is similar: once the MPS server is started by a user, only that user’s GPU processes are allowed to run. Additionally, I tried using MPS priorities to prioritize workloads, but they do not seem to work as expected.
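
For reference, this is the per-client priority mechanism I mean, as described in the MPS documentation (a sketch only; whether it is honoured on the integrated GPU of these boards is exactly what I am unsure about):

# MPS client priority per the CUDA docs: 0 = normal (default), 1 = below normal.
# "./my_app" is a placeholder for any CUDA binary.
CUDA_MPS_CLIENT_PRIORITY=0 ./my_app &
CUDA_MPS_CLIENT_PRIORITY=1 ./my_app &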

Do you know if there is any reliable way to prioritize processes and achieve isolation on the integrated GPU of these platforms?

Which AGX platform?

Hi,

Do you use Jetson Nano?
We started supporting MPS from CUDA 12.5, which is not available on Jetson Nano.

Thanks.

Hardware:

  • P-Number: p3701-0000
  • Module: NVIDIA Jetson AGX Orin

Yes, I have this configuration, and I also have the following one:

  • Model: NVIDIA Jetson Orin Nano Developer Kit
  • L4T: 36.4.7
  • NV Power Mode[0]: 15W
  • Serial Number: [XXX Show with: jetson_release -s XXX]

Hardware:

  • P-Number: p3767-0005
  • Module: NVIDIA Jetson Orin Nano (Developer kit)

Platform:

  • Distribution: Ubuntu 22.04 Jammy Jellyfish
  • Release: 5.15.148-tegra

jtop:

  • Version: 4.3.2
  • Service: Active

Libraries:

  • CUDA: 12.6.68
  • cuDNN: 9.3.0.75

Your last two posts mention an AGX Orin and an Orin Nano. Which one is correct?

Yes, I have both devices and I am trying to do the same thing on each: prioritise and isolate the GPU. The idea is to get stable execution times in order to see how they fit into a real-time environment.

Hi,

This is usually controlled by the GPU scheduler.
Could you share a simple sample and the steps to reproduce your issue?
(Does launching the task from other users fail?)

Thanks.

Good morning,

First approach:

export CUDA_MPS_PIPE_DIRECTORY=/tmp/mps-pipe
export CUDA_MPS_LOG_DIRECTORY=/tmp/mps-log

rm -rf /tmp/mps-pipe /tmp/mps-log
mkdir -p /tmp/mps-pipe /tmp/mps-log

sudo -E nvidia-cuda-mps-control -d

Then MPS is running on the system.
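
To confirm the daemon is actually serving requests, the control interface can be queried (an optional check using the documented get_server_list command):

# Prints the PID of each running MPS server instance
# (a server instance appears once the first client connects)
echo get_server_list | sudo -E nvidia-cuda-mps-control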

Then I have the following kernel source:

// kernel.cu

#include <cstdio>
#include <cuda_runtime.h>

__global__ void heavyKernel(float *a)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    float x = a[idx];
    // Heavy loop to saturate the GPU
    for (int i = 0; i < 500000; ++i) {
        x = x * 1.000001f + 0.000001f;
    }
    a[idx] = x;
}

int main()
{
    const int N = 1 << 20; // 1M elements
    size_t size = N * sizeof(float);

    float *d_a;
    cudaMalloc(&d_a, size);

    dim3 block(256);
    dim3 grid((N + block.x - 1) / block.x);

    printf("Launching heavy kernel...\n");

    // Launch the kernel many times to keep the GPU constantly busy
    for (int i = 0; i < 50; i++) {
        heavyKernel<<<grid, block>>>(d_a);
    }

    cudaDeviceSynchronize();
    cudaFree(d_a);

    printf("Finished.\n");
    return 0;
}

nvcc kernel.cu -o heavy_kernel
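
As an optional baseline (my own extra step, not part of the original procedure), a single instance can be timed before the dual-client test:

# One client with the full GPU: reference duration for the comparison below
time CUDA_MPS_ACTIVE_THREAD_PERCENTAGE=100 ./heavy_kernel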

Then, as user1, I am running:

#!/bin/bash

###############################################
# Assumes the MPS server is already running
# and CUDA_MPS_PIPE_DIRECTORY + LOG_DIRECTORY
# are already exported in the environment.
###############################################

APP="./heavy_kernel"   # CUDA app to run
LOW_PERC=10
HIGH_PERC=80

# Check if the MPS pipe exists
if [ ! -d "$CUDA_MPS_PIPE_DIRECTORY" ]; then
    echo "[ERROR] MPS pipe directory not found: $CUDA_MPS_PIPE_DIRECTORY"
    echo "Please start the MPS server first."
    exit 1
fi

echo "[INFO] Starting test (without creating MPS server)..."
echo "Using:"
echo " - Pipe: $CUDA_MPS_PIPE_DIRECTORY"
echo " - Log:  $CUDA_MPS_LOG_DIRECTORY"
echo

# -------------------------------------------
# RUN LOW PRIORITY PROCESS
# -------------------------------------------
echo "[INFO] Launching LOW priority ($LOW_PERC%) process..."
CUDA_MPS_ACTIVE_THREAD_PERCENTAGE=$LOW_PERC $APP &
PID_LOW=$!
START_LOW=$(date +%s.%N)

# -------------------------------------------
# RUN HIGH PRIORITY PROCESS
# -------------------------------------------
sleep 0.2
echo "[INFO] Launching HIGH priority ($HIGH_PERC%) process..."
CUDA_MPS_ACTIVE_THREAD_PERCENTAGE=$HIGH_PERC $APP &
PID_HIGH=$!
START_HIGH=$(date +%s.%N)

echo
echo "Waiting for processes:"
echo " - LOW  ($LOW_PERC%)  PID=$PID_LOW"
echo " - HIGH ($HIGH_PERC%) PID=$PID_HIGH"
echo

# -------------------------------------------
# WAIT FOR COMPLETION
# -------------------------------------------
# NOTE: if HIGH finishes before LOW, its end time is only captured after LOW
# completes, so DUR_HIGH can be overestimated in that case.
wait $PID_LOW
END_LOW=$(date +%s.%N)

wait $PID_HIGH
END_HIGH=$(date +%s.%N)

# -------------------------------------------
# CALCULATE DURATIONS
# -------------------------------------------
DUR_LOW=$(echo "$END_LOW - $START_LOW" | bc)
DUR_HIGH=$(echo "$END_HIGH - $START_HIGH" | bc)

echo
echo "==================== RESULTS ===================="
echo "Low  priority  ($LOW_PERC%) finished in:  $DUR_LOW seconds"
echo "High priority  ($HIGH_PERC%) finished in: $DUR_HIGH seconds"
echo

if (( $(echo "$DUR_HIGH < $DUR_LOW" | bc -l) )); then
    echo "➡️  HIGH priority process finished first (expected)"
else
    echo "➡️  LOW priority process finished first (unexpected)"
fi

echo "================================================="

and the priority is not taking effect. As for isolation: when I run the same script and kernel as user2, MPS blocks that call until user1's execution has finished. The question here is whether the Nano and the Orin can be configured so that CUDA_MPS_ACTIVE_THREAD_PERCENTAGE takes effect.
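
For completeness, the MPS documentation also exposes the same partitioning through control commands instead of the environment variable (this is the documented interface; I have not verified that it behaves any differently on Jetson):

# Default active thread percentage applied to new clients
echo "set_default_active_thread_percentage 10" | sudo -E nvidia-cuda-mps-control

# Override it for the server with a given PID (<server_pid> from get_server_list)
echo "set_active_thread_percentage <server_pid> 80" | sudo -E nvidia-cuda-mps-control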

Then, the second approach:

chmod 600 /dev/dri/renderD128
run the workload
restore the permissions

Then, in my ONNX inference, if another user tries to use CUDA while the device node is blocked by permissions, the session falls back to running on the CPU.
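
As a concrete sketch of this second approach (my own wrapper around the steps above; exclusive_gpu.sh is just an illustrative name):

#!/bin/bash
# exclusive_gpu.sh -- take ownership of the render node, run one workload,
# then restore the original owner and permissions.
DEV=/dev/dri/renderD128

OLD_OWNER=$(stat -c '%U' "$DEV")   # remember the current owner
OLD_PERMS=$(stat -c '%a' "$DEV")   # remember the current mode
sudo chown "$USER" "$DEV"          # take ownership of the node
sudo chmod 600 "$DEV"              # block every other user
"$@"                               # run the workload, e.g. ./heavy_kernel
sudo chown "$OLD_OWNER" "$DEV"     # restore the original owner
sudo chmod "$OLD_PERMS" "$DEV"     # restore the original mode

Usage: ./exclusive_gpu.sh ./heavy_kernel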

Can you guide me on how to take advantage of the GPU scheduler for this?

Hi,

Thanks for sharing the details.
We will check your use case and provide more info to you.

Thanks.

I ran another test on another AGX we have.

Platform:

jetson_release
Software part of jetson-stats 4.3.2 - (c) 2024, Raffaello Bonghi
Model: NVIDIA Jetson AGX Orin Developer Kit - Jetpack 6.2 [L4T 36.4.3]
NV Power Mode[0]: MAXN
Serial Number: [XXX Show with: jetson_release -s XXX]

Hardware:

  • P-Number: p3701-0005
  • Module: NVIDIA Jetson AGX Orin (64GB ram)

Platform:

  • Distribution: Ubuntu 22.04 Jammy Jellyfish
  • Release: 5.15.148-rt-tegra

jtop:

  • Version: 4.3.2
  • Service: Active

Libraries:

  • CUDA: Not installed
  • cuDNN: Not installed
  • TensorRT: Not installed
  • VPI: Not installed
  • Vulkan: 1.3.204
  • OpenCV: 4.5.4 - with CUDA: NO

In this case, it seems that when the active thread percentage is set higher than 30, prioritization works correctly.

Hi,

Thanks for your patience.

We tested your source but hit some errors while reproducing it.
When running the script as user 1, it terminates with the error below:

$ ./test.sh
[ERROR] MPS pipe directory not found: /tmp/mps-pipe
Please start the MPS server first.

But the MPS server is running after the initialization steps:

$ sudo -E nvidia-cuda-mps-control -d
An instance of this daemon is already running

Is there any step missing in our testing?
Thanks.

Good morning,

I tried again and it works without any problems.

The output you shared says that the tmp directories are not ready.

Have you applied the following lines?

export CUDA_MPS_PIPE_DIRECTORY=/tmp/mps-pipe
export CUDA_MPS_LOG_DIRECTORY=/tmp/mps-log

rm -rf /tmp/mps-pipe /tmp/mps-log
mkdir -p /tmp/mps-pipe /tmp/mps-log

sudo -E nvidia-cuda-mps-control -d

Perhaps you have a previous MPS server running.
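
If an old daemon is still alive, it can be shut down first with the standard control command (CUDA_MPS_PIPE_DIRECTORY must still point at that daemon's pipe directory for this to reach it):

echo quit | sudo -E nvidia-cuda-mps-control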