I am trying to achieve GPU process isolation and prioritization on the Jetson Nano and AGX platforms. I have tested several approaches. First, I attempted to take ownership of the CUDA render device to restrict GPU execution to a single user.
Later, I explored NVIDIA MPS and configured it on my system. However, the behaviour is similar to the first approach: once the MPS server is started by a user, only that user's GPU processes are allowed to run. I also tried using MPS priorities to prioritize workloads, but they do not seem to work as expected.
Do you know if there is any reliable way to prioritize processes and achieve isolation on the integrated GPU of these platforms?
Yes, I have both devices and I am trying to do the same thing on each: prioritise and isolate GPU work. The idea is to obtain stable execution times to see how these platforms fit into a real-time environment.
This is usually controlled by the GPU scheduler.
Are you able to share a simple sample and steps to reproduce your issue?
(Launch the task from other users fails?)
// kernel.cu
#include <cstdio>
#include <cuda_runtime.h>

__global__ void heavyKernel(float *a)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    float x = a[idx];
    // Heavy loop to saturate GPU
    for (int i = 0; i < 500000; ++i) {
        x = x * 1.000001f + 0.000001f;
    }
    a[idx] = x;
}

int main()
{
    const int N = 1 << 20; // 1M elements
    size_t size = N * sizeof(float);
    float *d_a;
    cudaMalloc(&d_a, size);
    dim3 block(256);
    dim3 grid((N + block.x - 1) / block.x);
    printf("Launching heavy kernel...\n");
    // launch kernel many times to keep GPU constantly busy
    for (int i = 0; i < 50; i++) {
        heavyKernel<<<grid, block>>>(d_a);
    }
    cudaDeviceSynchronize();
    cudaFree(d_a);
    printf("Finished.\n");
    return 0;
}
nvcc kernel.cu -o heavy_kernel
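For reference, one may also want to pass an explicit `-arch` flag for the target board. A sketch that only prints the build command, with the model-to-arch mappings being my assumption (sm_53 for Nano/TX1, sm_72 for AGX Xavier, sm_87 for AGX Orin):

```shell
# Dry run: pick an nvcc -arch flag per Jetson model and print the
# build command. The sm_XX mappings are assumed, not taken from the post.
MODEL=$(tr -d '\0' 2>/dev/null < /proc/device-tree/model || echo unknown)
case "$MODEL" in
    *Orin*)        ARCH="-arch=sm_87" ;;
    *Xavier*)      ARCH="-arch=sm_72" ;;
    *Nano*|*TX1*)  ARCH="-arch=sm_53" ;;
    *)             ARCH="" ;;   # unknown board: let nvcc use its default
esac
BUILD_CMD="nvcc $ARCH kernel.cu -o heavy_kernel"
echo "$BUILD_CMD"
```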
Then, as user1 I am running:
#!/bin/bash
###############################################
# Assumes MPS server is already running
# and CUDA_MPS_PIPE_DIRECTORY + LOG_DIRECTORY
# are already exported in the environment.
###############################################
APP="./heavy_kernel"   # CUDA app to run
LOW_PERC=10
HIGH_PERC=80

# Check if MPS pipe exists
if [ ! -d "$CUDA_MPS_PIPE_DIRECTORY" ]; then
    echo "[ERROR] MPS pipe directory not found: $CUDA_MPS_PIPE_DIRECTORY"
    echo "Please start the MPS server first."
    exit 1
fi

echo "[INFO] Starting test (without creating MPS server)..."
echo "Using:"
echo "  - Pipe: $CUDA_MPS_PIPE_DIRECTORY"
echo "  - Log:  $CUDA_MPS_LOG_DIRECTORY"
echo

# End times are written from inside each subshell, so that waiting on one
# process cannot inflate the measured duration of the other.
END_LOW_FILE=$(mktemp)
END_HIGH_FILE=$(mktemp)

# -------------------------------------------
# RUN LOW PRIORITY PROCESS
# -------------------------------------------
echo "[INFO] Launching LOW priority ($LOW_PERC%) process..."
START_LOW=$(date +%s.%N)
( CUDA_MPS_ACTIVE_THREAD_PERCENTAGE=$LOW_PERC "$APP"; date +%s.%N > "$END_LOW_FILE" ) &
PID_LOW=$!

# -------------------------------------------
# RUN HIGH PRIORITY PROCESS
# -------------------------------------------
sleep 0.2
echo "[INFO] Launching HIGH priority ($HIGH_PERC%) process..."
START_HIGH=$(date +%s.%N)
( CUDA_MPS_ACTIVE_THREAD_PERCENTAGE=$HIGH_PERC "$APP"; date +%s.%N > "$END_HIGH_FILE" ) &
PID_HIGH=$!

echo
echo "Waiting for processes:"
echo "  - LOW  ($LOW_PERC%)  PID=$PID_LOW"
echo "  - HIGH ($HIGH_PERC%) PID=$PID_HIGH"
echo

# -------------------------------------------
# WAIT FOR COMPLETION
# -------------------------------------------
wait $PID_LOW
wait $PID_HIGH
END_LOW=$(cat "$END_LOW_FILE")
END_HIGH=$(cat "$END_HIGH_FILE")
rm -f "$END_LOW_FILE" "$END_HIGH_FILE"

# -------------------------------------------
# CALCULATE DURATIONS
# -------------------------------------------
DUR_LOW=$(echo "$END_LOW - $START_LOW" | bc)
DUR_HIGH=$(echo "$END_HIGH - $START_HIGH" | bc)

echo
echo "==================== RESULTS ===================="
echo "Low priority  ($LOW_PERC%)  finished in: $DUR_LOW seconds"
echo "High priority ($HIGH_PERC%) finished in: $DUR_HIGH seconds"
echo
if (( $(echo "$DUR_HIGH < $DUR_LOW" | bc -l) )); then
    echo "➡️ HIGH priority process finished first (expected)"
else
    echo "➡️ LOW priority process finished first (unexpected)"
fi
echo "================================================="
and the priority is not taking effect. Then, regarding "isolation": when I run the same script and kernel as user2, MPS blocks that call until user1's execution has finished. So the question is whether the Nano and Orin can be configured to honour CUDA_MPS_ACTIVE_THREAD_PERCENTAGE.
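For completeness, on desktop GPUs the same percentage can also be set server-wide through the MPS control daemon rather than per-process environment variables; whether JetPack's MPS honours these commands on the Nano/Orin is part of what I am asking. A dry run that only prints the commands:

```shell
# Dry run: print the MPS control commands instead of sending them,
# since no MPS server may be running here. To apply them for real:
#   printf '%s\n' "$MPS_CMDS" | nvidia-cuda-mps-control
MPS_CMDS='set_default_active_thread_percentage 10
get_default_active_thread_percentage'
printf '%s\n' "$MPS_CMDS"
```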
Then, the second approach:
1. chmod 600 /dev/dri/renderD128
2. run command
3. restore permissions
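The three steps above, sketched as one wrapper. The function name is my own, and the demo runs against a scratch file, since chmod on /dev/dri/renderD128 itself needs root:

```shell
# Sketch of the chmod approach: lock the render node to its owner,
# run one command, then restore the original mode.
run_isolated() {
    dev="$1"; shift
    orig_mode=$(stat -c '%a' "$dev") || return 1
    chmod 600 "$dev"            # step 1: only the owner may open the node
    "$@"; rc=$?                 # step 2: run the GPU command
    chmod "$orig_mode" "$dev"   # step 3: restore permissions
    return $rc
}

# Demo on a scratch file standing in for /dev/dri/renderD128:
DEV=$(mktemp)
chmod 644 "$DEV"
run_isolated "$DEV" true
MODE_AFTER=$(stat -c '%a' "$DEV")
rm -f "$DEV"
echo "mode restored to: $MODE_AFTER"
```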
Then, in my ONNX inference, if another user tries to use CUDA while the render device is locked down by permissions, the session falls back to running on the CPU.
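To make that fallback explicit rather than silent, one could probe the render node up front and pick the execution provider accordingly. A sketch: the provider names are ONNX Runtime's, the probe logic is my own, and the demo uses scratch paths instead of the real device:

```shell
# Sketch: choose an ONNX Runtime execution provider based on whether
# the current user can actually open the render node.
choose_provider() {
    dev="$1"
    if [ -r "$dev" ] && [ -w "$dev" ]; then
        echo "CUDAExecutionProvider"
    else
        echo "CPUExecutionProvider"
    fi
}

# Demo with a scratch file standing in for /dev/dri/renderD128:
OURS=$(mktemp); chmod 600 "$OURS"        # owned by us -> accessible
P_OURS=$(choose_provider "$OURS")
P_GONE=$(choose_provider /nonexistent/renderD128)
rm -f "$OURS"
echo "$P_OURS / $P_GONE"
```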
Can you guide me on how to take advantage of the GPU scheduler for this?