I have a Python application that spawns multiple processes (using multiprocessing.Process).
Subprocess A uses torch to do some matrix computations to process an input image.
Subprocess B uses pycuda to run inference on the image (using a TensorRT-compiled model).
B freezes periodically, apparently because of something related to GPU and kernel mutexes.
Right now the initialization of torch and cuda is as follows:
A initializes torch using _ = torch.randn(1, device="cuda")
B also initializes torch using _ = torch.randn(1, device="cuda"), then attaches to the CUDA context with cuda.Context.attach()
What is the correct way to handle this scenario? I’ve seen many different examples, but I have yet to find one that avoids any application freeze.
At this point I am looking for confirmation that I am using the correct initialization sequence so that torch and cuda coexist peacefully in Python.
This is the latest code I use (as suggested by AI after several not-so-successful iterations), not sure if correct:
#!/usr/bin/env python3
import torch
import pycuda.driver as cuda

# 1. Initialize PyTorch's CUDA state (you already did this)
print("Initializing torch")
_ = torch.randn(1, device="cuda")

# 2. Initialize the PyCUDA driver
print("Initializing cuda")
cuda.init()

# 3. Retrieve the primary context created by PyTorch
# Use the device ID matching your torch tensor (usually 0)
print("Get pytorch context")
device = cuda.Device(torch.cuda.current_device())
ctx = device.retain_primary_context()

# 4. Push the context to make it active for PyCUDA
print("Push context")
ctx.push()
try:
    # Your PyCUDA kernel operations go here
    print("Run sample torch operation")
    x = torch.ones(10, device="cuda")
finally:
    # 5. Pop the context when done to avoid cleanup hangs
    print("Pop context")
    ctx.pop()
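The push/pop pairing in steps 4 and 5 could also be wrapped in a context manager so that the pop always runs, even when the body raises. This is only a minimal sketch: `pushed` is a hypothetical helper name, not part of PyCUDA, and it assumes nothing more than an object with `push()` and `pop()` methods, which PyCUDA contexts provide.

```python
from contextlib import contextmanager


@contextmanager
def pushed(ctx):
    """Make ctx current for the duration of the with-block, then pop it."""
    ctx.push()
    try:
        yield ctx
    finally:
        # Runs on normal exit and on exceptions, so the context stack stays balanced.
        ctx.pop()
```

With the script above, the try/finally would then read `with pushed(device.retain_primary_context()):` followed by the PyCUDA work.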
I am looking for the proper way to initialize torch and cuda in the same script so that I do not risk a deadlock when using both, especially since torch is also used in another process.
What I have is a multiprocess Python app in which process A uses torch and process B uses both torch and cuda. They run at the same time, which has led to some GPU-related futex deadlocks.
For example, I’ve read that pycuda.autoinit is a no-no and that I should attach cuda to the torch context (there are several ways to do this, and I am not sure which one is correct).
The data is not shared; each process does its own processing on different data.
Depending on how we initialize torch and cuda in Python, in some scenarios we end up in a GPU-related futex deadlock, and various sources on the net suggest that this is known to happen.
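One thing worth ruling out, since the post does not say which start method is used: on Linux, older Python versions default to fork, and the PyTorch documentation states that CUDA cannot be used in a forked subprocess once the parent has initialized it (spawn or forkserver is required). The sketch below is an assumption about how the app could be structured, not a confirmed fix for the futex hang. `worker_a`, `worker_b` and `make_workers` are hypothetical names, and every GPU import happens inside the child process.

```python
import multiprocessing as mp


def worker_a():
    # Import and initialize torch only inside the child process.
    import torch
    _ = torch.randn(1, device="cuda")
    # ... image preprocessing with torch goes here ...


def worker_b():
    import torch
    import pycuda.driver as cuda
    _ = torch.randn(1, device="cuda")  # torch initializes the primary context
    cuda.init()
    ctx = cuda.Device(torch.cuda.current_device()).retain_primary_context()
    ctx.push()
    try:
        pass  # ... TensorRT inference via PyCUDA goes here ...
    finally:
        ctx.pop()


def make_workers():
    """Create (but do not start) one process per worker, using 'spawn'."""
    spawn = mp.get_context("spawn")
    return [
        spawn.Process(target=worker_a, name="A"),
        spawn.Process(target=worker_b, name="B"),
    ]
```

The processes would be started from under an `if __name__ == "__main__":` guard, which spawn requires: call `p.start()` on each one, then `p.join()`.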
There has been no update from you for a while, so we assume this is no longer an issue.
Hence, we are closing this topic. If you need further support, please open a new one.
Thanks ~0422
Hi,
Could you try to run each process with a different CUDA stream?
Thanks.
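For the torch side, running work on a non-default stream could look like the sketch below. `run_on_private_stream` is a hypothetical helper; it falls back to a plain call when torch or a GPU is unavailable, and whether a dedicated stream resolves the freeze is not confirmed in this thread.

```python
def run_on_private_stream(work):
    """Call work() on a dedicated CUDA stream if possible, else call it directly."""
    try:
        import torch
    except ImportError:
        torch = None

    if torch is None or not torch.cuda.is_available():
        # No GPU path available: just run the callable.
        return work()

    stream = torch.cuda.Stream()       # a new, non-default stream
    with torch.cuda.stream(stream):    # torch ops issued here are queued on it
        result = work()
    stream.synchronize()               # wait for the queued work to finish
    return result
```

In process A this might be called as `run_on_private_stream(lambda: torch.ones(10, device="cuda") * 2)`. On the PyCUDA side, a `pycuda.driver.Stream()` would play the same role.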