TorchPipe is an alternative to Triton Inference Server, offering similar functionality such as shared memory, Ensemble, and the BLS mechanism.
For serving scenarios, TorchPipe is designed to support multi-instance deployment, pipeline parallelism, adaptive batching, GPU-accelerated operators, and reduced head-of-line (HOL) blocking. It acts as a bridge between lower-level acceleration libraries (e.g., TensorRT, OpenCV, CVCUDA) and RPC frameworks (e.g., Thrift). At its core, it is an engine that enables programmable scheduling.
If you find an issue, please let us know!
Below are some usage examples; for more, check out the examples.
from torchpipe import pipe
import torch
from torchvision.models.resnet import resnet101
# create some regular pytorch model...
model = resnet101(pretrained=True).eval().cuda()
# export the model to ONNX with a dynamic batch dimension
model_path = "./resnet101.onnx"
x = torch.ones((1, 3, 224, 224)).cuda()
torch.onnx.export(model, x, model_path, opset_version=17,
                  input_names=['input'], output_names=['output'],
                  dynamic_axes={'input': {0: 'batch_size'},
                                'output': {0: 'batch_size'}})
thread_safe_pipe = pipe({
    "preprocessor": {
        "backend": "S[DecodeTensor,ResizeTensor,CvtColorTensor,SyncTensor]",  # Tensor-based (GPU) preprocessing
        # "backend": "S[DecodeMat,ResizeMat,CvtColorMat,Mat2Tensor,SyncTensor]",  # Mat-based (OpenCV) alternative
        'instance_num': 2,   # number of parallel preprocessing instances
        'color': 'rgb',
        'resize_h': '224',
        'resize_w': '224',
        'next': 'model',     # forward results to the 'model' node
    },
    "model": {
        "backend": "SyncTensor[TensorrtTensor]",
        "model": model_path,
        "model::cache": model_path.replace(".onnx", ".trt"),  # cache the built TensorRT engine
        "max": '4',                 # maximum batch size for adaptive batching
        'batching_timeout': 4,      # ms, timeout for batching
        'instance_num': 2,          # number of parallel TensorRT instances
        'mean': "123.675, 116.28, 103.53",
        'std': "58.395, 57.120, 57.375",  # normalization merged into the TensorRT engine
    },
}
)

We can execute the returned thread_safe_pipe just like the original PyTorch model, but in a thread-safe manner.
data = {'data': open('/path/to/img.jpg', 'rb').read()}
thread_safe_pipe(data) # <-- this is thread-safe
result = data['result']

Note: compiling torchpipe depends on the TensorRT C++ API. Please follow the TensorRT Installation Guide. You may also try installing torchpipe inside one of the NGC PyTorch docker containers (e.g. nvcr.io/nvidia/pytorch:25.05-py3).
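Returning to the example above: because the returned pipe is thread-safe, a single instance can serve requests from several Python threads at once. A minimal sketch of such usage follows; the image path, worker count, and helper function are illustrative, not part of the torchpipe API.

from concurrent.futures import ThreadPoolExecutor

def infer(img_path):
    # each thread builds its own input dict; the pipe object itself is shared
    req = {'data': open(img_path, 'rb').read()}
    thread_safe_pipe(req)   # same call as above, now issued from worker threads
    return req['result']

with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(infer, ['/path/to/img.jpg'] * 32))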
To install the torchpipe Python library, run the following:
git clone https://github.com/torchpipe/torchpipe.git
cd torchpipe/
img_name=nvcr.io/nvidia/pytorch:25.05-py3
docker run --rm --gpus all -it --network host \
    -v $(pwd):/workspace/ --privileged \
    -w /workspace/ \
    $img_name \
    bash
# optional: configure a PyPI mirror
# pip config set global.index-url https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple
cd /workspace/plugins/torchpipe && python setup.py install --cv2

The core functionality of TorchPipe (v0) has been extracted into the standalone Omniback library.
TorchPipe (v1, this version) is a collection of deep learning computation backends built on the Omniback library. Not all computation backends from TorchPipe (v0) have been ported to TorchPipe (v1) yet.
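After the build completes, a quick import inside the container confirms the installation. This sketch assumes the package exposes a __version__ attribute; a plain import also suffices.

python -c "import torchpipe; print(torchpipe.__version__)"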