Deploying NVIDIA TensorRT-LLM 1.2.0rc2 on Ubuntu 25.10: a local install and inference walkthrough
This post documents deploying NVIDIA TensorRT-LLM 1.2.0rc2 on Ubuntu 25.10. On a machine with an Intel i5-1240P CPU and an RTX 2060 SUPER GPU, installing via pip first hit connection problems, which were solved by pointing pip at the NVIDIA PyPI index; the install then ran into out-of-disk-space errors and dependency conflicts. The author switched to a machine with a 4060 Ti and reinstalled, which pulled in a long list of additional dependencies.
#TensorRT-LLM 1.0 hands-on#
TensorRT-LLM 1.2.0rc came out recently, so today let's try deploying it:
1. Local machine specs:
- OS: Ubuntu 25.10 (the latest release as of November 7)
- CPU and GPU: Intel i5-1240P + NVIDIA RTX 2060 SUPER 8 GB
- RAM: 16 GB
- Python 3.12
- Torch version: 2.6.0+cu118
- Python version: CPython 3.13.5
- pip info:
  - Operating system: Linux 6.14.0-34-generic
  - CPU architecture: x86_64
- Driver version: 580.95
- CUDA version: 13.0
- nvcc -V: Build cuda_12.4.r12.4/compiler.34097967_0
2. Installing the required system libraries:

Assuming a suitable PyTorch build is already in place, we need two system libraries: libopenmpi-dev and libzmq3-dev. In my tests, the HUST mirror downloads them without trouble on an ordinary network connection:

```shell
sudo apt-get update
sudo apt-get -y install libopenmpi-dev
sudo apt-get -y install libzmq3-dev
```
Then install the package itself with pip:

```shell
pip3 install --upgrade pip setuptools
pip3 install tensorrt_llm
```

Here setuptools was due to be upgraded from 72.1.0 to 80.9.0 (pip itself was at 25.2), and, amusingly, the upgrade timed out with no way around it, so I skipped it and went straight to the install. But the install then died with a `Connection aborted` error, which was harder to deal with. Anyway, switching to a US server fixed that, and along the way pip got updated to 25.3 and setuptools to 80.9.0. Then:

```text
wheel_stub.error.InstallFailedError:
*******************************************************************************
The installation of tensorrt-llm for version 1.0.0 failed.

This is a special placeholder package which downloads a real wheel package
from https://pypi.nvidia.com. If https://pypi.nvidia.com is not reachable, we
cannot download the real wheel file to install.

You might try installing this package via
pip install --extra-index-url https://pypi.nvidia.com tensorrt-llm
```

I tried that, and the connection was still being reset, which left me with a serious headache. Fine then: downgrade!
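As the placeholder's hint suggests, the real wheels come from https://pypi.nvidia.com, so if you install TensorRT-LLM repeatedly it can help to persist the extra index instead of retyping the flag every time. A sketch of the per-user pip config on Linux (this path is pip's default config location; running `pip config set global.extra-index-url https://pypi.nvidia.com` writes the same entry):

```ini
; ~/.config/pip/pip.conf
[global]
extra-index-url = https://pypi.nvidia.com
```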
```shell
conda create -n trtllm_env python=3.10
conda activate trtllm_env
```

Then I went after the newest version, 1.2.0rc2, released only nine hours earlier:

```shell
pip install tensorrt-llm==1.2.0rc2
```

No luck. At this point someone on Zhihu claimed you can't install TensorRT-LLM without installing TensorRT first, which momentarily threw me. Fine, I installed it, and promptly got played: the end of that very article admits none of the above was necessary... Speechless. Only at its conclusion does it explain that the files really are reachable and downloadable at https://pypi.nvidia.cn/tensorrt-llm/. So I gave it a try, and it worked in one shot:

```shell
pip3 install tensorrt_llm -U --pre --extra-index-url https://pypi.nvidia.com
```

When the terminal showed
```text
Downloading https://pypi.nvidia.cn/tensorrt-llm/tensorrt_llm-1.2.0rc2-cp310-cp310-linux_x86_64.whl (2247.0 MB)
```

I was genuinely moved to tears... And two minutes later, right in the middle of my joy, a red line appeared in the terminal:

```text
ERROR: Could not install packages due to an OSError: [Errno 28] No space left on device
```

Folks, the drama. I never expected to fill the 200 GB partition of my dual-boot drive; I had only done a little light development on it (really just running other people's projects), and it was already full. I was left speechless all over again...
At that point I remembered spacesniffer, my old disk-space savior from my Windows days. A search turned up a similar tool for Ubuntu, baobab, only to find the disk still had 80 GB free. Good heavens, how amusing...
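Before kicking off a 2+ GB wheel download, it is cheap to check free space programmatically, and it is worth remembering that pip also stages downloads under its cache and the temp directory, which may sit on a different (smaller) partition than the one your disk analyzer reports. A minimal standard-library sketch; the 3 GiB default is an arbitrary margin chosen for this wheel, not anything pip enforces:

```python
import shutil

def enough_space(path=".", need_bytes=3 * 1024**3):
    """True if the filesystem holding `path` has at least `need_bytes` free."""
    return shutil.disk_usage(path).free >= need_bytes

# Check both the target environment and pip's staging area:
print(enough_space("."))      # current partition
print(enough_space("/tmp"))   # where pip unpacks large wheels
```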
So I switched to another machine, this one with a 4060 Ti 8 GB, and ran everything as a single command:

```shell
sudo apt-get -y install libopenmpi-dev && pip3 install --upgrade pip setuptools && pip3 install tensorrt_llm
```

The install then ground along slowly, downloading `torch 2.7.1` and `cublas`, plus the CUDA 12.6 components, `cudnn` among them. Here is the `bash` output from the run:

```text
sudo apt-get -y install libopenmpi-dev && pip3 install --upgrade pip setuptools && pip3 install tensorrt_llm
[sudo: authenticate] Password:
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
Solving dependencies... Done
The following additional packages will be installed:
  autoconf automake autotools-dev gfortran gfortran-14 gfortran-14-x86-64-linux-gnu gfortran-15
  gfortran-15-x86-64-linux-gnu gfortran-x86-64-linux-gnu javascript-common libamd-comgr2 libamdhip64-5
  libcaf-openmpi-3t64 libcoarrays-dev libcoarrays-openmpi-dev libevent-2.1-7t64 libevent-core-2.1-7t64
  libevent-dev libevent-extra-2.1-7t64 libevent-openssl-2.1-7t64 libevent-pthreads-2.1-7t64 libfabric1
  libgfortran-14-dev libgfortran-15-dev libhsa-runtime64-1 libhsakmt1 libhwloc-dev libhwloc-plugins
  libhwloc15 libibmad5 libibumad3 libibverbs-dev libjs-jquery libjs-jquery-ui libllvm17t64 libltdl-dev
  libnl-3-dev libnl-route-3-dev libnuma-dev libopenmpi40 libpsm2-2 librdmacm1t64 libtool libucx0 libze1
  m4 openmpi-bin openmpi-common zlib1g-dev
Suggested packages:
  autoconf-archive gnu-standards autoconf-doc gettext gfortran-multilib gfortran-doc gfortran-14-multilib
  gfortran-14-doc gfortran-15-multilib gfortran-15-doc apache2 | lighttpd | httpd libhwloc-contrib-plugins
  libjs-jquery-ui-docs libtool-doc openmpi-doc gcj-jdk m4-doc
The following NEW packages will be installed:
  autoconf automake autotools-dev gfortran gfortran-14 gfortran-14-x86-64-linux-gnu gfortran-15
  gfortran-15-x86-64-linux-gnu gfortran-x86-64-linux-gnu javascript-common libamd-comgr2 libamdhip64-5
  libcaf-openmpi-3t64 libcoarrays-dev libcoarrays-openmpi-dev libevent-2.1-7t64 libevent-core-2.1-7t64
  libevent-dev libevent-extra-2.1-7t64 libevent-openssl-2.1-7t64 libevent-pthreads-2.1-7t64 libfabric1
  libgfortran-14-dev libgfortran-15-dev libhsa-runtime64-1 libhsakmt1 libhwloc-dev libhwloc-plugins
  libhwloc15 libibmad5 libibumad3 libibverbs-dev libjs-jquery libjs-jquery-ui libllvm17t64 libltdl-dev
  libnl-3-dev libnl-route-3-dev libnuma-dev libopenmpi-dev libopenmpi40 libpsm2-2 librdmacm1t64 libtool
  libucx0 libze1 m4 openmpi-bin openmpi-common zlib1g-dev
0 upgraded, 50 newly installed, 0 to remove and 0 not upgraded.
Need to get 92.4 MB of archives.
After this operation, 360 MB of additional disk space will be used.
......
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed.
This behaviour is the source of the following dependency conflicts.
dgl 2.4.0+cu124 requires torch<=2.4.0, but you have torch 2.7.1 which is incompatible.
s3fs 2025.7.0 requires fsspec==2025.7.0, but you have fsspec 2024.9.0 which is incompatible.
torchaudio 2.9.0 requires torch==2.9.0, but you have torch 2.7.1 which is incompatible.
Successfully installed StrEnum-0.4.15 accelerate-1.11.0 aenum-3.1.16 backoff-2.2.1 blake3-1.0.8
blobfile-3.1.0 build-1.3.0 click_option_group-0.5.9 colored-2.3.1 cuda-bindings-12.9.4
cuda-pathfinder-1.3.2 cuda-python-12.9.4 datasets-3.1.0 diffusers-0.35.2 dill-0.3.8 einops-0.8.1
etcd3-0.12.0 evaluate-0.4.6 fastapi-0.115.4 flashinfer-python-0.2.5 fsspec-2024.9.0 grpcio-1.76.0
h5py-3.12.1 hf-xet-1.2.0 huggingface-hub-0.36.0 jiter-0.12.0 lark-1.3.1 llguidance-0.7.29 meson-1.9.1
ml_dtypes-0.5.3 multiprocess-0.70.16 ninja-1.13.0 numpy-1.26.4 nvidia-cublas-cu12-12.6.4.1
nvidia-cuda-cupti-cu12-12.6.80 nvidia-cuda-nvrtc-cu12-12.6.77 nvidia-cuda-runtime-cu12-12.6.77
nvidia-cudnn-cu12-9.5.1.17 nvidia-cufft-cu12-11.3.0.4 nvidia-cufile-cu12-1.11.1.6
nvidia-curand-cu12-10.3.7.77 nvidia-cusolver-cu12-11.7.1.2 nvidia-cusparse-cu12-12.5.4.2
nvidia-cusparselt-cu12-0.6.3 nvidia-ml-py-12.575.51 nvidia-modelopt-0.33.1 nvidia-modelopt-core-0.33.1
nvidia-nccl-cu12-2.26.2 nvidia-nvjitlink-cu12-12.6.85 nvidia-nvtx-cu12-12.6.77 nvtx-0.2.13 onnx-1.19.1
onnx_graphsurgeon-0.5.8 openai-2.7.1 opencv-python-headless-4.11.0.86 optimum-2.0.0 ordered-set-4.1.0
peft-0.17.1 pillow-10.3.0 polygraphy-0.49.26 pulp-3.3.0 pycryptodomex-3.23.0 pynvml-12.0.0
pyproject_hooks-1.2.0 safetensors-0.6.2 sentencepiece-0.2.1 setuptools-79.0.1 soundfile-0.13.1
starlette-0.41.3 tensorrt-10.11.0.33 tensorrt_cu12-10.11.0.33 tensorrt_cu12_bindings-10.11.0.33
tensorrt_cu12_libs-10.11.0.33 tensorrt_llm-1.0.0 tiktoken-0.12.0 tokenizers-0.21.4 torch-2.7.1
torchprofile-0.0.4 torchvision-0.22.1 transformers-4.53.1 triton-3.3.1 uvicorn-0.38.0 xgrammar-0.1.21
xxhash-3.6.0
```

Note that without `--pre`, pip resolved `tensorrt_llm` to 1.0.0 here rather than the 1.2.0rc2 pre-release. Ignoring pip's complaints for now, let's try actually running it:
This produced:

```text
ImportError: libpython3.12.so.1.0: cannot open shared object file: No such file or directory
```

Re-adding the library's path under /usr/lib/ resolved this problem.
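For reference, the missing library usually ships with the interpreter itself (for a conda env, under `$CONDA_PREFIX/lib`). A small standard-library sketch that reconstructs the soname the loader is asking for and prints where the current interpreter normally keeps its shared library, so you know what to symlink or add to `LD_LIBRARY_PATH` (the `.so.1.0` suffix is CPython's conventional shared-library version on Linux):

```python
import sys
import sysconfig

def expected_soname():
    """The file name from the ImportError, e.g. 'libpython3.12.so.1.0'."""
    v = sys.version_info
    return f"libpython{v.major}.{v.minor}.so.1.0"

def interpreter_lib_dir():
    """Directory where this interpreter's libpython normally lives."""
    return sysconfig.get_config_var("LIBDIR")

print(expected_soname())
print(interpreter_lib_dir())
# If the file exists there, either export LD_LIBRARY_PATH to include that
# directory or symlink the library into a standard location such as /usr/lib/.
```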
Then came another error:

```text
Exception: Invalid argument, error stack:
internal_Comm_split_type(102): MPI_Comm_split_type(MPI_COMM_WORLD, split_type=9, key=0, MPI_INFO_NULL, newcomm=0x7902330bd760) failed
internal_Comm_split_type(74).: Invalid split_type argument (9)
```

This stems from the split type passed through mpi4py. Open /home/(usrname)/anaconda3/lib/python3.12/site-packages/tensorrt_llm/_utils.py and you will find this line:

```python
# mpi4py only exports MPI_COMM_TYPE_SHARED, so we define OMPI_COMM_TYPE_HOST here
OMPI_COMM_TYPE_HOST = 9
```

The value 9 is Open MPI's own constant for a per-host split, not a count of hosts; an MPI implementation other than Open MPI (the error text here is MPICH-style) rejects it as an invalid split type, which our setup does not support. So the code deserves a small modification:

```python
try:
    local_comm = mpi_comm().Split_type(split_type=OMPI_COMM_TYPE_HOST)
except Exception:
    # If Split_type fails, fall back to the default COMM_WORLD
    local_comm = mpi_comm()
```

Save, close, and restart the kernel. After that, it ran and printed:
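The fallback pattern used in _utils.py can also be factored into a standalone helper and exercised without a real MPI stack by stubbing the communicator; the helper and stub class names below are illustrative, not part of tensorrt_llm or mpi4py:

```python
OMPI_COMM_TYPE_HOST = 9  # Open MPI's split-type constant; other MPIs reject it

def local_communicator(comm):
    """Per-host communicator if the MPI implementation supports the Open MPI
    split type, otherwise fall back to the communicator we were given."""
    try:
        return comm.Split_type(split_type=OMPI_COMM_TYPE_HOST)
    except Exception:
        # MPICH reports "Invalid split_type argument (9)" and raises here
        return comm

# Stubs standing in for mpi4py communicators (illustrative only):
class AcceptingComm:  # behaves like Open MPI
    def Split_type(self, split_type):
        return ("local_comm", split_type)

class RejectingComm:  # behaves like MPICH
    def Split_type(self, split_type):
        raise ValueError(f"Invalid split_type argument ({split_type})")

print(local_communicator(AcceptingComm()))               # the split succeeds
print(type(local_communicator(RejectingComm())).__name__)  # falls back to the input comm
```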
```text
<frozen importlib._bootstrap_external>:1301: FutureWarning: The cuda.cuda module is deprecated and will be removed in a future release, please switch to use the cuda.bindings.driver module instead.
<frozen importlib._bootstrap_external>:1301: FutureWarning: The cuda.cudart module is deprecated and will be removed in a future release, please switch to use the cuda.bindings.runtime module instead.
[2025-11-10 18:39:56] INFO utils.py:164: NumExpr defaulting to 16 threads.
[2025-11-10 18:39:56] INFO config.py:54: PyTorch version 2.7.1 available.
2025-11-10 18:39:58,380 - INFO - flashinfer.jit: Prebuilt kernels not found, using JIT backend
```
As of this writing it has been compiling for 10 minutes, so I plan to come back tomorrow to check the result; that's it for this installment. As an outsider, all I can do is stumble through the pitfalls one step at a time to get a taste of NVIDIA's latest large-model tooling. On paper, the gains are mainly a shorter time to first token and faster token throughput; I will save detailed tuning and discussion of those for the next post. Thanks for reading this far, and see you next time!
——Contact me at [email protected], or leave me a message under this post.