Deploying NVIDIA TensorRT-LLM 1.2.0rc2 on Ubuntu 25.10: a local install and inference walkthrough
This post documents deploying NVIDIA TensorRT-LLM 1.2.0rc2 on Ubuntu 25.10. On a machine with an Intel i5-1240P CPU and an RTX 2060 SUPER GPU, installing via pip first hit connection problems, which were solved by pointing pip at the NVIDIA PyPI index; the install then ran into out-of-disk-space errors and dependency conflicts. The author switched to a machine with a 4060 Ti and reinstalled, which pulled in a long list of additional dependencies.
#TensorRT-LLM 1.0 hands-on#
TensorRT-LLM 1.2.0rc came out recently, so today let's try deploying it:
1. Local machine specs:
- OS: Ubuntu 25.10 (the latest release as of November 7)
- CPU and GPU: Intel i5-1240P + NVIDIA RTX 2060 SUPER 8 GB
- RAM: 16 GB
- Python 3.12
- Torch version: 2.6.0+cu118
- Python version: CPython 3.13.5
- pip info:
  - Operating system: Linux 6.14.0-34-generic
  - CPU architecture: x86_64
- Driver version: 580.95
- CUDA version: 13.0
- nvcc -V: Build cuda_12.4.r12.4/compiler.34097967_0
2. Installing the required system libraries:

Assuming a suitable PyTorch build is already in place, we need two system libraries: libopenmpi-dev and libzmq3-dev. In my tests, the HUST mirror downloads them without trouble on an ordinary network connection:

```shell
sudo apt-get update
sudo apt-get -y install libopenmpi-dev
sudo apt-get -y install libzmq3-dev
```
Then install the package itself with pip:

```shell
pip3 install --upgrade pip setuptools
pip3 install tensorrt_llm
```

Here setuptools was due to be upgraded from 72.1.0 to 80.9.0 (pip itself was at 25.2), and, amusingly, the upgrade timed out with no way around it, so I skipped it and went straight to the install. But the install then died with a `Connection aborted` error, which was harder to deal with. Anyway, switching to a US server fixed that, and along the way pip got updated to 25.3 and setuptools to 80.9.0. Then:

```text
wheel_stub.error.InstallFailedError:
*******************************************************************************
The installation of tensorrt-llm for version 1.0.0 failed.

This is a special placeholder package which downloads a real wheel package
from https://pypi.nvidia.com. If https://pypi.nvidia.com is not reachable, we
cannot download the real wheel file to install.

You might try installing this package via
pip install --extra-index-url https://pypi.nvidia.com tensorrt-llm
```

I tried that, and the connection was still being reset, which left me with a serious headache. Fine then: downgrade!
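As the placeholder's hint suggests, the real wheels come from https://pypi.nvidia.com, so if you install TensorRT-LLM repeatedly it can help to persist the extra index instead of retyping the flag every time. A sketch of the per-user pip config on Linux (this path is pip's default config location; running `pip config set global.extra-index-url https://pypi.nvidia.com` writes the same entry):

```ini
; ~/.config/pip/pip.conf
[global]
extra-index-url = https://pypi.nvidia.com
```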
```shell
conda create -n trtllm_env python=3.10
conda activate trtllm_env
```

Then I went after the newest version, 1.2.0rc2, released only nine hours earlier:

```shell
pip install tensorrt-llm==1.2.0rc2
```

No luck. At this point someone on Zhihu claimed you can't install TensorRT-LLM without installing TensorRT first, which momentarily threw me. Fine, I installed it, and promptly got played: the end of that very article admits none of the above was necessary... Speechless. Only at its conclusion does it explain that the files really are reachable and downloadable at https://pypi.nvidia.cn/tensorrt-llm/. So I gave it a try, and it worked in one shot:

```shell
pip3 install tensorrt_llm -U --pre --extra-index-url https://pypi.nvidia.com
```

When the terminal showed
```text
Downloading https://pypi.nvidia.cn/tensorrt-llm/tensorrt_llm-1.2.0rc2-cp310-cp310-linux_x86_64.whl (2247.0 MB)
```

I was genuinely moved to tears... And two minutes later, right in the middle of my joy, a red line appeared in the terminal:

```text
ERROR: Could not install packages due to an OSError: [Errno 28] No space left on device
```

Folks, the drama. I never expected to fill the 200 GB partition of my dual-boot drive; I had only done a little light development on it (really just running other people's projects), and it was already full. I was left speechless all over again...
At that point I remembered spacesniffer, my old disk-space savior from my Windows days. A search turned up a similar tool for Ubuntu, baobab, only to find the disk still had 80 GB free. Good heavens, how amusing...
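Before kicking off a 2+ GB wheel download, it is cheap to check free space programmatically, and it is worth remembering that pip also stages downloads under its cache and the temp directory, which may sit on a different (smaller) partition than the one your disk analyzer reports. A minimal standard-library sketch; the 3 GiB default is an arbitrary margin chosen for this wheel, not anything pip enforces:

```python
import shutil

def enough_space(path=".", need_bytes=3 * 1024**3):
    """True if the filesystem holding `path` has at least `need_bytes` free."""
    return shutil.disk_usage(path).free >= need_bytes

# Check both the target environment and pip's staging area:
print(enough_space("."))      # current partition
print(enough_space("/tmp"))   # where pip unpacks large wheels
```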
So I switched to another machine, this one with a 4060 Ti 8 GB, and ran everything as a single command:

```shell
sudo apt-get -y install libopenmpi-dev && pip3 install --upgrade pip setuptools && pip3 install tensorrt_llm
```

The install then ground along slowly, downloading `torch 2.7.1` and `cublas`, plus the CUDA 12.6 components, `cudnn` among them. Here is the `bash` output from the run:

```text
sudo apt-get -y install libopenmpi-dev && pip3 install --upgrade pip setuptools && pip3 install tensorrt_llm
[sudo: authenticate] Password:
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
Solving dependencies... Done
The following additional packages will be installed:
  autoconf automake autotools-dev gfortran gfortran-14 gfortran-14-x86-64-linux-gnu gfortran-15
  gfortran-15-x86-64-linux-gnu gfortran-x86-64-linux-gnu javascript-common libamd-comgr2 libamdhip64-5
  libcaf-openmpi-3t64 libcoarrays-dev libcoarrays-openmpi-dev libevent-2.1-7t64 libevent-core-2.1-7t64
  libevent-dev libevent-extra-2.1-7t64 libevent-openssl-2.1-7t64 libevent-pthreads-2.1-7t64 libfabric1
  libgfortran-14-dev libgfortran-15-dev libhsa-runtime64-1 libhsakmt1 libhwloc-dev libhwloc-plugins
  libhwloc15 libibmad5 libibumad3 libibverbs-dev libjs-jquery libjs-jquery-ui libllvm17t64 libltdl-dev
  libnl-3-dev libnl-route-3-dev libnuma-dev libopenmpi40 libpsm2-2 librdmacm1t64 libtool libucx0 libze1
  m4 openmpi-bin openmpi-common zlib1g-dev
Suggested packages:
  autoconf-archive gnu-standards autoconf-doc gettext gfortran-multilib gfortran-doc gfortran-14-multilib
  gfortran-14-doc gfortran-15-multilib gfortran-15-doc apache2 | lighttpd | httpd libhwloc-contrib-plugins
  libjs-jquery-ui-docs libtool-doc openmpi-doc gcj-jdk m4-doc
The following NEW packages will be installed:
  autoconf automake autotools-dev gfortran gfortran-14 gfortran-14-x86-64-linux-gnu gfortran-15
  gfortran-15-x86-64-linux-gnu gfortran-x86-64-linux-gnu javascript-common libamd-comgr2 libamdhip64-5
  libcaf-openmpi-3t64 libcoarrays-dev libcoarrays-openmpi-dev libevent-2.1-7t64 libevent-core-2.1-7t64
  libevent-dev libevent-extra-2.1-7t64 libevent-openssl-2.1-7t64 libevent-pthreads-2.1-7t64 libfabric1
  libgfortran-14-dev libgfortran-15-dev libhsa-runtime64-1 libhsakmt1 libhwloc-dev libhwloc-plugins
  libhwloc15 libibmad5 libibumad3 libibverbs-dev libjs-jquery libjs-jquery-ui libllvm17t64 libltdl-dev
  libnl-3-dev libnl-route-3-dev libnuma-dev libopenmpi-dev libopenmpi40 libpsm2-2 librdmacm1t64 libtool
  libucx0 libze1 m4 openmpi-bin openmpi-common zlib1g-dev
0 upgraded, 50 newly installed, 0 to remove and 0 not upgraded.
Need to get 92.4 MB of archives.
After this operation, 360 MB of additional disk space will be used.
......
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed.
This behaviour is the source of the following dependency conflicts.
dgl 2.4.0+cu124 requires torch<=2.4.0, but you have torch 2.7.1 which is incompatible.
s3fs 2025.7.0 requires fsspec==2025.7.0, but you have fsspec 2024.9.0 which is incompatible.
torchaudio 2.9.0 requires torch==2.9.0, but you have torch 2.7.1 which is incompatible.
Successfully installed StrEnum-0.4.15 accelerate-1.11.0 aenum-3.1.16 backoff-2.2.1 blake3-1.0.8
blobfile-3.1.0 build-1.3.0 click_option_group-0.5.9 colored-2.3.1 cuda-bindings-12.9.4
cuda-pathfinder-1.3.2 cuda-python-12.9.4 datasets-3.1.0 diffusers-0.35.2 dill-0.3.8 einops-0.8.1
etcd3-0.12.0 evaluate-0.4.6 fastapi-0.115.4 flashinfer-python-0.2.5 fsspec-2024.9.0 grpcio-1.76.0
h5py-3.12.1 hf-xet-1.2.0 huggingface-hub-0.36.0 jiter-0.12.0 lark-1.3.1 llguidance-0.7.29 meson-1.9.1
ml_dtypes-0.5.3 multiprocess-0.70.16 ninja-1.13.0 numpy-1.26.4 nvidia-cublas-cu12-12.6.4.1
nvidia-cuda-cupti-cu12-12.6.80 nvidia-cuda-nvrtc-cu12-12.6.77 nvidia-cuda-runtime-cu12-12.6.77
nvidia-cudnn-cu12-9.5.1.17 nvidia-cufft-cu12-11.3.0.4 nvidia-cufile-cu12-1.11.1.6
nvidia-curand-cu12-10.3.7.77 nvidia-cusolver-cu12-11.7.1.2 nvidia-cusparse-cu12-12.5.4.2
nvidia-cusparselt-cu12-0.6.3 nvidia-ml-py-12.575.51 nvidia-modelopt-0.33.1 nvidia-modelopt-core-0.33.1
nvidia-nccl-cu12-2.26.2 nvidia-nvjitlink-cu12-12.6.85 nvidia-nvtx-cu12-12.6.77 nvtx-0.2.13 onnx-1.19.1
onnx_graphsurgeon-0.5.8 openai-2.7.1 opencv-python-headless-4.11.0.86 optimum-2.0.0 ordered-set-4.1.0
peft-0.17.1 pillow-10.3.0 polygraphy-0.49.26 pulp-3.3.0 pycryptodomex-3.23.0 pynvml-12.0.0
pyproject_hooks-1.2.0 safetensors-0.6.2 sentencepiece-0.2.1 setuptools-79.0.1 soundfile-0.13.1
starlette-0.41.3 tensorrt-10.11.0.33 tensorrt_cu12-10.11.0.33 tensorrt_cu12_bindings-10.11.0.33
tensorrt_cu12_libs-10.11.0.33 tensorrt_llm-1.0.0 tiktoken-0.12.0 tokenizers-0.21.4 torch-2.7.1
torchprofile-0.0.4 torchvision-0.22.1 transformers-4.53.1 triton-3.3.1 uvicorn-0.38.0 xgrammar-0.1.21
xxhash-3.6.0
```

Note that without `--pre`, pip resolved `tensorrt_llm` to 1.0.0 here rather than the 1.2.0rc2 pre-release. Ignoring pip's complaints for now, let's try actually running it:
This produced:

```text
ImportError: libpython3.12.so.1.0: cannot open shared object file: No such file or directory
```

Re-adding the library's path under /usr/lib/ resolved this problem.
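For reference, the missing library usually ships with the interpreter itself (for a conda env, under `$CONDA_PREFIX/lib`). A small standard-library sketch that reconstructs the soname the loader is asking for and prints where the current interpreter normally keeps its shared library, so you know what to symlink or add to `LD_LIBRARY_PATH` (the `.so.1.0` suffix is CPython's conventional shared-library version on Linux):

```python
import sys
import sysconfig

def expected_soname():
    """The file name from the ImportError, e.g. 'libpython3.12.so.1.0'."""
    v = sys.version_info
    return f"libpython{v.major}.{v.minor}.so.1.0"

def interpreter_lib_dir():
    """Directory where this interpreter's libpython normally lives."""
    return sysconfig.get_config_var("LIBDIR")

print(expected_soname())
print(interpreter_lib_dir())
# If the file exists there, either export LD_LIBRARY_PATH to include that
# directory or symlink the library into a standard location such as /usr/lib/.
```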
Then came another error:

```text
Exception: Invalid argument, error stack:
internal_Comm_split_type(102): MPI_Comm_split_type(MPI_COMM_WORLD, split_type=9, key=0, MPI_INFO_NULL, newcomm=0x7902330bd760) failed
internal_Comm_split_type(74).: Invalid split_type argument (9)
```

This stems from the split type passed through mpi4py. Open /home/(usrname)/anaconda3/lib/python3.12/site-packages/tensorrt_llm/_utils.py and you will find this line:

```python
# mpi4py only exports MPI_COMM_TYPE_SHARED, so we define OMPI_COMM_TYPE_HOST here
OMPI_COMM_TYPE_HOST = 9
```

The value 9 is Open MPI's own constant for a per-host split, not a count of hosts; an MPI implementation other than Open MPI (the error text here is MPICH-style) rejects it as an invalid split type, which our setup does not support. So the code deserves a small modification:

```python
try:
    local_comm = mpi_comm().Split_type(split_type=OMPI_COMM_TYPE_HOST)
except Exception:
    # If Split_type fails, fall back to the default COMM_WORLD
    local_comm = mpi_comm()
```

Save, close, and restart the kernel. After that, it ran and printed:
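The fallback pattern used in _utils.py can also be factored into a standalone helper and exercised without a real MPI stack by stubbing the communicator; the helper and stub class names below are illustrative, not part of tensorrt_llm or mpi4py:

```python
OMPI_COMM_TYPE_HOST = 9  # Open MPI's split-type constant; other MPIs reject it

def local_communicator(comm):
    """Per-host communicator if the MPI implementation supports the Open MPI
    split type, otherwise fall back to the communicator we were given."""
    try:
        return comm.Split_type(split_type=OMPI_COMM_TYPE_HOST)
    except Exception:
        # MPICH reports "Invalid split_type argument (9)" and raises here
        return comm

# Stubs standing in for mpi4py communicators (illustrative only):
class AcceptingComm:  # behaves like Open MPI
    def Split_type(self, split_type):
        return ("local_comm", split_type)

class RejectingComm:  # behaves like MPICH
    def Split_type(self, split_type):
        raise ValueError(f"Invalid split_type argument ({split_type})")

print(local_communicator(AcceptingComm()))               # the split succeeds
print(type(local_communicator(RejectingComm())).__name__)  # falls back to the input comm
```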
```text
<frozen importlib._bootstrap_external>:1301: FutureWarning: The cuda.cuda module is deprecated and will be removed in a future release, please switch to use the cuda.bindings.driver module instead.
<frozen importlib._bootstrap_external>:1301: FutureWarning: The cuda.cudart module is deprecated and will be removed in a future release, please switch to use the cuda.bindings.runtime module instead.
[2025-11-10 18:39:56] INFO utils.py:164: NumExpr defaulting to 16 threads.
[2025-11-10 18:39:56] INFO config.py:54: PyTorch version 2.7.1 available.
2025-11-10 18:39:58,380 - INFO - flashinfer.jit: Prebuilt kernels not found, using JIT backend
```
As of this writing it has been compiling for 10 minutes, so I plan to come back tomorrow to check the result; that's it for this installment. As an outsider, all I can do is stumble through the pitfalls one step at a time to get a taste of NVIDIA's latest large-model tooling. On paper, the gains are mainly a shorter time to first token and faster token throughput; I will save detailed tuning and discussion of those for the next post. Thanks for reading this far, and see you next time!
——Contact me at [email protected], or leave me a message under this post.