TensorRT-LLM uses the PyTorch backend by default. The fastest way to get started:
```bash
# Serve a model with an OpenAI-compatible API
trtllm-serve "TinyLlama/TinyLlama-1.1B-Chat-v1.0"

# Or use a pre-quantized model for better performance
trtllm-serve "nvidia/Llama-3.1-8B-Instruct-FP8"
```

For the Python API:
```python
from tensorrt_llm import LLM

llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")
output = llm.generate(["What is TensorRT-LLM?"])
print(output[0].outputs[0].text)
```

Full documentation: https://nvidia.github.io/TensorRT-LLM/quick-start-guide.html
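Once the server is up, any OpenAI-compatible client can talk to it over HTTP. Below is a minimal standard-library sketch of the request shape; the `localhost:8000` host/port is an assumption (the default used in the quick-start docs), and the endpoint path follows the OpenAI chat-completions convention:

```python
import json
import urllib.request

# Request body in the OpenAI chat-completions format that the server accepts.
payload = {
    "model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
    "messages": [{"role": "user", "content": "What is TensorRT-LLM?"}],
    "max_tokens": 64,
}

req = urllib.request.Request(
    "http://localhost:8000/v1/chat/completions",  # assumed default host/port
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

# With the server running, send the request and print the reply:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

The same payload works with the official `openai` Python client by pointing its `base_url` at the server.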
| Directory | Description |
|---|---|
| llm-api/ | Python LLM API examples (offline inference, quantization, speculative decoding) |
| apps/ | Application examples (chat, FastAPI server) |
| configs/ | Pre-tuned serving configurations: curated quick-starts and a comprehensive database |
| auto_deploy/ | AutoDeploy (beta) development examples, cookbooks, and model registry |
| serve/ | trtllm-serve deployment guides and examples |
| quantization/ | Quantization workflows with NVIDIA Model Optimizer |
The configs/ directory contains recommended trtllm-serve configurations.
Start with the hand-picked curated configs or browse the full
database for specific combinations of GPU, input sequence length (ISL), output sequence length (OSL), and concurrency.
```bash
trtllm-serve "deepseek-ai/DeepSeek-R1-0528" \
  --config configs/curated/deepseek-r1-throughput.yaml
```

For model-specific walkthroughs and an interactive recipe selector, see the Model Recipes deployment guide.
The AutoDeploy
backend automatically translates HuggingFace models into optimized inference graphs.
It is accessed through the same trtllm-serve, trtllm-bench, and LLM API entry
points as the default PyTorch backend.
See auto_deploy/ for development examples, Jupyter cookbooks,
and a registry of 90+ validated models.
⚠️ Legacy: The `convert_checkpoint.py` → `trtllm-build` → `run.py` workflow is legacy and may not receive new features. For new projects, use `trtllm-serve` or the LLM API as shown above.
The models/ directory contains per-model scripts for the legacy
TensorRT engine-build workflow. These scripts convert Hugging Face checkpoints
to TensorRT engines for deployment. While still functional for supported models,
this workflow is no longer the recommended path and may not support newly added
models.
If you are following a tutorial or guide that references convert_checkpoint.py
or trtllm-build, please refer to the
Quick Start Guide
for the current recommended workflow.