- [May 2026] SFT and RL post-training scripts are available in Alpamayo Recipes: Alpamayo 1.5 SFT and Alpamayo 1.x RL post-training.
Please read the HuggingFace Model Card first! The model card contains comprehensive details on model architecture, inputs/outputs, licensing, and tested hardware configurations. This GitHub README focuses on setup, usage, and frequently asked questions.
- NVIDIA GPU with CUDA support
- CUDA Toolkit 12.x with `nvcc` (required to compile `flash-attn` from source). If you don't have it, see Troubleshooting for a fallback using PyTorch's built-in SDPA.
- Python 3.12
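The Python and `nvcc` prerequisites above can be checked before installing. The following is a minimal sketch, not part of the repository: the function name `check_prerequisites` is hypothetical, and it only inspects the Python version and whether `nvcc` is on `PATH`. It does not verify the GPU or the CUDA Toolkit version.

```python
import shutil
import sys


def check_prerequisites(python_version, nvcc_path):
    """Return a list of human-readable problems; an empty list means both checks passed."""
    problems = []
    # The README lists Python 3.12 as a prerequisite.
    if tuple(python_version[:2]) != (3, 12):
        problems.append(
            f"Python 3.12 expected, found {python_version[0]}.{python_version[1]}"
        )
    # nvcc is only needed to compile flash-attn; without it, the SDPA fallback applies.
    if nvcc_path is None:
        problems.append("nvcc not found on PATH; flash-attn cannot be built from source")
    return problems


if __name__ == "__main__":
    # shutil.which returns None when the executable is not on PATH.
    for problem in check_prerequisites(sys.version_info, shutil.which("nvcc")):
        print("WARNING:", problem)
```

A missing `nvcc` is not fatal, since the SDPA fallback described in Troubleshooting avoids building `flash-attn`.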
| Configuration | VRAM |
|---|---|
| Single-sample inference (`num_traj_samples=1`) | ~24 GB |
| Multi-sample inference (`num_traj_samples=16`) | ~40 GB |
| Multi-sample inference with CFG (`num_traj_samples=16`) | ~60 GB |
Measured on an NVIDIA H100 80GB GPU.
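To see which of these configurations a given GPU can plausibly run, the table can be encoded as a small lookup. This is a sketch under stated assumptions: the figures are the approximate values from the table, the helper name `configurations_that_fit` is hypothetical, and actual memory use will vary with driver, framework version, and input size.

```python
# Approximate VRAM figures copied from the table above (measured on an H100 80GB).
VRAM_REQUIREMENTS_GB = {
    "single-sample (num_traj_samples=1)": 24,
    "multi-sample (num_traj_samples=16)": 40,
    "multi-sample with CFG (num_traj_samples=16)": 60,
}


def configurations_that_fit(available_gb):
    """Return the configurations whose approximate VRAM need is within available_gb."""
    return [name for name, need in VRAM_REQUIREMENTS_GB.items() if need <= available_gb]


if __name__ == "__main__":
    try:
        import torch  # optional: only used to query the local GPU
    except ImportError:
        torch = None
    if torch is not None and torch.cuda.is_available():
        # total_memory is reported in bytes; convert to decimal gigabytes.
        total_gb = torch.cuda.get_device_properties(0).total_memory / 1e9
        print(f"{total_gb:.0f} GB available:", configurations_that_fit(total_gb))
```

Because the table values are approximate, a GPU sitting right at a threshold may still run out of memory.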
curl -LsSf https://astral.sh/uv/install.sh | sh
export PATH="$HOME/.local/bin:$PATH"
uv venv a1_5_venv
source a1_5_venv/bin/activate
uv sync --active
Note: If `uv sync` fails on `flash-attn`, see Troubleshooting below.
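After `uv sync`, one way to find out whether `flash-attn` ended up in the environment is to look for its Python module. The sketch below assumes the module is importable as `flash_attn` and that the model accepts the Hugging Face Transformers value `"flash_attention_2"` for `attn_implementation`; the helper name `choose_attn_implementation` is hypothetical.

```python
import importlib.util


def choose_attn_implementation(flash_attn_available=None):
    """Pick a value for attn_implementation based on whether flash_attn is installed."""
    if flash_attn_available is None:
        # find_spec returns None when the package is not installed.
        flash_attn_available = importlib.util.find_spec("flash_attn") is not None
    return "flash_attention_2" if flash_attn_available else "sdpa"


if __name__ == "__main__":
    print(choose_attn_implementation())
```

Note that `find_spec` only confirms the package is present; it does not prove that the compiled extension loads correctly on the current GPU and CUDA version.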
The model and dataset require access to gated resources. Request access here:
Then authenticate:
hf auth login
Get your token at: https://huggingface.co/settings/tokens
Note: The `physical_ai_av` package (auto-installed via dependencies) streams data from the HuggingFace dataset. You must have accepted the dataset access request above before running inference.
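Before running inference, it can help to confirm that a token is visible to the process. The following stdlib-only sketch rests on assumptions about the Hugging Face tooling: that the `HF_TOKEN` environment variable takes precedence, and that `hf auth login` stores the token in `$HF_HOME/token` with `HF_HOME` defaulting to `~/.cache/huggingface`. The helper name `find_hf_token` is hypothetical.

```python
import os
from pathlib import Path


def find_hf_token(env=None):
    """Return a Hugging Face token from the environment or the token file, else None."""
    env = os.environ if env is None else env
    # Assumption: HF_TOKEN takes precedence when it is set.
    token = env.get("HF_TOKEN")
    if token:
        return token.strip()
    # Assumption: the login command writes the token to $HF_HOME/token,
    # with HF_HOME defaulting to ~/.cache/huggingface.
    hf_home = Path(env.get("HF_HOME", Path.home() / ".cache" / "huggingface"))
    token_file = hf_home / "token"
    if token_file.is_file():
        return token_file.read_text().strip() or None
    return None


if __name__ == "__main__":
    print("token found" if find_hf_token() else "no token found; run `hf auth login`")
```

Finding a token does not show that access to the gated model and dataset has been granted; that still requires the access requests described above.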
NOTE: This script downloads both example data (relatively small) and the model weights (22 GB). Downloading the weights can be slow, depending on network bandwidth. For reference, it takes around 2.5 minutes on a 100 MB/s wired connection.
python src/alpamayo1_5/test_inference.py
To obtain more trajectories and reasoning traces, increase the `num_traj_samples` argument in the script.
We provide notebooks that demonstrate the different capabilities of Alpamayo 1.5 under notebooks/, including standard model inference, incorporating navigation guidance, modifying the number of cameras, and visual question answering.
Alpamayo 1.5 provides two inference methods:
- `sample_trajectories_from_data_with_vlm_rollout` -- Full pipeline: the VLM generates chain-of-causation reasoning, then a diffusion expert produces trajectory predictions conditioned on the VLM's hidden states. This is the primary inference method used by the test script and most notebooks.
- `generate_text` -- Text-only generation for visual question answering (VQA). Returns extracted text fields.
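The choice between the two methods can be sketched as a lookup from task to method name. Only the two method names come from the list above; the task labels and the helper `resolve_inference_method` are hypothetical, and the sketch deliberately does not call the methods, because their arguments are not documented here (see the notebooks for real calls).

```python
# Method names are taken from the list above; the task labels are hypothetical.
METHOD_FOR_TASK = {
    "trajectory": "sample_trajectories_from_data_with_vlm_rollout",
    "vqa": "generate_text",
}


def resolve_inference_method(model, task):
    """Return the bound inference method on `model` for the given task label."""
    try:
        method_name = METHOD_FOR_TASK[task]
    except KeyError:
        raise ValueError(
            f"unknown task {task!r}; expected one of {sorted(METHOD_FOR_TASK)}"
        ) from None
    # getattr raises AttributeError if the model object lacks the method.
    return getattr(model, method_name)
```

With a loaded model, `resolve_inference_method(model, "vqa")` would return `model.generate_text` without invoking it.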
SFT and RL post-training scripts are maintained in Alpamayo Recipes:
- Alpamayo 1.5 SFT
- Alpamayo 1.x RL post-training, including Alpamayo 1.5
alpamayo_1.5_release/
├── notebooks/
│   ├── inference.ipynb                  # Standard model inference
│   ├── inference_cam_num.ipynb          # Inference with different camera counts
│   ├── inference_nav.ipynb              # Inference with navigation guidance
│   └── inference_vqa.ipynb              # Visual question answering
├── src/
│   └── alpamayo1_5/
│       ├── action_space/
│       │   └── ...                      # Action space definitions
│       ├── diffusion/
│       │   └── ...                      # Diffusion model components
│       ├── geometry/
│       │   └── ...                      # Geometry utilities and modules
│       ├── models/
│       │   └── ...                      # Model components and utils functions
│       ├── __init__.py                  # Package marker
│       ├── config.py                    # Model and experiment configuration
│       ├── helper.py                    # Utility functions
│       ├── load_physical_aiavdataset.py # Dataset loader
│       └── test_inference.py            # Inference test script
├── pyproject.toml                       # Project dependencies
└── uv.lock                              # Locked dependency versions
The model uses Flash Attention 2 by default. flash-attn requires CUDA Toolkit (specifically nvcc) at build time. If you see build errors during uv sync:
Option A: Install without flash-attn and use SDPA fallback
uv sync --active --no-install-package flash-attn
Then load the model with PyTorch's built-in scaled dot-product attention:
import torch

from alpamayo1_5.models.alpamayo1_5 import Alpamayo1_5

model = Alpamayo1_5.from_pretrained(
    "nvidia/Alpamayo-1.5-10B",
    dtype=torch.bfloat16,
    attn_implementation="sdpa",  # use PyTorch SDPA instead of flash-attn
).to("cuda")
Option B: Install CUDA Toolkit, then retry
Install CUDA Toolkit 12.x (e.g., via your package manager or NVIDIA's install guide), ensure nvcc is on your PATH, then re-run:
uv sync --active
How does Alpamayo 1.5 relate to Alpamayo 1?
Alpamayo 1.5 expands upon the architecture released in Alpamayo 1 and fully realizes what is described in our paper "Alpamayo 1: Bridging Reasoning and Action Prediction for Generalizable Autonomous Driving in the Long Tail". Specifically:
| Feature | Description | Alpamayo 1 | Alpamayo 1.5 |
|---|---|---|---|
| Chain-of-Causation (CoC) reasoning | Hybrid auto-labeling with human in the loop for reasoning traces | ✅ Included | ✅ Included |
| Vision-Language-Action architecture | Cosmos-Reason backbone + action expert | ✅ Included | ✅ Included |
| Trajectory prediction | 6.4s horizon, 64 waypoints at 10 Hz | ✅ Supported | ✅ Supported |
| RL post-training | Reinforcement learning for reasoning/action consistency | ❌ Not RL post-trained | ✅ RL post-trained |
| Navigation conditioning | Explicit navigation inputs | ❌ Not supported | ✅ Supported |
| General VQA | Supports visual question answering | ❌ Not supported | ✅ Supported |
| Flexible multi-camera support | Supports a variable number of input cameras | ❌ Not supported | ✅ Supported |
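The trajectory prediction row implies a fixed time grid: 64 waypoints at 10 Hz are spaced 0.1 s apart and span 6.4 s. The short sketch below works through that arithmetic. It assumes the first waypoint lies one step (0.1 s) after the current time, which the table does not state.

```python
HORIZON_S = 6.4     # prediction horizon from the table
NUM_WAYPOINTS = 64  # waypoints per trajectory
RATE_HZ = 10        # waypoint rate

dt = 1.0 / RATE_HZ  # spacing between consecutive waypoints, in seconds

# Assumption: the first waypoint lies one step (0.1 s) after the current time.
timestamps = [(i + 1) * dt for i in range(NUM_WAYPOINTS)]

# 64 waypoints * 0.1 s spacing covers the full 6.4 s horizon.
assert len(timestamps) == NUM_WAYPOINTS
assert abs(timestamps[-1] - HORIZON_S) < 1e-9
```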
Does Alpamayo 1.5 accept navigation inputs?
Yes! Please see notebooks/inference_nav.ipynb for examples.
Does Alpamayo 1.5 support general VQA?
Yes! Please see notebooks/inference_vqa.ipynb for examples.
Was Alpamayo 1.5 post-trained with Reinforcement Learning (RL)?
Yes! Alpamayo 1.5 has undergone RL post-training, which improved reasoning quality and reasoning-trajectory alignment.
Does Alpamayo 1.5 accept different numbers of cameras?
Yes! Please see notebooks/inference_cam_num.ipynb for examples. Note that model accuracy may degrade with fewer cameras, by an amount that depends on the specific scenario. For instance, Alpamayo 1.5 would be expected to struggle to see cross-traffic during a right turn if given only a front-facing camera.
What are the minimum GPU requirements?
You need an NVIDIA GPU with at least 24 GB VRAM for inference. Tested configurations include RTX 3090, A100, H100, and B200. Running on GPUs with less memory (e.g., 16 GB) will likely result in CUDA out-of-memory errors. Please refer to our hardware requirements for more information.
Can I use this model in production / commercial applications?
No. The model weights are released under a non-commercial license. This release is intended for research, experimentation, and evaluation purposes only. See the License section and the HuggingFace Model Card for details.
Apache License 2.0 - see LICENSE for details.
Alpamayo 1.5 is a pre-trained reasoning model designed to accelerate research and development in the autonomous vehicle (AV) domain. It is intended to serve as a foundation for a range of AV-related use cases, from instantiating an end-to-end backbone for autonomous driving to enabling reasoning-based auto-labeling tools. In short, it should be viewed as a building block for developing customized AV applications.
Important notes:
- Alpamayo 1.5 is provided solely for research, experimentation, and evaluation purposes.
- Alpamayo 1.5 is not a fully fledged driving stack. Among other limitations, it lacks access to critical real-world sensor inputs, does not incorporate required diverse and redundant safety mechanisms, and has not undergone automotive-grade validation for deployment.
By using this model, you acknowledge that it is a research tool intended to support scientific inquiry, benchmarking, and exploration, not a substitute for a certified AV stack. The developers and contributors disclaim any responsibility or liability for the use of the model or its outputs.
If you use Alpamayo 1.5 in your research, please cite:
@article{nvidia2025alpamayo,
title={{Alpamayo-R1}: Bridging Reasoning and Action Prediction for Generalizable Autonomous Driving in the Long Tail},
author={NVIDIA and Yan Wang and Wenjie Luo and Junjie Bai and Yulong Cao and Tong Che and Ke Chen and Yuxiao Chen and Jenna Diamond and Yifan Ding and Wenhao Ding and Liang Feng and Greg Heinrich and Jack Huang and Peter Karkus and Boyi Li and Pinyi Li and Tsung-Yi Lin and Dongran Liu and Ming-Yu Liu and Langechuan Liu and Zhijian Liu and Jason Lu and Yunxiang Mao and Pavlo Molchanov and Lindsey Pavao and Zhenghao Peng and Mike Ranzinger and Ed Schmerling and Shida Shen and Yunfei Shi and Sarah Tariq and Ran Tian and Tilman Wekel and Xinshuo Weng and Tianjun Xiao and Eric Yang and Xiaodong Yang and Yurong You and Xiaohui Zeng and Wenyuan Zhang and Boris Ivanovic and Marco Pavone},
year={2025},
journal={arXiv preprint arXiv:2511.00088},
}