Xiaomi-Robotics-0 is a state-of-the-art Vision-Language-Action (VLA) model with 4.7B parameters, specifically engineered for high-performance robotic reasoning and seamless real-time execution.
- 🧠 Strong Generalization: Pre-trained on diverse cross-embodiment trajectories and VL data to handle complex, unseen tasks.
- 🚀 Real-Time Ready: Optimized with asynchronous execution to minimize inference latency.
- 🛠️ Flexible Deployment: Fully compatible with the Hugging Face `transformers` ecosystem and optimized for consumer GPUs.
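The asynchronous execution mentioned above can be pictured as a receding-horizon loop: while the robot executes the first few actions of the current chunk, the next inference call runs in the background. The sketch below is illustrative only, with a stubbed `predict_chunk` standing in for model inference (the chunk size and step counts are hypothetical values, not this repo's API):

```python
import threading
import queue

CHUNK_SIZE = 8      # actions per inference call (hypothetical value)
EXECUTE_STEPS = 4   # steps executed before switching to the next chunk

def predict_chunk(step: int) -> list[int]:
    """Stub for model inference: returns a chunk of action indices."""
    return list(range(step, step + CHUNK_SIZE))

def run(total_steps: int) -> list[int]:
    executed = []
    next_chunk: queue.Queue = queue.Queue(maxsize=1)
    # Prime the pipeline with the first prediction.
    chunk = predict_chunk(0)
    while len(executed) < total_steps:
        # Kick off the next inference in the background...
        t = threading.Thread(
            target=lambda s=len(executed) + EXECUTE_STEPS: next_chunk.put(predict_chunk(s))
        )
        t.start()
        # ...while the robot executes the first few actions of the current chunk.
        executed.extend(chunk[:EXECUTE_STEPS])
        t.join()
        chunk = next_chunk.get()
    return executed[:total_steps]

print(run(12))  # → [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
```

Because inference overlaps with execution, the control loop never stalls waiting for the model, which is what keeps end-to-end latency low.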
- [Feb 2026] 🎉 Released the Technical Report.
- [Feb 2026] 🔥 Released Pre-trained weights and Fine-tuned weights for LIBERO, CALVIN, and SimplerEnv.
- [Feb 2026] 💻 Inference code and evaluation scripts are now live!
We evaluate Xiaomi-Robotics-0 on three standard simulation benchmarks: CALVIN, LIBERO, and SimplerEnv. The table below summarizes the performance results across different embodiments and datasets. For each setting, we provide the corresponding fine-tuned checkpoint and a guide for running the evaluation.
| Benchmark | 🤗 Name on Hugging Face | Description | Performance | Evaluation Guide |
|---|---|---|---|---|
| LIBERO | Xiaomi-Robotics-0-LIBERO | Fine-tuned on four LIBERO suites. | 98.7% (Avg Success) | LIBERO Eval |
| CALVIN | Xiaomi-Robotics-0-Calvin-ABCD_D | Fine-tuned on ABCD→D split. | 4.80 (Avg Length) | CALVIN Eval |
| | Xiaomi-Robotics-0-Calvin-ABC_D | Fine-tuned on ABC→D split. | 4.75 (Avg Length) | CALVIN Eval |
| SimplerEnv | Xiaomi-Robotics-0-SimplerEnv-Google-Robot | Fine-tuned on Fractal dataset. | 85.5% (VM) / 74.7% (VA) | SimplerEnv Eval |
| | Xiaomi-Robotics-0-SimplerEnv-WidowX | Fine-tuned on Bridge dataset. | 79.2% | SimplerEnv Eval |
| Base | Xiaomi-Robotics-0 | Pre-trained model. | - | - |
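For reference, CALVIN's "Avg Length" metric is the mean number of consecutive subtasks completed across 5-task instruction chains, so a perfect score is 5.0. A minimal sketch of how such a score is aggregated, using a hypothetical list of per-rollout completion counts (the helper below is illustrative, not part of this repo):

```python
def calvin_avg_length(completed_per_chain: list[int], chain_len: int = 5) -> float:
    """Average number of consecutively completed subtasks per evaluation chain."""
    assert all(0 <= c <= chain_len for c in completed_per_chain)
    return sum(completed_per_chain) / len(completed_per_chain)

# Hypothetical rollouts: 5, 5, 4, and 5 consecutive subtasks solved.
print(calvin_avg_length([5, 5, 4, 5]))  # → 4.75
```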
Our project builds primarily on Hugging Face Transformers 🤗, which makes deployment straightforward. Any environment with transformers >= 4.57.1 should work seamlessly. We recommend PyTorch 2.8.0 (paired with torchvision 0.23.0 and torchaudio 2.8.0), as this combination has been fully tested by our team.
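A quick way to sanity-check the transformers requirement before running anything (a simple helper for release-style version strings, not part of the repo):

```python
def version_tuple(v: str) -> tuple[int, ...]:
    """Parse a dotted release version like '4.57.1' into a comparable tuple."""
    return tuple(int(part) for part in v.split("."))

def meets_requirement(installed: str, minimum: str = "4.57.1") -> bool:
    return version_tuple(installed) >= version_tuple(minimum)

# In practice: import transformers; meets_requirement(transformers.__version__)
print(meets_requirement("4.57.1"))  # → True
print(meets_requirement("4.45.0"))  # → False
```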
Here’s a simple installation guide to get you started:
```bash
git clone https://github.com/XiaomiRobotics/Xiaomi-Robotics-0
cd Xiaomi-Robotics-0

# Create a Conda environment with Python 3.12
conda create -n mibot python=3.12 -y
conda activate mibot

# Install PyTorch
pip install torch==2.8.0 torchvision==0.23.0 torchaudio==2.8.0 --index-url https://download.pytorch.org/whl/cu128

# Install transformers
pip install transformers==4.57.1

# Install flash-attn
pip uninstall -y ninja && pip install ninja
pip install flash-attn==2.8.3 --no-build-isolation
# or: pip install https://github.com/Dao-AILab/flash-attention/releases/download/v2.8.3/flash_attn-2.8.3+cu12torch2.8cxx11abiTRUE-cp312-cp312-linux_x86_64.whl

# Install OpenGL/EGL runtime libraries
sudo apt-get install -y libegl1 libgl1 libgles2
```

Xiaomi-Robotics-0 is built on top of the Hugging Face Transformers 🤗 ecosystem, making it straightforward to deploy for robotic manipulation tasks. By leveraging Flash Attention 2 and bfloat16 precision, the model can be loaded and run efficiently on consumer-grade GPUs.
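As a back-of-the-envelope check on the consumer-GPU claim: 4.7B parameters at 2 bytes each in bfloat16 come to roughly 8.8 GiB of weights, before activations and cache overhead:

```python
NUM_PARAMS = 4.7e9        # parameter count from the model card
BYTES_PER_PARAM = 2       # bfloat16 = 2 bytes per parameter

weights_gib = NUM_PARAMS * BYTES_PER_PARAM / 2**30
print(f"~{weights_gib:.1f} GiB of weights")  # → ~8.8 GiB of weights
```

This is why the model fits comfortably on GPUs with 16 GB or more of VRAM, with headroom left for activations.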
```python
import torch
from transformers import AutoModel, AutoProcessor

# 1. Load model and processor
model_path = "XiaomiRobotics/Xiaomi-Robotics-0-LIBERO"
model = AutoModel.from_pretrained(
    model_path,
    trust_remote_code=True,
    attn_implementation="flash_attention_2",
    dtype=torch.bfloat16,
).cuda().eval()
processor = AutoProcessor.from_pretrained(model_path, trust_remote_code=True, use_fast=False)

# 2. Construct the prompt with multi-view inputs
language_instruction = "Pick up the red block."
instruction = (
    f"<|im_start|>user\nThe following observations are captured from multiple views.\n"
    f"# Base View\n<|vision_start|><|image_pad|><|vision_end|>\n"
    f"# Left-Wrist View\n<|vision_start|><|image_pad|><|vision_end|>\n"
    f"Generate robot actions for the task:\n{language_instruction} /no_cot<|im_end|>\n"
    f"<|im_start|>assistant\n<cot></cot><|im_end|>\n"
)

# 3. Prepare inputs
# Assuming `image_base`, `image_wrist`, and `proprio_state` are already loaded
inputs = processor(
    text=[instruction],
    images=[image_base, image_wrist],  # [PIL.Image, PIL.Image]
    videos=None,
    padding=True,
    return_tensors="pt",
).to(model.device)

# Add proprioceptive state and action mask
robot_type = "libero_all"
inputs["seed"] = 42
inputs["state"] = torch.from_numpy(proprio_state).to(model.device, model.dtype).view(1, 1, -1)
inputs["action_mask"] = processor.get_action_mask(robot_type).to(model.device, model.dtype)

# 4. Generate action
with torch.no_grad():
    outputs = model(**inputs)

# Decode raw outputs into actionable control commands
action_chunk = processor.decode_action(outputs.actions, robot_type=robot_type)
print(f"Generated Action Chunk Shape: {action_chunk.shape}")
```

If you find this project useful, please consider citing:
```bibtex
@article{cai2026xiaomi,
  title={Xiaomi-Robotics-0: An Open-Sourced Vision-Language-Action Model with Real-Time Execution},
  author={Cai, Rui and Guo, Jun and He, Xinze and Jin, Piaopiao and Li, Jie and Lin, Bingxuan and Liu, Futeng and Liu, Wei and Ma, Fei and Ma, Kun and Qiu, Feng and Qu, Heng and Su, Yifei and Sun, Qiao and Wang, Dong and Wang, Donghao and Wang, Yunhong and Wu, Rujie and Xiang, Diyun and Yang, Yu and Ye, Hangjun and Zhang, Yuan and Zhou, Quanyun},
  journal={arXiv preprint arXiv:2602.12684},
  year={2026}
}
```

This project is licensed under the Apache License 2.0.