BFM-Zero: A Promptable Behavioral Foundation Model for Humanoid Control Using Unsupervised Reinforcement Learning

BFM-Zero Deployment Stack

Deployment stack for BFM-Zero on the Unitree G1 (Jetson Orin)

This repository provides a complete deployment solution for running BFM-Zero policies on the Unitree G1 robot. It includes everything you need to test in simulation and deploy to the real robot.

📦 What's Included

ONNX Policy Runner (rl_policy/bfm_zero.py, exposing BFMZeroPolicy)
Robot & Policy Configuration files under config/
Experiment Configs for tracking, reward inference, and goal reaching (config/exp/)

🚀 Quick Start

Environment Setup

Create and activate the runtime environment (Python 3.10 recommended):
```
conda create -n bfm0real python=3.10 -y
conda activate bfm0real
```

Install Python dependencies:

cd BFM-Zero
pip install -r requirements.txt

Downloading the BFM-Zero ONNX Model

Run the download script:

python download_hf_model.py --token <YOUR_HF_TOKEN>

You can either pass your Hugging Face token as a flag (as above), or set it via the HF_TOKEN environment variable:

export HF_TOKEN=your_token_here
python download_hf_model.py
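
How the flag and the environment variable interact is not spelled out here. The following is a minimal sketch of one common arrangement, assuming an explicit `--token` flag wins over `HF_TOKEN`. `resolve_token` is a hypothetical helper for illustration, not the code in download_hf_model.py.

```python
import argparse
import os


def resolve_token(argv, env=None):
    """Return the token from --token if given, else from HF_TOKEN, else None."""
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser(description="Token lookup sketch")
    parser.add_argument("--token", default=None, help="Hugging Face access token")
    args = parser.parse_args(argv)
    # Assumption: an explicit flag takes priority over the environment variable.
    return args.token or env.get("HF_TOKEN")


# The flag is used when both are present; the environment is the fallback.
print(resolve_token(["--token", "from_flag"], {"HF_TOKEN": "from_env"}))
print(resolve_token([], {"HF_TOKEN": "from_env"}))
```

Setting the variable once in your shell profile keeps the token out of your shell history, which the flag form does not.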

After downloading the model, your directory structure should look like this:

model/
├── exported/
│   └── *.onnx                  # ONNX policy model
├── tracking_inference/
│   └── *.pkl                   # Latent variables for tracking tasks
├── reward_inference/
│   └── *.pkl                   # Latent variables for reward inference tasks
└── goal_inference/
    └── *.pkl                   # Latent variables for goal reaching tasks

Directory Structure:

exported/model.onnx - The main ONNX policy model file
tracking_inference/*.pkl - Pre-computed latent variables for motion tracking (shape: [seq_length, 256])
reward_inference/*.pkl - Pre-computed latent variables for reward inference tasks
goal_inference/*.pkl - Pre-computed latent variables for goal reaching tasks
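
Before running a policy, it can help to confirm that a latent file has the expected `[seq_length, 256]` shape. The sketch below assumes the `.pkl` holds either an array-like object with a `.shape` attribute or a nested list; the real files may wrap the latents in another structure. It writes a small synthetic file so it runs without the downloaded model. Only unpickle files you trust, since `pickle.load` can execute arbitrary code.

```python
import pickle
import tempfile
from pathlib import Path

LATENT_DIM = 256  # from the shape stated above: [seq_length, 256]


def load_latents(path):
    """Load a pickled latent sequence from disk."""
    with open(path, "rb") as f:
        return pickle.load(f)


def latent_shape(obj):
    """Return (seq_length, dim) for an array-like object or a nested list."""
    if hasattr(obj, "shape"):
        return tuple(obj.shape)
    return (len(obj), len(obj[0]))


# Demo on a synthetic file (hypothetical name) with 5 frames of zeros.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "zs_demo.pkl"
    with open(demo_path, "wb") as f:
        pickle.dump([[0.0] * LATENT_DIM for _ in range(5)], f)
    demo_shape = latent_shape(load_latents(demo_path))

print(demo_shape)
```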

🧪 Sim-to-Sim Test Workflow

⚠️ Highly Recommended Before Real Robot Deployment
This step is not required on the Jetson if you only plan to run on the real robot, but we highly recommend validating every motion in simulation first.

Step 1: Launch the Simulation (MuJoCo)

Standard (Linux/Windows):

python -m sim_env.base_sim \
  --robot_config=./config/robot/g1.yaml \
  --scene_config=./config/scene/g1_29dof.yaml

macOS (with MuJoCo's mjpython):

mjpython -m sim_env.base_sim \
  --robot_config=./config/robot/g1.yaml \
  --scene_config=./config/scene/g1_29dof.yaml

A MuJoCo window with the simulated environment should now open.

🎮 Interactive Simulation Commands

7 - Lift the robostack
8 - Lower down the robostack
9 - Release the robostack

Step 2: Start the Policy Process

In a separate terminal, activate the environment and navigate to the project:

conda activate bfm0real
cd BFM-Zero

Run the policy:

python rl_policy/bfm_zero.py \
  --robot_config config/robot/g1.yaml \
  --policy_config ${POLICY_CONFIG} \
  --model_path ${MODEL_ONNX_PATH} \
  --task ${TASK}

We provide three types of tasks: tracking, reward inference, and goal reaching.

⌨️ Interactive Terminal Commands

| Key | Action | Task Type |
|-----|--------|-----------|
| `]` | Start policy action (stops at `stop` frame) | All |
| `[` | Start tracking | Tracking |
| `p` | Stop at the `stop` frame | Tracking |
| `n` | Switch to next reward/goal | Reward/Goal |
| `i` | Reset robot to initial position | All |
| `o` | Emergency stop (release all control) | All |

📋 Task-Specific Instructions

🎯 Tracking

./rl_policy/tracking.sh

Configuration: config/exp/tracking/<your-motion>.yaml

| Parameter | Description | Example |
|-----------|-------------|---------|
| `ctx_path` | `.pkl` file containing `seq_length` latent variables. Shape: `[seq_length, 256]` | `../tracking_inference/zs_walking.pkl` |
| `start` | Start frame index in the context sequence | `0` |
| `end` | End frame index (set to `None` for continuous tracking) | `2000` or `None` |
| `gamma` | Discount factor for context averaging. Controls weight of recent vs. older frames | `0.8` |
| `window_size` | Number of frames to average over. Larger = smoother but more delayed | `3` |

💡 Tip: If your latent z values were already generated with a discounted window, set window_size to 1 so they are not averaged a second time.
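
To make the roles of `gamma` and `window_size` concrete, here is a minimal sketch of a discounted window average. It rests on assumptions the text above does not confirm: the window covers the current frame plus the preceding `window_size - 1` frames, a frame `k` steps old gets weight `gamma**k`, and the weights are normalized to sum to 1. The function name is hypothetical, and the averaging in the actual policy code may differ.

```python
def discounted_window_average(latents, t, gamma=0.8, window_size=3):
    """Average the latents ending at frame t with geometrically decaying weights."""
    start = max(0, t - window_size + 1)
    frames = latents[start:t + 1]
    # Assumption: the newest frame has weight 1, a frame k steps older has gamma**k.
    weights = [gamma ** (len(frames) - 1 - i) for i in range(len(frames))]
    total = sum(weights)
    dim = len(frames[0])
    return [
        sum(w * frame[j] for w, frame in zip(weights, frames)) / total
        for j in range(dim)
    ]


# Toy 1-D latents so the effect is easy to read.
toy = [[0.0], [1.0], [4.0]]
print(discounted_window_average(toy, t=2, gamma=0.5, window_size=1))
print(discounted_window_average(toy, t=1, gamma=0.5, window_size=2))
```

With `window_size=1` the function returns the frame unchanged, which is the behavior the tip above relies on.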

🎁 Reward Inference

./rl_policy/reward.sh

Configuration: config/exp/reward/<your-rewards>.yaml

| Parameter | Description |
|-----------|-------------|
| `ctx_path` | Path to the reward inference `.pkl` file containing reward-specific latent variables |
| `selected_rewards_filter_z` | List of dictionaries specifying which rewards and z indices to use |

Example Configuration:

selected_rewards_filter_z:
  - reward: "raisearms-m-l"
    z_ids: [3]
  - reward: "sitonground"
    z_ids: [3, 4]

Each entry selects specific z latent variables (different subsampled buffers are used to infer different zs) for a given reward.
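
The selection described above can be sketched as a lookup into the loaded context. This assumes the context is a dictionary mapping each reward name to an indexable collection of latents, which the text does not confirm. `select_reward_latents` is a hypothetical helper and the toy latents are placeholders.

```python
def select_reward_latents(ctx, selection):
    """Return (reward, z_id, latent) tuples for every requested reward and index."""
    picked = []
    for entry in selection:
        latents = ctx[entry["reward"]]  # raises KeyError for an unknown reward
        for z_id in entry["z_ids"]:
            picked.append((entry["reward"], z_id, latents[z_id]))
    return picked


# Toy context: five placeholder latents per reward, labelled by index.
toy_ctx = {
    "raisearms-m-l": [f"z{i}" for i in range(5)],
    "sitonground": [f"z{i}" for i in range(5)],
}
selection = [
    {"reward": "raisearms-m-l", "z_ids": [3]},
    {"reward": "sitonground", "z_ids": [3, 4]},
]
for item in select_reward_latents(toy_ctx, selection):
    print(item)
```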

🎯 Goal Reaching

./rl_policy/goal.sh

Configuration: config/exp/goal/<your-goal-states>.yaml

| Parameter | Description |
|-----------|-------------|
| `ctx_path` | Path to the goal inference `.pkl` file containing goal-specific latent variables |
| `selected_goals` | List of goal names to select from the context dictionary |

Example Configuration:

selected_goals: [
  'walk3_subject3_9044',    # on your knees
  'fight1_subject3_5559',   # hands on hip
  'dance1_subject3_4024',   # raise arm high
]
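
As an illustration of how a list like this could drive the `n` key, here is a small sketch that steps through the selected goals. It assumes the context dictionary maps goal names to latents and that the selection wraps around after the last goal. Neither detail is confirmed above, and `GoalCycler` is a hypothetical class.

```python
class GoalCycler:
    """Step through selected goals in order, wrapping at the end."""

    def __init__(self, ctx, selected_goals):
        missing = [g for g in selected_goals if g not in ctx]
        if missing:
            raise KeyError(f"goals not found in context: {missing}")
        self._ctx = ctx
        self._names = list(selected_goals)
        self._index = 0

    def current(self):
        name = self._names[self._index]
        return name, self._ctx[name]

    def next(self):
        # Assumption: pressing `n` after the last goal returns to the first.
        self._index = (self._index + 1) % len(self._names)
        return self.current()


# Placeholder latents keyed by the goal names from the example configuration.
toy_goals = {
    "walk3_subject3_9044": "z_knees",
    "fight1_subject3_5559": "z_hips",
    "dance1_subject3_4024": "z_arm",
}
cycler = GoalCycler(toy_goals, list(toy_goals))
print(cycler.current())
```

Validating the names at construction time surfaces a typo in the YAML before the policy starts, instead of when the goal is first selected.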

🤖 On-Robot Deployment (Jetson Orin, Unitree G1)

‼️Alert & Disclaimer

Deploying these models on physical hardware can be hazardous, and these models can ONLY BE DEPLOYED ON G1 Type 10, 11, 12, 15, or 16. Unless you have deep sim‑to‑real expertise and robust safety protocols, we strongly advise against running the model on real robots. These models are supplied for research use only, and we disclaim all responsibility for any harm, loss, or malfunction arising from their deployment.

Required Setup

Target platform: the onboard Jetson Orin of the Unitree G1 (ssh into the robot and copy this codebase onto it).
Install the Unitree C++ SDK Python binding from https://github.com/EGalahad/unitree_sdk2 to enable 50 Hz control. Update the import path in rl_policy/base_policy.py after building the binding.

Build the SDK with CMake

git clone https://github.com/EGalahad/unitree_sdk2.git
sudo apt-get update
sudo apt-get install build-essential cmake python3-dev python3-pip
pip3 install pybind11 pybind11-stubgen numpy
cd ./unitree_sdk2
mkdir build
cd build
# Run `python3 -m pybind11 --cmakedir` to find <your-pybind11-path>
cmake .. -DCMAKE_BUILD_TYPE=Release -Dpybind11_DIR=<your-pybind11-path>
make -j$(nproc)

git clone https://github.com/eclipse-cyclonedds/cyclonedds -b releases/0.10.x
cd cyclonedds && mkdir build install && cd build
cmake .. -DCMAKE_INSTALL_PREFIX=../install
cmake --build . --target install
# If cyclonedds was cloned somewhere other than your home directory, change this path accordingly
echo "export CYCLONEDDS_HOME=/home/<username>/cyclonedds/install" >> ~/.bashrc
source ~/.bashrc

Running on the Real Robot

In ./rl_policy/bfm_zero.py, replace "/path/to/your/unitree_sdk2/build/lib" with the actual path to your Unitree SDK2 installation (e.g., sys.path.append("/home/unitree/User/unitree_sdk2/build/lib")).
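
If you would rather not hard-code the path, one option is to read it from an environment variable. This is a minimal sketch, not what the repository does: the variable name `UNITREE_SDK2_LIB` and the helper `add_sdk_to_path` are hypothetical.

```python
import os
import sys


def add_sdk_to_path(env=None, fallback="/path/to/your/unitree_sdk2/build/lib"):
    """Append the SDK build directory to sys.path and return the directory used."""
    env = os.environ if env is None else env
    # Hypothetical variable name; pick whatever fits your setup.
    lib_dir = env.get("UNITREE_SDK2_LIB", fallback)
    if lib_dir not in sys.path:
        sys.path.append(lib_dir)
    return lib_dir


used = add_sdk_to_path({"UNITREE_SDK2_LIB": "/home/unitree/User/unitree_sdk2/build/lib"})
print(used)
```

The same script can then run on machines with different SDK locations by exporting the variable before launch.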

When deploying to the real robot, use the real-robot configuration file g1_real.yaml instead of the simulation configuration g1.yaml.

python rl_policy/bfm_zero.py \
  --robot_config config/robot/g1_real.yaml \
  --policy_config ${POLICY_CONFIG} \
  --model_path ${MODEL_ONNX_PATH} \
  --task ${TASK}

Known Issues

If you encounter an error stating that eth0 is not a valid network interface, update the interface name in the file './config/robot/g1_real.yaml' to match your robot’s actual network interface (e.g., eth1).
If you encounter severe jittering or unstable behavior on the real robot, make sure that inference has the full GPU and that there are no network issues, so that inference latency stays around 4–5 ms. Also confirm that your robot is type 10, 11, 12, 15, or 16 with the N7520-22.5 hip pitch motor.
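
To check the latency figure above on your own hardware, you can time the inference call directly. The sketch below times an arbitrary callable with `time.perf_counter`; the stand-in workload is a placeholder for your real inference step, and `measure_latency_ms` is a hypothetical helper. At 50 Hz each control step has a 20 ms budget, so a 4–5 ms inference leaves headroom for communication and other processing.

```python
import statistics
import time

CONTROL_PERIOD_MS = 1000.0 / 50  # 50 Hz control loop gives 20 ms per step


def measure_latency_ms(step, warmup=5, repeats=50):
    """Time `step` and return the median and maximum latency in milliseconds."""
    for _ in range(warmup):  # warm-up runs are excluded from the statistics
        step()
    samples = []
    for _ in range(repeats):
        t0 = time.perf_counter()
        step()
        samples.append((time.perf_counter() - t0) * 1000.0)
    return {"median_ms": statistics.median(samples), "max_ms": max(samples)}


# Placeholder workload; replace the lambda with your actual inference call.
stats = measure_latency_ms(lambda: sum(range(1000)))
print(stats, "budget_ms:", CONTROL_PERIOD_MS)
```

The maximum is worth watching alongside the median, since occasional slow steps can cause jitter even when the typical latency looks fine.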

👥 Citation

If you find this project useful in your research, please consider citing:

@misc{li2025bfmzeropromptablebehavioralfoundation,
      title={BFM-Zero: A Promptable Behavioral Foundation Model for Humanoid Control Using Unsupervised Reinforcement Learning}, 
      author={Yitang Li and Zhengyi Luo and Tonghe Zhang and Cunxi Dai and Anssi Kanervisto and Andrea Tirinzoni and Haoyang Weng and Kris Kitani and Mateusz Guzek and Ahmed Touati and Alessandro Lazaric and Matteo Pirotta and Guanya Shi},
      year={2025},
      eprint={2511.04131},
      archivePrefix={arXiv},
      primaryClass={cs.RO},
      url={https://arxiv.org/abs/2511.04131}, 
}

This sim-to-real repo is built upon:

@misc{weng2025hdmilearninginteractivehumanoid,
      title={HDMI: Learning Interactive Humanoid Whole-Body Control from Human Videos}, 
      author={Haoyang Weng and Yitang Li and Nikhil Sobanbabu and Zihan Wang and Zhengyi Luo and Tairan He and Deva Ramanan and Guanya Shi},
      year={2025},
      eprint={2509.16757},
      archivePrefix={arXiv},
      primaryClass={cs.RO},
      url={https://arxiv.org/abs/2509.16757}, 
}

@article{he2025asap,
  title={ASAP: Aligning Simulation and Real-World Physics for Learning Agile Humanoid Whole-Body Skills},
  author={He, Tairan and Gao, Jiawei and Xiao, Wenli and Zhang, Yuanhang and Wang, Zi and Wang, Jiashun and Luo, Zhengyi and He, Guanqi and Sobanbabu, Nikhil and Pan, Chaoyi and Yi, Zeji and Qu, Guannan and Kitani, Kris and Hodgins, Jessica and Fan, Linxi "Jim" and Zhu, Yuke and Liu, Changliu and Shi, Guanya},
  journal={arXiv preprint arXiv:2502.01143},
  year={2025}
}

License

BFM-Zero is licensed under the CC BY-NC 4.0 license. See LICENSE for details.
