Jingxiao Chen*
·
Xinyao Li*
·
Jiahang Cao*
·
Zhengbang Zhu
·
Wentao Dong
·
Minghuan Liu
·
Ying Wen
·
Yong Yu
·
Liqing Zhang
·
Weinan Zhang
- [2025/03] We have released RGB videos of the human interaction data! Please check here.
- [2025/02] Original release!
Here is an overview of the environments to be installed and their respective purposes:
- deploy_control:
  - Usage: Inference for control nodes.
  - Setup: Clone ros_env, install HaMeR, ZED Python API, and other dependencies.
- planner_motion:
  - Usage: Inference and training for motion planning.
  - Setup: Clone ros_env, install PyTorch and other dependencies.
- tv:
  - Usage: Data collection, inference, and training for the TeleVision module.
  - Setup: Clone ros_env, install dependencies from OpenTeleVision.
- Docker for Robot Hardware:
  - Usage: Inference for robot hardware control.
  - Setup: Install Docker, pull the ROS Docker image, build and run the container, compile and build the SDK inside the container.
Each environment is tailored to specific parts of the RHINO system, ensuring modularity and ease of management.
Create a conda environment ros_env following the RoboStack guide.
This will serve as the base conda environment for all the environments below.
We use Robot Operating System (ROS) to facilitate communication across the various modules of the robot. To improve compatibility with conda environments, we use a virtual environment based on the RoboStack framework. The system is deployed on Ubuntu 22.04 with ROS 2 Humble Hawksbill.
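For reference, a minimal sketch of the RoboStack setup for ROS 2 Humble is shown below; channel and package names may change over time, so the RoboStack guide itself is authoritative:
# Create the base environment and configure the RoboStack channels
conda create -n ros_env
conda activate ros_env
conda config --env --add channels conda-forge
conda config --env --add channels robostack-staging
conda config --env --remove channels defaults   # skip if 'defaults' is not configured
# Install ROS 2 Humble and the basic build tooling
conda install ros-humble-desktop
conda install compilers cmake pkg-config make ninja colcon-common-extensions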
- inference: deploy_control/ros_nodes/zed_react_node.py
- Clone ROS env.
conda create -n deploy_control --clone ros_env
conda activate deploy_control
- Install HaMeR.
We recommend using our fork of HaMeR; download the trained models and install the dependencies according to its README.
Note that HaMeR needs to be cloned directly under the RHINO directory.
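As a rough sketch only (the fork's README takes precedence; the upstream URL below is a stand-in for our fork), the installation typically looks like:
# From the root of the RHINO repository, clone HaMeR (substitute the RHINO fork URL)
git clone --recursive https://github.com/geopavlakos/hamer.git
cd hamer
# Install HaMeR and its bundled ViTPose dependency
pip install -e .[all]
pip install -v -e third-party/ViTPose
# Fetch the trained models as described in the README
bash fetch_demo_data.sh
cd ..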
- Install ZED Python API following the official guide.
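For reference, a minimal sketch assuming a default Linux install of the ZED SDK (the official guide is authoritative):
# Install the pyzed wheel with the script shipped in the ZED SDK
python /usr/local/zed/get_python_api.py
# Sanity-check the installation
python -c "import pyzed.sl as sl; print('pyzed OK')"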
- Install other dependencies.
pip install ultralytics
pip install scipy==1.10.1
pip install webdataset
pip install h5py
- If you encounter numpy issues such as TypeError: expected np.ndarray (got numpy.ndarray), try reinstalling numpy with pip:
pip uninstall numpy
pip install numpy==1.23.0
- train: planner_motion/tools/train.py, planner_motion/tools/train_classifier.py
- inference: planner_motion/tools/react_node.py
- Clone ROS env.
conda create -n planner_motion --clone ros_env
conda activate planner_motion
- Install dependencies.
cd planner_motion
conda install pytorch==1.13.1 torchvision==0.14.1 torchaudio==0.13.1 pytorch-cuda=11.7 -c pytorch -c nvidia
pip install -r requirements.txt
- process_data: post_process_teleop_data.py
- train: TeleVision/act/imitate_episodes.py
- inference: TeleVision/scripts/manip_node.py, deploy_control/ros_nodes/arm_safe_node.py
- Clone ROS env.
conda create -n tv --clone ros_env
conda activate tv
- Install dependencies following OpenTeleVision.
cd TeleVision
pip install -r requirements.txt
cd act/detr && pip install -e .
- Install ZED Python API following the official guide.
To ensure a stable and reproducible compilation environment, we package the robot's code in a Docker container.
- inference: deploy_control/ros_nodes/control_node.py, deploy_control/ros_nodes/safe_node.py
- Install Docker.
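If Docker is not installed yet, one common option on Ubuntu is the official convenience script (see the Docker documentation for the recommended method on your distribution):
# Install Docker via the official convenience script
curl -fsSL https://get.docker.com | sh
# Optionally run docker without sudo (takes effect after re-login)
sudo usermod -aG docker $USER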
- Pull the Docker image with the ROS environment.
docker pull osrf/ros2
- Build and run the Docker container, mapping serial devices and device paths. The following command starts a bash shell inside the container with the ROS environment:
docker run --name rhino -it \
--device=/dev/ttyUSB0:/dev/ttyUSB0 \
--device=/dev/ttyUSB1:/dev/ttyUSB1 \
--device=/dev/ttyACM0:/dev/ttyACM0 \
-v /host/path:/container/path \
osrf/ros2 \
/bin/bash
- Compile and build the SDK and related packages inside the container by following the latest instructions provided by the hardware vendor. The official SDK for the robot is available at Unitree SDK.
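As a rough sketch only (the repository, branch, and build flags depend on your robot model and SDK version, so follow the vendor's latest instructions), a CMake-based build inside the container could look like:
# Inside the running container (as root), clone and build the SDK with CMake
git clone https://github.com/unitreerobotics/unitree_sdk2.git
cd unitree_sdk2
mkdir build && cd build
cmake ..
make -j4
make install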
We provide a HuggingFace dataset repository at https://huggingface.co/datasets/TimerChen/RHINO; its README describes the dataset.
Download the dataset as a sub-directory rhino_data of this repository:
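For example, assuming the huggingface_hub CLI is available (a Git LFS clone of the dataset repository works as well):
# Download the dataset into rhino_data/ at the repository root
pip install -U huggingface_hub
huggingface-cli download TimerChen/RHINO --repo-type dataset --local-dir rhino_data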
- Decompress the archived small files in the dataset:
cd rhino_data/
tar -xzvf data/motion_data/motion_data.tar.gz
- Create symlinks for assets and checkpoints:
ln -s rhino_data/motions_processed planner_motion/motions_processed
ln -s rhino_data/h1_assets deploy_control/h1_assets
ln -s rhino_data/yolo_ckpts deploy_control/zed_module/yolo_ckpts
We recommend a two-step pipeline to process the data for the manipulation module:
- Step 1: Generate Viewable Videos (--only-video). Convert raw data captured by the ZED camera into playable videos (.mp4 files). This allows you to:
  - Verify recording quality.
  - Perform manual annotation (e.g., labeling key frames or actions).
- Step 2: Compile Training Data (--with-label). After manual annotation, run this step to consolidate all training-ready data (including labels) into .hdf5 files. Some hyper-parameters are set in process_config.yaml to make the labels more memorable and reasonable for specific subsets.
cd deploy_control
python scripts/post_process_teleop_data.py --root rhino_data/data/manipulation_data/ --task pick_can --only-video --overwrite
python scripts/post_process_teleop_data.py --root rhino_data/data/manipulation_data/ --task pick_can --with-label --overwrite --config rhino_data/data/manipulation_data/process_config.yaml
cd planner_motion
python train.py --tconf configs/train_30_10.yaml --epiname EXPERIMENT_NAME
cd planner_motion
python train_classifier.py --dtype EXPERIMENT_NAME --mconf "configs/model_train/classifier_his30_rh0.yaml" --hd_input "hand_pos" "hand_near" "head_pos" --better-state --hand-iou-mean-pool --obj-mask-ratio 0.2
Since datasets and code are typically stored in separate directories, and processed datasets are often very large, we recommend using soft (symbolic) links to import the required data into the training directory. A script is provided for this purpose, which can be used as follows:
cd TeleVision
python scripts/load_data_soft.py \
--src_dir rhino_data/data/manipulation_data/ \
--dst_dir data/recordings/ \
--config rhino_data/data/manipulation_data/train_config.yaml \
--overwrite
Then start the training using the integrated training script:
cd TeleVision
python train.py --exptid 01-task-name-date-lr --config rhino_data/data/manipulation_data/train_config.yaml
Logs and checkpoints are saved to a sub-directory of data/logs named after the skill id (taskid).
We suggest that exptid start with the prefix of the skill name (see train_config.yaml).
For finer control of the training hyper-parameters, you can also invoke the original training command directly; here is an example:
cd TeleVision/act
python imitate_episodes.py --policy_class ACT --kl_weight 10 \
--chunk_size 30 --hidden_dim 512 --batch_size 45 --dim_feedforward 3200 --num_epochs 50001 \
--lr 5e-5 --seed 0 --taskid 01 --exptid 01-task-name-date-lr --backbone dino_v2 \
--state_name "head" "arm" "wrist" "hand" --left_right_mask "11" --progress_bar 25
Modify taskid and exptid before training new skills.
e.g., --taskid 01 --exptid 01-task-name-date-lr. left_right_mask indicates which of the two arms the task uses: 10 for the left arm only, 01 for the right arm only, and 11 for both.
In terminal 1: start the robot arm, wrist and head control.
cd deploy_control
python ros_nodes/control_node.py --manip
In terminal 2: start the dexterous hand control.
cd deploy_control/h1_inspire_service
./inspire_hand -s SERIAL_OF_INSPIRE_HAND
where the serial can be found by ls /dev/tty*
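If several serial devices are present, listing them by id (assuming udev populates /dev/serial on your system) can help identify the hand's port:
# List serial devices with stable identifiers, then note the matching /dev/tty* path
ls -l /dev/serial/by-id/
# Alternatively, watch the kernel log while plugging in the hand
dmesg | tail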
In terminal 3: run the ZED sensing module.
cd deploy_control
python ros_nodes/zed_react_node.py --use_camera --vis
In terminal 4: run the motion generation module.
cd deploy_control
python ros_nodes/react_node.py
In terminal 5: run the manipulation module.
cd deploy_control
python ros_nodes/manip_node.py --react --skill-cancel
Add the --skill-cancel argument to enable cross-task interruption.
In terminal 6: run the safe module
cd deploy_control
python ros_nodes/arm_safe_node.py
This code builds upon the following open-source code bases. Please visit the URLs to see the respective LICENSES:
- https://github.com/tonyzhaozh/act
- https://github.com/facebookresearch/detr
- https://github.com/dexsuite/dex-retargeting
- https://github.com/vuer-ai/vuer
- https://github.com/OpenTeleVision/TeleVision
- https://github.com/geopavlakos/hamer
- https://github.com/tr3e/InterGen
- https://github.com/stephane-caron/pink
@article{rhino2025chen,
author = {Chen*, Jingxiao and Li*, Xinyao and Cao*, Jiahang and Zhu, Zhengbang and Dong, Wentao and Liu, Minghuan and Wen, Ying and Yu, Yong and Zhang, Liqing and Zhang, Weinan},
title = {RHINO: Learning Real-Time Humanoid-Human-Object Interaction from Human Demonstrations},
journal = {arXiv preprint arXiv:2502.13134},
year = {2025},
}