
DSPv2: Improved Dense Policy for Effective and Generalizable Whole-body Mobile Manipulation

Authors: Yue Su, Chubin Zhang, Sijin Chen, Liufan Tan, Yansong Tang, Jianan Wang, Xihui Liu†


🛫 Getting Started

We provide both DINOv3 (the default) and DINOv2 as the 2D encoder. Refer to policy and v_model to inspect and change the setup.

The v1 version of DSP is Dense Policy. Its Dense Head can be reused directly for action generation.

⚡️ Quick Follow

Refer to policy to follow the DSPv2 pipeline.

💻 Installation

Please follow the installation guide to install the dspv2 conda environment and its dependencies. Also remember to adjust the constant parameters in dataset/constants.py to match your own environment.

🛢️ Data

Our original datasets are collected and stored as HDF5 files. Each demo (trajectory) is structured as follows:

├─ Group: /images_dict
  ├─ Group: /images_dict/head
    └─ Dataset: depth (Shape: (174, 720, 1280), Dtype: uint16)
    └─ Dataset: rgb (Shape: (174, 720, 1280, 3), Dtype: uint8)
    └─ ...
  ├─ Group: /images_dict/left
    └─ ...
  ├─ Group: /images_dict/right
    └─ ...
  ├─ Group: /images_dict/torso
    └─ ...
├─ Group: /joints_dict
  └─ Dataset: joints_position_state (Shape: (174, 25), Dtype: float64)
  └─ ...
├─ Group: /poses_dict
  └─ Dataset: astribot_arm_left (Shape: (174, 7), Dtype: float64)
  └─ Dataset: astribot_arm_right (Shape: (174, 7), Dtype: float64)
  └─ ...
  └─ Dataset: merge_pose (Shape: (174, 37), Dtype: float64)
├─ ...

We use the multi-view RGB images and the head-camera depth. poses_dict/merge_pose is used to organize states and actions; it is a concatenation of [chassis pose, torso pose, left arm pose, left gripper, right arm pose, right gripper, head pose]. All of these poses are relative to the chassis. The chassis's movement is expressed in the world frame and saved as the first 3 dimensions of joints_dict/joints_position_state. You can ignore the other data in the HDF5 files.
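As a sanity check, the 37-D merge_pose can be split into its named components. The slice layout below is an assumption (7-D position-plus-quaternion poses and scalar grippers, consistent with the (174, 7) arm datasets above); verify it against the repo before relying on it.

```python
import numpy as np

# Assumed layout of the 37-D merge_pose vector: each pose is 7-D
# (position + quaternion) and each gripper is a scalar. 5*7 + 2 = 37.
SLICES = {
    "chassis": slice(0, 7),
    "torso": slice(7, 14),
    "left_arm": slice(14, 21),
    "left_gripper": slice(21, 22),
    "right_arm": slice(22, 29),
    "right_gripper": slice(29, 30),
    "head": slice(30, 37),
}

def split_merge_pose(merge_pose: np.ndarray) -> dict:
    """Split a (T, 37) merge_pose array into named component arrays."""
    assert merge_pose.shape[-1] == 37
    return {name: merge_pose[..., s] for name, s in SLICES.items()}

demo = np.zeros((174, 37))  # dummy trajectory with T = 174 steps
parts = split_merge_pose(demo)
print(parts["left_arm"].shape)  # (174, 7)
```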

As for point cloud projection, both sampling (used in conventional methods) and voxelization (used in DSPv2) are provided in dataset/preprocess_data.py. It also provides a function for computing the delta of the chassis movement. Processing data with dataset/preprocess_data.py is essential for accelerating training.
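For intuition, a minimal voxel-downsampling sketch is shown below. It keeps one representative point per occupied voxel, which is the common scheme; the actual implementation in dataset/preprocess_data.py may differ in details.

```python
import numpy as np

def voxel_downsample(points: np.ndarray, voxel_size: float) -> np.ndarray:
    """Keep one point per occupied voxel cell (a generic sketch, not the
    repo's exact implementation)."""
    # Map each point to its integer voxel coordinate.
    coords = np.floor(points / voxel_size).astype(np.int64)
    # Select the first point encountered in each unique voxel.
    _, idx = np.unique(coords, axis=0, return_index=True)
    return points[np.sort(idx)]

cloud = np.random.rand(10000, 3)           # dummy cloud in a unit cube
down = voxel_downsample(cloud, voxel_size=0.05)
print(down.shape)  # at most (10000, 3), typically far fewer points
```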

We provide dummy HDF5 data here; you can inspect its structure with utils/hdf5_view.py and process it with dataset/preprocess_data.py. Note: since it is dummy data, some point clouds may end up with no points.

🧑🏻‍💻 Training

Before training, we recommend computing the 5%-95% min-max values of each task for normalization. For each task, follow utils/minmax.py and save the values in dataset/pose.json under your_task_name. Add --task your_task_name to train.sh, then start training:

conda activate dspv2
bash train.sh
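The 5%-95% min-max normalization described above can be sketched as follows. This is a generic percentile-based scheme under stated assumptions; utils/minmax.py is the authoritative implementation.

```python
import numpy as np

def robust_minmax(actions: np.ndarray, lo_pct=5.0, hi_pct=95.0):
    """Per-dimension 5% / 95% bounds over a task's trajectories
    (a sketch of the statistic computed by utils/minmax.py)."""
    lo = np.percentile(actions, lo_pct, axis=0)
    hi = np.percentile(actions, hi_pct, axis=0)
    return lo, hi

def normalize(actions, lo, hi, eps=1e-8):
    # Map the 5%-95% range to [-1, 1]; outliers fall slightly outside.
    return 2.0 * (actions - lo) / (hi - lo + eps) - 1.0

acts = np.random.randn(1000, 37)  # dummy stacked merge_pose actions
lo, hi = robust_minmax(acts)
norm = normalize(acts, lo, hi)
```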

🤖 Evaluation

conda activate dspv2
python eval.py

✍️ Citation

@article{dspv2,
    title={DSPv2: Improved Dense Policy for Effective and Generalizable Whole-body Mobile Manipulation}, 
    author={Yue Su and Chubin Zhang and Sijin Chen and Liufan Tan and Yansong Tang and Jianan Wang and Xihui Liu},
    journal={arXiv preprint arXiv:2509.16063},
    year={2025}
}

📃 License

DSPv2 is licensed under CC BY-NC-SA 4.0
