Zifan Zhao¹, Siddhant Haldar², Jinda Cui³, Lerrel Pinto², *Raunaq Bhirangi²
¹New York University Shanghai, ²New York University, ³Honda Research
*Corresponding author: [email protected]
This repository provides code for "Touch begins where vision ends: Generalizable policies for contact-rich manipulation". It supports JAX-based behaviour cloning and residual RL modules, and includes semantic augmentation pipelines and a VLM-guided reaching extension.
- Create a Conda environment:
  ```shell
  conda create -n vital python=3.10
  ```
- Install dependencies:
  ```shell
  pip install -r requirements.txt
  ```
- Set up pre-commit hooks:
  ```shell
  pre-commit install
  ```
- Configure the project root: create `cfgs/local_config.yaml` with:
  ```yaml
  root_dir: /path/to/vital/
  ```
- Update experiment parameters in `cfgs/*.yaml`, notably `suite/xarm.yaml` for dataset paths.
- BC Training:
  ```shell
  python train.py
  ```
- BC Evaluation:
  ```shell
  python eval.py model_path=/path/to/checkpoint
  ```
- RL Training:
  ```shell
  python train.py expt=bcrl_sac offset_action_scale=2.0
  ```
- RL Evaluation:
  ```shell
  python eval.py expt=bcrl_sac offset_action_scale=2.0 model_path=/path/to/checkpoint
  ```
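The `offset_action_scale` flag bounds the residual that the RL policy adds on top of the base BC action. A minimal sketch of that composition, assuming a tanh-squashed offset (the function and variable names here are illustrative, not the repo's API):

```python
import math

def compose_action(base_action, residual, offset_action_scale=2.0):
    """Add a bounded residual offset to the base (BC) action.

    tanh squashes each residual component, so the added offset never
    exceeds +/- offset_action_scale in magnitude.
    """
    return [b + offset_action_scale * math.tanh(r)
            for b, r in zip(base_action, residual)]

# A zero residual leaves the BC action unchanged; a very large residual
# saturates at +/- offset_action_scale.
print(compose_action([0.1, -0.2], [0.0, 0.0]))
print(compose_action([0.0, 0.0], [100.0, -100.0]))
```

Bounding the offset this way keeps the learned residual from overriding the behaviour-cloned base policy early in training.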
Preprocessing scripts handle domain randomization and augmentation using DIFT, SAM2, and XMem. For each module, create the corresponding environment from its original repository.
```shell
cd dift
python process_xarm_pipeline.py
```
Prepare anchor data for each task:
```
anchor_data/{task_name}/base/
├── gripper_mask.png
├── object_mask.png
├── target_mask.png
└── dift_feature_map.pt
```
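Before running the pipeline, the anchor directory for each task can be sanity-checked against the layout above. A small sketch (the helper name is ours, not the repo's):

```python
from pathlib import Path

# The four files expected per task under anchor_data/{task_name}/base/
# (see the directory tree above).
REQUIRED = ["gripper_mask.png", "object_mask.png",
            "target_mask.png", "dift_feature_map.pt"]

def missing_anchor_files(task_name, root="anchor_data"):
    """Return the required anchor files absent from root/{task_name}/base/."""
    base = Path(root) / task_name / "base"
    return [name for name in REQUIRED if not (base / name).exists()]
```

An empty `base/` directory reports all four files as missing; a fully prepared task returns an empty list.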
```shell
cd sam2
python process_xarm_pipeline.py
```
```shell
cd xmem
python process_xarm_pipeline.py
```
Populate `augmented_backgrounds/` with randomly generated background images for augmentation.
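Background randomization boils down to compositing masked foreground pixels onto an image from `augmented_backgrounds/`. A minimal single-channel sketch of that compositing step, assuming binary masks as nested lists (an illustration, not the repo's augmentation code, which operates on RGB images with masks from SAM2/XMem):

```python
def composite(foreground, background, mask):
    """Keep foreground pixels where mask is 1, background pixels elsewhere.

    All three arguments are HxW nested lists; real images add a channel
    dimension but the per-pixel selection is the same.
    """
    return [
        [f if m else b for f, b, m in zip(f_row, b_row, m_row)]
        for f_row, b_row, m_row in zip(foreground, background, mask)
    ]
```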
- Launch preprocessing servers in separate terminals:
  ```shell
  # Terminal 1
  cd dift && python dift_server.py
  # Terminal 2
  cd sam2 && python sam2_server.py
  # Terminal 3
  cd xmem && ./scripts/download_models.sh && python xmem_server.py
  ```
Enable visuotactile reaching with the Molmo server:
```shell
cd molmo
python molmo_server.py
```
Set `molmo_reaching: true` in your configuration file to activate the VLM-based reaching module.
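For reference, the flag is a plain boolean key in the experiment config (surrounding keys omitted; the file you edit depends on your setup under `cfgs/`):

```yaml
# excerpt from an experiment config under cfgs/ — other keys omitted
molmo_reaching: true
```

Leave it `false` (or unset) unless the Molmo server above is running, since the reaching module queries it at run time.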