Zifan Zhao¹, Siddhant Haldar², Jinda Cui³, Lerrel Pinto², *Raunaq Bhirangi²
¹New York University Shanghai, ²New York University, ³Honda Research
*Corresponding author: [email protected]
This repository provides code for "Touch begins where vision ends: Generalizable policies for contact-rich manipulation". It supports JAX-based behaviour cloning and residual RL modules, and includes semantic augmentation pipelines and a VLM-guided reaching extension.
- Create a Conda environment:
  ```shell
  conda create -n vital python=3.10
  ```
- Install dependencies:
  ```shell
  pip install -r requirements.txt
  ```
- Set up pre-commit hooks:
  ```shell
  pre-commit install
  ```
- Configure the project root: create `cfgs/local_config.yaml` with:
  ```yaml
  root_dir: /path/to/vital/
  ```
- Update experiment parameters in `cfgs/*.yaml`, notably `suite/xarm.yaml` for dataset paths.
- BC Training:
  ```shell
  python train.py
  ```
- BC Evaluation:
  ```shell
  python eval.py model_path=/path/to/checkpoint
  ```
- RL Training:
  ```shell
  python train.py expt=bcrl_sac offset_action_scale=2.0
  ```
- RL Evaluation:
  ```shell
  python eval.py expt=bcrl_sac offset_action_scale=2.0 model_path=/path/to/checkpoint
  ```
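The `offset_action_scale` flag bounds the residual that the RL policy adds on top of the base BC action. A minimal sketch of that composition, assuming a tanh-squashed offset (the function and variable names here are illustrative, not the repo's API):

```python
import math

def compose_action(base_action, residual, offset_action_scale=2.0):
    """Add a bounded residual offset to the base (BC) action.

    tanh squashes each residual component, so the added offset never
    exceeds +/- offset_action_scale in magnitude.
    """
    return [b + offset_action_scale * math.tanh(r)
            for b, r in zip(base_action, residual)]

# A zero residual leaves the BC action unchanged; a very large residual
# saturates at +/- offset_action_scale.
print(compose_action([0.1, -0.2], [0.0, 0.0]))
print(compose_action([0.0, 0.0], [100.0, -100.0]))
```

Bounding the offset this way keeps the learned residual from overriding the behaviour-cloned base policy early in training.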
Preprocessing scripts handle domain randomization and augmentation using DIFT, SAM2, and XMem. For each module, create the corresponding environment from its original repository.
```shell
cd dift
python process_xarm_pipeline.py
```
Prepare anchor data for each task:
```
anchor_data/{task_name}/base/
├── gripper_mask.png
├── object_mask.png
├── target_mask.png
└── dift_feature_map.pt
```
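Before running the pipeline, the anchor directory for each task can be sanity-checked against the layout above. A small sketch (the helper name is ours, not the repo's):

```python
from pathlib import Path

# The four files expected per task under anchor_data/{task_name}/base/
# (see the directory tree above).
REQUIRED = ["gripper_mask.png", "object_mask.png",
            "target_mask.png", "dift_feature_map.pt"]

def missing_anchor_files(task_name, root="anchor_data"):
    """Return the required anchor files absent from root/{task_name}/base/."""
    base = Path(root) / task_name / "base"
    return [name for name in REQUIRED if not (base / name).exists()]
```

An empty `base/` directory reports all four files as missing; a fully prepared task returns an empty list.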
```shell
cd sam2
python process_xarm_pipeline.py
```
```shell
cd xmem
python process_xarm_pipeline.py
```
Populate `augmented_backgrounds/` with randomly generated background images for augmentation.
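Background randomization boils down to compositing masked foreground pixels onto an image from `augmented_backgrounds/`. A minimal single-channel sketch of that compositing step, assuming binary masks as nested lists (an illustration, not the repo's augmentation code, which operates on RGB images with masks from SAM2/XMem):

```python
def composite(foreground, background, mask):
    """Keep foreground pixels where mask is 1, background pixels elsewhere.

    All three arguments are HxW nested lists; real images add a channel
    dimension but the per-pixel selection is the same.
    """
    return [
        [f if m else b for f, b, m in zip(f_row, b_row, m_row)]
        for f_row, b_row, m_row in zip(foreground, background, mask)
    ]
```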
- Launch preprocessing servers in separate terminals:
  ```shell
  # Terminal 1
  cd dift && python dift_server.py
  # Terminal 2
  cd sam2 && python sam2_server.py
  # Terminal 3
  cd xmem && ./scripts/download_models.sh && python xmem_server.py
  ```
Enable visuotactile reaching with the Molmo server:
```shell
cd molmo
python molmo_server.py
```
Set `molmo_reaching: true` in your configuration file to activate the VLM-based reaching module.
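For reference, the flag is a plain boolean key in the experiment config (surrounding keys omitted; the file you edit depends on your setup under `cfgs/`):

```yaml
# excerpt from an experiment config under cfgs/ — other keys omitted
molmo_reaching: true
```

Leave it `false` (or unset) unless the Molmo server above is running, since the reaching module queries it at run time.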