Accompanying codebase for the paper "Touch begins where vision ends: Generalizable policies for contact-rich manipulation"


Exiam6/ViTaL

Touch begins where vision ends: Generalizable policies for contact-rich manipulation

[Paper] [Project Website]

Zifan Zhao¹, Siddhant Haldar², Jinda Cui³, Lerrel Pinto², *Raunaq Bhirangi²

¹New York University Shanghai, ²New York University, ³Honda Research

*Corresponding author: [email protected]


This repository provides the code for "Touch begins where vision ends: Generalizable policies for contact-rich manipulation". It supports JAX-based behavior cloning (BC) and residual reinforcement learning (RL) modules. It also includes semantic augmentation pipelines and a VLM-guided reaching extension.


1. Installation

  1. Create and activate a Conda environment:
    conda create -n vital python=3.10
    conda activate vital
  2. Install dependencies:
    pip install -r requirements.txt
  3. Set up pre-commit hooks:
    pre-commit install
  4. Configure project root:
    • Create cfgs/local_config.yaml with:
      root_dir: /path/to/vital/
  5. Update experiment parameters in cfgs/*.yaml, notably suite/xarm.yaml for dataset paths.
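
Step 4 above can also be scripted. The following is a minimal sketch, assuming `cfgs/local_config.yaml` needs only the `root_dir` key shown in the list; `write_local_config` is a hypothetical helper and not part of the repository.

```python
from pathlib import Path


def write_local_config(repo_root: str, cfg_dir: str = "cfgs") -> Path:
    """Write <repo_root>/cfgs/local_config.yaml with a single root_dir entry.

    Hypothetical helper: assumes root_dir is the only key the file requires.
    """
    root = Path(repo_root).expanduser().resolve()
    cfg_path = root / cfg_dir / "local_config.yaml"
    cfg_path.parent.mkdir(parents=True, exist_ok=True)
    # The trailing slash mirrors the "/path/to/vital/" form used in step 4.
    cfg_path.write_text(f"root_dir: {root.as_posix()}/\n")
    return cfg_path
```

Calling `write_local_config(".")` from the repository root would write the absolute path of the checkout into the file. Any existing `local_config.yaml` is overwritten, so this is only suitable for a fresh setup.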

2. Usage

  • BC Training:

    python train.py
  • BC Evaluation:

    python eval.py model_path=/path/to/checkpoint
  • RL Training:

    python train.py expt=bcrl_sac offset_action_scale=2.0
  • RL Evaluation:

    python eval.py expt=bcrl_sac offset_action_scale=2.0 model_path=/path/to/checkpoint
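
The four commands above differ only in the script name and the `key=value` overrides. As a rough sketch of that pattern, the hypothetical helper below assembles them as argument lists. `build_command` and its parameters are illustrative names, and the default of 2.0 is simply the value used in the commands above.

```python
import sys
from typing import List, Optional


def build_command(mode: str, rl: bool = False,
                  model_path: Optional[str] = None,
                  offset_action_scale: float = 2.0) -> List[str]:
    """Assemble one of the four usage commands as an argv list.

    Hypothetical helper: it only reproduces the overrides listed above.
    """
    if mode not in ("train", "eval"):
        raise ValueError(f"mode must be 'train' or 'eval', got {mode!r}")
    cmd = [sys.executable, f"{mode}.py"]
    if rl:
        # The RL commands add the experiment name and the offset scale.
        cmd += ["expt=bcrl_sac", f"offset_action_scale={offset_action_scale}"]
    if mode == "eval":
        # Both evaluation commands take a checkpoint path.
        if model_path is None:
            raise ValueError("evaluation requires model_path")
        cmd.append(f"model_path={model_path}")
    return cmd
```

For example, `subprocess.run(build_command("eval", rl=True, model_path="/path/to/checkpoint"), check=True)` would run the RL evaluation command from the repository root.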

3. Semantic Augmentation Pipeline

Preprocessing scripts for domain randomization and augmentation using DIFT, SAM2, and XMem. For each module, create a separate environment by following the instructions in its original repository.

3.1 DIFT

cd dift
python process_xarm_pipeline.py

Prepare anchor data for each task:

anchor_data/{task_name}/base/
├── gripper_mask.png
├── object_mask.png
├── target_mask.png
└── dift_feature_map.pt
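
A quick check of the layout above can catch missing files before the later stages run. This is a minimal sketch: `missing_anchor_files` is a hypothetical helper, and the location of `anchor_data/` relative to the working directory is an assumption exposed as a parameter.

```python
from pathlib import Path
from typing import List

# File names taken from the directory tree above.
EXPECTED_FILES = (
    "gripper_mask.png",
    "object_mask.png",
    "target_mask.png",
    "dift_feature_map.pt",
)


def missing_anchor_files(task_name: str,
                         anchor_root: str = "anchor_data") -> List[str]:
    """Return the expected anchor files that are absent for one task.

    Hypothetical helper: it checks only that the files exist, not their content.
    """
    base = Path(anchor_root) / task_name / "base"
    return [name for name in EXPECTED_FILES if not (base / name).is_file()]
```

An empty list means all four files are present for that task.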

3.2 SAM2

cd sam2
python process_xarm_pipeline.py

3.3 XMem

cd xmem
python process_xarm_pipeline.py

Populate augmented_backgrounds/ with randomly generated background images for augmentation.


4. Reinforcement Learning Workflow

  1. Launch preprocessing servers in separate terminals:
    # Terminal 1
    cd dift && python dift_server.py
    
    # Terminal 2
    cd sam2 && python sam2_server.py
    
    # Terminal 3
    cd xmem && ./scripts/download_models.sh && python xmem_server.py
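
If opening three terminals is inconvenient, the servers could be started from a single script. The sketch below is one way to do that under stated assumptions: `server_commands` and `launch_servers` are hypothetical helpers, and because Section 3 recommends a separate environment per module, the interpreter for each server can be overridden. It does not run `./scripts/download_models.sh`, which still has to be executed once in `xmem/` beforehand.

```python
import subprocess
import sys
from typing import Dict, List, Optional, Tuple

# (working directory, script) pairs taken from the three terminals above.
SERVERS: List[Tuple[str, str]] = [
    ("dift", "dift_server.py"),
    ("sam2", "sam2_server.py"),
    ("xmem", "xmem_server.py"),
]


def server_commands(
        pythons: Optional[Dict[str, str]] = None) -> Dict[str, List[str]]:
    """Map each server directory to the argv list used to start it.

    `pythons` optionally maps a directory name to the interpreter of that
    module's environment; the current interpreter is the fallback.
    """
    pythons = pythons or {}
    return {
        cwd: [pythons.get(cwd, sys.executable), script]
        for cwd, script in SERVERS
    }


def launch_servers(
        pythons: Optional[Dict[str, str]] = None) -> List[subprocess.Popen]:
    """Start all three servers as background processes.

    Must be called from the repository root so the relative directories resolve.
    """
    procs = []
    for cwd, cmd in server_commands(pythons).items():
        procs.append(subprocess.Popen(cmd, cwd=cwd))
    return procs
```

The returned `Popen` objects can be stopped with `terminate()` once training or evaluation has finished.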

5. VLM-Guided Reaching Extension

Enable visuotactile reaching with the Molmo server:

cd molmo
python molmo_server.py

Set molmo_reaching: true in your configuration file to activate the VLM-based reaching module.

