
Tool-as-Interface: Learning Robot Tool Use from Human Play through Imitation Learning

Haonan Chen1, Cheng Zhu1, Yunzhu Li2, Katherine Driggs-Campbell1

1University of Illinois Urbana-Champaign, 2Columbia University

Hardware Requirements

  • Sensors:
  • Control Interfaces:
  • Custom Components:
    • 3D Printed Hammer and Nail

      To attach a tool to the UR5e, we provide two types of quick tool changers:

      • 3D-Printed Clipper

        • Requires a connector to attach tools to the Clipper.
          • Example: The hammer linked above already includes the connector.

        The upper Clipper is connected to the Clipper base with one M4×16 screw and one M4 nut. A clipper gasket is provided to place between the UR5e robot and the Clipper. If you use the gasket, fasten the Clipper with four M6×30 screws; without the gasket, four M6×24 screws work as well. Four M6 screw gaskets are used in either case.

      • 3D Printed Mounter

        • Suitable for both 3D-printed and standard tools.
        • Secured using one or two 3D-printed screws.

        To connect the Mounter to the UR5e robot, use four M6×12 screws together with four M6 screw gaskets so the connection is tight.

Environment Setup

We recommend Mambaforge over the standard Anaconda distribution for a faster installation. Set up the environment as follows:

  1. Install the necessary dependencies:

    sudo apt install -y libosmesa6-dev libgl1-mesa-glx libglfw3 patchelf libglm-dev
  2. Clone the repository:

    git clone --recursive https://github.com/Tool-as-Interface/Tool_as_Interface.git
    cd Tool_as_Interface/
    git clone https://github.com/xinyu1205/recognize-anything.git third_party/Grounded-Segment-Anything/recognize-anything
  3. Update mamba, then create and activate the environment:

    mamba install mamba=1.5.1 -n base -c conda-forge
    mamba env create -f conda_environment_real.yml
    mamba activate ti
  4. Install packages:

    # grounded sam
    export AM_I_DOCKER=False
    export BUILD_WITH_CUDA=True
    export CUDA_HOME=/usr/local/cuda-11.8
    
    pip install https://artifactory.kinovaapps.com:443/artifactory/generic-public/kortex/API/2.6.0/kortex_api-2.6.0.post3-py3-none-any.whl
    pip install --no-build-isolation -e third_party/Grounded-Segment-Anything/GroundingDINO
    pip install mmcv-full -f https://download.openmmlab.com/mmcv/dist/cu118/torch2.0.0/index.html
    pip install third_party/Grounded-Segment-Anything/grounded-sam-osx/transformer_utils
    
    # FoundationPose
    CONDA_ENV_PATH=$(conda info --base)/envs/$(basename "$CONDA_PREFIX")
    EIGEN_PATH="$CONDA_ENV_PATH/include/eigen3"
    export CMAKE_PREFIX_PATH="$CMAKE_PREFIX_PATH:$EIGEN_PATH"
    cd third_party/FoundationPose
    CMAKE_PREFIX_PATH=$CONDA_PREFIX/lib/python3.10/site-packages/pybind11/share/cmake/pybind11 bash build_all_conda.sh
    cd ../..
  5. Download the checkpoints:

    bash setup_downloads.sh

Verify the Installation

To ensure everything is installed correctly, run the following commands:

python -c 'import torch; print(torch.__version__); print(torch.cuda.is_available())'
python -c 'import torchvision; print(torchvision.__version__)'
python -c "from groundingdino.util.inference import Model; from segment_anything import sam_model_registry, SamPredictor"

Data Preparation

mkdir -p data

To obtain the dataset, download the corresponding zip file from the following link:
hammer_human dataset

Once downloaded, extract the contents into the data folder so that the directory structure is:

data/
└── hammer_human/
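
If you want a quick sanity check that the dataset landed where the scripts expect it, a minimal sketch is below (it only checks that data/hammer_human exists and is non-empty; the file layout inside it is not assumed here):

from pathlib import Path

# Hypothetical sanity check: confirm the extracted dataset is in place.
dataset_dir = Path("data/hammer_human")
if not dataset_dir.is_dir():
    raise FileNotFoundError(
        f"Expected the extracted dataset at {dataset_dir}; "
        "download the zip and extract it into data/."
    )
print(f"Found {sum(1 for _ in dataset_dir.iterdir())} entries in {dataset_dir}")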

Demo, Training and Eval on a Real Robot

Activate the conda environment and log in to wandb (if you haven't already).

conda activate ti
wandb login

Calibrate the Multiple Cameras

Adjust the arguments in the calibrate_extrinsics function located in ti/real_world/multi_realsense.py to calibrate multiple cameras. The calibration is performed using a 400×300 mm Charuco board with a checker size of 40 mm (DICT_4X4).

The robot_base_in_world parameter is manually measured and tuned in ti/real_world/multi_realsense.py.

Calibration Board

Run the following command to perform the calibration:

python ti/real_world/multi_realsense.py
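
For reference, the board parameters above map onto OpenCV's aruco module roughly as in the sketch below. This is not the repository's calibration code; it assumes opencv-contrib-python >= 4.8, and the square counts and marker size are placeholders that must match the board you actually print:

import cv2

# Placeholder board definition: DICT_4X4 family, 40 mm (0.04 m) checkers.
# The number of squares and the marker size are illustrative guesses.
DICT = cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_4X4_50)
board = cv2.aruco.CharucoBoard((10, 7), 0.04, 0.03, DICT)
detector = cv2.aruco.CharucoDetector(board)

def board_pose(gray, K, dist):
    """Return (rvec, tvec) of the board in the camera frame, or None."""
    charuco_corners, charuco_ids, _, _ = detector.detectBoard(gray)
    if charuco_ids is None or len(charuco_ids) < 4:
        return None
    # Object points of the detected chessboard corners on the board plane.
    obj_pts = board.getChessboardCorners()[charuco_ids.flatten()]
    ok, rvec, tvec = cv2.solvePnP(obj_pts, charuco_corners, K, dist)
    return (rvec, tvec) if ok else None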

Collecting Demonstration Data

Start the demonstration collection script. The following key bindings apply to both human play collection and robot demonstration collection; a sketch of this record/stop/delete flow follows the list.

  • Press C to start recording.
  • Press S to stop recording.
  • Press Backspace to delete the most recent recording (confirm with y/n).
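
The sketch below shows the shape of that record/stop/delete loop using pynput; it is illustrative only, not the repository's implementation (the real script also asks for y/n confirmation before deleting):

from pynput import keyboard

# Illustrative key-driven recording loop (hypothetical, simplified).
recording = False
episodes = []

def on_press(key):
    global recording
    if key == keyboard.KeyCode.from_char('c') and not recording:
        recording = True
        episodes.append([])   # start buffering a new episode
        print("recording started")
    elif key == keyboard.KeyCode.from_char('s') and recording:
        recording = False
        print("recording stopped")
    elif key == keyboard.Key.backspace and episodes and not recording:
        episodes.pop()        # the real script confirms with y/n first
        print("deleted the most recent recording")

with keyboard.Listener(on_press=on_press) as listener:
    listener.join()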

Collect Human Play Data

Run the following script to collect human video demonstrations:

python scripts/collect_human_video.py

Once the human play data has been collected, process the raw data using:

python scripts/preprocess_human_play.py

Collect Robot Demonstration Data (Baseline)

If you want to use GELLO, please calibrate the GELLO offset using the following script:

python ti/devices/gello_software/scripts/gello_get_offset.py

After calibration, update the YAML configuration file:

ti/devices/gello_software/gello.yaml

For details on performing the calibration, refer to:

ti/devices/gello_software/README.md

Once the calibration is complete, update the argument in scripts/demo_real_ur5e.py to select either Spacemouse or GELLO, then run the following command to collect robot demonstration data:

python scripts/demo_real_ur5e.py

Training

To launch training, run:

python scripts/train_diffusion_policy.py \
  --config-name=train_diffusion_policy.yaml 

Evaluation on a Real Robot

Modify eval_real_ur5e_human.py or eval_real_ur5e.py, then launch the evaluation script:

  • Press C to start evaluation (handing control over to the policy).
  • Press S to stop the current episode.

Evaluate Human Play-Trained Policy

For eval_real_ur5e_human.py, we assume that the tool and the end-effector (EEF) are rigidly attached. Therefore, the tool pose estimation only needs to be performed once.

  • Press T once to estimate the tool's pose in the EEF frame.
python eval_real_ur5e_human.py 
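
Because of that rigid-attachment assumption, the one-time tool-in-EEF estimate can simply be composed with the current EEF pose at every step to recover the tool pose in the world frame. A minimal sketch with placeholder values (the names here are illustrative, not the repository's API):

import numpy as np

def pose_to_mat(position, rotation):
    """Build a 4x4 homogeneous transform from a translation and a 3x3 rotation."""
    T = np.eye(4)
    T[:3, :3] = rotation
    T[:3, 3] = position
    return T

# One-time estimate (e.g. after pressing T): tool pose in the EEF frame.
# Placeholder: tool 12 cm along the EEF z-axis, no extra rotation.
T_eef_tool = pose_to_mat([0.0, 0.0, 0.12], np.eye(3))

# Current EEF pose in the robot base / world frame, read from the robot
# state at every control step (placeholder values here).
T_world_eef = pose_to_mat([0.4, 0.1, 0.3], np.eye(3))

# Rigid attachment means the tool's world pose is a fixed composition.
T_world_tool = T_world_eef @ T_eef_tool
print(T_world_tool[:3, 3])  # tool position in the world frame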

Evaluate Teleoperation-Trained Policy

python eval_real_ur5e.py 

Troubleshooting

If you encounter the following error:

AttributeError: module 'collections' has no attribute 'MutableMapping'

Resolve it by installing a compatible protobuf version (older protobuf releases reference collections.MutableMapping, which was removed in Python 3.10):

pip install protobuf==3.20.1

Acknowledgement
