
KUDA: Keypoints to Unify Dynamics Learning and Visual Prompting for Open-Vocabulary Robotic Manipulation

Project Page | Video | arXiv


🛠️ Installation

To set up the environment, please follow these steps:

  1. Create the conda environment:

    conda create -n kuda python=3.8
    conda activate kuda
    pip install -r requirements.txt
  2. Download the checkpoints for GroundingSAM:

    cd perception/models
    mkdir checkpoints
    cd checkpoints
    wget https://github.com/IDEA-Research/GroundingDINO/releases/download/v0.1.0-alpha/groundingdino_swint_ogc.pth
    wget https://huggingface.co/lkeab/hq-sam/resolve/main/sam_hq_vit_h.pth
  3. Download the checkpoints for SpaTracker:

    cd ../../../dynamics/tracker
    mkdir checkpoints
    cd checkpoints
    pip install gdown
    gdown 18YlG_rgrHcJ7lIYQWfRz_K669z6FdmUX

    Alternatively, you can download the checkpoints manually from Google Drive. A quick sanity check for all downloaded checkpoints is sketched below.
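
If you want to confirm that the downloads succeeded before moving on, a quick sanity check along the following lines should work. This script is not part of the repository, and the SpaTracker filename below is a placeholder for whatever file gdown saved; run it from the repository root.

    # sanity_check_ckpts.py -- verify that the downloaded checkpoints load on CPU
    import torch

    checkpoints = [
        "perception/models/checkpoints/groundingdino_swint_ogc.pth",
        "perception/models/checkpoints/sam_hq_vit_h.pth",
        "dynamics/tracker/checkpoints/spatracker.pth",  # placeholder name; use the file gdown saved
    ]

    for path in checkpoints:
        state = torch.load(path, map_location="cpu")
        n = len(state) if isinstance(state, dict) else 1
        print(f"{path}: OK ({n} top-level entries)")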

🕹️ Usage

Visual Prompting (No Robot Setup)

To quickly test the visual prompting functionality without setting up a robot, please follow these steps:

  1. Replace the api_key in demo.py with your OpenAI API key.

  2. Run the demo:

    python demo.py

    Modify the img and instruction variables in demo.py to experiment with different tasks; example inputs and outputs are provided in results. A rough sketch of what such a visual prompting query can look like is shown below.
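
The demo queries a vision-language model with an observation image and the task instruction. As a hedged illustration of what such a query can look like using the official OpenAI Python client (this is not the prompt construction used in demo.py; the image path, instruction, and model name below are placeholders):

    # visual_prompt_sketch.py -- minimal image + instruction query via the OpenAI API
    import base64
    from openai import OpenAI

    api_key = "sk-..."                     # your OpenAI API key, as in demo.py
    img = "examples/rope.png"              # placeholder image path
    instruction = "Straighten the rope."   # placeholder instruction

    client = OpenAI(api_key=api_key)
    with open(img, "rb") as f:
        b64 = base64.b64encode(f.read()).decode("utf-8")

    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": instruction},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
            ],
        }],
    )
    print(response.choices[0].message.content)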

Real-World Execution

To execute tasks in the real world, please follow these steps:

  1. Dynamics Models:
    Download the dynamics model checkpoints from this link, then update the corresponding paths in dynamics/dyn_utils.py so that the checkpoints can be found and loaded (a hypothetical example of such a path layout is sketched after these steps).

  2. Calibration:
    We use the xArm6 robot and a ChArUco calibration board. Please run the following code for calibration:

    cd xarm-calibrate
    python calibrate.py
    python camera_to_base_transforms.py

    Before running, replace the camera serial number in xarm-calibrate/real_world/real_env.py and the robot IP in xarm-calibrate/real_world/xarm6.py.

    To verify the calibration results, run the following (a minimal projection-based sanity check is also sketched after these steps):

    python verify_stationary_cameras.py
  3. Robot Execution:
    Please complete the following steps for real-world execution:

    • We employ different end-effectors to manipulate different objects: a cylindrical stick for the T shape and ropes, and a board pusher for cubes and granular pieces. You can download the 3D-printable models from here. Please update the robot setup in config/real_config.yaml->planner and envs/real_env.py, and ensure the top-down and side cameras have clear views.
    • Please adjust hyperparameters such as radius in planner/planner.py and box_threshold in perception/models/grounding_dino_wrapper.py for different objects.
    • Please replace the api_key in launch.py with your OpenAI API key.

    Launch the execution:

    python launch.py

    You should find the execution results in logs/low_level and the dynamics predictions in logs/{material}-planning-{time.time()}.
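
For step 1, a hypothetical example of how the checkpoint paths in dynamics/dyn_utils.py might be laid out (the variable name, keys, and paths below are illustrative only; match them to what the file actually uses):

    # Illustrative only: map each object/material type to its dynamics checkpoint.
    CKPT_PATHS = {
        "rope": "/path/to/dynamics_checkpoints/rope.pth",
        "cube": "/path/to/dynamics_checkpoints/cube.pth",
        "granular": "/path/to/dynamics_checkpoints/granular.pth",
        "t_shape": "/path/to/dynamics_checkpoints/t_shape.pth",
    }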
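
For step 2, beyond running verify_stationary_cameras.py, a minimal projection-based sanity check of the computed extrinsics can look like the following (the .npy file names and the 4x4 camera-to-base convention are assumptions; adapt them to the actual calibration outputs):

    # Illustrative check: project a known robot-base point into a camera image.
    import numpy as np

    T_base_from_cam = np.load("calib/camera_to_base.npy")  # assumed 4x4 extrinsic matrix
    K = np.load("calib/intrinsics.npy")                     # assumed 3x3 camera matrix

    # A point expressed in the robot base frame, e.g. the current gripper position (meters).
    p_base = np.array([0.4, 0.0, 0.2, 1.0])

    # Transform into the camera frame, then project with the pinhole model.
    p_cam = np.linalg.inv(T_base_from_cam) @ p_base
    uv = K @ (p_cam[:3] / p_cam[2])
    print("expected pixel location:", uv[:2])

If the printed pixel lands on the gripper in the corresponding camera image, the calibration is consistent.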

🔬 Acknowledgements

We thank the authors of the following projects for making their code open source:

🏷️ License

This repository is released under the MIT license.

🔭 Citation

@misc{liu2025kudakeypointsunifydynamics,
      title={KUDA: Keypoints to Unify Dynamics Learning and Visual Prompting for Open-Vocabulary Robotic Manipulation}, 
      author={Zixian Liu and Mingtong Zhang and Yunzhu Li},
      year={2025},
      eprint={2503.10546},
      archivePrefix={arXiv},
      primaryClass={cs.RO},
      url={https://arxiv.org/abs/2503.10546},
}
