Tether: Autonomous Play with Correspondence-Driven Trajectory Warping

William Liang1,2, Sam Wang1, Hung-Ju Wang1,3, Osbert Bastani1, Yecheng Jason Ma1,3†, Dinesh Jayaraman1†

1University of Pennsylvania

2University of California, Berkeley

3Dyna Robotics

†Equal advising


Abstract

TLDR. Tether performs autonomous multi-task play in the real world with a correspondence-driven trajectory warping policy and vision-language models. Our non-parametric policy outperforms alternative methods in the low-data regime, and the stream of play data consistently improves downstream policies over time, ultimately reaching near-perfect success rates.


The ability to conduct and learn from self-directed interaction and experience is a central challenge in robotics, offering a scalable alternative to labor-intensive human demonstrations. However, realizing such "play" requires (1) a policy robust to diverse, potentially out-of-distribution environment states, and (2) a procedure that continuously produces useful, task-directed robot experience. To address these challenges, we introduce Tether, a method for autonomous play with two key contributions. First, we design a novel non-parametric policy that leverages strong visual priors for extreme generalization: given two-view images, it identifies semantic correspondences to warp demonstration trajectories into new scenes. We show that this design is robust to significant spatial and semantic variations of the environment, such as dramatic positional differences and unseen objects. We then deploy this policy for autonomous multi-task play in the real world via a continuous cycle of task selection, execution, evaluation, and improvement, guided by the visual understanding capabilities of vision-language models. This procedure generates diverse, high-quality datasets with minimal human intervention. In a household-like multi-object setup, our method is among the first to perform many hours of autonomous real-world play, producing a stream of data that consistently improves downstream policy performance over time. Ultimately, Tether yields over 1000 expert-level trajectories and trains policies competitive with those learned from human-collected demonstrations.

Real-World Play

Timelapse of a subsection of our 26-hour-long autonomous play experiment. Video plays at 100x speed.

Robust Imitation

We evaluate our policy on tasks involving fruits and containers with in-distribution objects (first row) and out-of-distribution objects (second row), as well as other challenging manipulation skills (third row). Videos are 1x speed.

Method

Correspondence-Driven Trajectory Warping. Given a few demos, our non-parametric policy computes visual correspondences and produces a warped trajectory action plan.
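To make the warping step concrete, below is a minimal sketch of how demonstration waypoints could be mapped into a new scene once corresponding keypoints have been matched between the demo and the current observation. This is an illustrative approximation, not the paper's implementation: the keypoint extraction is abstracted away, and the warp is modeled here as a single least-squares rigid transform (Kabsch) fit on the matched points.

# Minimal sketch of correspondence-driven trajectory warping (illustrative only,
# not the paper's actual method). Assumes matched 3D keypoints between the demo
# scene and the new scene are already available, e.g. from semantic features on
# two-view images.
import numpy as np

def fit_rigid_transform(src: np.ndarray, dst: np.ndarray):
    """Least-squares rigid transform (R, t) mapping src points onto dst (Kabsch)."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:            # correct for a reflection
        Vt[-1] *= -1
        R = Vt.T @ U.T
    t = dst_c - R @ src_c
    return R, t

def warp_trajectory(demo_traj: np.ndarray, demo_kps: np.ndarray, new_kps: np.ndarray):
    """Warp demo end-effector waypoints (N, 3) into the new scene using the
    transform fit on the corresponding keypoints."""
    R, t = fit_rigid_transform(demo_kps, new_kps)
    return demo_traj @ R.T + t

# Toy usage: the "new scene" is the demo scene shifted by a fixed offset.
demo_kps = np.array([[0.30, 0.00, 0.10], [0.40, 0.10, 0.10], [0.35, -0.10, 0.12]])
new_kps = demo_kps + np.array([0.10, 0.05, 0.00])
demo_traj = np.linspace([0.20, 0.00, 0.30], [0.35, 0.00, 0.10], num=20)
warped = warp_trajectory(demo_traj, demo_kps, new_kps)

A rigid fit is only one possible choice; richer warps (e.g. per-object or locally weighted transforms) would handle larger semantic and spatial variation, which is the regime the Tether policy targets.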

Autonomous Multi-Task Play with Vision-Language Models. Our play procedure continuously runs the Tether policy, cycling across different tasks and querying a VLM for plan generation and success detection.
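The play procedure itself can be summarized as a simple loop. The sketch below uses hypothetical callable interfaces for the VLM queries and the warping policy (these names are illustrative, not the paper's API): each iteration observes the scene, asks the VLM to pick the next task, rolls out the warped trajectory, asks the VLM whether the attempt succeeded, and keeps successful episodes for downstream training.

# Sketch of the autonomous play loop with hypothetical interfaces (illustrative).
from typing import Callable, List

def autonomous_play(
    get_observation: Callable[[], dict],                      # two-view images + robot state
    propose_task: Callable[[dict, List[str]], str],           # VLM: choose the next task
    execute_warped_trajectory: Callable[[dict, str], list],   # warping-policy rollout
    check_success: Callable[[dict, str], bool],               # VLM: success detection
    tasks: List[str],
    num_episodes: int,
):
    dataset = []
    for _ in range(num_episodes):
        obs = get_observation()
        task = propose_task(obs, tasks)                       # cycle across tasks
        trajectory = execute_warped_trajectory(obs, task)
        if check_success(get_observation(), task):            # keep only successful rollouts
            dataset.append({"task": task, "trajectory": trajectory})
    return dataset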

Quantitative Results

Main Policy Comparison. Our Tether policy surpasses imitation learning baselines across all 12 tasks and performs remarkably well with only 1, 5, or 10 demonstrations.
Autonomous Play Statistics. In around 26 hours of play, Tether produces over 1000 trajectories across 6 tasks and significantly expands the diversity of object positions.
Downstream Policy Learning Results. The stream of data generated by autonomous play consistently improves downstream diffusion policy performance over time, ultimately achieving high success rates competitive with policies trained on an equal number of human-collected demos (in black).

Citation

@misc{liang2025tether,
  title  = {Tether: Autonomous Play with Correspondence-Driven Trajectory Warping},
  author = {William Liang and Sam Wang and Hungju Wang and Osbert Bastani and Jason Ma and Dinesh Jayaraman},
  year   = {2025},
}