Coordinated Humanoid Manipulation with Choice Policies

Haozhi Qi*, Yen-Jen Wang*, Toru Lin, Brent Yi, Yi Ma, Koushil Sreenath†, Jitendra Malik†
* Equal contribution. † Equal advising.
UC Berkeley
Abstract:

Humanoid robots hold great promise for operating in human-centric environments, yet achieving robust whole-body coordination across the head, hands, and legs remains a major challenge. We present a system that combines a modular teleoperation interface with a scalable learning framework to address this problem. Our teleoperation design decomposes humanoid control into intuitive submodules, which include hand-eye coordination, grasp primitives, arm end-effector tracking, and locomotion. This modularity allows us to collect high-quality demonstrations efficiently. Building on this, we introduce Choice Policy, an imitation learning approach that generates multiple candidate actions and learns to score them. This architecture enables both fast inference and effective modeling of multimodal behaviors. We validate our approach on two real-world tasks: dishwasher loading and whole-body loco-manipulation for whiteboard wiping. Experiments show that Choice Policy significantly outperforms diffusion policies and standard behavior cloning. Furthermore, our results indicate that hand-eye coordination is critical for success in long-horizon tasks. Our work demonstrates a practical path toward scalable data collection and learning for coordinated humanoid manipulation in unstructured environments.

Highlight Results

Dishwasher Loading
Loco-manipulation Wiping

Our framework enables the humanoid to perform diverse, long-horizon tasks. In Dishwasher Loading, the robot coordinates gaze and reach to handle object insertion. In Loco-manipulation Wiping, the system maintains balance and walking stability, showcasing the tight integration of locomotion and manipulation.

Teleoperation

Our modular teleoperation interface simplifies complex humanoid control by decomposing whole-body movements into intuitive functional submodules. By providing automated hand-eye coordination and high-level locomotion primitives via a VR interface, we enable operators to collect high-quality demonstrations for long-horizon tasks with minimal physical fatigue.
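
As a rough illustration of this decomposition, the sketch below shows how such a modular command interface might look in Python. All class names, fields, and the gaze heuristic are our own illustrative assumptions, not the system's actual API.

    # Hypothetical sketch of a modular teleoperation command: whole-body
    # control is split into independent submodules, so the operator drives
    # high-level targets while hand-eye coordination is handled automatically.
    from dataclasses import dataclass, field

    @dataclass
    class GraspCommand:
        primitive: str = "open"  # e.g. "open", "pinch", "power" (illustrative)

    @dataclass
    class ArmCommand:
        ee_position: tuple = (0.0, 0.0, 0.0)          # wrist target, torso frame (m)
        ee_orientation: tuple = (1.0, 0.0, 0.0, 0.0)  # quaternion (w, x, y, z)

    @dataclass
    class LocomotionCommand:
        linear_velocity: tuple = (0.0, 0.0)  # forward / lateral (m/s)
        yaw_rate: float = 0.0                # rad/s

    @dataclass
    class TeleopCommand:
        left_arm: ArmCommand = field(default_factory=ArmCommand)
        right_arm: ArmCommand = field(default_factory=ArmCommand)
        left_hand: GraspCommand = field(default_factory=GraspCommand)
        right_hand: GraspCommand = field(default_factory=GraspCommand)
        locomotion: LocomotionCommand = field(default_factory=LocomotionCommand)

    def auto_gaze_target(cmd: TeleopCommand) -> tuple:
        # Automated hand-eye coordination: point the head camera at the
        # midpoint of the two wrist targets, instead of making the operator
        # steer the head manually (one plausible heuristic, assumed here).
        l, r = cmd.left_arm.ee_position, cmd.right_arm.ee_position
        return tuple((a + b) / 2.0 for a, b in zip(l, r))

Because each submodule is independent, the VR interface only needs to populate the fields the operator is actively controlling; the rest fall back to sensible defaults.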

Comparison of Learning Methods

We introduce Choice Policy, a learning framework designed to handle the inherent multimodality of human expert data. Unlike diffusion models, which require iterative sampling, our method generates multiple action candidates and scores them in a single forward pass.
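
A minimal PyTorch-style sketch of this idea follows, assuming a winner-takes-all regression term plus a cross-entropy scoring term; the module names, sizes, and loss are illustrative assumptions rather than the paper's exact architecture.

    # Sketch of a choice-style policy: a generator head proposes K candidate
    # actions and a scoring head ranks them, all in one forward pass.
    import torch
    import torch.nn as nn

    class ChoicePolicy(nn.Module):
        def __init__(self, obs_dim, act_dim, num_candidates=16, hidden=256):
            super().__init__()
            self.K, self.act_dim = num_candidates, act_dim
            self.encoder = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
            # Proposes K candidate actions from the observation embedding.
            self.generator = nn.Linear(hidden, self.K * act_dim)
            # Scores each (observation, candidate) pair.
            self.scorer = nn.Sequential(
                nn.Linear(hidden + act_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1)
            )

        def forward(self, obs):
            z = self.encoder(obs)                                     # (B, hidden)
            cands = self.generator(z).view(-1, self.K, self.act_dim)  # (B, K, A)
            z_rep = z.unsqueeze(1).expand(-1, self.K, -1)             # (B, K, hidden)
            scores = self.scorer(
                torch.cat([z_rep, cands], dim=-1)
            ).squeeze(-1)                                             # (B, K)
            return cands, scores

    def choice_loss(cands, scores, expert_action):
        # Winner-takes-all: only the candidate closest to the expert action
        # is regressed toward it, so the other candidates remain free to
        # cover other modes of the demonstration distribution. The scorer
        # is trained to identify that winning candidate via cross-entropy.
        dists = (cands - expert_action.unsqueeze(1)).pow(2).sum(-1)  # (B, K)
        best = dists.argmin(dim=1)                                   # (B,)
        recon = dists.gather(1, best.unsqueeze(1)).mean()
        rank = nn.functional.cross_entropy(scores, best)
        return recon + rank

The winner-takes-all term is what preserves multimodality: unlike plain behavior cloning, the model is never forced to average over conflicting expert choices.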

Evaluation (10 Consecutive Trials)

We evaluate the robustness of our approach under standard operating conditions, where the robot is tasked with autonomously loading three colored plates into a dishwasher.

Behavior Cloning (BC) often fails due to its inability to capture the multimodal nature of the demonstrations. Diffusion Policy can model the distribution, but suffers from high inference latency. In contrast, our Choice Policy completes the task by efficiently modeling multiple candidate actions in a single forward pass, providing precise, high-frequency control.
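
For illustration, inference with such a policy reduces to one forward pass and an argmax, rather than an iterative denoising loop. This hypothetical snippet reuses the ChoicePolicy sketched above.

    # Single-forward-pass action selection (illustrative, not measured).
    import torch

    @torch.no_grad()
    def select_action(policy, obs):
        cands, scores = policy(obs)    # one forward pass
        best = scores.argmax(dim=1)    # highest-scored candidate per sample
        return cands[torch.arange(obs.shape[0]), best]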

Choice Policy
Behavior Cloning Policy
Diffusion Policy

*Note: To ensure a fair assessment of policy performance, trials interrupted by external hardware failures or network connection timeouts are excluded from the success rate statistics.

Citation

          @article{qi2025coordinated,
            title={Coordinated Humanoid Manipulation with Choice Policies},
            author={Qi, Haozhi and Wang, Yen-Jen and Lin, Toru and Yi, Brent and Ma, Yi and Sreenath, Koushil and Malik, Jitendra},
            journal={arXiv:2512.25072},
            year={2025}
          }
        

Acknowledgements

This work is supported in part by the Defense Advanced Research Projects Agency (DARPA) under the program "Design of Robustly Implementable Autonomous and Intelligent Machines (TIAMAT)", award number HR00112490425.


Website inspired by PRoPE