Please see the survey paper "A Survey: Learning Embodied Intelligence from Physical Simulators and World Models" for full details and comprehensive information.
Author list: Xiaoxiao Long, Qingrui Zhao, Kaiwen Zhang, Zihao Zhang, Dingrui Wang, Yumeng Liu, Zhengjie Shu, Yi Lu, Shouzheng Wang, Xinzhe Wei, Wei Li, Wei Yin, Yao Yao, Jia Pan, Qiu Shen, Ruigang Yang, Xun Cao, Qionghai Dai
- 1. Introduction
- 2. Levels of Intelligent Robot
- 3. Robotic Mobility, Dexterity and Interaction
- 4. Simulators
- 5. World Models
- 6. World Models for Intelligent Robots
Embodied intelligence provides a foundation for creating robots that can truly understand and reason about the world in a more human-like manner. Two key technologies underpin intelligent robot behavior: physical simulators and world models. Physical simulators provide controlled, high-fidelity environments for training and evaluating robotic agents, enabling safe and efficient development of complex behaviors, while world models equip robots with internal representations of their surroundings, enabling predictive planning and adaptive decision-making beyond direct sensory input. The synergy between the two enhances robots' autonomy, adaptability, and task performance across diverse scenarios.
This repository aims to collect and organize research and resources related to learning embodied AI through the integration of physical simulators and world models.
To address the absence of a comprehensive grading system that integrates the dimensions of "intelligent cognition" and "autonomous behavior," we outline a capability grading model for intelligent robots, ranging from IR-L0 to IR-L4. This model covers the entire technological evolution, from basic mechanical operation levels to advanced social interaction capabilities.
Model Predictive Control (MPC)
| Paper | Date | Venue |
|---|---|---|
| Model Predictive Control: Theory, Computation, and Design | 2017 | Nob Hill Publishing, LLC |
| Model predictive control of legged and humanoid robots: models and algorithms | 2023-02 | Advanced Robotics |
| An integrated system for real-time model predictive control of humanoid robots | 2013-10 | Humanoids 2013 |
| Whole-body model-predictive control applied to the HRP-2 humanoid | 2015-09 | IROS 2015 |
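The MPC papers above share one control loop: at every step, optimize a finite-horizon action sequence against a dynamics model, apply only the first action, then re-plan. A minimal random-shooting sketch on a toy 1-D double integrator (all dynamics and parameters here are illustrative, not from any cited paper):

```python
import numpy as np

def rollout_cost(x0, actions, dynamics):
    """Accumulated quadratic cost of simulating one candidate action sequence."""
    x, cost = x0, 0.0
    for u in actions:
        x = dynamics(x, u)
        cost += float(x @ x + 0.1 * u * u)
    return cost

def mpc_step(x0, dynamics, horizon=10, n_samples=256, seed=0):
    """Random-shooting MPC: sample action sequences, return the first action
    of the cheapest rollout; the controller re-plans at every step."""
    rng = np.random.default_rng(seed)
    candidates = rng.uniform(-1.0, 1.0, size=(n_samples, horizon))
    costs = [rollout_cost(x0, seq, dynamics) for seq in candidates]
    return candidates[int(np.argmin(costs))][0]

# Toy double integrator: position/velocity state, bounded acceleration input.
dyn = lambda x, u: np.array([x[0] + 0.1 * x[1], x[1] + 0.1 * u])
x = np.array([1.0, 0.0])
for _ in range(50):
    x = dyn(x, mpc_step(x, dyn))
print(np.linalg.norm(x))  # the state is regulated toward the origin
```

Real humanoid MPC replaces the sampling step with structured solvers (QPs, DDP) over far richer models, but the receding-horizon skeleton is the same.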
Whole-Body Control (WBC)
| Paper | Date | Venue |
|---|---|---|
| Humanoid Robotics: A Reference | 2017 | Springer |
| A whole-body control framework for humanoids operating in human environments | 2006-05 | ICRA 2006 |
| Hierarchical quadratic programming: Fast online humanoid-robot motion generation | 2014-05 | The International Journal of Robotics Research |
| Optimization-based locomotion planning, estimation, and control design for the atlas humanoid robot | 2015-07 | Autonomous Robots |
| Compliant locomotion using whole-body control and divergent component of motion tracking | 2015-05 | ICRA 2015 |
| ExBody2: Advanced Expressive Humanoid Whole-Body Control | 2024-12 | arXiv |
| A Unified and General Humanoid Whole-Body Controller for Fine-Grained Locomotion | 2025-02 | arXiv |
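A unifying idea behind these whole-body controllers is strict task prioritization: a secondary objective (e.g. posture) is only allowed to use motion in the nullspace of the primary task, so it can never disturb it. A minimal kinematic sketch (the Jacobian and task values below are made-up illustrative numbers):

```python
import numpy as np

def prioritized_velocities(J1, dx1, dq_posture):
    """Two-level task-priority resolution:
    dq = J1^+ dx1 + (I - J1^+ J1) dq_posture."""
    J1_pinv = np.linalg.pinv(J1)
    N1 = np.eye(J1.shape[1]) - J1_pinv @ J1      # nullspace projector of task 1
    return J1_pinv @ dx1 + N1 @ dq_posture

# Hypothetical 2x4 Jacobian of a planar 4-DoF arm (illustrative numbers).
J1 = np.array([[1.0, 0.8, 0.5, 0.2],
               [0.0, 0.6, 0.9, 0.4]])
dx1 = np.array([0.1, -0.05])                     # desired end-effector velocity
dq_post = np.array([0.0, 0.1, -0.1, 0.05])       # posture preference
dq = prioritized_velocities(J1, dx1, dq_post)
print(J1 @ dq)  # equals dx1: the secondary task does not disturb task 1
```

The hierarchical-QP papers above generalize exactly this projection to many priority levels with inequality constraints (contacts, torque limits).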
Reinforcement Learning
| Paper | Date | Venue |
|---|---|---|
| Reinforcement learning in robotics: A survey | 2013-08 | The International Journal of Robotics Research |
| Learning-based legged locomotion: State of the art and future perspectives | 2025-01 | The International Journal of Robotics Research |
| Reinforcement learning of dynamic motor sequence: Learning to stand up | 1998-10 | IROS 1998 |
| DeepLoco: dynamic locomotion skills using hierarchical deep reinforcement learning | 2017-07 | TOG |
| Learning symmetric and low-energy locomotion | 2018-07 | TOG |
| Emergence of locomotion behaviours in rich environments | 2017-10 | arXiv |
| Iterative Reinforcement Learning Based Design of Dynamic Locomotion Skills for Cassie | 2019-03 | arXiv |
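Stripped of deep networks and rich simulators, the RL recipe in these papers is trial-and-error value learning. A tabular Q-learning sketch on a toy 5-state chain MDP shows the skeleton (all numbers are illustrative):

```python
import numpy as np

# Tabular Q-learning on a 5-state chain MDP: from each non-terminal state,
# action 1 moves right, action 0 moves left; reaching the last state pays 1.
rng = np.random.default_rng(0)
n_states, n_actions = 5, 2
Q = np.zeros((n_states, n_actions))
alpha, gamma, eps = 0.5, 0.95, 0.2

for _ in range(2000):
    s = int(rng.integers(n_states - 1))          # exploring starts
    while s < n_states - 1:
        # epsilon-greedy action selection
        a = int(rng.integers(n_actions)) if rng.random() < eps else int(np.argmax(Q[s]))
        s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
        r = 1.0 if s_next == n_states - 1 else 0.0
        # one-step temporal-difference update
        Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])
        s = s_next

print(np.argmax(Q[:-1], axis=1))  # greedy policy: always move right
```

The locomotion papers above scale this same TD backbone with neural function approximators, GPU-parallel simulation, and carefully shaped rewards.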
Imitation Learning
| Paper | Date | Venue |
|---|---|---|
| Diffusion policy: Visuomotor policy learning via action diffusion | 2024-10 | The International Journal of Robotics Research |
| 3D Diffusion Policy | 2024-03 | arXiv |
| Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware | 2023-04 | arXiv |
| RDT-1B: a Diffusion Foundation Model for Bimanual Manipulation | 2024-10 | arXiv |
| DiffuseLoco: Real-time legged locomotion control with diffusion from offline datasets | 2024-04 | arXiv |
| AMP: adversarial motion priors for stylized physics-based character control | 2021-07 | TOG |
| Whole-body Humanoid Robot Locomotion with Human Reference | 2024-10 | IROS 2024 |
| DexCap: Scalable and portable mocap data collection system for dexterous manipulation | 2024-03 | arXiv |
| Open-TeleVision: Teleoperation with immersive active visual feedback | 2024-07 | arXiv |
| Visual Imitation Enables Contextual Humanoid Control | 2025-05 | arXiv |
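Most of the imitation pipelines above, from ALOHA-style teleoperation to mocap collection, ultimately bottom out in supervised learning on demonstrated state-action pairs (behavior cloning). A minimal sketch with a synthetic linear "expert" standing in for teleoperated demonstrations (all gains and shapes are made up for illustration):

```python
import numpy as np

# Behavior cloning reduces imitation to supervised regression: fit a policy
# that maps observed states to the expert's recorded actions.
rng = np.random.default_rng(0)
K_true = np.array([[-1.5, -0.4]])             # hypothetical expert feedback gains
states = rng.normal(size=(500, 2))            # recorded observations
actions = states @ K_true.T + 0.01 * rng.normal(size=(500, 1))  # noisy expert actions

# Least-squares fit of a linear policy -- the simplest behavior-cloning model.
K_hat, *_ = np.linalg.lstsq(states, actions, rcond=None)
K_hat = K_hat.T
print(np.round(K_hat, 2))  # recovers approximately [[-1.5, -0.4]]
```

Diffusion policies replace the least-squares regressor with a conditional generative model over action sequences, which handles multimodal demonstrations that a single regression cannot.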
Vision-Language-Action Models (VLA)
| Paper | Date | Venue |
|---|---|---|
| RT-2: Vision-language-action models transfer web knowledge to robotic control | 2023-07 | CoRL 2023 |
| OpenVLA: An open-source vision-language-action model | 2024-06 | arXiv |
| 3D-VLA: A 3D Vision-Language-Action Generative World Model | 2024-03 | arXiv |
| Magma: A foundation model for multimodal AI agents | 2025-06 | CVPR 2025 |
| $\pi_0$: A Vision-Language-Action Flow Model for General Robot Control | 2024-10 | arXiv |
| FAST: Efficient action tokenization for vision-language-action models | 2025-01 | arXiv |
| Hi Robot: Open-ended instruction following with hierarchical vision-language-action models | 2025-02 | arXiv |
| TinyVLA: Towards Fast, Data-Efficient Vision-Language-Action Models for Robotic Manipulation | 2024-09 | arXiv |
| Vision-Language-Action Models: Concepts, Progress, Applications and Challenges | 2025-05 | arXiv |
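Several of these VLA papers (e.g. the action-tokenization line) hinge on turning continuous robot actions into discrete tokens that a language-model backbone can emit. A toy uniform-binning tokenizer illustrates the idea; the real methods learn compressed tokenizations, and `N_BINS` and the ranges here are arbitrary:

```python
import numpy as np

# VLA models emit actions as discrete tokens. The simplest tokenizer bins
# each continuous action dimension uniformly into N_BINS buckets.
N_BINS = 256

def tokenize(action, low=-1.0, high=1.0):
    """Map continuous actions in [low, high] to integer tokens in [0, N_BINS-1]."""
    norm = (np.clip(action, low, high) - low) / (high - low)
    return np.minimum((norm * N_BINS).astype(int), N_BINS - 1)

def detokenize(tokens, low=-1.0, high=1.0):
    """Invert the binning to bin centers."""
    return low + (tokens + 0.5) / N_BINS * (high - low)

a = np.array([0.3, -0.7, 0.0])
t = tokenize(a)
print(t, detokenize(t))  # round-trip error bounded by half a bin width
```

Uniform binning trades resolution for vocabulary size, which is why learned tokenizers that compress correlated action chunks can be markedly more efficient.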
Related Survey
| Paper | Date | Venue |
|---|---|---|
| Humanoid Locomotion and Manipulation: Current Progress and Challenges in Control, Planning, and Learning | 2025-04 | arXiv |
| A Comprehensive Review of Humanoid Robots | 2025-03 | SmartBot |
| Recent Progress in Legged Robots Locomotion Control | 2021-06 | Current Robotics Reports |
Legged Locomotion
Gripper-based manipulation
| Paper | Date | Venue | Code |
|---|---|---|---|
| Diffusion Policy: Visuomotor Policy Learning via Action Diffusion | 2023-03 | RSS 2023 | Code |
| RT-1: Robotics Transformer for Real-World Control at Scale | 2022-12 | arXiv | Code |
| RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control | 2023-07 | CoRL 2023 | Code |
| Perceiver-Actor: A Multi-Task Transformer for Robotic Manipulation | 2022-09 | CoRL 2022 | Code |
| Act3D: 3D feature field transformers for multi-task robotic manipulation | 2023-06 | arXiv | Code |
| Modeling of deformable objects for robotic manipulation: A tutorial and review | 2020-09 | Frontiers in Robotics and AI | -- |
| 6-DOF Grasping for Target-driven Object Manipulation in Clutter | 2019-12 | ICRA 2020 | -- |
| Cable manipulation with a tactile-reactive gripper | 2021-12 | IJRR 2021 | -- |
Dexterous hand manipulation
Bimanual Manipulation Task
| Paper | Date | Venue | Code |
|---|---|---|---|
| Stabilize to act: Learning to coordinate for bimanual manipulation | 2023-09 | CoRL 2023 | -- |
| Interactive imitation learning of bimanual movement primitives | 2023-08 | TMECH | -- |
| Learning fine-grained bimanual manipulation with low-cost hardware | 2023-04 | RSS 2023 | Code |
| Mobile ALOHA: Learning Bimanual Mobile Manipulation with Low-Cost Whole-Body Teleoperation | 2024-01 | CoRL 2024 | Code |
| RoboTwin 2.0: A Scalable Data Generator and Benchmark with Strong Domain Randomization for Robust Bimanual Robotic Manipulation | 2025-06 | arXiv | Code |
| RDT-1B: A diffusion foundation model for bimanual manipulation | 2024-10 | arXiv | Code |
Whole-Body Manipulation Control
| Paper | Date | Venue | Code |
|---|---|---|---|
| TidyBot: Personalized robot assistance with large language models | 2023-12 | Autonomous Robots | -- |
| Open-world object manipulation using pre-trained vision-language models | 2023-02 | arXiv | Website |
| Harmon: Whole-body motion generation of humanoid robots from language descriptions | 2024-10 | CoRL 2024 | Website |
| OKAMI: Teaching humanoid robots manipulation skills through single video imitation | 2024-10 | CoRL 2024 | Code |
| Generalizable Humanoid Manipulation with 3D Diffusion Policies | 2024-10 | arXiv | Code |
| OmniH2O: Universal and Dexterous Human-to-Humanoid Whole-Body Teleoperation and Learning | 2024-06 | CoRL 2024 | Code |
| HumanPlus: Humanoid Shadowing and Imitation from Humans | 2024-06 | CoRL 2024 | Code |
| BEHAVIOR Robot Suite: Streamlining Real-World Whole-Body Manipulation for Everyday Household Activities | 2025-03 | arXiv | Code |
Foundation Models in Humanoid Robot Manipulation
| Paper | Date | Venue | Code |
|---|---|---|---|
| Do As I Can, Not As I Say: Grounding language in robotic affordances | 2022-04 | arXiv | Code |
| PaLM-E: An embodied multimodal language model | 2023-03 | ICML 2023 | Website |
| Inner Monologue: Embodied reasoning through planning with language models | 2022-07 | arXiv | Website |
| Code as Policies: Language model programs for embodied control | 2022-09 | ICRA 2023 | Code |
| STIV: Scalable Text and Image Conditioned Video Generation | 2024-12 | arXiv | -- |
| GR00T N1: An open foundation model for generalist humanoid robots | 2025-03 | arXiv | Code |
| $\pi_0$: A Vision-Language-Action Flow Model for General Robot Control | 2024-10 | arXiv | Code |
| OpenVLA: An open-source vision-language-action model | 2024-06 | arXiv | Code |
| GR-2: A generative video-language-action model with web-scale knowledge for robot manipulation | 2024-10 | arXiv | Website |
Related Survey
| Paper | Date | Venue |
|---|---|---|
| Humanlike service robots: A systematic literature review and research agenda | 2024-08 | Psychology & Marketing |
| Human–robot collaboration and machine learning: A systematic review of recent research | 2023-02 | Robotics and Computer-Integrated Manufacturing |
| Emotion Recognition for Human-Robot Interaction: Recent Advances and Future Perspectives | 2020-12 | Frontiers in Robotics and AI |
| Application, Development and Future Opportunities of Collaborative Robots (Cobots) in Manufacturing: A Literature Review | 2022-04 | International Journal of Human–Computer Interaction |
| Towards Social AI: A Survey on Understanding Social Interactions | 2024-09 | arXiv |
| Human–robot interaction: A review and analysis on variable admittance control, safety, and perspectives | 2022-07 | Machines |
| Human-robot perception in industrial environments: A survey | 2021-02 | Sensors |
Cognitive Collaboration
Physical Reliability
Social Embeddedness
Related Survey
| Paper | Date | Venue | Code | Application |
|---|---|---|---|---|
| A Review of Physics Simulators | 2021 | IEEE Access | – | Simulator Survey |
| Review of Embodied AI | 2025 | arXiv | – | Embodied AI Survey |
| A Survey of Embodied AI | 2022 | arXiv | – | Embodied AI Survey |
Related Works
| Paper | Date | Venue | Code | Application |
|---|---|---|---|---|
| ManiSkill3 | 2024 | arXiv | – | Manipulation Benchmark |
| ManiSkill2 | 2023 | ICLR | – | Manipulation Benchmark |
| Analysis using DEM | 2020 | IEEE Aerospace | – | Granular Simulation |
| Mobile Aloha | 2024 | arXiv | – | Teleoperation |
| Open-Television | 2024 | arXiv | – | Teleoperation |
| Universal Manipulation Interface | 2024 | arXiv | – | Imitation Learning |
Overview and Documentation
| Paper | Date | Venue | Code | Application |
|---|---|---|---|---|
| Webots: Professional Mobile Robot Simulation | 2004 | JARS | – | Simulator Platform |
| Design and use paradigms for Gazebo | 2004 | IROS | – | Simulator Platform |
| MuJoCo: A physics engine for model-based control | 2012 | IROS | – | Simulator Platform |
| PyBullet: Python module for physics simulation | 2016 | GitHub | GitHub | Simulator Platform |
| CoppeliaSim (formerly V-REP) | 2013 | IROS | – | Simulator Platform |
| Isaac Gym: GPU-based physics simulation for robot learning | 2021 | arXiv | – | Simulator Platform |
| Isaac Sim | 2025 | NVIDIA Developer | – | Simulator Platform |
| Isaac Lab Documentation | 2025 | NVIDIA Developer | – | Simulator Platform |
| SAPIEN: A simulated part-based interactive environment | 2020 | CVPR | – | Simulator Platform |
| Genesis: A Universal and Generative Physics Engine | 2024 | GitHub | GitHub | Simulator Platform |
| MuJoCo Programming Guide | 2025 | Docs | – | Developer Guide |
| Newton Isaac Sim Project | 2024 | GitHub | GitHub | Simulator Platform |
| Newton Physics Engine Announcement | 2025 | NVIDIA Blog | – | Physics Engine |
Physical Simulation Engines and Platforms
| Paper | Date | Venue | Code | Application |
|---|---|---|---|---|
| LS Group Interact Kinematics | 2025 | Docs | – | Kinematics Documentation |
| NVIDIA Omniverse | 2025 | NVIDIA Developer | – | 3D Simulation & Collaboration Platform |
| NVIDIA PhysX System Software | 2021 | NVIDIA Developer | – | Real-Time Physics Engine |
Rendering Engines and Frameworks
| Paper | Date | Venue | Code | Application |
|---|---|---|---|---|
| LuisaRender | 2022 | TOG | – | Rendering Framework |
| Pyrender | 2019 | GitHub | GitHub | Rendering |
| HydraRendererInfo | 2019 | GitHub | GitHub | Rendering |
| The Alliance for OpenUSD | 2023 | AOUSD | – | Open Universal Scene Description (USD) Standard |
| OpenGL: The Industry Standard for High‑Performance Graphics | 1992 | Khronos Group | – | Cross-Platform Graphics API |
| Vulkan: Cross‑Platform 3D Graphics and Compute API | 2016 | Khronos Group | – | Low-Level Graphics and Compute API |
| NVIDIA OptiX™ Ray Tracing Engine | 2024 | NVIDIA Developer | – | GPU-Accelerated Ray Tracing Framework |
Representative Architectures of World Models
Core roles of World Models
WMs as Neural Simulators for Autonomous Driving
WMs as Dynamic Models for Autonomous Driving
WMs as Reward Models for Autonomous Driving
| Paper | Date | Venue | Code | Application |
|---|---|---|---|---|
| SEM2: Enhance Sample Efficiency and Robustness of End-to-end Urban Autonomous Driving via Semantic Masked World Model | 2024-05 | T-ITS | - | Reinforcement Learning |
| Iso-Dream: Isolating and Leveraging Noncontrollable Visual Dynamics in World Models | 2022-05 | NeurIPS 2022 | GitHub | Reinforcement Learning |
| Vista: A Generalizable Driving World Model with High Fidelity and Versatile Controllability | 2024-05 | NeurIPS 2024 | GitHub | Reinforcement Learning |
| Driving into the Future: Multiview Visual Forecasting and Planning with World Model for Autonomous Driving | 2023-11 | CVPR 2024 | GitHub | Motion Planning |
| WoTE: World-model-based End-to-end Autonomous Driving | 2025-04 | ICCV 2025 | GitHub | Motion Planning |
The following table compares research on world models in robotics in terms of model input, architecture, experimental platform, and code availability.
Neural Simulators
| Paper | Date | Venue | Code |
|---|---|---|---|
| Whale: Towards generalizable and scalable world models for embodied decision-making | 2024-08 | arXiv | - |
| RoboDreamer: Learning Compositional World Models for Robot Imagination | 2024-08 | ICML 2024 | GitHub |
| Dream to Manipulate: Compositional World Models Empowering Robot Imitation Learning with Imagination | 2024-11 | ICLR 2025 | GitHub |
| EnerVerse: Envisioning Embodied Future Space for Robotics Manipulation | 2025-01 | arXiv | GitHub |
| Cosmos World Foundation Model Platform for Physical AI | 2025-03 | arXiv | GitHub |
| WorldEval: World Model as Real-World Robot Policies Evaluator | 2025-05 | arXiv | GitHub |
| DreamGen: Unlocking Generalization in Robot Learning through Neural Trajectories | 2025-05 | arXiv | GitHub |
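In its simplest form, the "neural simulator" framing above reduces to learning a one-step dynamics model f(s, a) → s' from logged transitions and rolling it forward in imagination instead of querying the real system. A linear toy sketch (the scalar environment and all constants are illustrative, not from any cited work):

```python
import numpy as np

# A world model at its simplest: fit one-step dynamics from data, then
# "imagine" trajectories without touching the real environment.
rng = np.random.default_rng(1)

def true_env(s, a):
    return 0.9 * s + 0.1 * a     # unknown to the agent; used only for data

# Collect transitions, then fit s' ~ A*s + B*a by least squares.
S = rng.normal(size=(200, 1))
A = rng.normal(size=(200, 1))
X = np.hstack([S, A])
W, *_ = np.linalg.lstsq(X, true_env(S, A), rcond=None)   # W -> [[0.9], [0.1]]

def imagine(s0, actions):
    """Roll the learned model forward in imagination."""
    s, traj = float(s0), []
    for a in actions:
        s = float(np.array([s, a]) @ W)
        traj.append(s)
    return traj

traj = imagine(1.0, [0.0] * 5)
print(traj)  # geometric decay toward 0, roughly 0.9, 0.81, ..., 0.9**5
```

The generative world models in the table replace this linear regressor with video diffusion or latent transformer dynamics, but the collect-fit-imagine loop is the same.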