NJU3DV-LoongGroup/Embodied-World-Models-Survey
A Survey: Learning Embodied Intelligence from Physical Simulators and World Models


🤝   Citation

Please visit A Survey: Learning Embodied Intelligence from Physical Simulators and World Models for more details and comprehensive information.

Author list: Xiaoxiao Long, Qingrui Zhao, Kaiwen Zhang, Zihao Zhang, Dingrui Wang, Yumeng Liu, Zhengjie Shu, Yi Lu, Shouzheng Wang, Xinzhe Wei, Wei Li, Wei Yin, Yao Yao, Jia Pan, Qiu Shen, Ruigang Yang, Xun Cao, Qionghai Dai

Table of Contents

1. Introduction

Embodied intelligence provides a foundation for creating robots that can truly understand and reason about the world in a human-like manner. Two key technologies are central to enabling intelligent behavior in robots: physical simulators and world models. Physical simulators provide controlled, high-fidelity environments for training and evaluating robotic agents, allowing safe and efficient development of complex behaviors, while world models equip robots with internal representations of their surroundings, enabling predictive planning and adaptive decision-making beyond direct sensory input. The synergy between the two enhances robots' autonomy, adaptability, and task performance across diverse scenarios.

This repository aims to collect and organize research and resources related to learning embodied AI through the integration of physical simulators and world models.
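To make the world-model idea concrete, the following minimal sketch (our illustration, not taken from any surveyed paper; the linear `dynamics` function, quadratic cost, and random-shooting search are all stand-in assumptions) shows how a learned dynamics model supports predictive planning: candidate action sequences are rolled out in the model's "imagination" and the best first action is returned.

```python
import numpy as np

rng = np.random.default_rng(0)

def dynamics(state, action):
    # Stand-in learned model: in practice a neural network trained on the
    # robot's experience predicts the next (latent) state.
    return 0.9 * state + 0.1 * action

def cost(state, goal):
    # Distance-to-goal cost, evaluated entirely inside the model's imagination.
    return float(np.sum((state - goal) ** 2))

def plan(state, goal, horizon=10, n_candidates=256):
    """Random-shooting planning over the learned model: sample action
    sequences, roll each out in imagination, return the best first action."""
    best_action, best_cost = None, np.inf
    for _ in range(n_candidates):
        actions = rng.uniform(-1.0, 1.0, size=(horizon, state.shape[0]))
        s, total = state.copy(), 0.0
        for a in actions:
            s = dynamics(s, a)
            total += cost(s, goal)
        if total < best_cost:
            best_cost, best_action = total, actions[0]
    return best_action

first_action = plan(np.zeros(2), np.ones(2))
```

The same predict-evaluate-select loop underlies most model-based methods in this survey; only the model class and the search procedure change.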

2. Levels of Intelligent Robots

To address the absence of a comprehensive grading system that integrates the dimensions of "intelligent cognition" and "autonomous behavior," we outline a capability grading model for intelligent robots, ranging from IR-L0 to IR-L4. This model spans the entire technological evolution, from basic mechanical operation to advanced social interaction capabilities.

3. Robotic Mobility, Dexterity and Interaction

Related Robotic Techniques

Model Predictive Control, MPC

| Paper | Date | Venue |
| --- | --- | --- |
| Model Predictive Control: Theory, Computation, and Design | 2017 | Nob Hill Publishing, LLC |
| Model predictive control of legged and humanoid robots: models and algorithms | 2023-02 | Advanced Robotics |
| An integrated system for real-time model predictive control of humanoid robots | 2013-10 | Humanoids 2013 |
| Whole-body model-predictive control applied to the HRP-2 humanoid | 2015-09 | IROS 2015 |
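The MPC papers above all build on the receding-horizon principle: optimize a short action sequence against a predictive model, apply only the first input, then re-plan at the next step. A minimal sketch on an assumed toy double-integrator plant (the matrices, cost weights, and discrete action set are illustrative, not from any listed paper):

```python
import itertools
import numpy as np

# Double-integrator model x_{k+1} = A x_k + B u_k (a toy plant we assume
# for illustration): state = [position, velocity], dt = 0.1 s.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
ACTIONS = (-1.0, 0.0, 1.0)  # coarse discrete input set

def rollout_cost(x, u_seq):
    # Predicted quadratic cost of applying u_seq from state x.
    total = 0.0
    for u in u_seq:
        x = A @ x + B.flatten() * u
        total += x[0] ** 2 + 0.1 * x[1] ** 2 + 0.01 * u ** 2
    return total

def mpc_step(x, horizon=4):
    """Receding horizon: search all short action sequences, but apply
    only the first input; the rest is re-planned at the next step."""
    best = min(itertools.product(ACTIONS, repeat=horizon),
               key=lambda seq: rollout_cost(x, seq))
    return best[0]

# Closed loop: drive the position from 1.0 toward the origin.
x = np.array([1.0, 0.0])
for _ in range(50):
    u = mpc_step(x)
    x = A @ x + B.flatten() * u
```

Real humanoid MPC replaces the exhaustive search with gradient-based solvers and the toy plant with centroidal or whole-body dynamics, but the loop structure is the same.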
Whole-Body Control, WBC

| Paper | Date | Venue |
| --- | --- | --- |
| Humanoid Robotics: A Reference | 2017 | Springer |
| A whole-body control framework for humanoids operating in human environments | 2006-05 | ICRA 2006 |
| Hierarchical quadratic programming: Fast online humanoid-robot motion generation | 2014-05 | The International Journal of Robotics Research |
| Optimization-based locomotion planning, estimation, and control design for the Atlas humanoid robot | 2015-07 | Autonomous Robots |
| Compliant locomotion using whole-body control and divergent component of motion tracking | 2015-05 | ICRA 2015 |
| ExBody2: Advanced Expressive Humanoid Whole-Body Control | 2024-12 | arXiv |
| A Unified and General Humanoid Whole-Body Controller for Fine-Grained Locomotion | 2025-02 | arXiv |
Reinforcement Learning

| Paper | Date | Venue |
| --- | --- | --- |
| Reinforcement learning in robotics: A survey | 2013-08 | The International Journal of Robotics Research |
| Learning-based legged locomotion: State of the art and future perspectives | 2025-01 | The International Journal of Robotics Research |
| Reinforcement learning of dynamic motor sequence: Learning to stand up | 1998-10 | IROS 1998 |
| DeepLoco: dynamic locomotion skills using hierarchical deep reinforcement learning | 2017-07 | TOG |
| Learning symmetric and low-energy locomotion | 2018-07 | TOG |
| Emergence of locomotion behaviours in rich environments | 2017-10 | arXiv |
| Iterative Reinforcement Learning Based Design of Dynamic Locomotion Skills for Cassie | 2019-03 | arXiv |
Imitation Learning

| Paper | Date | Venue |
| --- | --- | --- |
| Diffusion Policy: Visuomotor policy learning via action diffusion | 2024-10 | The International Journal of Robotics Research |
| 3D Diffusion Policy | 2024-03 | arXiv |
| Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware | 2023-04 | arXiv |
| RDT-1B: a Diffusion Foundation Model for Bimanual Manipulation | 2024-10 | arXiv |
| DiffuseLoco: Real-time legged locomotion control with diffusion from offline datasets | 2024-04 | arXiv |
| AMP: adversarial motion priors for stylized physics-based character control | 2021-07 | TOG |
| Whole-body Humanoid Robot Locomotion with Human Reference | 2024-10 | IROS 2024 |
| DexCap: Scalable and portable mocap data collection system for dexterous manipulation | 2024-03 | arXiv |
| Open-TeleVision: Teleoperation with immersive active visual feedback | 2024-07 | arXiv |
| Visual Imitation Enables Contextual Humanoid Control | 2025-05 | arXiv |
Vision-Language-Action Models, VLA

| Paper | Date | Venue |
| --- | --- | --- |
| RT-2: Vision-language-action models transfer web knowledge to robotic control | 2023-07 | CoRL 2023 |
| OpenVLA: An open-source vision-language-action model | 2024-06 | arXiv |
| 3D-VLA: A 3D Vision-Language-Action Generative World Model | 2024-03 | arXiv |
| Magma: A foundation model for multimodal AI agents | 2025-06 | CVPR 2025 |
| $π_0$: A Vision-Language-Action Flow Model for General Robot Control | 2024-10 | arXiv |
| FAST: Efficient action tokenization for vision-language-action models | 2025-01 | arXiv |
| Hi Robot: Open-ended instruction following with hierarchical vision-language-action models | 2025-02 | arXiv |
| TinyVLA: Towards Fast, Data-Efficient Vision-Language-Action Models for Robotic Manipulation | 2024-09 | arXiv |
| Vision-Language-Action Models: Concepts, Progress, Applications and Challenges | 2025-05 | arXiv |
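Several of the VLA models above turn continuous robot actions into discrete tokens so a language-model backbone can emit them. A minimal uniform-binning tokenizer in that spirit (the bin count and action range are illustrative, not any particular model's values):

```python
import numpy as np

# Uniform-binning action tokenizer sketch (bin count and action range
# are assumptions for illustration, not any published model's exact setup).
N_BINS = 256
LOW, HIGH = -1.0, 1.0

def tokenize(action):
    """Map each continuous action dimension to a discrete token id."""
    clipped = np.clip(action, LOW, HIGH)
    return np.floor((clipped - LOW) / (HIGH - LOW) * (N_BINS - 1)).astype(int)

def detokenize(tokens):
    """Map token ids back to bin-center continuous values."""
    return LOW + (tokens + 0.5) / (N_BINS - 1) * (HIGH - LOW)

a = np.array([0.0, -1.0, 0.73])
t = tokenize(a)
a_hat = detokenize(t)   # round-trip error bounded by half a bin width
```

More recent tokenizers (e.g. the compression-based schemes the FAST paper explores) replace uniform binning with learned codes, but the round-trip contract is the same.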

Robotic Locomotion

Related Survey

| Paper | Date | Venue |
| --- | --- | --- |
| Humanoid Locomotion and Manipulation: Current Progress and Challenges in Control, Planning, and Learning | 2025-04 | arXiv |
| A Comprehensive Review of Humanoid Robots | 2025-03 | SmartBot |
| Recent Progress in Legged Robots Locomotion Control | 2021-06 | Current Robotics Reports |
Legged Locomotion

| Paper | Date | Venue |
| --- | --- | --- |
| Compliant terrain adaptation for biped humanoids without measuring ground surface and contact force | 2009-02 | T-RO |
| Online Learning of Uneven Terrain for Humanoid Bipedal Walking | 2010-07 | AAAI 2010 |
| Practical bipedal walking control on uneven terrain using surface learning and push recovery | 2011-09 | IROS 2011 |
| Biped walking stabilization based on linear inverted pendulum tracking | 2010-09 | IROS 2010 |
| Dynamic walking with compliance on a Cassie bipedal robot | 2019-06 | European Control Conference |
| Dynamic walking on compliant and uneven terrain using DCM and passivity-based whole-body control | 2019-10 | Humanoids 2019 |
| Fast Contact-Implicit Model Predictive Control | 2024-01 | T-RO |
| Efficient Anytime CLF Reactive Planning System for a Bipedal Robot on Undulating Terrain | 2023-01 | T-RO |
| Learning quadrupedal locomotion over challenging terrain | 2020-10 | Science Robotics |
| Blind bipedal stair traversal via sim-to-real reinforcement learning | 2021-07 | Robotics: Science and Systems (RSS) |
| Learning vision-based bipedal locomotion for challenging terrain | 2024-05 | ICRA 2024 |
| Learning humanoid locomotion with perceptive internal model | 2024-11 | arXiv |
| Humanoid parkour learning | 2024-06 | arXiv |
| Unified modeling and control of walking and running on the spring-loaded inverted pendulum | 2016-08 | T-RO |
| Capturability-based analysis and control of legged locomotion, part 2: Application to M2V2, a lower-body humanoid | 2012-09 | IJRR |
| Convex model predictive control of single rigid body model on SO(3) for versatile dynamic legged motions | 2023-05 | ICRA 2023 |
| Bipedal hopping: Reduced-order model embedding via optimization-based control | 2018-10 | IROS 2018 |
| Vertical Jump of a Humanoid Robot With CoP-Guided Angular Momentum Control and Impact Absorption | 2023-05 | T-RO |
| CDM-MPC: An integrated dynamic planning and control framework for bipedal robots jumping | 2024-06 | RAL |
| Optimizing bipedal locomotion for the 100m dash with comparison to human running | 2023-05 | ICRA 2023 |
| Reinforcement learning for versatile, dynamic, and robust bipedal locomotion control | 2024-10 | IJRR |
| Expressive Whole-Body Control for Humanoid Robots | 2024-09 | RSS |
| ExBody2: Advanced expressive humanoid whole-body control | 2024-12 | arXiv |
| OmniH2O: Universal and dexterous human-to-humanoid whole-body teleoperation and learning | 2024-06 | CoRL 2024 |
| ASAP: Aligning simulation and real-world physics for learning agile humanoid whole-body skills | 2025-02 | arXiv |

Robotic Manipulation

Gripper-based manipulation

| Paper | Date | Venue | Code |
| --- | --- | --- | --- |
| Diffusion Policy: Visuomotor Policy Learning via Action Diffusion | 2023-03 | RSS 2023 | Code |
| RT-1: Robotics Transformer for Real-World Control at Scale | 2022-12 | arXiv | Code |
| RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control | 2023-07 | PMLR 2023 | Code |
| Perceiver-Actor: A Multi-Task Transformer for Robotic Manipulation | 2022-09 | CoRL 2022 | Code |
| Act3D: 3D feature field transformers for multi-task robotic manipulation | 2023-06 | arXiv | Code |
| Modeling of deformable objects for robotic manipulation: A tutorial and review | 2020-09 | Frontiers in Robotics and AI | -- |
| 6-DOF Grasping for Target-driven Object Manipulation in Clutter | 2019-12 | ICRA 2020 | -- |
| Cable manipulation with a tactile-reactive gripper | 2021-12 | IJRR 2021 | -- |
Dexterous hand manipulation

| Paper | Date | Venue | Code |
| --- | --- | --- | --- |
| DexGraspNet: A large-scale robotic dexterous grasp dataset for general objects based on simulation | 2023-05 | arXiv | Code |
| DexGraspNet 2.0: Learning Generative Dexterous Grasping in Large-scale Synthetic Cluttered Scenes | 2024-10 | CoRL 2024 | Code |
| HGC-Net: Deep anthropomorphic hand grasping in clutter | 2022-05 | ICRA 2022 | Code |
| Deep differentiable grasp planner for high-DOF grippers | 2022-02 | arXiv | -- |
| DexGrasp Anything: Towards Universal Robotic Dexterous Grasping with Physics Awareness | 2025-03 | arXiv | Code |
| UGG: Unified Generative Grasping | 2023-11 | ECCV 2024 | Code |
| SpringGrasp: Synthesizing Compliant, Dexterous Grasps under Shape Uncertainty | 2024-04 | arXiv | Code |
| A System for General In-Hand Object Re-Orientation | 2021-11 | CoRL 2021 | Code |
| Visual dexterity: In-hand reorientation of novel and complex object shapes | 2023-11 | Science Robotics 2023 | -- |
| Rotating without Seeing: Towards In-hand Dexterity through Touch | 2023-03 | RSS 2023 | Code |
| DexVIP: Learning Dexterous Grasping with Human Hand Pose Priors from Video | 2021-06 | CoRL 2021 | -- |
| DexGraspVLA: A Vision-Language-Action Framework Towards General Dexterous Grasping | 2025-02 | arXiv | Code |
Bimanual Manipulation Task

| Paper | Date | Venue | Code |
| --- | --- | --- | --- |
| Stabilize to act: Learning to coordinate for bimanual manipulation | 2023-09 | CoRL 2023 | -- |
| Interactive imitation learning of bimanual movement primitives | 2023-08 | TMECH | -- |
| Learning fine-grained bimanual manipulation with low-cost hardware | 2023-04 | RSS 2023 | Code |
| Mobile ALOHA: Learning Bimanual Mobile Manipulation with Low-Cost Whole-Body Teleoperation | 2024-01 | CoRL 2024 | Code |
| RoboTwin 2.0: A Scalable Data Generator and Benchmark with Strong Domain Randomization for Robust Bimanual Robotic Manipulation | 2025-06 | arXiv | Code |
| RDT-1B: a diffusion foundation model for bimanual manipulation | 2024-10 | arXiv | Code |
Whole-Body Manipulation Control

| Paper | Date | Venue | Code |
| --- | --- | --- | --- |
| TidyBot: Personalized robot assistance with large language models | 2023-12 | Autonomous Robots | -- |
| Open-world object manipulation using pre-trained vision-language models | 2023-02 | arXiv | Website |
| HARMON: Whole-body motion generation of humanoid robots from language descriptions | 2024-10 | CoRL 2024 | Website |
| OKAMI: Teaching humanoid robots manipulation skills through single video imitation | 2024-10 | CoRL 2024 | Code |
| Generalizable Humanoid Manipulation with 3D Diffusion Policies | 2024-10 | arXiv | Code |
| OmniH2O: Universal and Dexterous Human-to-Humanoid Whole-Body Teleoperation and Learning | 2024-06 | CoRL 2024 | Code |
| HumanPlus: Humanoid Shadowing and Imitation from Humans | 2024-06 | CoRL 2024 | Code |
| BEHAVIOR Robot Suite: Streamlining Real-World Whole-Body Manipulation for Everyday Household Activities | 2025-03 | arXiv | Code |
Foundation Models in Humanoid Robot Manipulation

| Paper | Date | Venue | Code |
| --- | --- | --- | --- |
| Do As I Can, Not As I Say: Grounding language in robotic affordances | 2022-04 | arXiv | Code |
| PaLM-E: An embodied multimodal language model | 2023-03 | ICML 2023 | Website |
| Inner Monologue: Embodied reasoning through planning with language models | 2022-07 | arXiv | Website |
| Code as Policies: Language model programs for embodied control | 2022-09 | ICRA 2023 | Code |
| STIV: Scalable Text and Image Conditioned Video Generation | 2024-12 | arXiv | -- |
| GR00T N1: An open foundation model for generalist humanoid robots | 2025-03 | arXiv | Code |
| $\pi_0$: A Vision-Language-Action Flow Model for General Robot Control | 2024-10 | arXiv | Code |
| OpenVLA: An open-source vision-language-action model | 2024-06 | arXiv | Code |
| GR-2: A generative video-language-action model with web-scale knowledge for robot manipulation | 2024-10 | arXiv | Website |

Human-Robot Interaction

Related Survey

| Paper | Date | Venue |
| --- | --- | --- |
| Humanlike service robots: A systematic literature review and research agenda | 2024-08 | Psychology & Marketing |
| Human–robot collaboration and machine learning: A systematic review of recent research | 2023-02 | Robotics and Computer-Integrated Manufacturing |
| Emotion Recognition for Human-Robot Interaction: Recent Advances and Future Perspectives | 2020-12 | Frontiers in Robotics and AI |
| Application, Development and Future Opportunities of Collaborative Robots (Cobots) in Manufacturing: A Literature Review | 2022-04 | International Journal of Human–Computer Interaction |
| Towards Social AI: A Survey on Understanding Social Interactions | 2024-09 | arXiv |
| Human–robot interaction: A review and analysis on variable admittance control, safety, and perspectives | 2022-07 | Machines |
| Human-robot perception in industrial environments: A survey | 2021-02 | Sensors |
Cognitive Collaboration

| Paper | Date | Venue | Code | Task |
| --- | --- | --- | --- | --- |
| Artificial cognition for social human–robot interaction: An implementation | 2017-06 | Artificial Intelligence | -- | Robot Cognitive Skills |
| Cognitive Interaction Analysis in Human–Robot Collaboration Using an Assembly Task | 2021-05 | Electronics | -- | Assembly Collaboration |
| Enhancing Robotic Collaborative Tasks Through Contextual Human Motion Prediction and Intention Inference | 2024-07 | International Journal of Social Robotics | -- | Human-Robot Handover |
| L3MVN: Leveraging Large Language Models for Visual Target Navigation | 2023-10 | IROS 2023 | GitHub | Object Goal Navigation |
| SG-Nav: Online 3D Scene Graph Prompting for LLM-based Zero-shot Object Navigation | 2024-10 | NeurIPS 2024 | GitHub | Object Goal Navigation |
| TriHelper: Zero-Shot Object Navigation with Dynamic Assistance | 2024-03 | IROS 2024 | -- | Object Goal Navigation |
| CogNav: Cognitive Process Modeling for Object Goal Navigation with LLMs | 2024-10 | NeurIPS 2024 OWA Workshop | -- | Object Goal Navigation |
| UniGoal: Towards Universal Zero-shot Goal-oriented Navigation | 2025-03 | CVPR 2025 | GitHub | Goal-oriented Navigation |
Physical Reliability

| Paper | Date | Venue | Code | Remarks |
| --- | --- | --- | --- | --- |
| A Comparative Study of Probabilistic Roadmap Planners | 2004 | Algorithmic Foundations of Robotics V | -- | Probabilistic Roadmap Planning (PRM) |
| Rapidly-exploring random trees: A new tool for path planning | 1998 | Research Report | -- | Rapidly-exploring Random Trees (RRT) |
| Sampling-based Algorithms for Optimal Motion Planning | 2011-05 | International Journal of Robotics Research | -- | PRM* and RRT* |
| Path planning for manipulators based on an improved probabilistic roadmap method | 2021-12 | Robotics and Computer-Integrated Manufacturing | -- | Path Planning for Manipulators |
| RRT-Connect: An efficient approach to single-query path planning | 2000-04 | ICRA 2000 | -- | Incrementally builds two RRTs from the start and goal |
| Homotopy-Aware RRT*: Toward Human-Robot Topological Path-Planning | 2016-03 | 11th ACM/IEEE International Conference on Human-Robot Interaction | -- | Human-robot Interactive Path-planning |
| Human-in-the-loop Robotic Manipulation Planning for Collaborative Assembly | 2019-09 | IEEE Transactions on Automation Science and Engineering | -- | Human-robot Interactive Path-planning |
| CHOMP: Gradient optimization techniques for efficient motion planning | 2009-05 | ICRA 2009 | MoveIt! | Gradient-based Trajectory Optimization |
| STOMP: Stochastic trajectory optimization for motion planning | 2011-05 | ICRA 2011 | MoveIt! | Probabilistic Trajectory Optimization |
| ITOMP: Incremental trajectory optimization for real-time replanning in dynamic environments | 2012-05 | Proceedings of the International Conference on Automated Planning and Scheduling | GitHub | Trajectory Optimization in Dynamic Environments |
| Motion planning with sequential convex optimization and convex collision checking | 2014 | IJRR 2014 | -- | Trajectory Optimization using SCO |
| Considering avoidance and consistency in motion planning for human-robot manipulation in a shared workspace | 2016-05 | ICRA 2016 | -- | Human-robot Interactive Path-planning |
| Considering Human Behavior in Motion Planning for Smooth Human-Robot Collaboration in Close Proximity | 2018-08 | 27th IEEE International Symposium on Robot and Human Interactive Communication | -- | Human-robot Interactive Path-planning |
| Continuous-time Gaussian process motion planning via probabilistic inference | 2017-07 | IJRR 2018 | -- | Gaussian Process Motion Planner (GPMP) |
| Simultaneous Scene Reconstruction and Whole-Body Motion Planning for Safe Operation in Dynamic Environments | 2021-03 | IROS 2021 | -- | GPMP for Whole-body Motion Planning in Dynamic Scenes |
| Admittance control for collaborative dual-arm manipulation | 2019-12 | International Conference on Advanced Robotics | -- | Admittance Control |
| Cooperative control of dual-arm robots in different human-robot collaborative tasks | 2020-02 | Assembly Automation | -- | Admittance Control |
| Control system design and methods for collaborative robots | 2023-01 | Applied Sciences | -- | Interactive Control System |
| Towards shared autonomy framework for human-aware motion planning in industrial human-robot collaboration | 2020-08 | International Conference on Automation Science and Engineering | -- | Industrial HRI |
| An actor-critic approach for legible robot motion planner | 2020-05 | ICRA 2020 | -- | RL Method |
| A task-adaptive deep reinforcement learning framework for dual-arm robot manipulation | 2024-01 | IEEE Transactions on Automation Science and Engineering | -- | RL Method |
| Learning robust skills for tightly coordinated arms in contact-rich tasks | 2024-01 | IEEE RAL | -- | RL Method |
| HandoverSim: A Simulation Framework and Benchmark for Human-to-Robot Object Handovers | 2022-05 | ICRA 2022 | GitHub | Benchmark |
| GenH2R: Learning Generalizable Human-to-Robot Handover via Scalable Simulation, Demonstration, and Imitation | 2024-01 | CVPR 2024 | GitHub | Imitation Learning |
| MobileH2R: Learning Generalizable Human to Mobile Robot Handover Exclusively from Scalable and Diverse Synthetic Data | 2025-01 | CoRR 2025 | -- | Imitation Learning |
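Many entries above build on sampling-based planners such as RRT. A minimal RRT sketch in an assumed obstacle-free 2D workspace (real planners add collision checking against the environment and usually goal biasing):

```python
import numpy as np

rng = np.random.default_rng(0)

def rrt(start, goal, step=0.5, max_iters=2000, goal_tol=0.5):
    """Minimal RRT in an obstacle-free square [0, 10]^2: sample a point,
    extend the nearest tree node toward it, stop once near the goal."""
    nodes = [np.asarray(start, dtype=float)]
    parents = [-1]  # index of each node's parent; -1 marks the root
    for _ in range(max_iters):
        sample = rng.uniform(0.0, 10.0, size=2)
        near_idx = min(range(len(nodes)),
                       key=lambda i: np.linalg.norm(nodes[i] - sample))
        direction = sample - nodes[near_idx]
        new = nodes[near_idx] + step * direction / (np.linalg.norm(direction) + 1e-9)
        nodes.append(new)
        parents.append(near_idx)
        if np.linalg.norm(new - goal) < goal_tol:
            # Walk parent pointers back to the start to extract the path.
            path, i = [], len(nodes) - 1
            while i != -1:
                path.append(nodes[i])
                i = parents[i]
            return path[::-1]
    return None  # no path found within the iteration budget

path = rrt((1.0, 1.0), np.array((9.0, 9.0)))
```

RRT-Connect grows a second tree from the goal and joins the two, which is what makes it efficient for single-query problems.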
Social Embeddedness

| Paper | Date | Venue | Links | Remarks |
| --- | --- | --- | --- | --- |
| The space between us: A neurophilosophical framework for the investigation of human interpersonal space | 2009-03 | Neuroscience & Biobehavioral Reviews | -- | Peripersonal Space |
| The interrelation between peripersonal action space and interpersonal social space: psychophysiological evidence and clinical implications | 2021-02 | Frontiers in Human Neuroscience | -- | Peripersonal Space |
| Robot-assisted shopping for the blind: issues in spatial cognition and product selection | 2008-03 | Intelligent Service Robotics | -- | Application in Social Scenario |
| A review of assistive spatial orientation and navigation technologies for the visually impaired | 2017-08 | Universal Access in the Information Society | -- | Application in Social Scenario |
| ShelfHelp: Empowering Humans to Perform Vision-Independent Manipulation Tasks with a Socially Assistive Robotic Cane | 2024-05 | arXiv | -- | Application in Social Scenario |
| Conversational memory network for emotion recognition in dyadic dialogue videos | 2018-06 | Proceedings of the Conference of the Association for Computational Linguistics | -- | Linguistic Research |
| Graph Based Network with Contextualized Representations of Turns in Dialogue | 2021-09 | EMNLP 2021 | -- | Linguistic Research |
| DialogueGCN: A Graph Convolutional Neural Network for Emotion Recognition in Conversation | 2019-08 | EMNLP 2019 | -- | Linguistic Research |
| Dialogue act modeling for automatic tagging and recognition of conversational speech | 2000-10 | Computational Linguistics | -- | Linguistic Research |
| Werewolf Among Us: Multimodal resources for modeling persuasion behaviors in social deduction games | 2022-12 | ACL 2023 | -- | Linguistic Research |
| The Call for Socially Aware Language Technologies | 2025-02 | pre-MIT Press publication version | -- | Linguistic Research |
| LD-ConGR: A Large RGB-D Video Dataset for Long-Distance Continuous Gesture Recognition | 2022-06 | CVPR 2022 | GitHub | Non-verbal Behaviors Study |
| The Audio-Visual Conversational Graph: From an Egocentric-Exocentric Perspective | 2023-12 | CVPR 2024 | GitHub | Non-verbal Behaviors Study |
| SocialGesture: Delving into Multi-person Gesture Understanding | 2025-04 | CVPR 2025 | Dataset | Non-verbal Behaviors Study |
| JRDB-Social: A Multifaceted Robotic Dataset for Understanding of Context and Dynamics of Human Interactions Within Social Groups | 2024-04 | CVPR 2024 | Project Page | HRI in Social Groups |
| MRAC Track 1: 2nd Workshop on Multimodal, Generative and Responsible Affective Computing | 2024-09 | ACM MM Workshop 2024 | Workshop Page | Affective Computing |
| The Tong Test: Evaluating Artificial General Intelligence Through Dynamic Embodied Physical and Social Interactions | 2024-03 | Engineering | -- | Evaluation of AGI in Social Interaction |

4. Simulators

Related Survey

| Paper | Date | Venue | Code | Application |
| --- | --- | --- | --- | --- |
| A Review of Physics Simulators | 2021 | IEEE Access | -- | Simulator Survey |
| Review of Embodied AI | 2025 | arXiv | -- | Embodied AI Survey |
| A Survey of Embodied AI | 2022 | arXiv | -- | Embodied AI Survey |

Related Works

| Paper | Date | Venue | Code | Application |
| --- | --- | --- | --- | --- |
| ManiSkill3 | 2024 | arXiv | -- | Manipulation Benchmark |
| ManiSkill2 | 2023 | ICLR | -- | Manipulation Benchmark |
| Analysis using DEM | 2020 | IEEE Aerospace | -- | Granular Simulation |
| Mobile ALOHA | 2024 | arXiv | -- | Teleoperation |
| Open-TeleVision | 2024 | arXiv | -- | Teleoperation |
| Universal Manipulation Interface | 2024 | arXiv | -- | Imitation Learning |

Mainstream Simulators

Overview and Documentation

| Paper | Date | Venue | Code | Application |
| --- | --- | --- | --- | --- |
| Webots: Professional Mobile Robot Simulation | 2004 | JARS | -- | Simulator Platform |
| Design and use paradigms for Gazebo | 2004 | IROS | -- | Simulator Platform |
| MuJoCo: A physics engine for model-based control | 2012 | IROS | -- | Simulator Platform |
| PyBullet: Python module for physics simulation | 2016 | GitHub | GitHub | Simulator Platform |
| CoppeliaSim (formerly V-REP) | 2013 | IROS | -- | Simulator Platform |
| Isaac Gym: GPU-based physics simulation for robot learning | 2021 | arXiv | -- | Simulator Platform |
| Isaac Sim | 2025 | NVIDIA Developer | -- | Simulator Platform |
| Isaac Lab Documentation | 2025 | NVIDIA Developer | -- | Simulator Platform |
| SAPIEN: A simulated part-based interactive environment | 2020 | CVPR | -- | Simulator Platform |
| Genesis: A Universal and Generative Physics Engine | 2024 | GitHub | GitHub | Simulator Platform |
| MuJoCo Programming Guide | 2025 | Docs | -- | Developer Guide |
| Newton Isaac Sim Project | 2024 | GitHub | GitHub | Simulator Platform |
| Newton Physics Engine Announcement | 2025 | NVIDIA Blog | -- | Physics Engine |

Physical Properties of Simulators

Physical Simulation Engines and Platforms

| Paper | Date | Venue | Code | Application |
| --- | --- | --- | --- | --- |
| LS Group Interact Kinematics | 2025 | Docs | -- | Kinematics Documentation |
| NVIDIA Omniverse | 2025 | NVIDIA Developer | -- | 3D Simulation & Collaboration Platform |
| NVIDIA PhysX System Software | 2021 | NVIDIA Developer | -- | Real-Time Physics Engine |

Rendering Capabilities

Rendering Engines and Frameworks

| Paper | Date | Venue | Code | Application |
| --- | --- | --- | --- | --- |
| LuisaRender | 2022 | TOG | -- | Rendering Framework |
| Pyrender | 2019 | GitHub | GitHub | Rendering |
| HydraRendererInfo | 2019 | GitHub | GitHub | Rendering |
| The Alliance for OpenUSD | 2023 | AOUSD | -- | Open Universal Scene Description (USD) Standard |
| OpenGL: The Industry Standard for High-Performance Graphics | 1992 | Khronos Group | -- | Cross-Platform Graphics API |
| Vulkan: Cross-Platform 3D Graphics and Compute API | 2016 | Khronos Group | -- | Low-Level Graphics and Compute API |
| NVIDIA OptiX™ Ray Tracing Engine | 2024 | NVIDIA Developer | -- | GPU-Accelerated Ray Tracing Framework |

Sensor and Joint Component Types

5. World Models

Representative Architectures of World Models

| Paper | Date | Venue | Code | Architecture |
| --- | --- | --- | --- | --- |
| World Models | 2018-03 | NeurIPS 2018 | - | RSSM |
| Learning Latent Dynamics for Planning from Pixels | 2018-11 | ICML 2019 | GitHub | RSSM |
| Dream to Control: Learning Behaviors by Latent Imagination (Dreamer) | 2019-12 | ICLR 2020 | GitHub | RSSM |
| Mastering Atari with Discrete World Models (Dreamer v2) | 2020-10 | ICLR 2021 | GitHub | RSSM |
| DayDreamer: World Models for Physical Robot Learning | 2022-06 | CoRL 2022 | GitHub | RSSM |
| Mastering Diverse Domains through World Models (Dreamer v3) | 2023-01 | Nature | GitHub | RSSM |
| A Path Towards Autonomous Machine Intelligence | 2022-06 | OpenReview | - | JEPA |
| Self-Supervised Learning from Images with a Joint-Embedding Predictive Architecture (I-JEPA) | 2023-01 | CVPR 2023 | GitHub | JEPA |
| Revisiting Feature Prediction for Learning Visual Representations from Video (V-JEPA) | 2024-04 | arXiv | GitHub | JEPA |
| V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning | 2025-06 | arXiv | GitHub | JEPA |
| TransDreamer: Reinforcement Learning with Transformer World Models | 2022-02 | NeurIPS 2021 Workshop | GitHub | TSSM |
| Transformer-based World Models Are Happy With 100k Interactions | 2023-03 | ICLR 2023 | GitHub | TSSM |
| Genie: Generative Interactive Environments | 2024-02 | arXiv | - | TSSM |
| GAIA-1: A Generative World Model for Autonomous Driving | 2023-09 | arXiv (Wayve) | - | Autoregressive Transformer |
| OccWorld: Learning a 3D Occupancy World Model for Autonomous Driving | 2023-11 | ECCV 2024 | GitHub | Autoregressive Transformer |
| Video generation models as world simulators (Sora) | 2024-02 | OpenAI | - | Diffusion |
| Vista: A Generalizable Driving World Model with High Fidelity and Versatile Controllability | 2024-05 | NeurIPS 2024 | GitHub | Diffusion |
| GAIA-2: A Controllable Multi-View Generative World Model for Autonomous Driving | 2025-03 | arXiv (Wayve) | - | Diffusion |
| Vid2World: Crafting Video Diffusion Models to Interactive World Models | 2025-05 | arXiv | - | AR+Diffusion |
| Epona: Autoregressive Diffusion World Model for Autonomous Driving | 2025-06 | ICCV 2025 | GitHub | AR+Diffusion |
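The RSSM entries above share a common structure: a deterministic recurrent state carries memory while a stochastic latent captures uncertainty, and the model can be rolled forward in "imagination" without real observations. A minimal sketch with random, untrained weights, purely to show the data flow (all dimensions and parameters are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)

# Tiny stand-in parameters; a real RSSM learns these end-to-end.
D_H, D_Z, D_A, D_OBS = 8, 4, 2, 16
W_h = rng.normal(0, 0.1, (D_H, D_H + D_Z + D_A))   # deterministic path
W_mu = rng.normal(0, 0.1, (D_Z, D_H))              # prior over stochastic latent
W_dec = rng.normal(0, 0.1, (D_OBS, D_H + D_Z))     # observation decoder

def rssm_step(h, z, a):
    """One imagination step: deterministic state h carries memory,
    stochastic state z is sampled from a prior conditioned on h."""
    h = np.tanh(W_h @ np.concatenate([h, z, a]))
    mu = W_mu @ h
    z = mu + rng.normal(0, 1, D_Z) * 0.1   # sample a diagonal-Gaussian prior
    return h, z

def imagine(h, z, actions):
    """Roll the model forward without any real observations,
    decoding a predicted observation at each step."""
    frames = []
    for a in actions:
        h, z = rssm_step(h, z, a)
        frames.append(W_dec @ np.concatenate([h, z]))
    return np.stack(frames)

traj = imagine(np.zeros(D_H), np.zeros(D_Z), rng.uniform(-1, 1, (5, D_A)))
```

During training, a posterior conditioned on the real observation replaces the prior sample, and a KL term keeps the two close; the Dreamer line of work then trains the policy entirely inside such imagined rollouts.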
Core roles of World Models

| Paper | Date | Venue | Code | Role |
| --- | --- | --- | --- | --- |
| Cosmos World Foundation Model Platform for Physical AI | 2025-03 | arXiv | GitHub | Neural Simulator |
| Cosmos-Transfer1: Conditional World Generation with Adaptive Multimodal Control | 2025-04 | arXiv | GitHub | Neural Simulator |
| GAIA-1: A Generative World Model for Autonomous Driving | 2023-09 | arXiv (Wayve) | - | Neural Simulator |
| GAIA-2: A Controllable Multi-View Generative World Model for Autonomous Driving | 2025-03 | arXiv (Wayve) | - | Neural Simulator |
| DriveWorld: 4D Pre-trained Scene Understanding via World Models for Autonomous Driving | 2024-05 | CVPR 2024 | - | Neural Simulator |
| DOME: Taming Diffusion Model into High-Fidelity Controllable Occupancy World Model | 2024-10 | arXiv | GitHub | Neural Simulator |
| Dream to Control: Learning Behaviors by Latent Imagination (Dreamer) | 2019-12 | ICLR 2020 | GitHub | Dynamic Model |
| Mastering Atari with Discrete World Models (Dreamer v2) | 2020-10 | ICLR 2021 | GitHub | Dynamic Model |
| DayDreamer: World Models for Physical Robot Learning | 2022-06 | CoRL 2022 | GitHub | Dynamic Model |
| Mastering Diverse Domains through World Models (Dreamer v3) | 2023-01 | Nature | GitHub | Dynamic Model |
| Pre-training Contextualized World Models with In-the-wild Videos for Reinforcement Learning | 2023-05 | NeurIPS 2023 | GitHub | Dynamic Model |
| iVideoGPT: Interactive VideoGPTs are Scalable World Models | 2024-05 | NeurIPS 2024 | GitHub | Dynamic Model |
| Video Prediction Models as Rewards for Reinforcement Learning (VIPER) | 2023-05 | NeurIPS 2023 | GitHub | Reward Model |
| Video models are zero-shot learners and reasoners | 2025-09 | arXiv (Google DeepMind) | - | Neural Simulator |

6. World Models for Intelligent Robots

World Models for Autonomous Driving


WMs as Neural Simulators for Autonomous Driving

| Paper | Date | Venue | Code | Application |
| --- | --- | --- | --- | --- |
| GAIA-1: A Generative World Model for Autonomous Driving | 2023-09 | arXiv (Wayve) | - | Scenario Generation |
| DriveDreamer: Towards Real-world-driven World Models for Autonomous Driving | 2023-09 | ECCV 2024 | GitHub | Scenario Generation |
| ADriver-I: A General World Model for Autonomous Driving | 2023-11 | arXiv | - | Scenario Generation |
| GAIA-2: A Controllable Multi-View Generative World Model for Autonomous Driving | 2025-03 | arXiv (Wayve) | - | Scenario Generation |
| DriveDreamer-2: LLM-Enhanced World Models for Diverse Driving Video Generation | 2024-05 | AAAI 2025 | GitHub | Scenario Generation |
| DriveDreamer4D: World Models Are Effective Data Machines for 4D Driving Scene Representation | 2024-11 | CVPR 2025 | GitHub | Scenario Generation |
| DrivingWorld: Constructing World Model for Autonomous Driving via Video GPT | 2024-12 | arXiv | GitHub | Scenario Generation |
| MagicDrive: Street View Generation with Diverse 3D Geometry Control | 2024-05 | ICLR 2024 | GitHub | Scenario Generation |
| MagicDrive3D: Controllable 3D Generation for Any-View Rendering in Street Scenes | 2024-11 | arXiv | GitHub | Scenario Generation |
| MagicDrive-V2: High-Resolution Long Video Generation for Autonomous Driving with Adaptive Control | 2024-11 | arXiv | GitHub | Scenario Generation |
| WoVoGen: World Volume-aware Diffusion for Controllable Multi-camera Driving Scene Generation | 2024-08 | ECCV 2024 | GitHub | Scenario Generation |
| ReconDreamer: Crafting World Models for Driving Scene Reconstruction via Online Restoration | 2024-11 | CVPR 2025 | GitHub | Scenario Generation |
| DualDiff+: Dual-Branch Diffusion for High-Fidelity Video Generation with Reward Guidance | 2025-03 | ICRA 2025 | GitHub | Scenario Generation |
| Panacea+: Panoramic and Controllable Video Generation for Autonomous Driving | 2024-08 | CVPR 2024 | GitHub | Scenario Generation |
| Cosmos-Transfer1: Conditional World Generation with Adaptive Multimodal Control | 2025-04 | arXiv | GitHub | Scenario Generation |
| GeoDrive: Trajectory-Conditioned 3D World Model for Autonomous Driving | 2025-02 | arXiv | - | Scenario Generation |
| DriveWorld: 4D Pre-trained Scene Understanding via World Models for Autonomous Driving | 2024-05 | CVPR 2024 | - | Scenario Generation |
| OccSora: 4D Occupancy Generation Models as World Simulators for Autonomous Driving | 2025-05 | arXiv | GitHub | Scenario Generation |
| Driving in the Occupancy World: Vision-Centric 4D Occupancy Forecasting and Planning via World Models for Autonomous Driving | 2025-01 | AAAI 2025 | GitHub | Scenario Generation |
| DOME: Taming Diffusion Model into High-Fidelity Controllable Occupancy World Model | 2024-10 | arXiv | GitHub | Scenario Generation |
| RenderWorld: World Model with Self-Supervised 3D Label | 2024-11 | arXiv | - | Scenario Generation |
| OccLLaMA: A Language-Driven 3D Occupancy Generation Framework | 2024-12 | arXiv | - | Scenario Generation |
| BEVWorld: A Multimodal World Model for Autonomous Driving via Unified BEV Latent Space | 2024-07 | arXiv | - | Scenario Generation |
| HoloDrive: Holistic View-Aware World Model for Autonomous Driving | 2024-10 | arXiv | - | Scenario Generation |
| GEM: A Generalizable Ego-Vision Multimodal World Model for Fine-Grained Ego-Motion, Object Dynamics, and Scene Composition Control | 2024-12 | CVPR 2025 | GitHub | Scenario Generation |
| DriveArena: A Closed-loop Generative Simulation Platform for Autonomous Driving | 2024-08 | arXiv | GitHub | Scenario Generation |
| ACT-Bench: Towards Action Controllable World Models for Autonomous Driving | 2024-12 | arXiv | - | Scenario Generation |
| InfinityDrive: Towards Infinite-Resolution World Models for Autonomous Driving | 2024-12 | arXiv | - | Scenario Generation |
| Epona: Autoregressive Diffusion World Model for Autonomous Driving | 2025-06 | ICCV 2025 | GitHub | Scenario Generation |
| DrivePhysica: A Physics-Conditioned World Model for Autonomous Driving | 2024-12 | arXiv | - | Scenario Generation |
| Cosmos-Drive: Multi-Modal World Model for Autonomous Driving | 2025-03 | arXiv | GitHub | Scenario Generation |
| Genie 3: A new frontier for world models | 2025-08 | website | Talk | Interactive Online Simulation |
WMs as Dynamic Models for Autonomous Driving

| Paper | Date | Venue | Code | Application |
| --- | --- | --- | --- | --- |
| MILE: Model-based Imitation Learning for Urban Driving | 2022-10 | NeurIPS 2022 | GitHub | Motion Planning |
| Cosmos-Reason1: From Physical Common Sense To Embodied Reasoning | 2025-03 | arXiv | GitHub | Reasoning |
| TrafficBots: Towards World Models for Autonomous Driving Simulation and Motion Prediction | 2023-03 | ICRA 2023 | GitHub | Motion Prediction |
| UniWorld: Autonomous Driving Pre-training via World Models | 2023-08 | arXiv | - | Pre-training |
| Copilot4D: Learning Unsupervised World Models for Autonomous Driving via Discrete Diffusion | 2023-11 | ICLR 2024 | - | Motion Planning |
| MUVO: A Multimodal Generative World Model for Autonomous Driving with Geometric Representations | 2023-11 | IV 2025 | - | Motion Planning |
| OccWorld: Learning a 3D Occupancy World Model for Autonomous Driving | 2023-11 | ECCV 2024 | GitHub | Motion Planning |
| ViDAR: Visual Point Cloud Forecasting for Autonomous Driving | 2023-12 | CVPR 2024 | - | Motion Prediction |
| Think2Drive: Efficient Reinforcement Learning by Thinking in Latent World Model for Quasi-Realistic Autonomous Driving | 2024-02 | ECCV 2024 | - | Motion Planning |
| LidarDM: Generative LiDAR Simulation in a Generated World | 2024-04 | ICRA 2025 | - | Simulation |
| Enhancing End-to-End Autonomous Driving with Latent World Model | 2025-02 | ICLR 2025 | GitHub | Motion Planning |
| UnO: Unsupervised Occupancy Fields for Perception and Forecasting | 2024-06 | CVPR 2024 | - | Motion Prediction |
| CarFormer: Self-Driving with Learned Object-Centric Representations | 2024-07 | ECCV 2024 | GitHub | Motion Planning |
| NeMo: Neural Occupancy Fields for Autonomous Driving | 2024 | ECCV 2024 | - | Motion Prediction |
| Mitigating Covariate Shift in Imitation Learning via Offline Data With Partial Coverage | 2021-10 | NeurIPS 2021 | GitHub | Motion Planning |
| Imagine-2-Drive: High-Fidelity World Modeling for Autonomous Driving | 2024-11 | IROS 2025 | GitHub | Motion Planning |
| Doe-1: Closed-Loop Autonomous Driving with Large World Model | 2024-08 | arXiv | GitHub | Motion Planning |
| GaussianWorld: Gaussian World Model for Streaming 3D Occupancy Prediction | 2024-12 | arXiv | GitHub | Motion Prediction |
| DFIT-OccWorld: Efficient Occupancy Forecasting via Differential Factorization and Interactive Transformer | 2024-12 | arXiv | - | Motion Prediction |
| DrivingGPT: Unifying Driving World Modeling and Planning with Multi-modal Large Language Model | 2024-12 | arXiv | - | Motion Planning |
| AdaWM: Adaptive World Model for Autonomous Driving | 2025-01 | ICLR 2025 | - | Motion Planning |
| AD-L-JEPA: Autonomous Driving with L-JEPA | 2025-01 | arXiv | GitHub | Motion Prediction |
| HERMES: Harmonized Embodied Representation for Multi-modal Sensor Integration in Autonomous Driving | 2025-01 | ICCV 2025 | GitHub | Motion Planning |
WMs as Reward Models for Autonomous Driving

| Paper | Date | Venue | Code | Application |
| --- | --- | --- | --- | --- |
| SEM2: Enhance Sample Efficiency and Robustness of End-to-end Urban Autonomous Driving via Semantic Masked World Model | 2024-05 | T-ITS | - | Reinforcement Learning |
| Iso-Dream: Isolating and Leveraging Noncontrollable Visual Dynamics in World Models | 2022-05 | NeurIPS 2022 | GitHub | Reinforcement Learning |
| Vista: A Generalizable Driving World Model with High Fidelity and Versatile Controllability | 2024-05 | NeurIPS 2024 | GitHub | Reinforcement Learning |
| Driving into the Future: Multiview Visual Forecasting and Planning with World Model for Autonomous Driving | 2023-11 | CVPR 2024 | GitHub | Motion Planning |
| WoTE: World-model-based End-to-end Autonomous Driving | 2025-04 | ICCV 2025 | GitHub | Motion Planning |

World Models for Articulated Robots

The following table compares research on world models in robotics in terms of model input, architecture, experiment platform, and code availability.

Neural Simulators

| Paper | Date | Venue | Code |
| --- | --- | --- | --- |
| Whale: Towards generalizable and scalable world models for embodied decision-making | 2024-08 | arXiv | - |
| RoboDreamer: Learning Compositional World Models for Robot Imagination | 2024-08 | ICML 2024 | GitHub |
| Dream to Manipulate: Compositional World Models Empowering Robot Imitation Learning with Imagination | 2024-11 | ICLR 2025 | GitHub |
| EnerVerse: Envisioning Embodied Future Space for Robotics Manipulation | 2025-01 | arXiv | GitHub |
| Cosmos World Foundation Model Platform for Physical AI | 2025-03 | arXiv | GitHub |
| WorldEval: World Model as Real-World Robot Policies Evaluator | 2025-05 | arXiv | GitHub |
| DreamGen: Unlocking Generalization in Robot Learning through Neural Trajectories | 2025-05 | arXiv | GitHub |
