Embodied-AI-Paper-TopConf

đŸ”¥ NeurIPS2025 & CoRL2025 & ICCV2025 & ICML2025 & RSS2025 & CVPR2025 & ICLR2025 & ICLR2026 Embodied AI paper list and resources.

[03/22/2025] We plan to organize more Embodied AI papers from top conferences and build a more comprehensive paper list. If there is a conference whose papers you would like to see covered, or if you have any other suggestions, please feel free to open an issue.

[04/12/2025] We are updating Embodied AI papers accepted by RSS2025 (a top robotics conference)!

[05/21/2025] We are updating Embodied AI papers accepted by ICML2025!

[08/05/2025] We are updating Embodied AI papers accepted by ICCV2025!

[09/30/2025] We are updating Embodied AI papers accepted by CoRL2025!

[11/30/2025] We are updating Embodied AI papers accepted by NeurIPS2025!

[03/12/2026] We are updating Embodied AI papers accepted by ICLR2026! (đŸ“– ICLR2026)

đŸ“– Paper List

ICLR2026

đŸ“„ Full List

Vision-Language-Action Models

  • Scaling up Memory for Robotic Control via Experience Retrieval Paper
  • MemoryVLA: Perceptual-Cognitive Memory in Vision-Language-Action Models for Robotic Manipulation Paper
  • PixelVLA: Advancing Pixel-level Understanding in Vision-Language-Action Model Paper
  • Vlaser: Vision-Language-Action Model with Synergistic Embodied Reasoning Paper
  • Disentangled Robot Learning via Separate Forward and Inverse Dynamics Pretraining Paper
  • MetaVLA: Unified Meta Co-Training for Efficient Embodied Adaptation Paper
  • Unifying Diffusion and Autoregression for Generalizable Vision-Language-Action Model Paper
  • Hybrid Training for Vision-Language-Action Models Paper
  • End-to-end Listen, Look, Speak and Act Paper
  • WholeBodyVLA: Towards Unified Latent VLA for Whole-body Loco-manipulation Control Paper
  • RoboOmni: Proactive Robot Manipulation in Omni-modal Context Paper
  • Unified Vision-Language-Action Model Paper
  • SP-VLA: A Joint Model Scheduling and Token Pruning Approach for VLA Model Acceleration Paper
  • Align-Then-stEer: Adapting the Vision-Language Action Models through Unified Latent Guidance Paper
  • AutoQVLA: Not All Channels Are Equal in Vision-Language-Action Model's Quantization Paper
  • Verifier-free Test-Time Sampling for Vision Language Action Models Paper
  • Interleave-VLA: Enhancing Robot Manipulation with Image-Text Interleaved Instructions Paper
  • Unified Diffusion VLA: Vision-Language-Action Model via Joint Discrete Denoising Diffusion Process Paper
  • Endowing GPT-4 with a Humanoid Body: Building the Bridge Between Off-the-Shelf VLMs and the Physical World Paper
  • On Robustness of Vision-Language-Action Model against Multi-Modal Perturbations Paper
  • Spatially Guided Training for Vision-Language-Action Model Paper
  • Self-Improving Vision-Language-Action Models with Data Generation via Residual RL Paper
  • Action-aware Dynamic Pruning for Efficient Vision-Language-Action Manipulation Paper
  • Spatial Forcing: Implicit Spatial Representation Alignment for Vision-language-action Model Paper
  • Genie Envisioner: A Unified World Foundation Platform for Robotic Manipulation Paper
  • From Spatial to Actions: Grounding Vision-Language-Action Model in Spatial Foundation Priors Paper
  • TwinVLA: Data-Efficient Bimanual Manipulation with Twin Single-Arm Vision-Language-Action Models Paper
  • FASTer: Toward Powerful and Efficient Autoregressive Vision–Language–Action Models with Learnable Action Tokenizer and Block-wise Decoding Paper
  • Embodied Navigation Foundation Model Paper
  • X-VLA: Soft-Prompted Transformer as Scalable Cross-Embodiment Vision-Language-Action Model Paper
  • Actions as Language: Fine-Tuning VLMs into VLAs Without Catastrophic Forgetting Paper
  • OneTwoVLA: A Unified Vision-Language-Action Model with Adaptive Reasoning Paper
  • VLM4VLA: Revisiting Vision-Language-Models in Vision-Language-Action Models Paper
  • Vision-Language-Action Instruction Tuning: From Understanding to Manipulation Paper
  • villa-X: Enhancing Latent Action Modeling in Vision-Language-Action Models Paper
  • From Seeing to Doing: Bridging Reasoning and Decision for Robotic Manipulation Paper

Vision-Language-Navigation Models

  • AutoFly: Vision-Language-Action Model for UAV Autonomous Navigation in the Wild Paper
  • Ground Slow, Move Fast: A Dual-System Foundation Model for Generalizable Vision-Language Navigation Paper
  • Towards Physically Executable 3D Gaussian for Embodied Navigation Paper
  • Uncertainty-Aware Gaussian Map for Vision-Language Navigation Paper
  • OpenFly: A Comprehensive Platform for Aerial Vision-Language Navigation Paper
  • JanusVLN: Decoupling Semantics and Spatiality with Dual Implicit Memory for Vision-Language Navigation Paper
  • CompassNav: Steering From Path Imitation to Decision Understanding In Navigation Paper
  • M$^3$E: Continual Vision-and-Language Navigation via Mixture of Macro and Micro Experts Paper
  • All-day Multi-scenes Lifelong Vision-and-Language Navigation with Tucker Adaptation Paper
  • OmniNav: A Unified Framework for Prospective Exploration and Visual-Language Navigation Paper

World Models

  • Ctrl-World: A Controllable Generative World Model for Robot Manipulation Paper
  • Context and Diversity Matter: The Emergence of In-Context Learning in World Models Paper
  • FantasyWorld: Geometry-Consistent World Modeling via Unified Video and 3D Prediction Paper
  • NeMo-map: Neural Implicit Flow Fields for Spatio-Temporal Motion Mapping Paper
  • Astra: General Interactive World Model with Autoregressive Denoising Paper
  • Empowering Multi-Robot Cooperation via Sequential World Models Paper
  • Test-Time Mixture of World Models for Embodied Agents in Dynamic Environments Paper
  • RIG: Synergizing Reasoning and Imagination in End-to-End Generalist Policy Paper
  • Learning Massively Multitask World Models for Continuous Control Paper
  • Unified 3D Scene Understanding Through Physical World Modeling Paper
  • ExoPredicator: Learning Abstract Models of Dynamic Worlds for Robot Planning Paper
  • Efficient Reinforcement Learning by Guiding World Models with Non-Curated Data Paper
  • Vid2World: Crafting Video Diffusion Models to Interactive World Models Paper
  • WMPO: World Model-based Policy Optimization for Vision-Language-Action Models Paper
  • Object-Centric World Models from Few-Shot Annotations for Sample-Efficient Reinforcement Learning Paper
  • Building spatial world models from sparse transitional episodic memories Paper
  • Cosmos Policy: Fine-Tuning Video Models for Visuomotor Control and Planning Paper
  • WoW!: World Models in a Closed-Loop World Paper

Planning and Reasoning

  • VLMgineer: Vision-Language Models as Robotic Toolsmiths Paper
  • MomaGraph: State-Aware Unified Scene Graphs with Vision-Language Models for Embodied Task Planning Paper
  • Planning with an Embodied Learnable Memory Paper
  • Theory of Space: Can Foundation Models Construct Spatial Beliefs through Active Exploration? Paper
  • Compositional Visual Planning via Inference-Time Diffusion Scaling Paper
  • Experience-based Knowledge Correction for Robust Planning in Minecraft Paper
  • Self-Improving Loops for Visual Robotic Planning Paper
  • BOLT: Decision‑Aligned Distillation and Budget-Aware Routing for Constrained Multimodal QA on Robots Paper
  • ReCAPA: Hierarchical Predictive Correction to Mitigate Cascading Failures Paper
  • One Demo Is All It Takes: Planning Domain Derivation with LLMs from A Single Demonstration Paper
  • EVLP: Learning Unified Embodied Vision-Language Planner with Reinforced Supervised Fine-Tuning Paper
  • Towards Improvisational TAMP: Learning Low-Level Shortcuts in Abstract Planning Graphs Paper
  • Embodied-R1: Reinforced Embodied Reasoning for General Robotic Manipulation Paper
  • Self-Refining Vision Language Model for Robotic Failure Detection and Reasoning Paper
  • Natural Language PDDL (NL-PDDL) for Open-world Goal-oriented Commonsense Regression Planning in Embodied AI Paper
  • SafeFlowMatcher: Safe and Fast Planning using Flow Matching with Control Barrier Functions Paper
  • OmniEVA: Embodied Versatile Planner via Task-Adaptive 3D-Grounded and Embodiment-aware Reasoning Paper

Navigation

  • From Seeing to Experiencing: Scaling Navigation Foundation Models with Reinforcement Learning Paper
  • Lifelong Embodied Navigation Learning Paper
  • CE-Nav: Flow-Guided Reinforcement Refinement for Cross-Embodiment Local Navigation Paper
  • Emergence of Spatial Representation in an Actor-Critic Agent with Hippocampus-Inspired Sequence Generator Paper

Humanoid

  • HWC-Loco: A Hierarchical Whole-Body Control Approach to Robust Humanoid Locomotion Paper
  • Task Tokens: A Flexible Approach to Adapting Behavior Foundation Models Paper
  • BFM-Zero: A Promptable Behavioral Foundation Model for Humanoid Control Using Unsupervised Reinforcement Learning Paper
  • From Language to Locomotion: Retargeting-free Humanoid Control via Motion Latent Guidance Paper

3D Vision

  • Geometry-aware 4D Video Generation for Robot Manipulation Paper
  • PD$^{2}$GS: Part-Level Decoupling and Continuous Deformation of Articulated Objects via Gaussian Splatting Paper
  • Manipulation as in Simulation: Enabling Accurate Geometry Perception in Robots Paper

Policy

  • Master Skill Learning with Policy-Grounded Synergy of LLM-based Reward Shaping and Exploring Paper
  • When would Vision-Proprioception Policies Fail in Robotic Manipulation? Paper
  • ManipEvalAgent: Promptable and Efficient Evaluation Framework for Robotic Manipulation Policies Paper
  • Remotely Detectable Robot Policy Watermarking Paper
  • Difference-Aware Retrieval Policies for Imitation Learning Paper
  • Capturing Visual Environment Structure Correlates with Control Performance Paper
  • VITA: Vision-to-Action Flow Matching Policy Paper
  • DemoGrasp: Universal Dexterous Grasping from a Single Demonstration Paper
  • When a Robot is More Capable than a Human: Learning from Constrained Demonstrators Paper
  • Autonomous Play with Correspondence-Driven Trajectory Warping Paper
  • Cross-Embodiment Offline Reinforcement Learning for Heterogeneous Robot Datasets Paper
  • Uncovering Robot Vulnerabilities through Semantic Potential Fields Paper
  • Time Optimal Execution of Action Chunk Policies Beyond Demonstration Speed Paper
  • Policy Likelihood-based Query Sampling and Critic-Exploited Reset for Efficient Preference-based Reinforcement Learning Paper
  • Rodrigues Network for Learning Robot Actions Paper
  • Reference Guided Skill Discovery Paper
  • Masked Generative Policy for Robotic Control Paper
  • GRL-SNAM: Geometric Reinforcement Learning with Differential Hamiltonians for Navigation and Mapping in Unknown Environments Paper
  • HAMLET: Switch Your Vision-Language-Action Model into a History-Aware Policy Paper
  • Towards Bridging the Gap between Large-Scale Pretraining and Efficient Finetuning for Humanoid Control Paper
  • Learning Video Generation for Robotic Manipulation with Collaborative Trajectory Control Paper
  • Policy Contrastive Decoding for Robotic Foundation Models Paper
  • Demystifying Robot Diffusion Policies: Action Memorization and a Simple Lookup Table Alternative Paper
  • H$^3$DP: Triply‑Hierarchical Diffusion Policy for Visuomotor Learning Paper
  • SimpleVLA-RL: Scaling VLA Training via Reinforcement Learning Paper
  • Compose Your Policies! Improving Diffusion-based or Flow-based Robot Policies via Test-time Distribution-level Composition Paper
  • Abstracting Robot Manipulation Skills via Mixture-of-Experts Diffusion Policies Paper
  • Accelerated co-design of robots through morphological pretraining Paper
  • Generalizable Coarse-to-Fine Robot Manipulation via Language-Aligned 3D Keypoints Paper
  • VER: Vision Expert Transformer for Robot Learning via Foundation Distillation and Dynamic Routing Paper
  • SpikePingpong: Spike Vision-based Fast-Slow Pingpong Robot System Paper
  • EquAct: An SE(3)-Equivariant Multi-Task Transformer for 3D Robotic Manipulation Paper
  • Translating Flow to Policy via Hindsight Online Imitation Paper
  • Hierarchical Value-Decomposed Offline Reinforcement Learning for Whole-Body Control Paper
  • Cortical Policy: A Dual-Stream View Transformer for Robotic Manipulation Paper
  • Geometry-aware Policy Imitation Paper
  • Contractive Diffusion Policies: Robust Action Diffusion via Contractive Score-Based Sampling with Differential Equations Paper
  • Scalable Exploration for High-Dimensional Continuous Control via Value-Guided Flow Paper
  • Mean Flow Policy with Instantaneous Velocity Constraint for One-step Action Generation Paper
  • Emergent Dexterity Via Diverse Resets and Large-Scale Reinforcement Learning Paper
  • Learning Part-Aware Dense 3D Feature Field For Generalizable Articulated Object Manipulation Paper
  • Real-Time Robot Execution with Masked Action Chunking Paper
  • Robust Fine-tuning of Vision-Language-Action Robot Policies via Parameter Merging Paper
  • ViPRA: Video Prediction for Robot Actions Paper
  • RAVEN: End-to-end Equivariant Robot Learning with RGB Cameras Paper

Dexterous Manipulation

  • DexNDM: Closing the Reality Gap for Dexterous In-Hand Rotation via Joint-Wise Neural Dynamics Model Paper
  • EgoDex: Learning Dexterous Manipulation from Large-Scale Egocentric Video Paper
  • RFS: Reinforcement Learning with Residual Flow Steering for Dexterous Manipulation Paper
  • Learning to Grasp Anything By Playing with Random Toys Paper
  • SARM: Stage-Aware Reward Modeling for Long Horizon Robot Manipulation Paper
  • UniHM: Unified Dexterous Hand Manipulation with Vision Language Model Paper
  • DexMove: Learning Tactile-Guided Non-Prehensile Manipulation with Dexterous Hands Paper
  • VLBiMan: Vision-Language Anchored One-Shot Demonstration Enables Generalizable Bimanual Robotic Manipulation Paper
  • Cross-Embodied Co-Design for Dexterous Hands Paper
  • Robotic Manipulation by Imitating Generated Videos Without Physical Demonstrations Paper
  • Primary-Fine Decoupling for Action Generation in Robotic Imitation Paper

Tactile

  • AnyTouch 2: General Optical Tactile Representation Learning For Dynamic Tactile Perception Paper
  • APPLE: Toward General Active Perception via Reinforcement Learning Paper

Sim2real and Real2sim

  • D-REX: Differentiable Real-to-Sim-to-Real Engine for Learning Dexterous Grasping Paper
  • DemoGrasp: Universal Dexterous Grasping from a Single Demonstration Paper
  • Sim2Real VLA: Zero-Shot Generalization of Synthesized Skills to Realistic Manipulation Paper
  • Exo-Plore: Exploring Exoskeleton Control Space through Human-aligned Simulation Paper
  • Emergent Dexterity Via Diverse Resets and Large-Scale Reinforcement Learning Paper
  • Manipulation as in Simulation: Enabling Accurate Geometry Perception in Robots Paper
  • Latent Adaptation of Foundation Policies for Sim-to-Real Transfer Paper
  • RobotArena $\infty$: Unlimited Robot Benchmarking via Real-to-Sim Translation Paper
  • PD$^{2}$GS: Part-Level Decoupling and Continuous Deformation of Articulated Objects via Gaussian Splatting Paper
  • Contact-guided Real2Sim from Monocular Video with Planar Scene Primitives Paper

Benchmark and Dataset

  • D2E: Scaling Vision-Action Pretraining on Desktop Data for Transfer to Embodied AI Paper
  • Memory, Benchmark & Robots: A Benchmark for Solving Complex Tasks with Reinforcement Learning Paper
  • DataMIL: Selecting Data for Robot Imitation Learning with Datamodels Paper
  • MIMIC: Mask-Injected Manipulation Video Generation with Interaction Control Paper
  • LeRobot: An Open-Source Library for End-to-End Robot Learning Paper
  • RobotArena $\infty$: Unlimited Robot Benchmarking via Real-to-Sim Translation Paper
  • RoboInter: A Holistic Intermediate Representation Suite Towards Robotic Manipulation Paper
  • ENACT: Evaluating Embodied Cognition with World Modeling of Egocentric Interaction Paper
  • AutoBio: A Simulation and Benchmark for Robotic Automation in Digital Biology Laboratory Paper
  • Image Quality Assessment for Embodied AI Paper
  • MoMaGen: Generating Demonstrations under Soft and Hard Constraints for Multi-Step Bimanual Mobile Manipulation Paper
  • CoNavBench: Collaborative Long-Horizon Vision-Language Navigation Benchmark Paper
  • World2Minecraft: Occupancy-Driven Simulated Scenes Construction Paper
  • CitySeeker: How Do VLMs Explore Embodied Urban Navigation with Implicit Human Needs? Paper
  • Seeing Across Views: Benchmarking Spatial Reasoning of Vision-Language Models in Robotic Scenes Paper
  • RoboCasa365: A Large-Scale Simulation Framework for Training and Benchmarking Generalist Robots Paper
  • REI-Bench: Can Embodied Agents Understand Vague Human Instructions in Task Planning? Paper

Other

  • On the Generalization Capacities of MLLMs for Spatial Intelligence Paper
  • Embodied Agents Meet Personalization: Investigating Challenges and Solutions Through the Lens of Memory Utilization Paper
  • Interaction-aware Representation Modeling With Co-Occurrence Consistency for Egocentric Hand-Object Parsing Paper
  • PhyScensis: Physics-Augmented LLM Agents for Complex Physical Scene Arrangement Paper
  • OmniActor: A Generalist GUI and Embodied Agent for 2D&3D Worlds Paper
  • EgoWorld: Translating Exocentric View to Egocentric View using Rich Exocentric Observations Paper
  • Contact-guided Real2Sim from Monocular Video with Planar Scene Primitives Paper

NeurIPS2025

Vision-Language-Action Model

  • Fast-in-Slow: A Dual-System VLA Model Unifying Fast Manipulation within Slow Reasoning Paper Page
  • AC-DiT: Adaptive Coordination Diffusion Transformer for Mobile Manipulation Paper Page
  • BridgeVLA: Input-Output Alignment for Efficient 3D Manipulation Learning with Vision-Language Models Paper Page
  • CogVLA: Cognition-Aligned Vision-Language-Action Models via Instruction-Driven Routing & Sparsification Paper Page
  • VideoVLA: Video Generators Can Be Generalizable Robot Manipulators
  • ChatVLA-2: Vision-Language-Action Model with Open-World Reasoning Paper Page
  • Exploring the Limits of Vision-Language-Action Manipulation in Cross-task Generalization Paper Page
  • BadVLA: Towards Backdoor Attacks on Vision-Language-Action Models via Objective-Decoupled Optimization Paper Page
  • Compliant Residual DAgger: Improving Real-World Contact-Rich Manipulation with Human Corrections Paper Page
  • VLA-OS: Structuring and Dissecting Planning Representations and Paradigms in Vision-Language-Action Models Paper Page
  • ThinkAct: Vision-Language-Action Reasoning via Reinforced Visual Latent Planning Paper Page
  • Self-Improving Embodied Foundation Models Paper Page
  • Robo2VLM: Improving Visual Question Answering using Large-Scale Robot Manipulation Data Paper Page
  • EnerVerse: Envisioning Embodied Future Space for Robotics Manipulation Paper
  • Learning Spatial-Aware Manipulation Ordering Paper
  • PRIMT: Preference-based Reinforcement Learning with Multimodal Feedback and Trajectory Synthesis from Foundation Models Paper
  • BEAST: Efficient Tokenization of B-Splines Encoded Action Sequences for Imitation Learning Paper
  • PointMapPolicy: Structured Point Cloud Processing for Multi-Modal Imitation Learning Paper
  • Real-Time Execution of Action Chunking Flow Policies Paper Page
  • Chain-of-Action: Trajectory Autoregressive Modeling for Robotic Manipulation Paper Page
  • 4D-VLA: Spatiotemporal Vision-Language-Action Pretraining with Cross-Scene Calibration
  • SAFE: Multitask Failure Detection for Vision-Language-Action Models Paper Page
  • Blindfolded Experts Generalize Better: Insights from Robotic Manipulation and Videogames Paper Page
  • HiMaCon: Discovering Hierarchical Manipulation Concepts from Unlabeled Multi-Modal Data Paper
  • Knowledge Insulating Vision-Language-Action Models: Train Fast, Run Fast, Generalize Better Paper
  • Provable Ordering and Continuity in Vision-Language Pretraining for Generalizable Embodied Agents Paper Page
  • DreamVLA: A Vision-Language-Action Model Dreamed with Comprehensive World Knowledge Paper Page

Data

  • EgoBridge: Domain Adaptation for Generalizable Imitation from Egocentric Human Data Paper Page
  • RobotSmith: Generative Robotic Tool Design for Acquisition of Complex Manipulation Skill Paper Page
  • URDF-Anything: Constructing Articulated Objects with 3D Multimodal Language Model Paper
  • DEAL: Diffusion Evolution Adversarial Learning for Sim-to-Real Transfer
  • Generalizable Domain Adaptation for Sim-and-Real Policy Co-Training Paper Page

World Model

  • SAMPO: Scale-wise Autoregression with Motion Prompt for Generative World Models Paper
  • Learning 3D Persistent Embodied World Models Paper
  • OSVI-WM: One-Shot Visual Imitation for Unseen Tasks using World-Model-Guided Trajectory Generation Paper

Planning and Reasoning

  • Towards Reliable LLM-based Robots Planning via Combined Uncertainty Estimation Paper
  • Towards Reliable Code-as-Policies: A Neuro-Symbolic Framework for Embodied Task Planning Paper
  • RDD: Retrieval-Based Demonstration Decomposer for Planner Alignment in Long-Horizon Tasks Paper Page
  • UniDomain: Pretraining a Unified PDDL Domain from Real-World Demonstrations for Generalizable Robot Task Planning Paper
  • InstructFlow: Adaptive Symbolic Constraint-Guided Code Generation for Long-Horizon Planning

Navigation

  • C-NAV: Towards Self-Evolving Continual Object Navigation in Open World Paper Page
  • Distilling LLM Prior to Flow Model for Generalizable Agent’s Imagination in Object Goal Navigation Paper
  • TP-MDDN: Task-Preferenced Multi-Demand-Driven Navigation with Autonomous Decision-Making
  • Active Test-time Vision-Language Navigation Paper
  • Aux-Think: Exploring Reasoning Strategies for Data-Efficient Vision-Language Navigation
  • EfficientNav: Towards On-Device Object-Goal Navigation with Navigation Map Caching and Retrieval Paper
  • Seeing through Uncertainty: Robust Task-Oriented Optimization in Visual Navigation Paper Page

Humanoid

  • Adversarial Locomotion and Motion Imitation for Humanoid Policy Learning Paper Page
  • From Experts to a Generalist: Toward General Whole-Body Control for Humanoid Robots Paper Page
  • KungfuBot: Physics-Based Humanoid Whole-Body Control for Learning Highly-Dynamic Skills Paper Page

3D Vision

  • DynaRend: Learning 3D Dynamics via Masked Future Rendering for Robotic Manipulation Paper
  • Building 3D Representations and Generating Motions From a Single Image via Video-Generation Paper

Policy

  • Emerging Risks from Embodied AI Require Urgent Policy Action
  • Human-assisted Robotic Policy Refinement via Action Preference Optimization Paper Page
  • Hyper-GoalNet: Goal-Conditioned Manipulation Policy Learning with HyperNetworks
  • ReinFlow: Fine-tuning Flow Matching Policy with Online Reinforcement Learning Paper Page
  • Diversifying Parallel Ergodic Search: A Signature Kernel Evolution Strategy
  • FreqPolicy: Efficient Flow-based Visuomotor Policy via Frequency Consistency Paper
  • A Practical Guide for Incorporating Symmetry in Diffusion Policy Paper
  • Latent Policy Barrier: Learning Robust Visuomotor Policies by Staying In-Distribution Paper Page
  • Quantization-Free Autoregressive Action Transformer Paper
  • Real-World Reinforcement Learning of Active Perception Behaviors
  • Failure Prediction at Runtime for Generative Robot Policies Paper
  • Act to See, See to Act: Diffusion-Driven Perception-Action Interplay for Adaptive Policies Paper Page
  • Dynamic Test-Time Compute Scaling in Control Policy: Difficulty-Aware Stochastic Interpolant Policy Paper
  • DynaGuide: Steering Diffusion Policies with Active Dynamic Guidance Paper Page
  • World-aware Planning Narratives Enhance Large Vision-Language Model Planner Paper

Accelerating and Deploying

  • Accelerating Visual-Policy Learning through Parallel Differentiable Simulation Paper Page
  • EfficientVLA: Training-Free Acceleration and Compression for Vision-Language-Action Models Paper
  • A Smooth Sea Never Made a Skilled SAILOR: Robust Imitation via Learning to Search Paper Page
  • VLA-Cache: Efficient Vision-Language-Action Manipulation via Adaptive Token Caching Paper Page

Tactile

  • Universal Visuo-Tactile Video Understanding for Embodied Interaction Paper
  • Enhancing Tactile-based Reinforcement Learning for Robotic Control Paper Page
  • Taccel: Scaling Up Vision-based Tactile Robotics via High-performance GPU Simulation Paper Page
  • Toward Artificial Palpation: Representation Learning of Touch on Soft Bodies Paper Page
  • Touch in the Wild: Learning Fine-Grained Manipulation with a Portable Visuo-Tactile Gripper Paper Page

Dexterous

  • Contact Map Transfer with Conditional Diffusion Model for Generalizable Dexterous Grasp Generation Paper Page
  • HumanoidGen: Data Generation for Bimanual Dexterous Manipulation via LLM Reasoning Paper Page
  • Grasp2Grasp: Vision-Based Dexterous Grasp Translation via Schrödinger Bridges Paper Page
  • Scaffolding Dexterous Manipulation with Vision-Language Models Paper Page
  • DexFlyWheel: A Scalable and Self-improving Data Generation Framework for Dexterous Manipulation Paper Page
  • DexGarmentLab: Dexterous Garment Manipulation Environment with Generalizable Policy Paper Page

Benchmark and Dataset

  • RoboCerebra: A Large-scale Benchmark for Long-horizon Robotic Manipulation Evaluation Paper Page
  • SutureBot: A Precision Framework & Benchmark For Autonomous End-to-End Suturing Paper Page
  • Synthesizing Photorealistic and Dynamic Urban Environments for Multimodal Robot Navigation and Collaboration
  • LabUtopia: High-Fidelity Simulation and Hierarchical Benchmark for Scientific Embodied Agents Paper Page
  • SonoGym: High Performance Simulation for Challenging Surgical Tasks with Robotic Ultrasound Paper Page
  • Embodied Crowd Counting
  • PAC Bench: Do Foundation Models Understand Prerequisites for Executing Manipulation Policies? Paper

CoRL2025

Vision-Language-Action Model

  • $\pi_{0.5}$: a Vision-Language-Action Model with Open-World Generalization Paper page
  • Training Strategies for Efficient Embodied Reasoning Paper page
  • Long-VLA: Unleashing Long-Horizon Capability of Vision Language Action Model for Robot Manipulation Paper page
  • RoboMonkey: Scaling Test-Time Sampling and Verification for Vision-Language-Action Models Paper page
  • RoboChemist: Long-Horizon and Safety-Compliant Robotic Chemical Experimentation Paper page
  • TA-VLA: Elucidating the Design Space of Torque-aware Vision-Language-Action Models Paper page
  • Focusing on What Matters: Object-Agent-centric Tokenization for Vision Language Action models Paper
  • FLOWER: Democratizing Generalist Robot Policies with Efficient Vision-Language-Action Flow Policies Paper page
  • Mechanistic Interpretability for Steering Vision-Language-Action Models Paper
  • RICL: Adding In-Context Adaptability to Pre-Trained Vision-Language-Action Models Paper page
  • DexVLA: Vision-Language Model with Plug-In Diffusion Expert for General Robot Control Paper page
  • FLARE: Robot Learning with Implicit World Modeling Paper page
  • 3DS-VLA: A 3D Spatial-Aware Vision Language Action Model for Robust Multi-Task Manipulation Paper
  • GraspVLA: a Grasping Foundation Model Pre-trained on Billion-scale Synthetic Action Data Paper page
  • EndoVLA: Dual-Phase Vision-Language-Action for Precise Autonomous Tracking in Endoscopy Paper
  • MoTo: A Zero-shot Plug-in Interaction-aware Navigation for General Mobile Manipulation Paper page
  • ControlVLA: Few-shot Object-centric Adaptation for Pre-trained Vision-Language-Action Models Paper page
  • TrackVLA: Embodied Visual Tracking in the Wild Paper page
  • AnyPlace: Learning Generalizable Object Placement for Robot Manipulation Paper page
  • Generalist Robot Manipulation beyond Action Labeled Data Paper page
  • LaVA-Man: Learning Visual Action Representations for Robot Manipulation Paper page

Navigation

  • MoTo: A Zero-shot Plug-in Interaction-aware Navigation for General Mobile Manipulation
  • Meta-Optimization and Program Search using Language Models for Task and Motion Planning
  • ObjectReact: Learning Object-Relative Control for Visual Navigation
  • HALO: Human Preference Aligned Offline Reward Learning for Robot Navigation
  • Imagine, Verify, Execute: Memory-guided Agentic Exploration with Vision-Language Models
  • Long Range Navigator (LRN): Extending robot planning horizons beyond metric maps
  • Search-TTA: A Multi-Modal Test-Time Adaptation Framework for Visual Search in the Wild
  • ActLoc: Learning to Localize on the Move via Active Viewpoint Selection
  • Human-like Navigation in a World Built for Humans
  • GC-VLN: Instruction as Graph Constraints for Training-free Vision-and-Language Navigation
  • GraspMolmo: Generalizable Task-Oriented Grasping via Large-Scale Synthetic Data Generation
  • Belief-Conditioned One-Step Diffusion: Real-Time Trajectory Planning with Just-Enough Sensing

Policy

  • ImMimic: Cross-Domain Imitation from Human Videos via Mapping and Interpolation Paper page
  • ReWiND: Language-Guided Rewards Teach Robot Policies without New Demonstrations Paper page
  • Steering Your Diffusion Policy with Latent Space Reinforcement Learning Paper page
  • Streaming Flow Policy: Simplifying diffusion/flow-matching policies by treating action trajectories as flow trajectories Paper page
  • SAIL: Faster-than-Demonstration Execution of Imitation Learning Policies Paper page
  • Reactive In-Air Clothing Manipulation with Confidence-Aware Dense Correspondence and Visuotactile Affordance Paper page
  • Data Retrieval with Importance Weights for Few-Shot Imitation Learning Paper page
  • X-Sim: Cross-Embodiment Learning via Real-to-Sim-to-Real Paper
  • DemoSpeedup: Accelerating Visuomotor Policies via Entropy-Guided Demonstration Acceleration Paper page
  • ManiFlow: A General Robot Manipulation Policy via Consistency Flow Training Paper page
  • Text2Touch: Tactile In-Hand Manipulation with LLM-Designed Reward Functions Paper page
  • Multi-Loco: Unifying Multi-Embodiment Legged Locomotion via Reinforcement Learning Augmented Diffusion Paper page
  • $\texttt{SPIN}$: distilling $\texttt{Skill-RRT}$ for long-horizon prehensile and non-prehensile manipulation Paper
  • Imitation Learning Based on Disentangled Representation Learning of Behavioral Characteristics Paper
  • Constraint-Preserving Data Generation for One-Shot Visuomotor Policy Generalization Paper page
  • CLASS: Contrastive Learning via Action Sequence Supervision for Robot Manipulation Paper page
  • MirrorDuo: Reflection-Consistent Visuomotor Learning from Mirrored Demonstration Pairs page
  • Dynamics-Compliant Trajectory Diffusion for Super-Nominal Payload Manipulation Paper
  • Eye, Robot: Learning to Look to Act with a BC-RL Perception-Action Loop Paper page
  • ARCH: Hierarchical Hybrid Learning for Long-Horizon Contact-Rich Robotic Assembly Paper page
  • KDPE: A Kernel Density Estimation Strategy for Diffusion Policy Trajectory Selection Paper page
  • AimBot: A Simple Auxiliary Visual Cue to Enhance Spatial Awareness of Visuomotor Policies Paper page
  • Enabling Long(er) Horizon Imitation for Manipulation Tasks by Modeling Subgoal Transitions
  • Mobi-$\pi$: Mobilizing Your Robot Learning Policy Paper page
  • Action-Free Reasoning for Policy Generalization Paper page
  • Learn from What We HAVE: History-Aware VErifier that Reasons about Past Interactions Online Paper page
  • D-CODA: Diffusion for Coordinated Dual-Arm Data Augmentation Paper page
  • ATK: Automatic Task-driven Keypoint Selection for Robust Policy Learning Paper page
  • Poke and Strike: Learning Task-Informed Exploration Policies Paper page
  • SafeBimanual: Diffusion-based trajectory optimization for safe bimanual manipulation Paper page
  • COMBO-Grasp: Learning Constraint-Based Manipulation for Bimanual Occluded Grasping Paper
  • Phantom: Training Robots Without Robots Using Only Human Videos Paper page
  • Learning Long-Context Diffusion Policies via Past-Token Prediction Paper
  • VT-Refine: Learning Bimanual Assembly with Visuo-Tactile Feedback via Simulation Fine-Tuning Paper
  • COLLAGE: Adaptive Fusion-based Retrieval for Augmented Policy Learning Paper page
  • CDP: Towards Robust Autoregressive Visuomotor Policy Learning via Causal Diffusion Paper page
  • Robust Dexterous Grasping of General Objects Paper page
  • Point Policy: Unifying Observations and Actions with Key Points for Robot Manipulation Paper page

Benchmark and Dataset

  • RoboArena: Distributed Real-World Evaluation of Generalist Robot Policies
  • GraspVLA: a Grasping Foundation Model Pre-trained on Billion-scale Synthetic Action Data
  • CUPID: Curating Data your Robot Loves with Influence
  • AutoEval: Autonomous Evaluation of Generalist Robot Manipulation Policies in the Real World
  • ManipBench: Benchmarking Vision-Language Models for Low-Level Robot Manipulation Functions
  • Ensuring Force Safety in Vision-Guided Robotic Manipulation via Implicit Tactile Calibration
  • Crossing the Human-Robot Embodiment Gap with Sim-to-Real RL using One Human Demonstration
  • UniSkill: Imitating Human Videos via Cross-Embodiment Skill Representations

Humanoid

  • HuB: Learning Extreme Humanoid Balance
  • Versatile Loco-Manipulation through Flexible Interlimb Coordination
  • Visual Imitation Enables Contextual Humanoid Control
  • Hand-Eye Autonomous Delivery: Learning Humanoid Navigation, Locomotion and Reaching
  • CLONE: Closed-Loop Whole-Body Humanoid Teleoperation for Long-Horizon Tasks
  • Embrace Contacts: Humanoid Shadowing with Full Body Ground Contacts
  • Hold My Beer: Learning Gentle Humanoid Locomotion and End-Effector Stabilization Control
  • SLAC: Simulation-Pretrained Latent Action Space for Whole-Body Real-World RL
  • Robot Trains Robot: Automatic Real-World Policy Adaptation and Learning for Humanoids
  • Humanoid Policy ~ Human Policy

World Model

  • Real2Render2Real: Scaling Robot Data Without Dynamics Simulation or Robot Hardware
  • Cross-Sensor Touch Generation
  • WoMAP: World Models For Embodied Open-Vocabulary Object Localization
  • DreamGen: Unlocking Generalization in Robot Learning through Video World Models
  • Tool-as-Interface: Learning Robot Policies from Observing Human Tool Use
  • Articulated Object Estimation in the Wild
  • DiWA: Diffusion Policy Adaptation with World Models
  • Steerable Scene Generation with Post Training and Inference-Time Search
  • Generative Visual Foresight Meets Task-Agnostic Pose Estimation in Robotic Table-top Manipulation
  • Gen2Act: Human Video Generation in Novel Scenarios enables Generalizable Robot Manipulation
  • Reflective Planning: Vision-Language Models for Multi-Stage Long-Horizon Robotic Manipulation
  • LaDi-WM: A Latent Diffusion-Based World Model for Predictive Manipulation

Dexterous Manipulation

  • DexUMI: Using Human Hand as the Universal Manipulation Interface for Dexterous Manipulation page
  • Dexplore: Scalable Neural Control for Dexterous Manipulation from Reference Scoped Exploration
  • FFHFlow: Diverse and Uncertainty-Aware Dexterous Grasp Generation via Flow Variational Inference
  • GraspQP: Differentiable Optimization of Force Closure for Diverse and Robust Dexterous Grasping page
  • Morphologically Symmetric Reinforcement Learning for Ambidextrous Bimanual Manipulation
  • KineDex: Learning Tactile-Informed Visuomotor Policies via Kinesthetic Teaching for Dexterous Manipulation
  • D-Cubed: Latent Diffusion Trajectory Optimisation for Dexterous Deformable Manipulation
  • LodeStar: Long-horizon Dexterity via Synthetic Data Augmentation from Human Demonstrations

Sim-to-Real

  • The Sound of Simulation: Learning Multimodal Sim-to-Real Robot Policies with Generative Audio
  • FetchBot: Learning Generalizable Object Fetching in Cluttered Scenes via Zero-Shot Sim2Real
  • ClutterDexGrasp: A Sim-to-Real System for General Dexterous Grasping in Cluttered Scenes
  • SimShear: Sim-to-Real Shear-based Tactile Servoing
  • Wheeled Lab: Modern Sim2Real for Low-cost, Open-source Wheeled Robotics
  • Articulate AnyMesh: Open-vocabulary 3D Articulated Objects Modeling
  • AgentWorld: An Interactive Simulation Platform for Scene Construction and Mobile Robotic Manipulation
  • Robot Learning from Any Images

ICCV2025

Vision-Language-Action Model

  • Exploring the Adversarial Vulnerabilities of Vision-Language-Action Models in Robotics Paper page
  • VQ-VLA: Improving Vision-Language-Action Models via Scaling Vector-Quantized Action Tokenizers Paper page
  • Dita: Scaling Diffusion Transformer for Generalist Vision-Language-Action Policy Paper page
  • Moto: Latent Motion Token as the Bridging Language for Learning Robot Manipulation from Videos Paper page
  • A0: An Affordance-Aware Hierarchical Model for General Robotic Manipulation Paper page
  • Embodied VideoAgent: Persistent Memory from Egocentric Videos and Embodied Sensors Enables Dynamic Scene Understanding Paper page
  • CoA-VLA: Improving Vision-Language-Action Models via Visual-Text Chain-of-Affordance Paper
  • FedVLA: Federated Vision-Language-Action Learning with Dual Gating Mixture-of-Experts for Robotic Manipulation Paper
  • Towards Long-Horizon Vision-Language-Action System: Reasoning, Acting and Memory Paper
  • PASG: A Closed-Loop Framework for Automated Geometric Primitive Extraction and Semantic Anchoring in Robotic Manipulation Paper
  • SD2Actor: Continuous State Decomposition via Diffusion Embeddings for Robotic Manipulation Paper

Vision-Language-Navigation Model

  • Move to Understand a 3D Scene: Bridging Visual Grounding and Exploration for Efficient and Versatile Embodied Navigation Paper page
  • Rethinking the Embodied Gap in Vision-and-Language Navigation: A Holistic Study of Physical and Visual Disparities Paper page
  • P3Nav: A Unified Framework for Embodied Navigation Integrating Perception, Planning, and Prediction Paper
  • SAME: Learning Generic Language-Guided Visual Navigation with State-Adaptive Mixture of Experts Paper page
  • NavMorph: A Self-Evolving World Model for Vision-and-Language Navigation in Continuous Environments Paper page
  • Harnessing Input-adaptive Inference for Efficient VLN Paper
  • Embodied Navigation with Auxiliary Task of Action Description Prediction Paper
  • 3D Gaussian Map with Open-Set Semantic Grouping for Vision-Language Navigation Paper
  • NavQ: Learning a Q-Model for Foresighted Vision-and-Language Navigation Paper
  • monoVLN: Bridging the Observation Gap between Monocular and Panoramic Vision and Language Navigation Paper

Hierarchical Planning

  • Adaptive Articulated Object Manipulation On The Fly with Foundation Model Reasoning and Part Grounding Paper
  • CogNav: Cognitive Process Modeling for Object Goal Navigation with LLMs Paper page
  • RoBridge: A Hierarchical Architecture Bridging Cognition and Execution for General Robotic Manipulation Paper page

World Model

  • IRASim: A Fine-Grained World Model for Robot Manipulation Paper page
  • GWM: Towards Scalable Gaussian World Models for Robotic Manipulation Paper page
  • DyWA: Dynamics-adaptive World Action Model for Generalizable Non-prehensile Manipulation Paper page
  • Diffusion-Based Imaginative Coordination for Bimanual Manipulation Paper
  • Learning 4D Embodied World Models Paper

Policy

  • Rethinking Bimanual Robotic Manipulation: Learning with Decoupled Interaction Framework Paper
  • EC-Flow: Enabling Versatile Robotic Manipulation from Action-Unlabeled Videos via Embodiment-Centric Flow Paper page
  • Dense Policy: Bidirectional Autoregressive Learning of Actions Paper page
  • AnyBimanual: Transferring Unimanual Policy for General Bimanual Manipulation Paper page
  • Learning Precise Affordances from Egocentric Videos for Robotic Manipulation Paper page
  • iManip: Skill-Incremental Learning for Robotic Manipulation Paper
  • Spatial-Temporal Aware Visuomotor Diffusion Policy Learning Paper page
  • Wavelet Policy: Lifting Scheme for Policy Learning in Long-Horizon Tasks Paper page
  • 4D Visual Pre-training for Robot Learning Paper

Accelerating and Deploying

  • Saliency-Aware Quantized Imitation Learning for Efficient Robotic Control Paper
  • On-Device Diffusion Transformer Policy for Efficient Robot Manipulation Paper
  • COSMO: Combination of Selective Memorization for Low-cost Vision-and-Language Navigation Paper
  • CARP: Coarse-to-Fine Autoregressive Prediction for Visuomotor Policy Learning Paper page

Perception

  • EmbodiedOcc: Embodied 3D Occupancy Prediction for Vision-based Online Scene Understanding Paper page
  • Embodied Image Captioning: Self-supervised Learning Agents for Spatially Coherent Image Descriptions Paper page

Benchmark and Dataset

  • VLABench: A Large-Scale Benchmark for Language-Conditioned Robotics Manipulation with Long-Horizon Reasoning Tasks Paper page
  • RoboFactory: Exploring Embodied Agent Collaboration with Compositional Constraints Paper page
  • HUMOTO: A 4D Dataset of Mocap Human Object Interactions Paper page
  • RoboMM: All-in-One Multimodal Large Model for Robotic Manipulation Paper
  • MoMa-Kitchen: A 100K+ Benchmark for Affordance-Grounded Last-Mile Navigation in Mobile Manipulation Paper page
  • RoboPearls: Editable Video Simulation for Robot Manipulation Paper page
  • DexH2R: A Benchmark for Dynamic Dexterous Grasping in Human-to-Robot Handover Paper page
  • Beyond the Destination: A Novel Benchmark for Exploration-Aware Embodied Question Answering Paper page
  • RobAVA: A Large-scale Dataset and Baseline Towards Video based Robotic Arm Action Understanding Paper
  • RoboAnnotatorX: A Comprehensive and Universal Annotation Framework for Accurate Understanding of Long-horizon Robot Demonstration Paper

ICML2025

Vision-Language-Action Models

  • Hi Robot: Open-Ended Instruction Following with Hierarchical Vision-Language-Action Models Paper
  • OTTER: A Vision-Language-Action Model with Text-Aware Visual Feature Extraction paper page
  • UP-VLA: A Unified Understanding and Prediction Model for Embodied Agent paper
  • ELEMENTAL: Interactive Learning from Demonstrations and Vision-Language Models for Reward Design in Robotics paper
  • ReinboT: Amplifying Robot Visual-Language Manipulation with Reinforcement Learning paper
  • A Large Recurrent Action Model: xLSTM enables Fast Inference for Robotics Tasks paper page

Planning and Reasoning

  • Efficient Robotic Policy Learning via Latent Space Backward Planning paper page
  • Closed-Loop Long-Horizon Robotic Planning via Equilibrium Sequence Modeling paper page

Policies

  • SAM2Act: Integrating Visual Foundation Model with A Memory Architecture for Robotic Manipulation paper
  • Pre-training Auto-regressive Robotic Models with 4D Representations paper page
  • Flow-based Domain Randomization for Learning and Sequencing Robotic Skills paper
  • EmbodiedBench: Comprehensive Benchmarking Multi-modal Large Language Models for Vision-Driven Embodied Agents paper page
  • Learning Policy Committees for Effective Personalization in MDPs with Diverse Tasks paper
  • Video Prediction Policy: A Generalist Robot Policy with Predictive Visual Representations paper page
  • STAR: Learning Diverse Robot Skill Abstractions through Rotation-Augmented Vector Quantization paper page

3D Vision

  • Unifying 2D and 3D Vision-Language Understanding paper page
  • GAPrompt: Geometry-Aware Point Cloud Prompt for 3D Vision Model paper page

Dataset

  • WOMD-Reasoning: A Large-Scale Dataset for Interaction Reasoning in Driving paper page

RSS2025

  • Unified World Models: Coupling Video and Action Diffusion for Pretraining on Large Robotic Datasets Paper Page
  • CordViP: Correspondence-based Visuomotor Policy for Dexterous Manipulation in Real-World Paper Page
  • Reactive Diffusion Policy: Slow-Fast Visual-Tactile Policy Learning for Contact-Rich Manipulation Paper Page
  • Dynamic Rank Adjustment in Diffusion Policies for Efficient and Flexible Training Paper
  • SpatialVLA: Exploring Spatial Representations for Visual-Language-Action Model Paper
  • Sketch-to-Skill: Bootstrapping Robot Learning with Human Drawn Trajectory Sketches Paper
  • NaVILA: Legged Robot Vision-Language-Action Model for Navigation Paper Page
  • ConRFT: A Reinforced Fine-tuning Method for VLA Models via Consistency Policy Paper Page
  • You Only Teach Once: Learn One-Shot Bimanual Robotic Manipulation from Video Demonstrations Paper Page
  • ASAP: Aligning Simulation and Real-World Physics for Learning Agile Humanoid Whole-Body Skills Paper Page
  • Flying Hand: End-Effector-Centric Framework for Versatile Aerial Manipulation Teleoperation and Policy Learning Paper Page
  • DemoGen: Synthetic Demonstration Generation for Data-Efficient Visuomotor Policy Learning Paper Page
  • DOGlove: Dexterous Manipulation with a Low-Cost Open-Source Haptic Force Feedback Glove Paper Page
  • RoboSplat: Novel Demonstration Generation with Gaussian Splatting Enables Robust One-Shot Manipulation Paper Page
  • Enhancing Autonomous Driving Systems with On-Board Deployed Large Language Models Paper
  • SATA: Safe and Adaptive Torque-Based Locomotion Policies Inspired by Animal Learning Paper Video
  • FACTR: Force-Attending Curriculum Training for Contact-Rich Policy Learning Paper Page
  • RoboVerse: Towards a Unified Platform, Dataset and Benchmark for Scalable and Generalizable Robot Learning Paper Page
  • STDArm: Transferring Visuomotor Policies From Static Data Training to Dynamic Robot Manipulation Paper

CVPR2025

Vision-Language-Action Models

  • UniAct: Universal Actions For Enhanced Embodied Foundation Models Paper Page
  • MoManipVLA: Transferring Vision-language-action Models for General Mobile Manipulation Paper Page
  • CoT-VLA: Visual Chain-of-Thought Reasoning for Vision-Language-Action Models Paper
  • SOLAMI: Social Vision-Language-Action Modeling for Immersive Interaction with 3D Autonomous Characters Paper Page
  • A Data-Centric Revisit of Pre-Trained Vision Models for Robot Learning Paper Page
  • Think Small, Act Big: Primitive Prompt Learning for Lifelong Robot Manipulation
  • Phoenix: A Motion-based Self-Reflection Framework for Fine-grained Robotic Action Correction Paper
  • OmniManip: Towards General Robotic Manipulation via Object-Centric Interaction Primitives as Spatial Constraints Paper Page
  • Mitigating the Human-Robot Domain Discrepancy in Visual Pre-training for Robotic Manipulation Paper
  • Object-Centric Prompt-Driven Vision-Language-Action Model for Robotic Manipulation Abstract
  • Robotic Visual Instruction
  • RoboGround: Robot Manipulation with Grounded Vision-Language Priors

Policies

  • KStar Diffuser: Spatial-Temporal Graph Diffusion Policy with Kinematics Modeling for Bimanual Robotic Manipulation Paper
  • RoboPEPP: Vision-Based Robot Pose and Joint Angle Estimation through Embedding Predictive Pre-Training Paper
  • Lift3D Policy: Lifting 2D Foundation Models for Robust 3D Robotic Manipulation Paper Page
  • PDFactor: Learning Tri-Perspective View Policy Diffusion Field for Multi-Task Robotic Manipulation Abstract
  • Two by Two: Learning Cross-Task Pairwise Objects Assembly for Generalizable Robot Manipulation
  • FlowRAM: Grounding Flow Matching Policy with Region-Aware Mamba Framework for Robotic Manipulation Abstract
  • G3Flow: Generative 3D Semantic Flow for Pose-aware and Generalizable Object Manipulation Paper Page
  • DexHandDiff: Interaction-aware Diffusion Planning for Adaptive Dexterous Manipulation Paper Page
  • AffordDP: Generalizable Diffusion Policy with Transferable Affordance Paper Page
  • Tra-MoE: Learning Trajectory Prediction Model from Multiple Domains for Adaptive Policy Conditioning Paper Page

Grasp

  • UniGraspTransformer: Simplified Policy Distillation for Scalable Dexterous Robotic Grasping Paper Page
  • DexGrasp Anything: Towards Universal Robotic Dexterous Grasping with Physics Awareness Paper Page
  • ZeroGrasp: Zero-Shot Shape Reconstruction Enabled Robotic Grasping Paper

Humanoid

  • Let Humanoid Robots Go Hiking! Integrative Skill Development over Complex Trails Paper Page
  • MobileH2R: Learning Generalizable Human to Mobile Robot Handover Exclusively from Scalable and Diverse Synthetic Data Paper

3D Vision

  • 3D-MVP: 3D Multiview Pretraining for Robotic Manipulation Paper Page
  • VidBot: Learning Generalizable 3D Actions from In-the-Wild 2D Human Videos for Zero-Shot Robotic Manipulation Paper Page
  • Touch2Shape: Touch-Conditioned 3D Diffusion for Shape Exploration and Reconstruction Abs

Planning and Reasoning

  • RoboBrain: A Unified Brain Model for Robotic Manipulation from Abstract to Concrete Paper
  • PhysVLM: Enabling Visual Language Models to Understand Robotic Physical Reachability Paper
  • RoboSpatial: Teaching Spatial Understanding to 2D and 3D Vision-Language Models for Robotics Paper
  • Tartan IMU: A Light Foundation Model for Inertial Positioning in Robotics Abstract
  • Code-as-Monitor: Constraint-aware Visual Programming for Reactive and Proactive Robotic Failure Detection Paper Page

Video

  • TASTE-Rob: Advancing Video Generation of Task-Oriented Hand-Object Interaction for Generalizable Robotic Manipulation Paper
  • GraphMimic: Graph-to-Graphs Generative Modeling from Videos for Policy Learning Paper

Sim2real and Real2sim

  • Prof. Robot: Differentiable Robot Rendering Without Static and Self-Collisions Paper Page
  • AutoURDF: Unsupervised Robot Modeling from Point Cloud Frames Using Cluster Registration Paper Page

Benchmark and Dataset

  • RoboTwin: Dual-Arm Robot Benchmark with Generative Digital Twins (early version) Paper Page
  • Pixel-aligned RGB-NIR Stereo Imaging and Dataset for Robot Vision Paper
  • RoboSense: Large-scale Dataset and Benchmark for Egocentric Robot Perception and Navigation in Crowded and Unstructured Environments Paper Page

ICLR2025

Vision-Language-Action Models

  • LLaRA: Supercharging Robot Learning Data for Vision-Language Policy Paper Page
  • VLAS: Vision-Language-Action Model With Speech Instructions For Customized Robot Manipulation Paper Page
  • TraceVLA: Visual Trace Prompting Enhances Spatial-Temporal Awareness for Generalist Robotic Policies Paper Page
  • Robots Pre-train Robots: Manipulation-Centric Robotic Representation from Large-Scale Robot Datasets Paper Page
  • PIDM: Predictive Inverse Dynamics Models are Scalable Learners for Robotic Manipulation Paper Page

Policies

  • GravMAD: Grounded Spatial Value Maps Guided Action Diffusion for Generalized 3D Manipulation Paper Page
  • ReViWo: Learning View-invariant World Models for Visual Robotic Manipulation Paper zhihu
  • HAMSTER: Hierarchical Action Models For Open-World Robot Manipulation Paper Page
  • BadRobot: Jailbreaking Embodied LLMs in the Physical World Paper Page
  • STRAP: Robot Sub-Trajectory Retrieval for Augmented Policy Learning Paper Page
  • SRSA: Skill Retrieval and Adaptation for Robotic Assembly Tasks Paper Page
  • Data Scaling Laws in Imitation Learning for Robotic Manipulation Paper Page
  • Stem-OB: Generalizable Visual Imitation Learning with Stem-Like Convergent Observation through Diffusion Inversion Paper Page

3D Vision

  • Dream to Manipulate: Compositional World Models Empowering Robot Imitation Learning with Imagination Paper Page
  • SPA: 3D Spatial-Awareness Enables Effective Embodied Representation Paper Page

Planning and Reasoning

  • LASeR: Towards Diversified and Generalizable Robot Design with Large Language Models Paper Page
  • Physics-informed Temporal Difference Metric Learning for Robot Motion Planning Paper Page
  • AHA: A Vision-Language-Model for Detecting and Reasoning Over Failures in Robotic Manipulation Paper Page
  • EMOS: Embodiment-aware Heterogeneous Multi-robot Operating System with LLM Agents Paper Page
  • VisualPredicator: Learning Abstract World Models with Neuro-Symbolic Predicates for Robot Planning Paper Page
  • DenseMatcher: Learning 3D Semantic Correspondence for Category-Level Manipulation from a Single Demo Paper Page
  • 6D Object Pose Tracking in Internet Videos for Robotic Manipulation Paper Page
  • Multi-Robot Motion Planning with Diffusion Models Paper Page

Video

  • GEVRM: Goal-Expressive Video Generation Model For Robust Visual Manipulation Paper

Sim2real and Real2sim

  • ReGen: Generative Robot Simulation via Inverse Design Paper Page

ICRA2025

  • MoRE: Unlocking Scalability in Reinforcement Learning for Quadruped Vision-Language-Action Models Paper
  • QUART-Online: Latency-Free Large Multimodal Language Model for Quadruped Robot Learning Paper Page
  • SpatialBot: Precise Spatial Understanding with Vision Language Models Paper Page
