Embodied-AI-Paper-TopConf

đŸ”¥ NeurIPS2025 & CoRL2025 & ICCV2025 & ICML2025 & RSS2025 & CVPR2025 & ICLR2025 & ICLR2026 Embodied AI paper list and resources.

[03/22/2025] We plan to organize more Embodied AI papers from top conferences and build a more comprehensive paper list. If there is a conference whose papers you would like to see covered, or if you have any other suggestions, please feel free to open an issue.

[04/12/2025] We are updating Embodied AI papers accepted by RSS2025 (a top robotics conference)!

[05/21/2025] We are updating Embodied AI papers accepted by ICML2025!

[08/05/2025] We are updating Embodied AI papers accepted by ICCV2025!

[09/30/2025] We are updating Embodied AI papers accepted by CoRL2025!

[11/30/2025] We are updating Embodied AI papers accepted by NeurIPS2025!

[03/12/2026] We are updating Embodied AI papers accepted by ICLR2026! (đŸ“– ICLR2026)

đŸ“– Paper List

ICLR2026

đŸ“„ Full List

Vision-Language-Action Models

  • Scaling up Memory for Robotic Control via Experience Retrieval Paper
  • MemoryVLA: Perceptual-Cognitive Memory in Vision-Language-Action Models for Robotic Manipulation Paper
  • PixelVLA: Advancing Pixel-level Understanding in Vision-Language-Action Model Paper
  • Vlaser: Vision-Language-Action Model with Synergistic Embodied Reasoning Paper
  • Disentangled Robot Learning via Separate Forward and Inverse Dynamics Pretraining Paper
  • MetaVLA: Unified Meta Co-Training for Efficient Embodied Adaptation Paper
  • Unifying Diffusion and Autoregression for Generalizable Vision-Language-Action Model Paper
  • Hybrid Training for Vision-Language-Action Models Paper
  • End-to-end Listen, Look, Speak and Act Paper
  • WholeBodyVLA: Towards Unified Latent VLA for Whole-body Loco-manipulation Control Paper
  • RoboOmni: Proactive Robot Manipulation in Omni-modal Context Paper
  • Unified Vision-Language-Action Model Paper
  • SP-VLA: A Joint Model Scheduling and Token Pruning Approach for VLA Model Acceleration Paper
  • Align-Then-stEer: Adapting the Vision-Language Action Models through Unified Latent Guidance Paper
  • AutoQVLA: Not All Channels Are Equal in Vision-Language-Action Model's Quantization Paper
  • Verifier-free Test-Time Sampling for Vision Language Action Models Paper
  • Interleave-VLA: Enhancing Robot Manipulation with Image-Text Interleaved Instructions Paper
  • Unified Diffusion VLA: Vision-Language-Action Model via Joint Discrete Denoising Diffusion Process Paper
  • Endowing GPT-4 with a Humanoid Body: Building the Bridge Between Off-the-Shelf VLMs and the Physical World Paper
  • On Robustness of Vision-Language-Action Model against Multi-Modal Perturbations Paper
  • Spatially Guided Training for Vision-Language-Action Model Paper
  • Self-Improving Vision-Language-Action Models with Data Generation via Residual RL Paper
  • Action-aware Dynamic Pruning for Efficient Vision-Language-Action Manipulation Paper
  • Spatial Forcing: Implicit Spatial Representation Alignment for Vision-language-action Model Paper
  • Genie Envisioner: A Unified World Foundation Platform for Robotic Manipulation Paper
  • From Spatial to Actions: Grounding Vision-Language-Action Model in Spatial Foundation Priors Paper
  • TwinVLA: Data-Efficient Bimanual Manipulation with Twin Single-Arm Vision-Language-Action Models Paper
  • FASTer: Toward Powerful and Efficient Autoregressive Vision–Language–Action Models with Learnable Action Tokenizer and Block-wise Decoding Paper
  • Embodied Navigation Foundation Model Paper
  • X-VLA: Soft-Prompted Transformer as Scalable Cross-Embodiment Vision-Language-Action Model Paper
  • Actions as Language: Fine-Tuning VLMs into VLAs Without Catastrophic Forgetting Paper
  • OneTwoVLA: A Unified Vision-Language-Action Model with Adaptive Reasoning Paper
  • VLM4VLA: Revisiting Vision-Language-Models in Vision-Language-Action Models Paper
  • Vision-Language-Action Instruction Tuning: From Understanding to Manipulation Paper
  • villa-X: Enhancing Latent Action Modeling in Vision-Language-Action Models Paper
  • From Seeing to Doing: Bridging Reasoning and Decision for Robotic Manipulation Paper

Vision-Language-Navigation Models

  • AutoFly: Vision-Language-Action Model for UAV Autonomous Navigation in the Wild Paper
  • Ground Slow, Move Fast: A Dual-System Foundation Model for Generalizable Vision-Language Navigation Paper
  • Towards Physically Executable 3D Gaussian for Embodied Navigation Paper
  • Uncertainty-Aware Gaussian Map for Vision-Language Navigation Paper
  • OpenFly: A Comprehensive Platform for Aerial Vision-Language Navigation Paper
  • JanusVLN: Decoupling Semantics and Spatiality with Dual Implicit Memory for Vision-Language Navigation Paper
  • CompassNav: Steering From Path Imitation to Decision Understanding In Navigation Paper
  • M$^3$E: Continual Vision-and-Language Navigation via Mixture of Macro and Micro Experts Paper
  • All-day Multi-scenes Lifelong Vision-and-Language Navigation with Tucker Adaptation Paper
  • OmniNav: A Unified Framework for Prospective Exploration and Visual-Language Navigation Paper

World Models

  • Ctrl-World: A Controllable Generative World Model for Robot Manipulation Paper
  • Context and Diversity Matter: The Emergence of In-Context Learning in World Models Paper
  • FantasyWorld: Geometry-Consistent World Modeling via Unified Video and 3D Prediction Paper
  • NeMo-map: Neural Implicit Flow Fields for Spatio-Temporal Motion Mapping Paper
  • Astra: General Interactive World Model with Autoregressive Denoising Paper
  • Empowering Multi-Robot Cooperation via Sequential World Models Paper
  • Test-Time Mixture of World Models for Embodied Agents in Dynamic Environments Paper
  • RIG: Synergizing Reasoning and Imagination in End-to-End Generalist Policy Paper
  • Learning Massively Multitask World Models for Continuous Control Paper
  • Unified 3D Scene Understanding Through Physical World Modeling Paper
  • ExoPredicator: Learning Abstract Models of Dynamic Worlds for Robot Planning Paper
  • Efficient Reinforcement Learning by Guiding World Models with Non-Curated Data Paper
  • Vid2World: Crafting Video Diffusion Models to Interactive World Models Paper
  • WMPO: World Model-based Policy Optimization for Vision-Language-Action Models Paper
  • Object-Centric World Models from Few-Shot Annotations for Sample-Efficient Reinforcement Learning Paper
  • Building spatial world models from sparse transitional episodic memories Paper
  • Cosmos Policy: Fine-Tuning Video Models for Visuomotor Control and Planning Paper
  • WoW!: World Models in a Closed-Loop World Paper

Planning and Reasoning

  • VLMgineer: Vision-Language Models as Robotic Toolsmiths Paper
  • MomaGraph: State-Aware Unified Scene Graphs with Vision-Language Models for Embodied Task Planning Paper
  • Planning with an Embodied Learnable Memory Paper
  • Theory of Space: Can Foundation Models Construct Spatial Beliefs through Active Exploration? Paper
  • Compositional Visual Planning via Inference-Time Diffusion Scaling Paper
  • Experience-based Knowledge Correction for Robust Planning in Minecraft Paper
  • Self-Improving Loops for Visual Robotic Planning Paper
  • BOLT: Decision‑Aligned Distillation and Budget-Aware Routing for Constrained Multimodal QA on Robots Paper
  • ReCAPA: Hierarchical Predictive Correction to Mitigate Cascading Failures Paper
  • One Demo Is All It Takes: Planning Domain Derivation with LLMs from A Single Demonstration Paper
  • EVLP: Learning Unified Embodied Vision-Language Planner with Reinforced Supervised Fine-Tuning Paper
  • Towards Improvisational TAMP: Learning Low-Level Shortcuts in Abstract Planning Graphs Paper
  • Embodied-R1: Reinforced Embodied Reasoning for General Robotic Manipulation Paper
  • Self-Refining Vision Language Model for Robotic Failure Detection and Reasoning Paper
  • Natural Language PDDL (NL-PDDL) for Open-world Goal-oriented Commonsense Regression Planning in Embodied AI Paper
  • SafeFlowMatcher: Safe and Fast Planning using Flow Matching with Control Barrier Functions Paper
  • OmniEVA: Embodied Versatile Planner via Task-Adaptive 3D-Grounded and Embodiment-aware Reasoning Paper

Navigation

  • From Seeing to Experiencing: Scaling Navigation Foundation Models with Reinforcement Learning Paper
  • Lifelong Embodied Navigation Learning Paper
  • CE-Nav: Flow-Guided Reinforcement Refinement for Cross-Embodiment Local Navigation Paper
  • Emergence of Spatial Representation in an Actor-Critic Agent with Hippocampus-Inspired Sequence Generator Paper

Humanoid

  • HWC-Loco: A Hierarchical Whole-Body Control Approach to Robust Humanoid Locomotion Paper
  • Task Tokens: A Flexible Approach to Adapting Behavior Foundation Models Paper
  • BFM-Zero: A Promptable Behavioral Foundation Model for Humanoid Control Using Unsupervised Reinforcement Learning Paper
  • From Language to Locomotion: Retargeting-free Humanoid Control via Motion Latent Guidance Paper

3D Vision

  • Geometry-aware 4D Video Generation for Robot Manipulation Paper
  • PD$^{2}$GS: Part-Level Decoupling and Continuous Deformation of Articulated Objects via Gaussian Splatting Paper
  • Manipulation as in Simulation: Enabling Accurate Geometry Perception in Robots Paper

Policy

  • Master Skill Learning with Policy-Grounded Synergy of LLM-based Reward Shaping and Exploring Paper
  • When would Vision-Proprioception Policies Fail in Robotic Manipulation? Paper
  • ManipEvalAgent: Promptable and Efficient Evaluation Framework for Robotic Manipulation Policies Paper
  • Remotely Detectable Robot Policy Watermarking Paper
  • Difference-Aware Retrieval Policies for Imitation Learning Paper
  • Capturing Visual Environment Structure Correlates with Control Performance Paper
  • VITA: Vision-to-Action Flow Matching Policy Paper
  • DemoGrasp: Universal Dexterous Grasping from a Single Demonstration Paper
  • When a Robot is More Capable than a Human: Learning from Constrained Demonstrators Paper
  • Autonomous Play with Correspondence-Driven Trajectory Warping Paper
  • Cross-Embodiment Offline Reinforcement Learning for Heterogeneous Robot Datasets Paper
  • Uncovering Robot Vulnerabilities through Semantic Potential Fields Paper
  • Time Optimal Execution of Action Chunk Policies Beyond Demonstration Speed Paper
  • Policy Likelihood-based Query Sampling and Critic-Exploited Reset for Efficient Preference-based Reinforcement Learning Paper
  • Rodrigues Network for Learning Robot Actions Paper
  • Reference Guided Skill Discovery Paper
  • Masked Generative Policy for Robotic Control Paper
  • GRL-SNAM: Geometric Reinforcement Learning with Differential Hamiltonians for Navigation and Mapping in Unknown Environments Paper
  • HAMLET: Switch Your Vision-Language-Action Model into a History-Aware Policy Paper
  • Towards Bridging the Gap between Large-Scale Pretraining and Efficient Finetuning for Humanoid Control Paper
  • Learning Video Generation for Robotic Manipulation with Collaborative Trajectory Control Paper
  • Policy Contrastive Decoding for Robotic Foundation Models Paper
  • Demystifying Robot Diffusion Policies: Action Memorization and a Simple Lookup Table Alternative Paper
  • H$^3$DP: Triply‑Hierarchical Diffusion Policy for Visuomotor Learning Paper
  • SimpleVLA-RL: Scaling VLA Training via Reinforcement Learning Paper
  • Compose Your Policies! Improving Diffusion-based or Flow-based Robot Policies via Test-time Distribution-level Composition Paper
  • Abstracting Robot Manipulation Skills via Mixture-of-Experts Diffusion Policies Paper
  • Accelerated co-design of robots through morphological pretraining Paper
  • Generalizable Coarse-to-Fine Robot Manipulation via Language-Aligned 3D Keypoints Paper
  • VER: Vision Expert Transformer for Robot Learning via Foundation Distillation and Dynamic Routing Paper
  • SpikePingpong: Spike Vision-based Fast-Slow Pingpong Robot System Paper
  • EquAct: An SE(3)-Equivariant Multi-Task Transformer for 3D Robotic Manipulation Paper
  • Translating Flow to Policy via Hindsight Online Imitation Paper
  • Hierarchical Value-Decomposed Offline Reinforcement Learning for Whole-Body Control Paper
  • Cortical Policy: A Dual-Stream View Transformer for Robotic Manipulation Paper
  • Geometry-aware Policy Imitation Paper
  • Contractive Diffusion Policies: Robust Action Diffusion via Contractive Score-Based Sampling with Differential Equations Paper
  • Scalable Exploration for High-Dimensional Continuous Control via Value-Guided Flow Paper
  • Mean Flow Policy with Instantaneous Velocity Constraint for One-step Action Generation Paper
  • Emergent Dexterity Via Diverse Resets and Large-Scale Reinforcement Learning Paper
  • Learning Part-Aware Dense 3D Feature Field For Generalizable Articulated Object Manipulation Paper
  • Real-Time Robot Execution with Masked Action Chunking Paper
  • Robust Fine-tuning of Vision-Language-Action Robot Policies via Parameter Merging Paper
  • ViPRA: Video Prediction for Robot Actions Paper
  • RAVEN: End-to-end Equivariant Robot Learning with RGB Cameras Paper

Dexterous Manipulation

  • DexNDM: Closing the Reality Gap for Dexterous In-Hand Rotation via Joint-Wise Neural Dynamics Model Paper
  • EgoDex: Learning Dexterous Manipulation from Large-Scale Egocentric Video Paper
  • RFS: Reinforcement Learning with Residual Flow Steering for Dexterous Manipulation Paper
  • Learning to Grasp Anything By Playing with Random Toys Paper
  • SARM: Stage-Aware Reward Modeling for Long Horizon Robot Manipulation Paper
  • UniHM: Unified Dexterous Hand Manipulation with Vision Language Model Paper
  • DexMove: Learning Tactile-Guided Non-Prehensile Manipulation with Dexterous Hands Paper
  • VLBiMan: Vision-Language Anchored One-Shot Demonstration Enables Generalizable Bimanual Robotic Manipulation Paper
  • Cross-Embodied Co-Design for Dexterous Hands Paper
  • Robotic Manipulation by Imitating Generated Videos Without Physical Demonstrations Paper
  • Primary-Fine Decoupling for Action Generation in Robotic Imitation Paper

Tactile

  • AnyTouch 2: General Optical Tactile Representation Learning For Dynamic Tactile Perception Paper
  • APPLE: Toward General Active Perception via Reinforcement Learning Paper

Sim2real and Real2sim

  • D-REX: Differentiable Real-to-Sim-to-Real Engine for Learning Dexterous Grasping Paper
  • DemoGrasp: Universal Dexterous Grasping from a Single Demonstration Paper
  • Sim2Real VLA: Zero-Shot Generalization of Synthesized Skills to Realistic Manipulation Paper
  • Exo-Plore: Exploring Exoskeleton Control Space through Human-aligned Simulation Paper
  • Emergent Dexterity Via Diverse Resets and Large-Scale Reinforcement Learning Paper
  • Manipulation as in Simulation: Enabling Accurate Geometry Perception in Robots Paper
  • Latent Adaptation of Foundation Policies for Sim-to-Real Transfer Paper
  • RobotArena $\infty$: Unlimited Robot Benchmarking via Real-to-Sim Translation Paper
  • PD$^{2}$GS: Part-Level Decoupling and Continuous Deformation of Articulated Objects via Gaussian Splatting Paper
  • Contact-guided Real2Sim from Monocular Video with Planar Scene Primitives Paper

Benchmark and Dataset

  • D2E: Scaling Vision-Action Pretraining on Desktop Data for Transfer to Embodied AI Paper
  • Memory, Benchmark & Robots: A Benchmark for Solving Complex Tasks with Reinforcement Learning Paper
  • DataMIL: Selecting Data for Robot Imitation Learning with Datamodels Paper
  • MIMIC: Mask-Injected Manipulation Video Generation with Interaction Control Paper
  • LeRobot: An Open-Source Library for End-to-End Robot Learning Paper
  • RobotArena $\infty$: Unlimited Robot Benchmarking via Real-to-Sim Translation Paper
  • RoboInter: A Holistic Intermediate Representation Suite Towards Robotic Manipulation Paper
  • ENACT: Evaluating Embodied Cognition with World Modeling of Egocentric Interaction Paper
  • AutoBio: A Simulation and Benchmark for Robotic Automation in Digital Biology Laboratory Paper
  • Image Quality Assessment for Embodied AI Paper
  • MoMaGen: Generating Demonstrations under Soft and Hard Constraints for Multi-Step Bimanual Mobile Manipulation Paper
  • CoNavBench: Collaborative Long-Horizon Vision-Language Navigation Benchmark Paper
  • World2Minecraft: Occupancy-Driven Simulated Scenes Construction Paper
  • CitySeeker: How Do VLMs Explore Embodied Urban Navigation with Implicit Human Needs? Paper
  • Seeing Across Views: Benchmarking Spatial Reasoning of Vision-Language Models in Robotic Scenes Paper
  • RoboCasa365: A Large-Scale Simulation Framework for Training and Benchmarking Generalist Robots Paper
  • REI-Bench: Can Embodied Agents Understand Vague Human Instructions in Task Planning? Paper

Other

  • On the Generalization Capacities of MLLMs for Spatial Intelligence Paper
  • Embodied Agents Meet Personalization: Investigating Challenges and Solutions Through the Lens of Memory Utilization Paper
  • Interaction-aware Representation Modeling With Co-Occurrence Consistency for Egocentric Hand-Object Parsing Paper
  • PhyScensis: Physics-Augmented LLM Agents for Complex Physical Scene Arrangement Paper
  • OmniActor: A Generalist GUI and Embodied Agent for 2D&3D Worlds Paper
  • EgoWorld: Translating Exocentric View to Egocentric View using Rich Exocentric Observations Paper
  • Contact-guided Real2Sim from Monocular Video with Planar Scene Primitives Paper

NeurIPS2025

Vision-Language-Action Model

  • Fast-in-Slow: A Dual-System VLA Model Unifying Fast Manipulation within Slow Reasoning Paper Page
  • AC-DiT: Adaptive Coordination Diffusion Transformer for Mobile Manipulation Paper Page
  • BridgeVLA: Input-Output Alignment for Efficient 3D Manipulation Learning with Vision-Language Models Paper Page
  • CogVLA: Cognition-Aligned Vision-Language-Action Models via Instruction-Driven Routing & Sparsification Paper Page
  • VideoVLA: Video Generators Can Be Generalizable Robot Manipulators
  • ChatVLA-2: Vision-Language-Action Model with Open-World Reasoning Paper Page
  • Exploring the Limits of Vision-Language-Action Manipulation in Cross-task Generalization Paper Page
  • BadVLA: Towards Backdoor Attacks on Vision-Language-Action Models via Objective-Decoupled Optimization Paper Page
  • Compliant Residual DAgger: Improving Real-World Contact-Rich Manipulation with Human Corrections Paper Page
  • VLA-OS: Structuring and Dissecting Planning Representations and Paradigms in Vision-Language-Action Models Paper Page
  • ThinkAct: Vision-Language-Action Reasoning via Reinforced Visual Latent Planning Paper Page
  • Self-Improving Embodied Foundation Models Paper Page
  • Robo2VLM: Improving Visual Question Answering using Large-Scale Robot Manipulation Data Paper Page
  • EnerVerse: Envisioning Embodied Future Space for Robotics Manipulation Paper
  • Learning Spatial-Aware Manipulation Ordering Paper
  • PRIMT: Preference-based Reinforcement Learning with Multimodal Feedback and Trajectory Synthesis from Foundation Models Paper
  • BEAST: Efficient Tokenization of B-Splines Encoded Action Sequences for Imitation Learning Paper
  • PointMapPolicy: Structured Point Cloud Processing for Multi-Modal Imitation Learning Paper
  • Real-Time Execution of Action Chunking Flow Policies Paper Page
  • Chain-of-Action: Trajectory Autoregressive Modeling for Robotic Manipulation Paper Page
  • 4D-VLA: Spatiotemporal Vision-Language-Action Pretraining with Cross-Scene Calibration
  • SAFE: Multitask Failure Detection for Vision-Language-Action Models Paper Page
  • Blindfolded Experts Generalize Better: Insights from Robotic Manipulation and Videogames Paper Page
  • HiMaCon: Discovering Hierarchical Manipulation Concepts from Unlabeled Multi-Modal Data Paper
  • Knowledge Insulating Vision-Language-Action Models: Train Fast, Run Fast, Generalize Better Paper
  • Provable Ordering and Continuity in Vision-Language Pretraining for Generalizable Embodied Agents Paper Page
  • DreamVLA: A Vision-Language-Action Model Dreamed with Comprehensive World Knowledge Paper Page

Data

  • EgoBridge: Domain Adaptation for Generalizable Imitation from Egocentric Human Data Paper Page
  • RobotSmith: Generative Robotic Tool Design for Acquisition of Complex Manipulation Skill Paper Page
  • URDF-Anything: Constructing Articulated Objects with 3D Multimodal Language Model Paper
  • DEAL: Diffusion Evolution Adversarial Learning for Sim-to-Real Transfer
  • Generalizable Domain Adaptation for Sim-and-Real Policy Co-Training Paper Page

World Model

  • SAMPO: Scale-wise Autoregression with Motion Prompt for Generative World Models Paper
  • Learning 3D Persistent Embodied World Models Paper
  • OSVI-WM: One-Shot Visual Imitation for Unseen Tasks using World-Model-Guided Trajectory Generation Paper

Planning and Reasoning

  • Towards Reliable LLM-based Robots Planning via Combined Uncertainty Estimation Paper
  • Towards Reliable Code-as-Policies: A Neuro-Symbolic Framework for Embodied Task Planning Paper
  • RDD: Retrieval-Based Demonstration Decomposer for Planner Alignment in Long-Horizon Tasks Paper Page
  • UniDomain: Pretraining a Unified PDDL Domain from Real-World Demonstrations for Generalizable Robot Task Planning Paper
  • InstructFlow: Adaptive Symbolic Constraint-Guided Code Generation for Long-Horizon Planning

Navigation

  • C-NAV: Towards Self-Evolving Continual Object Navigation in Open World Paper Page
  • Distilling LLM Prior to Flow Model for Generalizable Agent’s Imagination in Object Goal Navigation Paper
  • TP-MDDN: Task-Preferenced Multi-Demand-Driven Navigation with Autonomous Decision-Making
  • Active Test-time Vision-Language Navigation Paper
  • Aux-Think: Exploring Reasoning Strategies for Data-Efficient Vision-Language Navigation
  • EfficientNav: Towards On-Device Object-Goal Navigation with Navigation Map Caching and Retrieval Paper
  • Seeing through Uncertainty: Robust Task-Oriented Optimization in Visual Navigation Paper Page

Humanoid

  • Adversarial Locomotion and Motion Imitation for Humanoid Policy Learning Paper Page
  • From Experts to a Generalist: Toward General Whole-Body Control for Humanoid Robots Paper Page
  • KungfuBot: Physics-Based Humanoid Whole-Body Control for Learning Highly-Dynamic Skills Paper Page

3D Vision

  • DynaRend: Learning 3D Dynamics via Masked Future Rendering for Robotic Manipulation Paper
  • Building 3D Representations and Generating Motions From a Single Image via Video-Generation Paper

Policy

  • Emerging Risks from Embodied AI Require Urgent Policy Action
  • Human-assisted Robotic Policy Refinement via Action Preference Optimization Paper Page
  • Hyper-GoalNet: Goal-Conditioned Manipulation Policy Learning with HyperNetworks
  • ReinFlow: Fine-tuning Flow Matching Policy with Online Reinforcement Learning Paper Page
  • Diversifying Parallel Ergodic Search: A Signature Kernel Evolution Strategy
  • FreqPolicy: Efficient Flow-based Visuomotor Policy via Frequency Consistency Paper
  • A Practical Guide for Incorporating Symmetry in Diffusion Policy Paper
  • Latent Policy Barrier: Learning Robust Visuomotor Policies by Staying In-Distribution Paper Page
  • Quantization-Free Autoregressive Action Transformer Paper
  • Real-World Reinforcement Learning of Active Perception Behaviors
  • Failure Prediction at Runtime for Generative Robot Policies Paper
  • Act to See, See to Act: Diffusion-Driven Perception-Action Interplay for Adaptive Policies Paper Page
  • Dynamic Test-Time Compute Scaling in Control Policy: Difficulty-Aware Stochastic Interpolant Policy Paper
  • DynaGuide: Steering Diffusion Policies with Active Dynamic Guidance Paper Page
  • World-aware Planning Narratives Enhance Large Vision-Language Model Planner Paper

Accelerating and Deploying

  • Accelerating Visual-Policy Learning through Parallel Differentiable Simulation Paper Page
  • EfficientVLA: Training-Free Acceleration and Compression for Vision-Language-Action Models Paper
  • A Smooth Sea Never Made a Skilled SAILOR: Robust Imitation via Learning to Search Paper Page
  • VLA-Cache: Efficient Vision-Language-Action Manipulation via Adaptive Token Caching Paper Page

Tactile

  • Universal Visuo-Tactile Video Understanding for Embodied Interaction Paper
  • Enhancing Tactile-based Reinforcement Learning for Robotic Control Paper Page
  • Taccel: Scaling Up Vision-based Tactile Robotics via High-performance GPU Simulation Paper Page
  • Toward Artificial Palpation: Representation Learning of Touch on Soft Bodies Paper Page
  • Touch in the Wild: Learning Fine-Grained Manipulation with a Portable Visuo-Tactile Gripper Paper Page

Dexterous

  • Contact Map Transfer with Conditional Diffusion Model for Generalizable Dexterous Grasp Generation Paper Page
  • HumanoidGen: Data Generation for Bimanual Dexterous Manipulation via LLM Reasoning Paper Page
  • Grasp2Grasp: Vision-Based Dexterous Grasp Translation via Schrödinger Bridges Paper Page
  • Scaffolding Dexterous Manipulation with Vision-Language Models Paper Page
  • DexFlyWheel: A Scalable and Self-improving Data Generation Framework for Dexterous Manipulation Paper Page
  • DexGarmentLab: Dexterous Garment Manipulation Environment with Generalizable Policy Paper Page

Benchmark and Dataset

  • RoboCerebra: A Large-scale Benchmark for Long-horizon Robotic Manipulation Evaluation Paper Page
  • SutureBot: A Precision Framework & Benchmark For Autonomous End-to-End Suturing Paper Page
  • Synthesizing Photorealistic and Dynamic Urban Environments for Multimodal Robot Navigation and Collaboration
  • LabUtopia: High-Fidelity Simulation and Hierarchical Benchmark for Scientific Embodied Agents Paper Page
  • SonoGym: High Performance Simulation for Challenging Surgical Tasks with Robotic Ultrasound Paper Page
  • Embodied Crowd Counting
  • PAC Bench: Do Foundation Models Understand Prerequisites for Executing Manipulation Policies? Paper

CoRL2025

Vision-Language-Action Model

  • $\pi_{0.5}$: a Vision-Language-Action Model with Open-World Generalization Paper page
  • Training Strategies for Efficient Embodied Reasoning Paper page
  • Long-VLA: Unleashing Long-Horizon Capability of Vision Language Action Model for Robot Manipulation Paper page
  • RoboMonkey: Scaling Test-Time Sampling and Verification for Vision-Language-Action Models Paper page
  • RoboChemist: Long-Horizon and Safety-Compliant Robotic Chemical Experimentation Paper page
  • TA-VLA: Elucidating the Design Space of Torque-aware Vision-Language-Action Models Paper page
  • Focusing on What Matters: Object-Agent-centric Tokenization for Vision Language Action models Paper
  • FLOWER: Democratizing Generalist Robot Policies with Efficient Vision-Language-Action Flow Policies Paper page
  • Mechanistic Interpretability for Steering Vision-Language-Action Models Paper
  • RICL: Adding In-Context Adaptability to Pre-Trained Vision-Language-Action Models Paper page
  • DexVLA: Vision-Language Model with Plug-In Diffusion Expert for General Robot Control Paper page
  • FLARE: Robot Learning with Implicit World Modeling Paper page
  • 3DS-VLA: A 3D Spatial-Aware Vision Language Action Model for Robust Multi-Task Manipulation Paper
  • GraspVLA: a Grasping Foundation Model Pre-trained on Billion-scale Synthetic Action Data Paper page
  • EndoVLA: Dual-Phase Vision-Language-Action for Precise Autonomous Tracking in Endoscopy Paper
  • MoTo: A Zero-shot Plug-in Interaction-aware Navigation for General Mobile Manipulation Paper page
  • ControlVLA: Few-shot Object-centric Adaptation for Pre-trained Vision-Language-Action Models Paper page
  • TrackVLA: Embodied Visual Tracking in the Wild Paper page
  • AnyPlace: Learning Generalizable Object Placement for Robot Manipulation Paper page
  • Generalist Robot Manipulation beyond Action Labeled Data Paper page
  • LaVA-Man: Learning Visual Action Representations for Robot Manipulation Paper page

Navigation

  • MoTo: A Zero-shot Plug-in Interaction-aware Navigation for General Mobile Manipulation
  • Meta-Optimization and Program Search using Language Models for Task and Motion Planning
  • ObjectReact: Learning Object-Relative Control for Visual Navigation
  • HALO: Human Preference Aligned Offline Reward Learning for Robot Navigation
  • Imagine, Verify, Execute: Memory-guided Agentic Exploration with Vision-Language Models
  • Long Range Navigator (LRN): Extending robot planning horizons beyond metric maps
  • Search-TTA: A Multi-Modal Test-Time Adaptation Framework for Visual Search in the Wild
  • ActLoc: Learning to Localize on the Move via Active Viewpoint Selection
  • Human-like Navigation in a World Built for Humans
  • GC-VLN: Instruction as Graph Constraints for Training-free Vision-and-Language Navigation
  • GraspMolmo: Generalizable Task-Oriented Grasping via Large-Scale Synthetic Data Generation
  • Belief-Conditioned One-Step Diffusion: Real-Time Trajectory Planning with Just-Enough Sensing

Policy

  • ImMimic: Cross-Domain Imitation from Human Videos via Mapping and Interpolation Paper page
  • ReWiND: Language-Guided Rewards Teach Robot Policies without New Demonstrations Paper page
  • Steering Your Diffusion Policy with Latent Space Reinforcement Learning Paper page
  • Streaming Flow Policy: Simplifying diffusion/flow-matching policies by treating action trajectories as flow trajectories Paper page
  • SAIL: Faster-than-Demonstration Execution of Imitation Learning Policies Paper page
  • Reactive In-Air Clothing Manipulation with Confidence-Aware Dense Correspondence and Visuotactile Affordance Paper page
  • Data Retrieval with Importance Weights for Few-Shot Imitation Learning Paper page
  • X-Sim: Cross-Embodiment Learning via Real-to-Sim-to-Real Paper
  • DemoSpeedup: Accelerating Visuomotor Policies via Entropy-Guided Demonstration Acceleration Paper page
  • ManiFlow: A General Robot Manipulation Policy via Consistency Flow Training Paper page
  • Text2Touch: Tactile In-Hand Manipulation with LLM-Designed Reward Functions Paper page
  • Multi-Loco: Unifying Multi-Embodiment Legged Locomotion via Reinforcement Learning Augmented Diffusion Paper page
  • $\texttt{SPIN}$: distilling $\texttt{Skill-RRT}$ for long-horizon prehensile and non-prehensile manipulation Paper
  • Imitation Learning Based on Disentangled Representation Learning of Behavioral Characteristics Paper
  • Constraint-Preserving Data Generation for One-Shot Visuomotor Policy Generalization Paper page
  • CLASS: Contrastive Learning via Action Sequence Supervision for Robot Manipulation Paper page
  • MirrorDuo: Reflection-Consistent Visuomotor Learning from Mirrored Demonstration Pairs page
  • Dynamics-Compliant Trajectory Diffusion for Super-Nominal Payload Manipulation Paper
  • Eye, Robot: Learning to Look to Act with a BC-RL Perception-Action Loop Paper page
  • ARCH: Hierarchical Hybrid Learning for Long-Horizon Contact-Rich Robotic Assembly Paper page
  • KDPE: A Kernel Density Estimation Strategy for Diffusion Policy Trajectory Selection Paper page
  • AimBot: A Simple Auxiliary Visual Cue to Enhance Spatial Awareness of Visuomotor Policies Paper page
  • Enabling Long(er) Horizon Imitation for Manipulation Tasks by Modeling Subgoal Transitions
  • Mobi-$\pi$: Mobilizing Your Robot Learning Policy Paper page
  • Action-Free Reasoning for Policy Generalization Paper page
  • Learn from What We HAVE: History-Aware VErifier that Reasons about Past Interactions Online Paper page
  • D-CODA: Diffusion for Coordinated Dual-Arm Data Augmentation Paper page
  • ATK: Automatic Task-driven Keypoint Selection for Robust Policy Learning Paper page
  • Poke and Strike: Learning Task-Informed Exploration Policies Paper page
  • SafeBimanual: Diffusion-based trajectory optimization for safe bimanual manipulation Paper page
  • COMBO-Grasp: Learning Constraint-Based Manipulation for Bimanual Occluded Grasping Paper
  • Phantom: Training Robots Without Robots Using Only Human Videos Paper page
  • Learning Long-Context Diffusion Policies via Past-Token Prediction Paper
  • VT-Refine: Learning Bimanual Assembly with Visuo-Tactile Feedback via Simulation Fine-Tuning Paper
  • COLLAGE: Adaptive Fusion-based Retrieval for Augmented Policy Learning Paper page
  • CDP: Towards Robust Autoregressive Visuomotor Policy Learning via Causal Diffusion Paper page
  • Robust Dexterous Grasping of General Objects Paper page
  • Point Policy: Unifying Observations and Actions with Key Points for Robot Manipulation Paper page

Benchmark and Dataset

  • RoboArena: Distributed Real-World Evaluation of Generalist Robot Policies
  • GraspVLA: a Grasping Foundation Model Pre-trained on Billion-scale Synthetic Action Data
  • CUPID: Curating Data your Robot Loves with Influence
  • AutoEval: Autonomous Evaluation of Generalist Robot Manipulation Policies in the Real World
  • ManipBench: Benchmarking Vision-Language Models for Low-Level Robot Manipulation Functions
  • Ensuring Force Safety in Vision-Guided Robotic Manipulation via Implicit Tactile Calibration
  • Crossing the Human-Robot Embodiment Gap with Sim-to-Real RL using One Human Demonstration
  • UniSkill: Imitating Human Videos via Cross-Embodiment Skill Representations

Humanoid

  • HuB: Learning Extreme Humanoid Balance
  • Versatile Loco-Manipulation through Flexible Interlimb Coordination
  • Visual Imitation Enables Contextual Humanoid Control
  • Hand-Eye Autonomous Delivery: Learning Humanoid Navigation, Locomotion and Reaching
  • CLONE: Closed-Loop Whole-Body Humanoid Teleoperation for Long-Horizon Tasks
  • Embrace Contacts: Humanoid Shadowing with Full Body Ground Contacts
  • Hold My Beer: Learning Gentle Humanoid Locomotion and End-Effector Stabilization Control
  • SLAC: Simulation-Pretrained Latent Action Space for Whole-Body Real-World RL
  • Robot Trains Robot: Automatic Real-World Policy Adaptation and Learning for Humanoids
  • Humanoid Policy ~ Human Policy

World Model

  • Real2Render2Real: Scaling Robot Data Without Dynamics Simulation or Robot Hardware
  • Cross-Sensor Touch Generation
  • WoMAP: World Models For Embodied Open-Vocabulary Object Localization
  • DreamGen: Unlocking Generalization in Robot Learning through Video World Models
  • Tool-as-Interface: Learning Robot Policies from Observing Human Tool Use
  • Articulated Object Estimation in the Wild
  • DiWA: Diffusion Policy Adaptation with World Models
  • Steerable Scene Generation with Post Training and Inference-Time Search
  • Generative Visual Foresight Meets Task-Agnostic Pose Estimation in Robotic Table-top Manipulation
  • Gen2Act: Human Video Generation in Novel Scenarios enables Generalizable Robot Manipulation
  • Reflective Planning: Vision-Language Models for Multi-Stage Long-Horizon Robotic Manipulation
  • LaDi-WM: A Latent Diffusion-Based World Model for Predictive Manipulation

Dexterous Manipulation

  • DexUMI: Using Human Hand as the Universal Manipulation Interface for Dexterous Manipulation page
  • Dexplore: Scalable Neural Control for Dexterous Manipulation from Reference Scoped Exploration
  • FFHFlow: Diverse and Uncertainty-Aware Dexterous Grasp Generation via Flow Variational Inference
  • GraspQP: Differentiable Optimization of Force Closure for Diverse and Robust Dexterous Grasping page
  • Morphologically Symmetric Reinforcement Learning for Ambidextrous Bimanual Manipulation
  • KineDex: Learning Tactile-Informed Visuomotor Policies via Kinesthetic Teaching for Dexterous Manipulation
  • D-Cubed: Latent Diffusion Trajectory Optimisation for Dexterous Deformable Manipulation
  • LodeStar: Long-horizon Dexterity via Synthetic Data Augmentation from Human Demonstrations

Sim-to-Real

  • The Sound of Simulation: Learning Multimodal Sim-to-Real Robot Policies with Generative Audio
  • FetchBot: Learning Generalizable Object Fetching in Cluttered Scenes via Zero-Shot Sim2Real
  • ClutterDexGrasp: A Sim-to-Real System for General Dexterous Grasping in Cluttered Scenes
  • SimShear: Sim-to-Real Shear-based Tactile Servoing
  • Wheeled Lab: Modern Sim2Real for Low-cost, Open-source Wheeled Robotics
  • Articulate AnyMesh: Open-vocabulary 3D Articulated Objects Modeling
  • AgentWorld: An Interactive Simulation Platform for Scene Construction and Mobile Robotic Manipulation
  • Robot Learning from Any Images

ICCV2025

Vision-Language-Action Model

  • Exploring the Adversarial Vulnerabilities of Vision-Language-Action Models in Robotics Paper page
  • VQ-VLA: Improving Vision-Language-Action Models via Scaling Vector-Quantized Action Tokenizers Paper page
  • Dita: Scaling Diffusion Transformer for Generalist Vision-Language-Action Policy Paper page
  • Moto: Latent Motion Token as the Bridging Language for Learning Robot Manipulation from Videos Paper page
  • A0: An Affordance-Aware Hierarchical Model for General Robotic Manipulation Paper page
  • Embodied VideoAgent: Persistent Memory from Egocentric Videos and Embodied Sensors Enables Dynamic Scene Understanding Paper page
  • CoA-VLA: Improving Vision-Language-Action Models via Visual-Text Chain-of-Affordance Paper
  • FedVLA: Federated Vision-Language-Action Learning with Dual Gating Mixture-of-Experts for Robotic Manipulation Paper
  • Towards Long-Horizon Vision-Language-Action System: Reasoning, Acting and Memory Paper
  • PASG: A Closed-Loop Framework for Automated Geometric Primitive Extraction and Semantic Anchoring in Robotic Manipulation Paper
  • SD2Actor: Continuous State Decomposition via Diffusion Embeddings for Robotic Manipulation Paper

Vision-Language-Navigation Model

  • Move to Understand a 3D Scene: Bridging Visual Grounding and Exploration for Efficient and Versatile Embodied Navigation Paper page
  • Rethinking the Embodied Gap in Vision-and-Language Navigation: A Holistic Study of Physical and Visual Disparities Paper page
  • P3Nav: A Unified Framework for Embodied Navigation Integrating Perception, Planning, and Prediction Paper
  • SAME: Learning Generic Language-Guided Visual Navigation with State-Adaptive Mixture of Experts Paper page
  • NavMorph: A Self-Evolving World Model for Vision-and-Language Navigation in Continuous Environments Paper page
  • Harnessing Input-adaptive Inference for Efficient VLN Paper
  • Embodied Navigation with Auxiliary Task of Action Description Prediction Paper
  • 3D Gaussian Map with Open-Set Semantic Grouping for Vision-Language Navigation Paper
  • NavQ: Learning a Q-Model for Foresighted Vision-and-Language Navigation Paper
  • monoVLN: Bridging the Observation Gap between Monocular and Panoramic Vision and Language Navigation Paper

Hierarchical Planning

  • Adaptive Articulated Object Manipulation On The Fly with Foundation Model Reasoning and Part Grounding Paper
  • CogNav: Cognitive Process Modeling for Object Goal Navigation with LLMs Paper page
  • RoBridge: A Hierarchical Architecture Bridging Cognition and Execution for General Robotic Manipulation Paper page

World Model

  • IRASim: A Fine-Grained World Model for Robot Manipulation Paper page
  • GWM: Towards Scalable Gaussian World Models for Robotic Manipulation Paper page
  • DyWA: Dynamics-adaptive World Action Model for Generalizable Non-prehensile Manipulation Paper page
  • Diffusion-Based Imaginative Coordination for Bimanual Manipulation Paper
  • Learning 4D Embodied World Models Paper

Policy

  • Rethinking Bimanual Robotic Manipulation: Learning with Decoupled Interaction Framework Paper
  • EC-Flow: Enabling Versatile Robotic Manipulation from Action-Unlabeled Videos via Embodiment-Centric Flow Paper page
  • Dense Policy: Bidirectional Autoregressive Learning of Actions Paper page
  • AnyBimanual: Transferring Unimanual Policy for General Bimanual Manipulation Paper page
  • Learning Precise Affordances from Egocentric Videos for Robotic Manipulation Paper page
  • iManip: Skill-Incremental Learning for Robotic Manipulation Paper
  • Spatial-Temporal Aware Visuomotor Diffusion Policy Learning Paper page
  • Wavelet Policy: Lifting Scheme for Policy Learning in Long-Horizon Tasks Paper page
  • 4D Visual Pre-training for Robot Learning Paper

Accelerating and Deploying

  • Saliency-Aware Quantized Imitation Learning for Efficient Robotic Control Paper
  • On-Device Diffusion Transformer Policy for Efficient Robot Manipulation Paper
  • COSMO: Combination of Selective Memorization for Low-cost Vision-and-Language Navigation Paper
  • CARP: Coarse-to-Fine Autoregressive Prediction for Visuomotor Policy Learning Paper page

Perception

  • EmbodiedOcc: Embodied 3D Occupancy Prediction for Vision-based Online Scene Understanding Paper page
  • Embodied Image Captioning: Self-supervised Learning Agents for Spatially Coherent Image Descriptions Paper page

Benchmark and Dataset

  • VLABench: A Large-Scale Benchmark for Language-Conditioned Robotics Manipulation with Long-Horizon Reasoning Tasks Paper page
  • RoboFactory: Exploring Embodied Agent Collaboration with Compositional Constraints Paper page
  • HUMOTO: A 4D Dataset of Mocap Human Object Interactions Paper page
  • RoboMM: All-in-One Multimodal Large Model for Robotic Manipulation Paper
  • MoMa-Kitchen: A 100K+ Benchmark for Affordance-Grounded Last-Mile Navigation in Mobile Manipulation Paper page
  • RoboPearls: Editable Video Simulation for Robot Manipulation Paper page
  • DexH2R: A Benchmark for Dynamic Dexterous Grasping in Human-to-Robot Handover Paper page
  • Beyond the Destination: A Novel Benchmark for Exploration-Aware Embodied Question Answering Paper page
  • RobAVA: A Large-scale Dataset and Baseline Towards Video based Robotic Arm Action Understanding Paper
  • RoboAnnotatorX: A Comprehensive and Universal Annotation Framework for Accurate Understanding of Long-horizon Robot Demonstration Paper

ICML2025

Vision-Language-Action Models

  • Hi Robot: Open-Ended Instruction Following with Hierarchical Vision-Language-Action Models Paper
  • OTTER: A Vision-Language-Action Model with Text-Aware Visual Feature Extraction paper page
  • UP-VLA: A Unified Understanding and Prediction Model for Embodied Agent paper
  • ELEMENTAL: Interactive Learning from Demonstrations and Vision-Language Models for Reward Design in Robotics paper
  • ReinboT: Amplifying Robot Visual-Language Manipulation with Reinforcement Learning paper
  • A Large Recurrent Action Model: xLSTM enables Fast Inference for Robotics Tasks paper page

Planning and Reasoning

  • Efficient Robotic Policy Learning via Latent Space Backward Planning paper page
  • Closed-Loop Long-Horizon Robotic Planning via Equilibrium Sequence Modeling paper page

Policies

  • SAM2Act: Integrating Visual Foundation Model with A Memory Architecture for Robotic Manipulation paper
  • Pre-training Auto-regressive Robotic Models with 4D Representations paper page
  • Flow-based Domain Randomization for Learning and Sequencing Robotic Skills paper
  • EmbodiedBench: Comprehensive Benchmarking Multi-modal Large Language Models for Vision-Driven Embodied Agents paper page
  • Learning Policy Committees for Effective Personalization in MDPs with Diverse Tasks paper
  • Video Prediction Policy: A Generalist Robot Policy with Predictive Visual Representations paper page
  • STAR: Learning Diverse Robot Skill Abstractions through Rotation-Augmented Vector Quantization paper page

3D Vision

  • Unifying 2D and 3D Vision-Language Understanding paper page
  • GAPrompt: Geometry-Aware Point Cloud Prompt for 3D Vision Model paper page

Dataset

  • WOMD-Reasoning: A Large-Scale Dataset for Interaction Reasoning in Driving paper page

RSS2025

  • Unified World Models: Coupling Video and Action Diffusion for Pretraining on Large Robotic Datasets Paper Page
  • CordViP: Correspondence-based Visuomotor Policy for Dexterous Manipulation in Real-World Paper Page
  • Reactive Diffusion Policy: Slow-Fast Visual-Tactile Policy Learning for Contact-Rich Manipulation Paper Page
  • Dynamic Rank Adjustment in Diffusion Policies for Efficient and Flexible Training Paper
  • SpatialVLA: Exploring Spatial Representations for Visual-Language-Action Model Paper
  • Sketch-to-Skill: Bootstrapping Robot Learning with Human Drawn Trajectory Sketches Paper
  • NaVILA: Legged Robot Vision-Language-Action Model for Navigation Paper Page
  • ConRFT: A Reinforced Fine-tuning Method for VLA Models via Consistency Policy Paper Page
  • You Only Teach Once: Learn One-Shot Bimanual Robotic Manipulation from Video Demonstrations Paper Page
  • ASAP: Aligning Simulation and Real-World Physics for Learning Agile Humanoid Whole-Body Skills Paper Page
  • Flying Hand: End-Effector-Centric Framework for Versatile Aerial Manipulation Teleoperation and Policy Learning Paper Page
  • DemoGen: Synthetic Demonstration Generation for Data-Efficient Visuomotor Policy Learning Paper Page
  • DOGlove: Dexterous Manipulation with a Low-Cost Open-Source Haptic Force Feedback Glove Paper Page
  • RoboSplat: Novel Demonstration Generation with Gaussian Splatting Enables Robust One-Shot Manipulation Paper Page
  • Enhancing Autonomous Driving Systems with On-Board Deployed Large Language Models Paper
  • SATA: Safe and Adaptive Torque-Based Locomotion Policies Inspired by Animal Learning Paper Video
  • FACTR: Force-Attending Curriculum Training for Contact-Rich Policy Learning Paper Page
  • RoboVerse: Towards a Unified Platform, Dataset and Benchmark for Scalable and Generalizable Robot Learning Paper Page
  • STDArm: Transferring Visuomotor Policies From Static Data Training to Dynamic Robot Manipulation Paper

CVPR2025

Vision-Language-Action Models

  • UniAct: Universal Actions For Enhanced Embodied Foundation Models Paper Page
  • MoManipVLA: Transferring Vision-language-action Models for General Mobile Manipulation Paper Page
  • CoT-VLA: Visual Chain-of-Thought Reasoning for Vision-Language-Action Models Paper
  • SOLAMI: Social Vision-Language-Action Modeling for Immersive Interaction with 3D Autonomous Characters Paper Page
  • A Data-Centric Revisit of Pre-Trained Vision Models for Robot Learning Paper Page
  • Think Small, Act Big: Primitive Prompt Learning for Lifelong Robot Manipulation
  • Phoenix: A Motion-based Self-Reflection Framework for Fine-grained Robotic Action Correction Paper
  • OmniManip: Towards General Robotic Manipulation via Object-Centric Interaction Primitives as Spatial Constraints Paper Page
  • Mitigating the Human-Robot Domain Discrepancy in Visual Pre-training for Robotic Manipulation Paper
  • Object-Centric Prompt-Driven Vision-Language-Action Model for Robotic Manipulation Abstract
  • Robotic Visual Instruction
  • RoboGround: Robot Manipulation with Grounded Vision-Language Priors

Policies

  • KStar Diffuser: Spatial-Temporal Graph Diffusion Policy with Kinematics Modeling for Bimanual Robotic Manipulation Paper
  • RoboPEPP: Vision-Based Robot Pose and Joint Angle Estimation through Embedding Predictive Pre-Training Paper
  • Lift3D Policy: Lifting 2D Foundation Models for Robust 3D Robotic Manipulation Paper Page
  • PDFactor: Learning Tri-Perspective View Policy Diffusion Field for Multi-Task Robotic Manipulation Abstract
  • Two by Two: Learning Cross-Task Pairwise Objects Assembly for Generalizable Robot Manipulation
  • FlowRAM: Grounding Flow Matching Policy with Region-Aware Mamba Framework for Robotic Manipulation Abstract
  • G3Flow: Generative 3D Semantic Flow for Pose-aware and Generalizable Object Manipulation Paper Page
  • DexHandDiff: Interaction-aware Diffusion Planning for Adaptive Dexterous Manipulation Paper Page
  • AffordDP: Generalizable Diffusion Policy with Transferable Affordance Paper Page
  • Tra-MoE: Learning Trajectory Prediction Model from Multiple Domains for Adaptive Policy Conditioning Paper Page

Grasp

  • UniGraspTransformer: Simplified Policy Distillation for Scalable Dexterous Robotic Grasping Paper Page
  • DexGrasp Anything: Towards Universal Robotic Dexterous Grasping with Physics Awareness Paper Page
  • ZeroGrasp: Zero-Shot Shape Reconstruction Enabled Robotic Grasping Paper

Humanoid

  • Let Humanoid Robots Go Hiking! Integrative Skill Development over Complex Trails Paper Page
  • MobileH2R: Learning Generalizable Human to Mobile Robot Handover Exclusively from Scalable and Diverse Synthetic Data Paper

3D Vision

  • 3D-MVP: 3D Multiview Pretraining for Robotic Manipulation Paper Page
  • VidBot: Learning Generalizable 3D Actions from In-the-Wild 2D Human Videos for Zero-Shot Robotic Manipulation Paper Page
  • Touch2Shape: Touch-Conditioned 3D Diffusion for Shape Exploration and Reconstruction Abs

Planning and Reasoning

  • RoboBrain: A Unified Brain Model for Robotic Manipulation from Abstract to Concrete Paper
  • PhysVLM: Enabling Visual Language Models to Understand Robotic Physical Reachability Paper
  • RoboSpatial: Teaching Spatial Understanding to 2D and 3D Vision-Language Models for Robotics Paper
  • Tartan IMU: A Light Foundation Model for Inertial Positioning in Robotics Abstract
  • Code-as-Monitor: Constraint-aware Visual Programming for Reactive and Proactive Robotic Failure Detection Paper Page

Video

  • TASTE-Rob: Advancing Video Generation of Task-Oriented Hand-Object Interaction for Generalizable Robotic Manipulation Paper
  • GraphMimic: Graph-to-Graphs Generative Modeling from Videos for Policy Learning Paper

Sim2real and Real2sim

  • Prof. Robot: Differentiable Robot Rendering Without Static and Self-Collisions Paper Page
  • AutoURDF: Unsupervised Robot Modeling from Point Cloud Frames Using Cluster Registration Paper Page

Benchmark and Dataset

  • RoboTwin: Dual-Arm Robot Benchmark with Generative Digital Twins (early version) Paper Page
  • Pixel-aligned RGB-NIR Stereo Imaging and Dataset for Robot Vision Paper
  • RoboSense: Large-scale Dataset and Benchmark for Egocentric Robot Perception and Navigation in Crowded and Unstructured Environments Paper Page

ICLR2025

Vision-Language-Action Models

  • LLaRA: Supercharging Robot Learning Data for Vision-Language Policy Paper Page
  • VLAS: Vision-Language-Action Model With Speech Instructions For Customized Robot Manipulation Paper Page
  • TraceVLA: Visual Trace Prompting Enhances Spatial-Temporal Awareness for Generalist Robotic Policies Paper Page
  • Robots Pre-train Robots: Manipulation-Centric Robotic Representation from Large-Scale Robot Datasets Paper Page
  • PIDM: Predictive Inverse Dynamics Models are Scalable Learners for Robotic Manipulation Paper Page

Policies

  • GravMAD: Grounded Spatial Value Maps Guided Action Diffusion for Generalized 3D Manipulation Paper Page
  • ReViWo: Learning View-invariant World Models for Visual Robotic Manipulation Paper zhihu
  • HAMSTER: Hierarchical Action Models For Open-World Robot Manipulation Paper Page
  • BadRobot: Jailbreaking Embodied LLMs in the Physical World Paper Page
  • STRAP: Robot Sub-Trajectory Retrieval for Augmented Policy Learning Paper Page
  • SRSA: Skill Retrieval and Adaptation for Robotic Assembly Tasks Paper Page
  • Data Scaling Laws in Imitation Learning for Robotic Manipulation Paper Page
  • Stem-OB: Generalizable Visual Imitation Learning with Stem-Like Convergent Observation through Diffusion Inversion Paper Page

3D Vision

  • Dream to Manipulate: Compositional World Models Empowering Robot Imitation Learning with Imagination Paper Page
  • SPA: 3D Spatial-Awareness Enables Effective Embodied Representation Paper Page

Planning and Reasoning

  • LASeR: Towards Diversified and Generalizable Robot Design with Large Language Models Paper Page
  • Physics-informed Temporal Difference Metric Learning for Robot Motion Planning Paper Page
  • AHA: A Vision-Language-Model for Detecting and Reasoning Over Failures in Robotic Manipulation Paper Page
  • EMOS: Embodiment-aware Heterogeneous Multi-robot Operating System with LLM Agents Paper Page
  • VisualPredicator: Learning Abstract World Models with Neuro-Symbolic Predicates for Robot Planning Paper Page
  • DenseMatcher: Learning 3D Semantic Correspondence for Category-Level Manipulation from a Single Demo Paper Page
  • 6D Object Pose Tracking in Internet Videos for Robotic Manipulation Paper Page
  • Multi-Robot Motion Planning with Diffusion Models Paper Page

Video

  • GEVRM: Goal-Expressive Video Generation Model For Robust Visual Manipulation Paper

Sim2real and Real2sim

  • ReGen: Generative Robot Simulation via Inverse Design Paper Page

ICRA2025

  • MoRE: Unlocking Scalability in Reinforcement Learning for Quadruped Vision-Language-Action Models Paper
  • QUART-Online: Latency-Free Large Multimodal Language Model for Quadruped Robot Learning Paper Page
  • SpatialBot: Precise Spatial Understanding with Vision Language Models Paper Page
