đŸ”¥ NeurIPS2025 & CoRL2025 & ICCV2025 & ICML2025 & RSS2025 & CVPR2025 & ICLR2025 & ICLR2026 Embodied AI Paper List and Resources.
[03/22/2025] We plan to organize more Embodied AI papers from top conferences and build a more comprehensive paper list. If there is a conference you would like us to cover, or if you have any other suggestions, please feel free to open an issue.
[04/12/2025] We are updating Embodied AI papers accepted by RSS2025 (a top robotics conference)!
[05/21/2025] We are updating Embodied AI papers accepted by ICML2025!
[08/05/2025] We are updating Embodied AI papers accepted by ICCV2025!
[09/30/2025] We are updating Embodied AI papers accepted by CoRL2025!
[11/30/2025] We are updating Embodied AI papers accepted by NeurIPS2025!
[03/12/2026] We are updating Embodied AI papers accepted by ICLR2026! (đŸ“– ICLR2026)
- đŸ“– ICLR2026
- đŸ“– NeurIPS2025
- đŸ“– CoRL2025
- đŸ“– ICCV2025
- đŸ“– ICML2025
- đŸ“– RSS2025
- đŸ“– CVPR2025
- đŸ“– ICLR2025
- đŸ“– ICRA2025
- Scaling up Memory for Robotic Control via Experience Retrieval Paper
- MemoryVLA: Perceptual-Cognitive Memory in Vision-Language-Action Models for Robotic Manipulation Paper
- PixelVLA: Advancing Pixel-level Understanding in Vision-Language-Action Model Paper
- Vlaser: Vision-Language-Action Model with Synergistic Embodied Reasoning Paper
- Disentangled Robot Learning via Separate Forward and Inverse Dynamics Pretraining Paper
- MetaVLA: Unified Meta Co-Training for Efficient Embodied Adaptation Paper
- Unifying Diffusion and Autoregression for Generalizable Vision-Language-Action Model Paper
- Hybrid Training for Vision-Language-Action Models Paper
- End-to-end Listen, Look, Speak and Act Paper
- WholeBodyVLA: Towards Unified Latent VLA for Whole-body Loco-manipulation Control Paper
- RoboOmni: Proactive Robot Manipulation in Omni-modal Context Paper
- Unified Vision-Language-Action Model Paper
- SP-VLA: A Joint Model Scheduling and Token Pruning Approach for VLA Model Acceleration Paper
- Align-Then-stEer: Adapting the Vision-Language Action Models through Unified Latent Guidance Paper
- AutoQVLA: Not All Channels Are Equal in Vision-Language-Action Model's Quantization Paper
- Verifier-free Test-Time Sampling for Vision Language Action Models Paper
- Interleave-VLA: Enhancing Robot Manipulation with Image-Text Interleaved Instructions Paper
- Unified Diffusion VLA: Vision-Language-Action Model via Joint Discrete Denoising Diffusion Process Paper
- Endowing GPT-4 with a Humanoid Body: Building the Bridge Between Off-the-Shelf VLMs and the Physical World Paper
- On Robustness of Vision-Language-Action Model against Multi-Modal Perturbations Paper
- Spatially Guided Training for Vision-Language-Action Model Paper
- Self-Improving Vision-Language-Action Models with Data Generation via Residual RL Paper
- Action-aware Dynamic Pruning for Efficient Vision-Language-Action Manipulation Paper
- Spatial Forcing: Implicit Spatial Representation Alignment for Vision-language-action Model Paper
- Genie Envisioner: A Unified World Foundation Platform for Robotic Manipulation Paper
- From Spatial to Actions: Grounding Vision-Language-Action Model in Spatial Foundation Priors Paper
- TwinVLA: Data-Efficient Bimanual Manipulation with Twin Single-Arm Vision-Language-Action Models Paper
- FASTer: Toward Powerful and Efficient Autoregressive Vision–Language–Action Models with Learnable Action Tokenizer and Block-wise Decoding Paper
- Embodied Navigation Foundation Model Paper
- X-VLA: Soft-Prompted Transformer as Scalable Cross-Embodiment Vision-Language-Action Model Paper
- Actions as Language: Fine-Tuning VLMs into VLAs Without Catastrophic Forgetting Paper
- OneTwoVLA: A Unified Vision-Language-Action Model with Adaptive Reasoning Paper
- VLM4VLA: Revisiting Vision-Language-Models in Vision-Language-Action Models Paper
- Vision-Language-Action Instruction Tuning: From Understanding to Manipulation Paper
- villa-X: Enhancing Latent Action Modeling in Vision-Language-Action Models Paper
- From Seeing to Doing: Bridging Reasoning and Decision for Robotic Manipulation Paper
- AutoFly: Vision-Language-Action Model for UAV Autonomous Navigation in the Wild Paper
- Ground Slow, Move Fast: A Dual-System Foundation Model for Generalizable Vision-Language Navigation Paper
- Towards Physically Executable 3D Gaussian for Embodied Navigation Paper
- Uncertainty-Aware Gaussian Map for Vision-Language Navigation Paper
- OpenFly: A Comprehensive Platform for Aerial Vision-Language Navigation Paper
- JanusVLN: Decoupling Semantics and Spatiality with Dual Implicit Memory for Vision-Language Navigation Paper
- CompassNav: Steering From Path Imitation to Decision Understanding In Navigation Paper
- M$^3$E: Continual Vision-and-Language Navigation via Mixture of Macro and Micro Experts Paper
- All-day Multi-scenes Lifelong Vision-and-Language Navigation with Tucker Adaptation Paper
- OmniNav: A Unified Framework for Prospective Exploration and Visual-Language Navigation Paper
- Ctrl-World: A Controllable Generative World Model for Robot Manipulation Paper
- Context and Diversity Matter: The Emergence of In-Context Learning in World Models Paper
- FantasyWorld: Geometry-Consistent World Modeling via Unified Video and 3D Prediction Paper
- NeMo-map: Neural Implicit Flow Fields for Spatio-Temporal Motion Mapping Paper
- Astra: General Interactive World Model with Autoregressive Denoising Paper
- Empowering Multi-Robot Cooperation via Sequential World Models Paper
- Test-Time Mixture of World Models for Embodied Agents in Dynamic Environments Paper
- RIG: Synergizing Reasoning and Imagination in End-to-End Generalist Policy Paper
- Learning Massively Multitask World Models for Continuous Control Paper
- Unified 3D Scene Understanding Through Physical World Modeling Paper
- ExoPredicator: Learning Abstract Models of Dynamic Worlds for Robot Planning Paper
- Efficient Reinforcement Learning by Guiding World Models with Non-Curated Data Paper
- Vid2World: Crafting Video Diffusion Models to Interactive World Models Paper
- WMPO: World Model-based Policy Optimization for Vision-Language-Action Models Paper
- Object-Centric World Models from Few-Shot Annotations for Sample-Efficient Reinforcement Learning Paper
- Building spatial world models from sparse transitional episodic memories Paper
- Cosmos Policy: Fine-Tuning Video Models for Visuomotor Control and Planning Paper
- WoW!: World Models in a Closed-Loop World Paper
- VLMgineer: Vision-Language Models as Robotic Toolsmiths Paper
- MomaGraph: State-Aware Unified Scene Graphs with Vision-Language Models for Embodied Task Planning Paper
- Planning with an Embodied Learnable Memory Paper
- Theory of Space: Can Foundation Models Construct Spatial Beliefs through Active Exploration? Paper
- Compositional Visual Planning via Inference-Time Diffusion Scaling Paper
- Experience-based Knowledge Correction for Robust Planning in Minecraft Paper
- Self-Improving Loops for Visual Robotic Planning Paper
- BOLT: Decision-Aligned Distillation and Budget-Aware Routing for Constrained Multimodal QA on Robots Paper
- ReCAPA: Hierarchical Predictive Correction to Mitigate Cascading Failures Paper
- One Demo Is All It Takes: Planning Domain Derivation with LLMs from A Single Demonstration Paper
- EVLP: Learning Unified Embodied Vision-Language Planner with Reinforced Supervised Fine-Tuning Paper
- Towards Improvisational TAMP: Learning Low-Level Shortcuts in Abstract Planning Graphs Paper
- Embodied-R1: Reinforced Embodied Reasoning for General Robotic Manipulation Paper
- Self-Refining Vision Language Model for Robotic Failure Detection and Reasoning Paper
- Natural Language PDDL (NL-PDDL) for Open-world Goal-oriented Commonsense Regression Planning in Embodied AI Paper
- SafeFlowMatcher: Safe and Fast Planning using Flow Matching with Control Barrier Functions Paper
- OmniEVA: Embodied Versatile Planner via Task-Adaptive 3D-Grounded and Embodiment-aware Reasoning Paper
- From Seeing to Experiencing: Scaling Navigation Foundation Models with Reinforcement Learning Paper
- Lifelong Embodied Navigation Learning Paper
- CE-Nav: Flow-Guided Reinforcement Refinement for Cross-Embodiment Local Navigation Paper
- Emergence of Spatial Representation in an Actor-Critic Agent with Hippocampus-Inspired Sequence Generator Paper
- HWC-Loco: A Hierarchical Whole-Body Control Approach to Robust Humanoid Locomotion Paper
- Task Tokens: A Flexible Approach to Adapting Behavior Foundation Models Paper
- BFM-Zero: A Promptable Behavioral Foundation Model for Humanoid Control Using Unsupervised Reinforcement Learning Paper
- From Language to Locomotion: Retargeting-free Humanoid Control via Motion Latent Guidance Paper
- Geometry-aware 4D Video Generation for Robot Manipulation Paper
- PD$^{2}$GS: Part-Level Decoupling and Continuous Deformation of Articulated Objects via Gaussian Splatting Paper
- Manipulation as in Simulation: Enabling Accurate Geometry Perception in Robots Paper
- Master Skill Learning with Policy-Grounded Synergy of LLM-based Reward Shaping and Exploring Paper
- When would Vision-Proprioception Policies Fail in Robotic Manipulation? Paper
- ManipEvalAgent: Promptable and Efficient Evaluation Framework for Robotic Manipulation Policies Paper
- Remotely Detectable Robot Policy Watermarking Paper
- Difference-Aware Retrieval Policies for Imitation Learning Paper
- Capturing Visual Environment Structure Correlates with Control Performance Paper
- VITA: Vision-to-Action Flow Matching Policy Paper
- DemoGrasp: Universal Dexterous Grasping from a Single Demonstration Paper
- When a Robot is More Capable than a Human: Learning from Constrained Demonstrators Paper
- Autonomous Play with Correspondence-Driven Trajectory Warping Paper
- Cross-Embodiment Offline Reinforcement Learning for Heterogeneous Robot Datasets Paper
- Uncovering Robot Vulnerabilities through Semantic Potential Fields Paper
- Time Optimal Execution of Action Chunk Policies Beyond Demonstration Speed Paper
- Policy Likelihood-based Query Sampling and Critic-Exploited Reset for Efficient Preference-based Reinforcement Learning Paper
- Rodrigues Network for Learning Robot Actions Paper
- Reference Guided Skill Discovery Paper
- Masked Generative Policy for Robotic Control Paper
- GRL-SNAM: Geometric Reinforcement Learning with Differential Hamiltonians for Navigation and Mapping in Unknown Environments Paper
- HAMLET: Switch Your Vision-Language-Action Model into a History-Aware Policy Paper
- Towards Bridging the Gap between Large-Scale Pretraining and Efficient Finetuning for Humanoid Control Paper
- Learning Video Generation for Robotic Manipulation with Collaborative Trajectory Control Paper
- Policy Contrastive Decoding for Robotic Foundation Models Paper
- Demystifying Robot Diffusion Policies: Action Memorization and a Simple Lookup Table Alternative Paper
- H$^3$DP: Triply-Hierarchical Diffusion Policy for Visuomotor Learning Paper
- SimpleVLA-RL: Scaling VLA Training via Reinforcement Learning Paper
- Compose Your Policies! Improving Diffusion-based or Flow-based Robot Policies via Test-time Distribution-level Composition Paper
- Abstracting Robot Manipulation Skills via Mixture-of-Experts Diffusion Policies Paper
- Accelerated co-design of robots through morphological pretraining Paper
- Generalizable Coarse-to-Fine Robot Manipulation via Language-Aligned 3D Keypoints Paper
- VER: Vision Expert Transformer for Robot Learning via Foundation Distillation and Dynamic Routing Paper
- SpikePingpong: Spike Vision-based Fast-Slow Pingpong Robot System Paper
- EquAct: An SE(3)-Equivariant Multi-Task Transformer for 3D Robotic Manipulation Paper
- Translating Flow to Policy via Hindsight Online Imitation Paper
- Hierarchical Value-Decomposed Offline Reinforcement Learning for Whole-Body Control Paper
- Cortical Policy: A Dual-Stream View Transformer for Robotic Manipulation Paper
- Geometry-aware Policy Imitation Paper
- Contractive Diffusion Policies: Robust Action Diffusion via Contractive Score-Based Sampling with Differential Equations Paper
- Scalable Exploration for High-Dimensional Continuous Control via Value-Guided Flow Paper
- Mean Flow Policy with Instantaneous Velocity Constraint for One-step Action Generation Paper
- Emergent Dexterity Via Diverse Resets and Large-Scale Reinforcement Learning Paper
- Learning Part-Aware Dense 3D Feature Field For Generalizable Articulated Object Manipulation Paper
- Real-Time Robot Execution with Masked Action Chunking Paper
- Robust Fine-tuning of Vision-Language-Action Robot Policies via Parameter Merging Paper
- ViPRA: Video Prediction for Robot Actions Paper
- RAVEN: End-to-end Equivariant Robot Learning with RGB Cameras Paper
- DexNDM: Closing the Reality Gap for Dexterous In-Hand Rotation via Joint-Wise Neural Dynamics Model Paper
- EgoDex: Learning Dexterous Manipulation from Large-Scale Egocentric Video Paper
- RFS: Reinforcement learning with Residual flow steering for dexterous manipulation Paper
- Learning to Grasp Anything By Playing with Random Toys Paper
- SARM: Stage-Aware Reward Modeling for Long Horizon Robot Manipulation Paper
- UniHM: Unified Dexterous Hand Manipulation with Vision Language Model Paper
- DexMove: Learning Tactile-Guided Non-Prehensile Manipulation with Dexterous Hands Paper
- VLBiMan: Vision-Language Anchored One-Shot Demonstration Enables Generalizable Bimanual Robotic Manipulation Paper
- Cross-Embodied Co-Design for Dexterous Hands Paper
- Robotic Manipulation by Imitating Generated Videos Without Physical Demonstrations Paper
- Primary-Fine Decoupling for Action Generation in Robotic Imitation Paper
- AnyTouch 2: General Optical Tactile Representation Learning For Dynamic Tactile Perception Paper
- APPLE: Toward General Active Perception via Reinforcement Learning Paper
- D-REX: Differentiable Real-to-Sim-to-Real Engine for Learning Dexterous Grasping Paper
- DemoGrasp: Universal Dexterous Grasping from a Single Demonstration Paper
- Sim2Real VLA: Zero-Shot Generalization of Synthesized Skills to Realistic Manipulation Paper
- Exo-Plore: Exploring Exoskeleton Control Space through Human-aligned Simulation Paper
- Emergent Dexterity Via Diverse Resets and Large-Scale Reinforcement Learning Paper
- Manipulation as in Simulation: Enabling Accurate Geometry Perception in Robots Paper
- Latent Adaptation of Foundation Policies for Sim-to-Real Transfer Paper
- RobotArena$\infty$: Unlimited Robot Benchmarking via Real-to-Sim Translation Paper
- PD$^{2}$GS: Part-Level Decoupling and Continuous Deformation of Articulated Objects via Gaussian Splatting Paper
- Contact-guided Real2Sim from Monocular Video with Planar Scene Primitives Paper
- D2E: Scaling Vision-Action Pretraining on Desktop Data for Transfer to Embodied AI Paper
- Memory, Benchmark & Robots: A Benchmark for Solving Complex Tasks with Reinforcement Learning Paper
- DataMIL: Selecting Data for Robot Imitation Learning with Datamodels Paper
- MIMIC: Mask-Injected Manipulation Video Generation with Interaction Control Paper
- LeRobot: An Open-Source Library for End-to-End Robot Learning Paper
- RobotArena$\infty$: Unlimited Robot Benchmarking via Real-to-Sim Translation Paper
- RoboInter: A Holistic Intermediate Representation Suite Towards Robotic Manipulation Paper
- ENACT: Evaluating Embodied Cognition with World Modeling of Egocentric Interaction Paper
- AutoBio: A Simulation and Benchmark for Robotic Automation in Digital Biology Laboratory Paper
- Image Quality Assessment for Embodied AI Paper
- MoMaGen: Generating Demonstrations under Soft and Hard Constraints for Multi-Step Bimanual Mobile Manipulation Paper
- CoNavBench: Collaborative Long-Horizon Vision-Language Navigation Benchmark Paper
- World2Minecraft: Occupancy-Driven Simulated Scenes Construction Paper
- CitySeeker: How Do VLMs Explore Embodied Urban Navigation with Implicit Human Needs? Paper
- Seeing Across Views: Benchmarking Spatial Reasoning of Vision-Language Models in Robotic Scenes Paper
- RoboCasa365: A Large-Scale Simulation Framework for Training and Benchmarking Generalist Robots Paper
- REI-Bench: Can Embodied Agents Understand Vague Human Instructions in Task Planning? Paper
- On the Generalization Capacities of MLLMs for Spatial Intelligence Paper
- Embodied Agents Meet Personalization: Investigating Challenges and Solutions Through the Lens of Memory Utilization Paper
- Interaction-aware Representation Modeling With Co-Occurrence Consistency for Egocentric Hand-Object Parsing Paper
- PhyScensis: Physics-Augmented LLM Agents for Complex Physical Scene Arrangement Paper
- OmniActor: A Generalist GUI and Embodied Agent for 2D&3D Worlds Paper
- EgoWorld: Translating Exocentric View to Egocentric View using Rich Exocentric Observations Paper
- Contact-guided Real2Sim from Monocular Video with Planar Scene Primitives Paper
- Fast-in-Slow: A Dual-System VLA Model Unifying Fast Manipulation within Slow Reasoning Paper Page
- AC-DiT: Adaptive Coordination Diffusion Transformer for Mobile Manipulation Paper Page
- BridgeVLA: Input-Output Alignment for Efficient 3D Manipulation Learning with Vision-Language Models Paper Page
- CogVLA: Cognition-Aligned Vision-Language-Action Models via Instruction-Driven Routing & Sparsification Paper Page
- VideoVLA: Video Generators Can Be Generalizable Robot Manipulators
- ChatVLA-2: Vision-Language-Action Model with Open-World Reasoning Paper Page
- Exploring the Limits of Vision-Language-Action Manipulation in Cross-task Generalization Paper Page
- BadVLA: Towards Backdoor Attacks on Vision-Language-Action Models via Objective-Decoupled Optimization Paper Page
- Compliant Residual DAgger: Improving Real-World Contact-Rich Manipulation with Human Corrections Paper Page
- VLA-OS: Structuring and Dissecting Planning Representations and Paradigms in Vision-Language-Action Models Paper Page
- ThinkAct: Vision-Language-Action Reasoning via Reinforced Visual Latent Planning Paper Page
- Self-Improving Embodied Foundation Models Paper Page
- Robo2VLM: Improving Visual Question Answering using Large-Scale Robot Manipulation Data Paper Page
- EnerVerse: Envisioning Embodied Future Space for Robotics Manipulation Paper
- Learning Spatial-Aware Manipulation Ordering Paper
- PRIMT: Preference-based Reinforcement Learning with Multimodal Feedback and Trajectory Synthesis from Foundation Models Paper
- BEAST: Efficient Tokenization of B-Splines Encoded Action Sequences for Imitation Learning Paper
- PointMapPolicy: Structured Point Cloud Processing for Multi-Modal Imitation Learning Paper
- Real-Time Execution of Action Chunking Flow Policies Paper Page
- Chain-of-Action: Trajectory Autoregressive Modeling for Robotic Manipulation Paper Page
- 4D-VLA: Spatiotemporal Vision-Language-Action Pretraining with Cross-Scene Calibration
- SAFE: Multitask Failure Detection for Vision-Language-Action Models Paper Page
- Blindfolded Experts Generalize Better: Insights from Robotic Manipulation and Videogames Paper Page
- HiMaCon: Discovering Hierarchical Manipulation Concepts from Unlabeled Multi-Modal Data Paper
- Knowledge Insulating Vision-Language-Action Models: Train Fast, Run Fast, Generalize Better Paper
- Provable Ordering and Continuity in Vision-Language Pretraining for Generalizable Embodied Agents Paper Page
- DreamVLA: A Vision-Language-Action Model Dreamed with Comprehensive World Knowledge Paper Page
- EgoBridge: Domain Adaptation for Generalizable Imitation from Egocentric Human Data Paper Page
- RobotSmith: Generative Robotic Tool Design for Acquisition of Complex Manipulation Skill Paper Page
- URDF-Anything: Constructing Articulated Objects with 3D Multimodal Language Model Paper
- DEAL: Diffusion Evolution Adversarial Learning for Sim-to-Real Transfer
- Generalizable Domain Adaptation for Sim-and-Real Policy Co-Training Paper Page
- SAMPO: Scale-wise Autoregression with Motion Prompt for Generative World Models Paper
- Learning 3D Persistent Embodied World Models Paper
- OSVI-WM: One-Shot Visual Imitation for Unseen Tasks using World-Model-Guided Trajectory Generation Paper
- Towards Reliable LLM-based Robots Planning via Combined Uncertainty Estimation Paper
- Towards Reliable Code-as-Policies: A Neuro-Symbolic Framework for Embodied Task Planning Paper
- RDD: Retrieval-Based Demonstration Decomposer for Planner Alignment in Long-Horizon Tasks Paper Page
- UniDomain: Pretraining a Unified PDDL Domain from Real-World Demonstrations for Generalizable Robot Task Planning Paper
- InstructFlow: Adaptive Symbolic Constraint-Guided Code Generation for Long-Horizon Planning
- C-NAV: Towards Self-Evolving Continual Object Navigation in Open World Paper Page
- Distilling LLM Prior to Flow Model for Generalizable Agent’s Imagination in Object Goal Navigation Paper
- TP-MDDN: Task-Preferenced Multi-Demand-Driven Navigation with Autonomous Decision-Making
- Active Test-time Vision-Language Navigation Paper
- Aux-Think: Exploring Reasoning Strategies for Data-Efficient Vision-Language Navigation
- EfficientNav: Towards On-Device Object-Goal Navigation with Navigation Map Caching and Retrieval Paper
- Seeing through Uncertainty: Robust Task-Oriented Optimization in Visual Navigation Paper Page
- Adversarial Locomotion and Motion Imitation for Humanoid Policy Learning Paper Page
- From Experts to a Generalist: Toward General Whole-Body Control for Humanoid Robots Paper Page
- KungfuBot: Physics-Based Humanoid Whole-Body Control for Learning Highly-Dynamic Skills Paper Page
- DynaRend: Learning 3D Dynamics via Masked Future Rendering for Robotic Manipulation Paper
- Building 3D Representations and Generating Motions From a Single Image via Video-Generation Paper
- Emerging Risks from Embodied AI Require Urgent Policy Action
- Human-assisted Robotic Policy Refinement via Action Preference Optimization Paper Page
- Hyper-GoalNet: Goal-Conditioned Manipulation Policy Learning with HyperNetworks
- ReinFlow: Fine-tuning Flow Matching Policy with Online Reinforcement Learning Paper Page
- Diversifying Parallel Ergodic Search: A Signature Kernel Evolution Strategy
- FreqPolicy: Efficient Flow-based Visuomotor Policy via Frequency Consistency Paper
- A Practical Guide for Incorporating Symmetry in Diffusion Policy Paper
- Latent Policy Barrier: Learning Robust Visuomotor Policies by Staying In-Distribution Paper Page
- Quantization-Free Autoregressive Action Transformer Paper
- Real-World Reinforcement Learning of Active Perception Behaviors
- Failure Prediction at Runtime for Generative Robot Policies Paper
- Act to See, See to Act: Diffusion-Driven Perception-Action Interplay for Adaptive Policies Paper Page
- Dynamic Test-Time Compute Scaling in Control Policy: Difficulty-Aware Stochastic Interpolant Policy Paper
- DynaGuide: Steering Diffusion Policies with Active Dynamic Guidance Paper Page
- World-aware Planning Narratives Enhance Large Vision-Language Model Planner Paper
- Accelerating Visual-Policy Learning through Parallel Differentiable Simulation Paper Page
- EfficientVLA: Training-Free Acceleration and Compression for Vision-Language-Action Models Paper
- A Smooth Sea Never Made a Skilled SAILOR: Robust Imitation via Learning to Search Paper Page
- VLA-Cache: Efficient Vision-Language-Action Manipulation via Adaptive Token Caching Paper Page
- Universal Visuo-Tactile Video Understanding for Embodied Interaction Paper
- Enhancing Tactile-based Reinforcement Learning for Robotic Control Paper Page
- Taccel: Scaling Up Vision-based Tactile Robotics via High-performance GPU Simulation Paper Page
- Toward Artificial Palpation: Representation Learning of Touch on Soft Bodies Paper Page
- Touch in the Wild: Learning Fine-Grained Manipulation with a Portable Visuo-Tactile Gripper Paper Page
- Contact Map Transfer with Conditional Diffusion Model for Generalizable Dexterous Grasp Generation Paper Page
- HumanoidGen: Data Generation for Bimanual Dexterous Manipulation via LLM Reasoning Paper Page
- Grasp2Grasp: Vision-Based Dexterous Grasp Translation via Schrödinger Bridges Paper Page
- Scaffolding Dexterous Manipulation with Vision-Language Models Paper Page
- DexFlyWheel: A Scalable and Self-improving Data Generation Framework for Dexterous Manipulation Paper Page
- DexGarmentLab: Dexterous Garment Manipulation Environment with Generalizable Policy Paper Page
- RoboCerebra: A Large-scale Benchmark for Long-horizon Robotic Manipulation Evaluation Paper Page
- SutureBot: A Precision Framework & Benchmark For Autonomous End-to-End Suturing Paper Page
- Synthesizing Photorealistic and Dynamic Urban Environments for Multimodal Robot Navigation and Collaboration
- LabUtopia: High-Fidelity Simulation and Hierarchical Benchmark for Scientific Embodied Agents Paper Page
- SonoGym: High Performance Simulation for Challenging Surgical Tasks with Robotic Ultrasound Paper Page
- Embodied Crowd Counting
- PAC Bench: Do Foundation Models Understand Prerequisites for Executing Manipulation Policies? Paper
- $\pi_{0.5}$: a Vision-Language-Action Model with Open-World Generalization Paper page
- Training Strategies for Efficient Embodied Reasoning Paper page
- Long-VLA: Unleashing Long-Horizon Capability of Vision Language Action Model for Robot Manipulation Paper page
- RoboMonkey: Scaling Test-Time Sampling and Verification for Vision-Language-Action Models Paper page
- RoboChemist: Long-Horizon and Safety-Compliant Robotic Chemical Experimentation Paper page
- TA-VLA: Elucidating the Design Space of Torque-aware Vision-Language-Action Models Paper page
- Focusing on What Matters: Object-Agent-centric Tokenization for Vision Language Action models Paper
- FLOWER: Democratizing Generalist Robot Policies with Efficient Vision-Language-Action Flow Policies Paper page
- Mechanistic Interpretability for Steering Vision-Language-Action Models Paper
- RICL: Adding In-Context Adaptability to Pre-Trained Vision-Language-Action Models Paper page
- DexVLA: Vision-Language Model with Plug-In Diffusion Expert for General Robot Control Paper page
- FLARE: Robot Learning with Implicit World Modeling Paper page
- 3DS-VLA: A 3D Spatial-Aware Vision Language Action Model for Robust Multi-Task Manipulation Paper
- GraspVLA: a Grasping Foundation Model Pre-trained on Billion-scale Synthetic Action Data Paper page
- EndoVLA: Dual-Phase Vision-Language-Action for Precise Autonomous Tracking in Endoscopy Paper
- MoTo: A Zero-shot Plug-in Interaction-aware Navigation for General Mobile Manipulation Paper page
- ControlVLA: Few-shot Object-centric Adaptation for Pre-trained Vision-Language-Action Models Paper page
- TrackVLA: Embodied Visual Tracking in the Wild Paper page
- AnyPlace: Learning Generalizable Object Placement for Robot Manipulation Paper page
- Generalist Robot Manipulation beyond Action Labeled Data Paper page
- LaVA-Man: Learning Visual Action Representations for Robot Manipulation Paper page
- MoTo: A Zero-shot Plug-in Interaction-aware Navigation for General Mobile Manipulation
- Meta-Optimization and Program Search using Language Models for Task and Motion Planning
- ObjectReact: Learning Object-Relative Control for Visual Navigation
- HALO: Human Preference Aligned Offline Reward Learning for Robot Navigation
- Imagine, Verify, Execute: Memory-guided Agentic Exploration with Vision-Language Models
- Long Range Navigator (LRN): Extending robot planning horizons beyond metric maps
- Search-TTA: A Multi-Modal Test-Time Adaptation Framework for Visual Search in the Wild
- ActLoc: Learning to Localize on the Move via Active Viewpoint Selection
- Human-like Navigation in a World Built for Humans
- GC-VLN: Instruction as Graph Constraints for Training-free Vision-and-Language Navigation
- GraspMolmo: Generalizable Task-Oriented Grasping via Large-Scale Synthetic Data Generation
- Belief-Conditioned One-Step Diffusion: Real-Time Trajectory Planning with Just-Enough Sensing
- ImMimic: Cross-Domain Imitation from Human Videos via Mapping and Interpolation Paper page
- ReWiND: Language-Guided Rewards Teach Robot Policies without New Demonstrations Paper page
- Steering Your Diffusion Policy with Latent Space Reinforcement Learning Paper page
- Streaming Flow Policy: Simplifying diffusion/flow-matching policies by treating action trajectories as flow trajectories Paper page
- SAIL: Faster-than-Demonstration Execution of Imitation Learning Policies Paper page
- Reactive In-Air Clothing Manipulation with Confidence-Aware Dense Correspondence and Visuotactile Affordance Paper page
- Data Retrieval with Importance Weights for Few-Shot Imitation Learning Paper page
- X-Sim: Cross-Embodiment Learning via Real-to-Sim-to-Real Paper
- DemoSpeedup: Accelerating Visuomotor Policies via Entropy-Guided Demonstration Acceleration Paper page
- ManiFlow: A General Robot Manipulation Policy via Consistency Flow Training Paper page
- Text2Touch: Tactile In-Hand Manipulation with LLM-Designed Reward Functions Paper page
- Multi-Loco: Unifying Multi-Embodiment Legged Locomotion via Reinforcement Learning Augmented Diffusion Paper page
- $\texttt{SPIN}$: distilling $\texttt{Skill-RRT}$ for long-horizon prehensile and non-prehensile manipulation Paper
- Imitation Learning Based on Disentangled Representation Learning of Behavioral Characteristics Paper
- Constraint-Preserving Data Generation for One-Shot Visuomotor Policy Generalization Paper page
- CLASS: Contrastive Learning via Action Sequence Supervision for Robot Manipulation Paper page
- MirrorDuo: Reflection-Consistent Visuomotor Learning from Mirrored Demonstration Pairs page
- Dynamics-Compliant Trajectory Diffusion for Super-Nominal Payload Manipulation Paper
- Eye, Robot: Learning to Look to Act with a BC-RL Perception-Action Loop Paper page
- ARCH: Hierarchical Hybrid Learning for Long-Horizon Contact-Rich Robotic Assembly Paper page
- KDPE: A Kernel Density Estimation Strategy for Diffusion Policy Trajectory Selection Paper page
- AimBot: A Simple Auxiliary Visual Cue to Enhance Spatial Awareness of Visuomotor Policies Paper page
- Enabling Long(er) Horizon Imitation for Manipulation Tasks by Modeling Subgoal Transitions
- Mobi-$\pi$: Mobilizing Your Robot Learning Policy Paper page
- Action-Free Reasoning for Policy Generalization Paper page
- Learn from What We HAVE: History-Aware VErifier that Reasons about Past Interactions Online Paper page
- D-CODA: Diffusion for Coordinated Dual-Arm Data Augmentation Paper page
- ATK: Automatic Task-driven Keypoint Selection for Robust Policy Learning Paper page
- Poke and Strike: Learning Task-Informed Exploration Policies Paper page
- SafeBimanual: Diffusion-based trajectory optimization for safe bimanual manipulation Paper page
- COMBO-Grasp: Learning Constraint-Based Manipulation for Bimanual Occluded Grasping Paper
- Phantom: Training Robots Without Robots Using Only Human Videos Paper page
- Learning Long-Context Diffusion Policies via Past-Token Prediction Paper
- VT-Refine: Learning Bimanual Assembly with Visuo-Tactile Feedback via Simulation Fine-Tuning Paper
- COLLAGE: Adaptive Fusion-based Retrieval for Augmented Policy Learning Paper page
- CDP: Towards Robust Autoregressive Visuomotor Policy Learning via Causal Diffusion Paper page
- Robust Dexterous Grasping of General Objects Paper page
- Point Policy: Unifying Observations and Actions with Key Points for Robot Manipulation Paper page
- RoboArena: Distributed Real-World Evaluation of Generalist Robot Policies
- GraspVLA: a Grasping Foundation Model Pre-trained on Billion-scale Synthetic Action Data
- CUPID: Curating Data your Robot Loves with Influence
- AutoEval: Autonomous Evaluation of Generalist Robot Manipulation Policies in the Real World
- ManipBench: Benchmarking Vision-Language Models for Low-Level Robot Manipulation Functions
- Ensuring Force Safety in Vision-Guided Robotic Manipulation via Implicit Tactile Calibration
- Crossing the Human-Robot Embodiment Gap with Sim-to-Real RL using One Human Demonstration
- UniSkill: Imitating Human Videos via Cross-Embodiment Skill Representations
- HuB: Learning Extreme Humanoid Balance
- Versatile Loco-Manipulation through Flexible Interlimb Coordination
- Visual Imitation Enables Contextual Humanoid Control
- Hand-Eye Autonomous Delivery: Learning Humanoid Navigation, Locomotion and Reaching
- CLONE: Closed-Loop Whole-Body Humanoid Teleoperation for Long-Horizon Tasks
- Embrace Contacts: Humanoid Shadowing with Full Body Ground Contacts
- Hold My Beer: Learning Gentle Humanoid Locomotion and End-Effector Stabilization Control
- SLAC: Simulation-Pretrained Latent Action Space for Whole-Body Real-World RL
- Robot Trains Robot: Automatic Real-World Policy Adaptation and Learning for Humanoids
- Humanoid Policy ~ Human Policy
- Real2Render2Real: Scaling Robot Data Without Dynamics Simulation or Robot Hardware
- Cross-Sensor Touch Generation
- WoMAP: World Models For Embodied Open-Vocabulary Object Localization
- DreamGen: Unlocking Generalization in Robot Learning through Video World Models
- Tool-as-Interface: Learning Robot Policies from Observing Human Tool Use
- Articulated Object Estimation in the Wild
- DiWA: Diffusion Policy Adaptation with World Models
- Steerable Scene Generation with Post Training and Inference-Time Search
- Generative Visual Foresight Meets Task-Agnostic Pose Estimation in Robotic Table-top Manipulation
- Gen2Act: Human Video Generation in Novel Scenarios enables Generalizable Robot Manipulation
- Reflective Planning: Vision-Language Models for Multi-Stage Long-Horizon Robotic Manipulation
- LaDi-WM: A Latent Diffusion-Based World Model for Predictive Manipulation
- DexUMI: Using Human Hand as the Universal Manipulation Interface for Dexterous Manipulation page
- Dexplore: Scalable Neural Control for Dexterous Manipulation from Reference Scoped Exploration
- FFHFlow: Diverse and Uncertainty-Aware Dexterous Grasp Generation via Flow Variational Inference
- GraspQP: Differentiable Optimization of Force Closure for Diverse and Robust Dexterous Grasping page
- Morphologically Symmetric Reinforcement Learning for Ambidextrous Bimanual Manipulation
- KineDex: Learning Tactile-Informed Visuomotor Policies via Kinesthetic Teaching for Dexterous Manipulation
- D-Cubed: Latent Diffusion Trajectory Optimisation for Dexterous Deformable Manipulation
- LodeStar: Long-horizon Dexterity via Synthetic Data Augmentation from Human Demonstrations
- The Sound of Simulation: Learning Multimodal Sim-to-Real Robot Policies with Generative Audio
- FetchBot: Learning Generalizable Object Fetching in Cluttered Scenes via Zero-Shot Sim2Real
- ClutterDexGrasp: A Sim-to-Real System for General Dexterous Grasping in Cluttered Scenes
- SimShear: Sim-to-Real Shear-based Tactile Servoing
- Wheeled Lab: Modern Sim2Real for Low-cost, Open-source Wheeled Robotics
- Articulate AnyMesh: Open-vocabulary 3D Articulated Objects Modeling
- AgentWorld: An Interactive Simulation Platform for Scene Construction and Mobile Robotic Manipulation
- Robot Learning from Any Images
- Exploring the Adversarial Vulnerabilities of Vision-Language-Action Models in Robotics Paper page
- VQ-VLA: Improving Vision-Language-Action Models via Scaling Vector-Quantized Action Tokenizers Paper page
- Dita: Scaling Diffusion Transformer for Generalist Vision-Language-Action Policy Paper page
- Moto: Latent Motion Token as the Bridging Language for Learning Robot Manipulation from Videos Paper page
- A0: An Affordance-Aware Hierarchical Model for General Robotic Manipulation Paper page
- Embodied VideoAgent: Persistent Memory from Egocentric Videos and Embodied Sensors Enables Dynamic Scene Understanding Paper page
- CoA-VLA: Improving Vision-Language-Action Models via Visual-Text Chain-of-Affordance Paper
- FedVLA: Federated Vision-Language-Action Learning with Dual Gating Mixture-of-Experts for Robotic Manipulation Paper
- Towards Long-Horizon Vision-Language-Action System: Reasoning, Acting and Memory Paper
- PASG: A Closed-Loop Framework for Automated Geometric Primitive Extraction and Semantic Anchoring in Robotic Manipulation Paper
- SD2Actor: Continuous State Decomposition via Diffusion Embeddings for Robotic Manipulation Paper
- Move to Understand a 3D Scene: Bridging Visual Grounding and Exploration for Efficient and Versatile Embodied Navigation Paper page
- Rethinking the Embodied Gap in Vision-and-Language Navigation: A Holistic Study of Physical and Visual Disparities Paper page
- P3Nav: A Unified Framework for Embodied Navigation Integrating Perception, Planning, and Prediction Paper
- SAME: Learning Generic Language-Guided Visual Navigation with State-Adaptive Mixture of Experts Paper page
- NavMorph: A Self-Evolving World Model for Vision-and-Language Navigation in Continuous Environments Paper page
- Harnessing Input-adaptive Inference for Efficient VLN Paper
- Embodied Navigation with Auxiliary Task of Action Description Prediction Paper
- 3D Gaussian Map with Open-Set Semantic Grouping for Vision-Language Navigation Paper
- NavQ: Learning a Q-Model for Foresighted Vision-and-Language Navigation Paper
- monoVLN: Bridging the Observation Gap between Monocular and Panoramic Vision and Language Navigation Paper
- Adaptive Articulated Object Manipulation On The Fly with Foundation Model Reasoning and Part Grounding Paper
- CogNav: Cognitive Process Modeling for Object Goal Navigation with LLMs Paper page
- RoBridge: A Hierarchical Architecture Bridging Cognition and Execution for General Robotic Manipulation Paper page
- IRASim: A Fine-Grained World Model for Robot Manipulation Paper page
- GWM: Towards Scalable Gaussian World Models for Robotic Manipulation Paper page
- DyWA: Dynamics-adaptive World Action Model for Generalizable Non-prehensile Manipulation Paper page
- Diffusion-Based Imaginative Coordination for Bimanual Manipulation Paper
- Learning 4D Embodied World Models Paper
- Rethinking Bimanual Robotic Manipulation: Learning with Decoupled Interaction Framework Paper
- EC-Flow: Enabling Versatile Robotic Manipulation from Action-Unlabeled Videos via Embodiment-Centric Flow Paper page
- Dense Policy: Bidirectional Autoregressive Learning of Actions Paper page
- AnyBimanual: Transferring Unimanual Policy for General Bimanual Manipulation Paper page
- Learning Precise Affordances from Egocentric Videos for Robotic Manipulation Paper page
- iManip: Skill-Incremental Learning for Robotic Manipulation Paper
- Spatial-Temporal Aware Visuomotor Diffusion Policy Learning Paper page
- Wavelet Policy: Lifting Scheme for Policy Learning in Long-Horizon Tasks Paper page
- 4D Visual Pre-training for Robot Learning Paper
- Saliency-Aware Quantized Imitation Learning for Efficient Robotic Control Paper
- On-Device Diffusion Transformer Policy for Efficient Robot Manipulation Paper
- COSMO: Combination of Selective Memorization for Low-cost Vision-and-Language Navigation Paper
- CARP: Coarse-to-Fine Autoregressive Prediction for Visuomotor Policy Learning Paper page
- EmbodiedOcc: Embodied 3D Occupancy Prediction for Vision-based Online Scene Understanding Paper page
- Embodied Image Captioning: Self-supervised Learning Agents for Spatially Coherent Image Descriptions Paper page
- VLABench: A Large-Scale Benchmark for Language-Conditioned Robotics Manipulation with Long-Horizon Reasoning Tasks Paper page
- RoboFactory: Exploring Embodied Agent Collaboration with Compositional Constraints Paper page
- HUMOTO: A 4D Dataset of Mocap Human Object Interactions Paper page
- RoboMM: All-in-One Multimodal Large Model for Robotic Manipulation Paper
- MoMa-Kitchen: A 100K+ Benchmark for Affordance-Grounded Last-Mile Navigation in Mobile Manipulation Paper page
- RoboPearls: Editable Video Simulation for Robot Manipulation Paper page
- DexH2R: A Benchmark for Dynamic Dexterous Grasping in Human-to-Robot Handover Paper page
- Beyond the Destination: A Novel Benchmark for Exploration-Aware Embodied Question Answering Paper page
- RobAVA: A Large-scale Dataset and Baseline Towards Video based Robotic Arm Action Understanding Paper
- RoboAnnotatorX: A Comprehensive and Universal Annotation Framework for Accurate Understanding of Long-horizon Robot Demonstration Paper
- Hi Robot: Open-Ended Instruction Following with Hierarchical Vision-Language-Action Models Paper
- OTTER: A Vision-Language-Action Model with Text-Aware Visual Feature Extraction paper page
- UP-VLA: A Unified Understanding and Prediction Model for Embodied Agent paper
- ELEMENTAL: Interactive Learning from Demonstrations and Vision-Language Models for Reward Design in Robotics paper
- ReinboT: Amplifying Robot Visual-Language Manipulation with Reinforcement Learning paper
- A Large Recurrent Action Model: xLSTM enables Fast Inference for Robotics Tasks paper page
- Efficient Robotic Policy Learning via Latent Space Backward Planning paper page
- Closed-Loop Long-Horizon Robotic Planning via Equilibrium Sequence Modeling paper page
- SAM2Act: Integrating Visual Foundation Model with A Memory Architecture for Robotic Manipulation paper
- Pre-training Auto-regressive Robotic Models with 4D Representations paper page
- Flow-based Domain Randomization for Learning and Sequencing Robotic Skills paper
- EmbodiedBench: Comprehensive Benchmarking Multi-modal Large Language Models for Vision-Driven Embodied Agents paper page
- Learning Policy Committees for Effective Personalization in MDPs with Diverse Tasks paper
- Video Prediction Policy: A Generalist Robot Policy with Predictive Visual Representations paper page
- STAR: Learning Diverse Robot Skill Abstractions through Rotation-Augmented paper page
- Unifying 2D and 3D Vision-Language Understanding paper page
- GAPrompt: Geometry-Aware Point Cloud Prompt for 3D Vision Model paper page
- Unified World Models: Coupling Video and Action Diffusion for Pretraining on Large Robotic Datasets Paper Page
- CordViP: Correspondence-based Visuomotor Policy for Dexterous Manipulation in Real-World Paper Page
- Reactive Diffusion Policy: Slow-Fast Visual-Tactile Policy Learning for Contact-Rich Manipulation Paper Page
- Dynamic Rank Adjustment in Diffusion Policies for Efficient and Flexible Training Paper
- SpatialVLA: Exploring Spatial Representations for Visual-Language-Action Model Paper
- Sketch-to-Skill: Bootstrapping Robot Learning with Human Drawn Trajectory Sketches Paper
- NaVILA: Legged Robot Vision-Language-Action Model for Navigation Paper Page
- ConRFT: A Reinforced Fine-tuning Method for VLA Models via Consistency Policy Paper Page
- You Only Teach Once: Learn One-Shot Bimanual Robotic Manipulation from Video Demonstrations Paper Page
- ASAP: Aligning Simulation and Real-World Physics for Learning Agile Humanoid Whole-Body Skills Paper Page
- Flying Hand: End-Effector-Centric Framework for Versatile Aerial Manipulation Teleoperation and Policy Learning Paper Page
- DemoGen: Synthetic Demonstration Generation for Data-Efficient Visuomotor Policy Learning Paper Page
- DOGlove: Dexterous Manipulation with a Low-Cost Open-Source Haptic Force Feedback Glove Paper Page
- RoboSplat: Novel Demonstration Generation with Gaussian Splatting Enables Robust One-Shot Manipulation Paper Page
- Enhancing Autonomous Driving Systems with On-Board Deployed Large Language Models Paper
- SATA: Safe and Adaptive Torque-Based Locomotion Policies Inspired by Animal Learning Paper Video
- FACTR: Force-Attending Curriculum Training for Contact-Rich Policy Learning Paper Page
- RoboVerse: Towards a Unified Platform, Dataset and Benchmark for Scalable and Generalizable Robot Learning Paper Page
- STDArm: Transferring Visuomotor Policies From Static Data Training to Dynamic Robot Manipulation Paper
- UniAct: Universal Actions For Enhanced Embodied Foundation Models Paper Page
- MoManipVLA: Transferring Vision-language-action Models for General Mobile Manipulation Paper Page
- CoT-VLA: Visual Chain-of-Thought Reasoning for Vision-Language-Action Models Paper
- SOLAMI: Social Vision-Language-Action Modeling for Immersive Interaction with 3D Autonomous Characters Paper Page
- A Data-Centric Revisit of Pre-Trained Vision Models for Robot Learning Paper Page
- Think Small, Act Big: Primitive Prompt Learning for Lifelong Robot Manipulation
- Phoenix: A Motion-based Self-Reflection Framework for Fine-grained Robotic Action Correction Paper
- OmniManip: Towards General Robotic Manipulation via Object-Centric Interaction Primitives as Spatial Constraints Paper Page
- Mitigating the Human-Robot Domain Discrepancy in Visual Pre-training for Robotic Manipulation Paper
- Object-Centric Prompt-Driven Vision-Language-Action Model for Robotic Manipulation Abstract
- Robotic Visual Instruction
- RoboGround: Robot Manipulation with Grounded Vision-Language Priors
- KStar Diffuser: Spatial-Temporal Graph Diffusion Policy with Kinematics Modeling for Bimanual Robotic Manipulation Paper
- RoboPEPP: Vision-Based Robot Pose and Joint Angle Estimation through Embedding Predictive Pre-Training Paper
- Lift3D Policy: Lifting 2D Foundation Models for Robust 3D Robotic Manipulation Paper Page
- PDFactor: Learning Tri-Perspective View Policy Diffusion Field for Multi-Task Robotic Manipulation Abstract
- Two by Two: Learning Cross-Task Pairwise Objects Assembly for Generalizable Robot Manipulation
- FlowRAM: Grounding Flow Matching Policy with Region-Aware Mamba Framework for Robotic Manipulation Abstract
- G3Flow: Generative 3D Semantic Flow for Pose-aware and Generalizable Object Manipulation Paper Page
- DexHandDiff: Interaction-aware Diffusion Planning for Adaptive Dexterous Manipulation Paper Page
- Tra-MoE: Learning Trajectory Prediction Model from Multiple Domains for Adaptive Policy Conditioning Paper
- AffordDP: Generalizable Diffusion Policy with Transferable Affordance Paper Page
- UniGraspTransformer: Simplified Policy Distillation for Scalable Dexterous Robotic Grasping Paper Page
- DexGrasp Anything: Towards Universal Robotic Dexterous Grasping with Physics Awareness Paper Page
- ZeroGrasp: Zero-Shot Shape Reconstruction Enabled Robotic Grasping Paper
- Let Humanoid Robots Go Hiking! Integrative Skill Development over Complex Trails Paper Page
- MobileH2R: Learning Generalizable Human to Mobile Robot Handover Exclusively from Scalable and Diverse Synthetic Data Paper
- 3D-MVP: 3D Multiview Pretraining for Robotic Manipulation Paper Page
- VidBot: Learning Generalizable 3D Actions from In-the-Wild 2D Human Videos for Zero-Shot Robotic Manipulation Paper Page
- Touch2Shape: Touch-Conditioned 3D Diffusion for Shape Exploration and Reconstruction Abstract
- RoboBrain: A Unified Brain Model for Robotic Manipulation from Abstract to Concrete Paper
- PhysVLM: Enabling Visual Language Models to Understand Robotic Physical Reachability Paper
- RoboSpatial: Teaching Spatial Understanding to 2D and 3D Vision-Language Models for Robotics Paper
- Tartan IMU: A Light Foundation Model for Inertial Positioning in Robotics Abstract
- Code-as-Monitor: Constraint-aware Visual Programming for Reactive and Proactive Robotic Failure Detection Paper Page
- TASTE-Rob: Advancing Video Generation of Task-Oriented Hand-Object Interaction for Generalizable Robotic Manipulation Paper
- GraphMimic: Graph-to-Graphs Generative Modeling from Videos for Policy Learning Paper
- Prof. Robot: Differentiable Robot Rendering Without Static and Self-Collisions Paper Page
- AutoURDF: Unsupervised Robot Modeling from Point Cloud Frames Using Cluster Registration Paper Page
- RoboTwin: Dual-Arm Robot Benchmark with Generative Digital Twins (early version) Paper Page
- Pixel-aligned RGB-NIR Stereo Imaging and Dataset for Robot Vision Paper
- RoboSense: Large-scale Dataset and Benchmark for Egocentric Robot Perception and Navigation in Crowded and Unstructured Environments Paper Page
- LLaRA: Supercharging Robot Learning Data for Vision-Language Policy Paper Page
- VLAS: Vision-Language-Action Model With Speech Instructions For Customized Robot Manipulation Paper Page
- TraceVLA: Visual Trace Prompting Enhances Spatial-Temporal Awareness for Generalist Robotic Policies Paper Page
- Robots Pre-train Robots: Manipulation-Centric Robotic Representation from Large-Scale Robot Datasets Paper Page
- PIDM: Predictive Inverse Dynamics Models are Scalable Learners for Robotic Manipulation Paper Page
- GravMAD: Grounded Spatial Value Maps Guided Action Diffusion for Generalized 3D Manipulation Paper Page
- ReViWo: Learning View-invariant World Models for Visual Robotic Manipulation Paper zhihu
- HAMSTER: Hierarchical Action Models For Open-World Robot Manipulation Paper Page
- BadRobot: Jailbreaking Embodied LLMs in the Physical World Paper Page
- STRAP: Robot Sub-Trajectory Retrieval for Augmented Policy Learning Paper Page
- SRSA: Skill Retrieval and Adaptation for Robotic Assembly Tasks Paper Page
- Data Scaling Laws in Imitation Learning for Robotic Manipulation Paper Page
- Stem-OB: Generalizable Visual Imitation Learning with Stem-Like Convergent Observation through Diffusion Inversion Paper Page
- Dream to Manipulate: Compositional World Models Empowering Robot Imitation Learning with Imagination Paper Page
- SPA*: 3D Spatial-Awareness Enables Effective Embodied Representation Paper Page
- LASeR: Towards Diversified and Generalizable Robot Design with Large Language Models Paper Page
- Physics-informed Temporal Difference Metric Learning for Robot Motion Planning Paper Page
- AHA: A Vision-Language-Model for Detecting and Reasoning Over Failures in Robotic Manipulation Paper Page
- EMOS: Embodiment-aware Heterogeneous Multi-robot Operating System with LLM Agents Paper Page
- VisualPredicator: Learning Abstract World Models with Neuro-Symbolic Predicates for Robot Planning Paper Page
- DenseMatcher: Learning 3D Semantic Correspondence for Category-Level Manipulation from a Single Demo Paper Page
- 6D Object Pose Tracking in Internet Videos for Robotic Manipulation Paper Page
- GEVRM: Goal-Expressive Video Generation Model For Robust Visual Manipulation Paper