
Awesome-VLA-Papers

This repository contains the list of representative VLA works covered in the survey "A Survey on Vision-Language-Action Models: An Action Tokenization Perspective", along with relevant reference materials.

Foundation Models

Language Foundation Models

Vision Foundation Models

Vision Language Models

Language Description as Action Tokens

Language Plan

Language Motion

Code as Action Tokens

Affordance as Action Tokens

Keypoint

Bounding Box

Segmentation Mask

Affordance Map

Trajectory as Action Tokens

Robotic Manipulation

Autonomous Driving

  • DriveVLM, DriveVLM: The Convergence of Autonomous Driving and Large Vision-Language Models, 2024.02, CoRL 2024. [📄 Paper] [🌍 Website]
  • CoVLA, CoVLA: Comprehensive Vision-Language-Action Dataset for Autonomous Driving, 2024.08, WACV 2025 Oral. [📄 Paper] [🌍 Website] [📊 Dataset]
  • EMMA, EMMA: End-to-End Multimodal Model for Autonomous Driving, 2024.10. [📄 Paper]
  • VLM-E2E, VLM-E2E: Enhancing End-to-End Autonomous Driving with Multimodal Driver Attention Fusion, 2025.02. [📄 Paper]

Goal State as Action Tokens

Single-Frame Image / Point Cloud

Multi-Frame Video

Latent Representation as Action Tokens

Raw Action as Action Tokens

Reasoning as Action Tokens

Scalable Data Sources

Bottom Layer: Web Data and Human Video

Middle Layer: Synthetic and Simulation Data

  • MimicGen, MimicGen: A Data Generation System for Scalable Robot Learning using Human Demonstrations, 2023.10, CoRL 2023. [📄 Paper]
  • RoboCasa, RoboCasa: Large-Scale Simulation of Everyday Tasks for Generalist Robots, 2024.06, RSS 2024. [📄 Paper] [🌍 Website] [💻 Code]
  • DexMimicGen, DexMimicGen: Automated Data Generation for Bimanual Dexterous Manipulation via Imitation Learning, 2024.10, ICRA 2025. [📄 Paper] [🌍 Website]
  • AgiBot DigitalWorld, AgiBot DigitalWorld, 2025.02. [🌍 Website]

Top Layer: Real-world Robot Data

Related Surveys

  • Robot Learning in the Era of Foundation Models: A Survey, 2023.11, Neurocomputing Volume 638. [📄 Paper]
  • A Survey on Robotics with Foundation Models: toward Embodied AI, 2024.02. [📄 Paper]
  • A Survey on Integration of Large Language Models with Intelligent Robots, 2024.04, Intelligent Service Robotics 2024. [📄 Paper]
  • What Foundation Models can Bring for Robot Learning in Manipulation: A Survey, 2024.04. [📄 Paper]
  • A Survey on Vision-Language-Action Models for Embodied AI, 2024.05. [📄 Paper]
  • Aligning Cyber Space with Physical World: A Comprehensive Survey on Embodied AI, 2024.07. [📄 Paper]
  • Exploring Embodied Multimodal Large Models: Development, Datasets, and Future Directions, 2025.02, Information Fusion Volume 122. [📄 Paper]
  • Generative Artificial Intelligence in Robotic Manipulation: A Survey, 2025.03. [📄 Paper]
  • OpenHelix: A Short Survey, Empirical Analysis, and Open-Source Dual-System VLA Model for Robotic Manipulation, 2025.05. [📄 Paper]
  • Vision-Language-Action Models: Concepts, Progress, Applications and Challenges, 2025.05. [📄 Paper]
