AwesomeGAIManipulation

Survey

Data Generation

  • GRUtopia: Dream General Robots in a City at Scale [paper] [code]
  • Diffusion for Multi-Embodiment Grasping [paper]
  • Gen2Sim: Scaling Up Robot Learning in Simulation with Generative Models (ICRA 2024) [paper] [code] [webpage]
  • RoboGen: Towards Unleashing Infinite Data for Automated Robot Learning via Generative Simulation (ICML 2024) [paper] [code] [webpage]
  • Holodeck: Language Guided Generation of 3D Embodied AI Environments (CVPR 2024) [paper] [code] [webpage]
  • Video Generation Models as World Simulators [paper] [webpage]
  • Learning Interactive Real-World Simulators [paper]
  • MimicGen: A Data Generation System for Scalable Robot Learning using Human Demonstrations (CoRL 2023) [paper] [code] [webpage]
  • CyberDemo: Augmenting Simulated Human Demonstration for Real-World Dexterous Manipulation (CVPR 2024) [paper] [code] [webpage]
  • Diffusion Meets DAgger: Supercharging Eye-in-hand Imitation Learning [paper] [code] [webpage]
  • DexMimicGen: Automated Data Generation for Bimanual Dexterous Manipulation via Imitation Learning (ICRA 2025) [paper] [code] [webpage]
  • IntervenGen: Interventional Data Generation for Robust and Data-Efficient Robot Imitation Learning [paper] [webpage]
  • Scaling Up and Distilling Down: Language-Guided Robot Skill Acquisition (CoRL 2023) [paper] [code] [webpage]
  • GenAug: Retargeting behaviors to unseen situations via Generative Augmentation (RSS 2023) [paper] [code] [webpage]
  • Scaling Robot Learning with Semantically Imagined Experience (RSS 2023) [paper] [webpage]
  • RoVi-Aug: Robot and Viewpoint Augmentation for Cross-Embodiment Robot Learning (CoRL 2024) [paper] [code] [webpage]
  • Learning Robust Real-World Dexterous Grasping Policies via Implicit Shape Augmentation (CoRL 2022) [paper] [webpage]
  • DALL-E-Bot: Introducing Web-Scale Diffusion Models to Robotics [paper]
  • Shadow: Leveraging Segmentation Masks for Cross-Embodiment Policy Transfer (CoRL 2024) [paper] [code]
  • Human-to-Robot Imitation in the Wild (RSS 2022) [paper] [webpage]
  • Mirage: Cross-Embodiment Zero-Shot Policy Transfer with Cross-Painting (RSS 2024) [paper] [code] [webpage]
  • CACTI: A Framework for Scalable Multi-Task Multi-Scene Visual Imitation Learning [paper] [webpage]
  • RoboAgent: Generalization and Efficiency in Robot Manipulation via Semantic Augmentations and Action Chunking (ICRA 2024) [paper] [code] [webpage]
  • ExAug: Robot-Conditioned Navigation Policies via Geometric Experience Augmentation (ICRA 2023) [paper] [code] [webpage]
  • RACER: Rich Language-Guided Failure Recovery Policies for Imitation Learning (ICRA 2025) [paper] [code] [webpage]
  • Robotic Skill Acquisition via Instruction Augmentation with Vision-Language Models (RSS 2023) [paper] [webpage]
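
A recurring idea behind the demonstration-augmentation entries above (e.g., MimicGen and DexMimicGen) is to synthesize new demonstrations by re-expressing a recorded object-relative trajectory segment at a new object pose. Below is a minimal sketch of that retargeting step in numpy; the function name, 2D setup, and poses are illustrative, not taken from any one paper:

```python
import numpy as np

def retarget_segment(ee_xy, src_obj_xy, src_obj_yaw, tgt_obj_xy, tgt_obj_yaw):
    """Re-express an end-effector (x, y) segment recorded relative to a source
    object pose in the frame of a new target object pose (SE(2) retargeting)."""
    def rot(a):
        return np.array([[np.cos(a), -np.sin(a)], [np.sin(a), np.cos(a)]])
    # Source demo expressed in the source object's frame ...
    local = (ee_xy - src_obj_xy) @ rot(src_obj_yaw)  # world -> object frame
    # ... then mapped back to the world frame at the target object's pose.
    return local @ rot(tgt_obj_yaw).T + tgt_obj_xy

# One recorded approach segment toward an object at (0.5, 0.2), yaw 0.
demo = np.linspace([0.0, 0.0], [0.5, 0.2], num=20)
# Synthesize a "new demo" for the object moved to (0.3, -0.1), rotated 45 deg.
new_demo = retarget_segment(demo, np.array([0.5, 0.2]), 0.0,
                            np.array([0.3, -0.1]), np.pi / 4)
print(new_demo[-1])  # segment now ends at the new object position
```

Real systems additionally stitch retargeted segments together, replay them under physics, and keep only the rollouts that still succeed.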

Reward Generation

  • Language to Rewards for Robotic Skill Synthesis (CoRL 2023) [paper] [code] [webpage]
  • Vision-Language Models as Success Detectors (CoLLAs 2023) [paper]
  • Scaling Robot Policy Learning via Zero-Shot Labeling with Foundation Models (CoRL 2024) [paper] [code] [webpage]
  • FuRL: Visual-Language Models as Fuzzy Rewards for Reinforcement Learning (ICML 2024) [paper] [code]
  • Text2Reward: Automated Dense Reward Function Generation for Reinforcement Learning (ICLR 2024) [paper]
  • Eureka: Human-Level Reward Design via Coding Large Language Models (NeurIPS 2023) [paper]
  • Agentic Skill Discovery (CoRL 2024 workshop & ICRA@40) [paper] [code]
  • CLIPort: What and Where Pathways for Robotic Manipulation [paper]
  • R3M: A Universal Visual Representation for Robot Manipulation [paper] [code] [webpage]
  • LIV: Language-Image Representations and Rewards for Robotic Control (ICML 2023) [paper] [code] [webpage]
  • Learning Reward Functions for Robotic Manipulation by Observing Humans [paper]
  • Deep Visual Foresight for Planning Robot Motion (ICRA 2017) [paper]
  • VLMPC: Vision-Language Model Predictive Control for Robotic Manipulation (RSS 2024) [paper] [code]
  • Learning Reward for Robot Skills Using Large Language Models via Self-Alignment (ICML 2024) [paper]
  • Video Prediction Models as Rewards for Reinforcement Learning [paper] [code]
  • VIP: Towards Universal Visual Reward and Representation via Value-Implicit Pre-Training (ICLR 2023) [paper] [code]
  • Learning to Understand Goal Specifications by Modelling Reward [paper]
  • Plan-Seq-Learn: Language Model Guided RL for Solving Long Horizon Robotics Tasks [paper]
  • Policy Improvement Using Language Feedback Models (NeurIPS 2024) [paper]
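
Several entries above (Language to Rewards, Text2Reward, Eureka) generate executable reward code with an LLM and evaluate it in the training loop. A minimal sketch of that pattern follows; `query_llm` is a hypothetical stub standing in for a real model call, and the generated reward body is an illustrative example of what such systems emit:

```python
import numpy as np

def query_llm(prompt: str) -> str:
    """Hypothetical stand-in for an LLM call; a real system would send the
    prompt to a code-generating model and return its completion."""
    return (
        "def reward(state):\n"
        "    # Dense shaping: negative gripper-to-object distance,\n"
        "    # plus a bonus once the object is lifted above 5 cm.\n"
        "    d = np.linalg.norm(state['gripper_pos'] - state['object_pos'])\n"
        "    return -d + (10.0 if state['object_pos'][2] > 0.05 else 0.0)\n"
    )

prompt = "Write a Python reward function for lifting a cube with a gripper."
namespace = {"np": np}
exec(query_llm(prompt), namespace)   # compile the generated reward code
reward = namespace["reward"]

state = {"gripper_pos": np.array([0.1, 0.0, 0.1]),
         "object_pos": np.array([0.1, 0.0, 0.08])}
print(reward(state))                 # used as the RL reward during training
```

Systems like Eureka close the loop by scoring each generated reward on downstream policy performance and asking the LLM to refine the best candidates.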

State Generation

  • Reinforcement Learning with Action-Free Pre-Training from Videos (ICML 2022) [paper] [code]
  • Mastering Diverse Domains through World Models [paper] [code] [webpage]
  • Dream to Control: Learning Behaviors by Latent Imagination [paper]
  • Robot Shape and Location Retention in Video Generation Using Diffusion Models [paper] [code]
  • Uncertainty-aware Active Learning of NeRF-based Object Models for Robot Manipulators using Visual and Re-orientation Actions [paper] [code] [webpage]
  • Click to Grasp: Zero-Shot Precise Manipulation via Visual Diffusion Descriptors [paper] [code] [webpage]
  • Empowering Embodied Visual Tracking with Visual Foundation Models and Offline RL (ECCV 2024) [paper]
  • DoughNet: A Visual Predictive Model for Topological Manipulation of Deformable Objects [paper]
  • KISA: A Unified Keyframe Identifier and Skill Annotator for Long-Horizon Robotics Demonstrations (ICML 2024) [paper]
  • DynSyn: Dynamical Synergistic Representation for Efficient Learning and Control in Overactuated Embodied Systems (ICML 2024) [paper]
  • Symmetry-Aware Robot Design with Structured Subgroups (ICML 2023) [paper]
  • Total-Recon: Deformable Scene Reconstruction for Embodied View Synthesis (ICCV 2023) [paper] [code & data] [webpage]
  • Explore and Tell: Embodied Visual Captioning in 3D Environments (ICCV 2023) [paper] [code & data]
  • Track2Act: Predicting Point Tracks from Internet Videos enables Generalizable Robot Manipulation (ECCV 2024) [paper]
  • ManiGaussian: Dynamic Gaussian Splatting for Multi-task Robotic Manipulation (ECCV 2024) [paper] [code] [webpage]
  • Physically Embodied Gaussian Splatting: A Realtime Correctable World Model for Robotics [paper]
  • Learning an Actionable Discrete Diffusion Policy via Large-Scale Actionless Video Pre-Training (NeurIPS 2024) [paper] [code] [webpage]
  • PreLAR: World Model Pre-training with Learnable Action Representation (ECCV 2024) [paper] [code]
  • Octopus: Embodied Vision-Language Programmer from Environmental Feedback (ECCV 2024) [paper] [code] [webpage]
  • EC2: Emergent Communication for Embodied Control (CVPR 2023) [paper]
  • VoxPoser: Composable 3D Value Maps for Robotic Manipulation with Language Models (CoRL 2023) [paper]
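
A unifying idea in the world-model entries above (Dreamer-style latent imagination, action-free pre-training, PreLAR) is to roll a learned latent dynamics model forward and plan or learn entirely "in imagination". A toy sketch with a linear latent model; the random matrices below are placeholders for learned parameters:

```python
import numpy as np

rng = np.random.default_rng(0)
latent_dim, action_dim, horizon = 8, 2, 10

# Placeholders for learned parameters: transition (A, B) and reward head (w).
A = rng.normal(scale=0.3, size=(latent_dim, latent_dim))
B = rng.normal(scale=0.3, size=(latent_dim, action_dim))
w = rng.normal(size=latent_dim)

def imagine(z0, actions):
    """Roll the latent dynamics forward and sum predicted rewards."""
    z, total = z0, 0.0
    for a in actions:
        z = np.tanh(A @ z + B @ a)   # imagined next latent state
        total += w @ z               # predicted reward, never touching the env
    return total

# Score random action sequences in imagination and keep the best one.
z0 = rng.normal(size=latent_dim)
candidates = rng.uniform(-1, 1, size=(64, horizon, action_dim))
best = max(candidates, key=lambda acts: imagine(z0, acts))
print(imagine(z0, best))
```

Dreamer-style agents replace the random search with a policy learned by backpropagating through these imagined rollouts.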

Language Generation

  • Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents (ICML 2022) [paper] [code]
  • Scaling Up and Distilling Down: Language-Guided Robot Skill Acquisition (CoRL 2023) [paper] [code]
  • Plan-Seq-Learn: Language Model Guided RL for Solving Long Horizon Robotics Tasks (ICLR 2024) [paper] [code]
  • Large Language Models as Commonsense Knowledge for Large-Scale Task Planning (NeurIPS 2023) [paper] [code]
  • REFLECT: Summarizing Robot Experiences for Failure Explanation and Correction (CoRL 2023) [paper] [code]
  • Gesture-Informed Robot Assistance via Foundation Models (CoRL 2023) [paper]
  • Large Language Models for Robotics: Opportunities, Challenges, and Perspectives [paper]
  • Embodied Agent Interface (EAI): Benchmarking LLMs for Embodied Decision Making (NeurIPS 2024 Track Datasets and Benchmarks) [paper] [code]
  • EmbodiedGPT: Vision-Language Pre-Training via Embodied Chain of Thought (NeurIPS 2023) [paper] [code]
  • Chat with the Environment: Interactive Multimodal Perception using Large Language Models (IROS 2023) [paper] [code]
  • Embodied CoT Distillation From LLM To Off-the-shelf Agents (ICML 2024) [paper]
  • Do As I Can, Not As I Say: Grounding Language in Robotic Affordances (SayCan, CoRL 2022) [paper] [code]
  • Grounded Decoding: Guiding Text Generation with Grounded Models for Embodied Agents (NeurIPS 2023) [paper]
  • Inner Monologue: Embodied Reasoning through Planning with Language Models (CoRL 2022) [paper]
  • PhyGrasp: Generalizing Robotic Grasping with Physics-informed Large Multimodal Models [paper] [code]
  • SayPlan: Grounding Large Language Models using 3D Scene Graphs for Scalable Robot Task Planning (CoRL 2023) [paper]
  • RoboMP2: A Robotic Multimodal Perception-Planning Framework with Multimodal Large Language Models (ICML 2024) [paper] [code]
  • Text2Motion: From Natural Language Instructions to Feasible Plans (Autonomous Robots 2023) [paper]
  • STAP: Sequencing Task-Agnostic Policies (ICRA 2023) [paper] [code]
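
Several planners above share the recipe popularized by SayCan and refined by Grounded Decoding and Text2Motion: combine the language model's usefulness score for each candidate skill with a grounded feasibility (affordance) estimate. A minimal sketch with hard-coded stand-in scores; the skill names and numbers are illustrative only:

```python
# Hypothetical per-skill scores; a real system would get llm_score from a
# language model's likelihood of the skill given the instruction, and
# affordance from a learned value function in the current scene.
llm_score = {"pick up the sponge": 0.70,
             "go to the counter": 0.20,
             "pick up the apple": 0.10}
affordance = {"pick up the sponge": 0.10,   # sponge not reachable right now
              "go to the counter": 0.90,
              "pick up the apple": 0.90}

def select_skill(instruction: str) -> str:
    """Pick the skill that is both useful (LLM) and feasible (affordance)."""
    return max(llm_score, key=lambda s: llm_score[s] * affordance[s])

print(select_skill("clean up the spill"))
# -> "go to the counter": the preferred skill is currently infeasible,
# so the grounded product steers the plan toward a reachable first step.
```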

Code Generation

  • Closed-Loop Open-Vocabulary Mobile Manipulation with GPT-4V (arXiv 2024) [paper]
  • ProgPrompt: Program Generation for Situated Robot Task Planning Using Large Language Models (Autonomous Robots 2023) [paper]
  • See and Think: Embodied Agent in Virtual Environment (arXiv 2023) [paper]
  • Octopus: Embodied Vision-Language Programmer from Environmental Feedback (ECCV 2024) [paper] [webpage] [code]
  • Demo2Code: From Summarizing Demonstrations to Synthesizing Code via Extended Chain-of-Thought (NeurIPS 2023) [paper] [webpage] [code]
  • EC2: Emergent Communication for Embodied Control (CVPR 2023) [paper]
  • When Prolog Meets Generative Models: A New Approach for Managing Knowledge and Planning in Robotic Applications (ICRA 2024) [paper]
  • Code as Policies: Language Model Programs for Embodied Control (ICRA 2023) [paper] [webpage] [code]
  • GenCHiP: Generating Robot Policy Code for High-Precision and Contact-Rich Manipulation Tasks (arXiv 2024) [paper]
  • VoxPoser: Composable 3D Value Maps for Robotic Manipulation with Language Models (CoRL 2023) [paper] [webpage] [code]
  • ReKep: Spatio-Temporal Reasoning of Relational Keypoint Constraints for Robotic Manipulation (arXiv 2024) [paper] [webpage] [code]
  • RoboScript: Code Generation for Free-Form Manipulation Tasks Across Real and Simulation (arXiv 2024) [paper]
  • RobotGPT: Robot Manipulation Learning From ChatGPT (RAL 2024) [paper]
  • RoboCodeX: Multimodal Code Generation for Robotic Behavior Synthesis (ICML 2024) [paper] [webpage] [code]
  • Instruct2Act: Mapping Multi-modality Instructions to Robotic Actions with Large Language Model (arXiv 2023) [paper] [code]
  • GenSim: Generating Robotic Simulation Tasks via Large Language Models (ICLR 2024) [paper] [code]
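
The common pattern in this section (Code as Policies, ProgPrompt, RoboCodeX) is to prompt an LLM with a small robot API and then execute the returned program. A minimal sketch with a mock API; the method names are illustrative, and the hard-coded string stands in for the model's completion:

```python
class Robot:
    """Mock perception/control API of the kind exposed to the LLM in
    code-generation systems; method names here are illustrative."""
    def detect(self, name):          # perception primitive
        print(f"detect({name})")
        return (0.4, 0.1)            # fake object position
    def pick(self, pos):             # motion primitives
        print(f"pick at {pos}")
    def place(self, pos):
        print(f"place at {pos}")

# In a real system this string would come from the LLM, prompted with the
# API above plus the user instruction "put the block in the bin".
generated_program = """
block = robot.detect("block")
bin_pos = robot.detect("bin")
robot.pick(block)
robot.place(bin_pos)
"""

robot = Robot()
exec(generated_program, {"robot": robot})  # run the generated policy code
```

Exposing only a narrow, well-documented API is what keeps the generated code grounded: the model can compose perception and motion primitives but cannot invent capabilities the robot lacks.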

Visual Generation

  • Learning Universal Policies via Text-Guided Video Generation (NeurIPS 2023) [paper] [webpage]
  • SlowFast-VGen: Slow-Fast Learning for Action-Driven Long Video Generation (ICLR 2025) [paper] [webpage]
  • Using Left and Right Brains Together: Towards Vision and Language Planning (ICML 2024) [paper]
  • Compositional Foundation Models for Hierarchical Planning (NeurIPS 2023) [paper] [webpage]
  • Closed-Loop Visuomotor Control with Generative Expectation for Robotic Manipulation (NeurIPS 2024) [paper] [code]
  • GR-1: Unleashing Large-Scale Video Generative Pre-training for Visual Robot Manipulation [webpage] [code]
  • GR-2: A Generative Video-Language-Action Model with Web-Scale Knowledge for Robot Manipulation [webpage]
  • Zero-Shot Robotic Manipulation with Pretrained Image-Editing Diffusion Models (ICLR 2024) [paper] [webpage] [code]
  • Generate Subgoal Images before Act: Unlocking the Chain-of-Thought Reasoning in Diffusion Model for Robot Manipulation with Multimodal Prompts (CVPR 2024) [paper] [webpage]
  • Surfer: Progressive Reasoning with World Models for Robotic Manipulation [paper]
  • TAX-Pose: Task-Specific Cross-Pose Estimation for Robot Manipulation (CoRL 2022) [paper] [webpage] [code]
  • Imagination Policy: Using Generative Point Cloud Models for Learning Manipulation Policies (CoRL 2024) [paper] [webpage] [code]
  • Uncertainty-aware Active Learning of NeRF-based Object Models for Robot Manipulators using Visual and Re-orientation Actions [paper] [webpage] [code]
  • Perceiver-Actor: A Multi-Task Transformer for Robotic Manipulation (CoRL 2022) [paper] [webpage] [code]
  • ManiGaussian: Dynamic Gaussian Splatting for Multi-task Robotic Manipulation (ECCV 2024) [paper] [webpage] [code]
  • GNFactor: Multi-Task Real Robot Learning with Generalizable Neural Feature Fields (CoRL 2023) [paper] [webpage] [code]
  • WorldVLA: Towards Autoregressive Action World Model [paper] [webpage] [code]
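
Several works above (Learning Universal Policies, the subgoal-image approaches, GR-1/GR-2) first generate future frames or a subgoal image and then recover actions from them, for example with an inverse dynamics model. A toy sketch of that two-stage decomposition; both stubs below stand in for large learned models:

```python
import numpy as np

rng = np.random.default_rng(1)

def video_model(obs, instruction, horizon=4):
    """Stub for a text-conditioned video generator returning imagined frames.
    A real system would use a large diffusion or autoregressive video model."""
    return [obs + 0.1 * (t + 1) * rng.standard_normal(obs.shape)
            for t in range(horizon)]

def inverse_dynamics(frame, next_frame):
    """Stub inverse-dynamics model inferring the action linking two frames.
    Here the 'action' is just the mean frame difference, for illustration."""
    return float((next_frame - frame).mean())

obs = np.zeros((8, 8))                       # current camera frame (toy)
frames = [obs] + video_model(obs, "open the drawer")
actions = [inverse_dynamics(a, b) for a, b in zip(frames, frames[1:])]
print(actions)                               # action sequence to execute
```

The appeal of the decomposition is that the video model can be pre-trained on web-scale action-free video, while only the small inverse-dynamics model needs robot action labels.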

Grasp Generation

Trajectory Generation

  • Mobile ALOHA: Learning Bimanual Mobile Manipulation with Low-Cost Whole-Body Teleoperation [webpage]
  • Diffusion Policy: Visuomotor Policy Learning via Action Diffusion [webpage]
  • 3D Diffuser Actor: Policy Diffusion with 3D Scene Representations [webpage]
  • RT-1: Robotics Transformer for Real-World Control at Scale [webpage]
  • RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control [webpage]
  • RVT: Robotic View Transformer for 3D Object Manipulation [webpage]
  • RVT-2: Learning Precise Manipulation from Few Examples [webpage]
  • GR-1: Unleashing Large-Scale Video Generative Pre-training for Visual Robot Manipulation [webpage]
  • GR-2: A Generative Video-Language-Action Model with Web-Scale Knowledge for Robot Manipulation [webpage]
  • ReKep: Spatio-Temporal Reasoning of Relational Keypoint Constraints for Robotic Manipulation [webpage]
  • Gen2Act: Human Video Generation in Novel Scenarios enables Generalizable Robot Manipulation [webpage]
  • OpenVLA: An Open-Source Vision-Language-Action Model [webpage]
  • RDT-1B: a Diffusion Foundation Model for Bimanual Manipulation [webpage]
  • π0: Our First Generalist Policy [webpage]
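
Diffusion Policy, 3D Diffuser Actor, and RDT-1B all cast action generation as iterative denoising of an action trajectory. A minimal DDPM-style reverse loop is sketched below; the noise schedule and shapes are illustrative, and `eps_model` is a placeholder for the learned, observation-conditioned noise predictor:

```python
import numpy as np

rng = np.random.default_rng(0)
T, horizon, action_dim = 50, 16, 2

# Linear noise schedule (illustrative; real policies tune this carefully).
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

def eps_model(x, t, obs):
    """Stub for the learned noise predictor epsilon_theta(x_t, t, obs).
    A trained network would predict the noise added to the clean trajectory;
    this placeholder just pushes the sample toward zero."""
    return x * 0.1

obs = None                                      # conditioning (e.g., images)
x = rng.standard_normal((horizon, action_dim))  # start from pure noise
for t in reversed(range(T)):                    # DDPM reverse process
    eps = eps_model(x, t, obs)
    x = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
    if t > 0:                                   # add noise except at the end
        x += np.sqrt(betas[t]) * rng.standard_normal(x.shape)

print(x.shape)  # denoised action trajectory, executed receding-horizon
```

In deployment, only the first few actions of the denoised horizon are executed before re-planning, which is what makes these policies reactive despite generating whole trajectories.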

About

Generative Artificial Intelligence in Robotic Manipulation: A Survey
