- GRUtopia: Dream General Robots in a City at Scale [paper] [code]
- Diffusion for Multi-Embodiment Grasping [paper]
- Gen2Sim: Scaling Up Robot Learning in Simulation with Generative Models (ICRA 2024) [paper] [code] [webpage]
- RoboGen: Towards Unleashing Infinite Data for Automated Robot Learning via Generative Simulation (ICML 2024) [paper] [code] [webpage]
- Holodeck: Language Guided Generation of 3D Embodied AI Environments (CVPR 2024) [paper] [code] [webpage]
- Video Generation Models as World Simulators [paper] [webpage]
- Learning Interactive Real-World Simulators [paper]
- MimicGen: A Data Generation System for Scalable Robot Learning using Human Demonstrations (CoRL 2023) [paper] [code] [webpage]
- CyberDemo: Augmenting Simulated Human Demonstration for Real-World Dexterous Manipulation (CVPR 2024) [paper] [code] [webpage]
- Diffusion Meets DAgger: Supercharging Eye-in-hand Imitation Learning [paper] [code] [webpage]
- DexMimicGen: Automated Data Generation for Bimanual Dexterous Manipulation via Imitation Learning (ICRA 2025) [paper] [code] [webpage]
- IntervenGen: Interventional Data Generation for Robust and Data-Efficient Robot Imitation Learning [paper] [webpage]
- Scaling Up and Distilling Down: Language-Guided Robot Skill Acquisition (CoRL 2023) [paper] [code] [webpage]
- GenAug: Retargeting behaviors to unseen situations via Generative Augmentation (RSS 2023) [paper] [code] [webpage]
- Scaling Robot Learning with Semantically Imagined Experience (RSS 2023) [paper] [webpage]
- RoVi-Aug: Robot and Viewpoint Augmentation for Cross-Embodiment Robot Learning (CoRL 2024) [paper] [code] [webpage]
- Learning Robust Real-World Dexterous Grasping Policies via Implicit Shape Augmentation (CoRL 2022) [paper] [webpage]
- DALL-E-Bot: Introducing Web-Scale Diffusion Models to Robotics [paper]
- Shadow: Leveraging Segmentation Masks for Cross-Embodiment Policy Transfer (CoRL 2024) [paper] [code]
- Human-to-Robot Imitation in the Wild (RSS 2022) [paper] [webpage]
- Mirage: Cross-Embodiment Zero-Shot Policy Transfer with Cross-Painting (RSS 2024) [paper] [code] [webpage]
- CACTI: A Framework for Scalable Multi-Task Multi-Scene Visual Imitation Learning [paper] [webpage]
- RoboAgent: Generalization and Efficiency in Robot Manipulation via Semantic Augmentations and Action Chunking (ICRA 2024) [paper] [code] [webpage]
- ExAug: Robot-Conditioned Navigation Policies via Geometric Experience Augmentation (ICRA 2023) [paper] [code] [webpage]
- RACER: Rich Language-Guided Failure Recovery Policies for Imitation Learning (ICRA 2025) [paper] [code] [webpage]
- Robotic Skill Acquisition via Instruction Augmentation with Vision-Language Models (RSS 2023) [paper] [webpage]
- Language to Rewards for Robotic Skill Synthesis (CoRL 2023) [paper] [code] [webpage]
- Vision-Language Models as Success Detectors (CoLLAs 2023) [paper]
- Scaling Robot Policy Learning via Zero-Shot Labeling with Foundation Models (CoRL 2024) [paper] [code] [webpage]
- FuRL: Visual-Language Models as Fuzzy Rewards for Reinforcement Learning (ICML 2024) [paper] [code]
- Text2Reward: Automated Dense Reward Function Generation for Reinforcement Learning (ICLR 2024) [paper]
- Eureka: Human-Level Reward Design via Coding Large Language Models (NeurIPS 2023) [paper]
- Agentic Skill Discovery (CoRL 2024 workshop & ICRA@40) [paper] [code]
- CLIPort: What and Where Pathways for Robotic Manipulation [paper]
- R3M: A Universal Visual Representation for Robot Manipulation [paper] [code] [webpage]
- LIV: Language-Image Representations and Rewards for Robotic Control (ICML 2023) [paper] [code] [webpage]
- Learning Reward Functions for Robotic Manipulation by Observing Humans [paper]
- Deep Visual Foresight for Planning Robot Motion (ICRA 2017) [paper]
- VLMPC: Vision-Language Model Predictive Control for Robotic Manipulation (RSS 2024) [paper] [code]
- Learning Reward for Robot Skills Using Large Language Models via Self-Alignment (ICML 2024) [paper]
- Video Prediction Models as Rewards for Reinforcement Learning [paper] [code]
- VIP: Towards Universal Visual Reward and Representation via Value-Implicit Pre-Training (ICLR 2023) [paper] [code]
- Learning to Understand Goal Specifications by Modelling Reward [paper]
- Plan-Seq-Learn: Language Model Guided RL for Solving Long Horizon Robotics Tasks [paper]
- Policy Improvement Using Language Feedback Models (NeurIPS 2024) [paper]
- Reinforcement Learning with Action-Free Pre-training from Videos (ICML 2022) [paper] [code]
- Mastering Diverse Domains through World Models [paper] [code] [webpage]
- Dream to Control: Learning Behaviors by Latent Imagination [paper]
- Robot Shape and Location Retention in Video Generation Using Diffusion Models [paper] [code]
- Uncertainty-aware Active Learning of NeRF-based Object Models for Robot Manipulators using Visual and Re-orientation Actions [paper] [code] [webpage]
- Click to Grasp: Zero-Shot Precise Manipulation via Visual Diffusion Descriptors [paper] [code] [webpage]
- Empowering Embodied Visual Tracking with Visual Foundation Models and Offline RL (ECCV 2024) [paper]
- DoughNet: A Visual Predictive Model for Topological Manipulation of Deformable Objects [paper]
- KISA: A Unified Keyframe Identifier and Skill Annotator for Long-Horizon Robotics Demonstrations (ICML 2024) [paper]
- DynSyn: Dynamical Synergistic Representation for Efficient Learning and Control in Overactuated Embodied Systems (ICML 2024) [paper]
- Symmetry-Aware Robot Design with Structured Subgroups (ICML 2023) [paper]
- Total-Recon: Deformable Scene Reconstruction for Embodied View Synthesis (ICCV 2023) [paper] [code & data] [webpage]
- Explore and Tell: Embodied Visual Captioning in 3D Environments (ICCV 2023) [paper] [code & data]
- Track2Act: Predicting Point Tracks from Internet Videos enables Generalizable Robot Manipulation (ECCV 2024) [paper]
- ManiGaussian: Dynamic Gaussian Splatting for Multi-task Robotic Manipulation (ECCV 2024) [paper] [code] [webpage]
- Physically Embodied Gaussian Splatting: A Realtime Correctable World Model for Robotics [paper]
- Learning an Actionable Discrete Diffusion Policy via Large-Scale Actionless Video Pre-Training (NeurIPS 2024) [paper] [code] [webpage]
- PreLAR: World Model Pre-training with Learnable Action Representation (ECCV 2024) [paper] [code]
- Octopus: Embodied Vision-Language Programmer from Environmental Feedback [paper] [code] [webpage]
- EC2: Emergent Communication for Embodied Control (CVPR 2023) [paper]
- VoxPoser: Composable 3D Value Maps for Robotic Manipulation with Language Models [paper]
- Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents (ICML 2022) [paper] [code]
- Scaling Up and Distilling Down: Language-Guided Robot Skill Acquisition (CoRL 2023) [paper] [code]
- Plan-Seq-Learn: Language Model Guided RL for Solving Long Horizon Robotics Tasks (ICLR 2024) [paper] [code]
- Large Language Models as Commonsense Knowledge for Large-Scale Task Planning (NeurIPS 2023) [paper] [code]
- REFLECT: Summarizing Robot Experiences for Failure Explanation and Correction (CoRL 2023) [paper] [code]
- Gesture-Informed Robot Assistance via Foundation Models (CoRL 2023) [paper]
- Large Language Models for Robotics: Opportunities, Challenges, and Perspectives [paper]
- Embodied Agent Interface (EAI): Benchmarking LLMs for Embodied Decision Making (NeurIPS 2024 Track Datasets and Benchmarks) [paper] [code]
- EmbodiedGPT: Vision-Language Pre-Training via Embodied Chain of Thought (NeurIPS 2023) [paper] [code]
- Chat with the Environment: Interactive Multimodal Perception using Large Language Models (IROS 2023) [paper] [code]
- Embodied CoT Distillation From LLM To Off-the-shelf Agents (ICML 2024) [paper]
- Do As I Can, Not As I Say: Grounding Language in Robotic Affordances [paper] [code]
- Grounded Decoding: Guiding Text Generation with Grounded Models for Embodied Agents (NeurIPS 2023) [paper]
- Inner Monologue: Embodied Reasoning through Planning with Language Models (CoRL 2022) [paper]
- PhyGrasp: Generalizing Robotic Grasping with Physics-informed Large Multimodal Models [paper] [code]
- SayPlan: Grounding Large Language Models using 3D Scene Graphs for Scalable Robot Task Planning (CoRL 2023) [paper]
- RoboMP2: A Robotic Multimodal Perception-Planning Framework with Multimodal Large Language Models (ICML 2024) [paper] [code]
- Text2Motion: From Natural Language Instructions to Feasible Plans (Autonomous Robots 2023) [paper]
- STAP: Sequencing Task-Agnostic Policies (ICRA 2023) [paper] [code]
- Closed-Loop Open-Vocabulary Mobile Manipulation with GPT-4V (arXiv 2024) [paper]
- ProgPrompt: Program Generation for Situated Robot Task Planning Using Large Language Models (Autonomous Robots 2023) [paper]
- See and Think: Embodied Agent in Virtual Environment (arXiv 2023) [paper]
- Octopus: Embodied Vision-Language Programmer from Environmental Feedback (ECCV 2024) [paper] [webpage] [code]
- Demo2Code: From Summarizing Demonstrations to Synthesizing Code via Extended Chain-of-Thought (NeurIPS 2023) [paper] [webpage] [code]
- EC2: Emergent Communication for Embodied Control (CVPR 2023) [paper]
- When Prolog Meets Generative Models: A New Approach for Managing Knowledge and Planning in Robotic Applications (ICRA 2024) [paper]
- Code as Policies: Language Model Programs for Embodied Control (ICRA 2023) [paper] [webpage] [code]
- GenCHiP: Generating Robot Policy Code for High-Precision and Contact-Rich Manipulation Tasks (arXiv 2024) [paper]
- VoxPoser: Composable 3D Value Maps for Robotic Manipulation with Language Models (CoRL 2023) [paper] [webpage] [code]
- ReKep: Spatio-Temporal Reasoning of Relational Keypoint Constraints for Robotic Manipulation (arXiv 2024) [paper] [webpage] [code]
- RoboScript: Code Generation for Free-Form Manipulation Tasks Across Real and Simulation (arXiv 2024) [paper]
- RobotGPT: Robot Manipulation Learning From ChatGPT (RA-L 2024) [paper]
- RoboCodeX: Multimodal Code Generation for Robotic Behavior Synthesis (ICML 2024) [paper] [webpage] [code]
- Instruct2Act: Mapping Multi-modality Instructions to Robotic Actions with Large Language Model (arXiv 2023) [paper] [code]
- GenSim: Generating Robotic Simulation Tasks via Large Language Models (ICLR 2024) [paper] [code]
- Learning Universal Policies via Text-Guided Video Generation (NeurIPS 2023) [paper] [webpage]
- SlowFast-VGen: Slow-Fast Learning for Action-Driven Long Video Generation (ICLR 2025) [paper] [webpage]
- Using Left and Right Brains Together: Towards Vision and Language Planning (ICML 2024) [paper]
- Compositional Foundation Models for Hierarchical Planning (NeurIPS 2023) [paper] [webpage]
- Closed-Loop Visuomotor Control with Generative Expectation for Robotic Manipulation (NeurIPS 2024) [paper] [code]
- GR-1: Unleashing Large-Scale Video Generative Pre-training for Visual Robot Manipulation [webpage] [code]
- GR-2: A Generative Video-Language-Action Model with Web-Scale Knowledge for Robot Manipulation [webpage]
- Zero-Shot Robotic Manipulation with Pretrained Image-Editing Diffusion Models (ICLR 2024) [paper] [webpage] [code]
- Generate Subgoal Images before Act: Unlocking the Chain-of-Thought Reasoning in Diffusion Model for Robot Manipulation with Multimodal Prompts (CVPR 2024) [paper] [webpage]
- Surfer: Progressive Reasoning with World Models for Robotic Manipulation [paper]
- TAX-Pose: Task-Specific Cross-Pose Estimation for Robot Manipulation (CoRL 2022) [paper] [webpage] [code]
- Imagination Policy: Using Generative Point Cloud Models for Learning Manipulation Policies (CoRL 2024) [paper] [webpage] [code]
- Uncertainty-aware Active Learning of NeRF-based Object Models for Robot Manipulators using Visual and Re-orientation Actions [paper] [webpage] [code]
- Perceiver-Actor: A Multi-Task Transformer for Robotic Manipulation (CoRL 2022) [paper] [webpage] [code]
- ManiGaussian: Dynamic Gaussian Splatting for Multi-task Robotic Manipulation (ECCV 2024) [paper] [webpage] [code]
- GNFactor: Multi-Task Real Robot Learning with Generalizable Neural Feature Fields (CoRL 2023) [paper] [webpage] [code]
- WorldVLA: Towards Autoregressive Action World Model [paper] [webpage] [code]
- Mobile ALOHA: Learning Bimanual Mobile Manipulation with Low-Cost Whole-Body Teleoperation [webpage]
- Diffusion Policy: Visuomotor Policy Learning via Action Diffusion [webpage]
- 3D Diffuser Actor: Policy Diffusion with 3D Scene Representations [webpage]
- RT-1: Robotics Transformer for Real-World Control at Scale [webpage]
- RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control [webpage]
- RVT: Robotic View Transformer for 3D Object Manipulation [webpage]
- RVT-2: Learning Precise Manipulation from Few Examples [webpage]
- GR-1: Unleashing Large-Scale Video Generative Pre-training for Visual Robot Manipulation [webpage]
- GR-2: A Generative Video-Language-Action Model with Web-Scale Knowledge for Robot Manipulation [webpage]
- ReKep: Spatio-Temporal Reasoning of Relational Keypoint Constraints for Robotic Manipulation [webpage]
- Gen2Act: Human Video Generation in Novel Scenarios enables Generalizable Robot Manipulation [webpage]
- OpenVLA: An Open-Source Vision-Language-Action Model [webpage]
- RDT-1B: a Diffusion Foundation Model for Bimanual Manipulation [webpage]
- π0: Our First Generalist Policy [webpage]