RoboScholar: A Comprehensive Paper List of Embodied AI and Robotics Research

This is the RoboScholar project, started by Tianxing Chen.
Related Information:

Lumina Embodied AI Community

Embodied-AI-Guide

Sections

Manipulation
- Imitation Learning (IL)
- Reinforcement Learning (RL)
- Humanoid Whole-Body Control
- Dexterous Manipulation
- World Model
- Tactile Manipulation
Simulations
- Platform
- Dataset & Benchmark
Robot Hardware
- Data Collection
Real World Dataset
Robot Navigation
Locomotion
LLM Agent for Robotics
Computer Vision
Embodied AI for X
- Medical

Recent Random Papers

[] [arXiv 25] Open-Vocabulary 3D Articulated Objects Modeling, https://arxiv.org/pdf/2507.02747
[] [arXiv 25] LEMON: Learning 3D Human-Object Interaction Relation from 2D Images, arXiv
[] [arXiv 25] Merging and Disentangling Views in Visual Reinforcement Learning for Robotic Manipulation, arXiv
[] [RSS 25] Reactive Diffusion Policy: Slow-Fast Visual-Tactile Policy Learning for Contact-Rich Manipulation, arXiv
[] [arXiv 24] GRAPE: Generalizing Robot Policy via Preference Alignment, arXiv
[] [arXiv 25] GROVE: A Generalized Reward for Learning Open-Vocabulary Physical Skill, arXiv
[] [arXiv 24] Scaling Proprioceptive-Visual Learning with Heterogeneous Pre-trained Transformers, arXiv

1. Diffusion Model for Planning, Policy, and RL

[] [arXiv 24] Surgical Robot Transformer: Imitation Learning for Surgical Tasks, website

6. Generative Models for Embodied AI

[] [arXiv 24] Generative Image as Action Models, website
[] [arXiv 24] Genie: Generative Interactive Environments, website

9. Pose Estimation and Tracking

[] [CVPR 24 (Highlight)] FoundationPose: Unified 6D Pose Estimation and Tracking of Novel Objects, website
[] [CVPR 23 (Highlight)] GAPartNet: Cross-Category Domain-Generalizable Object Perception and Manipulation via Generalizable and Actionable Parts, website
[] [arXiv 23] GAMMA: Generalizable Articulation Modeling and Manipulation for Articulated Objects, website
[] [arXiv 24] ManiPose: A Comprehensive Benchmark for Pose-aware Object Manipulation in Robotics, website
[] [ICCV 23] AffordPose: A Large-scale Dataset of Hand-Object Interactions with Affordance-driven Hand Pose, website
[] [CVPR 23] BundleSDF: Neural 6-DoF Tracking and 3D Reconstruction of Unknown Objects, website
[] [arXiv 24] WiLoR: End-to-end 3D hand localization and reconstruction in-the-wild, website

TO READ

Where2Act: From Pixels to Actions for Articulated 3D Objects
PreAfford: Universal Affordance-Based Pre-Grasping for Diverse Objects and Environments
Decision Transformer: Reinforcement Learning via Sequence Modeling
Toward General-Purpose Robots via Foundation Models: A Survey and Meta-Analysis
AO-Grasp: Articulated Object Grasp Generation
Human-to-Robot Imitation in the Wild
RoboCasa: Large-Scale Simulation of Everyday Tasks for Generalist Robots
SAM-E: Leveraging Visual Foundation Model with Sequence Imitation for Embodied Manipulation https://sam-embodied.github.io/, ICML24
https://progprompt.github.io/
PerAct, Act3D
https://groups.csail.mit.edu/vision/datasets/ADE20K/
Probing the 3D Awareness of Visual Foundation Models: https://arxiv.org/pdf/2404.08636
ManipVQA: Injecting Robotic Affordance and Physically Grounded Information into Multi-Modal Large Language Models
CLIP: Zero-shot Jack of All Trades, website, CLIP GradCAM CLIP_GradCAM_Visualization
Articulated Object Manipulation with Coarse-to-fine Affordance for Mitigating the Effect of Point Cloud Noise: https://arxiv.org/pdf/2402.18699
3D-VLA: A 3D Vision-Language-Action Generative World Model
PDDLGym: Gym Environments from PDDL Problems: https://arxiv.org/abs/2002.06432
https://github.com/zjunlp/LLMAgentPapers?tab=readme-ov-file
https://github.com/zjunlp/Prompt4ReasoningPapers
TravelPlanner: A Benchmark for Real-World Planning with Language Agents
VisionLLM: https://arxiv.org/abs/2305.11175
Ferret: Refer and Ground Anything Anywhere at Any Granularity: https://github.com/apple/ml-ferret
LangSplat
Embodied AI with Two Arms: Zero-shot Learning, Safety and Modularity
SparseDFF
ManiPose: A Comprehensive Benchmark for Pose-aware Object Manipulation in Robotics

Stabilizing Transformers for Reinforcement Learning
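- Summary: This paper proposes the Gated Transformer-XL (GTrXL), a modified Transformer architecture that addresses the optimization difficulties of standard Transformers in reinforcement learning. By introducing layer normalization and a gating mechanism, GTrXL outperforms LSTMs in partially observable environments.
- Link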
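The entry above mentions a gating mechanism. As a minimal sketch, assuming the GRU-style gate that the GTrXL paper compares against other gating variants, the gate that replaces the plain residual connection x + y can be written in plain Python as follows. The function names, the plain-list matrices, and the bias value b_g = 2.0 are illustrative assumptions, not code from the paper.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a (rows x cols) matrix by a vector of length cols."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def sigmoid(a: float) -> float:
    return 1.0 / (1.0 + math.exp(-a))


def gru_gate(x: Vector, y: Vector,
             W_r: Matrix, U_r: Matrix,
             W_z: Matrix, U_z: Matrix,
             W_g: Matrix, U_g: Matrix,
             b_g: float = 2.0) -> Vector:
    """GRU-style gate used in place of the residual connection x + y.

    x:   residual-stream input to the block
    y:   output of the sublayer (e.g. attention applied to LayerNorm(x))
    b_g: positive bias that pushes the update gate z toward 0, so the
         block starts close to an identity map (value is illustrative).
    """
    # Reset gate r = sigmoid(W_r y + U_r x)
    r = [sigmoid(a + b) for a, b in zip(matvec(W_r, y), matvec(U_r, x))]
    # Update gate z = sigmoid(W_z y + U_z x - b_g)
    z = [sigmoid(a + b - b_g) for a, b in zip(matvec(W_z, y), matvec(U_z, x))]
    # Candidate h = tanh(W_g y + U_g (r * x))
    rx = [ri * xi for ri, xi in zip(r, x)]
    h = [math.tanh(a + b) for a, b in zip(matvec(W_g, y), matvec(U_g, rx))]
    # Interpolate between the residual input and the candidate
    return [(1.0 - zi) * xi + zi * hi for zi, xi, hi in zip(z, x, h)]


# Hypothetical usage: all weights zero, 3-dimensional features.
d = 3
zeros = [[0.0] * d for _ in range(d)]
out = gru_gate([1.0, -2.0, 0.5], [0.3, 0.3, 0.3],
               zeros, zeros, zeros, zeros, zeros, zeros)
```

With all weights at zero, z equals sigmoid(-b_g) and the candidate is 0, so the output is (1 - sigmoid(-b_g)) * x, which stays close to x. This illustrates why a positive b_g lets a gated block behave almost like an identity map early in training.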
CoBERL: Contrastive BERT for Reinforcement Learning
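- Summary: This paper introduces CoBERL, which combines a contrastive loss with a Transformer architecture, using bidirectional masked prediction and contrastive learning to improve data efficiency and performance in reinforcement learning.
- Link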
Adaptive Transformers in RL
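- Summary: This work explores Transformer models with an adaptive attention span in reinforcement learning and finds that this approach improves performance in environments that require long-term dependencies.
- Link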
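The adaptive attention span mentioned in the entry above can be illustrated with the soft masking function from "Adaptive Attention Span in Transformers" (Sukhbaatar et al., 2019): m_z(x) = min(max((R + z - x) / R, 0), 1), where x is the distance to a past key, z is the learnable span, and R is the width of a linear ramp. This is a minimal sketch assuming the RL paper uses that standard formulation; the function names and example values are hypothetical.

```python
import math
from typing import List


def span_mask(distance: float, z: float, ramp: float) -> float:
    """Soft mask m_z(x) = clamp((R + z - x) / R, 0, 1)."""
    return min(max((ramp + z - distance) / ramp, 0.0), 1.0)


def masked_attention_weights(scores: List[float], z: float, ramp: float) -> List[float]:
    """Attention weights of the current step over itself and past steps.

    scores[k] is the raw attention score for the key k steps in the past
    (k = 0 is the current step). Keys farther than z + R get weight 0.
    Assumes z >= 0, so the current step always has mask 1 and the
    normalizer is positive.
    """
    m = max(scores)  # subtract the max score for numerical stability
    weighted = [span_mask(k, z, ramp) * math.exp(s - m) for k, s in enumerate(scores)]
    total = sum(weighted)
    return [w / total for w in weighted]


# Hypothetical usage: uniform scores over 6 steps, span z = 2, ramp R = 2.
weights = masked_attention_weights([0.0] * 6, z=2.0, ramp=2.0)
```

Because the mask is piecewise linear in z, the span can be learned by gradient descent alongside the other parameters, letting each attention head decide how far back in the episode it needs to look.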
Efficient Transformers in Reinforcement Learning using Actor-Learner Distillation
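- Summary: This paper proposes Actor-Learner Distillation (ALD), which distills knowledge from a large learner model into a small actor model to improve the sample efficiency of Transformers in reinforcement learning.
- Link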
Deep Transformer Q-Networks for Partially Observable Reinforcement Learning
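- Summary: This paper introduces Deep Transformer Q-Networks (DTQN), a new reinforcement learning architecture that uses the Transformer's self-attention mechanism to handle partially observable tasks, and demonstrates its effectiveness in several challenging environments.
- Link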
CtrlFormer: Learning Transferable State Representation for Visual Control via Transformer
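- Summary: CtrlFormer is a new Transformer architecture that improves sample efficiency in visual control tasks by learning transferable state representations, with particular emphasis on its advantages for cross-task transfer learning.
- Link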

Sapiens: Foundation for Human Vision Models: https://about.meta.com/realitylabs/codecavatars/sapiens
General Flow as Foundation Affordance for Scalable Robot Learning: https://general-flow.github.io/
