↑ This is Zhongguancun Science & Technology Park in Beijing, where I interned in the summer of 2025 ↑
Hello! I am an MS student at the University of Electronic Science and Technology of China (UESTC).
Focus: Embodied Intelligence (Vision-Language-Action models, VLA) and Vision-Language Models (VLM).
- [Mar. 2026] 🎉 Our RoboCOIN dataset has been selected as an EAI-100 Top-10 Dataset of 2025 by ModelScope and CCF TCIR!
- [Mar. 2026] 🖋️ I was invited to serve as a reviewer for BMVC 2026.
- [Mar. 2026] 📈 Our RoboCOIN dataset has reached 4,000,000+ total downloads!
- [Feb. 2026] 🚀 Our paper InSpire (Intrinsic Spatial Reasoning for VLAs) was accepted to ICRA 2026!
🔍 Media Reports & Coverage
- 🌏 [ModelScope] BAAI RoboCOIN is officially open-sourced!
- 🌏 [ModelScope] EAI-100: Top 100 Achievements & Figures in Embodied AI 2025 White Paper
- 🎙️ [AIorang] RoboCOIN: Large-scale Dual-Arm Robot Dataset Public Lecture
- 🏢 [AgileX] BAAI builds the first large-scale multi-embodiment dual-arm data infrastructure
- 📰 [Embodied AI Heart] PCD: Training-free & Plug-and-Play VLA for Action Prediction
- 📰 [DeepBlue] Simple spatial reasoning boosts VLA generalization by 4x
- 📰 [Multimodal Space] CoT-enhanced spatial reasoning for VLAs
- 📰 [LLMPhD] Skip Tuning: Lightweight adaptation for vision-language models
- 📰 [Robot Lecture Hall] BAAI RoboCOIN: Largest real-world dual-arm dataset with fine-grained annotations
- 🏢 Research Intern · Beijing Academy of Artificial Intelligence (BAAI) · 2025.06 - Present
- 🎓 Master's Student · UESTC, Computer Science · 2023.09 - Present
- 🏆 National Scholarship (2024), 🏅 Sichuan Province Outstanding Graduate (2026)
- 🎓 Bachelor's Degree · UESTC, Software Engineering · 2019.09 - 2023.06
- 🏅 UESTC Outstanding Graduate (2023), 🏆 "Shiqiang" Special Scholarship (2022)
🤖 [EAI-100 Top-10 Datasets in 2025] RoboCOIN: An Open-Sourced Bimanual Robotic Data Collection for Integrated Manipulation
[Project] [arXiv] [PDF] [Code]
- Open-sourced large-scale bimanual robotic dataset spanning 15 robotic platforms and 180K+ demonstrations, built in collaboration with 20 institutions.
🤖 [ICLR 2026] Policy Contrastive Decoding for Robotic Foundation Models
[Project] [arXiv] [PDF] [Code]
- Universal framework for multiple VLA architectures, achieving +8%~41% improvement without training.
🤖 [ICRA 2026] InSpire: Vision-Language-Action Models with Intrinsic Spatial Reasoning
[Project] [arXiv] [PDF] [Code]
- Reducing spurious correlations in VLAs, boosting performance on seen (+6.2%) and unseen (+10%) tasks.
🖼️ [IJCV 2026] A Closer Look at Conditional Prompt Tuning for Vision-Language Models
[arXiv] [PDF] [Code]
- Identified critical issues in existing conditional prompt tuning methods and outperformed the state of the art by 3.49%.
🖼️ [CVPR 2025] Skip Tuning: Pre-trained Vision-Language Models are Effective and Efficient Adapters
[arXiv] [PDF] [Code]
- Parameter-free adaptation method delivering +1.04% accuracy with a 15x speedup and 6.4x better memory efficiency.
🖼️ [CVPR 2024] DePT: Decoupled Prompt Tuning
[arXiv] [PDF] [Code]
- Plug-and-play method providing +0.67%~2.65% gains across various prompt tuning baselines.
| Category | Skills & Frameworks |
|---|---|
| AI | |
| Data Science | |
| Languages | |
| Web & Backend | |
| Tools | |


