I’m currently a fourth-year undergraduate at Peking University (PKU), advised by Prof. Wentao Zhang. My primary research interests lie in data-centric machine learning and its applications to large language models (LLMs). In particular, I focus on how high-quality data can be leveraged to enhance models’ generalization and logical reasoning abilities. I am also interested in how LLM agents can make decisions and reason in complex, dynamic settings.
Prior to this, I worked in Prof. Yitao Liang’s CraftJarvis Team, which is committed to developing a generalist agent capable of mastering a wide range of tasks and challenges in the open-world environment of Minecraft.
🔥 News
- 2025.10: 🎉🎉 The project page for our survey “Data-Centric Perspectives on Agentic Retrieval-Augmented Generation: A Survey” is out at Awesome-AgenticRAG-Data.
- 2025.06: 🎉🎉 Our paper “Open-World Skill Discovery from Unsegmented Demonstration Videos” was accepted to ICCV 2025.
📝 Publications

Open-World Skill Discovery from Unsegmented Demonstration Videos
Jingwen Deng*, Zihao Wang*, Shaofei Cai, Anji Liu, Yitao Liang
- We propose a self-supervised method, Skill Boundary Detection (SBD), that segments long unlabelled videos into semantics-aware, skill-consistent segments by detecting peaks in prediction error.
- In Minecraft experiments, SBD significantly improves both short-term and long-horizon task performance, enabling effective use of diverse online videos to train instruction-following agents.
📝 Works in Progress

Data-Centric Perspectives on Agentic Retrieval-Augmented Generation: A Survey
Jingwen Deng, Jihao Huang, Zhen Hao Wong, Hao Liang, Bin Cui, Wentao Zhang
This survey provides a data-centric overview of Agentic RAG, outlining its full data lifecycle and offering guidance for building scalable datasets to power adaptive, knowledge-seeking LLM agents.

FlipVQA-Miner: Cross-Page Visual Question–Answer Mining from Textbooks
Zhen Hao Wong*, Jingwen Deng*, Hao Liang, Runming He, Chengyu Shen, Wentao Zhang
We propose an automated pipeline that extracts well-formed QA and VQA pairs from college textbooks by combining layout-aware OCR with LLM-based semantic parsing.
🎖 Honors and Awards
- 2025
  - Leo KoGuan Scholarship, Peking University
- 2024
  - Leo KoGuan Scholarship, Peking University
  - First Prize, Mathematics Competition of Chinese College Students
- 2023
  - YanChuang Capital Scholarship, Peking University
📖 Education
- 2022.09 - present, Peking University, BS in Computer Science
  - GPA: 3.834/4.0 (rank 9/146, top 7% in major)
💻 Internships
- 2025.09 - present, DPTechnology, China.