I’m currently a fourth-year undergraduate at Peking University (PKU), advised by Prof. Wentao Zhang. My primary research interests lie in data-centric machine learning and its applications to large language models (LLMs). In particular, I focus on how high-quality data can be leveraged to enhance models’ generalization and logical reasoning abilities. I am also interested in how LLM agents can make decisions and reason in complex, dynamic settings.

Before that, I worked with Prof. Yitao Liang’s CraftJarvis Team on developing a generalist agent capable of mastering a wide range of tasks and challenges in the open-world game Minecraft.

🔥 News

  • 2025.10:  🎉🎉 The project page for our survey “Data-Centric Perspectives on Agentic Retrieval-Augmented Generation: A Survey” is now available at Awesome-AgenticRAG-Data.
  • 2025.06:  🎉🎉 Our paper “Open-World Skill Discovery from Unsegmented Demonstration Videos” has been accepted to ICCV 2025.

📝 Publications

ICCV 2025

Open-World Skill Discovery from Unsegmented Demonstration Videos

Jingwen Deng*, Zihao Wang*, Shaofei Cai, Anji Liu, Yitao Liang

Paper | Project

  • We propose a self-supervised method, Skill Boundary Detection (SBD), that segments long, unlabelled videos into semantically aware, skill-consistent segments by detecting peaks in prediction error.
  • In Minecraft experiments, SBD significantly improves both short-term and long-horizon task performance, enabling effective use of diverse online videos to train instruction-following agents.

LogicPuzzleRL: Cultivating Robust Mathematical Reasoning in LLMs via Reinforcement Learning

Zhen Hao Wong*, Jingwen Deng*, Runming He, Zirong Chen, Qijie You, Hejun Dong, Hao Liang, Chengyu Shen, Bin Cui, Wentao Zhang

arXiv 2025

Paper | Code

📝 Works in Progress

Data-Centric Perspectives on Agentic Retrieval-Augmented Generation: A Survey

Jingwen Deng, Jihao Huang, Zhen Hao Wong, Hao Liang, Bin Cui, Wentao Zhang

Paper | Project

This survey provides a data-centric overview of Agentic RAG, outlining its full data lifecycle and offering guidance for building scalable datasets to power adaptive, knowledge-seeking LLM agents.

FlipVQA-Miner: Cross-Page Visual Question–Answer Mining from Textbooks

Zhen Hao Wong*, Jingwen Deng*, Hao Liang, Runming He, Chengyu Shen, Wentao Zhang

Paper | Project

We propose an automated pipeline that extracts well-formed QA and VQA pairs from college textbooks by combining layout-aware OCR with LLM-based semantic parsing.

🎖 Honors and Awards

  • 2025
    • Leo KoGuan Scholarship, Peking University
  • 2024
    • Leo KoGuan Scholarship, Peking University
    • First Prize, Mathematics Competition of Chinese College Students
  • 2023
    • YanChuang Capital Scholarship, Peking University

📖 Education

  • 2022.09 - present, Peking University, BS in Computer Science
    • GPA: 3.834/4.0 (rank 9/146, top 7% in major)

💻 Internships