👋 About Me

I am a first year Statistics and Data Science PhD student at UCLA, advised by Prof. Kai-Wei Chang and Prof. Ying Nian Wu. Before that, I earned my M.S. in Data Science from Tsinghua University and B.S. in Electronic Engineering from Sun Yat-sen University (SYSU). Currently, I'm a research intern at Microsoft Research, working with Dr. Yeyun Gong, Dr. Yelong Shen, and Dr. Weizhu Chen. My research interests lie at the intersection of generative models, large language models (LLMs), and reinforcement learning (RL).

🔥 News

2025.09: 🎉 Our paper SwS has been accepted by Neurips 2025.
2025.09: 🎉 Our paper GuiLoMo has been accepted by EMNLP 2025.
2025.09: 🎓️ Start my PhD research journey at UCLA.
2025.05: 🎓 Graduated from my Master’s program at Tsinghua University. Thanks to all my advisors and friends!

📖 Education

Sep. 2025 - Jun. 2029 (Expected) Ph.D., Statistics and Data Science, University of California, Los Angeles, USA
Aug. 2022 - Jun. 2025 M.Sc., Data Science and Information Technology, Tsinghua University, Beijing, China.
Sep. 2018 - Jun. 2022 B.Sc., Electronic Information Science and Technology, Sun Yat-sen University, Guangzhou, China.

📑 Selected Publications

Preprint 2025

Beyond Pass@1: Self-Play With Variational Problem Synthesis Sustains RLVR

Xiao Liang*, Zhong-Zhi Li*, Yeyun Gong, Yelong Shen, Ying Nian Wu, Zhijiang Guo, Weizhu Chen

Code Project Page

We propose an online Self-play with Variational Problem Synthesis strategy for RLVR training that iteratively leverages model responses to synthesize variational problems for augmentation.

NeurIPS 2025

SwS: Self-aware Weakness-driven Problem Synthesis in Reinforcement Learning for LLM Reasoning

Xiao Liang*, Zhong-Zhi Li*, Yeyun Gong, Yang Wang, Hengyuan Zhang, Yelong Shen, Ying Nian Wu, Weizhu Chen

Code Project Page

We introduce a Self-aware Weakness-driven Problem Synthesis framework that identifies and leverages model weaknesses for problem augmentation in RLVR.

Preprint 2025

Gold-Medal-Level Olympiad Geometry Solving with Efficient Heuristic Auxiliary Constructions

Boyan Duan, Xiao Liang, Shuai Lu, Yaoxiang Wang, Yelong Shen, Kai-Wei Chang, Ying Nian Wu, Mao Yang, Weizhu Chen, Yeyun Gong

Code Benchmark

We present a highly efficient geometry theorem proving method that runs entirely on CPUs and achieves gold-medal-level performance on IMO problems.

Preprint 2025

TL;DR: Too Long, Do Re-weighting for Efficient LLM Reasoning Compression

Zhong-Zhi Li*, Xiao Liang*, Zihao Tang, Lei Ji, Peijie Wang, Haotian Xu, Xing W, Haizhen Huang, Weiwei Deng, Yeyun Gong, Ying Nian Wu, Zhijiang Guo, Xiao Liu, Fei Yin, Cheng-Lin Liu

Code

We propose a dynamic ratio-based training pipeline that balances System-1 and System-2 data to reduce redundant reasoning in LLMs.

Preprint 2025

Reinforcement Learning with Verifiable Rewards Implicitly Incentivizes Correct Reasoning in Base LLMs

Xumeng Wen*, Zihan Liu*, Shun Zheng*, Zhijian Xu, Shengyu Ye, Zhirong Wu, Xiao Liang, Yang Wang, Junjie Li, Ziming Miao, Jiang Bian, Mao Yang

We introduce a stricter reasoning-aware metric beyond pass@k and show that RLVR uniquely incentivizes logically consistent reasoning.

ICLR 2025

Integrative Decoding: Improve Factuality via Implicit Self-consistency

Yi Cheng, Xiao Liang, Yeyun Gong, Wen Xiao, Song Wang, Yuji Zhang, Wenjun Hou, Kaishuai Xu, Wenge Liu, Wenjie Li, Jian Jiao, Qi Chen, Peng Cheng, Wayne Xiong

Code

We present a self-consistency based decoding strategy that improves factual accuracy in long-form generation.

EMNLP 2024

Task Oriented In-Domain Data Augmentation

Xiao Liang*, Xinyu Hu*, Simiao Zuo, Yeyun Gong, Qiang Lou, Yi Liu, Shao-Lun Huang, Jian Jiao

We propose a task-oriented in-domain data augmentation framework consisting of in-domain data selection and task-specific synthesis.

ACL 2024

Chunk, Align, Select: A Simple Long-sequence Processing Method for Transformers

Jiawen Xie, Pengyu Cheng, Xiao Liang, Yong Dai, Nan Du

Code

We propose a reinforcement-learning-based token selection framework for efficient long-sequence processing.

ACL 2025

Chain-of-Reasoning: Towards Unified Mathematical Reasoning in Large Language Models via a Multi-Paradigm Perspective

Yiyao Yu, Yuxiang Zhang, Dongdong Zhang, Xiao Liang, Hengyuan Zhang, Xingxing Zhang, Ziyi Yang, Mahmoud Khademi, Hany Awadalla, Junjie Wang, Yujiu Yang, Furu Wei

We introduce a unified framework that integrates natural language, algorithmic, and symbolic reasoning for mathematical problem solving.

🧑‍💻 Experience

(Nov. 2023 - Present) Reserch Intern, Microsoft Research & CoreAI.
Mentor: Yeyun Gong, Yelong Shen, Weizhu Chen
Working on LLM reasoning, pre-training and reinforcement learning.
(Mar. 2023 - Sep. 2023) Research Intern, AI Lab, Tencent Inc., Guangdong, China.
Mentor: Pengyu Cheng
Working on large language models, long sequence processing.

🏆 Honors and Awards

Graduate Dean’s Scholar Award (GDSA, Amount: $14,500), UCLA, California, 2025
Outstanding Master Graduate Thesis, Tsinghua University, Beijing, 2025
Second Prize Scholarship, Tsinhua University, Beijing, 2024
Outstanding Graduate Thesis, Sun Yat-sen University, Guangdong, 2022
Outstanding Graduate Student, Sun Yat-sen University, Guangdong, 2022
Second Prize Scholarship, Sun Yat-sen University, Guangdong, 2019~2022
1st of the 2022 Tsinghua Open Hack Competition - Multimodal Learning Track, Tsinghua University, Beijing, 2022