Zhaowei Zhang's Homepage

Zhaowei Zhang (张钊为)

Ph.D. Student

School of Intelligence Science and Technology

Peking University

Email: zwzhang [at] stu (dot) pku (dot) edu (dot) cn

[Google Scholar] [GitHub] [Twitter] [LinkedIn]

Research Interests

AI Alignment & Governance
Multi-Agent Systems
Reinforcement Learning
Game Theory

Zhaowei is pronounced "Ju" (as in judge) + "ou" (as in out) + "way"; Zhang (Cheung in Hong Kong) is pronounced "Ju" (as in judge) + "on". In the International Phonetic Alphabet (IPA), with audio: [tʂɑuwei] [tʂɑŋ].

I am currently a third-year Ph.D. candidate at the Institute for AI, School of Intelligence Science and Technology, Peking University, where I am a member of PAIR-Lab, led by Prof. Yaodong Yang. The long-term goal of my research is to build strong, human-like AI systems. To this end, my research focuses on AI alignment, reinforcement learning, and multi-agent systems. I am currently most interested in the complete closed-loop process of LLM alignment, which includes using AI to help humans reach consensus, using RL to improve LLMs' instruction following, and designing test-time alignment algorithms. I am always happy to discuss these topics with new friends ☺️.

Publications (* indicates equal contribution.)

2025

Make an Offer They Can't Refuse: Grounding Bayesian Persuasion in Real-World Dialogues without Pre-Commitment

Buwei He, Yang Liu, Zhaowei Zhang, Zixia Jia, Huijia Wu, Zhaofeng He, Zilong Zheng, Yipeng Kang

Preprint
[Paper]
Aegis: Automated Error Generation and Identification for Multi-Agent Systems

Fanqi Kong, Ruijie Zhang, Huaxiao Yin, Guibin Zhang, Xiaofei Zhang, Ziang Chen, Zhaowei Zhang, Xiaoyuan Zhang, Song-Chun Zhu, Xue Feng

Preprint
[Paper] [Website] [Dataset]
PoliCon: Evaluating LLMs on Achieving Diverse Political Consensus Objectives

Zhaowei Zhang, Xiaobo Wang, Minghua Yi, Mengmeng Wang, Fengshuo Bai, Zilong Zheng, Yipeng Kang, Yaodong Yang

Preprint
[Paper] [Website]
Evaluating Generalization Capabilities of LLM-Based Agents in Mixed-Motive Scenarios Using Concordia

In collaboration with the DeepMind Concordia team

NeurIPS 2025 Datasets and Benchmarks Track
[Paper]
Amulet: ReAlignment During Test Time for Personalized Preference Adaptation of LLMs

Zhaowei Zhang, Fengshuo Bai, Qizhi Chen, Chengdong Ma, Mingzhi Wang, Haoran Sun, Zilong Zheng, Yaodong Yang

ICLR 2025
[Paper] [Website] [Code]
Magnetic Preference Optimization: Achieving Last-iterate Convergence for Language Models Alignment

Mingzhi Wang, Chengdong Ma, Qizhi Chen, Linjian Meng, Yang Han, Jiancong Xiao, Zhaowei Zhang, Jing Huo, Weijie J. Su, Yaodong Yang

ICLR 2025
[Paper]

2024

ValueDCG: Measuring Comprehensive Human Value Understanding Ability of Language Models

Zhaowei Zhang, Fengshuo Bai, Jun Gao, Yaodong Yang

NeurIPS 2025 Workshop on Regulatable ML
[Paper] [Blog] [Chinese Blog]
Efficient Model-agnostic Alignment via Bayesian Persuasion

Fengshuo Bai, Mingzhi Wang *, Zhaowei Zhang *, Boyuan Chen *, Yinda Xu, Ying Wen, Yaodong Yang

Preprint
[Paper]
Foundational Challenges in Assuring Alignment and Safety of Large Language Models

As a major contributor

TMLR
[Paper] [Website]
Roadmap on Incentive Compatibility for AI Alignment and Governance in Sociotechnical Systems

Zhaowei Zhang, Fengshuo Bai, Mingzhi Wang, Haoyang Ye, Chengdong Ma, Yaodong Yang

AGI 2025 (Oral)
[Paper] [Chinese Blog]
CivRealm: A Learning and Reasoning Odyssey in Civilization for Decision-Making Agents

Siyuan Qi, Shuo Chen, Yexin Li, Xiangyu Kong, Junqi Wang, Bangcheng Yang, Pring Wong, Yifan Zhong, Xiaoyuan Zhang, Zhaowei Zhang, Nian Liu, Wei Wang, Yaodong Yang, Song-Chun Zhu

ICLR 2024 (Spotlight)
[Paper] [Website] [Code]

2023

AI Alignment: A Comprehensive Survey

PAIR-Lab

ACM Computing Surveys
[Paper] [Website]
ProAgent: Building Proactive Cooperative AI with Large Language Models

Ceyao Zhang, Kaijie Yang, Siyi Hu, Zihao Wang, Guanghe Li, Yihang Sun, Cheng Zhang, Zhaowei Zhang, Anji Liu, Song-Chun Zhu, Xiaojun Chang, Junge Zhang, Feng Yin, Yitao Liang, Yaodong Yang

AAAI 2024 (Oral)
[Paper]
Heterogeneous Value Alignment Evaluation for Large Language Models

Zhaowei Zhang, Nian Liu, Siyuan Qi, Ceyao Zhang, Ziqi Rong, Shuguang Cui, Song-Chun Zhu, Yaodong Yang

AGI 2025 & AAAI 2024 Workshop: Public Sector LLMs (Oral)
[Paper]
STAS: Spatial-Temporal Return Decomposition for Solving Sparse Rewards Problems in Multi-agent Reinforcement Learning

Sirui Chen *, Zhaowei Zhang *, Yali Du, Yaodong Yang

AAAI 2024
[Paper] [Code]

2022

Contextual Transformer for Offline Meta Reinforcement Learning

Runji Lin, Ye Li, Xidong Feng, Zhaowei Zhang, Xian Hong Wu Fung, Haifeng Zhang, Jun Wang, Yali Du, Yaodong Yang

FMDM Workshop at NeurIPS 2022
[Paper]

Perspectives

The Three-Layer Paradigm for Implementing Sociotechnical AI Alignment: A Top-Down-Top Outlook

Abstract: Backward alignment is an indispensable part of AI alignment, and alignment problems viewed from the perspective of sociotechnical systems (STS) are an important component of it. But what exactly are STS, and what does the term refer to? STS is a very broad concept that spans many considerations, yet little existing work unifies these issues clearly; they are often glossed over. In addition, different articles discuss STS at different scales, or use different terms for the same scale, which makes the field hard for researchers to navigate. In this article, I offer my personal perspective on the AI alignment issues that arise in STS, framing them computably at each scale and outlining possible research approaches.
[English Version] [Chinese Version]

Selected Awards

Huawei Spark Award, 2025. [News]
[Top 5%] Wuhan University Outstanding Thesis Award, 2023.

Services

Program Committee Member for AAAI 2026 AIA Track.
Program Committee Member for AAAI 2026.
Program Committee Member for DAI 2024.