My research centers on Humanistic, Pluralistic, and Coevolutionary AI Safety and Alignment, aiming to foster the long-term secure, sustainable, and synergistic coevolution of AI and humanity:
From Human to AI: Developing human-centered, ever-evolving, and future-oriented AI systems, anchored in interdisciplinary insights into human intelligence, values, and global needs.
From AI to Human: Advancing the frontiers of human knowledge, augmenting human capabilities, and addressing consequential sociotechnical challenges through robust, efficient, and scalable innovations in data, learning algorithms, and AI system design.
My current research focuses on developing data-, algorithm-, and system-level solutions to sociotechnical challenges in AI safety, security, and LLM alignment, often from multi-agent, reinforcement learning (RL), and data synthesis angles. My work has spearheaded research on moral and (pluralistic) value reasoning in LLMs. An overview of my research:
Artificial Intelligence · Natural Language Processing · AI Safety · Machine Learning · Pluralistic Alignment · Human-AI Interaction
Dec 2025: I'll be traveling to San Diego for NeurIPS 2025 to present two accepted papers: an oral for Artificial Hivemind and a poster for AI debates on controversial claims. I'll also be attending the Alignment Workshop!
Please feel free to reach out if you'd like to chat!
Towards Safe & Trustworthy Agents Workshop @ NeurIPS 2024
Position Paper: Political Neutrality in AI is Impossible—But Here is How to Approximate it
Jillian Fisher, Ruth Elisabeth Appel, Chan Young Park, Yujin Potter, Liwei
Jiang, Taylor Sorensen, Shangbin Feng, Yulia Tsvetkov, Margaret Roberts,
Jennifer Pan, Dawn Song, Yejin Choi
SafetyAnalyst: Interpretable, Transparent, and Steerable Safety Moderation for AI Behavior
Jing-Jing Li, Valentina Pyatkin, Max Kleiman-Weiner, Liwei Jiang, Nouha
Dziri, Anne G. E. Collins, Jana Schaich Borg, Maarten Sap, Yejin Choi, Sydney Levine
To Err is AI: A Case Study Informing LLM Flaw Reporting Practices
Sean McGregor, Allyson Ettinger, Nick Judd, Paul Albee, Liwei Jiang, Kavel
Rao, Will Smith, Shayne Longpre, Avijit Ghosh, Christopher Fiorelli, Michelle Hoang, Sven
Cattell, Nouha Dziri
Position Paper: A Roadmap to Pluralistic Alignment
Taylor Sorensen, Jared Moore, Jillian Fisher, Mitchell Gordon, Niloofar Mireshghallah,
Christopher Michael Rytting, Andre Ye, Liwei Jiang, Ximing Lu, Nouha Dziri,
Tim Althoff, Yejin Choi
NovaCOMET: Open Commonsense Foundation Models with Symbolic Knowledge Distillation
Peter West, Ronan Le Bras, Taylor Sorensen, Bill Yuchen Lin, Liwei Jiang,
Ximing Lu, Khyathi Chandu, Jack Hessel, Ashutosh Baheti, Chandra Bhagavatula, Yejin Choi
SODA: Million-scale Dialogue Distillation with Social Commonsense Contextualization
🏆 Outstanding Paper Award
Hyunwoo Kim, Jack Hessel, Liwei Jiang, Peter West, Ximing Lu, Youngjae Yu,
Pei Zhou, Ronan Le Bras, Malihe Alikhani, Gunhee Kim, Maarten Sap, Yejin Choi
UCLA NLP Seminar | UIUC NLP Seminar | UIUC ECE (Hosted by Prof. Huan Zhang)
Humanistic, Pluralistic, and Coevolutionary AI Safety and Alignment
(Upcoming) Jan 2026, Speaker
NeurIPS 2025 | NVIDIA | Ploutos | University of Toronto (Hosted by Prof. Ebrahim Bagheri) | Zhiyuan Talk | AI TIME
Artificial Hivemind: The Open-Ended Homogeneity of Language Models (and Beyond)
Slides
Dec 2025, Speaker
Netskope
WildTeaming and WildGuard: Building Robust Model-Level and System-Level Safeguards of Language Models
May 2025, Speaker
DARPA ITM PI Meeting
Can Language Models Reason about Individualistic Human Values and Preferences?
March 2025, Speaker
University of Washington, Foster School of Business, Computational Minds and Machines lab
How to Build Machines with Deep Concerns of Human Traits, Values, and Needs?—Towards Humanistic AI Alignment
Feb 2025, Speaker (Hosted by Prof. Max Kleiman-Weiner)
Annual Research Showcase and Open House Event, UW CSE
AI Safety Panel
Oct 2024, Panelist
All-Ai2 Meeting, Allen Institute for Artificial Intelligence (Ai2)
WildTeaming at Scale: From In-the-Wild Jailbreaks to (Adversarially) Safer LMs
July 2024, Speaker
The Big Picture Workshop, EMNLP, Singapore
On the Outcomes of Scientific Disagreements on Machine Morality
Dec 2023, Speaker
DARPA ITM Kickoff PI Meeting
Toward Interpretable and Interactive Socially & Ethically Informed AI
May 2023, Speaker
Mosaic Morality & AI Series, Allen Institute for Artificial Intelligence (Ai2)
Toward Interpretable, Interactive, Informative Machine Moral Reasoning
Feb 2023, Discussant
UW NLP Retreat
Toward Socially Aware & Ethically Informed AI
Sept 2022, Speaker
All-Ai2 Meeting, Allen Institute for Artificial Intelligence (Ai2)
Delphi: Toward Machine Ethics and Norms
Oct 2021, Speaker
Bio
Liwei Jiang is a final-year Ph.D. candidate in the Paul G. Allen School of Computer Science & Engineering at the University of Washington, advised by Prof. Yejin Choi. She was previously a graduate student researcher at NVIDIA and the Allen Institute for Artificial Intelligence (Ai2). Her research focuses on humanistic, pluralistic, and coevolutionary AI safety and alignment, where she spearheads research on moral and pluralistic value reasoning in language models and develops data-, algorithm-, and system-level solutions to sociotechnical challenges in AI safety, security, and large language model alignment. Her work has received Best Paper Awards at NeurIPS 2025, NAACL 2022, and CHI 2024, as well as Outstanding Paper Awards at EMNLP 2023 and the AIA Workshop at COLM 2025, and has been featured in The New York Times, Nature Outlook, IEEE Spectrum, Wired, and other major media outlets. She co-organizes workshops including MP2 (NeurIPS 2023) and SoLaR (NeurIPS 2024; COLM 2025), and co-led the Guardrails and Security for LLMs tutorial at ACL 2025.
Personal
I deeply value mentorship and am profoundly grateful to the mentors who have shaped and supported my research journey (in alphabetical order): Chandra Bhagavatula, Antoine Bosselut, Yejin Choi, Oren Etzioni, Erick Galinkin, Jena D. Hwang, Natasha Jaques, James Landay, Ronan Le Bras, Christopher Parisien, Sherry Ruan, Maarten Sap, and Yulia Tsvetkov.
I firmly believe that everyone has the potential to achieve anything they set their mind to. Keep going and try again.
Your path is uniquely yours. Follow what ignites you. Every twist, every turn, every unexpected direction is exactly where you need to be.
Two cats, an orange tabby named Loopy and an orange British Shorthair named Loafy, adopted me as their owner.