Yong Zheng-Xin

CS PhD @ Brown University


I am a final-year PhD student at Brown University advised by Stephen Bach, funded by the Open Philanthropy grant for technical AI safety. I’m also an Astra Fellow working with Miles Wang (OpenAI). Previously, I was a research scientist intern at Meta AI and a research collaborator at Cohere Labs.

My current research focuses on reasoning and AI safety. I have worked on:

  • Reasoning generalization: How chain-of-thought reasoning generalizes across languages and scales with test-time compute (arXiv)
  • Safety for reasoning models: Emergent self-jailbreaking behaviors during reasoning (arXiv) and predicting refusals before models finish thinking (arXiv)
  • Safety for multilingual models: Jailbreaking vulnerabilities in low-resource languages (Best Paper, NeurIPS 2023 SoLaR), detoxification (EMNLP 2024), and finetuning attacks (NAACL 2025)

I’ve also contributed to multilingual frontier models through the Aya instruction-following model (Best Paper, ACL 2024), language adaptation techniques (ACL 2023), and making speech models robust to new accents (INTERSPEECH 2025).


Selected Publications

  1. Yik Siu Chan*, Zheng-Xin Yong*, and Stephen H. Bach
    arXiv preprint, 2025
  2. Zheng-Xin Yong, M. Farid Adilazuarda, Jonibek Mansurov, and 7 more authors
    arXiv preprint, 2025
  3. Ahmet Üstün*, Viraat Aryabumi*, Zheng-Xin Yong*, and 14 more authors
    ACL, 2024 (Best Paper Award)
  4. Zheng-Xin Yong, Cristina Menghini, and Stephen Bach
    NeurIPS Workshop: Socially Responsible Language Modelling Research (SoLaR), 2023 (Best Paper Award)