Ruixin (Ray) Yang
杨瑞欣
I am an MSCS student at Georgia Tech, working with Prof. Alan Ritter. I received my BSc in Computer Science and Statistics from the University of British Columbia in beautiful Vancouver, Canada.
My research aims to understand and improve foundation models, making them more robust and trustworthy and enabling them to collaborate more effectively with humans. Specifically, I am interested in:
(1) Evaluation and oversight: How can we rigorously evaluate models or agents to quantify both emergent capabilities and reliability risks? How can we systematically monitor and audit model behaviors in user-facing, interactive, or open-ended environments?
(2) Alignment through collaboration: How can we mitigate reliability risks by leveraging diverse human experiences and insights distilled from simulated or real-world interaction dynamics? Can such interaction-centric approaches also enhance personalization and diversity to support effective Human-AI collaboration?
(3) Understanding the principles of reasoning and their connection to reliability: How do models reason, and how can we make their reasoning more reliable? Can we better understand the underlying mechanisms that lead to reliability failures by analyzing data influence, training dynamics, or internal representations?
During Summer 2025, I was a Research Engineer Intern at the Center for AI Safety. Previously, I was a research assistant at Dartmouth College where I had the chance to work with Dr. Ruibo Liu and Prof. Soroush Vosoughi on Value Alignment for LLMs.
Email / Github / Google Scholar / Linkedin / Twitter
Do Vision-Language Models Respect Contextual Integrity in Location Disclosure?
Ruixin Yang, Ethan Mendes, Arthur Wang, James Hays, Sauvik Das, Wei Xu, Alan Ritter
In submission
A benchmark and set of analyses for evaluating whether vision-language models respect contextual integrity in location disclosure for image geolocation. We find that violations of contextual norms lead to contextual harm: over-disclosure of sensitive locations, poor privacy-utility tradeoffs, and misalignment with human privacy expectations.
Confidence Calibration and Rationalization for LLMs via Multi-Agent Deliberation
Ruixin Yang, Dheeraj Rajagopal, Shirley Anugrah Hayati, Bin Hu, Dongyeop Kang
ICLR 2024 Workshop on Reliable and Responsible Foundation Models
OpenReview / arXiv / code
We propose Collaborative Calibration, a multi-agent deliberation approach to elicit, calibrate, and rationalize the prediction confidence of LLMs.
Training Socially Aligned Language Models on Simulated Social Interactions
Ruibo Liu, Ruixin Yang, Chenyan Jia, Ge Zhang, Diyi Yang, Soroush Vosoughi
ICLR 2024
OpenReview / arXiv / code & data
Alignment training on data from multi-LLM simulated social interactions, as an efficient, effective, and stable alternative to RLHF.
Visual Analytics for Generative Transformer Models
*Raymond Li, *Ruixin Yang, Wen Xiao, Ahmed AbuRa'ed, Gabriel Murray, Giuseppe Carenini
paper / arXiv / code & data
We present a visual analytics framework to support the analysis of transformer-based generative models.
Generalizing Morphological Inflection Systems to Unseen Lemmas
*Changbing Yang, *Ruixin Yang, Garrett Nicolai, Miikka Silfverberg
SIGMORPHON 2022
paper
Competed in Shared Task 0: Generalization and Typologically Diverse Morphological Inflection, achieving the highest performance among all submissions in both the small and large training conditions.
I come from Nanjing, a beautiful and historic city that served as the capital of six ancient Chinese dynasties over the past two thousand years.
I like listening to rock 'n' roll, ranging from progressive rock to Britpop and pop rock.
I've also been known to (awkwardly) hoop, smash, and stroke. (Style borrowed here from Prof. Schmidt)