Hi, I am a Ph.D. student at Harvard-MGB AIM, jointly with Maastricht University, under the guidance of Hugo Aerts, Ph.D. and Danielle S. Bitterman, M.D. My work is supported by the 2024 Google PhD Fellowship in Natural Language Processing. I am also affiliated with the Boston Children's Hospital Computational Health Informatics Program (CHIP), where we have the privilege of collaborating closely with Guergana Savova, Ph.D. and Tim Miller, Ph.D.
My research bridges technical rigor and clinical responsibility, balancing the drive to enhance AI capabilities with the need to mitigate risks. I probe the foundational limitations of LLMs—uncovering vulnerabilities like data bias and hallucinations—while creating scalable benchmarks to measure real progress. Simultaneously, I focus on last-mile alignment and robust safety evaluations for high-stakes tasks, aiming to transform opaque models into transparent, reliable instruments that improve patient care and trust.
My research has been featured in major media outlets such as Bloomberg, The New York Times, NBC, and New Scientist, among others. It has also been highlighted by government agencies including the FDA, NCI, and NIH, and has been cited in U.S. congressional hearings.
During COVID-19, I completed my M.S. in Computational Linguistics at Brandeis University, where I was fortunate to be advised by Professor Nianwen Xue, Ph.D. There I fully explored my interests and met many wonderful people and friends. Before Brandeis, I spent four years as an undergraduate studying Math, Japanese, and Linguistics at St. Olaf College, enjoying the snow!
In my free time, I enjoy basketball, dragon boat racing, and kyudo 🏹.
If you want to work with me or my group, please email [email protected] instead!
News
- [11/11/2025] Our work "When helpfulness backfires: LLMs and the risk of false medical information due to sycophantic behavior" was highlighted in The New York Times and Nature, and was selected for an oral presentation at AMIA 2025!
- [01/04/2025] Our paper on using LLMs to identify social determinants of health in electronic health records was the most cited journal-wide (Nature/NPJ Digital Medicine) in 2024! This paper was also selected for the AI and Data Science Year in Review 2024 at AMIA!
- [11/11/2024] Heading to EMNLP, and I wrote a blog post on what we learned this year about AI4healthcare.
- [10/10/2024] Our paper "WorldMedQA-V: a multilingual, multimodal medical examination dataset for multimodal language models evaluation" is now available on arXiv.
- [09/15/2024] Honored to receive the 2024 Google PhD Fellowship in Natural Language Processing!
- [06/19/2024] RABBITS is out! We examined current biomedical benchmarks and found that language models are more familiar with generic terms!
- [05/09/2024] Cross-Care is out! The first grounded bias benchmark that analyzes how pre-training data impacts model misalignment with real-world medical concepts.
- [04/02/2024] LCD Benchmark is out! Try this long clinical documents benchmark that LLMs are bad at!
- [11/07/2023] Our SDoH paper was accepted at Nature Digital Medicine and was the front-page featured article from Jan-May 2024. It was also highlighted as research at NCI!
- [08/24/2023] Check out our work and editorial highlights @ JAMA Onc. It was also cited during a U.S. congressional hearing!
Selected Publications
(* indicates equal contribution)
Mentoring
Students and projects
- Pedro Moreria - MIT Master's thesis student, now MLE at Google
- Kuleen Sasse - Undergrad student, now PhD at JHU
- Shayan Chowdhury - Undergrad student at Columbia
- Javier Mora, MD - Harvard Medical School Residency research year 2025
- Kraig Tou - Master's student, now MLE at AWS Annapurna Labs
- Yanan (Lance) Lu - Harvard DBMI Master's thesis student, now MLE at TikTok
- Nikolaj Munch - Master's thesis student, now Chief AI Advisor at The Ministry of Foreign Affairs of Denmark
- Vikram Goddla - High school student, now at Harvard College
Honors and Service
Honors
- Google PhD Fellowship in Natural Language Processing, 2024
- CHIL Doctoral Consortium, 2024, 2025 (Oral)
- Brandeis Merit Scholarship, 2020
- JASSO Scholarship, 日本文部科学省, 2019
- National Japanese Exam Silver Prize, AATJ 全米日本語教育学会, 2019
- Henry Luce Research Grant, Henry Luce Foundation, 2018
- Pi Mu Epsilon Society, National Math honor society, 2018
Service
- Scientific Advisory Board: Mass General Brigham HPC Scientific Advisory, 2025-present
- Peer Reviewer: ACL, EMNLP, NAACL, EACL, ICLR, NeurIPS
- Program Committee: Clinical NLP Workshop 2023, 2024
- Journal Reviewer: JAMIA, JBI, JMIR, JNCI, Nature Communications, npj Digital Medicine, Nature Medicine
Invited Talks
- Shan Chen; UMich NLP Seminar - How Far are we from reliable LLMs Applications in clinical settings; 2025
- Shan Chen; CHIP Journal Club - LLMs Applications in clinical settings; 2025
- Shan Chen; UAB Annual Methods Symposium - Tutorial - LLMs Applications in clinical settings; 2025
- Shan Chen; MIT HST 953 - Towards More Robust Large Language Models Applications in Clinical Settings; 2024
- Shan Chen; City of Hope - The Role and Risks of Large Language Models in Clinical Settings; 2024
- Shan Chen; Harvard - Beacon Hill Lecture Seminars: Current Progress of AI4Healthcare; 2024