Photo by Rod Searcey
Hi all!
I’m a Research Fellow at the Hoover Institution’s Technology Policy Accelerator at Stanford University, and I collaborate with the Stanford Intelligence Systems Laboratory and the Stanford Center for AI Safety.
My research focuses on the security and safety of language models through mechanistic interpretability, reward modeling, and robust evaluation. I develop model-internal interventions for failure modes such as backdoors and reward-proxy shortcuts, as well as methods for learning from expert judgment, with an emphasis on statistically grounded measurement under uncertainty and distribution shift. My past research includes probabilistic and hierarchical modeling of noisy human feedback, often in collaboration with domain experts in high-stakes settings such as mental health and national security. My work produces tools and evaluations that support development, deployment, and governance decisions, and it has been published in leading AI venues (ICLR, CoLM, NeurIPS, FAccT, and AIES) as well as in Nature. I have also developed and taught multiple courses at Stanford University on AI safety, the societal impact of AI, and emerging technologies.
My technical research has also directly informed AI governance efforts through policy publications (e.g., in Foreign Affairs and the Carnegie Council for Ethics in International Affairs), bipartisan briefings and direct engagement with policymakers (including local, state, and federal government officials, military staff, and international diplomats), and written feedback on AI legislation.
Prior to my current appointment, I was a postdoctoral fellow at the Stanford Center for AI Safety, the Center for International Security and Cooperation, and the Stanford Existential Risks Initiative at Stanford University, advised by Prof. Clark Barrett, Prof. Steve Luby, and Prof. Paul Edwards. I received my Ph.D. in August 2023 from the School of Natural Sciences at the Technical University of Munich, and I hold a B.Sc. and an M.Sc. in Physics from the Ruprecht Karl University of Heidelberg.
Curriculum Vitae (Updated 03/03/2026)