About Me

I am a Research Scientist on the Alignment team at FAIR, Meta. I work with Dr. Jason Weston on reasoning, memory, and alignment of large language models. In particular, I research self-improving and co-improving LLMs by building robust reward models and scalable RL recipes.

Previously, I obtained a PhD in Computer Science from the University of North Carolina at Chapel Hill, where I was advised by Prof. Mohit Bansal. My PhD was supported by a Google PhD Fellowship and a Rebecca and Munroe Cobey Fellowship. A list of my publications (grouped by topic) can be found here.

Recent News