About Me

Hello! I’m Sapana, an Applied Scientist at Amazon, focused on reinforcement learning (RL) post-training.

In December 2025, we launched Reinforcement Fine-Tuning (RFT) as a service—details here. This release brings powerful, serverless model customization to developers, and I’m proud to have worked with an incredible team to enable customization for open-source models in Amazon SageMaker AI. You can now fine-tune leading models—including Qwen, Meta’s Llama, DeepSeek, and OpenAI’s gpt-oss—using advanced techniques such as: RLAIF (Reinforcement Learning from AI Feedback), RLVR (Reinforcement Learning with Verifiable Rewards), SFT (Supervised Fine-Tuning), and DPO (Direct Preference Optimization). These capabilities make it easier to build higher-quality, more controllable models tailored to your use case—without the overhead of managing infrastructure.

Earlier, I received my doctorate from Texas A&M University (TAMU), working with Dr. Dileep Kalathil. At TAMU, I focused on making AI systems safe by including various notions of safety in online learning, RL, and reinforcement learning from human feedback (RLHF). Previously, I was a research fellow at MPI-SWS, Germany, with Dr. Adish Singla. I also did an MS (Research) at IIT Madras with Dr. Balaraman Ravindran and Dr. Radha Krishna Ganti.

Aside from work, I like to hike, cook, paint, and photograph.

News

[Mar 2025] Paper on reasoning distillation out on arXiv!
[Jan 2025] AgentOccam accepted to ICLR 2025!
[Sep 2024] Paper on Risk Averse RLHF accepted to NeurIPS 2024!
[Sep 2024] Paper on Pedagogical Alignment of LLMs accepted to EMNLP 2024!
[Aug 2024] Joined Amazon Web Services (AWS) as an Applied Scientist!
[Aug 2023] Paper on Safe distributed OCO accepted to TMLR!
[May 2023] Back in Seattle for an Applied Scientist internship at Amazon!
[Apr 2023] Accepted to IJCAI 2023 Doctoral Consortium!
[Feb 2023] New paper on Safe distributed OCO out on arXiv!
[Feb 2023] Gave an invited talk on ‘Adaptivity and safety in sequential decision making’ at Rice University!
[Sep 2022] Paper on meta-RL in sparse reward environments accepted to NeurIPS 2022!
[Aug 2022] Spent a wonderful summer in Seattle as an Applied Scientist intern at Amazon!
[Dec 2021] Paper on Safe online convex optimization accepted to AAAI 2022!

Sapana Chaudhary
