Blogs
2025
Optimizing LLM test-time compute involves solving a meta RL problem
A. Setlur, Y. Qu, M. Yang, L. Zhang, V. Smith, A. Kumar
[CMU MLD Blog] (Jan 2025)
Sharpening or Discovery, RL or Meta RL?: How RL Improves LLM Reasoning
A. Setlur, A. Kumar
[Notion Blog] (June 2025)
How to Explore to Scale RL Training of LLMs on Hard Problems?
A. Setlur, A. Kumar
[CMU MLD Blog] (Dec 2025)