Bayes-Adaptive RL for LLM Reasoning

Code for the paper "Beyond Markovian: Reflective Exploration via Bayes-Adaptive RL for LLM Reasoning."

Authors: Shenao Zhang¹, Yaqing Wang², Yinxiao Liu², Tianqi Liu², Peter Grabowski³, Eugene Ie³, Zhaoran Wang¹, Yunxuan Li³.

¹Northwestern University, ²Google DeepMind, ³Google.

We introduce a principled RL framework for stitching together plausible strategies, analogous to linearized best-of-N reasoning, but with explicit step-level guidance on when and how LLMs should reflectively explore.
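To make the Bayes-adaptive intuition concrete, here is a minimal, generic sketch. It is not code from this repository and not the paper's training objective. It assumes a small discrete set of candidate strategies (hypotheses), a prior belief over them, and a per-strategy likelihood of the evidence observed so far. The belief is updated with Bayes' rule, a value is scored as the posterior-weighted average of per-strategy values, and a simple threshold rule signals when to switch strategies. The function names (`update_posterior`, `bayes_adaptive_value`, `should_switch`) and the threshold rule are hypothetical illustrations. In the paper, reflective switching is meant to arise from optimizing the Bayes-adaptive objective rather than from a fixed threshold.

```python
from typing import Dict


def update_posterior(prior: Dict[str, float], likelihood: Dict[str, float]) -> Dict[str, float]:
    """Bayes' rule over a discrete set of hypothetical strategies."""
    # Unnormalized posterior: prior(h) * p(evidence | h); unknown hypotheses get likelihood 0.
    unnormalized = {h: prior[h] * likelihood.get(h, 0.0) for h in prior}
    z = sum(unnormalized.values())
    if z == 0.0:
        raise ValueError("evidence has zero probability under every hypothesis")
    return {h: w / z for h, w in unnormalized.items()}


def bayes_adaptive_value(posterior: Dict[str, float], values: Dict[str, float]) -> float:
    """Posterior-weighted average of per-hypothesis values."""
    return sum(posterior[h] * values[h] for h in posterior)


def should_switch(posterior: Dict[str, float], current: str, threshold: float = 0.5) -> bool:
    """Illustrative heuristic (an assumption): abandon the current strategy
    once its posterior belief falls below the threshold."""
    return posterior[current] < threshold


if __name__ == "__main__":
    prior = {"A": 0.5, "B": 0.5}          # equal initial belief in two strategies
    likelihood = {"A": 0.2, "B": 0.8}     # evidence favors strategy B
    post = update_posterior(prior, likelihood)
    print(post)                                            # roughly {'A': 0.2, 'B': 0.8}
    print(bayes_adaptive_value(post, {"A": 1.0, "B": 0.0}))
    print(should_switch(post, current="A"))                # True: belief in A dropped
```

The point of the sketch is the contrast with a Markovian policy. The decision to keep or abandon a strategy depends on the accumulated belief, not only on the current state.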

Installation

pip install -e .

Run the Code

bash train_barl.sh

Citation

@article{zhang2025beyond,
  title={Beyond Markovian: Reflective Exploration via Bayes-Adaptive RL for LLM Reasoning},
  author={Zhang, Shenao and Wang, Yaqing and Liu, Yinxiao and Liu, Tianqi and Grabowski, Peter and Ie, Eugene and Wang, Zhaoran and Li, Yunxuan},
  journal={arXiv preprint arXiv:2505.20561},
  year={2025}
}

Acknowledgement

This repository is built upon the OpenRLHF framework. We thank the authors for their great work.
