This is the codebase for the paper "Blending Supervised and Reinforcement Fine-Tuning with Prefix Sampling", in which we propose a hybrid approach to LLM post-training.
The codebase is based on veRL. All Prefix-RFT-related code is in recipe/prefix_rft.
@article{huang2025blending,
title={Blending Supervised and Reinforcement Fine-Tuning with Prefix Sampling},
author={Huang, Zeyu and Cheng, Tianhao and Qiu, Zihan and Wang, Zili and Xu, Yinghui and Ponti, Edoardo M and Titov, Ivan},
journal={arXiv preprint arXiv:2507.01679},
year={2025}
}