sabijun/MT-RewardTree


Overview of MT-RewardTree

Features

  • We use Monte Carlo Tree Search (MCTS) to generate token-level translation preference pairs (Prefixed data) for both model training and testing. For contrast, we also use conventional approaches to generate sequence-level translation preference pairs (Arbitrary data). Both datasets are split into training and test subsets for evaluation.
  • To assess the effectiveness of our custom dataset, we use Prefixed data and Arbitrary data to train our Implicit Process Reward Model. Our models are available on Hugging Face.
  • Finally, we deploy our Implicit Process Reward Model in both Test-time Alignment and Hypothesis Ensembling frameworks, demonstrating significant performance improvements across evaluation metrics.
  • We also provide code for generating token-level Prefixed translation preference pairs and for using our Process RM to obtain sequence-level results. The code is in the code directory.
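
The last two points, using a process reward model to obtain sequence-level results and to choose among candidate translations, can be illustrated with a minimal sketch. It is not the code from the repository. It assumes that the implicit token-level reward is the DPO-style scaled log-probability ratio between a trained policy and a reference model, that the sequence-level score is the sum of token rewards, and that ensembling reduces to picking the highest-scoring hypothesis. The names PrefixedPair, implicit_token_rewards, sequence_score, and pick_best are hypothetical, as is the beta parameter.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class PrefixedPair:
    """Hypothetical record for one token-level preference pair.

    Both continuations follow the same translation prefix, which is what
    distinguishes Prefixed data from sequence-level (Arbitrary) pairs.
    """
    source: str
    prefix: List[str]    # tokens shared by both candidates
    chosen: List[str]    # preferred continuation after the prefix
    rejected: List[str]  # dispreferred continuation after the prefix


def implicit_token_rewards(
    policy_logprobs: Sequence[float],
    ref_logprobs: Sequence[float],
    beta: float = 1.0,
) -> List[float]:
    """Assumed reward form: beta * (log p_policy - log p_ref) for each token."""
    if len(policy_logprobs) != len(ref_logprobs):
        raise ValueError("log-probability sequences must have equal length")
    return [beta * (p - r) for p, r in zip(policy_logprobs, ref_logprobs)]


def sequence_score(token_rewards: Sequence[float]) -> float:
    """Aggregate token-level rewards into one sequence-level score (assumed: sum)."""
    return sum(token_rewards)


def pick_best(hypotheses: Sequence[str], scores: Sequence[float]) -> str:
    """Return the hypothesis with the highest sequence-level score."""
    if not hypotheses or len(hypotheses) != len(scores):
        raise ValueError("need one score per hypothesis and at least one hypothesis")
    best_index = max(range(len(hypotheses)), key=lambda i: scores[i])
    return hypotheses[best_index]


# Toy per-token log-probabilities; in practice these would come from the
# trained model and its reference model.
hyps = ["translation A", "translation B"]
scores = [
    sequence_score(implicit_token_rewards([-0.5, -1.0], [-1.0, -1.5])),
    sequence_score(implicit_token_rewards([-2.0, -1.0], [-1.0, -1.0])),
]
best = pick_best(hyps, scores)
```

Summation is only one possible aggregation; averaging over tokens would reduce the bias toward longer hypotheses, and the repository's own scripts may aggregate differently.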

Citation

@article{feng2025mtrewardtree,
  title={MT-RewardTree: A Comprehensive Framework for Advancing LLM-Based Machine Translation via Reward Modeling},
  author={Feng, Zhaopeng and Ren, Jiahan and Su, Jiayuan and Zheng, Jiamei and Tang, Zhihang and Wang, Hongwei and Liu, Zuozhu},
  journal={arXiv preprint arXiv:2503.12123},
  year={2025}
}
