RL-Guider: Leveraging Historical Decisions and Feedback for Drug Editing with Large Language Models
Xufeng Liu* , Yixuan Ding*† , Jingxiang Qu, Yichi Zhang, Wenhan Gao‡ , Yi Liu‡
* Equal contribution.
† Work done during an internship at Stony Brook University.
‡ Equal senior contribution.
Findings of ACL 2025
Introduction: Recent advances in large language models (LLMs) across diverse domains highlight their potential to transform scientific discovery, including drug editing. Traditional drug editing relies on iterative conversations with domain experts, refining a molecule until the desired property is achieved. This interactive process mirrors the strengths of LLMs. However, existing approaches edit each molecule independently without leveraging knowledge from past edits.
Human experts develop intuition about effective modifications over time by learning from historical experience. Accumulating past knowledge is pivotal for both humans and LLMs. In this work, we propose RL-Guider — a reinforcement-learning agent that suggests edits to LLMs and improves over time by learning from evaluation feedback on past results.
RL-Guider is the first framework to combine the comprehensive "world-level" knowledge of LLMs with knowledge accumulated from historical feedback. As a result, RL-Guider mitigates shortcomings of existing approaches and achieves superior performance.
All dependencies are listed in requirements.txt. Install them with:
```shell
pip install -r requirements.txt
```

Run an example to perform drug editing (small molecule) with LLaMA:

```shell
python run_ChatDrug.py --task_id=101 --C=0 --constraint='loose' --conversational_LLM='llama' --conversation_type='single'
```

Arguments:

- `--task_id`: The task identifier.
- `--C`: Constraint strength.
- `--constraint`: Editing constraint type (e.g., `loose`, `strict`).
- `--conversational_LLM`: Choice of LLM (e.g., `llama`).
- `--conversation_type`: Mode of conversation (e.g., `single`, `multi`).
Follow these steps to use RL-Guider:
- Run the scripts in `gather_buffer/` to interact with LLMs and collect data.
- Run the scripts in `process_buffer/` to preprocess the collected data.
- Train RL models using the scripts in the `rl_train/` folder.
- Run `run_planner_tree.py` to perform drug editing with RL-Guider.
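The steps above can be sketched as a single command sequence. The per-step script names below (everything except `run_planner_tree.py`) are hypothetical placeholders; substitute the actual script names found in each folder of this repository.

```shell
# Hypothetical end-to-end pipeline; only run_planner_tree.py is a
# confirmed entry point, the other script names are placeholders.

# 1. Interact with the LLM and collect a buffer of edit decisions
python gather_buffer/gather.py       # placeholder script name

# 2. Preprocess the collected buffer into RL training data
python process_buffer/process.py     # placeholder script name

# 3. Train the RL agent on the processed buffer
python rl_train/train.py             # placeholder script name

# 4. Perform drug editing with the trained RL-Guider
python run_planner_tree.py
```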
Released under the MIT License. See LICENSE.
If you find this work useful, please cite:
```bibtex
@inproceedings{liu2025rl,
  title={RL-Guider: Leveraging Historical Decisions and Feedback for Drug Editing with Large Language Models},
  author={Liu, Xufeng and Ding, Yixuan and Qu, Jingxiang and Zhang, Yichi and Gao, Wenhan and Liu, Yi},
  booktitle={Findings of the Association for Computational Linguistics: ACL 2025},
  pages={13121--13138},
  year={2025}
}
```