This is the official code repository for the paper READRetro: Natural Product Biosynthesis Planning with Retrieval-Augmented Dual-View Retrosynthesis (bioRxiv, 2023).
We also provide a web version for ease of use.
Download the necessary data folder READRetro_data from Zenodo to ensure proper execution of the code and demonstrations in this repository.
The directory structure of READRetro_data is as follows:
READRetro_data
├── data.sh
├── data
│   ├── model_train_data
│   └── multistep_data
├── model
│   ├── bionavi
│   ├── g2s
│   │   └── saved_models
│   ├── megan
│   └── retroformer
│       └── saved_models
├── result
└── scripts
Place READRetro_data into the READRetro directory (i.e., READRetro/READRetro_data) and run sh data.sh in READRetro_data to set up the data.
Ensure the data is correctly located in READRetro. Verify the following:
READRetro/retroformer/saved_models should match READRetro_data/model/retroformer/saved_models.
READRetro/g2s/saved_models should match READRetro_data/model/g2s/saved_models.
READRetro/data should match READRetro_data/data/multistep_data.
READRetro/result should match READRetro_data/result.
READRetro/scripts should match READRetro_data/scripts.
The directories READRetro_data/model/bionavi, READRetro_data/model/megan, and READRetro_data/data/model_train_data are required for reproducing the values in the manuscript.
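The directory checks listed above can be scripted. The following is a minimal sketch, not part of the repository: it assumes it is run from the READRetro root, that data.sh leaves the original READRetro_data folders in place (copy or link rather than move), and it compares file names only, not file contents. The names `EXPECTED_PAIRS`, `relative_files`, and `check_layout` are hypothetical.

```python
from pathlib import Path

# (placed directory, source directory inside READRetro_data),
# both relative to the READRetro root. Pairs taken from the list above.
EXPECTED_PAIRS = [
    ("retroformer/saved_models", "READRetro_data/model/retroformer/saved_models"),
    ("g2s/saved_models", "READRetro_data/model/g2s/saved_models"),
    ("data", "READRetro_data/data/multistep_data"),
    ("result", "READRetro_data/result"),
    ("scripts", "READRetro_data/scripts"),
]


def relative_files(root: Path) -> set:
    """Return the set of file paths under root, relative to root."""
    return {p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()}


def check_layout(repo_root: str = ".") -> dict:
    """Map each placed directory to the source files that are missing from it.

    An empty list means every file name of the source directory was found;
    None means one of the two directories does not exist.
    """
    root = Path(repo_root)
    report = {}
    for placed, source in EXPECTED_PAIRS:
        placed_dir, source_dir = root / placed, root / source
        if not placed_dir.is_dir() or not source_dir.is_dir():
            report[placed] = None
            continue
        missing = relative_files(source_dir) - relative_files(placed_dir)
        report[placed] = sorted(missing)
    return report


if __name__ == "__main__":
    for placed, missing in check_layout(".").items():
        if missing is None:
            status = "directory not found"
        elif missing:
            status = f"missing {len(missing)} file(s)"
        else:
            status = "OK"
        print(f"{placed}: {status}")
```

The check is one-directional on purpose: extra files in the placed directories (for example, results you generate later) are not reported as errors.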
Run the following commands to install the dependencies:
conda create -n readretro python=3.8
conda activate readretro
conda install pytorch==1.12.0 cudatoolkit=11.3 -c pytorch
pip install easydict pandas tqdm numpy==1.22 OpenNMT-py==2.3.0 networkx==2.5
conda install -c conda-forge rdkit=2019.09
Alternatively, you can install the readretro package through pip:
conda create -n readretro python=3.8 -y
conda activate readretro
pip install readretro==1.2.0
We provide the trained models through Zenodo.
You can also use your own models trained with the official code (https://github.com/coleygroup/Graph2SMILES and https://github.com/yuewan2/Retroformer).
More detailed instructions can be found in demo.ipynb.
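Before running the evaluation commands, it can help to confirm that the pip-pinned dependencies from the installation step are present in the active environment. The sketch below is an illustration only: `PINNED` and `check_pins` are hypothetical names, and the conda-installed packages (pytorch, rdkit) are left out because their distribution metadata may be registered under different names.

```python
from importlib import metadata

# Version pins copied from the pip install command in the installation step.
PINNED = {
    "numpy": "1.22",
    "OpenNMT-py": "2.3.0",
    "networkx": "2.5",
}


def check_pins(pins: dict) -> dict:
    """Return {package: (expected, installed_or_None)} for every mismatch."""
    problems = {}
    for name, expected in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems[name] = (expected, None)
            continue
        # Accept an exact match or a more specific release, e.g. "1.22" vs "1.22.4".
        if installed != expected and not installed.startswith(expected + "."):
            problems[name] = (expected, installed)
    return problems


if __name__ == "__main__":
    issues = check_pins(PINNED)
    if not issues:
        print("All pinned packages match.")
    for name, (expected, installed) in issues.items():
        print(f"{name}: expected {expected}, found {installed or 'not installed'}")
```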
Run the following commands to evaluate the single-step performance of the models:
CUDA_VISIBLE_DEVICES=${gpu_id} python eval_single.py # ensemble
CUDA_VISIBLE_DEVICES=${gpu_id} python eval_single.py -m retroformer # Retroformer
CUDA_VISIBLE_DEVICES=${gpu_id} python eval_single.py -m g2s -s 200 # Graph2SMILES
Run the following command to plan paths of multiple products using multiprocessing:
CUDA_VISIBLE_DEVICES=${gpu_id} python run_mp.py
# e.g., CUDA_VISIBLE_DEVICES=0 python run_mp.py
You can modify other hyperparameters described in run_mp.py.
Lower num_threads if you run out of GPU memory.
Run the following command to plan the retrosynthesis path of your own molecule:
CUDA_VISIBLE_DEVICES=${gpu_id} python run.py ${product}
# e.g., CUDA_VISIBLE_DEVICES=0 python run.py 'O=C1C=C2C=CC(O)CC2O1'
If you installed the readretro package through pip, run:
run_readretro -rc ${retroformer_ckpt} -gc ${g2s_ckpt} ${product}
# e.g., run_readretro -rc retroformer/saved_models/biochem.pt -gc g2s/saved_models/biochem.pt 'O=C1C=C2C=CC(O)CC2O1'
# you can replace the checkpoints with your own trained checkpoints of retroformer and g2s
# you should set the corresponding vocab file as an option if you replace the checkpoints
You can modify other hyperparameters described in run.py.
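To plan paths for a handful of molecules one at a time, the `python run.py ${product}` command above can be wrapped in a small driver. This is a sketch under stated assumptions: it is run from the READRetro root, `build_run` and `plan_all` are hypothetical helper names, and the SMILES strings other than the one from the example above are placeholders. It uses only the positional `${product}` argument and the `CUDA_VISIBLE_DEVICES` variable shown in this README.

```python
import os
import subprocess
import sys


def build_run(product: str, gpu_id="0"):
    """Return (argv, env) mirroring: CUDA_VISIBLE_DEVICES=<gpu_id> python run.py <product>."""
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(gpu_id))
    argv = [sys.executable, "run.py", product]
    return argv, env


def plan_all(products, gpu_id="0", dry_run=True):
    """Invoke run.py once per product; with dry_run=True only return the argument lists."""
    planned = []
    for product in products:
        argv, env = build_run(product, gpu_id)
        planned.append(argv)
        if not dry_run:
            # check=True raises CalledProcessError if run.py exits with an error.
            subprocess.run(argv, env=env, check=True)
    return planned


if __name__ == "__main__":
    # The first SMILES is the example from this README; the second is a placeholder.
    for argv in plan_all(["O=C1C=C2C=CC(O)CC2O1", "CCO"], gpu_id=0, dry_run=True):
        print(" ".join(argv))
```

Passing the arguments as a list rather than a shell string means SMILES containing parentheses or `=` need no extra quoting. Set `dry_run=False` to execute the commands.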
Run the following command to evaluate the planned paths of the test molecules:
python eval.py ${save_file}
# e.g., python eval.py result/debug.txt
You can reproduce the figures and tables presented in the paper or train your own models by utilizing the provided demo.ipynb.