READRetro: Natural Product Biosynthesis Planning with Retrieval-Augmented Dual-View Retrosynthesis

This is the official code repository for the paper READRetro: Natural Product Biosynthesis Planning with Retrieval-Augmented Dual-View Retrosynthesis (bioRxiv, 2023).
We also provide a web version for ease of use.

Data

Download the READRetro_data folder from Zenodo. The code and demonstrations in this repository need it to run properly.

The directory structure of READRetro_data is as follows:

READRetro_data
    ├── data.sh
    ├── data
    │   ├── model_train_data
    │   └── multistep_data
    ├── model
    │   ├── bionavi
    │   ├── g2s
    │   │   └── saved_models
    │   ├── megan
    │   └── retroformer
    │       └── saved_models
    ├── result
    └── scripts

Place READRetro_data into the READRetro directory (i.e., READRetro/READRetro_data) and run sh data.sh in READRetro_data to set up the data.

Ensure the data is correctly located in READRetro. Verify the following:

READRetro/retroformer/saved_models should match READRetro_data/model/retroformer/saved_models.
READRetro/g2s/saved_models should match READRetro_data/model/g2s/saved_models.
READRetro/data should match READRetro_data/data/multistep_data.
READRetro/result should match READRetro_data/result.
READRetro/scripts should match READRetro_data/scripts.
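
The checks above can be scripted. The following is a minimal sketch, assuming it is run from the READRetro directory and that a shallow, name-level comparison is sufficient. The names check_layout and EXPECTED_PAIRS are illustrative and not part of the repository. Because this README does not say whether data.sh copies or moves the folders, the sketch only compares contents when the source folder still exists.

```python
import filecmp
from pathlib import Path

# (path inside READRetro, path inside READRetro_data), taken from the list above.
EXPECTED_PAIRS = [
    ("retroformer/saved_models", "READRetro_data/model/retroformer/saved_models"),
    ("g2s/saved_models", "READRetro_data/model/g2s/saved_models"),
    ("data", "READRetro_data/data/multistep_data"),
    ("result", "READRetro_data/result"),
    ("scripts", "READRetro_data/scripts"),
]


def check_layout(repo_root="."):
    """Return a list of problems; an empty list means no issue was found."""
    root = Path(repo_root)
    problems = []
    for target, source in EXPECTED_PAIRS:
        target_dir, source_dir = root / target, root / source
        if not target_dir.is_dir():
            problems.append(f"missing directory: {target}")
            continue
        if not source_dir.is_dir():
            # Assumption: the source may be gone if data.sh moved it.
            continue
        # Shallow check: top-level entries present in the source but not the target.
        comparison = filecmp.dircmp(source_dir, target_dir)
        for name in comparison.left_only:
            problems.append(f"{target} lacks {name} (present in {source})")
    return problems


if __name__ == "__main__":
    issues = check_layout()
    print("\n".join(issues) if issues else "Data layout looks consistent.")
```

The comparison does not inspect file contents or nested folders, so a clean report only confirms that the expected top-level entries are in place.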

The directories READRetro_data/model/bionavi, READRetro_data/model/megan, and READRetro_data/data/model_train_data are required for reproducing the values in the manuscript.

Installation

Run the following commands to install the dependencies:

conda create -n readretro python=3.8
conda activate readretro
conda install pytorch==1.12.0 cudatoolkit=11.3 -c pytorch
pip install easydict pandas tqdm numpy==1.22 OpenNMT-py==2.3.0 networkx==2.5
conda install -c conda-forge rdkit=2019.09

Alternatively, you can install the readretro package through pip:

conda create -n readretro python=3.8 -y
conda activate readretro
pip install readretro==1.2.0
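
After a manual install, it can help to confirm that the pip-pinned packages resolved to the expected versions. The sketch below uses only the standard library (importlib.metadata, available from Python 3.8). PINNED, version_matches, and check_pins are illustrative names, and the check is deliberately loose: it compares only as many version components as the pin specifies. The conda-installed packages (pytorch, rdkit) are left out, on the assumption that their metadata may not be visible to pip.

```python
from importlib import metadata

# Versions pinned in the pip install command above.
PINNED = {"numpy": "1.22", "OpenNMT-py": "2.3.0", "networkx": "2.5"}


def version_matches(installed, pinned):
    """True if `installed` starts with the components of `pinned` (1.22.0 matches 1.22)."""
    pinned_parts = pinned.split(".")
    return installed.split(".")[: len(pinned_parts)] == pinned_parts


def check_pins(pins=PINNED):
    """Map each package name to 'ok', 'not installed', or a mismatch message."""
    report = {}
    for name, pinned in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "not installed"
            continue
        if version_matches(installed, pinned):
            report[name] = "ok"
        else:
            report[name] = f"found {installed}, expected {pinned}"
    return report


if __name__ == "__main__":
    for package, status in check_pins().items():
        print(f"{package}: {status}")
```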

Model Preparation

We provide the trained models through Zenodo.
You can also use your own models trained with the official code (https://github.com/coleygroup/Graph2SMILES and https://github.com/yuewan2/Retroformer).
More detailed instructions can be found in demo.ipynb.

Single-step Planning and Evaluation

Run the following commands to evaluate the single-step performance of the models:

CUDA_VISIBLE_DEVICES=${gpu_id} python eval_single.py                    # ensemble
CUDA_VISIBLE_DEVICES=${gpu_id} python eval_single.py -m retroformer     # Retroformer
CUDA_VISIBLE_DEVICES=${gpu_id} python eval_single.py -m g2s -s 200      # Graph2SMILES

Multi-step Planning

Run the following command to plan pathways for multiple products using multiprocessing:

CUDA_VISIBLE_DEVICES=${gpu_id} python run_mp.py
# e.g., CUDA_VISIBLE_DEVICES=0 python run_mp.py

You can modify other hyperparameters described in run_mp.py.
Lower num_threads if you run out of GPU capacity.

Run the following command to plan the retrosynthesis path of your own molecule:

CUDA_VISIBLE_DEVICES=${gpu_id} python run.py ${product}
# e.g., CUDA_VISIBLE_DEVICES=0 python run.py 'O=C1C=C2C=CC(O)CC2O1'
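
SMILES strings contain characters such as parentheses, = and #, which the shell interprets unless the string is quoted, as in the example above. When calling run.py from Python, passing the product as a separate list element avoids shell quoting altogether. The sketch below assumes only the invocation shown above (run.py with the product as its single positional argument); build_command, build_env, and plan are hypothetical helper names.

```python
import os
import subprocess
import sys


def build_command(product, script="run.py"):
    """Argument list equivalent to `python run.py '<product>'`, with no shell involved."""
    return [sys.executable, script, product]


def build_env(gpu_id):
    """Copy of the current environment with CUDA_VISIBLE_DEVICES set to gpu_id."""
    env = dict(os.environ)
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_id)
    return env


def plan(product, gpu_id=0):
    """Run run.py for one product; must be called from the READRetro directory."""
    return subprocess.run(build_command(product), env=build_env(gpu_id), check=False)


if __name__ == "__main__":
    # Print the command for the example molecule from this README without running it.
    print(build_command("O=C1C=C2C=CC(O)CC2O1"))
```

Calling plan in a loop over a list of SMILES strings runs the molecules one after another. For parallel planning, run_mp.py remains the documented route.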

Using the command from pip

run_readretro -rc ${retroformer_ckpt} -gc ${g2s_ckpt} ${product}
# e.g., run_readretro -rc retroformer/saved_models/biochem.pt -gc g2s/saved_models/biochem.pt 'O=C1C=C2C=CC(O)CC2O1'
# You can replace the checkpoints with your own trained Retroformer and Graph2SMILES checkpoints.
# If you do, also pass the corresponding vocab file as an option.

You can modify other hyperparameters described in run.py.

Multi-step Evaluation

Run the following command to evaluate the planned paths of the test molecules:

python eval.py ${save_file}
# e.g., python eval.py result/debug.txt

Demo

You can reproduce the figures and tables presented in the paper, or train your own models, using the provided demo.ipynb.
