This repository contains the official implementation of "MMM: Quantum-Chemical Molecular Representation Learning for Combinatorial Drug Recommendation".
MMM consists of four principal components: (1) a module encoding patient EHR data into longitudinal visit states; (2) an ELF-based drug encoder using a pre-trained CNN to extract global molecular electronic features from 3D ELF maps; (3) a local bipartite encoder capturing patient-specific importance of drug substructures; and (4) a medication recommendation module that integrates global and local drug representations to predict safe and effective drug combinations. Together, these components fuse temporal patient data with detailed molecular drug properties to produce personalized, safe prescriptions.
Highlights
- Multimodal Fusion: Integrating ELF-based molecular image embeddings with sequential patient EHR representations
- DDI-aware Learning: Incorporating drug–drug interaction (DDI) adjacency matrices and rule-based constraints into the training loss and attention mechanisms
- Combinatorial Recommendation: Predicting multi-label drug combinations while continuously monitoring DDI risk
- Optional Explainability: Applying Grad-CAM-style visualization to ELF maps to interpret molecular feature contributions
We provide the network architecture of the proposed MMM model, along with pipeline code for training and testing the network on the EHR dataset and the drug molecular ELF images. The DDI calculations and drug information used in this work are based on the implementation in SafeDrug's repository. All experiments were conducted with Python 3.9.23, PyTorch 2.3.0+cu118, and CUDA 11.8.
To further train the model, you need to install RDKit-related tools and several packages. To avoid version conflicts among these packages, please follow the installation steps in the exact order below.
- First, create and activate a new conda environment.
conda create -c conda-forge -n new_env python=3.9
conda activate new_env
- Install RDKit
conda install -c conda-forge rdkit
- If RDKit does not work after the above installation, try:
pip install rdkit-pypi
- Install numpy, pandas, and scipy with specific versions to avoid conflicts:
pip install numpy==1.22.4 pandas==1.3.0 scipy==1.13.1
- Install PyTorch 2.3.0 with CUDA 11.8 support, together with the matching torchvision 0.18.0 build:
pip install torch==2.3.0+cu118 torchvision==0.18.0+cu118 torchaudio --extra-index-url https://download.pytorch.org/whl/cu118
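After the installation steps above, it can help to confirm that the pinned versions actually ended up in the environment. The following is a minimal sketch, not part of the repository: the helper name `check_versions` and the `EXPECTED` table are hypothetical, and the version strings are simply copied from the pip commands above. It relies only on the standard-library `importlib.metadata` module, so it runs even if a package failed to install.

```python
from importlib import metadata

# Versions copied from the installation steps above (hypothetical table name).
EXPECTED = {
    "numpy": "1.22.4",
    "pandas": "1.3.0",
    "scipy": "1.13.1",
    "torch": "2.3.0+cu118",
    "torchvision": "0.18.0+cu118",
}


def check_versions(expected):
    """Return {package: (installed_version_or_None, expected_version, matches)}."""
    report = {}
    for name, wanted in expected.items():
        try:
            found = metadata.version(name)
        except metadata.PackageNotFoundError:
            found = None  # package is not installed in this environment
        report[name] = (found, wanted, found == wanted)
    return report


if __name__ == "__main__":
    for name, (found, wanted, ok) in check_versions(EXPECTED).items():
        status = "ok" if ok else "MISMATCH"
        print(f"{name:12s} installed={found} expected={wanted} [{status}]")
```

A `MISMATCH` line does not necessarily mean the code will fail; it only flags a difference from the environment reported above.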
Data paths and hyperparameters (such as the learning rate and target_ddi) are configured in main.py. Running the preprocessing script processing.py generates the required files in the output folder; afterwards, update the paths in main.py so that they point to these generated files.
- Dataset Configuration
In main.py the paths for the following variables must be updated to correspond to the .pkl files generated within the output folder:
data_path = "[records_final.pkl]"
voc_path = "[voc_final.pkl]"
ddi_adj_path = "[ddi_A_final.pkl]"
ddi_mask_path = "[ddi_mask_H.pkl]"
molecule_path = "[cidtoSMILES.pkl]"
ddi_rate = ddi_rate_score("[ddi_A_final.pkl]")
The generated .pkl files are typically located in the data/output folder.
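Before editing main.py, it may be useful to check that preprocessing produced every file listed above. The sketch below is an illustration under stated assumptions: the helper `resolve_paths` and the `GENERATED_FILES` table are hypothetical names, the default `data/output` directory is the typical location mentioned above, and the file names are taken from the placeholders in the configuration snippet.

```python
from pathlib import Path

# Variable names and file names as listed in the configuration above.
GENERATED_FILES = {
    "data_path": "records_final.pkl",
    "voc_path": "voc_final.pkl",
    "ddi_adj_path": "ddi_A_final.pkl",
    "ddi_mask_path": "ddi_mask_H.pkl",
    "molecule_path": "cidtoSMILES.pkl",
}


def resolve_paths(output_dir="data/output"):
    """Map each variable name to its path and list the files that are missing."""
    root = Path(output_dir)
    paths = {key: root / fname for key, fname in GENERATED_FILES.items()}
    missing = [str(p) for p in paths.values() if not p.is_file()]
    return paths, missing


if __name__ == "__main__":
    paths, missing = resolve_paths()
    for key, path in paths.items():
        print(f'{key} = "{path}"')
    if missing:
        print("Missing files (run processing.py first?):", ", ".join(missing))
```

The printed assignments can be pasted into main.py once no files are reported missing.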
- External data files required for preprocessing
The following files are obtained from external sources and must be prepared in advance:
| Filename | Description |
|---|---|
| ndc2RXCUI.txt | NDC-to-RxCUI mapping file, adapted from ndc2rxnorm_mapping.csv in the GAMENet repository. |
| drug-DDI.csv | Contains drug–drug interaction (DDI) information indexed by CID. Download from Google Drive. |
| RXCUI2atc4.csv | RxCUI-to-ATC4 mapping file, adapted from ndc2atc_level4.csv in the GAMENet repository. |
- Hyperparameter Configuration
Hyperparameters can be configured in main.py. These hyperparameters are set using the argparse module, allowing default values to be specified and overridden via command-line arguments:
hyperparameters = {
"Test": [True or False],
"model_name": ["model_identifier"],
"resume_path": ["path/to/checkpoint"],
"lr": [learning_rate],
"target_ddi": [target_ddi],
"kp": [coefficient_of_P_signal],
"dim": [dimension_size],
"cuda": [cuda_device_index]
}
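The template above maps naturally onto an argparse parser. The following is a minimal sketch of how such a parser could look, not the actual code in main.py: the function name `build_parser`, the help strings, and all default values are placeholder assumptions. Only the option names come from the list above, and `--Test` is treated as a boolean flag because it is passed without a value in the run command.

```python
import argparse


def build_parser():
    """Build a parser with the options listed above (defaults are placeholders)."""
    parser = argparse.ArgumentParser(description="MMM training / evaluation options")
    parser.add_argument("--Test", action="store_true",
                        help="evaluate a checkpoint instead of training")
    parser.add_argument("--model_name", type=str, default="MMM",
                        help="model identifier")
    parser.add_argument("--resume_path", type=str, default=None,
                        help="path to a checkpoint to load")
    parser.add_argument("--lr", type=float, default=5e-4,
                        help="learning rate (placeholder default)")
    parser.add_argument("--target_ddi", type=float, default=0.06,
                        help="target DDI rate (placeholder default)")
    parser.add_argument("--kp", type=float, default=0.05,
                        help="coefficient of the P signal (placeholder default)")
    parser.add_argument("--dim", type=int, default=64,
                        help="embedding dimension (placeholder default)")
    parser.add_argument("--cuda", type=int, default=0,
                        help="CUDA device index")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(vars(args))
```

With a parser of this shape, any default can be overridden per run, for example `python main.py --lr 0.001 --dim 128`.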
- Run the Code
To train the model:
python main.py
To evaluate a saved checkpoint:
python main.py --Test --resume_path [best_epoch_path]
If you find this code useful for your work, please cite the following and consider starring this repository:
@inproceedings{kwon2025mmm,
title={{MMM}: Quantum-Chemical Molecular Representation Learning for Personalized Drug Recommendation},
author={Chongmyung Kwon and Yujin Kim and Seoeun Park and Yunji Lee and Charmgil Hong},
booktitle={PRedictive Intelligence in MEdicine},
year={2025},
organization={Springer}
}
@inproceedings{yang2021safedrug,
title = {SafeDrug: Dual Molecular Graph Encoders for Safe Drug Recommendations},
author = {Yang, Chaoqi and Xiao, Cao and Ma, Fenglong and Glass, Lucas and Sun, Jimeng},
booktitle = {Proceedings of the Thirtieth International Joint Conference on
Artificial Intelligence, {IJCAI} 2021},
year = {2021}
}
@article{kim2025pubchem,
title={PubChem 2025 update},
author={Kim, Sunghwan and Chen, Jie and Cheng, Tiejun and Gindulyte, Asta and He, Jie and He, Suyun and Li, Qingliang and Shoemaker, Bradford A. and Thiessen, Paul A. and Yu, Bo and Zaslavsky, Leonid and Zhang, Jian and Bolton, Evan E.},
journal={Nucleic Acids Research},
volume={53},
number={D1},
pages={D1516--D1525},
year={2025},
doi={10.1093/nar/gkae1059}
}
