Hierarchical Attention Propagation (HAP) is a medical ontology embedding framework which generalizes GRAM by hierarchically propagating attention across the entire ontology structure, where a medical concept adaptively learns its embedding from all other concepts in the hierarchy instead of only its ancestors.
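As a rough illustration of this difference, the sketch below combines candidate embeddings with simple dot-product attention (the actual model learns its attention scores with a neural scoring function; all names and sizes here are hypothetical):

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D score vector.
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_combine(target, candidates):
    # Attention-weighted sum of candidate embeddings, scored by dot
    # product against the target's basic embedding (a stand-in for
    # the model's learned scoring network).
    scores = np.array([target @ c for c in candidates])
    return softmax(scores) @ np.stack(candidates)

# Toy hierarchy with one basic embedding per node. In GRAM a leaf
# attends only to itself and its ancestors; HAP propagates attention
# so every node can draw on all other nodes in the hierarchy.
rng = np.random.default_rng(0)
basic = {n: rng.normal(size=4) for n in ["root", "parent", "leaf", "sibling"]}
gram_style = attention_combine(basic["leaf"], [basic["leaf"], basic["parent"], basic["root"]])
hap_style = attention_combine(basic["leaf"], list(basic.values()))
```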
For more information, please check our paper:
M. Zhang, C. King, M. Avidan, and Y. Chen, Hierarchical Attention Propagation for Healthcare Representation Learning, Proc. ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD-20), 2020. [PDF]
Like GRAM, the code trains an RNN (Gated Recurrent Units) to predict, at each timestep (i.e. visit), the diagnosis/procedure codes occurring in the next visit. The code uses Multi-level Clinical Classification Software for ICD-9-CM as the domain knowledge.
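The prediction task above can be sketched as follows, using a hypothetical toy encoding (one multi-hot vector per visit) rather than the repo's actual data format:

```python
import numpy as np

def next_visit_targets(visits, n_codes):
    # Build, for one patient, multi-hot inputs x_t (codes of visit t)
    # and targets y_t (codes of visit t+1) for next-visit prediction.
    X, Y = [], []
    for t in range(len(visits) - 1):
        x = np.zeros(n_codes)
        x[visits[t]] = 1.0
        y = np.zeros(n_codes)
        y[visits[t + 1]] = 1.0
        X.append(x)
        Y.append(y)
    return np.stack(X), np.stack(Y)

# Three visits yield two (input, target) timesteps for the RNN.
visits = [[0, 2], [1], [2, 3]]
X, Y = next_visit_targets(visits, n_codes=4)
```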
STEP 1: Installation
- Install Python and Theano. We use Python 2.7 and Theano 0.8.2. Theano can be easily installed on Ubuntu as suggested here.
- If you plan to use GPU computation, install CUDA.
STEP 2: Run on MIMIC-III
- You will first need to request access to MIMIC-III, a publicly available electronic health record dataset collected from ICU patients over 11 years.
- You can use "process_mimic.py", located in "data/mimic3/", to process the MIMIC-III dataset and generate a training dataset suitable for HAP. Place the script in the same directory as the MIMIC-III CSV files, and run it with:
python process_mimic.py ADMISSIONS.csv DIAGNOSES_ICD.csv mimic
More instructions are described inside the script. You may use the already processed files included in "data/mimic3/"; otherwise, please copy your generated "mimic.*" files to "data/mimic3/".
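For intuition, here is a simplified sketch of the kind of grouping "process_mimic.py" performs, using toy inputs rather than the real MIMIC-III CSV schema (the real script additionally maps codes to integer ids and serializes the result):

```python
from collections import defaultdict

def group_visits(admissions, diagnoses):
    # admissions: visit id -> (patient id, admit time)
    # diagnoses:  visit id -> list of ICD-9 codes
    # Returns each patient's visits as lists of codes, in time order.
    per_patient = defaultdict(list)
    for hadm, (subject, admittime) in admissions.items():
        per_patient[subject].append((admittime, diagnoses.get(hadm, [])))
    return {subject: [codes for _, codes in sorted(vs)]
            for subject, vs in per_patient.items()}

# Toy stand-ins for ADMISSIONS.csv and DIAGNOSES_ICD.csv rows.
admissions = {101: (1, "2130-01-01"), 102: (1, "2131-06-01"), 201: (2, "2129-03-03")}
diagnoses = {101: ["4019"], 102: ["25000", "4019"], 201: ["41401"]}
seqs = group_visits(admissions, diagnoses)
```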
- Use "build_trees.py" in "data/mimic3/" to build files that contain the ancestor information of each medical code. This requires "ccs_multi_dx_tool_2015.csv" (Multi-level CCS for ICD-9-CM), which can be downloaded from here. We also include it in "data/mimic3/".
Run "build_trees.py" with:
python build_trees.py ccs_multi_dx_tool_2015.csv mimic.seqs mimic.types remap
Running this script re-maps the integer codes assigned to all medical codes, so you also need the ".seqs" file and the ".types" file created by "process_mimic.py". The general form of the command is:
python build_trees.py ccs_multi_dx_tool_2015.csv <seqs file> <types file> <output path>
This builds five files "remap.level#.pk" and a file "remap.p2c", which contain the level information and the parent-to-children mapping extracted from the hierarchy, and replaces the old "mimic.seqs" and "mimic.types" files with correctly re-mapped ones.
- Run HAP using the "remap.seqs" and "remap.p2c" files generated by "build_trees.py". The ".seqs" file contains the sequence of visits for each patient, where each visit consists of multiple diagnosis codes. The command is:
python hap.py data/mimic3/ remap.seqs remap.seqs remap result/mimic3/HAP/ --p2c_file remap.p2c --sep_attention --L2 0 --n_epochs 50
More commands for generating the experimental results are contained in "run_mimic.sh".
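For intuition about the "remap.p2c" file passed via "--p2c_file", the sketch below derives a parent-to-children mapping from CCS-style multi-level category paths. This is a simplification: "build_trees.py" additionally re-maps integer code ids, and the dotted-path format here is only illustrative.

```python
from collections import defaultdict

def build_parent_to_children(paths):
    # For each hierarchy path such as "7.1.2", record every
    # (parent category -> child category) edge along the path.
    p2c = defaultdict(set)
    for path in paths:
        parts = path.split(".")
        for i in range(1, len(parts)):
            parent = ".".join(parts[:i])
            child = ".".join(parts[:i + 1])
            p2c[parent].add(child)
    return {p: sorted(c) for p, c in p2c.items()}

p2c = build_parent_to_children(["7.1.2", "7.1.3", "7.2.1"])
```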
STEP 3: How to pretrain the code embedding
For sequential diagnoses prediction, it is very effective to pretrain the code embeddings with a co-occurrence-based algorithm such as word2vec or GloVe. To pretrain the code embeddings with GloVe, do the following:
- Use "create_glove_comap.py" with the ".seqs" file generated by "build_trees.py". The execution command is:
python create_glove_comap.py remap.seqs remap
This will create a file "cooccurrenceMap.pk" that contains the co-occurrence information of codes and ancestors.
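The counting involved can be sketched as follows. This simplified stand-in only counts pairs of codes appearing in the same visit, whereas "create_glove_comap.py" also counts code-ancestor co-occurrences:

```python
from collections import Counter
from itertools import combinations

def cooccurrence_counts(seqs):
    # seqs: per-patient lists of visits, each visit a list of code ids.
    # Counts how often each unordered pair of codes shares a visit.
    counts = Counter()
    for patient in seqs:
        for visit in patient:
            for a, b in combinations(sorted(set(visit)), 2):
                counts[(a, b)] += 1
    return counts

# Two toy patients: codes 0 and 1 co-occur twice, 0 and 2 once.
seqs = [[[0, 1], [1, 2]], [[0, 1, 2]]]
counts = cooccurrence_counts(seqs)
```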
- Use "glove.py" on the co-occurrence file generated by "create_glove_comap.py". The execution command is:
python glove.py cooccurrenceMap.pk remap pretrained_embedding
- Use the pretrained embeddings when you train HAP by appending "--embed_file pretrained_embedding.npz" to your command.
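A minimal sketch of writing and reading such a ".npz" embedding file with NumPy is shown below; the array key "w" and the embedding shape are assumptions here, so check "glove.py" for the exact format "hap.py" expects:

```python
import os
import tempfile

import numpy as np

# Hypothetical embedding matrix: 10 codes, 8 dimensions.
emb = np.random.default_rng(0).normal(size=(10, 8)).astype("float32")

# Save under the key "w" (an assumption, not the repo's documented key).
path = os.path.join(tempfile.mkdtemp(), "pretrained_embedding.npz")
np.savez(path, w=emb)

# Reload the array by the same key.
loaded = np.load(path)["w"]
```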
If you find the code useful, please cite our paper:
@inproceedings{zhang2020hierarchical,
title={Hierarchical Attention Propagation for Healthcare Representation Learning},
author={Zhang, Muhan and King, Christopher R and Avidan, Michael and Chen, Yixin},
booktitle={Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery \& Data Mining},
pages={249--256},
year={2020}
}
Muhan Zhang, Washington University in St. Louis [email protected] 11/2/2020