Code for "Hierarchical Attention Propagation for Healthcare Representation Learning", KDD 2020.

muhanzhang/HAP


Instructions

Hierarchical Attention Propagation (HAP) is a medical ontology embedding framework that generalizes GRAM by hierarchically propagating attention across the entire ontology structure, so that a medical concept adaptively learns its embedding from all other concepts in the hierarchy instead of only its ancestors.
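The idea of propagating attention over the hierarchy can be illustrated with a toy numpy sketch: one bottom-up pass where each node attends over itself and its (already-updated) children, then one top-down pass where it attends over itself and its parent. All names and the plain dot-product attention here are illustrative; the actual model learns attention scores with a neural scoring function.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def propagate(children, base, order):
    """Bottom-up then top-down attention pass over a tree.

    children: dict node -> list of child nodes
    base:     dict node -> base embedding (1-D numpy array)
    order:    nodes listed so every child precedes its parent
    """
    up = {}
    # Bottom-up: attend over the node itself and its children's
    # already-updated embeddings.
    for n in order:
        cand = [base[n]] + [up[c] for c in children.get(n, [])]
        scores = softmax(np.array([base[n] @ v for v in cand]))
        up[n] = sum(w * v for w, v in zip(scores, cand))
    parent = {c: p for p, cs in children.items() for c in cs}
    final = {}
    # Top-down: attend over the node itself and its parent's
    # final embedding, visiting the root first.
    for n in reversed(order):
        cand = [up[n]] + ([final[parent[n]]] if n in parent else [])
        scores = softmax(np.array([up[n] @ v for v in cand]))
        final[n] = sum(w * v for w, v in zip(scores, cand))
    return final

# Toy 3-node hierarchy: root "r" with two children "a" and "b".
rng = np.random.default_rng(0)
base = {n: rng.standard_normal(4) for n in "rab"}
emb = propagate({"r": ["a", "b"]}, base, order=["a", "b", "r"])
```

After both passes, every node's embedding has mixed in information from descendants and ancestors alike, which is the key difference from GRAM's ancestors-only attention.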

For more information, please check our paper:

M. Zhang, C. King, M. Avidan, and Y. Chen, Hierarchical Attention Propagation for Healthcare Representation Learning, Proc. ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD-20), 2020.

Code Description

Like GRAM, the code trains an RNN (gated recurrent units) to predict, at each timestep (i.e., visit), the diagnosis/procedure codes occurring in the next visit. The code uses the Multi-level Clinical Classifications Software (CCS) for ICD-9-CM as the domain knowledge.
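The prediction setup can be sketched as follows. This is a minimal numpy illustration, not the repository's Theano implementation; the shapes, parameter names, and toy visit data are all made up.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h, p):
    """One GRU step (standard update/reset-gate equations)."""
    z = sigmoid(p["Wz"] @ x + p["Uz"] @ h)            # update gate
    r = sigmoid(p["Wr"] @ x + p["Ur"] @ h)            # reset gate
    h_tilde = np.tanh(p["Wh"] @ x + p["Uh"] @ (r * h))
    return (1 - z) * h + z * h_tilde

# Toy setup: 10 possible medical codes, hidden size 8.
n_codes, hidden = 10, 8
rng = np.random.default_rng(0)
p = {k: 0.1 * rng.standard_normal((hidden, n_codes if k[0] == "W" else hidden))
     for k in ["Wz", "Uz", "Wr", "Ur", "Wh", "Uh"]}
W_out = 0.1 * rng.standard_normal((n_codes, hidden))

# One patient with three visits, each a set of code ids; each visit
# is fed as a multi-hot vector and the model predicts the next visit.
visits = [[0, 3], [2, 3, 5], [7]]
h = np.zeros(hidden)
for codes in visits[:-1]:
    x = np.zeros(n_codes)
    x[codes] = 1.0
    h = gru_step(x, h, p)
# Per-code probabilities for the final visit.
probs = sigmoid(W_out @ h)
```

Training then applies a multi-label loss between `probs` and the multi-hot vector of the actual next visit, at every timestep.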

Running HAP

STEP 1: Installation

  1. Install Python and Theano. We use Python 2.7 and Theano 0.8.2. On Ubuntu, Theano can be installed following its official installation instructions.

  2. If you plan to use GPU computation, install CUDA.

STEP 2: Run on MIMIC-III

  1. You will first need to request access to MIMIC-III, a publicly available electronic health record dataset collected from ICU patients over 11 years.

  2. Use "process_mimic.py", located in "data/mimic3/", to process the MIMIC-III dataset and generate a training dataset suitable for HAP. Place the script in the same directory as the MIMIC-III CSV files and run it with:

     python process_mimic.py ADMISSIONS.csv DIAGNOSES_ICD.csv mimic
    

    More instructions are described inside the script. You may use the already-processed files included in "data/mimic3/"; otherwise, copy your generated "mimic.*" files to "data/mimic3/".
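For reference, the ".seqs" format used by this GRAM-style preprocessing is a pickled nested list: patients, each a list of visits, each a list of integer code ids. A minimal sketch with made-up data:

```python
import pickle

# Hypothetical miniature of a ".seqs" file: a list of patients,
# each patient a list of visits, each visit a list of code ids.
seqs = [
    [[0, 1], [2]],    # patient 0: two visits
    [[1, 3, 4]],      # patient 1: one visit
]
with open("mimic.seqs", "wb") as f:
    pickle.dump(seqs, f)

# Reading it back recovers the same nested-list structure.
with open("mimic.seqs", "rb") as f:
    loaded = pickle.load(f)
```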

  3. Use "build_trees.py" in "data/mimic3/" to build files that contain the ancestor information of each medical code. This requires "ccs_multi_dx_tool_2015.csv" (Multi-level CCS for ICD-9), which can be downloaded from the HCUP website. We also include it in "data/mimic3/".

    Run "build_trees.py" with:

     python build_trees.py ccs_multi_dx_tool_2015.csv mimic.seqs mimic.types remap
    

    Running this script re-maps the integer codes assigned to all medical codes, so it also needs the ".seqs" and ".types" files created by "process_mimic.py". The general form of the command is python build_trees.py ccs_multi_dx_tool_2015.csv <seqs file> <types file> <output path>. It builds five "remap.level#.pk" files and a "remap.p2c" file, which contain the level information and the parent-to-children mapping extracted from the hierarchy, and replaces the old "mimic.seqs" and "mimic.types" files with correctly remapped ones.
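The level files and the parent-to-children mapping can be illustrated with a small sketch. The edge list and ids below are made up; the real script derives them from the CCS file using the remapped integer codes.

```python
from collections import defaultdict

# Hypothetical hierarchy edges: (parent_id, child_id) pairs.
edges = [(10, 0), (10, 1), (11, 2), (12, 10), (12, 11)]

# Parent-to-children mapping (the "remap.p2c"-style structure).
p2c = defaultdict(list)
for parent, child in edges:
    p2c[parent].append(child)

parent_of = {c: p for p, c in edges}

def depth(node):
    """Distance from the node to the root (root = level 0)."""
    d = 0
    while node in parent_of:
        node = parent_of[node]
        d += 1
    return d

# Group nodes by level (the "remap.level#.pk"-style structure).
levels = defaultdict(list)
for node in sorted(set(parent_of) | set(p2c)):
    levels[depth(node)].append(node)
```

Propagation then processes the nodes level by level, which is why the level grouping is stored alongside the parent-to-children map.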

  4. Run HAP using the "remap.seqs" and "remap.p2c" files generated by "build_trees.py". The ".seqs" file contains the sequence of visits for each patient. Each visit consists of multiple diagnosis codes. The command is:

     python hap.py data/mimic3/ remap.seqs remap.seqs remap result/mimic3/HAP/ --p2c_file remap.p2c --sep_attention --L2 0 --n_epochs 50 
    

    More commands for generating the experimental results are contained in "run_mimic.sh".

STEP 3: How to pretrain the code embedding

For sequential diagnoses prediction, it is very effective to pretrain the code embeddings with a co-occurrence-based algorithm such as word2vec or GloVe. To pretrain the code embeddings with GloVe, do the following:

  1. Use "create_glove_comap.py" with the ".seqs" file generated by "build_trees.py". The execution command is:

     python create_glove_comap.py remap.seqs remap
    

    This will create a file "cooccurrenceMap.pk" that contains the co-occurrence information of codes and ancestors.
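The co-occurrence counting can be sketched as follows. This is a simplified illustration that counts only pairs of codes appearing in the same visit; the actual script also folds in each code's ancestors.

```python
from collections import Counter
from itertools import combinations

def cooccurrence(seqs):
    """Count how often two codes appear in the same visit.

    seqs follows the ".seqs" layout: patients -> visits -> code ids.
    Keys are unordered code pairs, since GloVe uses symmetric counts.
    """
    counts = Counter()
    for patient in seqs:
        for visit in patient:
            for a, b in combinations(sorted(set(visit)), 2):
                counts[(a, b)] += 1
    return counts

# Two toy patients; codes 0..5 are made-up ids.
counts = cooccurrence([[[0, 1, 2], [1, 2]], [[0, 2]]])
```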

  2. Use "glove.py" on the co-occurrence file generated by "create_glove_comap.py". The execution command is:

     python glove.py cooccurrenceMap.pk remap pretrained_embedding
    
  3. Use the pretrained embeddings when you train HAP by appending "--embed_file pretrained_embedding.npz" to your command.
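A ".npz" embedding file of this kind can be produced and inspected with plain numpy. The array key and shapes below are illustrative; the actual key is whatever "glove.py" writes.

```python
import numpy as np

# Hypothetical pretrained embedding matrix: one row per code/ancestor,
# one column per embedding dimension.
W_emb = np.random.default_rng(0).standard_normal((6, 4)).astype("float32")
np.savez("pretrained_embedding.npz", W_emb=W_emb)  # key name is illustrative

# np.load on an .npz archive returns a dict-like object of arrays.
data = np.load("pretrained_embedding.npz")
restored = data["W_emb"]
```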

Reference

If you find the code useful, please cite our paper:

@inproceedings{zhang2020hierarchical,
  title={Hierarchical Attention Propagation for Healthcare Representation Learning},
  author={Zhang, Muhan and King, Christopher R and Avidan, Michael and Chen, Yixin},
  booktitle={Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery \& Data Mining},
  pages={249--256},
  year={2020}
}

Muhan Zhang, Washington University in St. Louis [email protected] 11/2/2020
