This repository is the official implementation of TMSE.
Abstract: Survival prediction plays a crucial role in clinical decision-making, enabling personalized treatments by integrating multi-modal medical data, such as histopathology images, pathology reports, and genomic profiles. However, the heterogeneity across these modalities and the high dimensionality of Whole Slide Images (WSI) make it challenging to capture survival-relevant features and model their interactions. Existing methods, typically focused on single-modal WSI, fail to leverage multimodal information, such as expert-driven pathology reports, and struggle with the computational complexity of WSI. To address these issues, we propose a novel Tri-Modal Survival Estimation framework (TMSE), which includes three components: (1) Pathology report processing pipeline, curated with expert knowledge, with both the pipeline and the processed structured report being publicly available; (2) Context-aware Tissue Prototype (CTP) module, which uses Mamba and Gaussian mixture models to extract compact, survival-relevant features from WSI, reducing redundancy while preserving histological details; (3) Attention-Entropy Interaction (AEI) module, an attention mechanism enhanced with entropy-based optimization to align and fuse three modalities: WSI, pathology reports, and genomic data. Extensive evaluation on three TCGA datasets (BLCA, BRCA, LUAD) shows that our approach achieves superior performance in survival prediction.
We preprocess whole slide image (WSI) data using CLAM, which provides an easy-to-use tool for WSI preprocessing. For detailed guidance, we highly recommend referring to the Tutorial - Processing WSIs for MIL from Scratch.
We use PLIP as the patch-level feature encoder and store the extracted features in the /path/to/data_source directory.
Pathway data can be downloaded from MMP and should be placed in the data_csv/rna folder.
To advance pathology research, we publicly release a curated dataset of approximately 10K TCGA reports. We utilize a commercial LLM to refine the original reports for better quality. The cleaned reports and the prompts used in the cleaning process are available in the text_report folder. The original TCGA reports can be accessed from TCGA Path Reports.
For text feature encoding, we employ BiomedBERT, with extracted features stored in the /path/to/text_embeddings directory.
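Before training, it can help to confirm that the inputs described above are in place. The following is a minimal sketch and not part of the TMSE codebase: the function name `find_missing_inputs` is hypothetical, and the paths are the placeholders used in this README, which should be replaced with your own locations.

```python
from pathlib import Path


def find_missing_inputs(required_dirs):
    """Return the labels of required directories that are missing or empty.

    `required_dirs` maps a human-readable label to a directory path.
    This helper is illustrative only and is not part of the TMSE code.
    """
    missing = []
    for label, directory in required_dirs.items():
        path = Path(directory)
        # A directory counts as present only if it exists and holds at least one entry.
        if not path.is_dir() or not any(path.iterdir()):
            missing.append(label)
    return missing


if __name__ == "__main__":
    # Placeholder paths taken from this README; replace them with real locations.
    inputs = {
        "WSI patch features (PLIP)": "/path/to/data_source",
        "pathway data": "data_csv/rna",
        "cleaned reports and prompts": "text_report",
        "text features (BiomedBERT)": "/path/to/text_embeddings",
    }
    for label in find_missing_inputs(inputs):
        print(f"Missing or empty: {label}")
```

The check only looks for non-empty directories. It makes no assumption about the file formats inside them, since those are not specified here.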
We adopt the same data split strategy as MMP. You may modify and use your own data split file at "/path/to/data_splits/k=${k}".
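The `k=${k}` suffix in the path above is a shell-style placeholder, which we take here to be the fold index (an assumption). As a minimal sketch of how such a directory could be resolved from Python, the hypothetical helpers below build the path and list whatever files they find. The names `split_dir` and `list_split_files` are illustrative, and the contents of each fold directory are not specified in this README.

```python
from pathlib import Path


def split_dir(root, k):
    """Build the split directory for fold `k`, following the `k=${k}` pattern."""
    return Path(root) / f"k={k}"


def list_split_files(root, k):
    """Return the sorted file names in the fold directory, or [] if it does not exist."""
    directory = split_dir(root, k)
    if not directory.is_dir():
        return []
    return sorted(p.name for p in directory.iterdir() if p.is_file())
```

For example, `split_dir("/path/to/data_splits", 0)` points at `/path/to/data_splits/k=0`. A custom split can then be used by placing your own files in a directory that follows the same naming pattern.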
Install all required packages by running:
pip install -r requirements.txt
cd src_TMSE
Run the following bash script and specify the required arguments:
bash ./scripts/prototype/blca.sh gpu_id
Run the following bash script and specify the required arguments:
bash ./scripts/survival/blca_surv.sh gpu_id TMSE
The code for TMSE was adapted from and inspired by the fantastic works of PANTHER, MMP, MG-Trans, and CLAM.
