Nature Cancer
SMMILe enables accurate spatial quantification in digital pathology using multiple-instance learning.
For HPC NVIDIA A100:
module purge
module load GCCcore/11.3.0 Python/3.10.4
module load GCCcore/11.3.0 OpenSlide/3.4.1-largefiles
module load CUDA/11.3.1
module load cuDNN/8.2.1.32-CUDA-11.3.1
python -m venv ./pyvenv/smmile
source ./pyvenv/smmile/bin/activate
pip install --upgrade pip
pip install -r requirements.txt
The original WSI files (.svs) of the TCGA data can be downloaded from the GDC Data Portal.
We provide extracted embeddings (ResNet-50, Conch) and superpixel segmentation results on our Hugging Face dataset page.
python feature_extraction.py --encoder_name {resnet50/conch} \
--feature_dir /path/to/your/embedding/save/folder \
--anno_dir /path/to/annotation/folder/ \
--wsi_dir /path/to/svs/file/folder/ \
--file_list_path /path/to/slide/list/file/ \
--patch_size 512 --step_size 512 --level 1
For the TCGA datasets (Renal, Lung, Gastric), level = 1.
For the Camelyon-16 (Breast) and UBC-OCEAN (Ovarian) datasets, level = 0.
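The level settings above can be kept in a small lookup so that the matching `--level` is passed for each dataset. This is only a sketch: the dictionary keys and the helper name `extraction_args` are hypothetical and not part of the repository; the flags and values are taken from the command and notes above.

```python
# Hypothetical helper: dataset keys and function name are illustrative only.
LEVEL_BY_DATASET = {
    "tcga_renal": 1,
    "tcga_lung": 1,
    "tcga_gastric": 1,
    "camelyon16": 0,
    "ubc_ocean": 0,
}


def extraction_args(dataset, patch_size=512, step_size=512):
    """Return the patch-related CLI arguments for feature_extraction.py."""
    try:
        level = LEVEL_BY_DATASET[dataset]
    except KeyError:
        raise ValueError(f"no level recorded for dataset {dataset!r}") from None
    return [
        "--patch_size", str(patch_size),
        "--step_size", str(step_size),
        "--level", str(level),
    ]


# Prints: --patch_size 512 --step_size 512 --level 0
print(" ".join(extraction_args("camelyon16")))
```

The returned list can be appended to the rest of the command, for example when launching the script through `subprocess.run`.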
For datasets with tessellated patches (SICAPv2), patches are organized into subdirectories based on their class labels. Each parent folder represents a WSI, and its subfolders contain the patches belonging to each class:
/xxx/SICAPv2/
│── 17B00208864/ # WSI ID (Parent folder for a slide)
│ ├── 0/ # Class 0 patches
│ │ ├── patch_001.png
│ │ ├── patch_002.png
│ │ ├── ...
│ ├── 3/ # Class 3 patches
│ │ ├── patch_101.png
│ │ ├── patch_102.png
│ │ ├── ...
│ ├── 4/ # Class 4 patches
│ │ ├── patch_201.png
│ │ ├── patch_202.png
│ │ ├── ...
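Before running extraction, it can help to check that a patch folder follows the layout above. The sketch below walks the tree and groups patches by WSI ID and class label. The function name `index_patches` is hypothetical, and the demo builds a throwaway directory rather than reading real data.

```python
import tempfile
from pathlib import Path


def index_patches(dataset_root):
    """Map each WSI ID to {class_label: sorted list of .png patch paths}."""
    index = {}
    for wsi_dir in sorted(Path(dataset_root).iterdir()):
        if not wsi_dir.is_dir():
            continue
        classes = {}
        for class_dir in sorted(wsi_dir.iterdir()):
            if class_dir.is_dir():
                classes[class_dir.name] = sorted(class_dir.glob("*.png"))
        index[wsi_dir.name] = classes
    return index


# Demo on a temporary directory that mimics the layout above.
with tempfile.TemporaryDirectory() as root:
    for label, name in [("0", "patch_001.png"), ("3", "patch_101.png")]:
        class_dir = Path(root) / "17B00208864" / label
        class_dir.mkdir(parents=True)
        (class_dir / name).touch()
    index = index_patches(root)
    summary = {
        wsi: {label: len(paths) for label, paths in classes.items()}
        for wsi, classes in index.items()
    }

# Prints: {'17B00208864': {'0': 1, '3': 1}}
print(summary)
```

Class labels are kept as strings because they come directly from the subfolder names.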
python feature_extraction_patch.py --encoder_name {resnet50/conch} \
--feature_dir /path/to/your/embedding/save/folder \
--patch_dirs /path/to/patches/* \
--file_suffix 0_1024.npy --patch_size 1024
You can also use the standard preprocessing pipeline provided by CLAM. We provide a modified version, CLAM_Pre, that generates embedding files in a format readable by SMMILe.
Set --size to the patch size used in the feature extraction step. n_segments_persp can be set to 9, 16, or 25 depending on the dataset; we use 16 by default.
python superpixel_generation.py --size 2048 --n_segments_persp 16 --compactness 50 \
--file_suffix '*0_2048.npy' --keyword_feature feature \
--fea_dir /path/to/your/embedding/save/folder \
--sp_dir /path/to/your/superpixel/save/folder/sp_n%d_c%d_%d/
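The `--sp_dir` value ends in a printf-style template (`sp_n%d_c%d_%d`). The sketch below shows how such a template resolves to a concrete folder name. It assumes, without confirmation from the text above, that the three placeholders are filled with `n_segments_persp`, `compactness`, and `size`, in that order; check `superpixel_generation.py` before relying on this. The helper name `resolve_sp_dir` is hypothetical.

```python
# Assumption: placeholder order is (n_segments_persp, compactness, size).
SP_DIR_TEMPLATE = "/path/to/your/superpixel/save/folder/sp_n%d_c%d_%d/"


def resolve_sp_dir(template, n_segments_persp, compactness, size):
    """Fill the three %d placeholders of the superpixel directory template."""
    return template % (n_segments_persp, compactness, size)


sp_dir = resolve_sp_dir(SP_DIR_TEMPLATE, 16, 50, 2048)
# Prints: /path/to/your/superpixel/save/folder/sp_n16_c50_2048/
print(sp_dir)
```

Under this assumption, the resolved folder is the one to pass as the superpixel directory in later steps.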
Binary or Multi-class dataset: cd single/
Multi-label dataset: cd multi/
- Set up the config for stage 1, for example,
./single/configs_rcc/config_renal_smmile_r1_conch.yaml. The current config is set for the base version without any module.
python main.py --config ./configs_rcc/config_renal_smmile_r1_conch.yaml \
--drop_with_score --D 1 --superpixel --exp_code smmile_d1sp \
--max_epochs 40
- After stage 1, set up the config for stage 2, for example,
./single/configs_rcc/config_renal_smmile_r1_conch.yaml
python main.py --config ./configs_rcc/config_renal_smmile_r1_conch.yaml --drop_with_score --D 1 \
--superpixel --inst_refinement --mrf --exp_code smmile_d1sp_ref_mrf \
--models_dir /home/z/zeyugao/SMMILe/single/results_conch_rcc/smmile_d1sp_s1 \
--max_epochs 20
Note: use --consistency in both stages for datasets that contain normal cases.
Also, --mrf is not suitable for datasets with only small tumor regions, such as Camelyon16, where most WSIs have only a few patches containing tumor.
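The flag guidance for the two stages can be summarized in a short helper. This is a sketch only: `training_flags` and its boolean parameters are hypothetical names, and the flags themselves are the ones used in the commands and notes above.

```python
def training_flags(stage, has_normal_cases, small_tumor_regions):
    """Assemble optional main.py flags following the notes above."""
    if stage not in (1, 2):
        raise ValueError("stage must be 1 or 2")
    # Flags shared by both stages in the example commands.
    flags = ["--drop_with_score", "--D", "1", "--superpixel"]
    if has_normal_cases:
        # Recommended in both stages when the dataset contains normal cases.
        flags.append("--consistency")
    if stage == 2:
        flags.append("--inst_refinement")
        if not small_tumor_regions:
            # Skipped for datasets with only small tumor regions.
            flags.append("--mrf")
    return flags


# Stage 2 for a dataset with normal cases and only small tumor regions.
print(" ".join(training_flags(2, has_normal_cases=True, small_tumor_regions=True)))
```

For a dataset without normal cases and with larger tumor regions, the stage 2 result matches the flags in the stage 2 command above.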
The whole test set:
python eval.py --data_root_dir /path/to/extracted/embedding/folder/ \
--data_sp_dir /path/to/superpixels/folder/ \
--results_dir /path/to/trained/models/folder/ \
--models_exp_code smmile_d1sp_ref_mrf_s1 --save_exp_code _conch_rcc
Metric calculation:
python metric_calculate.py --data_root_dir /path/to/eval/results/folder/
Heat map generation:
python generate_heatmap.py \
--model_name smmile \
--wsi_dir '/path/to/original/svs_file/folder/*.svs' \
--results_dir '/path/to/generated/results/folder/' \
--num_workers 8
The single WSI demo (several paths need to be set in demo.py):
python demo.py
This project is licensed under the GPLv3 License and is available for non-commercial academic purposes.
We acknowledge funding and support from Cancer Research UK and the Cancer Research UK Cambridge Centre [CTRQQR-2021-100012], The Mark Foundation for Cancer Research [RG95043], GE HealthCare, and the CRUK National Cancer Imaging Translational Accelerator (NCITA) [A27066]. Additional support was also provided by the National Institute of Health Research (NIHR) Cambridge Biomedical Research Centre [NIHR203312] and EPSRC Tier-2 capital grant [EP/P020259/1]. Calculations were performed in part using the Sulis Tier 2 HPC platform hosted by the Scientific Computing Research Technology Platform at the University of Warwick. Sulis is funded by EPSRC Grant EP/T022108/1 and the HPC Midlands+ consortium. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
We gratefully acknowledge the CLAM repository by Mahmood Lab, upon which the SMMILe framework was developed.
Their open-source contribution provided an essential foundation for our work.
If you find this work useful, please cite our paper.
Cite this article
Gao, Z., Mao, A., Dong, Y., Clayton, H., Wu, J., Liu, J., Wang, C., He, K., Gong, T., Li, C. & Crispin-Ortuzar, M.
SMMILe enables accurate spatial quantification in digital pathology using multiple-instance learning.
Nat Cancer (2025).
https://doi.org/10.1038/s43018-025-01060-8
@article{Gao2025SMMILe,
title = {SMMILe enables accurate spatial quantification in digital pathology using multiple-instance learning},
author = {Gao, Zeyu and Mao, Anyu and Dong, Yuxing and Clayton, Hannah and Wu, Jialun and Liu, Jiashuai and Wang, ChunBao and He, Kai and Gong, Tieliang and Li, Chen and Crispin-Ortuzar, Mireia},
journal = {Nature Cancer},
year = {2025},
doi = {10.1038/s43018-025-01060-8},
url = {https://www.nature.com/articles/s43018-025-01060-8}
}

