Enabling Zero-Shot Clinical Collaboration & Automated Concept Discovery
DermFM-Zero is the first multimodal foundation model to provide effective clinical decision support across primary care and specialty settings without fine-tuning. Beyond diagnosis, it unlocks emerging capabilities in automated concept discovery, advancing AI-assisted dermatology.
⚠️ Note: The model weights cannot be shared at this time due to internal IP restrictions. We plan to release an open version of the model weights in the future, no later than the time this work is accepted for publication in a peer-reviewed journal.
📘 Documentation | 🚀 Quick Start | 📊 Benchmarks | 💬 Discussion
- Highlights
- Benchmark Results
- Repository Structure
- Quick Start
- Evaluation Tasks
- Contributors
- License
- Contact
- Citation
🏆 State-of-the-art Performance: Achieves a 73.20% average score across 7 zero-shot classification benchmarks
🔍 Multimodal Fusion: Supports clinical images, dermoscopic images, and patient metadata
🧠 Interpretable AI: Built-in concept discovery with Sparse Autoencoders (SAE)
🌍 Multi-center Validation: Evaluated on datasets from Austria, Brazil, Korea, Portugal, and more
DermFM-Zero demonstrates state-of-the-art performance across diverse benchmarks.
Modality: D = Dermoscopic, C = Clinical
| Model | HAM (7-D) | PAD (6-C) | ISIC2020 (2-D) | PH2 (2-C) | SNU (134-C) | SD-128 (128-C) | Daffodil (5-D) | Average |
|---|---|---|---|---|---|---|---|---|
| Task | Skin Cancer | Skin Cancer | Mel Det. | Mel Det. | DDX | DDX | Rare DX | - |
| Country/Inst | Austria | Brazil | Multi-center | Portugal | Korea | Multi-center | Multi-center | - |
| Metric | ACC | ACC | AUROC | AUROC | ACC | ACC | ACC | - |
| CLIP-Large [1] | 0.2754 | 0.3839 | 0.4772 | 0.3855 | 0.0857 | 0.1210 | 0.5304 | 0.3227 |
| BiomedCLIP [2] | 0.6347 | 0.4512 | 0.7305 | 0.8441 | 0.0966 | 0.1153 | 0.5785 | 0.4930 |
| MONET [3] | 0.3347 | 0.4729 | 0.6940 | 0.8370 | 0.1414 | 0.2028 | 0.7607 | 0.4919 |
| MAKE [4] | 0.4551 | 0.5857 | 0.8141 | 0.9095 | 0.3260 | 0.3886 | 0.7785 | 0.6082 |
| DermLIP-ViT-B-16 [5] | 0.6813 | 0.6074 | 0.8235 | 0.8285 | 0.2532 | 0.2783 | 0.7246 | 0.5995 |
| DermLIP-PanDerm [5] | 0.6281 | 0.6247 | 0.7876 | 0.7975 | 0.3332 | 0.3822 | 0.7812 | 0.6192 |
| DermFM-Zero (Ours) | 0.7957 | 0.6941 | 0.8663 | 0.9304 | 0.4450 | 0.5075 | 0.8848 | 0.7320 |
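Zero-shot classification with a CLIP-style model reduces to cosine similarity between image embeddings and per-class text-prompt embeddings. Below is a minimal numpy sketch of that scoring step, with random toy embeddings standing in for the model's image and text encoders (the class names are illustrative):

```python
import numpy as np

# Toy stand-ins for encoder outputs; in practice these come from the
# model's image and text towers.
rng = np.random.default_rng(0)
class_names = ["melanoma", "nevus", "basal cell carcinoma"]
text_emb = rng.normal(size=(3, 512))   # one prompt embedding per class
image_emb = rng.normal(size=(5, 512))  # a batch of 5 test images

# L2-normalize so the dot product is cosine similarity.
text_emb /= np.linalg.norm(text_emb, axis=1, keepdims=True)
image_emb /= np.linalg.norm(image_emb, axis=1, keepdims=True)

logits = image_emb @ text_emb.T        # (5, 3) image-to-class similarities
pred = logits.argmax(axis=1)           # predicted class index per image
print([class_names[i] for i in pred])
```

No classifier head is trained: the text prompts themselves act as the class weights, which is what makes evaluation possible without fine-tuning.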
Evaluation with limited labeled data to assess data efficiency and representation quality.
| Model | HAM (7-class) | ISIC'20 (Melanoma) | PAD (6-class) | SD-128 (128-class) | Average |
|---|---|---|---|---|---|
| Task | Skin Cancer | Mel Det. | Skin Cancer | DDX | - |
| Metric | ACC | AUROC | ACC | ACC | - |
| CLIP [1] | 0.7798 | 0.7828 | 0.6161 | 0.3146 | 0.6233 |
| BiomedCLIP [2] | 0.6959 | 0.4318 | 0.6499 | 0.2541 | 0.5079 |
| MONET [3] | 0.8064 | 0.8036 | 0.6464 | 0.2747 | 0.6328 |
| BiomedGPT [6] | 0.7565 | 0.7838 | 0.5249 | 0.1694 | 0.5586 |
| PanDerm [7] | 0.7898 | 0.8417 | 0.6508 | 0.3483 | 0.6577 |
| DermLIP-ViT-B-16 [5] | 0.8157 | 0.8058 | 0.6594 | 0.3552 | 0.6590 |
| DermLIP-PanDerm [5] | 0.8184 | 0.8707 | 0.6529 | 0.3637 | 0.6764 |
| MAKE [4] | 0.8257 | 0.7813 | 0.6790 | 0.3986 | 0.6712 |
| DINOv3-ViT-L16 [8] | 0.7705 | 0.8310 | 0.6573 | 0.3018 | 0.6401 |
| DINOv3-ViT-7B [8] | 0.7871 | 0.8226 | 0.6985 | 0.3345 | 0.6607 |
| DermFM-Zero (Ours) | 0.8416 | 0.8687 | 0.6855 | 0.4007 | 0.6991 |
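A linear probe fits only a linear classifier on frozen backbone features, so the scores above measure representation quality rather than fine-tuning capacity. A self-contained sketch using a closed-form ridge-regression probe on synthetic features (the repo's `linear_probe/` utilities handle the real pipeline; dimensions here are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 200, 64, 7                     # samples, feature dim, classes
features = rng.normal(size=(n, d))       # frozen backbone embeddings
labels = rng.integers(0, k, size=n)

# One-hot targets and a ridge-regression linear probe (closed form).
# The backbone is never updated; only this linear map W is fit.
Y = np.eye(k)[labels]
lam = 1e-2
W = np.linalg.solve(features.T @ features + lam * np.eye(d), features.T @ Y)

pred = (features @ W).argmax(axis=1)
train_acc = (pred == labels).mean()
```

In practice a logistic-regression probe is more common than the ridge form shown here, but the key property is the same: the feature extractor stays frozen.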
Evaluated on Derm1M validation set (n = 9,806) and SkinCap (n = 4,000).
| Model | Derm1M I→T | Derm1M T→I | SkinCap I→T | SkinCap T→I | Average |
|---|---|---|---|---|---|
| CLIP-Large [1] | 0.122 | 0.104 | 0.174 | 0.127 | 0.132 |
| BiomedCLIP [2] | 0.188 | 0.179 | 0.187 | 0.175 | 0.182 |
| MONET [3] | 0.171 | 0.159 | 0.215 | 0.203 | 0.187 |
| DermFM-Zero (Ours) | 0.457 | 0.454 | 0.369 | 0.349 | 0.407 |
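Image-text retrieval is typically scored by ranking candidates with the image-text similarity matrix and computing Recall@K. A self-contained sketch on toy data (not the repo's evaluation code):

```python
import numpy as np

def recall_at_k(sim, k=1):
    """Fraction of queries whose matching item is ranked in the top-k.

    sim[i, j] = similarity between query i and candidate j; the ground-truth
    match for query i is candidate i, as in paired image-text test sets.
    """
    topk = np.argsort(-sim, axis=1)[:, :k]
    return float(np.mean([i in topk[i] for i in range(sim.shape[0])]))

# Toy similarity matrix: identity-dominant, so every query retrieves itself.
sim = np.eye(4) + 0.01 * np.random.default_rng(0).normal(size=(4, 4))
print(recall_at_k(sim, k=1))
```

For text-to-image retrieval the same function is applied to the transposed similarity matrix.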
```
DermFM-Zero/
├── src/                          # Core models and modules
├── script/                       # Experiment scripts
├── data/                         # Dataset storage
├── automated-concept-discovery/  # SAE & CBM implementation
├── linear_probe/                 # Linear probe utilities
├── multimodal_finetune/          # Multimodal training code
├── requirements.txt              # Python dependencies
└── README.md                     # Documentation
```
git clone [email protected]:SiyuanYan1/DermFM-Zero.git
cd DermFM-Zero
conda create -n panderm python=3.9.20
conda activate panderm
pip install -r requirements.txtDownload benchmark data from Google Drive and unzip to the data/ folder.
Expected directory structure:
```
data/
├── zero-shot-classification/
├── zero-shot-retrieval/
├── linear_probe/
├── multimodal_finetune/
└── automated-concept-discovery/
```
See our interactive notebook for zero-shot disease classification.
Evaluate DermFM-Zero on 7 dermatology datasets without fine-tuning.
Benchmark datasets: HAM, PAD, ISIC2020, PH2, SNU, SD-128, Daffodil
```shell
# Quick run
bash script/zero-shot-eval/PanDerm-v2-zs-classification.sh

# Or the detailed command
python src/main.py \
    --val-data="" \
    --dataset-type "csv" \
    --batch-size=1024 \
    --zeroshot-eval1=data/zero-shot-classification/pad-zero-shot-test.csv \
    --zeroshot-eval2=data/zero-shot-classification/HAM-official-7-zero-shot-test.csv \
    --zeroshot-eval3=data/zero-shot-classification/snu-134-zero-shot-test.csv \
    --zeroshot-eval4=data/zero-shot-classification/sd-128-zero-shot-test.csv \
    --zeroshot-eval5=data/zero-shot-classification/daffodil-5-zero-shot-test.csv \
    --zeroshot-eval6=data/zero-shot-classification/ph2-2-zero-shot-test.csv \
    --zeroshot-eval7=data/zero-shot-classification/isic2020-2-zero-shot-test.csv \
    --csv-label-key label \
    --csv-img-key image_path \
    --model 'hf-hub:redlessone/PanDerm2'
```

Custom Dataset Evaluation
Prepare a CSV file:

```
image_path,label,diag
examples/image1.png,0,melanoma
examples/image2.png,1,nevus
```

Configure class names in `src/open_clip/zero_shot_metadata.py`:

```python
customized_CLASSNAMES = ['melanoma', 'nevus', 'basal cell carcinoma']
```

Run evaluation:
```shell
python src/main.py \
    --dataset-type csv \
    --batch-size 1024 \
    --csv-label-key label \
    --csv-img-key image_path \
    --zeroshot_eval_custom your_data.csv \
    --model 'hf-hub:redlessone/PanDerm2'
```

Evaluate image-text retrieval performance on the Derm1M hold-out and SkinCAP datasets.
```shell
bash script/zero-shot-eval/PanDerm-v2-zs-retrieval.sh
```

Evaluate feature quality by training linear classifiers on frozen features.
Datasets: HAM, ISIC2020, PAD, SD-128
```shell
bash script/linear-probe/PanDerm-v2-lp-eval.sh
```

Fine-tune DermFM-Zero with clinical images, dermoscopic images, and patient metadata.
Dataset modalities:
- Derm7pt: Clinical + Dermoscopic + Metadata
- MILK-11: Clinical + Dermoscopic
- PAD-UFES-20: Clinical + Metadata
```shell
cd multimodal_finetune
# Choose dataset
bash ../script/multimodal_finetune/Derm7pt\(C+D+M\).sh
bash ../script/multimodal_finetune/MILK11\(C+D\).sh
bash ../script/multimodal_finetune/PAD\(C+M\).sh
```

Key hyperparameters:
- `--model_name`: Base model (e.g., `PanDerm-v2`)
- `--dataset_name`: Target dataset (`Derm7pt`, `MILK-11`, `PAD`)
- `--epochs`: Training epochs (default: 50)
- `--batch_size`: Batch size per GPU (default: 32)
- `--learning_rate`: Learning rate (default: 1e-5)
- `--use_cli`, `--use_derm`, `--use_meta`: Enable individual modalities
Metadata is converted to text prompts; see `multimodal_finetune/dataset/prompt.py`.
Results are saved to `multimodal_finetune-result/`.
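The metadata-to-prompt step can be sketched as follows. This toy template is a hypothetical illustration only: the real template lives in `multimodal_finetune/dataset/prompt.py`, and the field names `age`, `sex`, and `site` are assumptions:

```python
def metadata_to_prompt(meta: dict) -> str:
    # Hypothetical sketch: turn tabular patient metadata into a text
    # prompt the language tower can encode alongside the images.
    parts = []
    if meta.get("age") is not None:
        parts.append(f"a {meta['age']}-year-old patient")
    if meta.get("sex"):
        parts.append(f"sex {meta['sex']}")
    if meta.get("site"):
        parts.append(f"lesion on the {meta['site']}")
    if not parts:
        return "skin lesion image"
    return "skin lesion image of " + ", ".join(parts)

print(metadata_to_prompt({"age": 54, "sex": "female", "site": "back"}))
```

Encoding metadata as text lets the same text tower handle both captions and structured patient attributes, with missing fields simply omitted from the prompt.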
Discover interpretable concepts using Sparse Autoencoders (SAE) and build Concept Bottleneck Models (CBM).
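At a high level, an SAE encodes dense embeddings into an overcomplete, non-negative activation space and reconstructs them under an L1 sparsity penalty, so that individual activations tend to align with human-interpretable concepts. A minimal numpy sketch of the forward pass and training objective (illustrative dimensions and weights only, not the architecture of the released `autoencoder.pth`):

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_sae = 64, 256                 # overcomplete: d_sae >> d_model
W_enc = rng.normal(scale=0.1, size=(d_model, d_sae))
W_dec = rng.normal(scale=0.1, size=(d_sae, d_model))
b_enc = np.zeros(d_sae)
b_dec = np.zeros(d_model)

x = rng.normal(size=(8, d_model))        # a batch of frozen embeddings

# Forward pass: ReLU encoder yields non-negative "concept" activations.
z = np.maximum(0.0, x @ W_enc + b_enc)
x_hat = z @ W_dec + b_dec

# Training objective: reconstruction error + L1 sparsity penalty on z.
l1_coeff = 1e-3
loss = np.mean((x - x_hat) ** 2) + l1_coeff * np.abs(z).mean()
```

After training, the sparse activations `z` become the concept features on which the CBM classifier is fit in Step 3 of the pipeline below.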
Prerequisites:

```shell
bash script/automated-concept-discovery/env_setup.sh
```

Download the SAE checkpoint from Google Drive to `automated-concept-discovery-result/SAE-embeddings/`.
Quick run:

```shell
bash script/automated-concept-discovery/dermoscopic-melanoma-classification/PanDerm-v2-SAE.sh
```

Step-by-step pipeline:
```shell
# Step 1: Extract visual features
cd src
python export_visual_features.py \
    --model_name hf-hub:redlessone/PanDerm2 \
    --csv_path ../data/automated-concept-discovery/clinical-malignant/meta.csv \
    --data_root ../data/automated-concept-discovery/clinical-malignant/final_images/ \
    --img_col ImageID \
    --batch_size 2048 \
    --output_dir ../automated-concept-discovery-result/clinical-malignant/
cd ..

# Step 2: Extract SAE concepts
python automated-concept-discovery/0_extract_sae_activations.py \
    --checkpoint automated-concept-discovery-result/SAE-embeddings/autoencoder.pth \
    --embeddings automated-concept-discovery-result/clinical-malignant/all_embeddings.npy \
    --output automated-concept-discovery-result/clinical-malignant/learned_activation.npy

# Step 3: Train CBM classifier
python automated-concept-discovery/1_train_clf_binary-class.py \
    --csv data/automated-concept-discovery/clinical-malignant/meta.csv \
    --embeddings automated-concept-discovery-result/clinical-malignant/learned_activation.npy \
    --image_col ImageID \
    --output automated-concept-discovery-result/clinical-malignant/
```

Analysis tools:
- Concept Intervention: `script/automated-concept-discovery/ISIC-intervention/`
- Global Explanation: `automated-concept-discovery/global-explanation/`
- Concept Retrieval: `automated-concept-discovery/concept-retrieval/`
Results are saved to automated-concept-discovery-result/.
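Conceptually, intervention on a concept bottleneck model amounts to editing one concept activation and re-scoring with the linear head alone, since the prediction is a linear function of the concepts. A toy numpy sketch with hypothetical weights (not the trained checkpoint):

```python
import numpy as np

# Hypothetical linear CBM head over 4 concept activations.
w = np.array([2.0, -1.0, 0.5, 0.0])        # concept weights, "malignant" logit
b = -0.5
concepts = np.array([1.2, 0.3, 0.0, 0.8])  # SAE activations for one image

logit = concepts @ w + b
print(logit > 0)  # True: predicted malignant

# Intervention: suppress concept 0 (e.g. a spurious concept) and re-score.
edited = concepts.copy()
edited[0] = 0.0
print(edited @ w + b > 0)  # False: the prediction flips after the edit
```

Because the head is linear, the effect of any single-concept edit is exactly `w[i] * Δconcept`, which is what makes such interventions auditable.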
The model and associated code are released under the CC-BY-NC-ND 4.0 license and may only be used for non-commercial academic research purposes with proper attribution.
Siyuan Yan - Research Fellow, Monash University
📧 Email: [email protected]
If you find DermFM-Zero useful, please cite:
```bibtex
@misc{yan2026visionlanguagefoundationmodelzeroshot,
  title={A Vision-Language Foundation Model for Zero-shot Clinical Collaboration and Automated Concept Discovery in Dermatology},
  author={Siyuan Yan and Xieji Li and Dan Mo and Philipp Tschandl and Yiwen Jiang and Zhonghua Wang and Ming Hu and Lie Ju and Cristina Vico-Alonso and Yizhen Zheng and Jiahe Liu and Juexiao Zhou and Camilla Chello and Jen G. Cheung and Julien Anriot and Luc Thomas and Clare Primiero and Gin Tan and Aik Beng Ng and Simon See and Xiaoying Tang and Albert Ip and Xiaoyang Liao and Adrian Bowling and Martin Haskett and Shuang Zhao and Monika Janda and H. Peter Soyer and Victoria Mar and Harald Kittler and Zongyuan Ge},
  year={2026},
  eprint={2602.10624},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2602.10624}
}
```

Related work:
```bibtex
@article{yan2025multimodal,
  title={A multimodal vision foundation model for clinical dermatology},
  author={Yan, Siyuan and Yu, Zhen and Primiero, Clare and Vico-Alonso, Cristina and Wang, Zhonghua and Yang, Litao and Tschandl, Philipp and Hu, Ming and Ju, Lie and Tan, Gin and others},
  journal={Nature Medicine},
  pages={1--12},
  year={2025},
  publisher={Nature Publishing Group}
}

@inproceedings{yan2025derm1m,
  title={Derm1M: A Million-scale Vision-Language Dataset Aligned with Clinical Ontology Knowledge},
  author={Yan, Siyuan and others},
  booktitle={ICCV},
  year={2025}
}
```