DermFM-Zero

A Vision-Language Foundation Model for Dermatology

Enabling Zero-Shot Clinical Collaboration & Automated Concept Discovery

DermFM-Zero is the first multimodal foundation model to provide effective clinical decision support across primary care and specialty settings without fine-tuning. Beyond diagnosis, it unlocks emerging capabilities in automated concept discovery, advancing AI-assisted dermatology.

⚠️ Note: The model weights cannot be shared at this time due to internal IP restrictions. We plan to release an open version of the model weights in the future, no later than the time this work is accepted for publication in a peer-reviewed journal.

📘 Documentation | 🚀 Quick Start | 📊 Benchmarks | 💬 Discussion

Highlights

🏆 State-of-the-art Performance: Achieves 73.20% average accuracy across 7 zero-shot classification benchmarks

🔍 Multimodal Fusion: Supports clinical images, dermoscopic images, and patient metadata

🧠 Interpretable AI: Built-in concept discovery with Sparse Autoencoders (SAE)

🌍 Multi-center Validation: Evaluated on datasets from Austria, Brazil, Korea, Portugal, and more

Benchmark Results

DermFM-Zero demonstrates state-of-the-art performance across diverse benchmarks.

Modality: D = Dermoscopic, C = Clinical

Zero-Shot Classification Performance

Model	HAM (7-D)	PAD (6-C)	ISIC2020 (2-D)	PH2 (2-C)	SNU (134-C)	SD-128 (128-C)	Daffodil (5-D)	Average
Task	Skin Cancer	Skin Cancer	Mel Det.	Mel Det.	DDX	DDX	Rare DX	-
Country/Inst	Austria	Brazil	Multi-center	Portugal	Korea	Multi-center	Multi-center	-
Metric	ACC	ACC	AUROC	AUROC	ACC	ACC	ACC	-
CLIP-Large [1]	0.2754	0.3839	0.4772	0.3855	0.0857	0.1210	0.5304	0.3227
BiomedCLIP [2]	0.6347	0.4512	0.7305	0.8441	0.0966	0.1153	0.5785	0.4930
MONET [3]	0.3347	0.4729	0.6940	0.8370	0.1414	0.2028	0.7607	0.4919
MAKE [4]	0.4551	0.5857	0.8141	0.9095	0.3260	0.3886	0.7785	0.6082
DermLIP-ViT-B-16 [5]	0.6813	0.6074	0.8235	0.8285	0.2532	0.2783	0.7246	0.5995
DermLIP-PanDerm [5]	0.6281	0.6247	0.7876	0.7975	0.3332	0.3822	0.7812	0.6192
DermFM-Zero (Ours)	0.7957	0.6941	0.8663	0.9304	0.4450	0.5075	0.8848	0.7320

Few-Shot Learning (10% training data)

Evaluation with limited labeled data to assess data efficiency and representation quality.

Model	HAM (7-class)	ISIC'20 (Melanoma)	PAD (6-class)	SD-128 (128-class)	Average
Task	Skin Cancer	Mel Det.	Skin Cancer	DDX	-
Metric	ACC	AUROC	ACC	ACC	-
CLIP [1]	0.7798	0.7828	0.6161	0.3146	0.6233
BiomedCLIP [2]	0.6959	0.4318	0.6499	0.2541	0.5079
MONET [3]	0.8064	0.8036	0.6464	0.2747	0.6328
BiomedGPT [6]	0.7565	0.7838	0.5249	0.1694	0.5586
PanDerm [7]	0.7898	0.8417	0.6508	0.3483	0.6577
DermLIP-ViT-B-16 [5]	0.8157	0.8058	0.6594	0.3552	0.6590
DermLIP-PanDerm [5]	0.8184	0.8707	0.6529	0.3637	0.6764
MAKE [4]	0.8257	0.7813	0.6790	0.3986	0.6712
DINOv3-ViT-L16 [8]	0.7705	0.8310	0.6573	0.3018	0.6401
DINOv3-ViT-7B [8]	0.7871	0.8226	0.6985	0.3345	0.6607
DermFM-Zero (Ours)	0.8416	0.8687	0.6855	0.4007	0.6991

Zero-Shot Cross-Modal Retrieval (Mean Recall)

Evaluated on Derm1M validation set (n = 9,806) and SkinCap (n = 4,000).

Model	Derm1M I→T	Derm1M T→I	SkinCap I→T	SkinCap T→I	Average
CLIP-Large [1]	0.122	0.104	0.174	0.127	0.132
BiomedCLIP [2]	0.188	0.179	0.187	0.175	0.182
MONET [3]	0.171	0.159	0.215	0.203	0.187
DermFM-Zero (Ours)	0.457	0.454	0.369	0.349	0.407

Repository Structure

DermFM-Zero/
├── src/                              # Core models and modules
├── script/                           # Experiment scripts
├── data/                             # Dataset storage
├── automated-concept-discovery/      # SAE & CBM implementation
├── linear_probe/                     # Linear probe utilities
├── multimodal_finetune/              # Multimodal training code
├── requirements.txt                  # Python dependencies
└── README.md                         # Documentation

Quick Start

Installation

git clone [email protected]:SiyuanYan1/DermFM-Zero.git
cd DermFM-Zero

conda create -n panderm python=3.9.20
conda activate panderm
pip install -r requirements.txt

Download Data

Download benchmark data from Google Drive and unzip to the data/ folder.

Expected directory structure:

data/
├── zero-shot-classification/
├── zero-shot-retrieval/
├── linear_probe/
├── multimodal_finetune/
└── automated-concept-discovery/

Quick Example

See our interactive notebook for zero-shot disease classification.

Evaluation Tasks

Task1: Zero-shot Classification

Evaluate DermFM-Zero on 7 dermatology datasets without fine-tuning.

Benchmark datasets: HAM, PAD, ISIC2020, PH2, SNU, SD-128, Daffodil

# Quick run
bash script/zero-shot-eval/PanDerm-v2-zs-classification.sh

# Or detailed command
python src/main.py \
   --val-data="" \
   --dataset-type "csv" \
   --batch-size=1024 \
   --zeroshot-eval1=data/zero-shot-classification/pad-zero-shot-test.csv \
   --zeroshot-eval2=data/zero-shot-classification/HAM-official-7-zero-shot-test.csv \
   --zeroshot-eval3=data/zero-shot-classification/snu-134-zero-shot-test.csv \
   --zeroshot-eval4=data/zero-shot-classification/sd-128-zero-shot-test.csv \
   --zeroshot-eval5=data/zero-shot-classification/daffodil-5-zero-shot-test.csv \
   --zeroshot-eval6=data/zero-shot-classification/ph2-2-zero-shot-test.csv \
   --zeroshot-eval7=data/zero-shot-classification/isic2020-2-zero-shot-test.csv \
   --csv-label-key label \
   --csv-img-key image_path \
   --model 'hf-hub:redlessone/PanDerm2'

Custom Dataset Evaluation

Prepare a CSV file:

image_path,label,diag
examples/image1.png,0,melanoma
examples/image2.png,1,nevus

Configure class names in src/open_clip/zero_shot_metadata.py:

customized_CLASSNAMES = ['melanoma', 'nevus', 'basal cell carcinoma']

Run evaluation:

python src/main.py \
   --dataset-type csv \
   --batch-size 1024 \
   --csv-label-key label \
   --csv-img-key image_path \
   --zeroshot_eval_custom your_data.csv \
   --model 'hf-hub:redlessone/PanDerm2'

Task2: Zero-shot Cross-modal Retrieval

Evaluate image-text retrieval performance on Derm1M Hold-out and SkinCAP datasets.

bash script/zero-shot-eval/PanDerm-v2-zs-retrieval.sh

Task3: Linear Probing

Evaluate feature quality by training linear classifiers on frozen features.

Datasets: HAM, ISIC2020, PAD, SD-128

bash script/linear-probe/PanDerm-v2-lp-eval.sh

Task4: Multimodal Finetuning

Fine-tune DermFM-Zero with clinical images, dermoscopic images, and patient metadata.

Dataset modalities:

Derm7pt: Clinical + Dermoscopic + Metadata
MILK-11: Clinical + Dermoscopic
PAD-UFES-20: Clinical + Metadata

cd multimodal_finetune

# Choose dataset
bash ../script/multimodal_finetune/Derm7pt\(C+D+M\).sh
bash ../script/multimodal_finetune/MILK11\(C+D\).sh  
bash ../script/multimodal_finetune/PAD\(C+M\).sh

Key hyperparameters:

--model_name: Base model (e.g., PanDerm-v2)
--dataset_name: Target dataset (Derm7pt, MILK-11, PAD)
--epochs: Training epochs (default: 50)
--batch_size: Batch size per GPU (default: 32)
--learning_rate: Learning rate (default: 1e-5)
--use_cli, --use_derm, --use_meta: Enable modalities

Metadata is converted to text prompts - see multimodal_finetune/dataset/prompt.py.

Results are saved to multimodal_finetune-result/.

Task5: Automated Concept Discovery

Discover interpretable concepts using Sparse Autoencoders (SAE) and build Concept Bottleneck Models (CBM).

Prerequisites:

bash script/automated-concept-discovery/env_setup.sh

Download SAE checkpoint from Google Drive to automated-concept-discovery-result/SAE-embeddings/.

Quick run:

bash script/automated-concept-discovery/dermoscopic-melanoma-classification/PanDerm-v2-SAE.sh

Step-by-step pipeline:

# Step 1: Extract visual features
cd src
python export_visual_features.py \
    --model_name hf-hub:redlessone/PanDerm2 \
    --csv_path ../data/automated-concept-discovery/clinical-malignant/meta.csv \
    --data_root ../data/automated-concept-discovery/clinical-malignant/final_images/ \
    --img_col ImageID \
    --batch_size 2048 \
    --output_dir ../automated-concept-discovery-result/clinical-malignant/
cd ..

# Step 2: Extract SAE concepts
python automated-concept-discovery/0_extract_sae_activations.py \
  --checkpoint automated-concept-discovery-result/SAE-embeddings/autoencoder.pth \
  --embeddings automated-concept-discovery-result/clinical-malignant/all_embeddings.npy \
  --output automated-concept-discovery-result/clinical-malignant/learned_activation.npy

# Step 3: Train CBM classifier
python automated-concept-discovery/1_train_clf_binary-class.py \
  --csv data/automated-concept-discovery/clinical-malignant/meta.csv \
  --embeddings automated-concept-discovery-result/clinical-malignant/learned_activation.npy \
  --image_col ImageID \
  --output automated-concept-discovery-result/clinical-malignant/

Analysis tools:

Concept Intervention: script/automated-concept-discovery/ISIC-intervention/
Global Explanation: automated-concept-discovery/global-explanation/
Concept Retrieval: automated-concept-discovery/concept-retrieval/

Results are saved to automated-concept-discovery-result/.

Contributors

License

The model and associated code are released under the CC-BY-NC-ND 4.0 license and may only be used for non-commercial academic research purposes with proper attribution.

Contact

Siyuan Yan - Research Fellow, Monash University
📧 Email: [email protected]

Citation

If you find DermFM-Zero useful, please cite:

@misc{yan2026visionlanguagefoundationmodelzeroshot,
      title={A Vision-Language Foundation Model for Zero-shot Clinical Collaboration and Automated Concept Discovery in Dermatology}, 
      author={Siyuan Yan and Xieji Li and Dan Mo and Philipp Tschandl and Yiwen Jiang and Zhonghua Wang and Ming Hu and Lie Ju and Cristina Vico-Alonso and Yizhen Zheng and Jiahe Liu and Juexiao Zhou and Camilla Chello and Jen G. Cheung and Julien Anriot and Luc Thomas and Clare Primiero and Gin Tan and Aik Beng Ng and Simon See and Xiaoying Tang and Albert Ip and Xiaoyang Liao and Adrian Bowling and Martin Haskett and Shuang Zhao and Monika Janda and H. Peter Soyer and Victoria Mar and Harald Kittler and Zongyuan Ge},
      year={2026},
      eprint={2602.10624},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2602.10624}, 
}

Related work:

@article{yan2025multimodal,
  title={A multimodal vision foundation model for clinical dermatology},
  author={Yan, Siyuan and Yu, Zhen and Primiero, Clare and Vico-Alonso, Cristina and Wang, Zhonghua and Yang, Litao and Tschandl, Philipp and Hu, Ming and Ju, Lie and Tan, Gin and others},
  journal={Nature Medicine},
  pages={1--12},
  year={2025},
  publisher={Nature Publishing Group}
}

@inproceedings{yan2025derm1m,
  title={Derm1M: A Million-scale Vision-Language Dataset Aligned with Clinical Ontology Knowledge},
  author={Yan, Siyuan and others},
  booktitle={ICCV},
  year={2025}
}

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

DermFM-Zero

A Vision-Language Foundation Model for Dermatology

📑 Table of Contents

Highlights

Benchmark Results

Zero-Shot Classification Performance

Few-Shot Learning (10% training data)

Zero-Shot Cross-Modal Retrieval (Mean Recall)

Repository Structure

Quick Start

Installation

Download Data

Quick Example

Evaluation Tasks

Task1: Zero-shot Classification

Task2: Zero-shot Cross-modal Retrieval

Task3: Linear Probing

Task4: Multimodal Finetuning

Task5: Automated Concept Discovery

Contributors

License

Contact

Citation

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 68 Commits
automated-concept-discovery		automated-concept-discovery
examples		examples
linear_probe		linear_probe
multimodal_finetune		multimodal_finetune
script		script
src		src
.gitignore		.gitignore
README.md		README.md
requirements.txt		requirements.txt

Folders and files

Latest commit

History

Repository files navigation

DermFM-Zero

A Vision-Language Foundation Model for Dermatology

📑 Table of Contents

Highlights

Benchmark Results

Zero-Shot Classification Performance

Few-Shot Learning (10% training data)

Zero-Shot Cross-Modal Retrieval (Mean Recall)

Repository Structure

Quick Start

Installation

Download Data

Quick Example

Evaluation Tasks

Task1: Zero-shot Classification

Task2: Zero-shot Cross-modal Retrieval

Task3: Linear Probing

Task4: Multimodal Finetuning

Task5: Automated Concept Discovery

Contributors

License

Contact

Citation

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages