Skip to content

SiyuanYan1/DermFM-Zero

Repository files navigation

DermFM-Zero

A Vision-Language Foundation Model for Dermatology

Enabling Zero-Shot Clinical Collaboration & Automated Concept Discovery


DermFM-Zero is the first multimodal foundation model to provide effective clinical decision support across primary care and specialty settings without fine-tuning. Beyond diagnosis, it unlocks emerging capabilities in automated concept discovery, advancing AI-assisted dermatology.

⚠️ Note: The model weights cannot be shared at this time due to internal IP restrictions. We plan to release an open version of the model weights in the future, no later than the time this work is accepted for publication in a peer-reviewed journal.

Paper Model License Python 3.9+

📘 Documentation | 🚀 Quick Start | 📊 Benchmarks | 💬 Discussion

📑 Table of Contents

Highlights

🏆 State-of-the-art Performance: Achieves 73.20% average accuracy across 7 zero-shot classification benchmarks

🔍 Multimodal Fusion: Supports clinical images, dermoscopic images, and patient metadata

🧠 Interpretable AI: Built-in concept discovery with Sparse Autoencoders (SAE)

🌍 Multi-center Validation: Evaluated on datasets from Austria, Brazil, Korea, Portugal, and more

Benchmark Results

DermFM-Zero demonstrates state-of-the-art performance across diverse benchmarks.

Modality: D = Dermoscopic, C = Clinical

Zero-Shot Classification Performance

Model HAM
(7-D)
PAD
(6-C)
ISIC2020
(2-D)
PH2
(2-C)
SNU
(134-C)
SD-128
(128-C)
Daffodil
(5-D)
Average
Task Skin Cancer Skin Cancer Mel Det. Mel Det. DDX DDX Rare DX -
Country/Inst Austria Brazil Multi-center Portugal Korea Multi-center Multi-center -
Metric ACC ACC AUROC AUROC ACC ACC ACC -
CLIP-Large [1] 0.2754 0.3839 0.4772 0.3855 0.0857 0.1210 0.5304 0.3227
BiomedCLIP [2] 0.6347 0.4512 0.7305 0.8441 0.0966 0.1153 0.5785 0.4930
MONET [3] 0.3347 0.4729 0.6940 0.8370 0.1414 0.2028 0.7607 0.4919
MAKE [4] 0.4551 0.5857 0.8141 0.9095 0.3260 0.3886 0.7785 0.6082
DermLIP-ViT-B-16 [5] 0.6813 0.6074 0.8235 0.8285 0.2532 0.2783 0.7246 0.5995
DermLIP-PanDerm [5] 0.6281 0.6247 0.7876 0.7975 0.3332 0.3822 0.7812 0.6192
DermFM-Zero (Ours) 0.7957 0.6941 0.8663 0.9304 0.4450 0.5075 0.8848 0.7320

Few-Shot Learning (10% training data)

Evaluation with limited labeled data to assess data efficiency and representation quality.

Model HAM
(7-class)
ISIC'20
(Melanoma)
PAD
(6-class)
SD-128
(128-class)
Average
Task Skin Cancer Mel Det. Skin Cancer DDX -
Metric ACC AUROC ACC ACC -
CLIP [1] 0.7798 0.7828 0.6161 0.3146 0.6233
BiomedCLIP [2] 0.6959 0.4318 0.6499 0.2541 0.5079
MONET [3] 0.8064 0.8036 0.6464 0.2747 0.6328
BiomedGPT [6] 0.7565 0.7838 0.5249 0.1694 0.5586
PanDerm [7] 0.7898 0.8417 0.6508 0.3483 0.6577
DermLIP-ViT-B-16 [5] 0.8157 0.8058 0.6594 0.3552 0.6590
DermLIP-PanDerm [5] 0.8184 0.8707 0.6529 0.3637 0.6764
MAKE [4] 0.8257 0.7813 0.6790 0.3986 0.6712
DINOv3-ViT-L16 [8] 0.7705 0.8310 0.6573 0.3018 0.6401
DINOv3-ViT-7B [8] 0.7871 0.8226 0.6985 0.3345 0.6607
DermFM-Zero (Ours) 0.8416 0.8687 0.6855 0.4007 0.6991

Zero-Shot Cross-Modal Retrieval (Mean Recall)

Evaluated on Derm1M validation set (n = 9,806) and SkinCap (n = 4,000).

Model Derm1M
I→T
Derm1M
T→I
SkinCap
I→T
SkinCap
T→I
Average
CLIP-Large [1] 0.122 0.104 0.174 0.127 0.132
BiomedCLIP [2] 0.188 0.179 0.187 0.175 0.182
MONET [3] 0.171 0.159 0.215 0.203 0.187
DermFM-Zero (Ours) 0.457 0.454 0.369 0.349 0.407

Repository Structure

DermFM-Zero/
├── src/                              # Core models and modules
├── script/                           # Experiment scripts
├── data/                             # Dataset storage
├── automated-concept-discovery/      # SAE & CBM implementation
├── linear_probe/                     # Linear probe utilities
├── multimodal_finetune/              # Multimodal training code
├── requirements.txt                  # Python dependencies
└── README.md                         # Documentation

Quick Start

Installation

git clone [email protected]:SiyuanYan1/DermFM-Zero.git
cd DermFM-Zero

conda create -n panderm python=3.9.20
conda activate panderm
pip install -r requirements.txt

Download Data

Download benchmark data from Google Drive and unzip to the data/ folder.

Expected directory structure:

data/
├── zero-shot-classification/
├── zero-shot-retrieval/
├── linear_probe/
├── multimodal_finetune/
└── automated-concept-discovery/

Quick Example

See our interactive notebook for zero-shot disease classification.

Evaluation Tasks

Task1: Zero-shot Classification

Evaluate DermFM-Zero on 7 dermatology datasets without fine-tuning.

Benchmark datasets: HAM, PAD, ISIC2020, PH2, SNU, SD-128, Daffodil

# Quick run
bash script/zero-shot-eval/PanDerm-v2-zs-classification.sh

# Or detailed command
python src/main.py \
   --val-data="" \
   --dataset-type "csv" \
   --batch-size=1024 \
   --zeroshot-eval1=data/zero-shot-classification/pad-zero-shot-test.csv \
   --zeroshot-eval2=data/zero-shot-classification/HAM-official-7-zero-shot-test.csv \
   --zeroshot-eval3=data/zero-shot-classification/snu-134-zero-shot-test.csv \
   --zeroshot-eval4=data/zero-shot-classification/sd-128-zero-shot-test.csv \
   --zeroshot-eval5=data/zero-shot-classification/daffodil-5-zero-shot-test.csv \
   --zeroshot-eval6=data/zero-shot-classification/ph2-2-zero-shot-test.csv \
   --zeroshot-eval7=data/zero-shot-classification/isic2020-2-zero-shot-test.csv \
   --csv-label-key label \
   --csv-img-key image_path \
   --model 'hf-hub:redlessone/PanDerm2'

Custom Dataset Evaluation

Prepare a CSV file:

image_path,label,diag
examples/image1.png,0,melanoma
examples/image2.png,1,nevus

Configure class names in src/open_clip/zero_shot_metadata.py:

customized_CLASSNAMES = ['melanoma', 'nevus', 'basal cell carcinoma']

Run evaluation:

python src/main.py \
   --dataset-type csv \
   --batch-size 1024 \
   --csv-label-key label \
   --csv-img-key image_path \
   --zeroshot_eval_custom your_data.csv \
   --model 'hf-hub:redlessone/PanDerm2'

Task2: Zero-shot Cross-modal Retrieval

Evaluate image-text retrieval performance on Derm1M Hold-out and SkinCAP datasets.

bash script/zero-shot-eval/PanDerm-v2-zs-retrieval.sh

Task3: Linear Probing

Evaluate feature quality by training linear classifiers on frozen features.

Datasets: HAM, ISIC2020, PAD, SD-128

bash script/linear-probe/PanDerm-v2-lp-eval.sh

Task4: Multimodal Finetuning

Fine-tune DermFM-Zero with clinical images, dermoscopic images, and patient metadata.

Dataset modalities:

  • Derm7pt: Clinical + Dermoscopic + Metadata
  • MILK-11: Clinical + Dermoscopic
  • PAD-UFES-20: Clinical + Metadata
cd multimodal_finetune

# Choose dataset
bash ../script/multimodal_finetune/Derm7pt\(C+D+M\).sh
bash ../script/multimodal_finetune/MILK11\(C+D\).sh  
bash ../script/multimodal_finetune/PAD\(C+M\).sh

Key hyperparameters:

  • --model_name: Base model (e.g., PanDerm-v2)
  • --dataset_name: Target dataset (Derm7pt, MILK-11, PAD)
  • --epochs: Training epochs (default: 50)
  • --batch_size: Batch size per GPU (default: 32)
  • --learning_rate: Learning rate (default: 1e-5)
  • --use_cli, --use_derm, --use_meta: Enable modalities

Metadata is converted to text prompts - see multimodal_finetune/dataset/prompt.py.

Results are saved to multimodal_finetune-result/.

Task5: Automated Concept Discovery

Discover interpretable concepts using Sparse Autoencoders (SAE) and build Concept Bottleneck Models (CBM).

Prerequisites:

bash script/automated-concept-discovery/env_setup.sh

Download SAE checkpoint from Google Drive to automated-concept-discovery-result/SAE-embeddings/.

Quick run:

bash script/automated-concept-discovery/dermoscopic-melanoma-classification/PanDerm-v2-SAE.sh

Step-by-step pipeline:

# Step 1: Extract visual features
cd src
python export_visual_features.py \
    --model_name hf-hub:redlessone/PanDerm2 \
    --csv_path ../data/automated-concept-discovery/clinical-malignant/meta.csv \
    --data_root ../data/automated-concept-discovery/clinical-malignant/final_images/ \
    --img_col ImageID \
    --batch_size 2048 \
    --output_dir ../automated-concept-discovery-result/clinical-malignant/
cd ..

# Step 2: Extract SAE concepts
python automated-concept-discovery/0_extract_sae_activations.py \
  --checkpoint automated-concept-discovery-result/SAE-embeddings/autoencoder.pth \
  --embeddings automated-concept-discovery-result/clinical-malignant/all_embeddings.npy \
  --output automated-concept-discovery-result/clinical-malignant/learned_activation.npy

# Step 3: Train CBM classifier
python automated-concept-discovery/1_train_clf_binary-class.py \
  --csv data/automated-concept-discovery/clinical-malignant/meta.csv \
  --embeddings automated-concept-discovery-result/clinical-malignant/learned_activation.npy \
  --image_col ImageID \
  --output automated-concept-discovery-result/clinical-malignant/

Analysis tools:

Results are saved to automated-concept-discovery-result/.

Contributors

License

The model and associated code are released under the CC-BY-NC-ND 4.0 license and may only be used for non-commercial academic research purposes with proper attribution.

Contact

Siyuan Yan - Research Fellow, Monash University
📧 Email: [email protected]

Citation

If you find DermFM-Zero useful, please cite:

@misc{yan2026visionlanguagefoundationmodelzeroshot,
      title={A Vision-Language Foundation Model for Zero-shot Clinical Collaboration and Automated Concept Discovery in Dermatology}, 
      author={Siyuan Yan and Xieji Li and Dan Mo and Philipp Tschandl and Yiwen Jiang and Zhonghua Wang and Ming Hu and Lie Ju and Cristina Vico-Alonso and Yizhen Zheng and Jiahe Liu and Juexiao Zhou and Camilla Chello and Jen G. Cheung and Julien Anriot and Luc Thomas and Clare Primiero and Gin Tan and Aik Beng Ng and Simon See and Xiaoying Tang and Albert Ip and Xiaoyang Liao and Adrian Bowling and Martin Haskett and Shuang Zhao and Monika Janda and H. Peter Soyer and Victoria Mar and Harald Kittler and Zongyuan Ge},
      year={2026},
      eprint={2602.10624},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2602.10624}, 
}

Related work:

@article{yan2025multimodal,
  title={A multimodal vision foundation model for clinical dermatology},
  author={Yan, Siyuan and Yu, Zhen and Primiero, Clare and Vico-Alonso, Cristina and Wang, Zhonghua and Yang, Litao and Tschandl, Philipp and Hu, Ming and Ju, Lie and Tan, Gin and others},
  journal={Nature Medicine},
  pages={1--12},
  year={2025},
  publisher={Nature Publishing Group}
}
@inproceedings{yan2025derm1m,
  title={Derm1M: A Million-scale Vision-Language Dataset Aligned with Clinical Ontology Knowledge},
  author={Yan, Siyuan and others},
  booktitle={ICCV},
  year={2025}
}

About

Next-generation dermatology FM

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors