
Disentangled Language-Image-Label Pre-training


A Reality Check of Vision-Language Pre-training in Radiology: Have We Progressed Using Text?
📜 Information Processing in Medical Imaging
Julio Silva-Rodríguez, Jose Dolz, Ismail Ben Ayed ⋅ ÉTS Montréal
| Project | Conference | ArXiv | Code | Tutorials |

In this work, we focus on pre-training large-scale vision models for chest X-ray (CXR) understanding. In this domain, raw datasets naturally provide text supervision through medical reports. However, only a few datasets include such reports, as fine-grained labels - obtained with entity-extraction methods - are the most popular form of supervision. Moreover, the current literature focuses mostly on vision-language pre-training, which might struggle to incorporate label information and thus fail to scale properly when integrating additional datasets. In contrast, we propose:

  • Unimodal pre-training using image-label information.
  • Disentangled Language-Image-Label Pre-training, DLILP, which separately aligns image-text and image-label supervision.
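
Conceptually, the disentangled objective keeps two separate alignment terms: a standard image-text contrastive loss for samples with reports, and an image-label alignment term in which all images sharing a finding are positives. Below is a minimal numpy sketch of this idea; the function names, the prototype-based label term, and all shapes are illustrative assumptions, not the repository's actual implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def image_text_loss(img, txt, tau=0.07):
    # CLIP-style InfoNCE: each image is positive only with its own report.
    logits = img @ txt.T / tau
    p = softmax(logits, axis=1)
    n = len(p)
    return -np.mean(np.log(p[np.arange(n), np.arange(n)]))

def image_label_loss(img, proto, labels, tau=0.07):
    # Label-space alignment: each image is pulled toward the prototypes
    # of all findings it carries (soft, multi-positive targets).
    logits = img @ proto.T / tau
    p = softmax(logits, axis=1)
    targets = labels / labels.sum(axis=1, keepdims=True)
    return -np.mean((targets * np.log(p)).sum(axis=1))

rng = np.random.default_rng(0)
img = rng.normal(size=(4, 8)); img /= np.linalg.norm(img, axis=1, keepdims=True)
txt = rng.normal(size=(4, 8)); txt /= np.linalg.norm(txt, axis=1, keepdims=True)
proto = rng.normal(size=(3, 8)); proto /= np.linalg.norm(proto, axis=1, keepdims=True)
labels = np.array([[1, 0, 0], [1, 1, 0], [0, 0, 1], [0, 1, 0]], dtype=float)

# The two terms are computed on disjoint supervision signals and summed.
total = image_text_loss(img, txt) + image_label_loss(img, proto, labels)
print(round(float(total), 3))
```

Because the two terms never share a target matrix, text-supervised and label-only datasets can be mixed in one pre-training run.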

Install DLILP

  • Install a torch version compatible with your GPU in your environment. For example:
conda create -n dlilp python=3.11 -y
conda activate dlilp
pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 --index-url https://download.pytorch.org/whl/cu124
  • Install the DLILP library (only needed for basic model usage):
pip install git+https://github.com/jusiro/DLILP.git

Usage

from PIL import Image
import numpy as np

# Import Chest X-ray VLM
from dlilp import VLMModel

# Set model
model = VLMModel.from_pretrained("jusiro2/DLILP_CMP")

# There are several other available weights, with models pre-trained on:
# CheXpert (C), MIMIC (M), and PadChest (P).
# "jusiro2/DLILP_CMP" - "jusiro2/DLILP_CM" - "jusiro2/DLILP_M" - "jusiro2/DLILP_C"
# "jusiro2/CXR_Unimodal_CMP" - "jusiro2/CXR_Unimodal_CM" - ...
# "jusiro2/CONVIRT". 

# Load image and set target categories 
# (if the repo is not cloned, download the image and change the path!)

image = np.array(Image.open("./DLILP/local_data/media/sample_bronchopneumonia.png"))[:,:,0:3]
text = ["normal", "no finding", "pneumonia", "osteopenia", "calcified adenopathy",
        "broncho-pneumonia", "opacities"]
        
# Forward DLILP model using the visual-textual projection
model.caption = "[CLS]"
probs, logits = model(image, text)

print("Image-Text similarities:")
print(logits.round(3)) # [[ 0.818  1.768  7.617 -1.306  1.051  4.63   5.374]]
print("Probabilities:")
print(probs.round(3))  # [[ 0.001  0.002  0.861  0.     0.001  0.043  0.091]]
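
To turn the returned probabilities into a predicted category, take the argmax over the candidate list. A minimal post-processing sketch, reusing the probabilities printed in the example run above:

```python
import numpy as np

text = ["normal", "no finding", "pneumonia", "osteopenia", "calcified adenopathy",
        "broncho-pneumonia", "opacities"]
# Probabilities as printed in the example run above.
probs = np.array([[0.001, 0.002, 0.861, 0.0, 0.001, 0.043, 0.091]])

# Highest-probability category for each image in the batch.
pred = text[int(probs.argmax(axis=-1)[0])]
print(pred)  # pneumonia
```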

Pre-training and transferability

In the following, we present the scripts for model pre-training and transferability. To use them, we recommend cloning the whole repository.

git clone https://github.com/jusiro/DLILP.git
cd DLILP
pip install -r requirements.txt

📦 Datasets

  1. Download datasets. Please check ./local_data/datasets/README.md for the expected dataset structure, links, and instructions.
  2. Define the relative paths for datasets in ./local_data/constants.py.
  3. Create dataset partitions for pre-training and transferability. Follow the instructions at ./local_data/partitions/README.md, and execute python ./local_data/partitions/partitions.py.

📦 Foundation model pre-training

  • Contrastive language-image pretraining - CLIP.
python main_pretrain.py --learning_criteria clip --exp_id clip_CM --datasets CheXpert-train-frontal,MIMIC-CXR-2-train-frontal
  • Unified contrastive representations in the label space - UniCL.
python main_pretrain.py --learning_criteria unicl --exp_id unicl_CM --datasets CheXpert-train-frontal,MIMIC-CXR-2-train-frontal
  • Unimodal (only vision) pre-training.
python main_pretrain.py --learning_criteria unimodal --exp_id unimodal_CM --datasets CheXpert-train-frontal,MIMIC-CXR-2-train-frontal
  • Disentangled language-image-label pre-training - DLILP.
python main_pretrain.py --learning_criteria dlilp --exp_id dlilp_CM --datasets CheXpert-train-frontal,MIMIC-CXR-2-train-frontal
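
The main difference between the CLIP and UniCL criteria above lies in the contrastive target matrix. The toy numpy sketch below illustrates that difference; the multi-hot labels and normalization are illustrative assumptions, not the repository's exact construction.

```python
import numpy as np

# Multi-hot labels for a toy batch of 4 images over 3 findings.
labels = np.array([[1, 0, 0],
                   [1, 0, 0],
                   [0, 1, 0],
                   [0, 0, 1]])

# CLIP targets: only the paired report is positive (identity matrix).
clip_targets = np.eye(len(labels))

# UniCL targets: any pair of samples sharing at least one label is
# positive, and each row is normalized into a distribution.
share = (labels @ labels.T > 0).astype(float)
unicl_targets = share / share.sum(axis=1, keepdims=True)

# Images 0 and 1 share the first finding -> row 0 is [0.5, 0.5, 0, 0].
print(unicl_targets[0])
```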

📦 Pre-trained weights download

We provide our pre-trained weights in the following LINK. You can manually download the weights and store them at ./dlilp/modeling/pretrained_weights/[ID].pth. The [ID] follows the pattern "method_dataset": e.g., dlilp_MCP stands for the DLILP pre-training strategy using the MIMIC (M), CheXpert (C), and PadChest (P) datasets.
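
The "method_dataset" naming can be decoded programmatically. A small illustrative sketch (the helper and dictionary below are not part of the repository):

```python
# Weight IDs follow "method_dataset", e.g. "dlilp_MCP".
weight_id = "dlilp_MCP"

# rsplit keeps multi-part method names (e.g. "CXR_Unimodal_CMP") intact.
method, datasets = weight_id.rsplit("_", 1)
codes = {"M": "MIMIC", "C": "CheXpert", "P": "PadChest"}

print(method, [codes[c] for c in datasets])  # dlilp ['MIMIC', 'CheXpert', 'PadChest']
```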

📦 Transferability to downstream tasks/domains

  • Zero-shot
python main_transferability.py --experiment chexpert_5x200 --method zero_shot --load_weights True --ensemble True --shots_train 80% --shots_test 20% --folds 5 
python main_transferability.py --experiment mimic_5x200 --method zero_shot --load_weights True --ensemble True --shots_train 80% --shots_test 20%  --folds 5 
python main_transferability.py --experiment covid_train --method zero_shot --load_weights True --ensemble True --shots_train 100% --shots_test 0% --experiment_test covid_test --folds 1 
python main_transferability.py --experiment rsna_pneumonia_train --method zero_shot --load_weights True --ensemble True --shots_train 100% --shots_test 0% --experiment_test rsna_pneumonia_test --folds 1 
  • Linear Probing
python main_transferability.py --experiment chexpert_5x200 --method lp --load_weights True --ensemble True --shots_train 16 --shots_test 20% --folds 5 
python main_transferability.py --experiment mimic_5x200 --method lp --load_weights True --ensemble True --shots_train 16 --shots_test 20%  --folds 5 
python main_transferability.py --experiment covid_train --method lp --load_weights True --ensemble True --shots_train 16 --shots_test 0% --experiment_test covid_test --folds 5
python main_transferability.py --experiment rsna_pneumonia_train --method lp --load_weights True --ensemble True --shots_train 16 --shots_test 0% --experiment_test rsna_pneumonia_test --folds 5  
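
Linear probing fits a simple linear classifier on top of frozen vision features. As a rough illustration of what `--method lp --shots_train 16` does, here is a numpy-only sketch using a closed-form ridge probe on synthetic features; the shapes, the regularizer, and the synthetic data are all assumptions for the example, not the repository's solver.

```python
import numpy as np

rng = np.random.default_rng(0)

# Frozen image features: 16 shots per class for a toy binary task.
n_per_class, dim = 16, 32
X = np.vstack([rng.normal(loc=-1.0, size=(n_per_class, dim)),
               rng.normal(loc=+1.0, size=(n_per_class, dim))])
y = np.array([0] * n_per_class + [1] * n_per_class)

# One-hot targets and a closed-form ridge solution:
# W = (X^T X + lam * I)^-1 X^T Y
Y = np.eye(2)[y]
lam = 1.0
W = np.linalg.solve(X.T @ X + lam * np.eye(dim), X.T @ Y)

# Evaluate the probe on held-out samples from the same distribution.
Xt = np.vstack([rng.normal(loc=-1.0, size=(8, dim)),
                rng.normal(loc=+1.0, size=(8, dim))])
yt = np.array([0] * 8 + [1] * 8)
acc = float(((Xt @ W).argmax(axis=1) == yt).mean())
print(acc)
```

Because the backbone stays frozen, the probe's accuracy isolates how transferable the pre-trained features are.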

📦 Using other pre-trained models

We have also prepared the framework for evaluating linear probing of recently released models. Note that some details, such as image normalization or input size, might vary. We present some example calls below; additional ones are included at ./local_data/examples/sota.txt. If interested, please refer to the MedKLIP, KED, BioVIL, GlorIA, MedCLIP, or CXR-CLIP repositories and place their ResNet-50 weights at ./dlilp/modeling/pretrained_weights/other/.

# MedKLIP (ICCV23)
python main_transferability.py --experiment chexpert_5x200 --method lp --shots_train 16 --shots_test 20% --folds 5 --norm True --size 224 --weights_path ./dlilp/modeling/pretrained_weights/other/medklip.pth
# KED (Nat.Comm.23)
python main_transferability.py --experiment chexpert_5x200 --method lp --shots_train 16 --shots_test 20% --folds 5 --norm True --size 224 --weights_path ./dlilp/modeling/pretrained_weights/other/KAD.pt
# BioVIL (Nat.Comm.23)
python main_transferability.py --experiment chexpert_5x200 --method lp --shots_train 16 --shots_test 20% --folds 5 --norm False --size 512 --weights_path ./dlilp/modeling/pretrained_weights/other/biovil.pt
# GlorIA (ICCV21)
python main_transferability.py --experiment chexpert_5x200 --method lp --shots_train 16 --shots_test 20% --folds 5 --norm True --size 224 --weights_path ./dlilp/modeling/pretrained_weights/other/gloria_weights.ckpt
# MedCLIP (EMNLP22)
python main_transferability.py --experiment chexpert_5x200 --method lp --shots_train 16 --shots_test 20% --folds 5 --norm True --size 224 --weights_path ./dlilp/modeling/pretrained_weights/other/medclip_weights.bin
# CXR-CLIP (MICCAI23)
python main_transferability.py --experiment chexpert_5x200 --method lp --shots_train 16 --shots_test 20% --folds 5 --norm True --size 224 --weights_path ./dlilp/modeling/pretrained_weights/other/cxr-clip.tar

Citation

If you find this repository useful, please consider citing this paper:

@inproceedings{dlilp,
    title={A Reality Check of Vision-Language Pre-training in Radiology: Have We Progressed Using Text?},
    author={Julio Silva-Rodríguez and Jose Dolz and Ismail {Ben Ayed}},
    booktitle={Information Processing in Medical Imaging (IPMI)},
    year={2025}
}
