
Disentangled Language-Image-Label Pre-training


A Reality Check of Vision-Language Pre-training in Radiology: Have We Progressed Using Text?
📜 Information Processing in Medical Imaging
Julio Silva-Rodríguez, Jose Dolz, Ismail Ben Ayed ⋅ ÉTS Montréal
| Project | Conference | ArXiv | Code | Tutorials |

In this work, we focus on pre-training large-scale vision models for chest X-ray (CXR) understanding. In this domain, raw datasets naturally provide text supervision through medical reports. However, only a few datasets include such reports, as fine-grained labels - obtained with entity-extraction methods - are the most popular form of supervision. Moreover, the current literature focuses mostly on vision-language pre-training, which might struggle to incorporate label information and thus fail to scale properly when integrating additional datasets. In contrast, we propose:

  • Unimodal pre-training using image-label information.
  • Disentangled Language-Image-Label Pre-training, DLILP, which separately aligns image-text and image-label supervision.
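
Conceptually, the disentangled objective keeps two separate alignment terms: a standard image-text contrastive loss for samples with reports, and an image-label alignment term in which all images sharing a finding are positives. Below is a minimal numpy sketch of this idea; the function names, the prototype-based label term, and all shapes are illustrative assumptions, not the repository's actual implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def image_text_loss(img, txt, tau=0.07):
    # CLIP-style InfoNCE: each image is positive only with its own report.
    logits = img @ txt.T / tau
    p = softmax(logits, axis=1)
    n = len(p)
    return -np.mean(np.log(p[np.arange(n), np.arange(n)]))

def image_label_loss(img, proto, labels, tau=0.07):
    # Label-space alignment: each image is pulled toward the prototypes
    # of all findings it carries (soft, multi-positive targets).
    logits = img @ proto.T / tau
    p = softmax(logits, axis=1)
    targets = labels / labels.sum(axis=1, keepdims=True)
    return -np.mean((targets * np.log(p)).sum(axis=1))

rng = np.random.default_rng(0)
img = rng.normal(size=(4, 8)); img /= np.linalg.norm(img, axis=1, keepdims=True)
txt = rng.normal(size=(4, 8)); txt /= np.linalg.norm(txt, axis=1, keepdims=True)
proto = rng.normal(size=(3, 8)); proto /= np.linalg.norm(proto, axis=1, keepdims=True)
labels = np.array([[1, 0, 0], [1, 1, 0], [0, 0, 1], [0, 1, 0]], dtype=float)

# The two terms are computed on disjoint supervision signals and summed.
total = image_text_loss(img, txt) + image_label_loss(img, proto, labels)
print(round(float(total), 3))
```

Because the two terms never share a target matrix, text-supervised and label-only datasets can be mixed in one pre-training run.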

Install DLILP

  • Install a torch version compatible with your GPU in your environment. For example:
conda create -n dlilp python=3.11 -y
conda activate dlilp
pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 --index-url https://download.pytorch.org/whl/cu124
  • Install the DLILP library (only needed for basic model usage):
pip install git+https://github.com/jusiro/DLILP.git

Usage

from PIL import Image
import numpy as np

# Import Chest X-ray VLM
from dlilp import VLMModel

# Set model
model = VLMModel.from_pretrained("jusiro2/DLILP_CMP")

# There are several other available weights, with models pre-trained on:
# CheXpert (C), MIMIC (M), and PadChest (P).
# "jusiro2/DLILP_CMP" - "jusiro2/DLILP_CM" - "jusiro2/DLILP_M" - "jusiro2/DLILP_C"
# "jusiro2/CXR_Unimodal_CMP" - "jusiro2/CXR_Unimodal_CM" - ...
# "jusiro2/CONVIRT". 

# Load image and set target categories 
# (if the repo is not cloned, download the image and change the path!)

image = np.array(Image.open("./DLILP/local_data/media/sample_bronchopneumonia.png"))[:,:,0:3]
text = ["normal", "no finding", "pneumonia", "osteopenia", "calcified adenopathy",
        "broncho-pneumonia", "opacities"]
        
# Forward DLILP model using the visual-textual projection
model.caption = "[CLS]"
probs, logits = model(image, text)

print("Image-Text similarities:")
print(logits.round(3)) # [[ 0.818  1.768  7.617 -1.306  1.051  4.63   5.374]]
print("Probabilities:")
print(probs.round(3))  # [[ 0.001  0.002  0.861  0.     0.001  0.043  0.091]]
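
To turn the returned probabilities into a predicted category, take the argmax over the candidate list. A minimal post-processing sketch, reusing the probabilities printed in the example run above:

```python
import numpy as np

text = ["normal", "no finding", "pneumonia", "osteopenia", "calcified adenopathy",
        "broncho-pneumonia", "opacities"]
# Probabilities as printed in the example run above.
probs = np.array([[0.001, 0.002, 0.861, 0.0, 0.001, 0.043, 0.091]])

# Highest-probability category for each image in the batch.
pred = text[int(probs.argmax(axis=-1)[0])]
print(pred)  # pneumonia
```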

Pre-training and transferability

In the following, we present the scripts for model pre-training and transferability. To use them, we recommend cloning the whole repository.

git clone https://github.com/jusiro/DLILP.git
cd DLILP
pip install -r requirements.txt

📦 Datasets

  1. Download datasets. Please check ./local_data/datasets/README.md for the expected dataset structure, links, and instructions.
  2. Define the relative paths for datasets in ./local_data/constants.py.
  3. Create dataset partitions for pre-training and transferability. Follow the instructions at ./local_data/partitions/README.md, and execute python ./local_data/partitions/partitions.py.

📦 Foundation model pre-training

  • Contrastive language-image pretraining - CLIP.
python main_pretrain.py --learning_criteria clip --exp_id clip_CM --datasets CheXpert-train-frontal,MIMIC-CXR-2-train-frontal
  • Unified contrastive representations in the label space - UniCL.
python main_pretrain.py --learning_criteria unicl --exp_id unicl_CM --datasets CheXpert-train-frontal,MIMIC-CXR-2-train-frontal
  • Unimodal (only vision) pre-training.
python main_pretrain.py --learning_criteria unimodal --exp_id unimodal_CM --datasets CheXpert-train-frontal,MIMIC-CXR-2-train-frontal
  • Disentangled language-image-label pre-training - DLILP.
python main_pretrain.py --learning_criteria dlilp --exp_id dlilp_CM --datasets CheXpert-train-frontal,MIMIC-CXR-2-train-frontal
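
The main difference between the CLIP and UniCL criteria above lies in the contrastive target matrix. The toy numpy sketch below illustrates that difference; the multi-hot labels and normalization are illustrative assumptions, not the repository's exact construction.

```python
import numpy as np

# Multi-hot labels for a toy batch of 4 images over 3 findings.
labels = np.array([[1, 0, 0],
                   [1, 0, 0],
                   [0, 1, 0],
                   [0, 0, 1]])

# CLIP targets: only the paired report is positive (identity matrix).
clip_targets = np.eye(len(labels))

# UniCL targets: any pair of samples sharing at least one label is
# positive, and each row is normalized into a distribution.
share = (labels @ labels.T > 0).astype(float)
unicl_targets = share / share.sum(axis=1, keepdims=True)

# Images 0 and 1 share the first finding -> row 0 is [0.5, 0.5, 0, 0].
print(unicl_targets[0])
```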

📦 Pre-trained weights download

We provide our pre-trained weights in the following LINK. You can manually download the weights and store them at ./dlilp/modeling/pretrained_weights/[ID].pth. The [ID] follows the pattern "method_dataset": e.g., dlilp_MCP stands for the DLILP pre-training strategy using the MIMIC (M), CheXpert (C), and PadChest (P) datasets.
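
The "method_dataset" naming can be decoded programmatically. A small illustrative sketch (the helper and dictionary below are not part of the repository):

```python
# Weight IDs follow "method_dataset", e.g. "dlilp_MCP".
weight_id = "dlilp_MCP"

# rsplit keeps multi-part method names (e.g. "CXR_Unimodal_CMP") intact.
method, datasets = weight_id.rsplit("_", 1)
codes = {"M": "MIMIC", "C": "CheXpert", "P": "PadChest"}

print(method, [codes[c] for c in datasets])  # dlilp ['MIMIC', 'CheXpert', 'PadChest']
```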

📦 Transferability to downstream tasks/domains

  • Zero-shot
python main_transferability.py --experiment chexpert_5x200 --method zero_shot --load_weights True --ensemble True --shots_train 80% --shots_test 20% --folds 5 
python main_transferability.py --experiment mimic_5x200 --method zero_shot --load_weights True --ensemble True --shots_train 80% --shots_test 20%  --folds 5 
python main_transferability.py --experiment covid_train --method zero_shot --load_weights True --ensemble True --shots_train 100% --shots_test 0% --experiment_test covid_test --folds 1 
python main_transferability.py --experiment rsna_pneumonia_train --method zero_shot --load_weights True --ensemble True --shots_train 100% --shots_test 0% --experiment_test rsna_pneumonia_test --folds 1 
  • Linear Probing
python main_transferability.py --experiment chexpert_5x200 --method lp --load_weights True --ensemble True --shots_train 16 --shots_test 20% --folds 5 
python main_transferability.py --experiment mimic_5x200 --method lp --load_weights True --ensemble True --shots_train 16 --shots_test 20%  --folds 5 
python main_transferability.py --experiment covid_train --method lp --load_weights True --ensemble True --shots_train 16 --shots_test 0% --experiment_test covid_test --folds 5
python main_transferability.py --experiment rsna_pneumonia_train --method lp --load_weights True --ensemble True --shots_train 16 --shots_test 0% --experiment_test rsna_pneumonia_test --folds 5  
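
Linear probing fits a simple linear classifier on top of frozen vision features. As a rough illustration of what `--method lp --shots_train 16` does, here is a numpy-only sketch using a closed-form ridge probe on synthetic features; the shapes, the regularizer, and the synthetic data are all assumptions for the example, not the repository's solver.

```python
import numpy as np

rng = np.random.default_rng(0)

# Frozen image features: 16 shots per class for a toy binary task.
n_per_class, dim = 16, 32
X = np.vstack([rng.normal(loc=-1.0, size=(n_per_class, dim)),
               rng.normal(loc=+1.0, size=(n_per_class, dim))])
y = np.array([0] * n_per_class + [1] * n_per_class)

# One-hot targets and a closed-form ridge solution:
# W = (X^T X + lam * I)^-1 X^T Y
Y = np.eye(2)[y]
lam = 1.0
W = np.linalg.solve(X.T @ X + lam * np.eye(dim), X.T @ Y)

# Evaluate the probe on held-out samples from the same distribution.
Xt = np.vstack([rng.normal(loc=-1.0, size=(8, dim)),
                rng.normal(loc=+1.0, size=(8, dim))])
yt = np.array([0] * 8 + [1] * 8)
acc = float(((Xt @ W).argmax(axis=1) == yt).mean())
print(acc)
```

Because the backbone stays frozen, the probe's accuracy isolates how transferable the pre-trained features are.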

📦 Using other pre-trained models

We have also prepared the framework for evaluating linear probing of recently released models. Note that some details, such as image normalization or input size, might vary. We present some example calls below; additional ones are included at ./local_data/examples/sota.txt. If interested, please refer to the MedKLIP, KED, BioVIL, GlorIA, MedCLIP, or CXR-CLIP repositories and place their ResNet-50 weights at ./dlilp/modeling/pretrained_weights/other/.

# MedKLIP (ICCV23)
python main_transferability.py --experiment chexpert_5x200 --method lp --shots_train 16 --shots_test 20% --folds 5 --norm True --size 224 --weights_path ./dlilp/modeling/pretrained_weights/other/medklip.pth
# KED (Nat.Comm.23)
python main_transferability.py --experiment chexpert_5x200 --method lp --shots_train 16 --shots_test 20% --folds 5 --norm True --size 224 --weights_path ./dlilp/modeling/pretrained_weights/other/KAD.pt
# BioVIL (Nat.Comm.23)
python main_transferability.py --experiment chexpert_5x200 --method lp --shots_train 16 --shots_test 20% --folds 5 --norm False --size 512 --weights_path ./dlilp/modeling/pretrained_weights/other/biovil.pt
# GlorIA (ICCV21)
python main_transferability.py --experiment chexpert_5x200 --method lp --shots_train 16 --shots_test 20% --folds 5 --norm True --size 224 --weights_path ./dlilp/modeling/pretrained_weights/other/gloria_weights.ckpt
# MedCLIP (EMNLP22)
python main_transferability.py --experiment chexpert_5x200 --method lp --shots_train 16 --shots_test 20% --folds 5 --norm True --size 224 --weights_path ./dlilp/modeling/pretrained_weights/other/medclip_weights.bin
# CXR-CLIP (MICCAI23)
python main_transferability.py --experiment chexpert_5x200 --method lp --shots_train 16 --shots_test 20% --folds 5 --norm True --size 224 --weights_path ./dlilp/modeling/pretrained_weights/other/cxr-clip.tar

Citation

If you find this repository useful, please consider citing this paper:

@inproceedings{dlilp,
    title={A Reality Check of Vision-Language Pre-training in Radiology: Have We Progressed Using Text?},
    author={Julio Silva-Rodríguez and Jose Dolz and Ismail {Ben Ayed}},
    booktitle={Information Processing in Medical Imaging (IPMI)},
    year={2025}
}
