ViLU: Learning Vision-Language Uncertainties for Failure Prediction

This is the official PyTorch implementation of ViLU, accepted at ICCV 2025.

Installation

conda create -n vilu python=3.11
conda activate vilu
conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia
pip install -e .

Datasets

data/
|––––ImageNet/
|––––––––ILSVRC2012_devkit_t12.tar.gz
|––––––––train/
|––––––––val/
|––––CIFAR-10/
|––––––––cifar-10-batches-py/
|––––CIFAR-100/
|––––––––cifar-100-python/
|––––caltech101/
|––––––––101_ObjectCategories/
|––––EuroSAT_RGB/
|––––dtddataset/
|––––––––dtd/
|––––––––––––images/
|––––fgvc-aircraft-2013b/
|––––flowers102/
|––––food-101/
|––––––––images/
|––––oxford-iiit-pet/
|––––––––images/
|––––stanford_cars/
|––––SUN397/
|––––UCF-101-midframes/
|––––CC3M/
|––––––––train/
|––––––––val/
|––––CC12M/
|––––––––train/
|––––––––val/
|––––LAION/
|––––––––train/
|––––––––val/
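Before launching a run, it can help to confirm that the folders listed above are in place. The following is a minimal sketch and not part of the ViLU codebase: the helper name missing_dataset_dirs, the default root of data, and the subset of folders checked are all assumptions made for illustration.

```python
from pathlib import Path

# Subset of the layout shown above, relative to the data root.
# Extend this tuple with the other folders you plan to use.
EXPECTED_DIRS = (
    "ImageNet/train",
    "ImageNet/val",
    "CIFAR-10/cifar-10-batches-py",
    "CIFAR-100/cifar-100-python",
    "caltech101/101_ObjectCategories",
    "EuroSAT_RGB",
    "dtddataset/dtd/images",
    "food-101/images",
    "oxford-iiit-pet/images",
)


def missing_dataset_dirs(root="data", expected=EXPECTED_DIRS):
    """Return the expected sub-directories that do not exist under root."""
    root = Path(root)
    return [rel for rel in expected if not (root / rel).is_dir()]


if __name__ == "__main__":
    # Hypothetical usage: report anything missing under ./data
    for rel in missing_dataset_dirs():
        print(f"missing: data/{rel}")
```

An empty result means every listed folder was found; it does not check that the contents of each folder are complete.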

Training

Training scripts are provided in the scripts folder. For instance, to train ViLU on CIFAR-10 with the ViT-B/32 backbone for 400 epochs, run:

sh scripts/train_vilu_cifar10.sh

You may need to adapt the value of CUDA_VISIBLE_DEVICES to your setup. Training on an RTX A6000 GPU takes around 2 hours. The main training loop code is located in vilu/engine/trainer.py.
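One way to choose the GPUs without editing anything is to set CUDA_VISIBLE_DEVICES in the environment of the process that launches the script. The sketch below is illustrative only: launch_training is a hypothetical helper, not something shipped with the repository, and it assumes the script does not overwrite the variable itself (if it does, edit the script instead).

```python
import os
import subprocess


def launch_training(script="scripts/train_vilu_cifar10.sh", gpu_ids=(0,), dry_run=True):
    """Build (and optionally run) the training command with the selected GPUs."""
    env = dict(os.environ)
    # CUDA_VISIBLE_DEVICES takes a comma-separated list of device indices.
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in gpu_ids)
    cmd = ["sh", script]
    if not dry_run:
        # check=True raises CalledProcessError if the script exits with an error.
        subprocess.run(cmd, env=env, check=True)
    return cmd, env["CUDA_VISIBLE_DEVICES"]
```

Calling launch_training(gpu_ids=(1,), dry_run=False) from the repository root would run the CIFAR-10 script with only GPU 1 visible; with the default dry_run=True the function only returns the command and the device string.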

Notebooks for reproducibility and visualisations

The notebooks folder contains the code for loading and evaluating the pre-trained ViLU models on each dataset, as well as the code for visualising the failure curves.
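For readers new to failure prediction, a common way to score an uncertainty estimator is the AUROC for separating misclassified samples from correctly classified ones. The sketch below is a generic, dependency-free illustration of that metric under this assumption. It is not taken from the ViLU notebooks, and the function name failure_auroc is hypothetical.

```python
def failure_auroc(uncertainty, is_failure):
    """AUROC for ranking failures above correct predictions by uncertainty.

    uncertainty: one score per sample, higher meaning "more likely wrong".
    is_failure:  one boolean per sample, True if the prediction was wrong.
    """
    failures = [u for u, f in zip(uncertainty, is_failure) if f]
    correct = [u for u, f in zip(uncertainty, is_failure) if not f]
    if not failures or not correct:
        raise ValueError("need at least one failure and one correct prediction")

    # Fraction of (failure, correct) pairs in which the failure receives the
    # higher uncertainty; ties count as half. This equals the area under the
    # ROC curve. The double loop is quadratic, which is fine for a sketch.
    wins = 0.0
    for u_fail in failures:
        for u_ok in correct:
            if u_fail > u_ok:
                wins += 1.0
            elif u_fail == u_ok:
                wins += 0.5
    return wins / (len(failures) * len(correct))
```

A value of 1.0 means every failure received a higher uncertainty than every correct prediction, and 0.5 corresponds to a random ranking.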
