This is the reference implementation of the ICLR 2021 paper "Rethinking the Role of Gradient-Based Attribution Methods for Model Interpretability".
The repository contains methods to both train classifiers with generative score-matching regularizers, as well as methods to evaluate the interpretability and generative modelling properties of existing models.
First train regularized models using
python train.py --regularizer=<score/anti/gnorm> --model-arch=<resnet9/18/34> --dataset=<cifar10/cifar100>
Next perform evaluations on trained models using
python evaluations.py --eval=<visualize-saliency-and-samples/compute-sample-quality/pixel-perturb> --model-name=<path-to-model> --model-arch=<resnet9/18/34> --dataset=<cifar10/cifar100>
Please see example_scripts.sh for specific examples.
The code is organized as follows:
-
regularizers.pydefinesScoreMatching,AntiScoreMatchingandGradNormRegularizermethods -
train.pycontains scripts to train CIFAR10/100 models with (and without) above regularizers -
evaluations.pycontains scripts to perform the following evaluations on trained models:visualize-saliency-and-samples, which computes saliency maps, and samples using activation maximization on the outputs, and dumps the resulting output imagescompute-sample-quality, generate samples using activation maximization as above, and use the GAN-test to evaluate sample qualitypixel-perturb, evaluate saliency map faithfulness using the pixel perturbation method
-
example_scripts.shcontains example scripts to runtrain.pyandevaluations.py -
utils\folder contains routines to compute gradients, perform pixel perturbation, activation maximization and other miscellaneous functions -
models\folder contains model definitions
torch torchvision numpy matplotlib pandas
If you found our work helpful for your research, please do consider citing our paper.
@inproceedings{srinivas2021rethinking,
title={Rethinking the Role of Gradient-based Attribution Methods for Model Interpretability},
author={Suraj Srinivas and Francois Fleuret},
booktitle={International Conference on Learning Representations},
year={2021},
url={https://openreview.net/forum?id=dYeAHXnpWJ4}
}