This is the official repository of DiffLens: Dissecting and Mitigating Diffusion Bias via Mechanistic Interpretability (CVPR 2025).
[2025.06.13] Code released.
TL;DR: A method to dissect and mitigate social bias in diffusion models by identifying and intervening in key bias-generating features within neuron activities.
Please follow the steps below to perform the installation:
```bash
conda create -n difflens python=3.9
conda activate difflens
pip install -r requirements.txt
```
Download the CelebA-HQ checkpoint from P2 weighting (CVPR 2022), and put it into the folder `pretrained_model/P2`.
Download the model from Stable Diffusion v1.5.
Download the `res34_fair_align_multi_7_20190809.pt` checkpoint from FairFace (WACV 2021), and put it into the folder `evaluation/Fairface`.
If you want to evaluate images generated by Stable Diffusion v1.5, you also need to download the dlib checkpoints from FairFace (WACV 2021).
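As a quick sanity check that the FairFace checkpoint is in place, the sketch below loads it the way FairFace's reference code does, i.e. a ResNet-34 with an 18-way head (7 races, 2 genders, 9 age groups). The head size and path are assumptions here; defer to FairFace's own predict.py if they differ.

```python
# Hedged sketch: loading the FairFace res34_fair_align_multi_7 checkpoint.
# The 18-way head (7 races + 2 genders + 9 age groups) follows FairFace's reference code;
# verify against the official repository before relying on it.
import torch
import torch.nn as nn
import torchvision

model = torchvision.models.resnet34()
model.fc = nn.Linear(model.fc.in_features, 18)
state = torch.load("evaluation/Fairface/res34_fair_align_multi_7_20190809.pt",
                   map_location="cpu")
model.load_state_dict(state)
model.eval()
```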
You can skip this data-preparation step and proceed directly to the "Bias Mitigation" step, because we provide ready-to-use resources. Otherwise, prepare the latents as follows:
```bash
cd ./utils/P2
# for training the h-classifier
python prepare_h_classifier_latents.py
# for locating features
python prepare_locating_latents.py
```
Tip: For P2, you can skip this step and proceed directly to the "Bias Mitigation" step, because we provide ready-to-use resources in Checkpoints (Google Drive). For Stable Diffusion v1.5, you can also skip the "Train SAE" and "Train h-classifier" steps.
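The preparation scripts cache intermediate diffusion activations to disk. The sketch below only illustrates the general idea, caching a UNet bottleneck ("h-space") output with a forward hook on a toy diffusers model; the real scripts use the P2 / Stable Diffusion networks and their own file format.

```python
# Hedged sketch: caching bottleneck ("h-space") activations with a forward hook.
# The toy UNet, the module name `mid_block`, and the save format are illustrative only.
import torch
from diffusers import UNet2DModel

unet = UNet2DModel(
    sample_size=32, in_channels=3, out_channels=3, layers_per_block=1,
    block_out_channels=(32, 64),
    down_block_types=("DownBlock2D", "DownBlock2D"),
    up_block_types=("UpBlock2D", "UpBlock2D"),
)
cached = {}

def save_hook(module, inputs, output):
    cached["h"] = output.detach().cpu()   # bottleneck activation for this forward pass

handle = unet.mid_block.register_forward_hook(save_hook)

noisy = torch.randn(4, 3, 32, 32)
timestep = torch.tensor([500])
with torch.no_grad():
    _ = unet(noisy, timestep)
torch.save(cached["h"], "h_latents_t500.pt")
handle.remove()
```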
To train the SAE:
```bash
# You can also launch with torchrun for PyTorch DDP.
# Set train_sae.train to True in ./config/P2/example_P2.yaml.
python -m difflens ./config/P2/example_P2.yaml
```
The SAE checkpoint will then be saved into `sae-ckpts`.
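For orientation, here is a minimal conceptual sketch of a sparse autoencoder over flattened h-space activations (linear encoder, ReLU code, linear decoder, reconstruction plus L1 sparsity loss). It is not the repository's SAE implementation; the dimensions and sparsity weight are illustrative.

```python
# Hedged sketch of a sparse autoencoder (SAE) on flattened activations.
# Not the repository's implementation; d_model, d_hidden, and the L1 weight are illustrative.
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_hidden)
        self.decoder = nn.Linear(d_hidden, d_model)

    def forward(self, h):
        f = torch.relu(self.encoder(h))   # sparse feature activations
        h_hat = self.decoder(f)           # reconstruction of the input activations
        return h_hat, f

sae = SparseAutoencoder(d_model=512, d_hidden=4096)
h = torch.randn(64, 512)                  # a batch of flattened activations
h_hat, f = sae(h)
loss = (h_hat - h).pow(2).mean() + 1e-3 * f.abs().mean()
loss.backward()
```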
To train the h-classifier:
```bash
# Set train_h_classifier.train to True in ./config/P2/example_P2.yaml.
python -m difflens ./config/P2/train_h_classifier/h_classifier_config.yaml
```
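The h-classifier predicts the target attribute from the prepared latents. Below is a hedged sketch of a simple linear probe on cached latents; the repository's config-driven training, architecture, and data format may differ.

```python
# Hedged sketch: a linear probe ("h-classifier") on cached h-space latents.
# Random tensors stand in for the prepared latents and attribute labels.
import torch
import torch.nn as nn

latents = torch.randn(256, 512)           # stand-in for prepared h-space latents
labels = torch.randint(0, 2, (256,))      # stand-in for binary attribute labels (e.g. gender)

probe = nn.Linear(512, 2)
opt = torch.optim.Adam(probe.parameters(), lr=1e-3)
for _ in range(100):
    opt.zero_grad()
    loss = nn.functional.cross_entropy(probe(latents), labels)
    loss.backward()
    opt.step()
```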
To locate bias features:
```bash
# Set locate.locate to True in ./config/P2/example_P2.yaml.
python -m difflens ./config/P2/example_P2.yaml
```
After locating features, you need to aggregate them across diffusion timesteps:
```bash
cd ./utils/P2
python aggregate_features.py
```
We provide an example for the P2 model; you can download the SAE checkpoint and features from Checkpoints (Google Drive).
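Conceptually, aggregation turns per-timestep feature scores into a single top-k set of bias features. The sketch below averages scores over timesteps and keeps the k largest indices; this only illustrates the idea and is not necessarily what aggregate_features.py does.

```python
# Hedged sketch: aggregating per-timestep feature importance into one top-k feature set.
# The repository's aggregate_features.py may use a different scoring and selection rule.
import torch

num_timesteps, num_features, k = 50, 4096, 30
scores = torch.rand(num_timesteps, num_features)   # stand-in for per-timestep importance scores

mean_scores = scores.mean(dim=0)                   # average over diffusion timesteps
topk_values, topk_indices = mean_scores.topk(k)    # k most bias-relevant feature indices
print(topk_indices.tolist())
```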
You can generate images using the following commands.
```bash
# Set bias_mitigation.generate to True in ./config/P2/example_P2.yaml.
python -m difflens ./config/P2/example_P2.yaml
```
Some parameters in `./config/P2/bias_mitigation/generate_config_age.yaml`:
- `target_attr: age` (choose one from `age`, `gender`, and `race`)
- `top_k_list: 30_30_30` (one entry per class; e.g. `gender` has two classes, `male` and `female`, so the `top_k_list` should be `30_30`)
- `edit_ratios: 1.0_1.0_3.0` (one entry per class; e.g. for `gender` the `edit_ratios` should be `1.0_1.0`)
- `edit_method: multiply_all_probability` (choose one from `multiply_all`, `multiply_all_probability`, `add_all`, and `add_all_probability`; `multiply_all` means Scaling features regardless of their position in images, and `add_all` means Adding features regardless of their position in images)
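To make the Scaling vs. Adding edit methods concrete, here is a hedged sketch of multiplying or shifting the located SAE feature activations before decoding back to h-space. Variable names, shapes, and ratios are illustrative; the repository applies these edits inside the diffusion sampling loop according to the config above.

```python
# Hedged sketch: "Scaling" (multiply) vs. "Adding" edits on the located SAE features.
# `f` stands for SAE feature activations at one timestep; indices and ratios are illustrative.
import torch

f = torch.rand(4, 4096)                                # batch of SAE feature activations
bias_feature_indices = torch.tensor([12, 407, 3090])   # example located bias features
edit_ratio = 3.0

def multiply_all(f, idx, ratio):
    f = f.clone()
    f[:, idx] = f[:, idx] * ratio          # scale the located features everywhere in the image
    return f

def add_all(f, idx, ratio):
    f = f.clone()
    f[:, idx] = f[:, idx] + ratio          # shift the located features everywhere in the image
    return f

f_edited = multiply_all(f, bias_feature_indices, edit_ratio)
# The edited features would then be decoded back to h-space, e.g. h_edited = sae.decoder(f_edited).
```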
- Fairness Discrepancy (FD):
For the FD metric, you can use FairFace to evaluate your generated images. Here is an example for `age`. If you want to evaluate images generated by Stable Diffusion v1.5, you need to crop the faces with `./evaluation/crop_face/crop.py` first.
```bash
cd ./evaluation/Fairface
python age.py
```
Edit `file_lists` if you want to change the path of the images.
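FD is commonly reported as the distance between the attribute distribution of the generated images and a uniform reference distribution. The sketch below uses that common definition (an L2 distance over FairFace age-group predictions); treat it as an assumption and check the paper and evaluation scripts for the exact formula used here.

```python
# Hedged sketch: Fairness Discrepancy as the L2 distance between the empirical
# attribute distribution and a uniform target. The paper's exact definition may differ.
import numpy as np

# Stand-in: predicted age-group index (FairFace has 9 groups) for each generated image.
preds = np.random.randint(0, 9, size=1000)

num_classes = 9
empirical = np.bincount(preds, minlength=num_classes) / len(preds)
uniform = np.full(num_classes, 1.0 / num_classes)
fd = np.linalg.norm(empirical - uniform)   # smaller means closer to a balanced distribution
print(f"FD (age): {fd:.4f}")
```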
- CLIP-I:
We provide the script `clip_image_score.py` for CLIP-I.
```bash
cd ./evaluation/CLIP
python clip_image_score.py
```
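CLIP-I measures image-image similarity in CLIP embedding space. Below is a hedged sketch using the Hugging Face transformers CLIP API on a single image pair; the model name, file paths, and pairing/averaging logic are illustrative and may differ from `clip_image_score.py`.

```python
# Hedged sketch: CLIP-I as the cosine similarity between CLIP image embeddings of two images.
# Model name and image paths are placeholders; clip_image_score.py may batch and average differently.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

img_a = Image.open("original.png").convert("RGB")   # placeholder paths
img_b = Image.open("edited.png").convert("RGB")

inputs = processor(images=[img_a, img_b], return_tensors="pt")
with torch.no_grad():
    feats = model.get_image_features(**inputs)
feats = feats / feats.norm(dim=-1, keepdim=True)
clip_i = (feats[0] @ feats[1]).item()
print(f"CLIP-I: {clip_i:.4f}")
```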
- FID:
We use `clean-fid` to calculate FID.
```bash
cd ./evaluation/FID
python fid.py
```

If you find our work useful, please consider citing:

```bibtex
@InProceedings{Shi_2025_CVPR,
author = {Shi, Yingdong and Li, Changming and Wang, Yifan and Zhao, Yongxiang and Pang, Anqi and Yang, Sibei and Yu, Jingyi and Ren, Kan},
title = {Dissecting and Mitigating Diffusion Bias via Mechanistic Interpretability},
booktitle = {Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR)},
month = {June},
year = {2025},
pages = {8192-8202}
}
```