This repository contains the code for the CVPR 2024 paper *Doubly Abductive Counterfactual Inference for Text-based Image Editing*.
First, clone the repository:
```bash
git clone https://github.com/xuesong39/DAC
```

Then, install the dependencies in a new virtual environment:

```bash
cd DAC
git clone https://github.com/huggingface/diffusers -b v0.24.0
cd diffusers
pip install -e .
```

Finally, cd back into the main folder DAC and run:

```bash
pip install -r requirements.txt
```

The images and annotations we use in the paper can be found here.
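To confirm that the editable diffusers install is the one in use (a minimal sanity check; v0.24.0 is the tag cloned above):

```bash
python -c "import diffusers; print(diffusers.__version__)"  # expect 0.24.0
```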
For the format of the data used in the experiments, we provide some examples in the folder DAC/data, and a layout sketch below. For example, for the image DAC/data/cat/train/cat.jpeg, the folder containing the source prompt is DAC/data/cat/, while the folder containing the target prompt is DAC/data/cat-cap/.
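The snippet below sketches how such a source/target pair might be assembled for one image. The metadata.jsonl files and the exact prompts are our assumptions, following the Hugging Face imagefolder convention implied by --caption_column="text" in the scripts below:

```bash
# Hypothetical layout for one source/target pair (file names and prompts are illustrative).
mkdir -p data/cat/train data/cat-cap/train
cp cat.jpeg data/cat/train/ && cp cat.jpeg data/cat-cap/train/
# Source prompt, read from the "text" column during abduction on U:
echo '{"file_name": "cat.jpeg", "text": "A cat."}' > data/cat/train/metadata.jsonl
# Target prompt, read during abduction on Δ:
echo '{"file_name": "cat.jpeg", "text": "A cat wearing a wool cap."}' > data/cat-cap/train/metadata.jsonl
```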
The fine-tuning script for abduction on U is train_text_to_image_lora.sh as follows:
```bash
export MODEL_NAME="stabilityai/stable-diffusion-2-1-base"
export TRAIN_DIR="ORIGIN_DATA_PATH"

CUDA_VISIBLE_DEVICES=0 accelerate launch train_text_to_image_lora.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --train_data_dir=$TRAIN_DIR --caption_column="text" \
  --resolution=512 \
  --train_batch_size=1 \
  --num_train_epochs=1000 --checkpointing_steps=1000 \
  --learning_rate=1e-04 --lr_scheduler="constant" --lr_warmup_steps=0 \
  --seed=42 \
  --rank=512 \
  --output_dir="U_PATH" \
  --validation_prompt="xxx" \
  --report_to="wandb" \
  --validation_epochs=500
```

Please specify TRAIN_DIR (e.g., "./data/cat/"), --output_dir (e.g., "ckpt/cat"), and --validation_prompt (e.g., "A cat.").
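For concreteness, here is the same command filled in with the example values above (a sketch; the later scripts follow the same substitution pattern):

```bash
# Abduction on U for the cat example (paths and prompt taken from the examples above).
CUDA_VISIBLE_DEVICES=0 accelerate launch train_text_to_image_lora.py \
  --pretrained_model_name_or_path="stabilityai/stable-diffusion-2-1-base" \
  --train_data_dir="./data/cat/" --caption_column="text" \
  --resolution=512 --train_batch_size=1 \
  --num_train_epochs=1000 --checkpointing_steps=1000 \
  --learning_rate=1e-04 --lr_scheduler="constant" --lr_warmup_steps=0 \
  --seed=42 --rank=512 \
  --output_dir="ckpt/cat" \
  --validation_prompt="A cat." \
  --report_to="wandb" \
  --validation_epochs=500
```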
The fine-tuning script for abduction on Δ is train_text_to_image_lora_t.sh as follows:
```bash
export MODEL_NAME="stabilityai/stable-diffusion-2-1-base"
export TRAIN_DIR="TARGET_DATA_PATH"

CUDA_VISIBLE_DEVICES=0 accelerate launch train_text_to_image_lora_t.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --unet_lora_path="U_PATH" \
  --train_data_dir=$TRAIN_DIR --caption_column="text" \
  --resolution=512 --train_text_encoder \
  --train_batch_size=1 \
  --num_train_epochs=1000 --checkpointing_steps=1000 \
  --learning_rate=1e-04 --lr_scheduler="constant" --lr_warmup_steps=0 \
  --seed=42 \
  --annealing=0.8 \
  --output_dir="DELTA_PATH" \
  --report_to="wandb" \
  --validation_epochs=500
```

Please specify TRAIN_DIR (e.g., "./data/cat-cap/"), --unet_lora_path (e.g., "ckpt/cat"), and --output_dir (e.g., "ckpt/cat-cap-annealing0.8"). You can also change --annealing to control the annealing hyperparameter.
The inference script is inference_t.sh as follows:
```bash
CUDA_VISIBLE_DEVICES=0 python inference_t.py \
  --annealing=0.8 \
  --unet_path="U_PATH" \
  --text_path="DELTA_PATH" \
  --target_prompt="xxx" \
  --save_path="./"
```

Please specify --unet_path (e.g., "ckpt/cat"), --text_path (e.g., "ckpt/cat-cap-annealing0.8"), and --target_prompt (e.g., "A cat wearing a wool cap.").
This part contains the implementation described in the ablation analysis section of the paper, i.e., the ablation on Abduction-1: we can incorporate another exogenous variable T into Abduction-1 to further improve fidelity.
The fine-tuning script for abduction on U is the same as the above.
The fine-tuning script for abduction on T is train_text_to_image_lora_t.sh as follows:
```bash
export MODEL_NAME="stabilityai/stable-diffusion-2-1-base"
export TRAIN_DIR="ORIGIN_DATA_PATH"

CUDA_VISIBLE_DEVICES=0 accelerate launch train_text_to_image_lora_t.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --unet_lora_path="U_PATH" \
  --train_data_dir=$TRAIN_DIR --caption_column="text" \
  --resolution=512 --train_text_encoder \
  --train_batch_size=1 \
  --num_train_epochs=1000 --checkpointing_steps=1000 \
  --learning_rate=1e-04 --lr_scheduler="constant" --lr_warmup_steps=0 \
  --seed=42 \
  --annealing=0.8 \
  --output_dir="T_PATH" \
  --report_to="wandb" \
  --validation_epochs=500
```

Please specify TRAIN_DIR (e.g., "./data/cat/"), --unet_lora_path (e.g., "ckpt/cat"), and --output_dir (e.g., "ckpt/cat-annealing0.8").
The fine-tuning script for abduction on Δ is train_text_to_image_lora_t2.sh as follows:
```bash
export MODEL_NAME="stabilityai/stable-diffusion-2-1-base"
export TRAIN_DIR="TARGET_DATA_PATH"

CUDA_VISIBLE_DEVICES=0 accelerate launch train_text_to_image_lora_t2.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --unet_lora_path="U_PATH" \
  --text_lora1_path="T_PATH" \
  --train_data_dir=$TRAIN_DIR --caption_column="text" \
  --resolution=512 --train_text_encoder \
  --train_batch_size=1 \
  --num_train_epochs=1000 --checkpointing_steps=1000 \
  --learning_rate=1e-04 --lr_scheduler="constant" --lr_warmup_steps=0 \
  --seed=42 \
  --annealing=0.8 \
  --output_dir="DELTA_PATH" \
  --report_to="wandb" \
  --validation_epochs=500
```

Please specify TRAIN_DIR (e.g., "./data/cat-cap/"), --unet_lora_path (e.g., "ckpt/cat"), --text_lora1_path (e.g., "ckpt/cat-annealing0.8"), and --output_dir (e.g., "ckpt/cat-cap-annealing0.8-t2").
The inference script is inference_t2.sh as follows:
```bash
CUDA_VISIBLE_DEVICES=0 python inference_t2.py \
  --annealing=0.8 \
  --unet_path="U_PATH" \
  --text1_path="T_PATH" \
  --text2_path="DELTA_PATH" \
  --target_prompt="xxx" \
  --save_path="./"
```

Please specify --unet_path (e.g., "ckpt/cat"), --text1_path (e.g., "ckpt/cat-annealing0.8"), --text2_path (e.g., "ckpt/cat-cap-annealing0.8-t2"), and --target_prompt (e.g., "A cat wearing a wool cap.").
We provide some checkpoints in the following:
| Image | Abduction-1 | Abduction-2 |
|---|---|---|
| DAC/data/cat | U | Δ |
| DAC/data/glass | U | Δ |
| DAC/data/black | U | Δ |
| DAC/data/cat | U, T | Δ |
| DAC/data/glass | U, T | Δ |
| DAC/data/black | U, T | Δ |
In this code we refer to the following codebases: Diffusers and PEFT. Great thanks to them!