ReEdit is an efficient, end-to-end, optimization-free framework for exemplar-based image editing. Unlike existing approaches, it requires no fine-tuning or optimization at inference time.
Given a pair of exemplar images (original and edited), ReEdit captures the edit and applies it to a test image to obtain the corresponding edited version. The framework consists of three main components:
- Image Space Edit Capture: Uses pretrained adapter modules to capture edits in the image embedding space
- Text Space Edit Capture: Incorporates multimodal VLMs for detailed reasoning and edit description
- Content Preservation: Conditions image generation on test image features and self-attention maps
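
As a rough mental model, the three components compose as sketched below. This is only a conceptual sketch: `image_encoder`, `vlm`, and `diffusion` are hypothetical placeholders for illustration, not the actual modules or API in this repository.

```python
import torch

@torch.no_grad()
def reedit(x_original, x_edited, x_test, image_encoder, vlm, diffusion):
    # 1. Image-space edit capture (simplified): represent the edit as a
    #    direction in the embedding space of a pretrained image encoder.
    e_original = image_encoder(x_original)
    e_edited = image_encoder(x_edited)
    e_test = image_encoder(x_test)
    edit_direction = e_edited - e_original      # what changed between the exemplars
    image_condition = e_test + edit_direction   # transfer that change to the test image

    # 2. Text-space edit capture: a multimodal VLM describes the edit in words.
    edit_text = vlm.describe_edit(x_original, x_edited)

    # 3. Content preservation: condition generation on the test image's features
    #    and self-attention maps so non-edited regions stay intact.
    return diffusion.generate(
        image_embeds=image_condition,
        prompt=edit_text,
        structure_reference=x_test,
    )
```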
Key features:
- No fine-tuning or optimization required at inference time
- ~4x faster than baseline methods
- Preserves original image structure while applying edits
- Works with various types of edits
- Model-agnostic (independent of base diffusion model)
This project uses two separate conda environments, `llava` and `reedit`. You can set them up manually by running:
```bash
cd LLaVA
conda create -n llava python=3.10 -y
conda activate llava
pip install --upgrade pip
pip install -e .
pip install protobuf
```

```bash
conda create -n reedit python=3.9 -y
conda activate reedit
pip install -r requirements.txt
```

First, add your exemplar pairs to the data directory in the following format:
```
data
└── add_butterfly
    ├── 0_0.png
    ├── 0_1.png
    └── 1_0.png
```
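
For reference, a small helper for walking this layout might look like the following. The mapping of file names to roles (0_0 = original exemplar, 0_1 = edited exemplar, 1_0 = test image) is an assumption made for illustration, not something stated in this README.

```python
from pathlib import Path

def iter_exemplar_pairs(root="data"):
    """Yield (edit_name, original_exemplar, edited_exemplar, test_image) per subfolder."""
    for edit_dir in sorted(Path(root).iterdir()):
        if not edit_dir.is_dir():
            continue
        # Assumed naming: 0_0 = original exemplar, 0_1 = edited exemplar, 1_0 = test image.
        original, edited, test = (edit_dir / n for n in ("0_0.png", "0_1.png", "1_0.png"))
        if all(p.exists() for p in (original, edited, test)):
            yield edit_dir.name, original, edited, test
        else:
            print(f"Skipping {edit_dir.name}: expected 0_0.png, 0_1.png, 1_0.png")

if __name__ == "__main__":
    for name, *paths in iter_exemplar_pairs("data"):
        print(name, [p.name for p in paths])
```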
To generate LLaVA captions for your exemplar pairs, run:

```bash
python preprocess-llava.py --directory data
cd LLaVA
python edit.py --img_fol ../data --res_fol ../llava_results
python get_caption.py --img_fol ../data --res_fol ../llava_results
python3 truncate_caption.py --res_fol ../llava_results
```
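
These captions feed the text-space edit capture. Purely as an illustration (not the repository's actual logic), one simple way two exemplar captions could be turned into an edit description is:

```python
def edit_description(caption_original: str, caption_edited: str) -> str:
    """Toy example: phrase the difference between two captions as an edit instruction."""
    return (
        f"The original image shows {caption_original.rstrip('.')}. "
        f"The edited image shows {caption_edited.rstrip('.')}. "
        "Apply the same change to the new image."
    )

print(edit_description(
    "a dog sitting on the grass",
    "a dog sitting on the grass with a butterfly on its nose",
))
```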
Next, preprocess the data:

```bash
python3 preprocess.py --data_path data
```

Finally, run the editing pipeline:

```bash
python3 pnp.py --name reedit --group reedit
```

The project includes a curated dataset of 1474 exemplar pairs covering various edit types:
- Global Style Transfer (428 pairs)
- Background Change (212 pairs)
- Localized Style Transfer (290 pairs)
- Object Replacement (366 pairs)
- Motion Edit (14 pairs)
- Object Insertion (164 pairs)
ReEdit combines several key components:
- IP-Adapter: Handles image prompt conditioning
- LLaVA Integration: Provides detailed reasoning and text descriptions
- PNP (Plug-and-Play) Module: Maintains the structure of the test image while performing the edit
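
For intuition on the PNP component, here is a toy illustration of attention injection: the self-attention map recorded while denoising the test image is reused during generation, so spatial structure carries over. Shapes and function names are illustrative only, not the repository's implementation.

```python
import torch

def self_attention(q, k, v):
    """Standard scaled dot-product self-attention; returns output and the attention map."""
    scale = q.shape[-1] ** -0.5
    attn = torch.softmax(q @ k.transpose(-2, -1) * scale, dim=-1)
    return attn @ v, attn

def structure_preserving_attention(v_generated, attn_from_test):
    # Reuse the attention map recorded from the test image's denoising pass,
    # but attend over the values of the current (edited) generation.
    return attn_from_test @ v_generated

# Toy shapes: (batch, tokens, channels).
q_t = k_t = v_t = torch.randn(1, 64, 32)       # features from the test image pass
_, attn_test = self_attention(q_t, k_t, v_t)   # record its self-attention map

v_gen = torch.randn(1, 64, 32)                 # values from the edited generation
out = structure_preserving_attention(v_gen, attn_test)
print(out.shape)                               # torch.Size([1, 64, 32])
```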
Compared to baselines:
- 4x faster inference time
- Better consistency in non-edited regions
- Higher edit accuracy
- Improved structure preservation
If you find this work useful, please cite:

```bibtex
@article{srivastava2024reedit,
  title={ReEdit: Multimodal Exemplar-Based Image Editing with Diffusion Models},
  author={Srivastava, Ashutosh and Menta, Tarun Ram and Java, Abhinav and Jadhav, Avadhoot and Singh, Silky and Jandial, Surgan and Krishnamurthy, Balaji},
  journal={arXiv preprint arXiv:2411.03982},
  year={2024}
}
```
