CompBench is a comprehensive benchmark dataset for evaluating image editing model performance. This guide will help you quickly get started with using CompBench for model evaluation.
First, download the CompBench dataset in parquet format from Hugging Face:
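If you prefer to script the download, a minimal sketch using the `huggingface_hub` library could look like the following; the `repo_id` below is a placeholder, so substitute the actual dataset identifier from the CompBench page on Hugging Face:

```python
# Hypothetical download sketch -- the repo_id is a placeholder, not the
# confirmed CompBench identifier; check the dataset page for the real one.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="<org>/CompBench",     # placeholder dataset id
    repo_type="dataset",
    local_dir="./data",            # parquet files land here
    allow_patterns=["*.parquet"],  # skip everything but the parquet shards
)
```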
After downloading, move the files to your working directory:
```bash
cd /CompBench
```

Use the provided script to extract the parquet files:
```bash
python extract_parquet.py --input_dir ./data --output_dir your_dir
```

Parameters:

- `--input_dir`: Directory containing the parquet files
- `--output_dir`: Output directory for extracted files
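For intuition, here is a minimal sketch of what such an extraction step might do; the `file_name` and `image` columns are hypothetical, not the confirmed schema of `extract_parquet.py`:

```python
# Illustrative sketch only -- the "file_name" and "image" columns are assumed,
# not the confirmed schema used by extract_parquet.py.
from pathlib import Path

import pandas as pd

def extract(input_dir: str, output_dir: str) -> None:
    for parquet_file in Path(input_dir).glob("*.parquet"):
        df = pd.read_parquet(parquet_file)
        for _, row in df.iterrows():
            target = Path(output_dir) / row["file_name"]  # assumed path column
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_bytes(row["image"])              # assumed bytes column

extract("./data", "./tasks")
```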
After extraction, you will find `edited_image` folders under each task in the extracted `tasks` folder; organize your own edited images to mirror this task layout.
Ensure your file structure follows this pattern:
```
your_edited_dir/
├── action/
├── add/
├── implicit_reasoning/
├── location/
├── multi_object_remove/
├── multi_turn_add/
├── multi_turn_remove/
├── remove/
├── replace/
└── view/
```
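Before running any evaluation, you can confirm the layout with a short check like this (a convenience sketch, not part of CompBench; the folder names match the tree above):

```python
# Sanity-check that the edited-image directory contains all ten task folders.
from pathlib import Path

EXPECTED_TASKS = {
    "action", "add", "implicit_reasoning", "location", "multi_object_remove",
    "multi_turn_add", "multi_turn_remove", "remove", "replace", "view",
}

def check_structure(edited_dir: str) -> None:
    present = {p.name for p in Path(edited_dir).iterdir() if p.is_dir()}
    missing = EXPECTED_TASKS - present
    if missing:
        raise SystemExit(f"Missing task folders: {sorted(missing)}")
    print("Directory structure looks good.")

check_structure("your_edited_dir")
```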
Use the corresponding evaluation script for each task. For example, for the implicit reasoning task:
```bash
python eval_implicit.py --edited_dir your_edited_dir
```

Parameters:

- `--edited_dir`: Directory containing your prepared edited images
For multi-turn editing tasks, an additional preprocessing step is required before testing:
```bash
python convert_multi_turn.py
```

This script merges the metadata from the multi_turn_add and multi_turn_remove tasks to ensure proper multi-turn editing evaluation.
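Conceptually, the merge looks something like the sketch below; the `metadata.json` file name and output path are assumptions for illustration, not the confirmed internals of `convert_multi_turn.py`:

```python
# Illustrative sketch -- file names are assumed, not the script's actual I/O.
import json
from pathlib import Path

def merge_metadata(tasks_dir: str = "./tasks") -> None:
    merged = []
    for task in ("multi_turn_add", "multi_turn_remove"):
        meta_path = Path(tasks_dir) / task / "metadata.json"  # assumed name
        merged.extend(json.loads(meta_path.read_text()))
    out_path = Path(tasks_dir) / "multi_turn_metadata.json"   # assumed output
    out_path.write_text(json.dumps(merged, indent=2))

merge_metadata()
```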
The complete project directory structure should look like this:
```
CompBench_dataset/
├── .cache/                  # Cache directory
├── data/                    # Original parquet files
├── tasks/                   # Extracted task data
│   ├── action/
│   ├── add/
│   ├── implicit_reasoning/
│   ├── location/
│   ├── multi_object_remove/
│   ├── multi_turn_add/
│   ├── multi_turn_remove/
│   ├── remove/
│   ├── replace/
│   └── view/
├── convert_multi_turn.py    # Multi-turn editing preprocessing script
├── eval_implicit.py         # Implicit reasoning evaluation script
├── eval_local_clip_img.py   # Local CLIP image evaluation script
├── eval_local_editing.py    # Local editing evaluation script
├── eval_multi_clip_img.py   # Multi CLIP image evaluation script
├── eval_multi_editing.py    # Multi editing evaluation script
├── extract_parquet.py       # Data extraction script
└── compare_images.py        # Image comparison utility
```
CompBench provides multiple evaluation scripts for different types of editing tasks:
- `eval_implicit.py` - For implicit reasoning tasks
- `eval_local_clip_img.py` - For local CLIP-based image evaluation
- `eval_local_editing.py` - For local editing evaluation
- `eval_multi_clip_img.py` - For multi-image CLIP evaluation
- `eval_multi_editing.py` - For multi-image editing evaluation
- `compare_images.py` - Utility for comparing images
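As background on the CLIP-based scripts, the core metric is typically a cosine similarity between CLIP image embeddings. Below is a minimal sketch using the Hugging Face `transformers` CLIP API; the model choice and scoring here are illustrative assumptions, not necessarily what CompBench's scripts use:

```python
# Illustrative CLIP image-similarity sketch -- the model variant and scoring
# are assumptions, not the confirmed implementation of eval_*_clip_img.py.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def clip_image_similarity(path_a: str, path_b: str) -> float:
    images = [Image.open(path_a), Image.open(path_b)]
    inputs = processor(images=images, return_tensors="pt")
    with torch.no_grad():
        feats = model.get_image_features(**inputs)
    feats = feats / feats.norm(dim=-1, keepdim=True)  # unit-normalize
    return float((feats[0] @ feats[1]).item())        # cosine similarity
```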
```bash
# Extract data
python extract_parquet.py --input_dir ./data --output_dir ./tasks

# Run implicit reasoning evaluation
python eval_implicit.py --edited_dir ./your_edited_dir

# Run local editing evaluation
python eval_local_editing.py --edited_dir ./your_edited_dir

# Run CLIP-based evaluations
python eval_local_clip_img.py --edited_dir ./your_edited_dir
python eval_multi_clip_img.py --edited_dir ./your_edited_dir

# Preprocess multi-turn data
python convert_multi_turn.py

# Run multi-turn editing evaluation
python eval_multi_editing.py --edited_dir ./your_multi_turn_images
```

- File Structure Consistency: Always ensure your edited images follow the exact same directory structure as the original dataset
- Image Formats: Verify that your edited images are in the correct format (typically JPEG or PNG)
- Naming Convention: Keep the same file names as in the original dataset
- Quality Check: Validate your edited images before running evaluation to avoid processing errors
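For the quality check above, a short validation pass with Pillow (a convenience sketch, not a CompBench script) can catch corrupted files before an evaluation run:

```python
# Pre-evaluation quality check: open every image once to catch truncated or
# corrupted files before they break an evaluation run.
from pathlib import Path

from PIL import Image

def validate_images(edited_dir: str) -> None:
    bad = []
    for path in Path(edited_dir).rglob("*"):
        if path.suffix.lower() in {".jpg", ".jpeg", ".png"}:
            try:
                with Image.open(path) as img:
                    img.verify()  # raises on corrupted image data
            except Exception:
                bad.append(path)
    if bad:
        print(f"{len(bad)} corrupted image(s):", *bad, sep="\n  ")
    else:
        print("All images validated.")

validate_images("your_edited_dir")
```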