# EDGE: Efficient Multimodal Dataset Distillation via Generative Models

🚧 This repository is under construction 🚧

This repository is the official PyTorch implementation of the NeurIPS 2025 paper [*Efficient Multimodal Dataset Distillation via Generative Models*](https://arxiv.org/abs/2509.15472).
## Requirements

- Python 3.8+
- CUDA 11.0+ (for GPU support)
## Installation

```bash
# Clone the repository
git clone https://github.com/ichbill/EDGE.git
cd EDGE

# Create conda environment
conda create -n edge_vldd python=3.8
conda activate edge_vldd

# Install dependencies
pip install -r requirements.txt
```
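After installing, a quick sanity check (a minimal snippet, not part of the repository) confirms that PyTorch was installed correctly and can see the GPU:

```python
# Quick environment check: verifies the PyTorch install and CUDA visibility.
# This snippet is not part of the repository; it is only a convenience check.
import torch

print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available:  {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"Device:          {torch.cuda.get_device_name(0)}")
```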
## Project Structure

```
EDGE/
├── dataset/                      # Dataset loading modules
│   ├── coco_dataset.py
│   ├── flickr30k_dataset.py
│   ├── cc3m_dataset.py
│   └── randaugment.py
├── utils/                        # Utility modules
│   ├── networks.py               # Model architectures
│   ├── epoch.py                  # Training/evaluation loops
│   └── vl_distill_utils.py       # Distillation utilities
├── evaluation/                   # Evaluation scripts
│   └── evaluation.py
├── edge_diffusion_train.py       # Main training script
├── edge_diffusion_sampling.py    # Sampling script
├── sampling.py                   # Alternative sampling
├── caption_synthesis_gpt.py      # GPT-4 caption generation
├── caption_synthesis_llama.py    # Llama caption generation
├── caption_synthesis_llava.py    # LLaVA caption generation
└── convert_results_*.py          # Dataset format conversion
```
## Data Preparation

- For pretrained model checkpoints and datasets, please follow the instructions at https://github.com/silicx/LoRS_Distill/
- For CC3M, follow the CC3M download instructions
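The exact annotation schema is defined by the LoRS data pipeline; as a rough illustration, a downloaded annotation file can be spot-checked as below (the `image` and `caption` field names are assumptions based on the common Karpathy-split format, not the authoritative schema):

```python
# Spot-check a downloaded annotation file against the expected layout.
# NOTE: the "image"/"caption" field names are assumptions based on the
# common Karpathy-split format; consult the LoRS instructions for the
# authoritative schema.
import json
from pathlib import Path

data_root = Path("./data/coco")
entries = json.load(open(data_root / "annotations.json"))

print(f"{len(entries)} annotation entries")
sample = entries[0]
print("example entry:", sample)

# Verify that the referenced image actually exists under the image root.
assert (data_root / sample["image"]).exists(), "image path not found"
```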
## Training

Train a Stable Diffusion model with the EDGE diffusion losses:

```bash
python edge_diffusion_train.py
```
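`edge_diffusion_train.py` implements the paper's EDGE-specific losses; for orientation only, the sketch below shows the generic Stable Diffusion fine-tuning step it builds on, written against Hugging Face `diffusers`. The model ID, the dataloader, and the plain MSE objective are placeholders, not the EDGE losses:

```python
# Conceptual sketch of a Stable Diffusion fine-tuning step using Hugging Face
# diffusers. The EDGE-specific losses are NOT reproduced here; this is the
# vanilla noise-prediction objective for orientation only.
import torch
import torch.nn.functional as F
from diffusers import AutoencoderKL, DDPMScheduler, UNet2DConditionModel
from transformers import CLIPTextModel, CLIPTokenizer

model_id = "runwayml/stable-diffusion-v1-5"  # placeholder base model
vae = AutoencoderKL.from_pretrained(model_id, subfolder="vae")
unet = UNet2DConditionModel.from_pretrained(model_id, subfolder="unet")
text_encoder = CLIPTextModel.from_pretrained(model_id, subfolder="text_encoder")
tokenizer = CLIPTokenizer.from_pretrained(model_id, subfolder="tokenizer")
noise_scheduler = DDPMScheduler.from_pretrained(model_id, subfolder="scheduler")

optimizer = torch.optim.AdamW(unet.parameters(), lr=1e-5)

def training_step(pixel_values, captions):
    # pixel_values: image batch normalized to [-1, 1]; captions: list of str.
    # Encode images to latents and captions to conditioning embeddings.
    latents = vae.encode(pixel_values).latent_dist.sample() * vae.config.scaling_factor
    tokens = tokenizer(captions, padding="max_length", truncation=True,
                       max_length=tokenizer.model_max_length, return_tensors="pt")
    encoder_hidden_states = text_encoder(tokens.input_ids)[0]

    # Add noise at a random timestep and train the UNet to predict it.
    noise = torch.randn_like(latents)
    timesteps = torch.randint(0, noise_scheduler.config.num_train_timesteps,
                              (latents.shape[0],), device=latents.device)
    noisy_latents = noise_scheduler.add_noise(latents, noise, timesteps)
    noise_pred = unet(noisy_latents, timesteps, encoder_hidden_states).sample

    loss = F.mse_loss(noise_pred, noise)  # EDGE replaces/augments this term
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```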
## Sampling

Generate synthetic images using the trained model:

```bash
# Single GPU
python edge_diffusion_sampling.py \
    --dataset coco \
    --num_queries 1000 \
    --sd_model your_checkpoint \
    --save_tag experiment_name

# Multi-GPU
CUDA_VISIBLE_DEVICES=0,1,2,3 accelerate launch --num_processes=4 \
    edge_diffusion_sampling.py \
    --dataset coco \
    --num_queries 1000 \
    --sd_model your_checkpoint \
    --save_tag experiment_name
```
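Internally, sampling amounts to running the fine-tuned checkpoint through a standard text-to-image pipeline; `edge_diffusion_sampling.py` adds query selection and batching on top. A minimal `diffusers` sketch of the underlying step (checkpoint path and prompt are placeholders):

```python
# Minimal text-to-image sampling sketch with diffusers. The repository's
# edge_diffusion_sampling.py handles query selection and batching on top.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "/path/to/your_checkpoint",   # placeholder: the fine-tuned SD model
    torch_dtype=torch.float16,
).to("cuda")

prompt = "a man riding a wave on top of a surfboard"  # a COCO-style caption
image = pipe(prompt, num_inference_steps=50, guidance_scale=7.5).images[0]
image.save("synthetic_sample.png")
```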
## Caption Synthesis

Generate diverse captions using vision-language models.

### GPT-4

```bash
export OPENAI_API_KEY="your-api-key"

python caption_synthesis_gpt.py \
    --dataset coco \
    --label_file_path ./data/coco/annotations.json \
    --cpi 2 \
    --save_tag gpt4_captions
```
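At its core, `caption_synthesis_gpt.py` sends each image to the OpenAI API with an instruction to produce new captions. A stripped-down sketch of that call (the model name and prompt are illustrative, not the script's exact settings):

```python
# Stripped-down caption synthesis via the OpenAI API. The prompt, model name,
# and image handling are placeholders, not caption_synthesis_gpt.py's settings.
import base64
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

with open("data/coco/example.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

response = client.chat.completions.create(
    model="gpt-4o",  # placeholder model name
    messages=[{
        "role": "user",
        "content": [
            {"type": "text",
             "text": "Write two diverse one-sentence captions for this image."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
        ],
    }],
)
print(response.choices[0].message.content)
```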
### LLaVA

```bash
CUDA_VISIBLE_DEVICES=0 python caption_synthesis_llava.py \
    --dataset coco \
    --label_file_path ./data/coco/annotations.json \
    --cpi 2 \
    --save_tag llava_captions
```
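The LLaVA variant runs fully locally. A rough equivalent using the `transformers` LLaVA port (the model ID and prompt template are assumptions, not the script's configuration):

```python
# Rough local caption synthesis with the transformers LLaVA port. The model
# ID and prompt template are assumptions; caption_synthesis_llava.py defines
# the actual configuration.
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "llava-hf/llava-1.5-7b-hf"  # placeholder checkpoint
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16).to("cuda")

image = Image.open("data/coco/example.jpg")
prompt = "USER: <image>\nWrite a one-sentence caption for this image. ASSISTANT:"

inputs = processor(images=image, text=prompt, return_tensors="pt").to("cuda", torch.float16)
output = model.generate(**inputs, max_new_tokens=60)
print(processor.decode(output[0], skip_special_tokens=True))
```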
### Llama

```bash
python caption_synthesis_llama.py \
    --dataset coco \
    --image_root ./data/coco \
    --ann_root ./data/coco \
    --ckpt_dir /path/to/llama/checkpoint \
    --output_file ./synthesized_captions.json
```
## Evaluation

Evaluate the distilled dataset:

```bash
CUDA_VISIBLE_DEVICES=0 python evaluation/evaluation.py \
    --dataset coco \
    --distill_image ./sampling_results/coco/experiment/images/ \
    --distill_ann ./sampling_results/coco/experiment/ \
    --image_root ./data/coco/ \
    --ann_root ./data/coco/ \
    --text_encoder bert \
    --batch_size_train 64 \
    --batch_size_test 100
```
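Evaluation follows the standard image-text retrieval protocol: encode images and captions, then report Recall@K over the similarity matrix. A self-contained sketch of the metric itself (random features stand in for real encoder outputs):

```python
# Recall@K for image-to-text retrieval, the standard metric in this setting.
# Random features stand in for real encoder embeddings.
import torch

def recall_at_k(image_emb, text_emb, k=1):
    # Cosine similarity between every image and every caption
    # (one caption per image is assumed here for simplicity).
    image_emb = torch.nn.functional.normalize(image_emb, dim=-1)
    text_emb = torch.nn.functional.normalize(text_emb, dim=-1)
    sims = image_emb @ text_emb.T                      # [N, N]
    topk = sims.topk(k, dim=1).indices                 # [N, k]
    targets = torch.arange(sims.size(0)).unsqueeze(1)  # ground-truth indices
    return (topk == targets).any(dim=1).float().mean().item()

image_emb = torch.randn(100, 512)
text_emb = torch.randn(100, 512)
print(f"R@1:  {recall_at_k(image_emb, text_emb, k=1):.3f}")
print(f"R@10: {recall_at_k(image_emb, text_emb, k=10):.3f}")
```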
## TODO

- Code release
- Training configuration files
- Pretrained diffusion checkpoints
- Distilled dataset release

For updates on these releases, please watch this repository or check back regularly.
## Citation

If you find this work useful, please cite:

```bibtex
@article{zhao2025efficient,
  title={Efficient multimodal dataset distillation via generative models},
  author={Zhao, Zhenghao and Wang, Haoxuan and Wu, Junyi and Shang, Yuzhang and Liu, Gaowen and Yan, Yan},
  journal={arXiv preprint arXiv:2509.15472},
  year={2025}
}
```

## Acknowledgements

Our code is built upon [LoRS](https://github.com/silicx/LoRS_Distill) and Minimax Diffusion.
