
DreamFuse (ICCV 2025)

Official implementation of DreamFuse: Adaptive Image Fusion with Diffusion Transformer


🚀 TODO

  • Release GitHub repo.
  • Release inference code.
  • Release training code.
  • Release model checkpoints.
  • Release arXiv paper.
  • Release the dataset.
  • Release Hugging Face Space demo.
  • Release the LDPO code.

📖 Introduction

Image fusion seeks to seamlessly integrate foreground objects with background scenes, producing realistic and harmonious fused images. Unlike existing methods that directly insert objects into the background, adaptive and interactive fusion remains a challenging yet appealing task. To address this, we propose an iterative human-in-the-loop data generation pipeline, which leverages limited initial data with diverse textual prompts to generate fusion datasets across various scenarios and interactions, including placement, holding, wearing, and style transfer. Building on this, we introduce DreamFuse, a novel approach based on the Diffusion Transformer (DiT) model, to generate consistent and harmonious fused images with both foreground and background information. DreamFuse employs a Positional Affine mechanism and uses Localized Direct Preference Optimization guided by human feedback to refine the results. Experimental results show that DreamFuse outperforms state-of-the-art methods across multiple metrics.

🔧 Dependencies and Installation

git clone https://github.com/LL3RD/DreamFuse-Code.git
cd DreamFuse-Code
conda create -n DreamFuse python=3.10
conda activate DreamFuse
conda install pytorch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 pytorch-cuda=12.1 -c pytorch -c nvidia
pip install -r requirements.txt

📖 Dataset

We propose an iterative Human-in-the-Loop data generation pipeline and construct a comprehensive fusion dataset containing 80k diverse fusion scenarios. Over half of the dataset features outdoor backgrounds, and approximately 23k images include hand-held scenarios.

(Figure) Visualization of different fusion scenarios in the DreamFuse dataset.

(Figure) Visualization of different foregrounds in the DreamFuse dataset.

Download the dataset from Hugging Face:

huggingface-cli download --repo-type dataset --resume-download LL3RD/DreamFuse --local-dir DreamFuse_80K --local-dir-use-symlinks False

Extract the images with:

cat DreamFuse80K.tar.part* > DreamFuse80K.tar
tar -xvf DreamFuse80K.tar
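If `cat` is unavailable (for example on Windows), the same reassembly and extraction can be sketched with Python's standard library. This is a minimal sketch, not part of the repository; the part-file pattern is assumed to match the commands above:

```python
import glob
import tarfile

def reassemble_and_extract(part_pattern, archive_path, dest="."):
    """Concatenate split tar parts (sorted by name) and extract the archive."""
    with open(archive_path, "wb") as out:
        for part in sorted(glob.glob(part_pattern)):
            with open(part, "rb") as src:
                out.write(src.read())
    with tarfile.open(archive_path) as tar:
        tar.extractall(dest)

# Usage, mirroring the shell commands above:
# reassemble_and_extract("DreamFuse80K.tar.part*", "DreamFuse80K.tar")
```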

To visualize the data, refer to the extraction functions in data_reader.py.
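As a quick sanity check after extraction (independent of `data_reader.py`; the extraction directory name below is an assumption), you can tally the files per extension with a short stdlib walk:

```python
import os
from collections import Counter

def count_files_by_extension(root):
    """Walk `root` and count files, grouped by lowercase extension."""
    counts = Counter()
    for _dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            counts[os.path.splitext(name)[1].lower()] += 1
    return counts

# Usage (hypothetical extraction directory):
# print(count_files_by_extension("DreamFuse80K"))
```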

🌟 Gradio Demo

python inference/dreamfuse_gui.py

✍️ Inference

Run inference on single GPU:

python inference/dreamfuse_inference.py

For multi-GPU support:

python inference/multi_gpu_starter.py

🚀 Training

To train DreamFuse from the T2I model (FLUX-dev):

bash dreamfuse_train.sh

Adjust hyperparameters directly in dreamfuse_train.sh and modify the file paths in configs/dreamfuse.yaml.

🎨 Examples

Please visit our Project Gallery.

📄 Citation

If you find this project useful for your research, please consider citing our paper.

License

DreamFuse follows the FLUX-DEV license. See LICENSE for more information.
