Official implementation of DreamFuse: Adaptive Image Fusion with Diffusion Transformer
- Release GitHub repo.
- Release inference code.
- Release training code.
- Release model checkpoints.
- Release arXiv paper.
- Release the dataset.
- Release Hugging Face Space demo.
- Release the LDPO code.
Image fusion seeks to seamlessly integrate foreground objects with background scenes, producing realistic and harmonious fused images. Whereas existing methods simply paste objects directly into the background, adaptive and interactive fusion remains a challenging yet appealing task. To address this, we propose an iterative human-in-the-loop data generation pipeline, which leverages limited initial data with diverse textual prompts to generate fusion datasets across various scenarios and interactions, including placement, holding, wearing, and style transfer. Building on this, we introduce DreamFuse, a novel approach based on the Diffusion Transformer (DiT) model that generates consistent and harmonious fused images conditioned on both foreground and background information. DreamFuse employs a Positional Affine mechanism and uses Localized Direct Preference Optimization (LDPO) guided by human feedback to refine the results. Experimental results show that DreamFuse outperforms state-of-the-art methods across multiple metrics.
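As a rough illustration of the Positional Affine idea, the sketch below shows one plausible reading: the 2D position ids of the foreground tokens are mapped by an affine transform (scale plus translation) into the background's coordinate frame before the DiT applies its rotary position embeddings, so the foreground is "placed" at a target box. The function name, the box parameterization, and the exact formulation are illustrative assumptions, not the repo's API; see the paper for the actual mechanism.

# A minimal sketch (NOT the repo's API) of one plausible reading of the
# Positional Affine mechanism: foreground token coordinates are affinely
# mapped into background token coordinates.
import torch

def positional_affine(fg_hw, target_box):
    # fg_hw:      (H, W) of the foreground token grid (hypothetical input)
    # target_box: (top, left, height, width) of the placement region in
    #             background token coordinates (hypothetical input)
    h, w = fg_hw
    top, left, box_h, box_w = target_box
    ys = torch.arange(h, dtype=torch.float32)
    xs = torch.arange(w, dtype=torch.float32)
    grid_y, grid_x = torch.meshgrid(ys, xs, indexing="ij")
    # Affine map: scale the foreground grid to the box size, then shift it.
    pos_y = grid_y * (box_h / h) + top
    pos_x = grid_x * (box_w / w) + left
    return torch.stack([pos_y, pos_x], dim=-1)  # (H, W, 2) position ids

# e.g. place a 16x16 foreground token grid into a 24x24 box at offset (8, 10)
pos_ids = positional_affine((16, 16), (8, 10, 24, 24))
print(pos_ids.shape)  # torch.Size([16, 16, 2])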
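Likewise, here is a minimal sketch of what a localized preference objective could look like, assuming LDPO applies the standard Diffusion-DPO loss restricted to a spatial mask (for example, the fused foreground region). All tensor names, shapes, and the mask convention are illustrative; consult the released LDPO code for the actual objective.

# A minimal sketch of a mask-localized Diffusion-DPO loss (an assumption
# about what "Localized DPO" means, not the repo's implementation).
import torch
import torch.nn.functional as F

def ldpo_loss(err_theta_w, err_theta_l, err_ref_w, err_ref_l, mask, beta=0.1):
    # err_*: per-pixel denoising errors, shape (B, H, W), channel-reduced,
    #        for the preferred (w) / rejected (l) sample under the trained
    #        (theta) / frozen reference (ref) model.
    # mask:  (B, H, W) binary region where preferences are compared.
    def masked_mean(err):
        return (err * mask).sum(dim=(1, 2)) / mask.sum(dim=(1, 2)).clamp_min(1.0)

    # DPO margin: how much more the trained model prefers the winner over
    # the loser than the reference model does, inside the masked region.
    margin = (masked_mean(err_theta_w) - masked_mean(err_ref_w)) - (
        masked_mean(err_theta_l) - masked_mean(err_ref_l)
    )
    return -F.logsigmoid(-beta * margin).mean()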
git clone https://github.com/LL3RD/DreamFuse-Code.git
cd DreamFuse
conda create -n DreamFuse python=3.10
conda activate DreamFuse
conda install pytorch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 pytorch-cuda=12.1 -c pytorch -c nvidia
pip install -r requirements.txt

We propose an iterative human-in-the-loop data generation pipeline and construct a comprehensive fusion dataset containing 80k diverse fusion scenarios. Over half of the dataset features outdoor backgrounds, and approximately 23k images include hand-held scenarios.
Visualization of different fusion scenarios in the DreamFuse dataset.
Visualization of different foregrounds in the DreamFuse dataset.
Download the dataset from Hugging Face:
huggingface-cli download --repo-type dataset --resume-download LL3RD/DreamFuse --local-dir DreamFuse_80K --local-dir-use-symlinks False

Extract the images with:

cat DreamFuse80K.tar.part* > DreamFuse80K.tar
tar -xvf DreamFuse80K.tar

If you want to visualize the data, refer to the functions in data_reader.py for extracting the records.
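If you prefer to script the download rather than use the CLI, the same dataset can be fetched from Python with huggingface_hub's snapshot_download (a standard huggingface_hub call, not a repo-specific helper):

# Download the dataset from Python (requires: pip install huggingface_hub).
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="LL3RD/DreamFuse",
    repo_type="dataset",
    local_dir="DreamFuse_80K",
)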
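For a quick sanity check of the extracted archive before diving into data_reader.py, a minimal sketch along these lines works; the extraction root and file layout below are assumptions, and the authoritative record format lives in data_reader.py:

# Browse a few extracted images (paths are illustrative; see data_reader.py
# for the actual foreground/background/fused grouping and metadata).
from pathlib import Path
from PIL import Image

root = Path("DreamFuse_80K")  # hypothetical extraction root
image_paths = sorted(p for p in root.rglob("*") if p.suffix in {".png", ".jpg", ".jpeg"})
for path in image_paths[:5]:
    with Image.open(path) as img:
        print(path, img.size, img.mode)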
Launch the interactive GUI demo:

python inference/dreamfuse_gui.py

Run inference on a single GPU:

python inference/dreamfuse_inference.py

For multi-GPU inference:

python inference/multi_gpu_starter.py

To train DreamFuse from the T2I model (FLUX.1-dev):

bash dreamfuse_train.sh

Adjust hyperparameters directly in dreamfuse_train.sh and modify the file paths in configs/dreamfuse.yaml.
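Before launching training, it can help to dump configs/dreamfuse.yaml and see which path entries must point at your local data and checkpoints; the snippet below is a generic convenience sketch, not part of the repo's tooling:

# Print the training config so path-like entries are easy to spot
# (assumes the config is a flat-or-nested YAML mapping).
import yaml

with open("configs/dreamfuse.yaml") as f:
    cfg = yaml.safe_load(f)

for key, value in cfg.items():
    print(f"{key}: {value}")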
Please visit our Project Gallery.
If you find this project useful for your research, please consider citing our paper.
DreamFuse follows the FLUX.1-dev license. See LICENSE for more information.


