Text-to-image diffusion models have demonstrated impressive capabilities in generating high-quality images. However, generating highly specific images solely through text prompts remains challenging and typically requires careful prompt engineering. IP-Adapter is a lightweight solution, with only 22 million parameters, designed to address these challenges, but it lacks variation in its generated images. In this project, we aim to enhance the model by adding a cross-attention step between text and image features before the decoupled cross-attention described in the paper.
Below is a figure of our modification; on the right is the figure of the original model, taken from the paper.
- Additional Image Prompt Capability: IP-Adapter extends pretrained text-to-image diffusion models with an extra image prompt.
- Decoupled Cross-Attention Mechanism: the model employs a decoupled cross-attention mechanism that splits the attention layers between text features and image features, allowing image generation to be conditioned on both text and image prompts (a sketch is given after this list).
- Adaptability: IP-Adapter is designed for seamless adaptation to custom models derived from the same base model, including the generation of human images conditioned on faces or poses.
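To make the decoupled cross-attention idea concrete, here is a minimal single-head PyTorch sketch in which the UNet latents attend separately to text tokens and image tokens, and the two results are summed. The class and variable names, shapes, and the single-head simplification are our own illustration and do not mirror the actual IP-Adapter implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DecoupledCrossAttention(nn.Module):
    """Sketch: separate key/value projections for text and image features;
    the image branch is added on top of the text branch."""

    def __init__(self, dim: int, cross_dim: int):
        super().__init__()
        self.to_q = nn.Linear(dim, dim, bias=False)
        # Text branch (frozen in IP-Adapter)
        self.to_k_text = nn.Linear(cross_dim, dim, bias=False)
        self.to_v_text = nn.Linear(cross_dim, dim, bias=False)
        # Image branch (the trainable adapter weights)
        self.to_k_img = nn.Linear(cross_dim, dim, bias=False)
        self.to_v_img = nn.Linear(cross_dim, dim, bias=False)

    def forward(self, latents, text_embeds, image_embeds, scale: float = 1.0):
        q = self.to_q(latents)
        # Cross-attention of the UNet latents over the text tokens
        text_out = F.scaled_dot_product_attention(
            q, self.to_k_text(text_embeds), self.to_v_text(text_embeds)
        )
        # Cross-attention of the same queries over the image tokens
        img_out = F.scaled_dot_product_attention(
            q, self.to_k_img(image_embeds), self.to_v_img(image_embeds)
        )
        return text_out + scale * img_out


# Toy shapes: batch of 2, 64 latent tokens, 77 text tokens, 16 image tokens
attn = DecoupledCrossAttention(dim=320, cross_dim=768)
out = attn(torch.randn(2, 64, 320), torch.randn(2, 77, 768), torch.randn(2, 16, 768))
print(out.shape)  # torch.Size([2, 64, 320])
```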
Despite its advantages, the IP-Adapter model faces challenges, notably the risk of generating images closely resembling the reference image in terms of style and content. Striking a balance between consistency with the reference image and introducing content variation is crucial.
The primary goal of our project is to enhance the diversity of IP-Adapter's image generation: synthesized images should exhibit a broader range of variation while remaining faithful to the text prompt.
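As a rough illustration of the modification we explore, the sketch below lets the image tokens attend to the text tokens before they reach the decoupled cross-attention layers. The module name, the residual fusion, and the toy shapes are illustrative assumptions, not the exact code in this repository (see the `_new` files for the actual changes).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TextImagePreFusion(nn.Module):
    """Sketch of the extra step: image tokens attend to the text tokens
    before both are handed to the decoupled cross-attention layers."""

    def __init__(self, image_dim: int, text_dim: int):
        super().__init__()
        self.to_q = nn.Linear(image_dim, image_dim, bias=False)
        self.to_k = nn.Linear(text_dim, image_dim, bias=False)
        self.to_v = nn.Linear(text_dim, image_dim, bias=False)

    def forward(self, image_embeds, text_embeds):
        # Image tokens query the text tokens; the result is added back
        # to the original image tokens (residual connection).
        q = self.to_q(image_embeds)
        k = self.to_k(text_embeds)
        v = self.to_v(text_embeds)
        fused = F.scaled_dot_product_attention(q, k, v)
        return image_embeds + fused


# Toy shapes: 16 image tokens attend to 77 text tokens
fusion = TextImagePreFusion(image_dim=768, text_dim=768)
image_tokens = fusion(torch.randn(2, 16, 768), torch.randn(2, 77, 768))
print(image_tokens.shape)  # torch.Size([2, 16, 768])
```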
To set up the project, run:
# Clone the project repository
git clone https://github.com/melvinsevi/MVA-Enhancing-IP-Adapter-Generation-Diversity
# Clone the IP-Adapter models from Hugging Face
git clone https://huggingface.co/h94/IP-Adapter
# Move the downloaded contents into the project directory and rename it to IPAdapter
mv /content/IP-Adapter/* /content/MVA-Enhancing-IP-Adapter-Generation-Diversity/
mv MVA-Enhancing-IP-Adapter-Generation-Diversity IPAdapter
To download the dataset from Hugging Face, run:
python preprocessing_data.py
To launch training, run:
accelerate launch --num_processes 1 \
IPAdapter/tutorial_train_plus.py \
--pretrained_model_name_or_path="runwayml/stable-diffusion-v1-5" \
--image_encoder_path="IPAdapter/models/image_encoder" \
--data_json_file="IPAdapter/output.json" \
--data_root_path="dataset" \
--resolution=512 \
--train_batch_size=4 \
--dataloader_num_workers=1 \
--learning_rate=1e-04 \
--weight_decay=0.01 \
--output_dir="IPAdapter" \
--save_steps=132
To generate images with the trained adapter, run:
python test2.py
For more details, you can refer directly to the README of the original IP-Adapter in the IPAdapter folder.
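If you prefer to call the adapter from Python rather than through test2.py, something along these lines should work, assuming this fork keeps the upstream IP-Adapter interface (the `IPAdapterPlus` class and its `generate` method); the checkpoint path and reference image below are placeholders.

```python
import torch
from PIL import Image
from diffusers import StableDiffusionPipeline

# Assumption: the fork keeps the upstream ip_adapter package and IPAdapterPlus class.
from ip_adapter import IPAdapterPlus

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
)

ip_model = IPAdapterPlus(
    pipe,
    image_encoder_path="IPAdapter/models/image_encoder",
    ip_ckpt="IPAdapter/models/ip-adapter-plus_sd15.bin",  # placeholder checkpoint path
    device="cuda",
    num_tokens=16,
)

images = ip_model.generate(
    pil_image=Image.open("reference.jpg"),  # placeholder reference image
    prompt="best quality, high quality",
    num_samples=4,
    num_inference_steps=50,
)
images[0].save("sample_0.png")
```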
# Notes
Note that all the files we modified have the `_new` suffix in their name (for example, `ip_adapter_new.py`). The parts we modified are always enclosed between two "MODIFIED" comments.