[CVPR 2026] Official Implementation of "Towards High-resolution and Disentangled Reference-based Sketch Colorization"
SDXL-based implementation of ColorizeDiffusion, a reference-based sketch colorization framework built on Stable Diffusion. This repository contains the XL architecture (1024px) with enhanced embedding guidance for character colorization and geometry disentanglement. For the base SD2.1 implementation (512/768px), refer to the original repository.
```
conda env create -f environment.yml
conda activate hf
python -u app.py
```
The default server address is http://localhost:7860.
| Option | Description |
|---|---|
| Low-level injection | Enable low-level feature injection for backgrounds. |
| Attention injection | Noised low-level feature injection, 2x inference time. |
| Reference guidance scale | Classifier-free guidance scale for the reference image. |
| Reference strength | Decrease to increase semantic fidelity to sketch inputs. |
| Foreground strength | Reference strength for the foreground region. |
| Background strength | Reference strength for the background region. |
| Sketch guidance scale | Classifier-free guidance scale for the sketch image, suggested 1. |
| Sketch strength | Control scale of the sketch condition. |
| Background factor | Controls how background region is blended. |
| Merging scale | Scale for merging foreground and background. |
| Preprocessor | Sketch preprocessing. Extract is suggested for complicated pencil drawings. |
| Line extractor | Line extractor used when the preprocessor is Extract. |
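The two guidance scales above combine in the standard multi-condition classifier-free guidance way. The sketch below illustrates that combination only; the function name and scalar inputs are illustrative assumptions, not the repository's actual sampling code.

```python
def dual_cfg(eps_uncond, eps_ref, eps_sketch, gs_ref=5.0, gs_sketch=1.0):
    # Standard multi-condition classifier-free guidance: each conditional
    # branch pushes the prediction away from the unconditional one by its
    # own scale. Scalars are used here for clarity; in practice these are
    # per-pixel noise predictions from the denoising UNet.
    return (eps_uncond
            + gs_ref * (eps_ref - eps_uncond)
            + gs_sketch * (eps_sketch - eps_uncond))

# With the suggested sketch guidance scale of 1, the sketch branch
# contributes its full conditional prediction without amplification.
print(dual_cfg(0.0, 1.0, 2.0))  # → 7.0
```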
Text manipulation is deactivated by default. To activate it:
```
python -u app.py -manipulate
```
Use `--full` to expose additional advanced controls (per-level sketch strengths, cross-attention scales, injection fidelity, etc.).
Refer to the base repository for details on manipulation options.
Our implementation is based on Accelerate and DeepSpeed. Before training, collect your data and organize the training dataset as follows:
```
[dataset_path]
├── image_list.json        # Optional, for image indexing
├── color/                 # Color images (zip archives)
│   ├── 0001.zip
│   │   ├── 10001.png
│   │   ├── 100001.jpg
│   │   └── ...
│   ├── 0002.zip
│   └── ...
├── sketch/                # Sketch images (zip archives)
│   ├── 0001.zip
│   │   ├── 10001.png
│   │   ├── 100001.jpg
│   │   └── ...
│   ├── 0002.zip
│   └── ...
└── mask/                  # Mask images (required for adapter training)
    ├── 0001.zip
    │   ├── 10001.png
    │   ├── 100001.jpg
    │   └── ...
    ├── 0002.zip
    └── ...
```
For details of dataset organization, see data/dataloader.py.
Training command:
```
accelerate launch --config_file [accelerate_config] \
    train.py \
    -n [experiment_name] \
    -d [dataset_path] \
    -bs 16 \
    -nt 4 \
    -cfg configs/training/sdxl-base.yaml \
    -pt [pretrained_model_path] \
    -lr 1e-5 \
    -fm
```
Note that `-bs` sets the micro batch size per GPU: running this command on 8 GPUs gives a total batch size of 128. Use `-fm` to fit pretrained weights to the new model architecture.
Refer to options.py for full arguments.
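For reference, an Accelerate config for multi-GPU DeepSpeed training might look like the fragment below. The values shown are illustrative assumptions, not the repository's shipped config; generate your own interactively with `accelerate config`.

```yaml
compute_environment: LOCAL_MACHINE
distributed_type: DEEPSPEED
deepspeed_config:
  zero_stage: 2
  gradient_accumulation_steps: 1
mixed_precision: bf16
num_processes: 8   # one process per GPU
```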
```
# Inference
python inference.py \
    --name inf \
    --dataroot [dataset_path] \
    --batch_size 64 \
    -cfg configs/inference/sdxl.yaml \
    -pt [pretrained_model_path] \
    -gs 5

# Validation (uses random reference images)
python inference.py \
    --name val \
    --dataroot [dataset_path] \
    --batch_size 64 \
    -cfg configs/inference/xl-val.yaml \
    -pt [pretrained_model_path] \
    -gs 5 \
    -val
```
The difference between inference and validation modes is that validation mode uses randomly selected images as reference inputs.
Refer to options.py for full arguments.
- Stable Diffusion XL
- SD-webui-ControlNet
- Stable-Diffusion-webui
- K-diffusion
- DeepSpeed
- sketchKeras-PyTorch
```bibtex
@InProceedings{Yan_2025_CVPR,
    author    = {Yan, Dingkun and Wang, Xinrui and Li, Zhuoru and Saito, Suguru and Iwasawa, Yusuke and Matsuo, Yutaka and Guo, Jiaxian},
    title     = {Image Referenced Sketch Colorization Based on Animation Creation Workflow},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2025},
    pages     = {23391-23400}
}

@article{2026arXiv260305971Y,
    author  = {{Yan}, Dingkun and {Wang}, Xinrui and {Iwasawa}, Yusuke and {Matsuo}, Yutaka and {Saito}, Suguru and {Guo}, Jiaxian},
    title   = {ColorizeDiffusion XL: Enhancing Embedding Guidance for Character Colorization and Geometry Disentanglement},
    journal = {arXiv e-prints},
    year    = {2026},
    doi     = {10.48550/arXiv.2603.05971}
}
```