ColorizeDiffusion XL


[CVPR 2026] Official Implementation of "Towards High-resolution and Disentangled Reference-based Sketch Colorization"

SDXL-based implementation of ColorizeDiffusion, a reference-based sketch colorization framework built on Stable Diffusion. This repository contains the XL architecture (1024px) with enhanced embedding guidance for character colorization and geometry disentanglement. For the base SD2.1 implementation (512/768px), refer to the original repository.

Getting Started


conda env create -f environment.yml
conda activate hf

User Interface


python -u app.py

The default server address is http://localhost:7860.
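If port 7860 is already taken, the UI can be moved to another port. The default port 7860 suggests `app.py` uses Gradio's launcher, which reads the standard `GRADIO_SERVER_PORT` environment variable; this is a generic Gradio mechanism, not something specific to this repository:

```python
import os

# Gradio reads GRADIO_SERVER_PORT at launch time, so setting it before
# running app.py moves the UI off the default port 7860. (Assumes
# app.py launches via gradio's default launcher.)
os.environ["GRADIO_SERVER_PORT"] = "7861"
```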

Inference options

| Option | Description |
| --- | --- |
| Low-level injection | Enable low-level feature injection for backgrounds. |
| Attention injection | Noised low-level feature injection; roughly doubles inference time. |
| Reference guidance scale | Classifier-free guidance scale for the reference image. |
| Reference strength | Decrease to increase semantic fidelity to the sketch input. |
| Foreground strength | Reference strength for the foreground region. |
| Background strength | Reference strength for the background region. |
| Sketch guidance scale | Classifier-free guidance scale for the sketch image; 1 is suggested. |
| Sketch strength | Control scale of the sketch condition. |
| Background factor | Controls how the background region is blended. |
| Merging scale | Scale for merging foreground and background. |
| Preprocessor | Sketch preprocessing. "Extract" is suggested for complicated pencil drawings. |
| Line extractor | Line extractor used when the preprocessor is "Extract". |
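The two guidance scales above act as separate classifier-free guidance weights, one per condition. As a rough illustration, here is the standard compositional CFG formula for two conditions; this is a sketch of the general technique, not necessarily the exact formula this repository implements:

```python
def dual_cfg(eps_uncond, eps_sketch, eps_both, sketch_scale=1.0, ref_scale=5.0):
    """Compositional classifier-free guidance over two conditions.

    eps_uncond: model prediction with no conditions
    eps_sketch: prediction with the sketch condition only
    eps_both:   prediction with sketch + reference conditions

    With both scales at 1.0 this reduces to eps_both; raising ref_scale
    pushes the result further along the reference direction.
    """
    return (eps_uncond
            + sketch_scale * (eps_sketch - eps_uncond)
            + ref_scale * (eps_both - eps_sketch))
```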

Text manipulation is deactivated by default. To activate:

python -u app.py -manipulate

Use --full to expose additional advanced controls (per-level sketch strengths, cross-attention scales, injection fidelity, etc.). Refer to the base repository for details on manipulation options.

Training


Our training implementation is based on Accelerate and DeepSpeed. Before starting a training run, organize your training dataset as follows:

[dataset_path]
├── image_list.json    # Optional, for image indexing
├── color/             # Color images (zip archives)
│   ├── 0001.zip
│   │   ├── 10001.png
│   │   ├── 100001.jpg
│   │   └── ...
│   ├── 0002.zip
│   └── ...
├── sketch/            # Sketch images (zip archives)
│   ├── 0001.zip
│   │   ├── 10001.png
│   │   ├── 100001.jpg
│   │   └── ...
│   ├── 0002.zip
│   └── ...
└── mask/              # Mask images (required for adapter training)
    ├── 0001.zip
    │   ├── 10001.png
    │   ├── 100001.jpg
    │   └── ...
    ├── 0002.zip
    └── ...

For details of dataset organization, see data/dataloader.py.
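The actual loader lives in data/dataloader.py. As a hypothetical sketch of how the zip layout above can be indexed (the function name and the assumption that color/sketch pairs share archive and member names are ours):

```python
import zipfile
from pathlib import Path


def index_pairs(dataset_path):
    """List (archive, member) pairs present in both color/ and sketch/.

    Assumes the layout above: color/ and sketch/ each hold zip archives
    with the same names, and paired images share a file name inside.
    """
    root = Path(dataset_path)
    pairs = []
    for color_zip in sorted((root / "color").glob("*.zip")):
        sketch_zip = root / "sketch" / color_zip.name
        if not sketch_zip.exists():
            continue  # skip archives without a sketch counterpart
        with zipfile.ZipFile(color_zip) as cz, zipfile.ZipFile(sketch_zip) as sz:
            sketch_names = set(sz.namelist())
            for name in cz.namelist():
                if name in sketch_names:
                    pairs.append((color_zip.name, name))
    return pairs
```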

Training command:

accelerate launch --config_file [accelerate_config] \
    train.py \
    -n [experiment_name] \
    -d [dataset_path] \
    -bs 16 \
    -nt 4 \
    -cfg configs/training/sdxl-base.yaml \
    -pt [pretrained_model_path] \
    -lr 1e-5 \
    -fm

Note that `-bs` sets the micro batch size per GPU: with `-bs 16` on 8 GPUs, the effective batch size is 128. Use `-fm` to fit pretrained weights to the new model architecture. Refer to options.py for the full list of arguments.
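The batch-size arithmetic, including the gradient-accumulation factor that Accelerate can apply on top (included here for completeness; whether it is used depends on your Accelerate config):

```python
def effective_batch_size(micro_batch_per_gpu, num_gpus, grad_accum_steps=1):
    # -bs gives the per-GPU micro batch; the effective batch size also
    # scales with the GPU count and any gradient-accumulation steps
    # configured at the Accelerate level.
    return micro_batch_per_gpu * num_gpus * grad_accum_steps


effective_batch_size(16, 8)  # -bs 16 on 8 GPUs -> 128
```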

Inference & Validation


# Inference
python inference.py \
    --name inf \
    --dataroot [dataset_path] \
    --batch_size 64 \
    -cfg configs/inference/sdxl.yaml \
    -pt [pretrained_model_path] \
    -gs 5

# Validation (uses random reference images)
python inference.py \
    --name val \
    --dataroot [dataset_path] \
    --batch_size 64 \
    -cfg configs/inference/xl-val.yaml \
    -pt [pretrained_model_path] \
    -gs 5 \
    -val

The difference between inference and validation modes is that validation mode uses randomly selected images as reference inputs. Refer to options.py for full arguments.
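Validation-style reference sampling can be sketched as follows; this is a hypothetical helper illustrating the idea (pair each sketch with a randomly chosen, different image as its color reference), not the repository's actual sampling code:

```python
import random


def pick_references(image_ids, seed=0):
    """Map each image id to a randomly chosen different id to use as
    its reference; falls back to the image itself if it is the only one."""
    rng = random.Random(seed)  # seeded for reproducible validation runs
    refs = {}
    for img in image_ids:
        candidates = [other for other in image_ids if other != img]
        refs[img] = rng.choice(candidates) if candidates else img
    return refs
```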

Code Reference


  1. Stable Diffusion XL
  2. SD-webui-ControlNet
  3. Stable-Diffusion-webui
  4. K-diffusion
  5. DeepSpeed
  6. sketchKeras-PyTorch

Citation


@InProceedings{Yan_2025_CVPR,
    author    = {Yan, Dingkun and Wang, Xinrui and Li, Zhuoru and Saito, Suguru and Iwasawa, Yusuke and Matsuo, Yutaka and Guo, Jiaxian},
    title     = {Image Referenced Sketch Colorization Based on Animation Creation Workflow},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2025},
    pages     = {23391-23400}
}

@article{2026arXiv260305971Y,
    author = {{Yan}, Dingkun and {Wang}, Xinrui and {Iwasawa}, Yusuke and {Matsuo}, Yutaka and {Saito}, Suguru and {Guo}, Jiaxian},
    title = "{ColorizeDiffusion XL: Enhancing Embedding Guidance for Character Colorization and Geometry Disentanglement}",
    journal = {arXiv e-prints},
    year = {2026},
    doi = {10.48550/arXiv.2603.05971},
}
