[CVPR 2026] Official Implementation of "Towards High-resolution and Disentangled Reference-based Sketch Colorization"
SDXL-based implementation of ColorizeDiffusion, a reference-based sketch colorization framework built on Stable Diffusion. This repository contains the XL architecture (1024px) with enhanced embedding guidance for character colorization and geometry disentanglement. For the base SD2.1 implementation (512/768px), refer to the original repository.
```
conda env create -f environment.yml
conda activate hf
python -u app.py
```
The default server address is http://localhost:7860.
| Option | Description |
|---|---|
| Low-level injection | Enable low-level feature injection for backgrounds. |
| Attention injection | Noised low-level feature injection, 2x inference time. |
| Reference guidance scale | Classifier-free guidance scale for the reference image. |
| Reference strength | Decrease to increase semantic fidelity to sketch inputs. |
| Foreground strength | Reference strength for the foreground region. |
| Background strength | Reference strength for the background region. |
| Sketch guidance scale | Classifier-free guidance scale for the sketch image, suggested 1. |
| Sketch strength | Control scale of the sketch condition. |
| Background factor | Controls how background region is blended. |
| Merging scale | Scale for merging foreground and background. |
| Preprocessor | Sketch preprocessing. Extract is suggested for complicated pencil drawings. |
| Line extractor | Line extractor used when the preprocessor is Extract. |
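The two guidance scales above combine in the standard multi-condition classifier-free guidance way. The sketch below illustrates that combination only; the function name and scalar inputs are illustrative assumptions, not the repository's actual sampling code.

```python
def dual_cfg(eps_uncond, eps_ref, eps_sketch, gs_ref=5.0, gs_sketch=1.0):
    # Standard multi-condition classifier-free guidance: each conditional
    # branch pushes the prediction away from the unconditional one by its
    # own scale. Scalars are used here for clarity; in practice these are
    # per-pixel noise predictions from the denoising UNet.
    return (eps_uncond
            + gs_ref * (eps_ref - eps_uncond)
            + gs_sketch * (eps_sketch - eps_uncond))

# With the suggested sketch guidance scale of 1, the sketch branch
# contributes its full conditional prediction without amplification.
print(dual_cfg(0.0, 1.0, 2.0))  # → 7.0
```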
Text manipulation is deactivated by default. To activate it:
```
python -u app.py -manipulate
```
Use `--full` to expose additional advanced controls (per-level sketch strengths, cross-attention scales, injection fidelity, etc.).
Refer to the base repository for details on manipulation options.
Our implementation is based on Accelerate and DeepSpeed. Before training, collect your data and organize the training dataset as follows:
```
[dataset_path]
├── image_list.json        # Optional, for image indexing
├── color/                 # Color images (zip archives)
│   ├── 0001.zip
│   │   ├── 10001.png
│   │   ├── 100001.jpg
│   │   └── ...
│   ├── 0002.zip
│   └── ...
├── sketch/                # Sketch images (zip archives)
│   ├── 0001.zip
│   │   ├── 10001.png
│   │   ├── 100001.jpg
│   │   └── ...
│   ├── 0002.zip
│   └── ...
└── mask/                  # Mask images (required for adapter training)
    ├── 0001.zip
    │   ├── 10001.png
    │   ├── 100001.jpg
    │   └── ...
    ├── 0002.zip
    └── ...
```
For details of dataset organization, see data/dataloader.py.
Training command:
```
accelerate launch --config_file [accelerate_config] \
    train.py \
    -n [experiment_name] \
    -d [dataset_path] \
    -bs 16 \
    -nt 4 \
    -cfg configs/training/sdxl-base.yaml \
    -pt [pretrained_model_path] \
    -lr 1e-5 \
    -fm
```
Note that `-bs` sets the micro batch size per GPU: running this command on 8 GPUs gives a total batch size of 128. Use `-fm` to fit pretrained weights to the new model architecture.
Refer to options.py for full arguments.
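For reference, an Accelerate config for multi-GPU DeepSpeed training might look like the fragment below. The values shown are illustrative assumptions, not the repository's shipped config; generate your own interactively with `accelerate config`.

```yaml
compute_environment: LOCAL_MACHINE
distributed_type: DEEPSPEED
deepspeed_config:
  zero_stage: 2
  gradient_accumulation_steps: 1
mixed_precision: bf16
num_processes: 8   # one process per GPU
```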
```
# Inference
python inference.py \
    --name inf \
    --dataroot [dataset_path] \
    --batch_size 64 \
    -cfg configs/inference/sdxl.yaml \
    -pt [pretrained_model_path] \
    -gs 5

# Validation (uses random reference images)
python inference.py \
    --name val \
    --dataroot [dataset_path] \
    --batch_size 64 \
    -cfg configs/inference/xl-val.yaml \
    -pt [pretrained_model_path] \
    -gs 5 \
    -val
```
The difference between inference and validation modes is that validation mode uses randomly selected images as reference inputs.
Refer to options.py for full arguments.
- Stable Diffusion XL
- SD-webui-ControlNet
- Stable-Diffusion-webui
- K-diffusion
- DeepSpeed
- sketchKeras-PyTorch
```bibtex
@InProceedings{Yan_2025_CVPR,
    author    = {Yan, Dingkun and Wang, Xinrui and Li, Zhuoru and Saito, Suguru and Iwasawa, Yusuke and Matsuo, Yutaka and Guo, Jiaxian},
    title     = {Image Referenced Sketch Colorization Based on Animation Creation Workflow},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2025},
    pages     = {23391-23400}
}

@article{2026arXiv260305971Y,
    author  = {{Yan}, Dingkun and {Wang}, Xinrui and {Iwasawa}, Yusuke and {Matsuo}, Yutaka and {Saito}, Suguru and {Guo}, Jiaxian},
    title   = {ColorizeDiffusion XL: Enhancing Embedding Guidance for Character Colorization and Geometry Disentanglement},
    journal = {arXiv e-prints},
    year    = {2026},
    doi     = {10.48550/arXiv.2603.05971}
}
```