Kotaro Kikuchi¹, Ukyo Honda¹, Naoto Inoue¹, Mayu Otani¹, Edgar Simo-Serra², Kota Yamaguchi¹
¹CyberAgent, ²Waseda University
This repository contains the inference code and pre-trained models for the paper Multimodal Markup Document Models for Graphic Design Completion (ACM Multimedia 2025).
1. Clone this repository:
git clone https://github.com/CyberAgentAILab/MarkupDM.git
cd MarkupDM
2. Install dependencies:
# Using pip
pip install .
# Or using uv
uv sync
3. Install Google Chrome (required for SVG rendering on Linux):
wget -q -O - https://dl.google.com/linux/linux_signing_key.pub | sudo apt-key add -
sudo sh -c 'echo "deb [arch=amd64] http://dl.google.com/linux/chrome/deb/ stable main" >> /etc/apt/sources.list.d/google-chrome.list'
sudo apt update
sudo apt install google-chrome-stable
4. Run inference:
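Because SVG rendering depends on Chrome, a quick check before running the inference commands can save a failed run. The following is a minimal sketch, not part of this repository: `find_chrome` and the candidate list are hypothetical, and how the inference scripts actually locate Chrome is not stated here, so a hit on `PATH` is only a reasonable hint.

```python
import shutil

# "google-chrome-stable" is the package installed in step 3.
# The remaining names are assumptions covering other common installs.
CANDIDATES = ("google-chrome-stable", "google-chrome", "chromium", "chromium-browser")


def find_chrome(candidates=CANDIDATES, which=shutil.which):
    """Return the path of the first candidate found on PATH, or None."""
    for name in candidates:
        path = which(name)
        if path:
            return path
    return None


if __name__ == "__main__":
    chrome = find_chrome()
    print(chrome or "No Chrome executable found on PATH")
```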
Image reconstruction:
Encodes and decodes an input image using the VQ-VAE model to test reconstruction quality.
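To put a number on reconstruction quality after the reconstruction command has written its output, one common choice is the peak signal-to-noise ratio (PSNR). The sketch below is an illustration only: `psnr` is a hypothetical helper, and PSNR is not necessarily the metric used in the paper. It assumes both images have already been decoded into equally sized 8-bit pixel buffers.

```python
import math


def psnr(original: bytes, reconstructed: bytes, max_value: int = 255) -> float:
    """Peak signal-to-noise ratio in dB between two equally sized 8-bit buffers."""
    if len(original) != len(reconstructed):
        raise ValueError("buffers must have the same length")
    if not original:
        raise ValueError("buffers must not be empty")
    # Mean squared error over every channel value.
    mse = sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)
    if mse == 0:
        return math.inf  # identical buffers
    return 10.0 * math.log10(max_value ** 2 / mse)
```

If Pillow is installed (an assumption), each buffer could be obtained with `Image.open(path).convert("RGBA").tobytes()`. The comparison is only meaningful when the reconstructed image has the same dimensions as the input, which should be checked first.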
# If using uv: uv run python inference/reconstruct_image.py ...
# Note: Replace input_image.png with your own image file
python inference/reconstruct_image.py input_image.png \
--output_path reconstructed_image.png \
--model_path cyberagent/ldm-vq-f16-rgba
Design completion:
Generates SVG and PNG files for input, target, and predicted designs from the Crello test dataset.
Note: This requires access to bigcode/starcoderbase-7b. Visit the model page to request access.
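Since the model is gated, downloads will likely fail unless a Hugging Face token is available locally. A minimal pre-flight sketch follows; `find_hf_token` is a hypothetical helper, and it assumes the default `huggingface_hub` conventions (the `HF_TOKEN` environment variable, or a `token` file under `HF_HOME`, which defaults to `~/.cache/huggingface`). Finding a token does not prove that access to the model has been granted.

```python
import os
from pathlib import Path


def find_hf_token(env=os.environ, home=None):
    """Return a locally stored Hugging Face token, or None if none is found."""
    # HF_TOKEN is the environment variable read by huggingface_hub.
    token = env.get("HF_TOKEN")
    if token and token.strip():
        return token.strip()
    # Fall back to the token file written by the Hugging Face login command.
    # Assumed default location; HF_HOME overrides the cache root.
    if env.get("HF_HOME"):
        hf_home = Path(env["HF_HOME"])
    else:
        base = Path(home) if home is not None else Path.home()
        hf_home = base / ".cache" / "huggingface"
    token_file = hf_home / "token"
    if token_file.is_file():
        return token_file.read_text().strip() or None
    return None


if __name__ == "__main__":
    print("token found" if find_hf_token() else "no token found; log in first")
```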
# If using uv: uv run python inference/complete_design.py ...
python inference/complete_design.py \
--output_dir output \
--model_path cyberagent/markupdm
Output files will be saved in subdirectories within the specified directory:
- input/: Input design with missing text
  - index.html: HTML file with embedded SVG
  - Referenced assets (PNG images and TTF fonts)
  - screenshot.png: Rendered result
- target/: Original complete design (same structure as above)
- pred/: Model-generated completion (same structure as above)
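The layout described above can be checked after a run with a short script. This is a sketch under assumptions: `missing_outputs` is a hypothetical helper, and it assumes the `input/`, `target/`, and `pred/` subdirectories sit directly under the directory passed in. If the script nests them per sample, pass that sample's directory instead.

```python
from pathlib import Path

SPLITS = ("input", "target", "pred")
REQUIRED = ("index.html", "screenshot.png")


def missing_outputs(output_dir):
    """List expected files absent from output_dir, as paths relative to it."""
    root = Path(output_dir)
    missing = []
    for split in SPLITS:
        for name in REQUIRED:
            if not (root / split / name).is_file():
                missing.append(f"{split}/{name}")
    return missing


if __name__ == "__main__":
    # "output" matches the --output_dir value used in the command above.
    problems = missing_outputs("output")
    print("all expected files present" if not problems else f"missing: {problems}")
```

Referenced assets (PNG images and TTF fonts) are not checked because their file names depend on the design.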
Pre-trained models are available on Hugging Face:
- MarkupDM: cyberagent/markupdm - Main model for design completion
- LDM-VQ-F16-RGBA: cyberagent/ldm-vq-f16-rgba - Image tokenizer for RGBA images
See dataset/README.md for details on the Crello-Instruct dataset used in this project.
This repository is released under the Apache-2.0 license.
@inproceedings{Kikuchi2025,
title = {Multimodal Markup Document Models for Graphic Design Completion},
author = {Kotaro Kikuchi and Ukyo Honda and Naoto Inoue and Mayu Otani and Edgar Simo-Serra and Kota Yamaguchi},
booktitle = {ACM International Conference on Multimedia},
year = {2025},
doi = {10.1145/3746027.3755420}
}