Multimodal Markup Document Models for Graphic Design Completion

Kotaro Kikuchi¹ Ukyo Honda¹ Naoto Inoue¹ Mayu Otani¹ Edgar Simo-Serra² Kota Yamaguchi¹

¹CyberAgent ²Waseda University

This repository contains the inference code and pre-trained models for the paper Multimodal Markup Document Models for Graphic Design Completion (ACM Multimedia 2025).

Usage

1. Clone this repository:

git clone https://github.com/CyberAgentAILab/MarkupDM.git
cd MarkupDM

2. Install dependencies:

# Using pip
pip install .

# Or using uv
uv sync

3. Install Google Chrome (required for SVG rendering on Linux):

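# apt-key is deprecated on recent Debian/Ubuntu releases, so store the key in a dedicated keyring
wget -q -O - https://dl.google.com/linux/linux_signing_key.pub | sudo gpg --dearmor --yes -o /usr/share/keyrings/google-chrome.gpg
# tee overwrites the list file, so re-running this step does not add duplicate entries
echo "deb [arch=amd64 signed-by=/usr/share/keyrings/google-chrome.gpg] http://dl.google.com/linux/chrome/deb/ stable main" | sudo tee /etc/apt/sources.list.d/google-chrome.list
sudo apt update
sudo apt install google-chrome-stable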
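To confirm that the installation succeeded, a quick check for the Chrome executable on PATH can help. This is a minimal sketch that is not part of the repository: it assumes the package exposes a google-chrome or google-chrome-stable command (as the Debian package normally does), and it does not know which name the rendering code actually looks up.

```python
import shutil
from typing import Iterable, Optional

# Command names assumed to be installed by the google-chrome-stable package.
CHROME_CANDIDATES = ("google-chrome", "google-chrome-stable")


def find_chrome(candidates: Iterable[str] = CHROME_CANDIDATES) -> Optional[str]:
    """Return the full path of the first candidate found on PATH, or None."""
    for name in candidates:
        path = shutil.which(name)
        if path is not None:
            return path
    return None
```

After a successful install, find_chrome() should return a path such as /usr/bin/google-chrome; None suggests that SVG rendering will likely fail.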

4. Run inference:

Image reconstruction:

Encodes an input image into discrete tokens with the VQ-VAE image tokenizer and decodes it back, which lets you check reconstruction quality.

# If using uv: uv run python inference/reconstruct_image.py ...
# Note: Replace input_image.png with your own image file
python inference/reconstruct_image.py input_image.png \
    --output_path reconstructed_image.png \
    --model_path cyberagent/ldm-vq-f16-rgba
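To put a number on reconstruction quality, one option is to compare the input and the reconstructed image with PSNR (peak signal-to-noise ratio). The helper below is a hypothetical sketch, not part of the repository. It assumes both images have the same size and have been converted to the same mode (e.g., RGBA) so that their raw 8-bit pixel buffers line up; if the reconstruction script resizes the image, resize or crop first.

```python
import math
from typing import Sequence


def psnr(original: Sequence[int], reconstructed: Sequence[int], max_value: int = 255) -> float:
    """PSNR in dB between two equally sized 8-bit pixel buffers."""
    if len(original) != len(reconstructed):
        raise ValueError("images must have the same size and channel layout")
    if len(original) == 0:
        raise ValueError("empty input")
    # Mean squared error over every channel value.
    mse = sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)
    if mse == 0:
        return math.inf  # identical images
    return 10 * math.log10(max_value ** 2 / mse)
```

With Pillow, a suitable buffer can be obtained with Image.open(path).convert("RGBA").tobytes(), since bytes objects iterate as integers. Higher values mean a closer reconstruction, and identical images give infinity.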

Design completion:

Generates SVG and PNG files for input, target, and predicted designs from the Crello test dataset.

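Note: This requires access to bigcode/starcoderbase-7b, a gated model on Hugging Face. Request access on the model page, then authenticate locally (e.g., with huggingface-cli login) before running the command below.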
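Before starting a long run, it may be worth checking that a Hugging Face token is configured at all. This is a heuristic sketch, not part of the repository: it only checks that a token seems to be present, not that your account has been granted access to bigcode/starcoderbase-7b. The token file location is an assumption based on huggingface_hub defaults, which can also be changed by other settings (e.g., HF_TOKEN_PATH or XDG_CACHE_HOME) that this sketch ignores.

```python
import os
from pathlib import Path


def hf_token_available() -> bool:
    """Heuristically check whether a Hugging Face access token is configured."""
    # An explicit token in the environment takes precedence.
    if os.environ.get("HF_TOKEN"):
        return True
    # Otherwise look for the token file written by `huggingface-cli login`,
    # assuming the default cache location unless HF_HOME overrides it.
    hf_home = Path(os.environ.get("HF_HOME", str(Path.home() / ".cache" / "huggingface")))
    return (hf_home / "token").is_file()
```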

# If using uv: uv run python inference/complete_design.py ...
python inference/complete_design.py \
    --output_dir output \
    --model_path cyberagent/markupdm

Output files will be saved in subdirectories of the specified output directory:

- input/: Input design with missing text
  - index.html: HTML file with embedded SVG
  - Referenced assets (PNG images and TTF fonts)
  - screenshot.png: Rendered result
- target/: Original complete design (same structure as above)
- pred/: Model-generated completion (same structure as above)
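To compare results side by side, the rendered screenshots can be gathered per variant. The helper below is hypothetical and not part of the repository. It searches recursively, so it works whether screenshot.png sits directly in each subdirectory or in per-sample folders beneath it, a detail the description above does not pin down.

```python
from pathlib import Path
from typing import Dict, List, Union

# Subdirectory names as described in the output layout.
VARIANTS = ("input", "target", "pred")


def collect_screenshots(output_dir: Union[str, Path]) -> Dict[str, List[Path]]:
    """Map each variant (input/target/pred) to its sorted screenshot.png paths."""
    root = Path(output_dir)
    result: Dict[str, List[Path]] = {}
    for variant in VARIANTS:
        variant_dir = root / variant
        # Missing variant directories yield an empty list instead of an error.
        if variant_dir.is_dir():
            result[variant] = sorted(variant_dir.rglob("screenshot.png"))
        else:
            result[variant] = []
    return result
```

For example, collect_screenshots("output") after running the completion command above returns the screenshot paths for each of input, target, and pred.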

Pre-trained Models

Pre-trained models are available on Hugging Face:

- MarkupDM: cyberagent/markupdm - Main model for design completion
- LDM-VQ-F16-RGBA: cyberagent/ldm-vq-f16-rgba - Image tokenizer for RGBA images
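Assuming "F16" in the tokenizer name follows the latent-diffusion naming convention, where a VQ model downsamples each spatial dimension by a factor of 16, the number of discrete tokens an image yields can be estimated as follows. The helper names and the round-up behaviour for sizes not divisible by 16 are assumptions; the actual tokenizer may resize or pad inputs differently, and this says nothing about how MarkupDM places those tokens in the markup.

```python
from typing import Tuple


def vq_token_grid(width: int, height: int, factor: int = 16) -> Tuple[int, int]:
    """Return (columns, rows) of the latent grid, rounding partial patches up."""
    if width <= 0 or height <= 0 or factor <= 0:
        raise ValueError("width, height and factor must be positive")
    # Ceiling division without floating point: -(-a // b) == ceil(a / b).
    return -(-width // factor), -(-height // factor)


def vq_token_count(width: int, height: int, factor: int = 16) -> int:
    """Number of discrete image tokens for one image under the assumed factor."""
    columns, rows = vq_token_grid(width, height, factor)
    return columns * rows
```

For example, vq_token_count(256, 256) gives 256 tokens (a 16 x 16 grid).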

Dataset

See dataset/README.md for details on the Crello-Instruct dataset used in this project.

License

This repository is released under the Apache-2.0 license.

Citation

@inproceedings{Kikuchi2025,
  title     = {Multimodal Markup Document Models for Graphic Design Completion},
  author    = {Kotaro Kikuchi and Ukyo Honda and Naoto Inoue and Mayu Otani and Edgar Simo-Serra and Kota Yamaguchi},
  booktitle = {ACM International Conference on Multimedia},
  year      = {2025},
  doi       = {10.1145/3746027.3755420}
}
