
Multimodal Markup Document Models for Graphic Design Completion


Kotaro Kikuchi¹, Ukyo Honda¹, Naoto Inoue¹, Mayu Otani¹, Edgar Simo-Serra², Kota Yamaguchi¹

¹CyberAgent, ²Waseda University

MarkupDM Teaser


This repository contains the inference code and pre-trained models for the paper Multimodal Markup Document Models for Graphic Design Completion (ACM Multimedia 2025).

Usage

1. Clone this repository:

git clone https://github.com/CyberAgentAILab/MarkupDM.git
cd MarkupDM

2. Install dependencies:

# Using pip
pip install .

# Or using uv
uv sync

3. Install Google Chrome (required for SVG rendering on Linux):

wget -q -O - https://dl.google.com/linux/linux_signing_key.pub | sudo apt-key add -
sudo sh -c 'echo "deb [arch=amd64] http://dl.google.com/linux/chrome/deb/ stable main" >> /etc/apt/sources.list.d/google-chrome.list'
sudo apt update
sudo apt install google-chrome-stable
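
Before running inference, it may help to confirm that a Chrome executable is discoverable on your PATH. The following is a minimal sketch and not part of this repository: `find_chrome` is a hypothetical helper, and the candidate names are an assumption about how the package exposes the binary on your system. The renderer used by this repository may locate Chrome differently.

```python
import shutil
from typing import Iterable, Optional

# Executable names commonly provided by the google-chrome-stable package.
# Assumption: these are illustrative; this repository may look elsewhere.
DEFAULT_CANDIDATES = ("google-chrome-stable", "google-chrome")


def find_chrome(candidates: Iterable[str] = DEFAULT_CANDIDATES) -> Optional[str]:
    """Return the path of the first candidate found on PATH, or None."""
    for name in candidates:
        path = shutil.which(name)
        if path is not None:
            return path
    return None


if __name__ == "__main__":
    chrome = find_chrome()
    print(chrome if chrome else "Chrome not found on PATH")
```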

4. Run inference:

Image reconstruction:

Encodes and decodes an input image using the VQ-VAE model to test reconstruction quality.

# If using uv: uv run python inference/reconstruct_image.py ...
# Note: Replace input_image.png with your own image file
python inference/reconstruct_image.py input_image.png \
    --output_path reconstructed_image.png \
    --model_path cyberagent/ldm-vq-f16-rgba
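
To put a number on reconstruction quality, one option is to compare the input and output pixel buffers with PSNR. The sketch below is not part of this repository; `psnr` is a hypothetical helper, and it assumes both buffers hold 8-bit channel values of the same length (that is, the two images have the same size and mode).

```python
import math
from typing import Sequence


def psnr(original: Sequence[int], reconstructed: Sequence[int], max_value: int = 255) -> float:
    """Peak signal-to-noise ratio in dB between two equal-length 8-bit buffers."""
    if len(original) != len(reconstructed):
        raise ValueError("buffers must have the same length")
    if len(original) == 0:
        raise ValueError("buffers must not be empty")
    # Mean squared error over all channel values.
    mse = sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)
    if mse == 0:
        return math.inf  # identical buffers
    return 10.0 * math.log10(max_value ** 2 / mse)
```

With Pillow installed, the buffers could be obtained with `Image.open("input_image.png").convert("RGBA").tobytes()` and the same call on `reconstructed_image.png`. Whether the reconstructed image keeps the input's exact dimensions is an assumption to verify first.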

Design completion:

Generates SVG and PNG files for input, target, and predicted designs from the Crello test dataset.

Note: This requires access to the gated model bigcode/starcoderbase-7b on Hugging Face. Request access on its model page before running the command below.

# If using uv: uv run python inference/complete_design.py ...
python inference/complete_design.py \
    --output_dir output \
    --model_path cyberagent/markupdm

Output files will be saved in subdirectories within the specified directory:

  • input/: Input design with missing text
    • index.html: HTML file with embedded SVG
    • Referenced assets (PNG images and TTF fonts)
    • screenshot.png: Rendered result
  • target/: Original complete design (same structure as above)
  • pred/: Model-generated completion (same structure as above)
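
The layout above can be checked after a run with a short script. This is a sketch under stated assumptions, not part of this repository: `missing_outputs` is a hypothetical helper, and because the text does not say whether `input/`, `target/`, and `pred/` sit directly under the output directory or under per-sample folders, it searches recursively for directories with those names.

```python
from pathlib import Path
from typing import Dict, List

# Directory and file names taken from the list above.
SPLITS = ("input", "target", "pred")
REQUIRED_FILES = ("index.html", "screenshot.png")


def missing_outputs(output_dir: str) -> Dict[str, List[str]]:
    """Map each input/target/pred directory under output_dir to its missing files."""
    report: Dict[str, List[str]] = {}
    root = Path(output_dir)
    # Assumption: nesting depth is unknown, so search the whole tree.
    split_dirs = sorted(p for p in root.rglob("*") if p.is_dir() and p.name in SPLITS)
    for split_dir in split_dirs:
        missing = [name for name in REQUIRED_FILES if not (split_dir / name).is_file()]
        if missing:
            report[str(split_dir.relative_to(root))] = missing
    return report


if __name__ == "__main__":
    for directory, files in missing_outputs("output").items():
        print(f"{directory}: missing {', '.join(files)}")
```

An empty result means every discovered directory contains both `index.html` and `screenshot.png`. It does not verify the referenced PNG and TTF assets.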

Pre-trained Models

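Pre-trained models are available on Hugging Face:

  • cyberagent/markupdm: model used for design completion
  • cyberagent/ldm-vq-f16-rgba: VQ-VAE model used for image reconstruction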

Dataset

See dataset/README.md for details on the Crello-Instruct dataset used in this project.

License

This repository is released under the Apache-2.0 license.

Citation

@inproceedings{Kikuchi2025,
  title     = {Multimodal Markup Document Models for Graphic Design Completion},
  author    = {Kotaro Kikuchi and Ukyo Honda and Naoto Inoue and Mayu Otani and Edgar Simo-Serra and Kota Yamaguchi},
  booktitle = {ACM International Conference on Multimedia},
  year      = {2025},
  doi       = {10.1145/3746027.3755420}
}
