JiazheWei/PosterCopilot

PosterCopilot: Toward Layout Reasoning and Controllable Editing for Professional Graphic Design

Jiazhe Wei1,*, Ken Li1,*, Tianyu Lao2, Haofan Wang2, Liang Wang1,3, Caifeng Shan1, Chenyang Si1,†

1PRLab, Nanjing University, 2LibLib.ai, 3Institute of Automation, Chinese Academy of Sciences

*Equal Contribution, †Corresponding Author

📄 Paper | 🌐 Project Page | ▶️ Video | 🤗 Model Weights (Coming Soon) | 🤗 Datasets (Coming Soon)


🔥 News

  • [2025-12-04] Our paper is now available on arXiv!

🌟 Highlights

PosterCopilot is a framework that uses Large Multimodal Models (LMMs) to advance layout reasoning and controllable editing for professional graphic design.

[Figure: PosterCopilot teaser]

✨ Core Features

  • 🎯 Geometrically Accurate Layouts
    Achieves precise spatial positioning through a progressive three-stage training strategy that moves beyond simple regression to distribution-based learning

  • 🎨 Aesthetic Reasoning
    Instills human-like design principles and aesthetics through reinforcement learning from aesthetic feedback

  • βœ‚οΈ Layer-level Control
    Enables precise, fine-grained editing of individual layers while maintaining global visual consistency

  • 🔄 Multi-round Iterative Editing
    Supports professional iterative design workflows with multiple refinement rounds on specific elements

  • 🎭 Versatile Applications
    Handles complete layout generation, asset synthesis when the provided assets are insufficient, theme switching, and canvas reframing
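
To make the layer-level control and multi-round editing features concrete, here is a minimal Python sketch of how a multi-layer poster could be represented and edited one layer at a time. It is an illustration only: the `Layer` class, its fields, and the `edit_layer` helper are hypothetical names chosen for this example and are not part of the PosterCopilot code, which has not been released yet.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class Layer:
    """Hypothetical poster layer; field names are assumptions for illustration."""
    name: str
    kind: str                        # "text" or "image"
    box: Tuple[int, int, int, int]   # (x_min, y_min, x_max, y_max) in pixels
    content: str                     # text string or asset path


def edit_layer(layers: Tuple[Layer, ...], name: str, **changes) -> Tuple[Layer, ...]:
    """Return a new layer stack in which only the named layer is changed."""
    if not any(layer.name == name for layer in layers):
        raise KeyError(f"no layer named {name!r}")
    # Untouched layers are reused as-is, so the rest of the poster stays identical.
    return tuple(
        replace(layer, **changes) if layer.name == name else layer
        for layer in layers
    )


poster = (
    Layer("title", "text", (40, 30, 600, 120), "Summer Sale"),
    Layer("hero", "image", (0, 150, 640, 800), "hero.png"),
)

# Round 1: change the title text. Round 2: move the same layer.
round1 = edit_layer(poster, "title", content="Winter Sale")
round2 = edit_layer(round1, "title", box=(40, 60, 600, 150))
```

Each round produces a new layer stack and leaves the previous one intact, which is one simple way to support iterative refinement with the option to step back.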

📈 Three-Stage Training Paradigm

  1. Perturbed Supervised Fine-Tuning (PSFT)
    Reformulates coordinate regression into distribution-based learning for continuous spatial reasoning

  2. Reinforcement Learning for Visual-Reality Alignment (RL-VRA)
    Introduces geometric reward signals to ensure visual-reality alignment and spatial accuracy

  3. Reinforcement Learning from Aesthetic Feedback (RLAF)
    Employs learned aesthetic rewards to generate coherent and diverse compositions
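
The README does not specify the geometric reward used in RL-VRA. As a rough sketch of what a geometric reward signal can look like, the Python snippet below scores a predicted layout by the mean intersection-over-union (IoU) between predicted and reference boxes. Using IoU here is an assumption made for illustration, and the names `iou` and `geometric_reward` are hypothetical; the reward in the paper may be defined differently.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = max(0.0, a[2] - a[0]) * max(0.0, a[3] - a[1])
    area_b = max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def geometric_reward(pred: List[Box], ref: List[Box]) -> float:
    """Mean per-layer IoU; returns 0.0 if the layer counts do not match.

    Illustrative assumption: layers are paired by index.
    """
    if not ref or len(pred) != len(ref):
        return 0.0
    return sum(iou(p, r) for p, r in zip(pred, ref)) / len(ref)


reference = [(0.0, 0.0, 10.0, 10.0), (0.0, 0.0, 10.0, 10.0)]
predicted = [(0.0, 0.0, 10.0, 10.0), (5.0, 0.0, 15.0, 10.0)]
reward = geometric_reward(predicted, reference)
```

A reward of this kind is bounded in [0, 1] and grows as predicted boxes overlap their references more closely, which makes it easy to combine with other reward terms.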

📊 PosterCopilot Dataset

One of the largest-scale, most thematically diverse, and highest-quality multi-layer poster datasets.

  • 160K posters with 2.6M layers (1.2M text + 1.4M image/decorative elements)
  • Spans 40+ distinct domains from commercial promotions to public announcements
  • Novel OCR-based pipeline addresses over-segmentation challenges in multi-layer datasets
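
For a sense of scale, the short Python snippet below derives two summary figures from the counts listed above. The inputs are the rounded numbers from this README, so the results are approximate; the variable names are chosen for this example only.

```python
# Rounded counts taken from the dataset description above.
posters = 160_000
text_layers = 1_200_000
image_layers = 1_400_000          # image and decorative elements
total_layers = text_layers + image_layers

# Average number of layers per poster and the share of text layers.
layers_per_poster = total_layers / posters
text_share = text_layers / total_layers

print(f"total layers: {total_layers:,}")
print(f"average layers per poster: {layers_per_poster:.2f}")
print(f"text layer share: {text_share:.1%}")
```

With these rounded counts, a poster has roughly 16 layers on average, and a little under half of all layers are text.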

📋 To-Do List

We are committed to making outstanding contributions to both academia and the graphic design industry with PosterCopilot. Our open-source plan includes:

✅ Released

  • Project page and documentation
  • Demo video

🚧 Coming Soon

  • Data Pipeline
  • Test Dataset
  • Training Code
  • Model Weights

πŸ“ Citation

If you find PosterCopilot useful for your research, please consider citing:

@misc{wei2025postercopilot,
  title={PosterCopilot: Toward Layout Reasoning and Controllable Editing for Professional Graphic Design},
  author={Jiazhe Wei and Ken Li and Tianyu Lao and Haofan Wang and Liang Wang and Caifeng Shan and Chenyang Si},
  year={2025},
  eprint={2512.04082},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}

📧 Contact

For questions and collaborations, please contact:


πŸ™ Acknowledgments

We thank all contributors and the research community for their valuable feedback and support.

© 2025 PosterCopilot project. Released under the Apache 2.0 License.