Keming Wu

吴科明

Ph.D. Student, Tsinghua University

Beijing, China
Email: [email protected] [email protected]

About

I am a Ph.D. student in the School of Software at Tsinghua University. I am very fortunate to have been supervised by Prof. Wenhu Chen at the University of Waterloo. Previously, I was a research intern in the Visual Computing Group at Microsoft Research Asia from September 2024 to April 2025. My current research interests include deep generative models and their applications in computer vision and language models.

I’m actively seeking research assistant or internship positions related to any of the above topics. I’m also open to discussions and collaboration opportunities. Please feel free to contact me!

News

2025.11 Two papers are released: OpenMMReasoner and LongVT.
2025.10 One paper is released, focusing on generative image evaluation.
2025.09 Two papers are released: one on a reward model for image editing, and the other on generative video evaluation.
2025.08 One paper got accepted by ACM MM 2025 Brave New Ideas Track (Oral).
2025.06 One paper about layout-to-image generation got accepted by ICCV 2025 (First Author).
2025.05 One paper about multi-layer image generation is released.
2025.02 One paper got accepted by CVPR 2025.
2024.10 One paper about information fusion got accepted by IEEE Transactions on Systems, Man, and Cybernetics: Systems (CCF-B journal, First Author).
2024.07 One paper got accepted by ACM MM 2024 (My first CCF-A conference paper, First Author. Congratulations!).
2024.01 One paper got accepted by Information Sciences (CCF-B journal, First Author).

Open-Source Projects

LMMs-Engine

A simple, unified multimodal models training engine. Lean, flexible, and built for hacking at scale.

Contributor

Project Link: https://github.com/EvolvingLMMs-Lab/lmms-engine

LMMs-Eval

Accelerating the development of large multimodal models (LMMs) with lmms-eval. We support most text, image, video, and audio tasks.

Contributor

Project Link: https://github.com/EvolvingLMMs-Lab/lmms-eval

Selected Publications

(* equal contribution)

Multi-modality AIGC & Evaluation

EditReward: A Human-Aligned Reward Model for Instruction-Guided Image Editing [WebPage] [Code]
Keming Wu, Sicong Jiang, Max Ku, Ping Nie, Minghao Liu, Wenhu Chen

Technical Report

Hybrid Layout Control for Diffusion Transformer: Fewer Annotations, Superior Aesthetics [WebPage] [Code]
Keming Wu, Junwen Chen, Zhanhao Liang, Yinuo Wang, Ji Li, Chao Zhang, Bin Wang, Yuhui Yuan

ICCV 2025

BizGen: Advancing Article-level Visual Text Rendering for Infographics Generation [PDF] [WebPage] [Code]
Yuyang Peng, Shishi Xiao, Keming Wu, Qisheng Liao, Bohan Chen, Kevin Lin, Danqing Huang, Ji Li, Yuhui Yuan

CVPR 2025

PrismLayers: Open Data for High-Quality Multi-Layer Transparent Image Generative Models [PDF] [WebPage]
Junwen Chen*, Heyang Jiang*, Keming Wu*, Yanbin Wang*, Ji Li, Chao Zhang, Keiji Yanai, Dong Chen, Yuhui Yuan

Technical Report

Multi-modality Understanding & Reasoning

OpenMMReasoner: Pushing the Frontiers for Multimodal Reasoning with an Open and General Recipe [PDF] [WebPage] [Code] [Daily Paper]
Kaichen Zhang*, Keming Wu*, Zuhao Yang, Bo Li, Kairui Hu, Bin Wang, Ziwei Liu, Xingxuan Li, Lidong Bing

Technical Report 🏆 Top #1 Paper of the day at HuggingFace Daily Papers

LongVT: Incentivizing "Thinking with Long Videos" via Native Tool Calling [PDF] [WebPage] [Code] [Daily Paper]
Zuhao Yang*, Sudong Wang*, Kaichen Zhang*, Keming Wu, Sicong Leng, Yifan Zhang, Bo Li, Chengwei Qin, Shijian Lu, Xingxuan Li, Lidong Bing

Technical Report 🏆 Top #2 Paper of the day at HuggingFace Daily Papers

Other Publications

(* equal contribution)

VideoScore2: Think before You Score in Generative Video Evaluation [PDF] [WebPage] [Code]
Xuan He, Dongfu Jiang, Ping Nie, Minghao Liu, Zhengxuan Jiang, Mingyi Su, Wentao Ma, Junru Lin, Chun Ye, Yi Lu, Keming Wu, Benjamin Schneider, Quy Duc Do, Zhuofeng Li, Yiming Jia + 9 more authors

Technical Report

Physics-Informed Representation Alignment for Sparse Radio-Map Reconstruction [PDF]
Haozhe Jia, Wenshuo Chen, Zhihui Huang, Lei Wang, Hongru Xiao, Nanqian Jia, Keming Wu, Songning Lai, Bowen Tian, Yutao Yue

ACM MM 2025 Brave New Ideas Track (Oral)

Free-T2M: Frequency Enhanced Text-to-Motion Diffusion Model With Consistency Loss [PDF] [Code]
Wenshuo Chen, Haozhe Jia, Songning Lai, Keming Wu, Hongru Xiao, Lijie Hu, Yutao Yue

Technical Report

RSC-SNN: Exploring the Trade-off Between Adversarial Robustness and Accuracy in Spiking Neural Networks via Randomized Smoothing Coding [PDF] [Code]
Keming Wu*, Man Yao*, Yuhong Chou, Xuerui Qiu, Rui Yang, Bo Xu, Guoqi Li

ACM MM 2024

A Fractal-based Complex Belief Entropy for Uncertainty Measure in Complex Evidence Theory [PDF]
Keming Wu, Fuyuan Xiao, Yi Zhang

IEEE Transactions on Systems, Man, and Cybernetics: Systems 2024

A Novel Quantum Belief Entropy for Uncertainty Measure in Complex Evidence Theory [PDF]
Keming Wu, Fuyuan Xiao

Information Sciences 2024

Honors & Awards

National Scholarship (Three times)

Work Experience

University of Waterloo
TIGER Lab

Research Intern, very fortunate to have been supervised by Prof. Wenhu Chen.

Apr. 2025 - Present

Microsoft Research Asia
Visual Computing Group

Research Intern, supervised by Senior Researcher Yuhui Yuan and Principal Research Manager Dong Chen.

Sep. 2024 - Apr. 2025

Institute of Automation, Chinese Academy of Sciences

Research Intern, supervised by Prof. Guoqi Li.

Apr. 2023 - Jan. 2024

Professional Activities

Journal Reviewer
Conference Reviewer
- CVPR 2026, ICLR 2026