
RealUnify: Do Unified Models Truly Benefit from Unification? A Comprehensive Benchmark

arXiv Paper: https://arxiv.org/abs/2509.24897     Dataset: Hugging Face


📢 News

  • [2025/09/29] We are proud to introduce RealUnify, a comprehensive benchmark designed to evaluate bidirectional capability synergy. 🎉
    • We aim to raise a key question through RealUnify: Do Unified Models Truly Benefit from Unification? 💥

📌 Introduction

  • The integration of visual understanding and generation into unified multimodal models represents a significant stride toward general-purpose AI. However, a fundamental question remains unanswered by existing benchmarks: does this architectural unification actually enable synergetic interaction between the constituent capabilities?
  • Existing evaluation paradigms, which primarily assess understanding and generation in isolation, are insufficient for determining whether a unified model can leverage its understanding to enhance its generation, or use generative simulation to facilitate deeper comprehension.
  • To address this critical gap, we introduce RealUnify, a benchmark specifically designed to evaluate bidirectional capability synergy. RealUnify comprises 1,000 meticulously human-annotated instances spanning 10 categories and 32 subtasks.
  • It is structured around two core axes: 1) Understanding Enhances Generation (UEG), which requires reasoning (e.g., commonsense, logic) to guide image generation, and 2) Generation Enhances Understanding (GEU), which necessitates mental simulation or reconstruction (e.g., of transformed or disordered visual inputs) to solve reasoning tasks.
  • A key contribution is our dual-evaluation protocol, which combines direct end-to-end assessment with a diagnostic stepwise evaluation that decomposes tasks into distinct understanding and generation phases. This protocol allows us to precisely discern whether performance bottlenecks stem from deficiencies in core abilities or from a failure to integrate them.

🔍 Benchmark Overview

(Benchmark overview figure.)

✨ Evaluation Pipeline

We support two evaluation methods: direct evaluation and stepwise evaluation.

Before evaluation, please download the dataset files from our Hugging Face repository to a local path.
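As a minimal sketch, the files can be fetched with huggingface_hub; the repository ID below is a placeholder, so substitute the actual RealUnify dataset repo linked above.

from huggingface_hub import snapshot_download

# Download all benchmark files (JSON annotations and images) to a local folder.
# NOTE: "<org>/RealUnify" is a placeholder repo id -- use the dataset repo linked above.
local_dir = snapshot_download(
    repo_id="<org>/RealUnify",
    repo_type="dataset",
    local_dir="./RealUnify_data",
)
print(f"Dataset files saved to {local_dir}")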

📍 Direct Evaluation

  • Understanding Enhances Generation (UEG) Tasks

    • For the UEG task, please use UEG_direct.json as the dataset for evaluation.
      • The prompts for image generation are stored in the prompt field. Please save the path to the generated image in the generated_image field.
    • After obtaining all the generated images and saving the JSON file, please use eval/eval_generation.py for evaluation.
      • Please add the model names and their corresponding result JSON files to task_json_list in eval/eval_generation.py, and set the directory for saving the evaluation results as RES_JSON_DIR.
  • Generation Enhances Understanding (GEU) Tasks

    • For the GEU task, please use GEU_direct.json as the dataset for evaluation.
      • The prompts for visual understanding are stored in the evaluation_prompt field. Please save the model's response in the response field.
    • After obtaining all the responses and saving the JSON file, please use eval/eval_understanding.py for evaluation.
      • Please add the model names and their corresponding result JSON files to task_json_list in eval/eval_understanding.py. A minimal sketch of both direct-evaluation workflows appears after this list.
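Below is a minimal sketch of how the result JSON files for direct evaluation can be filled in. It assumes each dataset file is a list of records, and the inference helpers (generate_image, answer_question) are placeholders standing in for your own model calls; the field names follow the descriptions above.

import json

def run_ueg_direct(result_path, generate_image):
    # UEG direct: generate an image for each `prompt` and record where it was saved.
    with open("UEG_direct.json") as f:
        records = json.load(f)
    for rec in records:
        rec["generated_image"] = generate_image(rec["prompt"])  # returns a file path
    with open(result_path, "w") as f:
        json.dump(records, f, indent=2, ensure_ascii=False)

def run_geu_direct(result_path, answer_question):
    # GEU direct: answer each `evaluation_prompt` and record the model's response.
    with open("GEU_direct.json") as f:
        records = json.load(f)
    for rec in records:
        # The record carries the evaluation_prompt (plus any image reference your model needs).
        rec["response"] = answer_question(rec)
    with open(result_path, "w") as f:
        json.dump(records, f, indent=2, ensure_ascii=False)

The resulting files are then listed in task_json_list of eval/eval_generation.py (UEG) or eval/eval_understanding.py (GEU) for scoring.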

📍 Stepwise Evaluation

  • Understanding Enhances Generation (UEG) Tasks
    • For the UEG task, please use UEG_step.json as the dataset for evaluation.
      • The prompts for prompt refinement (understanding) are stored in the new_prompt field. Please save the model's response in the response field.
    • After obtaining all the responses and saving the JSON file, please use the response field as the prompt for image generation, and save the path to the generated image in the generated_image field.
    • Please add the model names and their corresponding result JSON files to task_json_list in eval/eval_generation.py, and set the directory for saving the evaluation results as RES_JSON_DIR.
  • Generation Enhances Understanding (GEU) Tasks
    • For the GEU task, please use GEU_step.json as the dataset for evaluation.
      • The prompts for image manipulation (editing) are stored in the edit_prompt field. Please save the path to the generated image in the edit_image field.
    • After obtaining all the edited images and saving the JSON file, please use the edit_image field as the input image for visual understanding, and save the model's response in the response field.
    • Please add the model names and their corresponding result JSON files to task_json_list in eval/eval_understanding.py. A minimal sketch of both stepwise workflows appears after this list.
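Below is a minimal sketch of the two-phase stepwise pipeline, under the same assumptions as the direct-evaluation sketch (list-of-records JSON files; refine_prompt, generate_image, edit_image_fn, and answer_with_image are placeholders for your own model calls).

import json

def run_ueg_stepwise(result_path, refine_prompt, generate_image):
    with open("UEG_step.json") as f:
        records = json.load(f)
    for rec in records:
        # Phase 1 (understanding): refine the prompt stored in `new_prompt`.
        rec["response"] = refine_prompt(rec["new_prompt"])
        # Phase 2 (generation): use the refined prompt to generate the image.
        rec["generated_image"] = generate_image(rec["response"])  # returns a file path
    with open(result_path, "w") as f:
        json.dump(records, f, indent=2, ensure_ascii=False)

def run_geu_stepwise(result_path, edit_image_fn, answer_with_image):
    with open("GEU_step.json") as f:
        records = json.load(f)
    for rec in records:
        # Phase 1 (generation): edit the input image according to `edit_prompt`.
        rec["edit_image"] = edit_image_fn(rec, rec["edit_prompt"])  # returns a file path
        # Phase 2 (understanding): answer using the edited image as the visual input.
        rec["response"] = answer_with_image(rec, rec["edit_image"])
    with open(result_path, "w") as f:
        json.dump(records, f, indent=2, ensure_ascii=False)

As in direct evaluation, score the UEG results with eval/eval_generation.py and the GEU results with eval/eval_understanding.py.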

💡 Representative Examples of Each Task

🔍 Examples of Understanding Enhances Generation (UEG) tasks in RealUnify.


🔍 Examples of Generation Enhances Understanding (GEU) tasks in RealUnify.


🔖 Dataset License

License:

RealUnify is to be used for academic research only. Commercial use in any form is prohibited.
The copyright of all (generated) images belongs to the respective image/model owners.
If there is any infringement in RealUnify, please email [email protected] and we will remove it immediately.
Without prior approval, you may not distribute, publish, copy, disseminate, or modify RealUnify in whole or in part.
You must strictly comply with the above restrictions.

For approval or any questions regarding RealUnify, please send an email to [email protected]. 🌟

📚 Citation

@misc{shi2025realunifyunifiedmodelstruly,
      title={RealUnify: Do Unified Models Truly Benefit from Unification? A Comprehensive Benchmark}, 
      author={Yang Shi and Yuhao Dong and Yue Ding and Yuran Wang and Xuanyu Zhu and Sheng Zhou and Wenting Liu and Haochen Tian and Rundong Wang and Huanqian Wang and Zuyan Liu and Bohan Zeng and Ruizhe Chen and Qixun Wang and Zhuoran Zhang and Xinlong Chen and Chengzhuo Tong and Bozhou Li and Chaoyou Fu and Qiang Liu and Haotian Wang and Wenjing Yang and Yuanxing Zhang and Pengfei Wan and Yi-Fan Zhang and Ziwei Liu},
      year={2025},
      eprint={2509.24897},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2509.24897}, 
}
