
UniReason Paper on arXiv · UniReason Model · UniReason Tuning Data

UniReason 1.0: A Unified Reasoning Framework for World Knowledge Aligned Image Generation and Editing

Dianyi Wang**, Chaofan Ma**, Feng Han, Size Wu, Wei Song, Yibin Wang, Zhixiong Zhang, Tianhang Wang, Siyuan Wang 📧, Zhongyu Wei 📧, Jiaqi Wang 🎩 📧

contact: [email protected], [email protected], [email protected], [email protected]

We propose UniReason, a unified framework that harmonizes image generation and editing through a dual reasoning paradigm. We formulate generation as world knowledge-enhanced planning that injects implicit constraints, and leverage editing capabilities for fine-grained visual refinement, further correcting visual errors via self-reflection. This approach unifies generation and editing within a shared representation, mirroring the human cognitive process of planning followed by refinement. To support this framework, we systematically construct a large-scale reasoning-centric dataset covering five major knowledge domains (e.g., cultural commonsense and physics) for planning, alongside an agent-generated corpus for visual self-correction. Extensive experiments demonstrate that UniReason achieves strong performance on reasoning-intensive benchmarks such as WISE and KrisBench, while maintaining superior general synthesis capabilities on GenEval and ImgEdit. The figure below showcases UniReason's qualitative results.

🧠 Method

Our core objective is to equip a unified multimodal model with the ability to infer the implicit world knowledge underlying abstract instructions, integrating world knowledge inference and surface-level organization into textual reasoning. This process provides explicit, structured guidance for synthesizing an initial visual output, mirroring human conceptual planning prior to rendering. The second, complementary component is fine-grained editing-like visual refinement: the model re-assesses the initial synthesized image against its prior textual reasoning, reflectively identifies and verbalizes inconsistencies or missing details (optionally incorporating a second round of textual reasoning to "think twice"), and thereby enables iterative reflection and correction.
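The planning-then-refinement loop above can be sketched as plain control flow. Everything below is a hypothetical illustration with stub functions (`plan`, `generate`, `critique`, `edit` are not the actual UniReason API); it only shows how textual planning, initial synthesis, and iterative self-correction fit together.

```python
# Control-flow sketch of the dual reasoning paradigm (all model calls are stubs).

def plan(instruction):
    # World knowledge-enhanced textual reasoning: expand an abstract
    # instruction into explicit, structured constraints (stub).
    return f"constraints for: {instruction}"

def generate(plan_text):
    # Synthesize an initial image guided by the structured plan (stub).
    return {"image": "draft", "plan": plan_text}

def critique(image, plan_text):
    # Re-assess the draft against the prior textual reasoning and
    # verbalize inconsistencies or missing details (stub).
    return [] if image.get("refined") else ["missing detail"]

def edit(image, issues):
    # Fine-grained editing-like refinement correcting the listed issues (stub).
    return {**image, "refined": True}

def unireason(instruction, max_rounds=2):
    plan_text = plan(instruction)        # 1) textual planning
    image = generate(plan_text)          # 2) initial synthesis
    for _ in range(max_rounds):          # 3) iterative reflection
        issues = critique(image, plan_text)
        if not issues:
            break                        # plan satisfied, stop refining
        image = edit(image, issues)      # 4) self-correction
    return image

result = unireason("a solar eclipse over the ocean")
print(result["refined"])
```

The loop terminates either when the critique step finds no remaining inconsistencies or after a fixed refinement budget, matching the "think twice" behavior described above.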

🔥 News

  • Feb 1, 2026: We released the UniReason 1.0 stage_1 (Foundational Generation Strengthening) and stage_2 (Interleaved Reasoning Tuning) checkpoints on Hugging Face, supporting both T2I generation and image editing with two complementary reasoning paradigms.
  • Feb 2, 2026: We released the training and evaluation code, supporting a wide range of benchmarks.
  • Feb 3, 2026: We released the UniReason 1.0 technical report on arXiv.

🔥 Train & Eval

Set up environment

git clone https://github.com/AlenjandroWang/UniReason.git
cd UniReason
conda create -n UniReason python=3.10 -y
conda activate UniReason
pip install -r requirements.txt
pip install flash_attn==2.5.8 --no-build-isolation
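After installing, a quick sanity check can confirm the key packages resolved correctly (assumption: `torch` and `flash_attn` are the packages the requirements above install; adjust the list to your environment):

```python
import importlib.util

def check_env(packages=("torch", "flash_attn")):
    # Report which required packages are importable in this environment.
    return {p: importlib.util.find_spec(p) is not None for p in packages}

for pkg, ok in check_env().items():
    print(f"{pkg}: {'ok' if ok else 'MISSING'}")
```

If `flash_attn` reports MISSING, re-run the `pip install flash_attn==2.5.8 --no-build-isolation` step inside the activated conda environment.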

Train

We provide the scripts for Interleaved Reasoning Tuning.

bash scripts/sft.sh

Replace the variables in the script with your own values before running. See TRAIN for more details.

Eval

We provide scripts for evaluating T2I and editing benchmarks, supporting both World Knowledge-Enhanced Textual Reasoning and Fine-grained Editing-like Visual Refinement. See EVAL for more details.

📊 Benchmarks

1. Text-to-Image Generation

| Model             | GenEval ↑ | DPGBench ↑ | WISE ↑ |
|-------------------|-----------|------------|--------|
| BAGEL             | 0.88      | 85.07      | 0.70   |
| Hunyuan-Image-3.0 | 0.72      | 86.10      | 0.57   |
| Qwen-Image        | 0.74      | 88.32      | 0.62   |
| UniCoT            | 0.83      | -          | 0.75   |
| UniReason         | 0.90      | 86.21      | 0.78   |

2. Image Editing

| Model            | GEdit-EN ↑ | KrisBench ↑ | UniREditBench ↑ |
|------------------|------------|-------------|-----------------|
| BAGEL            | 6.52       | 60.18       | 50.96           |
| Qwen-Image-Edit  | 7.56       | -           | 56.52           |
| LightFusion-World| 6.58       | 61.85       | -               |
| UniCoT           | 6.74       | 68.00       | -               |
| UniReason        | 6.94       | 68.23       | 70.06           |

🎨 Qualitative Results

✍️ Citation

@article{wang2026unireason,
  title={UniReason 1.0: A Unified Reasoning Framework for World Knowledge Aligned Image Generation and Editing},
  author={Wang, Dianyi and Ma, Chaofan and Han, Feng and Wu, Size and Song, Wei and Wang, Yibin and Zhang, Zhixiong and Wang, Tianhang and Wang, Siyuan and Wei, Zhongyu and others},
  journal={arXiv preprint arXiv:2602.02437},
  year={2026}
}

🙏 Acknowledgement

The project builds upon the following pioneering works:

  • BAGEL: We thank the BAGEL team for releasing their elegant, concise code and strong unified model.
  • BLIP3-o: We thank the BLIP3-o team for releasing the precious high-quality tuning dataset.
  • OpenGPT-4o-Image: We thank the OpenGPT-4o-Image team for releasing the precious high-quality tuning dataset.
  • ShareGPT-4o-Image: We thank the ShareGPT-4o-Image team for releasing the precious high-quality tuning dataset.
  • Echo-4o: We thank the Echo-4o team for releasing the precious high-quality tuning dataset.
  • Picobanana: We thank the Picobanana team for releasing the precious high-quality editing tuning dataset.
  • Nano-consist: We thank the Nano-consist team for releasing the precious high-quality editing tuning dataset.
  • UniREditBench: We thank the UniREditBench team for releasing the precious high-quality reason-based editing tuning dataset.
