- Table of Contents
- 🔔 News
- 🌟 Overview
- 🤗 Dataset
- 🛠️ Requirements and Installation
- 💥 Inference
- 💥 Evaluation
🔔 News

- [2025.5.16] Code is available now!
- [2025.5.16] We release the MMKC-Bench dataset at 🤗 Huggingface Dataset.
🌟 Overview

TL;DR: We introduce MLLMKC, a Multi-Modal Knowledge Conflict benchmark designed to analyze factual knowledge conflicts under both context-memory and inter-context scenarios.
🤗 Dataset

You can download the MMKC-Bench data from the 🤗 Huggingface Dataset. The expected file structure is:
MLLMKC
|-- image
| |-- nike
| |-- kobe
| |-- .....
|-- ER.json
|-- people_knowledge.json
|-- logo_knowledge.json
|-- IS.json
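After downloading, you can sanity-check the annotation files before running anything. A minimal sketch, assuming each .json file is a standard JSON array of record dicts (the exact schema may differ; inspect the printed keys):

import json

# Load one MMKC-Bench annotation file (path taken from the tree above).
with open("MLLMKC/ER.json", "r", encoding="utf-8") as f:
    records = json.load(f)

print(f"Loaded {len(records)} records")
print(sorted(records[0].keys()))  # discover the field names of a record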
🛠️ Requirements and Installation

# clone MMKC-Bench
git clone https://github.com/MLLMKCBENCH/MLLMKC.git
cd MLLMKC

# create and activate the conda env
conda create -n mllmkc python=3.10
conda activate mllmkc

# install VLMEvalKit dependencies
cd VLMEvalKit
pip install -r requirements.txt
💥 Inference

Note: If you want to use local model weights, download them before running experiments and update the corresponding local weight paths in VLMEvalKit/vlmeval/config.py.
Before running the scripts below, edit each .sh file and set MODEL_NAME (e.g., "InternVL3-8B") so that it matches a model name defined in VLMEvalKit/vlmeval/config.py.
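If you are unsure which model names are valid, you can check them against VLMEvalKit's model registry. A minimal sketch, assuming your installed VLMEvalKit version exposes its registry as the supported_VLM dict in vlmeval/config.py:

# Verify that MODEL_NAME matches a model registered in VLMEvalKit.
# Assumption: the registry is exposed as vlmeval.config.supported_VLM.
from vlmeval.config import supported_VLM

model_name = "InternVL3-8B"
print(model_name in supported_VLM)    # True if the name is valid
# print(sorted(supported_VLM))        # uncomment to list all registered names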
For non-GPT models:

For the original answers:

# Multiple-choice question format
bash start_original_mcq.sh
# Open-ended question format
bash start_original_open.sh

For the context-memory conflict answers:

# Multiple-choice question format
bash start_mcq_ie.sh
# Open-ended question format
bash start_open_ie.sh

For the inter-context conflict answers:

# Multiple-choice question format
bash start_mcq_ee.sh
# Open-ended question format
bash start_open_ee.sh

For GPT models:

bash start_gpt.sh

For conflict detection:

# Coarse-grained conflict detection
bash detection_coarse.sh
# Fine-grained conflict detection
bash detection_fine.sh

💥 Evaluation

We also provide the relevant code for evaluation; please see:
MLLMKC/evaluation/evaluation.py.
Organize the result files generated by the model into the following structure, and pass the path of the MODEL_OUT folder to evaluation.py:
MODEL_OUT
|-- original
| |-- ER
| |-- IS
| |-- people_knowledge
| |-- logo_knowledge
|-- output
| |-- ER
| |-- IS
| |-- people_knowledge
| |-- logo_knowledge
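A minimal sketch that builds this folder skeleton before you copy your result files into it; the subfolder names come from the tree above, while the copy step itself depends on where your model wrote its outputs:

import os

MODEL_OUT = "MODEL_OUT"
SUBSETS = ["ER", "IS", "people_knowledge", "logo_knowledge"]

# Create the layout expected by MLLMKC/evaluation/evaluation.py.
for split in ("original", "output"):
    for subset in SUBSETS:
        os.makedirs(os.path.join(MODEL_OUT, split, subset), exist_ok=True)

# Copy each model result file into its matching split/subset folder,
# then pass the MODEL_OUT path to evaluation.py as described above.

Check MLLMKC/evaluation/evaluation.py for the exact input format it expects.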
