- Table of Contents
- 🔔 News
- 🌟 Overview
- 🤗 Dataset
- 🛠️ Requirements and Installation
- 💥 Inference
- 💥 Evaluation
🔔 News

- [2025.5.16] Code is available now!
- [2025.5.16] We release the MMKC-Bench dataset at 🤗 Huggingface Dataset.
🌟 Overview

TL;DR: We introduce MLLMKC, a Multi-Modal Knowledge Conflict benchmark designed to analyze factual knowledge conflicts under both context-memory and inter-context scenarios.
🤗 Dataset

You can download the MMKC-Bench data from the 🤗 Huggingface Dataset. The expected file structure is:
MLLMKC
|-- image
| |-- nike
| |-- kobe
| |-- .....
|-- ER.json
|-- people_knowledge.json
|-- logo_knowledge.json
|-- IS.json
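After downloading, you can sanity-check the annotation files before running anything. A minimal sketch, assuming each .json file is a standard JSON array of record dicts (the exact schema may differ; inspect the printed keys):

import json

# Load one MMKC-Bench annotation file (path taken from the tree above).
with open("MLLMKC/ER.json", "r", encoding="utf-8") as f:
    records = json.load(f)

print(f"Loaded {len(records)} records")
print(sorted(records[0].keys()))  # discover the field names of a record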
🛠️ Requirements and Installation

# clone MMKC-Bench
git clone https://github.com/MLLMKCBENCH/MLLMKC.git
cd MLLMKC

# create and activate the conda env
conda create -n mllmkc python=3.10
conda activate mllmkc

# install VLMEvalKit dependencies
cd VLMEvalKit
pip install -r requirements.txt
💥 Inference

Note: If you want to use local model weights, download them before running experiments and update the corresponding local weight paths in VLMEvalKit/vlmeval/config.py.
Before running the scripts below, edit each .sh file and set MODEL_NAME (e.g., "InternVL3-8B") so that it matches a model name defined in VLMEvalKit/vlmeval/config.py.
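If you are unsure which model names are valid, you can check them against VLMEvalKit's model registry. A minimal sketch, assuming your installed VLMEvalKit version exposes its registry as the supported_VLM dict in vlmeval/config.py:

# Verify that MODEL_NAME matches a model registered in VLMEvalKit.
# Assumption: the registry is exposed as vlmeval.config.supported_VLM.
from vlmeval.config import supported_VLM

model_name = "InternVL3-8B"
print(model_name in supported_VLM)    # True if the name is valid
# print(sorted(supported_VLM))        # uncomment to list all registered names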
For non-GPT models:

For the original answers:

# Multiple-choice question format
bash start_original_mcq.sh
# Open-ended question format
bash start_original_open.sh

For the context-memory conflict answers:

# Multiple-choice question format
bash start_mcq_ie.sh
# Open-ended question format
bash start_open_ie.sh

For the inter-context conflict answers:

# Multiple-choice question format
bash start_mcq_ee.sh
# Open-ended question format
bash start_open_ee.sh

For GPT models:

bash start_gpt.sh

For conflict detection:

# Coarse-grained conflict detection
bash detection_coarse.sh
# Fine-grained conflict detection
bash detection_fine.sh

💥 Evaluation

We also provide the relevant code for evaluation; please see:
MLLMKC/evaluation/evaluation.py.
Organize the result files generated by the model into the following structure, and pass the path of the MODEL_OUT folder to evaluation.py:
MODEL_OUT
|-- original
| |-- ER
| |-- IS
| |-- people_knowledge
| |-- logo_knowledge
|-- output
| |-- ER
| |-- IS
| |-- people_knowledge
| |-- logo_knowledge
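A minimal sketch that builds this folder skeleton before you copy your result files into it; the subfolder names come from the tree above, while the copy step itself depends on where your model wrote its outputs:

import os

MODEL_OUT = "MODEL_OUT"
SUBSETS = ["ER", "IS", "people_knowledge", "logo_knowledge"]

# Create the layout expected by MLLMKC/evaluation/evaluation.py.
for split in ("original", "output"):
    for subset in SUBSETS:
        os.makedirs(os.path.join(MODEL_OUT, split, subset), exist_ok=True)

# Copy each model result file into its matching split/subset folder,
# then pass the MODEL_OUT path to evaluation.py as described above.

Check MLLMKC/evaluation/evaluation.py for the exact input format it expects.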
