Med-CMR: A Fine-Grained Benchmark Integrating Visual Evidence and Clinical Logic for Medical Complex Multimodal Reasoning

🕸️ Overview

Med-CMR comprises seven tasks, each corresponding to a specific type of complexity in medical multimodal reasoning.

🏆 Leaderboard

To upload your results, please create a pull request with your result file. The results will be verified before being added to the leaderboard.

Bold indicates the best result; italics indicate the second best.

☑️ Multi-choice Questions

Proprietary Models

Model	Year	SOD	FDD	SU	TP	CR	LTG	MSI	All Score
GPT-5	2025	66.08	71.45	62.06	58.33	60.30	55.19	69.00	57.81
Gemini-2.5-Pro	2025	58.75	68.07	56.70	52.08	53.54	46.42	64.42	49.87

Open-source Models

Model	Year	SOD	FDD	SU	TP	CR	LTG	MSI	All Score
Medgemma-4B	2025	16.13	17.72	13.12	14.58	17.64	14.00	23.45	14.90
Lingshu-7B	2025	32.84	47.12	31.17	38.99	31.53	23.86	39.62	27.26
Gemma3-4B	2025	31.57	38.68	31.59	25.70	28.10	23.74	35.04	25.98
Qwen2.5-VL-7B	2025	37.83	48.10	33.29	32.74	35.22	28.15	43.13	31.06
InternVL3.5-8B	2025	29.52	36.01	25.95	30.95	29.71	21.53	31.00	24.17
Qwen3-VL-8B	2025	46.63	53.87	45.84	42.86	43.50	34.48	53.64	38.18
Medgemma-27B	2025	37.44	47.54	37.24	29.46	30.80	25.92	36.93	28.91
Lingshu-32B	2025	37.83	48.95	31.88	38.99	36.21	27.08	40.70	30.47
Gemma3-27B	2025	45.94	52.74	35.98	37.80	42.35	33.62	45.28	37.07
Qwen2.5-VL-32B	2025	42.82	52.60	36.25	38.89	39.39	29.60	45.82	33.36
Qwen2.5-VL-72B	2025	52.10	61.32	47.39	51.19	46.36	38.46	54.18	42.17
InternVL3.5-38B	2025	42.13	48.10	37.80	44.05	38.35	32.30	40.97	35.07
Qwen3-VL-30B-A3B	2025	44.57	51.20	41.18	39.88	38.14	32.86	47.17	35.79
Qwen3-VL-32B	2025	49.66	60.90	49.22	46.43	47.45	41.58	53.91	44.28
InternVL3.5-241B-A28B	2025	55.91	65.68	52.47	54.17	48.80	42.73	56.33	46.17
Qwen3-VL-235B-A22B	2025	57.48	66.95	55.99	55.06	53.33	45.86	63.07	49.34
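The best and second-best entries for any column can be recomputed from the tab-separated rows. The following is a minimal sketch, not part of the Med-CMR tooling: the helper name `top_two` is hypothetical, and only four rows from the tables above are copied in for brevity. Whether the leaderboard ranks across both model groups or within each group is not stated here, so the function simply ranks whatever rows it is given.

```python
import csv
import io

# Four rows copied from the multi-choice leaderboard above (tab-separated).
TABLE = (
    "Model\tYear\tSOD\tFDD\tSU\tTP\tCR\tLTG\tMSI\tAll Score\n"
    "GPT-5\t2025\t66.08\t71.45\t62.06\t58.33\t60.30\t55.19\t69.00\t57.81\n"
    "Gemini-2.5-Pro\t2025\t58.75\t68.07\t56.70\t52.08\t53.54\t46.42\t64.42\t49.87\n"
    "InternVL3.5-241B-A28B\t2025\t55.91\t65.68\t52.47\t54.17\t48.80\t42.73\t56.33\t46.17\n"
    "Qwen3-VL-235B-A22B\t2025\t57.48\t66.95\t55.99\t55.06\t53.33\t45.86\t63.07\t49.34\n"
)


def top_two(table_text, column):
    """Return the (best, second-best) model names for one score column.

    Hypothetical helper: higher scores are assumed to be better.
    """
    rows = list(csv.DictReader(io.StringIO(table_text), delimiter="\t"))
    ranked = sorted(rows, key=lambda row: float(row[column]), reverse=True)
    return ranked[0]["Model"], ranked[1]["Model"]


if __name__ == "__main__":
    for column in ("SOD", "TP", "All Score"):
        best, second = top_two(TABLE, column)
        print(f"{column}: best={best}, second={second}")
```

Passing the full table text instead of the four-row excerpt ranks every listed model in the same way.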

✒️ Open-ended Questions

Proprietary Models

Model	Year	Con	Coh	VA	GT	All Score
GPT-5	2025	97.77	88.86	40.45	34.65	48.70
Gemini-2.5-Pro	2025	98.11	89.36	35.77	32.07	45.98

Open-source Models

Model	Year	Con	Coh	VA	GT	All Score
Medgemma-4B	2025	86.98	57.17	26.65	17.57	32.10
Lingshu-7B	2025	96.19	73.96	33.47	26.26	40.91
Gemma3-4B	2025	90.05	64.26	18.53	13.14	28.10
Qwen2.5-VL-7B	2025	92.35	65.83	26.21	19.15	33.96
InternVL3.5-8B	2025	95.20	71.10	35.38	26.65	41.44
Qwen3-VL-8B	2025	88.46	72.99	28.72	23.54	37.05
Medgemma-27B	2025	89.82	72.21	20.64	17.58	31.49
Lingshu-32B	2025	96.89	76.36	35.76	28.48	43.02
Gemma3-27B	2025	93.72	76.23	25.96	19.96	35.36
Qwen2.5-VL-32B	2025	95.54	75.81	33.55	26.66	41.22
Qwen2.5-VL-72B	2025	96.31	75.11	33.29	25.68	40.73
InternVL3.5-38B	2025	97.20	76.48	36.83	29.03	43.71
Qwen3-VL-30B-A3B	2025	94.37	79.84	29.73	25.15	39.37
Qwen3-VL-32B	2025	96.00	79.31	31.09	25.65	39.37
InternVL3.5-241B-A28B	2025	96.87	76.33	42.74	33.66	47.88
Qwen3-VL-235B-A22B	2025	97.10	85.21	33.02	27.95	42.62

📝 Citation

If you use Med-CMR in your research, please cite our paper:

@misc{gong2025medcmrfinegrainedbenchmarkintegrating,
  title={Med-CMR: A Fine-Grained Benchmark Integrating Visual Evidence and Clinical Logic for Medical Complex Multimodal Reasoning},
  author={Haozhen Gong and Xiaozhong Ji and Yuansen Liu and Wenbin Wu and Xiaoxiao Yan and Jingjing Liu and Kai Wu and Jiazhen Pan and Bailiang Jian and Jiangning Zhang and Xiaobin Hu and Hongwei Bran Li},
  year={2025},
  eprint={2512.00818},
  archivePrefix={arXiv},
  primaryClass={cs.AI},
  url={https://arxiv.org/abs/2512.00818},
}

📮 Contact

Haozhen Gong: [email protected]
