LsmnBmnc/Med-CMR

Med-CMR: A Fine-Grained Benchmark Integrating Visual Evidence and Clinical Logic for Medical Complex Multimodal Reasoning

🕸️ Overview

Med-CMR includes seven tasks, and each task corresponds to a specific type of medical multimodal reasoning complexity.

🏆 Leaderboard

To upload your results, please create a pull request with your result file. The results will be verified before being added to the leaderboard.

Bold indicates the best result. Italics indicate the second-best result.
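The best and second-place entries can also be recomputed directly from the scores. The following is a minimal sketch, not part of the official repository: the helper name `top_two` is hypothetical, the values are a subset of the All Score column of the multi-choice tables, and it assumes that a higher score is better and that ranking is done over a single column.

```python
def top_two(scores):
    """Return the (best, second-best) model names, ranked by score descending."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[0], ranked[1]


# All Score values copied from the multi-choice leaderboard (illustrative subset).
all_score = {
    "GPT-5": 57.81,
    "Gemini-2.5-Pro": 49.87,
    "Qwen3-VL-235B-A22B": 49.34,
    "InternVL3.5-241B-A28B": 46.17,
}

best, second = top_two(all_score)
# Mirror the leaderboard convention: bold for best, italics for second place.
print(f"**{best}** (best), *{second}* (second place)")
```

The same helper can be applied to any other column by swapping in that column's values.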

☑️ Multi-choice Questions

Proprietary Models

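| Model | Year | SOD | FDD | SU | TP | CR | LTG | MSI | All Score |
|---|---|---|---|---|---|---|---|---|---|
| GPT-5 | 2025 | 66.08 | 71.45 | 62.06 | 58.33 | 60.30 | 55.19 | 69.00 | 57.81 |
| Gemini-2.5-Pro | 2025 | 58.75 | 68.07 | 56.70 | 52.08 | 53.54 | 46.42 | 64.42 | 49.87 |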

Open-source Models

| Model | Year | SOD | FDD | SU | TP | CR | LTG | MSI | All Score |
|---|---|---|---|---|---|---|---|---|---|
| Medgemma-4B | 2025 | 16.13 | 17.72 | 13.12 | 14.58 | 17.64 | 14.00 | 23.45 | 14.90 |
| Lingshu-7B | 2025 | 32.84 | 47.12 | 31.17 | 38.99 | 31.53 | 23.86 | 39.62 | 27.26 |
| Gemma3-4B | 2025 | 31.57 | 38.68 | 31.59 | 25.70 | 28.10 | 23.74 | 35.04 | 25.98 |
| Qwen2.5-VL-7B | 2025 | 37.83 | 48.10 | 33.29 | 32.74 | 35.22 | 28.15 | 43.13 | 31.06 |
| InternVL3.5-8B | 2025 | 29.52 | 36.01 | 25.95 | 30.95 | 29.71 | 21.53 | 31.00 | 24.17 |
| Qwen3-VL-8B | 2025 | 46.63 | 53.87 | 45.84 | 42.86 | 43.50 | 34.48 | 53.64 | 38.18 |
| Medgemma-27B | 2025 | 37.44 | 47.54 | 37.24 | 29.46 | 30.80 | 25.92 | 36.93 | 28.91 |
| Lingshu-32B | 2025 | 37.83 | 48.95 | 31.88 | 38.99 | 36.21 | 27.08 | 40.70 | 30.47 |
| Gemma3-27B | 2025 | 45.94 | 52.74 | 35.98 | 37.80 | 42.35 | 33.62 | 45.28 | 37.07 |
| Qwen2.5-VL-32B | 2025 | 42.82 | 52.60 | 36.25 | 38.89 | 39.39 | 29.60 | 45.82 | 33.36 |
| Qwen2.5-VL-72B | 2025 | 52.10 | 61.32 | 47.39 | 51.19 | 46.36 | 38.46 | 54.18 | 42.17 |
| InternVL3.5-38B | 2025 | 42.13 | 48.10 | 37.80 | 44.05 | 38.35 | 32.30 | 40.97 | 35.07 |
| Qwen3-VL-30B-A3B | 2025 | 44.57 | 51.20 | 41.18 | 39.88 | 38.14 | 32.86 | 47.17 | 35.79 |
| Qwen3-VL-32B | 2025 | 49.66 | 60.90 | 49.22 | 46.43 | 47.45 | 41.58 | 53.91 | 44.28 |
| InternVL3.5-241B-A28B | 2025 | 55.91 | 65.68 | 52.47 | 54.17 | 48.80 | 42.73 | 56.33 | 46.17 |
| Qwen3-VL-235B-A22B | 2025 | 57.48 | 66.95 | 55.99 | 55.06 | 53.33 | 45.86 | 63.07 | 49.34 |

✒️ Open-ended Questions

Proprietary Models

| Model | Year | Con | Coh | VA | GT | All Score |
|---|---|---|---|---|---|---|
| GPT-5 | 2025 | 97.77 | 88.86 | 40.45 | 34.65 | 48.70 |
| Gemini-2.5-Pro | 2025 | 98.11 | 89.36 | 35.77 | 32.07 | 45.98 |

Open-source Models

| Model | Year | Con | Coh | VA | GT | All Score |
|---|---|---|---|---|---|---|
| Medgemma-4B | 2025 | 86.98 | 57.17 | 26.65 | 17.57 | 32.10 |
| Lingshu-7B | 2025 | 96.19 | 73.96 | 33.47 | 26.26 | 40.91 |
| Gemma3-4B | 2025 | 90.05 | 64.26 | 18.53 | 13.14 | 28.10 |
| Qwen2.5-VL-7B | 2025 | 92.35 | 65.83 | 26.21 | 19.15 | 33.96 |
| InternVL3.5-8B | 2025 | 95.20 | 71.10 | 35.38 | 26.65 | 41.44 |
| Qwen3-VL-8B | 2025 | 88.46 | 72.99 | 28.72 | 23.54 | 37.05 |
| Medgemma-27B | 2025 | 89.82 | 72.21 | 20.64 | 17.58 | 31.49 |
| Lingshu-32B | 2025 | 96.89 | 76.36 | 35.76 | 28.48 | 43.02 |
| Gemma3-27B | 2025 | 93.72 | 76.23 | 25.96 | 19.96 | 35.36 |
| Qwen2.5-VL-32B | 2025 | 95.54 | 75.81 | 33.55 | 26.66 | 41.22 |
| Qwen2.5-VL-72B | 2025 | 96.31 | 75.11 | 33.29 | 25.68 | 40.73 |
| InternVL3.5-38B | 2025 | 97.20 | 76.48 | 36.83 | 29.03 | 43.71 |
| Qwen3-VL-30B-A3B | 2025 | 94.37 | 79.84 | 29.73 | 25.15 | 39.37 |
| Qwen3-VL-32B | 2025 | 96.00 | 79.31 | 31.09 | 25.65 | 39.37 |
| InternVL3.5-241B-A28B | 2025 | 96.87 | 76.33 | 42.74 | 33.66 | 47.88 |
| Qwen3-VL-235B-A22B | 2025 | 97.10 | 85.21 | 33.02 | 27.95 | 42.62 |

📝 Citation

If you use Med-CMR in your research, please cite our paper:

@misc{gong2025medcmrfinegrainedbenchmarkintegrating,
  title={Med-CMR: A Fine-Grained Benchmark Integrating Visual Evidence and Clinical Logic for Medical Complex Multimodal Reasoning},
  author={Haozhen Gong and Xiaozhong Ji and Yuansen Liu and Wenbin Wu and Xiaoxiao Yan and Jingjing Liu and Kai Wu and Jiazhen Pan and Bailiang Jian and Jiangning Zhang and Xiaobin Hu and Hongwei Bran Li},
  year={2025},
  eprint={2512.00818},
  archivePrefix={arXiv},
  primaryClass={cs.AI},
  url={https://arxiv.org/abs/2512.00818},
}

📮 Contact

About

Official code repository for "Med-CMR: A Fine-Grained Benchmark Integrating Visual Evidence and Clinical Logic for Medical Complex Multimodal Reasoning"