Med-CMR: A Fine-Grained Benchmark Integrating Visual Evidence and Clinical Logic for Medical Complex Multimodal Reasoning
Med-CMR includes seven tasks, and each task corresponds to a specific type of medical multimodal reasoning complexity.
To upload your results, please create a pull request with your result file. The results will be verified before being added to the leaderboard.
Bold indicates the best result; italics indicate the second best.
| Model | Year | SOD | FDD | SU | TP | CR | LTG | MSI | All Score |
|---|---|---|---|---|---|---|---|---|---|
| GPT-5 | 2025 | 66.08 | 71.45 | 62.06 | 58.33 | 60.30 | 55.19 | 69.00 | 57.81 |
| Gemini-2.5-Pro | 2025 | 58.75 | 68.07 | 56.70 | 52.08 | 53.54 | 46.42 | 64.42 | 49.87 |

| Model | Year | SOD | FDD | SU | TP | CR | LTG | MSI | All Score |
|---|---|---|---|---|---|---|---|---|---|
| Medgemma-4B | 2025 | 16.13 | 17.72 | 13.12 | 14.58 | 17.64 | 14.00 | 23.45 | 14.90 |
| Lingshu-7B | 2025 | 32.84 | 47.12 | 31.17 | 38.99 | 31.53 | 23.86 | 39.62 | 27.26 |
| Gemma3-4B | 2025 | 31.57 | 38.68 | 31.59 | 25.70 | 28.10 | 23.74 | 35.04 | 25.98 |
| Qwen2.5-VL-7B | 2025 | 37.83 | 48.10 | 33.29 | 32.74 | 35.22 | 28.15 | 43.13 | 31.06 |
| InternVL3.5-8B | 2025 | 29.52 | 36.01 | 25.95 | 30.95 | 29.71 | 21.53 | 31.00 | 24.17 |
| Qwen3-VL-8B | 2025 | 46.63 | 53.87 | 45.84 | 42.86 | 43.50 | 34.48 | 53.64 | 38.18 |
| Medgemma-27B | 2025 | 37.44 | 47.54 | 37.24 | 29.46 | 30.80 | 25.92 | 36.93 | 28.91 |
| Lingshu-32B | 2025 | 37.83 | 48.95 | 31.88 | 38.99 | 36.21 | 27.08 | 40.70 | 30.47 |
| Gemma3-27B | 2025 | 45.94 | 52.74 | 35.98 | 37.80 | 42.35 | 33.62 | 45.28 | 37.07 |
| Qwen2.5-VL-32B | 2025 | 42.82 | 52.60 | 36.25 | 38.89 | 39.39 | 29.60 | 45.82 | 33.36 |
| Qwen2.5-VL-72B | 2025 | 52.10 | 61.32 | 47.39 | 51.19 | 46.36 | 38.46 | 54.18 | 42.17 |
| InternVL3.5-38B | 2025 | 42.13 | 48.10 | 37.80 | 44.05 | 38.35 | 32.30 | 40.97 | 35.07 |
| Qwen3-VL-30B-A3B | 2025 | 44.57 | 51.20 | 41.18 | 39.88 | 38.14 | 32.86 | 47.17 | 35.79 |
| Qwen3-VL-32B | 2025 | 49.66 | 60.90 | 49.22 | 46.43 | 47.45 | 41.58 | 53.91 | 44.28 |
| InternVL3.5-241B-A28B | 2025 | 55.91 | 65.68 | 52.47 | 54.17 | 48.80 | 42.73 | 56.33 | 46.17 |
| Qwen3-VL-235B-A22B | 2025 | 57.48 | 66.95 | 55.99 | 55.06 | 53.33 | 45.86 | 63.07 | 49.34 |

| Model | Year | Con | Coh | VA | GT | All Score |
|---|---|---|---|---|---|---|
| GPT-5 | 2025 | 97.77 | 88.86 | 40.45 | 34.65 | 48.70 |
| Gemini-2.5-Pro | 2025 | 98.11 | 89.36 | 35.77 | 32.07 | 45.98 |

| Model | Year | Con | Coh | VA | GT | All Score |
|---|---|---|---|---|---|---|
| Medgemma-4B | 2025 | 86.98 | 57.17 | 26.65 | 17.57 | 32.10 |
| Lingshu-7B | 2025 | 96.19 | 73.96 | 33.47 | 26.26 | 40.91 |
| Gemma3-4B | 2025 | 90.05 | 64.26 | 18.53 | 13.14 | 28.10 |
| Qwen2.5-VL-7B | 2025 | 92.35 | 65.83 | 26.21 | 19.15 | 33.96 |
| InternVL3.5-8B | 2025 | 95.20 | 71.10 | 35.38 | 26.65 | 41.44 |
| Qwen3-VL-8B | 2025 | 88.46 | 72.99 | 28.72 | 23.54 | 37.05 |
| Medgemma-27B | 2025 | 89.82 | 72.21 | 20.64 | 17.58 | 31.49 |
| Lingshu-32B | 2025 | 96.89 | 76.36 | 35.76 | 28.48 | 43.02 |
| Gemma3-27B | 2025 | 93.72 | 76.23 | 25.96 | 19.96 | 35.36 |
| Qwen2.5-VL-32B | 2025 | 95.54 | 75.81 | 33.55 | 26.66 | 41.22 |
| Qwen2.5-VL-72B | 2025 | 96.31 | 75.11 | 33.29 | 25.68 | 40.73 |
| InternVL3.5-38B | 2025 | 97.20 | 76.48 | 36.83 | 29.03 | 43.71 |
| Qwen3-VL-30B-A3B | 2025 | 94.37 | 79.84 | 29.73 | 25.15 | 39.37 |
| Qwen3-VL-32B | 2025 | 96.00 | 79.31 | 31.09 | 25.65 | 39.37 |
| InternVL3.5-241B-A28B | 2025 | 96.87 | 76.33 | 42.74 | 33.66 | 47.88 |
| Qwen3-VL-235B-A22B | 2025 | 97.10 | 85.21 | 33.02 | 27.95 | 42.62 |
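
For readers who want to re-derive the best and second-best entries from the tables above, the following is a minimal sketch, not the maintainers' tooling. It parses a pipe-delimited Markdown table and ranks models on one column. The function names (`parse_markdown_table`, `top_two`, `mark_top_two`) are hypothetical. The sketch also assumes that higher scores are better, that ties are broken by row order, and that ranking is done over whichever rows are passed in. The sample reuses three rows from the task-level tables above.

```python
def parse_markdown_table(text):
    """Parse a pipe-delimited Markdown table into a list of row dicts."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    header = [c.strip() for c in lines[0].strip("|").split("|")]
    rows = []
    for ln in lines[2:]:  # skip the header row and the |---| delimiter row
        cells = [c.strip() for c in ln.strip("|").split("|")]
        rows.append(dict(zip(header, cells)))
    return rows


def top_two(rows, column):
    """Return (best, second best) model names for a numeric column.

    Assumption: higher is better; ties keep their original row order.
    """
    ranked = sorted(rows, key=lambda r: float(r[column]), reverse=True)
    return ranked[0]["Model"], ranked[1]["Model"]


def mark_top_two(rows, column):
    """Map each model to its score, bolding the best and italicizing the second."""
    best, second = top_two(rows, column)
    marked = {}
    for r in rows:
        value = r[column]
        if r["Model"] == best:
            value = f"**{value}**"
        elif r["Model"] == second:
            value = f"*{value}*"
        marked[r["Model"]] = value
    return marked


# Three rows copied from the task-level tables above.
SAMPLE = """
| Model | Year | SOD | FDD | SU | TP | CR | LTG | MSI | All Score |
|---|---|---|---|---|---|---|---|---|---|
| GPT-5 | 2025 | 66.08 | 71.45 | 62.06 | 58.33 | 60.30 | 55.19 | 69.00 | 57.81 |
| Gemini-2.5-Pro | 2025 | 58.75 | 68.07 | 56.70 | 52.08 | 53.54 | 46.42 | 64.42 | 49.87 |
| Qwen3-VL-235B-A22B | 2025 | 57.48 | 66.95 | 55.99 | 55.06 | 53.33 | 45.86 | 63.07 | 49.34 |
"""

if __name__ == "__main__":
    rows = parse_markdown_table(SAMPLE)
    for model, score in mark_top_two(rows, "All Score").items():
        print(f"{model}: {score}")
```

Passing a different column name, such as `"TP"`, ranks the same rows on that task instead.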
If you use Med-CMR in your research, please cite our paper:
@misc{gong2025medcmrfinegrainedbenchmarkintegrating,
title={Med-CMR: A Fine-Grained Benchmark Integrating Visual Evidence and Clinical Logic for Medical Complex Multimodal Reasoning},
author={Haozhen Gong and Xiaozhong Ji and Yuansen Liu and Wenbin Wu and Xiaoxiao Yan and Jingjing Liu and Kai Wu and Jiazhen Pan and Bailiang Jian and Jiangning Zhang and Xiaobin Hu and Hongwei Bran Li},
year={2025},
eprint={2512.00818},
archivePrefix={arXiv},
primaryClass={cs.AI},
url={https://arxiv.org/abs/2512.00818},
}

- Haozhen Gong: [email protected]
