MedFrameQA: A Multi-Image Medical VQA Benchmark for Clinical Reasoning
Suhao Yu*, Haojin Wang*, Juncheng Wu*, Cihang Xie, Yuyin Zhou
- [📄💥 May 22, 2025] Our arXiv paper is released.
- [💾 May 22, 2025] Full dataset released.
Star 🌟 us if you find it helpful!
MedFrameQA introduces multi-image, clinically grounded questions that require comprehensive reasoning across all images. Unlike prior benchmarks such as SLAKE and MedXpertQA, it emphasizes diagnostic complexity, expert-level knowledge, and explicit reasoning chains.
- We develop a scalable pipeline that automatically constructs multi-image, clinically grounded VQA questions from medical education videos.
- We benchmark ten state-of-the-art MLLMs on MedFrameQA and find that their accuracies mostly fall below 50%, with substantial performance fluctuation across different body systems, organs, and modalities.
We open-sourced our data and code here.
The MedFrameQA generation pipeline consists of four stages (a rough sketch of the frame-extraction step follows the list):
- Medical Video Collection: Collecting 3,420 medical videos via clinical search queries;
- Frame-Caption Pairing: Extracting keyframes and aligning with transcribed captions;
- Multi-Frame Merging: Merging clinically related frame-caption pairs into multi-frame clips;
- Question-Answer Generation: Generating multi-image VQA from the multi-frame clips.
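As a rough illustration of the frame-extraction step, the sketch below samples one frame per fixed time interval with OpenCV. The actual keyframe-selection and caption-alignment logic lives in `src/process.py` and is more involved; the function and parameter names here are purely illustrative.

```python
# A minimal sketch of fixed-interval frame sampling, assuming opencv-python is
# installed. This is NOT the pipeline's actual keyframe-selection logic.
import cv2

def extract_keyframes(video_path: str, interval_sec: float = 20.0):
    """Sample one frame every `interval_sec` seconds from a video."""
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 30.0   # fall back if FPS is unavailable
    step = max(int(fps * interval_sec), 1)
    frames, idx = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % step == 0:
            frames.append((idx / fps, frame))  # (timestamp in seconds, image)
        idx += 1
    cap.release()
    return frames
```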
In figure (a), we show the distribution across body systems; (b) presents the distribution across organs; (c) shows the distribution across imaging modalities; (d) provides a word cloud of keywords in MedFrameQA; and (e) reports the distribution of frame counts per question.
| Dataset | 🤗 Huggingface Hub |
|---|---|
| MedFrameQA | SuhaoYu1020/MedFrameQA |
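If you use the `datasets` library, the dataset can be pulled directly from the Hub. This is a minimal sketch; check the dataset card for the actual split names and column layout.

```python
# A minimal sketch for loading MedFrameQA from the Hugging Face Hub,
# assuming the `datasets` library is installed.
from datasets import load_dataset

ds = load_dataset("SuhaoYu1020/MedFrameQA")
split = next(iter(ds))      # pick whichever split is available
print(ds)                   # splits, column names, and sizes
print(ds[split][0])         # inspect one multi-image question
```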
On a Linux system:
- Clone this repository and navigate to the folder
```bash
git clone https://github.com/haojinw0027/MedFrameQA.git
cd MedFrameQA
```
- Install Package
```bash
conda create -n medframeqa python=3.10 -y
conda activate medframeqa
pip install -r requirements.txt
```
```bash
cd src

# Download medical videos
python process.py --process_stage download_process --csv_file ../data/30_disease_video_id.csv
# Specify the number of videos to be downloaded
python process.py --process_stage download_process --csv_file ../data/30_disease_video_id.csv --num_ids number(-1 for all)

# Process the downloaded videos
python process.py --process_stage video_process --csv_file ../data/30_disease_video_id.csv

# Extract frame-caption pairs
python process.py --process_stage pair_process --csv_file ../data/30_disease_video_id.csv
# Specify the time intervals for the selection of video frames
python process.py --process_stage pair_process --csv_file ../data/30_disease_video_id.csv --bias_time 20

# Generate multi-image VQA pairs
python process.py --process_stage vqa_process --csv_file ../data/30_disease_video_id.csv
# Specify the max frame num of one question
python process.py --process_stage vqa_process --csv_file ../data/30_disease_video_id.csv --max_frame_num 5

# Evaluate a model on the VQA pairs
python eval_process.py --input_file "your vqa pairs file path" --output_dir ../eval --model_name "your model"
# Specify the number of questions you want to evaluate
python eval_process.py --input_file "your vqa pairs file path" --output_dir ../eval --model_name "your model" --num_q number(-1 for all)
```
You can download our dataset for evaluation at SuhaoYu1020/MedFrameQA.
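After evaluation, per-category accuracy can be summarized with a short script like the sketch below. The `prediction`, `answer`, and `modality` field names and the results path are assumptions for illustration; the exact output format of `eval_process.py` may differ.

```python
# A minimal sketch of scoring multiple-choice predictions grouped by a metadata
# key. Field names and the results file path are hypothetical.
import json
from collections import defaultdict

def accuracy_by_key(records, key="modality"):
    correct, total = defaultdict(int), defaultdict(int)
    for r in records:
        group = r.get(key, "unknown")
        total[group] += 1
        if str(r["prediction"]).strip().upper() == str(r["answer"]).strip().upper():
            correct[group] += 1
    return {g: correct[g] / total[g] for g in total}

with open("../eval/results.json") as f:   # hypothetical output file
    records = json.load(f)
print(accuracy_by_key(records, key="modality"))
```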
If you find MedFrameQA useful for your research and applications, please cite using this BibTeX:
```bibtex
@misc{yu2025medframeqamultiimagemedicalvqa,
title={MedFrameQA: A Multi-Image Medical VQA Benchmark for Clinical Reasoning},
author={Suhao Yu and Haojin Wang and Juncheng Wu and Cihang Xie and Yuyin Zhou},
year={2025},
eprint={2505.16964},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2505.16964},
}
```

- We thank the Microsoft Accelerate Foundation Models Research Program for supporting our computing needs.




