MedFrameQA: A Multi-Image Medical VQA Benchmark for Clinical Reasoning
Suhao Yu*, Haojin Wang*, Juncheng Wu*, Cihang Xie, Yuyin Zhou


📢 Breaking News

  • [📄💥 May 22, 2025] Our arXiv paper is released.
  • [💾 May 22, 2025] Full dataset released.

Star 🌟 us if you find it helpful!


⚡Introduction

MedFrameQA introduces multi-image, clinically grounded questions that require comprehensive reasoning across all images. Unlike prior benchmarks such as SLAKE and MedXpertQA, it emphasizes diagnostic complexity, expert-level knowledge, and explicit reasoning chains.

  • We develop a scalable pipeline that automatically constructs multi-image, clinically grounded VQA questions from medical education videos.
  • We benchmark ten state-of-the-art MLLMs on MedFrameQA and find that their accuracies mostly fall below 50%, with substantial performance variation across different body systems, organs, and modalities.

We open-source our data and code in this repository.

🚀 Dataset construction pipeline

The MedFrameQA generation pipeline contains four stages (a sketch of the merging stage follows the list):

  1. Medical Video Collection: collecting 3,420 medical videos via clinical search queries;
  2. Frame-Caption Pairing: extracting keyframes and aligning them with transcribed captions;
  3. Multi-Frame Merging: merging clinically related frame-caption pairs into multi-frame clips;
  4. Question-Answer Generation: generating multi-image VQA pairs from the multi-frame clips.
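
For intuition, here is a minimal sketch of stage 3 (Multi-Frame Merging). The `FrameCaptionPair` type, the `is_related` predicate, and `merge_related_pairs` are illustrative assumptions rather than the repository's actual code; in the real pipeline, relatedness between consecutive pairs would likely be judged from the captions (e.g., by an LLM).

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class FrameCaptionPair:
    timestamp: float  # seconds into the source video
    frame_path: str   # path to the extracted keyframe image
    caption: str      # transcribed narration aligned to this frame

def merge_related_pairs(
    pairs: List[FrameCaptionPair],
    is_related: Callable[[FrameCaptionPair, FrameCaptionPair], bool],
    max_frames: int = 5,
) -> List[List[FrameCaptionPair]]:
    """Greedily group consecutive, clinically related pairs into clips."""
    clips: List[List[FrameCaptionPair]] = []
    current: List[FrameCaptionPair] = []
    for pair in pairs:
        # Close the current clip when relatedness breaks or the clip is full.
        if current and (len(current) >= max_frames
                        or not is_related(current[-1], pair)):
            clips.append(current)
            current = []
        current.append(pair)
    if current:
        clips.append(current)
    return clips
```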

📚 Statistical overview of MedFrameQA


In figure (a), we show the distribution across body systems; (b) presents the distribution across organs; (c) shows the distribution across imaging modalities; (d) provides a word cloud of keywords in MedFrameQA; and (e) reports the distribution of frame counts per question.

🤗 Dataset Download

| Dataset    | 🤗 Hugging Face Hub    |
| ---------- | ---------------------- |
| MedFrameQA | SuhaoYu1020/MedFrameQA |
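
Assuming you use the Hugging Face `datasets` library, the benchmark can be loaded for quick inspection as follows (split names vary, so print the dataset to see what is available):

```python
from datasets import load_dataset

# Downloads and caches the benchmark from the Hugging Face Hub.
ds = load_dataset("SuhaoYu1020/MedFrameQA")
print(ds)  # shows the available splits and their sizes
```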

🏆 Results

Accuracy by Human Body System on MedFrameQA


Accuracy by Modality and Frame Count on MedFrameQA



💬 Quick Start

⏬ Install

On a Linux system:

  1. Clone this repository and navigate to the folder

git clone https://github.com/haojinw0027/MedFrameQA.git
cd MedFrameQA

  2. Install the package

conda create -n medframeqa python=3.10 -y
conda activate medframeqa
pip install -r requirements.txt
cd src

🎬 Generate VQA pairs from Video

Download videos and audio

python process.py --process_stage download_process --csv_file ../data/30_disease_video_id.csv 

# Specify the number of videos to be downloaded
python process.py --process_stage download_process --csv_file ../data/30_disease_video_id.csv --num_ids <number>  # -1 for all

Extract frames from videos and generate transcripts from audio

python process.py --process_stage video_process --csv_file ../data/30_disease_video_id.csv 

Frame-caption pairing

python process.py --process_stage pair_process --csv_file ../data/30_disease_video_id.csv 

# Specify the time interval for selecting video frames
python process.py --process_stage pair_process --csv_file ../data/30_disease_video_id.csv --bias_time 20
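
For intuition on what `--bias_time` controls, here is a hedged sketch of timestamp-based pairing: each keyframe collects the transcript segments that overlap a ±bias_time window around it. The function name and data shapes are assumptions for illustration, not the repository's implementation.

```python
def pair_frames_with_captions(frames, segments, bias_time=20.0):
    """Attach transcript text to each keyframe by timestamp proximity.

    frames:   list of (timestamp_seconds, frame_path) tuples
    segments: list of (start_seconds, end_seconds, text) tuples
    """
    pairs = []
    for ts, path in frames:
        window_start, window_end = ts - bias_time, ts + bias_time
        # Keep every transcript segment overlapping the +/- bias_time window.
        caption = " ".join(
            text for start, end, text in segments
            if start < window_end and end > window_start
        )
        pairs.append({"timestamp": ts, "frame": path, "caption": caption})
    return pairs
```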

Multi-frame merging and question-answer generation

python process.py --process_stage vqa_process --csv_file ../data/30_disease_video_id.csv 

# Specify the maximum number of frames per question
python process.py --process_stage vqa_process --csv_file ../data/30_disease_video_id.csv --max_frame_num 5
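
After this stage you can sanity-check the generated pairs. The file path and field names in the following snippet are assumptions for illustration, not the repository's actual output schema:

```python
import json

# Hypothetical output path and schema -- adjust to what vqa_process writes.
with open("../data/vqa_pairs.json") as f:
    vqa_pairs = json.load(f)

sample = vqa_pairs[0]
print(sample["question"])  # multi-image question text
print(sample["images"])    # up to --max_frame_num frame paths
print(sample["answer"])    # ground-truth option
```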

🧐 Evaluate on MLLMs

python eval_process.py --input_file "your vqa pairs file path" --output_dir ../eval --model_name "your model"

# Specify the number of questions you want to evaluate
python eval_process.py --input_file "your vqa pairs file path" --output_dir ../eval --model_name "your model" --num_q <number>  # -1 for all

You can download our dataset for evaluation at SuhaoYu1020/MedFrameQA.
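
To reproduce per-category accuracy tables like those above, a minimal post-hoc aggregation over the evaluation outputs might look like the following sketch; the JSONL path and field names are assumptions, not the actual output format of eval_process.py.

```python
import json
from collections import defaultdict

correct = defaultdict(int)
total = defaultdict(int)

# Hypothetical results file: one JSON object per evaluated question.
with open("../eval/results.jsonl") as f:
    for line in f:
        record = json.loads(line)
        key = record["body_system"]  # grouping key (assumed field name)
        total[key] += 1
        correct[key] += record["prediction"] == record["answer"]

for key in sorted(total):
    print(f"{key}: {correct[key] / total[key]:.1%} over {total[key]} questions")
```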


📜 Citation

If you find MedFrameQA useful for your research and applications, please cite using this BibTeX:

@misc{yu2025medframeqamultiimagemedicalvqa,
      title={MedFrameQA: A Multi-Image Medical VQA Benchmark for Clinical Reasoning}, 
      author={Suhao Yu and Haojin Wang and Juncheng Wu and Cihang Xie and Yuyin Zhou},
      year={2025},
      eprint={2505.16964},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2505.16964}, 
}

🙏 Acknowledgement

  • We thank the Microsoft Accelerate Foundation Models Research Program for supporting our computing needs.
