👋 Welcome to WorldMedQA! 🌍

The WorldMedQA team is on a mission to elevate medical AI by refining the benchmarks used to evaluate vision and language models for healthcare.

Why It Matters:

  • MedQA Datasets: Medical knowledge in LLMs is typically evaluated with medical QA datasets such as MedQA, built from multiple-choice questions drawn from licensing exams like the USMLE.
  • Big Data Training: LLMs, like GPT, are trained on vast corpora that include medical content from sources such as PubMed and other scholarly literature. 📚

The Gaps We’re Filling:

  • 🩺 Real-world Validity: Existing datasets contain errors that can affect clinical relevance.
  • 🌍 Linguistic Diversity: Many benchmarks lack proper representation of non-English languages.
  • 🖼️ Imaging Data: Most medical QA benchmarks don't account for multimodal (text + image) data.
  • 🕰️ Training Data Contamination: Older datasets may overlap with LLM training corpora, leading to biased evaluation.

Our First Release 🚀

We’ve launched our first dataset, WorldMedQA-V, to help bridge these gaps.

WorldMedQA-V is a multilingual, multimodal, clinically validated dataset of 568 image-based medical question-answer pairs from Brazil, Israel, Japan, and Spain, designed to evaluate vision and language models for healthcare.

It is available now on Hugging Face and GitHub.
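If you want to poke at the data, here is a minimal loading sketch using the Hugging Face `datasets` library. The repository ID "WorldMedQA/V" and the split name are assumptions; check the dataset card on the Hub for the exact identifiers and any per-country or per-language configurations.

```python
# Minimal sketch: load WorldMedQA-V from the Hugging Face Hub.
# NOTE: the repo ID and split below are assumptions -- consult the
# dataset card for the actual configuration and split names.
from datasets import load_dataset

ds = load_dataset("WorldMedQA/V", split="test")

# Each record is expected to pair a multiple-choice question with an image.
example = ds[0]
print(example.keys())  # e.g., question, options, answer, image, language
```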

Let’s build more equitable, effective, and representative health AI together!

Pinned repository

  • VLMEvalKit (Python; forked from shan23chen/VLMEvalKit): an open-source evaluation toolkit for large vision-language models (LVLMs), supporting ~100 VLMs and 40+ benchmarks.
