Tasks: Question Answering

MMVP Benchmark Datacard

Basic Information

Title: MMVP Benchmark

Description: The MMVP (Multimodal Visual Patterns) Benchmark is built around “CLIP-blind pairs”: images that CLIP perceives as similar despite clear visual differences. It evaluates state-of-the-art systems, including GPT-4V, across nine basic visual patterns. These systems often struggle with straightforward questions about the images, producing incorrect answers and hallucinated explanations.

Dataset Details

Content Types: Images (CLIP-blind pairs)
Volume: 300 images
Source of Data: Derived from ImageNet-1k and LAION-Aesthetics
Data Collection Method: Identification of CLIP-blind pairs through comparative analysis
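
This card does not spell out the "comparative analysis" used to find CLIP-blind pairs. The sketch below shows one way such a search could work, assuming it means comparing similarity in CLIP embedding space against similarity under a second, reference vision encoder. The function names, the toy embeddings, and the thresholds `clip_min=0.95` and `ref_max=0.6` are illustrative assumptions, not values taken from this card.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def find_clip_blind_pairs(clip_emb, ref_emb, clip_min=0.95, ref_max=0.6):
    """Return index pairs (i, j) that look alike to CLIP but not to the reference.

    clip_emb and ref_emb are parallel lists of embedding vectors, one per image.
    The thresholds are hypothetical defaults chosen for illustration only.
    """
    pairs = []
    for i, j in combinations(range(len(clip_emb)), 2):
        similar_for_clip = cosine(clip_emb[i], clip_emb[j]) > clip_min
        different_for_ref = cosine(ref_emb[i], ref_emb[j]) < ref_max
        if similar_for_clip and different_for_ref:
            pairs.append((i, j))
    return pairs


# Toy embeddings (made up): images 0 and 1 are nearly identical for CLIP,
# but orthogonal under the reference encoder.
clip_emb = [[1.0, 0.0], [0.99, 0.05], [0.0, 1.0]]
ref_emb = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]

print(find_clip_blind_pairs(clip_emb, ref_emb))  # [(0, 1)]
```

The all-pairs loop is quadratic in the number of images, which is fine for a toy example. A search over ImageNet-1k or LAION-Aesthetics would need batched matrix operations or an approximate nearest-neighbor index instead.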

Downloads last month: 549

Size of downloaded dataset files: 3.12 MB
Size of the auto-converted Parquet files: 3.08 MB
Number of rows: 300