[Benchmark] Support VGRP-Bench by ryf1123 · Pull Request #918 · open-compass/VLMEvalKit

ryf1123 · 2025-04-15T07:26:39Z

Hi there! Thanks for this amazing project. I would like to contribute a puzzle benchmark, VGRP-Bench.

It has a single task: the model outputs both its perception of the puzzle and its answer in one response. A sample evaluation I ran produced the scores shown in the attached screenshot.

One possibly big difference from other benchmarks is that this puzzle benchmark includes a different rule-based verifier for each puzzle, such as this one (vlmeval/dataset/utils/vgrpbench/puzzles/aquarium.py). It also requires an additional LLM to format the output, which is similar to the judge function, so I use the judge function for that step.
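To make the per-puzzle verifier idea concrete, here is a minimal sketch of how a rule-based check could be structured. It is not the code in aquarium.py: the puzzle rule (a Latin-square style "each row and column holds 1..n once" check), the names `check_latin_square`, `VERIFIERS`, and `verify`, and the convention that 0 marks an empty cell are all assumptions made for illustration.

```python
from typing import Callable, Dict, List

Grid = List[List[int]]


def check_latin_square(initial: Grid, answer: Grid) -> bool:
    """Hypothetical rule check (not taken from the PR).

    The answer is valid if it is an n x n grid, keeps every pre-filled
    clue from `initial` (0 marks an empty cell, an assumed convention),
    and contains each of 1..n exactly once in every row and column.
    """
    n = len(initial)
    # Reject answers whose shape does not match the puzzle.
    if len(answer) != n or any(len(row) != n for row in answer):
        return False
    # The answer must not overwrite any given clue.
    for r in range(n):
        for c in range(n):
            if initial[r][c] != 0 and initial[r][c] != answer[r][c]:
                return False
    expected = set(range(1, n + 1))
    rows_ok = all(set(row) == expected for row in answer)
    cols_ok = all({answer[r][c] for r in range(n)} == expected for c in range(n))
    return rows_ok and cols_ok


# One verifier per puzzle type, looked up by name (hypothetical registry).
VERIFIERS: Dict[str, Callable[[Grid, Grid], bool]] = {
    "latin_square": check_latin_square,
}


def verify(puzzle: str, initial: Grid, answer: Grid) -> bool:
    """Dispatch to the rule-based verifier registered for `puzzle`."""
    return VERIFIERS[puzzle](initial, answer)
```

In this sketch the grid passed as `answer` is assumed to have already been extracted from the model's free-form response, which is the step the additional formatting LLM would handle.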

Thank you!

Best!
Yufan

kennymckormick merged 40 commits into open-compass:main from ryf1123:main

15 participants
