
[Benchmark] Support VGRP-Bench #918

Merged: kennymckormick merged 40 commits into open-compass:main from ryf1123:main on Apr 30, 2025


@ryf1123 (Contributor) commented Apr 15, 2025

Hi there! Thanks for this amazing project. I would like to contribute a puzzle benchmark, VGRP-Bench.

It has a single task: the model outputs both its perception of the puzzle and its answer in one response. A sample evaluation I ran gives the scores shown in the attached screenshot.

One possibly significant difference from other benchmarks is that this puzzle benchmark includes a separate rule-based verifier for each puzzle, such as this one (vlmeval/dataset/utils/vgrpbench/puzzles/aquarium.py). It also requires an additional LLM to format the model's output; because this step is similar to the judge function, I reuse the judge for it.
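To illustrate what a rule-based verifier plus a formatting step might look like, here is a minimal sketch for an Aquarium puzzle. It is not the code in aquarium.py. The names `verify_aquarium` and `score_response`, the 0/1 grid encoding, the region-ID grid, and the JSON shape `{"perception": ..., "answer": ...}` assumed for the formatter LLM's output are all hypothetical. It assumes the standard Aquarium rules: each region (aquarium) fills from the bottom with one horizontal water level, and the row and column clues count filled cells.

```python
import json
from typing import List, Sequence

Grid = List[List[int]]


def verify_aquarium(regions: Grid, solution: Grid,
                    row_clues: Sequence[int], col_clues: Sequence[int]) -> bool:
    """Hypothetical checker: 1 = water, 0 = empty; row 0 is the top row."""
    if not isinstance(solution, list) or any(not isinstance(r, list) for r in solution):
        return False
    n_rows = len(regions)
    n_cols = len(regions[0]) if n_rows else 0
    # The answer grid must match the puzzle's shape and contain only 0/1.
    if len(solution) != n_rows or any(len(r) != n_cols for r in solution):
        return False
    if any(cell not in (0, 1) for r in solution for cell in r):
        return False
    # Row and column clues count the filled cells.
    if [sum(r) for r in solution] != list(row_clues):
        return False
    col_sums = [sum(solution[r][c] for r in range(n_rows)) for c in range(n_cols)]
    if col_sums != list(col_clues):
        return False
    # Water level: within a region, every empty cell must lie strictly above
    # every filled cell (so one row cannot be partly filled).
    top_filled: dict = {}    # region -> smallest row index holding water
    lowest_empty: dict = {}  # region -> largest row index left empty
    for r in range(n_rows):
        for c in range(n_cols):
            reg = regions[r][c]
            if solution[r][c]:
                top_filled[reg] = min(top_filled.get(reg, r), r)
            else:
                lowest_empty[reg] = max(lowest_empty.get(reg, r), r)
    for reg, top in top_filled.items():
        if reg in lowest_empty and lowest_empty[reg] >= top:
            return False
    return True


def score_response(formatted_json: str, regions: Grid,
                   row_clues: Sequence[int], col_clues: Sequence[int]) -> float:
    """Hypothetical scorer for the formatter LLM's JSON output."""
    try:
        answer = json.loads(formatted_json)["answer"]
    except (json.JSONDecodeError, KeyError, TypeError):
        return 0.0  # unparseable output scores zero
    return 1.0 if verify_aquarium(regions, answer, row_clues, col_clues) else 0.0
```

For example, with two column-shaped aquariums `[[0, 1], [0, 1]]`, the answer `[[0, 1], [1, 1]]` satisfies row clues `[1, 2]` and column clues `[1, 2]`, while `[[1, 1], [0, 1]]` fails because water in the left aquarium floats above an empty cell. Keeping the verifier purely rule-based means the formatter LLM only has to produce a parseable grid; correctness is decided by code, not by a judge.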

Thank you!

Best!
Yufan

[Screenshot: sample evaluation scores]
