[Benchmark] Fix function extract_subjective in CreationBench by SYuan03 · Pull Request #911 · open-compass/VLMEvalKit

SYuan03 · 2025-04-11T13:10:21Z

When GPT-4o-20241120 is used as the judge model, it sometimes generates answers like "**FINAL VERDICT IS:...", which the old function cannot extract correctly.
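To illustrate the failure mode, here is a minimal sketch of an extractor that tolerates Markdown emphasis around the verdict marker. It is not the code from this PR: the function name `extract_verdict`, the regular expression, and the choice to return the first line after the marker are all assumptions made for illustration.

```python
import re

# Hypothetical pattern (an assumption, not the PR's regex): allows optional
# Markdown emphasis characters ('*' or '_') and whitespace around the marker,
# and ignores case.
_VERDICT_RE = re.compile(
    r"[*_\s]*FINAL\s+VERDICT\s+IS\s*:?[*_\s]*(.*)",
    re.IGNORECASE | re.DOTALL,
)


def extract_verdict(text):
    """Return the first line following the verdict marker, or None if absent."""
    match = _VERDICT_RE.search(text)
    if match is None:
        return None
    # Everything after the marker; keep only its first line.
    lines = match.group(1).strip().splitlines()
    if not lines:
        return None
    # Drop stray emphasis characters left over from '**...**' wrapping.
    return lines[0].strip(" *_") or None
```

A judge response such as `"**FINAL VERDICT IS:** A"` and the plain `"FINAL VERDICT IS: A"` would both yield `"A"` with this sketch, while a response with no marker yields `None`, so the caller can decide how to handle an unparseable judgment.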

kennymckormick · 2025-04-11T13:23:46Z

@FangXinyu-0913 Please help review & merge this PR.

FangXinyu-0913

Approve this PR

* add vgrpbench
* remove unnecessary files
* [Improvement] Allow setting model name for lmdeploy wrapper (#913)
Signed-off-by: Isotr0py
* [Minor] Add GPT-4.1
* [Fix] Fix function extract_subjective in dataset/creation.py (#911)
* creation: extract_subjective
* fix lint
* [Fix] fix LA mode in HLE
* [Fix] Fix COT Prompt BUG (#922)
* [Patch] Bypass SSL (#923)
* [Benchmark] Add PHYSICS Benchmark for Open-Ended Physics Reasoning (#931)
* add physic.py and update dataset logic
* Initial commit:integrated physics prompt eval
* fix lint
* [Fix] update get judge model logic in physics dataset
* edit the prompt in auxeval
* fix auxeval in physices
* fix lint
---------
Co-authored-by: FangXinyu-0913
* [Dataset ] add support for SAIL-VL-1.5 (#926)
* modify commit
* rename commit
* remove commit
* remove commit
* rename file
* rename file
* fix formatting
---------
Co-authored-by: jinfeng.km
Co-authored-by: qiuyan.kk
* remove unnecessary file
* [Fix] fix physics_yale with not using custom prompt in internvl series
* [Model] Support SAIL-VL-1.6 (#939)
Co-authored-by: qiuyan.kk
* [Minor] More info in tqdm progress bar (#937)
* [Feature] Add vLLM support for Qwen2-VL/Qwen2.5-VL (#935)
Co-authored-by: TianhaoLiang2000
* [Benchmark] Support MMIFEval (#938)
* add mmifeval
* add req nltk
* [Fix] update url and remove unnecessary log
---------
Co-authored-by: FangXinyu-0913
* [Model] Add support for Janus-Pro-1B (#945)
add support for Janus-Pro-1B
* [Minor] Patch to fix DynaMath preprocess
* add vgrpbench
* [Fix] Fix Lint
* remove files
* add vgrpbench's format json files, and update gitignore rule
* [Benchmark] Add PHYSICS Benchmark for Open-Ended Physics Reasoning (#931)
* add physic.py and update dataset logic
* Initial commit:integrated physics prompt eval
* fix lint
* [Fix] update get judge model logic in physics dataset
* edit the prompt in auxeval
* fix auxeval in physices
* fix lint
---------
Co-authored-by: FangXinyu-0913
* [Benchmark] Add Support for Spatial457 Benchmark (CVPR 2025 Highlight) (#932)
* update spatial457
* fix format
* update readme
* update README.md
* update summarize.py
* update dataset/__init__.py
* update summarize.py
* Revert image_vqa.py
* add back spatial457
* Implement a more robust strategy for Spatial457 answer matching
---------
Co-authored-by: kennymckormick
Co-authored-by: Haodong Duan
* [Model] add support for Qwen2.5-Omni (#883)
* add support for qwen2.5_omni
* add support for qwen2.5_omni (only single process)
* update model cls for qwen2_5omni
* Delete VIDEO_DLC_scripts/MMSci_internvl2_8b.sh
* Delete VIDEO_DLC_scripts/video_lb_update_cu118_smol.sh
* Delete VIDEO_DLC_scripts/video_lb_update_qwen2_5_vl_7b.sh
* Delete files
* [Fix] Fix Lint
---------
Co-authored-by: Haodong Duan
Co-authored-by: kennymckormick
* [Benchmark] Support VisuLogic (#944)
Co-authored-by: Haodong Duan
* [Minor] Support Gemini 2.5 Flash / Pro (#958)
* [Minor] Add Explicit Format Instruction for AMBER (#961)
* [Minor] Fix all_finished return null (#951)
* [Benchmark] Support CVBench (CV-Bench-2D, CV-Bench-3D) (#909)
* [Benchmark] Support CVBench, including CV-Bench-2D, CV-Bench-3D two sub tasks.
* fix(image_mcq.py): prompt error
* [Fix] Fix vllm with config (#953)
* fix use config with vllm
* fix
* update
* [Fix] Fix MM-IFEval & Custom Prompt in InternVL (#959)
* [Model] Support SenseNova-V6-Pro (#964)
* [Model] Support SenseNova-V6
* update model config
* update
* update
* update config
* [Benchmark] Support TDBench (#947)
* [Benchmark] Add TDBench for top-down images
* fix REresult symlink and index
* fix symlink
* fix lint
* [Fix] Refactor Task Launching Policy (#952)
* Update run.py
* [Refactor] Set CUDA_VISIBLE_DEVICES at the beginning
* [Minor] auto / cuda device for several VLMs
* [Doc] Update Doc
* [Minor] Update CV-Bench URL
* [Fix] Fix tmp ans load error in MM-IFEval (#969)
* Fix tmp ans load error in MM-IFEval
* Fix KeyError 0
* add vgrpbench
* [Benchmark] Add PHYSICS Benchmark for Open-Ended Physics Reasoning (#931)
* add physic.py and update dataset logic
* Initial commit:integrated physics prompt eval
* fix lint
* [Fix] update get judge model logic in physics dataset
* edit the prompt in auxeval
* fix auxeval in physices
* fix lint
---------
Co-authored-by: FangXinyu-0913
---------
Signed-off-by: Isotr0py
Co-authored-by: Isotr0py
Co-authored-by: kennymckormick
Co-authored-by: Shengyuan Ding
Co-authored-by: Xinyu Fang
Co-authored-by: Haodong Duan
Co-authored-by: suencgo
Co-authored-by: cmatachuan
Co-authored-by: jinfeng.km
Co-authored-by: qiuyan.kk
Co-authored-by: Xiangyu Zhao
Co-authored-by: TianhaoLiang2000
Co-authored-by: Jiang Li
Co-authored-by: Xingrui Wang
Co-authored-by: xwy-bit
Co-authored-by: psp_dada
Co-authored-by: MaoSong2022
Co-authored-by: Scott Zhao

SYuan03 added 2 commits April 11, 2025 20:52

creation: extract_subjective

4c06d48

fix lint

a7892c6

kennymckormick requested a review from FangXinyu-0913 April 11, 2025 13:23

FangXinyu-0913 approved these changes Apr 15, 2025

FangXinyu-0913 merged commit 5ff6ede into open-compass:main Apr 15, 2025
7 checks passed

kennymckormick pushed a commit to ryf1123/VLMEvalKit that referenced this pull request Apr 24, 2025

[Fix] Fix function extract_subjective in dataset/creation.py (open-co…

f39c171

…mpass#911) * creation: extract_subjective * fix lint

chengyuehuang511 pushed a commit to chengyuehuang511/VLMEvalKit that referenced this pull request May 7, 2025

[Fix] Fix function extract_subjective in dataset/creation.py (open-co…

33c72ba

…mpass#911) * creation: extract_subjective * fix lint

Koii2k3 pushed a commit to wjnwjn59/VLMEvalKit that referenced this pull request Nov 13, 2025

[Fix] Fix function extract_subjective in dataset/creation.py (open-co…

acbaac2

…mpass#911) * creation: extract_subjective * fix lint

FangXinyu-0913 merged 2 commits into open-compass:main from SYuan03:creation_re
