
[Benchmark] Fix function extract_subjective in CreationBench#911

Merged
FangXinyu-0913 merged 2 commits into open-compass:main from SYuan03:creation_re on Apr 15, 2025


@SYuan03 (Collaborator) commented Apr 11, 2025

When GPT-4o-20241120 is used as the judge model, it sometimes produces verdicts formatted like "**FINAL VERDICT IS:...", which the old function cannot extract correctly.
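
A minimal Python sketch of a more tolerant extraction, assuming the judge writes the marker `FINAL VERDICT IS:` followed by a pairwise label on the same line. The function name `extract_verdict` and the label set in `VERDICTS` are hypothetical illustrations, not the code merged in this PR. The idea is to strip Markdown emphasis and match the marker case-insensitively, so a bolded `**FINAL VERDICT IS:...` line is still recognized.

```python
import re
from typing import Optional

# Hypothetical label set for a pairwise judge; the real CreationBench labels may differ.
VERDICTS = ("A>>B", "A>B", "A=B", "B>A", "B>>A")

# The marker may appear anywhere on the line, in any letter case.
MARKER_RE = re.compile(r"FINAL VERDICT IS\s*:\s*(.*)", re.IGNORECASE)

# Matches any one of the verdict labels.
VERDICT_RE = re.compile("|".join(re.escape(v) for v in VERDICTS))


def extract_verdict(judge_output: str) -> Optional[str]:
    """Return the first verdict label that follows the marker, or None."""
    for line in judge_output.splitlines():
        # Strip Markdown bold/italics so "**FINAL VERDICT IS:**" still matches.
        cleaned = line.replace("*", "")
        marker = MARKER_RE.search(cleaned)
        if marker is None:
            continue
        # Normalise spacing and case, e.g. "[[ a > b ]]" -> "[[A>B]]".
        tail = marker.group(1).replace(" ", "").upper()
        verdict = VERDICT_RE.search(tail)
        if verdict is not None:
            return verdict.group(0)
    return None
```

For example, `extract_verdict("**FINAL VERDICT IS: [[A>B]]**")` returns `"A>B"`, and a reply without the marker returns `None`. This sketch only looks for the label on the same line as the marker; a judge that puts the verdict on the following line would need an extra look-ahead step.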

@kennymckormick (Member)

@FangXinyu-0913 Please help review & merge this PR.

@FangXinyu-0913 (Collaborator) left a comment

Approve this PR

@FangXinyu-0913 merged commit 5ff6ede into open-compass:main on Apr 15, 2025
7 checks passed
kennymckormick pushed a commit to ryf1123/VLMEvalKit that referenced this pull request Apr 24, 2025
kennymckormick added a commit that referenced this pull request Apr 30, 2025
* add vgrpbench

* remove unnecessary files

* [Improvement] Allow setting model name for lmdeploy wrapper (#913)

Signed-off-by: Isotr0py <[email protected]>

* [Minor] Add GPT-4.1

* [Fix] Fix function extract_subjective in dataset/creation.py (#911)

* creation: extract_subjective

* fix lint

* [Fix] fix LA mode in HLE

* [Fix] Fix COT Prompt BUG (#922)

* [Patch] Bypass SSL (#923)

* [Benchmark] Add PHYSICS Benchmark for Open-Ended Physics Reasoning (#931)

* add physic.py and update dataset logic

* Initial commit: integrated physics prompt eval

* fix lint

* [Fix] update get judge model logic in physics dataset

* edit the prompt in auxeval

* fix auxeval in physics

* fix lint

---------

Co-authored-by: FangXinyu-0913 <[email protected]>

* [Dataset] add support for SAIL-VL-1.5 (#926)

* Modify commit

* Rename commit

* Remove commit

* Remove commit

* Rename file

* Rename file

* Fix formatting

---------

Co-authored-by: jinfeng.km <[email protected]>
Co-authored-by: qiuyan.kk <[email protected]>

* remove unnecessary file

* [Fix] fix physics_yale with not using custom prompt in internvl series

* [Model] Support SAIL-VL-1.6 (#939)

Co-authored-by: qiuyan.kk <[email protected]>

* [Minor] More info in tqdm progress bar (#937)

* [Feature] Add vLLM support for Qwen2-VL/Qwen2.5-VL (#935)

Co-authored-by: TianhaoLiang2000 <[email protected]>

* [Benchmark] Support MMIFEval (#938)

* add mmifeval

* add req nltk

* [Fix] update url and remove unnecessary log

---------

Co-authored-by: FangXinyu-0913 <[email protected]>

* [Model] Add support for Janus-Pro-1B (#945)

add support for Janus-Pro-1B

* [Minor] Patch to fix DynaMath preprocess

* add vgrpbench

* [Fix] Fix Lint

* remove files

* add vgrpbench's format json files, and update gitignore rule

* [Benchmark] Add PHYSICS Benchmark for Open-Ended Physics Reasoning (#931)

* add physic.py and update dataset logic

* Initial commit: integrated physics prompt eval

* fix lint

* [Fix] update get judge model logic in physics dataset

* edit the prompt in auxeval

* fix auxeval in physics

* fix lint

---------

Co-authored-by: FangXinyu-0913 <[email protected]>

* [Benchmark] Add Support for Spatial457 Benchmark (CVPR 2025 Highlight) (#932)

* update spatial457

* fix format

* update readme

* update README.md

* update summarize.py

* update dataset/__init__.py

* update summarize.py

* Revert image_vqa.py

* add back spatial457

* Implement a more robust strategy for Spatial457 answer matching

---------

Co-authored-by: kennymckormick <[email protected]>
Co-authored-by: Haodong Duan <[email protected]>

* [Model] add support for Qwen2.5-Omni (#883)

* add support for qwen2.5_omni

* add support for qwen2.5_omni (only single process)

* update model cls for qwen2_5omni

* Delete VIDEO_DLC_scripts/MMSci_internvl2_8b.sh

* Delete VIDEO_DLC_scripts/video_lb_update_cu118_smol.sh

* Delete VIDEO_DLC_scripts/video_lb_update_qwen2_5_vl_7b.sh

* Delete files

* [Fix] Fix Lint

---------

Co-authored-by: Haodong Duan <[email protected]>
Co-authored-by: kennymckormick <[email protected]>

* [Benchmark] Support VisuLogic (#944)

Co-authored-by: Haodong Duan <[email protected]>

* [Minor] Support Gemini 2.5 Flash / Pro (#958)

* [Minor] Add Explicit Format Instruction for AMBER (#961)

* [Minor] Fix all_finished return null (#951)

* [Benchmark] Support CVBench (CV-Bench-2D, CV-Bench-3D) (#909)

* [Benchmark] Support CVBench, including CV-Bench-2D, CV-Bench-3D two sub tasks.

* fix(image_mcq.py): prompt error

* [Fix] Fix vllm with config (#953)

* fix use config with vllm

* fix

* update

* [Fix] Fix MM-IFEval & Custom Prompt in InternVL (#959)

* [Model] Support SenseNova-V6-Pro (#964)

* [Model] Support SenseNova-V6

* update model config

* update

* update

* update config

* [Benchmark] Support TDBench (#947)

* [Benchmark] Add TDBench for top-down images

* fix REresult symlink and index

* fix symlink

* fix lint

* [Fix] Refactor Task Launching Policy (#952)

* Update run.py

* [Refactor] Set CUDA_VISIBLE_DEVICES at the beginning

* [Minor] auto / cuda device for several VLMs

* [Doc] Update Doc

* [Minor] Update CV-Bench URL

* [Fix] Fix tmp ans load error in MM-IFEval (#969)

* Fix tmp ans load error in MM-IFEval

* Fix KeyError 0

* add vgrpbench

* [Benchmark] Add PHYSICS Benchmark for Open-Ended Physics Reasoning (#931)

* add physic.py and update dataset logic

* Initial commit: integrated physics prompt eval

* fix lint

* [Fix] update get judge model logic in physics dataset

* edit the prompt in auxeval

* fix auxeval in physics

* fix lint

---------

Co-authored-by: FangXinyu-0913 <[email protected]>

---------

Signed-off-by: Isotr0py <[email protected]>
Co-authored-by: Isotr0py <[email protected]>
Co-authored-by: kennymckormick <[email protected]>
Co-authored-by: Shengyuan Ding <[email protected]>
Co-authored-by: Xinyu Fang <[email protected]>
Co-authored-by: Haodong Duan <[email protected]>
Co-authored-by: suencgo <[email protected]>
Co-authored-by: cmatachuan <[email protected]>
Co-authored-by: jinfeng.km <[email protected]>
Co-authored-by: qiuyan.kk <[email protected]>
Co-authored-by: Xiangyu Zhao <[email protected]>
Co-authored-by: TianhaoLiang2000 <[email protected]>
Co-authored-by: TianhaoLiang2000 <[email protected]>
Co-authored-by: Jiang Li <[email protected]>
Co-authored-by: Xingrui Wang <[email protected]>
Co-authored-by: xwy-bit <[email protected]>
Co-authored-by: psp_dada <[email protected]>
Co-authored-by: MaoSong2022 <[email protected]>
Co-authored-by: Scott Zhao <[email protected]>
chengyuehuang511 pushed a commit to chengyuehuang511/VLMEvalKit that referenced this pull request May 7, 2025
Koii2k3 pushed a commit to wjnwjn59/VLMEvalKit that referenced this pull request Nov 13, 2025

3 participants