[Feature] Support G-Pass@k and LiveMathBench #1772

Merged: MaiziXiao merged 15 commits into open-compass:main from jnanliu:g-passk on Dec 30, 2024
Conversation

@jnanliu (Collaborator) commented Dec 20, 2024

Motivation

Support evaluation with the G-Pass@k metric and update the LiveMathBench configurations.

Modification

  • Implement GPassKEvaluator, which supports all reasoning tasks through the abstract preprocess, group, and reduce functions.
  • Support loading LiveMathBench from Hugging Face.
  • Implement LiveMathBenchEvaluator, which inherits from GPassKEvaluator for mathematical reasoning tasks and supports resuming evaluation from a checkpoint.
  • Modify turbomind_with_tf_above_v4_33.py to ensure that the gen_cfg of model_cfg is passed into the lmdeploy pipeline.
  • Print the gen_config of the inference backend for ease of debugging.
  • Modify openai_api.py to print the URL that caused an error for ease of debugging.
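
For readers unfamiliar with the metric, the following is a minimal sketch of how a G-Pass@k score could be computed. It assumes the hypergeometric definition from the G-Pass@k paper: given n sampled generations for a question, of which c are correct, the score is the probability that at least ceil(tau * k) of k generations drawn without replacement are correct, averaged over questions. The names g_pass_at_k and mean_g_pass_at_k are illustrative only and are not the GPassKEvaluator API added in this PR.

```python
from math import ceil, comb


def g_pass_at_k(n: int, c: int, k: int, tau: float) -> float:
    """Per-question G-Pass@k under the assumed hypergeometric definition.

    n: number of sampled generations, c: number of correct ones,
    k: draw size, tau: fraction of the k draws that must be correct.
    """
    if not 0 < k <= n:
        raise ValueError("k must satisfy 0 < k <= n")
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 0.0 <= tau <= 1.0:
        raise ValueError("tau must lie in [0, 1]")

    # At least one correct draw is always required, so tau -> 0
    # reduces to the usual pass@k (assumption based on the paper).
    min_correct = max(ceil(tau * k), 1)

    # Sum the hypergeometric probability mass for j correct draws.
    favourable = sum(
        comb(c, j) * comb(n - c, k - j)
        for j in range(min_correct, min(c, k) + 1)
    )
    return favourable / comb(n, k)


def mean_g_pass_at_k(counts, k: int, tau: float) -> float:
    """Average the per-question score over (n, c) pairs."""
    scores = [g_pass_at_k(n, c, k, tau) for n, c in counts]
    return sum(scores) / len(scores)
```

With n = 4 generations and c = 2 correct, requiring both of k = 2 draws to be correct (tau = 1.0) gives 1/6, while tau = 0.0 gives the pass@k value of 5/6.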

Checklist

Before PR:

  • Pre-commit or other linting tools are used to fix the potential lint issues.
  • Bug fixes are fully covered by unit tests; the case that causes the bug should be added to the unit tests.
  • The modification is covered by complete unit tests. If not, please add more unit tests to ensure correctness.
  • The documentation has been modified accordingly, like docstring or example tutorials.

gen_config['temperature'] = temperature
# gen_config['top_k'] = 40
# gen_config['temperature'] = temperature
pass # use the parameters passed from gen_config
Collaborator

Will this modification introduce a BC break? @MaiziXiao

Contributor

The old implementation overwrites top_k and temperature when 'do_sample', 'top_k' and 'temperature' are all set. This change ensures that 'top_k' and 'temperature' are not overwritten.

@jnanliu We should remove these lines.
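
To make the behavior change concrete, here is a small sketch contrasting the two approaches described in this thread. It is not the actual code in turbomind_with_tf_above_v4_33.py: the function names merge_old and merge_new are hypothetical, and the do_sample condition is an assumption inferred from the comment above. Only the hard-coded top_k value of 40 comes from the diff.

```python
def merge_old(gen_config: dict, do_sample: bool, temperature: float) -> dict:
    """Old behavior (assumed): sampling forces top_k and temperature."""
    merged = dict(gen_config)  # copy so the caller's dict is untouched
    if do_sample:
        merged['top_k'] = 40
        merged['temperature'] = temperature
    return merged


def merge_new(gen_config: dict, do_sample: bool, temperature: float) -> dict:
    """New behavior (assumed): values from gen_config pass through as-is."""
    return dict(gen_config)


user_cfg = {'do_sample': True, 'top_k': 1, 'temperature': 0.3}

# The old path silently replaces the user's top_k and temperature.
old_result = merge_old(user_cfg, do_sample=True, temperature=1.0)

# The new path keeps whatever the user configured.
new_result = merge_new(user_cfg, do_sample=True, temperature=1.0)
```

Configurations that relied on the implicit top_k = 40 would need to set it explicitly under the new behavior, which is the compatibility concern raised above.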

Comment thread opencompass/configs/datasets/livemathbench/livemathbench_gen_9befbf.py Outdated
Comment thread opencompass/models/turbomind_with_tf_above_v4_33.py Outdated
@tonysy (Collaborator) left a comment

LGTM

@MaiziXiao MaiziXiao merged commit 8e8d4f1 into open-compass:main Dec 30, 2024
stephen-nju pushed a commit to stephen-nju/opencompass that referenced this pull request May 14, 2025
* support G-Pass@k and livemathbench

* fix bugs

* fix comments of GPassKEvaluator

* update saved details of GPassKEvaluator

* update saved details of GPassKEvaluator

* fix eval api configs & update openai_api for ease of debugging

* update huggingface path

* fix method name of G-Pass@k

* fix default value of eval_model_name

* refactor G-Pass@k evaluator

* log generation params for each backend

* fix evaluation resume

* add notimplementerror
zyc140345 pushed a commit to zyc140345/opencompass that referenced this pull request Oct 23, 2025
iamkaia pushed a commit to iamkaia/opencompass that referenced this pull request Feb 4, 2026