[Feature] Support G-Pass@k and LiveMathBench by jnanliu · Pull Request #1772 · open-compass/opencompass

jnanliu · 2024-12-20T10:47:49Z

Motivation

Support evaluation with the G-Pass@k metric and update the LiveMathBench configurations.
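For reference, here is a sketch of the metric. It follows our understanding of the definition in the LiveMathBench paper. The PR itself does not restate the formula, so treat this as an assumption. For each question, n generations are sampled and c of them are correct. The threshold τ ∈ (0, 1] is the fraction of k draws, taken without replacement, that must be correct:

```latex
\text{G-Pass@}k_{\tau}
  = \mathbb{E}_{\text{questions}}\!\left[
      \sum_{j=\lceil \tau k \rceil}^{\min(c,\,k)}
      \frac{\binom{c}{j}\binom{n-c}{k-j}}{\binom{n}{k}}
    \right],
\qquad
\text{mG-Pass@}k = \frac{2}{k}\sum_{i=\lceil k/2 \rceil + 1}^{k} \text{G-Pass@}k_{i/k}
```

When ⌈τk⌉ = 1, the inner sum equals 1 − C(n−c, k)/C(n, k), which is ordinary Pass@k. When τ = 1, it reduces to C(c, k)/C(n, k), meaning all k draws must be correct.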

Modification

Implement GPassKEvaluator, which can support any reasoning task through abstract preprocess, group, and reduce functions.
Support loading LiveMathBench from Hugging Face.
Implement LiveMathBenchEvaluator, which inherits from GPassKEvaluator, for mathematical reasoning tasks, with support for resuming evaluation from a checkpoint.
Modify turbomind_with_tf_above_v4_33.py to ensure that the gen_cfg of model_cfg is passed into the lmdeploy pipeline.
Print the gen_config of the inference backend to ease debugging.
Modify openai_api.py to print the URL of failed requests to ease debugging.
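The following is a minimal, self-contained sketch of how a preprocess/group/reduce split could compute G-Pass@k. The names `g_pass_at_k`, `GPassKSketch`, and `ExactMatchSketch` are hypothetical and are not the OpenCompass `GPassKEvaluator` API. Exact-match judging stands in for a real answer checker. In this sketch only `preprocess` is abstract; the PR does not show how the real evaluator divides the work.

```python
from abc import ABC, abstractmethod
from collections import defaultdict
from math import ceil, comb
from typing import Dict, List, Sequence


def g_pass_at_k(n: int, c: int, k: int, tau: float) -> float:
    """P(at least ceil(tau * k) of k draws without replacement are correct),
    given n generations of which c are correct (hypergeometric tail)."""
    if not 0 < k <= n:
        raise ValueError("need 0 < k <= n")
    # round() guards against floating-point noise in tau * k;
    # max(1, ...) makes a tiny tau fall back to ordinary Pass@k.
    m = max(1, ceil(round(tau * k, 9)))
    hits = sum(comb(c, j) * comb(n - c, k - j) for j in range(m, min(c, k) + 1))
    return hits / comb(n, k)


class GPassKSketch(ABC):
    """Hypothetical illustration of the preprocess/group/reduce split."""

    def __init__(self, k: int, thresholds: Sequence[float]):
        self.k = k
        self.thresholds = list(thresholds)

    @abstractmethod
    def preprocess(self, prediction: str, reference: str) -> bool:
        """Task-specific judgement: is a single generation correct?"""

    def group(self, question_ids: Sequence[str], predictions: Sequence[str],
              references: Sequence[str]) -> Dict[str, List[bool]]:
        # Collect the n repeated generations of each question under its id.
        groups: Dict[str, List[bool]] = defaultdict(list)
        for qid, pred, ref in zip(question_ids, predictions, references):
            groups[qid].append(self.preprocess(pred, ref))
        return dict(groups)

    def reduce(self, groups: Dict[str, List[bool]]) -> Dict[str, float]:
        # Average the per-question G-Pass@k_tau over questions, per threshold.
        if not groups:
            return {}
        scores = {}
        for tau in self.thresholds:
            per_q = [g_pass_at_k(len(v), sum(v), self.k, tau) for v in groups.values()]
            scores[f"G-Pass@{self.k}_{tau}"] = sum(per_q) / len(per_q)
        return scores

    def score(self, question_ids, predictions, references) -> Dict[str, float]:
        return self.reduce(self.group(question_ids, predictions, references))


class ExactMatchSketch(GPassKSketch):
    def preprocess(self, prediction: str, reference: str) -> bool:
        return prediction.strip() == reference.strip()


if __name__ == "__main__":
    ev = ExactMatchSketch(k=2, thresholds=[0.5, 1.0])
    ids = ["q1"] * 4 + ["q2"] * 4  # n = 4 generations per question
    preds = ["4", "4", "5", "4", "x", "7", "7", "y"]
    refs = ["4"] * 4 + ["7"] * 4
    print(ev.score(ids, preds, refs))
```

Keeping `group` and `reduce` generic means only the per-sample judgement changes between tasks. A math-specific subclass would override `preprocess` with answer extraction and equivalence checking.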

Checklist

Before PR:

Pre-commit or other linting tools are used to fix potential lint issues.
Bug fixes are fully covered by unit tests; the case that caused the bug should be added to the unit tests.
The modification is covered by complete unit tests. If not, please add more unit tests to ensure correctness.
The documentation has been modified accordingly, such as docstrings or example tutorials.

tonysy · 2024-12-23T09:21:34Z

-            gen_config['temperature'] = temperature
+            # gen_config['top_k'] = 40
+            # gen_config['temperature'] = temperature
+            pass # use the parameters passed from gen_config


Will this modification introduce a backward-compatibility (BC) break? @MaiziXiao

The old implementation overwrites top_k and temperature when 'do_sample', 'top_k', and 'temperature' are all set. This change ensures that 'top_k' and 'temperature' are no longer overwritten.
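The following hypothetical sketch makes the BC question concrete by contrasting the two behaviours described in the reply above. Several details are assumptions reconstructed from the diff and the reply, not the actual code in turbomind_with_tf_above_v4_33.py: the function names, the separate `temperature` argument, and the exact trigger condition.

```python
from typing import Any, Dict


def resolve_gen_config_old(gen_config: Dict[str, Any], temperature: float) -> Dict[str, Any]:
    """Hypothetical pre-PR behaviour: when do_sample, top_k and temperature
    are all set, top_k and temperature are overwritten."""
    cfg = dict(gen_config)  # copy so the caller's config is not mutated
    if cfg.get('do_sample') and 'top_k' in cfg and 'temperature' in cfg:
        cfg['top_k'] = 40  # hard-coded value taken from the diff
        cfg['temperature'] = temperature
    return cfg


def resolve_gen_config_new(gen_config: Dict[str, Any]) -> Dict[str, Any]:
    """Post-PR behaviour: use the parameters passed in gen_config unchanged."""
    return dict(gen_config)


if __name__ == "__main__":
    user_cfg = {'do_sample': True, 'top_k': 20, 'temperature': 0.7}
    print(resolve_gen_config_old(user_cfg, temperature=1.0))
    print(resolve_gen_config_new(user_cfg))
```

Under this reading, a config that sets all three keys previously ran with top_k=40 regardless of its own value, so its results can change after the PR. In the sketch, a config missing any of the three keys comes out the same from both functions.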

@jnanliu We should remove these lines.

tonysy

LGTM

* support G-Pass@k and livemathbench
* fix bugs
* fix comments of GPassKEvaluator
* update saved details of GPassKEvaluator
* update saved details of GPassKEvaluator
* fix eval api configs & update openai_api for ease of debugging
* update huggingface path
* fix method name of G-Pass@k
* fix default value of eval_model_name
* refactor G-Pass@k evaluator
* log generation params for each backend
* fix evaluation resume
* add notimplementerror

jnanliu added 2 commits December 20, 2024 05:27

support G-Pass@k and livemathbench

eeb76eb

fix bugs

b856865

mm-assistant bot assigned tonysy Dec 20, 2024

jnanliu temporarily deployed to prod December 20, 2024 10:48 — with GitHub Actions Inactive

fix comments of GPassKEvaluator

f19b798

jnanliu had a problem deploying to prod December 20, 2024 16:07 — with GitHub Actions Error

jnanliu added 2 commits December 20, 2024 17:03

update saved details of GPassKEvaluator

66cad93

update saved details of GPassKEvaluator

0a8807f

jnanliu temporarily deployed to prod December 20, 2024 17:06 — with GitHub Actions Inactive

jnanliu added 2 commits December 21, 2024 19:43

fix eval api configs & update openai_api for ease of debugging

3fdc500

Merge branch 'main' into g-passk

6ca63ca

jnanliu temporarily deployed to prod December 21, 2024 19:50 — with GitHub Actions Inactive

jnanliu added 2 commits December 23, 2024 03:36

update huggingface path

f0e2edd

Merge branch 'g-passk' of github.com:jnanliu/opencompass into g-passk

98983e6

jnanliu temporarily deployed to prod December 23, 2024 03:38 — with GitHub Actions Inactive

tonysy requested changes Dec 23, 2024

fix method name of G-Pass@k

dfbe983

jnanliu temporarily deployed to prod December 23, 2024 09:30 — with GitHub Actions Inactive

tonysy requested a review from liushz December 23, 2024 12:04

fix default value of eval_model_name

1dd6b77

jnanliu temporarily deployed to prod December 23, 2024 12:37 — with GitHub Actions Inactive

liushz reviewed Dec 25, 2024

Comment thread on opencompass/models/turbomind_with_tf_above_v4_33.py (outdated)

refactor G-Pass@k evaluator

8280c11

jnanliu had a problem deploying to prod December 25, 2024 09:40 — with GitHub Actions Failure

log generation params for each backend

ab8cb95

jnanliu temporarily deployed to prod December 25, 2024 10:03 — with GitHub Actions Inactive

fix evaluation resume

3509a26

jnanliu temporarily deployed to prod December 26, 2024 12:57 — with GitHub Actions Inactive

tonysy approved these changes Dec 27, 2024

add notimplementerror

bcc74fd

jnanliu temporarily deployed to prod December 27, 2024 12:36 — with GitHub Actions Inactive

MaiziXiao merged commit 8e8d4f1 into open-compass:main Dec 30, 2024

MaiziXiao merged 15 commits into open-compass:main from jnanliu:g-passk

tonysy Dec 23, 2024

MaiziXiao Dec 25, 2024

tonysy left a comment

4 participants
