[Dataset] Add SeedBench Dataset #2020
Merged
Myhs-phz merged 22 commits into open-compass:main from ChenZiHong-Gavin:SeedBench on Sep 3, 2025
Contributor: cc @tonysy
Pull Request Overview
This PR adds a new domain-specific benchmark dataset, SeedBench, for evaluating LLMs in seed science and breeding.
- Introduces the SeedBenchDataset and multiple evaluators (F1ScoreEvaluator, AverageRougeScoreEvaluator, AccScoreStr_Evaluator) in opencompass/datasets/SeedBench.py.
- Adds a new dataset configuration along with corresponding documentation and metadata updates in datasets_info.py, dataset-index.yml, and configs/datasets/SeedBench/.
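The overview names `F1ScoreEvaluator` but does not show how it scores. Below is a minimal sketch of one plausible approach: set-based F1 over option letters for multiple-answer questions. It assumes predictions and references are already-extracted option strings such as `"AC"`. The names `option_f1` and `F1ScoreSketch` are hypothetical, and the 0-100 scaling is an assumed reporting convention; this is not the PR's implementation.

```python
from typing import Dict, List


def option_f1(prediction: str, reference: str) -> float:
    """Set-based F1 between predicted and reference option letters.

    Hypothetical helper: assumes both strings contain only the extracted
    option letters (e.g. "AC"), so every alphabetic character is an option.
    """
    pred = {ch for ch in prediction.upper() if ch.isalpha()}
    ref = {ch for ch in reference.upper() if ch.isalpha()}
    if not pred and not ref:
        return 1.0  # both empty: treat as a perfect match
    overlap = len(pred & ref)
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


class F1ScoreSketch:
    """Hypothetical evaluator exposing score(predictions, references)."""

    def score(self, predictions: List[str], references: List[str]) -> Dict[str, float]:
        if len(predictions) != len(references):
            raise ValueError("predictions and references must have the same length")
        if not references:
            return {"f1": 0.0}
        total = sum(option_f1(p, r) for p, r in zip(predictions, references))
        # Mean per-sample F1, scaled to 0-100 (assumed reporting convention).
        return {"f1": 100 * total / len(references)}
```

For example, `F1ScoreSketch().score(["AC", "B"], ["A", "B"])` averages a partial match (F1 of 2/3) with an exact match, giving an `f1` of about 83.33. Set-based F1 gives partial credit for multi-select answers, which plain accuracy would not.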
Reviewed Changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| opencompass/utils/datasets_info.py | Registers SeedBench metadata in dataset info |
| `opencompass/datasets/__init__.py` | Imports the new SeedBench dataset module |
| opencompass/datasets/SeedBench.py | Implements SeedBenchDataset and its evaluators |
| opencompass/configs/datasets/SeedBench/seedbench_gen_5d5ea1.py | Provides configuration for SeedBench evaluation |
| opencompass/configs/datasets/SeedBench/seedbench_gen.py | Reads base configuration for SeedBench datasets |
| opencompass/configs/datasets/SeedBench/README.md | Documents the SeedBench dataset details |
| dataset-index.yml | Adds SeedBench entry for dataset indexing |
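The table notes that `SeedBench.py` implements the evaluators, including `AverageRougeScoreEvaluator`. The PR page does not say which ROUGE variant or tokenization it uses. The sketch below therefore assumes character-level ROUGE-L F1 averaged over samples; character tokenization suits text written without spaces, such as Chinese. `lcs_length`, `rouge_l_f1` and `average_rouge_l` are hypothetical names, not the PR's API.

```python
from typing import List, Sequence


def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Longest common subsequence length via row-by-row dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            if x == y:
                curr[j] = prev[j - 1] + 1
            else:
                curr[j] = max(prev[j], curr[j - 1])
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate: str, reference: str) -> float:
    """Character-level ROUGE-L F1 (beta = 1), ignoring whitespace (assumed tokenization)."""
    cand = [ch for ch in candidate if not ch.isspace()]
    ref = [ch for ch in reference if not ch.isspace()]
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


def average_rouge_l(predictions: List[str], references: List[str]) -> float:
    """Mean per-sample ROUGE-L F1, scaled to 0-100 (assumed convention)."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    scores = [rouge_l_f1(p, r) for p, r in zip(predictions, references)]
    return 100 * sum(scores) / len(scores)
```

ROUGE-L rewards the longest in-order overlap without requiring contiguous matches, which makes it tolerant of paraphrased free-text answers. A real evaluator may instead use word-level tokens or a different ROUGE variant.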
Comments suppressed due to low confidence (2)
opencompass/datasets/SeedBench.py:305
- [nitpick] The evaluator class name 'AccScoreStr_Evaluator' is inconsistent with its base class naming convention. Consider renaming it to 'AccScoreStrEvaluator' to maintain clarity and consistency.
class AccScoreStr_Evaluator(AccScoreStrEvaluator):
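The nitpick above concerns PEP 8's CapWords convention for class names. Note that the suggested name, `AccScoreStrEvaluator`, is already used by the base class in the quoted line, so a rename would need a different name. Based only on the class name, the sketch below assumes the evaluator computes exact-match accuracy on normalized answer strings. `SeedBenchAccStrEvaluator` is a hypothetical CapWords name, and the module-level alias shows one way to keep the old name importable. The sketch is self-contained and does not subclass the real base class.

```python
from typing import Dict, List


def normalize(text: str) -> str:
    """Trim surrounding whitespace and casefold (assumed normalization)."""
    return text.strip().casefold()


class SeedBenchAccStrEvaluator:
    """Hypothetical exact-match string accuracy evaluator with a CapWords name."""

    def score(self, predictions: List[str], references: List[str]) -> Dict[str, float]:
        if len(predictions) != len(references):
            raise ValueError("predictions and references must have the same length")
        if not references:
            return {"accuracy": 0.0}
        correct = sum(
            normalize(p) == normalize(r) for p, r in zip(predictions, references)
        )
        # Percentage of exact matches (0-100, assumed reporting convention).
        return {"accuracy": 100 * correct / len(references)}


# Backward-compatible alias so code that still uses the old name keeps working.
AccScoreStr_Evaluator = SeedBenchAccStrEvaluator
```

Keeping an alias lets the rename land without breaking configs that reference the old class name.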
opencompass/utils/datasets_info.py:233
- [nitpick] The dataset key 'opencompass/seedbench' uses lowercase while the corresponding module file is named 'SeedBench.py'. Ensure consistent casing across modules and identifiers to avoid potential issues on case-sensitive systems.
"opencompass/seedbench": {
Collaborator: Please fix the lint issue.
Contributor (Author): The lint issue has been fixed.
Contributor: @MaiziXiao @Myhs-phz @bittersweet1999 pls review.
zyc140345 pushed a commit to zyc140345/opencompass that referenced this pull request on Oct 23, 2025:

* [Dataset] Add SeedBench Dataset
* docs: add README for SeedBench
* refactor: delete unnecessary comment
* fix: fix load function for SeedBenchDataset
* fix: delete unnecessary code
* fix: fix typo
* fix: fix lint problem
* docs: update summary of SeedBench
* docs: add paper link
* Update dataset-index.yml

---------

Co-authored-by: Songyang Zhang
Co-authored-by: Myhs_phz
iamkaia pushed a commit to iamkaia/opencompass that referenced this pull request on Feb 4, 2026, with the same commit message as the commit above.
Motivation
This PR introduces a new domain-specific benchmark dataset, SeedBench, which is the first multi-task benchmark designed to evaluate large language models (LLMs) in seed science, focusing on seed breeding.
Modification
Added a new dataset class, SeedBenchDataset, and implemented metrics such as F1ScoreEvaluator in opencompass/datasets/SeedBench.py.
Added the configuration files seedbench_gen_5d5ea1.py and seedbench_gen.py, plus README.md, in configs/datasets/SeedBench/.
Registered the dataset in `opencompass/datasets/__init__.py`.
Updated datasets_info.py with dataset metadata.
Updated dataset-index.yml with dataset metadata.
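In OpenCompass, a `*_gen.py` file such as `seedbench_gen.py` usually just re-exports a concrete, hash-suffixed config through MMEngine's `read_base`. Below is a minimal sketch of that pattern. It assumes the concrete file defines a list named `seedbench_datasets`; that variable name is an assumption, so check `seedbench_gen_5d5ea1.py` for the real one. This is a config fragment meant to be parsed by OpenCompass, not run directly.

```python
# opencompass/configs/datasets/SeedBench/seedbench_gen.py (sketch)
from mmengine.config import read_base

with read_base():
    # Re-export the dataset list defined in the concrete config.
    # `seedbench_datasets` is an assumed variable name.
    from .seedbench_gen_5d5ea1 import seedbench_datasets  # noqa: F401
```

This indirection lets users reference the stable name `seedbench_gen` while the hash-suffixed file pins the exact prompt and evaluation settings.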
BC-breaking (Optional)
No backward-compatibility-breaking changes are introduced.
Use cases (Optional)
SeedBench assesses LLMs across three core seed breeding stages:
Built with domain experts, SeedBench features 2,264 expert-validated questions across 11 task types and 10 subcategories, initially targeting rice breeding. Future updates will include other crops like maize, soybean, and wheat.
Following the instructions, models can be evaluated on SeedBench with:
Checklist
Before PR:
After PR: