StepMathAgent

StepMathAgent is a novel mathematical process evaluation agent based on Tree-of-Error. It incorporates four internal core operations (logical step segmentation, step scoring, score aggregation, and error tree generation) and four external extension modules (difficulty calibration, simplicity evaluation, completeness validation, and format assessment).
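The four core operations can be pictured as a small pipeline. The following is a minimal Python sketch under stated assumptions, not the repository's implementation. It assumes that steps are split one per line and that each step depends only on the step before it. It also assumes that `score_step` is a caller-supplied scorer returning values in [0, 1], that aggregation is a plain mean, and that the "error tree" maps each step scored below 0.5 to the later steps that directly depend on it. All function names (`segment_steps`, `aggregate`, `build_error_tree`, `evaluate`, `toy_scorer`) are hypothetical.

```python
import operator
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Step:
    index: int
    text: str
    depends_on: List[int] = field(default_factory=list)  # earlier steps this step builds on


def segment_steps(solution: str) -> List[Step]:
    # Assumption: one non-empty line per logical step, each depending on the step before it.
    lines = [ln.strip() for ln in solution.splitlines() if ln.strip()]
    return [Step(i, text, [i - 1] if i > 0 else []) for i, text in enumerate(lines)]


def aggregate(scores: List[float]) -> float:
    # Assumption: the overall score is the mean step score (0.0 for an empty solution).
    return sum(scores) / len(scores) if scores else 0.0


def build_error_tree(steps: List[Step], scores: List[float],
                     threshold: float = 0.5) -> Dict[int, List[int]]:
    """Map each erroneous step (score < threshold) to the later steps that depend on it."""
    wrong = {s.index for s in steps if scores[s.index] < threshold}
    tree: Dict[int, List[int]] = {i: [] for i in sorted(wrong)}
    for s in steps:
        for parent in s.depends_on:
            if parent in wrong:
                tree[parent].append(s.index)
    return tree


def evaluate(solution: str, score_step: Callable[[str], float]) -> dict:
    steps = segment_steps(solution)
    scores = [score_step(s.text) for s in steps]
    return {
        "step_scores": scores,
        "total": aggregate(scores),
        "error_tree": build_error_tree(steps, scores),
    }


# Toy scorer for "a op b = c" lines only; a real evaluator would need something far more robust.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def toy_scorer(step: str) -> float:
    left, right = step.split("=")
    a, op, b = left.split()
    return 1.0 if OPS[op](float(a), float(b)) == float(right) else 0.0


if __name__ == "__main__":
    result = evaluate("2 + 2 = 4\n4 * 3 = 13\n13 - 1 = 12", toy_scorer)
    print(result)
```

In the example, step 1 (`4 * 3 = 13`) is the only error, so the tree is `{1: [2]}`. Step 2 is attached beneath the faulty step because it builds on that step's result, even though its own arithmetic is correct.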

We also introduce StepMathBench, a benchmark comprising 1,000 step-divided process evaluation instances, derived from 200 high-quality math problems grouped by problem type, subject category and difficulty level.

StepMathBench

Mathematical problems: data/question.jsonl

StepMath: data/stepmath.jsonl
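Judging by the .jsonl extension, both files use the JSON Lines format, with one JSON object per line. With 1,000 instances drawn from 200 problems, each problem contributes five instances on average. The sketch below loads either file and tallies records by a chosen field. The README does not document the schema, so the field name `difficulty` and the helper names `parse_jsonl`, `read_jsonl` and `count_by` are assumptions for illustration.

```python
import json
from collections import Counter
from pathlib import Path
from typing import Any, Dict, Iterable, Iterator


def parse_jsonl(lines: Iterable[str]) -> Iterator[Dict[str, Any]]:
    """Yield one JSON object per non-empty line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def read_jsonl(path: Path) -> Iterator[Dict[str, Any]]:
    with open(path, encoding="utf-8") as f:
        yield from parse_jsonl(f)


def count_by(records: Iterable[Dict[str, Any]], key: str) -> Counter:
    # `key` is a field name you supply; records without it are counted as "<missing>".
    return Counter(r.get(key, "<missing>") for r in records)


if __name__ == "__main__":
    for path in (Path("data/question.jsonl"), Path("data/stepmath.jsonl")):
        if path.exists():
            records = list(read_jsonl(path))
            print(path, len(records))
            # "difficulty" is an assumed field name; inspect the files for the real keys.
            print(count_by(records, "difficulty"))
```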

Quick Start

Use StepMathAgent

python agent.py

Use Four External Extension Modules in StepMathAgent

# Modify model_name in agent.py
python agent.py
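The README does not say how the four external extension modules change a score. To illustrate one plausible design, the sketch below treats each module as a function that takes the current score and a context dict and returns an adjusted score. Whichever modules are enabled are applied in sequence. Every rule and threshold is a hypothetical placeholder, as are the context fields (`difficulty`, `num_steps`, `reference_steps`, `complete`, `has_final_answer`). None of them describes StepMathAgent's actual behavior.

```python
from typing import Callable, Dict, List

# Each module maps (current score, context) -> adjusted score. All rules are placeholders.
Module = Callable[[float, dict], float]


def clamp(x: float) -> float:
    return max(0.0, min(1.0, x))


def difficulty_calibration(score: float, ctx: dict) -> float:
    # Toy rule: a small bonus for hard problems (assumed, not from the paper).
    return score + 0.05 if ctx.get("difficulty") == "hard" else score


def simplicity_evaluation(score: float, ctx: dict) -> float:
    # Toy rule: penalize each step beyond the reference solution's length.
    extra = max(0, ctx.get("num_steps", 0) - ctx.get("reference_steps", 0))
    return score - 0.01 * extra


def completeness_validation(score: float, ctx: dict) -> float:
    # Toy rule: cap the score of an unfinished solution at 0.5.
    return score if ctx.get("complete", True) else min(score, 0.5)


def format_assessment(score: float, ctx: dict) -> float:
    # Toy rule: deduct 0.1 when no clearly marked final answer is present.
    return score if ctx.get("has_final_answer", True) else score - 0.1


MODULES: Dict[str, Module] = {
    "difficulty": difficulty_calibration,
    "simplicity": simplicity_evaluation,
    "completeness": completeness_validation,
    "format": format_assessment,
}


def apply_modules(score: float, ctx: dict, enabled: List[str]) -> float:
    """Apply the enabled modules in order, then clamp the final result to [0, 1]."""
    for name in enabled:
        score = MODULES[name](score, ctx)
    return clamp(score)
```

Keeping each module as an independent function in a dict makes it easy to switch any subset on or off. This mirrors how the README frames them as optional extensions to the core operations.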

Run Baseline

# Modify api_name and model_name in agent.py
python agent.py

Evaluation

python evaluator.py
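The README does not list the metrics that evaluator.py reports. For illustration only, two common ways to compare an evaluator's scores against reference scores are mean absolute error and Pearson correlation. The sketch below implements both with the standard library. The helper names are hypothetical, and nothing here claims that evaluator.py uses these metrics.

```python
import math
from typing import Sequence


def mean_absolute_error(pred: Sequence[float], gold: Sequence[float]) -> float:
    if len(pred) != len(gold) or not pred:
        raise ValueError("pred and gold must be non-empty and the same length")
    return sum(abs(p - g) for p, g in zip(pred, gold)) / len(pred)


def pearson(pred: Sequence[float], gold: Sequence[float]) -> float:
    if len(pred) != len(gold) or len(pred) < 2:
        raise ValueError("need at least two paired scores")
    mp = sum(pred) / len(pred)
    mg = sum(gold) / len(gold)
    cov = sum((p - mp) * (g - mg) for p, g in zip(pred, gold))
    sp = math.sqrt(sum((p - mp) ** 2 for p in pred))
    sg = math.sqrt(sum((g - mg) ** 2 for g in gold))
    if sp == 0 or sg == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / (sp * sg)
```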

Case Study

Citation

If you find this work useful in your research, please consider citing:

@misc{yang2025stepmathagentstepwiseagentevaluating,
      title={StepMathAgent: A Step-Wise Agent for Evaluating Mathematical Processes through Tree-of-Error}, 
      author={Shu-Xun Yang and Cunxiang Wang and Yidong Wang and Xiaotao Gu and Minlie Huang and Jie Tang},
      year={2025},
      eprint={2503.10105},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2503.10105}, 
}

