Skip to content

boduan1/HAGeo

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

9 Commits
 
 
 
 

Repository files navigation

Gold-Medal-Level Olympiad Geometry Solving with Efficient Heuristic Auxiliary Constructions

[🤗 Benchmark][📜 Paper][🐱 GitHub][🐦 Twitter][📕 Rednote]

Repo for "Gold-Medal-Level Olympiad Geometry Solving with Efficient Heuristic Auxiliary Constructions"


Figure 1: Overview of the HAGeo method. First, the DDAR engine deduces new statements in the problem. If the DDAR does not solve the problem, our heuristic-based strategy gives additional attempts for adds auxiliary constructions to help solve the problem and re-runs the DDAR.

🔥 News

  • [2025/11/27] Our full code is under review by Microsoft and will be released upon approval.
  • [2025/11/27] The HAGeo paper, repo, and the HAGeo-409 benchmark are all released.
  • [2025/11/27] HAGeo is the first framework to achieve gold-medal-level human performance on IMO-level geometry problem solving without using any neural models for inference.

💡 Introduction

In the HAGeo framework (Figure. 1), a geometry problem is first converted into our geometry-specific language and then processed to encode all its properties into a deduction graph. The Deduction Database (DD) and Algebraic Reasoning (AR) engines expand the deduction graph through a brute-force search over all deduction rules and by performing algebraic deductions involving equations of length, ratio, or angle. If the DDAR fails to solve the problem, we further attempt K different auxiliary constructions based on our heuristic strategies, and the DDAR engine is rerun for each attempt.

📏 HAGeo-409 Benchmark

We create the HAGeo-409 benchmark, which comprises 409 IMO-level geometry theorem-proving problems and typically presents greater difficulty than the widely used IMO-30 benchmark.


Figure 2: Problem difficulity distribution of the IMO-30 benchmark and our new HAGeo-409 benchmark.

📊 Evaluation

Table 1. Results on the HAGeo-409 benchmark across different difficulty levels.

Level Count AlphaGeometry
16-64-8
Random
@2048
HAGeo
@2048
Random
@8192
HAGeo
@8192
1–3 161 118 (73.3%) 127 (78.9%) 141 (87.6%) 128 (79.5%) 149 (92.5%)
3–4 112 44 (39.3%) 62 (55.4%) 87 (77.7%) 69 (61.6%) 93 (83.0%)
4–5 71 13 (18.3%) 13 (18.3%) 29 (40.8%) 18 (25.4%) 36 (50.7%)
5–6 43 2 (4.7%) 2 (4.7%) 5 (11.6%) 3 (7.0%) 7 (16.3%)
6–7 22 0 (0.0%) 0 (0.0%) 1 (4.5%) 0 (0.0%) 2 (9.1%)
Total 409 177 (43.3%) 204 (49.9%) 263 (64.3%) 218 (53.3%) 287 (70.2%)

Table 2. Comparison on the IMO-30 benchmark

Method IMO-30
DDAR 15
AlphaGeometry 24*
Random Auxiliary Points + DDAR 25
HAGeo 28

☕️ Citation

If you find this repository or our benchmark helpful, please consider citing our paper:

@misc{duan2025goldmedallevelolympiadgeometrysolving,
      title={Gold-Medal-Level Olympiad Geometry Solving with Efficient Heuristic Auxiliary Constructions}, 
      author={Boyan Duan and Xiao Liang and Shuai Lu and Yaoxiang Wang and Yelong Shen and Kai-Wei Chang and Ying Nian Wu and Mao Yang and Weizhu Chen and Yeyun Gong},
      year={2025},
      eprint={2512.00097},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2512.00097}, 
}

🌟 Star History

Star History Chart

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published