A Claude Code harness for solving rigorous math problems with agentic AI.
This repo bundles:
- A knowledge skill (`skills/formal-math-ai/`) covering real & functional analysis, measure theory, operator theory, computer vision, GPU programming, generative AI, and formal verification, backed by a 334-chunk corpus from Lebl's Basic Analysis I.
- A subagent roster (`.claude/agents/`) with seven specialized math agents.
- Slash commands (`.claude/commands/`) that orchestrate those agents into a Generator–Critic loop.
- Python helpers (`harness/`) for symbolic verification (SymPy), Lean skeleton emission, and benchmark runs.
- A small benchmark suite (`benchmarks/`) of Putnam, IMO, and textbook problems.
```bash
git clone https://github.com/lonexreb/claude-code-math-skills.git
cd claude-code-math-skills
python3 -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt

# Open in Claude Code
claude .
```

Once Claude Code is running in this directory, the agents and commands load automatically. Try:

```
/math-solve Prove that every Cauchy sequence in ℝ converges.
```
When the user poses a math problem, the orchestrator routes through agents instead of answering from memory:
```
        ┌──────────────────┐
        │   user problem   │
        └────────┬─────────┘
                 │
                 ▼
        ┌──────────────────┐
        │ math-strategist  │  domain classification + 2–4 candidate strategies
        └────────┬─────────┘
                 │
            ┌────┴────────────────────┐
            ▼                         ▼
┌───────────────────────┐   ┌───────────────────┐
│ counterexample-hunter │   │ numeric-verifier  │  (in parallel; cheap falsification)
└───────────┬───────────┘   └─────────┬─────────┘
            │                         │
            └────────────┬────────────┘
                         ▼
            ┌──────────────────────────┐
            │ symbol-runner (optional) │  SymPy precomputation
            └────────────┬─────────────┘
                         ▼
            ┌──────────────────────────┐
            │ math-prover              │  generates the proof
            └────────────┬─────────────┘
                         ▼
            ┌──────────────────────────┐
            │ math-critic              │ ◀──┐  adversarial review
            └────────────┬─────────────┘    │
                         │                  │  revise
                         ├──────────────────┘
                         ▼  (APPROVE)
           ┌────────────────────────────┐
           │ math-formalizer (optional) │  Lean 4 / mathlib skeleton
           └────────────────────────────┘
```
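The critic loop at the bottom of the diagram is a bounded generate–review cycle: the prover drafts, the critic either approves or returns findings, and the prover revises until approval or a round cap. A minimal sketch of that control flow (the `run_agent` dispatcher, the verdict shape, and the round cap are illustrative stand-ins, not the harness API):

```python
from typing import Any

MAX_CRITIC_ROUNDS = 3  # illustrative cap, not a harness constant

def run_agent(name: str, *context: Any, **kwargs: Any) -> Any:
    """Stand-in for dispatching a subagent. The real harness routes
    through Claude Code's subagent mechanism, not a Python call."""
    raise NotImplementedError

def solve(problem: str) -> str:
    strategy = run_agent("math-strategist", problem)
    proof = run_agent("math-prover", problem, strategy)
    for _ in range(MAX_CRITIC_ROUNDS):
        review = run_agent("math-critic", problem, proof)
        if review["verdict"] == "APPROVE":
            break
        proof = run_agent("math-prover", problem, strategy, feedback=review)
    return proof
```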
| Agent | Model | Responsibility |
|---|---|---|
| `math-strategist` | opus | Domain classification, 2–4 candidate strategies, recommended path. |
| `math-prover` | opus | Rigorous proof generation with named theorem citations. |
| `math-counterexample-hunter` | sonnet | Adversarial: tries to falsify the claim with edge cases / pathologies / hypothesis-drop sweep. |
| `math-numeric-verifier` | haiku | Fast SymPy/numpy smoke test for claims with concrete numerical content (see the sketch after this table). |
| `math-critic` | opus | Independent adversarial review of a written proof; severity-classified findings. |
| `math-formalizer` | opus | Translates an approved proof into a Lean 4 / mathlib skeleton. |
| `math-symbol-runner` | sonnet | SymPy step-by-step algebra for derivatives, integrals, identities. |
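To make the numeric-verifier's role concrete, here is the kind of cheap SymPy smoke test it runs before the prover is ever invoked (a sketch in the spirit of `harness/sympy_check.py`, not its actual interface):

```python
import random

import sympy as sp

x = sp.symbols("x")

# Concrete claim to smoke-test: d/dx sin^2(x) = sin(2x).
lhs = sp.diff(sp.sin(x) ** 2, x)   # 2*sin(x)*cos(x)
rhs = sp.sin(2 * x)

# Symbolic check: the difference simplifies to zero ...
assert sp.simplify(lhs - rhs) == 0

# ... backed by a random numeric spot-check.
t = random.uniform(-10.0, 10.0)
assert abs(float((lhs - rhs).subs(x, t))) < 1e-9
```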
| Command | Purpose |
|---|---|
| `/math-solve <problem>` | End-to-end pipeline: strategy → falsification sweep → proof → critic loop → optional formalization. |
| `/math-prove <claim>` | Lighter pipeline for when the user is already committed to the claim being true. |
| `/math-disprove <claim>` | Counterexample-first pipeline. |
| `/math-check <proof>` | Audit-only pass via the critic. |
| `/math-formalize <proof>` | Lean 4 skeleton emission (output shape sketched below this table). |
| `/math-bench <id\|all>` | Run the harness on benchmark problems and record reproducible reports. |
| `/math-solve-deep <problem> [--branches N]` | Self-consistency mode: spawns N parallel strategist→prover branches, ranks them by critic verdict, refines the winner. Reserve for hard problems. |
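For instance, on the quick-start claim above, the formalization step would emit something in the spirit of this Lean 4 / mathlib skeleton (illustrative output shape, not a captured run of `/math-formalize`):

```lean
import Mathlib

/-- Every Cauchy sequence in ℝ converges. Skeleton only: the statement is
    pinned down against mathlib; the proof body is left as `sorry`. -/
theorem cauchy_seq_converges (u : ℕ → ℝ) (hu : CauchySeq u) :
    ∃ L : ℝ, Filter.Tendsto u Filter.atTop (nhds L) := by
  sorry
```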
`skills/formal-math-ai/SKILL.md` contains a 7-domain router pointing at:

- 16 hand-written reference digests under `references/*.md`
- A 334-chunk theorem-level corpus from Lebl's Basic Analysis I under `references/lebl-basic-analysis-vol1/`

To extend the skill, see `skills/formal-math-ai/setup/README.md` for the PDF→markdown ingestion pipeline.
`benchmarks/problems/` ships with 11 seed problems across domains:

| id | domain | difficulty |
|---|---|---|
| `epsilon-delta-continuity` | real-analysis | 1 |
| `sqrt-2-irrational` | number-theory | 1 |
| `infinitude-of-primes` | number-theory | 1 |
| `lebl-3-1-7-cauchy-converges` | real-analysis | 2 |
| `intermediate-value-bisection` | real-analysis | 2 |
| `cauchy-schwarz-inner-product` | linear-algebra | 2 |
| `bolzano-weierstrass` | real-analysis | 3 |
| `banach-fixed-point` | functional-analysis | 3 |
| `dominated-convergence-application` | measure-theory | 4 |
| `putnam-2023-a1` | combinatorics | 4 |
| `imo-2023-p1` | number-theory | 5 |

Run with `/math-bench <id>` or `/math-bench all`. Reports land in `benchmarks/runs/<id>__<ts>/`.
Hand-written reference proofs for four of the seed benchmarks live under `proofs/`. They establish the rigor bar: CI fails if any of them accumulates a CRITICAL or HIGH `proof_lint` finding.
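To get a feel for what the linter flags, here is a toy version of two of its checks, filler language and unresolvable theorem citations (a simplified sketch; the real rule set and severity model live in `harness/proof_lint.py`):

```python
import re

# Toy stand-ins for two lint rules; patterns and severities are illustrative.
FILLER = re.compile(r"\b(clearly|obviously|trivially)\b", re.IGNORECASE)
CITATION = re.compile(r"\([A-Z][A-Za-z]+,\s*Thm\s*[\d.]+\)")

def lint(proof: str) -> list[str]:
    findings = []
    for lineno, line in enumerate(proof.splitlines(), start=1):
        if FILLER.search(line):
            findings.append(f"HIGH: filler word on line {lineno}: {line.strip()!r}")
    if "theorem" in proof.lower() and not CITATION.search(proof):
        findings.append("HIGH: theorem invoked without a resolvable citation")
    return findings
```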
The `setup/convert_pdfs.py` pipeline ingests open-access textbook PDFs into theorem-level chunks. Recommended next ingestions are tracked in `skills/formal-math-ai/references/SOURCES.md` (Axler MIRA, Lebl Vol II, Erdman, Szeliski, …).
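The ingestion step boils down to splitting the extracted markdown at theorem-level headings. Roughly (a simplified sketch; the heading pattern is an assumption about the corpus format, not `convert_pdfs.py` itself):

```python
import re
from pathlib import Path

# Assumed heading convention for theorem-level units; the real corpus
# format is defined by setup/convert_pdfs.py.
HEADING = re.compile(r"^## (Theorem|Lemma|Proposition|Definition)\b", re.MULTILINE)

def chunk(markdown_file: Path, out_dir: Path) -> int:
    """Split one extracted-markdown textbook file into theorem-level chunks."""
    text = markdown_file.read_text()
    starts = [m.start() for m in HEADING.finditer(text)] + [len(text)]
    out_dir.mkdir(parents=True, exist_ok=True)
    for n, (a, b) in enumerate(zip(starts, starts[1:]), start=1):
        (out_dir / f"chunk-{n:03d}.md").write_text(text[a:b].strip() + "\n")
    return len(starts) - 1
```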
```
.
├── CLAUDE.md                # project rules / harness routing
├── LICENSE                  # MIT (textbook content per SOURCES.md)
├── README.md                # this file
├── requirements.txt
├── pytest.ini
├── .claude/
│   ├── settings.json        # hooks + permission allow-list
│   ├── agents/              # 7 math subagents
│   └── commands/            # 8 slash commands (incl. /math-solve-deep, /math-explain)
├── skills/
│   └── formal-math-ai/      # knowledge skill (SKILL.md + references/)
├── harness/                 # Python helpers
│   ├── sympy_check.py       # symbolic / numeric smoke tests
│   ├── lean_stub.py         # Lean 4 + mathlib skeleton emitter
│   ├── run_bench.py         # /math-bench scaffolding
│   ├── proof_lint.py        # structural proof linter (citations,
│   │                        #   filler, quantifier discipline)
│   └── citation_resolve.py  # resolve "(Lebl, Thm 3.4.5)" → file path
├── templates/proofs/        # 5 canonical proof skeletons
├── proofs/                  # hand-written reference proofs (rigor bar)
├── benchmarks/
│   ├── problems/            # 11 benchmark inputs
│   └── runs/                # reproducible run reports (gitignored)
├── tests/                   # pytest suite (38 tests, py3.10–3.12)
├── docs/                    # ARCHITECTURE, ROADMAP, COOKBOOK, EXTENDING
└── .github/
    ├── workflows/ci.yml     # py_compile + pytest + proof-lint gate
    │                        #   + frontmatter validation
    ├── ISSUE_TEMPLATE/      # bug, feature, benchmark, agent-tuning
    └── pull_request_template.md
```
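For example, `citation_resolve.py`'s job, mapping a citation like `(Lebl, Thm 3.4.5)` onto a corpus chunk, can be approximated as below (the glob pattern assumes a hypothetical chunk-naming scheme, not the repo's actual layout):

```python
import re
from pathlib import Path

CORPUS = Path("skills/formal-math-ai/references/lebl-basic-analysis-vol1")
CITE = re.compile(r"\(Lebl,\s*Thm\s*(\d+)\.(\d+)\.(\d+)\)")

def resolve(citation: str) -> Path | None:
    """Map '(Lebl, Thm 3.4.5)' to a chunk file, or None if it doesn't parse.

    The glob below assumes chunks embed '3-4-5'-style theorem numbers in
    their filenames; that naming scheme is hypothetical, not the repo's.
    """
    m = CITE.search(citation)
    if not m:
        return None
    chapter, section, item = m.groups()
    hits = sorted(CORPUS.glob(f"*{chapter}-{section}-{item}*.md"))
    return hits[0] if hits else None
```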
This is an active project. Expect rough edges in the agent prompts; tune them and run `/math-bench all` to verify you didn't regress.
MIT for the harness code. Textbook reference content under `skills/formal-math-ai/references/` retains its upstream license; see `SOURCES.md`.