claude-code-math-skills

A Claude Code harness for solving rigorous math problems with agentic AI.

This repo bundles:

  • A knowledge skill (skills/formal-math-ai/) covering real & functional analysis, measure theory, operator theory, computer vision, GPU programming, generative AI, and formal verification — backed by a 334-chunk corpus from Lebl's Basic Analysis I.
  • A subagent roster (.claude/agents/) with seven specialized math agents.
  • Slash commands (.claude/commands/) that orchestrate those agents into a Generator–Critic loop.
  • Python helpers (harness/) for symbolic verification (SymPy), Lean skeleton emission, and benchmark runs (a small verification sketch follows this list).
  • A small benchmark suite (benchmarks/) of Putnam, IMO, and textbook problems.
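
To give a feel for the symbolic-verification style, here is a minimal sketch of the kind of smoke test harness/sympy_check.py performs. The names and structure below are illustrative only, not the harness API:

```python
import sympy as sp

# Illustrative only: the style of check sympy_check.py runs, not its API.
x = sp.symbols("x", real=True)

# Symbolic check of a concrete claim: d/dx [x*sin(x)] = sin(x) + x*cos(x).
lhs = sp.diff(x * sp.sin(x), x)
rhs = sp.sin(x) + x * sp.cos(x)
assert sp.simplify(lhs - rhs) == 0

# Numeric spot checks at sample points, as cheap falsification.
for v in (0.1, 1.0, 3.7):
    assert abs(float(lhs.subs(x, v)) - float(rhs.subs(x, v))) < 1e-9
```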

Quick start

git clone https://github.com/lonexreb/claude-code-math-skills.git
cd claude-code-math-skills

python3 -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt

# Open in Claude Code
claude .

Once Claude Code is running in this directory, the agents and commands load automatically. Try:

/math-solve Prove that every Cauchy sequence in ℝ converges.

The harness

When the user poses a math problem, the orchestrator routes through agents instead of answering from memory:

┌──────────────────┐
│  user problem    │
└────────┬─────────┘
         │
         ▼
┌──────────────────┐
│ math-strategist  │  domain classification + 2–4 candidate strategies
└────────┬─────────┘
         │
   ┌─────┴─────────────────────────────┐
   ▼                                   ▼
┌────────────────────────┐   ┌───────────────────────┐
│ counterexample-hunter  │   │ numeric-verifier      │  (in parallel; cheap falsification)
└──────────┬─────────────┘   └─────────┬─────────────┘
           │                           │
           └─────────────┬─────────────┘
                         ▼
            ┌──────────────────────────┐
            │ symbol-runner (optional) │  SymPy precomputation
            └────────────┬─────────────┘
                         ▼
            ┌──────────────────────────┐
            │      math-prover         │  generates the proof
            └────────────┬─────────────┘
                         ▼
            ┌──────────────────────────┐
            │      math-critic         │  ◀──┐ adversarial review
            └────────────┬─────────────┘     │
                         │                   │ revise
                         ├───────────────────┘
                         ▼ (APPROVE)
            ┌───────────────────────────┐
            │ math-formalizer (optional)│  Lean 4 / mathlib skeleton
            └───────────────────────────┘

Agents

| Agent | Model | Responsibility |
| --- | --- | --- |
| math-strategist | opus | Domain classification, 2–4 candidate strategies, recommended path. |
| math-prover | opus | Rigorous proof generation with named theorem citations. |
| math-counterexample-hunter | sonnet | Adversarial: tries to falsify the claim with edge cases / pathologies / hypothesis-drop sweep. |
| math-numeric-verifier | haiku | Fast SymPy/numpy smoke test for claims with concrete numerical content. |
| math-critic | opus | Independent adversarial review of a written proof; severity-classified findings. |
| math-formalizer | opus | Translates approved proof into Lean 4 / mathlib skeleton. |
| math-symbol-runner | sonnet | SymPy step-by-step algebra for derivatives, integrals, identities. |
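
As a concrete illustration of the math-formalizer's output shape, a skeleton for the quick-start problem might look like the following. The statement phrasing is one plausible rendering, not the formalizer's guaranteed output, and the proof body is intentionally left as sorry:

```lean
import Mathlib

-- Illustrative skeleton only: one plausible mathlib phrasing of
-- "every Cauchy sequence in ℝ converges"; proof left to be filled in.
theorem cauchy_seq_real_converges (u : ℕ → ℝ) (hu : CauchySeq u) :
    ∃ L : ℝ, Filter.Tendsto u Filter.atTop (nhds L) := by
  sorry
```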

Slash commands

| Command | Purpose |
| --- | --- |
| /math-solve <problem> | End-to-end pipeline: strategy → falsification sweep → proof → critic loop → optional formalization. |
| /math-prove <claim> | Lighter pipeline; user is committed to the claim being true. |
| /math-disprove <claim> | Counterexample-first pipeline. |
| /math-check <proof> | Audit-only pass via the critic. |
| /math-formalize <proof> | Lean 4 skeleton emission. |
| /math-bench <id\|all> | Run the harness on benchmark problems and record reproducible reports. |
| /math-solve-deep <problem> [--branches N] | Self-consistency mode: spawns N parallel strategist→prover branches, ranks them by critic verdict, refines the winner. Reserve for hard problems. |
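
For instance, a competition-style problem might warrant (illustrative invocation; the problem text is arbitrary):

/math-solve-deep Show that every bounded sequence in ℝ has a convergent subsequence. --branches 3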

Knowledge base

skills/formal-math-ai/SKILL.md contains a 7-domain router pointing at:

  • 16 hand-written reference digests under references/*.md
  • A 334-chunk theorem-level corpus from Lebl's Basic Analysis I under references/lebl-basic-analysis-vol1/

To extend: see skills/formal-math-ai/setup/README.md for the PDF→markdown ingestion pipeline.
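
The real pipeline lives in setup/convert_pdfs.py; the sketch below only illustrates the theorem-level chunking idea. The heading pattern and function name are illustrative, not the pipeline's actual conventions:

```python
import re
from pathlib import Path

# Illustrative sketch of theorem-level chunking; the actual heading
# conventions are defined by skills/formal-math-ai/setup/convert_pdfs.py.
HEADING = re.compile(
    r"^(?=(?:Theorem|Lemma|Proposition|Corollary|Definition)\s+\d)",
    re.MULTILINE,
)

def chunk_markdown(path: Path) -> list[str]:
    """Split a converted markdown file at theorem-like headings."""
    text = path.read_text(encoding="utf-8")
    return [part.strip() for part in HEADING.split(text) if part.strip()]
```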

Benchmarks

benchmarks/problems/ ships with 11 seed problems across domains:

| id | domain | difficulty |
| --- | --- | --- |
| epsilon-delta-continuity | real-analysis | 1 |
| sqrt-2-irrational | number-theory | 1 |
| infinitude-of-primes | number-theory | 1 |
| lebl-3-1-7-cauchy-converges | real-analysis | 2 |
| intermediate-value-bisection | real-analysis | 2 |
| cauchy-schwarz-inner-product | linear-algebra | 2 |
| bolzano-weierstrass | real-analysis | 3 |
| banach-fixed-point | functional-analysis | 3 |
| dominated-convergence-application | measure-theory | 4 |
| putnam-2023-a1 | combinatorics | 4 |
| imo-2023-p1 | number-theory | 5 |

Run with /math-bench <id> or /math-bench all. Reports land in benchmarks/runs/<id>__<ts>/.

Hand-written reference proofs for four of the seed benchmarks live under proofs/. They establish the rigor bar — CI fails if any of them accumulates a CRITICAL or HIGH proof_lint finding.
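
For a flavor of what a proof_lint finding looks like, here is a minimal sketch of one rule class (filler detection). The rule name, severity, and report shape are illustrative, not harness/proof_lint.py's actual schema:

```python
import re

# Illustrative filler-phrase rule; see harness/proof_lint.py for the
# real rules (citations, filler, quantifier discipline) and severities.
FILLER = re.compile(r"\b(clearly|obviously|it is easy to see)\b", re.IGNORECASE)

def lint_filler(proof_text: str) -> list[dict]:
    """Flag hand-wavy filler phrases, line by line."""
    findings = []
    for lineno, line in enumerate(proof_text.splitlines(), start=1):
        if FILLER.search(line):
            findings.append({"rule": "filler", "severity": "HIGH",
                             "line": lineno, "text": line.strip()})
    return findings
```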

Adding more knowledge

The setup/convert_pdfs.py pipeline ingests open-access textbook PDFs into theorem-level chunks. Recommended next ingestions are tracked in skills/formal-math-ai/references/SOURCES.md (Axler MIRA, Lebl Vol II, Erdman, Szeliski, …).

Repository layout

.
├── CLAUDE.md                       # project rules / harness routing
├── LICENSE                         # MIT (textbook content per SOURCES.md)
├── README.md                       # this file
├── requirements.txt
├── pytest.ini
├── .claude/
│   ├── settings.json               # hooks + permission allow-list
│   ├── agents/                     # 7 math subagents
│   └── commands/                   # 8 slash commands (incl. /math-solve-deep, /math-explain)
├── skills/
│   └── formal-math-ai/             # knowledge skill (SKILL.md + references/)
├── harness/                        # Python helpers
│   ├── sympy_check.py              # symbolic / numeric smoke tests
│   ├── lean_stub.py                # Lean 4 + mathlib skeleton emitter
│   ├── run_bench.py                # /math-bench scaffolding
│   ├── proof_lint.py               # structural proof linter (citations,
│   │                               #   filler, quantifier discipline)
│   └── citation_resolve.py         # resolve "(Lebl, Thm 3.4.5)" → file path
├── templates/proofs/               # 5 canonical proof skeletons
├── proofs/                         # hand-written reference proofs (rigor bar)
├── benchmarks/
│   ├── problems/                   # 11 benchmark inputs
│   └── runs/                       # reproducible run reports (gitignored)
├── tests/                          # pytest suite (38 tests, py3.10–3.12)
├── docs/                           # ARCHITECTURE, ROADMAP, COOKBOOK, EXTENDING
└── .github/
    ├── workflows/ci.yml            # py_compile + pytest + proof-lint gate
    │                               #   + frontmatter validation
    ├── ISSUE_TEMPLATE/             # bug, feature, benchmark, agent-tuning
    └── pull_request_template.md
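
As an example of the citation_resolve.py idea, here is a sketch of mapping an inline citation to a corpus file. Both the regex and the path scheme below are guesses for illustration, not the module's actual behavior:

```python
import re

# Illustrative only: the real lookup and file-naming scheme live in
# harness/citation_resolve.py; this path pattern is a guess.
CITATION = re.compile(r"\(Lebl,\s*Thm\.?\s*(\d+)\.(\d+)\.(\d+)\)")

def resolve(citation: str) -> str | None:
    """Map '(Lebl, Thm 3.4.5)' to a chunk path in the Lebl corpus."""
    match = CITATION.search(citation)
    if match is None:
        return None
    chapter, section, number = match.groups()
    return ("skills/formal-math-ai/references/lebl-basic-analysis-vol1/"
            f"thm-{chapter}-{section}-{number}.md")
```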

Status

This is an active project. Expect rough edges in the agent prompts; tune them and run /math-bench all to verify you didn't regress.

License

MIT for the harness code. Textbook reference content under skills/formal-math-ai/references/ retains its upstream license — see SOURCES.md.
