Skip to content

Commit a30195f

Browse files
committed
doc(language): macro guide
Add first draft of documentation introducing the macro system to contributors at a higher level than the official paper; consolidate links to official resources and good examples to learn from.
1 parent 3d63bb8 commit a30195f

File tree

1 file changed

+270
-0
lines changed

1 file changed

+270
-0
lines changed

docs/macro_guide.md

Lines changed: 270 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,270 @@
1+
# Preface
2+
3+
Information on Lean 4's macro system is currently scattered between a number of resources, and most of the core APIs are still undocumented. This guide consolidates some of the information and includes more practical guidance on use, leaving the really low level details to the official paper and its supplement.
4+
5+
The offical paper describing the mechanics behind Lean 4's macro system can be found in [Beyond Notations: Hygienic Macro Expansion for Theorem Proving Languages](https://arxiv.org/abs/2001.10490) by Sebastian Ullrich and Leonardo de Moura, and the accompanying repo with example code can be found in the paper's code [supplement](https://github.com/Kha/macro-supplement). The supplement also includes a working implementation of the macro expander, so it's a good case study for people interested in the details.
6+
7+
# What is a macro in Lean?
8+
9+
A macro is a function that takes in a syntax tree and produces a new syntax tree. Macros are useful for many reasons, but two of the big ones are a) allowing users to extend the language with new syntactic constructs without having to actually expand the core language, and b) allowing users to automate tasks that would otherwise be extremely repetitive, time-consuming, and/or error-prone.
10+
11+
A motivating example is Mathlib 4's set builder notation. We would like to be able to write the set of natural numbers 0, 1, and 2 as just `{0, 1, 2}`. However, Lean does not natively support this syntax, and the actual definition of a set in Mathlib does not let us just declare sets in this manner; naively using the set API would force us to write `Set.insert 1 (Set.insert 2 (Set.singleton 3))`. Instead, we can teach Lean's macro system to recognize `{0, 1, 2}` as a shorthand for a composition of existing methods and let it do the repetitive work of creating the `Set.insert...` invocation for us. In this way, we can have our more readable and more convenient syntax without having to extend Lean itself, and while retaining the simple insert/singleton API.
12+
13+
# How macros are handled
14+
15+
The general procedure is as follows:
16+
17+
1. Lean parses the actual code in a given file, creating a Lean AST which contains any unexpanded macros.
18+
19+
2. Lean repeats the cycle (elaboration ~> (macro hygiene and expansion) ~> elaboration...)
20+
21+
The cycle in step 2 repeats until there are no more macros which need to be expanded, and elaboration can finish normally. This repetition is required since macros can expand to other macros, and may expand to code that needs information from the elaborator. As you can see, the process of macro parsing and expansion is interleaved with the parsing and elaboration of non-macro code.
22+
23+
By default, macros in Lean are hygienic, but hygiene can be disabled with the option `set_option hygiene false`. Readers can learn more about hygiene and how it's implemented in the official paper and supplement linked at the top of this guide.
24+
25+
# Elements of "a" macro (important types)
26+
27+
28+
In the big picture, a macro has two components that must be implemented by the user, parsers and syntax transformers, where the latter is a function that says what the input syntax should expand to. There is a third component, syntax categories, such as `term`, `tactic`, and `command`, but declaring a new syntax category is not always necessary. When we say "parser" in the context of a macro, we refer to the core type `Lean.ParserDescr`, which parses elements of type `Lean.Syntax`, where `Lean.Syntax` represents elements of a Lean syntax tree. Syntax transformers are functions of type `Syntax -> MacroM Syntax`. Lean has a synonym for this type, which is simply `Macro`. `MacroM` is a monad that carries state needed for macro expansion to work nicely, including the info needed to implement hygiene.
29+
30+
As an example, we again refer to Mathlib's set builder notation:
31+
```
32+
/- Declares a parser -/
33+
syntax (priority := high) "{" term,+ "}" : term
34+
35+
/- Declares two expansions/syntax transformers -/
36+
macro_rules
37+
| `({$x}) => `(Set.singleton $x)
38+
| `({$x, $xs:term,*}) => `(Set.insert $x {$xs,*})
39+
40+
/- Provided `Set` is in scope, these are all we need for `{1, 2, 3}` to be valid notation to create a literal set -/
41+
42+
```
43+
44+
This example should also make clear the reason why macros (and pretty much all of Lean 4's metaprogramming facilities) are functions that take an argument of type `Syntax` e.g. `Syntax -> MacroM Syntax`; the leading syntax element is the thing that actually triggers the macro expansion by matching with the declared parser, and as a user, you will almost always be interested in inspecting and transforming that initial syntax element (though there are cases in which it can just be ignored, as in the exfalso tactic).
45+
46+
Returning briefly to the API provided by Lean, `Lean.Syntax`, is pretty much what you would expect a basic syntax tree type to look like. This is a slightly simplified representation which omits details in the `atom` and `ident` constructors; users can create atoms and idents which comport with this simplified representation using the `mkAtom` and `mkIdent` methods provided in the `Lean` namespace.
47+
```
48+
inductive Syntax where
49+
| missing : Syntax
50+
| node (kind : SyntaxNodeKind) (args : Array Syntax) : Syntax
51+
| atom : String -> Syntax
52+
| ident : Name -> Syntax
53+
```
54+
55+
56+
57+
For those interested, `MacroM` is a `ReaderT` with the following components:
58+
59+
```
60+
abbrev MacroM := ReaderT Macro.Context (EStateM Macro.Exception Macro.State)
61+
62+
structure Context where
63+
methods : MethodsRef
64+
mainModule : Name
65+
currMacroScope : MacroScope
66+
currRecDepth : Nat := 0
67+
maxRecDepth : Nat := defaultMaxRecDepth
68+
ref : Syntax
69+
70+
inductive Exception where
71+
| error : Syntax → String → Exception
72+
| unsupportedSyntax : Exception
73+
74+
structure State where
75+
macroScope : MacroScope
76+
traceMsgs : List (Prod Name String) := List.nil
77+
deriving Inhabited
78+
```
79+
80+
As a review/checklist, the three (somtimes only two depending on whether you need a new syntax category) components users need to be concerned with are:
81+
82+
0. You may or may not need to declare a new syntax category using `declare_syntax_cat`
83+
1. Declare a parser with either `syntax` or `macro`
84+
2. Declare an expansion/syntax transformer with either `macro_rules` or `macro`
85+
86+
Parsers and syntax transformers can be declared manually, but use of the pattern language and `syntax`, `macro_rules`, and `macro` is recommended.
87+
88+
## syntax categories with declare_syntax_cat
89+
90+
`declare_syntax_cat` declares a new syntax category, like `command`, `tactic`, or mathlib4's `binderterm`. These are the different categories of things that can be referred to in a quote/antiquote. Under the hood, `declare_syntax_cat` just declares a parser descriptor:
91+
92+
```
93+
set_option trace.Elab.definition true in
94+
declare_syntax_cat binderterm
95+
96+
/-
97+
Output:
98+
99+
[Elab.definition.body] binderterm.quot : Lean.ParserDescr :=
100+
Lean.ParserDescr.node `Lean.Parser.Term.quot 1024
101+
(Lean.ParserDescr.binary `andthen (Lean.ParserDescr.symbol "`(binderterm|")
102+
(Lean.ParserDescr.binary `andthen (Lean.ParserDescr.unary `incQuotDepth (Lean.ParserDescr.cat `binderterm 0))
103+
(Lean.ParserDescr.symbol ")")))
104+
105+
[Elab.definition.scc] [binderterm.quot]
106+
-/
107+
```
108+
109+
Declaring a new syntax category like this one automatically declares a quotation operator `` `(binderterm| ...)``. These pipe prefixes `<thing>|` are used in syntax quotations to say what category a given quotation is expected to be an element of. The pipe prefixes are *not* used for elements in the `term` and `command` categories (since they're considered the default), but need to be used for everything else.
110+
111+
## Parsers and the `syntax` keyword
112+
113+
Internally, elements of type `Lean.ParserDescr` are implemented as parser combinators. However, Lean offers the ability to write parsers using the macro/pattern language by way of the `syntax` keyword. This is the recommended means of writing parsers. As an example, the parser for the `rwa` (rewrite, then use assumption) tactic is:
114+
115+
```
116+
syntax "rwa " rwRuleSeq (location)? : tactic
117+
118+
/-
119+
which expands to:
120+
[Elab.definition.body] tacticRwa__ : Lean.ParserDescr :=
121+
Lean.ParserDescr.node `tacticRwa__ 1022
122+
(Lean.ParserDescr.binary `andthen
123+
(Lean.ParserDescr.binary `andthen (Lean.ParserDescr.nonReservedSymbol "rwa " false) Lean.Parser.Tactic.rwRuleSeq)
124+
(Lean.ParserDescr.unary `optional Lean.Parser.Tactic.location))
125+
126+
[Elab.definition.scc] [tacticRwa__]
127+
-/
128+
129+
```
130+
131+
Literals are written as double-quoted strings (`"rwa "` expects the literal sequence of characters `rwa`, while the trailing space provides a hint to the formatter that it should add a space after `rwa` when pretty printing this syntax); `rwRuleSeq` and `location` are themselves `ParserDescr`s, and we finish with `: tactic` specifying that the preceding parser is for an element in the `tactic` syntax category. The parentheses around `(location)?` are necessary (rather than `location?`) because Lean 4 allows question marks to be used in identifiers, so `location?` is one single identifier that ends with a question mark, which is not what we want.
132+
133+
The name `tacticRwa__` is automatically generated. You can name parser descriptors declared with the `syntax` keyword like so:
134+
135+
```
136+
syntax (name := introv) "introv " (colGt ident)* : tactic
137+
138+
[Elab.definition.body] introv : Lean.ParserDescr :=
139+
Lean.ParserDescr.node `introv 1022
140+
(Lean.ParserDescr.binary `andthen (Lean.ParserDescr.nonReservedSymbol "introv " false)
141+
(Lean.ParserDescr.unary `many
142+
(Lean.ParserDescr.binary `andthen (Lean.ParserDescr.const `colGt) (Lean.ParserDescr.const `ident))))
143+
144+
[Elab.definition.scc] [introv]
145+
```
146+
147+
## The pattern language
148+
149+
Available quantifiers are `?` (one or zero occurrences, see note below), `*` (zero or more occurrences), and `+` (one or more occurrences).
150+
151+
Keep in mind that Lean makes `?` available for use in identifiers, so if we want a parser to look for an optional `location`, we would need to write `(location)?` with parenthesis acting as a separator, since `location?` would look for something under the identifier `location?` (where the `?` is part of the identifier).
152+
153+
Parentheses can be used as delimiters.
154+
155+
Separated lists can be constructed like so: `$ts,*` for a comma separated list.
156+
157+
"extended splices" can be constructed as `$[..]`. See the official paper (p. 12) for more details.
158+
159+
Literals are written as double-quoted strings. A literal may use trailing whitespace (see e.g. the `rwa` or `introv` tactics) to tell the pretty-printer how it should be displayed, but such whitespace will not prevent a literal with no trailing whitespace from matching. "The spaces are relevant, but not interpreted literally. When the ParserDescr is turned into a Parser, the actual token matcher [uses the .trim of the provided string](https://github.com/leanprover/lean4/blob/master/src/Lean/Parser/Basic.lean#L1193), but the generated formatter [uses the spaces as specified](https://github.com/leanprover/lean4/blob/master/src/Lean/PrettyPrinter/Formatter.lean#L328), that is, turning the atom "rwa" in the syntax into the string rwa as part of the pretty printed output."
160+
161+
## Syntax expansions with `macro_rules`, and how it desugars.
162+
163+
`macro_rules` lets you declare expansions for a given `Syntax` element using a syntax simlar to a `match` statement. The left-hand side of a match arm is a quotation (with a leading `<cat>|` for categories other than `term` and `command`) in which users can specify the pattern they'd like to write an expansion for. The right-hand side returns a syntax quotation which is the output the user wants to expand to.
164+
165+
A feature of Lean's macro system is that if there are multiple expansions for a particular match, Lean will try the most recently declared expansion first, and will retry with other matching expansions if the previous attempt failed. This is particularly useful for tactics.
166+
167+
The following example shows both the retry behavior, and the fact that macros declared using the shorthand `macro` syntax can still have additional expansions declared with `macro_rules`. This `transitivity` tactic is implemented such that it will work for either Nat.le or Nat.lt. The Nat.lt version was declared "most recently", so it will be tried first, but if it fails (for example, if the actual term in question is Nat.le) the next potential expansion will be tried:
168+
```
169+
macro "transitivity" e:(colGt term) : tactic => `(tactic| apply Nat.le_trans (m := $e))
170+
macro_rules
171+
| `(tactic| transitivity $e) => `(tactic| apply Nat.lt_trans (m := $e))
172+
173+
example (a b c : Nat) (h0 : a < b) (h1 : b < c) : a < c := by
174+
transitivity b <;>
175+
assumption
176+
177+
example (a b c : Nat) (h0 : a <= b) (h1 : b <= c) : a <= c := by
178+
transitivity b <;>
179+
assumption
180+
181+
/- This will fail, but is interesting in that it exposes the "most-recent first" behavior, since the
182+
error message complains about being unable to unify mvar1 <= mvar2, rather than mvar1 < mvar2. -/
183+
example (a b c : Nat) (h0 : a <= b) (h1 : b <= c) : False := by
184+
transitivity b <;>
185+
assumption
186+
```
187+
188+
To see the desugared definition of the actual expansion, we can again use `set_option trace.Elab.definition true in` and observe the output of the humble `exfalso` tactic defined in Mathlib4:
189+
190+
```
191+
192+
macro "exfalso" : tactic => `(apply False.elim)
193+
194+
/-
195+
Results in the expansion:
196+
197+
[Elab.definition.body] myMacro._@._hyg.320 : Lean.Macro :=
198+
fun x =>
199+
let discr := x;
200+
/- This is where Lean tries to actually identify that it's an invocation of the exfalso tactic -/
201+
if Lean.Syntax.isOfKind discr `tacticExfalso = true then
202+
let discr := Lean.Syntax.getArg discr 0;
203+
let x := discr;
204+
do
205+
/- Lean getting scope/meta info from the macro monad -/
206+
let info ← Lean.MonadRef.mkInfoFromRefPos
207+
let scp ← Lean.getCurrMacroScope
208+
let mainModule ← Lean.getMainModule
209+
pure
210+
(Lean.Syntax.node `Lean.Parser.Tactic.seq1
211+
#[Lean.Syntax.node `null
212+
#[Lean.Syntax.node `Lean.Parser.Tactic.apply
213+
#[Lean.Syntax.atom info "apply",
214+
Lean.Syntax.ident info (String.toSubstring "False.elim")
215+
(Lean.addMacroScope mainModule `False.elim scp) [(`False.elim, [])]]]])
216+
else
217+
/- If this wasn't actually an invocation of the exfalso tactic, throw the "unsupportedSyntax" error -/
218+
let discr := x;
219+
throw Lean.Macro.Exception.unsupportedSyntax
220+
-/
221+
```
222+
223+
We can view the expansion of the parser:
224+
225+
```
226+
syntax (name := myExfalsoParser) "myExfalso" : tactic
227+
228+
-- remember that `Macro` is a synonym for `Syntax -> TacticM Unit`
229+
@[macro myExfalsoParser] def implMyExfalso : Macro :=
230+
fun stx => `(apply False.elim)
231+
```
232+
233+
Finally, we can view the expansion of the syntax transformer:
234+
```
235+
syntax (name := myExfalsoParser) "myExfalso" : tactic
236+
237+
@[macro myExfalsoParser] def implMyExfalso : Macro :=
238+
fun stx =>
239+
do
240+
let info ← Lean.MonadRef.mkInfoFromRefPos
241+
let scp ← Lean.getCurrMacroScope
242+
let mainModule ← Lean.getMainModule
243+
pure
244+
(Lean.Syntax.node `Lean.Parser.Term.app
245+
#[Lean.Syntax.ident info (String.toSubstring "apply") (Lean.addMacroScope mainModule `apply scp)
246+
[(`Lean.Parser.Tactic.apply, []), (`Lean.Meta.apply, [])],
247+
Lean.Syntax.node `null
248+
#[Lean.Syntax.ident info (String.toSubstring "False.elim") (Lean.addMacroScope mainModule `False.elim scp)
249+
[(`False.elim, [])]]])
250+
```
251+
252+
## The `macro` keyword
253+
254+
`macro` is a shortcut which allows users to declare both a parser and an expansion at the same time as a matter of convenience. Additional expansions for the parser generated by the `macro` invocation can be added with a separate `macro_rules` block (see the example in the `macro_rules` section).
255+
256+
# Unexpanders
257+
258+
TODO; for now, see the unexpander in Mathlib.Set for an example.
259+
260+
# More illustrative examples:
261+
262+
The [Tactic.Basic](https://github.com/leanprover-community/mathlib4/blob/master/Mathlib/Tactic/Basic.lean) file in Mathlib4 contains many good examples to learn from.
263+
264+
# Practical tips:
265+
266+
You can observe the output of commands and functions that in some way use the macro system by setting this option to true : `set_option trace.Elab.definition true`
267+
268+
Lean 4 also offers the option of limiting the region in which option is set with the syntax `set_option ... in`):
269+
270+
Hygiene can be disabled with the command option `set_option hygiene false`

0 commit comments

Comments
 (0)