This is just a stub to give links to a one page ‘cheat sheet’ for Lean tactics and accompanying examples that I created as part of my long-term project to learn Lean 4. If time permits I may come back to write a bit more about my struggles, which were lightened by the excellent videos and supporting material from Lean for the curious mathematician (2023).
Lean 4 and the Curry–Howard correspondence
February 9, 2026

This post is a record of my struggles learning the Lean 4 proof assistant, which I'm doing by the inefficient and misleading strategy of interpreting it, whenever I can, as a souped-up version of Haskell.
Curry–Howard in propositional logic
The formula

$(p \to (q \to r)) \to ((p \to q) \to (p \to r))$

is the generic instance of the second axiom scheme in Lukasiewicz's formulation of propositional logic. As such, it is a tautology: true under any assignment of true and false to the propositional letters. An elegant proof of this is by the Curry–Howard correspondence for propositional logic: interpreting the propositional letters as type variables, a proposition is provable in intuitionistic logic if and only if there is a function of the corresponding type. (Inhabitation therefore implies tautology-hood; the converse fails, as we shall see when we reach double negation elimination.) Thus the fact that the Haskell compiler accepts
lukasiewicz2 :: (p -> (q -> r))
-> ((p -> q) -> (p -> r))
lukasiewicz2 h_ptoqr h_pq h_p = (h_ptoqr h_p) (h_pq h_p)
-- h_ptoqr h_p :: q -> r evaluated at h_pq h_p :: q ...
-- has type r
or that the Lean compiler accepts
lemma lukasiewicz2 {p q r : Prop} : (p → (q → r))
→ ((p → q) → (p → r))
:= fun h_ptoqr h_pq h_p => (h_ptoqr h_p) (h_pq h_p)
is a proof that the Lukasiewicz formula is a tautology. Here h_pq can be read as 'the hypothesis that $p$ implies $q$', or 'a proof that $p \to q$'. It is a value of type $p \to q$. (For more on the curly braces block see below.) The Lean code may look more familiar in the entirely equivalent form below, which shows that Lean permits multiple arguments in type signatures to be listed by juxtaposition, just as for multiple arguments to functions:
lemma lukasiewicz2' {p q r : Prop}
(h_ptoqr : p → (q → r)) (h_pq : p → q) : p → r
:= fun h_p => (h_ptoqr h_p) (h_pq h_p)
This proof can be rewritten in tactic style using the intro tactic to introduce the hypothesis h_p of type p, and then the exact tactic to finish:
lemma lukasiewicz2'' {p q r : Prop}
(h_ptoqr : p → (q → r)) (h_pq : p → q) : p → r
:= by intro h_p; exact (h_ptoqr h_p) (h_pq h_p)
One thing I am struggling with in Lean is getting used to tactics, so examples like this, where the macro-rewriting black box can easily be understood, are particularly welcome. I hope the not-so-subtle neurolinguistic programming of typing fun every so often will keep me motivated.
Some more tautologies
In a similar spirit, the Curry–Howard proofs that p → p and p → (q → p) are tautologies have one-word solutions: id (valid in Haskell and Lean) and const (valid in Haskell). Here p → (q → p) is the generic instance of the first Lukasiewicz axiom scheme.
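For concreteness, here is a sketch of these one-word proofs as Haskell declarations (the names are mine; the type signatures are the tautologies being proved):

```haskell
-- One-word Curry–Howard proofs: the type signatures are the
-- tautologies, the bodies are standard Prelude functions.
selfImplication :: p -> p
selfImplication = id

lukasiewicz1 :: p -> q -> p
lukasiewicz1 = const
```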
Negation
In Lean, negation is definitionally the same as 'implies false': $\neg p$ is by definition $p \to \mathrm{False}$. We can mimic this in Haskell by writing
data False, to introduce a new type False which deliberately does not have any constructors. We then define negation by the type synonym type Not p = p -> False. (As a visual aid, -> is the Haskell arrow; this is also allowed by Lean, but Lean additionally supports →, which I've used for all Lean code.) For example, the proposition $\neg(p \wedge \neg p)$ is a tautology, with Curry–Howard proof in Haskell
impossible :: (p, Not p) -> False
impossible (h_p, h_notp) = h_notp h_p
given by evaluating the function h_notp :: p -> False at h_p :: p. One could curry this definition to get the equivalent tautology $p \to (\neg p \to \mathrm{False})$ with definition
impossible' h_p h_notp = h_notp h_p. The analogous Lean code for the curried form is
lemma impossible {p : Prop} : p → ¬p → False
:= fun h_p => fun h_np => h_np h_p
which, very similarly to before, can be rewritten as
lemma impossible' {p : Prop} (h_p : p) (h_np : ¬p)
: False := h_np h_p
Observe that since a proposition cannot be both true and false, it is impossible, without using the special Haskell value undefined, or its Lean colleague sorry, to construct two valid inputs to impossible. (Or for another proof of this, note that a successful evaluation of impossible gives a value of the void type False.) But still, the fact that impossible type checks confirms that it is a tautology.
For a simpler instance, observe that id :: False -> False type checks, and so $\mathrm{False} \to \mathrm{False}$ is a tautology. Thinking of types as their sets of values, this is consistent with the count $|p \to q| = |q|^{|p|}$, since $|p \to \mathrm{False}| = 0^{|p|} = 0$ unless p is itself a void type, in which case we have $0^0 = 1$; correspondingly id is the unique function of type False -> False.
The contrapositive and vector space duality
Continuing in this mathematical theme, the contrapositive, formulated so that it introduces rather than removes a negation, is $(p \to q) \to (\neg q \to \neg p)$, and correspondingly
contrapositive :: (p -> q) -> (Not q -> Not p)
contrapositive h_pq h_nq h_p = h_nq (h_pq h_p)
is a valid Haskell program. It takes h_pq :: p -> q and a proof that q is false, as expressed by a value h_nq :: q -> False, and produces a new value, namely the composition h_nq . h_pq of type p -> False, which is then evaluated at h_p :: p, the argument of the output Not p. Here it may be helpful to note that the final parentheses are redundant: the type could also be written as (p -> q) -> (q -> False) -> (p -> False). Again the Lean code is almost identical:
lemma contrapositive {p q : Prop} : (p → q) → (¬q → ¬p)
:= fun h_pq h_nq h_p => h_nq (h_pq h_p)
-- fun h_pq => (fun h_nq => (fun h_p =>
-- h_nq (h_pq h_p)))
where the commented out lines are the less idiomatic form; like any civilised being, Lean curries by default.
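Stripped of the logical reading, the contrapositive is nothing but reverse function composition. Here is a sketch (my generalisation, not from the post) in which False is relaxed to an arbitrary result type r; taking r = False recovers the contrapositive:

```haskell
-- Generalised contrapositive: precomposition with h_pq.
-- With r = False this specialises to (p -> q) -> (Not q -> Not p).
contraGen :: (p -> q) -> (q -> r) -> (p -> r)
contraGen h_pq h_nq = h_nq . h_pq
```

The generalisation also makes the function testable on ordinary values, which the void type False forbids.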
The flip in the direction of implication corresponds to vector space duality: if we read False as the field $F$ (please ignore that this is not a void type), then the contrapositive states that, given an $F$-linear map $f : V \to W$, there is a map $W^\star \to V^\star$, namely the dual map $f^\star$. The introduction of a double negation corresponds to the canonical inclusion of a vector space in its double dual, $V \to V^{\star\star}$, that sends $v$ to evaluation at $v$:
double_negation_intro :: p -> Not (Not p)
-- equivalently p -> (p -> False) -> False
double_negation_intro h_p h_np = h_np h_p
or in Lean
lemma double_negation_intro {p : Prop} : p → ¬¬p
:= fun h_p => (fun h_notp => h_notp h_p)
which works because ¬¬p is (p → False) → False so we can take the given value of type p and evaluate it at the argument of type p → False. As an antidote to the false argument that the number of negations cannot go down, I find it helpful to consider
lemma triple_negation_elimination {p : Prop} : ¬¬¬p → ¬p
:= fun h_nnnp =>
(fun h_p => h_nnnp (double_negation_intro h_p))
which corresponds to the canonical map $V^{\star\star\star} \to V^\star$ defined by dualising the canonical inclusion $V \to V^{\star\star}$ just seen. These maps are not just canonical but also 'natural' in the sense of 'natural transformation': if $f : V \to W$ is a linear map then the naturality squares below commute.
As a big hint for the question in the following section, can you define a map $V^{\star\star} \to V$, natural in the vector space $V$? If your answer is 'yes', you are perhaps working in the category of finite-dimensional vector spaces, which curiously is not the right setting for this question, even though the question does not, on the face of it, involve any form of infinity!
Double negation elimination
The Lukasiewicz system needs a third axiom scheme to become complete, in the sense that all tautologies are provable from the axioms by repeated applications of modus ponens. This third axiom scheme has as its generic instance the contrapositive, this time removing negation:

$(\neg p \to \neg q) \to (q \to p)$

Does this have a Curry–Howard proof in Haskell? Does it have a Curry–Howard proof in Lean? Using the version of the contrapositive already proved, we can certainly prove $(\neg p \to \neg q) \to (\neg\neg q \to \neg\neg p)$; then using double negation introduction $q \to \neg\neg q$, we get

$(\neg p \to \neg q) \to (q \to \neg\neg p)$
The Curry–Howard proof of this in Haskell is, using our existing functions,
lukasiewicz3_nearly :: (Not p -> Not q)
-> q -> Not (Not p)
lukasiewicz3_nearly h_npnq h_q =
contrapositive h_npnq (double_negation_intro h_q)
which can be ‘desugared’ in careful steps to
lukasiewicz3_nearly1 h_npnq h_q = \h_nq ->
contrapositive h_npnq (double_negation_intro h_q) h_nq
lukasiewicz3_nearly2 h_npnq h_q = \h_nq ->
(double_negation_intro h_q) (h_npnq h_nq)
lukasiewicz3_nearly3 h_npnq h_q = \h_nq ->
(\h_nq' -> h_nq' h_q) (h_npnq h_nq)
lukasiewicz3_nearly4 h_npnq h_q = \h_nq
-> (h_npnq h_nq) h_q
lukasiewicz3_nearly5 h_npnq h_q h_nq = h_npnq h_nq h_q
I find it quite remarkable that, without a type signature, the inferred type for the final version is (a -> b -> c) -> b -> a -> c, which by parametric polymorphism and the theorems for free mantra, has unique value
lukasiewicz3_nearly6 g x y = g y x
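In fact this combinator is already in the Prelude: it is flip. A quick check (my addition) that the definition above agrees with it:

```haskell
-- lukasiewicz3_nearly6 g x y = g y x is the Prelude's flip,
-- the unique inhabitant of (a -> b -> c) -> b -> a -> c.
lukasiewicz3Flip :: (a -> b -> c) -> b -> a -> c
lukasiewicz3Flip = flip
```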
I have no intuition for why this apparently complicated tautology involving both negation and double negation should be such a simple combinator.
It is now clear that the third Lukasiewicz axiom scheme is equivalent in logical power to double negation elimination. But, finally reaching one punchline, the Haskell type Not (Not p) -> p, and its syntactic equivalent ((p -> False) -> False) -> p are uninhabited. Thus there is no way to complete
double_negation_elimination :: Not (Not p) -> p
double_negation_elimination h_nnp = ...
without using undefined. Thus Haskell exhibits the Curry–Howard logic for intuitionistic, rather than classical, logic.
The foundations of Lean are a dependent type theory, the calculus of inductive constructions, which out of the box is intuitionistic. But Lean supports classical logic, and if I understand right, this is automatically enabled by importing Mathlib. Remembering that proof by contradiction is 'the falsity of $p$ implies False, implies $p$', that is $(\neg p \to \mathrm{False}) \to p$, we get a one-line proof of double negation elimination in Lean.
lemma double_negation_elimination {p : Prop} :
¬¬p → p := by_contra
-- by_contra has type (¬?m.1 → False) → ?m.1
If you are playing along and find this gives an error, please add import Mathlib.Tactic.ByContra at the top of your file. Or add import Mathlib, and go away and make some tea. I used a full import of Mathlib for the examples of dependent types involving natural numbers below. Using this we can prove the instance of the third Lukasiewicz scheme:
lemma contrapositive_eliminating_negation {p q : Prop}
: (¬p → ¬q) → q → p :=
fun h_npnq h_q => by by_contra h_np;
-- we have h_npnq : ¬p → ¬q, h_q : q and now h_np : ¬p
-- so all set up for
exact h_npnq h_np h_q
where the definition in the body could be rewritten as intro h_npnq h_q; by_contra h_np; exact h_npnq h_np h_q
Peirce’s Law
For a similar example, Peirce's Law, that $((p \to q) \to p) \to p$, is negation free but still unprovable in intuitionistic logic. Peirce's Law is the type of the dreaded (well, by me anyway) Scheme function
call-with-current-continuation. This type is void in Haskell: instead the nearest approximation is callCC
class Monad m => MonadCont m where
callCC :: ((a -> m b) -> m a) -> m a
where, omitting type constructors, and choosing a fixed return type r, one suitable type for a monad in the class MonadCont is m p = (p -> r) -> r. This might suggest taking r to be False, but then m a is Not (Not a), so we stay in the doubly negated world. I found it an instructive exercise to deduce double negation elimination from Peirce's Law, by using Peirce with $q$ specialised to False to get $((p \to \mathrm{False}) \to p) \to p$, i.e. $(\neg p \to p) \to p$, and then using the chain of implications $\neg\neg p \to (\neg p \to p) \to p$; this becomes a function composition below. I will leave filling in the
sorry in the sublemma h_aux : ¬¬p → (¬p → p) below as an exercise. As a hint, one solution begins with intro h_nn h_n to get hold of values of the input types ¬¬p and ¬p; now compare the earlier proof by contradiction.
lemma peirce_implies_double_negation_elimination
: (∀ r q : Prop, ((r → q) → r) → r)
→ (∀ p : Prop, ¬¬p → p) := by
intro peirce p
specialize peirce p False
-- current state is p : Prop,
-- peirce : ((p → False) → p) → p, goal ¬¬p → p
have h_aux : ¬¬p → (¬p → p) := by sorry
exact fun h_nnp => peirce (h_aux h_nnp)
This proof is probably comically long-winded to experts but no matter. At least I now have a better appreciation for why the first year undergraduates doing my introduction to pure maths course used to write proofs that began with a confident ‘suppose for a contradiction’, continued with a contrapositive, flirted perhaps with induction, and so on and so on, until eventually one reached the meat of the proof, which was usually a one line direct proof of the required proposition.
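Returning to callCC for a moment: a tiny runnable illustration, using the Cont monad from Control.Monad.Cont (in the mtl package, a boot library shipped with GHC), may make it a little less dreadful. The function name is my own:

```haskell
import Control.Monad.Cont

-- If x exceeds 10 we escape early through the continuation k,
-- skipping the doubling; otherwise we fall through to return.
escapeDemo :: Int -> Int
escapeDemo x = runCont (callCC body) id
  where
    body k = if x > 10 then k x else return (x * 2)
```

Note how the continuation k never 'returns' to its call site: invoking it abandons the rest of the computation, which is what makes its Peirce-shaped type possible.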
Types of types
Whereas in Haskell there are values, types and kinds (but without extensions the only kind that matters is *, the common kind of all types), in Lean 4, in order to avoid Girard's paradox, there is an infinite hierarchy. (The logical system of Principia Mathematica is, I believe, similar.) Nearly at the bottom of the hierarchy are values, just like in Haskell, such as the numbers 3 or 1.234, the list
[1,2,3] or the string literal "hello". As in Haskell, these values have various types, for instance Nat for 3, ℝ for 1.234, List Nat for [1,2,3] and String for "hello". (Unlike Haskell, strings are not definitionally the same as lists of characters, so String is more like Haskell’s efficient Text data type.) Thus 3 : Nat, 1.234 : ℝ, [1,2,3] : List Nat and "hello" : String. Each of Nat, ℝ, List Nat, String is a type. The common type of all these types is Type, which can also be notated Type 0. The hierarchy then proceeds: the type of Type is Type 1, the type of Type 1 is Type 2, and so on. Off to one side, but at the lowest level of all, are values of a further type: Prop. This is the only type we have used so far. The type of Prop is Type. Thus
10 : Nat : Type : Type 1 : Type 2
h_p : p → p : Prop : Type : Type 1 : Type 2
are chains of valid type signatures (supposing that h_p and p are in scope) and
variable (p q : Prop)
#check p → (q → p)
#check p → q
#check p → p
prints out p → (q → p) : Prop, p → q : Prop, and p → p : Prop, as expected. (Note the second is not an inhabited type.) To avoid having to inspect the interactive output, one can include the expected output at the end, using an extra pair of parentheses. Thus
#check (p → (q → p) : Prop)
is a suitable replacement for the first check; any mismatch gets highlighted in VSCode with the usual red cross. Some of the claims above on types of types can be confirmed using
#check (3 : Nat)
#check (Nat : Type)
#check (Prop : Type)
#check (Type : Type 1)
Impredicativity of the proposition type
One subtle point is that normally the type of a function type α → β, where α : Type i and β : Type j, is Type k where k is the maximum of i and j. But if β : Prop then no matter how high α is in the hierarchy, the resulting type is Prop. Thus the types of the entities following := in the three examples below are respectively Type 2, Prop and Prop:
example {α : Type 1} {β : Type 2} : Type 2 := α → β
example : Prop := (n : Nat) → 0 + n = n
example {p : Prop} : Prop := (α : Type 4) → p
This property, 'impredicativity of Prop', is needed below so that our functions of dependent type are propositions, and so that compound proofs do not quickly leave the world of propositions.
Proof irrelevance
I have the strong feeling that there can be essentially different proofs of the same proposition. For example, the Fundamental Theorem of Algebra can be proved by induction on the 2-power dividing the degree of the polynomial, using only the intermediate value theorem from basic calculus, or instead as a corollary of Liouville’s theorem from complex analysis that a bounded entire function is constant. Lean 4 would disagree:
theorem proof_irrelevance {p : Prop} (h_p h'_p : p)
: h_p = h'_p := by rfl
Here rfl can, I think, be read as ‘are reflexively equal’. The analogue in Haskell would be a data type P that is made an instance of Eq by instance Eq P where _ == _ = True.
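Spelled out, that Haskell analogue might look as follows; the constructor names are my invention, standing for the two proofs of the Fundamental Theorem of Algebra just mentioned:

```haskell
-- Two distinguishable 'proofs' which the Eq instance nevertheless
-- declares equal, mimicking Lean's proof irrelevance.
data P = ProofByIVT | ProofByLiouville

instance Eq P where
  _ == _ = True
```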
Aside on curly braces in types
The block {p q r : Prop} in the first example lukasiewicz2 declares that p, q, r are implicitly of type Prop; this block can be omitted, if one first declares variable (p q r : Prop) at the top of the file, as in the check example above. This is a bit like declaring p, q, r to be global types, each themselves of type Prop; compare a global declaration const int x = 3 of global values in a traditional imperative language such as C. Lean also has an analogue of Haskell’s type classes, for which square brackets are used. For instance [Ring R] means that the compiler should know (probably from some other file) that R has the structure of a ring. We won’t need type classes at all below.
Function types and forall
So far we have stayed in the world of propositional logic (which, unlike predicate logic, does not have quantification), and correspondingly, except for double-negation elimination, everything we have done in Lean can also be done in Haskell. Function types, or Π-types, are the key place where the Lean type system is far more expressive. Let me show how one can discover the syntax for oneself. Using the tautology $\forall r,\, r \to r$, the interactive environment prints out the identical output
∀ (r : Prop), r → r : Prop for both the #check queries below.
#check (r : Prop) → r → r
#check ∀ r : Prop, r → r
This is not the same as the output p → p : Prop of #check p → p, which is only valid thanks to the variable declaration variable (p q : Prop) above. Thus ∀ r : Prop, r → r is the type of functions whose
- first argument is a proposition, called r (say)
- return value has type the proposition r → r
Or equivalently, ∀ r : Prop, r → r is the type of functions whose
- first argument is a proposition, called r (as it happens)
- second argument is a value of type r (let's call it h_r)
- return value has type r (it must be h_r).
As confirmation, here are Lean proofs of the tautology $p \to p$, first using implicit declaration as before, and then written with each of these interpretations in mind:
lemma self_implication {p : Prop} : p → p := id
lemma self_implication' : (∀ r : Prop, r → r) :=
by intro r; exact id
lemma self_implication'' : (r : Prop) → r → r :=
fun r h_r => h_r
Both primed forms have identical type signatures (when read by the compiler) and the values defined, i.e. the identity functions r → r for each r : Prop, are definitionally equivalent. The compiler, at least with my default settings, will object that the argument r in the final function form is unused; it is a nice touch that the output is then still further cluttered by the instruction set_option linter.unusedVariables false on how to declutter it. But fair enough: it would be more idiomatic to write
lemma self_implication''' : (r : Prop) → r → r :=
fun _ h_r => h_r
using the same underscore notation as Haskell for the ‘function hole’. Yet further definitions (i.e. the part following :=) are possible, for instance,
lemma self_implication'''' : (r : Prop) → r → r :=
by simp
lemma self_implication''''' : (r : Prop) → r → r :=
by simp?
both succeed, and the latter even tells us how it did it:
Try this: simp only [imp_self, implies_true]
Well, if you really must. I think I prefer the identity function.
Dependent types
Observe that the type of the output of self_implication'' depends explicitly on the first argument: in the first slot, r, of type Prop, is the proposition to be shown to imply itself, whereas in the second and third slots, r is the type. The nearest we can get to the type signature in Haskell, supposing a type Prop is defined, is
self_implication_Haskell :: Prop → p → p -- what's p?
except this entirely fails to capture that p is not just a proposition, but in fact the first argument of the function. Instead the type signature above specifies a function taking an arbitrary proposition to a fully polymorphic function p → p, with no connection between the type variable p and the proposition. (To underline this point, equivalent Haskell type signatures in which p appears within the scope of a universal quantifier are :: Prop -> forall p. p -> p and :: forall p. Prop -> p -> p; you might need to enable the language extension ExplicitForAll for this to be accepted, but it seems that it is on by default in ghci-9.6.7.) Of course, since we can give a Curry–Howard proof of self-implication in Haskell, there is an acceptable definition of self_implication_Haskell, namely \_ h_p -> h_p, but from our new dependent-type perspective, this is just a potentially confusing stroke of good luck.
Propositions indexed by natural numbers as a dependent type
For an example where it is not just the type signature but the whole function that cannot be realised in Haskell (at least without special tricks, see the final section below), we consider propositions parametrised by natural numbers. The following are equivalent declarations of a function that takes as input a natural number n, of type Nat, and returns a value of type 0 + n = n, i.e. a proof that adding zero to n gives n.
lemma zero_add_nat : ∀ n : Nat, 0 + n = n := by sorry
lemma zero_add_nat' : (n : Nat) → (0 + n = n) :=
by sorry
In both cases one suitable replacement for sorry is intro n; ring_nf; after introducing n, the target is 0 + n = n, which falls to the ring normal form tactic. Showing a better appreciation of Mathlib, we could also use intro n; exact zero_add n, or, showing a better appreciation of Peano arithmetic, we could use
lemma zero_add_nat'' : (n : Nat) → 0 + n = n := by
intro n; cases n
-- zero case 0 + 0 = 0
exact (add_zero 0)
-- succ case: ⊢ 0 + (n' + 1) = n' + 1
rename_i n
-- Nat.add_assoc (n m k : ℕ) : n+m+k = n+(m+k)
rewrite [← Nat.add_assoc]
-- goal is now 0 + n + 1 = n + 1
-- which is implicitly (0 + n) + 1 = n + 1
rewrite [zero_add_nat'' n]
rfl
Above I edited the Lean output to use n' rather than the dagger from Lean, which did not come through well on WordPress. It would be more idiomatic to replace cases n with induction n, binding hn to the inductive hypothesis in the succ case. Also it is standard to use rw rather than rewrite, combining multiple rewrites in one line. Since rw includes a weak form of rfl, the final line is then redundant. With these changes we get something you might recognise from the Natural Number Game.
lemma zero_add_nat''' : (n : Nat) → 0 + n = n := by
intro n; induction n with
| zero => exact (add_zero 0)
| succ n hn => rw [← Nat.add_assoc, hn]
This is also a good example of the impredicativity of Prop mentioned in the section above on the type hierarchy, where we saw that example : Prop := (n : Nat) → 0 + n = n type checks. Correspondingly, the type of zero_add_nat, itself a function type, has type Prop. This can be verified, taking great care not to confuse types on the left of := with types (as values) on the right, by
def type_of_zero_add_nat := ∀ n : Nat, 0 + n = n
#check (type_of_zero_add_nat : Prop)
Pi-types
Returning to the type signatures, notice that both zero_add_nat : (n : Nat) → 0 + n = n and
lemma self_implication''' : (p : Prop) → p → p
:= fun _ h_p => h_p
(quoted from above, with r renamed to p) are of the general form
lemma piTest {N : Type} {f : N → Prop} : (n : N) → f n
:= by sorry
which does indeed type check. (The triangle warning is from the use of sorry, which is of course the only way to complete the definition.) For more on the Π-type, see 'Function types (a.k.a. Pi types)' in Buzzard's Formalising mathematics in Lean: one interesting remark I'll trail here is that a Π-type is analogous to a tangent vector field on a manifold, in that each value of the field lies in a different (vector) space: the tangent space at that point. Anyway, we have now seen the most general type of a Lean function. Buzzard goes on to discuss inductive types (analogous to algebraic data types in Haskell) and quotient types (with no Haskell analogue, beyond what one could fake using GADTs).
This is the end of the Lean part of this post, and probably a very good place to stop reading. Thank you for getting this far!
Playing the natural number game in Haskell with singletons
There is a website out there with about 25 ‘design patterns’, all of which as far as I can tell are ingenious ways to work around the limitations of the type systems in more traditional languages, of which the prime miscreant seems to be Java. So it is with some dismay that I learned from the wonderful Haskell Unfolder podcast that the singleton design pattern is a good way to bring some of the expressiveness of dependent types to Haskell.
We define Peano naturals as a data type
data Nat = Zero | Succ Nat deriving (Eq, Show)
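Before going type-level, it may help to record the value-level addition that the type-level development below mirrors; this is my own sketch (the data declaration is repeated so the block stands alone), recursing on the right argument in line with the conventions used in the rest of the section:

```haskell
data Nat = Zero | Succ Nat deriving (Eq, Show)

-- Addition recursing on the right argument:
-- m + 0 = m and m + succ n = succ (m + n).
add :: Nat -> Nat -> Nat
add m Zero = m
add m (Succ n) = Succ (add m n)
```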
If, like me, you often mistype a constructor when you mean a type, as in makeZero :: Zero as opposed to the correct makeZero :: Nat (with definition makeZero = Zero), then you will have got rather used to seeing variations on the theme of 'Not in scope: type constructor or class Zero. A data constructor of that name is in scope; did you mean DataKinds?' Well the great moment has finally arrived, when we do, in fact, mean DataKinds! (Seriously though, this is a nice example of why shipping without a million extensions enabled is the right choice for producing understandable error messages.) Using this extension, we get access to type-level naturals, written as 'Zero,
type One = 'Succ 'Zero
and so on. Confusingly the quotes can be omitted, because (unlike this poor human) the compiler always knows what is at the value level and what is at the type level. Here I am including the quotes every time they are appropriate. The three further extensions required appear as comments in the next code block.
To encode the proposition $m + n = p$ at the type level, we introduce a new generalized algebraic data type (using the GADTs extension) that has 'wired in' the Peano axioms that $m + \mathrm{succ}\, n = \mathrm{succ}\,(m + n)$ for all $n$ and, by definition, $m + 0 = m$. (Note that addition on the right is idiomatic, because it generalizes to ordinals in the usual way: $\omega + 1$ is the successor ordinal to $\omega$, whereas $1 + \omega = \omega$.)
data Add :: Nat -> Nat -> Nat -> Type where
-- Uses GADTs and KindSignatures
AddZero :: Add m 'Zero m
AddSucc :: Add m n p -> Add m ('Succ n) ('Succ p)
deriving instance Show (Add m n p)
deriving instance Eq (Add m n p)
-- Uses StandaloneDeriving
We now use the Haskell compiler to prove that $0 + n = n$ for all $n$. What we want is to define a function
zeroAdd :: (n : Nat) -> Add Zero n n such that zeroAdd n has type Add Zero n n. But note that in zeroAdd n, n is a value, whereas in Add Zero n n it is a type. Thus the type of the output depends on a value in the input, exactly as expected for a dependent type. And this is not possible in Haskell, no matter how many extensions one turns on.
What we need is a bridge from values to types. For instance, we need a value of type SomeNewDataType One such that, when we pattern match on it, we know that the type parameter is One (and not 'Zero, or 'Succ ('Succ 'Zero), or …). This is what singletons do for us.
data TypeValueDict (n :: Nat) where
TVZero :: TypeValueDict 'Zero
TVSucc :: TypeValueDict n -> TypeValueDict ('Succ n)
As an illustration of the bridge, I offer
showTypeFromDict :: TypeValueDict nType
-> String
showTypeFromDict TVZero = "'Zero"
showTypeFromDict (TVSucc nValue) =
"'Succ " ++ showTypeFromDict nValue
tvOneTypeString = showTypeFromDict
(TVSucc TVZero :: TypeValueDict One)
-- no quote for One, it is a type (well really a Nat)
where the output is the string "'Succ 'Zero"; earlier we declared One as a type synonym for 'Succ 'Zero; it has type Nat.
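The bridge also runs in the opposite direction: pattern matching on the dictionary recovers the ordinary value it encodes. A sketch (the function name is mine; the declarations are repeated so the block compiles on its own):

```haskell
{-# LANGUAGE DataKinds, GADTs, KindSignatures #-}
data Nat = Zero | Succ Nat deriving (Eq, Show)

data TypeValueDict (n :: Nat) where
  TVZero :: TypeValueDict 'Zero
  TVSucc :: TypeValueDict n -> TypeValueDict ('Succ n)

-- Forget the type index, recovering the plain Peano numeral.
fromDict :: TypeValueDict n -> Nat
fromDict TVZero = Zero
fromDict (TVSucc d) = Succ (fromDict d)
```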
The proof that $0 + n = n$ is now routine: note that the illegal type signature
zeroAdd :: (n : Nat) -> Add Zero n n is made legitimate using the dictionary.
zeroAdd :: TypeValueDict n -> Add Zero n n
zeroAdd TVZero = AddZero
zeroAdd (TVSucc r) = AddSucc (zeroAdd r)
zeroAddOne = zeroAdd (TVSucc TVZero) :: Add Zero One One
The fact that this compiles is a proof, at the type level, that $0 + n = n$, using only the Peano axioms. A truly expert Haskell programmer would probably refrain from running the code at all. Viewed from a sufficiently great distance, it is the same as the inductive proof in Lean.
Evaluating Gemini 3.0 Pro ‘Deep Think’ on American Mathematical Monthly Problems
January 2, 2026

The purpose of this post is to summarise my findings evaluating Gemini 3.0 Pro 'Deep Think' on the problems at the back of the May 2025 issue of the American Mathematical Monthly. My evaluation is available from my website, together with the transcript of all my interactions with Gemini. I used Gemini because, while it's probably second to ChatGPT 5.2 Pro at solving maths problems, I find it marginally more interesting to interact with.
I chose the May 2025 issue at random, but subject to the constraints that the deadline for submission of answers should have passed, and the solutions not yet be published. I think the problems in this issue are representative. None of them require exceptional ingenuity, and only 12535 (on binomial coefficients) could, I think, be said to be routine, even for an expert. In the table below, the headings are short for ‘Could I have done it quicker without the LLM?’, ‘Did the LLM make it fun?’ and ‘Fudge factor’, which I hope is self-explanatory. Stars mark the two problems to which I had already submitted solutions.

This evaluation is of course highly subjective, but I hope you will find it of some interest. One highlight for me was when both Gemini 3.0 Pro 'Deep Think' and ChatGPT 5.2 'Extended Thinking' rated my solution to 12535 highly for 'aesthetic merit'. You can read Gemini's solution verbatim in the evaluation report linked above.
Before embarking on this project, I was surprised that I hadn’t already seen many similar exercises from other better mathematicians … but half-way through I realised how time-consuming it was, and my surprise ceased. But Gemini and ChatGPT are tools, and like any tool, we need to learn how to use them, double-edged though they may be. In this spirit, I know I have benefitted in my career from being an enthusiast for computer algebra, and for programming; for instance Haskell changed how I think about monads.
Thinking of LLMs as just another tool in our toolkit, we should be honest about their strengths as well as pointing out their weaknesses (laughter optional, but sometimes irresistible). Some strengths I can identify are their
- encyclopedic command of the basic literature;
- ability to make connections and find analogies between different areas of mathematics;
- ability, depending on the subfield, to do maths: of the seven problems, Gemini gave essentially correct solutions to six.
In particular I have found Gemini and its colleagues excellent at definite integration (see 12534, but please also read the aside: ‘How LLMs sometimes cheat at definite integrals’) and at proving identities in enumerative combinatorics (see my write-up for 12535, but I have also had success with significantly harder problems). The weaknesses we all know: see my write-up of 12531 for an example of the superficially plausible ‘proof’ that they love to produce, complete with (mostly) spurious literature references, that turns out on time-consuming inspection to be fundamentally flawed. And as the ‘fun’ column shows, I am confident that in a narrow majority of cases, I would have enjoyed the problem more if I hadn’t used the LLM at all.
I think this is what really worries me: when in a year or so, state-of-the-art LLMs can reliably solve almost every American Mathematical Monthly problem, and even write aesthetically pleasing proofs of the binomial identities I like so much, what place will be left for mathematicians like me that enjoy the shallows as well as the depths of mathematics? Well, this is my problem, and there could be worse ones to have.
Finally, I hope I am not opening up a huge can of worms with this blogpost. For complete completeness, I will mention, although it is anyway on public record, that I contributed maths problems to the FrontierMath benchmark, organized by Epoch AI. I also set and solve maths problems for Surge AI; they have nothing to do with this post. I have no connection with Google or OpenAI and I no longer do any work for Epoch AI.
The q-Vandermonde Theorem by counting subspaces
October 22, 2025

Fix throughout a prime power $q$. We define the $q$-binomial coefficient $\binom{n}{k}_q$ to be the number of $k$-dimensional subspaces of $\mathbb{F}_q^n$. If $k < 0$ or $k > n$ then, by definition, the $q$-binomial coefficient is zero. The purpose of this post is to use this setting to prove one version of the $q$-Vandermonde convolution: en route we prove three results of increasing generality counting complementary subspaces.
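The counts in this post can be sanity-checked numerically. Here is a quick Haskell sketch (my addition, not part of the post) using the standard product formula $\binom{n}{k}_q = \prod_{i=0}^{k-1} (q^{n-i}-1)/(q^{k-i}-1)$:

```haskell
-- Number of k-dimensional subspaces of F_q^n, computed via the
-- standard product formula; the integer division is exact.
qBinom :: Integer -> Integer -> Integer -> Integer
qBinom q n k
  | k < 0 || k > n = 0
  | otherwise =
      product [q ^ (n - i) - 1 | i <- [0 .. k - 1]]
        `div` product [q ^ (k - i) - 1 | i <- [0 .. k - 1]]
```

For example, qBinom 2 2 1 == 3 counts the three lines (1-dimensional subspaces) of $\mathbb{F}_2^2$.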
In the proofs below we use the notation $V$ for a varying subspace that is counted in part of the proof; if the subspace is instead fixed, often as part of the hypotheses, we use $U$ or $W$.
Lemma. Let $U$ be a given $k$-dimensional subspace of $\mathbb{F}_q^n$. The number of $(n-k)$-dimensional subspaces of $\mathbb{F}_q^n$ complementary to $U$ is $q^{k(n-k)}$.

Proof. Let $e_1, \ldots, e_n$ be the canonical basis of $\mathbb{F}_q^n$ of unit vectors. We may assume without loss of generality that $U$ is spanned by the first $k$ unit vectors. If $V$ is a complementary subspace then $V$ has a unique basis (up to order) of the form $e_{k+1} + u_1, \ldots, e_n + u_{n-k}$ where $u_1, \ldots, u_{n-k} \in U$. There are $q^k$ choices for each of $u_1, \ldots, u_{n-k}$ and so $q^{k(n-k)}$ subspaces in total.

If the claim about the basis is not obvious, I suggest thinking about row-reduction on the matrix whose rows are $e_1, \ldots, e_k$ followed by a basis for $V$. We already have pivots in the first $k$ columns, so these rows are already perfect. Each of the remaining $n-k$ columns must be a pivot column, so after row operations (including reordering) on rows $k+1, \ldots, n$ we obtain a matrix $\left(\begin{smallmatrix} I_k & 0 \\ A & I_{n-k} \end{smallmatrix}\right)$ in which different choices for the matrix $A$ correspond to distinct subspaces $V$.
Proposition. Let $W$ be a given $k$-dimensional subspace of $\mathbb{F}_q^n$. The number of $m$-dimensional subspaces $V$ of $\mathbb{F}_q^n$ such that $V \cap W = 0$ is $\binom{n-k}{m}_q q^{km}$.

Proof. Observe that $V + W$ is a $(k+m)$-dimensional subspace of $\mathbb{F}_q^n$ containing $W$. Such subspaces are in bijection with the $m$-dimensional subspaces of $\mathbb{F}_q^n / W$. Hence there are $\binom{n-k}{m}_q$ choices for a subspace $X$ in which $W$ and the required $V$ are complementary. By the previous lemma, there are $q^{km}$ choices for the complement $V$ to $W$ in $X$.
In the following corollary note that, by hypothesis, all the $m$-dimensional subspaces we count contain the given $j$-dimensional subspace $U$.

Corollary. Let $U \le W$ be given $j$- and $k$-dimensional subspaces of $\mathbb{F}_q^n$. The number of $m$-dimensional subspaces $V$ of $\mathbb{F}_q^n$ such that $V \cap W = U$ is $\binom{n-k}{m-j}_q q^{(k-j)(m-j)}$.
Proof. By the previous proposition, applied to $W/U \le \mathbb{F}_q^n/U$, there are $\binom{n-k}{m-j}_q q^{(k-j)(m-j)}$ complementary subspaces $\bar{V}$ to $W/U$. Now again use that subspaces of the quotient space $\mathbb{F}_q^n/U$ are in bijection with subspaces of $\mathbb{F}_q^n$ containing $U$.
Theorem. [$q$-Vandermonde theorem] We have
$$\binom{n}{m}_q = \sum_j \binom{k}{j}_q \binom{n-k}{m-j}_q q^{(k-j)(m-j)}.$$
Proof. Fix a $k$-dimensional subspace $W$ of $\mathbb{F}_q^n$. We count $m$-dimensional subspaces $V$ having intersection of dimension $j$ with $W$; by the corollary, taking $U = V \cap W$ and choosing $U$ in any of the $\binom{k}{j}_q$ possible ways, there are $\binom{k}{j}_q \binom{n-k}{m-j}_q q^{(k-j)(m-j)}$ such subspaces. Summing over $j$ gives the result.
In the final step, we may assume that $\max(0, m-(n-k)) \le j \le \min(k, m)$, else $V$ cannot have an intersection of dimension $j$ with $W$. Correspondingly, the summands for $j$ not in this range in the $q$-Vandermonde theorem are zero.
Corollary. [$q$-Vandermonde theorem, alternative version] We have
$$\binom{n}{m}_q = \sum_j \binom{k}{j}_q \binom{n-k}{m-j}_q q^{j(n-k-m+j)}.$$
Proof. Substitute $k' = n - k$ and $j' = m - j$ in the original version, cancel in a few places, swap the order of the binomial coefficients in each product, and then erase the primes.
The continuation monad: a mathematical introduction
August 17, 2025 In the Haskell category, objects are types, and morphisms are Haskell functions. The continuation monad for a fixed return type r is the functor m sending a type a to the type m a of functions (a -> r) -> r. The purpose of this post is to use vector space duality to motivate the definitions of the monadic operations return :: a -> m a and join :: m (m a) -> m a. We show that
- return, the unit in the monad, is the analogue of the canonical inclusion $V \to V^{\star\star}$ of a vector space in its double dual, and
- join, the monad multiplication, is the analogue of the canonical surjection $V^{\star\star\star\star} \to V^{\star\star}$ induced by applying the unit on $V^\star$ to get a canonical inclusion $V^\star \to V^{\star\star\star}$.
Having constructed the continuation monad, and checked one of the two monad laws, we then use the Haskell identity
ma >>= f = join (fmap f ma)
to define the monadic bind operation. (Here fmap :: (a -> c) -> m a -> m c is what the continuation monad does on morphisms.) By unpacking this definition into one that is self-contained, we interpret monadic bind, usually denoted >>=, as a way to compose computations written in continuation passing style.
Thus the order in this post is the opposite of most (all?) of the many good introductions to the continuation monad that can be found on the web: I hope this will make this post of interest to mathematicians, and also perhaps to expert computer scientists as an illustration of how a non-expert mathematician came (at last) to some understanding of the ideas. We end with a few examples of Haskell programs written in the continuation monad, or equivalently, in continuation passing style.
A Haskell module with all the Haskell below and some extras is available from my website.
The double dual monad
Fix a field $F$ and let Vect be the category of $F$-vector spaces. There is a canonical inclusion $\eta_V : V \to V^{\star\star}$ defined by '$v$ maps to evaluate at $v$', or in symbols
$$\eta_V(v)(\theta) = \theta(v)$$
for $v \in V$ and $\theta \in V^\star$. Thus $\eta$ defines a natural transformation from the identity functor on Vect to the double dual functor $V \mapsto V^{\star\star}$. This will turn out to be the unit in the double dual monad.
Remark. In general $V^{\star\star}$ is a vastly bigger vector space than $V$: for instance if $V$ is the vector space of all $F$-valued sequences with finite support, then $V^\star$ is the vector space of all $F$-valued sequences. Thus $V$ has a canonical countable basis, whereas, at least when $F$ is countable, $V^\star$ has dimension equal to the continuum $\mathfrak{c}$. Moreover $V^{\star\star}$ then has dimension $2^{\mathfrak{c}}$. (Even the existence of bases for $V^\star$ and $V^{\star\star}$ depends on set theoretic foundations: when $F = \mathbb{F}_2$ it is implied by the Boolean Prime Ideal Theorem, a weak form of the Axiom of Choice.) For this reason we have to reason formally, mainly by manipulating symbols, rather than use matrices or bases. But this is a good thing: for instance it means our formulae for the monad unit and monad multiplication translate entirely routinely to define the continuation monad in the category Hask.
The unit tells us how to multiply
To make the double dual functor a monad, we require a multiplication, that is, a natural transformation $\mu$ from the composite of the double dual functor with itself to the double dual functor. Thus for each vector space $V$, we require a 'naturally defined' linear map $\mu_V : V^{\star\star\star\star} \to V^{\star\star}$. As already said, we obtain this as the dual map to $\eta_{V^\star} : V^\star \to V^{\star\star\star}$. That is,
$$\mu_V = (\eta_{V^\star})^\star.$$
This needs some unpacking. Explicitly, $\eta_{V^\star}$ is the linear map $\theta \mapsto$ 'evaluate at $\theta$' for $\theta \in V^\star$. Generally, given a linear map $f : X \to Y$, the dual map $f^\star : Y^\star \to X^\star$ is given by 'restriction along $f$': that is, we regard a dual vector $\psi \in Y^\star$ as an element of $X^\star$ by mapping $X$ into $Y$ by $f$, and then applying $\psi$; in symbols $f^\star(\psi) = \psi \circ f$. Therefore the dual map to $\eta_{V^\star}$, which is the $V$ component of the natural transformation $\mu$, is
$$\mu_V(\Xi) = \Xi \circ \eta_{V^\star} \quad \text{for } \Xi \in V^{\star\star\star\star}.$$
If you have an intuitive feeling for what this means, well, great. I hope in a later post to show that $\mu$ has an intuitive interpretation in the case $F = \mathbb{F}_2$, using propositional logic to identify elements of $V^{\star\star}$ with boolean formulae in variables indexed by $V^\star$; multiplication then becomes the observation that interpreting the variables in a boolean formula as yet further boolean formulae does not leave the space of boolean formulae. Here I will note that the definition of $\mu_V$ does at least typecheck: for $\theta \in V^\star$, the map $\eta_{V^\star}(\theta)$ is an element of $V^{\star\star\star}$ and therefore in the domain of $\Xi$, so $\Xi \circ \eta_{V^\star}$ is a linear map $V^\star \to F$, that is, an element of $V^{\star\star}$. Naturality of $\mu$ follows very neatly from the naturality of $\eta$ by duality: see the commutative diagrams below.

[Commutative diagrams for the naturality of $\eta$ and $\mu$ omitted.]
But why is it a monad?
What we haven't shown is that the triple $(\star\star, \eta, \mu)$ defines a monad. For this we need to check the monad laws shown in the commutative diagrams below.

[Commutative diagrams for the monad identity and associativity laws omitted.]

The first is the identity law that $\mu_V \circ \eta_{V^{\star\star}} = \mathrm{id}_{V^{\star\star}}$. This has a short proof, which, as expected from the remark about formalism above, comes down to juggling the order of symbols until everything works out. Recall that $\eta_{V^{\star\star}}$ is the canonical embedding of $V^{\star\star}$ into $V^{\star\star\star\star}$. Thus if we take $\phi \in V^{\star\star}$ then $\eta_{V^{\star\star}}(\phi)$ is the element of $V^{\star\star\star\star}$ defined by
$$\eta_{V^{\star\star}}(\phi)(\Theta) = \Theta(\phi) \quad \text{for } \Theta \in V^{\star\star\star}.$$
Moreover, $\mu_V$ is defined by regarding elements of $V^{\star\star\star\star}$ as elements of $V^{\star\star}$ by restriction along the canonical inclusion $\eta_{V^\star} : V^\star \to V^{\star\star\star}$. This map sends $\eta_{V^{\star\star}}(\phi)$ to the linear map
$$\theta \mapsto \eta_{V^{\star\star}}(\phi)\bigl(\eta_{V^\star}(\theta)\bigr) = \eta_{V^\star}(\theta)(\phi) = \phi(\theta)$$
in $V^{\star\star}$. Therefore $\mu_V\bigl(\eta_{V^{\star\star}}(\phi)\bigr) = \phi$ and so $\mu_V \circ \eta_{V^{\star\star}}$ is indeed the identity of $V^{\star\star}$.
A proof in this style of the second, associativity, law seems distinctly fiddly. Making the feeble excuse that there is no 'super case' Greek alphabet to represent elements of $V^{\star\star\star\star\star\star}$, I will omit it. I plan in a sequel to this post to show that the double dual monad is the monad induced by the adjunction between the contravariant dual functor and itself; the identity monad law then becomes the triangle law for the right-hand functor in the adjunction, and the associativity law becomes an immediate application of the interchange law. If time permits I'll include the proof by string diagrams: to my surprise I can't immediately find this on the web, except in this nice video from the Catsters: see the blackboard at the 9 minute mark.
But how could it not be a monad?
As a final remark, for either law one could argue 'how else could it possibly work?'. Thus in the first, we know that $\mu_V \circ \eta_{V^{\star\star}}$ is some kind of linear map $V^{\star\star} \to V^{\star\star}$, and since all we have to hand is the identity, maybe it should be obvious from naturality that the map is the identity itself? It would be intriguing if this could be made into a rigorous argument either working in Vect or directly in Hask, perhaps using ideas from Wadler's paper Theorems for free.
The continuation monad in Haskell
We now translate the definitions of the unit and multiplication from the double dual monad into Haskell. This is completely routine. For the unit, we translate $\eta_V$, as defined above by $\eta_V(x)(k) = k(x)$, to
return x = \k -> k x -- return :: a -> ((a -> r) -> r)
It might be more idiomatic to write this as return = flip ($). Either way we get return :: a -> ((a -> r) -> r).
You can join if you can return
For multiplication, we translate $\mu_V$, as defined above by $\mu_V(\Xi)(\theta) = \Xi\bigl(\eta_{V^\star}(\theta)\bigr)$, to
join :: ((((a -> r) -> r) -> r) -> r) -> ((a -> r) -> r)
join w = \k -> w (return k)
Note that this uses return, capturing the sense seen above in which the unit for the double dual monad tells us how to define multiplication. Unpacked to be self-contained join becomes
join w = \k -> w (\φ -> φ k)
which thanks to Haskell’s support for Unicode is still valid code. It is a useful exercise to prove the identity monad law by equation-style reasoning from these definitions.
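As a sanity check (not a proof), the identity law can also be tested extensionally in Haskell; in this sketch of mine the names ret, joinM and lawHolds are chosen to avoid clashing with the Prelude.

```haskell
-- the continuation monad without a wrapper type: a 'runner' (a -> r) -> r
type M r a = (a -> r) -> r

ret :: a -> M r a
ret x = \k -> k x

joinM :: M r (M r a) -> M r a
joinM w = \k -> w (\phi -> phi k)

-- the identity law joinM . ret = id, tested extensionally: both sides,
-- applied to a runner m and a continuation k, give the same value of r
lawHolds :: Eq r => M r a -> (a -> r) -> Bool
lawHolds m k = joinM (ret m) k == m k
```

Of course lawHolds only samples the law at particular runners and continuations; the equational proof is the real exercise.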
You can bind if you can join
We now use the Haskell identity ma >>= f = join (fmap f ma) mentioned in the introduction to define bind. First of course we must define fmap, giving the Functor instance for the continuation monad. The double dual functor sends a linear map $f : V \to W$ to $f^{\star\star} : V^{\star\star} \to W^{\star\star}$, and we have
$$f^{\star\star}(\phi) = \phi \circ f^\star$$
or unpacked fully, $f^{\star\star}(\phi)(k) = \phi(k \circ f)$ for $\phi \in V^{\star\star}$ and $k \in W^\star$. We therefore define
fmap :: (a -> b) -> ((a -> r) -> r) -> (b -> r) -> r
fmap f run_a = \k_b -> run_a (k_b . f)
and the compiler will now accept
bind run_a a_to_run_b = join $ fmap a_to_run_b run_a
(For why run_a is a reasonable name for values of the monadic type (a -> r) -> r see the final part of this post.) It is now a mechanical exercise to substitute the definitions fmap f run_a = \k_b -> run_a (k_b . f) and then join w = \k -> w (\φ -> φ k) to get a self-contained definition, which we tidy up in three further stages.
bind1 run_a a_to_run_b = join $ \k_b ->
run_a (k_b . a_to_run_b)
where join w = \k -> w (\φ -> φ k)
bind2 run_a a_to_run_b = \k -> (\k_b ->
run_a (k_b . a_to_run_b))(\φ -> φ k)
bind3 run_a a_to_run_b =
\k -> run_a ((\φ -> φ k) . a_to_run_b)
bind4 run_a a_to_run_b =
\k -> run_a (\x -> (a_to_run_b x) k)
In the final line, the parentheses around a_to_run_b x could be dropped. Incidentally, with the DeriveFunctor extension, ghc can derive the declaration for fmap just seen.
But don’t monads need an algebraic data type?
Answer: yes, if we want to take advantage of Haskell's do notation, and be consistent with Haskell's expected types for return, fmap, join and >>=. Below we define an algebraic data type RunCont with inner parameter r for the return type and outer parameter a, so that the 'curried type' RunCont r has kind * -> *, as a monad should. The constructor is called RC (it is also common to reuse the type name as the constructor) and the 'wrapped value' can be extracted using runCont :: RunCont r a -> (a -> r) -> r.
newtype RunCont r a = RC {runCont :: (a -> r) -> r}
For instance if a_to_run_b :: a -> RunCont r b then runCont (a_to_run_b x) has type (b -> r) -> r so is the correct replacement for a_to_run_b x in bind4. Wrapping return is even more routine, and we end up with
instance Monad (RunCont r) where
return x = RC (\k -> k x)
RC run_a >>= a_to_run_b = RC (\k ->
run_a (\x -> runCont (a_to_run_b x) k))
It is a nice fact that bind determines both fmap and join, so, after the unpacking above to avoid using fmap in the definition of bind, everything we need is in the declaration above. That said, versions of Haskell after 7.10 require separate instance declarations for Functor, which to illustrate this point we define by
instance Functor (RunCont r) where
fmap f (RC run_a) = (RC run_a) >>= (return . f)
and also for Applicative, which can be derived from bind, but since I find the result completely unintuitive, instead we’ll use the usual boilerplate
instance Applicative (RunCont r) where
pure = return
(<*>) = ap -- ap :: Monad m => m (a -> b) -> m a -> m b
recovering this instance from the Monad declaration. Incidentally, the Functor declaration also has a 'boilerplate' definition: fmap = liftM. I can't easily find on the web why the ghc authors made this code-breaking change (the page on the Haskell wiki has a dead link for a 2011 proposal along these lines that was rejected) but surely the rise of programming in the applicative style has something to do with it.
Why newtype?
I used newtype rather than data above to make RC strict in its value. Counterintuitively (to me at least) this means that overall the RunCont monad is lazier, because the compiler does not need to inspect the argument to RC to check it has type (a -> r) -> r. Ever since I implemented the Random state monad with data rather than newtype, and many weeks later found a puzzling instance of what seemed like Haskell being over-eager to evaluate, I have been able to remember this distinction. An alternative to using newtype, which emphasises this point, is data RunCont r a = RC {runCont :: !((a -> r) -> r)}. Note the explicit strictness declaration. There is a further benefit to newtype over data: the strictness and unique constructor together mean that the compiler can erase the constructors and so generate code that has no overhead; a related fact, which at first I found rather surprising, is that fully compiled Haskell code has 'no remembrance of types past'.
Continuation passing style is the computational model of the continuation monad
Not before time, we mention that the type a -> r is that of continuations: functions that take an input of type a and return the desired final output type of the entire computation, which we always denote r.
Callbacks
Such functions arise naturally as callbacks: 'function holes' left in the anticipation that another author or library will be able to fill in the gap. Consider for instance the following program intended to be useful when checking the Fibonacci identity $F_0 + F_1 + \cdots + F_n = F_{n+2} - 1$.
fibSum n = sum [fib k | k <- [0..n]]
Of course this won’t compile, because there is no fib function to hand. Rather than take time off to write one, we instead press on, by bringing fib into scope in the simplest possible way:
run_fibSum n fib = sum [fib k | k <- [0..n]]
-- run_fibSum :: Int -> (Int -> Integer) -> Integer
Here I've supposed fib has type Int -> Integer; this is reasonable because the answer might be too big for an Int, but surely the input won't be; the true reason though is to make a distinction between the types
a (here Int) and r (here Integer). Thus run_fibSum is a way of running the continuation fib to give the answer we want, and \n -> RC (run_fibSum n) is a value of type Int -> RunCont Integer Int, so suitable for use in our monad, most obviously as the second argument to bind >>=. For a minimal example, with exactly the same structure, take
run_fib n fib = fib n
-- run_fib :: Int -> (Int -> Integer) -> Integer
run_fib' n = \fib -> fib n
-- run_fib' :: Int -> (Int -> Integer) -> Integer
run_fib'' n = RC (\fib -> fib n)
-- run_fib'' :: Int -> RunCont Integer Int
where the final function could have been written simply as return n; as we have seen many times, the unit in the continuation monad is '$x$ maps to evaluate at $x$'. In this form we see clearly that run_fib doesn't really care that the intended continuation is the Fibonacci function; it could be any value of type Int -> Integer. I hope this explains why I've used RunCont as the name for the monadic type: its wrapped values are continuation runners of type (a -> r) -> r, and not continuations, of type a -> r. To my mind, using Cont is as bad as conflating elements of $V^\star$ (continuations) with elements of $V^{\star\star}$ (continuation runners).
Continuation passing style
Consider the following very simple Haskell functions addOne x = x+1 and timesTwo y = y * 2 of intended type Int -> Int. We can compose them to get timesTwoAfterAddOne = timesTwo . addOne. To make life harder, we imagine that addOne has to control the entire future course of our program, and to help it out, we’ll give a suitably extended version of it the continuation timesTwo as the second argument:
addOneCPS x timesTwo = timesTwo (x + 1)
and so timesTwoAfterAddOne x = addOneCPS x timesTwo. Of course this generalizes immediately to any continuation of type Int -> r; below I've renamed timesTwo to k (as in the derivation of bind above), but the function and type are the same:
addOneCPS :: Int -> (Int -> r) -> r
addOneCPS x k = k (x + 1)
Similarly we can write timesTwo in this style,
timesTwoCPS :: Int -> (Int -> r) -> r
timesTwoCPS y k = k (y * 2)
and now a composed function, still in CPS style so having a continuation that might perhaps be as simple as id :: Int -> Int (when r is Int) or show :: Int -> String (when r is String), is
timesTwoAfterAddOneCPS x k = addOneCPS x
(\y -> timesTwoCPS y k)
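Collected with type signatures, so that the fragment can be loaded and run, the CPS functions so far are:

```haskell
-- add one, in continuation passing style
addOneCPS :: Int -> (Int -> r) -> r
addOneCPS x k = k (x + 1)

-- double, in continuation passing style
timesTwoCPS :: Int -> (Int -> r) -> r
timesTwoCPS y k = k (y * 2)

-- the CPS composition: run addOneCPS, feeding its result to timesTwoCPS,
-- which finally calls the outer continuation k
timesTwoAfterAddOneCPS :: Int -> (Int -> r) -> r
timesTwoAfterAddOneCPS x k = addOneCPS x (\y -> timesTwoCPS y k)
```

For instance timesTwoAfterAddOneCPS 3 id is 8, and timesTwoAfterAddOneCPS 3 show is "8".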
Even allowing for the extra argument, this is far less attractive than the simple composition timesTwo . addOne we had earlier. But I am sure that by now you see where this is leading. Above our final version of monadic bind for the continuation monad (written without an algebraic data type) was, changing the dummy variable from x to y to avoid an imminent clash,
bind4 run_a a_to_run_b = \k ->
run_a (\y -> (a_to_run_b y) k)
Substitute addOneCPS x for run_a and timesTwoCPS for a_to_run_b and get
bind4 (addOneCPS x) timesTwoCPS = \k -> (addOneCPS x)
(\y -> (timesTwoCPS y) k)
which is the $\eta$-expansion of
timesTwoAfterAddOneCPS x. Therefore, when wrapped in the appropriate constructors, we can write timesTwoAfterAddOneCPS x = (addOneCPS x) >>= timesTwoCPS, and even use do notation:
timesTwoAfterAddOneCPS x =
do y <- addOneCPS x
-- addOneCPS x = RC (\k -> k (x + 1))
timesTwoCPS y
-- timesTwoCPS y = RC (\k -> k (y * 2))
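Putting the pieces together, here is a self-contained version of the instances and the do-notation example; I have written the Applicative instance directly rather than via ap, so the block needs no imports.

```haskell
newtype RunCont r a = RC { runCont :: (a -> r) -> r }

instance Functor (RunCont r) where
  fmap f (RC run_a) = RC (\k -> run_a (k . f))

instance Applicative (RunCont r) where
  pure x = RC (\k -> k x)
  RC run_f <*> RC run_a = RC (\k -> run_f (\f -> run_a (k . f)))

instance Monad (RunCont r) where
  RC run_a >>= a_to_run_b =
    RC (\k -> run_a (\x -> runCont (a_to_run_b x) k))

addOneCPS :: Int -> RunCont r Int
addOneCPS x = RC (\k -> k (x + 1))

timesTwoCPS :: Int -> RunCont r Int
timesTwoCPS y = RC (\k -> k (y * 2))

-- add one, then double, written with do notation
timesTwoAfterAddOneCPS :: Int -> RunCont r Int
timesTwoAfterAddOneCPS x = do
  y <- addOneCPS x
  timesTwoCPS y
```

Running the final computation with the identity continuation, runCont (timesTwoAfterAddOneCPS 3) id, gives 8.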
Recursion in continuation passing style
I’ll use the sum function as an example, even though it does not show Haskell in its best light. In fact, it is a notable ‘gotcha’ that both naive ways to write it use an amount of memory proportional to the length of the list. Worst of all is the right-associative fold
sumR xs = foldr (+) 0 xs
which means that sumR [1,2,3,4] expands to 1 + (2 + (3 + (4 + 0))) and then 4 + 0 is evaluated, and so on. (This can be checked using the linked Haskell code, which puts a trace on plus.) Still bad, but more easily fixed, is the left-associative fold
sumL xs = foldl (+) 0 xs
which means that sumL [1,2,3,4] expands to (((0 + 1) + 2) + 3) + 4, and then 0 + 1 is evaluated, and so on. The correct way to write it uses a strict left-fold, available as
foldl'; one can write one's own using the primitive seq, which tells the compiler 'I really want the first value computed to WHNF before you do anything more':
foldl' op init [] = init
foldl' op init (x : xs) = let next = init `op` x
in seq next (foldl' op next xs)
One could also replace init above with !init to force evaluation without using seq. Either way, the init value functions as an accumulator for the sum. In this connection it is worth noting that, for the sum function on common numerical types, WHNF means complete evaluation, but in general one has to fight harder to force full evaluation. For more on these issues see the post beginning To foldr, foldl or foldl’, that is the question! on the Haskell wiki.
The right-associative fold is easily converted to CPS-style:
sumRCPS [] k = k 0
sumRCPS (x : xs) k = sumRCPS xs (k . (+ x))
Thus if the list is empty we apply the continuation to 0, and otherwise we modify the continuation so that it first adds $x$, then does whatever it was going to do before. Thus
sumRCPS [1,2,3,4] id evaluates to sumRCPS [2,3,4] (id . (+1)), and then in three further steps to sumRCPS [] (id . (+1) . (+2) . (+3) . (+4)); now, as already seen, the continuation constructed above is applied to 0, giving
id . (+1) . (+2) . (+3) . (+4) $ 0 which in turn evaluates to 10. Note that although this might look like a left-associative fold, it does the same computation as the right-associative fold.
The only way I can see to convert the left-associative fold to CPS-style does all the work outside the continuation:
sumLCPS [] acc k = k acc
sumLCPS (x : xs) acc k = sumLCPS xs (acc + x) k
where again either seq or a strictness annotation !acc is needed to avoid a space blow-up. So rather disappointingly, CPS-style is no help at all in writing an efficient sum function.
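For concreteness, the strict version just mentioned might be written as follows (a sketch of mine using the BangPatterns extension, with the element type specialized to Integer):

```haskell
{-# LANGUAGE BangPatterns #-}

-- left fold in CPS with a strict accumulator: the bang pattern forces
-- acc at every step, so no chain of (+) thunks builds up
sumLCPS :: [Integer] -> Integer -> (Integer -> r) -> r
sumLCPS [] !acc k = k acc
sumLCPS (x : xs) !acc k = sumLCPS xs (acc + x) k
```

For example sumLCPS [1..100] 0 id is 5050, computed in constant space in the accumulator.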
An example where CPS-style is useful
The function takeUntil has the following naive implementation:
takeUntil :: (a -> Bool) -> [a] -> Maybe [a]
takeUntil p [] = Nothing
takeUntil p (x : xs)
| p x = Just []
| otherwise = x `mCons` takeUntil p xs
where x `mCons` Nothing = Nothing
x `mCons` Just ys = Just (x : ys)
For example, takeUntil (> 4) [1,2,3] evaluates to 1 `mCons` 2 `mCons` 3 `mCons` Nothing, and then to 1 `mCons` 2 `mCons` Nothing, then to 1 `mCons` Nothing and finally to Nothing. (Nothing will come of nothing, as King Lear so inaptly said.) Using continuation passing style, we can short-circuit the evaluation in the 'Nothing' case.
takeUntilCPS :: (a -> Bool) -> [a] ->
(Maybe [a] -> Maybe [a]) -> Maybe [a]
takeUntilCPS p [] _ = Nothing
takeUntilCPS p (x : xs) k
| p x = k (Just [])
| otherwise = let k' (Just ys) = k $ Just (x : ys)
in (takeUntilCPS p xs k')
Note we do not need a pattern match on ‘Nothing’ to define k' because if the input was Nothing then we’d have discarded the continuation and immediately returned Nothing. This is the first and only example in this post where we use the freedom in continuation passing style not just to modify the continuation but to discard it entirely.
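Here is a self-contained check that the two versions agree; following the naive definition, I take the p x case to return Just [], so that the element satisfying the predicate is excluded from the result.

```haskell
-- naive version: returns the prefix strictly before the first element
-- satisfying p, or Nothing if no element satisfies p
takeUntil :: (a -> Bool) -> [a] -> Maybe [a]
takeUntil p [] = Nothing
takeUntil p (x : xs)
  | p x       = Just []
  | otherwise = x `mCons` takeUntil p xs
  where
    mCons _ Nothing   = Nothing
    mCons y (Just ys) = Just (y : ys)

-- CPS version: the empty-list equation discards the continuation and
-- returns Nothing directly, short-circuiting the whole computation
takeUntilCPS :: (a -> Bool) -> [a] -> (Maybe [a] -> Maybe [a]) -> Maybe [a]
takeUntilCPS _ [] _ = Nothing
takeUntilCPS p (x : xs) k
  | p x       = k (Just [])
  | otherwise = takeUntilCPS p xs k'
  where
    k' (Just ys) = k (Just (x : ys))
    k' Nothing   = Nothing   -- unreachable, included only for totality
```

For example both takeUntil (> 2) [1,2,3,4] and takeUntilCPS (> 2) [1,2,3,4] id give Just [1,2], and both give Nothing on [1,3,5] with predicate even.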
Final thought on continuations
Finally, as far as I can see, continuation passing style isn’t really very important in Haskell, and some comments I’ve read suggest it can even be harmful: the linked answer mentions that for GHC,
‘The translation into CPS encodes a particular order of evaluation, whereas direct style does not. That dramatically inhibits code-motion transformations’.
On the other hand, continuations are excellent for stretching the mind, and illuminating how programming languages really work. Hence, no doubt, their ubiquity in computer science courses, and the many tutorials on the web. In this spirit, let me mention that, as in the Yoneda Lemma, if we get to pick the return type r in RunCont r a then there is a very simple hack to show that any continuation runner really is a hidden value of type a (this is a common metaphor in web tutorials), which we can extract by
counit :: RunCont a a -> a
counit (RC run_a) = run_a id
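Restated together with the newtype so that it runs standalone, counit does extract the 'hidden value':

```haskell
newtype RunCont r a = RC { runCont :: (a -> r) -> r }

-- as in the Yoneda lemma: choosing the return type r to be a lets us
-- extract the hidden value from a continuation runner with id
counit :: RunCont a a -> a
counit (RC run_a) = run_a id
```

For instance counit (RC (\k -> k 42)) is 42.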
This might suggest an alarmingly simple way to implement the continuation monad: just choose the output type r to be a, extract the value, and do the obvious. But of course the code below only type checks if we (as a text substitution) replace r with a, and so, as I’ve seen many times, somehow the syntactical type checker can guide us away from what is semantically incorrect.
instance Monad (RunCont r) where
return x = RC (\k -> k x)
(RC run_a) >>= g = let x = run_a id in g x
-- • Couldn't match type ‘a’ with ‘r’
--   ‘a’ is a rigid type variable bound by ...
Sampling from the tail end of the normal distribution
February 3, 2025 The purpose of this post is to present two small observations on the normal distribution, with more speculative applications to academic recruitment. The mathematical part at least is rigorous.
Tails and the exponential approximation
We can sample from the tail end of the standard normal distribution by sampling from $N(0,1)$ as usual, but then rejecting all samples that are not at least some threshold value $t$. From my mental picture of the bell curve, I think it is intuitive that the mean of the resulting sample will be not much more than $t$. In fact the mean is
$$t + \frac{1}{t} - \frac{2}{t^3} + O\Bigl(\frac{1}{t^5}\Bigr).$$
Please pause for a moment and ask yourself: do you expect the variance of the tail-end sample to increase with $t$ or instead to decrease with $t$?

The answer, less intuitively I think, is that the variance decreases as $t$ increases. More precisely, it is
$$\frac{1}{t^2} - \frac{6}{t^4} + O\Bigl(\frac{1}{t^6}\Bigr).$$
Thus, the larger $t$ is, the more concentrated is the sample. Given that one expects tail events to be rare, and so perhaps more variable, this is maybe a surprise. One can derive the series expansions above by routine but fiddly calculations using the probability density function $\varphi(x) = e^{-x^2/2}/\sqrt{2\pi}$ of the standard normal distribution.
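As a numerical sanity check (mine, not from the post): since $\int_t^\infty x\varphi(x)\,dx = \varphi(t)$, the conditional mean equals $\varphi(t)/\int_t^\infty \varphi$, and the denominator can be estimated by Simpson's rule with a truncated upper limit.

```haskell
-- probability density function of the standard normal distribution
phi :: Double -> Double
phi x = exp (- x * x / 2) / sqrt (2 * pi)

-- composite Simpson's rule on [a, b] with n (even) panels
simpson :: (Double -> Double) -> Double -> Double -> Int -> Double
simpson f a b n = h / 3 * sum [ coeff i * f (a + fromIntegral i * h) | i <- [0 .. n] ]
  where
    h = (b - a) / fromIntegral n
    coeff i
      | i == 0 || i == n = 1
      | odd i            = 4
      | otherwise        = 2

-- mean of N(0,1) conditioned on being at least t; the tail integral is
-- truncated at t + 12, far beyond any non-negligible mass
tailMean :: Double -> Double
tailMean t = phi t / simpson phi t (t + 12) 2000
```

For example tailMean 3 is about 3.28, consistent with the asymptotic series, whose first terms give $t + 1/t - 2/t^3 \approx 3.26$.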
Alternatively, and I like that this can be done entirely in one's head, observe that for large $t$ and small $x \ge 0$, we have
$$\varphi(t + x) = \varphi(t)\,e^{-tx}e^{-x^2/2} \approx \varphi(t)\,e^{-tx}$$
and so the normal distribution conditioned on being at least $t$ should be well approximated by an exponential distribution with parameter $t$, shifted by $t$. The probability density function of the unshifted exponential distribution with parameter $\lambda$ is $x \mapsto \lambda e^{-\lambda x}$, and, correspondingly, as is well known, its mean and standard deviation are both $1/\lambda$. Thus this approximation suggests that the mean of the tail distribution should be $t + 1/t$ and its variance $1/t^2$, agreeing with the two asymptotic series above.
To make this explicit, the table below shows the percentage of 'outlier' samples that are one or more standard deviations from the mean, in a sample conditioned to be at least $t$ standard deviations from the mean.

| $t$ | 0 | 1 | 2 | 3 | 4 | 5 |
| % outliers | 31.7 | 14.3 | 5.9 | 2.3 | 0.9 | 0.3 |
The University of Erewhon
In the selection process for the University of Erewhon, applicants are invited to sit a fiendishly difficult exam. The modal score is zero (these candidates are of course rejected) and the faculty recruits the tiny minority who exceed the pass mark. The result, of course, is that the faculty recruits not the best mathematicians, but those best at passing their absurd test. (Nonetheless there is a long queue at the examination centre each year: people want jobs!) But more deleteriously, the effect is to create a monoculture: modelling exam-taking ability as an $N(0,1)$ random variable, the analysis above shows that sampling with threshold $t$ gives a distribution in which almost everyone scores about $t$. (We saw the standard deviation was $1/t$, up to terms of order $1/t^3$.) And of course $t$ is chosen so large that only those able to devote years of their life to preparing for the exam, to the exclusion of all other activities, have any chance of passing. And thus the dispiriting result is that all candidates are not only rather alike in their test-taking ability, but the selection by this one statistic creates a culture entirely dominated by the branches of mathematics and the personality traits favoured by the test. The table above shows it also creates the delusion that most people are about as good at maths (or at least at the Erewhonian test) as you: for example, if $t = 3$ and you have 50 colleagues, then since only 2.3% of the conditioned sample are outliers, there is a good chance that none of your colleagues is more than 1 standard deviation ahead of you in ability.
I have gone through selection processes in my life that had something in common with this obvious parody (often I fear unjustly benefitting), and believe I can write from experience that the result has never been good. The alternative of course is to use multiple criteria when recruiting, allowing that some successful candidates may be weak in some areas, while very strong in others. Rather obvious, I know, but I think it’s interesting that a very simple mathematical argument backs up this conclusion.
In the next part of the post we’ll see that the effect of luck can mean that Erewhonian style recruitment not only creates a monoculture, it doesn’t even successfully select the most talented candidates.
The effect of luck
If our intuition about tail events is perhaps suspect, our intuition for conditional probability is surely worse. When the two come together, by conditioning on rare events, the effects can be catastrophic. Selection bias is frequently overlooked even by those who should know better. It seems quite possible to me that we’re here today only because in the many worlds in which the delicate balance of nuclear deterrence failed, there is no-one left in the charred embers to read blog posts on selection bias. Or to pick a less extreme example, would you spend £1000 to attend a seminar given by a panel of lottery winners in which they promise to reveal the detail of their gambling strategies?
Returning to the normal distribution, and the University of Erewhon, suppose that with probability $4/5$ we sample from $N(0,1)$ (people with average luck), and with probability $1/5$ we sample from $N(b, 1)$ (lucky people). Suppose that $b$ is small, say $b = 1/2$, so the effect of luck is relatively modest, boosting one's performance by just half a standard deviation. As before, we then reject all samples below the threshold $t$. Does the effect of luck become more or less significant as $t$ increases?

I think here most people would intuitively guess the right answer, but the magnitude of the effect is striking. The table below shows the number of 'average' and 'lucky' people when we take a million samples according to the scheme above, keeping only the samples that are at least $t$. We suppose, as above, that there is a probability $1/5$ of being lucky. The final column is the ratio of lucky to average people.
| $t$ | average | lucky | ratio |
| 1 | 127064 | 61573 | 0.48 |
| 1.5 | 53375 | 31774 | 0.60 |
| 2 | 18361 | 13070 | 0.71 |
| 2.5 | 4924 | 4562 | 0.93 |
| 3 | 1054 | 1176 | 1.12 |
| 3.5 | 187 | 250 | 1.34 |
| 4 | 24 | 51 | 2.13 |
Thus even though average people outnumber lucky people four to one, for a ratio of 0.25, when we sample from the tail of the distribution (with, as just explained, a boost of 0.5 for lucky people), lucky people predominate. The effect is that the University of Erewhon fails in its goal of recruiting people three standard deviations above the mean: of the 2230 successful candidates, 1176 are people who are merely about 2.5 standard deviations above the mean, but happened to get lucky.
The effect is robust to different models for luck. For instance suppose, to be fair, we allow people to have a chance to be unlucky, and, to counterbalance this, model luck as an $N(0, 1/4)$ random variable, so the standard deviation is $1/2$, equal to the effect size in the previous model. Since the sum of independent $N(0,1)$ and $N(0,1/4)$ random variables has the $N(0, 5/4)$ distribution, the effect is to increase the variance, while leaving the mean unchanged. The graphs below show the probability density function for the original $N(0,1)$ random variable, conditioned on the sum being at least $t$, for $t \in \{2.5, 3, 3.5, 4\}$.

[Graphs of the conditional probability density functions for ability omitted.]
Observe that the density functions become slightly more concentrated, and while they shift to the right, the means are 2.31109, 2.67471, 3.04521 and 3.42096, significantly less in all cases than the threshold $t$. So, once again, the most obvious effect of being more selective is to increase the importance of being lucky, and we see the monoculture effect from the first scenario. The importance of luck is confirmed by the graph below showing the probability density function for luck, again conditioned on the sum being at least $t$.

[Graphs of the conditional probability density functions for luck omitted.]
It is striking that even for $t = 2.5$ (the blue curve) only 10.09% of the conditional probability density function is in the negative half. Almost everyone recruited gets lucky, most by at least one standard deviation from the mean and many by two or more standard deviations from the mean. The moral is again clear: unless you believe (as is sometimes inaccurately said of Napoleon) that luck is an intrinsic life-long characteristic that you genuinely want to select for, use multiple criteria when recruiting.
Lifting set/multiset duality
November 23, 2024 This is a pointer to a note in the same spirit as the previous post. This time we take the set/multiset duality
$$\binom{-n}{k} = (-1)^k \left(\!\!\binom{n}{k}\!\!\right),$$
where $\left(\!\!\binom{n}{k}\!\!\right)$ is the number of $k$-multisets of a set of size $n$, and lift it to a long exact sequence of polynomial representations of $\mathrm{GL}_n(F)$, working over an arbitrary field $F$. We then descend to get quick proofs of generalizations of the symmetric function identity relating the elementary and complete homogeneous symmetric functions, and of the corresponding $q$-binomial identity, obtained by specializing at powers of $q$.
Lifting the ‘stars and bars’ identity
September 21, 2024Let denote the number of
-multisubsets of an
-set.
A basic result in enumerative combinatorics is that
One proof of this uses the method of stars and bars. The link will take you to Wikipedia; for a similar take on the method, interpreting as the number of ways to put
unlabelled balls into
numbered urns, see Theorem 2.3.3 in my draft combinatorics textbook.
The purpose of this post is to give a combinatorial proof of the -binomial version of this identity and then to show that this
-lift can be restated as an equality of two symmetric functions specialized at the powers of
, and so is but a shadow of an algebraic isomorphism between two representations of
. I finish by explaining how my joint work with Eoghan McDowell shows that this isomorphism exists for representations of
over an arbitrary field
, and then outlining a beautiful construction by Darij Grinberg of an explicit isomorphism, which (deliberately) agrees with the one in our paper. Each lifting step, indicated by increasing equation numbers, gives a more conceptual, and often more memorable, result. It is not yet clear what formulation might deserve to be called the ‘ultimate lift’ of the stars and bars identity.
This earlier post has much more on the background multilinear algebra, and representation theory.
Subsets are enumerated by
-binomial coefficients
Given a subset of
, we define its weight, denoted
, to be its sum of entries. We shall adopt a slightly unconventional approach and define the
-binomial coefficient
to be the enumerator by weight of the
-subsets of
, normalized to have constant term
. That is,
(This property of -binomial coefficients appears to me to be fundamental, but it is not always stressed in introductory accounts.) Set
if
or
is negative. Clearly
is a polynomial in
. For example
and and
for all
. We now recover the usual definition of
. As a preliminary lemma we need a
-lift of Pascal’s Rule
, which we prove by adapting the standard bijective proof.
Lemma. If and
then
Proof. If then the result holds by our convention that
so we may suppose that
. Removing
from a
-subset of
containing
puts such
-subsets in bijection with the
subsets of
, enumerated by
. This map decreases the weight by
. The
-subsets of
not containing
are in an obvious weight preserving bijection with the
-subsets of
, enumerated by
. Therefore,
Cancelling , using that
we obtain the required identity.
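Both the definition and the lemma can be checked by brute force. The sketch below (my code and normalization) computes the q-binomial coefficient as the weight enumerator of k-subsets of {1, …, n}, normalized by the minimal weight k(k+1)/2, and verifies one standard form of the q-Pascal rule; the post's exact statement is in the elided display, so treat the particular form checked here as an assumption.

```python
# Brute-force q-binomial coefficients and one standard form of q-Pascal's rule:
# [n,k]_q = [n-1,k-1]_q + q^k [n-1,k]_q.  Polynomials are coefficient lists.
from itertools import combinations

def qbinom(n, k):
    """Weight enumerator of k-subsets of {1, ..., n}, normalized so that
    the minimal-weight subset {1, ..., k} contributes the constant term 1."""
    if k < 0 or k > n:
        return [0]
    coeffs = [0] * (k * (n - k) + 1)
    for subset in combinations(range(1, n + 1), k):
        coeffs[sum(subset) - k * (k + 1) // 2] += 1
    return coeffs

def poly_add(p, q):
    m = max(len(p), len(q))
    return [(p[i] if i < len(p) else 0) + (q[i] if i < len(q) else 0)
            for i in range(m)]

def times_q_power(p, d):
    """Multiply a coefficient list by q^d."""
    return [0] * d + list(p)

for n in range(1, 8):
    for k in range(1, n):
        assert qbinom(n, k) == poly_add(qbinom(n - 1, k - 1),
                                        times_q_power(qbinom(n - 1, k), k))
print("q-Pascal rule verified for n < 8")
```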
For , we define the quantum number
by
and the quantum factorial
by
. Observe that specializing
to
these become
and
, respectively. (See the section on characters below for the connection with the alternative convention in which quantum numbers are Laurent polynomials.) Note that in the following proposition, if
then both products are empty and so
for all
.
Proposition. If ,
then
Proof. Working by induction on using the previous lemma we have
where we used the inductive hypothesis to go from line 1 to line 2 and then
for the final equality.
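The proposition is also easy to verify mechanically: as polynomials in q, the weight enumerator times [k]!_q [n−k]!_q equals [n]!_q, where [m]_q = 1 + q + … + q^(m−1). This is my own brute-force sketch, with polynomials stored as coefficient lists.

```python
# Check the product formula: qbinom(n, k) * [k]!_q * [n-k]!_q == [n]!_q
# as polynomials in q.
from itertools import combinations

def qbinom(n, k):
    """Weight enumerator of k-subsets of {1, ..., n}, constant term 1."""
    if k < 0 or k > n:
        return [0]
    coeffs = [0] * (k * (n - k) + 1)
    for subset in combinations(range(1, n + 1), k):
        coeffs[sum(subset) - k * (k + 1) // 2] += 1
    return coeffs

def poly_mul(p, q):
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def qfactorial(m):
    """[m]!_q as a coefficient list; the quantum number [i]_q is [1]*i,
    i.e. 1 + q + ... + q^(i-1), and the empty product gives [0]!_q = 1."""
    out = [1]
    for i in range(1, m + 1):
        out = poly_mul(out, [1] * i)
    return out

for n in range(0, 8):
    for k in range(0, n + 1):
        lhs = poly_mul(qbinom(n, k), poly_mul(qfactorial(k), qfactorial(n - k)))
        assert lhs == qfactorial(n)
print("product formula verified for n < 8")
```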
Partitions in a box are enumerated by
-multibinomial coefficients
Given a multisubset of
, we again define its weight to be its sum of entries (now taken with multiplicities). Let
be the corresponding enumerator. (The double brackets notation is not standard.) For example
and and
for all . This equality
is not a coincidence.
The
-stars and bars identity
The -multisubsets enumerated by
are in obvious bijection with sequences
where
. In turn, such sequences are in bijection with
subsets of
by the map
The composite map takes a multisubset of weight to a subset of weight
. By the first section,
enumerates
-subsets of
by their weight. We therefore have a bijective proof that
This is the -lift of (1), that
. It seems striking to me that this bijection is easier to describe than the stars and bars bijection, while proving a more general result.
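Here is the bijection in code (my indexing conventions, since the displays above are elided): a weakly increasing sequence i_1 ≤ … ≤ i_k with entries in {1, …, m} maps to the strictly increasing sequence i_1, i_2 + 1, …, i_k + (k − 1), shifting every weight by the same constant k(k − 1)/2.

```python
# The q-stars and bars bijection: weakly increasing sequences (multisubsets)
# map to strictly increasing sequences (subsets) by adding 0, 1, ..., k-1.
from itertools import combinations_with_replacement
from math import comb

def to_subset(seq):
    """Map a weakly increasing tuple to a strictly increasing one."""
    return tuple(x + j for j, x in enumerate(seq))

m, k = 4, 3
shift = k * (k - 1) // 2
images = set()
for seq in combinations_with_replacement(range(1, m + 1), k):
    image = to_subset(seq)
    assert len(set(image)) == k                     # strictly increasing
    assert all(1 <= x <= m + k - 1 for x in image)  # lands in {1, ..., m + k - 1}
    assert sum(image) == sum(seq) + shift           # uniform weight shift
    images.add(image)

assert len(images) == comb(m + k - 1, k)  # injective onto all k-subsets
print("q-stars and bars bijection verified for m = 4, k = 3")
```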
The abacus interpretation
We digress to give another combinatorial interpretation of . Observe that a partition whose Young diagram is contained in the
box is uniquely specified by the sequence
of
-coordinates of its
distinct upward steps. (See examples shortly below.) Moreover, since an upward step at
-coordinate
adds
boxes to the partition, the size of the partition is the weight of the multisubset
. Therefore
enumerates partitions in the
box by their size. Using this partition setting, the map
from weakly increasing sequences to subsets of corresponds to recording a partition not by the
coordinates of its upward steps, but instead by the step numbers of its upward steps, numbering steps from
.
Illustrative examples with and
of the partitions
and
are shown in the diagram below with step numbers in bold.

Below each partition is the corresponding abacus: note that the step numbers of the upward steps are simply the bead positions on the abacus. This is related to Loehr’s beautiful labelled abaci model for antisymmetric functions.
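The abacus interpretation can also be checked by machine: enumerating the partitions whose Young diagram fits in a k × (n − k) box and recording their sizes recovers the q-binomial coefficient. The sketch below is my own brute force.

```python
# Partitions in a k x (n-k) box, counted by size, match the q-binomial
# coefficient computed as a subset weight enumerator.
from itertools import combinations

def qbinom(n, k):
    """Weight enumerator of k-subsets of {1, ..., n}, constant term 1."""
    coeffs = [0] * (k * (n - k) + 1)
    for subset in combinations(range(1, n + 1), k):
        coeffs[sum(subset) - k * (k + 1) // 2] += 1
    return coeffs

def partitions_in_box(rows, cols):
    """Coefficient list: the number of partitions of each size with at
    most `rows` parts, each part at most `cols`."""
    counts = [0] * (rows * cols + 1)
    def rec(remaining_rows, max_part, size):
        counts[size] += 1  # count the partition built so far
        if remaining_rows == 0:
            return
        for part in range(1, max_part + 1):
            rec(remaining_rows - 1, part, size + part)
    rec(rows, cols, 0)
    return counts

for n in range(2, 8):
    for k in range(1, n):
        assert partitions_in_box(k, n - k) == qbinom(n, k)
print("box partitions match q-binomial coefficients for n < 8")
```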
Exercise. Use the abacus to give an alternative proof of the lemma, restated below.
Perhaps surprisingly, this is not the unique -lift of Pascal’s Rule. Use the abacus in a different way to prove that
Exercise. Prove that
This is one of many identities that generalize the usual Binomial Theorem.
Symmetric functions.
For let
and
be the elementary and complete symmetric functions of degree
in variables
. Thus
is the sum of all products of
distinct variables and
is the sum of all products of
variables, repetitions allowed. More generally, let
denote the Schur function labelled by the partition
as defined combinatorially by
where if the semistandard tableau has
entries of
for each
then
. It is a good basic exercise to show that
and
. It is immediate from our definition of
-binomial coefficients that
and it is immediate from the -stars and bars identity that
and so The
-stars and bars identity becomes
All these identities are special cases of the result, almost immediate from the combinatorial definition of Schur functions that enumerates semistandard tableaux with entries from
by their weight, i.e. their sum of entries.
Characters of 
The reason for bringing in symmetric functions is the following theorem which connects representations of with plethysms of symmetric functions and hence enumeration of semistandard tableaux. Let
denote the Schur function labelled by the partition
; if
is a
-dimensional vector space then the trace of the diagonal matrix with entries
acting on
is
. It is an instructive exercise to verify from this property that
and
.
Theorem. Let be the natural representation of
. The representations
and
are isomorphic if and only if the specialized Schur functions
and
are equal up a power of
.
Proof. See Theorem 3.4(b) and (d) in my joint paper with Rowena Paget.
One reason for not giving full details at this point is that it enables me to pass lightly over one technicality. By our definition, the representation of
has character
(Or to be precise, the trace of the diagonal matrix with entries acting on
is this polynomial evaluated at
,
.) Restricting to
, we should impose the relation
, meaning that if
then
, and so we work most naturally with Laurent polynomials in
. (This is essential since unequal characters of
may become equal on restriction to
. In fact this happens if and only if they differ by a power of the determinant representation.) Thus as a representation of
,
has character
and more generally, has character
where is the plethysm product. (This is the main idea needed to prove the theorem.) One can always pass from a Laurent polynomial identity in
to a polynomial identity in
polynomials by setting
and multiplying through by a suitable power of
, so it is possible to work in either setting.
Exercise. Give as many proofs as you can that the -binomial coefficient
is centrally symmetric; that is, writing
for the coefficient of
in the polynomial
, we have
for all . [Hint: there is a one-line proof using
-characters. For a proof at the level of polynomials, show that the reverse of a polynomial
of degree
is
, and that the palindromic property is preserved by multiplication and division, and then apply this to the formula in the proposition.]
It is worth noting that the sum of centrally symmetric polynomials (as opposed to Laurent series) is centrally symmetric if and only if they have the same degree. This appears to be one obstacle to giving a representation theory interpretation of identities such as the -lift of Pascal’s Rule in the first lemma.
Exercise. Use the representation theory of to show that the
-binomial coefficients
are unimodal; that is, the coefficients increase up to the middle degree(s) and then decrease.
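Neither exercise is hard to confirm empirically. The following sketch (my code) checks central symmetry and unimodality of the q-binomial coefficients for small n; of course this proves nothing, but it is a useful sanity check before hunting for the representation-theoretic proofs.

```python
# Empirical check of both exercises: q-binomial coefficients are palindromic
# (centrally symmetric) and unimodal.
from itertools import combinations

def qbinom(n, k):
    """Weight enumerator of k-subsets of {1, ..., n}, constant term 1."""
    coeffs = [0] * (k * (n - k) + 1)
    for subset in combinations(range(1, n + 1), k):
        coeffs[sum(subset) - k * (k + 1) // 2] += 1
    return coeffs

def is_unimodal(seq):
    """True if every strict rise precedes every strict fall."""
    rises = [i for i in range(1, len(seq)) if seq[i] > seq[i - 1]]
    falls = [i for i in range(1, len(seq)) if seq[i] < seq[i - 1]]
    return not rises or not falls or max(rises) < min(falls)

for n in range(1, 10):
    for k in range(0, n + 1):
        c = qbinom(n, k)
        assert c == c[::-1]    # central symmetry
        assert is_unimodal(c)  # unimodality
print("symmetry and unimodality verified for n < 10")
```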
The Wronskian isomorphism.
Taking and
in the theorem, we obtain that the representations
and
of
are isomorphic if and only if
and
are equal up a power of
. (These equalities were proved in the previous subsection.) We therefore set
and obtain the Wronskian isomorphism.
of representations of , lifting the
-stars and bars identity
to the level of representations. Instead, going down to the level of ordinary binomial coefficients, by setting
, we obtain
which is where we started. This is, however, only the start of the story: in my joint paper with Eoghan McDowell, we showed that the modified Wronskian isomorphism
in which the symmetric power (defined as a quotient of
is replaced with its dual
(defined as a submodule of
), holds as an isomorphism of representations of
for an arbitrary field
. We do this by defining an explicit map from the right-hand side to the left-hand side. This is the most refined lift I know to date of stars and bars.
Exercise. Let be a
-dimensional vector space over
. Show that, regarded as a representation of
,
has
as an irreducible
-dimensional submodule with quotient isomorphic to the determinant representation and, dually,
has as a one-dimensional submodule isomorphic to the determinant representation, with quotient the irreducible
-dimensional representation already seen.
A proof of the Wronskian isomorphism after Darij Grinberg
For a very elegant proof of the Wronskian isomorphism, working still over a field of arbitrary characteristic, see these notes of Darij Grinberg, written in response to my joint paper. Here I will give an outline of Grinberg’s very beautiful construction.
Setup
Let as above. Then
can be thought of as the space of homogeneous polynomials of degree
in
and
; it is the degree
component of the polynomial algebra
(It may be helpful to remember the alternative notation for either side.) Thus we can identify
with a subspace of
Since, given any vector space
, by definition
is the subspace of
of symmetric tensors, we can, in turn, identify
with the subspace of
spanned by the symmetrisation of all tensors
for . (Note that this symmetrization should be performed without introducing scalar factors, giving a basis
,
,
as in the exercise above when
and
.) Finally we use that this tensor product is isomorphic to
by the very explicit isomorphism defined on a spanning set by
.
The image of is all polynomials in
of homogeneous degree
with respect to each pair of variables
that are symmetric with respect to permutation of indices. It might at first seem a little strange that the non-commutative tensor product can be represented in a commutative polynomial algebra, but I think any confusion is resolved when one realises that the indices of the
serve to identify which tensor factor each variable comes from.
This chain of maps gives the left-hand column of the commutative diagram below.

The right-hand column is similar, and in fact a little easier to think about, as the map can be defined very simply by
without any worries about introducing scalar factors. The image of is all polynomials in
of homogeneous degree
with respect to each pair of variables
that are antisymmetric with respect to permutation of indices.
Antisymmetrization and Grinberg’s map
Let denote the action of
on the indices of a polynomial
. This corresponds to position permutation of tensor factors, as in the displayed equation ending the previous section. Hence the image of the composite map from
to
is all polynomials
Denote this antisymmetric polynomial by . Observe that if we apply
to a polynomial that is already antisymmetric (for instance, one in the image of the right-hand column maps) then we multiply by
. Thus
is a pseudo-projection.
Let be the linear map defined by
for .
Observe that if is homogeneous of degree
in each pair
(as in polynomials in the image of the first column) then the multiplication boosts the homogeneous degree of
from
to
. Since we then antisymmetrize, the composite map going down the left-hand column and then across has image in the subspace of
defined by the going down the right-hand column. Moreover, by the key identity
for polynomials symmetric under index permutation, it follows that the restriction of
to the subspace of such polynomials is defined by multiplication by
This element can be rewritten as
and so affords the power of the determinant representation of
, which is of course the trivial representation of
. It follows that
induces an
-equivariant map, denoted
in the commutative diagram above. Because multiplication in the integral domain
is injective,
is injective. By dimension counting it is surjective, and we are done.
In fact is precisely the map defined in my joint paper, but described in a much more transparent way, which of course Grinberg exploits in the proof, as I have attempted to show above. See Section 0.1 of his paper for details.
Notes on the Wedderburn decomposition
September 10, 2024
Let be a finite group and let
denote the set of irreducible representations of
up to isomorphism. Wedderburn’s Theorem is that there is an algebra isomorphism
where is the complex group algebra of
. Considering the modules for the right-hand side, we see that if
is a simple
-module then
acts on
as a full matrix algebra, and the kernel of the action map is the sum of the other block algebras. Thus Wedderburn’s Theorem implies Jacobson’s Density Theorem for simple
-modules, namely that the matrices representing the action
on a simple
-module
span
.
This corollary, that irreducible representations of finite groups are simply matrix algebras acting on column vectors, often strikes me as a little surprising: are irreducible representations really so boring? I think the answer is ‘yes they are, from the ring-theoretic perspective’. Consider that many of the interesting problems in representation theory concern the decomposition of big representations into irreducibles: for instance, for the symmetric group, Foulkes’ Conjecture in characteristic zero, or the decomposition matrix problem in prime characteristic. For such problems we care very much about the particular label of the irreducible representations (these come from the group structure), not just their dimension (which we have seen is entirely ring-theoretic information).
In this post I shall give three proofs of Wedderburn’s Theorem, with critical commentary. All three proofs need the following basic result from character theory.
Decomposing the regular character of 
By character orthogonality, the regular character of defined by
decomposes as
. Hence as a left
-module,
and in particular
The inversion theorem proof
Let for
be the irreducible representations of
. We extend the domain of each
to
by linearity, so now
is an algebra homomorphism. Identifying
with
by a fixed choice of basis, we obtain a map
.
We must show it is an algebra isomorphism. By dimension counting using it suffices to show that it is injective. Thus we must show that each
is determined by its actions
on a full set of irreducible representations. For this, we use the following inversion formula:
To see this is true, it suffices, by linearity, to prove it when is a group element. In this case
and the formula follows from the fact that
is
if
is
, and
otherwise. This fact is just column orthogonality between the column of
and the column of the identity element of
.
Critical remarks
The obvious criticism is: ‘where did the inversion formula come from?’ One way to reconstruct it is to use that the central primitive idempotent for the irreducible representation is
We remark that is clearly in the centre of
because its coefficients are constant on conjugacy classes; moreover, it is an easy calculation using row orthogonality to show that
if
and
. These properties can also be used to deduce the formula if one has forgotten it.
Now suppose that is in the pre-image of the Wedderburn block
so its only non-zero action is on
. We then have
and so the coefficient
of
in
is the coefficient of
in
namely
Thus . (This formula is useful also in modular representation theory: see my blog post Modular representations that look ordinary.) We have now proved the inversion formula in the special case; summing over all irreducible representations
gives it in general. Intuitively, borrowing Kac’s phrase, ‘one can hear [from its action on irreducible representations] the shape of an element of a semisimple group algebra’.
A less obvious criticism, but one that should be made, is that the proof (particularly when read with the intuition above) can easily be misunderstood as proving more than is possible. The subtle point is that when we are given the image of under the Wedderburn isomorphism, we get a set of
block matrices, not the corresponding labels of irreducible representations. But since the real point is to prove injectivity, not to show that this ‘abstract’ decomposition determines the action of
on each irreducible representation (which of course is false whenever there are two irreducible representations of the same dimension), we can assign the labels as we see fit.
If is a finite abelian group then the inversion formula states that if
is any function then
where we write rather than
for the character group of
. (To deduce this from the inversion formula, just identify
with the corresponding element of the group algebra, namely
.) This can be interpreted as Fourier inversion, reading
as `the Fourier coefficient of
at the irreducible
‘. For example, if
then, setting
we get
where . The very first post in this blog, from September 2010, has a bit more on this perspective.
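For a cyclic group the inversion formula can be tested numerically. The sketch below uses my own conventions for where the inverse and the 1/|G| factor go (conventions differ, and the post's displays are elided): with characters χ_k(x) = e^(2πikx/n), the coefficients f̂(k) = Σ_x f(x) χ_k(x) recover f via f(g) = (1/n) Σ_k f̂(k) χ_k(g)⁻¹.

```python
# Fourier inversion on the cyclic group Z/n, with characters
# chi_k(x) = omega^(k x) where omega = exp(2 pi i / n).
import cmath

def inversion_check(f_values):
    """Return the maximum error in recovering f from its Fourier coefficients."""
    n = len(f_values)
    omega = cmath.exp(2j * cmath.pi / n)
    # Fourier coefficients: hat f(k) = sum_x f(x) omega^(k x)
    hat = [sum(f_values[x] * omega ** (k * x) for x in range(n))
           for k in range(n)]
    # inversion: f(g) = (1/n) sum_k hat f(k) omega^(-k g)
    recovered = [sum(hat[k] * omega ** (-k * g) for k in range(n)) / n
                 for g in range(n)]
    return max(abs(a - b) for a, b in zip(f_values, recovered))

error = inversion_check([3.0, -1.0, 4.0, 1.0, 5.0, -9.0])
print(error)
assert error < 1e-9
```

The check works because Σ_k ω^(k(x−g)) is n when x = g and 0 otherwise, which is exactly the column orthogonality used in the proof above.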
The Schur’s Lemma proof
By Schur’s lemma applied to we get
To illustrate the second isomorphism, consider the case and let
be any non-zero vector. Let
be isomorphic to
by an isomorphism sending
to
. Clearly any
endomorphism of
is determined by its effect on
and
, and less clearly, but I hope you’ll see that it follows from yet more Schur’s Lemma, it is of the form
and
for some
, corresponding to the matrix
To complete the proof we use that any -endomorphism of
is of the form
for a unique
(determined as the image of
) and so
, the opposite algebra to
. This gives the Wedderburn isomorphism, but with an inconvenient ‘opposite’, which can be removed by noting that
is an isomorphism between
and
.
Critical remarks
The strength of this proof to my mind is that it has just one idea: ‘look at the endomorphism algebra of as a module for itself’. After that, it’s just the remorseless application of Schur’s Lemma. The one idea required is well motivated by experience with computing with representations. For instance, one very important link between character theory and the theory of permutation groups is that if
is the permutation representation of
acting on a set
, with canonical basis
, then
and
for an irreducible representation if and only if the action of
on
is
-transitive.
The weaknesses are, in my view: (1) it meets the criterion to be a ‘one shot’ proof, but the fuss with the opposite algebra lengthens the proof; (2) the map it gives comes not from
acting on
(as we expected) but from
acting on
, in a way that commutes with the diagonal left action of
. Taken together, (1) and (2) make the isomorphism far less explicit than it need be.
Digression on Yoneda’s Lemma
The appearance of the opposite algebra can be predicted: I’d like to digress briefly to show that it’s just a manifestation of the ‘opposite’ that appears in Yoneda’s Lemma. Stated for covariant functors, a special case of Yoneda’s Lemma states that natural transformations between the representable functors and
are in bijection with
; given such a morphism
, the map
is defined by
, composing from right-to-left. (What other source of natural maps between these hom-spaces could there possibly be beyond morphisms from
to
?) This makes the Yoneda embedding
a contravariant functor from
to
. (Slogan: ‘pre-composition is contravariant’.)
Cayley’s Theorem is the special case where is a group (a category with one object and all morphisms invertible): the Yoneda embedding is then the map
defined by
where
. Note that
which may be read as ‘do
then
‘ maps to
— we check:
Thus the Yoneda embedding is not a group homomorphism, but becomes one if we regard the codomain as . The proof above that
is almost the same ‘dancing around Yoneda’ but this time with a version of Yoneda’s Lemma for categories enriched over
-vector spaces.
Incidentally, the Yoneda embedding becomes covariant if we instead take natural transformations from to
(‘post-composition is covariant’), but the ‘opposite’ pops up in another place, as these hom-into functors are contravariant, or, in the more modern language, ordinary covariant functors but defined on
.
Finally, Yoneda’s Lemma is often stated as
‘natural transformations from representable functors are representable’; observe that when , we have
, so this is indeed a generalization of the result as I stated it.
The proof via density
In this ‘two shot’ proof we first prove Jacobson’s Density Theorem for -modules. The proof below closely follows Tommaso Scognamiglio’s elegant proof on MathStackExchange. As motivation, recall that a representation of a finite group is irreducible if and only if it is cyclic, generated by any non-zero vector. Given this, to prove density, it is very natural to show that
is an irreducible representation: but since (supposing the conclusion) we know that as a left
-module
is not irreducible, we are going to have to consider a bigger group.
Density
There is a canonical -linear isomorphism
under which
maps to the rank
endomorphism defined by
. Recall that
is a representation of
(acting on the left as usual) by
. The matrix giving the action of
is then
, the transpose appearing because if
is the dual basis to
then from
we get , and so
, which then requires a transpose to agree with the matrix convention in
. Thus
is a representation of
with action defined by linear extension of
in which acts with matrix
. In particular
. Hence
showing that is an irreducible character and
is an irreducible representation of
, and so (as remarked above) generated by any non-zero element. In particular, the image of the identity
under the action of
spans
. Since, writing
for the modular action,
this span is the same as the span of all for
, as required for density.
Proof of Wedderburn’s Theorem
By density, for each irreducible representation , the induced map
is surjective. Hence
is a surjective algebra homomorphism. But by (), the dimensions on each side are
. Hence this map is an algebra isomorphism, as required.
Critical remarks
A strength of this proof is that you can swap in your own preferred proof of the density theorem. For instance, to prove the generalization to group algebras over an algebraically closed field that
one can instead prove the density theorem for irreducible representations over an arbitrary field, and deduce that the map is injective because (either by definition, or an easy equivalent to the definition) an element of acting as zero on every irreducible
-module lies in
. The only disadvantage to my mind is that it’s less suitable for beginners: Jacobson’s Density Theorem is not an obvious result, and although I like Scognamiglio’s proof very much, the use of the
action (on the left, but with one factor twisted over to the right, hence the fiddliness with transposition) could be off-putting.
Why ‘density’? A unitary example
The Weyl algebra models the position and momentum operators in quantum mechanics. The relation above is the algebraic formulation of the uncertainty principle that there can be no simultaneous eigenstate for both the position
and momentum
operators. In the canonical interpretation of (time independent) wave functions as elements of
, i.e. square integrable complex valued functions on
, there is a representation
defined on the dense subspace of of smooth functions in which the Weyl algebra generators act as self-adjoint operators. Exponentiating these operators we obtain
and
defined by
which generate an irreducible unitary representation of the Heisenberg group by multiplication and translation operators. (The second equation is essentially Taylor’s Theorem, and any worries about its applicability can be assuaged by muttering ‘dense subspace’.) The Stone–von Neumann theorem is that any irreducible representation of the Heisenberg group is unitarily equivalent to this representation. An excellent question on MathStackExchange asks whether these maps generate a dense subalgebra of , suitably interpreted. The answer, perhaps surprisingly, is ‘no’ with respect to the operator norm, but ‘yes’ in the strong or weak operator topology; the proof of this appeals to von Neumann’s double commutant theorem, which is in turn a generalization of the density theorem.
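The remark that the second equation is essentially Taylor's Theorem can itself be checked numerically: summing the Taylor series of a smooth function really does translate it. The sketch below (my choice of test function and truncation) does this for sin.

```python
# Exponentiating d/dx gives translation: sum t^n/n! f^(n)(x) = f(x + t),
# checked here for f = sin, whose Taylor series converges everywhere.
import math

def translate_by_taylor(f_derivs, x, t, terms=30):
    """Sum t^n/n! * f^(n)(x); f_derivs(n, x) returns the n-th derivative at x."""
    return sum(t ** n / math.factorial(n) * f_derivs(n, x)
               for n in range(terms))

def sin_deriv(n, x):
    # derivatives of sin cycle with period 4: sin, cos, -sin, -cos
    return [math.sin, math.cos,
            lambda y: -math.sin(y), lambda y: -math.cos(y)][n % 4](x)

approx = translate_by_taylor(sin_deriv, x=0.3, t=1.2)
exact = math.sin(0.3 + 1.2)
assert abs(approx - exact) < 1e-12
print(approx, exact)
```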
Automorphisms
Suppose that is an algebra automorphism of
. Let
be the natural
-dimensional irreducible representation. Then twisting the action by
we get a new representation
in which
acts by
. Since
has a unique irreducible representation up to isomorphism, we have
. Hence, by definition of isomorphism of representations, there exists an invertible matrix
such that
for all
. Therefore
is conjugation by
. We have proved the special case of the Skolem–Noether theorem that all automorphisms of
are inner, and so the automorphism group is isomorphic to
. I am grateful to Mariano for pointing out to me that the automorphism group of
is more complicated whenever there are several blocks of the same dimension.
Eigenvalues of the Kneser graph
June 8, 2023
A few days ago Ehud Friedgut gave a beautiful talk in the Bristol Combinatorics Seminar, titled ‘The most complicated proof ever of Mantel’s theorem, and other applications of the representation theory of the symmetric group‘ and based on joint work with Yuval Filmus, Nathan Lindzey and Yuval Wigderson. As the title promised, he used the representation theory of the symmetric group to prove Mantel’s Theorem, that if a graph on vertices has more edges than the complete bipartite graph with bipartition into two parts of size
, then it contains a triangle. He illustrated the method by instead proving the Erdős–Ko–Rado Theorem, that the maximum size of a family of
-subsets of
such that any two members of the family have a non-trivial intersection is
.
The first purpose of this post is to outline the new proof (as I understand it from Ehud’s talk) of this result, dropping in my own proof of one of the lemmas he used. Let me emphasise that — apart from the minor proof of the lemma, which in any case uses results of James — no originality is claimed for anything in this part. I then show that the methods appear (modulo the proof of the claim below) to adapt to prove the analogue of the Erdős–Ko–Rado Theorem for vector spaces.
A graph for the Erdős–Ko–Rado Theorem
The graph we need to prove the Erdős–Ko–Rado Theorem is the Kneser graph, having as its vertices the
-subsets of
. Two subsets
and
are connected by an edge if and only if
. The Erdős–Ko–Rado Theorem is then the claim that the maximum size of an independent set in
is
. As we shall see below, this follows from the Hoffman bound, that an independent set in a
-regular graph on
vertices with adjacency matrix
and least eigenvalue
has size at most
. (Note that since the sum of the eigenvalues is the trace of the adjacency matrix, namely
, and the largest eigenvalue is the degree
, the least eigenvalue is certainly negative.)
Representations
To find the eigenvalues of , Ehud and his coauthors use the representation theory of the symmetric group. Let
be the permutation module of
acting on the
-subsets of
. It is well known that
has character
Since this character is multiplicity-free, there is a unique direct sum decomposition of into Specht modules,
The adjacency matrix acts as a linear map on
that commutes with the
-action. Therefore, by Schur’s Lemma, for each
with
there is a scalar
such that
acts on
as multiplication by
. Thus the
are the eigenvalues (possibly with repetition, but this turns out not to be the case) of
.
Lemma. The adjacency matrix acts on
by scalar multiplication by
.
Proof. It follows from the theory in Chapter 17 of James’ Springer lecture notes on the symmetric groups that has a filtration
such that and
for
. The submodule
is spanned by certain ‘partially symmetrized’ polytabloids. Rather than introduce this notation, which is a bit awkward for a blog post, let me instead suppose that
acts on the set
where
. Let
be a fixed
-subset of
. Then
is a cyclic
-submodule of
generated (using right modules) by
where
There is a ‘raising map’ which — viewing
as a submodule of the permutation module of
acting on
-subsets — is defined on the generator
by
Fix an -set
appearing in the expansion of
in the canonical basis of
. Applying the adjacency matrix
sends this
-set to all those
-sets that it does not meet. Hence, after applying
and then
we get a scalar multiple of
precisely from those sets
in the support of
such that is an
-subset of
not meeting
There are forbidden elements, hence there are
choices for
. Each comes with sign
.
End of proof
The lemma implies that is the least eigenvalue of
. Putting this into Hoffman’s bound and using that the graph
is regular of degree
, since this is the number of
-subsets of
not meeting
, we get that the maximum size of an independent set is at most
which is exactly the Erdős–Ko–Rado bound. Thus we (or rather Ehud Friedgut and his coauthors) have proved the Erdős–Ko–Rado Theorem. Ehud then went on to outline how the method applies to Mantel’s Theorem: the relevant representation is, in the usual notation, ; note one may interpret an
-tabloid as an ordered pair
encoding the path of length 3 with edges
and
. There is a proof of Mantel’s Theorem by double counting using these paths: as Ehud remarked (quoting, if I heard him right, Kalai) ‘It’s very hard not to prove Mantel’s Theorem’.
Possible variations
I’ll end with two possible avenues for generalization, both motivated by the classification of multiplicity-free permutation characters of symmetric groups. My paper built on work of Saxl and was published at about the same time as parallel work of Godsil and Meagher.
Foulkes module
The symmetric group acts on the set of set partitions of
into
sets each of size
. The corresponding permutation character is multiplicity free: it has the attractive decomposition
, where
is the partition obtained from
by doubling the length of each path. Moreover, my collaborator Rowena Paget has given an explicit Specht filtration of the corresponding permutation module, analogous to the filtration used above of
. So everything is in place to prove a version of the Erdős–Ko–Rado Theorem for perfect matchings. Update. I am grateful to Ehud for pointing out that this has already been done (using slightly different methods) by his coauthor Nathan Lindzey.
Finite general linear groups
There is a remarkably close connection between the representations of $S_n$ and the unipotent representations of the finite general linear group $\mathrm{GL}_n(\mathbb{F}_q)$. Perhaps, one day, this will be explained by defining the symmetric group as a group of automorphisms of a vector space over the field with one element. The analogue of the permutation module $M^{(n-k,k)}$ of $S_n$ acting on $k$-subsets is the permutation module of $\mathrm{GL}_n(\mathbb{F}_q)$ acting on the $k$-dimensional subspaces of $\mathbb{F}_q^n$. This decomposes as a direct sum of irreducible submodules exactly as in the decomposition of $M^{(n-k,k)}$ above, replacing the Specht modules $S^{(n-j,j)}$ with their $q$-analogues. So once again the decomposition is multiplicity-free. This suggests there might well be a linear analogue of the Erdős–Ko–Rado Theorem in which we aim to maximize the size of a family of $k$-dimensional subspaces of $\mathbb{F}_q^n$ such that any two subspaces have at least a $1$-dimensional intersection.
Update. The Erdős–Ko–Rado Theorem for vector spaces was proved in an aptly named paper by Frankl and Wilson in the general version for $t$-intersecting families. The special case $t = 1$ is that the maximum number of $k$-dimensional subspaces of $\mathbb{F}_q^n$ such that any two subspaces have at least a $1$-dimensional intersection is $\binom{n-1}{k-1}_q$ for $n \ge 2k$, where the $q$-binomial coefficient $\binom{n-1}{k-1}_q$ is the number of $(k-1)$-dimensional subspaces of $\mathbb{F}_q^{n-1}$. Thus the maximum is attained by taking all the subspaces containing a fixed line, in precise analogy with the case for sets. (Note also that if $n < 2k$ then any two $k$-dimensional subspaces have a non-zero intersection.)
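The identification of the bound with the number of subspaces through a fixed line is easy to confirm by brute force in a tiny case. A Python sketch (the function names are mine, and the enumeration is restricted to $q = 2$ for simplicity):

```python
from itertools import combinations, product

def gauss_binom(n, k, q):
    """The q-binomial coefficient [n choose k]_q."""
    num = den = 1
    for i in range(k):
        num *= q ** (n - i) - 1
        den *= q ** (k - i) - 1
    return num // den

def subspaces(n, k):
    """All k-dimensional subspaces of F_2^n, each as a frozenset of its vectors."""
    nonzero = [v for v in product((0, 1), repeat=n) if any(v)]
    found = set()
    for basis in combinations(nonzero, k):
        span = {tuple(sum(c * b[i] for c, b in zip(coeffs, basis)) % 2 for i in range(n))
                for coeffs in product((0, 1), repeat=k)}
        if len(span) == 2 ** k:          # keep only linearly independent choices of basis
            found.add(frozenset(span))
    return found

planes = subspaces(3, 2)
through = [W for W in planes if (1, 0, 0) in W]   # planes containing the line <(1,0,0)>
assert len(planes) == gauss_binom(3, 2, 2)        # 7 planes in F_2^3
assert len(through) == gauss_binom(2, 1, 2)       # 3 of them contain the fixed line
```

Quotienting by the fixed line identifies the planes through it with the lines of $\mathbb{F}_2^2$, of which there are $\binom{2}{1}_2 = 3$.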
James wrote a second set of lecture notes, Representations of general linear groups, pointing out several times the remarkable analogy with the representation theory of the symmetric group. At the time he wrote, before his work with Dipper on Hecke algebras, this seemed somewhat mysterious. (Very roughly, $\mathrm{GL}_n(\mathbb{F}_q)$ is to $\mathrm{GL}_n(F)$, where $F$ is an infinite field, as the Hecke algebra $\mathcal{H}_q(S_n)$ is to $S_n$.) Chapter 13 of his book has, I think, exactly what we need to prove the analogue of the lemma above. Update: I also just noticed that Section 8.6 of Harmonic analysis on finite groups by Ceccherini-Silberstein, Scarabotti and Tolli, titled 'The $q$-Johnson scheme', is entirely devoted to the Gelfand pair $(\mathrm{GL}_n(\mathbb{F}_q), P_k)$ where $P_k$ is the maximal parabolic subgroup stabilising a chosen $k$-dimensional subspace.
Here is the setting for the general linear group. Fix $n \ge 2k$ and let $M_q^{(n-k,k)}$ be the permutation module of $\mathrm{GL}_n(\mathbb{F}_q)$ acting on the $k$-dimensional subspaces of $\mathbb{F}_q^n$, defined over the complex numbers. For $0 \le j < k$, let $\psi_j : M_q^{(n-k,k)} \to M_q^{(n-j,j)}$ be the map sending a subspace to the formal linear combination of all the $j$-dimensional subspaces that it contains. The $q$-Specht module $S_q^{(n-k,k)}$ is then the intersection of the kernels of all the $\psi_j$ for $0 \le j < k$. Moreover, defining $S_q^{(n-j,j)} \le M_q^{(n-j,j)}$ analogously for each $j \le k$, we have $M_q^{(n-k,k)} = S_q^{(n)} \oplus S_q^{(n-1,1)} \oplus \cdots \oplus S_q^{(n-k,k)}$ and each $S_q^{(n-j,j)}$ is irreducible and $\dim S_q^{(n-j,j)} = \binom{n}{j}_q - \binom{n}{j-1}_q$ for $0 \le j \le k$ (interpreting $\binom{n}{-1}_q$ as $0$).
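As a sanity check on the dimension formula for the $q$-Specht modules, one can compute the kernel of the map sending a plane to the sum of the lines it contains directly when $n = 4$, $k = 2$, $q = 2$. A Python sketch (all helper names are mine):

```python
from itertools import combinations, product
from fractions import Fraction

def gauss_binom(n, k, q):
    """The q-binomial coefficient [n choose k]_q."""
    num = den = 1
    for i in range(k):
        num *= q ** (n - i) - 1
        den *= q ** (k - i) - 1
    return num // den

def subspaces(n, k):
    """All k-dimensional subspaces of F_2^n, each as a frozenset of its vectors."""
    nonzero = [v for v in product((0, 1), repeat=n) if any(v)]
    found = set()
    for basis in combinations(nonzero, k):
        span = {tuple(sum(c * b[i] for c, b in zip(coeffs, basis)) % 2 for i in range(n))
                for coeffs in product((0, 1), repeat=k)}
        if len(span) == 2 ** k:
            found.add(frozenset(span))
    return sorted(found, key=sorted)

def rank(rows):
    """Rank of an integer matrix, by Gaussian elimination over the rationals."""
    rows = [[Fraction(x) for x in r] for r in rows]
    r = 0
    for c in range(len(rows[0])):
        piv = next((i for i in range(r, len(rows)) if rows[i][c]), None)
        if piv is None:
            continue
        rows[r], rows[piv] = rows[piv], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][c]:
                f = rows[i][c] / rows[r][c]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

lines = subspaces(4, 1)
planes = subspaces(4, 2)
# the map sends a plane to the sum of the 3 lines it contains; rows indexed by lines
incidence = [[1 if L < W else 0 for W in planes] for L in lines]
assert rank(incidence) == gauss_binom(4, 1, 2)    # the map is surjective: rank 15
assert len(planes) - rank(incidence) == gauss_binom(4, 2, 2) - gauss_binom(4, 1, 2)
```

The incidence matrix has full rank $15$, so the kernel has dimension $35 - 15 = 20$, agreeing with $\binom{4}{2}_q - \binom{4}{1}_q$ at $q = 2$.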
The relevant graph, the $q$-analogue of the Kneser graph, has as its vertices all the $k$-dimensional subspaces of $\mathbb{F}_q^n$; two subspaces are adjacent if and only if their intersection is $\{0\}$.
Lemma. The degree of this graph is $q^{k^2}\binom{n-k}{k}_q$.

Proof. Fix a basis $v_1, \ldots, v_n$ for $\mathbb{F}_q^n$ and let $U = \langle v_1, \ldots, v_k \rangle$. An arbitrary $k$-dimensional subspace of $\mathbb{F}_q^n$ intersecting $U$ in $\{0\}$ has the form of the row space of a $k \times n$ matrix $(A \mid B)$ where $A$ is any $k \times k$ matrix over $\mathbb{F}_q$ and $B$ is the reduced row-echelon basis matrix of a $k$-dimensional subspace of $\langle v_{k+1}, \ldots, v_n \rangle$. These choices of 'pivot columns' and matrix coefficients uniquely determine the subspace. Hence there are $q^{k^2}$ choices for $A$ and $\binom{n-k}{k}_q$ choices for $B$, and multiplying them we get the result.
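For a small sanity check of the lemma, one can enumerate all $2$-dimensional subspaces of $\mathbb{F}_2^4$ and count those meeting a fixed one trivially. A Python sketch (helper names are mine):

```python
from itertools import combinations, product

def gauss_binom(n, k, q):
    """The q-binomial coefficient [n choose k]_q."""
    num = den = 1
    for i in range(k):
        num *= q ** (n - i) - 1
        den *= q ** (k - i) - 1
    return num // den

def subspaces(n, k):
    """All k-dimensional subspaces of F_2^n, each as a frozenset of its vectors."""
    nonzero = [v for v in product((0, 1), repeat=n) if any(v)]
    found = set()
    for basis in combinations(nonzero, k):
        span = {tuple(sum(c * b[i] for c, b in zip(coeffs, basis)) % 2 for i in range(n))
                for coeffs in product((0, 1), repeat=k)}
        if len(span) == 2 ** k:
            found.add(frozenset(span))
    return found

n, k, q = 4, 2, 2
zero = (0,) * n
U = frozenset({(0, 0, 0, 0), (1, 0, 0, 0), (0, 1, 0, 0), (1, 1, 0, 0)})  # <v1, v2>
neighbours = [W for W in subspaces(n, k) if W & U == {zero}]
assert len(neighbours) == q ** (k * k) * gauss_binom(n - k, k, q)   # degree 16
```

Since $n = 2k$ here, the neighbours of $U$ are exactly its $q^{k(n-k)} = 16$ vector space complements, consistent with the lemma.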
I have checked the following claim in small cases by computer calculations. Note also that the $q$-binomial factor $\binom{n-k-j}{k-j}_q$ appearing in it counts, up to a power of $q$, the number of $(k-j)$-dimensional complements in $\mathbb{F}_q^{n-2j}$ to a fixed subspace of dimension $k-j$, in good analogy with the symmetric group case. But it seems some careful analysis of reduced row-echelon forms (like the argument for the degree of the graph above, but too hard for me to do in an evening) will be needed to nail down the power of $q$. Examples 11.7 in James' lecture notes seem helpful.
Claim. The adjacency matrix of the graph acts on $S_q^{(n-j,j)}$ by scalar multiplication by $(-1)^j q^{\binom{j}{2} + k(k-j)} \binom{n-k-j}{k-j}_q$.

Assuming the claim, the least eigenvalue is the one for $j = 1$, and it is $-q^{k(k-1)}\binom{n-k-1}{k-1}_q$. Putting this into Hoffman's bound, and using that the subspaces graph is regular of degree $q^{k^2}\binom{n-k}{k}_q$ as seen above, we get that the maximum size of an independent set is at most
$$\frac{\binom{n}{k}_q \, q^{k(k-1)}\binom{n-k-1}{k-1}_q}{q^{k^2}\binom{n-k}{k}_q + q^{k(k-1)}\binom{n-k-1}{k-1}_q} = \binom{n-1}{k-1}_q,$$
which, somewhat spectacularly I feel, again comes out on the nose.
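Taking the degree and least-eigenvalue formulas on trust, Hoffman's bound can be evaluated exactly with rational arithmetic. In this Python sketch (helper names are mine), the bound agrees with $\binom{n-1}{k-1}_q$ in every small case tried:

```python
from fractions import Fraction

def gauss_binom(n, k, q):
    """The q-binomial coefficient [n choose k]_q."""
    num = den = 1
    for i in range(k):
        num *= q ** (n - i) - 1
        den *= q ** (k - i) - 1
    return num // den

def hoffman_bound(n, k, q):
    """Hoffman's ratio bound N(-lam)/(d - lam) for the q-analogue of the Kneser graph."""
    N = gauss_binom(n, k, q)                                       # number of vertices
    d = q ** (k * k) * gauss_binom(n - k, k, q)                    # degree (j = 0)
    lam = -q ** (k * (k - 1)) * gauss_binom(n - k - 1, k - 1, q)   # least eigenvalue (j = 1)
    return Fraction(N * -lam, d - lam)

for q in (2, 3):
    for n in range(4, 9):
        for k in range(2, n // 2 + 1):                             # keep n >= 2k
            assert hoffman_bound(n, k, q) == gauss_binom(n - 1, k - 1, q)
```

For instance $n = 5$, $k = 2$, $q = 2$ gives $155 \cdot 12 / (112 + 12) = 15 = \binom{4}{1}_2$.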
A Putnam problem
The second generalization ties in with another project I've worked on but neglected to write up beyond a blog post: to prove the $q$-analogue of the spectral results on the Kneser graph motivated by Problem A2 in the 2018 Putnam.
Posted by mwildon