
Mathematical Methods

Prof Andre Lukas


Rudolf Peierls Centre for Theoretical Physics
University of Oxford

MT 2018, updated MT 2023


Contents
1 Mathematical preliminaries 6
1.1 Vector spaces: (mostly) a reminder . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.1.1 Vector spaces and sub vector spaces . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.1.2 Examples of vector spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.1.3 Linear maps . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.1.4 Examples of linear maps . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.1.5 Norms and normed vector spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.1.6 Examples of normed vector spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.1.7 Scalar products . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
1.1.8 Examples of inner product vector spaces . . . . . . . . . . . . . . . . . . . . . . . . 19
1.1.9 Eigenvectors and eigenvalues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
1.2 Topology and convergence∗ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
1.2.1 Convergence and Cauchy sequence . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
1.2.2 Open and closed sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
1.3 Measures and integrals∗ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
1.3.1 The Riemann integral . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
1.3.2 Measures and measure sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
1.3.3 Examples of measure spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31

2 Banach and Hilbert spaces∗ 34


2.1 Banach spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
2.1.1 Examples of Banach spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
2.2 Hilbert spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
2.2.1 Examples of Hilbert spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
2.2.2 Orthogonal basis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
2.2.3 Dual space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
2.3 Linear operators on Hilbert spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
2.3.1 The adjoint operator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
2.3.2 Eigenvectors and eigenvalues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
2.3.3 The Fredholm alternative . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44

3 Fourier analysis 46
3.1 Fourier series . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
3.1.1 Cosine Fourier series . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
3.1.2 Sine Fourier series . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
3.1.3 Real standard Fourier series . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
3.1.4 Complex standard Fourier series . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
3.1.5 Pointwise convergence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
3.1.6 Examples of Fourier series . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
3.2 Fourier transform . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
3.2.1 Basic definition and properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
3.2.2 Convolution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
3.2.3 Examples of Fourier transforms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
3.2.4 The inverse of the Fourier transform . . . . . . . . . . . . . . . . . . . . . . . . . . 63
3.2.5 Fourier transform in L2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63

4 Orthogonal polynomials 66
4.1 General theory of ortho-normal polynomials . . . . . . . . . . . . . . . . . . . . . . . . . . 66
4.1.1 Basic set-up . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
4.1.2 Recursion relation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
4.1.3 General Rodriguez formula . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
4.1.4 Classification of orthogonal polynomials . . . . . . . . . . . . . . . . . . . . . . . . 68
4.1.5 Differential equation for orthogonal polynomials . . . . . . . . . . . . . . . . . . . 69
4.1.6 Expanding in orthogonal polynomials . . . . . . . . . . . . . . . . . . . . . . . . . 70
4.2 The Legendre polynomials . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
4.2.1 Associated Legendre polynomials . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
4.3 The Laguerre polynomials . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
4.4 The Hermite polynomials . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75

5 Ordinary linear differential equations 77


5.1 Basic theory∗ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
5.1.1 Systems of linear first order differential equations . . . . . . . . . . . . . . . . . . . 77
5.1.2 Second order linear differential equations . . . . . . . . . . . . . . . . . . . . . . . 79
5.1.3 The boundary value problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
5.1.4 Solving the homogeneous equation . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
5.2 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
5.3 Bessel differential equation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
5.3.1 The Gamma function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
5.3.2 Bessel differential equation and its solutions . . . . . . . . . . . . . . . . . . . . . . 86
5.3.3 Orthogonal systems of Bessel functions . . . . . . . . . . . . . . . . . . . . . . . . . 88
5.4 The operator perspective - Sturm-Liouville operators . . . . . . . . . . . . . . . . . . . . . 89
5.4.1 Sturm-Liouville operators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
5.4.2 Sturm-Liouville eigenvalue problem . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
5.4.3 Sturm-Liouville and Fredholm alternative . . . . . . . . . . . . . . . . . . . . . . . 92

6 Laplace equation 93
6.1 Laplacian in different coordinate systems . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
6.1.1 Two-dimensional Laplacian . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
6.1.2 Three-dimensional Laplacian . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
6.1.3 Laplacian on the sphere . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
6.1.4 Green Identities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
6.2 Basic theory∗ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
6.2.1 Green functions for the Laplacian . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
6.2.2 Maximum principle and uniqueness . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
6.2.3 Uniqueness - another approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
6.3 Laplace equation in two dimensions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
6.3.1 Complex methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
6.3.2 Separation of variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
6.3.3 Polar coordinates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
6.4 Laplace equation on the two-sphere . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
6.4.1 Functions on S 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
6.4.2 Eigenvalue problem for the Laplacian on S2 . . . . . . . . . . . . . . . . . . . . . . 106
6.4.3 Multipole expansion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
6.5 Laplace equation in three dimensions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108

6.5.1 Method of image charges . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
6.5.2 Cartesian coordinates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
6.5.3 Cylindrical coordinates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
6.5.4 Spherical coordinates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112

7 Distributions 114
7.1 Basic definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
7.1.1 Examples of distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
7.1.2 Convergence of distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
7.1.3 Derivatives of distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
7.2 Convolution of distributions∗ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
7.3 Fundamental solutions - Green functions∗ . . . . . . . . . . . . . . . . . . . . . . . . . . . 120
7.4 Fourier transform for distributions∗ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122

8 Other linear partial differential equations 124


8.1 The Helmholtz equation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124
8.2 Eigenfunctions and time evolution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125
8.3 The heat equation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126
8.4 The wave equation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
8.4.1 Strings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128
8.4.2 Membranes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
8.4.3 Green function of wave equation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130

9 Groups and representations∗ 132


9.1 Groups: some basics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132
9.1.1 Groups and subgroups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132
9.1.2 Group homomorphisms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
9.1.3 Examples of groups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134
9.2 Representations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136
9.2.1 Examples of representations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137
9.2.2 Properties of representations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138
9.3 Lie groups and Lie algebras . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141
9.3.1 Definition of Lie group . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141
9.3.2 Definition of Lie algebra . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141
9.3.3 Examples of Lie groups and their algebras . . . . . . . . . . . . . . . . . . . . . . . 143
9.3.4 Lie algebra representations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145
9.4 The groups SU (2) and SO(3) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146
9.4.1 Relationship between SU (2) and SO(3) . . . . . . . . . . . . . . . . . . . . . . . . 146
9.4.2 All complex irreducible representations . . . . . . . . . . . . . . . . . . . . . . . . . 147
9.4.3 Examples of SU (2) representations . . . . . . . . . . . . . . . . . . . . . . . . . . . 149
9.4.4 Relation to spherical harmonics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 150
9.4.5 Clebsch-Gordan decomposition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151
9.5 The Lorentz group . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 154
9.5.1 Basic definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 154
9.5.2 Properties of the Lorentz group . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 154
9.5.3 Examples of Lorentz transformations . . . . . . . . . . . . . . . . . . . . . . . . . . 155
9.5.4 The Lie algebra of the Lorentz group . . . . . . . . . . . . . . . . . . . . . . . . . . 156
9.5.5 Examples of Lorentz group representations . . . . . . . . . . . . . . . . . . . . . . 157

Appendices 158

A Calculus in multiple variables - a sketch 158


A.1 The main players . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158
A.2 Partial differentiation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159
A.2.1 Gradient . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159
A.2.2 Divergence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 160
A.2.3 Curl . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 160
A.3 The total differential . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 162
A.4 The chain rule . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163
A.5 Taylor series and extremal points . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165
A.5.1 Taylor’s formula . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 166
A.5.2 Extremal points . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 166
A.6 Implicit functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 168

B Manifolds in Rn 172
B.1 Definition of manifolds . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 172
B.2 Tangent space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173
B.3 Integration over sub-manifolds of Rn . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 175
B.3.1 Metric and Gram’s determinant . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 175
B.3.2 Definition of integration over sub-manifolds . . . . . . . . . . . . . . . . . . . . . . 176
B.3.3 A few special cases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 176
B.4 Laplacian . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 178

C Differential forms 179


C.1 Differential one-forms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179
C.1.1 Definition of differential one-forms . . . . . . . . . . . . . . . . . . . . . . . . . . . 179
C.1.2 The total differential . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 180
C.2 Basis for differential one-forms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 180
C.2.1 Integrating differential one-forms . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181
C.3 Alternating k-forms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 182
C.3.1 Definition of alternating k-forms . . . . . . . . . . . . . . . . . . . . . . . . . . . . 182
C.3.2 The wedge product . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 182
C.3.3 Basis for alternating k-forms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 183
C.4 Higher-order differential forms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 184
C.4.1 Definition of differential forms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 184
C.4.2 The exterior derivative . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 185
C.4.3 Hodge star and Laplacian . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 188
C.5 Integration of differential forms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 191
C.5.1 Definition of integral and general Stokes’s theorem . . . . . . . . . . . . . . . . . . 191
C.5.2 Stokes’s theorem in three dimensions . . . . . . . . . . . . . . . . . . . . . . . . . . 192
C.5.3 Gauss’s theorem in three dimensions . . . . . . . . . . . . . . . . . . . . . . . . . . 192
C.5.4 Stokes’s theorem in two dimensions . . . . . . . . . . . . . . . . . . . . . . . . . . . 192
C.5.5 Gauss’s theorem in n dimensions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 193

D Literature 194

Foreword: Lecturing a Mathematical Methods course to physicists can be a tricky affair and following
such a course as a second year student may be even trickier. The traditional material for this course consists
of the classical differential equations and associated special function solutions of Mathematical Physics. In
a modern context, both in Mathematics and in Physics, these subjects are increasingly approached in the
appropriate algebraic setting of Banach and Hilbert Spaces. The correct setting for Quantum Mechanics
is provided by Hilbert Spaces and for this reason alone they are a mandatory subject and should at
least receive a rudimentary treatment in a course on Mathematical Methods for Physicists. However, the
associated mathematical discipline of Functional Analysis merits a lecture course in its own right and
cannot possibly be treated comprehensively in a course which also needs to cover a range of applications.
What is more, physics students may not yet have come across some of the requisite mathematics, such
as the notion of convergence and the definition of integrals. All of this places an additional overhead on
introducing mathematical key ideas, such as the idea of a Hilbert Space.
As a result of these various difficulties and requirements Mathematical Methods courses often end up as
collections of various bits of Mathematical Physics, seemingly unconnected and without any guiding ideas,
other than the apparent usefulness for solving some problems in Physics. Sometimes, ideas developed in
the context of finite-dimensional vector spaces are used as guiding principles but this ignores the crucial
differences between finite and infinite-dimensional vector spaces, to do with issues of convergence.
These lecture notes reflect the attempt to provide a modern Mathematical Physics course which
presents the underlying mathematical ideas as well as their applications and provides students with an
intellectual framework, rather than just a “how-to-do” toolkit. We begin by introducing the relevant math-
ematical ideas, including Banach and Hilbert Spaces but keep this at a relatively low level of formality
and quite stream-lined. On the other hand, we will cover the “traditional” subjects related to differential
equations and special functions but attempt to place these into the general mathematical context. Sec-
tions with predominantly mathematical background material are indicated with a star. While they are
important for a deep understanding of the material they are less essential for the relatively basic practical
tasks required to pass an exam. I believe ambitious, mathematically interested students can benefit from
the combination of mathematical foundation and applications in these notes. Students who want to focus
on the practical tasks may concentrate on the un-starred sections.
Two somewhat non-traditional topics, distributions and groups, have been added. Distributions are
so widely used in physics - and physicists tend to discuss important ideas such as Green functions using
distributions - that they shouldn’t be omitted from a Mathematical Physics course. Symmetries have
become one of the central ideas in physics and they are underlying practically all fundamental theories
of physics. It would, therefore, be negligent, in a course on Mathematical Methods, not to introduce the
associated mathematical ideas of groups and representations.
The three appendices are pure bonus material. The first one is a review of the calculus of multiple
variables, the second a simple account of sub-manifolds in Rn , including curves and surfaces in R3 , as
encountered in vector calculus. Inevitably, it also does some of the groundwork for General Relativity - so
certainly worthwhile for anyone who would like to learn Einstein’s theory of gravity. The third appendix
introduces differential forms, a classical topic in mathematical physics, at an elementary level. Read (or
ignore) at your own leisure.

Andre Lukas
Oxford, 2018

1 Mathematical preliminaries
This section provides some basic mathematical background which is essential for the lecture and can also
be considered as part of the general mathematical language every physicist should be familiar with. The
part on vector spaces is (mainly) review and will be dealt with quite quickly - a more detailed treatment
can be found in the first year lecture notes on Linear Algebra. The main mathematical theme of this
course is the study of infinite-dimensional vector spaces and practically every topic we cover can (and
should) be understood in this context. While the first year course on Linear Algebra dealt with finite-
dimensional vector spaces many of the concepts were introduced without any reference to dimension and
straightforwardly generalise to the infinite-dimensional case. These include the definitions of vector space,
sub-vector space, linear maps, scalar products and norms and we begin by briefly reviewing those ideas.
One of the concepts which does not straightforwardly generalise to the infinite-dimensional case is that
of a basis. We know that a finite-dimensional vector space V (over a field F ) has a basis, v1 , . . . , vn , and
that every vector v ∈ V can be written as a unique linear combination
v = \sum_{i=1}^{n} \alpha_i v_i ,    (1.1)

where αi ∈ F are scalars. A number of complications arise when trying to generalise this to infinite
dimensions. Broadly speaking, it is not actually clear whether a basis exists in this case. A basis must
certainly contain an infinite number of basis vectors so that the RHS of Eq. (1.1) becomes an infinite
sum. This means we have to address questions of convergence. Even if we can formulate conditions for
convergence we still have to clarify whether we can find a suitable set of scalars αi such that the sum (1.1)
converges to a given vector v. All this requires techniques from analysis (= calculus done properly) and
the relevant mathematical basics will be discussed in part 2 of this section while much of Section 2 will
be occupied with answering the above questions.
Finally, we need to address another mathematical issue, namely the definition of integrals. The most
important infinite-dimensional vector spaces we need to consider consist of functions, with a scalar product
defined by an integral. To understand these function vector spaces we need to understand the nature of
the integral. In the last part of this section, we will, therefore, briefly discuss measures and the Riemann
and Lebesgue integrals.

1.1 Vector spaces: (mostly) a reminder


In this subsection, we review a number of general ideas in Linear Algebra which were covered in detail
in the first year course. We emphasise that, while the first year course was focused on finite-dimensional
vector spaces, most of the concepts covered (and reviewed below) are actually independent of dimension
and, hence, apply to the finite and the infinite-dimensional case.

1.1.1 Vector spaces and sub vector spaces


Of course we begin by recalling the basic definition of a vector space. It involves two sets, the set V which
consists of what we call vectors, and the field F , typically taken to be either the real numbers R or the
complex numbers C, whose elements are referred to as scalars. For these objects we have two operations,
the vector addition which maps two vectors to a third vector, and the scalar multiplication which maps a
scalar and a vector to a vector, subject to a number of basic axioms (= rules for calculating with vectors
and scalars). The formal definition is:
Definition 1.1. (Vector spaces) A vector space V over a field F (= R, C or any other field) is a set with
two operations:

i) vector addition: (v, w) 7→ v + w ∈ V , where v, w ∈ V
ii) scalar multiplication: (α, v) 7→ αv ∈ V , where α ∈ F and v ∈ V .
For all u, v, w ∈ V and all α, β ∈ F , these operations have to satisfy the following rules:
(V1) (u + v) + w = u + (v + w) “associativity”
(V2) There exists a “zero vector”, 0 ∈ V so that 0 + v = v “neutral element”
(V3) There exists an inverse, −v with v + (−v) = 0 “inverse element”
(V4) v+w =w+v “commutativity”
(V5) α(v + w) = αv + αw
(V6) (α + β)v = αv + βv
(V7) (αβ)v = α(βv)
(V8) 1·v =v
The elements v ∈ V are called “vectors”, the elements α ∈ F of the field are called “scalars”.
Closely associated to this definition is the one for the “sub-structure”, that is, for a sub vector space. A
sub vector space is a non-empty subset W ⊂ V of a vector space V which is closed under vector addition
and scalar multiplication. More formally, this means:
Definition 1.2. (Sub vector spaces) A sub vector space W ⊂ V is a non-empty subset of a vector space
V satisfying:
(S1) w1 + w2 ∈ W for all w1 , w2 ∈ W
(S2) αw ∈ W for all α ∈ F and for all w ∈ W
A sub vector space satisfies all the axioms in Def. 1.1 and is, hence, a vector space in its own right. Every
vector space V has two trivial sub vector spaces, the null vector space {0} ⊂ V and the total space V ⊂ V .
For two sub vector spaces U and W of V the sum U + W is defined as
U + W = {u + w | u ∈ U , w ∈ W } . (1.2)
Evidently, U + W is also a sub vector space of V as shown in the following
Exercise 1.1. Show that the sum (1.2) of two sub vector spaces is a sub vector space.
A sum U + W of two sub vector spaces is called direct iff U ∩ W = {0} and a direct sum is written as
U ⊕ W.
Exercise 1.2. Show that the sum U + W is direct iff every v ∈ U + W has a unique decomposition
v = u + w, with u ∈ U and w ∈ W .
Exercise 1.3. Show that a sub vector space is a vector space.
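For instance, in V = R^3 the sub vector spaces U = {(x, y, 0)^T | x, y ∈ R} and W = {(0, y, z)^T | y, z ∈ R} satisfy U + W = R^3, but the sum is not direct since U ∩ W = {(0, y, 0)^T | y ∈ R} ≠ {0}. Replacing W by W̃ = {(0, 0, z)^T | z ∈ R} gives U ∩ W̃ = {0}, so U ⊕ W̃ = R^3 and every vector decomposes uniquely as (x, y, z)^T = (x, y, 0)^T + (0, 0, z)^T.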
There are a number of basic notions for vector spaces which include linear combinations, span, linear
independence and basis. Let us briefly recall how they are defined. For k vectors v1 , . . . , vk in a vector
space V over a field F the expression
\alpha_1 v_1 + \cdots + \alpha_k v_k = \sum_{i=1}^{k} \alpha_i v_i ,    (1.3)

with scalars α1 , . . . , αk ∈ F , is called a linear combination. The set of all linear combinations of v1 , . . . , vk ,

\mathrm{Span}(v_1, \ldots, v_k) := \left\{ \sum_{i=1}^{k} \alpha_i v_i \,\middle|\, \alpha_i \in F \right\} ,    (1.4)
is called the span of v1 , . . . , vk . Linear independence is defined as follows.

Definition 1.3. Let V be a vector space over F and α1 , . . . , αk ∈ F scalars. A set of vectors v1 , . . . , vk ∈ V
is called linearly independent if
\sum_{i=1}^{k} \alpha_i v_i = 0 \;\Longrightarrow\; \text{all } \alpha_i = 0 .    (1.5)

Otherwise, the vectors are called linearly dependent. That is, they are linearly dependent if \sum_{i=1}^{k} \alpha_i v_i = 0 has a solution with at least one \alpha_i \neq 0.
If a vector space V is spanned by a finite number of vectors (that is, every v ∈ V can be written as a
linear combination of these vectors) it is called finite-dimensional, otherwise infinite-dimensional. Recall
the situation for finite-dimensional vector spaces. In this case, we can easily define what is meant by a
basis.
Definition 1.4. A set v1 , . . . , vn ∈ V of vectors is called a basis of V iff:
(B1) v1 , . . . , vn are linearly independent.
(B2) V = Span(v1 , . . . , vn )
The number of elements in a basis is called the dimension, dim(V ) of the vector space. Every vector
v ∈ V can then be written as a unique linear combination of the basis vectors v1 , . . . , vn , that is,
v = \sum_{i=1}^{n} \alpha_i v_i ,    (1.6)

with a unique choice of αi ∈ F for a given vector v. The αi are also called the coordinates of the vector
v relative to the given basis.
Clearly, everything is much more involved for infinite-dimensional vector spaces but the goal is to
generalise the concept of a basis to this case and have an expansion analogous to Eq. (1.6), but with
the sum running over an infinite number of basis elements. Making sense of this requires a number of
mathematical concepts, including that of convergence, which will be developed in this section.

1.1.2 Examples of vector spaces


The most prominent examples of finite-dimensional vector spaces are the column vectors

F^n = \left\{ \begin{pmatrix} v_1 \\ \vdots \\ v_n \end{pmatrix} \,\middle|\, v_i \in F \right\} ,    (1.7)

over the field F (where, usually, either F = R for real column vectors or F = C for complex column vectors), with vector addition and scalar multiplication defined “entry-by-entry” as

\begin{pmatrix} v_1 \\ \vdots \\ v_n \end{pmatrix} + \begin{pmatrix} w_1 \\ \vdots \\ w_n \end{pmatrix} := \begin{pmatrix} v_1 + w_1 \\ \vdots \\ v_n + w_n \end{pmatrix} ,  \qquad  \alpha \begin{pmatrix} v_1 \\ \vdots \\ v_n \end{pmatrix} := \begin{pmatrix} \alpha v_1 \\ \vdots \\ \alpha v_n \end{pmatrix} .    (1.8)
Verifying that these satisfy the vector space axioms 1.1 is straightforward. A basis is given by the standard
unit vectors e1 , . . . , en and, hence, the dimension equals n.
Here, we will also (and predominantly) be interested in more abstract vector spaces consisting of sets
of functions. A general class of such function vector spaces can be defined by starting with a (any) set S
and by considering all functions from S to the vector space V (over the field F ). This set of functions
F(S, V ) := {f : S → V } (1.9)

can be made into a vector space over F by defining a “pointwise” vector addition and scalar multiplication

(f + g)(x) := f (x) + g(x) , (αf )(x) := αf (x) , (1.10)


where f, g ∈ F(S, V ) and α ∈ F .

Exercise 1.4. Show that the space (1.9) together with vector addition and scalar multiplication as defined
in Eq. (1.10) defines a vector space.

There are many interesting special cases and sub vector spaces which can be obtained from this
construction. For example, choose S = [a, b] ⊂ R as an interval on the real line (a = −∞ or b = ∞ are
allowed) and V = R (or V = C), so that we are considering the space F([a, b], R) or F([a, b], C) of all
real-valued (or complex-valued) functions on this interval. With the pointwise definitions (1.10) of vector
addition and scalar multiplication these functions form a vector space.
We can consider many sub-sets of this vector space by imposing additional conditions on the functions
and as long as these conditions are invariant under the addition and scalar multiplication of functions (1.10)
Def. 1.2 implies that these sub-sets form sub vector spaces. For example, we know that the sum of two
continuous functions as well as the scalar multiple of a continuous function is continuous so the set of
all continuous functions on an interval forms a (sub) vector space which we denote by C([a, b]). Similar
statements apply to all differentiable functions on an interval and the vector space of k times (continuously)
differentiable functions on the interval [a, b] is denoted by C k ([a, b]), with C ∞ ([a, b]) the space of infinitely
many times differentiable functions on the interval [a, b]. In cases where we consider the entire real line it
is sometimes useful to restrict to functions with compact support. A function f with compact support
vanishes outside a certain radius R > 0 such that f (x) = 0 whenever |x| > R. We indicate the property
of compact support with a subscript “c”, so that, for example, the vector space of continuous functions
on R with compact support is denoted by Cc (R). The vector space of all polynomials, restricted to the
interval [a, b] is denoted by P([a, b]). Whether the functions are real or complex-valued is sometimes also
indicated by a subscript, so CR ([a, b]) are the real-valued continuous functions on [a, b] while CC ([a, b]) are
their complex-valued counterparts.

Exercise 1.5. Find at least three more examples of function vector spaces, starting with the construc-
tion (1.9).

1.1.3 Linear maps


As for any algebraic structure, it is important to study the maps which are compatible with vector spaces,
the linear maps 1 .

Definition 1.5. (Linear maps) A map T : V → W between two vector spaces V and W over a field F is
called linear if
(L1) T (v1 + v2 ) = T (v1 ) + T (v2 )
(L2) T (αv) = αT (v)
for all v, v1 , v2 ∈ V and for all α ∈ F . Further, the set Ker(T ) := {v ∈ V | T (v) = 0} ⊂ V is called the
kernel of T and the set Im(T ) := {T (v) | v ∈ V } ⊂ W is called the image of T .

In the context of infinite-dimensional vector spaces, linear maps are also sometimes called linear operators
and we will occasionally use this terminology. Recall that a linear map T : V → W always maps the zero
vector of V into the zero vector of W , so T (0) = 0 and that the kernel of T is a sub vector space of V
1
We will denote linear maps by uppercase letters such as T . The letter f will frequently be used for the functions which
form the elements of the vector spaces we consider.

while the image is a sub vector space of W . Surjectivity and injectivity of the linear map T are related to
the image and kernel via the equivalences
T surjective ⇔ Im(T ) = W , T injective ⇔ Ker(T ) = {0} . (1.11)
A linear map T : V → W which is bijective (= injective and surjective) is also called a (vector space)
isomorphism between V and W . The set of all linear maps T : V → W is referred to as the homomorphisms
from V to W and is denoted by Hom(V, W ) := {T : V → W | T linear}. By using the general construc-
tion (1.10) (where V plays the role of the set S and W the role of the vector space V ) this space can be
equipped with vector addition and scalar multiplication. Further, since the sum of two linear functions
and the scalar multiple of a linear function are again linear, it follows from Def. 1.2 that Hom(V, W ) is a
(sub) vector space of F(V, W ). Finally, we note that for two linear maps T : V → W and S : W → U ,
the composition S ◦ T : V → U (defined by S ◦ T (v) := S(T (v))) is also linear.
The identity map id : V → V defined by id(v) = v is evidently linear. Recall that a linear map
S : V → V is said to be the inverse of a linear map T : V → V iff
S ◦ T = T ◦ S = id . (1.12)
The inverse exists iff T is bijective (= injective and surjective) and in this case it is unique, linear and
denoted by T −1 . Also recall the following rules
(T −1 )−1 = T , (T ◦ S)−1 = S −1 ◦ T −1 , (1.13)
for calculating with the inverse.
For a finite-dimensional vector space V with basis (v1 , . . . , vn ) we can associate to a linear map
T : V → V a matrix A with entries defined by
T(v_j) = \sum_{i=1}^{n} A_{ij} v_i .    (1.14)
This matrix describes the action of the linear map on the coordinate vectors relative to the basis (v1 , . . . , vn ).
To see what this means more explicitly consider a vector v ∈ V with coordinate vector \alpha = (\alpha_1, \ldots, \alpha_n)^T, such that v = \sum_{i=1}^{n} \alpha_i v_i. Then, if T maps the vector v to v → T(v), the coordinate vector is mapped to α → Aα. How does the matrix A depend on the choice of basis? Introduce a second basis (v_1', \ldots, v_n') with associated matrix A'. Then we have

A' = P A P^{-1} ,  \qquad  v_j = \sum_{i=1}^{n} P_{ij} v_i' .    (1.15)

The matrix P can also be understood as follows. Consider a vector v = \sum_{i=1}^{n} \alpha_i v_i = \sum_{i=1}^{n} \alpha_i' v_i' with coordinate vectors \alpha = (\alpha_1, \ldots, \alpha_n) and \alpha' = (\alpha_1', \ldots, \alpha_n') relative to the unprimed and primed basis. Then,

\alpha' = P \alpha .    (1.16)
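As a simple illustration, take V = R^2 with unprimed basis v_1 = e_1, v_2 = e_2 and primed basis v_1' = e_1, v_2' = e_1 + e_2. Since v_1 = v_1' and v_2 = -v_1' + v_2', Eq. (1.15) gives

P = \begin{pmatrix} 1 & -1 \\ 0 & 1 \end{pmatrix} ,

and a vector with unprimed coordinates \alpha = (\alpha_1, \alpha_2)^T has primed coordinates \alpha' = P\alpha = (\alpha_1 - \alpha_2, \alpha_2)^T, which is easily checked directly: (\alpha_1 - \alpha_2) v_1' + \alpha_2 v_2' = \alpha_1 e_1 + \alpha_2 e_2 = v.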
An important special class of homomorphisms is the dual vector space V ∗ := Hom(V, F ) of a vector
space V over F . The elements of the dual vector space are called (linear) functionals and they map vectors
to numbers in the field F . For a finite-dimensional vector space V with basis v1 , . . . , vn , there exists a
basis ϕ1 , . . . , ϕn of V ∗ , called the dual basis, which satisfies
ϕi (vj ) = δij . (1.17)
In particular, a finite-dimensional vector space and its dual have the same dimension. For infinite-
dimensional vector spaces the discussion is of course more involved and we will come back to this later.
Exercise 1.6. For a finite-dimensional vector space V with basis v1 , . . . , vn show that there exists a basis
ϕ1 , . . . , ϕn of the dual space V ∗ which satisfies Eq. (1.17).

1.1.4 Examples of linear maps
We know that the linear maps T : Rn → Rm (T : Cn → Cm ) can be identified with the m × n matrices
containing real entries (complex entries) whose linear action is simply realised by the multiplication of
matrices with vectors.
Let us consider some examples of linear maps for vector spaces of functions, starting with the space
C([a, b]) of (real-valued) continuous functions on the interval [a, b]. For a (real-valued) continuous function
K : [a, b] × [a, b] → R of two variables we can define the map T : C([a, b]) → C([a, b]) by
T(f)(x) := \int_a^b d\tilde{x} \, K(x, \tilde{x}) f(\tilde{x}) .    (1.18)

This map is evidently linear since the integrand is linear in the function f and the integral itself is linear.
A linear map such as the above is called a linear integral operator and the function K is also referred to as
the kernel of the integral operator 2 . Such integral operators play an important role in functional analysis.
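A simple concrete example on [a, b] = [0, 1] is the kernel K(x, x̃) = x x̃, which gives

T(f)(x) = x \int_0^1 d\tilde{x} \, \tilde{x} f(\tilde{x}) ,

so every f is mapped to a multiple of the function x ↦ x and the image of this particular operator is one-dimensional. More generally, product kernels K(x, x̃) = g(x) h(x̃) lead to such rank-one integral operators.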
For another example consider the vector space C ∞ ([a, b]) of infinitely many times differentiable func-
tions on the interval [a, b]. We can define a linear operator D : C ∞ ([a, b]) → C ∞ ([a, b]) by

D(f)(x) := \frac{df}{dx}(x)  \qquad \text{or} \qquad  D = \frac{d}{dx} .    (1.19)
A further class of linear operators Mp : C ∞ ([a, b]) → C ∞ ([a, b]) is obtained by multiplication with a fixed
function p ∈ C ∞ ([a, b]), defined by
Mp (f )(x) := p(x)f (x) . (1.20)
The above two classes of linear operators can be combined and generalised by including higher-order
differentials which leads to linear operators T : C ∞ ([a, b]) → C ∞ ([a, b]) defined by

T = p_k \frac{d^k}{dx^k} + p_{k-1} \frac{d^{k-1}}{dx^{k-1}} + \cdots + p_1 \frac{d}{dx} + p_0 ,    (1.21)
where pi , for i = 0, . . . , k, are fixed functions in C ∞ ([a, b]). Linear operators of this type will play an
important role in our discussion, mainly because they form the key ingredient for many of the differential
equations which appear in Mathematical Physics.
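For example, the choice k = 2 with p_2(x) = 1 - x^2, p_1(x) = -2x and p_0 = 0 in Eq. (1.21) gives the operator

T = (1 - x^2) \frac{d^2}{dx^2} - 2x \frac{d}{dx} ,

which underlies the Legendre differential equation and will reappear in the discussion of Legendre polynomials in Section 4.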

1.1.5 Norms and normed vector spaces


Frequently, we will require additional structure on our vector spaces which allows us to study the “ge-
ometry” of vectors. The simplest such structure is one that “measures” the length of vectors and such
a structure is called a norm. As we will see in the next sub section, we will require a norm to define
convergence and basic ideas of topology in a vector space. The formal definition of a norm is as follows.

Definition 1.6. (Norms and normed vector spaces) A norm k · k on a vector space V over the field F = R
or F = C is a map k · k : V → R which satisfies
(N1) k v k > 0 for all non-zero v ∈ V
(N2) k αv k = |α| k v k for all α ∈ F and all v ∈ V
(N3) k v + w k ≤ k v k + k w k for all v, w ∈ V (triangle inequality)
A vector space V with a norm is also called a normed vector space.
2
This notion of “kernel” has nothing to do with the kernel of a linear map, as introduced in Def. 1.5. The double-use of
the word is somewhat unfortunate but so established that it cannot be avoided. It will usually be clear from the context
which meaning of “kernel” is referred to.

Note that the notation |α| in (N2) refers to the simple real modulus for F = R and to the complex modulus
for F = C. All three axioms are intuitively clear if we think about a norm as providing us with a notion
of “length”. Clearly, a length should be strictly positive for all non-zero vectors as stated in (N1), it needs
to scale with the (real or complex) modulus of a scalar if the vector is multiplied by this scalar as in (N2)
and it needs to satisfy the triangle inequality (N3). Since 0v = 0 for any vector v ∈ V , the axiom (N2)
implies that k 0 k = k 0v k = 0 k v k = 0, so the zero vector has norm 0 (and is, from (N1), the only
vector with this property).

Exercise 1.7. Show that, in a normed vector space V , we have k v − w k ≥ k v k−k w k for all v, w ∈ V .

For normed vector spaces V and W we can now introduce an important new sub-class of linear operators
T : V → W , namely bounded linear operators. They are defined as follows 3 .

Definition 1.7. (Bounded linear operators) A linear operator T : V → W is called bounded if there exists
a positive K ∈ R such that k T (v) kW ≤ K k v kV for all v ∈ V . The smallest number K for which this
condition is satisfied is called the norm, k T k, of the operator T .

Having introduced the notion of the norm of a bounded linear operator, we can now introduce isometries.

Definition 1.8. (Isometries) A bounded linear operator T : V → W is an isometry iff k T (v) kW = k v kV


for all v ∈ V .

1.1.6 Examples of normed vector spaces


You are already familiar with a number of normed vector spaces, perhaps without having thought about
them in this more formal context. The real and complex numbers, seen as one-dimensional vectors spaces,
are normed with the norm given by the (real or complex) modulus. It is evident that this satisfies the
conditions (N1) and (N2) in Def. 1.6. For the condition (N3) consider the following

Exercise 1.8. Show that the real and complex modulus satisfies the triangle inequality.

More interesting examples of normed vector spaces are provided by Rn and Cn with the Euclidean norm

\| v \| := \left( \sum_{i=1}^{n} |v_i|^2 \right)^{1/2} ,    (1.22)

for any vector v = (v1 , . . . , vn )T . (As above, the modulus sign refers to the real or complex modulus,
depending on whether we consider the case of Rn or Cn .) It is immediately clear that axioms (N1) and
(N2) are satisfied and we leave (N3) as an exercise.

Exercise 1.9. Show that the prospective norm on Rn or Cn defined in Eq. (1.22) satisfies the triangle
inequality.

Linear maps T : F n → F m are described by the action of m × n matrices on vectors. Since such
matrices, for a given linear map T , have fixed entries it is plausible that they are bounded with respect
to the norm (1.22). You can attempt the proof of this statement in the following exercise.

Exercise 1.10. Show that linear maps T : F n → F m , where F = R or F = C are bounded, relative to
the norm (1.22).
3
When two normed vector spaces V and W are involved we will distinguish the associated norms by adding the name of
the space as a sub-script, so we write k · kV and k · kW .

It is not too difficult to generalise this statement and to show that linear maps between any two finite-
dimensional vector spaces are bounded. For the infinite-dimensional case this is not necessarily true (see
Exercise 1.13 below).
Vector spaces, even finite-dimensional ones, usually allow for more than one way to introduce a norm.
For example, on Rn or Cn , with vectors v = (v1 , . . . , vn )T we can define, for any real number p ≥ 1, the
norm

\| v \|_p := \left( \sum_{i=1}^{n} |v_i|^p \right)^{1/p} .    (1.23)
Clearly, this is a generalisation of the standard norm (1.22) which corresponds to the special case p = 2.
As before, conditions (N1) and (N2) in Def. 1.6 are easily verified. For the triangle inequality (N3) consider
the following exercise.
Exercise 1.11. For two vectors v = (v_1, . . . , v_n)^T and w = (w_1, . . . , w_n)^T in Rn or Cn and two real
numbers p, q ≥ 1 with 1/p + 1/q = 1 show that

(a)  \sum_{i=1}^{n} |v_i w_i| \le \left( \sum_{i=1}^{n} |v_i|^p \right)^{1/p} \left( \sum_{i=1}^{n} |w_i|^q \right)^{1/q}    (Hölder’s inequality)
(b)  \left( \sum_{i=1}^{n} |v_i + w_i|^p \right)^{1/p} \le \left( \sum_{i=1}^{n} |v_i|^p \right)^{1/p} + \left( \sum_{i=1}^{n} |w_i|^p \right)^{1/p}    (Minkowski’s inequality)    (1.24)
Use Minkowski’s inequality to show that the prospective norm (1.23) satisfies the triangle inequality.
Norms can also be introduced on infinite-dimensional vector spaces. As an example, consider the space
C([a, b]) of continuous functions on the interval [a, b]. The analogue of the standard norm (1.22) and its
generalisation (1.23) (thinking, intuitively, about promoting the finite sum in Eq. (1.22) to an integral)
for f ∈ C([a, b]) can be defined as

\| f \| := \left( \int_a^b dx \, |f(x)|^2 \right)^{1/2} ,  \qquad  \| f \|_p := \left( \int_a^b dx \, |f(x)|^p \right)^{1/p} ,    (1.25)

for any real p ≥ 1.


Exercise 1.12. Show that Eq. (1.25) for any real p ≥ 1 defines a norm on C([a, b]).
Exercise 1.13. Consider the space C ∞ ([0, 1]) with norm k · k2 , the monomials pk (x) = xk and the differ-
ential operator T = d/dx. Compute k T pk k2 /k pk k2 and use the result to show that T is not bounded.
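In contrast, the multiplication operators M_p of Eq. (1.20) are bounded with respect to \| \cdot \|_2: writing K := \max_{x \in [a,b]} |p(x)| (which exists since p is continuous on the closed interval [a, b]), we have

\| M_p(f) \|_2^2 = \int_a^b dx \, |p(x)|^2 |f(x)|^2 \le K^2 \int_a^b dx \, |f(x)|^2 = K^2 \| f \|_2^2 ,

so \| M_p(f) \|_2 \le K \| f \|_2 for all f, as required by Def. 1.7.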

1.1.7 Scalar products


A normed vector space provides a basic notion of geometry in that it assigns a “length” to each vector.
Often it is desirable to have a more comprehensive framework for geometry which also allows measuring
angles between vectors and defining the concept of orthogonality. Such a framework is provided by a
scalar product or inner product on a vector space which is defined as follows.
Definition 1.9. A real scalar product on a vector space V over F = R and a hermitian scalar product on
a vector space V over the field F = C is a map h · , · i : V × V → F which satisfies
(S1) hv, wi = hw, vi, for a real scalar product, F = R
hv, wi = hw, vi∗ , for a hermitian scalar product, F = C
(S2) hv, αu + βwi = αhv, ui + βhv, wi
(S3) hv, vi > 0 if v 6= 0
for all vectors v, u, w ∈ V and all scalars α, β ∈ F .
A real or hermitian scalar product is also referred to as an inner product on V and a vector space V with
such a scalar product is also called an inner product (vector) space.

Note that, from property (S2), the scalar product is linear in the second argument and combining this
with (S1) implies for the first argument that

\langle \alpha v + \beta u, w \rangle = \alpha \langle v, w \rangle + \beta \langle u, w \rangle  \qquad  F = R
\langle \alpha v + \beta u, w \rangle = \alpha^* \langle v, w \rangle + \beta^* \langle u, w \rangle  \qquad  F = C     (1.26)

Evidently, in the real case the scalar product is also linear in the first argument (and, hence, it is bi-linear).
In the complex case, it is sesqui-linear which means that, in addition to linearity in the second argument,
it is half-linear in the first argument (vector sums can be pulled out of the first argument while scalars
pull out with a complex conjugate). In the following, we will frequently write equations for the hermitian
case, F = C, keeping in mind that the analogous equations for the real case can be obtained by simply
omitting the complex conjugate.
How are inner product vector spaces and normed vector spaces related? Properties (S1) and (S3)
imply that hv, vi is always real and positive so it makes sense to try to define a norm by
\| v \| := \sqrt{\langle v, v \rangle} .    (1.27)

As usual, it is easy to show that this satisfies properties (N1) and (N2) in Def. 1.6. To verify the triangle
inequality (N3) we recall that every scalar product satisfies the Cauchy-Schwarz inequality

|\langle v, w \rangle| \le \| v \| \, \| w \|  \;\Longrightarrow\;  \| v + w \| \le \| v \| + \| w \| ,    (1.28)

from which the triangle inequality follows immediately. In conclusion, Eq. (1.27) does indeed define a
norm in the sense of Def. 1.6 and it is called the norm associated to the scalar product. Hence, any inner
product vector space is also a normed vector space.

Exercise 1.14. Show that a (real or hermitian) scalar product with associated norm (1.27) satisfies the
Cauchy-Schwarz inequality and the triangle inequality in Eq. (1.28). Also show that the norm (1.27)
satisfies the parallelogram law

\| v + w \|^2 + \| v - w \|^2 = 2 \left( \| v \|^2 + \| w \|^2 \right) ,    (1.29)

for all v, w ∈ V .

Recall that two vectors v, w ∈ V are called orthogonal iff hv, wi = 0. Also, recall that any finite set
of mutually orthogonal non-zero vectors is linearly independent.

Exercise 1.15. For an inner product vector space, show that a finite number of orthogonal non-zero
vectors are linearly independent.

For a sub vector space W ⊂ V the orthogonal complement W ⊥ is defined as

W ⊥ := {v ∈ V | hv, wi = 0 for all w ∈ W } . (1.30)

In other words, the orthogonal complement W ⊥ consists of all vectors which are orthogonal to the entire
space W .

Exercise 1.16. Show, for a sub vector space W ⊂ V , that W ∩ W ⊥ = {0}. (This means that the sum of
W and W ⊥ is direct.) For a finite-dimensional V , show that W ⊕ W ⊥ = V .
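For example, in R^3 with the standard scalar product, the orthogonal complement of the line W = {λ(1, 1, 0)^T | λ ∈ R} is the plane W ⊥ = {(x, y, z)^T | x + y = 0}, and indeed W ⊕ W ⊥ = R^3.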

Further, a (finite or infinite) collection \epsilon_i of vectors, where i = 1, 2, . . ., is called an ortho-normal system iff \langle \epsilon_i, \epsilon_j \rangle = \delta_{ij}. We know that finite-dimensional vector spaces have a basis and by applying to such a basis the Gram-Schmidt procedure one obtains an ortho-normal basis. Hence, every finite-dimensional inner product vector space has an ortho-normal basis. The scalar product makes it easier to work out the coordinates of a vector v ∈ V relative to an ortho-normal basis \epsilon_1, . . . , \epsilon_n by using the formula

v = \sum_{i=1}^{n} \alpha_i \epsilon_i  \;\Longleftrightarrow\;  \alpha_i = \langle \epsilon_i, v \rangle .    (1.31)

Also, recall that, in terms of the coordinates relative to an ortho-normal basis, the scalar product and its associated norm take a very simple form. For two vectors v = \sum_i \alpha_i \epsilon_i and w = \sum_j \beta_j \epsilon_j we have

\langle v, w \rangle = \sum_{i=1}^{n} \alpha_i^* \beta_i ,  \qquad  \| v \|^2 = \sum_{i=1}^{n} |\alpha_i|^2 ,    (1.32)

as can be easily verified using the orthogonality relations \langle \epsilon_i, \epsilon_j \rangle = \delta_{ij}. For infinite-dimensional inner product spaces the story is more involved and will be tackled in Section 2.
It is useful to re-consider the relationship of a vector space V and its dual vector space V ∗ in the
presence of an inner product on V . The main observation is that the inner product induces a map
ı : V → V ∗ defined by
ı(v)(w) := hv, wi . (1.33)
For a vector space over R this map is linear, for a vector space over C it is half-linear (meaning, as for
the first argument of hermitian scalar products, that vector sums pull through while scalars pull out with
a complex conjugation). In either case, this map is injective. For finite-dimensional V it is bijective and
provides an identification of the vector space with its dual.
Exercise 1.17. Show that the map ı : V → V ∗ defined in Eq. (1.33) is injective and that it is bijective
for finite-dimensional V .
The properties of the map ı in the infinite-dimensional case will be further explored later.

Application 1.1. Dirac notation


In physics, more specifically in the context of quantum mechanics, the existence of the map ı is
exploited for a convenient convention, referred to as Dirac notation. In Dirac notation, vectors w ∈ V
and dual vectors ı(v) ∈ V ∗ are denoted as follows:

w → |wi , ı(v) → hv| . (1.34)

In other words, vectors in V are denoted by “ket”-vectors |wi, dual vectors in V ∗ , obtained via the
map ı, by “bra”-vectors hv| while the action of one on the other (which equals the scalar product in
Eq. (1.33)) is simply obtained by combining the two to a “bra-(c)ket”, resulting in

ı(v)(w) = hv, wi = hv|wi . (1.35)

Note, there is nothing particularly profound about this notation - for the most part it simply amounts
to replacing the comma separating the two arguments of an inner product with a vertical bar.

We can ask about interesting new properties of linear maps in the presence of an inner product. First,
recall that scalar products of the form
hv, T (w)i (1.36)

for a linear map T : V → V are also called matrix elements of T . Two maps T : V → V and S : V → V
are equal iff all their matrix elements are equal, that is, iff hv, T (w)i = hv, S(w)i for all v, w ∈ V .

Exercise 1.18. Show that two linear maps are equal iff all their matrix elements are equal.

In the finite-dimensional case, the matrix A which describes a linear map T : V → V relative to an
ortho-normal basis \epsilon_1, . . . , \epsilon_n is simply obtained by the matrix elements

A_{ij} = \langle \epsilon_i, T(\epsilon_j) \rangle .    (1.37)

Next, we recall the definition of the adjoint linear map.

Definition 1.10. For a linear map T : V → V on a vector space V with scalar product, an adjoint linear
map, T † : V → V is a map satisfying
hv, T wi = hT † v, wi (1.38)
for all v, w ∈ V .

If the adjoint exists, it is unique and has the following properties


(T^\dagger)^\dagger = T ,  \quad  (\alpha T + \beta S)^\dagger = \alpha^* T^\dagger + \beta^* S^\dagger ,  \quad  (S \circ T)^\dagger = T^\dagger \circ S^\dagger ,  \quad  (T^{-1})^\dagger = (T^\dagger)^{-1} ,    (1.39)

provided the maps in those equations exist.

Exercise 1.19. Show that the adjoint map is unique and that it has the properties in Eq. (1.39).

For finite-dimensional inner product vector spaces we can describe both T and its adjoint T † by
matrices relative to an ortho-normal basis \epsilon_1, . . . , \epsilon_n. They are given by the matrix elements

A_{ij} = \langle \epsilon_i, T(\epsilon_j) \rangle ,  \qquad  (A^\dagger)_{ij} = \langle \epsilon_i, T^\dagger(\epsilon_j) \rangle .    (1.40)

where A^\dagger := \bar{A}^T is the hermitian conjugate of the matrix A. Hence, at the level of matrices, the adjoint
simply corresponds to the hermitian conjugate matrix (or the transpose matrix in the real case). This
observation can also be used to show the existence of the adjoint for finite-dimensional inner product
spaces. Existence of the adjoint in the infinite-dimensional case is not so straightforward and will be
considered later.

Exercise 1.20. Show that the matrix which consists of the matrix elements of T † in Eq. (1.40) is indeed
the hermitian conjugate of the matrix given by the matrix elements of T .

Particularly important linear operators are those which can be moved from one argument of a scalar
product into the other without changing the value of the scalar product and they are called hermitian or
self-adjoint operators.

Definition 1.11. A linear operator T : V → V on a vector space V with scalar product is called self-
adjoint (or hermitian) iff hv, T (w)i = hT (v), wi for all v, w ∈ V .

Hence, a self-adjoint operator T : V → V is one for which the adjoint exists and satisfies T † = T .
Recall that the commutator of two linear operators S, T is defined as

[S, T ] := S ◦ T − T ◦ S , (1.41)

We also say that two operators S and T commute iff [S, T ] = 0.

We can ask under what condition the composition S ◦ T of two hermitian operators is again hermitian.
Using the above commutator notation, we have

(S \circ T)^\dagger = S \circ T  \;\Leftrightarrow\;  T^\dagger \circ S^\dagger = S \circ T  \;\Leftrightarrow\;  T \circ S = S \circ T  \;\Leftrightarrow\;  [S, T] = 0    (1.42)

where S = S † and T = T † has been used for the second equivalence. In conclusion, the composition of
two hermitian operators is hermitian if and only if the operators commute. For a complex inner product
vector space, it is also worth noting that, from Eq. (1.39), an anti-hermitian operator, that is an operator
T satisfying T † = −T , can be turned into a hermitian one (and vice versa) by multiplying with ±i, so

T † = −T ⇐⇒ (±iT )† = ±iT . (1.43)

Also note that every linear operator T : V → V with an adjoint T † can be written as a (unique) sum of a
hermitian and an anti-hermitian operator. Indeed, defining T_\pm = \frac{1}{2}(T \pm T^\dagger) we have T = T_+ + T_- while
T+ is hermitian and T− is anti-hermitian.
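For example, at the level of matrices relative to an ortho-normal basis, where the adjoint is the hermitian conjugate (cf. Eq. (1.40)), the matrix

T = \begin{pmatrix} 1 & 2i \\ 0 & 3 \end{pmatrix}  \quad\text{decomposes as}\quad  T_+ = \begin{pmatrix} 1 & i \\ -i & 3 \end{pmatrix} , \quad T_- = \begin{pmatrix} 0 & i \\ i & 0 \end{pmatrix} ,

with T_+ hermitian, T_- anti-hermitian and T = T_+ + T_-.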

Application 1.2. More on Dirac notation


In the context of Dirac notation, the matrix elements of an operator T are denoted by

hv|T |wi := hv, T (w)i . (1.44)

In this way, the matrix element of the operator is obtained by including it between a bra and a
ket vector. This symmetric notation is particularly useful for hermitian operators since they can be
thought of as acting on either one of the scalar product’s arguments. For non-hermitian operators
or for the purpose of proving that an operator is hermitian the Dirac notation is less helpful and
it is sometimes better to use the mathematical notation, as on the RHS of Eq. (1.44). Relative to
an ortho-normal basis \epsilon_1, . . . , \epsilon_n of a finite-dimensional inner product space V a self-adjoint linear operator T : V → V is described by the matrix with entries (in Dirac notation)

T_{ij} = \langle \epsilon_i | T | \epsilon_j \rangle .    (1.45)

In terms of these matrix elements, T can also be written as


T = \sum_{k,l=1}^{n} T_{kl} \, |\epsilon_k\rangle \langle \epsilon_l| .    (1.46)

This can be easily verified by taking the matrix elements with \langle \epsilon_i| and |\epsilon_j\rangle of this equation and by using \langle \epsilon_i | \epsilon_k \rangle = \delta_{ik}. (Formally, Eq. (1.46) exploits the identification Hom(V, V) \cong V \otimes V^*.) In particular the identity operator id with matrix elements \delta_{ij} can be written as

\mathrm{id} = \sum_{i=1}^{n} |\epsilon_i\rangle \langle \epsilon_i| .    (1.47)

Exercise 1.21. By acting on an arbitrary vector, verify explicitly that the RHS of Eq. (1.47) is indeed
the identity operator.

Dirac notation can be quite intuitive as can be demonstrated by re-writing some of our earlier equa-
tions. For example, writing the relation (1.31) for the coordinates relative to an ortho-normal basis in Dirac notation leads to

|v\rangle = \sum_{i=1}^{n} |\epsilon_i\rangle \langle \epsilon_i | v \rangle .    (1.48)

Evidently, this can now be derived by inserting the identity operator in the form (1.47). Similarly,
the expressions (1.32) for the scalar product and the norm in Dirac notation
\langle v | w \rangle = \sum_{i=1}^{n} \langle v | \epsilon_i \rangle \langle \epsilon_i | w \rangle ,  \qquad  \| |v\rangle \|^2 = \langle v | v \rangle = \sum_{i=1}^{n} \langle v | \epsilon_i \rangle \langle \epsilon_i | v \rangle    (1.49)

are easily seen to follow by inserting the identity operator (1.47).
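The completeness relation (1.47) and the expansions (1.48) and (1.49) are also easy to check numerically. The sketch below (Python/NumPy; an ortho-normal basis is generated ad hoc via a QR decomposition, purely for illustration) verifies these formulae in C^n.

import numpy as np

rng = np.random.default_rng(1)
n = 5
# The columns of Q form an ortho-normal basis eps_1, ..., eps_n of C^n.
Q, _ = np.linalg.qr(rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n)))
eps = [Q[:, i] for i in range(n)]

# Resolution of the identity, Eq. (1.47): sum_i |eps_i><eps_i| = id.
assert np.allclose(sum(np.outer(e, e.conj()) for e in eps), np.eye(n))

# Expansion of a vector, Eq. (1.48), and the norm formula, Eq. (1.49).
v = rng.standard_normal(n) + 1j * rng.standard_normal(n)
coords = np.array([e.conj() @ v for e in eps])              # <eps_i|v>
assert np.allclose(sum(a * e for a, e in zip(coords, eps)), v)
assert np.isclose(np.linalg.norm(v) ** 2, np.sum(np.abs(coords) ** 2))
print("Eqs. (1.47)-(1.49) verified numerically")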

Another important class of specific linear maps on an inner product vector space are unitary maps which
are precisely those maps which leave the value of the inner product unchanged in the sense of the following

Definition 1.12. Let V be an inner product vector space. A linear map U : V → V is called unitary iff

hU (v), U (w)i = hv, wi (1.50)

for all v, w ∈ V .

Unitary maps have the following important properties.

Proposition 1.1. (Properties of unitary maps) A unitary map U with adjoint U † has the following
properties.
(i) Unitary maps U can also be characterized by U † ◦ U = U ◦ U † = idV .
(ii) Unitary maps U are invertible and U −1 = U † .
(iii) The composition of unitary maps is a unitary map.
(iv) The inverse, U † , of a unitary map U is unitary.

Exercise 1.22. Show the properties of unitary maps in Proposition 1.1.

For finite-dimensional vector spaces we know that, relative to an ortho-normal basis ε_1, . . . , ε_n, a unitary
map Û is described by a unitary matrix (orthogonal matrix in the real case). Indeed, introducing the
matrix U with matrix elements (in Dirac notation)

U_{ij} = ⟨ε_i|Û|ε_j⟩ ,    (1.51)

this statement is verified by the following short calculation.


\sum_j (U^\dagger)_{ij} U_{jk} = \sum_j ⟨ε_i|Û^\dagger|ε_j⟩⟨ε_j|Û|ε_k⟩ = ⟨ε_i| Û^\dagger Û |ε_k⟩ = ⟨ε_i|ε_k⟩ = δ_{ik} ,    (1.52)

where \sum_j |ε_j⟩⟨ε_j| = id and Û^\dagger Û = id have been used.

Still in the finite-dimensional case, consider two choices of ortho-normal basis (ε_1, . . . , ε_n) and (ε'_1, . . . , ε'_n)
and the matrices T_{ij} = ⟨ε_i|T̂|ε_j⟩ and T'_{ij} = ⟨ε'_i|T̂|ε'_j⟩ representing a linear operator T̂ with respect to either.
We have already written down the general relation between those two matrices in Eq. (1.15) but how does
this look for a change from one ortho-normal basis to another? Inserting identity operators (1.47) we find
T'_{ij} = ⟨ε'_i|T̂|ε'_j⟩ = \sum_{k,l=1}^{n} ⟨ε'_i|ε_k⟩⟨ε_k|T̂|ε_l⟩⟨ε_l|ε'_j⟩ = \sum_{k,l=1}^{n} Q_{ik} T_{kl} Q^*_{jl} = (Q T Q^\dagger)_{ij} ,    Q_{ij} := ⟨ε'_i|ε_j⟩    (1.53)

so that T' = Q T Q^\dagger. This result is, in fact, consistent with Eq. (1.15) since the matrix Q is unitary, so
Q† = Q−1 . This can be verified immediately:
(Q^\dagger Q)_{ij} = \sum_{k=1}^{n} Q^*_{ki} Q_{kj} = \sum_{k=1}^{n} ⟨ε'_k|ε_i⟩^* ⟨ε'_k|ε_j⟩ = \sum_{k=1}^{n} ⟨ε_i|ε'_k⟩⟨ε'_k|ε_j⟩ = ⟨ε_i|ε_j⟩ = δ_{ij} .    (1.54)

Using this formalism, we can also verify that Q relates coordinate vectors relative to the two choices of
basis, as stated in Eq. (1.16). From Eq. (1.48), the two coordinate vectors for a given vector |vi are given
by α_i = ⟨ε_i|v⟩ and α'_i = ⟨ε'_i|v⟩. It follows

α'_i = ⟨ε'_i|v⟩ = \sum_{j=1}^{n} ⟨ε'_i|ε_j⟩⟨ε_j|v⟩ = \sum_{j=1}^{n} Q_{ij} α_j .    (1.55)
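As a quick numerical check of Eqs. (1.53)-(1.55) (again a Python/NumPy sketch with ad hoc data, not part of the formal development), one can take the standard basis of C^n together with a second ortho-normal basis given by the columns of a random unitary matrix:

import numpy as np

rng = np.random.default_rng(2)
n = 4
T_op = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))  # operator in the standard basis

# Second ortho-normal basis: eps'_i = V[:, i] with V unitary (from a QR decomposition).
V, _ = np.linalg.qr(rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n)))

T = T_op                        # matrix elements w.r.t. the standard basis
Tp = V.conj().T @ T_op @ V      # T'_ij = <eps'_i|T|eps'_j>
Q = V.conj().T                  # Q_ij = <eps'_i|eps_j>

assert np.allclose(Q.conj().T @ Q, np.eye(n))       # Q is unitary, Eq. (1.54)
assert np.allclose(Tp, Q @ T @ Q.conj().T)          # T' = Q T Q^dagger, Eq. (1.53)

v = rng.standard_normal(n) + 1j * rng.standard_normal(n)
alpha, alpha_p = v, V.conj().T @ v                  # coordinates in the two bases
assert np.allclose(alpha_p, Q @ alpha)              # alpha' = Q alpha, Eq. (1.55)
print("change of ortho-normal basis verified numerically")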

1.1.8 Examples of inner product vector spaces


The standard finite-dimensional examples are of course Rn and Cn with scalar product defined by
⟨v, w⟩ := \sum_{i=1}^{n} v_i^* w_i ,    (1.56)

for vectors v = (v1 , . . . , vn )T and w = (w1 , . . . , wn )T . (We have followed the convention, mentioned above,
of writing the equations for the complex case. For the real case, simply drop the complex conjugation.)
The norm associated to this scalar product is of course the one given in Eq. (1.22). Linear maps are
described by n × n matrices and the adjoint of a matrix A, relative to the inner product (1.56), is given
by the hermitian conjugate A† . For the complex case, unitary linear maps are given by unitary matrices,
that is matrices U satisfying
U † U = 1n . (1.57)
For the real case, unitary linear maps, relative to the inner product (1.56), are given by orthogonal
matrices, that is matrices A satisfying
AT A = 1n . (1.58)
Both are important classes of matrices which we will return to in our discussion of symmetries in Section 9.
For an infinite-dimensional example, we begin with the space C[a, b] of continuous (complex-valued)
functions on the interval [a, b], equipped with the scalar product
⟨f, g⟩ := \int_a^b dx\, f(x)^* g(x) ,    (1.59)

for f, g ∈ C[a, b].

Exercise 1.23. Verify that Eq. (1.59) defines a scalar product on C[a, b]. (Hint: Check the conditions in
Def. 1.9).

The norm associated to this scalar product is given by the first equation (1.25). Consider the linear
operator Mp , defined in Eq. (1.20), which acts by multiplication with the function p. What is the adjoint
of Mp ? The short calculation
⟨f, M_p(g)⟩ = \int_a^b dx\, f(x)^* \,p(x) g(x) = \int_a^b dx\, \big(p(x)^* f(x)\big)^* g(x) = ⟨M_{p^*}(f), g⟩    (1.60)

shows that

M_p^\dagger = M_{p^*} ,    (1.61)
so the adjoint operator corresponds to multiplication with the complex conjugate function p∗ . If p is
real-valued so that p = p∗ then Mp is a hermitian operator. From the definition of the multiplication
operator it is clear that
Mp ◦ Mq = Mpq , M1 = id (1.62)
for two functions p and q. Eqs. (1.61) and (1.62) can be used to construct unitary multiplication operators.
For a real-valued function u we have

M_{e^{iu}}^\dagger ◦ M_{e^{iu}} = M_{e^{-iu}} ◦ M_{e^{iu}} = M_1 = id ,    (1.63)

so that multiplication with a complex phase eiu(x) (where u is a real-valued function) is a unitary operator.
This can also be verified directly from the scalar product:
⟨M_{e^{iu}}(f), M_{e^{iu}}(g)⟩ = \int_a^b dx\, \big(e^{iu(x)} f(x)\big)^* \big(e^{iu(x)} g(x)\big) = \int_a^b dx\, f(x)^* g(x) = ⟨f, g⟩ .    (1.64)

For another example of a unitary map, let us restrict to the space Cc (R) of complex-valued functions
on the real line with compact support, still with the scalar product (1.59), but setting a = −∞ and b = ∞.
(The compact support property is to avoid issues with the finiteness of the integral - we will deal with
this in more generality later.) On this space define the “translation operator” Ta : Cc (R) → Cc (R) by

Ta (f )(x) := f (x − a) , (1.65)

for any fixed a ∈ R. Evidently, this operator “shifts” the graph of the function by an amount of a along
the x-axis. Let us work out the effect of this operator on the scalar product. To find the adjoint of Ta we
calculate
⟨f, T_a(g)⟩ = \int_{-∞}^{∞} dx\, f(x)^* g(x − a) \overset{y=x−a}{=} \int_{-∞}^{∞} dy\, f(y + a)^* g(y) = ⟨T_{−a}(f), g⟩ ,    (1.66)

so that Ta† = T−a , that is, the adjoint is given by the shift in the opposite direction. To check unitarity
we work out
⟨T_a(f), T_a(g)⟩ = \int_{-∞}^{∞} dx\, f(x − a)^* g(x − a) \overset{y=x−a}{=} \int_{-∞}^{∞} dy\, f(y)^* g(y) = ⟨f, g⟩ ,    (1.67)

and conclude that Ta is indeed unitary. Alternatively, we can check the unitarity condition Ta† ◦ Ta =
T−a ◦ Ta = id which works out as expected since combining shifts by a and −a amounts to the identity
operation.
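A hedged numerical illustration of these two identities (Python with SciPy quadrature; Gaussians are used here as stand-ins for compactly supported functions, since they decay fast enough for the integrals to be computed reliably):

import numpy as np
from scipy.integrate import quad

f = lambda x: np.exp(-x**2)              # rapidly decaying test functions
g = lambda x: x * np.exp(-(x - 1.0)**2)
a = 0.7                                  # shift parameter

def inner(u, v):
    # <u, v> = integral of u(x)* v(x); the test functions here are real-valued
    return quad(lambda x: u(x) * v(x), -np.inf, np.inf)[0]

Ta = lambda h: (lambda x: h(x - a))      # translation operator T_a
Tma = lambda h: (lambda x: h(x + a))     # its adjoint T_{-a}

print(inner(f, Ta(g)), inner(Tma(f), g))   # <f, T_a g> = <T_{-a} f, g>
print(inner(Ta(f), Ta(g)), inner(f, g))    # <T_a f, T_a g> = <f, g>  (unitarity)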
To consider differential operators we restrict further to the inner product space Cc∞ (R) of complex-
valued, infinitely times differentiable functions with compact support, still with scalar product defined by
Eq. (1.59), setting a = −∞ and b = ∞. What is the adjoint of the differential operator D = d/dx? The
short calculation
⟨f, D(g)⟩ = \int_{-∞}^{∞} dx\, f(x)^* g'(x) = \underbrace{[f(x)^* g(x)]_{-∞}^{∞}}_{=0} − \int_{-∞}^{∞} dx\, f'(x)^* g(x) = ⟨−D(f), g⟩    (1.68)
(where the boundary term vanishes since the functions have compact support) shows that
\left(\frac{d}{dx}\right)^\dagger = −\frac{d}{dx} ,    (1.69)

so d/dx is anti-hermitian. As discussed earlier, for a complex inner product space, we can turn this into
a hermitian operator by multiplying with ±i, so that
\left(±i\frac{d}{dx}\right)^\dagger = ±i\frac{d}{dx} .    (1.70)

Another lesson from the above computation is that, for scalar products defined by integrals, the property
of being hermitian can depend on boundary conditions satisfied by the functions in the relevant function
vector space. In the case of Eq. (1.68) we were able to reach a conclusion because the boundary term
could be discarded due to the compact support property of the functions.
What about the composite operator Mx ◦ i d/dx? We know that the composition of two hermitian
operators is hermitian iff the two operators commute so let us work out the commutator (writing, for
simplicity, M_x as x)

\left[\, i\frac{d}{dx}\,,\, x \,\right] = i\frac{d}{dx} ◦ x − x ◦ i\frac{d}{dx} = i + ix\frac{d}{dx} − ix\frac{d}{dx} = i .    (1.71)
(If the above computation looks confusing remember we are dealing with operators, so think of the entire
equation above as acting on a function f . The second step in the calculation then amounts to using
the product rule for differentiation.) Since the above commutator is non-vanishing we conclude that
Mx ◦ i d/dx is not hermitian.
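This commutator can also be checked symbolically; the following short SymPy sketch (purely illustrative) applies the operator i d/dx ∘ x − x ∘ i d/dx to an unspecified function f and recovers multiplication by i:

import sympy as sp

x = sp.symbols('x', real=True)
f = sp.Function('f')(x)

# (i d/dx ∘ x − x ∘ i d/dx) acting on f: the product rule leaves i*f behind.
commutator_on_f = sp.I * sp.diff(x * f, x) - x * sp.I * sp.diff(f, x)
print(sp.simplify(commutator_on_f))   # prints I*f(x)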
So much for a few introductory examples of how to carry out calculations for infinite-dimensional
inner product spaces. We will now collect a few more mathematical tools required for a more systematic
approach.

1.1.9 Eigenvectors and eigenvalues


Recall the definition of eigenvalues and eigenvectors.

Definition 1.13. For a linear map T : V → V on a vector space V over F the number λ ∈ F is called
an eigenvalue of f if there is a non-zero vector v such that

T (v) = λv . (1.72)

In this case, v is called an eigenvector of f with eigenvalue λ.

The eigenspace for λ ∈ F is defined by

EigT (λ) := Ker(T − λ idV ) , (1.73)

so that λ is an eigenvalue iff EigT (λ) 6= {0}. If dim(EigT (λ)) = 1 the eigenvalue is called non-degenerate
(there is only one eigenvector up to re-scaling) and degenerate otherwise (there are at least two linearly
independent eigenvectors).
Let us recall the basic facts in the finite-dimensional case. The eigenvalues can be obtained by finding
the zeros of the characteristic polynomial

χT (λ) := det(T − λid) . (1.74)

For each eigenvalue λ the associated eigenspace is obtained by finding all solutions v ∈ V to the equation
(T − λid)v = 0. The most important application of eigenvalues and eigenvectors in the finite-dimensional
case is to diagonalising linear maps, that is, finding a basis in which the matrix describing the linear map
is diagonal. Recall that diagonalising a linear map T is possible if and only if T has a basis v1 , . . . , vn
of eigenvectors. Indeed, in this case T (vi ) = λi vi and the matrix describing T relative to this basis
is diag(λ1 , . . . , λn ). There are certain classes of linear operators which are known to have a basis of
eigenvectors and can, hence, be diagonalised. These include self-adjoint linear operators and normal
operators, that is, operators satisfying [T, T † ] = 0.
Some useful statements which are well-known in the finite-dimensional case continue to hold in infinite
dimensions, such as the following

Theorem 1.24. Let V be an inner product vector space. If T : V → V is self-adjoint then


(i) All eigenvalues of T are real.
(ii) Eigenvectors for different eigenvalues are orthogonal.

Exercise 1.25. Prove Theorem 1.24.

Application 1.3. Eigenvalues and eigenvectors for a differential operator


As an illustration of this theorem in the infinite-dimensional case, consider the space Cp∞ ([−π, π]) of
infinitely many times differentiable and periodic (real) functions on the interval [−π, π]. (By periodic
functions we mean functions f with f (π) = f (−π) and f 0 (π) = f 0 (−π).) On this vector space, we
define the usual inner product

⟨f, g⟩ := \int_{−π}^{π} dx\, f(x) g(x) .    (1.75)

A calculation analogous to the one in Eq. (1.68) (where periodicity allows discarding the boundary
term) shows that the operator d/dx is anti-hermitian and d2 /dx2 is hermitian relative to this inner
product.

Exercise 1.26. For the vector space Cp∞ ([−π, π]) with inner product (1.75) show that d/dx is anti-
hermitian and d2 /dx2 is hermitian.

From Theorem 1.24 we, therefore, conclude that eigenvectors of d2 /dx2 for different eigenvalues must
be orthogonal. To check this explicitly, we write down the eigenvalue equation

\frac{d^2 f}{dx^2} = λ f .    (1.76)
For λ > 0 the solutions to this equation are (real) exponential and, hence, cannot be elements of our
vector space of periodic functions. For λ < 0 the eigenfunctions are given by fk (x) = sin(kx) and
gk (x) = cos(kx), where λ = −k 2 . At this point k is still arbitrary real but for fk and gk to be periodic
with period 2π we need k ∈ Z. Of course for fk we can restrict to k ∈ Z>0 and for gk to k ∈ Z≥0 . In
summary, we have the eigenvectors and eigenvalues

fk (x) = sin(kx) k = 1, 2, . . . λ = −k 2
(1.77)
gk (x) = cos(kx) k = 0, 1, . . . λ = −k 2

In particular, this implies that the eigenvalues λ = −k 2 for k = 1, 2, . . . are degenerate. By direct
calculation, we can check that for k ≠ l, we have ⟨f_k, f_l⟩ = ⟨g_k, g_l⟩ = ⟨f_k, g_l⟩ = 0, as stated by
Theorem 1.24. In fact, we also have ⟨f_k, g_k⟩ = 0 which is not predicted by the theorem but follows
from direct calculation.

Exercise 1.27. Show that the functions (1.77) satisfy hfk , fl i = hgk , gl i = hfk , gl i = 0 for k 6= l as
well as hfk , gk i = 0, relative to the scalar product (1.75).
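These orthogonality relations are also easy to confirm numerically. The sketch below (Python with SciPy quadrature, with k and l chosen arbitrarily) evaluates the scalar product (1.75) for a few cases:

import numpy as np
from scipy.integrate import quad

def inner(u, v):
    # scalar product (1.75) on the interval [-pi, pi]
    return quad(lambda x: u(x) * v(x), -np.pi, np.pi)[0]

f = lambda k: (lambda x: np.sin(k * x))   # eigenfunctions f_k
g = lambda k: (lambda x: np.cos(k * x))   # eigenfunctions g_k

print(inner(f(2), f(5)))   # close to 0, different eigenvalues
print(inner(g(1), g(4)))   # close to 0, different eigenvalues
print(inner(f(3), g(3)))   # close to 0, same eigenvalue but still orthogonal
print(inner(f(3), f(3)))   # equals pi, the eigenfunctions are not normalised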

The above example leads to the Fourier series which we will discuss in Section 3.

There are also constraints on the possible eigenvalues of unitary operators.


Theorem 1.28. Let U : V → V be a unitary linear operator on an inner product space V . If λ is an
eigenvalue of U then |λ| = 1.
Exercise 1.29. Prove Theorem 1.28.

1.2 Topology and convergence∗


We will now introduce some basics of topology and convergence. In this sub-section, we will be working
in a normed vector space V over a field F with norm k · k and, hence, this will be more general than a
discussion in the context of real numbers, where these ideas are typically being introduced for the first
time. However, keep in mind that the real (and complex) numbers do form normed vector spaces and are,
hence, covered, as special cases, by the discussion below.

Generalities We begin by defining the ball Br (v) around any v ∈ V with radius r > 0 by

Br (v) := {w ∈ V | k v − w k < r} . (1.78)

Note that this is the “full” ball including all of the “interior” but, due to the strictly less condition in the
definition, without the bounding sphere.
We would like to consider infinite sequences (v1 , v2 , . . . , vi , . . .) of vectors vi ∈ V which we also denote
by (vi )∞
i=1 or simply by (vi ), when the range of the index i is clear from the context.

1.2.1 Convergence and Cauchy sequence


What does it mean for such a sequence to converge to a vector v ∈ V ?
Definition 1.14. (Convergence of a sequence) A sequence (v_i)_{i=1}^{∞} in a normed vector space V converges
to a vector v ∈ V if, for every ε > 0, there exists a positive integer k such that v_i ∈ B_ε(v) for all i > k.
In this case, we write limi→∞ vi = v.
Note that, while this definition might sound somewhat convoluted, it actually captures the intuitive idea
of convergence. It says that the sequence converges to v if, for every small deviation ε > 0, there is always
a “tail”, sufficiently far out, which is entirely contained within the ball of radius ε around v. (See Fig. 1.)

Application 1.4. Convergence of a sequence


As a very simple example for how to use the above definition of convergence consider the sequence
(aj ) in R (with the standard norm given by ||x|| = |x|) defined by aj = 1/j. It is intuitively clear
that this sequence converges to 0 but how can this be shown? We start with any ε > 0. For any such
ε we can always choose a sufficiently large k ∈ N such that k > 1/ε. Then, for any j > k we have
|a_j − 0| = 1/j < 1/k < ε. This shows that lim_{j→∞} a_j = 0.

There is a related, but somewhat weaker notion of convergence which avoids talking about the vector the
sequence converges to. Sequences which converge in this weaker sense are called Cauchy sequences and
are defined as follows.

Definition 1.15. (Cauchy sequence) A sequence (v_i)_{i=1}^{∞} in a normed vector space V is called a Cauchy
sequence if, for every ε > 0, there exists a positive integer k such that ‖v_i − v_j‖ < ε for all i, j > k. (See
Fig. 1.)

In other words, a sequence is a Cauchy sequence if for every small ε > 0 there is a “tail”, sufficiently far
out, such that the norm between each two vectors in the tail is less than ε. The notions of convergent

Figure 1: Convergence of a sequence (vk ) to a limit v (left) and Cauchy convergence (right).

sequence and Cauchy sequence lead to analogous notions for a series \sum_{i=1}^{∞} v_i, which can be defined by
focusing on its partial sums.

Definition 1.16. A series \sum_{i=1}^{∞} v_i is called convergent to a vector v (is called a Cauchy series) iff the
associated sequence of partial sums (s_k)_{k=1}^{∞}, where s_k = \sum_{i=1}^{k} v_i, converges to the vector v (is a Cauchy
sequence).
Exercise 1.30. Show that every convergent sequence in a normed vector space is also a Cauchy sequence.
(Hint: Use the triangle inequality.)

Application 1.5. A divergent series


We would like to show that the series \sum_{j=1}^{∞} \frac{1}{j} in R (with the standard absolute value norm) diverges
and to do this it is enough to show that it is not Cauchy convergent. We look at the difference |s_i − s_j|
of two partial sums and, assuming i < j, this is given by |s_i − s_j| = \sum_{l=i+1}^{j} \frac{1}{l}. Choose i = 2^n − 1
and j = 2^{n+1} − 1 for some integer n. In this case, we have

|s_i − s_j| = \frac{1}{2^n} + \frac{1}{2^n + 1} + · · · + \frac{1}{2^{n+1} − 1} > 2^n \cdot \frac{1}{2^{n+1}} = \frac{1}{2} ,

where the inequality follows from the fact that we have 2^n terms each of which is larger than 1/2^{n+1}.
Choose an ε < 1/2 and for any k an integer n with 2^n > k + 1. Then, setting i = 2^n − 1 and
j = 2^{n+1} − 1 we have i, j > k and |s_i − s_j| > 1/2 > ε so the condition for Cauchy convergence cannot
be satisfied.
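The block estimate used above is easy to reproduce numerically; the following short Python sketch (illustrative only) sums the harmonic series over the blocks l = 2^n, . . . , 2^{n+1} − 1 and shows that each block contributes more than 1/2:

import numpy as np

# Each block of 2^n consecutive terms of the harmonic series exceeds 1/2,
# so the partial sums cannot form a Cauchy sequence.
for n in range(1, 16):
    block = np.arange(2**n, 2**(n + 1), dtype=float)
    print(n, np.sum(1.0 / block))   # always > 1/2 (and tends to log 2, about 0.693)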

For a series there is also a stronger version of convergence, called absolute convergence.

Definition 1.17. A series \sum_{i=1}^{∞} v_i is called absolutely convergent if \sum_{i=1}^{∞} ‖v_i‖ converges (as a series
over the real numbers).

Exercise 1.31. Show that an absolutely convergent series is a Cauchy series.

While every convergent sequence is a Cauchy sequence the opposite is not true. The classical example is
provided by the rational numbers Q (viewed as a normed space, with the absolute modulus as the norm)
and a sequence (qi ) of rational numbers which converges to a real number x ∈ R \ Q. This is clearly a
Cauchy sequence but it does not converge since the prospective limit is not contained in Q (although it
does converge seen as a sequence in R). This example is typical and points to an intuitive understanding
of what non-convergent Cauchy sequences mean. They indicate a deficiency of the underlying normed
vector space which has “holes” and is, as far as convergence properties are concerned, incomplete. This
idea will play an important role for the definition of Banach and Hilbert spaces later on and motivates
the following definition of completeness.

Definition 1.18. A normed vector space is called complete iff every Cauchy series converges.

1.2.2 Open and closed sets


Our next step is to introduce some basic topology in normed vector spaces, most importantly the ideas
of open and closed sets.

Definition 1.19. (Open and closed sets) Let V be a normed vector space.
A subset U ⊂ V is called open if, for every u ∈ U , there exists an ε > 0 such that B_ε(u) ⊂ U .
A subset C ⊂ V is called closed if V \ C is open.
For an arbitrary subset S ⊂ V , the closure of S, denoted S̄, is the smallest closed set which contains S.
A v ∈ V is called a limit point of a subset S ⊂ V if there is a sequence (vi ), entirely contained in S, which
converges to v.

An open set is simply a set which contains a ball around each of its points while a closed set is the
complement of an open set.

Exercise 1.32. Show that a ball Br (v) in a normed vector space is open for every r > 0 and every v ∈ V .

The ideas of convergence and closed sets relate in an interesting way as stated in the following lemma.

Proposition 1.2. For a normed vector space V and a subset S ⊂ V we have the following statements.
(a) S is closed ⇐⇒ All limit points of S are contained in S.
(b) The closure, S̄, consists of S and all its limit points.

Proof. (a) “⇒”: Assume that S is closed so that its complement U := V \ S is open. Consider a limit
point v of S, with a sequence (vi ) contained in S and converging to v. We need to show that v ∈ S.
Assume that v ∈ U . Since U is open there is a ball B_ε(v) ⊂ U entirely contained in U and v_i ∉ B_ε(v)
for all i. But this means the sequence (vi ) does not converge to v which is a contradiction. Hence, our
assumption that v ∈ U was incorrect and v ∈ S.
(a) “⇐”: Assume all limit points of S are contained in S. We need to show that U := V \ S is open.

Assume that U is not open, so that there is a u ∈ U such that every ball B_ε(u) around u contains a
v with v ∉ U . For every positive integer k, choose such a v_k ∈ B_{1/k}(u) with v_k ∉ U . It is clear that
the sequence (v_k) is entirely contained in S (since it is not in the complement U) and converges to u ∉ S.
Hence, u is a limit point of S but it is not contained in S. This is a contradiction, so our assumption is
incorrect and U must be open.
(b) Define the set Ŝ = S ∪ {all limit points of S}. Using the result (a) it is straightforward to show that
Ŝ is closed. Hence, Ŝ is a closed set containing S which implies, S̄ being defined as the smallest such set,
that S̄ ⊂ Ŝ. Conversely, since S̄ is closed it must contain, by (a), all its limit points, including the limit
points of S, so that Ŝ ⊂ S̄.

Another important class of subsets are the compact ones:

Definition 1.20. A set S ⊂ V is called compact iff it is closed and bounded, that is, if there is an R > 0
such that k v k < R for all v ∈ S.

In the context of Hilbert spaces the ideas of a dense subset and separability will become important.

Definition 1.21. A subset S ⊂ V of a normed vector space V is called dense if S̄ = V . A normed vector
space is called separable iff it has a countable, dense subset, that is, a dense subset of the form (vi )∞
i=1 .

The relevance of dense subsets can be seen from the following exercise.

Exercise 1.33. For a normed vector space V and a subset S ⊂ V , prove the equivalence of the following
statements.
(i) S is dense in V
(ii) Every v ∈ V is a limit point of S.

Hence, every element of the vector space can be “approximated” from within a dense subset.

Exercise 1.34. Show that every finite-dimensional normed vector space over R (or over C) is separable.
(Hint: Consider a basis (v_i)_{i=1}^{n} and linear combinations \sum_{i=1}^{n} α_i v_i, where α_i ∈ Q (or α_i ∈ Q + iQ).)

An important theoretical statement that will help with some of our later proofs is

Theorem 1.35. (Stone-Weierstrass) The set of polynomials PR ([a, b]) is dense in CR ([a, b]).

Proof. For the proof, see for example Ref. [5].
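A numerical impression of this density statement can be obtained by fitting polynomials of increasing degree to a continuous but non-smooth function. The sketch below (Python/NumPy) uses a least-squares fit on a fine grid merely as a stand-in for the uniform approximation guaranteed by the theorem; the maximal error on the grid keeps decreasing with the degree.

import numpy as np

x = np.linspace(-1.0, 1.0, 2001)
f = np.abs(x)                              # continuous but not differentiable at 0
for deg in [2, 4, 8, 16, 32, 64]:
    p = np.polynomial.Polynomial.fit(x, f, deg)   # least-squares polynomial fit
    print(deg, np.max(np.abs(p(x) - f)))          # maximal error on the grid decreases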

Finally, we should briefly discuss other notions of convergence which can be defined for function vector
spaces and are based on point-wise convergence in R or C (with the modulus norm), rather than on the
norm k · k as in Def. 1.14.

Definition 1.22. Let (f_i)_{i=1}^{∞} be a sequence of (real or complex valued) functions on the open set U ⊂ R^n.
We say that the sequence converges point-wise to a function f on U iff the sequence (f_i(x))_{i=1}^{∞} converges
to f(x) for all x ∈ U (with respect to the real or complex modulus norm).
We say the sequence converges to f uniformly iff for every ε > 0 there exists an n ∈ N such that |f_i(x) −
f(x)| < ε for all i > n and for all x ∈ U .

Uniform convergence demands that a single value n, specifying the “tail” of the sequence, can be chosen
uniformly for all x ∈ U . Point-wise convergence merely asks for the existence of an n for each point x ∈ U ,
that is, the choice of n can depend on the point x. Hence, uniform convergence is the stronger notion and
it implies point-wise convergence.

1.3 Measures and integrals∗
We have already seen that norms and scalar products defined by integrals, as in Eqs. (1.25) and (1.59),
are commonplace in function vector spaces. To understand such spaces properly we need to understand
the integrals and this is where things become tricky. While you do know how to integrate classes of
functions you may not yet have seen the actual definition of the most basic type of integral, the Riemann
integral. To complicate matters further, the Riemann integral is actually not what we need for the present
discussion but we require a rather more general integral - the Lebesgue integral. Introducing either type
of integral properly is labour intensive and can easily take up the best part of a lecture course - clearly
not something we can indulge in. On the other hand, these integrals form the background for much of
the discussion that follows. For these reasons, we will try to sketch out the main ideas, starting with the
Riemann integral and then move on to the Lebesgue integral, without going through all the details and
proofs. Along the way, we will introduce the ideas of measures and measure spaces which are very useful
general structures underlying many mathematical constructions.

1.3.1 The Riemann integral


As a warm-up, we would now like to sketch the construction of the Riemann integral - a classical topic for
a first year analysis course. We begin with a finite interval [a, b] ⊂ R, our prospective range of integration.
We call a function ϕ : [a, b] → R piecewise constant if there is a partition x0 = a < x1 < · · · < xn−1 <
xn = b of the interval such that for all x ∈]xi−1 , xi [ the function value ϕ(x) =: ci is constant, where
i = 1, . . . , n. It is very easy to define an integral for piecewise constant functions ϕ by
\int_a^b dx\, ϕ(x) := \sum_{i=1}^{n} (x_i − x_{i−1}) c_i ,    (1.79)

that is, by simply summing up the areas of the rectangles associated to the segments of the partition. It
can be shown that this integral is well-defined (that is, it is independent of the choice of partition), that
the space, Γ([a, b]) of all piecewise constant functions on [a, b] is a vector space and that the integral (1.79)
is a linear functional on this space.
Exercise 1.36. Show that the space Γ([a, b]) of all piecewise constant functions on the interval [a, b] ⊂ R
is a vector space. Also show that the integral (1.79) is a linear functional on Γ([a, b]).
Of course having an integral for piecewise constant functions is not yet good enough. The idea is to
define the Riemann integral by approximating functions by piecewise constant functions and then taking
an appropriate limit. For two functions f, g : [a, b] → R we say that f ≤ g (f ≥ g) iff f (x) ≤ g(x)
(f (x) ≥ g(x)) for all x ∈ [a, b]. For an arbitrary (but bounded) function f : [a, b] → R we introduce the
upper and lower integral by 4
\int_a^{*b} dx\, f(x) := \inf\left\{ \int_a^b dx\, ϕ(x) \;\Big|\; ϕ ∈ Γ([a, b]) , ϕ ≥ f \right\}    (1.80)

\int_{*a}^{b} dx\, f(x) := \sup\left\{ \int_a^b dx\, ϕ(x) \;\Big|\; ϕ ∈ Γ([a, b]) , ϕ ≤ f \right\} .    (1.81)

Note that these definitions precisely capture the intuition of approximating f by piecewise constant func-
tions, the upper integral by using piecewise constant function “above” f , the lower integral by using
piecewise constant functions “below” f . (See Fig. 2.) After this set-up we are ready to define the Rie-
mann integral.
4
For a subset S ⊂ R, the supremum, sup(S), is the smallest number which is greater than or equal to all elements of S. Likewise,
the infimum, inf(S), of S is the largest number less than or equal to all elements in S.


Figure 2: Approximation of a function f by lower and upper piecewise constant functions, used in the definition
of the Riemann integral.

Definition 1.23. (Riemann integral) A (bounded) function f : [a, b] → R is called Riemann integrable iff
the upper and lower integrals in Eqs. (1.80) and (1.81) are equal. In this case, the common value is called
the Riemann integral of f and is written as
\int_a^b dx\, f(x) .    (1.82)

This is where the work begins. Now we have to derive all the properties of the Riemann integral (which
you are already familiar with) from this definition. We are content with citing

Theorem 1.37. All continuous and piecewise continuous functions f : [a, b] → R are Riemann integrable.

The proof of the above theorem, along with a proof of all the other standard properties of the Riemann
integral starting from Def. 1.23, can be found in most first year analysis textbooks, see for example [10].
We cannot possibly spend more time on this but hopefully the above set-up gives a clear enough starting
point to pursue this independently (which I strongly encourage you to do).
We have already used integrals, somewhat naively, in the definitions (1.25) and (1.59) of a norm and
a scalar product on the space C([a, b]) of continuous functions. We can now be more precise and think of
these integrals as Riemann integrals in the above sense. Unfortunately, this does not lead to particularly
nice properties. For example, for C([a, b]) with the norm (1.25), there are Cauchy convergent sequences
(of functions) which converge to non-continuous functions.

Exercise 1.38. Find an example of a sequence of functions in C([a, b]) (where you can choose a and b)
which converges, relative to the norm (1.25), to a function not contained in C([a, b]).

This means that C([a, b]) is not a complete space, in the sense of Def. (1.18). What is worse, it turns
out that even the space of all Riemann integrable functions on [a, b] is not complete, so the deficiency is
with the Riemann integral. Essentially, the problem is that the Riemann integral is based on a too simple
method of approximation, using finite partitions into intervals. To fix this, we need to be able to measure
the “length” of sets which are more complicated than intervals and this leads to the ideas of measures and
measure sets which we now introduce.

1.3.2 Measures and measure sets
This discussion starts fairly general with an arbitrary set X, subsets of which we would like to measure.
Typically, not all such subsets will be suitable for measurement and we need to single out a sufficiently
nice class of subsets Σ, which is called a σ-algebra.
Definition 1.24. (σ-algebra) For a set X, a set of subsets Σ of X is called a σ-algebra if the following is
satisfied.
(S1) {} , X ∈ Σ
(S2) S ∈ Σ ⇒ X \ S ∈ Σ
(S3) S_i ∈ Σ for i = 1, 2, · · · ⇒ \bigcup_{i=1}^{∞} S_i ∈ Σ

On a σ-algebra, a measure µ is defined as:


Definition 1.25. For a set X with σ-algebra Σ a function µ : Σ → R≥0 ∪ {∞} is called a measure iff
(M1) µ({}) = 0
(M2) For Si ∈ Σ, where i = 1, 2, · · · , and the Si mutually disjoint we have
µ\left( \bigcup_{i=1}^{∞} S_i \right) = \sum_{i=1}^{∞} µ(S_i) .    (1.83)

The triple (X, Σ, µ) is called a measure space.


The idea is that the σ-algebra generalises the simple notion of intervals and interval partitions used for
the Riemann integral, while the measure µ generalises the notion of the length of an interval. The crucial
difference to what happened for the Riemann integral (where partitions into finitely many intervals were
used) is that we consider countably infinite unions in (S3) and (M2).
While from (M1) the measure of the empty set in a measure space is zero, there may well be other
non-empty sets with zero measure. This motivates the following definition.
Definition 1.26. For a measure space (X, Σ, µ) a set S ∈ Σ is said to have measure zero iff µ(S) = 0.
Proceeding in analogy with the Riemann integral we can now attempt to define an integral for functions
f : X → R, based on the measure space (X, Σ, µ). We start with the characteristic function χS for a
subset S ⊂ X defined as 
χ_S(x) := \begin{cases} 1 & \text{for } x ∈ S \\ 0 & \text{for } x ∉ S \end{cases}    (1.84)
A function ϕ : X → R is called simple if it is of the form ϕ = \sum_{i=1}^{k} α_i χ_{S_i} for sets S_i ∈ Σ and α_i ∈ R.
(These are the generalisations of the piecewise constant functions used for the Riemann integral.) For a
non-negative simple function ϕ the integral is defined as

\int_X ϕ\, dµ := \sum_{i=1}^{k} α_i µ(S_i) .    (1.85)

Note that this captures the intuition: we simply multiply the “height” αi of the function over each set Si
with the measure µ(Si ) of this set. The set-up of the general integral on a measure space is summarised
in the following definition.
Definition 1.27. Let (X, Σ, µ) be a measure space. We say a function f : X → R is measurable iff
{x ∈ X | f(x) > α} ∈ Σ for all α ∈ R. For a non-negative measurable function f : X → R the integral is
defined as

\int_X f\, dµ := \sup\left\{ \int_X ϕ\, dµ \;\Big|\; ϕ \text{ simple and } 0 ≤ ϕ ≤ f \right\} .    (1.86)

A function is called integrable iff it is measurable and \int_X |f|\, dµ is finite. For a measurable, integrable
function f : X → R the integral is then defined as

\int_X f\, dµ := \int_X f^+\, dµ − \int_X f^-\, dµ ,    (1.87)

where f ± (x) := max{±f (x), 0} are the positive and negative “parts” of f . The space of all integrable
functions f : X → R is also denoted by L1 (X). The above construction can be generalised to complex-
valued functions f : X → C by splitting up into real and imaginary parts.
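To get a feeling for the “horizontal” approximation by simple functions in Eq. (1.86), the following Python sketch approximates a non-negative function from below by simple functions built from finer and finer level sets. The measure of each level set is estimated on a grid, so this is only an illustration of the idea, not a rigorous construction.

import numpy as np

f = lambda x: np.sqrt(x)              # non-negative; its integral over [0, 1] is 2/3
x = np.linspace(0.0, 1.0, 200001)     # grid used to estimate the measure of level sets
dx = x[1] - x[0]

for levels in [4, 16, 64, 256]:
    h = 1.0 / levels
    # Simple function phi = sum_k (k*h) * chi_{S_k} with S_k = {x : k*h <= f(x) < (k+1)*h};
    # it satisfies 0 <= phi <= f and its integral is sum_k (k*h) * mu(S_k).
    phi = np.floor(f(x) / h) * h
    print(levels, np.sum(phi) * dx)   # increases towards 2/3 from below as h -> 0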

It can be shown that the integral defined above is linear and that the space L1 (X) is a (sub) vector space.
The obvious course of action is to try to make L1 (X) into a normed vector space by using the above
integral to define a norm. However, there is a twist. If there are non-trivial sets S ∈ Σ which are measure
zero, then we have non-trivial functions, for example the characteristic function χS , which integrate to
zero. This is in conflict with the requirement (N1) for a norm in Def. 1.6 which asserts that the zero
vector (that is, the zero function) is the only vector with length zero. Fortunately, this problem can be
fixed by identifying two functions f, g ∈ L1 (X) if they only differ on a set of measure zero, so

f ∼ g  :⇐⇒  µ({x ∈ X | f(x) ≠ g(x)}) = 0 .    (1.88)

The space of so-obtained classes of functions in \mathcal{L}^1(X) is called L^1(X) and this set can be made into a
normed space by defining the norm ‖ · ‖ : L^1(X) → R via

‖f‖ := \int_X |f|\, dµ .    (1.89)

We can generalise this construction and for 1 ≤ p < ∞ define the spaces
\mathcal{L}^p(X) = \left\{ f \;\Big|\; f \text{ is measurable and } \left( \int_X |f|^p\, dµ \right)^{1/p} \text{ finite} \right\} .    (1.90)

On these spaces, we can identify functions as in Eq. (1.88) and the resulting space of classes, L^p(X), can
be made into normed vector spaces with norm

‖f‖_p := \left( \int_X |f|^p\, dµ \right)^{1/p} .    (1.91)

Exercise 1.39. Show that that Eq. (1.91) defines a norm on Lp (X). (Hint: To show (N3) use Minkowski’s
inequality (1.24).)

The all-important statement about the normed vector spaces Lp (X) is the following.

Theorem 1.40. The normed vector spaces Lp (X) in Eq. (1.90) with norm (1.91) are complete.

Proof. The proof can be found, for example, in Ref. [5].

Our previous experience suggests that the space L^2(X) can in fact be given the structure of an inner
product vector space. To do this we need the following

Exercise 1.41. Show that for f, g ∈ L2 (X) it follows that f¯g is integrable, that is f¯g ∈ L1 (X). (Hint:
Use Hölder’s inequality in Eq. (1.24).)

Hence, for two functions f, g ∈ L^2(X), the prospective scalar product

⟨f, g⟩ := \int_X f^* g\, dµ    (1.92)

is well-defined.
Exercise 1.42. Show that Eq. (1.92) defines a scalar product on L2 (X).
Recall that X is still an arbitrary set so the above construction of measure spaces and integrals is very
general. In particular, the statement of Theorem 1.40 about completeness is quite powerful. It says that the spaces
Lp (X) behave nicely in terms of convergence properties - every Cauchy series converges. This is quite
different from what we have seen for the Riemann integral. We should now exploit this general construction
by discussing a number of examples.

1.3.3 Examples of measure spaces


Probability: This is perhaps an unexpected example and somewhat outside our main line of develop-
ment. It might still be useful to illustrate the strength and breadth of the general approach and to make
connections with other courses, where probability is discussed in more detail. The basis of probability
theory is formed by the Kolmogorov axioms which are given in the following definition.
Definition 1.28. (Kolmogorov axioms of probability) Let Ω be a set, Σ a σ-algebra on Ω and p : Σ → R
a function. The triplet (Ω, Σ, p) is called a probability space if the following holds.
(K1) p(E) ≥ 0 for all E ∈ Σ
(K2) p(Ω) = 1
(K3) For Ei ∈ Σ, where i = 1, 2, · · · , and the Ei mutually disjoint we have
∞ ∞
!
[ X
p Ei = p(Ei ) . (1.93)
i=1 i=1

In this case, Ω is called the sample space, Σ the event space and p the probability measure.
Comparing this definition with Def. 1.25 shows that a probability space (Ω, Σ, p) is, in fact, a particular
measure space with a few additional properties for p, in order to make it a suitable measure for proba-
bility. (The condition (M1) in Def. 1.25, µ({}) = 0, can be deduced from the Kolmogorov axioms.) The
measurable functions f : Ω → R on this space are also called random variables and the integral
Z
E[f ] := f dp (1.94)

is called the expectation value of the random variable f .


Counting measure: Choose X = N to be the natural numbers, Σc to be all subsets of N and for a set
S ∈ Σc define the measure µc (S) as the number of elements of S (with ∞ permitted). Then, (N, Σc , µc )
is a measure space and µc is called the counting measure on N. The functions f : N → R on this space

can be identified with the sequences (x_i)_{i=1}^{∞} (where x_i = f(i − 1)) and the integrable “functions” are
those with \sum_{i=1}^{∞} |x_i| < ∞ while the integral is simply the series \sum_{i=1}^{∞} x_i. Specialising from the general
construction (1.90), we can define the spaces ℓ^p := L^p(N) which are explicitly given by

ℓ^p = \left\{ (x_i)_{i=1}^{∞} \;\Big|\; \left( \sum_{i=1}^{∞} |x_i|^p \right)^{1/p} < ∞ \right\} ,    (1.95)

which are normed vector spaces with norm

‖(x_i)‖_p = \left( \sum_{i=1}^{∞} |x_i|^p \right)^{1/p} .    (1.96)

Recall that we know from Theorem 1.40 that the spaces `p are complete, relative to this norm. The space
`2 is an inner product space with scalar product

⟨(x_i), (y_i)⟩ = \sum_{i=1}^{∞} \bar{x}_i y_i .    (1.97)
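As a small numerical aside (Python/NumPy, purely illustrative), the sequence x_i = 1/i shows the difference between the spaces ℓ^1 and ℓ^2: its partial ℓ^1 sums keep growing while its ℓ^2 norm converges.

import numpy as np

for N in [10**3, 10**4, 10**5, 10**6]:
    i = np.arange(1, N + 1, dtype=float)
    x = 1.0 / i
    # partial l^1 and l^2 norms of the truncated sequence
    print(N, np.sum(np.abs(x)), np.sqrt(np.sum(x**2)))
    # the first column grows like log N; the second tends to pi/sqrt(6), about 1.2825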

Lebesgue measure: The Lebesgue measure provides a measure on R (and, more generally, on Rn ) but
constructing it takes some effort and time. Instead we take a short-cut and simply state the following
theorem.
Theorem 1.43. There is a σ-algebra ΣL on R and a measure µL on ΣL , called the Lebesgue measure,
with the following properties.
(L1) All intervals [a, b] ∈ ΣL .
(L2) µL ([a, b]) = b − a
(L3) The sets S of measure zero in Σ_L are characterised as follows. For any ε > 0 there are intervals
[a_i, b_i], where i = 1, 2, · · ·, such that S ⊂ \bigcup_{i=1}^{∞} [a_i, b_i] and \sum_{i=1}^{∞} (b_i − a_i) < ε.
The measure space (R, ΣL , µL ) is uniquely characterised by these properties.
Note that the Lebesgue measure leads to non-trivial sets with measure zero. For example, any finite set
of points in R has measure zero.

Exercise 1.44. Show that any finite set of points and any sequence (x_i)_{i=1}^{∞} in R have measure zero with
respect to the Lebesgue measure µ_L.
The above Lebesgue measure has been defined in R and, hence, measures length but it can be suitably
generalised to R2 to measure areas, to R3 to measure volumes and to Rn to measure generalised volumes in
n dimensions. This means we have measure spaces (Rn , ΣL , µL ). Of course, this induces measure spaces on
subsets U ⊂ R^n as long as U ∈ Σ_L by simply defining the restricted σ-algebra Σ_L(U) = {S ∈ Σ_L | S ⊂ U}
and in this way we have measure spaces (U, ΣL (U ), µL ).
The integral associated with the measure space (Rn , ΣL , µL ) (or, more generally, with the measure
space (U, ΣL (U ), µL )) is called the Lebesgue integral and it is written as
\int_U dx\, f(x) .    (1.98)

The Lebesgue-integrable functions are those for which \int_U dx\, |f(x)| is finite and, following the general
construction, we can define the spaces

\mathcal{L}^p(U) = \left\{ f \;\Big|\; \left( \int_U dx\, |f(x)|^p \right)^{1/p} < ∞ \right\} .    (1.99)

The associated spaces Lp (U ), obtained after the identification (1.88) of functions which only differ on sets
of measure zero, are complete normed vector spaces with norm
‖f‖_p = \left( \int_U dx\, |f(x)|^p \right)^{1/p} .    (1.100)
The space L2 (U ) is an inner product vector space with inner product
⟨f, g⟩ = \int_U dx\, f(x)^* g(x) .    (1.101)

As for the relation between the Riemann and the Lebesgue integrals we have

Theorem 1.45. Every Riemann-integrable function is Lebesgue integrable and for such functions the two
integrals are equal.

Proof. For the proof see, for example, Ref. [10].

This means that for practical calculations with the Lebesgue integral we can use all the usual rules of
integration, as long as the integrand is sufficiently “nice” (for example, Riemann integrable). While the
Riemann-integrable functions are included in the Lebesgue-integrable ones, the latter set is much larger
and this facilitates the completeness properties associated to the Lebesgue integral. In the following, when
we write an integral, we usually refer to the Lebesgue integral.

2 Banach and Hilbert spaces∗
Banach and Hilbert spaces are central objects in functional analysis and their systematic mathematical
study easily fills a lecture course. Clearly, we cannot afford to do this so we will focus on basics and some
of the results relevant to our later applications. Proofs are provided explicitly only when they can be
provided in a concise fashion and references will be given whenever proofs are omitted. Our main focus
will be on Hilbert spaces which provide the arena of most of the applications discussed later and indeed
are the correct setting for quantum mechanics. We begin with the more basic notion of Banach spaces,
then move on to Hilbert spaces and finish with a discussion of operators on Hilbert spaces.

2.1 Banach spaces


We begin with the definition of Banach spaces.

Definition 2.1. A Banach space is a complete normed vector space.

Recall from Def. 1.18 that completeness means convergence of every Cauchy series, so a Banach space is a
vector space with a basic notion of geometry, as provided by the norm, and good convergence properties.
We have already encountered several important examples of Banach spaces which we recall.

2.1.1 Examples of Banach spaces


• The normed vector spaces Rn and Cn with any of the norms k · kp in Eq. (1.23) are complete
(essentially because the real numbers are complete) and they are, hence, Banach spaces.

• For a measure space (X, Σ, µ), where X is an arbitrary set, we have defined the Banach spaces L^p(X)
in Eq. (1.90). It consists of all measurable functions f : X → R (or f : X → C) with \left( \int_X |f|^p\, dµ \right)^{1/p}
finite and the norm is given in Eq. (1.91). Completeness of these normed vector spaces is asserted by
Theorem 1.40. This is a very large class of Banach spaces which includes many interesting examples,
some of which we list now.

• Associated to the measure space (N, Σ_c, µ_c) with counting measure µ_c, introduced in the previous
sub-section, we have the space ℓ^p of all sequences (x_i)_{i=1}^{∞} in R (or C) with \left( \sum_{i=1}^{∞} |x_i|^p \right)^{1/p} finite. The
norm on this space is provided by Eq. (1.96) and Theorem 1.40 guarantees completeness.

• For a Lebesgue measure space (U, Σ_L(U), µ_L), where U ⊂ R^n is a Lebesgue measurable set, we
have defined the space L^p(U) which consists of measurable functions f : U → R (or f : U → C)
with \left( \int_U dx\, |f(x)|^p \right)^{1/p} finite. (More precisely, L^p(U) consists of classes of such functions, identified
according to Eq. (1.88).) The norm on these spaces is given by Eq. (1.100) and completeness is
guaranteed by Theorem 1.40. We will sometimes write L^p_R(U) or L^p_C(U) to indicate whether we are
talking about real or complex valued functions.

An important theoretical property for the Banach spaces Lp ([a, b]) which we will need later is

Theorem 2.1. The space C([a, b]) is dense in Lp ([a, b]). Further, the space Cc∞ (Rn ) is dense in Lp (Rn ).

Proof. For the proof see, for example, Ref. [5].

2.2 Hilbert spaces
Hilbert spaces are defined as follows.

Definition 2.2. An inner product vector space H is called a Hilbert space if it is complete (relative to the
norm associated to the scalar product).

We know that the Banach spaces given in the previous sub-section can be equipped with a scalar product
when p = 2 and this provides us with examples of Hilbert spaces.

2.2.1 Examples of Hilbert spaces


• The inner product vector spaces Rn and Cn with inner product (1.56) are complete (since they are
Banach spaces relative to the norm associated to this scalar product) and they are, hence, Hilbert
spaces. We know that any finite-dimensional inner product vector space over the field R (over the
field C) is isomorphic to Rn (or Cn ) by mapping a vector to its coordinate vector relative to some
chosen basis. If we choose an ortho-normal basis we know from Eq. (1.32) that, in terms of the
coordinates, the scalar product can be expressed in terms of the standard scalar product on Rn or
Cn . Together, these facts imply that any finite-dimensional inner product vector space over R or C
is a Hilbert space.

• For the measure set (X, Σ, µ), the space L2 (X), defined in Eq. (1.90) is an inner product vector
space with inner product given by Eq.(1.92). We already know that this is a Banach space (relative
to the norm associated to the scalar product), so L2 (X) is complete and, hence, a Hilbert space.

• Associated to the measure space (N, Σ_c, µ_c) with counting measure µ_c we have the space ℓ^2 of all
sequences (x_i)_{i=1}^{∞} in R (or C) with \left( \sum_{i=1}^{∞} |x_i|^2 \right)^{1/2} finite. An inner product on this space is given by
Eq. (1.97). Since ℓ^2 is a Banach space it is complete and is, hence, also a Hilbert space.

• For a Lebesgue measure space (U, Σ_L(U), µ_L), where U ⊂ R^n is a Lebesgue measurable set, we
have defined the space L^2(U) which consists of measurable functions f : U → R (or f : U → C)
with \left( \int_U dx\, |f(x)|^2 \right)^{1/2} finite. This is an inner product vector space with inner product given by
Eq. (1.101). Following the same logic as before, L^2(U) is a Banach space and it is, hence, complete
and a Hilbert space. This space is also called the Hilbert space of square integrable functions on U .
We will sometimes write L2R (U ) or L2C (U ) to indicate whether we are talking about real or complex
valued functions.

• There is a useful generalisation of the previous example which we will need later. On an interval
[a, b] ⊂ R introduce an everywhere positive, integrable function w : [a, b] → R>0 , called the weight
function, and define the space L2w ([a, b]) as the space of measurable functions f : [a, b] → R with
\left( \int_{[a,b]} dx\, w(x) |f(x)|^2 \right)^{1/2} finite. We can introduce

⟨f, g⟩ := \int_{[a,b]} dx\, w(x) f(x)^* g(x) .    (2.1)

With the usual identification of functions, as in Eq. (1.88), this leads to a Hilbert space, called
L2w ([a, b]), with scalar product (2.1).

2.2.2 Orthogonal basis
We have seen that an ortho-normal basis for a finite-dimensional Hilbert space is really the most convenient
tool to carry out calculations. We should now discuss the concept of ortho-normal basis for infinite-
dimensional Hilbert spaces. One question we need to address first is what happens when we take a limit
inside one of the arguments of the scalar product.
Lemma 2.1. For a convergent sequence (vi )∞ i=1 in a Hilbert space H and any vector w ∈ H we have
limi→∞ hw, vi i = hw, limi→∞ vi i. A similar statement applies to the first argument of the scalar product.
Proof. Set v := limi→∞ vi and consider the inequality

|hw, vi i − hw, vi| = |hw, vi − vi| ≤ k w k k vi − v k , (2.2)

where the last step follows from the Cauchy-Schwarz inequality (1.28). Convergence of (vi ) to v means
we can find, for each  > 0, a k such that k vi − v k < /k w k for all i > k. This implies that

|hw, vi i − hw, vi| <  (2.3)

for all i > k and, hence, that limi→∞ hw, vi i = hw, vi. The analogous statement for the first argument of
the scalar product follows from the above by using the property (S1) in Def. 1.9.

The above lemma says that a limit can be “pulled out” of the arguments of a scalar product, an important
property which we will use frequently. Another technical statement we require asserts the existence and
uniqueness of a point of minimal distance.
Lemma 2.2. (Nearest point to a subspace) Let W be a closed, non-trivial sub vector space of a Hilbert
space H. Then, for every v ∈ H there is a unique w0 ∈ W such that

k v − w0 k ≤ k v − w k (2.4)

for all w ∈ W .
Proof. We set δ := inf{‖v − w‖ | w ∈ W} and choose a sequence (w_i)_{i=1}^{∞} contained in W with
lim_{i→∞} ‖v − w_i‖ = δ. We want to prove that this sequence is a Cauchy sequence so we consider

‖w_i − w_j‖^2 = ‖(w_i − v) + (v − w_j)‖^2 = 2‖w_i − v‖^2 + 2‖v − w_j‖^2 − 4‖\tfrac{1}{2}(w_i + w_j) − v‖^2 ,    (2.5)

where the parallelogram law (1.29) has been used in the last step. Since W is a sub vector space it
is clear that the vector \tfrac{1}{2}(w_i + w_j) which appears in the last term above is in W. This means that
‖\tfrac{1}{2}(w_i + w_j) − v‖ ≥ δ and we have

‖w_i − w_j‖^2 ≤ 2‖w_i − v‖^2 + 2‖v − w_j‖^2 − 4δ^2 .    (2.6)

The RHS of this inequality goes to zero as i, j → ∞ which shows that (wi ) is indeed a Cauchy sequence.
Since a Hilbert space is complete we know that every Cauchy sequence converges and we set w_0 :=
limi→∞ wi . Since W is assumed to be closed it follows that w0 ∈ W and, with Lemma 2.1 we have
k v − w0 k = δ. This means that w0 is indeed a point in W of minimal distance to v.
It remains to show that w0 is unique. For this assume there is another w̃ ∈ W such that k v − w̃ k = δ.
Then repeating the calculation (2.5) with wi and wj replaced by w0 and w̃ we have
‖w_0 − w̃‖^2 = 2‖w_0 − v‖^2 + 2‖v − w̃‖^2 − 4‖\tfrac{1}{2}(w_0 + w̃) − v‖^2 ≤ 2δ^2 + 2δ^2 − 4δ^2 = 0 ,    (2.7)
so that w̃ = w0 .
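In a finite-dimensional setting the nearest point w_0 is just the orthogonal projection of v onto W and can be computed by least squares. The following Python/NumPy sketch (random subspace and vector, chosen only for illustration) checks both the minimal-distance property and the orthogonality of v − w_0 used in Theorem 2.2 below.

import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((5, 2))    # columns span a 2-dimensional subspace W of R^5
v = rng.standard_normal(5)

# Nearest point in W to v = orthogonal projection of v onto W (least-squares solution).
c, *_ = np.linalg.lstsq(A, v, rcond=None)
w0 = A @ c

# Minimal-distance property: no other element of W is closer to v.
for _ in range(100):
    w = A @ rng.standard_normal(2)
    assert np.linalg.norm(v - w0) <= np.linalg.norm(v - w) + 1e-12

# v - w0 is orthogonal to W, as needed for the decomposition H = W + W^perp.
print(A.T @ (v - w0))   # approximately (0, 0)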

The above Lemma is the main technical result needed to prove the following important statement about
direct sum decompositions in Hilbert spaces.

Theorem 2.2. (Direct sum decomposition) For any closed sub vector space W of a Hilbert space H we
have H = W ⊕ W ⊥ .

Proof. For W = {0} we have W ⊥ = H so that the statement is true. Now assume that W 6= {0}. For
any v ∈ H we can choose, from Lemma 2.2, a “minimal distance” w0 ∈ W and write

v = w0 + (v − w0 ) . (2.8)

This is our prospective decomposition so we want to show that v − w0 ∈ W ⊥ . To do this we assume that
v − w_0 ∉ W^⊥ so that there is a u ∈ W such that ⟨v − w_0, u⟩ = 1. A short calculation for any α ∈ R shows
that
k v − w0 − αu k2 = k v − w0 k2 − 2α + α2 k u k2 . (2.9)
For sufficiently small positive α the sum of the last two terms on the RHS is negative so in this case ‖v − w_0 − αu‖ <
k v − w0 k. This contradicts the minimality property of w0 and we conclude that indeed v − w0 ∈ W ⊥ .
Hence, we have H = W + W ⊥ . That this sum is direct has already been shown in Exercise 1.16.

One of our goals is to obtain a generalisation of the formula (1.31) to the infinite-dimensional case. Recall
that the sequence (ε_i)_{i=1}^{∞} is called an ortho-normal system iff it satisfies ⟨ε_i, ε_j⟩ = δ_{ij}. We need to worry
about the convergence properties of such ortho-normal systems and this is covered by the following

Lemma 2.3. (Bessel inequality) Let (ε_i)_{i=1}^{∞} be an ortho-normal system in a Hilbert space H and v ∈ H.
Then we have the following statements.
(i) \sum_{i=1}^{∞} |⟨ε_i, v⟩|^2 converges and \sum_{i=1}^{∞} |⟨ε_i, v⟩|^2 ≤ ‖v‖^2.
(ii) \sum_{i=1}^{∞} ⟨ε_i, v⟩ ε_i converges.

Proof. (i) Introduce the partial sums s_k = \sum_{i=1}^{k} ⟨ε_i, v⟩ ε_i. A short calculation shows that

‖v − s_k‖^2 = ‖v‖^2 − \sum_{i=1}^{k} |⟨ε_i, v⟩|^2 ,    (2.10)

and, hence, \sum_{i=1}^{k} |⟨ε_i, v⟩|^2 ≤ ‖v‖^2. This means the series \sum_{i=1}^{∞} |⟨ε_i, v⟩|^2 is bounded (and increasing)
and, hence, converges. Since all its partial sums satisfy the stated inequality so does the limit.
(ii) A short calculation shows that, for l > k,

‖s_l − s_k‖^2 = \sum_{i=k+1}^{l} |⟨ε_i, v⟩|^2    (2.11)

and since \sum_{i=1}^{∞} |⟨ε_i, v⟩|^2 is a Cauchy series the RHS of this expression can be made arbitrarily small. This
means that \sum_{i=1}^{∞} ⟨ε_i, v⟩ ε_i is a Cauchy series which converges due to completeness of H.

We are now ready to tackle (infinite) ortho-normal systems in a Hilbert space H. We recall that an
ortho-normal system is a sequence (ε_i)_{i=1}^{∞} with ⟨ε_i, ε_j⟩ = δ_{ij}. By the span, Span(ε_i)_{i=1}^{∞}, we mean the
sub-space which consists of all finite linear combinations of vectors ε_i. The following theorem provides
the basic statements about ortho-normal systems.

Theorem 2.3. Let (ε_i)_{i=1}^{∞} be an ortho-normal system in a Hilbert space H. Then, the following statements
are equivalent.
(i) Every v ∈ H can be written as v = \sum_{i=1}^{∞} ⟨ε_i, v⟩ ε_i.
(ii) For every v ∈ H we have ‖v‖^2 = \sum_{i=1}^{∞} |⟨ε_i, v⟩|^2.
(iii) If ⟨ε_i, v⟩ = 0 for all i = 1, 2, · · · then v = 0.
(iv) \overline{Span(ε_i)_{i=1}^{∞}} = H

Proof. For a statement of this kind it is sufficient to show that (i) ⇒ (ii) ⇒ (iii) ⇒ (iv) ⇒ (i) and we will
proceed in this order.
(i) ⇒ (ii): This follows easily by inserting the infinite sum from (i) into k v k2 = hv, vi and using
Lemma 2.1.
(ii) ⇒ (iii): If ⟨ε_i, v⟩ = 0 for all i then the relation in (ii) implies that ‖v‖ = 0. Since the zero vector is
the only one with norm zero it follows that v = 0.
(iii) ⇒ (iv): Set W = \overline{Span(ε_i)_{i=1}^{∞}} (where we recall that the bar means the closure, so W is a closed sub
vector space). Then, from Theorem 2.2, we know that H = W ⊕ W^⊥. Assume that W^⊥ ≠ {0}. Then we
have a non-zero vector v ∈ W^⊥ such that ⟨ε_i, v⟩ = 0 for all i, in contradiction of the statement (iii). This
means that W^⊥ = {0} and, hence, H = W.
(iv) ⇒ (i): We know from Lemma 2.3 that w := \sum_{i=1}^{∞} ⟨ε_i, v⟩ ε_i is well-defined. A short calculation shows
that ⟨v − w, ε_j⟩ = 0 for all j. This means that v − w ∈ W^⊥ (using the earlier definition of W) but since
H = W from (iv) we have W^⊥ = {0} and, hence, v = w.

After this preparation, we can finally define an ortho-normal basis of a Hilbert space.
Definition 2.3. An ortho-normal system (ε_i)_{i=1}^{∞} in a Hilbert space H is called an ortho-normal basis if
it satisfies any of the conditions in Theorem 2.3.
Thanks to Theorem 2.3 an ortho-normal basis on a Hilbert space provides us with the desired generalisa-
tions of the formulae we have seen in the finite-dimensional case. Eq. (1.31) for the expansion of a vector
in terms of an ortho-normal basis simply generalises to

v = \sum_{i=1}^{∞} α_i ε_i  ⇐⇒  α_i = ⟨ε_i, v⟩ ,    (2.12)

and we know that the now infinite sum always converges to the vector v. For two vectors v = \sum_i α_i ε_i
and w = \sum_j β_j ε_j we have the generalisation of Eq. (1.32)

⟨v, w⟩ = \sum_{i=1}^{∞} ⟨v, ε_i⟩⟨ε_i, w⟩ = \sum_{i=1}^{∞} α_i^* β_i ,    ‖v‖^2 = \sum_{i=1}^{∞} |⟨ε_i, v⟩|^2 = \sum_{i=1}^{∞} |α_i|^2 ,    (2.13)

where, in the first equation, we have also used Lemma 2.1 to pull the infinite sums out of the scalar
product. The map v → (α_i)_{i=1}^{∞} defined by Eq. (2.12) is, in fact, a vector space isomorphism H → ℓ^2 from
our general Hilbert space into the Hilbert space `2 of sequences. Moreover, as Eq. (2.13) shows, this map
is consistent with the scalar products defined on those two Hilbert spaces. The last equation (2.13) which
allows calculating the norm of a vector in terms of an infinite sum over the square of its coordinates is
also referred to as Parseval’s equation. As we will see, for specific examples it can lead to quite non-trivial
relations.
Recall the situation in the finite-dimensional case. Finite-dimensional (inner product) vector spaces
over R or C are isomorphic to Rn or Cn and via this identification abstract vectors are described by
(coordinate) column vectors. If an ortho-normal basis underlies the identification then the scalar product
in coordinates is described by the standard scalar product in Rn or Cn (see Eq. (1.32)).

We have now seen that the situation is very similar for infinite-dimensional Hilbert spaces but vectors
in Rn or Cn are replaced by infinite sequences in `2 . However, there is still a problem. While we now
appreciate the usefulness of an ortho-normal basis in a Hilbert space we do not actually know yet whether
it exists.
We recall that the Hilbert space ℓ^2 which we have introduced earlier consists of sequences (x_i)_{i=1}^{∞} (where
x_i ∈ R or x_i ∈ C) with \left( \sum_{i=1}^{∞} |x_i|^2 \right)^{1/2} < ∞ and has the scalar product

⟨(x_i), (y_i)⟩ = \sum_{i=1}^{∞} x_i^* y_i .    (2.14)

It is not difficult to show that `2 has an ortho-normal basis. Introduce the sequences
ei = (0, . . . , 0, 1, 0, . . . , 0, . . .) (2.15)
with a 1 in position i and zero everywhere else. These are obviously the infinite-dimensional analogues of
the standard unit vectors in R^n or C^n. It is easy to see that they are an ortho-normal system relative to
the scalar product (2.14). The following theorem states that they form an ortho-normal basis of ℓ^2.
Theorem 2.4. The “standard unit sequences” ei , defined in Eq. (2.15), form an ortho-normal basis of
the Hilbert space `2 .
Proof. For a sequence (xi ) ∈ `2 we have, from direct calculation, that

‖(x_i)‖^2 = \sum_{j=1}^{∞} |x_j|^2 = \sum_{j=1}^{∞} |⟨e_j, (x_i)⟩|^2 ,    (2.16)

and, hence, condition (ii) in Theorem 2.3 is satisfied.


Unfortunately, it is not always as easy to find an ortho-normal basis for a Hilbert space - in fact it does
not always exist. The key property for the existence of an ortho-normal basis is that the Hilbert space is
separable (see Def. 1.21).
Theorem 2.5. Let H be a Hilbert space. Then we have the equivalence
H has an ortho-normal basis ⇐⇒ H is separable
Proof. ”⇐” We should show that H has an ortho-normal system which satisfies any of the properties in
Theorem 2.3. First, let (u_i)_{i=1}^{∞} be the countable dense subset which exists since H is separable. We can
refine this set by omitting any u_k which is a linear combination of the u_i with i < k and this leads to a
sequence we call (v_i)_{i=1}^{∞}. To the sequence (v_i)_{i=1}^{∞} we can inductively apply the Gram-Schmidt procedure
to obtain an ortho-normal system (ε_i)_{i=1}^{∞} with Span(ε_1, . . . , ε_k) = Span(v_1, . . . , v_k) for all k. (Recall that
leaving these spans unchanged is one of the properties of the Gram-Schmidt procedure.) Altogether, we
have

Span(ε_i)_{i=1}^{∞} = Span(v_i)_{i=1}^{∞} = Span(u_i)_{i=1}^{∞} .    (2.17)

Since the sequence (u_i)_{i=1}^{∞} is dense in H it follows that \overline{Span(ε_i)_{i=1}^{∞}} = H and this shows property (iv) in
Theorem 2.3.
”⇒” Now suppose that H has an ortho-normal basis (ε_i)_{i=1}^{∞} and consider finite sums of the form \sum_{i=1}^{k} α_i ε_i,
where α_i ∈ Q in the real case or α_i ∈ Q + iQ in the complex case. Clearly, the set of these finite sums is
countable (as Q is countable) and it is not hard to show that it is dense in H.
It turns out that most Hilbert spaces which appear in practical applications are separable and, hence, do have an ortho-normal basis. However, the proof is not always straightforward. For the Hilbert space $L^2([a, b])$ of square integrable functions on the interval $[a, b]$ we will prove the existence of an ortho-normal basis in Section 3 in our discussion of the Fourier series.

2.2.3 Dual space
In Eq. (1.33) we have introduced the map ı from a vector space to its dual and we would now like to
discuss this map in the context of a Hilbert space. So we have ı : H → H∗ defined as

ı(v)(w) := hv, wi . (2.18)

This map assigns to every vector v ∈ H a functional ı(v) in the dual H∗ . We have seen that it is always
injective and that, in the finite-dimensional case, every functional can be obtained in this way, so ı is
bijective. The following theorem asserts that the last statement continues to be true for Hilbert spaces.

Theorem 2.6. (Riesz) Let H be a Hilbert space and ϕ ∈ H∗ a functional. Then there exists a v ∈ H
such that ϕ = ı(v), where ı is the map defined in Eq. (2.18).

Proof. If ϕ(w) = 0 for all w ∈ H then we have ϕ = ı(0). Let us therefore assume that Ker(ϕ) 6= H. Then
(Ker(ϕ))⊥ 6= {0} and there exists a u ∈ (Ker(ϕ))⊥ with ϕ(u) = 1. It follows that

ϕ(w − ϕ(w)u) = ϕ(w) − ϕ(w)ϕ(u) = 0 , (2.19)

so that w − ϕ(w)u ∈ Ker(ϕ). Since u ∈ (Ker(ϕ))⊥ this means

hu, w − ϕ(w)ui = 0 . (2.20)

Solving this equation for ϕ(w) gives

ϕ(w) = hv, wi = ı(v)(w) , (2.21)

where v = u/k u k2 .

We already know that the map ı is injective so this theorem tells us that ı : H → H∗ is an isomorphism
just as it is in the finite-dimensional case. This is why it makes sense to generalise Dirac notation

w → |wi , ı(v) → hv| , hv|wi := hv, wi (2.22)

to a Hilbert space. If our Hilbert space has an ortho-normal basis $|\epsilon_i\rangle$, then Eq. (2.12) can be written in Dirac notation as

$|v\rangle = \sum_{i=1}^{\infty} |\epsilon_i\rangle \langle \epsilon_i | v \rangle \,,$   (2.23)

and this defines the map $|v\rangle \to (a_i)_{i=1}^{\infty}$, where $a_i = \langle \epsilon_i | v \rangle$ are the coordinates of the vector relative to the basis $\epsilon_i$, from $H$ to $\ell^2$ we have mentioned earlier. Writing $a_i = \langle \epsilon_i | v \rangle$ and $b_i = \langle \epsilon_i | w \rangle$ we can write Eqs. (2.13) in the form

$\langle v | w \rangle = \sum_{i=1}^{\infty} \langle v | \epsilon_i \rangle \langle \epsilon_i | w \rangle = \langle (a_i) | (b_i) \rangle_{\ell^2} \,, \qquad \| \, |v\rangle \, \|^2 = \langle v | v \rangle = \sum_{i=1}^{\infty} \langle v | \epsilon_i \rangle \langle \epsilon_i | v \rangle = \| (a_i) \|_{\ell^2}^2 \,,$   (2.24)

where $\langle \cdot | \cdot \rangle_{\ell^2}$ denotes the standard scalar product (1.97) on $\ell^2$. Hence, the above map $H \to \ell^2$ preserves the scalar product. Also the identity operator can, at least formally, be written as

$\mathrm{id} = \sum_{i=1}^{\infty} |\epsilon_i\rangle \langle \epsilon_i | \,.$   (2.25)

To pursue this somewhat further consider a linear operator $\hat T : H \to H$, a vector $v \in H$ and an ortho-normal basis $(\epsilon_i)$ of $H$ so that the operator has matrix elements $T_{ij} = \langle \epsilon_i | \hat T | \epsilon_j \rangle$ and the vector has coordinates $a_i = \langle \epsilon_i | v \rangle$. Then the action $\hat T |v\rangle$ of the operator on the vector has coordinates $b_i := \langle \epsilon_i | \hat T | v \rangle$ which can be written as

$b_i = \langle \epsilon_i | \hat T | v \rangle = \sum_j \langle \epsilon_i | \hat T | \epsilon_j \rangle \langle \epsilon_j | v \rangle = \sum_j T_{ij} a_j \,.$   (2.26)

In this way, the Hilbert space action of an operator on a vector is turned into a multiplication of a matrix
and a vector (although infinite-dimensional) in `2 . Similarly, composition of operators in H turns into
“matrix multiplication” in `2 as in the following exercise.
Exercise 2.7. For two operators $\hat T, \hat S : H \to H$ with matrix elements $T_{ij} = \langle \epsilon_i | \hat T | \epsilon_j \rangle$ and $S_{ij} = \langle \epsilon_i | \hat S | \epsilon_j \rangle$ show that the matrix elements of $\hat T \circ \hat S$ are given by $\langle \epsilon_i | \hat T \circ \hat S | \epsilon_j \rangle = \sum_k T_{ik} S_{kj}$.
The correspondence between a Hilbert space H with ortho-normal basis and the space of sequences `2
outlined above is the essence of how quantum mechanics in the operator formulation relates to matrix
mechanics.
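To make this correspondence concrete, here is a minimal numerical sketch (an illustration, not part of the notes): it truncates the Fourier basis $e_k(x) = e^{ikx}/\sqrt{2\pi}$ on $[-\pi, \pi]$ at $|k| \le N$, computes the matrix elements of the multiplication operator $(\hat T f)(x) = \cos(x) f(x)$, and checks Eq. (2.26) numerically. The truncation $N$, the test function and all names are illustrative choices.

```python
import numpy as np

N = 8                                          # truncate the basis at |k| <= N
M = 4096
x = np.linspace(-np.pi, np.pi, M, endpoint=False)
dx = 2 * np.pi / M
ks = np.arange(-N, N + 1)
basis = np.exp(1j * np.outer(ks, x)) / np.sqrt(2 * np.pi)   # rows are e_k(x)

def coeffs(fvals):
    # coordinates a_k = <e_k, f>, approximated by a Riemann sum
    return (np.conj(basis) * fvals).sum(axis=1) * dx

f = np.exp(np.sin(x))                          # a smooth test function
a = coeffs(f)                                  # coordinates of f
b_direct = coeffs(np.cos(x) * f)               # coordinates of T f, with T = multiplication by cos(x)

# matrix elements T_jk = <e_j, cos(x) e_k>
T = (np.conj(basis)[:, None, :] * (np.cos(x) * basis)[None, :, :]).sum(axis=2) * dx

print(np.max(np.abs(b_direct - T @ a)))        # agrees up to truncation/quadrature error
```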

2.3 Linear operators on Hilbert spaces


We have already made a number of statements about linear operators on finite and infinite-dimensional
inner product spaces but there are some features which are significantly more difficult in the infinite-
dimensional case. We are not able to develop the full theory - this requires a lecture on functional analysis
and is well beyond our present scope - but we will try to clear up some simple issues and at least cite a
few relevant results. We begin with a discussion of the adjoint operator.

2.3.1 The adjoint operator


In Def. 1.10 we have defined the adjoint of a linear operator. We know that the adjoint, if it exists, is
unique and we also know that it definitely exists in the finite-dimensional case. For the infinite-dimensional
case we have, so far, made no statement about the existence of the adjoint but for Hilbert spaces we have
the following
Theorem 2.8. Let H be a Hilbert space and T : H → H be a bounded linear operator. Then there exists
a unique linear and bounded operator T † : H → H with

hT † v, wi = hv, T wi . (2.27)

for all v, w ∈ H. The operator T † is called the adjoint of T .


Proof. For any v ∈ H, define a linear functional ϕ ∈ H∗ by ϕ(w) := hv, T wi. From the Cauchy-Schwarz
inequality we have
|ϕ(w)| = |hv, T wi| ≤ k v k k T w k ≤ k v k k T k k w k , (2.28)
so that the functional $\varphi$ is bounded. Hence, from Riesz's theorem 2.6, there exists a $u \in H$ such that $\varphi(w) = \langle u, w \rangle$. If we define $T^\dagger$ by $T^\dagger v = u$ it follows that

$\langle v, T w \rangle = \varphi(w) = \langle u, w \rangle = \langle T^\dagger v, w \rangle \,,$   (2.29)

so that the so-defined $T^\dagger$ does indeed have the required property for the adjoint. By tracing through the above construction it is easy to show that $T^\dagger$ is linear. We still need to show that it is bounded.

$\| T^\dagger v \|^2 = \langle T^\dagger v, T^\dagger v \rangle = \langle v, T T^\dagger v \rangle \le \| v \| \, \| T T^\dagger v \| \le \| v \| \, \| T \| \, \| T^\dagger v \| \,.$   (2.30)

If $\| T^\dagger v \| = 0$ the boundedness condition is trivially satisfied. Otherwise, we can divide by $\| T^\dagger v \|$ and obtain $\| T^\dagger v \| \le \| v \| \, \| T \|$.

Also recall that a linear operator T : H → H is called self-adjoint or hermitian iff

hv, T wi = hT v, wi , (2.31)

for all v, w ∈ H, that is, if it can be moved from one argument of the scalar product to the other.
Comparison with Theorem 2.8 shows that T is self-adjoint iff T = T † .

Application 2.6. Heisenberg uncertainty relation


Suppose we have two hermitian operators $Q, P : H \to H$ which satisfyᵃ $[Q, P] = i$. The expectation
values of these operators for a state |ψi ∈ H with hψ|ψi = 1 are defined by

Q̄ := hψ|Q|ψi , P̄ := hψ|P |ψi .

and we can introduce the "shifted" operators $q := Q - \bar Q$ and $p := P - \bar P$ which also satisfy $[q, p] = i$.
The variances of Q and P are defined by

∆Q2 := hψ|q 2 |ψi = k qψ k2 , ∆P 2 := hψ|p2 |ψi = k pψ k2

It follows from the Cauchy-Schwarz inequality that

∆Q ∆P = k qψ kk pψ k ≥ |hqψ|pψi| = |hψ|qp|ψi|
∆Q ∆P = k pψ kk qψ k ≥ |hpψ|qψi| = |hψ|pq|ψi| .

Adding these two inequalities, we have

$2\, \Delta Q\, \Delta P \ge |\langle \psi | qp | \psi \rangle| + |\langle \psi | pq | \psi \rangle| \ge |\langle \psi | qp | \psi \rangle - \langle \psi | pq | \psi \rangle| = |\langle \psi | [q, p] | \psi \rangle| = 1 \,,$

and, hence,
1
∆Q ∆P ≥ , (2.32)
2
which is, of course, Heisenberg’s uncertainty relation. The lesson from this derivation is that the
uncertainty relation is really quite general. All it requires is two hermitian operators Q and P with
commutator [Q, P ] = i and then it follows more or less directly from the Cauchy-Schwarz inequality.
ᵃ For simplicity we set $\hbar$ to one.
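The inequality can also be checked numerically. The short sketch below (an illustration, not part of the notes) discretises a normalised wave function on a grid, takes $Q$ to be multiplication by $x$ and $P = -i\,d/dx$ implemented with the FFT, and evaluates $\Delta Q\, \Delta P$; the grid and the trial state are arbitrary choices.

```python
import numpy as np

M, L = 2048, 40.0
x = np.linspace(-L / 2, L / 2, M, endpoint=False)
dx = L / M
psi = (1 + 0.3 * x) * np.exp(-x**2 / 2 + 0.5j * x)      # arbitrary smooth state
psi /= np.sqrt(np.sum(np.abs(psi)**2) * dx)              # normalise: <psi|psi> = 1

k = 2 * np.pi * np.fft.fftfreq(M, d=dx)
P_psi = np.fft.ifft(k * np.fft.fft(psi))                 # (P psi)(x) with P = -i d/dx

Q_bar = np.sum(x * np.abs(psi)**2) * dx
P_bar = np.real(np.sum(np.conj(psi) * P_psi) * dx)
dQ = np.sqrt(np.sum((x - Q_bar)**2 * np.abs(psi)**2) * dx)
dP = np.sqrt(np.sum(np.abs(P_psi - P_bar * psi)**2) * dx)

print(dQ * dP, ">=", 0.5)                                # the product stays above 1/2
```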

2.3.2 Eigenvectors and eigenvalues


The theory of eigenvalues and eigenvectors (“spectral theory”) in the infinite-dimensional case is signifi-
cantly more complicated than for finite-dimensional vector spaces. Developing this theory systematically
can easily take up a significant part of a lecture course. Here, I would just like to collect a few basic ideas
and cite some of the important results.
Throughout, we will focus on self-adjoint operators T : H → H on a complex Hilbert space H which,
in addition, are compact, a property which makes the spectral theory particularly well-behaved and is
defined as follows.
Definition 2.4. A linear operator T : H → H on a Hilbert space H is called compact iff, for every bounded
sequence (vk ) in H, the sequence (T vk ) contains a convergent sub-sequence.

We note that a compact operator is bounded. If it were not bounded, there would be a sequence $(v_k)$ with $\| v_k \| = 1$ and $\| T v_k \| > k$, and, in this case, the sequence $(T v_k)$ cannot have a convergent sub-sequence, contradicting compactness. This means, from Theorem 2.8, that compact operators always have an adjoint.
In fact, we will be focusing on self-adjoint, compact operators and their eigenvalues and eigenvectors
have the following properties.

Theorem 2.9. Let T : H → H be a compact, self-adjoint operator on a (separable) Hilbert space H. Then
we have the following statements:
(i) The eigenvalues of T are real and eigenvectors for different eigenvalues are orthogonal.
(ii) The set of non-zero eigenvalues of T is either finite or it is a sequence which tends to zero.
(iii) Each non-zero eigenvalue has a finite degeneracy.
(iv) There is an ortho-normal system $(\epsilon_k)$ of eigenvectors with non-zero eigenvalues $\lambda_k$ which forms a basis of $\overline{\mathrm{Im}(T)}$. Further, we have $\overline{\mathrm{Im}(T)} = \mathrm{Ker}(T)^{\perp}$ and $H = \overline{\mathrm{Im}(T)} \oplus \mathrm{Ker}(T)$.
(v) The operator $T$ has the representation

$T v = \sum_k \lambda_k \langle \epsilon_k, v \rangle \epsilon_k \,.$   (2.33)

Proof. We have already shown (i). For the proof of (ii) and (iii) see, for example, Ref. [5]. The ortho-normal system $(\epsilon_k)$ in (iv) is constructed by applying the Gram-Schmidt procedure to each eigenspace $\mathrm{Ker}(T - \lambda\, \mathrm{id})$ with $\lambda \neq 0$ which, from (iii), is finite-dimensional. The proof that the vectors $(\epsilon_k)$ form an ortho-normal basis of $\overline{\mathrm{Im}(T)}$ can be found in Ref. [5].
To show the formula in (v) we set $W = \overline{\mathrm{Im}(T)}$ and note, from Theorem 2.2, that $H = W \oplus W^{\perp}$ so that every $v \in H$ can be written as $v = w + u$, where $w \in W$ and $u \in W^{\perp}$. Since $u \in W^{\perp}$ we have $0 = \langle T x, u \rangle = \langle x, T u \rangle$ for all $x \in H$ and this means that $T u = 0$. (Hence, $W^{\perp} \subset \mathrm{Ker}(T)$ and the reverse inclusion is also easy to show so $W^{\perp} = \mathrm{Ker}(T)$.) The other component, $w \in W$, can be written as

$w = \sum_k \langle \epsilon_k, w \rangle \epsilon_k$   (2.34)

since the $(\epsilon_k)$ form a basis of $W$. Putting it all together gives

$T v = T w = T \sum_k \langle \epsilon_k, w \rangle \epsilon_k = \sum_k \langle \epsilon_k, w \rangle T \epsilon_k = \sum_k \lambda_k \langle \epsilon_k, w \rangle \epsilon_k \,.$   (2.35)

The ortho-normal system $(\epsilon_k)$ from the above theorem provides a basis for $\overline{\mathrm{Im}(T)}$ but not necessarily for all of $H$. This is because the vectors $\epsilon_k$ correspond to non-zero eigenvalues and we are missing the eigenvectors with zero eigenvalue, that is, the kernel of $T$. Fortunately, from part (iv) of Theorem 2.9 we have the decomposition

$H = \overline{\mathrm{Im}(T)} \oplus \mathrm{Ker}(T) \,, \qquad \overline{\mathrm{Im}(T)} = \mathrm{Ker}(T)^{\perp} \,,$   (2.36)

so we can complete $(\epsilon_k)$ to a basis of $H$ by adding a basis for $\mathrm{Ker}(T)$. In conclusion, for a compact, self-adjoint operator on a Hilbert space, we can always find an ortho-normal basis of the Hilbert space consisting of eigenvectors of the operator. In Dirac notation and dropping the argument $v$, Eq. (2.33) can be written as

$T = \sum_k \lambda_k |\epsilon_k\rangle \langle \epsilon_k | \,,$   (2.37)

where $T |\epsilon_k\rangle = \lambda_k |\epsilon_k\rangle$. This is the generalisation of the finite-dimensional result (1.46).

2.3.3 The Fredholm alternative
Suppose, for a compact, self-adjoint operator T : H → H, a given u ∈ H and a constant λ 6= 0, we would
like to solve the equations
(T − λ id)v = 0 , (T − λ id)v = u . (2.38)
It turns out that many of the differential equations we will consider later can be cast in this form. The right
Eq. (2.38) is an inhomogeneous linear equation and the equation on the left-hand side is its homogeneous
counterpart. Clearly, we have
{solutions of homogeneous equation} = Ker(T − λ id) . (2.39)
We also know, if a solution v0 of the inhomogeneous equation exists, then its general solution is given by
v0 + Ker(T − λ id). There are two obvious cases we should distinguish.
(a) The number λ does not equal any of the eigenvalues of T . In this case Ker(T − λ id) = {0} so that
the homogeneous equation in (2.38) only has the trivial solution.
(b) The number $\lambda$ does equal one of the eigenvalues of $T$ so that $\mathrm{Ker}(T - \lambda\, \mathrm{id}) \neq \{0\}$ and the homogeneous equation in Eq. (2.38) does have non-trivial solutions.
The above case distinction is called the Fredholm alternative. Of course we would like to discuss the solutions to the inhomogeneous equation in either case. The obvious way to proceed is to start with an ortho-normal basis $(\epsilon_k)$ of eigenvectors of $T$ with corresponding eigenvalues $\lambda_k$, so that $T \epsilon_k = \lambda_k \epsilon_k$ (here, we include the eigenvectors with eigenvalue zero), expand $v$ and $u$ in terms of this basis

$v = \sum_k \langle \epsilon_k, v \rangle \epsilon_k \,, \qquad u = \sum_k \langle \epsilon_k, u \rangle \epsilon_k \,,$   (2.40)

and use the representation (2.33) of the operator $T$. Inserting all this into the inhomogeneous Eq. (2.38) gives

$(T - \lambda\, \mathrm{id}) v = \sum_i (\lambda_i - \lambda) \langle \epsilon_i, v \rangle \epsilon_i = \sum_i \langle \epsilon_i, u \rangle \epsilon_i \,.$   (2.41)
Taking the inner product of this equation with a basis vector $\epsilon_k$ leads to

$(\lambda_k - \lambda) \langle \epsilon_k, v \rangle = \langle \epsilon_k, u \rangle \,,$   (2.42)

for all $k$. Now let us consider the two cases above. In case (a), $\lambda$ does not equal any of the eigenvalues $\lambda_k$ and we can simply solve Eq. (2.42) for all $k$ to obtain

$\langle \epsilon_k, v \rangle = \frac{\langle \epsilon_k, u \rangle}{\lambda_k - \lambda} \quad \Rightarrow \quad v = \sum_k \frac{\langle \epsilon_k, u \rangle}{\lambda_k - \lambda}\, \epsilon_k \,.$   (2.43)

This result means that in case (a) we have a unique solution $v$, as given by the above equation, for any inhomogeneity $u$. The situation is more complicated in case (b). In this case $\lambda$ equals $\lambda_k$ for some $k$ and for such cases the LHS of Eq. (2.42) vanishes. This means in this case we have a solution if and only if

$\langle \epsilon_k, u \rangle = 0 \quad \text{for all } k \text{ with } \lambda_k = \lambda \,.$   (2.44)

Another way of stating this condition is to say that we need the inhomogeneity $u$ to be perpendicular to $\mathrm{Ker}(T - \lambda\, \mathrm{id})$. If this condition is satisfied, the solution can be written as

$v = \sum_{k : \lambda_k \neq \lambda} \frac{\langle \epsilon_k, u \rangle}{\lambda_k - \lambda}\, \epsilon_k + \sum_{k : \lambda_k = \lambda} \alpha_k \epsilon_k \,,$   (2.45)

where the αk are arbitrary numbers and the second sum in this expression of course represents a general
element of Ker(T − λ id). This discussion can be summarised in the following

Theorem 2.10. (Fredholm alternative) Let $T : H \to H$ be a compact, self-adjoint operator on a Hilbert space $H$ with a basis of eigenvectors $\epsilon_k$ with associated eigenvalues $\lambda_k$, $u \in H$ and $\lambda \neq 0$. For the solution to the equation

$(T - \lambda\, \mathrm{id}) v = u$   (2.46)

the following alternative holds:
(a) The number $\lambda$ is different from all eigenvalues $\lambda_k$. Then the equation (2.46) has a unique solution for all $u \in H$ given by

$v = \sum_k \frac{\langle \epsilon_k, u \rangle}{\lambda_k - \lambda}\, \epsilon_k \,.$   (2.47)

(b) The number $\lambda$ equals one of the eigenvalues. In this case, a solution exists if and only if

$\langle \epsilon_k, u \rangle = 0 \quad \text{for all } k \text{ with } \lambda_k = \lambda \,.$   (2.48)

If this condition is satisfied the general solution is given by

$v \in \sum_{k : \lambda_k \neq \lambda} \frac{\langle \epsilon_k, u \rangle}{\lambda_k - \lambda}\, \epsilon_k + \sum_{k : \lambda_k = \lambda} \alpha_k \epsilon_k \,,$   (2.49)

where the αk ∈ C are arbitrary.
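As a finite-dimensional illustration (not part of the notes), the sketch below applies case (a) of the Fredholm alternative to a random symmetric matrix: it solves $(T - \lambda\, \mathrm{id}) v = u$ through the eigenvector expansion (2.47) and confirms the result by substituting back. The matrix size, random seed and value of $\lambda$ are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(6, 6))
T = (A + A.T) / 2                      # a self-adjoint (symmetric) "operator"
lam_k, eps = np.linalg.eigh(T)         # eigenvalues lam_k, ortho-normal eigenvectors eps[:, k]

u = rng.normal(size=6)
lam = 0.1234                           # not an eigenvalue: case (a)

# v = sum_k <eps_k, u> / (lam_k - lam) * eps_k , Eq. (2.47)
v = eps @ ((eps.T @ u) / (lam_k - lam))
print(np.allclose((T - lam * np.eye(6)) @ v, u))   # True

# In case (b), lam equal to an eigenvalue, a solution exists only if u is
# orthogonal to the corresponding eigenvectors, condition (2.48).
```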

3 Fourier analysis
The Fourier series and the Fourier transform are important mathematical tools in practically all parts of
physics. Intuitively, they allow us to decompose functions into their various frequency components. The
Fourier series, which we discuss first, deals with functions on finite intervals and leads to a decomposition
in terms of a discrete spectrum of frequencies. Mathematically speaking, we find an ortho-normal basis of
functions with well-defined frequencies on the Hilbert space L2 ([−π, π]) (say) and the coordinates relative
to this basis represent the strength of the various frequencies. The Fourier transform applies to functions
defined on the entire real line (or on Rn ) and leads to a decomposition with a continuous spectrum of
frequencies. Mathematically, the Fourier transform can be understood as a unitary map on the Hilbert
space L2 (Rn ).

3.1 Fourier series


3.1.1 Cosine Fourier series
We begin by considering the Hilbert space L2R ([0, π]) of (real-valued) square integrable functions on the
interval [0, π] and recall that the scalar product on this space is defined by
$\langle f, g \rangle = \int_0^{\pi} dx\, f(x) g(x) \,.$   (3.1)

It is easy to verify that the functions

$\tilde c_0 = \frac{1}{\sqrt{\pi}} \,, \qquad \tilde c_k := \sqrt{\frac{2}{\pi}} \cos(kx) \,, \quad k = 1, 2, \ldots$   (3.2)

form an ortho-normal system on $L^2_{\mathbb{R}}([0, \pi])$.
Exercise 3.1. Show that the functions in Eq. (3.2) form an ortho-normal system on $L^2_{\mathbb{R}}([0, \pi])$ by explicitly evaluating the scalar products from Eq. (3.1).
In fact, we have a significantly stronger statement.
Theorem 3.2. The functions $(\tilde c_k)_{k=0}^{\infty}$ in Eq. (3.2) form an ortho-normal basis of $L^2_{\mathbb{R}}([0, \pi])$. Hence, the space $L^2_{\mathbb{R}}([0, \pi])$ is separable.
Proof. These proofs are typically highly technical but in the present case we get away with appealing to previous theorems. First, it is clear that $(\tilde c_k)_{k=0}^{\infty}$ forms an ortho-normal system. To show that it is an ortho-normal basis we need to verify one of the conditions in Theorem 2.3 and we opt for (iv). Hence, we need to show that $\mathrm{Span}(\tilde c_k)_{k=0}^{\infty} = L^2_{\mathbb{R}}([0, \pi])$, or, in other words, that every function $f \in L^2_{\mathbb{R}}([0, \pi])$ can be approximated, to arbitrary accuracy, by linear combinations of the $\tilde c_k$. From Theorem 2.1 we know that every function $f \in L^2_{\mathbb{R}}([0, \pi])$ can be approximated, to arbitrary accuracy, by continuous functions $g \in C_{\mathbb{R}}([0, \pi])$. It is, therefore, enough to show that every continuous $g$ can be approximated by linear combinations of the $\tilde c_k$. Define the function $h(y) := g(\cos^{-1} y)$ which is in $C_{\mathbb{R}}([-1, 1])$ and, from the Stone-Weierstrass theorem 1.35, can be approximated by polynomials $p$, so that $|h(y) - p(y)| < \epsilon$. By setting $x = \cos^{-1} y$ this implies that $|g(x) - p(\cos x)| < \epsilon$. But the function $p(\cos x)$, where $p$ is a polynomial, can always be written as $\sum_{k=0}^{n} \alpha_k \cos(kx)$, using trigonometric identities. This completes the proof.

Of course we are not stuck with the specific interval $[0, \pi]$ but a simple re-scaling $x \to \pi x / a$ for $a > 0$ shows that the Hilbert space $L^2_{\mathbb{R}}([0, a])$ with scalar product

$\langle f, g \rangle = \int_0^{a} dx\, f(x) g(x) \,,$   (3.3)

has an ortho-normal basis

$\tilde c_0 = \frac{1}{\sqrt{a}} \,, \qquad \tilde c_k := \sqrt{\frac{2}{a}} \cos\!\left( \frac{k\pi x}{a} \right) \,, \quad k = 1, 2, \ldots \,.$   (3.4)

Let us be more explicit about what this actually means. From part (i) of Theorem 2.3 we conclude that every (real-valued) square integrable function $f \in L^2_{\mathbb{R}}([0, a])$ can be written as

$f(x) = \sum_{k=0}^{\infty} \alpha_k \tilde c_k(x) = \frac{\alpha_0}{\sqrt{a}} + \sqrt{\frac{2}{a}} \sum_{k=1}^{\infty} \alpha_k \cos\!\left( \frac{k\pi x}{a} \right)$   (3.5)

where

$\alpha_0 = \langle \tilde c_0, f \rangle = \frac{1}{\sqrt{a}} \int_0^{a} dx\, f(x) \,, \qquad \alpha_k = \langle \tilde c_k, f \rangle = \sqrt{\frac{2}{a}} \int_0^{a} dx\, \cos\!\left( \frac{k\pi x}{a} \right) f(x) \,, \quad k = 1, 2, \ldots \,.$   (3.6)
It is customary to introduce the coefficients $a_0 = \frac{2}{\sqrt{a}}\, \alpha_0$ and $a_k = \sqrt{\frac{2}{a}}\, \alpha_k$, for $k = 1, 2, \ldots$ in order to re-distribute factors:

$f(x) = \frac{a_0}{2} + \sum_{k=1}^{\infty} a_k \cos\!\left( \frac{k\pi x}{a} \right) \qquad \text{where} \qquad a_k = \frac{2}{a} \int_0^{a} dx\, \cos\!\left( \frac{k\pi x}{a} \right) f(x) \,.$   (3.7)

This series is called the cosine Fourier series and the ak are called the (cosine) Fourier coefficients. It is
important to remember that the equality in the first Eq. (3.7) holds in L2R ([0, a]), a space which consists of
classes of functions which have been identified if they differ only on sets of Lebesgue-measure zero. This
means that the function f and its Fourier series do not actually have to coincide at every point x ∈ [0, a]
- they can differ on a space of measure zero. However, we know that the (cosine) Fourier series always
converges to the function f in the norm on L2R ([0, a]).
We know from part (ii) of Theorem 2.3 that the norm of $f$ can be calculated in terms of its Fourier coefficients as

$\frac{2}{a} \int_0^{a} dx\, |f(x)|^2 = \frac{2}{a} \| f \|^2 = \frac{2}{a} \sum_{k=0}^{\infty} |\langle \tilde c_k, f \rangle|^2 = \frac{|a_0|^2}{2} + \sum_{k=1}^{\infty} |a_k|^2 \,.$   (3.8)

This result is also known as Parseval's equation.
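As a quick illustration (not part of the notes), the sketch below evaluates the cosine Fourier coefficients (3.7) numerically for a sample function on $[0, a]$ and compares the truncated series with the function; the particular function, interval length and truncation order are arbitrary choices.

```python
import numpy as np
from scipy.integrate import quad

a = 2.0
f = lambda x: np.exp(-x) * x            # an arbitrary square-integrable test function

def a_k(k):
    # a_k = (2/a) * int_0^a cos(k*pi*x/a) f(x) dx , Eq. (3.7)
    return (2 / a) * quad(lambda x: np.cos(k * np.pi * x / a) * f(x), 0, a, limit=200)[0]

K = 40
coeff = [a_k(k) for k in range(K + 1)]

def series(x):
    return coeff[0] / 2 + sum(coeff[k] * np.cos(k * np.pi * x / a) for k in range(1, K + 1))

xs = np.linspace(0.1, a - 0.1, 5)       # points away from the endpoints
print(np.max(np.abs(series(xs) - f(xs))))   # decreases as K grows
```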

3.1.2 Sine Fourier series


Unsurprisingly, the above discussion can be repeated for sine functions. As before, we consider the Hilbert
space L2R ([0, π]) with scalar product (3.1). On this space the functions
$\tilde s_k := \sqrt{\frac{2}{\pi}} \sin(kx) \,, \quad k = 1, 2, \ldots$   (3.9)
form an ortho-normal system and indeed an ortho-normal basis as stated in the following

Theorem 3.3. The functions $(\tilde s_k)_{k=1}^{\infty}$ in Eq. (3.9) form an ortho-normal basis of $L^2_{\mathbb{R}}([0, \pi])$.

Proof. This proof is very similar to the one for Theorem 3.2 and can, for example, be found in Ref. [5].

As in the cosine case, we can re-scale by $x \to \pi x / a$ to the interval $[0, a]$ and obtain an ortho-normal basis

$\tilde s_k = \sqrt{\frac{2}{a}} \sin\!\left( \frac{k\pi x}{a} \right) \,, \quad k = 1, 2, \ldots$   (3.10)

for $L^2_{\mathbb{R}}([0, a])$ with scalar product (3.3). Hence, every function $f \in L^2_{\mathbb{R}}([0, a])$ can be expanded as

$f(x) = \sum_{k=1}^{\infty} \beta_k \tilde s_k(x) = \sqrt{\frac{2}{a}} \sum_{k=1}^{\infty} \beta_k \sin\!\left( \frac{k\pi x}{a} \right) \qquad \text{where} \qquad \beta_k = \langle \tilde s_k, f \rangle = \sqrt{\frac{2}{a}} \int_0^{a} dx\, \sin\!\left( \frac{k\pi x}{a} \right) f(x) \,.$

Introducing the coefficients $b_k = \sqrt{\frac{2}{a}}\, \beta_k$ this can be re-written in the standard notation

$f(x) = \sum_{k=1}^{\infty} b_k \sin\!\left( \frac{k\pi x}{a} \right) \qquad \text{where} \qquad b_k = \frac{2}{a} \int_0^{a} dx\, \sin\!\left( \frac{k\pi x}{a} \right) f(x) \,.$   (3.11)

This series is called the sine Fourier series and the $b_k$ are called the (sine) Fourier coefficients. Of course there is also a version of Parseval's equation which reads

$\frac{2}{a} \int_0^{a} dx\, |f(x)|^2 = \frac{2}{a} \| f \|^2 = \frac{2}{a} \sum_{k=1}^{\infty} |\langle \tilde s_k, f \rangle|^2 = \sum_{k=1}^{\infty} |b_k|^2 \,.$   (3.12)

3.1.3 Real standard Fourier series


The most commonly used form of the Fourier series uses sine and cosine functions and can be thought of
as a combination of the above two cases. The Hilbert space considered in this case is L2R ([−π, π]) with
scalar product

$\langle f, g \rangle = \int_{-\pi}^{\pi} dx\, f(x) g(x) \,.$   (3.13)
We can also think of the functions in this Hilbert space as the periodic functions with period 2π, so
f (x) = f (x + 2π) (which are square-integrable over one period). The functions
$c_0 := \frac{1}{\sqrt{2\pi}} \,, \qquad c_k := \frac{1}{\sqrt{\pi}} \cos(kx) \,, \qquad s_k := \frac{1}{\sqrt{\pi}} \sin(kx) \,, \quad k = 1, 2, \ldots \,,$   (3.14)

form an ortho-normal system on L2R ([−π, π]).

Exercise 3.4. Check that the functions (3.14) form an ortho-normal system on L2R ([−π, π]).

They also form an ortho-normal basis as the following theorem asserts.

Theorem 3.5. The functions $(c_k)_{k=0}^{\infty}$ and $(s_k)_{k=1}^{\infty}$ together form an ortho-normal basis on $L^2_{\mathbb{R}}([-\pi, \pi])$.

Proof. Every function $f \in L^2_{\mathbb{R}}([-\pi, \pi])$ can be written as $f = f_+ + f_-$, where $f_{\pm}(x) = \frac{1}{2}\left( f(x) \pm f(-x) \right)$ are the symmetric and anti-symmetric parts of $f$. The functions $f_{\pm}$ can be restricted to the interval $[0, \pi]$ so
that they can be viewed as elements of L2R ([0, π]). From Theorem 3.2 we can write down a cosine Fourier
series for f+ and from Theorem 3.3 f− has a sine Fourier series, so

$f_+ = \sum_{k=0}^{\infty} \alpha_k \tilde c_k \,, \qquad f_- = \sum_{k=1}^{\infty} \beta_k \tilde s_k \,.$   (3.15)
Since both sides of the first equation are symmetric and both sides of the second equation anti-symmetric
they can both be trivially extended back to the interval [−π, π]. Then, summing up

$f = f_+ + f_- = \sum_{k=0}^{\infty} \alpha_k \tilde c_k + \sum_{k=1}^{\infty} \beta_k \tilde s_k$   (3.16)
proves the statement.
This proof also points to an interpretation of the relation between the sine and cosine Fourier series on
the one hand and the standard Fourier series on the other hand. Starting with a function f ∈ L2R ([0, π])
we can write down the cosine Fourier series. But we can also extend f to a symmetric function on [−π, π]
so it becomes an element of L2R ([−π, π]) and we can write down a standard Fourier series. Of course, f
being symmetric, this Fourier series only contains cosine terms and it looks formally the same as the cosine
Fourier series but is valid on the larger interval [−π, π]. Similarly, for f ∈ L2R ([0, π]) we can write down
the sine Fourier series and extend f to an anti-symmetric function on the interval [−π, π]. The standard
Fourier series for this anti-symmetric function then only contains sine terms and formally coincides with
the sine Fourier series. Conversely, if we start with an even (odd) function f ∈ L2R ([−π, π]) then the
Fourier series only contains cosine (sine) terms and we can restrict the expansion to the interval [0, π] so
it becomes a cosine (sine) Fourier series.
As before, we can use the re-scaling x → πx/a to transform to the interval [−a, a] and obtain the
ortho-normal basis

$c_0 := \frac{1}{\sqrt{2a}} \,, \qquad c_k := \frac{1}{\sqrt{a}} \cos\!\left( \frac{k\pi x}{a} \right) \,, \qquad s_k := \frac{1}{\sqrt{a}} \sin\!\left( \frac{k\pi x}{a} \right) \,, \quad k = 1, 2, \ldots \,,$   (3.17)

for the Hilbert space $L^2_{\mathbb{R}}([-a, a])$ with scalar product

$\langle f, g \rangle = \int_{-a}^{a} dx\, f(x) g(x) \,.$   (3.18)

Let us collect the formulae for the standard Fourier series. Every function $f \in L^2_{\mathbb{R}}([-a, a])$ can be expanded as

$f(x) = \sum_{k=0}^{\infty} \alpha_k c_k(x) + \sum_{k=1}^{\infty} \beta_k s_k(x) = \frac{\alpha_0}{\sqrt{2a}} + \frac{1}{\sqrt{a}} \sum_{k=1}^{\infty} \alpha_k \cos\!\left( \frac{k\pi x}{a} \right) + \frac{1}{\sqrt{a}} \sum_{k=1}^{\infty} \beta_k \sin\!\left( \frac{k\pi x}{a} \right)$   (3.19)

where

$\alpha_0 = \langle c_0, f \rangle = \frac{1}{\sqrt{2a}} \int_{-a}^{a} dx\, f(x) \,, \qquad \alpha_k = \langle c_k, f \rangle = \frac{1}{\sqrt{a}} \int_{-a}^{a} dx\, \cos\!\left( \frac{k\pi x}{a} \right) f(x) \,, \qquad \beta_k = \langle s_k, f \rangle = \frac{1}{\sqrt{a}} \int_{-a}^{a} dx\, \sin\!\left( \frac{k\pi x}{a} \right) f(x) \,,$   (3.20)

for $k = 1, 2, \ldots$.
As before, we introduce the re-scaled coefficients $a_0 = \sqrt{2/a}\, \alpha_0$, $a_k = \alpha_k / \sqrt{a}$ and $b_k = \beta_k / \sqrt{a}$, where $k = 1, 2, \ldots$ to obtain these equations in the standard form

$f(x) = \frac{a_0}{2} + \sum_{k=1}^{\infty} \left[ a_k \cos\!\left( \frac{k\pi x}{a} \right) + b_k \sin\!\left( \frac{k\pi x}{a} \right) \right] \,,$   (3.21)

where

$a_0 = \frac{1}{a} \int_{-a}^{a} dx\, f(x) \,, \qquad a_k = \frac{1}{a} \int_{-a}^{a} dx\, \cos\!\left( \frac{k\pi x}{a} \right) f(x) \,, \qquad b_k = \frac{1}{a} \int_{-a}^{a} dx\, \sin\!\left( \frac{k\pi x}{a} \right) f(x)$   (3.22)
for $k = 1, 2, \ldots$. Parseval's equation now takes the form

$\frac{1}{a} \int_{-a}^{a} dx\, |f(x)|^2 = \frac{1}{a} \| f \|^2 = \frac{1}{a} \left( \sum_{k=0}^{\infty} |\langle c_k, f \rangle|^2 + \sum_{k=1}^{\infty} |\langle s_k, f \rangle|^2 \right) = \frac{|a_0|^2}{2} + \sum_{k=1}^{\infty} \left( |a_k|^2 + |b_k|^2 \right) \,.$   (3.23)
3.1.4 Complex standard Fourier series
By far the most elegant form of the Fourier series arises in the complex case, where we consider the Hilbert
space L2C ([−π, π]) with scalar product
Z π
hf, gi = dx f (x)∗ g(x) . (3.24)
−π

The functions
1
ek := √ exp(ikx) , k ∈ Z (3.25)

form an ortho-normal system as verified in the following exercise.
Exercise 3.6. Show that the functions $(e_k)_{k=-\infty}^{\infty}$ in Eq. (3.25) form an ortho-normal system on $L^2_{\mathbb{C}}([-\pi, \pi])$ with scalar product (3.24).
The above functions form, in fact, an ortho-normal basis as stated in
Theorem 3.7. The functions $(e_k)_{k=-\infty}^{\infty}$ in Eq. (3.25) form an ortho-normal basis of $L^2_{\mathbb{C}}([-\pi, \pi])$.

Proof. Start with a function f ∈ L2C ([−π, π]) and decompose this function into real and imaginary parts,
so write f = fR + ifI . Since
$\infty > \| f \|^2 = \int_{-\pi}^{\pi} dx\, |f(x)|^2 = \int_{-\pi}^{\pi} dx\, f_R^2 + \int_{-\pi}^{\pi} dx\, f_I^2$   (3.26)

both fR and fI are real-valued square integrable functions and are, hence, elements of L2R ([−π, π]). This
means, from Theorem 3.5 that we can write down a standard real Fourier series for fR and fI . Inserting
these two real Fourier series into f = fR + ifI and replacing cos(kx) = (exp(ikx) + exp(−ikx))/2,
sin(kx) = (exp(ikx) − exp(−ikx))/(2i) proves the theorem.

The usual re-scaling $x \to \pi x / a$ leads to the Hilbert space $L^2_{\mathbb{C}}([-a, a])$ with scalar product

$\langle f, g \rangle = \int_{-a}^{a} dx\, f(x)^* g(x) \,,$   (3.27)

and ortho-normal basis

$e_k := \frac{1}{\sqrt{2a}} \exp\!\left( \frac{ik\pi x}{a} \right) \,, \quad k \in \mathbb{Z} \,.$   (3.28)
Every function $f \in L^2_{\mathbb{C}}([-a, a])$ then has an expansion

$f(x) = \sum_{k \in \mathbb{Z}} \alpha_k e_k(x) = \frac{1}{\sqrt{2a}} \sum_{k \in \mathbb{Z}} \alpha_k \exp\!\left( \frac{ik\pi x}{a} \right) \,,$   (3.29)

where

$\alpha_k = \langle e_k, f \rangle = \frac{1}{\sqrt{2a}} \int_{-a}^{a} dx\, \exp\!\left( \frac{-ik\pi x}{a} \right) f(x) \,.$   (3.30)

With the re-scaled Fourier coefficients $a_k = \alpha_k / \sqrt{2a}$ this turns into the standard form

$f(x) = \sum_{k \in \mathbb{Z}} a_k \exp\!\left( \frac{ik\pi x}{a} \right) \qquad \text{where} \qquad a_k = \frac{1}{2a} \int_{-a}^{a} dx\, \exp\!\left( \frac{-ik\pi x}{a} \right) f(x) \,.$   (3.31)

Parseval's equation now reads

$\frac{1}{2a} \int_{-a}^{a} dx\, |f(x)|^2 = \frac{1}{2a} \| f \|^2 = \frac{1}{2a} \sum_{k \in \mathbb{Z}} |\langle e_k, f \rangle|^2 = \sum_{k \in \mathbb{Z}} |a_k|^2 \,.$   (3.32)
3.1.5 Pointwise convergence
So far, our discussion of convergence for the Fourier series has been carried out with respect to the L2
norm (3.18). As emphasised, this type of convergence ensures that the difference of a function and its
Fourier series has a vanishing L2 norm but it does not necessarily imply that the Fourier series converges
to the function at every point x. The following theorem provides a statement about uniform convergence
of a Fourier series.
Theorem 3.8. Let f : [−a, a] → R or C be a (real or complex valued) function which is piecewise
continuously differentiable and which satisfies f (−a) = f (a). Then the (real or complex) Fourier series of
f converges to f uniformly.
Proof. For the proof see, for example, Ref. [10].

Recall from Def. 1.22 that uniform convergence implies point-wise convergence so under the conditions of
Theorem 3.8 the Fourier series of f converges to f at every point x ∈ [−a, a].

3.1.6 Examples of Fourier series

Figure 3: Graph of the periodic function $f$ defined by $f(x) = f(x + 2\pi)$ and $f(x) = x$ for $-\pi < x \le \pi$.

Application 3.7. Linear function


Let us start with the function f : [−π, π] → R defined by

f (x) = x , (3.33)

so a simple linear function on the interval [−π, π]. Of course, we can extend this to a periodic
function with period 2π whose graph is shown in Fig. 3. Since this function is anti-symmetric the
Fourier series of course only contains sine terms. (Alternatively, and equivalently, we can consider
this function restricted to the interval [0, π] and compute its sine Fourier series.) Using Eqs. (3.22)
we find for the Fourier coefficients
$a_k = 0 \,, \quad k = 0, 1, 2, \ldots \,, \qquad b_k = \frac{1}{\pi} \int_{-\pi}^{\pi} dx\, x \sin(kx) = \frac{2(-1)^{k+1}}{k} \,, \quad k = 1, 2, \ldots \,.$   (3.34)

Figure 4: Fourier coefficients and Fourier series for the linear function $f$ in Eq. (3.33). The left figure shows the Fourier coefficients $b_k$ from Eq. (3.34) for $k = 1, \ldots, 50$. The function $f$ together with the first six partial sums of its Fourier series (3.36) is shown in the right figure.

As a practical matter, it is useful to structure the calculation of Fourier coefficients in order to avoid
mistakes. Creating pages and pages of integration performed in small steps is neither efficient nor likely
to lead to correct answers. Instead, separate the process of integration from the specific calculation of
Fourier coefficients. A particular Fourier calculation often involves certain types of standard integrals. In the above case, these are integrals of the form $\int dx\, x \sin(\alpha x)$ for a constant $\alpha$. Find these integrals first (or simply look them up):

$\int dx\, x \sin(\alpha x) = -\frac{x \cos(\alpha x)}{\alpha} + \frac{\sin(\alpha x)}{\alpha^2} \,.$   (3.35)
Then apply this general result to the particular calculation at hand, that is, in the present case, set
α = k and put in the integration limits.
Inserting the above Fourier coefficients into Eq. (3.21), we get the Fourier series

$f(x) = 2 \sum_{k=1}^{\infty} \frac{(-1)^{k+1}}{k} \sin(kx) \,.$   (3.36)

Recall that the equality in Eq. (3.36) is not meant point-wise for every x but as an equality in
L2R ([−π, π]), that is, the difference between f and its Fourier series has length zero with respect to
the norm on L2R ([−π, π]). In fact, Eq. (3.36) shows (and Fig. 4 illustrates) that the Fourier series of f
vanishes at ±π (since every term in the series (3.36) vanishes at ±π) while f (±π) = ±π is non-zero.
So we have an example where the Fourier series does not converge to the function at every point. In
fact, the present function f violates the conditions of Theorem 3.8 (since f (π) 6= f (−π)), so there is
no reason to expect point-wise convergence. It is clear from Fig. 4 that the Fourier series “struggles”
to reproduce the function near $\pm\pi$ and this can be seen as the intuitive reason for the slow drop-off of the Fourier coefficients, $b_k \sim 1/k$, in Eq. (3.34). In other words, a larger number of terms in the
Fourier series contribute significantly so that the function can be matched near the boundaries of the
interval [−π, π].

For this example, let us consider Parseval's equation (3.23)

$\frac{2\pi^2}{3} = \frac{1}{\pi} \int_{-\pi}^{\pi} dx\, x^2 = \sum_{k=1}^{\infty} |b_k|^2 = 4 \sum_{k=1}^{\infty} \frac{1}{k^2} \,,$   (3.37)

where the left hand side follows from explicitly carrying out the normalisation integral and the right hand side by inserting the Fourier coefficients (3.34). This leads to the interesting formula

$\frac{\pi^2}{6} = \sum_{k=1}^{\infty} \frac{1}{k^2} \,.$   (3.38)
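A quick numerical check (not part of the notes) of Eq. (3.38), summing the first $10^5$ terms:

```python
import numpy as np

# Partial sums of sum_{k>=1} 1/k^2 approach pi^2/6, as Parseval's equation predicts.
k = np.arange(1, 100001)
print(np.sum(1.0 / k**2), np.pi**2 / 6)   # 1.6449..., 1.6449...
```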

Figure 5: Graph of the periodic function $f$ defined by $f(x) = f(x + 2\pi)$ and $f(x) = |x|$ for $-\pi < x \le \pi$.

Figure 6: Fourier coefficients and Fourier series for the modulus function $f$ in Eq. (3.39). The left figure shows the Fourier coefficients $a_k$ from Eq. (3.40) for $k = 1, \ldots, 50$. The function $f$ together with the first six partial sums of its Fourier series (3.42) is shown in the right figure.
Application 3.8. Modulus function
Our next example is the modulus function f : [−π, π] → R defined by

f (x) = |x| . (3.39)

Extended to a periodic function with period 2π its graph is shown in Fig. 5. Since this function is
symmetric the Fourier series of course only contains cosine terms. (Alternatively, and equivalently,
we can consider this function restricted to the interval [0, π] and compute its cosine Fourier series.)
Using Eqs. (3.22) we find for the Fourier coefficients

$a_0 = \frac{1}{\pi} \int_{-\pi}^{\pi} dx\, |x| = \pi \,, \qquad a_k = \frac{1}{\pi} \int_{-\pi}^{\pi} dx\, |x| \cos(kx) = \frac{2\left( (-1)^k - 1 \right)}{\pi k^2} \,, \qquad b_k = 0 \,,$   (3.40)

for $k = 1, 2, \ldots$. The standard integral which enters this calculation is

$\int dx\, x \cos(\alpha x) = \frac{\cos(\alpha x)}{\alpha^2} + \frac{x \sin(\alpha x)}{\alpha} \,,$   (3.41)
where α 6= 0. Note that the value obtained for a0 does not follow by inserting k = 0 into the general
formula for ak - in fact, doing this leads to an undefined expression. This observation points to a
general rule. Sometimes Fourier coefficients, calculated for generic values of k ∈ N, become apparently
singular or undefined for specific k values. Of course Fourier coefficients must be well-defined, so this
indicates a break-down of the integration method which occurs for those specific values of k. In such
cases, the integrals for the problematic k-values should be carried out separately and, with the correct
integration applied, this will lead to well-defined answers. In the present case, the special case arises
because the standard integral (3.41) is only valid for α 6= 0.
The Fourier series from the above coefficients is given by

$f(x) = \frac{\pi}{2} - \frac{4}{\pi} \sum_{k = 1, 3, 5, \ldots} \frac{\cos(kx)}{k^2} \,.$   (3.42)

The Fourier coefficients $a_k$ and the first few partial sums of the above Fourier series are shown in Fig. 6. The Fourier coefficients drop off as $a_k \sim 1/k^2$, so more quickly than in the previous example, and convergence of the Fourier series is more efficient. A related observation is that the function (3.39) satisfies all the conditions of Theorem 3.8 and, hence, its Fourier series converges uniformly (and point-wise) to $f$. Fig. 6 illustrates this convincingly.

Application 3.9. Sign function


Another interesting example is the sign function $f : [-\pi, \pi] \to \mathbb{R}$ defined by

$f(x) := \mathrm{sign}(x) = \begin{cases} 1 & \text{for } x > 0 \\ 0 & \text{for } x = 0 \\ -1 & \text{for } x < 0 \end{cases} \,.$   (3.43)

The periodically continued version of this function is shown in Fig. 7. Since f is an anti-symmetric
function, the Fourier series only contains sine terms. (Alternatively and equivalently, we can think
of f as a function on the [0, π] and work out the sine Fourier series.) For the Fourier coefficients we

Figure 7: Graph of the periodic function $f$ defined by $f(x) = f(x + 2\pi)$ and $f(x) = \mathrm{sign}(x)$ for $-\pi < x \le \pi$.

Figure 8: Fourier coefficients and Fourier series for the sign function $f$ in Eq. (3.43). The left figure shows the Fourier coefficients $b_k$ from Eq. (3.44) for $k = 1, \ldots, 50$. The function $f$ together with the first six partial sums of its Fourier series (3.45) is shown in the right figure.

have

$a_k = 0 \,, \qquad b_k = \frac{1}{\pi} \int_{-\pi}^{\pi} dx\, \mathrm{sign}(x) \sin(kx) = -\frac{2\left( (-1)^k - 1 \right)}{\pi k} \,,$   (3.44)

for $k = 1, 2, \ldots$ which leads to the Fourier series

$f(x) = \frac{4}{\pi} \sum_{k = 1, 3, 5, \ldots} \frac{\sin(kx)}{k} \,.$   (3.45)

The Fourier coefficients $b_k$ and the first few partial sums of the Fourier series are shown in Fig. 8. As for example 1, the function $f$ does not satisfy the conditions of Theorem 3.8 and the Fourier series does not converge everywhere point-wise to the function $f$. Specifically, while the Fourier series always vanishes at $x = \pm\pi$ the function value $f(\pm\pi) = \pm 1$ is non-zero. Related to this is the slow drop-off, $b_k \sim 1/k$, of the Fourier coefficients.

Figure 9: Graph of the periodic function $f$ defined in Eq. (3.46).

Figure 10: Fourier coefficients and Fourier series for the function $f$ in Eq. (3.46). The left figure shows the Fourier coefficients $a_k$ and the middle figure the coefficients $b_k$ from Eq. (3.47). The function $f$ together with the first six partial sums of its Fourier series (3.48) is shown in the right figure.

Application 3.10. A more complicated function


As a final and more complicated example, consider the function f : [−π, π] → R defined by

f (x) = x2 cos(5x) + x . (3.46)

with graph as shown in Fig. 9. Since this function is neither symmetric nor anti-symmetric we expect both sine and cosine terms to be present. For the Fourier coefficients it follows that

$a_k = \frac{1}{\pi} \int_{-\pi}^{\pi} dx\, f(x) \cos(kx) = -\frac{4(-1)^k (k^2 + 25)}{(k^2 - 25)^2} \,, \quad k = 0, \ldots, 4, 6, \ldots$

$a_5 = \frac{1}{\pi} \int_{-\pi}^{\pi} dx\, f(x) \cos(5x) = \frac{1}{50} + \frac{\pi^2}{3}$   (3.47)

$b_k = \frac{1}{\pi} \int_{-\pi}^{\pi} dx\, f(x) \sin(kx) = -\frac{2(-1)^k}{k} \,, \quad k = 1, 2, \ldots \,.$

Note that inserting k = 5 into the first expression for ak leads to a singularity and, hence, the generic
integration method has broken down in this case. (This is, of course, related to the presence of cos(5x)
in f .) For this reason, we have carried out the calculation of a5 separately. The resulting Fourier

series is

$f(x) = -4 \sum_{k \neq 5} \frac{(-1)^k (k^2 + 25)}{(k^2 - 25)^2} \cos(kx) + \left( \frac{1}{50} + \frac{\pi^2}{3} \right) \cos(5x) - 2 \sum_{k=1}^{\infty} \frac{(-1)^k}{k} \sin(kx) \,.$   (3.48)

The corresponding plots for the coefficients ak , bk and the partial Fourier series are shown in Fig. 10.
Again this is a function which violates the conditions of Theorem 3.8 and where the Fourier series
does not converge to the function everywhere. A new feature is the structure in the Fourier coefficients $a_k$, as seen in the left panel of Fig. 10. The Fourier mode $a_5$ (and, to a lesser extent, $a_4$ and $a_6$) is much stronger than the other modes $a_k$ and this is of course related to the presence of $\cos(5x)$ in the
function f . Intuitively, the Fourier series detects the strength with which frequencies k are contained
in a function f and the presence of cos(5x) in f suggests a strong contribution for k = 5.

Application 3.11. A complex example


So far our examples have been for the real Fourier series so we finish with an example for the complex
case. Of course any of the previous real-valued functions can also be expanded into a complex Fourier
series. For example, for the modulus function f : [−π, π] → R with f (x) = |x|, discussed in example
2, the coefficients for the complex Fourier series are, from Eq. (3.31), given by
$a_k = \frac{1}{2\pi} \int_{-\pi}^{\pi} dx\, |x| \exp(-ikx) = \begin{cases} \dfrac{(-1)^k - 1}{\pi k^2} & \text{for } k \neq 0 \\[1ex] \dfrac{\pi}{2} & \text{for } k = 0 \end{cases}$   (3.49)

Hence the complex Fourier series is

$f(x) = \frac{\pi}{2} - \frac{2}{\pi} \sum_{k \in \mathbb{Z},\ k\ \text{odd}} \frac{\exp(ikx)}{k^2} \,,$   (3.50)

which we could have also inferred from the real Fourier series (3.42) by simply replacing $\cos(kx) = \left( \exp(ikx) + \exp(-ikx) \right)/2$.
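In practice, complex Fourier coefficients are often approximated by a discrete Fourier transform of sampled function values. The sketch below (an illustration, not part of the notes) does this for $f(x) = |x|$ on $[-\pi, \pi]$ and compares with the exact coefficients (3.49); the sample size and the phase factor compensating for a grid that starts at $-\pi$ are implementation choices.

```python
import numpy as np

N = 1024
x = -np.pi + 2 * np.pi * np.arange(N) / N        # equally spaced samples on [-pi, pi)
fhat = np.fft.fft(np.abs(x)) / N                  # DFT of the samples of f(x) = |x|

def a_k(k):
    # a_k ≈ (-1)^k * fhat[k]; the (-1)^k accounts for the grid starting at -pi
    return (-1) ** k * fhat[k % N]

for k in [0, 1, 2, 3]:
    exact = np.pi / 2 if k == 0 else ((-1) ** k - 1) / (np.pi * k**2)
    print(k, np.round(a_k(k).real, 6), np.round(exact, 6))
```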

3.2 Fourier transform


As we have seen, the Fourier series provides a frequency analysis for functions on a finite interval, in
terms of a discrete spectrum of frequencies labeled by an integer k. The Fourier transform serves a similar
purpose but for functions on all of R (or Rn ), leading to a frequency analysis in terms of a continuous
spectrum of frequencies.

3.2.1 Basic definition and properties


The natural arena to start the discussion is the Banach space L1C (Rn ), defined in Eq. (1.90), with norm
k · k1 , defined in Eq. (1.91). As usual, we denote vectors in Rn by bold-face letter, so x = (x1 , . . . , xn )T .
A simple observation is that for a function f ∈ L1C (Rn ) we have exp(−ix · k)f ∈ L1C (Rn ) for any vector
k ∈ Rn . Hence, it makes sense to define 5
5
There are different conventions for how to insert factors of 2π into the definition of the Fourier transform. The convention
adopted below is the most symmetric choice, as we will see later.

Definition 3.1. For functions $f \in L^1_{\mathbb{C}}(\mathbb{R}^n)$ we define the Fourier transform $\mathcal{F} f = \hat f : \mathbb{R}^n \to \mathbb{C}$ by

$\hat f(\mathbf{k}) = \mathcal{F}(f)(\mathbf{k}) := \frac{1}{(2\pi)^{n/2}} \int_{\mathbb{R}^n} d^n x\, \exp(-i\, \mathbf{x} \cdot \mathbf{k})\, f(\mathbf{x}) \,.$   (3.51)
Clearly, $\mathcal{F}$ is a linear operator, that is $\mathcal{F}(\alpha f + \beta g) = \alpha \mathcal{F}(f) + \beta \mathcal{F}(g)$. Also note that $|\hat f(\mathbf{k})| \le \frac{\| f \|_1}{(2\pi)^{n/2}}$,
so the modulus of the Fourier transform is bounded. With some more effort it can be shown that fˆ is
continuous. However, it is not clear that the Fourier transform fˆ is an element of L1C (Rn ) as well and,
it turns out, this is not always the case. (See Example 3 below.) We will rectify this later by defining a
version of the Fourier transform which provides a map L2C (Rn ) → L2C (Rn ).
Before we compute examples of Fourier transforms it is useful to look at some of its general properties.
Recall from Section 2.3 the translation operator Ta , the modulation operator Eb and the dilation operator
Dλ , for a, b ∈ Rn and λ ∈ R, defined by

Ta (f )(x) := f (x − a) , Eb (f )(x) := exp(ib · x)f (x) , Dλ (f )(x) := f (λx) , (3.52)

which we can also think of as maps L1C (Rn ) → L1C (Rn ). For any function g : Rn → C, we also have the
multiplication operator
Mg (f )(x) := g(x)f (x) . (3.53)

It is useful to work out how these operators as well as derivative operators Dxj := ∂xj relate to the Fourier
transform.
Proposition 3.1. (Some elementary properties of the Fourier transform) For $f \in L^1_{\mathbb{C}}(\mathbb{R}^n)$ we have
(F1) $\widehat{T_a(f)} = E_{-a}(\hat f)$ or, equivalently, $\mathcal{F} \circ T_a = E_{-a} \circ \mathcal{F}$
(F2) $\widehat{E_b(f)} = T_b(\hat f)$ or, equivalently, $\mathcal{F} \circ E_b = T_b \circ \mathcal{F}$
(F3) $\widehat{D_\lambda(f)} = \frac{1}{|\lambda|^n}\, D_{1/\lambda}(\hat f)$ or, equivalently, $\mathcal{F} \circ D_\lambda = \frac{1}{|\lambda|^n}\, D_{1/\lambda} \circ \mathcal{F}$
For $f \in C^1_c(\mathbb{R}^n)$ we have
(F4) $\widehat{D_{x_j} f}(\mathbf{k}) = i k_j \hat f(\mathbf{k})$ or, equivalently, $\mathcal{F} \circ D_{x_j} = M_{i k_j} \circ \mathcal{F}$
(F5) $\widehat{x_j f}(\mathbf{k}) = i D_{k_j} \hat f$ or, equivalently, $\mathcal{F} \circ M_{x_j} = i D_{k_j} \circ \mathcal{F}$.
Proof. (F1) This can be shown by direct calculation.

$\widehat{T_a(f)}(\mathbf{k}) = \frac{1}{(2\pi)^{n/2}} \int_{\mathbb{R}^n} dx^n\, e^{-i\, \mathbf{x} \cdot \mathbf{k}} f(\mathbf{x} - \mathbf{a}) \overset{\mathbf{y} = \mathbf{x} - \mathbf{a}}{=} \frac{1}{(2\pi)^{n/2}} \int_{\mathbb{R}^n} dy^n\, e^{-i (\mathbf{y} + \mathbf{a}) \cdot \mathbf{k}} f(\mathbf{y}) = E_{-a}(\hat f)(\mathbf{k}) \,.$   (3.54)

The proofs for (F2) to (F5) are similar and are left as an exercise.

Exercise 3.9. Prove (F2), (F3), (F4) and (F5) from Prop. 3.1.

3.2.2 Convolution
Another operation which relates to Fourier transforms in an interesting way is the convolution f ? g of two
functions f, g ∈ L1 (Rn ) which is defined as
$(f \star g)(\mathbf{x}) := \int_{\mathbb{R}^n} dy^n\, f(\mathbf{y}) g(\mathbf{x} - \mathbf{y}) \,.$   (3.55)

A straightforward computation shows that the convolution is commutative, so f ? g = g ? f .


Exercise 3.10. Show that the convolution commutes.

58
From a mathematical point of view, we have the following statement about convolutions.
Theorem 3.11. (Property of convolutions) For f, g ∈ L1 (Rn ) the convolution f ? g is well-defined and
f ? g ∈ L1 (Rn ).
Proof. For the proof see, for example, Ref. [10].

How can the convolution be understood intuitively? From the integral (3.55) we can say that the convo-
lution is "smearing" the function $f$ by the function $g$. For example, consider choosing $f(x) = \cos(x)$ and

$g(x) = \begin{cases} \frac{1}{2a} & \text{for } x \in [-a, a] \\ 0 & \text{for } |x| > a \end{cases} \,,$   (3.56)

for any $a > 0$. The function $g$ is chosen so that, upon convolution, it leads to a smearing (or averaging) of the function $f$ over the interval $[x - a, x + a]$ for every $x$. An explicit calculation shows the convolution is given by

$(f \star g)(x) = \frac{1}{2a} \int_{x-a}^{x+a} dy\, \cos(y) = \frac{\sin(a)}{a} \cos(x) \,.$   (3.57)
If we consider the limit a → 0, so the averaging width goes to zero, we find that f ? g = f so f remains
unchanged, as one would expect. The other extreme would be to choose a = π in which case f ? g = 0. In
this case, the averaging is over a period [x − π, x + π] of the cos so the convoluted function vanishes for
every x. For other values of a the convolution is still a cos function but with a reduced amplitude, as one
would expect from a local averaging.
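The smearing formula (3.57) is easy to confirm numerically; the following sketch (not part of the notes) computes the convolution integral directly with scipy's quad for an arbitrary choice of $a$.

```python
import numpy as np
from scipy.integrate import quad

a = 0.7

def conv(x0):
    # (f * g)(x0) = (1/2a) * int_{x0-a}^{x0+a} cos(y) dy , cf. Eq. (3.57)
    return quad(np.cos, x0 - a, x0 + a)[0] / (2 * a)

xs = np.linspace(-6, 6, 13)
numeric = np.array([conv(x0) for x0 in xs])
exact = np.sin(a) / a * np.cos(xs)
print(np.max(np.abs(numeric - exact)))      # essentially zero
```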
The relationship between convolutions and Fourier transforms is stated in the following Lemma.

Proposition 3.2. For $f, g \in L^1(\mathbb{R}^n)$ we have $\widehat{f \star g} = (2\pi)^{n/2}\, \hat f\, \hat g$.
Proof. The proof works by straightforward calculation. Since $(f \star g)(\mathbf{x}) = \int dy^n\, f(\mathbf{y}) g(\mathbf{x} - \mathbf{y})$ we have

$\widehat{f \star g}(\mathbf{k}) = \frac{1}{(2\pi)^{n/2}} \int dx^n dy^n\, f(\mathbf{y}) g(\mathbf{x} - \mathbf{y}) e^{-i\, \mathbf{x} \cdot \mathbf{k}}$   (3.58)
$\qquad = \frac{1}{(2\pi)^{n/2}} \int dx^n dy^n\, \left( f(\mathbf{y}) e^{-i\, \mathbf{y} \cdot \mathbf{k}} \right) \left( g(\mathbf{x} - \mathbf{y}) e^{-i (\mathbf{x} - \mathbf{y}) \cdot \mathbf{k}} \right)$   (3.59)
$\qquad \overset{\mathbf{z} = \mathbf{x} - \mathbf{y}}{=} \frac{1}{(2\pi)^{n/2}} \int dy^n\, f(\mathbf{y}) e^{-i\, \mathbf{y} \cdot \mathbf{k}} \int dz^n\, g(\mathbf{z}) e^{-i\, \mathbf{z} \cdot \mathbf{k}} = (2\pi)^{n/2}\, \hat f(\mathbf{k})\, \hat g(\mathbf{k}) \,.$   (3.60)

In other words, the Fourier transform of a convolution is (up to a constant) the product of the two Fourier
transforms. This rule is often useful to work out new Fourier transforms from given ones.

3.2.3 Examples of Fourier transforms


We should now discuss a few examples of Fourier transforms to get a better idea of its interpretation. In
this context it is useful to think of the function f as the amplitude of a sound signal and we will rename its
variable as x → t, to indicate dependence on time. Correspondingly, the variable k of the Fourier transform
fˆ will be renamed as k → ω, indicating frequency. So we write (focusing on the one-dimensional case)
$\hat f(\omega) = \frac{1}{\sqrt{2\pi}} \int_{\mathbb{R}} dt\, f(t) e^{-i\omega t} \,.$   (3.61)
The basic idea is that the Fourier transform provides the decomposition of the signal f into its frequency
components e−iωt , that is, fˆ(ω) indicates the strength with which the frequency ω is contained in the

signal f . Suppose that f is the signal from a single piano tone with frequency ω0 . In this case, we expect
fˆ to have a strong peak around ω = ω0 . However, a piano tone also contains overtones with frequencies
qω0 , where q = 2, 3, . . .. This means we expect fˆ to have smaller peaks around ω = qω0 . (Their height
decreases with increasing q and exactly what the pattern is determines how the tone “sounds”.) Let us
consider this in a more quantitative way.

Application 3.12. A damped single-frequency signal


Consider the function

$f = A f_{\omega_0, \gamma} \,, \qquad f_{\omega_0, \gamma}(t) = \begin{cases} \sqrt{2\pi}\, e^{-\gamma t} e^{i \omega_0 t} & \text{for } t \ge 0 \\ 0 & \text{for } t < 0 \end{cases} \,,$   (3.62)

where $A$, $\gamma$ and $\omega_0$ are real, positive constants. Using the above sound analogy, we can think of this function as representing a sound signal with onset at $t = 0$, overall amplitude $A$, frequency $\omega_0$ and a decay time of $\sim 1/\gamma$. Inserting into Eq. (3.61), we find the Fourier transform

$\hat f(\omega) = A\, \hat f_{\omega_0, \gamma}(\omega) \,, \qquad \hat f_{\omega_0, \gamma}(\omega) = \int_0^{\infty} dt\, e^{-\gamma t} e^{-i(\omega - \omega_0) t} = \frac{1}{\gamma + i(\omega - \omega_0)} \,.$   (3.63)

To interpret this result we compute its complex modulus

$|\hat f_{\omega_0, \gamma}(\omega)|^2 = \frac{1}{(\omega_0 - \omega)^2 + \gamma^2} \,,$   (3.64)

and this corresponds to a peak with width ∼ γ around ω = ω0 . The longer the tone, the smaller
γ and the smaller the width of this peak. Note that this is precisely in line with our expectation.
The original signal contains a strong component with frequency ω0 which corresponds to the peak
of the Fourier transform around ω0 . However, there is a spectrum of frequencies around ω0 and this
captures the finite decay time ∼ 1/γ of the signal. The longer the signal the closer it is to a pure
signal with frequency ω0 and the narrower the peak in the Fourier transform.
We can take the sound analogy further by considering

$f = \sum_{q = 1, 2, \ldots} A_q f_{q\omega_0, \gamma_q} \,,$   (3.65)

where the function $f_{q\omega_0, \gamma_q}$ is defined in Eq. (3.62). This represents a tone with frequency $\omega_0$, together with its overtones with amplitudes $A_q$, frequencies $q\omega_0$ and decay constants $\gamma_q$. The Fourier transform of $f$ is easily computed from linearity:

$\hat f(\omega) = \sum_{q = 1, 2, \ldots} A_q \hat f_{q\omega_0, \gamma_q}(\omega) = \sum_{q = 1, 2, \ldots} \frac{A_q}{\gamma_q + i(\omega - q\omega_0)} \,.$   (3.66)

This corresponds to a sequence of peaks at frequencies qω0 , where q = 1, 2, . . . which reflects the main
frequency of the tone, together with its overtone frequencies.
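The Lorentzian shape of the peak can be checked directly. The sketch below (an illustration, not part of the notes) evaluates the Fourier integral (3.63) numerically for one damped tone, truncating the time integral once $e^{-\gamma t}$ is negligible, and compares $|\hat f_{\omega_0, \gamma}(\omega)|^2$ with Eq. (3.64); the values of $\omega_0$, $\gamma$ and the cut-off are arbitrary.

```python
import numpy as np
from scipy.integrate import quad

omega0, gamma = 5.0, 0.2

def fhat(omega):
    # int_0^inf e^{-gamma t} e^{-i(omega - omega0) t} dt, truncated at t = 50 (e^{-10} tail)
    re = quad(lambda t: np.exp(-gamma * t) * np.cos((omega - omega0) * t), 0, 50)[0]
    im = -quad(lambda t: np.exp(-gamma * t) * np.sin((omega - omega0) * t), 0, 50)[0]
    return re + 1j * im

for omega in [3.0, 4.8, 5.0, 5.2, 7.0]:
    print(omega, abs(fhat(omega))**2, 1 / ((omega0 - omega)**2 + gamma**2))
```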

Application 3.13. Fourier transforms of Gaussians

Another interesting example to consider is the Fourier transform of the one-dimensional Gaussian

$f(x) = e^{-x^2/2}$   (3.67)

with width one. For its Fourier transform we have

$\hat f(k) = \frac{1}{\sqrt{2\pi}} \int_{\mathbb{R}} dx\, e^{-x^2/2 - ikx} = e^{-k^2/2} \,.$   (3.68)
Exercise 3.12. Prove Eq. (3.68). (Hint: Complete the square in the exponent.)

This result means that the Gaussian is invariant under Fourier transformation. Without much effort, this one-dimensional result can be generalised to the $n$-dimensional width one Gaussian

$f(\mathbf{x}) = e^{-|\mathbf{x}|^2/2} \,.$   (3.69)

Its Fourier transform can be split up into a product of $n$ one-dimensional Fourier transforms as

$\hat f(\mathbf{k}) = \frac{1}{(2\pi)^{n/2}} \int_{\mathbb{R}^n} dx^n\, e^{-|\mathbf{x}|^2/2 - i\, \mathbf{k} \cdot \mathbf{x}} = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi}} \int_{\mathbb{R}} dx_i\, e^{-x_i^2/2 - i k_i x_i} = \prod_{i=1}^{n} e^{-k_i^2/2} = e^{-|\mathbf{k}|^2/2} \,,$   (3.70)

and the one-dimensional result (3.68) has been used in the second-last step. Hence, the n-dimensional
width one Gaussian is also invariant under Fourier transformation.
We would like to work out the Fourier transform of a more general Gaussian with width $a > 0$, given by

$f_a(\mathbf{x}) = e^{-\frac{|\mathbf{x}|^2}{2a^2}} = D_{1/a}(f)(\mathbf{x}) \,,$   (3.71)

where $f$ is the Gaussian (3.69) with width one and $D$ is the dilation operator defined in Eq. (3.52). The fact that this can be expressed in terms of the dilation operator makes calculating the Fourier transform quite easy, using the property (F3) in Prop. 3.1:

$\hat f_a(\mathbf{k}) = \widehat{D_{1/a}(f)}(\mathbf{k}) = a^n D_a(\hat f)(\mathbf{k}) = a^n e^{-a^2 |\mathbf{k}|^2/2} \,.$   (3.72)

In the last step the result (3.70) for the Fourier transform fˆ of the width one Gaussian has been used.
In conclusion, the Fourier transform of a Gaussian with width a is again a Gaussian with width 1/a.
Finally, we consider a Gaussian with width $a$ and center shifted from the origin to a point $\mathbf{c} \in \mathbb{R}^n$ given by

$f_{a, \mathbf{c}}(\mathbf{x}) = \exp\!\left( -\frac{|\mathbf{x} - \mathbf{c}|^2}{2a^2} \right) = T_c(f_a)(\mathbf{x}) \,.$   (3.73)

Note that this can be written in terms of the zero-centred Gaussian using the translation operator (3.52) and we can now use property (F1) in Prop. 3.1 to work out the Fourier transform:

$\hat f_{a, \mathbf{c}}(\mathbf{k}) = \widehat{T_c(f_a)}(\mathbf{k}) = E_{-c}(\hat f_a)(\mathbf{k}) = a^n e^{-i\, \mathbf{c} \cdot \mathbf{k} - a^2 |\mathbf{k}|^2/2} \,.$   (3.74)
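A quick numerical confirmation (not part of the notes) of Eq. (3.72) in one dimension: the Fourier integral of the width-$a$ Gaussian is evaluated with scipy's quad and compared with $a\, e^{-a^2 k^2/2}$; the width and the sample values of $k$ are arbitrary choices.

```python
import numpy as np
from scipy.integrate import quad

a = 1.7

def fhat(k):
    # real part of (1/sqrt(2*pi)) * int exp(-x^2/(2 a^2)) e^{-ikx} dx;
    # the sine part vanishes because the integrand is even
    re = quad(lambda x: np.exp(-x**2 / (2 * a**2)) * np.cos(k * x), -np.inf, np.inf)[0]
    return re / np.sqrt(2 * np.pi)

for k in [0.0, 0.5, 1.0, 2.0]:
    print(k, fhat(k), a * np.exp(-a**2 * k**2 / 2))
```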

Application 3.14. Characteristic function of an interval

Figure 11: The graph of the characteristic function $\chi$ of the interval $[-1, 1]$ (left) and the graph of the convolution $f = \chi \star \chi$ (right).

Figure 12: The graph of the Fourier transform $\hat\chi$ of the characteristic function $\chi$ (left) and the graph for the Fourier transform of the convolution $\widehat{\chi \star \chi} = \sqrt{2\pi}\, \hat\chi^2$ (right).

Consider the characteristic function $\chi : \mathbb{R} \to \mathbb{R}$ of the interval $[-1, 1]$ defined by

$\chi(x) = \begin{cases} 1 & \text{for } |x| \le 1 \\ 0 & \text{for } |x| > 1 \end{cases} \,.$   (3.75)

A quick calculation shows that its Fourier transform is given by

$\hat\chi(k) = \frac{1}{\sqrt{2\pi}} \int_{-1}^{1} dx\, e^{-ikx} = \sqrt{\frac{2}{\pi}}\, \frac{\sin k}{k} \,.$   (3.76)

Exercise 3.13. Show that the Fourier transform (3.76) of the characteristic function χ is not in
L1 (R). (Hint: Find a lower bound for the integral over | sin k/k| from (m − 1)π to mπ.) Use the
dilation operator to find the Fourier transform of the characteristic function χa for the interval [−a, a].

We can use this example as an illustration of convolutions and their application to Fourier transforms.

Consider the convolution $f = \chi \star \chi$ of $\chi$ with itself which is given by

$f(x) = \int_{\mathbb{R}} dy\, \chi(y) \chi(y - x) = \max(2 - |x|, 0) \,.$   (3.77)

The graphs of $\chi$ and its convolution $f = \chi \star \chi$ are shown in Fig. 11. From the convolution property in Prop. 3.2 and the Fourier transform $\hat\chi$ in Eq. (3.76) we have

$\hat f(k) = \widehat{\chi \star \chi}(k) = \sqrt{2\pi}\, \hat\chi^2(k) = 2 \sqrt{\frac{2}{\pi}}\, \frac{\sin^2 k}{k^2} \,.$   (3.78)

Fig. 12 shows the graphs for the Fourier transforms χ̂ and fˆ.

3.2.4 The inverse of the Fourier transform


We should now come back to general properties of the Fourier transform. An obvious question is how to
obtain a function f from its Fourier transform fˆ and this is answered by the following theorem.

Theorem 3.14. (Inversion formula for the Fourier transform)


Consider a function f ∈ L1 (Rn ) such that fˆ ∈ L1 (Rn ). Then we have

$f(\mathbf{x}) = \frac{1}{(2\pi)^{n/2}} \int_{\mathbb{R}^n} dk^n\, \hat f(\mathbf{k}) e^{i\, \mathbf{k} \cdot \mathbf{x}} \,,$   (3.79)

almost everywhere, that is for all x ∈ Rn except possibly on a set of Lebesgue measure zero.

Proof. The proof is somewhat technical (suggested by the fact that equality can fail on a set of measure
zero) and can, for example, be found in Ref. [10].

Note that the inversion formula (3.79) is very similar to the original definition (3.51) of the Fourier
transform, except for the change of sign in the exponent. It is, therefore, useful to introduce the linear
operator

$\tilde{\mathcal{F}}(\hat f)(\mathbf{x}) := \frac{1}{(2\pi)^{n/2}} \int_{\mathbb{R}^n} dk^n\, \hat f(\mathbf{k}) e^{i\, \mathbf{k} \cdot \mathbf{x}}$   (3.80)
for the inverse Fourier transform. With this terminology, the statement of Theorem 3.14 can be expressed
as
F̃ ◦ F(f ) = f ⇒ F ◦ F̃(f ) = f . (3.81)

Exercise 3.15. Show that the equation on the RHS of (3.81) does indeed follow from the equation on the
LHS. (Hint: Think about complex conjugation.)

Theorem 3.14 also means that a function f is uniquely (up to values on a measure zero set) determined
by its Fourier transform fˆ.

3.2.5 Fourier transform in L2


In Exercise 3.13 we have seen that the Fourier transform of a function in L1 (Rn ) may not be an element
of L1 (Rn ). This is somewhat unsatisfactory and we will now see that the Fourier transform has nicer
properties when defined on the space L2 (Rn ). We begin with the following technical Lemma.

Lemma 3.1. If f ∈ Ccn+1 (Rn ) then the Fourier transform fˆ is integrable, that is, fˆ ∈ L1 (Rn ).

Proof. Property (F4) in Prop. 3.1 states that $\widehat{D_{x_j} f}(\mathbf{k}) = i k_j \hat f(\mathbf{k})$ which implies that

$|k_j \hat f(\mathbf{k})| \le \| D_{x_j} f \|_1 / (2\pi)^{n/2} \,.$   (3.82)

Differentiating and applying this rule repeatedly, we conclude that there is a constant $K$ such that

$|\hat f(\mathbf{k})| \le \frac{K}{\left( 1 + \sum_{i=1}^{n} |k_i| \right)^{n+1}}$   (3.83)

and this means that $\hat f$ is integrable.

The next Lemma explores the relationship between the Fourier transform and the standard scalar product
on L2 (Rn ).

Lemma 3.2. (a) Let $f, g \in L^1(\mathbb{R}^n)$ with Fourier transforms $\hat f$ and $\hat g$. Then $\hat f g$ and $f \hat g$ are integrable and we have

$\int_{\mathbb{R}^n} dx^n\, \hat f(\mathbf{x}) g(\mathbf{x}) = \int_{\mathbb{R}^n} dx^n\, f(\mathbf{x}) \hat g(\mathbf{x})$   (3.84)

(b) For f, g ∈ L1 (Rn ) ∩ L2 (Rn ) we have fˆ, ĝ ∈ L2 (Rn ) and

hf, gi = hfˆ, ĝi , (3.85)

where h·, ·i denotes the standard scalar product on L2 (Rn ) .

Proof. (a) Since $\hat f, \hat g$ are bounded and continuous, $\hat f g$ and $f \hat g$ are indeed integrable. It follows

$\int dx^n\, f(\mathbf{x}) \hat g(\mathbf{x}) = \frac{1}{(2\pi)^{n/2}} \int dx^n dy^n\, f(\mathbf{x}) g(\mathbf{y}) e^{-i\, \mathbf{x} \cdot \mathbf{y}} = \int dy^n\, \hat f(\mathbf{y}) g(\mathbf{y}) \,.$   (3.86)

(b) For $h, g \in C^{\infty}_c(\mathbb{R}^n)$ we have, from Lemma 3.1, that $\hat h, \hat g \in L^1(\mathbb{R}^n)$. Then, we can apply part (a) to get

$\langle \hat h, \hat g \rangle = \int dx^n\, \tilde{\mathcal{F}}(\bar h)(\mathbf{x}) \hat g(\mathbf{x}) = \int dx^n\, \mathcal{F} \circ \tilde{\mathcal{F}}(\bar h)(\mathbf{x}) g(\mathbf{x}) = \int dx^n\, \bar h(\mathbf{x}) g(\mathbf{x}) = \langle h, g \rangle \,.$   (3.87)

To extend this statement to L1 (Rn ) ∩ L2 (Rn ) we recall from Theorem 2.1 that Cc∞ (Rn ) is dense in this
space. We can, therefore, approximate functions f, g ∈ L1 (Rn )∩L2 (Rn ) by sequences (fk ), (gk ) in Cc∞ (Rn ).
We have already shown that the property (3.85) holds for all fk , gk and, by taking the limit k → ∞ through
the scalar product it follows for f, g. In particular, taking f = g, it follows that k f k2 = k fˆ k2 which
shows that fˆ ∈ L2 (Rn ).

Clearly, Eq. (3.85) is a unitarity property of the Fourier transform, relative to the standard scalar product
on L2 (Rn ). However, to make this consistent, we have to extend the Fourier transform to all of L2 (Rn )
and this is the content of the following theorem.

Theorem 3.16. (Plancherel) There exists a vector space isomorphism $T : L^2(\mathbb{R}^n) \to L^2(\mathbb{R}^n)$ with the
following properties:
(a) hT (f ), T (g)i = hf, gi for all f, g ∈ L2 (Rn ). This implies k T (f ) k = k f k for all f ∈ L2 (Rn )
(b) T (f ) = F(f ) for all f ∈ L1 (Rn ) ∩ L2 (Rn )
(c) T −1 (f ) = F̃(f ) for all f ∈ L1 (Rn ) ∩ L2 (Rn )

Proof. Since $L^1(\mathbb{R}^n) \cap L^2(\mathbb{R}^n)$ is dense in $L^2(\mathbb{R}^n)$ we can find, for every $f \in L^2(\mathbb{R}^n)$, a sequence $(f_k)$ in $L^1(\mathbb{R}^n) \cap L^2(\mathbb{R}^n)$ which converges to $f$ in the norm $\| \cdot \|_2$. We set

$T(f) := \lim_{k \to \infty} \hat f_k \,.$   (3.88)

From Lemma 3.2 (b) the scalar product is preserved for F on L1 (Rn ) ∩ L2 (Rn ) and by taking the limit
through the scalar product, the same property follows for the operator T on L2 (Rn ). If T (f ) = 0 we have
0 = k T (f ) k2 = k f k2 and, hence, f = 0. This means that T is injective. From Theorem 3.14 we have
$\mathcal{F} \circ \tilde{\mathcal{F}}(f) = f$ so that $C^{\infty}_c(\mathbb{R}^n) \subset \mathrm{Im}(T)$. For a $g \in L^2(\mathbb{R}^n)$ we pick a sequence $(g_k = \mathcal{F}(f_k))$ in $C^{\infty}_c(\mathbb{R}^n)$
which converges to g. For f = limk→∞ fk we then have T (f ) = limk→∞ fˆk = limk→∞ gk = g so that T is
surjective.

It follows that the extension of the Fourier transform to L2 (Rn ) is a unitary linear operator, that is,
an operator which preserves the value of the scalar product on L2 (Rn ). Let us illustrate this with an
example.

Application 3.15. Unitarity of the Fourier transformation


Consider the Example 3 above for the characteristic function χ of the interval [−1, 1], defined in
Eq. (3.75), with Fourier transform χ̂ given in Eq. (3.76). As discussed in Exercise 3.13, χ̂ is not an
element of L1 (R), however, it is (and must be from the above theorem) contained in L2 (R). Since,
from Theorem 3.16, k χ k = k χ̂ k we find

$2 = \| \chi \|^2 = \| \hat\chi \|^2 = \frac{2}{\pi} \int_{\mathbb{R}} dx\, \frac{\sin^2 x}{x^2} \,.$   (3.89)

While the norm k χ k is easily worked out the same cannot be said for the integral on the RHS, so
unitarity of the Fourier transform can lead to non-trivial statements.
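The integral on the RHS can indeed be checked numerically; the following sketch (not part of the notes) approximates it with a simple Riemann sum on a large finite window, whose size controls the $\sim 1/R$ tail error.

```python
import numpy as np

h, R = 0.001, 4000.0
k = np.arange(h / 2, R, h)                       # midpoint grid on (0, R), avoids k = 0
integral = 2 * np.sum(np.sin(k)**2 / k**2) * h   # integral over all of R by symmetry
print((2 / np.pi) * integral)                    # close to 2 = ||chi||^2 = ||chi_hat||^2
```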

Exercise 3.17. Show that the Gaussian fa with width a in Eq. (3.71) has the same L2 (Rn ) norm as its
Fourier transform fˆa .

4 Orthogonal polynomials
In the previous section, we have discussed the Fourier series and have found a basis for the Hilbert space
L2 [−a, a] in terms of sine and cosine functions. Of course we know from the Stone-Weierstrass theorem 1.35
combined with Theorem 2.1 that the polynomials are dense in the Hilbert space L2 ([a, b]) (and we have
used this for some of the proofs related to the Fourier series). So, rather than using relatively complicated,
transcendental functions such as sine and cosine as basis functions there is a much simpler possibility: a
basis for L2 ([a, b]) which consists of polynomials. Of course we would want this to be an ortho-normal basis
relative to the standard scalar product (1.101) on L2 ([a, b]). A rather pedestrian method to find ortho-
normal polynomials is to start with the monomials (1, x, x2 , . . .) and apply the Gram-Schmidt procedure.
Exercise 4.1. For the Hilbert space L^2([−1, 1]), apply the Gram-Schmidt procedure to the monomials 1, x, x^2 and show that this leads to the ortho-normal system of polynomials \frac{1}{\sqrt{2}},\ \sqrt{\frac{3}{2}}\, x,\ \sqrt{\frac{5}{8}}(3x^2 − 1).
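For readers who want to experiment, the Gram-Schmidt computation of Exercise 4.1 can be automated symbolically. The snippet below is a small sketch we add here for illustration (the helper names are our own, not from the notes).

```python
# Gram-Schmidt orthonormalisation of the monomials 1, x, x^2 on L^2([-1, 1]).
# Illustrative sketch.
import sympy as sp

x = sp.symbols('x')

def inner(f, g, a=-1, b=1):
    """Standard L^2 scalar product on [a, b] with weight w = 1."""
    return sp.integrate(f * g, (x, a, b))

def gram_schmidt(funcs):
    """Orthonormalise the given functions with respect to inner()."""
    ortho = []
    for f in funcs:
        for e in ortho:
            f = sp.expand(f - inner(e, f) * e)
        ortho.append(sp.simplify(f / sp.sqrt(inner(f, f))))
    return ortho

print(gram_schmidt([sp.Integer(1), x, x**2]))
# expected (up to equivalent forms): 1/sqrt(2), sqrt(3/2) x, sqrt(5/8) (3 x^2 - 1)
```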

The polynomials in Exercise 4.1 obtained in this way are (proportional to) the first three Legendre poly-
nomials which we will discuss in more detail soon. Evidently, the Gram-Schmidt procedure, while con-
ceptually very clear, is not a particularly efficient method in this context. We would like to obtain concise
formulae for orthogonal polynomials at all degrees. There is also an important generalisation. In addition
to finite intervals, we would also like to allow semi-infinite or infinite intervals, so we would like to allow
a = −∞ or b = ∞. Of course for such a semi-infinite or infinite interval, polynomials do not have a finite
norm relative to the standard scalar product (1.101) on L2 ([a, b]). To rectify this, we have to consider the
Hilbert spaces L2w ([a, b]) with an integrable weight function w and a scalar product defined by
⟨f, g⟩ = \int_a^b dx\, w(x) f(x) g(x) ,    (4.1)

and choose w appropriately. Thinking about the types of intervals, that is, finite intervals [a, b], semi-
infinite intervals [a, ∞] or an infinite interval [−∞, ∞] and corresponding suitable weight functions w
will lead to a classification of different types of orthogonal polynomials which we discuss in the following
subsection. For the remainder of the section we will be looking at various entries in this classification,
focusing on the cases which are particularly relevant to applications in physics.

4.1 General theory of ortho-normal polynomials


4.1.1 Basic set-up
We will be working in the Hilbert space L2w ([a, b]) with weight function w and scalar product (4.1). The
interval [a, b] can be finite but we also allow the cases |a| < ∞, b = ∞ of a semi-infinite interval and
−a = b = ∞ of the entire real line. On such a space, we would like to find a system of polynomials
(P_n)_{n=0}^{∞}, where we demand that P_n is of degree n, which is orthogonal, that is, which satisfies

hPn , Pm i = hn δnm , (4.2)

for positive numbers hn . (By convention, the standard orthogonal polynomials are not normalised to one,
hence the constants hn .) We also introduce the notation

Pn (x) = kn xn + kn0 xn−1 + · · · , (4.3)

that is, we call k_n ≠ 0 the coefficient of the leading monomial x^n and k'_n the coefficient of the sub-leading
monomial xn−1 in Pn . An immediate consequence of orthogonality is the relation

hPn , pi = 0 (4.4)

for any polynomial p of degree less than n. (This follows because such a p can be written as a linear
combination of P_0, . . . , P_{n−1}, all of which are orthogonal to P_n.) Furthermore, P_n is (up to an overall
constant) uniquely characterised by this property.

4.1.2 Recursion relation


We already know that we can get such an ortho-normal system of polynomials by applying the Gram-
Schmidt procedure to the monomials (1, x, x2 , . . .) and, while this may not be very practical, it tells us that
the polynomials Pn are unique, up to overall constants (which are fixed, up to signs, once the constants
hn are fixed). This statement is made more explicit in the following
Theorem 4.2. (Recursion relations for orthogonal polynomials) The orthogonal polynomials Pn satisfy
the following recursion relation

Pn+1 (x) = (An x + Bn )Pn (x) − Cn Pn−1 (x) , (4.5)

for n = 1, 2, . . ., where
A_n = \frac{k_{n+1}}{k_n} ,\qquad B_n = A_n\left(\frac{k'_{n+1}}{k_{n+1}} − \frac{k'_n}{k_n}\right) ,\qquad C_n = \frac{A_n h_n}{A_{n−1} h_{n−1}} .    (4.6)

Proof. We start by considering the polynomial Pn+1 − An xPn which (due to the above definition of An )
is of degree n, rather than n + 1, and can, hence, be written as
P_{n+1} − A_n x P_n = \sum_{i=0}^{n} \alpha_i P_i ,    (4.7)

for some αi ∈ R. Taking the inner product of this relation with Pk immediately leads to αk = 0 for
k = 0, . . . , n − 2. This means we are left with

Pn+1 − An xPn = bn Pn − cn Pn−1 , (4.8)

and it remains to be shown that bn = Bn and cn = Cn . The first of these statements follows very easily
by inserting the expressions (4.3) into Eq. (4.8) and comparing coefficients of the xn term. To fix cn we
write Eq. (4.8) as
cn Pn−1 = −Pn+1 + An xPn + Bn Pn (4.9)
and take the inner product of this equation with Pn−1 . This leads to

c_n h_{n−1} = c_n ‖P_{n−1}‖^2 = −⟨P_{n+1}, P_{n−1}⟩ + A_n ⟨x P_n, P_{n−1}⟩ + B_n ⟨P_n, P_{n−1}⟩ = A_n ⟨P_n, x P_{n−1}⟩
= \frac{A_n k_{n−1}}{k_n} ⟨P_n, P_n⟩ = \frac{A_n h_n}{A_{n−1}}
and the desired result cn = Cn follows.

4.1.3 General Rodriguez formula


There is yet another way to obtain the orthogonal polynomials Pn , via a derivative formula. To see how
this works, consider the functions Fn defined by

F_n(x) = \frac{1}{w(x)} \frac{d^n}{dx^n}\big(w(x) X^n\big) ,\qquad X = \begin{cases} (b − x)(a − x) & \text{for } |a|, |b| < ∞ \\ x − a & \text{for } |a| < ∞,\ b = ∞ \\ 1 & \text{for } −a = b = ∞ \end{cases}    (4.10)

Of course we do not know whether these functions are polynomials. Whether they are depends on the
choice of weight function w and we will come back to this point shortly. But for now, let us assume that
w is such that the Fn are polynomials of degree n. For any polynomial p of degree n − 1 we then have
⟨F_n, p⟩ = \int_a^b dx\, \frac{d^n}{dx^n}\big(w(x) X^n\big)\, p(x) = (−1)^n \int_a^b dx\, w(x) X^n \frac{d^n p}{dx^n}(x) = 0 ,    (4.11)
where we have integrated by parts n times. We recall that the orthogonality property (4.4) determines
Pn uniquely (up to an overall constant) and since Fn has the same property we conclude there must be
constants Kn such that
Fn = Kn Pn . (4.12)
This calculation shows the idea behind the definition (4.10) of the functions Fn . The presence of the
derivatives (and of X which ensures vanishing of the boundary terms) means that, provided the Fn are
polynomials of degree n, they are orthogonal.
Theorem 4.3. (Rodriguez formula) If the functions Fn defined in Eq. (4.10) are polynomials of degree n
they are proportional to the orthogonal polynomials Pn , so we have constants Kn such that Fn = Kn Pn .
It follows that

P_n(x) = \frac{1}{K_n w(x)} \frac{d^n}{dx^n}\big(w(x) X^n\big) ,\qquad X = \begin{cases} (b − x)(a − x) & \text{for } |a|, |b| < ∞ \\ x − a & \text{for } |a| < ∞,\ b = ∞ \\ 1 & \text{for } −a = b = ∞ \end{cases}    (4.13)

and this is called the (generalised) Rodriguez formula.
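To see the Rodriguez formula in action, the following sympy sketch (our own illustration) evaluates F_n for the Legendre-type case w(x) = 1, X = x^2 − 1 on [−1, 1] and checks the orthogonality property (4.11) for a sample polynomial.

```python
# Generalised Rodriguez formula (4.10) for w = 1, X = x^2 - 1 on [-1, 1]; illustrative sketch.
import sympy as sp

x = sp.symbols('x')
w = sp.Integer(1)
X = x**2 - 1

def F(n):
    """F_n from Eq. (4.10); for this weight it is a degree-n polynomial ~ K_n P_n."""
    return sp.expand(sp.diff(w * X**n, x, n) / w)

# F_n is proportional to the Legendre polynomial P_n (with K_n = 2^n n!, see Eq. (4.26) below)
for n in range(4):
    print(n, sp.simplify(F(n) / (2**n * sp.factorial(n))))

# orthogonality (4.11): <F_2, p> = 0 for any polynomial p of degree < 2
p = 3*x + 5
print(sp.integrate(F(2) * p, (x, -1, 1)))   # 0
```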

4.1.4 Classification of orthogonal polynomials


What remains to be done is to find the weight functions w for which the Fn are indeed polynomials of
order n. We start with the case of a finite interval [a, b] with X = (b − x)(a − x) and demand that F1 be
a (linear) polynomial, so

1 d 1 ! w0 (x) α β
F1 (x) = (w(x)X) = w0 (x)X + X 0 = Ax + B ⇒ = + , (4.14)
w(x) dx w(x) w(x) x−b x−a
for suitable constants A, B, α, β. Solving the differential equation leads to

w(x) = C(b − x)α (x − a)β (4.15)

and we can set C = 1 by a re-scaling of coordinates. Further, since w needs to be integrable we have to
demand that α > −1 and β > −1. Conversely, it can be shown by calculation that for any such choice of
w the functions Fn are indeed polynomials of degree n.
For the case |a| < ∞ and b = ∞ of the half-infinite interval we can proceed analogously and find that
the Fn are polynomials of degree n iff

w(x) = e−x (x − a)α , (4.16)

where α > −1.


Finally, for the entire real line, −a = b = ∞ a similar calculation leads to
w(x) = e^{−x^2} .    (4.17)

The results from this discussion can be summarised in the following

  [a, b]      α, β               X         w(x)                             name         symbol
  [−1, 1]     α > −1, β > −1     x^2 − 1   (1 − x)^α (x + 1)^β              Jacobi       P_n^{(α,β)}
  [−1, 1]     α = β > −1         x^2 − 1   (1 − x)^α (x + 1)^α              Gegenbauer   P_n^{(α,α)}
  [−1, 1]     α = β = ±1/2       x^2 − 1   (1 − x)^{±1/2} (x + 1)^{±1/2}    Chebyshev    T_n^{(±)}
  [−1, 1]     α = β = 0          x^2 − 1   1                                Legendre     P_n
  [0, ∞]      α > −1             x         e^{−x} x^α                       Laguerre     L_n^{(α)}
  [0, ∞]      α = 0              x         e^{−x}                           Laguerre     L_n
  [−∞, ∞]     —                  1         e^{−x^2}                         Hermite      H_n

Table 1: The types of orthogonal polynomials and several sub-classes which result from the classification
in Theorem 4.4. The explicit polynomials are obtained by inserting the quantities in the Table into the
Rodriguez formula (4.18).

Theorem 4.4. The functions



F_n(x) = \frac{1}{w(x)} \frac{d^n}{dx^n}\big(w(x) X^n\big) ,\qquad X = \begin{cases} (b − x)(a − x) & \text{for } |a|, |b| < ∞ \\ x − a & \text{for } |a| < ∞,\ b = ∞ \\ 1 & \text{for } −a = b = ∞ \end{cases}    (4.18)

are polynomials of degree n iff the weight function w is given by


w(x) = \begin{cases} (b − x)^α (x − a)^β & \text{for } |a|, |b| < ∞ \\ e^{−x} (x − a)^α & \text{for } |a| < ∞,\ b = ∞ \\ e^{−x^2} & \text{for } −a = b = ∞ \end{cases}    (4.19)

where α > −1 and β > −1. In this case the Fn are orthogonal and Fn = Kn Pn for constants Kn .
This theorem implies a classification of orthogonal polynomials in terms of the type of interval, the limits
[a, b] of the interval and the powers α and β which enter the weight function. (Of course a finite interval
[a, b] can always be re-scaled to the standard interval [−1, 1] and a semi-infinite interval [a, ∞] to [0, ∞].)
The different types and important sub-classes of orthogonal polynomials which arise from this classification
are listed in Table 1. We cannot discuss all of these types in detail but in the following we will focus
on the Legendre, the α = 0 Laguerre and the Hermite polynomials which are the most relevant ones
for applications in physics. Before we get to this we should derive more common properties of all these
orthogonal polynomials.

4.1.5 Differential equation for orthogonal polynomials


All orthogonal polynomials satisfy a (second order) differential equation, perhaps not surprising given
their representation in terms of derivatives, as in the Rodriguez formula.
Theorem 4.5. All orthogonal polynomials Pn covered by the classification theorem 4.4 satisfy the second
order linear differential equation
X y'' + K_1 P_1\, y' − n\left(k_1 K_1 + \frac{n − 1}{2} X''\right) y = 0 ,    (4.20)
where P1 is the linear orthogonal polynomial, k1 is the coefficient in front of its linear term, X the function
in Theorem 4.4 and the coefficient K1 is defined in Theorem 4.3.

Proof. For ease of notation we abbreviate D = \frac{d}{dx} and evaluate D^{n+1}\big(X D(w X^n)\big) in two different ways,
remembering that X is a polynomial of degree at most two.

D^{n+1}\big(X D(w X^n)\big) = X D^{n+2}(w X^n) + (n+1) X' D^{n+1}(w X^n) + \frac{1}{2} n(n+1) X'' D^{n}(w X^n)
    = K_n \left[ X D^2(w P_n) + (n+1) X' D(w P_n) + \frac{1}{2} n(n+1) X'' w P_n \right]

D^{n+1}\big(X D(w X^n)\big) = D^{n+1}\big( X D(wX) X^{n−1} + (n−1) w X^n X' \big)
    = K_n \left[ \big(K_1 P_1 + (n−1) X'\big) D(w P_n) + (n+1)\big(k_1 K_1 + (n−1) X''\big) w P_n \right] .

Equating these two results and replacing y = P_n gives, after a straightforward calculation,

w X y'' + \big(2 X w' + 2 w X' − w K_1 P_1\big) y' + \left[ X w'' + (2X' − K_1 P_1) w' − (n+1)\left(k_1 K_1 + \frac{1}{2}(n−2) X''\right) w \right] y = 0 .

By working out D(wX) and D^2(wX) one easily concludes that

\frac{w'}{w}\, X = K_1 P_1 − X' ,\qquad \frac{w''}{w}\, X = \frac{w'}{w}\big(K_1 P_1 − 2X'\big) + k_1 K_1 − X'' .    (4.21)

Using these results to replace w' in the coefficient of y' and w'' in the coefficient of y in the above differential
equation we arrive at the desired result.

4.1.6 Expanding in orthogonal polynomials


Conventionally, the orthogonal polynomials are not normalised to one, so ‖P_n‖^2 = h_n for positive constants h_n which can be computed explicitly for every specific example. Of course we can define ortho-normal systems of polynomials by simply normalising, so

\hat{P}_n := \frac{1}{‖P_n‖}\, P_n = \frac{1}{\sqrt{h_n}}\, P_n .    (4.22)
These normalised versions of the orthogonal polynomials obviously form an ortho-normal system on
L2w ([a, b]), that is,
hP̂n , P̂m i = δnm . (4.23)
In fact, they do form an ortho-normal basis as stated in the following theorem.
Theorem 4.6. The normalised version P̂n in Eq. (4.22) of the orthogonal polynomials classified in The-
orem 4.4 form an ortho-normal basis of L2w ([a, b]).
Proof. For a finite interval [a, b] this follows by combining Theorems 2.1 and 1.35, just as we did in our
proofs for the Fourier series. For semi-infinite and infinite intervals the proofs can, for example, be found
in Ref. [7]. In the next chapter, when we discuss orthogonal polynomials as eigenvectors of hermitian
operators we will obtain an independent proof for the basis properties in those cases.

The above theorem means that every function f ∈ L2w ([a, b]) can be expanded as

f = \sum_{n=0}^{∞} ⟨\hat{P}_n, f⟩\, \hat{P}_n ,    (4.24)

or, more explicitly



f(x) = \sum_{n=0}^{∞} a_n \hat{P}_n(x) ,\qquad a_n = \int_a^b dx\, w(x) \hat{P}_n(x) f(x) .    (4.25)

This is of course completely in line with the general idea of expanding vectors in terms of an ortho-normal
basis on a Hilbert space and it can be viewed as the polynomial analogue of the Fourier series.
We should now discuss the most important orthogonal polynomials in more detail.

4.2 The Legendre polynomials


We recall from Table 1 that the Legendre polynomials are defined on the finite interval [a, b] = [−1, 1],
the function X is given by X(x) = x2 − 1 and the weight function is simply w(x) = 1, so that the relevant
Hilbert space is L2 ([−1, 1]). They are conventionally denoted by Pn (not to be confused with the general
notation Pn for all orthogonal polynomials we have used in the previous subsection). Their Rodriguez
formula reads
P_n(x) = \frac{1}{2^n n!} \frac{d^n}{dx^n}(x^2 − 1)^n \quad⇒\quad K_n = 2^n n! ,    (4.26)
where the pre-factor is conventional and we have read off the value of the constants Kn in Theorem 4.3.
They are symmetric for n even and anti-symmetric for n odd, so P_n(x) = (−1)^n P_n(−x). Since (x^2 − 1)^n = x^{2n} − n x^{2n−2} + · · · we can easily read off the coefficients k_n and k'_n of the monomials x^n and x^{n−1} in P_n
as
k_n = \frac{(2n)!}{2^n (n!)^2} ,\qquad k'_n = 0 .    (4.27)
For the normalisation of Pn we find from the Rodriguez formula
h_n = ‖P_n‖^2 = \int_{−1}^{1} dx\, P_n(x)^2 = \frac{1}{2^n n!} \int_{−1}^{1} dx\, P_n(x) \frac{d^n}{dx^n}(x^2 − 1)^n    (4.28)
    = \frac{1}{2^n n!} \int_{−1}^{1} dx\, \frac{d^n P_n}{dx^n}(x)\, (1 − x^2)^n = \frac{k_n}{2^n} \int_{−1}^{1} dx\, (1 − x^2)^n = \frac{2}{2n + 1} ,    (4.29)

where we have integrated by parts n times. This means the associated basis of ortho-normal polynomials
on L^2([−1, 1]) is

\hat{P}_n = \sqrt{\frac{2n + 1}{2}}\, P_n ,    (4.30)
and functions f ∈ L2 ([−1, 1]) can be expanded as

f = \sum_{n=0}^{∞} ⟨\hat{P}_n, f⟩\, \hat{P}_n ,    (4.31)

or, more explicitly, shifting the normalisation factors into the integral, as
f(x) = \sum_{n=0}^{∞} a_n P_n(x) ,\qquad a_n = \frac{2n + 1}{2} \int_{−1}^{1} dx\, P_n(x) f(x) .    (4.32)

Such expansions are useful and frequently appear when spherical coordinates are used and we have the
standard inclination angle θ ∈ [0, π]. In this case, the Legendre polynomials are usually a function of
x = cos θ which takes values in the required range [−1, 1]. We will see more explicit examples of this
shortly.
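As a small numerical illustration of the expansion (4.32) (our own sketch, not part of the notes), one can compute the coefficients a_n for a concrete function, say f(x) = e^x, and check how quickly the truncated series converges.

```python
# Legendre expansion (4.32) of f(x) = exp(x) on [-1, 1]; illustrative sketch.
import numpy as np
from scipy.integrate import quad
from numpy.polynomial import legendre as L

f = np.exp
N = 8   # truncation order

# a_n = (2n+1)/2 * int_{-1}^{1} P_n(x) f(x) dx
coeffs = []
for n in range(N + 1):
    Pn = L.Legendre.basis(n)                     # the Legendre polynomial P_n
    an, _ = quad(lambda x: Pn(x) * f(x), -1, 1)
    coeffs.append((2 * n + 1) / 2 * an)

series = L.Legendre(coeffs)                      # the truncated sum  sum_n a_n P_n
xs = np.linspace(-1, 1, 5)
print(np.max(np.abs(series(xs) - f(xs))))        # small truncation error
```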
With the above results it is easy to compute the constants An , Bn and Cn which appear in the general
recursion formula (4.5) and we find
A_n = \frac{2n + 1}{n + 1} ,\qquad B_n = 0 ,\qquad C_n = \frac{n}{n + 1} .    (4.33)

Using these values to specialise Eq. (4.5) we find the recursion formula

(n + 1)Pn+1 (x) = (2n + 1)xPn (x) − nPn−1 (x) (4.34)

for the Legendre polynomials. From the Rodriguez formula (4.26) we can easily compute the first few
Legendre polynomials:
P_0(x) = 1 ,\quad P_1(x) = x ,\quad P_2(x) = \frac{1}{2}(3x^2 − 1) ,\quad P_3(x) = \frac{1}{2}(5x^3 − 3x) ,\quad P_4(x) = \frac{1}{8}(35x^4 − 30x^2 + 3) .    (4.35)
Exercise 4.7. Verify that the first four Legendre polynomials in Eq. (4.35) are orthogonal and are nor-
malised as in Eq. (4.29).

We can insert the results X = x^2 − 1, X'' = 2, P_1(x) = x, K_1 = 2 and k_1 = 1 into the general differential
equation (4.20) to obtain

(1 − x^2)\, y'' − 2x\, y' + n(n + 1)\, y = 0 .    (4.36)

This is the Legendre differential equation which all Legendre polynomials P_n satisfy.

Exercise 4.8. Show that the first four Legendre polynomials in Eq. (4.35) satisfy the Legendre differential
equation (4.36).

Another feature of orthogonal polynomials is the existence of a generating function G = G(x, z) defined
as
G(x, z) = \sum_{n=0}^{∞} P_n(x)\, z^n .    (4.37)

The generating function encodes all orthogonal polynomials at once and the nth one can be read off as the
coefficient of the z n term in the expansion of G. Of course for this to be of practical use we have to find
another more concise way of writing the generating function. This can be obtained from the recursion
relation (4.34) which leads to
\frac{\partial G}{\partial z} = \sum_{n=1}^{∞} P_n(x)\, n z^{n−1} = \sum_{n=0}^{∞} (n+1) P_{n+1}(x)\, z^n = \sum_{n=0}^{∞} \big[(2n+1) x P_n(x) − n P_{n−1}(x)\big] z^n
    = 2xz \sum_{n=1}^{∞} P_n(x)\, n z^{n−1} + x \sum_{n=0}^{∞} P_n(x)\, z^n − \sum_{n=0}^{∞} P_n(x)(n+1)\, z^{n+1} = (2xz − z^2) \frac{\partial G}{\partial z} + (x − z)\, G .

This provides us with a differential equation for G whose solution is G(x, z) = c(1 − 2xz + z 2 )−1/2 , where
c is a constant. Since c = G(x, 0) = P0 (x) = 1, we have

G(x, z) = \frac{1}{\sqrt{1 − 2xz + z^2}} = \sum_{n=0}^{∞} P_n(x)\, z^n .    (4.38)

Exercise 4.9. Check that the generating function (4.38) leads to the correct Legendre polynomials Pn , for
n = 0, 1, 2.

Note that Eq. (4.38) can be viewed as an expansion of the generating function G in the sense of Eq. (4.32),
with expansion coefficients an = z n .
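Exercise 4.9 can also be checked symbolically. The short sympy sketch below (an illustration we add here) expands the closed form (4.38) in z and compares the coefficients with the Legendre polynomials.

```python
# Check that expanding (1 - 2 x z + z^2)^(-1/2) in z reproduces P_0, ..., P_3; illustrative sketch.
import sympy as sp

x, z = sp.symbols('x z')
G = 1 / sp.sqrt(1 - 2*x*z + z**2)

series = sp.series(G, z, 0, 4).removeO()
for n in range(4):
    coeff = sp.simplify(series.coeff(z, n))
    # compare with sympy's built-in Legendre polynomial
    print(n, sp.expand(coeff), sp.expand(coeff - sp.legendre(n, x)) == 0)
```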

Application 4.16. Expanding the Coulomb potential
An important application of the above generating function is to the expansion of a Coulomb potential
term of the form
V(\mathbf{r}, \mathbf{r}') = \frac{1}{|\mathbf{r} − \mathbf{r}'|} ,    (4.39)

where \mathbf{r}, \mathbf{r}' ∈ R^3. Introducing the radii r = |\mathbf{r}|, r' = |\mathbf{r}'|, the angle θ via \cos θ = \frac{\mathbf{r}·\mathbf{r}'}{r r'} and setting x = \cos θ and z = \frac{r'}{r} we can use the generating function to re-write the above Coulomb term as

V(\mathbf{r}, \mathbf{r}') = \frac{1}{r\sqrt{1 − 2\frac{r'}{r}\cos θ + \left(\frac{r'}{r}\right)^2}} = \frac{1}{r}\, G(x, z) = \frac{1}{r} \sum_{n=0}^{∞} \left(\frac{r'}{r}\right)^n P_n(\cos θ) .    (4.40)
n=0

The series on the RHS converges for r0 < r. If r0 > r we can write a similar expansion with the role of
r and r0 exchanged. This formula is very useful in the context of multipole expansions, for example in
electromagnetism. The nth term in this expansions falls off as r−(n+1) , so the n = 0 term corresponds
to the monopole contribution, the n = 1 term to the dipole, the n = 2 term to the quadrupole etc.
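The convergence of the multipole series (4.40) is easy to verify numerically. The following sketch (our own, with arbitrarily chosen vectors) compares the truncated sum with 1/|r − r'|; scipy's eval_legendre supplies the P_n.

```python
# Numerical check of the multipole expansion (4.40) for r' < r; illustrative sketch.
import numpy as np
from scipy.special import eval_legendre

r_vec  = np.array([0.0, 0.0, 2.0])     # |r|  = 2
rp_vec = np.array([0.3, 0.4, 0.5])     # |r'| ~ 0.71 < |r|

r, rp = np.linalg.norm(r_vec), np.linalg.norm(rp_vec)
cos_theta = np.dot(r_vec, rp_vec) / (r * rp)

exact = 1.0 / np.linalg.norm(r_vec - rp_vec)
for N in (0, 1, 2, 5, 10):
    approx = sum((rp / r)**n * eval_legendre(n, cos_theta) for n in range(N + 1)) / r
    print(N, abs(approx - exact))      # the error decreases with the truncation order N
```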

4.2.1 Associated Legendre polynomials


Closely related to the Legendre polynomials are the associated Legendre functions Plm , defined by

P_l^m(x) = \frac{1}{2^l l!}\, (1 − x^2)^{m/2} \frac{d^{l+m}}{dx^{l+m}} (x^2 − 1)^l ,    (4.41)
2 l! dx
where l = 0, 1, . . . and m = −l, . . . , l. Clearly, the Legendre polynomials are obtained for m = 0, so
Pl = Pl0 , and for positive m the associated Legendre functions can be written in terms of Legendre
polynomials as
P_l^m(x) = (1 − x^2)^{m/2} \frac{d^m}{dx^m} P_l(x) .    (4.42)
The associated Legendre functions are solutions of the differential equation
(1 − x^2)\, y'' − 2x\, y' + \left[ l(l + 1) − \frac{m^2}{1 − x^2} \right] y = 0 ,    (4.43)
generalising the Legendre differential equation (4.36) to which it reduces for m = 0. A calculation based
on Eq. (4.41) and partial integration leads to the orthogonality relations
\int_{−1}^{1} dx\, P_l^m(x)\, P_{l'}^m(x) = \frac{2}{2l + 1} \frac{(l + m)!}{(l − m)!}\, δ_{ll'} .    (4.44)
Exercise 4.10. Show that the associated Legendre functions P_l^m solve the differential equation (4.43)
and satisfy the orthogonality relations (4.44).
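The orthogonality relations (4.44) can also be checked numerically; the sketch below (added here for illustration) uses scipy's lpmv, which evaluates the associated Legendre functions P_l^m.

```python
# Numerical check of the orthogonality relation (4.44); illustrative sketch.
import numpy as np
from scipy.integrate import quad
from scipy.special import lpmv
from math import factorial

m = 2
for l in (2, 3, 4):
    for lp in (2, 3, 4):
        val, _ = quad(lambda x: lpmv(m, l, x) * lpmv(m, lp, x), -1, 1)
        expected = 2 / (2*l + 1) * factorial(l + m) / factorial(l - m) if l == lp else 0.0
        print(l, lp, round(val, 6), round(expected, 6))
```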

4.3 The Laguerre polynomials


From Table 1, the Laguerre polynomials (with α = 0) are defined on the interval [a, b] = [0, ∞], the
function X is given by X(x) = x and the weight function is w(x) = e−x , so that the relevant Hilbert space
is L2w ([0, ∞]). They are denoted by Ln and, inserting into Eq. (4.13), their Rodriguez formula is
L_n(x) = \frac{1}{n!}\, e^{x} \frac{d^n}{dx^n}\big(e^{−x} x^n\big) ,    (4.45)

where the pre-factor is conventional and implies that Kn = n!. It is easy to extract from the Rodriguez
formula the coefficients kn and kn0 of xn and xn−1 in Ln and they are given by
k_n = \frac{(−1)^n}{n!} ,\qquad k'_n = \frac{(−1)^{n−1}\, n}{(n − 1)!} .    (4.46)
The normalisation of the Ln is computed from the Rodriguez formula with the usual partial integration
trick and it follows
h_n = ‖L_n‖^2 = \int_0^{∞} dx\, e^{−x} L_n(x)^2 = \frac{1}{n!} \int_0^{∞} dx\, L_n(x) \frac{d^n}{dx^n}\big(e^{−x} x^n\big) = \frac{(−1)^n}{n!} \int_0^{∞} dx\, \frac{d^n L_n}{dx^n}(x)\, e^{−x} x^n
    = (−1)^n k_n \int_0^{∞} dx\, e^{−x} x^n = 1 .    (4.47)
Hence, with our convention, the Laguerre polynomials are already normalised.⁶ Functions f ∈ L^2_w([0, ∞])
can now be expanded as
f = \sum_{n=0}^{∞} ⟨L_n, f⟩\, L_n \qquad\text{or}\qquad f(x) = \sum_{n=0}^{∞} a_n L_n(x) ,\quad a_n = \int_0^{∞} dx\, e^{−x} L_n(x) f(x) .    (4.48)

Expansions in terms of Laguerre polynomials are often useful for functions which depend on a radius r
which has a natural range [0, ∞].
Inserting the above results for kn , kn0 and hn into Eq. (4.6) gives
A_n = −\frac{1}{n + 1} ,\qquad B_n = \frac{2n + 1}{n + 1} ,\qquad C_n = \frac{n}{n + 1} ,    (4.49)
and using these values in Eq. (4.5) leads to the recursion relation
(n + 1)\, L_{n+1}(x) = (2n + 1 − x)\, L_n(x) − n\, L_{n−1}(x)    (4.50)
for the Laguerre polynomials. From the Rodriguez formula the first few Laguerre polynomials are given
by
L_0(x) = 1 ,\quad L_1(x) = −x + 1 ,\quad L_2(x) = \frac{1}{2}(x^2 − 4x + 2) ,\quad L_3(x) = \frac{1}{6}(−x^3 + 9x^2 − 18x + 6) .    (4.51)
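One can quickly confirm the orthonormality ⟨L_n, L_m⟩ = δ_{nm} from Eq. (4.47) numerically; the following sketch (added here, not from the notes) uses scipy's eval_laguerre together with the weight w(x) = e^{−x}.

```python
# Check <L_n, L_m> = delta_nm with weight e^{-x} on [0, infinity); illustrative sketch.
import numpy as np
from scipy.integrate import quad
from scipy.special import eval_laguerre

for n in range(4):
    for m in range(4):
        val, _ = quad(lambda x: np.exp(-x) * eval_laguerre(n, x) * eval_laguerre(m, x),
                      0, np.inf)
        print(n, m, round(val, 6))   # ~1 for n == m, ~0 otherwise
```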
Inserting X = x, X'' = 0, K_1 = 1, k_1 = −1 and P_1 = 1 − x into Eq. (4.20) gives the differential equation

x\, y'' + (1 − x)\, y' + n\, y = 0    (4.52)
for the Laguerre polynomials. For the generating function

G(x, z) = \sum_{n=0}^{∞} L_n(x)\, z^n    (4.53)
we can derive a differential equation in much the same way as we did for the Legendre polynomials, using
the recursion relation (4.50). This leads to
\frac{\partial G}{\partial z} = \frac{1 − x − z}{(1 − z)^2}\, G ,    (4.54)
and the solution is

G(x, z) = \frac{1}{1 − z} \exp\left(−\frac{xz}{1 − z}\right) = \sum_{n=0}^{∞} L_n(x)\, z^n .    (4.55)
Exercise 4.11. Derive the differential equation (4.54) for the generating function of the Laguerre poly-
nomials and show that its solution is given by Eq. (4.55).
⁶ Sometimes the L_n are defined without the n! factor in the Rodriguez formula (4.45). For this convention the L_n are of course not normalised to one, but instead h_n = ‖L_n‖^2 = (n!)^2.

4.4 The Hermite polynomials
From Table 1, the Hermite polynomials are defined on the interval [a, b] = [−∞, ∞] = R, we have X = 1 and the weight function is w(x) = e^{−x^2}, so the relevant Hilbert space is L^2_w(R). They are denoted H_n and, from Eq. (4.13), their Rodriguez formula reads

H_n(x) = (−1)^n e^{x^2} \frac{d^n}{dx^n} e^{−x^2} ,    (4.56)
where the pre-factor is conventional and implies that Kn = (−1)n . Their symmetry properties are Hn (x) =
(−1)n Hn (−x). From this formula it is easy to read off the coefficients kn and kn0 of the leading and sub-
leading monomials xn and xn−1 in Hn as

k_n = 2^n ,\qquad k'_n = 0 .    (4.57)

The normalisation of the Hermite polynomials is computed as before, by using the Rodriguez formula (4.56)
combined with partial integration:
h_n = ‖H_n‖^2 = \int_{R} dx\, e^{−x^2} H_n(x)^2 = (−1)^n \int_{R} dx\, H_n(x) \frac{d^n}{dx^n} e^{−x^2} = \int_{R} dx\, \frac{d^n H_n}{dx^n}(x)\, e^{−x^2}
    = k_n\, n! \int_{R} dx\, e^{−x^2} = \sqrt{π}\, 2^n n! .    (4.58)

Hence, the ortho-normal basis of L2w (R) is given by


\hat{H}_n(x) = \frac{1}{\sqrt{\sqrt{π}\, 2^n n!}}\, H_n(x) \quad⇒\quad ⟨\hat{H}_n, \hat{H}_m⟩ = δ_{nm} ,    (4.59)

and functions f ∈ L2w (R) can be expanded as



f = \sum_{n=0}^{∞} ⟨\hat{H}_n, f⟩\, \hat{H}_n .    (4.60)

More explicitly and rearranging the coefficients this reads



f(x) = \sum_{n=0}^{∞} a_n H_n(x) ,\qquad a_n = \frac{1}{\sqrt{π}\, 2^n n!} \int_{R} dx\, e^{−x^2} H_n(x) f(x) .    (4.61)

The Hermite polynomials are useful for expanding functions defined on the entire real line and they make
a prominent appearance in the wave functions for the quantum harmonic oscillator.
From the above results for kn , kn0 and hn it is easy, by inserting into Eq. (4.6), to work out

An = 2 , Bn = 0 , Cn = 2n , (4.62)

and, from Eq. (4.5), this leads to the recursion relation

Hn+1 (x) = 2xHn (x) − 2nHn−1 (x) (4.63)

for the Hermite polynomials. Rodriguez’s formula (4.56) can be used to work out the first few Hermite
polynomials which are given by

H_0(x) = 1 ,\quad H_1(x) = 2x ,\quad H_2(x) = 4x^2 − 2 ,\quad H_3(x) = 8x^3 − 12x .    (4.64)

With X = 1, X'' = 0, K_1 = −1, H_1(x) = 2x and k_1 = 2, Eq. (4.20) turns into the differential equation for
Hermite polynomials

y'' − 2x\, y' + 2n\, y = 0 .    (4.65)
The generating function

G(x, z) = \sum_{n=0}^{∞} H_n(x)\, \frac{z^n}{n!}    (4.66)

can be derived from the differential equation


\frac{\partial G}{\partial z} = 2(x − z)\, G ,    (4.67)
which, as in the case of the Legendre and Laguerre polynomials, follows by differentiating Eq. (4.66) and
using the recursion relation (4.63). The solution is

G(x, z) = \exp\big(2xz − z^2\big) = \sum_{n=0}^{∞} H_n(x)\, \frac{z^n}{n!} .    (4.68)

Exercise 4.12. Show that the generating function G for the Hermite polynomials satisfies the differential
equation (4.67) and verify that it is solved by Eq. (4.68).

Application 4.17. Expanding in terms of Hermite polynomials


We would like to expand the function f(x) = sin(x) in terms of Hermite polynomials. Computing the
expansion coefficients b_n := ⟨H_n, f⟩ for all n seems like a tall order. We can make our life considerably
easier by using the generating function G_z(x) = G(x, z) from Eq. (4.68) and define the function

Z(z) := ⟨G_z, f⟩ \overset{(4.68)}{=} \sum_{n=0}^{∞} b_n \frac{z^n}{n!} \quad⇒\quad b_n = \left[\frac{d^n Z}{dz^n}\right]_{z=0} .

This function Z can be viewed as a generating function for the coefficients b_n. Using the explicit form
of G from Eq. (4.68) we find

Z(z) = \int_{R} dx\, f(x)\, e^{−(x−z)^2} = \int_{R} dx\, \sin(x)\, e^{−(x−z)^2} = \frac{\sqrt{π}}{e^{1/4}}\, \sin(z) .

Hence, the coefficients b_n are given by

b_n = \left[\frac{d^n Z}{dz^n}\right]_{z=0} = \frac{\sqrt{π}}{e^{1/4}} \left[\frac{d^n \sin(z)}{dz^n}\right]_{z=0} = \begin{cases} 0 & \text{for } n \text{ even} \\ \frac{\sqrt{π}\,(−1)^{(n−1)/2}}{e^{1/4}} & \text{for } n \text{ odd} \end{cases}
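The closed form for the b_n derived above is easy to test. The sketch below (our own addition) computes b_n = ⟨H_n, sin⟩ by direct numerical integration and compares with √π (−1)^{(n−1)/2} e^{−1/4} for odd n.

```python
# Check the coefficients b_n = <H_n, sin> against the closed form from Application 4.17.
# Illustrative sketch.
import numpy as np
from scipy.integrate import quad
from scipy.special import eval_hermite

for n in range(6):
    bn, _ = quad(lambda x: np.exp(-x**2) * eval_hermite(n, x) * np.sin(x),
                 -np.inf, np.inf)
    closed = 0.0 if n % 2 == 0 else np.sqrt(np.pi) * (-1)**((n - 1) // 2) * np.exp(-0.25)
    print(n, round(bn, 6), round(closed, 6))
```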

5 Ordinary linear differential equations
In this chapter, our focus will be on linear, second order differential equations of the form
α_2(x)\, y'' + α_1(x)\, y' + α_0(x)\, y = f(x)
α_2(x)\, y'' + α_1(x)\, y' + α_0(x)\, y = 0    (5.1)
where α0 , α1 and α2 as well as f are given functions. Clearly, the upper equation is inhomogeneous with
source f and the lower equation is its homogeneous counterpart. In operator form this can be written as

T y = f ,\qquad T y = 0 ,\qquad\text{where}\quad T = α_2 D^2 + α_1 D + α_0 ,    (5.2)

and D = \frac{d}{dx}. For the range of x we would like to consider the interval [a, b] ⊂ R (where the semi-infinite
and infinite case is allowed) and, provided αi ∈ C ∞ ([a, b]), we can think of T as a linear operator T :
C ∞ ([a, b]) → C ∞ ([a, b]). We note that the general differential equations (4.20) for orthogonal polynomials
(and, hence, the Legendre, Laguerre and Hermite differential equations in Eqs. (4.36), (4.52) and (4.65))
are homogeneous equations of the form (5.1).
The above equations are usually solved subject to additional conditions on the solution y and there
are two ways of imposing such conditions. The first one, which leads to what is called an initial value
problem, is to ask for solutions to either of Eqs. (5.1) which, in addition, satisfy the “initial conditions”
y(x0 ) = y0 , y 0 (x0 ) = y00 , (5.3)
for x0 ∈ [a, b] and given values y0 , y00 ∈ R. Another possibility, which defines a boundary value problem, is
to ask for solutions to either of Eqs. (5.1) which satisfy the conditions
da y(a) + na y 0 (a) = ca , db y(b) + nb y 0 (b) = cb , (5.4)
where da , db , na , nb , ca , cb ∈ R are given constants. In other words, we impose linear conditions on the
function at both endpoints of the interval [a, b]. If da = db = 0 so these become conditions on the first
derivate only they are called von Neumann boundary conditions. The opposite case na = nb = 0, when
the boundary conditions only involve y but not y 0 are called Dirichlet boundary conditions. The general
case is referred to as mixed boundary conditions. For ca = cb = 0 the boundary conditions are called
homogeneous, otherwise they are called inhomogeneous.
Initial and boundary value problems, although related, are conceptually quite different. In physics,
the former are usually considered when the problem involves time evolution (so x corresponds to time)
and the initial state of the system needs to be specified at a particular time. Boundary value problems
frequently arise in physics when x has the interpretation of a spatial variable, for example the argument
of a wave function in quantum mechanics which needs to satisfy certain conditions at the boundary.
In this section, we will mainly be concerned with boundary value problems (initial value problems
having been the focus of the first year courses on differential equations). We begin with a quick review of
the relevant basic mathematics.

5.1 Basic theory∗


5.1.1 Systems of linear first order differential equations
The most basic question which arises for differential equations is about the existence and uniqueness of
solutions. To discuss this in the present case, it is useful to consider a somewhat more general problem of
a system of first order inhomogeneous and homogeneous differential equations
y' = A(x)\, y + g(x)
y' = A(x)\, y    (5.5)

where the vector y(x) = (y1 (x), . . . , yn (x))T consists of the n functions we are trying to find, g(x) =
(g1 (x), . . . , gn (x))T is a given vector of functions and A is a given n × n matrix of functions. For this
system, we have the following
Theorem 5.1. Let g = (g1 , . . . gn )T be an n-dimensional vector of continuous functions gi : [a, b] → F
and A = (Aij ) an n × n matrix of continuous functions Aij : [a, b] → F (where F = R or F = C). For
a given x0 ∈ [a, b] and any c ∈ F n the inhomogeneous differential equation (5.5) has a unique solution
y : [a, b] → F n with y(x0 ) = c.
Proof. This is a classical statement from the theory of ordinary differential equations. The existence part
is also sometimes referred to as the Picard-Lindelöf Theorem. The proof can be found in many books on
the subject, for example Ref. [10].

In simple terms, the above theorem states that the initial value problem for the differential equation (5.5)
always has a solution and that this solution is unique. Next, we focus on the homogeneous equation.
Theorem 5.2. Let A = (Aij ) be an n × n matrix of continuous functions Aij : [a, b] → F and y : [a, b] →
F n . Then the set of solutions VH of the homogeneous differential equation (5.5) is an n-dimensional vector
space over F . For k solutions y1 , . . . , yk ∈ VH the following statements are equivalent.
(i) y1 , . . . yk are linearly independent in VH .
(ii) There exists an x0 ∈ [a, b] such that y1 (x0 ), . . . yk (x0 ) ∈ F n are linearly independent.
(iii) The vectors y1 (x), . . . yk (x) ∈ F n are linearly independent for all x ∈ [a, b].
Proof. The proof follows from simple considerations and Theorem 5.1.

So the homogeneous solution space is n-dimensional and a given set of solutions y_1, . . . , y_n
forms a basis of V_H iff y_1(x), . . . , y_n(x) ∈ F^n are linearly independent for at least one x. Alternatively, we
can say

y_1, . . . , y_n \text{ basis of } V_H \quad⇔\quad \det(y_1(x), . . . , y_n(x)) ≠ 0 \text{ for at least one } x ∈ [a, b] ,    (5.6)

and this provides a practical way of checking whether a system of solutions forms a basis of the solution
space.
If VI is the set of solutions of the inhomogeneous equation (5.5) it is clear that

V I = y0 + V H , (5.7)

where y0 is any solution to (5.5). A special solution of the inhomogeneous equation can be found by a
process called variation of constants as in the following
Theorem 5.3. (Variation of constants) If Y = (y1 , . . . , yn ) is a basis of VH then
y_0(x) = Y(x) \int_{x_0}^{x} dt\, Y(t)^{−1} g(t)    (5.8)

is a solution to the inhomogeneous equation (5.5).


Proof. Again, this is standard and can be shown by straightforward calculation. The fact that Y constitutes a
basis of solutions of the homogeneous equation can be expressed as Y' = A Y. We have
y_0'(x) = Y'(x) \int_{x_0}^{x} dt\, Y(t)^{−1} g(t) + Y(x) Y(x)^{−1} g(x) = A\, y_0(x) + g(x) .    (5.9)

5.1.2 Second order linear differential equations
How is the above discussion of first order differential equations relevant to our original problem (5.1) of
second order differential equations? The answer is, of course, that higher order differential equations can
be converted into systems of first order differential equations. To see this, start with the system (5.1) and
define an associated two-dimensional first order system of the form (5.5) given by

0
     
ỹ1 0 1
y= , A= , g= . (5.10)
ỹ2 − αα20 − αα21 f
α2

(We assume α2 is non-zero everywhere.) The solutions of this first-order system and the ones of the second
order equation (5.1) are then in one-to-one correspondence via the identification ỹ1 = y and ỹ2 = y 0 . Given
this observation we can now translate the previous statements for first order systems into statements for
second order differential equations.
Theorem 5.4. Let αi , f : [a, b] → F be continuous (and α2 non-zero on [a, b]). Then we have the following
statements:
(a) For given y0 , y00 ∈ R and x0 ∈ [a, b], the inhomogeneous equation (5.1) has a unique solution y : [a, b] →
F with y(x0 ) = y0 and y 0 (x0 ) = y00 .
(b) The solutions y : [a, b] → F to the homogeneous equation form a two-dimensional vector space V_H over
F . Two solutions y1 and y2 to the homogeneous equation form a basis of VH iff the matrix
 
\begin{pmatrix} y_1 & y_2 \\ y'_1 & y'_2 \end{pmatrix}(x)    (5.11)

is non-singular for at least one x ∈ [a, b] or, equivalently, iff the Wronski determinant
 
W := \det\begin{pmatrix} y_1 & y_2 \\ y'_1 & y'_2 \end{pmatrix}(x) = (y_1 y'_2 − y_2 y'_1)(x)    (5.12)

is non-zero for at least one x ∈ [a, b].


(c) The solution space VI of the inhomogeneous equation (5.1) is given by VI = y0 + VH , where y0 is any
solution to the inhomogeneous equation (5.1).
Proof. All these statements follow directly from the analogous statements for first order systems in
Theorems 5.1 and 5.2 by using the correspondence (5.10).

The procedure of variation of constants in Theorem 5.3 can also be transferred to second order differential
equations and leads to
Theorem 5.5. (Variation of constants) Let αi , g : [a, b] → F be continuous (and α2 non-zero on [a, b]) and
y1 , y2 : [a, b] → F a basis of solutions for the homogeneous system (5.1). Then, a solution y : [a, b] → F
of the inhomogeneous system is given by
y(x) = \int_{x_0}^{x} dt\, G(x, t) f(t) ,    (5.13)

where G is called the Green function, given by


G(x, t) = \frac{y_1(t) y_2(x) − y_1(x) y_2(t)}{α_2(t) W(t)} ,    (5.14)

with the Wronski determinant W = y_1 y'_2 − y_2 y'_1.

Proof. This follows directly from the above results for first order systems. More specifically, inserting

Y = \begin{pmatrix} y_1 & y_2 \\ y'_1 & y'_2 \end{pmatrix} ,\qquad Y^{−1} = \frac{1}{W} \begin{pmatrix} y'_2 & −y_2 \\ −y'_1 & y_1 \end{pmatrix}    (5.15)

together with g from Eq. (5.10) into Eq. (5.8) gives the result.

5.1.3 The boundary value problem


We have now collected a number of standard results which imply that the initial value problem defined
by Eqs. (5.1) and (5.3) always has a unique solution. Also, we have gained insight into the structure
of the total solution space VH of the homogeneous Eq. (5.1) and we know that this space is a two-
dimensional vector space. Further, the space of solutions VI to the inhomogeneous Equation (5.1) is given
by VI = ψ + VH , where ψ is any solution to the inhomogeneous equation. We have also seen that we can
use a basis of solutions in V_H to construct a Green function (5.14) which allows us, via Eq. (5.13), to find
a solution to the inhomogeneous equation.
Armed with this information, we should now come back to the boundary value problem defined by
Eqs. (5.1) and (5.4). It is quite useful to split this problem up in the following way. Consider first finding
a solution y0 to the problem

α_2(x)\, y_0'' + α_1(x)\, y_0' + α_0(x)\, y_0 = 0 ,\qquad d_a y_0(a) + n_a y_0'(a) = c_a ,\quad d_b y_0(b) + n_b y_0'(b) = c_b ,    (5.16)

that is, to the homogeneous equation with the inhomogeneous boundary conditions. Next, find a solution
ỹ to

α2 (x)ỹ 00 + α1 (x)ỹ 0 + α0 (x)ỹ = f (x) , da ỹ(a) + na ỹ 0 (a) = 0 , db ỹ(b) + nb ỹ 0 (b) = 0 , (5.17)

that is, to the inhomogeneous differential equation with a homogeneous version of the boundary conditions.
It is easy to see that, thanks to linearity, the sum y = y0 + ỹ provides a solution to the general problem,
that is, to the inhomogeneous Eq. (5.1) with inhomogeneous boundary conditions (5.4). We can deal with
the first problem (5.16) by finding the most general solution to the homogeneous differential equation,
that is, determine the solution space VH , and then build in the boundary condition. We will discuss some
practical methods to do this soon but for now, let us assume this has been accomplished and we want to
solve the problem (5.17).
The idea is to do this by modifying the variation of constants approach from Theorem (5.5) and
construct a Green function which leads to the correct boundary conditions. Let’s address this for the case
of Dirichlet boundary conditions, so we are considering the problem

α2 (x)y 00 + α1 (x)y 0 + α0 (x)y = f (x) , y(a) = 0 , y(b) = 0 . (5.18)

Theorem 5.6. Let y1 , y2 : [a, b] → F be a basis of VH , that is, a basis of solutions to the homogeneous
system (5.1), satisfying y1 (a) = y2 (b) = 0. Then a solution y : [a, b] → F to the Dirichlet boundary value
problem (5.18) is given by
y(x) = \int_a^b dt\, G(x, t) f(t) ,    (5.19)
where the Green function G is given by
G(x, t) = \frac{y_1(t) y_2(x)\, θ(x − t) + y_1(x) y_2(t)\, θ(t − x)}{α_2(t) W(t)} .    (5.20)

Here θ is the Heaviside function defined by θ(x) = 1 for x ≥ 0 and θ(x) = 0 for x < 0.

Proof. The two homogeneous solutions y1 and y2 satisfy T y1 = T y2 = 0 with the operator T from Eq. (5.2)
and the conditions y1 (a) = y2 (b) = 0 can always be imposed since we know there exists a solution for any
choice of initial condition. Now we start with a typical variation of constant Ansatz
y(x) = u1 (x)y1 (x) + u2 (x)y2 (x) , (5.21)
where u1 , u2 are two functions to be determined. If we impose on those two functions the condition
u01 y1 + u02 y2 = 0 (5.22)
an easy calculation shows that
T y = α_2 (u'_1 y'_1 + u'_2 y'_2) \overset{!}{=} f .    (5.23)
Solving Eqs. (5.22) and (5.23) for u1 and u2 leads to
u_1(x) = −\int_{x_1}^{x} dt\, \frac{y_2(t) f(t)}{α_2(t) W(t)} ,\qquad u_2(x) = \int_{x_2}^{x} dt\, \frac{y_1(t) f(t)}{α_2(t) W(t)} ,    (5.24)
where x1 , x2 ∈ [a, b] are two otherwise arbitrary constants. To implement the boundary conditions y(a) =
y(b) = 0 it suffices to demand that u1 (b) = u2 (a) = 0 (given our assumptions about the boundary values
of y1 and y2 ) and this is guaranteed by choosing x1 = b and x2 = a. Inserting these values into Eq. (5.24)
and the expressions for ui back into the Ansatz (5.21) gives the desired result.

Whether Eq. (5.19) is the unique solution to the boundary value problem (5.18) depends on whether
there is a non-trivial solution to the homogeneous equations in VH which satisfies the relevant boundary
conditions y(a) = y(b) = 0. If there is it can be added to (5.19) and the solution is not unique, otherwise
it is. More generally, going back to the way we have split up the problem into two steps in Eqs. (5.16) and
(5.17), we have now found a method to find a solution to the second problem (5.17) (the inhomogeneous
equation with the homogeneous boundary conditions) for the Dirichlet case. Any solution to the first
problem (5.16) (the homogeneous equation with inhomogeneous boundary conditions) can be added to
this.

5.1.4 Solving the homogeneous equation


It remains to discuss methods for how to solve the homogeneous equation (5.1). If one solution to this
equation is known a second, linearly independent solution is obtained from the following
Theorem 5.7. (Reduction of order) Let y : [a, b] → F be a solution of T y = 0 with T given in Eq. (5.2)
and I ⊂ [a, b] be an interval for which y and α2 are everywhere non-vanishing. Then ỹ : I → F defined by
\tilde{y}(x) = y(x)\, u(x) ,\qquad u'(x) = \frac{1}{y(x)^2} \exp\left(−\int_{x_0}^{x} dt\, \frac{α_1(t)}{α_2(t)}\right)    (5.25)
satisfies T ỹ = 0 and is linearly independent from y.
Proof. An easy calculation shows that the function u, defined above, satisfies the differential equation
u'' + \left(2\frac{y'}{y} + \frac{α_1}{α_2}\right) u' = 0 .    (5.26)
With \tilde{y}' = y u' + y' u and \tilde{y}'' = y u'' + 2 y' u' + y'' u = −\frac{α_1}{α_2} y u' + y'' u it is easy to show that T\tilde{y} = u\, T y, which
vanishes since T y = 0.
The Wronski determinant of the two solutions y and \tilde{y} is W = y \tilde{y}' − y' \tilde{y} = y(y u' + y' u) − y' y u = y^2 u'
and this is non-zero since the last expression is precisely the exponential in Eq. (5.25). Hence, the two
solutions are independent.

Obtaining a second independent solution from a known one can be useful but how can we find a solution
in the first place? A very common and efficient method is to start with a power series Ansatz

y(x) = \sum_{k=0}^{∞} a_k x^k .    (5.27)

Of course, this is only practical if the functions αi which appear in T are polynomial. In this case, the idea
is to insert the Ansatz (5.27) into T y = 0, assemble the coefficient in front of xk and set this coefficient
to zero for every k. In this way, one obtains a recursion relation for the ak and inserting the resulting
ak back into Eq (5.27) gives the solution in terms of a power series. Of course this is where the difficult
work starts. Now one has to understand the properties of the so-obtained series, such as convergence,
singularities or asymptotic behaviour. All this is best demonstrated for examples and we will do this
shortly.

5.2 Examples
We would now like to apply some of the methods and results from the previous subsection to examples.

Application 5.18. Solving the Legendre differential equation


Recall from Eq. (4.36) the Legendre differential equation

(1 − x2 )y 00 − 2xy 0 + n(n + 1)y = 0 . (5.28)

Of course we know that the Legendre polynomials are solutions but we would like to derive this
independently (as well as finding the second solution which must exist) by using the power series
method. Inserting the series (5.27) into the Legendre differential equation gives (after re-defining
some of the summation indices)

\sum_{k=0}^{∞} \big[(k + 2)(k + 1)\, a_{k+2} − (k(k + 1) − n(n + 1))\, a_k\big] x^k = 0 .    (5.29)

Demanding that the coefficient in front of every monomial xk vanishes (then and only then is a power
series identical to zero) we obtain the recursion formula

a_{k+2} = \frac{k(k + 1) − n(n + 1)}{(k + 1)(k + 2)}\, a_k ,\qquad k = 0, 1, . . . ,    (5.30)

for the coefficients ak . There are a number of interesting features of this formula. First, the coefficients
a0 and a1 are not fixed but once values have been chosen for them the above recursion formula
determines all other ak . This freedom of choosing two coefficients precisely corresponds to the two
independent solutions we expect. The second interesting feature is that, due to the structure of the
numerator in Eq. (5.30), ak = 0 for k = n + 2, n + 4, . . ..
To see what happens in more detail let’s first assume that n is even. Choose (a0 , a1 ) = (1, 0). In
this case all ak with k odd vanish and the ak with k even are non-zero only for k ≤ n. This means,
the series breaks off and turns into a polynomial - this is of course (proportional to) the Legendre
polynomial Pn for n even. Still for n even, make the complementary choice (a0 , a1 ) = (0, 1). In this
case all the ak with k even are zero. However, for k odd and n even the numerator in Eq. (5.30) never
vanishes so this leads to an infinite series which only contains odd powers of x. This is the second
solution, in addition to the Legendre polynomials. For n odd the situation is of course similar but

reversed. For (a0 , a1 ) = (0, 1) we get polynomials - the Legendre polynomial Pn for n odd - while for
(a0 , a1 ) = (1, 0) we get the second solution, an infinite series with only even powers of x.

Exercise 5.8. Show explicitly that, for suitable choices of a0 and a1 , the recursion formula (5.30)
reproduces the first few Legendre polynomials in Eq. (4.35).

Of course the differential equation (5.28) also makes sense if n is a real number, rather than an integer,
and the above calculation leading to the coefficients ak remains valid in this case. However, if n ∈ / N,
the numerator in Eq. (5.30) never vanishes and both solutions to (5.28) are non-polynomial.
To find the second solution we can also use the reduction of order method from Theorem 5.7. To
demonstrate how this works we focus on the case n = 1 with differential equation

(1 − x2 )y 00 − 2xy 0 + 2y = 0 . (5.31)

which is solved by the Legendre polynomial y(x) = P1 (x) = x. Inserting this, together with α1 (x) =
−2x and α2 (x) = 1 − x2 into Eq. (5.25) gives
u'(x) = \frac{1}{x^2} \exp\left(\int^{x} dt\, \frac{2t}{1 − t^2}\right) = \frac{1}{x^2 (1 − x^2)}    (5.32)

A further integration leads to


u(x) = −\frac{1}{x} + \frac{1}{2} \ln\frac{1 + x}{1 − x}    (5.33)
so the second solution to the Legendre equation (5.31) for n = 1 is
\tilde{y}(x) = x\, u(x) = \frac{x}{2} \ln\frac{1 + x}{1 − x} − 1 .    (5.34)

Exercise 5.9. Find the Taylor series of the solution (5.34) around x = 0 and show that the coefficients
in this series are consistent with the recursion formula (5.30).
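The recursion (5.30) is also easy to implement directly. The following sketch (an illustration added here, with names of our own choosing) generates the series coefficients for given n and starting values (a_0, a_1) and shows the truncation to a polynomial for integer n.

```python
# Power series solution of the Legendre equation via the recursion (5.30); illustrative sketch.
import numpy as np

def series_coefficients(n, a0, a1, kmax=10):
    """Return a_0, ..., a_kmax from a_{k+2} = [k(k+1) - n(n+1)] / [(k+1)(k+2)] * a_k."""
    a = [a0, a1]
    for k in range(kmax - 1):
        a.append((k*(k + 1) - n*(n + 1)) / ((k + 1)*(k + 2)) * a[k])
    return np.array(a)

# n = 2, (a0, a1) = (1, 0): the series terminates -> proportional to P_2(x) = (3x^2 - 1)/2
print(series_coefficients(2, 1.0, 0.0))   # [ 1.  0. -3.  0.  0. ... ]
# n = 2, (a0, a1) = (0, 1): infinite series with odd powers only (the second solution)
print(series_coefficients(2, 0.0, 1.0))
```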

Application 5.19. Solving the Hermite differential equation


Recall from Eq. (4.65) that the Hermite differential equation is given by

y 00 − 2xy 0 + 2ny = 0 . (5.35)

To find its solutions we can proceed like we did in the Legendre case and insert the series (5.27). This
leads to
\sum_{k=0}^{∞} \big[(k + 1)(k + 2)\, a_{k+2} − 2(k − n)\, a_k\big] x^k = 0 ,    (5.36)

and, hence, the recursion relation

a_{k+2} = \frac{2(k − n)}{(k + 1)(k + 2)}\, a_k .    (5.37)

As before, we have a free choice of a0 and a1 but with those two coefficients fixed the recursion formula
determines all others. From the numerator in Eq. (5.37) it is clear that ak = 0 for k = n + 2, n + 4, . . ..

For n even and (a0 , a1 ) = (1, 0) we get a polynomial with only even powers of x - up to an overall
constant the Hermite Polynomial Hn with n even - while (a0 , a1 ) = (0, 1) leads to an infinite series
with only odd powers of x - the second solution of (5.28). For n odd the choice (a0 , a1 ) = (0, 1) leads
to a polynomial solution with only odd powers of x which is proportional to the Hermite polynomials
Hn for n odd, while the choice (a0 , a1 ) = (1, 0) leads to a power series with only even powers of x.

Exercise 5.10. Show that, for appropriate choices of a0 and a1 the recursion formula reproduces the
first few Hermite polynomials (4.64).

As in the Legendre case, the differential equation (5.35) also makes sense if n is a real number. If
n∈/ N then the numerator in Eq. (5.37) never vanishes and both solutions to (5.35) are non-polynomial.
(This observation plays a role for the energy quantisation of the quantum harmonic oscillator.)

Of course the above discussion can be repeated for the Laguerre differential equation (4.52) as in the
following
Exercise 5.11. Insert the series Ansatz (5.27) into the Laguerre differential equation (4.52) and find
the recursion relation for the coefficients ak . Discuss the result and identify the choices which lead to the
Laguerre polynomials.

Application 5.20. A simple inhomogeneous example


For a simple inhomogeneous case, let us consider the equation

T y = f ,\qquad T = \frac{d^2}{dx^2} + 1    (5.38)

on the interval [a, b] = [0, \frac{π}{2}], where f is an arbitrary function. (This describes a driven harmonic
oscillator with driving force f .) It is clear that the solution space of the associated homogeneous
equation, T y = 0, is given by

VH = Span (y1 (x) = sin(x), y2 (x) = cos(x)) . (5.39)

As a sanity check we can work out the Wronski determinant

W = y1 y20 − y2 y10 = −1 , (5.40)

and since this is non-vanishing the two solutions are indeed linearly independent. To find the solution
space of the inhomogeneous equation we can use the variation of constant method from Theorem 5.5.
Inserting y1 = sin, y2 = cos, W = −1 and α2 = 1 into Eq. (5.14) we find for the Green function

G(x, t) = sin(x − t) . (5.41)

From Eq. (5.19) this means a special solution to the inhomogeneous equation is given by
y_0(x) = \int_{x_0}^{x} dt\, G(x, t) f(t) = \int_{x_0}^{x} dt\, \sin(x − t) f(t) ,    (5.42)

and, hence, the solution space of the inhomogeneous equation is

VI = y0 + VH . (5.43)

Exercise 5.12. Check explicitly that y0 from Eq. (5.42) satisfies the equation T y0 = f .

Let us now consider Eq. (5.38) as a boundary value problem on the interval [a, b] = [0, π2 ] with Dirichlet
boundary conditions y(0) = y(π/2) = 0 and apply the results of Theorem 5.6. First, we note that
y1 (0) = y2 (π/2) = 0 so our chosen homogeneous solutions do indeed satisfy the requirements of the
Theorem. Inserting y1 = sin, y2 = cos, W = −1 and α2 = 1 into Eq. (5.20) gives the Green function

G(x, t) = − sin(t) cos(x) θ(x − t) − sin(x) cos(t) θ(t − x) , (5.44)

and hence

y(x) = \int_0^{π/2} dt\, G(x, t) f(t)    (5.45)
satisfies T y = f as well as the correct boundary conditions y(0) = y(π/2) = 0. We note that there is
no non-trivial solution in VH which satisfies y(0) = y(π/2) = 0 so Eq. (5.45) is the unique solution to
the boundary value problem.
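To make the Dirichlet Green function (5.44) concrete, here is a small numerical sketch (our own, with a sample driving force chosen for illustration) which applies Eq. (5.45) and verifies both the differential equation and the boundary conditions.

```python
# Solve y'' + y = f on [0, pi/2] with y(0) = y(pi/2) = 0 via the Green function (5.44).
# Illustrative sketch with the sample force f(x) = x.
import numpy as np
from scipy.integrate import quad

f = lambda t: t
G = lambda x, t: -np.sin(t)*np.cos(x) if x >= t else -np.sin(x)*np.cos(t)   # Eq. (5.44)

def y(x):
    # the integrand has a kink at t = x, so tell quad about it via points=[x]
    val, _ = quad(lambda t: G(x, t) * f(t), 0, np.pi/2, points=[x])
    return val

# boundary conditions
print(y(0.0), y(np.pi/2))                          # both ~ 0

# check T y = y'' + y = f at an interior point by finite differences
x0, h = 0.7, 1e-3
ypp = (y(x0 + h) - 2*y(x0) + y(x0 - h)) / h**2
print(ypp + y(x0), f(x0))                          # should agree to good accuracy
```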

5.3 Bessel differential equation


Another important differential equation in mathematical physics is the Bessel differential equation whose
solutions are referred to as Bessel functions. Before we discuss this in detail we need a bit of preparation
and introduce the Gamma function, Γ, which is another much-used special function.

5.3.1 The Gamma function


The Gamma function is defined by the integral
Γ(x) = \int_0^{∞} dt\, e^{−t} t^{x−1} ,    (5.46)

which is certainly well-defined as long as x > 0. A short calculation, using integration by parts and noting
that the boundary term vanishes, gives
x\, Γ(x) = \int_0^{∞} dt\, e^{−t} \big(x t^{x−1}\big) = \left[e^{−t} t^{x}\right]_0^{∞} + \int_0^{∞} dt\, e^{−t} t^{x} = Γ(x + 1) ,    (5.47)

and, hence, the functional equation


Γ(x + 1) = xΓ(x) (5.48)
of the Gamma function. By direct integration it follows that Γ(1) = 1 and combining this with iterating
the above functional equation we learn that

Γ(n) = (n − 1)! , (5.49)

for n ∈ N. Hence, the Gamma function can be seen as a function which extends the factorial operation
to non-integer numbers. From Eq. (5.48) it also follows by iteration that

Γ(x) = \frac{Γ(x + n)}{x(x + 1) · · · (x + n − 1)} ,    (5.50)

which shows that the Γ function has poles at x = 0, −1, −2, . . .. These are, in fact, the only poles of the
Gamma function.

Application 5.21. Asymptotic expression for the Γ-function
As an aside, let us derive an asymptotic expression for the Γ-function. We start with the substitution
t = (x − 1)s in the defining integral (5.46), which leads to
Γ(x) = (x − 1)^x \int_0^{∞} ds\, e^{(x−1)(\ln(s) − s)} ≃ (x − 1)^x \int_0^{∞} ds\, e^{(x−1)(−1 − (s−1)^2/2) + · · ·}
    \overset{u = s−1}{=} (x − 1)^x e^{−(x−1)} \int_{−1}^{∞} du\, e^{−(x−1) u^2/2} ≃ (x − 1)^x e^{−(x−1)} \int_{−∞}^{∞} du\, e^{−(x−1) u^2/2}
    = \sqrt{2π(x − 1)} \left(\frac{x − 1}{e}\right)^{x−1}    (5.51)

Exercise 5.13. What happened in the second step and the second last step of the calculation in
Eq. (5.51)? Which approximations have been made and how are they justified?

The approximate result (5.51) for the Γ-function leads to the famous Stirling formula
n! = Γ(n + 1) ≃ \sqrt{2πn} \left(\frac{n}{e}\right)^n    (5.52)
which provides an asymptotic approximation to the value of n!.
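A quick numerical comparison (added here as an illustration) shows how good the Stirling approximation (5.52) already is for moderate n.

```python
# Compare n! with the Stirling approximation sqrt(2 pi n) (n/e)^n; illustrative sketch.
import math

for n in (1, 5, 10, 20, 50):
    exact = math.factorial(n)
    stirling = math.sqrt(2 * math.pi * n) * (n / math.e) ** n
    print(n, stirling / exact)   # ratio -> 1 as n grows (roughly 1 - 1/(12 n))
```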

Much more can be said about the Gamma function and its natural habitat is in complex analysis. For
our purposes the above is sufficient but more information can be found, for example, in Ref. [11].

5.3.2 Bessel differential equation and its solutions


We now turn to our main interest, the Bessel differential equation, which reads

x2 y 00 + xy 0 + (x2 − ν 2 )y = 0 , (5.53)

for any number ν ∈ R≥0 . Inserting the Ansatz



y(x) = \sum_{k=0}^{∞} a_k x^{k+α} ,\qquad α = ±ν ,    (5.54)

into the differential equation (putting the additional factor xα into the Ansatz proves useful to improve
the properties of the resulting series) leads to

\sum_{k=0}^{∞} k(k + 2α)\, a_k x^k + \sum_{k=0}^{∞} a_k x^{k+2} = 0 .    (5.55)

There is only a single term proportional to x and to remove this term we need to set a_1 = 0. Since the
recursion formula will imply that a_{k+2} is proportional to a_k, this means all a_k with k odd must vanish.
For the coefficients with even k Eq. (5.55) gives
a_{2k} = −\frac{1}{4k(k + α)}\, a_{2k−2} ,\qquad k = 1, 2, . . . .    (5.56)
This recursion formula can be iterated and leads to
a_{2k} = \frac{(−1)^k\, Γ(α + 1)}{2^{2k}\, k!\, Γ(k + α + 1)}\, a_0 .    (5.57)

That this result for a2k does indeed satisfy the recursion relation (5.56) follows directly from the functional
equation (5.48) of the Gamma function. It is conventional to choose a_0 = (2^α Γ(α + 1))^{−1} and, by inserting
everything back into the Ansatz (5.54), this leads to the two series solutions for α = ±ν given by

J_{±ν}(x) := \left(\frac{x}{2}\right)^{±ν} \sum_{k=0}^{∞} \frac{(−1)^k}{k!\, Γ(k ± ν + 1)} \left(\frac{x}{2}\right)^{2k} .    (5.58)

Both Jν and J−ν are solutions of the Bessel differential equation and they are called Bessel functions of
the first kind. It can be shown (for example by applying the quotient criterion) that the series in Eq (5.58)
converges for all x. There is a subtlety for J−n if n = 0, 1, . . .. In this case, the terms in the series for
k = 0, . . . , n − 1 have a Gamma-function pole in the denominator (see Eq. (5.50)) and are effectively
removed so that the sum starts at k = n. Using this fact, it follows from the above series that

J−n (x) = (−1)n Jn (x) , n = 0, 1, 2, . . . . (5.59)

In other words, if ν = n is an integer then the two solutions Jn and J−n are linearly dependent. If ν is not
an integer the Wronski determinant at x → 0 has the leading term W = J_ν(x) J'_{−ν}(x) − J'_ν(x) J_{−ν}(x) =
−\frac{2ν}{Γ(−ν + 1)\, Γ(ν + 1)\, x}\,(1 + O(x)) ≠ 0 which shows that the solutions J_ν and J_{−ν} are linearly independent. To
overcome the somewhat awkward distinction between the integer and non-integer case it is customary to
define the Bessel functions of the second kind by
N_ν(x) = \frac{J_ν(x) \cos(νπ) − J_{−ν}(x)}{\sin(νπ)} .    (5.60)
This definition has an apparent pole if ν is integer but it can be shown that it is well-defined in the limit
ν → n. We summarise some of the properties of Bessel functions in the following

Proposition 5.1. (Properties of Bessel functions) The Bessel function of the first and second kind, Jν
and Nν , defined above, solve the Bessel differential equation and are linearly independent for all ν ∈ R.
For their asymptotic properties we have
x → 0 :\quad J_ν(x) → \frac{1}{Γ(ν + 1)} \left(\frac{x}{2}\right)^ν ,\qquad N_ν(x) → \begin{cases} \frac{2}{π}\,(\ln(x/2) + C) & \text{for } ν = 0 \\ −\frac{Γ(ν)}{π} \left(\frac{2}{x}\right)^ν & \text{for } ν > 0 \end{cases}    (5.61)

x → ∞ :\quad J_ν(x) → \sqrt{\frac{2}{πx}} \cos\left(x − \frac{νπ}{2} − \frac{π}{4}\right) ,\qquad N_ν(x) → \sqrt{\frac{2}{πx}} \sin\left(x − \frac{νπ}{2} − \frac{π}{4}\right)    (5.62)
Proof. The fact that Jν and Nν solve the Bessel differential equation is clear from the above calculation.
Their linear independence has been shown for ν non-integer and the integer case can be dealt with by a
careful consideration of the definition (5.60) as ν → n. (See, for example, Ref. [12]).
The asymptotic limit as x → 0 can be directly read off from the series (except for ν = 0 which requires
a bit more care). The proofs for the asymptotic limits x → ∞ are a bit more involved (see, for example,
in Ref. [12]). Intuitively, for large x, we should only keep the terms proportional to x2 in the Bessel
differential equation (5.53) which leads to y 00 + y ' 0. This is clearly solved by sin and cos.
Exercise 5.14. Show that J_{1/2}(x) = \sqrt{\frac{2}{πx}} \sin(x) and N_{1/2}(x) = −\sqrt{\frac{2}{πx}} \cos(x).

Particularly the above limits for large x are interesting. They show that the Bessel functions have oscilla-
tory properties close to sin and cos but, unlike these, they are not periodic. This is illustrated in Fig. 13.
In particular, the asymptotic limits for large x show that the Bessel functions have an infinite number of
zeros but they are not equally spaced (although they are asymptotically equally spaced). We denote the

[Figure 13: The graph of the Bessel functions J_0, J_1 and J_2.]

zeros of J_ν by z_{νk}, where k = 1, 2, . . . labels the zeros from small to large x. Their values can be computed
numerically and some examples for J_0, J_1 and J_2 are:
z_{0k} = 2.405, 5.520, 8.654, . . .
z_{1k} = 3.832, 7.016, 10.173, . . .
z_{2k} = 5.136, 8.417, 11.620, . . .
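The zeros quoted above can be reproduced with scipy; the following short sketch (our own addition) also illustrates that the spacing between consecutive zeros approaches π, in line with the asymptotics (5.62).

```python
# Zeros z_{nu k} of the Bessel functions J_0, J_1, J_2; illustrative sketch.
from scipy.special import jn_zeros

for nu in (0, 1, 2):
    zeros = jn_zeros(nu, 5)          # the first five zeros of J_nu
    print(nu, zeros.round(3))
    print("   spacings:", (zeros[1:] - zeros[:-1]).round(3))   # -> pi for large k
```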

5.3.3 Orthogonal systems of Bessel functions


Can we use Bessel functions to construct orthogonal systems of functions on some appropriate Hilbert
space, analogous to what we have done for sin and cos? The singularity for the Bessel functions of the
second kind at x = 0 makes them less suitable but the Bessel functions Jν of the first kind are everywhere

well-behaved. Asymptotically, they approach a function of the form \sin(x − \text{const})/\sqrt{x}, so it seems natural
to try something analogous to the sine Fourier series, where we have used functions proportional to
\sin\left(\frac{kπx}{a}\right) to construct an ortho-normal basis on L^2([0, a]). With this motivation we focus on a particular
Bessel function J_ν for fixed ν and define functions on the interval x ∈ [0, a] by

\hat{J}_{νk}(x) := N_{νk}\, J_ν\!\left(\frac{z_{νk}\, x}{a}\right) ,    (5.63)
where k = 1, 2, . . . and the N_{νk} are normalisation constants. Motivated by the above discussion, the zeros kπ of
the sine which featured in the definition of the sine Fourier series have been replaced by the zeros z_{νk} of
the Bessel function J_ν.
Lemma 5.1. The functions Jˆνk for k = 1, 2, . . ., defined in Eq. (5.63) form an orthogonal system of
functions on L^2_w([0, a]), where w(x) = x, which is ortho-normal for suitable choices of the constants N_{νk}.
Proof. A calculation using the definition (5.63) and the Bessel differential equations shows that the Jˆνk
satisfy
T \hat{J}_{νk} = −\frac{z_{νk}^2}{a^2}\, \hat{J}_{νk} ,\qquad T = \frac{1}{x} \frac{d}{dx}\left(x \frac{d}{dx}\right) − \frac{ν^2}{x^2} .    (5.64)

Hence, the \hat{J}_{νk} are eigenvectors of the operator T with eigenvalues −z_{νk}^2/a^2. On the space L^2_{w,per}([0, a])
of functions with f(0) = f(a) and weight function w(x) = x the operator T is hermitian and \hat{J}_{νk} ∈
L^2_{w,per}([0, a]) (since they vanish at x = 0, a). Since eigenvectors of a hermitian operator which correspond to
different eigenvalues are orthogonal, this must be the case for the \hat{J}_{νk}, since all the zeros z_{νk} are different.

In fact, we have the following stronger statement.

Theorem 5.15. For ν > −1, the functions Jˆνk for k = 1, 2, . . ., defined in Eq. (5.63), with suitable choices
for Nνk , form an ortho-normal basis of L2w ([0, a]), where w(x) = x.

Proof. The direct proof is technical and can be found in Ref. [7]. In the next subsection, we will see an
independent argument.

The theorem implies that every function f ∈ L2w ([0, a]) can be expanded in terms of Bessel functions as
∞ Z a
ak Jˆνk , ak = hJˆνk , f i = dx xJˆνk (x)f (x) .
X
f= (5.65)
k=1 0

5.4 The operator perspective - Sturm-Liouville operators


So far we have discussed second order linear differential equations in a somewhat down-to-earth way,
using methods of basic analysis. We would now like to make contact with functional analysis and our
earlier discussion of Hilbert spaces.

5.4.1 Sturm-Liouville operators


A second order differential operator of the form
   
T_{SL} = \frac{1}{w(x)} \left[\frac{d}{dx}\left(p(x) \frac{d}{dx}\right) + q(x)\right]    (5.66)

with (real-valued) smooth functions w, p and q is called a Sturm-Liouville operator. For now, we would
like to think of this as an operator on the space

L([a, b]) := L^2_w([a, b]) ∩ C^∞([a, b])    (5.67)

of square integrable functions, relative to a weight function w, on an interval [a, b] which are
also infinitely often differentiable. Accordingly, we should demand that w, p and q are smooth
functions and that w, as a weight function, is strictly positive.

Lemma 5.2. Consider a linear second order differential operator of the form

T = \alpha_2(x)\, \frac{d^2}{dx^2} + \alpha_1(x)\, \frac{d}{dx} + \alpha_0(x) ,   (5.68)
where x ∈ [a, b] and I ⊂ [a, b] is an interval such that α2(x) ≠ 0 for all x ∈ I. Then, on I, the operator T
can be written in Sturm-Liouville form (5.66) with
p(x) = \exp\left( \int_{x_0}^{x} dt\; \frac{\alpha_1(t)}{\alpha_2(t)}\right) , \qquad w(x) = \frac{p(x)}{\alpha_2(x)} , \qquad q(x) = \alpha_0(x)\, w(x) ,   (5.69)

where x0 ∈ I.
Proof. Abbreviating D = d/dx and noting that p′ = (α1/α2) p we obtain, by inserting into the Sturm-Liouville operator,

T_{\rm SL} = \frac{p}{w}\, D^2 + \frac{p'}{w}\, D + \frac{q}{w} = \alpha_2 D^2 + \frac{\alpha_1 p}{\alpha_2 w}\, D + \alpha_0 = \alpha_2 D^2 + \alpha_1 D + \alpha_0 = T .   (5.70)
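As an illustrative aside (not part of the original notes), the prescription (5.69) can be checked symbolically; the following sketch, assuming SymPy is available, applies it to the Bessel operator and recovers p = x, w = x, q = −ν²/x, in agreement with Table 2 below:

# Convert y'' + (1/x) y' - (nu^2/x^2) y into Sturm-Liouville form via Eq. (5.69) (assumes SymPy).
import sympy as sp

x, t, nu = sp.symbols('x t nu', positive=True)
alpha2, alpha1, alpha0 = sp.Integer(1), 1 / x, -nu**2 / x**2

p = sp.exp(sp.integrate((alpha1 / alpha2).subs(x, t), (t, 1, x)))   # x0 = 1 chosen for convenience
w = sp.simplify(p / alpha2)
q = sp.simplify(alpha0 * w)
print(p, w, q)                     # -> x, x, -nu**2/x

# Check that (1/w)[(p y')' + q y] reproduces the original operator on a generic function y.
y = sp.Function('y')
T_SL = (sp.diff(p * sp.diff(y(x), x), x) + q * y(x)) / w
T = alpha2 * sp.diff(y(x), x, 2) + alpha1 * sp.diff(y(x), x) + alpha0 * y(x)
print(sp.simplify(T_SL - T))       # -> 0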

Introducing the interval I in the above lemma serves to avoid an undefined integrand in the first Eq. (5.69)
due to the vanishing of α2. Even when this happens (as, for example, for the Legendre differential
equation at x = ±1) and the interval I is, at first, chosen to be genuinely smaller than [a, b], it turns out
that the final result for w, p and q can often be extended to, and is well-defined on, all of [a, b].
An obvious question is whether TSL is self-adjoint as an operator on the space L([a, b]), relative to the
standard inner product
\langle f, g\rangle = \int_a^b dx\; w(x)\, f(x)\, g(x) ,   (5.71)
with weight function w. A quick calculation shows that
\langle f, T_{\rm SL}\, g\rangle = \int_a^b dx\, \big( f\, D(pDg) + f q g\big) = \big[ p f Dg\big]_a^b - \int_a^b dx\, \big( p\, Df\, Dg - q f g\big)
= \big[ p f Dg\big]_a^b - \big[ p g Df\big]_a^b + \int_a^b dx\, \big( D(pDf)\, g + q f g\big) = \big[ p f g' - p g f'\big]_a^b + \langle T_{\rm SL}\, f, g\rangle .   (5.72)

So TSL is superficially self-adjoint but we have to ensure that the boundary terms on the RHS vanish.
There are two obvious ways in which this can be achieved. First, the interval [a, b] might be chosen such
that p(a) = p(b) = 0 - this is also called the natural choice of the interval. In this case, the boundary term
vanishes without any additional condition on the functions f , g and TSL is self-adjoint on L([a, b]). If this
doesn’t work we can consider the subspace

Lb ([a, b]) := {f ∈ L([a, b]) | da f (a) + na f 0 (a) = db f (b) + nb f 0 (b) = 0} , (5.73)

of smooth functions which satisfy mixed homogeneous boundary conditions at a and b. For such functions
the above boundary term also vanishes. If p(a) = p(b) the boundary term also vanishes for periodic
functions
Lp ([a, b]) := {f ∈ L([a, b]) | f (a) = f (b) , f 0 (a) = f 0 (b)} . (5.74)
Hence, we have
Lemma 5.3. Let TSL be a Sturm-Liouville operator (5.66). If p(a) = p(b) = 0 then TSL is self-adjoint as
an operator on the space L([a, b]) in Eq. (5.67). It is also self-adjoint on the space of functions Lb([a, b])
with mixed homogeneous boundary conditions in Eq. (5.73). If p(a) = p(b) it is self-adjoint on the space Lp([a, b]) of
periodic functions in Eq. (5.74).
To simplify the notation, we will refer to the space on which the Sturm-Liouville operator is defined
and self-adjoint as LSL ([a, b]). From the previous Lemma, this can be L([a, b]), Lb ([a, b]) or Lp ([a, b]),
depending on the case.

5.4.2 Sturm-Liouville eigenvalue problem


It is now interesting to consider a Sturm-Liouville eigenvalue problem, that is, to consider the eigenvalue
equation
TSL y = λy , (5.75)
on LSL ([a, b]). Since TSL is hermitian, we already know from general arguments (see Theorem 1.24) that the
eigenvalues λ must be real and that eigenvectors for different eigenvalues must be orthogonal. Since all the
second order differential equations discussed so far can be phrased as a Sturm-Liouville eigenvalue problem
(see Table 2) this provides a uniform reason for the appearance of the various orthogonal function systems
we have encountered. It is tempting to go further and try to use Theorem 2.9 to argue that orthogonal
systems of eigenfunctions of Sturm-Liouville operators must, in fact, form an ortho-normal basis. Arguing

name             DEQ                              p        q        w        LSL([a,b])     bound. cond.         λ              y
sine Fourier     y'' = λy                         1        0        1        Lb([0, a])     y(0) = y(a) = 0      −π²k²/a²       sin(kπx/a)
cosine Fourier   y'' = λy                         1        0        1        Lb([0, a])     y'(0) = y'(a) = 0    −π²k²/a²       cos(kπx/a)
Fourier          y'' = λy                         1        0        1        Lp([−a, a])    periodic             −π²k²/a²       sin(kπx/a), cos(kπx/a)
Legendre         (1 − x²)y'' − 2xy' = λy          1 − x²   0        1        L([−1, 1])     natural              −n(n + 1)      Pn
Laguerre         xy'' + (1 − x)y' = λy            x e⁻ˣ    0        e⁻ˣ      L([0, ∞])      natural              −n             Ln
Hermite          y'' − 2xy' = λy                  e^(−x²)  0        e^(−x²)  L([−∞, ∞])     natural              −2n            Hn
Bessel           y'' + (1/x)y' − (ν²/x²)y = λy    x        −ν²/x    x        Lb([0, a])     y(0) = y(a) = 0      −z²νk/a²       Ĵνk

Table 2: The second order differential equations discussed so far and their formulation as a Sturm-Liouville
eigenvalue problem.

in this way would be incorrect for two reasons. First, so far the Sturm-Liouville operator is only defined
on the space LSL which consists of certain smooth functions. While this space may well be dense in the
appropriate L2 Hilbert space it is not a Hilbert space itself. Secondly, Theorem 2.9 applies to compact
operators and we know from Exercise 1.13 that differential operators are not bounded and, hence, not
compact.
One way to make progress is to convert the Sturm-Liouville differential operator into an integral
operator. Some of the hard work has already been done in Theorem 5.6 where we have shown that,
provided Ker(TSL ) = {0} we know (for Dirichlet boundary conditions) that
T_{\rm SL}\, y = f \quad\Longleftrightarrow\quad y = \hat{G} f , \qquad \hat{G}f(x) := \int_a^b dt\; G(x, t)\, f(t) ,   (5.76)

where G is the Green function. The integral operator Ĝ, defined in terms of the Green function kernel
G, can be thought of as the inverse of the Sturm-Liouville operator and, as an integral operator, we can
extend it to act on the space L2w ([a, b]) (with appropriate boundary conditions). Moreover, we have

Lemma 5.4. If Ker(TSL ) = {0} then the operator Ĝ in Eq. (5.76) is self-adjoint and compact on L2w ([a, b])
(with Dirichlet boundary conditions).

Proof. The proof can, for example, be found in Ref. [5].

If we set f = λy in Eq. (5.76) we get


T_{\rm SL}\, y = \lambda y \quad\Longleftrightarrow\quad \hat{G}\, y = \frac{1}{\lambda}\, y .   (5.77)

This means the Sturm-Liouville eigenvalue problem is converted into an eigenvalue problem for Ĝ which
is formulated in terms of an integral equation, also called a Volterra integral equation. The eigenfunctions
for TSL and Ĝ are the same and the eigenvalues are each other's inverses. Since Ĝ is compact we can now
apply Theorem 2.9 to it. If Ker(TSL ) 6= {0} we can shift TSL → TSL + α by some value α so that the new
operator has a vanishing kernel. In summary, Theorem 2.9 then applies to our eigenvalue problem:

Lemma 5.5. The set of eigenvalues of the Sturm-Liouville operator is either finite or it forms a sequence
which tends to infinity. Every eigenvalue has a finite degeneracy and there exists an ortho-normal basis
of eigenvectors.

5.4.3 Sturm-Liouville and Fredholm alternative
In view of the Sturm-Liouville formalism, we can now re-visit our original boundary value problem but, for
simplicity, we specialise to the case of homogeneous boundary conditions. This means, we are considering
the equations

T y = f , \qquad d_a\, y(a) + n_a\, y'(a) = 0 , \qquad d_b\, y(b) + n_b\, y'(b) = 0   (5.78)

T y = 0 , \qquad d_a\, y(a) + n_a\, y'(a) = 0 , \qquad d_b\, y(b) + n_b\, y'(b) = 0   (5.79)

where T is a second order differential operator of the form (5.2) (which, we now know, can be written
in Sturm-Liouville form). In a way similar to the above Green function method, we can convert this
problem into one that involves a compact integral operator, turning the differential equation into an
integral equation. The benefit is that Theorem 2.10 on Fredholm’s alternative can be applied to this
problem and turned into a version of Fredholm’s alternative for second order linear differential equations.

Theorem 5.16. Let ek be an ortho-normal basis of eigenvectors of T with eigenvalues λk , so T ek = λk ek


(and the ek satisfy the boundary conditions in Eq (5.78)). Then, the following alternative holds for the
solution to the boundary value problem (5.78).
(a) There is no non-trivial solution to the homogeneous problem (5.79). In this case there is a unique
solution y to the problem (5.78) for every f given by
y = \sum_k \frac{1}{\lambda_k}\, \langle e_k, f\rangle\, e_k .   (5.80)

(b) There is a non-trivial solution of the homogeneous problem (5.79). In this case, there exists a solution to
the inhomogeneous problem if and only if hy0 , f i = 0 for all solutions y0 to the homogeneous problem (5.79).
If this condition is satisfied, the solution to (5.78) is given by
y = \sum_{k:\, \lambda_k \neq 0} \frac{1}{\lambda_k}\, \langle e_k, f\rangle\, e_k + y_0 ,   (5.81)

where y0 is an arbitrary solution to the homogeneous problem (5.79).

Proof. Broadly, this follows from Theorem 2.10 on Fredholm’s alternative. The details of the proof are
somewhat technical, particularly in dealing with the boundary conditions, and can, for example, be found
in Ref. [5].

We note that the unique solution (5.80) in case (a) can be written as
y(x) = \sum_k \frac{1}{\lambda_k} \int_a^b dt\; w(t)\, e_k(t)\, e_k(x)\, f(t) = \int_a^b dt\; G(x, t)\, f(t)   (5.82)

so that we obtain an expression for the Green function


G(x, t) = w(t) \sum_k \frac{1}{\lambda_k}\, e_k(t)\, e_k(x)   (5.83)

in terms of the eigenfunctions.
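To make the eigenfunction formula (5.80) concrete, here is a small numerical sketch (an illustration added for these notes, not part of the original text) for the simplest Sturm-Liouville problem y'' = f on [0, π] with Dirichlet boundary conditions, where w = p = 1, e_k(x) = √(2/π) sin(kx) and λ_k = −k²; the source f(x) = 1 is a hypothetical choice with known exact solution x(x − π)/2:

# Solve y'' = f, y(0) = y(pi) = 0 via the expansion (5.80) (assumes NumPy/SciPy).
import numpy as np
from scipy.integrate import quad

L, K = np.pi, 60
e = lambda k, x: np.sqrt(2 / L) * np.sin(k * x)      # ortho-normal eigenfunctions
lam = lambda k: -k**2                                # corresponding eigenvalues
f = lambda x: 1.0                                    # sample source

ck = [quad(lambda x, k=k: e(k, x) * f(x), 0, L)[0] for k in range(1, K + 1)]

def y(x):
    return sum(c / lam(k) * e(k, x) for k, c in zip(range(1, K + 1), ck))

x0 = L / 2
print(y(x0), x0 * (x0 - L) / 2)      # truncated series vs. exact solution x(x - pi)/2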

6 Laplace equation
The Laplace operator ∆ in Rn with coordinates (x1 , . . . , xn ) is defined as
\Delta = \sum_{i=1}^{n} \frac{\partial^2}{\partial x_i^2} .   (6.1)

It gives rise to the homogeneous and inhomogeneous Laplace equations


∆φ = 0 , ∆φ = ρ , (6.2)
where ρ : Rn → R is a given function. We are asking for the solutions φ to these equations on a compact set
V ⊂ U ⊂ Rn (where U is open), usually subject to certain boundary conditions on the smooth boundary
∂V of V. A solution to the homogeneous Laplace equation is also called a harmonic function. There are
two types of boundary conditions which are frequently imposed:
φ|∂V = h (Dirichlet) n · ∇φ|∂V = h (von Neumann) (6.3)
where h is a given function on the boundary prescribing the boundary values and n is the normal vector
to the boundary. (Linear combinations of these conditions, referred to as mixed boundary conditions, are
also possible.) If the choice of boundary condition involves setting h = 0 and we define the “force field”
E = −∇φ then Dirichlet boundary conditions imply that the boundary is an equipotential surface, so
E is perpendicular to it. Under the same conditions, von Neumann boundary conditions imply that the
component of E normal to the boundary vanishes.
The above equations are of considerable importance in physics. For example, they govern the theory
of electrostatics (with φ being the electrostatic potential, ρ, up to a constant, the charge density, E the
electric field and boundary conditions implemented, for example, by the presence of conducting surfaces)
and the theory of Newtonian gravity (with φ being the Newtonian gravitational potential, ρ, up to constant,
being the mass density and E the gravitational field). The Laplace operator also appears as part of many
partial differential equations in physics, for example in the context of the Schrödinger equation in quantum
mechanics.
Eqs. (6.2) are obviously linear so we already know that, before imposing any boundary conditions, the
solutions to the homogeneous Laplace equation form a vector space and the solutions to the inhomogeneous
equation can be obtained by adding all solutions of the homogeneous equation to a particular solution of
the inhomogeneous equation.

6.1 Laplacian in different coordinate systems


It is often useful to write the Laplacian in other than Cartesian coordinates and this is facilitated by the
following
Proposition 6.1. (Laplacian in general coordinates) Given the (twice differentiable) map X : V → U
(a “parametrisation” or a “coordinate change”), with V ⊂ Rk and U ⊂ Rn open, we consider the space
M = X(V ). Introducing coordinates t = (t1 , . . . , tk ) on V we write X as t 7→ x(t) = (x1 (t), . . . , xn (t)).
With the tangent vectors ∂x/∂ti (required to be linearly independent), define the k × k metric G with entries

G_{ij} = \frac{\partial x}{\partial t_i} \cdot \frac{\partial x}{\partial t_j} , \qquad g := \det(G) .   (6.4)
The entries of its inverse G−1 are denoted by Gij . Then, the Laplacian ∆X relative to the parametrisation
X is given by

\Delta_X = \frac{1}{\sqrt{g}}\, \frac{\partial}{\partial t_i}\left( \sqrt{g}\, G^{ij}\, \frac{\partial}{\partial t_j}\right) .   (6.5)


The measure for integration relative to X is given by dS = √g dt1 · · · dtk .

Proof. This formula is proved in Appendix B which contains an account of some basic differential geometry,
a subject somewhat outside the main thrust of this lecture.

Exercise 6.1. Consider a curve [a, b] ∋ t 7→ x(t) ∈ Rn and use Proposition 6.1 to derive the measure dS for
integration over a curve. Do the same with a surface (t1 , t2 ) 7→ x(t1 , t2 ) ∈ R3 and convince yourself that
the measure dS you obtain reproduces what you have learned about integration over surfaces.

Let us apply this formula to derive the Laplacian in several useful coordinate systems.

6.1.1 Two-dimensional Laplacian


In R2 with Cartesian coordinates x = (x, y)T the Laplacian is given by

\Delta_2 = \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2} .   (6.6)

The two-dimensional case is somewhat special since R2 ∼ = C and we can introduce complex coordinates
z = x + iy and z̄ = x − iy. Introducing the Wirtinger derivatives
   
\frac{\partial}{\partial z} = \frac{1}{2}\left( \frac{\partial}{\partial x} - i\, \frac{\partial}{\partial y}\right) , \qquad \frac{\partial}{\partial \bar{z}} = \frac{1}{2}\left( \frac{\partial}{\partial x} + i\, \frac{\partial}{\partial y}\right) ,   (6.7)

a short calculation shows that


\Delta_2 = 4\, \frac{\partial^2}{\partial z\, \partial\bar{z}} .   (6.8)

This formula is extremely useful. We note that a holomorphic function, φ = φ(z), is, loosely speaking,
a function which does not depend on z̄ and, hence, satisfies ∂φ/∂z̄ = 0. Eq. (6.8) says that every holomor-
phic function φ = φ(z) solves the two-dimensional homogeneous Laplace equation, ∆2 φ = 0, so we can
immediately write down large classes of solutions to this equation. We will come back to this observation
later.
Another common set of coordinates in R2 are two-dimensional polar coordinates t = (r, ϕ), where
r ∈ [0, ∞] and ϕ ∈ [0, 2π[, related to Cartesian coordinates by

x(r, ϕ) = r(cos ϕ, sin ϕ) . (6.9)

In the language of Proposition 6.1, the corresponding tangent vectors are

\frac{\partial x}{\partial r} = (\cos\varphi, \sin\varphi) , \qquad \frac{\partial x}{\partial \varphi} = r\, (-\sin\varphi, \cos\varphi) ,   (6.10)

which gives G = diag(1, r2 ) and g = r2 . Inserting this into the general formula (6.5) gives the two-
dimensional Laplacian in polar coordinates

1 ∂2
 
1 ∂ ∂
∆2,pol = r + 2 . (6.11)
r ∂r ∂r r ∂ϕ2

(The integration measure in two-dimensional polar coordinates is dS = g dr dϕ = r dr dϕ.)
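The general formula (6.5) lends itself to symbolic computation. As an illustrative aside (not from the original notes, assuming SymPy), the following sketch rederives the polar Laplacian (6.11) directly from the parametrisation (6.9):

# Derive the Laplacian in plane polar coordinates from Eq. (6.5) (assumes SymPy).
import sympy as sp

r, phi = sp.symbols('r phi', positive=True)
X = sp.Matrix([r * sp.cos(phi), r * sp.sin(phi)])
coords = [r, phi]

J = X.jacobian(coords)             # columns are the tangent vectors
G = sp.simplify(J.T * J)           # metric G_ij
g = sp.simplify(G.det())
Ginv = G.inv()

f = sp.Function('f')(r, phi)
lap = sum(sp.diff(sp.sqrt(g) * Ginv[i, j] * sp.diff(f, coords[j]), coords[i])
          for i in range(2) for j in range(2)) / sp.sqrt(g)
print(sp.simplify(lap))            # equals f_rr + f_r/r + f_phiphi/r^2, up to SymPy's ordering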

6.1.2 Three-dimensional Laplacian
In R3 with coordinates x = (x, y, z) the Laplacian in Cartesian coordinates is given by

\Delta_3 = \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2} + \frac{\partial^2}{\partial z^2} .   (6.12)

Cylindrical coordinates t = (r, ϕ, z), where r ∈ [0, ∞], ϕ ∈ [0, 2π[ and z ∈ R, are related to their Cartesian
counterparts by
x(r, ϕ, z) = (r cos ϕ, r sin ϕ, z) . (6.13)
The tangent vectors
\frac{\partial x}{\partial r} = (\cos\varphi, \sin\varphi, 0) , \quad \frac{\partial x}{\partial \varphi} = (-r\sin\varphi, r\cos\varphi, 0) , \quad \frac{\partial x}{\partial z} = (0, 0, 1) ,   (6.14)

imply the metric G = diag(1, r2 , 1) with determinant g = r2 and hence, by inserting into Eq. (6.5), the
three-dimensional Laplacian in cylindrical coordinates

\Delta_3 = \frac{1}{r}\, \frac{\partial}{\partial r}\left( r\, \frac{\partial}{\partial r}\right) + \frac{1}{r^2}\, \frac{\partial^2}{\partial\varphi^2} + \frac{\partial^2}{\partial z^2} = \Delta_{2,\rm pol} + \frac{\partial^2}{\partial z^2} .   (6.15)

(For the integration measure in cylindrical coordinates we get the well-known result dS = √g dr dϕ dz = r dr dϕ dz.)
We can repeat this analysis for three-dimensional spherical coordinates t = (r, θ, ϕ), where r ∈ [0, ∞],
θ ∈ [0, π[ and ϕ ∈ [0, 2π[, defined by

x(r, θ, ϕ) = r(sin θ cos ϕ, sin θ sin ϕ, cos θ) . (6.16)

The tangent vectors are

\frac{\partial x}{\partial r} = (\sin\theta\cos\varphi, \sin\theta\sin\varphi, \cos\theta) , \quad
\frac{\partial x}{\partial \theta} = r\, (\cos\theta\cos\varphi, \cos\theta\sin\varphi, -\sin\theta) , \quad
\frac{\partial x}{\partial \varphi} = r\, (-\sin\theta\sin\varphi, \sin\theta\cos\varphi, 0) ,

which leads to the metric G = diag(1, r^2, r^2 sin^2 θ) with determinant g = r^4 sin^2 θ. Inserting into Eq. (6.5)
gives the three-dimensional Laplacian in spherical coordinates

\Delta_{3,\rm sph} = \frac{1}{r^2}\, \frac{\partial}{\partial r}\left( r^2\, \frac{\partial}{\partial r}\right) + \frac{1}{r^2}\left[ \frac{1}{\sin\theta}\, \frac{\partial}{\partial\theta}\left( \sin\theta\, \frac{\partial}{\partial\theta}\right) + \frac{1}{\sin^2\theta}\, \frac{\partial^2}{\partial\varphi^2}\right] .   (6.17)

(The integration measure for three-dimensional polar coordinates is dS = √g dr dθ dϕ = r^2 sin θ dr dθ dϕ.)

6.1.3 Laplacian on the sphere


We can also use Proposition 6.1 to find the Laplacian on non-trivial manifolds, such as a two-sphere S 2 = {x ∈
R3 | |x| = 1}. We parametrise the two-sphere by coordinates t = (θ, ϕ), where θ ∈ [0, π[ and ϕ ∈ [0, 2π[,
by writing
x(θ, ϕ) = (sin θ cos ϕ, sin θ sin ϕ, cos θ) . (6.18)

The two tangent vectors are

\frac{\partial x}{\partial \theta} = (\cos\theta\cos\varphi, \cos\theta\sin\varphi, -\sin\theta) , \qquad \frac{\partial x}{\partial \varphi} = (-\sin\theta\sin\varphi, \sin\theta\cos\varphi, 0) ,   (6.19)

with associated metric G = diag(1, sin2 θ) and determinant g = sin2 θ. Inserting into Eq. (6.5) gives the
Laplacian on the two-sphere

\Delta_{S^2} = \frac{1}{\sin\theta}\, \frac{\partial}{\partial\theta}\left( \sin\theta\, \frac{\partial}{\partial\theta}\right) + \frac{1}{\sin^2\theta}\, \frac{\partial^2}{\partial\varphi^2} .   (6.20)

Comparison with Eq. (6.17) shows that the three-dimensional Laplacian can be expressed as

\Delta_{3,\rm sph} = \frac{1}{r^2}\, \frac{\partial}{\partial r}\left( r^2\, \frac{\partial}{\partial r}\right) + \frac{1}{r^2}\, \Delta_{S^2} .   (6.21)

(The integration measure on the two-sphere is dS = √g dθ dϕ = sin θ dθ dϕ.)

6.1.4 Green Identities


Our discussion below will frequently require Green’s identities which follow from Gauss’s integral theorem
\int_{\mathcal{V}} \nabla\cdot A\; dV = \int_{\partial\mathcal{V}} A\cdot n\; dS .   (6.22)
where A is a continuously differentiable vector field on the open set U ⊂ Rn and V ⊂ U is a compact set
with smooth boundary ∂V. Consider two twice continuously differentiable functions f, g : U → R and set
A = f ∇g. If we use

∇ · A = ∇ · (f ∇g) = f ∆g + ∇f · ∇g , A · n = f ∇g · n (6.23)

Gauss’s theorem turns into the first Green formula


\int_{\mathcal{V}} \big( f\, \Delta g + \nabla f\cdot\nabla g\big)\, dV = \int_{\partial\mathcal{V}} f\, \nabla g\cdot n\; dS .   (6.24)

Exchanging f and g in this formula and subtracting the two resulting equations gives the second Green
formula or Green’s identity
\int_{\mathcal{V}} \big( f\, \Delta g - g\, \Delta f\big)\, dV = \int_{\partial\mathcal{V}} \big( f\, \nabla g - g\, \nabla f\big)\cdot n\; dS .   (6.25)

After this preparation we are now ready to delve into the task of solving the Laplace equation.

6.2 Basic theory∗


6.2.1 Green functions for the Laplacian
In this subsection we discuss a number of basic mathematical results for the Laplace equation in Rn (with
Cartesian coordinates x = (x1 , . . . , xn )T ), starting with the inhomogeneous version
\Delta\phi = \rho , \qquad \Delta = \sum_{i=1}^{n} \frac{\partial^2}{\partial x_i^2} ,   (6.26)

of the equation. Define the generalised Newton (or Coulomb) potentials as
G(x - a) = G_a(x) = \begin{cases} -\frac{1}{(n-2)\, v_n}\, \frac{1}{|x-a|^{n-2}} & \text{for } n > 2 \\[4pt] \frac{1}{2\pi}\, \ln|x - a| & \text{for } n = 2 \end{cases} ,   (6.27)

where vn is the surface “area” of the (n − 1)-dimensional sphere, S n−1 (and the constants have been
included for later convenience). In electromagnetism, Ga corresponds to the electrostatic potential of a
point charge located at a. Clearly, Ga is well-defined for all x ≠ a. It is straightforward to verify by direct
calculation that
∆Ga = 0 for all x ≠ a .   (6.28)

Exercise 6.2. Show that the gradient of the Newton potentials (6.27) is given by
\nabla G_a(x) = \frac{1}{v_n}\, \frac{x - a}{|x - a|^n} .   (6.29)

Also, verify that the Newton potentials satisfy the homogeneous Laplace equation for all x ≠ a.

Lemma 6.1. For f ∈ C 1 (U ) and a ∈ U we have


Z
f (a) = lim (f (x)∇Ga (x) − Ga (x)∇f (x)) · dS . (6.30)
→0 |x−a|=

Proof. With dS = n dS and the unit normal vector n to the sphere given by n = (x − a)/|x − a| we have

\nabla G_a\cdot dS = \frac{1}{v_n}\, \frac{1}{|x-a|^{n-1}}\, dS .   (6.31)

This gives for the first part of the above integral


\lim_{\epsilon\to 0} \int_{|x-a|=\epsilon} f(x)\, \nabla G_a(x)\cdot dS = \lim_{\epsilon\to 0} \frac{1}{v_n\, \epsilon^{n-1}} \int_{|x-a|=\epsilon} f(x)\, dS \stackrel{y=(x-a)/\epsilon}{=} \frac{1}{v_n} \lim_{\epsilon\to 0} \int_{|y|=1} f(a + \epsilon y)\, dS = f(a) .   (6.32)

For the second integral, using that |∇f(x) · n| ≤ K for some constant K, we have

\left| \int_{|x-a|=\epsilon} G_a(x)\, \nabla f(x)\cdot dS \right| \leq \text{const}\; \epsilon^{2-n} \int_{|x-a|=\epsilon} dS = \text{const}\; \epsilon \int_{|y|=1} dS \;\xrightarrow{\;\epsilon\to 0\;}\; 0 ,   (6.33)

and this completes the proof.

This Lemma was the technical preparation for the following important statement.

Theorem 6.3. Let ρ ∈ Cc2 (Rn ) and define the function


Z
φ(x) := dy n G(x − y)ρ(y) (6.34)
Rn

for all x ∈ Rn . Then ∆φ = ρ, that is, the above φ satisfies the inhomogeneous Laplace equation with
source ρ.

Proof. Introducing the coordinate z = y − x, a region Vε = {z ∈ Rn | ε ≤ |z| ≤ R} with R so large that
ρ(x + z) = 0 for |z| > R (which is possible since ρ has compact support) and ρx(z) := ρ(x + z), we have

\Delta\phi(x) = \int_{\mathbb{R}^n} d^n z\; G(z)\, \Delta\rho(x + z) = \lim_{\epsilon\to 0} \int_{V_\epsilon} d^n z\; G\, \Delta\rho_x = \lim_{\epsilon\to 0} \int_{V_\epsilon} d^n z\; \big( G\, \Delta\rho_x - \rho_x\, \Delta G\big)
= \lim_{\epsilon\to 0} \int_{\partial V_\epsilon} \big( G\, \nabla\rho_x - \rho_x\, \nabla G\big)\cdot dS = \rho_x(0) = \rho(x) ,   (6.35)

where we have used Green’s formula (6.25) and Lemma (6.1).

The above function G is also sometimes referred to as the Green function of the Laplace operator. Of
course, the solution (6.34) is not unique but we know that two solutions to the inhomogeneous Laplace
equation differ by a solution to the homogeneous one. Hence, the general solution to the inhomogeneous
Laplace equation can be written as
\phi(x) = \phi_H(x) + \int_{\mathbb{R}^n} d^n y\; G(x - y)\, \rho(y) \qquad \text{where} \quad \Delta\phi_H = 0 .   (6.36)
The homogeneous solution φH can be used to satisfy the boundary conditions on φ. Note that the
requirement on ρ to have compact support also makes physical sense: normally charge or mass distributions
are localised in space.

6.2.2 Maximum principle and uniqueness


Via Eq. (6.36), we have now reduced the problem of solving the inhomogeneous Laplace equation to that of
solving the homogeneous Laplace equation and this is what we discuss next. A twice differentiable function
φ which solves the homogeneous Laplace equation,
∆φ = 0 , (6.37)
is called a harmonic function. Harmonic functions have a number of remarkable properties which we now
derive. We begin with another technical Lemma.
Lemma 6.2. For U ⊂ Rn open, V ⊂ U compact with smooth boundary ∂V and φ harmonic on V̊ := V \∂V
we have

\int_{\partial\mathcal{V}} \big( \phi\, \nabla G_a - G_a\, \nabla\phi\big)\cdot dS = \begin{cases} \phi(a) & \text{for } a \in \mathring{\mathcal{V}} \\ 0 & \text{for } a \in \mathbb{R}^n \setminus \mathcal{V} \end{cases}   (6.38)
Proof. For the second case, a ∈ Rn \ V we have from Green’s formula
\int_{\partial\mathcal{V}} \big( \phi\, \nabla G_a - G_a\, \nabla\phi\big)\cdot dS = \int_{\mathcal{V}} \big( \phi\, \Delta G_a - G_a\, \Delta\phi\big)\, dV = 0 ,   (6.39)
since ∆Ga = ∆φ = 0 for all x ∈ V.
For the first case, we define Vε = V \ Bε(a), that is, we excise a ball with radius ε around a. Just like
above it follows from Green's formula that

\int_{\partial\mathcal{V}_\epsilon} \big( \phi\, \nabla G_a - G_a\, \nabla\phi\big)\cdot dS = 0 .   (6.40)

Since the boundary ∂Vε consists of the two components ∂V and ∂Bε(a) this implies

\int_{\partial\mathcal{V}} \big( \phi\, \nabla G_a - G_a\, \nabla\phi\big)\cdot dS = \int_{\partial B_\epsilon(a)} \big( \phi\, \nabla G_a - G_a\, \nabla\phi\big)\cdot dS \;\xrightarrow{\;\epsilon\to 0\;}\; \phi(a) ,   (6.41)

where Lemma 6.1 has been used in the final step. Since the integral on the LHS is independent of ε this
completes the proof.

We are now ready to prove the first important property of harmonic functions.
Theorem 6.4. (Mean value property of harmonic functions) Let U ⊂ Rn be open, φ harmonic on U and
Br (a) ⊂ U . Then
\phi(a) = \frac{1}{v_n} \int_{|y|=1} \phi(a + r y)\, dS   (6.42)
Proof. From the previous Lemma we have
\phi(a) = \int_{|x-a|=r} \big( \phi\, \nabla G_a - G_a\, \nabla\phi\big)\cdot dS ,   (6.43)

and the first part of this integral


1 1
Z Z Z
y=(x−a)/r
φ∇Ga · ndS = φ(x)dS = φ(a + ry)dS (6.44)
|x−a|=r vn rn−1 |x−a|=r vn |y|=1

already gives the desired result. It remains to be shown that the second part of the integral vanishes.
Since Ga is constant for |x − a| = r it can be pulled out of the integral, so it is sufficient to show that

\int_{|x-a|=r} \big( 1\, \nabla\phi - \phi\, \nabla 1\big)\cdot dS = \int_{|x-a|\leq r} \big( 1\, \Delta\phi - \phi\, \Delta 1\big)\, dV = 0 ,   (6.45)

which does indeed vanish from Green’s theorem.

An important consequence of this property is


Theorem 6.5. (Maximum principle for harmonic functions) Let U ⊂ Rn be open and (path-) connected
and φ a harmonic function on U . If φ assumes its maximum for a point a ∈ U then φ is constant.
Proof. We set M := sup{φ(x) | x ∈ U} and assume that φ(x) = M for some x ∈ U. We start by showing
that this implies φ is constant on a ball Bε(x) ⊂ U. From the mean value property it follows for all r with
0 < r < ε that

M = \phi(x) = \frac{1}{v_n} \int_{|y|=1} \phi(x + r y)\, dS \quad\Leftrightarrow\quad \int_{|y|=1} \big( M - \phi(x + r y)\big)\, dS = 0 .   (6.46)

Since M − φ(x + ry) ≥ 0 this implies that φ(x + ry) = M for all |y| = 1 and, hence, φ(y) = M for all
y ∈ Bε(x).
To extend the statement to all of U we assume there exists an a ∈ U with φ(a) = M. Assume that φ
is not constant on U so there is a b ∈ U with φ(b) < M and choose a (continuous) path α : [0, 1] → U
which connects a and b, that is, α(0) = a and α(1) = b. Let t0 = sup{t ∈ [0, 1] | φ(α(t)) = M} be the
“upper value” of t along the path for which the maximum is assumed. Since φ(b) < M necessarily t0 < 1
and since φ ◦ α is continuous we have φ(α(t0)) = M. But from the first part of the proof we know there
is a ball Bε(α(t0)) where φ equals M which is in contradiction with t0 being the supremum. Hence, the
assumption φ(b) < M was incorrect.

Corollary 6.1. Let U ⊂ Rn be a bounded, connected open set and φ be harmonic on U and continuous
on Ū . Then φ assumes its maximum and minimum on the boundary of U .
Proof. Since Ū is compact φ assumes its maximum on Ū. If the maximum point is on the boundary ∂Ū
then the statement is true. If it is in U then, from the previous theorem, φ is constant on U and, hence,
by continuity, constant on Ū. Therefore it also assumes its maximum on the boundary. The corresponding
statement for the minimum follows by considering −φ.

These innocent-sounding statements have important implications for boundary value problems. Suppose
we have an open, bounded and connected set V ⊂ U ⊂ Rn and we would like to solve the Dirichlet
boundary value problem
∆φ = ρ , φ|∂V = h , (6.47)
where ρ is a given function on U and h is a given function on the boundary ∂V which prescribes the
boundary values of φ. Suppose we have two solutions φ1 and φ2 to this problem. Then the difference
φ := φ1 − φ2 satisfies
∆φ = 0 , φ|∂V = 0 . (6.48)
Since φ is harmonic it assumes its maximum and minimum on the boundary and since the boundary
values are zero we conclude that φ = 0. This means that the solution to the boundary value problem, if
it exists, is unique.
We also have a uniqueness statement for our solution (6.34).

Corollary 6.2. For ρ ∈ Cc2 (Rn ) and n ≥ 3 the equation ∆φ = ρ has a unique solution φ : Rn → R with
lim|x|→∞ |φ(x)| = 0. This solution is given by Eq. (6.34).

Proof. The solution (6.34) has the desired property at infinity since lim|x|→∞ Ga (x) = 0 for n ≥ 3. To
show uniqueness assume another solution, φ̃, with the same property and consider the difference ψ = φ− φ̃.
We have
\Delta\psi = 0 , \qquad \lim_{|x|\to\infty} |\psi(x)| = 0 .   (6.49)

From the vanishing property at infinity, we know that for every ε > 0, there exists a radius R such that
|ψ(x)| ≤ ε for all |x| ≥ R. But the restricted function ψ|BR(0) assumes its maximum and minimum on the
boundary so it follows that |ψ(x)| ≤ ε for all x ∈ Rn. Since ε > 0 was arbitrary this means that ψ = 0.

6.2.3 Uniqueness - another approach


There is another way to make the uniqueness argument which can also be directly applied to the case of
boundary value problems with von Neumann boundary conditions

∆φ = ρ , n · ∇φ|∂V = h . (6.50)

Consider a harmonic function φ on U and set f = g = φ in Green’s first identity (6.24). This results in
\int_{\mathcal{V}} |\nabla\phi|^2\, dV = \int_{\partial\mathcal{V}} \phi\, \nabla\phi\cdot n\; dS ,   (6.51)

and, hence,
∆φ = 0 , φ|∂V = 0 =⇒ φ=0
(6.52)
∆φ = 0 , ∇φ · n|∂V = 0 =⇒ φ = const .
Applied to the difference φ = φ1 − φ2 of two solutions φ1 , φ2 to the Dirichlet problem (6.47) this result
implies φ1 = φ2 , so uniqueness of the solution. Applying it to the difference φ = φ1 −φ2 of two solutions to
the von Neumann problem (6.50) gives φ1 = φ2 + const, so uniqueness up to an additive constant (which
does not change E = −∇φ).
After this somewhat theoretical introduction we now proceed to the problem of how to solve Laplace’s
equation in practice, starting with the two-dimensional case.

6.3 Laplace equation in two dimensions
At first sight, the two-dimensional Laplace equation seems of little physical interest - after all physical
space has three dimensions. However, there are many problems, for example, in electrostatics, which are
effectively two-dimensional due to translational symmetry in one direction. (Think, for example, of the
field produced by a long charged wire along the z-axis.)
We denote the Cartesian coordinates by x = (x, y)T and also introduce complex coordinates z = x + iy
and z̄ = x − iy. Recall that the two-dimensional Laplacian can be written as
\Delta = \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2} = 4\, \frac{\partial^2}{\partial\bar{z}\, \partial z} .   (6.53)

6.3.1 Complex methods


We have already pointed out that every holomorphic function w = w(z) solves the two-dimensional Laplace
equation. Of course, normally we are interested in real-valued solutions but, since the Laplacian is a real
operator, both the real and imaginary part of w(z) are also harmonic. To make this explicit, we write
w = u + iv . (6.54)
If w is holomorphic then ∂w/∂z̄ = 0 and using the derivatives (6.7) together with the decomposition (6.54)
this translates into the Cauchy-Riemann equations

\frac{\partial u}{\partial x} = \frac{\partial v}{\partial y} , \qquad \frac{\partial u}{\partial y} = -\frac{\partial v}{\partial x} .   (6.55)
These equations immediately imply that
∇u · ∇v = 0 , (6.56)
which means that the curves u = const and v = const are perpendicular to one another. Furthermore, we
have
∆w = 0 ⇒ ∆u = ∆v = 0 . (6.57)
Our strategy for solving the two-dimensional Laplace equation is based on these simple equations and is
probably best explained by an example.

Application 6.22. Solving the two-dimensional Laplace equation with complex methods
Suppose we want to solve Laplace’s equation in the positive quadrant {(x, y) | x ≥ 0, y ≥ 0} and
we impose Dirichlet boundary conditions φ(0, y) = φ(x, 0) = 0 along the positive x and y axis. (See
Fig. 14.) It is clear that the holomorphic function w = z 2 has a vanishing imaginary part along
the real and imaginary axis (just insert z = x and z = iy to check this) and, hence, the choice
φ = v = Im(z 2 ) = 2xy leads to a harmonic function with the desired boundary property. On the
other hand, if we had imposed Neumann boundary conditions ∂φ ∂φ
∂x (0, y) = ∂y (x, 0) = 0 along the
positive x and y axis, the real part of w (having perpendicular equipotential lines) leads to a viable
solution φ = u = Re(z 2 ) = x2 − y 2 . (Of course any even power of z would also do the job so the
solution is not unique. This is because we haven’t specified boundary conditions at infinity.) The
equipotential lines for both solutions are shown in Fig. 15.
For another example, consider solving Laplace’s equation on U = {z ∈ C | |z| > 1} with Dirichlet
boundary condition φ||z|=1 = 0. (See Fig. 14.) It is clear that the function w = z+z −1 is real for |z| = 1
(and it is holomorphic on U ) so with z = reiϕ we have a solution φ = v = Im(z+z −1 ) = 2(r−r−1 ) sin ϕ.
(Again, this is not unique since we have to specify another boundary condition, for example at infinity.)

The equipotential lines for this solution are shown in Fig. 15. The solution for |z| ≤ 1 is of course
φ = 0, the unique solution consistent with the boundary conditions at |z| = 1.


Figure 14: Examples of boundary conditions for two-dimensional Laplace equations.


Figure 15: Equipotential lines for φ(x, y) = xy (left) and φ(x, y) = x2 −y 2 (middle) and φ(x, y) = Im(z+z −1 ).
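As a small illustrative check (not part of the original notes, assuming SymPy), one can verify the two properties used in the example above, namely that u = Re(z²) and v = Im(z²) are harmonic and that their level curves intersect at right angles:

# Harmonicity and orthogonality of level curves for w = z^2 (assumes SymPy).
import sympy as sp

x, y = sp.symbols('x y', real=True)
z = x + sp.I * y
u, v = sp.re(sp.expand(z**2)), sp.im(sp.expand(z**2))     # u = x^2 - y^2, v = 2xy

lap = lambda f: sp.diff(f, x, 2) + sp.diff(f, y, 2)
grad = lambda f: sp.Matrix([sp.diff(f, x), sp.diff(f, y)])

print(lap(u), lap(v))                        # -> 0 0, so both are harmonic
print(sp.simplify(grad(u).dot(grad(v))))     # -> 0, equipotential lines are perpendicular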

6.3.2 Separation of variables


Separation of variables is a general technique for solving differential equations which is based on factoring
the problem into one-dimensional differential equations. It is useful to demonstrate the technique for the
simple example of the two-dimensional homogeneous Laplace equation

\Delta\phi = 0 , \qquad \Delta = \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2} ,   (6.58)
in Cartesian coordinates. We start by considering solutions of the separated form

φ(x, y) = X(x)Y (y) , (6.59)

where X = X(x) and Y = Y (y) are functions of their indicated arguments only. Inserting this Ansatz
into Eq. (6.58) gives
\underbrace{\frac{X''}{X}(x)}_{=-\alpha^2} + \underbrace{\frac{Y''}{Y}(y)}_{=\alpha^2} = 0 .   (6.60)

The argument goes that the two terms, being functions of different variables, can only add up to zero
if they are equal to constants α2 and −α2 individually, as indicated above. Solving the resulting two
ordinary differential equations
X 00 = −α2 X , Y 00 = α2 Y , (6.61)
results in the solutions

X(x) = aα cos(αx) + bα sin(αx) , Y (y) = cα eαy + dα e−αy , (6.62)

where aα , bα , cα and dα are arbitrary constants. This by itself gives a rather special solution to the
equation but it does so for every choice of the constant α. Since the equation we are solving is linear we
can, hence, construct more general solutions by linearly combining solutions of the above type for different
values of α. This leads to
\phi(x, y) = \sum_{\alpha} \big( a_\alpha \cos(\alpha x) + b_\alpha \sin(\alpha x)\big)\big( c_\alpha\, e^{\alpha y} + d_\alpha\, e^{-\alpha y}\big) ,   (6.63)

where the sum ranges over some suitable set of α values. Of course this is a large class of solutions which
can be narrowed down, or be made unique by imposing boundary conditions. Whether this works out in
practice depends on the type of boundary conditions and if they are “compatible” with the chosen set of
coordinates and resulting solution. Here, we are working with Cartesian coordinates and this goes well
together with boundary conditions imposed along lines with x = const and y = const. More generally,
building in boundary conditions tends to be easiest if coordinates are chosen such that the boundaries are
defined by one of the coordinates being constants. For example, polar or spherical coordinates go well
with imposing boundary conditions on circles or spheres, as we will see below.

Application 6.23. Rectangular boundary conditions with separation of variables


To see how this works in practice, consider solving the problem on the rectangle V = [0, a] × [0, b] with
φ vanishing on all sides of the rectangle except at y = b where we impose the boundary condition
φ(x, b) = h(x) for some given function h. First consider the boundary conditions φ(0, y) = φ(a, y) = 0
which we can satisfy by setting aα = 0 and α = πk/a. Further, satisfying φ(x, 0) = 0 can be achieved
by setting cα = −dα = 1/2. Putting this together we end up with
\phi(x, y) = \sum_{k=1}^{\infty} b_k\, \sin\!\left(\frac{k\pi x}{a}\right) \sinh\!\left(\frac{k\pi y}{a}\right) ,   (6.64)

which, for any fixed y, is a sine Fourier series on the interval [0, a]. This already indicates how we
built in the final boundary condition φ(x, b) = h(x). Setting y = b in the above formula, we can
determine the coefficients simply by standard (sine) Fourier series techniques and obtain
b_k = \frac{2}{a\, \sinh\!\left(\frac{k\pi b}{a}\right)} \int_0^a dx\; h(x)\, \sin\!\left(\frac{k\pi x}{a}\right) .   (6.65)

For any given boundary potential h these coefficients can be calculated and inserting these back into
Eq. (6.64) gives the complete solution.
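The following sketch (an illustration added here, not part of the original notes, assuming SciPy) carries out the steps (6.65) and (6.64) numerically for a hypothetical boundary potential h:

# Dirichlet problem on the rectangle [0,a] x [0,b] with phi(x,b) = h(x), phi = 0 elsewhere.
import numpy as np
from scipy.integrate import quad

a, b, K = 1.0, 1.0, 40
h = lambda x: x * (a - x)                 # sample boundary potential (hypothetical choice)

def b_k(k):
    integral, _ = quad(lambda x: h(x) * np.sin(k * np.pi * x / a), 0, a)
    return 2.0 / (a * np.sinh(k * np.pi * b / a)) * integral      # Eq. (6.65)

bk = [b_k(k) for k in range(1, K + 1)]

def phi(x, y):                            # Eq. (6.64)
    return sum(c * np.sin(k * np.pi * x / a) * np.sinh(k * np.pi * y / a)
               for k, c in zip(range(1, K + 1), bk))

print(phi(0.5, b), h(0.5))                # the truncated series reproduces h on the side y = b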

6.3.3 Polar coordinates


From Eq. (6.11) the two-dimensional Laplacian in polar coordinates reads
\Delta = \frac{1}{r}\, \frac{\partial}{\partial r}\left( r\, \frac{\partial}{\partial r}\right) + \frac{1}{r^2}\, \frac{\partial^2}{\partial\varphi^2} .   (6.66)
where r ∈ [0, ∞) and ϕ ∈ [0, 2π). Just as in the Cartesian case, we can try to solve the two-dimensional
Laplace equation in polar coordinates by separation of variables as in the following
Exercise 6.6. Solve the two-dimensional homogeneous Laplace equation in polar coordinates by separation
of variables.
However, there is a more systematic way forward. For any fixed radius r > 0, a function φ(r, ϕ) can be
thought of as a function on the circle S 1 and can, hence, be expanded in a Fourier series. In other words,
we can write

\phi(r, \varphi) = \frac{A_0(r)}{2} + \sum_{k=1}^{\infty} \big( A_k(r)\, \cos(k\varphi) + B_k(r)\, \sin(k\varphi)\big) ,   (6.67)
where the Fourier coefficients Ak (r) and Bk (r) can of course change with the radius. Note, there is no
assumption involved yet. Eq. (6.67) still represents a general function. Inserting (6.67) into the Laplace
equation ∆φ = 0 gives
\frac{1}{2r}\big( r A_0'\big)' + \sum_{k=1}^{\infty} \left[ \frac{1}{r}\big( r A_k'\big)' - \frac{k^2}{r^2}\, A_k\right] \cos(k\varphi) + \sum_{k=1}^{\infty} \left[ \frac{1}{r}\big( r B_k'\big)' - \frac{k^2}{r^2}\, B_k\right] \sin(k\varphi) = 0 .   (6.68)

This is a Fourier series which must be identical to zero so all the Fourier coefficients must vanish. This
leads to a set of ordinary differential equations
\big( r A_0'\big)' = 0 , \qquad r\big( r A_k'\big)' = k^2 A_k , \qquad r\big( r B_k'\big)' = k^2 B_k ,   (6.69)
for Ak and Bk . They are easy to solve and lead to
A_0(r) = a_0 + \tilde{a}_0 \ln r , \qquad A_k(r) = a_k r^k + \tilde{a}_k r^{-k} , \qquad B_k(r) = b_k r^k + \tilde{b}_k r^{-k} .   (6.70)
Inserting these results back into Eq. (6.67) gives for the general solution of the two-dimensional homoge-
neous Laplace equation in polar coordinates
\phi(r, \varphi) = \frac{a_0}{2} + \frac{\tilde{a}_0}{2} \ln r + \sum_{k=1}^{\infty} \big( a_k r^k + \tilde{a}_k r^{-k}\big) \cos(k\varphi) + \sum_{k=1}^{\infty} \big( b_k r^k + \tilde{b}_k r^{-k}\big) \sin(k\varphi) .   (6.71)

The coefficients ak , bk , ãk and b̃k are arbitrary at this stage and have to be fixed by boundary conditions.

Application 6.24. Two-dimensional Laplace equation with circular boundary conditions


For example, consider solving the problem on the unit disk {(r, ϕ) | r ≤ 1} with the boundary condition
φ(1, ϕ) = h(ϕ), where h is a given function on S 1 . Since the origin is in this region we do not want
any negative powers of r for a non-singular solution, so ãk = b̃k = 0. Then, the boundary condition
at r = 1 reads

\phi(1, \varphi) = \frac{a_0}{2} + \sum_{k=1}^{\infty} \big( a_k \cos(k\varphi) + b_k \sin(k\varphi)\big) \stackrel{!}{=} h(\varphi) ,   (6.72)

This is simply the Fourier series for the function h and we can find the Fourier coefficients ak and bk
by the usual formulae (3.22).
Now consider solving the problem for the same boundary condition φ(1, ϕ) = h(ϕ) but for the
“exterior” region {(r, ϕ) | r ≥ 1} imposing, in addition, that φ remains finite as r → ∞. The last
condition demands that ã0 = 0 and ak = bk = 0 for k = 1, 2, . . . so we have

\phi(1, \varphi) = \frac{a_0}{2} + \sum_{k=1}^{\infty} \big( \tilde{a}_k \cos(k\varphi) + \tilde{b}_k \sin(k\varphi)\big) \stackrel{!}{=} h(\varphi) .   (6.73)

As before, this is a Fourier series for h and we can determine the Fourier coefficients by the standard
formulae (3.22). So the full solution for φ is
\phi(r, \varphi) = \begin{cases} \frac{a_0}{2} + \sum_{k=1}^{\infty} \big( a_k r^k \cos(k\varphi) + b_k r^k \sin(k\varphi)\big) & \text{for } r \leq 1 \\[4pt] \frac{a_0}{2} + \sum_{k=1}^{\infty} \big( a_k r^{-k} \cos(k\varphi) + b_k r^{-k} \sin(k\varphi)\big) & \text{for } r \geq 1 \end{cases} ,   (6.74)

where ak and bk are the Fourier coefficients of the function h. Note that the two parts of this solution
fit together at r = 1 as they must.
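As an added illustration (not part of the original notes, assuming SciPy), the interior branch of (6.74) can be evaluated numerically once the ordinary Fourier coefficients of h have been computed:

# Interior solution on the unit disk for a sample boundary potential h(phi).
import numpy as np
from scipy.integrate import quad

K = 20
h = lambda t: np.cos(t)**2                # sample boundary data (hypothetical choice)

a0 = quad(h, 0, 2 * np.pi)[0] / np.pi
ak = [quad(lambda t: h(t) * np.cos(k * t), 0, 2 * np.pi)[0] / np.pi for k in range(1, K + 1)]
bk = [quad(lambda t: h(t) * np.sin(k * t), 0, 2 * np.pi)[0] / np.pi for k in range(1, K + 1)]

def phi_inside(r, t):                     # valid for r <= 1, first branch of Eq. (6.74)
    return a0 / 2 + sum(r**k * (ak[k - 1] * np.cos(k * t) + bk[k - 1] * np.sin(k * t))
                        for k in range(1, K + 1))

print(phi_inside(1.0, 0.3), h(0.3))       # boundary values are reproduced at r = 1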

6.4 Laplace equation on the two-sphere


The Laplacian and the Laplace equation on the two-sphere are of significance for a number of reasons.
First of all, we have seen in Eq. (6.21) that the three-dimensional Laplacian in spherical coordinates can
be expressed in terms of the Laplacian on S 2 plus a radial piece. Also, as we will discover later, the
Laplacian on the two-sphere is closely connected to the mathematics of the group of rotations. More
practically, two-spheres are all around us - quite literally so in the case of the celestial two-sphere.

6.4.1 Functions on S 2
We recall that the two-sphere is usually parametrised by two angles (θ, ϕ) ∈ [0, π]×[0, 2π[, as in Eq. (6.18).
Alternatively and often more conveniently, we can use the coordinates (x, ϕ) ∈ [−1, 1] × [0, 2π[ where
x = cos θ. Sometimes it is also useful to parametrise the two-sphere by unit vectors n ∈ R3 . In terms of
the coordinates (x, ϕ) the Laplacian (6.20) takes the form

\Delta_{S^2} = (1 - x^2)\, \frac{\partial^2}{\partial x^2} - 2x\, \frac{\partial}{\partial x} + \frac{1}{1 - x^2}\, \frac{\partial^2}{\partial\varphi^2} .   (6.75)

We should add a word of caution about functions f : S 2 → F on the two-sphere. In practice, we describe
these as functions f = f (x, ϕ) of the coordinates but not all of these are well-defined on S 2 . First of all,
a continuous f needs to be periodic in ϕ, so f (θ, 0) = f (θ, 2π). There is another, more basic condition
which arises because the parametrisation (6.18) breaks down at (x, ϕ) = (±1, ϕ) which correspond to the
same two points (the north and the south pole) for all values of ϕ. Hence, a function f = f (x, ϕ) is only
well-defined on S 2 if f (±1, ϕ) is independent of ϕ. So, for example, f (x, ϕ) = x sin ϕ is not well-defined
on S 2 while f (x, ϕ) = (1 − x2 ) sin ϕ is. This discussion can be summarised by saying that we can expand
a function f on the two-sphere in a Fourier series
f(x, \varphi) = \sum_{m\in\mathbb{Z}} y_m(x)\, e^{im\varphi} ,   (6.76)

with ym(±1) = 0 for all m ≠ 0.

Another useful observation is that as an operator on the inner product space C ∞ (S 2 ) with scalar
product

\langle f, h\rangle_{S^2} = \int_{S^2} f(x)^*\, h(x)\, dS , \qquad dS = \sin\theta\, d\theta\, d\varphi = dx\, d\varphi ,   (6.77)

the Laplacian ∆S² is self-adjoint. This is most elegantly seen by using the general formulae from
Proposition 6.1:

\langle f, \Delta h\rangle_{S^2} = \int_{S^2} f^*\, \Delta h\, dS = \int_{V} f(t)^*\, \frac{1}{\sqrt{g}}\, \frac{\partial}{\partial t_i}\left( \sqrt{g}\, G^{ij}\, \frac{\partial h}{\partial t_j}\right)(t)\, \sqrt{g}\, d^k t = \int_{S^2} (\Delta f)^*\, h\, dS = \langle \Delta f, h\rangle_{S^2} .   (6.78)
Hence, we know that the eigenvalues of ∆S 2 are real and eigenvectors for different eigenvalue are orthogonal
relative to the above inner product.

6.4.2 Eigenvalue problem for the Laplacian on S 2


Solving the eigenvalue problem
∆S 2 f = λf , (6.79)
is immensely useful and our main task. Inserting the expansion (6.76) into the eigenvalue equation (6.79)
and using the form (6.75) of the Laplacian it is easy to see that the functions ym have to satisfy the
differential equation
(1 - x^2)\, y_m'' - 2x\, y_m' + \left( -\lambda - \frac{m^2}{1 - x^2}\right) y_m = 0 .   (6.80)
Comparison with Eq. (4.43) shows that this is precisely of the same form as the differential equation
solved by the associated Legendre polynomials Plm with eigenvalue λ = −l(l + 1), where l = 0, 1, . . . and
m = −l, . . . , l. Of course Eq. (6.80) has another solution for λ = −l(l + 1) and solutions for other values
of λ. However, it can be checked, for example using the power series method explained earlier, that the
Plm are the only solutions which are suitable to define functions on S 2 . Conversely, Eq. (4.41) shows that
Plm(±1) = 0 for all m ≠ 0 so they do have the required behaviour for functions on S 2 . The conclusion is
that the eigenfunctions and eigenvalues of the Laplacian on the two-sphere are

P_l^m(x)\, e^{im\varphi} , \qquad \lambda = -l(l+1) , \qquad l = 0, 1, \ldots , \quad m = -l, \ldots, l .   (6.81)

It is customary to include a suitable normalisation factor and define the spherical harmonics
Y_l^m(\theta, \varphi) = \sqrt{\frac{2l+1}{4\pi}\, \frac{(l-m)!}{(l+m)!}}\; P_l^m(\cos\theta)\, e^{im\varphi} , \qquad l = 0, 1, \ldots , \quad m = -l, \ldots, l .   (6.82)

We also note the relation

Y_l^0(\theta, \varphi) = \sqrt{\frac{2l+1}{4\pi}}\; P_l(\cos\theta) ,   (6.83)
between the Legendre polynomials and the spherical harmonics with m = 0.

Exercise 6.7. Show that Ylm = (−1)m (Yl−m )∗ . Also show that the first few spherical harmonics are given
by

Y_0^0 = \frac{1}{\sqrt{4\pi}} , \qquad Y_1^{\pm 1} = \mp\sqrt{\frac{3}{8\pi}}\, \sin\theta\, e^{\pm i\varphi} , \qquad Y_1^0 = \sqrt{\frac{3}{4\pi}}\, \cos\theta .   (6.84)

From what we have seen, the Ylm are eigenfunctions

∆S 2 Ylm = −l(l + 1)Ylm (6.85)

of ∆S 2 and all Ylm for m = −l, . . . , l have the same eigenvalue λ = −l(l + 1) which, hence, has degeneracy
2l + 1. We already know that Y_l^m and Y_{l'}^{m'} must be orthogonal for l ≠ l' but, in fact, due to the orthogonality
of the e^{imϕ} functions the Y_l^m form an orthogonal system. A detailed calculation, based on Eq. (4.44),
shows that

\langle Y_l^m , Y_{l'}^{m'}\rangle_{S^2} = \delta_{ll'}\, \delta^{mm'} ,   (6.86)
so they form an ortho-normal system on L2 (S 2 ). In fact, we have
Theorem 6.8. The spherical harmonics Ylm form an orthogonal basis on L2 (S 2 ).
Proof. The proof can, for example, be found in Ref. [7].

This means every function f ∈ L2 (S 2 ) can be expanded in terms of spherical harmonics as


∞ X
X l Z
f= alm Ylm , alm = hYlm , f iS 2 = (Ylm )∗ f dS . (6.87)
l=0 m=−l S2

If f happens to be independent of the angle ϕ then we only need the m = 0 terms in the above expansion
and we can write
f(x) = \sum_{l=0}^{\infty} a_l\, P_l(\cos\theta) , \qquad a_l = \frac{2l+1}{2} \int_{-1}^{1} dx\; P_l(x)\, f(x) .   (6.88)
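As an illustrative sketch (not from the original notes, assuming SciPy, and using the Legendre normalisation quoted in (6.88) above), the coefficients of a ϕ-independent function can be computed numerically; the sample choice f(x) = x² should give f = P0/3 + 2P2/3:

# Legendre coefficients a_l = (2l+1)/2 * int_{-1}^{1} P_l(x) f(x) dx for f(x) = x^2.
import numpy as np
from scipy.integrate import quad
from scipy.special import eval_legendre

f = lambda x: x**2                         # sample function (hypothetical choice)

def a_l(l):
    integral, _ = quad(lambda x: eval_legendre(l, x) * f(x), -1, 1)
    return (2 * l + 1) / 2 * integral

print([round(a_l(l), 4) for l in range(4)])   # -> [0.3333, 0.0, 0.6667, 0.0]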

6.4.3 Multipole expansion


A two-sphere can be parametrised by the set of all three-dimensional unit vectors n. Consider two such
unit vectors n and n0 and the functions Pl (n · n0 ), where Pl are the Legendre polynomials. For fixed n0
(say) these functions should have an expansion of the type (6.87) and the precise form of this expansion
is given in the following
Lemma 6.3. For two unit vectors n, n0 ∈ R3 we have
P_l(n\cdot n') = \frac{4\pi}{2l+1} \sum_{m=-l}^{l} Y_l^m(n')^*\, Y_l^m(n) .   (6.89)

Proof. Let the function F : S 2 × S 2 → C be defined by the RHS of Eq. (6.89). Our discussion of rotations
in Section 9.4 will show that this functions has the property F (Rn0 , Rn) = F (n0 , n), for any rotation R,
so F is invariant under simultaneous rotation of its two arguments. Now, let R be a rotation such that
Rn0 = e3 , so that F (e3 , Rn) = F (n0 , n). The vector e3 is the north pole of S 2 , and corresponds to the
coordinate value x' = cos θ' = 1. Hence, Y_l^m(e3) = 0 for m ≠ 0 and

Y_l^0(e_3) = \sqrt{\frac{2l+1}{4\pi}}\; P_l(1) = \sqrt{\frac{2l+1}{4\pi}} .   (6.90)
Inserting this into the definition of F (the RHS of Eq. (6.89)) we find that F (e3 , ñ) = Pl (cos θ̃) = Pl (ñ·e3 ).
From this special result for F we can re-construct the entire function using the rotational invariance:

F (n0 , n) = F (e3 , Rn) = Pl ((Rn) · e3 ). = Pl (n · (RT e3 )) = Pl (n · n0 ) , (6.91)

which is the desired result.

We can use this formula to re-write the expansion (4.40) of a Coulomb potential in terms of Legendre
polynomials. Setting r = rn, r0 = r0 n0 , cos θ = n · n0 in this formula and using Eq. (6.89) we get
\frac{1}{|r - r'|} = \frac{4\pi}{r} \sum_{l=0}^{\infty} \sum_{m=-l}^{l} \frac{1}{2l+1} \left(\frac{r'}{r}\right)^{l} Y_l^m(n')^*\, Y_l^m(n) .   (6.92)

Let us apply this result to cast the solution (6.34) to the inhomogeneous Laplace equation in a different
form. Specialising to the three-dimensional case, we have seen that the unique solution to ∆3 φ = −4πρ
which approaches zero for |r| → ∞ is given by
\phi(r) = \int_{\mathbb{R}^3} \frac{\rho(r')}{|r - r'|}\; d^3 r'   (6.93)

Inserting the expansion (6.92) this turns into


∞ X
l
qlm Ylm (n)
X Z
φ(r) = 4π , qlm = Ylm (n0 )∗ (r0 )l ρ(r0 ) d3 r0 . (6.94)
2l + 1 rl+1 R3
l=0 m=−l

This is called the multipole expansion and the qlm are called the multipole moments of the source ρ. The
multipole expansion gives φ as a series in inverse powers of the radius r and is, therefore, useful if r is much
larger than the extension of the (localised) source ρ. The l = 0 term which is proportional to 1/r is called
the monopole term and its coefficient q00 is proportional to the total charge of ρ. The l = 1 term which decreases as 1/r^2
is the dipole term with dipole moments q1m , the l = 2 term with a 1/r3 fall-off is the quadrupole term
with quadrupole moments q2m and so on.
Exercise 6.9. Convert the monopole and dipole term in the multipole expansion (6.94) into Cartesian
coordinates (Hint: Use the explicit form of the spherical harmonics in Eq. (6.84).)
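For a discrete set of point charges the integral in (6.94) becomes a sum, q_lm = Σ_i q_i r_i^l Y_l^m(n_i)^*. The following sketch (an illustration added here, not part of the original notes, assuming SciPy) evaluates the first moments for a hypothetical dipole configuration; note that scipy.special.sph_harm takes the azimuthal angle before the polar angle, so Y_l^m(θ, ϕ) in the notation of these notes corresponds to sph_harm(m, l, ϕ, θ):

# Multipole moments of two opposite point charges on the z-axis (assumes SciPy).
import numpy as np
from scipy.special import sph_harm

charges = [(+1.0, np.array([0.0, 0.0, 0.5])), (-1.0, np.array([0.0, 0.0, -0.5]))]

def q_lm(l, m):
    total = 0.0 + 0.0j
    for q, r in charges:
        rad = np.linalg.norm(r)
        theta = np.arccos(r[2] / rad)          # polar angle
        phi = np.arctan2(r[1], r[0])           # azimuthal angle
        total += q * rad**l * np.conj(sph_harm(m, l, phi, theta))
    return total

print(abs(q_lm(0, 0)))   # monopole ~ 0, since the total charge vanishes
print(abs(q_lm(1, 0)))   # dipole moment ~ sqrt(3/(4*pi)), i.e. about 0.489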

6.5 Laplace equation in three dimensions


Many of the methods we discuss can be applied, with suitable modifications, to the two as well as the
three-dimensional case. We denote Cartesian coordinates in R3 by r = (x, y, z) = (xi )i=1,2,3 and recall the
expressions for the three-dimensional Laplacian from the beginning of the section. First, we discuss the
method of image charges which can also be useful in two dimensional cases.

6.5.1 Method of image charges


This method is designed for inhomogeneous problems with non-trivial boundary conditions and the general
idea can be explained based on the expression (6.36) for the general solution in the presence of a source
ρ. The main goal is to choose the homogeneous solution φH in Eq. (6.36) so that the required boundary
condition is satisfied. Suppose we are interested in solving the problem on V ⊂ R3 with a source ρ which
is localised in V and with boundary conditions on ∂V. The idea is now to insert another, “unphysical”
source, ρ̃, into the problem which is localised on R3 \ V. In this case, the potential generated by ρ̃ is of
course harmonic in V and can serve as a candidate φH . The art is to choose the charge distribution ρ̃
appropriately, so that the solution (6.36) satisfies the right boundary condition. This is facilitated by the
idea of “mirroring” the actual source ρ in V by a source ρ̃ in R3 \ V.

Application 6.25. Image charges


The practicalities are probably best explained by examples. First consider a simple case where
V = {(x, y, z) ∈ R3 | x ≥ 0} is the half-space with x ≥ 0 with boundary ∂V = {(0, y, z)} the y-z


Figure 16: Examples for the method of mirror charges.

plane. (See Fig. 16.) We consider a source ρ which corresponds to a single charge q located at
r0 = (a, 0, 0) ∈ V, where a > 0, and demand the boundary condition φ|x=0 = 0. The Coulomb
potential for this charge
q
φI (r) = (6.95)
|r − r0 |
does satisfy the correct equation, ∆φI = −4πρ, but does not satisfy the boundary condition. Suppose,
we introduce the mirror charge density ρ̃ by a single point charge with charge −q located at r̃0 =
(−a, 0, 0) ∈ R3 \ V leading to a potential
\phi_H(r) = -\frac{q}{|r - \tilde{r}_0|}   (6.96)

which satisfies ∆φH = 0 in V. Then


\phi(r) = \phi_H(r) + \phi_I(r) = -\frac{q}{|r - \tilde{r}_0|} + \frac{q}{|r - r_0|}   (6.97)

satisfies ∆φ = −4πρ in V = {x ≥ 0} as well as the required boundary condition φ|x=0 = 0. For a


slightly more complicated example, consider the region V = {r ∈ R3 | |r| ≥ b} outside a sphere with
radius b with the charge density ρ generated by a singe charge q located at r0 = (a, 0, 0), where a > b
and boundary condition φ||r|=b = 0. (See Fig. 16.) We try a single mirror charge inside the sphere
with charge q̃ and located at r̃0 = (ã, 0, 0), where ã < b. Then

\phi(r) = \frac{\tilde{q}}{|r - \tilde{r}_0|} + \frac{q}{|r - r_0|}   (6.98)

satisfies ∆φ = −4πρ outside the sphere. We should try to fix q̃ and ã so that φ||r|=b = 0 and a short
calculation shows this is satisfied if
\tilde{q} = -\frac{b}{a}\, q , \qquad \tilde{a} = \frac{b^2}{a} .   (6.99)
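A quick numerical check of (6.98) with the image charge (6.99) is easily done; the following sketch (an added illustration, not part of the original notes, with hypothetical values for q, a, b) evaluates the potential at random points on the sphere |r| = b:

# Verify that the image-charge potential vanishes on the sphere |r| = b (assumes NumPy).
import numpy as np

q, a, b = 1.0, 2.0, 1.0                        # sample values with a > b
q_img, a_img = -b / a * q, b**2 / a            # Eq. (6.99)
r0, r0_img = np.array([a, 0, 0]), np.array([a_img, 0, 0])

def phi(r):
    return q_img / np.linalg.norm(r - r0_img) + q / np.linalg.norm(r - r0)

rng = np.random.default_rng(0)
for _ in range(3):
    n = rng.normal(size=3); n /= np.linalg.norm(n)
    print(phi(b * n))                          # ~ 0 up to rounding on the sphere |r| = b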

6.5.2 Cartesian coordinates
We would like to solve the three-dimensional homogeneous Laplace equation
\Delta\phi = 0 , \qquad \Delta = \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2} + \frac{\partial^2}{\partial z^2} .   (6.100)
One way to proceed is by a separation Ansatz just as we did in the two-dimensional case with Cartesian
coordinates but here we would like to follow a related but slightly different logic based on a Fourier series
expansion.
Exercise 6.10. Solve the three-dimensional Laplace equation in Cartesian coordinates by separation of
variables.

Application 6.26. Laplacian in a box


Suppose we are interested in a solution in the box V = [0, a] × [0, b] × [0, c] and assume, for now, that
φ = 0 on all boundaries with x = 0, a and y = 0, b. For fixed z, any function φ ∈ L2 ([0, a] × [0, b])
with these boundary conditions can be expanded in a (double) sine Fourier series
\phi(x, y, z) = \sum_{k,l=1}^{\infty} Z_{k,l}(z)\, \sin\!\left(\frac{\pi k x}{a}\right) \sin\!\left(\frac{\pi l y}{b}\right) ,   (6.101)

where Zk,l are functions of z. Inserting this into the Laplace equation gives an ordinary differential
equation

Z_{k,l}'' = \nu_{k,l}^2\, Z_{k,l} , \qquad \nu_{k,l} = \sqrt{ \left(\frac{\pi k}{a}\right)^2 + \left(\frac{\pi l}{b}\right)^2 } ,   (6.102)
for each Zk,l whose general solution is

Zk,l (z) = Ak,l eνk,l z + Bk,l e−νk,l z , (6.103)

with arbitrary constants Ak,l and Bk,l . Combining this result with Eq. (6.101) leads to the general
solution
\phi(x, y, z) = \sum_{k,l=1}^{\infty} \big( A_{k,l}\, e^{\nu_{k,l} z} + B_{k,l}\, e^{-\nu_{k,l} z}\big)\, \sin\!\left(\frac{\pi k x}{a}\right) \sin\!\left(\frac{\pi l y}{b}\right)   (6.104)

of the Laplace equation in the box V = [0, a]×[0, b]×[0, c] which vanishes on the boundaries at x = 0, a
and y = 0, b. To fix the remaining constants we have to specify boundary conditions at z = 0 and
z = c. Suppose we demand that φ(x, y, 0) = 0. This can be achieved by setting Ak,l = −Bk,l =: ak,l /2
so that the solution becomes
\phi(x, y, z) = \sum_{k,l=1}^{\infty} a_{k,l}\, \sinh(\nu_{k,l} z)\, \sin\!\left(\frac{\pi k x}{a}\right) \sin\!\left(\frac{\pi l y}{b}\right) .   (6.105)

Finally, assume for the last boundary at z = c that φ(x, y, c) = h(x, y) for a given function h. Then
setting z = c in Eq. (6.105) is a (double) sine Fourier series for the function h and we can compute
the remaining parameters ak,l by standard Fourier series techniques as
   
a_{k,l} = \frac{4}{ab\, \sinh(\nu_{k,l} c)} \int_{[0,a]\times[0,b]} dx\, dy\; \sin\!\left(\frac{\pi k x}{a}\right) \sin\!\left(\frac{\pi l y}{b}\right) h(x, y) .   (6.106)

Of course this calculation can be repeated, and leads to a similar result, if another one of the six
boundary planes is subject to a non-trivial boundary condition while φ is required to vanish on
the other five boundary planes. In this way we get six solutions, all similar to the above but with
coordinates permuted and constants determined as appropriate. The sum of these six solutions then
solves the homogeneous Laplace equation with non-trivial boundary conditions on all six sides.

6.5.3 Cylindrical coordinates


To solve the Laplace equation in cylindrical coordinates, using the three-dimensional Laplacian given in
Eq. (6.15), we proceed in close analogy with the previous case. We assume that we would like a solution
on the cylinder V = {(r, ϕ, z) | r ∈ [0, a] , z ∈ [0, L]} with radius a and height L and we assume the
boundary conditions are such that φ vanishes on the side of the cylinder, so φ|r=a = 0. Since ϕ is an
angular coordinate we can use a Fourier series for the ϕ dependence and, it turns out, it is useful to use
the Bessel functions Jˆνk , defined in Eq. (5.63), which form an ortho-normal basis of L2 ([0, a]). We recall
from Eq. (5.64) that these functions satisfy the eigenvalue equations

\left[ \frac{1}{r}\, \frac{d}{dr}\left( r\, \frac{d}{dr}\right) - \frac{\nu^2}{r^2}\right] \hat{J}_{\nu k} = -\frac{z_{\nu k}^2}{a^2}\, \hat{J}_{\nu k} ,   (6.107)
where zνk are the zeros of the Bessel functions. The fact that the operator on the LHS is similar to the
radial part of the Laplacian in cylindrical coordinates (6.15) is part of the motivation of why we are using
the Bessel functions. We can then expand
\phi(r, \varphi, z) = \sum_{k=1}^{\infty} \sum_{m=0}^{\infty} Z_{km}(z)\, \hat{J}_{\nu k}(r)\, \big( a_{km}\, \sin(m\varphi) + b_{km}\, \cos(m\varphi)\big)   (6.108)

with as yet undetermined functions Zkm . Inserting this expansion into the Laplace equation in cylindrical
coordinates (6.15) and using the eigenvalue equation (6.107) for the Bessel functions leads to the differential
equation
Z_{km}'' = \frac{z_{mk}^2}{a^2}\, Z_{km} ,   (6.109)
provided the previously arbitrary type, ν, of the Bessel function is fixed to be ν = m. Solving this equation
and inserting the solutions back into the expansion (6.108) gives
\phi(r, \varphi, z) = \sum_{k=1}^{\infty} \sum_{m=0}^{\infty} \left[ A_{km}\, \exp\!\left(\frac{z_{mk}\, z}{a}\right) + B_{km}\, \exp\!\left(-\frac{z_{mk}\, z}{a}\right)\right] \hat{J}_{mk}(r)\, \big( a_{km}\, \sin(m\varphi) + b_{km}\, \cos(m\varphi)\big)   (6.110)
To fix the remaining coefficients we have to impose boundary conditions at the bottom and top of the
cylinder. For example, if we demand at the bottom that φ|z=0 = 0 then
\phi(r, \varphi, z) = \sum_{k=1}^{\infty} \sum_{m=0}^{\infty} \sinh\!\left(\frac{z_{mk}\, z}{a}\right) \hat{J}_{mk}(r)\, \big( a_{km}\, \sin(m\varphi) + b_{km}\, \cos(m\varphi)\big) .   (6.111)

Finally, if we demand the boundary condition φ|z=L = h, for some function h = h(r, ϕ) at the top of
the cylinder the remaining coefficients akm and bkm can be determined by combining the orthogonality
properties (3.22) of the Fourier series with those of the Bessel functions (5.65).
Exercise 6.11. Use orthogonality properties of the Fourier series and the Bessel functions to find expres-
sions for the coefficients amk and bmk in the solution (6.111), so that the boundary condition φ|z=L = h
is satisfied for a function h = h(r, ϕ).

6.5.4 Spherical coordinates
To discuss the three-dimensional Laplace equation in spherical coordinates it is very useful to recall that
the three-dimensional Laplace operator can be written as
 
\Delta_{3,\rm sph} = \frac{1}{r^2}\, \frac{\partial}{\partial r}\left( r^2\, \frac{\partial}{\partial r}\right) + \frac{1}{r^2}\, \Delta_{S^2} ,   (6.112)
where ∆S 2 is the Laplacian on the two-sphere. Also recall that we have the spherical harmonics Ylm which
form an orthonormal basis of L2 (S 2 ) and are eigenfunctions of ∆S 2 , with
∆S 2 Ylm = −l(l + 1)Ylm . (6.113)
All this suggests we should start with an expansion

\phi(r, \theta, \varphi) = \sum_{l=0}^{\infty} \sum_{m=-l}^{l} R_{lm}(r)\, Y_l^m(\theta, \varphi) .   (6.114)

Inserting this expansion into the homogeneous Laplace equation, ∆φ = 0, and using the eigenvector
property (6.113) leads to the differential equations
\frac{d}{dr}\left( r^2\, R_{lm}'\right) = l(l+1)\, R_{lm}   (6.115)
with general solutions
Rlm (r) = Alm rl + Blm r−l−1 , (6.116)
for constants Alm and Blm . Inserting this back into the expansion (6.114) leads to the general solution to
the homogeneous Laplace equation in spherical coordinates:
\phi(r, \theta, \varphi) = \sum_{l=0}^{\infty} \sum_{m=-l}^{l} \big( A_{lm}\, r^l + B_{lm}\, r^{-l-1}\big)\, Y_l^m(\theta, \varphi) .   (6.117)

The arbitrary constants Alm and Blm are fixed by boundary conditions and, given the choice of coordinates,
they are relatively easy to implement if they are imposed on spherical boundaries. We also note that for
problems with azimuthal symmetry, that is, when the boundary conditions and φ are independent of
ϕ, we only require the m = 0 terms in the above expansion. Since the Yl0 are proportional to the
Legendre polynomials this means, after a re-definition of the constants, that, for such problems, we have
the simplified expansion
\phi(r, \theta) = \sum_{l=0}^{\infty} \big( A_l\, r^l + B_l\, r^{-l-1}\big)\, P_l(\cos\theta) .   (6.118)

Application 6.27. Laplace equations with boundary conditions on a sphere


We conclude with an example on how to fix the constants by imposing boundary conditions. Suppose
that V = {(r, θ, ϕ) | r ≤ a} is the ball with radius a and we demand that φ|r=a = h, where h = h(θ, ϕ)
is a given function. Since φ needs to be smooth in the interior of the ball we first conclude that all
terms with negative powers in r in the expansion (6.114) must vanish, so Blm = 0. The remaining
constants Alm can be fixed by imposing φ|r=a = h and using the orthogonality relations (6.87) of the
spherical harmonics. This leads to

h = \phi|_{r=a} = \sum_{l=0}^{\infty} \sum_{m=-l}^{l} A_{lm}\, a^l\, Y_l^m \quad\Rightarrow\quad A_{lm} = \frac{1}{a^l}\, \langle Y_l^m , h\rangle_{S^2} = \frac{1}{a^l} \int_{S^2} (Y_l^m)^*\, h\, dS .   (6.119)

On the other hand, we might be interested in the region outside a sphere of radius a, so V =
{(r, θ, ϕ) | r ≥ a}, with the same boundary condition φ|r=a = h at r = a. In this case we also
have to specify a boundary condition at “infinity” which can, for example, be done by demanding
that φ → 0 as r → ∞. This last condition implies that all terms with positive powers of r in the
expansion (6.114) must vanish, so that Alm = 0. The remaining coefficients Blm are then fixed by
the boundary condition at r = a and are, in analogy with the previous case, given by
B_{lm} = a^{l+1}\, \langle Y_l^m , h\rangle_{S^2} = a^{l+1} \int_{S^2} (Y_l^m)^*\, h\, dS .   (6.120)
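The coefficients (6.119) can be evaluated by a double integral over the sphere. The sketch below is an illustration added here (not part of the original notes, assuming SciPy, with a hypothetical boundary potential h); as before, scipy's sph_harm(m, l, ϕ, θ) takes the azimuthal angle first:

# Coefficients A_lm = a^{-l} <Y_l^m, h> for the interior problem, by numerical integration.
import numpy as np
from scipy.special import sph_harm
from scipy.integrate import dblquad

a_rad = 1.0
h = lambda theta, phi: np.cos(theta)**2    # sample boundary data (hypothetical choice)

def A_lm(l, m):
    # real part suffices here since h is real and only m = 0 is evaluated below
    integrand = lambda phi, theta: (np.conj(sph_harm(m, l, phi, theta)) * h(theta, phi)
                                    * np.sin(theta)).real
    val, _ = dblquad(integrand, 0, np.pi, 0, 2 * np.pi)   # theta outer, phi inner
    return val / a_rad**l

print(round(A_lm(0, 0), 4), round(A_lm(1, 0), 4), round(A_lm(2, 0), 4))
# only l = 0 and l = 2 contribute for h = cos^2(theta)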

7 Distributions
The Dirac delta function, δ(x), introduced by Dirac in 1930, is an object frequently used in theoretical
physics. It is usually “defined” as a “function” on R with properties
\delta(x) = \begin{cases} 0 & \text{for } x \neq 0 \\ \infty & \text{for } x = 0 \end{cases} , \qquad \int_{\mathbb{R}} dx\; \delta(x) = 1 .   (7.1)

Of course assigning a function “value” of infinity at x = 0 does not make sense. Even if we are prepared
to overlook this we know from our earlier discussion of integrals that a function which is zero everywhere,
except at one point, must integrate to zero (since a single point is a set of measure zero), rather than to
one. Hence, the above “definition” is at odds with basic mathematics, yet it is routinely used in a physics
context. The purpose of this chapter is to introduce the proper mathematical background within which
to understand the Dirac delta - the theory of distributions - and to explain how, in view of this theory, we
can get away with mathematically questionable equations such as Eq. (7.1). The fact that the Dirac delta
is not a function but rather a distribution has far-reaching consequences for many equations routinely
used in physics. For example, the “Coulomb potential” 1/r, where r = |x| is the three-dimensional radius,
satisfies an equation often stated as
∆ \frac{1}{r} = −4π δ(x) ,   (7.2)

where ∆ = \sum_{i=1}^{3} \frac{∂^2}{∂x_i^2} is the three-dimensional Laplacian. (We will prove the correct version of this equation
below.) Since the right-hand side of this equation should be understood as a distribution so must be the
left-hand side and this leads to a whole range of questions. With these motivations in mind we now begin
by properly defining distributions.

7.1 Basic definitions


The theory of distributions was first developed by Schwartz in 1945. The first step is to introduce test
functions which are functions with particularly nice analytic properties. There are several possible choices
for test function spaces but for the present purpose we adopt the following

Definition 7.1. The space of test functions is the vector space D = D(Rn ) := Cc∞ (Rn ) of infinitely many
times differentiable functions with compact support and a function ϕ ∈ D is called a test function.
We say a sequence (ϕk ) of test functions converges to a function ϕ ∈ D iff
(i) There is an R > 0 such that ϕk (x) = 0 for all k and all x ∈ Rn with |x| > R.
(ii) (ϕk ) and all its derivatives converge to ϕ uniformly for all x ∈ Rn with |x| ≤ R.
In this case we write ϕ_k \overset{D}{→} ϕ.

Note that the above version of convergence is very strong. An example of a test function is
ϕ(x) = \begin{cases} \exp\left( −\frac{a^2}{a^2 − |x|^2} \right) & \text{for } |x| < a \\ 0 & \text{for } |x| ≥ a \end{cases}   (7.3)

Clearly, this function has compact support and the structure of the exponential ensures that it drops
to zero at |x| = a in an infinitely many times differentiable way. All polynomials times the above ϕ
are also test functions so the vector space D is clearly infinite dimensional. We are now ready to define
distributions.
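As a quick sanity check of Eq. (7.3), the following sketch (Python/numpy; the choice a = 1 and the sample points are illustrative) evaluates the bump function and confirms that it vanishes identically outside |x| < a while flattening out smoothly at the boundary.

```python
# Evaluate the standard bump test function of Eq. (7.3) in one dimension with a = 1.
import numpy as np

a = 1.0

def bump(x):
    x = np.asarray(x, dtype=float)
    out = np.zeros_like(x)
    inside = np.abs(x) < a
    out[inside] = np.exp(-a**2 / (a**2 - x[inside]**2))
    return out

x = np.linspace(-1.5, 1.5, 7)
print(bump(x))                  # zero for |x| >= 1, positive inside

# a crude finite-difference slope just inside the boundary is numerically zero,
# illustrating that the function flattens to all orders at |x| = a
eps = 1e-4
print((bump(np.array([1 - eps])) - bump(np.array([1 - 2 * eps]))) / eps)
```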

Definition 7.2. A distribution is a linear, continuous map

T : D(\mathbb{R}^n) \longrightarrow \mathbb{R} ,   \qquad   ϕ \longmapsto T[ϕ] .   (7.4)

Continuity means that ϕ_k \overset{D}{→} ϕ implies T[ϕ_k] → T[ϕ]. The space of all distributions is called D′ = D′(\mathbb{R}^n).

Clearly, the space of distributions D′ is a vector space, a sub-space of the dual vector space to D. Hilbert
spaces are isomorphic to their dual spaces but the test function space D is not a Hilbert space (for example,
it is not complete relative to the standard integral norms). In some sense, the test function space D is
rather “small” and, hence, we expect the dual space to be large. To get a better feeling for what this
means, we should now consider a few examples of distributions.

7.1.1 Examples of distributions


Let f ∈ C(Rn ) be a continuous function. We can define a distribution Tf , associated to f by
T_f[ϕ] := \int_{\mathbb{R}^n} d^n x \, f(x) ϕ(x)   (7.5)

Clearly, Tf is linear and continuous and, hence, it is a distribution. It is also not too hard to show that
the map f → Tf is injective, so that the continuous functions are embedded in the space of distributions.
We can slightly generalise the above construction. For f ∈ L1loc (Rn ) (a locally integrable function) the
above definition of Tf still works. However, the map f → Tf is no longer injective - two functions which
only differ on a set of measure zero are mapped to the same distribution.
For a ∈ Rn the Dirac δ-distribution δa ∈ D0 (Rn ) is defined as

δa [ϕ] := ϕ(a) , (7.6)

and this provides the correct version of the equation


\int_{\mathbb{R}^n} d^n x \, ϕ(x) δ(x − a) = ϕ(a) ,   (7.7)

which is frequently written in a physics context. The integral on the LHS is purely symbolic - the reason
one usually gets away with using this notation is that both the integral and the distribution are linear.
The above examples show that distributions should be thought of as generalisations of functions. They
contain functions via the map f → Tf but, in addition, they also contain “more singular” objects such as
the Dirac-delta which cannot be interpreted as a function. The idea is now to generalise many of the tools
of analysis from functions to distributions. We begin with a definition of convergence for distributions.

7.1.2 Convergence of distributions


Definition 7.3. A sequence (T_k) of distributions is said to converge to a T ∈ D′ iff T_k[ϕ] → T[ϕ] for all
test functions ϕ ∈ D. In this case we write T_k \overset{D'}{\longrightarrow} T.

The idea of this definition is that convergence of distributions is “tested” with functions ϕ ∈ D and hence
the name “test functions”. The above notion of convergence can be used to gain a better understanding of
the Dirac delta. Although by itself not a function the Dirac delta can be obtained as a limit of functions,
as explained in the following

Theorem 7.1. Let f : \mathbb{R}^n → \mathbb{R} be an integrable function with \int_{\mathbb{R}^n} d^n x \, f(x) = 1. For ε > 0 define the
functions f_ε(x) := \frac{1}{ε^n} f\left( \frac{x}{ε} \right). Then we have

T_{f_ε} \overset{D'}{\longrightarrow} δ_0   as   ε → 0 .   (7.8)

Proof. We need to show convergence in the sense of distributions, as defined in Def. 7.3. This means we
must show that

\lim_{ε→0} T_{f_ε}[ϕ] = ϕ(0)   (7.9)

for all test functions ϕ ∈ D. A simple computation gives

T_{f_ε}[ϕ] = \int_{\mathbb{R}^n} d^n x \, f_ε(x) ϕ(x) = \int_{\mathbb{R}^n} d^n x \, \frac{1}{ε^n} f\left( \frac{x}{ε} \right) ϕ(x) \overset{x = εy}{=} \int_{\mathbb{R}^n} d^n y \, f(y) ϕ(εy) .   (7.10)

The integrand on the RHS is bounded by an integrable function since |f(y) ϕ(εy)| ≤ K |f(y)|, for some
constant K. This means we can pull the limit ε → 0 into the integral so that

\lim_{ε→0} T_{f_ε}[ϕ] = \int_{\mathbb{R}^n} d^n y \, f(y) \lim_{ε→0} ϕ(εy) = ϕ(0) \int_{\mathbb{R}^n} d^n y \, f(y) = ϕ(0) .   (7.11)

In the one-dimensional case, a possible choice for the functions f and f_ε in Theorem 7.1 is provided by
the Gaussians

f(x) = \frac{1}{\sqrt{2π}} e^{−x^2/2}   ⇒   f_ε(x) = \frac{1}{\sqrt{2π}\, ε} e^{−\frac{x^2}{2ε^2}} .   (7.12)
The graph of f_ε for decreasing values of ε is shown in Fig. 17 and illustrates the idea behind Theorem 7.1.

[Figure 17: The graph of the function f_ε in Eq. (7.12) for ε = 1, 1/2, 1/5, 1/10.]

For decreasing ε the function f_ε becomes more and more peaked around x = 0 and tends to δ_0 in the limit.
However, note that f_ε does not actually converge to any function in the usual sense, as ε → 0. Rather,
convergence only happens in the weaker sense of distributions, as stated in Theorem 7.1.
We can use the above representation of the Dirac delta as a limit to introduce a new class of distribu-
tions. For a function g in C^∞(\mathbb{R}) we can define the distribution δ_g ∈ D′(\mathbb{R}) by

δ_g := \lim_{ε→0} T_{f_ε ∘ g} .   (7.13)

To see what this means, we assume that we can split up R into intervals I_α such that g|_{I_α} is invertible
with inverse g_α^{−1}. Then, following the calculation in Theorem 7.1, we have

δ_g[ϕ] = \lim_{ε→0} \sum_α \int_{I_α} dx \, \frac{1}{ε} f\left( \frac{g(x)}{ε} \right) ϕ(x) \overset{y = g(x)/ε}{=} \lim_{ε→0} \sum_α \int_{g(I_α)/ε} dy \, f(y) \frac{ϕ(g_α^{−1}(εy))}{|g'(g_α^{−1}(εy))|} = \sum_{a : g(a)=0} \frac{ϕ(a)}{|g'(a)|} .

Re-writing the RHS in terms of Dirac deltas we find

δ_g = \sum_{a : g(a)=0} \frac{1}{|g'(a)|} δ_a .   (7.14)

This result suggests an intuitive interpretation of δg as a sum of Dirac deltas located at the zeros of the
function g. As an example, consider the function g(x) = x^2 − c^2, where c > 0 is a constant. This function
has two zeros at ±c and |g'(±c)| = 2c. Inserting this into Eq. (7.14) gives

δ_{x^2 − c^2} = \frac{1}{2c} (δ_c + δ_{−c}) .   (7.15)
It is useful to translate the above equations into the notation commonly used in physics. There, δg is
usually written as δ(g(x)) and Eq. (7.14) takes the form
δ(g(x)) = \sum_{a : g(a)=0} \frac{1}{|g'(a)|} δ(x − a) ,   (7.16)

while the example (7.15), written in this notation, reads


δ(x^2 − c^2) = \frac{1}{2c} \left( δ(x − c) + δ(x + c) \right) .   (7.17)
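A similar numerical check can be made for Eq. (7.17). In the sketch below (Python/scipy; the choices of f_ε, ϕ and c are illustrative), the nascent delta f_ε of Eq. (7.12) is composed with g(x) = x² − c², and the pairing with a test function approaches [ϕ(c) + ϕ(−c)]/(2c).

```python
# Numerical check of the delta of a composite function, Eq. (7.17).
import numpy as np
from scipy.integrate import quad

c = 1.5
g = lambda x: x**2 - c**2
phi = lambda x: np.exp(-(x - 0.3)**2)          # an arbitrary smooth, decaying function

def pairing(eps):
    f_eps = lambda y: np.exp(-y**2 / (2 * eps**2)) / (np.sqrt(2 * np.pi) * eps)
    val, _ = quad(lambda x: f_eps(g(x)) * phi(x), -10, 10, points=[-c, c], limit=400)
    return val

expected = (phi(c) + phi(-c)) / (2 * c)
for eps in [0.3, 0.1, 0.03, 0.01]:
    print(eps, pairing(eps), expected)         # pairing approaches the expected value
```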

7.1.3 Derivatives of distributions


We are now ready for the next step in extending the standard tools of analysis to distributions. We would

like to define derivatives of distributions. For convenience, we introduce the short-hand notation D_i := \frac{∂}{∂x_i}
for the partial derivatives. Our goal is to make sense of the expression D_i T for a distribution T ∈ D′.
For a distribution Tf , associated to a differentiable function f via the rule (7.5) there is a natural way to
define the derivative, namely
Di Tf := TDi f . (7.18)
In other words, the derivative of such a distribution is simply the distribution associated to the derivative
of the function. Any general definition of Di T should reduce to this rule for distributions of the form Tf
so that the normal rules for differentiating functions are preserved. Writing the rule (7.18) out explicitly,
using Eq. (7.5), we have
D_i T_f[ϕ] = T_{D_i f}[ϕ] = \int_{\mathbb{R}^n} d^n x \, D_i f(x) ϕ(x) = − \int_{\mathbb{R}^n} d^n x \, f(x) D_i ϕ(x) = T_f[−D_i ϕ] .   (7.19)
The left and right-hand sides of this chain of equations make sense if we drop the subscript f and this is
used to define the derivative of distributions in general.
Definition 7.4. The derivative Di T of a distribution T ∈ D0 is defined by Di T [ϕ] = T [−Di ϕ].
From this definition, Di T is clearly linear but for it to be a distribution we still have to check that
it is continuous. Suppose we have a convergent sequence ϕ_k \overset{D}{→} ϕ. It follows that −D_i ϕ_k \overset{D}{→} −D_i ϕ
(due to the strong notion of convergence in D, as in Def. 7.1) and, from continuity of T , this implies
T [−Di ϕk ] → T [−Di ϕ] and, finally, using Def. 7.4, that Di T [ϕk ] → Di T [ϕ]. This means Di T is indeed
continuous and, hence, a distribution.

Application 7.28. Unusual convergence of distributions
Consider the sequence of functions fk : R → R defined by fk (x) = sin(kx), where k = 1, 2, . . ., so a
sequence of sine functions with increasing frequency. Clearly, this sequence does not converge to any
function in the usual sense. What about convergence of the associated sequence Tfk of distributions?
To work this out, we define the sequence of functions (g_k) with g_k(x) = −\frac{1}{k} \cos(kx). We have
D g_k = f_k ⇒ D T_{g_k} = T_{f_k} and, since g_k → 0 uniformly, as k → ∞, it follows that

T_{g_k} \overset{D'}{\longrightarrow} T_0   ⇒   T_{f_k} \overset{D'}{\longrightarrow} T_0 ,   (7.20)

where T0 is the distribution associated to the function identical to zero. This is a surprising result
which shows that distributions can behave quite differently from functions: While the sequence of
functions (fk ) does not converge at all, the associated sequence of distributions (Tfk ) converges to the
zero distribution.
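This behaviour is easy to see numerically. In the sketch below (Python/scipy, with an illustrative rapidly decaying test function) the pairings ∫ sin(kx) ϕ(x) dx shrink rapidly as k grows, even though sin(kx) itself does not converge to anything.

```python
# T_{f_k}[phi] = \int sin(kx) phi(x) dx tends to zero as k grows (Application 7.28).
import numpy as np
from scipy.integrate import quad

phi = lambda x: np.exp(-(x - 0.5)**2)          # asymmetric, so the integral is not trivially zero

for k in [1, 5, 20, 100]:
    # quad with weight='sin' is designed for integrands of the form phi(x) * sin(k x)
    val, _ = quad(phi, -10, 10, weight='sin', wvar=k)
    print(k, val)                              # rapidly approaches zero
```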

Application 7.29. Heaviside function


The Heaviside function θ : R → R is defined as

θ(x) := \begin{cases} 1 & \text{for } x ≥ 0 \\ 0 & \text{for } x < 0 \end{cases} .   (7.21)

As a function, θ is not differentiable at x = 0 (since it is not even continuous there) but we can still
ask about the differential of the associated Heaviside distribution Tθ . A short calculation shows that
D T_θ[ϕ] = T_θ[−Dϕ] = − \int_{\mathbb{R}} dx \, θ(x) ϕ'(x) = − \int_0^{∞} dx \, ϕ'(x) = ϕ(0) = δ_0[ϕ]   (7.22)

and, hence,
DTθ = δ0 . (7.23)
The Dirac delta is the derivative of the Heaviside distribution. In a physics context this is frequently
written as
\frac{d}{dx} θ(x) = δ(x) ,   (7.24)
ignoring the fact that θ, as a function, is actually not differentiable.

Application 7.30. Derivative of Dirac delta


How does the derivative δ_a' = Dδ_a of the Dirac delta act on test functions? A short calculation shows
that
δ_a'[ϕ] = Dδ_a[ϕ] = δ_a[−Dϕ] = −ϕ'(a) .   (7.25)
While the Dirac delta δ_a produces the function value of the test function at a, its derivative, δ_a', leads
to the (negative) derivative of the test function at a. In a physics context, this equation is also written
as
\int dx \, δ'(x − a) ϕ(x) = −ϕ'(a)   (7.26)

and it is “proven” by partial integration and then using Eq. (7.7).

7.2 Convolution of distributions∗
We have already encountered convolutions of functions in the context of Fourier transforms. We would now
like to introduce convolutions between functions and distributions. Suppose we have a locally integrable
function f ∈ L^1_{loc}(\mathbb{R}^n) and a test function ϕ ∈ D(\mathbb{R}^n). From Eq. (3.55) the convolution of these two
functions is given by

(f ⋆ ϕ)(x) = \int_{\mathbb{R}^n} d^n y \, f(y) ϕ(x − y) .   (7.27)
Of course we would like to extend this definition to convolutions such that the above formula is reproduced
for distributions Tf , associated to a function f . Hence, we start by re-writing Eq. (7.27) in terms of Tf
and, to this end, we introduce the translation τx by x ∈ Rn , defined by

(τx ϕ)(y) := ϕ(x − y) . (7.28)

It is easy to see that with this definition, the convolution (7.27) can be re-written as

(f ? ϕ)(x) = Tf [τx ϕ] . (7.29)

The RHS of this equation can be used to define convolutions between distributions and functions.

Definition 7.5. For a distribution T ∈ D0 (Rn ) and a test function ϕ ∈ D(Rn ) the convolution T ? ϕ :
Rn → R is a function defined by
(T ? ϕ)(x) := T [τx ϕ] . (7.30)

As an example, consider a convolution with the Dirac delta δ0 .

(δ0 ? ϕ)(x) = δ0 [τx ϕ] = (τx ϕ)(0) = ϕ(x) ⇒ δ0 ? ϕ = ϕ . (7.31)

This means that the Dirac delta δ0 can be seen as the identity of the convolution operation.
The result of a convolution T ? ϕ is a function and it is an obvious question what the properties of this
function are. Some answers are given by the following

Theorem 7.2. For a distribution T ∈ D0 (Rn ) and a test function ϕ ∈ D(Rn ) the convolution T ? ϕ is a
differentiable function and we have
D_i(T ⋆ ϕ) \overset{(a)}{=} T ⋆ (D_i ϕ) \overset{(b)}{=} (D_i T) ⋆ ϕ .   (7.32)

Proof. For the equality (a) we evaluate the LHS explicitly:

D_i(T ⋆ ϕ)(x) = \lim_{ε→0} \frac{1}{ε} \left( T[τ_{x+εe_i} ϕ] − T[τ_x ϕ] \right) = \lim_{ε→0} T\left[ \frac{1}{ε} (τ_{x+εe_i} ϕ − τ_x ϕ) \right]
= T[τ_x D_i ϕ] = (T ⋆ D_i ϕ)(x) .

Proving the equality (b) is even more straightforward:

D_i(T ⋆ ϕ)(x) = T[τ_x D_i ϕ] = T[−D_i τ_x ϕ] = D_i T[τ_x ϕ] = ((D_i T) ⋆ ϕ)(x) .

7.3 Fundamental solutions - Green functions∗
We have already seen Green functions appear a number of times, notably in the context of ordinary linear
differential equations and in the context of the Laplace equation. They are, in fact, quite a general concept
which can be formulated concisely using distributions. To get a rough idea of how this works let us first
discuss this using the physics language where we work with a Dirac delta “function”, δ(x).
Suppose we have a differential operator L (typically second order) and we are trying to solve the
equation Lφ(x) = ρ(x). (The Laplace equation with source ρ is an example but the discussion here is for
a general differential operator.) Suppose we have found a “function” G satisfying

LG(x) = δ(x) . (7.33)

In terms of G we can immediately write down a solution to our equation, namely


φ(x) = \int_{\mathbb{R}^n} d^n y \, G(x − y) ρ(y) .   (7.34)

To check that this is indeed a solution simply work out


L_x φ(x) = \int_{\mathbb{R}^n} d^n y \, L_x G(x − y) ρ(y) = \int_{\mathbb{R}^n} d^n y \, δ(x − y) ρ(y) = ρ(x) .   (7.35)

For example, for the (three-dimensional) Laplace equation ∆φ(x) = −4πρ(x), we have the Green function
G(x) = 1/r, where r = |x|, which satisfies
∆ \frac{1}{r} = −4π δ(x) .   (7.36)

(This statement will be justified in Theorem 7.4 below.) Hence, we have a solution

φ(x) = \int_{\mathbb{R}^3} d^3 y \, \frac{ρ(y)}{|x − y|} .   (7.37)

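As a consistency check of this Green function formula, one can take a smooth source for which the integral (7.37) is known in closed form. The sympy sketch below uses the illustrative Gaussian source ρ(r) = (2πσ²)^{−3/2} e^{−r²/(2σ²)}, for which (7.37) gives the standard result φ(r) = erf(r/(√2 σ))/r, and verifies symbolically that ∆φ = −4πρ.

```python
# Symbolic check (sketch): the Gaussian-source potential satisfies Delta phi = -4 pi rho.
import sympy as sp

r, sigma = sp.symbols('r sigma', positive=True)
rho = sp.exp(-r**2 / (2 * sigma**2)) / (2 * sp.pi * sigma**2)**sp.Rational(3, 2)
phi = sp.erf(r / (sp.sqrt(2) * sigma)) / r

# radial part of the Laplacian: (1/r^2) d/dr ( r^2 d phi / dr )
laplacian = sp.diff(r**2 * sp.diff(phi, r), r) / r**2

print(sp.simplify(laplacian + 4 * sp.pi * rho))   # should print 0
```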
While the above discussion is what you will normally find in the physics literature we know by now that
it is mathematically questionable and has to be justified by a proper treatment in terms of distributions.
The general theorem facilitating this is

Theorem 7.3. If the distribution E ∈ D′(\mathbb{R}^n) satisfies LE = δ_0 for a differential operator L and ρ ∈
D(\mathbb{R}^n) is a test function, then φ := E ⋆ ρ satisfies Lφ = ρ.

Proof. Given our preparation the proof is now quite easy:

Lφ = L(E ? ρ) = (LE) ? ρ = δ0 ? ρ = ρ . (7.38)

Here we have used Theorem 7.2 and the fact that the Dirac delta is the identity under convolution, as in
Eq. (7.31).

In the above theorem, the distribution E, also called a fundamental solution for L, is the analogue of
the Green function and a general solution to the inhomogeneous equation with source ρ is obtained by
convoluting E with ρ.
Of course, for this theorem to be of practical use we first have to work out a fundamental solution for
a given differential operator L. For the Laplace operator, L = ∆, the following theorem provides such a
fundamental solution:

Theorem 7.4. The distribution T1/r ∈ D0 (R3 ), where r = |x|, satisfies ∆T1/r = −4πδ0 and is, hence, a
fundamental solution for the Laplace operator.

Proof. We have
∆T_{1/r}[ϕ] = T_{1/r}[∆ϕ] = \int_{\mathbb{R}^3} d^3 x \, \frac{∆ϕ}{r} .   (7.39)
That this integral is equal to −4πδ0 [ϕ] has already been shown in Theorem (6.3).

Hence, T1/r is a fundamental solution of the Laplace operator and can be used to write down solutions
of the inhomogeneous Laplace equation by applying Theorem 7.3. In fact, in Corollary 6.2 we have seen
that this provides the unique solution φ to the inhomogeneous Laplace equation with φ → 0 as r → ∞.
We have also encountered Green functions in our discussion of second order ordinary differential
equations in Section 5 and it is useful to re-formulate these results in terms of distributions. Recall that
the relevant differential operator is given by

L = α_2(x) \frac{d^2}{dx^2} + α_1(x) \frac{d}{dx} + α_0(x) ,   (7.40)
and, on the interval x ∈ [a, b], we would like to solve the equation Ly = f , with y(a) = y(b) = 0. The
solution to this problem is described, in terms of a Green function, in Theorem 5.6. From this statement
we have

Theorem 7.5. For G̃(x) := G(0, x), where G is the Green function from Theorem 5.6, but for
the operator L†, the distribution T_{G̃} is a fundamental solution of the operator (7.40), that is, L T_{G̃} = δ_0.

Proof. From Eq. (5.83) the Green function for L† can be written in terms of its eigenfunctions e_k and
eigenvalues λ_k as

G(x, t) = \sum_k \frac{1}{λ_k} w(t) e_k(t) e_k(x) ,   (7.41)

where
L† e_k = λ_k e_k   (7.42)
and this Green function provides a solution to L† y = ϕ given by

y(x) = \int_{\mathbb{R}} dt \, G(x, t) ϕ(t)   ⇒   \int_{\mathbb{R}} dt \, L†_x G(x, t) ϕ(t) = ϕ(x) .   (7.43)

Using this and the fact that L† is hermitian relative to the scalar product defined with weight function w
(as it appears in its Sturm-Liouville form) we get
L T_{G̃}[ϕ] = T_{G̃}[L† ϕ] = \int_{\mathbb{R}} dt \, G(0, t) L†_t ϕ(t) \overset{Eq. (7.41)}{=} \sum_k \frac{1}{λ_k} \int_{\mathbb{R}} dt \, w(t) e_k(t) e_k(0) L†_t ϕ(t)
= \sum_k \frac{1}{λ_k} \int_{\mathbb{R}} dt \, w(t) L†_t e_k(t) e_k(0) ϕ(t) \overset{Eq. (7.42)}{=} \sum_k \int_{\mathbb{R}} dt \, w(t) e_k(t) e_k(0) ϕ(t)
\overset{Eq. (7.42)}{=} \sum_k \frac{1}{λ_k} \int_{\mathbb{R}} dt \, w(t) e_k(t) (L†_x e_k)(0) ϕ(t) \overset{Eq. (7.41)}{=} \int_{\mathbb{R}} dt \, (L†_x G)(0, t) ϕ(t) \overset{Eq. (7.43)}{=} ϕ(0) = δ_0[ϕ]

We will encounter further examples of fundamental solutions in the next chapter when we discuss other
linear partial differential equations.

7.4 Fourier transform for distributions∗
In section 3.2 we have discussed the Fourier transform and we have defined
F(f)(k) = \frac{1}{(2π)^{n/2}} \int_{\mathbb{R}^n} d^n y \, e^{−ik·y} f(y) ,   \qquad   F̃(f̂)(x) = \frac{1}{(2π)^{n/2}} \int_{\mathbb{R}^n} d^n k \, e^{ik·x} f̂(k) ,   (7.44)

and one of our central results was that F̃ ◦ F = id, so F̃ is the inverse Fourier transform. Writing this
out explicitly we have

\frac{1}{(2π)^n} \int d^n y \, d^n k \, e^{ik·(x−y)} f(y) = f(x) .   (7.45)

Symbolically, this result can also be written as

\frac{1}{(2π)^n} \int d^n k \, e^{ik·(x−y)} = δ(x − y) ,   (7.46)

in the sense that this equation multiplied with f (y) and integrated over y leads to Eq. (7.45), provided the
naive rule (7.7) for working with δ(x − y) is used. The equation (7.46) is frequently used in physics calcu-
lations but, like all other considerations which involve the Dirac delta used as a function, is mathematically
unsound.
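One way to see what Eq. (7.46) "means" in practice is to truncate the k-integral at |k| ≤ K, which in one dimension gives the kernel sin(Kx)/(πx), and to check numerically that this kernel acts like a nascent Dirac delta. The sketch below (Python/scipy; the function ϕ is an illustrative rapidly decaying choice) does exactly that.

```python
# The truncated plane-wave integral behaves like a nascent delta as K grows.
import numpy as np
from scipy.integrate import quad

phi = lambda x: np.exp(-x**2)                  # phi(0) = 1

def dirichlet_kernel(x, K):
    # (1/2pi) * \int_{-K}^{K} dk e^{ikx} = sin(Kx)/(pi x), written via np.sinc
    return (K / np.pi) * np.sinc(K * x / np.pi)

for K in [1, 5, 10, 20]:
    val, _ = quad(lambda x: dirichlet_kernel(x, K) * phi(x), -8, 8, limit=400)
    print(K, val)                              # approaches phi(0) = 1 as K grows
```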

Exercise 7.6. Use Eq. (7.46) naively to “prove” part (a) of Plancherel’s theorem (3.16).

To check that using Eq. (7.46) in a naive way makes sense we should consider the generalisation of the
Fourier transform to distributions.
For some guidance on how to define the Fourier transform for distributions we proceed as usual and
demand that FTf = TF f in order to ensure that the Fourier transform on distributions reduces to the
familiar one for functions whenever it is applied to a distribution of the form Tf . This implies

1
Z Z
(FTf )[ϕ] = TF f [ϕ] = dn y (Ff )(y)ϕ(y) = dn x dn y e−ix·y f (x)ϕ(y)
(2π)n/2
Z
= dn x (Fϕ)(x)f (x) = Tf [Fϕ] . (7.47)

The LHS and RHS of this equation make sense if we drop the subscript f and this can be used to define
the Fourier transform of distributions by 7

(FT )[ϕ] := T [Fϕ] . (F̃T )[ϕ] := T [F̃ϕ] , (7.48)

so that F̃ ◦ F(T ) = F ◦ F̃(T ) = T . Given this definition, what is the Fourier transform of the Dirac delta,
δk ? The quick calculation
(F δ_k)[ϕ] = δ_k[F ϕ] = (F ϕ)(k) = \frac{1}{(2π)^{n/2}} \int d^n x \, e^{−ik·x} ϕ(x) = T_{(2π)^{−n/2} e^{−ik·x}}[ϕ]   (7.49)

shows that
F δ_k = T_{(2π)^{−n/2} e^{−ik·x}} .   (7.50)

7
A mathematically fully satisfactory definition requires modifying the test function space but this would go beyond our
present scope. The main idea of how to define Fourier transforms of distributions is already apparent without considering
this subtlety.

Note that this result is in line with our intuitive understanding of Fourier transforms as frequency analysis.
It says that a frequency spectrum “sharply peaked” at k Fourier transforms to a monochromatic wave
with wave vector k.
Now consider the inverse of the above computation, that is, we would like to work out the Fourier
transform of T_{e^{ik·x}}. Note that the function e^{ik·x} is not integrable over \mathbb{R}^n so it does not have a Fourier
transform in the conventional sense. Taking the complex conjugate

F̃ δ_k = T_{(2π)^{−n/2} e^{ik·x}}   (7.51)

of Eq. (7.50) and applying F to this equation gives

F T_{e^{ik·x}} = (2π)^{n/2} δ_k   (7.52)

Again, this is in line with the intuitive understanding of Fourier transforms. The transform of a monochro-
matic wave with wave vector k is a spectrum “sharply peaked” at k. At the same time, Eq. (7.52) is the
mathematically correct version of Eq. (7.46).

8 Other linear partial differential equations
In this chapter, we will discuss a number of other linear partial differential equations which are important
in physics, including the Helmholtz equation, the wave equation and the heat equation. We will cover a
number of methods to solve these equations but in the interest of keeping these notes manageable we will
not be quite as thorough as we have been for the Laplace equation. We begin with the Helmholtz equation
which is closest to the Laplace equation.

8.1 The Helmholtz equation


The homogeneous and inhomogeneous Helmholtz equations in \mathbb{R}^3 (with coordinates x = (x_i)) are given by

(∆ + k^2) ψ = 0 ,   \qquad   (∆ + k^2) ψ = f ,   (8.1)

where k ∈ R is a real number and ∆ is the three-dimensional Laplace operator (although the equation can,
of course, also be considered in other dimensions). This equation appears, for example, in wave problems
with fixed wave number k, as we will see in our discussion of the wave equation later on.
As always, the general solution to the inhomogeneous Helmholtz equation is given as a sum of the
general solution of the homogeneous equation plus a special solution of the inhomogeneous equation. The
homogeneous Helmholtz equation is an eigenvalue equation (with eigenvalue −k^2) for the Laplace operator
and many of the methods discussed in the context of the Laplace equation can be applied.
To find a special solution of the inhomogeneous equation we can use the Green function method, in
analogy to what we did for the Laplace equation. Define the functions

e±ikr
G± (r) = , (8.2)
r
where r = |x| is the radial coordinate. Given that this function is independent of the angles we can use
only the radial part of the Laplacian in spherical coordinates (6.17) to verify that, for r > 0
   
1 d 2 d
(∆ + k 2 )G± = r + k 2
G± = 0 . (8.3)
r2 dr dr

Hence, G_± solves the homogeneous Helmholtz equation for r > 0 in much the same way 1/r solves the
homogeneous Laplace equation. In fact, the analogy goes further as stated in the following

Theorem 8.1. With G = A G_+ + B G_−, where A, B ∈ R and A + B = 1, the distribution T_G is a
fundamental solution to the Helmholtz operator, that is,

(∆ + k^2) T_G = −4π δ_0 .   (8.4)

Proof. The proof is very much in analogy with the corresponding one for the Laplace operator, Theorem 7.4, and
can be found in Ref. [4]. Essentially, it relies on ∆T1/r = −4πδ0 and the fact that 1/r is really the only
singularity in G.

With this result, the general solution to the inhomogeneous Helmholtz equation can be written as

ψ(x) = ψ_{hom}(x) − \frac{1}{4π} (T_G ⋆ f)(x) = ψ_{hom}(x) − \frac{1}{4π} \int_{\mathbb{R}^3} d^3 y \, G(x − y) f(y) ,   (8.5)

where ψhom is an arbitrary solution of the homogeneous equation.
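Before moving on, Eq. (8.3) is easy to verify symbolically; the short sympy sketch below applies the radial Helmholtz operator to G_± and confirms that the result vanishes for r > 0.

```python
# Symbolic check of Eq. (8.3): exp(+-i k r)/r is annihilated by the radial Helmholtz operator.
import sympy as sp

r, k = sp.symbols('r k', positive=True)
for sign in (+1, -1):
    G = sp.exp(sign * sp.I * k * r) / r
    helmholtz = sp.diff(r**2 * sp.diff(G, r), r) / r**2 + k**2 * G
    print(sp.simplify(helmholtz))     # prints 0 for both signs
```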

8.2 Eigenfunctions and time evolution
Many partial differential equations in physics involve a number of spatial coordinates x = (x1 , . . . , xn )T ∈
V ⊂ U ⊂ Rn as well as time t ∈ R and are of the form
Hψ = \frac{1}{c} ψ̇   or   Hψ = ψ̈ ,   (8.6)

where ψ = ψ(t, x), the dot denotes the time derivative ∂/∂t and c is a constant. We assume that H is a second
order linear differential operator in the spatial derivatives ∂/∂x_i, so a differential operator on C^∞(U) ∩ L^2(U)
which is time-independent and hermitian relative to the standard scalar product on L2 (U ). If boundary
conditions are imposed on ∂V we assume that they are also time-independent. Under these conditions
there frequently exists a (time-independent) ortho-normal basis (φ_i)_{i=1}^{∞} of L^2(U) with the desired boundary
behaviour which consists of eigenfunctions of H, so

Hφi = λi φi , (8.7)

where the eigenvalues λi are real since H is hermitian. The problem is to solve the above equations
subject to an initial condition ψ(0, x) = ψ0 (x) and, in addition, ψ̇(0, x) = ψ̇0 (x) in the case of the second
equation (8.6), for given functions ψ0 and ψ̇0 . This can be done by expanding the function ψ, for any
given time t, in terms of the basis (φi ), so that
ψ(t, x) = \sum_i A_i(t) φ_i(x) .   (8.8)

Inserting this into the first Eq. (8.6) leads to

Ȧ_i = c λ_i A_i   ⇒   A_i = a_i e^{c λ_i t}   (8.9)

so that the complete solution reads


ψ(t, x) = \sum_i a_i φ_i(x) e^{c λ_i t} .   (8.10)

The remaining constants a_i are fixed by the initial condition ψ(0, x) = \sum_i a_i φ_i(x) \overset{!}{=} ψ_0(x) which can be
solved in the usual way, using the orthogonality relations ⟨φ_i, φ_j⟩ = δ_{ij}. This leads to

a_i = ⟨φ_i, ψ_0⟩ .   (8.11)

A similar calculation for the second equation (8.6) (assuming that λi < 0) leads to
Ä_i = −|λ_i| A_i   ⇒   A_i = a_i \sin\left( \sqrt{|λ_i|}\, t \right) + b_i \cos\left( \sqrt{|λ_i|}\, t \right)   (8.12)

so that
ψ(t, x) = \sum_i \left[ a_i \sin\left( \sqrt{|λ_i|}\, t \right) + b_i \cos\left( \sqrt{|λ_i|}\, t \right) \right] φ_i(x) .   (8.13)

The constants a_i and b_i are fixed by the initial conditions ψ(0, x) = \sum_i b_i φ_i(x) \overset{!}{=} ψ_0(x) and ψ̇(0, x) =
\sum_i a_i \sqrt{|λ_i|}\, φ_i(x) \overset{!}{=} ψ̇_0(x) and are, hence, given by

a_i = \frac{1}{\sqrt{|λ_i|}} ⟨φ_i, ψ̇_0⟩ ,   \qquad   b_i = ⟨φ_i, ψ_0⟩ .   (8.14)
Below, we will discuss various examples of this structure more explicitly.

8.3 The heat equation
The homogeneous and inhomogeneous heat equations are given by

\left( ∆_n − \frac{∂}{∂t} \right) ψ = 0 ,   \qquad   \left( ∆_n − \frac{∂}{∂t} \right) ψ = f ,   (8.15)

where ∆n is the Laplacian in n dimensions (with n = 1, 2, 3 cases of physical interest). The solution
ψ = ψ(t, x) can be interpreted as a temperature distribution evolving in time. If we are solving the
equations on a spatial patch V ⊂ Rn we have to provide boundary conditions on ∂V and in addition, we
should provide an initial distribution ψ(0, x) = ψ0 (x) at time t = 0.
The homogeneous equation is of the form discussed in the previous subsection (with H = ∆ and
c = 1). If we are solving the equation on a spatial patch V ⊂ Rn with boundary conditions such that the
spectrum of the Laplacian has countably many eigenvectors then we can apply the method described in
Section 8.2.

Application 8.31. Evolution of temperature along a rod


To illustrate how this works in practice consider a one-dimensional problem (n = 1) where V = [0, a]
and we demand Dirichlet boundary conditions ψ(t, 0) = ψ(t, a) = 0 and some initial distribution
ψ0 (x) = ψ(0, x). (Physically, this corresponds to a rod of length a whose endpoints are kept at a fixed
temperature and with a given initial temperature distribution at time t = 0.) Given the space and
the boundary conditions the functions φ_k = \sin\left( \frac{kπx}{a} \right) of the sine Fourier series provide an orthogonal
basis of eigenfunctions satisfying φ_k'' = λ_k φ_k with eigenvalues

λ_k = − \frac{k^2 π^2}{a^2} .   (8.16)
Inserting this into the general solution (8.10) leads to
ψ(t, x) = \sum_{k=1}^{∞} b_k \sin\left( \frac{kπx}{a} \right) e^{− \frac{k^2 π^2}{a^2} t} .   (8.17)

The coefficients bk are determined by the initial condition ψ(0, x) = ψ0 (x) which leads to the standard
sine Fourier series
ψ(0, x) = \sum_{k=1}^{∞} b_k \sin\left( \frac{kπx}{a} \right) \overset{!}{=} ψ_0(x)   (8.18)

for ψ_0 which, of course, implies that

b_k = \frac{2}{a} \int_0^a dx \, \sin\left( \frac{kπx}{a} \right) ψ_0(x) .   (8.19)
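The following minimal sketch (Python/numpy) implements Eqs. (8.17)-(8.19) for the illustrative initial profile ψ₀(x) = sin³(πx/a) with a = 1; all parameter choices are for illustration only.

```python
# Heat flow on a rod with Dirichlet boundary conditions via the sine Fourier series.
import numpy as np

a, nmax = 1.0, 50
psi0 = lambda x: np.sin(np.pi * x / a)**3      # initial temperature profile

# Fourier sine coefficients (8.19), via a simple Riemann sum (integrand vanishes at both ends)
xq = np.linspace(0.0, a, 2001)
dx = xq[1] - xq[0]
b = np.array([2.0 / a * np.sum(np.sin(k * np.pi * xq / a) * psi0(xq)) * dx
              for k in range(1, nmax + 1)])

def psi(t, x):
    """Temperature profile (8.17) at time t and position x."""
    k = np.arange(1, nmax + 1)
    return np.sum(b * np.sin(k * np.pi * x / a) * np.exp(-(k * np.pi / a)**2 * t))

# since sin^3 = (3 sin - sin 3x)/4, only b_1 = 3/4 and b_3 = -1/4 are non-zero
print(np.round(b[:4], 4))
print(psi(0.0, 0.4), psi0(0.4))        # t = 0 reproduces the initial profile
print(psi(0.05, 0.4), psi(0.5, 0.4))   # the profile decays towards zero
```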

Exercise 8.2. For V = [0, a], boundary conditions ψ(t, 0) = ψ(t, a) = 0 and an initial distribution
ψ(0, x) = T0 x(a−x)/a2 (where T0 is a constant) find the solution ψ(t, x) to the homogeneous heat equation.

For solutions in ψ(t, ·) ∈ L2 (Rn ) we can also solve the homogeneous heat equation using Fourier transforms.
Inserting
ψ(t, x) = \frac{1}{(2π)^{n/2}} \int_{\mathbb{R}^n} d^n k \, ψ̃(t, k) e^{−ik·x}   (8.20)

into the heat equation leads to

\dot{ψ̃} = −|k|^2 ψ̃   ⇒   ψ̃(t, k) = χ(k) e^{−|k|^2 t} ,   (8.21)

for a function χ(k). Inserting this back into the Ansatz gives

ψ(t, x) = \frac{1}{(2π)^{n/2}} \int_{\mathbb{R}^n} d^n k \, χ(k) e^{−ik·x − |k|^2 t} ,   (8.22)

and the initial condition ψ(0, x) = ψ0 (x) translates into F(χ) = ψ0 which can be inverted using the inverse
Fourier transform, so χ = F̃(ψ0 ).

Exercise 8.3. Find the solution ψ(t, x) of the heat equation for x ∈ R, square integrable for all t, which
satisfies ψ(0, x) = ψ_0(x) = T_0 \, e^{−\frac{x^2}{2a^2}}.

To solve the inhomogeneous heat equation we would like to find a Green function. It can be verified
by direct calculation, that the function G : Rn+1 → R defined by
G(t, x) = \begin{cases} − \frac{1}{(4πt)^{n/2}} \, e^{− \frac{|x|^2}{4t}} & \text{for } t > 0 \\ 0 & \text{for } t ≤ 0 \end{cases}   (8.23)

solves the homogeneous heat equation whenever t ≠ 0 and x ≠ 0.

Exercise 8.4. Show that the function G in Eq. (8.23) solves the homogeneous heat equation for t ≠ 0 and
x ≠ 0.

In fact we have

Theorem 8.5. The distribution TG with G defined in Eq. (8.23) is a fundamental solution of the heat
equation, so
\left( ∆ − \frac{∂}{∂t} \right) T_G = δ_0 .   (8.24)
Proof. The proof is similar to the one for the Laplacian in Theorem 7.4 and can be found in Ref. [4].

From this result, the general solution to the inhomogeneous heat equation can be written as
ψ(t, x) = ψ_{hom}(t, x) + (T_G ⋆ f)(t, x) = ψ_{hom}(t, x) + \int_{\mathbb{R}^{n+1}} dτ \, d^n y \, G(t − τ, x − y) f(τ, y) ,   (8.25)

where ψhom is a solution to the homogeneous equation.

8.4 The wave equation


The homogeneous and inhomogeneous wave equations are of the form

\left( ∆_n − \frac{∂^2}{∂t^2} \right) ψ = 0 ,   \qquad   \left( ∆_n − \frac{∂^2}{∂t^2} \right) ψ = f ,   (8.26)

where ∆n is the n-dimensional Laplacian (and n = 1, 2, 3 are the most interesting dimensions for physics).
If this equation is considered on the spatial patch V ⊂ Rn we should specify boundary conditions on ∂V
for all t. In addition, we require initial conditions ψ(0, x) = ψ0 (x) and ψ̇(0, x) = ψ̇0 (x) at some initial
time t = 0.

For an Ansatz of the form
ψ(t, x) = ψ̃(x)e−iωt (8.27)
the wave equation turns into the Helmholtz equation for ψ̃ with k = ω and we can use the methods
discussed in Section 8.1.
Starting with the homogeneous equation with ψ(t, ·) ∈ L^2(\mathbb{R}^n) we can make a Fourier integral Ansatz

ψ(t, x) = \frac{1}{(2π)^{n/2}} \int_{\mathbb{R}^n} d^n k \, ψ̃(t, k) e^{−ik·x} .   (8.28)
Inserting this into the homogeneous equation implies

\ddot{ψ̃} = −|k|^2 ψ̃   ⇒   ψ̃(t, k) = ψ_+(k) e^{i|k|t} + ψ_−(k) e^{−i|k|t} ,   (8.29)

and, hence,

ψ(t, x) = \frac{1}{(2π)^{n/2}} \int_{\mathbb{R}^n} d^n k \, \left( ψ_+(k) e^{i|k|t} + ψ_−(k) e^{−i|k|t} \right) e^{−ik·x} .   (8.30)

The functions ψ_± are fixed by the initial conditions via ψ_0 = F(ψ_+ + ψ_−) and ψ̇_0 = F(i|k|(ψ_+ − ψ_−)),
which can be solved, using the inverse Fourier transform, to give ψ_± = \frac{1}{2} F̃\left( ψ_0 ∓ \frac{i}{|k|} ψ̇_0 \right).
If we work on a spatial patch with boundary conditions which lead to a countable number of eigenvec-
tors of the Laplacian we can use the method in Section 8.2 to solve the homogeneous wave equation. For
one and two spatial dimensions this leads to systems usually referred to as “strings” and “membranes”,
respectively, and we now discuss them in turn.

8.4.1 Strings
The wave equation now reads 8

\left( \frac{∂^2}{∂x^2} − \frac{∂^2}{∂t^2} \right) ψ = 0 ,   (8.31)
where ψ = ψ(t, x) and x ∈ [0, a]. This equation describes various kinds of strings from guitar strings to the
strings of string theory. We will impose Dirichlet boundary conditions ψ(t, 0) = ψ(t, a) = 0 as appropriate
for a string with fixed endpoints. (The strings of string theory allow for both Dirichlet and Neumann
boundary conditions.) In addition, we need to fix the initial position, ψ(0, x) = ψ0 (x), and initial velocity
ψ̇(0, x) = ψ̇0 (x).
Given the boundary conditions an appropriate set of eigenfunctions is provided by φ_k = \sin\left( \frac{kπx}{a} \right),
that is, the functions of the sine Fourier series. We have φ_k'' = λ_k φ_k with eigenvalues

−λ_k = \frac{k^2 π^2}{a^2} =: ω_k^2 .   (8.32)
Inserting this into the general solution (8.13) leads to
ψ(t, x) = \sum_{k=1}^{∞} \left[ a_k \sin\left( \frac{kπt}{a} \right) + b_k \cos\left( \frac{kπt}{a} \right) \right] \sin\left( \frac{kπx}{a} \right)   (8.33)

The coefficients ak and bk are fixed by the initial conditions via


ψ(0, x) = \sum_{k=1}^{∞} b_k \sin\left( \frac{kπx}{a} \right) \overset{!}{=} ψ_0(x) ,   \qquad   ψ̇(0, x) = \sum_{k=1}^{∞} \frac{kπ a_k}{a} \sin\left( \frac{kπx}{a} \right) \overset{!}{=} ψ̇_0(x) .   (8.34)
8
In a physics context, this equation is frequently written with an additional factor of 1/c2 in front of the time-derivatives,
where c is the speed of the wave. Such a factor can always be removed by a re-definition ct → t.

These equations can of course be solved for ak and bk using standard Fourier series techniques resulting
in

a_k = \frac{2}{kπ} \int_0^a dx \, \sin\left( \frac{kπx}{a} \right) ψ̇_0(x) ,   \qquad   b_k = \frac{2}{a} \int_0^a dx \, \sin\left( \frac{kπx}{a} \right) ψ_0(x) .   (8.35)
Note that the eigenfrequencies of the system

ω_k = \frac{kπ}{a}   (8.36)
are all integer multiples of the ground frequency ω1 = π/a.

Exercise 8.6. Find the solution ψ = ψ(t, x) for a string with length a, Dirichlet boundary conditions
ψ(t, 0) = ψ(t, a) = 0 and initial conditions ψ̇(0, x) = 0 and
ψ(0, x) = \begin{cases} \frac{hx}{b} & \text{for } 0 ≤ x ≤ b \\ \frac{h(a−x)}{a−b} & \text{for } b < x ≤ a \end{cases}   (8.37)

where b ∈ [0, a] and h are constants. (Think of a guitar string plucked at distance b from the end of the
string.) How do the parameters b and h affect the sound of the guitar?

8.4.2 Membranes
We are now dealing with a wave equation of the form

\left( \frac{∂^2}{∂x^2} + \frac{∂^2}{∂y^2} − \frac{∂^2}{∂t^2} \right) ψ = 0 ,   (8.38)

where ψ = ψ(t, x, y) and (x, y) ∈ V ⊂ R2 with a (compact) spatial patch V. We require boundary
conditions at ∂V and initial conditions ψ(0, x, y) = ψ0 (x, y) and ψ̇(0, x, y) = ψ̇0 (x, y). Of course we can
consider any number of “shapes”, V, of the membrane. Let us start with the simplest possibility of a
rectangular membrane, V = [0, a] × [0, b], and Dirichlet boundary conditions ψ|_{∂V} = 0. In this case,
we have an orthogonal basis of eigenfunctions φ_{k,l}(x, y) = \sin\left( \frac{kπx}{a} \right) \sin\left( \frac{lπy}{b} \right), where k, l = 1, 2, . . ., with
∆_2 φ_{k,l} = λ_{k,l} φ_{k,l} and eigenvalues

−λ_{k,l} = \frac{k^2 π^2}{a^2} + \frac{l^2 π^2}{b^2} =: ω_{k,l}^2 .   (8.39)
Inserting into Eq. (8.13) gives
ψ(t, x, y) = \sum_{k,l=1}^{∞} \left( a_{k,l} \sin(ω_{k,l} t) + b_{k,l} \cos(ω_{k,l} t) \right) \sin\left( \frac{kπx}{a} \right) \sin\left( \frac{lπy}{b} \right) .   (8.40)

The coefficients a_{k,l} and b_{k,l} are fixed by the initial conditions and can be obtained using standard Fourier
series techniques (in both the x and y coordinate), in analogy with the string case. We note that the
lowest frequencies of the square (a = b) drum are

ω_{1,1} = \sqrt{2}\, \frac{π}{a} ,   \qquad   ω_{2,1} = ω_{1,2} = \sqrt{5}\, \frac{π}{a} .   (8.41)
Unlike for a string the eigenfrequencies of a drum are not integer multiples of the ground frequency - this
is why a drum sounds less well-defined compared to other instruments and why most instruments use
strings or other, essentially one-dimensional systems to produce sound.
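This is easy to see explicitly; the small sketch below (Python/numpy, with the illustrative choice a = b = 1) lists the lowest eigenfrequencies (8.39) of the square drum and their ratios to ω_{1,1}, which are mostly not integers.

```python
# Lowest eigenfrequencies of a square membrane and their ratios to the ground frequency.
import numpy as np

a = b = 1.0
modes = [(k, l, np.pi * np.sqrt((k / a)**2 + (l / b)**2))
         for k in range(1, 4) for l in range(1, 4)]
modes.sort(key=lambda m: m[2])

omega_11 = modes[0][2]
for k, l, w in modes[:6]:
    print(f"(k,l)=({k},{l})  omega = {w:.4f}  omega/omega_11 = {w / omega_11:.4f}")
```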

For the round membrane with V = {x ∈ R2 | |x| ≤ a} and boundary condition φ|r=a = 0 we should first
find a set of eigenfunctions φ for the two-dimensional Laplacian in polar coordinates, satisfying ∆φ = −λφ.
Starting with the Ansatz φ(r, ϕ) = R(r)eimϕ and using the two-dimensional Laplace operator (6.11) in
polar coordinates gives
r^2 R'' + r R' + (λ r^2 − m^2) R = 0 .   (8.42)

This is the Bessel differential equation for ν = |m| and the solution is R(r) ∼ J_{|m|}(\sqrt{λ}\, r). The boundary
condition at r = a implies that \sqrt{λ}\, a = z_{|m|n}, where z_{νn} denotes the nth zero of the Bessel function J_ν.
This means we have eigenfunctions and eigenvalues

φ_{mn}(r, ϕ) ∼ Ĵ_{|m|n}(r) e^{imϕ} ,   \qquad   λ_{mn} = \frac{z_{|m|n}^2}{a^2} =: ω_{mn}^2 .   (8.43)

Expanding ψ(t, r, ϕ) = \sum_{m,n} T_{mn}(t) φ_{mn}(r, ϕ) we find the differential equations T̈_{mn} = −ω_{mn}^2 T_{mn} so that
ωmn are the eigenfrequencies of the round membrane. As is clear from Eq. (8.43) these eigenfrequencies
are determined by the zeros of the Bessel functions so they are quite irregular.
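Numerically, these eigenfrequencies are readily obtained from tabulated Bessel zeros; the sketch below (Python/scipy, with the illustrative choice a = 1) uses scipy.special.jn_zeros to list the lowest few ω_{mn} of Eq. (8.43).

```python
# Eigenfrequencies of the round membrane from the zeros of the Bessel functions.
import numpy as np
from scipy.special import jn_zeros

a = 1.0
freqs = []
for m in range(0, 3):                                   # angular index
    for n, z in enumerate(jn_zeros(m, 3), start=1):     # first three zeros of J_m
        freqs.append((m, n, z / a))
freqs.sort(key=lambda f: f[2])

omega_01 = freqs[0][2]
for m, n, w in freqs[:6]:
    print(f"(m,n)=({m},{n})  omega = {w:.4f}  omega/omega_01 = {w / omega_01:.4f}")
```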

8.4.3 Green function of wave equation


Returning to the inhomogeneous wave equation, we would like to find a Green function G, satisfying
\left( ∆_3 − \frac{∂^2}{∂t^2} \right) G(t, x) = −4π δ(t) δ(x) .   (8.44)
(We are using the intuitive notion of Delta “functions” to be closer to the standard physics treatment
but our discussion of distributions should reassure us that this leads to sensible results.) If we Fourier
transform G in the t-direction
G(t, x) = \frac{1}{2π} \int_{\mathbb{R}} dω \, G̃(ω, x) e^{−iωt}   (8.45)
it follows that the Fourier transform G̃ satisfies

(∆ + ω 2 )G̃(ω, x) = −4πδ(x) , (8.46)

and is, hence, a Green function for the Helmholtz equation. From our discussion in Section 8.1 there are
essentially two choices for this Green function, namely

G̃_±(ω, x) = \frac{e^{±iω|x|}}{|x|} .   (8.47)

Inserting this into the Fourier integral (8.45) and using the result (7.46) gives

G_±(t, x) = \frac{δ(t ∓ |x|)}{|x|} .   (8.48)
The general solution to the inhomogeneous wave equation, using the so-called retarded Green function
G+ , is then given by
ψ(t, x) = ψ_{hom}(t, x) − \frac{1}{4π} \int_{\mathbb{R}^4} dt' \, d^3 x' \, G_+(t − t', x − x') f(t', x')
        = ψ_{hom}(t, x) − \frac{1}{4π} \int_{\mathbb{R}^4} dt' \, d^3 x' \, \frac{δ(t − t' − |x − x'|)}{|x − x'|} f(t', x')
        = ψ_{hom}(t, x) − \frac{1}{4π} \int_{\mathbb{R}^3} d^3 x' \, \left[ \frac{f(t', x')}{|x − x'|} \right]_{t' = t − |x − x'|}   (8.49)

130
Note that this formula is very much in analogy with the corresponding result for the Laplace equation. The
difference is of course the dependence on t. The source f is evaluated at the retarded time t0 = t − |x − x0 |
to produce the solution at time t. The physical interpretation is that this takes into account the time it
takes for the effect of the source at x0 to influence the solution at x. The above result is the starting point
for calculating the electromagnetic radiation from moving charges.

131
9 Groups and representations∗
Symmetries have become a key idea in modern physics and they are an indispensable tool for the construc-
tion of new physical theories. They play an important role in the formulation of practically all established
physical theories, from Classical/Relativistic Mechanics, Electrodynamics, Quantum Mechanics, General
Relativity to the Standard Model of Particle Physics.
The word “symmetry” used in a physics context (usually) refers to the mathematical structure of a
group, so this is what we will have to study. In physics, the typical problem is to construct a theory which
“behaves” in a certain defined way under the action of a symmetry, for example, which is invariant. To
tackle such a problem we need to know how symmetries act on the basic building blocks of physical theories
and these building blocks are often elements of vector spaces. (Think, for example, of the trajectory r(t)
of a particle in Classical Mechanics which, for every time t, is an element of R3 , a four-vector xµ (t) which
is an element of R4 or the electric and magnetic fields which, at each point in space-time, are elements of
R3 .) Hence, we need to study the action of groups on vector spaces and the mathematical theory dealing
with this problem is called (linear) representation theory of groups. The translation between physical and
mathematical terminology is summarised in the diagram below.

physics                          mathematics

symmetry                ≅        group
   ↓ action on                      ↓ representation
building blocks         ≅        vector spaces

Groups and their representations form a large area of mathematics and a comprehensive treatment can
easily fill two or three lecture courses. Here we will just touch upon some basics and focus on some of the
examples with immediate relevance for physics. We begin with elementary group theory - the definition
of a group, of a sub-group and of group homomorphisms plus examples of groups - before we move on to
representations. Lie groups and their associated Lie algebras play an important role in physics and we
briefly discuss the main ideas before moving on to the physically most relevant examples of such groups.

9.1 Groups: some basics


A group is one of the simplest algebraic structures - significantly simpler than a vector space - which has
only one operation, usually referred to as group multiplication, subject to three axioms.

9.1.1 Groups and subgroups


The formal definition of a group is:

Definition 9.1. (Group) A group is a set G with a map · : G × G → G, called group multiplication, which
satisfies
(G1) g1 · (g2 · g3 ) = (g1 · g2 ) · g3 for all g1 , g2 , g3 ∈ G (associativity)
(G2) There exists an e ∈ G such that e · g = g for all g ∈ G (neutral element)
(G3) For all g ∈G there exists a g̃ ∈ G such that g̃ · g = e (inverse element)
If, in addition, the group multiplication commutes, that is, if g1 · g2 = g2 · g1 for all g1 , g2 ∈ G, then the
group is called Abelian. Otherwise it is called non-Abelian.

Groups can have a finite or infinite number of elements and we will see examples of either. In the former
case, they are called finite groups. The number of elements in a finite group is also called the order of the
group.

The above definition looks somewhat asymmetric since we have postulated that the neutral element
and the inverse in (G2) and (G3) multiply from the left but have made no statement about their multi-
plication from the right. However, this is not a problem due to the following Lemma.
Proposition 9.1. For a group G we have the following statements.
(i) A left-inverse is also a right-inverse, that is, g̃ · g = e ⇒ g · g̃ = e.
(ii) A left-neutral is also a right neutral, that is, e · g = g ⇒ g · e = g.
(iii) For a given g ∈ G, the inverse is unique and denoted by g −1 .
(iv) The neutral element e is unique.
Proof. (i) Start with a g ∈ G and its left-inverse g̃ so that g̃ · g = e. Of course, g̃ must also have a
left-inverse which we denote by g' so that g' · g̃ = e. Then we have
g · g̃ = e · g · g̃ = g' · \underbrace{g̃ · g}_{= e} · g̃ = g' · g̃ = e   (9.1)

so that g̃ is indeed also a right-inverse.


(ii), (iii), (iv) These proofs are similar to the one for (i) and we leave them as an exercise.

The inverse satisfies the following simple properties which we have already encountered in the context of
maps and their inverses.
(g^{−1})^{−1} = g ,   \qquad   (g_1 ◦ g_2)^{−1} = g_2^{−1} ◦ g_1^{−1} .   (9.2)
Exercise 9.1. Prove statements (ii), (iii) and (iv) of Proposition 9.1 as well as the rules (9.2).
We follow the same build-up as in the case of vector spaces and next define the relevant sub-structure,
the sub-group.
Definition 9.2. (Sub-group) A subset H ⊂ G of a group G is called a sub-group of G if it forms a group
under the multiplication induced from G.
The following exercise provides a practical way of checking whether a subset of a group is a sub-group.
Exercise 9.2. Show that a subset H ⊂ G of a group G is a sub-group iff it satisfies the following conditions:
(i) H is closed under the group multiplication.
(ii) e ∈ H
(iii) For all h ∈ H we have h−1 ∈ H

9.1.2 Group homomorphisms


The next step is to define the maps which are consistent with the group structure, the group homomor-
phisms.
Definition 9.3. A map f : G → G̃ between two groups G and G̃ is called a group homomorphism iff
f (g1 · g2 ) = f (g1 ) · f (g2 ) for all g1 , g2 ∈ G.
We can define the image of f by Im(f ) = {f (g) | g ∈ G} ⊂ G̃.
The kernel of f is defined as Ker(f ) = {g ∈ G | f (g) = ẽ} ⊂ G (where ẽ is the neutral element of G̃).
Note that these definitions are in complete analogy with those for linear maps.
Exercise 9.3. Given a group homomorphism f : G → G̃, show that the image is a sub-group of G̃ and
the kernel is a sub-group of G. Also show that
(i) f (e) = ẽ
(ii) f injective ⇐⇒ Ker(f ) = {e}
(iii) f surjective ⇐⇒ Im(f ) = G̃.

9.1.3 Examples of groups
We should now discuss some examples of groups. You are already familiar with many of them although
you may not yet have thought about them in this context.
Examples from “numbers”: Every field F forms a group, (F, +), with respect to addition and F \ {0}
forms a group, (F \ {0}, ·) with respect to multiplication (we need to exclude 0 since it does not have
a multiplicative inverse). So, more concretely, we have the groups (R, +), (C, +), (R \ {0}, ·) and (C \
{0}, ·). The integers Z also form a group, (Z, +) with respect to addition (however, not with respect to
multiplication since there is no multiplicative inverse in Z). Clearly, all of these groups are Abelian. The
group (Z, +) is a sub-group of (R, +) which, in turn, is a sub-group of (C, +).
Examples from vector spaces: Every vector space forms an Abelian group with respect to vector
addition.
Finite Abelian groups: Consider the set Zp := {0, 1, . . . , p − 1} for any positive integer p and introduce
the group multiplication
g1 · g2 := (g1 + g2 ) mod p . (9.3)
Clearly, the Zp form finite, Abelian groups (with neutral element 0) which are also referred to as cyclic
groups.
Finite non-abelian groups: The permutations Sn := {σ : {1, . . . , n} → {1, . . . , n} | σ bijective} of n
objects form a group with group multiplication given by the composition of maps. Indeed, the composition
of maps is associative, we have the identity map which serves as the neutral element and the inverse is
given by the inverse map. In conclusion, the permutations S_n form a finite group of order n! which is also
referred to as the symmetric group. Are these groups Abelian or non-Abelian? We begin with the simplest
case of S2 which has the two elements

S_2 = \left\{ e = \begin{pmatrix} 1 & 2 \\ 1 & 2 \end{pmatrix} , \begin{pmatrix} 1 & 2 \\ 2 & 1 \end{pmatrix} \right\} ,   (9.4)

(Recall, the above notation is a way of writing down permutations explicitly. It indicates a permutation
which maps the numbers in the top row to the corresponding numbers in the bottom row.) Clearly, this
group is Abelian since the second element commutes with itself and everything commutes with the identity
(the first element). For Sn with n > 2 consider the two permutations

σ_1 = \begin{pmatrix} 1 & 2 & 3 & \cdots \\ 1 & 3 & 2 & \cdots \end{pmatrix} ,   \qquad   σ_2 = \begin{pmatrix} 1 & 2 & 3 & \cdots \\ 2 & 1 & 3 & \cdots \end{pmatrix} ,   (9.5)

where the dots stand for arbitrary permutations of the numbers 4, . . . , n. We have

σ_1 · σ_2 = \begin{pmatrix} 1 & 2 & 3 & \cdots \\ 3 & 1 & 2 & \cdots \end{pmatrix} ,   \qquad   σ_2 · σ_1 = \begin{pmatrix} 1 & 2 & 3 & \cdots \\ 2 & 3 & 1 & \cdots \end{pmatrix} ,   (9.6)

so that σ_1 · σ_2 ≠ σ_2 · σ_1. Hence, the permutation groups for n > 2 are non-Abelian.
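The computation (9.6) is easy to reproduce by composing the maps directly; the short Python sketch below (restricted to n = 3, with the dots taken to be trivial) confirms that σ₁ · σ₂ ≠ σ₂ · σ₁.

```python
# Compose the two permutations of Eq. (9.5) as maps and compare the two orderings.
sigma1 = {1: 1, 2: 3, 3: 2}
sigma2 = {1: 2, 2: 1, 3: 3}

compose = lambda s, t: {i: s[t[i]] for i in t}   # (s . t)(i) = s(t(i))

print(compose(sigma1, sigma2))   # {1: 3, 2: 1, 3: 2}, as in Eq. (9.6)
print(compose(sigma2, sigma1))   # {1: 2, 2: 3, 3: 1}, as in Eq. (9.6)
print(compose(sigma1, sigma2) == compose(sigma2, sigma1))   # False: non-Abelian
```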


General linear groups: For a (finite-dimensional, say) vector space V , the set Gl(V ) := {g : V →
V | g linear and invertible} of all invertible, linear maps forms a group, called the general linear group of
V . The group multiplication is composition of maps, the identity map is the neutral element and the
inverse is given be the inverse map (which exists since we are considering only invertible linear maps).
More specifically, we have the groups Gl(Rn ) of invertible, real n × n matrices and the groups Gl(Cn ) of
invertible, complex n × n matrices, both with matrix multiplication as the group multiplication, the unit
matrix as the neutral element and the matrix inverse as the inverse. General linear groups naturally act

on the vector spaces they are associated with and, therefore, realise the action of groups on vectors we
would like to achieve for groups more generally. For this reason, they play an important role in the theory
of representations, as we will see shortly. General linear groups have many interesting sub-groups, some
of which we will now discuss.
Unitary and special unitary groups: The unitary group, U (n), is defined as

U (n) := {U ∈ Gl(Cn ) | U † U = 1n } , (9.7)

so it consists of all unitary n × n matrices. Why is this a group? Since U (n) is clearly a subset of the
general linear group Gl(Cn ) all we have to do is verify the three conditions for a sub-group in Exercise 9.2.
Condition (i), closure, is obvious since the product of two unitary matrices is again unitary, as is condition
(ii) since the unit matrix is unitary. To verify condition (iii) we need to show that with U ∈ U (n) also the
inverse U −1 = U † is in U (n). To do this, consider U as an element of the group Gl(Cn ), so that U † U = 1n
implies U U † = 1n (since, in a group, the left inverse is the right inverse). The last equation can be written
as (U † )† U † = 1n which shows that U † ∈ U (n). Also note that the defining relation, U † U = 1n implies
that the determinant of unitary matrices satisfies

|det(U )| = 1 . (9.8)

The simplest unitary group is


U (1) = {eiθ | 0 ≤ θ < 2π} , (9.9)
which consists of complex numbers with length one. It is clearly Abelian. The observation that U (1) can
be seen as the unit circle, S 1 , in the complex plane points to a more general feature - some groups are
also manifolds and these groups are referred to as Lie groups. We will discuss this in more detail later.
As an example of a group homomorphism consider the map f : Zn = {0, 1, . . . , n − 1} → U (1) defined
by
f (k) := e2πik/n . (9.10)
Exercise 9.4. Show that Eq. (9.10) defines an injective group homomorphism.
To discuss higher-dimensional cases it is useful to introduce the special unitary groups SU (n) which consist
of all unitary matrices with determinant one, so

SU (n) := {U ∈ U (n) | det(U ) = 1} . (9.11)

Exercise 9.5. Show that SU (n) is a sub-group of U (n).


What is the relationship between U (n) and SU (n)? For an arbitrary unitary A ∈ U (n) pick a solution
ζ to the equation ζ n = det(A) . (Due to Eq. (9.8) we have, of course, |ζ| = 1.) Then U := ζ −1 A is a
special unitary matrix since det(U ) = ζ −n det(A) = 1 which shows that every unitary matrix A can be
written as
A=ζU , (9.12)
that is, as a product of a complex phase and a special unitary matrix. We can, therefore, focus on special
unitary matrices. The lowest-dimensional non-trivial case is SU (2) and inserting a general complex 2 × 2
matrix into the defining relations U † U = 12 and det(U ) = 1 gives
  
α β
SU (2) = | α, β ∈ C , |α| + |β| = 1 .
2 2
(9.13)
−β ∗ α∗

Exercise 9.6. Show that Eq. (9.13) provides the correct expression for the group SU (2).

The explicit form (9.13) of SU (2) shows that this group is non-Abelian (as are all higher SU (n) groups)
and that it can be identified with the three-sphere S 3 . This means we have another example of a group
which is also a manifold, so a Lie group.
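A quick numerical sketch (Python/numpy; the random parametrisation below is an illustrative choice) confirms the closure property behind Eq. (9.13): products of matrices of this form are unitary, have unit determinant and are again of the same form.

```python
# Numerical check of the SU(2) parametrisation (9.13).
import numpy as np

def su2(alpha, beta):
    return np.array([[alpha, beta], [-np.conj(beta), np.conj(alpha)]])

rng = np.random.default_rng(0)

def random_su2():
    v = rng.normal(size=4)
    v /= np.linalg.norm(v)                       # enforces |alpha|^2 + |beta|^2 = 1
    return su2(v[0] + 1j * v[1], v[2] + 1j * v[3])

U, V = random_su2(), random_su2()
W = U @ V

print(np.allclose(W.conj().T @ W, np.eye(2)))    # unitary
print(np.isclose(np.linalg.det(W), 1.0))         # determinant one
# the product again has the form (9.13)
print(np.isclose(W[1, 1], np.conj(W[0, 0])), np.isclose(W[1, 0], -np.conj(W[0, 1])))
```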
Finding explicit parametrisations along the lines of Eq. (9.13) for SU (3) or even higher-dimensional
cases is not practical anymore and we will later discuss more efficient ways of dealing with such groups.
Orthogonal and special orthogonal groups: This discussion is very much parallel to the previous one
on unitary and special unitary groups, except it is based on the real, rather than the complex numbers.
The orthogonal and special orthogonal groups (= rotations) are defined as

O(n) := \{ A ∈ Gl(\mathbb{R}^n) \;|\; A^T A = 1_n \}   (9.14)
SO(n) := \{ R ∈ O(n) \;|\; \det(R) = 1 \} ,   (9.15)

and, hence, consist of all orthogonal matrices and all orthogonal matrices with determinant one, respec-
tively.
Exercise 9.7. Show that O(n) is a sub-group of Gl(Rn ) and that SO(n) is a sub-group of O(n). (Hint:
Proceed in analogy with the unitary case.) Also show that every A ∈ O(n) has determinant det(A) = ±1
and is either special orthogonal or can be written as A = F R, where R ∈ SO(n) and F = diag(−1, 1, . . . , 1).
Just as for unitary groups, it is easy to deal with the two-dimensional case and show, by explicitly inserting
an arbitrary 2 × 2 matrix into the defining relations, that SO(2) is given by

SO(2) = \left\{ \begin{pmatrix} \cos θ & \sin θ \\ −\sin θ & \cos θ \end{pmatrix} \;\middle|\; 0 ≤ θ < 2π \right\} .   (9.16)

Since this is parametrised by a circular coordinate, SO(2) can also be thought of as a circle, S 1 . This is a
good opportunity to present another example of a group homomorphism f : U(1) → SO(2), defined by

f(e^{iθ}) := \begin{pmatrix} \cos θ & \sin θ \\ −\sin θ & \cos θ \end{pmatrix} .   (9.17)

Exercise 9.8. Show that the map (9.17) defines a bijective group homomorphism.
The previous exercise shows that U (1) and SO(2) are isomorphic - as far as their group structure is
concerned they represent the same object. Perhaps more surprisingly, we will see later that SO(3) and
SU(2) are related by a (two-to-one) group homomorphism and are very nearly isomorphic as well.
We could continue and write down an explicit parametrisation for SO(3), for example in terms of a
product of three two-dimensional rotations but will refrain from doing so explicitly. Later we will see that
there are more efficient methods to deal with SO(n) for n > 2.

The previous list provides us with sufficiently many interesting examples of groups and we now turn to
our main task - representing groups.

9.2 Representations
Recall from the introduction that we are after some rule by which groups can act on vector spaces. We
already know that the general linear group Gl(V ) (where V = Rn or V = Cn , say), which consists of
invertible linear maps (or matrices) on V , naturally acts on the vector space V . We can, therefore, achieve
our goal if we embed an arbitrary group G into Gl(V ). However, we want to do this in a way that preserves
the group structure of G and this is precisely accomplished by group homomorphisms. This motivates the
following definition of a representation of a group.

Definition 9.4. (Representation of a group) A representation of a group G is a group homomorphism
R : G → Gl(V ), where V is a vector space (typically taken to be V = Rn or V = Cn ). The dimension
of V is called the dimension, dim(R), of the representation R. A representation is called faithful if R is
injective.
The keyword in this definition is “group homomorphism” which means that the representation R satisfies

R(g1 · g2 ) = R(g1 )R(g2 ) , (9.18)


for all g1 , g2 ∈ G. This rule implies that the representation matrices R(g) multiply “in the same way”
as the original group elements g and it is for this reason that we call such an R a “representation”. To
illustrate the idea of a group representation we should discuss a few examples.

9.2.1 Examples of representations


Trivial representation: For every group G, we have the trivial representation defined by R(g) = 1n for
all g ∈ G. Of course this representation is not faithful.
Matrix groups: Some of the examples of groups we have introduced earlier already consist of matrices
(that is, they are sub-groups of the general linear group) and are, therefore, represented by themselves,
taking R to be the identity map. This applies to the (special) unitary groups (S)U (n) and the (special)
orthogonal groups (S)O(n). In each case, the dimension of the representation is n (since the matrices act on
n-dimensional vectors) and the representation is faithful. These particular representations are frequently
called the fundamental representations. Of course, these groups have other, less trivial representations
and we will see examples later.
Representations of Zn : For the cyclic groups Zn = {0, 1, . . . , n − 1} and any q = 0, . . . , n − 1 we can
write down a one-dimensional representation R(q) : Zn → Gl(C) =: C∗ by

R(q) (k) := e2πiqk/n . (9.19)

Representations of U (1): For U (1) = {eiθ | 0 ≤ θ < 2π} and any q ∈ Z we can define a one-dimensional,
faithful representation R(q) : U (1) → Gl(C) =: C∗ by

R(q) (eiθ ) = eiqθ . (9.20)

Direct sum representations: For two representations R : G → Gl(V ) and R̃ : G → Gl(Ṽ ) of the same
group G, we can define the direct sum representation R ⊕ R̃ : G → Gl(V ⊕ Ṽ) by

(R ⊕ R̃)(g) := \begin{pmatrix} R(g) & 0 \\ 0 & R̃(g) \end{pmatrix} ,   (9.21)

that is, by simply arranging the representation matrices R(g) and R̃(g) into a block-matrix. Obviously
dimensions sum up, so dim(R ⊕ R̃) = dim(R) + dim(R̃).
As an explicit example, consider the above one-dimensional representations R(1) and R(−1) of U (1),
taking q = ±1 in Eq. (9.20). Their direct sum representation is two-dimensional and given by

(R^{(1)} ⊕ R^{(−1)})(e^{iθ}) = \begin{pmatrix} e^{iθ} & 0 \\ 0 & e^{−iθ} \end{pmatrix} .   (9.22)

Tensor representations: For this we need a bit of preparation. Consider two square matrices A, B of
size n and m, respectively. By their Kronecker product A × B we mean the square matrix of size nm

obtained by replacing each entry in A with that entry times the entire matrix B. The Kronecker product
satisfies the useful rule
(A × B)(C × D) = (AC) × (BD) , (9.23)
provided the sizes of the matrices A, B, C, D fit as required.
Now consider two representations R : G → Gl(V ) and R̃ : G → Gl(Ṽ ) of the same group G. The
tensor representation R ⊗ R̃ : G → Gl(V ⊗ Ṽ ) is defined by

(R ⊗ R̃)(g) := R(g) × R̃(g) . (9.24)

Given the definition of the Kronecker product, dimensions multiply, so dim(R ⊗ R̃) = dim(R)dim(R̃).
As an explicit example, consider the two-dimensional fundamental representation of SU (2) given by
R(U) = U for any U ∈ SU(2). Then the tensor representation R ⊗ R is four-dimensional and given by

(R ⊗ R)(U) = \begin{pmatrix} U_{11} U & U_{12} U \\ U_{21} U & U_{22} U \end{pmatrix} .   (9.25)

Exercise 9.9. Use Eq. (9.23) to show that Eq. (9.24) does indeed define a representation.

This should be enough examples for now.
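As a final numerical illustration, the homomorphism property (9.18) of the tensor representation (9.24) can be checked with np.kron, which implements the Kronecker product and hence the rule (9.23); the two SU(2) elements below are illustrative choices.

```python
# Check that the tensor representation of SU(2) respects the group multiplication.
import numpy as np

def su2(alpha, beta):
    return np.array([[alpha, beta], [-np.conj(beta), np.conj(alpha)]])

# two illustrative SU(2) elements (|alpha|^2 + |beta|^2 = 1 by construction)
U = su2(np.exp(0.3j) * np.cos(0.4), np.exp(-0.7j) * np.sin(0.4))
V = su2(np.exp(-1.1j) * np.cos(1.2), np.exp(0.2j) * np.sin(1.2))

R = lambda g: np.kron(g, g)          # the tensor representation R (x) R of Eq. (9.24)

print(np.allclose(R(U @ V), R(U) @ R(V)))   # True: (R x R)(UV) = (R x R)(U) (R x R)(V)
```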

9.2.2 Properties of representations


To talk about representations efficiently we need to introduce a few more pieces of standard terminology.

Definition 9.5. Two representations R : G → Gl(V ) and R̃ : G → Gl(V ) are called equivalent if there is
an invertible linear map P : V → V such that R̃(g) = P R(g)P −1 for all g ∈ G.

In other words, if the representation matrices of two representations only differ by a common basis trans-
formation they are really the same representation and we call them equivalent. A further simple piece of
terminology is the following.

Definition 9.6. A representation R : G → Gl(V ) on an inner product vector space V is called unitary iff
all R(g) are unitary with respect to the inner product on V .

More practically, if V = Rn or V = Cn with their respective standard scalar products, the unitary
representations are precisely those with all representation matrices orthogonal or unitary matrices. For
example, the U (1) representations (9.20) and (9.22) are unitary.
In Eq. (9.21) we have seen that larger representations can be built up from smaller ones by forming
a direct sum. Conversely we can ask whether a given representation can be split up into a direct sum of
smaller representations. In this way, we can attempt to break up representations into their smallest
building blocks which cannot be decomposed further in this way and which are called irreducible representations.

Definition 9.7. A representation R : G → Gl(V ) is called irreducible if there is no non-trivial sub-vector


space W ⊂ V (that is, no sub-space other than W = {0} and W = V ) invariant under R, that is, there is
no non-trivial W for which R(g)(W ) ⊂ W for all g ∈ G. Otherwise, the representation is called reducible.

For example, all fundamental representations of (S)O(n) and (S)U (n) are irreducible.

Exercise 9.10. Show that the fundamental representations of SU (2) and SO(3) are irreducible.

What does it mean in practice for a representation to be reducible? Suppose we have a non-trivial sub-
space W ⊂ V invariant under the representation R and we choose a basis for W which we complete to a
basis for V . Relative to this basis, the representation matrices have the form
\[
R(g) = \begin{pmatrix} A(g) & B(g) \\ 0 & D(g) \end{pmatrix} \begin{matrix} \leftarrow W \\ \leftarrow \text{rest} \end{matrix} \;, \qquad (9.26)
\]

where A(g), B(g) and D(g) are matrices. The zero in the lower left corner is forced upon us to ensure a
vector in W is mapped to a vector in W as required by invariance. The form (9.26) is not quite a direct
sum due to the presence of the matrix B(g). To deal with this we define the following.

Definition 9.8. A reducible representation is called fully reducible if it is a direct sum of irreducible
representations.

In practice, this means we can achieve a form for which all matrices B(g) in Eq. (9.26) vanish. Not all
reducible representations are fully reducible but we have

Theorem 9.11. All reducible representations of finite groups and all reducible unitary representations are
fully reducible.

Proof. The proof can be found in Ref [13].

A common problem is having to find the irreducible representations contained in a given (fully) re-
ducible representation. If this representation is given as a direct sum, R ⊕ R̃, with R and R̃ irreducible,
then this task is trivial - the irreducible pieces are simply R and R̃. The tensor product R ⊗ R̃ is more
interesting. It is not obviously block-diagonal but, as it turns out, it is usually reducible. This means
there is a decomposition, also called Clebsch-Gordan decomposition,

R ⊗ R̃ ∼
= R1 ⊕ · · · ⊕ Rk (9.27)

of a tensor product into irreducible representations Ri . We will later study this decomposition more
explicitly for the case of SU (2).
Since the interest in physics is usually in orthogonal or unitary matrices Theorem 9.11 covers the
cases we are interested in and it suggests a classification problem. For a given group G we should find all
irreducible representations - the basic building blocks from which all other representations (subject to the
conditions of the above theorem) can be obtained by forming direct sums. A very helpful statement for
this purpose (which is useful in other contexts, for example in quantum mechanics, as well) is the famous
Schur’s Lemma.

Lemma 9.1. (Schur’s Lemma) Let R : G → Gl(V ) be an irreducible representation of the group G over
a complex vector space V and P : V → V a linear map with [P, R(g)] = 0 for all g ∈ G. Then P = λ idV ,
for a complex number λ.

Proof. Since we are working over the complex numbers the characteristic polynomial for P has at least one
zero, λ, and the associated eigenspace EigP (λ) is non-trivial. For any v ∈ EigP (λ), using [P, R(g)] = 0,
we have
P R(g)v = R(g)P v = λR(g)v . (9.28)
Hence, R(g)v is also an eigenvector with eigenvalue λ and we conclude that the eigenspace EigP (λ) is
invariant under R. However, R is irreducible by assumption which means there are no non-trivial invariant
sub-spaces. Since EigP (λ) ≠ {0} the only way out is that EigP (λ) = V . This implies that P = λ idV .

Schur’s lemma says in essence that a matrix commuting with all (irreducible) representation matrices of
a group must be a multiple of the unit matrix. This can be quite a powerful statement. However, note
the condition that the representation is over a complex vector space - the theorem fails in the real case.
A counterexample is provided by the fundamental representation of SO(2), given by the matrices (9.16).
Seen as a representation over R2 it is irreducible but all representation matrices of SO(2) commute with
one another.
An immediate conclusion from Schur’s Lemma is the following.
Corollary 9.1. All complex irreducible representations of an Abelian group are one-dimensional.
Proof. For an Abelian group G we have g ◦ g̃ = g̃ ◦ g for all g, g̃ ∈ G which implies [R(g), R(g̃)] = 0
for any representation R. The linear map P = R(g) then satisfies all the conditions of Schur’s Lemma
and we conclude that R(g) = λ(g) idV . However, this form is only consistent with R being irreducible if
dim(R) = 1.

This statement is the key to finding all complex, irreducible representations of Abelian groups and we
discuss this for the two most important Abelian examples, Zn and U (1).

Application 9.32. All complex irreducible representations of Zn


The group Zn = {0, 1, . . . , n − 1} is Abelian so all complex irreducible representations R must be
one-dimensional from the previous lemma. The identity element 0 ∈ Zn must be mapped to the
identity “matrix”, so R(0) = 1. Let us set R(1) = ζ ∈ Gl(C) = C∗ . It follows that

\[
1 = R(0) = R(\underbrace{1 + 1 + \cdots + 1}_{n \text{ times}}) = R(1)^n = \zeta^n , \qquad (9.29)
\]

so ζ must be an nth root of unity, ζ = e2πiq/n , where q = 0, . . . , n − 1. Note that the choice of
R(1) = ζ determines the entire representation since R(k) = R(1)k = ζ k . This means we have
precisely n complex, irreducible representations of Zn given by

R(q) (k) = e2πiqk/n where q = 0, 1, . . . , n − 1 . (9.30)
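As a quick numerical illustration (a Python sketch, assuming numpy; not part of the derivation above), one can confirm that each of the n maps (9.30) respects the group law of Zn :

import numpy as np

n = 6
for q in range(n):
    R = lambda k: np.exp(2j * np.pi * q * k / n)   # the representation (9.30)
    for k in range(n):
        for l in range(n):
            # the group law of Z_n is addition modulo n
            assert np.isclose(R((k + l) % n), R(k) * R(l))
print("all", n, "irreducible representations of Z_n verified")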

Application 9.33. All complex irreducible representations of U (1)


Arguing for all (continuous) irreducible, complex representations of U (1) works in much the same
way, with the conclusion that the representations (9.20) with q ∈ Z form a complete set.

Exercise 9.12. Show that Eq. (9.20), where q ∈ Z is a complete list of all complex, irreducible U (1)
representations. (Hint: Start by considering representation matrices for group element eiθ where θ is
rational and then use continuity.)

In a physics context, the integer q which labels the above Zn and U (1) representations is also called
the charge. As you will learn later in the physics course, fixing the electrical charge of a particle (such
as the electron charge), mathematically amounts to choosing a U (1) representation for this particle.
As we have seen above, complex, irreducible representations of Abelian groups are quite straight-
forward to classify, essentially because the representation “matrices” are really just numbers. The
somewhat “brute-force” approach taken above becomes impractical if not unfeasible for the non-

Abelian case. Just imagine having to write down, for a two-dimensional representation, an arbitrary
2 × 2 matrix as an Ansatz for each representation matrix and then having to fix the unknown entries
by imposing the group multiplication table on the matrices. Clearly, we need more sophisticated
methods to deal with the non-Abelian case. There is a beautiful set of methods for finite non-Abelian
groups, using characters but discussing this in detail is beyond the scope of this lecture. If you are
interested, have a look at Ref. [13]. Instead, we focus on Lie groups, which we now define and analyse.

9.3 Lie groups and Lie algebras


9.3.1 Definition of Lie group
A Lie group is formally defined as follows.

Definition 9.9. A group G is a Lie group if it is a differentiable manifold and the left-multiplication with
group elements and the inversion of group elements are differentiable maps.

It is difficult to exploit this definition without talking in some detail about differentiable manifolds, which
is well beyond the scope of this lecture. Fortunately, for our purposes we can adopt a somewhat more
practical definition of a Lie group which follows from the more abstract one above.
A matrix Lie group G is a group given (at least in a neighbourhood of the group identity 1) by
a family g = g(t) of n × n matrices, which depend (in an infinitely differentiable way) on real parameters
t = (t1 , . . . , tk ) and such that the matrices ∂g/∂t1 , . . . , ∂g/∂tk are linearly independent. (This last requirement
is really the same as the maximal rank condition in Theorem B.1.) Note that the matrices g(t) act on
the vector space V = Rn or V = Cn . For convenience, we assume that g(0) = 1n is the group identity.
The number, k, of parameters required is called the dimension of the Lie group. (Do not confuse this
dimension of the matrix Lie group with the dimension n of the vector space V on which these matrices
act.)
Obvious examples of Lie groups are U (1) in Eq. (9.9) and SO(2) in Eq. (9.16), both of which are
parametrised by one angle (playing the role of the single parameter t1 ) and are, hence, one-dimensional.
A more interesting example is provided by SU (2). We can solve the constraint on α and β in Eq. (9.13)
by setting β = −t2 + it1 and α = √(1 − t1² − t2²) e^{it3}. This leads to the explicit parametrisation
\[
U = \begin{pmatrix} \sqrt{1 - t_1^2 - t_2^2}\, e^{i t_3} & -t_2 + i t_1 \\ t_2 + i t_1 & \sqrt{1 - t_1^2 - t_2^2}\, e^{-i t_3} \end{pmatrix} , \qquad (9.31)
\]

which shows that SU (2) is a three-dimensional Lie group. A similar argument can be made for SO(3)
which can be parametrised in terms of three angles and is, hence, a three-dimensional Lie group as well.
In fact, all orthogonal and unitary groups (as well as their special sub-groups) can be parametrised in this
way and are Lie groups. However, writing down these parametrisations explicitly becomes impractical in
higher dimensions.

9.3.2 Definition of Lie algebra


Instead, there are much more efficient methods of dealing with Lie-groups which are based on the obser-
vation that, for many purposes, it is already sufficient to consider “infinitesimal” transformations, that is,
group elements “near” the identity. The set of these infinitesimal transformations of a group G is called
the Lie algebra of G. Abstractly, a Lie algebra is defined as follows.

Definition 9.10. A Lie algebra L is a vector space with a commutator bracket [·, ·] : L × L → L which is
anti-symmetric, so [T, S] = −[S, T ] and satisfies the Jacobi identity [T, [S, U ]] + [S, [U, T ]] + [U, [T, S]] = 0
for all T, S, U ∈ L.
Let us see how we can associate a Lie algebra in this sense to our group G. We start with the generators,
Ti , of the group defined by^9
\[
T_i := -i \frac{\partial g}{\partial t^i}(0) \;. \qquad (9.32)
\]
In terms of the generators we can think of group elements near the identity as given by the Taylor series
\[
g(t) = 1 + i \sum_{i=1}^{k} t_i T_i + O(t^2) \;. \qquad (9.33)
\]

The Lie algebra L(G) is the vector space of matrices spanned (over R) by the generators, that is,

L(G) = SpanR (T1 , . . . , Tk ) . (9.34)

By definition of a matrix Lie group the generators must be linearly independent, so the dimension of L(G)
as a vector space is the same as the dimension of the underlying group as a manifold. Now we understand
how to obtain the generators from the Lie group. Is there a way to reverse this process and obtain the
Lie group from its generators? Amazingly, the answer is “yes” due to the following theorem.
Theorem 9.13. Let G be a (matrix) Lie group and L(G) as defined in Eq. (9.34). Then the matrix
exponential exp(i ·) provides a map exp(i ·) : L(G) → G whose image is the part of G which is (path)-
connected to the identity.
Proof. See, for example, Ref. [13].

Application 9.34. The exponential map for SU(2)


Theorem 9.13 can be verified explicitly for SU (2) where the matrix exponential can be carried out
explicitly. Setting t = 2θn with a unit vector n we have

\[
g(t) = \exp(i t_j \tau_j) = \sum_{n=0}^{\infty} \frac{(i\theta)^n}{n!}\, (\mathbf{n} \cdot \boldsymbol{\sigma})^n = \cos(\theta)\, \mathbb{1}_2 + i \sin(\theta)\, \mathbf{n} \cdot \boldsymbol{\sigma} \;. \qquad (9.35)
\]
This gives all SU(2) matrices, as comparison with Eq. (9.13) shows.
We can also recover the group SO(3) by forming the matrix exponential

g(t) = exp(iti T̃i ) , (9.36)

with the matrices T̃i in Eq. (9.51). (Note that O(3), which has the same Lie algebra as SO(3), cannot
be fully recovered by the matrix exponential since the orthogonal matrices with determinant −1 are
not path-connected to the identity.)
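Eq. (9.35) can be confirmed numerically, for instance with the following sketch (assuming Python with numpy and scipy.linalg.expm for the matrix exponential; the random angle and axis are just for illustration):

import numpy as np
from scipy.linalg import expm

sigma = np.array([[[0, 1], [1, 0]],
                  [[0, -1j], [1j, 0]],
                  [[1, 0], [0, -1]]], dtype=complex)

rng = np.random.default_rng(1)
theta = rng.uniform(0, np.pi)
n = rng.normal(size=3)
n = n / np.linalg.norm(n)                  # unit vector defining the rotation axis
n_dot_sigma = np.einsum('i,ijk->jk', n, sigma)

lhs = expm(1j * theta * n_dot_sigma)       # exp(i t_j tau_j) with t = 2 theta n and tau = sigma/2
rhs = np.cos(theta) * np.eye(2) + 1j * np.sin(theta) * n_dot_sigma
assert np.allclose(lhs, rhs)
assert np.allclose(lhs.conj().T @ lhs, np.eye(2)) and np.isclose(np.linalg.det(lhs), 1)
print("Eq. (9.35) verified numerically")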

We know that L(G) is a vector space and we can define a bracket by the simple matrix commutator
[T, S] := T S − ST . Clearly, this bracket is anti-symmetric and a simple calculation shows that it satisfies
the Jacobi identity.
^9 The factor of −i is included so that the generators become hermitian (rather than anti-hermitian).

Exercise 9.14. Show that the matrix commutator bracket [T, S] = T S − ST on L(G) satisfies the Jacobi
identity.

All that remains to be shown for L(G) to be a Lie-algebra in the sense of Def. 9.10 is that it is closed
under the bracket, that is, if T, S ∈ L(G) then [T, S] ∈ L(G). This follows from the closure of G under
multiplications. Start with two group element g(t) and g(s), expand each to second order using the matrix
exponential and consider the combination
\[
g(t)^{-1} g(s)^{-1} g(t) g(s) = 1 - \sum_{i,j=1}^{k} t_i s_j [T_i , T_j] + \cdots \;. \qquad (9.37)
\]

Since the LHS of this equation must be a group element we conclude that the commutator [T, S] of any
T, S ∈ L(G) is indeed an element of L(G). In conclusion, L(G), the vector space spanned by the generators, together with
the matrix commutator, forms a Lie algebra in the sense of Def. 9.10.
Since [Ti , Tj ] ∈ L(G) and the generators form a basis of L(G) it is clear that there must be constants
f_{ij}{}^{k} , also called structure constants of the Lie algebra, such that
\[
[T_i , T_j] = f_{ij}{}^{k}\, T_k \;. \qquad (9.38)
\]

Eq. (9.33) shows why we should think of the Lie algebra L(G) as encoding “infinitesimal” group trans-
formations. Consider a vector v ∈ V which transforms under a group element g(t) as v → g(t)v. Then
inserting the expansion (9.33), it follows that
\[
v \to g(t) v = \Big( \mathbb{1}_n + i \sum_{i=1}^{k} t_i T_i + O(t^2) \Big) v \quad \Rightarrow \quad \delta v := g(t) v - v = i \sum_{i=1}^{k} t_i T_i\, v + O(t^2) \;. \qquad (9.39)
\]

9.3.3 Examples of Lie groups and their algebras


Let us compute a few Lie algebras explicitly.
Lie algebra of U (1) The group U (1) depends on one parameter t1 , see Eq. (9.9), and we find for the
single generator
\[
g(t_1) = e^{i t_1} \quad \Rightarrow \quad T_1 = -i \frac{\partial g}{\partial t_1}(0) = 1 \;. \qquad (9.40)
\]
Hence, the Lie algebra, L(U (1)) = R, is one-dimensional and the commutator bracket and the structure
constants are trivial.
Lie algebra of SO(2) The group SO(2) also depends on a single parameter t1 , see Eq. (9.16), and for
the generator we have
\[
g(t_1) = \begin{pmatrix} \cos t_1 & \sin t_1 \\ -\sin t_1 & \cos t_1 \end{pmatrix} \quad \Rightarrow \quad T_1 = -i \frac{\partial g}{\partial t_1}(0) = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix} . \qquad (9.41)
\]

This means the Lie algebra L(SO(2)) is one-dimensional and consists of (i times) the 2 × 2 anti-symmetric
matrices.
Lie algebra of SU (2) This is a more interesting example, since SU (2) depends on three parameters ti ,
where i = 1, 2, 3 and taking the derivatives of the group elements in Eq. (9.31) we find for the generators
\[
T_i = -i \frac{\partial g}{\partial t_i}(0) = \sigma_i \;, \qquad (9.42)
\]

where σi are the Pauli matrices. Recall that the Pauli matrices satisfy the useful relation

σ_i σ_j = 1_2 δ_ij + i ε_ijk σ_k . (9.43)

It is customary to take as the standard generators for L(SU (2)) the matrices
τ_i := σ_i / 2 . (9.44)
Hence, the Lie-algebra of SU (2) is three-dimensional and given by

L(SU (2)) = Span(τi ) = {hermitian, traceless 2 × 2 matrices} (9.45)

Given the commutation relations [σ_i , σ_j ] = 2i ε_ijk σ_k (which follow from Eq. (9.43)) for the Pauli matrices
we have
[τ_i , τ_j ] = i ε_ijk τ_k , (9.46)
for the standard generators τ_i so that the structure constants are f_ij^k = i ε_ijk .
Exercise 9.15. Verify the relation (9.43) for the Pauli matrices. (Hint: Show that the Pauli matrices
square to the unit matrix and that the product of two different Pauli matrices is ±i times the third.) Show
from Eq. (9.43) that [σ_i , σ_j ] = 2i ε_ijk σ_k and that tr(σ_i σ_j ) = 2δ_ij .
The method for computing the Lie algebra illustrated above becomes impractical for higher-dimensional
groups since it requires an explicit parametrisation. However, there is a much more straightforward way
to proceed.
Lie algebras of unitary and special unitary groups: To work out the Lie algebra of unitary groups
U (n) in general start with the Ansatz U = 1n +iT +· · · and insert this into the defining relation U † U = 1n ,
keeping only terms up to linear order in T , to work out the resulting constraint on the generators. Doing
this in the present case results in T = T † , so the generators must be hermitian and the Lie algebra is

L(U (n)) = {hermitian n × n matrices} ⇒ dim(L(U (n))) = n2 . (9.47)

For the case of SU (n) we have to add the condition det(U ) = 1 which leads to det(1n + iT + · · · ) =
!
1 + i tr(T ) + · · · = 1, so that tr(T ) = 0. This means

L(SU (n)) = {traceless hermitian n × n matrices} ⇒ dim(L(SU (n))) = n2 − 1 . (9.48)

We can now choose a basis and compute structure constants for these Lie algebras but we will not do this
explicitly, other than for the case of SU (2) which we have already covered.
Lie algebras of orthogonal and special orthogonal groups: This works in analogy with the unitary
case. Inserting the Ansatz A = 1n + iT + · · · into the defining relation AT A = 1n for O(n) (where T has
to be purely imaginary, so that A is real) leads to T = −T T , so that T must be anti-symmetric. Since
anti-symmetric matrices are already traceless the additional condition det(A) = 1 for SO(n) does not add
anything new and we have

L(O(n)) = L(SO(n)) = {anti-symmetric, purely imaginary n × n matrices} (9.49)


1
⇒ dim(L(O(n))) = dim(L(SO(n))) = n(n − 1) . (9.50)
2
We will only discuss a choice of basis for the simplest non-trivial case L(SO(3)) which is spanned by the
three matrices
(T̃_i )_jk = −i ε_ijk (9.51)

which are more explicitly given by
\[
\tilde{T}_1 = i \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & 1 & 0 \end{pmatrix} , \quad
\tilde{T}_2 = i \begin{pmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \\ -1 & 0 & 0 \end{pmatrix} , \quad
\tilde{T}_3 = i \begin{pmatrix} 0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix} . \qquad (9.52)
\]

They satisfy the commutation relations
[T̃_i , T̃_j ] = i ε_ijk T̃_k . (9.53)

Hence the structure constants are f_ij^k = i ε_ijk .

Exercise 9.16. Verify the SO(3) commutation relations (9.53).
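A machine check of the relations (9.53), and hence of Exercise 9.16, is straightforward; the sketch below (Python/numpy, assumed available) builds the generators directly from Eq. (9.51):

import numpy as np

# Levi-Civita symbol eps[i, j, k]
eps = np.zeros((3, 3, 3))
for i, j, k in [(0, 1, 2), (1, 2, 0), (2, 0, 1)]:
    eps[i, j, k], eps[j, i, k] = 1, -1

# generators (9.51): (T_i)_{jk} = -i eps_{ijk}
T = -1j * eps

for i in range(3):
    for j in range(3):
        comm = T[i] @ T[j] - T[j] @ T[i]
        expected = 1j * np.einsum('k,kab->ab', eps[i, j], T)   # i eps_{ijk} T_k
        assert np.allclose(comm, expected)                     # relation (9.53)
print("SO(3) commutation relations verified")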

We have seen above that the generators of unitary (or orthogonal) groups are hermitian. Conversely, we
can show that the exponential map acting on hermitian generators leads to unitary group elements. To
do this first note that for two matrices X, Y with [X, Y ] = 0 we have exp(X) exp(Y ) = exp(X + Y ). Also
note that (exp(X))† = exp(X † ). It follows for a hermitian matrix T that

(exp(iT ))† exp(iT ) = exp(−iT ) exp(iT ) = exp(0) = 1 , (9.54)

so that exp(iT ) is indeed unitary.

Exercise 9.17. Show that (exp(X))† = exp(X † ) for a square matrix X. Also show that for two square
matrices X, Y with [X, Y ] = 0 we have exp(X) exp(Y ) = exp(X + Y ). (Hint: Use the series which defines
the exponential function.)

Exercise 9.18. Prove equation (9.35). (Hint: Use Eq. (9.43).) Further, show that the RHS of Eq. (9.35)
provides all SU (2) matrices.

9.3.4 Lie algebra representations


Let us come back to our original goal of finding representations of Lie groups. Since, as we have seen,
the group can be recovered from the algebra, it makes sense to discuss representations at the level of
the algebra (and then use the matrix exponential to lift to the group). For this, we need the idea of a
representation of a Lie algebra.

Definition 9.11. A representation r of a Lie algebra L is a linear map r : L → Hom(V, V ) which preserves
the bracket, that is, r([T, S]) = [r(T ), r(S)].

The notions of reducible, irreducible and fully reducible representations we have introduced for group
representations directly transfer to representations of the Lie algebra. Note that the space Hom(V, V ) is,
in practice, the space of n × n real matrices for V = Rn or the space of n × n complex matrices for V = Cn .
So a representation of a Lie algebra amounts to assigning (linearly) to each Lie algebra element T a matrix
r(T ) such that the matrices commute “in the same way” as the Lie algebra elements. A practical way
of stating what a Lie algebra representation is uses a basis Ti of the Lie algebra. Suppose the Ti
commute as
[T_i , T_j ] = f_ij^k T_k . (9.55)
Then the assignment Ti → r(Ti ) defines a Lie algebra representation provided the matrices r(Ti ) commute
with the same structure constants, that is,
[r(T_i ), r(T_j )] = f_ij^k r(T_k ) . (9.56)

As an example for how this works in practice, consider the three matrices τi in Eq. (9.44) which form a
basis of L(SU (2)). Any assignment τi → Ti to matrices Ti which commute with the same structure constants
as the τi , that is [T_i , T_j ] = i ε_ijk T_k , defines a Lie algebra representation of L(SU (2)). In fact, we have
already seen an example of such matrices, namely the matrices T̃i in Eq. (9.51) which form a basis of the
Lie-algebra L(SO(3)). Hence, we see that the Lie algebra of SO(3) is a representation of the Lie-algebra
of SU (2) - a clear indication that those two groups are closely related.
The idea for how we want to proceed discussing representations is summarised in the following diagram.
            R
   G       −→       Gl(V )
  exp ↑                 ↑ exp                 (9.57)
            r
  L(G)     −→       Hom(V, V )
Instead of studying representations R of the Lie group G we will be studying representations r of its
Lie-algebra L(G). When required, we can use the exponential map to reconstruct the group G and its
representation matrices in Gl(V ). In other words, suppose we have a Lie-algebra element T ∈ L(G) and
the associated group element g = exp(iT ) ∈ G. Then the relation between a representation r of the Lie
algebra and the corresponding representation R at the group level is summarised by

T 7→ r(T ) , g = exp(iT ) 7→ R(g) = exp(ir(T )) . (9.58)

We will now explicitly discuss all this for the group SU (2) and its close relative SO(3).

9.4 The groups SU (2) and SO(3)


9.4.1 Relationship between SU (2) and SO(3)
Our first step is to understand the relationship between SU (2) and SO(3) better. We have already seen
that their Lie algebras are representations of each other but what about the groups? We will now see that
SO(3) is, in fact, a group representation of SU (2).
To do this it is useful to define the map ϕ : R3 → L(SU (2)) by

ϕ(t) := ti σi , (9.59)

so that ϕ identifies R3 with L(SU (2)). Note that the dot product between two vectors t, s ∈ R3 can then
be written as
t · s = (1/2) tr(ϕ(t)ϕ(s)) , (9.60)
as a result of the identity tr(σi σj ) = 2 δij which follows from Eq. (9.43).

Exercise 9.19. Prove the relation (9.60) by using Eq. (9.43).

We now define a three-dimensional representation R : SU (2) → Gl(R3 ) of SU (2) by


 
R(U )(t) := ϕ−1 (U ϕ(t)U † ) . (9.61)

Note that this makes sense. The matrix U ϕ(t)U † is hermitian traceless and, hence, an element of
L(SU (2)). Therefore, we can associate to it, via the inverse map ϕ−1 , a vector in R3 . In order to
study the representation R further, we analyse its effect on the dot product.
\[
(R(U) t) \cdot (R(U) s) = \tfrac{1}{2} \mathrm{tr}\big( \varphi(R(U)t)\,\varphi(R(U)s) \big) = \tfrac{1}{2} \mathrm{tr}\big( U \varphi(t) U^{\dagger} U \varphi(s) U^{\dagger} \big) = \tfrac{1}{2} \mathrm{tr}\big( \varphi(t)\varphi(s) \big) = t \cdot s \qquad (9.62)
\]
This shows that R(U ) leaves the dot product invariant and, therefore, R(U ) ∈ O(3). A connectedness
argument shows that, in fact, R(U ) ∈ SO(3) 10 . The representation R is not faithful since, from Eq. (9.61),
U and −U are mapped to the same rotation. In fact, R is precisely a two-to-one map (two elements of
SU (2) are mapped to one element of SO(3)) as can be shown in the following exercise.
Exercise 9.20. Show that the kernel of the representation (9.61) is given by Ker(R) = {±12 }. (Hint:
Use Schur’s Lemma.)
It can also be verified by explicit calculation that Im(R) = SO(3). In conclusion, we have seen that the
groups SU (2) and SO(3) are very closely related - the former is what is called a double-cover of the latter.
We have a two-to-one representation map R : SU (2) → SO(3) which allows us to recover SO(3) from
SU (2), but not the other way around since R is not invertible. From this point of view the group SU (2)
is the more basic object. In particular, all representations of SO(3) are also representations of SU (2)
(just combine the SO(3) representation with R) but not the other way around. It is already clear that
the group SO(3) plays an important role in physics - just think of rotationally symmetric problems in
classical mechanics, for example. The above result strongly suggests that SU (2) is also an important group
for physics and, in quantum mechanics, this turns out to be the case.
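The map (9.61) can also be explored numerically. The sketch below (Python/numpy; the helper names and the random SU (2) element are illustrative only) constructs R(U) column by column from R(U) e_i = ϕ−1 (U σ_i U† ) and checks that the result is a rotation and that U and −U have the same image:

import numpy as np

sigma = [np.array([[0, 1], [1, 0]], dtype=complex),
         np.array([[0, -1j], [1j, 0]], dtype=complex),
         np.array([[1, 0], [0, -1]], dtype=complex)]

def R_of_U(U):
    # column i of R(U) is phi^{-1}(U sigma_i U^dagger), with (phi^{-1}(H))_j = tr(sigma_j H)/2
    cols = []
    for s in sigma:
        H = U @ s @ U.conj().T
        cols.append([np.real(np.trace(sj @ H)) / 2 for sj in sigma])
    return np.array(cols).T

rng = np.random.default_rng(2)
z = rng.normal(size=4); z /= np.linalg.norm(z)
U = np.array([[z[0] + 1j*z[1], z[2] + 1j*z[3]],
              [-z[2] + 1j*z[3], z[0] - 1j*z[1]]])     # a generic SU(2) element

R = R_of_U(U)
assert np.allclose(R.T @ R, np.eye(3)) and np.isclose(np.linalg.det(R), 1)   # R(U) is in SO(3)
assert np.allclose(R_of_U(-U), R)                                            # U and -U have the same image
print("R(U) is a rotation; U and -U map to the same element")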

9.4.2 All complex irreducible representations


Now that we have clarified the relationship between the groups let us find all (complex) irreducible and
unitary representations of their associated Lie algebras. We consider representations r : L(SU (2)) →
Hom(Cn , Cn ) by complex n × n matrices, and denote the representation matrices of the generators by
Ji := r(τi ). (Since we are after unitary representations the Ji should be hermitian.) For the Ji to form a
representation they must satisfy the same commutation relations as the τi in Eq. (9.46), namely
[J_i , J_j ] = i ε_ijk J_k . (9.63)
Finding all (finite-dimensional, hermitian) irreducible matrices Ji which satisfy this relation is a purely
algebraic problem as we will now see. First we introduce a new basis
J± := J1 ± iJ2 , J3 (9.64)
on our Lie algebra^11 and re-express the commutation relations (9.63) in this basis:
[J3 , J± ] = ±J± , [J+ , J− ] = 2J3 . (9.65)
We also introduce the matrix
J² := J1² + J2² + J3² = (1/2)(J+ J− + J− J+ ) + J3² (9.66)
Combining Eqs. (9.65) and (9.66) leads to the useful relations
J− J+ = J 2 − J3 (J3 + 1) , J+ J− = J 2 − J3 (J3 − 1) . (9.67)
A key observation is now that J 2 commutes with all matrices Ji , that is,
[J 2 , Ji ] = 0 for i = 1, 2, 3 . (9.68)
An operator which commutes with the entire Lie algebra, such as J 2 above, is also called a Casimir
operator.
^10 Since SU (2) is path-connected every U ∈ SU (2) can be connected to the identity 1_2. Hence, every image R(U ) must be
connected to the identity 1_3, but we know that only elements in SO(3) are.
^11 There is a subtlety here. We have defined the Lie algebra as the real span of the generators but our re-definition includes
a factor of i. Really, we are working here with the complexification of the Lie algebra L_C := L + iL.

Exercise 9.21. Derive Eqs. (9.65), (9.67) and (9.68), starting with the commutation relations (9.63) and
the definition (9.64).
From Schur’s Lemma this means that J² must be a multiple of the unit matrix and we write^12

J 2 = j(j + 1)1 , (9.69)

for a real number j. The number j characterises the representation we are considering and which we write
as rj with associated representation vector space Vj . The idea is to construct this vector space by thinking
about the eigenvalues and eigenvectors of J3 which we denote by m and |jmi, respectively, so that

Vj = Span(|jmi | |jmi eigenvector with eigenvalue m of J3 ) . (9.70)

What can we say about these eigenvectors and eigenvalues? Since J3 is hermitian we can choose the
eigenvectors as an ortho-normal basis, so

hjm|j m̃i = δmm̃ . (9.71)

It is also clear that 0 ≤ hjm|J 2 |jmi = j(j + 1) so we can take j ≥ 0. Further, the short calculation

J3 J± |jmi = (J± J3 + [J3 , J± ])|jmi = (m ± 1)J± |jmi (9.72)

shows that J± |jmi is also an eigenvector of J3 but with eigenvalue m ± 1. Hence, J± act as “ladder
operators” which increase or decrease the eigenvalue by one. If this is carried out repeatedly the process
must break down at some point since we are looking for finite-dimensional representations. The breakdown
point can be found by considering

0 ≤ hjm|J− J+ |jmi = (j − m)(j + m + 1)hjm|jmi (9.73)


0 ≤ hjm|J+ J− |jmi = (j + m)(j − m + 1)hjm|jmi , (9.74)

where we have used that (J± )† = J∓ (since the Ji are hermitian) to show that the LHS is positive and
Eqs. (9.67) to obtain the RHS. The above equations tell us that (j − m)(j + m + 1) ≥ 0 and (j + m)(j −
m + 1) ≥ 0 and this has two important implications:

−j ≤ m ≤ j , J± |jmi = 0 ⇔ m = ±j . (9.75)

This means we have succeeded in bounding the eigenvalues m and in finding a breakdown condition for
the ladder operators. As we apply J+ successively we can get as high as |jji and then J+ |jji = 0. Starting
with |jji and using J− to go in the opposite direction we obtain
|j j⟩ , J− |j j⟩ ∼ |j j−1⟩ , J−² |j j⟩ ∼ |j j−2⟩ , · · · , J−^{2j} |j j⟩ ∼ |j −j⟩ , J−^{2j+1} |j j⟩ = 0 , (9.76)

where the breakdown must arise at m = −j. In the above sequence, we have gone from m = j to m = −j
in integer steps. This is only possible if j is integer or half-integer. The result of this somewhat lengthy
argument can be summarised in
Theorem 9.22. The irreducible representations rj of L(SU (2)) are labelled by an integer or half integer
number j ∈ Z/2 and the corresponding representation vector spaces Vj are spanned by

Vj = Span(|jmi | m = −j, −j + 1, . . . , j − 1, j) , (9.77)

so that dim(rj ) = 2j + 1. The generators J± , J3 in Eq. (9.64) act on the states |jmi as
J± |jm⟩ = √( j(j+1) − m(m±1) ) |j m±1⟩ , J3 |jm⟩ = m |jm⟩ . (9.78)
^12 The reason for writing the constant as j(j + 1) will become clear shortly.

Proof. The only part we haven’t shown yet is the factor in the first Eq. (9.78). To do this we write
J± |jmi = N± (j, m)|jm ± 1i with some constants N± (j, m) to be determined. By multiplying with
hjm ± 1| we get
N± (j, m) = ⟨j m±1| J± |jm⟩ = (1 / N± (j, m)*) ⟨jm| J∓ J± |jm⟩ , (9.79)
and, using Eqs. (9.73) and (9.74), this implies |N± (j, m)|2 = j(j + 1) − m(m ± 1). Up to a possible phase
this is the result we need. It can be shown that this phase can be consistently set to one and this completes
the proof.

The representation rj is also called the spin j representation and j is referred to as the total spin of the
representation. The label m for the basis vectors |jmi is also called z-component of the spin. These
representations play an important role in quantum mechanics where they describe states with well-defined
total spin and well-defined z-component of spin.
The results (9.78) can be used to explicitly compute the representation matrices T±^{(j)} and T3^{(j)} whose
entries are given by
T^{(j)}_{+,m̃m} := ⟨j m̃| J+ |jm⟩ = √( j(j+1) − m(m+1) ) δ_{m,m̃−1}
T^{(j)}_{−,m̃m} := ⟨j m̃| J− |jm⟩ = √( j(j+1) − m(m−1) ) δ_{m,m̃+1}        (9.80)
T^{(j)}_{3,m̃m} := ⟨j m̃| J3 |jm⟩ = m δ_{m m̃}

Let us use these equations to compute these matrices explicitly for the lowest-dimensional representations.

9.4.3 Examples of SU (2) representations


representation j = 0: The representation r0 is one-dimensional with representation vector space V0 = C
and m = 0 is the only allowed value. The 1 × 1 representation matrices are obtained by inserting j = m = 0
into Eq. (9.80) which shows that
T±^{(0)} = T3^{(0)} = (0) . (9.81)
From Eq. (9.58), the corresponding group representation matrices are g(t) = exp(0) = 1, so this is the
trivial (= singlet) representation.
representation j = 1/2: The representation r1/2 is two-dimensional with representation vector space
V1/2 = C2 and allowed values m = −1/2, 1/2. Since this is the only irreducible two-dimensional represen-
tation it must coincide with the fundamental representation of SU (2) which is also two-dimensional. We
can verify this explicitly by inserting j = 1/2 and m̃, m = 1/2, −1/2 into Eq. (9.80) which leads to
\[
T_{+}^{(1/2)} = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix} = \tau_1 + i\tau_2 , \quad
T_{-}^{(1/2)} = \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix} = \tau_1 - i\tau_2 , \quad
T_{3}^{(1/2)} = \frac{1}{2} \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} = \tau_3 . \qquad (9.82)
\]

Here, τi are the standard generators of the SU (2) Lie-algebra, defined in Eq. (9.44) which confirms that
we are indeed dealing with the fundamental representation.
representation j = 1: The representation r1 is three-dimensional with representation vector space V1 =
C3 (or R3 since the representation matrices turn out to be real) with allowed values m = −1, 0, 1. There
is only one irreducible three-dimensional representation so this must coincide with the three-dimensional
representation of SU (2) provided by SO(3) which we have discussed earlier. Again, we can verify this

explicitly by inserting j = 1 and m̃, m = 1, 0, −1 into Eqs. (9.80), leading to
\[
T_{+}^{(1)} = \sqrt{2} \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix} , \quad
T_{-}^{(1)} = \sqrt{2} \begin{pmatrix} 0 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix} , \quad
T_{3}^{(1)} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & -1 \end{pmatrix} . \qquad (9.83)
\]
Those matrices look somewhat different from the matrices (9.52) for the Lie-algebra of SO(3) but the two
sets of matrices are, in fact, related by a common basis transformation.
Exercise 9.23. Show that the matrices T±^{(1)} and T3^{(1)} in Eq. (9.83) satisfy the correct L(SU (2)) com-
mutation relations. Also show that they are related by a common basis transformation to the matrices
T̃± = T̃1 ± iT̃2 and T̃3 , with T̃i given in Eq. (9.52).
Exercise 9.24. Find the representation matrices T±^{(3/2)} and T3^{(3/2)} for the j = 3/2 representation and
show that they satisfy the correct L(SU (2)) commutation relations.
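As an illustration of how Eq. (9.80) is used in practice, here is a short sketch (Python/numpy, assumed available; the function name spin_matrices is just for illustration) which constructs the matrices for arbitrary j and verifies the commutation relations (9.65):

import numpy as np

def spin_matrices(j):
    """Return (J+, J-, J3) for the spin-j representation, built from Eq. (9.80)."""
    dim = int(round(2 * j)) + 1
    m = np.array([j - k for k in range(dim)])        # basis ordered m = j, j-1, ..., -j
    J3 = np.diag(m)
    Jp = np.zeros((dim, dim))
    for k in range(1, dim):
        # <j, m+1| J+ |j, m> = sqrt(j(j+1) - m(m+1)) with m = m[k]
        Jp[k - 1, k] = np.sqrt(j * (j + 1) - m[k] * (m[k] + 1))
    Jm = Jp.T
    return Jp, Jm, J3

for j in (0.5, 1, 1.5, 2):
    Jp, Jm, J3 = spin_matrices(j)
    assert np.allclose(J3 @ Jp - Jp @ J3, Jp)        # [J3, J+] = +J+
    assert np.allclose(J3 @ Jm - Jm @ J3, -Jm)       # [J3, J-] = -J-
    assert np.allclose(Jp @ Jm - Jm @ Jp, 2 * J3)    # [J+, J-] = 2 J3
print("spin-j matrices satisfy the relations (9.65)")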

9.4.4 Relation to spherical harmonics


The spherical harmonics Ylm which we have discussed in Section 6.4 are labelled in terms of the same
combinatorial data as the representations rl of L(SU (2)) in Theorem 9.22. This can’t be a coincidence.
To understand the relation we should study how the rotation group can be represented on vector spaces
of functions. For any rotation A ∈ SO(3) and a (smooth) function f : R3 → R define

R(A)(f )(x) := f (A−1 x) . (9.84)

It is straightforward to check that R is a representation of SO(3), that is, it satisfies R(AB) = R(A)R(B),
but note that including the inverse of A in the definition is crucial for this to work out.
Exercise 9.25. Show that the SO(3) representation R in Eq. (9.84) satisfies R(AB) = R(A)R(B).
To consider the associated Lie algebra we evaluate this for “small” rotations A = 1_3 + i t_j T̃_j + · · · , with the
matrices T̃j defined in Eq. (9.52). Inserting this into Eq. (9.84) and performing a Taylor expansion leads,
after a short calculation, to

R(A)(f )(x) = f (x) + itj L̂j f (x) + · · · , L̂ = −ix × ∇ . (9.85)

The operators L̂j span a representation of the Lie algebra of SO(3) and, hence, must satisfy

[L̂_i , L̂_j ] = i ε_ijk L̂_k . (9.86)

This can indeed be verified by direct calculation, using the definition of L̂ in Eq. (9.85).
Exercise 9.26. Verify the commutation relations (9.86) by explicit calculation, using the definition (9.85)
of L̂.
We have seen that the operators L̂i generate small rotations on functions. In quantum mechanics, the
L̂i are the angular momentum operators, obtained from the classical angular momentum L = x × p by
carrying out the replacement p_i → −i ∂/∂x_i . In conclusion, we see that there is a close connection between
angular momentum and the rotation group.
Since the L̂i form a representation of the Lie algebra of SO(3) it is natural to ask which irreducible
representations it contains. To do this it is useful to re-write the L̂i in terms of spherical coordinates
x = r(sin θ cos ϕ, sin θ sin ϕ, cos θ), which leads to
\[
\hat{L}_1 = i \left( \sin\varphi \frac{\partial}{\partial\theta} + \cot\theta \cos\varphi \frac{\partial}{\partial\varphi} \right) , \quad
\hat{L}_2 = i \left( -\cos\varphi \frac{\partial}{\partial\theta} + \cot\theta \sin\varphi \frac{\partial}{\partial\varphi} \right) , \quad
\hat{L}_3 = -i \frac{\partial}{\partial\varphi} \;. \qquad (9.87)
\]

The Casimir operator L̂2 is given by
L̂2 = −∆S 2 , (9.88)
where ∆S 2 is the Laplacian on the two-sphere, defined in Eq. (6.20).

Exercise 9.27. Derive the expressions (9.87) and (9.88) for angular momentum in polar coordinates.

It is now easy to verify, using Eqs (6.85) and (6.82) that the spherical harmonics satisfy

L̂2 Ylm = l(l + 1)Ylm , L̂3 Ylm = mYlm . (9.89)

This means that the vector space spanned by the Ylm for m = −l, . . . , l forms the representation space
Vl of the representation rl with total spin (or angular momentum) l. Note that this only leads to the
representation rj with integer j but not the ones with half integer j. This is directly related to the fact
that we have started with SO(3), rather than SU (2).
The fact that the Ylm span a spin l representation means that
\[
R(A) Y_{lm}(\mathbf{n}) = Y_{lm}(A^{-1}\mathbf{n}) = \sum_{m'=-l,\cdots,l} R^{(l)\,m}{}_{m'}(A)\, Y_{lm'}(\mathbf{n}) \;, \qquad (9.90)
\]

where R(l) (A) are the spin l representation matrices (and n is a unit vector which parametrises S 2 ). An
immediate conclusion from this formula is that the function
\[
F(\mathbf{n}', \mathbf{n}) := \sum_{m=-l,\cdots,l} Y_{lm}(\mathbf{n}')^{*}\, Y_{lm}(\mathbf{n}) \qquad (9.91)
\]

is invariant under rotations, that is, F (An′ , An) = F (n′ , n) for a rotation A. This fact was used in the
proof of Lemma 6.3.

9.4.5 Clebsch-Gordan decomposition


We would now like to discuss the tensor product of two L(SU (2)) representations rj1 and rj2 and the
corresponding Clebsch-Gordan decomposition (9.27). In quantum mechanics this corresponds to what is
referred to as “addition of spin”. We start by introducing the representation matrices
J_i^{(1)} = r_{j1}(τ_i ) , J_i^{(2)} = r_{j2}(τ_i ) , (9.92)

for the two representations rj1 and rj2 . If r = rj1 ⊗ rj2 is the tensor representation we would like to
understand how its representation matrices J_i := r(τ_i ) relate to J_i^{(1)} and J_i^{(2)} above. To this end, we write
the corresponding infinitesimal group transformations

R_{j1}(t) = 1 + i t_i J_i^{(1)} + · · · , R_{j2}(t) = 1 + i t_i J_i^{(2)} + · · · , (9.93)

and recall, from Eq. (9.24), the definition of the tensor product in terms of the Kronecker product. This
means
R(t) = R_{j1}(t) × R_{j2}(t) = 1 + i t_i ( J_i^{(1)} × 1 + 1 × J_i^{(2)} ) + · · · . (9.94)

and comparing this with R(t) = 1 + iti Ji + · · · leads to

J_i = J_i^{(1)} × 1 + 1 × J_i^{(2)} . (9.95)

This formula is the reason for referring to “addition of spin”.

Exercise 9.28. Show that, if J_i^{(1)} and J_i^{(2)} each satisfy the commutation relations (9.63), then so does J_i ,
defined in Eq. (9.95). (Hint: Use the property (9.23) of the Kronecker product.)
In summary, we now have the following representations, representation vector spaces and representation
matrices:
representation        | dimension          | spanned by                          | range for m                                | representation matrices
r_{j1}                | 2j1 + 1            | |j1 m1⟩                             | m1 = −j1 , . . . , j1                      | J_i^{(1)} = r_{j1}(τ_i )
r_{j2}                | 2j2 + 1            | |j2 m2⟩                             | m2 = −j2 , . . . , j2                      | J_i^{(2)} = r_{j2}(τ_i )
r = r_{j1} ⊗ r_{j2}   | (2j1 + 1)(2j2 + 1) | |j1 j2 m1 m2⟩ = |j1 m1⟩ ⊗ |j2 m2⟩   | m1 = −j1 , . . . , j1 ; m2 = −j2 , . . . , j2 | J_i = J_i^{(1)} × 1 + 1 × J_i^{(2)}
We note that
\[
J_3 |j_1 j_2 m_1 m_2\rangle = \big( J_3^{(1)} \times 1 + 1 \times J_3^{(2)} \big) \big( |j_1 m_1\rangle \otimes |j_2 m_2\rangle \big)
= \big( J_3^{(1)} |j_1 m_1\rangle \big) \otimes |j_2 m_2\rangle + |j_1 m_1\rangle \otimes \big( J_3^{(2)} |j_2 m_2\rangle \big) = (m_1 + m_2)\, |j_1 j_2 m_1 m_2\rangle \;, \qquad (9.96)
\]

so the basis states |j1 j2 m1 m2 i of the tensor representation are eigenvectors of J3 with eigenvalues m1 +m2 .
While rj1 and rj2 are irreducible there is no reason for r to be. However, given that we have a complete
list of all irreducible representations from Theorem 9.22 we know that r must have a Clebsch-Gordan
decomposition of the form
\[
r = r_{j_1} \otimes r_{j_2} = \bigoplus_{j} \nu_j\, r_j \;, \qquad (9.97)
\]

where νj ∈ Z≥0 indicates how many times rj is contained in r. (If rj is not contained in r then νj = 0.) Our
first problem is to determine the numbers νj and, hence, to work out the Clebsch-Gordan decomposition
explicitly.
Theorem 9.29. For two representations rj1 and rj2 of L(SU (2)) we have
\[
r_{j_1} \otimes r_{j_2} = \bigoplus_{j=|j_1 - j_2|}^{j_1 + j_2} r_j \;. \qquad (9.98)
\]

Proof. Our starting point is to think about the degeneracy, δm , of the eigenvalue m of J3 in the represen-
tation r = rj1 ⊗ rj2 . Every representation rj ⊂ r with j ≥ |m| contributes exactly one to this degeneracy
while representations rj with j < |m| do not contain a state with J3 eigenvalue m. This implies
\[
\delta_m = \sum_{j \geq |m|} \nu_j \;, \qquad (9.99)
\]

where νj counts how many time rj is contained in r, as in Eq. (9.97). Eq. (9.99) implies
νj = δj − δj+1 , (9.100)
so if we can work out the degeneracies δm this equation allows us to compute the desired numbers νj .
The degeneracies δm are computed from the observation in Eq. (9.96) that the states with eigenvalue m
are precisely those states |j1 j2 m1 m2 i with m = m1 + m2 . Hence, all we need to do is count the pairs
(m1 , m2 ), where mi = −ji , . . . , ji and m1 + m2 = m. The result is
\[
\delta_m = \begin{cases} 0 & \text{for } |m| > j_1 + j_2 \\ j_1 + j_2 + 1 - |m| & \text{for } j_1 + j_2 \geq |m| \geq |j_1 - j_2| \\ 2 j_2 + 1 & \text{for } |j_1 - j_2| \geq |m| \geq 0 \end{cases} \;. \qquad (9.101)
\]

Inserting these results into Eq. (9.100) shows that νj = 1 for j1 +j2 ≥ j ≥ |j1 −j2 | and νj = 0 otherwise.

Eq. (9.98) tells us how to “couple” two spins. For example, two spin 1/2 representations

r1/2 ⊗ r1/2 = r0 ⊕ r1 (9.102)

contain a singlet r0 and a spin 1 representation r1 . Note that dimensions work out since dim(r1/2 ) = 2,
dim(r0 ) = 1 and dim(r1 ) = 3. As another example, consider coupling a spin 1/2 and a spin 1 representation
which leads to
r1/2 ⊗ r1 = r1/2 ⊕ r3/2 . (9.103)
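A minimal consistency check of Theorem 9.29 is that the dimensions on both sides of Eq. (9.98) agree, i.e. that Σ_{j=|j1−j2|}^{j1+j2} (2j + 1) = (2j1 + 1)(2j2 + 1). The following sketch (Python, using exact fractions so that half-integer spins are handled without rounding issues) checks this for a range of spins:

from fractions import Fraction

def cg_dimensions_match(j1, j2):
    j1, j2 = Fraction(j1), Fraction(j2)
    js = []
    j = abs(j1 - j2)
    while j <= j1 + j2:                      # j = |j1 - j2|, ..., j1 + j2 in integer steps
        js.append(j)
        j += 1
    return sum(2 * j + 1 for j in js) == (2 * j1 + 1) * (2 * j2 + 1)

assert cg_dimensions_match(Fraction(1, 2), Fraction(1, 2))   # (9.102): 2*2 = 1 + 3
assert cg_dimensions_match(Fraction(1, 2), 1)                # (9.103): 2*3 = 2 + 4
assert all(cg_dimensions_match(Fraction(a, 2), Fraction(b, 2))
           for a in range(6) for b in range(6))
print("dimension count of the Clebsch-Gordan decomposition (9.98) verified")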
We have now identified the representation content of a tensor product for two irreducible representa-
tions rj1 and rj2 but this gives rise to a more detailed problem. On the tensor representation vector space
V = Vj1 ⊗ Vj2 we have two sets of basis vectors, namely

|j1 j2 m1 m2 i where m1 = −j1 , . . . , j1 , m2 = −j2 , . . . , j2 (9.104)


|jmi where m = −j, . . . , j , j = |j1 − j2 |, . . . , j1 + j2 . (9.105)

The question is how these two sets of basis vectors are related. Formally, we can write (using Dirac
notation)
\[
|jm\rangle = \sum_{m_1, m_2} |j_1 j_2 m_1 m_2\rangle \langle j_1 j_2 m_1 m_2 | jm\rangle \qquad (9.106)
\]

and the numbers hj1 j2 m1 m2 |jmi which appear in this equation are called Clebsch-Gordan coefficients.
Once we have computed these coefficients the relation between the two sets of basis vectors is fixed.

Application 9.35. Example of a Clebsch-Gordan decomposition


How to carry out a Clebsch-Gordan decomposition is best explained with an example. Let us consider
two spin 1/2 representations, as in Eq. (9.102), whose tensor product space V = V1/2 ⊗ V1/2 is four-
dimensional and spanned by

|j1 j2 m1 m2⟩ → |½ ½ −½ −½⟩ , |½ ½ −½ ½⟩ , |½ ½ ½ −½⟩ , |½ ½ ½ ½⟩
                                                                      (9.107)
|jm⟩ → |00⟩ , |1 −1⟩ , |10⟩ , |11⟩

The key to relating these two sets of basis vectors is again to think about the eigenvalue m of
J3 = J3^{(1)} + J3^{(2)}. Consider the m = 1 state |11⟩ from the second basis. The only state from the first
basis with m = 1 (remembering that m1 and m2 sum up) is |½ ½ ½ ½⟩. After a choice of phase, we can
therefore set
|11⟩ = |½ ½ ½ ½⟩ . (9.108)
We can generate the other required relations by acting on Eq. (9.108) with J− = J−^{(1)} + J−^{(2)} , using
the formula (9.78). This leads to

|11⟩ = |½ ½ ½ ½⟩
  ↓            ↓
√2 |10⟩ = |½ ½ −½ ½⟩ + |½ ½ ½ −½⟩
  ↓            ↓                                         (9.109)
2 |1 −1⟩ = |½ ½ −½ −½⟩ + |½ ½ −½ −½⟩

√2 |00⟩ = |½ ½ −½ ½⟩ − |½ ½ ½ −½⟩

where the arrows indicate the action of J− and the last relation follows from orthogonality. So in

summary we have

|11⟩   = |½ ½ ½ ½⟩
|10⟩   = (1/√2) ( |½ ½ −½ ½⟩ + |½ ½ ½ −½⟩ )        (j = 1)    (9.110)
|1 −1⟩ = |½ ½ −½ −½⟩

|00⟩   = (1/√2) ( |½ ½ −½ ½⟩ − |½ ½ ½ −½⟩ )        (j = 0)    (9.111)

and this provides a complete set of relations between the two sets of basis vectors from which all
Clebsch-Gordan coefficients can be read off.
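The lowering-operator construction of Eqs. (9.109)–(9.111) can be reproduced numerically. The sketch below (Python/numpy; it assumes the basis ordering |m1 m2⟩ with m = +1/2 listed first, and the spin-1/2 matrices of Eq. (9.82)) builds J− = J−^{(1)} × 1 + 1 × J−^{(2)} on V1/2 ⊗ V1/2 and generates the triplet states from |½ ½ ½ ½⟩:

import numpy as np

# spin-1/2 matrices in the basis (m = +1/2, m = -1/2), cf. Eq. (9.82)
Jm_half = np.array([[0, 0], [1, 0]], dtype=float)            # J- = tau_1 - i tau_2
J3_half = np.diag([0.5, -0.5])

# operators on the tensor product space, Eq. (9.95)
Jm = np.kron(Jm_half, np.eye(2)) + np.kron(np.eye(2), Jm_half)
J3 = np.kron(J3_half, np.eye(2)) + np.kron(np.eye(2), J3_half)

up_up = np.array([1, 0, 0, 0], dtype=float)                  # |1/2 1/2 1/2 1/2> = |11>

state_10 = Jm @ up_up                                        # proportional to |10>
state_10 /= np.linalg.norm(state_10)
state_1m1 = Jm @ state_10                                    # proportional to |1 -1>
state_1m1 /= np.linalg.norm(state_1m1)

# |10> = (|+-> + |-+>)/sqrt(2) and |1 -1> = |-->, as in Eq. (9.110)
assert np.allclose(state_10, np.array([0, 1, 1, 0]) / np.sqrt(2))
assert np.allclose(state_1m1, np.array([0, 0, 0, 1]))
assert np.isclose(state_10 @ J3 @ state_10, 0) and np.isclose(state_1m1 @ J3 @ state_1m1, -1)
print("triplet states reproduced; the singlet |00> is the remaining orthogonal m = 0 state")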

Exercise 9.30. Find the Clebsch-Gordan coefficients for the case (9.103).

9.5 The Lorentz group


The Lorentz group is of fundamental importance in physics. It is the group at the heart of special relativity
and it is a symmetry of all relativistic theories, including relativistic mechanics and electro-magnetism.
Mainly, we would like to determine all irreducible representations of the Lorentz group Lie algebra and,
as we will see, this is rather easy given our earlier results for SU (2).

9.5.1 Basic definition


The Lorentz group L is defined as

L = { Λ ∈ Gl(R4 ) | Λ^T ηΛ = η } , (9.112)

where η = diag(−1, 1, 1, 1) is the Minkowski metric. In other words, the Lorentz group consists of all real
4 × 4 matrices Λ which satisfy the defining relation ΛT ηΛ = η. Using index notation, this relation can
also be written as
Λµ ρ Λν σ ηµν = ηρσ . (9.113)
Clearly, the Lorentz group is a sub-group of the four-dimensional general linear group Gl(R4 ). As a
matrix group it has a fundamental representation which is evidently four-dimensional. The action of this
fundamental representation Λ : R4 → R4 is explicitly given by
x ↦ x′ = Λx ⟺ x^µ ↦ x′^µ = Λ^µ_ν x^ν . (9.114)

In Special Relativity this is interpreted as a transformation from one inertial system with space-time
coordinates x = (t, x, y, z)^T to another one with space-time coordinates x′ = (t′ , x′ , y′ , z′ )^T .

9.5.2 Properties of the Lorentz group


Ultimately, we will be interested in the Lie algebra of the Lorentz group and its representations but we
begin by discussing a few elementary properties of the group itself. Taking the determinant of the defining
relation (9.112) and using standard properties of the determinant implies that (det(Λ))2 = 1 so that

det(Λ) = ±1 . (9.115)
Further, the ρ = σ = 0 component of Eq. (9.113) reads −(Λ^0_0)² + Σ_{i=1}^{3} (Λ^i_0)² = −1 so that

Λ0 0 ≥ 1 or Λ0 0 ≤ −1 . (9.116)

Combining the two sign ambiguities in Eqs. (9.115) and (9.116) we see that there are four types of Lorentz
transformations. The sign ambiguity in the determinant is analogous to what we have seen for orthogonal
matrices and its interpretation is similar to the orthogonal case. Lorentz transformations with determinant
1 are called “proper” Lorentz transformations while Lorentz transformations with determinant −1 can be
seen as a combination of a proper Lorentz transformation and a reflection. More specifically, consider the
special Lorentz transformation P = diag(1, −1, −1, −1) (note that this matrix indeed satisfies Eq. (9.113))
which is also referred to as “parity”. Then every Lorentz transformation Λ can be written as
Λ = P Λ+ , (9.117)
where Λ+ is a proper Lorentz transformation. The sign ambiguity (9.116) in Λ0 0 is new but has an obvious
physical interpretation. Under a Lorentz transformations Λ with Λ0 0 ≥ 1 the sign of the time component
x0 = t of a vector x remains unchanged, so that the direction of time is unchanged. Correspondingly,
such Lorentz transformation with positive Λ0 0 are called “ortho-chronous”. On the other hand, Lorentz
transformations Λ with Λ0 0 ≤ −1 change the direction of time. If we introduce the special Lorentz
transformation T = diag(−1, 1, 1, 1), also referred to as “time reversal”, then every Lorentz transformation
Λ can be written as
Λ = T Λ↑ , (9.118)
where Λ↑ is an ortho-chronous Lorentz transformation. Introducing the sub-group L↑+ of proper ortho-
chronous Lorentz transformations, the above discussion shows that the full Lorentz group can be written
as a union of four disjoint pieces:
L = (L↑+ ) ∪ (P L↑+ ) ∪ (T L↑+ ) ∪ (P T L↑+ ) . (9.119)
The Lorentz transformations normally used in Special Relativity are the proper, ortho-chronous Lorentz
transformations. However, the other Lorentz transformations are relevant as well and it is an important
question as to whether they constitute symmetries of nature in the same way that proper, ortho-chronous
Lorentz transformations do. More to the point, the question is whether nature respects parity P and
time-reversal T 13 .

9.5.3 Examples of Lorentz transformations


What do proper, ortho-chronous Lorentz transformations look like explicitly? To answer this question we
basically have to solve Eq. (9.112) which is clearly difficult to do in full generality. However, some special
Lorentz transformations are more easily obtained. First, we note that for three-dimensional rotation
matrices R the map
\[
R \;\mapsto\; \begin{pmatrix} 1 & 0 \\ 0 & R \end{pmatrix} \qquad (9.120)
\]
leads to proper, ortho-chronous Lorentz transformations. Indeed, the matrices on the RHS satisfy
Eq. (9.113) by virtue of RT R = 13 and we have det(Λ) = det(R) = 1 and Λ0 0 = 1. In group-theoretical
terms this means there is an injective group homomorphism SO(3) → L↑+ defined by Eq. (9.120), or, in
short, the three-dimensional rotations are embedded into the proper ortho-chronous Lorentz transforma-
tions.
To find less trivial examples we start with the Ansatz
\[
\Lambda = \begin{pmatrix} \Lambda_2 & 0 \\ 0 & \mathbb{1}_2 \end{pmatrix} , \qquad \Lambda_2 = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \qquad (9.121)
\]
^13 The answer is not simple and depends on the interactions/forces considered. But it is clear that the current fundamental
theory which describes strong, electromagnetic and weak interactions - the standard model of particle physics - respects
neither parity nor time inversion.
of a two-dimensional Lorentz transformation which affects time and the x-coordinate, but leaves y and
z unchanged. Demanding that a Λ of the above form is a proper ortho-chronous Lorentz transformation
leads to
\[
\Lambda_2(\xi) = \begin{pmatrix} \cosh(\xi) & \sinh(\xi) \\ \sinh(\xi) & \cosh(\xi) \end{pmatrix} \;. \qquad (9.122)
\]
Exercise 9.31. Show that Eq. (9.122) is the most general form for Λ2 in order for Λ in Eq. (9.121) to be
a proper ortho-chronous Lorentz transformation.
The quantity ξ in Eq. (9.122) is also called rapidity. It follows from the addition theorems for hyperbolic
functions that Λ(ξ1 )Λ(ξ2 ) = Λ(ξ1 +ξ2 ), so rapidities add up in the same way that two-dimensional rotation
angles do. For a more common parametrisation introduce the parameter β = tanh(ξ) ∈ (−1, 1) so that
\[
\cosh(\xi) = \frac{1}{\sqrt{1 - \beta^2}} =: \gamma \;, \qquad \sinh(\xi) = \beta\gamma \;. \qquad (9.123)
\]
In terms of β and γ the two-dimensional Lorentz transformations can then be written in the more familiar
form
\[
\Lambda_2 = \begin{pmatrix} \gamma & \beta\gamma \\ \beta\gamma & \gamma \end{pmatrix} \;. \qquad (9.124)
\]
Here, β is interpreted as the relative speed of the two inertial systems (in units of the speed of light).
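The defining relation (9.112), properness, ortho-chronousness and the addition of rapidities can all be checked in a few lines (a Python/numpy sketch, not part of the notes; the helper boost is illustrative):

import numpy as np

eta = np.diag([-1.0, 1.0, 1.0, 1.0])       # Minkowski metric

def boost(xi):
    """Lorentz transformation of the form (9.121)/(9.122) with rapidity xi along x."""
    L2 = np.array([[np.cosh(xi), np.sinh(xi)],
                   [np.sinh(xi), np.cosh(xi)]])
    L = np.eye(4)
    L[:2, :2] = L2
    return L

xi1, xi2 = 0.7, -1.3
L = boost(xi1)
assert np.allclose(L.T @ eta @ L, eta)                         # defining relation (9.112)
assert np.isclose(np.linalg.det(L), 1) and L[0, 0] >= 1        # proper and ortho-chronous
assert np.allclose(boost(xi1) @ boost(xi2), boost(xi1 + xi2))  # rapidities add up
print("Lorentz boost checks passed")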

9.5.4 The Lie algebra of the Lorentz group


To find the Lie algebra L(L) of the Lorentz group we proceed as before and write down an “infinitesimal”
Lorentz transformation Λ = 1_4 + iT + · · · (where T is purely imaginary), insert this into the defining
relation Λ^T ηΛ = η and work out the linear constraint on T . The result is
L(L) = { T | T = −η T^T η , T purely imaginary } = Span(T_i , S_i )_{i=1,2,3} , (9.125)

where
\[
T_i = \begin{pmatrix} 0 & 0 \\ 0 & \tilde{T}_i \end{pmatrix} , \quad
S_1 = \begin{pmatrix} 0 & i & 0 & 0 \\ i & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix} , \quad
S_2 = \begin{pmatrix} 0 & 0 & i & 0 \\ 0 & 0 & 0 & 0 \\ i & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix} , \quad
S_3 = \begin{pmatrix} 0 & 0 & 0 & i \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ i & 0 & 0 & 0 \end{pmatrix} , \qquad (9.126)
\]

and T̃i are the SO(3) generators defined in Eq. (9.52). Given that the three-dimensional rotations are
embedded into the Lorentz group as in Eq. (9.120) the appearance of the SO(3) generators in the lower
3 × 3 block is entirely expected. The other generators Si do not correspond to rotations and are called the
boost generators. They are related to non-trivial Lorentz boosts. Altogether, the dimension of the Lorentz
group (Lie algebra) is six, with three parameters describing rotations and the three others Lorentz boosts.
The commutation relations can be worked out by direct computation with the above matrices and they
are given by
[T_i , T_j ] = i ε_ijk T_k , [S_i , S_j ] = −i ε_ijk T_k , [T_i , S_j ] = i ε_ijk S_k . (9.127)
These commutation relations are very reminiscent of the ones for SU (2) and we can make this more
explicit by introducing a new basis
T_i^± := (1/2)(T_i ± iS_i ) (9.128)
for the Lie algebra which commute as

[T_i^± , T_j^± ] = i ε_ijk T_k^± , [T_i^+ , T_j^− ] = 0 . (9.129)

Exercise 9.32. Verify the commutation relations (9.127) and (9.129).
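Exercise 9.32 can also be verified by machine. The following sketch (Python/numpy; the generators are taken directly from Eqs. (9.51) and (9.126)) checks all three sets of relations in (9.127):

import numpy as np

# Levi-Civita symbol
eps = np.zeros((3, 3, 3))
for i, j, k in [(0, 1, 2), (1, 2, 0), (2, 0, 1)]:
    eps[i, j, k], eps[j, i, k] = 1, -1

Ttilde = -1j * eps                                    # the SO(3) generators (9.51)

T = np.zeros((3, 4, 4), dtype=complex)                # rotation generators of Eq. (9.126)
T[:, 1:, 1:] = Ttilde
S = np.zeros((3, 4, 4), dtype=complex)                # boost generators of Eq. (9.126)
for i in range(3):
    S[i, 0, i + 1] = S[i, i + 1, 0] = 1j

def comm(A, B):
    return A @ B - B @ A

for i in range(3):
    for j in range(3):
        rhs_T = 1j * np.einsum('k,kab->ab', eps[i, j], T)
        rhs_S = 1j * np.einsum('k,kab->ab', eps[i, j], S)
        assert np.allclose(comm(T[i], T[j]), rhs_T)          # [T_i, T_j] =  i eps_ijk T_k
        assert np.allclose(comm(S[i], S[j]), -rhs_T)         # [S_i, S_j] = -i eps_ijk T_k
        assert np.allclose(comm(T[i], S[j]), rhs_S)          # [T_i, S_j] =  i eps_ijk S_k
print("Lorentz algebra commutation relations (9.127) verified")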

These commutation relations mean that the Lie algebra of the Lorentz group corresponds to two copies
of the SU (2) Lie algebra, so L(L) ∼= L(SU (2)) ⊕ L(SU (2)). This is a rather lucky state of affairs (which
does not persist for other groups, such as SU (n) for n > 2) since this means we can obtain the irreducible
representations of the Lorentz group from those of SU (2). We have

Theorem 9.33. The finite-dimensional, irreducible representations of the Lorentz group Lie algebra are
classified by two “spins”, (j+ , j− ), where j± ∈ Z/2, and the corresponding representations r(j+ ,j− ) with
representation vector space V(j+ ,j− ) have dimension (2j+ + 1)(2j− + 1). If J˜i± = rj± (Ti± ) are the L(SU (2))
representation matrices for Ti± then

Ji+ := r(j+ ,j− ) (Ti+ ) = J˜i+ × 12j− +1 , Ji− := r(j+ ,j− ) (Ti− ) = 12j+ +1 × J˜i− (9.130)

are the representation matrices for r(j+ ,j− ) .

Exercise 9.34. Verify that the representation matrices (9.130) satisfy the correct commutation relations
for a representation of the Lorentz Lie algebra.

9.5.5 Examples of Lorentz group representations


The lowest-dimensional representations of the Lorentz group algebra classify the types of particles we
observe in nature.
(j+ , j− ) = (0, 0): This is the trivial, singlet representation, which is one-dimensional, so V(0,0) ≅ C. Fields
which take values in V(0,0) (at every point in space-time) are also called scalar fields.
(j+ , j− ) = (1/2, 0): This representation is two-dimensional, so V(1/2,0) ≅ C². In physics, it is called the
left-handed Weyl spinor representation.
(j+ , j− ) = (0, 1/2): This representation is two-dimensional, so V(0,1/2) ≅ C². In physics, it is called the
right-handed Weyl spinor representation.
(1/2, 0) ⊕ (0, 1/2): This is the direct sum of the previous two representations. It is, therefore, reducible
and four-dimensional. In physics, it is called the Dirac spinor representation.
(j+ , j− ) = (1/2, 1/2): This representation is four-dimensional and, in fact, corresponds to the four-dimensional
fundamental representation of the Lorentz group. In physics, the fields which take values in V(1/2,1/2) are
called vector fields.

Appendices
A Calculus in multiple variables - a sketch
Calculus of multiple variables provides the background for much of what we are doing in this course (and
calculus is the “engine” of differential geometry which we discuss in the following appendices) so it may
be worth including a brief account of the subject. You have seen some of this in your first year, although
perhaps not in quite the same way.

A.1 The main players


Our main arena is Rn which we think of as a normed vector space (in the sense of Def. 1.6) with the usual
Euclidean norm. Based on this norm we can introduce a topology on Rn , with the open sets defined as
in Def. 1.19, that is, open sets U ⊂ Rn are those for which every point x ∈ U has a (sufficiently small)
ball, centred on x, which is entirely contained in U . Why is introducing open sets important for calculus?
We can ask which subsets of Rn are appropriate as domains of functions, if we want to perform standard
operations of calculus such as taking derivatives. As you know from the one-dimensional case, defining
the derivative of a function at a point involves taking the limit of a difference quotient and setting this
up assumes the function is defined near the point in question. Hence, if we want to define derivatives we
should focus on function domains where each point has a neighbourhood contained within the domain -
but these are precisely the open sets.
Having said this, we would like to consider functions f : U → Rm , with domains U that are open
subsets of Rn and co-domains that are subsets of Rm . We will typically denote the coordinates on the
domain U ⊂ Rn by x = (x1 , . . . , xn )T and the coordinates on the co-domain Rm by y = (y1 , . . . , ym )T . So,
in more down-to-earth language, our function f depends on n variables x1 , . . . , xn and is vector-valued in
m dimensions. More explicitly, the value of f at x ∈ U can be written down as
 
f1 (x1 , . . . , xn )
f (x) =  ..
 , (A.1)
 
.
fm (x1 , . . . , xn )

so we can think of f as being given by m real-valued functions f = (f1 , . . . , fm )T , each of which depends
on the n variables x1 , . . . , xn . There are two special choices of dimensions which are of particular interest.
If m = 1 we call f a real-valued function, or scalar field and if n = m we call f a vector field. Vector fields
will also be denoted by uppercase letters A, B, . . ., adopting the notation more common in physics.
Definition A.1. (Continuity) A function f : U → Rm , where U ⊂ Rn open, is said to be continuous at
x ∈ U if every sequence (xk ) in U with limk→∞ xk = x satisfies limk→∞ f (xk ) = f (x). The function f is
said to be continuous on U if it is continuous for all x ∈ U .
Note this definition of continuity is extremely natural. It says continuous functions are those for which
limits can be pulled in and out of the function argument, that is, f (limk→∞ xk ) = limk→∞ f (xk ).

Application 1.36. Heaviside function and discontinuity


The aim is to show, using Def. A.1, that the Heaviside function θ : R → R defined in Eq. (7.21) is
not continuous at x = 0. This can be done by finding a sequence converging to 0 which violates the
condition stated in Def. A.1.
Consider, for example, the sequence (xk ) defined by xk = −1/k. It is clear that limk→∞ xk = 0. On

the other hand, θ(xk ) = 0 for all k since xk < 0. Hence, 0 = limk→∞ θ(xk ) ≠ θ(limk→∞ xk ) = θ(0) = 1
and this indeed violates the condition in Def. A.1. Hence, the Heaviside function is not continuous at
x = 0.

Exercise A.1. The function f (x) = 1/x is not well-defined at x = 0 and should hence be seen as a
function f : R \ {0} → R. Suppose we construct a “completion” of f by defining a new function F : R → R
with values F (x) = 1/x for x 6= 0 and F (0) = x0 for x0 ∈ R. Show, using Def. A.1, that there is no choice
for x0 such that F is continuous at x = 0.

A.2 Partial differentiation


We consider functions f : U → Rm , with U ⊂ Rn open, and coordinates x = (x1 , . . . , xn )T ∈ Rn as before
and also introduce the standard unit vectors ei , where i = 1, . . . , n, on Rn . The partial derivatives of f
are defined as follows.

Definition A.2. The function f : U → Rm , with U ⊂ Rn open, is said to be partially differentiable with respect to xi at x ∈ U if the limit
$$\lim_{\epsilon\to 0}\frac{f(x+\epsilon\, e_i)-f(x)}{\epsilon} \qquad\qquad (A.2)$$
exists. In this case, the limit (A.2) is called the partial derivative of f with respect to xi at x and it is denoted by ∂f/∂xi (x) or by ∂i f (x).
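To make Def. A.2 concrete, here is a small numerical sketch (an illustration, not part of the original notes): it approximates a partial derivative by evaluating the difference quotient (A.2) for a small but finite ε. The function f used below is a hypothetical example.

    import numpy as np

    def partial_derivative(f, x, i, eps=1e-6):
        # difference quotient of Eq. (A.2) with a small finite eps
        e_i = np.zeros_like(x)
        e_i[i] = 1.0
        return (f(x + eps * e_i) - f(x)) / eps

    # hypothetical example: f(x1, x2) = x1^2 sin(x2), so df/dx1 = 2 x1 sin(x2)
    f = lambda x: x[0]**2 * np.sin(x[1])
    x = np.array([1.0, 0.5])
    print(partial_derivative(f, x, 0), 2 * x[0] * np.sin(x[1]))  # the two values nearly agree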

As a further piece of common terminology, a function f is called κ times continuously differentiable if


it is κ times (partially) differentiable and the resulting derivatives of order κ are continuous. Note that
Def. A.2 is very much in analogy with the one-dimensional case, except that the difference quotient is now
taken in the direction of one of the coordinate axes, as indicated by the presence of the unit vector ei . For
this reason, all the standard rules of differentiation in one dimension - linearity, product rule, chain rule,
quotient rule - directly generalise to partial derivatives (treating the coordinates xj which are not involved
as if they were constants). However, there is one feature of partial derivatives which has no analogy in one
dimension - that is, whether partial derivatives in different directions commute. Fortunately, the answer
is “yes” as stated in the following proposition.

Proposition A.1. If f : U → Rm , with U ⊂ Rn open, is twice continuously partially differentiable at x ∈ U (in all directions) then
$$\frac{\partial^2 f}{\partial x_i\,\partial x_j}(x) = \frac{\partial^2 f}{\partial x_j\,\partial x_i}(x)\,. \qquad\qquad (A.3)$$
Proof. A standard proof working with limits which can, for example, be found in Ref. [10].

There are certain (linear) combinations of partial derivatives which are of particular interest. In general, for a function f = (f1 , . . . , fm )T : U → Rm , where U ⊂ Rn , with n variables and m-dimensional values, we have nm partial derivatives, ∂i fj , which can be arranged into an m × n matrix. This matrix is called the Jacobi matrix and it will be discussed in detail below. For now we focus on some special cases.

A.2.1 Gradient
Suppose we have a real-valued function f : U → R, where U ⊂ Rn . Then all we have are the n partial derivatives
∂i f and they can be conveniently arranged into a row-vector

grad f (x) = ∇f (x) := (∂1 f, . . . , ∂n f ) (x) ⇒ (∇f (x))i = ∂i f (x) , (A.4)

which is called the gradient of f at x. Note that for two partially differentiable real-valued functions
f, g : U → R the gradient satisfies the product rule (leaving out the argument x for simplicity)

∇(f g) = g∇f + f ∇g . (A.5)

This follows easily from the product rule for partial derivatives:

(∇(f g))i = ∂i (f g) = g∂i f + f ∂i g = (g∇f + f ∇g)i . (A.6)

It is often useful to think of the gradient as a formal row-vector

∇ = (∂1 , . . . , ∂n ) ⇒ ∇i = ∂i (A.7)

whose components are the partial derivatives.
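As a quick cross-check of the product rule (A.5), the following sympy sketch (an illustration, not part of the original notes) compares ∇(fg) with g∇f + f∇g for two hypothetical functions of two variables.

    import sympy as sp

    x1, x2 = sp.symbols('x1 x2')
    f = x1**2 * x2            # hypothetical choices of f and g
    g = sp.sin(x1) + x2

    grad = lambda h: sp.Matrix([sp.diff(h, x1), sp.diff(h, x2)])   # column of partial derivatives
    print(sp.simplify(grad(f * g) - (g * grad(f) + f * grad(g))))  # zero matrix, confirming Eq. (A.5)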

A.2.2 Divergence
Now let us consider a vector field A = (A1 , . . . , An )T : U → Rn , where U ⊂ Rn . Its partial derivatives ∂i Aj
can be arranged into an n × n matrix (the Jacobi matrix of A). Of course we can consider any particular linear combination Σi,j mij ∂i Aj , where mij ∈ R, of these partial derivatives, but is there a specific choice
of the n × n matrix m = (mij ) which is singled out in some way? Suppose we would like the combination
of partial derivatives to be invariant under rotations R ∈ SO(n), acting as xi 7→ Rik xk (which implies, via
the chain rule, that ∂i 7→ Rik ∂k ) and Aj 7→ Rjl Al . Then we have mij ∂i Aj 7→ Rik mij Rjl ∂k Al and, hence,
the combination is invariant iff RT mR = m. The only matrices satisfying this condition for all R ∈ SO(n)
are multiples of the unit matrix 1n , so that mij is proportional to δij . This leads to the divergence of the
vector field A defined by
$$\mathrm{div}\,A(x) = \nabla\cdot A(x) := \sum_{i=1}^{n}\partial_i A_i(x)\,. \qquad\qquad (A.8)$$

Note that the divergence can be written as a formal dot product between the nabla operator (A.7) and
the vector field A, so div = ∇·. The divergence also satisfies a product rule which involves a vector field
A : U → Rn and a real-valued function f : U → R and is given by

∇ · (f A) = A · ∇f + f ∇ · A . (A.9)

As before, its proof relies on the product rule for partial derivatives:

∇ · (f A) = ∂i (f A)i = ∂i (f Ai ) = Ai ∂i f + f ∂i Ai = A · ∇f + f ∇ · A . (A.10)

A.2.3 Curl
We have just seen that the divergence is (up to an overall factor) the only rotationally invariant linear
combination of the partial derivatives ∂i Aj of a vector field A : U → Rn . What if we allow tensors with
more than two indices (rather than just δij ) to be summed into ∂i Aj ? The only other special tensor (which
is rotationally invariant just as δij ) is the n-dimensional Levi-Civita tensor εi1 ···in . This means it is of interest to look at the following linear combinations
$$\epsilon_{k_1\cdots k_{n-2}\,i\,j}\;\partial_i A_j \qquad\qquad (A.11)$$

of the partial derivatives of A, leading to an object with n − 2 indices (which is also called a tensor field).
This is an entirely sensible construction (which finds its natural home in the context of differential forms,
see Appendix C) but only in three dimensions, n = 3, does it lead to a familiar object with one index,

that is, a vector field. A frequent question is why the curl is only defined in three dimensions. In fact,
it is, in the sense of Eq. (A.11), defined in all dimensions (and differential forms provide a more natural
framework for this) but only in three dimensions does it lead back to a vector field. This is why the curl is
normally only introduced in three dimensions, and this is the case we focus on now. The curl of a vector
field A : U → R3 is another vector field with components (A.11), that is,
(curl A)i := εijk ∂j Ak ⇒ curl A = ∇ × A . (A.12)
Note that the curl can be expressed as a formal cross product with the nabla operator, so curl = ∇×.
Exercise A.2. Show that for any n × n matrix R the Levi-Civita tensor satisfies
$$\sum_{j_1,\dots,j_n} R_{i_1 j_1}\cdots R_{i_n j_n}\,\epsilon_{j_1\cdots j_n} = \det(R)\,\epsilon_{i_1\cdots i_n}\,.$$

Deduce that the Levi-Civita tensor is invariant under rotations.


There are some further product rules for differentiable vector fields A, B : U → R3 and functions f : U → R, namely
∇ × (f A) = ∇f × A + f ∇ × A (A.13)
∇ · (A × B) = (∇ × A) · B − A · (∇ × B) . (A.14)
Exercise A.3. Prove the product rules (A.13) and (A.14) using index notation and the product rule for
partial derivatives.
While we have motivated the definitions of grad, div and curl it is fair to say that these motivations might
not seem entirely compelling. Convincing mathematical reasons for introducing these operations can be
given in the context of differential forms, where they arise as special cases of the exterior derivative d. This
will be discussed in Appendix C.
The gradient, the divergence and the curl relate in an interesting way and to discuss this we specialise to
functions f and vector fields A in three dimensions (since this is the only dimension for which we have
defined the curl - the general case will be dealt with in the context of differential forms later). We have
(∇ × (∇f ))k = εkij ∂i ∂j f = 0 (A.15)
∇ · (∇ × A) = εijk ∂i ∂j Ak = 0 , (A.16)
where the vanishing follows from the commutativity of second derivatives, Proposition A.1, (which implies
that ∂i ∂j is symmetric in (ij)) and the anti-symmetry of the Levi-Civita tensor in (ij). There is an
interesting and concise way to summarise the relationship between grad, curl and div. First let us introduce
the set C ∞ (U ) of infinitely many times partially differentiable functions and the set V(U ) of infinitely many
times partially differentiable vector fields on U ⊂ R3 . Then, grad, curl and div map in the following way
$$0 \longrightarrow C^\infty(U) \xrightarrow{\;\mathrm{grad}=\nabla\;} \mathcal{V}(U) \xrightarrow{\;\mathrm{curl}=\nabla\times\;} \mathcal{V}(U) \xrightarrow{\;\mathrm{div}=\nabla\cdot\;} C^\infty(U) \longrightarrow 0 \qquad\qquad (A.17)$$
and Eqs. (A.15) and (A.16) imply that neighbouring maps in this chain compose to zero. Such a mathe-
matical structure, that is a sequence
$$\cdots \xrightarrow{\;d_{i-2}\;} V_{i-1} \xrightarrow{\;d_{i-1}\;} V_i \xrightarrow{\;d_i\;} V_{i+1} \xrightarrow{\;d_{i+1}\;} \cdots\,. \qquad\qquad (A.18)$$
of vector spaces Vi with maps di and neighbouring maps composing to zero, di ◦ di−1 = 0, is called a
complex. The relation di ◦di−1 = 0 tells us that Im(di−1 ) ⊂ Ker(di ) and this observation allows us to define
the cohomology of the complex (A.18) which consists of the vector spaces H i = Ker(di )/Im(di−1 ). The
complex (A.17) with its associated cohomology is extremely important. It encodes topological information
about the space U and it represents one of the gateways into an area of mathematics called algebraic
geometry. Pursuing this further is well beyond our present scope.
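The identities (A.15) and (A.16) are easy to verify symbolically. The sketch below (an illustration, not part of the original notes) builds the curl from the Levi-Civita tensor as in Eq. (A.12) and checks that curl grad and div curl vanish for hypothetical fields f and A.

    import sympy as sp
    from sympy import LeviCivita

    x = sp.symbols('x1 x2 x3')
    f = x[0] * x[1] * sp.exp(x[2])                    # hypothetical scalar field
    A = [x[1] * x[2], sp.sin(x[0]), x[0]**2 * x[2]]   # hypothetical vector field

    def curl(B):
        # (curl B)_i = eps_{ijk} d_j B_k, Eq. (A.12)
        return [sum(LeviCivita(i, j, k) * sp.diff(B[k], x[j])
                    for j in range(3) for k in range(3)) for i in range(3)]

    grad_f = [sp.diff(f, xi) for xi in x]
    curl_A = curl(A)
    print([sp.simplify(c) for c in curl(grad_f)])                         # [0, 0, 0], Eq. (A.15)
    print(sp.simplify(sum(sp.diff(curl_A[i], x[i]) for i in range(3))))   # 0, Eq. (A.16)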

A.3 The total differential
So far we have looked at individual partial derivatives but there is a more general notion of derivative
which does not rely on choosing particular directions (such as the directions of the coordinate axes). This
notion is referred to as total derivative.
To introduce the total derivative in a concise way it is useful to recall briefly the one-dimensional case
of a function f : R → R. Such a function is called differentiable at x ∈ R with derivative f ′ (x) if the limit
$$f'(x) := \lim_{\epsilon\to 0}\frac{f(x+\epsilon)-f(x)}{\epsilon} \qquad\qquad (A.19)$$
exists. Alternatively and equivalently, we can say that f is differentiable at x ∈ R with derivative f ′ (x) iff
$$f(x+\epsilon) = f(x) + f'(x)\,\epsilon + O(\epsilon^2)\,, \quad\text{where}\quad \lim_{\epsilon\to 0}\frac{O(\epsilon^2)}{\epsilon} = 0\,, \qquad\qquad (A.20)$$
is satisfied for all sufficiently small ε. This says we should think of the derivative as a linear map which provides the leading behaviour of the function’s variation, away from f (x). Note that, intuitively, O(ε²) denotes any expression which “goes” like ε² (or an even higher power of ε) but it is also properly defined as an expression which satisfies the limit condition on the right-hand side of Eq. (A.20). More generally, we say an expression is of order ε^κ , where κ ∈ N, and we write this expression as O(ε^κ ) if
$$\lim_{\epsilon\to 0}\frac{O(\epsilon^\kappa)}{\epsilon^{\kappa-1}} = 0\,. \qquad\qquad (A.21)$$

Using this notation often leads to a good match between mathematical precision and intuition.
Now we want to generalise the one-dimensional definition of derivatives, in the alternative form (A.20),
to introduce the total derivative in the multi-dimensional case.
Definition A.3. A function f : U → Rm , with U ⊂ Rn open, is called totally differentiable at x ∈ U if
there exists an m × n matrix A such that

$$f(x+\epsilon) = f(x) + A\,\epsilon + O(|\epsilon|^2) \qquad\qquad (A.22)$$
for all ε ∈ Rn in a sufficiently small ball Br (0). In this case, the matrix A is called the Jacobi matrix for
f at x and is also denoted by Df (x) := A.
This is where linear algebra meets calculus. In the one-dimensional case (A.20) we had a somewhat trivial
linear map f 0 (x) : R → R (a 1 × 1 matrix) but in the multi-dimensional case, for a function f with n
arguments and m-dimensional values, this becomes a linear map Df (x) : Rn → Rm , that is, an m × n
matrix.
You probably know (or perhaps you don’t?) that a function f : R → R in one dimension which is
differentiable at x ∈ R must be continuous at x. Here is the multi-dimensional generalisation of this
statement.
Proposition A.2. If f : U → Rm , with U ⊂ Rn open, is totally differentiable at x ∈ U , then it is
continuous at x.
Proof. We have to show the continuity property in Def. A.1 so we start with a sequence (xk ) which con-
verges to x. This sequence can also be written as xk = x + εk , where limk→∞ εk = 0. Total differentiability at x implies that f (xk ) = f (x + εk ) = f (x) + Aεk + O(|εk |²) and taking the limit of this equation gives
$$\lim_{k\to\infty} f(x_k) = \lim_{k\to\infty}\Big(f(x) + A\,\epsilon_k + O(|\epsilon_k|^2)\Big) = f(x)$$
and this shows f is continuous at x.

It is intuitively clear that the total and partial derivatives must be related and the precise relationship is
formulated in the following proposition.

Proposition A.3. If f : U → Rm , with U ⊂ Rn open, is totally differentiable at x ∈ U then it is partially


differentiable with respect to all variables and the total derivative is given by
 
$$Df(x) = \begin{pmatrix} \partial_1 f_1 & \cdots & \partial_n f_1 \\ \vdots & \ddots & \vdots \\ \partial_1 f_m & \cdots & \partial_n f_m \end{pmatrix}(x) \qquad\qquad (A.23)$$

Proof. Assume that f is totally differentiable at x ∈ U , with Jacobi matrix A = Df (x), such that f (x + ε) = f (x) + Aε + O(|ε|²). Focusing on the ith component of f this can be written as fi (x + ε) = fi (x) + Aij εj + O(|ε|²). Since this holds for all (sufficiently small) ε we can choose in particular ε = ε ej for ε ∈ R small and by inserting this into the previous equation we find
$$f_i(x + \epsilon\,e_j) = f_i(x) + A_{ij}\,\epsilon + O(\epsilon^2)\,. \qquad\qquad (A.24)$$
This result allows us to calculate the partial derivatives as
$$\partial_j f_i(x) = \lim_{\epsilon\to 0}\frac{f_i(x+\epsilon\,e_j)-f_i(x)}{\epsilon} \overset{(A.24)}{=} \lim_{\epsilon\to 0}\left(A_{ij} + \frac{O(\epsilon^2)}{\epsilon}\right) = A_{ij}\,, \qquad\qquad (A.25)$$

and this is the required statement.

In short, the Jacobi matrix is the matrix which contains all the partial derivatives. Note from Eq. (A.23)
that it is organised such that every row of Df is the gradient of one of the component functions fi of f .
This means, the Jacobi matrix can also be written as
$$Df(x) = \begin{pmatrix} \nabla f_1 \\ \vdots \\ \nabla f_m \end{pmatrix}(x)\,. \qquad\qquad (A.26)$$

In particular, for a real-valued function f : U → R the Jacobi matrix is simply the gradient, so Df = ∇f .
For practical calculations it can be useful to have a notation for the Jacobi matrix which refers explicitly
to the variables xj and the function components fi and such a notation is given by 14

$$Df(x) = \frac{\partial(f_1,\dots,f_m)}{\partial(x_1,\dots,x_n)}(x)\,. \qquad\qquad (A.27)$$
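For concreteness, the Jacobi matrix of Eq. (A.23) can be generated symbolically; the map f below is a hypothetical example (a sketch, not part of the original notes).

    import sympy as sp

    x1, x2, x3 = sp.symbols('x1 x2 x3')
    f = sp.Matrix([x1 * x2 + x3, sp.sin(x1) * x3])    # hypothetical map f : R^3 -> R^2
    Df = f.jacobian([x1, x2, x3])                     # the 2 x 3 Jacobi matrix of Eq. (A.23)
    print(Df)                                         # each row is the gradient of a component, cf. Eq. (A.26)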

A.4 The chain rule


What does the chain rule for total derivatives look like? In the one-dimensional case the chain rule reads
(f ◦ g)0 (x) = f 0 (g(x))g 0 (x), that is, we obtain the derivative of the composite function f ◦ g by multiplying
the derivatives of f and g. In the multi-dimensional case, the total derivatives become (Jacobi) matrices,
so it is natural to expect that the chain rule works by multiplying these matrices. This is in fact what the
chain rule states.
14 This notation is sometimes used to denote the Jacobian, which is the determinant of the Jacobi matrix, for the case of vector fields. This seems a bit of a waste and we prefer to denote by Eq. (A.27) the Jacobi matrix, while the Jacobian is written as $\det\frac{\partial(f_1,\dots,f_n)}{\partial(x_1,\dots,x_n)}(x)$.

Theorem A.4. (Chain rule) Consider maps g : U → Rm , with U ⊂ Rn open and f : V → Rp , with
V ⊂ Rm and g(U ) ⊂ V , so a sequence
$$U \subset \mathbb{R}^n \xrightarrow{\;g\;} V \subset \mathbb{R}^m \xrightarrow{\;f\;} \mathbb{R}^p\,.$$
Let g be totally differentiable at x ∈ U and f be totally differentiable at y := g(x) ∈ V . Then f ◦ g is
totally differentiable at x ∈ U and we have
D(f ◦ g)(x) = Df (y) Dg(x) . (A.28)
Proof. By assumption the total derivatives of g at x and of f at y exist. We denote these by A = Dg(x)
and B = Df (y) so that
g(x + ε) = g(x) + Aε + O(|ε|²) , f (y + η) = f (y) + Bη + O(|η|²) .
The trick is to choose η = g(x + ε) − g(x) = Aε + O(|ε|²) and this gives
(f ◦ g)(x + ε) = f (g(x + ε)) = f (g(x) + η) = f (g(x)) + Bη + O(|η|²)
= f (g(x)) + BAε + O(|η|², |ε|²) . (A.29)
This shows that D(f ◦ g)(x) = BA = Df (y) Dg(x).

In short, the Jacobi matrix of the composite function is the matrix product of the individual Jacobi
matrices, as in Eq. (A.28). There are various other, and perhaps more familiar ways to write this. For
example, let us adopt the somewhat abusive notation, common in physics, where we denote functions and
coordinates by the same name, that is, we write yi (x) = gi (x). Then, using the notation (A.27) for the
Jacobi matrix, the chain rule can be stated as
$$\frac{\partial((f\circ y)_1,\dots,(f\circ y)_p)}{\partial(x_1,\dots,x_n)}(x) = \frac{\partial(f_1,\dots,f_p)}{\partial(y_1,\dots,y_m)}(y)\;\frac{\partial(y_1,\dots,y_m)}{\partial(x_1,\dots,x_n)}(x)\,, \qquad\qquad (A.30)$$
where the three Jacobi matrices have sizes p × n, p × m and m × n, respectively (A.31). Alternatively, we can write the matrix
product on the right-hand side of Eq. (A.30) out with indices, keeping in mind that a matrix product
translates into a sum over the adjacent indices. Then (omitting the points x and y for simplicity) we get
$$\frac{\partial(f\circ y)_i}{\partial x_j} = \sum_{k=1}^{m}\frac{\partial f_i}{\partial y_k}\,\frac{\partial y_k}{\partial x_j}\,, \qquad\qquad (A.32)$$

perhaps the version of the chain rule you are most familiar with.
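The matrix form (A.28) of the chain rule can be checked directly with sympy; the maps f and g below are hypothetical examples (a sketch, not part of the original notes).

    import sympy as sp

    x1, x2, y1, y2 = sp.symbols('x1 x2 y1 y2')
    g = sp.Matrix([x1**2 + x2, x1 * x2])              # hypothetical g : R^2 -> R^2
    f = sp.Matrix([sp.sin(y1) + y2, y1 * y2, y2])     # hypothetical f : R^2 -> R^3

    fg = f.subs({y1: g[0], y2: g[1]})                 # the composition f o g
    lhs = fg.jacobian([x1, x2])                       # D(f o g)(x)
    rhs = f.jacobian([y1, y2]).subs({y1: g[0], y2: g[1]}) * g.jacobian([x1, x2])  # Df(g(x)) Dg(x)
    print(sp.simplify(lhs - rhs))                     # zero 3 x 2 matrix, confirming Eq. (A.28)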
If we apply the chain rule in one dimension, (f ◦ g)′ (x) = f ′ (g(x)) g ′ (x), to the special case where f = g −1 (assuming that g is invertible and its inverse is also differentiable) we get 1 = (g −1 )′ (y) g ′ (x), where y = g(x), and, hence, this leads to the rule for the derivative of the inverse function, (g −1 )′ (y) = 1/g ′ (x).
Let us generalise this argument to the multi-dimensional case, starting with the chain rule (A.28) and
assuming that f = g −1 exists and is totally differentiable. On the left-hand side of Eq. (A.28) we have
the map f ◦ g = g −1 ◦ g = id, that is the identity map, with id(x) = x. Since
$$\frac{\partial\,\mathrm{id}_i}{\partial x_j} = \frac{\partial x_i}{\partial x_j} = \delta_{ij}\,,$$
the Jacobi matrix of the identity map is the unit matrix, so D id = 1. Hence, Eq. (A.28) turns into
1 = D(g−1 )(y) Dg(x) . (A.33)
where y = g(x) and we have shown

Corollary A.1. Let g : U → U , with U ⊂ Rn open, be totally differentiable at x ∈ U , and invertible with
the inverse g −1 also totally differentiable at y = g(x) ∈ U . Then, the Jacobi matrix Dg(x) is invertible
and we have
D(g −1 )(y) = (Dg(x))−1 . (A.34)

Proof. This follows directly from Eq. (A.33).

Note that the inverse on the right-hand side of Eq. (A.34) is a matrix inverse. In other words, the Jacobi
matrix of the inverse function is the matrix inverse of the original Jacobi matrix. If we adopt a more
physics-related notation and write yi (x) = gi (x), as before, then, using the notation (A.27) for the Jacobi
matrix, the inverse derivative rule (A.34) can be stated as
$$\frac{\partial(x_1,\dots,x_n)}{\partial(y_1,\dots,y_n)}(y) = \left(\frac{\partial(y_1,\dots,y_n)}{\partial(x_1,\dots,x_n)}(x)\right)^{-1}. \qquad\qquad (A.35)$$

Application 1.37. Jacobian of the inverse function


Suppose we have a function g : R2 → R2 defined by g = (g1 , g2 )T with g1 (x1 , x2 ) = (x21 + x22 )/2 and
g2 (x1 , x2 ) = x1 x2 . Writing y1 = g1 and y2 = g2 as before, the Jacobi matrix of g is given by
$$\frac{\partial(y_1,y_2)}{\partial(x_1,x_2)} = \begin{pmatrix} \partial y_1/\partial x_1 & \partial y_1/\partial x_2 \\ \partial y_2/\partial x_1 & \partial y_2/\partial x_2 \end{pmatrix} = \begin{pmatrix} x_1 & x_2 \\ x_2 & x_1 \end{pmatrix}. \qquad\qquad (A.36)$$

One way to compute the Jacobi matrix of g −1 is to work out this inverse map explicitly by solving
the equations y1 = (x1² + x2²)/2 and y2 = x1 x2 for x1 and x2 and then computing ∂(x1 , x2 )/∂(y1 , y2 ). But it is
easier to use the inverse derivative rule (A.35) instead, which gives
$$\frac{\partial(x_1,x_2)}{\partial(y_1,y_2)} = \left(\frac{\partial(y_1,y_2)}{\partial(x_1,x_2)}\right)^{-1} = \frac{1}{x_1^2 - x_2^2}\begin{pmatrix} x_1 & -x_2 \\ -x_2 & x_1 \end{pmatrix}. \qquad\qquad (A.37)$$

Note this result for the Jacobi matrix of g −1 is only well-defined in the neighbourhood of points (x1 , x2 ) where x1² − x2² ≠ 0. Indeed, when x1² − x2² = 0 the Jacobi matrix (A.36) of g becomes singular so its inverse does not exist. This indicates that the function g cannot be inverted near points (x1 , x2 ) with x1² − x2² = 0, a statement that will be made more precise by Corollary A.2 below.
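The result (A.37) can also be obtained by symbolically inverting the matrix (A.36), which is one way to see the rule (A.34) in action (a quick cross-check, not part of the original notes).

    import sympy as sp

    x1, x2 = sp.symbols('x1 x2')
    g = sp.Matrix([(x1**2 + x2**2) / 2, x1 * x2])     # the map of Application 1.37
    Dg = g.jacobian([x1, x2])                         # Eq. (A.36)
    print(sp.simplify(Dg.inv()))                      # reproduces Eq. (A.37); singular when x1**2 = x2**2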

A.5 Taylor series and extremal points


We would like to write down the Taylor series for a (suitably many times continuously differentiable)
real-valued function f : U → R, where U ⊂ Rn open, with coordinates x = (x1 , . . . , xn )T . To this end it is
useful to introduce a more compact multi-index notation, with multi-indices K = (k1 , . . . , kn ), monomials $x^K := x_1^{k_1} x_2^{k_2}\cdots x_n^{k_n}$ and a generalised factorial K! := k1 ! k2 ! · · · kn !. With this notation, the typical term we expect in a multi-dimensional Taylor series can be written concisely as
$$\frac{x^K}{K!} = \frac{x_1^{k_1}}{k_1!}\,\frac{x_2^{k_2}}{k_2!}\cdots\frac{x_n^{k_n}}{k_n!}\,. \qquad\qquad (A.38)$$
It is useful to keep track of the total degree of such a term which is given by |K| := k1 + k2 + · · · + kn . Also, multiple partial derivatives are denoted by $\partial^K := \partial_1^{k_1}\partial_2^{k_2}\cdots\partial_n^{k_n}$.

A.5.1 Taylor’s formula
Theorem A.5. (Taylor’s formula) Let f : U → R, with U ⊂ Rn open, be κ-times continuously differentiable at x ∈ U . For a ξ ∈ Rn which satisfies x + tξ ∈ U for all t ∈ [0, 1] there exists an ε ∈ [0, 1] such that
$$f(x+\xi) = \sum_{|K|\le\kappa}\frac{\partial^K f(x)}{K!}\,\xi^K \;+\; \sum_{|K|=\kappa}\frac{\partial^K f(x+\epsilon\xi)-\partial^K f(x)}{K!}\,\xi^K\,. \qquad\qquad (A.39)$$

Proof. The proof is not too difficult but somewhat elaborate, and we refer to the literature, for example
Ref. [10], for details.

Note that this formula is actually an equality and the second sum on the right-hand side can be seen as
the size of the error if we truncate the series by just keeping the first sum up to order κ on the right-hand
side. The full Taylor series
$$\sum_{K}\frac{\partial^K f(x)}{K!}\,\xi^K \qquad\qquad (A.40)$$

does not need to converge and even if it does it need not converge to the function value. However, careful
consideration of the error term in Eq (A.39) and its behaviour with the order κ, for a given function f ,
can often be used to decide for which values of ξ the series (A.40) converges to f (x + ξ).
The error term can be recast into a simpler, more intuitive form. Since, by assumption, κ derivatives on f still lead to continuous functions we have limξ→0 (∂ K f (x + εξ) − ∂ K f (x)) = 0. From Eq. (A.21), this means that the second sum in Eq. (A.39) is of O(|ξ|^{κ+1}) and Taylor’s formula becomes
$$f(x+\xi) = \sum_{|K|\le\kappa}\frac{\partial^K f(x)}{K!}\,\xi^K + O(|\xi|^{\kappa+1})\,. \qquad\qquad (A.41)$$

It is instructive to write this out more explicitly for κ = 2. This leads to
$$f(x+\xi) = f(x) + \sum_{i=1}^{n}\partial_i f(x)\,\xi_i + \frac{1}{2}\sum_{i,j=1}^{n}\partial_i\partial_j f(x)\,\xi_i\xi_j + O(|\xi|^3) \qquad\qquad (A.42)$$
$$\phantom{f(x+\xi)} = f(x) + \nabla f(x)\cdot\xi + \frac{1}{2}\,\xi^T H(x)\,\xi + O(|\xi|^3)\,, \qquad\qquad (A.43)$$
where the symmetric n × n matrix H(x) with entries

(H(x))ij = ∂i ∂j f (x) (A.44)

is called the Hesse matrix of f at x.
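Gradient, Hesse matrix and the quadratic approximation (A.43) are easy to evaluate with sympy; the function and expansion point below are hypothetical examples (a sketch, not part of the original notes).

    import sympy as sp

    x1, x2 = sp.symbols('x1 x2')
    f = sp.exp(x1) * sp.cos(x2)                       # hypothetical function f : R^2 -> R

    grad = sp.Matrix([sp.diff(f, x1), sp.diff(f, x2)])
    H = sp.hessian(f, (x1, x2))                       # the Hesse matrix, Eq. (A.44)

    at0 = {x1: 0, x2: 0}                              # expand around x = (0, 0)
    xi = sp.Matrix([sp.Rational(1, 10), -sp.Rational(1, 5)])   # a small displacement xi
    approx = f.subs(at0) + (grad.subs(at0).T * xi)[0] + sp.Rational(1, 2) * (xi.T * H.subs(at0) * xi)[0]
    exact = f.subs({x1: xi[0], x2: xi[1]})
    print(sp.N(exact - approx))                       # small difference, of order |xi|^3 as in Eq. (A.43)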

A.5.2 Extremal points


The above form of Taylor’s formula up to quadratic terms is very useful to analyse local properties of a
function, such as minima and maxima. Just to be sure we define these properly first.

Definition A.4. Consider a real-valued function f : U → R, where U ⊂ Rn open. A point x ∈ U


is called a local minimum (a local maximum) of f if there is a sufficiently small ball Bε (x) ⊂ U such that f (z) ≥ f (x) for all z ∈ Bε (x) (f (z) ≤ f (x) for all z ∈ Bε (x)). A local extremum is either a local
minimum or a local maximum. If the equality in the above conditions is only realised for z = x then the
local minimum, maximum or extremum is called isolated.

Local extrema have an interesting property.

Proposition A.4. Consider a real-valued continuously differentiable function f : U → R, where U ⊂ Rn


open, with a local extremum at x ∈ U . Then ∇f (x) = 0, that is, x is a stationary point of f .

Proof. This can be shown by relating the statement to its analogue in one dimension. To do this, consider
the functions gi : R → R defined by gi (t) := f (x + tei ). Clearly, if f has a local extremum at x the
functions gi have a local extremum at t = 0. Hence, from the well-known one-dimensional statement we
know that gi′ (0) = 0. However, gi′ (0) = ∂i f (x) (for all i) and this proves the claim.

Hence, the local extrema of a function f are to be found among its stationary points, that is, among the
points x in its domain for which ∇f (x) = 0. However, not all such stationary points are necessarily local
extrema. Criteria for when this is the case can be formulated in terms of the Hesse matrix. Indeed, at a
stationary point x of f Taylor’s formula (A.43) becomes
$$f(x+\xi) - f(x) = \frac{1}{2}\,\xi^T H(x)\,\xi + O(|\xi|^3)\,, \qquad\qquad (A.45)$$
so the leading behaviour near x is determined by the second order term which is controlled by the Hesse
matrix. To formulate what this implies more precisely we need to recall some properties of symmetric
n × n matrices M from linear algebra. Such a matrix M is called positive definite if ξ T M ξ > 0 for all ξ ≠ 0. It is called negative definite if −M is positive definite. Further, M is called indefinite if ξ T M ξ takes
on strictly positive and strictly negative values, for suitable ξ. We recall that M is positive (negative)
definite iff all its eigenvalues are strictly positive (strictly negative). It is indefinite iff it has at least one
strictly positive and at least one strictly negative eigenvalue.

Theorem A.6. Consider a real-valued twice continuously differentiable function f : U → R, where


U ⊂ Rn open, with a stationary point at x ∈ U . Then we have the following statements.
(i) If H(x) is positive definite then x is an isolated local minimum.
(ii) If H(x) is negative definite then x is an isolated local maximum.
(iii) If H(x) is indefinite then x is not a local extremum.

Proof. (i) Let us sketch the proof. Since H(x) is positive definite there exists a constant c > 0 such that
(1/2) ξ T H(x) ξ ≥ c|ξ|² for all ξ ∈ Rn . This means, we can always find an ε > 0 so that the O(|ξ|³) term in Eq. (A.45) is smaller than (1/2) ξ T H(x) ξ for all ξ with |ξ| < ε. Hence, from Eq. (A.45), f (x + ξ) − f (x) > 0 for all ξ with 0 < |ξ| < ε so that x is indeed an isolated local minimum.
(ii), (iii) The proofs are similar to the one for (i).

Proposition A.4 and Theorem A.6 suggest a method to find the isolated local extrema of a function
f : U → R. As a first step, find the stationary points of f in U by solving the equation ∇f (x) = 0. Next,
for each stationary point x, compute the Hesse matrix H(x) and its eigenvalues and use the following
criterion:
x local isolated minimum ⇐ all eigenvalues of H(x) strictly positive
x local isolated maximum ⇐ all eigenvalues of H(x) strictly negative . (A.46)
x not a local extremum ⇐ H(x) has strictly positive and strictly negative eigenvalues

Let us discuss functions f of two variables x = (x1 , x2 )T more explicitly. In this case, the Hesse matrix at
x takes the form
$$H(x) = \begin{pmatrix} \partial_1^2 f & \partial_1\partial_2 f \\ \partial_1\partial_2 f & \partial_2^2 f \end{pmatrix}(x) \quad\Rightarrow\quad \begin{cases} \det(H(x)) = \big(\partial_1^2 f\,\partial_2^2 f - (\partial_1\partial_2 f)^2\big)(x) \\ \mathrm{tr}(H(x)) = \big(\partial_1^2 f + \partial_2^2 f\big)(x) \end{cases} \qquad\qquad (A.47)$$

It has two eigenvalues, λ1 , λ2 , and their signs can be determined by considering the determinant and trace
of H(x). More specifically, for a stationary point x of f we have

x local isolated minimum ⇐ λ1 , λ2 > 0 ⇔ det(H(x)) > 0 and tr(H(x)) > 0


x local isolated maximum ⇐ λ1 , λ2 < 0 ⇔ det(H(x)) > 0 and tr(H(x)) < 0 . (A.48)
x not a local extremum ⇐ λ1 λ2 < 0 ⇔ det(H(x)) < 0
If the function depends on more than two variables, the determinant and trace of the Hesse matrix are not
sufficient to decide on the signs of all of its eigenvalues. For such cases, in order to work out the nature of
a stationary point x, one has to compute the eigenvalues of H(x) from its characteristic polynomial and
use the general criterion (A.46).

Application 1.38. Extrema for a function f : R3 → R


As a simple example, consider the function f : R3 → R defined by f (x, y, z) = x² − x + xy² + z² − 2z
with gradient and Hesse matrix
 
$$\nabla f(x, y, z) = (y^2 + 2x - 1,\; 2xy,\; 2z - 2)\,, \qquad H(x, y, z) = \begin{pmatrix} 2 & 2y & 0 \\ 2y & 2x & 0 \\ 0 & 0 & 2 \end{pmatrix}.$$

Setting ∇f (x, y, z) = 0 one finds three stationary points which we collect, together with the associated
Hesse matrices and their eigenvalues, in the following table.

(x, y, z)        H(x, y, z) (written row by row)            eigenvalues of H
(0, 1, 1)        (2, 2, 0), (2, 0, 0), (0, 0, 2)            (1 + √5, 1 − √5, 2)
(0, −1, 1)       (2, −2, 0), (−2, 0, 0), (0, 0, 2)          (1 + √5, 1 − √5, 2)
(1/2, 0, 1)      (2, 0, 0), (0, 1, 0), (0, 0, 2)            (2, 1, 2)

Hence, the stationary points at (x, y, z) = (0, ±1, 1) are not local extrema (but saddles) since they
have two strictly positive and one strictly negative eigenvalue. On the other hand, the stationary
point at (x, y, z) = (1/2, 0, 1) is a local minimum, since all eigenvalues are strictly positive.
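The classification above can be reproduced mechanically: solve ∇f = 0, then inspect the Hessian eigenvalues at each stationary point. The sympy sketch below (a cross-check, not part of the original notes) does this for the function of Application 1.38.

    import sympy as sp

    x, y, z = sp.symbols('x y z')
    f = x**2 - x + x*y**2 + z**2 - 2*z                # the function of Application 1.38

    grad = [sp.diff(f, v) for v in (x, y, z)]
    H = sp.hessian(f, (x, y, z))
    for p in sp.solve(grad, (x, y, z), dict=True):
        eigs = list(H.subs(p).eigenvals().keys())
        print(p, eigs)   # all eigenvalues positive -> isolated local minimum; mixed signs -> saddle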

A.6 Implicit functions


Suppose we have a function f = (f1 , . . . , fm ) : U → Rm , where U ⊂ Rn open, of n variables which we
denote by z = (z1 , . . . , zn )T and also assume that m < n. We want to solve the equation f (z) = 0, a
common problem which arises in many situations in mathematics and physics.
An obvious way to proceed is as follows. Since f (z) = 0 consists of m scalar equations for n > m
variables we can split our variables as z = (x, y), where x ∈ Rn−m and y ∈ Rm and then attempt to solve
for the variables y in terms of x. More precisely, suppose we have a point (a, b) ∈ Rn−m ×Rm which solves
the equation, that is, f (a, b) = 0. For open neighbourhoods Ṽ and W̃ of a and b (where Ṽ × W̃ ⊂ U )
we would then like to find a function g : Ṽ → W̃ such that b = g(a) and f (x, g(x)) = 0 for all x ∈ Ṽ .
This function is precisely the desired solution y = g(x) of f (x, y) = 0 (at least locally, near (a, b)), but

does it exist? Unfortunately, the answer is “not always” and this is where the implicit function theorem
comes into play. It states sufficient conditions for the solution g to exist.

Application 1.39. A simple implicit function example


To develop a better intuition for what the obstruction is it is useful to discuss an example. Consider
the function f : R2 → R of two variables z = (x, y)T defined by f (x, y) = x2 + y 2 − 1. Of course the
solutions to f (x, y) = 0 consist of all points on the unit circle in R2 . Choose a specific point (a, b)
on this circle, so that a2 + b2 = 1. Can we always find open neighbourhoods Ṽ of a and W̃ of b for
which we can solve y in terms of x? Consider the points (a, b) = (±1, 0) where the circle intersects
the x-axis and the tangent to the circle is vertical. For any open neighbourhood Ṽ = (±1 − ε, ±1 + ε)
we have a problem. If x ∈ Ṽ and |x| < 1 then we have two possible values of y (one on the upper
half-circle, the other one on the lower half-circle) for this x. If |x| > 1 there is no corresponding value
of y. In either case, we cannot define a function y = g(x) since there is no (unique) y to assign as the
value of g at x. Evidently, the problem arises where the tangent is vertical, that is, at points (a, b)
with ∂y f (a, b) = 0, while all other points seem fine.

The implicit function theorem states, roughly, that, as long as the condition ∂y f (a, b) ≠ 0 (and its suitable
generalisation to higher dimensions) is satisfied, so that the tangent is not vertical, we can find a local
solution y = g(x), near the point (a, b). The precise formulation is as follows.

Theorem A.7. (Implicit function theorem). Let f : V × W → Rm , with V ⊂ Rn−m and W ⊂ Rm open
(where n > m), be a continuously differentiable function. Denote the coordinates by (x, y) ∈ V × W and
choose a point (a, b) ∈ V × W . If f (a, b) = 0 and if the matrix

$$\frac{\partial(f_1,\dots,f_m)}{\partial(y_1,\dots,y_m)}(a, b) \qquad\qquad (A.49)$$

has maximal rank, then there exist open neighbourhoods Ṽ ⊂ V of a and W̃ ⊂ W of b and a continuously differentiable function g : Ṽ → W̃ with f (x, g(x)) = 0 for all x ∈ Ṽ . Conversely, if (x, y) ∈ Ṽ × W̃ and f (x, y) = 0 then y = g(x).

Proof. The proof is somewhat lengthy and can be found in the literature, see for example [10].

The maximal rank condition on the (partial) Jacobi matrix (A.49) is the generalisation of the requirement,
stated earlier, that the tangent at (a, b) is not vertical. The theorem implies a formula for the derivative
of the function g in terms of f . To see this, apply the chain rule to the equation f (x, g(x)) = 0:

$$\frac{\partial(f_1,\dots,f_m)}{\partial(x_1,\dots,x_{n-m})} + \frac{\partial(f_1,\dots,f_m)}{\partial(y_1,\dots,y_m)}\,\frac{\partial(g_1,\dots,g_m)}{\partial(x_1,\dots,x_{n-m})} = 0\,. \qquad\qquad (A.50)$$

The second Jacobi matrix in the above equation is precisely the one in Eq. (A.49) which is required to
have maximal rank. Hence, we can invert this matrix and solve for the Jacobi matrix of g:

$$\frac{\partial(g_1,\dots,g_m)}{\partial(x_1,\dots,x_{n-m})} = -\left(\frac{\partial(f_1,\dots,f_m)}{\partial(y_1,\dots,y_m)}\right)^{-1}\frac{\partial(f_1,\dots,f_m)}{\partial(x_1,\dots,x_{n-m})}\,. \qquad\qquad (A.51)$$

Note, we can calculate the Jacobi matrix of g from this formula without actually knowing the function g
explicitly - all we need is the original function f .

Application 1.40. Another implicit function example
Let us illustrate this with a further example, for a function f : R3 → R defined by f (x1 , x2 , y) = x1³/3 + x2² y³/3 + 1. The solutions to the equation f (x1 , x2 , y) = 0 form a surface in R3 and it is often convenient to describe this surface by solving for y in terms of x1 and x2 . The implicit function theorem states this is possible, at least locally, for points where
$$\partial_y f = x_2^2\,y^2 \qquad\qquad (A.52)$$
is different from zero, that is, for every point on the surface with x2 ≠ 0 and y ≠ 0. In the neighbourhood of such a point, we can find a solution y = g(x1 , x2 ) and compute its derivative from Eq. (A.51):
$$\nabla g = -\,[\partial_y f]^{-1}\,\frac{\partial f}{\partial(x_1, x_2)} = -\frac{1}{x_2^2 y^2}\left(x_1^2,\ \tfrac{2}{3}\,x_2 y^3\right). \qquad\qquad (A.53)$$
Here, y should be thought of as the function y = g(x1 , x2 ) obtained by solving f (x1 , x2 , y) = 0 for y.
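Formula (A.51) can be checked against an explicit local solution. The sketch below (an illustration, not part of the original notes) uses the circle of Application 1.39, where the upper half-circle gives y = g(x) explicitly.

    import sympy as sp

    x, y = sp.symbols('x y', positive=True)
    f = x**2 + y**2 - 1                               # the circle of Application 1.39

    dg_formula = -sp.diff(f, x) / sp.diff(f, y)       # Eq. (A.51) in one variable: g'(x) = -f_x / f_y
    g = sp.sqrt(1 - x**2)                             # explicit solution on the upper half-circle
    print(sp.simplify(dg_formula.subs(y, g) - sp.diff(g, x)))   # 0, the two derivatives agree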

We can apply the implicit function theorem to answer the question under which conditions an inverse of
a (differentiable) function g : W → Rm , with W ⊂ Rm open, exists. More explicitly, if we have points
a ∈ W and b ∈ Rm with a = g(b) we want to know if the equation x = g(y) can be solved for y in terms
of x near (a, b). To make contact with the implicit function theorem, we define the auxiliary function
f : R2m → Rm by f (x, y) = g(y) − x. Then we have f (a, b) = 0 and the implicit function theorem states
that f can be solved for y in terms of x near (a, b) if the matrix
$$\frac{\partial(f_1,\dots,f_m)}{\partial(y_1,\dots,y_m)}(a, b) = \frac{\partial(g_1,\dots,g_m)}{\partial(y_1,\dots,y_m)}(b) \qquad\qquad (A.54)$$
has maximal rank. Hence, we have
Corollary A.2. Let g : W → Rm , where W ⊂ Rm open, be a differentiable function with a = g(b) for
certain points a ∈ W and b ∈ Rm . Then g is invertible (that is, the equation x = g(y) can be solved for y in terms of x) near (a, b) if the Jacobi matrix
$$\frac{\partial(g_1,\dots,g_m)}{\partial(y_1,\dots,y_m)}(b) \qquad\qquad (A.55)$$
of g is invertible at b.

Application 1.41. Implicit functions on a surface


Suppose we have a (continuously differentiable) function f : R3 → R, and we denote the coordinates
on R3 by (x1 , x2 , x3 ) = (p, v, t). Then, the equation f (p, v, t) = 0 (in the context of Thermodynamics
this might be an equation of state) defines a surface in R3 . Suppose, at a point (p, v, t) all three
partial derivatives ∂p f , ∂v f and ∂t f are non-zero. Then the implicit function theorem tells us that,
near this point, we can solve the equation f (p, v, t) = 0 for each of the three variables in terms of the
other two. Let me call these solutions p = P (v, t), v = V (p, t) and t = T (p, v) so that

f (P (v, t), v, t) = f (p, V (p, t), t) = f (p, v, T (p, v)) = 0 . (A.56)

In common physics notation, no distinction is made between the functions and the associated co-
ordinates. Here, in order to avoid confusion, I have used lowercase letters for the coordinates and
uppercase letters for the corresponding functions. The derivatives of our implicit functions P , V and

T can be computed using Eq. (A.51) (or, alternatively, by differentiating Eq. (A.56) using the chain
rule) and this leads to
$$\partial_v P = -\frac{\partial_v f}{\partial_p f}\,, \quad \partial_t P = -\frac{\partial_t f}{\partial_p f}\,, \qquad \partial_p V = -\frac{\partial_p f}{\partial_v f}\,, \quad \partial_t V = -\frac{\partial_t f}{\partial_v f}\,, \qquad \partial_p T = -\frac{\partial_p f}{\partial_t f}\,, \quad \partial_v T = -\frac{\partial_v f}{\partial_t f}\,. \qquad\qquad (A.57)$$
Various conclusions can be easily obtained using these equations, for example,

∂v P ∂t V ∂p T = −1 , ∂v P = (∂p V )−1 . (A.58)

Do not let yourself be confused by the second of these equations which says we can simply invert
individual partial derivatives to get the derivatives of the inverse function. At first sight, this seems
to be at odds with our general rule, Eq. (A.35), which says the Jacobi-matrix of the inverse function
is the matrix inverse of the original Jacobi matrix. But it is important to note that this general rule
was based on a set-up with independent coordinates (x1 , . . . , xn )T ∈ Rn . In the present case, on the
other hand, we start with three independent coordinates (p, v, t)T ∈ R3 but we impose the condition
f (p, v, t) = 0. This means we are effectively working on a surface within R3 and only two coordinates
are independent. The somewhat peculiar looking relations (A.58) are a consequence of this special
set-up and are, therefore, not in contradiction with our general rule in Eq. (A.35). In fact, they were
derived using the general chain rule, in the form of Eq. (A.51).
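The first relation in (A.58) holds for any equation of state, which can be confirmed symbolically by combining the expressions (A.57) for a generic function f (a sketch, not part of the original notes).

    import sympy as sp

    p, v, t = sp.symbols('p v t')
    f = sp.Function('f')(p, v, t)                     # a generic equation of state f(p, v, t) = 0

    fp, fv, ft = sp.diff(f, p), sp.diff(f, v), sp.diff(f, t)
    dP_dv, dV_dt, dT_dp = -fv / fp, -ft / fv, -fp / ft    # Eq. (A.57)
    print(sp.simplify(dP_dv * dV_dt * dT_dp))         # -1, the triple product rule of Eq. (A.58)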

B Manifolds in Rn
Differentiable manifolds are the main arena of an area of mathematics called differential geometry. It is a
large subject (and a major area of contemporary research in mathematics) which is well beyond the scope
of these lectures. Here, we would like to present a rather limited discussion of manifolds embedded in Rn
and some of their elementary geometry. (A more advanced introduction to differential geometry can, for
example, be found in Ref. [14].) Some special cases of this have, in fact, already been covered in your
first year course. Apart from generalising these the main purpose of this appendix is to support some
results used in the main part of the text and provide a (very) basic mathematical grounding for General
Relativity.

B.1 Definition of manifolds


You have all seen examples of manifolds in Rn , for example the unit circle S 1 embedded in R2 . There are
two ways of describing such a circle. We can either use an equation, so
S 1 = {(x, y) ∈ R2 | x2 + y 2 = 1} , (B.1)
or we can use a parametrisation
S 1 = {(cos(t), sin(t)) | t ∈ [0, 2π)} . (B.2)
More generally, differentiable manifolds in Rn include curves, surfaces and their higher-dimensional ana-
logues, all embedded in Rn , and they can be defined in two, equivalent ways which are modelled on the
two descriptions of the circle above. We begin with the definition in terms of equations which generalises
the description (B.1) for the circle.
Definition B.1. A subset M ⊂ Rn is a k-dimensional (differentiable) sub-manifold of Rn if, for every p ∈ M , there is an open neighbourhood V ⊂ Rn of p such that V ∩ M = {x ∈ Rn | f1 (x) = · · · = fn−k (x) = 0} for continuously differentiable functions fi : Rn → R for which the Jacobi matrix
$$\frac{\partial(f_1,\dots,f_{n-k})}{\partial(x_1,\dots,x_n)}(p) \qquad\qquad (B.3)$$
has rank n − k.
This definition says, essentially, that a k-dimensional sub-manifold of Rn is given, locally, by the common
zero locus of n − k functions. (See Fig. 18.) The rank condition ensures that this manifold is smooth,

Figure 18: Manifold defined locally as the common zero locus of functions.

that is, there are no “edges”. The circle S¹ in (B.1) provides an example with n = 2, k = 1 and f1 (x, y) = x² + y² − 1. The alternative description (B.2) of S¹ in terms of a parametrisation generalises to
Theorem B.1. The set M ⊂ Rn is a k-dimensional sub-manifold of Rn iff, for every p ∈ M we have an
open neighbourhood V ⊂ Rn of p, an open set U ⊂ Rk and a bijective map X = (X1 , . . . , Xn )T : U → V ∩ M
such that the Jacobi matrix
$$\frac{\partial(X_1,\dots,X_n)}{\partial(t_1,\dots,t_k)} \qquad\qquad (B.4)$$
has rank k for all t = (t1 , . . . , tk )T ∈ U . The map X is called a chart of M .

Proof. The proof involves the implicit function theorem A.7 and it can, for example, be found in Ref. [10].

The circle S¹ as parametrised in Eq. (B.2) provides an example with n = 2, k = 1 and X(t) = (X1 (t), X2 (t))T = (cos t, sin t)T . More generally, the theorem says we can describe a k-dimensional sub-
manifold of Rn (at least locally) by a parametrisation X(t) = (X1 (t), . . . Xn (t))T , where t ∈ Rk are the
parameters. (See Fig. 19.) For one parameter (k = 1), this describes a one-dimensional sub-manifold,


Figure 19: Manifold defined locally by a parametrisation (a chart).

that is a curve, for two parameters (k = 2) it describes a two-dimensional sub-manifold, that is a surface,
and so on.

B.2 Tangent space


It is intuitively clear that a k-dimensional sub-manifold M of Rn has, at each point p ∈ M , a tangent
space which is a one-dimensional vector space for a curve (k = 1), a two-dimensional vector space for a
surface (k = 2), and, generally, is k-dimensional. This tangent space at p is denoted by Tp M and is more
precisely defined as follows.

Definition B.2. The tangent space Tp M of a k-dimensional sub-manifold M of Rn is defined by
$$T_p M = \mathrm{Span}\left(\frac{\partial X}{\partial t_1}(p),\dots,\frac{\partial X}{\partial t_k}(p)\right). \qquad\qquad (B.5)$$

The vectors in Tp M are called tangent vectors of M at p.


Note that ∂X/∂ta (p), where a = 1, . . . , k, are k vectors in Rn and the maximal rank condition on the Jaco-
bian (B.4) ensures that these k vectors are linearly independent. This means that the tangent space of a
k-dimensional sub-manifold is indeed a vector space of dimension k, at every point. (See Fig. 20.) Hence,


Figure 20: Tangent space Tp M of a manifold M at a point p ∈ M .

a general tangent vector at p ∈ M can be written as
$$\sum_{a=1}^{k} v^a\,\frac{\partial X}{\partial t_a} \qquad\qquad (B.6)$$

for v a ∈ R.
Note that the above definition of the tangent space is independent of the parametrisation used. Con-
sider a new set of parameters t̃ = (t̃1 , . . . , t̃k ), a reparametrisation t̃ = t̃(t) and an alternative parametri-
sation X̃(t̃) := X(t(t̃)) of the same manifold, such that the Jacobian ∂ t̃/∂t has rank k. It follows that
$$\frac{\partial\tilde X}{\partial\tilde t_a} = \frac{\partial X}{\partial t_b}\,\frac{\partial t_b}{\partial\tilde t_a}\,. \qquad\qquad (B.7)$$
Since the Jacobian ∂ t̃/∂t has maximal rank the vectors ∂ X̃/∂ t̃a and ∂X/∂ta span the same space.

Application 2.42. Tangent space of S 1


It may be useful to work out the tangent space for an example. Consider S 1 , parametrised as in
Eq. (B.2), so X(t) = (cos(t), sin(t))T . Since this is a curve the tangent space is one-dimensional and
it is, at every point on the circle, spanned by the vector
$$\frac{\partial X}{\partial t} = \begin{pmatrix} -\sin(t) \\ \cos(t) \end{pmatrix}. \qquad\qquad (B.8)$$

Note that, intuitively, this vector is indeed tangential to the circle for every t.

Exercise B.2. Write down a suitable parametrisation for a two-sphere S 2 ⊂ R3 and compute its tangent
space at each point.

Suppose we have a hypersurface M ⊂ Rn defined (locally) by the vanishing locus of a function f . It is


well-known that the gradient ∇f is orthogonal to M but how is this statement actually proved? First
we have to clarify what we mean by a vector being orthogonal to a hypersurface – after all, M is not
necessarily a plane but can be a complicated surface. We say a vector w ∈ Rn is orthogonal to M at a
point p ∈ M if w · v = 0 for all v ∈ Tp M so, in short, if w is orthogonal to the entire tangent space at p.
So this is the property we have to show for the gradient ∇f (p).

From Def. B.1 we know that at least one component of ∇f (p) must be non-zero. Without restricting
generality, let us assume this is the last component and we split the coordinates in the neighbourhood of p up as (x, y) ∈ Rn , where x ∈ Rn−1 and y ∈ R. Since ∂y f (p) ≠ 0 we know from the implicit function
theorem A.7 that we have a solution y = g(x) with f (x, g(x)) = 0 in a neighbourhood of p. Further, from
Eq. (A.51) we know that the derivatives of g are given by
$$\partial_i g = -\frac{\partial_i f}{\partial_y f} \qquad\text{for } i = 1,\dots,n-1\,. \qquad\qquad (B.9)$$
Using the function g we can also write down a parametrisation for the manifold and compute the resulting
tangent vectors:
$$X(x) = \begin{pmatrix} x_1 \\ \vdots \\ x_{n-1} \\ g(x) \end{pmatrix} \quad\Rightarrow\quad \frac{\partial X}{\partial x_i} = \begin{pmatrix} e_i \\ \partial_i g \end{pmatrix} \qquad\text{for } i = 1,\dots,n-1\,. \qquad\qquad (B.10)$$
Here we have used the coordinates x = (x1 , . . . , xn−1 )T as the parameters and ei are the standard unit
vectors in Rn−1 . We can now work out the dot product between the gradient ∇f and the above tangent
vectors which leads to
$$\nabla f\cdot\frac{\partial X}{\partial x_i} = \partial_i f + \partial_y f\,\partial_i g \overset{\text{(B.9)}}{=} 0\,. \qquad\qquad (B.11)$$
This shows that ∇f is indeed orthogonal to all tangent vectors and, hence, to Tp M and this is the desired
result.
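The orthogonality of ∇f to the tangent space can be checked explicitly for a concrete hypersurface. The sketch below (an illustration, not part of the original notes) uses the unit sphere in R3 with the parametrisation used later in Section B.3.3.

    import sympy as sp

    theta, phi, x, y, z = sp.symbols('theta phi x y z')
    f = x**2 + y**2 + z**2 - 1                        # hypersurface: the unit sphere
    X = sp.Matrix([sp.sin(theta)*sp.cos(phi), sp.sin(theta)*sp.sin(phi), sp.cos(theta)])

    grad_f = sp.Matrix([sp.diff(f, v) for v in (x, y, z)]).subs({x: X[0], y: X[1], z: X[2]})
    for t in (theta, phi):
        print(sp.simplify(grad_f.dot(sp.diff(X, t))))   # 0: grad f is orthogonal to each tangent vector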

B.3 Integration over sub-manifolds of Rn


Integrating over a sub-manifold of Rn is conceptually quite different from “plain” integration, as described
in Section 1.3. The latter only requires an (integrable) function plus a measurable set to integrate over.
To integrate functions over manifolds additional geometrical information is required and, as we will now
see, this information is provided via a metric.

B.3.1 Metric and Gram’s determinant


Integration over (k-dimensional) sub-manifolds of Rn requires a non-trivial “measure”, dS, in the integral
which we now construct. First define the metric
$$g_{ab} := \frac{\partial X}{\partial t_a}\cdot\frac{\partial X}{\partial t_b}\,, \qquad\qquad (B.12)$$
a symmetric k × k matrix at every point t ∈ U which contains the dot products of the tangent vectors
∂X/∂ta . The metric encodes the fact that the tangent vectors ∂X/∂ta , while providing a basis for the tangent
space Tp M , do not necessarily form an ortho-normal basis and this, in turn, can be seen as a measure
for the non-trivial “shape” of the manifold. The metric is the central object of General Relativity, which
describes the effects of gravity via a curved space time.
To see how the metric changes if we consider another set of parameters t̃ = t̃(t), we can use Eq. (B.7),
which leads to
$$\tilde g_{ab} = \frac{\partial\tilde X}{\partial\tilde t_a}\cdot\frac{\partial\tilde X}{\partial\tilde t_b} = \frac{\partial t_c}{\partial\tilde t_a}\,\frac{\partial t_d}{\partial\tilde t_b}\,g_{cd}\,. \qquad\qquad (B.13)$$
The determinant g := det(gab ) of this metric is called Gram’s determinant and, under a coordinate
transformation, it changes as
$$g := \det(g_{ab}) \quad\Rightarrow\quad \tilde g = \left(\det\frac{\partial t}{\partial\tilde t}\right)^{2} g\,. \qquad\qquad (B.14)$$

B.3.2 Definition of integration over sub-manifolds
We are now ready 15 to define integration over a sub-manifold of Rn .
Definition B.3. Let M ⊂ Rn be a k-dimensional sub-manifold of Rn which (up to a set of measure zero)
is given by a single chart X : U → M . For a function f : M → R the integral over M is defined as
$$\int_M f\,dS := \int_U f(X(t))\,\sqrt{g}\;d^k t\,. \qquad\qquad (B.15)$$

Symbolically this can also be written as dS = √g dk t.
Note that the integral on the RHS is well-defined - it is simply an integral over the parameter space
U ⊂ Rk . However, it is important to check that this definition is independent of the parametrisation
chosen. Clearly, we do not want the value of integrals to depend on how we choose to parametrise the
manifold. Consider a re-parametrisation t̃ = t̃(t) and the transformation rules
$$\sqrt{\tilde g} = \det\!\left(\frac{\partial t}{\partial\tilde t}\right)\sqrt{g}\,, \qquad d^k\tilde t = \det\!\left(\frac{\partial\tilde t}{\partial t}\right) d^k t\,, \qquad\qquad (B.16)$$
where the former equation follows from (B.14) and the latter is simply the transformation formula for
integrals in Rk . Then we learn that
$$d\tilde S = \sqrt{\tilde g}\;d^k\tilde t = \sqrt{g}\;d^k t = dS\,, \qquad\qquad (B.17)$$
so the integral is indeed independent of the parametrisation. Essentially, this is the reason for including

the factor √g in the measure - its transformation cancels the transformation under coordinate change, as
is evident from Eq. (B.16).
Exercise B.3. Parametrise the upper half circle in R2 in two different ways (by an angle and by the x
coordinate) and check that the integral over the half-circle is the same for these two parametrisations.

B.3.3 A few special cases


It is useful to apply this to a few specific cases. First consider a curve X : [a, b] → Rn , where X(t) =
(X1 (t), . . . , Xn (t))T . A quick calculation shows that
$$g = \sum_{i=1}^{n}\left(\frac{dX_i}{dt}\right)^2 \quad\Rightarrow\quad dS = \sqrt{\sum_{i=1}^{n}\left(\frac{dX_i}{dt}\right)^2}\;dt\,. \qquad\qquad (B.18)$$

This is indeed the well-known measure for integrating over curves.


Next, let us consider surfaces in R3 , parametrised by X : U → R3 with parameters (t1 , t2 ) ∈ U . Since
we are in R3 we can use the cross product to define a normal vector and a unit normal vector to the
surface by
$$N := \frac{\partial X}{\partial t_1}\times\frac{\partial X}{\partial t_2}\,, \qquad n := \frac{N}{|N|}\,. \qquad\qquad (B.19)$$
The metric is two-dimensional and given by
$$(g_{ab}) = \begin{pmatrix} \left|\frac{\partial X}{\partial t_1}\right|^2 & \frac{\partial X}{\partial t_1}\cdot\frac{\partial X}{\partial t_2} \\ \frac{\partial X}{\partial t_1}\cdot\frac{\partial X}{\partial t_2} & \left|\frac{\partial X}{\partial t_2}\right|^2 \end{pmatrix} \qquad\qquad (B.20)$$

15
For simplicity we focus on the case where the manifold is, up to a set of measure zero, given by a single chart. If multiple
charts come into play a further technical twist, referred to as partition of unity is required.

For Gram’s determinant we find
$$g = \det(g_{ab}) = \left|\frac{\partial X}{\partial t_1}\right|^2\left|\frac{\partial X}{\partial t_2}\right|^2 - \left(\frac{\partial X}{\partial t_1}\cdot\frac{\partial X}{\partial t_2}\right)^2 = \left|\frac{\partial X}{\partial t_1}\times\frac{\partial X}{\partial t_2}\right|^2 = |N|^2\,, \qquad\qquad (B.21)$$
and, hence, the measure for surfaces in R3 can be written as
dS = |N| dt1 dt2 . (B.22)
What if the surface is given not by a parametrisation but as the zero locus of a function f , so that
M = {(x, y, z) ∈ R3 | f (x, y, z) = 0}? In this case, we can solve (at least locally where ∂f /∂z ≠ 0, from the
implicit function theorem A.7) the equation f (x, y, z) = 0 for z and obtain z = h(x, y) for some function
h. This provides us with a parametrisation of the surface in terms of t1 = x and t2 = y, given by
$$X(x, y) = \begin{pmatrix} x \\ y \\ h(x, y) \end{pmatrix}. \qquad\qquad (B.23)$$
For this parametrisation the metric and Gram’s determinant read (denoting partial derivatives by sub-
scripts, for simplicity)
$$(g_{ab}) = \begin{pmatrix} 1 + h_x^2 & h_x h_y \\ h_x h_y & 1 + h_y^2 \end{pmatrix} \quad\Rightarrow\quad g = 1 + h_x^2 + h_y^2\,. \qquad\qquad (B.24)$$
Since f (x, y, h(x, y)) = 0 (as z = h(x, y) is a solution) it follows, by applying the chain rule, that
$$f_x + f_z h_x = 0 \;\Rightarrow\; h_x = -\frac{f_x}{f_z}\,, \qquad f_y + f_z h_y = 0 \;\Rightarrow\; h_y = -\frac{f_y}{f_z}\,. \qquad\qquad (B.25)$$
These equations allow us to re-write Gram’s determinant and the measure in terms of the function f :
$$g = \frac{|\nabla f|^2}{|f_z|^2} \quad\Rightarrow\quad dS = \frac{|\nabla f|}{|f_z|}\,dx\,dy = \frac{dx\,dy}{|n_3|}\,, \qquad n = \frac{\nabla f}{|\nabla f|}\,. \qquad\qquad (B.26)$$
This is a known result from year 1 but note that it applies to a rather special case - the general (and much
more symmetric) formula is the one given in Def. (B.3).
As a final (and more explicit) application, let us compute the measure on a two-sphere S 2 with the
standard parametrisation X(θ, ϕ) = (sin θ cos ϕ, sin θ sin ϕ, cos θ)T , where θ ∈ [0, π] and ϕ ∈ [0, 2π). With
the two tangent vectors
$$\frac{\partial X}{\partial\theta} = \begin{pmatrix} \cos\theta\cos\varphi \\ \cos\theta\sin\varphi \\ -\sin\theta \end{pmatrix}, \qquad \frac{\partial X}{\partial\varphi} = \begin{pmatrix} -\sin\theta\sin\varphi \\ \sin\theta\cos\varphi \\ 0 \end{pmatrix}, \qquad\qquad (B.27)$$
the metric and Gram’s determinant are
$$(g_{ab}) = \begin{pmatrix} 1 & 0 \\ 0 & \sin^2\theta \end{pmatrix} \quad\Rightarrow\quad g = \sin^2\theta\,. \qquad\qquad (B.28)$$
As a result we find the well-known measure
dS = sin θ dθ dϕ (B.29)
for the integration over S 2 .
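The metric, Gram's determinant and the resulting integral can be computed directly from the chart, as the following sympy sketch shows for the two-sphere (a cross-check of (B.28) and (B.29), not part of the original notes).

    import sympy as sp

    theta, phi = sp.symbols('theta phi')
    X = sp.Matrix([sp.sin(theta)*sp.cos(phi), sp.sin(theta)*sp.sin(phi), sp.cos(theta)])

    tangents = [sp.diff(X, theta), sp.diff(X, phi)]
    g = sp.Matrix(2, 2, lambda a, b: tangents[a].dot(tangents[b]))   # the metric, Eq. (B.12)
    print(sp.simplify(g.det()))                       # Gram's determinant, should simplify to sin(theta)**2

    sqrt_g = sp.sin(theta)                            # = sqrt(g) on [0, pi], so dS = sin(theta) dtheta dphi
    print(sp.integrate(sqrt_g, (theta, 0, sp.pi), (phi, 0, 2*sp.pi)))   # 4*pi, the area of S^2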
Exercise B.4. Consider an ellipsoid in R3 with half-axes a, b and c and parametrisation X(θ, ϕ) =
(a sin θ cos ϕ, b sin θ sin ϕ, c cos θ)T . Work out the corresponding measure dS.

B.4 Laplacian
Suppose we have a k-dimensional manifold M ⊂ Rn and a chart X : U → M and we denote the parameters
by t = (t1 , . . . , tk )T ∈ U , as usual. Suppose we choose another set of coordinates t̃ = (t̃1 , . . . , t̃k )T = T (t)
on T (U ) such that
$$g_{ab} = \frac{\partial\tilde t_c}{\partial t_a}\,\frac{\partial\tilde t_d}{\partial t_b}\,\delta_{cd}\,, \qquad\qquad (B.30)$$
that is, relative to the coordinates t̃a , the metric is δcd . In those coordinates, the Laplacian is of the
standard form
$$\Delta = \sum_{a=1}^{k}\frac{\partial^2}{\partial\tilde t_a^{\,2}}\,. \qquad\qquad (B.31)$$
We would like to define a notion of a Laplacian ∆X on M which means re-expressing the Laplacian in
terms of the original coordinates ta .

Definition B.4. Given the above set-up, the Laplacian ∆X , associated to the chart X : U → M , is
defined by
∆X (f ◦ T ) := ∆(f ) ◦ T , (B.32)
where f : Rn → R are twice continuously differentiable functions.

Note that this is quite a natural definition. The composition f ◦ T is a function on the parameter space U
and we define the action of the Laplacian ∆X on this function by the action of the ordinary (Cartesian)
Laplacian, ∆, on f followed by a composition with T , to make this a function on U . While the above
definition seems natural it is not particularly practical for computations. To this end, we have the following

Lemma B.1. The Laplacian ∆X defined in Def. (B.4) can be written as
$$\Delta_X = \frac{1}{\sqrt{g}}\,\frac{\partial}{\partial t_a}\left(\sqrt{g}\,g^{ab}\,\frac{\partial}{\partial t_b}\right), \qquad\qquad (B.33)$$

where g ab are the entries of the inverse of the metric (gab ).

Proof. For two twice continuously differentiable functions u, v : T (U ) → R and ũ = u ◦ T , ṽ = v ◦ T we


have from the transformation formula
$$\int_{T(U)}(\Delta u)\,v\;d^k\tilde t = \int_U\big((\Delta u)\circ T\big)(v\circ T)\,\sqrt{g}\;d^k t = \int_U(\Delta_X\tilde u)\,\tilde v\,\sqrt{g}\;d^k t\,. \qquad\qquad (B.34)$$

On the other hand, we can re-write the same integral as
$$\int_{T(U)}(\Delta u)\,v\;d^k\tilde t = -\int_{T(U)}(\nabla u)\cdot(\nabla v)\;d^k\tilde t = -\int_U\big((\nabla u)\cdot(\nabla v)\big)\circ T\,\sqrt{g}\;d^k t = -\int_U g^{ab}\,\frac{\partial\tilde u}{\partial t_a}\,\frac{\partial\tilde v}{\partial t_b}\,\sqrt{g}\;d^k t = \int_U\frac{\partial}{\partial t_a}\!\left(\sqrt{g}\,g^{ab}\,\frac{\partial\tilde u}{\partial t_b}\right)\tilde v\;d^k t\,. \qquad\qquad (B.35)$$

The right-hand sides of Eqs. (B.34) and (B.35) are equal for any ṽ and given we are dealing with continuous
functions this means
$$\sqrt{g}\,\Delta_X\tilde u = \frac{\partial}{\partial t_a}\!\left(\sqrt{g}\,g^{ab}\,\frac{\partial\tilde u}{\partial t_b}\right), \qquad\qquad (B.36)$$
which is the desired result.

C Differential forms
Differential forms are another classical subject of Mathematical Physics which cannot be covered in the
main part of this course. This appendix is a no-nonsense guide to differential forms and a chance to read up
on the subject without having to deal with excessive mathematical overhead. Many physical theories can
be elegantly formulated in terms of differential forms, including Classical Mechanics and Electrodynamics.
Differential forms also provide a unifying perspective on the subject of “vector calculus” which leads to
a deeper understanding of the many ad-hoc objects - such as divergence, curl etc. - which have been
introduced in this context. Our treatment here is basic in that we focus on differential forms on Rn and
sub-manifolds thereof, in line with our basic treatment of manifolds in Appendix B. (Ref. [14] contains a
more advanced treatment of differential forms.)

C.1 Differential one-forms


C.1.1 Definition of differential one-forms
We begin by defining differential one-forms on open sets U ⊂ Rn with coordinates x = (x1 , . . . , xn )T . If
we view U as a manifold (in the sense of Theorem B.1 and, for example, parametrised by itself using
the identity map) then attached to every point p ∈ U there is a tangent space Tp U . Since U is an n-
dimensional manifold we have Tp U ≅ Rn and we can use the standard unit vectors ei , where i = 1, . . . , n,
as a basis of Tp U . Intuitively, just think about a family of vector spaces, one each attached to every point
p ∈ U , as in Fig. 21. Basically, any construction in linear algebra can now be carried out for this family.


Figure 21: Open set U ⊂ Rn and tangent space Tp U at a point p ∈ U .

In particular, every tangent space Tp U has a dual vector space Tp∗ U (which consists of linear functionals
Tp U → R), called the co-tangent space. The elements of the co-tangent space are called co-tangent vectors.
So attached to every point p ∈ U we have two vector spaces, the tangent space Tp U and its dual, the
co-tangent space Tp∗ U . We are now ready to define differential one-forms.

Definition C.1. (Differential one-forms) A differential one-form is a map w : U → T ∗ U := ⋃p∈U Tp∗ U , with p ↦ wp , such that wp ∈ Tp∗ U .

Stated less formally, a differential one-form w provides us with a co-tangent vector wp at every point
p ∈ U . Recall that such a co-tangent vector is a functional on the tangent space, so it provides a map
wp : Tp U → R and for a tangent vector v ∈ Tp U we have wp (v) ∈ R.

C.1.2 The total differential
So far this sounds fairly abstract and appears to be of little use but some light is shed on the matter by
the following
Definition C.2. (Total differential) For a differentiable function f : U → R the total differential df is a
one-form defined by
$$df_p(v) := \nabla f|_p\cdot v = \sum_{i=1}^{n}\left.\frac{\partial f}{\partial x_i}\right|_p v^i\,, \qquad\qquad (C.1)$$
where v = Σni=1 v i ei ∈ Tp U .

Exercise C.1. Convince yourself that the total differential defined in Def. C.2 is indeed a differential
one-form.

C.2 Basis for differential one-forms


We can use the total differential to gain a better understanding of differential forms. To do this we
introduce coordinate functions xi : U → R defined by

xi (p) := pi , (C.2)

where p = (p1 , . . . , pn )T , which assign to each point p ∈ U its ith coordinate pi . (By a slight abuse of
notation we have used the same symbol for the coordinate xi and the coordinate function.) To understand
what the total differentials dxi of the coordinate functions are we act on the basis vectors ei of the tangent
space.
dxi |p (ej ) = ∇xi |p · ej = ei · ej = δji ⇒ dxi |p (v) = v i . (C.3)
This is precisely the defining relation for a dual basis 16 and we learn that (dxi |p ) is the basis of Tp∗ U dual
to the basis (ei ) of Tp U . Hence, we have the following
Proposition C.1. The total differentials (dx1 |p , . . . , dxn |p ) of the coordinate functions form a basis of
the co-tangent space Tp∗ U . Hence, every differential one-form w and every total differential df on U can
be written as
$$w = \sum_{i=1}^{n} w_i\,dx^i\,, \qquad df = \sum_{i=1}^{n}\frac{\partial f}{\partial x_i}\,dx^i\,, \qquad\qquad (C.4)$$
where wi : U → R are functions.
Proof. We have already shown the first part of this statement, that is, that (dx1 |p , . . . , dxn |p ) is indeed a basis of Tp∗ U . This means for every one-form w and every point p ∈ U we can write wp = Σni=1 wi (p) dxi |p , for some suitable coefficients wi (p). Dropping the argument p gives the first Eq. (C.4). To show the second Eq. (C.4) work out
$$df|_p(v) = \nabla f|_p\cdot v = \sum_i\left.\frac{\partial f}{\partial x_i}\right|_p v^i = \sum_i\left.\frac{\partial f}{\partial x_i}\right|_p\,dx^i|_p(v)\,, \qquad\qquad (C.5)$$
i

where Eq. (C.3) has been used in the last step. Dropping the arguments v and p leads to the second
Eq. (C.4).
16
It is sometimes said that dxi represents a “small interval attached to xi ”. Obviously, this statement is not really correct
given how we have just defined differential forms. Generously interpreted, the “small interval” view of dxi can be thought of
as an imprecise way of stating the content of the equation on the right in (C.3). This equation says that dxi |p , when acting
on a tangent v ∈ Tp M (and it is this tangent vector which should really be seen as the displacement from p), gives its ith
component v i .

This proposition gives us a clear idea what differential one-forms are and how to write them down in
practice. All we need is n functions wi : U → R and we can write down a differential form using Eq. (C.4).
In particular, we see that differential one-forms w on U are in one-to-one correspondence with vector fields
A : U → Rn via
$$w = \sum_{i=1}^{n} w_i\,dx^i \quad\longleftrightarrow\quad A = \begin{pmatrix} w_1 \\ \vdots \\ w_n \end{pmatrix}. \qquad\qquad (C.6)$$
This is our first hint that differential forms provide us with a new way to think about vector calculus.
Under the identification (C.6), total differentials df correspond to vector fields ∇f which are given as
gradients of functions. This means that total differentials correspond to conservative vector fields.
The representation (C.4) of differential forms can be used to define their properties in terms of prop-
erties of the constituent functions wi .
Definition C.3. A one-form w = Σni=1 wi dxi on U is called differentiable (continuously differentiable, k times continuously differentiable) if all functions wi : U → R are differentiable (continuously differentiable,
k times continuously differentiable).

C.2.1 Integrating differential one-forms


Differential one-forms are designed to be integrated along one-dimensional manifolds, that is, along curves.
To see how this works, consider a curve X : [a, b] → U with parameter t ∈ [a, b] and tangent vector
\frac{dX}{dt}\big|_{X(t)} ∈ T_{X(t)} U . At each point X(t) on the curve, a one-form w on U provides a linear functional
w|_{X(t)} : T_{X(t)} U → R which can act on the tangent to the curve to produce a real number, so

w|_{X(t)}\!\left( \frac{dX}{dt}\Big|_{X(t)} \right) ∈ \mathbb{R} .   (C.7)

Based on this observation we can define


Definition C.4. Let w be a differential one-form on U and X : [a, b] → U a curve. Then, we define the
integral of w over X as

\int_X w := \int_a^b w|_{X(t)}\!\left( \frac{dX}{dt}\Big|_{X(t)} \right) dt .   (C.8)
Note that this definition is extremely natural. At each point on the curve we let the differential one-form
act on the tangent vector to the curve to produce a number and then we integrate over these numbers
along the curve. If we write the one-form and the tangent vector relative to their standard bases as
w = \sum_{i=1}^n w_i\, dx^i ,  \qquad  \frac{dX}{dt} = \sum_{j=1}^n \frac{dX^j}{dt}\, e_j ,   (C.9)

we have

w|_{X(t)}\!\left( \frac{dX}{dt}\Big|_{X(t)} \right) = \sum_{i,j} w_i(X(t))\, \frac{dX^j}{dt}\Big|_{X(t)} dx^i|_{X(t)}(e_j) = \sum_i w_i(X(t))\, \frac{dX^i}{dt}\Big|_{X(t)} = A(X(t)) \cdot \frac{dX}{dt}\Big|_{X(t)}

where the duality property (C.3) has been used in the second last step and we have used the identifica-
tion (C.6) of one-forms and vector fields in the last step. Hence, the integral of w over X can also be
written as
\int_X w = \sum_{i=1}^n \int_a^b w_i(X(t))\, \frac{dX^i}{dt}\Big|_{X(t)} dt = \int_a^b A(X(t)) \cdot \frac{dX}{dt}\Big|_{X(t)} dt .   (C.10)

This relation shows that the integral of a one-form over a curve is nothing else but the “line-integral” of
the corresponding vector field.
For the integral of a total differential we have
\int_X df = \int_a^b \nabla f(X(t)) \cdot \frac{dX}{dt}\Big|_{X(t)} dt = \int_a^b \frac{d}{dt}\big( f(X(t)) \big)\, dt = f(X(b)) - f(X(a))   (C.11)

which says that the curve integral of a total differential df equals the difference of the values of f at the
endpoints of the curve. That is, the value of such integrals only depends on the endpoints but not on
the path taken. In particular, integrals of total differentials over closed curves (curves with X(a) = X(b))
vanish. These statements are of course equivalent to the corresponding statements for conservative vector
fields.
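As a rough numerical illustration (assuming Python with NumPy, with curves and one-forms chosen purely for this sketch), the following code implements the curve integral (C.10) with the trapezoidal rule. It shows that the integral of the total differential df for f(x, y) = xy is the same along two different curves joining (0, 0) and (1, 1), as Eq. (C.11) requires, while a one-form which is not a total differential gives path-dependent results.

    import numpy as np

    def line_integral(A, X, dXdt, a=0.0, b=1.0, steps=2000):
        # integral of the one-form with component vector A along the curve X(t),
        # i.e. int_a^b A(X(t)) . dX/dt dt, evaluated with the trapezoidal rule
        t = np.linspace(a, b, steps)
        vals = np.array([np.dot(A(X(s)), dXdt(s)) for s in t])
        return np.sum(0.5 * (vals[1:] + vals[:-1]) * np.diff(t))

    A_df = lambda p: np.array([p[1], p[0]])    # components of df for f(x, y) = x*y
    A_w  = lambda p: np.array([-p[1], p[0]])   # components of w = -y dx + x dy

    # two curves with the same endpoints (0,0) and (1,1)
    X1, dX1 = lambda t: np.array([t, t]),    lambda t: np.array([1.0, 1.0])
    X2, dX2 = lambda t: np.array([t, t**2]), lambda t: np.array([1.0, 2 * t])

    print(line_integral(A_df, X1, dX1), line_integral(A_df, X2, dX2))  # both ~ 1 = f(1,1) - f(0,0)
    print(line_integral(A_w, X1, dX1), line_integral(A_w, X2, dX2))    # ~ 0 and ~ 1/3: path-dependent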
So far this does not appear to be overly useful. All we seem to have done is to describe vector fields in
a different way. However, this was just a preparation for introducing higher order differential forms and
this is where things get interesting. Before we can do this we need a bit of mathematical preparation.

C.3 Alternating k-forms


C.3.1 Definition of alternating k-forms
We start with a vector space V over a field F and its dual space V ∗ . As we have seen in the previous
sub-section the dual vector space (to the tangent space) was the crucial ingredient in defining differential
one-forms. In order to be able to introduce higher order differential forms we first need to discuss a bit of
more advanced linear algebra which generalises linear functionals. These are the alternating k-forms on
V which are defined as follows.

Definition C.5. (Alternating k-forms) An alternating k-form w on a vector space V over F is a map
w : V ⊗k → F which is
(i) linear in each of its k arguments, so w(. . . , αu + βv, . . .) = α w(. . . , u, . . .) + β w(. . . , v, . . .)
(ii) alternating in the sense that w(. . . , u, . . . , v, . . .) = −w(. . . , v, . . . , u, . . .)
(The dots indicate arguments which remain unchanged.) The vector space of alternating k-forms over V
is denoted Λk V ∗ , where k = 1, 2, . . .. We also define Λ0 V ∗ = F .

In short, alternating k-forms take k vector arguments to produce a scalar and they are linear in each
argument and completely antisymmetric under the exchange of arguments. It is clear from the definition
that alternating one-forms are precisely the linear functionals, so Λ^1 V^* = V^*; in this sense we have set up a
generalisation of the dual vector space.

C.3.2 The wedge product


However, it is not quite obvious from the above definition how to write down these k-forms more concretely.
To get a handle on this we now devise a way to build up k-forms from one-forms, that is, from linear functionals, by
means of an operation called the wedge product.

Definition C.6. (Wedge or outer product) Consider functionals ϕ1 , . . . , ϕk ∈ V ∗ . Then, the k-form
ϕ1 ∧ . . . ∧ ϕk ∈ Λk V ∗ is defined by
 
(ϕ_1 ∧ \ldots ∧ ϕ_k)(v_1, \ldots, v_k) := \det \begin{pmatrix} ϕ_1(v_1) & \cdots & ϕ_1(v_k) \\ \vdots & & \vdots \\ ϕ_k(v_1) & \cdots & ϕ_k(v_k) \end{pmatrix} .   (C.12)

It is clear from the linearity of the functionals ϕi as well as the linearity and anti-symmetry of the
determinant that ϕ1 ∧ . . . ∧ ϕk as defined in Eq. (C.12) is indeed a k-form. It is also easy to see from the
properties of the determinant that calculating with the wedge product is subject to the following rules (α
and β are scalars):

ϕ1 ∧ . . . ∧ (αϕi + β ϕ̃i ) ∧ . . . ∧ ϕk = α ϕ1 ∧ . . . ∧ ϕi ∧ . . . ∧ ϕk + β ϕ1 ∧ . . . ∧ ϕ̃i ∧ . . . ∧ ϕk (C.13)


ϕ1 ∧ . . . ∧ ϕi ∧ ϕi+1 ∧ . . . ∧ ϕk = −ϕ1 ∧ . . . ∧ ϕi+1 ∧ ϕi ∧ . . . ∧ ϕk (C.14)
... ∧ ϕ ∧ ϕ ∧ ... = 0 (C.15)

The first two rules follow from linearity and anti-symmetry of the determinant, respectively, the third one
is a direct consequence of the second.
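As a computational aside (assuming Python with NumPy; the vectors below are arbitrary sample data), one-forms on V = R^n can be stored as component vectors and the wedge product evaluated literally via the determinant (C.12). The rules (C.14) and (C.15) then show up as properties of the determinant.

    import numpy as np

    def wedge_eval(phis, vs):
        # (phi_1 ^ ... ^ phi_k)(v_1, ..., v_k) as the determinant (C.12);
        # phis and vs are lists of k component vectors each
        M = np.array([[np.dot(phi, v) for v in vs] for phi in phis])
        return np.linalg.det(M)

    phi1, phi2 = np.array([1.0, 0.0, 2.0]), np.array([0.0, 1.0, -1.0])
    v, w = np.array([1.0, 2.0, 3.0]), np.array([0.0, 1.0, 1.0])

    # antisymmetry under exchange of vector arguments and of one-forms, cf. (C.14)
    print(wedge_eval([phi1, phi2], [v, w]), -wedge_eval([phi1, phi2], [w, v]))
    print(wedge_eval([phi1, phi2], [v, w]), -wedge_eval([phi2, phi1], [v, w]))
    # a repeated one-form gives zero, cf. (C.15)
    print(wedge_eval([phi1, phi1], [v, w]))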

C.3.3 Basis for alternating k-forms


Proposition C.2. Let (ε^{1*}, \ldots, ε^{n*}) be a basis of V^* . Then the k-forms ε^{i_1 *} ∧ \ldots ∧ ε^{i_k *} , where 1 ≤ i_1 < i_2 <
\cdots < i_k ≤ n, are a basis of Λ^k V^* . In particular, we have

\dim(Λ^k V^*) = \binom{n}{k} .   (C.16)

Proof. Let ε_1, \ldots, ε_n be the basis of V with dual basis ε^{1*}, \ldots, ε^{n*}, so that ε^{i*}(ε_j) = δ^i_j . To prove linear
independence consider

\sum_{i_1 < \cdots < i_k} λ_{i_1 \ldots i_k}\, ε^{i_1 *} ∧ \ldots ∧ ε^{i_k *} = 0   (C.17)

and act with this equation on (ε_{j_1}, \ldots, ε_{j_k}), where j_1 < \cdots < j_k . It follows immediately that λ_{j_1 \ldots j_k} = 0.
To show that these k-forms span the space, start with an arbitrary w ∈ Λ^k V^* . Define the numbers
c_{i_1 \ldots i_k} := w(ε_{i_1}, \ldots, ε_{i_k}) and the alternating k-form

\tilde{w} := \sum_{j_1 < \cdots < j_k} c_{j_1 \ldots j_k}\, ε^{j_1 *} ∧ \ldots ∧ ε^{j_k *} .   (C.18)

Then it follows that \tilde{w}(ε_{i_1}, \ldots, ε_{i_k}) = c_{i_1 \ldots i_k} = w(ε_{i_1}, \ldots, ε_{i_k}) and since \tilde{w} and w agree on a basis we have
w = \tilde{w} and have, hence, written w as the linear combination (C.18) of our basis forms.

The above proposition gives us a clear idea of how to write down alternating k-forms. Once we have a
basis ε^{i*} of the dual space V^* we can use the wedge product to construct a basis ε^{i_1 *} ∧ \ldots ∧ ε^{i_k *} , where
1 ≤ i_1 < \ldots < i_k ≤ n, of Λ^k V^* , the space of alternating k-forms. Then any k-form w can be written as

w = \sum_{i_1 < \cdots < i_k} w_{i_1 \ldots i_k}\, ε^{i_1 *} ∧ \ldots ∧ ε^{i_k *} = \frac{1}{k!} \sum_{i_1, \ldots, i_k} w_{i_1 \ldots i_k}\, ε^{i_1 *} ∧ \ldots ∧ ε^{i_k *} ,   (C.19)

where the coefficients w_{i_1 \ldots i_k} ∈ F are completely anti-symmetric in the k indices. Also note that there are
no non-trivial alternating k-forms for k > n = dim(V ). In this case the wedge product will always involve
at least two identical basis one-forms and must vanish by Eq. (C.15). Of course the wedge product generalises
to arbitrary alternating forms by linearity. Specifically, consider an alternating k-form w as in Eq. (C.19)
and an alternating l-form ν given by

ν = \frac{1}{l!} \sum_{j_1, \ldots, j_l} ν_{j_1 \ldots j_l}\, ε^{j_1 *} ∧ \ldots ∧ ε^{j_l *} .   (C.20)

Their wedge product is an alternating (k + l)-form defined by

w ∧ ν := \frac{1}{k!\, l!} \sum_{i_1, \ldots, i_k, j_1, \ldots, j_l} w_{i_1 \ldots i_k}\, ν_{j_1 \ldots j_l}\, ε^{i_1 *} ∧ \ldots ∧ ε^{i_k *} ∧ ε^{j_1 *} ∧ \ldots ∧ ε^{j_l *} .   (C.21)

It is easy to see that


w ∧ ν = (−1)kl ν ∧ w . (C.22)
Let us illustrate alternating k-forms with an example.

Application 3.43. The three-dimensional case


Consider a three-dimensional vector space V with basis (ε_i) and dual basis (ε^{i*}), where i = 1, 2, 3. In
this case, we have non-trivial k-forms for the values k = 0, 1, 2, 3 and the spaces Λ^k V^* are as follows.

  space      basis                                                                    typical element                dimension
  Λ^0 V^*    (1)                                                                      w ∈ R                          1
  Λ^1 V^*    (ε^{1*}, ε^{2*}, ε^{3*})                                                 \sum_{i=1}^3 w_i ε^{i*}        3
  Λ^2 V^*    (ν^1 := ε^{2*} ∧ ε^{3*}, ν^2 := ε^{3*} ∧ ε^{1*}, ν^3 := ε^{1*} ∧ ε^{2*})   \sum_{i=1}^3 w_i ν^i           3
  Λ^3 V^*    (ε^{1*} ∧ ε^{2*} ∧ ε^{3*})                                               w ε^{1*} ∧ ε^{2*} ∧ ε^{3*}     1
Suppose we have two alternating one-forms v = \sum_{i=1}^3 v_i ε^{i*} and w = \sum_{j=1}^3 w_j ε^{j*} and we write the
coefficients as vectors, so v = (v_1, v_2, v_3)^T and w = (w_1, w_2, w_3)^T . Then, taking the wedge product
we find

v ∧ w = \sum_{i,j=1}^3 v_i w_j\, ε^{i*} ∧ ε^{j*} = \sum_{i=1}^3 (v × w)_i\, ν^i ,   (C.23)
where ν i are the basis forms for Λ2 V ∗ defined in the above table. In conclusion, we see that the
wedge-product of two one-forms leads to a two-form which can be expressed in terms of the cross
product. Alternating k-forms in three dimensions are the proper context within which to formulate
the cross product and we can see why the cross product only appears in three dimensions. Only in
this case do the spaces Λ^1 V^* and Λ^2 V^* have the same dimension, so that both one-forms and two-forms
can be interpreted as three-dimensional vectors.
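The identification (C.23) of the wedge product with the cross product is easy to check numerically. The sketch below (assuming Python with NumPy; all vectors are arbitrary sample data) evaluates (v ∧ w)(a, b) via the determinant definition (C.12) and compares it with the two-form whose ν-components are v × w.

    import numpy as np

    def two_form_eval(c, a, b):
        # evaluate the two-form c_1 nu^1 + c_2 nu^2 + c_3 nu^3 on the vectors a, b;
        # e.g. nu^1 = eps^{2*} ^ eps^{3*} gives nu^1(a, b) = a_2 b_3 - a_3 b_2, and cyclic
        return (c[0] * (a[1] * b[2] - a[2] * b[1])
              + c[1] * (a[2] * b[0] - a[0] * b[2])
              + c[2] * (a[0] * b[1] - a[1] * b[0]))

    v = np.array([1.0, 2.0, 3.0])     # coefficient vectors of two one-forms
    w = np.array([-1.0, 0.5, 2.0])

    # (v ^ w)(a, b) from the determinant definition (C.12) ...
    wedge = lambda a, b: np.linalg.det(np.array([[np.dot(v, a), np.dot(v, b)],
                                                 [np.dot(w, a), np.dot(w, b)]]))

    # ... agrees with the two-form whose nu-components are the cross product v x w
    a, b = np.array([0.2, -1.0, 0.7]), np.array([1.5, 0.3, -0.4])
    print(wedge(a, b), two_form_eval(np.cross(v, w), a, b))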

To summarise our discussion, over each n-dimensional vector space V , we now have a “tower” of vector
spaces
Λ0 V ∗ = F , Λ1 V ∗ = V ∗ , Λ2 V ∗ , · · · , Λn V ∗ (C.24)
consisting of alternating k-forms for k = 0, 1, 2, . . . , n. In our definition of differential one-forms we have
used Λ1 V ∗ = V ∗ (where the tangent spaces Tp U have assumed the role of V ). Now we can go further and
arrange the whole tower of vector spaces (C.24) over each tangent space V = Tp U . As we will see, doing
this leads to higher-order differential forms.

C.4 Higher-order differential forms


C.4.1 Definition of differential forms
After this preparation it is now straightforward to define differential k-forms. We simply copy the defini-
tion C.1 of differential one-forms but replace the cotangent space Tp∗ U used there by Λk Tp∗ U .

Definition C.7. (Differential k-forms) A differential k-form is a map w : U → Λ^k T^* U := \bigcup_{p∈U} Λ^k T_p^* U ,
with p ↦ w_p , such that w_p ∈ Λ^k T_p^* U .
From this definition, a differential k-form w provides an alternating k-form wp over Tp U for every point
p ∈ U which can act on k tangent vectors v1 , . . . , vk ∈ Tp U to give a number wp (v1 , . . . , vk ) ∈ R.
It is now quite easy to write down the general expression for a differential k-form. From our discussion
of differential one-forms we know that the differentials dx^1|_p , \ldots , dx^n|_p form a basis of the cotangent space
T_p^* U at p ∈ U . Combining this with what we have said about alternating k-forms in the previous
subsection (see Prop. C.2) shows that the k-forms dx^{i_1}|_p ∧ \ldots ∧ dx^{i_k}|_p , where 1 ≤ i_1 < \ldots < i_k ≤ n, form a
basis of Λ^k T_p^* U . Hence, dropping the point p, a differential k-form can be written as

w = \frac{1}{k!} \sum_{i_1, \ldots, i_k} w_{i_1 \ldots i_k}\, dx^{i_1} ∧ \ldots ∧ dx^{i_k} ,   (C.25)

where the wi1 ...ik : U → R are functions, labelled by a completely anti-symmetric set of indices, i1 , . . . , ik .
We can define properties of a differential k-form w in terms of the functions wi1 ...ik : U → R, just as we
have done in the case of differential one-forms (see Def. C.3). For example, we say that w is differentiable
iff all the functions wi1 ...ik are. The space of infinitely many times differentiable k-forms on U is denoted
by Ωk (U ). Note that we can generalise the wedge product and define the k + l-differential form w ∧ ν,
where w and ν are differential k- and l-forms respectively, in complete analogy with what we have done
for alternating forms in Eq. (C.21).

C.4.2 The exterior derivative


We can now introduce a derivative d : Ω^k(U) → Ω^{k+1}(U), called the exterior derivative, which maps differential
k-forms into differential (k + 1)-forms. The exterior derivative of a k-form (C.25) is defined by

dw := \frac{1}{k!} \sum_{i_1, \ldots, i_k} dw_{i_1 \ldots i_k} ∧ dx^{i_1} ∧ \ldots ∧ dx^{i_k} = \frac{1}{k!} \sum_{i_1, \ldots, i_k, i} \frac{∂w_{i_1 \cdots i_k}}{∂x^i}\, dx^i ∧ dx^{i_1} ∧ \ldots ∧ dx^{i_k} ,   (C.26)

where we have worked out the total differentials dw_{i_1 \cdots i_k} of the functions w_{i_1 \cdots i_k} explicitly in the second
step. Recall that the total differential of a function f : U → R is a one-form df given by

df = \sum_i \frac{∂f}{∂x^i}\, dx^i .   (C.27)

The exterior derivative satisfies a Leibniz-type rule. If w is a k-form and ν is an l-form then
d(w ∧ ν) = dw ∧ ν + (−1)^k w ∧ dν .   (C.28)
Another simple but important property of the exterior derivative is
Proposition C.3. d2 = 0
Proof. This statement follows from straightforward calculation.
dw = \frac{1}{k!} \sum_{i, i_1, \ldots, i_k} \frac{∂w_{i_1 \ldots i_k}}{∂x^i}\, dx^i ∧ dx^{i_1} ∧ \ldots ∧ dx^{i_k}

d^2 w = d(dw) = \frac{1}{k!} \sum_{i, j, i_1, \ldots, i_k} \frac{∂^2 w_{i_1 \ldots i_k}}{∂x^i ∂x^j}\, dx^j ∧ dx^i ∧ dx^{i_1} ∧ \ldots ∧ dx^{i_k} = 0

The last equality follows because the second partial derivative is symmetric in (i, j) while dxj ∧ dxi is
anti-symmetric.
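The simplest non-trivial instance of this proposition can also be checked with a computer algebra system. The sketch below (assuming Python with SymPy; the component functions are generic) computes the components of dw for a one-form on R^3 via the curl-type combinations worked out in Application 3.44 below, and then applies the divergence-type formula for d on two-forms, so that d^2 w = 0 appears as the familiar statement that the divergence of a curl vanishes.

    import sympy as sp

    x, y, z = sp.symbols('x y z')
    w1, w2, w3 = [sp.Function(name)(x, y, z) for name in ('w1', 'w2', 'w3')]

    # components of dw for w = w1 dx + w2 dy + w3 dz, with respect to (dy^dz, dz^dx, dx^dy)
    dw = [sp.diff(w3, y) - sp.diff(w2, z),
          sp.diff(w1, z) - sp.diff(w3, x),
          sp.diff(w2, x) - sp.diff(w1, y)]

    # d(dw) is a three-form; its single component is the divergence-type combination of dw
    ddw = sp.diff(dw[0], x) + sp.diff(dw[1], y) + sp.diff(dw[2], z)
    print(sp.simplify(ddw))   # 0, by the symmetry of second partial derivatives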

In view of this result it is useful to introduce the following terminology.

Definition C.8. A differential k-form w is called closed iff dw = 0. It is called exact iff there exists a
differential (k − 1)-form ν such that w = dν.

An immediate consequence of Prop. C.3 is that every exact differential form is also closed. The converse is
not always true but it is under certain conditions and this is formulated in a statement known as Poincaré’s
Lemma (see, for example, Ref. [10]). In general, the failure of closed forms to be exact is an important
phenomenon which is related to the topology of the manifold and leads into an area of mathematics known
as algebraic topology. This is captured by the sequence
0 \longrightarrow Ω^0(U) \xrightarrow{d_0} Ω^1(U) \xrightarrow{d_1} \cdots \xrightarrow{d_{n-1}} Ω^n(U) \xrightarrow{d_n} 0   (C.29)

where we have attached an index to the exterior derivative d to indicate which degree form it acts on.
The property d2 = 0 from Prop. C.3 then translates into dk ◦ dk−1 = 0, that is, successive maps in the se-
quence (C.29) compose to zero. This makes the sequence (C.29) into what is called a complex: a sequence
of vector spaces related by maps such that adjacent maps compose to zero. The relation dk ◦ dk−1 = 0
implies that Im(dk−1 ) ⊂ Ker(dk ) (another, fancier way of saying every exact form is closed) and this
allows us to define the cohomology of the space U by H^k(U) = Ker(d_k)/Im(d_{k-1}). If the space U is
such that Poincaré’s Lemma applies, so that every closed form is exact, then Im(dk−1 ) = Ker(dk ) and the
cohomology groups are trivial, that is, H k (U ) = {0}. Conversely, non-trivial cohomology groups indicate
there are closed but non-exact forms, that Poincaré’s Lemma does not apply and that we have a non-trivial
manifold topology. Pursuing this further is well beyond our present scope (see, for example, Ref. [14]).

Let us take a practical approach and work out differential forms and the exterior derivative more explicitly
for an example.

Application 3.44. Differential forms on R3


Let us consider differential forms in three dimensions on an open set U ⊂ R3 . We denote coordinates
by x = (x1 , x2 , x3 )T = (x, y, z)T . The basis differential forms required to write down one-forms are
(dx, dy, dz) and, as discussed above, the basis forms for higher-order differential forms can be obtained
as wedge products of (dx, dy, dz). For concise notation we arrange all these basis forms as follows:

ds := (dx, dy, dz)T


dS := (dy ∧ dz, dz ∧ dx, dx ∧ dy)T
dV := dx ∧ dy ∧ dz . (C.30)

Recall that in three dimensions we only have differential k-forms for k = 0, 1, 2, 3 and no higher. With
the above notation their general form is:

degree    differential form    components    name in vector calculus


0 w0 = f f function
1 w1 = A · ds A = (A1 , A2 , A3 )T vector field
2 w2 = B · dS B = (B1 , B2 , B3 )T vector field
3 w3 = F dV F function

The above table gives the general expressions for differential forms in three dimensions and also shows
that they can be identified with well-known objects in three-dimensional vector calculus. Specifically,

zero-forms and three-forms correspond to (real-valued) functions while one-forms and two-forms cor-
respond to vector fields. (These identifications become more complicated in higher dimensions.)
Now that we have a correspondence between three-dimensional differential forms and objects in
vector calculus it is natural to ask about the meaning of the exterior derivative in this context. We
begin with the exterior derivative of a zero-form which is given by
dw_0 = df = \sum_{i=1}^3 \frac{∂f}{∂x^i}\, dx^i = (∇f) · ds .   (C.31)

Hence, the exterior derivative acting on three-dimensional zero forms corresponds to the gradient of
the associated function. What about one-forms?
dw_1 = \sum_{i=1}^3 dA_i ∧ dx^i = \sum_{i,j=1}^3 \frac{∂A_i}{∂x^j}\, dx^j ∧ dx^i
     = \left( \frac{∂A_3}{∂x^2} - \frac{∂A_2}{∂x^3} \right) dx^2 ∧ dx^3 + \left( \frac{∂A_1}{∂x^3} - \frac{∂A_3}{∂x^1} \right) dx^3 ∧ dx^1 + \left( \frac{∂A_2}{∂x^1} - \frac{∂A_1}{∂x^2} \right) dx^1 ∧ dx^2
     = (∇ × A) · dS .   (C.32)

Evidently, the exterior derivative of a three-dimensional one-form corresponds to the curl of the
associated vector field. Finally, for a differential two-form we have
dw_2 = \sum_{i=1}^3 dB_i ∧ dS^i = \sum_{i,j=1}^3 \frac{∂B_i}{∂x^j}\, dx^j ∧ dS^i = (∇ · B)\, dV ,   (C.33)

and, hence, its exterior derivative corresponds to the divergence of the associated vector field. There
are no non-trivial four-forms in three dimensions so for the exterior derivative of a three-form we
have, of course
dw3 = 0 . (C.34)
These results can be summarised in a single diagram: for three-dimensional differential forms the exterior
derivative acts as follows
0 \longrightarrow Ω^0(U) \xrightarrow{d} Ω^1(U) \xrightarrow{d} Ω^2(U) \xrightarrow{d} Ω^3(U) \xrightarrow{d} 0 ,   (C.35)

where we recall that Ωk (U ) denotes all (infinitely many times differentiable) k-forms on U . The same
diagram but expressed in the language of three-dimensional vector calculus (see also Eq. (A.17)) reads:
0 \longrightarrow C^∞(U) \xrightarrow{\mathrm{grad}\,=\,∇} \mathcal{V}(U) \xrightarrow{\mathrm{curl}\,=\,∇×} \mathcal{V}(U) \xrightarrow{\mathrm{div}\,=\,∇·} C^∞(U) \longrightarrow 0 .   (C.36)
The somewhat strange and ad-hoc differential operators of three-dimensional vector calculus are
perfectly natural when understood from the viewpoint of differential forms. They are simply all
versions of the exterior derivative. It is well-known in three-dimensional vector calculus that ∇ ×
(∇f ) = 0 and ∇ · (∇ × B) = 0, that is, carrying out adjacent maps in the diagram (C.36) one after the
other gives zero. From the point of view of differential forms, these equations are just manifestations
of the general property d^2 = 0 in Prop. C.3. From Def. C.8, an exact differential one-form w_1 is one

that can be written as w1 = df , where f is a function. For the corresponding vector field A the same
property is called “conservative”, meaning the vector field can be written as A = ∇f .
In essence, vector calculus is the calculus of differential forms written in different (and some might
say, awkward) notation.

Exercise C.2. Write the vector fields A = (x2 , y, z 3 )T and B = (y, −x, 0)T on R3 as differential one-
forms and use the exterior derivative to calculate their curl. Next, write A and B as differential two-forms
and use the exterior derivative to compute their divergence.
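One way to check the calculation asked for in Exercise C.2 is to let a computer algebra system evaluate the component formulas (C.32) and (C.33); the sketch below assumes Python with SymPy and is only meant as a cross-check of the hand calculation.

    import sympy as sp

    x, y, z = sp.symbols('x y z')

    def curl(A):
        # components of d(A . ds) with respect to dS, cf. Eq. (C.32)
        return sp.Matrix([sp.diff(A[2], y) - sp.diff(A[1], z),
                          sp.diff(A[0], z) - sp.diff(A[2], x),
                          sp.diff(A[1], x) - sp.diff(A[0], y)])

    def div(A):
        # component of d(A . dS) with respect to dV, cf. Eq. (C.33)
        return sp.diff(A[0], x) + sp.diff(A[1], y) + sp.diff(A[2], z)

    A = sp.Matrix([x**2, y, z**3])
    B = sp.Matrix([y, -x, 0])

    print(curl(A).T, div(A))
    print(curl(B).T, div(B))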
Exercise C.3. Repeat the above discussion of differential forms in three dimensions for the case of dif-
ferential forms in four dimensions.

Application 3.45. Some explicit differential forms


Consider the following differential one-forms in R2 and R3 :

ω = xdy + ydx , ν = xdy − ydx , µ = xdx + ydy + zdz . (C.37)

Are these one-forms closed? A simple calculation based on the definition, Eq. (C.26), of the exterior
derivative gives

dω = dx ∧ dy + dy ∧ dx = 0
dν = dx ∧ dy − dy ∧ dx = 2dx ∧ dy
dµ = dx ∧ dx + dy ∧ dy + dz ∧ dz = 0 .

It follows that ω and µ are closed, while ν is not closed. Poincaré’s Lemma applies on Rn so this
means that ω and µ must also be exact, that is, we must be able to find functions f and g such
that ω = df and µ = dg. It is easy to verify, using Eq. (C.27), that suitable functions are given by
f (x, y) = xy and g(x, y, z) = (x2 + y 2 + z 2 )/2.
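These statements are straightforward to confirm with a computer algebra system; the following sketch (assuming Python with SymPy) checks closedness via the component formulas for the exterior derivative and verifies the potentials f and g quoted above.

    import sympy as sp

    x, y, z = sp.symbols('x y z')

    # for a one-form P dx + Q dy on R^2 we have d(P dx + Q dy) = (dQ/dx - dP/dy) dx^dy
    d_2d = lambda P, Q: sp.simplify(sp.diff(Q, x) - sp.diff(P, y))
    print(d_2d(y, x), d_2d(-y, x))     # 0 and 2: omega is closed, nu is not

    # mu = x dx + y dy + z dz on R^3: all curl-type components of dmu vanish
    dmu = [sp.diff(z, y) - sp.diff(y, z), sp.diff(x, z) - sp.diff(z, x), sp.diff(y, x) - sp.diff(x, y)]
    print(dmu)                         # [0, 0, 0]: mu is closed

    # exactness: omega = df with f = x*y and mu = dg with g = (x**2 + y**2 + z**2)/2
    f, g = x * y, (x**2 + y**2 + z**2) / 2
    print(sp.diff(f, x), sp.diff(f, y))                   # y, x: reproduces omega
    print(sp.diff(g, x), sp.diff(g, y), sp.diff(g, z))    # x, y, z: reproduces mu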

C.4.3 Hodge star and Laplacian


Coming back to the case of arbitrary dimensions, n, there is another important operation, called the Hodge
star, which maps k-forms to (n − k)-forms. For a k-form w as in Eq. (C.25) it is defined by

⋆w = \frac{1}{k!\,(n-k)!}\, ε_{i_1 \cdots i_k i_{k+1} \cdots i_n}\, w_{i_1 \cdots i_k}\, dx^{i_{k+1}} ∧ \cdots ∧ dx^{i_n} ,   (C.38)

where ε_{i_1 \cdots i_n} is the Levi-Civita tensor in n dimensions 17 . The Hodge star can be used to introduce another
derivative d^† : Ω^k(U) → Ω^{k-1}(U) (mapping k-forms to (k − 1)-forms) by d^† := ⋆d⋆. A related differential
operator is the Laplacian ∆ : Ωk (U ) → Ωk (U ) for differential forms which is defined by
∆ = d† d + dd† . (C.39)
Exercise C.4. Verify that the Laplacian (C.39) when acting on zero-forms, that is functions, does indeed
coincide with the usual Laplacian.

17
If we have a non-trivial metric g_{ij} we have to be a bit more careful. In this case we can define the Levi-Civita tensor as
ε^{i_1 \cdots i_n} = \frac{1}{\sqrt{\det(g)}}\, \hat{ε}^{i_1 \cdots i_n} where \hat{ε}^{i_1 \cdots i_n} is the “pure number” epsilon. Then Eq. (C.38) remains valid with the understanding
that indices on the ε tensor are lowered with g_{ij} .
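For the flat metric on R^3 the Hodge star is easy to implement on component arrays; the sketch below (assuming Python with NumPy; the helper names are ad hoc) follows Eq. (C.38), with k-forms stored as totally antisymmetric arrays in the convention of Eq. (C.25), and confirms relations such as ⋆dx = dy ∧ dz and ⋆dV = 1 which reappear in Eq. (C.41) below.

    import numpy as np
    from itertools import permutations

    # the Levi-Civita tensor in three dimensions
    eps = np.zeros((3, 3, 3))
    for p in permutations(range(3)):
        parity = np.prod([p[j] - p[i] for i in range(3) for j in range(i + 1, 3)])
        eps[p] = np.sign(parity)

    # Hodge star on k-forms, k = 0,...,3, following Eq. (C.38) with prefactor 1/(k!(n-k)!)
    star0 = lambda f: f * eps                               # 0-form -> components of a 3-form
    star1 = lambda w: np.einsum('i,ijk->jk', w, eps)        # 1-form -> components of a 2-form
    star2 = lambda w: 0.5 * np.einsum('ij,ijk->k', w, eps)  # 2-form -> components of a 1-form
    star3 = lambda w: np.einsum('ijk,ijk->', w, eps) / 6.0  # 3-form -> 0-form

    dx = np.array([1.0, 0.0, 0.0])            # the one-form dx
    print(star1(dx)[1, 2], star1(dx)[2, 1])   # 1.0, -1.0: *dx = dy ^ dz
    dV = eps.copy()                           # components of dx ^ dy ^ dz
    print(star3(dV), star0(1.0)[0, 1, 2])     # 1.0 and 1.0: *dV = 1 and *1 = dV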

Application 3.46. Maxwell’s equations with differential forms
Maxwell’s equations contain three vector fields, the electric field E, the magnetic field B and the cur-
rent density J, plus a scalar, the charge density ρ. We know from our discussion of three-dimensional
differential forms above that three-dimensional vector fields can be represented by either one-forms
or two-forms, so we have a choice. It turns out it is convenient (that is, it makes the equations look
simpler) if we write E and J as one-forms and B as a two-form, that is,

E := E · ds , B := B · dS , J := J · ds . (C.40)

It is also useful to work out the Hodge star of these fields. It follows from its definition (C.38) that

⋆1 = dV ,  ⋆ds = dS ,  ⋆dS = ds ,  ⋆dV = 1 ,   (C.41)

and this leads immediately to

⋆E = E · dS ,  ⋆B = B · ds .   (C.42)

Using the relations between curl/divergence and the exterior derivative in Eqs. (C.32) and (C.33) we
have

dE = (∇ × E) · dS , dB = (∇ · B)dV , d† E = ∇ · E , d† B = (∇ × B) · ds . (C.43)

Now we are ready to convert Maxwell’s equations. In vector calculus notation they are given by

∇ · E = 4πρ
∇ × B − \frac{1}{c} Ė = \frac{4π}{c} J
∇ × E + \frac{1}{c} Ḃ = 0                                   (C.44)
∇ · B = 0 ,

where the dot stands for the time derivative ∂t and c is the speed of light. Multiplying these equations
with 1, ds, dS and dV , respectively, and using Eqs. (C.43) they are easily converted to

d^† E = 4πρ
d^† B − \frac{1}{c} Ė = \frac{4π}{c} J
dE + \frac{1}{c} Ḃ = 0                                      (C.45)
dB = 0 .

This is by no means the most elegant form of Maxwell’s equations. By using differential forms in
three dimensions we are treating the three spatial dimensions and time on different footing.
It is much more natural to think of differential forms in four dimensions with coordinates (x^µ) =
(x^0 = t, x^i) and basis differentials (dx^µ) = (dx^0 = dt, dx^i). Then, the fields E and B can be combined
into a four-dimensional two-form F (called the field-strength tensor), and the charge density ρ and
the current J into a four-dimensional one-form j (called the four-current). More explicitly, these
quantities are defined by (for simplicity, setting c = 1 from now on)
F = E ∧ dt + B =: \frac{1}{2} F_{µν}\, dx^µ ∧ dx^ν ,  \qquad  j = ρ\, dt + J =: j_µ\, dx^µ ,   (C.46)

where the second equalities define the components Fµν and jµ . Maxwell’s equations (C.45) in three-
dimensional form can be converted into the language of four-dimensional differential forms and be
written in terms of F and j.
To do this we have to be careful to distinguish between operations on three-dimensional and
four-dimensional differential forms. The exterior derivative in four dimensions is denoted by d =
dx^µ ∂_µ ∧ while we now denote its three-dimensional counterpart by d_3 = dx^i ∂_i ∧. Likewise, the four-
dimensional Hodge star ⋆ is defined by Eq. (C.38) (using as metric the Lorentz metric η_{µν}) with the
four-dimensional Levi-Civita tensor while its three-dimensional counterpart, now denoted by ⋆_3 , is
defined by Eq. (C.38) with the three-dimensional Levi-Civita tensor. For a three-dimensional k-form
w we then have ⋆(dt ∧ w) = ⋆_3 w and ⋆w = (−1)^{k+1} dt ∧ ⋆_3 w. Using these rules, together with the
product rule (C.28) for differential forms we find

F = E ∧ dt + B                      ⋆F = −⋆_3 E + dt ∧ ⋆_3 B
dF = (d_3 E + Ḃ) ∧ dt + d_3 B       d^† F = (d_3^† E)\, dt + d_3^† B − Ė          (C.47)

Comparing the last two of these equations with the three-dimensional version of Maxwell equa-
tions (C.45) (and remembering that d in those equations is d3 in our new notation and we have
set c = 1) we find that, equivalently, they can be written as

d† F = 4πj , dF = 0 . (C.48)

Eqs. (C.48) are referred to as the covariant form of electro-magnetism and covariant here refers to
Lorentz transformations. If we transform covariant and contravariant tensors as we normally do in
Special Relativity (for example dx^µ ↦ Λ^µ{}_ν dx^ν or F_{µν} ↦ Λ_µ{}^ρ Λ_ν{}^σ F_{ρσ}) then expressions with all indices
contracted (between upper and lower indices) are Lorentz invariant. This means that F = \frac{1}{2} F_{µν} dx^µ ∧
dx^ν is Lorentz-invariant, as is j = j_µ dx^µ and d = dx^µ ∂_µ ∧. It follows that Maxwell’s equations in the
form (C.48) are expressed entirely in terms of Lorentz-invariant quantities and are, therefore, Lorentz-
invariant themselves. In other words, Maxwell’s theory is Lorentz-invariant and, hence, compatible
with Special Relativity. This was not obvious in the three-dimensional formulation (C.45) but it is
manifest in the four-dimensional one (C.48).
Covariant electro-magnetism in the form (C.48) is already quite elegant but it can be simplified
further. Note that the second Eq. (C.48) states that F is closed. This means if the conditions of
Poincaré’s Lemma are satisfied (and, in particular, locally) we can write F = dA for a one-form
A = Aµ dxµ . This one-form is called the vector potential or gauge field. From Eq. (C.46), the field
strength F contains the electric and the magnetic fields and is, hence, the physical field. On the other
hand, the vector potential A contains unphysical degrees of freedom, that is, different potentials A
can lead to the same F. This can be seen by changing A by a gauge transformation which is defined
as
A 7→ A0 = A + dλ , (C.49)
where λ is a function. Under such a gauge transformation the field strength is unchanged since

F 7→ F 0 = dA0 = dA + d2 λ = dA = F .

Note that this is a direct consequence of the property d2 = 0 of the exterior derivative, see Prop. C.3.
In conclusion, since A and A0 , related by a gauge transformation (C.49), lead to the same field strength
tensor F they describe the same physics.
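This can also be verified at the level of components (assuming Python with SymPy; the gauge field A_µ and the gauge function λ are generic symbolic functions). Using F_{µν} = ∂_µ A_ν − ∂_ν A_µ, which are the components of F = dA, the shift (C.49) leaves every component of F unchanged:

    import sympy as sp

    t, x, y, z = sp.symbols('t x y z')
    coords = (t, x, y, z)

    A = [sp.Function(f'A{mu}')(*coords) for mu in range(4)]    # a generic gauge field A_mu
    lam = sp.Function('lam')(*coords)                          # a generic gauge function lambda

    # components F_{mu nu} = d_mu A_nu - d_nu A_mu of the field strength F = dA
    F = lambda A: [[sp.diff(A[n], coords[m]) - sp.diff(A[m], coords[n])
                    for n in range(4)] for m in range(4)]

    A_prime = [A[mu] + sp.diff(lam, coords[mu]) for mu in range(4)]   # gauge-transformed field

    deltas = [sp.simplify(F(A_prime)[m][n] - F(A)[m][n]) for m in range(4) for n in range(4)]
    print(all(d == 0 for d in deltas))   # True: F is unchanged, a manifestation of d^2 = 0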

The gauge transformation (C.49) can be used to choose a vector potential which satisfies an
additional condition (without affecting any of the physics as encoded in the gauge-invariant F). One
such condition, referred to as Lorentz gauge, is

d† A = 0 . (C.50)

Why can this be done? Suppose that A does not satisfy the Lorentz gauge condition (C.50). Then
we perform a gauge transformation (C.49) and demand that the new (physically equivalent) gauge
field A0 satisfies the Lorentz condition, that is, d† A0 = 0. This can be accomplished if the gauge
transformation is carried out with a function λ which satisfies d† dλ = ∆λ = −d† A, that is, with a λ
which satisfies a certain Laplace equation a .
What form do Maxwell’s equations (C.48) take when expressed in terms of the gauge field A?
Since F = dA and d2 = 0, the second, homogeneous Maxwell equation in (C.48) is automatically
satisfied. The first, inhomogeneous Maxwell equation (C.48) becomes

d^† dA = 4πj   \xrightarrow{\; d^† A = 0 \;}   ∆A = 4πj .   (C.51)

That is, expressed in terms of the gauge field A and choosing the Lorentz gauge (C.50) electro-
magnetism is described by the single equation ∆A = 4πj. It does not get any simpler.
a
Note that for zero forms λ we have d† λ = 0 and, hence, from Eq. (C.39), d† dλ = ∆λ.

C.5 Integration of differential forms


We have already seen that differential one-forms are designed to be integrated over one-dimensional sub-
manifolds, that is, over curves. This suggests differential k-forms can be integrated over k-dimensional
sub-manifolds and we would now like to discuss how this works.

C.5.1 Definition of integral and general Stokes’s theorem


Start with an open set U ⊂ Rn , a differential k-form w on U and a k-dimensional manifold M ⊂ U , given by
a parametrisation X : V → U , with parameters t = (t1 , . . . , tk )T ∈ V . At every point X(t) on the manifold
M we have an n-dimensional tangent space TX(t) U with a k-dimensional subspace TX(t) M , the tangent
space to the manifold M at X(t), the latter being spanned by the k tangent vectors \frac{∂X}{∂t_i}(t) ∈ T_{X(t)} U . At
each such point X(t) the differential k-form w provides an alternating k-form wX(t) which takes k vector
arguments and can, hence, act on the k tangent vectors to the manifold at that point. This observation
is used to define the integral of the differential k-form w over the k-dimensional manifold M as 18

\int_M w := \int_V w_{X(t)}\!\left( \frac{∂X}{∂t_1}(t), \ldots, \frac{∂X}{∂t_k}(t) \right) d^k t .   (C.52)

The integral over differential k-forms relates to the exterior derivative in an interesting way.

Theorem C.5. (General Stokes’s theorem) Let w be a continuously differentiable k-form on U ⊂ Rn


and M ⊂ U a (k + 1)-dimensional (oriented) manifold with a (smooth) boundary ∂M (with the induced
18
We are glossing over a number of subtleties here. First, the manifold might consist of several charts and the definition of
the integral then involves a sum over these, patched together by a “partition of unity”. We have already ignored this earlier,
in our definition of the integral over sub-manifolds. Secondly, manifolds may or may not have an “orientation”. We want to
talk about the integral only when an orientation exists.

orientation). Then

\int_{∂M} w = \int_M dw .   (C.53)

Proof. A proof can be found in analysis textbooks, for example, Ref. [10].

Stokes’s theorem as above is very general and powerful. In particular, it contains the integral theorems
you have heard about in year one as special cases.

C.5.2 Stokes’s theorem in three dimensions


First consider a surface S ⊂ U ⊂ R3 , bounded by the curve ∂S and a differential one-form w = A · ds on
U . From Eq. (C.32) we know that its exterior derivative can be written as

dw = (∇ × A) · dS . (C.54)

Inserting all this into Stokes’s theorem (C.53) leads to


\int_{∂S} A · ds = \int_S (∇ × A) · dS .   (C.55)

This is, of course, the integral theorem in three dimensions, also known as Stokes’s theorem (in the narrow
sense).

C.5.3 Gauss’s theorem in three dimensions


Next, let V ⊂ U ⊂ R3 be a three-dimensional manifold with bounding surface ∂V and w = B · dS a
differential two-form on U . From Eq. (C.33) we know its exterior derivative can be written as

dw = (∇ · B) dV . (C.56)

Inserting into Stokes’s theorem (C.53) gives


\int_{∂V} B · dS = \int_V (∇ · B)\, dV ,   (C.57)

and this is, of course, known as Gauss’s theorem.
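As a sanity check, Gauss's theorem can be tested numerically for a concrete field and region (assuming Python with NumPy and SciPy; the vector field B and the unit ball are illustrative choices). Both sides of Eq. (C.57) should come out as approximately 32π/15 ≈ 6.702 here.

    import numpy as np
    from scipy.integrate import dblquad, tplquad

    # flux of B = (x, y**2, z**3) through the unit sphere; the outward normal is
    # n = (x, y, z) and the surface element is sin(theta) dtheta dphi
    def flux_integrand(theta, phi):
        x = np.sin(theta) * np.cos(phi)
        y = np.sin(theta) * np.sin(phi)
        z = np.cos(theta)
        B = np.array([x, y**2, z**3])
        return np.dot(B, np.array([x, y, z])) * np.sin(theta)

    flux, _ = dblquad(flux_integrand, 0, 2 * np.pi, 0, np.pi)

    # volume integral of div B = 1 + 2y + 3z**2 over the unit ball, in spherical coordinates
    def div_integrand(r, theta, phi):
        y = r * np.sin(theta) * np.sin(phi)
        z = r * np.cos(theta)
        return (1 + 2 * y + 3 * z**2) * r**2 * np.sin(theta)

    volume_integral, _ = tplquad(div_integrand, 0, 2 * np.pi, 0, np.pi, 0, 1)

    print(flux, volume_integral, 32 * np.pi / 15)   # all three approximately equal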


The general form of Stokes’s theorem is not limited to three dimensions. Let us consider a two-dimensional
case.

C.5.4 Stokes’s theorem in two dimensions


We want to consider differential forms on U ⊂ R2 . We use coordinates x = (x, y)T and basis differential
forms (dx, dy). In analogy with the three-dimensional case we introduce

ds = (dx, dy)T , da = dx ∧ dy . (C.58)

Hence, we can write a differential one-form w on U as


 
w = A · ds ,  \qquad  A = \begin{pmatrix} A_1 \\ A_2 \end{pmatrix} ,   (C.59)

where we can think of A as a vector field in two dimensions. A quick calculation of the exterior derivative
dw gives

dw = \left( \frac{∂A_2}{∂x} - \frac{∂A_1}{∂y} \right) da .   (C.60)
For a two-dimensional manifold V ⊂ U ⊂ R^2 with bounding curve ∂V, Stokes’s theorem (C.53) then
becomes

\int_{∂V} A · ds = \int_V \left( \frac{∂A_2}{∂x} - \frac{∂A_1}{∂y} \right) da .   (C.61)
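Eq. (C.61) can likewise be checked numerically (assuming Python with NumPy; the field A = (−y, x)^T and the unit disc are illustrative choices). Both sides come out as approximately 2π:

    import numpy as np

    # boundary integral over the unit circle, parametrised by t in [0, 2*pi]
    t = np.linspace(0, 2 * np.pi, 4001)
    x, y = np.cos(t), np.sin(t)
    dxdt, dydt = -np.sin(t), np.cos(t)
    vals = (-y) * dxdt + x * dydt                # A . dX/dt for A = (-y, x)
    lhs = np.sum(0.5 * (vals[1:] + vals[:-1]) * np.diff(t))

    # area integral of dA2/dx - dA1/dy = 2 over the unit disc, on a simple grid
    xx, yy = np.meshgrid(np.linspace(-1, 1, 801), np.linspace(-1, 1, 801))
    inside = xx**2 + yy**2 <= 1.0
    cell_area = (2.0 / 800)**2
    rhs = np.sum(2.0 * inside) * cell_area

    print(lhs, rhs, 2 * np.pi)                   # all approximately 6.2832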

C.5.5 Gauss’s theorem in n dimensions


Let us consider an open set U ⊂ Rn , coordinates x = (x1 , . . . , xn ) and basis differentials (dx1 , . . . , dxn ).
We define

dS = (dS^1, \ldots, dS^n) ,  \qquad  dS^i = (−1)^{i+1}\, dx^1 ∧ \ldots ∧ \widehat{dx^i} ∧ \ldots ∧ dx^n ,
dV = dx^1 ∧ \ldots ∧ dx^n ,

where the hat over dxi indicates that this differential should be omitted from the wedge product. Hence,
the dS^i are wedge products of all the dx^j , except for j = i. A differential (n − 1)-form w on U can be
written as

w = B · dS  \;\Rightarrow\;  dw = (∇ · B)\, dV ,   (C.62)

where B = (B_1, \ldots, B_n)^T is an n-dimensional vector field and ∇ · B = \sum_{i=1}^n \frac{∂B_i}{∂x^i} is the n-dimensional
version of the divergence. With an n-dimensional manifold V ⊂ U ⊂ R^n , bounded by an (n − 1)-
dimensional hyper-surface ∂V, Stokes’s theorem (C.53) becomes


\int_{∂V} B · dS = \int_V (∇ · B)\, dV .   (C.63)

Exercise C.6. Derive a version of Stokes’s theorem in four dimensions, which relates integrals of a two-
form over surfaces S ⊂ R4 to integrals of a one-form over the boundary curve ∂S.

D Literature
The references below do not provide a comprehensive list - there is a large number of mathematics and
physics books relevant to the subject of mathematical methods in physics and I suggest an old-fashioned
trip to the library. The books below have been useful in preparing these lectures.

[1] K. F. Riley, M. P. Hobson and S. J. Bence, “Mathematical Methods for Physics and
Engineering”, CUP 2002.
This is the recommended book for the mathematics component of the physics course. As the title
suggests this is a “hands-on” book, strong on explaining methods and concrete applications, rather
weaker on presenting a coherent mathematical exposition.
[2] Albert Messiah, “Quantum Mechanics”, Courier Corporation, 2014.
A comprehensive physics book on quantum mechanics, covering the more formal aspects of the subject
as well as applications, with a number of useful appendices, including on special functions and on
group theory.
[3] John David Jackson, “Classical Electrodynamics”, Wiley, 2012.
For me the ultimate book on electrodynamics. In addition to a comprehensive coverage of the subject
and many physical applications it also explains many of the required mathematical methods informally
but efficiently.
[4] F. Constantinescu, “Distributions and Their Applications in Physics”, Elsevier, 2017.
Exactly what the title says. Contains a lot more on distributions than we were able to cover in these
lectures, including a much more comprehensive discussion of the various types of test function spaces
and distributions and the topic of Green functions for many linear operators relevant to physics.
[5] Bryan P. Rynne and Martin A. Youngson, “Linear Functional Analysis”, Springer 2008.
A nice comprehensive treatment of functional analysis which gets to the point quickly but which is
not particularly explicit about applications. A good source to learn some of the basics.
[6] Ole Christensen, “Functions, Spaces and Expansions”, Springer 2010.
A book on functional analysis from a more applied point of view, starting with some of the basic
mathematics and then focusing on systems of ortho-normal functions and their applications. A good
book to understand some of the basic mathematics and the practicalities of dealing with ortho-normal
systems - less formal than Rynne/Youngson.
[7] Francesco Giacomo Tricomi, “Serie ortogonali di funzioni”.
The hardcore book on ortho-normal systems of functions from the Italian master. Sadly, I haven’t been
able to find an English version. The book contains a very nice treatment of orthogonal polynomials,
among many other things, which is a lot more interesting than the stale account found in so many
other books. Chapter 4 of these lectures follows the logic of this book.
[8] Serge Lang, “Real and Functional Analysis”, Springer 1993.
A high quality mathematics book covering analysis and functional analysis at an advanced level.
[9] Michela Petrini, Gianfranco Pradisi, Alberto Zaffaroni, “A Guide to Mathematical
Methods for Physicists: With Problems and Solutions”, World Scientific, 2017.
A nice book covering many of the main pieces of mathematics crucial to physics, including complex
functions, integration theory and functional analysis, taking the mathematics seriously but without
being overly formal.

[10] Serge Lang, “Undergraduate Analysis”, Springer 1997.
A very nice book on the subject, not overly formal and very useful to fill in inevitable mathematical
gaps.

[11] Serge Lang, “Complex Analysis”, Springer 1999.


A great book on complex analysis - suitable for self-study and to complement the short option on
complex functions.

[12] Abramowitz and Stegun, “Handbook of Mathematical Functions”.


The hardcore reference for information on special functions.

[13] William Fulton and Joe Harris, “Representation Theory: A First Course”, Springer
2013.
An excellent book, covering both finite groups and Lie groups, but somewhat advanced.

[14] Mikio Nakahara, “Geometry, Topology and Physics”, Taylor and Francis, 2013.
An excellent textbook, written for physicists, but at a more advanced level, discussing topology,
differential geometry and how it relates to physics. This is not really relevant for the main part of
this course but it gives a more advanced account of the material explained in Appendices B and C. If
you are interested in this aspect of mathematical physics (and you would perhaps like to learn about
the mathematics underlying General Relativity) this is the book for you.
