3 Fourier analysis
3.1 Fourier series
3.1.1 Cosine Fourier series
3.1.2 Sine Fourier series
3.1.3 Real standard Fourier series
3.1.4 Complex standard Fourier series
3.1.5 Pointwise convergence
3.1.6 Examples of Fourier series
3.2 Fourier transform
3.2.1 Basic definition and properties
3.2.2 Convolution
3.2.3 Examples of Fourier transforms
3.2.4 The inverse of the Fourier transform
3.2.5 Fourier transform in L^2
4 Orthogonal polynomials
4.1 General theory of ortho-normal polynomials
4.1.1 Basic set-up
4.1.2 Recursion relation
4.1.3 General Rodriguez formula
4.1.4 Classification of orthogonal polynomials
4.1.5 Differential equation for orthogonal polynomials
4.1.6 Expanding in orthogonal polynomials
4.2 The Legendre polynomials
4.2.1 Associated Legendre polynomials
4.3 The Laguerre polynomials
4.4 The Hermite polynomials
6 Laplace equation
6.1 Laplacian in different coordinate systems
6.1.1 Two-dimensional Laplacian
6.1.2 Three-dimensional Laplacian
6.1.3 Laplacian on the sphere
6.1.4 Green Identities
6.2 Basic theory∗
6.2.1 Green functions for the Laplacian
6.2.2 Maximum principle and uniqueness
6.2.3 Uniqueness - another approach
6.3 Laplace equation in two dimensions
6.3.1 Complex methods
6.3.2 Separation of variables
6.3.3 Polar coordinates
6.4 Laplace equation on the two-sphere
6.4.1 Functions on S^2
6.4.2 Eigenvalue problem for the Laplacian on S^2
6.4.3 Multipole expansion
6.5 Laplace equation in three dimensions
6.5.1 Method of image charges
6.5.2 Cartesian coordinates
6.5.3 Cylindrical coordinates
6.5.4 Spherical coordinates
7 Distributions
7.1 Basic definitions
7.1.1 Examples of distributions
7.1.2 Convergence of distributions
7.1.3 Derivatives of distributions
7.2 Convolution of distributions∗
7.3 Fundamental solutions - Green functions∗
7.4 Fourier transform for distributions∗
Appendices
B Manifolds in R^n
B.1 Definition of manifolds
B.2 Tangent space
B.3 Integration over sub-manifolds of R^n
B.3.1 Metric and Gram’s determinant
B.3.2 Definition of integration over sub-manifolds
B.3.3 A few special cases
B.4 Laplacian
D Literature
Foreword: Lecturing a Mathematical Methods course to physicists can be a tricky affair and following
such a course as a second year student may be even trickier. The traditional material for this course consists
of the classical differential equations and associated special function solutions of Mathematical Physics. In
a modern context, both in Mathematics and in Physics, these subjects are increasingly approached in the
appropriate algebraic setting of Banach and Hilbert Spaces. The correct setting for Quantum Mechanics
is provided by Hilbert Spaces and for this reason alone they are a mandatory subject and should at
least receive a rudimentary treatment in a course on Mathematical Methods for Physicists. However, the
associated mathematical discipline of Functional Analysis merits a lecture course in its own right and
cannot possibly be treated comprehensively in a course which also needs to cover a range of applications.
What is more, physics students may not yet have come across some of the requisite mathematics, such
as the notion of convergence and the definition of integrals. All of this places an additional overhead on
introducing mathematical key ideas, such as the idea of a Hilbert Space.
As a result of these various difficulties and requirements Mathematical Methods courses often end up as
collections of various bits of Mathematical Physics, seemingly unconnected and without any guiding ideas,
other than the apparent usefulness for solving some problems in Physics. Sometimes, ideas developed in
the context of finite-dimensional vector spaces are used as guiding principles but this ignores the crucial
differences between finite and infinite-dimensional vector spaces, to do with issues of convergence.
These lecture notes reflect the attempt to provide a modern Mathematical Physics course which
presents the underlying mathematical ideas as well as their applications and provides students with an
intellectual framework, rather than just a “how-to-do” toolkit. We begin by introducing the relevant math-
ematical ideas, including Banach and Hilbert Spaces but keep this at a relatively low level of formality
and quite stream-lined. On the other hand, we will cover the “traditional” subjects related to differential
equations and special functions but attempt to place these into the general mathematical context. Sec-
tions with predominantly mathematical background material are indicated with a star. While they are
important for a deep understanding of the material they are less essential for the relatively basic practical
tasks required to pass an exam. I believe that ambitious, mathematically interested students can benefit from
the combination of mathematical foundations and applications in these notes. Students who want to focus
on the practical tasks may concentrate on the un-starred sections.
Two somewhat non-traditional topics, distributions and groups, have been added. Distributions are
so widely used in physics - and physicists tend to discuss important ideas such as Green functions using
distributions - that they shouldn’t be omitted from a Mathematical Physics course. Symmetries have
become one of the central ideas in physics and they underlie practically all fundamental theories
of physics. It would, therefore, be negligent, in a course on Mathematical Methods, not to introduce the
associated mathematical ideas of groups and representations.
The three appendices are pure bonus material. The first one is a review of the calculus of multiple
variables, the second a simple account of sub-manifolds in Rn , including curves and surfaces in R3 , as
encountered in vector calculus. Inevitably, it also does some of the groundwork for General Relativity - so
certainly worthwhile for anyone who would like to learn Einstein’s theory of gravity. The third appendix
introduces differential forms, a classical topic in mathematical physics, at an elementary level. Read (or
ignore) at your own leisure.
Andre Lukas
Oxford, 2018
1 Mathematical preliminaries
This section provides some basic mathematical background which is essential for the lecture and can also
be considered as part of the general mathematical language every physicist should be familiar with. The
part on vector spaces is (mainly) review and will be dealt with quite quickly - a more detailed treatment
can be found in the first year lecture notes on Linear Algebra. The main mathematical theme of this
course is the study of infinite-dimensional vector spaces and practically every topic we cover can (and
should) be understood in this context. While the first year course on Linear Algebra dealt with finite-
dimensional vector spaces many of the concepts were introduced without any reference to dimension and
straightforwardly generalise to the infinite-dimensional case. These include the definitions of vector space,
sub-vector space, linear maps, scalar products and norms and we begin by briefly reviewing those ideas.
One of the concepts which does not straightforwardly generalise to the infinite-dimensional case is that
of a basis. We know that a finite-dimensional vector space V (over a field F ) has a basis, v1 , . . . , vn , and
that every vector v ∈ V can be written as a unique linear combination
v = ∑_{i=1}^{n} α_i v_i , (1.1)
where αi ∈ F are scalars. A number of complications arise when trying to generalise this to infinite
dimensions. Broadly speaking, it is not actually clear whether a basis exists in this case. A basis must
certainly contain an infinite number of basis vectors so that the RHS of Eq. (1.1) becomes an infinite
sum. This means we have to address questions of convergence. Even if we can formulate conditions for
convergence we still have to clarify whether we can find a suitable set of scalars αi such that the sum (1.1)
converges to a given vector v. All this requires techniques from analysis (= calculus done properly) and
the relevant mathematical basics will be discussed in part 2 of this section while much of Section 2 will
be occupied with answering the above questions.
Finally, we need to address another mathematical issue, namely the definition of integrals. The most
important infinite-dimensional vector spaces we need to consider consist of functions, with a scalar product
defined by an integral. To understand these function vector spaces we need to understand the nature of
the integral. In the last part of this section, we will, therefore, briefly discuss measures and the Riemann
and Lebesgue integrals.
Definition 1.1. (Vector spaces) A vector space V over a field F is a set equipped with two operations:
i) vector addition: (v, w) ↦ v + w ∈ V , where v, w ∈ V
ii) scalar multiplication: (α, v) ↦ αv ∈ V , where α ∈ F and v ∈ V .
For all u, v, w ∈ V and all α, β ∈ F , these operations have to satisfy the following rules:
(V1) (u + v) + w = u + (v + w) “associativity”
(V2) There exists a “zero vector”, 0 ∈ V so that 0 + v = v “neutral element”
(V3) There exists an inverse, −v with v + (−v) = 0 “inverse element”
(V4) v + w = w + v “commutativity”
(V5) α(v + w) = αv + αw
(V6) (α + β)v = αv + βv
(V7) (αβ)v = α(βv)
(V8) 1 · v = v
The elements v ∈ V are called “vectors”, the elements α ∈ F of the field are called “scalars”.
Closely associated to this definition is the one for the “sub-structure”, that is, for a sub vector space. A
sub vector space is a non-empty subset W ⊂ V of a vector space V which is closed under vector addition
and scalar multiplication. More formally, this means:
Definition 1.2. (Sub vector spaces) A sub vector space W ⊂ V is a non-empty subset of a vector space
V satisfying:
(S1) w1 + w2 ∈ W for all w1 , w2 ∈ W
(S2) αw ∈ W for all α ∈ F and for all w ∈ W
A sub vector space satisfies all the axioms in Def. 1.1 and is, hence, a vector space in its own right. Every
vector space V has two trivial sub vector spaces, the null vector space {0} ⊂ V and the total space V ⊂ V .
For two sub vector spaces U and W of V the sum U + W is defined as
U + W = {u + w | u ∈ U , w ∈ W } . (1.2)
Evidently, U + W is also a sub vector space of V as shown in the following
Exercise 1.1. Show that the sum (1.2) of two sub vector spaces is a sub vector space.
A sum U + W of two sub vector spaces is called direct iff U ∩ W = {0} and a direct sum is written as
U ⊕ W.
Exercise 1.2. Show that the sum U + W is direct iff every v ∈ U + W has a unique decomposition
v = u + w, with u ∈ U and w ∈ W .
Exercise 1.3. Show that a sub vector space is a vector space.
There are a number of basic notions for vector spaces which include linear combinations, span, linear
independence and basis. Let us briefly recall how they are defined. For k vectors v1 , . . . , vk in a vector
space V over a field F the expression
α_1 v_1 + · · · + α_k v_k = ∑_{i=1}^{k} α_i v_i , (1.3)
with scalars α_1 , . . . , α_k ∈ F , is called a linear combination. The set of all linear combinations of v_1 , . . . , v_k ,
Span(v_1 , . . . , v_k ) := { ∑_{i=1}^{k} α_i v_i | α_i ∈ F } , (1.4)
is called the span of v1 , . . . , vk . Linear independence is defined as follows.
Definition 1.3. Let V be a vector space over F and α1 , . . . , αk ∈ F scalars. A set of vectors v1 , . . . , vk ∈ V
is called linearly independent if
∑_{i=1}^{k} α_i v_i = 0 =⇒ all α_i = 0 . (1.5)
Otherwise, the vectors are called linearly dependent. That is, they are linearly dependent if ∑_{i=1}^{k} α_i v_i = 0
has a solution with at least one α_i ≠ 0.
If a vector space V is spanned by a finite number of vectors (that is, every v ∈ V can be written as a
linear combination of these vectors) it is called finite-dimensional, otherwise infinite-dimensional. Recall
the situation for finite-dimensional vector spaces. In this case, we can easily define what is meant by a
basis.
Definition 1.4. A set v1 , . . . , vn ∈ V of vectors is called a basis of V iff:
(B1) v1 , . . . , vn are linearly independent.
(B2) V = Span(v1 , . . . , vn )
The number of elements in a basis is called the dimension, dim(V ) of the vector space. Every vector
v ∈ V can then be written as a unique linear combination of the basis vectors v1 , . . . , vn , that is,
v = ∑_{i=1}^{n} α_i v_i , (1.6)
with a unique choice of αi ∈ F for a given vector v. The αi are also called the coordinates of the vector
v relative to the given basis.
Clearly, everything is much more involved for infinite-dimensional vector spaces but the goal is to
generalise the concept of a basis to this case and have an expansion analogous to Eq. (1.6), but with
the sum running over an infinite number of basis elements. Making sense of this requires a number of
mathematical concepts, including that of convergence, which will be developed in this section.
The basic example is the space F^n of n-component column vectors v = (v_1 , . . . , v_n)^T
over the field F (where, usually, either F = R for real column vectors or F = C for complex column
vectors), with vector addition and scalar multiplication defined “entry-by-entry” as
(v_1 , . . . , v_n)^T + (w_1 , . . . , w_n)^T := (v_1 + w_1 , . . . , v_n + w_n)^T , α (v_1 , . . . , v_n)^T := (α v_1 , . . . , α v_n)^T . (1.8)
Verifying that these satisfy the vector space axioms 1.1 is straightforward. A basis is given by the standard
unit vectors e1 , . . . , en and, hence, the dimension equals n.
Here, we will also (and predominantly) be interested in more abstract vector spaces consisting of sets
of functions. A general class of such function vector spaces can be defined by starting with a (any) set S
and by considering all functions from S to the vector space V (over the field F ). This set of functions
F(S, V ) := {f : S → V } (1.9)
can be made into a vector space over F by defining a “pointwise” vector addition and scalar multiplication,
(f + g)(x) := f(x) + g(x) , (α f)(x) := α f(x) , (1.10)
for all f, g ∈ F(S, V ), α ∈ F and x ∈ S.
Exercise 1.4. Show that the space (1.9) together with vector addition and scalar multiplication as defined
in Eq. (1.10) defines a vector space.
There are many interesting special cases and sub vector spaces which can be obtained from this
construction. For example, choose S = [a, b] ⊂ R as an interval on the real line (a = −∞ or b = ∞ are
allowed) and V = R (or V = C), so that we are considering the space F([a, b], R) or F([a, b], C) of all
real-valued (or complex-valued) functions on this interval. With the pointwise definitions (1.10) of vector
addition and scalar multiplication these functions form a vector space.
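As a small illustration of these pointwise operations, here is a minimal Python sketch which represents elements of such a function space as ordinary Python functions (the particular functions chosen below are arbitrary and only for illustration):

def add(f, g):
    # pointwise vector addition: (f + g)(x) := f(x) + g(x)
    return lambda x: f(x) + g(x)

def scale(alpha, f):
    # pointwise scalar multiplication: (alpha f)(x) := alpha * f(x)
    return lambda x: alpha * f(x)

f = lambda x: x**2
g = lambda x: 3*x + 1
h = add(scale(2.0, f), g)      # the function 2x^2 + 3x + 1
print(h(1.0), h(2.0))          # 6.0 15.0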
We can consider many sub-sets of this vector space by imposing additional conditions on the functions
and as long as these conditions are invariant under the addition and scalar multiplication of functions (1.10)
Def. 1.2 implies that these sub-sets form sub vector spaces. For example, we know that the sum of two
continuous functions as well as the scalar multiple of a continuous function is continuous so the set of
all continuous functions on an interval forms a (sub) vector space which we denote by C([a, b]). Similar
statements apply to all differentiable functions on an interval and the vector space of k times (continuously)
differentiable functions on the interval [a, b] is denoted by C^k([a, b]), with C^∞([a, b]) the space of infinitely
many times differentiable functions on the interval [a, b]. In cases where we consider the entire real line it
is sometimes useful to restrict to functions with compact support. A function f with compact support
vanishes outside a certain radius R > 0 such that f (x) = 0 whenever |x| > R. We indicate the property
of compact support with a subscript “c”, so that, for example, the vector space of continuous functions
on R with compact support is denoted by Cc (R). The vector space of all polynomials, restricted to the
interval [a, b] is denoted by P([a, b]). Whether the functions are real or complex-valued is sometimes also
indicated by a subscript, so CR ([a, b]) are the real-valued continuous functions on [a, b] while CC ([a, b]) are
their complex-valued counterparts.
Exercise 1.5. Find at least three more examples of function vector spaces, starting with the construc-
tion (1.9).
Definition 1.5. (Linear maps) A map T : V → W between two vector spaces V and W over a field F is
called linear if
(L1) T (v1 + v2 ) = T (v1 ) + T (v2 )
(L2) T (αv) = αT (v)
for all v, v1 , v2 ∈ V and for all α ∈ F . Further, the set Ker(T ) := {v ∈ V | T (v) = 0} ⊂ V is called the
kernel of T and the set Im(T ) := {T (v) | v ∈ V } ⊂ W is called the image of T .
In the context of infinite-dimensional vector spaces, linear maps are also sometimes called linear operators
and we will occasionally use this terminology. Recall that a linear map T : V → W always maps the zero
vector of V into the zero vector of W , so T (0) = 0 and that the kernel of T is a sub vector space of V
1
We will denote linear maps by uppercase letters such as T . The letter f will frequently be used for the functions which
form the elements of the vector spaces we consider.
while the image is a sub vector space of W . Surjectivity and injectivity of the linear map T are related to
the image and kernel via the equivalences
T surjective ⇔ Im(T ) = W , T injective ⇔ Ker(T ) = {0} . (1.11)
A linear map T : V → W which is bijective (= injective and surjective) is also called a (vector space)
isomorphism between V and W . The set of all linear maps T : V → W is referred to as the homomorphisms
from V to W and is denoted by Hom(V, W ) := {T : V → W | T linear}. By using the general construc-
tion (1.10) (where V plays the role of the set S and W the role of the vector space V ) this space can be
equipped with vector addition and scalar multiplication. Further, since the sum of two linear functions
and the scalar multiple of a linear function are again linear, it follows from Def. 1.2 that Hom(V, W ) is a
(sub) vector space of F(V, W ). Finally, we note that for two linear maps T : V → W and S : W → U ,
the composition S ◦ T : V → U (defined by S ◦ T (v) := S(T (v))) is also linear.
The identity map id : V → V defined by id(v) = v is evidently linear. Recall that a linear map
S : V → V is said to be the inverse of a linear map T : V → V iff
S ◦ T = T ◦ S = id . (1.12)
The inverse exists iff T is bijective (= injective and surjective) and in this case it is unique, linear and
denoted by T −1 . Also recall the following rules
(T −1 )−1 = T , (T ◦ S)−1 = S −1 ◦ T −1 , (1.13)
for calculating with the inverse.
For a finite-dimensional vector space V with basis (v1 , . . . , vn ) we can associate to a linear map
T : V → V a matrix A with entries defined by
T(v_j) = ∑_{i=1}^{n} A_{ij} v_i . (1.14)
This matrix describes the action of the linear map on the coordinate vectors relative to the basis (v1 , . . . , vn ).
To see what this means more explicitly consider a vector v ∈ V with coordinate vector α = (α_1 , . . . , α_n)^T ,
such that v = ∑_{i=1}^{n} α_i v_i . Then, if T maps the vector v to v → T(v) the coordinate vector is mapped to
α → Aα. How does the matrix A depend on the choice of basis? Introduce a second basis (v_1' , . . . , v_n')
with associated matrix A'. Then we have
A' = P A P^{−1} , where v_j = ∑_{i=1}^{n} P_{ij} v_i' . (1.15)
The matrix P can also be understood as follows. Consider a vector v = ∑_{i=1}^{n} α_i v_i = ∑_{i=1}^{n} α_i' v_i' with
coordinate vectors α = (α_1 , . . . , α_n) and α' = (α_1' , . . . , α_n') relative to the unprimed and primed basis.
Then,
α' = P α . (1.16)
An important special class of homomorphisms is the dual vector space V ∗ := Hom(V, F ) of a vector
space V over F . The elements of the dual vector space are called (linear) functionals and they map vectors
to numbers in the field F . For a finite-dimensional vector space V with basis v1 , . . . , vn , there exists a
basis ϕ1 , . . . , ϕn of V ∗ , called the dual basis, which satisfies
ϕi (vj ) = δij . (1.17)
In particular, a finite-dimensional vector space and its dual have the same dimension. For infinite-
dimensional vector spaces the discussion is of course more involved and we will come back to this later.
Exercise 1.6. For a finite-dimensional vector space V with basis v1 , . . . , vn show that there exists a basis
ϕ_1 , . . . , ϕ_n of the dual space V^∗ which satisfies Eq. (1.17).
1.1.4 Examples of linear maps
We know that the linear maps T : Rn → Rm (T : Cn → Cm ) can be identified with the m × n matrices
containing real entries (complex entries) whose linear action is simply realised by the multiplication of
matrices with vectors.
Let us consider some examples of linear maps for vector spaces of functions, starting with the space
C([a, b]) of (real-valued) continuous functions on the interval [a, b]. For a (real-valued) continuous function
K : [a, b] × [a, b] → R of two variables we can define the map T : C([a, b]) → C([a, b]) by
T(f)(x) := ∫_a^b dx̃ K(x, x̃) f(x̃) . (1.18)
This map is evidently linear since the integrand is linear in the function f and the integral itself is linear.
A linear map such as the above is called a linear integral operator and the function K is also referred to as
the kernel of the integral operator 2 . Such integral operators play an important role in functional analysis.
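As a rough numerical illustration (a sketch only; the kernel K, the grid and the functions below are arbitrary choices), discretising the interval turns the integral operator (1.18) into multiplication by a matrix, which makes its linearity manifest:

import numpy as np

a, b, n = 0.0, 1.0, 400
x = np.linspace(a, b, n)
dx = (b - a) / (n - 1)

K = np.exp(-np.abs(x[:, None] - x[None, :]))    # sample kernel K(x, x~)
f = np.sin(2 * np.pi * x)
g = np.cos(np.pi * x)

T = lambda h: K @ h * dx                        # (T h)(x_i) ~ sum_j K(x_i, x_j) h(x_j) dx
assert np.allclose(T(2*f + g), 2*T(f) + T(g))   # linearity, inherited from the matrix product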
For another example consider the vector space C ∞ ([a, b]) of infinitely many times differentiable func-
tions on the interval [a, b]. We can define a linear operator D : C ∞ ([a, b]) → C ∞ ([a, b]) by
D(f)(x) := df/dx (x) or D = d/dx . (1.19)
A further class of linear operators Mp : C ∞ ([a, b]) → C ∞ ([a, b]) is obtained by multiplication with a fixed
function p ∈ C ∞ ([a, b]), defined by
Mp (f )(x) := p(x)f (x) . (1.20)
The above two classes of linear operators can be combined and generalised by including higher-order
differentials which leads to linear operators T : C ∞ ([a, b]) → C ∞ ([a, b]) defined by
T = p_k d^k/dx^k + p_{k−1} d^{k−1}/dx^{k−1} + · · · + p_1 d/dx + p_0 , (1.21)
where pi , for i = 0, . . . , k, are fixed functions in C ∞ ([a, b]). Linear operators of this type will play an
important role in our discussion, mainly because they form the key ingredient for many of the differential
equations which appear in Mathematical Physics.
Definition 1.6. (Norms and normed vector spaces) A norm ‖·‖ on a vector space V over the field F = R
or F = C is a map ‖·‖ : V → R which satisfies
(N1) ‖v‖ > 0 for all non-zero v ∈ V
(N2) ‖αv‖ = |α| ‖v‖ for all α ∈ F and all v ∈ V
(N3) ‖v + w‖ ≤ ‖v‖ + ‖w‖ for all v, w ∈ V (triangle inequality)
A vector space V with a norm is also called a normed vector space.
2
This notion of “kernel” has nothing to do with the kernel of a linear map, as introduced in Def. 1.5. The double-use of
the word is somewhat unfortunate but so established that it cannot be avoided. It will usually be clear from the context
which meaning of “kernel” is referred to.
Note that the notation |α| in (N2) refers to the simple real modulus for F = R and to the complex modulus
for F = C. All three axioms are intuitively clear if we think about a norm as providing us with a notion
of “length”. Clearly, a length should be strictly positive for all non-zero vectors as stated in (N1), it needs
to scale with the (real or complex) modulus of a scalar if the vector is multiplied by this scalar as in (N2)
and it needs to satisfy the triangle inequality (N3). Since 0v = 0 for any vector v ∈ V , the axiom (N2)
implies that ‖0‖ = ‖0v‖ = 0 ‖v‖ = 0, so the zero vector has norm 0 (and is, from (N1), the only
vector with this property).
Exercise 1.7. Show that, in a normed vector space V , we have ‖v − w‖ ≥ ‖v‖ − ‖w‖ for all v, w ∈ V .
For normed vector spaces V and W we can now introduce an important new sub-class of linear operators
T : V → W , namely bounded linear operators. They are defined as follows 3 .
Definition 1.7. (Bounded linear operators) A linear operator T : V → W is called bounded if there exists
a positive K ∈ R such that ‖T(v)‖_W ≤ K ‖v‖_V for all v ∈ V . The smallest number K for which this
condition is satisfied is called the norm, ‖T‖, of the operator T .
Having introduced the notion of the norm of a bounded linear operator, we can now introduce isometries.
Exercise 1.8. Show that the real and complex modulus satisfies the triangle inequality.
More interesting examples of normed vector spaces are provided by Rn and Cn with the Euclidean norm
‖v‖ := ( ∑_{i=1}^{n} |v_i|² )^{1/2} , (1.22)
for any vector v = (v1 , . . . , vn )T . (As above, the modulus sign refers to the real or complex modulus,
depending on whether we consider the case of Rn or Cn .) It is immediately clear that axioms (N1) and
(N2) are satisfied and we leave (N3) as an exercise.
Exercise 1.9. Show that the prospective norm on Rn or Cn defined in Eq. (1.22) satisfies the triangle
inequality.
Linear maps T : F n → F m are described by the action of m × n matrices on vectors. Since such
matrices, for a given linear map T , have fixed entries it is plausible that they are bounded with respect
to the norm (1.22). You can attempt the proof of this statement in the following exercise.
Exercise 1.10. Show that linear maps T : F n → F m , where F = R or F = C are bounded, relative to
the norm (1.22).
3 When two normed vector spaces V and W are involved we will distinguish the associated norms by adding the name of the space as a subscript, so we write ‖·‖_V and ‖·‖_W .
It is not too difficult to generalise this statement and to show that linear maps between any two finite-
dimensional vector spaces are bounded. For the infinite-dimensional case this is not necessarily true (see
Exercise 1.13 below).
Vector spaces, even finite-dimensional ones, usually allow for more than one way to introduce a norm.
For example, on R^n or C^n , with vectors v = (v_1 , . . . , v_n)^T we can define, for any real number p ≥ 1, the norm
‖v‖_p := ( ∑_{i=1}^{n} |v_i|^p )^{1/p} . (1.23)
Clearly, this is a generalisation of the standard norm (1.22) which corresponds to the special case p = 2.
As before, conditions (N1) and (N2) in Def. 1.6 are easily verified. For the triangle inequality (N3) consider
the following exercise.
Exercise 1.11. For two vectors v = (v_1 , . . . , v_n)^T and w = (w_1 , . . . , w_n)^T in R^n or C^n and two real
numbers p, q ≥ 1 with 1/p + 1/q = 1 show that ∑_{i=1}^{n} |v_i w_i| ≤ ‖v‖_p ‖w‖_q (the Hölder inequality).
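A quick numerical check of these inequalities (illustrative only, with randomly chosen vectors):

import numpy as np

rng = np.random.default_rng(0)
v, w = rng.normal(size=5), rng.normal(size=5)

p_norm = lambda u, p: (np.abs(u)**p).sum()**(1.0/p)

p = 3.0
q = p / (p - 1.0)                                                  # conjugate exponent, 1/p + 1/q = 1
assert np.abs(v*w).sum() <= p_norm(v, p) * p_norm(w, q) + 1e-12    # Hölder inequality
assert p_norm(v + w, p) <= p_norm(v, p) + p_norm(w, p) + 1e-12     # triangle inequality (N3)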
Note that, from property (S2), the scalar product is linear in the second argument and combining this
with (S1) implies for the first argument that
⟨α v + β w, u⟩ = α^* ⟨v, u⟩ + β^* ⟨w, u⟩ .
Evidently, in the real case the scalar product is also linear in the first argument (and, hence, it is bi-linear).
In the complex case, it is sesqui-linear which means that, in addition to linearity in the second argument,
it is half-linear in the first argument (vector sums can be pulled out of the first argument while scalars
pull out with a complex conjugate). In the following, we will frequently write equations for the hermitian
case, F = C, keeping in mind that the analogous equations for the real case can be obtained by simply
omitting the complex conjugate.
How are inner product vector spaces and normed vector spaces related? Properties (S1) and (S3)
imply that ⟨v, v⟩ is always real and non-negative (and positive for v ≠ 0) so it makes sense to try to define a norm by
‖v‖ := √(⟨v, v⟩) . (1.27)
As usual, it is easy to show that this satisfies properties (N1) and (N2) in Def. 1.6. To verify the triangle
inequality (N3) we recall that every scalar product satisfies the Cauchy-Schwarz inequality
|⟨v, w⟩| ≤ ‖v‖ ‖w‖ , ‖v + w‖ ≤ ‖v‖ + ‖w‖ , (1.28)
from which the triangle inequality follows immediately. In conclusion, Eq. (1.27) does indeed define a
norm in the sense of Def. 1.6 and it is called the norm associated to the scalar product. Hence, any inner
product vector space is also a normed vector space.
Exercise 1.14. Show that a (real or hermitian) scalar product with associated norm (1.27) satisfies the
Cauchy-Schwarz inequality and the triangle inequality in Eq. (1.28). Also show that the norm (1.27)
satisfies the parallelogram law
‖v + w‖² + ‖v − w‖² = 2 ( ‖v‖² + ‖w‖² ) , (1.29)
for all v, w ∈ V .
Recall that two vectors v, w ∈ V are called orthogonal iff hv, wi = 0. Also, recall that any finite set
of mutually orthogonal non-zero vectors is linearly independent.
Exercise 1.15. For an inner product vector space, show that a finite number of orthogonal non-zero
vectors are linearly independent.
For a sub vector space W ⊂ V the orthogonal complement is defined as W^⊥ := {v ∈ V | ⟨v, w⟩ = 0 for all w ∈ W}.
In other words, the orthogonal complement W^⊥ consists of all vectors which are orthogonal to the entire
space W .
Exercise 1.16. Show, for a sub vector space W ⊂ V , that W ∩ W ⊥ = {0}. (This means that the sum of
W and W ⊥ is direct.) For a finite-dimensional V , show that W ⊕ W ⊥ = V .
Further, a (finite or infinite) collection ε_i of vectors, where i = 1, 2, . . ., is called an ortho-normal system
iff ⟨ε_i, ε_j⟩ = δ_ij . We know that finite-dimensional vector spaces have a basis and by applying to such a
basis the Gram-Schmidt procedure one obtains an ortho-normal basis. Hence, every finite-dimensional
inner product vector space has an ortho-normal basis. The scalar product makes it easier to work out the
coordinates of a vector v ∈ V relative to an ortho-normal basis ε_1 , . . . , ε_n by using the formula
v = ∑_{i=1}^{n} α_i ε_i ⟺ α_i = ⟨ε_i, v⟩ , (1.31)
as can be easily verified using the orthogonality relations ⟨ε_i, ε_j⟩ = δ_ij . For infinite-dimensional inner
product spaces the story is more involved and will be tackled in Section 2.
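The following Python sketch (illustrative, for the real case F = R) runs Gram-Schmidt on a random basis of R^4 and then recovers the coordinates of a vector via Eq. (1.31):

import numpy as np

rng = np.random.default_rng(1)
B = rng.normal(size=(4, 4))                  # columns: a (generic) basis v_1, ..., v_4

def gram_schmidt(B):
    E = np.zeros_like(B)
    for i in range(B.shape[1]):
        u = B[:, i] - E[:, :i] @ (E[:, :i].T @ B[:, i])   # project out earlier directions
        E[:, i] = u / np.linalg.norm(u)
    return E

E = gram_schmidt(B)                          # columns: ortho-normal basis eps_1, ..., eps_4
v = rng.normal(size=4)
alpha = E.T @ v                              # alpha_i = <eps_i, v>
assert np.allclose(E @ alpha, v)             # v = sum_i alpha_i eps_i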
It is useful to re-consider the relationship of a vector space V and its dual vector space V ∗ in the
presence of an inner product on V . The main observation is that the inner product induces a map
ı : V → V ∗ defined by
ı(v)(w) := ⟨v, w⟩ . (1.33)
For a vector space over R this map is linear, for a vector space over C it is half-linear (meaning, as for
the first argument of hermitian scalar products, that vector sums pull through while scalars pull out with
a complex conjugation). In either case, this map is injective. For finite-dimensional V it is bijective and
provides an identification of the vector space with its dual.
Exercise 1.17. Show that the map ı : V → V ∗ defined in Eq. (1.33) is injective and that it is bijective
for finite-dimensional V .
The properties of the map ı in the infinite-dimensional case will be further explored later.
In other words, vectors in V are denoted by “ket”-vectors |w⟩, dual vectors in V^∗ , obtained via the
map ı, by “bra”-vectors ⟨v|, while the action of one on the other (which equals the scalar product in
Eq. (1.33)) is simply obtained by combining the two to a “bra-(c)ket”, resulting in
⟨v|w⟩ = ⟨v, w⟩ .
Note, there is nothing particularly profound about this notation - for the most part it simply amounts
to replacing the comma separating the two arguments of an inner product with a vertical bar.
We can ask about interesting new properties of linear maps in the presence of an inner product. First,
recall that scalar products of the form
⟨v, T(w)⟩ (1.36)
for a linear map T : V → V are also called matrix elements of T . Two maps T : V → V and S : V → V
are equal iff all their matrix elements are equal, that is, iff ⟨v, T(w)⟩ = ⟨v, S(w)⟩ for all v, w ∈ V .
Exercise 1.18. Show that two linear maps are equal iff all their matrix elements are equal.
In the finite-dimensional case, the matrix A which describes a linear map T : V → V relative to an
ortho-normal basis ε_1 , . . . , ε_n is simply obtained by the matrix elements
A_ij = ⟨ε_i, T(ε_j)⟩ . (1.37)
Definition 1.10. For a linear map T : V → V on a vector space V with scalar product, an adjoint linear
map, T † : V → V is a map satisfying
⟨v, T w⟩ = ⟨T^† v, w⟩ (1.38)
for all v, w ∈ V .
Exercise 1.19. Show that the adjoint map is unique and that it has the properties in Eq. (1.39).
For finite-dimensional inner product vector spaces we can describe both T and its adjoint T^† by
matrices relative to an ortho-normal basis ε_1 , . . . , ε_n . They are given by the matrix elements
T_ij = ⟨ε_i, T(ε_j)⟩ , (T^†)_ij = ⟨ε_i, T^†(ε_j)⟩ . (1.40)
Exercise 1.20. Show that the matrix which consists of the matrix elements of T † in Eq. (1.40) is indeed
the hermitian conjugate of the matrix given by the matrix elements of T .
Particularly important linear operators are those which can be moved from one argument of a scalar
product into the other without changing the value of the scalar product and they are called hermitian or
self-adjoint operators.
Definition 1.11. A linear operator T : V → V on a vector space V with scalar product is called self-adjoint (or hermitian) iff ⟨v, T(w)⟩ = ⟨T(v), w⟩ for all v, w ∈ V .
Hence, a self-adjoint operator T : V → V is one for which the adjoint exists and satisfies T † = T .
Recall that the commutator of two linear operators S, T is defined as
[S, T ] := S ◦ T − T ◦ S , (1.41)
We can ask under what condition the composition S ◦ T of two hermitian operators is again hermitian.
Using the above commutator notation, we have
S ◦ T hermitian ⟺ (S ◦ T)^† = S ◦ T ⟺ T ◦ S = S ◦ T ⟺ [S, T] = 0 , (1.42)
where S = S^† and T = T^† has been used for the second equivalence. In conclusion, the composition of
two hermitian operators is hermitian if and only if the operators commute. For a complex inner product
vector space, it is also worth noting that, from Eq. (1.39), an anti-hermitian operator, that is an operator
T satisfying T^† = −T , can be turned into a hermitian one (and vice versa) by multiplying with ±i, so that
(±i T)^† = ∓i T^† = ±i T .
Also note that every linear operator T : V → V with an adjoint T^† can be written as a (unique) sum of a
hermitian and an anti-hermitian operator. Indeed, defining T_± = (1/2)(T ± T^†) we have T = T_+ + T_− while
T_+ is hermitian and T_− is anti-hermitian.
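A small numerical illustration of the statement about compositions (with randomly generated hermitian matrices; a sketch, not part of the formal development):

import numpy as np

rng = np.random.default_rng(2)
def random_hermitian(n):
    A = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    return (A + A.conj().T) / 2

S, T = random_hermitian(4), random_hermitian(4)
ST = S @ T
print(np.allclose(ST, ST.conj().T))               # generically False: S and T do not commute
S2, T2 = S @ S, S @ S @ S                         # two commuting hermitian operators
print(np.allclose(S2 @ T2, (S2 @ T2).conj().T))   # True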
In this way, the matrix element of the operator is obtained by including it between a bra and a
ket vector. This symmetric notation is particularly useful for hermitian operators since they can be
thought of as acting on either one of the scalar product’s arguments. For non-hermitian operators
or for the purpose of proving that an operator is hermitian the Dirac notation is less helpful and
it is sometimes better to use the mathematical notation, as on the RHS of Eq. (1.44). Relative to
an ortho-normal basis ε_1 , . . . , ε_n of a finite-dimensional inner product space V a self-adjoint linear
operator T : V → V is described by the matrix with entries (in Dirac notation)
T_ij = ⟨ε_i|T|ε_j⟩ , (1.45)
and, conversely, the operator can be written in terms of these matrix elements as
T = ∑_{i,j=1}^{n} T_ij |ε_i⟩⟨ε_j| . (1.46)
This can be easily verified by taking the matrix elements with ⟨ε_i| and |ε_j⟩ of this equation and
by using ⟨ε_i|ε_k⟩ = δ_ik . (Formally, Eq. (1.46) exploits the identification Hom(V, V) ≅ V ⊗ V^∗ .) In
particular the identity operator id with matrix elements δ_ij can be written as
id = ∑_{i=1}^{n} |ε_i⟩⟨ε_i| . (1.47)
Exercise 1.21. By acting on an arbitrary vector, verify explicitly that the RHS of Eq. (1.47) is indeed
the identity operator.
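One way to see Eq. (1.47) concretely (an illustrative numerical sketch): the columns of any unitary matrix form an ortho-normal basis of C^4, and the sum of the corresponding projectors is the identity.

import numpy as np

rng = np.random.default_rng(3)
A = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
Q, _ = np.linalg.qr(A)                                          # columns of Q: ortho-normal basis eps_i
P = sum(np.outer(Q[:, i], Q[:, i].conj()) for i in range(4))    # sum_i |eps_i><eps_i|
assert np.allclose(P, np.eye(4))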
Dirac notation can be quite intuitive as can be demonstrated by re-writing some of our earlier equations.
For example, writing the relation (1.31) for the coordinates relative to an ortho-normal basis in Dirac
notation leads to
|v⟩ = ∑_{i=1}^{n} |ε_i⟩⟨ε_i|v⟩ . (1.48)
Evidently, this can now be derived by inserting the identity operator in the form (1.47). Similarly,
the expressions (1.32) for the scalar product and the norm in Dirac notation
⟨v|w⟩ = ∑_{i=1}^{n} ⟨v|ε_i⟩⟨ε_i|w⟩ , ‖ |v⟩ ‖² = ⟨v|v⟩ = ∑_{i=1}^{n} ⟨v|ε_i⟩⟨ε_i|v⟩ (1.49)
Another important class of specific linear maps on an inner product vector space are unitary maps which
are precisely those maps which leave the value of the inner product unchanged in the sense of the following
Definition 1.12. Let V be an inner product vector space. A linear map U : V → V is called unitary iff
⟨U(v), U(w)⟩ = ⟨v, w⟩
for all v, w ∈ V .
Proposition 1.1. (Properties of unitary maps) A unitary map U with adjoint U † has the following
properties.
(i) Unitary maps U can also be characterized by U † ◦ U = U ◦ U † = idV .
(ii) Unitary maps U are invertible and U −1 = U † .
(iii) The composition of unitary maps is a unitary map.
(iv) The inverse, U † , of a unitary map U is unitary.
For finite-dimensional vector spaces we know that, relative to an ortho-normal basis ε_1 , . . . , ε_n , a unitary
map Û is described by a unitary matrix (orthogonal matrix in the real case). Indeed, the matrix U with
matrix elements (in Dirac notation) U_ij = ⟨ε_i|Û|ε_j⟩ satisfies U^† U = 1_n , by a calculation analogous to Eq. (1.54) below.
Still in the finite-dimensional case, consider two choices of ortho-normal basis (ε_1 , . . . , ε_n) and (ε'_1 , . . . , ε'_n)
and the matrices T_ij = ⟨ε_i|T̂|ε_j⟩ and T'_ij = ⟨ε'_i|T̂|ε'_j⟩ representing a linear operator T̂ with respect to either.
We have already written down the general relation between those two matrices in Eq. (1.15) but how does
this look for a change from one ortho-normal basis to another? Inserting identity operators (1.47) we find
T'_ij = ⟨ε'_i|T̂|ε'_j⟩ = ∑_{k,l=1}^{n} ⟨ε'_i|ε_k⟩⟨ε_k|T̂|ε_l⟩⟨ε_l|ε'_j⟩ = ∑_{k,l=1}^{n} Q_ik T_kl Q^*_jl = (Q T Q^†)_ij , Q_ij := ⟨ε'_i|ε_j⟩ (1.53)
so that T' = Q T Q^† . This result is, in fact, consistent with Eq. (1.15) since the matrix Q is unitary, so
Q^† = Q^{−1} . This can be verified immediately:
(Q^† Q)_ij = ∑_{k=1}^{n} Q^*_ki Q_kj = ∑_{k=1}^{n} ⟨ε'_k|ε_i⟩^* ⟨ε'_k|ε_j⟩ = ∑_{k=1}^{n} ⟨ε_i|ε'_k⟩⟨ε'_k|ε_j⟩ = ⟨ε_i|ε_j⟩ = δ_ij . (1.54)
Using this formalism, we can also verify that Q relates coordinate vectors relative to the two choices of
basis, as stated in Eq. (1.16). From Eq. (1.48), the two coordinate vectors for a given vector |v⟩ are given
by α_i = ⟨ε_i|v⟩ and α'_i = ⟨ε'_i|v⟩. It follows
α'_i = ⟨ε'_i|v⟩ = ∑_{j=1}^{n} ⟨ε'_i|ε_j⟩⟨ε_j|v⟩ = ∑_{j=1}^{n} Q_ij α_j . (1.55)
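A numerical sketch of Eqs. (1.53)-(1.55) for the real case (all matrices below are randomly generated and purely illustrative):

import numpy as np

rng = np.random.default_rng(4)
n = 4
E  = np.linalg.qr(rng.normal(size=(n, n)))[0]   # columns: ortho-normal basis eps_j
Ep = np.linalg.qr(rng.normal(size=(n, n)))[0]   # columns: ortho-normal basis eps'_i
Q  = Ep.T @ E                                   # Q_ij = <eps'_i, eps_j>
T  = rng.normal(size=(n, n))                    # an operator, written in standard coordinates

Tmat  = E.T  @ T @ E                            # matrix elements relative to eps
Tmatp = Ep.T @ T @ Ep                           # matrix elements relative to eps'
assert np.allclose(Tmatp, Q @ Tmat @ Q.T)       # T' = Q T Q^dagger (Q is real here)

v = rng.normal(size=n)
assert np.allclose(Ep.T @ v, Q @ (E.T @ v))     # alpha' = Q alpha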
The standard scalar product on F^n (F = R or C) is defined by
⟨v, w⟩ := ∑_{i=1}^{n} v_i^* w_i , (1.56)
for vectors v = (v_1 , . . . , v_n)^T and w = (w_1 , . . . , w_n)^T . (We have followed the convention, mentioned above,
of writing the equations for the complex case. For the real case, simply drop the complex conjugation.)
The norm associated to this scalar product is of course the one given in Eq. (1.22). Linear maps are
described by n × n matrices and the adjoint of a matrix A, relative to the inner product (1.56), is given
by the hermitian conjugate A† . For the complex case, unitary linear maps are given by unitary matrices,
that is matrices U satisfying
U † U = 1n . (1.57)
For the real case, unitary linear maps, relative to the inner product (1.56), are given by orthogonal
matrices, that is matrices A satisfying
AT A = 1n . (1.58)
Both are important classes of matrices which we will return to in our discussion of symmetries in Section 9.
For an infinite-dimensional example, we begin with the space C[a, b] of continuous (complex-valued)
functions on the interval [a, b], equipped with the scalar product
⟨f, g⟩ := ∫_a^b dx f(x)^* g(x) , (1.59)
Exercise 1.23. Verify that Eq. (1.59) defines a scalar product on C[a, b]. (Hint: Check the conditions in
Def. 1.9).
The norm associated to this scalar product is given by the first equation (1.25). Consider the linear
operator Mp , defined in Eq. (1.20), which acts by multiplication with the function p. What is the adjoint
of Mp ? The short calculation
⟨f, M_p(g)⟩ = ∫_a^b dx f(x)^* (p(x) g(x)) = ∫_a^b dx (p(x)^* f(x))^* g(x) = ⟨M_{p^*}(f), g⟩ (1.60)
shows that
Mp† = Mp∗ , (1.61)
so the adjoint operator corresponds to multiplication with the complex conjugate function p∗ . If p is
real-valued so that p = p∗ then Mp is a hermitian operator. From the definition of the multiplication
operator it is clear that
Mp ◦ Mq = Mpq , M1 = id (1.62)
for two functions p and q. Eqs. (1.61) and (1.62) can be used to construct unitary multiplication operators.
For a real-valued function u we have
M_{e^{iu}}^† ◦ M_{e^{iu}} = M_{e^{−iu}} ◦ M_{e^{iu}} = M_{e^{−iu} e^{iu}} = M_1 = id , (1.63)
so that multiplication with a complex phase e^{iu(x)} (where u is a real-valued function) is a unitary operator.
This can also be verified directly from the scalar product:
⟨M_{e^{iu}}(f), M_{e^{iu}}(g)⟩ = ∫_a^b dx (e^{iu(x)} f(x))^* e^{iu(x)} g(x) = ∫_a^b dx f(x)^* g(x) = ⟨f, g⟩ . (1.64)
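A quick quadrature check of this unitarity (illustrative; the interval, grid and the functions u, f, g below are arbitrary choices):

import numpy as np

x = np.linspace(0.0, 1.0, 2000, endpoint=False)
dx = x[1] - x[0]
inner = lambda f, g: np.sum(np.conj(f) * g) * dx    # crude approximation of the scalar product (1.59)

u = np.sin(3 * x)                                   # a real-valued function u
f = x**2 + 1j * x
g = np.exp(-x) + 1j * np.cos(x)

phase = np.exp(1j * u)
print(np.allclose(inner(phase * f, phase * g), inner(f, g)))   # True: M_{e^{iu}} preserves the scalar product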
For another example of a unitary map, let us restrict to the space Cc (R) of complex-valued functions
on the real line with compact support, still with the scalar product (1.59), but setting a = −∞ and b = ∞.
(The compact support property is to avoid issues with the finiteness of the integral - we will deal with
this in more generality later.) On this space define the “translation operator” Ta : Cc (R) → Cc (R) by
Ta (f )(x) := f (x − a) , (1.65)
for any fixed a ∈ R. Evidently, this operator “shifts” the graph of the function by an amount of a along
the x-axis. Let us work out the effect of this operator on the scalar product. To find the adjoint of Ta we
calculate
⟨f, T_a(g)⟩ = ∫_{−∞}^{∞} dx f(x)^* g(x − a) = ∫_{−∞}^{∞} dy f(y + a)^* g(y) = ⟨T_{−a}(f), g⟩ , (1.66)
where we have substituted y = x − a in the second step,
so that Ta† = T−a , that is, the adjoint is given by the shift in the opposite direction. To check unitarity
we work out
⟨T_a(f), T_a(g)⟩ = ∫_{−∞}^{∞} dx f(x − a)^* g(x − a) = ∫_{−∞}^{∞} dy f(y)^* g(y) = ⟨f, g⟩ , (1.67)
again substituting y = x − a,
and conclude that Ta is indeed unitary. Alternatively, we can check the unitarity condition Ta† ◦ Ta =
T−a ◦ Ta = id which works out as expected since combining shifts by a and −a amounts to the identity
operation.
To consider differential operators we restrict further to the inner product space Cc∞ (R) of complex-
valued, infinitely times differentiable functions with compact support, still with scalar product defined by
Eq. (1.59), setting a = −∞ and b = ∞. What is the adjoint of the differential operator D = d/dx? The
short calculation
⟨f, D(g)⟩ = ∫_{−∞}^{∞} dx f(x)^* g'(x) = [f(x)^* g(x)]_{−∞}^{∞} − ∫_{−∞}^{∞} dx f'(x)^* g(x) = ⟨−D(f), g⟩ (1.68)
(where the boundary term vanishes since the functions have compact support) shows that
( d/dx )^† = − d/dx , (1.69)
so d/dx is anti-hermitian. As discussed earlier, for a complex inner product space, we can turn this into
a hermitian operator by multiplying with ±i, so that
( ±i d/dx )^† = ±i d/dx . (1.70)
Another lesson from the above computation is that, for scalar products defined by integrals, the property
of being hermitian can depend on boundary conditions satisfied by the functions in the relevant function
vector space. In the case of Eq. (1.68) we were able to reach a conclusion because the boundary term
could be discarded due to the compact support property of the functions.
What about the composite operator Mx ◦ i d/dx? We know that the composition of two hermitian
operators is hermitian iff the two operators commute so let us work out the commutator (writing, for
simplicity, Mx as x)
[ i d/dx , x ] = i d/dx ◦ x − x ◦ i d/dx = i + i x d/dx − i x d/dx = i . (1.71)
(If the above computation looks confusing remember we are dealing with operators, so think of the entire
equation above as acting on a function f . The second step in the calculation then amounts to using
the product rule for differentiation.) Since the above commutator is non-vanishing we conclude that
Mx ◦ i d/dx is not hermitian.
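The commutator (1.71) can also be checked symbolically; here is a small sympy sketch which applies both orderings to a generic test function (the names used are illustrative):

import sympy as sp

x = sp.symbols('x', real=True)
f = sp.Function('f')(x)

left  = sp.I * sp.diff(x * f, x)        # (i d/dx)(x f)
right = x * sp.I * sp.diff(f, x)        # (x i d/dx)(f)
print(sp.simplify(left - right))        # I*f(x): the commutator acts as multiplication by i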
So much for a few introductory examples of how to carry out calculations for infinite-dimensional
inner product spaces. We will now collect a few more mathematical tools required for a more systematic
approach.
Definition 1.13. For a linear map T : V → V on a vector space V over F the number λ ∈ F is called
an eigenvalue of T if there is a non-zero vector v such that
T(v) = λ v . (1.72)
Such a vector v is called an eigenvector of T with eigenvalue λ and the eigenvectors for a given λ, together with the zero vector, form the eigenspace
Eig_T(λ) := {v ∈ V | T(v) = λ v} ,
so that λ is an eigenvalue iff Eig_T(λ) ≠ {0}. If dim(Eig_T(λ)) = 1 the eigenvalue is called non-degenerate
(there is only one eigenvector up to re-scaling) and degenerate otherwise (there are at least two linearly
independent eigenvectors).
Let us recall the basic facts in the finite-dimensional case. The eigenvalues can be obtained by finding
the zeros of the characteristic polynomial
χ_T(λ) := det(T − λ id) .
For each eigenvalue λ the associated eigenspace is obtained by finding all solutions v ∈ V to the equation
(T − λ id)v = 0. The most important application of eigenvalues and eigenvectors in the finite-dimensional
case is to diagonalising linear maps, that is, finding a basis in which the matrix describing the linear map
is diagonal. Recall that diagonalising a linear map T is possible if and only if there is a basis v_1 , . . . , v_n
of eigenvectors of T. Indeed, in this case T(v_i) = λ_i v_i and the matrix describing T relative to this basis
is diag(λ1 , . . . , λn ). There are certain classes of linear operators which are known to have a basis of
eigenvectors and can, hence, be diagonalised. These include self-adjoint linear operators and normal
operators, that is, operators satisfying [T, T † ] = 0.
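For hermitian matrices this diagonalisation is readily carried out numerically; a short illustrative sketch (the matrix below is randomly generated):

import numpy as np

rng = np.random.default_rng(5)
A = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
T = (A + A.conj().T) / 2                                # a hermitian operator
lam, U = np.linalg.eigh(T)                              # columns of U: ortho-normal eigenvectors
assert np.allclose(U.conj().T @ T @ U, np.diag(lam))    # diagonal form with real eigenvalues
assert np.allclose(U.conj().T @ U, np.eye(4))           # the eigenvector basis is ortho-normal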
Some useful statements which are well-known in the finite-dimensional case continue to hold in infinite
dimensions, such as the following.
Theorem 1.24. For a self-adjoint linear operator T : V → V on an inner product vector space all eigenvalues are real and eigenvectors for different eigenvalues are orthogonal.
As an example, consider the vector space C_p^∞([−π, π]) of infinitely many times differentiable functions which are periodic with period 2π, equipped with the scalar product
⟨f, g⟩ := ∫_{−π}^{π} dx f(x)^* g(x) . (1.75)
A calculation analogous to the one in Eq. (1.68) (where periodicity allows discarding the boundary
term) shows that the operator d/dx is anti-hermitian and d²/dx² is hermitian relative to this inner
product.
Exercise 1.26. For the vector space Cp∞ ([−π, π]) with inner product (1.75) show that d/dx is anti-
hermitian and d2 /dx2 is hermitian.
From Theorem 1.24 we, therefore, conclude that eigenvectors of d2 /dx2 for different eigenvalues must
be orthogonal. To check this explicitly, we write down the eigenvalue equation
d²f/dx² = λ f . (1.76)
For λ > 0 the solutions to this equation are (real) exponential and, hence, cannot be elements of our
vector space of periodic functions. For λ < 0 the eigenfunctions are given by fk (x) = sin(kx) and
g_k(x) = cos(kx), where λ = −k² . At this point k is still an arbitrary real number but for f_k and g_k to be periodic
with period 2π we need k ∈ Z. Of course for f_k we can restrict to k ∈ Z_{>0} and for g_k to k ∈ Z_{≥0} . In
summary, we have the eigenvectors and eigenvalues
f_k(x) = sin(kx) , k = 1, 2, . . . , λ = −k²
g_k(x) = cos(kx) , k = 0, 1, . . . , λ = −k² (1.77)
In particular, this implies that the eigenvalues λ = −k² for k = 1, 2, . . . are degenerate. By direct
calculation, we can check that for k ≠ l, we have ⟨f_k, f_l⟩ = ⟨g_k, g_l⟩ = ⟨f_k, g_l⟩ = 0, as stated by
Theorem 1.24. In fact, we also have ⟨f_k, g_k⟩ = 0 which is not predicted by the theorem but follows
from direct calculation.
Exercise 1.27. Show that the functions (1.77) satisfy ⟨f_k, f_l⟩ = ⟨g_k, g_l⟩ = ⟨f_k, g_l⟩ = 0 for k ≠ l as
well as ⟨f_k, g_k⟩ = 0, relative to the scalar product (1.75).
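A quick numerical check of these orthogonality relations (the grid and the values of k, l below are illustrative):

import numpy as np

x = np.linspace(-np.pi, np.pi, 20000, endpoint=False)
dx = x[1] - x[0]
inner = lambda f, g: np.sum(f * g) * dx        # real functions, so no complex conjugation needed

f = lambda k: np.sin(k * x)
g = lambda k: np.cos(k * x)
print(inner(f(2), f(5)), inner(g(1), g(4)), inner(f(3), g(3)))   # all approximately 0
print(inner(f(2), f(2)))                                         # approximately pi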
The above example leads to the Fourier series which we will discuss in Section 3.
Generalities We begin by defining the ball B_r(v) around any v ∈ V with radius r > 0 by
B_r(v) := {w ∈ V | ‖w − v‖ < r} .
Note that this is the “full” ball including all of the “interior” but, due to the strictly less condition in the
definition, without the bounding sphere.
We would like to consider infinite sequences (v_1 , v_2 , . . . , v_i , . . .) of vectors v_i ∈ V which we also denote
by (v_i)_{i=1}^∞ or simply by (v_i), when the range of the index i is clear from the context.
Definition 1.14. (Convergence) A sequence (v_i)_{i=1}^∞ in a normed vector space V converges to a vector v ∈ V if, for every ε > 0, there exists a positive integer k such that ‖v_i − v‖ < ε for all i > k.
There is a related, but somewhat weaker notion of convergence which avoids talking about the vector the
sequence converges to. Sequences which converge in this weaker sense are called Cauchy sequences and
are defined as follows.
Definition 1.15. (Cauchy sequence) A sequence (v_i)_{i=1}^∞ in a normed vector space V is called a Cauchy
sequence if, for every ε > 0, there exists a positive integer k such that ‖v_i − v_j‖ < ε for all i, j > k. (See
Fig. 1.)
In other words, a sequence is a Cauchy sequence if for every small ε > 0 there is a “tail”, sufficiently far
out, such that the norm between each two vectors in the tail is less than ε. The notions of convergent
sequence and Cauchy sequence are closely related (indeed, every convergent sequence is a Cauchy sequence).
Figure 1: Convergence of a sequence (v_k) to a limit v (left) and Cauchy convergence (right).
Exercise 1.30. Show that every convergent sequence in a normed vector space is also a Cauchy sequence.
(Hint: Use the triangle inequality.)
As an example of a sequence which is not Cauchy, consider the partial sums s_n = ∑_{k=1}^{n} 1/k of the
harmonic series. For any n we have
s_{2^{n+1}−1} − s_{2^n−1} = ∑_{k=2^n}^{2^{n+1}−1} 1/k > 1/2 ,
where the inequality follows from the fact that we have 2^n terms each of which is larger than 1/2^{n+1} .
Choose an ε < 1/2 and, for any k, an integer n with 2^n > k + 1. Then, setting i = 2^n − 1 and
j = 2^{n+1} − 1 we have i, j > k and |s_i − s_j| > 1/2 > ε , so the condition for Cauchy convergence cannot
be satisfied.
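A short numerical illustration of this behaviour (nothing here is needed for the argument above): the gaps s_{2n} − s_n of the harmonic partial sums never drop below 1/2.

import numpy as np

s = np.cumsum(1.0 / np.arange(1, 2**20 + 1))   # s[n-1] = s_n
for n in [10, 1000, 100000, 500000]:
    print(n, s[2*n - 1] - s[n - 1])            # always > 1/2, so (s_n) cannot be Cauchy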
For a series there is also a stronger version of convergence, called absolute convergence.
While every convergent sequence is a Cauchy sequence the opposite is not true. The classical example is
provided by the rational numbers Q (viewed as a normed space, with the absolute modulus as the norm)
and a sequence (qi ) of rational numbers which converges to a real number x ∈ R \ Q. This is clearly a
Cauchy sequence but it does not converge since the prospective limit is not contained in Q (although it
does converge seen as a sequence in R). This example is typical and points to an intuitive understanding
of what non-convergent Cauchy sequences mean. They indicate a deficiency of the underlying normed
vector space which has “holes” and is, as far as convergence properties are concerned, incomplete. This
idea will play an important role for the definition of Banach and Hilbert spaces later on and motivates
the following definition of completeness.
Definition 1.18. A normed vector space is called complete iff every Cauchy sequence converges.
Definition 1.19. (Open and closed sets) Let V be a normed vector space.
A subset U ⊂ V is called open if, for every u ∈ U , there exists an ε > 0 such that B_ε(u) ⊂ U .
A subset C ⊂ V is called closed if V \ C is open.
For an arbitrary subset S ⊂ V , the closure of S, denoted S̄, is the smallest closed set which contains S.
A v ∈ V is called a limit point of a subset S ⊂ V if there is a sequence (vi ), entirely contained in S, which
converges to v.
An open set is simply a set which contains a ball around each of its points while a closed set is the
complement of an open set.
Exercise 1.32. Show that a ball Br (v) in a normed vector space is open for every r > 0 and every v ∈ V .
The ideas of convergence and closed sets relate in an interesting way as stated in the following lemma.
Proposition 1.2. For a normed vector space V and a subset S ⊂ V we have the following statements.
(a) S is closed ⇐⇒ All limit points of S are contained in S.
(b) The closure, S̄, consists of S and all its limit points.
Proof. (a) “⇒”: Assume that S is closed so that its complement U := V \ S is open. Consider a limit
point v of S, with a sequence (vi ) contained in S and converging to v. We need to show that v ∈ S.
Assume that v ∈ U . Since U is open there is a ball B_ε(v) ⊂ U entirely contained in U and v_i ∉ B_ε(v)
for all i. But this means the sequence (vi ) does not converge to v which is a contradiction. Hence, our
assumption that v ∈ U was incorrect and v ∈ S.
(a) “⇐”: Assume all limit points of S are contained in S. We need to show that U := V \ S is open.
Assume that U is not open, so that there is a u ∈ U such that every ball B_ε(u) around u contains a
v with v ∉ U . For every positive integer k, choose such a v_k ∈ B_{1/k}(u) with v_k ∉ U . It is clear that
the sequence (v_k) is entirely contained in S (since it is not in the complement U ) and converges to u ∉ S.
Hence, u is a limit point of S but it is not contained in S. This is a contradiction, so our assumption is
incorrect and U must be open.
(b) Define the set Ŝ = S ∪ {all limit points of S}. Using the result (a) it is straightforward to show that
Ŝ is closed. Hence, Ŝ is a closed set containing S which implies, S̄ being defined as the smallest such set,
that S̄ ⊂ Ŝ. Conversely, since S̄ is closed it must contain by (a) all its limit points, including the limit
points of S, so that Ŝ ⊂ S̄.
Definition 1.20. A set S ⊂ V is called compact iff it is closed and bounded, that is, if there is an R > 0
such that ‖v‖ < R for all v ∈ S.
In the context of Hilbert spaces the ideas of a dense subset and separability will become important.
Definition 1.21. A subset S ⊂ V of a normed vector space V is called dense if S̄ = V . A normed vector
space is called separable iff it has a countable, dense subset, that is, a dense subset of the form (v_i)_{i=1}^∞ .
The relevance of dense subsets can be seen from the following exercise.
Exercise 1.33. For a normed vector space V and a subset S ⊂ V , prove the equivalence of the following
statements.
(i) S is dense in V
(ii) Every v ∈ V is a limit point of S.
Hence, every element of the vector space can be “approximated” from within a dense subset.
Exercise 1.34. Show that every finite-dimensional normed vector space over R (or over C) is separable.
(Hint: Consider a basis (v_i)_{i=1}^{n} and linear combinations ∑_{i=1}^{n} α_i v_i , where α_i ∈ Q (or α_i ∈ Q + iQ).)
An important theoretical statement that will help with some of our later proofs is
Theorem 1.35. (Stone-Weierstrass) The set of polynomials PR ([a, b]) is dense in CR ([a, b]).
Finally, we should briefly discuss other notions of convergence which can be defined for function vector
spaces and are based on point-wise convergence in R or C (with the modulus norm), rather than on the
norm ‖·‖ as in Def. 1.14.
Consider a sequence (f_i)_{i=1}^∞ of real- or complex-valued functions defined on a set U .
We say that the sequence converges point-wise to a function f on U iff the sequence (f_i(x))_{i=1}^∞ converges
to f(x) for all x ∈ U (with respect to the real or complex modulus norm).
We say the sequence converges to f uniformly iff for every ε > 0 there exists an n ∈ N such that |f_i(x) −
f(x)| < ε for all i > n and for all x ∈ U .
Uniform convergence demands that a single value n, specifying the “tail” of the sequence, can be chosen
uniformly for all x ∈ U . Point-wise convergence merely asks for the existence of an n for each point x ∈ U ,
that is, the choice of n can depend on the point x. Hence, uniform convergence is the stronger notion and
it implies point-wise convergence.
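A standard illustration of the difference (a sketch; the example f_i(x) = x^i on U = [0, 1) is a common one): the sequence converges point-wise to 0, but not uniformly, since for every i there is a point where f_i still equals 1/2.

for i in [1, 10, 100, 1000]:
    x_i = 0.5 ** (1.0 / i)          # a point, depending on i, with f_i(x_i) = 1/2
    print(i, 0.5 ** i, x_i ** i)    # f_i(0.5) tends to 0, while f_i(x_i) stays at 0.5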
1.3 Measures and integrals∗
We have already seen that norms and scalar products defined by integrals, as in Eqs. (1.25) and (1.59),
are commonplace in function vector spaces. To understand such spaces properly we need to understand
the integrals and this is where things become tricky. While you do know how to integrate classes of
functions you may not yet have seen the actual definition of the most basic type of integral, the Riemann
integral. To complicate matters further, the Riemann integral is actually not what we need for the present
discussion but we require a rather more general integral - the Lebesgue integral. Introducing either type
of integral properly is labour intense and can easily take up the best part of a lecture course - clearly
not something we can indulge in. On the other hand, these integrals form the background for much of
the discussion that follows. For these reasons, we will try to sketch out the main ideas, starting with the
Riemann integral and then move on to the Lebesgue integral, without going through all the details and
proofs. Along the way, we will introduce the ideas of measures and measure spaces which are very useful
general structures underlying many mathematical constructions.
that is, by simply summing up the areas of the rectangles associated to the segments of the partition. It
can be shown that this integral is well-defined (that is, it is independent of the choice of partition), that
the space, Γ([a, b]) of all piecewise constant functions on [a, b] is a vector space and that the integral (1.79)
is a linear functional on this space.
Exercise 1.36. Show that the space Γ([a, b]) of all piecewise constant functions on the interval [a, b] ⊂ R
is a vector space. Also show that the integral (1.79) is a linear functional on Γ([a, b]).
Of course having an integral for piecewise constant functions is not yet good enough. The idea is to
define the Riemann integral by approximating functions by piecewise constant functions and then taking
an appropriate limit. For two functions f, g : [a, b] → R we say that f ≤ g (f ≥ g) iff f (x) ≤ g(x)
(f (x) ≥ g(x)) for all x ∈ [a, b]. For an arbitrary (but bounded) function f : [a, b] → R we introduce the
upper and lower integral by⁴

  ∫_a^{*b} dx f(x) := inf { ∫_a^b dx φ(x) | φ ∈ Γ([a,b]) , φ ≥ f }      (1.80)
  ∫_{*a}^b dx f(x) := sup { ∫_a^b dx φ(x) | φ ∈ Γ([a,b]) , φ ≤ f } .    (1.81)
Note that these definitions precisely capture the intuition of approximating f by piecewise constant func-
tions, the upper integral by using piecewise constant function “above” f , the lower integral by using
piecewise constant functions “below” f . (See Fig. 2.) After this set-up we are ready to define the Rie-
mann integral.
⁴ For a subset S ⊂ R, the supremum, sup(S), is the smallest number which is greater than or equal to all elements of S.
Likewise, the infimum, inf(S), of S is the largest number which is less than or equal to all elements of S.
Figure 2: Approximation of a function f by lower and upper piecewise constant functions, used in the definition
of the Riemann integral.
Definition 1.23. (Riemann integral) A (bounded) function f : [a, b] → R is called Riemann integrable iff
the upper and lower integrals in Eqs. (1.80) and (1.81) are equal. In this case, the common value is called
the Riemann integral of f and is written as
  ∫_a^b dx f(x) .      (1.82)
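As a rough numerical sketch of this construction (my own illustration, with f = sin on [0, π] and uniform partitions standing in for general piecewise constant approximations), one can compute lower and upper sums and watch them approach a common value:

```python
import numpy as np

def lower_upper_sums(f, a, b, n):
    """Approximate the lower and upper integrals (1.80)/(1.81) with a uniform
    partition a = x_0 < ... < x_n = b, sampling f on each segment."""
    x = np.linspace(a, b, n + 1)
    lo = hi = 0.0
    for i in range(n):
        seg = np.linspace(x[i], x[i + 1], 50)   # estimate inf/sup of f on the segment
        lo += f(seg).min() * (x[i + 1] - x[i])
        hi += f(seg).max() * (x[i + 1] - x[i])
    return lo, hi

for n in [4, 16, 64, 256]:
    lo, hi = lower_upper_sums(np.sin, 0.0, np.pi, n)
    print(f"n = {n:3d}:  lower = {lo:.5f}, upper = {hi:.5f}")   # both tend to 2
```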
This is where the work begins. Now we have to derive all the properties of the Riemann integral (which
you are already familiar with) from this definition. We are content with citing
Theorem 1.37. All continuous and piecewise continuous functions f : [a, b] → R are Riemann integrable.
The proof of the above theorem, along with a proof of all the other standard properties of the Riemann
integral starting from Def. 1.23, can be found in most first year analysis textbooks, see for example [10].
We cannot possibly spend more time on this but hopefully the above set-up gives a clear enough starting
point to pursue this independently (which I strongly encourage you to do).
We have already used integrals, somewhat naively, in the definitions (1.25) and (1.59) of a norm and
a scalar product on the space C([a, b]) of continuous functions. We can now be more precise and think of
these integrals as Riemann integrals in the above sense. Unfortunately, this does not lead to particularly
nice properties. For example, for C([a, b]) with the norm (1.25), there are Cauchy convergent sequences
(of functions) which converge to non-continuous functions.
Exercise 1.38. Find an example of a sequence of functions in C([a, b]) (where you can choose a and b)
which converges, relative to the norm (1.25), to a function not contained in C([a, b]).
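A sketch of the kind of behaviour Exercise 1.38 asks for (one possible choice of sequence, not the unique answer): continuous "ramp" functions on [−1, 1] whose distance to a discontinuous step function, measured by an integral norm such as the one in Eq. (1.25), tends to zero.

```python
import numpy as np

x = np.linspace(-1.0, 1.0, 20001)
dx = x[1] - x[0]
step = np.where(x < 0.0, 0.0, 1.0)              # discontinuous limit function

def ramp(x, n):
    return np.clip(n * x, 0.0, 1.0)             # continuous, ramps up over [0, 1/n]

for n in [2, 10, 100, 1000]:
    dist = np.sqrt(np.sum((ramp(x, n) - step)**2) * dx)   # integral (L^2-type) norm
    print(f"n = {n:5d}:  ||f_n - step|| ~ {dist:.5f}")    # tends to 0
```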
This means that C([a, b]) is not a complete space, in the sense of Def. (1.18). What is worse, it turns
out that even the space of all Riemann integrable functions on [a, b] is not complete, so the deficiency is
with the Riemann integral. Essentially, the problem is that the Riemann integral is based on a too simple
method of approximation, using finite partitions into intervals. To fix this, we need to be able to measure
the “length” of sets which are more complicated than intervals and this leads to the ideas of measures and
measure sets which we now introduce.
1.3.2 Measures and measure sets
This discussion starts fairly general with an arbitrary set X, subsets of which we would like to measure.
Typically, not all such subsets will be suitable for measurement and we need to single out a sufficiently
nice class of subsets Σ, which is called a σ-algebra.
Definition 1.24. (σ-algebra) For a set X, a set of subsets Σ of X is called a σ-algebra if the following is
satisfied.
(S1) {} , X ∈ Σ
(S2) S ∈ Σ ⇒ X \ S ∈ Σ
(S3) S_i ∈ Σ for i = 1, 2, · · · ⇒ ⋃_{i=1}^∞ S_i ∈ Σ
Note that this captures the intuition: we simply multiply the “height” αi of the function over each set Si
with the measure µ(Si ) of this set. The set-up of the general integral on a measure space is summarised
in the following definition.
Definition 1.27. Let (X, Σ, µ) be a measure set. We say a function f : X → R is measurable iff
{x ∈ X | f (x) > α} ∈ Σ for all α ∈ R. For a non-negative measurable function f : X → R the integral is
defined as

  ∫_X f dµ := sup { ∫_X φ dµ | φ simple and 0 ≤ φ ≤ f } .      (1.86)
A function is called integrable iff it is measurable and ∫_X |f| dµ is finite. For a measurable, integrable
function f : X → R the integral is then defined as

  ∫_X f dµ := ∫_X f⁺ dµ − ∫_X f⁻ dµ ,      (1.87)
where f ± (x) := max{±f (x), 0} are the positive and negative “parts” of f . The space of all integrable
functions f : X → R is also denoted by L1 (X). The above construction can be generalised to complex-
valued functions f : X → C by splitting up into real and imaginary parts.
It can be shown that the integral defined above is linear and that the space L1 (X) is a (sub) vector space.
The obvious course of action is to try to make L1 (X) into a normed vector space by using the above
integral to define a norm. However, there is a twist. If there are non-trivial sets S ∈ Σ which are measure
zero, then we have non-trivial functions, for example the characteristic function χS , which integrate to
zero. This is in conflict with the requirement (N1) for a norm in Def. 1.6 which asserts that the zero
vector (that is, the zero function) is the only vector with length zero. Fortunately, this problem can be
fixed by identifying two functions f, g ∈ 𝓛¹(X) if they only differ on a set of measure zero, so

  f ∼ g   :⟺   µ({x ∈ X | f(x) ≠ g(x)}) = 0 .      (1.88)
The space of the so-obtained classes of functions in 𝓛¹(X) is called L¹(X) and this set can be made into a
normed space by defining the norm ‖ · ‖ : L¹(X) → R as

  ‖ f ‖ := ∫_X |f| dµ .      (1.89)
We can generalise this construction and for 1 ≤ p < ∞ define the spaces

  𝓛^p(X) = { f | f is measurable and (∫_X |f|^p dµ)^{1/p} finite } .      (1.90)
On these spaces, we can identify functions as in Eq. (1.88) and the resulting space of classes, Lp (X), can
be made into normed vector spaces with norm
  ‖ f ‖_p := ( ∫_X |f|^p dµ )^{1/p} .      (1.91)
Exercise 1.39. Show that Eq. (1.91) defines a norm on L^p(X). (Hint: To show (N3) use Minkowski’s
inequality (1.24).)
The all-important statement about the normed vector spaces Lp (X) is the following.
Theorem 1.40. The normed vector spaces Lp (X) in Eq. (1.90) with norm (1.91) are complete.
Our previous experience suggests that the space L²(X) can in fact be given the structure of an inner
product vector space. To do this we need the following
Exercise 1.41. Show that for f, g ∈ L2 (X) it follows that f¯g is integrable, that is f¯g ∈ L1 (X). (Hint:
Use Hölder’s inequality in Eq. (1.24).)
Hence, for two functions f, g ∈ L²(X), the prospective scalar product

  ⟨f, g⟩ := ∫_X f* g dµ      (1.92)
is well-defined.
Exercise 1.42. Show that Eq. (1.92) defines a scalar product on L2 (X).
Recall that X is still an arbitrary set so the above construction of measure sets and integrals is very
general. In particular, the completeness statement of Theorem 1.40 is quite powerful. It says that the spaces
L^p(X) behave nicely in terms of convergence properties - every Cauchy sequence converges. This is quite
different from what we have seen for the Riemann integral. We should now exploit this general construction
by discussing a number of examples.
In this case, Ω is called the sample space, Σ the event space and p the probability measure.
Comparing this definition with Def. 1.25 shows that a probability space (Ω, Σ, p) is, in fact, a particular
measure space with a few additional properties for p, in order to make it a suitable measure for proba-
bility. (The condition (M1) in Def. 1.25, µ({}) = 0, can be deduced from the Kolmogorov axioms.) The
measurable functions f : Ω → R on this space are also called random variables and the integral

  E[f] := ∫_Ω f dp      (1.94)

is called the expectation value of the random variable f.
Counting measure: For the measure space (N, Σ_c, µ_c) with counting measure µ_c, the associated spaces
L^p(N) are the sequence spaces ℓ^p of all sequences (x_i)_{i=1}^∞ in R (or C) with Σ_{i=1}^∞ |x_i|^p finite,
which are normed vector spaces with norm
  ‖ (x_i) ‖_p = ( Σ_{i=1}^∞ |x_i|^p )^{1/p} .      (1.96)
Recall that we know from Theorem 1.40 that the spaces ℓ^p are complete, relative to this norm. The space
ℓ² is an inner product space with scalar product
  ⟨(x_i), (y_i)⟩ = Σ_{i=1}^∞ x̄_i y_i .      (1.97)
Lebesgue measure: The Lebesgue measure provides a measure on R (and, more generally, on Rn ) but
constructing it takes some effort and time. Instead we take a short-cut and simply state the following
theorem.
Theorem 1.43. There is a σ-algebra ΣL on R and a measure µL on ΣL , called the Lebesgue measure,
with the following properties.
(L1) All intervals [a, b] ∈ ΣL .
(L2) µL ([a, b]) = b − a
(L3) The sets S of measure zero in Σ_L are characterised as follows: for any ε > 0 there are intervals
[a_i, b_i], where i = 1, 2, · · · , such that S ⊂ ⋃_{i=1}^∞ [a_i, b_i] and Σ_{i=1}^∞ (b_i − a_i) < ε.
The measure space (R, ΣL , µL ) is uniquely characterised by these properties.
Note that the Lebesgue measure leads to non-trivial sets with measure zero. For example, any finite set
of points in R has measure zero.
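As a tiny illustration of property (L3) (with an assumption of mine for the example, S = {1, 1/2, 1/3, ...}, truncated to finitely many points), one can cover the i-th point by an interval of length ε/2^{i+1}, so that the total length of the cover stays below ε:

```python
eps = 1e-3
points = [1.0 / n for n in range(1, 21)]                  # a countable set (truncated here)
lengths = [eps / 2**(i + 1) for i in range(len(points))]  # length of the interval around the i-th point
intervals = [(p - l / 2, p + l / 2) for p, l in zip(points, lengths)]
print("number of intervals:", len(intervals))
print("total length of the cover:", sum(lengths), "< eps =", eps)
```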
The associated spaces Lp (U ), obtained after the identification (1.88) of functions which only differ on sets
of measure zero, are complete normed vector spaces with norm
  ‖ f ‖_p = ( ∫_U dx |f(x)|^p )^{1/p} .      (1.100)
The space L²(U) is an inner product vector space with inner product

  ⟨f, g⟩ = ∫_U dx f(x)* g(x) .      (1.101)
As for the relation between the Riemann and the Lebesgue integrals we have
Theorem 1.45. Every Riemann-integrable function is Lebesgue integrable and for such functions the two
integrals are equal.
This means that for practical calculations with the Lebesgue integral we can use all the usual rules of
integration, as long as the integrand is sufficiently “nice” (for example, Riemann integrable). While the
Riemann-integrable functions are included in the Lebesgue-integrable ones, the latter set is much larger
and this facilitates the completeness properties associated to the Lebesgue integral. In the following, when
we write an integral, we usually refer to the Lebesgue integral.
2 Banach and Hilbert spaces∗
Banach and Hilbert spaces are central objects in functional analysis and their systematic mathematical
study easily fills a lecture course. Clearly, we cannot afford to do this so we will focus on basics and some
of the results relevant to our later applications. Proofs are provided explicitly only when they can be
provided in a concise fashion and references will be given whenever proofs are omitted. Our main focus
will be on Hilbert spaces which provide the arena of most of the applications discussed later and indeed
are the correct setting for quantum mechanics. We begin with the more basic notion of Banach spaces,
then move on to Hilbert spaces and finish with a discussion of operators on Hilbert spaces.
Recall from Def. 1.18 that completeness means convergence of every Cauchy sequence, so a Banach space is a
vector space with a basic notion of geometry, as provided by the norm, and good convergence properties.
We have already encountered several important examples of Banach spaces which we recall.
• For a measure space (X, Σ, µ), where X is an arbitrary set, we have defined the Banach spaces L^p(X)
in Eq. (1.90). It consists of all measurable functions f : X → R (or f : X → C) with (∫_X |f|^p dµ)^{1/p}
finite and the norm is given in Eq. (1.91). Completeness of these normed vector spaces is asserted by
Theorem 1.40. This is a very large class of Banach spaces which includes many interesting examples,
some of which we list now.
• Associated to the measure space (N, Σ_c, µ_c) with counting measure µ_c, introduced in the previous
sub-section, we have the space ℓ^p of all sequences (x_i)_{i=1}^∞ in R (or C) with (Σ_{i=1}^∞ |x_i|^p)^{1/p} finite. The
norm on this space is provided by Eq. (1.96) and Theorem 1.40 guarantees completeness.
• For U ⊂ R^n with the Lebesgue measure we have the spaces L^p(U). (Here, functions which only differ
on sets of measure zero have been identified according to Eq. (1.88).) The norm on these spaces is given by Eq. (1.100) and completeness is
guaranteed by Theorem 1.40. We will sometimes write L^p_R(U) or L^p_C(U) to indicate whether we are
talking about real or complex valued functions.
An important theoretical property for the Banach spaces Lp ([a, b]) which we will need later is
Theorem 2.1. The space C([a, b]) is dense in Lp ([a, b]). Further, the space Cc∞ (Rn ) is dense in Lp (Rn ).
2.2 Hilbert spaces
Hilbert spaces are defined as follows.
Definition 2.2. An inner product vector space H is called a Hilbert space if it is complete (relative to the
norm associated to the scalar product).
We know that the Banach spaces given in the previous sub-section can be equipped with a scalar product
when p = 2 and this provides us with examples of Hilbert spaces.
• For the measure set (X, Σ, µ), the space L2 (X), defined in Eq. (1.90) is an inner product vector
space with inner product given by Eq.(1.92). We already know that this is a Banach space (relative
to the norm associated to the scalar product), so L2 (X) is complete and, hence, a Hilbert space.
• Associated to the measure space (N, Σ_c, µ_c) with counting measure µ_c we have the space ℓ² of all
sequences (x_i)_{i=1}^∞ in R (or C) with (Σ_{i=1}^∞ |x_i|²)^{1/2} finite. An inner product on this space is given by
Eq. (1.97). Since ℓ² is a Banach space it is complete and is, hence, also a Hilbert space.
• There is a useful generalisation of the previous example which we will need later. On an interval
[a, b] ⊂ R introduce an everywhere positive, integrable function w : [a, b] → R>0 , called the weight
function, and define the space 𝓛²_w([a, b]) as the space of measurable functions f : [a, b] → R with
(∫_{[a,b]} dx w(x)|f(x)|²)^{1/2} finite. We can introduce

  ⟨f, g⟩ := ∫_{[a,b]} dx w(x) f(x)* g(x) .      (2.1)
With the usual identification of functions, as in Eq. (1.88), this leads to a Hilbert space, called
L2w ([a, b]), with scalar product (2.1).
2.2.2 Orthogonal basis
We have seen that an ortho-normal basis for a finite-dimensional Hilbert space is really the most convenient
tool to carry out calculations. We should now discuss the concept of ortho-normal basis for infinite-
dimensional Hilbert spaces. One question we need to address first is what happens when we take a limit
inside one of the arguments of the scalar product.
Lemma 2.1. For a convergent sequence (v_i)_{i=1}^∞ in a Hilbert space H and any vector w ∈ H we have
lim_{i→∞} ⟨w, v_i⟩ = ⟨w, lim_{i→∞} v_i⟩. A similar statement applies to the first argument of the scalar product.
Proof. Set v := lim_{i→∞} v_i and consider the inequality |⟨w, v_i⟩ − ⟨w, v⟩| = |⟨w, v_i − v⟩| ≤ ‖w‖ ‖v_i − v‖ ,
where the last step follows from the Cauchy-Schwarz inequality (1.28). Convergence of (v_i) to v means
we can find, for each ε > 0, a k such that ‖v_i − v‖ < ε/‖w‖ for all i > k. This implies that |⟨w, v_i⟩ − ⟨w, v⟩| < ε
for all i > k and, hence, that lim_{i→∞} ⟨w, v_i⟩ = ⟨w, v⟩. The analogous statement for the first argument of
the scalar product follows from the above by using the property (S1) in Def. 1.9.
The above lemma says that a limit can be “pulled out” of the arguments of a scalar product, an important
property which we will use frequently. Another technical statement we require asserts the existence and
uniqueness of a point of minimal distance.
Lemma 2.2. (Nearest point to a subspace) Let W be a closed, non-trivial sub vector space of a Hilbert
space H. Then, for every v ∈ H there is a unique w0 ∈ W such that
  ‖ v − w_0 ‖ ≤ ‖ v − w ‖      (2.4)
for all w ∈ W .
Proof. We set δ := inf{ ‖v − w‖ | w ∈ W } and choose a sequence (w_i)_{i=1}^∞ contained in W with
lim_{i→∞} ‖v − w_i‖ = δ. We want to prove that this sequence is a Cauchy sequence so we consider
  ‖w_i − w_j‖² = ‖(w_i − v) + (v − w_j)‖² = 2‖w_i − v‖² + 2‖v − w_j‖² − 4‖½(w_i + w_j) − v‖² ,      (2.5)
where the parallelogram law (1.29) has been used in the last step. Since W is a sub vector space it
is clear that the vector ½(w_i + w_j) which appears in the last term above is in W. This means that
‖½(w_i + w_j) − v‖ ≥ δ and we have

  ‖w_i − w_j‖² ≤ 2‖w_i − v‖² + 2‖v − w_j‖² − 4δ² .      (2.6)
The RHS of this inequality goes to zero as i, j → ∞ which shows that (w_i) is indeed a Cauchy sequence.
Since a Hilbert space is complete we know that every Cauchy sequence converges and we set w_0 :=
lim_{i→∞} w_i. Since W is assumed to be closed it follows that w_0 ∈ W and, with Lemma 2.1, we have
‖v − w_0‖ = δ. This means that w_0 is indeed a point in W of minimal distance to v.
It remains to show that w_0 is unique. For this assume there is another w̃ ∈ W such that ‖v − w̃‖ = δ.
Then repeating the calculation (2.5) with w_i and w_j replaced by w_0 and w̃ we have

  ‖w_0 − w̃‖² = 2‖w_0 − v‖² + 2‖v − w̃‖² − 4‖½(w_0 + w̃) − v‖² ≤ 2δ² + 2δ² − 4δ² = 0 ,      (2.7)

so that w̃ = w_0.
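A finite-dimensional sketch of Lemma 2.2 and Theorem 2.2 (hypothetical data of my own, with H = R³ and the standard scalar product): the nearest point w_0 of a subspace W to a vector v is the orthogonal projection of v onto W, and v − w_0 lies in W^⊥.

```python
import numpy as np

A = np.array([[1.0, 0.0],        # W = span of the columns of A (closed, since finite-dimensional)
              [1.0, 1.0],
              [0.0, 2.0]])
v = np.array([3.0, -1.0, 2.0])

c, *_ = np.linalg.lstsq(A, v, rcond=None)   # minimises ||v - A c||, so w0 = A c is the nearest point
w0 = A @ c

print("w0           =", w0)
print("||v - w0||   =", np.linalg.norm(v - w0))
print("A^T (v - w0) =", A.T @ (v - w0))     # ~ 0, i.e. v - w0 is orthogonal to W
```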
The above Lemma is the main technical result needed to prove the following important statement about
direct sum decompositions in Hilbert spaces.
Theorem 2.2. (Direct sum decomposition) For any closed sub vector space W of a Hilbert space H we
have H = W ⊕ W ⊥ .
Proof. For W = {0} we have W^⊥ = H so that the statement is true. Now assume that W ≠ {0}. For
any v ∈ H we can choose, from Lemma 2.2, a “minimal distance” w0 ∈ W and write
v = w0 + (v − w0 ) . (2.8)
This is our prospective decomposition so we want to show that v − w_0 ∈ W^⊥. To do this we assume that
v − w_0 ∉ W^⊥ so that there is a u ∈ W such that ⟨v − w_0, u⟩ = 1. A short calculation for any α ∈ R shows
that

  ‖v − w_0 − αu‖² = ‖v − w_0‖² − 2α + α²‖u‖² .      (2.9)

For sufficiently small α > 0 the sum of the last two terms on the RHS is negative so in this case ‖v − w_0 − αu‖ <
‖v − w_0‖. This contradicts the minimality property of w_0 and we conclude that indeed v − w_0 ∈ W^⊥.
Hence, we have H = W + W ⊥ . That this sum is direct has already been shown in Exercise 1.16.
One of our goals is to obtain a generalisation of the formula (1.31) to the infinite-dimensional case. Recall
that the sequence (ε_i)_{i=1}^∞ is called an ortho-normal system iff it satisfies ⟨ε_i, ε_j⟩ = δ_ij. We need to worry
about the convergence properties of such ortho-normal systems and this is covered by the following
Lemma 2.3. (Bessel inequality) Let (ε_i)_{i=1}^∞ be an ortho-normal system in a Hilbert space H and v ∈ H.
Then we have the following statements.
(i) Σ_{i=1}^∞ |⟨ε_i, v⟩|² converges and Σ_{i=1}^∞ |⟨ε_i, v⟩|² ≤ ‖v‖².
(ii) Σ_{i=1}^∞ ⟨ε_i, v⟩ ε_i converges.
Proof. (i) Introduce the partial sums s_k = Σ_{i=1}^k ⟨ε_i, v⟩ ε_i. A short calculation shows that

  ‖v − s_k‖² = ‖v‖² − Σ_{i=1}^k |⟨ε_i, v⟩|² ,      (2.10)
We are now ready to tackle (infinite) ortho-normal systems in a Hilbert space H. We recall that an
ortho-normal system is a sequence (ε_i)_{i=1}^∞ with ⟨ε_i, ε_j⟩ = δ_ij. By the span, Span(ε_i)_{i=1}^∞, we mean the
sub-space which consists of all finite linear combinations of vectors ε_i. The following theorem provides
the basic statements about ortho-normal systems.
Theorem 2.3. Let (ε_i)_{i=1}^∞ be an ortho-normal system in a Hilbert space H. Then, the following statements
are equivalent.
(i) Every v ∈ H can be written as v = Σ_{i=1}^∞ ⟨ε_i, v⟩ ε_i.
(ii) For every v ∈ H we have ‖v‖² = Σ_{i=1}^∞ |⟨ε_i, v⟩|².
(iii) If ⟨ε_i, v⟩ = 0 for all i = 1, 2, · · · then v = 0.
(iv) The closure of Span(ε_i)_{i=1}^∞ equals H.
Proof. For a statement of this kind it is sufficient to show that (i) ⇒ (ii) ⇒ (iii) ⇒ (iv) ⇒ (i) and we will
proceed in this order.
(i) ⇒ (ii): This follows easily by inserting the infinite sum from (i) into ‖v‖² = ⟨v, v⟩ and using
Lemma 2.1.
(ii) ⇒ (iii): If ⟨ε_i, v⟩ = 0 for all i then the relation in (ii) implies that ‖v‖ = 0. Since the zero vector is
the only one with norm zero it follows that v = 0.
(iii) ⇒ (iv): Set W equal to the closure of Span(ε_i)_{i=1}^∞, so W is a closed sub
vector space. Then, from Theorem 2.2, we know that H = W ⊕ W^⊥. Assume that W^⊥ ≠ {0}. Then we
have a non-zero vector v ∈ W^⊥ such that ⟨ε_i, v⟩ = 0 for all i, in contradiction of the statement (iii). This
means that W^⊥ = {0} and, hence, H = W.
(iv) ⇒ (i): We know from Lemma 2.3 that w := Σ_{i=1}^∞ ⟨ε_i, v⟩ ε_i is well-defined. A short calculation shows
that ⟨v − w, ε_j⟩ = 0 for all j. This means that v − w ∈ W^⊥ (using the earlier definition of W) but since
H = W from (iv) we have W^⊥ = {0} and, hence, v = w.
After this preparation, we can finally define an ortho-normal basis of a Hilbert space.
Definition 2.3. An ortho-normal system (ε_i)_{i=1}^∞ in a Hilbert space H is called an ortho-normal basis if
it satisfies any of the conditions in Theorem 2.3.
Thanks to Theorem 2.3 an ortho-normal basis on a Hilbert space provides us with the desired generalisa-
tions of the formulae we have seen in the finite-dimensional case. Eq. (1.31) for the expansion of a vector
in terms of an ortho-normal basis simply generalises to
  v = Σ_{i=1}^∞ α_i ε_i   ⟺   α_i = ⟨ε_i, v⟩ ,      (2.12)
and we know that the now infinite sum always converges to the vector v. For two vectors v = Σ_i α_i ε_i
and w = Σ_j β_j ε_j we have the generalisation of Eq. (1.32)

  ⟨v, w⟩ = Σ_{i=1}^∞ ⟨v, ε_i⟩⟨ε_i, w⟩ = Σ_{i=1}^∞ α_i* β_i ,   ‖v‖² = Σ_{i=1}^∞ |⟨ε_i, v⟩|² = Σ_{i=1}^∞ |α_i|² ,      (2.13)
where, in the first equation, we have also used Lemma 2.1 to pull the infinite sums out of the scalar
product. The map v ↦ (α_i)_{i=1}^∞ defined by Eq. (2.12) is, in fact, a vector space isomorphism H → ℓ² from
our general Hilbert space into the Hilbert space `2 of sequences. Moreover, as Eq. (2.13) shows, this map
is consistent with the scalar products defined on those two Hilbert spaces. The last equation (2.13) which
allows calculating the norm of a vector in terms of an infinite sum over the square of its coordinates is
also referred to as Parseval’s equation. As we will see, for specific examples it can lead to quite non-trivial
relations.
Recall the situation in the finite-dimensional case. Finite-dimensional (inner product) vector spaces
over R or C are isomorphic to Rn or Cn and via this identification abstract vectors are described by
(coordinate) column vectors. If an ortho-normal basis underlies the identification then the scalar product
in coordinates is described by the standard scalar product in Rn or Cn (see Eq. (1.32)).
We have now seen that the situation is very similar for infinite-dimensional Hilbert spaces but vectors
in Rn or Cn are replaced by infinite sequences in `2 . However, there is still a problem. While we now
appreciate the usefulness of an ortho-normal basis in a Hilbert space we do not actually know yet whether
it exists.
We recall that the Hilbert space ℓ² which we have introduced earlier consists of sequences (x_i)_{i=1}^∞ (where
x_i ∈ R or x_i ∈ C) with ( Σ_{i=1}^∞ |x_i|² )^{1/2} < ∞ and has the scalar product

  ⟨(x_i), (y_i)⟩ = Σ_{i=1}^∞ x_i* y_i .      (2.14)
It is not difficult to show that ℓ² has an ortho-normal basis. Introduce the sequences

  e_i = (0, . . . , 0, 1, 0, . . . , 0, . . .)      (2.15)

with a 1 in position i and zeros everywhere else. These are obviously the infinite-dimensional analogues of
the standard unit vectors in R^n or C^n. It is easy to see that they are an ortho-normal system relative to
the scalar product (2.14). The following theorem states that they form an ortho-normal basis of ℓ².
Theorem 2.4. The “standard unit sequences” e_i, defined in Eq. (2.15), form an ortho-normal basis of
the Hilbert space ℓ².
Proof. For a sequence (x_i) ∈ ℓ² we have, from direct calculation, that

  ‖ (x_i) ‖² = Σ_{j=1}^∞ |x_j|² = Σ_{j=1}^∞ |⟨e_j, (x_i)⟩|² ,      (2.16)
2.2.3 Dual space
In Eq. (1.33) we have introduced the map ı from a vector space to its dual and we would now like to
discuss this map in the context of a Hilbert space. So we have ı : H → H* defined as

  ı(v)(w) := ⟨v, w⟩ .      (2.18)
This map assigns to every vector v ∈ H a functional ı(v) in the dual H∗ . We have seen that it is always
injective and that, in the finite-dimensional case, every functional can be obtained in this way, so ı is
bijective. The following theorem asserts that the last statement continues to be true for Hilbert spaces.
Theorem 2.6. (Riesz) Let H be a Hilbert space and ϕ ∈ H∗ a functional. Then there exists a v ∈ H
such that ϕ = ı(v), where ı is the map defined in Eq. (2.18).
Proof. If ϕ(w) = 0 for all w ∈ H then we have ϕ = ı(0). Let us therefore assume that Ker(ϕ) 6= H. Then
(Ker(ϕ))^⊥ ≠ {0} and there exists a u ∈ (Ker(ϕ))^⊥ with ϕ(u) = 1. It follows that
where v = u/‖u‖².
We already know that the map ı is injective so this theorem tells us that ı : H → H∗ is an isomorphism
just as it is in the finite-dimensional case. This is why it makes sense to generalise Dirac notation
to a Hilbert space. If our Hilbert space has an ortho-normal basis |ε_i⟩, then Eq. (2.12) can be written in
Dirac notation as

  |v⟩ = Σ_{i=1}^∞ |ε_i⟩⟨ε_i|v⟩ ,      (2.23)
where ⟨·|·⟩_{ℓ²} denotes the standard scalar product (1.97) on ℓ². Hence, the above map H → ℓ² preserves
the scalar product. Also the identity operator can, at least formally, be written as

  id = Σ_{i=1}^∞ |ε_i⟩⟨ε_i| .      (2.25)
To pursue this somewhat further consider a linear operator T̂ : H → H, a vector v ∈ H and an ortho-
normal basis (ε_i) of H so that the operator has matrix elements T_ij = ⟨ε_i|T̂|ε_j⟩ and the vector has
coordinates a_i = ⟨ε_i|v⟩. Then the action T̂|v⟩ of the operator on the vector has coordinates b_i := ⟨ε_i|T̂|v⟩
which can be written as

  b_i = ⟨ε_i|T̂|v⟩ = Σ_j ⟨ε_i|T̂|ε_j⟩⟨ε_j|v⟩ = Σ_j T_ij a_j .      (2.26)
In this way, the Hilbert space action of an operator on a vector is turned into a multiplication of a matrix
and a vector (although infinite-dimensional) in `2 . Similarly, composition of operators in H turns into
“matrix multiplication” in `2 as in the following exercise.
Exercise 2.7. For two operators T̂, Ŝ : H → H with matrix elements T_ij = ⟨ε_i|T̂|ε_j⟩ and S_ij = ⟨ε_i|Ŝ|ε_j⟩
show that the matrix elements of T̂ ◦ Ŝ are given by ⟨ε_i|T̂ ◦ Ŝ|ε_j⟩ = Σ_k T_ik S_kj.
The correspondence between a Hilbert space H with ortho-normal basis and the space of sequences `2
outlined above is the essence of how quantum mechanics in the operator formulation relates to matrix
mechanics.
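A small sketch of Eq. (2.26) and Exercise 2.7 with truncated matrices (randomly generated, purely illustrative): the action of an operator becomes matrix-vector multiplication of the coordinates, and composition of operators becomes matrix multiplication.

```python
import numpy as np

N = 6
rng = np.random.default_rng(0)
T = rng.normal(size=(N, N))      # truncated matrix elements T_ij = <eps_i|T|eps_j>
S = rng.normal(size=(N, N))      # a second operator, as in Exercise 2.7
a = rng.normal(size=N)           # coordinates a_j = <eps_j|v>

b = T @ a                        # coordinates b_i = sum_j T_ij a_j of T|v>, Eq. (2.26)
print("b =", np.round(b, 3))
print("composition consistent:", np.allclose((T @ S) @ a, T @ (S @ a)))
```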
  ⟨T†v, w⟩ = ⟨v, Tw⟩ ,      (2.27)
so that the so-defined T † does indeed have the required property for the adjoint. By tracing through the
above construction it is easy to show that T † is linear. We still need to show that it is bounded.
  ‖T†v‖² = ⟨T†v, T†v⟩ = ⟨v, TT†v⟩ ≤ ‖v‖ ‖TT†v‖ ≤ ‖v‖ ‖T‖ ‖T†v‖ .      (2.30)
If ‖T†v‖ = 0 the boundedness condition is trivially satisfied. Otherwise, we can divide by ‖T†v‖ and
obtain ‖T†v‖ ≤ ‖v‖ ‖T‖.
  ⟨v, Tw⟩ = ⟨Tv, w⟩ ,      (2.31)

for all v, w ∈ H, that is, if it can be moved from one argument of the scalar product to the other.
Comparison with Theorem 2.8 shows that T is self-adjoint iff T = T†.
and we can introduce the “shifted” operators q := Q − Q̄ and p := P − P̄ which also satisfy [q, p] = i.
The variances of Q and P are defined by (ΔQ)² := ‖qψ‖² and (ΔP)² := ‖pψ‖², so that the Cauchy-Schwarz
inequality gives

  ΔQ ΔP = ‖qψ‖ ‖pψ‖ ≥ |⟨qψ|pψ⟩| = |⟨ψ|qp|ψ⟩|
  ΔQ ΔP = ‖pψ‖ ‖qψ‖ ≥ |⟨pψ|qψ⟩| = |⟨ψ|pq|ψ⟩| .

Adding these two inequalities and using the triangle inequality as well as [q, p] = i, it follows that
2 ΔQ ΔP ≥ |⟨ψ|qp|ψ⟩| + |⟨ψ|pq|ψ⟩| ≥ |⟨ψ|[q, p]|ψ⟩| = 1 for a normalised ψ and, hence,

  ΔQ ΔP ≥ 1/2 ,      (2.32)

which is, of course, Heisenberg’s uncertainty relation. The lesson from this derivation is that the
uncertainty relation is really quite general. All it requires is two hermitian operators Q and P with
commutator [Q, P] = i and then it follows more or less directly from the Cauchy-Schwarz inequality.
ᵃ For simplicity we set ℏ to one.
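A numerical sanity check of (2.32) (a sketch with ℏ = 1 and a discretised, real wave function of my own choosing): for a normalised ψ one can estimate ΔQ = ‖qψ‖ from ∫(x − x̄)²|ψ|² dx and ΔP = ‖pψ‖ from ∫|ψ′|² dx, and the product stays at or above 1/2.

```python
import numpy as np

x = np.linspace(-20.0, 20.0, 4001)
dx = x[1] - x[0]
sigma = 1.7
psi = np.exp(-x**2 / (4 * sigma**2))
psi /= np.sqrt(np.sum(psi**2) * dx)                  # normalise

x_mean = np.sum(x * psi**2) * dx
dQ = np.sqrt(np.sum((x - x_mean)**2 * psi**2) * dx)  # Delta Q = ||q psi||

dpsi = np.gradient(psi, dx)                          # psi'(x); <p> = 0 for a real psi
dP = np.sqrt(np.sum(dpsi**2) * dx)                   # Delta P = ||p psi|| = <p^2>^(1/2)

print("Delta Q * Delta P =", dQ * dP, " (>= 0.5; = 0.5 for a Gaussian, up to grid error)")
```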
We note that a compact operator is bounded. If it were not bounded, there would be a sequence (v_k) with
‖v_k‖ = 1 and ‖T v_k‖ > k and, in this case, the sequence (T v_k) cannot have a convergent sub-sequence,
contradicting compactness. This means, from Theorem 2.8, that compact operators always have an adjoint.
In fact, we will be focusing on self-adjoint, compact operators and their eigenvalues and eigenvectors
have the following properties.
Theorem 2.9. Let T : H → H be a compact, self-adjoint operator on a (separable) Hilbert space H. Then
we have the following statements:
(i) The eigenvalues of T are real and eigenvectors for different eigenvalues are orthogonal.
(ii) The set of non-zero eigenvalues of T is either finite or it is a sequence which tends to zero.
(iii) Each non-zero eigenvalue has a finite degeneracy.
(iv) There is an ortho-normal system (ε_k) of eigenvectors with non-zero eigenvalues λ_k which forms a
basis of the closure Im(T)‾ of the image of T. Further, we have Im(T)‾ = Ker(T)^⊥ and H = Im(T)‾ ⊕ Ker(T).
(v) The operator T has the representation

  T v = Σ_k λ_k ⟨ε_k, v⟩ ε_k .      (2.33)
Proof. We have already shown (i). For the proof of (ii) and (iii) see, for example, Ref. [5]. The ortho-
normal system (ε_k) in (iv) is constructed by applying the Gram-Schmidt procedure to each eigenspace
Ker(T − λ id) with λ ≠ 0 which, from (iii), is finite-dimensional. The proof that the vectors (ε_k) form an
ortho-normal basis of Im(T)‾ can be found in Ref. [5].
To show the formula in (v) we set W = Im(T)‾ and note, from Theorem 2.2, that H = W ⊕ W^⊥ so
that every v ∈ H can be written as v = w + u, where w ∈ W and u ∈ W^⊥. Since u ∈ W^⊥ we have
0 = ⟨T x, u⟩ = ⟨x, T u⟩ for all x ∈ H and this means that T u = 0. (Hence, W^⊥ ⊂ Ker(T) and the reverse
inclusion is also easy to show so W^⊥ = Ker(T).) The other component, w ∈ W, can be written as

  w = Σ_k ⟨ε_k, w⟩ ε_k ,      (2.34)
The ortho-normal system (ε_k) from the above theorem provides a basis for Im(T)‾ but not necessarily
for all of H. This is because the vectors ε_k correspond to non-zero eigenvalues and we are missing the
eigenvectors with zero eigenvalue, that is, the kernel of T. Fortunately, from part (iv) of Theorem 2.9 we
have the decomposition

  H = Im(T)‾ ⊕ Ker(T) ,   Im(T)‾ = Ker(T)^⊥ ,      (2.36)

so we can complete (ε_k) to a basis of H by adding a basis for Ker(T). In conclusion, for a compact,
self-adjoint operator on a Hilbert space, we can always find an ortho-normal basis of the Hilbert space
consisting of eigenvectors of the operator. In Dirac notation and dropping the argument v, Eq. (2.33) can
be written as

  T = Σ_k λ_k |ε_k⟩⟨ε_k| ,      (2.37)

where T|ε_k⟩ = λ_k|ε_k⟩. This is the generalisation of the finite-dimensional result (1.46).
2.3.3 The Fredholm alternative
Suppose, for a compact, self-adjoint operator T : H → H, a given u ∈ H and a constant λ ≠ 0, we would
like to solve the equations
(T − λ id)v = 0 , (T − λ id)v = u . (2.38)
It turns out that many of the differential equations we will consider later can be cast in this form. The right
Eq. (2.38) is an inhomogeneous linear equation and the equation on the left-hand side is its homogeneous
counterpart. Clearly, we have
{solutions of homogeneous equation} = Ker(T − λ id) . (2.39)
We also know, if a solution v0 of the inhomogeneous equation exists, then its general solution is given by
v0 + Ker(T − λ id). There are two obvious cases we should distinguish.
(a) The number λ does not equal any of the eigenvalues of T . In this case Ker(T − λ id) = {0} so that
the homogeneous equation in (2.38) only has the trivial solution.
(b) The number λ does equal one of the eigenvalues of T so that Ker(T − λ id) ≠ {0} and the homogeneous
equation in Eq. (2.38) does have non-trivial solutions.
The above case distinction is called the Fredholm alternative. Of course we would like to discuss the
solutions to the inhomogeneous equation in either case. The obvious way to proceed is to start with an
ortho-normal basis (ε_k) of eigenvectors of T with corresponding eigenvalues λ_k, so that T ε_k = λ_k ε_k (here,
we include the eigenvectors with eigenvalue zero), expand v and u in terms of this basis

  v = Σ_k ⟨ε_k, v⟩ ε_k ,   u = Σ_k ⟨ε_k, u⟩ ε_k ,      (2.40)
and use the representation (2.33) of the operator T . Inserting all this into the inhomogeneous Eq. (2.38)
gives
  (T − λ id)v = Σ_i (λ_i − λ) ⟨ε_i, v⟩ ε_i = Σ_i ⟨ε_i, u⟩ ε_i .      (2.41)
Taking the inner product of this equation with a basis vector ε_k leads to

  (λ_k − λ) ⟨ε_k, v⟩ = ⟨ε_k, u⟩ ,      (2.42)
for all k. Now let us consider the two cases above. In case (a), λ does not equal any of the eigenvalues λk
and we can simply solve Eq. (2.42) for all k to obtain
  ⟨ε_k, v⟩ = ⟨ε_k, u⟩ / (λ_k − λ)   ⇒   v = Σ_k [⟨ε_k, u⟩ / (λ_k − λ)] ε_k .      (2.43)
This result means that in case (a) we have a unique solution v, as given by the above equation, for any
inhomogeneity u. The situation is more complicated in case (b). In this case λ equals λk for some k and
for such cases the LHS of Eq. (2.42) vanishes. This means in this case we have a solution if and only if
  ⟨ε_k, u⟩ = 0 for all k with λ_k = λ .      (2.44)
Another way of stating this condition is to say that we need the inhomogeneity u to be perpendicular to
Ker(T − λ id). If this condition is satisfied, the solution can be written as

  v = Σ_{k : λ_k ≠ λ} [⟨ε_k, u⟩ / (λ_k − λ)] ε_k + Σ_{k : λ_k = λ} α_k ε_k ,      (2.45)
where the αk are arbitrary numbers and the second sum in this expression of course represents a general
element of Ker(T − λ id). This discussion can be summarised in the following
Theorem 2.10. (Fredholm alternative) Let T : H → H be a compact, self-adjoint operator on a Hilbert
space H with a basis of eigenvectors ε_k with associated eigenvalues λ_k, u ∈ H and λ ≠ 0. For the solution
to the equation

  (T − λ id)v = u      (2.46)

the following alternative holds:
(a) The number λ is different from all eigenvalues λ_k. Then the equation (2.46) has a unique solution for
all u ∈ H given by

  v = Σ_k [⟨ε_k, u⟩ / (λ_k − λ)] ε_k .      (2.47)
(b) The number λ equals one of the eigenvalues. In this case, a solution exists if and only if ⟨ε_k, u⟩ = 0
for all k with λ_k = λ, that is, iff u is perpendicular to Ker(T − λ id); the general solution is then given by
Eq. (2.45).
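A finite-dimensional sketch of the theorem (a hypothetical symmetric matrix standing in for a compact, self-adjoint operator): in case (a) the solution can be assembled from the eigenbasis exactly as in Eq. (2.47).

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(5, 5))
T = (A + A.T) / 2                        # symmetric, hence self-adjoint
lam_k, eps = np.linalg.eigh(T)           # eigenvalues lam_k, ortho-normal eigenvectors (columns)
u = rng.normal(size=5)

lam = 0.3                                # case (a): generically not an eigenvalue
v = eps @ ((eps.T @ u) / (lam_k - lam))  # v = sum_k <eps_k, u>/(lam_k - lam) eps_k, Eq. (2.47)
print("residual ||(T - lam id)v - u|| =", np.linalg.norm((T - lam * np.eye(5)) @ v - u))

# In case (b), where lam equals an eigenvalue, a solution exists only if u is
# perpendicular to the corresponding eigenvectors, Eq. (2.44).
```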
3 Fourier analysis
The Fourier series and the Fourier transform are important mathematical tools in practically all parts of
physics. Intuitively, they allow us to decompose functions into their various frequency components. The
Fourier series, which we discuss first, deals with functions on finite intervals and leads to a decomposition
in terms of a discrete spectrum of frequencies. Mathematically speaking, we find an ortho-normal basis of
functions with well-defined frequencies on the Hilbert space L2 ([−π, π]) (say) and the coordinates relative
to this basis represent the strength of the various frequencies. The Fourier transform applies to functions
defined on the entire real line (or on Rn ) and leads to a decomposition with a continuous spectrum of
frequencies. Mathematically, the Fourier transform can be understood as a unitary map on the Hilbert
space L2 (Rn ).
Of course we are not stuck with the specific interval [0, π] but a simple re-scaling x → πx/a for a > 0
shows that the Hilbert space L²_R([0, a]) with scalar product

  ⟨f, g⟩ = ∫_0^a dx f(x) g(x) ,      (3.3)
has an ortho-normal basis

  c̃_0 = 1/√a ,   c̃_k := √(2/a) cos(kπx/a) ,   k = 1, 2, . . . .      (3.4)
Let us be more explicit about what this actually means. From part (i) of Theorem 2.3 we conclude that
every (real-valued) square integrable function f ∈ L²_R([0, a]) can be written as

  f(x) = Σ_{k=0}^∞ α_k c̃_k(x) = α_0/√a + √(2/a) Σ_{k=1}^∞ α_k cos(kπx/a) ,      (3.5)
where
  α_0 = ⟨c̃_0, f⟩ = (1/√a) ∫_0^a dx f(x) ,   α_k = ⟨c̃_k, f⟩ = √(2/a) ∫_0^a dx cos(kπx/a) f(x) ,   k = 1, 2, . . . .      (3.6)
It is customary to introduce the coefficients a_0 = (2/√a) α_0 and a_k = √(2/a) α_k, for k = 1, 2, . . ., in order to
re-distribute factors:
  f(x) = a_0/2 + Σ_{k=1}^∞ a_k cos(kπx/a)   where   a_k = (2/a) ∫_0^a dx cos(kπx/a) f(x) .      (3.7)
This series is called the cosine Fourier series and the ak are called the (cosine) Fourier coefficients. It is
important to remember that the equality in the first Eq. (3.7) holds in L²_R([0, a]), a space which consists of
classes of functions which have been identified if they differ only on sets of Lebesgue-measure zero. This
means that the function f and its Fourier series do not actually have to coincide at every point x ∈ [0, a]
- they can differ on a set of measure zero. However, we know that the (cosine) Fourier series always
converges to the function f in the norm on L²_R([0, a]).
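To see Eq. (3.7) at work numerically, here is a small sketch (my own choice of function, f(x) = x(a − x) on [0, a] with a = 1): compute the cosine coefficients by simple quadrature and compare the truncated series with f.

```python
import numpy as np

a = 1.0
x = np.linspace(0.0, a, 2001)
dx = x[1] - x[0]
f = x * (a - x)

def a_k(k):
    return (2.0 / a) * np.sum(np.cos(k * np.pi * x / a) * f) * dx   # Eq. (3.7)

K = 20
series = a_k(0) / 2 + sum(a_k(k) * np.cos(k * np.pi * x / a) for k in range(1, K + 1))
print("max |series - f| with", K, "terms:", np.max(np.abs(series - f)))
```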
We know from part (ii) of Theorem 2.3 that the norm of f can be calculated in terms of its Fourier
coefficients as
  (2/a) ∫_0^a dx |f(x)|² = (2/a) ‖f‖² = (2/a) Σ_{k=0}^∞ |⟨c̃_k, f⟩|² = |a_0|²/2 + Σ_{k=1}^∞ |a_k|² .      (3.8)
Proof. This proof is very similar to the one for Theorem 3.2 and can, for example, be found in Ref. [5].
As in the cosine case, we can re-scale by x → πx/a to the interval [0, a] and obtain an ortho-normal basis

  s̃_k = √(2/a) sin(kπx/a) ,   k = 1, 2, . . .      (3.10)
for L2R ([0, a]) with scalar product (3.3). Hence, every function f ∈ L2R ([0, a]) can be expanded as
∞ r ∞ r Z a
X 2X kπx 2 kπx
f (x) = βk s̃k (x) = βk sin where βk = hs̃k , f i = dx sin f (x) .
a a a 0 a
k=1 k=1
Introducing the coefficients b_k = √(2/a) β_k this can be re-written in the standard notation
  f(x) = Σ_{k=1}^∞ b_k sin(kπx/a)   where   b_k = (2/a) ∫_0^a dx sin(kπx/a) f(x) .      (3.11)
This series is called the sine Fourier series and the bk are called the (sine) Fourier coefficients. Of course
there is also a version of Parseval’s equation which reads
  (2/a) ∫_0^a dx |f(x)|² = (2/a) ‖f‖² = (2/a) Σ_{k=1}^∞ |⟨s̃_k, f⟩|² = Σ_{k=1}^∞ |b_k|² .      (3.12)
Exercise 3.4. Check that the functions (3.14) form an ortho-normal system on L2R ([−π, π]).
Proof. Every function f ∈ L²_R([−π, π]) can be written as f = f_+ + f_−, where f_±(x) = ½(f(x) ± f(−x)) are
the symmetric and anti-symmetric parts of f. The functions f_± can be restricted to the interval [0, π] so
that they can be viewed as elements of L²_R([0, π]). From Theorem 3.2 we can write down a cosine Fourier
series for f_+ and from Theorem 3.3 f_− has a sine Fourier series, so

  f_+ = Σ_{k=0}^∞ α_k c̃_k ,   f_− = Σ_{k=1}^∞ β_k s̃_k .      (3.15)
Since both sides of the first equation are symmetric and both sides of the second equation anti-symmetric
they can both be trivially extended back to the interval [−π, π]. Then, summing up
  f = f_+ + f_− = Σ_{k=0}^∞ α_k c̃_k + Σ_{k=1}^∞ β_k s̃_k      (3.16)
proves the statement.
This proof also points to an interpretation of the relation between the sine and cosine Fourier series on
the one hand and the standard Fourier series on the other hand. Starting with a function f ∈ L2R ([0, π])
we can write down the cosine Fourier series. But we can also extend f to a symmetric function on [−π, π]
so it becomes an element of L2R ([−π, π]) and we can write down a standard Fourier series. Of course, f
being symmetric, this Fourier series only contains cosine terms and it looks formally the same as the cosine
Fourier series but is valid on the larger interval [−π, π]. Similarly, for f ∈ L2R ([0, π]) we can write down
the sine Fourier series and extend f to an anti-symmetric function on the interval [−π, π]. The standard
Fourier series for this anti-symmetric function then only contains sine terms and formally coincides with
the sine Fourier series. Conversely, if we start with an even (odd) function f ∈ L2R ([−π, π]) then the
Fourier series only contains cosine (sine) terms and we can restrict the expansion to the interval [0, π] so
it becomes a cosine (sine) Fourier series.
As before, we can use the re-scaling x → πx/a to transform to the interval [−a, a] and obtain the
ortho-normal basis
  c_0 := 1/√(2a) ,   c_k := (1/√a) cos(kπx/a) ,   s_k := (1/√a) sin(kπx/a) ,   k = 1, 2, . . . ,      (3.17)
for the Hilbert space L²_R([−a, a]) with scalar product

  ⟨f, g⟩ = ∫_{−a}^a dx f(x) g(x) .      (3.18)
Let us collect the formulae for the standard Fourier series. Every function f ∈ L²_R([−a, a]) can be expanded
as

  f(x) = Σ_{k=0}^∞ α_k c_k(x) + Σ_{k=1}^∞ β_k s_k(x) = α_0/√(2a) + (1/√a) Σ_{k=1}^∞ α_k cos(kπx/a) + (1/√a) Σ_{k=1}^∞ β_k sin(kπx/a) ,      (3.19)
where

  α_0 = ⟨c_0, f⟩ = (1/√(2a)) ∫_{−a}^a dx f(x)
  α_k = ⟨c_k, f⟩ = (1/√a) ∫_{−a}^a dx cos(kπx/a) f(x) ,   k = 1, 2, . . .      (3.20)
  β_k = ⟨s_k, f⟩ = (1/√a) ∫_{−a}^a dx sin(kπx/a) f(x) ,   k = 1, 2, . . . .
As before, we introduce the re-scaled coefficients a_0 = √(2/a) α_0, a_k = α_k/√a and b_k = β_k/√a, where
k = 1, 2, . . ., to obtain these equations in the standard form
  f(x) = a_0/2 + Σ_{k=1}^∞ [ a_k cos(kπx/a) + b_k sin(kπx/a) ] ,      (3.21)
where
  a_0 = (1/a) ∫_{−a}^a dx f(x) ,   a_k = (1/a) ∫_{−a}^a dx cos(kπx/a) f(x) ,   b_k = (1/a) ∫_{−a}^a dx sin(kπx/a) f(x)      (3.22)
for k = 1, 2, . . . . Parseval’s equation now takes the form

  (1/a) ∫_{−a}^a dx |f(x)|² = (1/a) ‖f‖² = (1/a) ( Σ_{k=0}^∞ |⟨c_k, f⟩|² + Σ_{k=1}^∞ |⟨s_k, f⟩|² ) = |a_0|²/2 + Σ_{k=1}^∞ (|a_k|² + |b_k|²) .      (3.23)
3.1.4 Complex standard Fourier series
By far the most elegant form of the Fourier series arises in the complex case, where we consider the Hilbert
space L²_C([−π, π]) with scalar product

  ⟨f, g⟩ = ∫_{−π}^π dx f(x)* g(x) .      (3.24)
The functions

  e_k := (1/√(2π)) exp(ikx) ,   k ∈ Z      (3.25)
form an ortho-normal system as verified in the following exercise.
Exercise 3.6. Show that the functions (e_k)_{k=−∞}^∞ in Eq. (3.25) form an ortho-normal system on L²_C([−π, π])
with scalar product (3.24).
with scalar product (3.24).
The above functions form, in fact, an ortho-normal basis as stated in
Theorem 3.7. The functions (e_k)_{k=−∞}^∞ in Eq. (3.25) form an ortho-normal basis of L²_C([−π, π]).
Proof. Start with a function f ∈ L2C ([−π, π]) and decompose this function into real and imaginary parts,
so write f = fR + ifI . Since
  ∞ > ‖f‖² = ∫_{−π}^π dx |f(x)|² = ∫_{−π}^π dx f_R² + ∫_{−π}^π dx f_I²      (3.26)
both fR and fI are real-valued square integrable functions and are, hence, elements of L2R ([−π, π]). This
means, from Theorem 3.5 that we can write down a standard real Fourier series for fR and fI . Inserting
these two real Fourier series into f = fR + ifI and replacing cos(kx) = (exp(ikx) + exp(−ikx))/2,
sin(kx) = (exp(ikx) − exp(−ikx))/(2i) proves the theorem.
The usual re-scaling x → πx/a leads to the Hilbert space L²_C([−a, a]) with scalar product

  ⟨f, g⟩ = ∫_{−a}^a dx f(x)* g(x) .      (3.27)
where

  α_k = ⟨e_k, f⟩ = (1/√(2a)) ∫_{−a}^a dx exp(−ikπx/a) f(x) .      (3.30)
With the re-scaled Fourier coefficients a_k = α_k/√(2a) this turns into the standard form

  f(x) = Σ_{k∈Z} a_k exp(ikπx/a)   where   a_k = (1/(2a)) ∫_{−a}^a dx exp(−ikπx/a) f(x) .      (3.31)
3.1.5 Pointwise convergence
So far, our discussion of convergence for the Fourier series has been carried out with respect to the L2
norm (3.18). As emphasised, this type of convergence ensures that the difference of a function and its
Fourier series has a vanishing L2 norm but it does not necessarily imply that the Fourier series converges
to the function at every point x. The following theorem provides a statement about uniform convergence
of a Fourier series.
Theorem 3.8. Let f : [−a, a] → R or C be a (real or complex valued) function which is piecewise
continuously differentiable and which satisfies f (−a) = f (a). Then the (real or complex) Fourier series of
f converges to f uniformly.
Proof. For the proof see, for example, Ref. [10].
Recall from Def. 1.22 that uniform convergence implies point-wise convergence so under the conditions of
Theorem 3.8 the Fourier series of f converges to f at every point x ∈ [−a, a].
Figure 3: Graph of the periodic function f defined by f(x) = f(x + 2π) and f(x) = x for −π < x ≤ π.
  f(x) = x ,      (3.33)
so a simple linear function on the interval [−π, π]. Of course, we can extend this to a periodic
function with period 2π whose graph is shown in Fig. 3. Since this function is anti-symmetric the
Fourier series of course only contains sine terms. (Alternatively, and equivalently, we can consider
this function restricted to the interval [0, π] and compute its sine Fourier series.) Using Eqs. (3.22)
we find for the Fourier coefficients
  a_k = 0 ,  k = 0, 1, 2, . . . ,   b_k = (1/π) ∫_{−π}^π dx x sin(kx) = 2(−1)^{k+1}/k ,   k = 1, 2, . . . .      (3.34)
Figure 4: Fourier coefficients and Fourier series for the linear function f in Eq. (3.33). The left figure shows
the Fourier coefficients b_k from Eq. (3.34) for k = 1, . . . , 50. The function f together with the first six partial
sums of its Fourier series (3.36) is shown in the right figure.
As a practical matter, it is useful to structure the calculation of Fourier coefficients in order to avoid
mistakes. Creating pages and pages of integration performed in small steps is neither efficient nor likely
to lead to correct answers. Instead, separate the process of integration from the specific calculation of
Fourier coefficients. A particular Fourier calculation often involves certain types of standard integrals.
In the above case, these are integrals of the form ∫ dx x sin(αx) for a constant α. Find these integrals
first (or simply look them up):

  ∫ dx x sin(αx) = − x cos(αx)/α + sin(αx)/α² .      (3.35)
Then apply this general result to the particular calculation at hand, that is, in the present case, set
α = k and put in the integration limits.
Inserting the above Fourier coefficients into Eq. (3.21), we get the Fourier series
  f(x) = 2 Σ_{k=1}^∞ [(−1)^{k+1}/k] sin(kx) .      (3.36)
Recall that the equality in Eq. (3.36) is not meant point-wise for every x but as an equality in
L²_R([−π, π]), that is, the difference between f and its Fourier series has length zero with respect to
the norm on L²_R([−π, π]). In fact, Eq. (3.36) shows (and Fig. 4 illustrates) that the Fourier series of f
vanishes at ±π (since every term in the series (3.36) vanishes at ±π) while f(±π) = ±π is non-zero.
So we have an example where the Fourier series does not converge to the function at every point. In
fact, the present function f violates the conditions of Theorem 3.8 (since f(π) ≠ f(−π)), so there is
no reason to expect point-wise convergence. It is clear from Fig. 4 that the Fourier series “struggles”
to reproduce the function near ±π and this can be seen as the intuitive reason for the slow drop-off
of the Fourier coefficients, b_k ∼ 1/k, in Eq. (3.34). In other words, a larger number of terms in the
Fourier series contribute significantly so that the function can be matched near the boundaries of the
interval [−π, π].
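A quick numerical illustration of this behaviour (a sketch): the partial sums of the series (3.36) approach f(x) = x away from the endpoints, while they remain zero at x = ±π.

```python
import numpy as np

x = np.linspace(-np.pi, np.pi, 2001)

def partial_sum(x, K):
    k = np.arange(1, K + 1)
    return 2 * np.sum(((-1.0)**(k + 1) / k)[:, None] * np.sin(np.outer(k, x)), axis=0)

for K in [5, 50, 500]:
    s = partial_sum(x, K)
    interior = np.abs(x) < 3.0                       # stay away from the endpoints
    print(f"K = {K:3d}:  max |s - x| for |x| < 3: {np.max(np.abs(s - x)[interior]):.3f},"
          f"  s(pi) = {s[-1]:.3f}")                  # s(pi) stays at 0 while f(pi) = pi
```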
For this example, let us consider Parseval’s equation (3.23)

  2π²/3 = (1/π) ∫_{−π}^π dx x² = Σ_{k=1}^∞ |b_k|² = 4 Σ_{k=1}^∞ 1/k² ,      (3.37)

where the left hand side follows from explicitly carrying out the normalisation integral and the right
hand side by inserting the Fourier coefficients (3.34). This leads to the interesting formula

  π²/6 = Σ_{k=1}^∞ 1/k² .      (3.38)
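A one-line numerical check of Eq. (3.38), using the partial sums of the series:

```python
import numpy as np

for N in [10, 1000, 100000]:
    print(f"N = {N:6d}:  sum_(k<=N) 1/k^2 = {np.sum(1.0 / np.arange(1, N + 1)**2):.6f}"
          f"   (pi^2/6 = {np.pi**2 / 6:.6f})")
```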
Figure 5: Graph of the periodic function f defined by f(x) = f(x + 2π) and f(x) = |x| for −π < x ≤ π.
Figure 6: Fourier coefficients and Fourier series for the modulus function f in Eq. (3.39). The left figure shows
the Fourier coefficients ak from Eq. (3.40) for k = 1, . . . , 50. The function f together with the first six partial
sums of its Fourier series (3.42) is shown in the right figure.
Application 3.8. Modulus function
Our next example is the modulus function f : [−π, π] → R defined by

  f(x) = |x| ,      (3.39)
Extended to a periodic function with period 2π its graph is shown in Fig. 5. Since this function is
symmetric the Fourier series of course only contains cosine terms. (Alternatively, and equivalently,
we can consider this function restricted to the interval [0, π] and compute its cosine Fourier series.)
Using Eqs. (3.22) we find for the Fourier coefficients

  a_0 = (1/π) ∫_{−π}^π dx |x| = π ,   a_k = (1/π) ∫_{−π}^π dx |x| cos(kx) = 2((−1)^k − 1)/(π k²) ,   b_k = 0 ,      (3.40)
for k = 1, 2, . . ., where we have used the standard integral

  ∫ dx x cos(αx) = cos(αx)/α² + x sin(αx)/α ,      (3.41)
where α ≠ 0. Note that the value obtained for a_0 does not follow by inserting k = 0 into the general
formula for ak - in fact, doing this leads to an undefined expression. This observation points to a
general rule. Sometimes Fourier coefficients, calculated for generic values of k ∈ N, become apparently
singular or undefined for specific k values. Of course Fourier coefficients must be well-defined, so this
indicates a break-down of the integration method which occurs for those specific values of k. In such
cases, the integrals for the problematic k-values should be carried out separately and, with the correct
integration applied, this will lead to well-defined answers. In the present case, the special case arises
because the standard integral (3.41) is only valid for α 6= 0.
The Fourier series from the above coefficients is given by

  f(x) = π/2 − (4/π) Σ_{k=1,3,5,...} cos(kx)/k² .      (3.42)
The Fourier coefficients ak and the first few partial sums of the above Fourier series are shown in
Fig. 6. The Fourier coefficients drop off as a_k ∼ 1/k², so more quickly than in the previous example, and
convergence of the Fourier series is more efficient. A related observation is that the function (3.39)
satisfies all the conditions of Theorem 3.8 and, hence, its Fourier series converges uniformly (and
point-wise) to f. Fig. 6 illustrates this convincingly.
The periodically continued version of this function is shown in Fig. 7. Since f is an anti-symmetric
function, the Fourier series only contains sine terms. (Alternatively and equivalently, we can think
of f as a function on the interval [0, π] and work out the sine Fourier series.) For the Fourier coefficients we
Figure 7: Graph of the periodic function f defined by f(x) = f(x + 2π) and f(x) = sign(x) for −π < x ≤ π.
Figure 8: Fourier coefficients and Fourier series for the sign function f in Eq. (3.43). The left figure shows
the Fourier coefficients b_k from Eq. (3.44) for k = 1, . . . , 50. The function f together with the first six partial
sums of its Fourier series (3.45) is shown in the right figure.
have

  a_k = 0 ,   b_k = (1/π) ∫_{−π}^π dx sign(x) sin(kx) = − 2((−1)^k − 1)/(π k) ,      (3.44)
for k = 1, 2, . . ., which leads to the Fourier series

  f(x) = (4/π) Σ_{k=1,3,5,...} sin(kx)/k .      (3.45)
The Fourier coefficients b_k and the first few partial sums of the Fourier series are shown in Fig. 8.
As for example 1, the function f does not satisfy the conditions of Theorem 3.8 and the Fourier
series does not converge everywhere point-wise to the function f. Specifically, while the Fourier series
always vanishes at x = ±π the function value f(±π) = ±1 is non-zero. Related to this is the slow
drop-off, b_k ∼ 1/k, of the Fourier coefficients.
[Figure 9: graph of the function f defined in Eq. (3.46).]
Figure 10: Fourier coefficients and Fourier series for the function f in Eq. (3.46). The left figure shows the
Fourier coefficients ak and the middle figure the coefficients bk from Eq. (3.47). The function f together with
the first six partial sums of its Fourier series (3.48) is shown in the right figure.
with graph as shown in Fig. 9. Since this function is neither symmetric nor anti-symmetric we expect
both sine and cosine terms to be present. For the Fourier coefficients it follows that

  a_k = (1/π) ∫_{−π}^π dx f(x) cos(kx) = − 4(−1)^k (k² + 25)/(k² − 25)² ,   k = 0, . . . , 4, 6, . . .
  a_5 = (1/π) ∫_{−π}^π dx f(x) cos(5x) = π²/50 + 1/3      (3.47)
  b_k = (1/π) ∫_{−π}^π dx f(x) sin(kx) = − 2(−1)^k/k ,   k = 1, 2, . . . .
Note that inserting k = 5 into the first expression for ak leads to a singularity and, hence, the generic
integration method has broken down in this case. (This is, of course, related to the presence of cos(5x)
in f .) For this reason, we have carried out the calculation of a5 separately. The resulting Fourier
series is

  f(x) = −4 Σ_{k≠5} [(−1)^k (k² + 25)/(k² − 25)²] cos(kx) + (π²/50 + 1/3) cos(5x) − 2 Σ_{k=1}^∞ [(−1)^k/k] sin(kx) .      (3.48)
The corresponding plots for the coefficients ak , bk and the partial Fourier series are shown in Fig. 10.
Again this is a function which violates the conditions of Theorem 3.8 and where the Fourier series
does not converge to the function everywhere. A new feature is the structure in the Fourier coefficients
ak , as seen in the left Fig. 10. The Fourier mode a5 (and, to a lesser extent, that of a4 and a6 ) is
much stronger than the other modes ak and this is of course related to the presence of cos(5x) in the
function f . Intuitively, the Fourier series detects the strength with which frequencies k are contained
in a function f and the presence of cos(5x) in f suggests a strong contribution for k = 5.
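The same "frequency detection" can be seen in a short numerical sketch (using a hypothetical function g(x) = x + cos(5x) of my own, not the f of Eq. (3.46)): the coefficient a_5 computed via Eq. (3.22) with a = π stands out clearly.

```python
import numpy as np

x = np.linspace(-np.pi, np.pi, 20001)
dx = x[1] - x[0]
g = x + np.cos(5 * x)                      # hypothetical example containing cos(5x)

a_k = [(1 / np.pi) * np.sum(g * np.cos(k * x)) * dx for k in range(11)]
b_k = [(1 / np.pi) * np.sum(g * np.sin(k * x)) * dx for k in range(1, 11)]

print("a_k:", np.round(a_k, 3))   # ~ 1 at k = 5, ~ 0 elsewhere
print("b_k:", np.round(b_k, 3))   # ~ 2(-1)^(k+1)/k, coming from the x part
```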
  f(x) = π/2 − (2/π) Σ_{k∈Z, k odd} exp(ikx)/k² ,      (3.50)
which we could have also inferred from the real Fourier series (3.42) by simply replacing cos(kx) =
(exp(ikx) + exp(−ikx))/2.
Definition 3.1. For functions f ∈ L¹_C(R^n) we define the Fourier transform F f = f̂ : R^n → C by

  f̂(k) = F(f)(k) := (2π)^{−n/2} ∫_{R^n} d^n x exp(−i x · k) f(x) .      (3.51)
Clearly, F is a linear operator, that is, F(αf + βg) = αF(f) + βF(g). Also note that |f̂(k)| ≤ ‖f‖₁/(2π)^{n/2},
so the modulus of the Fourier transform is bounded. With some more effort it can be shown that f̂ is
continuous. However, it is not clear that the Fourier transform f̂ is an element of L¹_C(R^n) as well and,
it turns out, this is not always the case. (See Example 3 below.) We will rectify this later by defining a
version of the Fourier transform which provides a map L²_C(R^n) → L²_C(R^n).
Before we compute examples of Fourier transforms it is useful to look at some of its general properties.
Recall from Section 2.3 the translation operator Ta , the modulation operator Eb and the dilation operator
Dλ , for a, b ∈ Rn and λ ∈ R, defined by
which we can also think of as maps L1C (Rn ) → L1C (Rn ). For any function g : Rn → C, we also have the
multiplication operator
Mg (f )(x) := g(x)f (x) . (3.53)
It is useful to work out how these operators as well as the derivative operators D_{x_j} := ∂/∂x_j relate to the
Fourier transform.
Proposition 3.1. (Some elementary properties of the Fourier transform) For f ∈ L¹_C(R^n) we have
(F1) (T_a(f))^ = E_{−a}(f̂) or, equivalently, F ◦ T_a = E_{−a} ◦ F
(F2) (E_b(f))^ = T_b(f̂) or, equivalently, F ◦ E_b = T_b ◦ F
(F3) (D_λ(f))^ = (1/|λ|^n) D_{1/λ}(f̂) or, equivalently, F ◦ D_λ = (1/|λ|^n) D_{1/λ} ◦ F
For f ∈ C_c^1(R^n) we have
(F4) (D_{x_j} f)^(k) = i k_j f̂(k) or, equivalently, F ◦ D_{x_j} = M_{i k_j} ◦ F
(F5) (x_j f)^(k) = i D_{k_j} f̂(k) or, equivalently, F ◦ M_{x_j} = i D_{k_j} ◦ F.
Exercise 3.9. Prove (F2), (F3), (F4) and (F5) from Prop. 3.1.
3.2.2 Convolution
Another operation which relates to Fourier transforms in an interesting way is the convolution f ⋆ g of two
functions f, g ∈ L¹(R^n) which is defined as

  (f ⋆ g)(x) := ∫_{R^n} d^n y f(y) g(x − y) .      (3.55)
From a mathematical point of view, we have the following statement about convolutions.
Theorem 3.11. (Property of convolutions) For f, g ∈ L¹(R^n) the convolution f ⋆ g is well-defined and
f ⋆ g ∈ L¹(R^n).
Proof. For the proof see, for example, Ref. [10].
How can the convolution be understood intuitively? From the integral (3.55) we can say that the convo-
lution is “smearing” the function f by the function g. For example, consider choosing f (x) = cos(x) and
  g(x) = 1/(2a) for x ∈ [−a, a] ,   g(x) = 0 for |x| > a ,      (3.56)
for any a > 0. The function g is chosen so that, upon convolution, it leads to a smearing (or averaging)
of the function f over the interval [x − a, x + a] for every x. An explicit calculation shows the convolution
is given by

  (f ⋆ g)(x) = (1/(2a)) ∫_{x−a}^{x+a} dy cos(y) = (sin(a)/a) cos(x) .      (3.57)
If we consider the limit a → 0, so the averaging width goes to zero, we find that f ⋆ g → f so f remains
unchanged, as one would expect. The other extreme would be to choose a = π in which case f ⋆ g = 0. In
this case, the averaging is over a full period [x − π, x + π] of the cosine so the convolved function vanishes for
every x. For other values of a the convolution is still a cosine function but with a reduced amplitude, as one
would expect from a local averaging.
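A short numerical check of Eq. (3.57) (a sketch): averaging cos over [x − a, x + a] indeed reproduces (sin(a)/a) cos(x).

```python
import numpy as np

a = 0.8

def smeared_cos(x0, a, n=10001):
    """(f * g)(x0) for f = cos and the box function g of Eq. (3.56)."""
    y = np.linspace(x0 - a, x0 + a, n)
    return np.sum(np.cos(y)) * (y[1] - y[0]) / (2 * a)

xs = np.linspace(-np.pi, np.pi, 9)
numerical = np.array([smeared_cos(x0, a) for x0 in xs])
exact = (np.sin(a) / a) * np.cos(xs)
print("max deviation from (sin(a)/a) cos(x):", np.max(np.abs(numerical - exact)))
```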
The relationship between convolutions and Fourier transforms is stated in the following Lemma.
  (f ⋆ g)^(k) = (2π)^{−n/2} ∫ d^n x d^n y f(y) g(x − y) e^{−i x·k}      (3.58)
             = (2π)^{−n/2} ∫ d^n x d^n y f(y) e^{−i y·k} g(x − y) e^{−i (x−y)·k}      (3.59)
             = (2π)^{−n/2} ∫ d^n y f(y) e^{−i y·k} ∫ d^n z g(z) e^{−i z·k} = (2π)^{n/2} f̂(k) ĝ(k) ,      (3.60)

where the substitution z = x − y has been used in the last step.
In other words, the Fourier transform of a convolution is (up to a constant) the product of the two Fourier
transforms. This rule is often useful to work out new Fourier transforms from given ones.
signal f . Suppose that f is the signal from a single piano tone with frequency ω0 . In this case, we expect
fˆ to have a strong peak around ω = ω0 . However, a piano tone also contains overtones with frequencies
qω0 , where q = 2, 3, . . .. This means we expect fˆ to have smaller peaks around ω = qω0 . (Their height
decreases with increasing q and exactly what the pattern is determines how the tone “sounds”.) Let us
consider this in a more quantitative way.
where A, γ and ω0 are real, positive constants. Using the above sound analogy, we can think of this
function as representing a sound signal with onset at t = 0, overall amplitude A, frequency ω0 and a
decay time of ∼ 1/γ. Inserting into Eq. (3.61), we find the Fourier transform
f̂(ω) = A F(f_{ω₀,γ})(ω) ,   F(f_{ω₀,γ})(ω) = ∫_0^∞ dt e^{−γt} e^{−i(ω−ω₀)t} = 1/( i(ω₀ − ω) − γ ) .   (3.63)
and this corresponds to a peak with width ∼ γ around ω = ω0 . The longer the tone, the smaller
γ and the smaller the width of this peak. Note that this is precisely in line with our expectation.
The original signal contains a strong component with frequency ω0 which corresponds to the peak
of the Fourier transform around ω0 . However, there is a spectrum of frequencies around ω0 and this
captures the finite decay time ∼ 1/γ of the signal. The longer the signal the closer it is to a pure
signal with frequency ω0 and the narrower the peak in the Fourier transform.
We can take the sound analogy further by considering
f = Σ_{q=1,2,...} A_q f_{qω₀,γ_q} ,   (3.65)
where the function fqω0 ,γq is defined in Eq. (3.62). This represents a tone with frequency ω0 , together
with its overtones with amplitudes Aq , frequencies qω0 and decay constants γq . The Fourier transform
of f is easily computed from linearity:
f̂(ω) = Σ_{q=1,2,...} A_q F(f_{qω₀,γ_q})(ω) = Σ_{q=1,2,...} A_q / ( i(qω₀ − ω) − γ_q ) .   (3.66)
This corresponds to a sequence of peaks at frequencies qω0 , where q = 1, 2, . . . which reflects the main
frequency of the tone, together with its overtone frequencies.
Another interesting example to consider is the Fourier transform of the one-dimensional Gaussian
f(x) = e^{−x²/2} .   (3.67)
This result means that the Gaussian is invariant under Fourier transformation. Without much effort,
this one-dimensional result can be generalised to the n-dimensional width one Gaussian
this one-dimensional result can be generalised to the n-dimensional width one Gaussian
f(x) = e^{−|x|²/2} .   (3.69)
Its Fourier transform can be split up into a product of n one-dimensional Fourier transforms as
f̂(k) = (1/(2π)^{n/2}) ∫_{Rⁿ} dⁿx e^{−|x|²/2 − ik·x} = ∏_{i=1}^n (1/√(2π)) ∫_R dx_i e^{−x_i²/2 − i k_i x_i} = ∏_{i=1}^n e^{−k_i²/2} = e^{−|k|²/2} ,   (3.70)
and the one-dimensional result (3.68) has been used in the second-last step. Hence, the n-dimensional
width one Gaussian is also invariant under Fourier transformation.
We would like to work out the Fourier transform of a more general Gaussian with width a > 0,
given by
f_a(x) = e^{−|x|²/(2a²)} = D_{1/a}(f)(x) ,   (3.71)
where f is the Gaussian (3.69) with width one and D is the dilation operator defined in Eq. (3.52).
The fact that this can be expressed in terms of the dilation operator makes calculating the Fourier transform quite easy, using property (F3) in Prop. 3.1:
f̂_a(k) = aⁿ D_a(f̂)(k) = aⁿ e^{−a²|k|²/2} .   (3.72)
In the last step the result (3.70) for the Fourier transform fˆ of the width one Gaussian has been used.
In conclusion, the Fourier transform of a Gaussian with width a is again a Gaussian with width 1/a.
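This conclusion is easy to check numerically. The sketch below (not part of the notes, numpy assumed) computes the one-dimensional Fourier transform of the width-a Gaussian by quadrature and compares it with a e^{−a²k²/2}; the value of a is an arbitrary choice.

```python
import numpy as np

a = 2.0
x = np.linspace(-40, 40, 8001)
k = np.linspace(-3, 3, 7)

fa_hat = np.trapz(np.exp(-1j * np.outer(k, x)) * np.exp(-x**2 / (2 * a**2)),
                  x, axis=1) / np.sqrt(2 * np.pi)
print(np.max(np.abs(fa_hat - a * np.exp(-a**2 * k**2 / 2))))   # ≈ 0: Gaussian of width 1/a, height a
```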
Finally, we consider a Gaussian with width a and center shifted from the origin to a point c ∈ Rn
given by
f_{a,c}(x) = exp( −|x − c|²/(2a²) ) = T_c(f_a)(x) .   (3.73)
Note that this can be written in terms of the zero-centred Gaussian using the translation opera-
tor (3.52) and we can now use property (F1) in Prop. 3.1 to work out the Fourier transform.
F(f_{a,c})(k) = F(T_c(f_a))(k) = E_{−c}(f̂_a)(k) = aⁿ e^{−ic·k − a²|k|²/2} .   (3.74)
Figure 11: The graph of the characteristic function χ of the interval [−1, 1] (left) and the graph of the convolution f = χ ⋆ χ (right).
Figure 12: The graphs of the Fourier transforms χ̂ and f̂.
Exercise 3.13. Show that the Fourier transform (3.76) of the characteristic function χ is not in
L1 (R). (Hint: Find a lower bound for the integral over | sin k/k| from (m − 1)π to mπ.) Use the
dilation operator to find the Fourier transform of the characteristic function χa for the interval [−a, a].
We can use this example as an illustration of convolutions and their application to Fourier transforms.
Consider the convolution f = χ ⋆ χ of χ with itself which is given by
f(x) = ∫_R dy χ(y) χ(y − x) = max(2 − |x|, 0) .   (3.77)
The graphs of χ and its convolution f = χ ⋆ χ are shown in Fig. 11. From the convolution theorem 3.11 and the Fourier transform χ̂ in Eq. (3.76) we have
f̂(k) = F(χ ⋆ χ)(k) = √(2π) χ̂²(k) = 2 √(2/π) sin²k / k² .   (3.78)
Fig. 12 shows the graphs for the Fourier transforms χ̂ and fˆ.
3.2.4 The inverse of the Fourier transform
Theorem 3.14. (Inversion of the Fourier transform) For f ∈ L¹_C(Rⁿ) with f̂ ∈ L¹_C(Rⁿ) we have
f(x) = (1/(2π)^{n/2}) ∫_{Rⁿ} dⁿk f̂(k) e^{ik·x} ,   (3.79)
almost everywhere, that is for all x ∈ Rn except possibly on a set of Lebesgue measure zero.
Proof. The proof is somewhat technical (suggested by the fact that equality can fail on a set of measure
zero) and can, for example, be found in Ref. [10].
Note that the inversion formula (3.79) is very similar to the original definition (3.51) of the Fourier
transform, except for the change of sign in the exponent. It is, therefore, useful to introduce the linear
operator
F̃(f)(x) := (1/(2π)^{n/2}) ∫_{Rⁿ} dⁿk f(k) e^{ik·x}   (3.80)
for the inverse Fourier transform. With this terminology, the statement of Theorem 3.14 can be expressed
as
F̃ ◦ F(f ) = f ⇒ F ◦ F̃(f ) = f . (3.81)
Exercise 3.15. Show that the equation on the RHS of (3.81) does indeed follow from the equation on the
LHS. (Hint: Think about complex conjugation.)
Theorem 3.14 also means that a function f is uniquely (up to values on a measure zero set) determined
by its Fourier transform fˆ.
Lemma 3.1. If f ∈ Ccn+1 (Rn ) then the Fourier transform fˆ is integrable, that is, fˆ ∈ L1 (Rn ).
Proof. Property (F4) in Prop. 3.1 states that the Fourier transform of D_{x_j}f is given by i k_j f̂(k), which implies that |k_j| |f̂(k)| ≤ ‖D_{x_j}f‖₁/(2π)^{n/2}, so k_j f̂(k) is bounded. Differentiating and applying this rule repeatedly, we conclude that there is a constant K such that
|f̂(k)| ≤ K / ( 1 + Σ_{i=1}^n |k_i| )^{n+1} .   (3.83)
Since the right-hand side is integrable over Rⁿ, this shows that f̂ ∈ L¹(Rⁿ).
The next Lemma explores the relationship between the Fourier transform and the standard scalar product
on L2 (Rn ).
Lemma 3.2. (a) Let f, g ∈ L¹(Rⁿ) with Fourier transforms f̂ and ĝ. Then f̂g and f ĝ are integrable and we have
∫_{Rⁿ} dⁿx f̂(x) g(x) = ∫_{Rⁿ} dⁿx f(x) ĝ(x) .   (3.84)
(b) For f, g ∈ L¹(Rⁿ) ∩ L²(Rⁿ) the Fourier transforms satisfy f̂, ĝ ∈ L²(Rⁿ) and
⟨f̂, ĝ⟩ = ⟨f, g⟩ .   (3.85)
Proof. (a) Since fˆ, ĝ are bounded and continuous, fˆg and f ĝ are indeed integrable. It follows
∫ dⁿx f(x) ĝ(x) = (1/(2π)^{n/2}) ∫ dⁿx dⁿy f(x) g(y) e^{−ix·y} = ∫ dⁿy f̂(y) g(y) .   (3.86)
(b) For h, g ∈ Cc∞ (Rn ) we have, from Lemma 3.1, that ĥ, ĝ ∈ L1 (Rn ). Then, we can apply part (a) to get
⟨ĥ, ĝ⟩ = ∫ dⁿx F̃(h̄)(x) ĝ(x) = ∫ dⁿx F ∘ F̃(h̄)(x) g(x) = ∫ dⁿx h̄(x) g(x) = ⟨h, g⟩ .   (3.87)
To extend this statement to L1 (Rn ) ∩ L2 (Rn ) we recall from Theorem 2.1 that Cc∞ (Rn ) is dense in this
space. We can, therefore, approximate functions f, g ∈ L1 (Rn )∩L2 (Rn ) by sequences (fk ), (gk ) in Cc∞ (Rn ).
We have already shown that the property (3.85) holds for all f_k, g_k and, by taking the limit k → ∞ through the scalar product, it follows for f, g. In particular, taking f = g, it follows that ‖f‖₂ = ‖f̂‖₂ which shows that f̂ ∈ L²(Rⁿ).
Clearly, Eq. (3.85) is a unitarity property of the Fourier transform, relative to the standard scalar product
on L2 (Rn ). However, to make this consistent, we have to extend the Fourier transform to all of L2 (Rn )
and this is the content of the following theorem.
Theorem 3.16. (Plancherel) There exists a vector space isomorphism T : L²(Rⁿ) → L²(Rⁿ) with the following properties:
(a) ⟨T(f), T(g)⟩ = ⟨f, g⟩ for all f, g ∈ L²(Rⁿ). This implies ‖T(f)‖ = ‖f‖ for all f ∈ L²(Rⁿ)
(b) T(f) = F(f) for all f ∈ L¹(Rⁿ) ∩ L²(Rⁿ)
(c) T⁻¹(f) = F̃(f) for all f ∈ L¹(Rⁿ) ∩ L²(Rⁿ)
Proof. Since L1 (Rn ) ∩ L2 (Rn ) is dense in L2 (Rn ) we can find, for every f ∈ L2 (Rn ), a sequence (fk ) in
L¹(Rⁿ) ∩ L²(Rⁿ) which converges to f in the norm ‖·‖₂. We set T(f) := lim_{k→∞} F(f_k), where the limit is taken with respect to ‖·‖₂.
From Lemma 3.2 (b) the scalar product is preserved for F on L1 (Rn ) ∩ L2 (Rn ) and by taking the limit
through the scalar product, the same property follows for the operator T on L2 (Rn ). If T (f ) = 0 we have
0 = k T (f ) k2 = k f k2 and, hence, f = 0. This means that T is injective. From Theorem 3.14 we have
F ∘ F̃(f) = f so that C_c^∞(Rⁿ) ⊂ Im(T). For a g ∈ L²(Rⁿ) we pick a sequence (g_k = F(f_k)) in C_c^∞(Rⁿ) which converges to g. For f = lim_{k→∞} f_k we then have T(f) = lim_{k→∞} f̂_k = lim_{k→∞} g_k = g so that T is surjective.
It follows that the extension of the Fourier transform to L2 (Rn ) is a unitary linear operator, that is,
an operator which preserves the value of the scalar product on L2 (Rn ). Let us illustrate this with an
example.
For the characteristic function χ of the interval [−1, 1], unitarity together with the Fourier transform (3.76) implies
2 = ‖χ‖² = ‖χ̂‖² = (2/π) ∫_R dx sin²x / x² .   (3.89)
While the norm k χ k is easily worked out the same cannot be said for the integral on the RHS, so
unitarity of the Fourier transform can lead to non-trivial statements.
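As a quick numerical illustration (not part of the notes, numpy assumed), the integral on the right-hand side of (3.89) can be evaluated on a large but finite grid; unitarity predicts the value 2.

```python
import numpy as np

x = np.linspace(1e-8, 2000.0, 2_000_001)     # finite cutoff; the tail contributes < 1e-3
val = np.trapz((np.sin(x) / x)**2, x)
print(2 / np.pi * 2 * val)                   # ≈ 2.0, in agreement with ||chi||^2 = 2
```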
Exercise 3.17. Show that the Gaussian fa with width a in Eq. (3.71) has the same L2 (Rn ) norm as its
Fourier transform fˆa .
4 Orthogonal polynomials
In the previous section, we have discussed the Fourier series and have found a basis for the Hilbert space
L2 [−a, a] in terms of sine and cosine functions. Of course we know from the Stone-Weierstrass theorem 1.35
combined with Theorem 2.1 that the polynomials are dense in the Hilbert space L2 ([a, b]) (and we have
used this for some of the proofs related to the Fourier series). So, rather than using relatively complicated,
transcendental functions such as sine and cosine as basis functions there is a much simpler possibility: a
basis for L2 ([a, b]) which consists of polynomials. Of course we would want this to be an ortho-normal basis
relative to the standard scalar product (1.101) on L2 ([a, b]). A rather pedestrian method to find ortho-
normal polynomials is to start with the monomials (1, x, x2 , . . .) and apply the Gram-Schmidt procedure.
Exercise 4.1. For the Hilbert space L²([−1, 1]), apply the Gram-Schmidt procedure to the monomials 1, x, x² and show that this leads to the ortho-normal system of polynomials 1/√2 , √(3/2) x , √(5/8)(3x² − 1).
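For readers who want to experiment, Exercise 4.1 can also be reproduced symbolically. The sketch below (not part of the notes) assumes sympy is available and applies Gram-Schmidt to the monomials 1, x, x² with the standard scalar product on L²([−1, 1]).

```python
import sympy as sp

x = sp.symbols('x')
inner = lambda f, g: sp.integrate(f * g, (x, -1, 1))

basis = []
for m in [sp.Integer(1), x, x**2]:
    v = m - sum(inner(m, e) * e for e in basis)            # project out earlier directions
    basis.append(sp.simplify(v / sp.sqrt(inner(v, v))))    # normalise
print(basis)   # proportional to 1/sqrt(2), sqrt(3/2)*x, sqrt(5/8)*(3*x**2 - 1)
```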
The polynomials in Exercise 4.1 obtained in this way are (proportional to) the first three Legendre poly-
nomials which we will discuss in more detail soon. Evidently, the Gram-Schmidt procedure, while con-
ceptually very clear, is not a particularly efficient method in this context. We would like to obtain concise
formulae for orthogonal polynomials at all degrees. There is also an important generalisation. In addition
to finite intervals, we would also like to allow semi-infinite or infinite intervals, so we would like to allow
a = −∞ or b = ∞. Of course for such a semi-infinite or infinite interval, polynomials do not have a finite
norm relative to the standard scalar product (1.101) on L2 ([a, b]). To rectify this, we have to consider the
Hilbert spaces L2w ([a, b]) with an integrable weight function w and a scalar product defined by
⟨f, g⟩ = ∫_a^b dx w(x) f(x) g(x) ,   (4.1)
and choose w appropriately. Thinking about the types of intervals, that is, finite intervals [a, b], semi-
infinite intervals [a, ∞] or an infinite interval [−∞, ∞] and corresponding suitable weight functions w
will lead to a classification of different types of orthogonal polynomials which we discuss in the following
subsection. For the remainder of the section we will be looking at various entries in this classification,
focusing on the cases which are particularly relevant to applications in physics.
for positive numbers hn . (By convention, the standard orthogonal polynomials are not normalised to one,
hence the constants hn .) We also introduce the notation
P_n(x) = k_n xⁿ + k′_n x^{n−1} + · · · ,   (4.3)
that is, we call k_n ≠ 0 the coefficient of the leading monomial xⁿ and k′_n the coefficient of the sub-leading
monomial xn−1 in Pn . An immediate consequence of orthogonality is the relation
hPn , pi = 0 (4.4)
for any polynomial p with degree less than n. (This follows because such a p can be written as a linear combination of P₀, . . . , P_{n−1}, each of which is orthogonal to P_n.) Furthermore, P_n is (up to an overall constant) uniquely characterised by this property.
for n = 1, 2, . . ., where
A_n = k_{n+1}/k_n ,   B_n = A_n ( k′_{n+1}/k_{n+1} − k′_n/k_n ) ,   C_n = A_n h_n / ( A_{n−1} h_{n−1} ) .   (4.6)
Proof. We start by considering the polynomial Pn+1 − An xPn which (due to the above definition of An )
is of degree n, rather than n + 1, and can, hence, be written as
P_{n+1} − A_n x P_n = Σ_{i=0}^n α_i P_i ,   (4.7)
for some αi ∈ R. Taking the inner product of this relation with Pk immediately leads to αk = 0 for
k = 0, . . . , n − 2. This means we are left with
and it remains to be shown that bn = Bn and cn = Cn . The first of these statements follows very easily
by inserting the expressions (4.3) into Eq. (4.8) and comparing coefficients of the xn term. To fix cn we
write Eq. (4.8) as
cn Pn−1 = −Pn+1 + An xPn + Bn Pn (4.9)
and take the inner product of this equation with Pn−1 . This leads to
c_n h_{n−1} = c_n ‖P_{n−1}‖² = −⟨P_{n+1}, P_{n−1}⟩ + A_n ⟨xP_n, P_{n−1}⟩ + B_n ⟨P_n, P_{n−1}⟩ = A_n ⟨P_n, xP_{n−1}⟩
= (A_n k_{n−1}/k_n) ⟨P_n, P_n⟩ = A_n h_n / A_{n−1}
and the desired result cn = Cn follows.
Of course we do not know whether these functions are polynomials. Whether they are depends on the
choice of weight function w and we will come back to this point shortly. But for now, let us assume that
w is such that the Fn are polynomials of degree n. For any polynomial p of degree n − 1 we then have
⟨F_n, p⟩ = ∫_a^b dx (dⁿ/dxⁿ)( w(x) Xⁿ ) p(x) = (−1)ⁿ ∫_a^b dx w(x) Xⁿ (dⁿp/dxⁿ)(x) = 0 ,   (4.11)
where we have integrated by parts n times. We recall that the orthogonality property (4.4) determines
Pn uniquely (up to an overall constant) and since Fn has the same property we conclude there must be
constants Kn such that
Fn = Kn Pn . (4.12)
This calculation shows the idea behind the definition (4.10) of the functions Fn . The presence of the
derivatives (and of X which ensures vanishing of the boundary terms) means that, provided the Fn are
polynomials of degree n, they are orthogonal.
Theorem 4.3. (Rodriguez formula) If the functions Fn defined in Eq. (4.10) are polynomials of degree n
they are proportional to the orthogonal polynomials Pn , so we have constants Kn such that Fn = Kn Pn .
It follows that
P_n(x) = (1/(K_n w(x))) dⁿ/dxⁿ ( w(x) Xⁿ ) ,   X = (b − x)(a − x) for |a|, |b| < ∞ ,   X = x − a for |a| < ∞, b = ∞ ,   X = 1 for −a = b = ∞ .   (4.13)
F₁(x) = (1/w(x)) d/dx ( w(x) X ) = (w′(x)/w(x)) X + X′ = Ax + B   ⟹   w′(x)/w(x) = α/(x − b) + β/(x − a) ,   (4.14)
for suitable constants A, B, α, β. Solving the differential equation leads to w(x) = C (b − x)^α (x − a)^β and we can set C = 1 by a re-scaling. Further, since w needs to be integrable we have to demand that α > −1 and β > −1. Conversely, it can be shown by calculation that for any such choice of w the functions F_n are indeed polynomials of degree n.
For the case |a| < ∞ and b = ∞ of the half-infinite interval we can proceed analogously and find that
the Fn are polynomials of degree n iff
[a, b]    | α, β            | X      | w(x)                          | name       | symbol
[−1, 1]   | α > −1, β > −1  | x² − 1 | (1 − x)^α (x + 1)^β           | Jacobi     | P_n^{(α,β)}
[−1, 1]   | α = β > −1      | x² − 1 | (1 − x)^α (x + 1)^α           | Gegenbauer | P_n^{(α,α)}
[−1, 1]   | α = β = ±1/2    | x² − 1 | (1 − x)^{±1/2} (x + 1)^{±1/2} | Chebyshev  | T_n^{(±)}
[−1, 1]   | α = β = 0       | x² − 1 | 1                             | Legendre   | P_n
[0, ∞]    | α > −1          | x      | e^{−x} x^α                    | Laguerre   | L_n^{(α)}
[0, ∞]    | α = 0           | x      | e^{−x}                        | Laguerre   | L_n
[−∞, ∞]   |                 | 1      | e^{−x²}                       | Hermite    | H_n
Table 1: The types of orthogonal polynomials and several sub-classes which result from the classification in Theorem 4.4. The explicit polynomials are obtained by inserting the quantities in the Table into the Rodriguez formula (4.18).
where α > −1 and β > −1. In this case the Fn are orthogonal and Fn = Kn Pn for constants Kn .
This theorem implies a classification of orthogonal polynomials in terms of the type of interval, the limits
[a, b] of the interval and the powers α and β which enter the weight function. (Of course a finite interval
[a, b] can always be re-scaled to the standard interval [−1, 1] and a semi-infinite interval [a, ∞] to [0, ∞].)
The different types and important sub-classes of orthogonal polynomials which arise from this classification
are listed in Table 1. We cannot discuss all of these types in detail but in the following we will focus
on the Legendre, the α = 0 Laguerre and the Hermite polynomials which are the most relevant ones
for applications in physics. Before we get to this we should derive more common properties of all these
orthogonal polynomials.
Proof. For ease of notation we abbreviate D = d/dx and evaluate D^{n+1}( X D(w Xⁿ) ) in two different ways, remembering that X is a polynomial of degree at most two. First,
D^{n+1}( X D(wXⁿ) ) = X D^{n+2}(wXⁿ) + (n+1) X′ D^{n+1}(wXⁿ) + (1/2) n(n+1) X′′ Dⁿ(wXⁿ)
= K_n [ X D²(w P_n) + (n+1) X′ D(w P_n) + (1/2) n(n+1) X′′ w P_n ] .
Secondly,
D^{n+1}( X D(wXⁿ) ) = D^{n+1}( [ X D(wX) X^{n−1} + (n−1) w Xⁿ X′ ] ) = D^{n+1}( [ K₁ P₁ + (n−1) X′ ] w Xⁿ )
= K_n [ ( K₁ P₁ + (n−1) X′ ) D(w P_n) + (n+1)( k₁ K₁ + (n−1) X′′ ) w P_n ] .
Equating these two results and setting y = P_n gives, after a straightforward calculation,
w X y′′ + ( 2X w′ + 2w X′ − w K₁ P₁ ) y′ + [ X w′′ + ( 2X′ − K₁ P₁ ) w′ − (n+1)( k₁ K₁ + (1/2)(n−2) X′′ ) w ] y = 0 .
By working out D(wX) and D²(wX) one easily concludes that
(w′/w) X = K₁ P₁ − X′ ,   (w′′/w) X = ( K₁ P₁ − 2X′ )(w′/w) + k₁ K₁ − X′′ .   (4.21)
Using these results to replace w0 in the factor of y 0 and w00 in the factor of y in the above differential
equation we arrive at the desired result.
The above theorem means that every function f ∈ L2w ([a, b]) can be expanded as
f = Σ_{n=0}^∞ ⟨P̂_n, f⟩ P̂_n ,   (4.24)
This is of course completely in line with the general idea of expanding vectors in terms of an ortho-normal
basis on a Hilbert space and it can be viewed as the polynomial analogue of the Fourier series.
We should now discuss the most important orthogonal polynomials in more detail.
where we have integrated by parts n times. This means the associated basis of ortho-normal polynomials
on L²([−1, 1]) is
P̂_n = √( (2n + 1)/2 ) P_n ,   (4.30)
and functions f ∈ L2 ([−1, 1]) can be expanded as
f = Σ_{n=0}^∞ ⟨P̂_n, f⟩ P̂_n ,   (4.31)
or, more explicitly, shifting the normalisation factors into the integral, as
f(x) = Σ_{n=0}^∞ a_n P_n(x) ,   a_n = ( (2n + 1)/2 ) ∫_{−1}^1 dx P_n(x) f(x) .   (4.32)
Such expansions are useful and frequently appear when spherical coordinates are used and we have the
standard inclination angle θ ∈ [0, π]. In this case, the Legendre polynomials are usually a function of
x = cos θ which takes values in the required range [−1, 1]. We will see more explicit examples of this
shortly.
With the above results it is easy to compute the constants An , Bn and Cn which appear in the general
recursion formula (4.5) and we find
A_n = (2n + 1)/(n + 1) ,   B_n = 0 ,   C_n = n/(n + 1) .   (4.33)
Using these values to specialise Eq. (4.5) we find the recursion formula
(n + 1) P_{n+1}(x) = (2n + 1) x P_n(x) − n P_{n−1}(x)   (4.34)
for the Legendre polynomials. From the Rodriguez formula (4.26) we can easily compute the first few
Legendre polynomials:
P₀(x) = 1 ,  P₁(x) = x ,  P₂(x) = (1/2)(3x² − 1) ,  P₃(x) = (1/2)(5x³ − 3x) ,  P₄(x) = (1/8)(35x⁴ − 30x² + 3) .   (4.35)
Exercise 4.7. Verify that the first four Legendre polynomials in Eq. (4.35) are orthogonal and are nor-
malised as in Eq. (4.29).
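A numerical check in the spirit of Exercise 4.7 is sketched below (not part of the notes, numpy assumed): it builds P₀, …, P₄ from the recursion (4.34) and evaluates all scalar products on a fine grid; the diagonal entries should equal 2/(2n+1), as implied by Eq. (4.30).

```python
import numpy as np

P = [np.polynomial.Polynomial([1.0]), np.polynomial.Polynomial([0.0, 1.0])]   # P_0, P_1
xpoly = np.polynomial.Polynomial([0.0, 1.0])
for n in range(1, 4):                                     # recursion (4.34)
    P.append(((2*n + 1) * xpoly * P[n] - n * P[n-1]) / (n + 1))

x = np.linspace(-1, 1, 20001)
gram = np.array([[np.trapz(p(x) * q(x), x) for q in P] for p in P])
print(np.round(gram, 5))   # diagonal ≈ 2, 2/3, 2/5, 2/7, 2/9; off-diagonal ≈ 0
```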
We can insert the results X = x2 − 1, X 00 = 2, P1 (x) = x, K1 = 2 and k1 = 1 into the general differential
equation (4.20) to obtain
(1 − x2 )y 00 − 2xy 0 + n(n + 1)y = 0 . (4.36)
This is the Legendre differential equation which all Legendre polynomials P_n satisfy.
Exercise 4.8. Show that the first four Legendre polynomials in Eq. (4.35) satisfy the Legendre differential
equation (4.36).
Another feature of orthogonal polynomials is the existence of a generating function G = G(x, z) defined
as
G(x, z) = Σ_{n=0}^∞ P_n(x) zⁿ .   (4.37)
The generating function encodes all orthogonal polynomials at once and the nth one can be read off as the
coefficient of the z n term in the expansion of G. Of course for this to be of practical use we have to find
another more concise way of writing the generating function. This can be obtained from the recursion
relation (4.34) which leads to
∂G/∂z = Σ_{n=1}^∞ P_n(x) n z^{n−1} = Σ_{n=0}^∞ (n+1) P_{n+1}(x) zⁿ = Σ_{n=0}^∞ [ (2n+1) x P_n(x) − n P_{n−1}(x) ] zⁿ
= 2xz Σ_{n=1}^∞ P_n(x) n z^{n−1} + x Σ_{n=0}^∞ P_n(x) zⁿ − Σ_{n=0}^∞ P_n(x) (n+1) z^{n+1} = (2xz − z²) ∂G/∂z + (x − z) G .
This provides us with a differential equation for G whose solution is G(x, z) = c(1 − 2xz + z 2 )−1/2 , where
c is a constant. Since c = G(x, 0) = P0 (x) = 1, we have
G(x, z) = 1/√(1 − 2xz + z²) = Σ_{n=0}^∞ P_n(x) zⁿ .   (4.38)
Exercise 4.9. Check that the generating function (4.38) leads to the correct Legendre polynomials Pn , for
n = 0, 1, 2.
Note that Eq. (4.38) can be viewed as an expansion of the generating function G in the sense of Eq. (4.32),
with expansion coefficients an = z n .
Application 4.16. Expanding the Coulomb potential
An important application of the above generating function is to the expansion of a Coulomb potential
term of the form
V(r, r′) = 1/|r − r′| ,   (4.39)
where r, r′ ∈ R³. Introducing the radii r = |r|, r′ = |r′|, the angle θ via cos θ = r·r′/(r r′) and setting x = cos θ and z = r′/r we can use the generating function to re-write the above Coulomb term as
V(r, r′) = 1/( r √( 1 − 2 (r′/r) cos θ + (r′/r)² ) ) = (1/r) G(x, z) = (1/r) Σ_{n=0}^∞ (r′/r)ⁿ P_n(cos θ) .   (4.40)
The series on the RHS converges for r0 < r. If r0 > r we can write a similar expansion with the role of
r and r0 exchanged. This formula is very useful in the context of multipole expansions, for example in
electromagnetism. The nth term in this expansion falls off as r^{−(n+1)}, so the n = 0 term corresponds to the monopole contribution, the n = 1 term to the dipole, the n = 2 term to the quadrupole etc.
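As a concrete illustration (not part of the notes), the multipole expansion (4.40) can be tested for two specific vectors; the sketch assumes numpy and scipy, and the vectors and truncation order are arbitrary choices.

```python
import numpy as np
from scipy.special import eval_legendre

r_vec  = np.array([0.0, 0.0, 2.0])       # observation point
rp_vec = np.array([0.3, 0.4, 0.5])       # source point with |r'| < |r|
r, rp  = np.linalg.norm(r_vec), np.linalg.norm(rp_vec)
cos_theta = r_vec @ rp_vec / (r * rp)

exact  = 1.0 / np.linalg.norm(r_vec - rp_vec)
series = sum((rp / r)**n * eval_legendre(n, cos_theta) for n in range(30)) / r
print(exact, series)    # the truncated series agrees with 1/|r - r'| to high accuracy
```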
P_l^m(x) = (1/(2^l l!)) (1 − x²)^{m/2} d^{l+m}/dx^{l+m} (x² − 1)^l ,   (4.41)
where l = 0, 1, . . . and m = −l, . . . , l. Clearly, the Legendre polynomials are obtained for m = 0, so
Pl = Pl0 , and for positive m the associated Legendre functions can be written in terms of Legendre
polynomials as
P_l^m(x) = (1 − x²)^{m/2} d^m/dx^m P_l(x) .   (4.42)
The associated Legendre functions are solutions of the differential equation
(1 − x²) y′′ − 2x y′ + [ l(l + 1) − m²/(1 − x²) ] y = 0 ,   (4.43)
generalising the Legendre differential equation (4.36) to which it reduces for m = 0. A calculation based
on Eq. (4.41) and partial integration leads to the orthogonality relations
Z 1
2 (l + m)!
dx Plm (x)Plm
0 (x) = δll0 . (4.44)
−1 2l + 1 (l − m)!
Exercise 4.10. Show that the associated Legendre polynomials Plm solve the differential equation (4.43)
and satisfy the orthogonality relations (4.44).
where the pre-factor is conventional and implies that Kn = n!. It is easy to extract from the Rodriguez
formula the coefficients kn and kn0 of xn and xn−1 in Ln and they are given by
k_n = (−1)ⁿ/n! ,   k′_n = (−1)^{n−1} n/(n − 1)! .   (4.46)
The normalisation of the Ln is computed from the Rodriguez formula with the usual partial integration
trick and it follows
h_n = ‖L_n‖² = ∫_0^∞ dx e^{−x} L_n(x)² = (1/n!) ∫_0^∞ dx L_n(x) dⁿ/dxⁿ ( e^{−x} xⁿ ) = ((−1)ⁿ/n!) ∫_0^∞ dx (dⁿL_n/dxⁿ)(x) e^{−x} xⁿ
= (−1)ⁿ k_n ∫_0^∞ dx e^{−x} xⁿ = 1 .   (4.47)
Hence, with our convention, the Laguerre polynomials are already normalised.⁶ Functions f ∈ L²_w([0, ∞]) can now be expanded as
f = Σ_{n=0}^∞ ⟨L_n, f⟩ L_n   or   f(x) = Σ_{n=0}^∞ a_n L_n(x) ,   a_n = ∫_0^∞ dx e^{−x} L_n(x) f(x) .   (4.48)
Expansions in terms of Laguerre polynomials are often useful for functions which depend on a radius r
which has a natural range [0, ∞].
Inserting the above results for kn , kn0 and hn into Eq. (4.6) gives
A_n = −1/(n + 1) ,   B_n = (2n + 1)/(n + 1) ,   C_n = n/(n + 1)   (4.49)
and using these values in Eq. (4.5) leads to the recursion relation
(n + 1)Ln+1 (x) = (2n + 1 − x)Ln (x) − nLn−1 (x) (4.50)
for the Laguerre polynomials. From the Rodriguez formula the first few Laguerre polynomials are given
by
L₀(x) = 1 ,  L₁(x) = −x + 1 ,  L₂(x) = (1/2)(x² − 4x + 2) ,  L₃(x) = (1/6)(−x³ + 9x² − 18x + 6) .   (4.51)
Inserting X = x, K₁ = 1, k₁ = −1, P₁ = 1 − x and X′′ = 0 into Eq. (4.20) gives the differential equation
x y′′ + (1 − x) y′ + n y = 0   (4.52)
for the Laguerre polynomials. For the generating function
G(x, z) = Σ_{n=0}^∞ L_n(x) zⁿ   (4.53)
we can derive a differential equation in much the same way as we did for the Legendre polynomials, using
the recursion relation (4.50). This leads to
∂G/∂z = ( (1 − x − z)/(1 − z)² ) G ,   (4.54)
and the solution is
G(x, z) = (1/(1 − z)) exp( −xz/(1 − z) ) = Σ_{n=0}^∞ L_n(x) zⁿ .   (4.55)
Exercise 4.11. Derive the differential equation (4.54) for the generating function of the Laguerre poly-
nomials and show that its solution is given by Eq. (4.55).
⁶ Sometimes the L_n are defined without the n! factor in the Rodriguez formula (4.45). For this convention the L_n are of course not normalised to one, but instead h_n = ‖L_n‖² = (n!)².
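The generating function (4.55) can also be expanded symbolically, which amounts to a quick check of Exercise 4.11. The sketch below (not part of the notes) assumes sympy and compares the Taylor coefficients in z with the polynomials in Eq. (4.51).

```python
import sympy as sp

x, z = sp.symbols('x z')
G = sp.exp(-x * z / (1 - z)) / (1 - z)                 # generating function (4.55)
series = sp.series(G, z, 0, 4).removeO()
for n in range(4):
    print(n, sp.expand(series.coeff(z, n)))
# n = 0: 1,  n = 1: 1 - x,  n = 2: x**2/2 - 2*x + 1,  n = 3: -x**3/6 + 3*x**2/2 - 3*x + 1
```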
4.4 The Hermite polynomials
From Table 1, the Hermite polynomials are defined on the interval [a, b] = [−∞, ∞] = R, we have X = 1 and the weight function is w(x) = e^{−x²}, so the relevant Hilbert space is L²_w(R). They are denoted H_n and, from Eq. (4.3), their Rodriguez formula reads
H_n(x) = (−1)ⁿ e^{x²} dⁿ/dxⁿ e^{−x²} ,   (4.56)
where the pre-factor is conventional and implies that Kn = (−1)n . Their symmetry properties are Hn (x) =
(−1)n Hn (−x). From this formula it is easy to read off the coefficients kn and kn0 of the leading and sub-
leading monomials xn and xn−1 in Hn as
kn = 2n , kn0 = 0 . (4.57)
The normalisation of the Hermite polynomials is computed as before, by using the Rodriguez formula (4.56)
combined with partial integration:
h_n = ‖H_n‖² = ∫_R dx e^{−x²} H_n(x)² = (−1)ⁿ ∫_R dx H_n(x) dⁿ/dxⁿ e^{−x²} = (−1)^{2n} ∫_R dx (dⁿH_n/dxⁿ)(x) e^{−x²}
= k_n n! ∫_R dx e^{−x²} = √π 2ⁿ n! .   (4.58)
The Hermite polynomials are useful for expanding functions defined on the entire real line and they make
a prominent appearance in the wave functions for the quantum harmonic oscillator.
From the above results for kn , kn0 and hn it is easy, by inserting into Eq. (4.6), to work out
An = 2 , Bn = 0 , Cn = 2n , (4.62)
for the Hermite polynomials. Rodriguez's formula (4.56) can be used to work out the first few Hermite polynomials which are given by
H₀(x) = 1 ,  H₁(x) = 2x ,  H₂(x) = 4x² − 2 ,  H₃(x) = 8x³ − 12x ,  H₄(x) = 16x⁴ − 48x² + 12 .   (4.64)
With X = 1, X′′ = 0, K₁ = −1, H₁(x) = 2x and k₁ = 2, Eq. (4.20) turns into the differential equation for Hermite polynomials
y′′ − 2x y′ + 2n y = 0 .   (4.65)
The generating function
G(x, z) = Σ_{n=0}^∞ H_n(x) zⁿ/n!   (4.66)
Exercise 4.12. Show that the generating function G for the Hermite polynomials satisfies the differential
equation (4.67) and verify that it is solved by Eq. (4.68).
This function Z can be viewed as a generating function for the coefficients b_n. Using the explicit form of G from Eq. (4.68) we find
Z(z) = ∫_R dx f(x) e^{−(x−z)²} = ∫_R dx sin(x) e^{−(x−z)²} = (√π/e^{1/4}) sin(z) .
Hence, the coefficients b_n are given by
b_n = dⁿZ/dzⁿ |_{z=0} = (√π/e^{1/4}) dⁿ sin(z)/dzⁿ |_{z=0} = 0 for n even ,   b_n = (√π/e^{1/4}) (−1)^{(n−1)/2} for n odd .
5 Ordinary linear differential equations
In this chapter, our focus will be on linear, second order differential equations of the form
α₂(x) y′′ + α₁(x) y′ + α₀(x) y = f(x) ,   α₂(x) y′′ + α₁(x) y′ + α₀(x) y = 0 ,   (5.1)
where α0 , α1 and α2 as well as f are given functions. Clearly, the upper equation is inhomogeneous with
source f and the lower equation is its homogeneous counterpart. In operator form this can be written as
T y = f  and  T y = 0 ,   where   T = α₂ D² + α₁ D + α₀ ,   (5.2)
and D = d/dx. For the range of x we would like to consider the interval [a, b] ⊂ R (where the semi-infinite
and infinite case is allowed) and, provided αi ∈ C ∞ ([a, b]), we can think of T as a linear operator T :
C ∞ ([a, b]) → C ∞ ([a, b]). We note that the general differential equations (4.20) for orthogonal polynomials
(and, hence, the Legendre, Laguerre and Hermite differential equations in Eqs. (4.36), (4.52) and (4.65)) are homogeneous equations of the form (5.1).
The above equations are usually solved subject to additional conditions on the solution y and there
are two ways of imposing such conditions. The first one, which leads to what is called an initial value
problem, is to ask for solutions to either of Eqs. (5.1) which, in addition, satisfy the “initial conditions”
y(x0 ) = y0 , y 0 (x0 ) = y00 , (5.3)
for x0 ∈ [a, b] and given values y0 , y00 ∈ R. Another possibility, which defines a boundary value problem, is
to ask for solutions to either of Eqs. (5.1) which satisfy the conditions
da y(a) + na y 0 (a) = ca , db y(b) + nb y 0 (b) = cb , (5.4)
where da , db , na , nb , ca , cb ∈ R are given constants. In other words, we impose linear conditions on the
function at both endpoints of the interval [a, b]. If d_a = d_b = 0, so that these become conditions on the first derivative only, they are called Neumann boundary conditions. The opposite case n_a = n_b = 0, when
the boundary conditions only involve y but not y 0 are called Dirichlet boundary conditions. The general
case is referred to as mixed boundary conditions. For ca = cb = 0 the boundary conditions are called
homogeneous, otherwise they are called inhomogeneous.
Initial and boundary value problems, although related, are conceptually quite different. In physics,
the former are usually considered when the problem involves time evolution (so x corresponds to time)
and the initial state of the system needs to be specified at a particular time. Boundary value problems
frequently arise in physics when x has the interpretation of a spatial variable, for example the argument
of a wave function in quantum mechanics which needs to satisfy certain conditions at the boundary.
In this section, we will mainly be concerned with boundary value problems (initial value problems
having been the focus of the first year courses on differential equations). We begin with a quick review of
the relevant basic mathematics.
where the vector y(x) = (y1 (x), . . . , yn (x))T consists of the n functions we are trying to find, g(x) =
(g1 (x), . . . , gn (x))T is a given vector of functions and A is a given n × n matrix of functions. For this
system, we have the following
Theorem 5.1. Let g = (g1 , . . . gn )T be an n-dimensional vector of continuous functions gi : [a, b] → F
and A = (Aij ) an n × n matrix of continuous functions Aij : [a, b] → F (where F = R or F = C). For
a given x0 ∈ [a, b] and any c ∈ F n the inhomogeneous differential equation (5.5) has a unique solution
y : [a, b] → F n with y(x0 ) = c.
Proof. This is a classical statement from the theory of ordinary differential equations. The existence part
is also sometimes referred to as the Picard-Lindelöf Theorem. The proof can be found in many books on
the subject, for example Ref. [10].
In simple terms, the above theorem states that the initial value problem for the differential equation (5.5)
always has a solution and that this solution is unique. Next, we focus on the homogeneous equation.
Theorem 5.2. Let A = (Aij ) be an n × n matrix of continuous functions Aij : [a, b] → F and y : [a, b] →
F n . Then the set of solutions VH of the homogeneous differential equation (5.5) is an n-dimensional vector
space over F . For k solutions y1 , . . . , yk ∈ VH the following statements are equivalent.
(i) y1 , . . . yk are linearly independent in VH .
(ii) There exists an x0 ∈ [a, b] such that y1 (x0 ), . . . yk (x0 ) ∈ F n are linearly independent.
(iii) The vectors y1 (x), . . . yk (x) ∈ F n are linearly independent for all x ∈ [a, b].
Proof. The proof follows from simple considerations and Theorem 5.1.
So the homogeneous solution space is n-dimensional and a given set of solutions y₁, . . . , y_n
forms a basis of VH iff y1 (x), . . . yn (x) ∈ F n are linearly independent for at least one x. Alternatively, we
can say
and this provides a practical way of checking whether a system of solutions forms a basis of the solution
space.
If VI is the set of solutions of the inhomogeneous equation (5.5) it is clear that
V I = y0 + V H , (5.7)
where y0 is any solution to (5.5). A special solution of the inhomogeneous equation can be found by a
process called variation of constants as in the following
Theorem 5.3. (Variation of constants) If Y = (y1 , . . . , yn ) is a basis of VH then
y₀(x) = Y(x) ∫_{x₀}^x dt Y(t)⁻¹ g(t)   (5.8)
5.1.2 Second order linear differential equations
How is the above discussion of first order differential equations relevant to our original problem (5.1) of
second order differential equations? The answer is, of course, that higher order differential equations can
be converted into systems of first order differential equations. To see this, start with the system (5.1) and
define an associated two-dimensional first order system of the form (5.5) given by
y = ( ỹ₁ ; ỹ₂ ) ,   A = ( 0 , 1 ; −α₀/α₂ , −α₁/α₂ ) ,   g = ( 0 ; f/α₂ ) .   (5.10)
(We assume α2 is non-zero everywhere.) The solutions of this first-order system and the ones of the second
order equation (5.1) are then in one-to-one correspondence via the identification ỹ1 = y and ỹ2 = y 0 . Given
this observation we can now translate the previous statements for first order systems into statements for
second order differential equations.
Theorem 5.4. Let αi , f : [a, b] → F be continuous (and α2 non-zero on [a, b]). Then we have the following
statements:
(a) For given y0 , y00 ∈ R and x0 ∈ [a, b], the inhomogeneous equation (5.1) has a unique solution y : [a, b] →
F with y(x0 ) = y0 and y 0 (x0 ) = y00 .
(b) The solutions y : [a, b] → F to the homogeneous equation form a two-dimensional vector space VH over F. Two solutions y₁ and y₂ to the homogeneous equation form a basis of VH iff the matrix
( y₁ , y₂ ; y₁′ , y₂′ )(x)   (5.11)
is non-singular for at least one x ∈ [a, b] or, equivalently, iff the Wronski determinant
W := det( y₁ , y₂ ; y₁′ , y₂′ )(x) = (y₁ y₂′ − y₂ y₁′)(x)   (5.12)
is non-zero for at least one x ∈ [a, b].
The procedure of variation of constants in Theorem 5.3 can also be transferred to second order differential
equations and leads to
Theorem 5.5. (Variation of constants) Let αi , g : [a, b] → F be continuous (and α2 non-zero on [a, b]) and
y1 , y2 : [a, b] → F a basis of solutions for the homogeneous system (5.1). Then, a solution y : [a, b] → F
of the inhomogeneous system is given by
y(x) = ∫_{x₀}^x dt G(x, t) f(t) ,   (5.13)
where the Green function is given by
G(x, t) = ( y₁(t) y₂(x) − y₁(x) y₂(t) ) / ( α₂(t) W(t) ) .   (5.14)
Proof. This follows directly from the above results for first order systems. More specifically, inserting
Y = ( y₁ , y₂ ; y₁′ , y₂′ ) ,   Y⁻¹ = (1/W) ( y₂′ , −y₂ ; −y₁′ , y₁ )   (5.15)
together with g from Eq. (5.10) into Eq. (5.8) gives the result.
α₂(x) y₀′′ + α₁(x) y₀′ + α₀(x) y₀ = 0 ,   d_a y₀(a) + n_a y₀′(a) = c_a ,   d_b y₀(b) + n_b y₀′(b) = c_b ,   (5.16)
that is, to the homogeneous equation with the inhomogeneous boundary conditions. Next, find a solution
ỹ to
that is, to the inhomogeneous differential equation with a homogeneous version of the boundary conditions.
It is easy to see that, thanks to linearity, the sum y = y0 + ỹ provides a solution to the general problem,
that is, to the inhomogeneous Eq. (5.1) with inhomogeneous boundary conditions (5.4). We can deal with
the first problem (5.16) by finding the most general solution to the homogeneous differential equation,
that is, determine the solution space VH , and then build in the boundary condition. We will discuss some
practical methods to do this soon but for now, let us assume this has been accomplished and we want to
solve the problem (5.17).
The idea is to do this by modifying the variation of constants approach from Theorem 5.5 and
construct a Green function which leads to the correct boundary conditions. Let’s address this for the case
of Dirichlet boundary conditions, so we are considering the problem
Theorem 5.6. Let y1 , y2 : [a, b] → F be a basis of VH , that is, a basis of solutions to the homogeneous
system (5.1), satisfying y1 (a) = y2 (b) = 0. Then a solution y : [a, b] → F to the Dirichlet boundary value
problem (5.18) is given by
y(x) = ∫_a^b dt G(x, t) f(t) ,   (5.19)
where the Green function G is given by
G(x, t) = ( y₁(t) y₂(x) θ(x − t) + y₁(x) y₂(t) θ(t − x) ) / ( α₂(t) W(t) ) .   (5.20)
Here θ is the Heaviside function defined by θ(x) = 1 for x ≥ 0 and θ(x) = 0 for x < 0.
Proof. The two homogeneous solutions y1 and y2 satisfy T y1 = T y2 = 0 with the operator T from Eq. (5.2)
and the conditions y1 (a) = y2 (b) = 0 can always be imposed since we know there exists a solution for any
choice of initial condition. Now we start with a typical variation of constant Ansatz
y(x) = u1 (x)y1 (x) + u2 (x)y2 (x) , (5.21)
where u1 , u2 are two functions to be determined. If we impose on those two functions the condition
u₁′ y₁ + u₂′ y₂ = 0   (5.22)
an easy calculation shows that
T y = α₂ ( u₁′ y₁′ + u₂′ y₂′ ) = f .   (5.23)
Solving Eqs. (5.22) and (5.23) for u1 and u2 leads to
u₁(x) = − ∫_{x₁}^x dt y₂(t) f(t) / ( α₂(t) W(t) ) ,   u₂(x) = ∫_{x₂}^x dt y₁(t) f(t) / ( α₂(t) W(t) ) ,   (5.24)
where x1 , x2 ∈ [a, b] are two otherwise arbitrary constants. To implement the boundary conditions y(a) =
y(b) = 0 it suffices to demand that u1 (b) = u2 (a) = 0 (given our assumptions about the boundary values
of y1 and y2 ) and this is guaranteed by choosing x1 = b and x2 = a. Inserting these values into Eq. (5.24)
and the expressions for ui back into the Ansatz (5.21) gives the desired result.
Whether Eq. (5.19) is the unique solution to the boundary value problem (5.18) depends on whether
there is a non-trivial solution to the homogeneous equations in VH which satisfies the relevant boundary
conditions y(a) = y(b) = 0. If there is it can be added to (5.19) and the solution is not unique, otherwise
it is. More generally, going back to the way we have split up the problem into two steps in Eqs. (5.16) and
(5.17), we have now found a method to find a solution to the second problem (5.17) (the inhomogeneous
equation with the homogeneous boundary conditions) for the Dirichlet case. Any solution to the first
problem (5.16) (the homogeneous equation with inhomogeneous boundary conditions) can be added to
this.
Obtaining a second independent solution from a known one can be useful but how can we find a solution
in the first place? A very common and efficient method is to start with a power series Ansatz
y(x) = Σ_{k=0}^∞ a_k x^k .   (5.27)
Of course, this is only practical if the functions αi which appear in T are polynomial. In this case, the idea
is to insert the Ansatz (5.27) into T y = 0, assemble the coefficient in front of xk and set this coefficient
to zero for every k. In this way, one obtains a recursion relation for the ak and inserting the resulting
ak back into Eq (5.27) gives the solution in terms of a power series. Of course this is where the difficult
work starts. Now one has to understand the properties of the so-obtained series, such as convergence,
singularities or asymptotic behaviour. All this is best demonstrated for examples and we will do this
shortly.
5.2 Examples
We would now like to apply some of the methods and results from the previous subsection to examples.
Of course we know that the Legendre polynomials are solutions but we would like to derive this
independently (as well as finding the second solution which must exist) by using the power series
method. Inserting the series (5.27) into the Legendre differential equation gives (after re-defining
some of the summation indices)
Σ_{k=0}^∞ [ (k + 2)(k + 1) a_{k+2} − ( k(k + 1) − n(n + 1) ) a_k ] x^k = 0 .   (5.29)
Demanding that the coefficient in front of every monomial xk vanishes (then and only then is a power
series identical to zero) we obtain the recursion formula
a_{k+2} = ( k(k + 1) − n(n + 1) ) / ( (k + 1)(k + 2) ) a_k ,   k = 0, 1, . . . ,   (5.30)
for the coefficients ak . There are a number of interesting features of this formula. First, the coefficients
a0 and a1 are not fixed but once values have been chosen for them the above recursion formula
determines all other ak . This freedom of choosing two coefficients precisely corresponds to the two
independent solutions we expect. The second interesting feature is that, due to the structure of the
numerator in Eq. (5.30), ak = 0 for k = n + 2, n + 4, . . ..
To see what happens in more detail let’s first assume that n is even. Choose (a0 , a1 ) = (1, 0). In
this case all ak with k odd vanish and the ak with k even are non-zero only for k ≤ n. This means,
the series terminates and turns into a polynomial - this is of course (proportional to) the Legendre
polynomial Pn for n even. Still for n even, make the complementary choice (a0 , a1 ) = (0, 1). In this
case all the ak with k even are zero. However, for k odd and n even the numerator in Eq. (5.30) never
vanishes so this leads to an infinite series which only contains odd powers of x. This is the second
solution, in addition to the Legendre polynomials. For n odd the situation is of course similar but
reversed. For (a0 , a1 ) = (0, 1) we get polynomials - the Legendre polynomial Pn for n odd - while for
(a0 , a1 ) = (1, 0) we get the second solution, an infinite series with only even powers of x.
Exercise 5.8. Show explicitly that, for suitable choices of a0 and a1 , the recursion formula (5.30)
reproduces the first few Legendre polynomials in Eq. (4.35).
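A possible way to approach Exercise 5.8 on a computer is sketched below (not part of the notes, standard library only): the recursion (5.30) is iterated with exact fractions, and for n = 3, (a₀, a₁) = (0, 1) the series terminates and is proportional to P₃.

```python
from fractions import Fraction

def series_coefficients(n, a0, a1, kmax=10):
    a = [Fraction(a0), Fraction(a1)]
    for k in range(kmax - 1):                            # recursion (5.30)
        a.append(Fraction(k*(k + 1) - n*(n + 1), (k + 1)*(k + 2)) * a[k])
    return a

print([str(c) for c in series_coefficients(3, 0, 1)])
# ['0', '1', '0', '-5/3', '0', '0', ...]  i.e. x - 5x^3/3, proportional to P_3 = (5x^3 - 3x)/2
```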
Of course the differential equation (5.28) also makes sense if n is a real number, rather than an integer,
and the above calculation leading to the coefficients a_k remains valid in this case. However, if n ∉ N, the numerator in Eq. (5.30) never vanishes and both solutions to (5.28) are non-polynomial.
To find the second solution we can also use the reduction of order method from Theorem 5.7. To
demonstrate how this works we focus on the case n = 1 with differential equation
(1 − x2 )y 00 − 2xy 0 + 2y = 0 . (5.31)
which is solved by the Legendre polynomial y(x) = P1 (x) = x. Inserting this, together with α1 (x) =
−2x and α2 (x) = 1 − x2 into Eq. (5.25) gives
u′(x) = (1/x²) exp( ∫^x dt 2t/(1 − t²) ) = 1/( x² (1 − x²) )   (5.32)
Exercise 5.9. Find the Taylor series of the solution (5.34) around x = 0 and show that the coefficients
in this series are consistent with the recursion formula (5.30).
To find its solutions we can proceed like we did in the Legendre case and insert the series (5.27). This
leads to
Σ_{k=0}^∞ [ (k + 1)(k + 2) a_{k+2} − 2(k − n) a_k ] x^k = 0 ,   (5.36)
a_{k+2} = 2(k − n) / ( (k + 1)(k + 2) ) a_k .   (5.37)
As before, we have a free choice of a0 and a1 but with those two coefficients fixed the recursion formula
determines all others. From the numerator in Eq. (5.37) it is clear that ak = 0 for k = n + 2, n + 4, . . ..
For n even and (a0 , a1 ) = (1, 0) we get a polynomial with only even powers of x - up to an overall
constant the Hermite Polynomial Hn with n even - while (a0 , a1 ) = (0, 1) leads to an infinite series
with only odd powers of x - the second solution of (5.35). For n odd the choice (a0 , a1 ) = (0, 1) leads
to a polynomial solution with only odd powers of x which is proportional to the Hermite polynomials
Hn for n odd, while the choice (a0 , a1 ) = (1, 0) leads to a power series with only even powers of x.
Exercise 5.10. Show that, for appropriate choices of a0 and a1 the recursion formula reproduces the
first few Hermite polynomials (4.64).
As in the Legendre case, the differential equation (5.35) also makes sense if n is a real number. If n ∉ N then the numerator in Eq. (5.37) never vanishes and both solutions to (5.35) are non-polynomial.
(This observation plays a role for the energy quantisation of the quantum harmonic oscillator.)
Of course the above discussion can be repeated for the Laguerre differential equation (4.52) as in the
following
Exercise 5.11. Insert the series Ansatz (5.27) into the Laguerre differential equation (4.52) and find
the recursion relation for the coefficients ak . Discuss the result and identify the choices which lead to the
Laguerre polynomials.
T y = f ,   T = d²/dx² + 1   (5.38)
on the interval [a, b] = [0, π/2], where f is an arbitrary function. (This describes a driven harmonic
oscillator with driving force f .) It is clear that the solution space of the associated homogeneous
equation, T y = 0, is given by
and since this is non-vanishing the two solutions are indeed linearly independent. To find the solution
space of the inhomogeneous equation we can use the variation of constant method from Theorem 5.5.
Inserting y₁ = sin, y₂ = cos, W = −1 and α₂ = 1 into Eq. (5.14) we find for the Green function
G(x, t) = sin(x) cos(t) − cos(x) sin(t) = sin(x − t) .
From Eq. (5.13) this means a special solution to the inhomogeneous equation is given by
y₀(x) = ∫_{x₀}^x dt G(x, t) f(t) = ∫_{x₀}^x dt sin(x − t) f(t) ,   (5.42)
and the full solution space of the inhomogeneous equation is
VI = y₀ + VH .   (5.43)
Exercise 5.12. Check explicitly that y0 from Eq. (5.42) satisfies the equation T y0 = f .
Let us now consider Eq. (5.38) as a boundary value problem on the interval [a, b] = [0, π/2] with Dirichlet boundary conditions y(0) = y(π/2) = 0 and apply the results of Theorem 5.6. First, we note that y₁(0) = y₂(π/2) = 0 so our chosen homogeneous solutions do indeed satisfy the requirements of the Theorem. Inserting y₁ = sin, y₂ = cos, W = −1 and α₂ = 1 into Eq. (5.20) gives the Green function
G(x, t) = −( sin(t) cos(x) θ(x − t) + sin(x) cos(t) θ(t − x) ) .
which is certainly well-defined as long as x > 0. A short calculation, using integration by parts and noting
that the boundary term vanishes, gives
x Γ(x) = ∫_0^∞ dt e^{−t} ( x t^{x−1} ) = [ e^{−t} t^x ]_0^∞ + ∫_0^∞ dt e^{−t} t^x = Γ(x + 1) ,   (5.47)
for n ∈ N. Hence, the Gamma function can be seen as a function which extends the factorial operation
to non-integer numbers. From Eq. (5.48) it also follows by iteration that
Γ(x) = Γ(x + n) / ( x (x + 1) · · · (x + n − 1) ) ,   (5.50)
which shows that the Γ function has poles at x = 0, −1, −2, . . .. These are, in fact, the only poles of the
Gamma function.
Application 5.21. Asymptotic expression for the Γ-function
As an aside, let us derive an asymptotic expression for the Γ-function. We start with the substitution
t = (x − 1)s in the defining integral (5.46), which leads to
Γ(x) = (x − 1)^x ∫_0^∞ ds e^{(x−1)(ln(s)−s)} ≃ (x − 1)^x ∫_0^∞ ds e^{(x−1)(−1−(s−1)²/2)+···}
= (x − 1)^x e^{−(x−1)} ∫_{−1}^∞ du e^{−(x−1)u²/2} ≃ (x − 1)^x e^{−(x−1)} ∫_{−∞}^∞ du e^{−(x−1)u²/2} = √(2π(x − 1)) ( (x − 1)/e )^{x−1} ,   (5.51)
where u = s − 1 has been substituted in the second line.
Exercise 5.13. What happened in the second step and the second last step of the calculation in
Eq. (5.51)? Which approximations have been made and how are they justified?
The approximate result (5.51) for the Γ-function leads to the famous Stirling formula
n! = Γ(n + 1) ≃ √(2πn) ( n/e )ⁿ   (5.52)
which provides an asymptotic approximation to the value of n!.
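A quick numerical comparison (not part of the notes, standard library only) shows how good the approximation already is for moderate n.

```python
import math

for n in [1, 2, 5, 10, 20]:
    stirling = math.sqrt(2 * math.pi * n) * (n / math.e)**n
    print(n, math.factorial(n), stirling, math.factorial(n) / stirling)   # ratio tends to 1
```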
Much more can be said about the Gamma function and its natural habitat is in complex analysis. For
our purposes the above is sufficient but more information can be found, for example, in Ref. [11].
x² y′′ + x y′ + (x² − ν²) y = 0 ,   (5.53)
Inserting the Ansatz
y(x) = x^α Σ_{k=0}^∞ a_k x^k   (5.54)
into the differential equation (putting the additional factor x^α into the Ansatz proves useful to improve the properties of the resulting series) leads to
Σ_{k=0}^∞ k(k + 2α) a_k x^k + Σ_{k=0}^∞ a_k x^{k+2} = 0 .   (5.55)
There is only a single term proportional to x and to remove this term we need to set a₁ = 0. Since the recursion formula will imply that a_{k+2} is proportional to a_k this means all a_k with k odd must vanish.
For the coefficients with even k Eq. (5.55) gives
a_{2k} = − a_{2k−2} / ( 4k(k + α) ) ,   k = 1, 2, . . . .   (5.56)
This recursion formula can be iterated and leads to
a_{2k} = ( (−1)^k Γ(α + 1) / ( 2^{2k} k! Γ(k + α + 1) ) ) a₀ .   (5.57)
That this result for a2k does indeed satisfy the recursion relation (5.56) follows directly from the functional
equation (5.48) of the Gamma function. It is conventional to choose a0 = (2α Γ(α + 1))−1 and, by inserting
everything back into the Ansatz (5.54), this leads to the two series solutions for α = ±ν given by
J_{±ν}(x) := (x/2)^{±ν} Σ_{k=0}^∞ ( (−1)^k / ( k! Γ(k ± ν + 1) ) ) (x/2)^{2k} .   (5.58)
Both Jν and J−ν are solutions of the Bessel differential equation and they are called Bessel functions of
the first kind. It can be shown (for example by applying the quotient criterion) that the series in Eq (5.58)
converges for all x. There is a subtlety for J−n if n = 0, 1, . . .. In this case, the terms in the series for
k = 0, . . . , n − 1 have a Gamma-function pole in the denominator (see Eq. (5.50)) and are effectively
removed so that the sum starts at k = n. Using this fact, it follows from the above series that
J_{−n}(x) = (−1)ⁿ J_n(x) .   (5.59)
In other words, if ν = n is an integer then the two solutions J_n and J_{−n} are linearly dependent. If ν is not an integer the Wronski determinant at x → 0 has the leading term W = J_ν(x) J′_{−ν}(x) − J′_ν(x) J_{−ν}(x) = −2ν/( Γ(−ν + 1) Γ(ν + 1) x ) (1 + O(x)) ≠ 0 which shows that the solutions J_ν and J_{−ν} are linearly independent. To
overcome the somewhat awkward distinction between the integer and non-integer case it is customary to
define the Bessel functions of the second kind by
N_ν(x) = ( J_ν(x) cos(νπ) − J_{−ν}(x) ) / sin(νπ) .   (5.60)
This definition has an apparent pole if ν is integer but it can be shown that it is well-defined in the limit
ν → n. We summarise some of the properties of Bessel functions in the following
Proposition 5.1. (Properties of Bessel functions) The Bessel functions of the first and second kind, J_ν and N_ν, defined above, solve the Bessel differential equation and are linearly independent for all ν ∈ R. For their asymptotic properties we have
x → 0 :  J_ν(x) → (1/Γ(ν + 1)) (x/2)^ν ,   N_ν(x) → (2/π)( ln(x/2) + C ) for ν = 0 ,   N_ν(x) → −(Γ(ν)/π) (2/x)^ν for ν > 0   (5.61)
x → ∞ :  J_ν(x) → √(2/(πx)) cos( x − νπ/2 − π/4 ) ,   N_ν(x) → √(2/(πx)) sin( x − νπ/2 − π/4 )   (5.62)
Proof. The fact that J_ν and N_ν solve the Bessel differential equation is clear from the above calculation. Their linear independence has been shown for ν non-integer and the integer case can be dealt with by a careful consideration of the definition (5.60) as ν → n. (See, for example, Ref. [12].)
The asymptotic limit as x → 0 can be directly read off from the series (except for ν = 0 which requires a bit more care). The proofs for the asymptotic limits x → ∞ are a bit more involved (see, for example, Ref. [12]). Intuitively, for large x, we should only keep the terms proportional to x² in the Bessel differential equation (5.53) which leads to y′′ + y ≃ 0. This is clearly solved by sin and cos.
Exercise 5.14. Show that J_{1/2}(x) = √(2/(πx)) sin(x) and N_{1/2}(x) = −√(2/(πx)) cos(x).
Particularly the above limits for large x are interesting. They show that the Bessel functions have oscilla-
tory properties close to sin and cos but, unlike these, they are not periodic. This is illustrated in Fig. 13.
In particular, the asymptotic limits for large x show that the Bessel functions have an infinite number of
zeros but they are not equally spaced (although they are asymptotically equally spaced). We denote the
zeros of Jν by zνk where k = 1, 2, . . . labels the zeros from small to large x. Their values can be computed
numerically and some examples for J0 , J1 and J2 are:
z0k = 2.405, 5.520, 8.654, . . .
z1k = 3.832, 7.016, 10.173, . . .
z2k = 5.136, 8.417, 11.620, . . .
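These numbers are easy to reproduce; the sketch below (not part of the notes) assumes scipy is available and uses its tabulated Bessel-zero routine for integer order.

```python
from scipy.special import jn_zeros

for nu in range(3):
    print(nu, jn_zeros(nu, 3))
# nu = 0: ≈ 2.405, 5.520, 8.654
# nu = 1: ≈ 3.832, 7.016, 10.173
# nu = 2: ≈ 5.136, 8.417, 11.620
```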
In fact, we have the following stronger statement.
Theorem 5.15. For ν > −1, the functions Jˆνk for k = 1, 2, . . ., defined in Eq. (5.63), with suitable choices
for Nνk , form an ortho-normal basis of L2w ([0, a]), where w(x) = x.
Proof. The direct proof is technical and can be found in Ref. [7]. In the next subsection, we will see an
independent argument.
The theorem implies that every function f ∈ L2w ([0, a]) can be expanded in terms of Bessel functions as
f = Σ_{k=1}^∞ a_k Ĵ_{νk} ,   a_k = ⟨Ĵ_{νk}, f⟩ = ∫_0^a dx x Ĵ_{νk}(x) f(x) .   (5.65)
with (real-valued) smooth functions w, p and q is called a Sturm-Liouville operator. For now, we would
like to think of this as an operator on the space L([a, b]) of square integrable functions, relative to a weight function w, on an interval [a, b] which are also infinitely many times differentiable. Accordingly, we should demand that w, p and q are smooth functions and that w, as a weight function, is strictly positive.
Lemma 5.2. Consider a linear second order differential operator of the form
T = α₂(x) d²/dx² + α₁(x) d/dx + α₀(x) ,   (5.68)
dx dx
where x ∈ [a, b] and I ⊂ [a, b] is an interval such that α2 (x) 6= 0 for all x ∈ I. Then, on I, the operator T
can be written in Sturm-Liouville form (5.66) with
p(x) = exp( ∫_{x₀}^x dt α₁(t)/α₂(t) ) ,   w(x) = p(x)/α₂(x) ,   q(x) = α₀(x) w(x) ,   (5.69)
where x0 ∈ I.
Proof. Abbreviating D = d/dx and noting that p′ = (α₁/α₂) p we obtain, by inserting into the Sturm-Liouville operator,
T_SL = (p/w) D² + (p′/w) D + q/w = α₂ D² + (α₁/α₂)(p/w) D + α₀ = α₂ D² + α₁ D + α₀ = T .   (5.70)
The interval I is introduced in the above lemma to avoid an undefined integrand in the first equation of (5.69)
due to the vanishing of α2 . Even when this happens (such as, for example, for the Legendre differential
equation at x = ±1) and the interval I is, at first, chosen to be genuinely smaller than [a, b] it turns out
the final result for w, p and q can often be extended to and is well-defined on [a, b].
An obvious question is whether TSL is self-adjoint as an operator on the space L([a, b]), relative to the
standard inner product
⟨f, g⟩ = ∫_a^b dx w(x) f(x) g(x) ,   (5.71)
with weight function w. A quick calculation shows that
⟨f, T_SL g⟩ = ∫_a^b dx ( f D(pDg) + f q g ) = [ p f Dg ]_a^b − ∫_a^b dx ( p Df Dg − q f g )
= [ p f Dg ]_a^b − [ p g Df ]_a^b + ∫_a^b dx ( D(pDf) g + q f g ) = [ p f g′ − p g f′ ]_a^b + ⟨T_SL f, g⟩ .   (5.72)
So TSL is superficially self-adjoint but we have to ensure that the boundary terms on the RHS vanish.
There are two obvious ways in which this can be achieved. First, the interval [a, b] might be chosen such
that p(a) = p(b) = 0 - this is also called the natural choice of the interval. In this case, the boundary term
vanishes without any additional condition on the functions f , g and TSL is self-adjoint on L([a, b]). If this
doesn’t work we can consider the subspace
of smooth functions which satisfy mixed homogeneous boundary conditions at a and b. For such functions
the above boundary term also vanishes. If p(a) = p(b) the boundary term also vanishes for periodic
functions
L_p([a, b]) := { f ∈ L([a, b]) | f(a) = f(b) , f′(a) = f′(b) } .   (5.74)
Hence, we have
Lemma 5.3. Let T_SL be a Sturm-Liouville operator (5.66). If p(a) = p(b) = 0 then T_SL is self-adjoint as an operator on the space L([a, b]) in Eq. (5.67). It is also self-adjoint on the space of functions L_b([a, b]) with mixed homogeneous boundary conditions in Eq. (5.73). If p(a) = p(b) it is self-adjoint on the space L_p([a, b]) of periodic functions in Eq. (5.74).
To simplify the notation, we will refer to the space on which the Sturm-Liouville operator is defined
and self-adjoint as LSL ([a, b]). From the previous Lemma, this can be L([a, b]), Lb ([a, b]) or Lp ([a, b]),
depending on the case.
name           | DEQ                        | p        | q       | w        | L_SL([a, b]) | bound. cond.      | λ             | y
sine Fourier   | y′′ = λy                   | 1        | 0       | 1        | L_b([0, a])  | y(0) = y(a) = 0   | −π²k²/a²      | sin(kπx/a)
cosine Fourier | y′′ = λy                   | 1        | 0       | 1        | L_b([0, a])  | y′(0) = y′(a) = 0 | −π²k²/a²      | cos(kπx/a)
Fourier        | y′′ = λy                   | 1        | 0       | 1        | L_p([−a, a]) | periodic          | −π²k²/a²      | sin(kπx/a), cos(kπx/a)
Legendre       | (1 − x²)y′′ − 2xy′ = λy    | 1 − x²   | 0       | 1        | L([−1, 1])   | natural           | −n(n + 1)     | P_n
Laguerre       | xy′′ + (1 − x)y′ = λy      | x e^{−x} | 0       | e^{−x}   | L([0, ∞])    | natural           | −n            | L_n
Hermite        | y′′ − 2xy′ = λy            | e^{−x²}  | 0       | e^{−x²}  | L([−∞, ∞])   | natural           | −2n           | H_n
Bessel         | y′′ + y′/x − (ν²/x²)y = λy | x        | −ν²/x   | x        | L_b([0, a])  | y(0) = y(a) = 0   | −z²_{νk}/a²   | Ĵ_{νk}
Table 2: The second order differential equations discussed so far and their formulation as a Sturm-Liouville eigenvalue problem.
in this way would be incorrect for two reasons. First, so far the Sturm-Liouville operator is only defined
on the space LSL which consists of certain smooth functions. While this space may well be dense in the
appropriate L2 Hilbert space it is not a Hilbert space itself. Secondly, Theorem 2.9 applies to compact
operators and we know from Exercise 1.13 that differential operators are not bounded and, hence, not
compact.
One way to make progress is to convert the Sturm-Liouville differential operator into an integral
operator. Some of the hard work has already been done in Theorem 5.6 where we have shown that,
provided Ker(TSL ) = {0} we know (for Dirichlet boundary conditions) that
T_SL y = f   ⟺   y = Ĝf ,   Ĝf(x) := ∫_a^b dt G(x, t) f(t) ,   (5.76)
where G is the Green function. The integral operator Ĝ, defined in terms of the Green function kernel
G, can be thought of as the inverse of the Sturm-Liouville operator and, as an integral operator, we can
extend it to act on the space L2w ([a, b]) (with appropriate boundary conditions). Moreover, we have
Lemma 5.4. If Ker(TSL ) = {0} then the operator Ĝ in Eq. (5.76) is self-adjoint and compact on L2w ([a, b])
(with Dirichlet boundary conditions).
This means the Sturm-Liouville eigenvalue problem is converted into an eigenvalue problem for Ĝ which is formulated in terms of an integral equation (a Fredholm integral equation). The eigenfunctions for T_SL and Ĝ are the same and the eigenvalues are each other's inverses. Since Ĝ is compact we can now apply Theorem 2.9 to it. If Ker(T_SL) ≠ {0} we can shift T_SL → T_SL + α by some value α so that the new operator has a vanishing kernel. In summary, Theorem 2.9 then applies to our eigenvalue problem:
Lemma 5.5. The set of eigenvalues of the Sturm-Liouville operator is either finite or it forms a sequence
which tends to infinity. Every eigenvalue has a finite degeneracy and there exists an ortho-normal basis
of eigenvectors.
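Lemma 5.5 can be checked numerically for the simplest Sturm-Liouville operator. The following sketch (our own illustration, not part of the notes; the interval and grid size are arbitrary choices) discretises T_SL y = y'' on [0, 1] with Dirichlet boundary conditions and confirms that the eigenvalues form a sequence tending to minus infinity and that the eigenvectors are orthonormal.

```python
# Our own numerical illustration of Lemma 5.5: discretise T_SL y = y'' on [0, 1]
# with Dirichlet boundary conditions y(0) = y(1) = 0 and inspect its spectrum.
import numpy as np

N = 400                              # number of interior grid points (arbitrary choice)
h = 1.0 / (N + 1)

# Central-difference matrix for y'' with Dirichlet boundary conditions.
T = (np.diag(-2.0 * np.ones(N)) + np.diag(np.ones(N - 1), 1)
     + np.diag(np.ones(N - 1), -1)) / h**2

# T is symmetric, so eigh returns real eigenvalues and orthonormal eigenvectors.
evals, evecs = np.linalg.eigh(T)

# The exact eigenvalues are lambda_k = -(k*pi)^2, a sequence tending to -infinity.
k = np.arange(1, 6)
print(np.sort(evals)[::-1][:5])                      # numerical eigenvalues closest to zero
print(-(k * np.pi) ** 2)                             # exact values for comparison
print(np.max(np.abs(evecs.T @ evecs - np.eye(N))))   # ortho-normality of eigenvectors
```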
5.4.3 Sturm-Liouville and Fredholm alternative
In view of the Sturm-Liouville formalism, we can now re-visit our original boundary value problem but, for
simplicity, we specialise to the case of homogeneous boundary conditions. This means, we are considering
the equations
T y = f ,   y(a) = y(b) = 0 ,   (5.78)
together with the associated homogeneous problem
T y = 0 ,   y(a) = y(b) = 0 ,   (5.79)
where T is a second order differential operator of the form (5.2) (which, we now know, can be written
in Sturm-Liouville form). In a way similar to the above Green function method, we can convert this
problem into one that involves a compact integral operator, turning the differential equation into an
integral equation. The benefit is that Theorem 2.10 on Fredholm’s alternative can be applied to this
problem and turned into a version of Fredholm’s alternative for second order linear differential equations.
(b) There is a non-trivial solution of the homogeneous problem (5.79). In this case, there exists a solution to
the inhomogeneous problem if and only if hy0 , f i = 0 for all solutions y0 to the homogeneous problem (5.79).
If this condition is satisfied, the solution to (5.78) is given by
y = Σ_{k: λ_k ≠ 0} (1/λ_k) ⟨e_k, f⟩ e_k + y_0 ,   (5.81)
where y_0 is an arbitrary solution of the homogeneous problem (5.79).
Proof. Broadly, this follows from Theorem 2.10 on Fredholm’s alternative. The details of the proof are
somewhat technical, particularly in dealing with the boundary conditions, and can, for example, be found
in Ref. [5].
We note that the unique solution (5.80) in case (a) can be written as
y(x) = Σ_k (1/λ_k) ∫_a^b dt w(t) e_k(t) e_k(x) f(t) = ∫_a^b dt G(x, t) f(t)   (5.82)
6 Laplace equation
The Laplace operator ∆ in Rn with coordinates (x1 , . . . , xn ) is defined as
∆ = Σ_{i=1}^n ∂²/∂x_i² .   (6.1)
The measure for integration relative to X is given by dS = √g dt_1 · · · dt_k .
Proof. This formula is proved in Appendix B which contains an account of some basic differential geometry,
a subject somewhat outside the main thrust of this lecture.
Exercise 6.1. Consider a curve [a, b] ∋ t ↦ x(t) ∈ R^n and use Lemma 6.1 to derive the measure dS for
integration over a curve. Do the same with a surface (t_1, t_2) ↦ x(t_1, t_2) ∈ R³ and convince yourself that
the measure dS you obtain reproduces what you have learned about integration over surfaces.
Let us apply this formula to derive the Laplacian in several useful coordinate systems.
In two dimensions and in Cartesian coordinates (x, y) the Laplacian is simply
∆_2 = ∂²/∂x² + ∂²/∂y² .   (6.6)
The two-dimensional case is somewhat special since R2 ∼ = C and we can introduce complex coordinates
z = x + iy and z̄ = x − iy. Introducing the Wirtinger derivatives
∂/∂z = (1/2)(∂/∂x − i ∂/∂y) ,   ∂/∂z̄ = (1/2)(∂/∂x + i ∂/∂y) ,   (6.7)
For polar coordinates (r, ϕ), defined by x(r, ϕ) = (r cos ϕ, r sin ϕ), the tangent vectors are
∂x/∂r = (cos ϕ, sin ϕ) ,   ∂x/∂ϕ = r(− sin ϕ, cos ϕ) ,   (6.10)
which gives G = diag(1, r2 ) and g = r2 . Inserting this into the general formula (6.5) gives the two-
dimensional Laplacian in polar coordinates
∆_{2,pol} = (1/r) ∂/∂r (r ∂/∂r) + (1/r²) ∂²/∂ϕ² .   (6.11)
(The integration measure in two-dimensional polar coordinates is dS = √g dr dϕ = r dr dϕ.)
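As a quick consistency check of Eq. (6.11), one can apply the Cartesian Laplacian (6.6) and the polar formula to the same function and compare. The following sketch (our own verification, with an arbitrary test function) does this symbolically.

```python
# Our own symbolic check of Eq. (6.11): the polar-coordinate Laplacian agrees with the
# Cartesian Laplacian (6.6) when applied to the same (arbitrary) test function.
import sympy as sp

X, Y, phi = sp.symbols('X Y phi', real=True)
r = sp.symbols('r', positive=True)

F = X**3 * Y                                     # arbitrary test function in Cartesian coordinates
lap_cart = sp.diff(F, X, 2) + sp.diff(F, Y, 2)   # Cartesian Laplacian (6.6)

f = F.subs({X: r * sp.cos(phi), Y: r * sp.sin(phi)})                        # same function in polar coordinates
lap_polar = sp.diff(r * sp.diff(f, r), r) / r + sp.diff(f, phi, 2) / r**2   # Eq. (6.11)

# The difference vanishes once both results are written in the same coordinates.
print(sp.simplify(lap_polar - lap_cart.subs({X: r * sp.cos(phi), Y: r * sp.sin(phi)})))
```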
6.1.2 Three-dimensional Laplacian
In R3 with coordinates x = (x, y, z) the Laplacian in Cartesian coordinates is given by
∆_3 = ∂²/∂x² + ∂²/∂y² + ∂²/∂z² .   (6.12)
Cylindrical coordinates t = (r, ϕ, z), where r ∈ [0, ∞], ϕ ∈ [0, 2π[ and z ∈ R, are related to their Cartesian
counterparts by
x(r, ϕ, z) = (r cos ϕ, r sin ϕ, z) . (6.13)
The tangent vectors
∂x/∂r = (cos ϕ, sin ϕ, 0) ,   ∂x/∂ϕ = (−r sin ϕ, r cos ϕ, 0) ,   ∂x/∂z = (0, 0, 1) ,   (6.14)
imply the metric G = diag(1, r2 , 1) with determinant g = r2 and hence, by inserting into Eq. (6.5), the
three-dimensional Laplacian in cylindrical coordinates
∆_3 = (1/r) ∂/∂r (r ∂/∂r) + (1/r²) ∂²/∂ϕ² + ∂²/∂z² = ∆_{2,pol} + ∂²/∂z² .   (6.15)
(For the integration measure in cylindrical coordinates we get the well-known result dS = √g dr dϕ dz = r dr dϕ dz.)
We can repeat this analysis for three-dimensional spherical coordinates t = (r, θ, ϕ), where r ∈ [0, ∞],
θ ∈ [0, π[ and ϕ ∈ [0, 2π[, defined by
x(r, θ, ϕ) = (r sin θ cos ϕ, r sin θ sin ϕ, r cos θ) ,   (6.16)
which leads to the metric G = diag(1, r², r² sin²θ) with determinant g = r⁴ sin²θ. Inserting into Eq. (6.5)
gives the three-dimensional Laplacian in spherical coordinates
∆_{3,sph} = (1/r²) ∂/∂r (r² ∂/∂r) + (1/r²) [ (1/sin θ) ∂/∂θ (sin θ ∂/∂θ) + (1/sin²θ) ∂²/∂ϕ² ] .   (6.17)
(The integration measure for three-dimensional polar coordinates is dS = √g dr dθ dϕ = r² sin θ dr dθ dϕ.)
We can also restrict to the unit two-sphere S² ⊂ R³, parametrised as in Eq. (6.18) by x(θ, ϕ) = (sin θ cos ϕ, sin θ sin ϕ, cos θ). The two tangent vectors are
∂x/∂θ = (cos θ cos ϕ, cos θ sin ϕ, − sin θ) ,   ∂x/∂ϕ = (− sin θ sin ϕ, sin θ cos ϕ, 0) ,   (6.19)
with associated metric G = diag(1, sin2 θ) and determinant g = sin2 θ. Inserting into Eq. (6.5) gives the
Laplacian on the two-sphere
∆_{S²} = (1/sin θ) ∂/∂θ (sin θ ∂/∂θ) + (1/sin²θ) ∂²/∂ϕ² .   (6.20)
Comparison with Eq. (6.17) shows that the three-dimensional Laplacian can be expressed as
∆_{3,sph} = (1/r²) ∂/∂r (r² ∂/∂r) + (1/r²) ∆_{S²} .   (6.21)
(The integration measure on the two-sphere is dS = √g dθ dϕ = sin θ dθ dϕ.)
Applying the divergence theorem to the vector field A = f∇g, with
∇ · A = ∇ · (f∇g) = f∆g + ∇f · ∇g ,   A · n = f∇g · n ,   (6.23)
gives the first Green identity
∫_V (f∆g + ∇f · ∇g) dV = ∫_{∂V} f∇g · n dS .   (6.24)
Exchanging f and g in this formula and subtracting the two resulting equations gives the second Green
formula or Green's identity
∫_V (f∆g − g∆f) dV = ∫_{∂V} (f∇g − g∇f) · n dS .   (6.25)
After this preparation we are now ready to delve into the task of solving the Laplace equation.
of the equation. Define the generalised Newton (or Coulomb) potentials as
G(x − a) = G_a(x) = { −1/((n−2)v_n) · 1/|x − a|^{n−2}   for n > 2
                      (1/2π) ln |x − a|                  for n = 2 } ,   (6.27)
where v_n is the surface "area" of the (n − 1)-dimensional sphere S^{n−1} (and the constants have been
included for later convenience). In electromagnetism, G_a corresponds to the electrostatic potential of a
point charge located at a. Clearly, G_a is well-defined for all x ≠ a. It is straightforward to verify by direct
calculation that
∆G_a = 0   for all x ≠ a .   (6.28)
Exercise 6.2. Show that the gradient of the Newton potentials (6.27) is given by
∇G_a(x) = (1/v_n) (x − a)/|x − a|^n .   (6.29)
Also, verify that the Newton potentials satisfy the homogeneous Laplace equation for all x 6= a.
Proof. With dS = n dS and the unit normal vector to the sphere given by n = (x − a)/|x − a| we have
∇G_a · dS = (1/v_n) (1/|x − a|^{n−1}) dS .   (6.31)
=^{y=(x−a)/ε}   lim_{ε→0} (1/v_n) ∫_{|y|=1} f(a + εy) dS = f(a) .   (6.32)
For the second integral, using that |∇f(x) · n| ≤ K for some constant K, we have
| ∫_{|x−a|=ε} G_a(x)∇f(x) · dS | ≤ const ε^{2−n} ∫_{|x−a|=ε} dS = const ε ∫_{|y|=1} dS  →  0   as ε → 0 ,   (6.33)
This Lemma was the technical preparation for the following important statement.
Theorem 6.3. For ρ ∈ C_c²(R^n) define
φ(x) := ∫_{R^n} dy^n G(x − y)ρ(y)   (6.34)
for all x ∈ R^n. Then ∆φ = ρ, that is, the above φ satisfies the inhomogeneous Laplace equation with
source ρ.
Proof. Introducing the coordinate z = y − x, a region V_ε = {z ∈ R^n | ε ≤ |z| ≤ R} with R so large that
ρ(x + z) = 0 for |z| > R (which is possible since ρ has compact support) and ρ_x(z) := ρ(x + z), we have
∆φ(x) = ∫_{R^n} dz^n G(z)∆ρ(x + z) = lim_{ε→0} ∫_{V_ε} dz^n G ∆ρ_x = lim_{ε→0} ∫_{V_ε} dz^n (G ∆ρ_x − ρ_x ∆G)
       = lim_{ε→0} ∫_{∂V_ε} (G∇ρ_x − ρ_x ∇G) · dS = ρ_x(0) = ρ(x) ,   (6.35)
The above function G is also sometimes referred to as the Green function of the Laplace operator. Of
course, the solution (6.34) is not unique but we know that two solutions to the inhomogeneous Laplace
equation differ by a solution to the homogeneous one. Hence, the general solution to the inhomogeneous
Laplace equation can be written as
φ(x) = φ_H(x) + ∫_{R^n} dy^n G(x − y)ρ(y)   where   ∆φ_H = 0 .   (6.36)
The homogeneous solution φH can be used to satisfy the boundary conditions on φ. Note that the
requirement on ρ to have compact support also makes physical sense: normally charge or mass distributions
are localised in space.
Since the boundary ∂V_ε consists of the two components ∂V and ∂B_ε(a) this implies
∫_{∂V} (φ∇G_a − G_a∇φ) · dS = ∫_{∂B_ε(a)} (φ∇G_a − G_a∇φ) · dS  →  φ(a)   as ε → 0 ,   (6.41)
where Lemma (6.1) has been used in the final step. Since the integral on the LHS is independent of ε this
completes the proof.
We are now ready to prove the first important property of harmonic functions.
Theorem 6.4. (Mean value property of harmonic functions) Let U ⊂ Rn be open, φ harmonic on U and
Br (a) ⊂ U . Then
φ(a) = (1/v_n) ∫_{|y|=1} φ(a + ry) dS .   (6.42)
Proof. From the previous Lemma we have
φ(a) = ∫_{|x−a|=r} (φ∇G_a − G_a∇φ) · dS ,   (6.43)
already gives the desired result. It remains to be shown that the second part of the integral vanishes.
Since G_a is constant on the sphere |x − a| = r it can be pulled out of the integral and it is sufficient to consider
∫_{|x−a|=r} G_a ∇φ · dS = G_a ∫_{|x−a|=r} (1∇φ − φ∇1) · dS = G_a ∫_{|x−a|≤r} (1∆φ − φ∆1) dV = 0 ,   (6.45)
Since M − φ(x − ry) ≥ 0 this implies that φ(x − ry) = M for all |y| = 1 and, hence, φ(y) = M for all
y ∈ B_ε(x).
To extend the statement to all of U we assume there exists an a ∈ U with φ(a) = M . Assume that φ
is not constant on U so there is a b ∈ U with φ(b) < M and choose a (continuous) path α : [0, 1] → U
which connects a and b, that is, α(0) = a and α(1) = b. Let t0 = sup{t ∈ [0, 1] | φ(α(t)) = M } be the
“upper value” of t along the path for which the maximum is assumed. Since φ(b) < M necessarily t0 < 1
and since φ ◦ α is continuous we have φ(α(t0 )) = M . But from the first part of the proof we know there
is a ball B_ε(α(t_0)) where φ equals M which is in contradiction with t_0 being the supremum. Hence, the
assumption φ(b) < M was incorrect.
Corollary 6.1. Let U ⊂ Rn be a bounded, connected open set and φ be harmonic on U and continuous
on Ū . Then φ assumes its maximum and minimum on the boundary of U .
Proof. Since Ū is compact φ assumes its maximum on Ū . If the maximum point is on the boundary ∂ Ū
then the statement is true. If it is in U then, from the previous theorem, φ is constant on U and, hence,
by continuity, constant on Ū. Therefore it also assumes its maximum on the boundary. The corresponding
statement for the minimum follows by considering −φ.
These innocent sounding statements have important implications for boundary value problems. Sup-
pose we have an open, bounded and connected set V ⊂ U ⊂ Rn and we would like to solve the Dirichlet
boundary value problem
∆φ = ρ , φ|∂V = h , (6.47)
where ρ is a given function on U and h is a given function on the boundary ∂V which prescribes the
boundary values of φ. Suppose we have two solutions φ1 and φ2 to this problem. Then the difference
φ := φ1 − φ2 satisfies
∆φ = 0 , φ|∂V = 0 . (6.48)
Since φ is harmonic it assumes its maximum and minimum on the boundary and since the boundary
values are zero we conclude that φ = 0. This means that the solution to the boundary value problem, if
it exists, is unique.
We also have a uniqueness statement for our solution (6.34).
Corollary 6.2. For ρ ∈ Cc2 (Rn ) and n ≥ 3 the equation ∆φ = ρ has a unique solution φ : Rn → R with
lim|x|→∞ |φ(x)| = 0. This solution is given by Eq. (6.34).
Proof. The solution (6.34) has the desired property at infinity since lim|x|→∞ Ga (x) = 0 for n ≥ 3. To
show uniqueness assume another solution, φ̃, with the same property and consider the difference ψ = φ− φ̃.
We have
∆ψ = 0 , lim |ψ(x)| = 0 . (6.49)
|x|→∞
From the vanishing property at infinity, we know that for every ε > 0, there exists a radius R such that
|ψ(x)| ≤ ε for all |x| ≥ R. But the restricted function ψ|_{B_R(0)} assumes its maximum and minimum on the
boundary so it follows that |ψ(x)| ≤ ε for all x ∈ R^n. Since ε > 0 was arbitrary this means that ψ = 0.
A similar uniqueness question arises for the von Neumann boundary value problem, where the normal derivative of φ is prescribed on the boundary:
∆φ = ρ ,   n · ∇φ|_{∂V} = h .   (6.50)
Consider a harmonic function φ on U and set f = g = φ in Green’s first identity (6.24). This results in
∫_V |∇φ|² dV = ∫_{∂V} φ ∇φ · n dS ,   (6.51)
and, hence,
∆φ = 0 , φ|_{∂V} = 0        ⟹   φ = 0
∆φ = 0 , ∇φ · n|_{∂V} = 0   ⟹   φ = const .   (6.52)
Applied to the difference φ = φ1 − φ2 of two solutions φ1 , φ2 to the Dirichlet problem (6.47) this result
implies φ1 = φ2 , so uniqueness of the solution. Applying it to the difference φ = φ1 −φ2 of two solutions to
the von Neumann problem (6.50) gives φ1 = φ2 + const, so uniqueness up to an additive constant (which
does not change E = −∇φ).
After this somewhat theoretical introduction we now proceed to the problem of how to solve Laplace’s
equation in practice, starting with the two-dimensional case.
6.3 Laplace equation in two dimensions
At first sight, the two-dimensional Laplace equation seems of little physical interest - after all physical
space has three dimensions. However, there are many problems, for example, in electro-statics, which are
effectively two-dimensional due to translational symmetry in one direction. (Think, for example, of the
field produced by a long charged wire along the z-axis.)
We denote the Cartesian coordinates by x = (x, y)T and also introduce complex coordinates z = x + iy
and z̄ = x − iy. Recall that the two-dimensional Laplacian can be written as
∆ = ∂²/∂x² + ∂²/∂y² = 4 ∂²/(∂z̄∂z) .   (6.53)
Application 6.22. Solving the two-dimensional Laplace equation with complex methods
Suppose we want to solve Laplace’s equation in the positive quadrant {(x, y) | x ≥ 0, y ≥ 0} and
we impose Dirichlet boundary conditions φ(0, y) = φ(x, 0) = 0 along the positive x and y axis. (See
Fig. 14.) It is clear that the holomorphic function w = z 2 has a vanishing imaginary part along
the real and imaginary axis (just insert z = x and z = iy to check this) and, hence, the choice
φ = v = Im(z 2 ) = 2xy leads to a harmonic function with the desired boundary property. On the
other hand, if we had imposed Neumann boundary conditions ∂φ/∂x(0, y) = ∂φ/∂y(x, 0) = 0 along the
positive x and y axis, the real part of w (having perpendicular equipotential lines) leads to a viable
solution φ = u = Re(z 2 ) = x2 − y 2 . (Of course any even power of z would also do the job so the
solution is not unique. This is because we haven’t specified boundary conditions at infinity.) The
equipotential lines for both solutions are shown in Fig. 15.
For another example, consider solving Laplace’s equation on U = {z ∈ C | |z| > 1} with Dirichlet
boundary condition φ||z|=1 = 0. (See Fig. 14.) It is clear that the function w = z+z −1 is real for |z| = 1
(and it is holomorphic on U) so with z = re^{iϕ} we have a solution φ = v = Im(z + z^{−1}) = (r − r^{−1}) sin ϕ.
(Again, this is not unique since we have to specify another boundary condition, for example at infinity.)
The equipotential lines for this solution are shown in Fig. 15. The solution for |z| ≤ 1 is of course
φ = 0, the unique solution consistent with the boundary conditions at |z| = 1.
[Figure 14: the two domains V of the examples above, the positive quadrant with φ(0, y) = φ(x, 0) = 0 and the exterior of the unit disk with φ|_{r=1} = 0.]
Figure 15: Equipotential lines for φ(x, y) = xy (left), φ(x, y) = x² − y² (middle) and φ(x, y) = Im(z + z^{−1}) (right).
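The two candidate solutions above can be verified symbolically. The following sketch (our own check, using sympy) confirms that Im(z²) and Im(z + z⁻¹) are harmonic and satisfy the stated Dirichlet boundary conditions.

```python
# Our own symbolic check of the complex-method solutions above.
import sympy as sp

x, y, t = sp.symbols('x y t', real=True)
z = x + sp.I * y

# Quadrant problem: phi = Im(z^2) = 2xy.
phi1 = sp.im(sp.expand(z**2))
print(sp.diff(phi1, x, 2) + sp.diff(phi1, y, 2))   # 0, so harmonic
print(phi1.subs(x, 0), phi1.subs(y, 0))            # 0 0, boundary conditions hold

# Exterior of the unit disk: phi = Im(z + 1/z) = (r - 1/r) sin(phi).
phi2 = sp.simplify(sp.im(z + 1 / z))
print(sp.simplify(sp.diff(phi2, x, 2) + sp.diff(phi2, y, 2)))    # 0, harmonic away from z = 0
print(sp.simplify(phi2.subs({x: sp.cos(t), y: sp.sin(t)})))      # 0 on the circle |z| = 1
```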
We now want to solve the two-dimensional Laplace equation
∆φ = 0 ,   ∆ = ∂²/∂x² + ∂²/∂y² ,   (6.58)
in Cartesian coordinates. We start by considering solutions of the separated form
φ(x, y) = X(x) Y(y) ,   (6.59)
where X = X(x) and Y = Y(y) are functions of their indicated arguments only. Inserting this Ansatz
into Eq. (6.58) gives
X''(x)/X(x) + Y''(y)/Y(y) = 0 .   (6.60)
The argument goes that the two terms, being functions of different variables, can only add up to zero
if they are equal to constants α2 and −α2 individually, as indicated above. Solving the resulting two
ordinary differential equations
X'' = −α²X ,   Y'' = α²Y ,   (6.61)
results in the solutions
X(x) = a_α cos(αx) + b_α sin(αx) ,   Y(y) = c_α e^{αy} + d_α e^{−αy} ,   (6.62)
where a_α, b_α, c_α and d_α are arbitrary constants. This by itself gives a rather special solution to the
equation but it does so for every choice of the constant α. Since the equation we are solving is linear we
can, hence, construct more general solutions by linearly combining solutions of the above type for different
values of α. This leads to
φ(x, y) = Σ_α (a_α cos(αx) + b_α sin(αx))(c_α e^{αy} + d_α e^{−αy}) ,   (6.63)
where the sum ranges over some suitable set of α values. Of course this is a large class of solutions which
can be narrowed down, or be made unique by imposing boundary conditions. Whether this works out in
practice depends on the type of boundary conditions and if they are “compatible” with the chosen set of
coordinates and resulting solution. Here, we are working with Cartesian coordinates and this goes well
together with boundary conditions imposed along lines with x = const and y = const. More generally,
building in boundary conditions tends to be easiest if coordinates are chosen such that the boundaries are
defined by one of the coordinates being constants. For example, polar or spherical coordinates go well
with imposing boundary conditions on circles or spheres, as we will see below.
which, for any fixed y, is a sine Fourier series on the interval [0, a]. This already indicates how we
built in the final boundary condition φ(x, b) = h(x). Setting y = b in the above formula, we can
determine the coefficients simply by standard (sine) Fourier series techniques and obtain
b_k = 2/(a sinh(kπb/a)) ∫_0^a dx h(x) sin(kπx/a) .   (6.65)
For any given boundary potential h these coefficients can be calculated and inserting these back into
Eq. (6.64) gives the complete solution.
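As a numerical illustration (our own sketch; the box dimensions and the boundary potential h below are made-up examples, and we take the series of Eq. (6.64) to be φ(x, y) = Σ_k b_k sinh(kπy/a) sin(kπx/a), consistent with the coefficient formula (6.65)), the coefficients can be computed by quadrature and the series summed directly:

```python
# Our own numerical sketch of the rectangle solution; sample parameters are arbitrary.
import numpy as np

a, b = 1.0, 0.5                       # box dimensions (arbitrary)
h = lambda x: x * (a - x)             # boundary potential at y = b (arbitrary)

def b_k(k, n_grid=2000):
    """Coefficient b_k from Eq. (6.65), integral done by the trapezoidal rule."""
    x = np.linspace(0.0, a, n_grid)
    return 2.0 / (a * np.sinh(k * np.pi * b / a)) * np.trapz(h(x) * np.sin(k * np.pi * x / a), x)

def phi(x, y, kmax=50):
    """Partial sum of the series solution."""
    return sum(b_k(k) * np.sinh(k * np.pi * y / a) * np.sin(k * np.pi * x / a)
               for k in range(1, kmax + 1))

print(phi(0.3, b), h(0.3))            # boundary condition phi(x, b) = h(x) reproduced
```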
This is a Fourier series which must be identical to zero so all the Fourier coefficients must vanish. This
leads to a set of ordinary differential equations
(rA_0')' = 0 ,   r(rA_k')' = k²A_k ,   r(rB_k')' = k²B_k ,   (6.69)
for Ak and Bk . They are easy to solve and lead to
A_0(r) = a_0 + ã_0 ln r ,   A_k(r) = a_k r^k + ã_k r^{−k} ,   B_k(r) = b_k r^k + b̃_k r^{−k} .   (6.70)
Inserting these results back into Eq. (6.67) gives for the general solution of the two-dimensional homoge-
neous Laplace equation in polar coordinates
φ(r, ϕ) = a_0/2 + (ã_0/2) ln r + Σ_{k=1}^∞ (a_k r^k + ã_k r^{−k}) cos(kϕ) + Σ_{k=1}^∞ (b_k r^k + b̃_k r^{−k}) sin(kϕ) .   (6.71)
The coefficients ak , bk , ãk and b̃k are arbitrary at this stage and have to be fixed by boundary conditions.
This is simply the Fourier series for the function h and we can find the Fourier coefficients ak and bk
by the usual formulae (3.22).
Now consider solving the problem for the same boundary condition φ(1, ϕ) = h(ϕ) but for the
“exterior” region {(r, ϕ) | r ≥ 1} imposing, in addition, that φ remains finite as r → ∞. The last
condition demands that ã0 = 0 and ak = bk = 0 for k = 1, 2, . . . so we have
φ(1, ϕ) = a_0/2 + Σ_{k=1}^∞ (ã_k cos(kϕ) + b̃_k sin(kϕ)) = h(ϕ) .   (6.73)
As before, this is a Fourier series for h and we can determine the Fourier coefficients by the standard
formulae (3.22). So the full solution for φ is
φ(r, ϕ) = { a_0/2 + Σ_{k=1}^∞ (a_k r^k cos(kϕ) + b_k r^k sin(kϕ))       for r ≤ 1
            a_0/2 + Σ_{k=1}^∞ (a_k r^{−k} cos(kϕ) + b_k r^{−k} sin(kϕ))  for r ≥ 1 } ,   (6.74)
where ak and bk are the Fourier coefficients of the function h. Note that the two parts of this solution
fit together at r = 1 as they must.
6.4.1 Functions on S 2
We recall that the two-sphere is usually parametrised by two angles (θ, ϕ) ∈ [0, π]×[0, 2π[, as in Eq. (6.18).
Alternatively and often more conveniently, we can use the coordinates (x, ϕ) ∈ [−1, 1] × [0, 2π[ where
x = cos θ. Sometimes it is also useful to parametrise the two-sphere by unit vectors n ∈ R3 . In terms of
the coordinates (x, ϕ) the Laplacian (6.20) takes the form
∆_{S²} = (1 − x²) ∂²/∂x² − 2x ∂/∂x + 1/(1 − x²) ∂²/∂ϕ² .   (6.75)
We should add a word of caution about functions f : S 2 → F on the two-sphere. In practice, we describe
these as functions f = f (x, ϕ) of the coordinates but not all of these are well-defined on S 2 . First of all,
a continuous f needs to be periodic in ϕ, so f (θ, 0) = f (θ, 2π). There is another, more basic condition
which arises because the parametrisation (6.18) breaks down at (x, ϕ) = (±1, ϕ) which correspond to the
same two points (the north and the south pole) for all values of ϕ. Hence, a function f = f (x, ϕ) is only
well-defined on S 2 if f (±1, ϕ) is independent of ϕ. So, for example, f (x, ϕ) = x sin ϕ is not well-defined
on S 2 while f (x, ϕ) = (1 − x2 ) sin ϕ is. This discussion can be summarised by saying that we can expand
a function f on the two-sphere in a Fourier series
f(x, ϕ) = Σ_{m∈Z} y_m(x) e^{imϕ} ,   (6.76)
Another useful observation is that as an operator on the inner product space C ∞ (S 2 ) with scalar
product
⟨f, h⟩_{S²} = ∫_{S²} f(x)* h(x) dS ,   dS = sin θ dθ dϕ = dx dϕ ,   (6.77)
the Laplacian ∆S 2 is self-adjoint. This is most elegantly seen by using the general formulae from
Lemma 6.1.
⟨f, ∆h⟩_{S²} = ∫_{S²} f* ∆h dS = ∫_V f(t)* (1/√g) ∂/∂t^i ( √g G^{ij} ∂h/∂t^j ) √g d^k t = ∫_{S²} (∆f)* h dS = ⟨∆f, h⟩_{S²} .   (6.78)
Hence, we know that the eigenvalues of ∆_{S²} are real and that eigenvectors for different eigenvalues are orthogonal
relative to the above inner product.
It is customary to include a suitable normalisation factor and define the spherical harmonics
Y_l^m(θ, ϕ) = √[ (2l+1)/(4π) · (l−m)!/(l+m)! ] P_l^m(cos θ) e^{imϕ} ,   l = 0, 1, . . . ,   m = −l, . . . , l .   (6.82)
Exercise 6.7. Show that Ylm = (−1)m (Yl−m )∗ . Also show that the first few spherical harmonics are given
by
Y_0^0 = 1/√(4π) ,   Y_1^{±1} = ∓√(3/(8π)) sin θ e^{±iϕ} ,   Y_1^0 = √(3/(4π)) cos θ .   (6.84)
From what we have seen, the Y_l^m are eigenfunctions of ∆_{S²} and all Y_l^m for m = −l, . . . , l have the same eigenvalue λ = −l(l + 1) which, hence, has degeneracy
2l + 1. We already know that Y_l^m and Y_{l'}^{m'} must be orthogonal for l ≠ l' but, in fact, due to the orthogonality
of the e^{imϕ} functions the Y_l^m form an orthogonal system. A detailed calculation, based on Eq. (4.44),
shows that
⟨Y_l^m, Y_{l'}^{m'}⟩_{S²} = δ_{ll'} δ^{mm'} ,   (6.86)
so they form an ortho-normal system on L²(S²). In fact, we have
Theorem 6.8. The spherical harmonics Y_l^m form an ortho-normal basis of L²(S²).
Proof. The proof can, for example, be found in Ref. [7].
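The ortho-normality relation (6.86) can also be checked numerically with scipy (our own sketch, not part of the notes; note that scipy's sph_harm takes its arguments in the order (m, l, azimuthal angle, polar angle)):

```python
# Our own numerical check of Eq. (6.86).
import numpy as np
from scipy.special import sph_harm

def inner(l1, m1, l2, m2, n=400):
    theta = np.linspace(0.0, np.pi, n)          # polar angle
    phi = np.linspace(0.0, 2 * np.pi, 2 * n)    # azimuthal angle
    TH, PH = np.meshgrid(theta, phi, indexing='ij')
    Y1 = sph_harm(m1, l1, PH, TH)
    Y2 = sph_harm(m2, l2, PH, TH)
    integrand = np.conj(Y1) * Y2 * np.sin(TH)   # measure dS = sin(theta) dtheta dphi
    return np.trapz(np.trapz(integrand, phi, axis=1), theta)

print(inner(2, 1, 2, 1))    # ~1
print(inner(2, 1, 3, 1))    # ~0
print(inner(2, 1, 2, -1))   # ~0
```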
If f happens to be independent of the angle ϕ then we only need the m = 0 terms in the above expansion
and we can write
f(x) = Σ_{l=0}^∞ a_l P_l(cos θ) ,   a_l = (2l+1)/2 ∫_{−1}^{1} dx P_l(x) f(x) .   (6.88)
Proof. Let the function F : S² × S² → C be defined by the RHS of Eq. (6.89). Our discussion of rotations
in Section 9.4 will show that this function has the property F(Rn', Rn) = F(n', n), for any rotation R,
so F is invariant under simultaneous rotation of its two arguments. Now, let R be a rotation such that
Rn' = e_3, so that F(e_3, Rn) = F(n', n). The vector e_3 is the north pole of S², and corresponds to the
coordinate value x' = cos θ' = 1. Hence, Y_l^m(e_3) = 0 for m ≠ 0 and
Y_l^0(e_3) = √((2l+1)/(4π)) P_l(1) = √((2l+1)/(4π)) .   (6.90)
Inserting this into the definition of F (the RHS of Eq. (6.89)) we find that F (e3 , ñ) = Pl (cos θ̃) = Pl (ñ·e3 ).
From this special result for F we can re-construct the entire function using the rotational invariance:
We can use this formula to re-write the expansion (4.40) of a Coulomb potential in terms of Legendre
polynomials. Setting r = rn, r' = r'n', cos θ = n · n' in this formula and using Eq. (6.89) we get
1/|r − r'| = (4π/r) Σ_{l=0}^∞ Σ_{m=−l}^{l} 1/(2l+1) (r'/r)^l Y_l^m(n')* Y_l^m(n) .   (6.92)
Let us apply this result to cast the solution (6.34) to the inhomogeneous Laplace equation in a different
form. Specialising to the three-dimensional case, we have seen that the unique solution to ∆3 φ = −4πρ
which approaches zero for |r| → ∞ is given by
φ(r) = ∫_{R³} ρ(r')/|r − r'| d³r' .   (6.93)
Inserting the expansion (6.92) into Eq. (6.93) gives
φ(r) = 4π Σ_{l=0}^∞ Σ_{m=−l}^{l} 1/(2l+1) q_{lm} Y_l^m(n) / r^{l+1} ,   q_{lm} := ∫_{R³} d³r' r'^l Y_l^m(n')* ρ(r') .   (6.94)
This is called the multipole expansion and the q_{lm} are called the multipole moments of the source ρ. The
multipole expansion gives φ as a series in inverse powers of the radius r and is, therefore, useful if r is much
larger than the extension of the (localised) source ρ. The l = 0 term which is proportional to 1/r is called
the monopole term and its coefficient q_{00} is proportional to the total charge of ρ. The l = 1 term which decreases as 1/r²
is the dipole term with dipole moments q1m , the l = 2 term with a 1/r3 fall-off is the quadrupole term
with quadrupole moments q2m and so on.
Exercise 6.9. Convert the monopole and dipole term in the multipole expansion (6.94) into Cartesian
coordinates (Hint: Use the explicit form of the spherical harmonics in Eq. (6.84).)
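As an illustration of the multipole expansion (our own numerical sketch, not part of the notes): for a collection of point charges q_i at positions r_i the moments reduce to q_lm = Σ_i q_i r_i^l Y_l^m(n_i)*, and the truncated series reproduces the exact potential at radii large compared to the charge separation. The charge configuration below is a made-up example.

```python
# Our own illustration of the multipole expansion for point charges.
import numpy as np
from scipy.special import sph_harm

charges = [(+1.0, np.array([0.0, 0.0, 0.3])), (-1.0, np.array([0.0, 0.0, -0.3]))]

def angles(v):
    r = np.linalg.norm(v)
    return r, np.arccos(v[2] / r), np.arctan2(v[1], v[0])   # r, polar, azimuthal

def q_lm(l, m):
    return sum(q * r**l * np.conj(sph_harm(m, l, ph, th))
               for q, pos in charges for r, th, ph in [angles(pos)])

def phi_multipole(point, lmax=4):
    r, th, ph = angles(point)
    return sum(4 * np.pi / (2 * l + 1) * q_lm(l, m) * sph_harm(m, l, ph, th) / r**(l + 1)
               for l in range(lmax + 1) for m in range(-l, l + 1)).real

def phi_exact(point):
    return sum(q / np.linalg.norm(point - pos) for q, pos in charges)

p = np.array([1.0, 2.0, 3.0])
print(phi_multipole(p), phi_exact(p))   # should agree to good accuracy
```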
[Figure 16: the method of images, a charge q at distance a from a grounded plane with a mirror charge at −a (left), and a charge q outside a grounded sphere with a mirror charge q̃ at distance ã (right).]
plane. (See Fig. 16.) We consider a source ρ which corresponds to a single charge q located at
r0 = (a, 0, 0) ∈ V, where a > 0, and demand the boundary condition φ|x=0 = 0. The Coulomb
potential for this charge
φ_I(r) = q/|r − r_0|   (6.95)
does satisfy the correct equation, ∆φI = −4πρ, but does not satisfy the boundary condition. Suppose,
we introduce the mirror charge density ρ̃ by a single point charge with charge −q located at r̃0 =
(−a, 0, 0) ∈ R3 \ V leading to a potential
φ_H(r) = −q/|r − r̃_0|   (6.96)
φ(r) = q̃/|r − r̃_0| + q/|r − r_0|   (6.98)
satisfies ∆φ = −4πρ outside the sphere. We should try to fix q̃ and ã so that φ||r|=b = 0 and a short
calculation shows this is satisfied if
q̃ = −(b/a) q ,   ã = b²/a .   (6.99)
6.5.2 Cartesian coordinates
We would like to solve the three-dimensional homogeneous Laplace equation
∂2 ∂2 ∂2
∆φ = 0 , ∆= + + . (6.100)
∂x2 ∂y 2 ∂z 2
One way to proceed is by a separation Ansatz just as we did in the two-dimensional case with Cartesian
coordinates but here we would like to follow a related but slightly different logic based on a Fourier series
expansion.
Exercise 6.10. Solve the three-dimensional Laplace equation in Cartesian coordinates by separation of
variables.
Expanding
φ(x, y, z) = Σ_{k,l=1}^∞ Z_{k,l}(z) sin(πkx/a) sin(πly/b) ,   (6.101)
where the Z_{k,l} are functions of z, and inserting this into the Laplace equation gives an ordinary differential
equation
Z_{k,l}'' = ν_{k,l}² Z_{k,l} ,   ν_{k,l} = √( (πk/a)² + (πl/b)² ) ,   (6.102)
for each Z_{k,l} whose general solution is
Z_{k,l}(z) = A_{k,l} e^{ν_{k,l} z} + B_{k,l} e^{−ν_{k,l} z} ,   (6.103)
with arbitrary constants A_{k,l} and B_{k,l}. Combining this result with Eq. (6.101) leads to the general
solution
φ(x, y, z) = Σ_{k,l=1}^∞ ( A_{k,l} e^{ν_{k,l} z} + B_{k,l} e^{−ν_{k,l} z} ) sin(πkx/a) sin(πly/b)   (6.104)
of the Laplace equation in the box V = [0, a]×[0, b]×[0, c] which vanishes on the boundaries at x = 0, a
and y = 0, b. To fix the remaining constants we have to specify boundary conditions at z = 0 and
z = c. Suppose we demand that φ(x, y, 0) = 0. This can be achieved by setting Ak,l = −Bk,l =: ak,l /2
so that the solution becomes
φ(x, y, z) = Σ_{k,l=1}^∞ a_{k,l} sinh(ν_{k,l} z) sin(πkx/a) sin(πly/b) .   (6.105)
Finally, assume for the last boundary at z = c that φ(x, y, c) = h(x, y) for a given function h. Then
setting z = c in Eq. (6.105) gives a (double) sine Fourier series for the function h and we can compute
the remaining parameters a_{k,l} by standard Fourier series techniques as
a_{k,l} = 4/(ab sinh(ν_{k,l} c)) ∫_{[0,a]×[0,b]} dx dy sin(πkx/a) sin(πly/b) h(x, y) .   (6.106)
Of course this calculation can be repeated, and leads to a similar result, if another one of the six
boundary planes is subject to a non-trivial boundary condition while φ is required to vanish on
the other five boundary planes. In this way we get six solutions, all similar to the above but with
coordinates permuted and constants determined as appropriate. The sum of these six solutions then
solves the homogeneous Laplace equation with non-trivial boundary conditions on all six sides.
Recall from the discussion of the Bessel equation the functions Ĵ_{νk}(r) = J_ν(z_{νk} r/a), which satisfy
[ (1/r) d/dr (r d/dr) − ν²/r² ] Ĵ_{νk} = −(z_{νk}²/a²) Ĵ_{νk} ,   (6.107)
where zνk are the zeros of the Bessel functions. The fact that the operator on the LHS is similar to the
radial part of the Laplacian in cylindrical coordinates (6.15) is part of the motivation of why we are using
the Bessel functions. We can then expand
φ(r, ϕ, z) = Σ_{k=1}^∞ Σ_{m=0}^∞ Z_{km}(z) Ĵ_{νk}(r) (a_{km} sin(mϕ) + b_{km} cos(mϕ))   (6.108)
with as yet undetermined functions Zkm . Inserting this expansion into the Laplace equation in cylindrical
coordinates (6.15) and using the eigenvalue equation (6.107) for the Bessel functions leads to the differential
equation
Z_{km}'' = (z_{mk}²/a²) Z_{km} ,   (6.109)
provided the previously arbitrary type, ν, of the Bessel function is fixed to be ν = m. Solving this equation
and inserting the solutions back into the expansion (6.108) gives
φ(r, ϕ, z) = Σ_{k=1}^∞ Σ_{m=0}^∞ ( A_{km} exp(z_{mk} z/a) + B_{km} exp(−z_{mk} z/a) ) Ĵ_{mk}(r) (a_{km} sin(mϕ) + b_{km} cos(mϕ)) .   (6.110)
To fix the remaining coefficients we have to impose boundary conditions at the bottom and top of the
cylinder. For example, if we demand at the bottom that φ|z=0 = 0 then
φ(r, ϕ, z) = Σ_{k=1}^∞ Σ_{m=0}^∞ sinh(z_{mk} z/a) Ĵ_{mk}(r) (a_{km} sin(mϕ) + b_{km} cos(mϕ)) .   (6.111)
Finally, if we demand the boundary condition φ|_{z=L} = h, for some function h = h(r, ϕ), at the top of
the cylinder, the remaining coefficients a_{km} and b_{km} can be determined by combining the orthogonality
properties (3.22) of the Fourier series with those of the Bessel functions (5.65).
Exercise 6.11. Use orthogonality properties of the Fourier series and the Bessel functions to find expres-
sions for the coefficients amk and bmk in the solution (6.111), so that the boundary condition φ|z=L = h
is satisfied for a function h = h(r, ϕ).
6.5.4 Spherical coordinates
To discuss the three-dimensional Laplace equation in spherical coordinates it is very useful to recall that
the three-dimensional Laplace operator can be written as
∆_{3,sph} = (1/r²) ∂/∂r (r² ∂/∂r) + (1/r²) ∆_{S²} ,   (6.112)
where ∆S 2 is the Laplacian on the two-sphere. Also recall that we have the spherical harmonics Ylm which
form an orthonormal basis of L2 (S 2 ) and are eigenfunctions of ∆S 2 , with
∆S 2 Ylm = −l(l + 1)Ylm . (6.113)
All this suggests we should start with an expansion
φ(r, θ, ϕ) = Σ_{l=0}^∞ Σ_{m=−l}^{l} R_{lm}(r) Y_l^m(θ, ϕ) .   (6.114)
Inserting this expansion into the homogeneous Laplace equation, ∆φ = 0, and using the eigenvector
property (6.113) leads to the differential equations
d/dr ( r² R_{lm}' ) = l(l + 1) R_{lm}   (6.115)
with general solutions
R_{lm}(r) = A_{lm} r^l + B_{lm} r^{−l−1} ,   (6.116)
for constants Alm and Blm . Inserting this back into the expansion (6.114) leads to the general solution to
the homogeneous Laplace equation in spherical coordinates:
φ(r, θ, ϕ) = Σ_{l=0}^∞ Σ_{m=−l}^{l} ( A_{lm} r^l + B_{lm} r^{−l−1} ) Y_l^m(θ, ϕ) .   (6.117)
The arbitrary constants Alm and Blm are fixed by boundary conditions and, given the choice of coordinates,
they are relatively easy to implement if they are imposed on spherical boundaries. We also note that for
problems with azimuthal symmetry, that is, when the boundary conditions and φ are independent of
ϕ, we only require the m = 0 terms in the above expansion. Since the Y_l^0 are proportional to the
Legendre polynomials this means, after a re-definition of the constants, that, for such problems, we have
the simplified expansion
φ(r, θ) = Σ_{l=0}^∞ ( A_l r^l + B_l r^{−l−1} ) P_l(cos θ) .   (6.118)
On the other hand, we might be interested in the region outside a sphere of radius a, so V =
{(r, θ, ϕ) | r ≥ a}, with the same boundary condition φ|r=a = h at r = a. In this case we also
have to specify a boundary condition at “infinity” which can, for example, be done by demanding
that φ → 0 as r → ∞. This last condition implies that all terms with positive powers of r in the
expansion (6.114) must vanish, so that Alm = 0. The remaining coefficients Blm are then fixed by
the boundary condition at r = a and are, in analogy with the previous case, given by
B_{lm} = a^{l+1} ⟨Y_l^m, h⟩_{S²} = a^{l+1} ∫_{S²} (Y_l^m)* h dS .   (6.120)
7 Distributions
The Dirac delta function, δ(x), introduced by Dirac in 1930, is an object frequently used in theoretical
physics. It is usually “defined” as a “function” on R with properties
δ(x) = { 0 for x ≠ 0
         ∞ for x = 0 } ,   ∫_R dx δ(x) = 1 .   (7.1)
Of course assigning a function “value” of infinity at x = 0 does not make sense. Even if we are prepared
to overlook this we know from our earlier discussion of integrals that a function which is zero everywhere,
except at one point, must integrate to zero (since a single point is a set of measure zero), rather than to
one. Hence, the above “definition” is at odds with basic mathematics, yet it is routinely used in a physics
context. The purpose of this chapter is to introduce the proper mathematical background within which
to understand the Dirac delta - the theory of distributions - and to explain how, in view of this theory, we
can get away with mathematically questionable equations such as Eq. (7.1). The fact that the Dirac delta
is not a function but rather a distribution has far-reaching consequences for many equations routinely
used in physics. For example, the “Coulomb potential” 1/r, where r = |x| is the three-dimensional radius,
satisfies an equation often stated as
∆(1/r) = −4π δ(x) ,   (7.2)
where ∆ = Σ_{i=1}^3 ∂²/∂x_i² is the three-dimensional Laplacian. (We will prove the correct version of this equation
below.) Since the right-hand side of this equation should be understood as a distribution so must be the
left-hand side and this leads to a whole range of questions. With these motivations in mind we now begin
by properly defining distributions.
Definition 7.1. The space of test functions is the vector space D = D(R^n) := C_c^∞(R^n) of infinitely many
times differentiable functions with compact support, and a function ϕ ∈ D is called a test function.
We say a sequence (ϕk ) of test functions converges to a function ϕ ∈ D iff
(i) There is an R > 0 such that ϕk (x) = 0 for all k and all x ∈ Rn with |x| > R.
(ii) (ϕk ) and all its derivatives converge to ϕ uniformly for all x ∈ Rn with |x| ≤ R.
In this case we write ϕ_k −→_D ϕ.
Note that the above version of convergence is very strong. An example of a test function is
ϕ(x) = { exp(−a²/(a² − |x|²))   for |x| < a
         0                       for |x| ≥ a   (7.3)
Clearly, this function has compact support and the structure of the exponential ensures that it drops
to zero at |x| = a in an infinitely many times differentiable way. All polynomials times the above ϕ
are also test functions so the vector space D is clearly infinite dimensional. We are now ready to define
distributions.
Definition 7.2. A distribution is a linear, continuous map
T : D(R^n) −→ R ,   ϕ ↦ T[ϕ] .   (7.4)
Continuity means that ϕ_k −→_D ϕ implies T[ϕ_k] → T[ϕ]. The space of all distributions is called D' = D'(R^n).
Clearly, the space of distributions D0 is a vector space, a sub-space of the dual vector space to D. Hilbert
spaces are isomorphic to their dual spaces but the test function space D is not a Hilbert space (for example,
it is not complete relative to the standard integral norms). In some sense, the test function space D is
rather “small” and, hence, we expect the dual space to be large. To get a better feeling for what this
means, we should now consider a few examples of distributions.
For a continuous function f : R^n → R we can associate the distribution T_f defined by T_f[ϕ] := ∫_{R^n} dx^n f(x)ϕ(x).
Clearly, T_f is linear and continuous and, hence, it is a distribution. It is also not too hard to show that
the map f → Tf is injective, so that the continuous functions are embedded in the space of distributions.
We can slightly generalise the above construction. For f ∈ L1loc (Rn ) (a locally integrable function) the
above definition of Tf still works. However, the map f → Tf is no longer injective - two functions which
only differ on a set of measure zero are mapped to the same distribution.
For a ∈ R^n the Dirac δ-distribution δ_a ∈ D'(R^n) is defined as δ_a[ϕ] := ϕ(a). This is the mathematically sound version of the equation
∫_{R^n} dx^n δ(x − a)ϕ(x) = ϕ(a) ,   (7.7)
which is frequently written in a physics context. The integral on the LHS is purely symbolic; the reason
one usually gets away with using this notation is that both the integral and the distribution are linear.
The above examples show that distributions should be thought of as generalisations of functions. They
contain functions via the map f → Tf but, in addition, they also contain “more singular” objects such as
the Dirac-delta which cannot be interpreted as a function. The idea is now to generalise many of the tools
of analysis from functions to distributions. We begin with a definition of convergence for distributions.
Definition 7.3. A sequence (T_k) of distributions T_k ∈ D'(R^n) converges to T ∈ D'(R^n), written T_k −→_{D'} T, iff T_k[ϕ] → T[ϕ] for all test functions ϕ ∈ D(R^n).
The idea of this definition is that convergence of distributions is "tested" with functions ϕ ∈ D and hence
the name “test functions”. The above notion of convergence can be used to gain a better understanding of
the Dirac delta. Although by itself not a function the Dirac delta can be obtained as a limit of functions,
as explained in the following
Theorem 7.1. Let f : R^n → R be an integrable function with ∫_{R^n} dx^n f(x) = 1. For ε > 0 define the
functions f_ε(x) := ε^{−n} f(x/ε). Then we have
T_{f_ε} −→_{D'} δ_0   as ε → 0 .   (7.8)
Proof. We need to show convergence in the sense of distributions, as defined in Def. 7.3. This means we
must show that
lim_{ε→0} T_{f_ε}[ϕ] = ϕ(0) .   (7.9)
The integrand on the RHS is bounded by an integrable function since |f(y)ϕ(εy)| < K f(y), for some
constant K. This means we can pull the limit ε → 0 into the integral so that
lim_{ε→0} T_{f_ε}[ϕ] = ∫_{R^n} dy^n f(y) lim_{ε→0} ϕ(εy) = ϕ(0) ∫_{R^n} dy^n f(y) = ϕ(0) .   (7.11)
In the one-dimensional case, a possible choice for the functions f and f_ε in Theorem 7.1 is provided by
the Gaussians
f(x) = (1/√(2π)) e^{−x²/2}   ⇒   f_ε(x) = (1/(√(2π) ε)) e^{−x²/(2ε²)} .   (7.12)
The graph of f_ε for decreasing values of ε is shown in Fig. 17 and illustrates the idea behind Theorem 7.1.
[Figure 17: the Gaussians f_ε(x) for several decreasing values of ε.]
For decreasing ε the function f_ε becomes more and more peaked around x = 0 and tends to δ_0 in the limit.
However, note that f_ε does not actually converge to any function in the usual sense as ε → 0. Rather,
convergence only happens in the weaker sense of distributions, as stated in Theorem 7.1.
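This convergence can also be seen numerically (our own sketch; the test function below is only a sample smooth, rapidly decaying function rather than a genuine compactly supported one): the integrals T_{f_ε}[ϕ] approach ϕ(0) as ε decreases.

```python
# Our own numerical illustration of Theorem 7.1 for the Gaussians (7.12).
import numpy as np

phi = lambda x: np.cos(x) * np.exp(-x**2)      # sample "test function", phi(0) = 1

def T_f_eps(eps, L=50.0, n_grid=200001):
    x = np.linspace(-L, L, n_grid)
    f_eps = np.exp(-x**2 / (2 * eps**2)) / (np.sqrt(2 * np.pi) * eps)
    return np.trapz(f_eps * phi(x), x)

for eps in [1.0, 0.3, 0.1, 0.03]:
    print(eps, T_f_eps(eps))                   # approaches phi(0) = 1 as eps decreases
```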
We can use the above representation of the Dirac delta as a limit to introduce a new class of distributions. For a function g in C^∞(R) we can define the distribution δ_g ∈ D'(R) by
δ_g[ϕ] := lim_{ε→0} ∫_R dx f_ε(g(x)) ϕ(x) .   (7.13)
To see what this means, we assume that we can split up R into intervals I_α such that g|_{I_α} is invertible
with inverse g_α^{−1}. Then, following the calculation in Theorem 7.1, we have
δ_g[ϕ] = lim_{ε→0} Σ_α ∫_{I_α} dx (1/ε) f(g(x)/ε) ϕ(x)
       = lim_{ε→0} Σ_α ∫_{g(I_α)/ε} dy f(y) ϕ(g_α^{−1}(εy)) / |g'(g_α^{−1}(εy))|     (substituting y = g(x)/ε)
       = Σ_{a: g(a)=0} ϕ(a) / |g'(a)| .   (7.14)
This result suggests an intuitive interpretation of δg as a sum of Dirac deltas located at the zeros of the
function g. As an example, consider the function g(x) = x2 − c2 , where c > 0 is a constant. This function
has two zeros at ±c and |g 0 (±c)| = 2c. Inserting this into Eq. (7.14) gives
δ_{x²−c²} = (1/(2c)) (δ_c + δ_{−c}) .   (7.15)
It is useful to translate the above equations into the notation commonly used in physics. There, δg is
usually written as δ(g(x)) and Eq. (7.14) takes the form
δ(g(x)) = Σ_{a: g(a)=0} (1/|g'(a)|) δ(x − a) ,   (7.16)
Application 7.28. Unusual convergence of distributions
Consider the sequence of functions fk : R → R defined by fk (x) = sin(kx), where k = 1, 2, . . ., so a
sequence of sine functions with increasing frequency. Clearly, this sequence does not converge to any
function in the usual sense. What about convergence of the associated sequence Tfk of distributions?
To work this out, we define the sequence of functions (g_k) with g_k(x) = −(1/k) cos(kx). We have
Dg_k = f_k ⇒ DT_{g_k} = T_{f_k} and, since g_k → 0 uniformly as k → ∞, it follows that
T_{g_k} −→_{D'} T_0   ⇒   T_{f_k} −→_{D'} T_0 ,   (7.20)
where T0 is the distribution associated to the function identical to zero. This is a surprising result
which shows that distributions can behave quite differently from functions: While the sequence of
functions (fk ) does not converge at all, the associated sequence of distributions (Tfk ) converges to the
zero distribution.
As a function, θ is not differentiable at x = 0 (since it is not even continuous there) but we can still
ask about the differential of the associated Heaviside distribution T_θ. A short calculation shows that
DT_θ[ϕ] = T_θ[−Dϕ] = −∫_R dx θ(x)ϕ'(x) = −∫_0^∞ dx ϕ'(x) = ϕ(0) = δ_0[ϕ]   (7.22)
and, hence,
DTθ = δ0 . (7.23)
The Dirac delta is the derivative of the Heaviside distribution. In a physics context this is frequently
written as
d/dx θ(x) = δ(x) ,   (7.24)
ignoring the fact that θ, as a function, is actually not differentiable.
7.2 Convolution of distributions∗
We have already encountered convolutions of functions in the context of Fourier transforms. We would now
like to introduce convolutions between functions and distributions. Suppose we have a locally integrable
function f ∈ C^∞_{loc}(R^n) and a test function ϕ ∈ D(R^n). From Eq. (3.55) the convolution of these two
functions is given by
(f ⋆ ϕ)(x) = ∫_{R^n} dy^n f(y)ϕ(x − y) .   (7.27)
Of course we would like to extend this definition to convolutions such that the above formula is reproduced
for distributions Tf , associated to a function f . Hence, we start by re-writing Eq. (7.27) in terms of Tf
and, to this end, we introduce the translation τ_x by x ∈ R^n, defined by
(τ_x ϕ)(y) := ϕ(x − y) .   (7.28)
It is easy to see that with this definition, the convolution (7.27) can be re-written as
(f ⋆ ϕ)(x) = T_f[τ_x ϕ] .   (7.29)
The RHS of this equation can be used to define convolutions between distributions and functions.
Definition 7.5. For a distribution T ∈ D0 (Rn ) and a test function ϕ ∈ D(Rn ) the convolution T ? ϕ :
Rn → R is a function defined by
(T ? ϕ)(x) := T [τx ϕ] . (7.30)
For example,
(δ_0 ⋆ ϕ)(x) = δ_0[τ_x ϕ] = (τ_x ϕ)(0) = ϕ(x) .   (7.31)
This means that the Dirac delta δ_0 can be seen as the identity of the convolution operation.
The result of a convolution T ? ϕ is a function and it is an obvious question what the properties of this
function are. Some answers are given by the following
Theorem 7.2. For a distribution T ∈ D0 (Rn ) and a test function ϕ ∈ D(Rn ) the convolution T ? ϕ is a
differentiable function and we have
D_i(T ⋆ ϕ) =^{(a)} T ⋆ (D_i ϕ) =^{(b)} (D_i T) ⋆ ϕ .   (7.32)
7.3 Fundamental solutions - Green functions∗
We have already seen Green functions appear a number of times, notably in the context of ordinary linear
differential equations and in the context of the Laplace equation. They are, in fact, quite a general concept
which can be formulated concisely using distributions. To get a rough idea of how this works let us first
discuss this using the physics language where we work with a Dirac delta “function”, δ(x).
Suppose we have a differential operator L (typically of second order) and we are trying to solve the
equation Lφ(x) = ρ(x). (The Laplace equation with source ρ is an example but the discussion here is for
a general differential operator.) Suppose we have found a "function" G satisfying L G(x) = δ(x), so that
φ(x) = ∫ dy^n G(x − y)ρ(y) formally solves Lφ(x) = ρ(x).
For example, for the (three-dimensional) Laplace equation ∆φ(x) = −4πρ(x), we have the Green function
G(x) = 1/r, where r = |x|, which satisfies
∆(1/r) = −4πδ(x) .   (7.36)
(This statement will be justified in Theorem 7.4 below.) Hence, we have a solution
φ(x) = ∫_{R³} dy³ ρ(y)/|x − y|   (7.37)
While the above discussion is what you will normally find in the physics literature we know by now that
it is mathematically questionable and has to be justified by a proper treatment in terms of distributions.
The general theorem facilitating this is
Theorem 7.3. If the distribution E ∈ D'(R^n) satisfies LE = δ_0 for a differential operator L and ρ ∈
D(R^n) is a test function, then φ := E ⋆ ρ satisfies Lφ = ρ.
Proof. We have Lφ = L(E ⋆ ρ) = (LE) ⋆ ρ = δ_0 ⋆ ρ = ρ.
Here we have used Theorem 7.2 and the fact that the Dirac delta is the identity under convolution, as in
Eq. (7.31).
In the above theorem, the distribution E, also called a fundamental solution for L, is the analogue of
the Green function and a general solution to the inhomogeneous equation with source ρ is obtained by
convoluting E with ρ.
Of course, for this theorem to be of practical use we first have to work out a fundamental solution for
a given differential operator L. For the Laplace operator, L = ∆, the following theorem provides such a
fundamental solution:
Theorem 7.4. The distribution T1/r ∈ D0 (R3 ), where r = |x|, satisfies ∆T1/r = −4πδ0 and is, hence, a
fundamental solution for the Laplace operator.
Proof. We have
∆T_{1/r}[ϕ] = T_{1/r}[∆ϕ] = ∫_{R³} dx³ ∆ϕ/r .   (7.39)
That this integral is equal to −4πδ0 [ϕ] has already been shown in Theorem (6.3).
Hence, T1/r is a fundamental solution of the Laplace operator and can be used to write down solutions
of the inhomogeneous Laplace equation by applying Theorem 7.3. In fact, in Corollary 6.2 we have seen
that this provides the unique solution φ to the inhomogeneous Laplace equation with φ → 0 as r → ∞.
We have also encountered Green functions in our discussion of second order ordinary differential
equations in Section 5 and it is useful to re-formulate these results in terms of distributions. Recall that
the relevant differential operator is given by
d2 d
L = α2 (x) + α1 (x) + α0 (x) , (7.40)
dx2 dx
and, on the interval x ∈ [a, b], we would like to solve the equation Ly = f , with y(a) = y(b) = 0. The
solution to this problem is described, in terms of a Green function, in Theorem 5.6. From this statement
we have
Theorem 7.5. For G̃(x) := G(0, x), where G is the Green function from Theorem 5.6, but for
the operator L†, the distribution T_{G̃} is a fundamental solution of the operator (7.40), that is, LT_{G̃} = δ_0.
Proof. From Eq. (5.83) the Green function for L† can be written in terms of its eigenfunctions e_k and
eigenvalues λ_k as
G(x, t) = Σ_k (1/λ_k) w(t) e_k(t) e_k(x) ,   (7.41)
where
L† ek = λk ek (7.42)
and this Green function provides a solution to L† y = ϕ given by
y(x) = ∫_R dt G(x, t)ϕ(t)   ⇒   ∫_R dt L_x† G(x, t)ϕ(t) = ϕ(x) .   (7.43)
Using this and the fact that L† is hermitian relative to the scalar product defined with weight function w
(as it appears in its Sturm-Liouville form) we get
LT_{G̃}[ϕ] = T_{G̃}[L†ϕ] = ∫_R dt G(0, t) L_t†ϕ(t)  =^{(7.41)}  Σ_k (1/λ_k) ∫_R dt w(t) e_k(t) e_k(0) L_t†ϕ(t)
           = Σ_k (1/λ_k) ∫_R dt w(t) (L_t† e_k)(t) e_k(0) ϕ(t)  =^{(7.42)}  Σ_k ∫_R dt w(t) e_k(t) e_k(0) ϕ(t)
           =^{(7.42)}  Σ_k (1/λ_k) ∫_R dt w(t) e_k(t) (L_x† e_k)(0) ϕ(t)  =^{(7.41)}  ∫_R dt (L_x† G)(0, t) ϕ(t)  =^{(7.43)}  ϕ(0) = δ_0[ϕ]
We will encounter further examples of fundamental solutions in the next chapter when we discuss other
linear partial differential equations.
7.4 Fourier transform for distributions∗
In section 3.2 we have discussed the Fourier transform and we have defined
F(f)(k) = (1/(2π)^{n/2}) ∫_{R^n} d^n y e^{−ik·y} f(y) =: f̂(k) ,   F̃(f̂)(x) = (1/(2π)^{n/2}) ∫_{R^n} d^n k e^{ik·x} f̂(k) ,   (7.44)
and one of our central results was that F̃ ◦ F = id, so F̃ is the inverse Fourier transform. Writing this
out explicitly we have
(1/(2π)^n) ∫ d^n y d^n k e^{ik·(x−y)} f(y) = f(x) .   (7.45)
Symbolically, this result can also be written as
(1/(2π)^n) ∫ d^n k e^{ik·(x−y)} = δ(x − y) ,   (7.46)
in the sense that this equation multiplied with f (y) and integrated over y leads to Eq. (7.45), provided the
naive rule (7.7) for working with δ(x − y) is used. The equation (7.46) is frequently used in physics calcu-
lations but, as all other consideration which involve the Dirac delta used as a function, is mathematically
unsound.
Exercise 7.6. Use Eq. (7.46) naively to “prove” part (a) of Plancherel’s theorem (3.16).
To check that using Eq. (7.46) in a naive way makes sense we should consider the generalisation of the
Fourier transform to distributions.
For some guidance on how to define the Fourier transform for distributions we proceed as usual and
demand that FTf = TF f in order to ensure that the Fourier transform on distributions reduces to the
familiar one for functions whenever it is applied to a distribution of the form Tf . This implies
(FT_f)[ϕ] = T_{Ff}[ϕ] = ∫ d^n y (Ff)(y)ϕ(y) = (1/(2π)^{n/2}) ∫ d^n x d^n y e^{−ix·y} f(x)ϕ(y)
          = ∫ d^n x (Fϕ)(x)f(x) = T_f[Fϕ] .   (7.47)
The LHS and RHS of this equation make sense if we drop the subscript f and this can be used to define
the Fourier transform of distributions by⁷
(FT)[ϕ] := T[Fϕ] ,   (F̃T)[ϕ] := T[F̃ϕ] ,   (7.48)
so that F̃ ◦ F(T ) = F ◦ F̃(T ) = T . Given this definition, what is the Fourier transform of the Dirac delta,
δk ? The quick calculation
1
Z
(Fδk )[ϕ] = δk [Fϕ] = (Fϕ)(k) = dn x e−ik·x ϕ(x) = Te−ik·x /(2π)n/2 [ϕ] (7.49)
(2π)n/2
shows that
Fδ_k = T_{(1/(2π)^{n/2}) e^{−ik·x}} .   (7.50)
⁷ A mathematically fully satisfactory definition requires modifying the test function space but this would go beyond our
present scope. The main idea of how to define Fourier transforms of distributions is already apparent without considering
this subtlety.
Note that this result is in line with our intuitive understanding of Fourier transforms as frequency analysis.
It says that a frequency spectrum “sharply peaked” at k Fourier transforms to a monochromatic wave
with wave vector k.
Now consider the inverse of the above computation, that is, we would like to work out the Fourier
transform of T_{e^{ik·x}}. Note that the function e^{ik·x} is not integrable over R^n so it does not have a Fourier
transform in the conventional sense. Taking the complex conjugate
F̃δ_k = T_{(1/(2π)^{n/2}) e^{ik·x}}   (7.51)
Again, this is in line with the intuitive understanding of Fourier transforms. The transform of a monochromatic
wave with wave vector k is a spectrum "sharply peaked" at k. At the same time, Eq. (7.51) is the
mathematically correct version of Eq. (7.46).
8 Other linear partial differential equations
In this chapter, we will discuss a number of other linear partial differential equations which are important
in physics, including the Helmholtz equation, the wave equation and the heat equation. We will cover a
number of methods to solve these equations but in the interest of keeping these notes manageable we will
not be quite as thorough as we have been for the Laplace equation. We begin with the Helmholtz equation
which is closest to the Laplace equation.
8.1 The Helmholtz equation
The homogeneous and inhomogeneous Helmholtz equations read
(∆ + k²)ψ = 0 ,   (∆ + k²)ψ = f ,   (8.1)
where k ∈ R is a real number and ∆ is the three-dimensional Laplace operator (although the equation can,
of course, also be considered in other dimensions). This equation appears, for example, in wave problems
with fixed wave number k, as we will see in our discussion of the wave equation later on.
As always, the general solution to the inhomogeneous Helmholtz equation is given as a sum of the
general solution of the homogeneous equation plus a special solution of the inhomogeneous equation. The
homogeneous Helmholtz equation is an eigenvalue equation (with eigenvalue −k²) for the Laplace operator
and many of the methods discussed in the context of the Laplace equation can be applied.
To find a special solution of the inhomogeneous equation we can use the Green function method, in
analogy to what we did for the Laplace equation. Define the functions
G_±(r) = e^{±ikr}/r ,   (8.2)
where r = |x| is the radial coordinate. Given that this function is independent of the angles we can use
only the radial part of the Laplacian in spherical coordinates (6.17) to verify that, for r > 0
(∆ + k²)G_± = [ (1/r²) d/dr (r² d/dr) + k² ] G_± = 0 .   (8.3)
Hence, G_± solves the homogeneous Helmholtz equation for r > 0 in much the same way 1/r solves the
homogeneous Laplace equation. In fact, the analogy goes further as stated in the following
Proof. The proof is very much in analogy with the corresponding one for the Laplace equation (Theorem 7.4) and
can be found in Ref. [4]. Essentially, it relies on ∆T_{1/r} = −4πδ_0 and the fact that 1/r is really the only
singularity in G.
With this result, the general solution to the inhomogeneous Helmholtz equation can be written as
ψ(x) = ψ_hom(x) − (1/4π)(T_G ⋆ f)(x) = ψ_hom(x) − (1/4π) ∫_{R³} d³y G(x − y)f(y) ,   (8.5)
8.2 Eigenfunctions and time evolution
Many partial differential equations in physics involve a number of spatial coordinates x = (x1 , . . . , xn )T ∈
V ⊂ U ⊂ Rn as well as time t ∈ R and are of the form
Hψ = (1/c) ψ̇   or   Hψ = ψ̈ ,   (8.6)
where ψ = ψ(t, x), the dot denotes the time derivative ∂/∂t and c is a constant. We assume that H is a second
order linear differential operator in the spatial derivatives ∂/∂x_i, so a differential operator on C^∞(U) ∩ L²(U)
which is time-independent and hermitian relative to the standard scalar product on L²(U). If boundary
conditions are imposed on ∂V we assume that they are also time-independent. Under these conditions
there frequently exists a (time-independent) ortho-normal basis (φ_i)_{i=1}^∞ of L²(U) with the desired boundary
behaviour which consists of eigenfunctions of H, so
Hφi = λi φi , (8.7)
where the eigenvalues λi are real since H is hermitian. The problem is to solve the above equations
subject to an initial condition ψ(0, x) = ψ0 (x) and, in addition, ψ̇(0, x) = ψ̇0 (x) in the case of the second
equation (8.6), for given functions ψ0 and ψ̇0 . This can be done by expanding the function ψ, for any
given time t, in terms of the basis (φi ), so that
ψ(t, x) = Σ_i A_i(t)φ_i(x) .   (8.8)
The remaining constants a_i are fixed by the initial condition ψ(0, x) = Σ_i a_i φ_i(x) = ψ_0(x) which can be
solved in the usual way, using the orthogonality relations ⟨φ_i, φ_j⟩ = δ_{ij}. This leads to
ai = hφi , ψ0 i . (8.11)
A similar calculation for the second equation (8.6) (assuming that λ_i < 0) leads to
Ä_i = −|λ_i| A_i   ⇒   A_i = a_i sin(√|λ_i| t) + b_i cos(√|λ_i| t)   (8.12)
so that
ψ(t, x) = Σ_i ( a_i sin(√|λ_i| t) + b_i cos(√|λ_i| t) ) φ_i(x) .   (8.13)
The constants a_i and b_i are fixed by the initial conditions ψ(0, x) = Σ_i b_i φ_i(x) = ψ_0(x) and
ψ̇(0, x) = Σ_i a_i √|λ_i| φ_i(x) = ψ̇_0(x) and are, hence, given by
a_i = (1/√|λ_i|) ⟨φ_i, ψ̇_0⟩ ,   b_i = ⟨φ_i, ψ_0⟩ .   (8.14)
Below, we will discuss various examples of this structure more explicitly.
8.3 The heat equation
The homogeneous and inhomogeneous heat equations are given by
(∆_n − ∂/∂t) ψ = 0 ,   (∆_n − ∂/∂t) ψ = f ,   (8.15)
where ∆n is the Laplacian in n dimensions (with n = 1, 2, 3 cases of physical interest). The solution
ψ = ψ(t, x) can be interpreted as a temperature distribution evolving in time. If we are solving the
equations on a spatial patch V ⊂ Rn we have to provide boundary conditions on ∂V and in addition, we
should provide an initial distribution ψ(0, x) = ψ0 (x) at time t = 0.
The homogeneous equation is of the form discussed in the previous subsection (with H = ∆ and
c = 1). If we are solving the equation on a spatial patch V ⊂ Rn with boundary conditions such that the
spectrum of the Laplacian has countably many eigenvectors then we can apply the method described in
Section 8.2.
For V = [0, a] with boundary conditions ψ(t, 0) = ψ(t, a) = 0 an appropriate set of eigenfunctions is given by φ_k(x) = sin(kπx/a) with eigenvalues
λ_k = −k²π²/a² .   (8.16)
Inserting this into the general solution (8.10) leads to
ψ(t, x) = Σ_{k=1}^∞ b_k sin(kπx/a) e^{−k²π²t/a²} .   (8.17)
The coefficients bk are determined by the initial condition ψ(0, x) = ψ0 (x) which leads to the standard
sine Fourier series
ψ(0, x) = Σ_{k=1}^∞ b_k sin(kπx/a) = ψ_0(x) .   (8.18)
Exercise 8.2. For V = [0, a], boundary conditions ψ(t, 0) = ψ(t, a) = 0 and an initial distribution
ψ(0, x) = T0 x(a−x)/a2 (where T0 is a constant) find the solution ψ(t, x) to the homogeneous heat equation.
For solutions in ψ(t, ·) ∈ L2 (Rn ) we can also solve the homogeneous heat equation using Fourier transforms.
Inserting
ψ(t, x) = (1/(2π)^{n/2}) ∫_{R^n} d^n k ψ̃(t, k) e^{−ik·x}   (8.20)
into the heat equation leads to
∂ψ̃/∂t = −|k|² ψ̃   ⇒   ψ̃(t, k) = χ(k) e^{−|k|²t} ,   (8.21)
for a function χ(k). Inserting this back into the Ansatz gives
ψ(t, x) = (1/(2π)^{n/2}) ∫_{R^n} d^n k χ(k) e^{−ik·x − |k|²t} ,   (8.22)
and the initial condition ψ(0, x) = ψ0 (x) translates into F(χ) = ψ0 which can be inverted using the inverse
Fourier transform, so χ = F̃(ψ0 ).
Exercise 8.3. Find the solution ψ(t, x) of the heat equation for x ∈ R, square integrable for all t, which
satisfies ψ(0, x) = ψ_0(x) = T_0 e^{−x²/(2a²)}.
To solve the inhomogeneous heat equation we would like to find a Green function. It can be verified
by direct calculation that the function G : R^{n+1} → R defined by
G(t, x) = { −(1/(4πt)^{n/2}) e^{−|x|²/(4t)}   for t > 0
            0                                  for t ≤ 0   (8.23)
solves the homogeneous heat equation away from (t, x) = (0, 0).
Exercise 8.4. Show that the function G in Eq. (8.23) solves the homogeneous heat equation for t ≠ 0 and
x 6= 0.
In fact we have
Theorem 8.5. The distribution TG with G defined in Eq. (8.23) is a fundamental solution of the heat
equation, so
(∆ − ∂/∂t) T_G = δ_0 .   (8.24)
Proof. The proof is similar to the one for the Laplacian in Theorem 7.4 and can be found in Ref. [4].
From this result, the general solution to the inhomogeneous heat equation can be written as
ψ(t, x) = ψ_hom(t, x) + (T_G ⋆ f)(t, x) = ψ_hom(t, x) + ∫_{R^{n+1}} dτ d^n y G(t − τ, x − y) f(τ, y) ,   (8.25)
8.4 The wave equation
The homogeneous and inhomogeneous wave equations are given by
(∆_n − ∂²/∂t²) ψ = 0 ,   (∆_n − ∂²/∂t²) ψ = f ,   (8.26)
where ∆n is the n-dimensional Laplacian (and n = 1, 2, 3 are the most interesting dimensions for physics).
If this equation is considered on the spatial patch V ⊂ Rn we should specify boundary conditions on ∂V
for all t. In addition, we require initial conditions ψ(0, x) = ψ0 (x) and ψ̇(0, x) = ψ̇0 (x) at some initial
time t = 0.
For an Ansatz of the form
ψ(t, x) = ψ̃(x)e−iωt (8.27)
the wave equation turns into the Helmholtz equation for ψ̃ with k = ω and we can use the methods
discussed in Section 8.1.
Starting with the homogeneous equation with ψ(t, ·) ∈ L2 (Rn ) we can start with a Fourier integral
ψ(t, x) = (1/(2π)^{n/2}) ∫_{R^n} d^n k ψ̃(t, k) e^{−ik·x} .   (8.28)
Inserting this into the homogenous equation implies
∂²ψ̃/∂t² = −|k|² ψ̃   ⇒   ψ̃(t, k) = ψ_+(k) e^{i|k|t} + ψ_−(k) e^{−i|k|t} ,   (8.29)
and, hence,
ψ(t, x) = (1/(2π)^{n/2}) ∫_{R^n} d^n k ( ψ_+(k) e^{i|k|t} + ψ_−(k) e^{−i|k|t} ) e^{−ik·x} .   (8.30)
The functions ψ_± are fixed by the initial conditions via ψ_0 = F(ψ_+ + ψ_−) and ψ̇_0 = F(i|k|(ψ_+ − ψ_−))
which can be solved, using the inverse Fourier transform, to give ψ_± = (1/2) F̃( ψ_0 ∓ (i/|k|) ψ̇_0 ).
If we work on a spatial patch with boundary conditions which lead to a countable number of eigenvec-
tors of the Laplacian we can use the method in Section 8.2 to solve the homogeneous wave equation. For
one and two spatial dimensions this leads to systems usually referred to as “strings” and “membranes”,
respectively, and we now discuss them in turn.
8.4.1 Strings
The wave equation now reads⁸
( ∂²/∂x² − ∂²/∂t² ) ψ = 0 ,   (8.31)
where ψ = ψ(t, x) and x ∈ [0, a]. This equation describes various kinds of strings from guitar strings to the
strings of string theory. We will impose Dirichlet boundary conditions ψ(t, 0) = ψ(t, a) = 0 as appropriate
for a string with fixed endpoints. (The strings of string theory allow for both Dirichlet and von Neumann
boundary conditions.) In addition, we need to fix the initial position, ψ(0, x) = ψ0 (x), and initial velocity
ψ̇(0, x) = ψ̇0 (x).
Given the boundary conditions an appropriate set of eigenfunctions is provided by φ_k = sin(kπx/a), that is, the functions for the sine Fourier series. We have φ''_k = λ_k φ_k with eigenvalues

−λ_k = k²π²/a² =: ω_k² .   (8.32)
Inserting this into the general solution (8.13) leads to
ψ(t, x) = Σ_{k=1}^∞ ( a_k sin(kπt/a) + b_k cos(kπt/a) ) sin(kπx/a) .   (8.33)
Imposing the initial conditions ψ(0, x) = ψ₀(x) and ψ̇(0, x) = ψ̇₀(x) gives

ψ₀(x) = Σ_{k=1}^∞ b_k sin(kπx/a) ,    ψ̇₀(x) = Σ_{k=1}^∞ (kπ/a) a_k sin(kπx/a) .   (8.34)

These equations can of course be solved for a_k and b_k using standard Fourier series techniques, resulting in

a_k = 2/(kπ) ∫₀^a dx sin(kπx/a) ψ̇₀(x) ,    b_k = 2/a ∫₀^a dx sin(kπx/a) ψ₀(x) .   (8.35)
Note that the eigenfrequencies of the system
ω_k = kπ/a   (8.36)
are all integer multiples of the ground frequency ω1 = π/a.
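A small numerical sketch of the series solution (8.33) with coefficients (8.35), assuming numpy and scipy are available; the initial profile ψ₀(x) = x(a − x) used here is just an arbitrary example.

```python
import numpy as np
from scipy.integrate import quad

a = 1.0
psi0 = lambda x: x * (a - x)          # example initial position (arbitrary choice)
psi0dot = lambda x: 0.0 * x           # string released from rest

def coeffs(k):
    # Eq. (8.35): a_k from the initial velocity, b_k from the initial position
    ak = 2.0 / (k * np.pi) * quad(lambda x: np.sin(k * np.pi * x / a) * psi0dot(x), 0, a)[0]
    bk = 2.0 / a * quad(lambda x: np.sin(k * np.pi * x / a) * psi0(x), 0, a)[0]
    return ak, bk

def psi(t, x, kmax=50):
    # truncated series (8.33)
    total = 0.0
    for k in range(1, kmax + 1):
        ak, bk = coeffs(k)
        total += (ak * np.sin(k * np.pi * t / a)
                  + bk * np.cos(k * np.pi * t / a)) * np.sin(k * np.pi * x / a)
    return total

print(psi(0.0, 0.3), 0.3 * (a - 0.3))   # series at t = 0 should reproduce psi0
```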
Exercise 8.6. Find the solution ψ = ψ(t, x) for a string with length a, Dirichlet boundary conditions
ψ(t, 0) = ψ(t, a) = 0 and initial conditions ψ̇(0, x) = 0 and
ψ(0, x) = { h x / b             for 0 ≤ x ≤ b
          { h (a − x)/(a − b)   for b < x ≤ a      (8.37)
where b ∈ [0, a] and h are constants. (Think of a guitar string plucked at distance b from the end of the
string.) How do the parameters b and h affect the sound of the guitar?
8.4.2 Membranes
We are now dealing with a wave equation of the form
( ∂²/∂x² + ∂²/∂y² − ∂²/∂t² ) ψ = 0 ,   (8.38)
where ψ = ψ(t, x, y) and (x, y) ∈ V ⊂ R2 with a (compact) spatial patch V. We require boundary
conditions at ∂V and initial conditions ψ(0, x, y) = ψ0 (x, y) and ψ̇(0, x, y) = ψ̇0 (x, y). Of course we can
consider any number of “shapes”, V, of the membrane. Let us start with the simplest possibility of a
rectangular membrane, V = [0, a] × [0, b], and Dirichlet boundary conditions ψ|_∂V = 0. In this case, we have an orthogonal basis of eigenfunctions φ_{k,l}(x, y) = sin(kπx/a) sin(lπy/b), where k, l = 1, 2, . . ., with Δ₂ φ_{k,l} = λ_{k,l} φ_{k,l} and eigenvalues

−λ_{k,l} = k²π²/a² + l²π²/b² =: ω_{k,l}² .   (8.39)
Inserting into Eq. (8.13) gives
ψ(t, x, y) = Σ_{k,l=1}^∞ ( a_{k,l} sin(ω_{k,l} t) + b_{k,l} cos(ω_{k,l} t) ) sin(kπx/a) sin(lπy/b) .   (8.40)
The coefficients a_{k,l} and b_{k,l} are fixed by the initial conditions and can be obtained using standard Fourier
series techniques (in both the x and y coordinate), in analogy with the string case. We note that the
lowest frequencies of the square (a = b) drum are

ω_{1,1} = √2 π/a ,    ω_{2,1} = ω_{1,2} = √5 π/a .   (8.41)
Unlike for a string the eigenfrequencies of a drum are not integer multiples of the ground frequency - this
is why a drum sounds less well-defined compared to other instruments and why most instruments use
strings or other, essentially one-dimensional systems to produce sound.
For the round membrane with V = {x ∈ R2 | |x| ≤ a} and boundary condition φ|r=a = 0 we should first
find a set of eigenfunctions φ for the two-dimensional Laplacian in polar coordinates, satisfying ∆φ = −λφ.
Starting with the Ansatz φ(r, ϕ) = R(r)eimϕ and using the two-dimensional Laplace operator (6.11) in
polar coordinates gives
r2 R00 + rR0 + (λr2 − m2 )R = 0 . (8.42)
This is the Bessel differential equation for ν = |m| and the solution is R(r) ∼ J_{|m|}(√λ r). The boundary condition at r = a implies that √λ a = z_{|m|n}, where z_{νn} denotes the nth zero of the Bessel function J_ν.
This means we have eigenfunctions and eigenvalues

φ_{mn}(r, ϕ) ∼ Ĵ_{|m|n}(r) e^{imϕ} ,    λ_{mn} = z_{|m|n}² / a² =: ω_{mn}² .   (8.43)
Expanding ψ(t, r, ϕ) = Σ_{m,n} T_{mn}(t) φ_{mn}(r, ϕ) we find the differential equations T̈_{mn} = −ω_{mn}² T_{mn} so that ω_{mn} are the eigenfrequencies of the round membrane. As is clear from Eq. (8.43) these eigenfrequencies are determined by the zeros of the Bessel functions so they are quite irregular.
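These eigenfrequencies are easy to obtain numerically from the Bessel zeros; a minimal sketch assuming scipy is available (the radius a = 1 is an arbitrary choice).

```python
import numpy as np
from scipy.special import jn_zeros

a = 1.0
# omega_{mn} = z_{|m|,n} / a, with z_{nu,n} the n-th zero of J_nu, cf. Eq. (8.43).
for m in range(3):
    zeros = jn_zeros(m, 3)            # first three zeros of the Bessel function J_m
    print(m, zeros / a)               # note the irregular ratios, unlike the string's k*pi/a
```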
and is, hence, a Green function for the Helmholtz equation. From our discussion in Section 8.1 there are
essentially two choices for this Green function, namely
G̃±(ω, x) = e^{±iω|x|} / |x| .   (8.47)
Inserting this into the Fourier integral (8.45) and using the result (7.46) gives
G±(t, x) = δ(t ∓ |x|) / |x| .   (8.48)
The general solution to the inhomogeneous wave equation, using the so-called retarded Green function
G+ , is then given by
ψ(t, x) = ψ_hom(t, x) − 1/(4π) ∫_{R⁴} dt′ d³x′ G₊(t − t′, x − x′) f(t′, x′)
        = ψ_hom(t, x) − 1/(4π) ∫_{R⁴} dt′ d³x′ δ(t − t′ − |x − x′|)/|x − x′| f(t′, x′)
        = ψ_hom(t, x) − 1/(4π) ∫_{R³} d³x′ f(t′, x′)/|x − x′| |_{t′ = t − |x − x′|}   (8.49)
Note that this formula is very much in analogy with the corresponding result for the Laplace equation. The
difference is of course the dependence on t. The source f is evaluated at the retarded time t0 = t − |x − x0 |
to produce the solution at time t. The physical interpretation is that this takes into account the time it
takes for the effect of the source at x0 to influence the solution at x. The above result is the starting point
for calculating the electromagnetic radiation from moving charges.
9 Groups and representations∗
Symmetries have become a key idea in modern physics and they are an indispensable tool for the construc-
tion of new physical theories. They play an important role in the formulation of practically all established
physical theories, from Classical/Relativistic Mechanics, Electrodynamics, Quantum Mechanics, General
Relativity to the Standard Model of Particle Physics.
The word “symmetry” used in a physics context (usually) refers to the mathematical structure of a
group, so this is what we will have to study. In physics, the typical problem is to construct a theory which
“behaves” in a certain defined way under the action of a symmetry, for example, which is invariant. To
tackle such a problem we need to know how symmetries act on the basic building blocks of physical theories
and these building blocks are often elements of vector spaces. (Think, for example, of the trajectory r(t)
of a particle in Classical Mechanics which, for every time t, is an element of R3 , a four-vector xµ (t) which
is an element of R4 or the electric and magnetic fields which, at each point in space-time, are elements of
R3 .) Hence, we need to study the action of groups on vector spaces and the mathematical theory dealing
with this problem is called (linear) representation theory of groups. The translation between physical and
mathematical terminology is summarised in the diagram below.
physics                         mathematics
symmetry            ≅           group
action on           ↓           ↓  representation
building blocks     ≅           vector spaces
Groups and their representations form a large area of mathematics and a comprehensive treatment can
easily fill two or three lecture courses. Here we will just touch upon some basics and focus on some of the
examples with immediate relevance for physics. We begin with elementary group theory - the definition
of a group, of a sub-group and of group homomorphisms plus examples of groups - before we move on to
representations. Lie groups and their associated Lie algebras play an important role in physics and we
briefly discuss the main ideas before moving on to the physically most relevant examples of such groups.
Definition 9.1. (Group) A group is a set G with a map · : G × G → G, called group multiplication, which
satisfies
(G1) g1 · (g2 · g3 ) = (g1 · g2 ) · g3 for all g1 , g2 , g3 ∈ G (associativity)
(G2) There exists an e ∈ G such that e · g = g for all g ∈ G (neutral element)
(G3) For all g ∈ G there exists a g̃ ∈ G such that g̃ · g = e (inverse element)
If, in addition, the group multiplication commutes, that is, if g1 · g2 = g2 · g1 for all g1 , g2 ∈ G, then the
group is called Abelian. Otherwise it is called non-Abelian.
Groups can have a finite or infinite number of elements and we will see examples of either. In the former
case, they are called finite groups. The number of elements in a finite group is also called the order of the
group.
The above definition looks somewhat asymmetric since we have postulated that the neutral element
and the inverse in (G2) and (G3) multiply from the left but have made no statement about their multi-
plication from the right. However, this is not a problem due to the following Lemma.
Proposition 9.1. For a group G we have the following statements.
(i) A left-inverse is also a right-inverse, that is, g̃ · g = e ⇒ g · g̃ = e.
(ii) A left-neutral is also a right neutral, that is, e · g = g ⇒ g · e = g.
(iii) For a given g ∈ G, the inverse is unique and denoted by g −1 .
(iv) The neutral element e is unique.
Proof. (i) Start with a g ∈ G and its left-inverse g̃ so that g̃ · g = e. Of course, g̃ must also have a
left-inverse which we denote by g 0 so that g 0 · g̃ = e. Then we have
g · g̃ = e · g · g̃ = g′ · (g̃ · g) · g̃ = g′ · e · g̃ = g′ · g̃ = e ,   (9.1)

where we have used g̃ · g = e in the middle step.
The inverse satisfies the following simple properties which we have already encountered in the context of
maps and their inverses.
(g −1 )−1 = g , (g1 ◦ g2 )−1 = g2−1 ◦ g1−1 . (9.2)
Exercise 9.1. Prove statements (ii), (iii) and (iv) of Proposition 9.1 as well as the rules (9.2).
We follow the same build-up as in the case of vector spaces and next define the relevant sub-structure,
the sub-group.
Definition 9.2. (Sub-group) A subset H ⊂ G of a group G is called a sub-group of G if it forms a group
under the multiplication induced from G.
The following exercise provides a practical way of checking whether a subset of a group is a sub-group.
Exercise 9.2. Show that a subset H ⊂ G of a group G is a sub-group iff it satisfies the following conditions:
(i) H is closed under the group multiplication.
(ii) e ∈ H
(iii) For all h ∈ H we have h−1 ∈ H
9.1.3 Examples of groups
We should now discuss some examples of groups. You are already familiar with many of them although
you may not yet have thought about them in this context.
Examples from “numbers”: Every field F forms a group, (F, +), with respect to addition and F \ {0}
forms a group, (F \ {0}, ·) with respect to multiplication (we need to exclude 0 since it does not have
a multiplicative inverse). So, more concretely, we have the groups (R, +), (C, +), (R \ {0}, ·) and (C \
{0}, ·). The integers Z also form a group, (Z, +) with respect to addition (however, not with respect to
multiplication since there is no multiplicative inverse in Z). Clearly, all of these groups are Abelian. The
group (Z, +) is a sub-group of (R, +) which, in turn, is a sub-group of (C, +).
Examples from vector spaces: Every vector space forms an Abelian group with respect to vector
addition.
Finite Abelian groups: Consider the set Zp := {0, 1, . . . , p − 1} for any positive integer p and introduce
the group multiplication
g1 · g2 := (g1 + g2 ) mod p . (9.3)
Clearly, the Zp form finite, Abelian groups (with neutral element 0) which are also referred to as cyclic
groups.
Finite non-abelian groups: The permutations Sn := {σ : {1, . . . , n} → {1, . . . , n} | σ bijective} of n
objects form a group with group multiplication given by the composition of maps. Indeed, the composition
of maps is associative, we have the identity map which serves as the neutral element and the inverse is
given by the inverse map. In conclusion, the permutations Sn form a finite group of order n! which is also
referred to as symmetric group. Are these groups Abelian or non-Abelian? We begin with the simplest
case of S₂ which has the two elements

S₂ = { e = ( 1 2 ) ,  ( 1 2 ) } .   (9.4)
           ( 1 2 )    ( 2 1 )
(Recall, the above notation is a way of writing down permutations explicitly. It indicates a permutation
which maps the numbers in the top row to the corresponding numbers in the bottom row.) Clearly, this
group is Abelian since the second element commutes with itself and everything commutes with the identity
(the first element). For Sn with n > 2 consider the two permutations
σ₁ = ( 1 2 3 ··· ) ,    σ₂ = ( 1 2 3 ··· ) ,   (9.5)
     ( 1 3 2 ··· )           ( 2 1 3 ··· )

where the dots stand for arbitrary permutations of the numbers 4, . . . , n. We have

σ₁ · σ₂ = ( 1 2 3 ··· ) ,    σ₂ · σ₁ = ( 1 2 3 ··· ) ,   (9.6)
          ( 3 1 2 ··· )                ( 2 3 1 ··· )

so σ₁ · σ₂ ≠ σ₂ · σ₁ and the symmetric groups Sₙ for n > 2 are non-Abelian.
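This multiplication is easy to experiment with numerically; a minimal sketch (permutations represented as tuples with 0-based labels) reproducing the two products above:

```python
# Permutations as tuples: sigma[i] is the image of i (0-based).  Composition of maps,
# (sigma1 . sigma2)(i) = sigma1(sigma2(i)), matches the convention used in Eq. (9.6).
def compose(s1, s2):
    return tuple(s1[s2[i]] for i in range(len(s1)))

sigma1 = (0, 2, 1)   # swaps the objects labelled 2 and 3 above (here 1 and 2)
sigma2 = (1, 0, 2)   # swaps the objects labelled 1 and 2 above (here 0 and 1)

print(compose(sigma1, sigma2))   # (2, 0, 1)  <->  1 -> 3, 2 -> 1, 3 -> 2
print(compose(sigma2, sigma1))   # (1, 2, 0)  <->  1 -> 2, 2 -> 3, 3 -> 1
```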
General linear groups: The invertible linear maps (or matrices) on a vector space V form a group, Gl(V ), under composition. These groups act naturally on the vector spaces they are associated with and, therefore, realise the action of groups on vectors we
would like to achieve for groups more generally. For this reason, they play an important role in the theory
of representations, as we will see shortly. General linear groups have many interesting sub-groups, some
of which we will now discuss.
Unitary and special unitary groups: The unitary group, U(n), is defined as

U(n) := { U ∈ Gl(Cⁿ) | U†U = 1ₙ } ,

so it consists of all unitary n × n matrices. Why is this a group? Since U(n) is clearly a subset of the
general linear group Gl(Cn ) all we have to do is verify the three conditions for a sub-group in Exercise 9.2.
Condition (i), closure, is obvious since the product of two unitary matrices is again unitary, as is condition
(ii) since the unit matrix is unitary. To verify condition (iii) we need to show that with U ∈ U (n) also the
inverse U −1 = U † is in U (n). To do this, consider U as an element of the group Gl(Cn ), so that U † U = 1n
implies U U † = 1n (since, in a group, the left inverse is the right inverse). The last equation can be written
as (U † )† U † = 1n which shows that U † ∈ U (n). Also note that the defining relation, U † U = 1n implies
that the determinant of unitary matrices satisfies
|det(U )| = 1 . (9.8)
Exercise 9.6. Show that Eq. (9.13) provides the correct expression for the group SU (2).
The explicit form (9.13) of SU (2) shows that this group is non-Abelian (as are all higher SU (n) groups)
and that it can be identified with the three-sphere S 3 . This means we have another example of a group
which is also a manifold, so a Lie group.
Finding explicit parametrisations along the lines of Eq. (9.13) for SU (3) or even higher-dimensional
cases is not practical anymore and we will later discuss more efficient ways of dealing with such groups.
Orthogonal and special orthogonal groups: This discussion is very much parallel to the previous one
on unitary and special unitary groups, except it is based on the real, rather than the complex numbers.
The orthogonal and special orthogonal groups (= rotations) are defined as

O(n) := { A ∈ Gl(Rⁿ) | AᵀA = 1ₙ } ,    SO(n) := { A ∈ O(n) | det(A) = 1 } ,

and, hence, consist of all orthogonal matrices and all orthogonal matrices with determinant one, respectively.
Exercise 9.7. Show that O(n) is a sub-group of Gl(Rn ) and that SO(n) is a sub-group of O(n). (Hint:
Proceed in analogy with the unitary case.) Also show that every A ∈ O(n) has determinant det(A) = ±1
and is either special orthogonal or can be written as A = F R, where R ∈ SO(n) and F = diag(−1, 1, . . . , 1).
Just as for unitary groups, it is easy to deal with the two-dimensional case and show, by explicitly inserting
an arbitrary 2 × 2 matrix into the defining relations, that SO(2) is given by
SO(2) = { (  cos θ   sin θ )  |  0 ≤ θ < 2π } .   (9.16)
          ( −sin θ   cos θ )
Since this is parametrised by a circular coordinate, SO(2) can also be thought of as a circle, S 1 . This is a
good opportunity to present another example of a group homomorphism f : U(1) → SO(2), defined by

f(e^{iθ}) := (  cos θ   sin θ ) .   (9.17)
             ( −sin θ   cos θ )
Exercise 9.8. Show that the map (9.17) defines a bijective group homomorphism.
The previous exercise shows that U (1) and SO(2) are isomorphic - as far as their group structure is
concerned they represent the same object. Perhaps more surprisingly, we will see later that SO(3) and SU(2) are related by a two-to-one group homomorphism (and are very nearly isomorphic) as well.
We could continue and write down an explicit parametrisation for SO(3), for example in terms of a
product of three two-dimensional rotations but will refrain from doing so explicitly. Later we will see that
there are more efficient methods to deal with SO(n) for n > 2.
The previous list provides us with sufficiently many interesting examples of groups and we now turn to
our main task - representing groups.
9.2 Representations
Recall from the introduction that we are after some rule by which groups can act on vector spaces. We
already know that the general linear group Gl(V ) (where V = Rn or V = Cn , say), which consists of
invertible linear maps (or matrices) on V , naturally acts on the vector space V . We can, therefore, achieve
our goal if we embed an arbitrary group G into Gl(V ). However, we want to do this in a way that preserves
the group structure of G and this is precisely accomplished by group homomorphisms. This motivates the
following definition of a representation of a group.
Definition 9.4. (Representation of a group) A representation of a group G is a group homomorphism
R : G → Gl(V ), where V is a vector space (typically taken to be V = Rn or V = Cn ). The dimension
of V is called the dimension, dim(R), of the representation R. A representation is called faithful if R is
injective.
The keyword in this definition is “group homomorphism” which means that the representation R satisfies

R(g₁ · g₂) = R(g₁) R(g₂)   for all g₁, g₂ ∈ G .

Representations of U(1): For U(1) = { e^{iθ} | 0 ≤ θ < 2π } and any q ∈ Z we can define a one-dimensional representation R^{(q)} : U(1) → Gl(C) =: C* by

R^{(q)}(e^{iθ}) := e^{iqθ} ,   (9.20)

which is faithful for q = ±1.
Direct sum representations: For two representations R : G → Gl(V ) and R̃ : G → Gl(Ṽ ) of the same
group G, we can define the direct sum representation R ⊕ R̃ : G → Gl(V ⊕ Ṽ ) by
(R ⊕ R̃)(g) := ( R(g)    0    )
              (  0     R̃(g)  ) ,   (9.21)

that is, by simply arranging the representation matrices R(g) and R̃(g) into a block-matrix. Obviously dimensions sum up, so dim(R ⊕ R̃) = dim(R) + dim(R̃).
As an explicit example, consider the above one-dimensional representations R(1) and R(−1) of U (1),
taking q = ±1 in Eq. (9.20). Their direct sum representation is two-dimensional and given by
(R^{(1)} ⊕ R^{(−1)})(e^{iθ}) = ( e^{iθ}     0     )
                               (   0     e^{−iθ}  ) .   (9.22)
Tensor representations: For this we need a bit of preparation. Consider two square matrices A, B of
size n and m, respectively. By their Kronecker product A × B we mean the square matrix of size nm
obtained by replacing each entry in A with that entry times the entire matrix B. The Kronecker product
satisfies the useful rule
(A × B)(C × D) = (AC) × (BD) , (9.23)
provided the sizes of the matrices A, B, C, D fit as required.
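The rule (9.23) is easy to check numerically; a minimal sketch assuming numpy is available (matrix sizes and entries are arbitrary):

```python
import numpy as np

# Numerical check of (A x B)(C x D) = (AC) x (BD) for random matrices.
rng = np.random.default_rng(0)
A, C = rng.standard_normal((2, 2)), rng.standard_normal((2, 2))
B, D = rng.standard_normal((3, 3)), rng.standard_normal((3, 3))

lhs = np.kron(A, B) @ np.kron(C, D)
rhs = np.kron(A @ C, B @ D)
print(np.allclose(lhs, rhs))   # True
```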
Now consider two representations R : G → Gl(V ) and R̃ : G → Gl(Ṽ ) of the same group G. The
tensor representation R ⊗ R̃ : G → Gl(V ⊗ Ṽ) is defined by

(R ⊗ R̃)(g) := R(g) × R̃(g) .   (9.24)

Given the definition of the Kronecker product, dimensions multiply, so dim(R ⊗ R̃) = dim(R) dim(R̃).
As an explicit example, consider the two-dimensional fundamental representation of SU (2) given by
R(U ) = U of any U ∈ SU (2). Then the tensor representation R ⊗ R is four-dimensional and given by
(R ⊗ R)(U) = ( U₁₁ U   U₁₂ U )
             ( U₂₁ U   U₂₂ U ) ,   (9.25)
Exercise 9.9. Use Eq. (9.23) to show that Eq. (9.24) does indeed define a representation.
Definition 9.5. Two representations R : G → Gl(V ) and R̃ : G → Gl(V ) are called equivalent if there is
an invertible linear map P : V → V such that R̃(g) = P R(g)P −1 for all g ∈ G.
In other words, if the representation matrices of two representations only differ by a common basis trans-
formation they are really the same representation and we call them equivalent. A further simple piece of
terminology is the following.
Definition 9.6. A representation R : G → Gl(V ) on an inner product vector space V is called unitary iff
all R(g) are unitary with respect to the inner product on V .
More practically, if V = Rn or V = Cn with their respective standard scalar products, the unitary
representations are precisely those with all representation matrices orthogonal or unitary matrices. For
example, the U (1) representations (9.20) and (9.22) are unitary.
In Eq. (9.21) we have seen that larger representations can be built up from smaller ones by forming
a direct sum. Conversely we can ask whether a given representation can be split up into a direct sum of
smaller representations. In this way, we can attempt to break up representations into their smallest building blocks which cannot be decomposed further in this way and which are called irreducible representations. For example, all fundamental representations of (S)O(n) and (S)U(n) are irreducible.
Exercise 9.10. Show that the fundamental representations of SU (2) and SO(3) are irreducible.
What does it mean in practice for a representation to be reducible? Suppose we have a non-trivial sub-
space W ⊂ V invariant under the representation R and we choose a basis for W which we complete to a
basis for V . Relative to this basis, the representation matrices have the form
A(g) B(g) W
R(g) = , (9.26)
0 D(g) rest
where A(g), B(g) and D(g) are matrices. The zero in the lower left corner is forced upon us to ensure a
vector in W is mapped to a vector in W as required by invariance. The form (9.26) is not quite a direct
sum due to the presence of the matrix B(g). To deal with this we define the following.
Definition 9.8. A reducible representation is called fully reducible if it is a direct sum of irreducible
representations.
In practice, this means we can achieve a form for which all matrices B(g) in Eq. (9.26) vanish. Not all
reducible representations are fully reducible but we have
Theorem 9.11. All reducible representations of finite groups and all reducible unitary representations are
fully reducible.
A common problem is having to find the irreducible representations contained in a given (fully) re-
ducible representation. If this representation is given as a direct sum, R ⊕ R̃, with R and R̃ irreducible,
then this task is trivial - the irreducible pieces are simply R and R̃. The tensor product R ⊗ R̃ is more
interesting. It is not obviously block-diagonal but, as it turns out, it is usually reducible. This means
there is a decomposition, also called Clebsch-Gordan decomposition,
R ⊗ R̃ ≅ R₁ ⊕ · · · ⊕ R_k   (9.27)
of a tensor product into irreducible representations Ri . We will later study this decomposition more
explicitly for the case of SU (2).
Since the interest in physics is usually in orthogonal or unitary matrices Theorem 9.11 covers the
cases we are interested in and it suggests a classification problem. For a given group G we should find all
irreducible representations - the basic building blocks from which all other representations (subject to the
conditions of the above theorem) can be obtained by forming direct sums. A very helpful statement for
this purpose (which is useful in other contexts, for example in quantum mechanics, as well) is the famous
Schur’s Lemma.
Lemma 9.1. (Schur’s Lemma) Let R : G → Gl(V ) be an irreducible representation of the group G over
a complex vector space V and P : V → V a linear map with [P, R(g)] = 0 for all g ∈ G. Then P = λ idV ,
for a complex number λ.
Proof. Since we are working over the complex numbers the characteristic polynomial for P has at least one
zero, λ, and the associated eigenspace EigP (λ) is non-trivial. For any v ∈ EigP (λ), using [P, R(g)] = 0,
we have
P R(g)v = R(g)P v = λR(g)v . (9.28)
Hence, R(g)v is also an eigenvector with eigenvalue λ and we conclude that the eigenspace EigP (λ) is
invariant under R. However, R is irreducible by assumption which means there are no non-trivial invariant
sub-spaces. Since EigP (λ) 6= {0} the only way out is that EigP (λ) = V . This implies that P = λ idV .
Schur’s lemma says in essence that a matrix commuting with all (irreducible) representation matrices of
a group must be a multiple of the unit matrix. This can be quite a powerful statement. However, note
the condition that the representation is over a complex vector space - the theorem fails in the real case.
A counterexample is provided by the fundamental representation of SO(2), given by the matrices (9.16).
Seen as a representation over R2 it is irreducible but all representation matrices of SO(2) commute with
one another.
An immediate conclusion from Schur’s Lemma is the following.
Corollary 9.1. All complex irreducible representations of an Abelian group are one-dimensional.
Proof. For an Abelian group G we have g ◦ g̃ = g̃ ◦ g for all g, g̃ ∈ G which implies [R(g), R(g̃)] = 0
for any representation R. The linear map P = R(g) then satisfies all the conditions of Schur’s Lemma
and we conclude that R(g) = λ(g) idV . However, this form is only consistent with R being irreducible if
dim(R) = 1.
This statement is the key to finding all complex, irreducible representations of Abelian groups and we
discuss this for the two most important Abelian examples, Zn and U (1).
Representations of Zₙ: By the Corollary, a complex, irreducible representation R of Zₙ is one-dimensional, so it is fixed by the single number ζ := R(1) ∈ C*. Since adding 1 to itself n times gives the neutral element 0, we have

1 = R(0) = R(1 + 1 + · · · + 1) = R(1)ⁿ = ζⁿ ,   (9.29)
               (n times)

so ζ must be an nth root of unity, ζ = e^{2πiq/n}, where q = 0, . . . , n − 1. Note that the choice of R(1) = ζ determines the entire representation since R(k) = R(1)^k = ζ^k. This means we have precisely n complex, irreducible representations of Zₙ given by

R^{(q)}(k) = e^{2πiqk/n} ,   q = 0, . . . , n − 1 .
Exercise 9.12. Show that Eq. (9.20), where q ∈ Z, provides a complete list of all complex, irreducible U(1) representations. (Hint: Start by considering representation matrices for group elements e^{iθ} where θ is rational and then use continuity.)
In a physics context, the integer q which labels the above Zn and U (1) representations is also called
the charge. As you will learn later in the physics course, fixing the electrical charge of a particle (such
as the electron charge), mathematically amounts to choosing a U (1) representation for this particle.
As we have seen above, complex, irreducible representations of Abelian groups are quite straight-
forward to classify, essentially because the representation “matrices” are really just numbers. The
somewhat “brute-force” approach taken above becomes impractical if not infeasible for the non-
Abelian case. Just imagine having to write down, for a two-dimensional representation, an arbitrary
2 × 2 matrix as an Ansatz for each representation matrix and then having to fix the unknown entries
by imposing the group multiplication table on the matrices. Clearly, we need more sophisticated
methods to deal with the non-Abelian case. There is a beautiful set of methods for finite non-Abelian
groups, using characters but discussing this in detail is beyond the scope of this lecture. If you are
interested, have a look at Ref. [13]. Instead, we focus on Lie groups, which we now define and analyse.
Definition 9.9. A group G is a Lie group if it is a differentiable manifold and the left-multiplication with
group elements and the inversion of group elements are differentiable maps.
It is difficult to exploit this definition without talking in some detail about differentiable manifolds, which
is well beyond the scope of this lecture. Fortunately, for our purposes we can adopt a somewhat more
practical definition of a Lie group which follows from the more abstract one above.
A matrix Lie group G is a group given (at least around a neighbourhood of the group identity 1), by
a family g = g(t) of n × n matrices, which depend (in an infinitely differentiable way) on real parameters
t = (t¹, . . . , t^k) and such that the matrices ∂g/∂t¹, . . . , ∂g/∂t^k are linearly independent. (This last requirement
is really the same as the maximal rank condition in Theorem B.1.) Note that the matrices g(t) act on
the vector space V = Rn or V = Cn . For convenience, we assume that g(0) = 1n is the group identity.
The number, k, of parameters required is called the dimension of the Lie group. (Do not confuse this
dimension of the matrix Lie group with the dimension n of the vector space V on which these matrices
act.)
Obvious examples of Lie groups are U (1) in Eq. (9.9) and SO(2) in Eq. (9.16), both of which are
parametrised by one angle (playing the role of the single parameter t1 ) and are, hence, one-dimensional.
A more interesting example is provided by SU(2). We can solve the constraint on α and β in Eq. (9.13) by setting β = −t₂ + it₁ and α = √(1 − t₁² − t₂²) e^{it₃}. This leads to the explicit parametrisation

U = ( √(1 − t₁² − t₂²) e^{it₃}          −t₂ + it₁           )
    (       t₂ + it₁          √(1 − t₁² − t₂²) e^{−it₃}     ) ,   (9.31)
which shows that SU (2) is a three-dimensional Lie group. A similar argument can be made for SO(3)
which can be parametrised in terms of three angles and is, hence, a three-dimensional Lie group as well.
In fact, all orthogonal and unitary groups (as well as their special sub-groups) can be parametrised in this
way and are Lie groups. However, writing down these parametrisations explicitly becomes impractical in
higher dimensions.
Definition 9.10. A Lie algebra L is a vector space with a commutator bracket [·, ·] : L × L → L which is
anti-symmetric, so [T, S] = −[S, T ] and satisfies the Jacobi identity [T, [S, U ]] + [S, [U, T ]] + [U, [T, S]] = 0
for all T, S, U ∈ L.
Let us see how we can associate a Lie algebra in this sense to our group G. We start with the generators, Tᵢ, of the group defined by⁹

Tᵢ := −i ∂g/∂tⁱ (0) .   (9.32)
In terms of the generators we can think of group elements near the identity as given by the Taylor series
g(t) = 1 + i Σ_{i=1}^k tⁱ Tᵢ + O(t²) .   (9.33)
The Lie algebra L(G) is the vector space of matrices spanned (over R) by the generators, that is,

L(G) := span_R { T₁, . . . , T_k } .   (9.34)
By definition of a matrix Lie group the generators must be linearly independent, so the dimension of L(G)
as a vector space is the same as the dimension of the underlying group as a manifold. Now we understand
how to obtain the generators from the Lie group. Is there a way to reverse this process and obtain the
Lie group from its generators? Amazingly, the answer is “yes” due to the following theorem.
Theorem 9.13. Let G be a (matrix) Lie group and L(G) as defined in Eq. (9.34). Then the matrix
exponential exp(i ·) provides a map exp(i ·) : L(G) → G whose image is the part of G which is (path)-
connected to the identity.
Proof. See, for example, Ref. [13].
This gives all SU(2) matrices, as comparison with Eq. (9.13) shows. We can also recover the group SO(3) by forming the matrix exponential exp(i tᵢ T̃ᵢ) with the matrices T̃ᵢ in Eq. (9.51). (Note that O(3), which has the same Lie algebra as SO(3), cannot
be fully recovered by the matrix exponential since the orthogonal matrices with determinant −1 are
not path-connected to the identity.)
We know that L(G) is a vector space and we can define a bracket by the simple matrix commutator
[T, S] := T S − ST . Clearly, this bracket is anti-symmetric and a simple calculation shows that it satisfies
the Jacobi identity.
⁹ The factor of −i is included so that the generators become hermitian (rather than anti-hermitian).
Exercise 9.14. Show that the matrix commutator bracket [T, S] = T S − ST on L(G) satisfies the Jacobi
identity.
All that remains to be shown for L(G) to be a Lie-algebra in the sense of Def. 9.10 is that it is closed
under the bracket, that is, if T, S ∈ L(G) then [T, S] ∈ L(G). This follows from the closure of G under
multiplication. Start with two group elements g(t) and g(s), expand each to second order using the matrix exponential and consider the combination

g(t)⁻¹ g(s)⁻¹ g(t) g(s) = 1 − Σ_{i,j=1}^k tⁱ s^j [Tᵢ, Tⱼ] + · · · .   (9.37)

Since the LHS of this equation must be a group element we conclude that the commutator [Tᵢ, Tⱼ] is indeed an element of L(G). In conclusion, L(G), the vector space spanned by the generators, together with
the matrix commutator, forms a Lie algebra in the sense of Def. 9.10.
Since [Tᵢ, Tⱼ] ∈ L(G) and the generators form a basis of L(G) it is clear that there must be constants f_ij^k, also called structure constants of the Lie algebra, such that

[Tᵢ, Tⱼ] = f_ij^k T_k .   (9.38)
Eq. (9.33) shows why we should think of the Lie algebra L(G) as encoding “infinitesimal” group trans-
formations. Consider a vector v ∈ V which transforms under a group element g(t) as v → g(t)v. Then
inserting the expansion (9.33), it follows that
v → g(t)v = ( 1ₙ + i Σ_{i=1}^k tⁱ Tᵢ + O(t²) ) v   ⇒   δv := g(t)v − v = i Σ_{i=1}^k tⁱ Tᵢ v + O(t²) .   (9.39)
This means the Lie algebra L(SO(2)) is one-dimensional and consists of (i times) the 2 × 2 anti-symmetric
matrices.
Lie algebra of SU (2) This is a more interesting example, since SU (2) depends on three parameters ti ,
where i = 1, 2, 3 and taking the derivatives of the group elements in Eq. (9.31) we find for the generators
Tᵢ = −i ∂g/∂tⁱ (0) = σᵢ ,   (9.42)
where σᵢ are the Pauli matrices. Recall that the Pauli matrices satisfy the useful relation

σᵢ σⱼ = δ_ij 1₂ + i ε_ijk σ_k .   (9.43)
It is customary to take as the standard generators for L(SU (2)) the matrices
τᵢ := σᵢ / 2 .   (9.44)
Hence, the Lie-algebra of SU(2) is three-dimensional and given by

L(SU(2)) = span_R { τ₁, τ₂, τ₃ } .   (9.45)
Given the commutation relations [σᵢ, σⱼ] = 2i ε_ijk σ_k (which follow from Eq. (9.43)) for the Pauli matrices we have

[τᵢ, τⱼ] = i ε_ijk τ_k ,   (9.46)

for the standard generators τᵢ so that the structure constants are f_ij^k = i ε_ijk.
Exercise 9.15. Verify the relation (9.43) for the Pauli matrices. (Hint: Show that the Pauli matrices square to the unit matrix and that the product of two different Pauli matrices is ±i times the third.) Show from Eq. (9.43) that [σᵢ, σⱼ] = 2i ε_ijk σ_k and that tr(σᵢ σⱼ) = 2δ_ij.
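These relations can also be verified numerically; a minimal sketch assuming numpy is available:

```python
import numpy as np

s1 = np.array([[0, 1], [1, 0]], dtype=complex)
s2 = np.array([[0, -1j], [1j, 0]])
s3 = np.array([[1, 0], [0, -1]], dtype=complex)
sigma = [s1, s2, s3]

def eps(i, j, k):
    # Levi-Civita symbol for indices 0, 1, 2
    return (i - j) * (j - k) * (k - i) / 2

id2 = np.eye(2)
ok = True
for i in range(3):
    for j in range(3):
        rhs = (i == j) * id2 + 1j * sum(eps(i, j, k) * sigma[k] for k in range(3))
        ok &= np.allclose(sigma[i] @ sigma[j], rhs)              # relation (9.43)
        comm = sigma[i] @ sigma[j] - sigma[j] @ sigma[i]
        ok &= np.allclose(comm, 2j * sum(eps(i, j, k) * sigma[k] for k in range(3)))
print(ok)   # True
```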
The method for computing the Lie algebra illustrated above becomes impractical for higher-dimensional
groups since it requires an explicit parametrisation. However, there is a much more straightforward way
to proceed.
Lie algebras of unitary and special unitary groups: To work out the Lie algebra of unitary groups
U (n) in general start with the Ansatz U = 1n +iT +· · · and insert this into the defining relation U † U = 1n ,
keeping only terms up to linear order in T , to work out the resulting constraint on the generators. Doing
this in the present case results in T = T†, so the generators must be hermitian and the Lie algebra is

L(U(n)) = { T | T† = T } .

For the case of SU(n) we have to add the condition det(U) = 1 which leads to det(1ₙ + iT + · · · ) = 1 + i tr(T) + · · · = 1, so that tr(T) = 0. This means

L(SU(n)) = { T | T† = T and tr(T) = 0 } .
We can now choose a basis and compute structure constants for these Lie algebras but we will not do this
explicitly, other than for the case of SU (2) which we have already covered.
Lie algebras of orthogonal and special orthogonal groups: This works in analogy with the unitary
case. Inserting the Ansatz A = 1ₙ + iT + · · · into the defining relation AᵀA = 1ₙ for O(n) (where T has to be purely imaginary, so that A is real) leads to T = −Tᵀ, so that T must be anti-symmetric. Since anti-symmetric matrices are already traceless the additional condition det(A) = 1 for SO(n) does not add anything new and we have

L(SO(n)) = L(O(n)) = { T | T purely imaginary and Tᵀ = −T } .

In particular, a basis of L(SO(3)) is provided by the generators

(T̃ᵢ)_{jk} = −i ε_ijk ,   (9.51)

which are more explicitly given by
T̃₁ = i ( 0 0  0 )      T̃₂ = i (  0 0 1 )      T̃₃ = i ( 0 −1 0 )
       ( 0 0 −1 ) ,            (  0 0 0 ) ,            ( 1  0 0 ) .   (9.52)
       ( 0 1  0 )              ( −1 0 0 )              ( 0  0 0 )
We have seen above that the generators of unitary (or orthogonal) groups are hermitian. Conversely, we
can show that the exponential map acting on hermitian generators leads to unitary group elements. To
do this first note that for two matrices X, Y with [X, Y ] = 0 we have exp(X) exp(Y ) = exp(X + Y ). Also
note that (exp(X))† = exp(X†). It follows for a hermitian matrix T that

(exp(iT))† exp(iT) = exp(−iT†) exp(iT) = exp(−iT) exp(iT) = exp(−iT + iT) = 1 ,

so that exp(iT) is indeed unitary.
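A minimal numerical sketch of this statement, assuming numpy and scipy are available; the parameter values chosen are arbitrary.

```python
import numpy as np
from scipy.linalg import expm

# Exponentiating i times a hermitian generator gives a unitary matrix (here for SU(2)),
# and exponentiating the SO(3) generator T3 of Eq. (9.52) gives a rotation.
s1 = np.array([[0, 1], [1, 0]], dtype=complex)
s2 = np.array([[0, -1j], [1j, 0]])
s3 = np.array([[1, 0], [0, -1]], dtype=complex)
tau = [0.5 * s for s in (s1, s2, s3)]

t = np.array([0.3, -1.2, 0.7])
U = expm(1j * sum(ti * taui for ti, taui in zip(t, tau)))
print(np.allclose(U.conj().T @ U, np.eye(2)), np.isclose(np.linalg.det(U), 1))  # True True

T3 = 1j * np.array([[0, -1, 0], [1, 0, 0], [0, 0, 0]])
theta = 0.4
R = expm(1j * theta * T3).real
print(np.allclose(R.T @ R, np.eye(3)))        # an SO(2)-type rotation inside SO(3)
```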
Exercise 9.17. Show that (exp(X))† = exp(X † ) for a square matrix X. Also show that for two square
matrices X, Y with [X, Y ] = 0 we have exp(X) exp(Y ) = exp(X + Y ). (Hint: Use the series which defines
the exponential function.)
Exercise 9.18. Prove equation (9.35). (Hint: Use Eq. (9.43).) Further, show that the RHS of Eq. (9.35) provides all SU(2) matrices.
Definition 9.11. A representation r of a Lie algebra L is a linear map r : L → Hom(V, V ) which preserves
the bracket, that is, r([T, S]) = [r(T ), r(S)].
The notions of reducible, irreducible and fully reducible representations we have introduced for group
representations directly transfer to representations of the Lie algebra. Note that the space Hom(V, V ) is,
in practice, the space of n × n real matrices for V = Rn or the space of n × n complex matrices for V = Cn .
So a representation of a Lie algebra amounts to assigning (linearly) to each Lie algebra element T a matrix
r(T ) such that the matrices commute “in the same way” as the Lie algebra elements. A practical way
of stating what a Lie algebra representation is proceeds in terms of a basis Tᵢ of the Lie algebra. Suppose the Tᵢ
commute as
[Ti , Tj ] = fij k Tk . (9.55)
Then the assignment Tᵢ → r(Tᵢ) defines a Lie algebra representation provided the matrices r(Tᵢ) commute with the same structure constants, that is,

[r(Tᵢ), r(Tⱼ)] = f_ij^k r(T_k) .   (9.56)
As an example for how this works in practice, consider the three matrices τi in Eq. (9.44) which form a
basis of L(SU(2)). Any assignment τᵢ → Tᵢ to matrices Tᵢ which commute with the same structure constants as the τᵢ, that is [Tᵢ, Tⱼ] = i ε_ijk T_k, defines a Lie algebra representation of L(SU(2)). In fact, we have
already seen an example of such matrices, namely the matrices T̃i in Eq. (9.51) which form a basis of the
Lie-algebra L(SO(3)). Hence, we see that the Lie algebra of SO(3) is a representation of the Lie-algebra
of SU (2) - a clear indication that those two groups are closely related.
The idea for how we want to proceed discussing representations is summarised in the following diagram.
            R
   G    ————→    Gl(V)
 exp ↑             ↑ exp        (9.57)
            r
 L(G)   ————→    Hom(V, V)
Instead of studying representations R of the Lie group G we will be studying representations r of its
Lie-algebra L(G). When required, we can use the exponential map to reconstruct the group G and its
representation matrices in Gl(V ). In other words, suppose we have a Lie-algebra element T ∈ L(G) and
the associated group element g = exp(iT ) ∈ G. Then the relation between a representation r of the Lie
algebra and the corresponding representation R at the group level is summarised by

R(g) = R(exp(iT)) = exp(i r(T)) .   (9.58)
We will now explicitly discuss all this for the group SU (2) and its close relative SO(3).
For a vector t = (t₁, t₂, t₃) ∈ R³ we define the hermitian, traceless 2 × 2 matrix

ϕ(t) := tᵢ σᵢ ,   (9.59)
so that ϕ identifies R3 with L(SU (2)). Note that the dot product between two vectors t, s ∈ R3 can then
be written as
t · s = ½ tr( ϕ(t) ϕ(s) ) ,   (9.60)
as a result of the identity tr(σi σj ) = 2 δij which follows from Eq. (9.43).
Using ϕ we can define a map R : SU(2) → Gl(R³) by

R(U) t := ϕ⁻¹( U ϕ(t) U† ) .   (9.61)

Note that this makes sense. The matrix U ϕ(t) U† is hermitian traceless and, hence, an element of
L(SU (2)). Therefore, we can associate to it, via the inverse map ϕ−1 , a vector in R3 . In order to
study the representation R further, we analyse its effect on the dot product.
(R(U)t) · (R(U)s) = ½ tr( ϕ(R(U)t) ϕ(R(U)s) ) = ½ tr( U ϕ(t) U† U ϕ(s) U† ) = ½ tr( ϕ(t) ϕ(s) ) = t · s .   (9.62)
This shows that R(U ) leaves the dot product invariant and, therefore, R(U ) ∈ O(3). A connectedness
argument shows that, in fact, R(U) ∈ SO(3)¹⁰. The representation R is not faithful since, from Eq. (9.61),
U and −U are mapped to the same rotation. In fact, R is precisely a two-to-one map (two elements of
SU (2) are mapped to one element of SO(3)) as can be shown in the following exercise.
Exercise 9.20. Show that the kernel of the representation (9.61) is given by Ker(R) = {±12 }. (Hint:
Use Schur’s Lemma.)
It can also be verified by explicit calculation that Im(R) = SO(3). In conclusion, we have seen that the
groups SU (2) and SO(3) are very closely related - the former is what is called a double-cover of the latter.
We have a two-to-one representation map R : SU (2) → SO(3) which allows us to recover SO(3) from
SU (2), but not the other way around since R is not invertible. From this point of view the group SU (2)
is the more basic object. In particular, all representations of SO(3) are also representations of SU (2)
(just combine the SO(3) representation with R) but not the other way around. It is already clear that
the group SO(3) plays an important role in physics - just think of rotationally symmetric problems in
classical mechanics, for example. The above result strongly suggests that SU(2) is also an important group
for physics and, in quantum mechanics, this turns out to be the case.
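The map R can also be evaluated numerically; a minimal sketch assuming numpy and scipy, with R(U) written out in components using the identity tr(σᵢσⱼ) = 2δᵢⱼ.

```python
import numpy as np
from scipy.linalg import expm

s = [np.array([[0, 1], [1, 0]], dtype=complex),
     np.array([[0, -1j], [1j, 0]]),
     np.array([[1, 0], [0, -1]], dtype=complex)]

def R(U):
    # R(U)_{ij} = (1/2) tr(sigma_i U sigma_j U^dagger), Eq. (9.61) in components
    return np.array([[0.5 * np.trace(s[i] @ U @ s[j] @ U.conj().T).real
                      for j in range(3)] for i in range(3)])

U = expm(1j * 0.7 * s[0] / 2)          # an arbitrary SU(2) element
print(np.allclose(R(U).T @ R(U), np.eye(3)), np.isclose(np.linalg.det(R(U)), 1))
print(np.allclose(R(U), R(-U)))        # U and -U give the same rotation
```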
Exercise 9.21. Derive Eqs. (9.65), (9.67) and (9.68), starting with the commutation relations (9.63) and
the definition (9.64).
From Schur’s Lemma this means that J² must be a multiple of the unit matrix and we write¹²

J² = j(j + 1) 1 ,

for a real number j. The number j characterises the representation we are considering, which we write
as rj with associated representation vector space Vj . The idea is to construct this vector space by thinking
about the eigenvalues and eigenvectors of J₃ which we denote by m and |jm⟩, respectively, so that

J₃ |jm⟩ = m |jm⟩ .
What can we say about these eigenvectors and eigenvalues? Since J₃ is hermitian we can choose the eigenvectors as an ortho-normal basis, so

⟨jm | jm′⟩ = δ_{mm′} .

It is also clear that 0 ≤ ⟨jm| J² |jm⟩ = j(j + 1) so we can take j ≥ 0. Further, the short calculation

J₃ J± |jm⟩ = ( J± J₃ ± J± ) |jm⟩ = (m ± 1) J± |jm⟩

shows that J± |jm⟩ is also an eigenvector of J₃ but with eigenvalue m ± 1. Hence, J± act as “ladder
operators” which increase or decrease the eigenvalue by one. If this is carried out repeatedly the process
must break down at some point since we are looking for finite-dimensional representations. The breakdown
point can be found by considering 0 ≤ ∥ J± |jm⟩ ∥² = ⟨jm| J∓ J± |jm⟩ = j(j + 1) − m(m ± 1), which implies

−j ≤ m ≤ j ,    J± |jm⟩ = 0 ⇔ m = ±j .   (9.75)
This means we have succeeded in bounding the eigenvalues m and in finding a breakdown condition for
the ladder operators. As we apply J₊ successively we can get as high as |jj⟩ and then J₊|jj⟩ = 0. Starting with |jj⟩ and using J₋ to go in the opposite direction we obtain

|jj⟩ ,  J₋|jj⟩ ∼ |j j−1⟩ ,  J₋²|jj⟩ ∼ |j j−2⟩ ,  · · · ,  J₋^{2j}|jj⟩ ∼ |j −j⟩ ,  J₋^{2j+1}|jj⟩ = 0 ,   (9.76)
where the breakdown must arise at m = −j. In the above sequence, we have gone from m = j to m = −j
in integer steps. This is only possible if j is integer or half-integer. The result of this somewhat lengthy
argument can be summarised in
Theorem 9.22. The irreducible representations rj of L(SU (2)) are labelled by an integer or half integer
number j ∈ Z/2 and the corresponding representation vector spaces Vⱼ are spanned by

|jm⟩ ,   where m = −j, −j + 1, . . . , j ,   (9.77)

so that dim(rⱼ) = 2j + 1. The generators J±, J₃ in Eq. (9.64) act on the states |jm⟩ as

J± |jm⟩ = √( j(j + 1) − m(m ± 1) ) |j m±1⟩ ,    J₃ |jm⟩ = m |jm⟩ .   (9.78)
¹² The reason for writing the constant as j(j + 1) will become clear shortly.
Proof. The only part we haven’t shown yet is the factor in the first Eq. (9.78). To do this we write
J± |jm⟩ = N±(j, m) |j m±1⟩ with some constants N±(j, m) to be determined. By multiplying with ⟨j m±1| we get

N±(j, m) = ⟨j m±1| J± |jm⟩ = ( 1/N±(j, m)* ) ⟨jm| J∓ J± |jm⟩ ,   (9.79)
and, using Eqs. (9.73) and (9.74), this implies |N± (j, m)|2 = j(j + 1) − m(m ± 1). Up to a possible phase
this is the result we need. It can be shown that this phase can be consistently set to one and this completes
the proof.
The representation rj is also called the spin j representation and j is referred to as the total spin of the
representation. The label m for the basis vectors |jm⟩ is also called the z-component of the spin. These
representations play an important role in quantum mechanics where they describe states with well-defined
total spin and well-defined z-component of spin.
The results (9.78) can be used to explicitly compute the representation matrices T±^{(j)} and T₃^{(j)} whose entries are given by

(T₊^{(j)})_{m̃m} := ⟨j m̃| J₊ |jm⟩ = √( j(j + 1) − m(m + 1) ) δ_{m, m̃−1}
(T₋^{(j)})_{m̃m} := ⟨j m̃| J₋ |jm⟩ = √( j(j + 1) − m(m − 1) ) δ_{m, m̃+1}      (9.80)
(T₃^{(j)})_{m̃m} := ⟨j m̃| J₃ |jm⟩ = m δ_{m̃m}
Let us use these equations to compute these matrices explicitly for the lowest-dimensional representations.
representation j = 1/2: For j = 1/2 the representation space is V_{1/2} = C² with m = ±1/2, and Eqs. (9.80) give T₊^{(1/2)} = τ₁ + iτ₂, T₋^{(1/2)} = τ₁ − iτ₂ and T₃^{(1/2)} = τ₃. Here, τᵢ are the standard generators of the SU(2) Lie-algebra, defined in Eq. (9.44), which confirms that we are indeed dealing with the fundamental representation.
representation j = 1: The representation r1 is three-dimensional with representation vector space V1 =
C3 (or R3 since the representation matrices turn out to be real) with allowed values m = −1, 0, 1. There
is only one irreducible three-dimensional representation so this must coincide with the three-dimensional
representation of SU (2) provided by SO(3) which we have discussed earlier. Again, we can verify this
explicitly by inserting j = 1 and m̃, m = 1, 0, −1 into Eqs. (9.80), leading to

T₊^{(1)} = √2 ( 0 1 0 )      T₋^{(1)} = √2 ( 0 0 0 )      T₃^{(1)} = ( 1 0  0 )
              ( 0 0 1 ) ,                  ( 1 0 0 ) ,               ( 0 0  0 ) .   (9.83)
              ( 0 0 0 )                    ( 0 1 0 )                 ( 0 0 −1 )
Those matrices look somewhat different from the matrices (9.52) for the Lie-algebra of SO(3) but the two
sets of matrices are, in fact, related by a common basis transformation.
Exercise 9.23. Show that the matrices T±^{(1)} and T₃^{(1)} in Eq. (9.83) satisfy the correct L(SU(2)) commutation relations. Also show that they are related by a common basis transformation to the matrices T̃± = T̃₁ ± iT̃₂ and T̃₃, with T̃ᵢ given in Eq. (9.52).
Exercise 9.24. Find the representation matrices T±^{(3/2)} and T₃^{(3/2)} for the j = 3/2 representation and show that they satisfy the correct L(SU(2)) commutation relations.
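For general j, the matrices (9.80) can be generated numerically; a minimal sketch assuming numpy is available:

```python
import numpy as np

def spin_matrices(j):
    """Representation matrices T3, T+, T- of Eq. (9.80) for total spin j."""
    m = np.arange(j, -j - 1, -1)                  # m = j, j-1, ..., -j
    dim = len(m)
    T3 = np.diag(m).astype(complex)
    Tp = np.zeros((dim, dim), dtype=complex)
    for a in range(1, dim):
        mm = m[a]
        # <j, m+1 | J+ | j, m> = sqrt(j(j+1) - m(m+1))
        Tp[a - 1, a] = np.sqrt(j * (j + 1) - mm * (mm + 1))
    Tm = Tp.conj().T
    return T3, Tp, Tm

T3, Tp, Tm = spin_matrices(1)
T1, T2 = (Tp + Tm) / 2, (Tp - Tm) / (2 * 1j)
print(np.allclose(T1 @ T2 - T2 @ T1, 1j * T3))    # su(2) relation [T1, T2] = i T3
print(np.round(Tp, 3))                            # compare with Eq. (9.83)
```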
Another important representation of SO(3) acts on functions f : R³ → C and is defined by

(R(A)f)(x) := f(A⁻¹ x) .   (9.84)

It is straightforward to check that R is a representation of SO(3), that is, it satisfies R(AB) = R(A)R(B),
but note that including the inverse of A in the definition is crucial for this to work out.
Exercise 9.25. Show that the SO(3) representation R in Eq. (9.84) satisfies R(AB) = R(A)R(B).
To consider the associated Lie algebra we evaluate this for “small” rotations A = 1₃ + i tⱼ T̃ⱼ + · · · , with the matrices T̃ⱼ defined in Eq. (9.52). Inserting this into Eq. (9.84) and performing a Taylor expansion leads, after a short calculation, to

(R(A)f)(x) = f(x) + i tⱼ (L̂ⱼ f)(x) + · · ·    with    L̂ⱼ := −i ε_jkl x_k ∂/∂x_l .   (9.85)

The operators L̂ⱼ span a representation of the Lie algebra of SO(3) and, hence, must satisfy

[L̂ᵢ, L̂ⱼ] = i ε_ijk L̂_k .   (9.86)
This can indeed be verified by direct calculation, using the definition of L̂ in Eq. (9.85).
Exercise 9.26. Verify the commutation relations (9.86) by explicit calculation, using the definition (9.85) of L̂.
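A symbolic check along the lines of this exercise, assuming sympy is available; the operators are written in Cartesian coordinates as L̂ᵢ = −i ε_ijk x_j ∂/∂x_k.

```python
import sympy as sp

x, y, z = sp.symbols('x y z')
X = (x, y, z)
f = sp.Function('f')(*X)

def eps(i, j, k):
    # Levi-Civita symbol for indices 0, 1, 2
    return sp.Rational((i - j) * (j - k) * (k - i), 2)

def L(i, g):
    # angular momentum operator L_i = -i eps_{ijk} x_j d/dx_k applied to g
    return -sp.I * sum(eps(i, j, k) * X[j] * sp.diff(g, X[k])
                       for j in range(3) for k in range(3))

comm = sp.expand(L(0, L(1, f)) - L(1, L(0, f)))    # [L1, L2] f
print(sp.simplify(comm - sp.I * L(2, f)) == 0)      # True, i.e. [L1, L2] = i L3
```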
We have seen that the operators L̂i generate small rotations on functions. In quantum mechanics, the
L̂ᵢ are the angular momentum operators, obtained from the classical angular momentum L = x × p by carrying out the replacement pᵢ → −i ∂/∂xᵢ. In conclusion, we see that there is a close connection between
angular momentum and the rotation group.
Since the L̂i form a representation of the Lie algebra of SO(3) it is natural to ask which irreducible
representations it contains. To do this it is useful to re-write the L̂i in terms of spherical coordinates
x = r(sin θ cos ϕ, sin θ sin ϕ, cos θ), which leads to
L̂₁ = i ( sin ϕ ∂/∂θ + cot θ cos ϕ ∂/∂ϕ ) ,    L̂₂ = i ( −cos ϕ ∂/∂θ + cot θ sin ϕ ∂/∂ϕ ) ,    L̂₃ = −i ∂/∂ϕ .   (9.87)
The Casimir operator L̂² is given by

L̂² = −Δ_{S²} ,   (9.88)

where Δ_{S²} is the Laplacian on the two-sphere, defined in Eq. (6.20).
Exercise 9.27. Derive the expressions (9.87) and (9.88) for angular momentum in polar coordinates.
It is now easy to verify, using Eqs. (6.85) and (6.82), that the spherical harmonics satisfy

L̂² Y_lm = l(l + 1) Y_lm ,    L̂₃ Y_lm = m Y_lm .   (9.89)
This means that the vector space spanned by the Ylm for m = −l, . . . , l forms the representation space
Vl of the representation rl with total spin (or angular momentum) l. Note that this only leads to the
representation rj with integer j but not the ones with half integer j. This is directly related to the fact
that we have started with SO(3), rather than SU (2).
The fact that the Y_lm span a spin l representation means that

R(A) Y_lm(n) = Y_lm(A⁻¹ n) = Σ_{m′=−l,...,l} R^{(l)}_{mm′}(A) Y_{lm′}(n) ,   (9.90)
where R(l) (A) are the spin l representation matrices (and n is a unit vector which parametrises S 2 ). An
immediate conclusion from this formula is that the function
F(n′, n) := Σ_{m=−l,...,l} Y_lm(n′)* Y_lm(n)   (9.91)
is invariant under rotations, that is, F(An′, An) = F(n′, n) for a rotation A. This fact was used in the proof of Lemma 6.3.
Consider the representation matrices J_i^{(1)} := r_{j₁}(τᵢ) and J_i^{(2)} := r_{j₂}(τᵢ) for the two representations r_{j₁} and r_{j₂}. If r = r_{j₁} ⊗ r_{j₂} is the tensor representation we would like to understand how its representation matrices Jᵢ := r(τᵢ) relate to J_i^{(1)} and J_i^{(2)} above. To this end, we write
the corresponding infinitesimal group transformations

R_{j₁}(t) = 1 + i tᵢ J_i^{(1)} + · · · ,    R_{j₂}(t) = 1 + i tᵢ J_i^{(2)} + · · · ,   (9.93)
and recall, from Eq. (9.24), the definition of the tensor product in terms of the Kronecker product. This means

R(t) = R_{j₁}(t) × R_{j₂}(t) = 1 + i tᵢ ( J_i^{(1)} × 1 + 1 × J_i^{(2)} ) + · · · ,   (9.94)

so that

Jᵢ = J_i^{(1)} × 1 + 1 × J_i^{(2)} .   (9.95)
Exercise 9.28. Show that, if J_i^{(1)} and J_i^{(2)} each satisfy the commutation relations (9.63), then so does Jᵢ, defined in Eq. (9.95). (Hint: Use the property (9.23) of the Kronecker product.)
In summary, we now have the following representations, representation vector spaces and representation
matrices:
representation           dimension          spanned by                            range for m                              representation matrices
r_{j₁}                   2j₁ + 1            |j₁ m₁⟩                               m₁ = −j₁, . . . , j₁                     J_i^{(1)} = r_{j₁}(τᵢ)
r_{j₂}                   2j₂ + 1            |j₂ m₂⟩                               m₂ = −j₂, . . . , j₂                     J_i^{(2)} = r_{j₂}(τᵢ)
r = r_{j₁} ⊗ r_{j₂}      (2j₁+1)(2j₂+1)     |j₁ j₂ m₁ m₂⟩ = |j₁ m₁⟩ ⊗ |j₂ m₂⟩      m₁ = −j₁, . . . , j₁ ; m₂ = −j₂, . . . , j₂    Jᵢ = J_i^{(1)} × 1 + 1 × J_i^{(2)}
We note that

J₃ |j₁ j₂ m₁ m₂⟩ = ( J₃^{(1)} × 1 + 1 × J₃^{(2)} ) |j₁ m₁⟩ ⊗ |j₂ m₂⟩ = ( J₃^{(1)} |j₁ m₁⟩ ) ⊗ |j₂ m₂⟩ + |j₁ m₁⟩ ⊗ ( J₃^{(2)} |j₂ m₂⟩ ) = (m₁ + m₂) |j₁ j₂ m₁ m₂⟩ ,   (9.96)
so the basis states |j₁ j₂ m₁ m₂⟩ of the tensor representation are eigenvectors of J₃ with eigenvalues m₁ + m₂.
While rj1 and rj2 are irreducible there is no reason for r to be. However, given that we have a complete
list of all irreducible representations from Theorem 9.22 we know that r must have a Clebsch-Gordan
decomposition of the form

r = r_{j₁} ⊗ r_{j₂} = ⊕ⱼ νⱼ rⱼ ,   (9.97)
where νj ∈ Z≥0 indicates how many times rj is contained in r. (If rj is not contained in r then νj = 0.) Our
first problem is to determine the numbers νj and, hence, to work out the Clebsch-Gordan decomposition
explicitly.
Theorem 9.29. For two representations rj1 and rj2 of L(SU (2)) we have
r_{j₁} ⊗ r_{j₂} = ⊕_{j=|j₁−j₂|}^{j₁+j₂} rⱼ .   (9.98)
Proof. Our starting point is to think about the degeneracy, δm , of the eigenvalue m of J3 in the represen-
tation r = rj1 ⊗ rj2 . Every representation rj ⊂ r with j ≥ |m| contributes exactly one to this degeneracy
while representations rj with j < |m| do not contain a state with J3 eigenvalue m. This implies
δ_m = Σ_{j ≥ |m|} νⱼ ,   (9.99)
where νⱼ counts how many times rⱼ is contained in r, as in Eq. (9.97). Eq. (9.99) implies
νj = δj − δj+1 , (9.100)
so if we can work out the degeneracies δm this equation allows us to compute the desired numbers νj .
The degeneracies δm are computed from the observation in Eq. (9.96) that the states with eigenvalue m
are precisely those states |j1 j2 m1 m2 i with m = m1 + m2 . Hence, all we need to do is count the pairs
(m₁, m₂), where mᵢ = −jᵢ, . . . , jᵢ and m₁ + m₂ = m. Assuming, without loss of generality, that j₂ ≤ j₁, the result is

δ_m = { 0                       for |m| > j₁ + j₂
      { j₁ + j₂ + 1 − |m|       for j₁ + j₂ ≥ |m| ≥ |j₁ − j₂|      (9.101)
      { 2j₂ + 1                 for |j₁ − j₂| ≥ |m| ≥ 0
Inserting these results into Eq. (9.100) shows that νj = 1 for j1 +j2 ≥ j ≥ |j1 −j2 | and νj = 0 otherwise.
Eq. (9.98) tells us how to “couple” two spins. For example, two spin 1/2 representations combine as

r_{1/2} ⊗ r_{1/2} = r₀ ⊕ r₁ ,   (9.102)

so they contain a singlet r₀ and a spin 1 representation r₁. Note that dimensions work out since dim(r_{1/2}) = 2,
dim(r0 ) = 1 and dim(r1 ) = 3. As another example, consider coupling a spin 1/2 and a spin 1 representation
which leads to
r1/2 ⊗ r1 = r1/2 ⊕ r3/2 . (9.103)
We have now identified the representation content of a tensor product for two irreducible representa-
tions rj1 and rj2 but this gives rise to a more detailed problem. On the tensor representation vector space
V = V_{j₁} ⊗ V_{j₂} we have two sets of basis vectors, namely

|j₁ j₂ m₁ m₂⟩ ,   where m₁ = −j₁, . . . , j₁ and m₂ = −j₂, . . . , j₂ ,

and

|jm⟩ ,   where j = |j₁ − j₂|, . . . , j₁ + j₂ and m = −j, . . . , j .
The question is how these two sets of basis vectors are related. Formally, we can write (using Dirac
notation)

|jm⟩ = Σ_{m₁,m₂} |j₁ j₂ m₁ m₂⟩ ⟨j₁ j₂ m₁ m₂ | jm⟩ ,   (9.106)
and the numbers ⟨j₁ j₂ m₁ m₂ | jm⟩ which appear in this equation are called Clebsch-Gordan coefficients. Once we have computed these coefficients the relation between the two sets of basis vectors is fixed.
For the example (9.102) of two coupled spin 1/2 representations the two sets of basis vectors are

|j₁ j₂ m₁ m₂⟩ → |1/2 1/2 −1/2 −1/2⟩ , |1/2 1/2 −1/2 1/2⟩ , |1/2 1/2 1/2 −1/2⟩ , |1/2 1/2 1/2 1/2⟩
|jm⟩          → |00⟩ , |1 −1⟩ , |10⟩ , |11⟩                                                         (9.107)
The key to relating these two sets of basis vectors is again to think about the eigenvalue m of J₃ = J₃^{(1)} + J₃^{(2)}. Consider the m = 1 state |11⟩ from the second basis. The only state from the first basis with m = 1 (remembering that m₁ and m₂ sum up) is |1/2 1/2 1/2 1/2⟩. After a choice of phase, we can therefore set

|11⟩ = |1/2 1/2 1/2 1/2⟩ .   (9.108)
(1) (2)
We can generate the other required relations by acting on Eq. (9.108) with J− = J− + J− , using
the formula (9.78). This leads to
|11i = | 12 12 12 21 i
↓ ↓
√
2 |10i = | 12 12 − 11
2 2i + | 12 21 12 − 12 i
↓ ↓ (9.109)
2 |1 − 1i = | 12 21 − 12 − 21 i + | 12 21 − 21 − 12 i
√
2 |00i = | 12 12 − 21 12 i − | 12 21 12 − 12 i
where the arrows indicate the action of J− and the last relation follows from orthogonality. So in
summary we have
|11⟩    = |1/2 1/2 1/2 1/2⟩
|10⟩    = (1/√2) ( |1/2 1/2 −1/2 1/2⟩ + |1/2 1/2 1/2 −1/2⟩ )       (j = 1)   (9.110)
|1 −1⟩  = |1/2 1/2 −1/2 −1/2⟩
|00⟩    = (1/√2) ( |1/2 1/2 −1/2 1/2⟩ − |1/2 1/2 1/2 −1/2⟩ )       (j = 0)   (9.111)
and this provides a complete set of relations between the two sets of basis vectors from which all
Clebsch-Gordan coefficients can be read off.
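The decomposition (9.102) can also be seen numerically by diagonalising J² on the tensor product space; a minimal sketch assuming numpy is available:

```python
import numpy as np

# Tensor-product generators J_i = J_i^(1) x 1 + 1 x J_i^(2) for two spin-1/2 factors,
# cf. Eq. (9.95); the spectrum of J^2 exhibits r_1/2 (x) r_1/2 = r_0 (+) r_1.
s = [np.array([[0, 1], [1, 0]], dtype=complex),
     np.array([[0, -1j], [1j, 0]]),
     np.array([[1, 0], [0, -1]], dtype=complex)]
tau = [0.5 * m for m in s]
id2 = np.eye(2)

J = [np.kron(t, id2) + np.kron(id2, t) for t in tau]
J2 = sum(Ji @ Ji for Ji in J)

print(np.round(np.linalg.eigvalsh(J2), 6))   # [0, 2, 2, 2]: j(j+1) = 0 once and 2 three times
```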
Exercise 9.30. Find the Clebsch-Gordan coefficients for the case (9.103).
The Lorentz group L is defined as

L = { Λ ∈ Gl(R⁴) | Λᵀ η Λ = η } ,   (9.112)
where η = diag(−1, 1, 1, 1) is the Minkowski metric. In other words, the Lorentz group consists of all real
4 × 4 matrices Λ which satisfy the defining relation ΛT ηΛ = η. Using index notation, this relation can
also be written as
Λ^µ_ρ Λ^ν_σ η_{µν} = η_{ρσ} .   (9.113)
Clearly, the Lorentz group is a sub-group of the four-dimensional general linear group Gl(R4 ). As a
matrix group it has a fundamental representation which is evidently four-dimensional. The action of this
fundamental representation Λ : R⁴ → R⁴ is explicitly given by

x ↦ x′ = Λ x   ⟺   x^µ ↦ x′^µ = Λ^µ_ν x^ν .   (9.114)
In Special Relativity this is interpreted as a transformation from one inertial system with space-time coordinates x = (t, x, y, z)ᵀ to another one with space-time coordinates x′ = (t′, x′, y′, z′)ᵀ.
Taking the determinant of the defining relation Λᵀ η Λ = η implies

det(Λ) = ±1 .   (9.115)

Further, the ρ = σ = 0 component of Eq. (9.113) reads −(Λ⁰₀)² + Σ_{i=1}^3 (Λⁱ₀)² = −1 so that

Λ⁰₀ ≥ 1   or   Λ⁰₀ ≤ −1 .   (9.116)
Combining the two sign ambiguities in Eqs. (9.115) and (9.116) we see that there are four types of Lorentz
transformations. The sign ambiguity in the determinant is analogous to what we have seen for orthogonal
matrices and its interpretation is similar to the orthogonal case. Lorentz transformations with determinant
1 are called “proper” Lorentz transformations while Lorentz transformations with determinant −1 can be
seen as a combination of a proper Lorentz transformation and a reflection. More specifically, consider the
special Lorentz transformation P = diag(1, −1, −1, −1) (note that this matrix indeed satisfies Eq. (9.113))
which is also referred to as “parity”. Then every Lorentz transformation Λ can be written as
Λ = P Λ+ , (9.117)
where Λ+ is a proper Lorentz transformation. The sign ambiguity (9.116) in Λ0 0 is new but has an obvious
physical interpretation. Under a Lorentz transformations Λ with Λ0 0 ≥ 1 the sign of the time component
x0 = t of a vector x remains unchanged, so that the direction of time is unchanged. Correspondingly,
such Lorentz transformations with positive Λ⁰₀ are called “ortho-chronous”. On the other hand, Lorentz
transformations Λ with Λ0 0 ≤ −1 change the direction of time. If we introduce the special Lorentz
transformation T = diag(−1, 1, 1, 1), also referred to as “time reversal”, then every Lorentz transformation
Λ can be written as
Λ = T Λ↑ , (9.118)
where Λ↑ is an ortho-chronous Lorentz transformation. Introducing the sub-group L↑+ of proper ortho-
chronous Lorentz transformations, the above discussion shows that the full Lorentz group can be written
as a union of four disjoint pieces:
L = (L↑+ ) ∪ (P L↑+ ) ∪ (T L↑+ ) ∪ (P T L↑+ ) . (9.119)
The Lorentz transformations normally used in Special Relativity are the proper, ortho-chronous Lorentz
transformations. However, the other Lorentz transformations are relevant as well and it is an important
question as to whether they constitute symmetries of nature in the same way that proper, ortho-chronous
Lorentz transformations do. More to the point, the question is whether nature respects parity P and
time-reversal T¹³.
Consider the Ansatz

Λ = ( Λ₂   0  )
    ( 0    1₂ )    (9.121)

of a two-dimensional Lorentz transformation Λ₂ which affects time and the x-coordinate, but leaves y and z unchanged. Demanding that a Λ of the above form is a proper ortho-chronous Lorentz transformation leads to
Λ₂(ξ) = ( cosh(ξ)   sinh(ξ) ) .   (9.122)
         ( sinh(ξ)   cosh(ξ) )
Exercise 9.31. Show that Eq. (9.122) is the most general form for Λ2 in order for Λ in Eq. (9.121) to be
a proper ortho-chronous Lorentz transformation.
The quantity ξ in Eq. (9.122) is also called rapidity. It follows from the addition theorems for hyperbolic
functions that Λ(ξ1 )Λ(ξ2 ) = Λ(ξ1 +ξ2 ), so rapidities add up in the same way that two-dimensional rotation
angles do. For a more common parametrisation introduce the parameter β = tanh(ξ) ∈ [−1, 1] so that
cosh(ξ) = 1/√(1 − β²) =: γ ,    sinh(ξ) = β γ .   (9.123)
In terms of β and γ the two-dimensional Lorentz transformations can then be written in the more familiar
form
Λ₂ = ( γ    βγ )
     ( βγ   γ  ) .   (9.124)
Here, β is interpreted as the relative speed of the two inertial systems (in units of the speed of light).
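A quick numerical check of the defining relation and of the addition of rapidities, assuming numpy is available (the rapidity values are arbitrary):

```python
import numpy as np

def boost(xi):
    # two-dimensional boost (9.122) embedded in the (t, x) block of a 4x4 matrix
    L = np.eye(4)
    L[:2, :2] = [[np.cosh(xi), np.sinh(xi)],
                 [np.sinh(xi), np.cosh(xi)]]
    return L

eta = np.diag([-1.0, 1.0, 1.0, 1.0])
xi1, xi2 = 0.3, 1.1
print(np.allclose(boost(xi1).T @ eta @ boost(xi1), eta))       # defining relation (9.112)
print(np.allclose(boost(xi1) @ boost(xi2), boost(xi1 + xi2)))  # rapidities add up
```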
In terms of generators, the Lorentz Lie algebra is spanned by the matrices

Tᵢ = ( 0   0  )      S₁ = ( 0 i 0 0 )      S₂ = ( 0 0 i 0 )      S₃ = ( 0 0 0 i )
     ( 0   T̃ᵢ ) ,          ( i 0 0 0 )           ( 0 0 0 0 )           ( 0 0 0 0 )
                           ( 0 0 0 0 )           ( i 0 0 0 )           ( 0 0 0 0 )     (9.126)
                           ( 0 0 0 0 )           ( 0 0 0 0 )           ( i 0 0 0 )

where T̃ᵢ are the SO(3) generators defined in Eq. (9.52). Given that the three-dimensional rotations are
embedded into the Lorentz group as in Eq. (9.120) the appearance of the SO(3) generators in the lower
3 × 3 block is entirely expected. The other generators Si do not correspond to rotations and are called the
boost generators. They are related to non-trivial Lorentz boosts. Altogether, the dimension of the Lorentz
group (Lie algebra) is six, with three parameters describing rotations and the three others Lorentz boosts.
The commutation relations can be worked out by direct computation with the above matrices and they
are given by
    [Ti , Tj ] = i ε_ijk Tk ,    [Si , Sj ] = −i ε_ijk Tk ,    [Ti , Sj ] = i ε_ijk Sk .        (9.127)
These commutation relations are very reminiscent of the ones for SU (2) and we can make this more
explicit by introducing a new basis
    Ti± := (1/2)(Ti ± iSi)        (9.128)
for the Lie algebra. In this new basis the commutation relations read
    [Ti± , Tj± ] = i ε_ijk Tk± ,    [Ti+ , Tj− ] = 0 .        (9.129)
Exercise 9.32. Verify the commutation relations (9.127) and (9.129).
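A quick way to convince oneself of Eq. (9.127) is to check it numerically, using the block and boost form of the generators given in Eq. (9.126); the following numpy sketch (helper names are arbitrary) does this:

    import numpy as np
    from itertools import product

    eps = np.zeros((3, 3, 3))
    for i, j, k in product(range(3), repeat=3):
        eps[i, j, k] = 0.5 * (i - j) * (j - k) * (k - i)   # Levi-Civita symbol

    # rotation generators T_i (SO(3) generators in the lower 3x3 block) and boost generators S_i
    T = [np.zeros((4, 4), dtype=complex) for _ in range(3)]
    S = [np.zeros((4, 4), dtype=complex) for _ in range(3)]
    for a in range(3):
        T[a][1:, 1:] = -1j * eps[a]            # (T~_a)_{lm} = -i eps_{alm}
        S[a][0, a + 1] = S[a][a + 1, 0] = 1j   # boosts mix time with the a-th space direction

    def comm(A, B):
        return A @ B - B @ A

    for i, j in product(range(3), repeat=2):
        assert np.allclose(comm(T[i], T[j]), 1j * sum(eps[i, j, k] * T[k] for k in range(3)))
        assert np.allclose(comm(S[i], S[j]), -1j * sum(eps[i, j, k] * T[k] for k in range(3)))
        assert np.allclose(comm(T[i], S[j]), 1j * sum(eps[i, j, k] * S[k] for k in range(3)))
    print("Eq. (9.127) verified")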
These commutation relations mean that the Lie algebra of the Lorentz group corresponds to two copies
of the SU (2) Lie algebra, so L(L) ∼= L(SU (2)) ⊕ L(SU (2)). This is a rather lucky state of affairs (which
does not persist for other groups, such as SU (n) for n > 2) since this means we can obtain the irreducible
representations of the Lorentz group from those of SU (2). We have
Theorem 9.33. The finite-dimensional, irreducible representations of the Lorentz group Lie algebra are
classified by two “spins”, (j+ , j− ), where j± ∈ Z/2, and the corresponding representations r(j+ ,j− ) with
representation vector space V(j+ ,j− ) have dimension (2j+ + 1)(2j− + 1). If J˜i± = rj± (Ti± ) are the L(SU (2))
representation matrices for Ti± then
Ji+ := r(j+ ,j− ) (Ti+ ) = J˜i+ × 12j− +1 , Ji− := r(j+ ,j− ) (Ti− ) = 12j+ +1 × J˜i− (9.130)
Exercise 9.34. Verify that the representation matrices (9.130) satisfy the correct commutation relations
for a representation of the Lorentz Lie algebra.
Appendices
A Calculus in multiple variables - a sketch
Calculus of multiple variables provides the background for much of what we are doing in this course (and
calculus is the “engine” of differential geometry which we discuss in the following appendices) so it may
be worth including a brief account of the subject. You have seen some of this in your first year, although
perhaps not in quite the same way.
so we can think of f as being given by m real-valued functions f = (f1 , . . . , fm )T , each of which depends
on the n variables x1 , . . . , xn . There are two special choices of dimensions which are of particular interest.
If m = 1 we call f a real-valued function, or scalar field and if n = m we call f a vector field. Vector fields
will also be denoted by uppercase letters A, B, . . ., adopting the notation more common in physics.
Definition A.1. (Continuity) A function f : U → Rm , where U ⊂ Rn open, is said to be continuous at
x ∈ U if every sequence (xk ) in U with limk→∞ xk = x satisfies limk→∞ f (xk ) = f (x). The function f is
said to be continuous on U if it is continuous for all x ∈ U .
Note this definition of continuity is extremely natural. It says continuous functions are those for which
limits can be pulled in and out of the function argument, that is, f (limk→∞ xk ) = limk→∞ f (xk ).
the other hand, θ(xk ) = 0 for all k since xk < 0. Hence, 0 = limk→∞ θ(xk ) ≠ θ(limk→∞ xk ) = θ(0) = 1
and this indeed violates the condition in Def. A.1. Hence, the Heaviside function is not continuous at
x = 0.
Exercise A.1. The function f (x) = 1/x is not well-defined at x = 0 and should hence be seen as a
function f : R \ {0} → R. Suppose we construct a “completion” of f by defining a new function F : R → R
with values F (x) = 1/x for x 6= 0 and F (0) = x0 for x0 ∈ R. Show, using Def. A.1, that there is no choice
for x0 such that F is continuous at x = 0.
Definition A.2. The function f : U → Rm , with U ⊂ Rn open, is said to be partially differentiable with
respect to xi at x ∈ U if the limit
    lim_{ε→0} [ f (x + ε ei ) − f (x) ] / ε        (A.2)
exists. In this case, the limit (A.2) is called the partial derivative of f with respect to xi at x and it is
denoted by ∂f /∂xi (x) or by ∂i f (x).
There are certain (linear) combinations of partial derivatives which are of particular interest. In general,
for a function f = (f1 , . . . , fm )T : U → Rm , where U ∈ Rn , with n variables and m-dimensional values,
we have n m partial derivatives, ∂i fj , which can be arranged into an m × n matrix. This matrix is called
the Jacobi matrix and it will be discussed in detail below. For now we focus on some special cases.
A.2.1 Gradient
Suppose we have a real-valued function f : U → R, where U ⊂ Rn . Then all we have is n partial derivatives
∂i f and they can be conveniently arranged into a row-vector
which is called the gradient of f at x. Note that for two partially differentiable real-valued functions
f, g : U → R the gradient satisfies the product rule (leaving out the argument x for simplicity)
    ∇(f g) = f ∇g + g ∇f .
This follows easily from the product rule for partial derivatives. It is also convenient to introduce the
formal nabla operator
    ∇ = (∂1 , . . . , ∂n )   ⇒   ∇i = ∂i .        (A.7)
A.2.2 Divergence
Now let us consider a vector field A = (A1 , . . . , An )T : U → Rn , where U ⊂ Rn . Its partial derivatives ∂i Aj
can be arranged into an n × n matrix (the Jacobi matrix of A). Of course we can consider any particular
linear combination Σ_{i,j} mij ∂i Aj , where mij ∈ R, of these partial derivatives, but is there a specific choice
of the n × n matrix m = (mij ) which is singled out in some way? Suppose we would like the combination
of partial derivatives to be invariant under rotations R ∈ SO(n), acting as xi 7→ Rik xk (which implies, via
the chain rule, that ∂i 7→ Rik ∂k ) and Aj 7→ Rjl Al . Then we have mij ∂i Aj 7→ Rik mij Rjl ∂k Al and, hence,
the combination is invariant iff RT mR = m. The only matrices satisfying this condition for all R ∈ SO(n)
are multiples of the unit matrix 1n , so that mij is proportional to δij . This leads to the divergence of the
vector field A defined by
    div A(x) = ∇ · A(x) := Σ_{i=1}^{n} ∂i Ai (x) .        (A.8)
Note that the divergence can be written as a formal dot product between the nabla operator (A.7) and
the vector field A, so div = ∇·. The divergence also satisfies a product rule which involves a vector field
A : U → Rn and a real-valued function f : U → R and is given by
∇ · (f A) = A · ∇f + f ∇ · A . (A.9)
As before, its proof relies on the product rule for partial derivatives:
∇ · (f A) = ∂i (f A)i = ∂i (f Ai ) = Ai ∂i f + f ∂i Ai = A · ∇f + f ∇ · A . (A.10)
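The product rule (A.9) can also be checked numerically with finite-difference derivatives; a short numpy sketch, with an arbitrarily chosen vector field A and scalar field f:

    import numpy as np

    def num_div(F, x, h=1e-6):
        # numerical divergence of a vector field F: R^n -> R^n at the point x
        n = len(x)
        return sum((F(x + h*np.eye(n)[i])[i] - F(x - h*np.eye(n)[i])[i]) / (2*h) for i in range(n))

    def num_grad(f, x, h=1e-6):
        n = len(x)
        return np.array([(f(x + h*np.eye(n)[i]) - f(x - h*np.eye(n)[i])) / (2*h) for i in range(n)])

    A = lambda x: np.array([x[1]*x[2], x[0]**2, np.sin(x[1])])   # an example vector field on R^3
    f = lambda x: x[0]*x[1] + x[2]**3                            # an example scalar field

    x = np.array([0.4, -1.2, 0.7])
    lhs = num_div(lambda y: f(y)*A(y), x)
    rhs = A(x) @ num_grad(f, x) + f(x)*num_div(A, x)
    print(lhs, rhs)   # the two numbers agree up to discretisation error, Eq. (A.9)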
A.2.3 Curl
We have just seen that the divergence is (up to an overall factor) the only rotationally invariant linear
combination of the partial derivatives ∂i Aj of a vector field A : U → Rn . What if we allow tensors with
more than two indices (rather than just δij ) to be summed into ∂i Aj ? The only other special tensor (which
is rotationally invariant just as δij ) is the n-dimensional Levi-Civita tensor ε_{i1 ...in} . This means it is of
interest to look at the following linear combinations
    ε_{i1 ...i_{n−2} jk} ∂j Ak        (A.11)
of the partial derivatives of A, leading to an object with n − 2 indices (which is also called a tensor field).
This is an entirely sensible construction (which finds its natural home in the context of differential forms,
see Appendix C) but only in three dimensions, n = 3, does it lead to a familiar object with one index,
that is, a vector field. A frequent question is why the curl is only defined in three dimensions. In fact,
it is, in the sense of Eq. (A.11), defined in all dimensions (and differential forms provide a more natural
framework for this) but only in three dimensions does it lead back to a vector field. This is why the curl is
normally only introduced in three dimensions, and this is the case we focus on now. The curl of a vector
field A : U → R3 is another vector field with components (A.11), that is,
    (curl A)i := ε_ijk ∂j Ak   ⇒   curl A = ∇ × A .        (A.12)
Note that the curl can be expressed as a formal cross product with the nabla operator, so curl = ∇×.
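For a concrete illustration, the component formula (A.12) can be evaluated numerically by contracting the Levi-Civita tensor with finite-difference partial derivatives; a minimal numpy sketch (the field A is an arbitrary example):

    import numpy as np
    from itertools import product

    def num_jacobian(A, x, h=1e-6):
        # matrix of partial derivatives, dA[k, j] = dA_k/dx_j, of a vector field A: R^3 -> R^3
        n = len(x)
        return np.array([[(A(x + h*np.eye(n)[j])[k] - A(x - h*np.eye(n)[j])[k]) / (2*h)
                          for j in range(n)] for k in range(n)])

    eps = np.zeros((3, 3, 3))
    for i, j, k in product(range(3), repeat=3):
        eps[i, j, k] = 0.5 * (i - j) * (j - k) * (k - i)

    A = lambda x: np.array([x[1]*x[2], -x[0]*x[2], x[0]*x[1]])   # example vector field
    x = np.array([1.0, 2.0, 3.0])

    dA = num_jacobian(A, x)
    curl = np.einsum('ijk,jk->i', eps, dA.T)    # (curl A)_i = eps_ijk d_j A_k, Eq. (A.12)
    print(curl)                                 # approximately (2x, 0, -2z) = (2, 0, -6)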
Exercise A.2. Show that for any n × n matrix R the Levi-Civita tensor satisfies
    Σ_{j1 ,...,jn } R_{i1 j1} · · · R_{in jn} ε_{j1 ...jn} = det(R) ε_{i1 ...in} .
A.3 The total differential
So far we have looked at individual partial derivatives but there is a more general notion of derivative
which does not rely on choosing particular directions (such as the directions of the coordinate axes). This
notion is referred to as total derivative.
To introduce the total derivative in a concise way it is useful to recall briefly the one-dimensional case
of a function f : R → R. Such a function is called differentiable at x ∈ R with derivative f'(x) if the limit
    f'(x) := lim_{ε→0} [ f (x + ε) − f (x) ] / ε        (A.19)
exists. Alternatively and equivalently, we can say that f is differentiable at x ∈ R with derivative f'(x)
iff
    f (x + ε) = f (x) + f'(x) ε + O(ε²)   where   lim_{ε→0} O(ε²)/ε = 0        (A.20)
is satisfied for all sufficiently small ε. This says we should think of the derivative as a linear map which
provides the leading behaviour of the function’s variation, away from f (x). Note that, intuitively, O(ε²)
denotes any expression which “goes” like ε² (or an even higher power of ε) but it is also properly defined
as an expression which satisfies the limit condition in the right-hand side of Eq. (A.20). More generally,
we say an expression is of order ε^κ , where κ ∈ N, and we write this expression as O(ε^κ ) if
    lim_{ε→0} O(ε^κ )/ε^{κ−1} = 0 .        (A.21)
Using this notation often leads to a good match between mathematical precision and intuition.
Now we want to generalise the one-dimensional definition of derivatives, in the alternative form (A.20),
to introduce the total derivative in the multi-dimensional case.
Definition A.3. A function f : U → Rm , with U ⊂ Rn open, is called totally differentiable at x ∈ U if
there exists an m × n matrix A such that
    f (x + ε) = f (x) + A ε + O(|ε|²)
for all ε ∈ Rn in a sufficiently small ball Br (0). In this case, the matrix A is called the Jacobi matrix for
f at x and is also denoted by Df (x) := A.
This is where linear algebra meets calculus. In the one-dimensional case (A.20) we had a somewhat trivial
linear map f 0 (x) : R → R (a 1 × 1 matrix) but in the multi-dimensional case, for a function f with n
arguments and m-dimensional values, this becomes a linear map Df (x) : Rn → Rm , that is, an m × n
matrix.
You probably know (or perhaps you don’t?) that a function f : R → R in one dimension which is
differentiable at x ∈ R must be continuous at x. Here is the multi-dimensional generalisation of this
statement.
Proposition A.2. If f : U → Rm , with U ⊂ Rn open, is totally differentiable at x ∈ U , then it is
continuous at x.
Proof. We have to show the continuity property in Def. A.1 so we start with a sequence (xk ) which con-
verges to x. This sequence can also be written as xk = x + εk , where limk→∞ εk = 0. Total differentiability
at x implies that f (xk ) = f (x + εk ) = f (x) + A εk + O(|εk |²) and taking the limit of this equation gives
It is intuitively clear that the total and partial derivatives must be related and the precise relationship is
formulated in the following proposition.
Proof. Assume that f is totally differentiable at x ∈ U , with Jacobi matrix A = Df (x), such that
f (x + ε) = f (x) + A ε + O(|ε|²). Focusing on the ith component of f this can be written as fi (x + ε) =
fi (x) + Aij εj + O(|ε|²). Since this holds for all (sufficiently small) ε we can choose in particular ε = ε ej
for ε ∈ R small and by inserting this into the previous equation we find
In short, the Jacobi matrix is the matrix which contains all the partial derivatives. Note from Eq. (A.23)
that it is organised such that every row of Df is the gradient of one of the component functions fi of f .
This means, the Jacobi matrix can also be written as
    Df (x) = ( ∇f1 ; . . . ; ∇fm ) (x) .        (A.26)
In particular, for a real-valued function f : U → R the Jacobi matrix is simply the gradient, so Df = ∇f .
For practical calculations it can be useful to have a notation for the Jacobi matrix which refers explicitly
to the variables xj and the function components fi and such a notation is given by 14
    Df (x) = ∂(f1 , . . . , fm )/∂(x1 , . . . , xn ) (x) .        (A.27)
Theorem A.4. (Chain rule) Consider maps g : U → Rm , with U ⊂ Rn open and f : V → Rp , with
V ⊂ Rm and g(U ) ⊂ V , so a sequence
    U ⊂ Rn  —g→  V ⊂ Rm  —f→  Rp .
Let g be totally differentiable at x ∈ U and f be totally differentiable at y := g(x) ∈ V . Then f ◦ g is
totally differentiable at x ∈ U and we have
D(f ◦ g)(x) = Df (y) Dg(x) . (A.28)
Proof. By assumption the total derivatives of g at x and of f at y exist. We denote these by A = Dg(x)
and B = Df (y) so that
    g(x + ε) = g(x) + A ε + O(|ε|²) ,    f (y + η) = f (y) + B η + O(|η|²) .
The trick is to choose η = g(x + ε) − g(x) = A ε + O(|ε|²) and this gives
    (f ◦ g)(x + ε) = f (g(x + ε)) = f (g(x) + η) = f (g(x)) + B η + O(|η|²)
                   = f (g(x)) + BA ε + O(|η|² , |ε|²) .        (A.29)
This shows that D(f ◦ g)(x) = BA = Df (y) Dg(x).
In short, the Jacobi matrix of the composite function is the matrix product of the individual Jabobi
matrices, as in Eq. (A.28). There are various other, and perhaps more familiar ways to write this. For
example, let us adopt the somewhat abusive notation, common in physics, where we denote functions and
coordinates by the same name, that is, we write yi (x) = gi (x). Then, using the notation (A.27) for the
Jacobi matrix, the chain rule can be stated as
    ∂((f ◦ y)1 , . . . , (f ◦ y)p )/∂(x1 , . . . , xn ) (x) = ∂(f1 , . . . , fp )/∂(y1 , . . . , ym ) (y) · ∂(y1 , . . . , ym )/∂(x1 , . . . , xn ) (x) ,        (A.30, A.31)
where the sizes of the three Jacobi matrices are p × n, p × m and m × n, respectively. Alternatively, we can write the matrix
product on the right-hand side of Eq. (A.30) out with indices, keeping in mind that a matrix product
translates into a sum over the adjacent indices. Then (omitting the points x and y for simplicity) we get
    ∂(f ◦ y)i /∂xj = Σ_{k=1}^{m} (∂fi /∂yk ) (∂yk /∂xj ) ,        (A.32)
perhaps the version of the chain rule you are most familiar with.
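The matrix form (A.28) of the chain rule is easily verified numerically with finite-difference Jacobi matrices; a minimal numpy sketch, with arbitrarily chosen example maps f and g:

    import numpy as np

    def num_jacobian(F, x, h=1e-6):
        # finite-difference Jacobi matrix of F at x (rows = components of F, columns = variables)
        n = len(x)
        cols = [(F(x + h*np.eye(n)[j]) - F(x - h*np.eye(n)[j])) / (2*h) for j in range(n)]
        return np.array(cols).T

    g = lambda x: np.array([x[0]*x[1], x[0] + x[1]**2, np.sin(x[0])])   # g: R^2 -> R^3
    f = lambda y: np.array([y[0]**2 + y[2], y[1]*y[2]])                 # f: R^3 -> R^2

    x = np.array([0.5, -0.3])
    lhs = num_jacobian(lambda t: f(g(t)), x)            # D(f o g)(x), a 2 x 2 matrix
    rhs = num_jacobian(f, g(x)) @ num_jacobian(g, x)    # Df(g(x)) Dg(x), Eq. (A.28)
    print(np.allclose(lhs, rhs, atol=1e-4))             # True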
If we apply the chain rule in one dimension, (f ◦ g)'(x) = f'(g(x)) g'(x), to the special case where
f = g⁻¹ (assuming that g is invertible and its inverse is also differentiable) we get 1 = (g⁻¹)'(y) g'(x), where
y = g(x), and, hence, this leads to the rule for the derivative of the inverse function, (g⁻¹)'(y) = 1/g'(x).
Let us generalise this argument to the multi-dimensional case, starting with the chain rule (A.28) and
assuming that f = g⁻¹ exists and is totally differentiable. On the left-hand side of Eq. (A.28) we have
the map f ◦ g = g⁻¹ ◦ g = id, that is the identity map, with id(x) = x. Since
    ∂ idi /∂xj = ∂xi /∂xj = δij ,
the Jacobi matrix of the identity map is the unit matrix, so D id = 1. Hence, Eq. (A.28) turns into
    1 = D(g⁻¹ )(y) Dg(x) ,        (A.33)
where y = g(x) and we have shown
Corollary A.1. Let g : U → U , with U ⊂ Rn open, be totally differentiable at x ∈ U , and invertible with
the inverse g −1 also totally differentiable at y = g(x) ∈ U . Then, the Jacobi matrix Dg(x) is invertible
and we have
D(g −1 )(y) = (Dg(x))−1 . (A.34)
Note that the inverse on the right-hand side of Eq. (A.34) is a matrix inverse. In other words, the Jacobi
matrix of the inverse function is the matrix inverse of the original Jacobi matrix. If we adopt a more
physics-related notation and write yi (x) = gi (x), as before, then, using the notation (A.27) for the Jacobi
matrix, the inverse derivative rule (A.34) can be stated as
    ∂(x1 , . . . , xn )/∂(y1 , . . . , yn ) (y) = [ ∂(y1 , . . . , yn )/∂(x1 , . . . , xn ) (x) ]⁻¹ .        (A.35)
One way to compute the Jacobi matrix of g⁻¹ is to work out this inverse map explicitly by solving
the equations y1 = (x1² + x2²)/2 and y2 = x1 x2 for x1 and x2 and then computing ∂(x1 , x2 )/∂(y1 , y2 ). But it is
easier to use the inverse derivative rule (A.35) instead, which gives
    ∂(x1 , x2 )/∂(y1 , y2 ) = [ ∂(y1 , y2 )/∂(x1 , x2 ) ]⁻¹ = 1/(x1² − x2²) [ x1  −x2 ; −x2  x1 ] .        (A.37)
Note this result for the Jacobi matrix for g −1 is only well-defined in the neighbourhood of points
(x1 , x2 ) where x21 − x22 6= 0. Indeed, when x21 − x22 = 0 the Jacobi matrix (A.36) of g becomes singular
so its inverse does not exist. This indicates that the function g cannot be inverted near points (x1 , x2 )
with x21 − x22 = 0, a statement that will be made more precise by Corollary A.2 below.
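For this example the inverse derivative rule can also be checked in a few lines of numpy (a sketch, with an arbitrarily chosen point where x1² − x2² ≠ 0):

    import numpy as np

    g = lambda x: np.array([(x[0]**2 + x[1]**2)/2, x[0]*x[1]])

    def Dg(x):
        # Jacobi matrix of g, with rows (x1, x2) and (x2, x1)
        return np.array([[x[0], x[1]], [x[1], x[0]]])

    x = np.array([2.0, 0.5])          # a point with x1^2 - x2^2 != 0
    y = g(x)

    # inverse-function rule (A.34): D(g^{-1})(y) = (Dg(x))^{-1}
    Dginv = np.linalg.inv(Dg(x))
    print(Dginv)
    print(np.array([[x[0], -x[1]], [-x[1], x[0]]]) / (x[0]**2 - x[1]**2))   # same matrix, Eq. (A.37)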
A.5.1 Taylor’s formula
Theorem A.5. (Taylor’s formula) Let f : U → R, with U ⊂ Rn open, be κ-times continuously differen-
tiable at x ∈ U . For a ξ ∈ Rn which satisfies x + tξ ∈ U for all t ∈ [0, 1] there exists a θ ∈ [0, 1] such
that
    f (x + ξ) = Σ_{|K|≤κ} [∂ K f (x)/K!] ξ K + Σ_{|K|=κ} [(∂ K f (x + θξ) − ∂ K f (x))/K!] ξ K .        (A.39)
Proof. The proof is not too difficult but somewhat elaborate, and we refer to the literature, for example
Ref. [10], for details.
Note that this formula is actually an equality and the second sum on the right-hand side can be seen as
the size of the error if we truncate the series by just keeping the first sum up to order κ on the right-hand
side. The full Taylor series
    Σ_K [∂ K f (x)/K!] ξ K        (A.40)
does not need to converge and even if it does it need not converge to the function value. However, careful
consideration of the error term in Eq (A.39) and its behaviour with the order κ, for a given function f ,
can often be used to decide for which values of ξ the series (A.40) converges to f (x + ξ).
The error term can be recast into a simpler, more intuitive form. Since, by assumption, κ derivatives
on f still lead to continuous functions we have limξ→0 (∂ K f (x + ξ) − ∂ K f (x)) = 0. From Eq. (A.21), this
means that the second sum in Eq. (A.39) is of O(|ξ|κ+1 ) and Taylor’s formula becomes
    f (x + ξ) = Σ_{|K|≤κ} [∂ K f (x)/K!] ξ K + O(|ξ|^{κ+1} ) .        (A.41)
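The statement that the error term is O(|ξ|^{κ+1}) can be seen numerically: truncating the Taylor series of a simple test function at order κ = 2 and shrinking ξ, the error decreases with the third power. A minimal numpy sketch (the test function is an arbitrary choice):

    import numpy as np

    f = lambda x: np.exp(x[0]) * np.sin(x[1])         # a smooth test function on R^2
    x = np.array([0.2, 0.4])

    def taylor2(xi):
        # Taylor polynomial of order kappa = 2 around x, built from the partial derivatives of f
        grad = np.array([np.exp(x[0])*np.sin(x[1]), np.exp(x[0])*np.cos(x[1])])
        hess = np.array([[np.exp(x[0])*np.sin(x[1]),  np.exp(x[0])*np.cos(x[1])],
                         [np.exp(x[0])*np.cos(x[1]), -np.exp(x[0])*np.sin(x[1])]])
        return f(x) + grad @ xi + 0.5 * xi @ hess @ xi

    xi = np.array([1.0, -0.7])
    for t in [0.1, 0.01, 0.001]:
        print(t, abs(f(x + t*xi) - taylor2(t*xi)))   # the error shrinks like t^3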
Local extrema have an interesting property.
Proof. This can be shown by relating the statement to its analogue in one dimension. To do this, consider
the functions gi : R → R defined by gi (t) := f (x + tei ). Clearly, if f has a local extremum at x the
functions gi have a local extremum at t = 0. Hence, from the well-known one-dimensional statement we
know that gi0 (0) = 0. However, gi0 (0) = ∂i f (x) (for all i) and this proves the claim.
Hence, the local extrema of a function f are to be found among its stationary points, that is, among the
points x in its domain for which ∇f (x) = 0. However, not all such stationary points are necessarily local
extrema. Criteria for when this is the case can be formulated in terms of the Hesse matrix. Indeed, at a
stationary point x of f Taylor’s formula (A.43) becomes
1
f (x + ξ) − f (x) = ξ T H(x) ξ + O(|ξ|3 ) , (A.45)
2
so the leading behaviour near x is determined by the second order term which is controlled by the Hesse
matrix. To formulate what this implies more precisely we need to recall some properties of symmetric
n × n matrices M from linear algebra. Such a matrix M is called positive definite if ξ T M ξ > 0 for all
ξ 6= 0. It is called negative definite if −M is positive definite. Further, M is called indefinite if ξ T M ξ takes
on strictly positive and strictly negative values, for suitable ξ. We recall that M is positive (negative)
definite iff all its eigenvalues are strictly positive (strictly negative). It is indefinite iff it has at least one
strictly positive and at least one strictly negative eigenvalue.
Proof. (i) Let us sketch the proof. Since H(x) is positive definite there exists a constant c > 0 such that
(1/2) ξ T H(x) ξ ≥ c|ξ|² for all ξ ∈ Rn . This means, we can always find an ε > 0 so that the O(|ξ|³ ) term in
Eq. (A.45) is smaller than (1/2) ξ T H(x) ξ for all ξ with |ξ| < ε. Hence, from Eq. (A.45), f (x + ξ) − f (x) > 0
for all ξ ≠ 0 with |ξ| < ε so that x is indeed an isolated local minimum.
(ii), (iii) The proofs are similar to the one for (i).
Proposition A.4 and Theorem A.6 suggest a method to find the isolated local extrema of a function
f : U → R. As a first step, find the stationary points of f in U by solving the equation ∇f (x) = 0. Next,
for each stationary point x, compute the Hesse matrix H(x) and its eigenvalues and use the following
criterion:
x local isolated minimum ⇐ all eigenvalues of H(x) strictly positive
x local isolated maximum ⇐ all eigenvalues of H(x) strictly negative . (A.46)
x not a local extremum ⇐ H(x) has strictly positive and strictly negative eigenvalues
Let us discuss functions f of two variables x = (x1 , x2 )T more explicitly. In this case, the Hesse matrix at
x takes the form
    H(x) = [ ∂1²f   ∂1 ∂2 f ; ∂1 ∂2 f   ∂2²f ] (x)   ⇒   det(H(x)) = (∂1²f ∂2²f − (∂1 ∂2 f )²)(x) ,   tr(H(x)) = (∂1²f + ∂2²f )(x) .        (A.47)
It has two eigenvalues, λ1 , λ2 , and their signs can be determined by considering the determinant and trace
of H(x). More specifically, for a stationary point x of f we have
Setting ∇f (x, y, z) = 0 one finds three stationary points which we collect, together with the associated
Hesse matrices and their eigenvalues, in the following table.
Hence, the stationary points at (x, y, z) = (0, ±1, 1) are not local extrema (but saddles) since they
have two strictly positive and one strictly negative eigenvalue. On the other hand, the stationary
point at (x, y, z) = (1/2, 0, 1) is a local minimum, since all eigenvalues are strictly positive.
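The criterion (A.46) translates directly into a small computation: find the stationary points, then inspect the eigenvalues of the Hesse matrix. A minimal numpy sketch for a simple two-variable example, f (x, y) = x³ − 3x + y² (an arbitrary choice, not the function discussed above):

    import numpy as np

    # f(x, y) = x^3 - 3x + y^2 with grad f = (3x^2 - 3, 2y) and stationary points (1, 0), (-1, 0)
    def hesse(x, y):
        return np.array([[6*x, 0], [0, 2]])

    for p in [(1.0, 0.0), (-1.0, 0.0)]:
        ev = np.linalg.eigvalsh(hesse(*p))
        if np.all(ev > 0):
            kind = "isolated local minimum"
        elif np.all(ev < 0):
            kind = "isolated local maximum"
        elif np.any(ev > 0) and np.any(ev < 0):
            kind = "saddle, not a local extremum"
        else:
            kind = "inconclusive (some eigenvalue vanishes)"
        print(p, ev, kind)   # (1,0): minimum, (-1,0): saddle, as expected from (A.46)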
does it exist? Unfortunately, the answer is “not always” and this is where the implicit function theorem
comes into play. It states sufficient conditions for the solution g to exist.
The implicit function theorem states, roughly, that, as long as the condition ∂y f (a, b) 6= 0 (and its suitable
generalisation to higher dimensions) is satisfied, so that the tangent is not vertical, we can find a local
solution y = g(x), near the point (a, b). The precise formulation is as follows.
Theorem A.7. (Implicit function theorem). Let f : V × W → Rm , with V ⊂ Rn−m and W ⊂ Rm open
(where n > m), be a continuously differentiable function. Denote the coordinates by (x, y) ∈ V × W and
choose a point (a, b) ∈ V × W . If f (a, b) = 0 and if the matrix
    ∂(f1 , . . . , fm )/∂(y1 , . . . , ym ) (a, b)        (A.49)
has maximal rank, then there exist open neighbourhoods Ṽ ⊂ V of a and W̃ ⊂ W of b and a continuously
differentiable function g : Ṽ → W̃ with f (x, g(x)) = 0 for all x ∈ Ṽ . Conversely, if (x, y) ∈ Ṽ × W̃
and f (x, y) = 0 then y = g(x).
Proof. The proof is somewhat lengthy and can be found in the literature, see for example [10].
The maximal rank condition on the (partial) Jacobi matrix (A.49) is the generalisation of the requirement,
stated earlier, that the tangent at (a, b) is not vertical. The theorem implies a formula for the derivative
of the function g in terms of f . To see this, apply the chain rule to the equation f (x, g(x)) = 0:
    ∂(f1 , . . . , fm )/∂(x1 , . . . , xn−m ) + ∂(f1 , . . . , fm )/∂(y1 , . . . , ym ) · ∂(g1 , . . . , gm )/∂(x1 , . . . , xn−m ) = 0 .
The second Jacobi matrix in the above equation is precisely the one in Eq. (A.49) which is required to
have maximal rank. Hence, we can invert this matrix and solve for the Jacobi matrix of g:
    ∂(g1 , . . . , gm )/∂(x1 , . . . , xn−m ) = − [ ∂(f1 , . . . , fm )/∂(y1 , . . . , ym ) ]⁻¹ ∂(f1 , . . . , fm )/∂(x1 , . . . , xn−m ) .        (A.51)
Note, we can calculate the Jacobi matrix of g from this formula without actually knowing the function g
explicitly - all we need is the original function f .
Application 1.40. Another implicit function example
Let us illustrate this with a further example, for a function f : R3 → R defined by f (x1 , x2 , y) =
x31 /3 + x22 y 3 /3 + 1. The solutions to the equation f (x1 , x2 , y) = 0 form a surface in R3 and it is
often convenient to describe this surface by solving for y in terms of x1 and x2 . The implicit function
theorem states this is possible, at least locally, for points where
∂y f = x22 y 2 (A.52)
is different from zero, that is, for every point on the surface with x2 6= 0 and y 6= 0. In the neighbour-
hood of such a point, we can find a solution y = g(x1 , x2 ) and compute its derivative from Eq. (A.51):
    ∇g = − [∂y f ]⁻¹ ∂f /∂(x1 , x2 ) = − 1/(x2² y²) ( x1² , (2/3) x2 y³ ) .        (A.53)
Here, y should be thought of as the function y = g(x1 , x2 ) obtained by solving f (x1 , x2 , y) = 0 for y.
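For this example the formula (A.53) can be checked against a finite-difference derivative of the explicit solution y = g(x1 , x2 ); a minimal numpy sketch:

    import numpy as np

    def g(x1, x2):
        # explicit solution of f(x1, x2, y) = x1^3/3 + x2^2 y^3/3 + 1 = 0 for y
        return -np.cbrt((3 + x1**3) / x2**2)

    def grad_g(x1, x2):
        # the implicit-function formula (A.53)
        y = g(x1, x2)
        return -np.array([x1**2, 2*x2*y**3/3]) / (x2**2 * y**2)

    x1, x2, h = 0.7, 1.3, 1e-6
    numerical = np.array([(g(x1 + h, x2) - g(x1 - h, x2)) / (2*h),
                          (g(x1, x2 + h) - g(x1, x2 - h)) / (2*h)])
    print(np.allclose(grad_g(x1, x2), numerical, atol=1e-5))   # True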
We can apply the implicit function theorem to answer the question under which conditions an inverse of
a (differentiable) function g : W → Rm , with W ⊂ Rm open, exists. More explicitly, if we have points
a ∈ W and b ∈ Rm with a = g(b) we want to know if the equation x = g(y) can be solved for y in terms
of x near (a, b). To make contact with the implicit function theorem, we define the auxiliary function
f : R2m → Rm by f (x, y) = g(y) − x. Then we have f (a, b) = 0 and the implicit function theorem states
that f can be solved for y in terms of x near (a, b) if the matrix
    ∂(f1 , . . . , fm )/∂(y1 , . . . , ym ) (a, b) = ∂(g1 , . . . , gm )/∂(y1 , . . . , ym ) (b)        (A.54)
has maximal rank. Hence, we have
Corollary A.2. Let g : W → Rm , where W ⊂ Rm open, be a differentiable function with a = g(b) for
certain points a ∈ W and b ∈ Rm . Then g is invertible (that is, the equation x = g(y) can be solved for
y) in terms of x near (a, b) if the Jacobi matrix
    ∂(g1 , . . . , gm )/∂(y1 , . . . , ym ) (b)        (A.55)
of g is invertible at b.
In common physics notation, no distinction is made between the functions and the associated co-
ordinates. Here, in order to avoid confusion, I have used lowercase letters for the coordinates and
uppercase letters for the corresponding functions. The derivatives of our implicit functions P , V and
T can be computed using Eq. (A.51) (or, alternatively, by differentiating Eq. (A.56) using the chain
rule) and this leads to
    ∂v P = − ∂v f /∂p f ,    ∂t P = − ∂t f /∂p f ,
    ∂p V = − ∂p f /∂v f ,    ∂t V = − ∂t f /∂v f ,        (A.57)
    ∂p T = − ∂p f /∂t f ,    ∂v T = − ∂v f /∂t f .
Various conclusions can be easily obtained using these equations, for example,
Do not let yourself be confused by the second of these equations which says we can simply invert
individual partial derivatives to get the derivatives of the inverse function. At first sight, this seems
to be at odds with our general rule, Eq. (A.35), which says the Jacobi-matrix of the inverse function
is the matrix inverse of the original Jacobi matrix. But it is important to note that this general rule
was based on a set-up with independent coordinates (x1 , . . . , xn )T ∈ Rn . In the present case, on the
other hand, we start with three independent coordinates (p, v, t)T ∈ R3 but we impose the condition
f (p, v, t) = 0. This means we are effectively working on a surface within R3 and only two coordinates
are independent. The somewhat peculiar looking relations (A.58) are a consequence of this special
set-up and are, therefore, not in contradiction with our general rule in Eq. (A.35). In fact, they were
derived using the general chain rule, in the form of Eq. (A.51).
B Manifolds in Rn
Differentiable manifolds are the main arena of an area of mathematics called differential geometry. It is a
large subject (and a major area of contemporary research in mathematics) which is well beyond the scope
of these lectures. Here, we would like to present a rather limited discussion of manifolds embedded in Rn
and some of their elementary geometry. (A more advanced introduction to differential geometry can, for
example, be found in Ref. [14].) Some special cases of this have, in fact, already been covered in your
first year course. Apart from generalising these the main purpose of this appendix is to support some
results used in the main part of the text and provide a (very) basic mathematical grounding for General
Relativity.
Figure 18: Manifold M ⊂ Rn defined locally as the common zero locus V ∩ M = {x ∈ Rn | f1 (x) = · · · = f_{n−k} (x) = 0} of functions.
that is, there are no “edges”. The circle S¹ in (B.1) provides an example with n = 2, k = 1 and
f1 (x, y) = x² + y² − 1. The alternative description (B.2) of S¹ in terms of a parametrisation generalises to
Theorem B.1. The set M ⊂ Rn is a k-dimensional sub-manifold of Rn iff, for every p ∈ M we have an
open neighbourhood V ⊂ Rn of p, an open set U ⊂ Rk and a bijective map X = (X1 , . . . , Xn )T : U → V ∩ M
such that the Jacobi matrix
    ∂(X1 , . . . , Xn )/∂(t1 , . . . , tk )        (B.4)
has rank k for all t = (t1 , . . . , tk )T ∈ U . The map X is called a chart of M .
Proof. The proof involves the implicit function theorem A.7 and it can, for example, be found in Ref. [10].

The circle S¹ as parametrised in Eq. (B.2) provides an example with n = 2, k = 1 and X(t) =
(X1 (t), X2 (t))T = (cos t, sin t)T . More generally, the theorem says we can describe a k-dimensional sub-
manifold of Rn (at least locally) by a parametrisation X(t) = (X1 (t), . . . , Xn (t))T , where t ∈ Rk are the
parameters. (See Fig. 19.) For one parameter (k = 1), this describes a one-dimensional sub-manifold,
that is a curve, for two parameters (k = 2) it describes a two-dimensional sub-manifold, that is a surface,
and so on.
    Tp M := { v^a ∂X/∂t^a (t) | v^a ∈ R } ,   where p = X(t) ,
that is, the span of the tangent vectors ∂X/∂t^a (t) with arbitrary coefficients v^a ∈ R.
Note that the above definition of the tangent space is independent of the parametrisation used. Con-
sider a new set of parameters t̃ = (t̃1 , . . . , t̃k ), a reparametrisation t̃ = t̃(t) and an alternative parametri-
sation X̃(t̃) := X(t(t̃)) of the same manifold, such that the Jacobian ∂ t̃/∂t has rank k. It follows that
    ∂ X̃/∂ t̃a = (∂X/∂tb ) (∂tb /∂ t̃a ) .        (B.7)
Since the Jacobian ∂ t̃/∂t has maximal rank the vectors ∂ X̃/∂ t̃a and ∂X/∂ta span the same space.
Note that, intuitively, this vector is indeed tangential to the circle for every t.
Exercise B.2. Write down a suitable parametrisation for a two-sphere S 2 ⊂ R3 and compute its tangent
space at each point.
From Def. B.1 we know that at least one component of ∇f (p) must be non-zero. Without restricting
generality, let us assume this is the last component and we split the coordinates in the neighbourhood of
p up as (x, y) ∈ Rn , where x ∈ Rn−1 and y ∈ R. Since ∂y f (p) 6= 0 we know from the implicit function
theorem A.7 that we have a solution y = g(x) with f (x, g(x)) = 0 in a neighbourhood of p. Further, from
Eq. (A.51) we know that the derivatives of g are given by
∂i f
∂i g = − for i = 1, . . . , n − 1 . (B.9)
∂y f
Using the function g we can also write down a parametrisation for the manifold and compute the resulting
tangent vectors.
    X(x) = ( x1 , . . . , xn−1 , g(x) )T   ⇒   ∂X/∂xi = ( ei , ∂i g )T   for i = 1, . . . , n − 1 .        (B.10)
Here we have used the coordinates x = (x1 , . . . , xn−1 )T as the parameters and ei are the standard unit
vectors in Rn−1 . We can now work out the dot product between the gradient ∇f and the above tangent
vectors which leads to
    ∇f · ∂X/∂xi = ∂i f + ∂y f ∂i g = 0 ,        (B.11)
where the last step uses Eq. (B.9).
This shows that ∇f is indeed orthogonal to all tangent vectors and, hence, to Tp M and this is the desired
result.
B.3.2 Definition of integration over sub-manifolds
We are now ready 15 to define integration over a sub-manifold of Rn .
Definition B.3. Let M ⊂ Rn be a k-dimensional sub-manifold of Rn which (up to a set of measure zero)
is given by a single chart X : U → M . For a function f : M → R the integral over M is defined as
    ∫_M f dS := ∫_U f (X(t)) √g dᵏt .        (B.15)
Symbolically this can also be written as dS = √g dᵏt.
Note that the integral on the RHS is well-defined - it is simply an integral over the parameter space
U ⊂ Rk . However, it is important to check that this definition is independent of the parametrisation
chosen. Clearly, we do not want the value of integrals to depend on how we choose to parametrise the
manifold. Consider a re-parametrisation t̃ = t̃(t) and the transformation rules
    √g̃ = det(∂t/∂ t̃) √g ,    dᵏ t̃ = det(∂ t̃/∂t) dᵏt ,        (B.16)
where the former equation follows from (B.14) and the latter is simply the transformation formula for
integrals in Rk . Then we learn that
    dS̃ = √g̃ dᵏ t̃ = √g dᵏt = dS ,        (B.17)
so the integral is indeed independent of the parametrisation. Essentially, this is the reason for including
the factor √g in the measure - its transformation cancels the transformation under coordinate change, as
is evident from Eq. (B.16).
Exercise B.3. Parametrise the upper half circle in R2 in two different ways (by an angle and by the x
coordinate) and check that the integral over the half-circle is the same for these two parametrisations.
15 For simplicity we focus on the case where the manifold is, up to a set of measure zero, given by a single chart. If multiple
charts come into play a further technical twist, referred to as partition of unity, is required.
For Gram’s determinant we find
    g = det(gab ) = |∂X/∂t1 |² |∂X/∂t2 |² − (∂X/∂t1 · ∂X/∂t2 )² = |∂X/∂t1 × ∂X/∂t2 |² = |N|² ,        (B.21)
and, hence, the measure for surfaces in R3 can be written as
dS = |N| dt1 dt2 . (B.22)
What if the surface is given not by a parametrisation but as the zero locus of a function f , so that
M = {(x, y, z) ∈ R3 | f (x, y, z) = 0}? In this case, we can solve (at least locally where ∂f /∂z ≠ 0, from the
implicit function theorem A.7) the equation f (x, y, z) = 0 for z and obtain z = h(x, y) for some function
h. This provides us with a parametrisation of the surface in terms of t1 = x and t2 = y, given by
    X(x, y) = ( x , y , h(x, y) )T .        (B.23)
For this parametrisation the metric and Gram’s determinant read (denoting partial derivatives by sub-
scripts, for simplicity)
    (gab ) = [ 1 + hx²   hx hy ; hx hy   1 + hy² ]   ⇒   g = 1 + hx² + hy² .        (B.24)
Since f (x, y, h(x, y)) = 0 (as z = h(x, y) is a solution) it follows, by applying the chain rule that
    fx + fz hx = 0  ⇒  hx = −fx /fz ,    fy + fz hy = 0  ⇒  hy = −fy /fz .        (B.25)
These equations allow us to re-write Gram’s determinant and the measure in terms of the function f :
    g = |∇f |²/|fz |²   ⇒   dS = (|∇f |/|fz |) dx dy = dx dy/|n3 | ,    n = ∇f /|∇f | .        (B.26)
This is a known result from year 1 but note that it applies to a rather special case - the general (and much
more symmetric) formula is the one given in Def. (B.3).
As a final (and more explicit) application, let us compute the measure on a two-sphere S 2 with the
standard parametrisation X(θ, ϕ) = (sin θ cos ϕ, sin θ sin ϕ, cos θ)T , where θ ∈ [0, π] and ϕ ∈ [0, 2π). With
the two tangent vectors
    ∂X/∂θ = ( cos θ cos ϕ , cos θ sin ϕ , − sin θ )T ,    ∂X/∂ϕ = ( − sin θ sin ϕ , sin θ cos ϕ , 0 )T ,        (B.27)
the metric and Gram’s determinant are
    (gab ) = [ 1   0 ; 0   sin²θ ]   ⇒   g = sin²θ .        (B.28)
As a result we find the well-known measure
dS = sin θ dθ dϕ (B.29)
for the integration over S 2 .
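As a quick numerical sanity check, integrating this measure over the parameter range reproduces the area 4π of the unit two-sphere; a short numpy sketch:

    import numpy as np

    # numerical check that the measure dS = sin(theta) dtheta dphi of Eq. (B.29)
    # reproduces the area of the unit two-sphere, integral dS = 4*pi
    theta = np.linspace(0, np.pi, 2001)
    area = np.trapz(np.sin(theta), theta) * 2*np.pi   # the phi integration just gives a factor 2*pi
    print(area, 4*np.pi)                              # 12.566..., 12.566...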
Exercise B.4. Consider an ellipsoid in R3 with half-axes a, b and c and parametrisation X(θ, ϕ) =
(a sin θ cos ϕ, b sin θ sin ϕ, c cos θ)T . Work out the corresponding measure dS.
B.4 Laplacian
Suppose we have a k-dimensional manifold M ⊂ Rn and a chart X : U → M and we denote the parameters
by t = (t1 , . . . , tk )T ∈ U , as usual. Suppose we choose another set of coordinates t̃ = (t̃1 , . . . , t̃k )T = T (t)
on T (U ) such that
    gab = (∂ t̃c /∂ta ) (∂ t̃d /∂tb ) δcd ,        (B.30)
that is, relative to the coordinates t̃a , the metric is δcd . In those coordinates, the Laplacian is of the
standard form
    ∆ = Σ_{a=1}^{k} ∂²/∂ t̃a ² .        (B.31)
We would like to define a notion of a Laplacian ∆X on M which means re-expressing the Laplacian in
terms of the original coordinates ta .
Definition B.4. Given the above set-up, the Laplacian ∆X , associated to the chart X : U → M , is
defined by
∆X (f ◦ T ) := ∆(f ) ◦ T , (B.32)
where f : Rn → R are twice continuously differentiable functions.
Note that this is quite a natural definition. The composition f ◦ T is a function on the parameter space U
and we define the action of the Laplacian ∆X on this function by the action of the ordinary (Cartesian)
Laplacian, ∆, on f followed by a composition with T , to make this a function on U . While the above
definition seems natural it is not particularly practical for computations. To this end, we have the following
The right-hand sides of Eqs. (B.34) and (B.35) are equal for any ṽ and given we are dealing with continuous
functions this means
    √g ∆X ũ = ∂/∂ta ( √g g^{ab} ∂ ũ/∂tb ) ,        (B.36)
which is the desired result.
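Eq. (B.36) is straightforward to evaluate with a computer algebra system. For the two-sphere with the metric (B.28), the following sympy sketch (variable names are arbitrary) reproduces the familiar Laplacian on S²:

    import sympy as sp

    theta, phi = sp.symbols('theta phi')
    u = sp.Function('u')(theta, phi)

    g = sp.Matrix([[1, 0], [0, sp.sin(theta)**2]])   # metric of S^2 from Eq. (B.28)
    ginv = g.inv()
    sqrtg = sp.sin(theta)                            # sqrt(det g) = sin(theta) >= 0 on [0, pi]
    coords = [theta, phi]

    # Eq. (B.36): sqrt(g) Delta_X u = d/dt^a ( sqrt(g) g^{ab} du/dt^b )
    lap = sum(sp.diff(sqrtg * ginv[a, b] * sp.diff(u, coords[b]), coords[a])
              for a in range(2) for b in range(2)) / sqrtg
    print(sp.simplify(sp.expand(lap)))
    # expected: d^2u/dtheta^2 + cot(theta) du/dtheta + (1/sin(theta)^2) d^2u/dphi^2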
C Differential forms
Differential forms are another classical subject of Mathematical Physics which cannot be covered in the
main part of this course. This appendix is a no-nonsense guide to differential forms and a chance to read up
on the subject without having to deal with excessive mathematical overhead. Many physical theories can
be elegantly formulated in terms of differential forms, including Classical Mechanics and Electrodynamics.
Differential forms also provide a unifying perspective on the subject of “vector calculus” which leads to
a deeper understanding of the many ad-hoc objects - such as divergence, curl etc. - which have been
introduced in this context. Our treatment here is basic in that we focus on differential forms on Rn and
sub-manifolds thereof, in line with our basic treatment of manifolds in Appendix B. (Ref. [14] contains a
more advanced treatment of differential forms.)
In particular, every tangent space Tp U has a dual vector space Tp∗ U (which consists of linear functionals
Tp U → R), called the co-tangent space. The elements of the co-tangent space are called co-tangent vectors.
So attached to every point p ∈ U we have two vector spaces, the tangent space Tp U and its dual, the
co-tangent space Tp∗ U . We are now ready to define differential one-forms.
Stated less formally, a differential one-form w provides us with a co-tangent vector wp at every point
p ∈ U . Recall that such a co-tangent vector is a functional on the tangent space, so it provides a map
wp : Tp U → R and for a tangent vector v ∈ Tp U we have wp (v) ∈ R.
C.1.2 The total differential
So far this sounds fairly abstract and appears to be of little use but some light is shed on the matter by
the following
Definition C.2. (Total differential) For a differentiable function f : U → R the total differential df is a
one-form defined by
    dfp (v) := ∇fp · v = Σ_{i=1}^{n} (∂f /∂xi )|p v^i ,        (C.1)
where v = Σ_{i=1}^{n} v^i ei ∈ Tp U .
Exercise C.1. Convince yourself that the total differential defined in Def. C.2 is indeed a differential
one-form.
An important class of examples is provided by the coordinate functions xi : U → R, defined by
    xi (p) := pi ,        (C.2)
where p = (p1 , . . . , pn )T , which assign to each point p ∈ U its ith coordinate pi . (By a slight abuse of
notation we have used the same symbol for the coordinate xi and the coordinate function.) To understand
what the total differentials dxi of the coordinate functions are we act on the basis vectors ei of the tangent
space.
dxi |p (ej ) = ∇xi |p · ej = ei · ej = δji ⇒ dxi |p (v) = v i . (C.3)
This is precisely the defining relation for a dual basis 16 and we learn that (dxi |p ) is the basis of Tp∗ U dual
to the basis (ei ) of Tp U . Hence, we have the following
Proposition C.1. The total differentials (dx1 |p , . . . , dxn |p ) of the coordinate functions form a basis of
the co-tangent space Tp∗ U . Hence, every differential one-form w and every total differential df on U can
be written as
    w = Σ_{i=1}^{n} wi dx^i ,    df = Σ_{i=1}^{n} (∂f /∂xi ) dx^i ,        (C.4)
where wi : U → R are functions.
Proof. We have already shown the first part of this statement, that is, that (dx1 |p , . . . , dxn |p ) is indeed a
basis of Tp∗ U . This means for every one-form w and every point p ∈ U we can write wp = Σ_{i=1}^{n} wi (p) dx^i |p ,
for some suitable coefficients wi (p). Dropping the argument p gives the first Eq. (C.4). To show the second
Eq. (C.4) work out
    df |p (v) = ∇f |p · v = Σ_i (∂f /∂xi )|p v^i = Σ_i (∂f /∂xi )|p dx^i |p (v) ,        (C.5)
where Eq. (C.3) has been used in the last step. Dropping the arguments v and p leads to the second
Eq. (C.4).
16 It is sometimes said that dxi represents a “small interval attached to xi ”. Obviously, this statement is not really correct
given how we have just defined differential forms. Generously interpreted, the “small interval” view of dxi can be thought of
as an imprecise way of stating the content of the equation on the right in (C.3). This equation says that dxi |p , when acting
on a tangent v ∈ Tp M (and it is this tangent vector which should really be seen as the displacement from p), gives its ith
component v i .
This proposition gives us a clear idea what differential one-forms are and how to write them down in
practice. All we need is n functions wi : U → R and we can write down a differential form using Eq. (C.4).
In particular, we see that differential one-forms w on U are in one-to-one correspondence with vector fields
A : U → Rn via
    w = Σ_{i=1}^{n} wi dx^i   ←→   A = (w1 , . . . , wn )T .        (C.6)
This is our first hint that differential forms provide us with a new way to think about vector calculus.
Under the identification (C.6), total differentials df correspond to vector fields ∇f which are given as
gradients of functions. This means that total differentials correspond to conservative vector fields.
The representation (C.4) of differential forms can be used to define their properties in terms of prop-
erties of the constituent functions wi .
Definition C.3. A one-form w = Σ_{i=1}^{n} wi dx^i on U is called differentiable (continuously differentiable, k
times continuously differentiable) if all functions wi : U → R are differentiable (continuously differentiable,
k times continuously differentiable).
we have
    w|X(t) ( dX/dt |X(t) ) = Σ_{i,j} wi (X(t)) dx^i |X(t) (ej ) dX^j/dt |X(t) = Σ_i wi (X(t)) dX^i/dt |X(t) = A(X(t)) · dX/dt |X(t) ,
where the duality property (C.3) has been used in the second last step and we have used the identifica-
tion (C.6) of one-forms and vector fields in the last step. Hence, the integral of w over X can also be
written as
    ∫_X w = Σ_{i=1}^{n} ∫_a^b wi (X(t)) dX^i/dt |X(t) dt = ∫_a^b A(X(t)) · dX/dt |X(t) dt .        (C.10)
This relation shows that the integral of a one-form over a curve is nothing else but the “line-integral” of
the corresponding vector field.
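As an illustration, Eq. (C.10) can be evaluated numerically for a simple example, say the one-form w = −y dx + x dy (an arbitrary choice, corresponding to the vector field A = (−y, x)T ) integrated over the unit circle; a minimal numpy sketch:

    import numpy as np

    # integrate w = -y dx + x dy over the unit circle X(t) = (cos t, sin t), t in [0, 2*pi]
    t = np.linspace(0, 2*np.pi, 4001)
    X = np.array([np.cos(t), np.sin(t)])
    dXdt = np.array([-np.sin(t), np.cos(t)])
    A = np.array([-X[1], X[0]])                        # A evaluated along the curve
    integral = np.trapz(np.sum(A * dXdt, axis=0), t)   # Eq. (C.10)
    print(integral, 2*np.pi)                           # 6.283..., 6.283...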
For the integral of a total differential we have
    ∫_X df = ∫_a^b ∇f (X(t)) · dX/dt |X(t) dt = ∫_a^b d/dt ( f (X(t)) ) dt = f (X(b)) − f (X(a)) ,        (C.11)
which says that the curve integral of a total differential df equals the difference of the values of f at the
endpoints of the curve. That is, the value of such integrals only depends on the endpoints but not on
the path taken. In particular, integrals of total differentials over closed curves (curves with X(a) = X(b))
vanish. These statements are of course equivalent to the corresponding statements for conservative vector
fields.
So far this does not appear to be overly useful. All we seem to have done is to describe vector fields in
a different way. However, this was just a preparation for introducing higher order differential forms and
this is where things get interesting. Before we can do this we need a bit of mathematical preparation.
Definition C.5. (Alternating k-forms) An alternating k-form w on a vector space V over F is a map
w : V ⊗k → F which is
(i) linear in each of its k arguments, so w(. . . , αv + βw, . . .) = αw(. . . , v, . . .) + βw(. . . , w, . . .)
(ii) alternating in the sense that w(. . . , v, . . . , w, . . .) = −w(. . . , w, . . . , v, . . .)
(The dots indicate arguments which remain unchanged.) The vector space of alternating k-forms over V
is denoted Λk V ∗ , where k = 1, 2, . . .. We also define Λ0 V ∗ = F .
In short, alternating k-forms take k vector arguments to produce a scalar and they are linear in each
argument and completely antisymmetric under the exchange of arguments. It is clear from the definition
that alternating one-forms are linear functionals, so Λ1 V ∗ = V ∗ , and in this sense we have set up a
generalisation of the dual vector space.
Definition C.6. (Wedge or outer product) Consider functionals ϕ1 , . . . , ϕk ∈ V ∗ . Then, the k-form
ϕ1 ∧ . . . ∧ ϕk ∈ Λk V ∗ is defined by
    (ϕ1 ∧ . . . ∧ ϕk )(v1 , . . . , vk ) := det [ ϕ1 (v1 ) · · · ϕ1 (vk ) ; . . . ; ϕk (v1 ) · · · ϕk (vk ) ] ,        (C.12)
that is, the determinant of the k × k matrix with entries ϕi (vj ).
It is clear from the linearity of the functionals ϕi as well as the linearity and anti-symmetry of the
determinant that ϕ1 ∧ . . . ∧ ϕk as defined in Eq. (C.12) is indeed a k-form. It is also easy to see from the
properties of the determinant that calculating with the wedge product is subject to the following rules (α
and β are scalars):
The first two rules follow from linearity and anti-symmetry of the determinant, respectively, the third one
is a direct consequence of the second.
Proof. Let ε1 , . . . , εn be the basis of V with dual basis ε1∗ , . . . , εn∗ , so that εi∗ (εj ) = δ^i_j . To prove linear
independence consider
    Σ_{i1 <···<ik } λ_{i1 ...ik } ε^{i1 ∗} ∧ . . . ∧ ε^{ik ∗} = 0        (C.17)
and act with this equation on (εj1 , . . . , εjk ), where j1 < · · · < jk . It follows immediately that λ_{j1 ...jk} = 0.
To show that these k-forms span the space start with an arbitrary w ∈ Λk V ∗ . Define the numbers
c_{i1 ...ik} := w(εi1 , . . . , εik ) and the alternating k-form
    w̃ := Σ_{j1 <···<jk } c_{j1 ...jk } ε^{j1 ∗} ∧ . . . ∧ ε^{jk ∗} .        (C.18)
Then it follows that w̃(εi1 , . . . , εik ) = c_{i1 ...ik} = w(εi1 , . . . , εik ) and since w̃ and w agree on a basis we have
w = w̃ and have, hence, written w as a linear combination (C.18) of our basis forms.
The above Lemma gives us a clear idea of how to write down alternating k-forms. Once we have a
basis εi∗ of the dual space V ∗ we can use the wedge product to construct a basis ε^{i1 ∗} ∧ . . . ∧ ε^{ik ∗} , where
1 ≤ i1 < . . . < ik ≤ n, of Λk V ∗ , the space of alternating k-forms. Then any k-form w can be written as
    w = Σ_{i1 <···<ik } w_{i1 ...ik } ε^{i1 ∗} ∧ . . . ∧ ε^{ik ∗} = (1/k!) Σ_{i1 ,...,ik } w_{i1 ...ik } ε^{i1 ∗} ∧ . . . ∧ ε^{ik ∗} ,        (C.19)
where the coefficients wi1 ...ik ∈ F are completely anti-symmetric in the k indices. Also note that there are
no non-trivial alternating k-forms for k > n = dim(V ). In this case the wedge product will always involve
at least two same basis vectors and must vanish from Eq. (C.15). Of course the wedge product generalises
to arbitrary alternating forms by linearity. Specifically, consider an alternating k-form w as in Eq. (C.19)
and an alternating l-forms ν given by
    ν = (1/l!) Σ_{j1 ,...,jl } ν_{j1 ...jl } ε^{j1 ∗} ∧ . . . ∧ ε^{jl ∗} ,        (C.20)
Their wedge product is an alternating k + l-form defined by
    w ∧ ν := (1/(k! l!)) Σ_{i1 ,...,ik ,j1 ,...,jl } w_{i1 ...ik } ν_{j1 ...jl } ε^{i1 ∗} ∧ . . . ∧ ε^{ik ∗} ∧ ε^{j1 ∗} ∧ . . . ∧ ε^{jl ∗} .        (C.21)
where ν i are the basis forms for Λ2 V ∗ defined in the above table. In conclusion, we see that the
wedge-product of two one-forms leads to a two-form which can be expressed in terms of the cross
product. Alternating k-forms in three dimensions are the proper context within which to formulate
the cross product and we can see why the cross product only appears in three dimensions. Only in
this case do the spaces Λ1 V ∗ and Λ2 V ∗ have the same dimension, so that both one-forms and two-forms
can be interpreted as three-dimensional vectors.
To summarise our discussion, over each n-dimensional vector space V , we now have a “tower” of vector
spaces
Λ0 V ∗ = F , Λ1 V ∗ = V ∗ , Λ2 V ∗ , · · · , Λn V ∗ (C.24)
consisting of alternating k-forms for k = 0, 1, 2, . . . , n. In our definition of differential one-forms we have
used Λ1 V ∗ = V ∗ (where the tangent spaces Tp U have assumed the role of V ). Now we can go further and
arrange the whole tower of vector spaces (C.24) over each tangent space V = Tp U . As we will see, doing
this leads to higher-order differential forms.
Definition C.7. (Differential k-forms) A differential k-form is a map w : U → Λk T ∗ U := ∪_{p∈U} Λk Tp∗ U ,
with p 7→ wp , such that wp ∈ Λk Tp∗ U .
From this definition, a differential k-form w provides an alternating k-form wp over Tp U for every point
p ∈ U which can act on k tangent vectors v1 , . . . , vk ∈ Tp U to give a number wp (v1 , . . . , vk ) ∈ R.
It is now quite easy to write down the general expression for a differential k-form. From our discussion
of differential one-forms we know that the differentials dx1p , . . . , dxnp form a basis of the co-tangent space
Tp∗ U at p ∈ U . Combining this with what we have said about alternating k-forms in the previous
subsection (see Prop. C.2) shows that the k-forms dxip1 ∧ . . . ∧ dxipk , where 1 ≤ i1 < . . . < ik ≤ n, form a
basis of Λk Tp∗ U . Hence, dropping the point p, a differential k-form can be written as
    w = (1/k!) Σ_{i1 ,...,ik } w_{i1 ...ik } dx^{i1} ∧ . . . ∧ dx^{ik} ,        (C.25)
where the wi1 ...ik : U → R are functions, labelled by a completely anti-symmetric set of indices, i1 , . . . , ik .
We can define properties of a differential k-form w in terms of the functions wi1 ...ik : U → R, just as we
have done in the case of differential one-forms (see Def. C.3). For example, we say that w is differentiable
iff all the functions wi1 ...ik are. The space of infinitely many times differentiable k-forms on U is denoted
by Ωk (U ). Note that we can generalise the wedge product and define the k + l-differential form w ∧ ν,
where w and ν are differential k- and l-forms respectively, in complete analogy with what we have done
for alternating forms in Eq. (C.21).
where we have worked out the total derivatives dwi1 ···ik of the functions wi1 ···ik explicitly in the second
step. Recall that the total differential of a function f : U → R is a one-form df given by
    df = Σ_i (∂f /∂xi ) dx^i .        (C.27)
The exterior derivative satisfies a Leibniz-type rule. If w is a k-form and ν is an l-form then
d(w ∧ ν) = dw ∧ ν + (−1)k w ∧ dν . (C.28)
Another simple but important property of the exterior derivative is
Proposition C.3. d2 = 0
Proof. This statement follows from straightforward calculation.
    dw = (1/k!) Σ_{i,i1 ,...,ik } (∂w_{i1 ...ik } /∂xi ) dx^i ∧ dx^{i1} ∧ . . . ∧ dx^{ik}
    d²w = d(dw) = (1/k!) Σ_{i,j,i1 ,...,ik } (∂²w_{i1 ...ik } /∂xi ∂xj ) dx^j ∧ dx^i ∧ dx^{i1} ∧ . . . ∧ dx^{ik} = 0
The last equality follows because the second partial derivative is symmetric in (i, j) while dxj ∧ dxi is
anti-symmetric.
In view of this result it is useful to introduce the following terminology.
Definition C.8. A differential k-form w is called closed iff dw = 0. It is called exact iff there exists a
differential (k − 1)-form ν such that w = dν.
An immediate consequence of Prop. C.3 is that every exact differential form is also closed. The converse is
not always true but it is under certain conditions and this is formulated in a statement known as Poincaré’s
Lemma (see, for example, Ref. [10]). In general, the failure of closed forms to be exact is an important
phenomenon which is related to the topology of the manifold and leads into an area of mathematics known
as algebraic topology. This is captured by the sequence
    0 → Ω0 (U ) —d0→ Ω1 (U ) —d1→ · · · —d_{n−1}→ Ωn (U ) —dn→ 0 ,        (C.29)
where we have attached an index to the exterior derivative d to indicate which degree form it acts on.
The property d2 = 0 from Prop. C.3 then translates into dk ◦ dk−1 = 0, that is, successive maps in the se-
quence (C.29) compose to zero. This makes the sequence (C.29) into what is called a complex: a sequence
of vector spaces related by maps such that adjacent maps compose to zero. The relation dk ◦ dk−1 = 0
implies that Im(dk−1 ) ⊂ Ker(dk ) (another, fancier way of saying every exact form is closed) and this
allows us to define the cohomology of the space U by H k (U ) = Ker(dk )/Im(dk−1 ). If the space U is
such that Poincaré’s Lemma applies, so that every closed form is exact, then Im(dk−1 ) = Ker(dk ) and the
cohomology groups are trivial, that is, H k (U ) = {0}. Conversely, non-trivial cohomology groups indicate
there are closed but non-exact forms, that Poincaré’s Lemma does not apply and that we have a non-trivial
manifold topology. Pursuing this further is well beyond our present scope (see, for example, Ref. [14]).
Let us take a practical approach and work out differential forms and the exterior derivative more explicitly
for an example.
Recall that in three dimensions we only have differential k-forms for k = 0, 1, 2, 3 and no higher. With
the notation ds = (dx1 , dx2 , dx3 )T , dS = (dx2 ∧ dx3 , dx3 ∧ dx1 , dx1 ∧ dx2 )T and dV = dx1 ∧ dx2 ∧ dx3
their general form is
    w0 = f ,    w1 = A · ds ,    w2 = B · dS ,    w3 = f dV ,
where f : U → R is a function and A, B : U → R3 are vector fields.
These are the general expressions for differential forms in three dimensions and they also show
that such forms can be identified with well-known objects in three-dimensional vector calculus. Specifically,
zero-forms and three-forms correspond to (real-valued) functions while one-forms and two-forms cor-
respond to vector fields. (These identifications become more complicated in higher dimensions.)
Now that we have a correspondence between three-dimensional differential forms and objects in
vector calculus it is natural to ask about the meaning of the exterior derivative in this context. We
begin with the exterior derivative of a zero-form which is given by
    dw0 = df = Σ_{i=1}^{3} (∂f /∂x^i ) dx^i = (∇f ) · ds .        (C.31)
Hence, the exterior derivative acting on three-dimensional zero forms corresponds to the gradient of
the associated function. What about one-forms?
    dw1 = Σ_{i=1}^{3} dAi ∧ dx^i = Σ_{i,j=1}^{3} (∂Ai /∂x^j ) dx^j ∧ dx^i
        = (∂A3 /∂x^2 − ∂A2 /∂x^3 ) dx^2 ∧ dx^3 + (∂A1 /∂x^3 − ∂A3 /∂x^1 ) dx^3 ∧ dx^1 + (∂A2 /∂x^1 − ∂A1 /∂x^2 ) dx^1 ∧ dx^2
        = (∇ × A) · dS .        (C.32)
Evidently, the exterior derivative of a three-dimensional one-form corresponds to the curl of the
associated vector field. Finally, for a differential two-form we have
    dw2 = Σ_{i=1}^{3} dBi ∧ dS^i = Σ_{i,j=1}^{3} (∂Bi /∂x^j ) dx^j ∧ dS^i = (∇ · B) dV ,        (C.33)
and, hence, its exterior derivative corresponds to the divergence of the associated vector field. There
are no non-trivial four-forms in three dimensions so for the exterior derivative of a three-form we
have, of course
dw3 = 0 . (C.34)
These results can be summarised as follows. For three-dimensional differential forms the exterior
derivative acts as follows
    0 → Ω0 (U ) —d→ Ω1 (U ) —d→ Ω2 (U ) —d→ Ω3 (U ) —d→ 0 ,        (C.35)
where we recall that Ωk (U ) denotes all (infinitely many times differentiable) k-forms on U . The same
diagram but expressed in the language of three-dimensional vector calculus (see also Eq. (A.17)) reads:
    0 → C∞ (U ) —grad=∇→ V(U ) —curl=∇×→ V(U ) —div=∇·→ C∞ (U ) → 0 .        (C.36)
The somewhat strange and ad-hoc differential operators of three-dimensional vector calculus are
perfectly natural when understood from the viewpoint of differential forms. They are simply all
versions of the exterior derivative. It is well-known in three-dimensional vector calculus that ∇ ×
(∇f ) = 0 and ∇ · (∇ × B) = 0, that is, carrying out adjacent maps in the diagram (C.36) one after the
other gives zero. From the point of view of differential forms, these equations are just manifestations
of the general property d2 = 0 in Prop. C.3. From Def. C.8, an exact differential one-form w1 is one
that can be written as w1 = df , where f is a function. For the corresponding vector field A the same
property is called “conservative”, meaning the vector field can be written as A = ∇f .
In essence, vector calculus is the calculus of differentials forms written in different (and some might
say, awkward) notation.
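The identities ∇ × (∇f ) = 0 and ∇ · (∇ × B) = 0, i.e. d² = 0 in vector-calculus language, can also be confirmed symbolically; a minimal sympy sketch with generic functions:

    import sympy as sp

    x, y, z = sp.symbols('x y z')
    f = sp.Function('f')(x, y, z)
    A = [sp.Function('A1')(x, y, z), sp.Function('A2')(x, y, z), sp.Function('A3')(x, y, z)]

    def grad(h):
        return [sp.diff(h, v) for v in (x, y, z)]

    def curl(V):
        return [sp.diff(V[2], y) - sp.diff(V[1], z),
                sp.diff(V[0], z) - sp.diff(V[2], x),
                sp.diff(V[1], x) - sp.diff(V[0], y)]

    def div(V):
        return sp.diff(V[0], x) + sp.diff(V[1], y) + sp.diff(V[2], z)

    print([sp.simplify(c) for c in curl(grad(f))])   # [0, 0, 0]  <->  d(df) = 0
    print(sp.simplify(div(curl(A))))                 # 0          <->  d(dw1) = 0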
Exercise C.2. Write the vector fields A = (x2 , y, z 3 )T and B = (y, −x, 0)T on R3 as differential one-
forms and use the exterior derivative to calculate their curl. Next, write A and B as differential two-forms
and use the exterior derivative to compute their divergence.
Exercise C.3. Repeat the above discussion of differential forms in three dimensions for the case of dif-
ferential forms in four dimensions.
Consider, for example, the one-forms ω = x dy + y dx and ν = x dy − y dx on R2 and µ = x dx + y dy + z dz
on R3 . Are these one-forms closed? A simple calculation based on the definition, Eq. (C.26), of the exterior
derivative gives
dω = dx ∧ dy + dy ∧ dx = 0
dν = dx ∧ dy − dy ∧ dx = 2dx ∧ dy
dµ = dx ∧ dx + dy ∧ dy + dz ∧ dz = 0 .
It follows that ω and µ are closed, while ν is not closed. Poincaré’s Lemma applies on Rn so this
means that ω and µ must also be exact, that is, we must be able to find functions f and g such
that ω = df and µ = dg. It is easy to verify, using Eq. (C.27), that suitable functions are given by
f (x, y) = xy and g(x, y, z) = (x2 + y 2 + z 2 )/2.
17 If we have a non-trivial metric gij we have to be a bit more careful. In this case we can define the Levi-Civita tensor as
ε^{i1 ···in} = (1/√det(g)) ε̂^{i1 ···in} where ε̂^{i1 ···in} is the “pure number” epsilon. Then Eq. (C.38) remains valid with the understanding
that indices on the ε tensor are lowered with gij .
Application 3.46. Maxwell’s equations with differential forms
Maxwell’s equations contain three vector fields, the electric field E, the magnetic field B and the cur-
rent density J, plus a scalar, the charge density ρ. We know from our discussion of three-dimensional
differential forms above that three-dimensional vector fields can be represented by either one-forms
or two-forms, so we have a choice. It turns out it is convenient (that is, it makes the equations look
simpler) if we write E and J as one-forms and B as a two-form, that is,
E := E · ds , B := B · dS , J := J · ds . (C.40)
It is also useful to work out the Hodge star of these fields. It follows from its definition (C.38) that
    ⋆E = E · dS ,    ⋆B = B · ds .        (C.42)
Using the relations between curl/divergence and the exterior derivative in Eqs. (C.32) and (C.33) we
have
dE = (∇ × E) · dS , dB = (∇ · B)dV , d† E = ∇ · E , d† B = (∇ × B) · ds . (C.43)
Now we are ready to convert Maxwell’s equations. In vector calculus notation they are given by
    ∇ · E = 4πρ
    ∇ × B − (1/c) Ė = (4π/c) J
    ∇ × E + (1/c) Ḃ = 0        (C.44)
    ∇ · B = 0 ,
where the dot stands for the time derivative ∂/∂t and c is the speed of light. Multiplying these equations
with 1, ds, dS and dV , respectively, and using Eqs. (C.43) they are easily converted to
    d† E = 4πρ
    d† B − (1/c) Ė = (4π/c) J
    dE + (1/c) Ḃ = 0        (C.45)
    dB = 0 .
This is by no means the most elegant form of Maxwell’s equations. By using differential forms in
three dimensions we are treating the three spatial dimensions and time on different footing.
It is much more natural to think of differential forms in four dimensions with coordinates (xµ ) =
(x0 = t, xi ) and basis differentials (dxµ ) = (dx0 = dt, dxi ). Then, the fields E and B can be combined
into a four-dimensional two-form F (called the field-strength tensor), and the charge density ρ and
the current J into a four-dimensional one-form j (called the four-current). More explicitly, these
quantities are defined by (for simplicity, setting c = 1 from now on)
1
.F = E ∧ dt + B =: Fµν dxµ ∧ dxν , j = ρ dt + J =: jµ dxµ , (C.46)
2
where the second equalities define the components Fµν and jµ . Maxwell’s equations (C.45) in three-
dimensional form can be converted into the language of four-dimensional differential forms and be
written in terms of F and j.
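It can be useful to see these objects in components (the following spelling-out is my own addition; it assumes the standard identification dS = (dy ∧ dz, dz ∧ dx, dx ∧ dy), that is, dSi = (1/2) εijk dxj ∧ dxk). Reading off from Eq. (C.46),

Fi0 = Ei = −F0i ,    Fij = εijk Bk ,    j0 = ρ ,    ji = Ji ,

so the antisymmetric array Fµν collects the electric and magnetic fields into a single four-dimensional object.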
To do this we have to be careful to distinguish between operations on three-dimensional and
four-dimensional differential forms. The exterior derivative in four dimensions is denoted by d =
dxµ ∂µ ∧ while we now denote its three-dimensional counterpart by d₃ = dxi ∂i ∧. Likewise, the four-
dimensional Hodge star ⋆ is defined by Eq. (C.38) (using as metric the Lorentz metric ηµν) with the
four-dimensional Levi-Civita tensor, while its three-dimensional counterpart, now denoted by ⋆₃, is
defined by Eq. (C.38) with the three-dimensional Levi-Civita tensor. For a three-dimensional k-form
w we then have ⋆(dt ∧ w) = ⋆₃w and ⋆w = (−1)^{k+1} dt ∧ ⋆₃w. Using these rules, together with the
product rule (C.28) for differential forms, we find
F = E ∧ dt + B ,                      ⋆F = −⋆₃E + dt ∧ ⋆₃B
dF = (d₃E + Ḃ) ∧ dt + d₃B ,           d†F = (d†₃E) dt + d†₃B − Ė          (C.47)
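As a check of the expression for dF (these intermediate steps are my own, using dw = dt ∧ ẇ + d₃w for a time-dependent three-dimensional form w, the product rule (C.28) and d(dt) = 0):

dF = d(E ∧ dt) + dB = (dt ∧ Ė + d₃E) ∧ dt + dt ∧ Ḃ + d₃B = (d₃E + Ḃ) ∧ dt + d₃B ,

where the last step uses dt ∧ Ė ∧ dt = 0 and dt ∧ Ḃ = Ḃ ∧ dt for the two-form Ḃ. The expression for d†F can be verified along similar lines.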
Comparing the expressions for dF and d†F in (C.47) with the three-dimensional version of Maxwell’s
equations (C.45) (and remembering that d in those equations is d₃ in our new notation and that we
have set c = 1) we find that, equivalently, they can be written as
d† F = 4πj , dF = 0 . (C.48)
Eqs. (C.48) are referred to as the covariant form of electro-magnetism; covariant here refers to
Lorentz transformations. If we transform covariant and contravariant tensors as we normally do in
Special Relativity (for example dxµ ↦ Λµ ν dxν or Fµν ↦ Λµ ρ Λν σ Fρσ) then expressions with all indices
contracted (between upper and lower indices) are Lorentz-invariant. This means that F = (1/2) Fµν dxµ ∧
dxν is Lorentz-invariant, as are j = jµ dxµ and d = dxµ ∂µ ∧. It follows that Maxwell’s equations in the
form (C.48) are expressed entirely in terms of Lorentz-invariant quantities and are, therefore, Lorentz-
invariant themselves. In other words, Maxwell’s theory is Lorentz-invariant and, hence, compatible
with Special Relativity. This was not obvious in the three-dimensional formulation (C.45) but it is
manifest in the four-dimensional one (C.48).
Covariant electro-magnetism in the form (C.48) is already quite elegant but it can be simplified
further. Note that the second Eq. (C.48) states that F is closed. This means if the conditions of
Poincaré’s theorem are satisfied (and, in particular, locally) we can write F = dA for a one-form
A = Aµ dxµ . This one-form is called the vector potential or gauge field. From Eq. (C.46), the field
strength F contains the electric and the magnetic fields and is, hence, the physical field. On the other
hand, the vector potential A contains unphysical degrees of freedom, that is, different potentials A
can lead to the same F. This can be seen by changing A by a gauge transformation which is defined
as
A ↦ A′ = A + dλ , (C.49)
where λ is a function. Under such a gauge transformation the field strength is unchanged since
F ↦ F′ = dA′ = dA + d²λ = dA = F .
Note that this is a direct consequence of the property d² = 0 of the exterior derivative, see Prop. C.3.
In conclusion, since A and A′, related by a gauge transformation (C.49), lead to the same field strength
tensor F, they describe the same physics.
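A quick symbolic check of this statement (my own addition using sympy; A0,…,A3 and lam below are arbitrary placeholder functions): in components Fµν = ∂µ Aν − ∂ν Aµ, and shifting Aµ by ∂µ λ leaves all components unchanged.

import sympy as sp

t, x, y, z = sp.symbols('t x y z')
X = (t, x, y, z)

A   = [sp.Function(f'A{m}')(*X) for m in range(4)]    # placeholder components A_mu of the gauge field
lam = sp.Function('lam')(*X)                           # arbitrary gauge function lambda
Ap  = [A[m] + sp.diff(lam, X[m]) for m in range(4)]    # gauge-transformed components A'_mu

def field_strength(a):
    # F_{mu nu} = d_mu a_nu - d_nu a_mu
    return sp.Matrix(4, 4, lambda m, n: sp.diff(a[n], X[m]) - sp.diff(a[m], X[n]))

print(sp.simplify(field_strength(Ap) - field_strength(A)))   # zero matrix: F unchanged, reflecting d^2 lambda = 0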
The gauge transformation (C.49) can be used to choose a vector potential which satisfies an
additional condition (without affecting any of the physics as encoded in the gauge-invariant F). One
such condition, referred to as Lorentz gauge, is
d† A = 0 . (C.50)
Why can this be done? Suppose that A does not satisfy the Lorentz gauge condition (C.50). Then
we perform a gauge transformation (C.49) and demand that the new (physically equivalent) gauge
field A′ satisfies the Lorentz condition, that is, d† A′ = 0. This can be accomplished if the gauge
transformation is carried out with a function λ which satisfies d† dλ = ∆λ = −d† A, that is, with a λ
which satisfies a certain Laplace equation.ᵃ
What form do Maxwell’s equations (C.48) take when expressed in terms of the gauge field A?
Since F = dA and d2 = 0, the second, homogeneous Maxwell equation in (C.48) is automatically
satisfied. The first, inhomogeneous Maxwell equation (C.48) becomes
d† dA = 4πj   ──(d† A = 0)──→   ∆A = 4πj . (C.51)
That is, expressed in terms of the gauge field A and choosing the Lorentz gauge (C.50) electro-
magnetism is described by the single equation ∆A = 4πj. It does not get any simpler.
ᵃ Note that for zero-forms λ we have d† λ = 0 and, hence, from Eq. (C.39), d† dλ = ∆λ.
The integral over differential k-forms relates to the exterior derivative in an interesting way.
This relation is the content of the general Stokes’s theorem: for a (k − 1)-form w on U ⊆ Rⁿ and a
k-dimensional manifold M ⊂ U with boundary ∂M (equipped with the induced orientation) we have
∫_{∂M} w = ∫_M dw . (C.53)
Proof. A proof can be found in analysis textbooks, for example, Ref. [10].
Stokes’s theorem as above is very general and powerful. In particular, it contains the integral theorems
you have heard about in year one as special cases.
For a one-form w = A · ds on U ⊆ R³, where A is a three-dimensional vector field, the exterior derivative is
dw = (∇ × A) · dS . (C.54)
For a surface S ⊂ U with boundary curve ∂S, Eq. (C.53) therefore turns into ∫_{∂S} A · ds = ∫_S (∇ × A) · dS.
This is, of course, the integral theorem in three dimensions also known as Stokes’s theorem (in the narrow
sense).
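As a concrete sanity check (my own example, not from the notes), one can verify this narrow-sense Stokes theorem with sympy for the field A = (−y, x, 0)ᵀ, taking S to be the unit disk in the z = 0 plane with upward normal and ∂S the counter-clockwise unit circle:

import sympy as sp

x, y, z, r, th = sp.symbols('x y z r theta')

A = sp.Matrix([-y, x, 0])
curlA = sp.Matrix([sp.diff(A[2], y) - sp.diff(A[1], z),
                   sp.diff(A[0], z) - sp.diff(A[2], x),
                   sp.diff(A[1], x) - sp.diff(A[0], y)])        # = (0, 0, 2)

# left-hand side: line integral of A . ds around the unit circle (counter-clockwise)
circle  = {x: sp.cos(th), y: sp.sin(th), z: 0}
tangent = sp.Matrix([-sp.sin(th), sp.cos(th), 0])               # dr/d(theta)
lhs = sp.integrate((A.subs(circle).T * tangent)[0], (th, 0, 2*sp.pi))

# right-hand side: flux of curl A through the disk, with dS = z-hat r dr d(theta)
disk = {x: r*sp.cos(th), y: r*sp.sin(th), z: 0}
rhs = sp.integrate(curlA[2].subs(disk) * r, (r, 0, 1), (th, 0, 2*sp.pi))

print(lhs, rhs)    # 2*pi 2*pi : both sides agree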
Similarly, for a two-form w = B · dS on U ⊆ R³ we have
dw = (∇ · B) dV . (C.56)
For a three-dimensional region V ⊂ U with boundary surface ∂V, Eq. (C.53) then reproduces the
divergence (Gauss) theorem ∫_{∂V} B · dS = ∫_V (∇ · B) dV.
Stokes’s theorem also has a two-dimensional version. To see this, we start with a one-form w built from
the components of A = (A1, A2)ᵀ, where we can think of A as a vector field in two dimensions. A quick
calculation of the exterior derivative dw gives
dw = (∂A1/∂y − ∂A2/∂x) da . (C.60)
For a two-dimensional manifold V ⊂ U ⊂ R² with bounding curve ∂V Stokes’s theorem (C.53) then
becomes
∫_{∂V} A · ds = ∫_V (∂A1/∂y − ∂A2/∂x) da . (C.61)
These considerations generalise to n dimensions. On U ⊆ Rⁿ we introduce the (n − 1)-forms dS^i,
proportional to dx^1 ∧ · · · ∧ d̂x^i ∧ · · · ∧ dx^n, where the hat over dx^i indicates that this differential
should be omitted from the wedge product. Hence, the dS^i are wedge products of all the dx^j, except
for j = i. A differential (n − 1)-form w on U can then be written as
w = B · dS ⇒ dw = (∇ · B) dV , (C.62)
where B = (B1, . . . , Bn)ᵀ is an n-dimensional vector field and ∇ · B = Σ_{i=1}^n ∂Bi/∂xi is the n-dimensional
version of the divergence. With an n-dimensional manifold V ⊂ U ⊂ Rⁿ, bounded by an (n − 1)-dimensional
surface ∂V, Stokes’s theorem (C.53) then gives the n-dimensional version of the divergence theorem,
∫_{∂V} B · dS = ∫_V (∇ · B) dV.
Exercise C.6. Derive a version of Stokes’s theorem in four dimensions which relates integrals of a two-
form over surfaces S ⊂ R4 to integrals of a one-form over the boundary curve ∂S.
D Literature
The references below do not provide a comprehensive list - there is a large number of mathematics and
physics books relevant to the subject of mathematical methods in physics and I suggest an old-fashioned
trip to the library. The books below have been useful in preparing these lectures.
[1] K. F. Riley, M. P. Hobson and S. J. Bence, “Mathematical Methods for Physics and
Engineering”, CUP 2002.
This is the recommended book for the mathematics component of the physics course. As the title
suggests this is a “hands-on” book, strong on explaining methods and concrete applications, rather
weaker on presenting a coherent mathematical exposition.
[2] Albert Messiah, “Quantum Mechanics”, Courier Corporation, 2014.
A comprehensive physics book on quantum mechanics, covering the more formal aspects of the subject
as well as applications, with a number of useful appendices, including on special functions and on
group theory.
[3] John David Jackson, “Classical Electrodynamics”, Wiley, 2012.
For me the ultimate book on electrodynamics. In addition to a comprehensive coverage of the subject
and many physical applications it also explains many of the required mathematical methods informally
but efficiently.
[4] F. Constantinescu, “Distributions and Their Applications in Physics”, Elsevier, 2017.
Exactly what the title says. Contains a lot more on distributions than we were able to cover in these
lectures, including a much more comprehensive discussion of the various types of test function spaces
and distributions and the topic of Green functions for many linear operators relevant to physics.
[5] Bryan P. Rynne and Martin A. Youngson, “Linear Functional Analysis”, Springer 2008.
A nice comprehensive treatment of functional analysis which gets to the point quickly but which is
not particularly explicit about applications. A good source to learn some of the basics.
[6] Ole Christensen, “Functions, Spaces and Expansions”, Springer 2010.
A book on functional analysis from a more applied point of view, starting with some of the basic
mathematics and then focusing on systems of ortho-normal functions and their applications. A good
book to understand some of the basic mathematics and the practicalities of dealing with ortho-normal
systems - less formal than Rynne/Youngson.
[7] Francesco Giacomo Tricomi, “Serie ortogonali di funzioni”.
The hardcore book on ortho-normal systems of functions from the Italian master. Sadly, I haven’t been
able to find an English version. The book contains a very nice treatment of orthogonal polynomials,
among many other things, which is a lot more interesting than the stale account found in so many
other books. Chapter 4 of these lectures follows the logic of this book.
[8] Serge Lang, “Real and Functional Analysis”, Springer 1993.
A high quality mathematics book covering analysis and functional analysis at an advanced level.
[9] Michela Petrini, Gianfranco Pradisi, Alberto Zaffaroni, “A Guide to Mathematical
Methods for Physicists: With Problems and Solutions”, World Scientific, 2017.
A nice book covering many of the main pieces of mathematics crucial to physics, including complex
functions, integration theory and functional analysis, taking the mathematics seriously but without
being overly formal.
[10] Serge Lang, “Undergraduate Analysis”, Springer 1997.
A very nice book on the subject, not overly formal and very useful to fill in inevitable mathematical
gaps.
[13] William Fulton and Joe Harris, “Representation Theory: A First Course”, Springer
2013.
An excellent book, covering both finite groups and Lie groups, but somewhat advanced.
[14] Mikio Nakahara, “Geometry, Topology and Physics”, Taylor and Francis, 2013.
An excellent textbook, written for physicists, but at a more advanced level, discussing topology,
differential geometry and how it relates to physics. This is not really relevant for the main part of
this course but it gives a more advanced account of the material explained in Appendices B and C. If
you are interested in this aspect of mathematical physics (and you would perhaps like to learn about
the mathematics underlying General Relativity) this is the book for you.