
Topics in Linear and Nonlinear

Functional Analysis
Gerald Teschl

Graduate Studies
in Mathematics
Volume (to appear)

American Mathematical Society


Providence, Rhode Island
Gerald Teschl
Fakultät für Mathematik
Oskar-Morgenstern-Platz 1
Universität Wien
1090 Wien, Austria

E-mail: [email protected]
URL: http://www.mat.univie.ac.at/~gerald/

2010 Mathematics Subject Classification. 46-01, 46E30, 47H10, 47H11, 58Exx, 76D05

Abstract. This manuscript provides a brief introduction to linear and
nonlinear Functional Analysis. It covers basic Hilbert and Banach space theory
as well as some advanced topics like operator semigroups, mapping degrees,
and fixed point theorems.

Keywords and phrases. Functional Analysis, Banach space, Hilbert space,
operator semigroup, mapping degree, fixed point theorem, differential equation,
Navier–Stokes equation.

Typeset by AMS-LaTeX and Makeindex.


Version: March 19, 2023
Copyright © 1998–2022 by Gerald Teschl
Contents

Preface vii

Part 1. Functional Analysis


Chapter 1. A first look at Banach and Hilbert spaces 3
§1.1. Introduction: Linear partial differential equations 3
§1.2. The Banach space of continuous functions 8
§1.3. The geometry of Hilbert spaces 19
§1.4. Completeness 26
§1.5. Compactness 27
§1.6. Bounded operators 30
§1.7. Sums and quotients of Banach spaces 36
§1.8. Spaces of continuous and differentiable functions 41
Chapter 2. Hilbert spaces 45
§2.1. Orthonormal bases 45
§2.2. The projection theorem and the Riesz representation theorem 52
§2.3. Operators defined via forms 55
§2.4. Orthogonal sums and tensor products 61
§2.5. Applications to Fourier series 63
Chapter 3. Compact operators 71
§3.1. Compact operators 71
§3.2. The spectral theorem for compact symmetric operators 74
§3.3. Applications to Sturm–Liouville operators 83


§3.4. Estimating eigenvalues 91


§3.5. Singular value decomposition of compact operators 94
§3.6. Hilbert–Schmidt and trace class operators 98
Chapter 4. The main theorems about Banach spaces 107
§4.1. The Baire theorem and its consequences 107
§4.2. The Hahn–Banach theorem and its consequences 116
§4.3. Reflexivity 122
§4.4. The adjoint operator 126
§4.5. Weak convergence 132
Chapter 5. Bounded linear operators 143
§5.1. Banach algebras 143
§5.2. The C∗ algebra of operators and the spectral theorem 154
§5.3. Spectral measures 157

Part 2. Advanced Functional Analysis


Chapter 6. More on convexity 165
§6.1. The geometric Hahn–Banach theorem 165
§6.2. Convex sets and the Krein–Milman theorem 169
§6.3. Weak topologies 174
§6.4. Beyond Banach spaces: Locally convex spaces 179
§6.5. Uniformly convex spaces 186
Chapter 7. Advanced spectral theory 191
§7.1. Spectral theory for compact operators 191
§7.2. Fredholm operators 199
§7.3. The Gelfand representation theorem 206
Chapter 8. Unbounded operators 213
§8.1. Closed operators 213
§8.2. Spectral theory for unbounded operators 223
§8.3. Reducing subspaces and spectral projections 228
§8.4. Relatively bounded and relatively compact operators 231
§8.5. Unbounded Fredholm operators 236

Part 3. Nonlinear Functional Analysis


Chapter 9. Analysis in Banach spaces 243

§9.1. Single variable calculus in Banach spaces 243


§9.2. Multivariable calculus in Banach spaces 246
§9.3. Minimizing nonlinear functionals via calculus 258
§9.4. Minimizing nonlinear functionals via compactness 264
§9.5. Contraction principles 272
§9.6. Ordinary differential equations 276
§9.7. Bifurcation theory 283
Chapter 10. The Brouwer mapping degree 289
§10.1. Introduction 289
§10.2. Definition of the mapping degree and the determinant formula 291
§10.3. Extension of the determinant formula 295
§10.4. The Brouwer fixed point theorem 302
§10.5. Kakutani’s fixed point theorem and applications to game
theory 304
§10.6. Further properties of the degree 307
§10.7. The Jordan curve theorem 310
Chapter 11. The Leray–Schauder mapping degree 311
§11.1. The mapping degree on finite dimensional Banach spaces 311
§11.2. Compact maps 312
§11.3. The Leray–Schauder mapping degree 314
§11.4. The Leray–Schauder principle and the Schauder fixed point
theorem 316
§11.5. Applications to integral and differential equations 318
Chapter 12. Monotone maps 325
§12.1. Monotone maps 325
§12.2. The nonlinear Lax–Milgram theorem 327
§12.3. The main theorem of monotone maps 328
Appendix A. Some set theory 331
Appendix B. Metric and topological spaces 339
§B.1. Basics 339
§B.2. Convergence and completeness 345
§B.3. Functions 349
§B.4. Product topologies 351
§B.5. Compactness 354

§B.6. Separation 361


§B.7. Connectedness 364
§B.8. Continuous functions on metric spaces 368
Bibliography 377
Glossary of notation 379
Index 383
Preface

The present manuscript was written for my course Functional Analysis given
at the University of Vienna in winter 2004 and 2009. The second part consists
of the notes for my course Nonlinear Functional Analysis held at the University
of Vienna in summer 1998, 2001, and 2018. The two parts are essentially
independent. In particular, the first part does not assume any knowledge from
measure theory (at the expense of hardly mentioning Lp spaces). However,
there is an accompanying part on Real Analysis [37], where these topics are
covered.
It is updated whenever I find some errors and extended from time to
time. Hence you might want to make sure that you have the most recent
version, which is available from
http://www.mat.univie.ac.at/~gerald/ftp/book-fa/
Please do not redistribute this file or put a copy on your personal
webpage but link to the page above.

Goals

The main goal of the present book is to give students a concise introduc-
tion which gets to some interesting results without much ado while using a
sufficiently general approach suitable for further studies. Still I have tried
to always start with some interesting special cases and then work my way
up to the general theory. While this unavoidably leads to some duplications,
it usually provides much better motivation and implies that the core ma-
terial always comes first (while the more general results are then optional).
Moreover, this book is not written under the assumption that it will be


read linearly starting with the first chapter and ending with the last. Con-
sequently, I have tried to separate core and optional materials as much as
possible while keeping the optional parts as independent as possible.
Furthermore, my aim is not to present an encyclopedic treatment but to
provide the reader with a versatile toolbox for further study. Moreover, in
contradistinction to many other books, I do not have a particular direction
in mind and hence I am trying to give a broad introduction which should
prepare you for diverse fields such as spectral theory, partial differential
equations, or probability theory. This is related to the fact that I am working
in mathematical physics, an area where you never know what mathematical
theory you will need next.
I have tried to keep a balance between verbosity and clarity in the sense
that I have tried to provide sufficient detail for being able to follow the argu-
ments but without drowning the key ideas in boring details. In particular,
you will find a "show this" from time to time encouraging the reader to check
the claims made (these tasks typically involve only simple routine calcula-
tions). Moreover, to make the presentation student friendly, I have tried
to include many worked-out examples within the main text. Some of them
are standard counterexamples pointing out the limitations of theorems (and
explaining why the assumptions are important). Others show how to use the
theory in the investigation of practical examples.

Preliminaries

The present manuscript is intended to be gentle when it comes to required


background. Of course I assume basic familiarity with analysis (real and
complex numbers, limits, differentiation, basic (Riemann) integration, open
sets) and linear algebra (finite dimensional vector spaces, matrices).
Apart from these natural assumptions I also expect some familiarity with
metric spaces and point set topology. However, only a few basic things are
required to begin with. This and much more is collected in the Appendix
and I will refer you there from time to time such that you can refresh your
memory should the need arise. Moreover, you can always go there if you are
unsure about a certain term (using the extensive index) or if there should
be a need to clarify notation or conventions. I prefer this over referring you
to several other books which might not always be readily available. For
convenience, the Appendix contains full proofs in case one needs to fill some
gaps. As some things are only outlined (or outsourced to exercises), it will
require extra effort in case you see all this for the first time.
On the other hand I do not assume familiarity with Lebesgue integration
and consequently Lp spaces will only be briefly mentioned as the completion

of continuous functions with respect to the corresponding integral norms in


the first part. I am aware that this is a decision one might dispute, however,
it has some evident advantages. In particular, the examples frequently dis-
cuss discrete versions of classical topics thereby not only avoiding Lebesgue
integration as a prerequisite but also avoiding technical difficulties which hide
the main ideas. Readers familiar with measure theory should then have no
problems handling the continuous case. At a few places I also assume some
basic results from complex analysis but it will be sufficient to just believe
them.
The second part of course requires basic familiarity with functional anal-
ysis and measure theory (Lebesgue and Sobolev spaces). But apart from this
it is again independent from the first two parts.

Content

Below follows a short description of each chapter together with some


hints which parts can be skipped.
Chapter 1. The first part starts with Fourier’s treatment of the heat
equation which led to the theory of Fourier analysis as well as the develop-
ment of spectral theory which drove much of the development of functional
analysis around the turn of the last century. In particular, the first chap-
ter tries to introduce and motivate some of the key concepts and should be
covered in detail except for Section 1.8 which introduces some interesting
examples for later use.
Chapter 2 discusses basic Hilbert space theory and should be considered
core material except for the last section discussing applications to Fourier
series. They will only be used in some examples and could be skipped in
case they are covered in a different course.
Chapter 3 develops basic spectral theory for compact self-adjoint op-
erators. The first core result is the spectral theorem for compact symmetric
(self-adjoint) operators which is then applied to Sturm–Liouville problems.
Of course this application could be skipped, but this would reduce the didac-
tical concept to absurdity. Nevertheless it is clearly possible to shorten the
material as none of it (including the follow-up section which touches upon
some more tools from spectral theory) will be required in later chapters.
The last two sections on singular value decompositions as well as Hilbert–
Schmidt and trace class operators cover important topics for applications,
but will again not be required later on.
Chapter 4 discusses what is typically considered as the core results
from Banach space theory. In order to keep the topological requirements to
a minimum some advanced topics are shifted to the following chapters.

Chapter 5 develops spectral theory for bounded self-adjoint operators


via the framework of C∗ algebras. The last section contains some optional
results establishing the connection with the measure theoretic formulation
of the spectral theorem.
The next chapters contain selected advanced topics.
Chapter 6 centers around convexity. Except for the geometric Hahn–
Banach theorem, which is a prerequisite for the other sections, the remaining
sections are independent of each other to simplify the selection of topics.
Chapter 7 presents some advanced topics from spectral theory: The
Gelfand representation theorem, spectral theory for compact operators in
Banach spaces and Fredholm theory. Again these sections are independent
of each other except for the fact that Section 7.1, which contains the spec-
tral theorem for compact operators, and hence the Fredholm alternative for
compact perturbations of the identity, is of course used to identify compact
perturbations of the identity as premier examples of Fredholm operators.
Chapter 8 touches upon unbounded operators starting with the basic
results about closed operators. Since unbounded operators play an increasing
role in applications I felt it is appropriate to discuss at least some basics.
Finally, there is a part on nonlinear functional analysis.
Chapter 9 discusses analysis in Banach spaces (with a view towards
applications in the calculus of variations and infinite dimensional dynamical
systems).
Chapters 10 and 11 cover degree theory and fixed point theorems in
finite and infinite dimensional spaces. Several applications to integral equa-
tions, ordinary differential equations, and to the stationary Navier–Stokes
equation are given.
Chapter 12 provides some basics about monotone maps.
Sometimes also the historic development of the subject is of interest. This
is, however, not covered in the present book; we refer to [21, 32, 33] as
good starting points.

To the teacher

There are a couple of courses to be taught from this book. First of


all there is of course a basic functional analysis course: Chapters 1 to 4
(skipping some optional material as discussed above) and perhaps adding

some material from Chapter 5 or 6. If one wants to cover Lebesgue spaces,


this can be easily done by including Chapters 1, 2, and 3 from [37]. In this
case one could cover Section 1.2 (Section 1.1 contains just motivation), give
an outline of Section 1.3 (by covering Dynkin’s π-λ theorem, the uniqueness
theorem for measures, and then quoting the existence theorem for Lebesgue
measure), cover Section 1.5. The core material from Chapter 2 are the
first two sections and from Chapter 3 the first three sections. I think that
this gives a well-balanced introduction to functional analysis which contains
several optional topics to choose from depending on personal preferences and
time constraints.
The remaining material from the first part could then be used for a course
on advanced functional analysis. Typically one could also add some further
topics from the second part or some material from unbounded operators in
Hilbert spaces following [36] (where one can start with Chapter 2).
The third part gives a short basis for a course on nonlinear functional
analysis.
Problems relevant for the main text are marked with a "*". A Solutions
Manual will be available electronically for instructors only.

Acknowledgments
I wish to thank my readers, Olta Ahmeti, Kerstin Ammann, Phillip Bachler,
Batuhan Bayır, Alexander Beigl, Mikhail Botchkarev, Ho Boon Suan, Peng
Du, Christian Ekstrand, Mischa Elkner, Damir Ferizović, Michael Fischer,
Raffaello Giulietti, Melanie Graf, Josef Greilhuber, Julian Grüber, Matthias
Hammerl, Jona Marie Hassenbach, Nobuya Kakehashi, Jerzy Knopik, Niko-
las Knotz, Florian Kogelbauer, Helge Krüger, Reinhold Küstner, Oliver Lein-
gang, Juho Leppäkangas, Annemarie Luger, Joris Mestdagh, Alice Mikikits-
Leitner, Claudiu Mîndrilǎ, Jakob Möller, Caroline Moosmüller, Nikola Ne-
govanovic, Matthias Ostermann, Martina Pflegpeter, Mateusz Piorkowski,
Piotr Owczarek, Fabio Plaga, Tobias Preinerstorfer, Maximilian H. Ruep,
Tidhar Sariel, Chiara Schindler, Christian Schmid, Stephan Schneider, Laura
Shou, Bertram Tschiderer, Liam Urban, Vincent Valmorin, David Wallauch,
Richard Welke, David Wimmesberger, Gunter Wirthumer, Song Xiaojun,
Markus Youssef, Rudolf Zeidler, and colleagues Pierre-Antoine Absil, Nils
C. Framstad, Fritz Gesztesy, Heinz Hanßmann, Günther Hörmann, Aleksey
Kostenko, Wallace Lam, Daniel Lenz, Johanna Michor, Viktor Qvarfordt,
Alex Strohmaier, David C. Ullrich, Hendrik Vogt, Marko Stautz, Maxim
Zinchenko who have pointed out several typos and made useful suggestions
for improvements. Moreover, I am most grateful to Iryna Karpenko who
read several parts of the manuscript, provided long lists of typos, and also

contributed some of the problems. I am also grateful to Volker Enß for mak-
ing his lecture notes on nonlinear Functional Analysis available to me.

Finally, no book is free of errors. So if you find one, or if you


have comments or suggestions (no matter how small), please let
me know.

Gerald Teschl

Vienna, Austria
January, 2019
Part 1

Functional Analysis
Chapter 1

A first look at Banach


and Hilbert spaces

Functional analysis is an important tool in the investigation of all kinds of
problems in pure mathematics, physics, biology, economics, etc. In fact, it
is hard to find a branch in science where functional analysis is not used.
The main objects are (infinite dimensional) vector spaces with different
concepts of convergence. The classical theory focuses on linear operators
(i.e., functions) between these spaces but nonlinear operators are of course
equally important. However, since one of the most important tools in investi-
gating nonlinear mappings is linearization (differentiation), linear functional
analysis will be our first topic in any case.

1.1. Introduction: Linear partial differential equations


Rather than listing an overwhelming number of classical examples I want to
focus on one: linear partial differential equations. We will use this example
as a guide throughout our first three chapters and will develop all necessary
tools for a successful treatment of our particular problem.
In his investigation of heat conduction Fourier1 was led to the (one di-
mensional) heat or diffusion equation
∂/∂t u(t, x) = ∂²/∂x² u(t, x).  (1.1)
Here u : [0, ∞) × [0, 1] → R is the temperature distribution in a thin rod at
time t ≥ 0 at the point x ∈ [0, 1]. It is usually assumed that the temperature

1Joseph Fourier (1768–1830), French mathematician and physicist


at x = 0 and x = 1 is fixed, say u(t, 0) = α and u(t, 1) = β. By considering


u(t, x) → u(t, x)−α−(β−α)x it is clearly no restriction to assume α = β = 0.
Moreover, the initial temperature distribution u(0, x) = u0 (x) is assumed to
be known as well.
Since finding the solution seems at first sight unfeasible, we could try to
find at least some solutions of (1.1). For example, we could make an ansatz
for u(t, x) as a product of two functions, each of which depends on only one
variable, that is,
u(t, x) := w(t)y(x). (1.2)
Plugging this ansatz into the heat equation we arrive at
ẇ(t)y(x) = y ′′ (x)w(t), (1.3)
where the dot refers to differentiation with respect to t and the prime to
differentiation with respect to x. Bringing all t-dependent terms to the left
and all x-dependent terms to the right, we obtain
ẇ(t)/w(t) = y′′(x)/y(x).  (1.4)
Accordingly, this ansatz is called separation of variables.
Now if this equation should hold for all t and x, the quotients must be
equal to a constant −λ (we choose −λ instead of λ for convenience later on).
That is, we are led to the equations
−ẇ(t) = λw(t) (1.5)
and
−y ′′ (x) = λy(x), y(0) = y(1) = 0, (1.6)
which can easily be solved. The first one gives
w(t) = a e−λt (1.7)
and the second one
y(x) = b cos(√λ x) + c sin(√λ x).  (1.8)
Here a, b, c are arbitrary real constants and since we are only interested in
the product w y, we can choose a = 1 without loss of generality. Moreover,
y(x) must also satisfy the boundary conditions y(0) = y(1) = 0. The first
one y(0) = 0 is satisfied if b = 0 and the second one yields (c = 0 only leads
to the trivial solution)
sin(√λ) = 0,  (1.9)
which holds if λ = (πn)², n ∈ N (in the case λ < 0 we get sinh(√−λ) = 0,
which cannot be satisfied and explains our choice of sign above). In summary,
we obtain the solutions
un(t, x) := cn e^{−(πn)² t} sin(nπx),  n ∈ N.  (1.10)

So we have found a large number of solutions, but we still have not dealt
with our initial condition u(0, x) = u0 (x). This can be done using the
superposition principle which holds since our equation is linear: Any finite
linear combination of the above solutions will again be a solution. Moreover,
under suitable conditions on the coefficients we can even consider infinite
linear combinations. In fact, choosing
u(t, x) := ∑_{n=1}^∞ cn e^{−(πn)² t} sin(nπx),  (1.11)

where the coefficients cn decay sufficiently fast (e.g. absolutely summable),


we obtain further solutions of our equation. Moreover, these solutions satisfy
u(0, x) = ∑_{n=1}^∞ cn sin(nπx)  (1.12)
and expanding the initial conditions into a Fourier sine series
u0(x) = ∑_{n=1}^∞ û0,n sin(nπx),  (1.13)

we see that the solution of our original problem is given by (1.11) if we choose
cn = û0,n (cf. Problem 1.2).
Of course for this last statement to hold we need to ensure that the series
in (1.11) converges and that we can interchange summation and differentia-
tion. You are asked to do so in Problem 1.1.
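The scheme above — expand u0 into a sine series and damp the n-th mode by e^{−(πn)²t} — can also be tested numerically. The following Python sketch is purely illustrative (the helper names, the midpoint-rule quadrature, and the sample initial condition u0(x) = x(1 − x) are my own choices, not part of the text):

```python
import math

def sine_coeff(u0, n, m=2000):
    # û0,n = 2 ∫_0^1 sin(nπx) u0(x) dx, approximated by the midpoint rule
    h = 1.0 / m
    return 2 * h * sum(math.sin(n * math.pi * (k + 0.5) * h) * u0((k + 0.5) * h)
                       for k in range(m))

def u(t, x, u0, N=50):
    # truncated series (1.11) with cn = û0,n
    return sum(sine_coeff(u0, n) * math.exp(-(math.pi * n) ** 2 * t)
               * math.sin(n * math.pi * x) for n in range(1, N + 1))

u0 = lambda x: x * (1 - x)   # sample initial temperature with u0(0) = u0(1) = 0

assert abs(u(0.0, 0.3, u0) - u0(0.3)) < 1e-3   # series matches the initial condition
assert u(1.0, 0.5, u0) < u(0.0, 0.5, u0)       # the temperature decays in time
```

Note the rapid decay of the factor e^{−(πn)²t} for t > 0; this is precisely what makes the interchange of summation and differentiation in Problem 1.1 work.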
In fact, many equations in physics can be solved in a similar way:
• Reaction-Diffusion equation:
∂/∂t u(t, x) − ∂²/∂x² u(t, x) + q(x)u(t, x) = 0,
u(0, x) = u0 (x),
u(t, 0) = u(t, 1) = 0. (1.14)
Here u(t, x) could be the density of some gas in a pipe and q(x) > 0 describes
that a certain amount per time is removed (e.g., by a chemical reaction).
• Wave equation:
∂²/∂t² u(t, x) − ∂²/∂x² u(t, x) = 0,
u(0, x) = u0(x),  ∂u/∂t (0, x) = v0(x),
u(t, 0) = u(t, 1) = 0.  (1.15)
Here u(t, x) is the displacement of a vibrating string which is fixed at x = 0
and x = 1. Since the equation is of second order in time, both the initial

displacement u0 (x) and the initial velocity v0 (x) of the string need to be
known.
• Schrödinger equation:2
i ∂/∂t u(t, x) = −∂²/∂x² u(t, x) + q(x)u(t, x),
u(0, x) = u0 (x),
u(t, 0) = u(t, 1) = 0. (1.16)
Here |u(t, x)|² is the probability distribution of a particle trapped in a box
x ∈ [0, 1] and q(x) is a given external potential which describes the forces
acting on the particle.
All these problems (and many others) lead to the investigation of the
following problem
Ly(x) = λy(x),  L := −d²/dx² + q(x),  (1.17)
subject to the boundary conditions
y(a) = y(b) = 0. (1.18)
Such a problem is called a Sturm–Liouville boundary value problem.3
Our example shows that we should prove the following facts about Sturm–
Liouville problems:
(i) The Sturm–Liouville problem has a countable number of eigenval-
ues En with corresponding eigenfunctions un , that is, un satisfies
the boundary conditions and Lun = En un .
(ii) The eigenfunctions un are complete, that is, any nice function u
can be expanded into a generalized Fourier series
u(x) = ∑_{n=1}^∞ cn un(x).
This problem is very similar to the eigenvalue problem of a matrix and we
are looking for a generalization of the well-known fact that every symmetric
matrix has an orthonormal basis of eigenvectors. However, our linear opera-
tor L is now acting on some space of functions which is not finite dimensional
and it is not at all clear what (e.g.) orthogonal should mean in this context.
Moreover, since we need to handle infinite series, we need convergence and
hence we need to define the distance of two functions as well.
Hence our program looks as follows:
2Erwin Schrödinger (1887–1961), Austrian physicist
3Jacques Charles François Sturm (1803–1855), French mathematician
3Joseph Liouville (1809–1882), French mathematician and engineer

• What is the distance of two functions? This automatically leads


us to the problem of convergence and completeness.
• If we additionally require the concept of orthogonality, we are led
to Hilbert spaces which are the proper setting for our eigenvalue
problem.
• Finally, the spectral theorem for compact symmetric operators will
provide the solution of our above problem.
Problem 1.1. Suppose ∑_{n=1}^∞ |cn| < ∞. Show that (1.11) is continuous
for (t, x) ∈ [0, ∞) × [0, 1] and solves the heat equation for (t, x) ∈ (0, ∞) ×
[0, 1]. (Hint: Weierstraß4 M-test. When can you interchange the order of
summation and differentiation?)

Problem 1.2. Show that for n, m ∈ N we have


2 ∫_0^1 sin(nπx) sin(mπx) dx = 1 if n = m, and 0 if n ≠ m.

Conclude that the Fourier sine coefficients are given by


û0,n = 2 ∫_0^1 sin(nπx) u0(x) dx

provided the sum in (1.13) converges uniformly. Conclude that in this case
the solution can be expressed as
u(t, x) = ∫_0^1 K(t, x, y) u0(y) dy,  t > 0,

where

K(t, x, y) := 2 ∑_{n=1}^∞ e^{−(πn)² t} sin(nπx) sin(nπy)
            = (1/2) (ϑ((x−y)/2, iπt) − ϑ((x+y)/2, iπt)).
Here
ϑ(z, τ) := ∑_{n∈Z} e^{iπn²τ + 2πinz} = 1 + 2 ∑_{n∈N} e^{iπn²τ} cos(2πnz),  Im(τ) > 0,

is the Jacobi theta function.5

4Karl Weierstrass (1815–1897), German mathematician


5Carl Gustav Jacob Jacobi (1804–1851), German mathematician
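The orthogonality relation at the heart of Problem 1.2 is easy to confirm numerically. The sketch below is an illustration only (the midpoint-rule helper `inner` is my own, hypothetical name); for the chosen step count the rule reproduces these trigonometric integrals essentially exactly:

```python
import math

def inner(n, m, steps=4000):
    # approximates 2 ∫_0^1 sin(nπx) sin(mπx) dx by the midpoint rule
    h = 1.0 / steps
    return 2 * h * sum(math.sin(n * math.pi * (k + 0.5) * h)
                       * math.sin(m * math.pi * (k + 0.5) * h)
                       for k in range(steps))

assert abs(inner(3, 3) - 1.0) < 1e-9   # n = m gives 1
assert abs(inner(2, 5)) < 1e-9         # n ≠ m gives 0
```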

1.2. The Banach space of continuous functions


Our point of departure will be the set of continuous functions C(I) on a
compact interval I := [a, b] ⊂ R. Since we want to handle both real and
complex models, we will formulate most results for the more general complex
case only. In fact, most of the time there will be no difference but we will
add a remark in the rare case where the real and complex case do indeed
differ.
One way of declaring a distance, well-known from calculus, is the max-
imum norm of a function f ∈ C(I):
∥f∥∞ := max_{x∈I} |f(x)|.  (1.19)

It is not hard to see that with this definition C(I) becomes a normed vector
space:
A normed vector space X is a vector space X over C (or R) with a
nonnegative function (the norm) ∥.∥ : X → [0, ∞) such that
• ∥f ∥ > 0 for f ∈ X \ {0} (positive definiteness),
• ∥α f ∥ = |α| ∥f ∥ for all α ∈ C, f ∈ X (positive homogeneity),
and
• ∥f + g∥ ≤ ∥f ∥ + ∥g∥ for all f, g ∈ X (triangle inequality).
If positive definiteness is dropped from the requirements, one calls ∥.∥ a
seminorm.
From the triangle inequality we also get the inverse triangle inequal-
ity (Problem 1.3)
|∥f ∥ − ∥g∥| ≤ ∥f − g∥, (1.20)
which shows that the norm is continuous.
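The maximum norm (1.19) and the inverse triangle inequality (1.20) can be sampled numerically. The sketch below is illustrative only (the uniform grid and the two sample functions are my own choices); it treats a continuous function on I = [0, 1] through its values on a fine grid:

```python
import math

GRID = [k / 1000 for k in range(1001)]  # uniform grid on I = [0, 1]

def max_norm(f):
    # discrete stand-in for ∥f∥∞ = max over x ∈ I of |f(x)|
    return max(abs(f(x)) for x in GRID)

f = lambda x: math.sin(2 * math.pi * x)
g = lambda x: x * (1 - x)

# inverse triangle inequality: |∥f∥ − ∥g∥| ≤ ∥f − g∥
lhs = abs(max_norm(f) - max_norm(g))
rhs = max_norm(lambda x: f(x) - g(x))
assert lhs <= rhs + 1e-12
```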
Also note that norms are closely related to convexity. To this end recall
that a subset C ⊆ X is called convex if for every f, g ∈ C we also have
λf + (1 − λ)g ∈ C whenever λ ∈ (0, 1). Moreover, a mapping F : C → R is
called convex if F (λf +(1−λ)g) ≤ λF (f )+(1−λ)F (g) whenever λ ∈ (0, 1)
and f, g ∈ C. In our case the triangle inequality plus homogeneity imply
that every norm is convex:
∥λf + (1 − λ)g∥ ≤ λ∥f ∥ + (1 − λ)∥g∥, λ ∈ [0, 1]. (1.21)
Moreover, choosing λ = 1/2 we get back the triangle inequality upon using
homogeneity. In particular, the triangle inequality could be replaced by
convexity in the definition.
Once we have a norm, we have a distance d(f, g) := ∥f − g∥ (in par-
ticular, every normed space is a special case of a metric space) and hence
we know when a sequence of vectors fn converges to a vector f (namely if

∥fn − f ∥ → 0, that is, for every ε > 0 there is some N such that ∥fn − f ∥ < ε
for all n ≥ N ). We will write fn → f or limn→∞ fn = f , as usual, in this
case. Moreover, a mapping F : X → Y between two normed spaces is
called continuous if for every convergent sequence fn → f from X we have
F (fn ) → F (f ) (with respect to the norm of X and Y , respectively). In
fact, the norm, vector addition, and multiplication by scalars are continuous
(Problem 1.4).
Two normed spaces X and Y are called isomorphic if there exists a lin-
ear bijection T : X → Y such that T and its inverse T −1 are continuous. We
will write X ∼= Y in this case. They are called isometrically isomorphic
if in addition, T is an isometry, ∥T (f )∥ = ∥f ∥ for every f ∈ X.
In addition to the concept of convergence, we also have the concept of
a Cauchy sequence:6 A sequence fn is Cauchy if for every ε > 0 there is
some N such that ∥fn −fm ∥ < ε for all n, m ≥ N . Of course every convergent
sequence is Cauchy but the converse might not be true in general. Hence a
normed space is called complete if every Cauchy sequence has a limit. A
complete normed space is called a Banach space.7
Example 1.1. By completeness of the real numbers, R as well as the complex
numbers C with the absolute value as norm are Banach spaces. ⋄
Example 1.2. The space ℓ1(N) of all complex-valued sequences a = (aj)_{j=1}^∞
for which the norm
∥a∥1 := ∑_{j=1}^∞ |aj|  (1.22)
is finite is a Banach space.
To show this, we need to verify three things: (i) ℓ1 (N) is a vector space,
that is, closed under addition and scalar multiplication, (ii) ∥.∥1 satisfies the
three requirements for a norm, and (iii) ℓ1 (N) is complete.
First of all, observe
∑_{j=1}^k |aj + bj| ≤ ∑_{j=1}^k |aj| + ∑_{j=1}^k |bj| ≤ ∥a∥1 + ∥b∥1

for every finite k. Letting k → ∞, we conclude that ℓ1 (N) is closed under


addition and that the triangle inequality holds. That ℓ1 (N) is closed under
scalar multiplication together with homogeneity as well as positive definiteness
are straightforward. It remains to show that ℓ1(N) is complete. Let
a^n = (a^n_j)_{j=1}^∞ be a Cauchy sequence; that is, for given ε > 0 we can find
some N such that ∥a^m − a^n∥1 ≤ ε for m, n ≥ N. This implies, in particular,
|a^m_j − a^n_j| ≤ ε for every fixed j. Thus a^n_j is a Cauchy sequence for fixed j

6Augustin-Louis Cauchy (1789–1857), French mathematician


7Stefan Banach (1892–1945), Polish mathematician

and, by completeness of C, it has a limit: aj := lim_{n→∞} a^n_j. Now consider
∑_{j=1}^k |a^m_j − a^n_j| ≤ ε and take m → ∞:
∑_{j=1}^k |aj − a^n_j| ≤ ε,  n ≥ N.

Since this holds for all finite k, we even have ∥a − a^n∥1 ≤ ε. Hence (a − a^n) ∈
ℓ1(N) and since a^n ∈ ℓ1(N), we finally conclude a = a^n + (a − a^n) ∈ ℓ1(N).
By our estimate ∥a − a^n∥1 ≤ ε for n ≥ N, our candidate a is indeed the limit
of a^n. ⋄
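The completeness argument can be made concrete with a toy example: take a = (1/j²)_{j=1}^∞ and let a^n be its truncation to the first n entries. These truncations form a Cauchy sequence in ℓ1(N) converging to a. The Python sketch below is illustrative only (a long finite section stands in for the infinite sequence); it shows the tail norm ∥a − a^n∥1 shrinking:

```python
TERMS = 10000  # finite stand-in for the infinite sequence

a = [1.0 / j ** 2 for j in range(1, TERMS + 1)]

def truncation(n):
    # a^n keeps the first n entries and pads with zeros
    return a[:n] + [0.0] * (TERMS - n)

def l1_dist(x, y):
    return sum(abs(s - t) for s, t in zip(x, y))

tails = [l1_dist(a, truncation(n)) for n in (10, 100, 1000)]
assert tails[0] > tails[1] > tails[2]   # ∥a − a^n∥1 decreases with n
assert tails[2] < 1e-3                  # the tail of ∑ 1/j² is below 1/n
```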
Example 1.3. The previous example can be generalized by considering the
space ℓp(N) of all complex-valued sequences a = (aj)_{j=1}^∞ for which the norm
∥a∥p := (∑_{j=1}^∞ |aj|^p)^{1/p},  p ∈ [1, ∞),  (1.23)

is finite. By |aj + bj|^p ≤ 2^p max(|aj|, |bj|)^p = 2^p max(|aj|^p, |bj|^p) ≤ 2^p (|aj|^p +
|bj|^p) it is a vector space, but the triangle inequality is only easy to see in the
case p = 1. (It is also not hard to see that it fails for p < 1, which explains
our requirement p ≥ 1. See also Problem 1.18.)
To prove the triangle inequality we need Young’s inequality8 (Prob-
lem 1.9)
α^{1/p} β^{1/q} ≤ α/p + β/q,  1/p + 1/q = 1,  α, β ≥ 0,  (1.24)
which implies Hölder’s inequality9
∥ab∥1 ≤ ∥a∥p ∥b∥q (1.25)
for a ∈ ℓp(N), b ∈ ℓq(N). In fact, by homogeneity of the norm it suffices to
prove the case ∥a∥p = ∥b∥q = 1. But this case follows by choosing α = |aj |p
and β = |bj |q in (1.24) and summing over all j.
Now using |aj + bj|^p ≤ |aj| |aj + bj|^{p−1} + |bj| |aj + bj|^{p−1}, we obtain from
Hölder's inequality (note (p − 1)q = p)
      ∥a + b∥_p^p ≤ ∥a∥p ∥(a + b)^{p−1}∥q + ∥b∥p ∥(a + b)^{p−1}∥q
                = (∥a∥p + ∥b∥p) ∥a + b∥_p^{p−1}.
Dividing both sides by ∥a + b∥_p^{p−1} (the case ∥a + b∥p = 0 being trivial) gives
the triangle inequality.

Hence ℓp(N) is a normed space. That it is complete can be shown as in the
case p = 1 (Problem 1.10).
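Both inequalities are easy to probe numerically on truncated sequences. The following Python snippet is a quick sanity check only (the helper name lp_norm is ours, not notation from the text):

```python
import random

def lp_norm(a, p):
    """The l^p norm (1.23) of a finite sequence a."""
    return sum(abs(x) ** p for x in a) ** (1 / p)

random.seed(0)
a = [random.uniform(-1, 1) for _ in range(50)]
b = [random.uniform(-1, 1) for _ in range(50)]

for p in (1.5, 2.0, 3.0):
    q = p / (p - 1)  # conjugate exponent, 1/p + 1/q = 1
    # Hölder's inequality (1.25): ||ab||_1 <= ||a||_p ||b||_q
    assert sum(abs(x * y) for x, y in zip(a, b)) <= lp_norm(a, p) * lp_norm(b, q) + 1e-12
    # triangle (Minkowski) inequality: ||a + b||_p <= ||a||_p + ||b||_p
    assert lp_norm([x + y for x, y in zip(a, b)], p) <= lp_norm(a, p) + lp_norm(b, p) + 1e-12
```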
The unit ball with respect to these norms in R2 is depicted in Figure 1.1.
One sees that for p < 1 the unit ball is not convex (explaining once more our
8William Henry Young (1863–1942), English mathematician
9Otto Hölder (1859–1937), German mathematician
1.2. The Banach space of continuous functions 11

Figure 1.1. Unit balls for ∥.∥p in R2 for p = 1/2, 1, 2, 4, ∞

restriction p ≥ 1). Moreover, for 1 < p < ∞ it is even strictly convex (that
is, the line segment joining two distinct points is always in the interior). This
is related to the question of equality in the triangle inequality and will be
discussed in Problems 1.15 and 1.16. ⋄
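The failure of the triangle inequality for p < 1 is visible already for the two unit vectors in R2; a minimal numerical sketch (illustration only):

```python
def quasi_norm(a, p):
    """The expression (1.23) evaluated for arbitrary p > 0; not a norm for p < 1."""
    return sum(abs(x) ** p for x in a) ** (1 / p)

p = 0.5
a, b = [1.0, 0.0], [0.0, 1.0]
lhs = quasi_norm([x + y for x, y in zip(a, b)], p)  # ||(1,1)|| = (1 + 1)^2 = 4
rhs = quasi_norm(a, p) + quasi_norm(b, p)           # 1 + 1 = 2
assert lhs > rhs  # the triangle inequality fails for p = 1/2
```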
Example 1.4. The space ℓ∞(N) of all complex-valued bounded sequences
a = (aj)_{j=1}^∞ together with the norm
      ∥a∥∞ := sup_{j∈N} |aj|                    (1.26)

is a Banach space (Problem 1.11). Note that with this definition, Hölder’s
inequality (1.25) remains true for the cases p = 1, q = ∞ and p = ∞, q = 1.
The reason for the notation is explained in Problem 1.17. ⋄
By a subspace U of a normed space X, we mean a subset which is closed
under the vector operations. If it is also closed in a topological sense, we call
it a closed subspace. In this context recall that a subset U ⊆ X is called open
if for every point f ∈ U there is a ball Bε(f) := {g ∈ X| ∥f − g∥ < ε}
contained in U. The closed sets are then defined as the complements
of open sets and one has that a set V ⊆ X is closed if and only if for
every convergent sequence fn ∈ V the limit is also in the set, limn fn ∈ V .
Warning: Some authors require subspaces to be closed.
Example 1.5. Every closed subspace of a Banach space is again a Banach
space. For example, the space c0 (N) ⊂ ℓ∞ (N) of all sequences converging to
zero is a closed subspace. In fact, if a ∈ ℓ∞ (N)\c0 (N), then lim supj→∞ |aj | =

ε > 0 and thus a + b ̸∈ c0 (N) for every b ∈ ℓ∞ (N) with ∥b∥∞ < ε. Hence the
complement of c0 (N) is open. ⋄
Now what about completeness of C(I)? A sequence of functions fn
converges to f if and only if
      lim_{n→∞} ∥f − fn∥∞ = lim_{n→∞} max_{x∈I} |f(x) − fn(x)| = 0.        (1.27)

That is, in the language of real analysis, fn converges uniformly to f. Now
let us look at the case where fn is only a Cauchy sequence. Then fn(x) is
clearly a Cauchy sequence of complex numbers for every fixed x ∈ I. In
particular, by completeness of C, there is a limit f (x) for each x. Thus we
get a limiting function f (x) := limn→∞ fn (x). Moreover, letting m → ∞ in
|fm (x) − fn (x)| ≤ ε ∀m, n > Nε , x ∈ I, (1.28)
we see
|f (x) − fn (x)| ≤ ε ∀n > Nε , x ∈ I; (1.29)
that is, fn (x) converges uniformly to f (x). However, up to this point we
do not know whether f is in our vector space C(I), that is, whether it is
continuous. Fortunately, there is a well-known result from real analysis which
tells us that the uniform limit of continuous functions is again continuous:
Fix x ∈ I and ε > 0. To show that f is continuous we need to find a δ such
that |x − y| < δ implies |f (x) − f (y)| < ε. Pick n so that ∥fn − f ∥∞ < ε/3
and δ so that |x − y| < δ implies |fn (x) − fn (y)| < ε/3. Then |x − y| < δ
implies
|f(x) − f(y)| ≤ |f(x) − fn(x)| + |fn(x) − fn(y)| + |fn(y) − f(y)| < ε/3 + ε/3 + ε/3 = ε
as required. Hence f ∈ C(I) and thus every Cauchy sequence in C(I)
converges. Or, in other words,
Theorem 1.1. Let I ⊂ R be a compact interval. Then the continuous func-
tions C(I) with the maximum norm form a Banach space.
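For contrast, pointwise convergence is not enough: by the theorem a uniform limit of continuous functions is continuous, so a sequence with a discontinuous pointwise limit cannot converge in the maximum norm. A small numerical illustration (not part of the text), using fn(x) = x^n on [0, 1]:

```python
# f_n(x) = x^n converges pointwise on [0, 1] to the discontinuous function
# which is 0 for x < 1 and 1 at x = 1, so by Theorem 1.1 the convergence
# cannot be uniform; indeed the sup-distance to the limit stays near 1.
xs = [1 - 10 ** (-k) for k in range(1, 8)]  # sample points approaching 1
for n in (10, 100, 1000):
    sup_diff = max(x ** n for x in xs)      # |f_n(x) - 0| at points x < 1
    assert sup_diff > 0.99
```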

For finite dimensional vector spaces the concept of a basis plays a crucial
role. In the case of infinite dimensional vector spaces one could define a
basis as a maximal set of linearly independent vectors (known as a Hamel
basis;10 Problem 1.8). Such a basis has the advantage that it only requires
finite linear combinations. However, the price one has to pay is that such
a basis will be way too large (typically uncountable, cf. Problems 1.7 and
4.4). Since we have the notion of convergence, we can handle countable
linear combinations and try to look for countable bases. We start with a few
definitions.
10Georg Hamel (1877–1954), German mathematician

The set of all finite linear combinations of a set of vectors {un}n∈N ⊂ X
is called the span of {un}n∈N and denoted by
      span{un}n∈N := {∑_{j=1}^m αj u_{nj} | nj ∈ N, αj ∈ C, m ∈ N}.        (1.30)

A set of vectors {un}n∈N ⊂ X is called linearly independent if every finite
subset is. If {un}_{n=1}^N ⊂ X, N ∈ N ∪ {∞}, is countable, we can throw away
all elements which can be expressed as linear combinations of the previous
ones to obtain a subset of linearly independent vectors which have the same
span.
Let N ∈ N ∪ {∞} with N ∈ N in case X is finite dimensional (and in
which case N equals the dimension of X) or N = ∞ in case X is infinite
dimensional. We will call a countable sequence of vectors (un)_{n=1}^N from X
a Schauder basis11 if every element f ∈ X can be uniquely written as a
countable linear combination of the basis elements:
      f = ∑_{n=1}^N αn un,   αn = αn(f) ∈ C,            (1.31)

where the sum has to be understood as a limit if N = ∞ (the sum is not
required to converge unconditionally and hence the order of the basis el-
ements is important). Since we have assumed the coefficients αn (f ) to be
uniquely determined, the vectors are necessarily linearly independent. More-
over, one can show that the coordinate functionals f 7→ αn (f ) are continuous
(cf. Problem 4.9). A Schauder basis and its corresponding coordinate func-
tionals u*_n : X → C, f ↦ αn(f) form a so-called biorthogonal system:
u*_m(un) = δ_{m,n}, where
      δ_{n,m} := {1 for n = m,  0 for n ≠ m}            (1.32)

is the Kronecker delta.12


Example 1.6. In a finite dimensional space every basis is also a Schauder
basis. Note that in this case continuity of the coordinate functionals is im-
mediate since linear maps on finite dimensional spaces are always continuous
(see Lemma 1.15 below). ⋄
Example 1.7. The sequence of vectors δ^n = (δ^n_m)_{m∈N}, δ^n_m := δ_{n,m}, is a Schauder
basis for the Banach space ℓp(N), 1 ≤ p < ∞.

11Juliusz Schauder (1899–1943), Polish mathematician
12Leopold Kronecker (1823–1891), German mathematician

Let a = (aj)_{j=1}^∞ ∈ ℓp(N) be given and set a^m := ∑_{n=1}^m an δ^n. Then
      ∥a − a^m∥p = (∑_{j=m+1}^∞ |aj|^p)^{1/p} → 0
since a^m_j = aj for 1 ≤ j ≤ m and a^m_j = 0 for j > m. Hence
      a = ∑_{n=1}^∞ an δ^n
and (δ^n)_{n=1}^∞ is a Schauder basis (uniqueness of the coefficients is left as an
exercise).
Note that (δ^n)_{n=1}^∞ is also a Schauder basis for c0(N) but not for ℓ∞(N) (try
to approximate a constant sequence). ⋄
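The estimate above can be mimicked numerically: the ℓ1 tails of a summable sequence shrink, while the ℓ∞ distance from a constant sequence to its truncations stays equal to one. A throwaway Python check (the helper name is ours):

```python
def tail_p_norm(term, m, N, p):
    """||a - a^m||_p = (sum_{j>m} |a_j|^p)^{1/p}, with the sum truncated at N."""
    return sum(abs(term(j)) ** p for j in range(m + 1, N + 1)) ** (1 / p)

a = lambda j: 1.0 / j ** 2  # a = (1/j^2)_j lies in l^1(N)
tails = [tail_p_norm(a, m, 100_000, 1) for m in (1, 10, 100, 1000)]
assert all(s > t for s, t in zip(tails, tails[1:]))  # ||a - a^m||_1 decreases
assert tails[-1] < 1e-2                              # ... and tends to 0

# For the constant sequence (1, 1, 1, ...) in l^infty every truncation a^m
# satisfies sup_{j>m} |1 - 0| = 1, which is why (delta^n) is not a Schauder
# basis for l^infty(N).
```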


A set whose span is dense is called total, and if we have a countable total
set, we also have a countable dense set (consider only linear combinations
with rational coefficients — show this). A normed vector space containing a
countable dense set is called separable.
Warning: Some authors use the term total in a slightly different way —
see the warning on page 125.
Example 1.8. Every Schauder basis is total and thus every Banach space
with a Schauder basis is separable (the converse puzzled mathematicians
for quite some time and was eventually shown to be false by Enflo13). In
particular, the Banach space ℓp (N) is separable for 1 ≤ p < ∞.
However, ℓ∞ (N) is not separable (Problem 1.13)! ⋄
While we will not give a Schauder basis for C(I) (Problem 1.22), we will
at least show that C(I) is separable. We will do this by showing that every
continuous function can be approximated by polynomials, a result which is
of independent interest. But first we need a lemma.
Lemma 1.2 (Smoothing). Let un be a sequence of nonnegative continuous
functions on [−1, 1] such that
      ∫_{|x|≤1} un(x)dx = 1   and   ∫_{δ≤|x|≤1} un(x)dx → 0,   δ > 0.        (1.33)
(In other words, un has mass one and concentrates near x = 0 as n → ∞.)
Then for every f ∈ C[−1/2, 1/2] which vanishes at the endpoints, f(−1/2) =
f(1/2) = 0, we have that
      fn(x) := ∫_{−1/2}^{1/2} un(x − y) f(y) dy            (1.34)

13Per Enflo (*1944), Swedish mathematician



converges uniformly to f (x).

Proof. Since f is uniformly continuous, for given ε we can find a δ < 1/2
(independent of x) such that |f(x) − f(y)| ≤ ε whenever |x − y| ≤ δ. More-
over, we can choose n such that ∫_{δ≤|y|≤1} un(y)dy ≤ ε. Now abbreviate
M := max_{x∈[−1/2,1/2]} {1, |f(x)|} and note
      |f(x) − ∫_{−1/2}^{1/2} un(x − y)f(x)dy| = |f(x)| |1 − ∫_{−1/2}^{1/2} un(x − y)dy| ≤ Mε.

In fact, either the distance of x to one of the boundary points ±1/2 is smaller
than δ and hence |f(x)| ≤ ε, or otherwise [−δ, δ] ⊂ [x − 1/2, x + 1/2] and the
difference between one and the integral is smaller than ε.
Using this, we have
      |fn(x) − f(x)| ≤ ∫_{−1/2}^{1/2} un(x − y)|f(y) − f(x)|dy + Mε
                     = ∫_{|y|≤1/2, |x−y|≤δ} un(x − y)|f(y) − f(x)|dy
                       + ∫_{|y|≤1/2, |x−y|≥δ} un(x − y)|f(y) − f(x)|dy + Mε
                     ≤ ε + 2Mε + Mε = (1 + 3M)ε,
which proves the claim. □

Note that fn will be as smooth as un, hence the title smoothing lemma.
Moreover, fn will be a polynomial if un is. The same idea is used to approx-
imate noncontinuous functions by smooth ones (of course the convergence
will no longer be uniform in this case).
Now we are ready to show:
Theorem 1.3 (Weierstraß). Let I ⊂ R be a compact interval. Then the set
of polynomials is dense in C(I).

Proof. Let f ∈ C(I) be given. By considering f(x) − f(a) − ((f(b) − f(a))/(b − a))(x − a)
it is no loss to assume that f vanishes at the boundary points. Moreover,
without restriction, we only consider I = [−1/2, 1/2] (why?).
Now the claim follows from Lemma 1.2 using the Landau kernel14
      un(x) := (1 − x^2)^n / In,

14Lev Landau (1908–1968), Soviet physicist



where (using integration by parts)
      In := ∫_{−1}^{1} (1 − x^2)^n dx = (n/(n + 1)) ∫_{−1}^{1} (1 − x)^{n−1} (1 + x)^{n+1} dx
          = · · · = (n!/((n + 1) · · · (2n + 1))) 2^{2n+1} = ((n!)^2/(2n + 1)!) 2^{2n+1}
          = n!/((1/2)(1/2 + 1) · · · (1/2 + n)).
Indeed, the first part of (1.33) holds by construction, and the second part
follows from the elementary estimate
      1/(1/2 + n) < In < 2,
which shows ∫_{δ≤|x|≤1} un(x)dx ≤ 2un(δ) < (2n + 1)(1 − δ^2)^n → 0.    □

Corollary 1.4. The monomials are total and hence C(I) is separable.
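Since the proof is constructive, the polynomial approximants can actually be computed. Here is a rough numerical sketch of the convolution (1.34) with the Landau kernel (illustration only; the quadrature, the grid, and the test function cos(πx) are ad hoc choices of ours):

```python
import math

def landau_fn(f, n, x, steps=2000):
    """f_n(x) = integral of u_n(x - y) f(y) over [-1/2, 1/2], approximated by
    a midpoint Riemann sum, with the Landau kernel u_n(t) = (1 - t^2)^n / I_n."""
    I_n = 2.0 ** (2 * n + 1) * math.factorial(n) ** 2 / math.factorial(2 * n + 1)
    h = 1.0 / steps
    total = 0.0
    for k in range(steps):
        y = -0.5 + (k + 0.5) * h
        t = x - y
        total += (1 - t * t) ** n / I_n * f(y) * h
    return total

f = lambda x: math.cos(math.pi * x)  # continuous and vanishing at x = ±1/2
xs = [i / 20 - 0.5 for i in range(21)]
errors = [max(abs(landau_fn(f, n, x) - f(x)) for x in xs) for n in (4, 16, 64)]
assert errors[0] > errors[1] > errors[2]  # the uniform error decreases with n
```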

Note that while the proof of Theorem 1.3 provides an explicit way of
constructing a sequence of polynomials fn (x) which will converge uniformly
to f (x), this method still has a few drawbacks from a practical point of
view: Suppose we have approximated f by a polynomial of degree n but our
approximation turns out to be insufficient for the intended purpose. First
of all, since our polynomial will not be optimal in general, we could try to
find another polynomial of the same degree giving a better approximation.
However, as this is by no means straightforward, it seems more feasible to
simply increase the degree. However, if we do this, all coefficients will change
and we need to start from scratch. This is in contradistinction to a Schauder
basis where we could just add one new element from the basis (and where it
suffices to compute one new coefficient).
In particular, note that this shows that the monomials are not a Schauder
basis for C(I) since the coefficients must satisfy |αn| ∥x^n∥∞ = ∥fn − fn−1∥∞ →
0 and hence the limit must be analytic on the interior of I. This observation
emphasizes that a Schauder basis is more than a set of linearly independent
vectors whose span is dense.
We will see in the next section that the concept of orthogonality resolves
these problems.
Problem* 1.3. Let X be a normed space and f, g ∈ X. Show that |∥f ∥ −
∥g∥| ≤ ∥f − g∥.
Problem* 1.4. Let X be a normed space. Show that the norm, vector
addition, and multiplication by scalars are continuous. That is, if fn → f ,
gn → g, and αn → α, then ∥fn ∥ → ∥f ∥, fn + gn → f + g, and αn gn → αg.
Problem 1.5. Let X be a normed space and g ∈ X. Show that ∥f ∥ ≤
max(∥f − g∥, ∥f + g∥).

Problem 1.6. Let X be a Banach space. Show that ∑_{j=1}^∞ ∥fj∥ < ∞ implies
that
      ∑_{j=1}^∞ fj = lim_{n→∞} ∑_{j=1}^n fj
exists. The series is called absolutely convergent in this case. Conversely,
show that a normed space is complete if every absolutely convergent series
converges.
Problem 1.7. While ℓ1 (N) is separable, it still has room for an uncountable
set of linearly independent vectors. Show this by considering vectors of the
form
aα = (1, α, α^2, . . . ),   α ∈ (0, 1).
(Hint: Recall the Vandermonde15 determinant. See Problem 4.4 for a gen-
eralization.)
Problem 1.8. A Hamel basis is a maximal set of linearly independent
vectors. Show that every vector space X has a Hamel basis {uα}α∈A. Show
that given a Hamel basis, every x ∈ X can be written as a finite linear
combination x = ∑_{j=1}^n cj u_{αj}, where the vectors u_{αj} and the constants cj
are uniquely determined. (Hint: Use Zorn's lemma, Theorem A.2, to show
existence.)
Problem* 1.9. Prove Young’s inequality (1.24). Show that equality occurs
precisely if α = β. (Hint: Take logarithms on both sides.)
Problem* 1.10. Show that ℓp (N), 1 ≤ p < ∞, is complete.
Problem* 1.11. Show that ℓ∞ (N) is a Banach space.
Problem 1.12. Is ℓ1 (N) a closed subspace of ℓ∞ (N) (with respect to the
∥.∥∞ norm)? If not, what is its closure?
Problem* 1.13. Show that ℓ∞ (N) is not separable. (Hint: Consider se-
quences which take only the value one and zero. How many are there? What
is the distance between two such sequences?)
Problem 1.14. Show that the set of convergent sequences c(N) is a Banach
space isomorphic to the set c0(N) of sequences converging to zero. (Hint:
Hilbert's hotel.)
Problem* 1.15. Show that there is equality in the Hölder inequality (1.25)
for 1 < p < ∞ if and only if either a = 0 or |bj |q = α|aj |p for all j ∈ N.
Show that we have equality in the triangle inequality for ℓ1 (N) if and only if
aj b∗j ≥ 0 for all j ∈ N (here the ‘∗’ denotes complex conjugation). Show that
15Alexandre-Théophile Vandermonde (1735–1796), French mathematician, musician and
chemist

we have equality in the triangle inequality for ℓp (N) with 1 < p < ∞ if and
only if a = 0 or b = αa with α ≥ 0.
Problem* 1.16. Let X be a normed space. Show that the following condi-
tions are equivalent.
(i) If ∥x + y∥ = ∥x∥ + ∥y∥ then y = αx for some α ≥ 0 or x = 0.
(ii) If ∥x∥ = ∥y∥ = 1 and x ̸= y then ∥λx + (1 − λ)y∥ < 1 for all
0 < λ < 1.
(iii) If ∥x∥ = ∥y∥ = 1 and x ̸= y then (1/2)∥x + y∥ < 1.
(iv) The function x ↦ ∥x∥^2 is strictly convex.
A norm satisfying one of them is called strictly convex.
Show that ℓp (N) is strictly convex for 1 < p < ∞ but not for p = 1, ∞.
Problem 1.17. Show that p0 ≤ p implies ℓp0 (N) ⊂ ℓp (N) and ∥a∥p ≤ ∥a∥p0 .
Moreover, show
      lim_{p→∞} ∥a∥p = ∥a∥∞.

Problem 1.18. Formally extend the definition of ℓp (N) to p ∈ (0, 1). Show
that ∥.∥p does not satisfy the triangle inequality. However, show that it is a
quasinormed space, that is, it satisfies all requirements for a normed space
except for the triangle inequality which is replaced by
∥a + b∥ ≤ K(∥a∥ + ∥b∥)
with some constant K ≥ 1. Show, in fact,
      ∥a + b∥p ≤ 2^{1/p−1} (∥a∥p + ∥b∥p),   p ∈ (0, 1).
Moreover, show that ∥.∥_p^p satisfies the triangle inequality in this case, but
of course it is no longer homogeneous (but at least you can get an honest
metric d(a, b) = ∥a − b∥_p^p which gives rise to the same topology). (Hint:
Show α + β ≤ (α^p + β^p)^{1/p} ≤ 2^{1/p−1} (α + β) for 0 < p < 1 and α, β ≥ 0.)
Problem 1.19. Let I be a compact interval and consider X := C(I). Which
of the following sets are subspaces of X? If yes, are they closed?
(i) monotone functions
(ii) even functions
(iii) polynomials
(iv) polynomials of degree at most k for some fixed k ∈ N0
(v) continuous piecewise linear functions
(vi) C 1 (I)
(vii) {f ∈ C(I)|f (c) = f0 } for some fixed c ∈ I and f0 ∈ R

Problem 1.20. Let I be a compact interval. Show that the set Y := {f ∈
C(I)| f(x) > 0} is open in X := C(I). Compute its closure.
Problem 1.21. Compute the closure of the following subsets of ℓ1(N):
(i) B1 := {a ∈ ℓ1(N) | ∑_{j∈N} |aj| ≤ 1}. (ii) B∞ := {a ∈ ℓ1(N) | ∑_{j∈N} |aj|^2 < ∞}.
Problem* 1.22. Show that the following set of functions is a Schauder
basis for C[0, 1]: We start with u1(t) = t, u2(t) = 1 − t and then split
[0, 1] into 2^n intervals of equal length and let u_{2^n+k+1}(t), for 1 ≤ k ≤ 2^n,
be a piecewise linear peak of height 1 supported in the k'th subinterval:
u_{2^n+k+1}(t) := max(0, 1 − |2^{n+1} t − 2k + 1|) for n ∈ N0 and 1 ≤ k ≤ 2^n.

1.3. The geometry of Hilbert spaces


So far it looks like C(I) has all the properties we want. However, there is
still one thing missing: How should we define orthogonality in C(I)? In
Euclidean space, two vectors are called orthogonal if their scalar product
vanishes, so we would need a scalar product:
Suppose H is a vector space. A map ⟨., ..⟩ : H × H → C is called a
sesquilinear form if it is conjugate linear in the first argument and linear
in the second; that is,
      ⟨α1 f1 + α2 f2, g⟩ = α1* ⟨f1, g⟩ + α2* ⟨f2, g⟩,
      ⟨f, α1 g1 + α2 g2⟩ = α1 ⟨f, g1⟩ + α2 ⟨f, g2⟩,        α1, α2 ∈ C,        (1.35)
where ‘∗’ denotes complex conjugation. A symmetric
⟨f, g⟩ = ⟨g, f ⟩∗ (symmetry)
sesquilinear form is also called a Hermitian form16 and a positive definite
⟨f, f ⟩ > 0 for f ̸= 0 (positive definite),
Hermitian form is called an inner product or scalar product. Note that
positivity already implies symmetry in the complex case (Problem 1.27).
Associated with every scalar product is a norm
      ∥f∥ := √⟨f, f⟩.                    (1.36)
Only the triangle inequality is nontrivial. It will follow from the Cauchy–
Schwarz inequality below. Until then, just regard (1.36) as a convenient
shorthand notation.
Warning: There is no common agreement whether a sesquilinear form
(scalar product) should be linear in the first or in the second argument and
different authors use different conventions.
16Charles Hermite (1822–1901), French mathematician
20 1. A first look at Banach and Hilbert spaces

The pair (H, ⟨., ..⟩) is called an inner product space. If H is complete
(with respect to the norm (1.36)), it is called a Hilbert space.17
Example 1.9. Clearly, Cn with the usual scalar product
      ⟨a, b⟩ := ∑_{j=1}^n aj* bj                (1.37)
is a (finite dimensional) Hilbert space. ⋄


Example 1.10. A somewhat more interesting example is the Hilbert space
ℓ2(N), that is, the set of all complex-valued sequences
      {(aj)_{j=1}^∞ | ∑_{j=1}^∞ |aj|^2 < ∞}                (1.38)
with scalar product
      ⟨a, b⟩ := ∑_{j=1}^∞ aj* bj.                (1.39)
That this sum is (absolutely) convergent (and thus well-defined) for a, b ∈
ℓ2(N) follows from Hölder's inequality (1.25) in the case p = q = 2.
Observe that the norm ∥a∥ = √⟨a, a⟩ is identical to the norm ∥a∥2
defined in the previous section. In particular, ℓ2(N) is complete and thus
indeed a Hilbert space. ⋄
A vector f ∈ H is called normalized or a unit vector if ∥f ∥ = 1.
Two vectors f, g ∈ H are called orthogonal or perpendicular (f ⊥ g) if
⟨f, g⟩ = 0 and parallel if one is a multiple of the other.
If f and g are orthogonal, we have the Pythagorean theorem:18
∥f + g∥2 = ∥f ∥2 + ∥g∥2 , f ⊥ g, (1.40)
which is one line of computation (do it!).
Suppose u is a unit vector. Then the projection of f in the direction of
u is given by
f∥ := ⟨u, f ⟩u, (1.41)
and f⊥ , defined via
f⊥ := f − ⟨u, f ⟩u, (1.42)
is perpendicular to u since ⟨u, f⊥ ⟩ = ⟨u, f − ⟨u, f ⟩u⟩ = ⟨u, f ⟩ − ⟨u, f ⟩⟨u, u⟩ =
0.

17David Hilbert (1862–1943), German mathematician


18 Pythagoras (c. 570–c. 495 BC), ancient Ionian Greek philosopher

[Figure: decomposition of f into the component f∥ parallel to the unit vector u and the component f⊥ perpendicular to u.]

Taking any other vector parallel to u, we obtain from (1.40)
      ∥f − αu∥^2 = ∥f⊥ + (f∥ − αu)∥^2 = ∥f⊥∥^2 + |⟨u, f⟩ − α|^2        (1.43)
and hence f∥ is the unique vector parallel to u which is closest to f.
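The decomposition f = f∥ + f⊥ and the minimality statement (1.43) are easy to verify numerically, say in C^3 with the usual scalar product (conjugate linear in the first argument, as in our convention). A quick sketch (illustration only):

```python
import random

def inner(a, b):
    """Scalar product on C^n, conjugate-linear in the first argument."""
    return sum(x.conjugate() * y for x, y in zip(a, b))

def norm(a):
    return inner(a, a).real ** 0.5

random.seed(1)
f = [complex(random.random(), random.random()) for _ in range(3)]
u = [complex(random.random(), random.random()) for _ in range(3)]
nu = norm(u)
u = [x / nu for x in u]  # normalize u to a unit vector

f_par = [inner(u, f) * x for x in u]        # f_parallel, cf. (1.41)
f_perp = [x - y for x, y in zip(f, f_par)]  # f_perp, cf. (1.42)

assert abs(inner(u, f_perp)) < 1e-12  # f_perp is perpendicular to u
# Pythagorean theorem (1.40):
assert abs(norm(f) ** 2 - norm(f_par) ** 2 - norm(f_perp) ** 2) < 1e-12
# (1.43): among all multiples of u, f_par is the one closest to f
for _ in range(200):
    alpha = complex(random.uniform(-2, 2), random.uniform(-2, 2))
    assert norm([x - alpha * y for x, y in zip(f, u)]) >= norm(f_perp) - 1e-9
```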
As a first consequence we obtain the Cauchy–Bunyakovsky–Schwarz19
inequality:
Theorem 1.5 (Cauchy–Bunyakovsky–Schwarz). Let H0 be an inner product
space. Then for every f, g ∈ H0 we have
|⟨f, g⟩| ≤ ∥f ∥ ∥g∥ (1.44)
with equality if and only if f and g are parallel.

Proof. It suffices to prove the case ∥g∥ = 1. But then the claim follows
from ∥f∥^2 = |⟨g, f⟩|^2 + ∥f⊥∥^2. □

We will follow common practice and refer to (1.44) simply as Cauchy–
Schwarz inequality. Note that the Cauchy–Schwarz inequality implies that
the scalar product is continuous in both variables; that is, if fn → f and
gn → g, we have ⟨fn , gn ⟩ → ⟨f, g⟩.
As another consequence we infer that the map ∥.∥ is indeed a norm. In
fact,
∥f + g∥2 = ∥f ∥2 + ⟨f, g⟩ + ⟨g, f ⟩ + ∥g∥2 ≤ (∥f ∥ + ∥g∥)2 . (1.45)
But let us return to C(I). Can we find a scalar product which has the
maximum norm as associated norm? Unfortunately the answer is no! The
reason is that the maximum norm does not satisfy the parallelogram law
(Problem 1.26).
Theorem 1.6 (Jordan–von Neumann20). A norm is associated with a scalar
product if and only if the parallelogram law
∥f + g∥2 + ∥f − g∥2 = 2∥f ∥2 + 2∥g∥2 (1.46)
19Viktor Bunyakovsky (1804–1889), Russian mathematician
19Hermann Schwarz (1843 –1921), German mathematician
20Pascual Jordan (1902–1980), German theoretical and mathematical physicist
20John von Neumann (1903–1957), Hungarian-American mathematician, physicist, computer
scientist, and engineer

holds.
In this case the scalar product can be recovered from its norm by virtue
of the polarization identity
      ⟨f, g⟩ = (1/4) (∥f + g∥^2 − ∥f − g∥^2 + i∥f − ig∥^2 − i∥f + ig∥^2).        (1.47)
Proof. If an inner product space is given, verification of the parallelogram
law and the polarization identity is straightforward (Problem 1.27).
To show the converse, we define
      s(f, g) := (1/4) (∥f + g∥^2 − ∥f − g∥^2 + i∥f − ig∥^2 − i∥f + ig∥^2).
Then s(f, f) = ∥f∥^2 and s(f, g) = s(g, f)* are straightforward to check.
Moreover, another straightforward computation using the parallelogram law
shows
      s(f, g) + s(f, h) = 2 s(f, (g + h)/2).
Now choosing h = 0 (and using s(f, 0) = 0) shows s(f, g) = 2 s(f, g/2) and thus
s(f, g) + s(f, h) = s(f, g + h). Furthermore, by induction we infer (m/2^n) s(f, g) =
s(f, (m/2^n) g); that is, α s(f, g) = s(f, αg) for a dense set of positive rational
numbers α. By continuity (which follows from continuity of the norm) this
holds for all α ≥ 0, and together with s(f, −g) = −s(f, g), respectively s(f, ig) =
i s(f, g), this finishes the proof. □

In the case of a real Hilbert space, the polarization identity of course
simplifies to ⟨f, g⟩ = (1/4)(∥f + g∥^2 − ∥f − g∥^2).
Note that the parallelogram law and the polarization identity even hold
for sesquilinear forms (Problem 1.27).
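Both the parallelogram law and the polarization identity can be checked mechanically for the scalar product on C^n; a short Python illustration (not part of the text):

```python
import random

def inner(a, b):
    """Scalar product on C^n, conjugate-linear in the first argument."""
    return sum(x.conjugate() * y for x, y in zip(a, b))

def nsq(a):
    return inner(a, a).real  # ||a||^2

random.seed(2)
f = [complex(random.random(), random.random()) for _ in range(4)]
g = [complex(random.random(), random.random()) for _ in range(4)]
comb = lambda s: [x + s * y for x, y in zip(f, g)]  # the vector f + s*g

# parallelogram law (1.46)
assert abs(nsq(comb(1)) + nsq(comb(-1)) - 2 * nsq(f) - 2 * nsq(g)) < 1e-12
# polarization identity (1.47) recovers the scalar product from the norm
recovered = (nsq(comb(1)) - nsq(comb(-1))
             + 1j * nsq(comb(-1j)) - 1j * nsq(comb(1j))) / 4
assert abs(recovered - inner(f, g)) < 1e-12
```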
But how do we define a scalar product on C(I)? One possibility is
      ⟨f, g⟩ := ∫_a^b f*(x) g(x) dx.                (1.48)
The corresponding inner product space is denoted by L2cont(I). Note that we
have
      ∥f∥ ≤ √|b − a| ∥f∥∞                (1.49)
and hence the maximum norm is stronger than the L2cont norm.

Suppose we have two norms ∥.∥1 and ∥.∥2 on a vector space X. Then
∥.∥2 is said to be stronger than ∥.∥1 if there is a constant m > 0 such that
∥f ∥1 ≤ m∥f ∥2 . (1.50)
It is straightforward to check the following.
Lemma 1.7. If ∥.∥2 is stronger than ∥.∥1 , then every ∥.∥2 Cauchy sequence
is also a ∥.∥1 Cauchy sequence.

Hence if a function F : X → Y is continuous in (X, ∥.∥1 ), it is also
continuous in (X, ∥.∥2 ), and if a set is dense in (X, ∥.∥2 ), it is also dense in
(X, ∥.∥1 ).
In particular, L2cont is separable since the polynomials are dense. But is
it also complete? Unfortunately the answer is no:
Example 1.11. Take I = [0, 2] and define
      fn(x) := 0 for 0 ≤ x ≤ 1 − 1/n,   1 + n(x − 1) for 1 − 1/n ≤ x ≤ 1,   1 for 1 ≤ x ≤ 2.

Then fn(x) is a Cauchy sequence in L2cont, but there is no limit in L2cont!
Clearly, the limit should be the step function which is 0 for 0 ≤ x < 1 and
1 for 1 ≤ x ≤ 2, but this step function is discontinuous (Problem 1.30)! ⋄
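That fn is Cauchy can also be seen numerically; a short computation gives ∥fn − f2n∥ = 1/√(12n), so the distances shrink like n^{−1/2}. A rough Riemann-sum sketch (illustration only; the step count is ad hoc):

```python
def dist(n, m, steps=20000):
    """||f_n - f_m|| in L^2_cont[0, 2] for the functions of Example 1.11,
    approximated by a midpoint Riemann sum."""
    def f(k, x):
        if x <= 1 - 1 / k:
            return 0.0
        if x <= 1:
            return 1 + k * (x - 1)
        return 1.0
    h = 2 / steps
    return sum((f(n, (i + 0.5) * h) - f(m, (i + 0.5) * h)) ** 2 * h
               for i in range(steps)) ** 0.5

# successive distances shrink, as expected for a Cauchy sequence
assert dist(10, 20) > dist(100, 200) > dist(1000, 2000) > 0
```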
Example 1.12. The previous example indicates that we should consider
(1.48) on a larger class of functions, for example on the class of Riemann21
integrable functions
R(I) := {f : I → C|f is Riemann integrable}
such that the integral makes sense. While this seems natural, it implies
another problem: Any function which vanishes outside a set which is neg-
ligible for the integral (e.g. finitely many points) has norm zero! That is,
∥f∥2 := (∫_I |f(x)|^2 dx)^{1/2} is only a seminorm on R(I) (Problem 1.29). To
get a norm we consider N (I) := {f ∈ R(I)| ∥f ∥2 = 0}. By homogeneity and
the triangle inequality N (I) is a subspace and we can consider equivalence
classes of functions which differ by a negligible function from N (I):
L2Ri (I) := R(I)/N (I).
Since ∥f ∥2 = ∥g∥2 for f − g ∈ N (I) we have a norm on L2Ri (I). Moreover,
since this norm inherits the parallelogram law we even have an inner prod-
uct space. However, this space will not be complete unless we replace the
Riemann by the Lebesgue22 integral. Hence we will not pursue this further
at this point. ⋄
This shows that in infinite dimensional vector spaces, different norms
will give rise to different convergent sequences. In fact, the key to solving
problems in infinite dimensional spaces is often finding the right norm! This
is something which cannot happen in the finite dimensional case.
Theorem 1.8. If X is a finite dimensional vector space, then all norms are
equivalent. That is, for any two given norms ∥.∥1 and ∥.∥2 , there are positive
21Bernhard Riemann (1826–1866), German mathematician
22Henri Lebesgue (1875-1941), French mathematician

constants m1 and m2 such that
      (1/m2) ∥f∥1 ≤ ∥f∥2 ≤ m1 ∥f∥1.            (1.51)

Proof. Choose a basis {uj}1≤j≤n such that every f ∈ X can be writ-
ten as f = ∑_j αj uj. Since equivalence of norms is an equivalence rela-
tion (check this!), we can assume that ∥.∥2 is the usual Euclidean norm:
∥f∥2 := ∥∑_j αj uj∥2 = (∑_j |αj|^2)^{1/2}. Then by the triangle and Cauchy–
Schwarz inequalities,
      ∥f∥1 ≤ ∑_j |αj| ∥uj∥1 ≤ (∑_j ∥uj∥1^2)^{1/2} ∥f∥2
and we can choose m2 = (∑_j ∥uj∥1^2)^{1/2}.
In particular, if fn is convergent with respect to ∥.∥2 , it is also convergent
with respect to ∥.∥1 . Thus ∥.∥1 is continuous with respect to ∥.∥2 and attains
its minimum m > 0 on the unit sphere S := {u|∥u∥2 = 1} (which is compact
by the Heine–Borel theorem, Theorem B.22). Now choose m1 = 1/m. □
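For concrete norms the constants in (1.51) can often be written down explicitly; e.g. on Rn one has ∥f∥∞ ≤ ∥f∥1 ≤ n∥f∥∞. A throwaway numerical check on R3 (illustration only):

```python
import random

random.seed(3)
n = 3
for _ in range(1000):
    x = [random.uniform(-1, 1) for _ in range(n)]
    n1 = sum(abs(t) for t in x)    # ||x||_1
    ninf = max(abs(t) for t in x)  # ||x||_inf
    # explicit equivalence constants: ||x||_inf <= ||x||_1 <= n ||x||_inf
    assert ninf <= n1 <= n * ninf + 1e-12
```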

Finally, I remark that a real Hilbert space can always be embedded into
a complex Hilbert space. In fact, if H is a real Hilbert space, then H × H is
a complex Hilbert space if we define

(f1 , f2 )+(g1 , g2 ) = (f1 +g1 , f2 +g2 ), (α+iβ)(f1 , f2 ) = (αf1 −βf2 , αf2 +βf1 )
(1.52)
and

⟨(f1 , f2 ), (g1 , g2 )⟩ = ⟨f1 , g1 ⟩ + ⟨f2 , g2 ⟩ + i(⟨f1 , g2 ⟩ − ⟨f2 , g1 ⟩). (1.53)

Here you should think of (f1, f2) as f1 + if2. Note that we have a conjugate
linear map C : H × H → H × H, (f1, f2) ↦ (f1, −f2) which satisfies C^2 = I
and ⟨Cf, Cg⟩ = ⟨g, f⟩. In particular, we can get our original Hilbert space
back if we consider Re(f) = (1/2)(f + Cf) = (f1, 0).
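The claimed properties of (1.52) and (1.53) are mechanical to verify; below is a short numerical check (illustration only, with the Euclidean scalar product on R3 standing in for the real Hilbert space H):

```python
import random

def c_inner(F, G):
    """The complex scalar product (1.53) for F = (f1, f2), G = (g1, g2)."""
    f1, f2 = F
    g1, g2 = G
    dot = lambda a, b: sum(x * y for x, y in zip(a, b))  # real scalar product on H
    return (dot(f1, g1) + dot(f2, g2)) + 1j * (dot(f1, g2) - dot(f2, g1))

random.seed(4)
vec = lambda: [random.uniform(-1, 1) for _ in range(3)]
F, G = (vec(), vec()), (vec(), vec())

C = lambda W: (W[0], [-x for x in W[1]])  # conjugation (f1, f2) -> (f1, -f2)
assert abs(c_inner(C(F), C(G)) - c_inner(G, F)) < 1e-12            # <Cf, Cg> = <g, f>
assert c_inner(F, F).real > 0 and abs(c_inner(F, F).imag) < 1e-12  # positivity
```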

Problem 1.23. Which of the following bilinear forms are scalar products on
Rn ?
(i) s(x, y) := ∑_{j=1}^n (xj + yj).
(ii) s(x, y) := ∑_{j=1}^n αj xj yj,   α ∈ Rn.
Problem 1.24. Show that the norm in a Hilbert space satisfies ∥f + g∥ =
∥f∥ + ∥g∥ if and only if f = αg, α ≥ 0, or g = 0. Hence Hilbert spaces are
strictly convex (cf. Problem 1.16).

Problem 1.25 (Generalized parallelogram law). Show that, in a Hilbert
space,
      ∑_{1≤j<k≤n} ∥fj − fk∥^2 + ∥∑_{1≤j≤n} fj∥^2 = n ∑_{1≤j≤n} ∥fj∥^2
for every n ∈ N. The case n = 2 is the parallelogram law (1.46).
Problem 1.26. Show that the maximum norm on C[0, 1] does not satisfy
the parallelogram law.
Problem* 1.27. Suppose Q is a complex vector space. Let s(f, g) be a
sesquilinear form on Q and q(f ) := s(f, f ) the associated quadratic form.
Prove the parallelogram law
q(f + g) + q(f − g) = 2q(f ) + 2q(g) (1.54)
and the polarization identity
      s(f, g) = (1/4) (q(f + g) − q(f − g) + i q(f − ig) − i q(f + ig)).        (1.55)
Show that s(f, g) is symmetric if and only if q(f) is real-valued.
Note that if Q is a real vector space, then the parallelogram law is un-
changed but the polarization identity in the form s(f, g) = (1/4)(q(f + g) − q(f − g))
will only hold if s(f, g) is symmetric.
Problem 1.28. A sesquilinear form on a complex inner product space is
called bounded if
∥s∥ := sup |s(f, g)|
∥f ∥=∥g∥=1
is finite. Similarly, the associated quadratic form q is bounded if
∥q∥ := sup |q(f )|
∥f ∥=1

is finite. Show
∥q∥ ≤ ∥s∥ ≤ 2∥q∥
with ∥q∥ = ∥s∥ if s is symmetric. (Hint: Use the polarization identity from
the previous problem. For the symmetric case look at the real part.)
Problem* 1.29. Suppose Q is a vector space. Let s(f, g) be a sesquilinear
form on Q and q(f ) := s(f, f ) the associated quadratic form. Show that the
Cauchy–Schwarz inequality
      |s(f, g)| ≤ q(f)^{1/2} q(g)^{1/2}
holds if q(f) ≥ 0. In this case q(.)^{1/2} satisfies the triangle inequality and
hence is a seminorm.
(Hint: Consider 0 ≤ q(f + αg) = q(f) + 2Re(α s(f, g)) + |α|^2 q(g) and
choose α = t s(f, g)*/|s(f, g)| with t ∈ R.)
Problem* 1.30. Prove the claims made about fn in Example 1.11.

1.4. Completeness
Since L2cont (I) is not complete, how can we obtain a Hilbert space from it?
Well, the answer is simple: take the completion.
If X is an (incomplete) normed space, consider the set of all Cauchy
sequences X . Call two Cauchy sequences equivalent if their difference con-
verges to zero and denote by X̄ the set of all equivalence classes. It is easy
to see that X̄ (and X ) inherit the vector space structure from X. Moreover,

Lemma 1.9. If xn is a Cauchy sequence in X, then ∥xn∥ is also a Cauchy
sequence and thus converges.

Consequently, the norm of an equivalence class [(xn)_{n=1}^∞] can be defined
by ∥[(xn)_{n=1}^∞]∥ := lim_{n→∞} ∥xn∥ and is independent of the representative
(show this!). Thus X̄ is a normed space. It contains X as a subspace by
virtue of the embedding x ↦ [(x)_{n=1}^∞] which identifies x with the constant
sequence xn := x.

Theorem 1.10. X̄ is a Banach space containing X as a dense subspace if
we identify x ∈ X with the equivalence class of all sequences converging to
x.

Proof. (Outline) To see that constant sequences are dense, note that we
can approximate [(xn)_{n=1}^∞] by the constant sequence [(xn0)_{n=1}^∞] as n0 → ∞.
It remains to show that X̄ is complete. Let ξn = [(xn,j)_{j=1}^∞] be a Cauchy
sequence in X̄. Without loss of generality (by dropping terms) we can choose
the representatives xn,j such that ∥xn,j − xn,k∥ ≤ 1/n for j, k ≥ n. Then it is
not hard to see that ξ = [(xj,j)_{j=1}^∞] is its limit. □

Notice that the completion X̄ is unique. More precisely, every other
complete space which contains X as a dense subset is isomorphic to X̄. This
can for example be seen by showing that the identity map on X has a unique
extension to X̄ (compare Theorem 1.16 below).
In particular, it is no restriction to assume that a normed vector space
or an inner product space is complete (note that by continuity of the norm
the parallelogram law holds for X̄ if it holds for X).
Example 1.13. The completion of the space L2cont (I) is denoted by L2 (I).
While this defines L2 (I) uniquely (up to isomorphisms) it is often inconve-
nient to work with equivalence classes of Cauchy sequences. A much more
convenient characterization can be given with the help of the Lebesgue inte-
gral (see Chapter 3 from [37] if you are familiar with basic Lebesgue integra-
tion; Theorem 3.18 from [37] will establish equivalence of both approaches).

Similarly, we define Lp(I), 1 ≤ p < ∞, as the completion of C(I) with
respect to the norm
      ∥f∥p := (∫_a^b |f(x)|^p dx)^{1/p}.

The only requirement for a norm which is not immediate is the triangle
inequality (except for p = 1, 2) but this can be shown as for ℓp (cf. Prob-
lem 1.33). ⋄
Problem 1.31. Provide a detailed proof of Theorem 1.10.
Problem 1.32. For every f ∈ L1(I) we can define its integral
∫_c^d f(x) dx
as the (unique) extension of the corresponding linear functional from C(I) to L1(I) (by Theorem 1.16 below). Show that this integral is linear and satisfies
∫_c^e f(x) dx = ∫_c^d f(x) dx + ∫_d^e f(x) dx,   |∫_c^d f(x) dx| ≤ ∫_c^d |f(x)| dx.

Problem* 1.33. Show the Hölder inequality
∥f g∥_1 ≤ ∥f∥_p ∥g∥_q,   1/p + 1/q = 1,   1 ≤ p, q < ∞,
for f ∈ Lp(I), g ∈ Lq(I) and conclude that ∥.∥_p is a norm on C(I). Also conclude that Lp(I) ⊆ L1(I).
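The discrete (sequence) analogue of Hölder's inequality, which is proved the same way, is easy to spot check numerically. The sketch below is purely illustrative (random data, arbitrary exponents) and is of course not a substitute for the proof asked for above:

```python
import random

# Spot check of Hoelder's inequality for finite sequences (illustration only):
# ||f g||_1 <= ||f||_p ||g||_q whenever 1/p + 1/q = 1.
def norm(v, p):
    """l^p norm of a finite sequence."""
    return sum(abs(x) ** p for x in v) ** (1.0 / p)

random.seed(2)
for p in (1.5, 2.0, 4.0):
    q = p / (p - 1)                       # conjugate exponent
    for _ in range(200):
        f = [random.gauss(0, 1) for _ in range(8)]
        g = [random.gauss(0, 1) for _ in range(8)]
        lhs = norm([a * b for a, b in zip(f, g)], 1)
        assert lhs <= norm(f, p) * norm(g, q) + 1e-12
```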

1.5. Compactness
In analysis, compactness is one of the most ubiquitous tools for showing
existence of solutions for various problems. In finite dimensions relatively
compact sets are easily identified as they are precisely the bounded sets by
the Heine–Borel theorem (Theorem B.22). In the infinite dimensional case
the situation is more complicated. Before we look into this, please recall
that for a subset U of a Banach space (or more generally a complete metric
space) the following are equivalent (see Corollary B.20 and Lemma B.26):
• U is relatively compact (i.e. its closure is compact)
• every sequence from U has a convergent subsequence
• U is totally bounded (i.e. it has a finite ε-cover for every ε > 0)
Example 1.14. Consider the bounded sequence (δ^n)_{n=1}^∞ in ℓp(N). Since ∥δ^n − δ^m∥_p = 2^{1/p} for n ≠ m, there is no way to extract a convergent subsequence. ⋄
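The distance computation in this example is elementary to confirm numerically; the sketch below (the truncation length N and the two indices are arbitrary choices, not part of the text) checks ∥δ^n − δ^m∥_p = 2^{1/p} for several p:

```python
# Sanity check of Example 1.14: distinct canonical unit vectors delta^n
# in l^p have mutual distance exactly 2**(1/p), so no subsequence of
# (delta^n) can be Cauchy.  N and the indices 2, 7 are arbitrary choices.
def lp_norm(seq, p):
    """l^p norm of a finite sequence."""
    return sum(abs(x) ** p for x in seq) ** (1.0 / p)

N = 10
def delta(n):
    """Canonical unit vector delta^n, truncated to length N."""
    return [1.0 if j == n else 0.0 for j in range(N)]

for p in (1, 2, 3):
    diff = [a - b for a, b in zip(delta(2), delta(7))]
    assert abs(lp_norm(diff, p) - 2 ** (1.0 / p)) < 1e-12
```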
In particular, the Heine–Borel theorem fails for ℓp(N). In fact, it turns out that it fails in any infinite dimensional space as we will see in Theorem 4.31 below. Hence one needs criteria for deciding when a given subset is relatively compact. Our strategy will be based on total boundedness and can be outlined as follows: Project the original set to some finite dimensional space such that the information loss can be made arbitrarily small (by increasing the dimension of the finite dimensional space) and apply Heine–Borel to the finite dimensional space. This idea is formalized in the following lemma.
Lemma 1.11. Let X be a metric space and K some subset. Assume that
for every ε > 0 there is a metric space Yε , a surjective map Pε : X → Yε ,
and some δ > 0 such that Pε (K) is totally bounded and d(x, y) < ε whenever
x, y ∈ K with d(Pε (x), Pε (y)) < δ. Then K is totally bounded.
In particular, if X is a Banach space the claim holds if Pε can be chosen a
linear map onto a finite dimensional subspace Yε such that Pε (K) is bounded,
and ∥(1 − Pε )x∥ ≤ ε for x ∈ K.

Proof. Fix ε > 0. Then by total boundedness of Pε(K) we can find a δ-cover {B_δ(y_j)}_{j=1}^m for Pε(K). Now if we choose x_j ∈ Pε^{-1}({y_j}) ∩ K, then {B_ε(x_j)}_{j=1}^m is an ε-cover for K since Pε^{-1}(B_δ(y_j)) ∩ K ⊆ B_ε(x_j).
For the last claim consider Pε/3 and note that for δ := ε/3 and x, y ∈ K with ∥Pε/3(x − y)∥ < δ we have ∥x − y∥ ≤ ∥(1 − Pε/3)x∥ + ∥Pε/3(x − y)∥ + ∥(1 − Pε/3)y∥ < ε. □

The first application will be to ℓp (N).


Theorem 1.12 (Fréchet23). Consider ℓp (N), 1 ≤ p < ∞, and let Pn a =
(a1 , . . . , an , 0, . . . ) be the projection onto the first n components. A subset
K ⊆ ℓp (N) is relatively compact if and only if
(i) it is pointwise bounded, supa∈K |aj | ≤ Mj for all j ∈ N, and
(ii) for every ε > 0 there is some n such that ∥(1 − Pn )a∥p ≤ ε for all
a ∈ K.
In the case p = ∞ conditions (i) and (ii) still imply that K is relatively
compact, but the converse only holds for K ⊆ c0 (N).

Proof. Clearly (i) and (ii) is what is needed for Lemma 1.11.
Conversely, if K is relatively compact it is bounded. Moreover, given δ we can choose a finite δ-cover {B_δ(a^j)}_{j=1}^m for K and some n such that ∥(1 − Pn)a^j∥_p ≤ δ for all 1 ≤ j ≤ m (this last claim fails for ℓ∞(N)). Now given a ∈ K we have a ∈ B_δ(a^j) for some j and hence ∥(1 − Pn)a∥_p ≤ ∥(1 − Pn)(a − a^j)∥_p + ∥(1 − Pn)a^j∥_p ≤ 2δ as required. □
23Maurice René Fréchet (1878–1973), French mathematician
Example 1.15. Fix a ∈ ℓp(N) if 1 ≤ p < ∞ or a ∈ c0(N) if p = ∞. Then K := {b | |bj| ≤ |aj| for all j ∈ N} ⊂ ℓp(N) is compact. ⋄
The second application will be to C(I). A family of functions F ⊂ C(I)
is called (pointwise) equicontinuous if for every ε > 0 and every x ∈ I
there is a δ > 0 such that
|f (y) − f (x)| ≤ ε whenever |y − x| < δ, ∀f ∈ F. (1.56)
That is, in this case δ is required to be independent of the function f ∈ F .
Theorem 1.13 (Arzelà–Ascoli24). Let F ⊂ C(I) be a family of continuous
functions. Then F is relatively compact if and only if F is equicontinuous
and the set {f (x0 )|f ∈ F } is bounded for one x0 ∈ I. In this case F is even
bounded.

Proof. Suppose F is equicontinuous and bounded for a fixed x0. Fix ε > 0. By compactness of I there are finitely many points x1, . . . , xn ∈ I such that the balls B_{δj}(xj) cover I, where δj is the δ corresponding to xj as in the definition of equicontinuity. Now first of all note that, since I is connected and since x0 ∈ B_{δj}(xj) for some j, we see that F is bounded: |f(x)| ≤ sup_{f∈F} |f(x0)| + 2nε.
Next consider P : C(I) → C^n, P(f) := (f(x1), . . . , f(xn)). Then P(F) is bounded and ∥f − g∥∞ ≤ 3ε whenever ∥P(f) − P(g)∥∞ < ε. Indeed, just note that for every x there is some j such that x ∈ B_{δj}(xj) and thus |f(x) − g(x)| ≤ |f(x) − f(xj)| + |f(xj) − g(xj)| + |g(xj) − g(x)| ≤ 3ε. Hence F is relatively compact by Lemma 1.11.
Conversely, suppose F is relatively compact. Then F is totally bounded and hence bounded. To see equicontinuity fix x ∈ I, ε > 0 and choose a corresponding ε-cover {B_ε(fj)}_{j=1}^n for F. Pick δ > 0 such that y ∈ B_δ(x) implies |fj(y) − fj(x)| < ε for all 1 ≤ j ≤ n. Then f ∈ B_ε(fj) for some j and hence |f(y) − f(x)| ≤ |f(y) − fj(y)| + |fj(y) − fj(x)| + |fj(x) − f(x)| ≤ 3ε, proving equicontinuity. □
Example 1.16. Consider the solution fn (x) of the initial value problem
f ′ = sin(nf ), f (0) = 1.
(Assuming this solution exists — it can in principle be found using separation
of variables.) Then |fn′ (x)| ≤ 1 and hence the mean value theorem shows
that the family {fn } ⊆ C([0, 1]) is equicontinuous. Hence there is a uniformly
convergent subsequence. ⋄
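The uniform Lipschitz bound driving this example can be observed numerically. The sketch below (a crude explicit Euler scheme with an arbitrary step count, purely illustrative) checks the discrete analogue |fn(x) − fn(y)| ≤ |x − y| uniformly in n:

```python
import math

def euler_solve(n, steps=2000):
    """Crude explicit Euler scheme for f' = sin(n f), f(0) = 1 on [0, 1]."""
    h = 1.0 / steps
    xs, fs = [0.0], [1.0]
    for k in range(steps):
        fs.append(fs[-1] + h * math.sin(n * fs[-1]))
        xs.append((k + 1) * h)
    return xs, fs

# Each Euler step moves f by at most h (since |sin| <= 1), so the discrete
# solutions satisfy |f(x) - f(y)| <= |x - y| uniformly in n: the family is
# bounded and equicontinuous, exactly what Arzela-Ascoli needs.
for n in (1, 5, 25):
    xs, fs = euler_solve(n)
    for i in range(0, len(xs), 97):
        for j in range(0, len(xs), 113):
            assert abs(fs[i] - fs[j]) <= abs(xs[i] - xs[j]) + 1e-9
```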
Problem 1.34. Find a compact subset of ℓ∞ (N) which does not satisfy (ii)
from Theorem 1.12.
24Cesare Arzelà (1847–1912), Italian mathematician; Giulio Ascoli (1843–1896), Italian mathematician
Problem 1.35. Show that a subset K ⊂ c0(N) is relatively compact if and only if there is a nonnegative sequence a ∈ c0(N) such that |bn| ≤ an for all n ∈ N and all b ∈ K.
Problem 1.36. Find a sequence in C[0, 1] which is bounded but has no
convergent subsequence.
Problem 1.37. Find a family in C[0, 1] that is equicontinuous but not
bounded.
Problem 1.38. Which of the following families are relatively compact in
C[0, 1]?
(i) F := {f ∈ C^1[0,1] | ∥f∥∞ ≤ 1}
(ii) F := {f ∈ C^1[0,1] | ∥f′∥∞ ≤ 1}
(iii) F := {f ∈ C^1[0,1] | ∥f∥∞ ≤ 1, ∥f′∥2 ≤ 1}

1.6. Bounded operators


Given two normed spaces X and Y , a linear map
A : D(A) ⊆ X → Y (1.57)
will be called a (linear) operator. The linear subspace D(A) on which A
is defined is called the domain of A and is frequently required to be dense.
The kernel (also null space)
Ker(A) := {x ∈ D(A)|Ax = 0} ⊆ X (1.58)
and range
Ran(A) := {Ax|x ∈ D(A)} = AD(A) ⊆ Y (1.59)
are again linear subspaces. Note that a linear map A will be continuous if
and only if it is continuous at 0, that is, xn ∈ D(A) → 0 implies Axn → 0.
The operator A is called bounded if the operator norm
∥A∥ := sup_{x∈D(A), ∥x∥_X ≤ 1} ∥Ax∥_Y = sup_{x∈D(A), ∥x∥_X = 1} ∥Ax∥_Y   (1.60)

is finite. This says that A is bounded if the image of the closed unit ball
B̄1X (0) ∩ D(A) is contained in some closed ball B̄rY (0) of finite radius r (with
the smallest radius being the operator norm). Hence A is bounded if and
only if it maps bounded sets to bounded sets.
Note that if you replace the norm on X or Y , then the operator norm
will of course also change in general. However, if the norms are equivalent
so will be the operator norms.
By construction, a bounded operator satisfies
∥Ax∥Y ≤ ∥A∥∥x∥X , x ∈ D(A), (1.61)
and hence is Lipschitz25 continuous, that is, ∥Ax − Ay∥_Y ≤ ∥A∥∥x − y∥_X for x, y ∈ D(A); in particular, A is continuous. Note that ∥A∥ could also be defined as the optimal constant in the inequality (1.61). The converse is also true:

Theorem 1.14. A linear operator A is bounded if and only if it is continuous.

Proof. A bounded operator is Lipschitz and hence continuous, as observed above. Conversely, suppose A is continuous but not bounded. Then there is a sequence of unit vectors xn ∈ D(A) such that ∥Axn∥_Y ≥ n. Hence yn := (1/n) xn converges to 0 but ∥Ayn∥_Y ≥ 1 does not converge to 0, contradicting continuity at 0. □

Of course it suffices to check continuity at one point in X, say at 0, since continuity at all other points will then follow by a simple translation.
If X is finite dimensional, then every linear operator is bounded:
If X is finite dimensional, then every linear operator is bounded:

Lemma 1.15. Let X, Y be normed spaces with X finite dimensional. Then every linear operator A : D(A) ⊆ X → Y is bounded.

Proof. Choose a basis {xj}_{j=1}^n for D(A) such that every x ∈ D(A) can be written as x = Σ_{j=1}^n αj xj. By Theorem 1.8 there is a constant m > 0 such that (Σ_{j=1}^n |αj|²)^{1/2} ≤ m∥x∥_X. Then, by the triangle and Cauchy–Schwarz inequalities,
∥Ax∥_Y ≤ Σ_{j=1}^n |αj| ∥Axj∥_Y ≤ m (Σ_{j=1}^n ∥Axj∥_Y²)^{1/2} ∥x∥_X
and thus ∥A∥ ≤ m (Σ_{j=1}^n ∥Axj∥_Y²)^{1/2}. □

In the infinite dimensional case an operator can be unbounded. Moreover, one and the same operation might be bounded (i.e. continuous) or unbounded, depending on the norm chosen.
Example 1.17. Let X := ℓp(N) and a ∈ ℓ∞(N). Consider the multiplication operator A : X → X defined by
(Ab)j := aj bj.
Then |(Ab)j| ≤ ∥a∥∞ |bj| shows ∥A∥ ≤ ∥a∥∞. In fact, we even have ∥A∥ = ∥a∥∞ (show this). Note also that the sup in (1.60) is only attained if a attains its supremum.
If a is unbounded we need a domain D(A) := {b ∈ ℓp(N) | (aj bj)_{j∈N} ∈ ℓp(N)} and A will be unbounded (show this). ⋄
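A small numerical illustration of this example (the sample sequence a, the exponent p, and the random test vectors are all arbitrary choices): the bound ∥Ab∥_p ≤ ∥a∥∞ ∥b∥_p holds on random data, and it is attained at the unit vector δ^j for an index j where |a_j| is maximal:

```python
import random

# Illustration of the multiplication operator (Ab)_j = a_j b_j with
# arbitrary sample data: ||Ab||_p <= ||a||_inf ||b||_p, with equality
# at b = delta^j when |a_j| attains the supremum of |a|.
a = [0.3, -0.9, 0.5, 0.7]
p = 3

def lp(v):
    return sum(abs(x) ** p for x in v) ** (1.0 / p)

sup_a = max(abs(x) for x in a)
random.seed(0)
for _ in range(200):
    b = [random.uniform(-1, 1) for _ in a]
    assert lp([x * y for x, y in zip(a, b)]) <= sup_a * lp(b) + 1e-12

# the norm is attained where |a_j| is maximal
j = max(range(len(a)), key=lambda k: abs(a[k]))
b = [1.0 if k == j else 0.0 for k in range(len(a))]
assert abs(lp([x * y for x, y in zip(a, b)]) - sup_a) < 1e-12
```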

25Rudolf Lipschitz (1832–1903), German mathematician


Example 1.18. Consider the vector space of differentiable functions X := C^1[0,1] and equip it with the norm (cf. Problem 1.46)
∥f∥_{∞,1} := max_{x∈[0,1]} |f(x)| + max_{x∈[0,1]} |f′(x)|.
Let Y := C[0,1] and observe that the differential operator A = d/dx : X → Y is bounded since
∥Af∥∞ = max_{x∈[0,1]} |f′(x)| ≤ max_{x∈[0,1]} |f(x)| + max_{x∈[0,1]} |f′(x)| = ∥f∥_{∞,1}.
However, if we consider A = d/dx : D(A) ⊆ Y → Y defined on D(A) = C^1[0,1], then we have an unbounded operator. Indeed, choose un(x) := sin(nπx) which is normalized, ∥un∥∞ = 1, and observe that
A un(x) = u′n(x) = nπ cos(nπx)
is unbounded, ∥A un∥∞ = nπ. Note that D(A) contains the set of polynomials and thus is dense by the Weierstraß approximation theorem (Theorem 1.3). ⋄
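The growth ∥A un∥∞ = nπ is easy to confirm on a grid (the grid size below is an arbitrary choice, and the sup-norm comparison only holds up to discretization error):

```python
import math

# Grid-based check of Example 1.18: u_n(x) = sin(n pi x) has sup norm 1
# while its derivative n pi cos(n pi x) has sup norm n pi, so d/dx is
# unbounded on (C[0,1], sup norm).  Grid size is an arbitrary choice.
def sup_on_grid(g, m=10001):
    """Approximate sup norm of g on [0, 1] via equidistant sampling."""
    return max(abs(g(k / (m - 1))) for k in range(m))

for n in (1, 4, 16):
    u = lambda x, n=n: math.sin(n * math.pi * x)
    du = lambda x, n=n: n * math.pi * math.cos(n * math.pi * x)
    assert abs(sup_on_grid(u) - 1.0) < 1e-3           # ||u_n||_inf = 1
    assert abs(sup_on_grid(du) - n * math.pi) < 1e-9  # attained at x = 0, a grid point
```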
If A is bounded and densely defined, it is no restriction to assume that
it is defined on all of X.
Theorem 1.16 (extension principle). Let A : D(A) ⊆ X → Y be a bounded
linear operator between a normed space X and a Banach space Y . If D(A)
is dense, there is a unique (continuous) extension of A to X which has the
same operator norm.

Proof. Since D(A) is dense, we can find a convergent sequence xn → x from D(A) for every x ∈ X. Moreover, since A is bounded, Axn is also Cauchy and has a limit since Y is assumed complete. Consequently, the extension Ā can only be given by
Āx := lim_{n→∞} Axn,   xn ∈ D(A), xn → x, x ∈ X.
To show that this definition is independent of the sequence xn → x, let yn → x be a second sequence and observe
∥Axn − Ayn∥ = ∥A(xn − yn)∥ ≤ ∥A∥∥xn − yn∥ → 0.
Since for x ∈ D(A) we can choose xn := x, we see that Āx = Ax in this case, that is, Ā is indeed an extension. From continuity of vector addition and scalar multiplication it follows that Ā is linear. Finally, from continuity of the norm we conclude that the operator norm does not increase. □

The set of all bounded linear operators from X to Y is denoted by L(X, Y). If X = Y, we write L(X) := L(X, X). An operator in L(X, C) is called a bounded linear functional, and the space X* := L(X, C) is
called the dual space of X. The dual space takes the role of coordinate
functions in a Banach space.
Example 1.19. Let X be a finite dimensional space and {uj}_{j=1}^n a basis. Then every x ∈ X can be uniquely written as x = Σ_{j=1}^n αj uj and we can consider the dual functionals defined via u*_j(x) := αj for 1 ≤ j ≤ n. The biorthogonal system {u*_j}_{j=1}^n (which are continuous by Lemma 1.15) forms a dual basis, since any other linear functional ℓ ∈ X* can be written as ℓ = Σ_{j=1}^n ℓ(uj) u*_j. In particular, X and X* have the same dimension. ⋄
Example 1.20. Let X := ℓp(N). Then the coordinate functions
ℓj(a) := aj
are bounded linear functionals: |ℓj(a)| = |aj| ≤ ∥a∥p and hence ∥ℓj∥ = 1 (since equality is attained for a = δ^j). More generally, let b ∈ ℓq(N) where 1/p + 1/q = 1. Then
ℓb(a) := Σ_{j=1}^∞ bj aj
is a bounded linear functional satisfying ∥ℓb∥ ≤ ∥b∥q by Hölder’s inequality. In fact, we even have ∥ℓb∥ = ∥b∥q (Problem 4.17). Note that the first example is a special case of the second one upon choosing b = δ^j. ⋄
Example 1.21. Consider X := C(I). Then for every x0 ∈ I the point evaluation ℓ_{x0}(f) := f(x0) is a bounded linear functional. In fact, ∥ℓ_{x0}∥ = 1 (show this).
However, note that ℓ_{x0} is unbounded on L²cont(I)! To see this take fn(x) := (3n/2)^{1/2} max(0, 1 − n|x − x0|), which is a triangle shaped peak supported on [x0 − n^{−1}, x0 + n^{−1}] and normalized according to ∥fn∥2 = 1 for n sufficiently large such that the support is contained in I. Then ℓ_{x0}(fn) = fn(x0) = (3n/2)^{1/2} → ∞. This implies that ℓ_{x0} cannot be extended to the completion of L²cont(I) in a natural way and reflects the fact that the integral cannot see individual points (changing the value of a function at one point does not change its integral). ⋄
Example 1.22. Consider X := C(I) and let g be some continuous function. Then
ℓg(f) := ∫_a^b g(x) f(x) dx
is a linear functional with norm ∥ℓg∥ = ∥g∥1. Indeed, first of all note that
|ℓg(f)| ≤ ∫_a^b |g(x) f(x)| dx ≤ ∥f∥∞ ∫_a^b |g(x)| dx
shows ∥ℓg∥ ≤ ∥g∥1. To see that we have equality consider fε := g*/(|g| + ε) and note
|ℓg(fε)| = ∫_a^b |g(x)|²/(|g(x)| + ε) dx ≥ ∫_a^b (|g(x)|² − ε²)/(|g(x)| + ε) dx = ∥g∥1 − (b − a)ε.
Since ∥fε∥∞ ≤ 1 and ε > 0 is arbitrary this establishes the claim. ⋄
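The near-maximizer fε can also be tried out numerically. The discretized sketch below (the sample g, the interval [0,1], and the quadrature resolution are arbitrary choices; g is real here, so the conjugation in g* is trivial) reproduces the two inequalities ℓg(fε) ≤ ∥g∥1 and ℓg(fε) ≥ ∥g∥1 − ε:

```python
import math

# Discretized illustration of Example 1.22 with an arbitrary sample g:
# f_eps = g/(|g| + eps) satisfies ||f_eps||_inf <= 1 and nearly attains
# ||l_g|| = ||g||_1 as eps -> 0.
def integral(h, m=20000):
    """Midpoint rule on [0, 1]."""
    return sum(h((k + 0.5) / m) for k in range(m)) / m

g = lambda x: math.sin(3 * x) - 0.4
norm1 = integral(lambda x: abs(g(x)))
for eps in (1e-1, 1e-2, 1e-3):
    f = lambda x, e=eps: g(x) / (abs(g(x)) + e)   # ||f||_inf <= 1
    val = integral(lambda x: g(x) * f(x))
    assert val <= norm1 + 1e-12        # |l_g(f)| <= ||g||_1
    assert val >= norm1 - eps          # l_g(f_eps) >= ||g||_1 - eps (interval length 1)
```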
Theorem 1.17. The space L (X, Y ) together with the operator norm (1.60)
is a normed space. It is a Banach space if Y is.

Proof. That (1.60) is indeed a norm is straightforward. If Y is complete and An is a Cauchy sequence of operators, then An x converges for every x. Define a new operator A via Ax := lim_{n→∞} An x. By continuity of the vector operations, A is linear and by continuity of the norm, ∥Ax∥ = lim_{n→∞} ∥An x∥ ≤ (lim_{n→∞} ∥An∥)∥x∥, it is bounded (recall that ∥An∥ is Cauchy by the inverse triangle inequality). Furthermore, given ε > 0, there is some N such that ∥An − Am∥ ≤ ε for n, m ≥ N and thus ∥An x − Am x∥ ≤ ε∥x∥. Taking the limit m → ∞, we see ∥An x − Ax∥ ≤ ε∥x∥; that is, ∥An − A∥ ≤ ε and hence An → A. □

In particular, note that the dual space X ∗ is always a Banach space, even
if X is not complete. Moreover, by Theorem 1.16 the completion X̄ satisfies
X̄ ∗ = X ∗ .
The Banach space of bounded linear operators L(X) even has a multiplication given by composition. Clearly, this multiplication is distributive
(A + B)C = AC + BC,   A(B + C) = AB + AC,   A, B, C ∈ L(X),   (1.62)
and associative
(AB)C = A(BC),   α(AB) = (αA)B = A(αB),   α ∈ C.   (1.63)
Moreover, it is easy to see that we have
∥AB∥ ≤ ∥A∥∥B∥.   (1.64)
In other words, L(X) is a so-called Banach algebra. However, note that our multiplication is not commutative (unless X is one-dimensional). We even have an identity, the identity operator I, satisfying ∥I∥ = 1.
Problem 1.39. Show that two norms on X are equivalent if and only if they
give rise to the same convergent sequences.
Problem 1.40. Show that a finite dimensional subspace M ⊆ X of a normed
space is closed.
Problem 1.41. Consider X = C^n and let A ∈ L(X) be a matrix. Equip X with the norm (show that this is a norm)
∥x∥∞ := max_{1≤j≤n} |xj|
and compute the operator norm ∥A∥ with respect to this norm in terms of the matrix entries. Do the same with respect to the norm
∥x∥1 := Σ_{1≤j≤n} |xj|.

Problem 1.42. Let X := C[0,1]. Investigate if the following operators A : X → X are linear and, if yes, compute the norm.
(i) f(x) ↦ (1 − x)x f(x²).
(ii) f(x) ↦ (1 − x)x f(x)².
(iii) f(x) ↦ ∫_0^1 (1 − x)y f(y) dy.
Problem 1.43. Let X := C[0,1]. Investigate the operator A : X → X, f(x) ↦ x f(x). Show that this is a bounded linear operator and compute its norm. What is the closure of Ran(A)?
Problem 1.44. Let X := C[0,1]. Show that ℓ(f) := ∫_0^1 f(x) dx is a linear functional. Compute its norm. Is the norm attained? What if we replace X by X0 := {f ∈ C[0,1] | f(0) = 0} (in particular, check that this is a closed subspace)?
Problem 1.45. Show that the integral operator
(Kf)(x) := ∫_0^1 K(x, y) f(y) dy,
where K(x, y) ∈ C([0,1] × [0,1]), defined on D(K) := C[0,1], is a bounded operator both in X := C[0,1] (max norm) and X := L²cont(0, 1). Show that the norm in the X = C[0,1] case is given by
∥K∥ = max_{x∈[0,1]} ∫_0^1 |K(x, y)| dy.

Problem* 1.46. Let I be a compact interval. Show that the set of differentiable functions C^1(I) becomes a Banach space if we set ∥f∥_{∞,1} := max_{x∈I} |f(x)| + max_{x∈I} |f′(x)|.
Problem* 1.47. Show that ∥AB∥ ≤ ∥A∥∥B∥ for every A, B ∈ L (X).
Conclude that the multiplication is continuous: An → A and Bn → B imply
An Bn → AB.
Problem 1.48. Let A ∈ L(X) be a bijection. Show
∥A^{-1}∥^{-1} = inf_{x∈X, ∥x∥=1} ∥Ax∥.

Problem* 1.49. Suppose B ∈ L(X) with ∥B∥ < 1. Then I + B is invertible with
(I + B)^{-1} = Σ_{n=0}^∞ (−1)^n B^n.
Consequently for A, B ∈ L(X, Y), A + B is invertible if A is invertible and ∥B∥ < ∥A^{-1}∥^{-1}.
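The Neumann series above is concrete enough to verify in finite dimensions. The sketch below (the 2×2 matrix B is an arbitrary example with norm less than 1; it does not solve the problem, which asks for the general Banach space argument) checks that partial sums of Σ (−1)^n B^n invert I + B:

```python
# Finite dimensional check of the Neumann series: for the arbitrary sample
# matrix B with max row sum 0.35 < 1, the partial sums of sum_n (-1)^n B^n
# converge to the inverse of I + B.
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def madd(A, B, s=1.0):
    return [[A[i][j] + s * B[i][j] for j in range(2)] for i in range(2)]

B = [[0.2, -0.1], [0.05, 0.3]]      # ||B||_inf = 0.35 < 1
I = [[1.0, 0.0], [0.0, 1.0]]
S, P, sign = I, I, 1.0
for _ in range(200):                 # add (-1)^n B^n for n = 1, ..., 200
    P = matmul(P, B)
    sign = -sign
    S = madd(S, P, sign)

C = matmul(madd(I, B), S)            # (I + B) S should be (numerically) I
for i in range(2):
    for j in range(2):
        assert abs(C[i][j] - (1.0 if i == j else 0.0)) < 1e-12
```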
Problem* 1.50. Let
f(z) := Σ_{j=0}^∞ fj z^j,   |z| < R,
be a convergent power series with radius of convergence R > 0. Suppose X is a Banach space and A ∈ L(X) is a bounded operator with lim sup_n ∥A^n∥^{1/n} < R (note that by ∥A^n∥ ≤ ∥A∥^n the lim sup is finite). Show that
f(A) := Σ_{j=0}^∞ fj A^j
exists and defines a bounded linear operator. Moreover, if f and g are two such functions and α ∈ C, then
(f + g)(A) = f(A) + g(A),   (αf)(A) = α f(A),   (f g)(A) = f(A)g(A).
(Hint: Problem 1.6.)
Problem* 1.51. Show that a linear map ℓ : X → C is continuous if and
only if its kernel is closed. (Hint: If ℓ is not continuous, we can find a
sequence of normalized vectors xn with |ℓ(xn )| → ∞ and a vector y with
ℓ(y) = 1.)
Problem 1.52. Show that the norm of a nontrivial linear functional ℓ ∈ X* equals the reciprocal of the distance of the hyperplane ℓ(x) = 1 to the origin:
∥ℓ∥ = 1 / inf{∥x∥ | ℓ(x) = 1}.

1.7. Sums and quotients of Banach spaces


Given two normed spaces X1 and X2 we can define their (direct) sum
X := X1 ⊕ X2 as the Cartesian product X1 × X2 together with the norm
∥(x1 , x2 )∥ := ∥x1 ∥ + ∥x2 ∥. Clearly X is again a normed space and a se-
quence in X converges if and only if the components converge in X1 and
X2 , respectively. Hence X1 ⊕ X2 will be complete iff both X1 and X2 are
complete.
Moreover, since all norms on R2 are equivalent (Theorem 1.8), we could
equivalently take the norms ∥(x1 , x2 )∥p := (∥x1 ∥p +∥x2 ∥p )1/p or ∥(x1 , x2 )∥∞ :=
max(∥x1 ∥, ∥x2 ∥). We will write X1 ⊕p X2 if we want to emphasize the norm
used. In particular, in the case of Hilbert spaces the choice p = 2 will
ensure that X is again a Hilbert space associated with the scalar product
⟨(x1 , x2 ), (y1 , y2 )⟩ := ⟨x1 , y1 ⟩ + ⟨x2 , y2 ⟩.
Note that X1 and X2 can be regarded as closed subspaces of X1 × X2 by virtue of the obvious embeddings x1 ↪ (x1, 0) and x2 ↪ (0, x2). It is straightforward to generalize this concept to finitely many spaces (Problem 1.54).
Lemma 1.18. Let Xj, j = 1, . . . , n, be Banach spaces and define X := ⊕_{p,j=1}^n Xj to be the Cartesian product X1 × · · · × Xn together with the norm
∥(x1, . . . , xn)∥_p := (Σ_{j=1}^n ∥xj∥^p)^{1/p} for 1 ≤ p < ∞,   ∥(x1, . . . , xn)∥_∞ := max_{j=1,...,n} ∥xj∥.
Then X is a Banach space. Moreover, all norms are equivalent and the sum is associative: (X1 ⊕p X2) ⊕p X3 ≅ X1 ⊕p (X2 ⊕p X3).
If Aj : D(Aj) ⊆ Xj → Yj, j = 1, 2, are linear operators, then we can define a linear operator via
A1 ⊕ A2 : D(A1) × D(A2) ⊆ X1 ⊕ X2 → Y1 ⊕ Y2,   (x1, x2) ↦ (A1 x1, A2 x2).   (1.65)
Clearly A1 ⊕ A2 will be bounded if and only if both A1 and A2 are bounded and ∥A1 ⊕ A2∥ = max(∥A1∥, ∥A2∥).
Note that if Aj : Xj → Y, j = 1, 2, there is another natural way of defining an associated operator given by
A1 ⊕̂ A2 : D(A1) × D(A2) ⊆ X1 ⊕ X2 → Y,   (x1, x2) ↦ A1 x1 + A2 x2.   (1.66)
Again A1 ⊕̂ A2 will be bounded if and only if both A1 and A2 are bounded and ∥A1 ⊕̂ A2∥ = max(∥A1∥, ∥A2∥). If an index p ≠ 1 is used to define X1 ⊕ X2, then ∥A1 ⊕̂ A2∥ = (∥A1∥^q + ∥A2∥^q)^{1/q} with 1/p + 1/q = 1.
In particular, in the case Y = C we get that (X1 ⊕p X2)* ≅ X1* ⊕q X2* for 1/p + 1/q = 1 via the identification (ℓ1, ℓ2) ∈ X1* ⊕q X2* ↦ ℓ1 ⊕̂ ℓ2 ∈ (X1 ⊕p X2)*. It is not hard to see that this identification is bijective and preserves the norm (Problem 1.55).
Lemma 1.19. Let Xj, j = 1, . . . , n, be Banach spaces. Then (⊕_{p,j=1}^n Xj)* ≅ ⊕_{q,j=1}^n Xj*, where 1/p + 1/q = 1.

Given two subspaces M, N ⊆ X of a vector space, we can define their sum as usual: M + N := {x + y | x ∈ M, y ∈ N}. In particular, the decomposition x + y with x ∈ M, y ∈ N is unique iff M ∩ N = {0} and we will write M ∔ N in this case. It is important to observe that M ∔ N is in general not isomorphic to M ⊕ N since both have different norms. In fact, M ∔ N might not even be closed (no problems occur if one of the spaces is finite dimensional — see Corollary 1.21 below).
Example 1.23. Consider X := ℓp(N) with 1 ≤ p < ∞. Let M := {a ∈ X | a2n = 0} and N := {b ∈ X | b2n−1 = n³ b2n}. Then both subspaces are closed and M ∩ N = {0}. Moreover, if c = a + b with a ∈ M and b ∈ N, then b2n = c2n (and b2n−1 = n³ c2n) as well as a2n−1 = c2n−1 − n³ c2n (and a2n = 0). Hence there is such a splitting for c if and only if (n³ c2n)_{n∈N} ∈ ℓp(N). In particular, this works for all sequences with compact support and thus M ∔ N is dense. However, it is not all of X since c with cn = 1/n² is not in M ∔ N. Indeed, by the above analysis we would have b2n = 1/(4n²) and hence b2n−1 = n/4, contradicting b ∈ N ⊆ X. What about the case p = ∞? ⋄
A closed subspace M is called complemented if we can find another
closed subspace N with M ∩ N = {0} and M ∔ N = X. In this case every
x ∈ X can be uniquely written as x = x1 + x2 with x1 ∈ M , x2 ∈ N and
we can define a projection P : X → M , x 7→ x1 . By definition P 2 = P
and we have a complementary projection Q := I − P with Q : X → N ,
x 7→ x2 . Moreover, it is straightforward to check M = Ker(Q) = Ran(P )
and N = Ker(P ) = Ran(Q). Of course one would like P (and hence also
Q) to be continuous. If we consider the linear bijection ϕ : M ⊕ N → X, (x1, x2) ↦ x1 + x2, then this is equivalent to the question whether ϕ^{-1} is continuous. By the triangle inequality ϕ is continuous with ∥ϕ∥ ≤ 1 and the inverse mapping theorem (Theorem 4.8) will answer this question affirmatively. In summary, we have M ⊕ N ≅ X.
It is important to emphasize that it is precisely the requirement that N is closed which makes P continuous (conversely observe that N = Ker(P) is closed if P is continuous). Without this requirement we can always find
N by a simple application of Zorn’s lemma (order the subspaces which have
trivial intersection with M by inclusion and note that a maximal element has
the required properties). Moreover, the question which closed subspaces can
be complemented is a highly nontrivial one. If M is finite (co)dimensional,
then it can be complemented (see Problems 1.62 and 4.23).
Given a subspace M of a linear space X we can define the quotient
space X/M as the set of all equivalence classes [x] = x + M with respect to
the equivalence relation x ≡ y if x − y ∈ M . It is straightforward to see that
X/M is a vector space when defining [x]+[y] = [x+y] and α[x] = [αx] (show
that these definitions are independent of the representative of the equivalence
class). The dimension of X/M is known as the codimension of M .
In particular, for a linear operator A : X → Y the linear space Coker(A) := Y / Ran(A) is known as the cokernel of A.
Lemma 1.20. Let M be a closed subspace of a normed space X. Then X/M together with the norm
∥[x]∥ := dist(x, M) = inf_{y∈M} ∥x − y∥   (1.67)
is a normed space. It is complete if X is.

Proof. First of all we need to show that (1.67) is indeed a norm. If ∥[x]∥ = 0 we must have a sequence yj ∈ M with yj → −x and since M is closed we conclude x ∈ M, that is, [x] = [0] as required. To see ∥α[x]∥ = |α|∥[x]∥ for α ≠ 0 (the case α = 0 being obvious) we use again the definition
∥α[x]∥ = ∥[αx]∥ = inf_{y∈M} ∥αx + y∥ = inf_{y∈M} ∥αx + αy∥ = |α| inf_{y∈M} ∥x + y∥ = |α|∥[x]∥.
The triangle inequality follows with a similar argument:
∥[x] + [y]∥ = ∥[x + y]∥ = inf_{z∈M} ∥x + y + z∥ = inf_{z1,z2∈M} ∥x + z1 + y + z2∥ ≤ inf_{z1∈M} ∥x + z1∥ + inf_{z2∈M} ∥y + z2∥ = ∥[x]∥ + ∥[y]∥.
Thus (1.67) is a norm and it remains to show that X/M is complete if X is. To this end let [xn] be a Cauchy sequence. Since it suffices to show that some subsequence has a limit, we can assume ∥[xn+1] − [xn]∥ < 2^{−n} without loss of generality. Moreover, by definition of (1.67) we can choose the representatives xn such that ∥xn+1 − xn∥ < 2^{−n} (start with x1 and then choose the remaining ones inductively). By construction xn is a Cauchy sequence which has a limit x ∈ X since X is complete. Moreover, by ∥[xn] − [x]∥ = ∥[xn − x]∥ ≤ ∥xn − x∥ we see that [x] is the limit of [xn]. □

Observe that dist(x, M) = 0 whenever x ∈ M and hence we only get a semi-norm if M is not closed.
Example 1.24. If X := C[0,1] and M := {f ∈ X | f(0) = 0}, then X/M ≅ C. In fact, note that every f ∈ X can be written as f(x) = g(x) + α with g(x) := f(x) − f(0) ∈ M and α := f(0) ∈ C. ⋄
Example 1.25. If X := c(N), the convergent sequences, and M := c0(N), the sequences converging to 0, then X/M ≅ C. In fact, note that every sequence x ∈ c(N) can be written as x = y + αe with y ∈ c0(N), e := (1, 1, 1, . . .), and α := lim_{n→∞} xn ∈ C its limit. ⋄
The quotient map π : X → X/M, x ↦ [x], is a linear surjective map with Ker(π) = M. By ∥[x]∥ ≤ ∥x∥ it is bounded with norm at most one. As a small application we note:

Corollary 1.21. Let X be a normed space and let M, N ⊆ X be two closed subspaces with one of them, say N, finite dimensional. Then M + N is also closed.
Proof. If π : X → X/M denotes the quotient map, then M + N = π^{-1}(π(N)). Moreover, since π(N) is finite dimensional it is closed and hence π^{-1}(π(N)) is closed by continuity. □
Problem 1.53. Let X be a Banach space and suppose P ∈ L (X) is a
projection (i.e., P 2 = P ). Show that Q := I−P is also a projection satisfying
P Q = QP = 0.
Problem* 1.54. Prove Lemma 1.18.
Problem* 1.55. Prove Lemma 1.19. (Hint: Hölder’s inequality in Cn and
note that equality is attained.)
Problem 1.56. Let Xj, j ∈ N, be Banach spaces. Let X := ⊕_{p,j∈N} Xj be the set of all elements x = (xj)_{j∈N} of the Cartesian product for which the norm
∥x∥_p := (Σ_{j∈N} ∥xj∥^p)^{1/p} for 1 ≤ p < ∞,   ∥x∥_∞ := sup_{j∈N} ∥xj∥,
is finite. Show that X is a Banach space. Show that for 1 ≤ p < ∞ the elements with finitely many nonzero terms are dense and conclude that X is separable if all Xj are.
Problem 1.57. Let X := ℓp (N) and M := {a ∈ X|a2n = 0}, N := {a ∈
X|n a2n = a2n−1 }. Is M ∔ N closed?
Problem 1.58. Let ℓ be a nontrivial linear functional. Then its kernel has
codimension one.
Problem 1.59. Consider X := ℓ∞ (N) and M := c0 (N). Show dist(a, M ) =
lim supj |aj | for a ∈ X.
Problem 1.60 (Complexification). Given a real normed space X its complexification is given by XC := X × X together with the (complex) scalar multiplication α(x, y) = (Re(α)x − Im(α)y, Re(α)y + Im(α)x). By virtue of the embedding x ↪ (x, 0) you should of course think of (x, y) as x + iy.
Show that
∥x + iy∥C := max_{0≤t≤π} ∥cos(t)x + sin(t)y∥
defines a norm on XC which satisfies ∥x∥C = ∥x∥ and
max(∥x∥, ∥y∥) ≤ ∥x + iy∥C ≤ (∥x∥² + ∥y∥²)^{1/2}.
In particular, this norm is equivalent to the product norm on X ⊕ X.
If X is a Hilbert space, then the above norm will in general not give rise to a scalar product. However, any bilinear form s : X × X → R gives rise to a sesquilinear form sC(x1 + iy1, x2 + iy2) := s(x1, x2) + s(y1, y2) + i(s(x1, y2) − s(y1, x2)). If s is symmetric or positive definite, so will be sC.
The corresponding norm satisfies ⟨x + iy, x + iy⟩C = ∥x∥² + ∥y∥² and is equivalent to the above one since (1/2)(∥x∥² + ∥y∥²) ≤ ∥x + iy∥²C ≤ ∥x∥² + ∥y∥².
Given two real normed spaces X, Y, every linear operator A : X → Y gives rise to a linear operator AC : XC → YC via AC(x + iy) = Ax + iAy. Show ∥AC∥ = ∥A∥.
Problem* 1.61. Suppose A ∈ L (X, Y ). Show that Ker(A) is closed.
Suppose M ⊆ Ker(A) is a closed subspace. Show that the induced map
à : X/M → Y , [x] 7→ Ax is a well-defined operator satisfying ∥Ã∥ = ∥A∥
and Ker(Ã) = Ker(A)/M . In particular, Ã is injective for M = Ker(A).
Problem* 1.62. Show that if a closed subspace M of a Banach space X has
finite codimension, then it can be complemented. (Hint: Start with a basis
{[xj ]} for X/M and choose a corresponding dual basis {ℓk } with ℓk ([xj ]) =
δj,k .)

1.8. Spaces of continuous and differentiable functions


In this section we introduce a few further sets of continuous and differentiable functions which are of interest in applications. Let I be some compact interval. Then we can make C^1(I) into a Banach space (Problem 1.46) by introducing the norm ∥f∥_{1,∞} := ∥f∥∞ + ∥f′∥∞. By a straightforward extension we can even get (cf. Problem 1.65)
Theorem 1.22. Let I ⊆ R be some interval. The space Cb^k(I) of all functions whose derivatives up to order k are bounded and continuous forms a Banach space with norm
∥f∥_{k,∞} := Σ_{j=0}^k sup_{x∈I} |f^{(j)}(x)|.   (1.68)

Note that the space Cb^k(I) could be further refined by requiring the highest derivatives to be Hölder continuous. Recall that a function f : I → C is called uniformly Hölder continuous with exponent γ ∈ (0, 1] if
[f]_γ := sup_{x≠y∈I} |f(x) − f(y)|/|x − y|^γ   (1.69)
is finite. Clearly, any Hölder continuous function is uniformly continuous and, in the special case γ = 1, we obtain the Lipschitz continuous functions. Note that for γ = 0 the Hölder condition boils down to boundedness and also the case γ > 1 is not very interesting (Problem 1.63).
Example 1.26. By the mean value theorem every function f ∈ Cb^1(I) is Lipschitz continuous with [f]_1 ≤ ∥f′∥∞. ⋄
Example 1.27. The prototypical example of a Hölder continuous function is of course f(x) := x^γ on [0, ∞) with γ ∈ (0, 1]. In fact, without loss of generality we can assume 0 ≤ x < y and set t := x/y ∈ [0, 1). Then, since t^γ ≥ t and (1 − t)^γ ≥ 1 − t, we have
(y^γ − x^γ)/(y − x)^γ = (1 − t^γ)/(1 − t)^γ ≤ (1 − t)/(1 − t) = 1.
From this one easily gets further examples since the composition of two Hölder continuous functions is again Hölder continuous (the exponent being the product). ⋄
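The estimate [f]_γ ≤ 1 for f(x) = x^γ is straightforward to spot check numerically; the sketch below (random sample points on an arbitrary range, purely illustrative) tests |y^γ − x^γ| ≤ |y − x|^γ for several exponents:

```python
import random

# Spot check of Example 1.27 (random sample points, illustration only):
# f(x) = x**g with g in (0, 1] satisfies y**g - x**g <= (y - x)**g for
# 0 <= x < y, i.e. the Hoelder seminorm [f]_g is at most 1.
random.seed(1)
for g in (0.25, 0.5, 0.9, 1.0):
    for _ in range(1000):
        x, y = sorted(random.uniform(0.0, 10.0) for _ in range(2))
        if x == y:
            continue
        assert y ** g - x ** g <= (y - x) ** g + 1e-12
```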
It is easy to verify that this is a seminorm and that the corresponding
space is complete.

Theorem 1.23. Let I ⊆ R be an interval. The space Cb^{k,γ}(I) of all functions whose derivatives up to order k are bounded and Hölder continuous with exponent γ ∈ (0, 1] forms a Banach space with norm
∥f∥_{k,γ,∞} := ∥f∥_{k,∞} + [f^{(k)}]_γ.   (1.70)

As already noted before, in the case γ = 0 we get a norm which is equiv-
alent to ∥f∥_{k,∞} and we will set C_b^{k,0}(I) := C_b^k(I) for notational convenience
later on.
Note that by the mean value theorem all derivatives up to order lower
than k are automatically Lipschitz continuous. Moreover, every Hölder con-
tinuous function is uniformly continuous and hence has a unique extension
to the closure Ī (cf. Theorem B.39). In this sense, the spaces C_b^{0,γ}(I) and
C_b^{0,γ}(Ī) are naturally isomorphic. Finally, since Hölder continuous functions
on a bounded domain are automatically bounded, we can drop the subscript
b in this situation.

Theorem 1.24. Suppose I ⊂ R is a compact interval. Then C^{0,γ_2}(I) ⊆
C^{0,γ_1}(I) ⊆ C(I) for 0 < γ_1 < γ_2 ≤ 1 with the embeddings being compact.

Proof. That we have continuous embeddings follows since |x − y|^{−γ_1} =
|x − y|^{γ_2−γ_1} |x − y|^{−γ_2} ≤ (2r)^{γ_2−γ_1} |x − y|^{−γ_2} if r denotes the length of I.
Moreover, that the embedding C^{0,γ_1}(I) ⊆ C(I) is compact follows from the
Arzelà–Ascoli theorem (Theorem 1.13). To see the remaining claim let f_m be
a bounded sequence in C^{0,γ_2}(I), explicitly ∥f_m∥_∞ ≤ C and [f_m]_{γ_2} ≤ C.
Hence by the Arzelà–Ascoli theorem we can assume that f_m converges
uniformly to some f ∈ C(I). Moreover, taking the limit in |f_m(x) − f_m(y)| ≤
C|x − y|^{γ_2} we see that we even have f ∈ C^{0,γ_2}(I). To see that f is the
limit of f_m in C^{0,γ_1}(I) we need to show [g_m]_{γ_1} → 0, where g_m := f_m − f.
Now observe that

    [g_m]_{γ_1} = sup_{x≠y∈I: |x−y|≥ε} |g_m(x) − g_m(y)| / |x − y|^{γ_1}
                   + sup_{x≠y∈I: |x−y|<ε} |g_m(x) − g_m(y)| / |x − y|^{γ_1}
              ≤ 2∥g_m∥_∞ ε^{−γ_1} + [g_m]_{γ_2} ε^{γ_2−γ_1} ≤ 2∥g_m∥_∞ ε^{−γ_1} + 2Cε^{γ_2−γ_1},

implying lim sup_{m→∞} [g_m]_{γ_1} ≤ 2Cε^{γ_2−γ_1} and since ε > 0 is arbitrary this
establishes the claim. □

As pointed out in Example 1.26, the embedding C_b^1(I) ⊆ C_b^{0,1}(I) is
continuous and combining this with the previous result immediately gives

Corollary 1.25. Suppose I ⊂ R is a compact interval, k_1, k_2 ∈ N_0, and
0 ≤ γ_1, γ_2 ≤ 1. Then C^{k_2,γ_2}(I) ⊆ C^{k_1,γ_1}(I) for k_1 + γ_1 ≤ k_2 + γ_2 with the
embeddings being compact if the inequality is strict.

For now continuous functions on intervals will be sufficient for our pur-
pose. However, once we delve deeper into the subject we will also need
continuous functions on topological spaces X. Luckily most of the results
extend to this case in a more or less straightforward way. If you are not
familiar with these extensions you can find them in Section B.8.
Problem 1.63. Let I be an interval. Suppose f : I → C is Hölder continu-
ous with exponent γ > 1. Show that f is constant.
Problem 1.64. Let I := [a, b] be a compact interval and consider C 1 (I).
Which of the following is a norm? In case of a norm, is it equivalent to
∥.∥1,∞ ?
(i) ∥f ∥∞
(ii) ∥f ′ ∥∞
(iii) |f (a)| + ∥f ′ ∥∞
(iv) |f (a) − f (b)| + ∥f ′ ∥∞
(v) ∫_a^b |f(x)| dx + ∥f∥_∞
Problem* 1.65. Suppose X is a vector space and ∥.∥_j, 1 ≤ j ≤ n, is a
finite family of seminorms. Show that ∥x∥ := Σ_{j=1}^n ∥x∥_j is a seminorm. It
is a norm if and only if ∥x∥_j = 0 for all j implies x = 0.
Problem 1.66. Let I be a compact interval. Show that the product of two
bounded Hölder continuous functions is again Hölder continuous with
[f g]γ ≤ ∥f ∥∞ [g]γ + [f ]γ ∥g∥∞ .
Chapter 2

Hilbert spaces

The additional geometric structure of Hilbert spaces allows for an intuitive


geometric solution of many problems. In fact, in many situations, e.g. in
Quantum Mechanics, Hilbert spaces occur naturally. This makes them the
weapon of choice whenever possible. Throughout this chapter H will be a
(complex) Hilbert space.

2.1. Orthonormal bases


In this section we will investigate orthonormal series and you will notice
hardly any difference between the finite and infinite dimensional cases. As
our first task, let us generalize the projection into the direction of one vector.
A set of vectors {uj } is called an orthonormal set if ⟨uj , uk ⟩ = 0
for j ̸= k and ⟨uj , uj ⟩ = 1. Note that every orthonormal set is linearly
independent (show this).
Lemma 2.1. Suppose {u_j}_{j=1}^n is a finite orthonormal set in a Hilbert space
H. Then every f ∈ H can be written as

    f = f_∥ + f_⊥,   f_∥ := Σ_{j=1}^n ⟨u_j, f⟩ u_j,   (2.1)

where f_∥ and f_⊥ are orthogonal. Moreover, ⟨u_j, f_⊥⟩ = 0 for all 1 ≤ j ≤ n.
In particular,

    ∥f∥² = Σ_{j=1}^n |⟨u_j, f⟩|² + ∥f_⊥∥².   (2.2)

Furthermore, every f̂ in the span of {u_j}_{j=1}^n satisfies

    ∥f − f̂∥ ≥ ∥f_⊥∥   (2.3)


with equality holding if and only if f̂ = f_∥. In other words, f_∥ is uniquely
characterized as the vector in the span of {u_j}_{j=1}^n closest to f.

Proof. A straightforward calculation shows ⟨u_j, f − f_∥⟩ = 0 and hence f_∥
and f_⊥ := f − f_∥ are orthogonal. The formula for the norm follows by
applying (1.40) iteratively.

Now, fix a vector f̂ := Σ_{j=1}^n α_j u_j in the span of {u_j}_{j=1}^n. Then one
computes

    ∥f − f̂∥² = ∥f_∥ + f_⊥ − f̂∥² = ∥f_⊥∥² + ∥f_∥ − f̂∥²
             = ∥f_⊥∥² + Σ_{j=1}^n |α_j − ⟨u_j, f⟩|²

from which the last claim follows. □

From (2.2) we obtain Bessel's inequality¹

    Σ_{j=1}^n |⟨u_j, f⟩|² ≤ ∥f∥²   (2.4)

with equality holding if and only if f lies in the span of {u_j}_{j=1}^n.
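In finite dimensions the splitting (2.1) and Bessel's inequality (2.4) can be verified by direct computation. A minimal numerical sketch in C³ (our own code; the choice of orthonormal set and vector is arbitrary):

```python
def inner(g, f):
    # <g, f>, conjugate-linear in the first argument as in the text
    return sum(x.conjugate() * y for x, y in zip(g, f))

u = [[1, 0, 0], [0, 1, 0]]           # a finite orthonormal set in C^3
f = [3 + 1j, -2j, 5 + 0j]

coef = [inner(uj, f) for uj in u]    # the coefficients <u_j, f>
f_par = [sum(c * uj[k] for c, uj in zip(coef, u)) for k in range(3)]
f_perp = [x - y for x, y in zip(f, f_par)]

# f_perp is orthogonal to every u_j:
assert all(inner(uj, f_perp) == 0 for uj in u)
# the norm identity (2.2) and Bessel's inequality (2.4):
norm2 = inner(f, f).real
assert abs(norm2 - sum(abs(c)**2 for c in coef) - inner(f_perp, f_perp).real) < 1e-12
assert sum(abs(c)**2 for c in coef) <= norm2
```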
Of course, since we cannot assume H to be a finite dimensional vec-
tor space, we need to generalize Lemma 2.1 to arbitrary orthonormal sets
{uj }j∈J . We start by assuming that J is countable. Then Bessel’s inequality
(2.4) shows that
    Σ_{j∈J} |⟨u_j, f⟩|²   (2.5)

converges absolutely. Moreover, for any finite subset K ⊂ J we have

    ∥Σ_{j∈K} ⟨u_j, f⟩u_j∥² = Σ_{j∈K} |⟨u_j, f⟩|²   (2.6)

by the Pythagorean theorem and thus Σ_{j∈J} ⟨u_j, f⟩u_j is a Cauchy sequence
if and only if Σ_{j∈J} |⟨u_j, f⟩|² is. Now let J be arbitrary. Again, Bessel's
inequality shows that for any given ε > 0 there are at most finitely many
j for which |⟨u_j, f⟩| ≥ ε (namely at most (∥f∥/ε)² many). Hence there are at
most countably many j for which |⟨u_j, f⟩| > 0. Thus it follows that

    Σ_{j∈J} |⟨u_j, f⟩|²   (2.7)

1Friedrich Bessel (1784–1846), German astronomer, mathematician, physicist, and geodesist



is well defined (as a countable sum over the nonzero terms) and (by com-
pleteness) so is
    Σ_{j∈J} ⟨u_j, f⟩u_j.   (2.8)
Furthermore, it is also independent of the order of summation.
In particular, by continuity of the scalar product we see that Lemma 2.1
can be generalized to arbitrary orthonormal sets.
Theorem 2.2. Suppose {u_j}_{j∈J} is an orthonormal set in a Hilbert space H.
Then every f ∈ H can be written as

    f = f_∥ + f_⊥,   f_∥ := Σ_{j∈J} ⟨u_j, f⟩u_j,   (2.9)

where f_∥ and f_⊥ are orthogonal. Moreover, ⟨u_j, f_⊥⟩ = 0 for all j ∈ J. In
particular,

    ∥f∥² = Σ_{j∈J} |⟨u_j, f⟩|² + ∥f_⊥∥².   (2.10)

Furthermore, every f̂ ∈ span{u_j}_{j∈J} satisfies

    ∥f − f̂∥ ≥ ∥f_⊥∥   (2.11)

with equality holding if and only if f̂ = f_∥. In other words, f_∥ is uniquely
characterized as the vector in span{u_j}_{j∈J} closest to f.

Proof. The first part follows as in Lemma 2.1 using continuity of the scalar
product. The same is true for the last part except for the fact that every
f ∈ span{u_j}_{j∈J} can be written as f = Σ_{j∈J} α_j u_j (i.e., f = f_∥). To see this,
let f_n ∈ span{u_j}_{j∈J} converge to f. Then ∥f − f_n∥² = ∥f_∥ − f_n∥² + ∥f_⊥∥² → 0
implies f_n → f_∥ and f_⊥ = 0. □

Note that from Bessel’s inequality (which of course still holds), it follows
that the map f → f∥ is continuous.
Of course we are particularly interested in the case where every f ∈ H
can be written as Σ_{j∈J} ⟨u_j, f⟩u_j. In this case we will call the orthonormal
set {u_j}_{j∈J} an orthonormal basis (ONB).
If H is separable it is easy to construct an orthonormal basis. In fact, if
H is separable, then there exists a countable total set {f_j}_{j=1}^N. Here N ∈ N
if H is finite dimensional and N = ∞ otherwise. After throwing away some
vectors, we can assume that f_{n+1} cannot be expressed as a linear combination
of the vectors f_1, . . . , f_n. Now we can construct an orthonormal set as
follows: We begin by normalizing f_1:

    u_1 := f_1 / ∥f_1∥.   (2.12)

Next we take f_2 and remove the component parallel to u_1 and normalize
again:

    u_2 := (f_2 − ⟨u_1, f_2⟩u_1) / ∥f_2 − ⟨u_1, f_2⟩u_1∥.   (2.13)

Proceeding like this, we define recursively

    u_n := (f_n − Σ_{j=1}^{n−1} ⟨u_j, f_n⟩u_j) / ∥f_n − Σ_{j=1}^{n−1} ⟨u_j, f_n⟩u_j∥.   (2.14)
This procedure is known as Gram–Schmidt orthogonalization.² Hence
we obtain an orthonormal set {u_j}_{j=1}^N such that span{u_j}_{j=1}^n = span{f_j}_{j=1}^n
for any finite n and thus also for n = N (if N = ∞). Since {f_j}_{j=1}^N is total,
so is {u_j}_{j=1}^N. Now suppose there is some f = f_∥ + f_⊥ ∈ H for which f_⊥ ≠ 0.
Since {u_j}_{j=1}^N is total, we can find an f̂ in its span such that ∥f − f̂∥ < ∥f_⊥∥,
contradicting (2.11). Hence we infer that {u_j}_{j=1}^N is an orthonormal basis.
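The recursion (2.12)–(2.14) translates directly into code. A small sketch in C³ (our own implementation; vector and function names are not from the text):

```python
import math

def inner(g, f):
    return sum(x.conjugate() * y for x, y in zip(g, f))

def gram_schmidt(vectors):
    """Orthonormalize a linearly independent family as in (2.12)-(2.14)."""
    basis = []
    for f in vectors:
        v = list(f)
        for u in basis:              # remove the components parallel to u_j
            c = inner(u, v)
            v = [x - c * y for x, y in zip(v, u)]
        norm = math.sqrt(inner(v, v).real)
        basis.append([x / norm for x in v])
    return basis

u = gram_schmidt([[1, 1, 0], [0, 1, 1], [1, 0, 1]])
for j in range(3):                   # check <u_j, u_k> = delta_{jk}
    for k in range(3):
        assert abs(inner(u[j], u[k]) - (1 if j == k else 0)) < 1e-12
```

Note that, just as in the text, the procedure fails (division by zero) when some f_n lies in the span of its predecessors, which is why those vectors are thrown away first.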

Theorem 2.3. Every separable Hilbert space has a countable orthonormal


basis.
Example 2.1. The vectors {δ n }n∈N form an orthonormal basis for ℓ2 (N). ⋄
Example 2.2. In L²_cont(−1, 1), we can orthogonalize the monomials f_n(x) :=
x^n (which are total by the Weierstraß approximation theorem — Theo-
rem 1.3). The resulting polynomials are up to a normalization known as
Legendre polynomials³

    P_0(x) = 1,   P_1(x) = x,   P_2(x) = (3x² − 1)/2,   . . .   (2.15)

(which are normalized such that P_n(1) = 1). ⋄
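The orthogonalization behind (2.15) can be reproduced exactly with rational arithmetic, using ∫_{−1}^1 x^n dx = 2/(n+1) for even n and 0 for odd n. The following sketch (our own code; polynomials are coefficient lists [c_0, c_1, . . .]) recovers P₂ after rescaling so that P₂(1) = 1:

```python
from fractions import Fraction

def inner(p, q):
    # <p, q> = int_{-1}^{1} p(x) q(x) dx, computed exactly coefficient-wise
    s = Fraction(0)
    for j, a in enumerate(p):
        for k, b in enumerate(q):
            if (j + k) % 2 == 0:
                s += Fraction(a) * Fraction(b) * Fraction(2, j + k + 1)
    return s

def orthogonalize(monomials):
    basis = []
    for f in monomials:
        v = list(f)
        for u in basis:              # subtract the projection onto u
            c = inner(u, v) / inner(u, u)
            v = [a - c * b for a, b in zip(v, u + [0] * (len(v) - len(u)))]
        basis.append(v)
    return basis

P = orthogonalize([[1], [0, 1], [0, 0, 1]])   # 1, x, x^2
# rescaling so that P_2(1) = 1 gives (3x^2 - 1)/2 as in (2.15):
P2 = [c / sum(P[2]) for c in P[2]]
assert P2 == [Fraction(-1, 2), Fraction(0), Fraction(3, 2)]
```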
Example 2.3. The set of functions

    u_n(x) := (1/√(2π)) e^{inx},   n ∈ Z,   (2.16)

forms an orthonormal basis for H := L²_cont(0, 2π). The corresponding or-
thogonal expansion is just the ordinary Fourier series. We will discuss this
example in detail in Section 2.5. ⋄
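Orthonormality of (2.16) is also easy to check numerically: the trapezoidal rule integrates e^{ikx} exactly over a full period whenever k is not a multiple of the number of sample points. A short sketch (our own code; the grid size N = 64 is an arbitrary choice):

```python
import cmath, math

def fourier_inner(m, n, N=64):
    # <u_m, u_n> = (1/(2*pi)) * int_0^{2pi} e^{-imx} e^{inx} dx,
    # approximated by the trapezoidal rule (exact here for |n - m| < N)
    h = 2 * math.pi / N
    s = sum(cmath.exp(1j * (n - m) * k * h) for k in range(N))
    return s * h / (2 * math.pi)

assert abs(fourier_inner(3, 3) - 1) < 1e-12   # <u_n, u_n> = 1
assert abs(fourier_inner(5, -2)) < 1e-12      # <u_m, u_n> = 0 for m != n
```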
The following equivalent properties also characterize a basis.
Theorem 2.4. For an orthonormal set {uj }j∈J in a Hilbert space H, the
following conditions are equivalent:
(i) {uj }j∈J is a maximal orthogonal set.
2Jørgen Pedersen Gram (1850–1916), Danish actuary and mathematician
2Erhard Schmidt (1876–1959), Baltic German mathematician
3Adrien-Marie Legendre (1752–1833), French mathematician

(ii) For every vector f ∈ H we have

    f = Σ_{j∈J} ⟨u_j, f⟩u_j.   (2.17)

(iii) For every vector f ∈ H we have Parseval's relation⁴

    ∥f∥² = Σ_{j∈J} |⟨u_j, f⟩|².   (2.18)

(iv) ⟨uj , f ⟩ = 0 for all j ∈ J implies f = 0.

Proof. We will use the notation from Theorem 2.2.


(i) ⇒ (ii): If f⊥ ̸= 0, then we can normalize f⊥ to obtain a unit vector f˜⊥
which is orthogonal to all vectors uj . But then {uj }j∈J ∪ {f˜⊥ } would be a
larger orthonormal set, contradicting the maximality of {uj }j∈J .
(ii) ⇒ (iii): This follows since (ii) implies f⊥ = 0.
(iii) ⇒ (iv): If ⟨f, uj ⟩ = 0 for all j ∈ J, we conclude ∥f ∥2 = 0 and hence
f = 0.
(iv) ⇒ (i): If {uj }j∈J were not maximal, there would be a unit vector g such
that {uj }j∈J ∪ {g} is a larger orthonormal set. But ⟨uj , g⟩ = 0 for all j ∈ J
implies g = 0 by (iv), a contradiction. □

By continuity of the norm it suffices to check (iii), and hence also (ii),
for f in a dense set. In fact, by the inverse triangle inequality for ℓ²(N) and
the Bessel inequality we have

    |Σ_{j∈J} |⟨u_j, f⟩|² − Σ_{j∈J} |⟨u_j, g⟩|²|
        ≤ (Σ_{j∈J} |⟨u_j, f − g⟩|²)^{1/2} (Σ_{j∈J} |⟨u_j, f + g⟩|²)^{1/2}
        ≤ ∥f − g∥∥f + g∥   (2.19)

implying Σ_{j∈J} |⟨u_j, f_n⟩|² → Σ_{j∈J} |⟨u_j, f⟩|² if f_n → f.
j∈J j∈J
It is not surprising that if there is one countable basis, then it follows
that every other basis is countable as well.
Theorem 2.5. In a Hilbert space H every orthonormal basis has the same
cardinality.

Proof. Let {u_j}_{j∈J} and {v_k}_{k∈K} be two orthonormal bases. We first look
at the case where one of them, say the first, is finite: J = {1, . . . , n}.
Suppose the other basis has at least n elements {1, . . . , n} ⊆ K. Then
v_k = Σ_{j=1}^n U_{k,j} u_j, where U_{k,j} := ⟨u_j, v_k⟩. By δ_{j,k} = ⟨v_j, v_k⟩ =
Σ_{l=1}^n U_{j,l}^* U_{k,l} we see Σ_{k=1}^n U_{k,j}^* v_k = u_j showing that v_1, . . . , v_n span H and
hence K cannot have more than n elements.


4Marc-Antoine Parseval (1755–1836), French mathematician

Now let us turn to the case where both J and K are infinite. Set K_j :=
{k ∈ K | ⟨v_k, u_j⟩ ≠ 0}. Since these are the expansion coefficients of u_j with
respect to {v_k}_{k∈K}, this set is countable (and nonempty). Hence the set
K̃ := ∪_{j∈J} K_j satisfies |K̃| ≤ |J × N| = |J| (Theorem A.9). But k ∈ K \ K̃
implies v_k = 0 and hence K̃ = K. So |K| ≤ |J| and reversing the roles of J
and K shows |K| = |J| (Theorem A.3). □

The cardinality of an orthonormal basis is also called the Hilbert space


dimension of H.
It even turns out that, up to unitary equivalence, ℓ2 (N) is the only sep-
arable infinite dimensional Hilbert space:
A bijective linear operator U ∈ L (H1 , H2 ) is called unitary if U pre-
serves scalar products:
⟨U g, U f ⟩2 = ⟨g, f ⟩1 , g, f ∈ H1 . (2.20)
By the polarization identity (1.47), this is the case if and only if U preserves
norms: ∥Uf∥_2 = ∥f∥_1 for all f ∈ H_1 (note that a norm preserving linear
operator is automatically injective). The two Hilbert spaces H_1 and H_2 are
called unitarily equivalent in this case.

Let H be a separable infinite dimensional Hilbert space and let {u_j}_{j∈N}
be any orthonormal basis. Then the map U : H → ℓ²(N), f ↦ (⟨u_j, f⟩)_{j∈N} is
unitary. Indeed by Theorem 2.4 (iii) it is norm preserving and hence injective.
To see that it is onto, let a ∈ ℓ²(N) and observe that by ∥Σ_{j=m}^n a_j u_j∥² =
Σ_{j=m}^n |a_j|² the vector f := Σ_{j∈N} a_j u_j is well defined and satisfies a_j =
⟨u_j, f⟩. In particular,
Theorem 2.6. Any separable infinite dimensional Hilbert space is unitarily
equivalent to ℓ2 (N).

Of course the same argument shows that every finite dimensional Hilbert
space of dimension n is unitarily equivalent to Cn with the usual scalar
product.
Finally we briefly turn to the case where H is not separable.
Theorem 2.7. Every Hilbert space has an orthonormal basis.

Proof. To prove this we need to resort to Zorn’s lemma (Theorem A.2): The
collection of all orthonormal sets in H can be partially ordered by inclusion.
Moreover, every linearly ordered chain has an upper bound (the union of all
sets in the chain). Hence Zorn’s lemma implies the existence of a maximal
element, that is, an orthonormal set which is not a proper subset of every
other orthonormal set. This maximal element is an ONB by Theorem 2.4
(i). □

Hence, if {u_j}_{j∈J} is an orthonormal basis, we can show that H is unitarily
equivalent to ℓ²(J) and, by prescribing J, we can find a Hilbert space of any
given dimension. Here ℓ²(J) is the set of all complex-valued functions (a_j)_{j∈J}
where at most countably many values are nonzero and Σ_{j∈J} |a_j|² < ∞.
Example 2.4. Define the set of all almost periodic functions AP(R) as
the closure of the set of trigonometric polynomials

    f(t) = Σ_{k=1}^n α_k e^{iθ_k t},   α_k ∈ C, θ_k ∈ R,

with respect to the sup norm. In particular AP (R) ⊂ Cb (R) is a Banach


space when equipped with the sup norm. Since the trigonometric polynomi-
als form an algebra, it is even a Banach algebra. Using the Stone–Weierstraß
theorem one can verify that every periodic function is almost periodic (make
the approximation on one period and note that you get the rest of R for free
from periodicity) but the converse is not true (e.g. e^{it} + e^{i√2 t} is not periodic).
It is not difficult to show that

    lim_{T→∞} (1/(2T)) ∫_{−T}^{T} e^{iθt} dt = { 1, θ = 0,
                                              { 0, θ ≠ 0,
and hence one can conclude that every almost periodic function has a mean
value

    M(f) := lim_{T→∞} (1/(2T)) ∫_{−T}^{T} f(t) dt.

Note that |M (f )| ≤ ∥f ∥∞ .
Next one can show that
⟨f, g⟩ := M (f ∗ g)
defines a scalar product on AP (R). To see that it is positive definite (all other
properties are straightforward), let f ∈ AP (R) with ∥f ∥2 = M (|f |2 ) = 0.
Choose a sequence of trigonometric polynomials fn with ∥f − fn ∥∞ → 0. By
∥f ∥ ≤ ∥f ∥∞ we also have ∥f −fn ∥ → 0. Moreover, by the triangle inequality
(which holds for any nonnegative sesquilinear form — Problem 1.29) we have
∥fn ∥ ≤ ∥f ∥ + ∥f − fn ∥ = ∥f − fn ∥ ≤ ∥f − fn ∥∞ → 0, and thus f = 0.
Abbreviating e_θ(t) := e^{iθt} we see that {e_θ}_{θ∈R} is an uncountable orthonor-
mal set and

    f(t) ↦ f̂(θ) := ⟨e_θ, f⟩ = M(e_{−θ} f)

maps AP(R) isometrically (with respect to ∥.∥) into ℓ²(R). This map is
however not surjective (take e.g. a Fourier series which converges in mean
square but not uniformly — see later) and hence AP (R) is not complete
with respect to ∥.∥. ⋄
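The mean values appearing in Example 2.4 can be evaluated in closed form for the exponentials e_θ, since ∫_{−T}^{T} e^{iθt} dt = 2 sin(θT)/θ for θ ≠ 0. A hedged numerical sketch (our own code and names; T is an arbitrary large cutoff):

```python
import math

def mean_e_theta(theta, T):
    # M_T(e_theta) = (1/(2T)) * int_{-T}^{T} e^{i*theta*t} dt
    #              = sin(theta*T) / (theta*T) for theta != 0
    return 1.0 if theta == 0 else math.sin(theta * T) / (theta * T)

T = 1e8
assert mean_e_theta(0.0, T) == 1.0                   # M(e_0) = 1
assert abs(mean_e_theta(math.sqrt(2), T)) < 1e-7     # M(e_theta) -> 0 otherwise
# orthonormality of {e_theta}: <e_a, e_b> = M(e_{b-a}), which vanishes
# in the limit whenever a != b:
assert abs(mean_e_theta(1.0 - math.sqrt(2), T)) < 1e-7
```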

Problem 2.1. Given some vectors f_1, . . . , f_n we define their Gram determinant
as

    Γ(f_1, . . . , f_n) := det (⟨f_j, f_k⟩)_{1≤j,k≤n}.

Show that the Gram determinant is nonzero if and only if the vectors are
linearly independent. Moreover, show that in this case

    dist(g, span{f_1, . . . , f_n})² = Γ(f_1, . . . , f_n, g) / Γ(f_1, . . . , f_n)

and

    Γ(f_1, . . . , f_n) ≤ Π_{j=1}^n ∥f_j∥²

with equality if the vectors are orthogonal. (Hint: First establish Γ(f_1, . . . , f_j +
αf_k, . . . , f_n) = Γ(f_1, . . . , f_n) for j ≠ k and use it to investigate how Γ changes
when you apply the Gram–Schmidt procedure.)
Problem 2.2. Let {uj } be some orthonormal basis. Show that a bounded
linear operator A is uniquely determined by its matrix elements Ajk :=
⟨uj , Auk ⟩ with respect to this basis.
Problem 2.3. Give an example of a nonempty closed bounded subset of a
Hilbert space which does not contain an element with minimal norm. Can
this happen in finite dimensions? (Hint: Look for a discrete set.)
Problem 2.4. Show that the set of vectors {c_n := (1, n^{−1}, n^{−2}, . . . )}_{n=2}^∞ is
total in ℓ²(N). (Hint: Use that for any a ∈ ℓ²(N) the function f(z) :=
Σ_{j∈N} a_j z^{j−1} is holomorphic in the unit disc.)

2.2. The projection theorem and the Riesz representation


theorem
Let M ⊆ H be a subset. Then
M ⊥ := {f |⟨g, f ⟩ = 0, ∀g ∈ M } (2.21)
is called the orthogonal complement of M . By continuity of the scalar
product it follows that M ⊥ is a closed linear subspace and by linearity that
(span(M ))⊥ = M ⊥ . For example, we have H⊥ = {0} since any vector in H⊥
must be in particular orthogonal to all vectors in some orthonormal basis.
Theorem 2.8 (Projection theorem). Let M be a closed linear subspace of a
Hilbert space H. Then every f ∈ H can be uniquely written as f = f∥ + f⊥
with f∥ ∈ M and f⊥ ∈ M ⊥ , where f∥ is uniquely characterized as the vector
in M closest to f . One writes
M ⊕ M⊥ = H (2.22)
in this situation.

Proof. Since M is closed, it is a Hilbert space and has an orthonormal


basis {uj }j∈J . Hence the existence part follows from Theorem 2.2. To see
uniqueness, suppose there is another decomposition f = f˜∥ + f˜⊥ . Then
f∥ − f˜∥ = f˜⊥ − f⊥ ∈ M ∩ M ⊥ = {0} (since g ∈ M ∩ M ⊥ implies ∥g∥2 =
⟨g, g⟩ = 0). □
Corollary 2.9. Every orthogonal set {uj }j∈J can be extended to an orthog-
onal basis.

Proof. Just add an orthogonal basis for ({uj }j∈J )⊥ . □

The operator PM f := f∥ is called the orthogonal projection corre-


sponding to M. Note that we have

    P_M² = P_M and ⟨P_M g, f⟩ = ⟨g, P_M f⟩   (2.23)
since ⟨PM g, f ⟩ = ⟨g∥ , f∥ ⟩ = ⟨g, PM f ⟩. Clearly we have PM ⊥ f = f −
PM f = f⊥ . Furthermore, (2.23) uniquely characterizes orthogonal projec-
tions (Problem 2.9).
Moreover, if M is a closed subspace, we have PM ⊥⊥ = I − PM ⊥ =
I − (I − PM ) = PM ; that is, M ⊥⊥ = Ran(PM ⊥⊥ ) = Ran(PM ) = M . If M is
an arbitrary subset, we have at least
M ⊥⊥ = span(M ). (2.24)
Note that by H⊥ = {0} we see that M ⊥ = {0} if and only if M is total.
Next we turn to linear functionals, that is, to operators ℓ : H → C. By
the Cauchy–Schwarz inequality we know that ℓ_g : f ↦ ⟨g, f⟩ is a bounded
linear functional (with norm ∥g∥). It turns out that, in a Hilbert space,
every bounded linear functional can be written in this way.
Theorem 2.10 (Riesz5 representation theorem). Suppose ℓ is a bounded
linear functional on a Hilbert space H. Then there is a unique vector g ∈ H
such that ℓ(f ) = ⟨g, f ⟩ for all f ∈ H.
In other words, a Hilbert space is equivalent to its own dual space H* ≅ H
via the map f ↦ ⟨f, .⟩ which is a conjugate linear isometric bijection between
H and H*.

Proof. If ℓ ≡ 0, we can choose g = 0. Otherwise Ker(ℓ) = {f |ℓ(f ) = 0} is a


proper subspace and we can find a unit vector g̃ ∈ Ker(ℓ)⊥ . For every f ∈ H
we have ℓ(f )g̃ − ℓ(g̃)f ∈ Ker(ℓ) and hence
0 = ⟨g̃, ℓ(f )g̃ − ℓ(g̃)f ⟩ = ℓ(f ) − ℓ(g̃)⟨g̃, f ⟩.

5Frigyes Riesz (1880–1956), Hungarian mathematician



In other words, we can choose g = ℓ(g̃)∗ g̃. To see uniqueness, let g1 , g2 be


two such vectors. Then ⟨g1 − g2 , f ⟩ = ⟨g1 , f ⟩ − ⟨g2 , f ⟩ = ℓ(f ) − ℓ(f ) = 0 for
every f ∈ H, which shows g1 − g2 ∈ H⊥ = {0}. □

In particular, this shows that H∗ is again a Hilbert space whose scalar


product (in terms of the above identification) is given by ⟨⟨f, .⟩, ⟨g, .⟩⟩H∗ =
⟨f, g⟩∗ .
We can even get a unitary map between H and H∗ but such a map is
not unique. To this end note that every Hilbert space has a conjugation C
which generalizes taking the complex conjugate of every coordinate. In fact,
choosing an orthonormal basis (and different choices will produce different
maps in general) we can set

    Cf := Σ_{j∈J} ⟨u_j, f⟩^* u_j = Σ_{j∈J} ⟨f, u_j⟩ u_j.

Then C is conjugate linear, isometric ∥Cf ∥ = ∥f ∥, and idempotent C 2 = I.


Note also ⟨Cf, Cg⟩ = ⟨f, g⟩∗ . As promised, the map f → ⟨Cf, .⟩ is a unitary
map from H to H∗ .
Finally, we remark that projections cannot only be defined for subspaces
but also for closed convex sets (of course they will no longer be linear in this
case).
Theorem 2.11 (Hilbert projection theorem). Let H be a Hilbert space and
K a nonempty closed convex subset. Then for every f ∈ H \ K there is a
unique PK (f ) ∈ K such that ∥PK (f ) − f ∥ = inf g∈K ∥f − g∥. If we extend
PK : H → K by setting PK (g) = g for g ∈ K then PK will be Lipschitz
continuous: ∥PK (f ) − PK (g)∥ ≤ ∥f − g∥, f, g ∈ H.

Proof. Fix f ∈ H \ K and choose a sequence fn ∈ K with ∥fn − f ∥ → d :=


inf g∈K ∥f − g∥. Then applying the parallelogram law to the vectors fn − f
and fm − f we obtain
∥fn − fm ∥2 = 2(∥f − fn ∥2 + ∥f − fm ∥2 ) − 4∥f − 12 (fn + fm )∥2
≤ 2(∥f − fn ∥2 + ∥f − fm ∥2 ) − 4d2 ,
which shows that fn is Cauchy and hence converges to some point in K which
we call P(f). By construction ∥P(f) − f∥ = d. If there were another
point P̃(f) with the same property, we could apply the parallelogram law
to P (f ) − f and P̃ (f ) − f giving ∥P (f ) − P̃ (f )∥2 ≤ 0 and hence P (f ) is
uniquely defined.
Next, let f ∈ H, g ∈ K and consider g̃ = (1 − t)P (f ) + t g ∈ K, t ∈ [0, 1].
Then
0 ≥ ∥f − P (f )∥2 − ∥f − g̃∥2 = 2tRe(⟨f − P (f ), g − P (f )⟩) − t2 ∥g − P (f )∥2

for arbitrary t ∈ [0, 1] shows Re(⟨f − P (f ), P (f ) − g⟩) ≥ 0. Consequently


we have Re(⟨f − P(f), P(f) − P(g)⟩) ≥ 0 for all f, g ∈ H. Now reverse
the roles of f, g and add the two inequalities to obtain ∥P(f) − P(g)∥² ≤
Re⟨f − g, P(f) − P(g)⟩ ≤ ∥f − g∥∥P(f) − P(g)∥. Hence Lipschitz continuity
follows. □

If K is a closed subspace then this projection will of course coincide with


the orthogonal projection defined before. By inspection of the proof, note
that PK (f ) is alternatively characterized by Re(⟨f − PK (f ), g − PK (f )⟩) ≤ 0
for all g ∈ K.
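For a concrete closed convex set this characterization is easy to test. In the real Hilbert space R^n with K = [0, 1]^n, the minimization of the distance can be done componentwise, so P_K is the componentwise clamp (our own example and code, not from the text):

```python
def clamp(f):
    # projection onto K = [0,1]^n: clamp each coordinate into [0, 1]
    return [min(1.0, max(0.0, x)) for x in f]

def inner(a, b):
    return sum(x * y for x, y in zip(a, b))

def dist(a, b):
    return inner([x - y for x, y in zip(a, b)],
                 [x - y for x, y in zip(a, b)]) ** 0.5

f = [2.0, -0.5, 0.3]
g = [0.5, 0.5, 0.5]                  # some point of K
# the variational inequality <f - P_K(f), g - P_K(f)> <= 0:
assert inner([x - y for x, y in zip(f, clamp(f))],
             [x - y for x, y in zip(g, clamp(f))]) <= 0
# Lipschitz continuity from Theorem 2.11:
h = [-1.0, 3.0, 0.5]
assert dist(clamp(f), clamp(h)) <= dist(f, h)
```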

Problem 2.5. Let M1 , M2 be two subspaces of a Hilbert space H. Show that


(M1 + M2 )⊥ = M1⊥ ∩ M2⊥ . If in addition M1 and M2 are closed, show that
(M1 ∩ M2 )⊥ = M1⊥ + M2⊥ .
Problem 2.6. Show that ℓ(a) := Σ_{j=1}^∞ (a_j + a_{j+2})/2^j defines a bounded linear
functional on X := ℓ²(N). Compute its norm.

Problem 2.7. Suppose U : H → H is unitary and M ⊆ H. Show that


U M ⊥ = (U M )⊥ .

Problem 2.8. Show that an orthogonal projection PM ̸= 0 has norm one.

Problem* 2.9. Suppose P ∈ L (H) satisfies


P2 = P and ⟨P f, g⟩ = ⟨f, P g⟩
and set M := Ran(P ). Show
• P f = f for f ∈ M and M is closed,
• Ker(P ) = M ⊥
and conclude P = PM .

Problem 2.10. Compute PK for the closed unit ball K := B̄1 (0).

2.3. Operators defined via forms


One of the key results about linear maps is that they are uniquely deter-
mined once we know the images of some basis vectors. In fact, the matrix
elements with respect to some basis uniquely determine a linear map. Clearly
this raises the question how this result extends to the infinite dimensional
setting. As a first result we show that the Riesz lemma, Theorem 2.10, im-
plies that a bounded operator A is uniquely determined by its associated
sesquilinear form ⟨g, Af ⟩. In fact, there is a one-to-one correspondence be-
tween bounded operators and bounded sesquilinear forms:

Lemma 2.12. Let H1 , H2 be Hilbert spaces. Suppose s : H2 × H1 → C is a


bounded sesquilinear form; that is,
|s(g, f )| ≤ C∥g∥H2 ∥f ∥H1 . (2.25)
Then there is a unique bounded operator A ∈ L (H1 , H2 ) such that
s(g, f ) = ⟨g, Af ⟩H2 . (2.26)
Moreover, the norm of A is given by

    ∥A∥ = sup_{∥g∥_{H_2} = ∥f∥_{H_1} = 1} |⟨g, Af⟩_{H_2}| ≤ C.   (2.27)

Proof. For every f ∈ H1 we have an associated bounded linear functional


ℓf (g) := s(g, f )∗ on H2 . By Theorem 2.10 there is a corresponding h ∈ H2
(depending on f ) such that ℓf (g) = ⟨h, g⟩H2 , that is s(g, f ) = ⟨g, h⟩H2 and
we can define A via Af := h. It is not hard to check that A is linear and
from
∥Af ∥2H2 = ⟨Af, Af ⟩H2 = s(Af, f ) ≤ C∥Af ∥H2 ∥f ∥H1
we infer ∥Af ∥H2 ≤ C∥f ∥H1 , which shows that A is bounded with ∥A∥ ≤ C.
Equation (2.27) is left as an exercise (Problem 2.13). □

Note that if {u_k}_{k∈K} ⊆ H_1 and {v_j}_{j∈J} ⊆ H_2 are some orthonormal bases,
then the matrix elements Aj,k := ⟨vj , Auk ⟩H2 for all (j, k) ∈ J × K uniquely
determine ⟨g, Af ⟩H2 for arbitrary f ∈ H1 , g ∈ H2 (just expand f, g with
respect to these bases) and thus A by our theorem.
Example 2.5. Consider ℓ²(N) and let A ∈ L(ℓ²(N)) be some bounded
operator. Let A_{jk} := ⟨δ^j, Aδ^k⟩ be its matrix elements such that

    (Aa)_j = Σ_{k=1}^∞ A_{jk} a_k.

Since A_{jk} are the expansion coefficients of A^*δ^j (see (2.28) below), we have
Σ_{k=1}^∞ |A_{jk}|² = ∥A^*δ^j∥² and the sum is even absolutely convergent. ⋄
Moreover, for A ∈ L (H) the polarization identity (Problem 1.27) implies
that A is already uniquely determined by its quadratic form qA (f ) := ⟨f, Af ⟩.
As a first application we introduce the adjoint operator via Lemma 2.12
as the operator associated with the sesquilinear form s(f, g) := ⟨Af, g⟩H2 .
Theorem 2.13. Let H1 , H2 be Hilbert spaces. For every bounded operator
A ∈ L (H1 , H2 ) there is a unique bounded operator A∗ ∈ L (H2 , H1 ) defined
via
⟨f, A∗ g⟩H1 = ⟨Af, g⟩H2 . (2.28)

A bounded operator A ∈ L (H) satisfying A∗ = A is called self-adjoint.


Note that qA∗ (f ) = ⟨Af, f ⟩ = qA (f )∗ and hence

Lemma 2.14. Let H be a complex Hilbert space. A bounded operator is


self-adjoint if and only if its quadratic form is real-valued.

Warning: This result fails in a real Hilbert space.


Example 2.6. If H := Cⁿ and A := (a_{jk})_{1≤j,k≤n}, then A^* = (a_{kj}^*)_{1≤j,k≤n}.
Clearly A is self-adjoint if and only if a_{jk} = a_{kj}^*. ⋄
Example 2.7. If I ∈ L (H) is the identity, then I∗ = I. ⋄
Example 2.8. Consider the linear functional ℓ : H → C, f ↦ ⟨g, f⟩. Then
by the definition ⟨f, ℓ^*α⟩ = ℓ(f)^*α = ⟨f, αg⟩ we obtain ℓ^* : C → H, α ↦
αg. ⋄
Example 2.9. Let H := ℓ²(N), a ∈ ℓ^∞(N) and consider the multiplication
operator

    (Ab)_j := a_j b_j.

Then

    ⟨Ab, c⟩ = Σ_{j=1}^∞ (a_j b_j)^* c_j = Σ_{j=1}^∞ b_j^* (a_j^* c_j) = ⟨b, A^*c⟩

with (A^*c)_j = a_j^* c_j, that is, A^* is the multiplication operator with a^*. In
particular, A is self-adjoint if and only if a is real-valued. ⋄
Example 2.10. Let H := ℓ²(N) and consider the shift operators defined via

    (S^± a)_j := a_{j±1}

with the convention that a_0 = 0. That is, S^− shifts a sequence to the right
and fills up the left most place by zero and S^+ shifts a sequence to the left
dropping the left most place:

    S^−(a_1, a_2, a_3, · · · ) = (0, a_1, a_2, · · · ),   S^+(a_1, a_2, a_3, · · · ) = (a_2, a_3, a_4, · · · ).

Then

    ⟨S^− a, b⟩ = Σ_{j=2}^∞ a_{j−1}^* b_j = Σ_{j=1}^∞ a_j^* b_{j+1} = ⟨a, S^+ b⟩,

which shows (S^−)^* = S^+. Using symmetry of the scalar product we
also get ⟨b, S^− a⟩ = ⟨S^+ b, a⟩, that is, (S^+)^* = S^−.

Note that S^+ is a left inverse of S^−, S^+S^− = I, but not a right inverse
as S^−S^+ ≠ I. This is different from the finite dimensional case, where a left
inverse is also a right inverse and vice versa. ⋄
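The adjoint relation (S^−)^* = S^+ survives truncation to finitely many coordinates, which makes it easy to test. A finite-dimensional sketch (our own code; truncation only disturbs the last coordinate):

```python
def S_minus(a):       # shift right, fill the first place with zero
    return [0] + a[:-1]

def S_plus(a):        # shift left, drop the first place
    return a[1:] + [0]

def inner(a, b):
    return sum(x.conjugate() * y for x, y in zip(a, b))

a = [1 + 2j, 3j, 4 + 0j, 5 + 0j]
b = [2 + 0j, 1j, 1 + 1j, 0j]

# <S^- a, b> = <a, S^+ b> holds exactly in the truncated model:
assert inner(S_minus(a), b) == inner(a, S_plus(b))
# S^+ is a left inverse of S^- (up to the truncated last coordinate) ...
assert S_plus(S_minus(a)) == a[:-1] + [0]
# ... but S^- S^+ loses the first coordinate, so it is not a right inverse:
assert S_minus(S_plus(a)) != a
```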
Example 2.11. Suppose U ∈ L (H1 , H2 ) is unitary. Then U ∗ = U −1 . This
follows from Lemma 2.12 since ⟨f, g⟩H1 = ⟨U f, U g⟩H2 = ⟨f, U ∗ U g⟩H1 implies
U ∗ U = IH1 . Since U is bijective we can multiply this last equation from the
right with U −1 to obtain the claim. Of course this calculation shows that
the converse is also true, that is U ∈ L (H1 , H2 ) is unitary if and only if
U ∗ = U −1 . ⋄

A few simple properties of taking adjoints are listed below.


Lemma 2.15. Let A, B ∈ L (H1 , H2 ), C ∈ L (H2 , H3 ), and α ∈ C. Then
(i) (A + B)∗ = A∗ + B ∗ , (αA)∗ = α∗ A∗ ,
(ii) A∗∗ = A,
(iii) (CA)∗ = A∗ C ∗ ,
(iv) ∥A∗ ∥ = ∥A∥ and ∥A∥2 = ∥A∗ A∥ = ∥AA∗ ∥.

Proof. (i) is obvious. (ii) follows from ⟨g, A^{**}f⟩_{H_2} = ⟨A^*g, f⟩_{H_1} = ⟨g, Af⟩_{H_2}.
(iii) follows from ⟨g, (CA)f⟩_{H_3} = ⟨C^*g, Af⟩_{H_2} = ⟨A^*C^*g, f⟩_{H_1}. (iv) follows
using (2.27) from

    ∥A^*∥ = sup_{∥f∥_{H_1}=∥g∥_{H_2}=1} |⟨f, A^*g⟩_{H_1}| = sup_{∥f∥_{H_1}=∥g∥_{H_2}=1} |⟨Af, g⟩_{H_2}|
          = sup_{∥f∥_{H_1}=∥g∥_{H_2}=1} |⟨g, Af⟩_{H_2}| = ∥A∥

and

    ∥A^*A∥ = sup_{∥f∥_{H_1}=∥g∥_{H_1}=1} |⟨f, A^*Ag⟩_{H_1}| = sup_{∥f∥_{H_1}=∥g∥_{H_1}=1} |⟨Af, Ag⟩_{H_2}|
           = sup_{∥f∥_{H_1}=1} ∥Af∥² = ∥A∥²,

where we have used that |⟨Af, Ag⟩_{H_2}| attains its maximum when Af and Ag
are parallel (compare Theorem 1.5). Finally, ∥AA^*∥ = ∥A^{**}A^*∥ = ∥A^*∥² =
∥A∥². □

Note that ∥A∥ = ∥A∗ ∥ implies that taking adjoints is a continuous op-
eration. For later use also note that (Problem 2.15)
Ker(A∗ ) = Ran(A)⊥ . (2.29)

For the remainder of this section we restrict to the case of one Hilbert
space. A sesquilinear form s : H×H → C is called nonnegative if s(f, f ) ≥ 0
and it is called coercive if
Re(s(f, f )) ≥ ε∥f ∥2 , ε > 0. (2.30)
We will call A ∈ L (H) nonnegative, coercive if its associated sesquilinear
form is. We will write A ≥ 0 if A is nonnegative and A ≥ B if A − B ≥ 0.
Observe that nonnegative operators are self-adjoint (as their quadratic forms
are real-valued — here it is important that the underlying space is complex;
in case of a real space a nonnegative form is required to be symmetric).
Example 2.12. For any operator A the operators A∗ A and AA∗ are both
nonnegative. In fact ⟨f, A∗ Af ⟩ = ⟨Af, Af ⟩ = ∥Af ∥2 ≥ 0 and similarly
⟨f, AA∗ f ⟩ = ∥A∗ f ∥2 ≥ 0. ⋄

Lemma 2.16. Suppose A ∈ L(H) satisfies ∥Af∥ ≥ ε∥f∥ for some ε > 0.
Then Ran(A) is closed and A : H → Ran(A) is a bijection with bounded
inverse, ∥A^{−1}∥ ≤ 1/ε. If we have the stronger condition |⟨f, Af⟩| ≥ ε∥f∥²,
then Ran(A) = H.

Proof. Since Af = 0 implies f = 0 our operator is injective and thus for


every g ∈ Ran(A) there is a unique f = A−1 g. Moreover, by ∥A−1 g∥ =
∥f ∥ ≤ ε−1 ∥Af ∥ = ε−1 ∥g∥ the operator A−1 is bounded. So if gn ∈ Ran(A)
converges to some g ∈ H, then fn = A−1 gn converges to some f . Taking
limits in gn = Afn shows that g = Af is in the range of A, that is, the range
of A is closed.
By ε∥f ∥2 ≤ |⟨f, Af ⟩| ≤ ∥f ∥∥Af ∥ the second condition implies the first.
To show that Ran(A) = H we pick h ∈ Ran(A)⊥ . Then 0 = ⟨h, Ah⟩ ≥ ε∥h∥2
shows h = 0 and thus Ran(A)⊥ = {0}. □

As a consequence we obtain the famous Lax–Milgram theorem6 which


plays an important role in theory of elliptic partial differential equations.
Theorem 2.17 (Lax–Milgram). Let s : H × H → C be a sesquilinear form
on a Hilbert space H which is
• bounded, |s(f, g)| ≤ C∥f ∥ ∥g∥, and
• satisfies |s(f, f )| ≥ ε∥f ∥2 for some ε > 0.
Then for every g ∈ H there is a unique f ∈ H such that
s(h, f ) = ⟨h, g⟩, ∀h ∈ H. (2.31)
Moreover, ∥f∥ ≤ (1/ε)∥g∥.

Proof. Let A be the operator associated with s by Lemma 2.12. Then A is


a bijection by Lemma 2.16 and f = A−1 g has the required properties. □

Instead of the second condition one frequently requires that s is coercive,


which is clearly weaker.
Note that (2.31) can also be phrased as a minimizing problem if s is
nonnegative — Problem 2.17.
6Peter Lax (*1926), American mathematician of Hungarian origin
6Arthur Milgram (1912–1961), American mathematician

Example 2.13. Consider H = ℓ2 (N) and introduce the operator
(Aa)j := −aj+1 + 2aj − aj−1
which is a discrete version of a second derivative (discrete one-dimensional
Laplace operator). Here we use the convention a0 := 0, that is, (Aa)1 =
−a2 + 2a1 . In terms of the shift operators S ± we can write
A = −S + + 2 − S − = (S + − 1)(S − − 1)
and using (S ± )∗ = S ∓ we obtain
sA (a, b) = ⟨(S − − 1)a, (S − − 1)b⟩ = ∑_{j=1}^∞ (aj−1 − aj )∗ (bj−1 − bj ).

In particular, this shows A ≥ 0. Moreover, we have |sA (a, b)| ≤ 4∥a∥2 ∥b∥2
or equivalently ∥A∥ ≤ 4.
Next, let
(Qa)j = qj aj
for some sequence q ∈ ℓ∞ (N). Then
sQ (a, b) = ∑_{j=1}^∞ qj aj∗ bj

and |sQ (a, b)| ≤ ∥q∥∞ ∥a∥2 ∥b∥2 or equivalently ∥Q∥ ≤ ∥q∥∞ . If in addition
qj ≥ ε > 0, then sA+Q (a, b) = sA (a, b) + sQ (a, b) satisfies the assumptions of
the Lax–Milgram theorem and
(A + Q)a = b
has a unique solution a = (A + Q)−1 b for every given b ∈ ℓ2 (N). Moreover,
since (A + Q)−1 is bounded, this solution depends continuously on b. ⋄
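The conclusion of this example can be illustrated numerically on a finite section. The following Python sketch (not part of the text; the truncation size and right-hand side are arbitrary choices) solves (A + Q)a = b for a truncated discrete Laplacian plus a diagonal perturbation with qj ≥ ε = 1 and checks the Lax–Milgram bound ∥a∥ ≤ (1/ε)∥b∥:

```python
import numpy as np

# Finite-section check: truncate A and Q to the first n coordinates
# (n is an illustrative cutoff) and solve (A + Q)a = b. By the
# Lax-Milgram theorem the solution exists, is unique, and obeys
# ||a|| <= (1/eps)||b|| with eps = inf_j q_j = 1 here.
n = 200
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)   # discrete Laplacian, a_0 = 0
q = 1.0 + 1.0 / np.arange(1, n + 1)                    # q_j >= 1
b = np.zeros(n)
b[0] = 1.0                                             # right-hand side delta^1

a = np.linalg.solve(A + np.diag(q), b)
residual = np.linalg.norm((A + np.diag(q)) @ a - b)
bound_ok = np.linalg.norm(a) <= np.linalg.norm(b)      # ||a|| <= (1/eps)||b||
```

The residual confirms that the truncated system is solved exactly (up to rounding), and the norm bound matches the a priori estimate of Theorem 2.17.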
Problem* 2.11. Let H1 , H2 be Hilbert spaces and let u ∈ H1 , v ∈ H2 . Show
that the operator
Af := ⟨u, f ⟩v
is bounded and compute its norm. Compute the adjoint of A.
Problem 2.12. Show that under the assumptions of Problem 1.50 one has
f (A)∗ = f # (A∗ ) where f # (z) = f (z ∗ )∗ .
Problem* 2.13. Prove (2.27). (Hint: Use ∥f ∥ = sup∥g∥=1 |⟨g, f ⟩| — com-
pare Theorem 1.5.)
Problem 2.14. Suppose A ∈ L (H1 , H2 ) has a bounded inverse A−1 ∈
L (H2 , H1 ). Show (A−1 )∗ = (A∗ )−1 .
Problem* 2.15. Show (2.29).
Problem* 2.16. Show that every operator A ∈ L (H) can be written as the
linear combination of two self-adjoint operators Re(A) := (1/2)(A + A∗ ) and
Im(A) := (1/2i)(A − A∗ ). Moreover, every self-adjoint operator can be written
as a linear combination of two unitary operators. (Hint: For the last part
consider f± (z) = z ± i√(1 − z²) and Problems 1.50, 2.12.)

Problem 2.17 (Abstract Dirichlet problem). Show that the solution of


(2.31) is also the unique minimizer of
h ↦ Re( (1/2) s(h, h) − ⟨h, g⟩ )
if s is nonnegative with s(w, w) ≥ ε∥w∥2 for all w ∈ H.

2.4. Orthogonal sums and tensor products


Given two Hilbert spaces H1 and H2 , we define their orthogonal sum H1 ⊕
H2 to be the set of all pairs (f1 , f2 ) ∈ H1 × H2 together with the scalar
product
⟨(g1 , g2 ), (f1 , f2 )⟩ := ⟨g1 , f1 ⟩H1 + ⟨g2 , f2 ⟩H2 . (2.32)
It is left as an exercise to verify that H1 ⊕ H2 is again a Hilbert space.
Moreover, H1 can be identified with {(f1 , 0)|f1 ∈ H1 }, and we can regard H1
as a subspace of H1 ⊕ H2 , and similarly for H2 . With this convention we have
H1⊥ = H2 . It is also customary to write f1 ⊕ f2 instead of (f1 , f2 ). In the
same way we can define the orthogonal sum ⊕_{j=1}^n Hj of any finite number
of Hilbert spaces.
Example 2.14. For example we have ⊕_{j=1}^n C = Cn and hence we will write
⊕_{j=1}^n H =: Hn . ⋄
n ⋄
More generally, let Hj , j ∈ N, be a countable collection of Hilbert spaces
and define
⊕_{j=1}^∞ Hj := { ⊕_{j=1}^∞ fj | fj ∈ Hj , ∑_{j=1}^∞ ∥fj ∥2Hj < ∞}, (2.33)
which becomes a Hilbert space with the scalar product
⟨⊕_{j=1}^∞ gj , ⊕_{j=1}^∞ fj ⟩ := ∑_{j=1}^∞ ⟨gj , fj ⟩Hj . (2.34)
Example 2.15. ⊕_{j=1}^∞ C = ℓ2 (N). ⋄

Similarly, if H and H̃ are two Hilbert spaces, we define their tensor prod-
uct as follows: The elements should be products f ⊗ f˜ of elements f ∈ H
and f˜ ∈ H̃. Hence we start with the set of all finite linear combinations of
elements of H × H̃:
F(H, H̃) := { ∑_{j=1}^n αj (fj , f̃j ) | (fj , f̃j ) ∈ H × H̃, αj ∈ C}. (2.35)

Since we want (f1 + f2 ) ⊗ f˜ = f1 ⊗ f˜+ f2 ⊗ f˜, f ⊗ (f˜1 + f˜2 ) = f ⊗ f˜1 + f ⊗ f˜2 ,


and (αf ) ⊗ f˜ = f ⊗ (αf˜) = α(f ⊗ f˜) we consider F(H, H̃)/N (H, H̃), where
N (H, H̃) := span{ ∑_{j,k=1}^n αj βk (fj , f̃k ) − ( ∑_{j=1}^n αj fj , ∑_{k=1}^n βk f̃k )} (2.36)

and write f ⊗ f˜ for the equivalence class of (f, f˜). By construction, every
element in this quotient space is a linear combination of elements of the type
f ⊗ f˜.
Next, we want to define a scalar product such that
⟨f ⊗ f˜, g ⊗ g̃⟩ = ⟨f, g⟩H ⟨f˜, g̃⟩ H̃ (2.37)
holds. To this end we set
s( ∑_{j=1}^n αj (fj , f̃j ), ∑_{k=1}^n βk (gk , g̃k )) = ∑_{j,k=1}^n αj∗ βk ⟨fj , gk ⟩H ⟨f̃j , g̃k ⟩H̃ , (2.38)

which is a symmetric sesquilinear form on F(H, H̃). Moreover, one verifies


that s(f, g) = 0 for arbitrary f ∈ F(H, H̃) and g ∈ N (H, H̃) and thus
⟨ ∑_{j=1}^n αj fj ⊗ f̃j , ∑_{k=1}^n βk gk ⊗ g̃k ⟩ = ∑_{j,k=1}^n αj∗ βk ⟨fj , gk ⟩H ⟨f̃j , g̃k ⟩H̃ (2.39)

is a symmetric sesquilinear form on F(H, H̃)/N (H, H̃). To show that this is in
fact a scalar product, we need to ensure positivity. Let f = ∑_i αi fi ⊗ f̃i ̸= 0
and pick orthonormal bases uj , ũk for span{fi }, span{f̃i }, respectively. Then
f = ∑_{j,k} αjk uj ⊗ ũk , αjk = ∑_i αi ⟨uj , fi ⟩H ⟨ũk , f̃i ⟩H̃ (2.40)

and we compute
⟨f, f ⟩ = ∑_{j,k} |αjk |2 > 0. (2.41)

The completion of F(H, H̃)/N (H, H̃) with respect to the induced norm is
called the tensor product H ⊗ H̃ of H and H̃.
Lemma 2.18. If uj , ũk are orthonormal bases for H, H̃, respectively, then
uj ⊗ ũk is an orthonormal basis for H ⊗ H̃.

Proof. That uj ⊗ ũk is an orthonormal set is immediate from (2.37). More-


over, since span{uj }, span{ũk } are dense in H, H̃, respectively, it is easy to
see that uj ⊗ ũk is dense in F(H, H̃)/N (H, H̃). But the latter is dense in
H ⊗ H̃. □

Note that this in particular implies dim(H ⊗ H̃) = dim(H) dim(H̃).


Example 2.16. We have H ⊗ Cn = Hn . ⋄

Example 2.17. A quantum mechanical particle which can only attain two
possible states is called a qubit. Its state space is accordingly C2 and the
two states, usually written as |0⟩ and |1⟩, are an orthonormal basis for C2 .
The state space for two qubits is given by the tensor product C2 ⊗ C2 ∼ = C4 .
An orthonormal basis is given by |00⟩ := |0⟩ ⊗ |0⟩, |01⟩ := |0⟩ ⊗ |1⟩, |10⟩ :=
|1⟩ ⊗ |0⟩, and |11⟩ := |1⟩ ⊗ |1⟩. The state space of n qubits is the n-fold
tensor product of C2 (isomorphic to C^(2^n) ). ⋄
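Concretely, the tensor product of coordinate spaces can be modeled by the Kronecker product. The following sketch (the identification of |0⟩, |1⟩ with the standard basis of C2 and the use of `np.kron` are standard conventions, assumed here for illustration) verifies that |00⟩, |01⟩, |10⟩, |11⟩ form an orthonormal basis and that the scalar product factorizes as in (2.37):

```python
import numpy as np

# Two-qubit sketch: model |0>, |1> by the standard basis of C^2 and the
# tensor product by the Kronecker product.
ket0 = np.array([1.0, 0.0])
ket1 = np.array([0.0, 1.0])

# |00>, |01>, |10>, |11>
basis = [np.kron(a, b) for a in (ket0, ket1) for b in (ket0, ket1)]
G = np.array([[u @ v for v in basis] for u in basis])  # Gram matrix
orthonormal = np.allclose(G, np.eye(4))                # orthonormal basis of C^4

# The scalar product factorizes as in (2.37): <f⊗f~, g⊗g~> = <f,g><f~,g~>
f, ft = np.array([1.0, 2.0]), np.array([0.5, -1.0])
g, gt = np.array([3.0, 1.0]), np.array([2.0, 2.0])
lhs = np.kron(f, ft) @ np.kron(g, gt)
rhs = (f @ g) * (ft @ gt)
```

For complex vectors one would conjugate the first factor, matching the sesquilinear convention used throughout this chapter.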
Example 2.18. We have ℓ2 (N) ⊗ ℓ2 (N) = ℓ2 (N × N) by virtue of the identification (ajk ) ↦ ∑_{j,k} ajk δ j ⊗ δ k , where δ j is the standard basis for ℓ2 (N). In
fact, this follows from the previous lemma as in the proof of Theorem 2.6. ⋄
It is straightforward to extend the tensor product to any finite number
of Hilbert spaces. We even note
( ⊕_{j=1}^∞ Hj ) ⊗ H = ⊕_{j=1}^∞ (Hj ⊗ H), (2.42)

where equality has to be understood in the sense that both spaces are uni-
tarily equivalent by virtue of the identification

X ∞
X
( fj ) ⊗ f = fj ⊗ f. (2.43)
j=1 j=1

Problem 2.18. Show that f ⊗ f˜ = 0 if and only if f = 0 or f˜ = 0.


Problem 2.19. We have f ⊗ f˜ = g ⊗ g̃ ̸= 0 if and only if there is some
α ∈ C \ {0} such that f = αg and f˜ = α−1 g̃.
Problem* 2.20. Show (2.42).

2.5. Applications to Fourier series


We have already encountered the Fourier sine series during our treatment
of the heat equation in Section 1.1. Given an integrable function f we can
define its Fourier series
S(f )(x) := a0 /2 + ∑_{k∈N} ( ak cos(kx) + bk sin(kx) ), (2.44)
where the corresponding Fourier coefficients are given by
ak := (1/π) ∫_{−π}^{π} cos(kx)f (x)dx, bk := (1/π) ∫_{−π}^{π} sin(kx)f (x)dx. (2.45)
At this point (2.44) is just a formal expression and the question in what sense
the above series converges led to the development of harmonic analysis. For
example, does it converge at a given point (e.g. at every point of continuity
of f ) or when does it converge uniformly? We will give some first answers in

Figure 2.1. The Dirichlet kernels D1 , D2 , and D3 [plot not reproduced]

the present section and then come back later to this when we have further
tools at our disposal.
For our purpose the complex form
S(f )(x) = ∑_{k∈Z} f̂k eikx , f̂k := (1/2π) ∫_{−π}^{π} e−iky f (y)dy (2.46)

will be more convenient. The connection is given via f̂±k = (ak ∓ ibk )/2,
k ∈ N0 (with the convention b0 = 0). In this case the n’th partial sum can
be written as
Sn (f )(x) := ∑_{k=−n}^{n} f̂k eikx = (1/2π) ∫_{−π}^{π} Dn (x − y)f (y)dy, (2.47)

where
Dn (x) = ∑_{k=−n}^{n} eikx = sin((n + 1/2)x) / sin(x/2) (2.48)

is known as the Dirichlet kernel7 (to obtain the second form observe that
the left-hand side is a geometric series). Note that Dn (−x) = Dn (x) and
that |Dn (x)| has a global maximum Dn (0) = 2n + 1 at x = 0. Moreover, by
Sn (1) = 1 we see that ∫_{−π}^{π} Dn (x)dx = 2π.
Since
∫_{−π}^{π} e−ikx eilx dx = 2πδk,l (2.49)

7Peter Gustav Lejeune Dirichlet (1805–1859), German mathematician



the functions ek (x) := (2π)−1/2 eikx are orthonormal in L2 (−π, π) and hence
the Fourier series is just the expansion with respect to this orthonormal set.
Hence we obtain
Theorem 2.19. For every square integrable function f ∈ L2 (−π, π), the
Fourier coefficients fˆk are square summable
∑_{k∈Z} |f̂k |2 = (1/2π) ∫_{−π}^{π} |f (x)|2 dx (2.50)

and the Fourier series converges to f in the sense of L2 . Moreover, this is a


continuous bijection between L2 (−π, π) and ℓ2 (Z).

Proof. To show this theorem it suffices to show that the functions ek form
a basis. This will follow from Theorem 2.22 below (see the discussion after
this theorem). It will also follow as a special case of Theorem 3.11 below
(see the examples after this theorem) as well as from the Stone–Weierstraß
theorem — Problem 2.27. □
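Parseval's identity (2.50) is easy to test numerically. The following Python sketch (quadrature grid and truncation are illustrative choices, not part of the text) checks it for f(x) = x, where both sides equal π²/3:

```python
import numpy as np

# Numerical check of Parseval's identity (2.50) for f(x) = x on (-pi, pi),
# using a Riemann sum for the integrals and truncating the sum over k.
N, K = 4096, 500
x = -np.pi + 2 * np.pi * np.arange(N) / N
f = x
dx = 2 * np.pi / N

# Right-hand side: (1/2pi) * integral of |f|^2
rhs = np.sum(np.abs(f) ** 2) * dx / (2 * np.pi)

# Left-hand side: f is real and odd, so f_0 = 0 and |f_{-k}| = |f_k|.
k = np.arange(1, K + 1)
fk = (np.exp(-1j * np.outer(k, x)) @ f) * dx / (2 * np.pi)
lhs = 2 * np.sum(np.abs(fk) ** 2)
```

Up to truncation and quadrature error both sides agree with π²/3; the exact coefficients here are f̂k = i(−1)^k/k, which is the content of Problem 2.22 below.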

This gives a satisfactory answer in the Hilbert space L2 (−π, π) but does
not answer the question about pointwise or uniform convergence. The latter
will be the case if the Fourier coefficients are summable. First of all we note
that for integrable functions the Fourier coefficients will at least tend to zero.
Lemma 2.20 (Riemann–Lebesgue lemma). Suppose f ∈ L1 (−π, π), then
the Fourier coefficients fˆk converge to zero as |k| → ∞.

Proof. By our previous theorem this holds for continuous functions. But the
map f → fˆ is bounded from C[−π, π] ⊂ L1 (−π, π) to c0 (Z) (the sequences
vanishing as |k| → ∞) since |fˆk | ≤ (2π)−1 ∥f ∥1 and there is a unique exten-
sion to all of L1 (−π, π). □

It turns out that this result is best possible in general and we cannot say
more about the decay without additional assumptions on f . For example, if
f is periodic of period 2π and continuously differentiable, then integration
by parts shows
f̂k = (1/(2πik)) ∫_{−π}^{π} e−ikx f ′ (x)dx. (2.51)
Then, since both k −1 and the Fourier coefficients of f ′ are square summa-
ble, we conclude that fˆ is absolutely summable and hence the Fourier series
converges uniformly. So we have a simple sufficient criterion for summa-
bility of the Fourier coefficients, but can we do better? Of course conti-
nuity of f is a necessary condition for absolute summability but this alone
will not even be enough for pointwise convergence as we will see in Exam-
ple 4.3. Moreover, continuity will not tell us more about the decay of the

Fourier coefficients than what we already know in the integrable case from
the Riemann–Lebesgue lemma (see Example 4.4).
A few improvements are easy: (2.51) holds for any class of functions
for which integration by parts holds, e.g., piecewise continuously differen-
tiable functions or, slightly more general, absolutely continuous functions
(cf. Lemma 4.30 from [37]) provided one assumes that the derivative is
square integrable. However, for an arbitrary absolutely continuous func-
tion the Fourier coefficients might not be absolutely summable: For an
absolutely continuous function f we have a derivative which is integrable
(Theorem 4.29 from [37]) and hence the above formula combined with the
Riemann–Lebesgue lemma implies f̂k = o(1/k). But on the other hand we
can choose an absolutely summable sequence ck which does not obey this
asymptotic requirement, say ck = 1/k for k = l², l ∈ N, and ck = 0 else. Then
f (x) := ∑_{k∈Z} ck eikx = ∑_{l∈N} (1/l²) e^{il²x} (2.52)

is a function with absolutely summable Fourier coefficients fˆk = ck (by


uniform convergence we can interchange summation and integration) but
which is not absolutely continuous. There are further criteria for absolute
summability of the Fourier coefficients, but no simple necessary and sufficient
one. A particularly simple sufficient one is:
Theorem 2.21 (Bernstein8). Suppose that f ∈ C^{0,γ}_{per} [−π, π] is Hölder
continuous (cf. (1.69)) of exponent γ > 1/2. Then
∑_{k∈Z} |f̂k | ≤ Cγ ∥f ∥0,γ .

Proof. The proof starts with the observation that the Fourier coefficients of
fδ (x) := f (x − δ) are (f̂δ )k = e−ikδ f̂k . Now for δ := (2π/3) 2^{−m} and
2^m ≤ |k| < 2^{m+1} we have |e^{ikδ} − 1|² ≥ 3, implying
∑_{2^m ≤|k|<2^{m+1}} |f̂k |² ≤ (1/3) ∑_k |e^{ikδ} − 1|² |f̂k |² = (1/6π) ∫_{−π}^{π} |fδ (x) − f (x)|² dx
≤ (1/3) [f ]γ² δ^{2γ} .
Now the sum on the left has 2 · 2^m terms and hence Cauchy–Schwarz implies
∑_{2^m ≤|k|<2^{m+1}} |f̂k | ≤ (2^{(m+1)/2}/√3) [f ]γ δ^γ = √(2/3) (2π/3)^γ 2^{(1/2−γ)m} [f ]γ .

8Sergei Natanovich Bernstein (1880–1968), Russian mathematician



Figure 2.2. The Fejér kernels F1 , F2 , and F3 [plot not reproduced]

Summing over m shows
∑_{k̸=0} |f̂k | ≤ Cγ [f ]γ
provided γ > 1/2 and establishes the claim since |f̂0 | ≤ ∥f ∥∞ . □

Note, however, that the situation looks much brighter if one looks at mean
values
S̄n (f )(x) := (1/n) ∑_{k=0}^{n−1} Sk (f )(x) = (1/2π) ∫_{−π}^{π} Fn (x − y)f (y)dy, (2.53)
where
Fn (x) = (1/n) ∑_{k=0}^{n−1} Dk (x) = (1/n) ( sin(nx/2) / sin(x/2) )² (2.54)
is the Fejér kernel.9 To see the second form we use the closed form for the
Dirichlet kernel to obtain
nFn (x) = ∑_{k=0}^{n−1} sin((k + 1/2)x) / sin(x/2) = (1/sin(x/2)) Im ∑_{k=0}^{n−1} e^{i(k+1/2)x}
= (1/sin(x/2)) Im( e^{ix/2} (e^{inx} − 1)/(e^{ix} − 1) ) = (1 − cos(nx))/(2 sin(x/2)²)
= ( sin(nx/2) / sin(x/2) )² .
The main difference to the Dirichlet kernel is positivity: Fn (x) ≥ 0. Of
course the property ∫_{−π}^{π} Fn (x)dx = 2π is inherited from the Dirichlet kernel.

9Lipót Fejér (1880–1959), Hungarian mathematician



Theorem 2.22 (Fejér). Suppose f is continuous and periodic with period


2π. Then S̄n (f ) → f uniformly.

Proof. Let us set Fn = 0 outside [−π, π]. Then Fn (x) ≤ 1/(n sin(δ/2)²) for
δ ≤ |x| ≤ π implies that a straightforward adaptation of Lemma 1.2 to the
periodic case is applicable. □

In particular, this shows that the functions {ek }k∈Z are total in Cper [−π, π]
(continuous periodic functions) and hence also in Lp (−π, π) for 1 ≤ p < ∞
(Problem 2.26).
Note that for a given continuous function f this result shows that if
Sn (f )(x) converges, then it must converge to limn→∞ S̄n (f )(x) = f (x). We also
remark that one can extend this result (see Lemma 3.21 from [37]) to show
that for f ∈ Lp (−π, π), 1 ≤ p < ∞, one has S̄n (f ) → f in the sense of Lp .
As a consequence note that the Fourier coefficients uniquely determine f for
integrable f (for square integrable f this follows from Theorem 2.19).
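Fejér's theorem is also pleasant to observe numerically. The sketch below (grid size and the values of n are illustrative choices, not part of the text) computes the Fejér means via S̄n(f) = ∑_{|k|<n} (1 − |k|/n) f̂k e^{ikx} for the continuous periodic function f(x) = |x| and checks that the uniform error decreases:

```python
import numpy as np

# Fejer means via their Fourier representation
#   S̄_n(f)(x) = sum_{|k|<n} (1 - |k|/n) f̂_k e^{ikx},
# tested on f(x) = |x|, which is continuous and 2pi-periodic.
N = 2048
x = -np.pi + 2 * np.pi * np.arange(N) / N
f = np.abs(x)
dx = 2 * np.pi / N

def fejer_mean(n):
    ks = np.arange(-(n - 1), n)
    # Fourier coefficients by a Riemann sum
    fk = np.array([(np.exp(-1j * k * x) @ f) * dx / (2 * np.pi) for k in ks])
    return np.real((1 - np.abs(ks) / n) * fk @ np.exp(1j * np.outer(ks, x)))

# Uniform error decreases with n, in line with Fejer's theorem:
err = [np.max(np.abs(fejer_mean(n) - f)) for n in (4, 16, 64)]
```

In contrast to the partial sums Sn(f), the means are weighted by the positive Fejér kernel, which is what makes the uniform convergence work for merely continuous f.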
Finally, we look at pointwise convergence.
Theorem 2.23. Suppose
(f (x) − f (x0 ))/(x − x0 ) (2.55)
is integrable (e.g. f is Hölder continuous), then
lim_{m,n→∞} ∑_{k=−m}^{n} f̂k eikx0 = f (x0 ). (2.56)

Proof. Without loss of generality we can assume x0 = 0 (by shifting x →
x − x0 modulo 2π implying f̂k → e−ikx0 f̂k ) and f (x0 ) = 0 (by linearity since
the claim is trivial for constant functions). Then by assumption
g(x) := f (x)/(eix − 1)
is integrable and f (x) = (eix − 1)g(x) implies f̂k = ĝk−1 − ĝk and hence
∑_{k=−m}^{n} f̂k = ĝ−m−1 − ĝn .
Now the claim follows from the Riemann–Lebesgue lemma. □

If we look at symmetric partial sums Sn (f ) we can do even better.


Corollary 2.24 (Dirichlet–Dini10 criterion). Suppose there is some α such
that
(f (x0 + x) + f (x0 − x) − 2α)/x (2.57)
10Ulisse Dini (1845–1918), Italian mathematician and politician

is integrable. Then Sn (f )(x0 ) → α.

Proof. Without loss of generality we can assume x0 = 0. Now observe (since
Dn (−x) = Dn (x)) that Sn (f )(0) = α + Sn (g)(0), where g(x) := (1/2)(f (x) +
f (−x)) − α, and apply the previous result. □
Problem 2.21. Compute the Fourier series of Dn and Fn .
Problem 2.22. Compute the Fourier series of f (x) := x and use this to
show (Basel problem)
∑_{n=1}^∞ 1/n² = π²/6.
(Hint: Parseval’s relation.)
Problem 2.23. Compute the Fourier series of f (x) := x2 and use this to
solve again the Basel problem (see the previous problem). (Hint: Evaluate
the series at x = π.)
Problem 2.24. Show |Dn (x)| ≤ min(2n + 1, π/|x|) and Fn (x) ≤ min(n, π²/(nx²)).

Problem 2.25. Show that if f ∈ C^{0,γ}_{per} [−π, π] is Hölder continuous (cf.
(1.69)), then
|f̂k | ≤ ([f ]γ /2) (π/|k|)^γ , k ̸= 0.
(Hint: What changes if you replace e−iky by e−ik(y+π/k) in (2.46)? Now make
a change of variables y → y − π/k in the integral.)
Problem 2.26. Show that Cper [−π, π] is dense in Lp (−π, π) for 1 ≤ p < ∞.
Problem 2.27. Show that the functions ek (x) := (2π)−1/2 eikx , k ∈ Z, form an
orthonormal basis for H = L2 (−π, π). (Hint: Start with K = [−π, π] where
−π and π are identified and use the Stone–Weierstraß theorem.)
Chapter 3

Compact operators

Typically, linear operators are much more difficult to analyze than matrices
and many new phenomena appear which are not present in the finite dimen-
sional case. So we have to be modest and slowly work our way up. A class
of operators which still preserves some of the nice properties of matrices is
the class of compact operators to be discussed in this chapter.

3.1. Compact operators


A linear operator A : X → Y defined between normed spaces X, Y is called
compact if every sequence Afn has a convergent subsequence whenever fn is
bounded. Equivalently (cf. Corollary B.20), A is compact if it maps bounded
sets to relatively compact ones. The set of all compact operators is denoted
by K (X, Y ). If X = Y we will just write K (X) := K (X, X) as usual.
Example 3.1. Every linear map between finite dimensional spaces is com-
pact by the Bolzano–Weierstraß theorem. Slightly more general, a bounded
operator is compact if its range is finite dimensional. ⋄
The following elementary properties of compact operators are left as an
exercise (Problem 3.1):

Theorem 3.1. Let X, Y , and Z be normed spaces. Every compact linear


operator is bounded, K (X, Y ) ⊆ L (X, Y ). Linear combinations of compact
operators are compact, that is, K (X, Y ) is a subspace of L (X, Y ). More-
over, the product of a bounded and a compact operator is again compact, that
is, A ∈ L (X, Y ), B ∈ K (Y, Z) or A ∈ K (X, Y ), B ∈ L (Y, Z) implies
BA ∈ K (X, Z).


In particular, the set of compact operators K (X) is an ideal within the


set of bounded operators. Moreover, if X is a Banach space this ideal is even
closed:
Theorem 3.2. Suppose X is a normed and Y a Banach space. Let An ∈
K (X, Y ) be a convergent sequence of compact operators. Then the limit A
is again compact.

Proof. Let fj0 be a bounded sequence. Choose a subsequence fj1 such that
A1 fj1 converges. From fj1 choose another subsequence fj2 such that A2 fj2
converges and so on. Since there might be nothing left from fjn as n → ∞, we
consider the diagonal sequence fj := fjj . By construction, fj is a subsequence
of fjn for j ≥ n and hence An fj is Cauchy for every fixed n. Now
∥Afj − Afk ∥ = ∥(A − An )(fj − fk ) + An (fj − fk )∥
≤ ∥A − An ∥∥fj − fk ∥ + ∥An fj − An fk ∥
shows that Afj is Cauchy since the first term can be made arbitrarily small
by choosing n large and the second by the Cauchy property of An fj . □
Example 3.2. Let X := ℓp (N) and consider the operator
(Qa)j := qj aj
for some sequence q = (qj )_{j=1}^∞ ∈ c0 (N) converging to zero. Let Qn be
associated with qj^n := qj for j ≤ n and qj^n := 0 for j > n. Then the
range of Qn is finite dimensional and hence Qn is compact. Moreover, by


∥Qn − Q∥ = supj>n |qj | we see Qn → Q and thus Q is also compact by the
previous theorem. ⋄
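A finite-section version of this example is easy to check numerically; the sketch below (cutoff m and the choice qj = 1/j are illustrative) verifies that ∥Q − Qn∥ = sup_{j>n} |qj| for diagonal matrices:

```python
import numpy as np

# Example 3.2 in finite section: q_j = 1/j, truncated to an m x m diagonal
# matrix. The finite-rank operators Q_n converge to Q in operator norm,
# with ||Q - Q_n|| = sup_{j>n} |q_j| = 1/(n+1).
m = 300
j = np.arange(1, m + 1)
q = 1.0 / j
Q = np.diag(q)

errors, expected = [], []
for n in (5, 20, 80):
    Qn = np.diag(np.where(j <= n, q, 0.0))
    errors.append(np.linalg.norm(Q - Qn, 2))   # spectral norm
    expected.append(1.0 / (n + 1))             # sup of the remaining entries
```

That the operator norm of a diagonal operator is the supremum of its entries is exactly what makes the estimate ∥Qn − Q∥ = sup_{j>n} |qj| in the example work.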
Example 3.3. Let X := C 1 [0, 1], Y := C[0, 1] (cf. Problem 1.46) then the
embedding X ,→ Y is compact. Indeed, a bounded sequence in X has
both the functions and the derivatives uniformly bounded. Hence by the
mean value theorem the functions are equicontinuous and hence there is
a uniformly convergent subsequence by the Arzelà–Ascoli theorem (Theo-
rem 1.13). Of course the same conclusion holds if we take X := C 0,γ [0, 1] to
be Hölder continuous functions (cf. Theorem 1.23). ⋄
If A : X → Y is a bounded operator there is a unique extension A : X →
Y to the completion by Theorem 1.16. Moreover, if A ∈ K (X, Y ), then
A ∈ K (X, Y ) is immediate. That we also have A ∈ K (X, Y ) will follow
from the next lemma. In particular, it suffices to verify compactness on a
dense set.
Lemma 3.3. Let X, Y be normed spaces and A ∈ K (X, Y ). Let X, Y be
the completion of X, Y , respectively. Then A ∈ K (X, Y ), where A is the
unique extension of A (cf. Theorem 1.16).

Proof. Let fn ∈ X be a given bounded sequence. We need to show that Afn


has a convergent subsequence. Pick gn ∈ X such that ∥gn − fn ∥ ≤ 1/n and
by compactness of A we can assume that Agn → g. But then ∥Afn − g∥ ≤
∥A∥∥fn − gn ∥ + ∥Agn − g∥ shows that Afn → g. □

One of the most important examples of compact operators are integral


operators. The proof will be based on the Arzelà–Ascoli theorem (Theo-
rem 1.13).
Lemma 3.4. Let X := C([a, b]) or X := L2cont (a, b). The integral operator
K : X → X defined by
Z b
(Kf )(x) := K(x, y)f (y)dy, (3.1)
a
where K(x, y) ∈ C([a, b] × [a, b]), is compact.

Proof. First of all note that K(., ..) is continuous on [a, b] × [a, b] and hence
uniformly continuous. In particular, for every ε > 0 we can find a δ > 0 such
that |K(y, t) − K(x, t)| ≤ ε for any t ∈ [a, b] whenever |y − x| ≤ δ. Moreover,
∥K∥∞ = supx,y∈[a,b] |K(x, y)| < ∞.
We begin with the case X := L2cont (a, b). Let g := Kf . Then
|g(x)| ≤ ∫_a^b |K(x, t)| |f (t)|dt ≤ ∥K∥∞ ∫_a^b |f (t)|dt ≤ ∥K∥∞ ∥1∥ ∥f ∥,
where we have used Cauchy–Schwarz in the last step (note that ∥1∥ = √(b − a)).
Similarly,
|g(x) − g(y)| ≤ ∫_a^b |K(y, t) − K(x, t)| |f (t)|dt ≤ ε ∫_a^b |f (t)|dt ≤ ε∥1∥ ∥f ∥,

whenever |y − x| ≤ δ. Hence, if fn (x) is a bounded sequence in L2cont (a, b),


then gn := Kfn is bounded and equicontinuous and hence has a uniformly
convergent subsequence by the Arzelà–Ascoli theorem (Theorem 1.13). But
a uniformly convergent sequence is also convergent in the norm induced by
the scalar product. Therefore K is compact.
The case X := C([a, b]) follows by the same argument upon observing
∫_a^b |f (t)|dt ≤ (b − a)∥f ∥∞ . □
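Compactness of integral operators can be made visible by discretization. The following sketch uses a Nyström-type midpoint-rule matrix for the (illustrative) continuous kernel K(x, y) = e^{−|x−y|} on [0, 1]; the rapidly decaying singular values reflect that K is well approximated in norm by finite-rank operators:

```python
import numpy as np

# Nystrom-type discretization of the integral operator (3.1) with the
# continuous kernel K(x,y) = exp(-|x-y|) on [0,1], midpoint rule with
# n nodes of weight 1/n each.
n = 400
x = (np.arange(n) + 0.5) / n
Kh = np.exp(-np.abs(x[:, None] - x[None, :])) / n

# Singular values, sorted in decreasing order by np.linalg.svd:
s = np.linalg.svd(Kh, compute_uv=False)
```

Truncating the singular value decomposition after a few dozen terms already reproduces the operator to high accuracy, a finite-dimensional shadow of Theorem 3.2.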

Compact operators share many similarities with (finite) matrices as we


will see in the next section.
Problem* 3.1. Show Theorem 3.1.

Problem 3.2. Is the left shift (a1 , a2 , a3 , . . . ) 7→ (a2 , a3 , . . . ) compact in


ℓ2 (N)?
Problem 3.3 (Lions’ lemma).1 Let X, Y , and Z be Banach spaces. Assume
X is compactly embedded into Y and Y is continuously embedded into Z.
Show that for every ε > 0 there exists some C(ε) such that
∥x∥Y ≤ ε∥x∥X + C(ε)∥x∥Z .
Problem 3.4. Is the operator d/dx : C k [0, 1] → C[0, 1] compact for k = 1, 2?
(Hint: Problem 1.38 and Example 3.3.)
Problem 3.5. Is the multiplication operator Mt : C k [0, 1] → C[0, 1] with
Mt f (t) = tf (t) compact for k = 0, 1? (Hint: Problem 1.38 and Example 3.3.)
Problem 3.6. Let X := C([a, b]) or X := L2cont (a, b). Show that the
Volterra integral operator2 K : X → X defined by
(Kf )(x) := ∫_a^x K(x, y)f (y)dy,

where K(x, y) ∈ C([a, b] × [a, b]), is compact.


Problem* 3.7. Show that the adjoint of the integral operator K on L2cont (a, b)
from Lemma 3.4 is the integral operator with kernel K(y, x)∗ :
(K ∗ f )(x) = ∫_a^b K(y, x)∗ f (y)dy.

(Hint: Fubini.)

3.2. The spectral theorem for compact symmetric operators


Let H be an inner product space. A linear operator A : D(A) ⊆ H → H is
called symmetric if its domain is dense and if
⟨g, Af ⟩ = ⟨Ag, f ⟩, f, g ∈ D(A). (3.2)
If A is bounded (with D(A) = H), then A is symmetric precisely if A = A∗ ,
that is, if A is self-adjoint. However, for unbounded operators there is a
subtle but important difference between symmetry and self-adjointness (see
also Example 3.7 below).
A number z ∈ C is called eigenvalue of A if there is a nonzero vector
u ∈ D(A) such that
Au = zu. (3.3)

1Jacques-Louis Lions (1928–2001), French mathematician


2Vito Volterra (1860–1940), Italian mathematician

The vector u is called a corresponding eigenvector in this case. The set of


all eigenvectors corresponding to z is called the eigenspace

Ker(A − z) (3.4)

corresponding to z. Here we have used the shorthand notation A − z for


A − zI. An eigenvalue is called (geometrically) simple if there is only one
linearly independent eigenvector.
Example 3.4. Let H := ℓ2 (N) and consider the shift operators (S ± a)j :=
aj±1 (with a0 := 0). Suppose z ∈ C is an eigenvalue, then the corresponding
eigenvector u must satisfy uj±1 = zuj . For S − the special case j = 1 gives
0 = u0 = zu1 . So either z = 0 and u = 0 or z ̸= 0 and again u = 0. Hence
there are no eigenvalues. For S + we get uj = z^{j−1} u1 and this will give an
element in ℓ2 (N) if and only if |z| < 1. Hence z with |z| < 1 is an eigenvalue.
All these eigenvalues are simple. ⋄
Example 3.5. Let H := ℓ2 (N) and consider the multiplication operator
(Qa)j := qj aj with a bounded sequence q ∈ ℓ∞ (N). Suppose z ∈ C is an
eigenvalue, then the corresponding eigenvector u must satisfy (qj − z)uj = 0.
Hence every value qj is an eigenvalue with corresponding eigenvector u := δ j .
If there is only one j with z = qj the eigenvalue is simple (otherwise the
number of linearly independent eigenvectors equals the number of times z
appears in the sequence q). If z is different from all entries of the sequence,
then u = 0 and z is no eigenvalue. ⋄
Note that in the last example Q will be self-adjoint if and only if q is real-
valued and hence if and only if all eigenvalues are real-valued. Moreover, the
corresponding eigenfunctions are orthogonal. This has nothing to do with
the simple structure of our operator and is in fact always true.

Theorem 3.5. Let A be symmetric. Then all eigenvalues are real and eigen-
vectors corresponding to different eigenvalues are orthogonal.

Proof. Suppose λ is an eigenvalue with corresponding normalized eigen-


vector u. Then λ = ⟨u, Au⟩ = ⟨Au, u⟩ = λ∗ , which shows that λ is real.
Furthermore, if Auj = λj uj , j = 1, 2, we have

(λ1 − λ2 )⟨u1 , u2 ⟩ = ⟨Au1 , u2 ⟩ − ⟨u1 , Au2 ⟩ = 0

finishing the proof. □

Note that while eigenvectors corresponding to the same eigenvalue λ will


in general not automatically be orthogonal, we can of course replace each
set of eigenvectors corresponding to λ by a set of orthonormal eigenvectors
having the same linear span (e.g. using Gram–Schmidt orthogonalization).

Example 3.6. Let H := ℓ2 (N) and consider the Jacobi operator J := (1/2)(S + + S − ):
(Jc)j := (1/2)(cj+1 + cj−1 )
with the convention c0 = 0. Recall that J ∗ = J. If we look for an eigenvalue
Ju = zu, we need to solve the corresponding recursion uj+1 = 2zuj − uj−1
starting from u0 = 0 (our convention) and u1 = 1 (normalization). Like
an ordinary differential equation, a linear recursion relation with constant
coefficients can be solved by an exponential ansatz uj = k j which leads to the
characteristic polynomial k 2 = 2zk − 1. This gives two linearly independent
solutions and our requirements lead us to
uj (z) = (k^j − k^{−j})/(k − k^{−1}), k = z − √(z² − 1).
Note that k^{−1} = z + √(z² − 1) and in the case k = z = ±1 the above expression
has to be understood as its limit uj (±1) = (±1)j+1 j. In fact, Uj (z) :=
uj+1 (z) are polynomials of degree j known as Chebyshev polynomials3
of the second kind.
Now for z ∈ R \ [−1, 1] we have |k| < 1 and uj explodes exponentially.
For z ∈ [−1, 1] we have |k| = 1 and hence we can write k = eiκ with κ ∈ R.
Thus uj = sin(κj)/sin(κ) is oscillating. So for no value of z ∈ R is our
potential eigenvector u square summable, and thus J has no eigenvalues. ⋄
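The closed form in this example can be verified by direct computation. The sketch below (the value of κ is an illustrative choice) solves the recursion numerically and compares it with sin(jκ)/sin(κ) for z = cos(κ) ∈ (−1, 1), confirming that the solution oscillates without decay:

```python
import numpy as np

# Example 3.6: solve u_{j+1} = 2z u_j - u_{j-1}, u_0 = 0, u_1 = 1, and
# compare with the closed form u_j = sin(j*kappa)/sin(kappa) for
# z = cos(kappa) in (-1, 1).
kappa = 0.7
z = np.cos(kappa)

u = [0.0, 1.0]                   # u_0 = 0 (convention), u_1 = 1 (normalization)
for j in range(1, 40):
    u.append(2 * z * u[j] - u[j - 1])

closed = [np.sin(j * kappa) / np.sin(kappa) for j in range(len(u))]
max_dev = max(abs(a - b) for a, b in zip(u, closed))
sup_u = max(abs(v) for v in u)   # bounded by 1/sin(kappa), but no decay
```

Since |uj| stays of order one, the sequence is not square summable, in agreement with the absence of eigenvalues.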
Example 3.7. Let H0 := L2cont (−π, π) and consider the operator
A := (1/i) d/dx , D(A) := C 1 [−π, π].
To compute its eigenvalues we need to solve the differential equation
−iu′ (x) = zu(x)
whose solution is given by
u(x) := eizx
and hence every z ∈ C is an eigenvalue. To investigate symmetry we use
integration by parts which shows
⟨g, Af ⟩ = (1/i) ∫_{−π}^{π} g(x)∗ f ′ (x)dx
= (1/i)( g(π)∗ f (π) − g(−π)∗ f (−π) ) − (1/i) ∫_{−π}^{π} g ′ (x)∗ f (x)dx
= (1/i)( g(π)∗ f (π) − g(−π)∗ f (−π) ) + ⟨Ag, f ⟩
for g, f ∈ D(A). Since the boundary terms will not vanish in general, we
conclude that our operator is not symmetric. This could also be verified
3Pafnuty Chebyshev (1821–1894), Russian mathematician

by observing that the eigenfunctions for (e.g.) z = 0 and z = 1/2 are not
orthogonal. However, the above formula also shows that we can obtain a
symmetric operator by further restricting the domain. For example, we can
impose Dirichlet boundary conditions4 and consider
A0 := (1/i) d/dx , D(A0 ) := {f ∈ C 1 [−π, π] | f (−π) = f (π) = 0}.
Then the above computation shows that A0 is symmetric since the boundary
terms vanish for g, f ∈ D(A0 ). Moreover, note that this domain is still
dense (to see this note that both 1 and x can be approximated by functions
vanishing at the boundary and that every polynomial can be decomposed into
a linear part and a polynomial which vanishes at the boundary). However,
note that since the exponential function has no zeros, we lose all eigenvalues!
The reason for this unfortunate behavior is that A and A0 are adjoint to
each other in the sense that ⟨g, A0 f ⟩ = ⟨Ag, f ⟩ for f ∈ D(A0 ) and g ∈ D(A).
Hence, at least formally, the adjoint of A0 is A and hence A0 is symmetric
but not self-adjoint. This gives a first hint at the fact, that symmetry is not
the same as self-adjointness for unbounded operators.
Returning to our original problem, another choice are periodic boundary
conditions
Ap := (1/i) d/dx , D(Ap ) := {f ∈ C 1 [−π, π] | f (−π) = f (π)}.
Now we have increased the domain (in comparison to A0 ) such that we are
still symmetric, but such that A is no longer adjoint to Ap . Moreover, we
lose some of the eigenfunctions, but not all:
αn := n, un (x) := (2π)−1/2 einx , n ∈ Z.
In fact, the eigenfunctions are just the orthonormal basis from the Fourier
series. ⋄
The previous examples show that in the infinite dimensional case sym-
metry is not enough to guarantee existence of even a single eigenvalue. In
order to always get this, we will need an extra condition. In fact, we will
see that compactness provides a suitable extra condition to obtain an or-
thonormal basis of eigenfunctions. The crucial step is to prove existence of
one eigenvalue; the rest then follows as in the finite dimensional case.

Theorem 3.6. Let H0 be an inner product space. A symmetric compact


operator A has an eigenvalue α1 which satisfies |α1 | = ∥A∥.

4Peter Gustav Lejeune Dirichlet (1805–1859), German mathematician



Proof. We set α := ∥A∥ and assume α ̸= 0 (i.e., A ̸= 0) without loss of


generality. Since
∥A∥2 = sup_{∥f ∥=1} ∥Af ∥2 = sup_{∥f ∥=1} ⟨Af, Af ⟩ = sup_{∥f ∥=1} ⟨f, A2 f ⟩

there exists a normalized sequence un such that


lim ⟨un , A2 un ⟩ = α2 .
n→∞

Since A is compact, it is no restriction to assume that A2 un converges, say
lim_{n→∞} (1/α²) A2 un =: u. Now
∥(A2 − α2 )un ∥2 = ∥A2 un ∥2 − 2α2 ⟨un , A2 un ⟩ + α4
≤ 2α2 (α2 − ⟨un , A2 un ⟩)
(where we have used ∥A2 un ∥ ≤ ∥A∥∥Aun ∥ ≤ ∥A∥2 ∥un ∥ = α2 ) implies
limn→∞ (A2 un − α2 un ) = 0 and hence limn→∞ un = u. In addition, u is
a normalized eigenvector of A2 since (A2 − α2 )u = 0. Factorizing this last
equation according to (A − α)u = v and (A + α)v = 0 shows that either
v ̸= 0 is an eigenvector corresponding to −α or v = 0 and hence u ̸= 0 is an
eigenvector corresponding to α. □

Note that for a bounded operator A, there cannot be an eigenvalue with


absolute value larger than ∥A∥, that is, the set of eigenvalues is bounded by
∥A∥ (Problem 3.8).
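In finite dimensions every operator is compact, so both Theorem 3.6 and the remark above can be sanity-checked with a random symmetric matrix (the size and seed below are illustrative choices):

```python
import numpy as np

# Finite-dimensional check: for a symmetric matrix the norm ||A|| is
# attained by an eigenvalue (Theorem 3.6), and no eigenvalue exceeds
# ||A|| in modulus (the remark above).
rng = np.random.default_rng(0)
B = rng.standard_normal((50, 50))
A = (B + B.T) / 2                          # symmetric

eigvals = np.linalg.eigvalsh(A)            # real, by symmetry
norm_A = np.linalg.norm(A, 2)              # spectral norm

attained = np.isclose(np.max(np.abs(eigvals)), norm_A)
bounded = bool(np.all(np.abs(eigvals) <= norm_A + 1e-10))
```

The infinite-dimensional content of Theorem 3.6 is precisely that compactness rescues this finite-dimensional picture, which symmetry alone does not (cf. Examples 3.4, 3.6).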
Now consider a symmetric compact operator A with eigenvalue α1 (as
above) and corresponding normalized eigenvector u1 . Setting
H1 := {u1 }⊥ = {f ∈ H|⟨u1 , f ⟩ = 0} (3.5)
we can restrict A to H1 since f ∈ H1 implies
⟨u1 , Af ⟩ = ⟨Au1 , f ⟩ = α1 ⟨u1 , f ⟩ = 0 (3.6)
and hence Af ∈ H1 . Denoting this restriction by A1 , it is not hard to see
that A1 is again a symmetric compact operator. Hence we can apply Theo-
rem 3.6 iteratively to obtain a sequence of eigenvalues αj with corresponding
normalized eigenvectors uj . Moreover, by construction, uj is orthogonal to
all uk with k < j and hence the eigenvectors {uj } form an orthonormal set.
By construction we also have |αj | = ∥Aj+1 ∥ ≤ ∥Aj ∥ = |αj−1 |. This proce-
dure will not stop unless H is finite dimensional. However, note that αj = 0
for j ≥ n might happen if An = 0.
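In finite dimensions Theorem 3.6 reduces to the familiar fact that a symmetric matrix has an eigenvalue of modulus ∥A∥. A quick numerical sketch (numpy; the random matrix is just an arbitrary stand-in for a symmetric compact operator):

```python
import numpy as np

rng = np.random.default_rng(0)

# a random symmetric matrix as a finite dimensional stand-in for A
B = rng.standard_normal((6, 6))
A = (B + B.T) / 2

eigs = np.linalg.eigvalsh(A)        # real eigenvalues, ascending order
largest = np.max(np.abs(eigs))      # |alpha_1| from Theorem 3.6
op_norm = np.linalg.norm(A, 2)      # operator norm of A
```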
Theorem 3.7 (Hilbert–Schmidt; Spectral theorem for compact symmetric
operators). Suppose H is an infinite dimensional Hilbert space and A : H →
H is a compact symmetric operator. Then there exists a sequence of real

eigenvalues αj converging to 0. The corresponding normalized eigenvectors


uj form an orthonormal set and every f ∈ H can be written as

f = ∑_{j=1}^∞ ⟨uj, f⟩uj + h,    (3.7)
where h is in the kernel of A, that is, Ah = 0.
In particular, if 0 is not an eigenvalue, then the eigenvectors form an
orthonormal basis (in addition, H need not be complete in this case).

Proof. Existence of the eigenvalues αj and the corresponding eigenvectors


uj has already been established. Since the sequence |αj | is decreasing it has a
limit ε ≥ 0 and we have |αj| ≥ ε. If this limit is nonzero, then vj = αj^{-1} uj is
a bounded sequence (∥vj∥ ≤ 1/ε) for which Avj has no convergent subsequence
since ∥Avj − Avk∥² = ∥uj − uk∥² = 2, a contradiction.
Next, setting
fn := ∑_{j=1}^n ⟨uj, f⟩uj,
we have
∥A(f − fn )∥ ≤ |αn+1 |∥f − fn ∥ ≤ |αn+1 |∥f ∥
since f − fn ∈ Hn and ∥An ∥ = |αn+1 |. Letting n → ∞ shows A(f∞ − f ) = 0
proving (3.7). Finally, note that without completeness f∞ might not be
well-defined unless h = 0. □

By applying A to (3.7) we obtain the following canonical form of compact


symmetric operators.
Corollary 3.8. Every compact symmetric operator A can be written as
Af = ∑_{j=1}^N αj⟨uj, f⟩uj,    (3.8)

where (αj)_{j=1}^N are the nonzero eigenvalues with corresponding eigenvectors
uj from the previous theorem.

Remark: There are two cases where our procedure might fail to construct
an orthonormal basis of eigenvectors. One case is where there is an infinite
number of nonzero eigenvalues. In this case αn never reaches 0 and all eigen-
vectors corresponding to 0 are missed. In the other case, 0 is reached, but
we might still miss some of the eigenvectors corresponding to 0 (if the kernel
is not separable or if we do not choose the vectors uj properly). In any
case, by adding vectors from the kernel (which are automatically eigenvec-
tors), one can always extend the eigenvectors uj to an orthonormal basis of
eigenvectors.

Corollary 3.9. Every compact symmetric operator A has an associated orthonormal
basis of eigenvectors {uj}j∈J. The corresponding unitary map
U : H → ℓ²(J), f ↦ {⟨uj, f⟩}j∈J, diagonalizes A in the sense that UAU^{-1} is
the operator which multiplies each basis vector δj = Uuj by the corresponding
eigenvalue αj.
Example 3.8. Let a, b ∈ c0 (N) be real-valued sequences and consider the
operator
(Jc)j := aj cj+1 + bj cj + aj−1 cj−1
in ℓ2 (N). If A, B denote the multiplication operators by the sequences a, b,
respectively, then we already know that A and B are compact. Moreover,
using the shift operators S^± we can write

J = AS^+ + B + S^−A,

which shows that J is self-adjoint since A* = A, B* = B, and (S^±)* =
S^∓. Hence we can conclude that J has a countable number of eigenvalues
converging to zero and a corresponding orthonormal basis of eigenvectors.
Note that in this case it is not possible to get a closed expression for either
the eigenvalues or the eigenvectors. ⋄
In particular, in the new picture it is easy to define functions of our
operator (thus extending the functional calculus from Problem 1.50). To this
end set Σ := {αj }j∈J and denote by B(Σ) the Banach algebra of bounded
functions F : Σ → C together with the sup norm.
Corollary 3.10 (Functional calculus). Let A be a compact symmetric op-
erator with associated orthonormal basis of eigenvectors {uj }j∈J and corre-
sponding eigenvalues {αj }j∈J . Suppose F ∈ B(Σ), then
F(A)f = ∑_{j∈J} F(αj)⟨uj, f⟩uj    (3.9)

defines a continuous algebra homomorphism from the Banach algebra B(Σ)


to the algebra L (H) with 1(A) = I and I(A) = A. Moreover F (A)∗ = F ∗ (A),
where F ∗ is the function which takes complex conjugate values.

Proof. This is straightforward to check for multiplication operators in ℓ2 (J)


and hence the result follows by the previous corollary. □

In many applications F will be given by a function on R (or at least on


[−∥A∥, ∥A∥]) and, since only the values F (αj ) are used, two functions which
agree on all eigenvalues will give the same result.
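In finite dimensions formula (3.9) can be checked with a few lines of numerical linear algebra. The sketch below (numpy; a random symmetric matrix stands in for A) verifies the homomorphism property for F(α) = α² and F = 1:

```python
import numpy as np

def calculus(A, F):
    """Apply F(A)f = sum_j F(alpha_j) <u_j, f> u_j via the eigenbasis of A."""
    alpha, U = np.linalg.eigh(A)    # columns of U: orthonormal eigenvectors
    return U @ np.diag(F(alpha)) @ U.T

rng = np.random.default_rng(1)
B = rng.standard_normal((5, 5))
A = (B + B.T) / 2

# homomorphism property: F(alpha) = alpha^2 must reproduce A @ A
sq = calculus(A, lambda a: a**2)
one = calculus(A, lambda a: np.ones_like(a))
```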
As a brief application we will say a few words about general spectral
theory for bounded operators A ∈ L (X) in a Banach space X. In the finite

dimensional case, the spectrum is precisely the set of eigenvalues. In the


infinite dimensional case one defines the spectrum as
σ(A) := C \ {z ∈ C|∃(A − z)−1 ∈ L (X)}. (3.10)
It is important to emphasize that the inverse is required to exist as a bounded
operator. Hence there are several ways in which this can fail: First of all,
A − z could not be injective. In this case z is an eigenvalue and thus all
eigenvalues belong to the spectrum. Secondly, it could not be surjective.
And finally, even if it is bijective, its inverse could be unbounded. However, it will
follow from the open mapping theorem that this last case cannot happen
for a bounded operator. The inverse of A − z for z ∈ C \ σ(A) is known
as the resolvent of A and plays a crucial role in spectral theory. Using
Problem 1.49 one can show that the complement of the spectrum is open,
and hence the spectrum is closed. Since we will discuss this in detail in
Chapter 5, we will not pursue this here but only look at our special case of
symmetric compact operators.
To compute the inverse of A − z we will use the functional calculus and
consider F(α) = (α − z)^{-1}. Of course this function is unbounded on R but if z
is neither an eigenvalue nor zero it is bounded on Σ and hence satisfies our
requirements. Then

RA(z)f := ∑_{j∈J} (αj − z)^{-1} ⟨uj, f⟩uj    (3.11)

satisfies (A − z)RA (z) = RA (z)(A − z) = I, that is, RA (z) = (A − z)−1 ∈


L (H). Of course, if z is an eigenvalue, then the above formula breaks down.
However, in the infinite dimensional case it also breaks down if z = 0 even
if 0 is not an eigenvalue! In this case the above definition will still give an
operator which is the inverse of A − z, however, since the sequence αj^{-1} is
unbounded, so will be the corresponding multiplication operator in ℓ²(J) and
the sum in (3.11) will only converge if (αj^{-1}⟨uj, f⟩)_{j∈J} ∈ ℓ²(J). So in the
infinite dimensional case 0 is in the spectrum even if it is not an eigenvalue.
In particular,

σ(A) = {αj}_{j∈J} ∪ {0}.    (3.12)
Moreover, if we use 1/(αj − z) = αj/(z(αj − z)) − 1/z we can rewrite this as

RA(z)f = (1/z) ( ∑_{j=1}^N (αj/(αj − z)) ⟨uj, f⟩uj − f ),

where it suffices to take the sum over all nonzero eigenvalues.
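In finite dimensions (3.11) is just the spectral representation of the matrix inverse; a quick numerical sketch (numpy; random symmetric matrix as a stand-in for A, z chosen away from the eigenvalues):

```python
import numpy as np

rng = np.random.default_rng(2)
B = rng.standard_normal((5, 5))
A = (B + B.T) / 2

alpha, U = np.linalg.eigh(A)
z = 0.5 + 0.25j                     # nonreal z automatically avoids spec(A)

# R_A(z) f = sum_j (alpha_j - z)^{-1} <u_j, f> u_j, assembled as a matrix
R_spec = (U / (alpha - z)) @ U.conj().T
R_direct = np.linalg.inv(A - z * np.eye(5))
```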


Before we apply these results to Sturm–Liouville operators, we look at a
toy problem which illustrates the main ideas.

Example 3.9. We continue Example 3.7 and would like to apply the spectral
theorem to our operator Ap . However, since Ap is unbounded (its eigenvalues
are not bounded), it cannot be compact and hence we cannot apply Theo-
rem 3.7 directly to Ap . However, the trick is to apply it to the resolvent. To
this end we need to solve the inhomogeneous differential equation

−if′(x) − z f(x) = g(x),

whose solution is (variation of constants)

f(x) = f(−π)e^{iz(x+π)} + i ∫_{−π}^{x} e^{iz(x−y)} g(y)dy.

The requirement f ∈ D(Ap) gives

f(−π) = f(π) = f(−π)e^{2πiz} + i ∫_{−π}^{π} e^{iz(π−y)} g(y)dy

implying

(Ap − z)^{-1} g(x) = (i e^{2πiz}/(1 − e^{2πiz})) ∫_{−π}^{π} e^{iz(x−y)} g(y)dy + i ∫_{−π}^{x} e^{iz(x−y)} g(y)dy

for z ̸∈ Z. Indeed, for every g ∈ H0 we have constructed f ∈ D(Ap ) such that


(Ap − z)f = g provided z ̸∈ Z. In particular, Ap − z is surjective in this case.
Moreover, since z ̸∈ Z is no eigenvalue, Ap − z is also injective and hence
bijective in this case. Thus the inverse exists and is given by (Ap −z)−1 g := f
as claimed. (Alternatively we could have also checked (Ap − z)−1 (Ap − z)f
for f ∈ D(Ap ) — please remember that a right inverse might not be a left
inverse in the infinite dimensional case; cf. Problem 2.10).
That the resolvent is compact follows from Lemma 3.4 and Problem 3.6.
Moreover, that it is symmetric for z ∈ R \ Z could be checked using Fubini,
but this is not necessary since it comes for free from symmetry of Ap (cf.
Problem 3.13).
Finally observe that the eigenvalues of the resolvent are (n − z)^{-1} while
the eigenvectors are the same as those of Ap. This shows that un(x) :=
(2π)^{-1/2} e^{inx} is an orthonormal basis for L2cont(−π, π).
It is interesting to observe that, while the algebraic formulas remain
the same, most of our claims break down if we try to replace the Hilbert
space L2cont (−π, π) by the Banach space C[−π, π]: Of course we have the
same eigenvalues and eigenfunctions and also the formula for the resolvent
remains valid. The resolvent is still compact, but of course symmetry is not
defined in this context. Indeed, we already know that the Fourier series of
a continuous function does not converge in general. This emphasizes the
special role of Hilbert spaces and symmetry. ⋄
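The explicit resolvent formula above can also be sanity-checked numerically: applied to an eigenfunction g(y) = e^{imy} it must return e^{imx}/(m − z). A short sketch (numpy with a hand-rolled trapezoidal rule; the parameters z, m and the evaluation point are arbitrary choices):

```python
import numpy as np

def trapz(f, t):
    # simple trapezoidal rule, independent of the numpy version
    return np.sum((f[1:] + f[:-1]) * np.diff(t)) / 2

z, m = 0.5, 2                        # z not an integer; g(y) = e^{imy}
y = np.linspace(-np.pi, np.pi, 20001)
idx = 15000
x = y[idx]                           # evaluate at a grid point
g = np.exp(1j * m * y)

# the two integrals from the variation-of-constants formula
full = trapz(np.exp(1j * z * (x - y)) * g, y)
part = trapz(np.exp(1j * z * (x - y[:idx + 1])) * g[:idx + 1], y[:idx + 1])
f_x = 1j * np.exp(2j * np.pi * z) / (1 - np.exp(2j * np.pi * z)) * full + 1j * part

expected = np.exp(1j * m * x) / (m - z)
```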

Problem 3.8. Show that if A ∈ L (H), then every eigenvalue α satisfies
|α| ≤ ∥A∥.
Problem 3.9. Suppose A ∈ L (H) is an involution, that is, A² = I. Show
that the only possible eigenvalues are ±1. Show that there are projections P±
onto the corresponding eigenspaces with P+ + P− = I.
Problem 3.10. Find the eigenvalues and eigenfunctions of the integral operator K ∈ L (L2cont(0, 1)) given by

(Kf)(x) := ∫_0^1 u(x)v(y)f(y)dy,

where u, v ∈ C([0, 1]) are some given continuous functions.
Problem 3.11. Find the eigenvalues and eigenfunctions of the integral operator K ∈ L (L2cont(0, 1)) given by

(Kf)(x) := 2 ∫_0^1 (2xy − x − y + 1)f(y)dy.

Problem 3.12. Let H := L2cont(0, 1). Show that the Volterra integral operator K : H → H from Problem 3.6 has no eigenvalues except for 0. Show that
0 is not an eigenvalue if K(x, y) > 0. Why does this not contradict Theorem 3.6?
(Hint: Gronwall’s inequality.)
Problem* 3.13. Show that the resolvent RA (z) = (A − z)−1 (provided it
exists and is densely defined) of a symmetric operator A is again symmetric
for z ∈ R. (Hint: g ∈ D(RA (z)) if and only if g = (A − z)f for some
f ∈ D(A).)

3.3. Applications to Sturm–Liouville operators


Now, after all this hard work, we can show that our Sturm–Liouville operator
L := −d²/dx² + q(x),    (3.13)
where q is continuous and real, defined on
D(L) := {f ∈ C 2 [0, 1]|f (0) = f (1) = 0} ⊂ L2cont (0, 1), (3.14)
has an orthonormal basis of eigenfunctions.
The corresponding eigenvalue equation Lu = zu explicitly reads
−u′′ (x) + q(x)u(x) = zu(x). (3.15)
It is a second order homogeneous linear ordinary differential equation and
hence has two linearly independent solutions. In particular, specifying two
initial conditions, e.g. u(0) = 0, u′ (0) = 1 determines the solution uniquely.
Hence, if we require u(0) = 0, the solution is determined up to a multiple

and consequently the additional requirement u(1) = 0 cannot be satisfied by


a nontrivial solution in general. However, there might be some z ∈ C for
which the solution corresponding to the initial conditions u(0) = 0, u′ (0) = 1
happens to satisfy u(1) = 0 and these are precisely the eigenvalues we are
looking for.
Note that the fact that L2cont (0, 1) is not complete causes no problems
since we can always replace it by its completion H = L2 (0, 1).
We first verify that L is symmetric:
⟨f, Lg⟩ = ∫_0^1 f(x)* (−g″(x) + q(x)g(x)) dx
        = ∫_0^1 f′(x)* g′(x) dx + ∫_0^1 f(x)* q(x)g(x) dx
        = ∫_0^1 (−f″(x)*) g(x) dx + ∫_0^1 f(x)* q(x)g(x) dx    (3.16)
        = ⟨Lf, g⟩.

Here we have used integration by parts twice (the boundary terms vanish
due to our boundary conditions f (0) = f (1) = 0 and g(0) = g(1) = 0).
Of course we want to apply Theorem 3.7 and for this we would need to
show that L is compact. But this task is bound to fail, since L is not even
bounded (see Example 1.18)!
So here comes the trick (cf. Example 3.9): If L is unbounded its inverse
L−1 might still be bounded. Moreover, L−1 might even be compact and this
is the case here! Since L might not be injective (0 might be an eigenvalue),
we consider the resolvent RL (z) := (L − z)−1 , z ∈ C.
In order to compute the resolvent, we need to solve the inhomogeneous
equation (L − z)f = g. This can be done using the variation of constants
formula from ordinary differential equations which determines the solution
up to an arbitrary solution of the homogeneous equation. This homogeneous
equation has to be chosen such that f ∈ D(L), that is, such that f (0) =
f (1) = 0.
Define
f(x) := (u₊(z, x)/W(z)) ∫_0^x u₋(z, t)g(t)dt + (u₋(z, x)/W(z)) ∫_x^1 u₊(z, t)g(t)dt,    (3.17)

where u±(z, x) are the solutions of the homogeneous differential equation
−u±″(z, x) + (q(x) − z)u±(z, x) = 0 satisfying the initial conditions
u₋(z, 0) = 0, u₋′(z, 0) = 1 and u₊(z, 1) = 0, u₊′(z, 1) = 1, respectively, and


W(z) := W(u₊(z), u₋(z)) = u₋′(z, x)u₊(z, x) − u₋(z, x)u₊′(z, x)    (3.18)
is the Wronski determinant,5 which is independent of x (compute its
derivative!).
Of course this formula implicitly assumes W(z) ≠ 0. This condition is
not surprising since the zeros of the Wronskian are precisely the eigenvalues:
In fact, W (z) evaluated at x = 0 gives W (z) = u+ (z, 0) and hence shows
that the Wronskian vanishes if and only if u+ (z, x) satisfies both boundary
conditions and is thus an eigenfunction.
Returning to (3.17) we clearly have f (0) = 0 since u− (z, 0) = 0 and
similarly f (1) = 0 since u+ (z, 1) = 0. Furthermore, f is differentiable and a
straightforward computation verifies
f′(x) = (u₊′(z, x)/W(z)) ∫_0^x u₋(z, t)g(t)dt + (u₋′(z, x)/W(z)) ∫_x^1 u₊(z, t)g(t)dt.    (3.19)
Thus we can differentiate once more giving

f″(x) = (u₊″(z, x)/W(z)) ∫_0^x u₋(z, t)g(t)dt + (u₋″(z, x)/W(z)) ∫_x^1 u₊(z, t)g(t)dt − g(x)
      = (q(x) − z)f(x) − g(x).    (3.20)
In summary, f is in the domain of L and satisfies (L − z)f = g. In particular,
L − z is surjective for W(z) ≠ 0. Since the zeros of the Wronskian are precisely
the eigenvalues, L − z is also injective, and hence bijective, for W(z) ≠ 0.
Introducing the Green function6

G(z, x, t) := (1/W(u₊(z), u₋(z))) · { u₊(z, x)u₋(z, t), x ≥ t;  u₊(z, t)u₋(z, x), x ≤ t },    (3.21)
we see that (L − z)^{-1} is given by

(L − z)^{-1} g(x) = ∫_0^1 G(z, x, t)g(t)dt.    (3.22)
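For q = 0 and z = 0 one has u₋(0, x) = x, u₊(0, x) = x − 1 and W = −1, so G(0, x, t) = t(1 − x) for t ≤ x and x(1 − t) for t ≥ x. A small numerical sketch (numpy, trapezoidal quadrature) checking that (3.22) then inverts L on g(x) = sin(πx), whose preimage is sin(πx)/π²:

```python
import numpy as np

t = np.linspace(0, 1, 2001)
g = np.sin(np.pi * t)

def G0(x, t):
    # Green function of L = -d^2/dx^2 with Dirichlet boundary conditions
    return np.where(t <= x, t * (1 - x), x * (1 - t))

def trapz(f, t):
    return np.sum((f[1:] + f[:-1]) * np.diff(t)) / 2

f = np.array([trapz(G0(x, t) * g, t) for x in t])
err = np.max(np.abs(f - np.sin(np.pi * t) / np.pi**2))
```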
Symmetry of L − z for z ∈ R also implies symmetry of RL (z) for z ∈
R (Problem 3.13) but this can also be verified directly using G(z, x, t) =
G(z, t, x) (Problem 3.14). From Lemma 3.4 it follows that it is compact.

5Józef Maria Hoene-Wroński (1776–1853), Polish philosopher and mathematician


6George Green (1793–1841), British mathematical physicist

Hence Theorem 3.7 applies to (L − z)−1 once we show that we can find a
real z which is not an eigenvalue.
Theorem 3.11. The Sturm–Liouville operator L has a countable number of
discrete and simple eigenvalues En which accumulate only at ∞. They are
bounded from below and can hence be ordered as follows:
min_{x∈[0,1]} q(x) < E0 < E1 < · · · .    (3.23)

The corresponding normalized eigenfunctions un form an orthonormal basis


for L2cont(0, 1), that is, every f ∈ L2cont(0, 1) can be written as

f(x) = ∑_{n=0}^∞ ⟨un, f⟩un(x).    (3.24)
Moreover, for f ∈ D(L) this series is absolutely uniformly convergent.

Proof. If Ej is an eigenvalue with corresponding normalized eigenfunction
uj we have

Ej = ⟨uj, Luj⟩ = ∫_0^1 (|uj′(x)|² + q(x)|uj(x)|²) dx ≥ min_{x∈[0,1]} q(x),    (3.25)

where we have used integration by parts as in (3.16). Note that equality


could only occur if uj is constant, which is incompatible with our boundary
conditions. Hence the eigenvalues are bounded from below.
Now pick a value λ ∈ R such that RL (λ) exists (λ < minx∈[0,1] q(x)
say). By Lemma 3.4 RL (λ) is compact and by Lemma 3.3 this remains
true if we replace L2cont (0, 1) by its completion. By Theorem 3.7 there are
eigenvalues αn of RL(λ) with corresponding eigenfunctions un. Moreover,
RL(λ)un = αn un is equivalent to Lun = (λ + 1/αn)un, which shows that
En = λ + 1/αn are eigenvalues of L with corresponding eigenfunctions un. Now
everything follows from Theorem 3.7 except that the eigenvalues are simple.
To show this, observe that if un and vn are two different eigenfunctions
corresponding to En , then un (0) = vn (0) = 0 implies W (un , vn ) = 0 and
hence un and vn are linearly dependent.
To show that (3.24) converges uniformly if f ∈ D(L) we begin by writing
f = RL (λ)g, g ∈ L2cont (0, 1), implying

∑_{n=0}^∞ ⟨un, f⟩un(x) = ∑_{n=0}^∞ ⟨RL(λ)un, g⟩un(x) = ∑_{n=0}^∞ αn⟨un, g⟩un(x).
Moreover, the Cauchy–Schwarz inequality shows

( ∑_{j=m}^n |αj⟨uj, g⟩uj(x)| )² ≤ ∑_{j=m}^n |⟨uj, g⟩|² ∑_{j=m}^n |αj uj(x)|².

Now, by (2.18), ∑_{j=0}^∞ |⟨uj, g⟩|² = ∥g∥² and hence the first term is part of a
convergent series. Similarly, the second term can be estimated independent
of x since

αn un(x) = RL(λ)un(x) = ∫_0^1 G(λ, x, t)un(t)dt = ⟨un, G(λ, x, .)⟩

implies

∑_{j=m}^n |αj uj(x)|² ≤ ∑_{j=0}^∞ |⟨uj, G(λ, x, .)⟩|² = ∫_0^1 |G(λ, x, t)|² dt ≤ M(λ)²,

where M(λ) := max_{x,t∈[0,1]} |G(λ, x, t)|, again by (2.18). □

Moreover, it is even possible to weaken our assumptions for uniform
convergence. To this end we consider the sesquilinear form associated with
L:

sL(f, g) := ⟨f, Lg⟩ = ∫_0^1 (f′(x)* g′(x) + q(x)f(x)* g(x)) dx    (3.26)

for f, g ∈ D(L), where we have used integration by parts as in (3.16). In


fact, the above formula continues to hold for f in a slightly larger class of
functions,

Q(L) := {f ∈ Cp1 [0, 1]|f (0) = f (1) = 0} ⊇ D(L), (3.27)

which we call the form domain of L. Here Cp1 [a, b] denotes the set of
piecewise continuously differentiable functions f in the sense that f is con-
tinuously differentiable except for a finite number of points at which it is
continuous and the derivative has limits from the left and right. In fact, any
class of functions for which the partial integration needed to obtain (3.26)
can be justified would be good enough (e.g. the set of absolutely continuous
functions to be discussed in Section 4.4 from [37]).

Lemma 3.12. For a regular Sturm–Liouville problem (3.24) converges ab-


solutely uniformly provided f ∈ Q(L).

Proof. By replacing L → L − E0 + 1 (this will shift the eigenvalues En →


En − E0 + 1 and leave the eigenvectors unchanged) we can assume qL (f ) :=
sL (f, f ) > 0 and Ej > 0 without loss of generality.
Now let f ∈ Q(L) and consider (3.24). Then, observing that sL (f, g) is
a symmetric sesquilinear form (after our shift it is even a scalar product) as

well as sL(f, uj) = Ej⟨f, uj⟩ one obtains

0 ≤ qL( f − ∑_{j=m}^n ⟨uj, f⟩uj )
  = qL(f) − ∑_{j=m}^n ⟨uj, f⟩ sL(f, uj) − ∑_{j=m}^n ⟨uj, f⟩* sL(uj, f) + ∑_{j,k=m}^n ⟨uj, f⟩*⟨uk, f⟩ sL(uj, uk)
  = qL(f) − ∑_{j=m}^n Ej |⟨uj, f⟩|²,

which implies

∑_{j=m}^n Ej |⟨uj, f⟩|² ≤ qL(f).

In particular, note that this estimate applies to f(y) = G(λ, x, y). Now from
the proof of Theorem 3.11 (with λ = 0 and αj = Ej^{-1}) we have uj(x) =
Ej⟨uj, G(0, x, .)⟩ and hence

∑_{j=m}^n |⟨uj, f⟩uj(x)| = ∑_{j=m}^n Ej |⟨uj, f⟩⟨uj, G(0, x, .)⟩|
  ≤ ( ∑_{j=m}^n Ej |⟨uj, f⟩|² )^{1/2} ( ∑_{j=m}^n Ej |⟨uj, G(0, x, .)⟩|² )^{1/2}
  ≤ ( ∑_{j=m}^n Ej |⟨uj, f⟩|² )^{1/2} qL(G(0, x, .))^{1/2},

where we have used the Cauchy–Schwarz inequality for the weighted scalar
product (fj, gj) ↦ ∑_j fj* gj Ej. Finally note that qL(G(0, x, .)) is continuous
with respect to x and hence can be estimated by its maximum over [0, 1].
with respect to x and hence can be estimated by its maximum over [0, 1].
This shows that the sum (3.24) is absolutely convergent, uniformly with
respect to x. □

Another consequence of the computations in the previous proof is also


worthwhile noting:

Corollary 3.13. We have



G(z, x, y) = ∑_{j=0}^∞ (1/(Ej − z)) uj(x)uj(y),    (3.28)

where the sum is uniformly convergent. Moreover, we have the following


trace formula

∫_0^1 G(z, x, x)dx = ∑_{j=0}^∞ 1/(Ej − z).    (3.29)

Proof. Using the conventions from the proof of the previous lemma we have
⟨uj, G(0, x, .)⟩ = Ej^{-1} uj(x) and since G(0, x, .) ∈ Q(L) for fixed x ∈ [0, 1] we
have

∑_{j=0}^∞ (1/Ej) uj(x)uj(y) = G(0, x, y),

where the convergence is uniformly with respect to y (and x fixed). Moreover,


for x = y Dini’s theorem (cf. Problem B.38) shows that the convergence is
uniform with respect to x = y and this also proves uniform convergence of
our sum since

∑_{j=0}^n (1/|Ej − z|) |uj(x)uj(y)| ≤ C(z) ( ∑_{j=0}^n (1/Ej) uj(x)² )^{1/2} ( ∑_{j=0}^n (1/Ej) uj(y)² )^{1/2},

where C(z) := sup_j Ej/|Ej − z|.
Finally, the last claim follows upon computing the integral using (3.28)
and observing ∥uj ∥ = 1. □

Example 3.10. Let us look at the Sturm–Liouville problem with q = 0.


Then the underlying differential equation is

−u″(x) = z u(x)

whose solution is given by u(x) = c1 sin(√z x) + c2 cos(√z x). The solution
satisfying the boundary condition at the left endpoint is u₋(z, x) = sin(√z x)
and it will be an eigenfunction if and only if u₋(z, 1) = sin(√z) = 0. Hence
the corresponding eigenvalues and normalized eigenfunctions are

En = π²n²,   un(x) = √2 sin(nπx),   n ∈ N.
Moreover, every function f ∈ L2cont(0, 1) can be expanded into a Fourier
sine series

f(x) = ∑_{n=1}^∞ fn un(x),   fn := ∫_0^1 un(x)f(x)dx,

which is convergent with respect to our scalar product. If f ∈ Cp1 [0, 1] with
f (0) = f (1) = 0 the series will converge uniformly. For an application of the
trace formula see Problem 3.16. ⋄
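These eigenvalues are easy to confirm numerically: a centered finite-difference discretization of −d²/dx² on [0, 1] with Dirichlet boundary conditions has eigenvalues approaching π²n². A sketch (numpy; N and the tolerance are arbitrary choices):

```python
import numpy as np

N = 200                                 # interior grid points, h = 1/(N+1)
h = 1.0 / (N + 1)

# tridiagonal matrix for -u'' with u(0) = u(1) = 0
L0 = (np.diag(2.0 * np.ones(N)) - np.diag(np.ones(N - 1), 1)
      - np.diag(np.ones(N - 1), -1)) / h**2

ev = np.linalg.eigvalsh(L0)             # ascending
exact = np.pi**2 * np.arange(1, 6)**2   # E_n = pi^2 n^2, n = 1..5
```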

Example 3.11. We could also look at the same equation as in the previous
problem but with different boundary conditions
u′ (0) = u′ (1) = 0.
Then

En = π²n²,   un(x) = { 1, n = 0;  √2 cos(nπx), n ∈ N }.

Moreover, every function f ∈ L2cont(0, 1) can be expanded into a Fourier
cosine series

f(x) = ∑_{n=0}^∞ fn un(x),   fn := ∫_0^1 un(x)f(x)dx,

which is convergent with respect to our scalar product. ⋄


Example 3.12. Combining the last two examples we see that every symmetric function on [−1, 1] can be expanded into a Fourier cosine series and every
anti-symmetric function into a Fourier sine series. Moreover, since every
function f(x) can be written as the sum of a symmetric function (f(x) + f(−x))/2
and an anti-symmetric function (f(x) − f(−x))/2, it can be expanded into a Fourier
series. Hence we recover Theorem 2.19. ⋄
Problem* 3.14. Show that for our Sturm–Liouville operator u± (z, x)∗ =
u± (z ∗ , x). Conclude RL (z)∗ = RL (z ∗ ). (Hint: Which differential equation
does u± (z, x)∗ solve? For the second part use Problem 3.7.)
Problem 3.15. Suppose E0 > 0 and equip Q(L) with the scalar product sL .
Show that
f (x) = sL (G(0, x, .), f ).
In other words, point evaluations are continuous functionals associated with
the vectors G(0, x, .) ∈ Q(L). In this context, G(0, x, y) is called a repro-
ducing kernel.
Problem 3.16. Show that

∑_{n=1}^∞ 1/(n² − z) = (1 − π√z cot(π√z)) / (2z),   z ∈ C \ N.

In particular, for z = 0 this gives Euler's7 solution of the Basel problem:

∑_{n=1}^∞ 1/n² = π²/6.

7Leonhard Euler (1707–1783), Swiss mathematician, physicist, astronomer, geographer, logician and engineer

In fact, comparing the power series of both sides at z = 0 gives

∑_{n=1}^∞ 1/n^{2k} = (−1)^{k+1} (2π)^{2k} B_{2k} / (2(2k)!),   k ∈ N,

where Bk are the Bernoulli numbers8 defined via z/(e^z − 1) = ∑_{k=0}^∞ (Bk/k!) z^k.
(Hint: Use the trace formula (3.29).)
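Both identities are easy to check numerically; the sketch below (numpy; the truncation point is an arbitrary choice) compares partial sums with π²/6 and, for k = 2, with π⁴/90 (using B₄ = −1/30):

```python
import numpy as np

n = np.arange(1, 100001, dtype=np.float64)

basel = np.sum(1.0 / n**2)              # should approach pi^2/6
zeta4 = np.sum(1.0 / n**4)              # should approach pi^4/90

# k = 2 instance of the Bernoulli-number formula: (-1)^3 (2 pi)^4 B_4 / (2 * 4!)
B4 = -1.0 / 30.0
formula_k2 = (-1)**3 * (2 * np.pi)**4 * B4 / (2 * 24)
```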
Problem 3.17. Consider the Sturm–Liouville problem on a compact interval
[a, b] with domain
D(L) = {f ∈ C 2 [a, b]|f ′ (a) − αf (a) = f ′ (b) − βf (b) = 0}
for some real constants α, β ∈ R. Show that Theorem 3.11 continues to hold
except for the lower bound on the eigenvalues.

3.4. Estimating eigenvalues


In general, there is no way of computing eigenvalues and their corresponding
eigenfunctions explicitly. Hence it is important to be able to determine the
eigenvalues at least approximately.
Let A be a symmetric operator which has a lowest eigenvalue α1 (e.g.,
A is a Sturm–Liouville operator). Suppose we have a vector f which is an
approximation for the eigenvector u1 of this lowest eigenvalue α1 . Moreover,
suppose we can write

X ∞
X
A := αj ⟨uj , .⟩uj , D(A) := {f ∈ H| |αj ⟨uj , f ⟩|2 < ∞}, (3.30)
j=1 j=1

where {uj }j∈N is an orthonormal basis of eigenvectors. Since α1 is supposed


to be the lowest eigenvalue we have αj ≥ α1 for all j ∈ N.
Writing f = ∑_j γj uj, γj = ⟨uj, f⟩, one computes

⟨f, Af⟩ = ⟨f, ∑_{j=1}^∞ αj γj uj⟩ = ∑_{j=1}^∞ αj |γj|²,   f ∈ D(A),    (3.31)

and we clearly have

α1 ≤ ⟨f, Af⟩ / ∥f∥²,   f ∈ D(A),    (3.32)
with equality for f = u1 . In particular, any f will provide an upper bound
and if we add some free parameters to f , one can optimize them and obtain
quite good upper bounds for the first eigenvalue. For example we could

8Jacob Bernoulli (1655–1705), Swiss mathematician



take some orthogonal basis, take a finite number of coefficients and optimize
them. This is known as the Rayleigh–Ritz method.9
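In matrix form the principle reads: for any trial vector f, the Rayleigh quotient ⟨f, Af⟩/∥f∥² bounds the lowest eigenvalue from above. A minimal numerical sketch (numpy; a random symmetric matrix and a random trial vector):

```python
import numpy as np

rng = np.random.default_rng(3)
B = rng.standard_normal((8, 8))
A = (B + B.T) / 2

alpha1 = np.linalg.eigvalsh(A)[0]       # lowest eigenvalue

# every trial vector gives an upper bound: alpha1 <= <f, Af>/||f||^2
f = rng.standard_normal(8)
bound = f @ A @ f / (f @ f)
```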
Example 3.13. Consider the Sturm–Liouville operator L with potential
q(x) = x and Dirichlet boundary conditions f (0) = f (1) = 0 on the in-
terval [0, 1]. Our starting point is the quadratic form
qL(f) := ⟨f, Lf⟩ = ∫_0^1 (|f′(x)|² + q(x)|f(x)|²) dx

which gives us the lower bound

⟨f, Lf⟩ ≥ min_{0≤x≤1} q(x) = 0.
While the corresponding differential equation can in principle be solved in
terms of Airy functions, there is no closed form for the eigenvalues.
First of all we can improve the above bound upon observing 0 ≤ q(x) ≤ 1
which implies
⟨f, L0 f ⟩ ≤ ⟨f, Lf ⟩ ≤ ⟨f, (L0 + 1)f ⟩, f ∈ D(L) = D(L0 ),
where L0 is the Sturm–Liouville operator corresponding to q(x) = 0. Since
the lowest eigenvalue of L0 is π 2 we obtain
π 2 ≤ E1 ≤ π 2 + 1
for the lowest eigenvalue E1 of L.

Moreover, using the lowest eigenfunction f1 (x) = 2 sin(πx) of L0 one
obtains the improved upper bound
1
E1 ≤ ⟨f1 , Lf1 ⟩ = π 2 + ≈ 10.3696.
2

Taking the second eigenfunction f2(x) = √2 sin(2πx) of L0 we can make the
ansatz f(x) = (1 + γ²)^{-1/2}(f1(x) + γf2(x)) which gives

⟨f, Lf⟩ = π² + 1/2 + (γ/(1 + γ²)) (3π²γ − 32/(9π²)).

The right-hand side has a unique minimum at γ = 32/(27π⁴ + √(1024 + 729π⁸)) giving
the bound

E1 ≤ (5/2)π² + 1/2 − √(1024 + 729π⁸)/(18π²) ≈ 10.3685

which coincides with the exact eigenvalue up to five digits. ⋄
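The numbers above can be reproduced with a few lines of numerics: scan the Rayleigh quotient over γ and compare with the closed-form minimum. A sketch (numpy; grid resolution is an arbitrary choice):

```python
import numpy as np

pi = np.pi
# Rayleigh quotient of the two-term ansatz as a function of gamma
E = lambda g: pi**2 + 0.5 + g / (1 + g**2) * (3 * pi**2 * g - 32 / (9 * pi**2))

g = np.linspace(-0.1, 0.1, 200001)
grid_min = np.min(E(g))

g_star = 32 / (27 * pi**4 + np.sqrt(1024 + 729 * pi**8))
closed = 2.5 * pi**2 + 0.5 - np.sqrt(1024 + 729 * pi**8) / (18 * pi**2)
```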
But is there also something one can say about the next eigenvalues?
Suppose we know the first eigenfunction u1 . Then we can restrict A to
the orthogonal complement of u1 and proceed as before: E2 will be the
minimum of ⟨f, Af ⟩ over all f restricted to this subspace. If we restrict to
9John William Strutt, 3rd Baron Rayleigh (1842–1919), English physicist
9Walther Ritz (1878–1909), Swiss theoretical physicist

the orthogonal complement of an approximating eigenfunction f1 , there will


still be a component in the direction of u1 left and hence the infimum of the
expectations will be lower than E2 . Thus the optimal choice f1 = u1 will
give the maximal value E2 .
Theorem 3.14 (Courant10 Max-min principle). Let A be a symmetric operator and let α1 ≤ α2 ≤ · · · ≤ αN be eigenvalues of A with corresponding
orthonormal eigenvectors u1, u2, . . . , uN. Suppose
A = ∑_{j=1}^N αj⟨uj, .⟩uj + Ã    (3.33)

with ⟨f, Ãf⟩ ≥ αN∥f∥² for all f ∈ D(A) and u1, . . . , uN ∈ Ker(Ã). Then

αj = sup_{f1,...,fj−1} inf_{f∈U(f1,...,fj−1)} ⟨f, Af⟩,   1 ≤ j ≤ N,    (3.34)

where
U (f1 , . . . , fj ) := {f ∈ D(A)| ∥f ∥ = 1, f ∈ span{f1 , . . . , fj }⊥ }. (3.35)

Proof. We have

inf_{f∈U(f1,...,fj−1)} ⟨f, Af⟩ ≤ αj.

In fact, set f = ∑_{k=1}^j γk uk and choose γk such that f ∈ U(f1, . . . , fj−1).
Then

⟨f, Af⟩ = ∑_{k=1}^j |γk|² αk ≤ αj

and the claim follows.
Conversely, let γk = ⟨uk, f⟩ and write f = ∑_{k=1}^N γk uk + f̃. Then

inf_{f∈U(u1,...,uj−1)} ⟨f, Af⟩ = inf_{f∈U(u1,...,uj−1)} ( ∑_{k=j}^N |γk|² αk + ⟨f̃, Ãf̃⟩ ) = αj. □

Of course if we are interested in the largest eigenvalues all we have to do


is consider −A.
Note that this immediately gives an estimate for eigenvalues if we have
a corresponding estimate for the operators. To this end we will write
A≤B ⇔ ⟨f, Af ⟩ ≤ ⟨f, Bf ⟩, f ∈ D(A) ∩ D(B). (3.36)
Corollary 3.15. Suppose A and B are symmetric operators with corre-
sponding eigenvalues αj and βj as in the previous theorem. If A ≤ B and
D(B) ⊆ D(A) then αj ≤ βj .
10Richard Courant (1888–1972), German American mathematician

Proof. By assumption we have ⟨f, Af⟩ ≤ ⟨f, Bf⟩ for f ∈ D(B) implying

inf_{f∈U_A(f1,...,fj−1)} ⟨f, Af⟩ ≤ inf_{f∈U_B(f1,...,fj−1)} ⟨f, Af⟩ ≤ inf_{f∈U_B(f1,...,fj−1)} ⟨f, Bf⟩,

where we have indicated the dependence of U on the operator via a subscript.
Taking the sup on both sides the claim follows. □
Example 3.14. Let L be again our Sturm–Liouville operator and L0 the
corresponding operator with q(x) = 0. Set q₋ := min_{0≤x≤1} q(x) and
q₊ := max_{0≤x≤1} q(x). Then L0 + q₋ ≤ L ≤ L0 + q₊ implies

π²n² + q₋ ≤ En ≤ π²n² + q₊.
In particular, we have proven the famous Weyl asymptotic11
En = π 2 n2 + O(1)
for the eigenvalues. ⋄
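For a concrete potential the sandwich bound is easy to observe numerically. The sketch below (numpy; finite differences as a stand-in for L, with q(x) = x so q₋ = 0 and q₊ = 1) checks π²n² ≤ Eₙ ≤ π²n² + 1 for the first few eigenvalues:

```python
import numpy as np

N = 400
h = 1.0 / (N + 1)
x = h * np.arange(1, N + 1)             # interior grid points

# L = -d^2/dx^2 + q(x) with q(x) = x and Dirichlet boundary conditions
L = (np.diag(2.0 * np.ones(N)) - np.diag(np.ones(N - 1), 1)
     - np.diag(np.ones(N - 1), -1)) / h**2 + np.diag(x)

E = np.linalg.eigvalsh(L)[:5]
n = np.arange(1, 6)
```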
There is also an alternative version which can be proven similarly (Problem 3.18):
Theorem 3.16 (Courant Min-max principle). Let A be as in the previous
theorem. Then
αj = inf_{Vj⊂D(A), dim(Vj)=j} sup_{f∈Vj, ∥f∥=1} ⟨f, Af⟩,    (3.37)

where the inf is taken over subspaces with the indicated properties.
Problem* 3.18. Prove Theorem 3.16.
Problem 3.19. Suppose A, An are self-adjoint, bounded and An → A.
Then αk (An ) → αk (A). (Hint: For B self-adjoint ∥B∥ ≤ ε is equivalent to
−ε ≤ B ≤ ε.)

3.5. Singular value decomposition of compact operators


Our first aim is to find a generalization of Corollary 3.8 for general com-
pact operators between Hilbert spaces. The key observation is that if K ∈
K (H1 , H2 ) is compact, then K ∗ K ∈ K (H1 ) is compact and symmetric and
thus, by Corollary 3.8, there is a countable orthonormal set {uj } ⊂ H1 and
nonzero real numbers sj² such that

K*Kf = ∑_j sj² ⟨uj, f⟩uj.    (3.38)

Moreover, ∥Kuj∥² = ⟨uj, K*Kuj⟩ = ⟨uj, sj²uj⟩ = sj² shows that we can set

sj := ∥Kuj∥ > 0.    (3.39)
11Hermann Weyl (1885-1955), German mathematician, theoretical physicist and philosopher

The numbers sj = sj (K) are called singular values of K. There are either
finitely many singular values or they converge to zero.
Theorem 3.17 (Schmidt; Singular value decomposition of compact opera-
tors). Let K ∈ K (H1 , H2 ) be compact and let sj be the singular values of K
and {uj } ⊂ H1 corresponding orthonormal eigenvectors of K ∗ K. Then
K = ∑_j sj⟨uj, .⟩vj,    (3.40)

where vj = sj^{-1} Kuj. The norm of K is given by the largest singular value

∥K∥ = max_j sj(K).    (3.41)

Moreover, the vectors {vj} ⊂ H2 are again orthonormal and satisfy K*vj =
sj uj. In particular, vj are eigenvectors of KK* corresponding to the eigenvalues sj².

Proof. For any f ∈ H1 we can write

f = ∑_j ⟨uj, f⟩uj + f⊥

with f⊥ ∈ Ker(K*K) = Ker(K) (Problem 3.20). Then

Kf = ∑_j ⟨uj, f⟩Kuj = ∑_j sj⟨uj, f⟩vj

as required. Furthermore,

⟨vj, vk⟩ = (sj sk)^{-1}⟨Kuj, Kuk⟩ = (sj sk)^{-1}⟨K*Kuj, uk⟩ = sj sk^{-1}⟨uj, uk⟩

shows that {vj} are orthonormal. By definition K*vj = sj^{-1} K*Kuj = sj uj
which also shows KK*vj = sj Kuj = sj² vj.
Finally, (3.41) follows using Bessel's inequality

∥Kf∥² = ∥∑_j sj⟨uj, f⟩vj∥² = ∑_j sj²|⟨uj, f⟩|² ≤ (max_j sj(K))² ∥f∥²,

where equality holds for f = u_{j0} if s_{j0} = max_j sj(K). □
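Numerically, (3.40) is the familiar singular value decomposition of a matrix; a sketch with numpy's `svd` (random rectangular K, so H1 and H2 have different dimensions):

```python
import numpy as np

rng = np.random.default_rng(4)
K = rng.standard_normal((6, 4))

U, s, Vt = np.linalg.svd(K, full_matrices=False)
u = Vt                                  # rows: the vectors u_j in H1
v = U.T                                 # rows: the vectors v_j in H2

# K = sum_j s_j <u_j, .> v_j as a matrix, and ||K|| = max_j s_j
K_rebuilt = (U * s) @ Vt
op_norm = np.linalg.norm(K, 2)
```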
If K ∈ K (H) is self-adjoint, then uj = σj vj , σj2 = 1, are the eigenvectors
of K and σj sj are the corresponding eigenvalues. In particular, for a
self-adjoint operator the singular values are the absolute values of the
nonzero eigenvalues.

The above theorem also gives rise to the polar decomposition
    K = U |K| = |K ∗ |U,    (3.42)

where
√ X √ X
|K| := K ∗K = sj ⟨uj , .⟩uj , |K ∗ | = KK ∗ = sj ⟨vj , .⟩vj (3.43)
j j

are self-adjoint (in fact nonnegative) and


X
U := ⟨uj , .⟩vj (3.44)
j

is an isometry from Ran(K ∗ ) = span{uj } onto Ran(K) = span{vj }.
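Not from the text: the polar decomposition (3.42)–(3.44) can likewise be verified for matrices; a short sketch using numpy:

```python
import numpy as np

rng = np.random.default_rng(1)
K = rng.standard_normal((3, 3))
U_, s, Vt = np.linalg.svd(K)

absK  = Vt.T @ np.diag(s) @ Vt   # |K|  = sqrt(K*K)  = sum_j s_j <u_j,.> u_j
absKs = U_ @ np.diag(s) @ U_.T   # |K*| = sqrt(KK*)  = sum_j s_j <v_j,.> v_j
W = U_ @ Vt                      # the isometry U from (3.44)

assert np.allclose(K, W @ absK)        # K = U|K|
assert np.allclose(K, absKs @ W)       # K = |K*|U
assert np.allclose(W.T @ W, np.eye(3)) # W is an isometry (here even unitary)
```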
From the min-max theorem (Theorem 3.16) we obtain:
Lemma 3.18. Let K ∈ K (H1 , H2 ) be compact; then
    sj (K) = min_{f1 ,...,fj−1} max_{f ∈ U (f1 ,...,fj−1 )} ∥Kf ∥,    (3.45)
where U (f1 , . . . , fj ) := {f ∈ H1 | ∥f ∥ = 1, f ∈ span{f1 , . . . , fj }⊥ }.

In particular, note
    sj (AK) ≤ ∥A∥sj (K),    sj (KA) ≤ ∥A∥sj (K)    (3.46)
whenever K is compact and A is bounded (the second estimate follows from
the first by taking adjoints).
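A quick numerical sanity check of (3.46), not from the text (numpy, with a small tolerance for floating-point rounding):

```python
import numpy as np

rng = np.random.default_rng(2)
K = rng.standard_normal((4, 4))
A = rng.standard_normal((4, 4))

sv = lambda M: np.linalg.svd(M, compute_uv=False)  # descending singular values
normA = sv(A)[0]  # the operator norm of A is its largest singular value

# s_j(AK) <= ||A|| s_j(K) and s_j(KA) <= ||A|| s_j(K), cf. (3.46)
assert np.all(sv(A @ K) <= normA * sv(K) + 1e-10)
assert np.all(sv(K @ A) <= normA * sv(K) + 1e-10)
```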
An operator K ∈ L (H1 , H2 ) is called a finite rank operator if its
range is finite dimensional. The dimension
rank(K) := dim Ran(K)
is called the rank of K. Since for a compact operator
Ran(K) = span{vj } (3.47)
we see that a compact operator is finite rank if and only if the sum in (3.40)
is finite. Note that the finite rank operators form an ideal in L (H) just as
the compact operators do. Moreover, every finite rank operator is compact
by the Heine–Borel theorem (Theorem B.22).
Now truncating the sum in the canonical form gives us a simple way to
approximate compact operators by finite rank ones. Moreover, this is in fact
the best approximation within the class of finite rank operators:
Lemma 3.19 (Schmidt). Let K ∈ K (H1 , H2 ) be compact and let its singular
values be ordered. Then
    sj (K) = min_{rank(F )<j} ∥K − F ∥,    (3.48)
where the minimum is attained for
    Fj−1 := ∑_{k=1}^{j−1} sk ⟨uk , .⟩vk .    (3.49)
In particular, the closure of the ideal of finite rank operators in L (H) is the
ideal of compact operators.

Proof. That there is equality for F = Fj−1 follows from (3.41). In general,
the restriction of F to span{u1 , . . . , uj } will have a nontrivial kernel. Let
f = ∑_{k=1}^{j} αk uk be a normalized element of this kernel; then
    ∥(K − F )f ∥2 = ∥Kf ∥2 = ∑_{k=1}^{j} |αk sk |2 ≥ s2j .

In particular, every compact operator can be approximated by finite rank
ones, and since the limit of compact operators is compact, we cannot get
more than the compact operators. □
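In matrix terms Lemma 3.19 is the Eckart–Young theorem: truncating the singular value expansion gives the best approximation of a given rank, with error equal to the first omitted singular value. A numerical sketch (not from the text):

```python
import numpy as np

rng = np.random.default_rng(3)
K = rng.standard_normal((5, 4))
U, s, Vt = np.linalg.svd(K, full_matrices=False)  # s descending

for j in range(1, len(s) + 1):
    # F_{j-1} from (3.49): keep the first j-1 terms of the expansion.
    F = U[:, :j - 1] @ np.diag(s[:j - 1]) @ Vt[:j - 1, :]
    # The approximation error ||K - F_{j-1}|| is exactly s_j, cf. (3.48).
    assert np.isclose(np.linalg.norm(K - F, ord=2), s[j - 1])
```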
Two more consequences are worth noting.

Corollary 3.20. An operator K ∈ L (H1 , H2 ) is compact if and only if
K ∗ K is.

Proof. Just observe that K ∗ K compact is all that was used to show
Theorem 3.17. □
Corollary 3.21. An operator K ∈ L (H1 , H2 ) is compact (finite rank) if
and only if K ∗ ∈ L (H2 , H1 ) is. In fact, sj (K) = sj (K ∗ ) and
    K ∗ = ∑j sj ⟨vj , .⟩uj .    (3.50)

Proof. First of all note that (3.50) follows from (3.40) since taking adjoints
is continuous and (⟨uj , .⟩vj )∗ = ⟨vj , .⟩uj (cf. Problem 2.11). The rest is
straightforward. □
From this last lemma one easily gets a number of useful inequalities for
the singular values:

Corollary 3.22 (Weyl). Let K1 and K2 be compact and let sj (K1 ) and
sj (K2 ) be ordered. Then
    (i) sj+k−1 (K1 + K2 ) ≤ sj (K1 ) + sk (K2 ),
    (ii) sj+k−1 (K1 K2 ) ≤ sj (K1 )sk (K2 ),
    (iii) |sj (K1 ) − sj (K2 )| ≤ ∥K1 − K2 ∥.

Proof. Let F1 be of rank j − 1 and F2 of rank k − 1 such that ∥K1 − F1 ∥ =
sj (K1 ) and ∥K2 − F2 ∥ = sk (K2 ). Then sj+k−1 (K1 + K2 ) ≤ ∥(K1 + K2 ) −
(F1 + F2 )∥ ≤ ∥K1 − F1 ∥ + ∥K2 − F2 ∥ = sj (K1 ) + sk (K2 ), since F1 + F2 is
of rank at most j + k − 2.

Similarly, F := F1 (K2 − F2 ) + K1 F2 is of rank at most j + k − 2 and hence
sj+k−1 (K1 K2 ) ≤ ∥K1 K2 − F ∥ = ∥(K1 − F1 )(K2 − F2 )∥ ≤ ∥K1 − F1 ∥∥K2 −
F2 ∥ = sj (K1 )sk (K2 ).

Next, choosing k = 1 and replacing K2 → K2 − K1 in (i) gives sj (K2 ) ≤


sj (K1 ) + ∥K2 − K1 ∥. Reversing the roles gives sj (K1 ) ≤ sj (K2 ) + ∥K1 − K2 ∥
and proves (iii). □
Example 3.15. One might hope that item (i) from the previous corollary
can be improved to sj (K1 + K2 ) ≤ sj (K1 ) + sj (K2 ). However, this is not
the case, as the following example shows:
    K1 := (1 0)        K2 := (0 0)
          (0 0),             (0 1).
Then 1 = s2 (K1 + K2 ) ̸≤ s2 (K1 ) + s2 (K2 ) = 0. ⋄
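The three Weyl inequalities are easy to probe numerically; a sketch, not from the text (zero-based indices, so the 1-based index j + k − 1 becomes j + k):

```python
import numpy as np

rng = np.random.default_rng(4)
K1 = rng.standard_normal((4, 4))
K2 = rng.standard_normal((4, 4))

sv = lambda M: np.linalg.svd(M, compute_uv=False)  # descending singular values
s1, s2 = sv(K1), sv(K2)
ssum, sprod = sv(K1 + K2), sv(K1 @ K2)
eps = 1e-10  # tolerance for floating-point rounding

for j in range(4):
    for k in range(4 - j):
        assert ssum[j + k] <= s1[j] + s2[k] + eps   # (i)
        assert sprod[j + k] <= s1[j] * s2[k] + eps  # (ii)

# (iii): the singular values are Lipschitz in the operator norm.
assert np.all(np.abs(s1 - s2) <= sv(K1 - K2)[0] + eps)
```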
Problem* 3.20. Show that Ker(A∗ A) = Ker(A) for any A ∈ L (H1 , H2 ).
Problem 3.21. Let K be multiplication by a sequence k ∈ c0 (N) in the
Hilbert space ℓ2 (N). What are the singular values of K?
Problem 3.22. Let K be multiplication by a sequence k ∈ c0 (N) in the
Hilbert space ℓ2 (N) and consider L = KS − . What are the singular values of
L? Does L have any eigenvalues?
Problem 3.23. Let K ∈ K (H1 , H2 ) be compact and let its singular values
be ordered. Let M ⊆ H1 , N ⊆ H2 be subspaces with corresponding orthogonal
projections PM , PN , respectively. Then
    sj (K) = min_{dim(M )<j} ∥K − KPM ∥ = min_{dim(N )<j} ∥K − PN K∥,
where the minimum is taken over all subspaces with the indicated dimension.
Moreover, the minimum is attained for
M = span{uk }j−1