Relativity Notes for Physicists
Matthias Blau
Albert Einstein Center for Fundamental Physics
Institut für Theoretische Physik
Universität Bern
CH-3012 Bern, Switzerland
http://www.blau.itp.unibe.ch/Lecturenotes.html
Contents
0 Introduction
0.1 Prerequisites
0.2 Overview
0.3 Literature
0.4 References and Footnotes
0.5 Exercises
3 Tensor Algebra
3.1 Principle of General Covariance
3.2 Tensors
3.3 Tensor Algebra
3.4 Generally Covariant Integration and Volume Elements
3.5 Tensor Densities and Volume Elements
3.6 Towards a Coordinate-Independent Interpretation of Tensors
3.7 Multilinear Algebra and Tensors
3.8 Vielbeins and Orthonormal Frames
3.9 Epilogue: Indices? Indices!
4.2 Extension of the Covariant Derivative to Other Tensor Fields
4.3 Main Properties of the Covariant Derivative
4.4 Uniqueness of the Levi-Civita Connection (Christoffel symbols)
4.5 Tensor Analysis: Some Special Cases
4.6 Appendix: A Formula for the Variation of the Determinant
4.7 Covariant Differentiation Along a Curve
4.8 Parallel Transport and Geodesics
4.9 Example: Parallel Transport on the 2-Sphere
4.10 Fermi-Walker Parallel Transport
4.11 Epilogue: Manifolds? Think Globally, Act Locally!
8.1 Symmetries of a Metric (Isometries): Preliminary Remarks
8.2 Lie Derivative for Scalars
8.3 Lie Derivative for Vector Fields
8.4 Lie Derivative for other Tensor Fields
8.5 Lie Derivative of the Metric and Killing Vectors
8.6 Lie Derivative for Tensor Densities
14.3 Embeddings and Pull-Backs
14.4 Embedded Hypersurfaces and Normal Vectors
14.5 Hypersurface Orthogonality and Frobenius Integrability
20.6 Back to Gravity: Conjugate Momenta and Primary Constraints
20.7 Legendre Transform and ADM Hamiltonian
20.8 Secondary Constraints: the Hamiltonian and Momentum Constraints
20.9 Properties and Significance of the Constraints
20.10 Boundary Terms in the ADM Action and Hamiltonian
20.11 Alternative Derivation of the Hamiltonian Boundary Terms
20.12 Significance of the Hamiltonian Boundary Terms: ADM Energy
24.6 Bending of Light by a Star: 3 Derivations
24.7 A Unified Description in terms of the Runge-Lenz Vector
29 Black Holes IV: Other Black Hole Solutions (a brief overview)
29.1 Kerr-Newman Family of 4-dimensional Black Holes
29.2 Other 4-dimensional Solutions
29.3 Higher-dimensional Solutions
F: Cosmology
33.10 Comments on Cosmic Expansion as Expansion of Space
G: Varia
38.2.8 Interlude on (A)dS Schwarzschild
38.2.9 Painlevé-Gullstrand-like Coordinates
38.3 Some Coordinate Systems for anti-de Sitter space
38.3.1 Global (and Static) Coordinates
38.3.2 Conformal Coordinates, Conformal Boundary and Penrose Diagrams
38.3.3 Isotropic (Spatially Conformally Flat) Coordinates
38.3.4 Cosmological (Hyperbolic Slicing) Coordinates
38.3.5 de Sitter Slicing Coordinates
38.3.6 anti-de Sitter Slicing Coordinates
38.3.7 Poincaré Coordinates
38.3.8 Plane Wave AdS Coordinates
38.3.9 Codimension-2 Hyperbolic Slicing Coordinates
38.3.10 Painlevé-Gullstrand-like Coordinates?
38.4 Warped Products, Cones, and Maximal Symmetry
42.8 Plane Waves with more Isometries
0 Introduction
1905 was Einstein's magical year. In that year, he published three articles, on light
quanta, on the foundations of the theory of Special Relativity, and on Brownian motion,
each one separately worthy of a Nobel prize. Immediately after his work on Special
Relativity, Einstein started thinking about gravity and how to give it a relativistically
invariant formulation. He kept on working on this problem during the next ten years,
doing little else. This work, after many trials and errors, culminated in his masterpiece,
the General Theory of Relativity, presented in 1915/1916. It is widely considered to
be one of the greatest scientific and intellectual achievements of all time, a beautiful
theory derived from pure thought and physical intuition, capable of explaining, or at
least describing, still today, more than 100 years later, every aspect of gravitational
physics ever observed.
Einstein's key insight was what is now known as the Einstein Equivalence Principle, the
(local) equivalence of gravitation and inertia. This ultimately led him to the realisation
that gravity is best described and understood not as a physical external force like the
other forces of nature but rather as a manifestation of the geometry and curvature of
space-time itself. This realisation, in its simplicity and beauty, has had a profound
impact on theoretical physics as a whole, and Einstein's vision of a geometrisation of
all of physics is still with us today.
These lecture notes for an introductory course on General Relativity are based on a
course that I originally gave in the years 1998-2003 in the framework of the Diploma
Course of the ICTP (Trieste, Italy). Currently these notes form the basis of a course
that I teach as part of the Master in Theoretical Physics curriculum at the University
of Bern.
In the intervening years, I have made (and keep making) various additions to the lecture
notes, and they now include much more material than is needed for (or can realistically
be covered in) an introductory 1- or even 2-semester course, say, but I hope to have
nevertheless preserved (at least in parts) the introductory character and accessible style
of the original notes.
Invariably, any set of (introductory) lecture notes has its shortcomings, due to lack of
space and time, the requirements of the audience and the expertise (or lack thereof)
and interests of the lecturer. These lecture notes are, of course, no exception. In
particular, the emphasis in these notes is on developing the theory (I am a theoretical
physicist), not on experiments or connecting the theory with observation, but stops
short of doing real mathematical general relativity (i.e. proving theorems), as this would
require significantly more mathematical sophistication and machinery than I want to
assume (or can develop) in these notes. I hope that these lecture notes nevertheless
provide the necessary background for studying these or other more advanced topics not
covered in these notes.
I should also stress that I have written these notes primarily for myself, and for my
students. I am making them publicly available just in case somebody else happens to
find them useful, and because I know that previous versions of these notes have enjoyed
some popularity. However, if you do not like these notes or my way of explaining things,
or do not find what you are looking for, please do not complain to me (yes, this has
happened in the past). There will occasionally be further additions and updates to these
notes, reflecting however my personal preferences and taste rather than any (futile) aim
for completeness.
Lecture notes of this length unavoidably contain some minor mistakes somewhere.
However, I hope that these notes are free of major conceptual errors and blunders. I am
of course grateful for any constructive criticism and corrections. If you have such
comments, or also if you just happen to find these notes useful, please let me know (blau
at itp.unibe.ch).
0.1 Prerequisites
Special Relativity,
Lagrangian mechanics,
I will thus attempt to explain every single other thing that is required to understand
the basics of Einstein's theory of gravity. However, this also means that I will not be
able to discuss some mathematically more advanced and yet equally important aspects
of General Relativity.
0.2 Overview
E: Black Holes
F: Cosmology
G: Varia
I refer to the Table of Contents for rather detailed information about the contents of
the individual parts and sections of these notes and want to just provide some remarks
here for a first orientation.
Part A of the lecture notes is dedicated to explaining and exploring the consequences
of Einstein's insights into the relation between gravity and space-time geometry, and
to developing the machinery (of tensor calculus and Riemannian geometry) required to
describe physics in a curved space-time, i.e. in a gravitational field.
From about section 3 onwards, Part A can be read in parallel with other parts of
these notes which deal with various applications of General Relativity. In particular,
at this point in the course I find it useful to develop in parallel (and suggest to read in
parallel) the more formal material on tensor analysis in Part A, and Part D (dealing
with solar system tests of general relativity) - cf. the more detailed suggestions at the
end of section 2. Not only does this provide an interesting and physically relevant
application and illustration of the machinery developed so far, it also serves to provide
an appropriate balance between physics and formalism in the lectures.
The topics covered in Parts A and D, together with the first section (section 18) of Part C
dealing with the Einstein field equations, probably form the core of most introductory
courses on general relativity. This provides (or is meant to provide) the basis for other
applications or investigations of general relativity, and other sections of Part C and
Parts E-G provide a reasonably large variety of topics to choose from.
In Part B of the lecture notes I have collected a number of different more mathematical
topics that develop the formalism of tensor calculus and differential geometry in one
way or another. Strictly speaking, none of these topics are essential for understanding
some of the more elementary aspects of general relativity to be treated later on (so Part
B can also be regarded as a mathematical appendix to the notes). However, some of
them are required at a later stage to understand, or even formulate, certain somewhat
more advanced aspects of general relativity (and it is perhaps best to then go back to
this section if and when needed), and others are included simply because they are fun
or beautiful (or, usually, both).
0.3 Literature
Most of the material covered in these notes, in particular in the introductory parts, is
completely standard and can be found in many places. While my way of explaining
things is my own, and numerous gratuitous Remarks throughout the notes as well
as the selection of more advanced topics reflect my own interests, I make no claim to
major originality in these notes and have not attempted to reinvent the wheel.
In particular, in earlier versions of these notes the presentation of much of the
introductory material followed quite closely the treatment in Weinberg's classic
S. Weinberg, Gravitation and Cosmology
and readers familiar with this book may still recognise the similarities in some places.
Even though my own way of thinking about general relativity is much more geometric
(and this has definitely influenced later versions of and additions to these notes), I have
found that the pragmatic approach adopted by Weinberg is ideally suited to introduce
general relativity to students with little mathematical background.
As far as more recent and modern books are concerned, here is a short personal selection
of my favourites:
2. At an intermediate level (i.e. more or less at the level of these notes), my favourite
modern book is
E. Poisson, A Relativist's Toolkit: the Mathematics of Black Hole Mechanics
R. Wald, General Relativity
and I will frequently refer to these books in the body of the notes for discussions
of more advanced and/or more mathematical topics.
A. Pais, Subtle is the Lord: the science and life of Albert Einstein
As mentioned before, much of the material covered in these notes is quite standard,
and can be found in many places, and I have not attempted to provide references or
attributions for this.
Nevertheless, these lecture notes contain a large number of footnotes, with significantly
higher density in the sections of the notes dealing with more advanced and, specifically,
more recent developments. For the most part, these are meant as pointers to the
literature for further reading and with more information.
When referring to textbooks, I usually just refer to them in the form Author,
Title (as above), without indicating publisher, year, . . . If you actually need this
information, it will be easy for you to find it.
When referring to articles, if they are available from the preprint server at
http://arXiv.org/
I usually just refer to the arXiv number, regardless of whether or not that article
has been published elsewhere (this just reflects the by now standard practice that
people are more likely to first go there rather than to the library to look for or at
an article).
References to pre-arXiv articles are given in the traditional complete Author(s),
Title, Journal, . . . form.
0.5 Exercises
This simply reflects my own style of teaching, where exercises are very much integrated
into the course and mainly serve the purpose of getting students to look at what was done
in the course and to perhaps fill in some details that I skipped in class. In particular,
I am no fan of exercises that go significantly beyond what is covered in class or in the
notes (if it is relevant, I should explain or include it, if it is not then we may as well not
bother).
However, most (sub-)sections contain numerous Remarks, and many of them contain
supplementary and/or more advanced information and material, and these may be
regarded as (annotated) exercises or used as a basis for exercises.
A: Physics in a Gravitational Field
and General Covariance
1 From the Einstein Equivalence Principle to Geodesics
(but we will come back in some detail below to the question if/why the same mass
parameter m appears on both sides of this equation, so as to incorporate the observation,
going back to Galileo, that all bodies fall at the same rate in a gravitational field).
The latter is the Poisson equation
$$\Delta\phi = 4\pi G_N \mu = 4\pi G_N \rho , \qquad (1.2)$$
with $G_N$ denoting, here and throughout, Newton's constant, i.e. the gravitational
coupling constant, and where $\mu$ is the mass density, and $\rho = \mu c^2$ the associated
rest mass energy density - I will set $c = 1$ in the following and use $\rho$.
Let us start with the field equation. It is immediately evident that this cannot be the
final story. Not only is this equation not Lorentz invariant; because of the absence
of time-derivatives in (1.2), it actually describes an action at a distance and an
instantaneous propagation of the gravitational field to every point in space (if you wiggle
your mass distribution here now, this will immediately affect the gravitational potential
arbitrarily far away). This is something that Einstein had just successfully exorcised
from other aspects of physics, and clearly Newtonian gravity had to be revised as well.
It is then also immediately clear that what would have to replace Newton's theory is
something rather more complicated. The reason for this is that, according to Special
Relativity, mass is just another form of energy. Then, since gravity couples to masses, in
a relativistically invariant theory gravity will also have to couple to energy. In particular,
therefore, gravity would have to couple to gravitational energy, i.e. to itself. As a
consequence, the new gravitational field equations will, unlike Newton's, have to be
non-linear: the field of the sum of two masses cannot equal the sum of the gravitational
fields of the two masses because it should also take into account the gravitational energy
of the two-body system.
Now, having realised that Newton's theory cannot be the final word on the issue, how
does one go about finding a better theory?
I will first very briefly discuss (and then dismiss) what at first sight may appear to
be the most natural and naive approach to formulating a relativistic theory of gravity,
namely the simple replacement of Newton's field equation (1.2) by its relativistically
covariant version
$$\Box\phi = 4\pi G_N \mu = 4\pi G_N \rho , \qquad (1.3)$$
where $\Box$ is the Lorentz invariant d'Alembert or wave operator. While this looks
promising, something can't be quite right about this equation. We already know (from
Special Relativity) that $\rho$ is not a scalar but rather the 00-component of a tensor, the
energy-momentum tensor, so if $\rho$ actually appears on the right-hand side, $\phi$ cannot be
a scalar, while if $\phi$ is a scalar something needs to be done to fix the right-hand side.
Turning first to the latter possibility, one option that suggests itself is to replace
$\rho$ by the trace $T = T^\mu{}_\mu$ of the energy-momentum tensor. This is by definition /
construction a scalar, and it will agree with $\rho$ (up to a sign that depends on the
signature convention) in the non-relativistic limit (where rest mass dominates over other
contributions). Thus a first attempt at fixing the above equation might look like
$$\Box\phi = 4\pi G_N T . \qquad (1.4)$$
This is certainly an attractive equation, but it definitely has the drawback that it is
too linear. Recall from the discussion above that the universality of gravity (coupling
to all forms of matter) and the equivalence of mass and energy lead to the conclusion
that gravity should couple to gravitational energy, invariably predicting non-linear
(self-interacting) equations for the gravitational field. However, the left hand side could
be such that it only reduces to $\Box$ or $\Delta$ of the Newtonian potential in the Newtonian
limit of weak time-independent fields. Thus a second attempt at fixing the above equation
might look like
$$\Box\Phi(\phi) = 4\pi G_N T . \qquad (1.5)$$
Such a scalar relativistic theory of gravity and variants thereof were proposed and
studied among others by Abraham, Mie, and Nordström. As it stands, this field equation
appears to be perfectly consistent (and it may be interesting to discuss if/how the
Einstein equivalence principle, which will put us on our route towards metrics and
space-time curvature is realised in such a theory). However, regardless of this, this
theory is incorrect simply because it is ruled out experimentally. The easiest way to see
this (with hindsight) is to note that the energy-momentum tensor of Maxwell theory
(6.47) is traceless (6.121), and thus the above equation would predict no coupling of
gravity to the electro-magnetic field, in particular to light, hence in such a theory there
would be no deflection of light by the sun etc.1
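The tracelessness invoked here can be checked numerically. The following is a minimal sketch, not the derivation referred to in (6.121): it assumes the standard form $T_{ab} = F_{ac}F_b{}^c - \frac{1}{4}\eta_{ab}F_{cd}F^{cd}$ with the mostly-plus metric, and the sign convention relating $F_{ab}$ to E and B as well as all function names are illustrative choices.

```python
# Minkowski metric in the mostly-plus convention: diag(-1, +1, +1, +1).
ETA = [[0.0] * 4 for _ in range(4)]
ETA[0][0] = -1.0
for k in range(1, 4):
    ETA[k][k] = 1.0


def field_strength(E, B):
    """Covariant F_ab built from E and B (one common sign convention, assumed here)."""
    Ex, Ey, Ez = E
    Bx, By, Bz = B
    return [
        [0.0, -Ex, -Ey, -Ez],
        [Ex, 0.0, Bz, -By],
        [Ey, -Bz, 0.0, Bx],
        [Ez, By, -Bx, 0.0],
    ]


def maxwell_energy_momentum(F):
    """T_ab = F_ac F_b^c - (1/4) eta_ab F_cd F^cd (assumed normalisation)."""
    idx = range(4)
    # Mixed components F_b^c = F_bd eta^dc; eta is its own inverse.
    F_mixed = [[sum(F[b][d] * ETA[d][c] for d in idx) for c in idx] for b in idx]
    # Fully raised components F^cd = eta^ca F_a^d.
    F_up = [[sum(ETA[c][a] * F_mixed[a][d] for a in idx) for d in idx] for c in idx]
    F_squared = sum(F[c][d] * F_up[c][d] for c in idx for d in idx)
    return [
        [sum(F[a][c] * F_mixed[b][c] for c in idx) - 0.25 * ETA[a][b] * F_squared
         for b in idx]
        for a in idx
    ]


def trace(T):
    """Lorentz-invariant trace eta^ab T_ab."""
    return sum(ETA[a][b] * T[a][b] for a in range(4) for b in range(4))
```

For any choice of E and B, `trace(maxwell_energy_momentum(field_strength(E, B)))` vanishes up to rounding, while the 00-component reproduces the energy density $(E^2 + B^2)/2$ in these units. The cancellation relies on the factor 1/4 matching the number of space-time dimensions.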
1 For more on the history and properties of scalar theories of gravity see the review by
D. Giulini, What is (not) wrong with scalar gravity?, arXiv:gr-qc/0611100.
The other possibility to render (1.3) consistent is the, a priori perhaps much less
compelling, option to think of $\rho$ and $\phi$ or $\Delta\phi$ not as scalars but as
(00)-components of some tensor, in which case one could try to salvage (1.3) by promoting
it to a tensorial equation
$$\{\text{Some tensor generalising } \Delta\phi\}_{\mu\nu} \sim 4\pi G_N T_{\mu\nu} . \qquad (1.6)$$
This is indeed the form of the field equations for gravity (the Einstein equations) we
will ultimately be led to (see section 18.4), but Einstein arrived at this in a completely
different, and much more insightful, way.
Let us now, very briefly and in a streamlined way, try to retrace (one aspect of) Einstein's
thoughts, namely on the relation between inertial and gravitational mass, which, as we
will see, will lead us rather quickly to the geometric picture of gravity sketched in the
Introduction.
To that end we return to the Newtonian equation of motion (1.1). Recall that in this
Newtonian theory, there are two a priori completely independent concepts of mass:
inertial mass $m_i$ (or acceleration mass), which accounts for the resistance of a body
or particle against acceleration and appears universally on the left-hand side of
the Newtonian equation of motion
$$m_i \vec{a} = \vec{F} \qquad (1.7)$$
gravitational mass $m_g$, which is the mass the gravitational field couples to, i.e. it
is the gravitational charge of a particle,
$$\vec{F}_g = m_g \vec{g} . \qquad (1.8)$$
Now it is an important empirical fact that the inertial mass of a body is equal to its
gravitational mass. This realisation, at least with this clarity, is usually attributed
to Newton, although it goes back to experiments and observations by Galileo usually
paraphrased as "all bodies fall at the same rate in a gravitational field". (It is not true,
though, that Galileo dropped objects from the leaning tower of Pisa to test this - he
used an inclined plane, a water clock and a pendulum).
Figure 1: Experimenter and his two stones freely floating somewhere in outer space, i.e.
in the absence of forces.
is perfectly acceptable for any ratio $q_e/m_i$, and Einstein was very impressed with the
observed equality of $m_i$ and $m_g$. This should, he reasoned, not be a mere coincidence
but is probably trying to tell us something rather deep about the nature of gravity.
With his unequalled talent for discovering profound truths in simple observations, he
concluded (calling this "der glücklichste Gedanke meines Lebens" (the happiest thought
of my life)) that the equality of inertial and gravitational mass suggests a close relation
between inertia and gravity itself, suggests, in fact, that locally effects of gravity and
acceleration are indistinguishable,
2. Now assume (Figure 2) that somebody on the outside suddenly pulls the box up
with a constant acceleration. Then of course, our friend will be pressed to the
bottom of the elevator with a constant force and he will also see his stones drop
to the floor.
Figure 2: Constant acceleration upwards mimics the effect of a gravitational field:
experimenter and stones drop to the bottom of the box.
3. Now consider (Figure 3) this same box brought into a constant gravitational field.
Then again, he will be pressed to the bottom of the elevator with a constant force
and he will see his stones drop to the floor. With no experiment inside the elevator
can he decide if this is actually due to a gravitational field or due to the fact that
somebody is pulling the elevator upwards.
Thus our first lesson is that, indeed, locally the effects of acceleration and gravity
are indistinguishable.
4. Now consider somebody cutting the cable of the elevator (Figure 4). Then the
elevator will fall freely downwards but, as in Figure 1, our experimenter and his
stones will float as in the absence of gravity.
Thus lesson number two is that, locally the effect of gravity can be eliminated
by going to a freely falling reference frame (or coordinate system). This should
not come as a surprise. In the Newtonian theory, if the free fall in a constant
gravitational field is described by the equation
$$\ddot{x} = g \quad (+\ \text{other forces}) , \qquad (1.10)$$
and the effect of gravity has been eliminated by going to the freely falling
coordinate system $\xi$. The crucial point here is that in such a reference frame not only
our observer will float freely, but because of the equality of inertial and gravitational
mass he will also observe all other objects obeying the usual laws of motion in the
absence of gravity.
Figure 3: Effect of a constant gravitational field: indistinguishable for our experimenter
from that of a constant acceleration in Figure 2.
Figure 4: Free fall in a gravitational field has the same effect as no gravitational field
(Figure 1): experimenter and stones float.
Figure 5: Experimenter and his stones in a non-uniform gravitational field: the stones
will approach each other slightly as they fall to the bottom of the elevator.
5. In the above discussion, I have put the emphasis on "constant" accelerations and
on "locally". To see the significance of this, consider our experimenter with his
elevator in the gravitational field of the earth (Figure 5). This gravitational field
is not constant but spherically symmetric, pointing towards the center of the
earth. Therefore the stones will slightly approach each other as they fall towards
the bottom of the elevator, in the direction of the center of the gravitational field.
6. Thus, if somebody cuts the cable now and the elevator is again in free fall (Figure
6), our experimenter will float again, so will the stones, but our experimenter will
also notice that the stones move closer together for some reason. He will have to
conclude that there is some force responsible for this.
This is lesson number three: in a non-uniform gravitational field the effects of
gravity cannot be eliminated by going to a freely falling coordinate system. This
is only possible locally, on such scales on which the gravitational field is essentially
constant.
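Lessons two and three can be illustrated with a small numerical sketch. It assumes the freely falling coordinate $\xi = x - g t^2/2$ for the uniform field of (1.10), and models the non-uniform field by a point mass at the origin; the default values of `g` and `GM` (roughly those of the earth) and all function names are illustrative assumptions, not part of the notes.

```python
import math


def position(t, x0, v0, g):
    """Solution of x'' = g, the form used in (1.10), with initial data x0, v0."""
    return x0 + v0 * t + 0.5 * g * t * t


def freely_falling_coordinate(x, t, g):
    """Assumed freely falling coordinate xi = x - g t^2 / 2."""
    return x - 0.5 * g * t * t


def second_derivative(f, t, h=1e-2):
    """Central finite-difference estimate of f''(t)."""
    return (f(t + h) - 2.0 * f(t) + f(t - h)) / (h * h)


def uniform_field(pos, g=9.81):
    """Constant acceleration g in the -z direction, independent of position."""
    return (0.0, -g)


def central_field(pos, GM=3.986e14):
    """Acceleration towards the origin produced by a point mass (non-uniform field)."""
    x, z = pos
    r = math.hypot(x, z)
    return (-GM * x / r**3, -GM * z / r**3)


def relative_acceleration(field, pos_a, pos_b):
    """Acceleration of stone b relative to stone a, i.e. what survives in free fall."""
    ax, az = field(pos_a)
    bx, bz = field(pos_b)
    return (bx - ax, bz - az)
```

In the uniform field the second derivative of $\xi(t)$ vanishes and `relative_acceleration(uniform_field, a, b)` is zero for any two positions. For two stones placed side by side in the central field, the horizontal component of the relative acceleration is negative for the stone on the right, so the stones approach each other even though both are in free fall.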
Einstein formalised the outcome of these thought experiments in what is now known as
the Einstein Equivalence Principle which roughly states that physics in a freely falling
Figure 6: Experimenter and stones freely falling in a non-uniform gravitational field.
The experimenter floats, so do the stones, but they move closer together, indicating the
presence of some external force.
and
There are different versions of this principle depending on what precisely one means by
"the laws of nature". If one just means the laws of Newtonian (or relativistic) mechanics,
then this principle essentially reduces to the statement that inertial and gravitational
mass are equal. Usually, however, this statement is taken to imply also Maxwell's
theory, quantum mechanics etc.4 What it pragmatically asserts in one of its stronger
forms is that
2 S. Weinberg, Gravitation and Cosmology.
3 J. Hartle, Gravity. An Introduction to Einstein's General Relativity.
The power of the above principle, which we will regard as a heuristic guideline, rather
than trying to (prematurely) give it a mathematically precise formulation, lies in the
fact that we can combine it with our understanding of physics in accelerated reference
systems to gain insight into the physics in a gravitational field. Two immediate
consequences of this (which cannot be derived on the basis of Newtonian physics or Special
Relativity alone) are
To see the inevitability of the first assertion, imagine a light ray entering the rocket /
elevator in Figure 1 horizontally through a window on the left hand side and exiting
again at the same height through a window on the right. Now imagine, as in Figure
2, accelerating the elevator upwards. Then clearly the light ray that enters on the left
will exit at a lower point of the elevator on the right because the elevator is accelerating
upwards. By the equivalence principle one should observe exactly the same thing in a
constant gravitational field (Figure 3). It follows that in a gravitational field the light
ray is bent downwards, i.e. it experiences a downward acceleration with the (locally
constant) gravitational acceleration g.
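To get a feeling for the size of this effect, the elevator argument can be turned into a one-line estimate. The sketch below is a non-relativistic approximation under stated assumptions: the elevator has width `width`, the ray needs a time `width / C` to cross it, and during that time the elevator rises by $g t^2/2$. The function name and the sample values are illustrative.

```python
C = 299_792_458.0  # speed of light in m/s


def light_ray_drop(width, g):
    """Distance (in metres) by which the exit point lies below the entry point.

    Non-relativistic estimate: the ray crosses the elevator in t = width / C,
    during which the uniformly accelerated elevator rises by g t^2 / 2.
    """
    t = width / C
    return 0.5 * g * t * t
```

For an elevator a few metres wide and $g$ of the order of 10 m/s², `light_ray_drop` returns a value of the order of $10^{-16}$ m, which is why the bending of light only becomes observable over astronomical distances.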
To understand the second assertion, one can e.g. simply appeal to the so-called twin-
paradox of Special Relativity: the accelerated twin is younger than his unaccelerated
inertial sibling. Hence accelerated clocks run slower than inertial clocks. Hence, by
the equivalence principle, clocks in a gravitational field run slower than clocks in the
absence of gravity.
4 For a discussion of different formulations of the equivalence principle and the logical
relations among them, see E. di Casola, S. Liberati, S. Sonego, Nonequivalence of
equivalence principles, arXiv:1310.7426 [gr-qc].
Alternatively, one can imagine two observers at the top and bottom of the elevator,
having identical clocks and sending light signals to each other at regular intervals as
determined by their clocks. Once the elevator accelerates upwards, the observer at
the bottom will receive the signals at a higher rate than he emits them (because he
is accelerating towards the signals he receives), and he will interpret this as his clock
running more slowly than that of the observer at the top. By the equivalence principle,
the same conclusion now applies to two observers at different heights in a gravitational
field. This can also be interpreted in terms of a gravitational redshift or blueshift
(photons losing or gaining energy by climbing or falling in a gravitational field), and we
will return to a more quantitative discussion of this effect in section 2.9.
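A rough leading-order estimate of this effect follows directly from the elevator picture. The sketch below assumes that a signal needs a time of about $h/c$ to travel the height $h$, that the receiver gains a velocity $g h/c$ in that time, and that the first-order Doppler formula applies; it is not the quantitative treatment of section 2.9, and the function name is illustrative.

```python
C = 299_792_458.0  # speed of light in m/s


def fractional_rate_shift(height, g):
    """Leading-order fractional excess g h / c^2 in the rate of received signals.

    The lower observer, accelerating towards the signals, picks up a velocity
    v = g * height / C during the light travel time, and the first-order
    Doppler shift is v / C.
    """
    return g * height / C**2
```

For a height difference of 10 m and $g$ = 9.81 m/s², the fractional shift is of the order of $10^{-15}$, which indicates the precision required to observe the effect in a laboratory.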
What the equivalence principle tells us is that we can expect to learn something about
the effects of gravitation by transforming the laws of nature (equations of motion) from
an inertial Cartesian coordinate system to other (accelerated, curvilinear) coordinates.
As a first step, we will, in section 1.3 below, discuss the above example of an observer
undergoing constant acceleration in the context of special relativity.
As a preparation for this, and the remainder of the course, this section will provide a
lightning review of the Lorentz-covariant formulation of special relativity, mainly to set
the notation and conventions that will be used throughout, and only to the extent that
it will be used in the following.
1. Minkowski space(-time)
$$(\xi^a) = (\xi^0 = ct, \xi^k = x^k) , \qquad (1.13)$$
where $c$ is the speed of light. Typically in these notes $\xi^a$ will indicate such
a (locally) inertial coordinate system, whereas generic coordinates will be
called $x^\mu$ etc. We will almost always work in units in which $c = 1$.
(b) Minkowski space is equipped with a prescription for measuring distances,
encoded in a line-element which, in these coordinates, takes the form
$$ds^2 = -(d\xi^0)^2 + \sum_k (d\xi^k)^2 . \qquad (1.14)$$
with metric $(\eta_{ab}) = \mathrm{diag}(-1, +1, +1, +1)$ or, more explicitly,
$$\eta_{ab} = \begin{pmatrix} -1 & 0 & 0 & 0 \\ 0 & +1 & 0 & 0 \\ 0 & 0 & +1 & 0 \\ 0 & 0 & 0 & +1 \end{pmatrix} \qquad (1.16)$$
(thus we are using the mostly plus convention).
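As a concrete illustration of the mostly plus convention, the line-element can be evaluated for a given coordinate displacement. This is a minimal sketch in units with $c = 1$; the names `ETA` and `line_element` are illustrative.

```python
# Minkowski metric eta_ab = diag(-1, +1, +1, +1), mostly plus convention.
ETA = (
    (-1, 0, 0, 0),
    (0, 1, 0, 0),
    (0, 0, 1, 0),
    (0, 0, 0, 1),
)


def line_element(dxi):
    """ds^2 = eta_ab dxi^a dxi^b for a displacement dxi = (dxi^0, ..., dxi^3)."""
    return sum(ETA[a][b] * dxi[a] * dxi[b] for a in range(4) for b in range(4))
```

A purely temporal displacement gives a negative value, a purely spatial one a positive value, and a displacement along a light ray gives zero.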
2. Lorentz Transformations
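$$\xi^a \mapsto \bar{\xi}^a = L^a{}_b \xi^b \qquad (1.17)$$
$$d\bar{s}^2 \equiv \eta_{ab} d\bar{\xi}^a d\bar{\xi}^b = \eta_{ab} d\xi^a d\xi^b = ds^2 \quad\Leftrightarrow\quad \eta_{ab} L^a{}_c L^b{}_d = \eta_{cd} . \qquad (1.18)$$
$$\bar{\xi} = L\xi , \quad L^t \eta L = \eta \qquad (1.19)$$
$$(1 + \omega)^t \eta (1 + \omega) = \eta \quad\Rightarrow\quad (\eta\omega) + (\eta\omega)^t = 0 . \qquad (1.21)$$
$$\delta\xi^a = \omega^a{}_b \xi^b \quad\text{with}\quad \omega_{ab} \equiv \eta_{ac}\omega^c{}_b = -\omega_{ba} . \qquad (1.22)$$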
(c) Poincaré transformations are those affine transformations that leave the Minkowski
line-element invariant. They are composed of Lorentz transformations
and arbitrary constant translations and thus have the form
$$ \xi^a \mapsto \bar{\xi}^a = L^a{}_b\, \xi^b + \zeta^a , \tag{1.23} $$
infinitesimally
$$ \delta\xi^a = \omega^a{}_b\, \xi^b + \epsilon^a . \tag{1.24} $$
Any two inertial systems in the sense of the equivalence principle of special
relativity are related by a Poincaré transformation.
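The defining relation $L^t \eta L = \eta$ of (1.19) is easy to check numerically. The following is a minimal sketch in plain Python, assuming units with $c = 1$ and a standard boost along the $\xi^1$-axis; the helper names (`boost_x`, `is_lorentz`, `interval`, `apply`) are illustrative and not part of the notes.

```python
import math

# Minkowski metric eta_ab = diag(-1, +1, +1, +1) (mostly plus), units c = 1.
ETA = [[-1.0, 0.0, 0.0, 0.0],
       [0.0, 1.0, 0.0, 0.0],
       [0.0, 0.0, 1.0, 0.0],
       [0.0, 0.0, 0.0, 1.0]]


def boost_x(beta):
    """Boost with velocity beta along the xi^1-axis (illustrative helper)."""
    gamma = 1.0 / math.sqrt(1.0 - beta * beta)
    return [[gamma, -gamma * beta, 0.0, 0.0],
            [-gamma * beta, gamma, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]


def matmul(A, B):
    """Plain matrix product of two nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def transpose(A):
    return [list(col) for col in zip(*A)]


def is_lorentz(L, tol=1e-12):
    """Check the defining relation L^t eta L = eta, cf. (1.19)."""
    M = matmul(transpose(L), matmul(ETA, L))
    return all(abs(M[i][j] - ETA[i][j]) < tol
               for i in range(4) for j in range(4))


def interval(xi):
    """eta_ab xi^a xi^b for a 4-component list xi (eta is diagonal)."""
    return sum(ETA[a][a] * xi[a] * xi[a] for a in range(4))


def apply(L, xi):
    """Components of the transformed vector, L^a_b xi^b."""
    return [sum(L[a][b] * xi[b] for b in range(4)) for a in range(4)]
```

For instance, `is_lorentz(boost_x(0.6))` holds, while an ordinary rotation matrix that mixes the $\xi^0$ and $\xi^1$ components fails the test, since it preserves the Euclidean rather than the Minkowski line-element.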
(a) The Minkowski metric defines the Lorentz (and Poincaré) invariant distance
$$ (\Delta\xi)^2 = \eta_{ab}(\xi_P^a - \xi_Q^a)(\xi_P^b - \xi_Q^b) \tag{1.25} $$
(b) Depending on the sign of $(\Delta\xi)^2$, the two events $P, Q$ are called spacelike,
lightlike (null) or timelike separated,
$$ (\Delta\xi)^2 \;\begin{cases} > 0 & \text{spacelike separated} \\ = 0 & \text{lightlike separated} \\ < 0 & \text{timelike separated} \end{cases} \tag{1.26} $$
(c) The set of events that are lightlike separated from $P$ defines the lightcone
at $P$. It consists of two components (joined at $P$), the future and the past
lightcone, distinguished by the sign of $\xi_Q^0 - \xi_P^0$ (positive for $Q$ on the future
lightcone, $\xi_Q^0 - \xi_P^0 > 0$, negative for $Q$ on the past lightcone).
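The classification (1.26) and the future/past distinction can be written out as a short sketch. It assumes units with $c = 1$ and events given as 4-tuples $(\xi^0, \xi^1, \xi^2, \xi^3)$; the function names and the tolerance `tol` are illustrative choices, not part of the notes.

```python
def interval_squared(xi_p, xi_q):
    """(Delta xi)^2 = eta_ab (xi_P - xi_Q)^a (xi_P - xi_Q)^b, cf. (1.25)."""
    d = [p - q for p, q in zip(xi_p, xi_q)]
    return -d[0] ** 2 + sum(x * x for x in d[1:])


def classify_separation(xi_p, xi_q, tol=1e-12):
    """Return 'spacelike', 'lightlike' or 'timelike', cf. (1.26)."""
    s2 = interval_squared(xi_p, xi_q)
    if abs(s2) <= tol:
        return "lightlike"
    return "spacelike" if s2 > 0 else "timelike"


def lightcone_component(xi_p, xi_q, tol=1e-12):
    """For lightlike separated events: is Q on the future or past lightcone of P?"""
    if classify_separation(xi_p, xi_q, tol) != "lightlike":
        return None
    dt = xi_q[0] - xi_p[0]
    if dt > 0:
        return "future"
    if dt < 0:
        return "past"
    return None  # Q coincides with P, the tip of the cone
```

With $P$ at the origin, the event $(5, 3, 4, 0)$ is lightlike separated from $P$ and lies on its future lightcone, since $-25 + 9 + 16 = 0$ and $\xi_Q^0 - \xi_P^0 > 0$.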
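$$ \xi'^a(\lambda_0) = \frac{d}{d\lambda}\xi^a(\lambda)\Big|_{\lambda=\lambda_0} . \tag{1.27} $$
It is called spacelike, lightlike (null) or timelike, depending on the sign of
$\eta_{ab}\xi'^a\xi'^b$,
$$ \eta_{ab}\xi'^a\xi'^b \;\begin{cases} > 0 & \text{spacelike} \\ = 0 & \text{lightlike} \\ < 0 & \text{timelike} \end{cases} \tag{1.28} $$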
This sign (and hence this classification) depends only on the image of the
curve, not its parametrisation.
(b) A curve whose tangent vector is everywhere timelike is called a timelike curve
(and likewise for lightlike and spacelike curves). A curve whose tangent
vector is everywhere timelike or null (i.e. non-spacelike) is called a causal
curve. Worldlines of massive particles are timelike curves, those of massless
particles (light) are null curves.
(c) A natural Lorentz-invariant parametrisation of timelike curves is provided by
the Lorentz-invariant proper time along the curves,
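$$ \xi^a = \xi^a(\tau) , \tag{1.29} $$
with
$$ c\,d\tau = \sqrt{-ds^2} = \sqrt{-\eta_{ab}\, d\xi^a d\xi^b} = \sqrt{-\eta_{ab}\,\xi'^a\xi'^b}\; d\lambda \quad\Rightarrow\quad \eta_{ab}\frac{d\xi^a(\tau)}{d\tau}\frac{d\xi^b(\tau)}{d\tau} = -c^2 . \tag{1.30} $$
Likewise spacelike curves are naturally parametrised by proper distance $ds$.
The derivative with respect to proper time will be denoted by an overdot,
$$ \dot{\xi}^a(\tau) = \frac{d}{d\tau}\xi^a(\tau) . \tag{1.31} $$
$$ \dot{\bar{\xi}}^a(\tau) = \frac{d}{d\tau}\bar{\xi}^a(\tau) = \frac{\partial\bar{\xi}^a}{\partial\xi^b}\frac{d}{d\tau}\xi^b(\tau) = L^a{}_b\,\dot{\xi}^b(\tau) . \tag{1.32} $$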
These are the prototypes of what are called Lorentz vectors or, more gener-
ally, Lorentz tensors.
5. Lorentz Vectors
(a) Lorentz vectors (or 4-vectors) are objects with components $v^a$ which transform
under Lorentz transformations with the matrix $L^a{}_b$ (to be thought of as
the Jacobian of the transformation relating $\bar{\xi}^a$ and $\xi^a$),
$$ \bar{v}^a = L^a{}_b\, v^b . \tag{1.33} $$
(a) Lorentz scalars are objects that are invariant under Lorentz transformations.
Examples are scalar products and norms of Lorentz vectors.
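(b) Lorentz covectors are objects $u_a$ that transform under Lorentz transformations
with the (contragredient = transpose inverse) representation
$$ \Lambda = (L^t)^{-1} = \eta L \eta^{-1} , \tag{1.35} $$
i.e.
$$ \bar{u}_a = \Lambda_a{}^b\, u_b , \qquad \Lambda_a{}^b = \eta_{ac} L^c{}_d\, \eta^{db} , \tag{1.36} $$
$$ L^t\eta L = \eta \quad\Rightarrow\quad \Lambda\eta\Lambda^t = \eta \quad\Leftrightarrow\quad \Lambda_a{}^c\Lambda_b{}^d\,\eta_{cd} = \eta_{ab} . \tag{1.37} $$
$$ u : v \in V \mapsto u(v) = u_a v^a \in \mathbb{R} . \tag{1.38} $$
Linear combinations of (p, q)-tensors are again (p, q)-tensors. Arbitrary products
and contractions of Lorentz tensors are again Lorentz tensors (and the
tensor type can be read off from the number and position of the free indices).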
7. Tensor Fields
(a) Lorentz tensor fields are assignments of Lorentz tensors to each point of
Minkowski space,
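$$ T : \xi \mapsto T^{a_1\ldots a_p}{}_{c_1\ldots c_q}(\xi) . \tag{1.41} $$
(b) Given a vector field $V^a(\xi)$, $\eta_{ab}V^a(\xi)V^b(\xi)$ is an example of a scalar field, and
given a scalar field $f(\xi)$, its partial derivatives give a covector field
$$ U_a(\xi) = \frac{\partial}{\partial\xi^a} f(\xi) \equiv \partial_a f(\xi) \tag{1.42} $$
$$ \Box = \eta^{ab}\partial_a\partial_b \tag{1.44} $$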
are Lorentz invariant in the sense that they are satisfied in one inertial system
iff they are satisfied in all inertial systems.
$$ u^a \equiv \dot{\xi}^a(\tau) \tag{1.46} $$
$$ u^a u_a \equiv \eta_{ab}u^a u^b = -c^2 . \tag{1.47} $$
$$ a^c u_c \equiv \eta_{cb}\, a^c u^b = 0 , \tag{1.50} $$
(c) The action for a free massive particle with worldline $\xi^a(\tau)$ is essentially the
total proper time along the path,
$$ S[\xi] = -mc^2\int d\tau = -mc\int\sqrt{-\eta_{ab}\, d\xi^a d\xi^b} , \tag{1.52} $$
9. Energy-Momentum 4-Vector
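$$ p_a = \frac{\partial L}{\partial(d\xi^a/d\tau)} = m\,\eta_{ab}u^b \quad\Leftrightarrow\quad p^a = m u^a = m(d\xi^a/d\tau) \tag{1.55} $$
(c)
$$ p_k = \frac{\partial L_t}{\partial v^k} = m\gamma(v) v_k , \tag{1.57} $$
$$ \eta_{ab}p^a p^b = -m^2c^2 \quad\Leftrightarrow\quad E^2 = m^2c^4 + \vec{p}^{\,2}c^2 . \tag{1.59} $$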
We return to the issue discussed in the context of the Einstein equivalence principle in
section 1.1, namely physics as experienced by an observer undergoing constant acceleration
(as a precursor to studying this observer in a genuine gravitational field), now
specifically within the framework of special relativity.
Specialising (1.50) to an observer accelerating in the $\xi^1$-direction (so that in the
momentary restframe of this observer one has $u^a = (1, 0, 0, 0)$, $a^a = (0, a, 0, 0)$), we will say
that the observer undergoes constant acceleration if $a$ is time-independent. To determine
the worldline of such an observer, we note that the general solution to (1.47) with
$u^2 = u^3 = 0$,
$$ \eta_{ab}u^a u^b = -(u^0)^2 + (u^1)^2 = -1 , \tag{1.60} $$
is
$$ u^0 = \cosh F(\tau) , \qquad u^1 = \sinh F(\tau) \tag{1.61} $$
for some function $F(\tau)$. Thus the acceleration is
$$ a^a = \dot{u}^a = \dot{F}(\tau)\,(\sinh F(\tau), \cosh F(\tau), 0, 0) , \tag{1.62} $$
with norm
$$ a^2 = \dot{F}^2 , \tag{1.63} $$
and an observer with constant acceleration is characterised by $F(\tau) = a\tau$. Integrating
$u^a = \dot{\xi}^a$ with a suitable choice of integration constants, the worldline is the hyperbola
$$ \eta_{ab}\xi^a\xi^b = -(\xi^0)^2 + (\xi^1)^2 = a^{-2} . \tag{1.66} $$
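The chain of statements (1.60)-(1.66) can be checked numerically. The sketch below assumes $c = 1$, $F(\tau) = a\tau$, and the particular choice of integration constants $\xi^0 = a^{-1}\sinh a\tau$, $\xi^1 = a^{-1}\cosh a\tau$ (an assumption made here for illustration, consistent with (1.66)); only the $(\xi^0, \xi^1)$ components are kept, and the function names are hypothetical.

```python
import math


def worldline(a, tau):
    """(xi^0, xi^1) of the accelerated observer, assumed integration constants."""
    return (math.sinh(a * tau) / a, math.cosh(a * tau) / a)


def velocity(a, tau):
    """u^a = d xi^a / d tau, cf. (1.61) with F(tau) = a tau."""
    return (math.cosh(a * tau), math.sinh(a * tau))


def acceleration(a, tau):
    """a^a = d u^a / d tau."""
    return (a * math.sinh(a * tau), a * math.cosh(a * tau))


def minkowski(v, w):
    """eta_ab v^a w^b restricted to the (xi^0, xi^1)-plane."""
    return -v[0] * w[0] + v[1] * w[1]
```

For any `a` and `tau` one then finds `minkowski(u, u) = -1`, `minkowski(u, acc) = 0`, `minkowski(acc, acc) = a**2` and `minkowski(xi, xi) = 1 / a**2`, up to rounding errors.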
We can now ask the question what the Minkowski metric or line-element looks like in
the restframe of such an observer. Note that one cannot expect this to be again the
constant Minkowski metric $\eta_{ab}$: the transformation to an accelerated reference system,
while certainly allowed in special relativity, is not a Lorentz transformation, while $\eta_{ab}$
is, by definition, invariant under Lorentz-transformations.
We are thus looking for coordinates that are adapted to these accelerated observers
in the same way that the inertial coordinates are adapted to static observers ($\xi^0$ is
proper time, and the spatial components $\xi^i$ remain constant). In other words, we seek a
coordinate transformation $(\xi^0, \xi^1) \to (\eta, \rho)$ such that the worldlines of these accelerated
observers are characterised by $\rho$ = constant (this is what we mean by restframe, the
observer stays at a fixed value of $\rho$) and ideally such that then $\eta$ is proportional to the
proper time of the observer.
[Figure 7: image not reproduced; labels in the plot: worldline of a stationary observer, $\eta$ constant, $\rho$ constant]
Figure 7: Rindler metric: Rindler coordinates $(\eta, \rho)$ cover the first quadrant $\xi^1 > |\xi^0|$.
Indicated are lines of constant $\rho$ (hyperbolas, worldlines of constantly accelerating
observers) and lines of constant $\eta$ (straight lines through the origin). The quadrant is
bounded by the lightlike lines $\xi^0 = \pm\xi^1 \Leftrightarrow \eta = \pm\infty$. An inertial observer reaches and
crosses the line $\eta = \infty$ in finite proper time $\tau = \xi^0$.
It is now easy to see that in terms of these new coordinates the 2-dimensional Minkowski
metric $ds^2 = -(d\xi^0)^2 + (d\xi^1)^2$ (we are now suppressing, here and in the remainder of
this subsection, the transverse spectator dimensions 2 and 3) takes the form
$$ ds^2 = -\rho^2 d\eta^2 + d\rho^2 . \tag{1.68} $$
This is the so-called Rindler metric.
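The Rindler form (1.68) can be checked by pulling the Minkowski metric back along the coordinate transformation. The sketch below assumes the transformation $\xi^0 = \rho\sinh\eta$, $\xi^1 = \rho\cosh\eta$, which is the one consistent with (1.69) in the quadrant $\xi^1 > |\xi^0|$, and approximates the Jacobi matrix by central finite differences; the step size `h` and all function names are illustrative choices.

```python
import math


def rindler_to_inertial(x):
    """Assumed transformation (eta, rho) -> (xi^0, xi^1)."""
    eta, rho = x
    return [rho * math.sinh(eta), rho * math.cosh(eta)]


def jacobian(f, x, h=1e-6):
    """Central-difference Jacobian; J[mu][a] approximates d xi^a / d x^mu."""
    J = []
    for mu in range(len(x)):
        xp, xm = list(x), list(x)
        xp[mu] += h
        xm[mu] -= h
        fp, fm = f(xp), f(xm)
        J.append([(fp[a] - fm[a]) / (2 * h) for a in range(len(fp))])
    return J


def induced_metric(f, x, eta_diag=(-1.0, 1.0)):
    """g_{mu nu} = eta_ab J^a_mu J^b_nu for a diagonal eta_ab."""
    J = jacobian(f, x)
    n = len(x)
    return [[sum(eta_diag[a] * J[mu][a] * J[nu][a] for a in range(len(eta_diag)))
             for nu in range(n)] for mu in range(n)]
```

Evaluating `induced_metric(rindler_to_inertial, [0.7, 2.0])` gives, to the accuracy of the finite differences, the matrix with entries $g_{\eta\eta} = -\rho^2 = -4$, $g_{\eta\rho} = 0$ and $g_{\rho\rho} = 1$.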
The Rindler coordinates $\rho$ and $\eta$ are obviously in some sense hyperbolic (Lorentzian)
analogues of polar coordinates ($x = r\cos\phi$, $y = r\sin\phi$, $ds^2 = dx^2 + dy^2 =
dr^2 + r^2 d\phi^2$). In particular, since
$$ (\xi^1)^2 - (\xi^0)^2 = \rho^2 , \qquad \frac{\xi^0}{\xi^1} = \tanh\eta , \tag{1.69} $$
by construction the lines of constant $\rho$, $\rho = \rho_0$, are hyperbolas, $(\xi^1)^2 - (\xi^0)^2 = \rho_0^2$,
while the lines of constant $\eta = \eta_0$ are straight lines through the origin, $\xi^0 =
(\tanh\eta_0)\,\xi^1$.
The metric in these new coordinates is time-independent, where time means $\eta$,
and time-independent means that the coefficients of the metric or line-element in
(1.68) do not depend on $\eta$. This is due to the fact that the generator $\partial_\eta$ of $\eta$-time
evolution is actually the generator of a Lorentz boost in the $(\xi^0, \xi^1)$-plane
in Minkowski space,
$$ \partial_\eta = (\partial_\eta\xi^0)\,\partial_{\xi^0} + (\partial_\eta\xi^1)\,\partial_{\xi^1} = \xi^1\partial_{\xi^0} + \xi^0\partial_{\xi^1} . \tag{1.70} $$
Since a Lorentz boost leaves the Minkowski metric invariant, the latter has to be
invariant under translations in $\eta$, i.e. it has to be $\eta$-independent, as is indeed the
case.
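Along the worldline of an observer with constant $\rho = \rho_0$ one has $d\tau = \rho_0\, d\eta$, so that his
proper time parametrised path is $\xi^0(\tau) = \rho_0\sinh(\tau/\rho_0)$, $\xi^1(\tau) = \rho_0\cosh(\tau/\rho_0)$.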
Even though (1.68) is just the metric of Minkowski space-time, written in accelerated
coordinates, this metric exhibits a number of interesting features that are prototypical
of more general metrics that one encounters in general relativity:
1. First of all, we notice that the coefficients of the line element (metric) in (1.68)
are no longer constant (space-time independent). Since in the case of constant
acceleration we are just describing a fake gravitational field, this dependence
on the coordinates is such that it can be completely and globally eliminated by
passing to appropriate new coordinates (namely inertial Minkowski coordinates).
Since, by the equivalence principle, locally an observer cannot distinguish between
a fake and a true gravitational field, this now suggests that a true gravitational
field can be described in terms of a space-time coordinate dependent line-element
where the coordinate dependence on the $x^\mu$ is now such that it cannot be eliminated
globally by a suitable choice of coordinates.
coordinate singularity at the origin of standard polar coordinates in the Cartesian
plane). More generally, whenever a metric written in some coordinate system
appears to exhibit some singular behaviour, one needs to investigate whether this
is just a coordinate singularity or a true singularity of the gravitational field itself.
3. The above coordinates do not just fail at $\rho = 0$, they actually fail to cover large
parts of Minkowski space. Thus the next lesson is that, given a metric in some
coordinate system, one has to investigate if the space-time described in this way
needs to be extended beyond the range of the original coordinates. One way to
analyse this question (which we will make extensive use of in sections 25 and 26
when trying to understand and come to terms with black holes) is to study light
rays or the worldlines of freely falling (inertial) observers.
In the present case, an example of an inertial observer is a static observer in
Minkowski space, i.e. an observer at a fixed value of $\xi^1$, say, with $\xi^0 = \tau$ his proper
time. In Rindler coordinates this is described by the condition that $\xi^1 = \rho\cosh\eta$
is a constant, so this is most certainly not a straight line in an $(\eta, \rho)$-diagram.
Such an observer will of course discover that $\eta = +\infty$ is not the end of the world
(indeed, he crosses this line at finite proper time $\tau = \xi^1$) and that Minkowski
space continues (at the very least) into the quadrant $\xi^0 > |\xi^1|$ (see Figure 7 for
an illustration of this).
$$ \rho^2 d\eta^2 = d\rho^2 \quad\Rightarrow\quad d\eta = \pm\rho^{-1} d\rho . \tag{1.74} $$
$$ \|\partial_\eta\|^2 = (\xi^0)^2 - (\xi^1)^2 . \tag{1.75} $$
6. Finally we note that there is a large region of Minkowski space that is invisible
to the constantly accelerated observers. While a static observer will eventually
receive information from any event anywhere in space-time (his past lightcone
will eventually cover all of Minkowski space . . . ), the past lightcone of one of
the Rindler accelerated observers (whose worldlines asymptote to the lightcone
direction $\xi^0 = \xi^1$) will asymptotically only cover one half of Minkowski space,
namely the region $\xi^0 < \xi^1$. Thus any event above the line $\xi^0 = \xi^1$ will forever be
invisible to this class of observers. Such an observer-dependent horizon has some
similarities with the event horizon characterising a black hole (see section 26.4 for
a first encounter with such an object, and section 31 for a detailed discussion).
In order to move away from constant accelerations (as models of observers in constant
gravitational fields only), we now consider the effect of arbitrary (general) coordinate
transformations on the laws of special relativity and the geometry of Minkowski space.
This may look like a somewhat exaggerated move at this point (should we perhaps
not just look at coordinate transformations to coordinates that somehow correspond to
adapted coordinates for some arbitrary accelerated observer?), but
• there are many useful things that one can learn from doing this;
• and we will see later (when discussing the relation between the Einstein Equivalence
Principle and the Principle of General Covariance in section 3.1), that the
relation between the description of physics in an arbitrary gravitational field and
the behaviour of this description under arbitrary coordinate transformations is
much closer and more far-reaching than we perhaps have the right to expect at
the moment.
Let us see what the equation of motion (1.49) of a free massive particle looks like when
written in some other (non-inertial, accelerating) coordinate system. It is extremely
useful for bookkeeping purposes and for avoiding algebraic errors to use different kinds
of indices for different coordinate systems. Thus we will call the new coordinates $x^\mu(\xi^b)$
and not, say, $x^a(\xi^b)$.
First of all, proper time should not depend on which coordinates we use to describe the
motion of the particle (the particle couldn't care less what coordinates we experimenters
or observers use). [By the way: this is the best way to resolve the so-called twin-paradox:
It doesn't matter which reference system you use - the accelerating twin in
the rocket will always be younger than her brother when they meet again.] Thus
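$$ d\tau^2 = -\eta_{ab}\, d\xi^a d\xi^b = -\eta_{ab}\frac{\partial\xi^a}{\partial x^\mu}\frac{\partial\xi^b}{\partial x^\nu}\, dx^\mu dx^\nu . \tag{1.76} $$
Here
$$ J^a_\mu(x) = \frac{\partial\xi^a}{\partial x^\mu} \tag{1.77} $$
is the Jacobi matrix associated to the coordinate transformation $\xi^a = \xi^a(x^\mu)$, and we
will make the assumption that (locally) this matrix is non-degenerate, thus has an
inverse $J^\mu_a(x)$ or $J^\mu_a(\xi)$ which is the Jacobi matrix associated to the inverse coordinate
transformation $x^\mu = x^\mu(\xi^a)$,
$$ J^\mu_a J^b_\mu = \delta^b_a \qquad J^a_\mu J^\nu_a = \delta^\nu_\mu . \tag{1.78} $$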
We see that in the new coordinates, proper time and distance are no longer measured
by the Minkowski metric in its standard form (the constant matrix $\eta_{ab}$), but by
$$ d\tau^2 = -g_{\mu\nu}(x)\, dx^\mu dx^\nu , \tag{1.79} $$
where
$$ g_{\mu\nu}(x) = \eta_{ab}\frac{\partial\xi^a}{\partial x^\mu}\frac{\partial\xi^b}{\partial x^\nu} . \tag{1.80} $$
The fact that the Minkowski metric written in the coordinates $x^\mu$ in general depends
on $x$ should not come as a surprise - after all, this also happens when one writes the
Euclidean metric in spherical coordinates etc.
It is easy to check, using (1.78), that the inverse metric, which we will denote by $g^{\mu\nu}$,
is given by
$$ g^{\mu\nu}(x) = \eta^{ab}\frac{\partial x^\mu}{\partial\xi^a}\frac{\partial x^\nu}{\partial\xi^b} . \tag{1.82} $$
We will have much more to say about the metric below and, indeed, throughout this
course.
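Turning now to the equation of motion, the usual rules for a change of variables give
$$ \frac{d\xi^a}{d\tau} = \frac{\partial\xi^a}{\partial x^\mu}\frac{dx^\mu}{d\tau} , \tag{1.83} $$
where $\frac{\partial\xi^a}{\partial x^\mu}$ is an invertible matrix at every point. Differentiating once more, one finds
$$ \begin{aligned}
\frac{d^2\xi^a}{d\tau^2} &= \frac{\partial\xi^a}{\partial x^\mu}\frac{d^2x^\mu}{d\tau^2} + \frac{\partial^2\xi^a}{\partial x^\nu\partial x^\lambda}\frac{dx^\nu}{d\tau}\frac{dx^\lambda}{d\tau} \\
&= \frac{\partial\xi^a}{\partial x^\mu}\frac{d^2x^\mu}{d\tau^2} + \delta^a_b\,\frac{\partial^2\xi^b}{\partial x^\nu\partial x^\lambda}\frac{dx^\nu}{d\tau}\frac{dx^\lambda}{d\tau} \\
&= \frac{\partial\xi^a}{\partial x^\mu}\left[\frac{d^2x^\mu}{d\tau^2} + \frac{\partial x^\mu}{\partial\xi^b}\frac{\partial^2\xi^b}{\partial x^\nu\partial x^\lambda}\frac{dx^\nu}{d\tau}\frac{dx^\lambda}{d\tau}\right] .
\end{aligned} \tag{1.84} $$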
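Thus, since the matrix appearing outside the square bracket is invertible, in terms of the
coordinates $x^\mu$ the equation of motion, or the equation for a straight line in Minkowski
space, becomes
$$ \frac{d^2x^\mu}{d\tau^2} + \frac{\partial x^\mu}{\partial\xi^a}\frac{\partial^2\xi^a}{\partial x^\nu\partial x^\lambda}\frac{dx^\nu}{d\tau}\frac{dx^\lambda}{d\tau} = 0 . \tag{1.85} $$
The second term in this equation, which we will write as
$$ \frac{d^2x^\mu}{d\tau^2} + \Gamma^\mu_{\nu\lambda}\frac{dx^\nu}{d\tau}\frac{dx^\lambda}{d\tau} = 0 , \tag{1.86} $$
where
$$ \Gamma^\mu_{\nu\lambda} = \frac{\partial x^\mu}{\partial\xi^a}\frac{\partial^2\xi^a}{\partial x^\nu\partial x^\lambda} , \tag{1.87} $$
or, more compactly,
$$ \Gamma^\mu_{\nu\lambda} = J^\mu_a\,\partial_\nu J^a_\lambda = J^\mu_a\,\partial_\lambda J^a_\nu \equiv J^\mu_a J^a_{\nu\lambda} , \tag{1.88} $$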
While (1.86) looks a bit complicated and unattractive, it is simply the general variant
of a calculation that you have probably done numerous times before in various specific
contexts. Moreover, there are at least two very useful things that we can extract or
anticipate from this equation, namely
in any theory of gravity satisfying the Einstein equivalence principle. Let us now discuss
these features in turn (relegating some uninspiring calculational details to the end of
this subsection):
1. the Metric as a Candidate for the Gravitational Potential
It turns out that the above (pseudo-)force term can be expressed in terms of the
partial derivatives of the metric (1.80) as
$$ \Gamma^\mu_{\nu\lambda} = g^{\mu\rho}\,\Gamma_{\rho\nu\lambda} , \qquad \Gamma_{\rho\nu\lambda} = \tfrac12 (g_{\rho\nu,\lambda} + g_{\rho\lambda,\nu} - g_{\nu\lambda,\rho}) . \tag{1.89} $$
It is an elementary but nevertheless useful exercise to check this (see below - but
do try this yourself as well).
This shows that the components of the metric appear to play the role of potentials
for the gravitational pseudo-force. In particular, since in principle all
components of the metric can contribute to $\Gamma^\mu_{\nu\lambda}$, we learn the interesting fact
that in order to achieve this a single scalar potential, as in the Newtonian theory,
is completely insufficient.
If the metric indeed plays the role of the gravitational potential, as suggested by
these considerations, then it will play the role of the fundamental dynamical variable
of gravity. Since the metric encodes what one usually refers to as the geometry
of a space(-time), namely the information required to determine distances, areas,
volumes etc., this means that we are being led to the conclusion that any theory
of gravity based on the equivalence principle is a theory of dynamical geometry.
Wow . . .
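As a concrete instance of the claim that (1.87) and (1.89) agree, one can compare the two expressions for plane polar coordinates. The sketch below is an illustration under stated assumptions: it uses the Euclidean plane rather than Minkowski space (so $\eta_{ab}$ is replaced by $\delta_{ab}$, which does not affect the algebra), coordinates $x^\mu = (r, \phi)$ with $\xi^1 = r\cos\phi$, $\xi^2 = r\sin\phi$, and hand-coded derivatives; the function names are hypothetical.

```python
import math


def gamma_from_jacobian(r, phi):
    """Gamma^mu_{nu lam} = (dx^mu/dxi^a) (d^2 xi^a / dx^nu dx^lam), cf. (1.87)."""
    c, s = math.cos(phi), math.sin(phi)
    # Jinv[mu][a] = dx^mu / dxi^a for x = (r, phi), xi = (r cos phi, r sin phi)
    Jinv = [[c, s], [-s / r, c / r]]
    # d2xi[a][nu][lam] = d^2 xi^a / dx^nu dx^lam
    d2xi = [[[0.0, -s], [-s, -r * c]],
            [[0.0, c], [c, -r * s]]]
    return [[[sum(Jinv[mu][a] * d2xi[a][nu][lam] for a in range(2))
              for lam in range(2)] for nu in range(2)] for mu in range(2)]


def gamma_from_metric(r, phi):
    """Gamma^mu_{nu lam} from the metric g = diag(1, r^2), cf. (1.89)."""
    g_inv = [[1.0, 0.0], [0.0, 1.0 / r ** 2]]
    # dg[rho][nu][lam] = d g_{rho nu} / dx^lam; only g_{phi phi} = r^2 depends on r
    dg = [[[0.0, 0.0], [0.0, 0.0]],
          [[0.0, 0.0], [2.0 * r, 0.0]]]
    lower = [[[0.5 * (dg[rho][nu][lam] + dg[rho][lam][nu] - dg[nu][lam][rho])
               for lam in range(2)] for nu in range(2)] for rho in range(2)]
    return [[[sum(g_inv[mu][rho] * lower[rho][nu][lam] for rho in range(2))
              for lam in range(2)] for nu in range(2)] for mu in range(2)]
```

Both functions return the same array; the only non-vanishing components are $\Gamma^r_{\phi\phi} = -r$ and $\Gamma^\phi_{r\phi} = \Gamma^\phi_{\phi r} = 1/r$, the familiar centrifugal and Coriolis-type terms of motion in polar coordinates.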
Thus the geodesic equation transforms in the simplest possible non-trivial way
under coordinate transformations $x \to y$, namely with the Jacobi matrix
$$ J^{\mu'}_\mu = \frac{\partial y^{\mu'}}{\partial x^\mu} . \tag{1.91} $$
We will see later that this transformation behaviour characterises/defines tensors,
in this particular case a vector (or contravariant tensor of rank 1).
In particular, since this matrix is assumed to be invertible, we reach the conclusion
that the left hand side of (1.90) is zero if and only if the term in square brackets
on the right hand side is zero,
$$ \frac{d^2y^{\mu'}}{d\tau^2} + \Gamma^{\mu'}_{\nu'\lambda'}\frac{dy^{\nu'}}{d\tau}\frac{dy^{\lambda'}}{d\tau} = 0 \quad\Leftrightarrow\quad \frac{d^2x^\mu}{d\tau^2} + \Gamma^\mu_{\nu\lambda}\frac{dx^\nu}{d\tau}\frac{dx^\lambda}{d\tau} = 0 \tag{1.92} $$
This is what is meant by the statement that the equation takes the same form
in any coordinate system, and is therefore satisfied in one coordinate system if
and only if it is satisfied in all coordinate systems. We see that in this case this
is achieved by having the equation transform in a particularly simple way under
coordinate transformations, namely as a tensor.
One might then, on the basis of the equivalence principle, also want to postulate that
the motion of particles in a general gravitational field, described by a metric, is then
still governed by (1.86) and (1.89). In this more general context the $\Gamma^\mu_{\nu\lambda}$ are referred
to as the Christoffel symbols of the metric.
Happily, as we will see below, in section 1.7, these equations need not be postulated
at all - they are simply the geodesic equations satisfied by paths that extremise proper
time (or proper distance), and are thus the Euler-Lagrange equations for the obvious
generalisation of the special relativistic action for a free particle, $S \sim \int d\tau$, to an
arbitrary metric.
1. Proof of (1.89):
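From
$$ g_{\mu\nu} = \eta_{ab} J^a_\mu J^b_\nu \tag{1.93} $$
one deduces
$$ g_{\mu\nu,\lambda} = \eta_{ab}(J^a_{\mu\lambda} J^b_\nu + J^a_\mu J^b_{\nu\lambda}) \tag{1.94} $$
where
$$ J^a_{\mu\lambda} = \partial_\lambda J^a_\mu = \frac{\partial^2\xi^a}{\partial x^\mu\partial x^\lambda} = J^a_{\lambda\mu} . \tag{1.95} $$
Therefore, now adopting (1.89) as the definition of the $\Gamma$-symbols, one has
$$ \begin{aligned}
\Gamma_{\rho\nu\lambda} &= \tfrac12 (g_{\rho\nu,\lambda} + g_{\rho\lambda,\nu} - g_{\nu\lambda,\rho}) \\
&= \tfrac12 \eta_{ab}(J^a_{\rho\lambda} J^b_\nu + J^a_\rho J^b_{\nu\lambda} + J^a_{\rho\nu} J^b_\lambda + J^a_\rho J^b_{\lambda\nu} - J^a_{\nu\rho} J^b_\lambda - J^a_\nu J^b_{\lambda\rho}) \\
&= \eta_{ab} J^a_\rho J^b_{\nu\lambda} ,
\end{aligned} \tag{1.96} $$
where the cancellations in passing to the last line arise from the symmetries
$\eta_{ab} = \eta_{ba}$, $J^b_{\nu\lambda} = J^b_{\lambda\nu}$ etc.
Thus, finally (and writing out everything in detail for once),
$$ \begin{aligned}
\Gamma^\mu_{\nu\lambda} = g^{\mu\rho}\Gamma_{\rho\nu\lambda} &= \eta^{cd} J^\mu_c J^\rho_d\, \eta_{ab} J^a_\rho J^b_{\nu\lambda} \\
&= \eta^{cd} J^\mu_c\, \delta^a_d\, \eta_{ab} J^b_{\nu\lambda} \\
&= \eta^{ca} J^\mu_c\, \eta_{ab} J^b_{\nu\lambda} \\
&= \delta^c_b\, J^\mu_c J^b_{\nu\lambda} \\
&= J^\mu_b J^b_{\nu\lambda} ,
\end{aligned} \tag{1.97} $$
as was to be shown.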
2. Proof of (1.90):
Equating this result to (1.84) and using the chain rule for partial derivatives
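$$ \frac{\partial y^{\mu'}}{\partial x^\mu} = \frac{\partial y^{\mu'}}{\partial\xi^a}\frac{\partial\xi^a}{\partial x^\mu} , \tag{1.99} $$
one finds
$$ \frac{d^2y^{\mu'}}{d\tau^2} + \Gamma^{\mu'}_{\nu'\lambda'}\frac{dy^{\nu'}}{d\tau}\frac{dy^{\lambda'}}{d\tau} = \frac{\partial y^{\mu'}}{\partial x^\mu}\left[\frac{d^2x^\mu}{d\tau^2} + \Gamma^\mu_{\nu\lambda}\frac{dx^\nu}{d\tau}\frac{dx^\lambda}{d\tau}\right] \tag{1.100} $$
as claimed.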
Above we saw that the motion of free particles in Minkowski space in curvilinear coordinates
is described in terms of a modified metric, $g_{\mu\nu}$, and a force term $\Gamma^\mu_{\nu\lambda}$ representing
the pseudo-force on the particle. Thus the Einstein Equivalence Principle suggests
that an appropriate description of true gravitational fields is in terms of a metric tensor
$g_{\mu\nu}(x)$ (and its associated Christoffel symbols) which can only locally be related to the
Minkowski metric via a suitable coordinate transformation (to locally inertial coordinates).
Thus our starting point will now be a space-time equipped with some metric
$g_{\mu\nu}(x)$, which (by analogy with the Euclidean and Minkowski metrics) we will assume
to be symmetric and non-degenerate, i.e.
The metric encodes the information how to measure (spatial and temporal) distances,
as well as areas, volumes etc., via the associated line element
Thus a metric determines a geometry (in the literal sense of a prescription for measuring
distances etc.), but different metrics may well determine the same geometry,
namely those metrics which are just related by coordinate transformations. In particular,
distances should not depend on which coordinate system is used. Hence, changing
coordinates from the $\{x^\mu\}$ to new coordinates $\{y^{\mu'}(x^\mu)\}$ and demanding that
Remarks:
1. Here I have denoted the components of the metric in the new coordinates $y$ simply
by $g_{\mu'\nu'}$. Occasionally it is more convenient to use a more elaborate notation, such
as
$$ x^\mu \to y^{\mu'} : \qquad g_{\mu\nu} \to g'_{\mu'\nu'} = J^\mu_{\mu'} J^\nu_{\nu'}\, g_{\mu\nu} , \tag{1.105} $$
which allows one to distinguish notationally specific components of the metric in
different coordinate systems, such as $g'_{11}$ (the (11)-component of the metric in the
$y$-coordinates) from $g_{11}$ (the (11)-component of the metric in the $x$-coordinates).
As mentioned before, indices and other decorations are primarily bookkeeping
devices; therefore I will usually not be overly-pedantic about these things in the
following and will use whatever notation is more convenient in the case at hand.
Clearly, the inverse metric then transforms inversely, i.e. with the inverse Jacobi
matrices $J^{\mu'}_\mu$, and this is now nicely compatible with the convention to denote the
inverse metric by upper indices,
$$ g^{\mu'\nu'} = J^{\mu'}_\mu J^{\nu'}_\nu\, g^{\mu\nu} . \tag{1.107} $$
This is also the rationale for writing the inverse metric with upper indices: the
positioning of indices is used to indicate how an object transforms under coordinate
transformations (and we will formalise this in the discussion of section 3 on tensor
algebra).
3. A space-time equipped with a metric tensor $g_{\mu\nu}(x)$ is called a metric space-time
or (pseudo-)Riemannian space-time. Here Riemannian usually refers to a space
equipped with a positive-definite metric (all eigenvalues positive), while pseudo-
Riemannian (or Lorentzian) refers to a space-time with a metric with one negative
and 3 (or 27, or whatever) positive eigenvalues.
4. One point to note about the tensorial transformation behaviour is that pointwise
it is a congruence transformation (rather than a similarity transformation
$g \mapsto J^{-1} g J$) in the sense of linear algebra, in matrix notation
$$ g \mapsto J^t g J . \tag{1.108} $$
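The matrix form (1.108) can be tried out on a familiar case. The sketch below assumes the standard spherical coordinates $x = r\sin\theta\cos\phi$, $y = r\sin\theta\sin\phi$, $z = r\cos\theta$ and applies $g \mapsto J^t g J$ to the Euclidean metric $g = \mathrm{diag}(1, 1, 1)$, with $J$ the matrix of derivatives of the Cartesian with respect to the spherical coordinates; the function names are illustrative.

```python
import math


def jacobian_spherical(r, theta, phi):
    """J[m][a] = d x^m / d y^a with x = (x, y, z) and y = (r, theta, phi)."""
    st, ct = math.sin(theta), math.cos(theta)
    sp, cp = math.sin(phi), math.cos(phi)
    return [[st * cp, r * ct * cp, -r * st * sp],
            [st * sp, r * ct * sp, r * st * cp],
            [ct, -r * st, 0.0]]


def congruence(J, g):
    """(J^t g J)_{ab} = J[m][a] g[m][k] J[k][b], cf. (1.108)."""
    n = len(g)
    return [[sum(J[m][a] * g[m][k] * J[k][b]
                 for m in range(n) for k in range(n))
             for b in range(n)] for a in range(n)]


EUCLIDEAN = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
```

The result of `congruence(jacobian_spherical(r, theta, phi), EUCLIDEAN)` is $\mathrm{diag}(1, r^2, r^2\sin^2\theta)$ up to rounding. Its eigenvalues differ from those of the identity matrix, as expected for a congruence transformation, while their signs (the signature) are unchanged.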
Here are some examples of Riemannian metrics that you may already be familiar with.
Examples:
and plugging this into the Euclidean line-element $dx^2 + dy^2 + dz^2$, one finds the
above result.
Denoting the Cartesian coordinates by $x^\mu$ and the spherical coordinates by $y^{\mu'}$,
with $(y^1 = r, y^2 = \theta, y^3 = \phi)$, the non-vanishing components of the metric in the
two coordinate systems are thus (using the prime notation (1.105))
$$ g_{11} = g_{22} = g_{33} = 1 , \qquad g'_{11} = 1 , \quad g'_{22} = r^2 , \quad g'_{33} = r^2\sin^2\theta . \tag{1.111} $$
Alternatively, it is often more informative (and very common) to use the coordinates
themselves, rather than indices, as the labels of the components of the
metric tensor. In this case one can dispense with the prime notation and simply
write the components of the metric in spherical coordinates as
2. Restricting the first example above to constant radius $r = R$, this gives us the
line-element on the circle $S^1_R$ of radius $R$,
$$ ds^2(S^1_R) = R^2 d\phi^2 . \tag{1.113} $$
Restricting the second to the 2-sphere $S^2_R$ of radius $R$,
$$ x^2 + y^2 + z^2 = r^2 = R^2 \quad\text{or}\quad r = R , \tag{1.114} $$
one finds
$$ ds^2(S^2_R) = R^2 (d\theta^2 + \sin^2\theta\, d\phi^2) \equiv R^2 d\Omega^2 . \tag{1.115} $$
Here
$$ d\Omega^2 = d\theta^2 + \sin^2\theta\, d\phi^2 \tag{1.116} $$
is usually called the solid angle, and we can now interpret it as the line element
on the unit 2-sphere. We will use the notation / abbreviation $d\Omega^2$ for this line
element throughout the notes.
This example provides a nice illustration of the fact that by drawing the coordinate
grid / infinitesimal parallelograms determined by the metric tensor, one can get a
feeling for the geometry and can in particular convince oneself that in general a
metric space or space-time need not or cannot be flat, i.e. is not the flat Euclidean
space of Euclidean geometry.
Indeed, the coordinate grid of the metric $d\theta^2 + \sin^2\theta\, d\phi^2$ cannot be drawn in
flat space because the infinitesimal parallelograms described by $ds^2$ degenerate
to triangles not just at $\theta = 0$ (as would also be the case for the flat metric
$ds^2 = dr^2 + r^2 d\phi^2$ in polar coordinates at $r = 0$), but also at $\theta = \pi$. This
coordinate grid can, on the other hand, of course be drawn on the 2-sphere.
and (if required) this can be continued iteratively to yet higher-dimensional spheres.
4. If instead of the unit 2-sphere one considers the unit hyperboloid $H^2$, defined
by
$$ x^2 + y^2 + z^2 = +1 \quad\to\quad x^2 + y^2 - z^2 = -1 , \tag{1.118} $$
then this is naturally thought of as being embedded not in $\mathbb{R}^3$ but in $\mathbb{R}^{1,2}$, i.e. into
the 3-dimensional vector space with line-element
The hyperbolic analogues $(r, \sigma, \phi)$ of the spherical coordinates, defined by
$$ x^2 + y^2 - z^2 = -r^2 \tag{1.121} $$
so that the unit hyperboloid is evidently just the surface $r = 1$. In these coordinates,
the metric (1.119) takes the form
Examples:
(i.e. the components depend only on the spatial coordinates $x^i$, not on $t$).
3. Somewhat more generally, the spatial components of the metric can depend non-trivially
on time. For example, a space-time metric describing a spatially spherical
universe with a time-dependent radius (expansion of the universe!) might be
described by the line element
$$ ds^2 = -dt^2 + a(t)^2 \left( d\psi^2 + \sin^2\psi\, (d\theta^2 + \sin^2\theta\, d\phi^2) \right) , \tag{1.127} $$
and more generally one can consider the corresponding generalisation of (1.126),
namely metrics of the form
This describes a space-time with spatial metric $g_{ij}(x)dx^i dx^j$ and a time-dependent
radius a(t); in particular, such a space-time metric can describe an expanding
universe in cosmology. We will discuss such metrics in detail later on in the
context of cosmology, sections 32-37.
The characteristic feature of metrics with Lorentzian signature is of course the presence
of timelike and null (lightlike) directions, and thus in a pseudo-Riemannian space-time
one has the same distinction between spacelike, timelike and lightlike separations as in
Minkowski space(-time). Infinitesimal
and null or lightlike if $g_{\mu\nu}(x)V^\mu(x)V^\nu(x) = 0$,
and a curve $x^\mu(\lambda)$ is called spacelike if its tangent vector is everywhere spacelike etc.
Using the definition of a vector in general relativity (to be introduced in section 3),
namely an object that transforms in the obvious way, with the Jacobi matrix, under
coordinate transformations, one sees that $g_{\mu\nu}(x)V^\mu(x)V^\nu(x)$ is a scalar, i.e. invariant
under coordinate transformations, and hence the statement that a vector is, say, spacelike
is a coordinate-independent statement, as it should be.
When the metric (1.124) is (time-space) block-diagonal, i.e. when the mixed components
$g_{0k} = 0$ (as in all of the above examples), then the timelike and spacelike directions are
easy to distinguish by inspection. Typically then the spatial metric $g_{ik}$ is positive
definite, and thus necessarily $g_{00} < 0$.
When some of the $g_{0k}$ are non-zero, on the other hand, one has a more intricate mixing
of time- and space-directions. This can also be seen from the components of the inverse
metric. Indeed, from (1.106), one finds
In particular, this shows that in general (i.e. unless the off-diagonal components $g_{0k}$ are
all zero), the spatial components $g^{ik}$ of the inverse metric are not the inverse of the
spatial components $g_{ij}$ of the metric. Rather, using (1.131) one has
$$ \left( g_{ij} - \frac{1}{g_{00}}\, g_{i0}\, g_{j0} \right) g^{jk} = \delta_i^{\;k} . \tag{1.133} $$
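The relation (1.133) can be checked on a small example. The sketch below uses a hypothetical symmetric (1+2)-dimensional matrix with non-zero $g_{0k}$ (the numerical entries are arbitrary illustrative values) and a plain Gauss-Jordan inversion; all names are illustrative.

```python
def inverse(M):
    """Matrix inverse by Gauss-Jordan elimination with partial pivoting."""
    n = len(M)
    A = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(M)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        p = A[col][col]
        A[col] = [v / p for v in A[col]]
        for r in range(n):
            if r != col:
                f = A[r][col]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[col])]
    return [row[n:] for row in A]


def corrected_spatial_product(g):
    """(g_ij - g_i0 g_j0 / g_00) g^{jk} for spatial i, k, cf. (1.133)."""
    ginv = inverse(g)
    n = len(g)
    h = [[g[i][j] - g[i][0] * g[j][0] / g[0][0] for j in range(1, n)]
         for i in range(1, n)]
    return [[sum(h[i][j] * ginv[j + 1][k + 1] for j in range(n - 1))
             for k in range(n - 1)] for i in range(n - 1)]


def naive_spatial_product(g):
    """g_ij g^{jk} summed over spatial j only; not the identity if g_0k != 0."""
    ginv = inverse(g)
    n = len(g)
    return [[sum(g[i][j] * ginv[j][k] for j in range(1, n))
             for k in range(1, n)] for i in range(1, n)]


# Illustrative metric with non-vanishing mixed components g_0k.
G_EXAMPLE = [[-1.0, 0.3, 0.1],
             [0.3, 1.0, 0.2],
             [0.1, 0.2, 1.0]]
```

For `G_EXAMPLE`, `corrected_spatial_product` returns the 2x2 identity matrix up to rounding, whereas `naive_spatial_product` does not, which is precisely the point of the remark above.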
At this point the question naturally arises how one can tell whether a given (perhaps
complicated looking) metric is just the flat (Euclidean or Minkowski) metric written
in other coordinates or whether it describes a genuinely curved space-time. We will see
later that there is an object, the Riemann curvature tensor, constructed from the second
derivatives of the metric, which has the property that all of its components vanish if and
only if the metric is a coordinate transform of the flat space Minkowski metric. Thus,
given a metric, by calculating its curvature tensor one can decide if the metric is just
the flat metric in disguise or not. The curvature tensor will be introduced in section 7,
and the above statement will be established in section 10.2.
1.7 Geodesic Equation from the Extremisation of Proper Time
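We have seen that the equation for a straight line in Minkowski space, written in arbitrary
coordinates, is
$$ \frac{d^2x^\mu}{d\tau^2} + \Gamma^\mu_{\nu\lambda}\frac{dx^\nu}{d\tau}\frac{dx^\lambda}{d\tau} = 0 , \tag{1.134} $$
where the pseudo-force term $\Gamma^\mu_{\nu\lambda}$ is given by (1.87). We have also seen in (1.89) (provided
you checked this) that $\Gamma^\mu_{\nu\lambda}$ can be expressed in terms of the metric (1.80) as
$$ \Gamma^\mu_{\nu\lambda} = \tfrac12 g^{\mu\rho} (g_{\rho\nu,\lambda} + g_{\rho\lambda,\nu} - g_{\nu\lambda,\rho}) . \tag{1.135} $$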
This gravitational force term is fictitious since it can globally be transformed away by
going to the global inertial coordinates $\xi^a$. The equivalence principle suggests, however,
that in general the equation for the worldline of a massive particle, i.e. a path that
extremises proper time, in a true gravitational field is also of the above form.
We will now confirm this by deriving the equations for a timelike path that extremises
proper time from a variational principle. These paths will be referred to as (timelike)
geodesics. We will briefly return below to the (delicate) issue to which extent these can
be regarded as world lines of actual massive particles.
Recall first of all from special relativity that the Lorentz-covariant description of the
dynamics of a massive particle is based on describing the timelike worldline of the
particle in the parametric form
$$ \xi^a = \xi^a(\tau) \tag{1.136} $$
$$ d\tau^2 = -\eta_{ab}\, d\xi^a d\xi^b . \tag{1.137} $$
We can adopt the same set-up and action in the present setting. Thus we parametrise
the worldlines by
$$ x^\mu = x^\mu(\tau) , \tag{1.141} $$
invariant under general coordinate transformations (provided that one transforms the
metric appropriately). The corresponding 4-velocity
$$ u^\mu = \frac{dx^\mu}{d\tau} \tag{1.143} $$
is again normalised as
$$ g_{\mu\nu} u^\mu u^\nu = -1 , \tag{1.144} $$
Of course m drops out of the variational equations (as it should by the equivalence
principle) and we will therefore ignore m in the following.
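and to write
$$ \int d\tau = \int (d\tau/d\lambda)\, d\lambda = \int \left( -g_{\mu\nu}\frac{dx^\mu}{d\lambda}\frac{dx^\nu}{d\lambda} \right)^{1/2} d\lambda . \tag{1.147} $$
keeping the end-points fixed, and will denote the $\tau$-derivatives by $\dot{x}^\mu(\tau)$. By the standard
variational procedure one then finds
$$ \begin{aligned}
\delta\int d\tau &= \frac12 \int \left( -g_{\mu\nu}\frac{dx^\mu}{d\lambda}\frac{dx^\nu}{d\lambda} \right)^{-1/2} d\lambda \left[ -\delta g_{\mu\nu}\frac{dx^\mu}{d\lambda}\frac{dx^\nu}{d\lambda} - 2 g_{\mu\nu}\frac{d\delta x^\mu}{d\lambda}\frac{dx^\nu}{d\lambda} \right] \\
&= \frac12 \int d\tau \left[ -g_{\mu\nu,\lambda}\,\dot{x}^\mu\dot{x}^\nu\,\delta x^\lambda + 2 g_{\mu\nu}\,\ddot{x}^\nu\,\delta x^\mu + 2 g_{\mu\nu,\lambda}\,\dot{x}^\lambda\dot{x}^\nu\,\delta x^\mu \right] \\
&= \int d\tau \left[ g_{\mu\nu}\,\ddot{x}^\nu + \tfrac12 (g_{\mu\nu,\lambda} + g_{\mu\lambda,\nu} - g_{\nu\lambda,\mu})\,\dot{x}^\nu\dot{x}^\lambda \right] \delta x^\mu
\end{aligned} \tag{1.149} $$
Here the factor of 2 in the first equality is a consequence of the symmetry of the metric,
the second equality follows from an integration by parts, the third from relabelling the
indices in one term and using the symmetry in the indices of $\dot{x}^\lambda\dot{x}^\nu$ in the other.
$$ \Gamma^\mu_{\nu\lambda} = \tfrac12 g^{\mu\rho} (g_{\rho\nu,\lambda} + g_{\rho\lambda,\nu} - g_{\nu\lambda,\rho}) , \tag{1.150} $$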
Thus we see that indeed the equations for a timelike geodesic in an arbitrary gravitational
field are
$$ \frac{d^2x^\mu}{d\tau^2} + \Gamma^\mu_{\nu\lambda}\frac{dx^\nu}{d\tau}\frac{dx^\lambda}{d\tau} = 0 . \tag{1.152} $$
Remarks:
1. By definition, massive test particles are those particles that satisfy the above
geodesic equation, i.e. that follow timelike geodesics in space-time. However, it
needs to be borne in mind that this notion of a test particle is a fiction, in particular
as it neglects the backreaction, i.e. the change in the background gravitational field
due to the mass of the particle. Moreover, real particles either have a finite extent
(in which case this finite size should play a role in their equations of motion) or
are considered to be point-like. However, the notion of a point-like particle is
extremely dangerous and delicate in general relativity: as we will see later, if a
given total mass is concentrated in a sufficiently small region of space-time (and
point-like certainly qualifies as sufficiently small), then one will end up with a
black hole rather than with the description of a particle. The correct description
of point particles in general relativity is a complicated issue and an active area of
research.5
2. One can also consider spacelike paths that extremise (minimise) proper distance,
by using the action
$$ S_0 \sim \int ds \tag{1.153} $$
where
$$ ds^2 = g_{\mu\nu}(x)\, dx^\mu dx^\nu \tag{1.154} $$
is the proper distance (or arc-length in the traditional terminology of the differential
geometry of curves).
One should also consider massless particles, whose worldlines will be null (or
lightlike) paths. However, in that case one can evidently not use proper time or
proper distance, since these are by definition zero along a null path, $ds^2 = 0$. We
will come back to this special case, and a unified description of the massive and
massless case, below (section 2.1). In all cases, we will refer to the resulting paths
as geodesics. If required, we add the qualifier timelike, spacelike or null,
and this is meaningful and unambiguous since, as we will see below, a geodesic
that is initially timelike will always remain timelike etc.
We will have much more to say about geodesics and variational principles in section 2.
5
See e.g. E. Poisson, A. Pound, I. Vega, The Motion of Point Particles in Curved Spacetime,
arXiv:1102.0529 [gr-qc] for a detailed discussion and many references (but you will need to acquire
a solid understanding of tensor analysis first).
1.8 Christoffel Symbols and Coordinate Transformations
The Christoffel symbols play the role of the gravitational force term, and thus in this
sense the components of the metric play the role of the gravitational potential. These
Christoffel symbols play an important role not just in the geodesic equation but, as
we will see later on, more generally in the definition of a covariant derivative operator
and the construction of the curvature tensor, and thus ultimately also in the generally
covariant description of the dynamics of the gravitational field itself.
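Two elementary but important properties of the Christoffel symbols are that they are
symmetric in the second and third indices,
$$ \Gamma_{\mu\nu\lambda} = \Gamma_{\mu\lambda\nu} , \qquad \Gamma^\mu_{\nu\lambda} = \Gamma^\mu_{\lambda\nu} \tag{1.155} $$
(this follows simply from the definition), and that symmetrising $\Gamma_{\mu\nu\lambda}$ over the first pair
of indices one finds
$$ \Gamma_{\mu\nu\lambda} + \Gamma_{\nu\mu\lambda} = g_{\mu\nu,\lambda} \tag{1.156} $$
(and this follows from noting that 4 of the 6 partial derivative terms of the metric cancel
in this linear combination while 2 add up).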
Knowing how the metric transforms under coordinate transformations, we can now also
determine how the Christoffel symbols (1.135) and the geodesic equation transform. A
straightforward but not particularly inspiring calculation (which you should nevertheless
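do) shows that under $x^\mu \to y^{\mu'}(x)$ the Christoffel symbols are related by
$$ \Gamma^{\mu'}_{\nu'\lambda'} = \Gamma^\mu_{\nu\lambda}\frac{\partial y^{\mu'}}{\partial x^\mu}\frac{\partial x^\nu}{\partial y^{\nu'}}\frac{\partial x^\lambda}{\partial y^{\lambda'}} + \frac{\partial y^{\mu'}}{\partial x^\mu}\frac{\partial^2 x^\mu}{\partial y^{\nu'}\partial y^{\lambda'}} , \tag{1.157} $$
or, in terms of the Jacobi matrices $J^{\mu'}_\mu = \partial y^{\mu'}/\partial x^\mu$ and $J^\mu_{\mu'} = \partial x^\mu/\partial y^{\mu'}$,
$$ \Gamma^{\mu'}_{\nu'\lambda'} = J^{\mu'}_\mu J^\nu_{\nu'}J^\lambda_{\lambda'}\Gamma^\mu_{\nu\lambda} + J^{\mu'}_\mu\,\partial_{\nu'}J^\mu_{\lambda'} . \tag{1.158} $$
Thus the Christoffel symbols transform inhomogeneously (they are not tensors), but the
inhomogeneous term is precisely what is needed to compensate for the non-tensorial
transformation of the acceleration.
Namely, after another not terribly inspiring calculation (which you should nevertheless
also do at least once in your life), one finds
$$ \frac{d^2y^{\mu'}}{d\tau^2} + \Gamma^{\mu'}_{\nu'\lambda'}\frac{dy^{\nu'}}{d\tau}\frac{dy^{\lambda'}}{d\tau} = \frac{\partial y^{\mu'}}{\partial x^\mu}\left[\frac{d^2x^\mu}{d\tau^2} + \Gamma^\mu_{\nu\lambda}\frac{dx^\nu}{d\tau}\frac{dx^\lambda}{d\tau}\right] . \tag{1.159} $$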
This is analogous to the result (1.90) that we had obtained before in Minkowski space,
and the same remarks about covariance and tensors etc. apply. An explicit proof of
(1.158) and (1.159) is given at the end of this subsection. A more general result along
these lines will be established in section 4.1 below, when we introduce the covariant
derivative of a vector field.
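As an illustration of the inhomogeneous term in (1.157), the following minimal sketch computes the Christoffel symbols of the flat plane in polar coordinates purely from that term, assuming Cartesian coordinates (where all Christoffel symbols vanish) as the starting point. The function names and the index convention (0 = r, 1 = phi) are choices made for this example only, not notation from these notes.

```python
import math

def jac_inv(r, phi):
    """J^{mu'}_mu = d(r, phi)/d(x, y) at the point with polar coordinates (r, phi)."""
    c, s = math.cos(phi), math.sin(phi)
    return [[c, s],            # dr/dx,   dr/dy
            [-s / r, c / r]]   # dphi/dx, dphi/dy

def second_derivs(r, phi):
    """d^2 x^mu / dy^{nu'} dy^{lambda'} for x = r cos(phi), y = r sin(phi).

    Index order is [mu][nu'][lambda'], with primed indices 0 = r, 1 = phi.
    """
    c, s = math.cos(phi), math.sin(phi)
    d2x = [[0.0, -s], [-s, -r * c]]
    d2y = [[0.0, c], [c, -r * s]]
    return [d2x, d2y]

def christoffel_polar(r, phi):
    """Gamma^{mu'}_{nu' lambda'} from (1.157) with vanishing Cartesian Christoffel symbols."""
    J = jac_inv(r, phi)
    H = second_derivs(r, phi)
    return [[[sum(J[m][a] * H[a][n][l] for a in range(2))
              for l in range(2)] for n in range(2)] for m in range(2)]

G = christoffel_polar(2.0, 0.7)
print(G[0][1][1])  # Gamma^r_{phi phi}, expected to equal -r up to rounding
print(G[1][0][1])  # Gamma^phi_{r phi}, expected to equal 1/r up to rounding
```

The non-zero output shows that the Christoffel symbols can be generated entirely by a coordinate transformation, which is the statement that they are not tensors.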
Remarks:
1. That the geodesic equation transforms in this simple way (namely as a vector)
should not come as a surprise. We obtained this equation as a variational equation.
The Lagrangian itself is a scalar (invariant under coordinate transformations), and
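the variation $\delta x^\mu$ is (i.e. transforms like) a vector,
$$ \delta y^{\mu'} = \frac{\partial y^{\mu'}}{\partial x^\mu}\delta x^\mu = J^{\mu'}_\mu\,\delta x^\mu . \tag{1.160} $$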
Putting these pieces together, one finds the desired result.
3. There is of course a very good physical reason for why the force term in the
geodesic equation (quadratic in the 4-velocities) is not tensorial. This simply
reflects the equivalence principle that locally, at a point (or in a sufficiently small
neighbourhood of a point) you can eliminate the gravitational force by going to
a freely falling (inertial) coordinate system. This would not be possible if the
gravitational force term in the equation of motion for a particle were tensorial.
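1. Proof of (1.158)
For partial derivatives one has the chain rule $\partial_{\mu'} = J^\mu_{\mu'}\partial_\mu$ ($\partial_\mu$ is a covector).
Therefore for the partial derivatives of the metric one has
$$ g_{\mu'\nu',\lambda'} = (J^\mu_{\mu'}J^\nu_{\nu'}g_{\mu\nu})_{,\lambda'} = g_{\mu\nu,\lambda}J^\mu_{\mu'}J^\nu_{\nu'}J^\lambda_{\lambda'} + (\partial_{\lambda'}J^\mu_{\mu'}\,J^\nu_{\nu'} + J^\mu_{\mu'}\,\partial_{\lambda'}J^\nu_{\nu'})g_{\mu\nu} . \tag{1.161} $$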
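Adding up the 3 terms comprising the Christoffel symbol $\Gamma_{\mu'\nu'\lambda'}$, one obtains
$$ \begin{aligned}
2\Gamma_{\mu'\nu'\lambda'} &= g_{\mu'\nu',\lambda'} + g_{\mu'\lambda',\nu'} - g_{\nu'\lambda',\mu'} \\
&= 2J^\mu_{\mu'}J^\nu_{\nu'}J^\lambda_{\lambda'}\Gamma_{\mu\nu\lambda} \\
&\quad + (\partial_{\lambda'}J^\mu_{\mu'}\,J^\nu_{\nu'} + J^\mu_{\mu'}\,\partial_{\lambda'}J^\nu_{\nu'}
+ \partial_{\nu'}J^\mu_{\mu'}\,J^\nu_{\lambda'} + J^\mu_{\mu'}\,\partial_{\nu'}J^\nu_{\lambda'}
- \partial_{\mu'}J^\mu_{\nu'}\,J^\nu_{\lambda'} - J^\mu_{\nu'}\,\partial_{\mu'}J^\nu_{\lambda'})g_{\mu\nu} .
\end{aligned} \tag{1.162} $$
In the last line, the 3rd term cancels against the 5th (because $\partial_{\mu'}J^\mu_{\nu'}$ is symmetric in $\mu'$ and $\nu'$),
the 1st term cancels against the 6th (because $\partial_{\mu'}J^\mu_{\lambda'}$ and $g_{\mu\nu}$ are symmetric), while
the 2nd and 4th term add up, so that one finds
$$ \Gamma_{\mu'\nu'\lambda'} = J^\mu_{\mu'}J^\nu_{\nu'}J^\lambda_{\lambda'}\Gamma_{\mu\nu\lambda} + J^\mu_{\mu'}\,\partial_{\nu'}J^\nu_{\lambda'}\,g_{\mu\nu} . \tag{1.163} $$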
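Now the hard work has been done. Raising the 1st index of the Christoffel symbol,
using the inverse metric
$$ g^{\mu'\nu'} = g^{\mu\nu}J^{\mu'}_\mu J^{\nu'}_\nu , \tag{1.164} $$
it is now simple to see that one obtains the claimed result (1.158),
$$ \Gamma^{\mu'}_{\nu'\lambda'} = g^{\mu'\rho'}\Gamma_{\rho'\nu'\lambda'} = J^{\mu'}_\mu J^\nu_{\nu'}J^\lambda_{\lambda'}\Gamma^\mu_{\nu\lambda} + J^{\mu'}_\mu\,\partial_{\nu'}J^\mu_{\lambda'} . \tag{1.165} $$
For example, for the 2nd term one has (just using properties of inverse Jacobi
matrices and metrics)
$$ \begin{aligned}
g^{\mu'\rho'}J^\rho_{\rho'}\,\partial_{\nu'}J^\sigma_{\lambda'}\,g_{\rho\sigma}
&= g^{\mu\nu}J^{\mu'}_\mu J^{\rho'}_\nu J^\rho_{\rho'}\,\partial_{\nu'}J^\sigma_{\lambda'}\,g_{\rho\sigma}
= g^{\mu\nu}J^{\mu'}_\mu\,\delta^\rho_\nu\,\partial_{\nu'}J^\sigma_{\lambda'}\,g_{\rho\sigma} \\
&= g^{\mu\rho}J^{\mu'}_\mu\,\partial_{\nu'}J^\sigma_{\lambda'}\,g_{\rho\sigma}
= \delta^\mu_\sigma J^{\mu'}_\mu\,\partial_{\nu'}J^\sigma_{\lambda'}
= J^{\mu'}_\mu\,\partial_{\nu'}J^\mu_{\lambda'} .
\end{aligned} \tag{1.166} $$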
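2. Proof of (1.159)
The 4-velocities transform as vectors (the chain rule again), $\dot y^{\mu'} = J^{\mu'}_\mu\dot x^\mu$. Therefore
for the acceleration one has
$$ \ddot y^{\mu'} = J^{\mu'}_\mu\ddot x^\mu + \partial_\nu J^{\mu'}_\mu\,\dot x^\nu\dot x^\mu . \tag{1.167} $$
Therefore
$$ \begin{aligned}
\ddot y^{\mu'} + \Gamma^{\mu'}_{\nu'\lambda'}\dot y^{\nu'}\dot y^{\lambda'}
&= J^{\mu'}_\mu\ddot x^\mu + \partial_\nu J^{\mu'}_\lambda\,\dot x^\nu\dot x^\lambda
+ (J^{\mu'}_\mu J^\nu_{\nu'}J^\lambda_{\lambda'}\Gamma^\mu_{\nu\lambda} + J^{\mu'}_\mu\,\partial_{\nu'}J^\mu_{\lambda'})\dot y^{\nu'}\dot y^{\lambda'} \\
&= J^{\mu'}_\mu(\ddot x^\mu + \Gamma^\mu_{\nu\lambda}\dot x^\nu\dot x^\lambda)
+ (J^{\mu'}_\mu\,\partial_{\nu'}J^\mu_{\lambda'} + J^\nu_{\nu'}J^\lambda_{\lambda'}\,\partial_\nu J^{\mu'}_\lambda)\dot y^{\nu'}\dot y^{\lambda'} .
\end{aligned} \tag{1.168} $$
The 1st term will give us the desired result, and cooperatively the 2nd term is
identically zero because (use $\partial_{\nu'} = J^\nu_{\nu'}\partial_\nu$ again)
$$ 0 = (\delta^{\mu'}_{\lambda'})_{,\nu'} = (J^{\mu'}_\lambda J^\lambda_{\lambda'})_{,\nu'} = J^\nu_{\nu'}\,\partial_\nu J^{\mu'}_\lambda\,J^\lambda_{\lambda'} + J^{\mu'}_\lambda\,\partial_{\nu'}J^\lambda_{\lambda'} . \tag{1.169} $$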
1.9 Apology and Outlook
You may feel that, after a promising start in sections 1.1 and 1.3, the things that we
have done subsequently, in particular in sections 1.4 and 1.8, look terribly messy. I
agree, indeed they are! However, I can assure you that things will improve dramatically
rather quickly and that this section 1 is by far the messiest part of the entire lecture
notes.
Indeed, the main purpose and benefit of developing tensor calculus in the next couple
of sections is to develop a formalism in the framework of which (among other things)
1. one can avoid having to deal explicitly with objects that transform in complicated
ways under coordinate transformations;
2. the transformation behaviour of any object is manifest (and does not have to be
checked).
This tensor calculus formalism is simple, elegant and efficient and will then allow us
to make rapid progress towards describing the dynamics in a (and subsequently of the)
gravitational field in a way compatible with the Einstein equivalence principle.
2 Physics and Geometry of Geodesics
Let us first verify that S1 really leads to the same equations of motion as S0 . Either by
direct variation of the action, or by using the Euler-Lagrange equations
$$ \frac{d}{d\lambda}\frac{\partial L}{\partial(dx^\mu/d\lambda)} - \frac{\partial L}{\partial x^\mu} = 0 , \tag{2.4} $$
one finds that the action is indeed extremised by the solutions to the equation
$$ \frac{d^2x^\mu}{d\lambda^2} + \Gamma^\mu_{\nu\rho}\frac{dx^\nu}{d\lambda}\frac{dx^\rho}{d\lambda} = 0 . \tag{2.5} $$
This is identical to the geodesic equation derived from $S_0$ (with $\lambda \to \tau$, the proper
time). This is essentially all we will need, and we will make extensive use of this simpler
Lagrangian for geodesics throughout these notes.
Even though not strictly required in the following, it is nevertheless quite instructive in
its own right to try to understand and establish the precise relation between these two
actions S0 and S1 , and this is the subject of the remainder of this subsection.
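Thus, what is the relation (if any) between the two actions? In order to explain this, it
will be useful to introduce an additional field $e(\lambda)$ (i.e. in addition to the $x^\alpha(\lambda)$), and a
"master action" (or parent action) $S$ which we can relate to both $S_0$ and $S_1$. Consider
the action
$$ S[x,e] = \frac12\int d\lambda\left(e(\lambda)^{-1}g_{\alpha\beta}\frac{dx^\alpha}{d\lambda}\frac{dx^\beta}{d\lambda} - m^2e(\lambda)\right) = \int d\lambda\left(e(\lambda)^{-1}L - \tfrac12 m^2e(\lambda)\right) . \tag{2.6} $$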
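The crucial property of this action is that it is parametrisation invariant provided that
one declares $e(\lambda)$ to transform appropriately. It is easy to see that under a
transformation $\bar\lambda = f(\lambda)$, with
$$ \bar x^\alpha(\bar\lambda) = x^\alpha(\lambda) , \qquad d\bar\lambda = f'(\lambda)\,d\lambda , \tag{2.7} $$
the action $S[x,e]$ is invariant provided that $e(\lambda)$ transforms such that $e(\lambda)d\lambda$ is invariant,
i.e.
$$ \bar e(\bar\lambda)\,d\bar\lambda \stackrel{!}{=} e(\lambda)\,d\lambda \quad\Rightarrow\quad \bar e(\bar\lambda) = e(\lambda)/f'(\lambda) . \tag{2.8} $$
Indeed, this is evident when one writes the action (2.6) in the form
$$ S[x,e] = \frac12\int e(\lambda)d\lambda\left(e(\lambda)^{-2}g_{\alpha\beta}\frac{dx^\alpha}{d\lambda}\frac{dx^\beta}{d\lambda} - m^2\right) \tag{2.9} $$
and notes that $d\lambda$ and $e(\lambda)$ only appear in the combinations $e(\lambda)d\lambda$ and $e(\lambda)^{-1}(d/d\lambda)$.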
Now what is the relation between the action $S[x,e]$ and the two "standard" actions
$S_0[x]$ and $S_1[x]$?
The first thing to note now is that, courtesy of this parametrisation invariance, we
can always choose a "gauge" in which $e(\lambda) = 1$. With this choice, the action $S[x,e]$
manifestly reduces to the action $S_1[x]$ modulo an irrelevant field-independent constant,
$$ S[x, e=1] = \int d\lambda\,L - \tfrac12 m^2\int d\lambda = S_1[x] + \text{const.} \tag{2.10} $$
Alternatively, instead of fixing the gauge, we can try to eliminate $e(\lambda)$ (which
appears purely algebraically, i.e. without derivatives, in the action) by its equation
of motion. Varying $S[x,e]$ with respect to $e(\lambda)$, one finds the constraint
$$ g_{\alpha\beta}\frac{dx^\alpha}{d\lambda}\frac{dx^\beta}{d\lambda} + m^2e(\lambda)^2 = 0 . \tag{2.11} $$
This is just the usual mass-shell condition in disguise. It suggests that a better
gauge fixing than $e(\lambda) = 1$ would have been $e(\lambda) = m^{-1}$. However, the sole effect
of this would have been to replace $L$ in (2.10) by $mL$,
$$ S[x, e=m^{-1}] = m\int d\lambda\,L - \tfrac12 m\int d\lambda = mS_1[x] + \text{const.} \tag{2.12} $$
In any case, for a massive particle, $m^2 \neq 0$, one can alternatively solve (2.11) for
$e(\lambda)$,
$$ e(\lambda) = m^{-1}\sqrt{-g_{\alpha\beta}\frac{dx^\alpha}{d\lambda}\frac{dx^\beta}{d\lambda}} . \tag{2.13} $$
d d
Using this to eliminate $e(\lambda)$ from the action, one finds
$$ S[x, e = m^{-1}\sqrt{\ldots}] = -m\int d\lambda\sqrt{-g_{\alpha\beta}\frac{dx^\alpha}{d\lambda}\frac{dx^\beta}{d\lambda}} = -m\int d\tau = S_0[x] . \tag{2.14} $$
Thus for $m^2 \neq 0$ we find exactly the original action (integral of the proper time)
$S_0[x]$ (and since we have not touched or fixed the parametrisation invariance, no
wonder that $S_0$ is parametrisation invariant).
Thus we have elucidated the common origin of the actions S0 and S1 for a massive
particle.
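The elimination of the auxiliary field can also be checked numerically. The following is a minimal sketch, assuming a 1+1 dimensional Minkowski metric diag(-1, +1), an arbitrarily chosen timelike path and an arbitrary mass value (all three are assumptions of this example, not taken from the notes). It evaluates the parent action with $e(\lambda) = m^{-1}\sqrt{-g_{\alpha\beta}\dot x^\alpha\dot x^\beta}$ and compares the result with $-m\int d\tau$; it also perturbs $e(\lambda)$ to illustrate that this choice is a stationary point.

```python
import math

M = 2.0  # particle mass (arbitrary illustrative value)

def velocity(lam):
    """(dt/dlam, dx/dlam) for the hypothetical path t = lam + lam^3/3, x = 0.5*sin(lam)."""
    return 1.0 + lam * lam, 0.5 * math.cos(lam)

def g_xdot_xdot(lam):
    """g_{ab} x'^a x'^b for the Minkowski metric diag(-1, +1); negative for a timelike path."""
    dt, dx = velocity(lam)
    return -dt * dt + dx * dx

def integrate(f, a=0.0, b=1.0, n=2000):
    """Midpoint rule on [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

def e_on_shell(lam):
    """Solution of the constraint: e = sqrt(-g x' x') / m."""
    return math.sqrt(-g_xdot_xdot(lam)) / M

def parent_action(e):
    """S[x, e] = 1/2 int dlam (g x' x' / e - m^2 e) for a given function e(lam)."""
    return integrate(lambda l: 0.5 * (g_xdot_xdot(l) / e(l) - M * M * e(l)))

S0 = -M * integrate(lambda l: math.sqrt(-g_xdot_xdot(l)))     # -m times the proper time
S_on_shell = parent_action(e_on_shell)
S_perturbed = parent_action(lambda l: 1.05 * e_on_shell(l))   # e rescaled by 5 percent

print(S0, S_on_shell, S_perturbed)
```

For this path the first two numbers agree to rounding accuracy, while a 5 percent rescaling of $e$ changes the action only at second order in the perturbation.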
The perspective provided by the parent action $S[x,e]$ also gives some further insights.
For example, an added benefit of the parent action $S[x,e]$ is that it also makes perfect
sense for a massless particle. For $m^2 = 0$, the mass shell condition
$$ g_{\alpha\beta}\frac{dx^\alpha}{d\lambda}\frac{dx^\beta}{d\lambda} = 0 \tag{2.15} $$
says that these particles move along null lines, and the action reduces to
$$ S[x,e] = \frac12\int d\lambda\,e(\lambda)^{-1}g_{\alpha\beta}\frac{dx^\alpha}{d\lambda}\frac{dx^\beta}{d\lambda} \tag{2.16} $$
which is parametrisation invariant, and the gauge can (as in the massive case) be fixed to $e(\lambda) = 1$,
upon which the action reduces to $S_1[x]$. Thus we see that $S_1[x]$ indeed provides a simple
and unified action for both massive and massless particles, and in both cases the resulting
equation of motion is the (affinely parametrised) geodesic equation (2.5),
$$ \frac{d^2x^\mu}{d\lambda^2} + \Gamma^\mu_{\nu\rho}\frac{dx^\nu}{d\lambda}\frac{dx^\rho}{d\lambda} = 0 . \tag{2.17} $$
Remarks:
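1. The infinitesimal form of the invariance of the action $S[x,e]$ under (2.7) and (2.8) is
obtained by considering the infinitesimal transformation of $x^\alpha(\lambda)$ and $e(\lambda)$ induced
by an infinitesimal transformation $\bar\lambda = \lambda + \epsilon(\lambda)$,
$$ \begin{aligned}
\delta\lambda = \epsilon(\lambda) \quad\Rightarrow\quad \delta x^\alpha(\lambda) &= \epsilon(\lambda)\frac{dx^\alpha(\lambda)}{d\lambda} \\
\delta e(\lambda) &= e(\lambda)\frac{d\epsilon(\lambda)}{d\lambda} + \epsilon(\lambda)\frac{de(\lambda)}{d\lambda} .
\end{aligned} \tag{2.18} $$
Here the (at first perhaps somewhat peculiar looking) transformation behaviour of
the auxiliary field $e(\lambda)$ arises from the transformation behaviour (2.8) by setting
$$ f(\lambda) = \lambda + \epsilon(\lambda) \tag{2.19} $$
and calculating (keeping at most linear terms in $\epsilon(\lambda)$)
$$ \begin{aligned}
e(\lambda) = f'(\lambda)\,\bar e(\bar\lambda) &= (1 + \epsilon'(\lambda))\,\bar e(\lambda + \epsilon(\lambda)) \\
&= (1 + \epsilon'(\lambda))(\bar e(\lambda) + \epsilon(\lambda)\bar e'(\lambda)) \\
&= \bar e(\lambda) + \epsilon'(\lambda)\bar e(\lambda) + \epsilon(\lambda)\bar e'(\lambda) .
\end{aligned} \tag{2.20} $$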
Under this infinitesimal transformation, the Lagrangian
3. This is as it should be: something that starts off as a massless particle will remain
a massless particle etc. If one imposes the initial condition
$$ \left. g_{\mu\nu}\frac{dx^\mu}{d\tau}\frac{dx^\nu}{d\tau}\right|_{\tau=0} = \epsilon , \tag{2.27} $$
then this condition will be satisfied for all $\tau$. In particular, therefore, one can
choose $\epsilon = \mp 1$ for timelike (spacelike) geodesics, and $\tau$ can then be identified with
proper time (proper distance), while the choice $\epsilon = 0$ sets the initial conditions
appropriate to massless particles (for which $\tau$ is then not related to proper time
or proper distance).
$$ p_\alpha p^\alpha + m^2 = 0 \tag{2.30} $$
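To understand the significance of how one parametrises the geodesic, observe that the
geodesic equation itself,
$$ \ddot x^\mu + \Gamma^\mu_{\nu\lambda}\dot x^\nu\dot x^\lambda = 0 , \tag{2.32} $$
is not parametrisation invariant. Indeed, under a change of parameter $\tau \to \sigma = f(\tau)$
the chain rule gives $d/d\tau = \dot f\,d/d\sigma$, and the geodesic equation becomes
$$ \frac{d^2x^\mu}{d\sigma^2} + \Gamma^\mu_{\nu\lambda}\frac{dx^\nu}{d\sigma}\frac{dx^\lambda}{d\sigma} = -\frac{\ddot f}{\dot f^2}\frac{dx^\mu}{d\sigma} . \tag{2.34} $$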
Thus the geodesic equation retains its form only under affine changes of the proper time
parameter $\tau$, $f(\tau) = a\tau + b$, and parameters $\sigma = f(\tau)$ related to $\tau$ by such an affine
transformation are known as affine parameters.
From the first variational principle, based on $S_0$, the term on the right hand side arises
in the calculation of (1.149) from the integration by parts if one does not switch back
from $\lambda$ to the affine parameter $\tau$. The second variational principle, based on $S_1$ and the
Lagrangian $L$, on the other hand, always and automatically yields the geodesic equation
in affine form.
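Conversely, if a curve $x^\mu(\sigma)$ satisfies an equation of the form
$$ \frac{d^2x^\mu}{d\sigma^2} + \Gamma^\mu_{\nu\lambda}\frac{dx^\nu}{d\sigma}\frac{dx^\lambda}{d\sigma} = \kappa(\sigma)\frac{dx^\mu}{d\sigma} , \tag{2.35} $$
for some function $\kappa(\sigma)$ (the "inaffinity"), we can deduce that this curve is the trajectory
of a geodesic, but that it is simply not parametrised by an affine parameter (like proper
time in the case of a timelike curve). Comparison of (2.34) and (2.35) shows that, given
$\kappa(\sigma)$, an affine parameter $\tau$ is determined by
$$ -\left.\frac{\ddot f}{\dot f^2}\right|_{\tau = f^{-1}(\sigma)} = \kappa(\sigma) = \frac{d}{d\sigma}\ln\frac{d\tau}{d\sigma} \tag{2.36} $$
or
$$ \frac{d\tau}{d\sigma} = e^{\int^\sigma ds\,\kappa(s)} . \tag{2.37} $$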
In the following, whenever we talk about geodesics we will practically always have in
mind the variational principle based on $S_1$ leading to the geodesic equation (2.17) in
affinely parametrised form.
However, it should be kept in mind that sometimes non-affine parameters appear naturally.
For instance, it is occasionally convenient to parametrise timelike geodesics in a
geometry with coordinates $x^\mu = (x^0 = t, x^k)$ not by $x^\mu = x^\mu(\tau)$, where $\tau$ is the proper
time along the geodesic, but rather as $x^k = x^k(t)$. This is the same curve, but described
with respect to coordinate time (which could for instance agree with the proper time of
some other, perhaps static, observer). The curve $t \mapsto (t, x^k(t))$ will not be an affinely
parametrised curve unless $t$ itself satisfies the geodesic equation
$$ \ddot t = 0 \quad\Leftrightarrow\quad t = a\tau + b . \tag{2.38} $$
One occasion where this will play a role (and from where I have borrowed the symbol
$\kappa$ for the inaffinity) is in our discussion, much later, of the horizon of a black hole,
where the failure of a certain coordinate to be an affine parameter is directly related to
the physical properties of black holes (see section 26.9). In this context $\kappa$ is known as
the surface gravity of a black hole.
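A small numerical illustration of the inaffinity may help here. The sketch below assumes flat space (vanishing Christoffel symbols), a straight line $x(\tau) = a\tau + b$ and the hypothetical reparametrisation $\sigma = f(\tau) = e^\tau$, all chosen only for this example. For this choice $-\ddot f/\dot f^2 = -1/\sigma$, and the code compares this with the ratio $x''/x'$ of $\sigma$-derivatives obtained by finite differences.

```python
import math

def f(tau):
    """Hypothetical non-affine reparametrisation sigma = f(tau) = exp(tau)."""
    return math.exp(tau)

def f_inv(sigma):
    return math.log(sigma)

def x_of_sigma(sigma, a=3.0, b=-1.0):
    """Straight line x(tau) = a*tau + b (a flat-space geodesic), rewritten in terms of sigma."""
    return a * f_inv(sigma) + b

def d1(func, s, h=1e-4):
    """Central first derivative."""
    return (func(s + h) - func(s - h)) / (2 * h)

def d2(func, s, h=1e-4):
    """Central second derivative."""
    return (func(s + h) - 2 * func(s) + func(s - h)) / (h * h)

def kappa_from_curve(sigma):
    """Inaffinity read off from x'' = kappa x' (flat space, no Christoffel term)."""
    return d2(x_of_sigma, sigma) / d1(x_of_sigma, sigma)

def kappa_from_f(sigma):
    """-f''/f'^2 evaluated at tau = f^{-1}(sigma)."""
    tau = f_inv(sigma)
    return -d2(f, tau) / d1(f, tau) ** 2

for sigma in (0.5, 1.0, 2.0):
    print(sigma, kappa_from_curve(sigma), kappa_from_f(sigma), -1.0 / sigma)
```

The three columns after $\sigma$ agree up to finite-difference error, and integrating $\kappa = -1/\sigma$ gives $d\tau/d\sigma \propto 1/\sigma$, i.e. $\tau = \ln\sigma$ up to an affine transformation, as expected.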
2.3 Example: Geodesics in R2 in Polar Coordinates
It is high time to consider an example. We will consider the simplest non-trivial metric,
namely the standard Euclidean metric on $\mathbb{R}^2$ in polar coordinates. Thus the line element
is
$$ ds^2 = dx^2 + dy^2 = dr^2 + r^2d\phi^2 \tag{2.39} $$
and the non-zero components of the metric are
$$ g_{xx} = g_{yy} = 1 \tag{2.40} $$
and
$$ g_{rr} = 1 , \qquad g_{\phi\phi} = r^2 \tag{2.41} $$
respectively. Since this metric is diagonal, the non-zero components of the inverse metric
$g^{\mu\nu}$ are
$$ g^{xx} = g^{yy} = 1 \tag{2.42} $$
and
$$ g^{rr} = 1 , \qquad g^{\phi\phi} = r^{-2} \tag{2.43} $$
respectively.
A reminder on notation (cf. the discussion leading to (1.112)): since $\mu, \nu$ in $g_{\mu\nu}$ are
coordinate indices, we should really have called $x^1 = r$, $x^2 = \phi$, say, and written
$g_{11} = 1$, $g_{22} = r^2$, etc. However, writing $g_{rr}$ etc. is more informative and useful since one
then knows that this is the $(rr)$-component of the metric without having to remember if
one called $r = x^1$ or $r = x^2$. In the following we will frequently use this kind of notation
when dealing with a specific coordinate system, while we retain the index notation $g_{\mu\nu}$
etc. for general purposes.
Let us now look at the geodesic equations for this metric, first in the Cartesian
coordinates $(x, y)$ and then in the polar coordinates $(r, \phi)$.
1. Cartesian coordinates
Since the metric in Cartesian coordinates is the constant Euclidean metric $g_{\mu\nu} = \delta_{\mu\nu}$,
all the partial derivatives of the metric are zero, and therefore also all the
Christoffel symbols are zero. The geodesic equations thus take the form
$$ \ddot x = \ddot y = 0 . \tag{2.44} $$
These equations could also have been obtained as the Euler-Lagrange equations
of the Lagrangian
$$ L = \tfrac12(\dot x^2 + \dot y^2) . \tag{2.45} $$
The general solution is
$$ x(\tau) = a\tau + b , \qquad y(\tau) = c\tau + d . \tag{2.46} $$
Combining these two, one finds the standard representation
$$ y = kx + e \tag{2.47} $$
of a straight line.
2. Polar coordinates
Now let us consider the same problem in polar coordinates. The crucial point here
is that in these coordinates the geodesic equations will not simply be $\ddot r = \ddot\phi = 0$,
but that there are additional terms arising
Taking the latter point of view, the Christoffel symbols of this metric are to be
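calculated from
$$ \Gamma_{\mu\nu\lambda} = \tfrac12(g_{\mu\nu,\lambda} + g_{\mu\lambda,\nu} - g_{\nu\lambda,\mu}) . \tag{2.48} $$
Since the only non-trivial derivative of the metric is $g_{\phi\phi,r} = 2r$, only Christoffel
symbols with exactly two $\phi$'s and one $r$ are non-zero,
$$ \Gamma_{r\phi\phi} = -\tfrac12 g_{\phi\phi,r} = -r , \qquad \Gamma_{\phi r\phi} = \Gamma_{\phi\phi r} = \tfrac12 g_{\phi\phi,r} = r , \tag{2.49} $$
so that
$$ \Gamma^r_{\phi\phi} = g^{r\mu}\Gamma_{\mu\phi\phi} = g^{rr}\Gamma_{r\phi\phi} = -r , \qquad \Gamma^\phi_{r\phi} = \Gamma^\phi_{\phi r} = g^{\phi\mu}\Gamma_{\mu r\phi} = g^{\phi\phi}\Gamma_{\phi r\phi} = \frac1r . \tag{2.50} $$
Note that here it was even convenient to use a hybrid notation, as in $g^{r\mu}$, where
$r$ is a coordinate and $\mu$ is a coordinate index. Once again, it is very convenient to
permit oneself to use such a mixed notation.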
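In any case, having assembled all the Christoffel symbols, we can now write down
the geodesic equations (once again in the convenient hybrid notation). For $r$ one
has
$$ \ddot r + \Gamma^r_{\mu\nu}\dot x^\mu\dot x^\nu = 0 , \tag{2.51} $$
which, since the only non-zero $\Gamma^r_{\mu\nu}$ is $\Gamma^r_{\phi\phi} = -r$, reduces to
$$ \ddot r - r\dot\phi^2 = 0 . \tag{2.52} $$
Likewise, for $\phi$ one finds
$$ \ddot\phi + \frac2r\dot r\dot\phi = 0 . \tag{2.53} $$
Here the factor of 2 arises because both $\Gamma^\phi_{r\phi}$ and $\Gamma^\phi_{\phi r} = \Gamma^\phi_{r\phi}$ contribute.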
Remarks:
(a) These equations are supposed to describe geodesics in $\mathbb{R}^2$, i.e. straight lines. This
can be verified in general (but, in general, polar coordinates are of course not
particularly well suited to describe straight lines). However, it is easy to
find a special class of solutions to the above equations, namely curves with
$\dot\phi = \ddot r = 0$. These correspond to paths of the form
$$ r(\tau) = a\tau + b , \qquad \phi(\tau) = \phi_0 , \tag{2.54} $$
which are a special case of straight lines, namely straight lines through the
origin.
(b) The geodesic equations can of course also be derived as the Euler-Lagrange
equations of the Lagrangian
$$ L = \tfrac12(\dot r^2 + r^2\dot\phi^2) . \tag{2.55} $$
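That a generic straight line (not through the origin) also solves the polar geodesic equations can be checked numerically. The sketch below assumes an arbitrarily chosen line $x = x_0 + v_x\tau$, $y = y_0 + v_y\tau$ (the parameter values are illustrative only), converts it to polar coordinates and evaluates the left-hand sides of $\ddot r - r\dot\phi^2 = 0$ and $\ddot\phi + (2/r)\dot r\dot\phi = 0$ with central finite differences.

```python
import math

def polar(tau, x0=1.0, vx=0.3, y0=-0.5, vy=0.8):
    """Polar coordinates (r, phi) of the straight line x = x0 + vx*tau, y = y0 + vy*tau."""
    x, y = x0 + vx * tau, y0 + vy * tau
    return math.hypot(x, y), math.atan2(y, x)

def derivs(tau, h=1e-4):
    """Central finite differences for r, phi and their first and second tau-derivatives."""
    (rm, pm), (r0, p0), (rp, pp) = polar(tau - h), polar(tau), polar(tau + h)
    r_dot, phi_dot = (rp - rm) / (2 * h), (pp - pm) / (2 * h)
    r_ddot, phi_ddot = (rp - 2 * r0 + rm) / h**2, (pp - 2 * p0 + pm) / h**2
    return r0, r_dot, r_ddot, phi_dot, phi_ddot

def residuals(tau):
    """Left-hand sides of the r- and phi-geodesic equations in polar coordinates."""
    r, r_dot, r_ddot, phi_dot, phi_ddot = derivs(tau)
    return (r_ddot - r * phi_dot**2,
            phi_ddot + 2.0 / r * r_dot * phi_dot)

for tau in (0.0, 1.0, 2.5):
    print(tau, residuals(tau))
```

Both residuals vanish up to finite-difference error, although neither $\ddot r$ nor $\ddot\phi$ vanishes separately along this line.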
The next simplest example to discuss would be the two-sphere with its standard metric
$d\theta^2 + \sin^2\theta\,d\phi^2$. It will appear, in bits and pieces, in section 2.5 to illustrate the general
remarks.
As another example, let us consider the ultrastatic metrics introduced in (1.126) with
coordinates $x^\mu = (t, x^k)$ and line-element
$$ ds^2 = -dt^2 + \tilde g_{ik}(x)\,dx^i dx^k . $$
Because $g_{00} = -1$, $g_{0k} = 0$, and the $g_{ik} = \tilde g_{ik}$ are time-independent, all Christoffel
symbols with at least one $x^0$- or $t$-index are zero,
$$ \Gamma^0_{\mu\nu} = \Gamma^\mu_{0\nu} = \Gamma^\mu_{\nu 0} = 0 , \tag{2.60} $$
and the purely spatial components of the Christoffel symbols agree with those of the
spatial metric,
$$ \Gamma^i_{jk} = \tilde\Gamma^i_{jk} . \tag{2.61} $$
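Thus the geodesic equations split into
$$ \ddot t = 0 , \tag{2.62} $$
$$ \ddot x^i + \tilde\Gamma^i_{jk}\dot x^j\dot x^k = 0 , \tag{2.63} $$
where $\tilde\Gamma^i_{jk}$ denotes the Christoffel symbols of the spatial metric and the dot denotes a
derivative with respect to the affine parameter $\tau$. The first equation tells us that
$$ \ddot t = 0 \quad\Rightarrow\quad t(\tau) = a\tau + t_0 . \tag{2.64} $$
Thus provided that $a \neq 0$ we can use $t$ instead of $\tau$ to parametrise the paths (and in
the present case $t$ is then also an affine parameter, cf. the discussion in section 2.2 in
connection with (2.38)), and then one can rewrite the spatial equations as equations for
$x^i = x^i(t)$,
$$ \frac{d^2x^i}{dt^2} + \tilde\Gamma^i_{jk}\frac{dx^j}{dt}\frac{dx^k}{dt} = 0 . \tag{2.65} $$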
Therefore the solutions to the space-time geodesic equations have the form
$$ x^\mu(\tau) = (t(\tau), x^i(t(\tau))) , \qquad t(\tau) = a\tau + t_0 , \tag{2.66} $$
where $x^i(t)$ is an affinely parametrised geodesic for the metric $g_{ij}$. When $a = 0$, one
cannot change variables from $\tau$ to $t$ because $t = t_0$ is fixed. One is then necessarily
dealing with spacelike geodesics in space-time and the solutions have the form
$$ x^\mu(\tau) = (t_0, x^i(\tau)) \tag{2.67} $$
where $x^i(\tau)$ is again an affinely parametrised geodesic for the metric $g_{ij}$.
These sorts of considerations evidently generalise to more general metrics of this direct
product form,
$$ ds^2 = g_{ab}(y)\,dy^a dy^b + g_{ik}(x)\,dx^i dx^k , \tag{2.68} $$
with the conclusion that geodesics in such space-times have the form $(y^a(\tau), x^i(\tau))$ with
$y^a(\tau)$ and $x^i(\tau)$ individually solutions of the geodesic equations for the metric $g_{ab}(y)$
respectively $g_{ik}(x)$.
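Recall from above that the geodesic equation for a metric $g_{\mu\nu}$ can be derived from the
Lagrangian $L = (1/2)g_{\mu\nu}\dot x^\mu\dot x^\nu$,
$$ \frac{d}{d\tau}\frac{\partial L}{\partial\dot x^\mu} - \frac{\partial L}{\partial x^\mu} = 0 . \tag{2.69} $$
This has several immediate consequences which are useful for the determination of
Christoffel symbols and geodesics in practice.
In particular, if the metric does not depend on one of the coordinates, say $x^1$, then $x^1$ is
a cyclic variable and the corresponding momentum
$$ p_1 = \partial L/\partial\dot x^1 = g_{1\nu}\dot x^\nu \tag{2.70} $$
is conserved along the geodesic.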
Remarks:
(a) One might perhaps have wanted to argue that the definition (and interpretation)
of conserved momenta should be based on the "physical" Lagrangian (2.1)
$$ L_0 = -m\sqrt{-g_{\alpha\beta}\frac{dx^\alpha}{d\lambda}\frac{dx^\beta}{d\lambda}} \tag{2.71} $$
with action $S = -m\int d\tau$, but this makes no difference since the two momenta
are essentially equal: one has
$$ \frac{\partial L_0}{\partial(dx^1/d\lambda)} = mp_1 \tag{2.72} $$
with $p_1$ as defined in (2.70), so that this just supplies us with the additional
information that the momenta obtained from the Lagrangian $L$ should (for a
massive particle) be interpreted as momenta per unit mass. This discrepancy
could have been avoided by working with the Lagrangian $mL$ (alternatively:
fixing the gauge $e(\lambda) = m^{-1}$ in section 2.1, see (2.12)), but unless or until
one starts coupling the particle to fields other than the gravitational field it
is unnecessary (and a nuisance) to carry $m$ around all the time.
(b) For example, on the two-sphere the Lagrangian reads
$$ L = \tfrac12(\dot\theta^2 + \sin^2\theta\,\dot\phi^2) . \tag{2.73} $$
The angle $\phi$ is a cyclic variable and the angular momentum (actually angular
momentum per unit mass for a massive particle)
$$ p_\phi = \frac{\partial L}{\partial\dot\phi} = \sin^2\theta\,\dot\phi \tag{2.74} $$
is a conserved quantity. This generalises to conservation of angular momentum
for a particle moving in an arbitrary spherically symmetric gravitational
field.
(c) Likewise, if the metric is independent of the time coordinate $x^0 = t$, the
corresponding conserved quantity
$$ p_0 = g_{0\nu}\dot x^\nu \equiv -E \tag{2.75} $$
has the interpretation as minus the energy (per unit mass) of the particle,
"minus" because, with our sign conventions, $p_0 = -E$ in special relativity.
We will discuss the relation between this notion of energy and the notion
of energy familiar from special relativity (this requires an asymptotically
Minkowski-like metric) in more detail in section 24.1.
(d) We will discuss in more detail in section 2.6 (and then again in sections
8 and 9) how to detect and describe symmetries and conserved charges in
coordinate systems in which the symmetries are not as manifest (via cyclic
variables) as above.
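The conservation of $p_\phi = \sin^2\theta\,\dot\phi$ on the two-sphere can be illustrated numerically. The following minimal sketch integrates the two-sphere geodesic equations $\ddot\theta = \sin\theta\cos\theta\,\dot\phi^2$, $\ddot\phi = -2\cot\theta\,\dot\theta\dot\phi$ with a hand-written fourth-order Runge-Kutta step; the initial data, step size and helper names are arbitrary choices for this example.

```python
import math

def rhs(state):
    """Geodesic equations on the unit two-sphere as a first-order system."""
    theta, phi, theta_dot, phi_dot = state
    return (theta_dot,
            phi_dot,
            math.sin(theta) * math.cos(theta) * phi_dot**2,
            -2.0 * math.cos(theta) / math.sin(theta) * theta_dot * phi_dot)

def rk4_step(state, h):
    """One classical Runge-Kutta step of size h."""
    def shifted(k, c):
        return tuple(s + c * ki for s, ki in zip(state, k))
    k1 = rhs(state)
    k2 = rhs(shifted(k1, h / 2))
    k3 = rhs(shifted(k2, h / 2))
    k4 = rhs(shifted(k3, h))
    return tuple(s + h / 6 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))

def p_phi(state):
    """Angular momentum per unit mass, sin^2(theta) * phi_dot."""
    theta, _, _, phi_dot = state
    return math.sin(theta)**2 * phi_dot

def two_L(state):
    """Twice the Lagrangian, theta_dot^2 + sin^2(theta) * phi_dot^2."""
    theta, _, theta_dot, phi_dot = state
    return theta_dot**2 + math.sin(theta)**2 * phi_dot**2

state = (1.0, 0.0, 0.3, 0.7)   # arbitrary initial data (theta, phi, theta_dot, phi_dot)
p_start, L_start = p_phi(state), two_L(state)
for _ in range(5000):
    state = rk4_step(state, 1e-3)
print(p_phi(state) - p_start, two_L(state) - L_start)
```

Both differences stay at the level of the integration error, while $\theta$ and $\dot\phi$ individually change along the curve. The second conserved quantity, $2L$, reflects the fact that the affine parametrisation is preserved.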
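For a metric of the form $ds^2 = dy^2 + g_{\mu\nu}(x,y)\,dx^\mu dx^\nu$, with Lagrangian
$$ L = \tfrac12(\dot y^2 + g_{\mu\nu}\dot x^\mu\dot x^\nu) , \tag{2.76} $$
the Euler-Lagrange equations take the form
$$ \begin{aligned}
\ddot y - \tfrac12 g_{\mu\nu,y}\dot x^\mu\dot x^\nu &= 0 \\
\ddot x^\mu + \text{terms proportional to } \dot x &= 0 .
\end{aligned} \tag{2.77} $$
Therefore $\dot x^\mu = 0$, $\ddot y = 0$ is a solution of the geodesic equation, and it describes
motion along the coordinate lines of $y$.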
Remarks:
(a) In the case of the two-sphere, with its metric $ds^2 = d\theta^2 + \sin^2\theta\,d\phi^2$, this
translates into the familiar statement that the great circles, the coordinate
lines of $y = \theta$, are geodesics.
(b) The result is also valid when y is a timelike coordinate. For example, consider
a space-time with coordinates (t, xi ) and metric (1.128)
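$$ \ddot x^\mu + \Gamma^\mu_{\nu\lambda}\dot x^\nu\dot x^\lambda = 0 \tag{2.80} $$
Remarks:
$$ \Gamma^\mu_{\nu\lambda}\dot x^\nu\dot x^\lambda = \Gamma^\mu_{11}(\dot x^1)^2 + 2\Gamma^\mu_{12}\dot x^1\dot x^2 + \ldots \tag{2.81} $$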
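(b) For example, once again in the case of the two-sphere, for the $\theta$-equation one
has
$$ \frac{d}{d\tau}\frac{\partial L}{\partial\dot\theta} = \ddot\theta , \qquad \frac{\partial L}{\partial\theta} = \sin\theta\cos\theta\,\dot\phi^2 . \tag{2.82} $$
Comparing the resulting Euler-Lagrange equation
$$ \ddot\theta - \sin\theta\cos\theta\,\dot\phi^2 = 0 \tag{2.83} $$
with the geodesic equation
$$ \ddot\theta + \Gamma^\theta_{\theta\theta}\dot\theta^2 + 2\Gamma^\theta_{\theta\phi}\dot\theta\dot\phi + \Gamma^\theta_{\phi\phi}\dot\phi^2 = 0 , \tag{2.84} $$
one can immediately read off that
$$ \Gamma^\theta_{\phi\phi} = -\sin\theta\cos\theta , \qquad \Gamma^\theta_{\theta\theta} = \Gamma^\theta_{\theta\phi} = 0 . \tag{2.85} $$
Likewise, from
$$ \frac{d}{d\tau}\frac{\partial L}{\partial\dot\phi} - \frac{\partial L}{\partial\phi} = 0 \quad\Rightarrow\quad \sin^2\theta\,(\ddot\phi + 2\cot\theta\,\dot\theta\dot\phi) = 0 \tag{2.86} $$
one finds
$$ \Gamma^\phi_{\theta\phi} = \Gamma^\phi_{\phi\theta} = \cot\theta , \qquad \Gamma^\phi_{\theta\theta} = \Gamma^\phi_{\phi\phi} = 0 . \tag{2.87} $$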
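(c) As another example, consider the metric
$$ ds^2 = -A(r)dt^2 + B(r)dr^2 + r^2(d\theta^2 + \sin^2\theta\,d\phi^2) . \tag{2.88} $$
As will be discussed in section 23.2, this is the general form of a static spherically
symmetric metric, and as such will provide us with the starting point
for describing the gravitational field of a star. The corresponding Lagrangian
is
$$ L = \tfrac12\left(-A(r)\dot t^2 + B(r)\dot r^2 + r^2(\dot\theta^2 + \sin^2\theta\,\dot\phi^2)\right) . \tag{2.89} $$
The Euler-Lagrange equation for $t$ is
$$ \ddot t + \frac{A'}{A}\dot t\dot r = 0 \tag{2.90} $$
(a prime denoting an $r$-derivative), from which one can immediately read off
$$ \Gamma^t_{rt} = \Gamma^t_{tr} = \frac{A'}{2A} , \qquad \Gamma^t_{\mu\nu} = 0 \text{ otherwise} . \tag{2.91} $$
Likewise, the equation for $r$ takes the form
$$ \ddot r + \frac{B'}{2B}\dot r^2 + \frac{A'}{2B}\dot t^2 + \ldots = 0 , \tag{2.92} $$
and from this one can read off that
$$ \Gamma^r_{rr} = \frac{B'}{2B} , \qquad \Gamma^r_{tt} = \frac{A'}{2B} , \qquad \ldots \tag{2.93} $$
As we will need them anyway in section 23.3, it is a good exercise to determine
all the Christoffel symbols in this way.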
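Christoffel symbols read off from the Euler-Lagrange equations can be cross-checked against the defining formula. The sketch below assumes a diagonal metric supplied as a Python function (here the unit two-sphere, as an example) and evaluates $\Gamma^\mu_{\nu\lambda} = \tfrac12 g^{\mu\rho}(g_{\rho\nu,\lambda} + g_{\rho\lambda,\nu} - g_{\nu\lambda,\rho})$ with central finite differences. The function names and the restriction to diagonal metrics are simplifications made for this illustration.

```python
import math

def sphere_metric(x):
    """Unit two-sphere metric in coordinates x = (theta, phi)."""
    theta, _ = x
    return [[1.0, 0.0], [0.0, math.sin(theta)**2]]

def christoffel(metric, x, h=1e-5):
    """Gamma^m_{nu l} for a DIAGONAL metric, using central finite differences.

    For a diagonal metric the inverse is 1/g_mm, so that
    Gamma^m_{nu l} = (g_{m nu,l} + g_{m l,nu} - g_{nu l,m}) / (2 g_mm).
    """
    n = len(x)
    g = metric(x)
    dg = []                                    # dg[k][i][j] = d g_{ij} / d x^k
    for k in range(n):
        xp, xm = list(x), list(x)
        xp[k] += h
        xm[k] -= h
        gp, gm = metric(xp), metric(xm)
        dg.append([[(gp[i][j] - gm[i][j]) / (2 * h) for j in range(n)]
                   for i in range(n)])
    return [[[0.5 / g[m][m] * (dg[l][m][nu] + dg[nu][m][l] - dg[m][nu][l])
              for l in range(n)] for nu in range(n)] for m in range(n)]

theta = 0.9
G = christoffel(sphere_metric, [theta, 0.3])
print(G[0][1][1], -math.sin(theta) * math.cos(theta))   # Gamma^theta_{phi phi}
print(G[1][0][1], math.cos(theta) / math.sin(theta))    # Gamma^phi_{theta phi}
```

The two printed pairs agree up to finite-difference error, matching the values obtained above from the Lagrangian of the two-sphere.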
2.6 Conserved Charges and (a first encounter with) Killing Vectors
In the previous section we have seen that cyclic coordinates, i.e. coordinates the metric
does not depend on, lead to conserved charges, as in (2.70). As nice and useful as this
may be (and it is nice and useful), it is obviously somewhat unsatisfactory because it
is an explicitly coordinate-dependent statement: the metric may well be independent
of one coordinate in some coordinate system, but if one now performs a coordinate
transformation which depends on that coordinate, then in the new coordinate system
the metric will typically depend on all the new coordinates. Nevertheless,
thus there should be a corresponding first integral of the geodesic equation in any
coordinate system.
To see how this works, let us reconsider the situation discussed in the previous section,
namely a metric which in some coordinate system, we will now call it $\{y^{\mu'}\}$, has
components $g_{\mu'\nu'}$ which are independent of $y^1$, say. Translation invariance of the geodesic
Lagrangian is the statement that the Lagrangian is invariant under the infinitesimal
variation $\delta y^1 = \epsilon$, $\delta y^{\mu'} = 0$ otherwise, and via Noether's theorem this leads to a
conserved charge $g_{1\nu'}\dot y^{\nu'}$, as in (2.70).
Now we ask ourselves what this statement corresponds to in another coordinate system.
Note that in the $y$-coordinates, invariance is the statement that the metric is invariant
under the (infinitesimal) coordinate transformation $y^1 \to y^1 + \epsilon$ or $\delta y^1 = \epsilon$, $\delta y^{\mu'} = 0$
otherwise,
$$ \delta g_{\mu'\nu'} \equiv \epsilon\,\partial_{y^1}g_{\mu'\nu'} = 0 . \tag{2.94} $$
It is then clear that in another coordinate system, infinitesimal $y^1$-translations must also
correspond to some infinitesimal coordinate transformation (but not necessarily just a
translation),
$$ \delta x^\mu = \epsilon V^\mu(x) . \tag{2.95} $$
In particular, if (as in the above example) in $y$-coordinates $V$ has the components
$V^1 = 1$, $V^{\mu'} = 0$ otherwise, then in any other coordinate system one has
$$ \delta x^\mu = (\partial x^\mu/\partial y^{\mu'})\,\delta y^{\mu'} = \epsilon\,(\partial x^\mu/\partial y^1) \tag{2.96} $$
so that
$$ V^\mu = J^\mu_1 \tag{2.97} $$
is just the corresponding column of the Jacobi matrix.
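1. We can investigate directly, under which conditions on the $V^\mu$ the transformation
(2.95) leads to an invariance of the Lagrangian (2.2). Using
$$ \delta\dot x^\mu = \epsilon\,\dot x^\nu\partial_\nu V^\mu \tag{2.98} $$
one finds that the variation of the Lagrangian is
$$ \delta L = \tfrac12\epsilon\,(\delta_V g_{\mu\nu})\,\dot x^\mu\dot x^\nu , \tag{2.99} $$
where
$$ \delta_V g_{\mu\nu} = V^\lambda\partial_\lambda g_{\mu\nu} + (\partial_\mu V^\lambda)g_{\lambda\nu} + (\partial_\nu V^\lambda)g_{\mu\lambda} . \tag{2.100} $$
Thus the condition for the infinitesimal transformation (2.95) to leave the
Lagrangian invariant is
$$ \delta_V g_{\mu\nu} = 0 . \tag{2.101} $$
By Noether's theorem, the corresponding conserved charge is
$$ Q_V = p_\mu V^\mu = g_{\mu\nu}V^\mu\dot x^\nu . \tag{2.102} $$
Note that for constant components $V^\mu$, (2.101) is simply the statement that the
metric is constant in the direction $V$, $V^\lambda\partial_\lambda g_{\mu\nu} = 0$.
$$ g_{\mu'\nu'} = J^\mu_{\mu'}J^\nu_{\nu'}g_{\mu\nu} , \qquad \partial_{y^1} = (\partial_{y^1}x^\mu)\partial_\mu = J^\mu_1\partial_\mu = V^\mu\partial_\mu \tag{2.103} $$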
J1 J = 1 J = J1 = V = J V (2.105)
All of this may seem a bit ham-handed at this point, and indeed it is. However, we will
see later how these results can be written and understood in a much more pleasing and
covariant way. In particular, we will see in section 4.5 how to write (2.100) in a way
that makes it completely manifest that it transforms like the metric under coordinate
transformations. Moreover, we will discover in section 8 that (2.100) is a special case of
the Lie derivative of a tensor field along a vector field $V$, denoted by $L_V$. Continuous
symmetries of a metric correspond to vector fields along which the Lie derivative of the
metric vanishes. Such vectors are known as Killing vectors, and are thus vectors $V^\mu$
satisfying the Killing equation (2.101),
$$ L_V g_{\mu\nu} \equiv \delta_V g_{\mu\nu} = 0 . \tag{2.106} $$
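The Killing condition can be tested numerically for simple examples. The sketch below evaluates $V^\lambda\partial_\lambda g_{\mu\nu} + (\partial_\mu V^\lambda)g_{\lambda\nu} + (\partial_\nu V^\lambda)g_{\mu\lambda}$ by central finite differences for a metric and a vector field given as Python functions. The helper names and the four test cases (rotation and dilation in Cartesian coordinates, angular and radial translations in polar coordinates) are choices made for this illustration only.

```python
def d_metric(g, x, lam, h=1e-5):
    """Finite-difference derivative d g_{mu nu} / d x^lam."""
    xp, xm = list(x), list(x)
    xp[lam] += h
    xm[lam] -= h
    gp, gm = g(xp), g(xm)
    n = len(x)
    return [[(gp[m][k] - gm[m][k]) / (2 * h) for k in range(n)] for m in range(n)]

def d_vector(V, x, mu, h=1e-5):
    """Finite-difference derivative d V^lam / d x^mu, returned as a list over lam."""
    xp, xm = list(x), list(x)
    xp[mu] += h
    xm[mu] -= h
    return [(a - b) / (2 * h) for a, b in zip(V(xp), V(xm))]

def delta_V_g(g, V, x):
    """V^l d_l g_{mk} + (d_m V^l) g_{lk} + (d_k V^l) g_{ml} at the point x."""
    n = len(x)
    gx, Vx = g(x), V(x)
    dg = [d_metric(g, x, lam) for lam in range(n)]   # dg[lam][mu][nu]
    dV = [d_vector(V, x, mu) for mu in range(n)]     # dV[mu][lam]
    return [[sum(Vx[l] * dg[l][m][k] + dV[m][l] * gx[l][k] + dV[k][l] * gx[m][l]
                 for l in range(n))
             for k in range(n)] for m in range(n)]

flat = lambda x: [[1.0, 0.0], [0.0, 1.0]]            # Euclidean metric, Cartesian (x, y)
polar = lambda x: [[1.0, 0.0], [0.0, x[0] ** 2]]     # Euclidean metric, polar (r, phi)
rotation = lambda x: [-x[1], x[0]]                   # generator of rotations
dilation = lambda x: [x[0], x[1]]                    # generator of rescalings

print(delta_V_g(flat, rotation, [0.4, -1.3]))               # vanishes: Killing vector
print(delta_V_g(flat, dilation, [0.4, -1.3]))               # twice the metric: not Killing
print(delta_V_g(polar, lambda x: [0.0, 1.0], [1.5, 0.2]))   # phi-translation: vanishes
print(delta_V_g(polar, lambda x: [1.0, 0.0], [1.5, 0.2]))   # r-translation: (phi,phi) entry 2r
```

The rotation generator and the $\phi$-translation describe the same symmetry in two coordinate systems, which illustrates that the condition does not require the metric to be independent of a coordinate in the chosen coordinates.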
We saw that the 10 components of the metric $g_{\mu\nu}$ appear to play the role of potentials for
the gravitational force. In order to substantiate this, and to show that in an appropriate
limit this setting is able to reproduce the Newtonian results, we now want to find the
relation of these potentials to the Newtonian potential, and the relation between the
geodesic equation and the Newtonian equation of motion for a particle moving in a
gravitational field.
First let us determine the conditions under which we might expect the general relativistic
equation of motion (namely the coupled set of non-linear ordinary differential geodesic
equations) to reduce to the linear equation of motion
$$ \frac{d^2}{dt^2}\vec x = -\vec\nabla\Phi \tag{2.107} $$
of Newtonian mechanics, with $\Phi$ the gravitational potential, e.g.
$$ \Phi = -\frac{G_N M}{r} . \tag{2.108} $$
Thus we are trying to characterise the circumstances in which we know and can trust
the validity of Newton's equations, such as those provided e.g. by the gravitational field
of the earth or the sun, the gravitational fields in which Newton's laws were discovered
and tested. Two of these are fairly obvious:
1. Weak Fields: our first plausible assumption is that the gravitational field is in a
suitable sense sufficiently weak. We will need to make more precise what we
mean by this, and we will come back to this below.
2. Slow Motion: our second, equally reasonable and plausible, assumption is that the
test particle moves at speeds at which we can neglect special relativistic effects, so
"slow" should be taken to mean that its velocity is small compared to the velocity
of light.
Interestingly, it turns out that one more condition is required. Note that the gravita-
tional fields we have access to are not only quite weak but also only very slowly varying
in time, and we will add this condition,
3. Stationary Fields: we will assume that the gravitational field does not vary sig-
nificantly in time (over the time scale probed by our test particle).
The very fact that we have to add this condition in order to find Newton's equations
(as will be borne out by the calculations below) is interesting in its own right, because
it also shows that general relativity predicts phenomena deviating from the Newtonian
picture even for weak fields, provided that they vary sufficiently rapidly (e.g. quickly
oscillating fields), and one such phenomenon is that of gravitational waves (see section
22).
Now, having formulated in words the conditions that we wish to impose, we need to
translate these conditions into equations that we can then use in conjunction with the
geodesic equation.
1. In order to define a notion of weak fields, we need to keep in mind that this is
not a coordinate-independent statement since we can simulate arbitrarily strong
gravitational fields even in Minkowski space by going to suitably accelerated
coordinates, and therefore a weak field condition will be a condition not only on
the metric but also on the choice of coordinates. Thus we assume that we can
choose coordinates $\{x^\mu\} = \{t, x^i\}$ in such a way that in these coordinates the
metric differs from the standard constant Minkowski metric $\eta_{\mu\nu}$ only by a small
amount,
$$ g_{\mu\nu} = \eta_{\mu\nu} + h_{\mu\nu} . \tag{2.109} $$
2. The second condition is obviously (with the coordinates chosen above) $dx^i/dt \ll 1$
or, expressed in terms of proper time,
$$ \frac{dx^i}{d\tau} \ll \frac{dt}{d\tau} . \tag{2.110} $$
3. The third condition, that the gravitational field is stationary, means that in these
coordinates the metric is time-independent,
$$ g_{\mu\nu,0} = 0 \tag{2.111} $$
(for a discussion and explanation of the difference between the term "stationary"
used here and the term "static" used e.g. to describe the metric (2.88), see section
15.4 - it is not crucial here).
With these conditions, we now turn to the geodesic equation
$$ \ddot x^\mu + \Gamma^\mu_{\nu\lambda}\dot x^\nu\dot x^\lambda = 0 . \tag{2.112} $$
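From the decomposition $g_{\mu\nu} = \eta_{\mu\nu} + h_{\mu\nu}$ we see that $\Gamma^\mu_{\nu\lambda}$ is at least linear in $h_{\mu\nu}$, and by
the weak field condition (condition 1) we will only retain the terms linear in $h_{\mu\nu}$. Then
the condition of slow motion (condition 2) implies that among the quadratic terms $\dot x^\nu\dot x^\lambda$
we need to only retain the leading term, namely $\dot t\,\dot t$. Thus the geodesic equation can be
approximated by
$$ \ddot x^\mu + \Gamma^\mu_{00}\dot t^2 = 0 . \tag{2.113} $$
By stationarity (condition 3), all time-derivatives of the metric vanish, so that
$$ \Gamma^\mu_{00} = -\tfrac12 g^{\mu\nu}\partial_\nu g_{00} . \tag{2.114} $$
From the weak field condition (condition 1), which allows us to write
$$ g_{\mu\nu} = \eta_{\mu\nu} + h_{\mu\nu} \quad\Rightarrow\quad g^{\mu\nu} = \eta^{\mu\nu} - h^{\mu\nu} , \tag{2.115} $$
where
$$ h^{\mu\nu} = \eta^{\mu\lambda}\eta^{\nu\rho}h_{\lambda\rho} , \tag{2.116} $$
we learn that
$$ \Gamma^\mu_{00} = -\tfrac12\eta^{\mu i}\partial_i h_{00} , \tag{2.117} $$
so that $\Gamma^0_{00} = 0$ and $\Gamma^i_{00} = -\tfrac12\partial^i h_{00}$. The geodesic equation thus splits into
$$ \begin{aligned}
\ddot t &= 0 \\
\ddot x^i &= \tfrac12\partial^i h_{00}\,\dot t^2 .
\end{aligned} \tag{2.119} $$
The first of these just says that $\dot t$ is constant, or that $t$ is also an affine parameter,
$$ t(\tau) = a\tau + b . \tag{2.120} $$
In other words, in the Newtonian limit there is essentially (up to a choice of scale/units)
no difference between coordinate time and proper time. We can use this in the second
equation to convert the $\tau$-derivatives into derivatives with respect to the coordinate
time $t$,
$$ \ddot t = 0 \quad\Rightarrow\quad \frac{1}{\dot t^2}\frac{d^2}{d\tau^2} = \frac{1}{\dot t}\frac{d}{d\tau}\frac{1}{\dot t}\frac{d}{d\tau} = \frac{d^2}{dt^2} . \tag{2.121} $$
Hence we obtain
$$ \frac{d^2x^i}{dt^2} = \tfrac12 h_{00,i} \tag{2.122} $$
(the spatial index $i$ in this expression is raised or lowered with the Kronecker symbol,
$\eta^{ik} = \delta^{ik}$). Comparing this with the Newtonian equation (2.107),
$$ \frac{d^2x^i}{dt^2} = -\Phi_{,i} \tag{2.123} $$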
leads us (with the constant of integration absorbed into an arbitrary constant term in
the gravitational potential) to the key identification
$$ h_{00} = -2\Phi \tag{2.124} $$
between the Newtonian gravitational potential and the (00)-component of the deviation
of the space-time metric from the Minkowski metric. By relating this back to $g_{\mu\nu}$,
$$ g_{00} = -(1 + 2\Phi) , \tag{2.125} $$
we find the sought-for relation between the Newtonian potential and the space-time
metric. Thus Newtonian gravity can be captured or described by a space-time metric
of the form
$$ ds^2 = -(1 + 2\Phi(\vec x))dt^2 + d\vec x^2 . \tag{2.126} $$
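To get a feeling for the size of the deviation from the Minkowski metric, one can evaluate $\Phi/c^2 = -G_N M/(rc^2)$ for familiar bodies. The sketch below assumes the form $g_{00} = -(1 + 2\Phi/c^2)$ with $c$ restored, together with rounded reference values for Newton's constant, the speed of light and the masses and radii of the earth and the sun; the numerical inputs are supplied for this illustration and are not taken from these notes.

```python
G_N = 6.674e-11      # Newton's constant in m^3 kg^-1 s^-2 (rounded)
C = 2.998e8          # speed of light in m/s (rounded)

BODIES = {           # (mass in kg, radius in m), rounded reference values
    "earth": (5.972e24, 6.371e6),
    "sun": (1.989e30, 6.957e8),
}

def potential_over_c2(mass, radius):
    """Dimensionless Newtonian potential Phi/c^2 = -G_N M / (r c^2) at distance r."""
    return -G_N * mass / (radius * C**2)

def g00(mass, radius):
    """g_00 = -(1 + 2 Phi / c^2), the weak-field identification with c restored."""
    return -(1.0 + 2.0 * potential_over_c2(mass, radius))

for name, (mass, radius) in BODIES.items():
    print(name, potential_over_c2(mass, radius), g00(mass, radius))
```

With these inputs $|\Phi|/c^2$ comes out slightly below $10^{-9}$ at the surface of the earth and at a few times $10^{-6}$ at the surface of the sun, so $g_{00}$ differs from $-1$ only in the sixth decimal place or beyond.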
For a radial gravitational field, with $\Phi = \Phi(r)$, it is also natural to write this in terms
of spatial spherical coordinates as
$$ ds^2 = -(1 + 2\Phi(r))dt^2 + dr^2 + r^2d\Omega^2 . \tag{2.127} $$
Remarks:
1. With the speed of light not set equal to $c = 1$, the dimensionally correct form
of this identification is (recall that kinetic and potential energy have the same
dimension so that the dimension of $\Phi$, the energy per unit mass, is that of a
velocity-squared; thus $\Phi/c^2$ is dimensionless)
$$ g_{00} = -(1 + 2\Phi/c^2) . \tag{2.128} $$
2. For the gravitational field of isolated systems, it makes sense to choose the
integration constant in such a way that the potential goes to zero at infinity, and
this choice also ensures that the metric approaches the flat Minkowski metric at
infinity.
3. Restoring the appropriate units, in particular the above factor of $c^2$, one finds
that the dimensionless factor $|\Phi|/c^2 \sim 10^{-9}$ on the surface of the earth, $\sim 10^{-6}$ on
the surface of the sun (see section 23.4 for some more details), so that the distortion
in the space-time geometry produced by gravitation is in general quite small
(justifying our approximations).
5. Likewise, in this approximation it does not make sense to inquire about the other
subleading components of the metric. As we have seen, a slowly moving particle
in a weak static gravitational field is not sensitive to them, and hence can also not
be used to probe or determine these components.
with
\begin{equation}
c^2(x) = (1 + 2\phi(x)/c^2)\,c^2 . \tag{2.130}
\end{equation}
Einstein realised fairly early on (1911) in his search for a relativistic theory of
gravity that this would have to be part of the story. However, this interpretation
is neither useful nor tenable when considering gravitational fields beyond the static
Newtonian approximation (which requires one to go beyond a theory with a single
scalar potential).
7. Later on, we will determine the exact solution of the Einstein equations (the field
equations for the gravitational field, i.e. for the metric) for the gravitational field
outside a spherically symmetric mass distribution with mass M (the Schwarzschild
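metric). The metric turns out to have the simple form (23.28)
\begin{equation}
ds^2 = -\left(1 - \frac{2G_N M}{c^2 r}\right) c^2 dt^2 + \left(1 - \frac{2G_N M}{c^2 r}\right)^{-1} dr^2 + r^2 d\Omega^2 . \tag{2.131}
\end{equation}
From this expression one can read off that the leading correction to the flat metric
indeed arises from the 00-component of the metric,
\begin{equation}
\begin{aligned}
ds^2 &\approx -c^2 dt^2 + dr^2 + r^2 d\Omega^2 + \frac{2G_N M}{r}\,dt^2 + \ldots \\
&= \eta_{\mu\nu}dx^\mu dx^\nu + \frac{2G_N M}{rc^2}\,(dx^0)^2 + \ldots .
\end{aligned} \tag{2.132}
\end{equation}
This is indeed precisely of the above Newtonian form, with the standard Newtonian
potential
\begin{equation}
\phi(r) = -\frac{G_N M}{r} . \tag{2.133}
\end{equation}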
One can then also determine the subleading (known as post-Newtonian) correc-
tions to the general relativistic gravitational field, which are evidently suppressed
by additional inverse powers of c2 .
8. The key relation (2.124) can also be obtained at the level of the action. Starting
with the action $S_0$ (the integral of the proper time), and using the time-coordinate
$t$ as the parameter, using the same approximations as above one finds that the
action can be written as (keeping $c$ explicit for a change, so that $x^0 = ct$)
\begin{equation}
\begin{aligned}
S_0[x] &= -mc \int dt\, \sqrt{-g_{\mu\nu}(dx^\mu/dt)(dx^\nu/dt)} \\
&= -mc \int dt\, \sqrt{-\eta_{\mu\nu}(dx^\mu/dt)(dx^\nu/dt) - h_{\mu\nu}(dx^\mu/dt)(dx^\nu/dt)} \\
&= -mc \int dt\, \sqrt{c^2 - \delta_{ik}(dx^i/dt)(dx^k/dt) - h_{00}c^2} \\
&= -mc^2 \int dt\, \sqrt{1 - \vec{v}^2/c^2 - h_{00}} .
\end{aligned} \tag{2.134}
\end{equation}
Expanding the square root and dropping the first (irrelevant) term, one finds that
in this limit the action reduces to
\begin{equation}
S_0[x] \approx \int dt \left( \frac{m}{2}\,\vec{v}^2 + \frac{mc^2}{2}\,h_{00} \right) . \tag{2.135}
\end{equation}
In this compact (but slightly dubious) derivation of this relation, the significance
of the stationarity condition is not manifest: it enters through the condition of
the equivalence of the 4-dimensional and 3-dimensional variational principles (with
respect to the fields $x^\mu(\tau)$ and $x^k(t)$ respectively), guaranteed by the affine relation
between $t$ and $\tau$ implied by requiring in addition stationarity.
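As a numerical sanity check of the expansion leading from (2.134) to (2.135), the following Python sketch compares the exact integrand $-mc^2\sqrt{1 - \vec{v}^2/c^2 - h_{00}}$ (with the constant rest-energy term subtracted) with the leading-order expression. The function names and the sample values of $v$ and $\phi$ are illustrative assumptions, and the identification $h_{00} = -2\phi/c^2$ is inserted by hand.

```python
import math

C = 299_792_458.0  # speed of light in m/s


def exact_lagrangian(m, v, h00):
    """Integrand of (2.134): -m c^2 sqrt(1 - v^2/c^2 - h00)."""
    return -m * C**2 * math.sqrt(1.0 - v**2 / C**2 - h00)


def newtonian_lagrangian(m, v, h00):
    """Integrand of (2.135): (m/2) v^2 + (m c^2 / 2) h00."""
    return 0.5 * m * v**2 + 0.5 * m * C**2 * h00


# Illustrative (assumed) values: a 1 kg test mass at roughly the orbital
# speed of the earth, in roughly the solar potential at 1 AU.
m = 1.0                  # kg
v = 3.0e4                # m/s
phi = -8.9e8             # m^2/s^2
h00 = -2.0 * phi / C**2  # identification (2.124) with c restored

rest_term = -m * C**2    # the constant term dropped in (2.135)
lhs = exact_lagrangian(m, v, h00) - rest_term
rhs = newtonian_lagrangian(m, v, h00)
print(lhs, rhs)
```

Up to corrections of relative order $v^2/c^2$ and $\phi/c^2$, both numbers agree with the Newtonian Lagrangian $\frac{m}{2}\vec{v}^2 - m\phi$.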
In section 1.3 we had discussed the Minkowski metric in Rindler coordinates, i.e. in
coordinates adapted to a constantly accelerating observer. For an observer accelerating
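in the $x^1$-direction, the metric took the form (1.68),
\begin{equation}
ds^2 = -\rho^2 d\eta^2 + d\rho^2 + d\vec{y}^2 , \tag{1.68}
\end{equation}
with $\vec{y} = (x^2, x^3)$ denoting the transverse spectator coordinates (which will again be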
suppressed in the following).
What is the relation, if any, between this metric and the metric describing a weak
gravitational field, as derived above (after all, small accelerations should mimic weak
gravitational fields)? At first sight, the only thing they appear to have in common
is that the departure from what would be the Minkowski-metric in these coordinates
is encoded in the time-time component of the metric, $-\rho^2$ in one case, $-(1 + 2\phi)$ in the
other, but apart from that $\rho^2$ and $(1 + 2\phi)$ look quite different. This difference is,
however, again a coordinate artefact and the Rindler metric can be made to look like
the weak-field metric with the help of a suitable further redefinition of the coordinates.
For starters, it will be convenient, for this purpose and for a generalisation which we
will discuss below, to introduce the acceleration $a$ explicitly into the coordinates by
redefining the coordinate transformation (1.67) to (I will now also call the Minkowski
coordinates $\xi^0 = t$ and $\xi^1 = x$)
\begin{equation}
t = \frac{\rho}{a}\sinh a\eta , \qquad x = \frac{\rho}{a}\cosh a\eta
\end{equation}
(so this differs by $\rho \to \rho/a$, $\eta \to a\eta$ from the transformation given in (1.67)). Thus, now
it is the observer at $\rho = 1$ who has acceleration $a$ and whose proper time is $\tau = \eta$. The
Rindler metric now has the form
\begin{equation}
ds^2 = -\rho^2 d\eta^2 + a^{-2} d\rho^2 .
\end{equation}
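Now the transformation $\rho = 1 + a\tilde{x}$ (reminding us that we are talking about acceleration
in the $x = x^1$ direction), leads to
\begin{equation}
ds^2 = -(1 + a\tilde{x})^2 d\eta^2 + d\tilde{x}^2 , \tag{2.141}
\end{equation}
and for small $a\tilde{x}$ one has
\begin{equation}
(1 + a\tilde{x})^2 \approx 1 + 2a\tilde{x} \equiv 1 + 2\phi(\tilde{x}) . \tag{2.142}
\end{equation}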
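To get a feeling for the quality of the linearisation in (2.142), the following Python sketch evaluates the dimensionless potential $\phi = a\tilde{x}$ for an acceleration equal to the gravitational acceleration at the surface of the earth (with $c$ restored, $a\tilde{x}$ becomes $g\tilde{x}/c^2$) and the exact size of the neglected term. The numerical value of $g$, the chosen heights and the function names are illustrative assumptions; exact rational arithmetic is used because the neglected term is far below double precision.

```python
from fractions import Fraction

C = 299_792_458                 # speed of light in m/s (exact)
G_EARTH = Fraction(981, 100)    # m/s^2, assumed value of the acceleration


def dimensionless_potential(height_m):
    """phi = a * x with c restored: g * x / c^2, as an exact rational."""
    return G_EARTH * Fraction(height_m) / C**2


def linearisation_error(phi):
    """Difference between (1 + phi)^2 and its weak-field form 1 + 2 phi."""
    return (1 + phi) ** 2 - (1 + 2 * phi)


for height in (20, 10**6, 10**12):
    phi = dimensionless_potential(height)
    print(height, float(phi), float(linearisation_error(phi)))
```

The neglected term is exactly $\phi^2$, so for laboratory-sized heights the weak-field form is accurate to about one part in $10^{30}$.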
Remarks:
1. Remarkably, this same form of the metric remains valid for an arbitrary time-dependent
acceleration $a = a(\tau)$, and thus is capable of reproducing the weak
field form of the metric for general potentials. To see this, consider the worldline
$(t(\tau), x(\tau))$ with general 4-velocity (actually 2-velocity in this case)
\begin{equation}
u^0 = \dot{t}(\tau) = \cosh v(\tau) , \qquad u^1 = \dot{x}(\tau) = \sinh v(\tau) , \tag{2.143}
\end{equation}
which satisfies
\begin{equation}
(u^0)^2 - (u^1)^2 = 1 , \tag{2.144}
\end{equation}
as it should, and has the time-dependent acceleration $a(\tau) = \dot{v}(\tau)$,
\begin{equation}
(\dot{u}^1)^2 - (\dot{u}^0)^2 = \ddot{x}^2 - \ddot{t}^2 = \dot{v}(\tau)^2 \equiv a(\tau)^2 . \tag{2.145}
\end{equation}
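We can pass to adapted coordinates $(\eta, x)$, as above, by setting
\begin{equation}
e^{2a\xi} \approx 1 + 2a\xi . \tag{2.149}
\end{equation}
For the record, and for later use, we note that the complete coordinate transformation
between the Minkowski coordinates $(t, x)$ and the conformally flat Rindler
coordinates $(\eta, \xi)$ is
\begin{equation}
\partial_\eta = a(t\partial_x + x\partial_t) . \tag{2.151}
\end{equation}
This is a boost in the $(t, x)$-plane, but the limit $a \to 0$ appears to be singular.
3. A simple and useful way to rectify this is to introduce a further constant shift of $x$,
$x \to x - 1/a$, into the 1-parameter family (2.150) of coordinate transformations,
\begin{equation}
a \to 0 \quad\Rightarrow\quad \eta \to t , \quad \xi \to x . \tag{2.153}
\end{equation}
\begin{equation}
\partial_\eta = \partial_t + a(t\partial_x + x\partial_t) \tag{2.154}
\end{equation}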
4. In terms of the Minkowski null (or advanced and retarded time) coordinates
or, compactly,
Note that the range of the coordinates is $-\infty < \eta, \xi < +\infty$ or $-\infty < u_R , v_R < +\infty$,
and that the coordinates $(\eta, \xi)$ or $(u_R , v_R)$ cover (and can be used in) the
right-hand quadrant $x > |t|$ of Minkowski space-time, corresponding to $-\infty <
u_M = t - x < 0$ and $0 < v_M = t + x < +\infty$, the so called (right) Rindler wedge.
As we will see in section 6.8, these null Rindler coordinates are particularly useful
for studying the solutions of the scalar wave equation in the Rindler wedge.
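The statements in the remarks above can be checked numerically. The following Python sketch assumes the standard explicit form of the map to conformally flat Rindler coordinates, $t = a^{-1}e^{a\xi}\sinh a\eta$, $x = a^{-1}e^{a\xi}\cosh a\eta$ (this explicit form, the function names and the sample values are assumptions made for illustration). It pulls back the Minkowski metric $-dt^2 + dx^2$ by finite differences, and it checks that after the shift $x \to x - 1/a$ the limit $a \to 0$ is regular, as in (2.153).

```python
import math


def rindler_to_minkowski(eta, xi, a, shifted=False):
    """Assumed map (eta, xi) -> (t, x); optionally shifted by x -> x - 1/a."""
    scale = math.exp(a * xi) / a
    t = scale * math.sinh(a * eta)
    x = scale * math.cosh(a * eta)
    if shifted:
        x -= 1.0 / a
    return t, x


def _derivative(func, value, step):
    """Central-difference derivative of a function returning a tuple (t, x)."""
    hi = func(value + step)
    lo = func(value - step)
    return tuple((h - l) / (2.0 * step) for h, l in zip(hi, lo))


def induced_metric(eta, xi, a, step=1e-6):
    """Components (g_eta_eta, g_eta_xi, g_xi_xi) of -dt^2 + dx^2 in (eta, xi)."""
    dt_deta, dx_deta = _derivative(lambda e: rindler_to_minkowski(e, xi, a), eta, step)
    dt_dxi, dx_dxi = _derivative(lambda s: rindler_to_minkowski(eta, s, a), xi, step)
    g_eta_eta = -dt_deta**2 + dx_deta**2
    g_eta_xi = -dt_deta * dt_dxi + dx_deta * dx_dxi
    g_xi_xi = -dt_dxi**2 + dx_dxi**2
    return g_eta_eta, g_eta_xi, g_xi_xi


print(induced_metric(0.3, 0.2, 0.5), math.exp(2 * 0.5 * 0.2))
print(rindler_to_minkowski(0.3, 0.2, 1e-6, shifted=True))
```

Under these assumptions the induced metric is $e^{2a\xi}(-d\eta^2 + d\xi^2)$, whose conformal factor has the weak-field expansion (2.149), and for small $a$ the shifted map approaches the identity $(\eta, \xi) \to (t, x)$.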
Let me close this section with some comments on other versions of (3 + 1)-dimensional
Rindler space. First of all, instead of looking at acceleration in the $x^1$-direction, say,
one can consider radial accelerations. To that end one first writes the metric in spatial
spherical coordinates,
\begin{equation}
ds^2 = -dt^2 + dr^2 + r^2 d\Omega_2^2 , \tag{2.159}
\end{equation}
\begin{equation}
r^2 - t^2 = \rho^2 \tag{2.162}
\end{equation}
$r^2 - t^2 < 0$ and covering precisely the interior of the lightcone) is the so-called Milne
metric to be discussed in section 36.1.
(an analogous shift $x \to x - x_0$ for acceleration in the $x$-direction would have had no
effect on the metric since such a translation is a symmetry of the Minkowski metric,
whereas a translation in the radial direction is not). This form of the metric is adapted
to the hyperboloids
\begin{equation}
(r - r_0)^2 - t^2 = \rho^2 \tag{2.165}
\end{equation}
and now describes radially accelerating observers, each one asymptotically approaching
the radial lightray emanating from a distance $r_0$ from the origin (and correspondingly
the region of space-time covered by these coordinates is the complement of the past and
future of the 2-sphere of radius $r_0$ at the origin, a hole in spacetime).
Following Einstein, the gravitational redshift (i.e. the fact that photons lose or gain
energy when rising or falling in a gravitational field) is usually presented as a direct
consequence of the Einstein Equivalence Principle (and is therefore also said to provide
an experimental test of the Einstein Equivalence Principle itself). It can indeed be
derived in this way (see Remark 2 at the end of this section for one such argument, albeit
not the original one). However, here we will derive this effect within the framework that
we have already adopted, inspired by the equivalence principle, namely in terms of the
description of the gravitational field by a metric.
This has several advantages. It allows us to further familiarise ourselves with the formalism
and to illustrate how to extract physical effects from our description of lightrays as
null geodesics (much as we employed timelike geodesics above to study the Newtonian
limit). Moreover, it allows us to derive formulae for this effect in quite some generality
and I will actually give 3 different derivations in increasing order of generality. In con-
junction with the Newtonian approximation to the gravitational field these then reduce
to the result in the form in which it is usually presented, e.g. as in (2.191) or (2.192)
(and as then rederived on the basis of the equivalence principle in (2.196)).
6 V. Balasubramanian, B. Czech, B. Chowdhury, J. de Boer, The entropy of a hole in spacetime,
arXiv:1305.0856 [hep-th]; V. Balasubramanian, B. Chowdhury, B. Czech, J. de Boer, M. Heller, A
hole-ographic spacetime, arXiv:1310.4204 [hep-th].
To set the stage, note that it is manifest from the expression
\begin{equation}
d\tau^2 = -g_{\mu\nu}(x)\,dx^\mu dx^\nu \tag{2.166}
\end{equation}
for the proper time that e.g. the rate of clocks is affected by where one is in a gravitational
field. However, as by the universality of gravity everything is (and in particular
all ideal clocks are) affected in the same way by gravity, it is impossible to measure this
effect locally, at a fixed point in a gravitational field. In order to find an observable
effect, one needs to compare data from two different points in a gravitational potential.
The situation we could consider is that of two observers A and B moving on worldlines
(paths) $\gamma_A$ and $\gamma_B$, A sending light signals to B. In general the frequency measured
in the observer's rest-frame at A (or in a locally inertial coordinate system there) will
differ from the frequency measured by B upon receiving the signal.
In order to separate out Doppler-like effects due to relative velocities, we consider two
observers A and B at rest radially to each other, at radii $r_A$ and $r_B$, in a static spherically
symmetric gravitational field. This means that the metric depends only on a radial
coordinate $r$ and we can choose it to be of the form
\begin{equation}
ds^2 = g_{00}(r)\,dt^2 + g_{rr}(r)\,dr^2 + r^2 d\Omega^2 , \tag{2.167}
\end{equation}
where $d\Omega^2$ is the standard line element on the two-sphere (see section 23 for a more
detailed justification of this ansatz for the metric).
Observer A sends out light of a given frequency $\nu_A$, say $n$ pulses per proper time unit
$\Delta\tau_A$. Observer B receives these $n$ pulses in his proper time $\Delta\tau_B$ and interprets this
as a frequency $\nu_B$. Thus the relation between the frequency $\nu_A$ emitted at A and the
frequency $\nu_B$ observed at B is
\begin{equation}
\frac{\nu_A}{\nu_B} = \frac{\Delta\tau_B}{\Delta\tau_A} . \tag{2.168}
\end{equation}
I will now give two arguments to show that this ratio depends on the metric (i.e. the
gravitational field) at $r_A$ and $r_B$ through
\begin{equation}
\frac{\nu_A}{\nu_B} = \left( \frac{g_{00}(r_B)}{g_{00}(r_A)} \right)^{1/2} . \tag{2.169}
\end{equation}
1. The first argument is essentially one based on geometric optics (and is best accompanied
by drawing a (1+1)-dimensional space-time diagram of the light rays
and worldlines of the observers).
The geometry of the situation dictates that the coordinate time intervals recorded
at A and B are equal, $\Delta t_A = \Delta t_B$, as nothing in the metric actually depends on
$t$. In equations, this can be seen as follows. First of all, the equation for a radial
light ray is
\begin{equation}
-g_{00}(r)\,dt^2 = g_{rr}(r)\,dr^2 , \tag{2.170}
\end{equation}
or
\begin{equation}
\frac{dt}{dr} = \pm\left( -\frac{g_{rr}(r)}{g_{00}(r)} \right)^{1/2} . \tag{2.171}
\end{equation}
From this we can calculate the coordinate time for the light ray to go from A to
B. Say that the first light pulse is emitted at point A at time $t(A)_1$ and received
at B at coordinate time $t(B)_1$. Then
\begin{equation}
t(B)_1 - t(A)_1 = \int_{r_A}^{r_B} dr\, \left(-g_{rr}(r)/g_{00}(r)\right)^{1/2} . \tag{2.172}
\end{equation}
The right hand side obviously does not depend on $t$, so we also have
\begin{equation}
t(B)_2 - t(A)_2 = \int_{r_A}^{r_B} dr\, \left(-g_{rr}(r)/g_{00}(r)\right)^{1/2} , \tag{2.173}
\end{equation}
where $t_2$ denotes the coordinate time for the arrival of the $n$-th pulse. Therefore,
\begin{equation}
t(B)_1 - t(A)_1 = t(B)_2 - t(A)_2 , \tag{2.174}
\end{equation}
or
\begin{equation}
t(A)_2 - t(A)_1 = t(B)_2 - t(B)_1 , \tag{2.175}
\end{equation}
as claimed. Thus the coordinate time intervals recorded at A and B between the
first and last pulse are equal. However, to convert this to proper time, we have to
multiply the coordinate time intervals by an $r$-dependent function,
\begin{equation}
\Delta\tau_{A,B} = \left( -g_{\mu\nu}(r_{A,B}) \frac{dx^\mu}{dt}\frac{dx^\nu}{dt} \right)^{1/2} \Delta t_{A,B} , \tag{2.176}
\end{equation}
and therefore the proper time intervals will not be equal. For observers at rest,
$dx^i/dt = 0$, one has
\begin{equation}
\Delta\tau_{A,B} = \left(-g_{00}(r_{A,B})\right)^{1/2} \Delta t_{A,B} . \tag{2.177}
\end{equation}
Since $\Delta t_A = \Delta t_B$, (2.169) now follows from (2.168).
2. The second argument uses the null geodesic equation, in particular the conserved
quantity associated to time-translations (recall that we have assumed that the
metric (2.167) is time-independent), as well as a somewhat more covariant looking,
but equivalent, notion of frequency.
First of all, let the light ray be described by the wave vector $k^\mu$. In special relativity,
we would parametrise this as $k^\mu = (\omega, \vec{k})$ with $\omega = 2\pi\nu$ the frequency. This is the
frequency observed by an inertial observer at rest, with 4-velocity $u^\mu = (1, 0, 0, 0)$.
A Lorentz-invariant, and thus in our context now coordinate-independent, notion
of the frequency as measured by an observer with velocity $u^\mu$ is thus
\begin{equation}
\omega = -u^\mu k_\mu . \tag{2.178}
\end{equation}
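This includes as special cases the relativistic Doppler effect (where one compares
$\omega$ with $\bar{\omega} = -\bar{u}^\mu k_\mu$, $\bar{u}^\mu$ the tangent to the world line of a boosted observer), as
well as the gravitational redshift we want to discuss here.
A static observer in the spherically-symmetric and static gravitational field (2.167)
is described by the 4-velocity
\begin{equation}
u_A^\mu = \left( (-g_{00}(r_A))^{-1/2}, 0, 0, 0 \right)
\end{equation}
(and likewise for the observer at $r = r_B$). The wave vector $k^\mu$ is a null tangent
vector, $k^\mu k_\mu = 0$, to a null geodesic corresponding to the Lagrangian
\begin{equation}
L = \tfrac{1}{2}\, g_{\mu\nu}\,\dot{x}^\mu \dot{x}^\nu .
\end{equation}
Since the metric is time-independent, there is (cf. the discussion in section 2.5)
the corresponding conserved quantity
\begin{equation}
E = -\frac{\partial L}{\partial \dot{t}} = -g_{00}(r)\,\dot{t} \tag{2.182}
\end{equation}
(the minus sign serving only to make this quantity positive for $\dot{t} > 0$). Then one
finds that the frequency measured by the static observer at $r = r_A$ is
\begin{equation}
\omega_A = -u_A^\mu k_\mu = \frac{E}{(-g_{00}(r_A))^{1/2}} ,
\end{equation}
and likewise at $r_B$, so that (2.169) follows from the constancy of $E$ along the lightray.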
3. The above derivation is not completely general, and still not completely covari-
ant, because we used the explicit form of the metric (which is the general form
of a metric with a time-translation invariance in spherical symmetry, but not in
general). We can improve this somewhat by using the more general characteri-
sation of time-translation invariance in terms of Killing vectors (section 2.6) and
the associated conserved charge (2.102).
Thus assume that we have a timelike Killing vector $V^\mu$. Then by definition a
static observer is one whose 4-velocity $u^\mu$ is proportional to $V^\mu$,
\begin{equation}
u^\mu \sim V^\mu . \tag{2.184}
\end{equation}
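For $V = \partial_t$ this evidently reduces to the statement that only $t$ changes along the
worldline, i.e. that the observer remains at fixed values of the spatial coordinates,
and this is the sense in which we have informally used the term static observer
so far. Denoting the norm of $V^\mu$ by
\begin{equation}
V = (-V^\mu V_\mu)^{1/2} , \tag{2.185}
\end{equation}
we have
\begin{equation}
u^\mu = \frac{V^\mu}{V} . \tag{2.186}
\end{equation}
Given the null wave vector $k^\mu$, we have the conserved energy (2.102),
\begin{equation}
E = -k_\mu V^\mu , \tag{2.187}
\end{equation}
so that the frequency measured by a static observer is
\begin{equation}
\omega = -u^\mu k_\mu = \frac{E}{V} . \tag{2.188}
\end{equation}
Since $E$ is constant along the lightray, frequencies observed by two different static
observers are related by
\begin{equation}
\frac{\nu_A}{\nu_B} = \frac{V_B}{V_A} . \tag{2.189}
\end{equation}
For this reason, the norm $V$ is also known as the redshift factor associated with a
timelike Killing vector.
Note that this result reduces to (2.169) if the metric has the form (2.167) and
$V = \partial_t$ since then
\begin{equation}
V = (-g_{00})^{1/2} . \tag{2.190}
\end{equation}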
Having derived (2.169) in 3 different ways, let us now look at what the result tells us in
specific situations of interest. Since on earth and in the solar system we only have access
to gravitational fields that are to a reasonably high degree of precision well described
by Newtonian gravity, we can use the Newtonian approximation (2.125). Then (2.169)
becomes
\begin{equation}
g_{00} = -(1 + 2\phi) \quad\Rightarrow\quad \frac{\nu_A}{\nu_B} \approx 1 + \phi(r_B) - \phi(r_A) , \tag{2.191}
\end{equation}
or, with $\phi(r) = -G_N M/r$,
\begin{equation}
\frac{\nu_A - \nu_B}{\nu_B} \approx \frac{G_N M (r_B - r_A)}{r_A r_B} . \tag{2.192}
\end{equation}
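The step from the exact relation to its Newtonian approximation can be illustrated numerically. By (2.168) and (2.177) the exact ratio is $\nu_A/\nu_B = (g_{00}(r_B)/g_{00}(r_A))^{1/2}$, and the following Python sketch compares it with the leading-order form (2.191) for a metric of the Newtonian form (2.125). The function names and the (deliberately large) sample potentials are illustrative assumptions, and $c = 1$ throughout.

```python
import math


def newtonian_g00(phi):
    """g_00 = -(1 + 2 phi), cf. (2.125), with c = 1."""
    return -(1.0 + 2.0 * phi)


def frequency_ratio_exact(g00_a, g00_b):
    """nu_A / nu_B = (g_00(r_B) / g_00(r_A))^(1/2)."""
    return math.sqrt(g00_b / g00_a)


def frequency_ratio_newtonian(phi_a, phi_b):
    """Leading-order form (2.191): 1 + phi(r_B) - phi(r_A)."""
    return 1.0 + phi_b - phi_a


# Assumed sample potentials with r_B > r_A, i.e. phi_a < phi_b < 0.
phi_a, phi_b = -2.0e-3, -1.0e-3
exact = frequency_ratio_exact(newtonian_g00(phi_a), newtonian_g00(phi_b))
approx = frequency_ratio_newtonian(phi_a, phi_b)
print(exact, approx, exact - approx)
```

The two expressions differ only at second order in the potentials, and the ratio is larger than 1, so the frequency received higher up in the potential is lower than the emitted one.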
Thus for $r_B > r_A$ one has $\nu_B < \nu_A$,
so that, as expected, a photon loses energy when rising in (and against the pull of) a
gravitational field, and conversely one has the gravitational blueshift effect $\nu_B > \nu_A$ for $r_B < r_A$.
Remarks:
1. Note that the general result (2.169) depends only on the value of the gravitational
field at the points $r_A$ and $r_B$, not on the gravitational field in between. This
reinforces the interpretation that the gravitational redshift is only due to the
different rate of clocks / proper time at the positions $r_A$ and $r_B$, and not due to the
fact that something happens to the lightray as it travels through a gravitational
field (which should lead to a cumulative effect depending also on the intermediate
gravitational field).
2. The result (2.169) can also be deduced from energy conservation. A local inertial
observer at the emitter A will see a change in the internal mass of the emitter
$\Delta m_A = -h\nu_A$ when a photon of frequency $\nu_A$ is emitted. Likewise, the absorber
at point B will experience an increase in inertial mass by $\Delta m_B = h\nu_B$, but the
total internal plus gravitational potential energy must be conserved. Thus
\begin{equation}
0 = \Delta m_A (1 + \phi(r_A)) + \Delta m_B (1 + \phi(r_B)) , \tag{2.195}
\end{equation}
leading to
\begin{equation}
\frac{\nu_A}{\nu_B} = \frac{1 + \phi(r_B)}{1 + \phi(r_A)} \approx 1 + \phi(r_B) - \phi(r_A) , \tag{2.196}
\end{equation}
as before. This "derivation" (in quotes, because we are wildly mixing Newtonian
gravity, special relativity and quantum mechanics; do take this derivation
with an appropriately sized grain of salt, please) shows that gravitational redshift
experiments test the Einstein Equivalence Principle in its strong form, in which
the term "laws of nature" is not restricted to mechanics (inertial = gravitational
mass), but also includes quantum mechanics in the sense that it tests if in an
inertial frame the relation between photon energy and frequency is unaffected by
the presence of a gravitational field.
3. While difficult to observe directly (by looking at light from the sun), this prediction
has been verified in the laboratory, first by Pound and Rebka (1960), and
subsequently, with one percent accuracy, by Pound and Snider in 1964 (using the
Mössbauer effect).
Let us make some rough estimates of the expected effect. We first consider light
reaching us (B) from the sun (A). In this case, we have $r_B \gg r_A$, where $r_A$ is the
radius of the sun, and (also inserting a so far suppressed factor of $c^2$) we obtain
\begin{equation}
\frac{\nu_A - \nu_B}{\nu_B} \approx \frac{G_N M (r_B - r_A)}{c^2 r_A r_B} \approx \frac{G_N M}{c^2 r_A} . \tag{2.197}
\end{equation}
Using the approximate values
\begin{equation}
\begin{aligned}
r_A &\approx 0.7 \times 10^{6}\ \mathrm{km} \\
M_{\mathrm{sun}} &\approx 2 \times 10^{33}\ \mathrm{g} \\
G_N &\approx 7 \times 10^{-8}\ \mathrm{g^{-1}\,cm^{3}\,s^{-2}} \\
G_N c^{-2} &\approx 7 \times 10^{-29}\ \mathrm{g^{-1}\,cm} = 7 \times 10^{-34}\ \mathrm{g^{-1}\,km} ,
\end{aligned} \tag{2.198}
\end{equation}
one finds
\begin{equation}
\frac{\Delta\nu}{\nu} \approx 2 \times 10^{-6} . \tag{2.199}
\end{equation}
In principle, such a frequency shift should be observable. In practice, however, the
spectral lines of light emitted by the sun are strongly affected e.g. by convection in
the atmosphere of the sun (Doppler effect), and this makes it difficult to measure
this effect with the required precision.
In the Pound-Snider experiment, the actual value of $\Delta\nu/\nu$ is much smaller. In
the original set-up one has $r_B - r_A \approx 20\,$m (the distance from floor to ceiling of
the laboratory), and $r_A = r_{\mathrm{earth}} \approx 6.4 \times 10^{6}\,$m, leading to
\begin{equation}
\frac{\Delta\nu}{\nu} \approx 2.5 \times 10^{-15} . \tag{2.200}
\end{equation}
However, here the experiment is much better controlled, and the gravitational
redshift was verified with 1% accuracy.
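The rough estimates in the last remark are easy to reproduce. The following Python sketch evaluates the right-hand side of (2.197) in SI units for the sun and for a laboratory set-up on earth. The function name and the numerical constants (standard rounded values, slightly more precise than those in (2.198)) are assumptions made for illustration.

```python
G_N = 6.674e-11      # Newton's constant in m^3 kg^-1 s^-2
C = 299_792_458.0    # speed of light in m/s


def redshift(mass, r_emit, r_obs):
    """(nu_A - nu_B) / nu_B ~ G_N M (r_B - r_A) / (c^2 r_A r_B), cf. (2.197)."""
    return G_N * mass * (r_obs - r_emit) / (C**2 * r_emit * r_obs)


# Assumed rounded values for the sun and the earth.
M_SUN, R_SUN, AU = 1.989e30, 6.96e8, 1.496e11
M_EARTH, R_EARTH = 5.972e24, 6.4e6

solar = redshift(M_SUN, R_SUN, AU)                 # light from the sun reaching the earth
lab = redshift(M_EARTH, R_EARTH, R_EARTH + 20.0)   # 20 m height difference
print(solar, lab)
```

With these inputs the solar value comes out at about $2 \times 10^{-6}$, and the laboratory value at a few times $10^{-15}$ for a height difference of 20 m; the precise laboratory number scales linearly with the height difference used.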
Central to our initial discussion of gravity was the Einstein Equivalence Principle which
postulates the existence of locally inertial (or freely falling) coordinate systems in which
locally at (or around) a point the effects of gravity are absent. Now that we have decided
that the arena of gravity is a general metric space-time, we should establish that such
coordinate systems indeed exist. Looking at the geodesic equation, it is clear that
at least in this context absence of gravitational effects is tantamount to the existence
of a coordinate system $\{\xi^a\}$ in which at a given point $p$ the metric is the Minkowski
metric, $g_{ab}(p) = \eta_{ab}$, and the Christoffel symbols are zero, $\Gamma^a_{bc}(p) = 0$;
the latter condition is equivalent to $g_{ab,c}(p) = 0$. I will sketch three arguments establishing
the existence of such coordinate systems, each one having its own virtues and
providing its own insights into the issue.
Actually it is physically plausible (and fortuitously moreover true) that one can always
find coordinates which embody the equivalence principle in the stronger sense that the
metric is the flat metric $\eta_{ab}$ and the Christoffel symbols are zero not just at a point but
along the entire worldline of an inertial (freely falling) observer, i.e. along a geodesic $\gamma$.
Such coordinates, based on a geodesic rather than on a point, are known as Fermi
normal coordinates. The construction is similar to that of Riemann normal coordinates
(based at a point) to be discussed below.$^7$
1. Direct Construction
We know that given a coordinate system $\{\xi^a\}$ that is inertial at a point $p$, the
metric and Christoffel symbols at $p$ in a new coordinate system $\{x^\mu\}$ are determined
by (1.80,1.87). Conversely, we will now see that knowledge of the metric
and Christoffel symbols at a point $p$ is sufficient to construct a locally inertial
coordinate system at $p$.
We will construct this coordinate system $\xi^a = \xi^a(x)$ locally around the point $p$
(with coordinates $x_0^\mu$, say, in the original coordinate system) by a Taylor series
expansion,
\begin{equation}
\xi^a(x) = d^a + (x - x_0)^\mu e^a_\mu + \tfrac{1}{2}(x - x_0)^\mu (x - x_0)^\nu f^a_{\mu\nu} + \ldots . \tag{2.204}
\end{equation}
Here
\begin{equation}
d^a = \xi^a(x_0) \equiv \xi_0^a \tag{2.205}
\end{equation}
are the (arbitrary) coordinate values of the point $p$ in the new coordinates $\xi^a$,
\begin{equation}
e^a_\mu = \frac{\partial \xi^a}{\partial x^\mu}(x_0) \tag{2.206}
\end{equation}
is the Jacobi matrix of the coordinate transformation at $x = x_0$, and
\begin{equation}
f^a_{\mu\nu} = \frac{\partial^2 \xi^a}{\partial x^\mu \partial x^\nu}(x_0) \tag{2.207}
\end{equation}
is its 1st derivative at $x_0$.
7 Most discussions of Fermi coordinates in the literature follow the presentation given in F. Manasse,
C. Misner, Fermi normal coordinates and some basic concepts in differential geometry, J. Math. Phys.
4 (1963) 735-745; for a geometrically transparent treatment see also section 1.11 of E. Poisson, A
Relativist's Toolkit; Fermi coordinates for null geodesics are constructed in M. Blau, D. Frank, S.
Weiss, Fermi Coordinates and Penrose Limits, arXiv:hep-th/0603109.
From the tensorial transformation behaviour of the metric we know that
\begin{equation}
e^a_\mu e^\nu_a = \delta^\nu_\mu , \qquad e^a_\mu e^\mu_b = \delta^a_b , \tag{2.210}
\end{equation}
we see that the inverse matrix diagonalises (and scales) the metric at the point $p$
in such a way that
\begin{equation}
g_{\mu\nu}(x_0)\, e^\mu_a e^\nu_b = \eta_{ab} . \tag{2.211}
\end{equation}
Since $g_{\mu\nu}(x_0)$ is a symmetric non-degenerate matrix, such matrices always exist
(and are unique up to similarity transformations that leave $\eta_{ab}$ invariant, i.e. up
to Lorentz transformations). The notation $e^a_\mu$ and $e^\mu_a$ reflects the fact that these
matrices are the components of an orthonormal vierbein (or vielbein) at the point
$p$, which are traditionally denoted this way (cf. the discussion in section 3.8 below).
Taking stock, we see that the condition $g_{ab}(p) = \eta_{ab}$ determines the coordinate
system to 1st order in a Taylor series expansion, up to translations (the choice of
$d^a$) and Lorentz transformations, i.e. up to Poincaré transformations.
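We now turn to the 2nd condition characterising a locally inertial coordinate
system, namely $\Gamma^a_{bc}(p) = 0$. We can write the inhomogeneous transformation
behaviour of the Christoffel symbols as
\begin{equation}
\Gamma^\mu_{\nu\lambda}\frac{\partial \xi^a}{\partial x^\mu} = \Gamma^a_{bc}\frac{\partial \xi^b}{\partial x^\nu}\frac{\partial \xi^c}{\partial x^\lambda} + \frac{\partial^2 \xi^a}{\partial x^\nu \partial x^\lambda} . \tag{2.212}
\end{equation}
Thus at the point $p$ we have
\begin{equation}
\Gamma^\mu_{\nu\lambda}(x_0)\, e^a_\mu = \Gamma^a_{bc}(\xi_0)\, e^b_\nu e^c_\lambda + f^a_{\nu\lambda} . \tag{2.213}
\end{equation}
Requiring $\Gamma^a_{bc}(p) = 0$ now uniquely determines the 2nd order Taylor coefficients,
\begin{equation}
\Gamma^a_{bc}(\xi_0) = 0 \quad\Rightarrow\quad f^a_{\mu\nu} = e^a_\lambda \Gamma^\lambda_{\mu\nu}(x_0) . \tag{2.214}
\end{equation}
Thus to 2nd order in a Taylor series expansion, the transformation from arbitrary
coordinates $x^\mu$ to inertial coordinates $\xi^a$ at the point $p$ is given by
\begin{equation}
\xi^a(x) = \xi_0^a + e^a_\mu (x - x_0)^\mu + \tfrac{1}{2}\, e^a_\lambda \Gamma^\lambda_{\mu\nu}(x_0)\, (x - x_0)^\mu (x - x_0)^\nu + \ldots .
\end{equation}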
We have therefore established that for an arbitrary point $p$ in an arbitrary gravitational
field one can always introduce local coordinates which are inertial at that
point, and that up to 2nd order in a Taylor series expansion such a coordinate
system is unique up to Poincaré transformations.
Since this leaves the infinite number of higher-order terms of the Taylor expansion
undetermined, this shows that inertial coordinate systems are highly non-unique,
and raises the following questions:
• Can one continue in this vein and choose the (so far undetermined) higher-order
terms in the Taylor expansion such that also e.g. the 2nd derivatives
of the metric at $p$ are equal to zero,
\begin{equation}
\Gamma^a_{bc}\,\dot{\xi}^b \dot{\xi}^c = 0 . \tag{2.218}
\end{equation}
way that $g_{ab}(p) = \eta_{ab}$ (by choosing the four directions at $p$ to be orthonormal unit
vectors).
Before turning to the more detailed construction, let us look at an example. Consider
the standard metric $ds^2 = d\theta^2 + \sin^2\theta\, d\phi^2$ on the two-sphere. Any point
is as good as any other point, and one can construct an inertial coordinate system
at the north pole $\theta = 0$ in terms of geodesics shot off from the north pole
into the $\phi = 0$ ($\xi^1$) and $\phi = \pi/2$ ($\xi^2$) directions. The affine parameter along
a great circle (geodesic) connecting the north pole to a point $(\theta, \phi)$ is $\theta$, and
thus $\theta$ is also the geodesic distance, and the coordinates of the point $(\theta, \phi)$ are
$(\xi^1 = \theta\cos\phi, \xi^2 = \theta\sin\phi)$. In particular, the north pole is the origin $\xi^1 = \xi^2 = 0$.
Note that one could have guessed these coordinates from the fact that near $\theta = 0$
the metric is $d\theta^2 + \theta^2 d\phi^2$, which is the Euclidean metric in polar coordinates
$(\theta\cos\phi, \theta\sin\phi)$.
Calculating the metric in these new coordinates, using
\begin{equation}
\theta = \sqrt{(\xi^1)^2 + (\xi^2)^2} , \qquad \tan\phi = \xi^2/\xi^1 ,
\end{equation}
and thus
\begin{equation}
d\theta = \frac{\xi^1 d\xi^1 + \xi^2 d\xi^2}{\sqrt{(\xi^1)^2 + (\xi^2)^2}} , \qquad d\phi = \frac{\xi^1 d\xi^2 - \xi^2 d\xi^1}{(\xi^1)^2 + (\xi^2)^2} , \tag{2.220}
\end{equation}
one finds
\begin{equation}
d\theta^2 + \sin^2\theta\, d\phi^2 = (d\xi^1)^2 + (d\xi^2)^2 + O(\xi^2 d\xi^2) , \tag{2.221}
\end{equation}
i.e.
\begin{equation}
g_{ab}(\xi) = \delta_{ab} + O(\xi^2) . \tag{2.222}
\end{equation}
Therefore
\begin{equation}
g_{ab}(\xi = 0) = \delta_{ab} , \qquad g_{ab,c}(\xi = 0) = 0 , \tag{2.223}
\end{equation}
as required.
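We now (re)turn to the general construction of such coordinates, starting with the
geodesic equation
\begin{equation}
\ddot{x}^\mu + \Gamma^\mu_{\nu\lambda}\,\dot{x}^\nu \dot{x}^\lambda = 0 . \tag{2.224}
\end{equation}
We consider geodesics passing through (or emanating from) the point $p$ with coordinates
$x_0^\mu$ at $\tau = 0$, and with initial 4-velocity $u_0^\mu$,
\begin{equation}
x^\mu(\tau = 0) = x_0^\mu , \qquad \dot{x}^\mu(\tau = 0) = u_0^\mu , \tag{2.225}
\end{equation}
so that, by the geodesic equation,
\begin{equation}
\ddot{x}^\mu(\tau = 0) = -\Gamma^\mu_{\nu\lambda}(x_0)\, u_0^\nu u_0^\lambda . \tag{2.226}
\end{equation}
Hence in a Taylor expansion around $\tau = 0$ we can write the solution to the geodesic
equation as
\begin{equation}
x^\mu(\tau) = x_0^\mu + \tau u_0^\mu - \tfrac{1}{2}\tau^2\, \Gamma^\mu_{\nu\lambda}(x_0)\, u_0^\nu u_0^\lambda + \ldots . \tag{2.227}
\end{equation}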
We can expand the (arbitrary) initial 4-velocity $u_0^\mu$ in terms of 4 linearly independent
(and orthonormal, say) vectors at $p$ as
We can then think of the Taylor expansion (2.227) as defining a coordinate trans-
formation
(c) From the present point of view, the 2nd condition arises from the fact (men-
tioned above) that in these coordinates the geodesic equation for the above
geodesics reduces to
as claimed.
(d) In contrast to the previous construction leading to (2.216), here the higher-order
terms in the Taylor expansion of the coordinate transformation are now
determined by the higher-order terms in the Taylor expansion of the solution
(2.227) of the geodesic equation. These higher-order terms will depend on
2nd and higher derivatives of the metric $g_{\mu\nu}(x)$ at $x_0$, and these in turn will
also determine the quadratic and higher terms of the Taylor expansion of the
metric in these coordinates,
\begin{equation}
\begin{aligned}
g_{ab}(\xi) &= g_{ab}(\xi_0) + (\xi - \xi_0)^c\, g_{ab,c}(\xi_0) + \tfrac{1}{2}(\xi - \xi_0)^c (\xi - \xi_0)^d\, g_{ab,cd}(\xi_0) + \ldots \\
&= \eta_{ab} + \tfrac{1}{2}(\xi - \xi_0)^c (\xi - \xi_0)^d\, g_{ab,cd}(\xi_0) + \ldots .
\end{aligned} \tag{2.235}
\end{equation}
We will determine the quadratic term in this expansion (expressed in terms
of the Riemann curvature tensor) in section 7.9.
3. A Numerological Argument
This is my favourite argument because it requires no calculations and at the same
time provides additional insight into the nature of curved space-times.
Assuming that the local existence of solutions to differential equations is guaran-
teed by some mathematical theorems, it is frequently sufficient to check that one
has enough degrees of freedom to satisfy the desired initial conditions (one may
also need to check integrability conditions). In the present context, this argument
is useful because it also reveals some information about the true curvature hidden
in the second derivatives of the metric. It works as follows:
Again this turns out to agree with the number of independent components (7.25)
of the curvature tensor in D dimensions.
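The counting behind this kind of argument can be made explicit. The following Python sketch assumes the standard bookkeeping at second order: the 2nd derivatives $g_{ab,cd}$ of the metric are symmetric in $(ab)$ and in $(cd)$, the 3rd order Taylor coefficients $\partial^3\xi^a/\partial x^\mu \partial x^\nu \partial x^\lambda$ of the coordinate transformation are symmetric in their three lower indices, and the number of independent components of the Riemann tensor in $D$ dimensions is $D^2(D^2-1)/12$ (presumably the content of (7.25)). The function names are illustrative.

```python
def metric_second_derivatives(dim):
    """Number of independent g_{ab,cd}: symmetric pairs (ab) times symmetric pairs (cd)."""
    pairs = dim * (dim + 1) // 2
    return pairs * pairs


def third_order_taylor_coefficients(dim):
    """Number of coefficients d^3 xi^a / dx dx dx: index a times symmetric triples."""
    return dim * (dim * (dim + 1) * (dim + 2) // 6)


def unremovable_second_derivatives(dim):
    """2nd derivatives of the metric that cannot be set to zero by a coordinate choice."""
    return metric_second_derivatives(dim) - third_order_taylor_coefficients(dim)


def riemann_components(dim):
    """Standard count D^2 (D^2 - 1) / 12 of independent Riemann tensor components."""
    return dim**2 * (dim**2 - 1) // 12


for dim in (2, 3, 4):
    print(dim, unremovable_second_derivatives(dim), riemann_components(dim))
```

In $D = 4$ this gives $100 - 80 = 20$ second derivatives of the metric that cannot be transformed away, matching the 20 independent components of the curvature tensor.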
Note: At this point in the course I find it useful to develop in parallel (and suggest
reading in parallel)
• the more formal material on tensor analysis in sections 3, 4, 5, 6, 7 and 10, say
(and then moving on to the Einstein equations themselves)
• and a detailed discussion of the basic properties of the Schwarzschild metric (sections
23.4 - 26),
since much of the latter (in particular geodesics, solar system tests of general relativity,
even the issues that arise in connection with the Schwarzschild radius) can be understood
just on the basis of what has been done so far (if, for the time being, one accepts
on faith that the Schwarzschild metric is the unique spherically symmetric vacuum
solution of the Einstein field equations). Not only is this an interesting and physically
relevant application of the machinery developed so far, it also provides an appropriate
balance between physics and formalism in the lectures. More advanced material in the
intervening sections can then be covered and dealt with if and when needed or desired
(or, ideally, both).
3 Tensor Algebra
The Einstein Equivalence Principle tells us that the laws of nature (including the effects
of gravity) should be such that in an inertial frame they reduce to the laws of Special
Relativity. As we have seen in the case of a free particle, this can be implemented by
transforming the laws of Special Relativity to arbitrary coordinate systems and declaring
that these be valid for arbitrary coordinates and metrics.
However, it may not yet be completely clear at this stage what is the precise relation
between this procedure and the incorporation of a gravitational field via the equivalence
principle. Moreover, this is a somewhat tedious procedure in general (e.g. to obtain the
correct form of the Maxwell equations in the presence of gravity) and not particularly
enlightening.
In order to fill this gap (and overcome these shortcomings), we will now introduce the
Principle of General Covariance and show that it provides us with a concrete way of
implementing the Einstein Equivalence Principle.
where the term in brackets is some invertible matrix or operator. Then clearly the
presence of the junk-terms means that the equation $T = 0$ is not equivalent to the
equation $T' = 0$. Examples of objects that transform in this way are, as we have seen,
the Christoffel symbols. On the other hand, if these junk terms are absent, so that we
have
\begin{equation}
T' = (\ldots)\,T \tag{3.2}
\end{equation}
then clearly $T' = 0$ if and only if $T = 0$, i.e. the equation is satisfied in one coordinate
system if and only if it is satisfied in any other (or all) coordinate systems. This is the
kind of equation that embodies general covariance, and again we have already seen an
example of such an equation, namely the geodesic equation, where the term $(\ldots)$ in
brackets is just the Jacobi matrix. Thus, to be more concrete, we can replace the 2nd
condition above by
Let us now establish the above statement, namely that the Einstein equivalence principle
implies that an equation that satisfies the conditions 1 and 2 (or 2$'$) is valid in an
arbitrary gravitational field:
• consider some equation that satisfies these conditions, and assume that we are in
an arbitrary gravitational field;
• condition 2 implies that this equation is true (or satisfied) in all coordinate systems
if it is satisfied just in one coordinate system;
• now we know that we can always (locally) construct a freely falling coordinate
system in which the effects of gravity are absent;
• the Einstein Equivalence Principle now posits that in such a reference system the
physics is that of Minkowski space-time;
Remarks:
1. Note that general covariance alone is an empty statement since any equation
(whether correct or not) can be made generally covariant simply by writing it in
an arbitrary coordinate system (cf. also the discussion in section 5.4). It develops
its power only when used in conjunction with the Einstein Equivalence Principle
as a statement about physics in a gravitational field, namely that by virtue of its
general covariance an equation will be true in a gravitational field if it is true in
the absence of gravitation.
2. The principle of general covariance does not fix the equations uniquely because
there are generally covariant objects that one can construct e.g. from the (second)
derivatives of the metric (via the Riemann curvature tensor to be introduced
in section 7) that can therefore be added to an equation and which vanish for
Minkowski space, i.e. in the absence of gravitation.
3.2 Tensors
If you are already familiar with Lorentz tensors from special relativity (as briefly recalled
in section 1.2, these are objects which transform in a particularly simple multi-linear way
under Lorentz transformations), then hardly anything in this or the subsequent section
3.3 should be new or unexpected (but interesting new features will arise in particular
when we move on from tensor algebra to tensor analysis in section 4).
1. Scalars
The simplest example of a tensor is a function (or scalar) $f$ which under a coordinate
transformation $x^\mu \to y^{\mu'}(x^\mu)$ simply transforms as
\begin{equation}
f'(y(x)) = f(x) \tag{3.3}
\end{equation}
or $f'(y) = f(x(y))$. One frequently suppresses the argument, and thus writes
simply $f' = f$, expressing the fact that, up to the obvious change of argument,
functions are invariant under coordinate transformations.
2. Vectors
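The next simplest case are vectors $V^\mu(x)$ transforming as
$$ V'^{\mu'}(y(x)) = \frac{\partial y^{\mu'}}{\partial x^\mu}\, V^\mu(x) \,. \tag{3.4} $$
A prime example is the tangent vector $\dot{x}^\mu$ to a curve, for which this transformation
behaviour
$$ \dot{x}^\mu \to \dot{y}^{\mu'} = \frac{\partial y^{\mu'}}{\partial x^\mu}\, \dot{x}^\mu \tag{3.5} $$
is just the familiar one.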
Remarks:
and likewise for scalars and scalar fields, and more general tensors and tensor
fields.
(b) One way of thinking about vector fields is as tangent vector fields to families of
curves on a space or space-time which arise as the solutions to the differential
equation
$$ \frac{d}{d\lambda}\, x^\mu(\lambda) = V^\mu(x(\lambda)) \tag{3.7} $$
(and we take local existence and uniqueness of these solutions under suitable
regularity and differentiability conditions for granted). These curves $x^\mu(\lambda)$
are the integral curves (or orbits) of the vector field $V^\mu$, and by construction
they are characterised by the fact that at any point $x$ the tangent
vector to the curve passing through that point is the vector $V^\mu(x)$ at that
point. Thus vector fields also generate a flow on the space(-time), namely
the motion of points along these integral curves, $x^\mu(\lambda) \mapsto x^\mu(\lambda + s)$ for $s \in \mathbb{R}$.
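To make the notion of a flow concrete, here is a minimal numerical sketch. It assumes, purely for illustration, the rotation field $V = (-y, x)$ in the plane (not an example taken from these notes) and integrates (3.7) with a standard fourth-order Runge-Kutta step; the function names are hypothetical.

```python
import math

def V(x):
    """Rotation vector field V = (-y, x); an assumed example, not from the notes."""
    return (-x[1], x[0])

def rk4_step(field, x, h):
    """One classical Runge-Kutta step for dx/dlambda = field(x)."""
    k1 = field(x)
    k2 = field(tuple(xi + 0.5 * h * ki for xi, ki in zip(x, k1)))
    k3 = field(tuple(xi + 0.5 * h * ki for xi, ki in zip(x, k2)))
    k4 = field(tuple(xi + h * ki for xi, ki in zip(x, k3)))
    return tuple(xi + h / 6.0 * (a + 2 * b + 2 * c + d)
                 for xi, a, b, c, d in zip(x, k1, k2, k3, k4))

def flow(field, x0, s, steps=1000):
    """Approximate the flow map x(lambda) -> x(lambda + s) along integral curves."""
    h = s / steps
    x = tuple(x0)
    for _ in range(steps):
        x = rk4_step(field, x, h)
    return x

# The integral curves of this field are circles around the origin.
quarter_turn = flow(V, (1.0, 0.0), math.pi / 2)
full_turn = flow(V, (1.0, 0.0), 2 * math.pi)
```

For this particular field the flow by $s$ is a rotation by the angle $s$, so a flow by $2\pi$ should return every point to where it started, up to integration error.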
(c) An extremely useful related way of thinking about vectors (vector fields) is
as first order differential operators, via the correspondence
$$ V^\mu \to V := V^\mu \partial_\mu \,. \tag{3.8} $$
One of the advantages of this point of view is that the object $V$ is completely
invariant under coordinate transformations as the components $V^\mu$ of
$V$ transform inversely to the basis vectors $\partial_\mu$. For more on this see sections
3.6 and 3.8 on the coordinate-independent interpretation of tensors below.
3. Covectors
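A covector (field) is an object $U_\mu(x)$ which under a coordinate transformation
transforms inversely to a vector, i.e. as
$$ U'_{\mu'}(y(x)) = \frac{\partial x^\mu}{\partial y^{\mu'}}\, U_\mu(x) \,. \tag{3.9} $$
A familiar example of a covector is the derivative $U_\mu = \partial_\mu f$ of a function which
of course transforms as
$$ \partial_{\mu'} f'(y(x)) = \frac{\partial x^\mu}{\partial y^{\mu'}}\, \partial_\mu f(x) \,. \tag{3.10} $$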
Remarks:
(a) As in the case of covectors of special relativity (1.38), one should think of
covectors pointwise as elements of the dual vector space $V^*$ to the space of
vectors $V$, i.e. as linear functionals on the space of vectors, given by
$$ U = U_\mu\, dx^\mu \tag{3.12} $$
of a scalar.
(c) Combining the two points of view in the remarks above, one can thus think
of df as the linear functional on vector fields that assigns to a vector field V
the scalar which is the derivative of f along V ,
4. Covariant 2-Tensors
Clearly, given the above objects, we can construct more general objects which
transform in a nice way under coordinate transformations by taking products of
them. Tensors in general are objects which transform like (but need not be equal
to) products of vectors and covectors.
In particular, a covariant 2-tensor, or (0,2)-tensor, is an object $A_{\mu\nu}$ that transforms
under coordinate transformations like the product of two covectors, i.e.
$$ A'_{\mu'\nu'}(y(x)) = \frac{\partial x^\mu}{\partial y^{\mu'}} \frac{\partial x^\nu}{\partial y^{\nu'}}\, A_{\mu\nu}(x) \,. \tag{3.15} $$
I will from now on use a shorthand notation in which I drop the prime on the transformed
object and also omit the argument. In this notation, the above equation
would then become
$$ A_{\mu'\nu'} = \frac{\partial x^\mu}{\partial y^{\mu'}} \frac{\partial x^\nu}{\partial y^{\nu'}}\, A_{\mu\nu} \,. \tag{3.16} $$
We already know one example of such a tensor, namely the metric tensor $g_{\mu\nu}$
(which happens to be a symmetric tensor).
5. Contravariant 2-Tensors
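Likewise we define a contravariant 2-tensor (or a (2,0)-tensor) to be an object $B^{\mu\nu}$
that transforms like the product of two vectors,
$$ B^{\mu'\nu'} = \frac{\partial y^{\mu'}}{\partial x^\mu} \frac{\partial y^{\nu'}}{\partial x^\nu}\, B^{\mu\nu} \,. \tag{3.17} $$
An example is the inverse metric tensor $g^{\mu\nu}$.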
6. (p, q)-Tensors
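It should now be clear how to define a general $(p, q)$-tensor - namely as an object
$T^{\mu_1\ldots\mu_p}{}_{\nu_1\ldots\nu_q}$ with $p$ contravariant and $q$ covariant indices which under a coordinate
transformation transforms like a product of $p$ vectors and $q$ covectors,
$$ T^{\mu'_1\ldots\mu'_p}{}_{\nu'_1\ldots\nu'_q} = \frac{\partial y^{\mu'_1}}{\partial x^{\mu_1}} \cdots \frac{\partial y^{\mu'_p}}{\partial x^{\mu_p}}\, \frac{\partial x^{\nu_1}}{\partial y^{\nu'_1}} \cdots \frac{\partial x^{\nu_q}}{\partial y^{\nu'_q}}\, T^{\mu_1\ldots\mu_p}{}_{\nu_1\ldots\nu_q} \,. \tag{3.18} $$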
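As a quick sanity check of the transformation rule for covariant 2-tensors, the following sketch applies (3.16) to the Euclidean metric of the plane, passing from Cartesian coordinates $x^\mu = (x, y)$ to polar coordinates $y^{\mu'} = (r, \phi)$. The choice of example and the helper names are assumptions made for illustration.

```python
import math

def jacobian_x_wrt_y(r, phi):
    """Matrix J[m][mp] = dx^m/dy^mp for x = (r cos(phi), r sin(phi)), y = (r, phi)."""
    return [[math.cos(phi), -r * math.sin(phi)],
            [math.sin(phi),  r * math.cos(phi)]]

def transform_covariant_2tensor(A, J):
    """A'_{mp kp} = (dx^m/dy^mp) (dx^k/dy^kp) A_{mk}, i.e. the rule (3.16)."""
    n = len(A)
    return [[sum(J[m][mp] * J[k][kp] * A[m][k]
                 for m in range(n) for k in range(n))
             for kp in range(n)]
            for mp in range(n)]

delta = [[1.0, 0.0], [0.0, 1.0]]   # Euclidean metric in Cartesian coordinates
r, phi = 2.0, 0.7
g_polar = transform_covariant_2tensor(delta, jacobian_x_wrt_y(r, phi))
```

The result should be the familiar polar form of the metric, with components $g_{rr} = 1$, $g_{\phi\phi} = r^2$ and vanishing off-diagonal entries, independently of the angle chosen.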
Remarks:
1. Note that, in particular, a tensor is zero (at a point) in one coordinate system if
and only if the tensor is zero (at the same point) in another coordinate system.
Thus, any law of nature (field equation, equation of motion) expressed in terms of
tensors, say in the form $T^{\mu_1\ldots\mu_p}{}_{\nu_1\ldots\nu_q} = 0$, preserves its form under coordinate
transformations and is therefore automatically generally covariant,
$$ T^{\mu_1\ldots\mu_p}{}_{\nu_1\ldots\nu_q} = 0 \quad\Leftrightarrow\quad T^{\mu'_1\ldots\mu'_p}{}_{\nu'_1\ldots\nu'_q} = 0 \tag{3.19} $$
3. A covariant 2-tensor $T_{\mu\nu}$, say, is said to be symmetric if $T_{\mu\nu} = T_{\nu\mu}$ and
anti-symmetric if $T_{\mu\nu} = -T_{\nu\mu}$. This is well-defined because it is a generally covariant
notion: a tensor is symmetric in all coordinate systems iff it is symmetric in one
coordinate system, etc.
This definition can be extended to any or all pairs of covariant indices or pairs of
contravariant indices. Thus e.g. a tensor $T^{\mu_1\ldots\mu_p}$ is called totally symmetric (or
totally anti-symmetric) if it is symmetric (anti-symmetric) under the exchange of
any pair of indices.
On the other hand, it is not meaningful to talk of the symmetry of a (1,1)-tensor,
say, as an equation like $T^\mu{}_\nu = T^\nu{}_\mu$ does not make any sense.
Symmetrisation and anti-symmetrisation of tensors will be discussed in section
3.3 below.
3.3 Tensor Algebra
Tensors can be added, multiplied and contracted in certain obvious ways. The basic
algebraic operations are the following:
1. Linear Combinations
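Given two $(p, q)$-tensors $A^{\mu_1\ldots\mu_p}{}_{\nu_1\ldots\nu_q}$ and $B^{\mu_1\ldots\mu_p}{}_{\nu_1\ldots\nu_q}$, their sum
$$ C^{\mu_1\ldots\mu_p}{}_{\nu_1\ldots\nu_q} = A^{\mu_1\ldots\mu_p}{}_{\nu_1\ldots\nu_q} + B^{\mu_1\ldots\mu_p}{}_{\nu_1\ldots\nu_q} \tag{3.20} $$
is again a $(p, q)$-tensor.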
2. Direct Products
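Given a $(p, q)$-tensor $A^{\mu_1\ldots\mu_p}{}_{\nu_1\ldots\nu_q}$ and a $(p', q')$-tensor $B^{\lambda_1\ldots\lambda_{p'}}{}_{\rho_1\ldots\rho_{q'}}$, their direct product
$$ A^{\mu_1\ldots\mu_p}{}_{\nu_1\ldots\nu_q}\, B^{\lambda_1\ldots\lambda_{p'}}{}_{\rho_1\ldots\rho_{q'}} \tag{3.21} $$
is a $(p + p', q + q')$-tensor,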
3. Contractions
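Given a $(p, q)$-tensor with $p$ and $q$ non-zero, one can associate to it a $(p-1, q-1)$-tensor
via contraction of one covariant and one contravariant index,
$$ A^{\mu_1\ldots\mu_p}{}_{\nu_1\ldots\nu_q} \to B^{\mu_1\ldots\mu_{p-1}}{}_{\nu_1\ldots\nu_{q-1}} = A^{\mu_1\ldots\mu_{p-1}\lambda}{}_{\nu_1\ldots\nu_{q-1}\lambda} \,. \tag{3.22} $$
This is indeed a $(p-1, q-1)$-tensor, i.e. transforms like one. Consider, for example,
a (1,2)-tensor $A^\mu{}_{\nu\lambda}$ and its contraction $B_\nu = A^\mu{}_{\nu\mu}$. Under a coordinate
transformation $B_\nu$ transforms as a covector:
$$
\begin{aligned}
B_{\nu'} &= A^{\mu'}{}_{\nu'\mu'} \\
&= \frac{\partial y^{\mu'}}{\partial x^\mu} \frac{\partial x^\nu}{\partial y^{\nu'}} \frac{\partial x^\lambda}{\partial y^{\mu'}}\, A^\mu{}_{\nu\lambda} \\
&= \frac{\partial x^\nu}{\partial y^{\nu'}}\, \delta^\lambda{}_\mu\, A^\mu{}_{\nu\lambda} \\
&= \frac{\partial x^\nu}{\partial y^{\nu'}}\, A^\mu{}_{\nu\mu} = \frac{\partial x^\nu}{\partial y^{\nu'}}\, B_\nu \,.
\end{aligned}
\tag{3.23}
$$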
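The calculation (3.23) can also be checked numerically. The sketch below assumes a linear change of coordinates in two dimensions (so that the Jacobian is a constant matrix, chosen arbitrarily here) and random components for the (1,2)-tensor; it then compares contracting after transforming with transforming the contraction as a covector.

```python
import random

def inverse_2x2(M):
    """Inverse of a 2x2 matrix, used to get dx/dy from dy/dx."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [[ M[1][1] / det, -M[0][1] / det],
            [-M[1][0] / det,  M[0][0] / det]]

random.seed(0)
n = 2
J = [[2.0, 1.0], [0.5, 3.0]]   # J[mp][m] = dy^{m'}/dx^m (assumed, constant)
Jinv = inverse_2x2(J)          # Jinv[m][mp] = dx^m/dy^{m'}

# Components A[m][v][l] of a (1,2)-tensor A^m_{vl} in the x-coordinates.
A = [[[random.uniform(-1, 1) for _ in range(n)] for _ in range(n)]
     for _ in range(n)]

# Transform A as a (1,2)-tensor: one factor of dy/dx, two factors of dx/dy.
A_new = [[[sum(J[mp][m] * Jinv[v][vp] * Jinv[l][lp] * A[m][v][l]
               for m in range(n) for v in range(n) for l in range(n))
           for lp in range(n)] for vp in range(n)] for mp in range(n)]

# Contract the upper index with the second lower index: B_v = A^m_{vm}.
B = [sum(A[m][v][m] for m in range(n)) for v in range(n)]
B_from_new = [sum(A_new[mp][vp][mp] for mp in range(n)) for vp in range(n)]

# Transform B directly as a covector.
B_as_covector = [sum(Jinv[v][vp] * B[v] for v in range(n)) for vp in range(n)]
```

Both routes should give the same components, reflecting the cancellation of the Jacobian against its inverse in the third line of (3.23).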
Remarks:
(a) Note that there are $p$ different ways of lowering the indices, and they will in
general give rise to different tensors. It is therefore important to keep track of
this in the notation. Thus, in the above, had we contracted over the second
index instead of the first, we should write
$$ g_{\mu_2\nu}\, A^{\mu_1\ldots\mu_p}{}_{\nu_1\ldots\nu_q} \equiv A^{\mu_1}{}_{\nu}{}^{\mu_3\ldots\mu_p}{}_{\nu_1\ldots\nu_q} \,. \tag{3.25} $$
(b) In particular, given a vector field $V^\mu(x)$, we can associate to it the dual
(with respect to the metric) covector field $V_\mu(x)$ with covariant components
$$ V_\mu = g_{\mu\nu} V^\nu \,, \tag{3.26} $$
$$ A^\mu = g^{\mu\nu} A_\nu \,. \tag{3.27} $$
$$ g^{\mu\nu} = g^{\mu\lambda} g^{\nu\rho} g_{\lambda\rho} \,. \tag{3.29} $$
and raising one index of the metric gives the Kronecker tensor,
$$ g^{\mu\lambda} g_{\lambda\nu} \equiv g^\mu{}_\nu = \delta^\mu{}_\nu \,. \tag{3.30} $$
The factor $\frac{1}{2}$ is chosen such that the symmetrisation of a symmetric tensor is the
same as the original tensor,
$$ T_{(\mu\nu)} = \tfrac{1}{2} (T_{\mu\nu} + T_{\nu\mu}) \tag{3.33} $$
is totally symmetric, i.e. symmetric under the exchange of any pair of indices, and
$$ T_{[\mu\nu\lambda]} \equiv \frac{1}{3!} (T_{\mu\nu\lambda} - T_{\nu\mu\lambda} - T_{\mu\lambda\nu} + T_{\nu\lambda\mu} - T_{\lambda\nu\mu} + T_{\lambda\mu\nu}) \tag{3.35} $$
is totally anti-symmetric. The prefactor $\frac{1}{6}$ is again there to ensure that the total
symmetrisation of a totally symmetric tensor is the original tensor (and likewise for
the total anti-symmetrisation of totally anti-symmetric tensors). This generalises
in an evident way to higher rank $p$ tensors, with the combinatorial prefactor $1/p!$.
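The role of the prefactor $1/p!$ can be illustrated with a short sketch. It stores the components of a rank-3 tensor in a dictionary keyed by index tuples (a representation chosen here for convenience, with hypothetical function names), anti-symmetrises it as in (3.35), and confirms that anti-symmetrising a second time changes nothing. Exact rational arithmetic is used so that the comparison is exact.

```python
from fractions import Fraction
from itertools import permutations, product
from math import factorial

def parity(perm):
    """Sign (+1 or -1) of a permutation given as a tuple of 0..p-1."""
    sign, perm = 1, list(perm)
    for i in range(len(perm)):
        while perm[i] != i:
            j = perm[i]
            perm[i], perm[j] = perm[j], perm[i]
            sign = -sign
    return sign

def antisymmetrise(T, dim, rank):
    """T_[i1...ip] = (1/p!) * sum over permutations of sign * T with permuted indices."""
    out = {}
    for idx in product(range(dim), repeat=rank):
        total = Fraction(0)
        for perm in permutations(range(rank)):
            permuted = tuple(idx[k] for k in perm)
            total += parity(perm) * T[permuted]
        out[idx] = total / factorial(rank)
    return out

dim, rank = 3, 3
# An arbitrary test tensor T_{ijk} = (i+1) (j+1)^2 (k+1)^3, with no symmetry.
T = {idx: Fraction((idx[0] + 1) * (idx[1] + 1) ** 2 * (idx[2] + 1) ** 3)
     for idx in product(range(dim), repeat=rank)}
A = antisymmetrise(T, dim, rank)
AA = antisymmetrise(A, dim, rank)
```

With a different normalisation (e.g. omitting the $1/3!$), the second anti-symmetrisation would rescale the tensor instead of reproducing it.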
An observation we will frequently make use of to recognise when some object is a tensor
is the following (occasionally known as the quotient theorem or quotient lemma):
Assume that you are given some object $A^{\mu_1\ldots\mu_p}{}_{\nu_1\ldots\nu_q}$. Then if for every covector $U_\mu$ the
contracted object $U_{\mu_1} A^{\mu_1\ldots\mu_p}{}_{\nu_1\ldots\nu_q}$ transforms like a $(p-1, q)$-tensor, $A^{\mu_1\ldots\mu_p}{}_{\nu_1\ldots\nu_q}$ is a $(p, q)$-tensor.
Likewise for contractions with vectors or other tensors so that if e.g. in an
equation of the form
$$ A_{\mu\nu} = B_{\mu\nu\lambda\rho}\, C^{\lambda\rho} \tag{3.36} $$
you know that $A$ transforms as a tensor for every tensor $C$, then $B$ itself has to be a
tensor.
If junk $\neq 0$, then there will be some $C$ such that junk contributes to the contraction
$B'_{\mu\nu\lambda\rho}\, C'^{\lambda\rho}$. That means that junk contributes to $A'$, the transformed $A$, contradicting the
premise that $A$ is a tensor.
3.4 Generally Covariant Integration and Volume Elements
While tensors are the objects which, in a sense, transform in the nicest and simplest
possible way under coordinate transformations, they are not the only relevant objects.
An important class of non-tensors (but almost tensors) are so-called tensor densities.
They will play a crucial role for us in order to have a generally-covariant notion of
integration at our disposal, and thus ultimately also a way of writing down generally
covariant action principles for fields etc.
In this section we will address the issue of generally covariant integration in a space-time
equipped with a metric. This will be accomplished with the help of a particular tensor
density constructed from the metric. Having thus established that tensor densities are
objects of legitimate interest in their own right, we will then discuss their properties in
more generality in section 3.5 below.
To set the stage, consider once again first the situation in special relativity. In that
case, the integral of a Lorentz scalar $f(\xi)$ with respect to the volume element $d^4\xi$ (or
$d^{27}\xi$ . . . ) is itself a Lorentz scalar, i.e. independent of the inertial reference frame in
which the integral is evaluated,
$$ \int d^4\xi\, f(\xi) = \int d^4\bar{\xi}\, \bar{f}(\bar{\xi}) \qquad (\bar{\xi}^a = L^a{}_b\, \xi^b) \tag{3.38} $$
1. $f$ is a scalar by assumption,
$$ \bar{f}(\bar{\xi}) = f(\xi) \,, \tag{3.39} $$
$$ d^4\bar{\xi} = d^4\xi \,. \tag{3.42} $$
because of the non-trivial Jacobian,
$$ d^4y = \left|\det\left(\frac{\partial y}{\partial x}\right)\right| d^4x \,. \tag{3.44} $$
One way out would be to abandon the idea that one should integrate scalars and to
require that the integrand f (x) should transform in such a way that it cancels the
Jacobian arising from the measure, namely as
$$ f'(y) = \left|\det\left(\frac{\partial y}{\partial x}\right)\right|^{-1} f(x) \,. \tag{3.45} $$
This is indeed an option, and we will return to this below (see remark 1 in section
3.5), but at this stage this is rather unintuitive and not particularly useful, in particular
because it is not clear how one should go about finding or constructing such objects in
the first place.
Therefore let us approach this question in a different way. Integrals are used to calculate
or measure volumes (or areas, or lengths, or . . . ). Such integrals should have a
coordinate-independent meaning, but they should depend on the prescription one uses
for measuring volumes, areas, lengths, . . . These prescriptions are concisely encoded in
the metric. Thus it is plausible that in order to define a generally covariant notion of
integration one may need to specify the metric, but that this is all that one should need
to know (while the Jacobian between two coordinate systems should fundamentally be
irrelevant and be considered to be a red herring).
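With this in mind, let us recall the standard tensorial transformation behaviour of the
metric under coordinate transformations,
$$ g'_{\mu'\nu'}(y) = \frac{\partial x^\mu}{\partial y^{\mu'}} \frac{\partial x^\nu}{\partial y^{\nu'}}\, g_{\mu\nu}(x) \,. \tag{3.46} $$
It follows from this that the absolute value of the determinant of the metric, $g := |\det g_{\mu\nu}|$,
does not transform like a scalar or some other tensor at all, but instead transforms as
$$ g' = \left(\det\frac{\partial x}{\partial y}\right)^{2} g = \left(\det\frac{\partial y}{\partial x}\right)^{-2} g \,. \tag{3.48} $$
In particular, its square-root $\sqrt{g}$ transforms as
$$ \sqrt{g'} = \left|\det\frac{\partial y}{\partial x}\right|^{-1} \sqrt{g} \,. \tag{3.49} $$
Therefore the combined expression $\sqrt{g}\, d^4x$ is invariant under general coordinate
transformations,
$$ \sqrt{g'}\, d^4y = \sqrt{g}\, d^4x \,, \tag{3.50} $$
and can therefore be used to define integrals of scalars in a generally covariant (but
metric-dependent) way,
$$ \int \sqrt{g'}\, d^4y\, f'(y) = \int \sqrt{g}\, d^4x\, f(x) \,. \tag{3.51} $$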
This will of course be important in order to formulate action principles etc. in a space-
time equipped with a metric in a generally covariant way.
This is also frequently the quickest way to determine the volume element in non-Cartesian
coordinates in Euclidean space. Thus, to determine what is the volume
element in spherical coordinates $\{y^k\} = (r, \theta, \phi)$, say, instead of laboriously determining
the Jacobi matrix for the coordinate transformation, and then (equally laboriously)
calculating its determinant (which would be the standard uninspiring and uninspired
procedure), all one needs to know is the metric in these coordinates to deduce
$$ ds^2 = dr^2 + r^2 (d\theta^2 + \sin^2\theta\, d\phi^2) \quad\Rightarrow\quad \sqrt{g} = r^2 \sin\theta \tag{3.52} $$
and therefore
$$ d^3x = \sqrt{g}\, d^3y = r^2 \sin\theta\, dr\, d\theta\, d\phi \,. \tag{3.53} $$
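The claim that the metric alone suffices can be checked against the "laborious" route. The following sketch evaluates $\sqrt{g}$ from the spherical-coordinate metric and, separately, the determinant of the Jacobi matrix of the standard map from spherical to Cartesian coordinates, at an arbitrarily chosen sample point. The helper names are illustrative only.

```python
import math

def det3(M):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
            - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
            + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))

def spherical_metric(r, theta):
    """Euclidean metric in (r, theta, phi): dr^2 + r^2 dtheta^2 + r^2 sin^2(theta) dphi^2."""
    return [[1.0, 0.0, 0.0],
            [0.0, r ** 2, 0.0],
            [0.0, 0.0, (r * math.sin(theta)) ** 2]]

def jacobian(r, theta, phi):
    """dx^i/dy^k for x = (r sin(theta) cos(phi), r sin(theta) sin(phi), r cos(theta))."""
    st, ct = math.sin(theta), math.cos(theta)
    sp, cp = math.sin(phi), math.cos(phi)
    return [[st * cp, r * ct * cp, -r * st * sp],
            [st * sp, r * ct * sp,  r * st * cp],
            [ct,      -r * st,      0.0]]

r, theta, phi = 1.5, 0.8, 2.1
sqrt_g = math.sqrt(abs(det3(spherical_metric(r, theta))))   # metric route
jac_det = abs(det3(jacobian(r, theta, phi)))                # Jacobian route
expected = r ** 2 * math.sin(theta)
```

Both numbers should agree with $r^2 \sin\theta$; the metric route only requires the determinant of a diagonal matrix.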
3.5 Tensor Densities and Volume Elements
In the previous section we have encountered certain not strictly tensorial objects which
nevertheless turned out to be useful. Having thus established the basic credentials of
such objects, we will now formalise this somewhat.
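Thus the prime example of what we will call a tensor density is the (absolute value of
the) determinant $g := |\det g_{\mu\nu}|$ of the metric tensor, which, as we have seen, transforms
as
$$ g' = \left(\det\frac{\partial y}{\partial x}\right)^{-2} g \,. \tag{3.54} $$
An object which transforms in such a way under coordinate transformations is called
a scalar tensor density of weight $w = +2$, and the square root of the determinant $\sqrt{g}$
transforms as, and hence is, a tensor density of weight $w = +1$.
$$ g'^{-w/2}\, T^{\mu'_1\ldots\mu'_p}{}_{\nu'_1\ldots\nu'_q} = \frac{\partial y^{\mu'_1}}{\partial x^{\mu_1}} \cdots \frac{\partial y^{\mu'_p}}{\partial x^{\mu_p}}\, \frac{\partial x^{\nu_1}}{\partial y^{\nu'_1}} \cdots \frac{\partial x^{\nu_q}}{\partial y^{\nu'_q}}\, g^{-w/2}\, T^{\mu_1\ldots\mu_p}{}_{\nu_1\ldots\nu_q} \,. \tag{3.56} $$
Conversely, therefore, any tensor density of weight $w$ can be written as a tensor times
$g^{+w/2}$,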
The algebraic rules for tensor densities are strictly analogous to those for tensors. Thus,
for example, the sum of two (p, q) tensor densities of weight w (let us call this a (p, q; w)
tensor) is again a (p, q; w) tensor, and the direct product of a (p1 , q1 ; w1 ) and a (p2 , q2 ; w2 )
tensor is a (p1 + p2 , q1 + q2 ; w1 + w2 ) tensor. Contractions and the raising and lowering
of indices of tensor densities can also be defined just as for ordinary tensors.
Remarks:
1. Generalising the argument in section 3.4, we now learn that if $f$ is any scalar
density of weight $w = +1$, then its integral is well-defined and coordinate independent,
$$ \int d^4y\, f'(y) = \int d^4x\, f(x) \,. \tag{3.58} $$
See remark 4 below for one way of constructing such objects without taking recourse
to a metric.
2. There is one more important tensor density which - like the Kronecker tensor - has
the same components in all coordinate systems. This is the totally anti-symmetric
Levi-Civita symbol $\epsilon_{\mu\nu\rho\sigma}$ (taking the values $0, \pm 1$) which is a tensor density of
weight $w = -1$. Then $\sqrt{g}\, \epsilon_{\mu\nu\rho\sigma}$ is a tensor (strictly speaking it is a pseudo-tensor
because of its behaviour under reversal of orientation - see below).
To see this, recall first of all the definition of the Levi-Civita symbol: it is totally
anti-symmetric,
$$ \epsilon_{\mu\nu\rho\sigma} = \epsilon_{[\mu\nu\rho\sigma]} \,, \tag{3.59} $$
and has therefore only got one independent component which we will normalise
to be
$$ \epsilon_{0123} = +1 \,. \tag{3.60} $$
Next, recall one possible definition of the determinant $\det M$ of a $(D \times D)$-matrix
$M$, namely as the coefficient (proportionality factor) on the right-hand side of
$$ \epsilon_{\mu_1\ldots\mu_D}\, M^{\mu_1}{}_{\nu_1} \cdots M^{\mu_D}{}_{\nu_D} = (\det M)\, \epsilon_{\nu_1\ldots\nu_D} \,. \tag{3.62} $$
Now choose $M$ to be the Jacobi matrix $(\partial y/\partial x)$. Then the above equation shows
that
$$ \epsilon_{\mu'_1\ldots\mu'_D} = \det\left(\frac{\partial y}{\partial x}\right) \frac{\partial x^{\nu_1}}{\partial y^{\mu'_1}} \cdots \frac{\partial x^{\nu_D}}{\partial y^{\mu'_D}}\, \epsilon_{\nu_1\ldots\nu_D} \,, \tag{3.63} $$
i.e. that $\epsilon_{\mu_1\ldots\mu_D}$ transforms as a tensor density of weight $w = -1$, provided that
$\det(\partial y/\partial x) > 0$. The latter condition means that the coordinate transformation
preserves the orientation. Thus, $\epsilon_{\mu_1\ldots\mu_D}$ transforms as a tensor density under
orientation-preserving coordinate transformations but picks up a sign when the
orientation is reversed. Thus strictly speaking $\epsilon_{\mu_1\ldots\mu_D}$ is not a tensor density but
a pseudo-tensor density.
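The characterisation (3.62) of the determinant is easy to test directly. The sketch below implements the Levi-Civita symbol for indices $0, \ldots, D-1$ and evaluates both sides of (3.62) for an arbitrarily chosen integer $3 \times 3$ matrix, so that all comparisons are exact. The function names are hypothetical.

```python
from itertools import permutations, product

def levi_civita(indices):
    """Totally anti-symmetric symbol with epsilon_{0 1 ... D-1} = +1."""
    if len(set(indices)) < len(indices):
        return 0
    sign, idx = 1, list(indices)
    for i in range(len(idx)):
        while idx[i] != i:
            j = idx[i]
            idx[i], idx[j] = idx[j], idx[i]
            sign = -sign
    return sign

def contracted(M, nu):
    """Left-hand side of (3.62): eps_{m1..mD} M^{m1}_{nu1} ... M^{mD}_{nuD}."""
    D = len(M)
    total = 0
    # Only permutations contribute, since epsilon vanishes on repeated indices.
    for mu in permutations(range(D)):
        term = levi_civita(mu)
        for k in range(D):
            term *= M[mu[k]][nu[k]]
        total += term
    return total

M = [[2, 0, 1], [1, 3, -1], [0, 5, 4]]
# Choosing nu = (0, 1, 2), where epsilon = +1, reads off det M itself.
detM = contracted(M, (0, 1, 2))
# The general statement: the left-hand side is det M times epsilon for every nu.
holds_for_all_nu = all(contracted(M, nu) == detM * levi_civita(nu)
                       for nu in product(range(3), repeat=3))
```

For index combinations with a repeated entry both sides vanish, which is the statement that the contracted object is again totally anti-symmetric.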
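Going back to 4 dimensions, it follows that
$$ \sqrt{g}\, \epsilon_{\mu\nu\rho\sigma} \tag{3.64} $$
We could have chosen to not absorb the minus sign into the definition of $\epsilon^{\mu\nu\rho\sigma}$,
at the expense of an explicit minus sign on the right-hand side of (3.65). The
convention we have adopted is more convenient, however, in particular since it
is compatible with the standard practice in special relativity to (tacitly) identify
$\epsilon^{\mu\nu\rho\sigma} = \eta^{\mu\alpha} \eta^{\nu\beta} \eta^{\rho\gamma} \eta^{\sigma\delta} \epsilon_{\alpha\beta\gamma\delta}$, the minus sign arising from raising the indices on $\epsilon_{\mu\nu\rho\sigma}$ with the
Minkowski metric with $\eta_{00} = -1$, so that $\epsilon^{0123} = -\epsilon_{0123}$.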
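This is not a tensor but transforms like a scalar density. On the other hand, if
one works instead with the tensor $\sqrt{g}\, \epsilon_{\mu\nu\rho\sigma}$ one obtains a scalar, and this scalar is
precisely the invariant volume element (3.50),
$$ \frac{1}{4!} \sqrt{g}\, \epsilon_{\mu\nu\rho\sigma}\, dx^\mu \wedge dx^\nu \wedge dx^\rho \wedge dx^\sigma = \sqrt{g}\, d^4x \,. \tag{3.68} $$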
3.6 Towards a Coordinate-Independent Interpretation of Tensors
Consider first of all the derivative $df$ of a function (scalar field) $f = f(x)$. This is
clearly a coordinate-independent object, not only because we didn't have to specify a
coordinate system to write $df$ but also because
$$ df = \frac{\partial f(x)}{\partial x^\mu}\, dx^\mu = \frac{\partial f(y(x))}{\partial y^{\mu'}}\, dy^{\mu'} \,, \tag{3.70} $$
which follows from the fact that $\partial_\mu f$ (a covector) and $dx^\mu$ (the coordinate differentials)
transform inversely to each other under coordinate transformations. This suggests that
it is useful to regard the quantities $\partial_\mu f$ as the coefficients of the coordinate independent
object $df$ in a particular coordinate system, namely when $df$ is expanded in the basis
$\{dx^\mu\}$.
We can do the same thing for any covector $A_\mu$. If $A_\mu$ is a covector (i.e. transforms like
one under coordinate transformations), then $A := A_\mu(x)\, dx^\mu$ is coordinate-independent,
and it is useful to think of the $A_\mu$ as the coefficients of the covector $A$ when expanded
in a coordinate basis, $A = A_\mu\, dx^\mu$. Linear combinations of $dx^\mu$ built in this way from
covectors are known as 1-forms.
From this point of view, we interpret the $\{A_\mu\}$ simply as the (coordinate dependent)
components of the (coordinate independent) 1-form $A$ when expressed with respect to
the (coordinate dependent) differentials $\{dx^\mu\}$, considered as a basis of the space of
covectors.
Something similar can be done for vector fields. Just as covectors transform inversely to
coordinate differentials, vectors $V^\mu$ transform inversely to partial derivatives $\partial_\mu$. Thus
$$ V := V^\mu(x)\, \frac{\partial}{\partial x^\mu} \tag{3.71} $$
is coordinate-independent - a coordinate-independent linear first-order differential
operator. One can thus always think of a vector field as a 1st order differential operator
and this is a very fruitful point of view.
$$ V f = V^\mu \partial_\mu f \,. \tag{3.72} $$
This is also a coordinate independent object, a scalar, arising from the contraction of a
vector and a covector. And this is as it should be because, after all, both a function and
a vector field can be specified on a space-time without having to introduce coordinates
(e.g. by simply drawing the vector field and the profile of the function). Therefore also
the change of the function along a vector field should be coordinate independent and,
as we have seen, it is.
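The coordinate independence of the derivative of a function along a vector field can be made tangible with a small check. The sketch below assumes, for illustration only, the function $f = x^2 y$ on the plane and an arbitrary vector at a sample point, and evaluates $V^\mu \partial_\mu f$ once in Cartesian and once in polar coordinates, transforming the vector components with $\partial y^{\mu'}/\partial x^\mu$ and writing the gradient directly in terms of $(r, \phi)$.

```python
import math

x, y = 1.2, 0.7
V_cart = (0.3, -1.1)                      # components V^x, V^y (assumed values)

# Cartesian side: f = x^2 y, so grad f = (2 x y, x^2).
grad_cart = (2 * x * y, x ** 2)
Vf_cart = sum(v * d for v, d in zip(V_cart, grad_cart))

# Polar side: vector components transform with dy^{m'}/dx^m, y^{m'} = (r, phi).
r, phi = math.hypot(x, y), math.atan2(y, x)
J = [[x / r, y / r],                      # dr/dx,   dr/dy
     [-y / r ** 2, x / r ** 2]]           # dphi/dx, dphi/dy
V_polar = tuple(sum(J[a][m] * V_cart[m] for m in range(2)) for a in range(2))

# In polar coordinates f = r^3 cos^2(phi) sin(phi); differentiate directly.
c, s = math.cos(phi), math.sin(phi)
grad_polar = (3 * r ** 2 * c ** 2 * s,
              r ** 3 * (c ** 3 - 2 * c * s ** 2))
Vf_polar = sum(v * d for v, d in zip(V_polar, grad_polar))
```

The two numbers agree because the components of the vector and of the gradient transform inversely to each other, exactly as described above.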
So far we have only discussed vectors and covectors. All this can, in principle, be
extended to higher rank tensors, but at this point it would be very useful to introduce
the notion (or at least the notation) of tensor products. I will briefly describe this in
section 3.7 below.
For those who do not want to delve into this (and it is not required for the following):
the fact of the matter is that any $(p, q)$-tensor $T^{\mu_1\ldots\mu_p}{}_{\nu_1\ldots\nu_q}$ can be thought of as the collection
of components of a coordinate independent object $T$ when expanded in a particular
coordinate basis in terms of the $dx^\mu$ and $(\partial/\partial x^\mu)$.
Any choice of coordinate system $\{x^\mu\}$ gives rise to such a basis $\{dx^\mu\}$, and such bases
are known as coordinate bases or natural bases. This is not the only possible choice of
basis, however, and we will return to this issue in section 3.8.
3.7 Multilinear Algebra and Tensors
In (multi-)linear algebra, the tensor product is used to describe multilinear maps. Let
$V$ be a vector space, and $V^*$ its dual, consisting of the linear maps $V \to \mathbb{R}$, and denote
the action of $a \in V^*$ on $v \in V$ by
$$ a \in V^*,\; v \in V \;\mapsto\; a(v) \in \mathbb{R} \,. \tag{3.73} $$
$$ e^i(E_k) = \delta^i{}_k \,, \tag{3.74} $$
$$ a \otimes b \neq b \otimes a \,. \tag{3.77} $$
$$ (a \otimes b)(v, w) = a_i b_k v^i w^k \,, \tag{3.78} $$
acting as
$$ a(v, w) = a_{ik} v^i w^k \,. \tag{3.80} $$
From these definitions it follows that the tensor product is evidently linear,
$$ a \otimes (b + c) = a \otimes b + a \otimes c \tag{3.81} $$
(and likewise for the first factor), and $\mathbb{R}$-linear, i.e. for $r \in \mathbb{R}$ one has
Using the canonical isomorphism $(V^*)^* \cong V$ for finite-dimensional vector spaces,
one can also in the same way define the tensor product $V \otimes V$ as the space of
bilinear functions on $V^* \times V^*$,
By the same token, the tensor product $V \otimes W$ is the space of bilinear maps on
$V^* \times W^*$.
$$ \otimes^p V^* = \underbrace{V^* \otimes \cdots \otimes V^*}_{p \text{ times}} \tag{3.87} $$
These multilinear maps can be added and multiplied and thus form an algebra,
the tensor algebra of $V^*$, denoted by $T(V^*)$. As a vector space, it consists of the
sums of all the $p$-linear maps,
$$ T(V^*) = \bigoplus_{p=0}^{\infty} \otimes^p V^* \,. \tag{3.88} $$
The tensor product can also be used to describe multilinear maps between vector spaces:
Likewise a linear map from $V$ to some other vector space $W$ can be regarded as
an element of $V^* \otimes W$.
Clearly, in general, given a basis of $V$ and a dual basis of $V^*$, the tensor product can
be used to construct a basis
in the space
$$ T^{p,q} = \underbrace{(V \otimes \cdots \otimes V)}_{p \text{ times}} \otimes \underbrace{(V^* \otimes \cdots \otimes V^*)}_{q \text{ times}} \tag{3.92} $$
of $(p, q)$-tensors,
$$ T \in T^{p,q} : \quad T = T^{i_1\ldots i_p}{}_{k_1\ldots k_q}\, (E_{i_1} \otimes \cdots \otimes E_{i_p}) \otimes (e^{k_1} \otimes \cdots \otimes e^{k_q}) \,. \tag{3.93} $$
This is the way we will use the tensor product notation below, as a multilinear operation
providing us with a basis for higher rank tensor fields.
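For readers who like to see the definitions in action, here is a minimal sketch of the tensor product of two covectors as a bilinear map, following (3.78). Covectors and vectors are represented simply as lists of components with respect to dual bases, and the numerical values are arbitrary; this is an illustration, not a general tensor library.

```python
def tensor_product(a, b):
    """Bilinear map (a (x) b)(v, w) = a_i b_k v^i w^k built from two covectors."""
    def bilinear(v, w):
        a_of_v = sum(ai * vi for ai, vi in zip(a, v))
        b_of_w = sum(bk * wk for bk, wk in zip(b, w))
        return a_of_v * b_of_w
    return bilinear

def components(bilinear, dim):
    """Components in the basis e^i (x) e^k, obtained by evaluating on E_i, E_k."""
    basis = [[1 if i == j else 0 for j in range(dim)] for i in range(dim)]
    return [[bilinear(basis[i], basis[k]) for k in range(dim)]
            for i in range(dim)]

a, b, c = [1, 2], [3, -1], [0, 5]      # covector components a_i, b_i, c_i
v, w = [2, 1], [-1, 4]                 # vector components v^i, w^i

ab = tensor_product(a, b)
ba = tensor_product(b, a)
b_plus_c = [bk + ck for bk, ck in zip(b, c)]
a_bc = tensor_product(a, b_plus_c)
ac = tensor_product(a, c)
```

Evaluating `ab(v, w)` and `ba(v, w)` gives different numbers, which is the non-commutativity (3.77), while `a_bc(v, w)` equals `ab(v, w) + ac(v, w)`, which is the linearity (3.81).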
The reason for introducing and working with tensors, defined in this way, is that tensorial
equations have the virtue that they are generally covariant, i.e. that they are satisfied
in all coordinate systems if and only if they are satisfied in one coordinate system. The
emphasis in this formulation is thus not on tensors as multilinear maps but on how they
transform under coordinate transformations. This seems to be somewhat at odds with
the definition of tensors in multilinear algebra, but as we will see below this is simply
due to the choice of a particular class of bases (coordinate bases), with respect to which
multilinear maps indeed transform in this way under changes of the coordinate basis,
i.e. under changes of coordinates.
We had already noted above, that there is a more coordinate independent way of looking
at covector fields and vector fields, by associating to them the objects
which are completely invariant under coordinate transformations, with the $dx^\mu$ and the
$\partial_\mu$ providing a basis for the space of covector and vector fields respectively.
This perspective can now be extended to higher-rank and mixed tensors. In particular,
associated with the metric $g_{\mu\nu}(x)$ we have the coordinate independent line element
$$ ds^2 = g_{\mu\nu}\, dx^\mu dx^\nu \,. \tag{3.96} $$
which we can now also think of as the tensor
$$ g = g_{\mu\nu}\, dx^\mu \otimes dx^\nu \,. \tag{3.97} $$
Since we are now dealing with tensor fields rather than just with tensors (multilinear
maps at a given point), the tensor product in this context is required to be multilinear
not just over R, but over functions (scalars) so that e.g.
Now let us return to (3.97). If one wants to emphasise that the metric is a symmetric
(0,2)-tensor, one can also expand it with respect to the symmetrised basis as
but for the metric the tensor-product is often omitted and one simply writes it as the
line element (3.96).
If one has a non-symmetric (0, 2)-tensor $T_{\mu\nu}$, say, then one can also group these coefficients
into the components of a coordinate-invariant object, but now the tensor product
notation
$$ T_{\mu\nu} \to T = T_{\mu\nu}\, dx^\mu \otimes dx^\nu \tag{3.100} $$
is more useful than just writing $T_{\mu\nu}\, dx^\mu dx^\nu$, simply to emphasise the fact that all components
of $T_{\mu\nu}$, not just the symmetric part of $T_{\mu\nu}$, contribute to $T$ because $dx^\mu \otimes dx^\nu$
is not symmetric,
$$ dx^\mu \otimes dx^\nu \neq dx^\nu \otimes dx^\mu \,, \tag{3.101} $$
(whereas just writing $dx^\mu dx^\nu$ might lead one to believe that $dx^\mu$ and $dx^\nu$ commute).
More generally, to a $(0, p)$-tensor we can associate the object
The tensor product notation is also useful for higher-rank contravariant or mixed tensors.
Given a (2, 0)-tensor with components $T^{\mu\nu}$, say, one really does not want to write
the corresponding coordinate-invariant object as $T^{\mu\nu} \partial_\mu \partial_\nu$, say, because this may be
interpreted as a second order differential operator whereas what one really means is a
bilinear first order differential operator, which one writes as
$$ T = T^{\mu\nu}\, \partial_\mu \otimes \partial_\nu \,, \tag{3.104} $$
In general, we can thus think of a $(p, q)$-tensor field, as given in (3.94), as the components
of a coordinate-independent object
$$ T = T^{\mu_1\ldots\mu_p}{}_{\nu_1\ldots\nu_q}(x)\, (\partial_{\mu_1} \otimes \cdots \otimes \partial_{\mu_p}) \otimes (dx^{\nu_1} \otimes \cdots \otimes dx^{\nu_q}) \,, \tag{3.105} $$
when expanded with respect to the coordinate basis in the space of tensor fields generated
by $dx^\mu$ and $\partial_\mu = \frac{\partial}{\partial x^\mu}$.
3.8 Vielbeins and Orthonormal Frames
As we saw in section 3.6, a choice of coordinates provides one with a choice of basis for
vectors, covectors and other tensors, and a quantity like $V^\mu$ is then interpreted as the
collection of components of an object $V = V^\mu \partial_\mu$ with respect to the coordinate basis $\partial_\mu$.
In classical tensor calculus one always works in such a basis, and with the components
of tensors with respect to such a basis. This is very convenient and natural, but this is
now clearly not the only choice.
Indeed, the above point of view suggests a reformulation and generalisation that is
extremely natural and useful (but that I will nevertheless hardly ever make use of in
these notes).
Namely, let $\{e^m_\mu(x)\}$ be such that it is an invertible matrix for every point $x$. Then
another possible choice of basis for the space of covectors are the linear combinations
$$ e^m := e^m_\mu\, dx^\mu \,. \tag{3.106} $$
A general such basis is called a vielbein, which is German for multileg, quite appropriate
actually, as one should visualise this as a bunch of linearly independent (co-)vectors at
every point of space-time.
In two, three, and four dimensions these are also known more specifically as zweibeins,
dreibeins and vierbeins respectively. In four dimensions, the Greek word tetrads is
also commonly used. The $e^m$ are sometimes also referred to as frame fields, mostly in
the context of orthonormal frames (see below).
In general, this new basis is not a coordinate basis, i.e. there does not exist a coordinate
system $\{y^m\}$ such that $e^m = dy^m$. If such a coordinate system does exist, then one has
$$ e^m = dy^m \quad\Rightarrow\quad e^m_\mu = \frac{\partial y^m}{\partial x^\mu} \quad\Rightarrow\quad \partial_\mu e^m_\nu = \partial_\nu e^m_\mu \,, \tag{3.107} $$
and locally also the converse is true. In particular, if
For many purposes, bases other than coordinate bases can also be extremely useful and
natural, in particular the orthonormal bases we will introduce below.
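$$ dx^\mu = e^\mu_m\, e^m \,, \tag{3.109} $$
$$ e^m_\mu\, e^\mu_n = \delta^m{}_n \,, \qquad e^\mu_m\, e^m_\nu = \delta^\mu{}_\nu \,. \tag{3.110} $$
$$ A = A_\mu\, dx^\mu = A_\mu\, e^\mu_m\, e^m \equiv A_m\, e^m \,, \tag{3.111} $$
so that the components of $A$ with respect to the new basis $\{e^m\}$ are
$$ A_m = A_\mu\, e^\mu_m \,. \tag{3.112} $$
Likewise, the vielbeins allow us to pass from a natural (or coordinate) basis for vector
fields, the $\{\partial_\mu\}$, to another basis
$$ E_m = e^\mu_m\, \partial_\mu \,, \tag{3.113} $$
$$ V = V^\mu(x)\, \partial_\mu = V^m E_m \tag{3.114} $$
with
$$ V^m = e^m_\mu\, V^\mu \,. \tag{3.115} $$
Note that, unlike the $\partial_\mu$, the $E_m$ do not commute in general, i.e.
$$ [E_m, E_n] \neq 0 \,. \tag{3.116} $$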
We can apply the same reasoning to any other tensor field, e.g. to the metric tensor
itself. We can write the invariant line element as
so that the components of the metric with respect to the new basis are
$$ g_{mn} = g_{\mu\nu}\, e^\mu_m\, e^\nu_n \,. \tag{3.118} $$
Given a metric, there is a preferred class of bases $\{e^a\}$ which are such that the corresponding
matrices $e^a_\mu(x)$ diagonalise (and normalise) the metric at every point $x$, i.e.
which are such that $g_{ab} = \eta_{ab}$ or
Such a basis $e^a$, with respect to which the components of the metric are the Minkowski
metric $\eta_{ab}$, is known as an orthonormal basis or orthonormal frame.
In the more mathematical literature, the $e^a$ are also referred to as soldering forms
because they identify (solder, glue) an abstract space of (co-)vectors at each point $x$,
labelled by $a, b, \ldots$ with the concrete space of (co-)vectors tangent to the space-time at
the point $x$, labelled e.g. by the indices $\mu, \nu, \ldots$.
For a general metric, a basis which achieves this cannot be a coordinate basis (because
this would mean that the metric is equivalent to the Minkowski metric by a coordinate
transformation). However, clearly there is no obstacle to finding a more general basis
which will do this: for every point $x$ we can find a matrix $e^a_\mu(x)$ which achieves (3.119).
As the metric varies smoothly with $x$, we can also choose the matrices $e^a_\mu(x)$ to vary
smoothly with $x$, and hence we can put them together to define the smooth matrix-valued
function $e^a_\mu(x)$ for all $x$. [I am ignoring some global (topological) issues here.
We will not need to worry about them here.]
The reason why I referred to a class of bases above is that, clearly, such an orthonormal
basis is not unique. At every point x it is determined up to a Lorentz transformation
Thus a given metric does not determine a unique orthonormal basis, but only an or-
thonormal basis up to Lorentz transformations
If one wants the components of the metric in a given coordinate system $\{x^\mu\}$, one
expands the orthonormal basis $e^a$ in terms of the natural basis $dx^\mu$ as above as
to find, as above,
$$ g_{\mu\nu}(x) = e^a_\mu(x)\, e^b_\nu(x)\, \eta_{ab} \,. \tag{3.124} $$
Thus instead of the metric one can choose orthonormal vielbeins as the basic variables
of General Relativity. In that case one has to demand not only general covariance but
also invariance under local Lorentz transformations (acting on the orthonormal indices
a, b, . . .). [One could also allow for general vielbeins, in which case one would have to
replace Lorentz transformations by the larger group of general linear transformations.]
Examples:
Here are a few examples to illustrate that orthonormal frames are not something mys-
terious but can usually be read off very easily from the metric in a coordinate basis.
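Now define
$$ e^1 = R\, d\theta \,, \qquad e^2 = R \sin\theta\, d\phi \,, \tag{3.126} $$
i.e.
$$ e^a = e^a_\mu\, dx^\mu \tag{3.127} $$
with
$$ e^1_\theta = R \,, \quad e^1_\phi = 0 \,, \quad e^2_\theta = 0 \,, \quad e^2_\phi = R \sin\theta \,. \tag{3.128} $$
$$ ds^2 = e^1 e^1 + e^2 e^2 = \delta_{ab}\, e^a e^b \,, \tag{3.129} $$
so the $e^a$ are an orthonormal basis. They are obviously not a coordinate basis
because (3.108)
$$ \partial_\theta e^2_\phi = R \cos\theta \neq \partial_\phi e^2_\theta = 0 \,. \tag{3.130} $$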
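The two statements just made about the frame (3.126) can be checked with a few lines of code. The sketch below assumes the coordinates are ordered as $(\theta, \phi)$ and picks an arbitrary radius and sample point; it rebuilds the metric components from $g_{\mu\nu} = \delta_{ab}\, e^a_\mu e^b_\nu$ and estimates $\partial_\theta e^2_\phi$ by a central finite difference.

```python
import math

R = 2.0   # radius, an arbitrary assumed value

def zweibein(theta):
    """e[a][mu] = e^a_mu for e^1 = R dtheta, e^2 = R sin(theta) dphi; mu = (theta, phi)."""
    return [[R, 0.0],
            [0.0, R * math.sin(theta)]]

def metric_from_frame(e):
    """g_{mu nu} = delta_{ab} e^a_mu e^b_nu."""
    return [[sum(e[a][mu] * e[a][nu] for a in range(2)) for nu in range(2)]
            for mu in range(2)]

theta = 0.9
g = metric_from_frame(zweibein(theta))

# Central finite difference for d_theta e^2_phi.
h = 1e-6
d_theta_e2_phi = (zweibein(theta + h)[1][1] - zweibein(theta - h)[1][1]) / (2 * h)
# e^2_theta vanishes identically, so its phi-derivative is zero.
d_phi_e2_theta = 0.0
```

The reconstructed components are $g_{\theta\theta} = R^2$ and $g_{\phi\phi} = R^2 \sin^2\theta$, while the two derivatives in (3.130) differ, so no coordinates $y^a$ with $e^a = dy^a$ can exist.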
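$$ E_a = E_a^\mu\, \partial_\mu \tag{3.131} $$
$$ E_1 = R^{-1}\, \partial_\theta \,, \qquad E_2 = (R \sin\theta)^{-1}\, \partial_\phi \,, \tag{3.132} $$
which satisfies
$$ g_{\mu\nu}\, E^\mu_a E^\nu_b = \delta_{ab} \,. \tag{3.133} $$
That this is not a coordinate basis is reflected in the fact that the commutator
$[E_1, E_2] \neq 0$,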
2. The Schwarzschild Metric (1.129)
The metric is
With
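$$ (e^0, e^1, e^2, e^3) = \left( \left(1 - \frac{2m}{r}\right)^{1/2} dt \,,\; \left(1 - \frac{2m}{r}\right)^{-1/2} dr \,,\; r\, d\theta \,,\; r \sin\theta\, d\phi \right) \tag{3.136} $$
$$ ds^2 = \eta_{ab}\, e^a e^b \,, \tag{3.137} $$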
Remarks:
$$ g_{\mu\nu}\, u^\mu u^\nu = -1 \,. \tag{3.140} $$
we see that the 4-velocity $u^\mu$ can be interpreted as the timelike component $e^\mu_{a=0}$
of an orthonormal frame along the worldline,
$$ u^\mu = e^\mu_{a=0} \,, \tag{3.142} $$
condition of these vectors along the worldline, such as the Fermi-Walker parallel
transport to be discussed in section 4.10.
In any case, however the laboratory system is defined, the frame components
$$ V^a = e^a_\mu\, V^\mu \tag{3.143} $$
= u k = ea=0 k = ea=0
k k
a=0
. (3.144)
2. The $e^a_\mu$ can in some sense be regarded as the square-root of the metric. In particular
denoting the determinant of the matrix $e^a_\mu$ by
(3.119) implies
$$ g(x) := |\det(g_{\mu\nu}(x))| = e(x)^2 \quad\Rightarrow\quad |e(x)| = \sqrt{g(x)} \,. \tag{3.146} $$
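As an illustration of (3.146), the following sketch uses the diagonal Schwarzschild frame (3.136), builds the diagonal metric components from $g_{\mu\nu} = \eta_{ab}\, e^a_\mu e^b_\nu$ with $\eta = \mathrm{diag}(-1, +1, +1, +1)$, and compares $|\det e|$ with $\sqrt{g}$. The sample values of $r$, $\theta$ and $m$ are arbitrary, and the sketch assumes $r > 2m$ so that the square roots are real.

```python
import math

def vierbein(r, theta, m):
    """Diagonal entries e^a_mu of the frame (3.136), coordinates (t, r, theta, phi)."""
    f = 1.0 - 2.0 * m / r          # assumed positive, i.e. r > 2m
    return [math.sqrt(f), 1.0 / math.sqrt(f), r, r * math.sin(theta)]

def metric_diagonal(e):
    """Diagonal metric components g_{mu mu} = eta_{aa} (e^a_mu)^2."""
    eta = [-1.0, 1.0, 1.0, 1.0]
    return [eta[a] * e[a] ** 2 for a in range(4)]

r, theta, m = 5.0, 1.1, 1.0
e = vierbein(r, theta, m)
g = metric_diagonal(e)

det_e = math.prod(e)                       # determinant of a diagonal matrix
sqrt_g = math.sqrt(abs(math.prod(g)))      # sqrt of |det g|
```

The factors of $(1 - 2m/r)$ cancel between $e^0$ and $e^1$, so both quantities reduce to $r^2 \sin\theta$, the same volume factor as in flat space in spherical coordinates.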
3. Coordinate indices can, as usual, be raised and lowered with the space-time metric
$g_{\mu\nu}$ and its inverse, and Minkowski (tangent space) indices with the Minkowski
metric $\eta_{ab}$ and its inverse.
Note that this is consistent with the notation for $e^a_\mu$ and its inverse $e^\mu_a$ because
$$ e^\mu_a = g^{\mu\nu}\, \eta_{ab}\, e^b_\nu \,. \tag{3.147} $$
$$ g^{\mu\nu} = \eta^{ab}\, e^\mu_a\, e^\nu_b \,, \tag{3.148} $$
etc. The reason why I have called the basis of vector fields in a general frame $E_m$
rather than $e_m$ is that $e^m$ and $E_m$ are of course not related just by lowering or
raising the indices of the metric, $E_m \neq g_{mn} e^n$. The former are linear combinations
of the $dx^\mu$, the latter linear combinations of the $\partial_\mu$, so they are very different
objects.
One could now go ahead and develop the entire machinery of tensor calculus (covariant
derivatives, curvature, . . . ) that we are about to develop in the following sections in
terms of vielbeins as the basic variables instead of the metric. This is rather straightforward.
For example, given the expression for the Christoffel symbols in terms of the
metric, and for the metric in terms of the vielbeins, one can express the Christoffel
symbols (and hence covariant derivatives and curvatures) in terms of vielbeins, but the
resulting expressions are rather unenlightening and not of much use in practice.
The real power of the vielbein formalism emerges when one combines it with the for-
malism of differential forms. And in practice the most useful and efficient alternative
to working in components in a coordinate basis is working with differential forms in an
orthonormal basis.
I do most of my (curvature) calculations in the latter framework (and e.g. only then
translate them into coordinate components for the purposes of inserting them into these
notes), but this is (for the time being) not something I will develop further here.8
3.9 Epilogue: Indices? Indices!
Having reached this point, you may have the impression that the notation we have
introduced for tensors, $T^{\mu_1\ldots\mu_p}{}_{\nu_1\ldots\nu_q}$ say, and which, as you might have noticed by looking
ahead, we will continue to use in these notes, with its morass of indices, is somewhat
cumbersome and inelegant. And perhaps you might prefer to at the very least see
everything written in terms of the index-free coordinate-invariant objects like $V = V^\mu \partial_\mu$
or $A = A_\mu\, dx^\mu$ introduced in section 3.6.
I cannot disagree with the sentiment that using all these indices does not appear to be
particularly elegant. Mathematicians abhor it. Physicists, however, are pragmatists by
nature - they will use whatever turns out to be useful or efficient for what they want
to achieve, regardless of whether or not it is considered or perceived to be beautiful or
elegant according to some external criteria.
In particular, in the case at hand, the index-laden notation would not be that commonly
used and widespread if it did not have some distinct advantages over other options.
Indeed, this notation is an extremely useful and informative bookkeeping device that
conveys a lot of information in a very compact way. In particular, as we have seen, the
index notation allows one to reliably read off what kind of tensor one is dealing with,
along the lines of "if it has p upper and q lower indices, it transforms like, hence is, a
(p, q)-tensor". Moreover, as we will see below, it provides one with a much more concise
and informative way of describing and performing algebraic manipulations of tensors
than some index-free notation is capable of.
Let me first make clear what the issue is and what it is not when one writes something
like $V^\mu$ or $V^\mu(x)$, as this can be interpreted in (at least) 2 distinct ways:
8
See e.g. W. Thirring, Classical Mathematical Physics for a presentation of general relativity entirely
in the coordinate-independent formalism of differential forms, and N. Straumann, General Relativity,
where differential forms are used whenever it is convenient or useful.
1. On the one hand, $V^\mu$ may refer to the numerical values of the components of a
specific vector V in a specific coordinate system.
2. On the other hand, the notation $V^\mu$ may be used to indicate that the object V
transforms like a vector.
The first use of $V^\mu$ is completely uncontentious: if one wants to write down the
components of some object with respect to some basis, one has to write down the components
of that object with respect to that basis; there is no way around that.
It is mainly the second use and interpretation of the notation that is at stake, and it is
also mainly in this sense that the index notation is used for tensor algebra and tensor
calculus in general and in these notes in particular.
To a somewhat lesser extent the fact that the notation itself does not indicate whether
one has in mind the first or the second interpretation is also an issue (even though this
is usually clear from the context). It is actually not so much an issue (if desired this
is something that can easily be remedied - I will come back to this at the end of this
section) as possibly the source of a major misunderstanding between mathematicians
and physicists - namely that a dislike of the index notation arises from the (false!) belief
that it means that one is always writing down objects with respect to a particular basis.
If this were the case, this would indeed be clumsy and silly, and quite contrary to the
spirit of general covariance. However, as interpretation 2 indicates, this is absolutely
not what is meant.
Returning to the use of indices as a way to indicate tensorial type and tensorial oper-
ations (like contractions), let us consider the alternatives. If one wants to indicate in
symbols that some object V is a vector field, then as a mathematician one might write
something like $V \in \Gamma(TM)$, stating that V is a section of the tangent bundle of the
space or space-time (manifold) M . This is fine, but if the space M is clear from the
context, why not declare once and for all that writing $V^\mu$ means the same thing? And
perhaps use different kinds of indices to refer to tensors on different spaces?
If this were all then this would hardly be an issue and even physicists could be convinced
to write $V \in \Gamma(TM)$, at least when talking to mathematicians. Where the index
notation really pays off, however, is when it comes to algebraic manipulations such as
those discussed in section 3.3 (and even more so when it comes to tensor analysis, which
is the subject of section 4, but tensor algebra will be enough to illustrate this).
As examples consider the contractions of a (1, 2) tensor T , say, with itself and with
a vector V . With indices one would write $T^\mu{}_{\nu\lambda}$ and $V^\mu$ and the possible contractions
would be written as
$$ T \to T^\mu{}_{\mu\lambda} \,,\; T^\mu{}_{\nu\mu} $$
$$ (T, V) \to T^\mu{}_{\nu\lambda} V^\nu \,,\; T^\mu{}_{\nu\lambda} V^\lambda \,, \qquad (3.149) $$
the first line indicating the two distinct covectors one obtains as contractions of T itself,
and the second the two distinct possibilities of contracting T and V to obtain a (1, 1)-tensor.
In an index-free notation one would have to invent some operation like $C^m_n$ to
indicate a contraction over the mth upper and nth lower index.9 In this notation, the
four objects above would then be written as
$$ T \to C^1_1(T) \,,\; C^1_2(T) $$
$$ (T, V) \to C^2_1(T \otimes V) \,,\; C^2_2(T \otimes V) \,. \qquad (3.150) $$
Is this superior? It does not even allow one to read off the tensor type of the resulting
objects unless one remembers what the tensor types of T and V were to begin with,
whereas this is completely manifest in (3.149).
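The bookkeeping in (3.149) can be made concrete with a small numerical sketch. It is not part of the original notes: it assumes a 2-dimensional space, a fixed basis, and arbitrary illustrative component values, and the function names are hypothetical. Each function is simply one of the four index contractions written as an explicit sum.

```python
# Components T[mu][nu][lam] of a (1,2)-tensor T^mu_{nu lam} in a fixed basis.
# The numbers are arbitrary illustrative values (an assumption of this sketch).
DIM = 2
T = [[[1, 2], [3, 4]],
     [[5, 6], [7, 8]]]
V = [10, 20]  # components V^mu of a vector


def contract_first_lower(T):
    """Covector T^mu_{mu lam}: upper index contracted with the 1st lower index."""
    return [sum(T[m][m][lam] for m in range(DIM)) for lam in range(DIM)]


def contract_second_lower(T):
    """Covector T^mu_{nu mu}: upper index contracted with the 2nd lower index."""
    return [sum(T[m][nu][m] for m in range(DIM)) for nu in range(DIM)]


def contract_vector_first(T, V):
    """(1,1)-tensor T^mu_{nu lam} V^nu, stored as result[mu][lam]."""
    return [[sum(T[mu][nu][lam] * V[nu] for nu in range(DIM))
             for lam in range(DIM)] for mu in range(DIM)]


def contract_vector_second(T, V):
    """(1,1)-tensor T^mu_{nu lam} V^lam, stored as result[mu][nu]."""
    return [[sum(T[mu][nu][lam] * V[lam] for lam in range(DIM))
             for nu in range(DIM)] for mu in range(DIM)]


print(contract_first_lower(T), contract_second_lower(T))
print(contract_vector_first(T, V), contract_vector_second(T, V))
```

The two self-contractions give different covectors, and the two contractions with V give different (1,1)-tensors, which is exactly the distinction that the index positions in (3.149) keep track of.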
Moreover, imagine how untransparent this would become were one to perform even the
simplest sequence of such elementary operations: compare
If you prefer the right-hand side, or some variant of it, feel free to use it. However, you
should be aware of the fact that the left-hand side contains an equivalent amount of
information, simply packaged in a more digestible way that is both more informative
(it's a scalar!) and easier to manipulate. For most intents and purposes the index
notation is really extremely convenient and it is for this reason that we will continue to
make use of it in these notes.
One other reason for concern may be that by exclusively working with local coordinates
and coordinate bases one may be missing some global aspects of a space or space-
time. This is certainly true to a certain extent but is not primarily a notational issue.
Rather, it means that in addition one needs to make use of more advanced notions from
topology, global analysis etc. This is not something I will attempt here (cf. the book by
Hawking and Ellis in the previous footnote for a description of the groundbreaking early
applications of global analysis to general relativity). One related, but more elementary,
issue is the introduction and use of the term manifold when referring to spaces or space-
times of the kind we are dealing with in these notes. This is something I will very briefly
come back to in section 4.11 below.
Let me, to conclude this rant section, come back to the issue of the notational ambiguity
when one writes something like $V^\mu$, which can occasionally be a source of confusion.
Even though, as mentioned above, usually it is clear from the context what one means,
one might imagine wanting to write down a couple of equations with indices which
are only valid in spherical coordinates, say, and are therefore not to be understood as
tensorial equations. Then it might be helpful to have a notation which reveals that
information as well.
9
I am not making this up - see e.g. section 2.2 of The large scale structure of space-time by S. Hawking
and G. Ellis, in all other respects a wonderful book.
This can for instance be accomplished by inventing a new notation like $\doteq$ (or whatever)
to indicate an equality only in a special or specified coordinate system, but while this
may add clarity it does not address the fundamental issue that just writing $V^\mu$ does
not unambiguously specify what one has in mind.
Alternatively, and more elegantly and attractively, this can e.g. be accomplished with
very little effort with the help of what is known as the Penrose abstract index notation.
The idea is to still indicate the tensor type of an object by a certain kind of indices, but
with these indices only serving that purpose and not simultaneously referring to any
particular kind of basis. Thus for example, one would indicate a vector by an object
$V^a$, where the fact that one has a single upper index a just means that this is a (1, 0)-tensor,
and nothing else (exactly as in interpretation 2 above). For the components of
this vector with respect to some basis (coordinates $x^\mu$) one could then continue to use
the traditional $V^\mu$.
The advantage of this abstract index notation is that for tensorial operations one never
needs to specify a basis anyway, so they can all be performed at the level of the abstract
indices and tensorial equations look identical when written with these abstract indices
or when written with concrete component indices. Thus $V^a W_a$ is used to indicate the
scalar one obtains by contraction of a vector $V^a$ with a covector $W_a$. Likewise, instead
of $T^\mu{}_{\mu\nu}$ (which may look basis dependent) one would write $T^a{}_{ab}$, and this is completely
equivalent to writing something like $C^1_1(T)$,
but much more informative and user-friendly, and all the usual rules of tensor algebra
apply to these abstract indices.
Whenever one wants or needs to specify a basis or coordinate system, this can be
accomplished by using other kinds of indices. Thus $g_{ab}$ could e.g. be used to refer to
the metric tensor in general, while $g_{\mu\nu}$ could then be used to refer to its components in
the coordinate basis associated with $x^\mu$. From this we see that
[...] the distinction between the index notation and the component notation
is much more one of spirit (i.e., how one thinks of the quantities appearing)
than of substance (i.e., the physical form the equations take).10
While I will not make use of the abstract index notation in these notes (with the hope
that this will not cause any confusion), the use of abstract indices appears to be an
ideal (eat the cake and have it too) compromise combining the best of both worlds
and should actually keep both camps happy. It does not yet appear to have found
widespread acceptance among mathematicians, however.
10
R. Wald, General Relativity. See section 2.4 of this book for a more detailed explanation of the
abstract index notation, which is systematically used throughout the book. For a detailed treatment of
the abstract index notation and a discussion of some minor subtleties with this notation see R. Penrose,
W. Rindler, Spinors and Space-Time, Vol. 1: Two-Spinor Calculus and Relativistic Fields.
An alternative compromise solution is the already mentioned use of differential forms (in
an orthonormal basis, say), which is manifestly covariant and minimises clutter, display-
ing only the (essential and informative) Lorentz Lie algebra indices while suppressing
the component indices of forms (anti-symmetric tensors).
4 Tensor Analysis (Generally Covariant Differentiation)
Tensors transform in a nice and simple way under general coordinate transformations.
Thus these appear to be the right objects to construct equations from that satisfy the
Principle of General Covariance.
However, the laws of physics are differential equations, so we need to know how to
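differentiate tensors. The problem is that the ordinary partial derivative does not map
tensors to tensors: the partial derivative of a (p, q)-tensor is not a tensor unless p = q = 0.
This is easy to see: take for example a vector $V^\mu$. Under a coordinate transformation
$x^\mu \to y^{\mu'}$, its partial derivative transforms as
$$ \partial_{\nu'} V^{\mu'} = \frac{\partial x^\nu}{\partial y^{\nu'}} \frac{\partial}{\partial x^\nu} \left( \frac{\partial y^{\mu'}}{\partial x^\mu} V^\mu \right)
 = \frac{\partial x^\nu}{\partial y^{\nu'}} \frac{\partial y^{\mu'}}{\partial x^\mu} \partial_\nu V^\mu + \frac{\partial x^\nu}{\partial y^{\nu'}} \frac{\partial^2 y^{\mu'}}{\partial x^\mu \partial x^\nu} V^\mu . \qquad (4.1) $$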
The appearance of the second term shows that the partial derivative of a vector is not
a tensor.
As the second term is zero for linear transformations, you see that partial derivatives
transform in a tensorial way e.g. under Lorentz transformations, so that partial deriva-
tives are all one usually needs in special relativity.
We also see that the lack of covariance of the partial derivative is very similar to the
lack of covariance of the equation $\ddot{x}^\mu = 0$, and this suggests that the problem can be
cured in the same way - by introducing Christoffel symbols. This is indeed the case.
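We define the covariant derivative $\nabla_\nu V^\mu$ of a vector field $V^\mu$ by
$$ \nabla_\nu V^\mu = \partial_\nu V^\mu + \Gamma^\mu_{\nu\lambda} V^\lambda . \qquad (4.2) $$
It follows from the non-tensorial behaviour (1.157), (1.158) of the Christoffel symbols
under coordinate transformations $x^\mu \to y^{\mu'}$ that $\nabla_\nu V^\mu$, as defined above, is indeed a
(1, 1) tensor. To see this explicitly, write $J^{\mu'}_\mu = \partial y^{\mu'} / \partial x^\mu$ and $J^\mu_{\mu'} = \partial x^\mu / \partial y^{\mu'}$
for the Jacobi matrix and its inverse. Then
$$ \nabla_{\nu'} V^{\mu'} = \partial_{\nu'} V^{\mu'} + \Gamma^{\mu'}_{\nu'\lambda'} V^{\lambda'} \qquad (4.3) $$
$$ \nabla_{\nu'} V^{\mu'} = J^\nu_{\nu'} \partial_\nu (J^{\mu'}_\mu V^\mu) + (J^{\mu'}_\mu J^\nu_{\nu'} J^\lambda_{\lambda'} \Gamma^\mu_{\nu\lambda} + J^{\mu'}_\mu \partial_{\nu'} J^\mu_{\lambda'}) J^{\lambda'}_\rho V^\rho . \qquad (4.4) $$
The obstructions to tensoriality are the 2 terms involving the derivatives of the Jacobi
matrix, but these cooperatively combine (using $J^{\mu'}_\mu J^\mu_{\lambda'} = \delta^{\mu'}{}_{\lambda'}$ and $\partial_{\nu'} = J^\nu_{\nu'} \partial_\nu$) to give
$$ J^\nu_{\nu'} \partial_\nu J^{\mu'}_\rho + J^{\mu'}_\mu (\partial_{\nu'} J^\mu_{\lambda'}) J^{\lambda'}_\rho
 = J^\nu_{\nu'} \partial_\nu J^{\mu'}_\rho - (\partial_{\nu'} J^{\mu'}_\mu) J^\mu_{\lambda'} J^{\lambda'}_\rho
 = J^\nu_{\nu'} \partial_\nu J^{\mu'}_\rho - J^\nu_{\nu'} \partial_\nu J^{\mu'}_\rho = 0 . \qquad (4.5) $$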
Remarks:
1. Analysing the above argument for the tensoriality of the covariant derivative, we
see that it relies exclusively on the specific non-tensorial form of the transformation
behaviour of the Christoffel symbols, not on the explicit form of the Christoffel
symbols themselves.
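Thus any other object $\tilde\Gamma^\mu_{\nu\lambda}$ could also be used to define a covariant derivative
(generalising the partial derivative and mapping tensors to tensors) provided that
it transforms in the same way as the Christoffel symbols, i.e. provided that one
has
$$ \tilde\Gamma^{\mu'}_{\nu'\lambda'} = \frac{\partial y^{\mu'}}{\partial x^\mu} \frac{\partial x^\nu}{\partial y^{\nu'}} \frac{\partial x^\lambda}{\partial y^{\lambda'}} \tilde\Gamma^\mu_{\nu\lambda}
 + \frac{\partial y^{\mu'}}{\partial x^\mu} \frac{\partial^2 x^\mu}{\partial y^{\nu'} \partial y^{\lambda'}} . \qquad (4.8) $$
This implies (and is equivalent to the fact) that the difference
$$ C^\mu{}_{\nu\lambda} = \tilde\Gamma^\mu_{\nu\lambda} - \Gamma^\mu_{\nu\lambda} \qquad (4.9) $$
transforms as a tensor. Thus, any such $\tilde\Gamma$ is of the form
$$ \tilde\Gamma^\mu_{\nu\lambda} = \Gamma^\mu_{\nu\lambda} + C^\mu{}_{\nu\lambda} \qquad (4.10) $$
with $C^\mu{}_{\nu\lambda}$ a (1, 2)-tensor.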
2. We could have arrived at the above definition of the covariant derivative (using
the Christoffel symbols) in a somewhat more systematic way by appealing to the
equivalence principle and/or general covariance. Namely, let $\{\xi^a\}$ be an inertial
coordinate system. In an inertial coordinate system we can just use the ordinary
partial derivative $\partial_b V^a$. We now define the new (improved, covariant) derivative
$\nabla_\nu V^\mu$ in any other coordinate system $\{x^\mu\}$ by demanding that it transforms as a
(1,1)-tensor, i.e. we define
$$ \nabla_\nu V^\mu := \frac{\partial x^\mu}{\partial \xi^a} \frac{\partial \xi^b}{\partial x^\nu} \, \partial_b V^a . \qquad (4.11) $$
By a straightforward calculation one finds that
$$ \nabla_\nu V^\mu = \partial_\nu V^\mu + \Gamma^\mu_{\nu\lambda} V^\lambda , \qquad (4.12) $$
V = V ; ; . (4.17)
One can also define the covariant directional derivative of a vector field V along
another vector field X by
$$ \nabla_X V^\mu \equiv X^\nu \nabla_\nu V^\mu . \qquad (4.18) $$
4. The appearance of the Christoffel-term in the definition of the covariant derivative
may at first sight appear a bit unusual (even though it also appears when one
just transforms Cartesian partial derivatives to polar coordinates etc.). There
is a more invariant way of explaining the appearance of this term, related to
the more coordinate-independent way of looking at tensors explained in section
3.6. Namely, since the $V^\mu(x)$ are really just the coefficients of the vector field
$V(x) = V^\mu(x) \partial_\mu$ when expanded in the basis $\partial_\mu$, a meaningful definition of the
derivative of a vector field must take into account not only the change in the
coefficients but must also include a prescription how bases at (infinitesimally)
neighbouring points are related (or connected). Such a prescription is provided by
the Levi-Civita connection $\Gamma$ (or a general connection $\tilde\Gamma$).
Indeed, writing
$$ \nabla_\nu V = \nabla_\nu (V^\mu \partial_\mu) = (\partial_\nu V^\mu) \partial_\mu + V^\lambda (\nabla_\nu \partial_\lambda) , \qquad (4.19) $$
we see that the covariant derivative of the coordinate basis vector $\partial_\lambda$ (i.e. $V^\lambda = 1$,
$V^\mu = 0$ otherwise), is the linear transformation (a prescription for a change of
basis)
$$ \nabla_\nu \partial_\lambda = \Gamma^\mu_{\nu\lambda} \partial_\mu . \qquad (4.20) $$
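As a concrete illustration of (4.20), which is not taken from the original text, consider the Euclidean plane in polar coordinates $(r, \varphi)$ with metric $ds^2 = dr^2 + r^2 d\varphi^2$. Using the standard non-zero Christoffel symbols of this metric, the change of the coordinate basis vectors from point to point reads:

```latex
\Gamma^{r}_{\varphi\varphi} = -r , \qquad
\Gamma^{\varphi}_{r\varphi} = \Gamma^{\varphi}_{\varphi r} = \frac{1}{r}
\quad\Rightarrow\quad
\nabla_r \partial_r = 0 , \qquad
\nabla_r \partial_\varphi = \nabla_\varphi \partial_r = \frac{1}{r}\,\partial_\varphi , \qquad
\nabla_\varphi \partial_\varphi = -r\,\partial_r .
```

Even though the plane is flat, the basis vectors $\partial_r$ and $\partial_\varphi$ rotate and rescale as one moves around, and the Christoffel symbols record precisely this.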
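4.2 Extension of the Covariant Derivative to Other Tensor Fields
So far we have defined the covariant derivative for vector fields, and we now want to
extend the definition of the covariant derivative to other tensor fields. In order to achieve
this, we now adopt a more systematic and axiomatic approach.
Our basic postulates for the covariant derivative are the following: it is a linear operator
mapping tensors to tensors, it satisfies the Leibniz rule, and on scalars $\phi$ it reduces to
the ordinary partial derivative,
$$ \nabla_\mu \phi = \partial_\mu \phi . \qquad (4.21) $$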
We will now see that, demanding the above properties, in particular the Leibniz rule,
there is a unique extension of the covariant derivative on vector fields to a differential
operator on general tensor fields, mapping (p, q)- to (p, q + 1)-tensors.
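To define e.g. the covariant derivative for covectors $U_\mu$, we note that $U_\mu V^\mu$ is a scalar
for any vector $V^\mu$ so that
$$ \nabla_\mu (U_\nu V^\nu) = \partial_\mu (U_\nu V^\nu) = (\partial_\mu U_\nu) V^\nu + U_\nu (\partial_\mu V^\nu) \qquad (4.23) $$
(since the partial derivative satisfies the Leibniz rule), and we demand
$$ \nabla_\mu (U_\nu V^\nu) = (\nabla_\mu U_\nu) V^\nu + U_\nu \nabla_\mu V^\nu . \qquad (4.24) $$
Since $\nabla_\mu V^\nu$ is already known, comparing these two equations determines $\nabla_\mu U_\nu$ uniquely to be
$$ \nabla_\mu U_\nu = \partial_\mu U_\nu - \Gamma^\lambda_{\mu\nu} U_\lambda . \qquad (4.25) $$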
That this is indeed a (0, 2)-tensor can either be checked directly or, alternatively, is a
consequence of the quotient theorem.
The extension to other (p, q)-tensors is now immediate. If the (p, q)-tensor is the direct
product of p vectors and q covectors, then we already know its covariant derivative (using
the Leibniz rule again). We simply adopt the same resulting formula for an arbitrary
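(p, q)-tensor. The result is that the covariant derivative of a general (p, q)-tensor is the
sum of the partial derivative, a Christoffel symbol with a positive sign for each of the p
upper indices, and a Christoffel symbol with a negative sign for each of the q lower indices. In
equations
$$ \nabla_\mu T^{\nu_1 \cdots \nu_p}{}_{\lambda_1 \cdots \lambda_q} = \partial_\mu T^{\nu_1 \cdots \nu_p}{}_{\lambda_1 \cdots \lambda_q}
 + \underbrace{\Gamma^{\nu_1}_{\mu\rho} T^{\rho \nu_2 \cdots \nu_p}{}_{\lambda_1 \cdots \lambda_q} + \ldots + \Gamma^{\nu_p}_{\mu\rho} T^{\nu_1 \cdots \nu_{p-1} \rho}{}_{\lambda_1 \cdots \lambda_q}}_{p\ \text{terms}}
 \underbrace{- \Gamma^{\rho}_{\mu\lambda_1} T^{\nu_1 \cdots \nu_p}{}_{\rho \lambda_2 \cdots \lambda_q} - \ldots - \Gamma^{\rho}_{\mu\lambda_q} T^{\nu_1 \cdots \nu_p}{}_{\lambda_1 \cdots \lambda_{q-1} \rho}}_{q\ \text{terms}} \qquad (4.26) $$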
Having defined the covariant derivative for arbitrary tensors, we are also ready to define
it for tensor densities. For this we recall that if T is a (p, q; w) tensor density, then
$g^{-w/2} T$ is a (p, q)-tensor. Thus $\nabla_\mu (g^{-w/2} T)$ is a (p, q + 1)-tensor. To map this back to
a tensor density of weight w, we multiply this by $g^{w/2}$, arriving at the definition
$$ \nabla_\mu T := g^{w/2} \nabla_\mu (g^{-w/2} T) = - \frac{w}{2g} (\partial_\mu g) \, T + \nabla^{\text{tensor}}_\mu T , $$
where $\nabla^{\text{tensor}}_\mu$
just means the usual covariant derivative for (p, q)-tensors defined above.
For example, for a scalar density $\phi$ one has
$$ \nabla_\mu \phi = \partial_\mu \phi - \frac{w}{2g} (\partial_\mu g) \, \phi . \qquad (4.29) $$
In particular, since the determinant g is a scalar density of weight +2, it follows that
$$ \nabla_\mu g = 0 , \qquad (4.30) $$
which obviously simplifies integrations by parts in integrals defined with the measure
$\sqrt{g} \, d^4x$.
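4.3 Main Properties of the Covariant Derivative
The main properties of the covariant derivative, in addition to those that were part of
our postulates (like linearity and the Leibniz rule) are the following:
Covariant differentiation commutes with contraction. For example, for the contraction
$A^\nu{}_\nu$ of a (1, 1)-tensor $A^\nu{}_\lambda$ the two Christoffel terms cancel,
$$ \nabla_\mu A^\nu{}_\nu = \partial_\mu A^\nu{}_\nu + \Gamma^\nu_{\mu\lambda} A^\lambda{}_\nu - \Gamma^\lambda_{\mu\nu} A^\nu{}_\lambda = \partial_\mu A^\nu{}_\nu . \qquad (4.32) $$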
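The most transparent way of stating this property is that the Kronecker delta is
covariantly constant, i.e. that
$$ \nabla_\mu \delta^\nu{}_\lambda = 0 . \qquad (4.33) $$
Indeed, given this and the Leibniz rule, one has
$$ \nabla_\mu A^{\ldots \nu \ldots}{}_{\ldots \nu \ldots} = \nabla_\mu (A^{\ldots \nu \ldots}{}_{\ldots \lambda \ldots} \, \delta^\lambda{}_\nu)
 = (\nabla_\mu A^{\ldots \nu \ldots}{}_{\ldots \lambda \ldots}) \, \delta^\lambda{}_\nu + A^{\ldots \nu \ldots}{}_{\ldots \lambda \ldots} \nabla_\mu \delta^\lambda{}_\nu
 = (\nabla_\mu A^{\ldots \nu \ldots}{}_{\ldots \lambda \ldots}) \, \delta^\lambda{}_\nu \qquad (4.34) $$
which is precisely the statement that covariant differentiation and contraction
commute. To establish that the Kronecker delta is covariantly constant, we follow
the rules to find
$$ \nabla_\mu \delta^\nu{}_\lambda = \partial_\mu \delta^\nu{}_\lambda + \Gamma^\nu_{\mu\rho} \delta^\rho{}_\lambda - \Gamma^\rho_{\mu\lambda} \delta^\nu{}_\rho
 = \Gamma^\nu_{\mu\lambda} - \Gamma^\nu_{\mu\lambda} = 0 . \qquad (4.35) $$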
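This property does not rely on the specific form of the $\Gamma^\mu_{\nu\lambda}$, and is thus true for
any covariant derivative defined by some choice of connection $\tilde\Gamma^\mu_{\nu\lambda}$.
The metric is also covariantly constant. With $\Gamma_{\mu\nu\lambda} = g_{\mu\rho} \Gamma^\rho_{\nu\lambda}$, and since
$$ \partial_\lambda g_{\mu\nu} = \Gamma_{\mu\nu\lambda} + \Gamma_{\nu\mu\lambda} , \qquad (4.36) $$
we calculate
$$ \nabla_\lambda g_{\mu\nu} = \partial_\lambda g_{\mu\nu} - \Gamma^\rho_{\lambda\mu} g_{\rho\nu} - \Gamma^\rho_{\lambda\nu} g_{\mu\rho}
 = \Gamma_{\mu\nu\lambda} + \Gamma_{\nu\mu\lambda} - \Gamma_{\nu\lambda\mu} - \Gamma_{\mu\lambda\nu}
 = 0 . \qquad (4.37) $$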
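The cancellation in (4.37) can be checked in a simple case. The following sketch is not part of the original notes; it assumes the flat plane in polar coordinates, with metric diag(1, r^2) and the standard Christoffel symbols of that metric hard-coded, and evaluates all components of the covariant derivative of the metric at one sample radius. The function names are hypothetical.

```python
# Polar coordinates (r, phi) on the flat plane; index 0 = r, index 1 = phi.
# Assumptions of this sketch: metric g = diag(1, r^2) and its standard
# Christoffel symbols Gamma^r_{phi phi} = -r, Gamma^phi_{r phi} = 1/r.

def metric(r):
    """g[mu][nu] = g_{mu nu}."""
    return [[1.0, 0.0], [0.0, r * r]]


def d_metric(r):
    """dg[lam][mu][nu] = partial_lam g_{mu nu}; only d_r g_{phi phi} = 2r is non-zero."""
    dg = [[[0.0, 0.0], [0.0, 0.0]] for _ in range(2)]
    dg[0][1][1] = 2.0 * r
    return dg


def christoffel(r):
    """G[rho][mu][nu] = Gamma^rho_{mu nu}."""
    G = [[[0.0, 0.0], [0.0, 0.0]] for _ in range(2)]
    G[0][1][1] = -r
    G[1][0][1] = G[1][1][0] = 1.0 / r
    return G


def nabla_metric(r):
    """nabla_lam g_{mu nu} = d_lam g_{mu nu}
       - Gamma^rho_{lam mu} g_{rho nu} - Gamma^rho_{lam nu} g_{mu rho}."""
    g, dg, G = metric(r), d_metric(r), christoffel(r)
    return [[[dg[l][m][n]
              - sum(G[p][l][m] * g[p][n] for p in range(2))
              - sum(G[p][l][n] * g[m][p] for p in range(2))
              for n in range(2)] for m in range(2)] for l in range(2)]


components = nabla_metric(1.7)
print(max(abs(x) for plane in components for row in plane for x in row))
```

All eight components vanish up to rounding, even though the partial derivative of the metric does not.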
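Covariant derivatives commute on scalars. This is also known as the no torsion property
of the covariant derivative. Namely, we have
$$ \nabla_\mu \nabla_\nu \phi - \nabla_\nu \nabla_\mu \phi = \nabla_\mu \partial_\nu \phi - \nabla_\nu \partial_\mu \phi
 = \partial_\mu \partial_\nu \phi - \Gamma^\lambda_{\mu\nu} \partial_\lambda \phi - \partial_\nu \partial_\mu \phi + \Gamma^\lambda_{\nu\mu} \partial_\lambda \phi = 0 . \qquad (4.39) $$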
Note that the second covariant derivatives on higher rank tensors do not commute
- we will come back to this in our discussion of the curvature tensor later on.
4.4 Uniqueness of the Levi-Civita Connection (Christoffel symbols)
We noted before that the postulates for a covariant derivative (a linear tensorial operator
reducing to the partial derivative on scalars and satisfying the Leibniz rule) do not
determine it uniquely but only up to the addition of a tensor to the connection,
$$ \tilde\Gamma^\mu_{\nu\lambda} = \Gamma^\mu_{\nu\lambda} + C^\mu{}_{\nu\lambda} . \qquad (4.40) $$
Not unrelated to this is the fact that the connection given by the Christoffel symbols is
the unique connection that can be built from only the metric and its 1st derivatives (and
which thus vanishes in an inertial coordinate system in Minkowski space or at the origin
of an inertial coordinate system in an arbitrary gravitational field).
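Moreover, as we have seen, this covariant derivative has two important properties,
namely that
1. the metric is covariantly constant, $\nabla_\lambda g_{\mu\nu} = 0$, and
2. the torsion is zero, i.e. the second covariant derivatives of a scalar commute.
In fact, it turns out that these two conditions uniquely determine the $\tilde\Gamma^\lambda_{\mu\nu}$ to be the
Christoffel symbols. The second condition implies that the $\tilde\Gamma^\lambda_{\mu\nu}$ are symmetric in the
two lower indices,
$$ [\tilde\nabla_\mu , \tilde\nabla_\nu] \phi = 0 \;\Leftrightarrow\; \tilde\Gamma^\lambda_{\mu\nu} = \tilde\Gamma^\lambda_{\nu\mu} . \qquad (4.41) $$
The first condition now allows one to express the $\tilde\Gamma^\lambda_{\mu\nu}$ in terms of the derivatives of
the metric, leading uniquely to the familiar expression for the Christoffel symbols $\Gamma^\lambda_{\mu\nu}$: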
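First of all, by definition / construction one has (e.g. from demanding the Leibniz rule
for $\tilde\nabla_\lambda (g_{\mu\nu} V^\mu W^\nu)$)
$$ \tilde\nabla_\lambda g_{\mu\nu} = \partial_\lambda g_{\mu\nu} - \tilde\Gamma^\rho_{\lambda\mu} g_{\rho\nu} - \tilde\Gamma^\rho_{\lambda\nu} g_{\mu\rho} . \qquad (4.42) $$
Requiring that this be zero implies in particular that, with $\tilde\Gamma_{\lambda\mu\nu} = g_{\lambda\rho} \tilde\Gamma^\rho_{\mu\nu}$,
$$ 0 = \tilde\nabla_\mu g_{\nu\lambda} + \tilde\nabla_\nu g_{\mu\lambda} - \tilde\nabla_\lambda g_{\mu\nu} $$
$$ \;\; = \partial_\mu g_{\nu\lambda} + \partial_\nu g_{\mu\lambda} - \partial_\lambda g_{\mu\nu}
 - \tilde\Gamma_{\lambda\mu\nu} - \tilde\Gamma_{\nu\mu\lambda} - \tilde\Gamma_{\lambda\nu\mu} - \tilde\Gamma_{\mu\nu\lambda} + \tilde\Gamma_{\nu\lambda\mu} + \tilde\Gamma_{\mu\lambda\nu} $$
$$ \;\; = 2 (\Gamma_{\lambda\mu\nu} - \tilde\Gamma_{\lambda\mu\nu}) \qquad (4.43) $$
(where the cancellations are entirely due to the assumed symmetry of the coefficients
$\tilde\Gamma_{\lambda\mu\nu}$ in the last two indices). Thus $\tilde\Gamma^\lambda_{\mu\nu} = \Gamma^\lambda_{\mu\nu}$. This unique metric-compatible and torsion-free
connection is also known as the Levi-Civita connection. It is the connection canonically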
associated to a space-time (manifold) equipped with a metric tensor, and it is the
connection used in general relativity.
It is possible to relax either of the conditions (1) or (2), or both of them and this will
be discussed in section 10.5, and subsequently also in section 19.7.
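4.5 Tensor Analysis: Some Special Cases
In this section we will look at some common and useful special cases of the Levi-Civita
covariant derivative (simply "the covariant derivative" in the following), such as the
covariant curl and divergence etc.
For the anti-symmetrised covariant derivative (the covariant curl) the Christoffel symbols
drop out because of their symmetry in the lower indices. Thus, for example, the Maxwell
field strength can be written with ordinary partial derivatives,
$$ F_{\mu\nu} = \partial_\mu A_\nu - \partial_\nu A_\mu , \qquad (4.45) $$
and likewise for completely anti-symmetric tensors one has
$$ \nabla_{[\mu} A_{\nu\lambda\cdots]} = \partial_{[\mu} A_{\nu\lambda\cdots]} . \qquad (4.46) $$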
3. The Covariant Divergence of a Vector
By the covariant divergence of a vector field one means the scalar
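$$ \nabla_\mu V^\mu = \partial_\mu V^\mu + \Gamma^\mu_{\mu\lambda} V^\lambda . \qquad (4.47) $$
Now a useful identity for the contracted Christoffel symbol is
$$ \Gamma^\mu_{\mu\lambda} = g^{-1/2} \partial_\lambda (g^{+1/2}) . \qquad (4.48) $$
I will give a proof of this identity in an appendix to this section (subsection 4.6).
Thus the covariant divergence can be written compactly as
$$ \nabla_\mu V^\mu = \frac{1}{\sqrt{g}} \partial_\mu (\sqrt{g} \, V^\mu) , \qquad (4.49) $$
and one only needs to calculate g and its derivative, not the Christoffel symbols
themselves, to calculate the covariant divergence of a vector field.
This formula is also useful (and provides the quickest way of arriving at the result)
if one just wants to write the ordinary flat space divergence of vector calculus on
$\mathbb{R}^3$ in, say, polar or cylindrical coordinates.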
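In Cartesian coordinates $(x^1, x^2, x^3)$, the divergence of a 3-vector $\vec{V}$ is of course
given by the familiar expression
$$ \mathrm{div} \vec{V} = \partial_1 V^1 + \partial_2 V^2 + \partial_3 V^3 . \qquad (4.50) $$
However, as you also know, e.g. in spherical coordinates $(r, \theta, \phi)$ the divergence is
not simply of this form,
$$ \mathrm{div} \vec{V} \neq \partial_r V^r + \partial_\theta V^\theta + \partial_\phi V^\phi . \qquad (4.51) $$
Rather, going through the coordinate transformation and Jacobians etc., one finds
that calculating the divergence in spherical coordinates one picks up additional
terms, the result taking the somewhat unintuitive form
$$ \mathrm{div} \vec{V} = \partial_r V^r + \partial_\theta V^\theta + \partial_\phi V^\phi + \frac{2}{r} V^r + \cot\theta \, V^\theta . \qquad (4.52) $$
The easy and quick way to obtain this, which provides a rationale for and explanation
of the origin of these additional terms, is from the result (4.49). Using
$\sqrt{g} = r^2 \sin\theta$, one has
$$ \mathrm{div} \vec{V} = \frac{1}{r^2 \sin\theta} \left[ \partial_r (r^2 \sin\theta \, V^r) + \partial_\theta (r^2 \sin\theta \, V^\theta) + \partial_\phi (r^2 \sin\theta \, V^\phi) \right] $$
$$ \;\; = \partial_r V^r + \partial_\theta V^\theta + \partial_\phi V^\phi + \frac{2}{r} V^r + \cot\theta \, V^\theta . \qquad (4.53) $$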
This thus produces the correct result on the nose and with very little effort.
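As a numerical cross-check of (4.49) in spherical coordinates, here is a minimal sketch that is not part of the original notes. It assumes coordinate-basis components $(V^r, V^\theta, V^\phi)$, approximates the derivatives by central differences, and uses two test fields whose Cartesian divergence is known: the radial field $(x, y, z)$ with divergence 3, and the constant field $\vec{e}_z$ with divergence 0. The function names and the sample point are illustrative choices.

```python
import math


def sqrt_g(r, th, ph):
    """sqrt(g) = r^2 sin(theta) for spherical coordinates on flat R^3."""
    return r * r * math.sin(th)


def covariant_divergence(V, x, h=1e-5):
    """(1/sqrt g) * sum_mu d_mu( sqrt g * V^mu ), via central differences.

    V maps a point (r, theta, phi) to the coordinate-basis components
    (V^r, V^theta, V^phi); x is the point at which to evaluate.
    """
    total = 0.0
    for mu in range(3):
        xp, xm = list(x), list(x)
        xp[mu] += h
        xm[mu] -= h
        total += (sqrt_g(*xp) * V(*xp)[mu] - sqrt_g(*xm) * V(*xm)[mu]) / (2.0 * h)
    return total / sqrt_g(*x)


def radial(r, th, ph):
    """Cartesian field (x, y, z): V^r = r, other components zero."""
    return (r, 0.0, 0.0)


def constant_z(r, th, ph):
    """Constant field e_z: V^r = cos(theta), V^theta = -sin(theta)/r."""
    return (math.cos(th), -math.sin(th) / r, 0.0)


point = (1.3, 0.7, 0.4)
div_radial = covariant_divergence(radial, point)
div_const = covariant_divergence(constant_z, point)
print(div_radial, div_const)
```

For the constant field the naive sum of partial derivatives in (4.51) does not vanish; it is the extra terms in (4.52) that bring the result back to zero.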
4. The Covariant Laplacian of a Scalar
How should the Laplacian be defined? Well, the obvious guess (something that
is covariant and reduces to the ordinary Laplacian for the Minkowski metric) is
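$\Box = g^{\mu\nu} \nabla_\mu \nabla_\nu$, which can alternatively be written as
$$ \Box = g^{\mu\nu} \nabla_\mu \nabla_\nu = \nabla^\mu \nabla_\mu = \nabla_\mu \nabla^\mu = \nabla_\mu g^{\mu\nu} \nabla_\nu \qquad (4.54) $$
etc. Note that, even though the covariant derivative on scalars reduces to the
ordinary partial derivative, so that one can write
$$ \Box \phi = \nabla_\mu g^{\mu\nu} \partial_\nu \phi , \qquad (4.55) $$
it makes no sense to write this as $\nabla_\mu \partial^\mu \phi$: since $\partial_\mu$ does not commute with the
metric in general, the notation $\partial^\mu$ is at best ambiguous as it is not clear whether
this should represent $g^{\mu\nu} \partial_\nu$ or $\partial_\nu g^{\mu\nu}$ or something altogether different. This
ambiguity does not arise for the Minkowski metric, but of course it is present in
general.
A compact yet explicit expression for the Laplacian follows from the expression
for the covariant divergence of a vector:
$$ \Box \phi := g^{\mu\nu} \nabla_\mu \nabla_\nu \phi = \nabla_\mu (g^{\mu\nu} \partial_\nu \phi) = g^{-1/2} \partial_\mu (g^{1/2} g^{\mu\nu} \partial_\nu \phi) . \qquad (4.56) $$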
Again, this formula is also useful (and provides the quickest way of arriving at the
result) if one just wants to write the ordinary flat space Laplacian on $\mathbb{R}^3$ in, say,
polar or cylindrical coordinates.
To illustrate this, let us calculate the Laplacian for the standard metric on $\mathbb{R}^{n+1}$
in polar coordinates. The standard procedure would be to first determine the
coordinate transformation $x^i = x^i(r, \text{angles})$, then calculate $\partial / \partial x^i$, and finally
assemble all the bits and pieces to calculate $\Delta = \sum_i (\partial / \partial x^i)^2$. This is a pain.
To calculate the Laplacian, we do not need to know the coordinate transformation,
all we need is the metric. In polar coordinates, this metric takes the form
$$ ds^2 = dr^2 + r^2 d\Omega_n^2 , \qquad (4.57) $$
where $d\Omega_n^2$ is the standard line-element on the unit n-sphere $S^n$. The determinant
of this metric is $g \sim r^{2n}$ (times a function of the coordinates (angles) on the
sphere). Thus, for n = 1 one has $ds^2 = dr^2 + r^2 d\phi^2$ and therefore
$$ \Delta = r^{-1} \partial_r (r \partial_r) + r^{-2} \partial_\phi^2 = \partial_r^2 + r^{-1} \partial_r + r^{-2} \partial_\phi^2 . \qquad (4.58) $$
In general, denoting the angular part of the Laplacian, i.e. the Laplacian of $S^n$,
by $\Delta_{S^n}$, one finds analogously
$$ \Delta = \partial_r^2 + n r^{-1} \partial_r + r^{-2} \Delta_{S^n} . \qquad (4.59) $$
I hope you agree that this method is superior to the standard procedure.
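The n = 1 case can also be checked numerically. The sketch below is not part of the original notes; it assumes the plane in polar coordinates with $\sqrt{g} = r$, $g^{rr} = 1$, $g^{\phi\phi} = 1/r^2$, evaluates $g^{-1/2} \partial_\mu (g^{1/2} g^{\mu\nu} \partial_\nu f)$ with nested central differences, and tests it on $x^2 - y^2 = r^2 \cos 2\phi$ (harmonic) and $x^2 + y^2 = r^2$ (Laplacian 4). Function names, step size and sample point are illustrative choices.

```python
import math


def laplacian_polar(f, r, phi, h=1e-4):
    """g^{-1/2} d_mu( g^{1/2} g^{mu nu} d_nu f ) for ds^2 = dr^2 + r^2 dphi^2.

    Uses sqrt(g) = r, g^{rr} = 1, g^{phi phi} = 1/r^2, with all derivatives
    approximated by central differences of step h.
    """
    def flux_r(rr):
        # sqrt(g) g^{rr} d_r f = r * d_r f, evaluated at radius rr
        return rr * (f(rr + h, phi) - f(rr - h, phi)) / (2.0 * h)

    def flux_phi(pp):
        # sqrt(g) g^{phi phi} d_phi f = (1/r) * d_phi f, evaluated at angle pp
        return (f(r, pp + h) - f(r, pp - h)) / (2.0 * h) / r

    d_r = (flux_r(r + h) - flux_r(r - h)) / (2.0 * h)
    d_phi = (flux_phi(phi + h) - flux_phi(phi - h)) / (2.0 * h)
    return (d_r + d_phi) / r


def harmonic(r, phi):
    """x^2 - y^2 in polar coordinates; its Laplacian is 0."""
    return r * r * math.cos(2.0 * phi)


def radial_square(r, phi):
    """x^2 + y^2 in polar coordinates; its Laplacian is 4."""
    return r * r


lap_harmonic = laplacian_polar(harmonic, 1.5, 0.3)
lap_radial = laplacian_polar(radial_square, 1.5, 0.3)
print(lap_harmonic, lap_radial)
```

Only the metric enters the computation; the coordinate transformation to Cartesian coordinates is used solely to know what the answers should be.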
Now the second term is an ordinary total derivative and thus, if $V^\mu$ vanishes
sufficiently rapidly at infinity, one has
$$ \int \sqrt{g} \, d^4x \; \nabla_\mu V^\mu = 0 . \qquad (4.61) $$
A somewhat more precise statement of this theorem, including the boundary con-
tributions to the integral, will be given in section 15.3.
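$$ \nabla_\mu T^{\mu\nu\cdots} = \partial_\mu T^{\mu\nu\cdots} + \Gamma^\mu_{\mu\lambda} T^{\lambda\nu\cdots} + \Gamma^\nu_{\mu\lambda} T^{\mu\lambda\cdots} + \ldots
 = g^{-1/2} \partial_\mu (g^{1/2} T^{\mu\nu\cdots}) + \Gamma^\nu_{\mu\lambda} T^{\mu\lambda\cdots} + \ldots . \qquad (4.62) $$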
$$ \delta_V g_{\mu\nu} = V^\lambda \partial_\lambda g_{\mu\nu} + (\partial_\mu V^\lambda) g_{\lambda\nu} + (\partial_\nu V^\lambda) g_{\mu\lambda} . \qquad (4.64) $$
While we saw that this expression could be understood and deduced from the
requirement that the variation of the metric is itself a tensorial object that transforms
like the metric, the tensorial nature of the above expression is far from
manifest. However, it has a very nice and simple expression in terms of covariant
derivatives of $V^\mu$, namely
$$ \delta_V g_{\mu\nu} = \nabla_\mu V_\nu + \nabla_\nu V_\mu \qquad (4.65) $$
We can also obtain this condition as the covariantisation of the statement that in
a particular coordinate system the coefficients of the metric do not depend on one
of these coordinates, say y,
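$$ \partial_y g_{\mu\nu} = 0 , \qquad (4.67) $$
so that the metric is then manifestly invariant under translations in y. In such
a coordinate system adapted to the symmetry at hand, these translations are
generated by $K = \partial_y$, and for a vector of this form (in particular, thus, with
constant coefficients) one has
$$ K = \partial_y \;\Rightarrow\; K^\mu = \delta^\mu{}_y \;\Rightarrow\; \nabla_\mu K^\nu = \Gamma^\nu_{\mu y}
 \;\Rightarrow\; \nabla_\mu K_\nu + \nabla_\nu K_\mu = \partial_y g_{\mu\nu} \qquad (4.68) $$
(where in the last step the basic relation (1.156) was used). Thus we find that the
fact that the metric is y-translation invariant can be characterised covariantly as
the statement that $K = \partial_y$ satisfies
$$ \partial_y g_{\mu\nu} = 0 \;\Leftrightarrow\; \nabla_\mu K_\nu + \nabla_\nu K_\mu = 0 . \qquad (4.69) $$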
This is again the Killing equation (4.66). As this equation is now tensorial it is
valid in any coordinate system, in particular independently of whether or not the
coordinate system is adapted to K in the way described above.
The expressions (4.65) and (4.69) will be rederived (and placed into the general
context of Lie derivatives and Killing vectors) in section 8 - see in particular section
8.5.
You will have noticed that many equations simplify considerably for completely anti-
symmetric tensors. In particular, their curl can be defined in a tensorial way without
reference to any metric. This observation is at the heart of the coordinate indepen-
dent calculus of differential forms. In this context, the curl is known as the exterior
derivative.
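Indeed, it is also straightforward to show directly, i.e. without going through the illogical
loop of introducing the covariant derivative in order to obtain something manifestly
tensorial only to find it disappear again from the final expression, that $\partial_{[\mu} A_{\mu_1 \ldots \mu_p]}$ is
a tensor, i.e. transforms as a tensor under coordinate transformations: what happens
is that the possible obstructions to the tensorial behaviour, namely derivatives of
Jacobians, drop out after anti-symmetrisation because they are really 2nd partial
derivatives of the coordinates, which are symmetric and thus do not survive the
anti-symmetrisation.
To see this completely explicitly, consider a covector $A_\mu(x)$ and a coordinate
transformation $x^\mu = x^\mu(y^{\mu'})$, with Jacobi matrix
$$ J^\mu_{\mu'} = \frac{\partial x^\mu}{\partial y^{\mu'}} . \qquad (4.70) $$
As a covector, $A_\mu$ transforms as $A_{\mu'} = J^\mu_{\mu'} A_\mu$, and therefore its derivative transforms as
(using $\partial_{\mu'} = J^\mu_{\mu'} \partial_\mu$)
$$ A_{\nu'} = J^\nu_{\nu'} A_\nu \;\Rightarrow\; \partial_{\mu'} A_{\nu'} = J^\mu_{\mu'} J^\nu_{\nu'} \partial_\mu A_\nu + (\partial_{\mu'} J^\nu_{\nu'}) A_\nu . \qquad (4.71) $$
Because of
$$ \partial_{\mu'} J^\nu_{\nu'} = \frac{\partial^2 x^\nu}{\partial y^{\mu'} \partial y^{\nu'}} = \partial_{\nu'} J^\nu_{\mu'} , \qquad (4.72) $$
for the anti-symmetrised derivative one finds the tensorial transformation behaviour
$$ \partial_{\mu'} A_{\nu'} - \partial_{\nu'} A_{\mu'} = J^\mu_{\mu'} J^\nu_{\nu'} (\partial_\mu A_\nu - \partial_\nu A_\mu) . \qquad (4.73) $$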
Likewise, Lie derivatives of tensors in general (section 8) are, like the special case of
the Lie derivative of the metric mentioned above - see (4.65) - automatically tensorial
objects (and one can, but need not, make their tensorial nature manifest by writing
these derivatives in terms of covariant derivatives).
4.6 Appendix: A Formula for the Variation of the Determinant
Here is an elementary proof of the identity (4.48), and a useful more general formula
for the variation of the determinant of the metric, namely
$$ \delta g = g \, g^{\mu\nu} \delta g_{\mu\nu} \quad \text{or} \quad g^{-1} \delta g = g^{\mu\nu} \delta g_{\mu\nu} . \qquad (4.74) $$
This proof is based on the standard cofactor or minor expansion of the determinant of
a matrix (an alternative standard proof can, as also outlined below, be based on the
identity det G = exp tr log G and its derivative or variation). The cofactor expansion
formula for the determinant is
$$ g = \sum_\nu (-1)^{\mu+\nu} g_{\mu\nu} |m_{\mu\nu}| , \qquad (4.75) $$
where $|m_{\mu\nu}|$ is the determinant of the minor of $g_{\mu\nu}$, i.e. of the matrix one obtains by
removing the $\mu$th row and $\nu$th column from $g_{\mu\nu}$ (the row index $\mu$ is fixed and not summed over).
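If one instead pairs the entries of a different row $\lambda \neq \mu$ with the same minors, the
result vanishes,
$$ \sum_\nu (-1)^{\mu+\nu} g_{\lambda\nu} |m_{\mu\nu}| = 0 \qquad (\lambda \neq \mu) , \qquad (4.76) $$
since this is, in particular, the determinant of a matrix with $g_{\mu\nu} = g_{\lambda\nu}$, i.e. of a matrix
with two equal rows. Together, these two results can be written as
$$ \sum_\nu (-1)^{\mu+\nu} g_{\lambda\nu} |m_{\mu\nu}| = \delta_{\lambda\mu} \, g . \qquad (4.77) $$
This shows that the coefficients of the inverse metric $g^{\mu\nu}$ are given by
$$ g^{\nu\mu} = (-1)^{\mu+\nu} \frac{|m_{\mu\nu}|}{g} , \qquad (4.78) $$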
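a formula that should also be familiar from linear algebra. Now varying g in (4.75) with
respect to $g_{\mu\nu}$ and noting that, by construction, $m_{\mu\nu}$ does not depend on $g_{\mu\nu}$, one finds
$$ \delta g = \sum_{\mu,\nu} (-1)^{\mu+\nu} \delta g_{\mu\nu} |m_{\mu\nu}| = g \, g^{\nu\mu} \delta g_{\mu\nu} . \qquad (4.79) $$
For a symmetric matrix, in particular for the metric, this reduces to the formula (4.74)
we set out to establish. It also implies
$$ \delta \sqrt{g} = \tfrac{1}{2} \sqrt{g} \, g^{\mu\nu} \delta g_{\mu\nu} , \qquad (4.80) $$
a particularly useful result that we will repeatedly make use of. An equally useful
variant of this equation is an expression for the variation of $\sqrt{g}$ expressed in terms of
the variations $\delta g^{\mu\nu}$ of the components of the inverse metric. As a consequence of
$$ g^{\mu\nu} g_{\nu\lambda} = \delta^\mu{}_\lambda \;\Rightarrow\; \delta g^{\mu\nu} = - g^{\mu\lambda} g^{\nu\rho} \delta g_{\lambda\rho} \qquad (4.81) $$
or
$$ g^{\mu\nu} g_{\mu\nu} = 4 \;\Rightarrow\; (\delta g^{\mu\nu}) g_{\mu\nu} = - g^{\mu\nu} \delta g_{\mu\nu} \qquad (4.82) $$
one has $\delta \sqrt{g} = - \tfrac{1}{2} \sqrt{g} \, g_{\mu\nu} \delta g^{\mu\nu}$.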
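It follows from (4.79) that if the variation is the partial derivative one has
$$ \partial_\lambda g = \frac{\partial g}{\partial g_{\mu\nu}} \partial_\lambda g_{\mu\nu} = g \, g^{\mu\nu} \partial_\lambda g_{\mu\nu} \qquad (4.84) $$
or
$$ g^{-1} \partial_\lambda g = g^{\mu\nu} \partial_\lambda g_{\mu\nu} . \qquad (4.85) $$
The result (4.79) can also be written in matrix form, with G denoting the matrix with
components $(G)_{\mu\nu} = g_{\mu\nu}$, as $\delta \det G = \det G \; \mathrm{tr} (G^{-1} \delta G)$.
In this form, the result can also be derived from variation of the remarkably useful
identity
$$ \det G = e^{\mathrm{tr} \log G} . \qquad (4.90) $$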
This identity, in turn, can be derived in an elementary way for diagonalisable G by
noting that it holds trivially for diagonal matrices, and therefore, by the conjugation
invariance of det and tr, also for diagonalisable matrices (like the metric). [And if
desired, this can in turn be extended to all matrices by topological arguments involving
extensions of continuous functionals from the dense set of diagonalisable matrices to the
space of all matrices . . . ]
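The first-order formula (4.74) is easy to test numerically. The following sketch is not part of the original notes; it assumes a 2 x 2 symmetric matrix with arbitrary illustrative entries standing in for the metric, and compares the exact change of the determinant under a small variation with the prediction $g \, g^{\mu\nu} \delta g_{\mu\nu}$. The helper names are hypothetical.

```python
def det2(G):
    """Determinant of a 2x2 matrix."""
    return G[0][0] * G[1][1] - G[0][1] * G[1][0]


def inverse2(G):
    """Inverse of a 2x2 matrix via the cofactor formula."""
    d = det2(G)
    return [[G[1][1] / d, -G[0][1] / d],
            [-G[1][0] / d, G[0][0] / d]]


# Illustrative symmetric "metric" and a small symmetric variation
# (arbitrary values, an assumption of this sketch).
G = [[2.0, 0.5], [0.5, 3.0]]
dG = [[1e-6, -2e-6], [-2e-6, 3e-6]]

G_new = [[G[i][j] + dG[i][j] for j in range(2)] for i in range(2)]
exact_change = det2(G_new) - det2(G)

# First-order prediction: delta g = g * g^{mu nu} * delta g_{mu nu}
G_inv = inverse2(G)
predicted = det2(G) * sum(G_inv[i][j] * dG[i][j]
                          for i in range(2) for j in range(2))

print(exact_change, predicted)
```

The two numbers differ only at second order in the variation (for a 2 x 2 matrix the difference is exactly the determinant of the variation itself).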
So far, we have defined covariant differentiation for tensors defined everywhere in space
time. Frequently, however, one encounters tensors that are only defined on curves - like
the momentum of a particle which is only defined along its world line. In this section we
will see how to define covariant differentiation along a curve. Thus consider a curve $x^\mu(\tau)$
(where $\tau$ could be, but need not be, proper time) and the tangent vector field $X^\mu(x(\tau)) =
\dot{x}^\mu(\tau)$. Now define the covariant derivative $D_\tau$ along the curve, covariantising $d/d\tau$, by
$$ \frac{d}{d\tau} = \dot{x}^\mu \partial_\mu \;\to\; D_\tau = X^\mu \nabla_\mu = \dot{x}^\mu \nabla_\mu . \qquad (4.91) $$
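Frequently one also uses the (suggestive, but ugly) notation $D/D\tau$ for $D_\tau$.
For a vector field $V^\mu$, for example, one has
$$ D_\tau V^\mu = \dot{x}^\nu \partial_\nu V^\mu + \dot{x}^\nu \Gamma^\mu_{\nu\lambda} V^\lambda
 = \frac{d}{d\tau} V^\mu(x(\tau)) + \Gamma^\mu_{\nu\lambda}(x(\tau)) \, \dot{x}^\nu(\tau) V^\lambda(x(\tau)) . \qquad (4.93) $$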
For this to make sense, V needs to be defined only along the curve and not necessarily
everywhere in space time.
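This notion of covariant derivative along a curve permits us, in particular, to define the
(covariant) acceleration $a^\mu$ of a curve $x^\mu(\tau)$ as the covariant derivative of the velocity
$u^\mu = \dot{x}^\mu$,
$$ a^\mu = D_\tau \dot{x}^\mu = \ddot{x}^\mu + \Gamma^\mu_{\nu\lambda} \dot{x}^\nu \dot{x}^\lambda = u^\nu \nabla_\nu u^\mu . \qquad (4.94) $$
Thus we can characterise (affinely parametrised) geodesics as those curves whose covariant
acceleration is zero,
$$ \text{Geodesics:} \quad a^\mu = u^\nu \nabla_\nu u^\mu = 0 , \qquad (4.95) $$
a reasonable and natural statement regarding the movement of freely falling particles.
If they are not affinely parametrised, as in (2.35), then instead of $u^\nu \nabla_\nu u^\mu = 0$ one has
$$ u^\nu \nabla_\nu u^\mu = \kappa u^\mu \qquad (4.96) $$
for some function $\kappa$ along the curve.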
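As an illustration of (4.94) that is not taken from the original text, consider a curve $(r(\tau), \varphi(\tau))$ in the Euclidean plane in polar coordinates. Assuming the standard Christoffel symbols $\Gamma^r_{\varphi\varphi} = -r$ and $\Gamma^\varphi_{r\varphi} = \Gamma^\varphi_{\varphi r} = 1/r$ of the metric $ds^2 = dr^2 + r^2 d\varphi^2$, the components of the covariant acceleration are:

```latex
a^r = \ddot{r} + \Gamma^{r}_{\varphi\varphi}\,\dot{\varphi}^2
    = \ddot{r} - r\,\dot{\varphi}^2 , \qquad
a^\varphi = \ddot{\varphi} + 2\,\Gamma^{\varphi}_{r\varphi}\,\dot{r}\,\dot{\varphi}
          = \ddot{\varphi} + \frac{2}{r}\,\dot{r}\,\dot{\varphi} .
```

The Christoffel terms are the familiar centripetal and Coriolis-type contributions, and setting $a^r = a^\varphi = 0$ gives the straight lines of the plane written in polar coordinates.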
4.8 Parallel Transport and Geodesics
We now come to the important notion of parallel transport of a tensor along a curve.
Note that, in a general (curved) metric space time, it does not make sense to ask if two
vectors defined at points x and y are parallel to each other or not. However, given a
metric and a curve connecting these two points, one can compare the two by dragging
one along the curve to the other using the covariant derivative.
We say that a tensor T
is parallel transported along the curve x ( ) if
D T
= 0 . (4.97)
1. In a locally inertial coordinate system along the curve, this condition reduces to
$dT^{\cdots}_{\ \cdots}/d\tau = 0$, i.e. to the statement that the tensor does not change along the curve.
Thus the above is indeed an appropriate tensorial generalisation of the intuitive
notion of parallel transport to a general metric space-time.
2. The parallel transport condition is a first order differential equation along the
curve and thus defines $T^{\cdots}_{\ \cdots}(\tau)$ given an initial value $T^{\cdots}_{\ \cdots}(\tau_0)$.
3. Taking $T$ to be the tangent vector $u^\mu = \dot{x}^\mu$ to the curve itself, the condition for
parallel transport becomes
$$ D_\tau u^\mu = 0 \;\Leftrightarrow\; \ddot{x}^\mu + \Gamma^\mu_{\nu\lambda} \dot{x}^\nu \dot{x}^\lambda = 0 , \qquad (4.98) $$
i.e. precisely the geodesic equation. We have already seen that geodesics are
precisely the curves with zero acceleration. We can now equivalently characterise
them by the property that their tangent vectors are parallel transported (do not
change) along the curve. For this reason geodesics are also known as autoparallels.
4. Since the metric is covariantly constant, it is parallel along any curve. Thus, in
particular, if $V^\mu$ is parallel transported, also its length remains constant along the
curve,
$$ D_\tau V^\mu = 0 \;\Rightarrow\; \frac{d}{d\tau} (g_{\mu\nu} V^\mu V^\nu) = D_\tau (g_{\mu\nu} V^\mu V^\nu) = 0 . \qquad (4.99) $$
In particular, we rediscover the fact claimed in (2.23) that the quantity $g_{\mu\nu} \dot{x}^\mu \dot{x}^\nu$
is constant along a geodesic,
$$ D_\tau \dot{x}^\mu = 0 \;\Rightarrow\; \frac{d}{d\tau} (g_{\mu\nu} \dot{x}^\mu \dot{x}^\nu) = 0 . \qquad (4.100) $$
5. Now let $x^\mu(\tau)$ be a geodesic and $V^\mu$ parallel along this geodesic. Then, as one
might intuitively expect, also the angle between $V^\mu$ and the tangent vector to the
curve $u^\mu$ remains constant. This is a consequence of the fact that both the norm
of $V^\mu$ and the norm of $u^\mu$ are constant along the curve and that
$$ \frac{d}{d\tau} (g_{\mu\nu} u^\mu V^\nu) = D_\tau (g_{\mu\nu} u^\mu V^\nu) = g_{\mu\nu} (D_\tau u^\mu) V^\nu + g_{\mu\nu} u^\mu D_\tau V^\nu = 0 . \qquad (4.101) $$
4.9 Example: Parallel Transport on the 2-Sphere
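As usual, the simplest non-trivial example is provided by the 2-sphere with its standard
line element
$$ ds^2 = d\Omega^2 = d\theta^2 + \sin^2\theta\, d\phi^2 , \qquad (4.102) $$
with the non-zero Christoffel symbols (determined e.g. from the geodesic equation, as
in (2.82) - (2.87))
$$ \Gamma^\theta_{\phi\phi} = -\sin\theta\cos\theta , \qquad \Gamma^\phi_{\theta\phi} = \Gamma^\phi_{\phi\theta} = \cot\theta . $$
Consider parallel transport along a circle of constant latitude $\theta = \theta_0$, parametrised by
$\phi$, so that $(\dot\theta, \dot\phi) = (0, 1)$ (a dot denoting $d/d\phi$) and
$$ g_{\mu\nu} \dot{x}^\mu \dot{x}^\nu = \sin^2\theta_0 . \qquad (4.106) $$
The parallel transport condition for a vector $V^\mu$ along this curve reads
$$ \begin{aligned}
0 &= \dot{V}^\mu + \Gamma^\mu_{\nu\lambda} \dot{x}^\nu V^\lambda = \dot{V}^\mu + \Gamma^\mu_{\phi\lambda} V^\lambda \\
  &= \dot{V}^\mu + \Gamma^\mu_{\phi\theta} V^\theta + \Gamma^\mu_{\phi\phi} V^\phi .
\end{aligned} \qquad (4.107) $$
Using the explicit form of the Christoffel symbols, the parallel transport equations are
thus
$$ \begin{aligned}
0 &= \dot{V}^\theta - \sin\theta_0 \cos\theta_0\, V^\phi \\
0 &= \dot{V}^\phi + \cot\theta_0\, V^\theta .
\end{aligned} \qquad (4.108) $$
Differentiating once more, these equations can be decoupled and take the form of harmonic
oscillator equations with frequency $\cos\theta_0$,
$$ (\partial_\phi^2 + \cos^2\theta_0) V^\mu = 0 . \qquad (4.109) $$
Plugging the general solution of these into the 1st order equations to reduce the spurious
4 to 2 integration constants, and relating them to the initial values at $\phi = 0$, say,
$$ V^\mu(\theta_0, \phi = 0) = v^\mu , \qquad (4.111) $$
one finally finds the result
$$ \begin{aligned}
V^\theta(\theta_0, \phi) &= v^\theta \cos(\phi\cos\theta_0) + \sin\theta_0\, v^\phi \sin(\phi\cos\theta_0) \\
V^\phi(\theta_0, \phi) &= v^\phi \cos(\phi\cos\theta_0) - (\sin\theta_0)^{-1} v^\theta \sin(\phi\cos\theta_0) .
\end{aligned} \qquad (4.112) $$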
Remarks:
1. In the special case of parallel transport along the equator $\theta_0 = \pi/2$, one has
$\cos\theta_0 = 0$, and therefore
$$ \theta_0 = \pi/2 \;\Rightarrow\; V^\mu(\pi/2, \phi) = v^\mu . \qquad (4.113) $$
In other words, the components are constant under parallel transport along the
equator. This is intuitively obvious on the basis of spherical symmetry. Since
among the family of constant $\theta = \theta_0$ curves only the equator is a geodesic (great
circle), this is also in agreement with the general results obtained above, which
imply that upon parallel transport along the equator the angle between the vector
and the equator remains constant. In 2 dimensions, this condition, together with
the fact that the length of a vector remains invariant under parallel transport
in general, is sufficient to imply that the parallel transported components are
constant along the path.
2. While the above is not unexpected, perhaps the most interesting consequence
of the above result (4.112) is that, in general, not only are the components not
constant but that actually, after having completed the $2\pi$-circuit along the path to
return to the starting point, the parallel transported vector will not agree with the
initial vector. Indeed, the components at $\phi = 2\pi$ are related to the components
$v^\mu$ at $\phi = 0$ by (4.112) evaluated at $\phi = 2\pi$.
3. As we will see in section 10.1, this fact that parallel transport along closed paths is
non-trivial (equivalently that parallel transport from one point to another depends
on the path) can be directly attributed to (and is the smoking gun of) the presence
of curvature.
4. If desired, the result can be written in terms of proper distance $s$ along the circle,
rather than the angle $\phi$, by the substitution $\phi = s/\sin\theta_0$.
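5. The result (4.112) takes on a more transparent form when written in terms of
the components of $V$ and $v$ with respect to an orthonormal basis (section 3.8) $E_{\hat\mu}$
rather than the coordinate basis $\partial_\mu$. Such an orthonormal basis is provided by
$$ E_{\hat\theta} = \partial_\theta , \qquad E_{\hat\phi} = (\sin\theta)^{-1} \partial_\phi , \qquad (4.116) $$
$$ g_{\mu\nu} E^\mu_{\hat\theta} E^\nu_{\hat\theta} = g_{\mu\nu} E^\mu_{\hat\phi} E^\nu_{\hat\phi} = 1 , \qquad g_{\mu\nu} E^\mu_{\hat\theta} E^\nu_{\hat\phi} = 0 . \qquad (4.117) $$
The components with respect to this orthonormal basis are related to the coordinate
components by
$$ V = V^\mu \partial_\mu = V^{\hat\mu} E_{\hat\mu} \;\Rightarrow\; V^{\hat\theta} = V^\theta , \qquad V^{\hat\phi} = \sin\theta\, V^\phi \qquad (4.118) $$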
is known as the deficit angle or holonomy of the parallel transport along the given
loop. With this terminology we can say that the holonomy along the equator is
trivial.
7. At the other extreme, we see that there is a non-trivial holonomy as $\theta_0 \rightarrow 0$, i.e. for
parallel transport along an infinitesimal loop around the north pole, along which
the parallel transported vector performs a complete $2\pi$-rotation (the rotation angle
after one circuit $\phi = 2\pi$ is $2\pi$). As
shown in section 10.1, parallel transport along infinitesimal loops at or around a
point provides a precise measure of the curvature at that point.
8. Curiously, as shown by Rothman, Ellis and Murugan, the holonomy along circular
equatorial orbits in the Schwarzschild geometry (such orbits are geodesics at the
critical points of the effective potential for geodesic motion, to be discussed in sec-
tion 24.3), is non-trivial, even though again intuitive reasoning based on spherical
symmetry might have led one to expect a trivial result (and would thus have led
one astray).11
11 T. Rothman, G. Ellis, J. Murugan, Holonomy in the Schwarzschild-Droste Geometry,
arXiv:gr-qc/0008070.
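The parallel transport equations (4.108) along a circle of constant latitude are easy to integrate numerically. The following is a minimal sketch, assuming a hand-written fourth-order Runge-Kutta integrator; the names `transport`, `closed_form` and `norm_squared` are hypothetical, and the closed-form expression used for comparison is simply the solution of (4.108) with initial values $v^\mu$ at $\phi = 0$.

```python
import math

def transport(theta0, v, phi_end=2 * math.pi, steps=2000):
    """RK4 integration of the parallel transport equations (4.108):
    dV^theta/dphi = sin(theta0) cos(theta0) V^phi,
    dV^phi/dphi   = -cot(theta0) V^theta.
    """
    s, c = math.sin(theta0), math.cos(theta0)

    def rhs(V):
        return (s * c * V[1], -(c / s) * V[0])

    def shifted(V, k, f):
        return (V[0] + f * k[0], V[1] + f * k[1])

    h = phi_end / steps
    V = (float(v[0]), float(v[1]))
    for _ in range(steps):
        k1 = rhs(V)
        k2 = rhs(shifted(V, k1, h / 2))
        k3 = rhs(shifted(V, k2, h / 2))
        k4 = rhs(shifted(V, k3, h))
        V = (V[0] + h / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]),
             V[1] + h / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]))
    return V

def closed_form(theta0, v, phi):
    """Solution of (4.108) with V(phi = 0) = v (coordinate components)."""
    s, alpha = math.sin(theta0), phi * math.cos(theta0)
    return (v[0] * math.cos(alpha) + s * v[1] * math.sin(alpha),
            v[1] * math.cos(alpha) - v[0] * math.sin(alpha) / s)

def norm_squared(theta0, V):
    """g_{mu nu} V^mu V^nu on the circle theta = theta0 of the unit sphere."""
    return V[0] ** 2 + math.sin(theta0) ** 2 * V[1] ** 2

v0 = (1.0, 0.5)                      # hypothetical initial components (v^theta, v^phi)
print(transport(math.pi / 2, v0))    # equator: components unchanged
print(transport(math.pi / 3, v0))    # latitude pi/3: vector comes back rotated
```

At $\theta_0 = \pi/3$ one has $2\pi\cos\theta_0 = \pi$, so after one circuit the vector returns with both components reversed, while its norm is unchanged; along the equator the components stay constant.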
4.10 Fermi-Walker Parallel Transport
The properties of parallel transport established in section 4.8 show that this is a natural
prescription for transporting tensorial objects along a geodesic. However, it is important
to keep in mind that this is just one possible description, obtained by imposing the
differential equation (4.97), e.g. for a vector
$$ D_\tau V^\mu = 0 . \qquad (4.122) $$
Along a curve that is not a geodesic, i.e. a curve with non-zero acceleration
$$ a^\mu = D_\tau u^\mu = \dot{x}^\nu \nabla_\nu \dot{x}^\mu \neq 0 , \qquad (4.123) $$
however, this prescription has some shortcomings. For example, parallel transport of a
tangent vector to the curve at a point to another point on the curve will not give rise to
the tangent vector at the second point, simply because $D_\tau V^\mu = 0$ with initial condition
$V^\mu(\tau_0) = u^\mu(\tau_0)$, say (parallel transport), is evidently not the same as $D_\tau u^\mu = a^\mu$ (the
equation satisfied by the tangent vector). Likewise, the scalar product between the
tangent vector to the (non-geodesic) curve and some parallel-transported vector along
it will not remain constant in general,
$$ D_\tau u^\mu = a^\mu , \quad D_\tau V^\mu = 0 \;\Rightarrow\; \frac{d}{d\tau} (g_{\mu\nu} u^\mu V^\nu) = a_\mu V^\mu . \qquad (4.124) $$
A vivid illustration of this is provided by the example of the previous section:
The latter procedure appears to be much more natural in this case than rotating one's
basis as one goes around the sphere. Analogously, for an observer along a timelike curve
it would be desirable to be able to set up once and for all a local reference system on
the worldline, consisting of the (unit) tangent vector $E_0 = u$ in the time-direction,
and three mutually orthonormal vectors $E_k$, orthogonal to $u$, in the spatial directions (the
laboratory system of the observer), regardless of whether the observer is in free fall or
not (indeed, most laboratories are not . . . ).
This procedure can be formalised by replacing the parallel transport condition (4.122)
along a timelike curve by the Fermi-Walker Transport prescription
$$ D^F_\tau V^\mu \equiv D_\tau V^\mu + F^\mu_{\ \nu} V^\nu = 0 , \qquad (4.125) $$
with
$$ F^{\mu\nu} = a^\mu u^\nu - u^\mu a^\nu . \qquad (4.126) $$
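Indeed, parallel transport according to this prescription has the following desirable
features:
1. Along a geodesic it reduces to ordinary parallel transport, since
$$ a^\mu = u^\nu \nabla_\nu u^\mu \propto u^\mu \;\Rightarrow\; F^{\mu\nu} = 0 . \qquad (4.127) $$
2. The tangent vector $u^\mu$ to the curve is Fermi-Walker transported,
$$ D^F_\tau u^\mu = 0 . \qquad (4.128) $$
Proof:
$$ \begin{aligned}
D^F_\tau u^\mu &= D_\tau u^\mu + F^\mu_{\ \nu} u^\nu \\
 &= a^\mu + (a^\mu u_\nu - u^\mu a_\nu) u^\nu = a^\mu - a^\mu = 0
\end{aligned} \qquad (4.129) $$
because $u^\nu u_\nu = -1$ and $a_\nu u^\nu = 0$. Thus the solution to the Fermi-Walker transport
prescription for $V^\mu(\tau_0) = u^\mu(\tau_0)$ is just the tangent vector $u^\mu$,
$$ D^F_\tau V^\mu = 0 , \quad V^\mu(\tau_0) = u^\mu(\tau_0) \;\Rightarrow\; V^\mu(\tau) = u^\mu(\tau) . \qquad (4.130) $$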
Remarks:
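1. The signs chosen here are appropriate for timelike curves with $u^\mu u_\mu = -1$. As the
proofs of the above statements show, in the spacelike case one needs to replace
$F^{\mu\nu} \rightarrow -F^{\mu\nu}$.
3. Note that the properties 2-4 in the above list rely on the 3 properties
$$ F^\mu_{\ \nu} u^\nu = -a^\mu , \qquad u_\mu F^\mu_{\ \nu} = a_\nu , \qquad F_{\mu\nu} + F_{\nu\mu} = 0 \qquad (4.139) $$
of $F^{\mu\nu}$, and that these are preserved under a shift
$$ F^{\mu\nu} \rightarrow F^{\mu\nu} + \Omega^{\mu\nu} \qquad (4.140) $$
with
$$ u_\mu \Omega^{\mu\nu} = \Omega^{\mu\nu} u_\nu = 0 , \qquad \Omega^{\mu\nu} + \Omega^{\nu\mu} = 0 . \qquad (4.141) $$
Since there is no such rotation term in the prescription for Fermi-Walker transport,
and no natural candidate for it either with only $u^\mu$ and $a^\mu$ at one's disposal, it
is natural to think of Fermi-Walker transport as a prescription for transporting
objects in a non-rotating way.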
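A simple way to see the difference between parallel transport and Fermi-Walker transport is to look at a uniformly accelerated observer in 1+1 dimensional Minkowski space, where in inertial coordinates $D_\tau = d/d\tau$. The sketch below is an illustration under assumptions, not part of the original notes: the worldline $u^\mu = (\cosh g\tau, \sinh g\tau)$, the value `G_ACC` and all function names are hypothetical choices, and the integrator is a plain RK4 scheme.

```python
import math

G_ACC = 1.0  # hypothetical proper acceleration of the observer

def dot(a, b):
    """Minkowski scalar product in 1+1 dimensions, eta = diag(-1, +1)."""
    return -a[0] * b[0] + a[1] * b[1]

def velocity(tau):
    """Unit tangent u^mu of the hyperbolic worldline, u.u = -1."""
    return (math.cosh(G_ACC * tau), math.sinh(G_ACC * tau))

def acceleration(tau):
    """a^mu = du^mu/dtau in inertial coordinates, a.u = 0."""
    return (G_ACC * math.sinh(G_ACC * tau), G_ACC * math.cosh(G_ACC * tau))

def fw_rhs(tau, V):
    """dV/dtau = -F^mu_nu V^nu with F^{mu nu} = a^mu u^nu - u^mu a^nu."""
    u, a = velocity(tau), acceleration(tau)
    uV, aV = dot(u, V), dot(a, V)
    return (-a[0] * uV + u[0] * aV, -a[1] * uV + u[1] * aV)

def fermi_walker(V0, tau_end, steps=1000):
    """Integrate the Fermi-Walker transport equation (4.125) with RK4."""
    h = tau_end / steps
    tau, V = 0.0, (float(V0[0]), float(V0[1]))
    for _ in range(steps):
        k1 = fw_rhs(tau, V)
        k2 = fw_rhs(tau + h / 2, (V[0] + h / 2 * k1[0], V[1] + h / 2 * k1[1]))
        k3 = fw_rhs(tau + h / 2, (V[0] + h / 2 * k2[0], V[1] + h / 2 * k2[1]))
        k4 = fw_rhs(tau + h, (V[0] + h * k3[0], V[1] + h * k3[1]))
        V = (V[0] + h / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]),
             V[1] + h / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]))
        tau += h
    return V

V_fw = fermi_walker(velocity(0.0), 1.0)  # Fermi-Walker transported tangent vector
V_pt = velocity(0.0)                     # parallel transported: constant components
print(V_fw, velocity(1.0), V_pt)
```

The Fermi-Walker transported vector tracks the tangent vector $u^\mu(\tau)$, as in (4.130), whereas the parallel transported vector keeps its initial components and is no longer tangent to the worldline.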
4.11 Epilogue: Manifolds? Think Globally, Act Locally!
In section 3.9 I had already briefly discussed some issues regarding the use of indices (and
thus in some sense of local coordinates), and had advocated them as a useful bookkeeping
device that also provides a transparent way of performing algebraic operations (tensor
algebra). In the meantime we have seen that this extends to tensor analysis, and I can
only reiterate that for most purposes and in most cases it is much more convenient to
perform calculations in this notation than in some supposedly more elegant index-free
notation.
There is one issue, however, that is worth commenting upon, and that in the end actually
provides further justification for being allowed to adopt this procedure. Namely, in using
local (Cartesian, say) coordinates $x^\mu$ to describe a space or space-time (I will use "space"
in the following) one is implicitly assuming the following 3 things:
1. first of all, that one can always locally introduce Cartesian coordinates on that
space (so as to then be able to perform tensor algebra, tensor analysis etc.);
2. secondly, that different choices of local coordinates will give compatible descrip-
tions of that space;
3. and finally, that in principle one can obtain complete information about the space
by covering it with such local coordinate systems.
When these assumptions are satisfied, then one is justified in using local coordinates to
describe such a space. The point of this brief section is just to point out that (modulo
some topological fine-points) these conditions amount precisely to the definition of a
(differentiable or smooth) manifold in mathematics.
Thus while I could have started off these notes with an introduction to and definition
of smooth manifolds (and numerous textbooks do), for all local intents and purposes
this is then really equivalent to (consistently) working in local coordinates, as we have
done and will continue to do. It is true that the notion of manifolds, of vector bundles
on them etc. becomes indispensable for certain more advanced questions dealing with
the global structure of a space-time, or theorems about the existence and uniqueness of
solutions to differential equations on some manifold, say, but these are not topics that
will be addressed in these notes.
The usual textbook definition of a manifold consists essentially of the following steps:13
13 This presentation is adapted from the concise and clear description in S. Mukhi, N. Mukunda,
1. Topological Spaces
A topological space is a set $S$ together with a collection of subsets $U$ of $S$ (called
open sets) which includes $S$ and the empty set, and which is closed under union
and finite intersection. This set of open sets defines the topology of the space
and a corresponding notion of continuous maps (the inverse image of any open
set is open) and homeomorphisms (bijective maps $\phi$ such that both $\phi$ and $\phi^{-1}$
are continuous) between topological spaces. In particular there is a notion of
continuity for (real-valued, say) functions
$$ f: S \rightarrow \mathbb{R} \qquad (4.142) $$
2. Charts
However, in this context there is no notion of differentiability or differentiation.
In order to have such things at one's disposal one needs topological spaces that
locally look like $\mathbb{R}^n$. The essential building blocks of such a topological space
are charts:
A chart $C$ on a topological space $S$ is the pair $C = (U, \phi)$ where $U \subset S$ is an open
set of $S$ and $\phi$ is a homeomorphism
$$ \phi: U \subset S \rightarrow \phi(U) \subset \mathbb{R}^n . \qquad (4.143) $$
3. Topological Manifolds
A topological manifold is a topological space $M$ that is locally homeomorphic to
$\mathbb{R}^n$ in the sense that for each point $p$ there is a chart $C = (U, \phi)$ with $p \in U$
(and that satisfies some further topological regularity conditions we are not interested
in, such as Hausdorff and usually either second countable or paracompact).
Equivalently, a topological space has the structure of a topological manifold when
it possesses a covering by open sets $U_a$ with charts $C_a = (U_a, \phi_a)$.
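If one has two charts on $M$, $C_1 = (U_1, \phi_1)$ and $C_2 = (U_2, \phi_2)$, and $U_1 \cap U_2 \neq \emptyset$,
then the transition functions
$$ \begin{aligned}
\phi_1 \circ \phi_2^{-1} &: \phi_2(U_1 \cap U_2) \rightarrow \phi_1(U_1 \cap U_2) \\
\phi_2 \circ \phi_1^{-1} &: \phi_1(U_1 \cap U_2) \rightarrow \phi_2(U_1 \cap U_2)
\end{aligned} \qquad (4.144) $$
$$ f: M \rightarrow \mathbb{R} \qquad (4.146) $$
$$ f_U = f \circ \phi^{-1}: \phi(U) \subset \mathbb{R}^n \rightarrow \mathbb{R} , \qquad (4.147) $$
i.e.
$$ p \in U \;\Rightarrow\; f(p) = f_U(\vec{x}_p) . \qquad (4.148) $$
For such functions on $\mathbb{R}^n$ we now not only have a notion of continuity at our
disposal, but also the notions of differentiability, smoothness, differentiation etc.
On the intersection of 2 charts we can represent the function $f$ in 2 different ways
in terms of local coordinates, namely by the functions $f_{U_a} \equiv f_a$ for $a = 1, 2$,
$$ \text{on } U_1 \cap U_2: \quad f = f_1 \circ \phi_1 = f_2 \circ \phi_2 \;\Rightarrow\;
\begin{aligned}
f_2 &= f_1 \circ (\phi_1 \circ \phi_2^{-1}) \\
f_1 &= f_2 \circ (\phi_2 \circ \phi_1^{-1})
\end{aligned} \qquad (4.149) $$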
This is just the change of variables formula for a function (scalar), namely
6. Compatibility of Charts
In order to be able to extend the notion of smoothness ($C^\infty$-differentiability),
say, of a function from a local chart consistently to all of $M$, we need to impose
compatibility conditions on intersecting charts.
It is evident from (4.149) that the notion of smoothness of a function around a
point $p$ will only be independent of the chart if the transition functions $\phi_1 \circ \phi_2^{-1}$
and $\phi_2 \circ \phi_1^{-1}$ (i.e. the coordinate transformations) are also smooth. Thus we define
2 charts to be smoothly compatible if either $U_1 \cap U_2$ is empty or, otherwise, if these
maps are smooth.
Note that for topological manifolds and the condition of continuity any 2 charts
are automatically compatible since the transition functions are continuous.
7. Smooth Atlas and Compatibility and Equivalence of Atlases
A smooth atlas $A(M)$ of $M$ is now naturally a family of charts $C_a = (U_a, \phi_a)$
which cover $M$ and such that all charts are mutually smoothly compatible.
2 smooth atlases $A_1(M)$ and $A_2(M)$ for the same topological manifold $M$ are said
to be compatible with each other if all the charts of $A_1$ are compatible with all
the charts of $A_2$. This defines an equivalence relation on atlases.
$$ \Phi_{ab} = \psi_b \circ \Phi \circ \phi_a^{-1}: \phi_a(U_a) \subset \mathbb{R}^m \rightarrow \psi_b(V_b) \subset \mathbb{R}^n \qquad (4.155) $$
Analogously one can define $C^k$-differentiable manifolds (transition functions are required
to be of class $C^k$), real analytic manifolds (transition functions are required to be real
analytic), complex manifolds (modelled on open subsets of $\mathbb{C}^n$, with holomorphic
transition functions), etc., as well as submanifolds (modelled on subspaces of $\mathbb{R}^n$), manifolds
with boundary (modelled on the half-space $\mathbb{R}^n_+$) etc.
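To make the notions of chart, transition function and smooth compatibility concrete, here is a small sketch for the circle $S^1 \subset \mathbb{R}^2$ covered by two stereographic charts. This example and all function names in it are hypothetical illustrations chosen for this purpose. On the overlap the transition function comes out as $s \mapsto 1/s$, which is smooth for $s \neq 0$, so the two charts are smoothly compatible.

```python
def phi_north(p):
    """Stereographic chart on S^1 minus the north pole (0, 1)."""
    x, y = p
    return x / (1.0 - y)

def phi_south(p):
    """Stereographic chart on S^1 minus the south pole (0, -1)."""
    x, y = p
    return x / (1.0 + y)

def phi_north_inverse(s):
    """Inverse of phi_north: maps the coordinate s back to a point of S^1."""
    return (2.0 * s / (s * s + 1.0), (s * s - 1.0) / (s * s + 1.0))

def transition_north_to_south(s):
    """phi_south o phi_north^{-1}, defined on the overlap, i.e. for s != 0."""
    return phi_south(phi_north_inverse(s))

for s in (0.5, -2.0, 3.0):
    x, y = phi_north_inverse(s)
    # the point lies on the circle, and the transition function equals 1/s
    print(s, x * x + y * y, transition_north_to_south(s), 1.0 / s)
```

A function on $S^1$ that is smooth in one of these charts is then automatically smooth in the other on the overlap, which is the content of the compatibility condition.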
5 Physics in a Gravitational Field and Minimal Coupling
Recall that the Principle of General Covariance (section 3.1) says that, by virtue of
the Einstein Equivalence Principle, a generally covariant equation will be valid in an
arbitrary gravitational field provided that it is valid in Minkowski space in inertial
coordinates (i.e. in the absence of gravity and/or acceleration).
We now have all the tools at our disposal to construct such equations. In particular, the
fact that the covariant derivative maps tensors to tensors and reduces to the ordinary
partial derivative in a locally inertial coordinate system suggests the following procedure
or algorithm for obtaining equations that satisfy the Principle of General Covariance:
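$$ \xi^a \mapsto x^\mu . \qquad (5.1) $$
6. In particular, for the proper-time derivative along a curve this entails replacing
$d/d\tau$ by $D_\tau$,
$$ \frac{d}{d\tau} \mapsto D_\tau = \dot{x}^\mu \nabla_\mu . \qquad (5.5) $$
7. Wherever an integral $\int d^4\xi$ appears, replace it by $\int \sqrt{g}\, d^4x$,
$$ \int d^4\xi \mapsto \int \sqrt{g}\, d^4x . \qquad (5.6) $$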
By construction, the resulting equations or expressions are tensorial (generally covari-
ant) and true in the absence of gravity and hence satisfy the conditions for the Principle
of General Covariance to apply. As a consequence they will be true in the presence of
gravitational fields, at least on scales small compared to those of the gravitational fields.
This procedure can thus be regarded as providing us with a description how to couple
matter (particles, fields) to the gravitational field.
Remarks:
2. The reason for the "at least on small scales" caveat in the paragraph above is
that if one considers higher derivatives of the metric tensor then there are other
equations that one can write down, involving e.g. the curvature tensor, that are
tensorial but reduce to the same equations in the absence of gravity.
We can see the power of the formalism we have developed so far by rederiving the laws
of particle mechanics in a general gravitational field. In Special Relativity (SR), the
motion of a free particle with mass m is governed by the equation
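$$ \text{SR:} \quad a^a = \frac{du^a}{d\tau} = 0 , \qquad (5.7) $$
where $u^a = d\xi^a/d\tau$ is the 4-velocity and $a^a$ the 4-acceleration. Thus, using the principle
of minimal coupling, the equation of motion of a free particle in a general gravitational
field is
$$ \text{GR:} \quad a^\mu = D_\tau u^\mu = 0 \;\Leftrightarrow\; \ddot{x}^\mu + \Gamma^\mu_{\nu\lambda} \dot{x}^\nu \dot{x}^\lambda = 0 , \qquad (5.8) $$
i.e. the geodesic equation.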
We could also have arrived at this equation for a free particle in a gravitational field by
applying the minimal coupling description not at the level of the equations of motion
but rather (and perhaps conceptually more satisfactorily) at the level of the action, i.e.
by replacing
$$ S = -m \int d\tau = -m \int \sqrt{-\eta_{ab}\, d\xi^a d\xi^b} \;\rightarrow\; -m \int d\tau = -m \int \sqrt{-g_{\mu\nu}\, dx^\mu dx^\nu} , \qquad (5.9) $$
and this is exactly what we already did back in section 1.7 where we showed that this
also leads to the geodesic equation (5.8).
Here is where the formalism we have developed really pays off. We will see once again
that, using the minimal coupling rule, we can immediately rewrite the equations for a
scalar field (here) and the Maxwell equations (in section 5.6 below) in a form in which
they are valid in an arbitrary gravitational field.
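1. The action for a (real) free massive scalar field $\phi$ in Special Relativity is
$$ \text{SR:} \quad S[\phi] = \int d^4\xi \left[ -\tfrac{1}{2} \eta^{ab} \partial_a \phi\, \partial_b \phi - \tfrac{1}{2} m^2 \phi^2 \right] . \qquad (5.10) $$
To covariantise this, we replace $d^4\xi \rightarrow \sqrt{g}\, d^4x$, $\eta^{ab} \rightarrow g^{\mu\nu}$, and we can replace $\partial_a \phi$
by $\partial_\mu \phi$ or $\nabla_\mu \phi$ (since this makes no difference on scalars). Therefore, the covariant
action in a general gravitational field is
$$ \text{GR:} \quad S[\phi, g_{\mu\nu}] = \int \sqrt{g}\, d^4x \left[ -\tfrac{1}{2} g^{\mu\nu} \partial_\mu \phi\, \partial_\nu \phi - \tfrac{1}{2} m^2 \phi^2 \right] . \qquad (5.11) $$
Here I have also indicated the dependence of the action on the metric $g_{\mu\nu}$. This
is not (yet) a dynamical field, though, just the gravitational background field.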
Remarks:
(a) A comment on how to derive this: if one thinks of the $\partial_\mu$ in the action as
covariant derivatives, $\nabla_\mu$, then the calculation is identical to that in
Minkowski space provided that one remembers that the metric is covariantly constant,
$\nabla_\lambda g_{\mu\nu} = 0$. If one sticks
with the ordinary partial derivatives, then upon the usual integration by
parts one picks up a term $\partial_\mu (\sqrt{g}\, g^{\mu\nu} \partial_\nu \phi)$ which then evidently leads to the
Laplacian in the form (4.56).
(b) If the relative sign of $\Box$ (or $\Box_g$) and $m^2$ in the Klein-Gordon equation looks
unfamiliar to you, then this is probably due to the fact that in a course where
you first encountered the Klein-Gordon equation the opposite (particle physicists')
sign convention for the Minkowski metric was used, with its negative
definite spatial metric.
(c) All of this generalises in a straightforward way to (self-)interacting scalar
fields, described by a potential $V(\phi)$. In particular, the action is
$$ S[\phi, g_{\mu\nu}] = \int \sqrt{g}\, d^4x \left[ -\tfrac{1}{2} g^{\mu\nu} \partial_\mu \phi\, \partial_\nu \phi - V(\phi) \right] . \qquad (5.13) $$
Logically the next thing to discuss would be the energy-momentum tensor, e.g. the
minimally coupled counterpart of the special relativistic (Noether) energy-momentum
tensor
$$ \text{SR:} \quad T_{ab} = \partial_a \phi\, \partial_b \phi + \eta_{ab} L \qquad (5.14) $$
and its properties. However, it turns out that there is more to say about this than meets
the eye, and we will therefore return to this issue in more detail in section 6.
Before turning to our next example, I want to briefly comment on the issue of general
covariance in Minkowski space, as this tends to generate quite a bit of confusion and
unnecessary debates. I will discuss this issue in the context of the above example of a
scalar field, but the discussion is valid more generally.
On the one hand, the action (5.10) is generally considered to be invariant (only) under
Lorentz or Poincaré transformations, while by construction the action (5.11) is invariant
under arbitrary coordinate transformations. Does this really mean that the theory of a
scalar field in a non-trivial gravitational background has more invariances than that in
a Minkowski background?
On the other hand, certainly nothing prevents one from using e.g. spherical (and thus in
particular non-inertial) coordinates in Minkowski space to write down the Klein-Gordon
equation or action. But does this mean that the action (5.10) is actually (secretly)
invariant also under such non-Lorentz transformations?
Well, that depends . . . While this sounds like (and generally is correctly considered to
be) a somewhat unsatisfactory answer, I can be more specific:
it depends on what one means by invariance (or covariance).
From the current point of view, the natural answer is that the action (5.11) is generally
covariant in any gravitational field, in particular therefore also in the absence of a
true gravitational field, i.e. in a purely fictitious gravitational field or, equivalently, in
Minkowski space. If we specialise the action (5.11) to such a gravitational field, i.e. to
the Minkowski metric written in some perhaps non-inertial coordinates, we get
$$ S[\phi, \eta_{\mu\nu}] = \int \sqrt{\eta}\, d^4x \left[ -\tfrac{1}{2} \eta^{\mu\nu} \partial_\mu \phi\, \partial_\nu \phi - \tfrac{1}{2} m^2 \phi^2 \right] . \qquad (5.15) $$
Here it is important to keep in mind that $\eta_{\mu\nu}$ refers to the components of the Minkowski
metric in the not-necessarily inertial coordinates $x^\mu$, as in
$$ \eta_{\mu\nu} = \eta_{ab} \frac{\partial \xi^a}{\partial x^\mu} \frac{\partial \xi^b}{\partial x^\nu} . \qquad (5.16) $$
As a consequence, also $\sqrt{\eta}$ is not necessarily equal to 1. This action is invariant under
arbitrary coordinate transformations, provided that one transforms the fields and the
metric appropriately.
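As a worked illustration (not taken from the notes), consider Minkowski space in spherical coordinates $x^\mu = (t, r, \theta, \varphi)$ with $c = 1$. The angle is written $\varphi$ here only to keep it distinct from the scalar field $\phi$; the signature is taken to be $(-+++)$ and the sign conventions are those of the free scalar field action with kinetic term $-\tfrac{1}{2}(\partial\phi)^2$.

```latex
% Minkowski metric in spherical coordinates (assumption: c = 1, signature -+++)
\eta_{\mu\nu}\, dx^\mu dx^\nu = -dt^2 + dr^2 + r^2 d\theta^2 + r^2 \sin^2\theta\, d\varphi^2 ,
\qquad \sqrt{\eta} = r^2 \sin\theta

% The scalar field action in these coordinates
S[\phi, \eta_{\mu\nu}] = \int r^2 \sin\theta \; dt\, dr\, d\theta\, d\varphi
\left[ \tfrac{1}{2} (\partial_t \phi)^2 - \tfrac{1}{2} (\partial_r \phi)^2
     - \frac{1}{2 r^2} (\partial_\theta \phi)^2
     - \frac{1}{2 r^2 \sin^2\theta} (\partial_\varphi \phi)^2
     - \tfrac{1}{2} m^2 \phi^2 \right]
```

The measure factor $\sqrt{\eta} = r^2 \sin\theta$ is precisely the familiar volume element of spherical coordinates, so nothing beyond a change of variables has happened, even though the action is now written in generally covariant form.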
1. If one looks for the transformations of the coordinates $\xi^a$ and the fields $\phi$ that
leave the action invariant (with fixed metric components $\eta_{ab}$) then none too surprisingly
one finds that the action is invariant under Poincaré transformations of
the coordinates provided that the scalar fields transform as scalars, but not under
more general transformations.
2. If one looks for the transformations of the coordinates $\xi^a$ and the fields $\phi$ and
the metric $\eta_{ab}$ that leave the action invariant, then one finds that the action is
invariant under arbitrary coordinate transformations.
Sometimes option (1) is taken to define the invariance group (Poincaré transformations)
while option (2) refers to the covariance group. In this sense, special relativity is invariant
under Poincaré transformations but is at the same time generally covariant. In
philosophy of science or epistemological terms whether one has option (1) or option (2)
is related to the question whether or not the Minkowski metric is regarded as an absolute
element of the theory. With $\eta_{ab}$ promoted to an absolute element, general covariance
is reduced to Poincaré invariance (those transformations that, from the generally covariant
"$\eta_{ab}$ transforms" point of view, leave $\eta_{ab}$ invariant).14 Unfruitful discussions
ensue when tacitly conflicting assumptions are made about what are considered to be
the absolute elements of a theory.
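$$ \vec{\nabla} \cdot \vec{B} = 0 , \qquad \vec{\nabla} \times \vec{E} + \partial_t \vec{B} = 0 \qquad (5.17) $$
$$ \vec{\nabla} \cdot \vec{E} = \rho/\epsilon_0 , \qquad \vec{\nabla} \times \vec{B} - \frac{1}{c^2} \partial_t \vec{E} = \mu_0 \vec{J} \qquad (5.18) $$
$$ \partial_t \rho + \vec{\nabla} \cdot \vec{J} = 0 . \qquad (5.19) $$
4. the vector and scalar potentials $\vec{A}$ and $\phi$,
$$ \vec{B} = \vec{\nabla} \times \vec{A} , \qquad \vec{E} = -\vec{\nabla} \phi - \partial_t \vec{A} \qquad (5.20) $$
5. and the corresponding gauge transformations leaving $\vec{E}$ and $\vec{B}$ invariant,
$$ \vec{A} \rightarrow \vec{A} + \vec{\nabla} \Lambda , \quad \phi \rightarrow \phi - \partial_t \Lambda \;\Rightarrow\; \vec{E} \rightarrow \vec{E} , \quad \vec{B} \rightarrow \vec{B} . \qquad (5.21) $$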
The charge density and current can be packaged into a Lorentz vector
$$ J^a = (c\rho, \vec{J}) \qquad (5.22) $$
(note that in signature $(-+++)$ one has to choose whether to identify $J^0$ or $J_0 = -J^0$
with the charge density, here we choose the former), and the continuity equation can
be written in the manifestly Lorentz-invariant form
$$ \partial_a J^a = 0 . \qquad (5.23) $$
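Likewise, the scalar and vector potential can be packaged into a Lorentz covector
$$ A_a = (-\phi/c, \vec{A}) , \qquad (5.24) $$
in terms of which the gauge transformations, with gauge parameter $\Lambda$, read
$$ A_a \rightarrow A_a + \partial_a \Lambda . \qquad (5.25) $$
The gauge invariant field strength tensor is
$$ F_{ab} = \partial_a A_b - \partial_b A_a \qquad (5.26) $$
and its contravariant components are
$$ (F^{ab}) = \begin{pmatrix}
0 & +E_1/c & +E_2/c & +E_3/c \\
-E_1/c & 0 & +B_3 & -B_2 \\
-E_2/c & -B_3 & 0 & +B_1 \\
-E_3/c & +B_2 & -B_1 & 0
\end{pmatrix} \qquad (5.29) $$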
In terms of these Lorentz tensors, the homogeneous Maxwell equations can be written
as
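$$ \partial_{[a} F_{bc]} = 0 \;\Leftrightarrow\; \partial_a F_{bc} + \partial_c F_{ab} + \partial_b F_{ca} = 0 , \qquad (5.30) $$
and these equations are identically satisfied if $F_{ab}$ derives from a potential, as in (5.26).
The Maxwell Lagrangian is
$$ -\tfrac{1}{4} F_{ab} F^{ab} = -\tfrac{1}{2} F_{0k} F^{0k} - \tfrac{1}{4} F_{ik} F^{ik} = \tfrac{1}{2} (\vec{E}^2/c^2 - \vec{B}^2) . \qquad (5.34) $$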
This is essentially all we will need (some facts regarding the Noether versus covariant
energy-momentum tensor of Maxwell theory will be recalled below).
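The relation (5.34) between the Lorentz-invariant Lagrangian and the electric and magnetic fields can be checked numerically from the component matrix (5.29). The sketch below is only an illustration: the value of `C`, the sample fields and the function names are hypothetical, and indices are lowered with $\eta_{ab} = \mathrm{diag}(-1, +1, +1, +1)$.

```python
C = 2.0  # speed of light in hypothetical units, chosen != 1 to test the factors of c
ETA = (-1.0, 1.0, 1.0, 1.0)  # diagonal of the Minkowski metric, signature (-+++)

def field_strength_upper(E, B, c=C):
    """Contravariant components F^{ab} as in (5.29)."""
    E1, E2, E3 = (e / c for e in E)
    B1, B2, B3 = B
    return [
        [0.0,  E1,  E2,  E3],
        [-E1, 0.0,  B3, -B2],
        [-E2, -B3, 0.0,  B1],
        [-E3,  B2, -B1, 0.0],
    ]

def lower_indices(F_up):
    """F_{ab} = eta_{ac} eta_{bd} F^{cd} for a diagonal metric."""
    return [[ETA[a] * ETA[b] * F_up[a][b] for b in range(4)] for a in range(4)]

def maxwell_lagrangian(E, B, c=C):
    """-(1/4) F_{ab} F^{ab}, summed over all index pairs."""
    F_up = field_strength_upper(E, B, c)
    F_dn = lower_indices(F_up)
    return -0.25 * sum(F_dn[a][b] * F_up[a][b] for a in range(4) for b in range(4))

E = (1.0, 2.0, 3.0)    # hypothetical sample field values
B = (0.5, -1.0, 2.0)
E2 = sum(e * e for e in E)
B2 = sum(b * b for b in B)
print(maxwell_lagrangian(E, B), 0.5 * (E2 / C**2 - B2))
```

Both printed numbers agree, which confirms the sign and the factor of $c^2$ in (5.34) for this choice of conventions.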
Mutatis mutandis we can now proceed in the same way as for a scalar field.
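1. The basic dynamical field is the vector potential $A_a$. Given the vector potential
$A_a$, the Maxwell field strength tensor in Special Relativity is
$$ \text{SR:} \quad F_{ab} = \partial_a A_b - \partial_b A_a . \qquad (5.35) $$
Therefore in a general metric space-time (gravitational field) one is led to (or
tempted to) define the field strength tensor as
$$ \text{GR:} \quad F_{\mu\nu} = \nabla_\mu A_\nu - \nabla_\nu A_\mu = \partial_\mu A_\nu - \partial_\nu A_\mu . \qquad (5.36) $$
$$ \text{SR:} \quad \partial_a F^{ab} = -J^b , \qquad \partial_{[a} F_{bc]} = 0 . \qquad (5.37) $$
Thus in a general gravitational field (curved space-time) these equations become
$$ \text{GR:} \quad \nabla_\mu F^{\mu\nu} = -J^\nu , \qquad \nabla_{[\mu} F_{\nu\lambda]} = 0 , \qquad (5.38) $$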
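where now of course all indices are raised and lowered with the metric $g_{\mu\nu}$,
$$ F^{\mu\nu} = g^{\mu\lambda} g^{\nu\rho} F_{\lambda\rho} . \qquad (5.39) $$
Remarks:
(a) Regarding the use of the covariant derivative in the second equation, the
same caveat as above applies.
(b) In particular, using the results derived in section 4.5, we can rewrite these
two equations as
$$ \text{GR:} \quad \partial_\mu (\sqrt{g}\, F^{\mu\nu}) = -\sqrt{g}\, J^\nu , \qquad \partial_{[\mu} F_{\nu\lambda]} = 0 . \qquad (5.40) $$
(c) It is clear from the first of these equations that the Maxwell equations imply
that the current is covariantly conserved: since
$$ \partial_\nu \partial_\mu (\sqrt{g}\, F^{\mu\nu}) = 0 \qquad (5.41) $$
by the antisymmetry of $F^{\mu\nu}$, the first equation implies $\partial_\nu (\sqrt{g}\, J^\nu) = 0$ and hence
$\nabla_\nu J^\nu = 0$.
In Special Relativity, in the Lorenz gauge, the Maxwell equations reduce to a wave equation,
$$ \partial_a A^a = 0 \;\Rightarrow\; \partial_a F^{ab} = -J^b \;\Leftrightarrow\; \Box A_b = -J_b . \qquad (5.43) $$
This gauge condition has the virtue of preserving Lorentz invariance. Similarly,
its covariantised version
$$ \nabla_\mu A^\mu = 0 \qquad (5.44) $$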
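the covariant divergence of the Maxwell field strength tensor can be written
as
$$ \nabla_\mu A^\mu = 0 \;\Rightarrow\; \nabla_\mu F^{\mu\nu} = \nabla_\mu (\nabla^\mu A^\nu - \nabla^\nu A^\mu) = \Box A^\nu - [\nabla_\mu, \nabla^\nu] A^\mu , \qquad (5.45) $$
where $\Box A^\nu = \nabla_\mu \nabla^\mu A^\nu$ is the naive extension of the Laplacian on scalars. The second
term would of course be zero in Minkowski space, but here it is not. Indeed,
as we will see in section 7, the quintessence of a non-trivial geometry is
that covariant derivatives do not commute on tensors other than scalars.
In particular, here one finds that as a consequence of (7.38) the Maxwell
equations in the covariant Lorenz gauge can be written as
$$ \Box A^\nu - R^\nu_{\ \mu} A^\mu = -J^\nu , \qquad (5.46) $$
$$ \text{SR:} \quad f^a = e F^a_{\ b}\, \dot{\xi}^b . \qquad (5.47) $$
$$ \text{GR:} \quad f^\mu = e\, g^{\mu\lambda} F_{\lambda\nu}\, \dot{x}^\nu . \qquad (5.48) $$
in General Relativity.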
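As for the scalar field, depending on whether one writes the field strength tensor
as $F_{\mu\nu} = \partial_\mu A_\nu - \partial_\nu A_\mu$ or as $F_{\mu\nu} = \nabla_\mu A_\nu - \nabla_\nu A_\mu$, by varying this action with
respect to the $A_\mu$ one obtains the vacuum Maxwell equations $\nabla_\mu F^{\mu\nu} = 0$ in either
of the 2 forms
$$ \frac{\delta}{\delta A_\nu} S[A_\mu, g_{\mu\nu}] = 0 \;\Rightarrow\;
\begin{cases}
\partial_\mu (\sqrt{g}\, F^{\mu\nu}) = 0 \\
\nabla_\mu F^{\mu\nu} = 0
\end{cases} \qquad (5.51) $$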
Remarks:
(a) Writing out explicitly the Lagrangian in terms of its components (with re-
spect to some coordinate system x = (t, xk )) one finds
While the 1st and 2nd lines look just like gravitationally dressed standard
terms $\sim \vec{E}^2$ and $\sim \vec{B}^2$, the last line appears to suggest a gravitationally
induced coupling between the electric and magnetic fields. This, however,
is misleading and simply not a meaningful way of expressing things. After
all, even in Minkowski space the decomposition of the electro-magnetic field
into electric and magnetic fields depends on the choice of inertial reference
system.
(b) In order to add sources, one can add $\int \sqrt{g}\, d^4x\, A_\mu J^\mu$ to the Maxwell action,
thus coupling the matter current to the Maxwell gauge field. Instead of just
adding such a (phenomenological) source-term by hand, a more coherent microscopic
approach (which also provides the sources with their own dynamics)
is to consider a matter action (minimally) coupled to the Maxwell field,
$$ S_M[\phi] \rightarrow S_M[\phi, A_\mu] . \qquad (5.53) $$
The combined Maxwell + matter action will then give rise to the Maxwell
equations with a source provided that one defines the current $J^\mu$ as the
variation of the matter action with respect to the gauge field,
$$ J^\mu \equiv \frac{\delta S_M[\phi, A_\mu]}{\delta A_\mu} . \qquad (5.54) $$
In anticipation of this I just want to point out that, by the same rationale as that
leading to (5.54), perhaps we should define the source term for the gravitational field
by the variation of the gravitationally minimally coupled matter action with respect to
the metric. If we now call this source term the energy-momentum tensor, then we have
a candidate definition of the energy-momentum tensor which is natural and appropriate
from the gravitational point of view. We will pursue this point of view in section 6.6.
5.7 Minimal Coupling and (quasi-)Topological Couplings
In all the cases considered so far, the minimal coupling prescription resulted in a mini-
mally coupled matter action that depends explicitly on the metric - this is as it should
be and is not a surprise. What would be more of a surprise would be to find minimally
coupled and hence generally covariant contributions to an action that do not depend
on the metric, but such examples do indeed exist (and play an important role in many
branches of physics and even mathematics, ranging from the strong-CP problem in QCD
to high-Tc superconductors to topology). Such terms in the action are usually referred
to as topological terms in the physics literature but as they need not be (and usually
are not) purely topological in the mathematics sense, for lack of a better name I refer
to them as quasi-topological.
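with
$$ S_s[\phi] = \int d^4x\, L_s(\phi, \partial_\mu \phi) \qquad (5.56) $$
some arbitrary standard scalar field action (of the type already discussed), $S_m[A]$
the usual Maxwell action,
$$ S_m[A] = \int d^4x\, L_m(\partial_\mu A_\nu) = -\tfrac{1}{4} \int d^4x\, F_{\mu\nu} F^{\mu\nu} \qquad (5.57) $$
where
$$ \tilde{F}^{\mu\nu} = \tfrac{1}{2} \epsilon^{\mu\nu\lambda\rho} F_{\lambda\rho} \qquad (5.59) $$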
Minimal coupling for the first two (standard) terms proceeds as already discussed
above. For the third, axionic term, we make the usual replacement $d^4x \rightarrow \sqrt{g}\, d^4x$
and recall from (3.65) that
$$ \frac{1}{\sqrt{g}}\, \epsilon^{\mu\nu\lambda\rho} \qquad (5.60) $$
is a (4,0) tensor, so that the generally covariant generalisation of the axionic action
is
$$ \begin{aligned}
S_a[\phi, A, g_{\mu\nu}] &= \tfrac{1}{8} \int \sqrt{g}\, d^4x\, f(\phi)\, \frac{1}{\sqrt{g}}\, \epsilon^{\mu\nu\lambda\rho} F_{\mu\nu} F_{\lambda\rho} \\
 &= \tfrac{1}{8} \int d^4x\, f(\phi)\, \epsilon^{\mu\nu\lambda\rho} F_{\mu\nu} F_{\lambda\rho} = S_a[\phi, A]
\end{aligned} \qquad (5.61) $$
We see that, as announced, the metric dependence drops out of the minimally coupled
generally covariant action. The reason for this is that the axionic Lagrangian
is already all by itself a scalar density of weight $w = +1$, and that therefore
its integral (3.58) is well-defined and generally covariant without having to take
recourse to a metric to construct an auxiliary weight-one object like $\sqrt{g}$.
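$$ L = L_m + k L_{cs} = -\tfrac{1}{4} F_{\mu\nu} F^{\mu\nu} + \tfrac{1}{2} k\, \epsilon^{\mu\nu\lambda} A_\mu F_{\nu\lambda} . \qquad (5.62) $$
Minimal coupling for the first term is standard and for the 2nd term one finds,
as above, that the generally covariant minimally coupled Chern-Simons action is
actually metric independent (since the Chern-Simons Lagrangian is a density of
weight $w = +1$),
$$ \begin{aligned}
S_{cs}[A, g_{\mu\nu}] &= \tfrac{1}{2} k \int \sqrt{g}\, d^3x\, \frac{1}{\sqrt{g}}\, \epsilon^{\mu\nu\lambda} A_\mu F_{\nu\lambda} \\
 &= \tfrac{1}{2} k \int d^3x\, \epsilon^{\mu\nu\lambda} A_\mu F_{\nu\lambda} = S_{cs}[A]
\end{aligned} \qquad (5.63) $$
As an aside note that the above theory is also known as topologically massive
Maxwell theory, since the CS term provides a gauge-invariant mass term for the
photon. One quick way to see this is to note that the equations of motion are
$$ \partial_\mu F^{\mu\nu} + k\, \epsilon^{\nu\lambda\rho} F_{\lambda\rho} = 0 . \qquad (5.64) $$
In terms of the dual field strength
$$ G^\mu = \tfrac{1}{2} \epsilon^{\mu\nu\lambda} F_{\nu\lambda} \qquad (5.65) $$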
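the equations of motion and the Bianchi identity take the form
$$ \partial_\mu G_\nu - \partial_\nu G_\mu = 2k\, \epsilon_{\mu\nu\lambda} G^\lambda , \qquad \partial_\mu G^\mu = 0 \qquad (5.66) $$
respectively. Acting with $\partial^\mu$ on the equation of motion and using the Bianchi
identity and again the equation of motion one finds
$$ \begin{aligned}
\Box G_\nu &= 2k\, \epsilon_{\mu\nu\lambda} \partial^\mu G^\lambda = k\, \epsilon_{\mu\nu\lambda} (\partial^\mu G^\lambda - \partial^\lambda G^\mu) \\
 &= 2k^2\, \epsilon_{\mu\nu\lambda} \epsilon^{\mu\lambda\rho} G_\rho = 4k^2 G_\nu
\end{aligned} \qquad (5.67) $$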
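The last step in (5.67) relies on the contraction of two $\epsilon$-symbols in 2+1 dimensions with Lorentzian signature, $\epsilon_{\mu\nu\lambda} \epsilon^{\mu\lambda\rho} = 2 \delta_\nu^{\ \rho}$. The following sketch verifies this by brute force. It assumes the convention $\epsilon^{012} = +1$ and the flat metric $\mathrm{diag}(-1, +1, +1)$; the result does not depend on the overall sign convention for $\epsilon$, and the function names are hypothetical.

```python
from itertools import permutations

ETA = (-1, 1, 1)  # diagonal of eta_{mu nu} in 2+1 dimensions, signature (-++)

def levi_civita_3():
    """Totally antisymmetric symbol eps^{mu nu lambda} with eps^{012} = +1 (assumed)."""
    eps = [[[0] * 3 for _ in range(3)] for _ in range(3)]
    for p in permutations(range(3)):
        sign = 1
        for i in range(3):
            for j in range(i + 1, 3):
                if p[i] > p[j]:
                    sign = -sign  # each inversion flips the sign of the permutation
        eps[p[0]][p[1]][p[2]] = sign
    return eps

EPS_UP = levi_civita_3()
# Lower all three indices with the diagonal metric
EPS_DOWN = [[[ETA[a] * ETA[b] * ETA[c] * EPS_UP[a][b][c] for c in range(3)]
             for b in range(3)] for a in range(3)]

def contraction(nu, rho):
    """eps_{mu nu lambda} eps^{mu lambda rho}, summed over mu and lambda."""
    return sum(EPS_DOWN[m][nu][l] * EPS_UP[m][l][rho]
               for m in range(3) for l in range(3))

for nu in range(3):
    print([contraction(nu, rho) for rho in range(3)])
```

The printed matrix is twice the identity, so the field $G_\nu$ satisfies a Klein-Gordon type equation with $m^2 = 4k^2$ in the mostly-plus signature.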
These quasi-topological terms modify the equations of motion. Moreover, since they
depend on the derivatives of the fields, they will contribute to the canonical Noether
energy-momentum tensor. On the other hand, since they do not depend on the metric,
they do not contribute to the covariant energy-momentum tensor, defined in section 6
in terms of the variation of the matter action with respect to the metric (and as such
playing the role of the source term for the Einstein gravitational field equations).
How it nevertheless conspires that this tensor is conserved on-shell even though the equa-
tions of motion have been modified and how the improved canonical energy-momentum
tensor nevertheless ends up agreeing with the covariant energy-momentum tensor on-
shell will be explored and explained in section 21.5.
where $V$ is the four-volume $\mathbb{R}^3 \times [t_0, t_1]$. This holds provided that $J^\alpha$ vanishes at spatial
infinity.
Now in General Relativity, the conservation law will be replaced by the covariant conservation law $\nabla_\alpha J^\alpha = 0$, and one may wonder if this also leads to some conserved charges in the ordinary sense. The answer is yes because, recalling the formula for the covariant divergence of a vector,
$$ \nabla_\alpha J^\alpha = g^{-1/2}\,\partial_\alpha(g^{1/2}J^\alpha) , \tag{5.70} $$
we see that
$$ \nabla_\alpha J^\alpha = 0 \quad\Leftrightarrow\quad \partial_\alpha(g^{1/2}J^\alpha) = 0 , \tag{5.71} $$
so that $g^{1/2}J^\alpha$ is a conserved current in the ordinary sense. We then obtain conserved quantities in the ordinary sense by integrating $J^\alpha$ over a spacelike hypersurface $\Sigma$. We will develop a more precise formula for this, an appropriate version of the Gauss theorem for hypersurfaces in curved space-times, in section 15.3.
The factor $g^{1/2}$ appearing in the current conservation law can be understood physically. To see what it means, split $J^\alpha$ into its space-time direction $u^\alpha$, with $u^\alpha u_\alpha = -1$, and its magnitude $\rho$ as
$$ J^\alpha = \rho\,u^\alpha . \tag{5.72} $$
This defines the average four-velocity of the conserved quantity represented by $J^\alpha$ and its density $\rho$ measured by an observer moving at that average velocity (rest mass density, charge density, number density, . . . ). Since $u^\alpha$ is a vector, in order for $J^\alpha$ to be a vector, $\rho$ has to be a scalar. Therefore this density is defined as per unit proper volume. The factor of $g^{1/2}$ transforms this into density per coordinate volume and this quantity is conserved (in a comoving coordinate system where $J^0 = \rho$, $J^i = 0$).
We will come back to this in the context of cosmology later on in this course, but
for now just think of the following picture (Figure 44 in section 33): take a balloon,
draw lots of dots on it at random, representing particles or galaxies. Next choose some
coordinate system on the balloon and draw the coordinate grid on it. Now inflate
or deflate the balloon. This represents a time dependent metric, roughly of the form
$ds^2 = r^2(t)(d\theta^2 + \sin^2\theta\,d\phi^2)$. You see that the number of dots per coordinate volume
element (area element in this case) does not change, whereas the number of dots per
unit proper volume (area) will.
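The balloon picture can be made quantitative with a few lines of code. The following is only a sketch: it assumes the round-sphere metric $ds^2 = r^2(t)(d\theta^2 + \sin^2\theta\,d\phi^2)$, for which $\sqrt{g} = r^2\sin\theta$, and the patch boundaries and number of dots are hypothetical values chosen for illustration.

```python
import math

def proper_area(r, theta0, theta1, phi0, phi1):
    """Proper area of a coordinate patch on a sphere of radius r.

    Integrates sqrt(g) = r**2 * sin(theta) over the patch.
    """
    return r**2 * (math.cos(theta0) - math.cos(theta1)) * (phi1 - phi0)

def densities(n_dots, r, patch):
    """Return (dots per coordinate area, dots per proper area) for the patch."""
    theta0, theta1, phi0, phi1 = patch
    coord_area = (theta1 - theta0) * (phi1 - phi0)
    return n_dots / coord_area, n_dots / proper_area(r, *patch)

patch = (1.0, 1.2, 0.0, 0.3)  # hypothetical (theta0, theta1, phi0, phi1)
n_dots = 50                   # hypothetical number of dots drawn inside the patch

for r in (1.0, 2.0, 4.0):     # inflate the balloon
    per_coord, per_proper = densities(n_dots, r, patch)
    print(f"r = {r}: per coordinate area = {per_coord:.1f}, per proper area = {per_proper:.2f}")
```

The coordinate density stays fixed while the proper density falls off like $1/r^2$, which is the statement that $g^{1/2}\rho$ rather than $\rho$ is the conserved quantity.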
6 Energy-Momentum Tensor I: Basics
6.1 Introduction
Newton's gravitational field equation for the gravitational potential $\phi$ is the Poisson equation $\Delta\phi = 4\pi G_N\mu$, with $\mu$ the mass density. Thus in Newton's theory, mass is the source of gravity. We can also more usefully, and thinking relativistically, write this in terms of the energy density $\rho = \mu c^2$ as
$$ \Delta\phi = \frac{4\pi G_N}{c^2}\,\rho \;\overset{(c=1)}{=}\; 4\pi G_N\,\rho . \tag{6.1} $$
Now we already noted in section 1.1 that in Special Relativity $\rho$ is not a scalar but
rather just one component of a tensor, the energy-momentum tensor
with the components Tab transforming into each other under Lorentz transformations
according to the transformation rules for Lorentz tensors.
Within the framework of special relativity and relativistic field theories there are (at
least) 2 common approaches to constructing or defining an energy-momentum tensor,
namely
A macroscopic phenomenological description is useful when one does not know (or does
not care about) the microscopic description of the matter one is dealing with but rather
tries to characterise its properties in terms of the specification of some macroscopic
(thermodynamic, hydrodynamic) parameters such as energy (density), pressure, viscos-
ity etc. For many purposes this is the appropriate language for describing e.g. gases or
fluids.
In this case, one constructs the energy-momentum tensor in such a way that it encodes
the physics one is trying to describe (primarily conservation laws and dynamics). As
a simple example of this (not by coincidence the one which is of most relevance for
gravitational physics and thus also later on in these notes), we consider a perfect fluid.
By definition, a perfect fluid is one in which a comoving observer (i.e. an observer in a local rest-frame of the fluid) sees the fluid around him as isotropic (rotation-invariant). This means that in this reference system the components of the energy-momentum tensor have the form (any non-zero $T_{0k}$ would break rotation invariance, and $\delta_{ik}$ is the unique rotation-invariant symmetric (0, 2)-tensor)
$$ T_{00} = \rho , \qquad T_{0k} = 0 , \qquad T_{ik} = p\,\delta_{ik} . \tag{6.3} $$
Here $\rho$ and $p$ are any functions of the coordinates, interpreted as the energy density and the pressure of the fluid.
To specify the kind of fluid one is working with, one should supplement this by an equation of state which provides a relation between $\rho$ and $p$. Typically this amounts to specifying $p$ as a function of $\rho$,
$$ p = p(\rho) . \tag{6.4} $$
In terms of the 4-velocity $u^a$ of the fluid, which in the local rest frame has the components
$$ u^a = (1, 0, 0, 0) , \tag{6.5} $$
one can combine the components of the energy-momentum tensor into the expression
$$ T_{ab} = (\rho + p)\,u_a u_b + p\,\eta_{ab} \tag{6.6} $$
(note that energy density and pressure = force per unit area have the same dimensions). As this is now a tensorial equation it is valid in any inertial system. It defines the energy-momentum tensor of a perfect fluid. The conditions
$$ \partial^a T_{ab} = 0 \tag{6.7} $$
imply a continuity equation and (as we will see below) a relativistic generalisation of the Euler equations for a perfect fluid. These are usually supplemented by a further continuity equation for the fluid density current
$$ j^a = n\,u^a , \tag{6.8} $$
$$ \partial_a j^a = 0 . \tag{6.9} $$
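To see the tensorial expression at work, here is a minimal numerical sketch. It assumes the standard perfect-fluid form $T_{ab} = (\rho + p)u_a u_b + p\,\eta_{ab}$ with signature $(-,+,+,+)$ and $c = 1$ (the flat-space counterpart of (6.70)), and evaluates its components both in the rest frame and for a fluid moving with 3-velocity $\vec v$. The function name and the sample values are hypothetical.

```python
import math

# Minkowski metric eta_ab with signature (-,+,+,+) (assumed convention).
ETA = [[-1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]

def perfect_fluid(rho, p, v):
    """Return (gamma, T) with T[a][b] = (rho + p) u_a u_b + p eta_ab, for 3-velocity v."""
    gamma = 1.0 / math.sqrt(1.0 - sum(vi * vi for vi in v))
    u_up = [gamma] + [gamma * vi for vi in v]   # u^a = gamma (1, v)
    u_low = [sum(ETA[a][b] * u_up[b] for b in range(4)) for a in range(4)]
    T = [[(rho + p) * u_low[a] * u_low[b] + p * ETA[a][b] for b in range(4)]
         for a in range(4)]
    return gamma, T

# Rest frame: the components reduce to diag(rho, p, p, p).
_, T_rest = perfect_fluid(2.0, 0.5, (0.0, 0.0, 0.0))
print([T_rest[a][a] for a in range(4)])

# Moving fluid: T_00 = gamma^2 (rho + p) - p and T_0i = -gamma^2 (rho + p) v_i.
gamma, T_moving = perfect_fluid(2.0, 0.5, (0.6, 0.0, 0.0))
print(T_moving[0][0], T_moving[0][1])
```

The combinations $\gamma^2(\rho + p) - p$ and $\gamma^2(\rho + p)\vec v$ appearing here are the ones that enter the time component of the conservation law.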
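Now let us look at the consequences of these equations. Since
$$ u^a u_a = -1 \quad\Rightarrow\quad u^a\partial_b u_a = 0 , \tag{6.10} $$
contracting (6.7) with $u^b$ gives
$$ u^a\partial_a\rho + (\rho + p)\,\partial_a u^a = 0 . \tag{6.11} $$
With the help of the current conservation equation, this equation can be recast into the form
$$ \begin{aligned} 0 &= u^a\partial_a\rho + (\rho + p)\,\partial_a(j^a/n) \\ &= u^a\partial_a\rho + (\rho + p)\,j^a\partial_a(1/n) \\ &= u^a\,[\partial_a\rho + (\rho + p)\,n\,\partial_a(1/n)] \\ &= n\,u^a\,[p\,\partial_a(1/n) + \partial_a(\rho/n)] . \end{aligned} \tag{6.12} $$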
The point of rewriting the equation in this way is that (assuming a situation of thermodynamic equilibrium) the 2nd law of thermodynamics says that pressure $p$, energy density $\rho$ and the volume per particle $(1/n)$ are related by
$$ T\,ds = p\,d(1/n) + d(\rho/n) , \tag{6.13} $$
where $T$ is the temperature and $s$ the specific entropy, i.e. the entropy per particle.[15] Thus the above equation says that the specific entropy $s$ is constant along the flow,
$$ u^a\partial_a s = 0 . \tag{6.14} $$
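so that
$$ u^a\partial_a = \gamma(v)\,(\partial_t + \vec{v}\cdot\vec{\nabla}) \tag{6.16} $$
is ($\gamma(v)$ times) the usual convective derivative or comoving time-derivative, and the above equation for the conservation of the specific entropy can be written as
$$ (\partial_t + \vec{v}\cdot\vec{\nabla})\,s = 0 . \tag{6.17} $$
becomes
$$ \partial_t(\gamma(v)\,n) + \vec{\nabla}\cdot(\gamma(v)\,n\,\vec{v}) = 0 , \tag{6.19} $$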
[15] See e.g. J. van Holten, Relativistic Fluid Dynamics, http://www.nikhef.nl/~t32/relhyd.pdf for a derivation of this and further discussion.
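and the time-component of (6.7) can be written as
$$ \partial_t\bigl(p - \gamma(v)^2(\rho + p)\bigr) - \vec{\nabla}\cdot[\gamma(v)^2(\rho + p)\,\vec{v}\,] = 0 . \tag{6.20} $$
Using this equation the spacelike components of (6.7) can then be written as
$$ \gamma(v)^2(\rho + p)\,(\partial_t\vec{v} + \vec{v}\cdot\vec{\nabla}\vec{v}) + \vec{v}\,\partial_t p + \vec{\nabla}p = 0 . \tag{6.21} $$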
For a covariant rendition and elementary covariant derivation of the ensuing equations
of motion in a general gravitational field from the conservation of the energy-momentum
tensor, see e.g. the derivation of (34.73) and (34.74) in section 34.3.
A microscopic Lagrangian description is the method of choice when one has a Poincaré-invariant Lagrangian field theory description of the matter one is trying to describe.
In particular, this applies to the scalar and Maxwell field theories we have already
discussed and, more generally, to the modern microscopic and action-based description
of the fundamental interactions of particle physics.
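For a Lagrangian $L = L(\phi, \partial_a\phi)$ depending on some fields $\phi$ and their 1st derivatives (these could be scalar, vector, . . . fields), this tensor is defined by
$$ \Theta_{ab} = -\frac{\partial L}{\partial(\partial^a\phi)}\,\partial_b\phi + \eta_{ab}L \tag{6.23} $$
(sign conventions are such that $\Theta_{00}$ rather than $-\Theta_{00}$ is the energy density). It is built from the 4 Noether currents
$$ \Theta^{a}{}_{b} \leftrightarrow J^{a}_{(b)} \tag{6.24} $$
associated to translation invariance in the $x^b$-direction, $\delta_{(b)}\phi = \partial_b\phi$. By calculating its divergence, one finds
$$ \partial^a\Theta_{ab} = \frac{\delta L}{\delta\phi}\,\partial_b\phi , \tag{6.25} $$
where $\delta L/\delta\phi$ is the Euler-Lagrange variational derivative,
$$ \frac{\delta L}{\delta\phi} = \frac{\partial L}{\partial\phi} - \partial^a\frac{\partial L}{\partial(\partial^a\phi)} . \tag{6.26} $$
$$ \partial^a\Theta_{ab} = 0 \quad\text{on-shell} , \tag{6.27} $$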
This procedure and prescription is perfectly adequate and sufficient for scalar (spin 0)
fields, but it turns out to be far from satisfactory and far from the end of the story for
other fields (e.g. for Maxwell theory, for which $\Theta_{ab}$ turns out to be neither symmetric nor gauge invariant). In this more general situation one is then required to improve this prescription in order to obtain an energy-momentum tensor $T_{ab}$ with the desired properties.
As a first example where everything works out nicely, consider the energy-momentum
tensor of a Klein-Gordon scalar field in Minkowski space. In this case,
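$$ \Theta_{ab} = \partial_a\phi\,\partial_b\phi + \eta_{ab}L = \partial_a\phi\,\partial_b\phi - \tfrac12\eta_{ab}\left(\eta^{cd}\partial_c\phi\,\partial_d\phi + m^2\phi^2\right) \tag{6.29} $$
with
$$ \Theta_{00} = \tfrac12\left(\dot\phi^2 + (\vec\nabla\phi)^2 + m^2\phi^2\right) . \tag{6.30} $$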
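For the record, the energy density (6.30) follows from (6.29) in one line. The sketch below assumes the signature $(-,+,+,+)$, so that $\eta_{00} = -1$, and writes $\dot\phi$ for $\partial_t\phi$ (a notation chosen here for illustration):

```latex
\begin{aligned}
L &= -\tfrac12\left(\eta^{cd}\partial_c\phi\,\partial_d\phi + m^2\phi^2\right)
   = \tfrac12\dot\phi^2 - \tfrac12(\vec\nabla\phi)^2 - \tfrac12 m^2\phi^2 ,\\
\Theta_{00} &= \dot\phi^2 + \eta_{00}L = \dot\phi^2 - L
   = \tfrac12\left(\dot\phi^2 + (\vec\nabla\phi)^2 + m^2\phi^2\right) .
\end{aligned}
```

Each term is manifestly non-negative, as one would expect of an energy density.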
$$ \Theta_{ba} = \Theta_{ab} . \tag{6.32} $$
In particular, this implies that the angular momentum current associated to an infinitesimal Lorentz transformation (1.22) with parameters $\omega_{bc} = -\omega_{cb}$, namely
$$ L^a = \tfrac12\,\omega_{bc}L^{abc} \tag{6.33} $$
with
$$ L^{abc} = x^b\Theta^{ac} - x^c\Theta^{ab} , \tag{6.34} $$
is on-shell conserved,
$$ \partial_a L^{abc} = \Theta^{bc} - \Theta^{cb} = 0 . \tag{6.35} $$
Since $\Theta_{ab}$ is symmetric (and gauge invariance is not an issue), in this example there is no need to improve the Noether energy-momentum tensor, and we thus denote it by $T_{ab}$,
$$ T_{ab} = \Theta_{ab} = \partial_a\phi\,\partial_b\phi + \eta_{ab}L \tag{6.36} $$
As we will see below, it is also straightforward to promote this tensor by minimal
coupling to a (covariantly conserved) energy-momentum tensor of a scalar field in a
gravitational field,
Now let us take a look at Maxwell theory in Minkowski space. In this case the canonical
Noether energy-momentum tensor is
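$$ \Theta_{ab} = -\frac{\partial L}{\partial(\partial^a A^c)}\,\partial_b A^c + \eta_{ab}L = F_{ac}\,\partial_b A^c - \tfrac14\eta_{ab}F_{cd}F^{cd} . \tag{6.37} $$
It is of course on-shell conserved by construction,
$$ \partial^a\Theta_{ab} = 0 \quad\text{on-shell} \tag{6.38} $$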
(note that both sets of Maxwell equations are required to derive this), but it is neither
symmetric nor gauge-invariant. In particular, therefore, the angular momentum current
(6.34) is not conserved (even though Maxwell theory is Lorentz invariant), and the
expression for the energy-density is not gauge-invariant and does not agree with the
standard expression
$$ \Theta_{00} \neq \tfrac12(\vec{E}^2 + \vec{B}^2) . \tag{6.39} $$
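This can be rectified by manipulating $\Theta_{ab}$ as
and noting that the last term can be written as a sum of two terms,
$$ \partial^a\partial^c(F_{ac}A_b) = 0 , \tag{6.42} $$
$$ \tilde\Theta_{ab} = \Theta_{ab} - \partial^c(F_{ac}A_b) \tag{6.44} $$
$$ \partial^a\tilde\Theta_{ab} = 0 \quad\text{on-shell} , \tag{6.45} $$
as well as on-shell gauge invariant,
$$ \tilde\Theta_{ab} = F_{ac}F_b{}^c - \tfrac14\eta_{ab}F_{cd}F^{cd} - (\partial^c F_{ac})A_b = F_{ac}F_b{}^c - \tfrac14\eta_{ab}F_{cd}F^{cd} \quad\text{on-shell} . \tag{6.46} $$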
Therefore one can define the improved energy-momentum tensor
(again both sets of Maxwell equations are required to establish this; with an
external source,
$$ \partial_{[a}F_{bc]} = 0 , \qquad \partial_a F^{ab} = -J^b \tag{6.49} $$
one has the non-conservation law
instead, which becomes a conservation law when one adds to Tab the energy-
momentum tensor of the source fields + interaction terms);
Moreover, the components of $T_{0k}$ are the components of the Poynting vector and the spatial components $T_{ik}$ are the components of the Maxwell stress tensor. Thus $T_{ab}$ is the correct energy-momentum tensor of Maxwell theory.
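These properties are easy to confirm numerically. The sketch below evaluates $T_{ab} = F_{ac}F_b{}^c - \tfrac14\eta_{ab}F_{cd}F^{cd}$ (the on-shell form appearing in (6.46)) for arbitrary field values. The identification $F_{0i} = -E_i$, $F_{ij} = \epsilon_{ijk}B_k$ is a sign convention assumed for this illustration, as are the signature $(-,+,+,+)$ and the sample values of $\vec E$ and $\vec B$.

```python
ETA = [-1.0, 1.0, 1.0, 1.0]  # diagonal Minkowski metric, signature (-,+,+,+) assumed

def field_strength(E, B):
    """F_ab with F_{0i} = -E_i and F_{ij} = eps_{ijk} B_k (assumed sign convention)."""
    F = [[0.0] * 4 for _ in range(4)]
    for i in range(3):
        F[0][i + 1] = -E[i]
        F[i + 1][0] = E[i]
    F[1][2], F[2][1] = B[2], -B[2]
    F[2][3], F[3][2] = B[0], -B[0]
    F[3][1], F[1][3] = B[1], -B[1]
    return F

def maxwell_tensor(F):
    """T_ab = F_ac F_b^c - (1/4) eta_ab F_cd F^cd, indices raised with diagonal eta."""
    F2 = sum(ETA[c] * ETA[d] * F[c][d] ** 2 for c in range(4) for d in range(4))
    return [[sum(ETA[c] * F[a][c] * F[b][c] for c in range(4))
             - 0.25 * (ETA[a] if a == b else 0.0) * F2
             for b in range(4)] for a in range(4)]

E = (1.0, 2.0, 3.0)    # hypothetical electric field
B = (0.5, -1.0, 2.0)   # hypothetical magnetic field
T = maxwell_tensor(field_strength(E, B))

energy_density = 0.5 * (sum(e * e for e in E) + sum(b * b for b in B))
trace = sum(ETA[a] * T[a][a] for a in range(4))
print(T[0][0], energy_density, trace)
```

With these conventions $T_{00}$ reproduces $\tfrac12(\vec E^2 + \vec B^2)$, the mixed components $T_{0k}$ are minus the components of $\vec E\times\vec B$ (so that $T^{0k}$ is the Poynting vector), and the trace vanishes in four dimensions.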
This procedure to obtain $T_{ab}$ from $\Theta_{ab}$ can be understood in a more general and systematic way, via the so-called Belinfante improvement (or symmetrisation) procedure.
A brief synopsis of this construction will be provided in section 6.4 below.
One of the many useful properties of a symmetric, conserved energy-momentum tensor,
and one that is occasionally used in general relativity, e.g. in the discussion of the energy
and energy flux of gravitational waves, is the Laue Theorem (or tensor virial theorem).
It states that for such an energy-momentum tensor and a localised source (so that one
can integrate by parts with impunity) one has the relation
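$$ \left.\begin{array}{l} \partial^a T_{ab} = 0 , \quad T_{ab} = T_{ba} \\ \text{localised source} \end{array}\right\} \quad\Rightarrow\quad \int d^3x\; T^{ik} = \tfrac12(\partial_0)^2\int d^3x\; T^{00}x^i x^k \tag{6.53} $$
between the integrated spatial components $T^{ik}$ and the quadrupole moments
$$ Q^{ik}(t) = \int d^3x\; T^{00}x^i x^k \tag{6.54} $$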
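$$ \begin{aligned} \tfrac12(\partial_0)^2\int d^3x\; T^{00}x^i x^k &= -\tfrac12\,\partial_0\int d^3x\;(\partial_j T^{j0})\,x^i x^k \\ &= \tfrac12\,\partial_0\int d^3x\;(T^{i0}x^k + T^{k0}x^i) \\ &= \tfrac12\int d^3x\;(\partial_0 T^{i0}x^k + \partial_0 T^{k0}x^i) \\ &= -\tfrac12\int d^3x\;((\partial_j T^{ij})x^k + (\partial_j T^{kj})x^i) \\ &= +\int d^3x\; T^{ik} . \end{aligned} \tag{6.57} $$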
The procedure to obtain a symmetric and conserved $T_{ab}$ from the canonical Noether energy-momentum tensor $\Theta_{ab}$ of a Poincaré-invariant field theory, illustrated above in the case of Maxwell theory, can be understood in a more general and systematic, but also somewhat round-about way by appealing to the Lorentz-invariance of the action and taking into account the non-trivial transformation behaviour of the fields with spin $\neq 0$ under Lorentz transformations. This recipe is known as the Belinfante improvement procedure.[16] Here is, just for reference purposes, a brief description of the general features of this construction:
[16] This is explained in many places, with varying degree of comprehensibility or comprehension. For a detailed explanation, geared also towards applications to general relativity, see section 2 of T. Ortin, Gravity and Strings; for a succinct description, and an extension of the usual procedure to Lagrangians depending also on second derivatives of the fields, see section II of D. Bak, D. Cangemi, R. Jackiw, Energy-Momentum Conservation in General Relativity, arXiv:hep-th/9310025.
In general (with the exception of spin zero scalar fields), $\Theta_{ab} = \eta_{ac}\Theta^{c}{}_{b}$ is not symmetric,
$$ \Theta_{ab} \neq \Theta_{ba} . \tag{6.58} $$
By Lorentz invariance of the action and Noether's theorem, the total (orbital + spin) angular momentum should be conserved, and the above (purely orbital) angular momentum current fails to be conserved because it does not take into account the spin, i.e. the fact that the $\phi$ are possibly non-trivial Lorentz tensors (an irrelevant fact as far as the translational symmetries and hence the Noether energy-momentum tensor are concerned).
This can be rectified by constructing the conserved total angular momentum current $J^{abc}$ directly from Noether's theorem applied to Lorentz transformations $\delta_\omega\phi = L_\omega\phi$ of the fields and coordinates. This gives rise to an additional (spin) contribution to the current, schematically of the form
$$ J^a = J^a_{\rm orbit} + \frac{\partial L}{\partial(\partial_a\phi)}\,L_\omega\phi , \qquad J^{a[bc]}_{\rm orbit} = L^{abc} . \tag{6.60} $$
From the conservation of this current one can then via some gymnastics deduce and extract a candidate energy-momentum tensor $\tilde\Theta_{ab}$ which is such that the total angular momentum current $J^{abc}$ takes the form
$$ J^{abc} = x^b\tilde\Theta^{ac} - x^c\tilde\Theta^{ab} . \tag{6.61} $$
Note that the spin-contribution to the total angular momentum has in this way been transformed into an orbital contribution with respect to the new energy-momentum tensor $\tilde\Theta_{ab}$.
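$$ \tilde\Theta_{ab} = \Theta_{ab} + \partial^c\Psi_{cab} , \tag{6.62} $$
with
$$ \Psi_{cab} = -\Psi_{acb} \quad\Rightarrow\quad \partial^a\partial^c\Psi_{cab} \equiv 0 \tag{6.63} $$
so that
$$ \partial^a\Theta_{ab} = 0 \;\;\text{on-shell} \quad\Rightarrow\quad \partial^a\tilde\Theta_{ab} = 0 \;\;\text{on-shell} . \tag{6.64} $$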
Addition of such a term to the energy-momentum tensor is always possible as it does not violate the conservation law. While this changes the definition of the local energy and momentum densities, with suitable fall-off conditions on the $\Psi_{abc}$ this has no effect on the total energy-momentum $P_b$ (6.28),
$$ P^b \to P^b + \int d^3x\;\partial_c\Psi^{c0b} = P^b + \int d^3x\;\partial_k\Psi^{k0b} = P^b + \oint dS_k\,\Psi^{k0b} . \tag{6.65} $$
$$ \partial_a J^{abc} = 0 \;\;\text{on-shell} \quad\Rightarrow\quad \tilde\Theta^{ab} = \tilde\Theta^{ba} \;\;\text{on-shell} . \tag{6.66} $$
Thus on-shell $\tilde\Theta_{ab}$ agrees with a tensor $T_{ab}$, which can be chosen to be symmetric (off-shell) and on-shell conserved,
$$ \tilde\Theta_{ab} \to T_{ab} : \qquad T_{ab} = T_{ba} \;\;\text{off-shell} , \qquad \partial_a T^{ab} = 0 \;\;\text{on-shell} . \tag{6.67} $$
Given the success of the minimal coupling prescription, it is natural to try to define the
matter energy-momentum tensor in a gravitational field in the same way. While this is
certainly possible to a certain extent (as the examples will show), this procedure also
leaves something to be desired (as the examples will also show).
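Following the minimal coupling rules, we promote this to the energy-momentum tensor
$$ T^{\alpha\beta} = (\rho + p)\,u^\alpha u^\beta + p\,g^{\alpha\beta} , \tag{6.70} $$
where $u^\alpha$ denotes the proper-time normalised velocity field of the fluid, $g_{\alpha\beta}u^\alpha u^\beta = -1$. The covariantisation of the conservation law (6.7) evidently reads
$$ \partial^a T_{ab} = 0 \quad\to\quad \nabla_\alpha T^{\alpha\beta} = 0 . \tag{6.71} $$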
This generalises the continuity equation and the relativistic Euler equations to a fluid
moving in a gravitational field and reduces to the special relativistic laws at the origin
of a freely falling coordinate system, as it should.
There are neither conceptual nor technical complications in this example, and we will
adopt this perfect fluid energy-momentum tensor, supplemented by an appropriate equa-
tion of state, to model the interior of a star (section 23.7) and the matter content of
the universe (in our discussion of cosmology). In both of these examples, such a phe-
nomenological description is quite appropriate and sufficient (although for more detailed
investigations one may need to go beyond the perfect fluid approximation). For a de-
tailed analysis of the conservation equations in the context of cosmology, see sections
34.3 and 34.4.
Let us now turn to energy-momentum tensors for Lagrangian field theories, starting
with the example of the Klein-Gordon scalar field. As we saw above, in Minkowski
space its (Noether = improved) energy-momentum tensor is given by
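$$ T_{ab} = \partial_a\phi\,\partial_b\phi + \eta_{ab}L = \partial_a\phi\,\partial_b\phi - \tfrac12\eta_{ab}\left(\eta^{cd}\partial_c\phi\,\partial_d\phi + m^2\phi^2\right) . \tag{6.72} $$
The minimal coupling prescription turns this into
$$ T_{\alpha\beta} = \partial_\alpha\phi\,\partial_\beta\phi - \tfrac12 g_{\alpha\beta}\left(g^{\gamma\delta}\partial_\gamma\phi\,\partial_\delta\phi + m^2\phi^2\right) , \tag{6.73} $$
and it is easy to check that it is covariantly conserved for a solution to the equations of motion in a gravitational background,
$$ \Box_g\phi - m^2\phi = 0 \quad\Rightarrow\quad \nabla^\alpha T_{\alpha\beta} = 0 . \tag{6.74} $$
For the action (5.13) with a potential $V(\phi)$, the energy-momentum tensor of course also has the form (6.73) with $m^2\phi^2/2$ unsurprisingly replaced by $V(\phi)$,
$$ T_{\alpha\beta} = \partial_\alpha\phi\,\partial_\beta\phi - \tfrac12 g_{\alpha\beta}\,g^{\gamma\delta}\partial_\gamma\phi\,\partial_\delta\phi - g_{\alpha\beta}V(\phi) , \tag{6.75} $$
with
$$ \Box_g\phi = V'(\phi) \quad\Rightarrow\quad \nabla^\alpha T_{\alpha\beta} = 0 . \tag{6.76} $$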
So far so good. However, the significance of this energy-momentum tensor outside the
realm of special relativity is not clear. In special relativity, it encodes the conserved
quantities associated to translation invariance, but in a general gravitational field there
is no translation invariance (or other symmetry). In particular, in a general gravitational field
one cannot even derive the energy-momentum tensor (6.73) from Noether's theorem applied to translations
and, related to this is the fact that one does not obtain an ordinary conservation law but the covariant conservation law $\nabla^\alpha T_{\alpha\beta} = 0$.
Regarding the second point, we will see in sections 6.9 and 9.1 below that to any continuous symmetry of a gravitational field (metric) and the covariantly conserved energy-momentum tensor one can associate a covariantly conserved current and thus also (as discussed in section 5.8) a conserved charge.
Now let us turn to Maxwell theory. Here the situation is a priori a bit murkier, because in principle we have both the canonical Noether energy-momentum tensor $\Theta_{ab}$ (6.37) and the improved energy-momentum tensor $T_{ab}$ at our disposal. Let us start with the latter, not only because it is the nicer object but also because it turns out to give the correct result. Applying the rules of minimal coupling, one finds the tensor
$$ T_{\alpha\beta} = F_{\alpha\gamma}F_\beta{}^\gamma - \tfrac14 g_{\alpha\beta}F_{\gamma\delta}F^{\gamma\delta} , \tag{6.79} $$
where indices of the (metric independent) field strength tensor $F_{\alpha\beta}$ are of course raised with the aid of the inverse metric $g^{\alpha\beta}$. This object turns out to have all the right properties to qualify as a candidate energy-momentum tensor of Maxwell theory in a gravitational field. In particular, it is off-shell symmetric and moreover on-shell covariantly conserved,
$$ \nabla_\alpha T^{\alpha\beta} = 0 \quad\text{on-shell} , \tag{6.80} $$
or, in the presence of an external source,
$$ \nabla_\alpha T^{\alpha\beta} = J_\alpha F^{\alpha\beta} \quad\text{on-shell} . \tag{6.81} $$
While one may have anticipated these last two equations on the basis of the minimal
coupling recipe, it is important (and a useful exercise) to verify by direct calculation that
they indeed hold. The point of this verification is to make sure that no commutators
of covariant derivatives, i.e. curvature terms, arise in and mess up this equation, as
they will in the calculation below involving the Noether energy-momentum tensor.
So let us take a brief look at the covariantised or minimally coupled Noether energy-momentum tensor, namely
$$ \Theta_{\alpha\beta} = F_{\alpha\gamma}\nabla_\beta A^\gamma - \tfrac14 g_{\alpha\beta}F_{\gamma\delta}F^{\gamma\delta} . \tag{6.82} $$
While the canonical energy-momentum tensor in Minkowski space had some undesirable properties, its one redeeming feature was that it was on-shell conserved. In contrast to this, $\Theta_{\alpha\beta}$ is neither on-shell conserved nor on-shell covariantly conserved. In order to establish $\partial^a\Theta_{ab} = 0$ in Minkowski space, one uses the fact that partial derivatives commute. Thus, analogously, in calculating $\nabla^\alpha\Theta_{\alpha\beta}$ one encounters the commutator of covariant derivatives. Explicitly on-shell one finds
$$ \nabla^\alpha\Theta_{\alpha\beta} = \tfrac12 F^{\alpha\gamma}[\nabla_\alpha, \nabla_\gamma]A_\beta , \tag{6.83} $$
However, as we will discuss at length in section 7, the characteristic and defining feature
of a non-trivial curved space-time is that these covariant derivatives do not commute
when acting on tensors other than scalars (their commutator defining the curvature
tensor of the space-time).
$$ \tilde\Theta_{ab} = \Theta_{ab} - \partial^c(F_{ac}A_b) , \tag{6.84} $$
$$ \nabla^\alpha\nabla^\gamma(F_{\alpha\gamma}A_\beta) = \tfrac12 F^{\alpha\gamma}[\nabla_\alpha, \nabla_\gamma]A_\beta , \tag{6.85} $$
so that it would not qualify as an improvement term in the standard sense. Nevertheless, subtracting this term from the (non-conserved) Noether energy-momentum tensor, one finds that this indeed cancels the commutator term arising from (6.83), thus giving rise to an on-shell covariantly conserved $\tilde\Theta_{\alpha\beta}$ or $T_{\alpha\beta}$. From the present perspective,
however, this must be considered to be somewhat of a miracle or fluke. For some more
comments on this, see section 21.2.
6.6 Covariant Energy-Momentum Tensor: the Source of Gravity
As we have seen, there are some irritating conceptual and technical issues associated
with the Noether + minimal coupling procedure in general. These irritants turn
out to be a good thing, though, because they motivate us to rethink this issue from
scratch, and this will now lead us to a much more compelling and both conceptually
and technically perfectly satisfactory general definition of the energy-momentum tensor
of any Lagrangian field theory in a gravitational field.
Thus let us think about this issue from a Lagrangian, action-based, perspective. So
far we have discussed what is the appropriate form of the action for matter fields in a gravitational field, namely a generally covariant action
$$ S_{\rm matter} = S_M[\phi; g_{\alpha\beta}] = \int\sqrt{g}\,d^4x\; L_M(\phi, \partial_\alpha\phi, \ldots, g_{\alpha\beta}, \ldots) \tag{6.87} $$
for the matter fields $\phi$ in a gravitational background $g_{\alpha\beta}$, obtained e.g. by the minimal coupling description and thus describing the dynamics of the fields in a gravitational background and encoding the coupling of the matter fields to gravity. Ultimately, this action should then be one part of the total gravitational + matter action describing the dynamics of the matter fields and of the gravitational field,
Since the gravitational field is described by the (now dynamical) variables $g_{\alpha\beta}(x)$, we can write this marginally more explicitly as
$$ S[g_{\alpha\beta}, \phi] = S_g[g_{\alpha\beta}] + S_M[\phi; g_{\alpha\beta}] . \tag{6.89} $$
The precise form of the gravitational action Sg will not be relevant here - this is some-
thing that we will discuss at length in section 19. All we need to keep in mind is that
this action is to provide us with the gravitational part of the gravitational field equa-
tions, i.e. with the appropriate tensorial generalisation of the left-hand side of the
Newtonian field equation $\Delta\phi = 4\pi G_N\,\rho$.
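Variation of this total action with respect to the matter fields is equivalent to the variation of the matter action $S_M$ alone with respect to the matter fields,
$$ \frac{\delta S[g_{\alpha\beta}, \phi]}{\delta\phi} = 0 \quad\Leftrightarrow\quad \frac{\delta S_M[\phi; g_{\alpha\beta}]}{\delta\phi} = 0 , \tag{6.90} $$
and will thus simply give rise to the equations of motion of the matter fields in a gravitational field, as required.
Now let us consider the variation of the total action with respect to the gravitational dynamical variables $g_{\alpha\beta}$,
$$ \frac{\delta S[g_{\alpha\beta}, \phi]}{\delta g_{\alpha\beta}} = \frac{\delta S_g[g_{\alpha\beta}]}{\delta g_{\alpha\beta}} + \frac{\delta S_M[\phi; g_{\alpha\beta}]}{\delta g_{\alpha\beta}} \overset{!}{=} 0 \tag{6.91} $$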
Variation of the gravitational action with respect to the gravitational field $g_{\alpha\beta}$ will give us the gravitational part of the field equations. Thus variation of the matter action with respect to the gravitational field will give us the source term for the gravitational field equations provided by the matter fields,
$$ \frac{\delta S_M[\phi; g_{\alpha\beta}]}{\delta g_{\alpha\beta}} = \text{Source of Gravity} . \tag{6.92} $$
On the other hand, as recalled in the introduction to this section (section 6.1), we expect the energy-momentum tensor to act as the source of gravity. Therefore we should simply define the energy-momentum tensor by this relation,
$$ T^{\alpha\beta} := \text{Source of Gravity} \quad\Rightarrow\quad T^{\alpha\beta} \sim \frac{\delta S_M[\phi; g_{\alpha\beta}]}{\delta g_{\alpha\beta}} . \tag{6.93} $$
We will fix the proportionality factor momentarily.
Note that this is precisely analogous to the way a source term for the Maxwell equations, a current $J^\alpha$, arises from the variation of the coupled matter-Maxwell action with respect to the gauge field $A_\alpha$ (5.54),
$$ J^\alpha \sim \frac{\delta S_M[\phi, A_\alpha]}{\delta A_\alpha} . \tag{6.94} $$
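In order to test this suggestion, let us take a look at our two standard examples, a scalar field and Maxwell theory. For a scalar field, the minimally coupled action is (5.13)
$$ S[\phi, g_{\alpha\beta}] = \int\sqrt{g}\,d^4x\left(-\tfrac12 g^{\alpha\beta}\partial_\alpha\phi\,\partial_\beta\phi - V(\phi)\right) . \tag{6.95} $$
Since the action depends explicitly on the inverse metric, it is more convenient to determine the variation of the action under variations
$$ g^{\alpha\beta} \to g^{\alpha\beta} + \delta g^{\alpha\beta} \tag{6.96} $$
of the inverse metric. Under such a variation, the volume factor $\sqrt{g}$ varies as (4.83)
$$ \delta\sqrt{g} = -\tfrac12\sqrt{g}\,g_{\alpha\beta}\,\delta g^{\alpha\beta} . \tag{6.97} $$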
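The variation formula (6.97), $\delta\sqrt{g} = -\tfrac12\sqrt{g}\,g_{\alpha\beta}\,\delta g^{\alpha\beta}$, can be checked against a finite difference. The following sketch does this for a two-dimensional Lorentzian metric. The numerical values of the inverse metric and of its variation are hypothetical, and $\sqrt{g}$ is taken to mean $\sqrt{|\det g_{\alpha\beta}|}$.

```python
import math

def det2(m):
    """Determinant of a 2x2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def inverse2(m):
    """Inverse of a 2x2 matrix."""
    d = det2(m)
    return [[m[1][1] / d, -m[0][1] / d], [-m[1][0] / d, m[0][0] / d]]

def sqrt_g(g_inv):
    """sqrt(|det g_ab|), computed from the inverse metric g^ab."""
    return math.sqrt(abs(det2(inverse2(g_inv))))

def predicted_variation(g_inv, dg_inv):
    """First-order formula: -1/2 sqrt(g) g_ab delta g^ab."""
    g = inverse2(g_inv)
    contraction = sum(g[a][b] * dg_inv[a][b] for a in range(2) for b in range(2))
    return -0.5 * sqrt_g(g_inv) * contraction

g_inv = [[-0.8, 0.1], [0.1, 1.5]]   # hypothetical Lorentzian inverse metric
eps = 1e-6                          # size of the variation
dg_inv = [[0.3 * eps, -0.2 * eps], [-0.2 * eps, 0.5 * eps]]

shifted = [[g_inv[a][b] + dg_inv[a][b] for b in range(2)] for a in range(2)]
exact = sqrt_g(shifted) - sqrt_g(g_inv)
predicted = predicted_variation(g_inv, dg_inv)
print(exact, predicted)
```

The two numbers agree up to terms of second order in the variation, as they should for a first-order formula.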
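Now let us look at Maxwell theory, our litmus test. In this case, the action is (5.50)
$$ S[A_\alpha, g_{\alpha\beta}] = -\tfrac14\int\sqrt{g}\,d^4x\; g^{\alpha\gamma}g^{\beta\delta}F_{\alpha\beta}F_{\gamma\delta} . \tag{6.101} $$
The variation of $\sqrt{g}$ is as before, and as regards the variation of the inverse metric, there is now an additional relative factor of two compared with the calculation for the scalar fields because the action depends quadratically on the inverse metric. Thus one has
$$ \delta S[A_\alpha, g_{\alpha\beta}] = -\tfrac12\int\sqrt{g}\,d^4x\left(g^{\gamma\delta}F_{\alpha\gamma}F_{\beta\delta} - \tfrac14 g_{\alpha\beta}F_{\gamma\delta}F^{\gamma\delta}\right)\delta g^{\alpha\beta} . \tag{6.102} $$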
Thus the metric variation of the matter action has given us on the nose the symmetric,
gauge invariant, on-shell conserved energy-momentum tensor of Maxwell theory, without
any need to appeal to any improvement procedures!
Thus, when it comes to defining the energy-momentum tensor for Maxwell theory, the
above approach based on the variation of the matter action with respect to the metric
wins hands down over the painful canonical definition based on Noether's theorem for
translations and the Belinfante improvement procedure combined with minimal cou-
pling.
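Encouraged by this, we now define the energy-momentum tensor $T_{\alpha\beta}$ in general by
$$ \delta_{\rm metric}S_M[\phi, g_{\alpha\beta}] = -\tfrac12\int\sqrt{g}\,d^4x\; T_{\alpha\beta}\,\delta g^{\alpha\beta} , \tag{6.105} $$
or, equivalently,
$$ T_{\alpha\beta} := -\frac{2}{\sqrt{g}}\,\frac{\delta}{\delta g^{\alpha\beta}}S_M[\phi, g_{\alpha\beta}] . \tag{6.106} $$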
Even though, as we have seen, there are other definitions of the energy-momentum ten-
sor, this is the modern, and by far the most useful, definition of the energy-momentum
tensor, namely as the response of the matter action to a variation of the metric (equiv-
alently, as the source of gravity).
Moreover, crucially for the present context, whatever the virtues of other definitions
may be, from the variational principle for general relativity it is this energy-momentum
tensor that plays the role of the source term for the Einstein equations.
Remarks:
1. The energy-momentum tensor as defined by (6.105) or (6.106) is frequently called
the metric energy-momentum (or stress-energy) tensor, or also the Hilbert or
Rosenfeld energy-momentum tensor. It is sometimes also referred to as the gravi-
tational energy-momentum tensor, but that is confusing as it does not describe the
energy-momentum of the gravitational field itself, a more mysterious and elusive
quantity we will briefly look at and for in section 21.6.
I prefer the attribute covariant, to distinguish it from what is usually called the
canonical Noether energy-momentum tensor. Thus, even though this terminology
is not standard, I will henceforth refer to $T_{\alpha\beta}$ as defined by (6.105) or (6.106), as
the Covariant Energy-Momentum Tensor.
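4. When the minimally coupled matter Lagrangian depends only on the metric and not on the first derivatives of the metric (i.e. not on the Christoffel symbols), as in the case of scalar or Maxwell gauge fields, then more explicitly the covariant energy-momentum tensor can be written as (and calculated from)
$$ T_{\alpha\beta}(x) = -\frac{2}{\sqrt{g}}\,\frac{\partial(\sqrt{g}\,L_M(x))}{\partial g^{\alpha\beta}(x)} = -2\,\frac{\partial L_M(x)}{\partial g^{\alpha\beta}(x)} + g_{\alpha\beta}(x)\,L_M(x) \tag{6.109} $$
or
$$ T^{\alpha\beta}(x) = +2\,\frac{\partial L_M(x)}{\partial g_{\alpha\beta}(x)} + g^{\alpha\beta}(x)\,L_M(x) . \tag{6.110} $$
Here the sign change is due to the fact that $\delta g^{\alpha\beta}$ denotes the variation of the inverse metric, not the contravariant components of $\delta g_{\alpha\beta}$. Thus it is not the same as $g^{\alpha\gamma}g^{\beta\delta}\delta g_{\gamma\delta}$, but rather minus this expression,
$$ \delta g^{\alpha\beta} = -g^{\alpha\gamma}g^{\beta\delta}\,\delta g_{\gamma\delta} , \tag{6.111} $$
$$ 0 = \delta(g^{\alpha\beta}g_{\beta\gamma}) = (\delta g^{\alpha\beta})\,g_{\beta\gamma} + g^{\alpha\beta}\,\delta g_{\beta\gamma} \quad\Rightarrow\quad \delta g^{\alpha\beta} = -g^{\alpha\gamma}g^{\beta\delta}\,\delta g_{\gamma\delta} . \tag{6.112} $$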
5. The definition (6.105) or the explicit expression (6.109) also provides an efficient strategy to determine the energy-momentum tensor even if one is just interested in Poincaré-invariant field theories in Minkowski space:
In order to determine a symmetric, gauge invariant, and on-shell conserved energy-momentum tensor $T_{ab}$ for such a theory, one
It can be shown that for fields of any spin this energy-momentum tensor agrees on-shell with what one could have also obtained by invoking the Belinfante improvement procedure of the Noether energy-momentum tensor,
$$ \tilde\Theta_{ab} = T_{ab} \quad\text{on-shell} \tag{6.114} $$
6. When the minimally coupled matter action depends also on the first derivatives
of the metric, through the covariant derivative of some (non-scalar) field ,
say, by the usual rules of variational calculus there will be additional contributions
to the energy-momentum tensor, arising from an integration by parts of
Z
4 LM (x)
gd x 2 ( )(x)
(x)
We consider the situation where the minimally coupled matter action happens to be
invariant under Weyl rescalings, i.e. under rescalings of the metric
In particular, thus, we consider the (admittedly very special) situation where one has
such a symmetry without any accompanying transformation of the matter fields. The
discussion can be extended to the case where also a transformation of the matter fields
is required, but for present purposes this special case is good enough (see the end of
this section for a comment on the general case).
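Examples of such actions are e.g. the action of a massless scalar field (5.11) in $D = 2$ (space-time) dimensions
$$ S[\phi, g_{\alpha\beta}] = -\tfrac12\int d^2x\,\sqrt{g}\,g^{\alpha\beta}\partial_\alpha\phi\,\partial_\beta\phi \tag{6.118} $$
Indeed, in that case the metric dependence of the action is precisely such that the combination of the determinant $\sqrt{g}$ and the inverse metric that appears is invariant under Weyl rescalings,
$$ g_{\alpha\beta} \to e^{2\omega}g_{\alpha\beta} \quad\Rightarrow\quad \begin{cases} D = 2: & \sqrt{g}\,g^{\alpha\beta} \to \sqrt{g}\,g^{\alpha\beta} \\ D = 4: & \sqrt{g}\,g^{\alpha\beta}g^{\gamma\delta} \to \sqrt{g}\,g^{\alpha\beta}g^{\gamma\delta} \end{cases} \tag{6.120} $$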
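This is reflected in the fact that the corresponding energy-momentum tensor is traceless precisely in these dimensions: from (6.73) and (6.79) one finds
$$ \begin{aligned} T_{\alpha\beta} = \partial_\alpha\phi\,\partial_\beta\phi - \tfrac12 g_{\alpha\beta}\,(g^{\gamma\delta}\partial_\gamma\phi\,\partial_\delta\phi) \quad&\Rightarrow\quad T^\alpha{}_\alpha = -\tfrac12(D-2)\,g^{\alpha\beta}\partial_\alpha\phi\,\partial_\beta\phi \\ T_{\alpha\beta} = F_{\alpha\gamma}F_\beta{}^\gamma - \tfrac14 g_{\alpha\beta}F_{\gamma\delta}F^{\gamma\delta} \quad&\Rightarrow\quad T^\alpha{}_\alpha = -\tfrac14(D-4)\,F_{\alpha\beta}F^{\alpha\beta} . \end{aligned} \tag{6.121} $$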
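Both traces in (6.121) follow from nothing more than $g^{\alpha\beta}g_{\alpha\beta} = D$. As a short worked check (the index labels are a choice made for this illustration):

```latex
\begin{aligned}
g^{\alpha\beta}T_{\alpha\beta}\big|_{\text{scalar}}
 &= g^{\alpha\beta}\partial_\alpha\phi\,\partial_\beta\phi
    - \tfrac12 D\,g^{\gamma\delta}\partial_\gamma\phi\,\partial_\delta\phi
  = -\tfrac12(D-2)\,g^{\alpha\beta}\partial_\alpha\phi\,\partial_\beta\phi ,\\
g^{\alpha\beta}T_{\alpha\beta}\big|_{\text{Maxwell}}
 &= F_{\alpha\gamma}F^{\alpha\gamma}
    - \tfrac14 D\,F_{\gamma\delta}F^{\gamma\delta}
  = -\tfrac14(D-4)\,F_{\alpha\beta}F^{\alpha\beta} .
\end{aligned}
```

The traces therefore vanish identically for $D = 2$ and $D = 4$ respectively, which are precisely the dimensions in which the corresponding actions are Weyl invariant.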
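The relation between these two observations / assertions is provided by noting that if the matter action is invariant under Weyl rescalings one has
$$ \begin{aligned} 0 = \delta_\omega S_{\rm matter} &= -\tfrac12\int\sqrt{g}\,d^Dx\; T_{\alpha\beta}(x)\,\delta_\omega g^{\alpha\beta}(x) \\ &= \int\sqrt{g}\,d^Dx\; T_{\alpha\beta}(x)\,g^{\alpha\beta}(x)\,\omega(x) = \int\sqrt{g}\,d^Dx\; T^\alpha{}_\alpha(x)\,\omega(x) . \end{aligned} \tag{6.122} $$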
In the special case that we have considered here (invariance under scalings of the metric alone, without transforming the matter fields), this is true off-shell, i.e. without using the equations of motion for the matter fields. In the more general case of an invariance under joint Weyl rescalings of the metric and accompanying scalings of the matter fields, in the above chain of arguments one would need to also vary the matter action with respect to the matter fields to establish the invariance of the action. The term arising from the variation of the matter fields is evidently proportional to the Euler-Lagrange equations of the matter fields, and therefore in that case one could only conclude that $T^\alpha{}_\alpha = 0$ on-shell,
$$ \left.\begin{array}{l} \text{invariance under joint Weyl rescalings} \\ \text{of the metric and the matter fields} \end{array}\right\} \quad\Rightarrow\quad T^\alpha{}_\alpha = 0 \;\;\text{on-shell.} \tag{6.124} $$
An example of this is provided by the so-called conformally coupled scalar field. This
conformal coupling involves a space-time dependent mass term that represents a non-
minimal coupling of the scalar field to the scalar curvature (a contraction of the Riemann
curvature tensor to be introduced in section 7), and understanding the Weyl invariance
of this model requires a formula for the variation of the scalar curvature with respect
to the metric which we will derive in section 19.2. Therefore we will need to postpone
a discussion of this model to section 21.3.
As an aside, but as a concrete, and the simplest non-trivial, example, and an illustration
of the above remarks regarding Weyl invariance, let us consider a massless scalar field
in (1+1)-dimensions, in either the usual Minkowski coordinates, or in the Rindler coor-
dinates discussed in sections 1.3 and 2.8 (we will in particular make use of the results
in section 2.8).
Thus a natural basis of solutions to this equation is provided by the plane waves fk
exp(it + ikx), with k2 = 2 , i.e. k = , > 0, and their complex conjugates. For
a given there are thus two linearly-independent positive frequency solutions,
1
f (t, x) = 1/2
e i(t x)
(4)
(6.126)
1
g (t, x) = e i(t + x)
(4)1/2
(the normalisation factors are inserted for QFT-pedantry reasons only and are irrelevant
for the following). Thus the basis of solutions splits into right-movers or right-moving
modes $f_\omega$ and left-movers $g_\omega$. It is thus convenient to introduce the corresponding null
coordinates $u_M = t - x$, $v_M = t + x$ as in (2.155), in terms of which the solutions can
be written as
$$f_\omega = f_\omega(u_M) = \frac{1}{(4\pi\omega)^{1/2}}\, e^{-i\omega u_M} , \qquad g_\omega = g_\omega(v_M) = \frac{1}{(4\pi\omega)^{1/2}}\, e^{-i\omega v_M} . \tag{6.127}$$
That the solutions split in this way could have also been deduced from the form of the
wave operator in these lightcone (null) coordinates, namely $\Box = -4\,\partial_{u_M}\partial_{v_M}$, and the
ensuing solutions to the equation of motion,
$$\Box\phi = 0 \quad\Rightarrow\quad \phi(u_M, v_M) = f(u_M) + g(v_M) . \tag{6.128}$$
Here $f$ and $g$ can now be arbitrary wave packets constructed from the solutions $f_\omega$ and
$g_\omega$ respectively.
The energy-density $\varepsilon_M = T_{tt}$ of the scalar field with respect to Minkowski time is
$$\varepsilon_M = \tfrac{1}{2}\left((\partial_t\phi)^2 + (\partial_x\phi)^2\right) \tag{6.129}$$
and in terms of lightcone coordinates this splits into a sum of right-moving and
left-moving contributions,
$$\varepsilon_M = (\partial_{u_M}\phi)^2 + (\partial_{v_M}\phi)^2 , \tag{6.130}$$
with $f(u_M)$ evidently only contributing to the former and $g(v_M)$ to the latter.
Now let us consider the same issue in Rindler coordinates. In terms of the coordinates
$(\eta, \xi)$ (2.150), the metric takes the form (2.148)
$$ds^2 = e^{2a\xi}\left(-d\eta^2 + d\xi^2\right) . \tag{6.131}$$
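Note that, as mentioned in section 2.8, the metric in these coordinates is conformally
flat. Thus, by the reasoning in section 6.7 above, in particular the discussion around
equation (6.120), we know that the action and equation of motion for a scalar field
in Rindler coordinates will look just like those in Minkowski coordinates, with the
replacement $(t, x) \to (\eta, \xi)$,
$$S_R[\phi] = -\frac{1}{2}\int \sqrt{g}\, d\eta\, d\xi\; g^{\alpha\beta}\partial_\alpha\phi\,\partial_\beta\phi = -\frac{1}{2}\int d\eta\, d\xi\; \eta^{\alpha\beta}\partial_\alpha\phi\,\partial_\beta\phi , \tag{6.132}$$
and
$$\Box_g\phi = 0 \quad\Leftrightarrow\quad (-\partial_\eta^2 + \partial_\xi^2)\phi = 0 . \tag{6.133}$$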
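Thus by the same reasoning as above, the solutions can be split into left- and right-
movers and are conveniently written in terms of the Rindler lightcone coordinates (2.156)
$$(u_R, v_R) = \eta \mp \xi , \tag{6.134}$$
i.e. one has
$$\phi(u_R, v_R) = f(u_R) + g(v_R) . \tag{6.135}$$
Likewise, the energy-density $\varepsilon_R = T_{\eta\eta}$ with respect to Rindler time is
$$\varepsilon_R = \tfrac{1}{2}\left((\partial_\eta\phi)^2 + (\partial_\xi\phi)^2\right) \tag{6.136}$$
and in terms of lightcone coordinates this splits into a sum of right-moving and
left-moving contributions,
$$\varepsilon_R = (\partial_{u_R}\phi)^2 + (\partial_{v_R}\phi)^2 , \tag{6.137}$$
with $f(u_R)$ evidently only contributing to the former and $g(v_R)$ to the latter.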
The interest in these (fairly trivial) considerations lies in the fact that the exponential
relation (2.158) between the Minkowski and Rindler null coordinates,
$$u_M = -a^{-1} e^{-a u_R} , \qquad v_M = +a^{-1} e^{a v_R} , \tag{6.138}$$
reflecting the exponential redshift of a Rindler relative to an inertial observer (and vice-
versa), has a number of non-trivial and remarkable implications. I will just mention two
of them here:
1. The exponential redshift expressed by (6.138) implies that the right-moving energy
   densities in Minkowski and Rindler coordinates are related by
   $$\frac{\partial u_M}{\partial u_R} = -a u_M \quad\Rightarrow\quad (\partial_{u_M}\phi)^2 = \frac{1}{a^2 u_M^2}\,(\partial_{u_R}\phi)^2 \tag{6.139}$$
   (and likewise for the left-movers). Thus essentially any classical solution that is
   regarded as regular by the Rindler observer (finite and non-zero $\varepsilon_R$) corresponds
   to a divergent Minkowski energy-density as $u_M \to 0$, i.e. on the future boundary
   (horizon) $t = x$ of the Rindler wedge.
2. The exponential redshift expressed by (6.138) also implies that the notions of
   positive frequency with respect to Minkowski and Rindler time are inequivalent,
   e.g. in the sense that $f_\omega(u_M)$, restricted to the right Rindler-wedge $u_M < 0$, say,
   cannot be written as a superposition of Rindler right-moving positive frequency
   waves alone,
   $$f_\omega(u_M) \neq \int_0^\infty d\omega'\, \alpha(\omega, \omega')\, f_{\omega'}(u_R) . \tag{6.140}$$
   Of course, the $f_{\omega'}(u_R)$ and their complex conjugates $f^*_{\omega'}(u_R)$ provide a basis of
   solutions for the right-moving modes (in the right Rindler-wedge), so that one can
   certainly expand the Minkowski plane waves as
   $$f_\omega(u_M) = \int_0^\infty d\omega' \left(\alpha(\omega, \omega')\, f_{\omega'}(u_R) + \beta(\omega, \omega')\, f^*_{\omega'}(u_R)\right) , \tag{6.141}$$
   but necessarily with some of the $\beta(\omega, \omega') \neq 0$.
If you know a little bit of quantum field theory, you will be able to anticipate that
this means that the notions of creation and annihilation operators are inequivalent,
and that therefore what is the vacuum, say, for an inertial observer, will not be
seen as the vacuum by the accelerating observer (and vice-versa).
Combining the two facts, one also arrives at the conclusion that the Rindler
vacuum is singular both at the future horizon (from right-movers) and at the
past horizon (from left-movers).
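The relation (6.139) between the right-moving energy densities can be checked numerically. The following is only a minimal sketch: it assumes the exponential relation $u_M = -a^{-1}e^{-a u_R}$ in the right Rindler wedge (which is what (6.139) implies), and the values of `A` and `OMEGA` as well as the sample wave `phi` are illustrative choices, not taken from the notes.

```python
import math

A = 1.0      # acceleration parameter a (illustrative value, an assumption)
OMEGA = 1.0  # Rindler frequency of the sample wave (also an assumption)


def u_rindler(u_m):
    """Invert u_M = -(1/a) exp(-a u_R); only valid in the wedge u_M < 0."""
    return -math.log(-A * u_m) / A


def phi(u_r):
    """A right-moving wave that is perfectly regular in Rindler coordinates."""
    return math.sin(OMEGA * u_r)


def dphi_du_rindler(u_r):
    """Exact derivative of phi with respect to u_R."""
    return OMEGA * math.cos(OMEGA * u_r)


def dphi_du_minkowski(u_m):
    """Central finite difference of phi with respect to u_M."""
    h = 1e-6 * abs(u_m)
    return (phi(u_rindler(u_m + h)) - phi(u_rindler(u_m - h))) / (2 * h)


def both_sides(u_m):
    """Return (lhs, rhs) of (d_uM phi)^2 = (d_uR phi)^2 / (a u_M)^2."""
    lhs = dphi_du_minkowski(u_m) ** 2
    rhs = dphi_du_rindler(u_rindler(u_m)) ** 2 / (A * u_m) ** 2
    return lhs, rhs


if __name__ == "__main__":
    for u_m in (-1.0, -0.1, -0.01):
        print(u_m, both_sides(u_m))
```

Approaching the horizon $u_M \to 0$, the factor $1/(a u_M)^2$ makes the Minkowski energy density of this Rindler-regular wave grow without bound, apart from the isolated points where $\partial_{u_R}\phi$ happens to vanish.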
In the spirit of the equivalence principle (before studying gravity, let us study accelera-
tions in flat space), this Unruh Effect is a fascinating and rewarding first step towards
understanding (or appreciating the difficulties encountered by) quantum field theory
in curved space-times, i.e. in non-trivial gravitational fields. For more on this see the
references given in section 26.6.
In section 5.8 we had discussed how to obtain conserved charges from covariantly con-
served currents. Now in special relativity one can construct conserved currents (cor-
responding to the generators of Poincaré transformations) from the conserved energy-
momentum tensor, and hence from there the corresponding conserved charges like
energy, momentum and angular momentum. In this section we will take a first look at
the question whether, or to what extent, we can also obtain such conserved currents from
the covariantly conserved energy-momentum tensor in a gravitational field.
To set the stage, recall that in Special Relativity, if $T^{ab}$ is the energy-momentum tensor
of a physical system, it generally satisfies an equation of the form
$$\partial_a T^{ab} = G^b , \tag{6.142}$$
where $G^b$ represents the density of the external forces acting on the system. In par-
ticular, if there are no external forces, the divergence of the energy-momentum tensor
is zero. For example, in the case of Maxwell theory and a current corresponding to a
charged particle we have
$$G^b = J_a F^{ab} = -F^{b}{}_{a} J^a \sim -F^{b}{}_{a}\, \dot{\xi}^a , \tag{6.143}$$
which is indeed the relevant external (Lorentz) force density (in writing this I have
suppressed the $\delta$-function that localises the current to the worldline $\xi^a = \xi^a(\tau)$ of the
particle).
When there are no external forces, i.e. when one has taken into account the complete
matter action, the total energy-momentum tensor is conserved. In that case, $T^{ab} = J^{(b)a}$
defines four conserved currents, more or less (modulo Belinfante improvement terms,
see e.g. the discussion in sections 6.4 and 21.2 and the references given there) the
currents associated to translation invariance of the action via Noether's theorem. One
is thus in the setting of conserved currents of the previous section, and one can define
conserved quantities like total energy and momentum, $P^a$, and angular momentum $J^{ab}$,
by integrals of $T^{0a}$ or $\xi^a T^{0b} - \xi^b T^{0a}$ (the latter being conserved if $T_{ab}$ is symmetric) over
spacelike hypersurfaces.
We see that, due to the second term, this does not define four conserved currents in the
ordinary or covariant sense (and we will return to the interpretation of this equation,
and the related issue of energy and energy density of the gravitational field, in section
21.6).
Nevertheless, in analogy with special relativity, one might like to attempt to define
conserved quantities like total energy and momentum, $P^\mu$, and angular momentum
$J^{\mu\nu}$, by integrals of $T^{\mu 0}$ or $x^\mu T^{\nu 0} - x^\nu T^{\mu 0}$ over spacelike hypersurfaces. However, these
quantities are rather obviously neither covariant nor conserved.
This should perhaps not be too surprising because, after all, for a Poincaré-invariant field
theory in Minkowski space these quantities are preserved as a consequence of Poincaré
invariance, i.e. because of the symmetries (isometries) of the Minkowski metric (as well
as of the action).
A generic metric has no isometries whatsoever (the explicit examples of metrics in these
notes notwithstanding, all of which exhibit at least some symmetries). As it has no
symmetries, we have no reason to expect to find associated conserved quantities in
general.
However, if there are symmetries then one should indeed be able to define conserved
quantities (think of Noether's theorem again), one for each symmetry generator. In
order to implement this we need to understand how to define and detect isometries of
the metric. For this we need the concepts of Lie derivatives and Killing vectors. These
already made occasional brief appearances in previous sections and will be discussed
more systematically in section 8, the corresponding conserved charges then being the
subject of section 9.
Alternatively, one might try to just go ahead optimistically and attempt to construct
a covariant current-like object (with a corresponding conservation law and the ensuing
possibility to define conserved charges) by contracting the energy-momentum tensor not
with the coordinates but with a vector field $V^\mu$, along the lines of
$$J_V^\mu = T^{\mu\nu} V_\nu . \tag{6.145}$$
At least this now has the merit of clearly being a vector field, but is it conserved?
Calculating its covariant divergence, and using the fact that $T^{\mu\nu}$ is symmetric and
conserved, one finds
$$\nabla_\mu J_V^\mu = \tfrac{1}{2}\, T^{\mu\nu}\left(\nabla_\mu V_\nu + \nabla_\nu V_\mu\right) . \tag{6.146}$$
Thus we would have a conserved current (and associated conserved charge by the pre-
vious section) for any conserved energy-momentum tensor if the vector field $V$ were
such that it satisfies
$$\nabla_\mu V_\nu + \nabla_\nu V_\mu = 0 \quad\Rightarrow\quad \nabla_\mu (T^{\mu\nu} V_\nu) = 0 . \tag{6.147}$$
The link between this observation and the one in the preceding paragraph regarding
symmetries is that this is precisely the condition characterising (infinitesimal) symme-
tries of the metric:
• First of all, this is the condition we already found and encountered in (2.101), as
  reformulated in (4.65), for the infinitesimal coordinate transformation $\delta x^\mu = \epsilon V^\mu$
  to generate a symmetry of the metric, thus leading to a conserved charge for
  geodesics.
• More generally, as we will discuss in detail in section 8 below, vector fields satisfy-
  ing the equation $\nabla_\mu V_\nu + \nabla_\nu V_\mu = 0$ are indeed in one-to-one correspondence with
  infinitesimal generators of continuous symmetries of a metric (isometries).
Thus this gives a satisfactory and coherent overall picture of symmetries and conserva-
tion laws in a gravitational field.
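As a small illustration of the condition $\nabla_\mu V_\nu + \nabla_\nu V_\mu = 0$, the sketch below evaluates this symmetric combination numerically on the unit two-sphere, using its standard non-zero Christoffel symbols $\Gamma^\theta_{\phi\phi} = -\sin\theta\cos\theta$ and $\Gamma^\phi_{\theta\phi} = \cot\theta$. The choice of example (the rotation generator $\partial_\phi$ versus the non-symmetry $\partial_\theta$) and all function names are assumptions made for illustration only.

```python
import math


def christoffel(theta):
    """G[c][a][b] = Gamma^c_{ab} for the unit 2-sphere; index 0 = theta, 1 = phi."""
    G = [[[0.0, 0.0], [0.0, 0.0]] for _ in range(2)]
    G[0][1][1] = -math.sin(theta) * math.cos(theta)
    G[1][0][1] = G[1][1][0] = math.cos(theta) / math.sin(theta)
    return G


def rotation_lower(x):
    """V = d/dphi with lowered index: V_a = g_ab V^b = (0, sin^2 theta)."""
    return [0.0, math.sin(x[0]) ** 2]


def meridian_lower(x):
    """W = d/dtheta with lowered index: W_a = (1, 0)."""
    return [1.0, 0.0]


def killing_defect(v_lower, x, h=1e-6):
    """Return the matrix nabla_a V_b + nabla_b V_a at the point x = (theta, phi)."""
    G = christoffel(x[0])
    v = v_lower(x)

    def partial(a, b):
        # central finite difference for partial_a V_b
        xp, xm = list(x), list(x)
        xp[a] += h
        xm[a] -= h
        return (v_lower(xp)[b] - v_lower(xm)[b]) / (2 * h)

    # nabla_a V_b = partial_a V_b - Gamma^c_{ab} V_c
    nabla = [[partial(a, b) - sum(G[c][a][b] * v[c] for c in range(2))
              for b in range(2)] for a in range(2)]
    return [[nabla[a][b] + nabla[b][a] for b in range(2)] for a in range(2)]


if __name__ == "__main__":
    point = [0.7, 1.2]
    print(killing_defect(rotation_lower, point))  # vanishes up to rounding
    print(killing_defect(meridian_lower, point))  # phi-phi entry is sin(2 theta)
```

For $\partial_\phi$ the combination vanishes, so $T^{\mu\nu}V_\nu$ would be a conserved current for any symmetric conserved $T^{\mu\nu}$ on this space, whereas for $\partial_\theta$ the $\phi\phi$-component $2\sin\theta\cos\theta$ obstructs conservation.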
7 Curvature I: The Riemann Curvature Tensor
We now come to one of the most important concepts of General Relativity and Rie-
mannian Geometry, that of curvature and how to describe it in tensorial terms. Among
other things, this will finally allow us to decide unambiguously if a given metric is just
the (flat) Minkowski metric in disguise or the metric of a genuinely curved space (but
a proof of this statement is postponed to section 10). More importantly (for present
purposes) it will allow us to construct tensors that depend on the 2nd derivatives of
the metric and will thus allow us to construct tensorial (generally covariant) differen-
tial equations for the metric. In particular, this will then lead us fairly directly to the
Einstein equations (section 18), i.e. to the field equations for the gravitational field.
Recall that the equations that describe the behaviour of particles and fields in a gravi-
tational field involve the metric and the Christoffel symbols determined by the metric.
Thus the equations for the gravitational field should be generally covariant (tensorial)
differential equations for the metric.
At first, here we seem to face a dilemma. How can we write down covariant differential
equations for the metric when the covariant derivative of the metric is identically zero?
Having come to this point, Einstein himself reached an impasse and required the help
of his mathematician friend Marcel Grossmann ("Grossmann, you have to help me, or
else I'll go crazy!") whom he had asked to investigate if there were any tensors that
could be built from the second derivatives of the metric.
Grossmann soon found that this problem had indeed been addressed and solved in the
mathematics literature, in particular by Riemann (generalising work of Gauss on curved
surfaces), Ricci-Curbastro and Levi-Civita. It was shown by them that there are indeed
non-trivial tensors that can be constructed from (ordinary) derivatives of the metric.
These can then be used to write down covariant differential equations for the metric.17
The most important among these are the Riemann curvature tensor and its various
contractions. In fact, it is known that these are the only tensors that can be constructed
from the metric and its first and second derivatives, and they will therefore play a central
role in all that follows.
Technically the most straightforward way of introducing the Riemann curvature tensor is
via the commutator of covariant derivatives. In this section we will adopt this pragmatic
(and relatively streamlined) approach, as it is sufficient to
• determine the most important algebraic and differential properties of the curvature
  tensor (symmetries and Bianchi identities)
• assess its physical significance (gravitational tidal forces) via the influence of the
  curvature tensor on the motion of (families of) freely falling particles
• and to thus provide us with all the information and ingredients we need to then
  discuss the Einstein equations (section 18) and their formulation in terms of an
  action principle (section 19).
17
Of course, the story is not as simple and straightforward as that. For an account of Marcel Gross-
mann's (often overlooked) contributions to tensor calculus and the development of general relativity, see
T. Sauer, Marcel Grossmann and his contribution to the general theory of relativity, arXiv:1312.4068
[physics.hist-ph].
However, this is not geometrically the most intuitive way to introduce the concept
of curvature, and it downplays the extent to which the curvature tensor reflects and
encodes the geometric properties of space time and, more generally, does not do justice
to the fundamental differential geometric notion and significance of curvature. Some of
these aspects are discussed in Part B of these notes, in particular in sections 10, 11, 12
and 13.
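$$[\nabla_\mu, \nabla_\nu](\phi V^\lambda) = \phi\, [\nabla_\mu, \nabla_\nu] V^\lambda \tag{7.1}$$
for any scalar field $\phi$. This implies that $[\nabla_\mu, \nabla_\nu]V^\lambda$ cannot depend on derivatives of $V$
because if it did it would also have to depend on derivatives of $\phi$. Hence it can be written as
$$[\nabla_\mu, \nabla_\nu] V^\lambda = R^\lambda{}_{\sigma\mu\nu} V^\sigma . \tag{7.2}$$
This can of course also be verified by a direct calculation, and we will come back to
this below. For now let us just note that, since the left hand side of this equation is
clearly a tensor for any $V$, the quotient theorem implies that the quantities $R^\lambda{}_{\sigma\mu\nu}$ are
the components of a tensor.
To establish (7.1), one calculates
$$\nabla_\mu \nabla_\nu (\phi V^\lambda) = (\nabla_\mu \nabla_\nu \phi) V^\lambda + (\nabla_\nu \phi)(\nabla_\mu V^\lambda) + (\nabla_\mu \phi)(\nabla_\nu V^\lambda) + \phi\, \nabla_\mu \nabla_\nu V^\lambda . \tag{7.3}$$
Thus, upon taking the commutator the 2nd and 3rd terms drop out (because the 3rd is
the symmetrisation of the 2nd), and we are left with
$$[\nabla_\mu, \nabla_\nu](\phi V^\lambda) = ([\nabla_\mu, \nabla_\nu]\phi) V^\lambda + \phi\, [\nabla_\mu, \nabla_\nu] V^\lambda = \phi\, [\nabla_\mu, \nabla_\nu] V^\lambda , \tag{7.4}$$
where the last equality follows from the fact that 2nd covariant derivatives do commute on
scalars. Thus we have established (7.1).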
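By explicitly calculating the commutator, one can confirm the structure displayed in
(7.2). This explicit calculation shows that the Riemann-Christoffel Curvature Tensor
(or Riemann tensor for short) is given by
$$R^\lambda{}_{\sigma\mu\nu} = \partial_\mu \Gamma^\lambda_{\sigma\nu} - \partial_\nu \Gamma^\lambda_{\sigma\mu} + \Gamma^\lambda_{\mu\rho}\Gamma^\rho_{\nu\sigma} - \Gamma^\lambda_{\nu\rho}\Gamma^\rho_{\mu\sigma} \tag{7.5}$$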
Remarks:
1. Note how useful the quotient theorem is in this case. It would be quite unpleasant
to have to verify the tensorial nature of this expression by explicitly checking its
behaviour under coordinate transformations.
2. Note also that this tensor is clearly zero for the Minkowski metric written in
Cartesian coordinates. Hence it is also zero for the Minkowski metric written in
any other coordinate system. We will prove the converse, that vanishing of the
Riemann curvature tensor implies that the metric is (locally) equivalent to the
Minkowski metric, in section 10.2.
3. In the above we have defined the Riemann tensor by the relation (7.2) and then
deduced the explicit expression (7.5). While this is, pragmatically speaking, a
useful way of proceeding, it may be more logical to initially define the Riemann
tensor in a different way, e.g. directly by (7.5) (for instance because by painful
calculations one has discovered that this particular combination of non-tensorial
objects miraculously happens to transform as a tensor). In that case, (7.2) is a
result rather than a definition, known as the Ricci identity.
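We will see later that the Riemann tensor is anti-symmetric in its first two indices.
Hence we can also write
$$[\nabla_\mu, \nabla_\nu] V_\rho = R_\rho{}^\sigma{}_{\mu\nu} V_\sigma . \tag{7.7}$$
The extension to arbitrary (p, q)-tensors now follows the usual pattern, with one Rie-
mann curvature tensor, contracted as for vectors, appearing for each of the p upper
indices, and one Riemann curvature tensor, contracted as for covectors, for each of the
q lower indices. Thus, e.g. for a (2, 0)-tensor $T^{\lambda\rho}$ and a (1, 1)-tensor $A^\lambda{}_\rho$ one has
$$[\nabla_\mu, \nabla_\nu] T^{\lambda\rho} = R^\lambda{}_{\sigma\mu\nu} T^{\sigma\rho} + R^\rho{}_{\sigma\mu\nu} T^{\lambda\sigma} \tag{7.8}$$
$$[\nabla_\mu, \nabla_\nu] A^\lambda{}_\rho = R^\lambda{}_{\sigma\mu\nu} A^\sigma{}_\rho - R^\sigma{}_{\rho\mu\nu} A^\lambda{}_\sigma . \tag{7.9}$$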
I will give two other versions of the fundamental formula (7.2) which are occasionally
useful and used.
2. Secondly, one can consider a net of curves $x^\mu(s_1, s_2)$ parametrising, say, a two-
   dimensional surface, and look at the commutators of the covariant derivatives
   along the $s_1$- and $s_2$-curves. The formula one obtains in this case (it can be
   obtained from (7.10) by noting that $X$ and $Y$ commute in this case) is
   $$(D_{s_1} D_{s_2} - D_{s_2} D_{s_1}) V^\lambda = R^\lambda{}_{\sigma\mu\nu}\, \frac{dx^\mu}{ds_1} \frac{dx^\nu}{ds_2}\, V^\sigma , \tag{7.11}$$
   where $D_{s_k}$ denotes the covariant derivative along the curve parametrised by $s_k$,
   i.e. (section 4.7)
   $$D_{s_k} = \frac{\partial x^\mu(s_1, s_2)}{\partial s_k}\, \nabla_\mu . \tag{7.12}$$
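In general, to read off all the symmetries from the formula (7.5) is difficult. One way
to simplify things is to look at the Riemann curvature tensor at the origin $x_0$ of a
Riemann normal coordinate system (or some other inertial coordinate system). In that
case, all the first derivatives of the metric disappear and only the first two terms of (7.5)
contribute. One finds
$$\begin{aligned}
R_{\lambda\sigma\mu\nu}(x_0) &= g_{\lambda\rho}\left(\partial_\mu \Gamma^\rho_{\sigma\nu} - \partial_\nu \Gamma^\rho_{\sigma\mu}\right)(x_0) \\
&= \left(\partial_\mu \Gamma_{\lambda\sigma\nu} - \partial_\nu \Gamma_{\lambda\sigma\mu}\right)(x_0) \\
&= \tfrac{1}{2}\left(g_{\lambda\nu,\sigma\mu} + g_{\sigma\mu,\lambda\nu} - g_{\lambda\mu,\sigma\nu} - g_{\sigma\nu,\lambda\mu}\right)(x_0) .
\end{aligned} \tag{7.13}$$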
In principle, this expression is sufficiently simple to allow one to read off all the symme-
tries of the Riemann tensor. However, it is more insightful to derive these symmetries
in a different way, one which will also make clear why the Riemann tensor has these
symmetries.
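1. Anti-symmetry in the second pair of indices
   $$R_{\lambda\sigma\mu\nu} = -R_{\lambda\sigma\nu\mu} \tag{7.14}$$
2. Anti-symmetry in the first pair of indices
   $$R_{\lambda\sigma\mu\nu} = -R_{\sigma\lambda\mu\nu} \tag{7.15}$$
   This is a consequence of the fact that the metric is covariantly constant. In fact,
   we can calculate
   $$\begin{aligned}
   0 &= [\nabla_\mu, \nabla_\nu]\, g_{\lambda\rho} \\
   &= -\left(R^\sigma{}_{\lambda\mu\nu}\, g_{\sigma\rho} + R^\sigma{}_{\rho\mu\nu}\, g_{\lambda\sigma}\right) \\
   &= -\left(R_{\rho\lambda\mu\nu} + R_{\lambda\rho\mu\nu}\right) .
   \end{aligned} \tag{7.16}$$
3. Cyclic permutation symmetry
   $$R_{\lambda[\sigma\mu\nu]} = 0 \quad\Leftrightarrow\quad R_{\lambda\sigma\mu\nu} + R_{\lambda\nu\sigma\mu} + R_{\lambda\mu\nu\sigma} = 0 \tag{7.17}$$
   This Bianchi identity is a consequence of the fact that there is no torsion. In fact,
   applying $[\nabla_\mu, \nabla_\nu]$ to the covector $\nabla_\sigma\phi$, $\phi$ a scalar, one has
   $$\nabla_{[\mu}\nabla_\nu\nabla_{\sigma]}\phi = 0 \quad\Rightarrow\quad R^\lambda{}_{[\sigma\mu\nu]}\nabla_\lambda\phi = 0 . \tag{7.18}$$
   As this has to be true for all scalars $\phi$, this implies $R^\lambda{}_{[\sigma\mu\nu]} = 0$ (to see this
   you could e.g. choose the (locally defined) coordinate functions $\phi(x) = x^\rho$ with
   $\nabla_\lambda\phi = \delta^\rho_\lambda$).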
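4. Symmetry under exchange of the two pairs of indices
   $$R_{\lambda\sigma\mu\nu} = R_{\mu\nu\lambda\sigma} \tag{7.19}$$
   This identity, stating that the Riemann tensor is symmetric in its two pairs of
   indices, is not an independent symmetry but can be deduced from the three other
   symmetries by some not particularly interesting algebraic manipulations. One
   (quite possibly not optimal or minimal) possibility is
   $$\begin{aligned}
   R_{\lambda\sigma\mu\nu} &\overset{(3)}{=} -R_{\lambda\nu\sigma\mu} - R_{\lambda\mu\nu\sigma} \\
   &\overset{(2)}{=} R_{\nu\lambda\sigma\mu} + R_{\mu\lambda\nu\sigma} \\
   &\overset{(3)}{=} -R_{\nu\mu\lambda\sigma} - R_{\nu\sigma\mu\lambda} - R_{\mu\sigma\lambda\nu} - R_{\mu\nu\sigma\lambda} \\
   &\overset{(1,2)}{=} 2R_{\mu\nu\lambda\sigma} + R_{\sigma\nu\mu\lambda} - R_{\sigma\mu\nu\lambda} \\
   &\overset{(3)}{=} 2R_{\mu\nu\lambda\sigma} - R_{\sigma\lambda\nu\mu} \\
   &\overset{(1,2)}{=} 2R_{\mu\nu\lambda\sigma} - R_{\lambda\sigma\mu\nu} ,
   \end{aligned} \tag{7.20}$$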
while
(1,2)
1 + 2 3 4 = 2R 2R . (7.23)
We can now count how many independent components the Riemann tensor really has.
(1) implies that the second pair of indices can only take $N = (4 \times 3)/2 = 6$ independent
values. (2) implies the same for the first pair of indices. (4) thus says that the Riemann
curvature tensor behaves like a symmetric $(6 \times 6)$ matrix and therefore has $(6 \times 7)/2 = 21$
components. We now come to the remaining condition (3): if two of the indices in (3)
are equal, (3) is equivalent to (4) and (4) we have already taken into account. With
all indices unequal, (3) then provides one and only one more additional constraint. We
conclude that the total number of independent components is 20.
18
See e.g. D. Bleecker, Gauge Theory and Variational Principles.
Remarks:
1. Note that this agrees precisely with our previous counting in section 2.10 of how
   many of the second derivatives of the metric cannot be set to zero by a coordinate
   transformation: the second derivative of the metric has 100 independent compo-
   nents, to be compared with the $4 \times (4 \times 5 \times 6)/(2 \times 3) = 80$ components of the
   third derivatives of the coordinates. This also leaves 20 components. We thus see
   very explicitly that the Riemann curvature tensor contains all the coordinate-
   independent information about the geometry up to second derivatives of the metric.
In fact, it can be shown that in a Riemann normal coordinate system one has
2. Just for the record, I note here that in general dimension $D = d + 1$ the Riemann
   tensor has $D^2(D^2 - 1)/12$ independent components. This number arises as
   $$\frac{D^2(D^2 - 1)}{12} = \frac{N(N+1)}{2} - \binom{D}{4} , \qquad N = \frac{D(D-1)}{2} \tag{7.25}$$
   and describes (as above) the number of independent components of a symmetric
   $(N \times N)$-matrix, now subject to $\binom{D}{4}$ conditions which arise from all the possibilities
   of choosing 4 out of $D$ possible distinct values for the indices in (3). Just as for
   $D = 4$, this number of components of the Riemann tensor coincides with the
   number of second derivatives of the metric minus the number of independent
   components of the third derivatives of the coordinates determined in (2.236),
   $$\frac{D(D+1)}{2} \times \frac{D(D+1)}{2} - D \times \frac{D(D+1)(D+2)}{2 \times 3} = \frac{D^2(D^2 - 1)}{12} . \tag{7.26}$$
   For $D = 2$ this formula predicts one independent component, and this is as it
   should be. Rather obviously the only independent non-vanishing component of
   the Riemann tensor in this case is $R_{1212}$. We will discuss curvature in 2 dimensions
   in more detail in sections 7.6 and 10.3 below.
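The two counting arguments (7.25) and (7.26) are easy to cross-check by brute arithmetic. The following short sketch (function names are illustrative choices) evaluates both counts and the closed form $D^2(D^2-1)/12$ for a range of dimensions.

```python
from math import comb


def count_from_symmetries(D):
    """(7.25): symmetric (N x N)-matrix minus the binom(D, 4) cyclic conditions."""
    N = D * (D - 1) // 2          # independent values of an anti-symmetric index pair
    return N * (N + 1) // 2 - comb(D, 4)


def count_from_derivatives(D):
    """(7.26): second derivatives of the metric minus third derivatives of the coordinates."""
    metric_components = D * (D + 1) // 2
    second_derivs_of_metric = metric_components * metric_components
    third_derivs_of_coords = D * (D * (D + 1) * (D + 2) // 6)
    return second_derivs_of_metric - third_derivs_of_coords


def closed_form(D):
    """D^2 (D^2 - 1) / 12, which is always an integer."""
    return D**2 * (D**2 - 1) // 12


if __name__ == "__main__":
    for D in range(1, 8):
        print(D, count_from_symmetries(D), count_from_derivatives(D), closed_form(D))
```

For $D = 2, 3, 4$ all three expressions give 1, 6 and 20 respectively, in agreement with the counts quoted in the text.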
Finally, a word of warning: there are a large number of sign conventions involved
in the definition of the Riemann tensor (and its contractions we will discuss below),
so whenever reading a book or article, in particular when you want to use results or
equations presented there, make sure you know what conventions are being used and
either adopt those or translate the results into some other convention. As a check: the
conventions used here are such that $R_{\theta\phi\theta\phi}$ as well as the curvature scalar (to be
introduced below) are positive for the standard metric on the two-sphere.
7.4 Influence of Curvature on Particle Trajectories
In a certain sense the main effect of curvature (or gravity) is that initially parallel
trajectories of freely falling non-interacting particles (dust, pebbles,. . . ) do not remain
parallel, i.e. that gravity is an attractive force that has the tendency to focus matter.
This statement finds its mathematically precise formulation in equations describing the
influence of space-time curvature on the behaviour of (families of) geodesics.
Let us, as we will need this later anyway, recall first the situation in the Newtonian
theory. One particle moving under the influence of a gravitational field is governed by
the equation
$$\frac{d^2}{dt^2}\, x^i = -\partial^i \phi(x) , \tag{7.27}$$
where $\phi$ is the potential. Now consider a family of particles, or just two nearby particles,
one at $x^i(t)$ and the other at $x^i(t) + \delta x^i(t)$. The other particle will of course obey the
equation
$$\frac{d^2}{dt^2}\, (x^i + \delta x^i) = -\partial^i \phi(x + \delta x) . \tag{7.28}$$
From these two equations one can deduce an equation for $\delta x$ itself, namely
$$\frac{d^2}{dt^2}\, \delta x^i = -\partial^i \partial_j \phi(x)\, \delta x^j . \tag{7.29}$$
It describes the effect of gravitational tidal forces (the gradient of the gravitational force)
on a family of particles moving in a gravitational field.
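To see the linearisation leading to (7.29) at work, one can compare the exact difference of the accelerations of two nearby particles with the tidal term $-\partial^i\partial_j\phi\,\delta x^j$. The sketch below does this for the point-mass potential $\phi = -GM/r$; the choice of potential, the units $GM = 1$ and all function names are assumptions made purely for illustration.

```python
import math

GM = 1.0  # units with G*M = 1 (an assumption for illustration)


def acceleration(x):
    """Newtonian acceleration -grad(phi) for the potential phi = -GM/r."""
    r = math.sqrt(sum(c * c for c in x))
    return [-GM * c / r**3 for c in x]


def tidal_tensor(x):
    """K_ij = d_i d_j phi = GM (delta_ij / r^3 - 3 x_i x_j / r^5)."""
    r = math.sqrt(sum(c * c for c in x))
    n = len(x)
    return [[GM * ((1.0 if i == j else 0.0) / r**3 - 3.0 * x[i] * x[j] / r**5)
             for j in range(n)] for i in range(n)]


def exact_relative_acceleration(x, dx):
    """Difference of the right-hand sides of (7.28) and (7.27)."""
    a0 = acceleration(x)
    a1 = acceleration([xi + di for xi, di in zip(x, dx)])
    return [p - q for p, q in zip(a1, a0)]


def linearised_relative_acceleration(x, dx):
    """Right-hand side of (7.29): -K_ij dx^j."""
    K = tidal_tensor(x)
    n = len(x)
    return [-sum(K[i][j] * dx[j] for j in range(n)) for i in range(n)]


if __name__ == "__main__":
    x, dx = [1.0, 0.0, 0.0], [1e-4, 2e-4, 0.0]
    print(exact_relative_acceleration(x, dx))
    print(linearised_relative_acceleration(x, dx))
```

The two results agree up to terms of second order in $\delta x$, and the signs show the familiar tidal pattern: stretching along the radial direction and squeezing in the transverse directions.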
In particular, when there is no gravitational force, and the trajectories are straight lines,
one has
$$\frac{d^2}{dt^2}\, \delta x^i = 0 \quad\Rightarrow\quad \delta x^i = (\delta x^i)_0 + (\delta v^i)\, t . \tag{7.30}$$
Thus one recovers Euclid's parallel axiom, that two straight lines intersect at most once
(when $\delta v^i \neq 0$) and that they never intersect when they are initially parallel ($\delta v^i = 0$).
Any departure from this equation or its Minkowskian counterpart
$$\frac{d^2}{d\tau^2}\, \delta\xi^a = 0 , \tag{7.31}$$
It is the counterpart of (7.29) that we will be seeking in the context of General Rela-
tivity. One derivation of this can be modelled on the Newtonian derivation above. It
is elementary but looks non-covariant (and therefore somewhat messy) at intermediate
stages of the calculation (see section 11.1 for a manifestly covariant derivation).
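The starting point is of course the geodesic equation for $x^\mu$ and for its nearby partner
$x^\mu + \delta x^\mu$,
$$\frac{d^2}{d\tau^2}\, x^\mu + \Gamma^\mu_{\nu\lambda}(x)\, \frac{d}{d\tau} x^\nu\, \frac{d}{d\tau} x^\lambda = 0 , \tag{7.32}$$
and
$$\frac{d^2}{d\tau^2}\, (x^\mu + \delta x^\mu) + \Gamma^\mu_{\nu\lambda}(x + \delta x)\, \frac{d}{d\tau}(x^\nu + \delta x^\nu)\, \frac{d}{d\tau}(x^\lambda + \delta x^\lambda) = 0 . \tag{7.33}$$
As above, from these one can deduce an equation for $\delta x$, namely
$$\frac{d^2}{d\tau^2}\, \delta x^\mu + 2\, \Gamma^\mu_{\nu\lambda}(x)\, \frac{d}{d\tau} x^\nu\, \frac{d}{d\tau} \delta x^\lambda + \partial_\rho \Gamma^\mu_{\nu\lambda}(x)\, \delta x^\rho\, \frac{d}{d\tau} x^\nu\, \frac{d}{d\tau} x^\lambda = 0 . \tag{7.34}$$
Now this does not look particularly covariant. Thus instead of in terms of $d/d\tau$ we
would like to rewrite this in terms of the covariant operator $D_\tau$, with
$$D_\tau\, \delta x^\mu = \frac{d}{d\tau}\, \delta x^\mu + \Gamma^\mu_{\nu\lambda}\, \frac{dx^\nu}{d\tau}\, \delta x^\lambda . \tag{7.35}$$
Calculating $(D_\tau)^2 \delta x^\mu$, replacing $\ddot{x}^\mu$ appearing in that expression by $-\Gamma^\mu_{\nu\lambda}\dot{x}^\nu\dot{x}^\lambda$ (be-
cause $x^\mu$ satisfies the geodesic equation) and using (7.34), one eventually finds the nice
covariant geodesic deviation equation
$$(D_\tau)^2\, \delta x^\mu = R^\mu{}_{\nu\lambda\rho}\, \dot{x}^\nu \dot{x}^\lambda\, \delta x^\rho \tag{7.36}$$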
Remarks:
1. This shows very clearly that curvature, as captured by the Riemann curvature
   tensor, leads to non-Euclidean geometry in which e.g. the parallel axiom is not
   necessarily satisfied.
2. In general, solutions to the geodesic deviation equation are called Jacobi fields.
They describe the difference between the given geodesic and a (hypothetical) in-
finitely close neighbouring geodesic.
7.5 Contractions of the Riemann Tensor: Ricci Tensor and Ricci Scalar
The Riemann tensor, as we have seen, is a four-index tensor. For many purposes this
is not the most useful object, but we can create new tensors by contractions of the
Riemann tensor. Due to the symmetries of the Riemann tensor, there is essentially only
one possibility, namely the Ricci tensor
$$R_{\mu\nu} := R^\lambda{}_{\mu\lambda\nu} = g^{\lambda\sigma} R_{\sigma\mu\lambda\nu} . \tag{7.37}$$
It arises naturally from the definition (7.2) of the Riemann tensor in terms of commu-
tators of covariant derivatives, when one considers a contracted commutator,
$$[\nabla_\mu, \nabla_\nu] V^\lambda = R^\lambda{}_{\sigma\mu\nu} V^\sigma \quad\Rightarrow\quad [\nabla_\lambda, \nabla_\nu] V^\lambda = R^\lambda{}_{\sigma\lambda\nu} V^\sigma \equiv R_{\sigma\nu} V^\sigma . \tag{7.38}$$
In particular, this identity explains why the Maxwell equations in the covariant Lorenz
gauge (5.45) take the non-minimally coupled form (5.46).
It follows from the symmetries of the Riemann tensor that $R_{\mu\nu}$ is symmetric. Indeed
$$R_{\mu\nu} = g^{\lambda\sigma} R_{\sigma\mu\lambda\nu} = g^{\lambda\sigma} R_{\lambda\nu\sigma\mu} = R^\sigma{}_{\nu\sigma\mu} = R_{\nu\mu} . \tag{7.39}$$
Thus, for D = 4, the Ricci tensor has 10 independent components, for D = 3 it has 6,
while for D = 2 there is only 1 because there is only one independent component of the
Riemann curvature tensor to start off with.
There is one more contraction of the Riemann tensor we can perform, namely on the
Ricci tensor itself, to obtain what is called the Ricci scalar or curvature scalar
$$R := g^{\mu\nu} R_{\mu\nu} . \tag{7.40}$$
Remarks:
1. One might have thought that at least in four dimensions there is another way
   of constructing a (pseudo-)scalar, by contracting the Riemann tensor with the
   Levi-Civita tensor, but by the cyclic identity (7.17)
   $$\epsilon^{\mu\nu\rho\sigma} R_{\mu\nu\rho\sigma} = 0 \tag{7.41}$$
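2. Note that for D = 2 the Riemann curvature tensor has as many independent
   components as the Ricci scalar, namely one, and that for D = 3 the Ricci tensor
   has as many components as the Riemann tensor, namely 6. Thus in D = 2 one
   can express the entire Riemann tensor in terms of the Ricci scalar (and the metric)
   alone, and one has
   $$D = 2: \quad R_{\mu\nu\lambda\rho} = \tfrac{1}{2}\left(g_{\mu\lambda} g_{\nu\rho} - g_{\mu\rho} g_{\nu\lambda}\right) R \tag{7.42}$$
   (we will establish this relation in section 10.3, see (10.29)), while in D = 3 one
   has
   $$D = 3: \quad R_{\mu\nu\lambda\rho} = \left(g_{\mu\lambda} R_{\nu\rho} + R_{\mu\lambda} g_{\nu\rho} - g_{\mu\rho} R_{\nu\lambda} - R_{\mu\rho} g_{\nu\lambda}\right) - \tfrac{1}{2}\left(g_{\mu\lambda} g_{\nu\rho} - g_{\mu\rho} g_{\nu\lambda}\right) R \tag{7.43}$$
   (and we will prove this in section 10.4).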
3. It is thus only in four (and more) dimensions that there are strictly fewer components
   of the Ricci tensor than of the Riemann tensor. This has profound implications
   for the dynamics of gravity in these dimensions. In fact, we will see that it is only
   in dimensions D > 3 that gravity becomes truly dynamical, where empty space
   can be curved, where gravitational waves can exist etc.
4. Contracting (7.8), one consequence of the symmetry of the Ricci tensor is the
   useful general result
   $$[\nabla_\mu, \nabla_\nu] T^{\mu\nu} = R_{\sigma\nu}\left(T^{\sigma\nu} - T^{\nu\sigma}\right) = 0 \tag{7.44}$$
   In particular,
   $$F^{\mu\nu} = -F^{\nu\mu} \quad\Rightarrow\quad \nabla_\mu (\nabla_\nu F^{\mu\nu}) = \tfrac{1}{2}\, [\nabla_\mu, \nabla_\nu] F^{\mu\nu} = 0 . \tag{7.45}$$
   Note that this can also be deduced (without knowing anything about curvature
   in general or the Ricci tensor in particular) from the general expression (4.63) for
   the divergence of an anti-symmetric tensor,
   $$\nabla_\mu F^{\mu\nu} = -J^\nu \quad\Rightarrow\quad \nabla_\nu J^\nu = 0 . \tag{7.47}$$
   Now we see that we can alternatively directly use the identity (7.45) to arrive at
   this result.
   $$V^\nu \nabla_\mu \nabla_\nu V^\mu - V^\nu \nabla_\nu \nabla_\mu V^\mu = R_{\mu\nu} V^\mu V^\nu . \tag{7.48}$$
   $$V^\nu \nabla_\mu \nabla_\nu V^\mu = \nabla_\mu (V^\nu \nabla_\nu V^\mu) - (\nabla_\mu V^\nu)(\nabla_\nu V^\mu) \tag{7.49}$$
   $$V^\nu \nabla_\nu (\nabla_\mu V^\mu) + (\nabla_\mu V^\nu)(\nabla_\nu V^\mu) - \nabla_\mu (V^\nu \nabla_\nu V^\mu) + R_{\mu\nu} V^\mu V^\nu = 0 . \tag{7.50}$$
   This is a very useful and versatile master equation which provides valuable infor-
   mation about the relation between vector fields and curvature when specialised e.g.
   to geodesic vector fields, $V^\nu \nabla_\nu V^\mu = 0$, or Killing vector fields, $\nabla_\mu V_\nu = -\nabla_\nu V_\mu$
   and $\nabla_\mu V^\mu = 0$. Various specialisations of this equation will therefore appear later
   on in these notes, and even though we will then usually rederive them from scratch
   in the case at hand, it is good to keep in mind that e.g. (11.22) (our starting point
   for the discussion of the Raychaudhuri equation in section 11.2) and (12.12) (a
   useful identity relating Killing vectors and curvature) are special cases of (7.50).
6. There are other scalars that can be built from the curvature tensor, but these
   are necessarily of higher order in the curvature tensor, such as (trivially) $R^2$ or
   (somewhat less trivially) $R_{\mu\nu} R^{\mu\nu}$ or the square of the Riemann tensor, the so-
   called Kretschmann scalar
   $$K = R_{\mu\nu\rho\sigma} R^{\mu\nu\rho\sigma} . \tag{7.51}$$
   Analogously, scalars can be built from higher powers of the Riemann tensor and/or
   from powers of covariant derivatives of the Riemann tensor ($\Box R$ being the simplest
   example).
7. Such scalars are useful in analysing a given metric because, since they are scalars,
   they are invariant under coordinate transformations. Thus they directly provide
   coordinate-invariant information about a metric. For instance if $K$ is singular at
   some point in some coordinate system then it will be singular at that point in all
   coordinate systems, and thus such a singularity is not an artefact of a bad choice
   of coordinate system but a property of the space(-time) itself described by that
   metric. A prominent example is the singularity at the origin $r = 0$ of the Schwarz-
   schild metric, unambiguously unveiled by the singularity of its Kretschmann scalar
   (26.145).
   $$(\nabla_\mu V_\nu)(\nabla^\mu V^\nu) + R_{\mu\nu} V^\mu V^\nu = \nabla_\mu (V^\nu \nabla_\nu V^\mu) . \tag{7.52}$$
   The simplest (albeit perhaps not of most direct relevance for physics) situation
   where one can deduce something of substance from this equation is when one
   has a Riemannian (i.e. positive-definite) metric and the space one is dealing with
   is compact, without boundary. Then (a) the first term is non-negative, and (b)
   upon integration over the space the total derivative term on the right-hand side
   gives zero upon use of the Gauss theorem (4.61) (discussed in some more detail in
   section 15.3).
   This implies that for a harmonic $V$ to exist on such a space, the integral of
   $R_{\mu\nu} V^\mu V^\nu$ must be non-positive. In particular,
   • if the Ricci tensor is positive (as a quadratic form), there are no harmonic
     vector fields at all,
   • and if $R_{\mu\nu} V^\mu V^\nu = 0$, then a harmonic vector field is necessarily covariantly
     constant, $\nabla_\mu V_\nu = 0$.
In more mathematical terms this means that the first Betti number of a compact
manifold admitting a metric with positive Ricci curvature is equal to zero. A
variant of this kind of argument for Killing vectors will be given in section 12.3.19
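To see how calculations of the curvature tensor can be done in practice, let us work out
the example of the two-sphere of unit radius, i.e. with line element
$$ds^2 = d\theta^2 + \sin^2\theta\, d\phi^2 . \tag{7.53}$$
We already know that the non-zero Christoffel symbols necessarily have two $\phi$-indices
and one $\theta$-index (from $\partial_\theta g_{\phi\phi} = \partial_\theta \sin^2\theta$), and are given by
$$\Gamma^\phi_{\theta\phi} = \Gamma^\phi_{\phi\theta} = \cot\theta , \qquad \Gamma^\theta_{\phi\phi} = -\sin\theta \cos\theta . \tag{7.54}$$
We also know that the Riemann curvature tensor has only one independent component.
Let us therefore work out $R^\theta{}_{\phi\theta\phi}$. From the definition we find
$$R^\theta{}_{\phi\theta\phi} = \partial_\theta \Gamma^\theta_{\phi\phi} - \partial_\phi \Gamma^\theta_{\phi\theta} + \Gamma^\theta_{\theta c}\Gamma^c_{\phi\phi} - \Gamma^\theta_{\phi c}\Gamma^c_{\theta\phi} . \tag{7.55}$$
The second and third terms are manifestly zero, and we are left with
$$R^\theta{}_{\phi\theta\phi} = \partial_\theta(-\sin\theta\cos\theta) + \sin\theta\cos\theta\, \cot\theta = \sin^2\theta . \tag{7.56}$$
Thus we have
$$R^\theta{}_{\phi\theta\phi} = R_{\theta\phi\theta\phi} = \sin^2\theta , \qquad R^\phi{}_{\theta\phi\theta} = 1 . \tag{7.57}$$
Therefore the Ricci tensor $R_{ab}$ has the components
$$R_{\theta\theta} = 1 , \qquad R_{\theta\phi} = 0 , \qquad R_{\phi\phi} = \sin^2\theta . \tag{7.58}$$
These equations can be summarised as
$$R_{ab} = g_{ab} , \tag{7.59}$$
showing that the standard metric on the two-sphere is what we will later call an Einstein
metric. The Ricci scalar $R$ is
$$R = g^{\theta\theta} R_{\theta\theta} + g^{\phi\phi} R_{\phi\phi} = 1 + \frac{1}{\sin^2\theta}\, \sin^2\theta = 2 . \tag{7.60}$$
In particular, we have here our first concrete example of a space with non-trivial, in fact
positive, curvature.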
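The two-sphere computation can also be cross-checked numerically. The sketch below is not the method of these notes but a brute-force alternative: it builds the Christoffel symbols from central finite differences of the metric, then the Riemann tensor following the index pattern of (7.5), and finally the Ricci scalar. The function names, the step sizes and the optional radius parameter `rho` are assumptions introduced for illustration, and `inverse_2x2` restricts the sketch to two dimensions.

```python
import math


def sphere_metric(x, rho=1.0):
    """g_ab for a 2-sphere of radius rho in coordinates x = (theta, phi)."""
    theta = x[0]
    return [[rho**2, 0.0], [0.0, (rho * math.sin(theta)) ** 2]]


def inverse_2x2(g):
    det = g[0][0] * g[1][1] - g[0][1] * g[1][0]
    return [[g[1][1] / det, -g[0][1] / det], [-g[1][0] / det, g[0][0] / det]]


def shifted(x, k, h):
    y = list(x)
    y[k] += h
    return y


def christoffel(metric, x, h=1e-4):
    """G[l][m][k] = Gamma^l_{mk}, using central differences of the metric."""
    n = len(x)
    ginv = inverse_2x2(metric(x))
    dg = []  # dg[k][a][b] = partial_k g_ab
    for k in range(n):
        gp, gm = metric(shifted(x, k, h)), metric(shifted(x, k, -h))
        dg.append([[(gp[a][b] - gm[a][b]) / (2 * h) for b in range(n)]
                   for a in range(n)])
    return [[[0.5 * sum(ginv[l][r] * (dg[m][r][k] + dg[k][r][m] - dg[r][m][k])
                        for r in range(n))
              for k in range(n)] for m in range(n)] for l in range(n)]


def riemann(metric, x, h=1e-3):
    """R[l][s][m][k] = R^l_{s m k}, following the index pattern of (7.5)."""
    n = len(x)
    G = christoffel(metric, x)
    dG = []  # dG[k][l][s][m] = partial_k Gamma^l_{sm}
    for k in range(n):
        Gp = christoffel(metric, shifted(x, k, h))
        Gm = christoffel(metric, shifted(x, k, -h))
        dG.append([[[(Gp[l][s][m] - Gm[l][s][m]) / (2 * h) for m in range(n)]
                    for s in range(n)] for l in range(n)])
    return [[[[dG[m][l][s][k] - dG[k][l][s][m]
               + sum(G[l][m][r] * G[r][k][s] - G[l][k][r] * G[r][m][s]
                     for r in range(n))
               for k in range(n)] for m in range(n)]
             for s in range(n)] for l in range(n)]


def ricci_scalar(metric, x):
    n = len(x)
    R = riemann(metric, x)
    ginv = inverse_2x2(metric(x))
    # Ricci tensor R_{sk} = R^l_{s l k}
    ricci = [[sum(R[l][s][l][k] for l in range(n)) for k in range(n)]
             for s in range(n)]
    return sum(ginv[s][k] * ricci[s][k] for s in range(n) for k in range(n))


if __name__ == "__main__":
    point = [1.0, 0.3]
    print(riemann(sphere_metric, point)[0][1][0][1], math.sin(1.0) ** 2)
    print(ricci_scalar(sphere_metric, point))
```

Up to discretisation errors, the component `R[0][1][0][1]` reproduces $R^\theta{}_{\phi\theta\phi} = \sin^2\theta$ and the Ricci scalar comes out as 2. Passing a metric with a different `rho` (e.g. via `lambda x: sphere_metric(x, rho=2.0)`) rescales the Ricci scalar by $1/\rho^2$.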
We will see later on, in section 13, that this form of the curvature tensor, or its equivalent,
We now turn to some variations of the above theme (and some other generalisations are
discussed in section 10.3 below).
1. First of all, let us address the question what is the curvature (scalar) of a sphere
   of radius $\rho$, i.e. of the space with line element
   $$ds^2 = \rho^2\left(d\theta^2 + \sin^2\theta\, d\phi^2\right) .$$
   There are two ways to determine this.
   • The first is to simply and blindly redo the above calculations in this case and
     to see what one gets.
   • Alternatively, and somewhat more insightfully, rather than redoing the cal-
     culation in that case one can argue as follows. Let us observe first of all that
     the Christoffel symbols are invariant under constant rescalings of the metric
     because they are schematically of the form $g^{-1}\partial g$. Therefore the Riemann
     curvature tensor, which only involves derivatives and products of Christoffel
     symbols, is also invariant. Hence the Ricci tensor, which is just a contraction
     of the Riemann tensor, is also invariant:
     However, to construct the Ricci scalar, one needs the inverse metric. This
     introduces an explicit $\rho$-dependence and the result is that the curvature scalar
     of a sphere of radius $\rho$ is $R = 2/\rho^2$,
2. Now let us consider, instead of the unit 2-sphere, the unit hyperboloid $H^2$ with
metric (1.123)
$$ ds^2(H^2) = d\theta^2 + \sinh^2\theta\, d\phi^2 . \tag{7.67} $$
It is clear that, apart from a few sign changes here and there, the calculation
of the Riemann curvature tensor is identical to that for $S^2$. These sign changes
ultimately lead to the conclusion that the curvature scalar of $H^2$ is $R = -2$. While
the sphere is the prototypical example of a space with positive curvature, the
hyperboloid is the prototypical example of a space with negative curvature.
3. Now let us promote the constant radius of $S^2$ to a new radial coordinate $r$ and
ask the question what is the curvature tensor of the 3-dimensional space with
coordinates $(r, x^a) = (r, \theta, \phi)$ and line element
$$ ds^2 = dr^2 + r^2\,(d\theta^2 + \sin^2\theta\, d\phi^2) . \tag{7.68} $$
On the one hand, because one seems to have just added a trivial $r$-direction to the
2-sphere, one might be tempted to suspect that also this 3-dimensional space has
non-trivial curvature. On the other hand, we recognise the above metric as the
Euclidean metric on $\mathbb{R}^3$, written in spherical coordinates, and as such we expect
its curvature (in fact, all components of the Riemann tensor) to be zero.
The latter expectation is of course borne out, but it is instructive to see explicitly
how this cancellation occurs. In fact, it will be even more instructive to consider
an apparently harmless and innocuous modification of the above metric which
consists in replacing $dr^2$ by some constant multiple of $dr^2$,
$$ ds^2 = p\, dr^2 + r^2\,(d\theta^2 + \sin^2\theta\, d\phi^2) . \tag{7.69} $$
Equivalently, up to a truly harmless overall constant factor, we can think of this
as the Euclidean metric, but with the metric on the unit sphere replaced by that
of a sphere of radius $1/\sqrt{p}$ ($\neq 1$),
$$ ds^2 = p\left( dr^2 + (r^2/p)(d\theta^2 + \sin^2\theta\, d\phi^2) \right) . \tag{7.70} $$
The metric (7.69) has the form
$$ ds^2 = p\, dr^2 + r^2\,\gamma_{ab}\,dx^a dx^b , \tag{7.71} $$
with $\gamma_{ab}$ in this example denoting the components of the metric on the unit sphere
(and with $\gamma^a_{bc}$ and $r^a{}_{bcd}$ its associated Christoffel symbols and components of the
Riemann curvature tensor determined in the previous section). From these we can
deduce that for $r > 0$ the non-trivial Christoffel symbols are
$$ \Gamma^a_{bc} = \gamma^a_{bc} , \qquad \Gamma^a_{rb} = \frac{1}{r}\,\delta^a_b , \qquad \Gamma^r_{ab} = -\frac{r}{p}\,\gamma_{ab} . \tag{7.72} $$
From this, in turn, one finds that all the components of the Riemann tensor
involving at least one $r$-index are zero, whereas for the purely angular components
one finds
$$ R^a{}_{bcd} = r^a{}_{bcd} + \Gamma^a_{cr}\Gamma^r_{bd} - \Gamma^a_{dr}\Gamma^r_{bc} . \tag{7.73} $$
Using (7.61), i.e. $r^a{}_{bcd} = \delta^a_c\gamma_{bd} - \delta^a_d\gamma_{bc}$ for the unit sphere, and (7.72), one sees that
$$ R^a{}_{bcd} = \left(1 - \frac{1}{p}\right)\left(\delta^a_c\gamma_{bd} - \delta^a_d\gamma_{bc}\right) . \tag{7.74} $$
Therefore precisely for $p = 1$ the two contributions to the curvature tensor indeed
cancel and the curvature tensor is identically zero, as expected.
Equally interesting is the fact that for $p \neq 1$ the curvature is non-zero even away
from $r = 0$ (in addition, there is a conical deficit angle singularity at $r = 0$, as in
the next example below, but this shall not be our concern here). In particular it
follows from the above result that the only non-vanishing components of the Ricci
tensor of this 3-dimensional space are
$$ R_{ab} = \left(1 - \frac{1}{p}\right)\gamma_{ab} , \tag{7.75} $$
so that the Ricci scalar is
$$ R = g^{ab}R_{ab} = \frac{2}{r^2}\left(1 - \frac{1}{p}\right) . \tag{7.76} $$
We also see from this that this space actually has a curvature singularity as $r \to 0$.
Since the Ricci scalar is a scalar (under coordinate transformations), this divergence
cannot be an artefact of a bad choice of coordinates, and indicates that
there is a genuine geometric singularity for $r \to 0$.
Extended to a four-dimensional space-time metric via
$$ ds^2 = -dt^2 + p\, dr^2 + r^2\,(d\theta^2 + \sin^2\theta\, d\phi^2) , \tag{7.77} $$
this describes the gravitational field outside a monopole.20
4. As a final variation of this theme, we consider the above example in one dimension
less, i.e. we look at the metric one obtains if one replaces the Euclidean metric on
$\mathbb{R}^2$ written in polar coordinates by
$$ dr^2 + r^2 d\phi^2 \;\to\; p\, dr^2 + r^2 d\phi^2 . \tag{7.78} $$
This would be the standard Euclidean metric on $\mathbb{R}^2$ either for $p = 1$ or if the angle
$\phi$ had periodicity $2\pi\sqrt{p}$, but since $\phi$ has period $2\pi$, this results in a misidentification
of the points in a plane, like when one rolls up a flat piece of paper into a cone.
Away from $r = 0$, this space is intrinsically flat (all the components of the Riemann
curvature tensor are zero, as one can easily calculate - see section 10.1 for an
explanation of this use of the word "intrinsic"). There is, however, a conical
singularity at the tip of the cone $r = 0$, which can be thought of as a $\delta$-function
contribution to the curvature localised at $r = 0$. Extended to a four-dimensional
space-time metric,
$$ ds^2 = -dt^2 + dz^2 + p\, dr^2 + r^2 d\phi^2 , \tag{7.79} $$
this describes the space-time outside a straight cosmic string.21
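The four variations above can be checked numerically. The following is a minimal sketch in plain Python, not part of the original notes: it assumes diagonal metrics only, approximates all derivatives by central differences, and uses the convention $R^a{}_{bcd} = \partial_c\Gamma^a_{bd} - \partial_d\Gamma^a_{bc} + \Gamma^a_{ce}\Gamma^e_{db} - \Gamma^a_{de}\Gamma^e_{cb}$, $R_{bd} = R^a{}_{bad}$. The function names (`christoffel`, `ricci_scalar`, `sphere`, `hyperboloid`, `monopole`, `cone`) and the radius parameter `ell` are illustrative choices.

```python
import math

def christoffel(metric, x, h=1e-3):
    """Return G[a][b][c] = Gamma^a_{bc} at x for a DIAGONAL metric.

    metric(x) must return the list of diagonal entries g_aa(x).
    Derivatives are approximated by central differences with step h.
    """
    n = len(x)
    g = metric(x)
    dg = []  # dg[c][a] = d g_aa / d x^c
    for c in range(n):
        xp, xm = list(x), list(x)
        xp[c] += h
        xm[c] -= h
        gp, gm = metric(xp), metric(xm)
        dg.append([(gp[a] - gm[a]) / (2 * h) for a in range(n)])

    def d(c, a, b):  # partial_c g_ab, which vanishes off the diagonal
        return dg[c][a] if a == b else 0.0

    return [[[(d(b, a, c) + d(c, a, b) - d(a, b, c)) / (2 * g[a])
              for c in range(n)] for b in range(n)] for a in range(n)]

def ricci_scalar(metric, x, h=1e-3):
    """Ricci scalar R = g^{bb} R_bb of a diagonal metric at the point x."""
    n = len(x)
    G = christoffel(metric, x, h)
    dG = []  # dG[c][a][b][d] = partial_c Gamma^a_{bd}
    for c in range(n):
        xp, xm = list(x), list(x)
        xp[c] += h
        xm[c] -= h
        Gp, Gm = christoffel(metric, xp, h), christoffel(metric, xm, h)
        dG.append([[[(Gp[a][b][e] - Gm[a][b][e]) / (2 * h) for e in range(n)]
                    for b in range(n)] for a in range(n)])
    g = metric(x)
    R = 0.0
    for b in range(n):
        ricci_bb = 0.0  # R_bb = R^a_{bab}
        for a in range(n):
            ricci_bb += dG[a][a][b][b] - dG[b][a][b][a]
            for e in range(n):
                ricci_bb += G[a][a][e] * G[e][b][b] - G[a][b][e] * G[e][a][b]
        R += ricci_bb / g[b]
    return R

# Example metrics, given by their diagonal entries.
def sphere(ell):           # x = (theta, phi), sphere of radius ell
    return lambda x: [ell**2, ell**2 * math.sin(x[0])**2]

def hyperboloid(x):        # x = (theta, phi), unit hyperboloid
    return [1.0, math.sinh(x[0])**2]

def monopole(p):           # x = (r, theta, phi), metric p dr^2 + r^2 dOmega^2
    return lambda x: [p, x[0]**2, x[0]**2 * math.sin(x[1])**2]

def cone(p):               # x = (r, phi), metric p dr^2 + r^2 dphi^2
    return lambda x: [p, x[0]**2]

print(ricci_scalar(sphere(2.0), [1.0, 0.3]))         # compare with 2 / ell^2
print(ricci_scalar(hyperboloid, [1.0, 0.3]))         # compare with -2
print(ricci_scalar(monopole(2.0), [1.5, 1.0, 0.3]))  # compare with (2/r^2)(1 - 1/p)
print(ricci_scalar(cone(3.0), [1.5, 0.3]))           # flat away from r = 0
```

The finite-difference step `h` trades truncation error against round-off; the results should agree with the analytic values only up to roughly $10^{-6}$.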
So far, we have discussed algebraic properties of the Riemann tensor. The Riemann
tensor also satisfies some differential identities which, in particular in their contracted
form, will be of fundamental importance in the following.
20
M. Barriola, A. Vilenkin, Gravitational Field of a Global Monopole, Phys. Rev. Lett. 63 (1989)
341-343.
21
It is far from straightforward, however, to find a formalism which allows one to calculate and derive
the distributional Riemann tensor of this space-time - see R. Geroch, J. Traschen, Strings and other
distributional sources in general relativity, Phys. Rev. D36 (1987) 1017-1031 for a general analysis of
the problem and issues arising in this and related contexts, C. Clarke, J. Vickers, J. Wilson, Generalized
functions and distributional curvature of cosmic strings, Class. Quantum Grav. 13 (1996) 2485-2498
for one approach (based on the Colombeau algebra of distributions), and D. Garfinkle, Metrics with
distributional curvature, arXiv:gr-qc/9906053 for a different approach. We will (mostly) stay away
from distributional curvatures in these notes.
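The first identity is easy to derive. As a (differential) operator the covariant derivative
clearly satisfies the Jacobi identity
$$ [\nabla_{[\lambda}, [\nabla_\mu, \nabla_{\nu]}]] = 0 , \tag{7.81} $$
i.e.
$$ [\nabla_{[\lambda}, [\nabla_\mu, \nabla_{\nu]}]] = 0 \;\Leftrightarrow\; [\nabla_\lambda, [\nabla_\mu, \nabla_\nu]] + \text{cyclic permutations of } (\lambda, \mu, \nu) = 0 . \tag{7.82} $$
If you do not believe this identity (valid for any three linear operators with an associative
product), you can just write out the twelve relevant terms explicitly to see that there is
indeed a complete cancellation:
$$ \begin{aligned}
[\nabla_{[\lambda}, [\nabla_\mu, \nabla_{\nu]}]] \;\propto\;
 & \nabla_\lambda\nabla_\mu\nabla_\nu - \nabla_\lambda\nabla_\nu\nabla_\mu - \nabla_\mu\nabla_\nu\nabla_\lambda + \nabla_\nu\nabla_\mu\nabla_\lambda \\
 & + \nabla_\mu\nabla_\nu\nabla_\lambda - \nabla_\mu\nabla_\lambda\nabla_\nu - \nabla_\nu\nabla_\lambda\nabla_\mu + \nabla_\lambda\nabla_\nu\nabla_\mu \\
 & + \nabla_\nu\nabla_\lambda\nabla_\mu - \nabla_\nu\nabla_\mu\nabla_\lambda - \nabla_\lambda\nabla_\mu\nabla_\nu + \nabla_\mu\nabla_\lambda\nabla_\nu \\
 = \; & 0 .
\end{aligned} \tag{7.83} $$
To determine the implications of this identity for the Riemann tensor, we apply it to a
vector field $V^\rho$, say. The first term in (7.82) is
$$ \begin{aligned}
[\nabla_\lambda, [\nabla_\mu, \nabla_\nu]]V^\rho
 &= \nabla_\lambda(R^\rho{}_{\sigma\mu\nu}V^\sigma) - [\nabla_\mu, \nabla_\nu](\nabla_\lambda V^\rho) \\
 &= (\nabla_\lambda R^\rho{}_{\sigma\mu\nu})V^\sigma + R^\rho{}_{\sigma\mu\nu}\nabla_\lambda V^\sigma - R^\rho{}_{\sigma\mu\nu}\nabla_\lambda V^\sigma + R^\sigma{}_{\lambda\mu\nu}\nabla_\sigma V^\rho \\
 &= (\nabla_\lambda R^\rho{}_{\sigma\mu\nu})V^\sigma + R^\sigma{}_{\lambda\mu\nu}\nabla_\sigma V^\rho .
\end{aligned} \tag{7.84} $$
Upon taking the cyclic permutations, the sum of the 2nd terms vanishes by the cyclic
symmetry of the Riemann tensor, and therefore one finds
$$ (\nabla_\lambda R^\rho{}_{\sigma\mu\nu})V^\sigma + \text{cyclic permutations of } (\lambda, \mu, \nu) = 0 . \tag{7.85} $$
Since this holds for any $V^\sigma$, one deduces the Bianchi identity
$$ \nabla_\lambda R^\rho{}_{\sigma\mu\nu} + \text{cyclic permutations of } (\lambda, \mu, \nu) = 0 \;\Leftrightarrow\; \nabla_{[\lambda} R^\rho{}_{|\sigma|\mu\nu]} = 0 . \tag{7.86} $$
Using the pair symmetry $R_{\alpha\beta\mu\nu} = R_{\mu\nu\alpha\beta}$ of the Riemann tensor, this can also be written as
$$ \nabla_{[\lambda} R_{\mu\nu]\alpha\beta} = 0 , \tag{7.87} $$
or, explicitly,
$$ \nabla_\lambda R_{\alpha\beta\mu\nu} + \nabla_\nu R_{\alpha\beta\lambda\mu} + \nabla_\mu R_{\alpha\beta\nu\lambda} = 0 . \tag{7.88} $$
By contracting this with $g^{\alpha\mu}$ we obtain
$$ \nabla_\lambda R_{\beta\nu} - \nabla_\nu R_{\beta\lambda} + \nabla_\mu R^\mu{}_{\beta\nu\lambda} = 0 . \tag{7.89} $$
This is not yet particularly useful. To also turn the last term into a Ricci tensor we
contract once more, with $g^{\beta\lambda}$, to obtain the contracted Bianchi identity
$$ \nabla_\lambda R^\lambda{}_\nu - \nabla_\nu R + \nabla_\mu R^\mu{}_\nu = 0 , \tag{7.90} $$
or
$$ \nabla^\mu (R_{\mu\nu} - \tfrac{1}{2} g_{\mu\nu} R) = 0 . \tag{7.91} $$
The tensor appearing in this equation is the so-called Einstein tensor $G_{\mu\nu}$,
$$ G_{\mu\nu} = R_{\mu\nu} - \tfrac{1}{2} g_{\mu\nu} R . \tag{7.92} $$
It is the unique divergence-free tensor that can be built from the metric and its first
and second derivatives (apart from $g_{\mu\nu}$ itself, of course),
$$ \nabla^\mu G_{\mu\nu} = 0 , \tag{7.93} $$
and this is why it will play the central role in the Einstein equations for the gravitational
field.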
A minor caveat regarding the above statement about the uniqueness of the Einstein
tensor is that, as it stands, it is only true in D = 4 space-time dimensions. In D > 4,
there are other tensors with this property, but they are non-linear in 2nd derivatives of
the metric. The uniqueness statement continues to be true for D > 4 if one adds the
requirement that the tensor is linear in 2nd derivatives of the metric. I will briefly come
back to this in the discussion of the action principle for general relativity in section 19.1.
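Two elementary properties of the Einstein tensor $G_{\mu\nu} = R_{\mu\nu} - \tfrac12 g_{\mu\nu}R$ can be read off directly from its definition. The following short sketch is added for illustration (with $D$ denoting the dimension); it computes the trace and then evaluates $G_{ab}$ for the unit two-sphere, for which $R_{ab} = g_{ab}$ and $R = 2$.

```latex
\begin{align*}
g^{\mu\nu}G_{\mu\nu} &= R - \tfrac{1}{2}\,\delta^\mu_\mu\,R = \Bigl(1 - \tfrac{D}{2}\Bigr) R , \\
\text{unit } S^2:\qquad
G_{ab} &= R_{ab} - \tfrac{1}{2}\,g_{ab}\,R = g_{ab} - \tfrac{1}{2}\,g_{ab}\cdot 2 = 0 .
\end{align*}
```

In particular the trace vanishes identically for $D = 2$, consistent with the vanishing of $G_{ab}$ on the two-sphere.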
In particular,
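2. this led us to consider the coordinate transformation (2.229), in terms of which the
geodesics through the point $p$ are straight lines,
$$ \xi^a(\tau) = \xi_0^a + \tau\,\dot\xi^a , \tag{7.96} $$
so that the geodesic equation reduces to
$$ \Gamma^a_{bc}(\xi(\tau))\,\dot\xi^b\dot\xi^c = 0 , \tag{7.97} $$
implying at $\tau = 0$
$$ \Gamma^a_{bc}(\xi_0) = 0 \tag{7.98} $$
(and we will look at the implications of the next term in the Taylor expansion of
(7.97) below).
Therefore the Taylor expansion of the metric around $\xi = \xi_0$ has the form
$$ g_{ab}(\xi) = \eta_{ab} + O((\xi - \xi_0)^2) , \tag{7.99} $$
and we will now determine the quadratic term in this expansion (and be able to express
it in terms of the components of the Riemann tensor $R_{abcd}(\xi_0)$ at the point $p$ in these
coordinates). To that end we look at the next term in the Taylor expansion of (7.97).
Thus we differentiate (7.97) along the geodesic, i.e. with respect to $\tau$, and evaluate the
results at $\tau = 0$ to deduce
$$ \partial_d\Gamma^a_{bc}(\xi_0)\,\dot\xi^d\dot\xi^b\dot\xi^c = 0 \quad \forall\; \dot\xi^b \tag{7.100} $$
or, equivalently
$$ \partial_{(d}\Gamma^a_{bc)}(\xi_0) = 0 . \tag{7.101} $$
Likewise, the conditions
$$ \partial_{(d_1}\cdots\partial_{d_n}\Gamma^a_{bc)}(\xi_0) = 0 \tag{7.102} $$
arising from the higher-order terms in the Taylor expansion of (7.97) impose constraints
on the Christoffel symbols and their derivatives that are satisfied in Riemann normal
coordinates (but not in general inertial coordinate systems).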
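A useful way of reexpressing the condition (7.101) is the following (a certain amount of
hindsight or trial-and-error is required for this): because $\Gamma^a_{bc}(\xi_0) = 0$, from the definition
of the Riemann tensor we have
$$ \begin{aligned}
R^a{}_{bcd}(\xi_0) + R^a{}_{cbd}(\xi_0)
 &= \partial_c\Gamma^a_{bd}(\xi_0) - \partial_d\Gamma^a_{bc}(\xi_0) + \partial_b\Gamma^a_{cd}(\xi_0) - \partial_d\Gamma^a_{cb}(\xi_0) \\
 &= \partial_c\Gamma^a_{bd}(\xi_0) + \partial_b\Gamma^a_{cd}(\xi_0) - 2\,\partial_d\Gamma^a_{bc}(\xi_0) ,
\end{aligned} \tag{7.103} $$
and using (7.101) this can be written as
$$ \partial_d\Gamma^a_{bc}(\xi_0) = -\tfrac{1}{3}\left( R^a{}_{bcd}(\xi_0) + R^a{}_{cbd}(\xi_0) \right) . \tag{7.104} $$
Since the metric is covariantly constant,
$$ g_{ab,c} = \Gamma_{abc} + \Gamma_{bac} , \tag{7.105} $$
we have
$$ g_{ab,cd}(\xi) = \partial_d\Gamma_{abc}(\xi) + \partial_d\Gamma_{bac}(\xi) \tag{7.106} $$
and at $\xi_0$ we can use (7.104) and the symmetries of the Riemann tensor to deduce
$$ g_{ab,cd}(\xi_0) = -\tfrac{1}{3}\left( R_{acbd}(\xi_0) + R_{adbc}(\xi_0) \right) . \tag{7.107} $$
We have thus found that, to quadratic order in a Taylor expansion of the metric around
the origin of a Riemann normal coordinate system, the metric can be written as
$$ g_{ab}(\xi) = \eta_{ab} - \tfrac{1}{3}\,R_{acbd}(\xi_0)\,(\xi - \xi_0)^c(\xi - \xi_0)^d + O((\xi - \xi_0)^3) . \tag{7.108} $$
If required, higher order terms can be determined analogously with the help of the
higher order terms in the Taylor expansion of (7.97), and (with a steady hand) can be
expressed in terms of the covariant derivatives of the Riemann tensor at $\xi_0$.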
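As a consistency check (a sketch added here, not part of the original derivation), one can verify that the Riemann normal coordinate relation $\partial_d\Gamma^a_{bc}(\xi_0) = -\tfrac13\bigl(R^a{}_{bcd} + R^a{}_{cbd}\bigr)(\xi_0)$ reproduces the Riemann tensor itself when inserted into its definition at a point where $\Gamma^a_{bc} = 0$. The only inputs are the antisymmetry in the last pair of indices and the cyclic identity $R^a{}_{bcd} + R^a{}_{cdb} + R^a{}_{dbc} = 0$.

```latex
\begin{align*}
\partial_c\Gamma^a_{bd} - \partial_d\Gamma^a_{bc}
 &= -\tfrac13\bigl(R^a{}_{bdc} + R^a{}_{dbc}\bigr) + \tfrac13\bigl(R^a{}_{bcd} + R^a{}_{cbd}\bigr) \\
 &= \tfrac13\bigl(2R^a{}_{bcd} - R^a{}_{dbc} + R^a{}_{cbd}\bigr)
    && (R^a{}_{bdc} = -R^a{}_{bcd}) \\
 &= \tfrac13\bigl(2R^a{}_{bcd} + R^a{}_{bcd} - R^a{}_{cbd} + R^a{}_{cbd}\bigr)
    && (R^a{}_{dbc} = -R^a{}_{bcd} + R^a{}_{cbd}) \\
 &= R^a{}_{bcd} .
\end{align*}
```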
In sections 3.1 and 5.1 on the principles of general covariance and minimal coupling
respectively, I mentioned that these do not necessarily fix the equations uniquely. In
other words, there could be more than one generally covariant equation which reduces
to a given equation in Minkowski space. Having the curvature tensor at our disposal
now, we can construct examples of this kind.
Given some tensorial equation, obtained by the minimal coupling prescription, say,
one can always contemplate the possibility of adding additional terms to it involving the
curvature tensor. Since such terms take the form of higher derivative corrections to the
original equation, multiplied by appropriate dimensionful constants, one can usually
get away with ignoring such terms when dealing with weak fields and other low-energy
phenomena, and under such conditions the minimal coupling rule can usually be trusted.
However, such terms are not negligible under extreme conditions involving e.g. very
strong or strongly fluctuating gravitational fields.
An example which shows very clearly that the minimal coupling prescription, at least
the way we have formulated it, is itself ambiguous is, as already briefly pointed out in
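section 5.6, provided by Maxwell theory. In that case, we saw that in the covariant
Lorenz gauge one has (5.45)
$$ \nabla_\mu A^\mu = 0 \;\Rightarrow\; \nabla_\mu F^{\mu\nu} = \nabla_\mu(\nabla^\mu A^\nu - \nabla^\nu A^\mu) = \Box A^\nu - [\nabla_\mu, \nabla^\nu]A^\mu , \tag{7.109} $$
where $\Box A^\nu = \nabla_\mu\nabla^\mu A^\nu$. It thus follows from (7.38) that the Maxwell equations in the
covariant Lorenz gauge can be written as (5.46)
$$ \nabla_\mu A^\mu = 0 \;\Rightarrow\; \left( \nabla_\mu F^{\mu\nu} = -J^\nu \;\Leftrightarrow\; \Box A^\nu - R^\nu{}_\mu A^\mu = -J^\nu \right) . \tag{7.110} $$
What this shows is that minimal coupling all by itself is not a unique prescription, as
we would have obtained (7.110) without the curvature terms by applying the minimal
coupling prescription to the special relativity Maxwell equation in the Lorenz gauge,
namely just $\Box A^\nu = -J^\nu$.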
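The step from the commutator to the Ricci tensor term can be spelled out as follows. This is a sketch under the assumed conventions $[\nabla_\mu, \nabla_\nu]V^\lambda = R^\lambda{}_{\sigma\mu\nu}V^\sigma$ and $R_{\sigma\nu} = R^\mu{}_{\sigma\mu\nu}$; with other sign conventions the intermediate expressions change accordingly.

```latex
\begin{align*}
[\nabla_\mu, \nabla^\nu]A^\mu
 &= g^{\nu\rho}\,[\nabla_\mu, \nabla_\rho]A^\mu
  = g^{\nu\rho}\,R^\mu{}_{\sigma\mu\rho}\,A^\sigma
  = g^{\nu\rho}\,R_{\sigma\rho}\,A^\sigma
  = R^\nu{}_\sigma A^\sigma , \\
\nabla_\mu\nabla^\nu A^\mu
 &= \nabla^\nu(\nabla_\mu A^\mu) + [\nabla_\mu, \nabla^\nu]A^\mu
  = R^\nu{}_\sigma A^\sigma \qquad (\text{Lorenz gauge } \nabla_\mu A^\mu = 0) , \\
\nabla_\mu F^{\mu\nu}
 &= \Box A^\nu - \nabla_\mu\nabla^\nu A^\mu
  = \Box A^\nu - R^\nu{}_\sigma A^\sigma .
\end{align*}
```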
In the present situation, (5.46) is superior to the equation without the curvature term
because
and (related to this) because (7.110) implies that the current is covariantly con-
served (as we had verified in section 5.6 in an arbitrary gauge), while for the
equation without the curvature term covariant current conservation would then
be violated by a curvature term (as can easily be verified).
Thus occasionally some such additional criteria can be used to eliminate (or reduce) the
ambiguity in the minimal coupling prescription, but this need not always be the case.
As another example, consider the wave equation for a (massless, say) scalar field $\phi$. In
Minkowski space, this is the Klein-Gordon equation which has the obvious curved space
analogue (4.55)
$$ \Box\phi = 0 \tag{7.111} $$
obtained by the minimal coupling prescription. However, one could equally well postulate
the equation
$$ (\Box + \xi R)\,\phi = 0 , \tag{7.112} $$
where $\xi$ is a real parameter. In some situations an additional requirement
can be imposed to select a particular non-zero value for $\xi$ (e.g. for a 4-dimensional
space-time this turns out to be the value $\xi = -1/6$). This will be discussed and explained
in section 21.3.
Thus in general such ambiguities are present and are something one has to live with.
B: General Relativity and Geometry
In this second part of the lecture notes I have collected a number of different topics that
develop the formalism of tensor calculus in one way or another. This does not mean,
however, that one necessarily needs to digest all these topics before continuing with the
physical applications of general relativity, and I do not even recommend this.
Strictly speaking, none of these topics is essential for understanding some of the more
elementary aspects of general relativity to be treated later on, e.g. the discussion of
the Einstein equations, the field equations for gravity, in section 18, the discussion
of gravitational waves in section 22, or the analysis of geodesics in the Schwarzschild
geometry and the corresponding solar system tests of general relativity in section 24.
Some of the topics treated below will reappear frequently in subsequent sections, e.g.
Killing vectors (section 8) and their associated conserved quantities (section 9), or the
Gauss integral formula derived in section 15.3, and it will be useful to develop at least
some nodding acquaintance with these things.
either to illustrate the relation between the Riemann curvature tensor, a central
object of interest in general relativity and defined in a somewhat pragmatic and
perhaps unintuitive fashion in section 7, and more intuitive and/or geometric
concepts of curvature;
or simply because they are fun or beautiful (or both), and provide an invitation
to the wonderful world of differential geometry;
8 Lie Derivative, Symmetries and Killing Vectors
Symmetries and their consequences play a fundamental role in physics. In the present
context, these are symmetries of the gravitational field or of the space-time metric.
Before trying to figure out how to detect symmetries of a metric, or so-called isometries,
let us decide what we mean by symmetries of a metric.
For example, we would say that the Minkowski metric has the Poincare group as a group
of symmetries, because the corresponding coordinate transformations leave the metric
invariant.
Likewise, we would say that the standard metrics on the two- or three-sphere have
rotational symmetries because they are invariant under rotations of the sphere. We can
look at this in one of two ways: either as an active transformation, in which we rotate
the sphere and note that nothing changes, or as a passive transformation, in which we
do not move the sphere, all the points remain fixed, and we just rotate the coordinate
system. So this is tantamount to a relabelling of the points. From the latter (passive)
point of view, the symmetry is again understood as an invariance of the metric under a
particular family of coordinate transformations.
Thinking actively, in order to detect symmetries, we should e.g. compare the geometry,
given by the line-element $ds^2 = g_{\mu\nu}dx^\mu dx^\nu$, at two different points $x$ and $y$ related by
$y^\mu(x)$. Thus we are led to consider the difference
$$ g_{\mu\nu}(y)\,dy^\mu dy^\nu - g_{\mu\nu}(x)\,dx^\mu dx^\nu . \tag{8.2} $$
Using the invariance of the line-element under coordinate transformations, i.e. the usual
tensorial transformation behaviour of the components of the metric, we see that we can
also write this as the difference
$$ (g_{\mu\nu}(y) - g'_{\mu\nu}(y))\,dy^\mu dy^\nu . \tag{8.3} $$
Thus we deduce that what we mean by a symmetry, i.e. invariance of the metric under
a coordinate transformation, is the statement
$$ g_{\mu\nu}(y) = g'_{\mu\nu}(y) . \tag{8.4} $$
From the passive point of view, in which a coordinate transformation represents a relabelling
of the points of the space, this equation compares the new metric at a point $P'$
(with coordinates $y^\mu$) with the old metric at the point $P$ which has the same values of
the old coordinates as the point $P'$ has in the new coordinate system, $y^\mu(P') = x^\mu(P)$.
The above equality then states that the new metric at the point $P'$ has the same
functional dependence on the new coordinates as the old metric on the old coordinates
at the point $P$. Thus a neighbourhood of $P'$ in the new coordinates looks identical to
a neighbourhood of $P$ in the old coordinates, and they can be mapped into each other
isometrically, i.e. such that all the metric properties, like distances, are preserved. Thus
either actively or passively one is led to the above condition.
Note that to detect a continuous symmetry in this way, we only need to consider infinites-
imal coordinate transformations. In that case, the above amounts to the statement that
metrically the space time looks the same when one moves infinitesimally in the direction
given by the coordinate transformation.
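We now want to translate the above discussion into a condition for an infinitesimal
coordinate transformation
$$ y^\mu(x) = x^\mu + \epsilon V^\mu(x) \tag{8.5} $$
to generate a symmetry of the metric. Here you can and should think of $V^\mu$ as a
vector field because, even though coordinates themselves of course do not transform like
vectors, their infinitesimal variations $\delta x^\mu$ do,
$$ z^\mu = z^\mu(x) \;\Rightarrow\; \delta z^\mu = \frac{\partial z^\mu}{\partial x^\nu}\,\delta x^\nu \tag{8.6} $$
and we think of $\delta x^\mu$ as $\epsilon V^\mu$.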
In fact, we will do something slightly more general than just trying to detect symmetries
of the metric. After all, we can also speak of functions or vector fields with symmetries,
and this can be extended to arbitrary tensor fields (although that may be harder to
visualise). So, for a general tensor field $T$ we will want to compare $T(y(x))$ with
$T'(y(x))$ - this is of course equivalent to, and only technically slightly more convenient
in the following than, comparing $T(x)$ with $T'(x)$.
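As usual, we start the discussion with scalars. In that case, we want to compare $\phi(y(x))$
with $\phi'(y(x)) = \phi(x)$. We find
$$ \phi(y(x)) - \phi'(y(x)) = \phi(x + \epsilon V) - \phi(x) = \epsilon V^\mu\partial_\mu\phi + O(\epsilon^2) . \tag{8.7} $$
We now define the Lie derivative of $\phi$ along the vector field $V$ to be
$$ L_V\phi := \lim_{\epsilon\to 0} \frac{\phi(y(x)) - \phi'(y(x))}{\epsilon} . \tag{8.8} $$
Evaluating this, we find
$$ L_V\phi = V^\mu\partial_\mu\phi . \tag{8.9} $$
Thus for a scalar, the Lie derivative is just the ordinary directional derivative, and this
is as it should be since saying that a function has a certain symmetry amounts to the
assertion that its derivative in a particular direction vanishes.
We now follow the same procedure for a vector field $W^\mu$. We will need the matrix
$(\partial y^\mu/\partial x^\nu)$ and its inverse for the above infinitesimal coordinate transformation. We
have
$$ \frac{\partial y^\mu}{\partial x^\nu} = \delta^\mu_\nu + \epsilon\,\partial_\nu V^\mu , \tag{8.10} $$
and
$$ \frac{\partial x^\mu}{\partial y^\nu} = \delta^\mu_\nu - \epsilon\,\partial_\nu V^\mu + O(\epsilon^2) . \tag{8.11} $$
Thus we have
$$ W'^\mu(y(x)) = \frac{\partial y^\mu}{\partial x^\nu}\,W^\nu(x) = W^\mu(x) + \epsilon\,W^\nu(x)\,\partial_\nu V^\mu(x) , \tag{8.12} $$
and
$$ W^\mu(y(x)) = W^\mu(x) + \epsilon\,V^\nu\partial_\nu W^\mu(x) + O(\epsilon^2) . \tag{8.13} $$
Hence, defining the Lie derivative $L_V W$ of $W$ along $V$ by
$$ L_V W^\mu := \lim_{\epsilon\to 0} \frac{W^\mu(y(x)) - W'^\mu(y(x))}{\epsilon} , \tag{8.14} $$
we find
$$ L_V W^\mu = V^\nu\partial_\nu W^\mu - W^\nu\partial_\nu V^\mu . \tag{8.15} $$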
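A simple example may help to see the formula $L_V W^\mu = V^\nu\partial_\nu W^\mu - W^\nu\partial_\nu V^\mu$ at work. The following sketch (an illustration added here, with the fields chosen for convenience) takes the Euclidean plane with Cartesian coordinates $(x, y)$, the rotation generator $V = x\partial_y - y\partial_x$, and computes the Lie derivative of the translation field $\partial_x$ and of the radial field $x\partial_x + y\partial_y$.

```latex
% V^x = -y, V^y = x
\begin{align*}
W = \partial_x:\quad
 L_V W^\mu &= -\,W^\nu\partial_\nu V^\mu = -\,\partial_x V^\mu
            = -\bigl(\partial_x(-y),\ \partial_x x\bigr) = (0,\,-1)
 \;\Rightarrow\; L_V\,\partial_x = -\,\partial_y , \\
W = x\,\partial_x + y\,\partial_y:\quad
 L_V W^\mu &= V^\nu\partial_\nu W^\mu - W^\nu\partial_\nu V^\mu
            = V^\mu - V^\mu = 0 .
\end{align*}
```

This matches the interpretation of the Lie derivative as a measure of symmetry: the radial vector field is invariant under rotations, while $\partial_x$ is rotated into $\partial_y$.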
1. The result looks non-covariant, i.e. non-tensorial, but as a difference of two vectors
at the same point (recall the limit $\epsilon \to 0$) the result should again be a vector. This
is indeed the case. One way to make this manifest is to rewrite (8.15) in terms of
covariant derivatives, as
$$ L_V W^\mu = V^\nu\partial_\nu W^\mu - W^\nu\partial_\nu V^\mu = V^\nu\nabla_\nu W^\mu - W^\nu\nabla_\nu V^\mu . \tag{8.16} $$
This shows that $L_V W^\mu$ is again a vector field. Note, however, that the Lie derivative,
in contrast to the covariant derivative, is defined without reference to any
metric.
2. There is an alternative, and perhaps more intuitive, derivation of the above ex-
pression (8.15) for the Lie derivative of a vector field along a vector field, which
makes both its tensorial character and its interpretation manifest (and which also
generalises to other tensor fields; in fact we had already applied it to the metric
in section 2.6 to deduce (2.100)).
Namely, let us assume that we are initially in a coordinate system $\{y^\mu\}$ adapted
to $V$ in the sense that $V = \partial/\partial y^a$ for some particular $a$, i.e. $V^\mu = \delta^\mu_a$ (so that
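$$ [V, W]^\mu := L_V W^\mu = -L_W V^\mu . \tag{8.20} $$
This is actually a Lie bracket, i.e. it satisfies the Jacobi identity
$$ [V, [W, X]]^\mu + [X, [V, W]]^\mu + [W, [X, V]]^\mu = 0 . \tag{8.21} $$
This can also be rephrased as the statement that the Lie derivative is also a
derivation of the Lie bracket, i.e. that one has
$$ L_V [W, X] = [L_V W, X] + [W, L_V X] . \tag{8.22} $$
4. I want to reiterate at this point that it is extremely useful to think of vector fields
as first order linear differential operators, via $V^\mu \to V = V^\mu\partial_\mu$. In this case, the
Lie bracket $[V, W]$ is simply the ordinary commutator of differential operators,
$$ \begin{aligned}
[V, W] &= [V^\mu\partial_\mu, W^\nu\partial_\nu] \\
 &= V^\mu(\partial_\mu W^\nu)\partial_\nu + V^\mu W^\nu\partial_\mu\partial_\nu - W^\nu(\partial_\nu V^\mu)\partial_\mu - W^\nu V^\mu\partial_\nu\partial_\mu \\
 &= (V^\nu\partial_\nu W^\mu - W^\nu\partial_\nu V^\mu)\,\partial_\mu \\
 &= (L_V W)^\mu\partial_\mu = [V, W]^\mu\partial_\mu .
\end{aligned} \tag{8.23} $$
5. From the above it is evident that if one has two vector fields of the form $V_{(k)} = \partial_{y^k}$,
they commute as differential operators, i.e. their Lie bracket is zero,
$$ V_{(k)} = \partial_{y^k} \;\Rightarrow\; [V_{(1)}, V_{(2)}] = 0 . \tag{8.24} $$
Conversely it is also true that locally this is a sufficient condition for the existence
of such coordinates,
$$ [V_{(1)}, V_{(2)}] = 0 \;\Rightarrow\; \exists \text{ local coordinates } y^k \text{ such that } V_{(k)} = \partial_{y^k} . \tag{8.25} $$
6. For example, if one has a 2-parameter surface $x^\mu = x^\mu(\tau, \sigma)$, which one can think
of as a 1-parameter family of curves $x^\mu(\tau)$ labelled by $\sigma$, then the tangent vector
field $\partial_\tau x^\mu$ to the family of curves and the connecting vector field (or deviation
vector field) $\partial_\sigma x^\mu$ have vanishing Lie bracket.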
Conversely this also provides a good visualisations of what it means for two vector
fields to Lie commute, namely that locally they span a 2-dimensional surface and
generate a coordinate grid on that surface. We will make use of this in section
11.1 when discussing the so-called geodesic deviation equation.
7. Having equipped the space of vector fields with a Lie algebra structure, in fact
with the structure of an infinite-dimensional Lie algebra, it is fair to ask the
question "the Lie algebra of what group?". Well, we have seen above that we can think of
vector fields as infinitesimal generators of coordinate transformations. Hence,
formally at least, the Lie algebra of vector fields is the Lie algebra of the group
of coordinate transformations (passive point of view) or diffeomorphisms (active
point of view).22 We will briefly come back to this below, in remark 1 of section
8.4.
for the relation between the commutator of directional covariant derivatives and
the Riemann curvature tensor. There we had used the abbreviation $[X, Y]^\mu$ for the
vector field $X^\nu\nabla_\nu Y^\mu - Y^\nu\nabla_\nu X^\mu$. Comparing with (8.16), we see that this is indeed
just the Lie bracket $[X, Y]^\mu$. Thus one way of interpreting the Riemann tensor
is that the curvature measures the failure of the covariant derivative to provide a
representation of the Lie algebra of vector fields.
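The statement that the Lie bracket is the commutator of the corresponding first-order differential operators, and that it satisfies the Jacobi identity, can be tested numerically. The sketch below is an illustration in plain Python and not part of the original notes: `bracket` approximates $[V, W]^\mu = V^\nu\partial_\nu W^\mu - W^\nu\partial_\nu V^\mu$ by central differences, and the three sample fields `V1`, `V2`, `V3` are assumed to be the standard rotation generators on the two-sphere in coordinates $(\theta, \phi)$ (one common choice of normalisation), with `W` an arbitrary test field.

```python
import math

def bracket(V, W, h=1e-4):
    """Return the vector field [V, W], with components
    V^n d_n W^m - W^n d_n V^m, using central differences of step h."""
    def VW(x):
        n = len(x)
        Vx, Wx = V(x), W(x)
        out = [0.0] * n
        for nu in range(n):
            xp, xm = list(x), list(x)
            xp[nu] += h
            xm[nu] -= h
            dW = [(p - m) / (2 * h) for p, m in zip(W(xp), W(xm))]
            dV = [(p - m) / (2 * h) for p, m in zip(V(xp), V(xm))]
            for mu in range(n):
                out[mu] += Vx[nu] * dW[mu] - Wx[nu] * dV[mu]
        return out
    return VW

def jacobi(U, V, W, x):
    """Components of [U,[V,W]] + [V,[W,U]] + [W,[U,V]] at the point x."""
    terms = [bracket(U, bracket(V, W))(x),
             bracket(V, bracket(W, U))(x),
             bracket(W, bracket(U, V))(x)]
    return [sum(t[m] for t in terms) for m in range(len(x))]

def cot(t):
    return math.cos(t) / math.sin(t)

# Assumed sample fields on S^2, components ordered as (theta, phi).
V1 = lambda x: [math.sin(x[1]), cot(x[0]) * math.cos(x[1])]
V2 = lambda x: [math.cos(x[1]), -cot(x[0]) * math.sin(x[1])]
V3 = lambda x: [0.0, 1.0]
W = lambda x: [x[0] * x[1], math.sin(x[0])]   # arbitrary test field

point = [0.7, 0.3]
print(bracket(V1, V2)(point))             # compare with V3 = (0, 1)
print(bracket(V3, V1)(point), V2(point))  # the two lists should agree
print(jacobi(V1, V2, W, point))           # should vanish up to discretisation error
```

Because `bracket` returns a function, brackets can be nested, at the price of accumulating finite-difference error with each level of nesting.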
To extend the definition of the Lie derivative to other tensors, we can proceed in one of
two ways. We can either extend the above procedure to other tensor fields by defining
$$ L_V T^{\cdots}{}_{\cdots} := \lim_{\epsilon\to 0} \frac{T^{\cdots}{}_{\cdots}(y(x)) - T'^{\cdots}{}_{\cdots}(y(x))}{\epsilon} . \tag{8.27} $$
Or we can extend it to other tensors by proceeding as in the case of the covariant
derivative, i.e. by demanding the Leibniz rule. The Lie derivative on an arbitrary tensor
is then uniquely determined by its action on scalars and vectors.
In either case, the result can be rewritten in manifestly tensorial form in terms of
covariant derivatives. For example, for a covector one finds
$$ L_V A_\mu = V^\nu\partial_\nu A_\mu + (\partial_\mu V^\nu)A_\nu = V^\nu\nabla_\nu A_\mu + (\nabla_\mu V^\nu)A_\nu . \tag{8.28} $$
The general result is that the Lie derivative of a $(p, q)$-tensor $T$ is, like the covariant
derivative, the sum of three kinds of terms: the directional covariant derivative of $T$
along $V$, $p$ terms with a minus sign, involving the covariant derivative of $V$ contracted
with each of the upper indices, and $q$ terms with a plus sign, involving the covariant
derivative of $V$ contracted with each of the lower indices (note that the plus and minus
signs are interchanged with respect to the covariant derivative). Thus, e.g., the Lie
derivatives of a (0,2) and a (1,2)-tensor are
$$ \begin{aligned}
L_V T_{\mu\nu} &= V^\lambda\nabla_\lambda T_{\mu\nu} + T_{\lambda\nu}\nabla_\mu V^\lambda + T_{\mu\lambda}\nabla_\nu V^\lambda \\
L_V T^\lambda{}_{\mu\nu} &= V^\rho\nabla_\rho T^\lambda{}_{\mu\nu} - T^\rho{}_{\mu\nu}\nabla_\rho V^\lambda + T^\lambda{}_{\rho\nu}\nabla_\mu V^\rho + T^\lambda{}_{\mu\rho}\nabla_\nu V^\rho .
\end{aligned} \tag{8.29} $$
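To illustrate how the Leibniz rule determines the Lie derivative of other tensors from its action on scalars and vectors, here is the short calculation for a covector $A_\mu$ (a sketch added for illustration; $W^\mu$ is an arbitrary auxiliary vector field). One applies $L_V$ to the scalar $A_\mu W^\mu$ in two ways and compares.

```latex
\begin{align*}
L_V(A_\mu W^\mu) &= V^\nu\partial_\nu(A_\mu W^\mu)
   = V^\nu(\partial_\nu A_\mu)W^\mu + A_\mu V^\nu\partial_\nu W^\mu
   && \text{(scalar)} \\
L_V(A_\mu W^\mu) &= (L_V A_\mu)W^\mu + A_\mu\bigl(V^\nu\partial_\nu W^\mu - W^\nu\partial_\nu V^\mu\bigr)
   && \text{(Leibniz rule)} \\
\Rightarrow\quad (L_V A_\mu)W^\mu &= V^\nu(\partial_\nu A_\mu)W^\mu + A_\nu W^\mu\partial_\mu V^\nu \\
\Rightarrow\quad L_V A_\mu &= V^\nu\partial_\nu A_\mu + (\partial_\mu V^\nu)A_\nu
   && \text{(since $W^\mu$ is arbitrary)}
\end{align*}
```

The sign of the second term is opposite to that in the Lie derivative of a vector, in line with the general rule for upper and lower indices stated above.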
22
See e.g. H. Glöckner, Fundamental problems in the theory of infinite-dimensional Lie groups,
arXiv:math/0602078 [math.GR] for an introduction and a survey of the problems that arise when
dealing with or trying to define infinite-dimensional Lie groups.
Remarks:
1. While it is not obvious from the somewhat pedestrian definition of the Lie deriva-
tive that we have given here, the Lie derivative is an extremely natural operation
on tensors. In differential geometry textbooks (and mathematically more sophis-
ticated accounts of general relativity) it is defined as follows:
(c) Define the Lie derivative to be the infinitesimal generator of this action,
$$ L_V T := \frac{d}{dt}\,(\Phi_V^t)^* T\,\Big|_{t=0} . \tag{8.31} $$
While this definition can be shown to be equivalent to the definition of the Lie
derivative given above in terms of coordinates, Taylor expansions etc., this defi-
nition is evidently more compact, more illuminating and somewhat more to the
point. In particular, it makes the tensorial nature of the Lie derivative manifest.
However, in order to arrive at explicit expressions for the Lie derivative of the
components of a tensor, one then still needs to perform a calculation equivalent
to (8.27).
2. The fact that the Lie derivative provides a representation of the Lie algebra of
vector fields by first-order differential operators on the space of $(p, q)$-tensors is
expressed by the identity
$$ [L_V, L_W] = L_{[V,W]} . \tag{8.32} $$
8.5 Lie Derivative of the Metric and Killing Vectors
The above general formula (8.29) for the Lie derivative of a tensor becomes particularly
simple for the metric tensor $g_{\mu\nu}$. The first term is not there (because the metric is
covariantly constant), so the Lie derivative is the sum of two terms (with plus signs)
involving the covariant derivative of $V$,
$$ L_V g_{\mu\nu} = g_{\lambda\nu}\nabla_\mu V^\lambda + g_{\mu\lambda}\nabla_\nu V^\lambda . \tag{8.35} $$
Lowering the index of $V$ with the metric, this can be written more succinctly as
$$ L_V g_{\mu\nu} = \nabla_\mu V_\nu + \nabla_\nu V_\mu . \tag{8.36} $$
The not manifestly covariant avatar of this equation (recall that fundamentally the Lie
derivative requires no notion of a covariant differentiation) is
$$ L_V g_{\mu\nu} = V^\lambda\partial_\lambda g_{\mu\nu} + \partial_\mu V^\lambda g_{\lambda\nu} + \partial_\nu V^\lambda g_{\mu\lambda} . \tag{8.37} $$
A quick alternative way to arrive at this result is to look directly at the infinitesimal
version of the difference
$$ g_{\mu\nu}(y)\,dy^\mu dy^\nu - g_{\mu\nu}(x)\,dx^\mu dx^\nu \tag{8.38} $$
which was the starting point of our discussion in section 8.1 above. Namely, we consider
the infinitesimal coordinate transformation
$$ \begin{aligned}
\delta_V x^\mu = V^\mu \;&\Rightarrow\; \delta_V dx^\mu = dV^\mu = (\partial_\nu V^\mu)\,dx^\nu \\
&\Rightarrow\; \delta_V g_{\mu\nu}(x) = V^\lambda\partial_\lambda g_{\mu\nu}(x) ,
\end{aligned} \tag{8.39} $$
and define the Lie derivative of the metric by the change this operation $\delta_V$ induces in
the line element,
$$ \delta_V (g_{\mu\nu}dx^\mu dx^\nu) \equiv (L_V g_{\mu\nu})\,dx^\mu dx^\nu . \tag{8.40} $$
This leads directly to (8.37) and thus to (8.36).
We are now ready to return to our discussion of isometries (symmetries of the metric).
Evidently, an infinitesimal coordinate transformation is a symmetry of the metric if
$L_V g_{\mu\nu} = 0$. By (8.36) this can be written as (see also (4.66))
$$ V \text{ generates an isometry} \;\Leftrightarrow\; L_V g_{\mu\nu} = 0 \;\Leftrightarrow\; \nabla_\mu V_\nu + \nabla_\nu V_\mu = 0 . \tag{8.41} $$
Vector fields $V$ satisfying this equation are called Killing vectors - not because they
kill the metric but after the 19th century mathematician W. Killing.
The alternative non-covariant way (8.37) of writing the Killing equation makes it manifest
that only components and derivatives of the metric in the $V$-direction enter in this
condition,
$$ \nabla_\mu V_\nu + \nabla_\nu V_\mu = 0 \;\Leftrightarrow\; V^\lambda\partial_\lambda g_{\mu\nu} + \partial_\mu V^\lambda g_{\lambda\nu} + \partial_\nu V^\lambda g_{\mu\lambda} = 0 . \tag{8.42} $$
This is precisely the condition (2.101) we had encountered first in our discussion of first
integrals of motion for the geodesic equation, and which we had already rewritten in
terms of covariant derivatives, as in (8.36) above, in (4.65).
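The non-covariant form of the Killing condition, $V^\lambda\partial_\lambda g_{\mu\nu} + \partial_\mu V^\lambda g_{\lambda\nu} + \partial_\nu V^\lambda g_{\mu\lambda} = 0$, lends itself to a direct numerical test because it only involves partial derivatives. The following is a minimal sketch in plain Python (an illustration, not part of the original notes); the function name `lie_derivative_metric` and the example fields on the unit two-sphere are choices made here, and derivatives are approximated by central differences.

```python
import math

def lie_derivative_metric(g, V, x, h=1e-5):
    """(L_V g)_{ab} = V^c d_c g_ab + (d_a V^c) g_cb + (d_b V^c) g_ac at x.

    g(x) returns the metric as a nested list, V(x) the components V^a;
    partial derivatives are central differences with step h.
    """
    n = len(x)
    gx, Vx = g(x), V(x)
    dg, dV = [], []  # dg[c][a][b] = d_c g_ab,  dV[c][a] = d_c V^a
    for c in range(n):
        xp, xm = list(x), list(x)
        xp[c] += h
        xm[c] -= h
        gp, gm, Vp, Vm = g(xp), g(xm), V(xp), V(xm)
        dg.append([[(gp[a][b] - gm[a][b]) / (2 * h) for b in range(n)]
                   for a in range(n)])
        dV.append([(Vp[a] - Vm[a]) / (2 * h) for a in range(n)])
    return [[sum(Vx[c] * dg[c][a][b] + dV[a][c] * gx[c][b] + dV[b][c] * gx[a][c]
                 for c in range(n))
             for b in range(n)] for a in range(n)]

def sphere(x):
    """Unit two-sphere metric in coordinates x = (theta, phi)."""
    return [[1.0, 0.0], [0.0, math.sin(x[0]) ** 2]]

d_phi = lambda x: [0.0, 1.0]     # rotation about the axis: expected to be Killing
d_theta = lambda x: [1.0, 0.0]   # not Killing, since g_phiphi depends on theta

point = [0.7, 0.3]
print(lie_derivative_metric(sphere, d_phi, point))
print(lie_derivative_metric(sphere, d_theta, point))
```

For `d_theta` the only non-zero component is $(L_V g)_{\phi\phi} = \partial_\theta\sin^2\theta = \sin 2\theta$, which gives a simple analytic value to compare against.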
Since they are associated with symmetries of space time, and since symmetries are
always of fundamental importance in physics, Killing vectors will play an important
role in the following. Our most immediate concern (in section 9, in particular section
9.1) will be with the conserved quantities associated with Killing vectors. Other aspects
of Killing vectors and their interplay with the geometry of a space-time will be discussed
in sections 12 and 13. For now we just note the following simple facts and examples:
1. Note that by virtue of (8.32) Killing vectors form a Lie algebra, i.e. if $V$ and $W$
are Killing vectors, then also $[V, W]$ is a Killing vector,
$$ L_V g_{\mu\nu} = L_W g_{\mu\nu} = 0 \;\Rightarrow\; L_{[V,W]} g_{\mu\nu} = 0 . \tag{8.43} $$
Indeed one has
$$ L_{[V,W]} g_{\mu\nu} = L_V L_W g_{\mu\nu} - L_W L_V g_{\mu\nu} = 0 . \tag{8.44} $$
An explicit proof of this fact will be given later on in section 12.2.
2. The resulting algebra of Killing vectors is the Lie algebra of the isometry group
of the metric. For example, the collection of all Killing vectors of the Minkowski
metric generates the Lie algebra of the Poincare group. Indeed, for the Minkowski
space-time in inertial (Cartesian) coordinates $\xi^a$, i.e. with the constant standard
metric $\eta_{ab}$, the Killing condition simply becomes
$$ \partial_a V_b + \partial_b V_a = 0 , \tag{8.45} $$
which is solved by
$$ V^a = \omega^a{}_b\,\xi^b + \epsilon^a \tag{8.46} $$
where the $\epsilon^a$ are constant parameters and the constant matrices $\omega_{ab}$ satisfy $\omega_{ab} =
-\omega_{ba}$. These are precisely the infinitesimal Lorentz transformations and translations
of the Poincare algebra, as given e.g. in (1.24).
Choosing as a basis for the Killing vectors of Minkowski space the vectors
$$ P_a = \partial_a , \qquad M_{ab} = \xi_a\partial_b - \xi_b\partial_a , \tag{8.47} $$
so that the general Killing vector $V^a$ (8.46) can be expanded as
$$ V = V^a\partial_a = -\tfrac{1}{2}\,\omega^{ab}M_{ab} + \epsilon^a P_a , \tag{8.48} $$
the Lie algebra (algebra of Lie brackets) is given by
$$ \begin{aligned}
[P_a, P_b] &= 0 \\
[M_{ab}, P_c] &= -\eta_{ac}P_b + \eta_{bc}P_a \\
[M_{ab}, M_{cd}] &= \eta_{ad}M_{bc} + \eta_{bc}M_{ad} - \eta_{ac}M_{bd} - \eta_{bd}M_{ac} .
\end{aligned} \tag{8.49} $$
This is of course the Lie algebra of the Poincare group.
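3. Another simple example is provided by the two-sphere: as mentioned before, in
some obvious sense the standard metric on the two-sphere is rotationally invariant.
In particular, with our new terminology we would expect the vector field $\partial_\phi$, i.e.
the vector field with components $V^\phi = 1$, $V^\theta = 0$ to be Killing. Let us check
this. With the metric $d\theta^2 + \sin^2\theta\, d\phi^2$, the components of the corresponding covector $V_\mu$,
obtained by lowering the indices of the vector field $V^\mu$, are
$$ V_\theta = 0 , \qquad V_\phi = \sin^2\theta . \tag{8.50} $$
The Killing condition then splits into three equations, which are indeed satisfied:
$$ \begin{aligned}
\nabla_\theta V_\theta &= \partial_\theta V_\theta - \Gamma^\mu_{\theta\theta}V_\mu = -\Gamma^\phi_{\theta\theta}\sin^2\theta = 0 \\
\nabla_\theta V_\phi + \nabla_\phi V_\theta &= \partial_\theta V_\phi - \Gamma^\mu_{\theta\phi}V_\mu + \partial_\phi V_\theta - \Gamma^\mu_{\phi\theta}V_\mu \\
 &= 2\sin\theta\cos\theta - 2\cot\theta\,\sin^2\theta = 0 \\
\nabla_\phi V_\phi &= \partial_\phi V_\phi - \Gamma^\mu_{\phi\phi}V_\mu = 0 .
\end{aligned} \tag{8.51} $$
Alternatively, using the non-covariant form (8.42) of the Killing equation, one
finds, since $V^\phi = 1$, $V^\theta = 0$ are constant, that the Killing equation reduces to
$$ \partial_\phi g_{\mu\nu} = 0 , \tag{8.52} $$
which is obviously satisfied. This is clearly a simpler and more efficient argument.
By solving the Killing equations on $S^2$, in addition to $V_{(3)} = \partial_\phi$ one finds two
other linearly independent Killing vectors $V_{(1)}$ and $V_{(2)}$, namely (in one convenient
normalisation)
$$ V_{(1)} = \sin\phi\,\partial_\theta + \cot\theta\cos\phi\,\partial_\phi , \qquad V_{(2)} = \cos\phi\,\partial_\theta - \cot\theta\sin\phi\,\partial_\phi . \tag{8.53} $$
Note that $V_{(3)}$ evidently relates these two other Killing vectors by
$$ [V_{(3)}, V_{(1)}] = V_{(2)} , \qquad [V_{(3)}, V_{(2)}] = -V_{(1)} . \tag{8.54} $$
One also finds $[V_{(1)}, V_{(2)}] = V_{(3)}$.
This is the Lie algebra of infinitesimal rotations, i.e. of the rotation group $SO(3)$,
which is the isometry group of the standard metric on $S^2$.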
4. In general, if the components of the metric are all independent of a particular
coordinate, say $y$, then by the above argument $V = \partial_y$ is a Killing vector,
$$ \partial_y g_{\mu\nu} = 0 \;\Rightarrow\; V = \partial_y \text{ is a Killing vector} . \tag{8.57} $$
Such a coordinate system, in which one of the coordinate lines agrees with the
integral curves of the Killing vector, is said to be adapted to the Killing vector (or
isometry) in question. For any given Killing vector $V$ one can always introduce
local coordinates such that $V$ takes the form $V = \partial_y$. It suffices to choose as $y$
the parameter along the integral curves of $V$, using the remaining coordinates to
label the individual integral curves.
5. If one has two Killing vector fields $V_{(1)}$ and $V_{(2)}$, then the necessary and sufficient
condition that one can introduce local coordinates $(y^1, y^2, \ldots)$ that are adapted
to both of them, i.e. such that $V_{(k)} = \partial_{y^k}$, is that they commute as differential
operators, i.e. that they have vanishing Lie bracket,
$$ [V_{(1)}, V_{(2)}] = 0 . \tag{8.58} $$
6. As we did in section 2.6, one can also take the above equations (8.57) as the
starting point for what one means by a symmetry of the metric (isometry) and
then simply transform it to an arbitrary coordinate system by requiring that it
transforms as a (0, 2)-tensor. Then one arrives at the Killing condition in the form
(8.42).
7. Because by definition the geometry of a space-time does not change along the
orbits of a Killing vector, it is intuitively obvious that in particular the norm of
a Killing vector V should be constant along (the orbits of) V , and this is indeed
easy to prove. Here are two simple proofs of this statement, one using covariant
derivatives and the other using Lie derivatives:
V (V V ) = V (V V ) = 2V V V = 0 (8.59)
by anti-symmetry of V .
(b) Using Lie derivatives, one calculates
V (V V ) = LV (g V V )
(8.60)
= (LV g )V V + 2g (LV V )V = 0
231
8. An occasionally useful result that provides an interesting relation between geodesics
and Killing vectors (different from the one to be discussed below in section 9.1)
and that is straightforward to establish, is the fact that a Killing vector field is
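geodesic if and only if it is of constant length. This follows by contracting the
Killing equation with $V^\nu$ and writing
$$ 0 = V^\nu (\nabla_\mu V_\nu + \nabla_\nu V_\mu) = V^\nu \nabla_\nu V_\mu + \tfrac{1}{2} \partial_\mu (V_\nu V^\nu) . \qquad (8.61) $$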
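9. As an aside: a minimal variation of this proof establishes the same result for
gradient vector fields $V_\mu = \partial_\mu S$ instead of Killing vector fields, namely that a
gradient vector field is geodesic if and only if it is of constant length. Since a
gradient vector field satisfies
$$ V_\mu = \partial_\mu S \quad \Rightarrow \quad \nabla_\mu V_\nu - \nabla_\nu V_\mu = 0 , \qquad (8.62) $$
one has
$$ 0 = V^\nu (\nabla_\nu V_\mu - \nabla_\mu V_\nu) = V^\nu \nabla_\nu V_\mu - \tfrac{1}{2} \partial_\mu (V_\nu V^\nu) . \qquad (8.63) $$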
It is straightforward to extend the Lie derivative to tensor densities. Given the fact
expressed in (3.57) that any tensor density can be written as a tensor times a suitable
power of the determinant $g$ of the metric, all we need to know is the Lie derivative
acting on $g$. For this we can use the general variational formula (4.74) to deduce
$$ L_V g = g\, g^{\mu\nu} L_V g_{\mu\nu} . \qquad (8.64) $$
Thus
$$ L_V g = g\, g^{\mu\nu} (\nabla_\mu V_\nu + \nabla_\nu V_\mu) = 2 g \nabla_\mu V^\mu , \qquad (8.65) $$
and for the ubiquitous volume element $\sqrt{g}$ one finds
$$ L_V \sqrt{g} = \sqrt{g}\, \nabla_\mu V^\mu . \qquad (8.66) $$
It follows for example that for a scalar density of weight 1, $\sqrt{g} F$ with $F$ a scalar, one has
$$ L_V (\sqrt{g} F) = \sqrt{g} (V^\mu \partial_\mu F + F \nabla_\mu V^\mu) = \sqrt{g}\, \nabla_\mu (V^\mu F) . \qquad (8.67) $$
Using (4.49), this can also be written as a total derivative
$$ L_V (\sqrt{g} F) = \partial_\mu (\sqrt{g} V^\mu F) . \qquad (8.68) $$
This identity lies at the heart of the general covariance of actions built from scalars or
scalar densities, and we will discuss this aspect in more detail in sections 19.6 and 21.2.
Analogously, the Lie derivative can be extended to tensor densities of any rank and
weight.
9 Killing Vectors, Symmetries and Conserved Charges
We are used to the fact that symmetries lead to conserved quantities (Noether's theorem).
For example, in classical mechanics, the angular momentum of a particle moving
in a rotationally symmetric gravitational field is conserved. In the present context, the
concept of symmetries of a gravitational field is replaced by symmetries of the met-
ric, and we therefore expect conserved charges associated with the presence of Killing
vectors. Here are the two most important classes of examples of this phenomenon:
$$ Q_K = K_\mu \dot{x}^\mu \qquad (9.1) $$
Note that this is precisely the conserved quantity $Q_V$ (2.102) with $V \to K$ deduced
from Noether's theorem and the variational principle for geodesics in section 2.6.
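As a numerical illustration of the conserved charge (9.1), here is a minimal sketch (not part of the original notes) for the unit 2-sphere with the Killing vector $K = \partial_\phi$, for which $Q_K = \sin^2\theta\, \dot\phi$. The function names, the initial data and the choice of a fourth-order Runge-Kutta integrator are illustrative assumptions.

```python
import math


def geodesic_rhs(state):
    """Geodesic equations on the unit 2-sphere, state = (theta, phi, dtheta, dphi)."""
    theta, _, dtheta, dphi = state
    ddtheta = math.sin(theta) * math.cos(theta) * dphi ** 2
    ddphi = -2.0 * math.cos(theta) / math.sin(theta) * dtheta * dphi
    return (dtheta, dphi, ddtheta, ddphi)


def rk4_step(state, h):
    """One classical Runge-Kutta step of size h in the affine parameter."""
    def shifted(k, factor):
        return tuple(s + factor * ki for s, ki in zip(state, k))

    k1 = geodesic_rhs(state)
    k2 = geodesic_rhs(shifted(k1, 0.5 * h))
    k3 = geodesic_rhs(shifted(k2, 0.5 * h))
    k4 = geodesic_rhs(shifted(k3, h))
    return tuple(s + h / 6.0 * (a + 2.0 * b + 2.0 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))


def killing_charge(state):
    """Q_K = K_mu dx^mu/dtau for K = d/dphi, i.e. sin^2(theta) * dphi."""
    theta, _, _, dphi = state
    return math.sin(theta) ** 2 * dphi


def charge_drift(state, h=1e-3, steps=5000):
    """Largest deviation of Q_K from its initial value along the geodesic."""
    q0 = killing_charge(state)
    worst = 0.0
    for _ in range(steps):
        state = rk4_step(state, h)
        worst = max(worst, abs(killing_charge(state) - q0))
    return worst
```

Calling `charge_drift((1.0, 0.0, 0.3, 0.7))` follows a great circle that stays away from the poles; the drift of $Q_K$ should then be at the level of the integration error only.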
9.2 Conformal Killing Vectors and Conserved Charges
Another situation of interest occurs when one has a theory invariant under Weyl rescal-
ings and thus a traceless energy-momentum tensor (section 6.7). In that case one can
associate conserved currents not only to Killing vector fields but also to conformal
Killing vectors $C^\mu$, satisfying
$$ \nabla_\mu C_\nu + \nabla_\nu C_\mu = 2 \omega(x) g_{\mu\nu} \qquad (9.5) $$
for some function $\omega(x)$. Such conformal Killing vectors generate coordinate
transformations that leave the metric invariant up to an overall (Weyl) rescaling.
If the theory is invariant under such Weyl rescalings, then the energy-momentum tensor
is traceless and there should also be a corresponding conserved current. Indeed, we have
$$ J_C^\mu = T^{\mu\nu} C_\nu \qquad (9.6) $$
and
$$ \nabla_\mu J_C^\mu = (\nabla_\mu T^{\mu\nu}) C_\nu + T^{\mu\nu} \nabla_\mu C_\nu = 0 + \tfrac{1}{2} T^{\mu\nu} (\nabla_\mu C_\nu + \nabla_\nu C_\mu) = \omega(x) T^{\mu\nu} g_{\mu\nu} = 0 . \qquad (9.7) $$
We will look at the example of the conformal Killing vectors of Minkowski space in more
detail in section 9.3 below.
There is also a counterpart of statement 1 (conserved charges for geodesics) in the case
of conformal Killing vectors, namely for null geodesics (this condition replacing the
assumption in statement 2 that the energy-momentum tensor is traceless):
1'. Let $C^\mu$ be a conformal Killing vector field, and let $x^\mu(\tau)$ be a null geodesic. Then
the quantity
$$ Q_C = C_\mu \dot{x}^\mu \qquad (9.8) $$
is constant along the geodesic. Indeed, repeating the calculation leading to
statement 1, for a null geodesic one has
$$ \frac{d}{d\tau} Q_C = \frac{d}{d\tau} (C_\mu \dot{x}^\mu) = \tfrac{1}{2} (\nabla_\mu C_\nu + \nabla_\nu C_\mu) \dot{x}^\mu \dot{x}^\nu = \omega(x) g_{\mu\nu} \dot{x}^\mu \dot{x}^\nu = 0 . \qquad (9.9) $$
We will make use of (9.9) in the discussion of the cosmological redshift in section
33.7.
As an aside, note that if $K^\mu$ is a true Killing vector for a metric $g_{\mu\nu}$, say, then it is at
least a conformal Killing vector for any conformally rescaled metric
$$ \tilde{g}_{\mu\nu} = e^{2\omega(x)} g_{\mu\nu} . \qquad (9.10) $$
Indeed, writing the Killing equation in the non-covariant form (8.42) (in order to avoid
having to determine the covariant derivatives or Christoffel symbols of conformally
rescaled metrics)
$$ K^\lambda \partial_\lambda g_{\mu\nu} + \partial_\mu K^\lambda g_{\lambda\nu} + \partial_\nu K^\lambda g_{\mu\lambda} = 0 \qquad (9.11) $$
and expressing this in terms of the metric $\tilde{g}_{\mu\nu}$, one finds
$$ K^\lambda \partial_\lambda \tilde{g}_{\mu\nu} + \partial_\mu K^\lambda \tilde{g}_{\lambda\nu} + \partial_\nu K^\lambda \tilde{g}_{\mu\lambda} = 2 (K^\lambda \partial_\lambda \omega) \tilde{g}_{\mu\nu} . \qquad (9.12) $$
Thus $K^\mu$ will be a true Killing vector field for the rescaled metric if the conformal
factor $\omega(x)$ is constant along the orbits (integral curves) of $K^\mu$, and will otherwise be
a conformal Killing vector field. Conformal Killing vector fields that do not arise from
true Killing vector fields in this way are called essential. In the Riemannian case it is
known that (under some technical assumptions) metrics admitting essential conformal
vector fields are conformal to the standard metric on the sphere or the Euclidean space.
In the pseudo-Riemannian (Lorentzian signature) case the situation turns out to be
quite different (with an interesting connection with the plane wave metrics that are the
subject of section 42).$^{23}$
As an example, let us consider 4-dimensional Minkowski space. In that case there are
5 conformal Killing vectors (in addition to the 10 true Killing vectors (8.46) generating
Poincaré transformations).
$$ D = \xi^a \partial_a : \quad \partial_a D_b + \partial_b D_a = 2 \eta_{ab} \qquad (9.14) $$
of dilatations,
In this case $\omega(x) = 1$ is constant, and such a conformal symmetry is called a
homothety (see also section 9.4 below). Provided that one has a symmetric traceless
conserved energy-momentum tensor, one has a corresponding conserved current
$$ J_D^a = T^{ab} D_b = T^{ab} \xi_b . \qquad (9.16) $$
The dilatation and the special conformal transformation enlarge the Poincaré algebra
(8.49) of translations and Lorentz transformations to the conformal algebra. Adding the
generators $D$ and $C_b \equiv C^{(b)}$ to the generators $P_a$ and $M_{ab}$ of the Poincaré algebra, one
finds the extended algebra
$$ \begin{aligned}
[P_a, P_b] &= 0 \\
[M_{ab}, P_c] &= -\eta_{ac} P_b + \eta_{bc} P_a \\
[M_{ab}, M_{cd}] &= \eta_{ad} M_{bc} + \eta_{bc} M_{ad} - \eta_{ac} M_{bd} - \eta_{bd} M_{ac} \\
[D, P_a] &= -P_a \\
[M_{ab}, D] &= 0 \\
[P_a, C_b] &= 2 (\eta_{ab} D - M_{ab}) \\
[M_{ab}, C_c] &= -\eta_{ac} C_b + \eta_{bc} C_a \\
[D, C_a] &= C_a \\
[C_a, C_b] &= 0 .
\end{aligned} \qquad (9.22) $$
Here
• the fourth expresses the obvious fact that $P_a = \partial_a$ is homogeneous of degree (-1)
under the dilatation generated by $D$;
• the eighth says that $C_a$ is homogeneous of degree (+1) under the dilatation
generated by $D$;
• the last relation says that special conformal transformations generate an Abelian
algebra (corresponding to the fact that they generate inverted translations).
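Some of the relations in (9.22) can be spot-checked numerically. The sketch below is not from the original notes and rests on an assumption: the generators are realised as the vector fields $P_a = \partial_a$, $D = \xi^c \partial_c$, $M_{ab} = \xi_a \partial_b - \xi_b \partial_a$ and $C_b = 2 \xi_b \xi^c \partial_c - \xi^2 \partial_b$ on 4-dimensional Minkowski space with $\eta = \mathrm{diag}(-1, +1, +1, +1)$. Since all components are polynomials of degree at most two, central differences give their derivatives exactly up to rounding.

```python
DIM = 4
ETA = [-1.0, 1.0, 1.0, 1.0]  # diagonal entries of eta_ab (assumed signature)


def eta(a, b):
    return ETA[a] if a == b else 0.0


def lower(xi):
    """xi_a = eta_ab xi^b."""
    return [ETA[a] * xi[a] for a in range(DIM)]


def P(a):
    return lambda xi: [1.0 if c == a else 0.0 for c in range(DIM)]


def D(xi):
    return list(xi)


def M(a, b):
    def field(xi):
        xl = lower(xi)
        return [xl[a] * (c == b) - xl[b] * (c == a) for c in range(DIM)]
    return field


def C(b):
    def field(xi):
        xl = lower(xi)
        xi_squared = sum(xl[d] * xi[d] for d in range(DIM))
        return [2.0 * xl[b] * xi[c] - xi_squared * (c == b) for c in range(DIM)]
    return field


def bracket(X, Y, xi, eps=1e-3):
    """Components [X, Y]^c = X^a d_a Y^c - Y^a d_a X^c at the point xi."""
    def directional(F, v):
        plus = F([x + eps * vi for x, vi in zip(xi, v)])
        minus = F([x - eps * vi for x, vi in zip(xi, v)])
        return [(p - m) / (2.0 * eps) for p, m in zip(plus, minus)]

    x_at, y_at = X(xi), Y(xi)
    return [u - w for u, w in zip(directional(Y, x_at), directional(X, y_at))]


def max_residual(xi):
    """Largest violation of four of the relations in (9.22) at the point xi."""
    worst = 0.0
    for a in range(DIM):
        # [D, P_a] = -P_a and [D, C_a] = +C_a
        checks = ((bracket(D, P(a), xi), [-p for p in P(a)(xi)]),
                  (bracket(D, C(a), xi), C(a)(xi)))
        for lhs, rhs in checks:
            worst = max(worst, max(abs(l - r) for l, r in zip(lhs, rhs)))
        for b in range(DIM):
            # [P_a, C_b] = 2 (eta_ab D - M_ab)
            lhs = bracket(P(a), C(b), xi)
            rhs = [2.0 * (eta(a, b) * d - m)
                   for d, m in zip(D(xi), M(a, b)(xi))]
            worst = max(worst, max(abs(l - r) for l, r in zip(lhs, rhs)))
            # [C_a, C_b] = 0
            worst = max(worst, max(abs(v) for v in bracket(C(a), C(b), xi)))
    return worst
```

For a generic sample point such as `[0.3, -1.2, 0.7, 2.0]`, `max_residual` should return a value at rounding-error level, which is what one expects if the sign conventions of (9.22) match the assumed realisation.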
Thus the only relation that is not a priori obvious is the sixth, $[P_a, C_b] = 2 (\eta_{ab} D - M_{ab})$,
but this follows simply from
It is perhaps also not obvious at first sight that this conformal Lie algebra is isomorphic
to the Lie algebra of SO(2, 4), or SO(2, D) in D space-time dimensions. This is the
group of rotations in the (D+2)-dimensional pseudo-Euclidean space $\mathbb{R}^{2,D}$ preserving the
metric $\eta_{AB}$ with signature $(-+\ldots+-)$, i.e. the indices have the range $A = 0, 1, \ldots, D+1$,
and $\eta_{DD} = -\eta_{(D+1)(D+1)} = +1$. Its Lie algebra is just the obvious counterpart of the
D-dimensional Lorentz Lie algebra (8.49), namely
Concretely, with $z^A$ Cartesian coordinates on $\mathbb{R}^{2,D}$, this Lie algebra can be realised as
the algebra of rotational Killing vectors of the metric $\eta_{AB}$, given by
Returning to the conformal algebra, it is now easy to see that with the identification
the Lie algebra relations (9.22) and (9.24) are mapped precisely into each other.
Thus, when one has a conserved, symmetric, traceless energy-momentum tensor, one
can construct conserved currents for the entire conformal group and thus has a (at least
classically) conformally invariant field theory (or conformal field theory for short).
As we have seen in section 6.7, when the matter action is invariant under Weyl rescalings
of the metric alone, the covariant energy-momentum tensor is conserved, symmetric and
traceless, and thus the specialisation of the theory to Minkowski space should define a
conformal field theory.
There is an interesting twist to this story when one also needs to transform the matter
fields (and modify the action by non-minimal couplings to the gravitational field) which
will be discussed in section 21.3.
Finally, let us consider the special case that the conformal factor $\omega(x)$ in (9.5) is constant,
$\omega(x) = \omega_0$,
$$ \nabla_\mu C_\nu + \nabla_\nu C_\mu = 2 \omega_0 g_{\mu\nu} \qquad (9.27) $$
In that case, the transformation generated by the conformal Killing vector is called a
homothety.
$$ D = \xi^a \partial_a \qquad (9.28) $$
with $A_{ab}(u)$ an arbitrary function of $u$. These metrics have the homothety
for any choice of plane wave profile $A_{ab}(u)$, and this homothety is generated by
$$ C = 2 v \partial_v + x^a \partial_{x^a} . \qquad (9.31) $$
Whenever one has such a homothety, there is an explicitly $\tau$-dependent conserved
quantity even for non-null geodesics:
$$ Q_C = C_\mu \dot{x}^\mu - \omega_0 \tau\, g_{\mu\nu} \dot{x}^\mu \dot{x}^\nu \qquad (9.32) $$
is constant along the geodesic. Indeed, repeating the calculation leading to
statement 1, and using the fact that $g_{\mu\nu} \dot{x}^\mu \dot{x}^\nu$ is constant, one finds
$$ \frac{d}{d\tau} Q_C = \omega_0 g_{\mu\nu} \dot{x}^\mu \dot{x}^\nu - \omega_0 g_{\mu\nu} \dot{x}^\mu \dot{x}^\nu = 0 . \qquad (9.33) $$
Remarks:
1. Note that for a null geodesic (9.32) reduces to the conserved charge $C_\mu \dot{x}^\mu$ (9.8) in
1' above (which does not explicitly depend on $\tau$).
2. The existence of this constant of motion can also be understood from the Noether
theorem (applied now to transformations of the fields $x^\mu(\tau)$ and the coordinate
$\tau$). Indeed, when one has a homothety, one has
g dx dx 2 g dx dx , (9.34)
2 d g x x d g x x , (9.35)
$$ Q_D = \eta_{ab} \xi^a \dot{\xi}^b - \tau\, \eta_{ab} \dot{\xi}^a \dot{\xi}^b . \qquad (9.36) $$
$$ p^a p_a = -m^2 , \qquad (9.39) $$
9.5 Conserved Charges from Killing Tensors and Killing-Yano Tensors
When a metric possesses sufficiently many symmetries (Killing vectors), the geodesic
equations (or the associated Hamilton-Jacobi equation) or, say, the Klein-Gordon equa-
tion or some other field equation in that background are separable and can hence be
reduced to quadratures of ordinary differential equations. It is not uncommon, how-
ever, in particular in the context of black hole physics, to encounter space-times in which
these equations can be separated even though there appear not to be enough isometries
(symmetries of the metric) to explain this. In many cases, this phenomenon can be
explained via (or deduced from) the existence of additional (hidden) symmetries of the
problem, associated not to Killing vectors but to certain higher-rank generalisations
thereof. Most prominent among them are (totally symmetric) Killing tensors (occasionally
also called Killing-Stäckel tensors), and (totally anti-symmetric) Killing-Yano
tensors.
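To set the stage, recall from above that a Killing vector satisfies
$$ \nabla_{(\mu} K_{\nu)} = 0 \quad \Leftrightarrow \quad \nabla_\mu K_\nu = \nabla_{[\mu} K_{\nu]} \qquad (9.41) $$
and that using the geodesic equation $\dot{x}^\mu \nabla_\mu \dot{x}^\nu = 0$ this leads to a first integral
$Q_K = K_\mu \dot{x}^\mu$ of the geodesic equations of motion via the simple chain of manipulations
$$ \frac{d}{d\tau} (K_\mu \dot{x}^\mu) = \dot{x}^\nu \nabla_\nu (K_\mu \dot{x}^\mu) = \dot{x}^\nu \dot{x}^\mu \nabla_\nu K_\mu = 0 \qquad (9.42) $$
by symmetry of $\dot{x}^\mu \dot{x}^\nu$ and anti-symmetry of $\nabla_\nu K_\mu$.
This has the following two immediate (and, as it turns out, actually useful in practice)
generalisations:
This is evidently one possible generalisation of the Killing vector equation (9.41)
to higher rank tensors (generalising the first formulation in (9.41)). Then
$$ Q_K = K_{\mu_1 \ldots \mu_n} \dot{x}^{\mu_1} \ldots \dot{x}^{\mu_n} \qquad (9.44) $$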
2. Killing-Yano Tensors
Let $Y_{\mu_1 \ldots \mu_n}$ be a totally anti-symmetric rank-$n$ tensor satisfying the Killing-Yano
equation
$$ \nabla_{(\mu} Y_{\mu_1) \mu_2 \ldots \mu_n} = 0 \quad \Leftrightarrow \quad \nabla_\mu Y_{\mu_1 \ldots \mu_n} = \nabla_{[\mu} Y_{\mu_1 \ldots \mu_n]} \qquad (9.46) $$
This is evidently another possible generalisation of the Killing vector equation
(9.41) to higher rank tensors. Then the tensorial charges
Remarks:
1. Trivial examples of Killing tensors are the metric $g_{\mu\nu}$ (whose associated conserved
quantity $g_{\mu\nu} \dot{x}^\mu \dot{x}^\nu$ we already know), and products of Killing vectors $K_{\mu_1} \ldots K_{\mu_n}$
which do not yield any new independent constants of motion beyond those
provided by the Killing vectors. New constants of motion are associated with Killing
tensors that cannot be constructed from the metric and the Killing vectors alone.
Trivial Killing-Yano tensors are Killing vectors $K_\mu$ and the Levi-Civita tensor (in
four dimensions $\epsilon_{\mu\nu\rho\sigma}$).
2. There are interesting relations between Killing-Yano tensors and Killing tensors.
For example, it is not difficult to check that if $Y_{\mu\nu}$ is a rank-2 Killing-Yano tensor,
then its square
$$ K_{\mu\nu} = Y_{\mu\lambda} Y_\nu{}^\lambda \qquad (9.49) $$
(which is symmetric) is a rank-2 Killing tensor (and squares of trivial Killing-Yano
tensors give rise to trivial Killing tensors, as in $K_{\mu\nu} \sim K_\mu K_\nu$). Indeed, the totally
symmetrised covariant derivative of this $K_{\mu\nu}$ can be expressed in terms of partially
symmetrised covariant derivatives of $Y_{\mu\nu}$, but by definition of a Killing-Yano tensor
its covariant derivatives are totally anti-symmetric, and hence
$$ \nabla_\mu Y_{\nu\lambda} = \nabla_{[\mu} Y_{\nu\lambda]} \quad \Rightarrow \quad \nabla_{(\mu} K_{\nu\lambda)} = 0 . \qquad (9.50) $$
$$ \nabla_{(\mu} C_{\nu)} = \omega(x) g_{\mu\nu} , \qquad (9.51) $$
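and just as the latter these turn out to be useful for massless particles or fields.
For example, a rank 2 conformal Killing tensor satisfies an equation of the form
$$ \nabla_{(\mu} C_{\nu\lambda)} = g_{(\mu\nu} V_{\lambda)} \qquad (9.52) $$
for some (co-)vector field $V_\mu$. Repeating the calculation (9.45) in the case at hand
for the quantity $Q_C = C_{\mu\nu} \dot{x}^\mu \dot{x}^\nu$, one finds
$$ \frac{d}{d\tau} Q_C = (\nabla_{(\mu} C_{\nu\lambda)}) \dot{x}^\mu \dot{x}^\nu \dot{x}^\lambda = g_{\mu\nu} V_\lambda \dot{x}^\mu \dot{x}^\nu \dot{x}^\lambda \qquad (9.53) $$
which evidently vanishes for null geodesics ($g_{\mu\nu} \dot{x}^\mu \dot{x}^\nu = 0$).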
4. Historically, the discovery of (conformal) Killing and Killing-Yano tensors for the
Kerr metric, the metric describing a rotating black hole (see section 29.1) and
their relation to the separability of the geodesic and field equations in the Kerr
background played a decisive role in the development of the subject.$^{24}$
$^{24}$ For more information about and examples and applications of Killing(-Yano) tensors, see e.g. section
35.3 of H. Stephani, D. Kramer, M. MacCallum, C. Hoenselaers, E. Herlt, Exact Solutions of Einstein's
Field Equations - Second Edition or the articles O. Santillan, Killing-Yano tensors and some applications,
arXiv:1108.0149 [hep-th], F. Larsen, C. Keeler, Separability of Black Holes in String Theory,
arXiv:1207.5928 [hep-th] and the references therein.
10 Curvature II: Geometry and Curvature
In this section, we will first discuss two properties of the Riemann curvature tensor
that illustrate its geometric significance and thus, a posteriori, justify equating the
commutator of covariant derivatives with the intuitive concept of curvature. These
properties are
the fact that the space-time metric is equivalent to the (in an obvious sense flat)
Minkowski metric if and only if the Riemann curvature tensor vanishes.
We then briefly discuss some other general aspects of the relation between geometry
and curvature (while the interplay between geodesics and curvature and Killing vectors
and curvature will be discussed in sections 11 and 12 respectively).
The Riemann curvature tensor and its relatives, introduced above, measure the intrinsic
geometry and curvature of a space or space-time. This means that they can be calculated
by making experiments and measurements in the space itself. Such experiments might
involve things like checking if the interior angles of a triangle add up to $\pi$ or not.
This intrinsic geometry and curvature described above should be contrasted with the
extrinsic geometry which depends on how the space may be embedded in some larger
space. As we have no intention of embedding space-time into something higher dimen-
sional, we will mainly be concerned with intrinsic geometry in the following. However,
if you would for example be interested in the properties of spacelike hypersurfaces in
space-time, then aspects of both intrinsic and extrinsic geometry of that hypersurface
would be relevant. See section 17 for some further comments on this.
Let us return to intrinsic geometry. An even better method, the subject of this section,
to determine the curvature is to check the properties of parallel transport. The tell-tale
sign (or smoking gun) of the presence of curvature is the fact that parallel transport is
path dependent, i.e. that parallel transporting a vector V from a point A to a point B
along two different paths will in general produce two different vectors at B. Another
way of saying this is that parallel transporting a vector around a closed loop at A will
in general produce a new vector at A which differs from the initial vector.
This is easy to see in the case of the two-sphere, for which we also worked out explicitly
the parallel transport in section 4.9 (see Figure 8). Since all the great circles on a
two-sphere are geodesics, in particular the segments N-C, N-E, and E-C in the figure,
we know that in order to parallel transport a vector along such a line we just need to
make sure that its length and the angle between the vector and the geodesic line are
constant. Thus imagine a vector 1 at the north pole N, pointing downwards along the
line N-C-S. First parallel transport this along N-C to the point C. There we will obtain
the vector 2, pointing downwards along C-S. Alternatively imagine parallel transporting
the vector 1 first to the point E. Since the vector has to remain at a constant (right)
angle to the line N-E, at the point E parallel transport will produce the vector 3 pointing
westwards along E-C. Now parallel transporting this vector along E-C to C will produce
the vector 4 at C. This vector clearly differs from the vector 2 that was obtained by
parallel transporting along N-C instead of N-E-C.
To illustrate the claim about closed loops above, imagine parallel transporting vector 1
along the closed loop N-E-C-N from N to N. In order to complete this loop, we still have
to parallel transport vector 4 back up to N. Clearly this will give a vector, not indicated
in the figure, different from (and pointing roughly at a right angle to) the vector 1 we
started off with.
The precise statement regarding the relation between the path dependence of parallel
transport and the presence of curvature is the following. If one parallel transports a
covector $V_\mu$ (I use a covector instead of a vector only to save myself a few minus signs
here and there) along a closed infinitesimal loop $x^\mu(\tau)$ with, say, $x(\tau_0) = x(\tau_1) = x_0$,
then one has
$$ V_\mu(\tau_1) - V_\mu(\tau_0) = \tfrac{1}{2} \left( \oint x^\rho dx^\nu \right) R^\sigma{}_{\mu\rho\nu}(x_0) V_\sigma(\tau_0) . \qquad (10.1) $$
Thus an arbitrary vector V will not change under parallel transport around an arbitrary
small loop at x0 only if the curvature tensor at x0 is zero. This can of course be extended
to finite loops, but the important point is that in order to detect curvature at a given
point one only requires parallel transport along infinitesimal loops.
Before turning to a proof of this result, I just want to note that intuitively it can be
understood directly from the definition of the curvature tensor (7.2). Imagine that the
infinitesimal loop is actually a tiny parallelogram made up of the coordinate lines $x^1$
and $x^2$. Parallel transport along $x^1$ is governed by the equation $\nabla_1 V^\mu = 0$, that along
$x^2$ by $\nabla_2 V^\mu = 0$. The fact that parallel transporting first along $x^1$ and then along $x^2$
can be different from doing it the other way around is precisely the statement that $\nabla_1$
and $\nabla_2$ do not commute, i.e. that some of the components $R^\mu{}_{\nu 12}$ of the curvature tensor
are non-zero.
For sufficiently small (infinitesimal) loops, we can expand the Christoffel symbols as
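The linear term in the expansion of $V_\mu(\tau)$ arises from the zeroth order contribution
$\Gamma^\lambda_{\mu\nu}(x_0)$ in the first order (single integral) term in (10.4),
$$ [V_\mu(\tau_1) - V_\mu(\tau_0)]_{(1)} = \Gamma^\lambda_{\mu\nu}(x_0) V_\lambda(\tau_0) \left( \int_{\tau_0}^{\tau_1} d\tau\, \dot{x}^\nu(\tau) \right) . \qquad (10.6) $$
Now the important observation is that, for a closed loop, the integral in brackets is zero,
$$ \int_{\tau_0}^{\tau_1} d\tau\, \dot{x}^\nu(\tau) = x^\nu(\tau_1) - x^\nu(\tau_0) = 0 . \qquad (10.7) $$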
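Thus the change in $V_\mu(\tau)$, when transported along a small loop, is at least of second
order. Such second order terms arise in two different ways, from the first order term
in the expansion of $\Gamma^\lambda_{\mu\nu}(x)$ in the first order term in (10.4), and from the zeroth order
terms $\Gamma^\lambda_{\mu\nu}(x_0)$ in the quadratic (double integral) term in (10.4),
$$ \begin{aligned}
[V_\mu(\tau_1) - V_\mu(\tau_0)]_{(2)} = {} & (\partial_\rho \Gamma^\lambda_{\mu\nu})(x_0) V_\lambda(\tau_0) \left( \int_{\tau_0}^{\tau_1} d\tau\, (x(\tau) - x_0)^\rho \dot{x}^\nu(\tau) \right) \\
& + (\Gamma^\lambda_{\sigma\rho} \Gamma^\sigma_{\mu\nu})(x_0) V_\lambda(\tau_0) \int_{\tau_0}^{\tau_1} d\tau \int_{\tau_0}^{\tau} d\tau'\, \dot{x}^\rho(\tau') \dot{x}^\nu(\tau)
\end{aligned} \qquad (10.8) $$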
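The final observation we need is that the remaining integral is anti-symmetric in the
indices $\rho$, $\nu$, which follows immediately from
$$ \int_{\tau_0}^{\tau_1} d\tau \left( x^\rho(\tau) \dot{x}^\nu(\tau) + x^\nu(\tau) \dot{x}^\rho(\tau) \right) = \int_{\tau_0}^{\tau_1} d\tau\, \frac{d}{d\tau} \left( x^\rho(\tau) x^\nu(\tau) \right) = 0 . \qquad (10.11) $$
It now follows from (10.10) and the definition of the Riemann tensor that
$$ V_\mu(\tau_1) - V_\mu(\tau_0) = \tfrac{1}{2} \left( \oint x^\rho dx^\nu \right) R^\sigma{}_{\mu\rho\nu}(x_0) V_\sigma(\tau_0) . \qquad (10.12) $$
Simply by raising and lowering of the indices, and using the symmetry properties of
the Riemann tensor, we can deduce that the corresponding equation for the parallel
transport of vectors is
$$ V^\mu(\tau_1) - V^\mu(\tau_0) = -\tfrac{1}{2} \left( \oint x^\rho dx^\nu \right) R^\mu{}_{\sigma\rho\nu}(x_0) V^\sigma(\tau_0) . \qquad (10.13) $$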
As an example, recall that in section 4.9 we already determined explicitly the parallel
transport of vectors on the 2-sphere along the circles with fixed $\theta = \theta_0$. Choosing $\theta_0$
infinitesimal corresponds to an infinitesimal loop around the north pole. Expanding the
result (4.114) for small $\theta_0$, in particular using
one finds complete agreement between (10.13) and the components of the Riemann
tensor of the 2-sphere, determined in (7.57),
$$ R^\theta{}_{\phi\theta\phi} = \sin^2\theta , \qquad R^\phi{}_{\theta\phi\theta} = 1 , \qquad (10.15) $$
evaluated for $\theta_0 \to 0$. In verifying this, some care should be taken with the fact that $\theta = 0$
is a coordinate singularity so that one should never strictly set $\theta_0 = 0$. Alternatively,
and to be on the safe side, one can rewrite (10.13) as an equation for orthonormal frame
components and use the result (4.119) for the parallel transport of the frame components
(which is not sensitive to coordinate singularities).
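The path dependence of parallel transport on the 2-sphere can also be checked numerically. The following sketch (not part of the original notes) integrates the parallel transport equations $dV^\theta/d\phi = \sin\theta_0 \cos\theta_0\, V^\phi$, $dV^\phi/d\phi = -\cot\theta_0\, V^\theta$ once around the circle $\theta = \theta_0$ on the unit sphere, using the standard Christoffel symbols $\Gamma^\theta_{\phi\phi} = -\sin\theta\cos\theta$ and $\Gamma^\phi_{\theta\phi} = \cot\theta$. The function names and the Runge-Kutta scheme are illustrative assumptions. The transported vector should be rotated relative to the initial one by the enclosed area $2\pi(1 - \cos\theta_0)$ (modulo $2\pi$), which tends to $\pi\theta_0^2$ for small $\theta_0$, as expected for unit Gauss curvature.

```python
import math


def transport_around_latitude(theta0, v_theta, v_phi, steps=20000):
    """Parallel transport (V^theta, V^phi) once around theta = theta0 (RK4 in phi)."""
    s, c = math.sin(theta0), math.cos(theta0)

    def rhs(v):
        # dV^theta/dphi = sin*cos*V^phi,  dV^phi/dphi = -cot*V^theta
        return (s * c * v[1], -(c / s) * v[0])

    h = 2.0 * math.pi / steps
    v = (v_theta, v_phi)
    for _ in range(steps):
        k1 = rhs(v)
        k2 = rhs((v[0] + 0.5 * h * k1[0], v[1] + 0.5 * h * k1[1]))
        k3 = rhs((v[0] + 0.5 * h * k2[0], v[1] + 0.5 * h * k2[1]))
        k4 = rhs((v[0] + h * k3[0], v[1] + h * k3[1]))
        v = (v[0] + h / 6.0 * (k1[0] + 2.0 * k2[0] + 2.0 * k3[0] + k4[0]),
             v[1] + h / 6.0 * (k1[1] + 2.0 * k2[1] + 2.0 * k3[1] + k4[1]))
    return v


def holonomy_cosine(theta0):
    """Cosine of the angle between the unit vector (1, 0) and its transport."""
    v_theta, _ = transport_around_latitude(theta0, 1.0, 0.0)
    # g_theta_theta = 1, so the inner product with (1, 0) is just V^theta.
    return v_theta


def enclosed_area(theta0):
    """Area of the polar cap bounded by theta = theta0 on the unit sphere."""
    return 2.0 * math.pi * (1.0 - math.cos(theta0))
```

Comparing `holonomy_cosine(theta0)` with `math.cos(enclosed_area(theta0))` for a few values of `theta0` is a direct test of the statement that the holonomy is governed by the integrated curvature.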
We are now finally in a position to prove the converse to the statement that the
Minkowski metric has vanishing Riemann tensor. Namely, we will see that when the
Riemann tensor of a metric vanishes, locally there are coordinates in which the metric
is the standard Minkowski metric. Since the opposite of curved is flat, this then allows
one to unambiguously refer to the Minkowski metric as the flat metric (locally at least),
and to Minkowski space as flat space(-time).
So let us assume that we are given a metric with vanishing Riemann tensor. Then, by
the above, parallel transport is path independent and we can, in particular, extend a
vector $V^\mu(x_0)$ to a vector field everywhere in space-time: to define $V^\mu(x_1)$ we choose any
path from $x_0$ to $x_1$ and use parallel transport along that path. In particular, the vector
field $V^\mu$, defined in this way, will be covariantly constant or parallel, $\nabla_\mu V^\nu = 0$. We can
also do this for four linearly independent vectors $V_a^\mu$ at $x_0$ and obtain four covariantly
constant (parallel) vector fields which are linearly independent at every point.
An alternative way of saying or seeing this is the following: The integrability condition
for the equation $\nabla_\mu V^\lambda = 0$ is
$$ \nabla_\mu V^\lambda = 0 \quad \Rightarrow \quad [\nabla_\mu, \nabla_\nu] V^\lambda = R^\lambda{}_{\sigma\mu\nu} V^\sigma = 0 . \qquad (10.16) $$
We will now use this result in the proof, but for covectors instead of vectors. Clearly
this makes no difference: if $V^\mu$ is a parallel vector field, then $g_{\mu\nu} V^\nu$ is a parallel covector
field.
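Now we solve the equations
$$ \nabla_\mu E^a_\nu = 0 \quad \Leftrightarrow \quad \partial_\mu E^a_\nu = \Gamma^\lambda_{\mu\nu} E^a_\lambda \qquad (10.18) $$
with the initial condition $E^a_\mu(x_0) = e^a_\mu$. This gives rise to four linearly independent
parallel covectors $E^a_\mu$.
$$ \partial_\mu E^a_\nu = \partial_\nu E^a_\mu . \qquad (10.19) $$
Summing this up, we have seen that, starting from the assumption that the Riemann
curvature tensor of a metric $g_{\mu\nu}$ is zero, we have proven the existence of coordinates $\xi^a$
in which the metric takes the Minkowski form,
$$ g_{\mu\nu} = \frac{\partial \xi^a}{\partial x^\mu} \frac{\partial \xi^b}{\partial x^\nu} \eta_{ab} . \qquad (10.23) $$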
The argument given above is local in the sense that the existence of these coordinates
$\xi^a$ is only guaranteed locally, i.e. in the neighbourhood of some point. Whether or
not these coordinates can be used to cover the space-time globally depends on global
(topological) properties of the space-time which are not captured by the intrinsic local
and locally determined Riemann tensor.
For example, imagine starting with Minkowski space $\mathbb{R}^{1,3}$ with inertial coordinates $\xi^a$,
and then making a periodic identification of $\xi^1$, say,
in the new space-time the coordinate $\xi^1$, which is now an angular variable, is not
globally well defined,
and the space-time looks like Minkowski space only locally, not globally.
10.3 Curvature of Surfaces: Euler, Gauss(-Bonnet) and Liouville
We can generalise the example of the curvature of the 2-sphere, discussed in section
7.6, somewhat, in this way connecting our considerations with the classical realm of
the differential geometry of surfaces, in particular with the Gauss Curvature, the Euler
characteristic, the Gauss-Bonnet theorem and the Liouville Equation.
For any 2-dimensional metric $g_{ab}$ it is a simple exercise to derive the relation between the
one independent component, say $R_{1212}$, of the Riemann tensor, and the scalar curvature.
First of all, with the Ricci tensor $R_{ab} = g^{cd} R_{cadb}$, the Ricci scalar is
$$ R(g_{ab}) = g^{ab} R_{ab} = g^{11} g^{22} R_{2121} + g^{12} g^{21} R_{2112} + g^{21} g^{12} R_{1221} + g^{22} g^{11} R_{1212} . \qquad (10.26) $$
Using the fact that in 2 dimensions the components of the inverse metric are explicitly
given by
$$ \left( g^{ab} \right) = \frac{1}{g_{11} g_{22} - g_{12} g_{21}} \begin{pmatrix} g_{22} & -g_{12} \\ -g_{21} & g_{11} \end{pmatrix} \qquad (10.27) $$
and the (anti-)symmetry properties (1) and (2) of the Riemann tensor, one finds
$$ R(g_{ab}) = \frac{2}{g_{11} g_{22} - g_{12} g_{21}} R_{1212} . \qquad (10.28) $$
This is precisely the relation (7.42) between the Riemann tensor and Ricci scalar. The
factor of 2 in this equation is a consequence of our (and the conventional) definition of
the Riemann curvature tensor, and is responsible for the fact that the scalar curvature
of the unit 2-sphere is $R = +2$. We can also write this result as
$$ R_{abcd} = \tfrac{1}{2} (g_{ac} g_{bd} - g_{ad} g_{bc}) R \quad \Leftrightarrow \quad R^a{}_{bcd} = \tfrac{1}{2} (\delta^a_c g_{bd} - \delta^a_d g_{bc}) R . \qquad (10.29) $$
In two dimensions, it is often convenient and natural to absorb this ubiquitous factor
of 2 into the definition of the (scalar) curvature, and what one then gets is the classical
Gauss Curvature
$$ K := \tfrac{1}{2} R(g_{ab}) \qquad (10.30) $$
of a two-dimensional surface.
It follows from (10.29) that the Ricci tensor is related to the Ricci scalar by
This generalises the result for the standard metric on the 2-sphere found by explicit
calculation in section 7.6. It shows that in complete generality the Ricci tensor of a
two-dimensional space or space-time, thought of as the linear map
has only one (double) eigenvalue, namely the Gauss curvature $K$. It can also be
interpreted as saying that in 2 dimensions the Einstein tensor (7.92) is identically zero,
We will now briefly look at two important and interesting consequences of the above
formulae, one related to the Euler characteristic of a surface and its integral representation
(the Gauss-Bonnet theorem), and the other to the Liouville equation describing metrics
with constant Gauss curvature $K = k = \pm 1$.
Clearly, this area depends on a choice of metric, and under a variation $\delta g_{ab}$ of the
metric it transforms as
$$ \delta_g A(S_h) = \delta_g \int_{S_h} \sqrt{g}\, d^2x = \tfrac{1}{2} \int_{S_h} \sqrt{g}\, d^2x\; g^{ab} \delta g_{ab} . \qquad (10.35) $$
$$ \delta_g \chi(S_h) = 0 . \qquad (10.37) $$
Here are two rather explicit ways of establishing this remarkable result:
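Then one finds
$$ \begin{aligned}
\delta_g (\sqrt{g} R) &= (\delta_g \sqrt{g}) g^{ab} R_{ab} + \sqrt{g} (\delta_g g^{ab}) R_{ab} + \sqrt{g} g^{ab} \delta_g R_{ab} \\
&= \sqrt{g} \left( -\tfrac{1}{2} R g_{ab} + R_{ab} \right) \delta g^{ab} + \sqrt{g} g^{ab} \delta_g R_{ab} \\
&= \sqrt{g}\, G_{ab} \delta g^{ab} + \sqrt{g} g^{ab} \delta_g R_{ab} .
\end{aligned} \qquad (10.40) $$
for some well-defined $B^a$ built from the covariant derivatives of the variations
$\delta g_{ab}$ of the metric, as in (19.19). Taken together, these two facts imply that for a
closed surface $S_h$ (without boundary) one has
$$ \delta_g \chi(S_h) = \frac{1}{4\pi} \int \sqrt{g}\, d^2x\, (G_{ab} \delta g^{ab} + \nabla_a B^a) = 0 , \qquad (10.42) $$
as was to be shown.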
(b) Alternatively, somewhat less covariantly but very explicitly, one can show
that the integrand $\sqrt{g} K$ or $\sqrt{g} R$ can itself locally be written as a total
derivative. Indeed, using (10.29) to write
$$ R^2{}_{121} = \tfrac{1}{2} g_{11} R \quad \Rightarrow \quad K = \frac{1}{g_{11}} R^2{}_{121} \qquad (10.43) $$
and simply writing out explicitly this Riemann curvature tensor component
in terms of the Christoffel symbols,
Either way we have seen that the real number $\chi(S_h)$ is independent of the metric
one uses to calculate it. For example, for $h = 0$ and for the standard metric on
the sphere $S^2$ one finds
$$ \chi(S_{h=0}) = \chi(S^2) = \frac{1}{4\pi} \int \sqrt{g} R = \frac{1}{2\pi} \int \sqrt{g} = 2 , \qquad (10.48) $$
and this will therefore be the result for any metric on $S^2$. Likewise, for $h = 1$, i.e.
a torus, by choosing the flat metric on $T^2$ (see e.g. the discussion and construction
in section 17.1), one finds
$$ \chi(S_{h=1}) = \chi(T^2) = 0 , \qquad (10.49) $$
and this will therefore be the result for any metric on $T^2$ (and it is instructive
to check this explicitly for the non-trivial, non-flat metric on $T^2$ induced by its
embedding into $\mathbb{R}^3$ constructed in section 17.1). I am not aware of an equally
elementary calculation to determine $\chi(S_h)$ for $h > 1$ in this way but the fact of the
matter is that
$$ \chi(S_h) = 2 - 2h \qquad (10.50) $$
is the Euler characteristic of $S_h$, which can also be defined purely combinatorially
as the number
$$ \chi(S) = n_F - n_E + n_V \qquad (10.51) $$
of faces minus edges plus vertices of any cubist rendition of a surface $S$ (and
$\chi(S)$ is independent of such a cubist realisation or triangulation). The remarkable
fact that this topological invariant of a surface $S$ can be calculated in terms of
differential geometric quantities, namely as the integral of the curvature scalar, is
known as the Gauss-Bonnet theorem.
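Both sides of the Gauss-Bonnet theorem are easy to evaluate in simple cases. The sketch below (not part of the original notes; function names and the choice of test surfaces are illustrative assumptions) computes the combinatorial Euler characteristic $n_F - n_E + n_V$ for a cube and for a grid of squares on a torus, and approximates $\frac{1}{4\pi} \int \sqrt{g} R$ for the unit 2-sphere, with $\sqrt{g} = \sin\theta$ and $R = 2$, by a midpoint rule.

```python
import math


def euler_characteristic(n_faces, n_edges, n_vertices):
    """Combinatorial Euler characteristic: faces minus edges plus vertices."""
    return n_faces - n_edges + n_vertices


def torus_grid_counts(n, m):
    """(faces, edges, vertices) of an n x m grid of squares with opposite sides identified."""
    return n * m, 2 * n * m, n * m


def gauss_bonnet_unit_sphere(steps=2000):
    """(1 / 4 pi) * integral of sqrt(g) R over the unit sphere (midpoint rule in theta)."""
    h = math.pi / steps
    theta_integral = sum(math.sin((i + 0.5) * h) for i in range(steps)) * h
    scalar_curvature = 2.0
    phi_integral = 2.0 * math.pi
    return scalar_curvature * phi_integral * theta_integral / (4.0 * math.pi)
```

The cube, with 6 faces, 12 edges and 8 vertices, is a cubist rendition of the sphere, so `euler_characteristic(6, 12, 8)` should agree with the value returned by `gauss_bonnet_unit_sphere()` up to the quadrature error, while `euler_characteristic(*torus_grid_counts(3, 4))` should reproduce the torus value.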
the calculation of the Riemann tensor is particularly simple and one finds the
(easy to memorise) results
$$ R^x{}_{yxy} = -\Delta h \qquad (10.53) $$
and
$$ K = -e^{-2h} \Delta h \qquad (10.54) $$
where $\Delta = \partial_x^2 + \partial_y^2$ is the 2-dimensional Laplacian with respect to the flat Euclidean
metric $dx^2 + dy^2$. Thus a surface with constant curvature $K = k$ is given by a
solution to the non-linear differential equation
$$ \Delta h + k e^{2h} = 0 . \qquad (10.55) $$
This is the (in-)famous Liouville equation, which plays a fundamental role in many
branches of mathematics (and mathematical physics).
In terms of the intrinsic Laplacian $\Delta_g$ associated to the metric $g_{ab}$, the Gaussian
curvature and the Liouville equation can also simply be written as
$$ K = -\Delta_g h , \qquad \Delta_g h + k = 0 , \qquad (10.56) $$
since, due to the peculiarities of 2 dimensions, $\sqrt{g} g^{ab}$ is independent of $h$, i.e. is
conformally invariant (as we already observed in a different context in section 6.7,
cf. (6.120)),
$$ \sqrt{g} g^{ab} = e^{2h} e^{-2h} \delta^{ab} = \delta^{ab} \quad \Rightarrow \quad \Delta_g = \frac{1}{\sqrt{g}} \partial_a (\sqrt{g} g^{ab} \partial_b) = \frac{1}{\sqrt{g}} \partial_a (\delta^{ab} \partial_b) = e^{-2h} \Delta . \qquad (10.57) $$
I will not attempt to say anything about the general (local) solution of this equa-
tion (which roughly speaking depends on an arbitrary meromorphic function of
the complex coordinate z = x + iy), but close this section with some special (and
particularly prominent) solutions of this equation.
$$ ds^2 = \frac{dx^2 + dy^2}{y^2} \qquad ( (x, y) \in \mathbb{R}^2 , \; y > 0 ) . \qquad (10.59) $$
By the coordinate transformation $y = e^z$ this is mapped to the equivalent
metric
$$ ds^2 = dz^2 + e^{-2z} dx^2 \qquad (10.60) $$
$$ ds^2 = 4 \frac{dx^2 + dy^2}{(1 + x^2 + y^2)^2} . \qquad (10.62) $$
This is the constant positive curvature metric on the Riemann sphere one
gets by stereographic projection of the standard metric on the two-sphere
$S^2$ to the $(x, y)$-plane.
In terms of polar coordinates $(r, \phi)$ on the Euclidean plane, this metric
takes the form
$$ ds^2 = 4 \frac{dr^2 + r^2 d\phi^2}{(1 + r^2)^2} , \qquad (10.63) $$
and the further change of variables $r = \tan \theta/2$ shows that this is indeed
the standard line element $d\Omega^2$ on the 2-sphere,
Read backwards, this can also be read as the statement that via the
above change of variables the Euclidean metric on $\mathbb{R}^2$ can be written as
$$ dr^2 + r^2 d\phi^2 = \frac{(1 + r(\theta)^2)^2}{4} (d\theta^2 + \sin^2\theta\, d\phi^2) = \frac{1}{4 \cos^4 \theta/2}\, d\Omega^2 . \qquad (10.65) $$
$$ ds^2 = 4 \frac{dx^2 + dy^2}{(1 - (x^2 + y^2))^2} \qquad ( (x, y) \in \mathbb{R}^2 , \; x^2 + y^2 < 1 ) . \qquad (10.66) $$
This is the Poincaré disc model of the hyperbolic geometry, defined in
the interior of the unit disc in $\mathbb{R}^2$. In terms of polar coordinates, it can
also be written as
$$ ds^2 = 4 \frac{dr^2 + r^2 d\phi^2}{(1 - r^2)^2} \qquad (0 \leq r < 1) \qquad (10.67) $$
The two metrics (10.59) and (10.66) are isometric, i.e. related by a (albeit
not completely evident) coordinate transformation.
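(c) As our final example, one other solution (given here only for $k = -1$) is
$$ e^{2h(x, y)} = e^{2h(x)} = 4 \frac{e^{2x}}{(1 - e^{2x})^2} . \qquad (10.68) $$
where
$$ x = \log \tanh(\rho/2) \quad \Rightarrow \quad dx = d\rho / \sinh\rho . \qquad (10.71) $$
$$ \sinh^{-2}(x) = 4 \frac{e^{2x}}{(1 - e^{2x})^2} , \qquad (10.72) $$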
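The special solutions listed above can be checked against the formula $K = -e^{-2h} \Delta h$ of (10.54) for a conformally flat metric $ds^2 = e^{2h} (dx^2 + dy^2)$. The sketch below (not part of the original notes) approximates the flat Laplacian by a five-point finite-difference stencil; the function names, the step size and the sample points are illustrative assumptions.

```python
import math


def gauss_curvature(h, x, y, eps=1e-3):
    """K = -exp(-2h) * Laplacian(h) for ds^2 = exp(2h) (dx^2 + dy^2)."""
    laplacian = (h(x + eps, y) + h(x - eps, y) + h(x, y + eps) + h(x, y - eps)
                 - 4.0 * h(x, y)) / eps ** 2
    return -math.exp(-2.0 * h(x, y)) * laplacian


def h_half_plane(x, y):
    """Upper half-plane metric (10.59): exp(2h) = 1 / y^2, for y > 0."""
    return -math.log(y)


def h_sphere(x, y):
    """Stereographic sphere metric (10.62): exp(2h) = 4 / (1 + x^2 + y^2)^2."""
    return math.log(2.0 / (1.0 + x * x + y * y))


def h_disc(x, y):
    """Poincare disc metric (10.66): exp(2h) = 4 / (1 - x^2 - y^2)^2."""
    return math.log(2.0 / (1.0 - x * x - y * y))


def h_strip(x, y):
    """Solution (10.68) in the form (10.72): exp(2h) = 1 / sinh^2(x), x != 0."""
    return -math.log(abs(math.sinh(x)))
```

Evaluating `gauss_curvature` at an interior point of each domain should give a value close to $-1$ for the half-plane, disc and strip solutions and close to $+1$ for the sphere, up to the $O(\epsilon^2)$ discretisation error.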
It is worth remarking that the Poincaré upper-half plane model of a space with constant
negative curvature readily generalises to arbitrary dimensions and signature. Thus
$$ ds^2 = \frac{d\vec{x}^2 + dy^2}{y^2} , \qquad d\vec{x}^2 = \delta_{ab} dx^a dx^b \quad \text{or} \quad d\vec{x}^2 = \eta_{ab} dx^a dx^b \qquad (10.73) $$
The Lorentzian metric will reappear later as a solution to the Einstein equations with a
negative cosmological constant, and is in this context known as the anti-de Sitter metric
(in Poincaré coordinates, which cover only a part of the complete space-time), and we
will discuss this solution in some detail in section 38.
In section 7 from the Riemann tensor we have extracted its traces, the Ricci tensor and
the Ricci scalar, as well as a particular linear combination of them, the Einstein tensor.
We can therefore also explicitly decompose the Riemann tensor into these trace parts
and the remaining traceless part.
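We noted in section 7.5 that for $D=2$ and $D=3$ the Riemann tensor is pure
trace, i.e. can be written entirely in terms of the Ricci tensor and Ricci scalar. For
$D=2$ we have already established this explicitly by proving the relation (10.29),
$$ D=2: \quad R_{\mu\nu\rho\sigma} = \tfrac12 (g_{\mu\rho}g_{\nu\sigma} - g_{\nu\rho}g_{\mu\sigma})R \tag{10.74} $$
in section 10.3.
We now look at this issue for $D\ge3$. Simply by linear algebra one finds, for $D\ge3$, the
decomposition
$$ \begin{aligned} R_{\mu\nu\rho\sigma} = {} & C_{\mu\nu\rho\sigma} \\ & + \frac{1}{D-2}\,(g_{\mu\rho}R_{\nu\sigma} + R_{\mu\rho}g_{\nu\sigma} - g_{\nu\rho}R_{\mu\sigma} - R_{\nu\rho}g_{\mu\sigma}) \\ & - \frac{1}{(D-1)(D-2)}\,R\,(g_{\mu\rho}g_{\nu\sigma} - g_{\nu\rho}g_{\mu\sigma}) . \end{aligned} \tag{10.75} $$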
This definition is such that $C_{\mu\nu\rho\sigma}$ has all the symmetries of the Riemann tensor (this is
manifest) and such that all of its traces are zero, i.e.
$$ C^{\mu}{}_{\nu\mu\sigma} = 0 , \tag{10.76} $$
as is easily verified. This traceless part $C_{\mu\nu\rho\sigma}$ of the Riemann tensor is called the Weyl
tensor.
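Occasionally it is more convenient and transparent to decompose the Riemann tensor not
into the Weyl tensor, the Ricci tensor and the Ricci scalar, but to perform an orthogonal
decomposition (with respect to the metric) into the Weyl tensor, the traceless part $S_{\mu\nu}$
of the Ricci tensor,
$$ S_{\mu\nu} = R_{\mu\nu} - \frac{1}{D}\,g_{\mu\nu}R , \tag{10.77} $$
and the trace $R$. Then the decomposition becomes
$$ \begin{aligned} R_{\mu\nu\rho\sigma} = {} & C_{\mu\nu\rho\sigma} \\ & + \frac{1}{D-2}\,(g_{\mu\rho}S_{\nu\sigma} + S_{\mu\rho}g_{\nu\sigma} - g_{\nu\rho}S_{\mu\sigma} - S_{\nu\rho}g_{\mu\sigma}) \\ & + \frac{1}{D(D-1)}\,R\,(g_{\mu\rho}g_{\nu\sigma} - g_{\nu\rho}g_{\mu\sigma}) . \end{aligned} \tag{10.78} $$
One other common and convenient decomposition is in terms of a tensor $P_{\mu\nu}$ such that
(10.75) takes the form
$$ R_{\mu\nu\rho\sigma} = C_{\mu\nu\rho\sigma} + (g_{\mu\rho}P_{\nu\sigma} + P_{\mu\rho}g_{\nu\sigma} - g_{\nu\rho}P_{\mu\sigma} - P_{\nu\rho}g_{\mu\sigma}) . \tag{10.79} $$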
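Regardless of how we write the trace part of the Riemann tensor, it turns out that for
$D=3$ the Weyl tensor vanishes identically,
$$ D=3: \quad C_{\mu\nu\rho\sigma} \equiv 0 \tag{10.81} $$
(I will give an elementary proof of this momentarily). Therefore, for $D=3$ one has the
decomposition
$$ \begin{aligned} D=3: \quad R_{\mu\nu\rho\sigma} = {} & (g_{\mu\rho}R_{\nu\sigma} + R_{\mu\rho}g_{\nu\sigma} - g_{\nu\rho}R_{\mu\sigma} - R_{\nu\rho}g_{\mu\sigma}) \\ & + \tfrac12 (g_{\nu\rho}g_{\mu\sigma} - g_{\mu\rho}g_{\nu\sigma})R \end{aligned} \tag{10.82} $$
This is precisely the result claimed previously in (7.43).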
To establish (10.81), in order to trivialise the algebra let us fix a point $x_0$ and choose
coordinates there such that $g_{\mu\nu}(x_0) = \delta_{\mu\nu}$ (or $\eta_{\mu\nu}$, depending on the signature of the
metric, but let us assume that we are in the case of Euclidean signature; the same
argument works in the Lorentzian case). Now the proof consists of the following elementary
steps:
Since we are in $D=3$, at least two of the indices in $C_{\mu\nu\rho\sigma}$ must be equal. Since the
Weyl tensor has all the symmetries of the Riemann tensor, if more than two indices
are equal, the Weyl tensor component is zero. Thus we only need to consider the
components where 2 indices are equal and we can without loss of generality choose
these to be $C_{1\mu1\nu}$, say, with $\mu,\nu\neq1$.
Thus all in all the Weyl tensor can have only 3 independent non-zero components,
namely $C_{1212} = C_{2121}$, $C_{1313} = C_{3131}$, $C_{2323} = C_{3232}$, and they are all required to
be pairwise negatives of each other. This is impossible for non-trivial $C_{\mu\nu\rho\sigma}$,
and implies that all of the components of the Weyl tensor are identically zero in
$D=3$.
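The vanishing of the Weyl tensor in $D=3$ can also be checked numerically. The sketch below is an illustration under stated assumptions, not part of the proof: it works at a single point with $g_{\mu\nu} = \delta_{\mu\nu}$, builds random tensors with the algebraic symmetries of the Riemann tensor as sums of products $h_{\mu\rho}k_{\nu\sigma} + h_{\nu\sigma}k_{\mu\rho} - h_{\mu\sigma}k_{\nu\rho} - h_{\nu\rho}k_{\mu\sigma}$ of symmetric matrices (a construction chosen here for convenience), and extracts the Weyl part by solving (10.75) for $C_{\mu\nu\rho\sigma}$. All function names are hypothetical.

```python
import itertools
import random

def kulkarni_nomizu(h, k, dim):
    """Return h_ac k_bd + h_bd k_ac - h_ad k_bc - h_bc k_ad as a dict."""
    return {
        (a, b, c, d): h[a][c] * k[b][d] + h[b][d] * k[a][c]
        - h[a][d] * k[b][c] - h[b][c] * k[a][d]
        for a, b, c, d in itertools.product(range(dim), repeat=4)
    }

def random_symmetric(dim, rng):
    m = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(dim)]
    return [[m[i][j] + m[j][i] for j in range(dim)] for i in range(dim)]

def random_curvature(dim, rng, terms=3):
    """Random tensor with the algebraic symmetries of the Riemann tensor."""
    riemann = {idx: 0.0 for idx in itertools.product(range(dim), repeat=4)}
    for _ in range(terms):
        piece = kulkarni_nomizu(random_symmetric(dim, rng),
                                random_symmetric(dim, rng), dim)
        for idx in riemann:
            riemann[idx] += piece[idx]
    return riemann

def weyl(riemann, dim):
    """Solve (10.75) for C at a point where the metric is the identity."""
    delta = [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]
    # Ricci tensor R_bd = sum_a R_abad and Ricci scalar.
    ricci = [[sum(riemann[(a, b, a, d)] for a in range(dim))
              for d in range(dim)] for b in range(dim)]
    scalar = sum(ricci[a][a] for a in range(dim))
    g_ricci = kulkarni_nomizu(delta, ricci, dim)
    g_g = kulkarni_nomizu(delta, delta, dim)  # = 2 (g_ac g_bd - g_ad g_bc)
    return {idx: riemann[idx] - g_ricci[idx] / (dim - 2)
            + scalar * g_g[idx] / (2 * (dim - 1) * (dim - 2))
            for idx in riemann}

def max_abs(tensor):
    return max(abs(value) for value in tensor.values())

rng = random.Random(0)
weyl_3d = weyl(random_curvature(3, rng), 3)
weyl_4d = weyl(random_curvature(4, rng), 4)
weyl_4d_trace = max(abs(sum(weyl_4d[(a, b, a, d)] for a in range(4)))
                    for b in range(4) for d in range(4))
print(max_abs(weyl_3d), max_abs(weyl_4d), weyl_4d_trace)
```

In $D=3$ the Weyl part vanishes to rounding accuracy, while in $D=4$ it is generically non-zero but traceless, in line with (10.76) and (10.81).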
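Thus the Weyl tensor is only non-trivial for $D\ge4$. Using the Bianchi identities discussed
in section 7.8, in particular also (7.89),
$$ \nabla_\mu R^\mu{}_{\nu\rho\sigma} = \nabla_\rho R_{\nu\sigma} - \nabla_\sigma R_{\nu\rho} \tag{10.86} $$
from (10.75) one finds a simple expression for the divergence, namely
$$ \nabla_\mu C^\mu{}_{\nu\rho\sigma} = (D-3)\,(\nabla_\rho P_{\nu\sigma} - \nabla_\sigma P_{\nu\rho}) . \tag{10.87} $$
The tensor appearing on the right-hand side also has its own name. It is called the
Cotton tensor $C_{\nu\rho\sigma}$,
$$ C_{\nu\rho\sigma} = \nabla_\rho P_{\nu\sigma} - \nabla_\sigma P_{\nu\rho} . \tag{10.88} $$
The content of (10.87) is evidently trivial in $D=3$, but the Cotton tensor itself is not
(and I will briefly come back to this below).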
The Weyl tensor plays an important role in many aspects of gravitational physics:
1. For example, the Weyl tensor has traditionally been one of the central objects
of interest in the invariant algebraic classification of gravitational fields and in
the characterisation of what are known as algebraically special solutions to the
Einstein equations (the so-called Petrov classification and related procedures).
Originally, this was (of course) developed for $D=4$, and this case has a number of
special features. It is based on the classification of the properties of the eigenvalues
of the Weyl tensor (at a point $x_0$), thought of as a map on the space of anti-symmetric
(2, 0)-tensors (bivectors),
$$ \tfrac12 C^{\mu\nu}{}_{\rho\sigma}X^{\rho\sigma} = \lambda X^{\mu\nu} \tag{10.89} $$
or
$$ C^A{}_B X^B = \lambda X^A \tag{10.90} $$
with $C^{\mu\nu}{}_{\rho\sigma} \to C_{AB}$ thought of as a symmetric $(6\times6)$ matrix.$^{25}$
An equivalent (as it turns out) classification arises from determining the number
and multiplicity of linearly independent null vectors $\ell^\mu$ satisfying the condition
$$ \ell_{[\alpha}C_{\beta]\mu\nu[\gamma}\ell_{\delta]}\,\ell^\mu\ell^\nu = 0 . \tag{10.91} $$
Such $\ell^\mu$ are called the principal null directions of the Weyl tensor. More recently,
this classification scheme (based on the latter approach) has been (partially) extended
to higher dimensions.$^{26}$
2. As we will see in section 18.6, the Einstein equations imply that the Weyl tensor
describes the gravitational field in vacuum. Specifically, when (or where) the
energy-momentum tensor is zero, the Riemann curvature tensor is equal to the
Weyl tensor,
$$ T_{\mu\nu}(x) = 0 \quad\Rightarrow\quad R_{\mu\nu\rho\sigma}(x) = C_{\mu\nu\rho\sigma}(x) . \tag{10.92} $$
The Weyl tensor thus encodes the information about things like gravitational
waves and the asymptotic behaviour of a gravitational field and has been studied
extensively from this point of view.
3. In the presence of matter, on the other hand, (10.87), in conjunction with the
Einstein equations, becomes an evolution equation for these vacuum components
of the gravitational field in terms of the sources - see equations (18.51) and (18.52).
The Weyl tensor also plays an important role in geometry, as it is conformally invariant,
i.e. $C^\mu{}_{\nu\rho\sigma}$ is invariant under conformal (Weyl) rescalings of the metric,
equivalently
In particular, the Weyl tensor is zero if the metric is conformally flat, i.e. related by a
conformal transformation to the flat metric (of any signature),
This can be established by brute force calculation and is not per se particularly
enlightening.
Conversely, for $D\ge4$ vanishing of the Weyl tensor is also a sufficient condition for a
metric to be (locally) conformal to the flat metric. This is a non-trivial result because
at face value one seems to obtain a completely overdetermined system of equations for
the single function $f$, of the form
However, it turns out that the integrability conditions for this system of equations are
equivalent to the vanishing of the Weyl tensor, and then a variant of the Frobenius
integrability theorem (mentioned in a different context in section 14.5) can be used to
establish the local existence of a solution f .
For $D=3$, the situation is slightly (but not fundamentally) different. We see from
(10.88) that for any $D\ge4$ conformal flatness implies vanishing of the Cotton tensor.
It turns out that for $D=3$ the Cotton tensor takes over the role of the Weyl tensor
(which, as proven above, is itself trivial for $D=3$), i.e. one has the statement that for
$D=3$ a metric is (locally) conformally flat if and only if the Cotton tensor vanishes.
In section 4.4 we had seen that the Levi-Civita connection (defined by the Christoffel
symbols) is characterised by the fact that
2. the torsion is zero, i.e. the second covariant derivatives of a scalar commute.
It is of course possible to relax either of the conditions (1) or (2), or both of them and,
in particular, connections with torsion (relaxation of condition 2) are popular in certain
circles and/or arise naturally in certain generalised (gauge) theories of gravity and in
string theory.
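with $\Gamma^\mu{}_{\nu\lambda}$ the canonical Levi-Civita connection, and $C^\mu{}_{\nu\lambda}$ a (1, 2)-tensor. We will also
use the corresponding (0, 3)-tensor
$$ C_{\mu\nu\lambda} = g_{\mu\rho}C^\rho{}_{\nu\lambda} . \tag{10.98} $$
$$ \tilde\nabla_\mu V^\lambda = \partial_\mu V^\lambda + \tilde\Gamma^\lambda{}_{\nu\mu}V^\nu \tag{10.99} $$
etc. The reason for this choice is that one should think of the collection of objects
$\Gamma^\mu{}_{\nu\lambda}$ (and $\tilde\Gamma^\mu{}_{\nu\lambda}$) as the coefficients of a matrix-valued 1-form (cf. section 3.6)
$\Gamma^\mu{}_\nu = \Gamma^\mu{}_{\nu\lambda}dx^\lambda$, the matrices acting by rotation on vectors (and more general tensors), as in (4.20).
$$ [\tilde\nabla_\mu,\tilde\nabla_\nu]\phi = T^\lambda{}_{\mu\nu}\partial_\lambda\phi , \qquad T_{\lambda\mu\nu} = g_{\lambda\rho}T^\rho{}_{\mu\nu} , \tag{10.100} $$
$$ \tilde\nabla_\lambda g_{\mu\nu} = -Q_{\mu\nu\lambda} . \tag{10.101} $$
$$ T_{\lambda\mu\nu} = C_{\lambda\mu\nu} - C_{\lambda\nu\mu} = 2C_{\lambda[\mu\nu]} , \qquad Q_{\mu\nu\lambda} = C_{\mu\nu\lambda} + C_{\nu\mu\lambda} = 2C_{(\mu\nu)\lambda} . \tag{10.102} $$
Thus the torsion is zero iff $C^\mu{}_{\nu\lambda}$ (and hence $\tilde\Gamma^\mu{}_{\nu\lambda}$) is symmetric in its lower indices, and
the connection is compatible with the metric iff $C_{\mu\nu\lambda}$ is anti-symmetric in its first two
indices. In particular, if the torsion is zero and the connection is metric-compatible, one
has
$$ C_{\mu\nu\lambda} = C_{\mu\lambda\nu} \quad\text{and}\quad C_{\mu\nu\lambda} = -C_{\nu\mu\lambda} \quad\Rightarrow\quad C_{\mu\nu\lambda} = 0 , \tag{10.103} $$
since
$$ C_{\mu\nu\lambda} = -C_{\nu\mu\lambda} = -C_{\nu\lambda\mu} = C_{\lambda\nu\mu} = C_{\lambda\mu\nu} = -C_{\mu\lambda\nu} = -C_{\mu\nu\lambda} . \tag{10.104} $$
Conversely, since the absence of torsion and non-metricity characterises the Levi-Civita
connection, it should be possible to express the deviation $C_{\mu\nu\lambda}$ from the Levi-Civita
connection entirely in terms of torsion and non-metricity. This is indeed the case. By
repeating the calculation (4.43) in this more general context, one finds
$$ 2C_{\mu(\nu\lambda)} = Q_{\mu\nu\lambda} + Q_{\mu\lambda\nu} - Q_{\nu\lambda\mu} - T_{\nu\mu\lambda} - T_{\lambda\mu\nu} . \tag{10.105} $$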
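Combining this with $2C_{\mu[\nu\lambda]} = T_{\mu\nu\lambda}$, one obtains
$$ C_{\mu\nu\lambda} = \tfrac12 (T_{\mu\nu\lambda} - T_{\nu\mu\lambda} - T_{\lambda\mu\nu}) + \tfrac12 (Q_{\mu\nu\lambda} + Q_{\mu\lambda\nu} - Q_{\nu\lambda\mu}) \equiv \hat T_{\mu\nu\lambda} + \hat Q_{\mu\nu\lambda} , \tag{10.106} $$
with
$$ \hat T_{\mu\nu\lambda} = \tfrac12 (T_{\mu\nu\lambda} - T_{\nu\mu\lambda} - T_{\lambda\mu\nu}) = -\hat T_{\nu\mu\lambda} \tag{10.107} $$
and
$$ \hat Q_{\mu\nu\lambda} = \tfrac12 (Q_{\mu\nu\lambda} + Q_{\mu\lambda\nu} - Q_{\nu\lambda\mu}) = \hat Q_{\mu\lambda\nu} . \tag{10.108} $$
Thus we can now split a general connection more informatively into the 3 pieces
$$ \tilde\Gamma^\mu{}_{\nu\lambda} = \Gamma^\mu{}_{\nu\lambda} + \hat T^\mu{}_{\nu\lambda} + \hat Q^\mu{}_{\nu\lambda} . \tag{10.109} $$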
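The algebra behind this split can be verified numerically at a single point. The sketch below is an illustration only; it assumes the index conventions $T_{\lambda\mu\nu} = C_{\lambda\mu\nu} - C_{\lambda\nu\mu}$ (torsion anti-symmetric in its last two indices) and $Q_{\mu\nu\lambda} = C_{\mu\nu\lambda} + C_{\nu\mu\lambda}$ (non-metricity symmetric in its first two indices), with all indices lowered, and the variable names are hypothetical. Starting from a random $C_{\mu\nu\lambda}$ it rebuilds $C$ from the contorsion-like piece built out of $T$ and the piece built out of $Q$.

```python
import itertools
import random

DIM = 3
INDICES = list(itertools.product(range(DIM), repeat=3))

rng = random.Random(1)
# Generic deviation tensor C_{mu nu lam} with all indices lowered.
C = {idx: rng.uniform(-1.0, 1.0) for idx in INDICES}

# Assumed conventions: torsion T_{m n l} = C_{m n l} - C_{m l n},
# non-metricity Q_{m n l} = C_{m n l} + C_{n m l}.
T = {(m, n, l): C[(m, n, l)] - C[(m, l, n)] for m, n, l in INDICES}
Q = {(m, n, l): C[(m, n, l)] + C[(n, m, l)] for m, n, l in INDICES}

# Piece built from the torsion (anti-symmetric in its first two indices).
T_hat = {(m, n, l): 0.5 * (T[(m, n, l)] - T[(n, m, l)] - T[(l, m, n)])
         for m, n, l in INDICES}
# Piece built from the non-metricity (symmetric in its last two indices).
Q_hat = {(m, n, l): 0.5 * (Q[(m, n, l)] + Q[(m, l, n)] - Q[(n, l, m)])
         for m, n, l in INDICES}

reconstruction_error = max(abs(C[i] - T_hat[i] - Q_hat[i]) for i in INDICES)
t_hat_symmetry_error = max(abs(T_hat[(m, n, l)] + T_hat[(n, m, l)])
                           for m, n, l in INDICES)
q_hat_symmetry_error = max(abs(Q_hat[(m, n, l)] - Q_hat[(m, l, n)])
                           for m, n, l in INDICES)
print(reconstruction_error, t_hat_symmetry_error, q_hat_symmetry_error)
```

All three numbers vanish to rounding accuracy, so with these conventions a general $C_{\mu\nu\lambda}$ is indeed fixed completely by its torsion and non-metricity.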
Remarks:
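$$ Q_{\mu\nu\lambda} = 0 \quad\Rightarrow\quad \tilde\Gamma^\mu{}_{\nu\lambda} = \Gamma^\mu{}_{\nu\lambda} + \hat T^\mu{}_{\nu\lambda} . \tag{10.110} $$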
but it cannot be symmetric (if the contorsion were symmetric, the torsion, and
hence the contorsion, would be zero). If its symmetric part vanishes, then T is
completely anti-symmetric,
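$$ \Gamma_{\mu\nu\lambda} = \tfrac12 g_{\mu\nu,\lambda} + \tfrac12 (g_{\mu\lambda,\nu} - g_{\nu\lambda,\mu}) , \tag{10.113} $$
one might be tempted to think that the second part can be cancelled (or absorbed)
by a metric-compatible $C_{\mu\nu\lambda} = C_{[\mu\nu]\lambda}$, so that a very simple metric-compatible
connection would be
$$ \tilde\Gamma_{\mu\nu\lambda} \overset{?}{=} \tfrac12 g_{\mu\nu,\lambda} , \qquad \tilde\Gamma^\mu{}_{\nu\lambda} \overset{?}{=} \tfrac12 g^{\mu\rho}g_{\rho\nu,\lambda} . \tag{10.114} $$
However, the term that one has cancelled (or absorbed) is not a tensor. Therefore,
this candidate connection does not transform as (and therefore does not qualify
as) a connection and cannot be used to define a covariant derivative.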
3. In general, for a connection $\tilde\Gamma$, the notion of autoparallels (section 4.8),
$$ \tilde\nabla_X X = 0 \quad\Leftrightarrow\quad \ddot x^\mu + \tilde\Gamma^\mu{}_{\nu\lambda}\dot x^\nu\dot x^\lambda = 0 , \tag{10.115} $$
(i.e. curves characterised by the fact that their tangent vectors are parallel transported
along the curve; this depends on a choice of connection) no longer coincides
with the notion of geodesics (which are obtained by extremising proper time or
distance, and which always lead to the Levi-Civita connection). However, this
difference disappears if $C^\mu{}_{\nu\lambda}$ happens to be anti-symmetric in its lower indices
(e.g. for a metric-compatible connection with totally anti-symmetric contorsion
tensor), as one then has
$$ \ddot x^\mu + \tilde\Gamma^\mu{}_{\nu\lambda}\dot x^\nu\dot x^\lambda = \ddot x^\mu + \Gamma^\mu{}_{\nu\lambda}\dot x^\nu\dot x^\lambda . \tag{10.116} $$
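We have defined the Riemann tensor via the commutator of covariant derivatives (7.2)
$$ [\nabla_\mu,\nabla_\nu]V^\lambda = R^\lambda{}_{\sigma\mu\nu}V^\sigma \tag{10.117} $$
$$ R^\lambda{}_{\sigma\mu\nu} = \partial_\mu\Gamma^\lambda{}_{\sigma\nu} - \partial_\nu\Gamma^\lambda{}_{\sigma\mu} + \Gamma^\lambda{}_{\rho\mu}\Gamma^\rho{}_{\sigma\nu} - \Gamma^\lambda{}_{\rho\nu}\Gamma^\rho{}_{\sigma\mu} . \tag{10.118} $$
In order to show explicitly (rather than by appealing to (10.117)) that this transforms as
a tensor, all that one needs is the characteristic non-tensorial transformation behaviour
of the Christoffel symbols $\Gamma^\mu{}_{\nu\lambda}$. As discussed in section 4.4 and above, an arbitrary
connection $\tilde\Gamma^\mu{}_{\nu\lambda}$ that can be used to define a tensorial covariant derivative has the same
non-tensorial transformation behaviour. Therefore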
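$$ \tilde R^\lambda{}_{\sigma\mu\nu} \equiv R^\lambda{}_{\sigma\mu\nu}(\tilde\Gamma) = \partial_\mu\tilde\Gamma^\lambda{}_{\sigma\nu} - \partial_\nu\tilde\Gamma^\lambda{}_{\sigma\mu} + \tilde\Gamma^\lambda{}_{\rho\mu}\tilde\Gamma^\rho{}_{\sigma\nu} - \tilde\Gamma^\lambda{}_{\rho\nu}\tilde\Gamma^\rho{}_{\sigma\mu} \tag{10.119} $$
defines a tensor for any connection, namely the curvature tensor of the connection $\tilde\Gamma$.
It is related to the commutator of covariant derivatives by
$$ [\tilde\nabla_\mu,\tilde\nabla_\nu]V^\lambda = \tilde R^\lambda{}_{\sigma\mu\nu}V^\sigma + (\tilde\Gamma^\rho{}_{\mu\nu} - \tilde\Gamma^\rho{}_{\nu\mu})\tilde\nabla_\rho V^\lambda = \tilde R^\lambda{}_{\sigma\mu\nu}V^\sigma + T^\rho{}_{\mu\nu}\tilde\nabla_\rho V^\lambda , \tag{10.120} $$
where $T^\rho{}_{\mu\nu}$ is the torsion tensor (and the last lower index of $\tilde\Gamma$ is the differentiation index, as in (10.99)). As before, one can also define the Ricci tensor and
Ricci scalar by
$$ \tilde R_{\mu\nu} \equiv R_{\mu\nu}(\tilde\Gamma) = \tilde R^\lambda{}_{\mu\lambda\nu} , \qquad \tilde R \equiv R(\tilde\Gamma) = g^{\mu\nu}\tilde R_{\mu\nu} . \tag{10.121} $$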
However, it is crucial to keep in mind that the symmetry properties and Bianchi iden-
tities satisfied by these generalised curvature tensors will in general differ from those of
the Riemann-Christoffel tensor. This should be clear from the way we derived the sym-
metries of the Riemann tensor in section 7.3, where we related the symmetries to the
properties (metricity, no torsion) that characterise the canonical Levi-Civita connection
(Christoffel symbols). For example, in general the Ricci tensor will not be symmetric,
the Bianchi identity $R^\lambda{}_{[\sigma\mu\nu]} = 0$ will be replaced by an identity relating $\tilde R^\lambda{}_{[\sigma\mu\nu]}$ to the
torsion (and its covariant derivative), etc.$^{27}$
For some further discussion of connections with non-metricity or torsion and their cur-
vature tensors see section 19.7.
27
For more on this and related topics, see e.g. section 1 of T. Ortin, Gravity and Strings.
11 Curvature III: Curvature and Geodesic Congruences
In section 7.4 we had already encountered the so-called geodesic deviation equation
(7.36),
$$ (D_\tau)^2\delta x^\mu = R^\mu{}_{\nu\lambda\rho}\,\dot x^\nu\dot x^\lambda\,\delta x^\rho , \tag{11.1} $$
describing the evolution of a separation (or deviation) vector along a given geodesic. In
this section we will rederive this result in a more satisfactory and covariant manner and
also use the same covariant framework to discuss the extension of these results to the
so-called Raychaudhuri equation, which describes the focussing properties of congruences
of geodesics.
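$$ u^\nu\nabla_\nu u^\mu = 0 , \tag{11.2} $$
$$ [u,\xi]^\mu = u^\nu\nabla_\nu\xi^\mu - \xi^\nu\nabla_\nu u^\mu = 0 \quad\Leftrightarrow\quad D_\tau\xi^\mu = \xi^\nu\nabla_\nu u^\mu . \tag{11.3} $$
so the matrix $B^\mu{}_\nu$ describes the evolution and deformation of the deviation vector $\xi^\mu$
along the geodesic. Because $u^\mu$ is affinely geodesic, it satisfies
$$ B_{\mu\nu}u^\nu = u^\nu\nabla_\nu u_\mu = 0 \tag{11.8} $$
and
$$ u^\mu B_{\mu\nu} = \tfrac12\nabla_\nu(u^\mu u_\mu) = 0 , \tag{11.9} $$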
and is thus transverse to $u^\mu$. This is a crucial property we will come back to in the
discussion of the Raychaudhuri equation below. As a consequence one has
$$ u_\mu D_\tau\xi^\mu = 0 \tag{11.10} $$
and hence
$$ \frac{d}{d\tau}(u_\mu\xi^\mu) = D_\tau(u_\mu\xi^\mu) = u_\mu D_\tau\xi^\mu = 0 . \tag{11.11} $$
This means that the u-component of a geodesic deviation vector, in the sense of $u_\mu\xi^\mu$,
is simply constant and contains no interesting information about the geodesic itself.
In the timelike case this means that a vector of the form $\xi^\mu = \xi u^\mu$ is a deviation vector
only if $\xi$ is constant, and then $\xi^\mu$ is simply a translation along the geodesic and therefore
not a deviation vector of interest (and certainly not a vector of the kind one
has in mind when thinking about a deviation vector, which should point away from the
geodesic). In the null case, the interpretation is slightly different (and we will return to
this in section 11.4), but the fact that $u_\mu\xi^\mu$ is simply constant for a deviation vector
remains, and we can without loss of information choose the deviation vector to satisfy
the condition $u_\mu\xi^\mu = 0$.
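$$ (D_\tau)^2\xi^\mu = (D_\tau B^\mu{}_\nu)\xi^\nu + B^\mu{}_\nu D_\tau\xi^\nu = (D_\tau B^\mu{}_\nu + B^\mu{}_\lambda B^\lambda{}_\nu)\,\xi^\nu . \tag{11.12} $$
For the term in brackets we find, using the geodesic equation for $u^\mu$,
$$ \begin{aligned} D_\tau B^\mu{}_\nu + B^\mu{}_\lambda B^\lambda{}_\nu &= u^\lambda\nabla_\lambda\nabla_\nu u^\mu + (\nabla_\lambda u^\mu)(\nabla_\nu u^\lambda) \\ &= u^\lambda\nabla_\lambda\nabla_\nu u^\mu + \nabla_\nu(u^\lambda\nabla_\lambda u^\mu) - u^\lambda\nabla_\nu\nabla_\lambda u^\mu \\ &= u^\lambda(\nabla_\lambda\nabla_\nu - \nabla_\nu\nabla_\lambda)u^\mu = R^\mu{}_{\sigma\lambda\nu}u^\sigma u^\lambda , \end{aligned} \tag{11.13} $$
and plugging this back into (11.12), we obtain straightaway the covariant version (7.36)
of the geodesic deviation equation in the form
$$ (D_\tau)^2\xi^\mu = R^\mu{}_{\sigma\lambda\nu}u^\sigma u^\lambda\xi^\nu . \tag{11.14} $$
$$ u_\mu (D_\tau)^2\xi^\mu = R_{\mu\sigma\lambda\nu}u^\mu u^\sigma u^\lambda\xi^\nu = 0 . \tag{11.15} $$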
Remarks:
1. I hope you agree that this derivation is somewhat more satisfactory than the one
given in section 7.4.
2. The object we have called B in (11.6) and its evolution equation (11.13) will
play a central role in our derivation of the Raychaudhuri equation below.
4. If the curve is not a geodesic (but still parametrised by proper time, so that
$u^\mu u_\mu = -1$), then the above derivation shows that in addition to the force exerted
by the space-time curvature the deviation vector feels a force proportional to the
change of the acceleration $a^\mu = u^\nu\nabla_\nu u^\mu$ along the curve,
$$ (D_\tau)^2\xi^\mu = R^\mu{}_{\sigma\lambda\nu}u^\sigma u^\lambda\xi^\nu + D_\xi a^\mu . \tag{11.16} $$
In flat space, only the last term is present and describes the (tidal) forces arising
from the possible non-uniformity of the external force acting on the particle (or,
better: on the extended object described by a family of worldlines) to produce
the acceleration $a^\mu$. Thus, in precise analogy with the Newtonian situation, the
gravitational (i.e. here now Riemann curvature tensor) contribution to the geodesic
deviation equation should be interpreted as the gravitational tidal force.
Manipulations similar to those leading to (11.14) allow one to derive an equation for
the rate of change of the divergence $\nabla_\mu u^\mu$ of a family of geodesics along the geodesics.
This simple result, known as the Raychaudhuri equation, has important implications and
ramifications in general relativity, in particular in the context of the so-called singularity
theorems of Penrose, Hawking and others, none of which will, however, be explored here
(see footnote 93 of section 28.3 for some references).
Thus $u^\mu$ now denotes a tangent vector field to an affinely parametrised geodesic
congruence, $u^\nu\nabla_\nu u^\mu = 0$ (and $u^\mu u_\mu = -1$ or $u^\mu u_\mu = 0$ everywhere for a timelike or null
congruence). As in section 11.1, we introduce the tensor field (11.6)
$$ B_{\mu\nu} = \nabla_\nu u_\mu . \tag{11.17} $$
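Recall from section 11.1 that $B_{\mu\nu}$ satisfies (11.8), (11.9),
$$ B_{\mu\nu}u^\nu = u^\mu B_{\mu\nu} = 0 \tag{11.18} $$
and therefore only has components in the directions transverse to $u^\mu$. Its trace
$$ \theta = B^\mu{}_\mu = g^{\mu\nu}B_{\mu\nu} = \nabla_\mu u^\mu \tag{11.19} $$
The key equation governing the evolution of $B_{\mu\nu}$ along the integral curves of the geodesic
vector field is (11.13)
$$ D_\tau B_{\mu\nu} + B_{\mu\lambda}B^\lambda{}_\nu = R_{\mu\sigma\lambda\nu}u^\sigma u^\lambda . \tag{11.20} $$
By taking the trace of this equation, we evidently obtain an evolution equation for the
expansion $\theta$, namely
$$ \frac{d}{d\tau}\theta = -(\nabla_\mu u^\nu)(\nabla_\nu u^\mu) - R_{\mu\nu}u^\mu u^\nu . \tag{11.21} $$
Note that this equation, written in the form
$$ u^\mu\nabla_\mu(\nabla_\nu u^\nu) + (\nabla_\mu u^\nu)(\nabla_\nu u^\mu) + R_{\mu\nu}u^\mu u^\nu = 0 . \tag{11.22} $$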
To gain some more insight into the geometric significance of this equation, we now
consider the case that the geodesic congruence $u^\mu$ is timelike and normalised in the
standard way as $u^\mu u_\mu = -1$ (so that $\tau$ is proper time).
$$ h_{\mu\nu} = g_{\mu\nu} + u_\mu u_\nu . \tag{11.23} $$
The properties of this tensor are closely related to those of the (induced metric) tensor
$h_{\mu\nu} = g_{\mu\nu} - \epsilon N_\mu N_\nu$ (15.1) studied in section 15.1 in the context of hypersurfaces.
The main difference in the present context is that $u^\mu$ is not necessarily hypersurface-orthogonal
(section 14.5) and therefore, in particular, not necessarily a normal vector
field to a family of spacelike hypersurfaces. Therefore $h_{\mu\nu}$ does not necessarily have an
interpretation as the induced metric on some hypersurface. Nevertheless, pointwise it
can be interpreted as a metric on the space of vectors transverse to the geodesic and its
purely algebraic properties are identical to those of the induced metric.
In particular,
$$ u^\mu h_{\mu\nu} = h_{\mu\nu}u^\nu = 0 . \tag{11.24} $$
It can therefore be interpreted as the spatial projection of the metric in the directions
orthogonal to the timelike vector field $u^\mu$. This can be seen more explicitly
in terms of the projectors
$$ h^\mu{}_\nu = \delta^\mu{}_\nu + u^\mu u_\nu , \qquad h^\mu{}_\nu h^\nu{}_\lambda = h^\mu{}_\lambda . \tag{11.25} $$
$$ h^\mu{}_\nu u^\nu = 0 , \tag{11.26} $$
h = . (11.27)
satisfies
u t... = . . . = u t... = 0 . (11.29)
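$$ g_{\mu\nu} \to g_{\alpha\beta}h^\alpha{}_\mu h^\beta{}_\nu = g_{\mu\nu} + u_\mu u_\nu = h_{\mu\nu} , \tag{11.30} $$
as anticipated above. Whereas for the space-time metric one obviously has $g^{\mu\nu}g_{\mu\nu} = 4$,
the trace of $h_{\mu\nu}$ is (in the 4-dimensional case)
$$ g^{\mu\nu}h_{\mu\nu} = g^{\mu\nu}g_{\mu\nu} + g^{\mu\nu}u_\mu u_\nu = 4 - 1 = 3 = h^{\mu\nu}h_{\mu\nu} . \tag{11.31} $$
Thus for an affinely parametrised congruence the properties (11.8) and (11.9) show that
$B_{\mu\nu}$ is automatically a spatial or transverse tensor in the sense above,
$$ b_{\mu\nu} \equiv h_\mu{}^\alpha h_\nu{}^\beta B_{\alpha\beta} = B_{\mu\nu} . \tag{11.32} $$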
Note that the affine parametrisation of the timelike geodesic congruence, expressed
by the normalisation condition $u^\mu u_\mu = -1$, is crucial for this entire set-up, since the
projection operator requires a unit vector field. This is to be contrasted with the
situation for null geodesic congruences $\ell^\mu$, to be discussed below, where the property
$\ell^\mu\ell_\mu = 0$ is independent of the parametrisation and one can (and we will) also consider
the case of non-affine parametrisations.
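In the spirit of elasticity theory, we now decompose $b_{\mu\nu}$ into its anti-symmetric,
symmetric traceless and trace part,
$$ b_{\mu\nu} = \omega_{\mu\nu} + \sigma_{\mu\nu} + \tfrac13\theta h_{\mu\nu} , \tag{11.33} $$
with
$$ \begin{aligned} \omega_{\mu\nu} &= \tfrac12 (b_{\mu\nu} - b_{\nu\mu}) \\ \sigma_{\mu\nu} &= \tfrac12 (b_{\mu\nu} + b_{\nu\mu}) - \tfrac13\theta h_{\mu\nu} \\ \theta &= h^{\mu\nu}b_{\mu\nu} = g^{\mu\nu}B_{\mu\nu} = \nabla_\mu u^\mu . \end{aligned} \tag{11.34} $$
The quantities $\omega_{\mu\nu}$, $\sigma_{\mu\nu}$ and $\theta$ are known as the rotation tensor, shear tensor, and
expansion of the congruence (family) of geodesics defined by $u^\mu$.
In terms of these quantities we can write the evolution equation (11.7) for deviation
vectors as
$$ D_\tau\xi^\mu = \omega^\mu{}_\nu\xi^\nu + \sigma^\mu{}_\nu\xi^\nu + \tfrac13\theta\xi^\mu , \tag{11.35} $$
and the evolution equation (11.21) for the expansion as
$$ \frac{d}{d\tau}\theta = -\tfrac13\theta^2 - \sigma^{\mu\nu}\sigma_{\mu\nu} + \omega^{\mu\nu}\omega_{\mu\nu} - R_{\mu\nu}u^\mu u^\nu . \tag{11.36} $$
This is the Raychaudhuri equation for timelike geodesic congruences.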
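The step from (11.21) to (11.36) rests on a purely algebraic identity for the quadratic term, namely that the contraction of $b$ with its transpose equals $\theta^2/3 + \sigma^{\mu\nu}\sigma_{\mu\nu} - \omega^{\mu\nu}\omega_{\mu\nu}$. The following minimal sketch checks this for a random $3\times3$ matrix, which stands in for the components of $b_{\mu\nu}$ in an orthonormal basis of the transverse space (an assumption made here so that $h_{\mu\nu}$ becomes the identity); the function names are hypothetical.

```python
import random

def decompose(b):
    """Split a square matrix into trace, symmetric traceless and antisymmetric parts."""
    n = len(b)
    theta = sum(b[i][i] for i in range(n))
    omega = [[0.5 * (b[i][j] - b[j][i]) for j in range(n)] for i in range(n)]
    sigma = [[0.5 * (b[i][j] + b[j][i]) - (theta / n if i == j else 0.0)
              for j in range(n)] for i in range(n)]
    return theta, sigma, omega

def full_contraction(p, q):
    """Return the sum over i, j of p_ij q_ij."""
    return sum(p[i][j] * q[i][j] for i in range(len(p)) for j in range(len(p)))

rng = random.Random(2)
b = [[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(3)]
theta, sigma, omega = decompose(b)

# Left-hand side: b_ij b_ji, the frame version of (grad u)(grad u) in (11.21).
trace_b_squared = sum(b[i][j] * b[j][i] for i in range(3) for j in range(3))
# Right-hand side: the three terms appearing in (11.36).
decomposed = (theta**2 / 3.0
              + full_contraction(sigma, sigma)
              - full_contraction(omega, omega))
sigma_trace = sum(sigma[i][i] for i in range(3))
print(trace_b_squared, decomposed, sigma_trace)
```

The relative minus sign between the shear and rotation terms arises because the anti-symmetric part picks up a sign under transposition, which is why the two enter the Raychaudhuri equation with opposite signs.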
Remarks:
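$$ \begin{aligned} \theta &= h^{\mu\nu}b_{\mu\nu} = h^{\mu\nu}B_{\mu\nu} = h^{\mu\nu}\nabla_\nu u_\mu \\ &= \tfrac12 h^{\mu\nu}(\nabla_\mu u_\nu + \nabla_\nu u_\mu) = \tfrac12 h^{\mu\nu}L_u g_{\mu\nu} \end{aligned} \tag{11.37} $$
where $L_u$ denotes the Lie derivative along the vector field $u$. Substituting $g_{\mu\nu} =
h_{\mu\nu} - u_\mu u_\nu$, one finds
$$ \theta = \tfrac12 h^{\mu\nu}L_u(h_{\mu\nu} - u_\mu u_\nu) = \tfrac12 h^{\mu\nu}L_u h_{\mu\nu} . \tag{11.38} $$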
a bit of care. When the congruence is hypersurface orthogonal, with the induced
metric (15.6)
$$ h_{ab} = E^\mu_a E^\nu_b h_{\mu\nu} , \tag{11.41} $$
then (11.40) with $h = \det(h_{ab})$ follows from (11.38), because
Here we have made use of the fact that $u^\mu$ and $E^\mu_a$ have vanishing Lie bracket,
because (introducing $\tau$ and $y^a$ as coordinates, instead of the $x^\mu$)
$$ u^\mu = \frac{\partial x^\mu}{\partial\tau} , \qquad E^\mu_a = \frac{\partial x^\mu}{\partial y^a} \tag{11.43} $$
and the Lie bracket gives the commutator of the second partial derivatives of $x^\mu$.
When the congruence is not hypersurface orthogonal, one can still construct a
transverse cross-sectional volume, but one can only choose it to be orthogonal at
a given geodesic. Introducing in a neighbourhood of a point on this geodesic coordinates
$y^a$ labelling the geodesics, as well as the parameter $\tau$ along the geodesic,
the above calculation will then still go through.$^{28}$
2. If required and desired, from (11.20) similar (but somewhat less transparent)
equations can be derived for the evolution of the shear and rotation tensors along
the geodesic congruence, i.e. for $(d/d\tau)\sigma_{\mu\nu}$ and $(d/d\tau)\omega_{\mu\nu}$.
3. Since $\sigma_{\mu\nu}$ and $\omega_{\mu\nu}$ are purely spatial tensors, their squares are non-negative,
$$ \sigma^{\mu\nu}\sigma_{\mu\nu} \ge 0 , \qquad \omega^{\mu\nu}\omega_{\mu\nu} \ge 0 , \tag{11.44} $$
with $\sigma^{\mu\nu}\sigma_{\mu\nu} = 0$ only for $\sigma_{\mu\nu} = 0$ (and likewise for the rotation). They thus enter
the Raychaudhuri equation with opposite signs.
4. In the presence of both these terms it is difficult to say something general about the
evolution of $\theta$. Since the first term ($-\theta^2/3$) is non-positive, an important special
case of the Raychaudhuri equation arises when the rotation is zero, $\omega_{\mu\nu} = 0$. This
happens for example when $u_\mu = \partial_\mu S$ is the gradient co-vector of some function
$S$. In this case $u^\mu$ is orthogonal to the level-surfaces of $S$. In fact, more generally
we have the statement that
$$ u_{[\lambda}\nabla_\mu u_{\nu]} = 0 \quad\Leftrightarrow\quad u_\lambda\omega_{\mu\nu} + u_\mu\omega_{\nu\lambda} + u_\nu\omega_{\lambda\mu} = 0 . \tag{11.46} $$
28
For a more careful proof of this statement see the discussion in section 2.4.8 of E. Poisson, A
Relativists Toolkit.
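Contracting this with $u^\lambda$ and using $u^\lambda u_\lambda = -1$ and $u^\lambda\omega_{\lambda\mu} = 0$, only the first term
survives and one finds on the nose that $\omega_{\mu\nu} = 0$,
$$ u_{[\lambda}\nabla_\mu u_{\nu]} = 0 \quad\Rightarrow\quad \omega_{\mu\nu} = 0 , \tag{11.47} $$
and the Frobenius theorem provides one with the converse statement. Alternatively,
$\omega_{\mu\nu} = 0$ follows from assuming that $u_\mu$ has the explicit hypersurface-orthogonal
form $u_\mu = f\partial_\mu S$. Then one has (14.52)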
f = (u f )u u = 0 . (11.49)
5. Either way, for a hypersurface orthogonal congruence of timelike geodesics one has
$$ \frac{d}{d\tau}\theta = -\tfrac13\theta^2 - \sigma^{\mu\nu}\sigma_{\mu\nu} - R_{\mu\nu}u^\mu u^\nu . \tag{11.50} $$
The first two terms on the right hand side are manifestly non-positive (recall that
$\sigma_{\mu\nu}$ is a spatial tensor and hence $\sigma^{\mu\nu}\sigma_{\mu\nu} \ge 0$). Thus, if one assumes that the
geometry is such that
$$ R_{\mu\nu}u^\mu u^\nu \ge 0 \tag{11.51} $$
(by the Einstein equations to be discussed in section 18, this translates into a
positivity condition on the energy-momentum tensor known as the strong energy
condition, cf. section 21.1), one finds
$$ \frac{d}{d\tau}\theta = -\tfrac13\theta^2 - \sigma^{\mu\nu}\sigma_{\mu\nu} - R_{\mu\nu}u^\mu u^\nu \le 0 . \tag{11.52} $$
This means that the divergence (convergence) of geodesics will decrease (increase)
in time. The interpretation of this result is that gravity is an attractive force (for
matter satisfying the strong energy condition) whose effect is to focus geodesics.
6. According to (11.52), $d\theta/d\tau$ is not only negative but actually bounded from above
by
$$ \frac{d}{d\tau}\theta \le -\tfrac13\theta^2 . \tag{11.53} $$
Rewriting this equation as
$$ \frac{d}{d\tau}\frac{1}{\theta} \ge \frac13 , \tag{11.54} $$
one deduces immediately that
$$ \frac{1}{\theta(\tau)} \ge \frac{1}{\theta(0)} + \frac{\tau}{3} . \tag{11.55} $$
This has the rather dramatic implication that, if $\theta(0) < 0$ (i.e. the geodesics are
initially converging), then $\theta(\tau) \to -\infty$ within finite proper time $\tau \le 3/|\theta(0)|$,
7. If one thinks of the geodesics as trajectories of physical particles, this is obviously
a rather catastrophic situation in which these particles will be infinitely squashed.
In general, however, the divergence of $\theta$ only indicates that the family of geodesics
develops what is known as a caustic where different geodesics meet.
8. Simple non-catastrophic examples of such caustics are e.g. the poles of a sphere
where great circles meet, or even just the origin in Euclidean space $\mathbb{R}^n$ when
considering the family of radial geodesics passing through the origin. E.g. in the
latter case the tangent vector field is simply $\partial_r$, and its divergence is
$$ \nabla_\mu(\partial_r)^\mu = \frac{1}{\sqrt g}\,\partial_\mu\big(\sqrt g\,(\partial_r)^\mu\big) \propto r^{-1} , \tag{11.57} $$
9. Nevertheless, the above result plays a crucial role in establishing the occurrence of
true singularities in general relativity if supplemented e.g. by conditions which ensure
that such harmless caustics cannot appear, as this means that the geodesic
cannot be extended to where one would find $\theta \to -\infty$. This kind of argument
(leading to the conclusion of geodesic incompleteness of a space-time) is one of the
typical ingredients of the singularity theorems of general relativity (see footnote
93 of section 28.3 for some references).
10. The adaptation of this formalism in general and the Raychaudhuri equation in
particular to congruences of null geodesics requires some more care (and is ul-
timately expressed in terms of 2-dimensional rather than 3-dimensional spatial
tensors), and we will discuss this in section 11.4 below.
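The focussing bound in remark 6 can be illustrated by integrating the Raychaudhuri equation numerically. The sketch below is an assumption-laden toy model: it lumps the non-negative terms $\sigma^{\mu\nu}\sigma_{\mu\nu} + R_{\mu\nu}u^\mu u^\nu$ into a constant `source` (a hypothetical parameter, not something taken from the text), integrates $d\theta/d\tau = -\theta^2/3 - \text{source}$ with a fixed-step Runge-Kutta scheme, and reports the proper time at which $\theta$ has dropped below a large negative cutoff.

```python
def focusing_time(theta0, source=0.0, dt=1e-4, cutoff=-1e3, tau_max=10.0):
    """Proper time at which theta falls below `cutoff`, or None if it never does.

    Integrates d(theta)/d(tau) = -theta^2 / 3 - source with a classical
    fourth-order Runge-Kutta step; `source` >= 0 stands in for the shear and
    Ricci terms of the Raychaudhuri equation.
    """
    def rhs(theta):
        return -theta * theta / 3.0 - source

    theta, tau = theta0, 0.0
    while tau < tau_max:
        if theta < cutoff:
            return tau
        k1 = rhs(theta)
        k2 = rhs(theta + 0.5 * dt * k1)
        k3 = rhs(theta + 0.5 * dt * k2)
        k4 = rhs(theta + dt * k3)
        theta += dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
        tau += dt
    return None

theta_initial = -1.0                      # initially converging congruence
bound = 3.0 / abs(theta_initial)          # the bound 3 / |theta(0)|
time_without_source = focusing_time(theta_initial)
time_with_source = focusing_time(theta_initial, source=0.5)
print(bound, time_without_source, time_with_source)
```

With vanishing source the bound is saturated (the expansion blows up essentially at $3/|\theta(0)|$), while any positive source makes the congruence focus earlier.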
In section 11.4 we will derive the null counterpart of the Raychaudhuri equation for
timelike geodesic congruences discussed in section 11.2 above. The set-up we will use
is a suitable combination of that for timelike geodesics and the formalism of projectors
adapted to null directions. As a preparation for this, and a useful by-product, in this
section we will first derive a variant of the geodesic deviation equation for null geodesics,
the transverse null geodesic deviation equation.
Thus we consider a null geodesic (or congruence of null geodesics), with tangent vector
field $\ell^\mu$, and we will initially choose these null geodesics to be affinely parametrised so
that one has
$$ \ell^\nu\nabla_\nu\ell^\mu = 0 , \qquad \ell^\mu\ell_\mu = 0 . \tag{11.58} $$
The affine parameter along the null geodesics of this congruence will (for lack of
imagination) be called $\tau$.
Now recall from the discussion of the geodesic deviation equation in section 11.1 that
for any geodesic deviation vector $\xi^\mu$, i.e. a vector satisfying the condition
$$ D_\tau\xi^\mu = \xi^\nu\nabla_\nu u^\mu , \tag{11.59} $$
2. In the null case, however, the condition $\ell_\mu\xi^\mu = 0$ does not accomplish this, i.e.
does not remove the component of $\xi^\mu$ pointing in the direction of $\ell^\mu$ because it
imposes no condition precisely on that component. Thus we expect the deviation
vector to have two uninteresting components in the null case, $\ell_\mu\xi^\mu$ and the
component of $\xi^\mu$ in the direction of $\ell^\mu$:
= (11.62)
Therefore it is natural to project out both these components from $\xi^\mu$. In order to
construct a suitable projection operator, one can proceed as in section 16.4 and introduce
a complementary null vector (field) $n^\mu$ with
$$ n^\mu n_\mu = 0 , \qquad \ell^\mu n_\mu = -1 . \tag{11.64} $$
Then
$$ \xi^\mu = \xi\ell^\mu + \ldots \quad\Rightarrow\quad \xi = -n_\mu\xi^\mu \tag{11.65} $$
and we can eliminate both boring components by imposing the transversality conditions
$$ \ell_\mu\xi^\mu = n_\mu\xi^\mu = 0 \tag{11.66} $$
= . (11.67)
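As in (11.6) we introduce
$$ B_{\mu\nu} = \nabla_\nu\ell_\mu , \tag{11.68} $$
$$ D_\tau\xi^\mu = B^\mu{}_\nu\xi^\nu . \tag{11.69} $$
$$ B_{\mu\nu}\ell^\nu = \ell^\mu B_{\mu\nu} = 0 , \tag{11.70} $$
but $B_{\mu\nu}$ is not automatically orthogonal to $n^\mu$ (and we will come back to and rectify
this below). Exactly the same calculation as (11.13) in section 11.1 now shows that
$$ D_\tau B^\mu{}_\nu + B^\mu{}_\lambda B^\lambda{}_\nu = R^\mu{}_{\sigma\lambda\nu}\ell^\sigma\ell^\lambda \tag{11.71} $$
$$ (D_\tau)^2\xi^\mu = R^\mu{}_{\sigma\lambda\nu}\ell^\sigma\ell^\lambda\xi^\nu . \tag{11.72} $$
$$ \ell_\mu(D_\tau)^2\xi^\mu = R_{\mu\sigma\lambda\nu}\ell^\mu\ell^\sigma\ell^\lambda\xi^\nu = 0 , \tag{11.73} $$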
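Associated with a choice of $n^\mu$ we have a decomposition of the metric into a
longitudinal and a transverse spatial part,
$$ g_{\mu\nu} = s_{\mu\nu} - (\ell_\mu n_\nu + n_\mu\ell_\nu) , \tag{11.74} $$
and
$$ g^{\mu\nu}s_{\mu\nu} = s^{\mu\nu}s_{\mu\nu} = s^\mu{}_\mu = 2 . \tag{11.76} $$
$$ s^\mu{}_\nu = \delta^\mu{}_\nu + (\ell^\mu n_\nu + n^\mu\ell_\nu) : \qquad s^\mu{}_\nu s^\nu{}_\lambda = s^\mu{}_\lambda , \qquad s^\mu{}_\nu\ell^\nu = s^\mu{}_\nu n^\nu = 0 , \tag{11.77} $$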
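The algebraic properties of the transverse projector can be illustrated with an explicit example. The sketch below is a check at a single point under stated assumptions: the metric is taken to be the Minkowski metric $\mathrm{diag}(-1,1,1,1)$, the pair of null vectors $\ell^\mu = (1,0,0,1)/\sqrt2$ and $n^\mu = (1,0,0,-1)/\sqrt2$ is one convenient choice satisfying $\ell^\mu n_\mu = -1$, and the helper names are hypothetical.

```python
import math

ETA = [[-1.0, 0.0, 0.0, 0.0],
       [0.0, 1.0, 0.0, 0.0],
       [0.0, 0.0, 1.0, 0.0],
       [0.0, 0.0, 0.0, 1.0]]
R2 = 1.0 / math.sqrt(2.0)
ell = [R2, 0.0, 0.0, R2]    # outgoing null vector
n = [R2, 0.0, 0.0, -R2]     # complementary null vector

def lower(v):
    """Lower an index with the Minkowski metric."""
    return [sum(ETA[m][k] * v[k] for k in range(4)) for m in range(4)]

def dot(v, w):
    return sum(a * b for a, b in zip(lower(v), w))

ell_low, n_low = lower(ell), lower(n)
# Mixed projector s^mu_nu = delta^mu_nu + ell^mu n_nu + n^mu ell_nu.
s = [[(1.0 if m == k else 0.0) + ell[m] * n_low[k] + n[m] * ell_low[k]
      for k in range(4)] for m in range(4)]

def apply(matrix, v):
    return [sum(matrix[m][k] * v[k] for k in range(4)) for m in range(4)]

s_squared = [[sum(s[m][j] * s[j][k] for j in range(4)) for k in range(4)]
             for m in range(4)]
idempotency_error = max(abs(s_squared[m][k] - s[m][k])
                        for m in range(4) for k in range(4))
trace_s = sum(s[m][m] for m in range(4))
print(dot(ell, ell), dot(n, n), dot(ell, n))
print(trace_s, idempotency_error, apply(s, ell), apply(s, n))
```

In this example the projector is simply $\mathrm{diag}(0,1,1,0)$, i.e. it projects onto the two spatial directions transverse to both null vectors, which makes the trace 2 in (11.76) manifest.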
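With the aid of these projectors, we can now write the fully projected version of (11.69)
as
$$ s^\mu{}_\nu D_\tau(s^\nu{}_\rho\xi^\rho) = b^\mu{}_\nu\xi^\nu \tag{11.78} $$
$$ b^\mu{}_\nu = s^\mu{}_\alpha s^\beta{}_\nu B^\alpha{}_\beta . \tag{11.79} $$
Likewise the purely transverse (to $\ell$ and $n$) variant of the null geodesic deviation equation (11.72)
can be written as
$$ s^\mu{}_\nu (D_\tau)^2(s^\nu{}_\rho\xi^\rho) = s^\mu{}_\nu s^\rho{}_\sigma R^\nu{}_{\alpha\beta\rho}\ell^\alpha\ell^\beta\xi^\sigma . \tag{11.80} $$
While this is essentially the final result, it is not particularly transparent yet. We will
put this equation into a somewhat more attractive form below, in which manifestly only
the transverse components of the deviation vector and $R_{\mu\nu\rho\sigma}$ appear.
First of all, note that the auxiliary normal vector $n^\mu$ is not unique. For a fixed choice
of $\ell^\mu$, at a point on the geodesic, that is for a given value of $\tau$, it is uniquely determined
up to null rotations around $\ell^\mu$,
$$ \ell^\mu \to \ell^\mu , \qquad n^\mu \to n^\mu + \alpha^a E^\mu_a + \tfrac12\alpha^2\ell^\mu , \qquad E^\mu_a \to E^\mu_a + \alpha_a\ell^\mu , \tag{11.81} $$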
Then the properties of parallel transport obtained in section 4.8 imply that the
conditions (11.64) on $n^\mu$ hold everywhere along the null geodesic (or congruence of null
geodesics) if they are satisfied initially. This reduces the ambiguity in (11.81) to
$\tau$-independent null rotations.
In fact, one can do even better than that and choose (see also the discussion at the end
of section 16.4, in particular around (16.57)) an entire pseudo-orthonormal frame
$$ \{E_A\} = \{E_+ = \ell , E_- = n , E_a\} : \qquad g_{\mu\nu}E^\mu_A E^\nu_B = \eta_{AB} \tag{11.83} $$
where
$$ \eta_{++} = \eta_{--} = 0 , \quad \eta_{+-} = -1 , \quad \eta_{a+} = \eta_{a-} = 0 , \quad \eta_{ab} = \delta_{ab} . \tag{11.84} $$
If one selects such a frame at one point along the geodesic and then parallel transports
the frame along the geodesic, the orthogonality relations (11.83) will hold everywhere
along the geodesic. Thus we can always choose a basis $E_A$ such that
$$ D_\tau E^\mu_A = 0 , \qquad g_{\mu\nu}E^\mu_A E^\nu_B = \eta_{AB} . \tag{11.85} $$
With this choice, a transverse geodesic deviation vector is simply one which has
components only in the $E_a$-directions,
$$ \ell_\mu\xi^\mu = n_\mu\xi^\mu = 0 \quad\Rightarrow\quad \xi^\mu = \xi^a E^\mu_a , \tag{11.86} $$
or simply
$$ \xi = \xi^a E_a . \tag{11.87} $$
$$ \frac{d^2}{d\tau^2}\xi^a = R_{a++b}\,\xi^b = -R_{a+b+}\,\xi^b , \tag{11.89} $$
where the $R_{a+b+}$ are the frame components of the Riemann tensor,
$$ R_{a+b+} = E^\mu_a E^\nu_+ E^\rho_b E^\sigma_+ R_{\mu\nu\rho\sigma} = E^\mu_a E^\rho_b R_{\mu\nu\rho\sigma}\ell^\nu\ell^\sigma . \tag{11.90} $$
Thus the transverse null geodesic deviation equation has the form of a $(D-2)$-dimensional
(transverse) harmonic oscillator equation,
$$ \frac{d^2}{d\tau^2}\xi^a = -(\omega^2)_{ab}\,\xi^b , \tag{11.91} $$
with the time-dependent symmetric frequency matrix
The notation used here is perhaps suggestive but it is not meant to imply that $\omega^2$ is
necessarily positive: the frequencies can be real or imaginary. Using the decomposition
(10.75) of the Riemann tensor into its traceless and trace parts, we can (with $D=4$,
$\eta_{a+} = \eta_{++} = 0$, $\eta_{ab} = \delta_{ab}$) decompose $R_{a+b+}$ as
In particular, if the Ricci tensor is zero (as we will see this means that the metric
solves the vacuum Einstein equations), the frequency matrix $\omega^2$ is symmetric traceless
and thus necessarily has positive and negative eigenvalues (corresponding to real and
imaginary frequencies).
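We now consider a null geodesic congruence, with tangent vector field again denoted by
$\ell^\mu$, and we will initially choose these null geodesics to be affinely parametrised so that
one has
$$ \ell^\nu\nabla_\nu\ell^\mu = 0 , \qquad \ell^\mu\ell_\mu = 0 . \tag{11.94} $$
We use the same framework as in the previous section, with an auxiliary null vector field
$n^\mu$ with $\ell^\mu n_\mu = -1$, the associated projectors etc.
$$ b_{\mu\nu} = s_\mu{}^\alpha s_\nu{}^\beta B_{\alpha\beta} . \tag{11.95} $$
Performing this projection explicitly, one sees that this spatial projection $b_{\mu\nu}$ is equal
to
$$ b_{\mu\nu} = s_\mu{}^\alpha s_\nu{}^\beta B_{\alpha\beta} = B_{\mu\nu} + \ell_\mu n^\alpha B_{\alpha\nu} + \ell_\nu n^\beta B_{\mu\beta} + \ell_\mu\ell_\nu n^\alpha n^\beta B_{\alpha\beta} . \tag{11.96} $$
This has two useful immediate consequences that we will make use of in the following,
namely
that the spatial trace of $b_{\mu\nu}$ with respect to $s^{\mu\nu}$ is equal to the space-time trace
of $B_{\mu\nu}$ (with respect to $g^{\mu\nu}$),
$$ g^{\mu\nu}B_{\mu\nu} = g^{\mu\nu}b_{\mu\nu} = s^{\mu\nu}b_{\mu\nu} , \tag{11.97} $$
$$ B^{\mu\nu}B_{\nu\mu} = b^{\mu\nu}b_{\nu\mu} . \tag{11.98} $$
We can now, as in the timelike case, decompose $b_{\mu\nu}$ orthogonally into its irreducible
(trace, symmetric traceless, anti-symmetric) parts,
$$ \begin{aligned} b_{\mu\nu} &= \tfrac12\theta s_{\mu\nu} + \tfrac12 (b_{\mu\nu} + b_{\nu\mu} - \theta s_{\mu\nu}) + \tfrac12 (b_{\mu\nu} - b_{\nu\mu}) \\ &= \tfrac12\theta s_{\mu\nu} + \sigma_{\mu\nu} + \omega_{\mu\nu} . \end{aligned} \tag{11.99} $$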
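Here $\theta$ is the expansion
$$ \theta = s^{\mu\nu}b_{\mu\nu} = s^{\mu\nu}\nabla_\mu\ell_\nu = g^{\mu\nu}\nabla_\mu\ell_\nu = \nabla_\mu\ell^\mu , \tag{11.100} $$
Remarks:
$$ \theta = \tfrac12 s^{\mu\nu}L_\ell s_{\mu\nu} . \tag{11.101} $$
2. The equivalence between the spatial and space-time traces of $\nabla_\mu\ell_\nu$ in the above
equation is due to the fact that we have chosen $\ell^\mu$ to be affinely parametrised. We
will always define $\theta$ to be the spatial trace (divergence) of $\nabla_\mu\ell_\nu$, even when $\ell^\mu$ is
not affinely parametrised, but in that case $\theta$ and $\nabla_\mu\ell^\mu$ are no longer equal (see
(11.126)). We will return to this issue below.
3. As regards the other terms, $\sigma_{\mu\nu}$ and $\omega_{\mu\nu}$ are again known as the shear tensor and
rotation tensor respectively.
$$ B^{\mu\nu}B_{\nu\mu} = b^{\mu\nu}b_{\nu\mu} = \sigma^{\mu\nu}\sigma_{\nu\mu} + \tfrac12\theta^2 + \omega^{\mu\nu}\omega_{\nu\mu} . \tag{11.103} $$
6. Because the tensors appearing on the right-hand side of this equation are spatial
tensors, their squares are non-negative,
$$ \sigma^{\mu\nu}\sigma_{\mu\nu} \ge 0 , \qquad \omega^{\mu\nu}\omega_{\mu\nu} \ge 0 . \tag{11.104} $$
$$ \qquad = -R_{\mu\nu}\ell^\mu\ell^\nu - (\nabla_\nu\ell^\mu)(\nabla_\mu\ell^\nu) + \nabla_\mu(\ell^\nu\nabla_\nu\ell^\mu) . $$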
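The 2nd term is just $-B^{\mu\nu}B_{\nu\mu}$ and the 3rd term is zero because $\ell^\mu$ is geodesic. Thus
one finds the Raychaudhuri equation for null congruences
$$ \frac{d}{d\tau}\theta = -R_{\mu\nu}\ell^\mu\ell^\nu - \tfrac12\theta^2 - \sigma^{\mu\nu}\sigma_{\mu\nu} + \omega^{\mu\nu}\omega_{\mu\nu} . \tag{11.107} $$
Using (11.102) in the form
$$ \frac{d}{d\tau}\sqrt s = \theta\sqrt s , \tag{11.108} $$
we can also write this as an equation for the change in the expansion rate of the
cross-sectional area $\sqrt s$ of the congruence. This leads to an additional $+\theta^2$ in the evolution
equation, and thus flips the sign of the 2nd term of (11.107), resulting in
$$ \frac{d^2}{d\tau^2}\sqrt s = \left(-R_{\mu\nu}\ell^\mu\ell^\nu + \tfrac12\theta^2 - \sigma^{\mu\nu}\sigma_{\mu\nu} + \omega^{\mu\nu}\omega_{\mu\nu}\right)\sqrt s . \tag{11.109} $$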
Remarks:
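3. Analogously to the timelike case, (11.112) has the consequence that if one has an
initially converging null congruence, $\theta(\tau_0) < 0$, then because of
$$ \frac{d}{d\tau}\theta \le -\tfrac12\theta^2 \quad\Rightarrow\quad \frac{1}{\theta(\tau)} \ge \frac{1}{\theta(\tau_0)} + \frac{\tau - \tau_0}{2} \tag{11.113} $$
$1/\theta(\tau) \to 0$ or $\theta(\tau) \to -\infty$ at the latest at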
(if the geodesics can be extended that far). As in the timelike case, this usually
indicates the formation of a (harmless) caustic where these null geodesics cross.
5. An argument similar to that in the timelike case shows that the rotation vanishes
if (and locally only if, by Frobenius) $\ell^\mu$ is hypersurface orthogonal
$$ \omega_{\mu\nu} = 0 \quad\Leftrightarrow\quad \ell^\mu \text{ hypersurface-orthogonal} . \tag{11.119} $$
6. The expansion properties of families of null geodesics play a crucial role both in
the singularity theorems of general relativity (where for example so-called trapped
surfaces are characterised by negative expansions for both ingoing and outgoing
families of lightrays), and in the study of black holes and the laws governing the
evolution of their event horizons (where the interest is in the null geodesic congruences
generating the horizon). In particular, in the latter case the Raychaudhuri
equation is one crucial ingredient in the proof of the statement (Hawking's theorem)
that under reasonable conditions the cross-sectional area of the event horizon
of a black hole cannot decrease.
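The focusing statement in remark 3 can be illustrated numerically. The following is a minimal sketch, not part of the original argument: it integrates $d\theta/d\lambda = -\theta^2/2 - \mathrm{extra}$ for an initially negative expansion and compares with the closed-form solution that saturates the inequality. The function names and the constant `extra` (a hypothetical stand-in for the non-negative Ricci and shear contributions) are assumptions made for illustration only.

```python
def theta_saturated(lam, lam0, theta0):
    """Solution of d(theta)/d(lambda) = -theta**2 / 2 with theta(lam0) = theta0.

    This is the case in which the focusing inequality is an equality
    (no Ricci term, no shear, no rotation).
    """
    return theta0 / (1.0 + 0.5 * theta0 * (lam - lam0))


def latest_focusing_point(lam0, theta0):
    """Parameter value by which theta must have diverged if theta0 < 0."""
    if theta0 >= 0:
        raise ValueError("the argument needs an initially converging congruence")
    return lam0 + 2.0 / abs(theta0)


def integrate_theta(theta0, lam0, lam1, extra=0.0, steps=20000):
    """Fourth-order Runge-Kutta integration of d(theta)/d(lambda) = -theta**2/2 - extra.

    `extra` >= 0 is a hypothetical constant standing in for the non-negative
    Ricci and shear terms; it is an illustration, not a value from the text.
    """
    def rhs(th):
        return -0.5 * th * th - extra

    h = (lam1 - lam0) / steps
    theta = theta0
    for _ in range(steps):
        k1 = rhs(theta)
        k2 = rhs(theta + 0.5 * h * k1)
        k3 = rhs(theta + 0.5 * h * k2)
        k4 = rhs(theta + h * k3)
        theta += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return theta


if __name__ == "__main__":
    lam0, theta0 = 0.0, -0.5
    print("theta diverges at the latest at lambda =", latest_focusing_point(lam0, theta0))
    for lam in (1.0, 2.0, 3.0):
        print(lam, theta_saturated(lam, lam0, theta0), integrate_theta(theta0, lam0, lam))
```

With a positive `extra` the numerical solution becomes more negative than the saturated one at the same parameter value, so the divergence can only occur earlier, which is the content of the bound.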
Let us now look at the case when the null geodesic congruence is not affinely parametrised,
i.e. when, instead of (11.94), the starting point is a null vector field satisfying
\[ \ell_\alpha\ell^\alpha = 0 , \qquad \ell^\beta\nabla_\beta\ell^\alpha = \kappa\,\ell^\alpha , \tag{11.123} \]
with $\kappa$ the inaffinity. Then a couple of things change in the derivation, but the end
result (11.129) turns out to differ from (11.107) by only one term (and I will give an
alternative and much quicker derivation of the result below).
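As before, we can choose an auxiliary null vector field $n^\alpha$, construct the projectors $s^\alpha{}_\beta$
etc. Defining again $B_{\alpha\beta} = \nabla_\beta\ell_\alpha$, one still has $\ell^\alpha B_{\alpha\beta} = 0$ (because this is implied by
$\ell_\alpha\ell^\alpha = 0$), but instead of $B_{\alpha\beta}\ell^\beta = 0$ one now has
\[ B_{\alpha\beta}\ell^\beta = \kappa\,\ell_\alpha . \tag{11.124} \]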
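While the projection (11.96) remains unchanged, i.e. the relation between $b_{\alpha\beta}$ and $B_{\alpha\beta}$
has the same form as in (11.96), the equations (11.97) and (11.98) for the trace and
square of $B_{\alpha\beta}$ differ. Instead of (11.97) one has
\[ \begin{aligned}
\theta_\ell = s^{\alpha\beta}b_{\alpha\beta} &= (g^{\alpha\beta} + \ell^\alpha n^\beta + n^\alpha\ell^\beta)B_{\alpha\beta} \\
&= \nabla_\alpha\ell^\alpha + \kappa\,n^\alpha\ell_\alpha = \nabla_\alpha\ell^\alpha - \kappa .
\end{aligned} \tag{11.125} \]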
\[ B^{\alpha\beta}B_{\beta\alpha} = b^{\alpha\beta}b_{\beta\alpha} + \kappa^2 . \tag{11.127} \]
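Putting everything together and calculating $(d/d\lambda)\theta_\ell$ as in (11.106), one then finds
\[ \begin{aligned}
\frac{d}{d\lambda}(\theta_\ell + \kappa) &= -R_{\alpha\beta}\ell^\alpha\ell^\beta - (\nabla_\alpha\ell^\beta)(\nabla_\beta\ell^\alpha) + \nabla_\beta(\ell^\alpha\nabla_\alpha\ell^\beta) \\
&= -R_{\alpha\beta}\ell^\alpha\ell^\beta - B^{\alpha\beta}B_{\beta\alpha} + \nabla_\beta(\kappa\ell^\beta) \\
&= -R_{\alpha\beta}\ell^\alpha\ell^\beta - b^{\alpha\beta}b_{\beta\alpha} - \kappa^2 + \frac{d}{d\lambda}\kappa + \kappa(\theta_\ell + \kappa) \\
&= -R_{\alpha\beta}\ell^\alpha\ell^\beta - b^{\alpha\beta}b_{\beta\alpha} + \kappa\theta_\ell + \frac{d}{d\lambda}\kappa .
\end{aligned} \tag{11.128} \]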
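Thus the net effect of dealing with a non-affinely parametrised null congruence is that
one just picks up one additional term on the right-hand side of the Raychaudhuri equation,
\[ \frac{d}{d\lambda}\theta_\ell = \kappa\theta_\ell - R_{\alpha\beta}\ell^\alpha\ell^\beta - \tfrac12\theta_\ell^2 - \sigma^{\alpha\beta}\sigma_{\alpha\beta} + \omega^{\alpha\beta}\omega_{\alpha\beta} . \tag{11.129} \]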
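A quick(er) way to derive (11.129) is from the result (11.107) for affinely parametrised
null geodesics, by determining how the quantities appearing in (11.107) change under a
reparametrisation
\[ \tilde\ell^\alpha = f\,\ell^\alpha \tag{11.130} \]
On the other hand for the expansion parameter etc one deduces from
\[ \tilde B_{\alpha\beta} = \nabla_\beta\tilde\ell_\alpha = f B_{\alpha\beta} + \ell_\alpha\partial_\beta f \tag{11.133} \]
that
\[ \tilde b_{\alpha\beta} = f\,b_{\alpha\beta} \tag{11.134} \]
which implies
\[ \tilde b_{\alpha\beta} = f\,b_{\alpha\beta} \quad\Rightarrow\quad \tilde\theta_\ell = f\,\theta_\ell , \quad \tilde\sigma_{\alpha\beta} = f\,\sigma_{\alpha\beta} , \quad \tilde\omega_{\alpha\beta} = f\,\omega_{\alpha\beta} \tag{11.135} \]
Plugging these results into (11.107) one obtains on the nose (11.129) (with $\ell^\alpha \to \tilde\ell^\alpha$,
$\theta_\ell \to \tilde\theta_\ell$ etc.).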
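As a cross-check, the rescaling argument can be spelled out in a few lines. The following is only a sketch, in notation assumed here rather than quoted from the text: $\ell^\alpha$ is affinely parametrised and obeys (11.107), $\tilde\ell^\alpha = f\ell^\alpha$ is the rescaled field, and tildes mark the quantities built from it, with $\tilde\theta = f\theta$, $\tilde\sigma_{\alpha\beta} = f\sigma_{\alpha\beta}$, $\tilde\omega_{\alpha\beta} = f\omega_{\alpha\beta}$.

```latex
\[
\begin{aligned}
\tilde\ell^\alpha\nabla_\alpha\tilde\ell^\beta
  &= f\,\ell^\alpha\nabla_\alpha(f\ell^\beta)
   = (\ell^\alpha\partial_\alpha f)\,\tilde\ell^\beta
  \quad\Rightarrow\quad \tilde\kappa = \ell^\alpha\partial_\alpha f , \\
\tilde\ell^\alpha\partial_\alpha\tilde\theta
  &= f\,\ell^\alpha\partial_\alpha(f\theta)
   = \tilde\kappa\,\tilde\theta + f^2\,\ell^\alpha\partial_\alpha\theta \\
  &= \tilde\kappa\,\tilde\theta
   - R_{\alpha\beta}\tilde\ell^\alpha\tilde\ell^\beta
   - \tfrac12\tilde\theta^2
   - \tilde\sigma^{\alpha\beta}\tilde\sigma_{\alpha\beta}
   + \tilde\omega^{\alpha\beta}\tilde\omega_{\alpha\beta} .
\end{aligned}
\]
```

The first line uses that $\ell^\alpha$ is affinely parametrised, and the last step inserts (11.107) for $\ell^\alpha\partial_\alpha\theta$ and absorbs the overall factor $f^2$ into the tilded quantities. The result has the form of (11.129).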
11.6 Expansions and Inaffinities of Radial Null Congruences
In this section, we look at some general properties of radial null congruences in a spher-
ically symmetric space-time. All of the results of the previous sections 11.4 and 11.5
are of course also valid in this case, but the spherically symmetric case also has some
special and simplifying features.
Thus we consider a spherically symmetric metric. Such a metric can always be written
in the form
\[ ds^2 = -A(t,r)\,dt^2 + B(t,r)\,dr^2 + r^2 d\Omega^2 \tag{11.136} \]
by a suitable choice of coordinates. However, we will not need to commit ourselves to
this particular choice of coordinates. By making an arbitrary coordinate transformation
preserving the manifest spherical symmetry, this metric can be written in the form
\[ ds^2 = g_{ab}(z)\,dz^a dz^b + r(z)^2 d\Omega^2 \]
for some 2-dimensional Lorentzian metric $g_{ab}(z)$, and with $r = r(z^a)$ now a function of
the new coordinates.
We now consider two linearly independent radial and spherically symmetric null vector
fields $\ell^\alpha$ and $n^\alpha$, which we choose to be cross normalised such that
\[ \ell_\alpha\ell^\alpha = n_\alpha n^\alpha = 0 , \qquad \ell_\alpha n^\alpha = -1 . \tag{11.139} \]
Remarks:
1. Here "radial" means that the vector field has components only in the $z^a$-directions transverse
to the sphere, and "spherically symmetric" that the coefficients only depend on
the $z^a$ and not on the coordinates of the sphere (this can of course also, if desired,
be phrased in a more coordinate-independent way, e.g. as the statement that the
Lie derivatives of $\ell^\alpha$ and $n^\alpha$ along the Killing vectors generating the rotational
symmetry vanish, but for present purposes not much is gained by this).
2. In concrete applications we will choose $n^\alpha$ to be ingoing (in the sense that future
directed null rays tangent to $n^\alpha$ will move towards smaller values of $r$) and $\ell^\alpha$ to
be (asymptotically) outgoing.
3. The minus sign in the cross normalisation is such that both vector fields are either
future or past oriented (and we will of course choose the former).
4. Note that the individual normalisation of $\ell^\alpha$ and $n^\alpha$ is not fixed by the above
conditions, i.e. one can still perform a boost, rescaling $\ell^\alpha$ by a positive function
and $n^\alpha$ by its inverse. This can e.g. be used to select a preferred normalisation for one of them. If $\ell^\alpha$
has been fixed, then, in spherical symmetry and with the assumption that $n^\alpha$ is
also purely radial (longitudinal), $n^\alpha$ is uniquely determined by the 2 conditions
$n_\alpha n^\alpha = 0$ and $\ell_\alpha n^\alpha = -1$. This should be contrasted with the situation without
spherical symmetry where, as discussed in section 11.3, there is still the additional
freedom to perform null rotations on $n^\alpha$.
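Spherical symmetry (and the choice of spherically symmetric null vector fields) also has
other implications. For instance, it follows from spherical symmetry that $\ell^\beta\nabla_\beta\ell^\alpha$ will
be some linear combination of $\ell^\alpha$ and $n^\alpha$ (i.e. no component tangent to the transverse
sphere),
\[ \ell^\beta\nabla_\beta\ell^\alpha = A\,\ell^\alpha + B\,n^\alpha \tag{11.141} \]
(and likewise for $n^\alpha$). Taking the scalar product with $\ell_\alpha$ and using
\[ \ell_\alpha(\ell^\beta\nabla_\beta\ell^\alpha) = \tfrac12\ell^\beta\nabla_\beta(\ell_\alpha\ell^\alpha) = 0 , \tag{11.142} \]
one finds $B = 0$ (and likewise for $n^\alpha$), so that both vector fields are geodesic, with
inaffinities $\kappa_\ell$ and $\kappa_n$,
\[ \ell^\beta\nabla_\beta\ell^\alpha = \kappa_\ell\,\ell^\alpha , \qquad n^\beta\nabla_\beta n^\alpha = \kappa_n\,n^\alpha . \tag{11.143} \]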
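The boost freedom can then e.g. be used to choose either $\ell^\alpha$ or $n^\alpha$ to be affinely
parametrised (but usually not both of them simultaneously).
\[ \ell^\beta\nabla_\beta n^\alpha = -\kappa_\ell\,n^\alpha , \qquad n^\beta\nabla_\beta\ell^\alpha = -\kappa_n\,\ell^\alpha . \tag{11.144} \]
\[ \begin{aligned}
\kappa_\ell &= -n_\alpha\ell^\beta\nabla_\beta\ell^\alpha = \ell_\alpha\ell^\beta\nabla_\beta n^\alpha \\
\kappa_n &= -\ell_\alpha n^\beta\nabla_\beta n^\alpha = n_\alpha n^\beta\nabla_\beta\ell^\alpha ,
\end{aligned} \tag{11.145} \]
or
\[ \begin{aligned}
\kappa_\ell &= \tfrac12\ell^\alpha\ell^\beta(\nabla_\alpha n_\beta + \nabla_\beta n_\alpha) = \tfrac12\ell^\alpha\ell^\beta L_n g_{\alpha\beta} \\
\kappa_n &= \tfrac12 n^\alpha n^\beta(\nabla_\alpha\ell_\beta + \nabla_\beta\ell_\alpha) = \tfrac12 n^\alpha n^\beta L_\ell g_{\alpha\beta}
\end{aligned} \tag{11.146} \]
Here $L_\ell$ and $L_n$ are the Lie derivatives. Thus the inaffinities encode the information
about the longitudinal projections of the derivatives $\nabla_\alpha\ell_\beta$ and $\nabla_\alpha n_\beta$, or of the Lie
derivatives $L_\ell g_{\alpha\beta}$ and $L_n g_{\alpha\beta}$.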
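Other useful information is contained in the transverse (i.e. parallel to the sphere)
projections of these objects. To define them, note that, as in section 11.3, associated
with a choice of $\ell^\alpha$ and $n^\alpha$ we have the decomposition of the metric
\[ g_{\alpha\beta} = s_{\alpha\beta} - (\ell_\alpha n_\beta + n_\alpha\ell_\beta) \tag{11.147} \]
with $s_{\alpha\beta}$ the transverse spatial metric (on the sphere),
\[ s_{\alpha\beta}\,dx^\alpha dx^\beta = r(z)^2 d\Omega^2 , \tag{11.148} \]
but that in the current context this decomposition and the corresponding projectors $s^\alpha{}_\beta$
are now unique as the combination $\ell_\alpha n_\beta$ is boost-invariant.
The expansions of $\ell^\alpha$ and $n^\alpha$ are defined as the transverse spatial projections of the
divergence of $\ell^\alpha$ respectively $n^\alpha$, i.e.
\[ \theta_\ell = s^{\alpha\beta}\nabla_\alpha\ell_\beta , \qquad \theta_n = s^{\alpha\beta}\nabla_\alpha n_\beta . \tag{11.149} \]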
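As in (11.101) and (11.102) of section 11.4, these can be written as
\[ \begin{aligned}
\theta_\ell &= \tfrac12 s^{\alpha\beta}L_\ell s_{\alpha\beta} = \frac{1}{\sqrt{s}}L_\ell\sqrt{s} \\
\theta_n &= \tfrac12 s^{\alpha\beta}L_n s_{\alpha\beta} = \frac{1}{\sqrt{s}}L_n\sqrt{s}
\end{aligned} \tag{11.150} \]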
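With $\sqrt{s} = r(z)^2\sin\theta$, one finds more explicitly
\[ \theta_\ell = \frac{2}{r}\,\ell^a\partial_a r , \qquad \theta_n = \frac{2}{r}\,n^a\partial_a r . \tag{11.151} \]
If one works with $r$ as one of the coordinates, then this can also succinctly be written
as
\[ \theta_\ell = \frac{2}{r}\,\ell^r , \qquad \theta_n = \frac{2}{r}\,n^r . \tag{11.152} \]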
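As in (11.126) of section 11.5, we also have the relations
\[ \nabla_\alpha\ell^\alpha = \theta_\ell + \kappa_\ell , \qquad \nabla_\alpha n^\alpha = \theta_n + \kappa_n . \tag{11.153} \]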
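To make the formula $\theta = (2/r) \times (r\text{-component})$ concrete, here is a small illustrative sketch. The choice of example is an assumption made here and not taken from this section: the Schwarzschild metric in ingoing Eddington-Finkelstein coordinates, $ds^2 = -f\,dv^2 + 2\,dv\,dr + r^2 d\Omega^2$ with $f = 1 - 2m/r$, together with the radial null pair $\ell = \partial_v + \tfrac12 f\,\partial_r$ and $n = -\partial_r$. The function names are hypothetical.

```python
def f(r, m):
    """Schwarzschild function f(r) = 1 - 2m/r (units with G = c = 1)."""
    return 1.0 - 2.0 * m / r


def null_pair(r, m):
    """(v, r) components of the radial null vectors l (outgoing) and n (ingoing)."""
    l = (1.0, 0.5 * f(r, m))
    n = (0.0, -1.0)
    return l, n


def dot(a, b, r, m):
    """Scalar product in the 2d metric g_vv = -f, g_vr = g_rv = 1, g_rr = 0."""
    g = ((-f(r, m), 1.0), (1.0, 0.0))
    return sum(g[i][j] * a[i] * b[j] for i in range(2) for j in range(2))


def expansions(r, m=1.0):
    """Return (theta_l, theta_n) using theta = (2/r) * (r-component of the vector)."""
    l, n = null_pair(r, m)
    return 2.0 * l[1] / r, 2.0 * n[1] / r


if __name__ == "__main__":
    m = 1.0
    for r in (1.0, 2.0, 4.0):
        l, n = null_pair(r, m)
        # The three products should be 0, 0 and -1 (null and cross normalised).
        print(r, dot(l, l, r, m), dot(n, n, r, m), dot(l, n, r, m), expansions(r, m))
```

In this example the ingoing expansion is negative everywhere, while the outgoing expansion $f/r$ is positive for $r > 2m$, vanishes at $r = 2m$ and is negative for $r < 2m$, where both expansions are negative.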
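Turning now to the Raychaudhuri equation for a spherically symmetric radial null
congruence $\ell^\alpha$, the general result (11.129) (for $\kappa_\ell \neq 0$), i.e.
\[ \frac{d}{d\lambda}\theta_\ell = \kappa_\ell\theta_\ell - R_{\alpha\beta}\ell^\alpha\ell^\beta - \tfrac12\theta_\ell^2 - \sigma^{\alpha\beta}\sigma_{\alpha\beta} + \omega^{\alpha\beta}\omega_{\alpha\beta} \tag{11.154} \]
simplifies considerably. Spherical symmetry implies that the spatial shear and rotation
tensors are zero (a spatial rotationally invariant 2-tensor is proportional to $\delta_{ik}$ which
has neither a traceless nor an anti-symmetric part),
\[ \sigma_{\alpha\beta} = \omega_{\alpha\beta} = 0 . \tag{11.155} \]
The vanishing of the rotation can also be deduced from the fact that $\ell^\alpha$ is hypersurface
orthogonal (specifically orthogonal to the family of null hypersurfaces generated by $\ell^\alpha$).
Thus the Raychaudhuri equation reduces to
\[ \frac{d}{d\lambda}\theta_\ell = \kappa_\ell\theta_\ell - R_{\alpha\beta}\ell^\alpha\ell^\beta - \tfrac12\theta_\ell^2 . \tag{11.156} \]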
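The reduced equation can be sanity-checked numerically in a simple case. The sketch below assumes an example that is not taken from this section: the (Ricci-flat) Schwarzschild metric in ingoing Eddington-Finkelstein coordinates, $ds^2 = -f\,dv^2 + 2\,dv\,dr + r^2 d\Omega^2$ with $f = 1 - 2m/r$, and the outgoing radial null field $\ell = \partial_v + \tfrac12 f\,\partial_r$. For this field a short calculation gives $\theta_\ell = f/r$ and $\kappa_\ell = f'/2 = m/r^2$. These values, and all function names, are worked out for the illustration only.

```python
def f(r, m):
    """Schwarzschild function f(r) = 1 - 2m/r (units with G = c = 1)."""
    return 1.0 - 2.0 * m / r


def theta_l(r, m):
    """Expansion of l = d_v + (f/2) d_r, i.e. (2/r) times its r-component."""
    return f(r, m) / r


def kappa_l(r, m):
    """Inaffinity of l for this example, f'(r)/2 = m / r**2."""
    return m / r ** 2


def lhs(r, m, h=1e-5):
    """d(theta)/d(lambda) = l^r * d(theta)/dr, with a central difference in r."""
    dtheta_dr = (theta_l(r + h, m) - theta_l(r - h, m)) / (2.0 * h)
    return 0.5 * f(r, m) * dtheta_dr


def rhs(r, m):
    """kappa*theta - theta**2/2; the Ricci term vanishes in vacuum."""
    th = theta_l(r, m)
    return kappa_l(r, m) * th - 0.5 * th * th


if __name__ == "__main__":
    m = 1.0
    for r in (1.5, 3.0, 10.0):
        print(r, lhs(r, m), rhs(r, m))
```

Both sides agree up to the finite-difference error, inside as well as outside $r = 2m$. Note that without the $\kappa_\ell\theta_\ell$ term the two sides would not match for this non-affinely parametrised field.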
12 Curvature IV: Curvature and Killing Vectors
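\[ (\nabla_\mu\nabla_\nu - \nabla_\nu\nabla_\mu)V_\lambda = R_{\lambda\rho\mu\nu}V^\rho \tag{12.1} \]
and its cyclic symmetry (7.17), it is possible to deduce that for a Killing vector $K^\mu$,
\[ \nabla_\mu K_\nu + \nabla_\nu K_\mu = 0 , \tag{12.2} \]
one has the following basic identity relating Killing vectors and the curvature tensor,
\[ \nabla_\lambda\nabla_\mu K_\nu = R^\rho{}_{\lambda\mu\nu}K_\rho . \tag{12.3} \]
Indeed, proceeding as in the proof of the cyclic permutation identity (7.17), we deduce
that
\[ \nabla_{[\lambda}\nabla_\mu K_{\nu]} \propto R^\rho{}_{[\lambda\mu\nu]}K_\rho = 0 . \tag{12.4} \]
Because $\nabla_\mu K_\nu$ is anti-symmetric by (12.2), this is equivalent to
\[ \nabla_\lambda\nabla_\mu K_\nu + \nabla_\mu\nabla_\nu K_\lambda + \nabla_\nu\nabla_\lambda K_\mu = 0 . \tag{12.5} \]
Using the Killing property in the second term, we can write this as
\[ \nabla_\nu\nabla_\lambda K_\mu = [\nabla_\mu , \nabla_\lambda]K_\nu = R^\rho{}_{\nu\lambda\mu}K_\rho \tag{12.6} \]
which is (12.3).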
This identity can be interpreted as the statement (and can alternatively be derived from
the fact) that the Lie derivative of the Christoffel symbols of a metric along a Killing
vector of the metric is zero.
Indeed, first of all it is easy to see that under a general variation of the metric, the
induced variation of the Christoffel symbol can be written as (19.14)
\[ \delta\Gamma^\mu{}_{\nu\lambda} = \tfrac12 g^{\mu\rho}(\nabla_\nu\delta g_{\rho\lambda} + \nabla_\lambda\delta g_{\rho\nu} - \nabla_\rho\delta g_{\nu\lambda}) . \tag{12.7} \]
(this is easy to derive and also easy to remember as it takes exactly the same form as
the definition of the Christoffel symbol, only with the metric replaced by the metric
variation and the partial derivatives by covariant derivatives - see section 19.2 for a
derivation and discussion of this identity). In particular, this exhibits the fact that the
metric variation of the Christoffel symbols is a tensor (as could have been anticipated
from the fact that the non-tensorial term in the transformation of the Christoffel symbols
is independent of the metric), and additionally provides us with an explicit expression
for this tensor.
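Next, if the variation $\delta g_{\mu\nu} = L_\xi g_{\mu\nu}$ is the Lie derivative, i.e. the variation in the metric
induced by an infinitesimal coordinate transformation generated by $\xi^\mu$, one can write this as
\[ L_\xi\Gamma^\mu{}_{\nu\lambda} = \tfrac12 g^{\mu\rho}(\nabla_\nu L_\xi g_{\rho\lambda} + \nabla_\lambda L_\xi g_{\rho\nu} - \nabla_\rho L_\xi g_{\nu\lambda}) . \tag{12.8} \]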
Note that in general the Lie derivative of a non-tensorial quantity is not well defined
(or at least its definition requires a bit more thought). Here, however, it is natural to
use the general formula (12.7) for the variation of the Christoffel symbols under metric
variations to in particular define their Lie derivative (as the change in the Christoffel
symbols induced by the Lie derivative of the metric).
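Thus, adopting this definition and using $L_\xi g_{\mu\nu} = \nabla_\mu\xi_\nu + \nabla_\nu\xi_\mu$, the right-hand side can
(upon using the definition and cyclic symmetry of the Riemann tensor) be written as
\[ \begin{aligned}
L_\xi\Gamma^\mu{}_{\nu\lambda} &= \nabla_\nu\nabla_\lambda\xi^\mu - R^\mu{}_{\lambda\nu\rho}\xi^\rho \\
&= \nabla_\nu\nabla_\lambda\xi^\mu + R^\mu{}_{\lambda\rho\nu}\xi^\rho .
\end{aligned} \tag{12.9} \]
\[ L_K g_{\mu\nu} = 0 \quad\Rightarrow\quad L_K\Gamma^\mu{}_{\nu\lambda} = 0 \quad\Leftrightarrow\quad \nabla_\lambda\nabla_\mu K_\nu = R^\rho{}_{\lambda\mu\nu}K_\rho , \tag{12.10} \]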
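Contracting (12.3) over $\lambda$ and $\mu$, one obtains the next useful and frequently used identity
\[ \nabla^\mu\nabla_\mu K_\nu = -K^\mu R_{\mu\nu} . \tag{12.11} \]
\[ R_{\mu\nu}K^\mu K^\nu = (\nabla_\mu K_\nu)(\nabla^\mu K^\nu) + \nabla_\mu(K^\nu\nabla_\nu K^\mu) . \tag{12.12} \]
Note that this can also be deduced directly from (7.50) for $V^\mu = K^\mu$ a Killing vector.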
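One way to see how (12.12) follows from (12.11) is the following short computation. It is a sketch under the assumption that (12.11) reads $\nabla^\mu\nabla_\mu K_\nu = -R_{\nu\rho}K^\rho$ and that the Killing equation is $\nabla_\mu K_\nu = -\nabla_\nu K_\mu$.

```latex
\[
\begin{aligned}
R_{\mu\nu}K^\mu K^\nu
  &= -K^\nu\,\nabla^\mu\nabla_\mu K_\nu
  && \text{by (12.11)} \\
  &= (\nabla^\mu K^\nu)(\nabla_\mu K_\nu) - \nabla^\mu(K^\nu\nabla_\mu K_\nu)
  && \text{product rule} \\
  &= (\nabla_\mu K_\nu)(\nabla^\mu K^\nu) + \nabla_\mu(K^\nu\nabla_\nu K^\mu)
  && \text{Killing equation}
\end{aligned}
\]
```

The last term is a total divergence, which is what makes the identity useful under an integral.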
We will now look at various consequences of the identities (12.3), (12.11) and (12.12)
which are useful and interesting in their own right. The implications of these identities
for maximal symmetry and maximally symmetric spaces will be discussed separately in
section 13 below.
12.2 Killing Vectors form a Lie algebra
As the first application, we will explicitly prove the assertion (8.43) of section 8.5 that
the Lie bracket of two Killing vectors is again a Killing vector. While this follows from
the general property (8.32) of the Lie derivative, which itself can (with some work)
be deduced from the general definition of the Lie derivative (as the generator of the
action of coordinate transformations on tensors), it is instructive and reassuring to
verify this by an explicit calculation, also because similar manipulations are required
when extending the analysis from Killing vectors to Killing tensors or Killing-Yano
tensors briefly mentioned in section 9.5 (see footnote 29).
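Thus consider two Killing vectors $A^\mu$ and $B^\mu$, say, i.e. vector fields satisfying
\[ \nabla_\mu A_\nu + \nabla_\nu A_\mu = \nabla_\mu B_\nu + \nabla_\nu B_\mu = 0 \tag{12.13} \]
or, equivalently,
\[ \nabla_\mu A_\nu = \nabla_{[\mu}A_{\nu]} , \qquad \nabla_\mu B_\nu = \nabla_{[\mu}B_{\nu]} . \tag{12.14} \]
\[ C^\mu = [A,B]^\mu = A^\nu\nabla_\nu B^\mu - B^\nu\nabla_\nu A^\mu , \tag{12.15} \]
\[ C = [A,B] \quad\Rightarrow\quad \nabla_\mu C_\nu = \nabla_{[\mu}C_{\nu]} . \tag{12.16} \]
\[ \begin{aligned}
\nabla_\mu C_\nu &= (\nabla_\mu A^\lambda)\nabla_\lambda B_\nu - (\nabla_\mu B^\lambda)\nabla_\lambda A_\nu + A^\lambda\nabla_\mu\nabla_\lambda B_\nu - B^\lambda\nabla_\mu\nabla_\lambda A_\nu \\
&= -(\nabla_\mu A_\lambda)\nabla_\nu B^\lambda + (\nabla_\mu B_\lambda)\nabla_\nu A^\lambda + R^\rho{}_{\mu\lambda\nu}(A^\lambda B_\rho - B^\lambda A_\rho) .
\end{aligned} \tag{12.17} \]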
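The first two terms are already manifestly anti-symmetric (the second being the
anti-symmetrisation of the first), and by the cyclic identity and other symmetries of the
Riemann tensor, so is the last term,
\[ R_{\rho\mu\lambda\nu} - R_{\lambda\mu\rho\nu} = R_{\rho\mu\lambda\nu} - R_{\rho\nu\lambda\mu} = -R_{\rho\lambda\nu\mu} = R_{\rho\lambda\mu\nu} . \tag{12.18} \]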
Thus the Lie bracket of two Killing vectors is indeed again a Killing vector, as claimed.
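The statement can be checked numerically in the simplest possible setting. The sketch below is an illustration chosen here, not an example from the text: flat $\mathbb{R}^3$ in Cartesian coordinates, where covariant derivatives reduce to partial derivatives and index positions do not matter, with the three rotation generators as Killing vectors. All function names are hypothetical, and derivatives are approximated by central differences.

```python
def rot_x(p):
    x, y, z = p
    return (0.0, -z, y)


def rot_y(p):
    x, y, z = p
    return (z, 0.0, -x)


def rot_z(p):
    x, y, z = p
    return (-y, x, 0.0)


def partial(V, p, mu, h=1e-3):
    """Central-difference estimate of d_mu V^nu at p, returned as a tuple over nu."""
    up, dn = list(p), list(p)
    up[mu] += h
    dn[mu] -= h
    return tuple((a - b) / (2.0 * h) for a, b in zip(V(tuple(up)), V(tuple(dn))))


def bracket(A, B):
    """Lie bracket C^mu = A^nu d_nu B^mu - B^nu d_nu A^mu, returned as a vector field."""
    def C(p):
        a, b = A(p), B(p)
        dA = [partial(A, p, nu) for nu in range(3)]  # dA[nu][mu] = d_nu A^mu
        dB = [partial(B, p, nu) for nu in range(3)]
        return tuple(
            sum(a[nu] * dB[nu][mu] - b[nu] * dA[nu][mu] for nu in range(3))
            for mu in range(3)
        )
    return C


def killing_defect(V, p):
    """Largest component of d_mu V_nu + d_nu V_mu at p (zero for a Killing vector)."""
    d = [partial(V, p, mu) for mu in range(3)]
    return max(abs(d[mu][nu] + d[nu][mu]) for mu in range(3) for nu in range(3))


if __name__ == "__main__":
    p = (0.3, -1.2, 2.0)
    C = bracket(rot_x, rot_y)
    print("bracket:", C(p), " rot_z:", rot_z(p))
    print("Killing defect of the bracket:", killing_defect(C, p))
```

With the sign conventions chosen for the generators above, the bracket of `rot_x` and `rot_y` comes out as minus `rot_z`, and its Killing defect vanishes up to rounding. A field that is not Killing, such as the dilation $(x, y, z)$, gives a non-zero defect.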
29 The interesting question if or when Killing-Yano tensors form a Lie algebra, extending and
generalising the Lie algebra of the isometry group generated by the Killing vectors, is analysed in D. Kastor,
S. Ray, J. Traschen, Do Killing-Yano tensors form a Lie Algebra?, arXiv:0705.0535 [hep-th].
12.3 On the Isometry Algebra of a Compact Riemannian Space
In this section we will look at one immediate application of the identity (12.12),
\[ R_{\mu\nu}K^\mu K^\nu = (\nabla_\mu K_\nu)(\nabla^\mu K^\nu) + \nabla_\mu(K^\nu\nabla_\nu K^\mu) , \tag{12.19} \]
namely an analogue of the Bochner-Yano type argument (given in remark 8 of section
7.5) for Killing vectors. Again, in order to be able to say something of substance we
assume that the space we are dealing with is compact without boundary, and Riemannian,
i.e. equipped with a positive-definite metric. In spite of this, the result we will derive
is relevant also for physics, at least as long as one is willing to entertain the possibility
that some higher-dimensional generalisations of general relativity (such as Kaluza-Klein
theories discussed in section 43) play a role in some more fundamental description of
nature.
With the above assumptions, the first term on the right-hand side of (12.19) is
non-negative and the second is a total derivative term that vanishes upon integration.
Therefore for a Killing vector $V^\mu$ to exist on a compact Riemannian space, the integral
of $R_{\mu\nu}V^\mu V^\nu$ must be non-negative as well.
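Written out, the integrated form of the identity is the following. This is a sketch in which $M$ denotes the compact Riemannian space, $n$ its dimension and $K^\mu$ the Killing vector. These symbols are chosen here for illustration.

```latex
\[
\int_M \sqrt{g}\, d^n x \; R_{\mu\nu}K^\mu K^\nu
  = \int_M \sqrt{g}\, d^n x \; (\nabla_\mu K_\nu)(\nabla^\mu K^\nu) \;\geq\; 0 ,
\]
\[
R_{\mu\nu} = 0 \quad\Rightarrow\quad \nabla_\mu K_\nu = 0 .
\]
```

The second line follows because, for a positive-definite metric, the integrand on the right-hand side is a sum of squares: if its integral vanishes, it vanishes pointwise, so every Killing vector of a compact Ricci-flat Riemannian space is covariantly constant.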
Since the Lie bracket of two covariantly constant vector fields is zero,
\[ \nabla_\mu V^\nu = \nabla_\mu W^\nu = 0 \quad\Rightarrow\quad [V,W]^\mu = V^\nu\nabla_\nu W^\mu - W^\nu\nabla_\nu V^\mu = 0 , \tag{12.20} \]
this means that continuous isometries of a space with vanishing Ricci tensor can at
most be Abelian. An example is provided by the torus $T^n$ equipped with the flat metric
it inherits from regarding $T^n$ as the periodic identification of $\mathbb{R}^n$. This metric has
vanishing Ricci tensor (because evidently even the Riemann tensor is zero), but there
are $n$ linearly-independent (covariantly) constant translational Killing vectors (inherited
from $\mathbb{R}^n$) that generate the Abelian isometry group $U(1)^n$.
In Kaluza-Klein theory, one of the basic ideas is that gauge symmetries arise from
isometries of the internal space living in the extra dimensions. This internal space
is usually assumed to be compact (so as to be sufficiently small to have escaped our
attention). Thus, if one wants to generate non-Abelian gauge theories in this way
the above results provide one of the most basic constraints on the intern