100% found this document useful (3 votes)

826 views380 pages

Probability and Random Variables

Uploaded by

Imtiaz Ahmed

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

100% found this document useful (3 votes)

826 views380 pages

Probability and Random Variables

Uploaded by

Imtiaz Ahmed

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

Probability and

Random Variables

A Beginner's Guide

This is a simple and concise introduction to probability theory. Self-contained

and readily accessible, it is written in an informal tutorial style with a
humorous undertone. Concepts and techniques are dened and developed as
necessary. After an elementary discussion of chance, the central and crucial
rules and ideas of probability, including independence and conditioning, are
set out. Counting, combinatorics, and the ideas of probability distributions
and densities are then introduced. Later chapters present random variables
and examine independence, conditioning, covariance, and functions of ran-
dom variables, both discrete and continuous. The nal chapter considers
generating functions and applies this concept to practical problems including
branching processes, random walks, and the central limit theorem. Examples,
demonstrations, and exercises are used throughout to explore the ways in
which probability is motivated by, and applied to, real-life problems in
science, medicine, gaming, and other subjects of interest. Essential proofs of
important results are included.

Since it assumes minimal prior technical knowledge on the part of the reader,
this book is suitable for students taking introductory courses in probability
and will provide a solid foundation for more advanced courses in probability
and statistics. It would also be a valuable reference for those needing a
working knowledge of probability theory and will appeal to anyone interested
in this endlessly fascinating and entertaining subject.
Probability and
Random Variables
A Beginner's Guide

David Stirzaker
University of Oxford
PUBLISHED BY CAMBRIDGE UNIVERSITY PRESS (VIRTUAL PUBLISHING)
FOR AND ON BEHALF OF THE PRESS SYNDICATE OF THE UNIVERSITY OF CAMBRIDGE
The Pitt Building, Trumpington Street, Cambridge CB2 IRP
40 West 20th Street, New York, NY 10011-4211, USA
477 Williamstown Road, Port Melbourne, VIC 3207, Australia

[Link]

Cambridge university Press 1999

This edition Cambridge University Press (Virtual Publishing) 2003

First published in printed format 1999

A catalogue record for the original printed book is available

from the British Library and from the Library of Congress
Original ISBN 0 521 64297 3 hardback
Original ISBN 0 521 64445 3 paperback

ISBN 0 511 02258 1 virtual ([Link] Edition)

Contents

Synopsis viii
Preface xi

1 Introduction 1
1.1 Preview 1
1.2 Probability 1
1.3 The scope of probability 3
1.4 Basic ideas: the classical case 5
1.5 Basic ideas: the general case 10
1.6 Modelling 14
1.7 Mathematical modelling 19
1.8 Modelling probability 21
1.9 Review 22
1.10 Appendix I. Some randomly selected denitions of probability, in random
order 22
1.11 Appendix II. Review of sets and functions 24
1.12 Problems 27

A Probability

2 The rules of probability 31

2.1 Preview 31
2.2 Notation and experiments 31
2.3 Events 34
2.4 Probability; elementary calculations 37
2.5 The addition rules 41
2.6 Simple consequences 44
2.7 Conditional probability; multiplication rule 47
2.8 The partition rule and Bayes' rule 54
2.9 Independence and the product rule 58
2.10 Trees and graphs 66
2.11 Worked examples 72
2.12 Odds 78

v
vi Contents

2.13 Popular paradoxes 81

2.14 Review: notation and rules 86
2.15 Appendix. Difference equations 88
2.16 Problems 89

3 Counting and gambling 93

3.1 Preview 93
3.2 First principles 93
3.3 Arranging and choosing 97
3.4 Binomial coefcients and Pascal's triangle 101
3.5 Choice and chance 104
3.6 Applications to lotteries 109
3.7 The problem of the points 113
3.8 The gambler's ruin problem 116
3.9 Some classic problems 118
3.10 Stirling's formula 121
3.11 Review 123
3.12 Appendix. Series and sums 124
3.13 Problems 126

4 Distributions: trials, samples, and approximation 129

4.1 Preview 129
4.2 Introduction; simple examples 129
4.3 Waiting; geometric distributions 136
4.4 The binomial distribution and some relatives 139
4.5 Sampling 144
4.6 Location and dispersion 147
4.7 Approximations: a rst look 154
4.8 Sparse sampling; the Poisson distribution 156
4.9 Continuous approximations 158
4.10 Binomial distributions and the normal approximation 163
4.11 Density 169
4.12 Distributions in the plane 172
4.13 Review 174
4.14 Appendix. Calculus 176
4.15 Appendix. Sketch proof of the normal limit theorem 178
4.16 Problems 180

B Random Variables

5 Random variables and their distributions 189

5.1 Preview 189
5.2 Introduction to random variables 189
5.3 Discrete random variables 194
Contents vii

5.4 Continuous random variables; density 198

5.5 Functions of a continuous random variable 204
5.6 Expectation 207
5.7 Functions and moments 212
5.8 Conditional distributions 218
5.9 Conditional density 225
5.10 Review 229
5.11 Appendix. Double integrals 232
5.12 Problems 233

6 Jointly distributed random variables 238

6.1 Preview 238
6.2 Joint distributions 238
6.3 Joint density 245
6.4 Independence 250
6.5 Functions 254
6.6 Sums of random variables 260
6.7 Expectation; the method of indicators 267
6.8 Independence and covariance 273
6.9 Conditioning and dependence, discrete case 280
6.10 Conditioning and dependence, continuous case 286
6.11 Applications of conditional expectation 291
6.12 Bivariate normal density 294
6.13 Change-of-variables technique; order statistics 298
6.14 Review 301
6.15 Problems 302

7 Generating functions 309

7.1 Preview 309
7.2 Introduction 309
7.3 Examples of generating functions 312
7.4 Applications of generating functions 315
7.5 Random sums and branching processes 319
7.6 Central limit theorem 323
7.7 Random walks and other diversions 324
7.8 Review 329
7.9 Appendix. Tables of generating functions 329
7.10 Problems 330

Hints and solutions for selected exercises and problems 336

Index 365
Synopsis

This is a simple and concise introduction to probability and the theory of

probability. It considers some of the ways in which probability is motivated
by, and applied to, real-life problems in science, medicine, gaming, and other
subjects of interest. Probability is inescapably mathematical in character but,
as bets a rst course, the book assumes minimal prior technical knowledge
on the part of the reader. Concepts and techniques are dened and developed
as necessary, making the book as accessible and self-contained as possible.
The text adopts an informal tutorial style, with emphasis on examples,
demonstrations, and exercises. Nevertheless, to ensure that the book is
appropriate for use as a textbook, essential proofs of important results are
included. It is therefore well suited to accompany the usual introductory
lecture courses in probability. It is intended to be useful to those who need a
working knowledge of the subject in any one of the many elds of application.
In addition it will provide a solid foundation for those who continue on to
more advanced courses in probability, statistics, and other developments.
Finally, it is hoped that the more general reader will nd this book useful in
exploring the endlessly fascinating and entertaining subject of probability.

viii
On this occasion, I must take notice to such of my readers as are well versed
in Vulgar Arithmetic, that it would not be difcult for them to make
themselves Masters, not only of all the practical Rules in this book, but also
of more useful Discoveries, if they would take the small Pains of being
acquainted with the bare Notation of Algebra, which might be done in the
hundredth part of the Time that is spent in learning to write Short-hand.

A. de Moivre, The Doctrine of Chances, 1717

Preface

This book begins with an introduction, chapter 1, to the basic ideas and methods of
probability that are usually covered in a rst course of lectures. The rst part of the main
text, subtitled Probability, comprising chapters 24, introduces the important ideas of
probability in a reasonably informal and non-technical way. In particular, calculus is not a
prerequisite.
The second part of the main text, subtitled Random Variables, comprising the nal
three chapters, extends these ideas to a wider range of important and practical applica-
tions. In these chapters it is assumed that the student has had some exposure to the small
portfolio of ideas introduced in courses labelled `calculus'. In any case, to be on the safe
side and make the book as self-contained as possible, brief expositions of the necessary
results are included at the ends of appropriate chapters.
The material is arranged as follows.
Chapter 1 contains an elementary discussion of what we mean by probability, and how
our intuitive knowledge of chance will shape a mathematical theory.
Chapter 2 introduces some notation, and sets out the central and crucial rules and ideas
of probability. These include independence and conditioning.
Chapter 3 begins with a brief primer on counting and combinatorics, including
binomial coefcients. This is illustrated with examples from the origins of probability,
including famous classics such as the gambler's ruin problem, and others.
Chapter 4 introduces the idea of a probability distribution. At this elementary level the
idea of a probability density, and ways of using it, are most easily grasped by analogy
with the discrete case. The chapter therefore includes the uniform, normal, and exponen-
tial densities, as well as the binomial, geometric, and Poisson distributions. We also
discuss the idea of mean and variance.
Chapter 5 introduces the idea of a random variable; we discuss discrete random
variables and those with a density. We look at functions of random variables, and at
conditional distributions, together with their expected values.
Chapter 6 extends these ideas to several random variables, and explores all the above
concepts in this setting. In particular, we look at independence, conditioning, covariance,
and functions of several random variables (including sums). As in chapter 5 we treat
continuous and discrete random variables together, so that students can learn by the use
of analogy (a very powerful learning aid).
Chapter 7 introduces the ideas and techniques of generating functions, in particular
probability generating functions and moment generating functions. This ingenious and

xi
xii Preface

elegant concept is applied to a variety of practical problems, including branching

processes, random walks, and the central limit theorem.
In general the development of the subject is guided and illustrated by as many
examples as could be packed into the text. Nevertheless, I have not shrunk from including
proofs wherever they are important, or informative, or entertaining.
Naturally, some parts of the book are easier than others, and I would offer readers the
advice, which is very far from original, that if they come to a passage that seems too
difcult, then they should skip it, and return to it later. In many cases the difculty will be
found to have evaporated.
In general it is much easier and more pleasant to get to grips with a subject if you
believe it to be of interest in its own right, rather than just a handy tool. I have therefore
included a good deal of background material and illustrative examples to convince the
reader that probability is one of the most entertaining and endlessly fascinating branches
of mathematics. Furthermore, even in a long lecture course the time that can be devoted
to examples and detailed explanations is necessarily limited. I have therefore endeavoured
to ensure that the book can be read with a minimum of additional guidance.
Moreover, prerequisites have been kept to a minimum, and mathematical complexities
have been rigorously excluded. You do need common sense, practical arithmetic, and
some bits of elementary algebra. These are included in the core syllabus of all school
mathematics courses.
Readers are strongly encouraged to attempt a respectable fraction of the exercises and
problems. Tackling relevant problems (even when the attempt is not completely success-
ful) always helps you to understand the concepts. In general, the exercises provide routine
and transparent applications of ideas in the nearby text. Problems are often less routine;
they may use ideas from further aeld, and may put them in a new setting. Solutions and
hints for most of the exercises and problems appear before the Index.
While all the exercises and problems have been kept as simple and straightforward as
possible, it is inescapable that some may seem harder than others. I have resisted the
temptation to magnify any slight difculty by advertising it with an asterisk or equivalent
decoration. You are at liberty to nd any exercise easy, irrespective of any difculties I
may have anticipated.
It is certainly difcult to exclude every error from the text. I entreat readers to inform
me of all those they discover.
Finally, you should note that the ends of examples, denitions, and proofs are denoted
by the symbols s, n, and h respectively.

Oxford
January 1999
1
Introduction

I shot an arrow into the air

It fell to earth, I knew not where.
H.W. Longfellow

O! many a shaft at random sent

Finds mark the archer little meant.
W. Scott

1.1 PREVIEW
This chapter introduces probability as a measure of likelihood, which can be placed on a
numerical scale running from 0 to 1. Examples are given to show the range and scope of
problems that need probability to describe them. We examine some simple interpretations
of probability that are important in its development, and we briey show how the well-
known principles of mathematical modelling enable us to progress. Note that in this
chapter exercises and problems are chosen to motivate interest and discussion; they are
therefore non-technical, and mathematical answers are not expected.

Prerequisites. This chapter contains next to no mathematics, so there are no

prerequisites. Impatient readers keen to get to an equation could proceed directly to
chapter 2.

1 . 2 P RO BA B I L I T Y
We all know what light is, but it is not easy to tell what it is.
Samuel Johnson
From the moment we rst roll a die in a children's board game, or pick a card (any card),
we start to learn what probability is. But even as adults, it is not easy to tell what it is, in
the general way.

1
2 1 Introduction

For mathematicians things are simpler, at least to begin with. We have the following:

Probability is a number between zero and one, inclusive.

This may seem a trie arbitrary and abrupt, but there are many excellent and plausible
reasons for this convention, as we shall show. Consider the following eventualities.
(i) You run a mile in less than 10 seconds.
(ii) You roll two ordinary dice and they show a double six.
(iii) You ip an ordinary coin and it shows heads.
(iv) Your weight is less than 10 tons.
If you think about the relative likelihood (or chance or probability) of these eventualities,
you will surely agree that we can compare them as follows.
The chance of running a mile in 10 seconds is less than the chance of a double six,
which in turn is less than the chance of a head, which in turn is less than the chance of
your weighing under 10 tons. We may write
chance of 10 second mile , chance of a double six
, chance of a head
, chance of weighing under 10 tons.
(Obviously it is assumed that you are reading this on the planet Earth, not on some
asteroid, or Jupiter, that you are human, and that the dice are not crooked.)
It is easy to see that we can very often compare probabilities in this way, and so it is
natural to represent them on a numerical scale, just as we do with weights, temperatures,
earthquakes, and many other natural phenomena. Essentially, this is what numbers are
for.
Of course, the two extreme eventualities are special cases. It is quite certain that you
weigh less than 10 tons; nothing could be more certain. If we represent certainty by unity,
then no probabilities exceed this. Likewise it is quite impossible for you to run a mile in
10 seconds or less; nothing could be less likely. If we represent impossibility by zero,
then no probability can be less than this. Thus we can, if we wish, present this on a scale,
as shown in gure 1.1.
The idea is that any chance eventuality can be represented by a point somewhere on
this scale. Everything that is impossible is placed at zero that the moon is made of

0 1

chance that a
impossible coin shows heads certain

chance that two

dice yield double six
Figure 1.1. A probability scale.
1.3 The scope of probability 3

cheese, formation ying by pigs, and so on. Everything that is certain is placed at unity
the moon is not made of cheese, Socrates is mortal, and so forth. Everything else is
somewhere in [0, 1], i.e. in the interval between 0 and 1, the more likely things being
closer to 1 and the more unlikely things being closer to 0.
Of course, if two things have the same chance of happening, then they are at the same
point on the scale. That is what we mean by `equally likely'. And in everyday discourse
everyone, including mathematicians, has used and will use words such as very likely,
likely, improbable, and so on. However, any detailed or precise look at probability
requires the use of the numerical scale. To see this, you should ponder on just how you
would describe a chance that is more than very likely, but less than very very likely.
This still leaves some questions to be answered. For example, the choice of 0 and 1 as
the ends of the scale may appear arbitrary, and, in particular, we have not said exactly
which numbers represent the chance of a double six, or the chance of a head. We have not
even justied the claim that a head is more likely than double six. We discuss all this later
in the chapter; it will turn out that if we regard probability as an extension of the idea of
proportion, then we can indeed place many probabilities accurately and condently on
this scale.
We conclude with an important point, namely that the chance of a head (or a double
six) is just a chance. The whole point of probability is to discuss uncertain eventualities
before they occur. After this event, things are completely different. As the simplest
illustration of this, note that even though we agree that if we ip a coin and roll two dice
then the chance of a head is greater than the chance of a double six, nevertheless it may
turn out that the coin shows a tail when the dice show a double six. Likewise, when the
weather forecast gives a 90% chance of rain, or even a 99% chance, it may in fact not
rain. The chance of a slip on the San Andreas fault this week is very small indeed,
nevertheless it may occur today. The antibiotic is overwhelmingly likely to cure your
illness, but it may not; and so on.

Exercises for section 1.2

1. Formulate your own denition of probability. Having done so, compare and contrast it with
those in appendix I of this chapter.
2. (a) Suppose you ip a coin; there are two possible outcomes, head or tail. Do you agree that
the probability of a head is 12? If so, explain why.
(b) Suppose you take a test; there are two possible outcomes, pass or fail. Do you agree that
the probability of a pass is 12? If not, explain why not.
3. In the above discussion we claimed that it was intuitively reasonable to say that you are more
likely to get a head when ipping a coin than a double six when rolling two dice. Do you agree?
If so, explain why.

1 . 3 T H E S C O P E O F P RO BA B I L I T Y
. . . nothing between humans is 1 to 3. In fact, I long ago come to the conclusion
that all life is 6 to 5 against.
Damon Runyon, A Nice Price
4 1 Introduction

Life is a gamble at terrible odds; if it was a bet you wouldn't take it.
Tom Stoppard, Rosencrantz and Guildenstern are Dead, Faber and Faber
In the next few sections we are going to spend a lot of time ipping coins, rolling dice,
and buying lottery tickets. There are very good reasons for this narrow focus (to begin
with), as we shall see, but it is important to stress that probability is of great use and
importance in many other circumstances. For example, today seems to be a fairly typical
day, and the newspapers contain articles on the following topics (in random order).

1. How are the chances of a child's suffering a genetic disorder affected by a grand-
parent's having this disorder? And what difference does the sex of child or ancestor
make?
2. Does the latest opinion poll reveal the true state of affairs?
3. The lottery result.
4. DNA proling evidence in a trial.
5. Increased annuity payments possible for heavy smokers.
6. An extremely valuable picture (a Van Gogh) might be a fake.
7. There was a photograph taken using a scanning tunnelling electron microscope.
8. Should risky surgical procedures be permitted?
9. Malaria has a signicant chance of causing death; prophylaxis against it carries a
risk of dizziness and panic attacks. What do you do?
10. A commodities futures trader lost a huge sum of money.
11. An earthquake occurred, which had not been predicted.
12. Some analysts expected ination to fall; some expected it to rise.
13. Football pools.
14. Racing results, and tips for the day's races.
15. There is a 10% chance of snow tomorrow.
16. Prots from gambling in the USA are growing faster than any other sector of the
economy. (In connection with this item, it should be carefully noted that prots are
made by the casino, not the customers.)
17. In the preceding year, British postmen had sustained 5975 dogbites, which was
around 16 per day on average, or roughly one every 20 minutes during the time
when mail is actually delivered. One postman had sustained 200 bites in 39 years of
service.

Now, this list is by no means exhaustive; I could have made it longer. And such a list
could be compiled every day (see the exercise at the end of this section). The subjects
reported touch on an astonishingly wide range of aspects of life, society, and the natural
world. And they all have the common property that chance, uncertainty, likelihood,
randomness call it what you will is an inescapable component of the story.
Conversely, there are few features of life, the universe, or anything, in which chance is
not in some way crucial.
Nor is this merely some abstruse academic point; assessing risks and taking chances
are inescapable facets of everyday existence. It is a trite maxim to say that life is a lottery;
it would be more true to say that life offers a collection of lotteries that we can all, to
some extent, choose to enter or avoid. And as the information at our disposal increases, it
does not reduce the range of choices but in fact increases them. It is, for example,
1.4 Basic ideas: the classical case 5

increasingly difcult successfully to run a business, practise medicine, deal in nance, or

engineer things without having a keen appreciation of chance and probability. Of course
you can make the attempt, by relying entirely on luck and uninformed guesswork, but in
the long run you will probably do worse than someone who plays the odds in an informed
way. This is amply conrmed by observation and experience, as well as by mathematics.
Thus, probability is important for all these severely practical reasons. And we have the
bonus that it is also entertaining and amusing, as the existence of all those lotteries,
casinos, and racecourses more than sufciently testies.
Finally, a glance at this and other section headings shows that chance is so powerful
and emotive a concept that it is employed by poets, playwrights, and novelists. They
clearly expect their readers to grasp jokes, metaphors, and allusions that entail a shared
understanding of probability. (This feat has not been accomplished by algebraic struc-
tures, or calculus, and is all the more remarkable when one recalls that the literati are not
otherwise celebrated for their keen numeracy.) Furthermore, such allusions are of very
long standing; we may note the comment attributed by Plutarch to Julius Caesar on
crossing the Rubicon: `Iacta alea est' (commonly rendered as `The die is cast'). And the
passage from Ecclesiastes: `The race is not always to the swift, or the battle to the strong,
but time and chance happen to them all'. The Romans even had deities dedicated to
chance, Fors and Fortuna, echoed in Shakespeare's Hamlet: `. . . the slings and arrows of
outrageous fortune . . .'.
Many other cultures have had such deities, but it is notable that deication has not
occurred for any other branch of mathematics. There is no god of algebra.
One recent stanza (by W.H. Henley) is of particular relevance to students of probability,
who are often soothed and helped by murmuring it during difcult moments in lectures
and textbooks:
In the fell clutch of circumstance
I have not winced or cried aloud:
Under the bludgeonings of chance
My head is bloody, but unbowed.

Exercise for section 1.3

1. Look at today's newspapers and mark the articles in which chance is explicitly or implicitly an
important feature of the report.

1 . 4 BA S I C I D E A S : T H E C L A S S I C A L C A S E
The perfect die does not lose its usefulness or justication by the fact that real dice
fail to live up to it.
W. Feller
Our rst task was mentioned above; we need to supply reasons for the use of the standard
probability scale, and methods for deciding where various chances should lie on this
scale. It is natural that in doing this, and in seeking to understand the concept of
probability, we will pay particular attention to the experience and intuition yielded by
ipping coins and rolling dice. Of course this is not a very bold or controversial decision;
6 1 Introduction

any theory of probability that failed to describe the behaviour of coins and dice would be
widely regarded as useless. And so it would be. For several centuries that we know of,
and probably for many centuries before that, ipping a coin (or rolling a die) has been the
epitome of probability, the paradigm of randomness. You ip the coin (or roll the die),
and nobody can accurately predict how it will fall. Nor can the most powerful computer
predict correctly how it will fall, if it is ipped energetically enough.
This is why cards, dice, and other gambling aids crop up so often in literature both
directly and as metaphors. No doubt it is also the reason for the (perhaps excessive)
popularity of gambling as entertainment. If anyone had any idea what numbers the lottery
would show, or where the roulette ball will land, the whole industry would be a dead
duck.
At any rate, these long-standing and simple gaming aids do supply intuitively con-
vincing ways of characterizing probability. We discuss several ideas in detail.

I Probability as proportion
Figure 1.2 gives the layout of an American roulette wheel. Suppose such a wheel is spun
once; what is the probability that the resulting number has a 7 in it? That is to say, what is
the probability that the ball hits 7, 17, or 27? These three numbers comprise a proportion
3
38 of the available compartments, and so the essential symmetry of the wheel (assuming it
3
is well made) suggests that the required probability ought to be 38 . Likewise the

0 00

1 2 3
118
27 00 1 13
25 10 36 4 5 6
29 24 112
12 3 7 8 9
8 15
34 even
19 22 10 11 12
31 5
18 17 13 14 15
6
21 32
16 17 18
33 20
16 1324
7 19 20 21
4 11
23 30
35 26 22 23 24
14 2 9
0 28
25 26 27
odd
28 29 30
2536
31 32 33
1936
34 35 36

2 to 1 2 to 1 2 to 1

Figure 1.2. American roulette. Shaded numbers are black; the others are red except for the zeros.
1.4 Basic ideas: the classical case 7

probability of an odd compartment is suggested to be 18 9

38 19, because the proportion of
18
odd numbers on the wheel is 38.
Most people nd this proposition intuitively acceptable; it clearly relies on the
fundamental symmetry of the wheel, that is, that all numbers are regarded equally by the
ball. But this property of symmetry is shared by a great many simple chance activities; it
is the same as saying that all possible outcomes of a game or activity are equally likely.
For example:
The ball is equally likely to land in any compartment.
You are equally likely to select either of two cards.
The six faces of a die are equally likely to be face up.
With these examples in mind it seems reasonable to adopt the following convention or
rule. Suppose some game has n equally likely outcomes, and r of these outcomes
correspond to your winning. Then the probability p that you win is r=n. We write
r number of ways of winning the game
(1) p :
n number of possible outcomes of the game
This formula looks very simple. Of course, it is very simple but it has many useful and
important consequences. First note that we always have 0 < r < n, and so it follows that
(2) 0 < p < 1:
If r 0, so that it is impossible for you to win, then p 0. Likewise if r n, so that
you are certain to win, then p 1. This is all consistent with the probability scale
introduced in section 1.2, and supplies some motivation for using it. Furthermore, this
interpretation of probability as dened by proportion enables us to place many simple but
important chances on the scale.

Example 1.4.1. Flip a coin and choose `heads'. Then r 1, because you win on the
outcome `heads', and n 2, because the coin shows a head or a tail. Hence the
probability that you win, which is also the probability of a head, is p 12. s

Example 1.4.2. Roll a die. There are six outcomes, which is to say that n 6. If you
win on an even number then r 3, so the probability that an even number is shown is
p 36 12:
Likewise the chance that the die shows a 6 is 16 , and so on. s

Example 1.4.3. Pick a card at random from a pack of 52 cards. What is the
probability of an ace? Clearly n 52 and r 4, so that
4 1
p 52 13 : s

Example 1.4.4. A town contains x women and y men; an opinion pollster chooses an
adult at random for questioning about toothpaste. What is the chance that the adult is
male? Here
n x y and r y:
8 1 Introduction

Hence the probability is

p y=(x y): s
It may be objected that these results depend on an arbitrary imposition of the ideas of
symmetry and proportion, which are clearly not always relevant. Nevertheless, such
results and ideas are immensely appealing to our intuition; in fact the rst probability
calculations in Renaissance Italy take this framework more or less for granted. Thus
Cardano (writing around 1520), says of a well-made die: `One half of the total number of
faces always represents equality . . . I can as easily throw 1, 3, or 5 as 2, 4, or 6'.
Here we can clearly see the beginnings of the idea of probability as an expression of
proportion, an idea so powerful that it held sway for centuries. However, there is at least
one unsatisfactory aspect to this interpretation: it seems that we do not need ever to roll a
die to say that the chance of a 6 is 16 . Surely actual experiments should have a role in our
denitions? This leads to another idea.

II Probability as relative frequency

Figure 1.3 shows the proportion of sixes that appeared in a sequence of rolls of a die. The
number of rolls is n, for n 0, 1, 2, . . . ; the number of sixes is r(n), for each n, and the
proportion of sixes is
r(n)
(3) p(n) :
n
What has this to do with the probability that the die shows a six? Our idea of probability
as a proportion suggests that the proportion of sixes in n rolls should not be too far from
the theoretical chance of a six, and gure 1.3 shows that this seems to be true for large
values of n. This is intuitively appealing, and the same effect is observed if you record
such proportions in a large number of other repeated chance activities.
We therefore make the following general assertion. Suppose some game is repeated a
large number n of times, and in r(n) of these games you win. Then the probability p that

p(n)

0.20
0.18
0.16
0.14
0.12
0.10

0 10 20 30 40 50 60 70 80 90 100 n

Figure 1.3. The proportion of sixes given in 100 rolls of a die, recorded at intervals of 5 rolls.
_
Figures are from an actual experiment. Of course, 16 0:166.
1.4 Basic ideas: the classical case 9

you win some future similar repetition of this game is close to r(n)=n. We write
r(n) number of wins in n games
(4) p' :
n number n of games
The symbol ' is read as `is approximately equal to'. Once again we note that
0 < r(n) < n and so we may take it that 0 < p < 1.
Furthermore, if a win is impossible then r(n) 0, and r(n)=n 0. Also, if a win is
certain then r(n) n, and r(n)=n 1. This is again consistent with the scale introduced
in gure 1.1, which is very pleasant. Notice the important point that this interpretation
supplies a way of approximately measuring probabilities rather than calculating them
merely by an appeal to symmetry.
Since we can now calculate simple probabilities, and measure them approximately, it is
tempting to stop there and get straight on with formulating some rules. That would be a
mistake, for the idea of proportion gives another useful insight into probability that will
turn out to be just as important as the other two, in later work.

III Probability and expected value

Many problems in chance are inextricably linked with numerical outcomes, especially in
gambling and nance (where `numerical outcome' is a euphemism for money). In these
cases probability is inextricably linked to `value', as we now show.
To aid our thinking let us consider an everyday concrete and practical problem. A
plutocrat makes the following offer. She will ip a fair coin; if it shows heads she will
give you $1, if it shows tails she will give Jack $1. What is this offer worth to you? That
is to say, for what fair price $ p, should you sell it?
Clearly, whatever price $ p this is worth to you, it is worth the same price $ p to Jack,
because the coin is fair, i.e. symmetrical (assuming he needs and values money just as
much as you do). So, to the pair of you, this offer is altogether worth $2 p. But whatever
the outcome, the plutocrat has given away $1. Hence $2 p $1, so that p 12 and the
offer is worth $12 to you.
It seems natural to regard this value p 12 as a measure of your chance of winning the
money. It is thus intuitively reasonable to make the following general rule.
Suppose you receive $1 with probability p (and otherwise you receive nothing). Then
the value or fair price of this offer is $ p. More generally, if you receive $d with
probability p (and nothing otherwise) then the fair price or expected value of this offer is
given by
(5) expected value pd:
This simple idea turns out to be enormously important later on; for the moment we
note only that it is certainly consistent with our probability scale introduced in gure 1.1.
For example, if the plutocrat denitely gives you $1 then this is worth exactly $1 to you,
and p 1. Likewise if you are denitely given nothing, then p 0. And it is easy to see
that 0 < p < 1, for any such offers.
In particular, for the specic example above we nd that the probability of a head when
a fair coin is ipped is 12. Likewise a similar argument shows that the probability of a six
when a fair die is rolled is 16. (Simply imagine the plutocrat giving $1 to one of six people
10 1 Introduction

selected by the roll of the die.)

The `fair price' of such offers is often called the expected value, or expectation, to
emphasize its chance nature. We meet this concept again, later on.
We conclude this section with another classical and famous manifestation of prob-
ability. It is essentially the same as the rst we looked at, but is supercially different.

IV Probability as proportion again

Suppose a small meteorite hits the town football pitch. What is the probability that it
lands in the central circle?
Obviously meteorites have no special propensity to hit any particular part of a football
pitch; they are equally likely to strike any part. It is therefore intuitively clear that the
chance of striking the central circle is given by the proportion of the pitch that it occupies.
In general, if jAj is the area of the pitch in which the meteorite lands, and jCj is the area
of some part of the pitch, then the probability p that C is struck is given by p jCj=jAj:
Once again we formulate a general version of this as follows. Suppose a region A of
the plane has area jAj, and C is some part of A with area jCj. If a point is picked at
random in A, then the probability p that it lies in C is given by
jCj
(6) p :
jAj
As before we can easily see that 0 < p < 1, where p 0 if C is empty and p 1 if
C A.

Example 1.4.5. An archery target is a circle of radius 2. The bullseye is a circle of

radius 1. A naive archer is equally likely to hit any part of the target (if she hits it at all)
and so the probability of a bullseye for an arrow that hits the target is
area of bullseye 3 12 1
p : s
area of target 3 22 4

Exercises for section 1.4

1. Suppose you read in a newspaper that the proportion of $20 bills that are forgeries is 5%. If you
possess what appears to be a $20 bill, what is its expected value? Could it be more than $19? Or
could it be less? Explain! (Does it make any difference how you acquired the bill?)
2. A point P is picked at random in the square ABCD, with sides of length 1. What is the
probability that the distance from P to the diagonal AC is less than 16?

1 . 5 BA S I C I D E A S ; T H E G E N E R A L C A S E
We must believe in chance, for how else can we account for the successes of those
we detest?
Anon.
We noted that a theory of probability would be hailed as useless if it failed to describe the
behaviour of coins and dice. But of course it would be equally useless if it failed to
1.5 Basic ideas; the general case 11

describe anything else, and moreover many real dice and coins (especially dice) have
been known to be biased and asymmetrical. We therefore turn to the question of assigning
probabilities in activities that do not necessarily have equally likely outcomes.
It is interesting to note that the desirability of doing this was implicitly recognized by
Cardano (mentioned in the previous section) around 1520. In his Book on Games of
Chance, which deals with supposedly fair dice, he notes that

`Every die, even if it is acceptable, has its favoured side.'

However, the ideas necessary to describe the behaviour of such biased dice had to wait
for Pascal in 1654, and later workers. We examine the basic notions in turn; as in the
previous section, these notions rely on our concept of probability as an extension of
proportion.

I Probability as relative frequency

Once again we choose a simple example to illustrate the ideas, and a popular choice is
the pin, or tack. Figure 1.4 shows a pin, called a Bernoulli pin. If such a pin is dropped
onto a table the result is a success, S, if the point is not upwards; otherwise it is a failure,
F.
What is the probability p of success? Obviously symmetry can play no part in xing p,
and Figure 1.5, which shows more Bernoulli pins, indicates that mechanical arguments
will not provide the answer.
The only course of action is to drop many similar pins (or the same pin many times),
and record the proportion that are successes (point down). Then if n are dropped, and
r(n) are successes, we anticipate that the long-run proportion of successes is near to p,
that is:
r(n)
(1) p' , for large n:
n

failure F success S

Figure 1.4. A Bernoulli pin.

Figure 1.5. More Bernoulli pins.

12 1 Introduction

If you actually obtain a pin and perform this experiment, you will get a graph like that
of gure 1.6. It does seem from the gure that r(n)=n is settling down around some
number p, which we naturally interpret as the probability of success. It may be objected
that the ratio changes every time we drop another pin, and so we will never obtain an
exact value for p. But this gap between the real world and our descriptions of it is
observed in all subjects at all levels. For example, geometry tells us that the diagonal of a
p
unit square has length 2. But, as A. A. Markov has observed,
If we wished to verify this fact by measurements, we should nd that the ratio of
p
diagonal to side is different for different squares, and is never 2.
It may be regretted that we have only this somewhat hit-or-miss method of measuring
probability, but we do not really have any choice in the matter. Can you think of any other
way of estimating the chance that the pin will fall point down? And even if you did think
of such a method of estimation, how would you decide whether it gave the right answer,
except by ipping the pin often enough to see? We can illustrate this point by considering
a basic and famous example.

Example 1.5.1: sex ratio. What is the probability that the next infant to be born in
your local hospital will be male? Throughout most of the history of the human race it was
taken for granted that essentially equal numbers of boys and girls are born (with some
uctuations, naturally). This question would therefore have drawn the answer 12, until
recently.
However, in the middle of the 16th century, English parish churches began to keep
fairly detailed records of births, marriages, and deaths. Then, in the middle of the 17th
century, one John Graunt (a draper) took the trouble to read, collate, and tabulate the
numbers in various categories. In particular he tabulated the number of boys and girls
whose births were recorded in London in each of 30 separate years.
To his, and everyone else's, surprise, he found that in every single year more boys were
born than girls. And, even more remarkably, the ratio of boys to girls varied very little
between these years. In every year the ratio of boys to girls was close to 14:13. The
meaning and signicance of this unarguable truth inspired a heated debate at the time.
For us, it shows that the probability that the next infant born will be male, is
approximately 14 27 . A few moments thought will show that there is no other way of
answering the general question, other than by nding this relative frequency.

p(n)

0.4

Figure 1.6. Sketch of the proportion p(n) of successes when a Bernoulli pin is dropped n times.
For this particular pin, p seems to be settling down at approximately 0.4.
1.5 Basic ideas; the general case 13

It is important to note that the empirical frequency differs from place to place and from
time to time. Graunt also looked at the births in Romsey over 90 years and found the
empirical frequency to be 16:15. It is currently just under 0.513 in the USA, slightly less
than 14 16
27 (' 0:519) and 31 (' 0:516).
Clearly the idea of probability as a relative frequency is very attractive and useful.
Indeed it is generally the only interpretation offered in textbooks. Nevertheless, it is not
always enough, as we now discuss.

II Probability as expected value

The problem is that to interpret probability as a relative frequency requires that we can
repeat some game or activity as many times as we wish. Often this is clearly not the case.
For example, suppose you have a Russian Imperial Bond, or a share in a company that is
bankrupt and is being liquidated, or an option on the future of the price of gold. What is
the probability that the bond will be redeemed, the share will be repaid, or the option will
yield a prot? In these cases the idea of expected value supplies the answer. (For
simplicity, we assume constant money values and no interest.)
The ideas and argument are essentially the same as those that we used in considering
the benevolent plutocrat in section 1.4, leading to equation (5) in that section. For variety,
we rephrase those notions in terms of simple markets. However, a word of warning is
appropriate at this point. Real markets are much more complicated than this, and what we
call the fair price or expected value will not usually be the actual or agreed market price
in any case, or even very close to it. This is especially marked in the case of deals which
run into the future, such as call options, put options, and other complicated nancial
derivatives. If you were to offer prices based on fairness or expected value as discussed
here and above, you would be courting total disaster, or worse. See the discussion of
bookmakers' odds in section 2.12 for further illustration and words of caution.
Suppose you have a bond with face value $1, and the probability of its being redeemed
at par (that is, for $1) is p. Then, by the argument we used in section 1.4, the expected
value , or fair price, of this bond is given by p. More generally, if the bond has face
value $d then the fair price is $dp.
Now, as it happens, there are markets in all these things: you can buy Imperial Chinese
bonds, South American Railway shares, pork belly futures, and so on. It follows that if
the market gives a price for a bond with face value d, then it gives the probability of
redemption as roughly

(2) p :
d

Example 1.5.2. If a bond for a million roubles is offered to you for one rouble, and
the sellers are assumed to be rational, then they clearly think the chance of the bond's
being bought back at par is less than one in a million. If you buy it, then presumably you
believe the chances are more than one in a million. If you thought the chances were less,
you would reduce your offer. If you both agree that one rouble is a fair price for the bond,
then you have assigned the value p 106 for the probability of its redemption. Of
course this may vary according to various rumours and signals from the relevant banks
14 1 Introduction

and government (and note that the more ornate and attractive bonds now have some
intrinsic value, independent of their chance of redemption). s

This example leads naturally to our nal candidate for an interpretation of probability.

III Probability as an opinion or judgement

In the previous example we were able to assign a probability because the bond had an
agreed fair price, even though this price was essentially a matter of opinion. What
happens if we are dealing with probabilities that are purely personal opinions? For
example, what is the probability that a given political party will win the next election?
What is the probability that small green aliens regularly visit this planet? What is the
probability that some accused person is guilty? What is the probability that a given,
opaque, small, brick building contains a pig?
In each of these cases we could perhaps obtain an estimate of the probability by
persuading a bookmaker to compile a number of wagers and so determine a fair price.
But we would be at a loss if nobody were prepared to enter this game. And it would seem
to be at best a very articial procedure, and at worst extremely inappropriate, or even
illegal. Furthermore, the last resort, betting with yourself, seems strangely unattractive.
Despite these problems, this idea of probability as a matter of opinion is often useful,
though we shall not use it in this text.

Exercises for section 1.5

1. A picture would be worth $1000 000 if genuine, but nothing if a fake. Half the experts say it's a
fake, half say it's genuine. What is it worth? Does it make any difference if one of the experts is
a millionaire?
2. A machine accepts dollar bills and sells a drink for $1. The price is raised to 120c. Converting
the machine to accept coins or give change is expensive, so it is suggested that a simple
randomizer is added, so that each customer who inserts $1 gets nothing with probability 1=6, or
the can with probability 5=6, and that this would be fair because the expected value of the output
is 120 3 5=6 100c $1, which is exactly what the customer paid. Is it indeed fair?
In the light of this, discuss how far our idea of a fair price depends on a surreptitious use of
the concept of repeated experiments.
Would you buy a drink from the modied machine?

1.6 MODELLING
If I wish to know the chances of getting a complete hand of 13 spades, I do not set
about dealing hands. It would take the population of the world billions of years to
obtain even a bad estimate of this.
John Venn
The point of the above quote is that we need a theory of probability to answer even the
simplest of practical questions. Such theories are called models.
1.6 Modelling 15

Example 1.6.1: cards. For the question above, the usual model is as follows. We
assume that all possible hands of cards are equally likely, so that if the number of all
possible hands is n, then the required probability is n1 . s

Experience seems to suggest that for a well-made, well-shufed pack of cards, this
answer is indeed a good guide to your chances of getting a hand of spades. (Though we
must remember that such complete hands occur more often than this predicts, because
humorists stack the pack, as a `joke'.) Even this very simple example illustrates the
following important points very clearly.
First, the model deals with abstract things. We cannot really have a perfectly shufed
pack of perfect cards; this `collection of equally likely hands' is actually a ction. We
create the idea, and then use the rules of arithmetic to calculate the required chances. This
is characteristic of all mathematics, which concerns itself only with rules dening the
behaviour of entities which are themselves undened (such as `numbers' or `points').
Second, the use of the model is determined by our interpretation of the rules and
results. We do not need an interpretation of what chance is to calculate probabilities, but
without such an interpretation it is rather pointless to do it.
Similarly, you do not need to have an interpretation of what lines and points are to do
geometry and trigonometry, but it would all be rather pointless if you did not have one.
Likewise chess is just a set of rules, but if checkmate were not interpreted as victory, not
many people would play.
Use of the term `model' makes it easier to keep in mind this distinction between theory
and reality. By its very nature a model cannot include all the details of the reality it seeks
to represent, for then it would be just as hard to comprehend and describe as the reality
we want to model. At best, our model should give a reasonable picture of some small part
of reality. It has to be a simple (even crude) description; and we must always be ready to
scrap or improve a model if it fails in this task of accurate depiction. That having been
said, old models are often still useful. The theory of relativity supersedes the Newtonian
model, but all engineers use Newtonian mechanics when building bridges or motor cars,
or probing the solar system.
This process of observation, model building, analysis, evaluation, and modication is
called modelling, and it can be conveniently represented by a diagram; see gure 1.7.
(This diagram is therefore in itself a model; it is a model for the modelling process.)
In gure 1.7, the top two boxes are embedded in the real world and the bottom two
boxes are in the world of models. Box A represents our observations and experience of
some phenomenon, together with relevant knowledge of related events and perhaps past
experience of modelling. Using this we construct the rules of a model, represented by box
B. We then use the techniques of logical reasoning, or mathematics, to deduce the way in
which the model will behave. These properties of the model can be called theorems; this
stage is represented by box C. Next, these characteristics of the model are interpreted in
terms of predictions of the way the corresponding real system should work, denoted by
box D. Finally, we perform appropriate experiments to discover whether these predictions
agree with observation. If they do not, we change or scrap the model and go round the
loop again. If they do, we hail the model as an engine of discovery, and keep using it to
make predictions until it wears out or breaks down. This last step is called using or
checking the model or, more grandly, validation.
16 1 Introduction

A D
Real
experiment and use predictions
world
measurement

construction interpretation

B C
Model rules of theorems
deduction
world model

Figure 1.7. A model for modelling.

This procedure is so commonplace that we rather take it for granted. For example, it
has been used every time you see a weather forecast. Meteorologists have observed the
climate for many years. They have deduced certain simple rules for the behaviour of jet
streams, anticyclones, occluded fronts, and so on. These rules form the model. Given any
conguration of airows, temperatures, and pressures, the rules are used to make a
prediction; this is the weather forecast. Every forecast is checked against the actual
outcome, and this experience is used to improve the model.
Models form extraordinarily powerful and economical ways of thinking about the
world. In fact they are often so good that the model is confused with reality. If you ever
think about atoms, you probably imagine little billiard balls; more sophisticated readers
may imagine little orbital systems of elementary particles. Of course atoms are not
`really' like that; these visions are just convenient old models.
We illustrate the techniques of modelling with two simple examples from probability.

Example 1.6.2: setting up a lottery. If you are organizing a lottery you have to
decide how to allocate the prize money to the holders of winning tickets. It would help
you to know the chances of any number winning and the likely number of winners. Is this
possible? Let us consider a specic example.
Several national lotteries allow any entrant to select six numbers in advance from the
integers 1 to 49 inclusive. A machine then selects six balls at random (without replace-
ment) from an urn containing 49 balls bearing these numbers. The rst prize is divided
among entrants selecting these numbers.
Because of the nature of the apparatus, it seems natural to assume that any selection of
six numbers is equally likely to be drawn. Of course this assumption is a mathematical
model, not a physical law established by experiment. Since there are approximately 14
million different possible selections (we show this in chapter 3), the model predicts that
your chance, with one entry, of sharing the rst prize is about one in 14 million. Figure
1.8 shows the relative frequency of the numbers drawn in the rst 1200 draws. It does not
seem to discredit or invalidate the model so far as one can tell.
1.6 Modelling 17

150
Number of appearances

100

1 10 20 30 40 49

Figure 1.8. Frequency plot of an actual 649 lottery after 1200 drawings. The numbers do seem
equally likely to be drawn.

The next question you need to answer is, how many of the entrants are likely to share
the rst prize? As we shall see, we need in turn to ask, how do lottery entrants choose
their numbers?
This is clearly a rather different problem; unlike the apparatus for choosing numbers,
gamblers choose numbers for various reasons. Very few choose at random; they use
birthdays, ages, patterns, and so on. However, you might suppose that for any gambler
chosen at random, that choice of numbers would be evenly distributed over the
possibilities.
In fact this model would be wrong; when the actual choices of lottery numbers are
examined, it is found that in the long run the chances that the various numbers will occur
are very far from equal; see gure 1.9. This clustering of preferences arises because
people choose numbers in lines and patterns which favour central squares, and they also
favour the top of the card. Data like this would provide a model for the distribution of
likely payouts to winners. s

It is important to note that these remarks do not apply only to lotteries, cards, and dice.
Venn's observation about card hands applies equally well to almost every other aspect of
life. If you wished to design a telephone exchange (for example), you would rst of all
construct some mathematical models that could be tested (you would do this by making
assumptions about how calls would arrive, and how they would be dealt with). You can
construct and improve any number of mathematical models of an exchange very cheaply.
Building a faulty real exchange is an extremely costly error.
Likewise, if you wished to test an aeroplane to the limits of its performance, you would
be well advised to test mathematical models rst. Testing a real aeroplane to destruction
is somewhat risky.
So we see that, in particular, models and theories can save lives and money. Here is
another practical example.
18 1 Introduction

1 2 3 4 5

6 7 8 9 10

11 12 13 14 15

16 17 18 19 20

21 22 23 24 25

26 27 28 29 30

31 32 33 34 35

36 37 38 39 40

41 42 43 44 45

46 47 48 49

Figure 1.9. Popular and unpopular lottery numbers: bold, most popular; roman, intermediate
popularity; italic, least popular.

Example 1.6.3: rst signicant digit. Suppose someone offered the following wager:
(i) select any large book of numerical tables (such as a census, some company
accounts, or an almanac);
(ii) pick a number from this book at random (by any means);
(iii) if the rst signicant digit of this number is one of f5, 6, 7, 8, 9g, then you win $1;
if it is one of f1, 2, 3, 4g, you lose $1.
Would you accept this bet? You might be tempted to argue as follows: a reasonable
intuitive model for the relative chances of each digit is that they are equally likely. On
this model the probability p of winning is 59, which is greater than 12 (the odds on winning
would be 5 : 4), so it seems like a good bet. However, if you do some research and
actually pick a large number of such numbers at random, you will nd that the relative
frequencies of each of the nine possible rst signicant digits are given approximately by
f 1 0:301, f 2 0:176, f 3 0:125,
f 4 0:097, f 5 0:079, f 6 0:067,
f 7 0:058, f 8 0:051, f 9 [Link]
Thus empirically the chance of your winning is
f 5 f 6 f 7 f 8 f 9 0:3
The wager offered is not so good for you! (Of course it would be quite improper for a
mathematician to win money from the ignorant by this means.) This empirical distribu-
tion is known as Benford's law, though we should note that it was rst recorded by S.
Newcomb (a good example of Stigler's law of eponymy). s
1.7 Mathematical modelling 19

We see that intuition is necessary and helpful in constructing models, but not sufcient;
you also need experience and observations. A famous example of this arose in particle
physics. At rst it was assumed that photons and protons would satisfy the same statistical
rules, and models were constructed accordingly. Experience and observations showed that
in fact they behave differently, and the models were revised.
The theory of probability exhibits a very similar history and development, and we
approach it in similar ways. That is to say, we shall construct a model that reects our
experience of, and intuitive feelings about, probability. We shall then deduce results and
make predictions about things that have either not been explained or not been observed,
or both. These are often surprising and even counter intuitive. However, when the
predictions are tested against experiment they are almost always found to be good. Where
they are not, new theories must be constructed.
It may perhaps seem paradoxical that we can explore reality most effectively by playing
with models, but this fact is perfectly well known to all children.

Exercise for section 1.6

1. Discuss how the development of the high-speed computer is changing the force of Venn's
observation, which introduced this section.

1 . 7 M AT H E M AT I C A L M O D E L L I N G
There are very few things which we know, which are not capable of being reduced
to a mathematical reasoning; and when they cannot, it is a sign our knowledge of
them is very small and confused; and where a mathematical reasoning can be had,
it is as great a folly to make use of any other, as to grope for a thing in the dark,
when you have a candle standing by you.
John Arbuthnot, Of the Laws of Chance
The quotation above is from the preface to the rst textbook on probability to appear in
English. (It is in a large part a translation of a book by Huygens, which had previously
appeared in Latin and Dutch.) Three centuries later, we nd that mathematical reasoning
is indeed widely used in all walks of life, but still perhaps not as much as it should be. A
small survey of the reasons for using mathematical methods would not be out of place.
The rst question is, why be abstract at all? The blunt answer is that we have no choice,
for many reasons.
In the rst place, as several examples have made clear, practical probability is inescap-
ably numerical. Betting odds can only be numerical, monetary payoffs are numerical,
stock exchanges and insurance companies oat on a sea of numbers. And even the
simplest and most elementary problems in bridge and poker, or in lotteries, involve
counting things. And this counting is often not a trivial task.
Second, the range of applications demands abstraction. For example, consider the
following list of real activities:
customers in line at a post ofce counter
cars at a toll booth
data in an active computer memory
20 1 Introduction

a pile of cans in a supermarket

telephone calls arriving at an exchange
patients arriving at a trauma clinic
letters in a mail box
All these entail `things' or `entities' in one or another kind of `waiting' state, before some
`action' is taken. Obviously this list could be extended indenitely. It is desirable to
abstract the essential structure of all these problems, so that the results can be interpreted
in the context of whatever application happens to be of interest. For the examples above,
this leads to a model called the theory of queues.
Third, we may wish to discuss the behaviour of the system without assigning specic
numerical values to the rate of arrival of the objects (or customers), or to the rate at which
they are processed (or serviced). We may not even know these values. We may wish to
examine the way in which congestion depends generally on these rates. For all these
reasons we are naturally forced to use all the mathematical apparatus of symbolism, logic,
algebra, and functions. This is in fact very good news, and these methods have the simple
practical and mechanical advantage of making our work very compact. This alone would
be sufcient! We conclude this section with two quotations chosen to motivate the reader
even more enthusiastically to the advantages of mathematical modelling. They illustrate
the fact that there is also a considerable gain in understanding of complicated ideas if
they are simply expressed in concise notation. Here is a denition of commerce.
Commerce: a kind of transaction, in which A plunders from B the goods of C, and
for compensation B picks the pocket of D of money belonging to E.
Ambrose Bierce, The Devil's Dictionary
The whole pith and point of the joke evaporates completely if you expand this from its
symbolic form. And think of the expansion of effort required to write it. Using algebra is
the reason or at least one of the reasons why mathematicians so rarely get writer's
cramp or repetitive strain injury.
We leave the nal words on this matter to Abraham de Moivre, who wrote the second
textbook on probability to appear in English. It rst appeared in 1717. (The second
edition was published in 1738 and the third edition in 1756, posthumously, de Moivre
having died on 27 November, 1754 at the age of 87.) He says in the preface:
Another use to be made of this Doctrine of Chances is, that it may serve in
conjunction with the other parts of mathematics as a t introduction to the art of
reasoning; it being known by experience that nothing can contribute more to the
attaining of that art, than the consideration of a long train of consequences, rightly
deduced from undoubted principles, of which this book affords many examples. To
this may be added, that some of the problems about chance having a great
appearance of simplicity, the mind is easily drawn into a belief, that their solution
may be attained by the mere strength of natural good sense; which generally proving
otherwise, and the mistakes occasioned thereby being not infrequent, it is presumed
that a book of this kind, which teaches to distinguish truth from what seems so
nearly to resemble it, will be looked on as a help to good reasoning.
These remarks remain as true today as when de Moivre wrote them around 1717.
1.8 Modelling probability 21

1 . 8 M O D E L L I N G P RO BA B I L I T Y
Rules and Models destroy genius and Art.
W. Hazlitt
First, we examine the real world and select the experiences and experiments that seem
best to express the nature of probability, without too much irrelevant extra detail. You
have already done this, if you have ever ipped a coin, or rolled a die, or wondered
whether to take an umbrella.
Second, we formulate a set of rules that best describe these experiments and exper-
iences. These rules will be mathematical in nature, for simplicity. (This is not paradox-
ical!) We do this in the next chapter.
Third, we use the structure of mathematics (thoughtfully constructed over the millenia
for other purposes), to derive results of practical interest. We do this in the remainder of
the book.
Finally, these results are compared with real data in a variety of circumstances: by
scientists to measure constants, by insurance companies to avoid ruin, by actuaries to
calculate your pension, by telephone engineers to design the network, and so on. This
validates our model, and has been done by many people for hundreds of years. So we do
not need to do it here.
This terse account of our program gives rise to a few questions of detail, which we
address here, as follows. Do we in fact need to know what probability `really' is? The
answer here is, of course, no. We only need our model to describe what we observe. It is
the same in physics; we do not need to know what mass really is to use Newton's or
Einstein's theories. This is just as well, because we do not know what mass really is. We
still do not know even what light `really' is. Questions of reality are best left for
philosophers to argue over, for ever.
Furthermore, in drawing up the rules, do we necessarily have to use the rather
roundabout arguments employed in section 1.2? Is there not a more simple and straight-
forward way to say what probability does? After all, Newton only had to drop apples to
see what gravity, force, and momentum did. Heat burns, electricity shocks, and light
shines, to give some other trivial examples.
By contrast, probability is strangely intangible stuff; you cannot accumulate piles of it,
or run your hands through it, or give it away. No meter will record its presence or absence,
and it is not much used in the home. We cannot deny its existence, since we talk about it,
but it exists in a curiously shadowy and ghost-like way. This difculty was neatly
pinpointed by John Venn in the 19th century:

It is sometimes not easy to give a clear denition of a science at the outset, so as to

set its scope and province before the reader in a few words. In the case of those
sciences which are directly concerned with what are termed objects, this difculty
is not indeed so serious. If the reader is already familiar with the objects, a simple
reference to them will give him a tolerably accurate idea of the direction and nature
of his studies. Even if he is not familiar with them they will still be often to some
extent connected and associated in his mind by a name, and the mere utterance of
the name may thus convey a fair amount of preliminary information.
22 1 Introduction

But when a science is concerned not so much with objects as with laws, the
difculty of giving preliminary information becomes greater.
The Logic of Chance

What Venn meant by this, is that books on subjects such as uid mechanics need not
ordinarily spend a great deal of time explaining the everyday concept of a uid. The
average reader will have seen waves on a lake, watched bathwater going down the plug-
hole, observed trees bending in the wind, and been annoyed by the wake of passing boats.
And anyone who has own in an aeroplane has to believe that uid mechanics
demonstrably works. Furthermore, the language of the subject has entered into everyday
discourse, so that when people use words like wave, or wing, or turbulence, or vortex,
they think they know what they mean. Probability is harder to put your nger on.

1.9 REVIEW
In this chapter we have looked at chance and probability in a non-technical way. It seems
obvious that we recognize the appearance of chance, but it is surprisingly difcult to give
a comprehensive denition of probability. For this reason, and many others, we have
begun to construct a theory of probability that will rely on mathematical models and
methods.
Our rst step on this path has been to agree that any probability is a number lying
between zero and unity, inclusive. It can be interpreted as a simple proportion in
situations with symmetry, or as a measure of long-run proportion, or as an estimate of
expected value, depending on the context. The next task is to determine the rules obeyed
by probabilities, and this is the content of the next chapter.

1 . 1 0 A P P E N D I X I . S O M E R A N D O M LY S E L E C T E D D E F I N I T I O N S
O F P RO BA B I L I T Y, I N R A N D O M O R D E R
One can hardly give a satisfactory denition of probability.
H. Poincare
Probability is a degree of certainty, which is to certainty as a part is to the whole.
J. Bernoulli
Probability is the study of random experiments.
S. Lipschutz
Mathematical probability is a branch of mathematical analysis that has developed around the
problem of assigning numerical measurement to the abstract concept of likelihood.
M. Munroe
Probability is a branch of logic which analyses nondemonstrative inferences, as opposed to
demonstrative ones.
E. Nagel
I call that chance which is nothing but want of art.
J. Arbuthnot
1.10 Appendix I. Some randomly selected denitions of probability in random order 23

The concept of probability is a generalization of the concepts of truth and falsehood.

J. Lucas
The probability of an event is the reason we have to believe that it has taken place or will take
place.
S. Poisson

Probability is the science of uncertainty.

G. Grimmett

Probability is the reason that we have to think that an event will occur, or that a proposition is
true.
G. Boole

Probability describes the various degrees of rational belief about a proposition given different
amount of knowledge.
J. M. Keynes

Probability is likeness to be true.

J. Locke

An event will on a long run of trials tend to occur with a frequency proportional to its
probability.
R. L. Ellis

One regards two events as equally probable when one can see no reason that would make one
more probable than the other.
P. Laplace

The probability of an event is the ratio of the number of cases that are favourable to it, to the
number of possible cases, when there is nothing to make us believe that one case should occur
rather than any other.
P. Laplace

Probability is a feeling of the mind.

A. de Morgan

Probability is a function of two propositions.

H. Jeffreys

The probability of an event is the ratio between the value at which an expectation depending on
the happening of the event ought to be computed, and the value of the thing expected upon its
happening.
T. Bayes

To have p chances of a, and q chances of b, is worth (ap bq)=( p q).

C. Huygens

Probability is a degree of possibility.

G. Leibniz

The limiting value of the relative frequency of an attribute will be called the probability of that
attribute.
R. von Mises
24 1 Introduction

The probability attributed by an individual to an event is revealed by the conditions under which
he would be disposed to bet on that event.
B. de Finetti
Probability does not exist.
B. de Finetti
Personalist views hold that probability measures the condence that a particular individual has
in the truth of a particular proposition.
L. Savage
The probability of an outcome is our estimate for the most likely fraction of a number of
repeated observations that will yield that outcome.
R. Feynman
It is likely that the word `probability' is used by logicians in one sense and by statisticians in
another.
F. P. Ramsey

1.11 APPENDIX II. REVIEW OF SETS AND FUNCTIONS

It is difcult to make progress in any branch of mathematics without using the ideas and notation of
sets and functions. Indeed it would be perverse to try to do so, since these ideas and notation are
very helpful in guiding our intuition and solving problems. (Conversely, almost the whole of
mathematics can be constructed from these few simple concepts.) We therefore give a brief synopsis
of what we need here, for completeness, although it is very likely that the reader will be familiar
with all this already.

Sets
A set is a collection of things that are called the elements of the set. The elements can be any kind of
entity: numbers, people, poems, blueberries, points, lines, and so on, endlessly.
For clarity, upper case letters are always used to denote sets. If the set S includes some element
denoted by x, then we say x belongs to S, and write x 2 S. If x does not belong to S, then we write
x2= S.
There are essentially two ways of dening a set, either by a list or by a rule.

Example 1.11.1. If S is the set of numbers shown by a conventional die, then the rule is that S
comprises the integers lying between 1 and 6 inclusive. This may be written formally as follows:
S fx: 1 < x < 6 and x is an integerg:
Alternatively S may be given as a list:
S f1, 2, 3, 4, 5, 6g: s

One important special case arises when the rule is impossible; for example, consider the set of
elephants playing football on Mars. This is impossible (there is no pitch on Mars) and the set
therefore is empty; we denote the empty set by . We may write as fg.
If S and T are two sets such that every element of S is also an element of T , then we say that T
includes S, and write either S T or S T. If S T and T S then S and T are said to be equal,
and we write S T .
Note that S for every S. Note also that some books use the symbol `' to denote inclusion
1.11 Appendix II. Review of sets and functions 25

and reserve `' to denote strict inclusion, that is to say, S T if every element of S is in T , and
some element of T is not in S. We do not make this distinction.

Combining sets
Given any non-empty set, we can divide it up, and given any two sets, we can join them together.
These simple observations are important enough to warrant denitions and notation.

Denition. Let A and B be sets. Their union, denoted by A [ B, is the set of elements that are in
A or B, or in both. Their intersection, denoted by A \ B, is the set of elements in both A and B. n

Note that in other books the union may be referred to as the join or sum; the intersection may be
referred to as the meet or product. We do not use these terms. Note the following.

Denition. If A \ B , then A and B are said to be disjoint. n

We can also remove bits of sets, giving rise to set differences, as follows.

Denition. Let A and B be sets. That part of A which is not also in B is denoted by A n B, called
the difference of A from B. Elements which are in A or B but not both, comprise the symmetric
difference, denoted by A B. n

Finally we can combine sets in a more complicated way by taking elements in pairs, one from
each set.

Denition. Let A and B be sets, and let

C f(a, b): a 2 A, b 2 Bg
be the set of ordered pairs of elements from A and B. Then C is called the product of A and B and
denoted by A 3 B. n

Example 1.11.2. Let A be the interval [0, a] of the x-axis, and B the interval [0, b] of the y-
axis. Then C A 3 B is the rectangle of base a and height b with its lower left vertex at the origin,
when a, b . 0. s

Venn diagrams
The above ideas are attractively and simply expressed in terms of Venn diagrams. These provide very
expressive pictures, which are often so clear that they make algebra redundant. See gure 1.10.
In probability problems, all sets of interest A lie in a universal set , so that A for all A.
That part of which is not in A is called the complement of A, denoted by A c. Formally
A c n A fx: x 2 , x 2 = Ag:
Obviously, from the diagram or by consideration of the elements
A [ A c , A \ A c , (A c ) c A:
Clearly A \ B B \ A and A [ B B [ A, but we must be careful when making more intricate
combinations of larger numbers of sets. For example, we cannot write down simply A [ B \ C; this
is not well dened because it is not always true that
(A [ B) \ C A [ (B \ C):
26 1 Introduction

Figure 1.10. The set A is included in the universal set .

We use the obvious notation

[
n
A r A1 [ A2 [ [ A n ,
r1
\n
A r A1 \ A2 \ \ A n :
r1

Denition. If A j \ A k , for j 6 k, and

[n
A r ,
r1
then the collection (A r ; 1 < r < n) is said to form a partition of . n

Size
When sets are countable it is often useful to consider the number of elements they contain; this is
called their size or cardinality. For any set A, we denote its size by jAj; when sets have a nite
number of elements, it is easy to see that size has the following properties.
If sets A and B are disjoint then
jA [ Bj jAj jBj,
and more generally, when A and B are not necessarily disjoint,
jA [ Bj jA \ Bj jAj jBj:
Naturally jj 0, and if A B then
jAj < jBj:
Finally, for the product of two such nite sets A 3 B we have
jA 3 Bj jAj 3 jBj:
When sets are innite or uncountable, a great deal more care and subtlety is required in dealing
with the idea of size. However, we can see intuitively that we can consider the length of subsets of a
line, or areas of sets in a plane, or volumes in space, and so on. It is easy to see that if A and B are
two subsets of a line, with lengths jAj and jBj respectively, then in general
jA [ Bj jA \ Bj jAj jBj:
Therefore jA [ Bj jAj jBj when A \ B .
We can dene the product of two such sets as a set in the plane with area jA 3 Bj, which satises
the well-known elementary rule for areas and lengths
jA 3 Bj jAj 3 jBj
and is thus consistent with the nite case above. Volumes and sets in higher dimensions satisfy
similar rules.
1.12 Problems 27

Functions
Suppose we have sets A and B, and a rule that assigns to each element a in A a unique element b in
B. Then this rule is said to dene a function from A to B; for the corresponding elements we write
b f (a):
Here the symbol f (:) denotes the rule or function; often we just call it f . The set A is called the
domain of f , and the set of elements in B that can be written as f (a) for some a is called the range
of f ; we may denote the range by R.
Anyone who has a calculator is familiar with the idea of a function. For any function key, the
calculator will supply f (x), if x is in the domain of the function; otherwise it says `error'.

Inverse function
If f is a function from A to B, we can look at any b in the range R of f and see how it arose from A.
This denes a rule assigning elements of A to each element of R, so if the rule assigns a unique
element a to each b this denes a function from R to A. It is called the inverse function and is
denoted by f 1 (:):
a f 1 (b):

Example 1.11.3: indicator function. Let A and dene the following function I(:) on :
I() 1 if 2 A,
I() 0 if 2= A:
Then I is a function from to f0, 1g; it is called the indicator of A, because by taking the value 1 it
indicates that 2 A. Otherwise it is zero. s

This is about as simple a function as you can imagine, but it is surprisingly useful. For example, note
that if A is nite you can nd its size by summing I() over all :
X
jAj I():
2

1 . 1 2 P RO B L E M S
Note well: these are not necessarily mathematical problems; an essay may be a sufcient answer.
They are intended to provoke thought about your own ideas of probability, which you may well have
without realizing the fact.
1. Which of the denitions of probability in Appendix I do you prefer? Why? Can you produce a
better one?
2. Is there any fundamental difference between a casino and an insurance company? If so, what is
it? (Do not address moral issues.)
3. You may recall the classic paradox of Buridan's mule. Placed midway between two equally
enticing bales of hay, it starved to death because it had no reason to choose one rather than the
other. Would a knowledge of probability have saved it? (The paradox is rst recorded by
Aristotle.)
4. Suppose a coin showed heads 10 times consecutively. If it looked normal, would you neverthe-
less begin to doubt its fairness?
28 1 Introduction

5. Suppose Alf says his dice are fair, but Bill says they are crooked. They look OK. What would
you do to decide the issue?
6. What do you mean by risk? Many public and personal decisions seem to be based on the
premise that the risks presented by food additives, aircraft disasters, and prescribed drugs are
comparable with the risks presented by smoking, road accidents, and heart disease. In fact the
former group present negligible risks compared with the latter. Is this rational? Is it compre-
hensible? Formulate your own view accurately.
7. What kind of existence does chance have? (Hint: What kind of existence do numbers have?)
8. It has been argued that seemingly chance events are not really random; the uncertainty about
the outcome of the roll of a die is just an expression of our inability to do the mechanical
calculations. This is the deterministic theory. Samuel Johnson remarked that determinism
erodes free will. Do you think you have free will? Does it depend on the existence of chance?
9. `Probability serves to determine our hopes and fears' Laplace. Discuss what Laplace meant
by this.
10. `Probability has nothing to do with an isolated case' A. Markov. What did Markov mean by
saying this? Do you agree?
11. `That the chance of gain is naturally overvalued, we may learn from the universal success of
lotteries' Adam Smith (1776). `If there were no difference between objective and subjective
probabilities, no rational person would play games of chance for money' J. M. Keynes
(1921).
Discuss.
12. A proportion f of $100 bills are forgeries. What is the value to you of a proffered $100 bill?
13. Flip a coin 100 times and record the relative frequency of heads over ve-ip intervals as a
graph.
14. Flip a broad-headed pin 100 times and record the relative frequency of `point up' over ve-ip
intervals.
15. Pick a page of the local residential telephone directory at random. Pick 100 telephone numbers
at random (a column or so). Find the proportion p 2 of numbers whose last digit is odd, and
also the proportion p1 of numbers whose rst digit is odd. (Ignore the area code.) Is there much
difference?
16. Open a book at random and nd the proportion of words in the rst 10 lines that begin with a
vowel. What does this suggest?
17. Show that A if and only if B A B.
18. Show that if A B and B A then A B.
Part A

Probability
2
The rules of probability

Probability serves to dene our hopes and fears.

P. Laplace

2.1 PREVIEW
In the preceding chapter we suggested that a model is needed for probability, and that this
model would take the form of a set of rules. In this chapter we formulate these rules.
When doing this, we shall be guided by the various intuitive ideas of probability as a
relative of proportion that we discussed in Chapter 1. We begin by introducing the
essential vocabulary and notation, including the idea of an event. After some elementary
calculations, we introduce the addition rule, which is fundamental to the whole theory of
probability, and explore some of its consequences.
Most importantly we also introduce and discuss the key concepts of conditional
probability and independence. These are exceptionally useful and powerful ideas and
work together to unlock many of the routes to solving problems in probability. By the end
of this chapter you will be able to tackle a remarkably large proportion of the better-
known problems of chance.

Prerequisites. We shall use the routine methods of elementary algebra, together with
the basic concepts of sets and functions. If you have any doubts about these, refresh your
memory by a glance at appendix II of chapter 1.

2 . 2 N OTAT I O N A N D E X P E R I M E N T S
From everyday experience, you are familiar with many ideas and concepts of probability;
this knowledge is gained by observation of lotteries, board games, sport, the weather,
futures markets, stock exchanges, and so on. You have various ways of discussing these
random phenomena, depending on your personal experience. However, everyday dis-
course is too diffuse and vague for our purposes. We need to become routinely much
more precise. For example, we have been happy to use words such as chance, likelihood,
probability, and so on, more or less interchangeably. In future we shall conne ourselves

31
32 2 The rules of probability

Table 2.1.

Procedure Outcomes
Roll a die One of 1, 2, 3, 4, 5, 6
Run a horse race Some horse wins it, or there is a dead heat (tie)
Buy a lottery ticket Your number either is or is not drawn

to using the word probability. The following are typical statements in this context.
The probability of a head is 12.
The probability of rain is 90%.
The probability of a six is 16.
The probability of a crash is 109 .
Obviously we could write down an endless list of probability statements of this kind; you
should write down a few yourself (exercise). However, we have surely seen enough such
assertions to realize that useful statements about probability can generally be cast into the
following general form:
(1) The probability of A is p:
In the above examples, A was à head', `rain', à six', and à crash'; and p was `12', `90%',
`16', and `109 ' respectively. We use this format so often that, to save ink, wrists, trees, and
time, it is customary to write (1) in the even briefer form
(2) P(A) p:
This is obviously an extremely efcient and compact written representation; it is still
pronounced as `the probability of A is p'. A huge part of probability depends on
equations similar to (2).
Here, the number p denotes the position of this probability on the probability scale
discussed in chapter 1. It is most important to remember that on this scale
(3) 0< p<1
If you ever calculate a probability outside this interval then it must be wrong!
We shall look at any event A, the probability function P(:), and probability P(A) in
detail in the next sections. For the moment we continue this section by noting that
underlying every such probability statement is some procedure or activity with a random
outcome; see table 2.1.
Useful probability statements refer to these outcomes. In everyday parlance this
procedure and the possible outcomes are often implicit. In our new rigorous model this
will not do. Every procedure and its possible outcomes must be completely explicit; we
stress that if you do not follow this rule you will be very likely to make mistakes. (There
are plenty of examples to show this.) To help in this task, we introduce some very
convenient notation and jargon to characterize all such trials, procedures, and actions.

Denition. Any activity or procedure that may give rise to a well-dened set of
outcomes is called an experiment. n
2.2 Notation and experiments 33

Denition. The set of all possible outcomes is denoted by , and called the sample
space. n

The adjective `well-dened' in the rst denition just means that you know what all the
possibilities of the experiment are, and could write them down if challenged to do so.
Prior to the experiment you do not know for sure what the outcome will be; when you
carry out the experiment it yields an outcome called the result. Often this result will have
a specic label such as `heads' or `it rains'. In general, when we are not being specic,
we denote the result of an experiment by . Obviously 2 ; that is to say, the result
always lies in the set of possible outcomes. For example, if is the set of possible
outcomes of a horse race in which Dobbin is a runner, then
fDobbin winsg
In this expression the curly brackets are used to alert you to the fact that what lies inside
them is one (or more) of the possible outcomes.
We conclude with two small but important points. First, any experiment can have many
different sample spaces attached to it.

Example 2.2.1. If you ip a coin twice, you may dene

f HH, HT , TH, TTg:
This lists everything that can arise from the experiment, so no larger sample space can be
more informative. However, you may only be interested in the number of heads shown. In
this case you may dene
f0, 1, 2g
and be quite satised. s

The second point is in a sense complementary to the rst. It is that you have little to
lose by choosing a large enough sample space to be sure of including every possible
outcome, even where some are implausible.

Example. 2.2.2 Suppose you are counting the number of pollen grains captured by a
lter. A suitable sample space is the set of all non-negative integers
f0, 1, 2, 3, . . .g:
Obviously only a nite number of these are possible (since there is only a nite amount of
pollen in existence), but any cut-off point would be unpleasantly arbitrary, and might be
too small. s

Furthermore it makes a calculation much easier if any number is allowed as a

possibility, however unlikely. Otherwise it might be necessary to keep track of this bound
throughout the calculation, a prospect which is both formidable and boring.

Exercise for section 2.2

1. Describe a suitable sample space for the following experiments.
(a) m balls are removed from an urn containing j jet (black) balls and k khaki balls.
34 2 The rules of probability

(b) The number of cars passing over a bridge in one week is counted.
(c) Two players play the best of three sets at tennis.
(d) You deal a poker hand to each of four players.

2.3 EVENTS
Suppose we have some experiment whose outcomes comprise , the sample space. As
we have noted above, the whole point of probability is to say how likely the outcomes are,
either individually or collectively. We therefore make the following denition.

Denition. An event is a subset of the sample space . n

Thus each event comprises one or more possible outcomes . By convention, events are
always denoted by capital letters such as A, B, C, . . . , with or without sufxes, super-
xes, or other adornments such as hats, bars, or stars. Here are a few simple but common
examples.

Example 2.3.1: two dice are rolled

(i) If we record their scores, then the sample space is
f(i, j): 1 < i, j < 6g:
We may be interested in the event A that the rst die is even and the second odd,
which is
A f(i, j): i 2 f2, 4, 6g, j 2 f1, 3, 5gg:
(ii) If we add their scores, then the sample space is
fk: 2 < k < 12g:
The event B that the sum of the scores is 7 or 11 is
B f7, 11g s

Example 2.3.2. Suppose you record the number of days this week on which it rains.
The sample space is
f0, 1, 2, 3, 4, 5, 6, 7g:
One outcome is that it rains on one day,
1 1:
The event that it rains more often than not is
A f4, 5, 6, 7g, s
comprising the outcomes 4, 5, 6, 7.

Example 2.3.3. A doctor weighs a patient to the nearest pound. Then, to be on the
safe side, we may agree that
fi: 0 < i < 20 000g:
Some outcomes here seem unlikely, or even impossible, but we lose little by including
them. Then
C fi: 140 < i < 150g
2.3 Events 35

is the event that the patient weighed between 140 and 150 pounds. s

Example 2.3.4. An urn contains a amber and b buff balls. Of these, c balls are
removed without replacing any in the urn, where
c < minfa, bg a ^ b:
Then is the collection of all possible sequences of a's and b's of length c. We may
dene the event D that the number of a's and b's removed is the same. If c is odd, then
this is the impossible event . s

Since events are sets, we can use all the standard ideas and notation of set theory as
summarized in appendix II of Chapter 1. If the outcome of an experiment lies in the
event A, then A is said to occur, or happen. In this case we have 2 A. We always have
A . If A does not occur, then obviously the complementary event A c must occur,
since lies in one of A or A c.
The notation and ideas of set theory are particularly useful in considering combinations
of events.

Example 2.3.5. Suppose you take one card from a conventional pack. Simple events
include
A the card is an ace,
B the card is red,
C the card is a club:
More interesting events are denoted using the operations of union and intersection. For
example
A \ C the card is the ace of clubs,
A [ B the card is either red or an ace or both:
Of course the card cannot be red and a club, so we have B \ C , where denotes the
impossible event, otherwise known as the empty set. s

This leads to a useful denition.

Denition. Two events A and B are said to be disjoint, incompatible, or mutually

exclusive, if they have no outcome in common. That is to say
A \ B , n
Obviously for any event A it is true that
A \ A c :
Notice that we have tacitly assumed that if we form any combination of events A and B,
such as C A [ B or D A \ B, then C and D are themselves events. This is almost
completely obvious, but the point needs to be stressed: unions, intersections, and
complements of events are themselves events.
36 2 The rules of probability

Table 2.2. Events and notation

Certain event or sample space

Impossible event
The event A occurs A
A does not occur Ac
Both A and B occur A\ B
Either or both of A and B occur A[ B
If A occurs then B occurs AB
A occurs, but not B An B
A and B are disjoint A\ B

Table 2.2 gives a brief summary of how set notation represents events and their
relationships.

Denition. If A1 , A2 , A3 , . .S. , A n , . . . is a collection of events such that

A i \ A j for all i 6 j, and i A i , then the A's are said to form a partition
of . n

As we have remarked above, many important relationships between events are very
simply and attractively demonstrated by means of Venn diagrams. For example, gure 2.1
demonstrates very neatly that
(A [ B) \ C (A \ C) [ (B \ C) and A [ (B \ C) (A [ B) \ (A [ C):

A B

Figure 2.1. Venn diagrams. In the upper gure the shaded area is equal to (A [ B) \ C and
(A \ C) [ (B \ C). In the lower gure the shaded area is equal to A [ (B \ C) and also to
(A [ B) \ (A [ C).
2.4 Probability; elementary calculations 37

Exercises for section 2.3

1. Here is a list of experiments, each with an associated event of interest. In each case write down
a suitable sample space , and dene the event of interest in terms of the elements of .
(a) Two dice are rolled; their sum is 3.
(b) 100 light bulbs are tested; at most 4 are defective.
(c) A family comprises 3 children; they are all of the same sex.
(d) The height of a tree is recorded to the nearest metre; it is between 10 and 15 metres high.
(e) Rod and Fred play the best of three sets at tennis; Rod wins.
(f) Tom and Jack are weighed to the nearest pound; Jack is heavier.
2. Show that for any events A and B
(a) (A [ B) c A c \ Bc
(b) A \ (B [ C) (A \ B) [ (A \ C):

2 . 4 P RO BA B I L I T Y; E L E M E N TA RY C A L C U L AT I O N S
Now that we have dened events, we can discuss their probabilities. Suppose some
experiment has outcomes that comprise , and A is an event in . Then the probability
that A occurs is denoted by P(A), where of course
(1) 0 < P(A) < 1:
Thus we can think of P as a rule that assigns a number P(A) 2 [0, 1] to each event A in
. The mathematical term for a rule like this is a function, as we discussed in appendix II
of chapter 1. Thus P() is a function of the events in , which takes values in the interval
[0, 1]. Before looking at the general properties of P, it seems sensible to gain experience
by looking at some simple special cases that are either familiar, or obvious, or both.

Example 2.4.1: Bernoulli trial. Many experiments have just two outcomes, for
example: head or tail; even or odd; win or lose; y or crash; and so on. To simplify
matters these are often thought of as examples of an experiment with the two outcomes
success or failure, denoted by S and F respectively. Then for the events S and F we write
P(S ) p and P(F) q 1 p: s

Example 2.4.2: equally likely outcomes. Suppose that an experiment has sample
space , such that each of the jj outcomes in is equally likely. This may be due to
some physical symmetry or the conditions of the experiment. Now let A be any event; the
number of outcomes in A is jAj. The equidistribution of probability among the outcomes
implies that the probability that A occurs should be proportional to jAj (we discussed this
at length in section 1.4). In cases of this type we therefore write
jAj number of outcomes in A
(2) P(A) : s
jj number of outcomes in
This is a large assumption, but it is very natural and compelling. It is so intuitively
attractive that it was being used explicitly in the 16th century, and it is clearly implicit in
the ideas and writings of several earlier mathematicians. Let us consider some further
examples of this idea in action.
38 2 The rules of probability

Example 2.4.3. Two dice are rolled, one after the other. Let A be the event that the
second number is greater than the rst. Here
jj 36
as there are 6 3 6 pairs of the form (i, j), with 1 < i, j < 6. The pairs with i , j are
given by
A f(5, 6), (4, 5), (4, 6), (3, 4), . . . , (1, 5), (1, 6)g;
it is easy to see that jAj 1 2 5 15. Hence
15 5
P(A) : s
36 12

Example 2.4.4. If a coin is ipped n times, what is the probability that each time the
same face shows? Here is the set of all possible sequences of H and T of length n,
which we regard as equally likely. Hence jj 2 n . The event A comprises all heads or
all tails. Hence jAj 2 and the required probability is
2
2( n1) : s
2n

Example 2.4.5: chain. Suppose you are testing a chain to destruction. It has n links
and is stretched between two shackles attached to a ram. The ram places the chain under
increasing tension until a link fails. Any link is equally likely to be the one that snaps,
and so if A is any group of links the probability that the failed link is in A is the
proportion of the total number of links in A. Now jj n, so
jAj
P(A) : s
n

Example 2.4.6: lottery. Suppose you have an urn containing 20 tickets marked 1, 2,
. . . , 20. A ticket is drawn at random. Thus
f1, 2, . . . , 20g fn: 1 < n < 20g:
Events in may include:
A the number drawn is even;
B the number drawn is divisible by 5;
C the number drawn is less than 8:
The implication of the words `at random' is that any number is equally likely to be
chosen. In this case our discussions above yield
jAj 10 1
P(A) ,
jj 20 2
jBj 4 1
P(B) ,
jj 20 5
and
jCj 7
P(C) : s
jj 20
2.4 Probability; elementary calculations 39

Here is a famous example.

Example 2.4.7: dice. Three dice are rolled and their scores added. Are you more
likely to get 9 than 10, or vice versa?

Solution. There are 63 216 possible outcomes of this experiment, as each die has
six possible faces. You get a sum of 9 with outcomes such as (1, 2, 6), (2, 1, 6), (3, 3, 3)
and so on. Tedious enumeration reveals that there are 25 such triples, so
25
P(9) :
216
A similar tedious enumeration shows that there are 27 triples, such as (1, 3, 6), (2, 4, 4)
and so on, that sum to 10. So
27
P(10) . P(9):
216
This problem was solved by Galileo before 1642. s

An interesting point about this example is that if your sample space does not distinguish
between outcomes such as (1, 2, 6) and (2, 1, 6), then the possible sums 9 and 10 can
each be obtained in the same number of different ways, namely six. However, actual
experiments with real dice demonstrate that this alternative model is wrong.
Notice that symmetry is quite a powerful concept, and implies more than at rst
appears.

Example 2.4.8. You deal a poker hand of ve cards face down. Now pick up the fth
and last card dealt; what is the probability that it is an ace? The answer is
1
P(A) :
13
Sometimes it is objected that the answer should depend on the rst four cards, but of
course if these are still face down they cannot affect the probability we want. By
1
symmetry any card has probability 13 of being an ace; it makes no difference whether the
pack is dealt out or kept together, as long as only one card is actually inspected. s
Our intuitive notions of symmetry and fairness enable us to assign probabilities in
some other natural and appealing situations.

Example 2.4.9: rope. Suppose you are testing a rope to destruction: a ram places it
under increasing tension until it snaps at a point S, say. Of course we suppose that the
rope appeared uniformly sound before the test, so the failure point S is equally likely to
be any point of the rope. If the rope is of length 10 metres, say, then the probability that it
2
fails within 1 metre of each end is naturally P(S lies in those 2 metres) 10 15. Like-
wise the probability that S lies in any 2 metre length of rope is 15, as is the probability that
S lies in any 2 metres of the rope, however this 2 metres is made up. s

Example 2.4.10: meteorite. Leaving your house one morning at 8 a.m. you nd that
a meteorite has struck your car. When did it do so? Obviously meteorites take no account
of our time, so the time T of impact is equally likely to be any time between your leaving
40 2 The rules of probability

the car and returning to it. If this interval was 10 hours, say, then the probability that the
1
meteorite fell between 1 a.m. and 2 a.m. is 10 . s

These are special cases of experiments in which the outcome is equally likely to be any
point of some interval [a, b], which may be in time or space.
We can dene a probability function for this kind of experiment quite naturally as
follows. The sample space is the interval [a, b] of length b a. Now let A be any
interval in of length l jAj. Then the equidistribution of probability among the
outcomes in implies that the probability of the outcome being in A is
jAj l
P(A) :
jj b a

Example 2.4.9. revisited: rope. What is the probability that the rope fails nearer to
the xed point of the ram than the moving point? There are 5 metres nearer the xed
5
point, so this probability is 10 12. s

This argument is equally natural and appealing for points picked at random in regions
of the plane. For deniteness, let be some region of the plane with area jj. Let A be
some region in of area jAj. Suppose a point P is picked at random in , with any point
equally likely to be chosen. Then the equidistribution of probability implies that the
probability of P being in A is
jAj
P(A) :
jj

Example 2.4.10. The garden of your house is the region , with area 100 square
metres; it contains a small circular pond A of radius 1 metre. You are telephoned by a
neighbour who tells you your garden has been struck by a small meteorite. What is the
probability that it hit the pond? Obviously, by everything we have said above
jAj
P(hits pond) : s
jj 100

We return to problems of this type later on.

Let us conclude this section with an example which demonstrates a simple but
important point.

Example 2.4.11: probabilistic equivalence. Consider the following three experi-

ments.
(i) A fair die is rolled, and the number shown is noted.
(ii) An urn contains 5 alabaster balls, and 1 beryl ball. They are removed until the beryl
ball appears, and the number removed is noted.
(iii) Six different numbers are drawn for a lottery and the position in which the smallest
of them is drawn is noted.
In each of these the sample space can be written
f1, 2, 3, 4, 5, 6g,
2.5 The addition rules 41

and by construction, in all three cases,

1
P( i ) , 1 < i < 6:
6
These experiments are probabilistically equivalent, since and P() are essentially the
same for each. s

Since it makes no difference which experiment we use to yield these probabilities, there
is something to be said for having a standard format to present the great variety of
probability problems. For reasons of tradition, urns are often used.
Thus example 2.4.11(ii) would be a standard presentation of the probabilities above.
There are three main reasons for this. The rst is that urns are often useful in situations
too complicated to be readily modelled by coins and dice. The second reason is that using
urns (instead of more realistic descriptions) enables the student to see the probabilistic
problems without being confused by false intuition. The third reason is historical: urns
were widely used in conducting lotteries and elections (in both cases because they are
opaque, thus preventing cheating in the rst place and allowing anonymous voting in the
second). It was therefore natural for early probabilists to use urns as models of real
random behaviour.

Exercises for section 2.4

1. You make one spin of a roulette wheel. (Assume your wheel has 37 pockets numbered from 0 to
36 inclusive.) What is the probability that the outcome is odd?
2. You take two cards at random from a pack of 52. What is the probability that both are aces?
3. You pick a point P at random in a triangle ABC. What is the probability that is in ABD, where
D lies in the side BC?
4. You pick a point at random in a circle. What is the probability that it is nearer to the centre than
to the perimeter?
5. An urn contains a amber balls and b beryl balls. You remove them one by one. What is the
probability that the last but one ball is beryl? (Hint: No one said you looked at them as you
removed them.)

2 . 5 T H E A D D I T I O N RU L E S
Of course not all experiments have equally likely outcomes, so we need to x rules that
tell us about the properties of the probability function P, in general. Naturally we continue
to require that for any event A
(1) 0 < P(A) < 1,
and in particular the certain event has probability 1, so
(2) P() 1:
The most important rule is the following.

Addition rule. If A and B are disjoint events, then

(3) P(A [ B) P(A) P(B):
42 2 The rules of probability

This rule lies at the heart of probability. First let us note that we need such a rule, because
A [ B is an event when A and B are events, and we therefore need to know its probability.
Second, note that it follows from (3) (by induction) that if A1 , A2 , . . . , A n is any
collection of disjoint events then
S n
(4) P i1 A i P(A1 ) P(A n ):
The proof forms exercise 3, at the end of this section.
Third, note that it is sometimes too restrictive to conne ourselves to a nite collection
of events (we have seen several sample spaces with innitely many outcomes), and we
therefore need an extended version of (4).

Extended addition rule. If A1 , A2 , . . . is a collection of disjoint events then

(5) P(A1 [ A2 [ ) P(A1 ) P(A2 ) :
Equation (5) together with (1) and (2),
0 < P(A) < 1 and P() 1,
are sometimes said to be the axioms of probability. They describe the behaviour of the
probability function P dened on subsets of . In fact, in everyday usage P is not referred
to as a probability function but as a probability distribution. Formally we state the
following.

Denition. Let be a sample space and suppose that P() is a probability function
on a family of subsets of satisfying (1), (2), and (5). Then P is called a probability
distribution on . n

The word distribution is used because it is natural to think of probability as something

that is distributed over the outcomes in . The function P tells you just how it is
distributed. In this respect probability behaves like distributed mass, and indeed in many
books authors do speak of a unit of probability mass being distributed over the sample
space, and refer to P as a probability mass function. This metaphor can be a useful aid to
intuition because, of course, mass obeys exactly the same addition rule. If two distinct
objects A and B have respective masses m(A) and m(B), then the mass of their union
m(A [ B) satises
m(A [ B) m(A) m(B):
Of course mass is non-negative also, which reinforces the analogy.
We conclude this section by showing how the addition rule is consistent with, and
suggested by, all our interpretations of probability as a proportion.
First, consider an experiment with equally likely outcomes, for which we dened
probability as the proportion
jAj
P(A) :
jj
If A and B are disjoint then, trivially,
jA [ Bj jAj jBj:
Hence in this case
2.5 The addition rules 43

jA [ Bj jAj jBj
P(A [ B) P(A) P(B):
jj jj jj
Second, consider the interpretation of probability as reecting relative frequency in the
long run. Suppose an experiment is repeated N times. At each repetition, events A and B
may, or may not, occur. If they are disjoint, they cannot both occur at the same repetition.
We argued in section 1.8 that the relative frequency of any event should be not too far
from its probability. Indeed, it is often the case that the relative frequency N (A)=N of an
event A is the only available guide to its probability P(A). Now, clearly
N (A [ B) N (A) N (B):
Hence, dividing by N, there is a powerful suggestion that we should have
P(A [ B) P(A) P(B):
Third, consider probability as a measure of expected value. For this case we resurrect
the benevolent plutocrat who is determined to give away $1 at random. The events A and
B are disjoint; if A occurs you get $1 in your left hand, if B occurs you get $1 in your
right hand. If (A [ B) c occurs, then Bob gets $1. The value of this offer to you is
$P(A [ B), the value to your left hand is $P(A), and the value to your right hand is $P(B).
But obviously it does not matter in which hand you get the money, so
P(A [ B) P(A) P(B):
Finally, consider the case where we imagine a point is picked at random anywhere in
some plane region of area jj. If A , we dened
jAj
P(A) :
jj
Since area also satises the addition rule, we have immediately, when A \ B , that
P(A [ B) P(A) P(B):
It is interesting and important to note that in this case the analogy with mass requires the
unit probability mass to be distributed uniformly over the region . We can envisage this
distribution as a lamina of uniform density jj1 having total mass unity. This may seem
a bizarre thing to imagine, but it turns out to be useful later on.
In conclusion, it seems that the addition rule is natural and compelling in every case
where we have any insight into the behaviour of probability. Of course it is a big step to
say that it should apply to probability in every other case, but it seems inevitable. And
doing so has led to remarkably elegant and accurate descriptions of the real world.

Exercises for section 2.5

1. Bernoulli trial. Show that for a Bernoulli trial P(S ) P(F) 1.
2. Why is the extended addition rule not necessary when is nite?
3. Show that (4) follows from (3).
4. Show that P(A \ B) < minfP(A), P(B)g.
5. Show that for any events A and B, P(A [ B) < P(A) P(B).
44 2 The rules of probability

2.6 SIMPLE CONSEQUENCES

We have agreed that the probability distribution P satises
(1) P(A) > 0, for any A,
(2) P() 1,
(3) P(A [ B) P(A) P(B), when A \ B :
These rules are simple enough, but they have remarkably extensive consequences. Let us
look at some simple applications and deductions.

Complements. We know that for any event A, A [ A c and A \ A c . Hence

by (3) and (2), and as illustrated in gure 2.2,
1 P() P(A [ A c ) P(A) P(A c ):
Thus the probability that A does not occur is
(4) P(A c ) 1 P(A):
In particular, for the impossible event ,
P() P( c ) 1 P() 0:

c
Figure 2.2. P(A) P(A ) P() 1.

It is very pleasant to see this consistency with our intuitive probability scale. Note,
however, that the converse is not true, that is, P(A) 0 does not imply that A , as we
now see in an example.

Example 2.6.1. Pick a point at random in the unit square, say, and let A be the event
that this point lies on a diagonal of the square. As we have seen above,
jAj
P(A) jAj,
jj
where jAj is the area of the diagonal. But lines have zero area; so P(A) 0, even though
the event A is clearly not impossible. s

Let us consider some examples of the complement rule (4).

Example 2.6.2. A die is rolled. How many rolls do you need, to have a better than
evens chance of rolling at least one six?
2.6 Simple consequences 45

Solution. If you roll a die r times, there are 6 r possible outcomes. On each roll there
are 5 faces that do not show a six, and there are therefore in all 5 r outcomes with no six.
Hence P(no six in r rolls) 5 r =6 r : Hence, by (4),

P(at least one six) 1 P(no six) 1 56 r :

For this to be better than evens, we need r large enough that 1 56 r . 12. A short
calculation shows that r 4. s

Example 2.6.3: de Mere's problem. Which of these two events is more likely?
(i) Four rolls of a die yield at least one six.
(ii) Twenty-four rolls of two dice yield at least one (6, 6), i.e. double six.

Solution. Let A denote the rst event and B the second event. Then A c is the event
that no six is shown. There are 64 equally likely outcomes, and 54 of these show no six.
Hence by (4)

P(A) 1 P(A c ) 1 56 4 :
Likewise
3524
P(B) 1 P(Bc ) 1 36 :
Now after a little calculation we nd that
671 1
1296 P(A) . 2 . P(B) ' [Link]
So the rst event is more likely. s

Difference rule. More generally we have the following rule for differences. Suppose
that B A. Then
A B [ (Bc \ A) B [ (A n B) and B \ (Bc \ A) :
Hence
P(A) P(B) P(Bc \ A)
and so
(5) P(A n B) P(A) P(B), if B A:
Figure 2.3 almost makes this argument unnecessary.

A\B

Figure 2.3. When B A, then P(AnB) P(A) P(B).

46 2 The rules of probability

Of course many events are not disjoint. What can we say of P(A [ B) when
A \ B 6 ? The answer is given by the following rule.

Inclusionexclusion rule. This says that, for any events A and B, the probability that
either occurs is given by
(6) P(A [ B) P(A) P(B) P(A \ B):

Proof. Using the addition rule three times shows that

P(A) P(A \ B) P(A \ Bc ),
P(B) P(A \ B) P(B \ A c ),
P(A [ B) P(A \ B) P(A \ Bc ) P(B \ A c ):
(Drawing the Venn diagram in Figure 2.4 makes these relations obvious.) Now, adding
the rst two equalities, and then subtracting that from the third, gives the result. h

Let us consider some examples of this.

Example 2.6.4: wet and windy. From meteorological records it is known that for a
certain island at its winter solstice, it is wet with probability 30%, windy with probability
40%, and both wet and windy with probability 20%.
Using the above rules we can nd the probability of other events of interest. For
example:
(i) P(dry) P(not wet) 1 0:3 0:7, by (4);
(ii) P(dry and windy) P(windynwet) P(windy) P(wet and windy) 0:2, by (5);
(iii) P(wet or windy) 0:4 0:3 0:2 0:5, by (6). s

Example 2.6.5: alarms. A kitchen contains two re alarms; one is activated by

smoke and the other by heat. Experience has shown that the probability of the smoke
alarm sounding within one minute of a re starting is 0.95, the probability of the heat
alarm sounding within one minute of a re starting is 0.91, and the probability of both
sounding within a minute is 0.88. What is the probability of at least one alarm sounding
within a minute?

Figure 2.4. It can be seen that P(A [ B) P(A) P(B) P(A \ B).
2.7 Conditional probability; multiplication rule 47

Solution. We have, with an obvious notation,

P( H [ S ) P( H) P(S ) P( H \ S )
0:91 0:95 0:88
[Link] s
The inclusionexclusion rule can be extended to cover more than two events (see the
exercises and problems), but expressions become long and tedious. Sometimes it is
enough to have bounds for probabilities; here is a famous example.

Boole's inequality. For any events A1 , . . . , A n , we have

Xn
(7) P(A1 [ A2 [ [ A n ) < P(A r ):
r1

Proof. From (6) this is obviously true for n 2. Suppose it is true for some n > 2;
then
X
n1
P(A1 [ A2 [ [ A n1 ) < P(A1 [ A2 [ [ A n ) P(A n1 ) < P(A r ):
r1
The result follows by induction. h

Exercises for section 2.6

1. Suppose that A B. Show that P(A) < P(B).

2. Wet, windy and warm. Show that for any events A (wet), B (windy), and C (warm),
P(A [ B [ C) P(A) P(B) P(C) P(A \ B) P(B \ C) P(C \ A) P(A \ B \ C):
3. Two dice are rolled. How many rolls do you need for a better than evens chance of at least one
double six?

4. Galileo's problem (example 2.4.7) revisited. Let S k be the event that the sum of the three
dice is k. Find P(S k ) for all k.

5. Pepys' problem (1693)

(a) Find the probability that at least 1 six is shown when 6 dice are rolled.
(b) Find the probability that at least 2 sixes are shown when 12 dice are rolled, and compare
with the answer to (a).

Remark. Pepys put this problem to Isaac Newton, but was then very reluctant to accept
Newton's (correct) answer.
6. Show that the probability that exactly one of the events A or B occurs is P(A)
P(B) 2P(A \ B).

2 . 7 C O N D I T I O NA L P RO BA B I L I T Y; M U LT I P L I C AT I O N RU L E
In real life very few experiments amount to just one action with random outcomes; they
usually have a more complicated structure in which some results may be known before
the experiment is complete. Or conditions may change. We need a new rule to add to
48 2 The rules of probability

Table 2.3. Outcomes of 700 treatments grouped by size of

stone

Outcomes

Size success (S ) failure total

small 315 42 357
large (L) 247 96 343
total 562 138 700

those given in section 2.5; it is called the conditioning rule. Before we give the rule, here
are some examples.

Example 2.7.1. Suppose you are about to roll two dice, one from each hand. What is
the probability that your right-hand die shows a larger number than the left-hand die?
There are 36 outcomes, and in 15 of these the right-hand score is larger. So
(1) P(right-hand larger) 15
36

Now suppose you roll the left-hand die rst, and it shows 5. What is the probability that
the right-hand die shows more? It is clearly not 15
36. In fact only one outcome will do: it
must show 6. So the required probability is 16. s

This is a special case of the general observation that if conditions change then results
change. In particular, if the conditions under which an experiment takes place are altered,
then the probabilities of the various outcomes may be altered. Here is another illustration.

Example 2.7.2: stones. Kidney stones are either small, (i.e. , 2 cm diameter) or
large, (i.e. . 2 cm diameter). Treatment can either succeed or fail. For a sequence of 700
patients the stone sizes and outcomes were as shown in table 2.3. Let L denote the event
that the stone treated is large. Then, clearly, for a patient selected at random from the 700
patients,
(2) P(L) 343
700:
Also, for a patient picked at random from the 700, the probability of success is
(3) P(S ) 562
700 ' 0:8

However, suppose a patient is picked at random from those whose stone is large. The
probability of success is different from that given in (3); it is obviously
247
343 ' [Link]
That is to say, the probability of success given that the stone is large is different from the
probability of success given no knowledge of the stone.
This is natural and obvious, but it is most important and useful to have a distinct
notation, in order to keep the conditions of an experiment clear and explicit in our minds
2.7 Conditional probability; multiplication rule 49

and working. We therefore denote the probability of S given L by

(4) P(SjL) ' 0:72
and refer to P(SjL) as the conditional probability of success, given L.
Likewise, we could imagine selecting a patient at random from those whose treatment
was successful, and asking for the probability that such a patient's stone had been large.
This is the conditional probability of L given S, and, from table 2.1, we easily see that
(5) P(LjS ) 247
562 ' 0:44 s
After this preamble, we can now usefully state the rule governing conditional prob-
ability.

Conditioning rule. Let A and B be events. Then the conditional probability of A

given that B occurs is dened as
P(A \ B)
(6) P(AjB)
P(B)
whenever P(B) . 0.

This may seem a little arbitrary, but it is strongly motivated by our interpretation of
probability as an extension of proportion. We may run through the usual examples in the
usual way.
First, consider an experiment with equally likely outcomes, for which
jAj jBj
P(A) and P(B) :
jj jj
Given simply that B occurs, all the outcomes in B are still equally likely. Essentially, we
now have an experiment with jBj equally likely outcomes, in which A occurs if and only
if A \ B occurs. Hence under these conditions
jA \ Bj
P(AjB) :
jBj
But

jA \ Bj jA \ Bj jBj P(A \ B)
,
jBj jj jj P(B)
which is what (6) says.
Second, we consider the argument from relative frequency. Suppose an experiment is
repeated a large number n of times, yielding the event B on N(B) occasions. Given that B
occurs, we may conne our attention to these N(B) repetitions. Now A occurs in just
N(A \ B) of these, and so empirically
N(A \ B) N(A \ B) n
P(AjB) '
N(B) n N(B)
P(A \ B)
'
P(B)
which is consistent with (6).
Third, we return to the interpretation of probability as a fair price. Once again a
plutocrat offers me a dollar. In this case I get the dollar only if both the events A and B
50 2 The rules of probability

occur, so this offer is worth $P(A \ B) to me. But we can also take it in stages: suppose
that if B occurs, the dollar bill is placed on a table, and if A then occurs the bill is mine.
Then
(i) the value of what will be on the table is $P(B),
(ii) the value of a dollar bill on the table is $P(AjB).
The value of the offer is the same, whether or not the dollar bill has rested on the table, so
P(A \ B) P(AjB)P(B),
which is (6) yet again.
Finally consider the experiment in which a point is picked at random in some plane
region . For any regions A and B, if we are given that B occurs then A can occur if and
only if A \ B occurs. Naturally then it is reasonable to require that P(AjB) is proportional
to jA \ Bj, the area of A \ B. That is, for some k,
P(AjB) kP(A \ B):
Now we observe that obviously P(jB) 1, so
1 kP( \ B) kP(B),
as required. Figure 2.5 illustrates the rule (6).
Let us see how this rule applies to the simple examples at the beginning of this section.

Example 2.7.1 revisited. According to the rule,

P(right shows 6 and left shows 5)
P(right die largerjleft shows 5)
P(left shows 5)

1 1 1

36 6 6
as we saw in (1). s

Example 2.7.2 revisited. According to the rule,

P(S \ L) 247 343
P(SjL) ,
P(L) 700 700

AB AB

B B

Figure 2.5. The left-hand diagram shows all possible outcomes . The right-hand diagram
corresponds to our knowing that the outcome must lie in B; P(AjB) is thus the proportion
P(A \ B)=P(B) of these possible outcomes.
2.7 Conditional probability; multiplication rule 51

which is (4), and

P(L \ S ) 247 562
P(LjS ) ,
P(S ) 700 700
which is (5). s

Here are some fresh applications.

Example 2.7.3. A coin is ipped three times. Let A be the event that the rst ip
gives a head, and B the event that there are exactly two heads overall. We know that
f HHH , HHT , HTH , THH , TTH, THT , HTT , TTT g
A f HTT , HHT , HHH , HTH g
B f HHT , HTH, THHg
A \ B f HHT , HTHg:
Hence by (6),
P(A \ B) jA \ Bj jj 2
P(AjB)
P(B) jj jBj 3
and
P(A \ B) 1
P(BjA) 2: s
P(A)

It is particularly important to be careful and painstaking in using conditional prob-

ability. This may seem a trite remark, but experience has shown that students are prone to
attempt intuitive short cuts leading to wrong answers. Here is an example that demon-
strates this.

Example 2.7.4. A box contains a double-headed coin, a double-tailed coin and a

conventional coin. A coin is picked at random and ipped. It shows a head. What is the
probability that it is the double-headed coin?

Solution. The three coins have 6 faces, so jj 6.

Let D be the event that the coin is double-headed, and A the event that it shows a head.
Then 3 faces yield A, so
P(A) 12
Two faces yield A and D (as it can be either way up) so
P(A \ D) 13
Finally
P(A \ D) 2
P(DjA) 3: s
P(A)

The point of this example is that many people are prepared to argue as follows: `If the
coin shows a head, it is either double-headed or the conventional coin. Since the coin was
picked at random, these are equally likely, so P(DjA) 12'.
52 2 The rules of probability

This is supercially plausible but, as we have seen, it is totally wrong. Notice that this
bogus argument avoids mentioning the sample space ; this is very typical of false
reasoning in probability problems.
We conclude this section by looking at (6) again. It assumes that we know P(A \ B) and
P(B), and denes P(AjB). However, in applications we quite often know P(B) and P(AjB),
and wish to know P(A \ B). In cases like this, we use the following reformulation of (6).

Multiplication rule. For any events A and B

(7) P(A \ B) P(AjB)P(B):
Notice that we can avoid the tiresome reservation that P(B) 6 0 in this version; when
P(B) 0 we just assign the value zero to both sides. This is not a very interesting case, of
course.
Here is an example of (7) in use.

Example 2.7.5: socks. A box contains 5 red socks and 3 blue socks. If you remove 2
socks at random, what is the probability that you are holding a blue pair?

Solution. Let B be the event that the rst sock is blue, and A the event that you have a
pair of blue socks. If you have one blue sock, the probability that the second is blue is the
chance of drawing one of the 2 remaining blues from the 7 remaining socks. That is to
say
P(AjB) 27:
Here A A \ B and so, by (7),
P(A) P(AjB)P(B)
27 3 38 28
3
:
Of course you could do this problem by enumerating the entire sample space for the
experiment, but the above method is much easier. s

The multiplication rule can be usefully extended, thus.

Extended multiplication rule. Let A1 , A2 , . . . , A n be any events such that

(8) P(A1 \ A2 \ \ A n ) . 0:
then we have the extended rule
P(A1 \ A2 \ \ A n ) P(A n jA n1 \ \ A1 )
3 P(A n1 jA n2 \ \ A1 )
..
.
3 P(A2 jA1 )P(A1 ):
This follows immediately if we write, for all k,
P(A k \ \ A1 )
P(A k jA k1 \ \ A1 ) :
P(A k1 \ \ A1 )
Then successive cancellation on the right-hand side gives the result required. s
2.7 Conditional probability; multiplication rule 53

Finally let us stress that conditional probability is not in any way an unnatural or purely
theoretical concept. It is completely familiar and natural to you if you have ever bought
insurance, played golf, or observed horse racing, to choose just three examples of the
myriad available. Thus:

Insurance. Insurance companies require large premiums from young drivers to

insure a car, because they know that P(claim|young driver) P(claim). For similarly
obvious reasons, older customers must pay more for life insurance.

Golf. If you play against the Open Champion then P(you win) ' 0. However, given a
sufciently large number of strokes it can be arranged that P(you winjhandicap) ' 12.
Thus any two players can have a roughly even contest.

Horse races. Similarly any horse race can be made into a much more open contest by
requiring faster horses to carry additional weights. Much of the betting industry relies on
the judgement of the handicappers in doing this.
The objective of the handicapper in choosing the weights is to equalize the chances to
some extent and introduce more uncertainty into the result. The ante-post odds reect the
bookmakers' assessment of how far he has succeeded, and the starting prices reect the
gambler's assessment of the position. (See section 2.12 for an introduction to odds.)
Of course this is not the limit to possible conditions; if it rains heavily before a race
then the odds will change to favour horses that run well in heavy conditions. And so on.
Clearly this idea of conditional probability is relevant in almost any experiment; you
should think of some more examples (exercise).

Exercises for section 2.7

1. On any day the chance of rain is 25%. The chance of rain on two consecutive days is 10%.
Given that it is raining today, what is the chance of rain tomorrow? Given that it will rain
tomorrow, what is the chance of rain today?
2. Show that the conditional probability P(AjB) satises the three axioms of probability:
(a) 0 < P(AjB) < 1,
(b) P(jB) 1,
(c) P(A1 [ A2 jB) P(A1 jB) P(A2 jB)
when A1 and A2 are disjoint. Show also that P(AjB) 1 P(A c jB).
3. Extended multiplication. Show that
P(A \ B \ C) P(AjB \ C)P(BjC)P(C):
4. Your kit bag contains 15 wrist bands, of which 6 are blue, 5 are red and 4 are green. You pick 3
at random. What is the probability that they are all (a) red? (b) the same colour?
5. Show that
P(A \ BjA [ B) < minfP(A \ BjA), P(A \ BjB)g:
6. The prosecutor's fallacy. Let G be the event that some accused person is guilty, and T the event
that some testimony or evidence presented is in fact true. It has been known for lawyers to argue
on the assumption that P(GjT ) P(T jG). Show that this holds if and only if P(G) P(T ).
54 2 The rules of probability

2 . 8 T H E PA RT I T I O N RU L E A N D BAY E S ' RU L E
In this section we look at some of the simple consequences of our denition of
conditional probability. The rst and most important thing to show is that conditional
probability satises the rules for a probability function (otherwise its name would be very
misleading).

Conditional probability is a probability

First, for any A and B,
0 < P(A \ B) < P(B)
and so
(1) 0 < P(AjB) < 1:
Second, for any B,
P(jB) P( \ B)=P(B) 1:
Third, if A1 and A2 are disjoint then A1 \ B and A2 \ B are disjoint. Hence
(2) P(A1 [ A2 jB) fP(A1 \ B) P(A2 \ B)g=P(B)
P(A1 jB) P(A2 jB):
Next we consider some very important applications of conditional probability. Let us
recall the multiplication rule for any events A and B,
(3) P(A \ B) P(AjB)P(B):
Since this holds true for any pair of events, it is also true for A and Bc that
(4) P(A \ Bc ) P(AjBc )P(Bc ):
But B and Bc are disjoint events, so by the addition rule
(5) P(A \ B) P(A \ Bc ) P((A \ B) [ (A \ Bc ))
P(A):
Combining (3), (4), and (5) gives the extremely important

Partition rule
(6) P(A) P(AjB)P(B) P(AjBc )P(Bc ):
This has a conditional form as well: for any three events A, B, and C, we have the

Conditional partition rule

(7) P(AjC) P(AjB \ C)P(BjC) P(AjBc \ C)P(Bc jC):
The proof of this is exercise 4 at the end of the section.
The partition rule may not seem very impressive at rst sight, but it is in fact one of the
most important and frequently used results in probability. Here are some examples of its
use.

Example 2.8.1: pirates. An expensive electronic toy made by Acme Gadgets Inc. is
defective with probability 103. These toys are so popular that they are copied and sold
2.8 The partition rule and Bayes' rule 55

illegally but cheaply. Pirate versions capture 10% of the market, and any pirated copy is
defective with probability 12. If you buy a toy, what is the chance that it is defective?

Solution. Let A be the event that you buy a genuine article, and let D be the event
that your purchase is defective. We know that
9
P(A) 10 , P(A c ) 10
1
, 1
P(DjA) 1000 , P(DjA c ) 12 :
Hence, by the partition rule,
P(D) P(DjA)P(A) P(DjA c )P(A c )
10 9000 20
1
' 5%: s

Example 2.8.2: tests. In a population of individuals a proportion p are subject to a

disease. A test is devised to indicate whether any given individual does have the disease;
such an indication is called a positive test. No test is perfect, and in this case the
probability that the test is positive for an individual with the disease is 95%, and the
probability of a positive result for an individual who does not have the disease is 10%. If
you test a randomly selected individual, what is the chance of a positive result?

Solution. Let R denote the event that the result is positive, and D the event that the
individual has the disease. Then by (6)
P(R) P(RjD)P(D) P(RjDc )P(Dc )
0:95p 0:1(1 p)
0:85p [Link]
For a test as bad as this you will get a lot of positive results even if the disease is rare; if it
is rare, most of these will be false positives. s

Example 2.8.3. Patients may be treated with any one of a number of drugs, each of
which may give rise to side effects. A certain drug C has a 99% success rate in the
absence of side effects, and side effects only arise in 5% of cases. However, if they do
arise then C has only a 30% success rate. If C is used, what is the probability of the event
A that a cure is effected?

Solution. Let B be the event that no side effects occur. We are given that
99
P(AjB \ C) 100 ,
95
P(BjC) 100 ,

P(A c jBc \ C) 100

30
,

P(Bc jC) 100

5
:
Hence, by the conditional partition rule (7),
56 2 The rules of probability
99 95 30 5 9555
P(AjC) 100 3 100 100 3 100 10000 ' 95%: s

Of course many populations can be divided into more than two groups, and many
experiments yield an arbitrary number of events. This requires a more general version of
the partition rule.

Extended partition rule. Let A be some event, and suppose that (Bi ; i > 1) is a
collection of events such that.
[
A B1 [ B2 [ Bi ,
i
and, for i 6 j, Bi \ B j , that is to say, the Bi are disjoint. Then, by the extended
addition rule (5) of section 2.5,
(8) P(A) P(A \ B1 ) P(A \ B2 )
X
P(A \ Bi )
i
X
P(AjBi )P(Bi ):
i
This is the extended partition rule. Its conditional form is
X
(9) P(AjC) P(AjBi \ C)P(Bi jC):
i

Example 2.8.4: coins. You have 3 double-headed coins, 1 double-tailed coin and 5
normal coins. You select one coin at random and ip it. What is the probability that it
shows a head?

Solution. Let D, T , and N denote the events that the coin you select is double-headed,
double-tailed or normal, respectively. Then, if H is the event that the coin shows a head,
by conditional probability we have
P( H) P( HjD)P(D) P( HjT )P(T) P( HjN )P(N )
1 3 39 0 3 19 12 3 59 11
18: s

Obviously the list of examples demonstrating the partition rule could be extended
indenitely; it is a crucial result. Now let us consider the examples given above from
another point of view.

Example 2.8.1 revisited: pirates. Typically, we are prompted to consider this problem
when our toy proves to be defective. In this case we wish to know if it is an authentic
product of Acme Gadgets Inc., in which case we will be able to get a replacement. Pirates,
of course, are famous for not paying compensation. In fact we really want to know
P(AjD), which is an upper bound for the chance that you get a replacement. s
2.8 The partition rule and Bayes' rule 57

Example 2.8.2 revisited: tests. Once again, for the individual the most important
question is, given a positive result do you indeed suffer the disease? That is, what is
P(DjR)? s

Of course these questions are straightforward to answer by conditional probability,

since
P(AjB) P(A \ B)=P(B):
The point is that in problems of this kind we are usually given P(BjA) and P(BjA c ).
Expanding the denominator P(B) by the partition rule gives an important result:

Bayes' rule
P(BjA)P(A)
(10) P(AjB) :
P(BjA)P(A) P(BjA c )P(A c )
Here are some applications of this famous rule or theorem.

Example 2.8.2 continued: false positives. Now we can answer the question posed
above: in the context of this test, what is P(DjR)?

Solution. By Bayes' rule,

P(RjD)P(D)
P(DjR)
P(R)
0:95p
:
0:85p 0:1
On the one hand, if p 12 then we nd
P(DjR) 19
21

and the test looks good. On the other hand, if p 106 , so the disease is very rare, then
P(DjR) ' 105
which is far from conclusive. Ordinarily one would hope to have further independent tests
to use in this case. s

Here is an example of Bayes' rule that has the merit of being very simple, albeit
slightly frivolous.

Example 2.8.3: examinations. Suppose a multiple choice question has c available

choices. A student either knows the answer with probability p, say, or guesses at random
with probability 1 p. Given that the answer selected is correct, what is the probability
that the student knew the answer?

Solution. Let A be the event that the question is answered correctly, and S the event
that the student knew the answer. We require P(SjA). To use Bayes' rule, we need to
58 2 The rules of probability

calculate P(A), thus

P(A) P(AjS ) P(S ) P(AjS c )P(S c )
p c1 (1 p):
Now by conditional probability
P(SjA) P(AjS )P(S )=P(A)
p

p c1 (1
p)
cp
:
1 (c 1) p
Notice that the larger c is, the more likely it is that the student knew the answer to the
question, given that it is answered correctly. This is in accord with our intuition about
such tests. Indeed if it were not true there would be little point in setting them. s

Exercises for section 2.8

1. An insurance company knows that the probability of a policy holder's having an accident in any
given year is if the insured is aged less than 25, and if the insured is 25 or over. A fraction
of policy holders are less than 25. What is the probability that
(a) a randomly selected policy holder has an accident?
(b) a policy holder who has an accident is less than 25?
2. A factory makes tool bits; 5% are defective. A machine tests each bit. With probability 103 it
incorrectly passes a defective bit; with probability 104 it incorrectly rejects a good bit. What is
the probability that
(a) a bit was good, given it was rejected?
(b) the machine passes a randomly selected bit?
3. Red ace. A pack of four cards contains two clubs and two red aces.
(a) Two cards are selected at random and a friend tells you that one is the ace of hearts. Can
you say what the probability is that the other is the ace of diamonds?
(b) Two cards are selected at random and inspected by a friend. You ask whether either of them
is the ace of hearts and receive the answer `Yes'. Can you say what the probability is that
the other is the ace of diamonds? (Your friend always tells the truth.)
4. Prove the conditional partition rules (7) and (9).

2 . 9 I N D E P E N D E N C E A N D T H E P RO D U C T RU L E
At the start of section 2.7 we noted that a change in the conditions of some experiment
will often obviously change the probabilities of various outcomes. That led us to dene
conditional probability.
However, it is equally obvious that sometimes there are changes that make no
difference whatever to the outcomes of the experiments, or to the probability of some
event A of interest. For example, suppose you buy a lottery ticket each week; does the
chance of your winning next week depend on whether you won last week? Of course not;
the numbers chosen are independent of your previous history. What does this mean
2.9 Independence and the product rule 59

formally? Let A be the outcome of this week's lottery, and B the event that you won last
week. Then we agree that obviously
(1) P(AjB) P(AjBc ):
There are many events A and B for which, again intuitively, it seems natural that the
chance that A occurs is not altered by any knowledge of whether B occurs or Bc occurs.
For example, let A be the event that you roll a six and B the event that the dollar exchange
rate fell. Clearly we must assume that (1) holds. You can see that this list of pairs A and B
for which (1) is true could be prolonged indenitely:
A this coin shows a head, B that coin shows a head;
A you are dealt a flush, B coffee futures fall:
Think of some more examples yourself.
Now it immediately follows from (1) by the partition rule that
(2) P(A) P(AjB)P(B) P(AjBc )P(Bc )
P(AjB)fP(B) P(Bc )g, by (1)
P(AjB):
That is, if (1) holds then
(3) P(A) P(AjB) P(AjBc ):
Furthermore, by the denition of conditional probability, we have in this case
(4) P(A \ B) P(AjB)P(B)
P(A)P(B), by (2)
This special property of events is called independence and, when (4) holds, A and B are
said to be independent events. The nal version (4) is usually taken to be denitive; thus
we state the

Product rule. Events A and B are said to be independent if and only if

(5) P(A \ B) P(A)P(B):

Example 2.9.1. Suppose I roll a die and pick a card at random from a conventional
pack. What is the chance of rolling a six and picking an ace?

Solution. We can look at this in two ways. The rst way says that the events
A roll a six
and
B pick an ace
are obviously independent in the sense discussed above; that is, P(AjB) P(A) and of
course P(BjA) P(B). Dice and cards cannot inuence each other. Hence
P(A \ B) P(A)P(B)
16 3 13
1 1
78 :
Alternatively, we could use the argument of chapter 1, and point out that by symmetry
60 2 The rules of probability

all 6 3 52 312 possible outcomes of die and card are equally likely. Four of them have
an ace with a six, so
4 1
P(A \ B) 312 78 :
It is very gratifying that the two approaches yield the same answer, but not surprising. In
fact, if you think about the argument from symmetry, you will appreciate that it tacitly
assumes the independence of dice and cards. If there were any mutual inuence it would
break the symmetry. s

If A and B are not independent, then they are said to be dependent. Obviously
dependence and independence are linked to our intuitive notions of cause and effect.
There seems to be no way in which one coin can cause another to be more or less likely
to show a head. However, you should beware of taking this too far. Independence is yet
another assumption that we make in constructing our model of the real world. It is an
extremely convenient assumption, but if it is inappropriate it will yield inaccurate and
irrelevant results. Be careful.
The product rule (5) has an extended version, as follows:

Independence of n events. The events (A r ; r > 1) are independent if and only if

(6) P(A s1 \ A s2 \ \ A s n ) P(A s1 ) P(A s n )
for any selection (s1 , . . . , s n ) of the positive integers Z .
We give various examples to demonstrate these ideas.

Example 2.9.2. A sequence of fair coins is ipped. They each show a head or a tail
independently, with probability 12 in each case. Therefore the probability that any given
set of n coins all show heads is 2 n . Indeed, the probability that any given set of n coins
shows a specied arrangement of heads and tails is 2 n . Thus, for example, if you ip a
fair coin 6 times,
P( HHHHHH) P( HTTHTH) 26 :
(The less experienced sometimes nd this surprising.) s

Let us consider some everyday applications of the idea of independence.

Example 2.9.3: central heating. Your heating system includes a pump and a boiler in
a circuit of pipes. You might represent this as a diagram like gure 2.6.
Let F p and Fb be the events that the pump or boiler fail, respectively. Then the event
W that your system works is
W F cp \ F cb :
You might assume that pump and boiler break down independently, in which case, by (5),
(7) P(W ) P(F cp )P(F cb ):
However, your plumber might object that if the power supply fails then both pump and
boiler will fail, so the assumption of independence is invalid. To meet this objection we
dene the events
2.9 Independence and the product rule 61

pump boiler

radiators

Figure 2.6. A central heating system.

Fe power supply failure,

M p mechanical failure of pump,
M b mechanical failure of boiler:
Then it is much more reasonable to suppose that Fe , M p , and M b are independent events.
We represent this system in gure 2.7. Incidentally, this gure makes it clear that such
diagrams are essentially formal in character; the water does not circulate through the
power supply. We have now
W 9 F ce \ M cp \ M cb ,
F cb M cb \ F ce ,
F cp M cp \ F ce :
Hence the probability that the system works is
(8) P(W 9) P(F ce )P(M cp )P(M cb ):
It is interesting to compare this with the answer obtained on the assumption that F p and
Fb are independent. From (7) this is
P(F cb )P(F cp ) P(M cb )P(M cp )[P(F ce )]2 P(W 9)P(F ce ),

pump boiler power

Figure 2.7. The system works only if all three elements in the sequence work.
62 2 The rules of probability

on using independence, equation (6). This answer is smaller than that given in (8),
showing how unjustied assumptions of independence can mislead. s

In the example above the elements of the system were in series. Sometimes elements
are found in parallel.

Example 2.9.3 continued: central heating. Your power supply is actually of vital
importance (in a hospital, say) and you therefore t an alternative generator for use in
emergencies. The power system can now be represented as in gure 2.8. Let the event
that the emergency power fails be E. If we assume that Fe and E are independent, then
the probability that at least one source of power works is
P(F ce [ E c ) P((Fe \ E) c )
1 P(Fe \ E)
1 P(Fe )P(E), by (5)
> P(F ce ):
Hence the probability that the system works is increased by including the reserve power
unit, as you surely hoped. s

Many systems comprise blocks of independent elements in series or in parallel, and

then P(W ) can be found by repeatedly combining blocks.

Example 2.9.4. Suppose a system can be represented as in gure 2.9. Here each
element works with probability p, independently of the others. Running through the
blocks we can reduce this to gure 2.10, where the expression in each box is the
probability of its working. s

power
supply

emergency
power
supply

Figure 2.8. This system works if either of the two elements works.
2.9 Independence and the product rule 63

p p

A p B

p p

Figure 2.9. Each element works independently with probability p. The system works if a route
exists from A to B that passes only through working elements.

p2 p

1 (1 p)(1 p2)2 p

Figure 2.10. Solution in stages, showing nally that P(W ) pf1 (1 p)(1 p2 )2 g, where W is
the event that the system works.

Sometimes elements are in more complicated congurations, in which case the use of
conditional probability helps.

Example 2.9.5: snow. Four towns are connected by ve roads, as shown in gure
2.11. Each road is blocked by snow independently with probability ; what is the
probability that you can drive from A to D?
64 2 The rules of probability
B

A D

C
Figure 2.11. The towns lie at A, B, C, and D.

Solution. Let R be the event that the road BC is open, and Q the event that you can
drive from A to D. Then
P(Q) P(QjR)(1 ) P(QjR) (1 2 )2 (1 ) [1 f1 (1 )2 g2 ] :
The last line follows on using the methods of example 2.9.4, because when R or R c
occurs the system is reduced to blocks in series and parallel. s

Note that events can in fact be independent when you might reasonably expect them
not to be.

Example 2.9.6. Suppose three fair coins are ipped. Let A be the event that they all
show the same face, and B the event that there is at most one head. Are A and B
independent? Write `yes' or `no', and then read on.

Solution. There are eight equally likely outcomes. We have

jA \ Bj 1 jfTTT gj,
jAj 2 jfTTT , HHH gj,
jBj 4 jfTTT , HTT , THT , TTHgj:
Hence
P(A \ B) 18 P(TTT ), P(A) 14 P(TTT [ HHH ), P(B) 12,
and so P(A \ B) P(A)P(B), and they are independent. s

Very often indeed we need to use a slightly different statement of independence. Just as
P(AjC) is often different from P(A), so also P(A \ BjC) may behave differently from
P(A \ B). Specically, A and B may be independent given C, even though they are not
necessarily independent in general. This is called conditional independence; formally we
state the following

Denition. Events A and B are conditionally independent given C if

(9) P(A \ BjC) P(AjC)P(BjC): n
Here is an example.
2.9 Independence and the product rule 65

Example 2.9.7: high and low rolls. Suppose you roll a die twice. Let A2 be the
event that the rst roll shows a 2, and B5 the event that the second roll shows a 5. Also
let L2 be the event that the lower score is a 2, and H 5 the event that the higher score is
a 5.
(i) Show that A2 and B5 are independent.
(ii) Show that L2 and H 5 are not independent.
(iii) Let D be the event that one roll shows less than a 3 and one shows more than a 3.
Show that L2 and H 5 are conditionally independent given D.

Solution. For (i) we have easily

1
P(A2 \ B5 ) 36 P(A2 )P(B5 )
For (ii), however,
2
P(L2 \ H 5 ) 36 :
Also,
9 9
P(L2 ) 36 and P( H 5 ) 36 :
Therefore L2 and H 5 are dependent, because
1 1
18 6 16:
For (iii) we now have P(D) 12 1
36 3. So, given D,
1 1
P(L2 \ H 5 jD) P(L2 \ H 5 \ D)=P(D) 18 =3 16:
However, in this case
6 12
P(L2 jD) P(L2 \ D)=P(D) 36 =36 12
and
4 12
P( H 5 jD) P( H 5 \ D)=P(D) 36 =36 13
and of course 16 12 3 13, and so L2 and H 5 are independent given D. s

Conditional independence is an important idea that is frequently used surreptitiously,

or taken for granted. It does not imply, nor is it implied by, independence; see the
exercises below.

Exercises for section 2.9

1. Show that A and B are independent if and only if A c and Bc are independent.
2. A and B are events such that P(A) 0:3 and P(A [ B) 0:5. Find P(B) when
(a) A and B are independent,
(b) A and B are disjoint,
(c) P(AjB) 0:1,
(d) P(BjA) 0:4.
3. A coin shows a head with probability p, or a tail with probability 1 p q. It is ipped
repeatedly until the rst head occurs. Show that the probability that n ips are necessary,
including the head, is p n q n1 p.
66 2 The rules of probability

4. Suppose that any child is equally likely to be male or female, and Anna has three children. Let
A be the event that the family includes children of both sexes and B the event that the family
includes at most one girl.
(a) Show that A and B are independent.
(b) Is this still true if boys and girls are not equally likely?
(c) What happens if Anna has four children?
5. Find events A, B, and C such that A and B are independent, but A and B are not conditionally
independent given C.
6. Find events A, B, and C such that A and B are not independent, but A and B are conditionally
independent given C.
7. Two conventional fair dice are rolled. Show that the event that their sum is 7 is independent of
the score on the rst die.
8. Some form of prophylaxis is said to be 90% effective at prevention during one year's treatment.
If years are independent, show that the treatment is more likely than not to fail within seven
years.

2.10 TREES AND GRAPHS

In real life you may be faced with quite a long sequence of uncertain contingent events.
For example, your computer may develop any of a number of faults, you may choose any
of a number of service agents, they may or may not correct it properly, the consequences
of an error are uncertain, and so on ad nauseam. (The same is true of bugs in software.)
In such cases we need and use the extended form of the multiplication rule,
(1) P(A1 \ A2 \ \ A n ) P(A n jA1 \ A2 \ \ A n ) P(A2 jA1 )P(A1 ):
The proof of (1) is trivial and is outlined after equation (8) in section 2.7.
Now, if each of these events represents a stage in some system of multiple random
choices, it is not impossible that the student will become a little confused. In such cases it
is often helpful to use tree diagrams to illustrate what is going on. (This is such a natural
idea that it was rst used by C. Huygens in the 17th century, in looking at the earliest
problems in probability.) These diagrams will not enable you to avoid the arithmetic and
algebra, but they do help in keeping track of all the probabilities and possibilities. The
basic idea is best explained by an example.

Example 2.10.1: faults. (i) A factory has two robots producing capeks. (A capek is
not unlike a widget or a gubbins, but it is more colourful.) One robot is old and one is
new; the newer one makes twice as many capeks as the old. If you pick a capek at
random, what is the probability that it was made by the new machine? The answer is
obviously 23 , and we can display all the possibilities in a natural and appealing way in
Figure 2.12. The arrows in a tree diagram point to possible events, in this example N
(new) or N c (old). The probability of the event is marked beside the relevant arrow.
(ii) Now we are told that 5% of the output of the old machine is defective (D), but 10%
of the output of the new machine is defective. What is the probability that a randomly
selected capek is defective? This time we draw a diagram rst, gure 2.13. Now we begin
to see why this kind of picture is called a tree diagram. Again the arrows point to possible
2.10 Trees and graphs 67
new N old N c

2_ 1_
3 3

Figure 2.12. A small tree.

1 D defective
10

N
2
3 9
10 Dc

1 D
1 20
3
Nc

19
20
D c not defective

Figure 2.13. A tree drawn left to right: classifying capeks.

events. However, the four arrows on the right are marked with conditional probabilities,
because they originate in given events. Thus
P(DjN ) 101
, P(Dc jN c ) 19
20,
and so on.
The probability of traversing any route in the tree is the product of the probabilities on
the route, by (1). In this case two routes end at a defective capek, so the required
probability is
P(D) 23 3 10
1
13 3 20
1

P(DjN )P(N ) P(DjN c )P(N c ) ,

1
12 ,
which is the partition rule, of course. You always have a choice of writing the answer
down by routine algebra, but drawing a diagram often helps.
It is interesting to look at this same example from a slightly different viewpoint, as
follows. By conditional probability we have easily that
P(N jD) 45, P(N c jD) 15,
P(N jDc ) 36 c c 19
55, P(N jD ) 55:
Then we can draw what is known as the reversed tree; see gure 2.14. s
68 2 The rules of probability

4
N
5

D
1 1
12 5
Nc

36
N
11
12 55

19
55
Nc
Figure 2.14. Reversed tree for capeks: D or D is followed by N or N c.
c

Trees like those in gures 2.13 and 2.14, with two branches at each fork, are known as
binary trees. The order in which we should consider the events, and hence draw the tree,
is usually determined by the problem, but given any two events A and B there are
obviously two associated binary trees.
The notation of these diagrams is natural and self-explanatory. Any edge corresponds
to an event, which is indicated at the appropriate node or vertex. The relevant probability
is written adjacent to the edge. We show the rst tree again in gure 2.15, labelled with
symbolic notation.
The edges may be referred to as branches, and the nal node may be referred to as a
leaf. The probability of the event at any node, or leaf, is obtained by multiplying the
probabilities labelling the branches leading to it. For example,
(2) P(Ac \ B) P(BjA c )P(A c ):
Furthermore, since event B occurs at the two leaves marked with an asterisk, the diagram

AB *
P(B|A)

A
P(A) P(Bc|A)
A Bc

P(Ac)
Ac B *
P(B|Ac)

P(Bc|Ac)
Ac Bc
Figure 2.15. A or A c is followed by B or Bc.
2.10 Trees and graphs 69

shows that
(3) P(B) P(BjA)P(A) P(BjA c )P(A c )
as we know.
Figure 2.16 is the reversed tree. If we have the entries on either tree we can nd the
entries on the other by using Bayes' rule.
Similar diagrams arise quite naturally in knock-out tournaments, such as Wimbledon.
The diagram is usually displayed the other way round in this case, so that the root of such
a tree is the winner in the nal.

Example 2.8.2 revisited: tests. Recall that a proportion p of a population is subject to

a disease. A test may help to determine whether any individual has the disease.
Unfortunately unless the test is extremely accurate, this will result in false-positive
results. One would like accurate tests, but reliable tests usually involve invasive biopsy.
This itself can lead to undesirable results; if you have not got some disease you would
regret a biopsy, to nd out, that resulted in your death. It is therefore customary, where
possible, to use a two-stage procedure; the population of interest is rst given the non-
invasive but less reliable test. Only those testing positive are subject to biopsy. The tree
may take the form shown in gure 2.17; T denotes that the result of the biopsy is
positive. s
AB
P(A|B)

B
P(B) P(Ac|B)
Ac B

P(Bc)
A Bc
P(A|Bc)

P(Ac|Bc)
Ac Bc

Figure 2.16. Reversed tree: B or Bc is followed by A or A c.

R T accurate diagnoses

D Rc Tc false negatives

population
tested
Dc R T false positives

Rc Tc accurate diagnoses

Figure 2.17. Sequential tests. D, disease present; R, rst test positive; T , second test positive.
70 2 The rules of probability

Example 2.10.2: embarrassment. Privacy is important to many people. For example,

suppose you were asked directly whether you dye your hair, `borrow' your employer's
stationery, drink to excess, or suffer from some stigmatized illness. You might very well
depart from strict truth in your reply, or simply decline to respond.
Nevertheless, there is no shortage of researchers and other busybodies who would like
to know the correct answer to these and other embarrassing questions. They have invented
the following scheme.
A special pack of cards is shufed; of these cards a proportion n say Ànswer ``no''', a
proportion p say Ànswer ``yes''', and a proportion q say Ànswer truthfully'. You draw a
card at random, secretly, the question is put, and you obey the instructions on the card.
The point is that you can afford to tell the truth when instructed to do so, because no
one can say whether your answer actually is true, even when embarrassing. However, in
this way, in the long run the researchers can estimate the true incidence of alcoholism, or
pilfering in the population, using Bayes' rule. The tree is shown in gure 2.18. The
researcher wishes to know m, the true proportion of the population that is embarrassed.
Now, using the partition rule as usual, we can say that the probability of the answer being
`yes' is
(4) P(yes) p mq:
Then the busybody can estimate m by nding the proportion y who answer `yes' and
setting
(5) m ( y p)=q:
Conversely, the embarrassed person knows that, for the researcher, the probability that
they were truthful in answering `yes' is only
qm
(6) P(truthjyes) ,
p qm
which can be as small as privacy requires. With this degree of security conferred by
randomizing answers, the question is very unlikely to get deliberately wrong answers. (In
fact, the risk is run of getting wrong answers because the responder fails to understand
the procedure.) s

CARD ANSWER

no 1
no
n

p 1
yes yes
q

m
truth yes

1m no
Figure 2.18. Evasive tree.
2.10 Trees and graphs 71

We note that trees can be innite in many cases.

Example 2.10.3: craps, an innite tree. In this well-known game two dice are rolled
and their scores added. If the sum is 2, 3, or 12 the roller loses, if it is 7 or 11 the roller
wins, if it is any other number, say n, the dice are rolled again. On this next roll, if the
sum is n then the roller wins, if it is 7 the roller loses, otherwise the dice are rolled again.
On this and all succeeding rolls the roller loses with 7, wins with n, or rolls again
otherwise. The corresponding tree is shown in gure 2.19. s

We conclude this section by remarking that sometimes diagrams other than trees are
useful.

Example 2.10.4: tennis. Rod and Fred are playing a game of tennis, and have
reached deuce. Rod wins any point with probability r or loses it with probability 1 r.
Let us denote the event that Rod wins a point by R. Then if they share the next two points
the game is back to deuce; an appropriate diagram is shown in gure 2.20. s

7 or 11 win n win

n {4, 5, 6, 8, 9, 10} m {7, n}

continue continue ...

2 or 3 or 12 lose 7 lose
Figure 2.19. Tree for craps. The game continues indenitely.

R Rod wins the game

R
1r R
r

deuce * deuce ...

1r Rc r Rc

1r
R c Fred wins the game

Figure 2.20. The diagram is not a tree because the edges rejoin at .
72 2 The rules of probability

Larger diagrams of this kind can be useful.

Example 2.10.5: coin tossing. Suppose you have a biased coin that yields a head
with probability p and a tail with probability q. Then one is led to the diagram in gure
2.21 as the coin is ipped repeatedly; we truncate it at three ips. s

Exercises for section 2.10

1. Prove the multiplication rule (1).
2. In tennis a tie-break is played when games are 66 in a set. Draw the diagram for such a tie-
break. What is the probability that Rod wins the tie-break 71, if he wins any point with
probability p?
3. In a card game you cut for the deal, with the convention that if the cards show the same value,
you recut. Draw the graph for this experiment.
4. You ip the coin of example 2.10.5 four times. What is the probability of exactly two tails?

2 . 1 1 WO R K E D E X A M P L E S
The rules of probability (we have listed them in subsection [Link]), especially the ideas
of independence and conditioning, are remarkably effective at working together to
provide neat solutions to a wide range of problems. We consider a few examples.

Example 2.11.1. A coin shows a head with probability p, or a tail with probability
1 p q. It is ipped repeatedly until the rst head appears. Find P(E), the probability
of the event E that the rst head appears at an even number of ips.

Solution. Let H and T denote the outcomes of the rst ip. Then, by the partition
rule,
(1) P(E) P(Ej H)P( H) P(EjT )P(T ):
Now of course P(Ej H) 0, because 1 is odd. Turning to P(EjT ), we now require an odd
number of ips after the rst to give an even number overall. Furthermore, ips are
independent and so
(2) P(EjT ) P(E c ) 1 P(E)

TTT
q

TT p
q q * {TTH, THT, HTT }

p {TH, HT} p
T {THH, HTH, HHT }
q q
q

p H p HH p HHH

Figure 2.21. Counting heads. There are three routes to the node marked , so the probability of one
head in three ips is 3 pq 2 .
2.11 Worked examples 73

Hence, using (1) and (2),

P(E) f1 P(E)gq,
so P(E) q= p:
You can check this, and appreciate the method, by writing down the probability of a
head after 2r ips, that is, q 2r1 p, and then summing over r,
X pq q
q 2 r1 p 2
: s
r
1q p

Here is the same problem for a die.

Example 2.11.2. You roll a die repeatedly. What is the probability of rolling a six for
the rst time at an odd number of rolls?

Solution. Let A be the event that a six appears for the rst time at an odd roll. Let S
be the event that the rst roll is a six. Then by the partition rule, with an obvious notation,
P(A) P(AjS) 16 P(AjS c ) 56:
But obviously P(AjS ) 1. Furthermore, the rolls are all independent, and so
P(AjS c ) 1 P(A)
Therefore
P(A) 16 56f1 P(A)g
which yields
6
P(A) 11 : s

Let us try something trickier.

Example 2.11.3: Huygens' problem. Two players take turns at rolling dice; they each
need a different score to win. If they do not roll the required score, play continues. At
each of their attempts A wins with probability , whereas B wins with probability . What
is the probability that A wins if he rolls rst? What is it if he rolls second?

Solution. Let p1 be the probability that A wins when he has the rst roll, and p2 the
probability that A wins when B has the rst roll. By conditioning on the outcome of the
rst roll we see that, when A is rst,
p1 (1 ) p2 :
When B is rst, conditioning on the rst roll gives
p2 (1 ) p1 :
Hence solving this pair gives

p1
1 (1 )(1 )
and
(1 )
p2 : s
1 (1 )(1 )
74 2 The rules of probability

Example 2.11.4: Huygen's problem again. Two coins, A and B, show heads with
respective probabilities and . They are ipped alternately, giving ABABAB . . .. Find
the probability of the event E that A is rst to show a head.

Solution. Consider the three events

f Hg 1st flip heads,
fTHg 1st flip tails, 2nd flip heads,
fTTg 1st and 2nd flips tails:
These form a partition of , so, by the extended partition rule (8) of section 2.8,
P(E) P(Ej H) P(EjTH)(1 ) P(EjTT )(1 )(1 ):
Now obviously
P(Ej H) 1 and P(EjTH) 0:
Furthermore
P(EjTT ) P(E):
To see this, just remember that after two tails, everything is essentially back to the starting
position, and all future ips are independent of those two. Hence
P(E) (1 )(1 )P(E)
and so

P(E) : s

Example 2.11.5: deuce. Rod and Fred are playing a game of tennis, and the game
stands at deuce. Rod wins any point with probability p, independently of any other point.
What is the probability that he wins the game?

Solution. We give two methods of solution.

Method I. Recall that Rod wins as soon as he has won two more points in total than
Fred. Therefore he can win only when an even number 2n 2 of points have been played.
Of these Rod has won n 2 and Fred has won n. Let W 2 n2 be the event that Rod wins at
the (2n 2)th point. Now at each of the rst n deuces, there are two possibilities: either
Rod gains the advantage and loses it, or Fred gains the advantage and loses it. At the last
deuce Rod wins both points. Thus there are 2 n different outcomes in W 2 n2 , and each has
probability p n2 (1 p) n. Hence, by the addition rule, P(W 2 n2 ) 2 n p n2 (1 p) n . If
P(W 2 n ) is the probability that Rod wins the game at the 2nth point then, by the extended
addition rule (5) of section 2.5,
P(W 2 ) P(W 3 )
p2 2 p3 (1 p) 22 p4 (1 p)2
p2
:
1 2 p(1 p)
2.11 Worked examples 75

The possible progress of the game is made clearer by the tree diagram in Figure 2.22.
Clearly after an odd number of points either the game is over, or some player has the
advantage. After an even number, either the game is over or it is deuce.
Method II. The tree diagram suggests an alternative approach. Let be the probability
that Rod wins the game eventually given he has the advantage, and the probability that
Rod wins the game eventually given that Fred has the advantage.
Further, R be the event that Rod wins the game and W i be the event that he wins the ith
point. Then, by the partition rule,
P(R)
P(RjW 1 \ W 2 )P(W 1 \ W 2 )
P(RjW 1c \ W 2c )P(W 1c \ W 2c )

P Rj(W 1 \ W 2c ) [ (W1c \ W 2 ) P(W 1 \ W 2c ) [ (W 1c \ W 2 )
p2 0 2 p(1 p):
This is the same as we obtained by the rst method. s

Example 2.11.6. Three players, known as A, B, and C, roll a die repeatedly in the
order ABCABCA . . .. The rst to roll a six is the winner; nd their respective probabilities
of winning.

Solution. Let the players' respective probabilities of winning be , , and 1 ,

and let the event that the rst roll shows a six be S. Then by conditional probability
P(A wins) P(A winsjS) 16 P(A winsjS c ) 56:
Now
P(A winsjS ) 1:
c
If S occurs, then by independence the game takes the same form as before, except that

Rod wins the game

advantage Rod
p
1p
deuce deuce
p
1p
advantage Fred

1p
Fred wins the game

Figure 2.22. Deuce.

76 2 The rules of probability

the rolls are in the order BCABCA . . . and A is third to roll. Hence, starting from this
point, the probability that A wins is now 1 , and we have that
16 (1 ) 56
Applying a similar argument to the sequence of rolls beginning with B, we nd
1 56
because B must fail to roll a six for A to have a chance of winning, and then the sequence
takes the form CABCAB . . ., in which A is second, with probability of winning.
Applying the same argument to the sequence of rolls beginning with C yields
56
because C must fail to roll a six, and then A is back in rst place. Solving these three
equations gives
3691, 30
91, 1 25
91: s

Another popular and extremely useful approach to many problems in probability

entails using conditioning and independence to yield difference equations. We shall see
more of this in chapter 3 and later; for the moment here is a brief preview. We start with a
trivial example.

Example 2.11.7. A biased coin is ipped repeatedly until the rst head is shown.
Find the probability p n P(A n ) of the event A n that n ips are required.

Solution. By the partition rule, and conditioning on the outcome of the rst ip,
P(A n ) P(A n j H) p P(A n jT )q

p if n 1

0 qP(A n1 ) otherwise,
by independence. Hence
p n qp n1 q n1 p1 q n1 p, n > 1: s

Of course this result is trivially obvious anyway, but it illustrates the method. Here is a
trickier problem.

Example 2.11.8. A biased coin is ipped up to and including the ip on which it has
rst shown two successive tails. Let A n be the event that n ips are required. Show that, if
p n P(A n ), p n satises
p n pp n1 pqp n2 , n . 2:

Solution. As usual we devise a partition; in this case H, TH, TT are three appropriate
disjoint events. Then
2.11 Worked examples 77

p n P(A n ) P(A n j H ) p P(A n jTH ) pq P(A n jTT )q 2

2
q , n2

pp n1 pqp n2 otherwise,
by independence of ips. s

Here is another way of using conditional probability.

Example 2.11.9: degraded signals. A digital communication channel transmits data

using the two symbols 0 and 1. As a result of noise and other degrading inuences, any
symbol is incorrectly transmitted with probability q, independently of the rest of the
symbols. Otherwise it is correctly transmitted with probability p 1 q.
On receiving a signal R comprising n symbols, you decode it by assuming that the
sequence S which was sent is such that P(RjS ) is as large as possible.
For example, suppose you receive the signal 101 and the possible sequences sent are
111 and 000, then
2
p q if S 111
P(101jS )
q 2 p if S 000:
Thus if p . q the signal 101 is decoded as 111. s

Next we turn to a problem that was considered (and solved) by many 18th century
probabilists, and later generalized by Laplace and others. It arose in Paris with the rather
shadowy gure of a Mr Waldegrave, a friend of Montmort. He is described as an English
gentleman, and proposed the problem to Montmort sometime before 1711. de Moivre
studied the same problem in 1711 in his rst book on probability. It seems unlikely that
these events were independent; there is no record of Waldegrave visiting the same coffee
house as de Moivre, but this seems a very likely connection. (de Moivre particularly
favoured Slaughter's coffee house, in St Martin's Lane). de Moivre also worked as a
mathematics tutor to the sons of the wealthy, so an alternative hypothesis is that
Waldegrave was a pupil or a parent.
The problem is as follows.

Example 2.11.10: Waldegrave's problem. There are n 1 players of some game,

A0 , A1 , . . . , A n , who may be visualized as sitting around a circular table. They play a
sequence of rounds in pairs as follows. First A0 plays against A1 ; then the winner plays
against A2 ; after that the new winner plays against A3 , and so on. The rst player to win
n rounds consecutively (thus beating all other players) is the overall victor, and the
game stops. One may ask several questions, but a natural one is to seek the probability
that the game stops at the rth round. Each round is equally likely to be won by either
player.

Solution. As so often in probability problems, it is helpful to restate the problem

before solving it. Each round is played by a challenger and a fresh player, the challenged.
Since each round is equally likely to be won by either player, we might just as well ip a
coin or roll a die. The game is then rephrased as follows.
78 2 The rules of probability

The rst round is decided by rolling a die; if it is even A0 wins, if it is odd A1 wins.
All following rounds are decided by ipping a coin. If it shows heads the challenger
wins, if it shows tails the challenged wins. Now it is easy to see that if the coin shows
n 1 consecutive heads then the game is over. Also, the game can only nish when this
occurs. Hence the rst round does not count towards this, and so the required result is
given by the probability p r that the coin rst shows n 1 consecutive heads at the
(r 1)th ip. But this is a problem we know how to solve; it is just an extension of
example 2.11.8.
First we note that (using an obvious notation) the following is a partition of the sample
space:
fT , HT , H 2 T , . . . , H n2 T , H n1 g:
Using conditional probability and independence of ips, this gives

(3) p r 12 p r1 12 2 p r2 12 n1 p r n1 , r . n
with
1 n1
pn 2
and
p1 p2 p n1 0
In particular, when n 3, (3) becomes

(4) p r 12 p r1 12 2 p r2 , r > 4:
Solving this constitutes problem 26 in section 2.16. s

Exercises for section 2.11

1. A biased coin is ipped repeatedly. Let p n be the probability that n ips have yielded an even
number of heads, with p0 1. As usual P( H) p 1 q, on any ip. Show that
p n p(1 p n1 ) qp n1 , n > 1
and nd p n (for a list of the basic rules, see section 2.15).
2. A die is `xed' so that when rolled the score cannot be the same as the previous score, all other
scores having equal probability 15. If the rst score is 6, what is the probability p n that the nth
score is 6? What is the probability q n that the nth score is j for j 6 6?

2.12 ODDS
. . . and this particular season the guys who play the horses are being murdered by
the bookies all over the country, and are in terrible distress. . . . But personally I
consider all horse players more or less daffy anyway. In fact, the way I see it, if a
guy is not daffy he will not be playing the horses.
Damon Runyon, Dream Street Rose
Occasionally, statements about probability are made in terms of odds. This is universally
true of bookmakers who talk of `long odds', `1001 odds', `the 21 on favourite', and so
on. Many of these phrases and customs are also used colloquially, so it is as well to make
it clear what all this has to do with our theory of probability.
2.12 Odds 79

In dealing with these ideas we must distinguish very carefully between fair odds and
bookmakers' payoff odds. These are not the same. First, we dene fair odds.

Denition. If an event A has probability P(A), then the fair odds against A are
1 P(A)
(1) a (A) f1 P(A)g : P(A)
P(A)
and the fair odds on A are
P(A)
(2) o (A) P(A) : f1 P(A)g
1 P(A)
The ratio notation on the right is often used for odds.
For example, for a fair coin the odds on and against a head are
1=2
o ( H) a ( H) 1 : 1
1=2
These are equal, so these odds are said to be evens. If a die is rolled, the odds on and
against a six are
1=6
o (6) 1 : 5,
1 1=6
1 1=6
a (6) 5 : 1:
1=6
You should note that journalists and reporters (on the principle that ignorance is bliss)
will often refer to `the odds on A', when in fact they intend to state the odds against A.
Be careful.
Now although the fair odds against a head when you ip a coin are 1:1, no bookmaker
would pay out at evens for a bet on heads. The reason is that in the long run she would
pay out just as much in winnings as she would take from losers. Nevertheless, book-
makers and casinos offer odds; where do they come from? First let us consider casino
odds.
When a casino offers odds of 35 to 1 against an event A, it means that if you stake $1
and then A occurs, you will get your stake back plus $35. If A c occurs then you forfeit
your stake. For this reason such odds are often called payoff odds. How are they xed?
In fact, 35:1 is exactly the payoff odds for the event that a single number you select
comes up at roulette. In the American roulette wheel there are 38 compartments. In a
well-made wheel they should be equally likely, by symmetry, so the chance that your
1
number comes up is 38 .
Now, as we have discussed above, if you get $d with probability P(A) and otherwise
you get nothing, then $P(A)d is the value of this offer to you.
We say that a bet is fair if the value of your return is equal to the value of your stake.
To explain this terminology, suppose you bet $1 at the fair odds given in (1) against A.
You get
1 P(A)
$1 $
P(A)
80 2 The rules of probability

with probability P(A), so the value of this to you is

1 P(A)
$P(A) 1 $1:
P(A)
This is the same as your stake, which seems fair. Now consider the case of roulette. Here
1
you get $(1 35) with probability 38 . The value of the return is $1819, which is less than
1
your stake. Of course the difference, $19 , is what the casino charges you for the privilege
of losing your money, so this (in effect) is the value you put on playing roulette. If you get
1
more than $19 worth of pleasure out of wagering $1, then for you the game is worth
playing.
It is now obvious that in a casino, if the fair odds against your winning are a , then the
payoff odds a will always satisfy
(3) a , a:
In this way the casino ensures that the expected value of the payoff is always less than the
value of the stake. It is as well to stress that if the casino has arranged that (3) is true for
every available bet, then no system of betting can be fair, or favourable to the gambler.
Such systems can only change the rate at which you lose money.
Now let us consider bookmakers' odds; for deniteness let us consider a horse race.
There are two main ways of betting on horse races. We will consider that known as the
Tote system; this is also known as pari-mutuel betting. When you place your bet you do
not know the payoff odds; they are not xed until betting stops just before the race begins.
For this reason they are known as starting prices, and are actually determined by the bets
placed by the gamblers. Here is how it works. Suppose there are n horses entered for the
race, and a total of $b j is wagered on the jth horse, yielding a total of $b bet on the race,
where
X
n
b b j:
j1

Then the Tote payoff odds for the jth horse are quoted as
1 pj
(4) a ( j)
pj
where
bj
pj
(1 t)b
for some positive number t, less than 1.
What does all this mean? For those who together bet a total of $b j on the jth horse the
total payoff if it wins is

1 pj bj
(5) bj 1 (1 t)b b tb
pj pj
which is $tb less than the total stake, and independent of j. That is to say, the bookmaker
will enjoy a prot of $tb, the `take', no matter which horse wins. (Bets on places and
other events are treated in a similar but slightly more complicated way.)
2.12 Odds 81

Now suppose that the actual probability that the jth horse will win the race is h j . (Of
course we can never know this probability.) Then the value to the gamblers of their bets
on this horse is h j b(1 t), and the main point of betting on horse races is that this may
be greater than b j . But usually it will not be.
It is clear that you should avoid using payoff odds (unless you are a bookmaker). You
should also avoid using fair odds, as the following example illustrates.

Example 2.12.1. Find the odds on A \ B in terms of the odds on A and the odds on
B, when A and B are independent.

Solution. From the denition (2) of odds we have

P(A \ B)
o (A \ B)
1 P(A \ B)
P(A)P(B)
, by independence
1 P(A)P(B)
o (A)o (B)
:
1 o (A) o (B)
Compare this horrible expression with P(A \ B) P(A)P(B), to see why the use of odds
is best avoided in algebraic work. (Of course it suits bookmakers to obfuscate matters.) s

Finally we note that when statisticians refer to an `odds ratio', they mean a quantity
such as

P(A) P(B)
R(A:B) :
P(A c ) P(Bc )
More loosely, people occasionally call any quotient of the form P(A)=P(B) an odds ratio.
Be careful.

Exercises for section 2.12

1. Suppose the fair odds against an event are a and the casino payoff odds are a . Show that the
casino's percentage take is

a a
100 %:
a 1
2. Suppose you nd a careless bookmaker offering payoff odds of ( j) against the jth horse in an
n-horse race, 1 < j < n, and
X n
1
, 1:
j1
1 ( j)
Show that if you bet $f1 ( j)g1 on the jth horse, for all n horses, then you surely win.
3. Headlines recently trumpeted that the Earth had a one in a thousand chance of being destroyed
by an asteroid shortly. The story then revealed that these were bookmakers' payoff odds.
Criticize the reporters. (Hint: do you think a `chance' is given by a or a ?)
82 2 The rules of probability

2 . 1 3 P O P U L A R PA R A D OX E S
Probability is the only branch of mathematics in which good mathematicians
frequently get results which are entirely wrong.
C. S. Pierce
This section contains a variety of material that, for one reason or another, seems best
placed at the end of the chapter. It comprises a collection of `paradoxes', which
probability supplies in seemingly inexhaustible numbers. These could have been included
earlier, but the subject is sufciently challenging even when not paradoxical; it seems
unreasonable for the beginner to be asked to deal with gratuitously tricky ideas as well.
They are not really paradoxical, merely examples of confused thinking, but, as a by-now
experienced probabilist, you may nd them entertaining. Many of them arise from false
applications of Bayes' rule and conditioning. You can now use these routinely and
appropriately, of course, but in the hands of amateurs, Bayes' rule is deadly.
Probability has always attracted more than its fair share of disputes in the popular
press; and several of the hardier perennials continue to enjoy a zombie-like existence on
the internet (or web). One may speculate about the reasons for this; it may be no more
than the fact that anyone can roll dice, or pick numbers, but rather fewer take the trouble
to get the algebra right. At any rate we can see that, from the very beginning of the
subject, amateurs were very reluctant to believe what the mathematicians told them. We
observe Pepys badgering Newton, de Mere pestering Pascal, and so on. Recall the words
of de Moivre: `Some of the problems about chance having a great appearance of
simplicity, the mind is easily drawn into a belief that their solution may be attained by the
mere strength of natural good sense; which generally proves otherwise . . .'; so still today.
In the following examples `Solution' denotes a false argument, and Resolution or
Solution denotes a true argument.
Most of the early paradoxes arose through confusion and ignorance on the part of non-
mathematicians. One of the rst mathematicians who chose to construct paradoxes was
Lewis Carroll. When unable to sleep, he was in the habit of solving mathematical
problems in his head (that is to say, without writing anything); he did this, as he put it, `as
a remedy for the harassing thoughts that are apt to invade a wholly unoccupied mind'.
The following was resolved on the night of 8 September 1887.

Carroll's paradox. A bag contains two counters, as to which nothing is known except
that each is either black or white. Show that one is black and the other white.

`Solution'. With an obvious notation, since colours are equally likely, the possibilities
have the following distribution:
P(BB) P(WW ) 14, P(BW ) 12:
Now add a black counter to the bag, then shake the bag, and pick a counter at random.
What is the probability that it is black? By conditioning on the three possibilities we have
P(B) 1 3 P(BBB) 23 3 P(BWB) 13 3 P(WWB)
1 3 14 23 3 12 13 3 14 23:
2.13 Popular paradoxes 83

But if a bag contains three counters, and the chance of drawing a black counter is 23, then
there must be two black counters and one white counter, by symmetry. Therefore, before
we added the black counter, the bag contained BW, viz., one black and one white.

Resolution. The two experiments, and hence the two sample spaces, are different.
The fact that an event has the same probability in two experiments cannot be used to
deduce that the sample spaces are the same. And in any case, if the argument were valid,
and you applied it to a bag with one counter in it, you would nd that the counter had to
be half white and half black, that is to say, random, which is what we knew already. s

Galton's paradox (1894). Suppose you ip three fair coins. At least two are alike,
and it is an evens chance whether the third is a head or a tail, so the chance that all three
are the same is 12.

Solution. In fact
P(all same) P(TTT ) P( HHH ) 18 18 14:
What is wrong?

Resolution. Again this paradox arises from fudging the sample space. This `third'
coin is not identied initially in , it is determined by the others. The chance whether the
`third' is a head or a tail is a conditional probability, not an unconditional probability.
Easy calculations show that
)
P(3rd is Hj HH) 14 HH denotes the event that there
P(3rd is T j HH) 4 3 are at least two heads:
)
P(3rd is T jTT ) 14 TT denotes the event that there
P(3rd is HjTT ) 4 3 are at least two tails:
In no circumstances therefore is it true that it is an evens chance whether the `third' is a
head or a tail; the argument collapses. s

Bertrand's other paradox. There are three boxes. One contains two black counters,
one contains two white counters and one contains a black and a white counter. Pick a box
at random and remove a counter without looking at it; it is equally likely to be black or
white. The other counter is equally likely to be black or white. Therefore the chance that
your box contains identical counters is 12. But this is clearly false: the correct answer is 23.

Resolution. This is very similar to Galton's paradox. Having picked a box and
counter, the probability that the other counter is the same is a conditional probability, not
an unconditional probability. Thus easy calculations give (with an obvious notation)
(1) P(both blackjB) 23 P(both whitejW );
in neither case is it true that the other counter is equally likely to be black or white. s

Simpson's paradox. A famous clinical trial compared two methods of treating kidney
stones, either by surgery or nephrolithotomy; we denote these by S and N respectively. In
84 2 The rules of probability

all, 700 patients were treated, 350 by S and 350 by N. Then it was found that for cure
rates
273
P(curejS ) ' 0:78,
350
289
P(curejN ) ' [Link]
35
Surgery seems to have an inferior rate of success at cures. However, the size of the stones
removed was also recorded in two categories:
L diameter more than 2 cm,
T diameter less than 2 cm:
When patients were grouped by stone size as well as treatment, the following results
emerged:
P(curejS \ T) ' 0:93,
P(curejN \ T) ' 0:87,
and
P(curejS \ L) ' 0:73,
P(curejN \ L) ' [Link]
In both these cases surgery has the better success rate; but when the data are pooled to
ignore stone size, surgery has an inferior success rate. This seems paradoxical, which is
why it is known as Simpson's paradox. However, it is a perfectly reasonable property of a
probability distribution, and occurs regularly. Thus it is not a paradox.
Another famous example arose in connection with the admission of graduates to the
University of California at Berkeley. Women in fact had a better chance than men of
being admitted to individual faculties, but when the gures were pooled they seemed to
have a smaller chance. This situation arose because women applied in much greater
numbers to faculties where everyone had a slim chance of admission. Men tended to
apply to faculties where everyone had a good chance of admission. s

The switching paradox: goats and cars, the Monty Hall problem. Television has
dramatically expanded the frontiers of inanity, so you are not too surprised to be faced
with the following decision. There are three doors; behind one there is a costly car,
behind two there are cheap (non-pedigree) goats. You will win whatever is behind the
door you nally choose. You make a rst choice, but the presenter does not open this
door, but a different one (revealing a goat), and asks you if you would like to change your
choice to the nal unopened door that you did not choose at rst. Should you accept this
offer to switch? Or to put it another way: what is the probability that the car is behind
your rst choice compared to the probability that it lies behind this possible fresh choice?

Answer. The blunt answer is that you cannot calculate this probability as the
question stands. You can only produce an answer if you assume that you know how the
presenter is running the show. Many people nd this unsatisfactory, but it is important
to realize why it is the unpalatable truth. We discuss this later; rst we show why there
is no one answer.
2.13 Popular paradoxes 85

I The `usual' solution. The usual approach assumes that the presenter is attempting
to make the `game' longer and less dull. He is therefore assumed to behaving as follows.
Rules. Whatever your rst choice, he will show you a goat behind a different door;
with a choice of two goats he picks either at random.
Let the event that the car is behind the door you chose rst be C f , let the event that the
car is behind your alternative choice be C a , and let the event that the host shows you a
goat be G. We require P(C a jG), and of course we assume that initially the car is equally
likely to be anywhere. Call your rst choice D1 , the presenter's open door D2, and the
alternative door D3. Then
P(C a \ G)
(2) P(C a jG)
P(G)
P(GjC a )P(C a )

P(GjC f )P(C f ) P(GjC a )P(C a )
P(GjC a )
,
P(GjC f ) P(GjC a )
because P(C a ) P(C f ), by assumption.
Now by the presenter's rules
P(GjC a ) 1
because he must show you the goat behind D3 . However,
P(GjC f ) 12
because there are two goats to choose from, behind D2 and D3, and he picks the one
behind D3 with probability 12. Hence
1
P(C a jG) 2:
1 12 3

II The `cheapskate' solution. Suppose we make a different set of assumptions.

Assume the presenter is trying to save some money (the show has given away too many
cars lately). He thus behaves as follows.
Rules.
(i) If there is a goat behind the rst door you choose, then he will open that door with
no further ado.
(ii) If there is a car behind the rst door, then he will open another door (D3 ), and hope
you switch to D2 .
In this case obviously P(C a jG) 0, because you only get the opportunity to switch when
the rst door conceals the car.

III The `maa' solution. There are other possible assumptions; here is a very realistic
set-up. Unknown to the producer, you and the presenter are members of the same family.
If the car is behind D1, he opens the door for you; if the car is behind D2 or D3, he opens
the other door concealing a goat. You then choose the alternative because obviously
P(C a jG) 1: s
86 2 The rules of probability

Remark. This problem is also sometimes known as the Monty Hall problem, after the
presenter of a programme that required this type of decision from participants. It
appeared in this form in Parade magazine, and generated a great deal of publicity and
follow-up articles. It had, however, been around in many other forms for many years
before that.
Of course this is a trivial problem, albeit entertaining, but it is important. This
importance lies in the lesson that, in any experiment, the procedures and rules that dene
the sample space and all the probabilities must be explicit and xed before you begin.
This predetermined structure is called a protocol. Embarking on experiments without a
complete protocol has proved to be an extremely convenient method of faking results
over the years. And will no doubt continue to be so.
There are many more `paradoxes' in probability. As we have seen, few of them are
genuinely paradoxical. For the most part such results attract fame simply because
someone once made a conspicuous error, or because the answer to some problem is
contrary to uninformed intuition. It is notable that many such errors arise from an
incorrect use of Bayes' rule, despite the fact that as long ago as 1957, W. Feller wrote this
warning:
Unfortunately Bayes' rule has been somewhat discredited by metaphysical applica-
tions of the type described by Laplace. In routine practice this kind of argument can
be dangerous . . . . Plato used this type of argument to prove the existence of
Atlantis, and philosophers used it to prove the absurdity of Newtonian mechanics.
Of course Atlantis never existed, and Newtonian mechanics are not absurd. But despite
all this experience, the popular press and even, sometimes, learned journals continue to
print a variety of these bogus arguments in one form or another.

Exercises for section 2.13

1. Prisoners paradox. Three prisoners, A, B, and C, are held in solitary connement. The
warder W tells each of them that two are to be freed, the third is to be ogged. Prisoner A, say,
then knows his chance of being released is 23. At this point the warder reveals to A that one of
those to be released is B; this warder is known to be truthful. Does this alter A's chance of
release? After all, he already knew that one of B or C was to be released. Can it be that knowing
the name changes the probability?

2. Goats and cars revisited. The `incompetent' solution. Due to a combination of indolence
and incompetence the presenter has failed to nd out which door the car is actually behind. So
when you choose the rst door, he picks another at random and opens it (hoping it does not
conceal the car). Show that in this case P(C a jG) 12.

2 . 1 4 R E V I E W: N OTAT I O N A N D RU L E S
In this chapter we have used our intuitive ideas about probability to formulate rules that
probability must satisfy in general. We have introduced some simple standard notation to
help us in these tasks; we summarize the notation and rules here.
2.14 Review: notation and rules 87

I Notation
: sample space of outcomes
A, B, C, . . .: possible events included in
: impossible event
P(:): the probability function
P(A): the probability that A occurs
A [ B: union; either A or B occurs or both occur
A \ B: intersection; both A and B occur
A c : complementary event
A B: inclusion; B occurs if A occurs
AnB: difference; A occurs and B does not

II Rules
Range: 0 < P(A) < 1
Impossible event: P() 0
Certain event: P() 1
Addition: P(A [ B) P(A) P(B) when A \ B
P
Countable addition: P([ i A i ) i P(A i ) when (A i ; i > 1) are disjoint events
Inclusionexclusion: P(A [ B) P(A) P(B) P(A \ B)
Complement: P(A c ) 1 P(A)
Difference: when B A, P(AnB) P(A) P(B)
Conditioning: P(AjB) P(A \ B)=P(B)
Addition: P(A [ BjC) P(AjC) P(BjC) when A \ C and B \ C are disjoint
Multiplication: P(A \ B \ C) P(AjB \ C)P(BjC)P(C)
P
The partition rule: P(A) i P(AjBi )P(Bi ) when (Bi ; i > 1) are disjoint events, and
A [ i Bi
Bayes' rule: P(Bi jA) P(AjBi )P(Bi )=P(A)
Independence: A and B are independent if and only if P(A \ B) P(A)P(B)
This is equivalent to P(AjB) P(A) and to P(BjA) P(B)
Conditional independence: A and B are conditionally independent given C when
P(A \ BjC) P(AjC)P(BjC)
Value and expected value: If an experiment yields the numerical outcome a with
probability p, or zero otherwise, then its value (or expected value) is ap
88 2 The rules of probability

2 . 1 5 A P P E N D I X . D I F F E R E N C E E Q UAT I O N S
On a number of occasions above, we have used conditional probability and independence to show
that the answer to some problem of interest is the solution of a difference equation. For example, in
example 2.11.7 we considered
(1) p n qp n1 ,
in example 2.11.8 we derived
(2) p n pp n1 pqp n2 , pq 6 0,
and in exercise 1 at the end of section 2.11 you derived
(3) p n (q p) p n1 p:
We need to solve such equations systematically. Note that any sequence (x r ; r > 0) in which each
term is a function of its predecessors, so that
(4) x r k f (x r , x r1 , . . . , x r k1 ), r > 0,
is said to satisfy the recurrence relation (4). When f is linear this is called a difference equation of
order k:
(5) x r k a0 x r a1 x r1 a k1 x r k1 g(r), a0 6 0:
When g(r) 0, the equation is homogeneous:
(6) x r k a0 x r a1 x r1 a k1 x r k1 , a0 6 0:
Solving (1) is easy because p n1 qp n2 , p n2 qp n3 and so on. By successive substitution we
obtain
p n q n p0 :
Solving (3) is nearly as easy when we notice that
p n 12
is a particular solution. Now writing p n 12 x n gives
x n (q p)x n1 (q p) n x0 :
Hence
p n 12 (q p) n x0 :
Equation (2) is not so easy but, after some work which we omit, it turns out that (2) has solution
(7) p n c1 1n c2 2n
where 1 and 2 are the roots of
x 2 px pq 0
and c1 and c2 are arbitrary constants. You can verify this by substituting (7) into (2).
Having seen these preliminary results, you will not now be surprised to see the general solution to
the second-order difference equation: let
(8) x r2 a0 x r a1 x r1 g(r), r > 0:
Suppose that (r) is any function such that
(r 2) a0 (r) a1 (r 1) g(r)
and suppose that 1 and 2 are the roots of
x 2 a0 a1 x:
Then the solution of (8) is given by

c1 1r c2 2r (r), 1 6 2
xr
(c1 c2 r)1r (r), 1 2 ,
where c1 and c2 are arbitrary constants. Here (r) is called a particular solution, and you should
note that 1 and 2 may be complex, as then may c1 and c2 .
2.16 Problems 89

The solution of higher-order difference equations proceeds along similar lines; there are more 's
and more c's.

2 . 1 6 P RO B L E M S
1. The classic slot machine has three wheels each marked with 20 symbols. You rotate the wheels
by means of a lever, and win if each wheel shows a bell when it stops. Assume that the outside
wheels each have one bell symbol, the central wheel carries 10 bells, and that wheels are
independently equally likely to show any of the symbols (academic licence). Find:
(a) the probability of getting exactly two bells;
(b) the probability of getting three bells.
2. You deal two cards from a conventional pack. What is the probability that their sum is 21?
(Court cards count 10, and aces 11.)
3. You deal yourself two cards, and your opponent two cards. Your opponent reveals that the sum
of those two cards is 21; what is the probability that the sum of your two cards is 21? What is
the probability that you both have 21?
4. A weather forecaster says that the probability of rain on Saturday is 25%, and the probability of
rain on Sunday is 25%. Can you say the chance of rain at the weekend is 50%? What can you
say?
5. My lucky number is 3, and your lucky number is 7. Your PIN is equally likely to be any
number between 1001 and 9998. What is the probability that it is divisible by at least one of
our two lucky numbers?
6. You keep rolling a die until you rst roll a number that you have rolled before. Let A k be the
event that this happens on the kth roll.
(a) What is P(A12 )? (b) Find P(A3 ) and P(A6 ).
7. Ann aims three darts at the bullseye and Bob aims one. What is the probability that Bob's dart
is nearest the ball? Given that one of Ann's darts is nearest, what is the probability that Bob's
dart is next nearest? (They are equally skilful.)
8. In the lottery of 1710, one in every 40 tickets yielded a prize. It was widely believed at the
time that you needed to buy 40 tickets at least, to have a better than evens chance of a prize.
Was this belief correct?
9. (a) You have two red cards and two black cards. Two cards are picked at random; show that
the probability that they are the same colour is 13.
(b) You have one red card and two black cards; show that the probability that two cards
picked at random are the same colour is 13. Are you surprised?
(c) Calculate this probability when you have
(i) three red cards and three black cards, (ii) two red cards and three black cards.
10. A box contains three red socks and two blue socks. You remove socks at random one by one
until you have a pair. Let T be the event that you need only two removals, R the event that the
rst sock is red and B the event that the rst sock is blue. Find
(a) P(BjT ), (b) P(RjT ), (C) P(T ):
11. Let A, B and C be events. Show that
A \ B (A c [ Bc ) c ,
and
A [ B [ C (A c \ Bc \ C c ) c :
90 2 The rules of probability

12. Let (A r ; r > 1) be events. Show that for all n > 1,

!c !c
[
n \n \
n [
n
c
Ar A r and Ar A cr :
r1 r1 r1 r1

Let A and B be events with P(A) 35 and P(B) 12. Show that
1
10 < P(A \ B) < 12
and give examples to show that both extremes are possible. Can you nd bounds for P(A [ B)?
14. Show that if P(AjB) . P(A), then
P(BjA) . P(B) and P(A c jB) , P(A c ):
15. Show that if A is independent of itself, then either P(A) 0 or P(A) 1.
16. A pack contains n cards labelled 1, 2, 3, . . . , n (one number on each card). The cards are dealt
out in random order. What is the probability that
(a) the kth card shows a larger number than its k 1 predecessors?
(b) each of the rst k cards shows a larger number than its predecessors?
(c) the kth card shows n, given that the kth card shows a larger number than its k 1
predecessors?
17. Show that P(AnB) < P(A):
18. Show that
! !
[
n X X \
n
n1
P Ar P(A r ) P(A r \ A s ) () P Ar :
r1 r r,s r1
T
Is there a similar formula for P( nr1 A r )?
19. Show that
P(A \ B) P(A)P(B) P((A [ B) c ) P(A c )P(Bc ):
20. An urn contains a amber balls and b buff balls. A ball is removed at random.
(a) What is the probability that it is amber?
(b) Whatever colour it is, it is returned to the urn with a further c balls of the same colour as
the rst. Then a second ball is drawn at random from the urn. Show that the probability
that it is amber is .
21. In the game of antidarts a player shoots an arrow into a rectangular board measuring six metres
by eight metres. If the arrow is within one metre of the centre it scores 1 point, between one
and two metres away it scores 2, between two and three metres it scores 3, between three and
four metres and yet still on the board it scores 4, and further than four metres but still on the
board it scores 5. William Tell always lands his arrows on the board but otherwise they are
purely random.
3
(a) Show that the probability that his rst arrow scores more than 3 points is 1 16.

B
A C

Figure 2.23. The mole's burrows.

2.16 Problems 91

(b) Find the probability that he scores a total of exactly 4 points in his rst two arrows.
(c) Show that the probability that he scores exactly 15 points in three arrows is given by
3
2 3 1 p
1 sin1 7 :
3 4 8
22. A mole has a network of burrows as shown in gure 2.23. Each night he sleeps at one of the
junctions. Each day he moves to a neighbouring junction but he chooses a passage randomly,
all choices being equally likely from those available at each move.
(a) He starts at A. Find the probability that two nights later he is at B.
(b) Having arrived at B, nd the pr