Probability
an introduction
Geoffrey Grimmett
School of Mathematics, University of Bristol
Dominic Welsh
Merton College, Oxford
CLARENDON PRESS OXFORD
1986
Oxford University Press, Walton Street, Oxford OX2 6DP
Oxford New York Toronto
Delhi Bombay Calcutta Madras Karachi
Petaling Jaya Singapore Hong Kong Tokyo
Nairobi Dar es Salaam Cape Town
Melbourne Auckland
and associated companies in
Beirut Berlin Ibadan Nicosia
Oxford is a trade mark of Oxford University Press
Published in the United States
by Oxford University Press, New York
© Geoffrey Grimmett and Dominic Welsh, 1986
All rights reserved. No part of this publication may be reproduced,
stored in a retrieval system, or transmitted, in any form or by any means,
electronic, mechanical, photocopying, recording, or otherwise, without
the prior permission of Oxford University Press
This book is sold subject to the condition that it shall not, by way
of trade or otherwise, be lent, re-sold, hired out or otherwise circulated
without the publisher's prior consent in any form of binding or cover
other than that in which it is published and without a similar condition
including this condition being imposed on the subsequent purchaser
British Library Cataloguing in Publication Data
Grimmett, Geoffrey
Probability: an introduction.
1. Probabilities
I. Title II. Welsh, Dominic
519.2 QA273
ISBN 0-19-853272-5
ISBN 0-19-853264-4 (pbk.)
Library of Congress Cataloging in Publication Data
Grimmett, Geoffrey.
Probability: an introduction.
Bibliography: p.
Includes index.
1. Probabilities. I. Welsh, D. J. A. II. Title.
QA273.G735 1986 519.2 85-31035
ISBN 0-19-853272-5
ISBN 0-19-853264-4 (pbk.)
Filmset and printed in Northern Ireland by The Universities Press (Belfast) Ltd.

Preface
Probability and statistics are now taught widely in schools and are
integral parts of many O-level and A-level syllabuses. Consequently the
attitudes of universities towards these subjects have changed over the last
few years and, at many universities, first-year mathematics students learn
material which was previously taught in the third year only. This text is
based upon first and second year courses in probability theory which are
given at the Universities of Bristol and Oxford.
Broadly speaking we cover the usual material, but we hope that our
account will have certain special attractions for the reader and we shall
say what these may be in a moment. The first eight chapters form a
course in basic probability, being an account of events, random variables,
and distributions—we treat discrete and continuous random variables
separately—together with simple versions of the law of large numbers
and the central limit theorem. There is an account of moment generating
functions and their applications. The last three chapters are about
branching processes, random walks, and continuous-time random pro-
cesses such as the Poisson process; we hope that these chapters are
adequate at this level and are suitable appetizers for courses in applied
probability and random processes. We have deliberately omitted various
topics which are crucial in more advanced treatments, such as the theory
of Markov chains, and we hope that most critics will agree with such
decisions. In the case of Markov chains, we could not justify to ourselves
the space required to teach more than mere fragments of the theory. On
the other hand we have included a brief treatment of characteristic
functions in two optional sections for the more advanced reader.
We have divided the text into three sections: (A) Probability, (B)
Further Probability, and (C) Random Processes. In doing so we hope to
indicate two things. First, the probability in Part A seems to us to be core
material for first-year students, whereas the material in Part B is
somewhat more difficult. Secondly, although random processes are
collected together in the final three chapters, they may well be introduced
much earlier in the course. The chapters on branching processes and
random walks might well come after Chapter 5, and the chapter on
continuous-time processes after Chapter 6.
We have two major aims: to be concise and to be honest about
mathematical rigour. Some will say that this book reads like a set of
lecture notes. We would not regard this as entirely unfair; indeed a
principal reason for writing it was that we believe that most students
benefit more from possessing a compact account of the subject in 200
printed pages or so (at a suitable price) than a diffuse account of 400
pages. Most undergraduates learn probability theory by attending lec-
tures, at which they normally take copious and occasionally incorrect
notes; they may also attend tutorials and classes. Few are the under-
graduates who learn probability in private by relying on a textbook as the
sole, or even principal, source of inspiration and learning. Although
some will say that this book is too difficult, it is the case that first-year
students at many universities learn some quite difficult things, such as
axiomatic systems in algebra and ε/δ analysis, and we doubt if much of
the material covered here is inherently more challenging than these.
Also, lecturers and tutors have certain advantages over authors—they
have the power to hear and speak to their audiences—and these
advantages should help them to explain the harder things to their
students.
Here are a few words about our approach to rigour. It is clearly
impossible to prove everything with complete rigour at this level; on the
other hand it is important that students should understand why rigour is
necessary. We try to be rigorous where possible, and elsewhere we go to
some lengths to point out how and where we skate over thin ice. This can
occasionally be tedious.
Most sections finish with a few exercises; these are usually completely
routine, and students should do them as a matter of course. Each chapter
finishes with a collection of problems; these are often much harder than
the exercises, and include many parts of questions taken from examina-
tion papers set in Bristol and Oxford; we acknowledge permission from
Bristol University and from Oxford University Press in this regard. There
is a final chapter containing some hints for solving the problems.
Problems marked with an asterisk may be rather difficult.
We hope that the remaining mistakes and misprints are not held
against us too much, and that they do not pose overmuch of a hazard to
the reader. Only with the kind help of our students have we reduced
them to the present level. Finally we thank Rhoda Rees for typing the
manuscript with such skill, speed and good cheer.
Bristol and Oxford G.G.
July 1985 D.W.

Contents
A. BASIC PROBABILITY
1 Events and probabilities
1.1 Experiments with chance
1.2 Outcomes and events
1.3 Probabilities
1.4 Probability spaces
1.5 Discrete sample spaces
1.6 Conditional probabilities
1.7 Independent events
1.8 The partition theorem
1.9 Probability measures are continuous
1.10 Worked problems
1.11 Problems
2 Discrete random variables
2.1 Probability mass functions
2.2 Examples
2.3 Functions of discrete random variables
2.4 Expectation
2.5 Conditional expectation and the partition theorem
2.6 Problems
3 Multivariate discrete distributions and independence
3.1 Bivariate discrete distributions
3.2 Expectation in the multivariate case
3.3 Independence of discrete random variables
3.4 Sums of random variables
3.5 Problems
4 Probability generating functions
4.1 Generating functions
4.2 Integer-valued random variables
4.3 Moments
4.4 Sums of independent random variables
4.5 Problems
5 Distribution functions and density functions
5.1 Distribution functions
5.2 Examples of distribution functions
5.3 Continuous random variables
5.4 Some common density functions
5.5 Functions of random variables
5.6 Expectations of continuous random variables
5.7 Problems
B. FURTHER PROBABILITY
6 Multivariate distributions and independence
6.1 Random vectors and independence
6.2 Joint density functions
6.3 Marginal density functions and independence
6.4 Sums of continuous random variables
6.5 Changes of variables
6.6 Conditional density functions
6.7 Expectations of continuous random variables
6.8 Conditional expectation and the bivariate normal distribution
6.9 Problems
7 Moments, and moment generating functions
7.1 A general note
7.2 Moments
7.3 Variance and covariance
7.4 Moment generating functions
7.5 Characteristic functions
7.6 Problems
8 The two main limit theorems
8.1 The law of averages
8.2 Chebyshev's inequality and the weak law
8.3 The central limit theorem
8.4 Convergence in distribution, and characteristic functions
8.5 Problems
C. RANDOM PROCESSES
9 Branching processes
9.1 Random processes
9.2 A model for population growth
9.3 The generating-function method
9.4 An example
9.5 The probability of extinction
9.6 Problems
10 Random walks
10.1 One-dimensional random walks
10.2 Transition probabilities
10.3 Recurrence and transience in random walks
10.4 The Gambler's Ruin problem
10.5 Problems
11 Random processes in continuous time
11.1 Life at a telephone exchange
11.2 Poisson processes
11.3 Inter-arrival times and the exponential distribution
11.4 Population growth and the simple birth process
11.5 Birth and death processes
11.6 A simple queueing model
11.7 Problems
Appendix: Difference equations
Answers to exercises
Remarks on the problems
Reading list
Index
A. Basic Probability

1
Events and probabilities

1.1 Experiments with chance
Many actions have outcomes which are largely unpredictable in
advance—tossing a coin and throwing a dart are simple examples.
Probability theory is about such actions and their consequences. The
mathematical theory starts with the idea of an experiment (or trial),
being a course of action whose consequence is not predetermined;
this experiment is reformulated as a mathematical object called a
probability space. In broad terms the probability space corresponding
to a given experiment comprises three items:
(i) the set of all possible outcomes of the experiment,
(ii) a list of all the events which may possibly occur as consequences
of the experiment,
(iii) an assessment of the likelihoods of these events.
For example, if the experiment is the throwing of a fair six-sided die,
then the probability space contains
(i) the set {1, 2, 3, 4, 5, 6} of possible outcomes,
(ii) a list of events such as ‘the result is 3’,
‘the result is at least 4’,
‘the result is a prime number’,
(iii) the assessment that each number 1, 2, 3, 4, 5, 6 is equally likely
to be the result of the throw.
Given any experiment involving chance, there is a corresponding
probability space, and the study of such spaces is called probability
theory. Next, we shall see how to construct such spaces more
explicitly.
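The three ingredients above can be set out in code. The following sketch (ours, not the book's) models the probability space of the fair die, using Python's fractions module so that probabilities are exact.

```python
from fractions import Fraction
from itertools import combinations

# (i) the set of all possible outcomes of the experiment
outcomes = {1, 2, 3, 4, 5, 6}

# (ii) the list of events: here, every subset of the outcome set
events = [set(c) for r in range(len(outcomes) + 1)
          for c in combinations(sorted(outcomes), r)]

# (iii) the assessment of likelihoods: equally likely outcomes
def P(A):
    """Probability of event A when all outcomes are equally likely."""
    return Fraction(len(A), len(outcomes))

print(P({3}))        # 'the result is 3'
print(P({4, 5, 6}))  # 'the result is at least 4'
print(P({2, 3, 5}))  # 'the result is a prime number'
```

The three printed values are 1/6, 1/2, and 1/2 respectively.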
1.2 Outcomes and events
We use the letter ℰ to denote a particular experiment whose outcome is not completely predetermined. The first thing which we do is to make a list of all the possible outcomes of ℰ; the set of all such possible outcomes is called the sample space of ℰ and we usually denote it by Ω. The Greek letter ω denotes a typical member of Ω, and we call each member ω of Ω an elementary event.
If, for example, ℰ is the experiment of throwing a fair die once, then Ω = {1, 2, 3, 4, 5, 6}. There are many questions which we may
wish to ask about the actual outcome of this experiment (questions such as 'is the outcome a prime number?'), and all such questions may be rewritten in terms of subsets of Ω (the previous question becomes 'does the outcome lie in the subset {2, 3, 5} of Ω?'). The second thing which we do is to make a list of all the events which are interesting to us; this list takes the form of a collection of subsets of Ω, each such subset A representing the event 'the outcome of ℰ lies in A'. Thus we ask 'which possible events are interesting to us' and then we make a list of the corresponding subsets of Ω. This relationship between events and subsets is very natural, especially because two or more events combine with each other in just the same way as the corresponding subsets combine; for example, if A and B are subsets of Ω then
the set A ∪ B corresponds to the event 'either A or B occurs',
the set A ∩ B corresponds to the event 'both A and B occur',
the set Ω\A corresponds† to the event 'A does not occur',
where we say that a subset C of Ω 'occurs' whenever the outcome of ℰ lies in C. Thus all set-theoretic statements and combinations may be interpreted in terms of events; for example, the formula
Ω\(A ∩ B) = (Ω\A) ∪ (Ω\B)
may be read as 'if A and B do not both occur, then either A does not occur or B does not occur'. In a similar way, if A₁, A₂, ... are events then the sets ⋃ᵢ Aᵢ and ⋂ᵢ Aᵢ represent the events 'Aᵢ occurs, for some i' and 'Aᵢ occurs, for every i', respectively.
Thus we write down a collection F = {Aᵢ : i ∈ I} of subsets of Ω which are interesting to us; each A ∈ F is called an event. In simple cases, such as the die-throwing example above, we usually take F to be the set of all subsets of Ω (called the power set of Ω), but for reasons which may be appreciated later there are many circumstances in which we take F to be a very much smaller collection than the entire power set. In all cases we demand a certain consistency of F, in the following sense. If A, B, C, ... ∈ F then we may reasonably be interested also in the events 'A does not occur' and 'at least one of A, B, C, ... occurs'. With this in mind we require that F satisfy the following definition.
The collection F of subsets of the sample space Ω is called an event space if
(1) F is non-empty,
(2) if A ∈ F then Ω\A ∈ F,
(3) if A₁, A₂, ... ∈ F then A₁ ∪ A₂ ∪ ··· ∈ F.
† For any subset A of Ω, the complement of A is the set of all members of Ω which are not members of A. We denote the complement of A by either Ω\A or Aᶜ, depending on the context.
We speak of an event space F as being 'closed under the operations of taking complements and countable unions'. An elementary consequence of axioms (1)-(3) is that an event space F must contain the empty set ∅ and the whole set Ω. This holds since F contains some set A (from (1)), and hence F contains Ω\A (from (2)), giving also that F contains the union Ω = A ∪ (Ω\A) together with the complement Ω\Ω = ∅ of this last set.
Here are some examples of pairs (Ω, F) of sample spaces and event spaces.
Example 4. Ω is any set and F is the power set of Ω. □
Example 5. Ω is any set and F = {∅, A, Ω\A, Ω} where A is a given subset of Ω. □
Example 6. Ω = {1, 2, 3, 4, 5, 6} and F is the collection
∅, {1, 2}, {3, 4}, {5, 6}, {1, 2, 3, 4}, {3, 4, 5, 6}, {1, 2, 5, 6}, Ω
of subsets of Ω. This event space is unlikely to arise naturally in practice. □
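The event-space axioms are easy to check mechanically on a finite example. The following sketch (our own, not from the book) verifies that the collection of Example 6 is closed under complementation and under unions.

```python
from itertools import combinations

# The sample space and the eight-member collection of Example 6.
omega = frozenset({1, 2, 3, 4, 5, 6})
F = {frozenset(), frozenset({1, 2}), frozenset({3, 4}), frozenset({5, 6}),
     frozenset({1, 2, 3, 4}), frozenset({3, 4, 5, 6}),
     frozenset({1, 2, 5, 6}), omega}

# Axiom (2): the complement of every member is a member.
closed_complement = all(omega - A in F for A in F)

# Axiom (3), which for a finite collection reduces to pairwise unions.
closed_union = all(A | B in F for A, B in combinations(F, 2))

print(closed_complement, closed_union)  # True True
```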
Exercises
In these exercises, Ω is a set and F is an event space of subsets of Ω.
1. If A, B ∈ F, show that A ∩ B ∈ F.
2. The difference A\B of two subsets A and B of Ω is the set A ∩ (Ω\B) of all points of Ω which are in A but not in B. Show that if A, B ∈ F, then A\B ∈ F.
3. The symmetric difference A △ B of two subsets A and B of Ω is defined to be the set of points of Ω which are in either A or B but not in both. If A, B ∈ F, show that A △ B ∈ F.
4. If A₁, A₂, ..., Aₘ ∈ F and k is a positive integer, show that the set of points in Ω which belong to exactly k of the A's belongs to F (the previous exercise is the case when m = 2 and k = 1).
5. Show that if Ω is a finite set then F contains an even number of subsets of Ω.
1.3 Probabilities
From our experiment ℰ, we have so far constructed a sample space Ω and an event space F associated with ℰ, but there has been no mention yet of probabilities. The third thing which we do is to allocate probabilities to each event in F, writing P(A) for the probability of the event A. We shall assume that this can be done in such a way that the probability function P satisfies certain intuitively attractive conditions:
(i) each event A in the event space should have a probability P(A) which lies between 0 and 1;
(ii) the event Ω, that 'something happens', should have probability 1, and the event ∅, that 'nothing happens', should have probability 0;
(iii) if A and B are disjoint events (so that A ∩ B = ∅) then P(A ∪ B) = P(A) + P(B).
We collect these conditions into a formal definition as follows.
A mapping P : F → ℝ is called a probability measure on (Ω, F) if
(7) P(A) ≥ 0 for all A ∈ F,
(8) P(Ω) = 1 and P(∅) = 0,
(9) if A₁, A₂, ... are disjoint events in F (so that Aᵢ ∩ Aⱼ = ∅ whenever i ≠ j) then
P(⋃ᵢ Aᵢ) = Σᵢ P(Aᵢ).
We emphasize that a probability measure P on (Ω, F) is defined only on those subsets of Ω which lie in F. The second part of condition (8) is superfluous in the above definition; to see this, note that ∅ and Ω are disjoint events with union Ω ∪ ∅ = Ω, and so P(Ω) = P(Ω ∪ ∅) = P(Ω) + P(∅) by (9).
Condition (9) requires that the probability of the union of a countable† collection of non-overlapping sets is the sum of the individual probabilities.
Example 10. Let Ω be a set and A be a proper subset of Ω (so that A ≠ ∅, Ω). If F is the event space {∅, A, Ω\A, Ω} then all probability measures on (Ω, F) have the form
P(∅) = 0, P(A) = p, P(Ω\A) = 1 − p, P(Ω) = 1,
for some p satisfying 0 ≤ p ≤ 1. □
† A set S is called countable if it may be put in one-one correspondence with a subset of the natural numbers {1, 2, 3, ...}.
Example 11. Let Ω = {ω₁, ω₂, ..., ω_N} be a finite set of exactly N points, and let F be the power set of Ω. It is easy to check that the function P defined by†
P(A) = |A|/N for A ∈ F
is a probability measure on (Ω, F). □
Exercises
6. Let p₁, p₂, ..., p_N be non-negative numbers such that p₁ + p₂ + ··· + p_N = 1, and let Ω = {ω₁, ω₂, ..., ω_N}, with F the power set of Ω, as in Example 11 above. Show that the function Q given by
Q(A) = Σ_{ωᵢ∈A} pᵢ for A ∈ F
is a probability measure on (Ω, F). Is Q a probability measure if F is not the power set of Ω but merely some event space of subsets of Ω?
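The construction of Exercise 6 can be sketched directly in code. This is our own illustration, with weights chosen arbitrarily; it checks the measure properties on one example rather than proving them.

```python
from fractions import Fraction

# Non-negative weights summing to 1, one per point of a four-point Ω
# (the particular values here are illustrative, not from the text).
p = {1: Fraction(1, 2), 2: Fraction(1, 4), 3: Fraction(1, 8), 4: Fraction(1, 8)}

def Q(A):
    """Q(A) = sum of the weights of the points of A."""
    return sum((p[w] for w in A), Fraction(0))

print(Q({1, 2, 3, 4}))  # Q(Ω) = 1
print(Q(set()))         # Q(∅) = 0
# Additivity on a pair of disjoint events:
print(Q({1, 3}) + Q({2, 4}) == Q({1, 2, 3, 4}))  # True
```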
1.4 Probability spaces
We now combine the previous ideas and define a probability space to be a triple (Ω, F, P) of objects such that
(i) Ω is a set,
(ii) F is an event space of subsets of Ω,
(iii) P is a probability measure on (Ω, F).
There are many elementary consequences of the axioms which underlie this definition, and we describe some of these. Let (Ω, F, P) be a probability space.
(12) If A, B ∈ F then‡ A\B ∈ F.
Proof. The complement of A\B equals (Ω\A) ∪ B, which is the union of events and is therefore an event. Hence A\B is an event, by (2). □
(13) If A₁, A₂, ... ∈ F then ⋂ᵢ Aᵢ ∈ F.
Proof. The complement of ⋂ᵢ Aᵢ equals ⋃ᵢ (Ω\Aᵢ), which is the union of the complements of events and is therefore an event. Hence the intersection of the A's is an event also, as before. □
(14) If A ∈ F then P(A) + P(Ω\A) = 1.
† The cardinality |A| of a set A is the number of points in A.
‡ A\B = A ∩ (Ω\B) is the set of points in A which are not in B.
Proof. A and Ω\A are disjoint events with union Ω, and so
1 = P(Ω) = P(A) + P(Ω\A). □
(15) If A, B ∈ F then P(A ∪ B) + P(A ∩ B) = P(A) + P(B).
Proof. The set A is the union of the disjoint sets A\B and A ∩ B, and hence
P(A) = P(A\B) + P(A ∩ B) by (9).
A similar remark holds for the set B, giving that
P(A) + P(B) = P(A\B) + 2P(A ∩ B) + P(B\A)
= P((A\B) ∪ (A ∩ B) ∪ (B\A)) + P(A ∩ B) by (9)
= P(A ∪ B) + P(A ∩ B). □
(16) If A, B ∈ F and A ⊆ B then P(A) ≤ P(B).
Proof. P(B) = P(A) + P(B\A) ≥ P(A). □
It is often useful to draw a Venn diagram when working with probabilities. For example, to show the formula in (15) above we might draw the diagram in Fig. 1.1, and note that the probability of A ∪ B is the sum of P(A) and P(B) minus P(A ∩ B), since this latter probability is counted twice in the simple sum P(A) + P(B).
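For a finite sample space, identity (15) can be confirmed exhaustively. The following check is ours (a sketch on a four-point equiprobable space, not part of the text).

```python
from fractions import Fraction
from itertools import combinations

# All 16 events of a four-point equiprobable space.
omega = frozenset({1, 2, 3, 4})
events = [frozenset(c) for r in range(5)
          for c in combinations(sorted(omega), r)]

def P(A):
    return Fraction(len(A), len(omega))

# Identity (15): P(A ∪ B) + P(A ∩ B) = P(A) + P(B), for every pair.
assert all(P(A | B) + P(A & B) == P(A) + P(B)
           for A in events for B in events)
print("identity (15) holds for all", len(events) ** 2, "pairs")
```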
Exercises
In these exercises (Ω, F, P) is a probability space.
7. If A, B ∈ F, show that
P(A\B) = P(A) − P(A ∩ B).
8. If A, B, C ∈ F, show that
P(A ∪ B ∪ C) = P(A) + P(B) + P(C) − P(A ∩ B) − P(A ∩ C) − P(B ∩ C) + P(A ∩ B ∩ C).
Fig. 1.1. A Venn diagram which illustrates the fact that P(A ∪ B) = P(A) + P(B) − P(A ∩ B).
9. Let A, B, C be three events such that
P(A)=i0, = P(B)=%, = P(C)=%,
P(ANB)=%, P(BNC)=%, P(ANC)=%,
P(ANBNC)=%.
By drawing a Venn diagram or otherwise, find the probability that
exactly two of the events A, B, C occur.
10. A fair coin is tossed 10 times (so that heads appears with probability ½ at each toss). Describe the appropriate probability space in detail for the two cases when
(i) the outcome of every toss is of interest,
(ii) only the total number of tails is of interest.
In the first case your event space should have 2^(2^10) events, but in the second case it should have only 2^11 events.
1.5 Discrete sample spaces
Let ℰ be an experiment with probability space (Ω, F, P). The structure of this space depends greatly upon whether Ω is a countable set (that is, a finite or countably infinite set) or an uncountable set. If Ω is a countable set then we normally take F to be the set of all subsets of Ω, for the following reason. Suppose that Ω = {ω₁, ω₂, ...} and, for each ω ∈ Ω, we are interested in whether or not this given ω is the actual outcome of ℰ; then we require that each singleton set {ω} belongs to F. Let A ⊆ Ω. Then A is countable (since Ω is countable) and so A may be expressed as the union of the countably many ω's which belong to A, giving that A = ⋃_{ω∈A} {ω} ∈ F by (3). The probability P(A) of the event A is determined by the collection {P({ω}) : ω ∈ Ω} of probabilities since, by (9),
P(A) = Σ_{ω∈A} P({ω}).
We usually write P(ω) for the probability P({ω}) of an event containing only one point in Ω.
Example 17. Equiprobable outcomes. If Ω = {ω₁, ω₂, ..., ω_N} and P(ωᵢ) = P(ωⱼ) for all i and j, then P(ω) = N⁻¹ for all ω ∈ Ω, and P(A) = |A|/N for all A ⊆ Ω. □
Example 18. Random integers. There are 'intuitively clear' statements which are without meaning in probability theory, and here is an example: if we pick a positive integer at random, then it is an even integer with probability ½. Interpreting 'at random' to mean that each positive integer is equally likely to be picked, then this experiment would have probability space (Ω, F, P) where
(i) Ω = {1, 2, 3, ...},
(ii) F is the set of all subsets of Ω,
(iii) if A ⊆ Ω then P(A) = Σ_{i∈A} π = |A|π, where π is the probability that any given integer, i say, is picked.
However,
if π = 0 then P(Ω) = 0,
if π > 0 then P(Ω) = Σ_{i=1}^∞ π = ∞,
neither of which is in agreement with the rule that P(Ω) = 1. One possible way of interpreting the italicized statement above is as follows. Fix N, a large positive integer, and let ℰ_N be the experiment of picking an integer from Ω_N = {1, 2, ..., N} at random. The probability that the outcome of ℰ_N is even is
½ if N is even, and ½(1 − 1/N) if N is odd,
so that, as N → ∞, the required probability tends to ½. Despite this sensible interpretation of the italicized statement, we emphasize that this statement is without meaning in its present form and should be shunned by serious probabilists. □
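The finite interpretation above is easy to compute directly. The sketch below (ours) evaluates the probability of an even outcome of the experiment on {1, ..., N} for several values of N and exhibits the convergence to ½.

```python
from fractions import Fraction

def prob_even(N):
    """Probability that a uniform pick from {1, ..., N} is even."""
    return Fraction(sum(1 for k in range(1, N + 1) if k % 2 == 0), N)

# Equals 1/2 for even N, and (1/2)(1 - 1/N) for odd N.
for N in (5, 10, 101, 1000):
    print(N, prob_even(N))
```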
Exercises
The most elementary problems in probability theory are those which involve
experiments such as the shuffling of cards or the throwing of dice, and these
usually give rise to situations in which every possible outcome is equally
likely to occur. This is the case of Example 17 above. Such problems usually
reduce to the problem of counting the number of ways in which some event
may occur, and the following exercises are of this type.
11. Show that if a coin is tossed n times, then there are exactly
(n choose k) = n!/(k!(n − k)!)
sequences of possible outcomes in which exactly k heads are obtained. If the coin is fair (so that heads and tails are equally likely on each toss), show that the probability of getting at least k heads is
2⁻ⁿ Σ_{i=k}^{n} (n choose i).
12. We distribute r distinguishable balls into n cells at random, multiple occupancy being permitted. Show that
(i) there are nʳ possible arrangements,
(ii) there are (r choose k)(n − 1)^{r−k} arrangements in which the first cell contains exactly k balls,
(iii) the probability that the first cell contains exactly k balls is
(r choose k)(1/n)ᵏ(1 − 1/n)^{r−k}.
13. Show that the probability that each of the four players in a game of bridge receives one ace is
24 · 48! · 13⁴ / 52! ≈ 0.105.
14. Show that the probability that two given hands in bridge contain k aces between them is
(4 choose k)(48 choose 26 − k) / (52 choose 26).
15. Show that the probability that a hand in bridge contains 6 spades, 3 hearts, 2 diamonds, and 2 clubs is
(13 choose 6)(13 choose 3)(13 choose 2)(13 choose 2) / (52 choose 13).
16. Which of the following is more probable:
(i) getting at least one six with 4 throws of a die,
(ii) getting at least one double six with 24 throws of two dice?
This is sometimes called 'de Méré's paradox', after the professional gambler Chevalier de Méré, who believed these two events to have equal probability.
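The two probabilities of Exercise 16 can be evaluated numerically via the complement rule P(at least one) = 1 − P(none); the short computation below is our own illustration, not the exercise's intended pen-and-paper solution.

```python
# At least one six in 4 throws of one die.
p_one_six = 1 - (5 / 6) ** 4

# At least one double six in 24 throws of two dice.
p_double_six = 1 - (35 / 36) ** 24

print(f"{p_one_six:.4f}")     # 0.5177
print(f"{p_double_six:.4f}")  # 0.4914
```

The first event is the more probable, which is what de Méré observed at the gaming table.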
1.6 Conditional probabilities
Let ℰ be an experiment with probability space (Ω, F, P). We may sometimes possess incomplete information about the actual outcome of ℰ without knowing this outcome exactly. For example, if we throw a fair die and a friend tells us that an even number is showing, then this information affects all our calculations of probabilities. In general, if A and B are events (that is, A, B ∈ F) and we are given that B occurs, then, in the light of this information, the new probability of A may no longer be P(A). Clearly, in this new circumstance, A occurs if and only if A ∩ B occurs, suggesting that the new probability of A is proportional to P(A ∩ B). We make this chat more formal in a definition.
If A, B ∈ F and P(B) > 0, then the (conditional) probability of A given B is denoted by P(A | B) and defined by
(19) P(A | B) = P(A ∩ B)/P(B).
Note that the constant of proportionality in (19) has been chosen so that the probability P(B | B), that B occurs given that B occurs, satisfies P(B | B) = 1. We must first check that this definition gives rise to a probability space.
Theorem 1A. If B ∈ F and P(B) > 0 then (Ω, F, Q) is a probability space where Q : F → ℝ is defined by Q(A) = P(A | B).
Proof. We need only check that Q is a probability measure on (Ω, F). Certainly Q(A) ≥ 0 for all A ∈ F, and
Q(Ω) = P(Ω | B) = P(Ω ∩ B)/P(B) = 1,
and it remains to check that Q satisfies (9). Suppose that A₁, A₂, ... are disjoint events in F. Then
Q(⋃ᵢ Aᵢ) = P((⋃ᵢ Aᵢ) ∩ B)/P(B)
= P(⋃ᵢ (Aᵢ ∩ B))/P(B)
= (1/P(B)) Σᵢ P(Aᵢ ∩ B) since P satisfies (9)
= Σᵢ Q(Aᵢ). □
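Theorem 1A can be checked exhaustively on a small space. The sketch below (our own, using the die and the 'even number is showing' event of the discussion above) verifies that Q(A) = P(A | B) satisfies the measure axioms.

```python
from fractions import Fraction
from itertools import combinations

omega = frozenset({1, 2, 3, 4, 5, 6})
events = [frozenset(c) for r in range(7)
          for c in combinations(sorted(omega), r)]

def P(A):
    return Fraction(len(A), len(omega))

B = frozenset({2, 4, 6})  # 'an even number is showing'

def Q(A):
    """Q(A) = P(A | B) = P(A ∩ B)/P(B)."""
    return P(A & B) / P(B)

assert Q(omega) == 1 and Q(frozenset()) == 0
# Additivity of Q over every pair of disjoint events.
assert all(Q(A1 | A2) == Q(A1) + Q(A2)
           for A1 in events for A2 in events if not A1 & A2)
print(Q(frozenset({2})))  # P({2} | even) = 1/3
```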
Exercises
17. If (Ω, F, P) is a probability space and A, B, C are events, show that
P(A ∩ B ∩ C) = P(A | B ∩ C)P(B | C)P(C)
so long as P(B ∩ C) > 0.
18. Show that
P(B | A) = P(A | B)P(B)/P(A)
if P(A) > 0 and P(B) > 0.
19. Consider the experiment of tossing a fair coin 7 times. Find the probability of getting a prime number of heads given that heads occurs on at least 6 of the tosses.
1.7 Independent events
We call two events A and B 'independent' if the occurrence of one of them does not affect the probability that the other occurs; more formally this requires that, if P(A), P(B) > 0, then
(20) P(A | B) = P(A) and P(B | A) = P(B).
Writing P(A | B) = P(A ∩ B)/P(B), we see that the following definition is appropriate.
(21) Events A and B of a probability space (Ω, F, P) are called independent if
P(A ∩ B) = P(A)P(B),
and dependent otherwise.
This definition is slightly more general than (20) since it allows A and B to have zero probability. It is easily generalized to more than two events.
(22) A family 𝒜 = (Aᵢ : i ∈ I) of events is called independent if, for all finite subsets J of I,
P(⋂_{i∈J} Aᵢ) = Π_{i∈J} P(Aᵢ).
The family 𝒜 is called pairwise independent if (22) holds whenever |J| = 2.
Thus, three events A, B, C are independent if and only if the following equalities hold:
P(A ∩ B ∩ C) = P(A)P(B)P(C), P(A ∩ B) = P(A)P(B),
P(A ∩ C) = P(A)P(C), P(B ∩ C) = P(B)P(C).
There are families of events which are pairwise independent but not independent.
Example 23. Suppose that we throw a fair four-sided die (you may think of this as a square die thrown in a two-dimensional universe). Then we may take Ω = {1, 2, 3, 4} where each ω ∈ Ω is equally likely to occur. The events A = {1, 2}, B = {1, 3}, C = {1, 4} are pairwise independent but not independent. □
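Example 23 can be checked directly; the following short computation (ours) confirms that the three pairwise equalities hold while the triple one fails.

```python
from fractions import Fraction

# The fair four-sided die of Example 23.
omega = {1, 2, 3, 4}
A, B, C = {1, 2}, {1, 3}, {1, 4}

def P(E):
    return Fraction(len(E), len(omega))

# Each pair satisfies P(X ∩ Y) = P(X)P(Y)  (each is 1/4 = 1/2 · 1/2)...
pairwise = all(P(X & Y) == P(X) * P(Y) for X, Y in [(A, B), (A, C), (B, C)])

# ...but P(A ∩ B ∩ C) = 1/4, whereas P(A)P(B)P(C) = 1/8.
triple = P(A & B & C) == P(A) * P(B) * P(C)

print(pairwise, triple)  # True False
```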
Exercises
20. Let A and B be events satisfying P(A), P(B) > 0, and such that P(A | B) = P(A). Show that P(B | A) = P(B).
21. If A and B are events which are disjoint and independent, what can be said about the probabilities of A and B?
22. Show that events A and B are independent if and only if A and Ω\B are independent.
23. Show that events A₁, A₂, ..., Aₘ are independent if and only if Ω\A₁, Ω\A₂, ..., Ω\Aₘ are independent.
24. If A₁, A₂, ..., Aₘ are independent and P(Aᵢ) = p for i = 1, 2, ..., m, find the probability that
(i) none of the A's occur,
(ii) an even number of the A's occur.
25. On your desk there is a very special die which has a prime number p of faces, and you throw this die once. Show that no two events A and B can be independent unless either A or B is the whole sample space.
1.8 The partition theorem
Let (Ω, F, P) be a probability space. A partition of Ω is a collection {Bᵢ : i ∈ I} of disjoint events (so that Bᵢ ∈ F for each i and Bᵢ ∩ Bⱼ = ∅ if i ≠ j) with union ⋃ᵢ Bᵢ = Ω. The following partition theorem is extremely useful.
Theorem 1B. If {B₁, B₂, ...} is a partition of Ω such that P(Bᵢ) > 0 for each i, then
P(A) = Σᵢ P(A | Bᵢ)P(Bᵢ) for all A ∈ F.
This theorem has several other fancy names, such as 'the theorem of total probability'; it is closely related to the so-called 'Bayes formula'.
Proof.
P(A) = P(A ∩ (⋃ᵢ Bᵢ))
= P(⋃ᵢ (A ∩ Bᵢ))
= Σᵢ P(A ∩ Bᵢ) by (9)
= Σᵢ P(A | Bᵢ)P(Bᵢ) by (19). □
Here is an example of this theorem in action.
Example 24. Tomorrow there will be either rain or snow but not both; the probability of rain is 1/3 and the probability of snow is 2/3. If it rains then the probability that I will be late for my lecture is 2/5, while the corresponding probability in the event of snow is 3/5. What is the probability that I will be late?
Solution. Let A be the event that I am late and B be the event that it rains. Then the pair B, Bᶜ is a partition of the sample space (since one or other of them must occur). We apply Theorem 1B to find that
P(A) = P(A | B)P(B) + P(A | Bᶜ)P(Bᶜ)
= (2/5)(1/3) + (3/5)(2/3) = 8/15. □
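The partition-theorem calculation has an immediate computational counterpart. The sketch below is ours; the numerical values (rain 1/3, late-given-rain 2/5, late-given-snow 3/5) are the illustrative ones used here.

```python
from fractions import Fraction

# B = 'it rains'; B and its complement partition the sample space.
p_rain = Fraction(1, 3)
p_snow = 1 - p_rain

p_late_given_rain = Fraction(2, 5)
p_late_given_snow = Fraction(3, 5)

# Theorem 1B: P(A) = P(A | B)P(B) + P(A | B^c)P(B^c).
p_late = p_late_given_rain * p_rain + p_late_given_snow * p_snow
print(p_late)  # 8/15
```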
Exercises
26. Here is a routine problem about balls in urns. You are presented with two urns. Urn I contains 3 white and 4 black balls and Urn II contains 2 white and 6 black balls. You pick a ball randomly from Urn I and place it in Urn II. Next you pick a ball randomly from Urn II. What is the probability that this ball is black?
27. A biased coin shows heads with probability p = 1 − q whenever it is tossed. Let uₙ be the probability that, in n tosses, no pair of heads occurs successively. Show that for n ≥ 1
u_{n+2} = q u_{n+1} + pq uₙ,
and find uₙ by the usual method (described in the appendix) if p = ½. (Hint: use Theorem 1B with Bᵢ the event that the first i − 1 tosses yield heads and the ith yields tails.)
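The recurrence in Exercise 27 is easy to validate numerically. The check below is ours (with p = 1/3 chosen arbitrarily): it computes uₙ from the recurrence with base cases u₁ = 1 and u₂ = 1 − p², and compares against brute-force enumeration of all coin sequences.

```python
from fractions import Fraction
from itertools import product

p = Fraction(1, 3)  # illustrative value, not from the exercise
q = 1 - p

def u_brute(n):
    """Probability that n tosses contain no two successive heads."""
    total = Fraction(0)
    for seq in product("HT", repeat=n):
        if "HH" not in "".join(seq):
            prob = Fraction(1)
            for s in seq:
                prob *= p if s == "H" else q
            total += prob
    return total

# Base cases, then the recurrence u_{n+2} = q u_{n+1} + pq u_n.
u = {1: Fraction(1), 2: 1 - p * p}
for n in range(1, 9):
    u[n + 2] = q * u[n + 1] + p * q * u[n]

assert all(u[n] == u_brute(n) for n in range(1, 11))
print(u[10])
```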
1.9 Probability measures are continuous
There is a certain property of probability measures which will be crucially important later, and we describe this next. Too great an emphasis should not be laid on this property at this stage, and therefore we strongly recommend to the reader that he omit this section at the first reading.
A sequence A₁, A₂, ... of events in a probability space (Ω, F, P) is called increasing if
A₁ ⊆ A₂ ⊆ ··· ⊆ A_{n−1} ⊆ Aₙ ⊆ A_{n+1} ⊆ ···.
The union
A = ⋃ᵢ Aᵢ
of such a sequence is called the limit of the sequence, and it is an elementary consequence of the axioms for an event space that A is an event. It is perhaps not surprising that the probability P(A) of A may be expressed as the limit lim_{n→∞} P(Aₙ) of the probabilities of the A's.
Theorem 1C. Let (Ω, F, P) be a probability space. If A₁, A₂, ... is an increasing sequence of events in F with limit A, then
P(A) = lim_{n→∞} P(Aₙ).
We precede the proof of the theorem with an application.
It is intuitively clear that the chance of obtaining no heads in an
infinite set of tosses of a fair coin is 0. A rigorous proof goes as
follows. Let A,, be the event that the first n tosses of the coin yield at
least one head. Then
A, CAns for n=1,2,...,
so that the A’s form an increasing sequence; the limit set A is the
event that heads occurs sooner or later. From Theorem 1C,
P(A) = lim P(A,,);16 Events and probabilities 1.10
however, P(A_n) = 1 − (1/2)^n and so P(A_n) → 1 as n → ∞. Thus P(A) = 1,
giving that the probability P(Ω\A), that heads never appears, equals
0. □

Proof of Theorem 1C  Let B_i = A_i \ A_{i−1} for i = 2, 3, .... Then
A = A_1 ∪ B_2 ∪ B_3 ∪ ···
is the union of disjoint events in F (draw a Venn diagram to make
this clear). By (9),
P(A) = P(A_1) + P(B_2) + P(B_3) + ···
     = P(A_1) + lim_{n→∞} Σ_{i=2}^n P(B_i).
However,
P(B_i) = P(A_i) − P(A_{i−1})  for i = 2, 3, ...,
and so
P(A) = P(A_1) + lim_{n→∞} Σ_{i=2}^n [P(A_i) − P(A_{i−1})]
     = lim_{n→∞} P(A_n)
as required, since the last sum collapses. □
The conclusion of Theorem 1C is expressed in terms of increasing
sequences of events, but the corresponding result about decreasing
sequences is valid too: if B_1, B_2, ... is a sequence of events in F such
that B_i ⊇ B_{i+1} for i = 1, 2, ..., then P(B_n) → P(B) as n → ∞, where
B = ⋂_{i=1}^∞ B_i is the limit of the B_i's as i → ∞. The easiest way to
deduce this is to set A_i = Ω\B_i in the theorem.
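The content of the theorem can also be seen numerically in the coin-tossing example above. The small sketch below is our own illustration, not part of the text: the probabilities P(A_n) = 1 − (1/2)^n increase with n and approach P(A) = 1.

```python
def p_at_least_one_head(n: int) -> float:
    # P(A_n): at least one head in the first n tosses of a fair coin
    return 1.0 - 0.5 ** n

probs = [p_at_least_one_head(n) for n in range(1, 51)]
assert all(a < b for a, b in zip(probs, probs[1:]))  # the A_n increase
assert 1.0 - probs[-1] < 1e-15                       # P(A_n) -> P(A) = 1
```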
1.10 Worked problems
Example 26  A fair six-sided die is thrown twice (when applied to such objects as
dice or coins, the adjectives 'fair' and 'unbiased' imply that each
possible outcome has equal probability of occurring).
(a) Write down the probability space of this experiment.
(b) Let B be the event that the first number thrown is no larger than
3, and let C be the event that the sum of the two numbers thrown
equals 6. Find the probabilities of B and C, and the conditional
probabilities of C given B, and of B given C.

Solution  The probability space of this experiment is the triple (Ω, F, P) where
(i) Ω = {(i, j): i, j = 1, 2, ..., 6} is the set of all ordered pairs of
integers between 1 and 6,
(ii) F is the set of all subsets of Ω,
(iii) each point in Ω has equal probability, so that
P({(i, j)}) = 1/36  for i, j = 1, 2, ..., 6,
and, more generally,
P(A) = |A|/36  for each A ⊆ Ω.
The events B and C are subsets of Ω given by
B = {(i, j): i = 1, 2, 3 and j = 1, 2, ..., 6},
C = {(i, j): i + j = 6 and i, j = 1, 2, ..., 6}.
B contains 3 × 6 = 18 ordered pairs, and C contains 5 ordered pairs,
giving that
P(B) = 18/36 = 1/2,  P(C) = 5/36.
Finally, B ∩ C is given by
B ∩ C = {(1, 5), (2, 4), (3, 3)},
containing just 3 ordered pairs, so that
P(C | B) = P(C ∩ B)/P(B) = (3/36)/(1/2) = 1/6,
and
P(B | C) = P(B ∩ C)/P(C) = (3/36)/(5/36) = 3/5. □
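Since Ω has only 36 points, the whole solution can be confirmed by enumeration. The following check is ours, not the book's (the names `omega`, `P` and so on are our own):

```python
from fractions import Fraction
from itertools import product

omega = list(product(range(1, 7), repeat=2))    # the 36 ordered pairs (i, j)

def P(event):                                   # equally likely outcomes
    return Fraction(len(event), len(omega))

B = {(i, j) for (i, j) in omega if i <= 3}      # first throw no larger than 3
C = {(i, j) for (i, j) in omega if i + j == 6}  # the two throws sum to 6

assert P(B) == Fraction(1, 2) and P(C) == Fraction(5, 36)
assert P(B & C) / P(B) == Fraction(1, 6)        # P(C | B)
assert P(B & C) / P(C) == Fraction(3, 5)        # P(B | C)
```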
Example 27  You are travelling in a train with your sister. Neither of you has a
valid ticket and the inspector has caught you both. He is authorized
to administer a special punishment for this offence. He holds a box
containing nine apparently identical chocolates, but three of these are
contaminated with a deadly poison. He makes each of you, in turn,
choose and immediately eat a single chocolate.
(a) If you choose before your sister, what is the probability that
you survive?
(b) If you choose first and survive, what is the probability that
your sister survives?
(c) If you choose first and die, what is the probability that your
sister survives?
(d) Is it in your best interests to persuade your sister to choose
first?
(e) If you choose first, what is the probability that you survive,
given that your sister survives?
Solution  Let A be the event that the first chocolate picked is not poisoned,
and let B be the event that the second chocolate picked is not
poisoned. Elementary calculations, if you are allowed the time to
perform them, would show that
P(A) = 6/9 = 2/3,  P(B | A) = 5/8,  P(B | A^c) = 6/8,
giving by the partition theorem 1B that
P(B) = P(B | A)P(A) + P(B | A^c)P(A^c)
     = (5/8) · (2/3) + (6/8) · (1 − 2/3) = 2/3.      (28)
Hence P(A) = P(B), so that the only reward of choosing second is to
increase your life expectancy by a few seconds.
The final question (e) seems to be the wrong way round in time,
since your sister chooses her chocolate after you. The way to answer
such a question is to reverse the conditioning as follows:
P(A | B) = P(A ∩ B)/P(B) = P(B | A) P(A)/P(B),
and hence
P(A | B) = (5/8) · (2/3)/(2/3) = 5/8.
We note that P(A | B) = P(B | A), in agreement with our earlier
observation that the order in which you and your sister pick from the
box is irrelevant to your chances of survival. □
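The arithmetic of this solution can be confirmed by enumerating ordered pairs of distinct chocolates; the check below is our own sketch, not part of the text:

```python
from fractions import Fraction
from itertools import permutations

SAFE = [True] * 6 + [False] * 3              # 9 chocolates, 3 poisoned
pairs = list(permutations(range(9), 2))      # (your choice, sister's choice)
A = {(i, j) for (i, j) in pairs if SAFE[i]}  # you survive
B = {(i, j) for (i, j) in pairs if SAFE[j]}  # your sister survives

P = lambda E: Fraction(len(E), len(pairs))
assert P(A) == P(B) == Fraction(2, 3)                    # order is irrelevant
assert Fraction(len(A & B), len(A)) == Fraction(5, 8)    # P(B | A)
assert Fraction(len(A & B), len(B)) == Fraction(5, 8)    # P(A | B)
assert Fraction(len(B - A), len(pairs) - len(A)) == Fraction(3, 4)  # P(B | A^c)
```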
Problems
1. A fair die is thrown n times. Show that the probability that there are an
even number of sixes is
(1/2)[1 + (2/3)^n].
For the purpose of this question, 0 is an even number.
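The claimed formula is easy to test for small n by summing the binomial probabilities over even counts directly; the check below is ours, with names of our own choosing:

```python
from fractions import Fraction
from math import comb

def p_even_sixes(n: int) -> Fraction:
    # sum P(exactly k sixes in n throws) over even k (0 counts as even)
    return sum(comb(n, k) * Fraction(1, 6) ** k * Fraction(5, 6) ** (n - k)
               for k in range(0, n + 1, 2))

# compare with (1/2)[1 + (2/3)^n] for the first few n
for n in range(13):
    assert p_even_sixes(n) == Fraction(1, 2) * (1 + Fraction(2, 3) ** n)
```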
2. Does there exist an event space which contains just six events?
3. Prove Boole's inequality:
P(⋃_{i=1}^n A_i) ≤ Σ_{i=1}^n P(A_i).
4. Two fair dice are thrown. Let A be the event that the first shows an odd
number, B be the event that the second shows an even number, and C
be the event that either both are odd or both are even. Show that
A, B, C are pairwise independent but not independent.
5. Urn I contains 4 white and 3 black balls, and Urn II contains 3 white
and 7 black balls. An urn is selected at random and a ball is picked
from it. What is the probability that this ball is black? If this ball is
white, what is the probability that Urn I was selected?
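One way to check answers to this problem is to combine the partition theorem with a reversal of conditioning, as in the worked problem above. The sketch below is ours (all names are our own):

```python
from fractions import Fraction

p_urn1 = p_urn2 = Fraction(1, 2)             # an urn is selected at random

# partition on the urn chosen (Theorem 1B):
p_black = p_urn1 * Fraction(3, 7) + p_urn2 * Fraction(7, 10)
p_white = 1 - p_black

# reverse the conditioning for P(Urn I | white ball):
p_urn1_given_white = p_urn1 * Fraction(4, 7) / p_white
print(p_black, p_urn1_given_white)  # 79/140 40/61
```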
6. A single card is removed at random from a deck of 52 cards. From the
remainder we draw two cards at random and find that they are both
spades. What is the probability that the first card removed was also a
spade?

[Fig. 1.2 Two electrical circuits incorporating switches]
7. Two people toss a fair coin n times each. Show that the probability that
they throw equal numbers of heads is
C(2n, n) (1/2)^{2n},
where C(2n, n) denotes the binomial coefficient.
8. In the circuits in Fig. 1.2 each switch is closed with probability p,
independently of all other switches. For each circuit, find the probabil-
ity that a flow of current is possible between A and B.
9. Show that if u_n is the probability that n tosses of a fair coin contain no
run of 4 heads, then for n ≥ 4
u_n = (1/2)u_{n−1} + (1/4)u_{n−2} + (1/8)u_{n−3} + (1/16)u_{n−4}.
Use this difference equation to show that, in 8 throws, the probability
of no such run is 208/256.
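Both the difference equation and the stated value for 8 throws can be verified mechanically. The following check is ours: it iterates the recurrence with u_0 = ··· = u_3 = 1 and compares the result with a brute-force count over all 2^8 sequences:

```python
from fractions import Fraction
from itertools import product

def u(n: int) -> Fraction:
    # u_0 = u_1 = u_2 = u_3 = 1, then apply the difference equation
    vals = [Fraction(1)] * 4
    for _ in range(max(0, n - 3)):
        vals.append(Fraction(1, 2) * vals[-1] + Fraction(1, 4) * vals[-2]
                    + Fraction(1, 8) * vals[-3] + Fraction(1, 16) * vals[-4])
    return vals[n]

# brute force: count length-8 sequences with no run of 4 heads
runs_free = sum("HHHH" not in "".join(seq) for seq in product("HT", repeat=8))
assert u(8) == Fraction(runs_free, 2 ** 8) == Fraction(208, 256)
```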
10. Any number ω ∈ [0, 1] has a decimal expansion
ω = 0·x_1x_2...,
and we write f_k(ω, n) for the proportion of times that the integer k
appears in the first n digits of this expansion. We call ω a normal
number if
f_k(ω, n) → 1/10 as n → ∞
for k = 0, 1, 2, ..., 9. On intuitive grounds we may expect that most
numbers ω ∈ [0, 1] are normal numbers, and Borel proved that this is
indeed true. It is quite another matter to exhibit specific normal
numbers. It is known that the number
0.1234567891011121314...
is normal; prove this. It is an unsolved problem of mathematics to show
that e − 2 and π − 3 are normal numbers also.
11. A square board is divided into 16 equal squares by lines drawn parallel
to its sides. A counter is placed at random on one of these squares and
is then moved n times. At each of these moves it can be transferred to
any neighbouring square, horizontally, vertically, or diagonally, all such
moves being equally likely.
Let c_n be the probability that a particular corner site is occupied after n
such independent moves, and let the corresponding probabilities for an
intermediate site at the side of the board and for a site in the middle of
the board be s_n and m_n respectively. Show that
4c_n + 8s_n + 4m_n = 1,  n = 0, 1, 2, ...,
and that
c_n = (2/5)s_{n−1} + (1/8)m_{n−1},  n = 1, 2, ....
Find two other relations for s_n and m_n in terms of c_{n−1}, s_{n−1}, and m_{n−1}
and hence find c_n, s_n, and m_n. (Oxford 1974M)
12. (a) Let P(A) denote the probability of the occurrence of an event A.
Prove carefully, for events A_1, A_2, ..., A_n, that
P(⋃_{i=1}^n A_i) = Σ_i P(A_i) − Σ_{i<j} P(A_i ∩ A_j) + ···
+ (−1)^{n+1} P(A_1 ∩ A_2 ∩ ··· ∩ A_n).
(b) One evening a bemused lodge-porter tried to hang n keys on
their n hooks, but only managed to hang them independently and at
random. There was no limit to the number of keys which could be hung
on any hook. By using (a), or otherwise, find an expression for the
probability that at least one key was hung on its own hook. The
following morning the porter was rebuked by the Bursar, so that in the
evening he was careful to hang only one key on each hook. But he still
only managed to hang them independently and at random. Find an
expression for the probability that no key was then hung on its own
hook.
Find the limits of both expressions as n tends to infinity.
(You may assume that e^x = lim_{n→∞} (1 + x/n)^n for all real x.) (Oxford 1978M)
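On our reading of the problem, the first evening leads to the expression 1 − (1 − 1/n)^n and the second to the derangement sum; the sketch below (ours, not the examiners') evaluates both and checks their limits against 1 − e^{−1} and e^{−1}:

```python
from fractions import Fraction
from math import exp, factorial

def p_some_key_on_own_hook(n: int) -> Fraction:
    # keys hung independently and uniformly (repeats allowed on a hook):
    # complement of "every key avoids its own hook"
    return 1 - (1 - Fraction(1, n)) ** n

def p_no_key_on_own_hook(n: int) -> Fraction:
    # one key per hook: derangement probability sum_{j=0}^{n} (-1)^j / j!
    return sum(Fraction((-1) ** j, factorial(j)) for j in range(n + 1))

assert abs(float(p_some_key_on_own_hook(200)) - (1 - exp(-1))) < 1e-2
assert abs(float(p_no_key_on_own_hook(20)) - exp(-1)) < 1e-15
```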
13. Two identical decks of cards, each containing N cards, are shuffled
randomly. We say that a k-matching occurs if the two decks agree in
exactly k places. Show that the probability that there is a k-matching is
(1/k!) [1 − 1/1! + 1/2! − 1/3! + ··· + (−1)^{N−k}/(N − k)!]
for k = 0, 1, 2, ..., N − 1, N. We note that this probability is
approximately (k! e)^{−1} for large N and fixed k. Such matching
probabilities are used in testing departures from randomness in
circumstances such as psychological tests and wine-tasting competitions.
(The convention is that 0! = 1.)
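For a small deck the matching probabilities can be checked exhaustively against the stated series; the check below is our own (names included):

```python
from fractions import Fraction
from itertools import permutations
from math import factorial

def p_k_matching(N: int, k: int) -> Fraction:
    # (1/k!) * sum_{j=0}^{N-k} (-1)^j / j!, the series stated in the problem
    return Fraction(1, factorial(k)) * sum(
        Fraction((-1) ** j, factorial(j)) for j in range(N - k + 1))

# exhaustive check for N = 5: shuffle one deck against the other
N = 5
for k in range(N + 1):
    count = sum(1 for perm in permutations(range(N))
                if sum(i == c for i, c in enumerate(perm)) == k)
    assert p_k_matching(N, k) == Fraction(count, factorial(N))
```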
14. The buses which stop at the end of my road do not keep to the
timetable. They should run every quarter hour, at 08.30, 08.45,
09.00, ..., but in fact each bus is either five minutes early or five
minutes late, the two possibilities being equally probable and different
buses being independent. Other people arrive at the stop in such a way
that, t minutes after the departure of one bus, the probability that no
one is waiting for the next one is e^{−t/5}. What is the probability that no
one is waiting at 09.00? One day I come to the stop at 09.00 and find no
one there; show that the chances are more than four to one that I have
missed the nine o'clock bus.
(You may use the good approximation e³ ≈ 20.) (Oxford 1977M)
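Assuming the arrival law e^{−t/5} (our reading of the garbled print, consistent with the hint e³ ≈ 20), the four equally likely timetables can simply be enumerated. This check is ours, not the examiners':

```python
from math import exp

# At 09.00: the 09.00 bus ran early (08.55) with probability 1/2; otherwise
# the most recent bus was the 08.45 one, which ran at 08.40 or 08.50.
p_no_one = (0.5 * exp(-5 / 5)        # last bus left 08.55: t = 5
            + 0.25 * exp(-20 / 5)    # last bus left 08.40: t = 20
            + 0.25 * exp(-10 / 5))   # last bus left 08.50: t = 10

# odds that the nine o'clock bus has already gone, given no one is waiting:
odds = (0.5 * exp(-1)) / (0.25 * exp(-4) + 0.25 * exp(-2))
assert odds > 4                      # more than four to one
```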
15. A coin is tossed repeatedly; on each toss a head is shown with
probability p, a tail with probability 1 − p. All tosses are mutually
independent. Let E denote the event that the first run of r successive
heads occurs earlier than the first run of s successive tails. Let A denote
the outcome of the first toss. Show that
P(E | A = head) = p^{r−1} + (1 − p^{r−1})P(E | A = tail).
Find a similar expression for P(E | A = tail) and hence find P(E).
(Oxford 1981M)
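Writing x = P(E | A = head) and y = P(E | A = tail), the two equations are linear and easily solved, giving P(E) = px + (1 − p)y. The function below is our sketch of this route (the analogous expression for y is our own working, not quoted from the problem); a fair coin with r = s gives P(E) = 1/2 by symmetry, which serves as a sanity check:

```python
def p_heads_run_first(p: float, r: int, s: int) -> float:
    # x = p^{r-1} + (1 - p^{r-1}) y,   y = (1 - q^{s-1}) x
    q = 1.0 - p
    a, b = p ** (r - 1), q ** (s - 1)
    x = a / (1 - (1 - a) * (1 - b))
    y = (1 - b) * x
    return p * x + q * y

# symmetry check: fair coin, equal run lengths
for r in range(2, 6):
    assert abs(p_heads_run_first(0.5, r, r) - 0.5) < 1e-12
```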
16. Try Eddington's Controversy: If A, B, C, D each speak the truth once
in three times (independently), and A claims that B denies that C
declares that D is a liar, what is the probability that D was speaking the
truth?
17. Show that the axiom that P is countably additive is equivalent to the
axiom that P is finitely additive and continuous. That is to say, let Ω be
a set and F an event space of subsets of Ω. If P is a mapping from F
into [0, 1] satisfying
(i) P(Ω) = 1, P(∅) = 0,
(ii) if A, B ∈ F and A ∩ B = ∅ then P(A ∪ B) = P(A) + P(B),
(iii) if A_1, A_2, ... ∈ F and A_i ⊆ A_{i+1} for i = 1, 2, ..., then
P(A) = lim_{n→∞} P(A_n),
where A = ⋃_{i=1}^∞ A_i,
then P satisfies P(⋃_{i=1}^∞ A_i) = Σ_{i=1}^∞ P(A_i) for all sequences A_1, A_2, ... of
disjoint events.
18. In a piece of electronic equipment, a gate, which may be open or
closed, has a random method of operation. Every millisecond it may
change its state; the changes occur in the following manner:
(i) if it is open it remains open with probability 1 − α and becomes
closed with probability α,
(ii) if it is closed it remains so with probability 1 − β and becomes open
with probability β.
If 0