
Probability

Nathan Ponder

Louisiana Tech University


Copyright ©2025 by the author, Nathan Ponder.

All rights reserved.

No part of this book may be reproduced by any means except by written permission of the author.
Contents

Chapter 1. Probability
1.1. Sample Spaces
1.2. Probability Axioms and Rules
1.3. Odds
1.4. Conditional Probability and Independence
1.5. Bayes’ Theorem

Chapter 2. Random Variables
2.1. Discrete Random Variables
2.2. Continuous Random Variables
2.3. Cumulative Distribution Functions
2.4. Expected Values and Variance

Chapter 3. Widely Used Discrete Random Variables
3.1. Counting Techniques
3.2. Bernoulli
3.3. Binomial
3.4. Hypergeometric
3.5. Poisson
3.6. Geometric
3.7. Negative Binomial

Chapter 4. Widely Used Continuous Random Variables
4.1. Uniform
4.2. Exponential
4.3. Gamma
4.4. Normal

Chapter 5. Joint Probability Distributions
5.1. Discrete Case
5.2. Continuous Case
5.3. Functions of Joint Random Variables
5.4. Expected Value, Variance, and Covariance

Chapter 6. Sampling and Limit Theorems
6.1. Sample Mean and Variance
6.2. Law of Large Numbers
6.3. Moment Generating Functions
6.4. Sums of Independent Random Variables
6.5. The Central Limit Theorem
6.6. Normal Approximation to the Binomial

Chapter 7. Random Processes
7.1. Markov Chains

Index
Chapter 1

Probability

Probability theory is the study of randomness. Major developments were made


in this theory in the fifteenth and sixteenth centuries as individuals attempted
to better understand games of chance. In more recent times, probability theory
has been used to characterize random phenomena in the natural sciences, finance,
health care, and other areas of study. It is worth noting too that it’s necessary to
have a grounding in probability in order to study statistics on a sophisticated level.

1.1. Sample Spaces


Probabilists are interested in knowing the likelihood of certain outcomes when an
experiment is conducted. The set of all possible outcomes of an experiment is called
the sample space and is labeled S. One or more outcomes taken together as a unit
are referred to as an event. Events are typically labeled by upper case letters as A,
B, C, etc. Some events are of such interest that we want to know their likelihood,
or probability of occurring. For example we might want to know how likely it is
that it will rain tomorrow, that a certain team will win a game, or that a certain
stock price will go up during today’s trading session. By convention, the probability
that a certain event occurs is taken to be some number from 0 to 1. If we toss a
balanced coin, for example, we say there’s a probability of 1/2 we get heads. This
is so because if we toss the coin over and over again, heads will come up about half
of the time.
The notation for probability is as follows. If A is an event, the probability that
A occurs is written P (A). This is sometimes read “the probability of A” or simply
“P of A”. If the experiment consists of tossing a coin, and A represents the event
that we get heads, then we have P (A) = 1/2. If the sample space S consists of n
equally likely outcomes, and A is an event in S, then

P (A) = (the number of outcomes in A) / n.


As an example, there’s a probability of 1/6 we get a 4 when we roll a die. We


provide some more examples.

Example 1.1. What’s the probability you get at least two heads if you toss a coin
three times?

If you toss a coin three times, then the sample space of outcomes is

S = {HHH, HHT, HTH, THH, HTT, THT, TTH, TTT}.

If A represents the event that you get at least two heads, then

A = {HHT, HTH, THH, HHH}.

Consequently,

P (A) = 4/8 = 1/2

since there are just four outcomes in A and eight in S.

Example 1.2. What’s the probability that the sum of the two numbers that come
up when you roll a pair of dice is 8?

If you roll two dice to see which numbers come up, then the sample space
of outcomes is S = {(1, 1), (2, 1), ..., (6, 6)} . If A represents the event we are to
find the probability of, then A = {(6, 2), (5, 3), (4, 4), (3, 5), (2, 6)}. Consequently, P (A) = 5/36 since there are just five outcomes in A and 36 in S.
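Sample spaces of equally likely outcomes like these are easy to check by brute force. The short Python sketch below (our own illustration, not part of the text) enumerates both sample spaces from Examples 1.1 and 1.2 and confirms the two probabilities.

```python
from itertools import product
from fractions import Fraction

# Example 1.1: toss a coin three times, count outcomes with at least two heads.
coin_space = list(product("HT", repeat=3))          # 8 equally likely outcomes
at_least_two_heads = [w for w in coin_space if w.count("H") >= 2]
print(Fraction(len(at_least_two_heads), len(coin_space)))   # 1/2

# Example 1.2: roll two dice, count outcomes whose sum is 8.
dice_space = list(product(range(1, 7), repeat=2))    # 36 equally likely outcomes
sum_is_eight = [d for d in dice_space if sum(d) == 8]
print(Fraction(len(sum_is_eight), len(dice_space)))          # 5/36
```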

It’s often desirable to study groups of events at the same time. To do this,
probabilists have adopted the notation of set theory.

Groups of Events. For the events A and B in the sample space S, the
following notation from set theory is used:
• A ∩ B is the event that both A and B occur.
• A ∪ B is the event that A or B occurs.
• AC is read A complement and represents the event that A does not
occur.
• ∅ is an impossible event.
• If A ∩ B = ∅, then A and B are said to be mutually exclusive.

1.2. Probability Axioms and Rules


All of probability is based on three axioms that are simply formulated. Accepting
them as true, we can build a whole theory of probability.

Probability Axioms
(1) P (A) ≥ 0 for every event A in the sample space S
(2) P (S) = 1
(3) If the events A, B, C, etc. in the sample space S are mutually exclusive
one from another, then
P (A ∪ B ∪ C ∪ · · · ) = P (A) + P (B) + P (C) + · · ·

Based on these three axioms we can derive a number of practical probability


rules that will allow us to solve problems of interest. We list the most important.

Probability Rules
(1) 0 ≤ P (A) ≤ 1 for every event A in the sample space S
(2) P (AC ) = 1 − P (A) (Complement Rule)
(3) P (A ∪ B) = P (A) + P (B) − P (A ∩ B) (Probability of a Union Rule)
(4) If A and B are mutually exclusive, then P (A ∪ B) = P (A) + P (B)
(5) For the three events A, B, and C,
P (A ∪ B ∪ C) = P (A) + P (B) + P (C)
−P (A ∩ B) − P (B ∩ C) − P (A ∩ C)
+P (A ∩ B ∩ C)

We provide some examples where these rules are put to use.

Example 1.3. If A and B are events in a sample space for which P (A) = 0.50,
P (B) = 0.40, and P (A ∩ B) = 0.25, compute (a) P (A ∪ B), (b) P (B C ), and (c) P (AC ∩ B).

(a) P (A ∪ B) = P (A) + P (B) − P (A ∩ B) = 0.50 + 0.40 − 0.25 = 0.65


(b) P (B C ) = 1 − P (B) = 1 − 0.40 = 0.60
(c) Noting that B can be partitioned into the two mutually exclusive parts
A ∩ B and AC ∩ B, we can write
P (B) = P ((A ∩ B) ∪ (AC ∩ B)) = P (A ∩ B) + P (AC ∩ B).
Hence,
0.40 = 0.25 + P (AC ∩ B),
so that P (AC ∩ B) = 0.15. For a problem like this, it’s worthwhile to sketch a Venn
Diagram to see how B can be partitioned.

Example 1.4. If you deal a card from a well shuffled deck of 52, what’s the
probability it’s red or a king?

Let A be the event that it’s red and B the event that it’s a king. Then

P (A ∪ B) = P (A) + P (B) − P (A ∩ B) = 26/52 + 4/52 − 2/52 = 7/13.

Example 1.5. Suppose A, B, and C are events in a sample space for which P (A) =
0.32, P (B) = 0.27, P (C) = 0.42, P (A ∩ B) = 0.14, P (B ∩ C) = 0.09, P (A ∩ C) =
0.11, and P (A ∩ B ∩ C) = 0.08. Compute (a) P (A ∪ B ∪ C) and (b) P (A ∩ B C ∩ C).

(a) P (A ∪ B ∪ C) = 0.32 + 0.27 + 0.42 − 0.14 − 0.09 − 0.11 + 0.08 = 0.75


(b) Sketching a Venn Diagram would be of great benefit here. We note that
A ∩ C can be divided into the part that’s in B and the part that’s not in B to get
P (A ∩ C) = P ((A ∩ B ∩ C) ∪ (A ∩ B C ∩ C)) = P (A ∩ B ∩ C) + P (A ∩ B C ∩ C), so
that 0.11 = 0.08 + P (A ∩ B C ∩ C) and P (A ∩ B C ∩ C) = 0.03.
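The union and partition rules above are simple arithmetic, so a quick numerical check is easy to write. Here is a minimal sketch using the numbers of Example 1.5; the variable names are ours.

```python
# Probabilities from Example 1.5.
pA, pB, pC = 0.32, 0.27, 0.42
pAB, pBC, pAC, pABC = 0.14, 0.09, 0.11, 0.08

# Probability of a union of three events (inclusion-exclusion rule).
p_union = pA + pB + pC - pAB - pBC - pAC + pABC
print(round(p_union, 2))        # 0.75

# Part (b): split A ∩ C into the piece inside B and the piece outside B.
p_A_notB_C = pAC - pABC
print(round(p_A_notB_C, 2))     # 0.03
```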

Exercises

(1) An experiment consists of tossing a coin three times. Write out the sample
space of this experiment. Find the probability that you get more heads than
tails.
Ans. See one of the examples above for S. 1/2
(2) An experiment consists of tossing a coin four times. Write out the sample
space of this experiment. Find the probability that you get (a) exactly two
heads, (b) at most two heads.
Ans. S = {HHHH, HHHT, ..., TTTT}. There should be 16 outcomes altogether. (a) 3/8, (b) 11/16
(3) If A and B are two events in a sample space for which P (A) = 0.30, P (B) =
0.60, and P (A ∩ B) = 0.20, compute (a) P (A ∪ B), (b) P (AC ), and (c)
P (AC ∩ B).
Ans. (a) 0.70, (b) 0.70, (c) 0.40
(4) If A and B are mutually exclusive events for which P (A) = 0.30 and P (B) =
0.60, compute (a) P (A ∪ B) and (b) P (A ∩ B).
Ans. (a) 0.90, (b) 0
(5) Suppose 60% of college students have a VISA Card, 45% have a MasterCard,
and 12% have both. What’s the probability that a randomly selected college
student (a) does not have a VISA or MasterCard, (b) has at least one of the
two cards, (c) has a Master Card but not a VISA Card?
Ans. (a) 0.07, (b) 0.93, (c) 0.33
(6) If you roll two dice, what’s the probability that the sum of the two numbers
that come up is at least nine?
Ans. 5/18

(7) Suppose A, B, and C are events in a sample space for which P (A) = 0.30,
P (B) = 0.25, P (C) = 0.45, P (A ∩ B) = 0.15, P (B ∩ C) = 0.08, P (A ∩ C) =
0.12, and P (A∪B ∪C) = 0.70. Compute (a) P (A∩B ∩C), (b) P (AC ∩B ∩C),
(c) P ((AC ∩ B) ∪ C), and (d) P (AC ∩ (B ∪ C)) .
Ans. (a) 0.05, (b) 0.03, (c) 0.52, (d) 0.40
(8) If you deal a card from a well shuffled deck of 52, what’s the probability (a)
it’s a king and (b) it’s either a diamond or a two?
Ans. (a) 1/13 (b) 4/13

1.3. Odds
In probability theory, the term odds is defined by

the odds of A = P (A)/P (AC ).

If you were to deal a card from a well shuffled deck of 52, the odds it would be an ace would be (4/52) ÷ (48/52) = 1/12. Odds are typically written as a ratio of positive integers, so the odds of dealing an ace could be written as 1/12, 1 : 12, or 1 to 12.
Unfortunately, the term odds is not used consistently in different quarters. It is
common to take “odds for the event A” to mean “odds of the event A” and “odds
against the event A” to mean P (AC )/P (A). In other words, the odds against is the
reciprocal of the odds for. Compounding the confusion is the fact that gamblers
usually mean “odds against” when they simply say “odds”.
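Converting between odds and probability is a two-line computation. The sketch below shows both directions; the function names are ours and are only illustrative.

```python
from fractions import Fraction

def odds_for(p):
    """Odds of an event with probability p, as P(A)/P(A^C)."""
    return Fraction(p) / (1 - Fraction(p))

def prob_from_odds_for(a, b):
    """Probability of an event whose odds for are a : b."""
    return Fraction(a, a + b)

print(odds_for(Fraction(4, 52)))     # 1/12, the odds of dealing an ace
print(prob_from_odds_for(1, 12))     # 1/13, back to the probability
```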

Exercises

(1) When rolling a die, what are the odds (a) for and (b) against getting a 6.
Ans. (a) 1 to 5 (b) 5 to 1
(2) If you toss a fair coin six times, what are the odds (a) for and (b) against
getting exactly five heads?
Ans. (a) 3/29 (b) 29/3
(3) If you deal a card from a well shuffled deck of 52 cards, what are the odds (a)
for and (b) against getting a face card?
Ans. (a) 3/10 (b) 10/3
(4) If the odds against a candidate winning an election are four to one, what’s the
probability the candidate wins the election?
Ans. 20%
(5) You are told that the odds of the Saints winning the Super Bowl this season
are 24 to 1. You recognize that you are actually being told that the odds
against their winning the super bowl are 24 to 1. What is the probability
they’ll win the Super Bowl?
Ans. 0.04
(6) If you buy 5000 tickets, your odds for winning a lottery are 1 to 800. What is
the probability to six decimal places that you win?
Ans. 0.001248

1.4. Conditional Probability and Independence


In many instances investigators want to know the likelihood of an event occurring
given some other event has already occurred. In computing the probability that a
football team will win the next game, for example, we would want to incorporate
all information available. If the starting quarterback is injured, we should compute
the probability conditioned on the event that the reserve quarterback will play. We
use the notation P (A|B) to represent the conditional probability that A will occur
given B has occurred. It’s customary to read P (A|B) as simply “the probability of
A given B”.

By definition, the conditional probability of A given B, P (A|B), is given by

P (A|B) = P (A ∩ B)/P (B)

if P (B) ≠ 0.

Example 1.6. Suppose P (A) = 0.5 and P (B) = 0.4. Compute then the condi-
tional probability P (A|B) if (a) P (A ∪ B) = 0.75 and (b) A and B are mutually
exclusive.

(a) Since P (A ∪ B) = P (A) + P (B) − P (A ∩ B), we have 0.75 = 0.5 + 0.4 − P (A ∩ B), or P (A ∩ B) = 0.15. Consequently,

P (A|B) = P (A ∩ B)/P (B) = 0.15/0.4 = 0.375.

(b) Since A and B are mutually exclusive, P (A ∩ B) = 0. As a result, P (A|B) = 0/0.4 = 0.

By multiplying each side of the equation P (A|B) = P (A ∩ B)/P (B) by P (B), we obtain a useful result:

Multiplication Rule.
P (A ∩ B) = P (A|B)P (B)

This rule proves useful in many probability computations.

Example 1.7. Find the probability that the first two cards you deal from a well-
shuffled deck of 52 are red.

Let A be the event that the second card is red and B the event that the first
is red. Then the probability to be computed is given by
P (A ∩ B) = P (A|B)P (B) = (25/51) · (26/52) = 0.2451.

If the occurrence of A is not dependent on the occurrence or nonoccurrence of


B, we say that A and B are independent. Mathematically, we have that A and
B are independent if P (A|B) = P (A). Note that the multiplication rule becomes
P (A ∩ B) = P (A|B)P (B) = P (A)P (B) if and only if A and B are independent.
For this reason, the following is a common definition for independence:

Independence. The events A and B are said to be independent if


P (A ∩ B) = P (A)P (B).

Example 1.8. If A and B are independent events for which P (A) = 0.6 and
P (B) = 0.4, compute P (A ∪ B).

Since A and B are independent, P (A ∩ B) = P (A)P (B) = (0.6)(0.4) = 0.24.


As a result,
P (A ∪ B) = P (A) + P (B) − P (A ∩ B) = 0.4 + 0.6 − 0.24 = 0.76.

Note that if the three events A, B, and C are independent, then P (A∩B ∩C) =
P ((A ∩ B) ∩ C) = P (A ∩ B)P (C) = P (A)P (B)P (C). In more general terms, for
the n independent events A1 , A2 , ...,An , we have
P (A1 ∩ A2 ∩ · · · ∩ An ) = P (A1 )P (A2 ) · · · P (An ).

Example 1.9. If you toss a coin six times, what’s the probability you get six
heads?

Let Ai be the event that the ith toss results in heads. Then A1 , A2 , ..., A6 , are
independent. The answer is

P (A1 ∩ A2 ∩ · · · ∩ A6 ) = P (A1 )P (A2 ) · · · P (A6 ) = (1/2)^6 = 0.0156.

Example 1.10. If you toss a coin six times what’s the probability you get at least
one tail?

Note that a direct computation would be complicated. You’d have to find the
probability that you get exactly one tail, that you get exactly two tails, that you
get exactly three tails, etc., and then add all these probabilities together. Using
the Complement Rule makes the computation much simpler. The complement
of getting at least one tail is getting no tails. Another way of saying that you
get no tails is to say that you get all heads. We know that the probability of
getting all heads is 0.0156. Therefore, the probability of getting at least one tail is
1 − 0.0156 = 0.9844.

Example 1.11. Suppose one in eight soft drink bottle tops are winners. If you
randomly buy eight bottles of soft drink, what’s the probability you win at least
once?

According to the Complement Rule, the probability you win at least once is one
minus the probability you don’t win at all. Not winning at all, would mean that
each of the eight bottle tops is a loser. Since each top is a loser with probability 7/8, we have that the answer to the question is

1 − (7/8)^8 = 0.6564.
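The complement-rule computation above is easy to reproduce exactly, and easy to sanity-check with a simulation that uses the independence of the eight bottle tops. This is an illustrative sketch, not part of the text's exercises.

```python
import random

# Exact answer from Example 1.11: at least one winner among eight 1-in-8 tops.
exact = 1 - (7 / 8) ** 8
print(round(exact, 4))                      # 0.6564

# Monte Carlo check.
random.seed(1)
trials = 200_000
wins = sum(
    any(random.random() < 1 / 8 for _ in range(8))
    for _ in range(trials)
)
print(round(wins / trials, 3))              # close to 0.656
```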

Exercises

(1) If A and B are independent, P (A) = 3/4 and P (B) = 1/3, compute P (A ∩ B).
Ans. 1/4 = 0.25
(2) Suppose A and B are two events for which P (A) = 0.70 and P (B) = 0.25.
Compute P (A ∩ B) if (a) A and B are independent (b) A and B are mutually
exclusive.
Ans. (a) 0.175 (b) 0
(3) Suppose A and B are two events for which P (A) = 1/2, P (B) = 1/3, and P (A ∩ B) = 1/4. Compute (a) P (A|B), (b) P (B|A), (c) P (A ∪ B).
Ans. (a) 3/4, (b) 1/2, (c) 7/12
(4) Suppose A and B are two events for which P (A|B) = 1/2 and P (A ∩ B) = 1/3. Compute P (B).
Ans. 2/3
(5) When rolling two dice, what’s the probability the first comes up as a 3 given
the sum from the two dice is at least eight?
Ans. 2/15
(6) If you toss a coin 12 times, what’s the probability you get no heads? Round
your answer off to five decimal places.
Ans. 0.00024
(7) Draw two cards without replacement from a well-shuffled deck of 52 playing
cards. What’s the probability they are both kings? Recall that there are four
kings in a deck. Round your answer off to four decimal places.
Ans. 1/221 = 0.0045
(8) Only 15% of motorists come to a complete stop at a certain four way stop
intersection. What’s the probability that of the next ten motorists to go
through that intersection (a) none come to a complete stop, (b) at least one
comes to a complete stop, and (c) exactly two come to a complete stop.
Ans. (a) 0.1969, (b) 0.8031, (c) 0.2759
(9) Suppose one in 12 soft drink bottle tops are winners. If you randomly buy six
bottles of soft drink, what’s the probability you win at least once?
Ans. 1 − (11/12)^6 = 0.4067
(10) Suppose one in four soft drink bottle tops are winners. If you randomly buy
six bottles of soft drink, what’s the probability you win at least once?
Ans. 0.8220

1.5. Bayes’ Theorem


Bayes’ Theorem is named after Thomas Bayes, a Presbyterian minister from eighteenth-century England. It is a formula that allows one to reverse the events in a conditional probability. An example where such a result is useful is when a physician tries to determine a disease based on certain symptoms. If we know the
disease, we typically know the probability of various symptoms. Given a certain set
of symptoms, however, we don’t necessarily have a good idea of what the disease is.
To compute the conditional probability P (B|A), note that A can be partitioned
into the two parts A ∩ B and A ∩ B C so that we have

A = (A ∩ B) ∪ (A ∩ B C ),

and
P (A) = P (A ∩ B) + P (A ∩ B C ).

Now we can write P (A ∩ B) = P (A|B)P (B) and P (A ∩ B C ) = P (A|B C )P (B C ) to


obtain

P (B|A) = P (A ∩ B)/P (A)
        = P (A ∩ B)/[P (A ∩ B) + P (A ∩ B C )]
        = P (A|B)P (B)/[P (A|B)P (B) + P (A|B C )P (B C )].

We now formulate the result.

Bayes’ Theorem.
P (B|A) = P (A|B)P (B) / [P (A|B)P (B) + P (A|B C )P (B C )]

An application follows.

Example 1.12. A company has developed a new drug test that tests positive on
a drug user 99% of the time. It tests negative on non-drug users 99% of the time
also. If only 0.5% of the employees in a large company are drug users, what’s the
probability that a tested employee is actually a drug user if the test is positive?

Let A be the event that the employee tests positive and B the event that the
employee is a drug user . Then we need to compute P (B|A) to answer the question.
We have

P (B|A) = P (A|B)P (B) / [P (A|B)P (B) + P (A|B C )P (B C )]
        = (0.99)(0.005) / [(0.99)(0.005) + (0.01)(0.995)]
        = 0.332
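The arithmetic in Bayes' Theorem is simple enough to wrap in a small helper. Below is a minimal sketch (the function name is ours) applied to the drug-test numbers of Example 1.12.

```python
def bayes(p_b, p_a_given_b, p_a_given_not_b):
    """P(B|A) from P(B), P(A|B), and P(A|B^C), via Bayes' Theorem."""
    numerator = p_a_given_b * p_b
    denominator = numerator + p_a_given_not_b * (1 - p_b)
    return numerator / denominator

# Example 1.12: 0.5% of employees are drug users; the test is positive on
# users 99% of the time and on non-users 1% of the time.
print(round(bayes(0.005, 0.99, 0.01), 3))   # 0.332
```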

A corollary to Bayes’ Theorem is the law of total probability. It is nothing


more than the additive expansion derived in the denominator of Bayes’ Theorem.
The formulation is as follows:

Law of Total Probability.


P (A) = P (A|B)P (B) + P (A|B C )P (B C )

Example 1.13. A certain university requires all its students to take the ACT
exam before admission. Some 25% of College Algebra students at this university
have an ACT math score of 26 or higher. Studies show that 90% of College Algebra
students who have a math ACT score of 26 or better pass the class. For those with
a math ACT score lower than 26, only 48% pass College Algebra. Assuming all the
given information is accurate, compute the probability a randomly selected College
Algebra student from the university in question will pass the class.

Let A be the event that the randomly chosen College Algebra student passes
the class, and let B be the event that the student made a 26 or higher on the math
ACT. Then the answer is
P (A) = P (A|B)P (B) + P (A|B C )P (B C ) = (0.90)(0.25) + (0.48)(0.75) = 0.585.

A generalized version of Bayes’ Theorem can be derived if the sample space


is partitioned into n events as opposed to just the two events B and B C . The n
events B1 , B2 , ..., Bn , are said to be a partition of the sample space S if
B1 ∪ B2 ∪ · · · ∪ Bn = S
and Bi ∩ Bj = ∅ if i ≠ j. To compute the conditional probability P (Bi |A), we note that A can be partitioned into the n mutually exclusive parts A ∩ B1 , A ∩ B2 , ..., A ∩ Bn , so that we have
A = (A ∩ B1 ) ∪ (A ∩ B2 ) ∪ · · · ∪ (A ∩ Bn ),
and
P (A) = P (A ∩ B1 ) + P (A ∩ B2 ) + · · · + P (A ∩ Bn ).
Now we can write P (A ∩ B1 ) = P (A|B1 )P (B1 ), P (A ∩ B2 ) = P (A|B2 )P (B2 ), ... ,
and P (A ∩ Bn ) = P (A|Bn )P (Bn ), to obtain

P (Bi |A) = P (A ∩ Bi )/P (A)
         = P (A ∩ Bi )/P ((A ∩ B1 ) ∪ (A ∩ B2 ) ∪ · · · ∪ (A ∩ Bn ))
         = P (A ∩ Bi )/[P (A ∩ B1 ) + P (A ∩ B2 ) + · · · + P (A ∩ Bn )]
         = P (A|Bi )P (Bi )/[P (A|B1 )P (B1 ) + P (A|B2 )P (B2 ) + · · · + P (A|Bn )P (Bn )].

We now formulate the general theorem.

Generalized Bayes’ Theorem.


P (Bi |A) = P (A|Bi )P (Bi ) / [P (A|B1 )P (B1 ) + P (A|B2 )P (B2 ) + · · · + P (A|Bn )P (Bn )]

Exercises

(1) If A and B are two events in a sample space for which P (B) = 0.70, P (A|B) =
0.20, P (A|B C ) = 0.40, compute P (A).
Ans. 0.26
(2) If you deal two cards without replacement from a well shuffled deck of 52,
what’s the probability that the second will be red?
Ans. 0.5
(3) A blood test detects a certain disease 95% of the time when the disease is
present. However, the test is also positive 1% of the time when the disease
is not present. If 0.5% of the population actually has the disease, what’s the
probability a person has the disease given the test result is positive?
Ans. 0.3231
(4) Assume 1% of women at age forty have breast cancer. If 80% of women with
breast cancer will get positive mammographies and 9.6% of women without
breast cancer will also get positive mammographies, what’s the probability
that a 40 year old woman that has a positive mammography actually has
breast cancer?
Ans. 0.0776
(5) An insurance company learns that a potential customer is smoking a cigar.
The company also knows that 9.5% of males smoke cigars as do 1.7% of fe-
males. What’s the probability that the potential customer is a male? To answer this question, assume that half of the population is male.

Ans. Let A be the event that the customer is a cigar smoker and B be the
event that the customer is male. Then the answer is
P (B|A) = P (A|B)P (B) / [P (A|B)P (B) + P (A|B C )P (B C )]
        = (0.095)(0.5) / [(0.095)(0.5) + (0.017)(0.5)]
        = 0.8482

The insurance company can be about 85% sure that the cigar smoker is a male.
(6) Suppose that just five percent of men and a quarter of one percent of women
are color-blind and that there are an equal number of women and men. If a
color-blind person is chosen at random, what’s the probability that person is
male?
Ans. 0.9524
(7) Three different factories, F1 , F2 , and F3 , are used to manufacture a large
batch of cell phones. Suppose 20% of the phones are produced by F1 , 30%
are produced by F2 , and 50% by F3 . Suppose also that 1% of the phones
produced by F1 are defective, as are 2% of those produced by F2 and 3% of
those produced by F3 . If one phone is selected at random from the entire
batch and is found to be defective, what’s the probability it was produced by
(a) F1 and (b) F2 ?
Ans. (a) 0.0870 (b) 0.2609
Chapter 2

Random Variables

Probabilists use random variables (RV’s) to study random phenomena. A random


variable is a rule that assigns numbers to outcomes of an experiment. Typically,
X (or any other convenient upper case letter) is used to represent an RV. One can
label events with RV’s. For example, if an experiment consists of tossing three
coins and a random variable X counts the number of heads that come up, then
the event A that you get at least two heads can be written in terms of the RV by
A = {X ≥ 2}. Note that

P ({X ≥ 2}) = (3 + 1)/8 = 1/2,

since there are eight equally likely outcomes to the experiment, three of which result in two heads and one of which results in three heads. It is customary to abbreviate P ({X ≥ 2}) with P (X ≥ 2) for ease of writing, so that we have P (X ≥ 2) = 1/2.

2.1. Discrete Random Variables


The RV we just considered is said to be discrete because it only takes the values
0, 1, 2, and 3. Discrete RV’s only take a finite or countably infinite number of
values. An example of a discrete random variable taking an infinite number of
values is one that counts the number of tosses of a coin it takes for heads to come
up. This random variable takes the values 1, 2, 3, 4, ..., with positive probability. A
random variable would not be discrete if it could take every value in some interval
of numbers.
Associated with each discrete random variable X is a probability mass function
p(x) defined by

p(x) = P (X = x).


Probability Mass Function. The probability mass function p(x) of a discrete random variable X has the following properties:
(1) p(x) ≥ 0 for all x
(2) Σ_x p(x) = 1, the sum being over all values that the random variable can take
(3) P (A) = Σ_{x∈A} p(x), where the sum is taken over all outcomes x in the event A

We consider a simple example of a discrete random variable.

Example 2.1. Illustrate that the first two properties listed above for proba-
bility mass functions are true for the random variable X that counts the number
of heads you get when you toss a balanced coin three times.

If you toss a coin three times, then the sample space of outcomes is

S = {HHH, HHT, HTH, THH, HTT, THT, TTH, TTT}.


We therefore have that {X = 0} = {TTT}, {X = 1} = {HTT, THT, TTH}, {X = 2} = {HHT, THH, HTH}, {X = 3} = {HHH}, and {X = k} = ∅ for all k ≠ 0, 1, 2, or 3. Consequently,

p(x) =
  1/8 if x = 0
  3/8 if x = 1
  3/8 if x = 2
  1/8 if x = 3
  0 otherwise

Note also that

Σ_x p(x) = Σ_{k=0}^{3} p(k) = 1/8 + 3/8 + 3/8 + 1/8 = 1.
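A probability mass function is conveniently represented as a dictionary from values to probabilities. The sketch below (our own illustration) rebuilds the pmf of Example 2.1 by enumeration and checks the first two properties.

```python
from fractions import Fraction
from itertools import product
from collections import Counter

# Count heads in each of the 8 equally likely outcomes of three tosses.
outcomes = list(product("HT", repeat=3))
counts = Counter(w.count("H") for w in outcomes)
pmf = {x: Fraction(c, len(outcomes)) for x, c in counts.items()}

print(pmf)                                  # {0: 1/8, 1: 3/8, 2: 3/8, 3: 1/8} in some order
print(all(p >= 0 for p in pmf.values()))    # True, property (1)
print(sum(pmf.values()))                    # 1, property (2)
```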

Here’s another example.

Example 2.2. Assume the probability the home team wins the second game of
the NBA finals by fewer than four points is 0.17. That they win by at least four
points and fewer than seven points is 0.28, that they win by at least seven points
and fewer than 10 points is 0.25, that they win by at least 10 points is 0.10, and
that they lose is 0.20. Compute the probability that the home team wins the game by at least four points.

For the event A that they win by at least four points,

P (A) = Σ_{x∈A} p(x) = 0.28 + 0.25 + 0.10 = 0.63.

We now consider an example for a discrete random variable that assumes an


infinite number of values.

Example 2.3. Illustrate that the first two properties listed above for probability mass functions are true for the random variable X that counts the number of tosses of a coin it takes for heads to come up.

Clearly P (X = 1) = 1/2. We note that P (X = 2) is the probability we


get a tails on the first toss and a heads on the second. Since the two events are
independent and each of probability 1/2, we have that P (X = 2) = (1/2)^2 = 1/4. We note that the event {X = 3} is the event that we get tails on the first toss, tails on the second, and heads on the third. We thus have that P (X = 3) = (1/2)^3 = 1/8. Continuing in this way, we obtain P (X = k) = (1/2)^k for k = 4, 5, 6, .... Clearly, P (X = k) = 0 if k is not a positive integer. As a result, p(x) ≥ 0 for all x and
Σ_x p(x) = Σ_{k=1}^{∞} p(k) = Σ_{k=1}^{∞} (1/2)^k = (1/2)/(1 − 1/2) = 1.

For the penultimate equality we used the formula for the sum of a geometric series.
The reader might wish to review this formula in the sequences and series chapter
of a calculus textbook or the latter pages of a college algebra textbook.
Here is a related example.

Example 2.4. What’s the least number of times you would need to toss a coin to
have at least a 99% probability of getting a heads?

Tossing a coin once would result in a heads with probability 1/2. If you toss
a coin twice, there are four outcomes, three of which result in at least one heads.
If you toss a coin three times there are eight outcomes, seven of which result in
at least one heads. Continuing in this way, we obtain 2^k outcomes when the coin is tossed k times, and all but one of those outcomes has at least one heads. We therefore need to solve the inequality

(2^k − 1)/2^k > 0.99

to answer the question. We can rewrite this inequality as

1 − (1/2)^k > 0.99

or

(1/2)^k < 0.01.

Taking the natural logarithm of each side, we obtain

k ln(1/2) < ln(0.01)

or

k > ln(0.01)/ln(1/2) = 6.64.
We therefore need to toss the coin at least seven times.
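The same answer can also be found by simply trying successive values of k until the inequality holds, as in this short sketch (our illustration, not the text's).

```python
import math

# Smallest k with 1 - (1/2)**k > 0.99.
k = 1
while 1 - 0.5 ** k <= 0.99:
    k += 1
print(k)                                    # 7

# The closed-form bound from the example.
print(math.log(0.01) / math.log(0.5))       # about 6.64
```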

Exercises

(1) Suppose X is a random variable that counts the number of heads you get when
you toss a coin six times and that p(x) is the probability mass function for
X. There are methods - short of listing all outcomes in the sample space and
counting the number that have exactly two or exactly four heads - to establish
that p(2) = p(4) = 15/64. Deduce the values of p(k) for k = 0, 1, 5, and 6 by
inspection, and then compute p(3).
Ans. p(3) = 5/16
(2) Find the value of the constant c that makes p(x) a probability mass function if p(x) is given by
p(x) = cx if x = 2, 4, 6, or 8, and 0 otherwise
Ans. c = 1/20
(3) For the random variable X with the probability mass function in the imme-
diately preceding problem, compute (a) P (X ≤ 4) and (b) P (X < 4).
Ans. (a) 0.3 (b) 0.1
(4) What’s the least number of times you would need to roll a die to have at least
a 90% probability of getting a six?
Ans. Solve the inequality 1 − (5/6)^n ≥ 9/10 to get that n is at least 13
(5) Let X be a random variable giving the sum obtained when you roll two dice.
Write out a rule for the probability mass function of X. Note that p(x) will
be a piecewise defined function with 12 pieces. Compute the probability you
get a sum of at least 10 by adding p(10), p(11), and p(12).

Ans.
p(x) =
  1/36 if x = 2 or 12
  2/36 if x = 3 or 11
  3/36 if x = 4 or 10
  4/36 if x = 5 or 9
  5/36 if x = 6 or 8
  6/36 if x = 7
  0 otherwise
The probability of getting a sum of at least 10 is 1/6.
(6) Can a discrete random variable X have a probability mass function with the rule p(x) = 1/x if x is an integer greater than or equal to 2, and 0 otherwise?
Explain your answer.

2.2. Continuous Random Variables


A random variable is said to be continuous if it can take all the values in an interval
of real numbers. Continuous RV’s contrast with discrete RV’s in that they can take
on an uncountable number of values. They are useful for measuring such phenomena
as lengths, forces, time intervals, etc.
One interesting property a continuous random variable X has is that P (X =
c) = 0 for every real number c. You might recall that this was not the case with
the discrete RV X counting the number of heads that come up when you toss a
coin three times. In that experiment P (X = 3) = 1/8. This isn’t as strange as it
seems. An object might appear to be 10 cm long for example. If a fine enough
measuring device is used to measure its length, however, the measurer will notice
that the object is actually either slightly larger or slightly shorter than 10 cm.

Probability Density Function. Each continuous RV X has associated with it a probability density function (pdf) f (x) with the following properties:
(1) f (x) ≥ 0 for all real x,
(2) ∫_{−∞}^{∞} f (x)dx = 1, and
(3) P (X ≤ c) = ∫_{−∞}^{c} f (x)dx for every real number c.

We provide a few examples involving pdf’s.


Example 2.5. Show that the function

f (x) = 2e^(−2x) for x ≥ 0, and 0 otherwise
satisfies the first two pdf properties.

1) Since e raised to any power is positive, 2e^(−2x) is always positive and f (x) ≥ 0 for all x.

2) ∫_{−∞}^{∞} f (x)dx = ∫_{0}^{∞} 2e^(−2x) dx = −e^(−2x) |_{0}^{∞} = 0 − (−e^0) = 1.

Example 2.6. Suppose the random variable X has the pdf

f (x) = 1/5 for 1 ≤ x ≤ c, and 0 otherwise.

What value must the constant c take for f (x) to be a pdf? For that value, compute P (X ≤ 2.3).

1 = ∫_{1}^{c} (1/5) dx = x/5 |_{1}^{c} = c/5 − 1/5, so c = 6.

P (X ≤ 2.3) = ∫_{−∞}^{2.3} f (x)dx = ∫_{−∞}^{1} 0 dx + ∫_{1}^{2.3} (1/5) dx = 0 + x/5 |_{1}^{2.3} = (1/5)(2.3 − 1) = 0.26

We now derive formulas for computing probabilities on different types of intervals. Suppose a < b. Then

∫_{−∞}^{b} f (x)dx = ∫_{−∞}^{a} f (x)dx + ∫_{a}^{b} f (x)dx

so that

∫_{a}^{b} f (x)dx = ∫_{−∞}^{b} f (x)dx − ∫_{−∞}^{a} f (x)dx
             = P (X ≤ b) − P (X ≤ a)
             = P ({X ≤ a} ∪ {a < X ≤ b}) − P (X ≤ a)
             = P (X ≤ a) + P (a < X ≤ b) − P (X ≤ a)
             = P (a < X ≤ b)

Since P (X = b) = 0, we have that P (a < X < b) = ∫_{a}^{b} f (x)dx. Similar derivations yield the following formulas:

Formulas Involving Probability Density Functions.
P (a < X < b) = P (a ≤ X < b) = P (a < X ≤ b) = P (a ≤ X ≤ b) = ∫_{a}^{b} f (x)dx
P (X ≤ c) = P (X < c) = ∫_{−∞}^{c} f (x)dx
P (X ≥ c) = P (X > c) = ∫_{c}^{∞} f (x)dx

Example 2.7. If the random variable X has the pdf

f (x) = 2/x^3 for x ≥ 1, and 0 otherwise,

compute (a) P (X < 2), (b) P (2 < X < 4), and (c) P (X > 3).

(a) P (X < 2) = ∫_{−∞}^{2} f (x)dx = ∫_{1}^{2} 2x^(−3) dx = −x^(−2) |_{1}^{2} = −1/4 − (−1) = 3/4
(b) P (2 < X < 4) = ∫_{2}^{4} 2x^(−3) dx = −x^(−2) |_{2}^{4} = −1/16 − (−1/4) = 3/16
(c) P (X > 3) = ∫_{3}^{∞} 2x^(−3) dx = −x^(−2) |_{3}^{∞} = 0 − (−1/9) = 1/9
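Probabilities for a continuous random variable are just integrals of its pdf, so they can also be checked numerically. Below is a sketch for the pdf of Example 2.7, assuming scipy is available; any numerical integrator would do.

```python
from scipy.integrate import quad

# pdf from Example 2.7: f(x) = 2/x**3 for x >= 1, 0 otherwise.
def f(x):
    return 2 / x ** 3 if x >= 1 else 0.0

print(round(quad(f, 1, 2)[0], 4))              # P(X < 2)     = 0.75
print(round(quad(f, 2, 4)[0], 4))              # P(2 < X < 4) = 0.1875
print(round(quad(f, 3, float("inf"))[0], 4))   # P(X > 3)     ≈ 0.1111
```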

Exercises

(1) Suppose X is a continuous random variable with pdf
f (x) = 1/10 for 0 ≤ x ≤ 10, and 0 otherwise.
Compute (a) P (X ≤ 4), (b) P (X > 6.5), (c) P (X ≥ 6.5), and (d) P (5 ≤ X ≤
6).
Ans. (a) 0.4, (b) 0.35, (c) 0.35, (d) 0.1
(2) What value must the constant c take for the function
f (x) = cx^2 for 1 ≤ x ≤ 2, and 0 otherwise
to be a pdf?
Ans. c = 3/7.
(3) Suppose X is a continuous random variable with pdf
f (x) = 1/3 for 4 ≤ x ≤ 7, and 0 otherwise.
Compute (a) P (5 ≤ X ≤ 6), (b) P (X ≥ 6), (c) P (X ≤ 1), (d) P (X ≥ 2), and (e) P (X = 5).
Ans. (a) 1/3, (b) 1/3, (c) 0, (d) 1, (e) 0
(4) If the random variable X has the pdf
f (x) = 1/x^2 for x ≥ 1, and 0 otherwise,
compute (a) P (X < 5), (b) P (2 < X < 4), and (c) P (X > 2).
Ans. (a) 4/5 (b) 1/4 (c) 1/2
(5) Suppose a random variable X has the pdf
f (x) = c/(1 + x^2).
Compute (a) the value of c, (b) P (X > 0), and (c) P (X < 1).
Ans. (a) 1/π (b) 1/2 (c) 3/4

(6) For the random variable X with pdf
f (x) = 2e^(−2x) for x ≥ 0, and 0 otherwise,
(a) compute P (1 < X < 2) and (b) P (X > 1) to four decimal places.
Ans. (a) 0.1170 (b) 0.1353
(7) For the random variable X with pdf
f (x) = xe^(−x) for x ≥ 0, and 0 otherwise,
(a) verify ∫_{−∞}^{∞} f (x)dx = 1 and (b) compute P (1 < X < 2) to four decimal places.
Ans. (b) 0.3298
(8) Students of calculus are accustomed to evaluating definite integrals by finding
an antiderivative and applying the Fundamental Theorem. Even though an
antiderivative cannot be found for the function
f (x) = (1/√(2π)) e^(−x^2/2),
the function is still the pdf of a certain random variable X. For this random
variable use your calculator to approximate to four decimal places (a) P (0 ≤
X ≤ 1) and (b) P (X ≤ 1.96).
Ans. (a) 0.3413 (b) 0.9750

2.3. Cumulative Distribution Functions


Associated with all random variables are cumulative distribution functions (CDF’s).
Some statisticians refer to them simply as distribution functions. The name is
indeed descriptive. For the RV X, the CDF F (x) is given by
F (x) = P (X ≤ x).
The function gives the cumulative probability up to its argument. It therefore is a
nondecreasing function.
Example 2.8. Find a rule for the CDF of the RV X that counts the number of
heads that come up when a coin is tossed three times.

If x < 0, then the event {X ≤ x} cannot occur since it’s not possible to get
fewer than zero heads. Consequently, F (x) = P (X ≤ x) = 0.
If 0 ≤ x < 1, then {X ≤ x} is actually the event that zero heads occur, so we
have that F (x) = P (X ≤ x) = 1/8.
If 1 ≤ x < 2, then {X ≤ x} is the event that 0 or 1 heads come up, so
F (x) = 4/8 = 1/2.
If 2 ≤ x < 3, then {X ≤ x} is the event that 0, 1, or 2, heads come up, so
F (x) = 7/8.
Finally, if x ≥ 3, {X ≤ x} is the event that 0, 1, 2, or 3 heads come up, and
we have F (x) = 1.

The CDF is therefore given by

F (x) =
  0 if x < 0
  1/8 if 0 ≤ x < 1
  1/2 if 1 ≤ x < 2
  7/8 if 2 ≤ x < 3
  1 if x ≥ 3

Note that the function jumps upward at the values of x that X takes with positive
probability. Between these x values F (x) is constant. We will see later that CDF’s
for continuous random variables have no such jumps.

If the RV is continuous with pdf f (x), we can write


F (x) = ∫_{−∞}^{x} f (t)dt.

I.e. F (x) is the cumulative area under the curve up to x. One will recall that the
Fundamental Theorem of Calculus says that F ′ (x) = f (x) if the function f (x) itself
is continuous.

Example 2.9. Find the CDF for the RV X that has pdf

f (x) = 1/10 if 10 ≤ x ≤ 20, and 0 otherwise.

If x < 10, then

F (x) = ∫_{−∞}^{x} 0 dt = 0.

If 10 ≤ x < 20, then

F (x) = ∫_{−∞}^{x} f (t)dt = ∫_{−∞}^{10} 0 dt + ∫_{10}^{x} (1/10) dt = 0 + t/10 |_{10}^{x} = (1/10)(x − 10).

If x > 20, then

F (x) = ∫_{−∞}^{10} 0 dt + ∫_{10}^{20} (1/10) dt + ∫_{20}^{x} 0 dt = 0 + t/10 |_{10}^{20} + 0 = (1/10)(20 − 10) = 1.

We therefore have that

F (x) =
  0 for x < 10
  (1/10)(x − 10) for 10 ≤ x ≤ 20
  1 for x > 20.

Note that F (x) is constantly zero to the left of 10, constantly one to the right of
20 and a line segment connecting (10, 0) to (20, 1) when x is between 10 and 20.

We now list some general properties of distribution functions that can be de-
rived from the definition F (x) = P (X ≤ x).

Cumulative Distribution Function Properties. If F (x) is the cumulative distribution function of a RV X (that is either discrete or continuous), then
(1) F (x) is non-decreasing,
(2) lim_{x→−∞} F (x) = 0,
(3) lim_{x→∞} F (x) = 1,
(4) P (a < X ≤ b) = F (b) − F (a), and
(5) P (X = c) = F (c) − lim_{x→c−} F (x).

Note furthermore that if X is a continuous RV, P (X = c) = 0 and

P (a < X < b) = P (a ≤ X < b) = P (a < X ≤ b) = P (a ≤ X ≤ b) = F (b) − F (a).

We provide two more examples dealing with CDF’s.

Example 2.10. If the RV X has the CDF

F (x) =
  0 for x < 2
  1/4 for 2 ≤ x < 6
  3/4 for 6 ≤ x < 7
  1 for x ≥ 7,
find a rule for the probability mass function of X.

Since the CDF steps up from 0 to 1/4 at x = 2, we have that f (2) = 1/4.
The CDF is then constant on [2, 6) and jumps 3/4 − 1/4 = 1/2 at x = 6. Hence,
f (6) = 1/2. Using the last of the properties of CDF’s, we could write this as

f (6) = P (X = 6) = F (6) − lim_{x→6−} F (x) = 3/4 − 1/4 = 1/2.

The last jump is of magnitude 1 − 3/4 = 1/4 at x = 7, giving us f (7) = 1/4. The
probability mass function is therefore given by

f (x) =
  1/4 if x = 2
  1/2 if x = 6
  1/4 if x = 7
  0 otherwise

Example 2.11. Find a rule for the CDF of the random variable X that has pdf
f (x) = 1/(π(1 + x^2)).

Then use the CDF to compute P (X < 3) and P (−1 ≤ X ≤ 1).

We have that

F (x) = ∫_{−∞}^{x} 1/(π(1 + t^2)) dt
     = (1/π) tan^(−1) t |_{−∞}^{x}
     = (1/π)(tan^(−1) x − (−π/2))
     = 1/2 + (1/π) tan^(−1) x.

Since F (x) = 1/2 + (1/π) tan^(−1) x, we have that

P (X < √3) = F (√3) = 1/2 + (1/π)(π/3) = 1/2 + 1/3 = 5/6.

We also have that

P (−1 ≤ X ≤ 1) = P (−1 < X ≤ 1) = F (1) − F (−1) = 1/2 + (1/π)(π/4) − (1/2 + (1/π)(−π/4)) = 1/2.

To conclude this section we discuss percentiles.

Percentiles of Continuous Random Variables. If X is a continuous


random variable with CDF F (x), we define the 100pth percentile of X to
be the value a that solves F (a) = p.

For example, the 75th percentile of the random variable from the last example is given by the value of a that solves F (a) = 0.75. We solve 1/2 + (1/π) tan^(−1) a = 0.75 and obtain tan^(−1) a = π/4, or a = 1.

The reader should note that the median of a continuous random variable is
nothing more than its fiftieth percentile.
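Percentiles can be computed by inverting the CDF, either algebraically as above or numerically. The following sketch (ours, using only the standard library) inverts the CDF of Example 2.11 by bisection.

```python
import math

def F(x):
    """CDF from Example 2.11: F(x) = 1/2 + (1/pi) * arctan(x)."""
    return 0.5 + math.atan(x) / math.pi

def percentile(p, lo=-1e9, hi=1e9, tol=1e-10):
    """Solve F(a) = p by bisection, since F is increasing."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if F(mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(round(percentile(0.75), 6))   # 1.0, the 75th percentile
print(round(percentile(0.50), 6))   # approximately 0, the median
```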

Exercises

(1) Can

F (x) = 6x(1 − x) if 0 ≤ x ≤ 1, and 0 otherwise
be a CDF? Explain why.
Ans. No. This function is actually decreasing on the interval (1/2, 1).
(2) Suppose X is a random variable for which P (X = 0) = 1/4 and P (X = 1) =
3/4. Find a rule for X’s CDF F (x). Then graph this CDF and compute
F (−2), F (0), F (0.7), and F (1.2).
Ans. F (x) = 0 if x < 0; 1/4 if 0 ≤ x < 1; 1 if x ≥ 1. F (−2) = 0, F (0) = 1/4, F (0.7) = 1/4, F (1.2) = 1
(3) For the previous problem, compute (a) P (X ≤ 2) and (b) P (X > 1).
Ans. (a) F (2) = 1, (b) 1 − P (X ≤ 1) = 1 − F (1) = 1 − 1 = 0
(4) If we let the random variable X count the number of heads you get when you
toss a balanced coin 10 times, then the cumulative distribution function of X
is as given in the table. When reading the table, take the function’s value to
be constant from one integer up to all values less than the next larger integer.
For example, F (x) = 0.172 if 3 ≤ x < 4, and F (x) = 0.377 if 4 ≤ x < 5.
Compute the probability that (a) you get at most three heads, (b) you get at
least six heads, and (c) you get exactly 7 heads.
x F(x)
0 0.001
1 0.011
2 0.055
3 0.172
4 0.377
5 0.623
6 0.828
7 0.945
8 0.989
9 0.999
10 1.000
Ans. (a) P (X ≤ 3) = F (3) = 0.172, (b) P (X ≥ 6) = 1 − P (X < 6) =
1 − P (X ≤ 5) = 1 − F (5) = 1 − 0.623 = 0.377 (c) P (X = 7) = F (7) −
limx→7− F (x) = 0.945 − 0.828 = 0.117
(5) Graph and find a rule for the CDF of the random variable with pdf
f (x) = 1/x^2 for x ≥ 1, and 0 otherwise.
Ans. F (x) = 0 if x < 1, and 1 − 1/x if x ≥ 1

(6) Suppose X is a continuous random variable with CDF



F (x) = 0 for x < 0, and 1 − e^(−x) if x ≥ 0.

Compute (a) P (X < 2), (b) P (1 < X < 2), (c) P (X > 1).
Ans. (a) 1 − e^(−2), (b) e^(−1) − e^(−2), (c) e^(−1)
(7) Compute the 90th percentile of X to four decimal places in the previous
problem.
Ans. 2.3026
(8) For the random variable with pdf

f (x) = 6x(1 − x) if 0 ≤ x ≤ 1, and 0 otherwise,

compute (a) the median and (b) the sixtieth percentile.

Ans. (a) 1/2 (b) Solving the equation 0.60 = ∫_{0}^{a} 6x(1 − x)dx, we obtain 10a^3 − 15a^2 + 3 = 0.

A graphing calculator tells us that a is approximately 0.5671.


(9) For the continuous random variable with probability density function

f (x) = 2e^(−2x) if x ≥ 0, and 0 otherwise,

compute (a) F (1) and (b) the 95th percentile of X.

Ans. (a) F (1) = P (X ≤ 1) = ∫_{0}^{1} 2e^(−2x) dx = 0.8647 (b) (1/2) ln 20 = 1.4979
(10) For the discrete random variable with probability mass function
f (n) = 3,600/(5,269 n^2) if n = 1, 2, 3, 4, or 5, and 0 otherwise,
compute F (4).
Ans. F (4) = P (X ≤ 4) = 1 − P (X = 5) = 1 − 3600/(5269 · 5^2) = 0.9727.

2.4. Expected Values and Variance


Those who study distributions of values are often interested in central tendencies
of those values. To measure such tendencies, they can take the expected value of a
random variable modeling the distribution. For a discrete random variable X, the
expected value, E(X), is the sum of the values X takes weighted by their likelihood
of occurring. We provide the defining formula for E(X) in terms of the probability
mass function.

Expected Value of a Discrete Random Variable. For the discrete


random variable X with probability mass function p(x) = P (X = x), the
expected value of X, E(X), is given by
E(X) = Σ_x x p(x),
where the sum is over all values that X takes. If the sum diverges, then
the random variable is said not to have an expected value.

The expected value of X is also referred to as the expectation or mean of X,


and the smaller case Greek letter µ (mu) is often used to designate it. We provide
examples of expected values.

Example 2.12. Suppose X is a random variable for which P (X = −20) = 0.85,


P (X = 100) = 0.14, and P (X = 500) = 0.01. Compute the expected value of X.

We have E(X) = (−20)(0.85) + (100)(0.14) + (500)(0.01) = 2.

Example 2.13. Suppose X is a random variable that represents a gambler’s gain


on a $100 bet on red at the roulette wheel. Compute the expected value of X.

The ticker on a roulette wheel has equal likelihood of stopping on each of 38 slots. Eighteen of the slots are red, 18 are black, and two are green. Consequently, P (X = 100) = 18/38 and P (X = −100) = 20/38, so that E(X) = (−100)(20/38) + (100)(18/38) = −5.2632. This expected value tells us that placing many $100 bets on red will result in an average loss of about $5.26 for gamblers.
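The expected value of a discrete random variable is a weighted sum, which is a one-liner once the pmf is written down. Here is a small sketch reproducing Examples 2.12 and 2.13; the helper name is ours.

```python
def expected_value(pmf):
    """E(X) for a discrete pmf given as a dict {value: probability}."""
    return sum(x * p for x, p in pmf.items())

# Example 2.12.
print(round(expected_value({-20: 0.85, 100: 0.14, 500: 0.01}), 4))   # 2.0

# Example 2.13: gain on a $100 roulette bet on red.
roulette = {100: 18 / 38, -100: 20 / 38}
print(round(expected_value(roulette), 4))                            # -5.2632
```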

If g(x) is a real valued function on the reals and X is a random variable, then
g(X) is a random variable too. We define the random variable Y by

Y = g(X)

and note that if X is discrete,

E(g(X)) = E(Y )
        = Σ_y y P (g(X) = y)
        = Σ_y y Σ_{g(x)=y} p(x)
        = Σ_y Σ_{g(x)=y} y p(x)
        = Σ_x g(x) p(x).

So we have a practical formula for computing the expected value of a function of a


random variable.

If X is discrete with probability mass function p(x), then


E(g(X)) = Σ_x g(x) p(x),
where the sum is over all values that X takes.

We provide a particular example where we illustrate the somewhat confusing


notation underlying the concept.

Example 2.14. If X is a discrete random variable taking the values −6, 6, and 12, with P (X = −6) = 1/2, P (X = 6) = 1/3, and P (X = 12) = 1/6, and Y is the random variable given by Y = X^2 (i.e. Y = g(X) where g(x) = x^2), we compute E(X^2).

Note that the random variable Y takes the two values 36 and 144. The probability Y takes the value 36 is the probability that X = −6 or 6, which is 1/2 + 1/3 = 5/6. The probability that Y = 144 is the probability that X = 12, which is 1/6. Hence,

E(Y ) = 36 · (5/6) + 144 · (1/6) = 30 + 24 = 54.

Using the formula we derived above, we have

E(Y ) = E(X^2) = (−6)^2 · (1/2) + (6)^2 · (1/3) + (12)^2 · (1/6) = 18 + 12 + 24 = 54.

A particular formula that proves quite useful is for the linear function g(x) =
ax + b, where a and b are constants. We have

E(aX + b) = aE(X) + b.

Example 2.15. Suppose Y is a random variable that represents a gambler’s gain


on a $1000 bet on red at the roulette wheel. Compute E(Y ).

Since Y = 10X, where X is the gain on a $100 bet on red, we can use our work
in the roulette example above to obtain

E(Y ) = E(10X) = 10E(X) = 10(−5.263) = −52.63.

To determine how scattered the values in a distribution are, statisticians model


the distribution with a random variable and take its variance. The variance V (X)
of a random variable X is given by

V (X) = E((X − µ)2 ),

and the standard deviation of X, SD(X), is defined to be the square root of the
variance. We provide a formula for the variance in the case that X is discrete.

Variance of a Discrete Random Variable. For the discrete random


variable X with probability mass function p(x) = P (X = x) and expected
value µ, the variance of X, V (X), is given by
V (X) = Σ_x (x − µ)^2 p(x),
where the sum is over all values that X takes. The standard deviation of X, SD(X), is given by SD(X) = √V (X).

Using the fact that E(aX + b) = aE(X) + b, we can establish a more compu-
tationally friendly formula for V (X). We have that

V (X) = E((X − µ)^2) = E(X^2 − 2µX + µ^2)
      = E(X^2) + E(−2µX) + µ^2
      = E(X^2) − 2µE(X) + µ^2
      = E(X^2) − 2µ · µ + µ^2
      = E(X^2) − µ^2
      = E(X^2) − (E(X))^2.

We formulate this result.

Shortcut Formula for the Variance of X.


V (X) = E(X 2 ) − (E(X))2 .

Example 2.16. For the random variable X above for which P (X = −20) = 0.85,
P (X = 100) = 0.14, and P (X = 500) = 0.01, compute V (X).

We carry out the computation via the regular formula and then using the
shortcut formula.
V (X) = Σ_x (x − µ)^2 f (x)
      = (−20 − 2)^2 (0.85) + (100 − 2)^2 (0.14) + (500 − 2)^2 (0.01)
      = (484)(0.85) + (9604)(0.14) + (248004)(0.01)
      = 4236.

Now by the shortcut formula we have


V (X) = Σ_x x^2 f (x) − µ^2
      = (−20)^2 (0.85) + (100)^2 (0.14) + (500)^2 (0.01) − 2^2
      = (400)(0.85) + (10000)(0.14) + (250000)(0.01) − 4
      = 4236.

In the case that X is a continuous RV, the expected value is determined by an


integral.

Expected Value of a Continuous Random Variable. For the continu-


ous random variable X with probability density function f (x), the expected
value of X, E(X), is defined to be
E(X) = ∫_{−∞}^{∞} x f (x)dx.
If the integral diverges, then the random variable is said not to have an
expected value.

Example 2.17. For the random variable X with pdf

f (x) = x/2 if 0 ≤ x ≤ 2, and 0 otherwise,

compute E(X).

E(X) = ∫_{−∞}^{∞} x f (x)dx = ∫_{0}^{2} x · (x/2) dx = ∫_{0}^{2} (x^2/2) dx = x^3/6 |_{0}^{2} = 4/3.

If the expected value of X exists and X only takes nonnegative values, then
there’s an alternative formula for computing E(X) that involves tail probabilities.

Note that
E(X) = ∫_{0}^{∞} x f (x)dx
     = ∫_{0}^{∞} ∫_{0}^{x} 1 dy f (x)dx
     = ∫_{0}^{∞} ∫_{y}^{∞} f (x)dx dy
     = ∫_{0}^{∞} P (X > y)dy.

For a continuous random variable X that only takes nonnegative values,


E(X) = ∫_{0}^{∞} P (X > x)dx,
if the expected value exists.

Example 2.18. Use this probability tails formula to compute the expected value of the continuous random variable X with probability density function

f (x) = 2/x^3 if x ≥ 1, and 0 if x < 1.

We note that if x < 1, P (X > x) = 1. If x ≥ 1,

P (X > x) = ∫_{x}^{∞} (2/t^3) dt = −1/t^2 |_{x}^{∞} = 1/x^2.

So we have

E(X) = ∫_{0}^{∞} P (X > x)dx = ∫_{0}^{1} 1 dx + ∫_{1}^{∞} (1/x^2) dx = 1 + (−1/x |_{1}^{∞}) = 1 + 1 = 2.

If g(x) is a real valued function on the reals and X is a continuous random variable with pdf f (x), a formula analogous to the one for discrete random variables can be derived for computing the expected value of g(X):

E(g(X)) = ∫_{−∞}^{∞} g(x) f (x)dx.

This is the integral version of the sum formula we have for discrete random variables.
As was the case with discrete random variables, we have that

E(aX + b) = aE(X) + b

for the constants a and b, even when X is continuous. The variance V (X) of a
continuous random variable X is defined by V (X) = E((X − µ)2 ) just as in the
discrete case.

Variance of a Continuous Random Variable. For the continuous


random variable X with probability density function f (x), the variance of
X, V (X), is given by
V (X) = ∫_{−∞}^{∞} (x − µ)^2 f (x)dx = ∫_{−∞}^{∞} x^2 f (x)dx − (∫_{−∞}^{∞} x f (x)dx)^2.
If one of the integrals diverges, then the random variable is said not to have a variance.

Example 2.19. For the random variable X above with pdf

f (x) = x/2 if 0 ≤ x ≤ 2, and 0 otherwise,

compute V (X).

From our previous work, we have that E(X) = 4/3. Now

E(X^2) = ∫_{−∞}^{∞} x^2 f (x)dx = ∫_{0}^{2} x^2 · (x/2) dx = ∫_{0}^{2} (x^3/2) dx = x^4/8 |_{0}^{2} = 2.

We therefore have that V (X) = 2 − (4/3)^2 = 2/9.
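For continuous random variables, both moments are integrals, so the same kind of numerical check used earlier works here. This sketch treats the pdf of Examples 2.17 and 2.19, assuming scipy is available.

```python
from scipy.integrate import quad

# pdf of Examples 2.17 and 2.19: f(x) = x/2 on [0, 2], 0 elsewhere.
f = lambda x: x / 2

mean = quad(lambda x: x * f(x), 0, 2)[0]
second_moment = quad(lambda x: x ** 2 * f(x), 0, 2)[0]
variance = second_moment - mean ** 2

print(round(mean, 4))        # 1.3333, i.e. 4/3
print(round(variance, 4))    # 0.2222, i.e. 2/9
```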

Exercises

(1) Given X is a random variable for which P (X = 0) = 1/4 and P (X = 1) = 3/4,


compute (a) E(X), (b) E(X^2), (c) E(2X − 3), (d) V (X), and (e) SD(X).
Ans. (a) 3/4, (b) 3/4, (c) −3/2, (d) 3/16, and (e) √3/4
(2) If we let the random variable X count the number of heads you get when you
toss a balanced coin 10 times, then the probability mass function of X is as
given in the table. Compute (a) E(X), (b) V (X), (c) F (2), and (d) F (2.1),
where F is the CDF.

x f(x)
0 0.001
1 0.010
2 0.044
3 0.117
4 0.205
5 0.246
6 0.205
7 0.117
8 0.044
9 0.010
10 0.001
Ans. (a) 5 (b) 2.5 (c) 0.055 (d) 0.055
(3) An insurance company offers an automobile policy structured as follows: The
company makes $600 with probability 0.95, it loses $300 with probability 0.03,
and it loses $20, 000 with probability 0.02. If X represents the company’s gain
on one of these policies, compute E(X).
Ans. $161
(4) Suppose X is a continuous random variable with pdf

f (x) = 1/5 for 3 ≤ x ≤ 8, and 0 otherwise.
Compute (a) E(X) and (b) V (X).
Ans. (a) 5.5 (b) 25/12
(5) Suppose X is a continuous random variable with pdf

f (x) = 0 for x < 1, and 3/x^4 if x ≥ 1.
Compute (a) E(X) and (b) SD(X).
Ans. (a) 3/2 (b) √3/2
(6) Suppose X is a continuous random variable with pdf

f (x) = 0 for x < 0, and 3e^(−3x) if x ≥ 0.
Compute (a) E(X) and (b) V (X).
Ans. (a) 1/3 (b) 1/9
(7) Compute E(X) if X is a continuous random variable with pdf
f (x) = 1/(π(1 + x^2)).
Ans. E(X) does not exist.
(8) Compute SD(X) if X is a random variable representing a gambler’s gain on
a $100 roulette bet on red.
Ans. 99.8614

(9) Use the probability tails formula to compute the expected value of the continuous random variable X with probability density function
f (x) = e^(−x) if x ≥ 0, and 0 if x < 0.
(10) Suppose X is a continuous random variable with cumulative distribution func-
tion
F (x) =
  0 if x < 0
  x^2/4 if 0 ≤ x < 2
  1 if x ≥ 2
Compute then (a) E(X) and (b) SD(X).
Ans. (a) 4/3 (b) √2/3
Chapter 3

Widely Used Discrete Random Variables

3.1. Counting Techniques


In order to understand fully the discrete random variables we present in this chapter,
it’s necessary to know some basic counting techniques.
First we mention a multiplication rule. If there are m ways of completing
one task and n ways of completing a second task, then there are m × n ways of
completing both tasks. This allows us to count the number of outcomes when rolling
a six sided die twice. Since there are six outcomes for each roll of the die, the total
number of outcomes for two rolls would be 6 × 6 = 36. Applying the multiplication
rule twice, we see that if you toss a coin three times, the total number of outcomes
is 2 × 2 × 2 = 8. The number of four digit numbers is 10 × 10 × 10 × 10 = 10, 000.
A useful operation to employ for counting purposes is that of factorials. For
natural numbers n, the notation n! - read n factorial - is defined to be

n! = n(n − 1)(n − 2) · · · 1.

For the special case that n = 0, we have the definition 0! = 1. Factorials grow very
quickly. It’s the case that 6! = 120, 10! = 3,628,800, and 20! = 2.433(10)18, etc. A
decent calculator will compute factorials. With the TI-84 for example, to get 12!
input 12, select MATH-PRB-!, and press ENTER, to get 479, 001, 600.
We noted the number of four digit numbers is 10, 000. This gives us the number
of four digit PINs. If we want to count the number of four digit PINs with distinct
numerals (i.e. where none of the numerals in the PIN repeat), we could use the
multiplication rule to get

10 × 9 × 8 × 7 = 5, 040.


Note that this could be written


10 · 9 · 8 · 7 = 10 · 9 · 8 · 7 · (6!/6!) = 10!/6! = 10!/(10 − 4)!.

This is referred to as the number of permutations of 10 things taken four at a time, written 10P4. In general the number of permutations of n things taken r at a time is given by

nPr = n!/(n − r)!.
Back to the four digit PIN with non-repeating numerals, the PINs 1563 and 6531
- though they contain the same four numerals - are distinct. In other words the
order in which the numerals are arranged is taken into account. If we weren’t
concerned with order - i.e. if we simply wanted to know how many combinations
of four distinct numerals there are chosen from the set {1, 2, 3, 4, 5, 6, 7, 8, 9, 0}, we
note that there are 4 · 3 · 2 · 1 = 24 orderings of the PINs consisting of the elements
of the set {1, 3, 5, 6}. We list a few of them here:
1356, 1365, 1536, 1563, 1635, 1653, 3156, 3165, 3516, 3561, 3615, 3651, ..., 6531
So the number of subsets with four elements chosen from a set with 10 elements is
10!/(4! 6!).
One might be interested, more generally, in counting the number of subsets with r elements taken from a set with n elements. The notation for this is C(n, r), read the number of combinations of n things taken r at a time. This mouthful of words is often abbreviated as “n choose r”. The formula is

C(n, r) = n!/(r!(n − r)!).
An interesting application is to count the number of five card hands in a deck
of 52 cards. Noting that this is nothing more than counting the number of subsets
with five elements in a set with 52 elements, we have that there are
C(52, 5) = 52!/(5! 47!) = (52 · 51 · 50 · 49 · 48)/(5 · 4 · 3 · 2) = 52 · 51 · 10 · 49 · 2 = 2,598,960.

You read that right. There are roughly 2.6 million hands. The reader will note that we took advantage of a pair of cancellations to carry out the computation. We noted first that

52!/47! = (52 · 51 · 50 · 49 · 48 · 47!)/47! = 52 · 51 · 50 · 49 · 48.

We also used the fact that 50/5 = 10 and 48/(4 · 3 · 2) = 2.
A calculator will do the work for you. For example, with a TI-84, you input 52,
select MATH-PRB-nCr, then input 5, and select ENTER. You’ll note the notation
nCr that Texas Instruments uses for n choose r. There are a number of ways to
represent this in the literature:
nCr = C(n, r) = C_r^n, all denoting the binomial coefficient "n choose r".
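For readers who prefer software to a calculator, these counting functions are available directly in Python's standard library (version 3.8 or later); the short sketch below is illustrative only and reproduces the values computed above.

    import math

    print(math.factorial(12))   # 479001600, the value of 12!
    print(math.perm(10, 4))     # 5040 four-digit PINs with distinct numerals, i.e. 10P4
    print(math.comb(10, 4))     # 210 four-element subsets of a ten-element set
    print(math.comb(52, 5))     # 2598960 five-card hands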

Exercises

(1) A certain state's license plate numbers consist of three letters A-Z followed
by three numerals 0-9. Compute the total number of license plate numbers
this state can have.
Ans. 17,576,000
(2) How many subsets with two elements does the set {a, b, c, d} have? List them.
Ans. C(4, 2) = 6. The subsets are {a, b}, {a, c}, {a, d}, {b, c}, {b, d}, {c, d}
(3) How many permutations of size two does abcd have? List them.
Ans. 4P2 = 12. The permutations are ab, ba, ac, ca, ad, da, bc, cb, bd, db, cd, dc

(4) The Binomial Theorem states that for natural numbers n,

    (a + b)^n = Σ_{k=0}^{n} C(n, k) a^(n−k) b^k.

Use this theorem to expand (a) (x + 1)^4, (b) (2x + 3)^3.
Ans. (a) x^4 + 4x^3 + 6x^2 + 4x + 1, (b) 8x^3 + 36x^2 + 54x + 27
(5) How many combinations of four different letters are there? How many four
letter words are there (including nonsensical ones)? How many four letter
words with non-repeating letters are there?
Ans. 14,950, 456,976, 358,800
(6) How many different two person teams can you select from 10 individuals?
Ans. 45
(7) Prove that C(n, n) = C(n, 0) = 1.
(8) Prove that C(n, 1) = n.
(9) Prove that C(n, r) = C(n, n − r).
(10) If you toss a coin 20 times, compute the number of ways you could get exactly
eight heads. [Hint: One way would be if the first eight were heads and the
next 12 tails; another way would be if the first, third, fifth, seventh, ninth,
eleventh, thirteenth, and fifteenth, were heads and the rest tails; etc.]
Ans. 125,970
(11) If you roll a die 10 times, count the number of ways you could get (a) exactly
two 6’s and (b) exactly three 6’s.
Ans. (a) 45, (b) 120

3.2. Bernoulli
Perhaps the simplest of all discrete random variables used for modeling purposes
is the Bernoulli. It’s named after Jacob Bernoulli, a Swiss mathematician from the
latter half of the seventeenth century. The random variable X is said to have a
Bernoulli distribution with parameter p if P (X = 1) = p and P (X = 0) = 1−p. The
parameter p of course has to be a positive number less than 1. Simple computations
will produce the mean and standard deviation of the Bernoulli. We have
E(X) = 0 · (1 − p) + 1 · p = p
and
    SD(X) = sqrt(E(X^2) − (E(X))^2) = sqrt(0^2 · (1 − p) + 1^2 · p − p^2) = sqrt(p(1 − p)).

3.3. Binomial
Suppose only 15% of airline passengers arriving at an international airport are
chosen for complete baggage scrutiny and that these passengers are selected at
random. What’s the probability that exactly three of the next ten passengers will
be selected? To answer this question we first deduce the probability that, of the
next ten passengers, the first three are selected and the following seven are not.
Since whether or not one passenger is selected is independent of whether or not
another is selected, we have that this probability is

    (0.15)^3 (0.85)^7.

Now there are many different ways exactly three of the ten passengers can be
selected. It could be the first three or the last three, or it could be the third, fifth,
and ninth, etc. The number of such ways is equal to the number of subsets of three
elements there are in a set of ten elements. In other words, there are

    C(10, 3) = 120

ways. Each one of the ways is equally likely, so we have that the answer to our
question is

    C(10, 3)(0.15)^3 (0.85)^7 = 0.1298.

The passenger baggage check example illustrates a binomial model. In order to


compute probabilities associated with phenomena involving n independent trials,
each of which results in success with probability p, we let the random variable X
count the number of successes in the n trials. Then X is said to be a binomial
random variable with parameters n and p. The probability mass function for this
random variable is given by

    P(X = k) = C(n, k) p^k (1 − p)^(n−k),

for k = 0, 1, 2, ..., n. A random variable having this probability mass function is said
to be binomial with parameters n and p.

We now compute the expected value of such a random variable:

    E(X) = Σ_{k=0}^{n} k C(n, k) p^k (1 − p)^(n−k)
         = Σ_{k=1}^{n} k C(n, k) p^k (1 − p)^(n−k)
         = Σ_{k=1}^{n} [k · n!/(k!(n − k)!)] p^k (1 − p)^(n−k)
         = np Σ_{k=1}^{n} [(n − 1)!/((k − 1)!(n − 1 − (k − 1))!)] p^(k−1) (1 − p)^((n−1)−(k−1))
         = np Σ_{k=0}^{n−1} [(n − 1)!/(k!((n − 1) − k)!)] p^k (1 − p)^((n−1)−k)
         = np(p + 1 − p)^(n−1)
         = np.

We used the Binomial Theorem for the penultimate equality. A similar computation
along with some clever algebra allows us to compute E(X^2) = n(n − 1)p^2 + np.
We have

    E(X^2) = E(X(X − 1) + X)
           = Σ_{k=0}^{n} k(k − 1) C(n, k) p^k (1 − p)^(n−k) + E(X)
           = Σ_{k=2}^{n} k(k − 1) C(n, k) p^k (1 − p)^(n−k) + np
           = n(n − 1)p^2 Σ_{k=2}^{n} [(n − 2)!/((k − 2)!(n − 2 − (k − 2))!)] p^(k−2) (1 − p)^((n−2)−(k−2)) + np
           = n(n − 1)p^2 Σ_{k=0}^{n−2} [(n − 2)!/(k!((n − 2) − k)!)] p^k (1 − p)^((n−2)−k) + np
           = n(n − 1)p^2 (p + 1 − p)^(n−2) + np
           = n(n − 1)p^2 + np.

This allows us to compute

    V(X) = n(n − 1)p^2 + np − (np)^2
         = n^2 p^2 − np^2 + np − n^2 p^2
         = np − np^2
         = np(1 − p).

Consequently, the mean and standard deviation of a binomial random variable with
parameters n and p are np and sqrt(np(1 − p)), respectively.
Since the binomial random variable is so widely used for modeling purposes,
we repeat the distribution’s basic formulas.

Binomial Distribution. If X is a binomial random variable with parameters
n and p (i.e. X counts the number of successes in n independent trials,
each of which results in success with probability p), then

    P(X = k) = C(n, k) p^k (1 − p)^(n−k),

for k = 0, 1, 2, ..., n. Moreover, E(X) = np and SD(X) = sqrt(np(1 − p)).
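As a quick software check on the baggage screening example, the binomial mass function can be evaluated directly from the boxed formula. The sketch below is illustrative and uses only Python's standard library.

    from math import comb

    def binom_pmf(k, n, p):
        # P(X = k) for a binomial random variable with parameters n and p
        return comb(n, k) * p**k * (1 - p)**(n - k)

    # Probability exactly 3 of the next 10 passengers are selected when p = 0.15.
    print(round(binom_pmf(3, 10, 0.15), 4))   # 0.1298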

Exercises

(1) If you toss a fair coin 10 times, what’s the probability you get (a) exactly
five heads, (b) exactly three heads, and (c) exactly seven heads? Round your
answers off to four decimal places.
Ans. (a) 0.2461 (b) 0.1172 (c) 0.1172
(2) If you roll a balanced die 20 times, what’s the probability you get (a) exactly
four 6’s? (b) at least two 6’s?
Ans. (a) 0.2022 (b) 0.8696
(3) If you take a 10 question true false exam by guessing on each question, what's
the probability (a) you get exactly seven questions correct? (b) you pass the
exam by getting at least 60% of the questions correct?
Ans. (a) 120/1024 = 0.1172, (b) 0.377
(4) Assuming it’s equally likely a couple will have a boy or a girl, what’s the
probability that a couple having five children will have (a) exactly two boys ?
(b) all boys?
Ans. (a) 0.3125 (b) 0.03125
(5) Only 15% of motorists come to a complete stop at a certain four way stop
intersection. What’s the probability that of the next ten motorists to go
through that intersection (a) none come to a complete stop, (b) at least one
comes to a complete stop, and (c) exactly two come to a complete stop.
Ans. (a) 0.1969, (b) 0.8031, (c) 0.2759
(6) Given X is a binomial random variable with n = 20 and p = 3/4, compute
F(18) where F(x) is the CDF of X.
Ans. F(18) = P(X ≤ 18) = 1 − P(X = 19 or 20) = 0.9757
(7) Given X is a binomial random variable with n = 10 and p = 1/2, compute
P(µ − σ ≤ X ≤ µ + σ) where µ is the expected value and σ the standard
deviation of X.
Ans. P(µ − σ ≤ X ≤ µ + σ) = P(5 − sqrt(2.5) ≤ X ≤ 5 + sqrt(2.5)) = P(X = 4, 5, or 6) = 0.65625.
(8) Given X is a binomial random variable with mean 6 and standard deviation
2, compute P (X = 5).
Ans. 0.1812

3.4. Hypergeometric
Another discrete random variable that is closely related to the binomial is the
hypergeometric. For this model, there are a number of trials where the probability
of success in each trial changes depending on what happens in previous trials.
Suppose we select n objects from a lot of N objects without replacement. Suppose,
moreover that M of the objects in the lot of N are of a characteristic of interest.
We can then compute the probability that k of the n objects we select are of
the characteristic. Letting X count the number of objects of the characteristic of
interest in the selection of n objects, we have
    P(X = k) = C(M, k) C(N − M, n − k) / C(N, n).

A random variable with this probability mass function is said to be hypergeometric
with parameters N, M, and n.
We provide a simple example.

Example 3.1. Suppose five cards are dealt without replacement from a well-
shuffled deck of 52 playing cards. What’s the probability exactly two of the five are
aces?

We note that if X counts the number of aces dealt, then X is hypergeometric


with the characteristic of interest being that the card is an ace. We have n = 5,
N = 52, and M = 4. The answer is

    P(X = 2) = C(4, 2) C(52 − 4, 5 − 2) / C(52, 5) = 0.0399.

A somewhat intricate derivation yields

    E(X) = n(M/N)

and

    V(X) = n (M/N)(1 − M/N)((N − n)/(N − 1)).
It's interesting to note that if M → ∞ and N → ∞ in such a way that M/N stays
constant, say M/N = p, then E(X) = np and lim_{M,N→∞} V(X) = np(1 − p). Indeed,
when M and N are really large, M/N changes very little from selection to selection
in a small sample from the lot, so that the hypergeometric is approximated by
the binomial with p = M/N. After summarizing the important formulas for the
hypergeometric, we illustrate with an example.

The hypergeometric random variable counts the number of objects of a
characteristic of interest drawn from a lot of size N when M objects in the
lot have the characteristic and the sample size is n. The hypergeometric
random variable X with the parameters N, M, and n has probability mass
function

    P(X = k) = C(M, k) C(N − M, n − k) / C(N, n)

for k = 0, 1, 2, ..., min{M, n}. This random variable is approximately binomial
with parameters n and p = M/N if N and M are really large and n is
relatively small.

Example 3.2. Suppose a city has 322,000 registered voters, 58% of whom support
a certain referendum. In a random sample of 20 voters from that city, what’s the
probability that exactly 12 support the referendum?

Letting X count the number in the sample who support the referendum, we
have that X is hypergeometric with N = 322,000, M = (0.58)(322,000) = 186,760,
and n = 20. The answer is therefore

    P(X = 12) = C(186760, 12) C(322000 − 186760, 20 − 12) / C(322000, 20) = 0.1774.

Since M and N are so large, the binomial with n = 20 and p = 0.58 gives us a good
approximation:

    P(X = 12) = C(20, 12)(0.58)^12 (1 − 0.58)^(20−12) = 0.1768.
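The hypergeometric mass function is just as easy to evaluate directly from its formula; the sketch below (illustrative, standard library only) reproduces the ace-counting probability of Example 3.1 and the binomial approximation used in Example 3.2.

    from math import comb

    def hypergeom_pmf(k, N, M, n):
        # P(X = k) when n objects are drawn without replacement from a lot of N
        # objects, M of which have the characteristic of interest
        return comb(M, k) * comb(N - M, n - k) / comb(N, n)

    print(round(hypergeom_pmf(2, 52, 4, 5), 4))          # Example 3.1: 0.0399
    print(round(comb(20, 12) * 0.58**12 * 0.42**8, 4))   # binomial approximation, about 0.1768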

Exercises

(1) If 13 cards are to be dealt without replacement from a well-shuffled deck of 52


cards, compute the probability that (a) exactly five will be picture cards and
(b) exactly six will be picture cards. Keep in mind that a deck of 52 cards has
12 picture cards.
Ans. (a) 0.0959 (b) 0.0271
(2) If five cards are to be dealt without replacement from a well-shuffled deck of
52 cards, compute the probability that (a) exactly one will be an ace, (b)
exactly two will be an ace, (c) at least one will be an ace. Keep in mind that
a deck of 52 cards has four aces.
Ans. (a) 0.2995 (b) 0.0399 (c) 0.3412
(3) A calculus professor has 100 students in her class. She randomly selects five
students and tests them individually to see if they can find the antiderivative
of x sin x. If all five can find it, then she’ll conclude the entire class knows
how to find it. There are actually 20 students in the class who are not able to
calculate the antiderivative. What’s the probability she’ll mistakenly conclude
the whole class can do it?
Ans. 0.3193

(4) Suppose a unit in a certain company consists of 12 engineers, five accoun-


tants, and three administrative assistants. If a manager randomly selects four
individuals from the unit to be on a committee, what’s the probability the
committee will have (a) exactly two engineers (b) all engineers?
Ans. (a) 0.3814 (b) 0.1022
(5) Suppose a vending machine is loaded with 120 soft drinks, 10 of which are
past the expiration date. What’s the probability that a customer who buys
three drinks from the machine right after it’s loaded will get at least one with
an expired date? Assume the drinks were loaded randomly.
Ans. 0.2315
(6) Suppose a company manufactures 1.2 million TV sets, 7200 of which have a
defective power switch. Compute the probability to seven decimal places that
a restaurant that buys 15 of this company’s TV sets will have at least one
with a defective power switch using (a) the hypergeometric distribution and
(b) the binomial distribution.
(a) .0863170 (b) 0.0863165

3.5. Poisson
The Poisson distribution - named after the great French mathematician and physi-
cist Simeon Poisson of the early nineteenth century - arises as a discrete limiting
model of the binomial. We let X be binomial with parameters n and p and consider
a limiting version of this random variable where n → ∞ and p → 0 in such a way
that np stays constant, say np = λ. We note that for k = 0, 1, 2, ..., n,
    lim_{n→∞, p→0} P(X = k)
        = lim_{n→∞, p→0} C(n, k) p^k (1 − p)^(n−k)
        = lim_{n→∞, p→0} [n(n − 1) · · · (n − k + 1)/k!] (λ/n)^k (1 − λ/n)^(n−k)
        = (λ^k/k!) lim_{n→∞, p→0} [n(n − 1) · · · (n − k + 1)/n^k] (1 − λ/n)^(n−k)
        = (λ^k/k!) lim_{n→∞, p→0} (1)(1 − 1/n)(1 − 2/n) · · · (1 − (k − 1)/n) · (1 − λ/n)^n (1 − λ/n)^(−k)
        = (λ^k/k!)(1)(e^(−λ))(1)
        = e^(−λ) λ^k/k!.

The factor immediately preceding e−λ in the second to last line in fact has limit 1
since it consists of the finite number k of factors, each having limit 1.

The random variable X is said to be Poisson with parameter λ if

    P(X = k) = e^(−λ) λ^k/k!,

for k = 0, 1, 2, 3, .... As the limiting derivation above shows, the Poisson with
parameter λ = np approximates a binomial random variable with parameters n
and p if n is large and p is close to zero. In practice, this approximation is good if
n ≥ 100, p ≤ 0.01, and np ≤ 20.

Computations show that

E(X) = V (X) = λ.

Note

    E(X) = Σ_{k=0}^{∞} k · e^(−λ) λ^k/k!
         = Σ_{k=1}^{∞} k · e^(−λ) λ^k/k!
         = λ e^(−λ) Σ_{k=1}^{∞} λ^(k−1)/(k − 1)!
         = λ e^(−λ) Σ_{k=0}^{∞} λ^k/k!
         = λ e^(−λ) e^λ
         = λ.

Also,

    E(X(X − 1)) = Σ_{k=0}^{∞} k(k − 1) · e^(−λ) λ^k/k!
                = Σ_{k=2}^{∞} k(k − 1) · e^(−λ) λ^k/k!
                = λ^2 e^(−λ) Σ_{k=2}^{∞} λ^(k−2)/(k − 2)!
                = λ^2 e^(−λ) Σ_{k=0}^{∞} λ^k/k!
                = λ^2 e^(−λ) e^λ
                = λ^2.

We therefore have that the variance is

    E(X^2) − (E(X))^2 = [E(X^2) − E(X)] + E(X) − (E(X))^2 = λ^2 + λ − λ^2 = λ.

We provide here the basic properties of the Poisson before applying them to an
example problem.

The Poisson random variable X with parameter λ has the probability mass
function

    P(X = k) = e^(−λ) λ^k/k!,

for k = 0, 1, 2, 3, .... The mean and standard deviation are E(X) = λ and
SD(X) = sqrt(λ).

Example 3.3. If 0.8% of the population has a certain disease, compute the proba-
bility that exactly three of two hundred randomly chosen individuals will have the
disease. Carry out this computation to four decimal places using (a) the binomial
distribution and (b) the Poisson distribution.

We let X count the number of individuals in the sample to have the disease.
Then (a) according to the binomial model we have that

    P(X = 3) = C(200, 3)(0.008)^3 (1 − 0.008)^(200−3) = 0.1382,

and (b) according to the Poisson model we have that

    P(X = 3) ≈ e^(−200(0.008)) [(200)(0.008)]^3/3! = 0.1378.

You'll note that the approximation from the Poisson is good to three decimal places.
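It takes only a couple of lines to confirm how close the two models in Example 3.3 are; the sketch below is illustrative and uses only the standard library.

    from math import comb, exp, factorial

    n, p, k = 200, 0.008, 3
    lam = n * p
    binom = comb(n, k) * p**k * (1 - p)**(n - k)
    poisson = exp(-lam) * lam**k / factorial(k)
    print(round(binom, 4), round(poisson, 4))   # 0.1382 0.1378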

Another application of the Poisson is to count the number of occurrences of an


event in a given time interval. For example one can model the number of objects in
a queue over time. The application arises under three assumptions for the random
variable N (t) that counts the number of events occurring during the time interval
[0, t]:
(1) P (N (t) = 1) = λt + o(t) where λ is a constant
(2) P (N (t) ≥ 2) = o(t)
(3) The number of events occurring in disjoint subintervals of [0, t] are indepen-
dent.
The expression o(t) represents a function f(t) with the property that

    lim_{t→0} f(t)/t = 0.

The reader will note that a function meeting this requirement will have to approach
zero rapidly. An example is f(t) = t^2. Less precisely, the assumptions are that (1)
the probability that a single event occurs in a small time interval of length t is λt
plus a term that is small in relation to t, (2) the probability that two or more events
occur in a small interval of length t is small in relation to t, and (3) that which
occurs in one interval has no probability effect on what happens in another disjoint
interval.

To compute P (N (t) = k) we divide the interval [0, t] into n subintervals, each


of length t/n. Then we can write P (N (t) = k) as the sum of two probabilities which
we will discuss. The first is the probability that exactly one event occurs in each
of k subintervals and that no events occur in any of the other n − k subintervals.

The second is the probability that N (t) = k and that two or more events occur in
at least one subinterval. The reader will note that the second of these probabilities
is bounded above by
    Σ_{k=1}^{n} o(t/n) = n · o(t/n) = t [o(t/n)/(t/n)].

Since t/n → 0 as n → ∞, we have that

    t [o(t/n)/(t/n)] → 0
as n → ∞.
The first of the two probabilities is equal to

    C(n, k) (λt/n + o(t/n))^k (1 − λt/n − o(t/n))^(n−k).

Recalling that the binomial with large n and small p approximates the Poisson with
parameter np, we consider

    n(λt/n + o(t/n)) = λt + t [o(t/n)/(t/n)],

which approaches λt as n → ∞. Hence, by letting n → ∞, this first probability
becomes

    e^(−λt) (λt)^k/k!.
Consequently, we arrive at

    P(N(t) = k) = e^(−λt) (λt)^k/k!

for k = 0, 1, 2, ....

We provide an application.

Example 3.4. The number of tornadoes touching down per year in the two parish
Caddo/Bossier region is Poisson with mean 19. Compute (a) the probability that
there will be exactly 16 tornadoes to touch down in this region next year and (b)
that there will be exactly 40 to touch down over the next two years.

(a) If X counts the number of tornadoes to touch down in Caddo/Bossier Parish


region next year, then E(X) = λ = 19. We thus have that

    P(X = 16) = e^(−19) (19)^16/16! = 0.0772.

(b) If Y counts the number to touch down over the next two years, then E(Y) = (λ)(2) = 38, so that

    P(Y = 40) = e^(−38) (38)^40/40! = 0.0598.
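The only modeling step in Example 3.4 is rescaling the rate: over t years the count is Poisson with parameter λt. A minimal illustrative sketch:

    from math import exp, factorial

    def poisson_pmf(k, lam):
        return exp(-lam) * lam**k / factorial(k)

    print(round(poisson_pmf(16, 19), 4))       # one year:  0.0772
    print(round(poisson_pmf(40, 2 * 19), 4))   # two years: 0.0598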

Exercises

(1) If X is Poisson with parameter λ = 3, compute (a) P(X = 4), (b) P(X ≤ 2),
(c) E(X), (d) SD(X), and (e) P(X ≥ 1).
Ans. (a) 0.1680, (b) 0.4232, (c) 3, (d) sqrt(3), and (e) 0.9502
(2) The number of cars that go through Griff’s drive-through during the noon
hour is Poisson with mean 21.6. What’s the probability that during the noon
hour tomorrow (a) exactly 22 go through the drive-through (b) at least seven
pass through the drive-through.
Ans. (a) 0.0844 (b) 0.99992
(3) Suppose 0.3% of the population has a certain disease. In a random sample of
400 people, what’s the probability that (a) at least one has the disease and
(b) exactly two have the disease? Compute the answer for each part using
both the binomial distribution and the Poisson.
Ans. (a) 0.69935, 0.69881, (b) 0.21723, 0.21686
(4) Suppose 0.6% of the TV sets a company manufactures have a defective power
switch. Compute the probability that a university that buys 150 of this com-
pany’s TV sets will have just one with a defective power switch using (a) the
binomial distribution and (b) the Poisson distribution.
Ans. (a) 0.3671 (b) 0.3659
(5) Suppose there are 6.2 accidents per year on average on a certain interstate
entrance ramp. Use the Poisson distribution to compute the probability there
will be at most four accidents on this entrance ramp over the next two years.
Ans. 0.0057

3.6. Geometric
Suppose only eight percent of New Orleans residents attended the last Saints foot-
ball game. If you randomly select residents of the city trying to find one who
attended the game, what’s the probability that you don’t encounter an attendee
until the sixth selection? For this to happen, the first five selected would have had
to be non-attendees and the sixth an attendee. The probability would therefore be

(1 − 0.08)5 (0.08) = 0.0527.

The fact that residents are selected randomly makes the selections independent,
allowing us to arrive at the answer by multiplying the probabilities of each of the
five non-attendees times the probability of the subsequent attendee. Those
who study random phenomena use what is called the geometric random variable to
model this situation.
We consider a succession of independent trials, each of which results in success
with probability p. The geometric random variable X counts the number of trials
required to encounter the first success. In the example just related, we have that
P (X = 6) = 0.0527. In general we have:

The geometric random variable X counts the number of independent trials
it takes to obtain a success when an individual trial results in success with
probability p. Its mass function is given by

    P(X = n) = (1 − p)^(n−1) p,

for n = 1, 2, 3, .... The mean and standard deviation are E(X) = 1/p and
SD(X) = sqrt(1 − p)/p.

The reader will note that the probability mass function is in fact legitimate
since the probabilities of all the values X can take sum to 1:

    Σ_{n=1}^{∞} (1 − p)^(n−1) p = p Σ_{n=1}^{∞} (1 − p)^(n−1) = p · 1/(1 − (1 − p)) = 1.
To compute the mean, note that

    E(X) = Σ_{n=1}^{∞} n(1 − p)^(n−1) p
         = p + Σ_{n=2}^{∞} n(1 − p)^(n−1) p
         = p + (1 − p) Σ_{n=2}^{∞} n(1 − p)^(n−2) p
         = p + (1 − p) Σ_{n=1}^{∞} (n + 1)(1 − p)^(n−1) p
         = p + (1 − p)(E(X) + Σ_{n=1}^{∞} (1 − p)^(n−1) p)
         = p + (1 − p)(E(X) + p · 1/(1 − (1 − p)))
         = p + (1 − p)(E(X) + 1)
         = (1 − p)E(X) + 1.

Solving E(X) = (1 − p)E(X) + 1 for E(X) yields E(X) = 1/p. Computing the
variance of X requires a bit more ingenuity. The classic derivation involves writing
X^2 = X(X − 1) + X to get a value for E(X^2).
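The Saints attendance question at the start of this section is a one-line computation, and a quick simulation is a useful sanity check for geometric models. The sketch below is purely illustrative; the simulation loop and its trial count are not part of the text.

    import random

    p = 0.08
    print(round((1 - p)**5 * p, 4))      # exact answer: 0.0527

    trials, hits = 100_000, 0
    for _ in range(trials):
        n = 1
        while random.random() >= p:      # keep selecting residents until an attendee
            n += 1
        if n == 6:
            hits += 1
    print(round(hits / trials, 4))       # should land close to 0.0527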

Exercises

(1) When flipping a balanced coin, what’s the probability that you get the first
heads on the fourth try?
Ans. 1/16
(2) If X is a random variable counting the number of times you have to flip
a balanced coin to get the first heads, (a) determine the probability mass
function for X, (b) compute E(X), and (c) compute V (X).
Ans. p(n) = 1/2^n, E(X) = V(X) = 2

(3) If X is a random variable counting the number of tosses of a balanced die it
takes to get the first six, (a) determine the probability mass function for X,
(b) compute E(X), and (c) compute V(X).
Ans. p(n) = 5^(n−1)/6^n, E(X) = 6, V(X) = 30
(4) The probability an adult male in Finland has an interpupillary distance (IPD)
of less than 60 mm is 22%. If you measure the IPD of randomly selected
Finnish adult males, what's the probability the tenth one selected is the first
one to have an IPD less than 60 mm?
Ans. 0.0235

3.7. Negative Binomial


The negative binomial random variable generalizes the geometric. Again we con-
sider a number of independent trials, each resulting in success with probability p.
This time our random variable X counts the number of trials it takes to obtain
r successes (instead of just one success as was the case with the geometric). To
compute P (X = n) we’ll note that in the first n − 1 trials there need to be r − 1
successes, and the nth trial needs to result in success. Consequently,

    P(X = n) = C(n − 1, r − 1) p^(r−1) (1 − p)^(n−1−(r−1)) · p = C(n − 1, r − 1) p^r (1 − p)^(n−r).

A technique we introduce later will allow us to derive the formulas for the mean
and standard deviation. We summarize in the table.

For the negative binomial random variable X counting the number of independent
trials (each with success probability p) necessary to obtain r successes,

    P(X = n) = C(n − 1, r − 1) p^r (1 − p)^(n−r),

for n = r, r + 1, r + 2, .... The mean and standard deviation are E(X) = r/p and
SD(X) = sqrt(r(1 − p))/p.
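Direct evaluation of this mass function is usually all that's needed. The sketch below is a hypothetical illustration: the probability that the third head in a run of fair coin flips arrives on exactly the tenth flip.

    from math import comb

    def neg_binom_pmf(n, r, p):
        # P(X = n): the r-th success occurs on trial n
        return comb(n - 1, r - 1) * p**r * (1 - p)**(n - r)

    print(round(neg_binom_pmf(10, 3, 0.5), 4))   # 36/1024 = 0.0352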

Exercises

(1) What’s the probability you have to roll a die eight times in order to get two
6’s?
Ans. 0.0651
(2) What’s the probability you have to roll a die 15 times in order to get four 1’s?
Ans. 0.0378
(3) A real estate analyst knows that in a certain county 23% of the houses have
a selling price higher than $300, 000. What’s the probability that during the
next month in that county, the fifth house sold at a price of more than $300, 000
is the twentieth house sold that month?
Ans. 0.0495
Chapter 4

Widely Used Continuous Random Variables

4.1. Uniform
In previous chapters we studied theoretical continuous random variables just to
gain an understanding of probability density functions. Now we look at several
continuous random variables that are used extensively to model random phenom-
ena.
The uniform distribution on the interval [a,b], a < b, is given by the random
variable X with probability density function

    f(x) = 1/(b − a) for a ≤ x ≤ b, and f(x) = 0 otherwise.

Note that f(x) is in fact a probability density function since f(x) ≥ 0 for all x and

    ∫_{−∞}^{∞} f(x) dx = ∫_a^b 1/(b − a) dx = x/(b − a) |_a^b = (b − a)/(b − a) = 1.
In one of the exercises, the reader is asked to show that the random variable's mean
and variance are (a + b)/2 and (b − a)^2/12, respectively. We box in the basic formulas for the
uniform distribution before presenting an example.

Uniform Distribution. The random variable X is said to be uniform on
the interval [a,b], a < b, if its probability density function is

    f(x) = 1/(b − a) for a ≤ x ≤ b, and f(x) = 0 otherwise.

For this random variable E(X) = (a + b)/2 and SD(X) = (b − a)/sqrt(12).


Example 4.1. When a clock’s battery runs out, the location at which the minute
hand stops is uniform with pdf

    f(x) = 1/12 for 0 ≤ x ≤ 12, and f(x) = 0 otherwise.

The 0 and 12 refer to the hours on the clock face, of course. Compute the probability
that the minute hand stops somewhere between 10 and 11 o'clock.

    P(10 < X < 11) = ∫_{10}^{11} (1/12) dx = x/12 |_{10}^{11} = 1/12.

Exercises

(1) Given that X is a uniform random variable on the interval [−15, 185], compute
(a) P (X < 0), (b) P (10 < X < 50), and (c) P (X = 85).
Ans. (a) 0.075 (b) 0.2 (c) 0
(2) Compute the 75th percentile of X in the previous problem.
Ans. 135
(3) If the random variable X is uniform on [a, b], show that (a) E(X) = (a + b)/2 and
(b) V(X) = (b − a)^2/12.
(4) Use the formulas in the previous problem to compute (a) the mean and (b)
the standard deviation of the random variable in the first exercise in this set.
Ans. (a) 85 (b) 57.73503
(5) If F (x) is the CDF of a uniform random variable on the interval [2, 6], compute
(a) F (3), (b) F (4), and (c) F (100).
Ans. (a) 0.25 (b) 0.5 (c) 1
(6) Find a rule for the function F(x) in the previous exercise.
Ans. F(x) = 0 if x < 2; F(x) = (x − 2)/4 if 2 ≤ x < 6; F(x) = 1 if x ≥ 6
(7) Find a rule for the CDF F(x) of the random variable X that is uniform on
the interval [a, b]. The rule should consist of the three pieces where x < a,
a ≤ x < b, and x ≥ b.
Ans. F(x) = 0 if x < a; F(x) = (x − a)/(b − a) if a ≤ x < b; F(x) = 1 if x ≥ b

4.2. Exponential
The random variable X with parameter λ > 0 is said to be an exponential random
variable if it has probability density function

    f(x) = λe^(−λx) for x ≥ 0, and f(x) = 0 otherwise.
Many phenomena, such as lifetimes of organisms and interarrival times of
customers at a store or calls made to a telephone company, can be modeled

accurately with exponential distributions. That this is the case is understandable


if we consider interarrival times of events modeled by the Poisson.

The reader will recall that we studied the random variable N (t) that counts
the number of events occurring during the time interval [0, t]. We imposed the
assumptions that (1) the probability that a single event occurs in the time interval
is λt plus a term that is small in relation to t, (2) the probability that two or more
events occur in the interval is small in relation to t, and (3) that which occurs in one
interval has no probability effect on what happens in another disjoint interval. The
resulting probability mass function at which we arrived for N(t) was the Poisson:

    P(N(t) = k) = e^(−λt) (λt)^k/k!

for k = 0, 1, 2, ....

We now consider the time T on an interval up until which the first event occurs.
We have that
    F_T(t) = P(T ≤ t)
           = 1 − P(T > t)
           = 1 − P(N(t) = 0)
           = 1 − e^(−λt) (λt)^0/0!
           = 1 − e^(−λt).

Consequently,

    f_T(t) = F_T'(t) = d/dt (1 − e^(−λt)) = λe^(−λt).
The reader will recognize this as the probability density function for the exponential
random variable with parameter λ. Subsequent interarrival times are independent
and follow the same distribution.

Routine computations show that if X is exponential with parameter λ, then E(X) = 1/λ
and SD(X) = 1/λ. We carry out the integration by parts necessary to obtain the
mean. By letting u = x and dv = λe^(−λx) dx we get du = dx and v = −e^(−λx), giving
us

    E(X) = ∫_0^∞ x · λe^(−λx) dx
         = −xe^(−λx) |_0^∞ + ∫_0^∞ e^(−λx) dx
         = 0 − 0 − (1/λ) e^(−λx) |_0^∞
         = 1/λ.

Note that the computation of lim_{x→∞} xe^(−λx) in this integral requires use of L'Hospital's
Rule:

    lim_{x→∞} xe^(−λx) = lim_{x→∞} x/e^(λx) = lim_{x→∞} 1/(λe^(λx)) = 0.

We summarize the basics for the exponential distribution and then present
some examples.

Exponential Distribution. The exponential random variable X with
parameter λ > 0 has probability density function

    f(x) = λe^(−λx) for x ≥ 0, and f(x) = 0 otherwise.

For this distribution, E(X) = SD(X) = 1/λ.

Example 4.2. The time between calls at a phone company is exponentially dis-
tributed with mean 4 s. What’s the probability that the time between the next two
calls is more than 5 s?

Let X = the inter-arrival time of calls in seconds. Then

    P(X ≥ 5) = ∫_5^∞ (1/4) e^(−x/4) dx = −e^(−x/4) |_5^∞ = e^(−5/4) = 0.287.

Example 4.3. If X is an exponential RV for which P(X < 1) = 0.90, find t so
that P(X > t) = 0.05. (Note that this value of t is called the 95th percentile of X,
and 1 is the 90th percentile.)

We first find λ:

    0.90 = P(X < 1) = ∫_0^1 λe^(−λx) dx = −e^(−λx) |_0^1 = −e^(−λ) + 1,

so e^(−λ) = 0.10 and λ = −ln 0.10 ≈ 2.3026.
Now we have

    0.05 = P(X > t) = ∫_t^∞ 2.3026 e^(−2.3026x) dx = · · · = e^(−2.3026t),

so that t = −(1/2.3026) ln 0.05 ≈ 1.301.
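Both computations above reduce to evaluating or inverting the exponential CDF F(x) = 1 − e^(−λx); the sketch below (illustrative, standard library only) reproduces Examples 4.2 and 4.3.

    from math import exp, log

    # Example 4.2: P(X >= 5) when the mean is 4, i.e. lambda = 1/4.
    print(round(exp(-5 / 4), 3))        # 0.287

    # Example 4.3: pick lambda so that P(X < 1) = 0.90, then find the 95th percentile.
    lam = -log(0.10)                    # about 2.3026
    print(round(-log(0.05) / lam, 3))   # about 1.301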

Exercises

(1) Given X is an exponential random variable with mean 4, compute the prob-
abilities (a) P (X ≤ 4) and (b) P (X ≤ 2) to four decimal places.
Ans. (a) 0.6321 (b) 0.3935
(2) Given that X is an exponential random variable with parameter λ = 1, find
a rule for the distribution function F of X.
Ans. F(x) = 0 if x < 0, and F(x) = 1 − e^(−x) if x ≥ 0

(3) Find (a) the median and (b) the mean of the exponential random variable X
that has parameter λ = 1. Why is one greater than the other?
Ans. (a) ln 2 (b) 1
(4) Find (a) the mean, (b) the standard deviation, and the (c) median of the
exponential random variable X that has parameter λ = 3.
Ans. (a) 1/3, (b) 1/3, (c) −(1/3) ln(1/2) = 0.2310
(5) Suppose you model the lifetime of a certain battery with an exponential ran-
dom variable that has mean 2.5 hrs. According to this model, what’s the
probability that a randomly selected such battery lasts for more than three
hours?
Ans. 0.3012
(6) If X is an exponential random variable for which P (X < 1) = 0.80, find t so
that P (X > t) = 0.05.
Ans. 1.8614

4.3. Gamma
Playing an integral role in the gamma distribution is the gamma function
    Γ(α) = ∫_0^∞ e^(−t) t^(α−1) dt.

Integrating by parts, one can establish the relationship Γ(α) = (α − 1)Γ(α − 1).
Using this formula when α is a positive integer n and noting that Γ(1) = 1, we have
that
Γ(n) = (n − 1)!.

The probability density function for the gamma distribution with positive pa-
rameters α and β is given by
    f(x) = [1/(Γ(α)β^α)] x^(α−1) e^(−x/β) for x ≥ 0, and f(x) = 0 otherwise.

The reader will note that the gamma distribution reduces to the exponential with
parameter λ for α = 1 and β = 1/λ. When α is a positive integer and β = 1/λ, the gamma
distribution models the waiting time until the αth event occurs in a Poisson process with rate λ.

We summarize the basics for the gamma distribution.

Gamma Distribution. The gamma random variable X with parameters
α > 0 and β > 0 has probability density function

    f(x) = [1/(Γ(α)β^α)] x^(α−1) e^(−x/β) for x ≥ 0, and f(x) = 0 otherwise.

For this distribution, E(X) = αβ and SD(X) = β sqrt(α).
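Python's math.gamma evaluates Γ directly, which makes it easy to confirm the factorial identity and the recursion used above; the check below is illustrative only.

    from math import gamma, factorial

    print(gamma(5), factorial(4))         # 24.0 and 24, since Gamma(n) = (n-1)!
    print(gamma(4.5), 3.5 * gamma(3.5))   # both about 11.6317, since Gamma(a) = (a-1)Gamma(a-1)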

4.4. Normal
Many populations of values follow what is called a normal distribution. Examples
are weights of female individuals, heights of corn stalks, lengths of "8.5 × 11" sheets
of paper, etc. Moreover, one can apply a result called the Central Limit Theorem
to model the averages of random values from any distribution with the normal
distribution. Hence, the normal distribution plays a major role in statistics.

Normal Distribution. The random variable X is said to be normal
with parameters µ and σ > 0 if it has probability density function

    f(x) = [1/(sqrt(2π)σ)] e^(−(x−µ)^2/(2σ^2)).

This pdf has some notable characteristics:

1) f(x) > 0 for all x.

2) f(µ) = 1/(sqrt(2π)σ).

3) f(µ + a) = [1/(sqrt(2π)σ)] e^(−a^2/(2σ^2)) = f(µ − a), so f is symmetric about x = µ.

4) f'(x) = −[(x − µ)/(sqrt(2π)σ^3)] e^(−(x−µ)^2/(2σ^2)), so f'(x) < 0 if x > µ and f'(x) > 0 if x < µ.
Thus, f is increasing for x < µ and decreasing for x > µ.

5) lim_{x→∞} f(x) = 0 = lim_{x→−∞} f(x).

6) f''(x) = [x − (µ − σ)][x − (µ + σ)] [1/(sqrt(2π)σ^5)] e^(−(x−µ)^2/(2σ^2)), so f''(x) = 0 if x = µ ± σ,
f''(x) < 0 if µ − σ < x < µ + σ, and f''(x) > 0 if x < µ − σ or x > µ + σ. Hence,
f is concave downward on the interval (µ − σ, µ + σ) and concave upward outside
this interval, with inflection points at µ ± σ.

Putting all this together we see that the probability density function for the
normal random variable is a bell-shaped curve symmetric about x = µ. The larger
the value of σ the flatter the curve.

Example. The number of ounces of Coke in a “12 oz” can is normally distributed
with µ = 12 and σ = 0.2. What’s the probability that a randomly selected 12 oz
can of Coke has (a) between 11.9 and 12.1 oz? (b) between 11.5 and 12.5 oz? (c)
at least 11.8 oz?

(a) Letting f (x) be the pdf for the normal random variable X with mean 12
and standard deviation 0.2, we see that
    P(11.9 < X < 12.1) = ∫_{11.9}^{12.1} f(x) dx ≈ 0.383.
The integral cannot be evaluated algebraically, so a graphing calculator or computer
algebra system is used.

(b) P(11.5 < X < 12.5) = ∫_{11.5}^{12.5} f(x) dx ≈ 0.988

(c) P(X ≥ 11.8) = ∫_{11.8}^{∞} f(x) dx = ∫_{11.8}^{12} f(x) dx + 0.5 ≈ 0.841.

Since f (x) is symmetric about x = 12, half of X’s probability is to the right of
12.

The standard normal random variable is a normal random variable with µ = 0


and σ = 1. It is denoted by Z. The pdf of Z is

    f(x) = [1/sqrt(2π)] e^(−x^2/2).

A double integral in polar coordinates can be used to show that this function
integrates to one. If we let I = ∫_{−∞}^{∞} [1/sqrt(2π)] e^(−x^2/2) dx, we have

    I^2 = ∫_{−∞}^{∞} [1/sqrt(2π)] e^(−x^2/2) dx · ∫_{−∞}^{∞} [1/sqrt(2π)] e^(−y^2/2) dy
        = ∫_{−∞}^{∞} ∫_{−∞}^{∞} (1/(2π)) e^(−(x^2+y^2)/2) dx dy
        = ∫_0^{2π} ∫_0^{∞} (1/(2π)) e^(−r^2/2) r dr dθ
        = ∫_0^{2π} −(1/(2π)) e^(−r^2/2) |_0^{∞} dθ
        = ∫_0^{2π} (1/(2π)) dθ
        = 1,

so that I = 1.

When one needs to compute a probability associated with a normal distribution,


the integration cannot be done with pencil and paper. The usual procedure is to
convert the integrand to a standard normal density function using the substitution
x−µ
u= and then use a widely available standard normal distribution table to
σ

approximate the integral. Since

    P((X − µ)/σ < c) = P(X < cσ + µ)
                     = ∫_{−∞}^{cσ+µ} [1/(sqrt(2π)σ)] e^(−(x−µ)^2/(2σ^2)) dx
                     = ∫_{−∞}^{c} [1/sqrt(2π)] e^(−u^2/2) du
                     = P(Z < c),

it's the case that (X − µ)/σ and Z have the same distribution. Statisticians often write

    Z ∼ (X − µ)/σ

and refer to (X − µ)/σ as a Z-score. As a result, in the preceding example we could
have written

    P(11.9 < X < 12.1) = P((11.9 − 12)/0.2 < Z < (12.1 − 12)/0.2)
                       = P(−0.5 < Z < 0.5)
                       = P(Z < 0.5) − P(Z ≤ −0.5)
                       = ∫_{−∞}^{0.5} [1/sqrt(2π)] e^(−x^2/2) dx − ∫_{−∞}^{−0.5} [1/sqrt(2π)] e^(−x^2/2) dx
                       ≈ 0.6915 − 0.3085
                       = 0.383.

Since there’s not a simple rule for the cumulative distribution function of the
standard normal random variable Z, statisticians typically use a table. The tabular
entries are for P (Z ≤ x) - for x between zero and 3.59 in increments of 0.01. Since
Z has a probability density function symmetric about x = 0 and more than 99.9%
of Z’s probability fall between -3.5 and 3.5, one has more than enough values in
the table to take care of business. A common notation related to the table is that
of zα . The value zα is defined for all positive α < 1 by
α = P (Z > zα ).

Standard Normal Probabilities of the Form P (Z ≤ x)

0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
0.0 0.5000 0.5040 0.5080 0.5120 0.5160 0.5199 0.5239 0.5279 0.5319 0.5359
0.1 0.5398 0.5438 0.5478 0.5517 0.5557 0.5596 0.5636 0.5675 0.5714 0.5753
0.2 0.5793 0.5832 0.5871 0.5910 0.5948 0.5987 0.6026 0.6064 0.6103 0.6141
0.3 0.6179 0.6217 0.6255 0.6293 0.6331 0.6368 0.6406 0.6443 0.6480 0.6517
0.4 0.6554 0.6591 0.6628 0.6664 0.6700 0.6736 0.6772 0.6808 0.6844 0.6879
0.5 0.6915 0.6950 0.6985 0.7019 0.7054 0.7088 0.7123 0.7157 0.7190 0.7224
0.6 0.7257 0.7291 0.7324 0.7357 0.7389 0.7422 0.7454 0.7486 0.7517 0.7549
0.7 0.7580 0.7611 0.7642 0.7673 0.7704 0.7734 0.7764 0.7794 0.7823 0.7852
0.8 0.7881 0.7910 0.7939 0.7967 0.7995 0.8023 0.8051 0.8078 0.8106 0.8133
0.9 0.8159 0.8186 0.8212 0.8238 0.8264 0.8289 0.8315 0.8340 0.8365 0.8389
1.0 0.8413 0.8438 0.8461 0.8485 0.8508 0.8531 0.8554 0.8577 0.8599 0.8621
1.1 0.8643 0.8665 0.8686 0.8708 0.8729 0.8749 0.8770 0.8790 0.8810 0.8830
1.2 0.8849 0.8869 0.8888 0.8907 0.8925 0.8944 0.8962 0.8980 0.8997 0.9015
1.3 0.9032 0.9049 0.9066 0.9082 0.9099 0.9115 0.9131 0.9147 0.9162 0.9177
1.4 0.9192 0.9207 0.9222 0.9236 0.9251 0.9265 0.9279 0.9292 0.9306 0.9319
1.5 0.9332 0.9345 0.9357 0.9370 0.9382 0.9394 0.9406 0.9418 0.9429 0.9441
1.6 0.9452 0.9463 0.9474 0.9484 0.9495 0.9505 0.9515 0.9525 0.9535 0.9545
1.7 0.9554 0.9564 0.9573 0.9582 0.9591 0.9599 0.9608 0.9616 0.9625 0.9633
1.8 0.9641 0.9649 0.9656 0.9664 0.9671 0.9678 0.9686 0.9693 0.9699 0.9706
1.9 0.9713 0.9719 0.9726 0.9732 0.9738 0.9744 0.9750 0.9756 0.9761 0.9767
2.0 0.9772 0.9778 0.9783 0.9788 0.9793 0.9798 0.9803 0.9808 0.9812 0.9817
2.1 0.9821 0.9826 0.9830 0.9834 0.9838 0.9842 0.9846 0.9850 0.9854 0.9857
2.2 0.9861 0.9864 0.9868 0.9871 0.9875 0.9878 0.9881 0.9884 0.9887 0.9890
2.3 0.9893 0.9896 0.9898 0.9901 0.9904 0.9906 0.9909 0.9911 0.9913 0.9916
2.4 0.9918 0.9920 0.9922 0.9925 0.9927 0.9929 0.9931 0.9932 0.9934 0.9936
2.5 0.9938 0.9940 0.9941 0.9943 0.9945 0.9946 0.9948 0.9949 0.9951 0.9952
2.6 0.9953 0.9955 0.9956 0.9957 0.9959 0.9960 0.9961 0.9962 0.9963 0.9964
2.7 0.9965 0.9966 0.9967 0.9968 0.9969 0.9970 0.9971 0.9972 0.9973 0.9974
2.8 0.9974 0.9975 0.9976 0.9977 0.9977 0.9978 0.9979 0.9979 0.9980 0.9981
2.9 0.9981 0.9982 0.9982 0.9983 0.9984 0.9984 0.9985 0.9985 0.9986 0.9986
3.0 0.9987 0.9987 0.9987 0.9988 0.9988 0.9989 0.9989 0.9989 0.9990 0.9990
3.1 0.9990 0.9991 0.9991 0.9991 0.9992 0.9992 0.9992 0.9992 0.9993 0.9993
3.2 0.9993 0.9993 0.9994 0.9994 0.9994 0.9994 0.9994 0.9995 0.9995 0.9995
3.3 0.9995 0.9995 0.9995 0.9996 0.9996 0.9996 0.9996 0.9996 0.9996 0.9997
3.4 0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9998
3.5 0.9998 0.9998 0.9998 0.9998 0.9998 0.9998 0.9998 0.9998 0.9998 0.9998

Example 4.4. We look at some particular cases for using the standard normal
table.

a) P(Z < 1.92) = 0.9726

b) P(Z > 1.04) = 1 − P(Z ≤ 1.04) = 1 − 0.8508 = 0.1492
We used the complement rule P(A') = 1 − P(A).

c)
    P(−1 < Z < 2) = P(Z < 2) − P(Z ≤ −1)
                  ≈ 0.9772 − P(Z ≥ 1)
                  = 0.9772 − (1 − P(Z < 1))
                  ≈ 0.9772 − (1 − 0.8413)
                  = 0.8185

d) If X is normal with mean 1 and standard deviation 2, then

    P(1 < X < 2.04) = P((1 − 1)/2 < Z < (2.04 − 1)/2)
                    = P(0 < Z < 0.52)
                    = P(Z < 0.52) − P(Z ≤ 0)
                    ≈ 0.6985 − 0.5
                    = 0.1985

e) z.025 = 1.96 since P (Z ≤ 1.96) = 0.975 (so that P (Z > 1.96) = 0.025).

f) If IQ scores are normally distributed with µ = 100 and σ = 14.2, and a
"good" IQ score is considered to be in the top 20%, then we can calculate the
lowest "good" score. Let it be x. Then, with X representing a normal random
variable with mean 100 and standard deviation 14.2, we have that

    0.20 = P(X ≥ x) = P(Z ≥ (x − 100)/14.2), so

    (x − 100)/14.2 = z_0.20 ⇒ x = 14.2 z_0.20 + 100 ≈ 14.2(0.842) + 100 = 111.96 ≈ 112.
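In software the standard normal CDF is often written in terms of the error function, Φ(x) = (1 + erf(x/√2))/2, rather than read from a table. The sketch below is illustrative and redoes parts (a) through (d) of Example 4.4.

    from math import erf, sqrt

    def phi(x):
        # standard normal cumulative distribution function
        return 0.5 * (1 + erf(x / sqrt(2)))

    print(round(phi(1.92), 4))                       # (a) 0.9726
    print(round(1 - phi(1.04), 4))                   # (b) 0.1492
    print(round(phi(2) - phi(-1), 4))                # (c) 0.8186 (the table rounds to 0.8185)
    print(round(phi((2.04 - 1) / 2) - phi(0), 4))    # (d) 0.1985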

Exercises

(1) Use the table to compute (a) P (Z < 1.32), (b) P (Z > 0.64), (c) P (Z ≤
−2.06), (d) P (0.32 < Z < 1.56), (e) P (−1.54 < Z < 1.54), and (f) P (Z = 0)
Ans. (a) 0.9066 (b) 0.2611 (c) 0.0197 (d) 0.3151 (e) 0.8764 (f ) 0
(2) Use the table to compute (a) z.025 , (b) z.05 , and (c) z.01
Ans. (a) 1.96 (b) about half way between 1.64 and 1.65 so approximately
1.645 (c) a bit closer to 2.33 than to 2.32 so about 2.327

(3) If X is normal with µ = 1.4 and σ = 2.0, compute (a) P (X < 2.1), (b)
P (X > 1.0), (c) P (0.6 < X < 2.2), and (d) the 90th percentile of X.
Ans. (a) 0.6368
(4) The mean for scores on a certain test is 300, and the standard deviation is 16.
Assuming the test scores are normally distributed, compute the probability
that a random score on this test is at least 310. What score would be in the
95th percentile?
Ans. 0.2660, 326.3
(5) If X is normal with µ = −5 and σ = 4.2, compute (a) P (X < −7), (b)
P (X > 0), and (c) P (20 < X < 25).
Ans. (a) The table says the answer is between 0.6808 and 0.6844. A
decent graphing calculator will tell you the more precise answer is 0.6830. (c)
0.0000
(6) If X is normal with mean µ and standard deviation σ, compute (a) P (µ − σ <
X < µ + σ), (b) P (µ − 2σ < X < µ + 2σ), and (c) P (µ − 3σ < X < µ + 3σ).
Ans. (a) 0.6827, (b) 0.9545, (c) 0.9973. Note that statisticians talk about
an "Empirical Rule" which states that for normal distributions the probabilities
that values are within one, two, and three standard deviations of the mean
are roughly 68%, 95%, and 99%, respectively.
(7) Compute to two decimal places the 90th, 95th, and 99th percentiles of the
standard normal.
Ans. 1.28, 1.64, 2.33
Chapter 5

Joint Probability Distributions

5.1. Discrete Case


Two discrete random variables X and Y can be paired as a single entity referred
to as a random vector or bivariate random variable. This bivariate random variable
has a joint probability mass function p(x, y) for which we give a definition and
properties here.

Joint Probability Mass Function. The joint probability mass function
p(x, y) for the pair of random variables X and Y is defined by

    p(x, y) = P(X = x, Y = y).

Consequently, it's the case that p(x, y) ≥ 0 for all x and y and that
Σ_x Σ_y p(x, y) = 1. Moreover, for any event A in the sample space for
X × Y we have that

    P((X, Y) ∈ A) = Σ_{(x,y)∈A} p(x, y).

The joint probability distribution function F(x, y) for X and Y is given by

    F(x, y) = P(X ≤ x, Y ≤ y),

so that the distribution function F_X(x) for X is given by

    F_X(x) = lim_{y→∞} F(x, y),

and the distribution function for Y is

    F_Y(y) = lim_{x→∞} F(x, y).


The marginal probability mass function of X is given by

    p_X(x) = Σ_y p(x, y),

where p(x, y) is the joint probability mass function for X and Y. Similarly, we have
the marginal probability mass function of Y given by

    p_Y(y) = Σ_x p(x, y).

Example 5.1. An experiment consists of randomly selecting two individuals from


a group of 10 consisting of three accountants, five engineers, and two administrative
assistants. Let X count the number of accountants selected and Y the number of
engineers selected. Find the joint and marginal probability mass functions.

We note that both X and Y can only take the values 0, 1, and 2. Thinking in
terms of pertinent hypergeometric distributions, one can compute the probabilities
in the table:

                     y
    p(x, y)      0       1       2
        0      1/45   10/45   10/45
    x   1      6/45   15/45     0
        2      3/45     0       0

For example, selecting 0 accountants and 0 engineers would be the same as
selecting two administrative assistants, so

    p(0, 0) = C(2, 2) C(8, 0) / C(10, 2) = 1/45.

The number of ways to select exactly one accountant and exactly one engineer
would be C(3, 1) C(5, 1) C(2, 0) = 15, so

    p(1, 1) = 15/45.
1
The marginal probability mass function for X is given by fX (0) = 45 + 10 10
45 + 45 =
21 6 15 21 3 3
45 , fX (1) = 45 + 45 + 0 = 45 , and fX (2) = 45 + 0 + 0 = 45 .

1 6 3
The marginal probability mass function for Y is given by fY (0) = 45 + 45 + 45 =
10
45 , fY (1) = 10 15 25 10 10
45 + 45 + 0 = 45 , and fY (2) = 45 + 0 + 0 = 45 .
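The bookkeeping in Example 5.1 - summing rows and columns of the joint table - is easy to mirror in code. The sketch below is illustrative only; it uses exact fractions so the marginals match the table.

    from fractions import Fraction as F

    # Joint pmf from Example 5.1, keyed by (x, y) = (accountants, engineers).
    p = {(0, 0): F(1, 45), (0, 1): F(10, 45), (0, 2): F(10, 45),
         (1, 0): F(6, 45), (1, 1): F(15, 45), (2, 0): F(3, 45)}

    p_X = {x: sum(v for (a, b), v in p.items() if a == x) for x in range(3)}
    p_Y = {y: sum(v for (a, b), v in p.items() if b == y) for y in range(3)}

    print(p_X)   # marginal of X: 21/45, 21/45, 3/45 (reduced to 7/15, 7/15, 1/15)
    print(p_Y)   # marginal of Y: 10/45, 25/45, 10/45 (reduced to 2/9, 5/9, 2/9)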

We continue the discussion of pairs of random variables with a definition for


their independence. The discrete random variables X and Y are said to be inde-
pendent if for all real numbers x and y,
F (x, y) = FX (x)FY (y).
An equivalent definition is that
p(x, y) = pX (x)pY (y).

For our discrete pair of random variables X and Y , we can define the conditional
probability mass function of X given Y = y by
    p_X|Y(x|y) = P(X = x | Y = y) = P(X = x, Y = y)/P(Y = y) = p(x, y)/p_Y(y).

Of course, it is assumed P(Y = y) > 0 in this definition. In a similar fashion we
have

    p_Y|X(y|x) = p(x, y)/p_X(x)

when P(X = x) > 0. We calculate a conditional probability mass function for the
immediately preceding example.
Example 5.2. An experiment consists of randomly selecting two individuals from
a group of 10 consisting of three accountants, five engineers, and two administrative
assistants. Let X count the number of accountants selected and Y the number of
engineers selected. Find the conditional mass function of X given Y = 1.

We have

    p_X|Y(0|1) = p(0, 1)/p_Y(1) = (10/45)/(25/45) = 10/25 = 2/5,
    p_X|Y(1|1) = p(1, 1)/p_Y(1) = (15/45)/(25/45) = 15/25 = 3/5, and
    p_X|Y(2|1) = p(2, 1)/p_Y(1) = 0/(25/45) = 0.

Note that if X and Y are independent discrete random variables then

    p_X|Y(x|y) = p(x, y)/p_Y(y) = p_X(x) · p_Y(y)/p_Y(y) = p_X(x).

It’s even possible to have a conditional distribution for X given a function of the
random variables X and Y equals some constant. We see in our next example
that if X and Y are independent Poisson random variables, that the conditional
distribution of X given the sum of X and Y is a constant natural number becomes
a Binomial random variable.
Example 5.3. Suppose X and Y are independent Poisson random variables with
parameters λX and λY , respectively. Calculate the probability mass function for
X given X + Y = n

We have

    P(X = k | X + Y = n) = P(X = k, X + Y = n)/P(X + Y = n)
                         = P(X = k, Y = n − k)/P(X + Y = n)
                         = P(X = k) · P(Y = n − k)/P(X + Y = n)
                         = [e^(−λ_X) λ_X^k/k!] · [e^(−λ_Y) λ_Y^(n−k)/(n − k)!]
                           ÷ Σ_{i=0}^{n} [e^(−λ_X) λ_X^i/i!] · [e^(−λ_Y) λ_Y^(n−i)/(n − i)!]
                         = e^(−(λ_X+λ_Y)) λ_X^k λ_Y^(n−k)/(k!(n − k)!)
                           ÷ e^(−(λ_X+λ_Y)) Σ_{i=0}^{n} λ_X^i λ_Y^(n−i)/(i!(n − i)!)
                         = λ_X^k λ_Y^(n−k)/(k!(n − k)!) ÷ (1/n!) Σ_{i=0}^{n} [n!/(i!(n − i)!)] λ_X^i λ_Y^(n−i)
                         = λ_X^k λ_Y^(n−k)/(k!(n − k)!) ÷ (1/n!)(λ_X + λ_Y)^n
                         = [n!/(k!(n − k)!)] (λ_X/(λ_X + λ_Y))^k (λ_Y/(λ_X + λ_Y))^(n−k)
                         = [n!/(k!(n − k)!)] (λ_X/(λ_X + λ_Y))^k (1 − λ_X/(λ_X + λ_Y))^(n−k).

So the conditional distribution of X given X + Y = n is binomial with parameters
n and λ_X/(λ_X + λ_Y).

Thus far we have been discussing joint probability distributions for two discrete
random variables. We can generalize the discussion to n discrete random variables.
When dealing with the n random variables X_1, X_2, ..., X_n, for example, the function

    p(x_1, x_2, ..., x_n) = P(X_1 = x_1, X_2 = x_2, ..., X_n = x_n)

serves as a probability mass function. We have p(x_1, x_2, ..., x_n) ≥ 0. Also

    Σ_{x_1} · · · Σ_{x_n} p(x_1, x_2, ..., x_n) = 1

and

    P((X_1, ..., X_n) ∈ A) = Σ_{(x_1,...,x_n)∈A} p(x_1, x_2, ..., x_n).

We examine a particular discrete joint distribution called the multinomial. In


this distribution one encounters an intricate counting procedure that we examine

first. Suppose n is a positive integer and that the r nonnegative integers n_1, ..., n_r
are such that

    n_1 + · · · + n_r = n.

Then the multinomial coefficient C(n; n_1, ..., n_r) is defined by

    C(n; n_1, ..., n_r) = n!/(n_1! n_2! · · · n_r!).
We list some quantities that this coefficient counts:

• The number of ways you can split n distinct objects into r distinct groups of
sizes n_1, n_2, . . . , and n_r, respectively.
• The number of n-letter words made up of r distinct letters used n_1, n_2, . . . ,
and n_r times, respectively.
• The coefficient on x_1^(n_1) · · · x_r^(n_r) in the expansion of (x_1 + · · · + x_r)^n.

For example, if you roll a die eight times, the number of ways you can obtain
one 1, one 2, zero 3's, zero 4's, two 5's, and four 6's, is

    C(8; 1, 1, 0, 0, 2, 4) = 8!/(1!1!0!0!2!4!) = 840.

Another example is to count how many words (including nonsensical ones) you
can make from two a's, one c, three t's, and one z. The answer is

    C(7; 2, 1, 3, 1) = 7!/(2!1!3!1!) = 420.
A couple of these 420 words are aactttz and acatttz.

Yet another example is that the coefficient on x^2 y^2 z in the expansion of
(x + y + z)^5 is

    C(5; 2, 2, 1) = 5!/(2!2!1!) = 30.

By inspection the reader can see that the coefficient on x^5 in this expansion is 1.
The multinomial coefficient gives us just this:

    C(5; 5, 0, 0) = 5!/(5!0!0!) = 1.

Now we consider the multinomial random vector.



Multinomial Distribution. For a sequence of n independent, identical
experiments, each resulting in one of r outcomes with probabilities
p_1, p_2, ..., and p_r, respectively, where p_1 + · · · + p_r = 1, we
let X_i count the number of the n experiments that result in the ith of the
r outcomes. Then

    P(X_1 = n_1, X_2 = n_2, ..., X_r = n_r) = C(n; n_1, n_2, ..., n_r) p_1^(n_1) p_2^(n_2) · · · p_r^(n_r),

where n_1 + · · · + n_r = n.

Example 5.4. Roll a die eight times. What’s the probability (a) you obtain one
1, one 2, zero 3’s, zero 4’s, two 5’s, and four 6’s and (b) you obtain one 1, one 2,
one 3, one 4, two 5’s, and two 6’s?

(a)

    C(8; 1, 1, 0, 0, 2, 4) (1/6)^1 (1/6)^1 (1/6)^0 (1/6)^0 (1/6)^2 (1/6)^4

or

    [8!/(1!1!0!0!2!4!)] (1/6)^8 = 840/1,679,616 = 0.0005.

(b)

    C(8; 1, 1, 1, 1, 2, 2) (1/6)^1 (1/6)^1 (1/6)^1 (1/6)^1 (1/6)^2 (1/6)^2

or

    [8!/(1!1!1!1!2!2!)] (1/6)^8 = 10,080/1,679,616 = 0.0060.
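A small helper makes multinomial computations like those in Example 5.4 mechanical; the sketch below is illustrative and uses only the standard library (Python 3.8+ for math.prod).

    from math import factorial, prod

    def multinomial_pmf(counts, probs):
        # P(X_1 = n_1, ..., X_r = n_r) with n = sum of the counts
        coef = factorial(sum(counts))
        for c in counts:
            coef //= factorial(c)
        return coef * prod(q**c for q, c in zip(probs, counts))

    die = [1/6] * 6
    print(round(multinomial_pmf([1, 1, 0, 0, 2, 4], die), 4))   # (a) 0.0005
    print(round(multinomial_pmf([1, 1, 1, 1, 2, 2], die), 4))   # (b) 0.0060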

Exercises

(1) Suppose X and Y are discrete random variables with probability mass function
as given in the table.
Y
p(x, y) 2 4
X 0 0.30 0.20
6 0.40 0.10
Compute (a) P (X + Y < 5), (b) P (Y > X), and (c) E(X)
Ans.: (a) 0.50, (b) 0.50, and (c) 3
(2) Suppose X and Y are discrete random variables with probability mass function
as given in the table.
Y
p(x, y) 3 5 10
X 4 0.25 0.10 0.05
6 0.20 0.02 0.38

Find rules for the two marginal probability mass functions and compute
(a) P (Y ≥ 5), (b) P (X + Y ≤ 9), (c) E(X), and (d) E(Y ).
Ans.:
p_X(4) = 0.40, p_X(6) = 0.60, and p_X(x) = 0 otherwise;
p_Y(3) = 0.45, p_Y(5) = 0.12, p_Y(10) = 0.43, and p_Y(y) = 0 otherwise;
P(Y ≥ 5) = 0.55, P(X + Y ≤ 9) = 0.55, E(X) = 5.2, and E(Y) = 6.25


(3) Determine if the random variables in the preceding problem are independent.
Ans.: They are not.
(4) Roll a die four times. What’s the probability (a) you obtain one 1, one 2, two
6’s and (b) you obtain one 1, one 2, one 3, and one 4?
Ans.: (a) 0.0093 (b) 0.0185
(5) Deal a card from a well shuffled deck of 52. Replace it, shuffle and deal another.
Repeat this process until you’ve dealt five cards. What’s the probability you
deal one spade, two diamonds, and two hearts? [Note that you’re not dealing
with a hypergeometric distribution since you’re replacing dealt cards.]
Ans.: 0.0293
(6) Rogelio, Jesús, and Diego, are playing dominoes. If it’s the case that Rogelio
wins 50% of the time, Jesús 40% of the time, and Diego 10% of the time,
what’s the probability that in the next five games, Rogelio wins twice, Jesús
twice, and Diego once?
Ans.: 0.12
(7) In the United States, 45% of the population has Type O blood, 40% Type
A, 11% Type B, and four percent Type AB. If six US residents are randomly
selected, what’s the probability three would have Type O, two type A, and
one Type B?
Ans.: 0.0962

5.2. Continuous Case


Two continuous random variables X and Y that are paired have a joint probability
density function f (x, y). We list properties.

Joint Probability Density Function. The joint probability density
function f(x, y) for the pair of random variables X and Y has the properties

    f(x, y) ≥ 0 for all x and y,

    ∫_{−∞}^{∞} ∫_{−∞}^{∞} f(x, y) dx dy = 1,

and

    P((X, Y) ∈ A) = ∫∫_A f(x, y) dx dy.

The joint probability distribution function F (x, y) for X and Y is given by

F (x, y) = P (X ≤ x, Y ≤ y).

As was the case with a pair of discrete random variables, here with continuous ran-
dom variables, the distribution function FX (x) is given by FX (x) = limy→∞ F (x, y).
Similarly, we have that the distribution function for Y is given by FY (y) = limx→∞ F (x, y).
The marginal probability density function of X is given by

    f_X(x) = ∫_{−∞}^{∞} f(x, y) dy,

where f(x, y) is the joint probability density function for X and Y. Similarly, the
marginal probability density function of Y is given by

    f_Y(y) = ∫_{−∞}^{∞} f(x, y) dx.

We provide an example.

Example 5.5. Suppose X and Y are continuous random variables with joint prob-
ability density function

    f(x, y) = x/12 + y/24 if 0 ≤ x ≤ 3 and 0 ≤ y ≤ 2, and f(x, y) = 0 otherwise.

Compute then (a) P (X ≤ 2 and Y ≥ 1), (b) P (X + Y ≤ 1), (c) a rule for fX (x),
(d) a rule for fY (y), and (e) E(X).

(a)

    P(X ≤ 2, Y ≥ 1) = ∫_0^2 ∫_1^2 (x/12 + y/24) dy dx
                    = ∫_0^2 (xy/12 + y^2/48) |_{y=1}^{y=2} dx
                    = ∫_0^2 (x/6 + 1/12 − (x/12 + 1/48)) dx
                    = ∫_0^2 (x/12 + 3/48) dx
                    = (x^2/24 + 3x/48) |_0^2
                    = 7/24.

(b)

    P(X + Y ≤ 1) = ∫_0^1 ∫_0^{1−x} (x/12 + y/24) dy dx
                 = ∫_0^1 (xy/12 + y^2/48) |_{y=0}^{y=1−x} dx
                 = ∫_0^1 ((x − x^2)/12 + (1 − 2x + x^2)/48) dx
                 = ∫_0^1 (−3x^2/48 + x/24 + 1/48) dx
                 = 1/48.

(c)

    f_X(x) = ∫_0^2 (x/12 + y/24) dy = (xy/12 + y^2/48) |_0^2 = x/6 + 1/12,

so f_X(x) = x/6 + 1/12 if 0 ≤ x ≤ 3, and f_X(x) = 0 otherwise.

(d)

    f_Y(y) = ∫_0^3 (x/12 + y/24) dx = (x^2/24 + xy/24) |_0^3 = 3/8 + y/8,

so f_Y(y) = y/8 + 3/8 if 0 ≤ y ≤ 2, and f_Y(y) = 0 otherwise.

(e)

    E(X) = ∫_0^3 x(x/6 + 1/12) dx = ∫_0^3 (x^2/6 + x/12) dx = (x^3/18 + x^2/24) |_0^3 = 27/18 + 9/24 = 15/8.
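Probabilities computed from a joint density, like parts (a) and (b) above, can be sanity-checked numerically. The sketch below is illustrative only: a midpoint Riemann sum over a fine grid, so the printed values are approximations of 7/24 and 1/48.

    def f(x, y):
        return x / 12 + y / 24 if 0 <= x <= 3 and 0 <= y <= 2 else 0.0

    def prob(region, n=600):
        # midpoint Riemann sum of f over the grid points of [0,3] x [0,2] inside the region
        dx, dy = 3 / n, 2 / n
        total = 0.0
        for i in range(n):
            for j in range(n):
                x, y = (i + 0.5) * dx, (j + 0.5) * dy
                if region(x, y):
                    total += f(x, y) * dx * dy
        return total

    print(round(prob(lambda x, y: x <= 2 and y >= 1), 3))   # about 7/24 = 0.292
    print(round(prob(lambda x, y: x + y <= 1), 3))          # about 1/48 = 0.021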

As was the case with pairs of discrete random variables, we can discuss con-
ditional probability density functions associated with a joint probability density
function for a pair of continuous random variables. The conditional probability
density function of X given Y = y is given by
    f_X|Y(x|y) = f(x, y)/f_Y(y),

where f(x, y) is the joint probability density function for the continuous random
variables X and Y. Here, we're assuming y is a value for which f_Y(y) > 0.

Example 5.6. For the joint probability density function in the previous example,
take y to be a value in [0, 2], and compute fX|Y (x|y).

We have

    f_X|Y(x|y) = (x/12 + y/24)/(y/8 + 3/8) = (2x + y)/(3y + 9),

if 0 ≤ x ≤ 3. In the particular case that y = 1/3, we have

    f_X|Y(x|1/3) = (2x + 1/3)/(3(1/3) + 9) = x/5 + 1/30,

if 0 ≤ x ≤ 3.

As is the case when X and Y are discrete, the continuous random variables X
and Y are said to be independent if for all real numbers x and y,
F (x, y) = FX (x)FY (y).
An equivalent definition is that
f (x, y) = fX (x)fY (y),
where the f ’s are probability density functions.
5.2. Continuous Case 73

We can generalize the discussion to n continuous random variables instead of


just two. When dealing with the n continuous random variables X1 , X2 , ..., Xn , the
function f (x1 , x2 , ..., xn ) might serve as the probability density function, and we
would have that f(x_1, x_2, ..., x_n) ≥ 0. Also,

    ∫_{−∞}^{∞} · · · ∫_{−∞}^{∞} f(x_1, x_2, ..., x_n) dx_1 · · · dx_n = 1

and

    P((X_1, ..., X_n) ∈ A) = ∫ · · · ∫_{(x_1,...,x_n)∈A} f(x_1, x_2, ..., x_n) dx_1 · · · dx_n.

Exercises

(1) Suppose X and Y are continuous random variables with joint probability
density function

    f(x, y) = 1/6 if 0 ≤ x ≤ 3 and 0 ≤ y ≤ 2, and f(x, y) = 0 otherwise.

Compute the probabilities (a) P(X ≤ 2), (b) P(X + Y ≤ 1) and (c) P(Y ≥ X^2/9).
Ans. (a) 2/3, (b) 1/12, (c) 5/6
(2) Suppose X and Y are continuous random variables with joint probability
density function

    f(x, y) = c(x^3 + y) if 0 ≤ x ≤ 1 and 0 ≤ y ≤ 2, and f(x, y) = 0 otherwise.

Compute the value of c, find rules for the two marginal probability density
functions, and compute the probabilities P(Y ≤ 1) and P(X < Y).
Ans.: c = 2/5,
f_X(x) = (4/5)(x^3 + 1) if 0 ≤ x ≤ 1, and f_X(x) = 0 otherwise;
f_Y(y) = (2/5)y + 1/10 if 0 ≤ y ≤ 2, and f_Y(y) = 0 otherwise;
P(Y ≤ 1) = 0.3, and P(X < Y) = 64/75.
(3) Determine if the random variables in the preceding problem are independent.
Ans.: They are not.
(4) Suppose the continuous random variables X and Y have the joint probability
density function

    f(x, y) = ab e^(−ax−by) if x ≥ 0 and y ≥ 0, and f(x, y) = 0 otherwise.

Here a and b are positive constants. This can model the lifetimes of two
components, the first - X - having mean life 1/a and the second - Y - having
mean life 1/b. Compute (a) the probability that both last past time t, i.e.
P(X > t, Y > t) and (b) that the first lasts longer than the second, i.e.
P(X > Y).
Ans.: (a) e^(−(a+b)t) (b) b/(a + b)
(5) Determine if the random variables in the preceding problem are independent.
Ans.: They are. Note
    F(s, t) = ∫_0^s ∫_0^t ab e^(−ax−by) dy dx = ∫_0^s a e^(−ax) dx · ∫_0^t b e^(−by) dy = F_X(s) F_Y(t)

(6) Suppose the continuous random variables X and Y have the joint probability
density function

    f(x, y) = cy e^(−2x−y) if x ≥ 0 and y ≥ 0, and f(x, y) = 0 otherwise.

Compute (a) P(X < 1, Y < 1), (b) P(X > 2), (c) P(X < Y), and (d) P(X + Y < 1).
Ans.: c = 2, P(X < 1, Y < 1) = 1 − 2e^(−1) − e^(−2) + 2e^(−3) = 0.2285, P(X > 2) = e^(−4),
P(X < Y) = 8/9, and P(X + Y < 1) = 1 − 2e^(−1) − e^(−2) = 0.1289.

(7) Suppose X and Y are independent exponential random variables with means
µX = 4.0 and µY = 5.0. Compute the probability P (X ≥ 5 and Y ≥ 5).
    Ans. P(X ≥ 5, Y ≥ 5) = e^{-5/4}·e^{-5/5} = e^{-2.25} = 0.1054
(8) The random vector (X, Y) has probability density function
    \[
    f(x,y) = \begin{cases} c(x^3 + 2xy) & \text{if } 0 \le y \le x \text{ and } 0 \le x \le 2 \\ 0 & \text{otherwise} \end{cases}
    \]
    Compute (a) the value of c, (b) P(X > 2Y), (c) E(X), and (d) E(Y).
    Ans. (a) 5/52, (b) 21/52, (c) 64/39, (d) 12/13

5.3. Functions of Joint Random Variables.


Suppose X and Y are discrete random variables on the same sample space and that
the real valued, two variable function g(x, y) is defined on the range of (X, Y ). We
consider g(X, Y ) as a random variable say W :
W = g(X, Y ).
If p_{X,Y}(x, y) is the joint probability mass function for X and Y, then the probability
mass function for W is given by
\[
p_W(w) = P(g(X,Y) = w) = P((X,Y) \in g^{-1}(w)) = \sum_{(x,y) \in g^{-1}(w)} p_{X,Y}(x,y) = \sum_{g(x,y) = w} p_{X,Y}(x,y).
\]

If X and Y happen to be independent, then p(x, y) = p_X(x) · p_Y(y) so that
\[
p_W(w) = \sum_{g(x,y) = w} p_X(x) \cdot p_Y(y).
\]
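This computation is easy to carry out by machine when the joint pmf is finite. Below is a minimal Python sketch; the joint pmf values and the choice of g are made-up illustrations, not taken from the text.

from collections import defaultdict

# Hypothetical joint pmf p_{X,Y}(x, y), stored as {(x, y): probability}.
p_xy = {(0, 0): 0.10, (0, 1): 0.20, (1, 0): 0.30, (1, 1): 0.40}

def pmf_of_g(p_xy, g):
    """Return the pmf of W = g(X, Y) by summing p_{X,Y} over g^{-1}(w)."""
    p_w = defaultdict(float)
    for (x, y), p in p_xy.items():
        p_w[g(x, y)] += p
    return dict(p_w)

# Example: W = X + Y.
print(pmf_of_g(p_xy, lambda x, y: x + y))   # {0: 0.1, 1: 0.5, 2: 0.4}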

Example 5.7. Suppose that X and Y are independent binomial random variables
with the same probability of success p for each trial - say X is binomial with
parameters m and p and Y with parameters n and p. Then for g(x, y) = x + y, find a
rule for the probability mass function of W = g(X, Y).

We have X + Y taking the values k = 0, 1, 2, ..., m + n according to the rule
\[
p_{X+Y}(k) = \sum_{x+y=k} p_X(x) \cdot p_Y(y) = \sum_{r=0}^{k} p_X(r) \cdot p_Y(k-r),
\]
where p_X(r) = 0 if r > m and p_Y(k − r) = 0 if k − r > n. Thus,
\[
p_{X+Y}(k) = \sum_{r=0}^{k} \binom{m}{r} p^r (1-p)^{m-r} \cdot \binom{n}{k-r} p^{k-r} (1-p)^{n-(k-r)}
= \sum_{r=0}^{k} \binom{m}{r} \binom{n}{k-r} p^k (1-p)^{m+n-k},
\]

again where \binom{m}{r} = 0 if r > m and \binom{n}{k-r} = 0 if k − r > n. Now consider a
situation where you have m blue balls and n red balls in an urn. The number of
ways you could select k of the balls from the urn is \binom{m+n}{k}. Note that you could
categorize these ways according to the number of blue and red balls, making it
apparent that this number of ways is the number of ways you could select 0 blue
balls and k red balls, plus the number of ways you could select 1 blue ball and k − 1
red balls, plus the number of ways you could select 2 blue balls and k − 2 red balls,
and so on up to the number of ways you could select k blue balls and 0 red balls.
I.e.
\[
\binom{m+n}{k} = \sum_{r=0}^{k} \binom{m}{r} \binom{n}{k-r}.
\]
Consequently,
\[
p_{X+Y}(k) = \binom{m+n}{k} p^k (1-p)^{m+n-k}.
\]
I.e. X + Y is binomial with parameters m + n and p.
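A quick numerical check of this convolution identity can be run in Python; the parameters m, n, and p below are arbitrary illustrative choices.

from math import comb

def binom_pmf(n, p, k):
    return comb(n, k) * p**k * (1 - p)**(n - k)

m, n, p = 4, 6, 0.3   # illustrative parameters

for k in range(m + n + 1):
    # convolution of the two binomial pmfs
    conv = sum(binom_pmf(m, p, r) * binom_pmf(n, p, k - r)
               for r in range(k + 1) if r <= m and k - r <= n)
    direct = binom_pmf(m + n, p, k)
    assert abs(conv - direct) < 1e-12
print("X + Y has the Binomial(m + n, p) pmf")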

Suppose now that X and Y are continuous random variables with joint prob-
ability density function fX,Y (x, y) and that g(x, y) is a real valued two variable
function that yields the continuous random variable W = g(X, Y ). To obtain the
probability density function of W , we first calculate the cumulative distribution
function of W and then take its derivative to get the probability density function.
For example, if we let g(x, y) = x + y so that W = X + Y , we have
\[
F_{X+Y}(w) = \iint_{x+y \le w} f_{X,Y}(x,y)\, dy\, dx = \int_{-\infty}^{\infty} \int_{-\infty}^{w-x} f_{X,Y}(x,y)\, dy\, dx.
\]

For the integral with respect to y, we make the substitution u = y + x, yielding
dy = du and the limits of integration u = −∞ when y = −∞ and u = w when
y = w − x. Then we have
\[
F_{X+Y}(w) = \int_{-\infty}^{\infty} \int_{-\infty}^{w} f_{X,Y}(x, u-x)\, du\, dx = \int_{-\infty}^{w} \int_{-\infty}^{\infty} f_{X,Y}(x, u-x)\, dx\, du.
\]
From the Fundamental Theorem of Calculus, we therefore have
\[
f_{X+Y}(w) = \frac{d}{dw} \int_{-\infty}^{w} \int_{-\infty}^{\infty} f_{X,Y}(x, u-x)\, dx\, du = \int_{-\infty}^{\infty} f_{X,Y}(x, w-x)\, dx.
\]

Example 5.8. Suppose X and Y are independent standard normal random vari-
ables. Calculate the probability density function of X + Y .

\[
\begin{aligned}
f_{X+Y}(w) &= \int_{-\infty}^{\infty} f_{X,Y}(x, w-x)\, dx \\
&= \int_{-\infty}^{\infty} f_X(x) \cdot f_Y(w-x)\, dx \\
&= \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}} e^{-\frac{x^2}{2}} \cdot \frac{1}{\sqrt{2\pi}} e^{-\frac{(w-x)^2}{2}}\, dx \\
&= \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-(x^2 - wx + \frac{w^2}{4}) - \frac{w^2}{4}}\, dx \\
&= \frac{1}{2\pi} e^{-\frac{w^2}{4}} \int_{-\infty}^{\infty} e^{-(x - \frac{w}{2})^2}\, dx \\
&= \left(\sqrt{2\pi} \cdot \tfrac{1}{\sqrt{2}}\right) \frac{1}{2\pi} e^{-\frac{w^2}{4}} \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi} \cdot \frac{1}{\sqrt{2}}} e^{-\frac{(x - \frac{w}{2})^2}{2(1/\sqrt{2})^2}}\, dx \\
&= \frac{1}{\sqrt{2\pi} \cdot \sqrt{2}}\, e^{-\frac{w^2}{2(\sqrt{2})^2}} \cdot 1.
\end{aligned}
\]
The integral does in fact have a value of 1 since the integrand is the pdf for a
normal distribution. The reader will recognize the pdf for X + Y as that of a
normal random variable with mean 0 and standard deviation √2.
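As a sanity check on this result, one can simulate it; the sketch below (sample size chosen arbitrarily) draws independent standard normal pairs and compares the sample standard deviation of X + Y with √2.

import random
import statistics
import math

random.seed(1)
sums = [random.gauss(0, 1) + random.gauss(0, 1) for _ in range(100_000)]

print("sample mean of X + Y:", round(statistics.fmean(sums), 3))   # close to 0
print("sample sd of X + Y:  ", round(statistics.stdev(sums), 3))   # close to sqrt(2)
print("sqrt(2) =", round(math.sqrt(2), 3))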

Now we consider the pair of real valued, two variable functions g(x, y) and
h(x, y) defined on the range of (X, Y ). Assume that their first partial derivatives
are continuous on this range. Assume also that the transformation defined by
u = g(x, y) and v = h(x, y) is one-to-one. Then, defining the two random variables
U and V by U = g(X, Y ) and V = h(X, Y ), according to the change of variables

formula from multidimensional calculus, the joint probability density function for
U and V is given by
\[
f_{U,V}(u,v) = \frac{1}{|J(x,y)|} f_{X,Y}(x,y),
\]
where (x, y) is the point in the range of (X, Y) for which g(x, y) = u and h(x, y) = v,
and J(x, y) is the Jacobian
\[
J(x,y) = \begin{vmatrix} \dfrac{\partial g}{\partial x} & \dfrac{\partial g}{\partial y} \\[6pt] \dfrac{\partial h}{\partial x} & \dfrac{\partial h}{\partial y} \end{vmatrix}
\]

There are no exercises for this section. We put the theory of functions of joint
probability distributions to work in the next section to compute expected values.

5.4. Expected Value, Variance, and Covariance


For the single random variable X - whether it be discrete or continuous - we defined
the expected value µ = E(X) in a previous chapter. We saw also that for the
constants a and b, E(aX + b) = aE(X) + b. We also defined the variance σ 2 =
V (X) = E(X − µ)2 and noted that V (aX + b) = a2 V (X).
Here, we consider a pair of random variables X and Y (discrete to start with)
and a real valued, two variable function g(x, y). As we did in the single variable
case, we can establish that
\[
E(g(X,Y)) = \sum_x \sum_y g(x,y)\, p_{X,Y}(x,y).
\]

If X and Y are continuous random variables, then


\[
E(g(X,Y)) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} g(x,y)\, f_{X,Y}(x,y)\, dy\, dx.
\]

We note that if the two random variables X and Y are discrete and a and b
are constants,
\[
E(aX + bY) = \sum_x \sum_y (ax + by)\, p_{X,Y}(x,y) = a \sum_x \sum_y x\, p_{X,Y}(x,y) + b \sum_x \sum_y y\, p_{X,Y}(x,y) = aE(X) + bE(Y).
\]
A similar derivation in the continuous case yields the same formula.
To compute the variance of X + Y in both the discrete and continuous cases,
we note
V (X + Y ) = E(X + Y − (µX + µY ))2
= E((X − µX ) + (Y − µY ))2
= E((X − µX )2 + 2(X − µX )(Y − µY ) + (Y − µY )2 )
= E(X − µX )2 + 2E(X − µX )(Y − µY ) + E(Y − µY )2
= V (X) + 2E(X − µX )(Y − µY ) + V (Y ).

The middle term here sans the factor of 2 is referred to as the covariance of X and
Y , labeled Cov(X, Y ). So we have

V (X + Y ) = V (X) + V (Y ) + 2 Cov(X, Y ).

When one considers how the covariance is defined -
\[
\sum_x \sum_y (x - \mu_X)(y - \mu_Y)\, p_{X,Y}(x,y)
\]
in the discrete case and
\[
\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} (x - \mu_X)(y - \mu_Y)\, f_{X,Y}(x,y)\, dy\, dx
\]

in the continuous - it becomes apparent that it provides a measure of how X and


Y are related. If large values of X go with large values of Y and small values of
X with small values of Y with a preponderance of probability, then the product
(x − µX )(y − µY ) is most likely positive because both factors are positive (large
X with large Y ) or both are negative (small X with small Y ). This leads to the
covariance being positive with greater and greater magnitude as the probability
concentrates more and more along the pattern large X with large Y and small X
with small Y . If, on the other hand, small values of X go with large values of Y
and large values of X with small values of Y for the majority of the probability,
then the covariance is negative as the two factors of (x − µX )(y − µY ) are opposite
in sign over most of the probability. This negative value has greater magnitude as
more of the probability concentrates along the pattern of small X with large Y and
large X with small Y.

We can establish a shortcut formula for the covariance in much the same way
we did for the variance of a single random variable. We have

Cov(X, Y ) = E(X − µX )(Y − µY )


= E(XY − µY X − µX Y + µX µY )
= E(XY ) − µY E(X) − µX E(Y ) + µX µY
= E(XY ) − µX µY .

So the shortcut formula is as stated.

Covariance. The covariance of X and Y is defined by


Cov(X, Y ) = E(X − E(X))(Y − E(Y )).
It can be computed via the shortcut formula
Cov(X, Y ) = E(XY ) − E(X)E(Y ).
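The shortcut formula is convenient for machine computation as well. The following Python sketch applies it to a small made-up joint pmf table.

# Hypothetical joint pmf of a discrete pair (X, Y).
p_xy = {(1, 1): 0.2, (1, 2): 0.3, (2, 1): 0.1, (2, 2): 0.4}

E_x  = sum(x * p for (x, y), p in p_xy.items())
E_y  = sum(y * p for (x, y), p in p_xy.items())
E_xy = sum(x * y * p for (x, y), p in p_xy.items())

cov = E_xy - E_x * E_y
print(cov)   # 0.05 for this table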

Note that if X and Y are independent, then (in the discrete case)

\[
E(XY) = \sum_x \sum_y xy \cdot p_{X,Y}(x,y) = \sum_x \sum_y xy \cdot p_X(x) \cdot p_Y(y) = \sum_x x \cdot p_X(x) \sum_y y \cdot p_Y(y) = E(X)E(Y).
\]

The same formula can be established in a similar way in the continuous case. Hence,
if X and Y are independent,

Cov(X, Y ) = E(XY ) − E(X)E(Y ) = E(X)E(Y ) − E(X)E(Y ) = 0.

Mathematicians - especially Karl Pearson - eventually discovered that this covariance,
scaled so that it only takes values from −1 to 1, can provide a great deal of
standardized information about the degree to which X and Y are linearly related.
They developed the so-called correlation coefficient ρ defined by
\[
\rho = \frac{\mathrm{Cov}(X,Y)}{\sigma_X \sigma_Y}.
\]

The reader will note that when X and Y are independent - i.e. they have no
relationship - ρ = 0 because the covariance is zero. In the case that Y is a linear
function of X, say Y = aX + b with a and b being constants and a ≠ 0, we have
\[
\begin{aligned}
\rho &= \frac{E(X(aX+b)) - E(X)E(aX+b)}{\sigma_X \sigma_{aX+b}} \\
&= \frac{E(aX^2 + bX) - E(X)(aE(X) + b)}{\sigma_X \cdot |a|\sigma_X} \\
&= \frac{aE(X^2) + bE(X) - a(E(X))^2 - bE(X)}{|a|\sigma_X^2} \\
&= \frac{a(E(X^2) - (E(X))^2)}{|a|\sigma_X^2} \\
&= \frac{a\sigma_X^2}{|a|\sigma_X^2} \\
&= \pm 1.
\end{aligned}
\]

So we see that ρ = 1 if a > 0 and ρ = −1 if a < 0.



Finally we note that ρ only takes values in the interval [−1, 1]. This can be
verified without too much difficulty. Note that
\[
\begin{aligned}
0 &\le V\!\left(\frac{X - \mu_X}{\sigma_X} + \frac{Y - \mu_Y}{\sigma_Y}\right) \\
&= \left(\frac{1}{\sigma_X}\right)^2 V(X) + \left(\frac{1}{\sigma_Y}\right)^2 V(Y) + 2 \cdot \mathrm{Cov}\!\left(\frac{X - \mu_X}{\sigma_X}, \frac{Y - \mu_Y}{\sigma_Y}\right) \\
&= 1 + 1 + 2E\!\left(\frac{X - \mu_X}{\sigma_X} - 0\right)\!\left(\frac{Y - \mu_Y}{\sigma_Y} - 0\right) \\
&= 2 + 2\left(\frac{1}{\sigma_X \sigma_Y}\right) E(X - \mu_X)(Y - \mu_Y) \\
&= 2(1 + \rho).
\end{aligned}
\]
So 1 + ρ ≥ 0 which makes ρ ≥ −1. Similarly,
\[
0 \le V\!\left(\frac{X - \mu_X}{\sigma_X} - \frac{Y - \mu_Y}{\sigma_Y}\right) = 1 + 1 - 2 \cdot \mathrm{Cov}\!\left(\frac{X - \mu_X}{\sigma_X}, \frac{Y - \mu_Y}{\sigma_Y}\right) = 2 - 2\rho = 2(1 - \rho).
\]
We therefore have that 1 − ρ ≥ 0 so that ρ ≤ 1. We summarize.

Correlation Coefficient. The correlation coefficient ρ given by
\[
\rho = \frac{\mathrm{Cov}(X,Y)}{\sigma_X \sigma_Y}
\]
falls in the interval [−1, 1]. If Y = aX + b, then ρ = 1 for positive a and ρ =
−1 for negative a. If X and Y are independent random variables, then ρ =
0. Generally speaking, statisticians say X and Y have a linear relationship
if ρ is close to 1 or −1. They’re said to have no linear relationship if ρ is
close to 0.

Example 5.9. In a previous section we entertained an example in which X and Y
are continuous random variables with joint probability density function
\[
f(x,y) = \begin{cases} \tfrac{1}{12}x + \tfrac{1}{24}y & \text{if } 0 \le x \le 3 \text{ and } 0 \le y \le 2 \\ 0 & \text{otherwise} \end{cases}
\]
Compute ρ for this pair of random variables X and Y.

In that previous example, we determined that
\[
f_X(x) = \begin{cases} \tfrac{1}{6}x + \tfrac{1}{12} & \text{if } 0 \le x \le 3 \\ 0 & \text{otherwise} \end{cases}
\]
and
\[
f_Y(y) = \begin{cases} \tfrac{1}{8}y + \tfrac{3}{8} & \text{if } 0 \le y \le 2 \\ 0 & \text{otherwise} \end{cases}
\]

In addition, we computed E(X) = 15/8. We now calculate the remainder of the
components needed to determine the value of ρ. First we note
\[
E(X^2) = \int_0^3 x^2\left(\tfrac{1}{6}x + \tfrac{1}{12}\right) dx
= \int_0^3 \left(\tfrac{1}{6}x^3 + \tfrac{1}{12}x^2\right) dx
= \left(\tfrac{1}{24}x^4 + \tfrac{1}{36}x^3\right)\Big|_0^3
= \tfrac{81}{24} + \tfrac{27}{36}
= \tfrac{27}{8} + \tfrac{3}{4}
= \tfrac{33}{8}.
\]

Next
\[
E(Y) = \int_0^2 y\left(\tfrac{1}{8}y + \tfrac{3}{8}\right) dy
= \int_0^2 \left(\tfrac{1}{8}y^2 + \tfrac{3}{8}y\right) dy
= \left(\tfrac{1}{24}y^3 + \tfrac{3}{16}y^2\right)\Big|_0^2
= \tfrac{1}{3} + \tfrac{3}{4}
= \tfrac{4}{12} + \tfrac{9}{12}
= \tfrac{13}{12},
\]

and
\[
E(Y^2) = \int_0^2 y^2\left(\tfrac{1}{8}y + \tfrac{3}{8}\right) dy
= \int_0^2 \left(\tfrac{1}{8}y^3 + \tfrac{3}{8}y^2\right) dy
= \left(\tfrac{1}{32}y^4 + \tfrac{1}{8}y^3\right)\Big|_0^2
= \tfrac{1}{2} + 1
= \tfrac{3}{2}.
\]

Finally
\[
\begin{aligned}
E(XY) &= \int_0^3 \int_0^2 xy\left(\tfrac{1}{12}x + \tfrac{1}{24}y\right) dy\, dx
= \int_0^3 \int_0^2 \left(\tfrac{1}{12}x^2 y + \tfrac{1}{24}x y^2\right) dy\, dx \\
&= \int_0^3 \left(\tfrac{1}{24}x^2 y^2 + \tfrac{1}{72}x y^3\right)\Big|_0^2\, dx
= \int_0^3 \left(\tfrac{1}{6}x^2 + \tfrac{1}{9}x\right) dx \\
&= \left(\tfrac{1}{18}x^3 + \tfrac{1}{18}x^2\right)\Big|_0^3
= \tfrac{3}{2} + \tfrac{1}{2}
= 2.
\end{aligned}
\]
This gives us
\[
\mathrm{Cov}(X,Y) = 2 - \left(\tfrac{15}{8}\right)\!\left(\tfrac{13}{12}\right) = -\tfrac{1}{32},
\qquad
\sigma_X^2 = \tfrac{33}{8} - \left(\tfrac{15}{8}\right)^2 = \tfrac{39}{64},
\qquad
\sigma_Y^2 = \tfrac{3}{2} - \left(\tfrac{13}{12}\right)^2 = \tfrac{47}{144}.
\]
Hence,
\[
\rho = \frac{-\tfrac{1}{32}}{\sqrt{\tfrac{39}{64}}\sqrt{\tfrac{47}{144}}} = \frac{-\tfrac{1}{32}\cdot 8 \cdot 12}{\sqrt{39 \cdot 47}} = \frac{-3}{\sqrt{1833}} = -0.07007.
\]
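The arithmetic above can be double-checked numerically. The sketch below approximates the required moments for the density of Example 5.9 with a simple midpoint-rule double sum (the grid size is an arbitrary choice) and reports ρ.

import math

def f(x, y):
    # joint density from Example 5.9 on [0, 3] x [0, 2]
    return x / 12 + y / 24

n = 600                      # grid points per axis (arbitrary)
dx, dy = 3 / n, 2 / n
E = {"x": 0.0, "y": 0.0, "x2": 0.0, "y2": 0.0, "xy": 0.0}
for i in range(n):
    x = (i + 0.5) * dx
    for j in range(n):
        y = (j + 0.5) * dy
        w = f(x, y) * dx * dy
        E["x"] += x * w; E["y"] += y * w
        E["x2"] += x * x * w; E["y2"] += y * y * w
        E["xy"] += x * y * w

cov = E["xy"] - E["x"] * E["y"]
rho = cov / math.sqrt((E["x2"] - E["x"]**2) * (E["y2"] - E["y"]**2))
print(round(rho, 5))   # approximately -0.0701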

To close the section, note that if we take the random variables X_1, ..., X_n on
the same sample space, they have a joint pmf (if they're all discrete) or a joint pdf (if
they're all continuous). In either case, we can expand the formulas for the expected
value and variance of linear combinations of the X_i's and obtain
\[
E(a_1 X_1 + \cdots + a_n X_n) = a_1 E(X_1) + \cdots + a_n E(X_n)
\]
and
\[
V(X_1 + \cdots + X_n) = \sum_{k=1}^{n} V(X_k) + 2 \sum_{k<j} \mathrm{Cov}(X_k, X_j).
\]
If it's the case that the X_i's are independent, then the second formula becomes
\[
V(X_1 + \cdots + X_n) = \sum_{k=1}^{n} V(X_k).
\]

Exercises

(1) Compute the correlation coefficient ρ for the pair of discrete random variables
X and Y with probability mass function as given in the table.

                    Y
    p(x, y)       2        4
    X    0      0.30     0.20
         6      0.40     0.10
Ans.: −0.2182
(2) Compute the correlation coefficient ρ for the pair of discrete random variables
    X and Y with probability mass function as given in the table.

                      y
    p(x, y)      0        1        2
         0     1/45    10/45    10/45
    x    1     6/45    15/45      0
         2     3/45      0        0

    Ans.: E(X) = 27/45, E(X²) = 33/45, E(Y) = 1, E(Y²) = 65/45, and E(XY) = 15/45, so
    ρ = (15/45 − 27/45)/√((0.37333)(0.44444)) = −0.6547

(3) Compute the correlation coefficient ρ for the pair of discrete random variables
    X and Y with probability mass function as given in the table.

                    Y
    p(x, y)       3        5       10
    X    4      0.25     0.10     0.05
         6      0.20     0.02     0.38

    Ans.: E(X) = 5.2, E(X²) = 28, E(Y) = 6.25, E(Y²) = 50.05, and E(XY) = 34, so
    ρ = (34 − (5.2)(6.25))/√((28 − 5.2²)(50.05 − 6.25²)) = 1.5/√((0.96)(10.9875)) = 0.4618

(4) Suppose the continuous random variables X and Y have the joint probability
    density function
    \[
    f(x,y) = \begin{cases} 2y e^{-2x - y} & \text{if } x \ge 0 \text{ and } y \ge 0 \\ 0 & \text{otherwise} \end{cases}
    \]
    Compute ρ.
    Ans.: ρ = 0. Note that
    \[
    f_X(x) = 2e^{-2x}\int_0^{\infty} y e^{-y}\, dy = 2e^{-2x},
    \qquad
    f_Y(y) = y e^{-y}\int_0^{\infty} 2e^{-2x}\, dx = y e^{-y},
    \]
    and
    \[
    \int_0^{\infty}\!\!\int_0^{\infty} xy \cdot 2y e^{-2x - y}\, dy\, dx = \int_0^{\infty} x \cdot 2e^{-2x}\, dx \cdot \int_0^{\infty} y \cdot y e^{-y}\, dy,
    \]
    so E(XY) = E(X)E(Y).
Chapter 6

Sampling and Limit Theorems

6.1. Sample Mean and Variance


Statisticians draw conclusions about the overall nature of populations by analyzing
samples. Suppose the random variable X represents a random selection of a value
from a population that has mean µ and standard deviation σ. We can estimate
E(X) and SD(X) by taking a random sample of values x1 , x2 , x3 , ..., xn from the
population. The sample statistics

\[
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i
\]
and
\[
s = \sqrt{\frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}}
\]

are used to estimate E(X) and SD(X), respectively. These two statistics are called
the sample mean and the sample standard deviation.
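In practice these two statistics are computed directly from the data, as in the short Python sketch below (the data values are made up).

import math

data = [12.1, 9.8, 11.4, 10.7, 13.0, 9.5]     # hypothetical sample
n = len(data)

xbar = sum(data) / n
s = math.sqrt(sum((x - xbar) ** 2 for x in data) / (n - 1))

print(xbar, s)
# The statistics module gives the same results:
# statistics.mean(data), statistics.stdev(data)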
Before we take a random sample of n values from the distribution, we are uncer-
tain as to what the ith of the n values will be. We assign the random variable Xi to
the ith selection in the random sample and note that E(Xi ) = µ and SD(Xi ) = σ.
We then define the random variables X̄ and S 2 as the sample mean and variance,
respectively.


Sample Mean and Variance. The sample mean X̄ and sample variance
S² are given by
\[
\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i
\]
and
\[
S^2 = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})^2.
\]

We note that
\[
E(\bar{X}) = E\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right) = \frac{1}{n} E\!\left(\sum_{i=1}^{n} X_i\right) = \frac{1}{n}\sum_{i=1}^{n} E(X_i) = \frac{1}{n} \cdot n\mu = \mu.
\]

Another computation yields E(S²) = σ². Note that
\[
\begin{aligned}
\sum_{i=1}^{n} (X_i - \bar{X})^2 &= \sum_{i=1}^{n} \left(X_i^2 - 2\bar{X}X_i + \bar{X}^2\right) \\
&= \sum_{i=1}^{n} X_i^2 - 2\bar{X}\sum_{i=1}^{n} X_i + n\bar{X}^2 \\
&= \sum_{i=1}^{n} X_i^2 - 2\bar{X} \cdot n\bar{X} + n\bar{X}^2 \\
&= \sum_{i=1}^{n} X_i^2 - n\bar{X}^2 \\
&= \sum_{i=1}^{n} X_i^2 - n\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right)^2 \\
&= \sum_{i=1}^{n} X_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} X_i\right)^2.
\end{aligned}
\]

Note also that for any random variable X, V (X) = E(X 2 ) − (E(X))2 so that

E(X 2 ) = V (X) + (E(X))2 .



Consequently,
\[
\begin{aligned}
E(S^2) &= E\!\left(\frac{1}{n-1}\left[\sum_{i=1}^{n} X_i^2 - \frac{1}{n}\Bigl(\sum_{i=1}^{n} X_i\Bigr)^2\right]\right) \\
&= \frac{1}{n-1}\left[\sum_{i=1}^{n} E(X_i^2) - \frac{1}{n} E\!\left(\Bigl(\sum_{i=1}^{n} X_i\Bigr)^2\right)\right] \\
&= \frac{1}{n-1}\left[\sum_{i=1}^{n} (\sigma^2 + \mu^2) - \frac{1}{n}\left(V\!\Bigl(\sum_{i=1}^{n} X_i\Bigr) + \Bigl(E\Bigl(\sum_{i=1}^{n} X_i\Bigr)\Bigr)^2\right)\right] \\
&= \frac{1}{n-1}\left[n(\sigma^2 + \mu^2) - \frac{1}{n}\bigl(n\sigma^2 + (n\mu)^2\bigr)\right] \\
&= \frac{1}{n-1}\left[n\sigma^2 + n\mu^2 - \sigma^2 - n\mu^2\right] \\
&= \sigma^2.
\end{aligned}
\]
The random variables X̄ and S 2 are said to be unbiased estimators of µ and σ 2
since E(X̄) = µ and E(S 2 ) = σ 2 .

Now continuing to note that the Xi ’s we take as random readings from the
population with mean µ and standard deviation σ are independent, we can make
use of the result that says the variance of a sum of independent random variables
is the sum of their variances to obtain
\[
V(\bar{X}) = V\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right) = \left(\frac{1}{n}\right)^2 V\!\left(\sum_{i=1}^{n} X_i\right) = \frac{1}{n^2}\sum_{i=1}^{n} V(X_i) = \frac{1}{n^2} \cdot n\sigma^2 = \frac{\sigma^2}{n}.
\]
Consequently, SD(X̄) = σ/√n. As one might expect, the standard deviation of the
sample mean X̄ decreases as the sample size gets larger.
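A simulation makes this shrinking standard deviation visible. The sketch below (the exponential population and the sample sizes are arbitrary choices) compares the empirical standard deviation of X̄ with σ/√n.

import random
import statistics
import math

random.seed(0)
sigma = 2.0                      # an Exponential with rate 1/2 has mean 2 and sd 2

for n in (10, 40, 160):
    means = [statistics.fmean(random.expovariate(1 / 2) for _ in range(n))
             for _ in range(20_000)]
    print(n, round(statistics.stdev(means), 3), round(sigma / math.sqrt(n), 3))
# The two columns of numbers agree closely for each n.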

Exercises

(1) A random sample of size 100 is taken from a population with mean µ = 90.0
and standard deviation σ = 14.2. Compute the mean and standard deviation
of the sample mean.
Ans. E(X̄) = 90 and SD(X̄) = 1.42

(2) A random sample of size 30 is taken from a population with mean µ = 2.6 and
standard deviation σ = 0.125. Compute the mean and standard deviation of
the sample mean.
Ans. E(X̄) = 2.6 and SD(X̄) = 0.0228
(3) A population has standard deviation σ = 8.5. What’s the minimum size of a
sample from this population that will produce a sample mean with a standard
deviation (a) of less than 1 and (b) of less than 0.5?
Ans.: (a) 73 (b) 289

6.2. Law of Large Numbers

The Weak Law of Large Numbers is a fundamental result in probability. We


derive it by making use of Chebyshev’s Inequality - a formula of importance in and
of itself.

Chebyshev’s Inequality.
Suppose k is a positive constant and X is a random variable with finite
mean µ and standard deviation σ. Then
\[
P(|X - \mu| \ge k) \le \frac{\sigma^2}{k^2}.
\]

Proof: (For the continuous case; the discrete case is similar) Suppose the con-
tinuous random variable X has pdf f (x), mean µ, and standard deviation σ. Then
we have that
\[
\begin{aligned}
\sigma^2 &= \int_{-\infty}^{\infty} (x-\mu)^2 f(x)\, dx \\
&\ge \int_{-\infty}^{\mu - k} (x-\mu)^2 f(x)\, dx + \int_{\mu + k}^{\infty} (x-\mu)^2 f(x)\, dx \\
&\ge \int_{-\infty}^{\mu - k} k^2 f(x)\, dx + \int_{\mu + k}^{\infty} k^2 f(x)\, dx \\
&= k^2\left(\int_{-\infty}^{\mu - k} f(x)\, dx + \int_{\mu + k}^{\infty} f(x)\, dx\right) \\
&= k^2 P(X \le \mu - k \text{ or } X \ge \mu + k) \\
&= k^2 P(|X - \mu| \ge k).
\end{aligned}
\]

Example 6.1. Scores on a certain standardized exam are 71.1 on average with a
standard deviation of 12.1. Find a probability bound for a random score for this
exam to be within 15 points of the mean.

\[
P(|X - 71.1| < 15) = 1 - P(|X - 71.1| \ge 15) \ge 1 - \frac{12.1^2}{15^2} = 0.349.
\]
The probability the score will be from 56.1 to 86.1 is at least 34%.

We now formulate the Law of Large Numbers.

The Weak Law of Large Numbers.


Suppose X₁, X₂, ... are independent and identically distributed random vari-
ables with finite mean µ. Then for any ε > 0,
\[
\lim_{n \to \infty} P\!\left(\left|\frac{X_1 + \cdots + X_n}{n} - \mu\right| \ge \varepsilon\right) = 0.
\]

Proof: (With the assumption that the variance σ² of the X_i's is finite.) We
note that
\[
E\!\left(\frac{X_1 + \cdots + X_n}{n}\right) = \mu
\quad\text{and}\quad
V\!\left(\frac{X_1 + \cdots + X_n}{n}\right) = \frac{\sigma^2}{n}.
\]
Consequently, by Chebyshev's Inequality, we have
\[
P\!\left(\left|\frac{X_1 + \cdots + X_n}{n} - \mu\right| \ge \varepsilon\right) \le \frac{\sigma^2}{n\varepsilon^2}.
\]
The result follows by the Squeeze Theorem.
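The Weak Law is easy to observe in a simulation. The sketch below (die rolling is just an illustrative choice) estimates the probability that the running average of fair die rolls misses µ = 3.5 by at least ε = 0.1 for increasing n.

import random
import statistics

random.seed(2)
mu, eps = 3.5, 0.1

for n in (10, 100, 1000, 10000):
    trials = 1000
    misses = sum(abs(statistics.fmean(random.randint(1, 6) for _ in range(n)) - mu) >= eps
                 for _ in range(trials))
    print(n, misses / trials)   # the proportion decreases toward 0 as n grows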

Exercises

(1) For a random variable X with mean 10 and standard deviation 2, use Cheby-
shev’s Inequality to find an upper bound on the probability that X is either
less than or equal to 7 or greater than or equal to 13.
Ans. 4/9 = 0.4444
(2) If the random variable X is normal with mean 10 and standard deviation 2,
compute the probability that X is either less than or equal to 7 or greater
than or equal to 13.
Ans. 0.1336
(3) If the random variable X is Binomial with mean 12 and standard deviation
3, compute the probability that X is either less than or equal to 8 or greater
than or equal to 16. What is the upper bound that Chebyshev’s Inequality
gives for computing this probability?
Ans. 0.2422, 0.5625

(4) Scores on a certain standardized exam are 70 on average with a standard


deviation of 10. Find an upper probability bound for a random score for this
exam to be at least two standard deviation different from the mean.
Ans. 0.25
(5) Scores on a certain standardized exam are 65 on average with a standard
deviation of 5. Find a lower probability bound for a random score for this
exam to be between 57 and 73.
Ans. 0.6094

6.3. Moment Generating Functions


Moment generating functions are an important analytical tool when working with
random variables. For one thing, they can be used to prove the Central Limit
Theorem, arguably the most important result in statistics. As we shall see, there
are other applications too.

Moment Generating Function. A moment generating function φ(t) is
a real valued function on the reals defined for the random variable X by
φ(t) = E(e^{tX}). If X is discrete, we have
\[
\phi(t) = \sum_x e^{tx} f(x).
\]
If X is continuous, the defining formula becomes
\[
\phi(t) = \int_{-\infty}^{\infty} e^{tx} f(x)\, dx.
\]

We find a rule for one of the standard discrete random variables.


Example 6.2. Given X is a Poisson random variable with parameter λ, find a rule
for the moment generating function of X.

\[
\phi(t) = \sum_{n=0}^{\infty} e^{tn}\left(e^{-\lambda}\frac{\lambda^n}{n!}\right)
= e^{-\lambda}\sum_{n=0}^{\infty} \frac{(\lambda e^t)^n}{n!}
= e^{-\lambda} e^{\lambda e^t}
= e^{\lambda(e^t - 1)}.
\]
The last series in the computation is the Maclaurin series for e^{λe^t}.
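This closed form can be confirmed numerically by truncating the defining series; the values of λ and t below are arbitrary.

import math

lam, t = 3.0, 0.4    # illustrative values

# Truncated series  sum_{n >= 0} e^{tn} e^{-lam} lam^n / n!
series = sum(math.exp(t * n) * math.exp(-lam) * lam**n / math.factorial(n)
             for n in range(200))

closed_form = math.exp(lam * (math.exp(t) - 1))
print(series, closed_form)   # the two values agree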

One property of moment generating functions that can be seen immediately is


that
φ(0) = E(e0·X ) = E(1) = 1.

Note that if X is continuous, we have
\[
\phi'(t) = \frac{d}{dt}\int_{-\infty}^{\infty} e^{tx} f(x)\, dx = \int_{-\infty}^{\infty} x e^{tx} f(x)\, dx = E(Xe^{tX}).
\]
Consequently, we have that φ′ (0) = E(X) if the derivative of the generating func-
tion exists at zero. A similar derivation establishes the result for discrete ran-
dom variables. The reader will note that the formula for the nth derivative is
φ(n) (t) = E(X n etX ). We therefore have that φ(n) (0) = E(X n ). This formula has
many applications. We formulate the result once more and provide an example
where it can be applied.

Moments of Random Variables. For the random variable X with


moment generating function φ(t), we have that φ′ (0) = E(X) and
φ(n) (0) = E(X n ), for n = 2, 3, ..., if the derivatives exist at zero.

Example 6.3. Use moment generating functions to show that if X is Binomial


with parameters n and p that E(X) = np and V (X) = np(1 − p).

We first note that
\[
\phi(t) = \sum_{k=0}^{n} e^{tk}\binom{n}{k} p^k (1-p)^{n-k}
= \sum_{k=0}^{n} (e^t)^k \binom{n}{k} p^k (1-p)^{n-k}
= \sum_{k=0}^{n} \binom{n}{k} (pe^t)^k (1-p)^{n-k}
= (pe^t + 1 - p)^n.
\]
Hence,
\[
\phi'(t) = n(pe^t + 1 - p)^{n-1} pe^t
\]
and
\[
\phi''(t) = n(n-1)(pe^t + 1 - p)^{n-2} pe^t \cdot pe^t + n(pe^t + 1 - p)^{n-1} pe^t,
\]
so φ'(0) = np and φ''(0) = n(n − 1)p · p + np. This gives us E(X) = np, and the variance of X is
\[
E(X^2) - (E(X))^2 = n(n-1)p \cdot p + np - (np)^2 = n^2p^2 - np^2 + np - n^2p^2 = np(1-p).
\]
We note two more important facts about moment generating functions. First,
they are unique, so if one knows the moment generating function he/she can deduce
what random variable he/she is dealing with. Second is that the moment generating
function for the sum of two independent random variables is the product of the
moment generating functions of the two random variables. This latter result has a
simple derivation.
\[
\phi_{X+Y}(t) = E(e^{t(X+Y)}) = E(e^{tX} e^{tY}) = E(e^{tX})E(e^{tY}) = \phi_X(t)\phi_Y(t).
\]

We state these properties more generally:

Uniqueness of MGFs. Moment generating functions are unique. More-


over, if X1 , X2 , ..., Xn , are independent random variables, then
φX1 +X2 +···+Xn (t) = φX1 (t) · φX2 (t) · · · φXn (t).

We conclude this section with the formulation of a theorem needed for the proof
of the Central Limit Theorem that comes up later in this chapter.

Continuity Theorem for Moment Generating Functions. Suppose


X1 , X2 , ... is a sequence of random variables with cumulative distribution
functions F1 , F2 , ..., and moment generating functions φ1 , φ2 , .... Suppose
also that X is a random variable with cumulative distribution function F
and moment generating function φ and that limn→∞ φn (t) = φ(t) for all t
in an open interval containing zero. Then limn→∞ Fn (x) = F (x) for all x
at which F is continuous.

Proving the Continuity Theorem is beyond the scope of this text. The same
is the case for the fact that moment generating functions are unique. The reader
should note that random variables can be studied in a superior fashion via char-
acteristic functions φ_X(t) = E(e^{itX}), where i is the imaginary number √−1. Un-
derstanding the mathematics behind these functions requires knowledge of complex
analysis.

Exercises

(1) Calculate the moment generating function for a discrete random variable X
for which P (X = 2) = 3/4 and P (X = 5) = 1/4.
    Ans. φ(t) = (3/4)e^{2t} + (1/4)e^{5t}
(2) Calculate the moment generating function for a continuous random variable
X with probability density function

    \[
    f(x) = \begin{cases} 2x & \text{if } 0 \le x \le 1 \\ 0 & \text{otherwise} \end{cases}
    \]
    Ans. φ(t) = 2(te^t − e^t + 1)/t²

(3) Use moment generating functions to compute the standard deviation of a
    Poisson random variable with mean λ.
    Ans. Since φ(t) = e^{λ(e^t−1)}, E(X) = λ, E(X²) = λ² + λ, SD(X) = √λ.
    Note that the Maclaurin series for e^x is needed to find φ.
(4) Use moment generating functions to compute the standard deviation of an
exponential random variable with mean µ.
    Ans. φ(t) = λ/(λ − t), E(X) = 1/λ, E(X²) = 2/λ², SD(X) = 1/λ = E(X) = µ.
(5) Calculate the moment generating function of the uniform random variable on
the interval [a, b].
    Ans. φ(t) = (e^{bt} − e^{at})/((b − a)t)
(6) Calculate the moment generating function of the standard normal random
    variable Z.
    Ans. e^{t²/2}
(7) Compute the mean and standard deviation of the random variable that has
    moment generating function φ(t) = 1/2 + (1/3)e^{6t} + (1/6)e^{12t}.
    Ans. 4, √20
(8) Compute the mean and standard deviation of the random variable that has
    moment generating function φ(t) = e^{2t² + 5t}.
    Ans. 5, 2
(9) If X and Y are independent Poisson random variables with means λX and λY ,
respectively, use moment generating functions to show that X + Y is Poisson
    with mean λ_X + λ_Y.
    We've already shown that φ_X(t) = e^{λ_X(e^t−1)} and that φ_Y(t) = e^{λ_Y(e^t−1)}.
    Hence,
    \[
    \phi_{X+Y}(t) = \phi_X(t) \cdot \phi_Y(t) = e^{\lambda_X(e^t - 1)} \cdot e^{\lambda_Y(e^t - 1)} = e^{(\lambda_X + \lambda_Y)(e^t - 1)},
    \]
    which is the mgf for a Poisson RV with mean λ_X + λ_Y.

6.4. Sums of Independent Random Variables


We now investigate the distribution of a sum of independent random variables. As-
sume X1 , X2 , ..., Xn are independent random variables. We can obtain the moment
generating function of the sum of these random variables by taking a product of
the moment generating functions of the n random variables. We have
φX1 +X2 +···+Xn (t) = φX1 (t)φX2 (t) · · · φXn (t).
Note that for the independent Poisson random variables X₁, X₂, ..., Xₙ, with param-
eters λ₁, λ₂, ..., λₙ, respectively, we have
\[
\phi_{X_1 + X_2 + \cdots + X_n}(t) = e^{\lambda_1(e^t - 1)} \cdot e^{\lambda_2(e^t - 1)} \cdots e^{\lambda_n(e^t - 1)} = e^{(\lambda_1 + \lambda_2 + \cdots + \lambda_n)(e^t - 1)}.
\]
This is the moment generating function for a Poisson random variable with mean
λ₁ + λ₂ + ··· + λₙ. We state the result.

Sums of Poisson random variables.


If X1 , X2 , ..., Xn are independent Poisson random variables with parameters
λ1 , λ2 , ..., λn , respectively, then X1 + · · · + Xn is a Poisson random variable
with parameter λ1 + · · · + λn .

Example 6.4. It is known that there are 19 tornadoes to touch down per year on
average in Arkansas and two in Maine. The number of tornadoes to touch down
in a region is accurately modeled to be Poisson. What’s the probability that there
will be exactly 20 tornadoes to touch down in Arkansas and Maine combined next
year (assuming that the number touching down in Arkansas is independent of the
number in Maine)?

Since the sum of two independent Poisson random variables is Poisson with the
mean being the sum of the individual means, the answer is
\[
e^{-(19+2)}\,\frac{(19+2)^{20}}{20!} = 0.0867.
\]
The reader will note that we are making the assumption that the number of torna-
does to touch down in Arkansas and in Maine are independent.
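The computation in the example is a one-liner once the combined mean is known; the sketch below uses only the Python standard library.

import math

def poisson_pmf(mean, k):
    return math.exp(-mean) * mean**k / math.factorial(k)

rates = [19, 2]                      # Arkansas and Maine tornado rates
combined = sum(rates)                # the sum is Poisson with this mean

print(round(poisson_pmf(combined, 20), 4))   # 0.0867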

We now determine the probability density function for the sum of n independent
normal random variables. We start with a single normal random variable X with
mean µ and standard deviation σ. We incorporate the exponent on the factor etx
into the exponent including x2 and complete the square in x to obtain the rule:
\[
\begin{aligned}
\phi(t) &= \int_{-\infty}^{\infty} e^{tx}\,\frac{1}{\sqrt{2\pi}\,\sigma} e^{-\frac{(x-\mu)^2}{2\sigma^2}}\, dx \\
&= \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}\,\sigma} e^{-\frac{x^2 - 2(\mu + t\sigma^2)x + \mu^2}{2\sigma^2}}\, dx \\
&= \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}\,\sigma} e^{-\frac{x^2 - 2(\mu + t\sigma^2)x + (\mu + t\sigma^2)^2}{2\sigma^2} + \frac{2\mu t\sigma^2 + t^2\sigma^4}{2\sigma^2}}\, dx \\
&= e^{\mu t + \frac{t^2\sigma^2}{2}} \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}\,\sigma} e^{-\frac{(x - (\mu + t\sigma^2))^2}{2\sigma^2}}\, dx \\
&= e^{\mu t + \frac{t^2\sigma^2}{2}} \cdot 1 = e^{\frac{t^2\sigma^2}{2} + \mu t}.
\end{aligned}
\]
The reader will note that the integral in the next-to-last line is in fact one since
the integrand is the probability density function for a normal random variable with
mean µ + tσ² and standard deviation σ.
Note now that for the independent normal random variables X1 , X2 , ..., Xn ,
with means µ1 , µ2 , ..., µn , respectively, and standard deviations σ1 , σ2 , ..., σn , re-
spectively, we have
\[
\phi_{X_1 + X_2 + \cdots + X_n}(t) = e^{\frac{t^2\sigma_1^2}{2} + \mu_1 t} \cdot e^{\frac{t^2\sigma_2^2}{2} + \mu_2 t} \cdots e^{\frac{t^2\sigma_n^2}{2} + \mu_n t} = e^{\frac{t^2(\sigma_1^2 + \cdots + \sigma_n^2)}{2} + (\mu_1 + \cdots + \mu_n)t}.
\]
This is the moment generating function for a normal random variable with mean
µ₁ + ··· + µₙ and standard deviation √(σ₁² + ··· + σₙ²).

Sums of normal random variables. If X₁, X₂, ..., Xₙ are independent
normal random variables with means µ₁, ..., µₙ and standard deviations
σ₁, ..., σₙ, then the random variable X₁ + ··· + Xₙ is normal with mean
µ₁ + µ₂ + ··· + µₙ and standard deviation √(σ₁² + σ₂² + ··· + σₙ²).

Using moment generating functions, it can be shown that for independent


gamma random variables X with parameters λ and α and Y with parameters
λ and β, it is the case that X + Y is a gamma random variable with parameters
λ and α + β. Recalling that the exponential random variable with parameter λ
is nothing more than a gamma random variable with parameters λ and 1, we list
some interesting consequences:

Sums of exponential random variables.


If X1 , X2 , ..., Xn are independent exponential random variables with param-
eter λ, then X1 + · · · + Xn is a gamma random variable with parameters λ
and n.

Now the sum Z₁² + ··· + Zₙ² of the squares of independent standard normal random
variables is referred to as the chi-squared random variable with n degrees of freedom.
Note that the moment generating function for a chi-squared random variable with
1 degree of freedom, φ_{Z²}(t), is given by
\[
\phi_{Z^2}(t) = E(e^{Z^2 t}) = \int_{-\infty}^{\infty} e^{tx^2}\,\frac{1}{\sqrt{2\pi}} e^{-\frac{x^2}{2}}\, dx = \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}} e^{-\frac{(1-2t)x^2}{2}}\, dx.
\]
Letting u = √(1 − 2t)·x, we obtain dx = du/√(1 − 2t) and
\[
\phi_{Z^2}(t) = \frac{1}{\sqrt{1-2t}}\int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}} e^{-\frac{u^2}{2}}\, du = (1-2t)^{-\frac{1}{2}}
\]
for t < 1/2. It's therefore the case that the moment generating function for the
chi-squared distribution with n degrees of freedom is
\[
\left((1-2t)^{-\frac{1}{2}}\right)^n = (1-2t)^{-\frac{n}{2}}.
\]
We will refer back to this moment generating function later when we ascertain the
underlying distribution of a sample standard deviation.

Sums of squares of independent standard normal random variables.
If Z₁, Z₂, ..., Zₙ are independent standard normal random variables, then
Z₁² + ··· + Zₙ² - referred to as the chi-squared random variable with n degrees
of freedom and notated by χₙ² - has moment generating function
\[
\phi_{\chi_n^2}(t) = (1-2t)^{-\frac{n}{2}}
\]
for t < 1/2.

Exercises

(1) Suppose X and Y are independent Poisson RV’s with parameters 2 and 3,
respectively. Compute (a) P (X + Y = 6) and (b) P (X + Y ≥ 4).
    Ans.: (a) 3125/(144e⁵) = 0.1462 (b) 1 − 118/(3e⁵) = 0.7350

(2) Calls come in to customer service center at a rate of 4.3 per minute. Assuming
calls arriving in two different minutes are independent, compute the probabil-
ity that (a) at least six calls come in in a two minute period and (b) exactly
25 calls come in in a five minute period. Hint: Use the Poisson.
Ans.: (a) 0.8578 (b) 0.0607
(3) Suppose X and Y are independent normal random variables with the mean
and standard deviation of X being 10 and 3, respectively, and the mean and
standard deviation of Y being 14 and 4, respectively. Compute (a) P (X +Y >
24) and (b) P (X + Y < 25).
Ans.: (a) 0.5000 (b) 0.5793
(4) Suppose X1 , ..., X10 are independent normal random variables, each with mean
2.0 and standard deviation 1.5. Compute (a) P (X1 + · · · + X10 < 23.5) and
    (b) P(1.8 ≤ (X₁ + ··· + X₁₀)/10 ≤ 2.2)
    Ans.: (a) 0.7697 (b) 0.3267
(5) Suppose X1 , ..., X20 are independent normal random variables, each with mean
0 and standard deviation 2. Compute (a) P (X1 + · · · + X20 < 1) and (b)
    P(−1.1 ≤ (X₁ + ··· + X₂₀)/20 ≤ 1.1)
    Ans.: (a) 0.5445 (b) 0.9861

6.5. The Central Limit Theorem


Perhaps the most remarkable result in all of statistics is the Central Limit Theorem.
We provide a rough formulation.

The Central Limit Theorem.


If n values are randomly sampled from a distribution with mean µ and
standard deviation σ, then the sample mean of these values, X̄, is approx-
imately normal with mean µ and standard deviation σ/√n for large n.

The theorem allows one to take as normal an average of a large number of


values randomly selected from any distribution! We have chosen the vague term
“large” here because the required size of n depends somewhat on the underlying
distribution. If the distribution is symmetric, “large” might be taken as n ≥ 5. If
the distribution is highly asymmetric, however, “large” might mean n ≥ 200. A
good rule of thumb that applies in almost all cases is to take n ≥ 30.

The Central Limit Theorem can be written more compactly as
\[
\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim Z,
\]
for n ≥ 30.

We note that an equivalent way to write (X̄ − µ)/(σ/√n) is (X₁ + ··· + Xₙ − nµ)/(σ√n). A
more precise statement of the theorem is as follows:

The Central Limit Theorem Precisely Stated. If c is a real con-
stant and X₁, X₂, ... are independent and identically distributed random
variables with mean µ and standard deviation σ, then
\[
\lim_{n \to \infty} P\!\left(\frac{X_1 + \cdots + X_n - n\mu}{\sigma\sqrt{n}} \le c\right) = \int_{-\infty}^{c} \frac{1}{\sqrt{2\pi}} e^{-\frac{x^2}{2}}\, dx.
\]

Proof: It’s sufficient to prove the theorem in the case that µ = 0 because if we
let Yn = Xn − µ we have

 X + · · · + X − nµ  Y + · · · + Y 
1 n 1 n
lim P √ ≤ c = lim P √ ≤c .
n→∞ σ n n→∞ σ n

X1 + · · · + Xn
We therefore assume µ = 0 and let Zn = √ . Then
σ n

φZn (t) = E(etZn )


t
(X1 +···+Xn )
= E(e σ )

n

t
= φX1 +···+Xn ( √ )
σ n
t t
= φX1 ( √ ) · · · φX1 ( √ )
σ n σ n
t
= [φX1 ( √ )]n .
σ n

We now take a limit of the natural logarithm of φ_{Z_n}(t). We have
\[
\begin{aligned}
\lim_{n \to \infty} \ln \phi_{Z_n}(t)
&= \lim_{n \to \infty} \ln\left[\phi_{X_1}\!\left(\tfrac{t}{\sigma\sqrt{n}}\right)\right]^n \\
&= \lim_{n \to \infty} n \ln\left[\phi_{X_1}\!\left(\tfrac{t}{\sigma\sqrt{n}}\right)\right] \\
&= \lim_{n \to \infty} \frac{\ln\left[\phi_{X_1}(\tfrac{t}{\sigma\sqrt{n}})\right]}{1/n} \\
&= \lim_{n \to \infty} \frac{\phi'_{X_1}(\tfrac{t}{\sigma\sqrt{n}})\left(-\tfrac{1}{2}\right)\tfrac{t}{\sigma n^{3/2}} \big/ \phi_{X_1}(\tfrac{t}{\sigma\sqrt{n}})}{-1/n^2} \\
&= \lim_{n \to \infty} \frac{\phi'_{X_1}(\tfrac{t}{\sigma\sqrt{n}})\, t}{2\sigma\, \phi_{X_1}(\tfrac{t}{\sigma\sqrt{n}})\, n^{-1/2}} \\
&= \lim_{n \to \infty} \frac{\phi''_{X_1}(\tfrac{t}{\sigma\sqrt{n}})\left(-\tfrac{1}{2}\right)\tfrac{t}{\sigma n^{3/2}}\, t}{2\sigma\, \phi'_{X_1}(\tfrac{t}{\sigma\sqrt{n}})\left(-\tfrac{1}{2}\right)\tfrac{t}{\sigma n^{3/2}}\, n^{-1/2} + 2\sigma\, \phi_{X_1}(\tfrac{t}{\sigma\sqrt{n}})\left(-\tfrac{1}{2}\right) n^{-3/2}} \\
&= \lim_{n \to \infty} \frac{\phi''_{X_1}(\tfrac{t}{\sigma\sqrt{n}})\, \tfrac{t^2}{\sigma}}{2\, \phi'_{X_1}(\tfrac{t}{\sigma\sqrt{n}})\, \tfrac{t}{\sqrt{n}} + 2\sigma\, \phi_{X_1}(\tfrac{t}{\sigma\sqrt{n}})} \\
&= \frac{\sigma^2 \cdot \tfrac{t^2}{\sigma}}{0 + 2\sigma \cdot 1} \\
&= \frac{t^2}{2}.
\end{aligned}
\]

The reader will note that L'Hospital's Rule was used in going from line 3
to line 4 and then again from line 5 to line 6. The Rule is applicable since
lim_{n→∞} ln φ_{X₁}(t/(σ√n)) = ln φ_{X₁}(0) = ln 1 = 0, and because
lim_{n→∞} φ'_{X₁}(t/(σ√n)) t = φ'_{X₁}(0) t = µt = 0 · t = 0.

Hence,
\[
\lim_{n \to \infty} \phi_{Z_n}(t) = e^{\frac{t^2}{2}}.
\]
Since e^{t²/2} is the moment generating function for the standard normal, we can apply
the Continuity Theorem for Moment Generating Functions, and the proof is done.


An application of the Central Limit theorem is provided.


Example 6.5. It is known that ACT scores have a mean of 21.0 and standard
deviation of 5.2. What's the probability that the average of 40 randomly chosen
ACT scores is (a) at most 19? (b) at least 20?
\[
\text{(a)}\quad P(\bar{X} \le 19) = P\!\left(Z \le \frac{19 - 21.0}{5.2/\sqrt{40}}\right) = P(Z \le -2.43) \cong 0.0075
\]
\[
\text{(b)}\quad P(\bar{X} \ge 20) = P\!\left(Z \ge \frac{20 - 21.0}{5.2/\sqrt{40}}\right) = P(Z \ge -1.22) \cong 0.8888
\]
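These normal probabilities can also be computed without a table by way of the error function; the phi helper below is the standard identity Φ(z) = (1 + erf(z/√2))/2, not something defined in the text.

import math

def phi(z):
    """Standard normal cdf via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

mu, sigma, n = 21.0, 5.2, 40
se = sigma / math.sqrt(n)

print(round(phi((19 - mu) / se), 4))        # (a) about 0.0075
print(round(1 - phi((20 - mu) / se), 4))    # (b) about 0.888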

A different formulation of the Central Limit Theorem is used to solve the next
example problem. We compute the probability that a sum of random variables is
within a certain interval. To apply the Central Limit Theorem, we convert this
sum to an average.
Example 6.6. A parcel carrier handles packages that weigh on average 16.4 lb with
a standard deviation of 7.5 lb. What’s the probability that the next 50 parcels she
handles will weigh less than 750 lb altogether?

Let X_i be the weight in pounds of the ith package. Then the answer is
\[
P(X_1 + X_2 + \cdots + X_{50} < 750) = P\!\left(\frac{X_1 + X_2 + \cdots + X_{50}}{50} < \frac{750}{50}\right)
= P(\bar{X} < 15)
= P\!\left(Z < \frac{15 - 16.4}{7.5/\sqrt{50}}\right)
= P(Z < -1.32)
= 0.093.
\]

6.6. Normal Approximation to the Binomial


An application of the Central Limit Theorem allows us to obtain a normal approx-
imation to the binomial. Suppose X is binomial with parameters n and p. Then
we can write
\[
X = \sum_{i=1}^{n} X_i,
\]
where the Xi ’s are independent Bernoulli random variables with parameter p. We
therefore have that X/n is the sample mean X̄ of n Bernoulli(p) random variables.
We are converting a discrete random variable X to the continuous standard normal
random variable Z, so for k = 0, 1, 2, ..., n we write P (X = k) = P (k − 1/2 < X <
k + 1/2). According to the Central Limit Theorem (and recalling that E(Xi ) = p
and SD(Xi ) = p(1 − p), we have that
n
X
P (k − 1/2 < X < k + 1/2) = P (k − 1/2 < Xi < k + 1/2)
i=1
 k − 1/2 k + 1/2 
= P < X̄ <
n n
 k− 12 − p k+ 12
− p
= P √n < Z < √n
p(1−p) p(1−p)
√ √
n n
k − 21 −
np k + 1 − np 
= P p <Z < p 2
np(1 − p) np(1 − p)

This allows us to write a pair of formulas that are of great use in approximating
binomial probabilities with the standard normal.

Normal Approximation to the Binomial. If X is binomial with pa-
rameters n and p, n is large, and k = 0, 1, 2, ..., n, then
\[
P(X \le k) = P\!\left(Z < \frac{k + \frac{1}{2} - np}{\sqrt{np(1-p)}}\right)
\]
and
\[
P(X \ge k) = P\!\left(Z > \frac{k - \frac{1}{2} - np}{\sqrt{np(1-p)}}\right).
\]

We demonstrate how these formulas might come in handy.


Example 6.7. If you toss a fair coin 100 times, what’s the probability you get at
least 52 heads?

Without the normal approximation, we’d have to compute individually the


probabilities that we get exactly 52 heads, exactly 53 heads, etc., up to the proba-
bility of getting exactly 100 heads. We’d then have to add these 49 computations
together to get our answer. Using the normal approximation, however, the compu-
tation is much simpler. Let X count the number of heads you get. Then
\[
P(X \ge 52) = P\!\left(Z > \frac{52 - \frac{1}{2} - 100(0.5)}{\sqrt{100(0.5)(1 - 0.5)}}\right) = P(Z > 0.3) = 0.3821.
\]
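It is instructive to compare the approximation with the exact binomial sum; the short sketch below does both for this coin-tossing question.

import math

n, p, k = 100, 0.5, 52

# Exact binomial tail P(X >= k)
exact = sum(math.comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k, n + 1))

# Normal approximation with continuity correction
z = (k - 0.5 - n * p) / math.sqrt(n * p * (1 - p))
approx = 0.5 * (1 - math.erf(z / math.sqrt(2)))

print(round(exact, 4), round(approx, 4))   # 0.3822 and 0.3821, essentially equal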

Exercises

(1) A certain cereal company packages “14 oz” boxes of cereal. These boxes have
a mean weight of 14.1 oz with a standard deviation of 0.42 oz. What’s the
probability that 30 randomly selected boxes of this cereal have an average
weight of at least 13.9 oz?
Ans. 0.995
(2) A bartender pours glasses of wine that are 10.5 oz on average with a standard
deviation of 1.3 oz. (a) What’s the probability that the bartender pours at
least 10 oz for the next customer who requests a glass of wine? (b) What’s
the probability he pours at least 10 oz on average for the next 30 customers
who request a glass of wine? (c) What’s the probability that 430 oz of wine
will suffice for the next 40 customers he pours glasses of wine?
Ans. (a) not enough information given to answer the problem, (b) 0.9824,
(c) 0.888
(3) Hank sells propane tanks that have a mass of 17.46 kg with a standard devi-
ation of 0.12 kg when empty. He loads 50 such tanks on the back of his truck.
What’s the probability that these 50 tanks have a mass of more than 875 kg
combined?
Ans. 0.0092

(4) If you roll a balanced die 50 times, what’s the probability you get (a) exactly
eight 6’s? (b) at least 10 6’s?
Ans. (a) 0.1510 (b) 0.3290
(5) Suppose you take a true false exam by guessing on every question and that
the lowest passing score is 60%. What’s the probability you pass if the exam
consists of (a) 30 questions, (b) 50 questions, (c) 100 questions?
(6) Suppose you take a multiple choice exam by guessing on every question and
that the lowest passing score is 60%. What’s the probability you pass if the
exam consists of 30 questions, each with four answers?
(7) Twelve percent of customers at a fast food restaurant order a root beer.
What’s the probability that at least 30 of the next 200 customers at that
restaurant will order a root beer?
Ans. 0.1157
(8) A facilities worker loads pieces of equipment onto a freight elevator that has a
capacity of 1, 800 pounds. The pieces weigh 56.5 lb on average with a standard
deviation of 18.5 lb. What’s the probability that the next 30 pieces that need
to be loaded on the elevator will be within the weight limit?
Ans. 0.851
(9) A particular laptop comes with a battery that will hold a charge for five hours
and 32 minutes on average with a standard deviation of one hour and 56
minutes. If you purchase 40 such laptops, what’s the probability that they
hold a charge on average for five hours or more?
Ans. 0.959
Chapter 7

Random Processes

7.1. Markov Chains


We consider a sequence of random variables X0 , X1 , ..., each of which can take the
values 1, 2, ..., N . If the random variable Xn in the sequence takes the value j, we
say that the process is in state j at time n. If the process is such that the value
Xn+1 takes is only dependent on the value that Xn takes (i.e. that future values
are only dependent on the present value and not previous values), then it is said to
be a Markov process. Using notation from probability, we have the following:

Markov Chains. The sequence X0 , X1 , ... is called a Markov chain if


P (Xn+1 = k|Xn = j, Xn−1 = jn−1 , ..., X0 = j0 ) = P (Xn+1 = k|Xn = j).
We write
pjk = P (Xn+1 = k|Xn = j),
and note that
pjk ≥ 0
and
N
X
pjk = 1
k=0
for j = 1, 2, ..., N . We call the pjk ’s the transition probabilities of the
Markov chain.

It’s helpful to list the transition probabilities in a square N × N matrix as


follows:
\[
\begin{pmatrix}
p_{11} & p_{12} & \cdots & p_{1N} \\
p_{21} & p_{22} & \cdots & p_{2N} \\
\vdots & \vdots & & \vdots \\
p_{N1} & p_{N2} & \cdots & p_{NN}
\end{pmatrix}
\]


This transition matrix allows us to compute probabilities of interest. The reader


will note that we can write

P (Xn = jn , Xn−1 = jn−1 , ..., X0 = j0 ) = P (Xn = jn |Xn−1 = jn−1 , ..., X0 = j0 )


× P (Xn−1 = jn−1 , ..., X0 = j0 )
= pjn−1 jn · P (Xn−1 = jn−1 , ..., X0 = j0 ).

Repeating this argument, we obtain

P (Xn = jn , Xn−1 = jn−1 , ..., X0 = j0 )


= pj0 j1 pj1 j2 · · · pjn−1 jn P (X0 = j0 ).

Now p_{jk} is the probability of going from state j to state k in one step. The
notation p_{jk}^{(m)} is used to represent the probability of going from state j to state k
in m steps. We have
\[
p_{jk}^{(m)} = P(X_{n+m} = k \mid X_n = j).
\]

We can arrive at the so-called Chapman-Kolmogorov equations by noting that
\[
\begin{aligned}
p_{jk}^{(m)} &= P(X_m = k \mid X_0 = j) \\
&= \sum_{i=1}^{N} P(X_m = k, X_r = i \mid X_0 = j) \\
&= \sum_{i=1}^{N} P(X_m = k \mid X_r = i, X_0 = j)\, P(X_r = i \mid X_0 = j) \\
&= \sum_{i=1}^{N} p_{ik}^{(m-r)}\, p_{ji}^{(r)}.
\end{aligned}
\]

We have

Chapman-Kolmogorov Equations. If r is an integer between 0 and m,
we have that
\[
p_{jk}^{(m)} = \sum_{i=1}^{N} p_{ik}^{(m-r)}\, p_{ji}^{(r)}.
\]

The transition matrix is of use to determine multistep transitions.



Powers of the Transition Matrix. Taking P to be the transition matrix
\[
\begin{pmatrix}
p_{11} & p_{12} & \cdots & p_{1N} \\
p_{21} & p_{22} & \cdots & p_{2N} \\
\vdots & \vdots & & \vdots \\
p_{N1} & p_{N2} & \cdots & p_{NN}
\end{pmatrix}
\]
with p_{jk} representing the probability of moving from state j to state k in
one step, we have that the entry in the jth row and kth column of Pⁿ is the
probability p_{jk}^{(n)} of transitioning from state j to state k in exactly n steps.

Example 7.1. Suppose that in a certain region the weather is such that a sunny
day is followed by a stormy day with probability 1/8 and a stormy day is followed by
a stormy day with probability 1/3. Taking a sunny day to be State 1 and a stormy
day to be State 2, compute the probability that on the third day after a sunny day
it will be stormy.

To solve this problem we note that the transition matrix is
\[
P = \begin{pmatrix} 7/8 & 1/8 \\ 2/3 & 1/3 \end{pmatrix}
\]
The third step transition matrix is therefore
\[
P^3 = \begin{pmatrix} 0.8435 & 0.1565 \\ 0.8345 & 0.1655 \end{pmatrix}
\]
Consequently, the answer to the question is p_{12}^{(3)} = 0.1565.
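Matrix powers such as P³ are conveniently computed with numpy; the sketch below reproduces the numbers in the example.

import numpy as np

P = np.array([[7/8, 1/8],
              [2/3, 1/3]])

P3 = np.linalg.matrix_power(P, 3)
print(P3)          # [[0.8435 0.1565]
                   #  [0.8345 0.1655]]  (rounded)
print(P3[0, 1])    # probability of sunny -> stormy in three steps, about 0.1565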

If it's the case that for some positive integer m we have that
\[
p_{jk}^{(m)} > 0
\]
for all j, k = 1, 2, ..., N, then the Markov chain is said to be ergodic. When the
chain is ergodic, the limit
\[
\lim_{n \to \infty} p_{jk}^{(n)}
\]

exists. We label it πk . Using the Chapman-Kolmogorov equations, the following


result can be established:

Ergodic Theorem. For an ergodic Markov chain,
\[
\pi_k = \lim_{n \to \infty} p_{jk}^{(n)}
\]
exists. Moreover,
\[
\pi_k = \sum_{j=1}^{N} \pi_j p_{jk}
\quad\text{and}\quad
\sum_{k=1}^{N} \pi_k = 1.
\]
Writing π = ⟨π₁, π₂, ..., π_N⟩, we have
\[
\pi P = \pi.
\]

Example 7.2. Suppose that in a certain region the weather is such that a sunny
day is followed by a stormy day with probability 1/8 and a stormy day is followed by a
stormy day with probability 1/3. Taking a sunny day to be State 1 and a stormy day
to be State 2, compute the probability vector π mentioned in the Ergodic Theorem.

To solve this problem we note that
\[
\lim_{n \to \infty} P^n = \begin{pmatrix} 0.8421 & 0.1579 \\ 0.8421 & 0.1579 \end{pmatrix}
\]
Consequently, the answer to the question is π₁ = 0.8421 and π₂ = 0.1579 so that
\[
\pi = \begin{pmatrix} 0.8421 & 0.1579 \end{pmatrix}
\]

Note also that
\[
\begin{pmatrix} 0.8421 & 0.1579 \end{pmatrix}
\begin{pmatrix} 0.8421 & 0.1579 \\ 0.8421 & 0.1579 \end{pmatrix}
= \begin{pmatrix} 0.8421 & 0.1579 \end{pmatrix}
\]

The row vector x = hx1 , x2 , ..., xN i is said to be the state vector for the Markov
chain at a given observation if the kth component xk is the probability the system is
in the kth state at that time. To ascertain the state vector for a specific observation
we use the transition matrix.

State Transitions. A Markov chain with transition matrix P and state


vector x(n) at the nth observation is such that
x(n+1) = x(n) P.
Accordingly, we have
x(n) = x(0) P n .

Example 7.3. Returning to the sunny/stormy example, if we start out on a sunny


day, what’s the probability that four days later it is (a) sunny and (b) stormy?

Again we take State 1 to be sunny and State 2 to be stormy. We start on a


sunny day so we have
x(0) = h1, 0i.
We note that
 4
4
  7/8 1/8  
xP = 1 0 = 0.8424 0.1576
2/3 1/3

Hence, there’s about an 84.2% chance it will be sunny and a 15.8% chance it
will be stormy on the fourth day.
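The same computation can be done in numpy; raising P to a large power also exhibits the long-run vector π found in Example 7.2.

import numpy as np

P = np.array([[7/8, 1/8],
              [2/3, 1/3]])
x0 = np.array([1.0, 0.0])            # start on a sunny day

x4 = x0 @ np.linalg.matrix_power(P, 4)
print(x4)                            # [0.8424 0.1576] (rounded)

# For large n the rows of P^n stabilize at pi = (0.8421, 0.1579)
print(np.linalg.matrix_power(P, 50)[0])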

Example 7.4. Consider a random walk on the real line in which an entity either
moves one step to the right with probability p or one step to the left with probability
1 − p. We compute the transition probability for going from state i to state j in
exactly n steps.

To compute p_{ij}^{(n)} we note that the transition has to consist of k steps to the
right, where k is a nonnegative integer less than or equal to n, and n − k steps to
the left. This has to happen in such a way that
\[
i + k - (n - k) = j.
\]
This implies 2k = n − i + j or
\[
k = \frac{n - i + j}{2}.
\]
Hence, the transition probability for n steps is
\[
p_{ij}^{(n)} = \begin{cases} \dbinom{n}{\frac{n-i+j}{2}}\, p^{\frac{n-i+j}{2}} (1-p)^{n - \frac{n-i+j}{2}} & \text{if } \frac{n-i+j}{2} = 0, 1, 2, ..., \text{ or } n \\ 0 & \text{otherwise} \end{cases}
\]

Exercises

(1) Suppose a Markov chain X0 , X1 , ..., has the two states 1 and 2, and transition
    matrix
    \[
    P = \begin{pmatrix} 0.6 & 0.4 \\ 0.3 & 0.7 \end{pmatrix}.
    \]
(a) What’s the probability of going from State 2 to State 1 in one step? (b)
What’s the probability of going from State 2 to State 1 in two steps? (c) Is
the process ergodic? (d) If it is ergodic, compute the stable state vector π
with each component to four decimal places. (e) If the process starts with
    state vector x^{(0)} = ⟨1/2, 1/2⟩, compute the state vector x^{(2)}.
    Ans. (a) 0.3 (b) 0.39 (c) Yes (d) ⟨0.4286, 0.5714⟩ (e) ⟨0.435, 0.565⟩

(2) Suppose a Markov chain X₀, X₁, ..., has the three states 1, 2, and 3, and
    transition matrix
    \[
    P = \begin{pmatrix} 1/2 & 1/4 & 1/4 \\ 1/3 & 1/2 & 1/6 \\ 1 & 0 & 0 \end{pmatrix}.
    \]
    (a) What's the probability of going from State 2 to State 1 in one step? (b)
    What's the probability of going from State 2 to State 1 in two steps? (c) Is
    the process ergodic? (d) If it is ergodic, compute the stable state vector π
    with each component to four decimal places. (e) If the process starts with
    state vector x^{(0)} = ⟨0, 1, 0⟩, compute the state vector x^{(2)}.
    Ans. (a) 1/3 (b) 1/2 (c) Yes since every entry in P² is positive
    (d) ⟨0.5455, 0.2727, 0.1818⟩ (e) ⟨1/2, 1/3, 1/6⟩
(3) Place $100 bets on red in roulette so that you can be in a state of having either
    $0, $100, $200, $300, $400, or $500 in funds. Once you run out of money or
    accumulate $500 you stay where you are. (a) Write out the transition matrix
    for the six states. (b) If you start with $300, what's the probability you'll be
    broke after betting five times?
    Ans. (a)
    \[
    P = \begin{pmatrix}
    1 & 0 & 0 & 0 & 0 & 0 \\
    20/38 & 0 & 18/38 & 0 & 0 & 0 \\
    0 & 20/38 & 0 & 18/38 & 0 & 0 \\
    0 & 0 & 20/38 & 0 & 18/38 & 0 \\
    0 & 0 & 0 & 20/38 & 0 & 18/38 \\
    0 & 0 & 0 & 0 & 0 & 1
    \end{pmatrix}.
    \]
    (b) 0.2548
(4) In a random walk, assume a particle can move one step to the right with
probability 0.52 and one step to the left with probability 0.48. What then
is the probability that the particle will move two steps to the right (a) in 20
steps and (b) in five steps.
    Ans. (a) \binom{20}{11}(0.52)^{11}(0.48)^9 = 0.1708 (b) 0
Index

Z-score, 58
Bayes' Theorem, 9
Bernoulli distribution, 37
binomial distribution, 38
bivariate random variable, 63
Central Limit Theorem, 94
Chapman-Kolmogorov equations, 102
combinations, 36
complement of an event, 2
Complement Rule, 3
conditional probability, 6
conditional probability mass function, 65
continuous random variable, 17
correlation coefficient, 78
covariance, 77
cumulative distribution function, 20
discrete random variable, 13
ergodic Markov chain, 103
event, 1
expected value, 25
exponential distribution, 52
gamma distribution, 55
gamma function, 55
geometric distribution, 47
hypergeometric, 41
independent, 7
independent events, 7
independent random variables, 64, 72
joint probability density function, 69
joint probability distribution function, 63
joint probability mass function, 63
Law of Total Probability, 10
marginal probability density function, 70
marginal probability mass function, 64
Markov chain, 101
mean, 26
moment generating function, 88
multinomial coefficient, 67
multinomial distribution, 66
mutually exclusive, 2
normal distribution, 56
odds, 5
percentile, 23
Poisson distribution, 43
probability density function, 17
probability mass function, 13
random variable, 13
random vector, 63
sample mean, 83
sample space, 1
standard deviation, 27
standard normal distribution, 57
standard normal table, 59
state vector, 104
transition matrix, 102
uniform distribution, 51
variance, 27, 31