Ponder Probability 2025 September
Nathan Ponder
Contents

Chapter 1. Probability
1.1. Sample Spaces
1.2. Probability Axioms and Rules
1.3. Odds
1.4. Conditional Probability and Independence
1.5. Bayes’ Theorem
4.4. Normal
Chapter 5. Joint Probability Distributions
5.1. Discrete Case
5.2. Continuous Case
5.3. Functions of Joint Random Variables
5.4. Expected Value, Variance, and Covariance
Chapter 6. Sampling and Limit Theorems
6.1. Sample Mean and Variance
6.2. Law of Large Numbers
6.3. Moment Generating Functions
6.4. Sums of Independent Random Variables
6.5. The Central Limit Theorem
6.6. Normal Approximation to the Binomial
Chapter 7. Random Processes
7.1. Markov Chains
Index
Chapter 1
Probability
Example 1.1. What’s the probability you get at least two heads if you toss a coin
three times?
If you toss a coin three times, then the sample space of outcomes is
S = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}.
If A represents the event that you get at least two heads, then
A = {HHH, HHT, HTH, THH}.
Consequently,
P (A) = 4/8 = 1/2,
since there are just four outcomes in A and eight in S.
Example 1.2. What’s the probability that the sum of the two numbers that come
up when you roll a pair of dice is 8?
If you roll two dice to see which numbers come up, then the sample space
of outcomes is S = {(1, 1), (2, 1), ..., (6, 6)}. If A represents the event we are to
find the probability of, then A = {(6, 2), (5, 3), (4, 4), (3, 5), (2, 6)}. Consequently,
P (A) = 5/36, since there are just five outcomes in A and 36 in S.
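Sample spaces this small can also be checked by brute-force enumeration. Here is a minimal Python sketch (the variable names are ours, not from the text) that counts the favorable outcomes in Examples 1.1 and 1.2:

from itertools import product

# Example 1.1: toss a coin three times; count outcomes with at least two heads.
coin_space = list(product("HT", repeat=3))         # the 8 equally likely outcomes
favorable = [w for w in coin_space if w.count("H") >= 2]
print(len(favorable) / len(coin_space))            # 0.5

# Example 1.2: roll two dice; count outcomes whose sum is 8.
dice_space = list(product(range(1, 7), repeat=2))  # the 36 equally likely outcomes
favorable = [(a, b) for a, b in dice_space if a + b == 8]
print(len(favorable) / len(dice_space))            # 5/36 = 0.1389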
It’s often desirable to study groups of events at the same time. To do this,
probabilists have adopted the notation of set theory.
Groups of Events. For the events A and B in the sample space S, the
following notation from set theory is used:
• A ∩ B is the event that both A and B occur.
• A ∪ B is the event that A or B occurs.
• A^C is read A complement and represents the event that A does not
occur.
• ∅ is an impossible event.
• If A ∩ B = ∅, then A and B are said to be mutually exclusive.
Probability Axioms
(1) P (A) ≥ 0 for every event A in the sample space S
(2) P (S) = 1
(3) If the events A, B, C, etc. in the sample space S are mutually exclusive
one from another, then
P (A ∪ B ∪ C ∪ · · · ) = P (A) + P (B) + P (C) + · · ·
Probability Rules
(1) 0 ≤ P (A) ≤ 1 for every event A in the sample space S
(2) P (A^C) = 1 − P (A) (Complement Rule)
(3) P (A ∪ B) = P (A) + P (B) − P (A ∩ B) (Probability of a Union Rule)
(4) If A and B are mutually exclusive, then P (A ∪ B) = P (A) + P (B)
(5) For the three events A, B, and C,
P (A ∪ B ∪ C) = P (A) + P (B) + P (C)
−P (A ∩ B) − P (B ∩ C) − P (A ∩ C)
+P (A ∩ B ∩ C)
Example 1.3. If A and B are events in a sample space for which P (A) = 0.50,
P (B) = 0.40, and P (A ∩ B) = 0.25, compute (a) P (A ∪ B), (b) P (B^C), and
(c) P (A^C ∩ B).
Example 1.4. If you deal a card from a well shuffled deck of 52, what’s the
probability it’s red or a king?
Let A be the event that it’s red and B the event that it’s a king. Then
P (A ∪ B) = P (A) + P (B) − P (A ∩ B)
          = 26/52 + 4/52 − 2/52
          = 7/13.
Example 1.5. Suppose A, B, and C are events in a sample space for which P (A) =
0.32, P (B) = 0.27, P (C) = 0.42, P (A ∩ B) = 0.14, P (B ∩ C) = 0.09, P (A ∩ C) =
0.11, and P (A ∩ B ∩ C) = 0.08. Compute (a) P (A ∪ B ∪ C) and (b) P (A ∩ B^C ∩ C).
Exercises
(1) An experiment consists of tossing a coin three times. Write out the sample
space of this experiment. Find the probability that you get more heads than
tails.
Ans. See one of the examples above for S. 1/2
(2) An experiment consists of tossing a coin four times. Write out the sample
space of this experiment. Find the probability that you get (a) exactly two
heads, (b) at most two heads.
Ans. S = {HHHH, HHHT, ..., TTTT}. There should be 16 outcomes
altogether. (a) 3/8, (b) 11/16
(3) If A and B are two events in a sample space for which P (A) = 0.30, P (B) =
0.60, and P (A ∩ B) = 0.20, compute (a) P (A ∪ B), (b) P (A^C), and (c)
P (A^C ∩ B).
Ans. (a) 0.70, (b) 0.70, (c) 0.40
(4) If A and B are mutually exclusive events for which P (A) = 0.30 and P (B) =
0.60, compute (a) P (A ∪ B) and (b) P (A ∩ B).
Ans. (a) 0.90, (b) 0
(5) Suppose 60% of college students have a VISA Card, 45% have a MasterCard,
and 12% have both. What’s the probability that a randomly selected college
student (a) does not have a VISA or MasterCard, (b) has at least one of the
two cards, (c) has a Master Card but not a VISA Card?
Ans. (a) 0.07, (b) 0.93, (c) 0.33
(6) If you roll two dice, what’s the probability that the sum of the two numbers
that come up is at least nine?
Ans. 5/18
(7) Suppose A, B, and C are events in a sample space for which P (A) = 0.30,
P (B) = 0.25, P (C) = 0.45, P (A ∩ B) = 0.15, P (B ∩ C) = 0.08, P (A ∩ C) =
0.12, and P (A ∪ B ∪ C) = 0.70. Compute (a) P (A ∩ B ∩ C), (b) P (A^C ∩ B ∩ C),
(c) P ((A^C ∩ B) ∪ C), and (d) P (A^C ∩ (B ∪ C)).
Ans. (a) 0.05, (b) 0.03, (c) 0.52, (d) 0.40
(8) If you deal a card from a well shuffled deck of 52, what’s the probability (a)
it’s a king and (b) it’s either a diamond or a two?
Ans. (a) 1/13 (b) 4/13
1.3. Odds
In probability theory, the term odds is defined by
the odds of A = P (A)/P (A^C).
If you were to deal a card from a well shuffled deck of 52, the odds it would be
an ace would be (4/52) ÷ (48/52) = 1/12. Odds are typically written as a ratio of
positive integers, so the odds of dealing an ace could be written as 1/12, 1 : 12, or
1 to 12.
Unfortunately, the term odds is not used consistently in different quarters. It is
common to take “odds for the event A” to mean “odds of the event A” and “odds
against the event A” to mean P (A^C)/P (A). In other words, the odds against is the
reciprocal of the odds for. Compounding the confusion is the fact that gamblers
usually mean “odds against” when they simply say “odds”.
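The two conventions are easy to keep straight in code. The following sketch (the helper names are ours) converts a probability to odds and converts “a to b against” back to a probability:

def odds_of(p):
    # The odds of an event of probability p, as defined above: P(A)/P(A^C).
    return p / (1 - p)

def prob_from_odds_against(a, b):
    # If the odds against an event are a to b, its probability is b/(a + b).
    return b / (a + b)

print(odds_of(4/52))                 # 1/12 = 0.0833..., the odds of dealing an ace
print(prob_from_odds_against(4, 1))  # 0.2, as in exercise (4) below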
Exercises
(1) When rolling a die, what are the odds (a) for and (b) against getting a 6?
Ans. (a) 1 to 5 (b) 5 to 1
(2) If you toss a fair coin six times, what are the odds (a) for and (b) against
getting exactly five heads?
Ans. (a) 3/29 (b) 29/3
(3) If you deal a card from a well shuffled deck of 52 cards, what are the odds (a)
for and (b) against getting a face card?
Ans. (a) 3/10 (b) 10/3
(4) If the odds against a candidate winning an election are four to one, what’s the
probability the candidate wins the election?
Ans. 20%
(5) You are told that the odds of the Saints winning the Super Bowl this season
are 24 to 1. You recognize that you are actually being told that the odds
against their winning the Super Bowl are 24 to 1. What is the probability
they’ll win the Super Bowl?
Ans. 0.04
(6) If you buy 5000 tickets, your odds for winning a lottery are 1 to 800. What is
the probability to six decimal places that you win?
Ans. 0.001248
1.4. Conditional Probability and Independence

For the events A and B with P (B) > 0, the conditional probability of A given
B is defined by P (A|B) = P (A ∩ B)/P (B).
Example 1.6. Suppose P (A) = 0.5 and P (B) = 0.4. Compute the conditional
probability P (A|B) if (a) P (A ∪ B) = 0.75 and (b) A and B are mutually
exclusive.
By multiplying each side of the equation P (A|B) = P (A ∩ B)/P (B) by P (B), we
obtain a useful result:
Multiplication Rule.
P (A ∩ B) = P (A|B)P (B)
Example 1.7. Find the probability that the first two cards you deal from a well-
shuffled deck of 52 are red.
Let A be the event that the second card is red and B the event that the first
is red. Then the probability to be computed is given by
P (A ∩ B) = P (A|B)P (B) = (25/51) · (26/52) = 0.2451.
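A quick simulation supports the multiplication-rule computation. This sketch (our own, not from the text) shuffles a deck repeatedly and estimates the probability that the first two cards are red:

import random

deck = ["red"] * 26 + ["black"] * 26
trials = 100_000
hits = 0
for _ in range(trials):
    random.shuffle(deck)
    if deck[0] == "red" and deck[1] == "red":
        hits += 1
print(hits / trials)   # should be close to (25/51)(26/52) = 0.2451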
Example 1.8. If A and B are independent events for which P (A) = 0.6 and
P (B) = 0.4, compute P (A ∪ B).
Note that if the three events A, B, and C are independent, then P (A∩B ∩C) =
P ((A ∩ B) ∩ C) = P (A ∩ B)P (C) = P (A)P (B)P (C). In more general terms, for
the n independent events A1 , A2 , ...,An , we have
P (A1 ∩ A2 ∩ · · · ∩ An ) = P (A1 )P (A2 ) · · · P (An ).
Example 1.9. If you toss a coin six times, what’s the probability you get six
heads?
Let Ai be the event that the ith toss results in heads. Then A1 , A2 , ..., A6 , are
independent. The answer is
P (A1 ∩ A2 ∩ · · · ∩ A6 ) = P (A1 )P (A2 ) · · · P (A6 ) = (1/2)^6 = 0.0156.
Example 1.10. If you toss a coin six times what’s the probability you get at least
one tail?
Note that a direct computation would be complicated. You’d have to find the
probability that you get exactly one tail, that you get exactly two tails, that you
get exactly three tails, etc., and then add all these probabilities together. Using
the Complement Rule makes the computation much simpler. The complement
of getting at least one tail is getting no tails. Another way of saying that you
get no tails is to say that you get all heads. We know that the probability of
getting all heads is 0.0156. Therefore, the probability of getting at least one tail is
1 − 0.0156 = 0.9844.
Example 1.11. Suppose one in eight soft drink bottle tops are winners. If you
randomly buy eight bottles of soft drink, what’s the probability you win at least
once?
According to the Complement Rule, the probability you win at least once is one
minus the probability you don’t win at all. Not winning at all would mean that
each of the eight bottle tops is a loser. Since each top is a loser with probability
7/8, we have that the answer to the question is
1 − (7/8)^8 = 0.6564.
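The last two examples follow the same complement-rule pattern, which a one-line helper makes explicit (a sketch with our own naming):

def prob_at_least_one(p, n):
    # One minus the probability that all n independent trials fail.
    return 1 - (1 - p) ** n

print(prob_at_least_one(1/2, 6))   # 0.9844, at least one tail in six tosses
print(prob_at_least_one(1/8, 8))   # 0.6564, Example 1.11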
Exercises
(1) If A and B are independent, P (A) = 3/4 and P (B) = 1/3, compute P (A ∩ B).
Ans. 1/4 = 0.25
(2) Suppose A and B are two events for which P (A) = 0.70 and P (B) = 0.25.
Compute P (A ∩ B) if (a) A and B are independent (b) A and B are mutually
exclusive.
Ans. (a) 0.175 (b) 0
(3) Suppose A and B are two events for which P (A) = 1/2, P (B) = 1/3, and
P (A ∩ B) = 1/4. Compute (a) P (A|B), (b) P (B|A), (c) P (A ∪ B).
Ans. (a) 3/4, (b) 1/2, (c) 7/12
(4) Suppose A and B are two events for which P (A|B) = 1/2 and P (A ∩ B) = 1/3.
Compute P (B).
Ans. 2/3
(5) When rolling two dice, what’s the probability the first comes up as a 3 given
the sum from the two dice is at least eight?
Ans. 2/15
(6) If you toss a coin 12 times, what’s the probability you get no heads? Round
your answer off to five decimal places.
Ans. 0.00024
(7) Draw two cards without replacement from a well-shuffled deck of 52 playing
cards. What’s the probability they are both kings? Recall that there are four
kings in a deck. Round your answer off to four decimal places.
Ans. 1/221 = 0.0045
(8) Only 15% of motorists come to a complete stop at a certain four way stop
intersection. What’s the probability that of the next ten motorists to go
through that intersection (a) none come to a complete stop, (b) at least one
comes to a complete stop, and (c) exactly two come to a complete stop.
Ans. (a) 0.1969, (b) 0.8031, (c) 0.2759
(9) Suppose one in 12 soft drink bottle tops are winners. If you randomly buy six
bottles of soft drink, what’s the probability you win at least once?
Ans. 1 − (11/12)^6 = 0.4067
(10) Suppose one in four soft drink bottle tops are winners. If you randomly buy
six bottles of soft drink, what’s the probability you win at least once?
Ans. 0.8220
1.5. Bayes’ Theorem

For the events A and B, we can write
A = (A ∩ B) ∪ (A ∩ B^C),
and
P (A) = P (A ∩ B) + P (A ∩ B^C).
Consequently,
P (B|A) = P (A ∩ B)/P (A)
        = P (A ∩ B)/[P (A ∩ B) + P (A ∩ B^C)]
        = P (A|B)P (B)/[P (A|B)P (B) + P (A|B^C)P (B^C)].
Bayes’ Theorem.
P (B|A) = P (A|B)P (B)/[P (A|B)P (B) + P (A|B^C)P (B^C)]
An application follows.
Example 1.12. A company has developed a new drug test that tests positive on
a drug user 99% of the time. It tests negative on non-drug users 99% of the time
also. If only 0.5% of the employees in a large company are drug users, what’s the
probability that a tested employee is actually a drug user if the test is positive?
Let A be the event that the employee tests positive and B the event that the
employee is a drug user. Then we need to compute P (B|A) to answer the question.
We have
P (B|A) = P (A|B)P (B)/[P (A|B)P (B) + P (A|B^C)P (B^C)]
        = (0.99)(0.005)/[(0.99)(0.005) + (0.01)(0.995)]
        = 0.332.
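The two-event form of Bayes’ theorem translates directly into code. In this sketch (the function name and argument order are ours), the drug-test numbers reproduce the answer above:

def bayes(p_b, p_a_given_b, p_a_given_not_b):
    # P(B|A) = P(A|B)P(B) / [P(A|B)P(B) + P(A|B^C)P(B^C)]
    numerator = p_a_given_b * p_b
    return numerator / (numerator + p_a_given_not_b * (1 - p_b))

print(bayes(0.005, 0.99, 0.01))   # 0.332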
Example 1.13. A certain university requires all its students to take the ACT
exam before admission. Some 25% of College Algebra students at this university
have an ACT math score of 26 or higher. Studies show that 90% of College Algebra
students who have a math ACT score of 26 or better pass the class. For those with
a math ACT score lower than 26, only 48% pass College Algebra. Assuming all the
given information is accurate, compute the probability a randomly selected College
Algebra student from the university in question will pass the class.
Let A be the event that the randomly chosen College Algebra student passes
the class, and let B be the event that the student made a 26 or higher on the math
ACT. Then the answer is
P (A) = P (A|B)P (B) + P (A|B^C)P (B^C) = (0.90)(0.25) + (0.48)(0.75) = 0.585.
More generally, suppose the events B1 , B2 , ..., Bn partition the sample space.
Then for each i,
P (Bi |A) = P (A ∩ Bi )/P (A)
          = P (A ∩ Bi )/P ((A ∩ B1 ) ∪ (A ∩ B2 ) ∪ · · · ∪ (A ∩ Bn ))
          = P (A ∩ Bi )/[P (A ∩ B1 ) + P (A ∩ B2 ) + · · · + P (A ∩ Bn )]
          = P (A|Bi )P (Bi )/[P (A|B1 )P (B1 ) + P (A|B2 )P (B2 ) + · · · + P (A|Bn )P (Bn )].
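The partition form is just as direct. The sketch below (our own helper, using the factory data from exercise (7) at the end of this section) computes P (Bi |A) for each factory given a defective phone:

def bayes_partition(priors, likelihoods, i):
    # priors[j] = P(B_j), likelihoods[j] = P(A|B_j); returns P(B_i|A).
    total = sum(p * l for p, l in zip(priors, likelihoods))  # P(A)
    return priors[i] * likelihoods[i] / total

priors = [0.20, 0.30, 0.50]        # shares of factories F1, F2, F3
likelihoods = [0.01, 0.02, 0.03]   # defect rates
print(bayes_partition(priors, likelihoods, 0))   # 0.0870, part (a)
print(bayes_partition(priors, likelihoods, 1))   # 0.2609, part (b)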
Exercises
(1) If A and B are two events in a sample space for which P (B) = 0.70, P (A|B) =
0.20, P (A|B^C) = 0.40, compute P (A).
Ans. 0.26
(2) If you deal two cards without replacement from a well shuffled deck of 52,
what’s the probability that the second will be red?
Ans. 0.5
(3) A blood test detects a certain disease 95% of the time when the disease is
present. However, the test is also positive 1% of the time when the disease
is not present. If 0.5% of the population actually has the disease, what’s the
probability a person has the disease given the test result is positive?
Ans. 0.3231
(4) Assume 1% of women at age forty have breast cancer. If 80% of women with
breast cancer will get positive mammographies and 9.6% of women without
breast cancer will also get positive mammographies, what’s the probability
that a 40 year old woman that has a positive mammography actually has
breast cancer?
Ans. 0.0776
(5) An insurance company learns that a potential customer is smoking a cigar.
The company also knows that 9.5% of males smoke cigars as do 1.7% of fe-
males. What’s the probability that the potential customer is male? To answer
this question, assume that half of the population is male.
Ans. Let A be the event that the customer is a cigar smoker and B be the
event that the customer is male. Then the answer is
P (B|A) = P (A|B)P (B)/[P (A|B)P (B) + P (A|B^C)P (B^C)]
        = (0.095)(0.5)/[(0.095)(0.5) + (0.017)(0.5)]
        = 0.8482
The insurance company can be about 85% sure that the cigar smoker is a male.
(6) Suppose that just five percent of men and a quarter of one percent of women
are color-blind and that there are an equal number of women and men. If a
color-blind person is chosen at random, what’s the probability that person is
male?
Ans. 0.9524
(7) Three different factories, F1 , F2 , and F3 , are used to manufacture a large
batch of cell phones. Suppose 20% of the phones are produced by F1 , 30%
are produced by F2 , and 50% by F3 . Suppose also that 1% of the phones
produced by F1 are defective, as are 2% of those produced by F2 and 3% of
those produced by F3 . If one phone is selected at random from the entire
batch and is found to be defective, what’s the probability it was produced by
(a) F1 and (b) F2 ?
Ans. (a) 0.0870
Chapter 2
Random Variables
P ({X ≥ 2}) = (3 + 1)/8 = 1/2,
since there are eight equally likely outcomes to the experiment, three of which result
in two heads and one of which results in three heads. It is customary to abbreviate
P ({X ≥ 2}) with P (X ≥ 2) for ease of writing, so that we have P (X ≥ 2) = 1/2.
The probability mass function p(x) of a discrete random variable X is defined by
p(x) = P (X = x).
Example 2.1. Illustrate that the first two properties listed above for probability
mass functions are true for the random variable X that counts the number of heads
you get when you toss a balanced coin three times.
If you toss a coin three times, then the sample space of outcomes is
S = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}.
Example 2.2. Assume the probability the home team wins the second game of
the NBA finals by fewer than four points is 0.17, that they win by at least four
points and fewer than seven points is 0.28, that they win by at least seven points
and fewer than 10 points is 0.25, that they win by at least 10 points is 0.10, and
that they lose is 0.20. Compute the probability that the home team wins the game
by at least four points.
If A represents the event that the home team wins by at least four points, then
P (A) = Σ_{x∈A} p(x) = 0.28 + 0.25 + 0.10 = 0.63.
Example 2.3. Illustrate that the first two properties listed above for probability
mass functions are true for the random variable X that counts the number of tosses
of a coin it takes for heads to come up.
For the penultimate equality we used the formula for the sum of a geometric series.
The reader might wish to review this formula in the sequences and series chapter
of a calculus textbook or the latter pages of a college algebra textbook.
Here is a related example.
Example 2.4. What’s the least number of times you would need to toss a coin to
have at least a 99% probability of getting a heads?
Tossing a coin once would result in a heads with probability 1/2. If you toss
a coin twice, there are four outcomes, three of which result in at least one heads.
If you toss a coin three times there are eight outcomes, seven of which result in
at least one heads. Continuing in this way, we obtain 2^k outcomes when the coin
is tossed k times, and all but one of those outcomes has at least one heads. We
therefore need to solve the inequality
(2^k − 1)/2^k > 0.99
to answer the question. We can rewrite this inequality as
1 − (1/2)^k > 0.99
or
(1/2)^k < 0.01.
Taking the natural logarithm of each side, we obtain
k ln(1/2) < ln(0.01)
or
k > ln(0.01)/ln(1/2) = 6.64.
We therefore need to toss the coin at least seven times.
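The same answer can be found by direct search, as this short check (ours) shows:

import math

k = 1
while 1 - (1/2) ** k <= 0.99:
    k += 1
print(k)                                  # 7
print(math.log(0.01) / math.log(1/2))     # 6.64..., the boundary value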
Exercises
(1) Suppose X is a random variable that counts the number of heads you get when
you toss a coin six times and that p(x) is the probability mass function for
X. There are methods, short of listing all outcomes in the sample space and
counting the number that have exactly two or exactly four heads, to establish
that p(2) = p(4) = 15/64. Deduce the values of p(k) for k = 0, 1, 5, and 6 by
inspection, and then compute p(3).
Ans. p(3) = 5/16
(2) Find the value of the constant c that makes p(x) a probability mass function
if p(x) is given by
p(x) = cx if x = 2, 4, 6, or 8, and p(x) = 0 otherwise.
Ans. c = 1/20
(3) For the random variable X with the probability mass function in the imme-
diately preceding problem, compute (a) P (X ≤ 4) and (b) P (X < 4).
Ans. (a) 0.3 (b) 0.1
(4) What’s the least number of times you would need to roll a die to have at least
a 90% probability of getting a six?
Ans. Solve the inequality 1 − (5/6)^n ≥ 9/10 to find that n must be at least 13.
(5) Let X be a random variable giving the sum obtained when you roll two dice.
Write out a rule for the probability mass function of X. Note that p(x) will
be a piecewise defined function with 12 pieces. Compute the probability you
get a sum of at least 10 by adding p(10), p(11), and p(12).
Ans.
p(x) = 1/36 if x = 2 or 12,
       2/36 if x = 3 or 11,
       3/36 if x = 4 or 10,
       4/36 if x = 5 or 9,
       5/36 if x = 6 or 8,
       6/36 if x = 7,
       0 otherwise.
The probability of getting a sum of at least 10 is 1/6.
(6) Can a discrete random variable X have a probability mass function with the
rule p(x) = 1/x if x is an integer greater than or equal to 2, and p(x) = 0
otherwise? Explain your answer.
2.2. Continuous Random Variables

The function f (x) = 2e^(−2x) for x ≥ 0, and f (x) = 0 otherwise, is a probability
density function:
1) Since e raised to any power is positive, 2e^(−2x) is always positive and f (x) ≥ 0
for all x.
2) ∫_{−∞}^{∞} f (x) dx = ∫_0^{∞} 2e^(−2x) dx = −e^(−2x) |_0^{∞} = 0 − (−e^0) = 1.
so that
∫_a^b f (x) dx = ∫_{−∞}^b f (x) dx − ∫_{−∞}^a f (x) dx
             = P (X ≤ b) − P (X ≤ a)
             = P ({X ≤ a} ∪ {a < X ≤ b}) − P (X ≤ a)
             = P (X ≤ a) + P (a < X ≤ b) − P (X ≤ a)
             = P (a < X ≤ b).
Since P (X = b) = 0, we have that P (a < X < b) = ∫_a^b f (x) dx. Similar
derivations yield the following formulas:
Formulas Involving Probability Density Functions.
P (a < X < b) = P (a ≤ X < b) = P (a < X ≤ b) = P (a ≤ X ≤ b) = ∫_a^b f (x) dx
P (X ≤ c) = P (X < c) = ∫_{−∞}^c f (x) dx
P (X ≥ c) = P (X > c) = ∫_c^{∞} f (x) dx
Exercises
If x < 0, then the event {X ≤ x} cannot occur since it’s not possible to get
fewer than zero heads. Consequently, F (x) = P (X ≤ x) = 0.
If 0 ≤ x < 1, then {X ≤ x} is actually the event that zero heads occur, so we
have that F (x) = P (X ≤ x) = 1/8.
If 1 ≤ x < 2, then {X ≤ x} is the event that 0 or 1 heads come up, so
F (x) = 4/8 = 1/2.
If 2 ≤ x < 3, then {X ≤ x} is the event that 0, 1, or 2 heads come up, so
F (x) = 7/8.
Finally, if x ≥ 3, {X ≤ x} is the event that 0, 1, 2, or 3 heads come up, and
we have F (x) = 1.
Note that the function jumps upward at the values of x that X takes with positive
probability. Between these x values F (x) is constant. We will see later that CDF’s
for continuous random variables have no such jumps.
That is, F (x) is the cumulative area under the curve up to x. One will recall that the
Fundamental Theorem of Calculus says that F ′ (x) = f (x) if the function f (x) itself
is continuous.
Example 2.9. Find the CDF for the RV X that has pdf
f (x) = 1/10 if 10 ≤ x ≤ 20, and f (x) = 0 otherwise.
Note that F (x) is constantly zero to the left of 10, constantly one to the right of
20 and a line segment connecting (10, 0) to (20, 1) when x is between 10 and 20.
We now list some general properties of distribution functions that can be de-
rived from the definition F (x) = P (X ≤ x).
Since the CDF steps up from 0 to 1/4 at x = 2, we have that f (2) = 1/4.
The CDF is then constant on [2, 6) and jumps 3/4 − 1/4 = 1/2 at x = 6. Hence,
f (6) = 1/2. Using the last of the properties of CDF’s, we could write this as
f (6) = F (6) − lim_{x→6⁻} F (x) = 3/4 − 1/4 = 1/2.
The last jump is of magnitude 1 − 3/4 = 1/4 at x = 7, giving us f (7) = 1/4. The
probability mass function is therefore given by
f (x) = 1/4 if x = 2,
        1/2 if x = 6,
        1/4 if x = 7,
        0 otherwise.
Example 2.11. Find a rule for the CDF of the random variable X that has pdf
f (x) = 1/[π(1 + x^2)].
Then use the CDF to compute P (X < √3) and P (−1 ≤ X ≤ 1).
We have that
F (x) = ∫_{−∞}^x 1/[π(1 + t^2)] dt
      = (1/π) tan⁻¹ t |_{−∞}^x
      = (1/π)(tan⁻¹ x − (−π/2))
      = 1/2 + (1/π) tan⁻¹ x.
Since F (x) = 1/2 + (1/π) tan⁻¹ x, we have that
P (X < √3) = F (√3)
           = 1/2 + (1/π)(π/3)
           = 1/2 + 1/3
           = 5/6.
For example, the 75th percentile of the random variable from the last example
is given by the value of a that solves F (a) = 0.75. We solve
1/2 + (1/π) tan⁻¹ a = 0.75
and obtain tan⁻¹ a = π/4, or a = 1.
The reader should note that the median of a continuous random variable is
nothing more than its fiftieth percentile.
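The CDF F (x) = 1/2 + (1/π) tan⁻¹ x derived above is easy to evaluate numerically; this check (ours) confirms the example, supplies P (−1 ≤ X ≤ 1), and inverts F for the 75th percentile:

import math

def F(x):
    return 0.5 + math.atan(x) / math.pi

print(F(math.sqrt(3)))                    # 5/6 = 0.8333...
print(F(1) - F(-1))                       # P(-1 <= X <= 1) = 0.5
print(math.tan(math.pi * (0.75 - 0.5)))   # 75th percentile: 1.0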
Exercises
(1) Can
F (x) = 6x(1 − x) if 0 ≤ x ≤ 1, and F (x) = 0 otherwise,
be a CDF? Explain why.
Ans. No. This function is actually decreasing on the interval (1/2, 1).
(2) Suppose X is a random variable for which P (X = 0) = 1/4 and P (X = 1) =
3/4. Find a rule for X’s CDF F (x). Then graph this CDF and compute
F (−2), F (0), F (0.7),
and F (1.2).
Ans. F (x) = 0 if x < 0, 1/4 if 0 ≤ x < 1, and 1 if x ≥ 1; F (−2) = 0,
F (0) = 1/4, F (0.7) = 1/4, F (1.2) = 1
(3) For the previous problem, compute (a) P (X ≤ 2) and (b) P (X > 1).
Ans. (a) F (2) = 1, (b) 1 − P (X ≤ 1) = 1 − F (1) = 1 − 1 = 0
(4) If we let the random variable X count the number of heads you get when you
toss a balanced coin 10 times, then the cumulative distribution function of X
is as given in the table. When reading the table, take the function’s value to
be constant from one integer up to all values less than the next larger integer.
For example, F (x) = 0.172 if 3 ≤ x < 4, and F (x) = 0.377 if 4 ≤ x < 5.
Compute the probability that (a) you get at most three heads, (b) you get at
least six heads, and (c) you get exactly 7 heads.
x F(x)
0 0.001
1 0.011
2 0.055
3 0.172
4 0.377
5 0.623
6 0.828
7 0.945
8 0.989
9 0.999
10 1.000
Ans. (a) P (X ≤ 3) = F (3) = 0.172, (b) P (X ≥ 6) = 1 − P (X < 6) =
1 − P (X ≤ 5) = 1 − F (5) = 1 − 0.623 = 0.377 (c) P (X = 7) = F (7) −
lim_{x→7⁻} F (x) = 0.945 − 0.828 = 0.117
(5) Graph and find a rule for the CDF of the random variable with pdf
f (x) = 1/x^2 for x ≥ 1, and f (x) = 0 otherwise.
Ans. F (x) = 0 if x < 1, and F (x) = 1 − 1/x if x ≥ 1
(6) For the random variable with pdf f (x) = e^(−x) for x ≥ 0, and f (x) = 0
otherwise, compute (a) P (X < 2), (b) P (1 < X < 2), (c) P (X > 1).
Ans. (a) 1 − e^(−2), (b) e^(−1) − e^(−2), (c) e^(−1)
(7) Compute the 90th percentile of X to four decimal places in the previous
problem.
Ans. 2.3026
(8) For the random variable with pdf
f (x) = 6x(1 − x) if 0 ≤ x ≤ 1, and f (x) = 0 otherwise,
show that the 60th percentile a of X satisfies
10a^3 − 15a^2 + 3 = 0.
(9) Suppose X is a discrete random variable with probability mass function
p(x) = c/x^2 for x = 1, 2, 3, 4, 5, where the constant c makes p a legitimate
pmf, and let F (x) be the CDF of X. Compute F (4).
Ans. F (4) = P (X ≤ 4) = 1 − P (X = 5) = 1 − 3600/(5269 · 5^2) = 0.9727.
If g(x) is a real valued function on the reals and X is a random variable, then
g(X) is a random variable too. We define the random variable Y by Y = g(X)
and compute
E(g(X)) = E(Y )
        = Σ_y y P (g(X) = y)
        = Σ_y y Σ_{g(x)=y} p(x)
        = Σ_y Σ_{g(x)=y} y p(x)
        = Σ_x g(x) p(x).
Example 2.14. If X is a discrete random variable taking the values −6, 6, and 12,
with P (X = −6) = 1/2, P (X = 6) = 1/3, and P (X = 12) = 1/6, and Y is the
random variable given by Y = X^2 (i.e., Y = g(X) where g(x) = x^2), we compute
E(X^2).
Note that the random variable Y takes the two values 36 and 144. The probability
Y takes the value 36 is the probability that X = −6 or 6, which is 1/2 + 1/3 = 5/6.
The probability that Y = 144 is the probability that X = 12, which is 1/6. Hence,
E(Y ) = 36 · (5/6) + 144 · (1/6) = 30 + 24 = 54.
Using the formula we derived above, we have
E(Y ) = E(X^2) = (−6)^2 · (1/2) + (6)^2 · (1/3) + (12)^2 · (1/6) = 18 + 12 + 24 = 54.
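Both routes to E(X^2) are two lines of code each. A sketch (ours) for Example 2.14:

from collections import defaultdict

values = [-6, 6, 12]
probs = [1/2, 1/3, 1/6]

# Route 1: E(g(X)) = sum of g(x)p(x) with g(x) = x^2.
print(sum(x**2 * p for x, p in zip(values, probs)))   # 54.0

# Route 2: build the distribution of Y = X^2 first.
p_y = defaultdict(float)
for x, p in zip(values, probs):
    p_y[x**2] += p
print(sum(y * p for y, p in p_y.items()))             # 54.0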
A particular formula that proves quite useful is for the linear function g(x) =
ax + b, where a and b are constants. We have
E(aX + b) = aE(X) + b.
Since Y = 10X, where X is the gain on a $100 bet on red, we can use our work
in the roulette example above to obtain E(Y ) = 10E(X) ≈ −52.63.
and the standard deviation of X, SD(X), is defined to be the square root of the
variance. We provide a formula for the variance in the case that X is discrete.
V (X) = Σ_x (x − µ)^2 p(x), where µ = E(X).
Using the fact that E(aX + b) = aE(X) + b, we can establish a more computa-
tionally friendly formula for V (X). We have that
V (X) = E(X^2) − (E(X))^2.
Example 2.16. For the random variable X above for which P (X = −20) = 0.85,
P (X = 100) = 0.14, and P (X = 500) = 0.01, compute V (X).
We carry out the computation via the regular formula and then via the shortcut
formula. Since µ = E(X) = (−20)(0.85) + (100)(0.14) + (500)(0.01) = 2, the
regular formula gives
V (X) = Σ_x (x − µ)^2 p(x)
      = (−20 − 2)^2 (0.85) + (100 − 2)^2 (0.14) + (500 − 2)^2 (0.01)
      = (484)(0.85) + (9604)(0.14) + (248004)(0.01)
      = 4236.
The shortcut formula gives the same value:
V (X) = E(X^2) − (E(X))^2
      = (−20)^2 (0.85) + (100)^2 (0.14) + (500)^2 (0.01) − 2^2
      = 340 + 1400 + 2500 − 4
      = 4236.
For the continuous random variable X with pdf f (x) = x/2 for 0 ≤ x ≤ 2, and
f (x) = 0 otherwise, compute E(X). We have
E(X) = ∫_{−∞}^{∞} x f (x) dx
     = ∫_0^2 x · (x/2) dx
     = ∫_0^2 (x^2/2) dx
     = x^3/6 |_0^2
     = 4/3.
If the expected value of X exists and X only takes nonnegative values, then
there’s an alternative formula for computing E(X) that involves tail probabilities.
Note that
E(X) = ∫_0^{∞} x f (x) dx
     = ∫_0^{∞} ( ∫_0^x 1 dy ) f (x) dx
     = ∫_0^{∞} ∫_y^{∞} f (x) dx dy
     = ∫_0^{∞} P (X > y) dy.
Example 2.18. Use this probability tails formula to compute the expected value
of the continuous random variable X with probability density function
f (x) = 2/x^3 if x ≥ 1, and f (x) = 0 if x < 1.
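For this density, P (X > y) = 1 for y < 1 and P (X > y) = 1/y^2 for y ≥ 1, so the tails formula gives E(X) = 1 + 1 = 2. A crude Riemann-sum check (ours) of that computation:

def tail(y):
    return 1.0 if y < 1 else 1.0 / y**2

dy = 0.01
print(sum(tail(k * dy) * dy for k in range(200_000)))   # about 2.0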
This is the integral version of the sum formula we have for discrete random variables.
As was the case with discrete random variables, we have that
E(aX + b) = aE(X) + b
for the constants a and b, even when X is continuous. The variance V (X) of a
continuous random variable X is defined by V (X) = E((X − µ)2 ) just as in the
discrete case.
For the random variable X above with pdf f (x) = x/2 for 0 ≤ x ≤ 2, compute
V (X). Since E(X^2) = ∫_0^2 x^2 · (x/2) dx = x^4/8 |_0^2 = 2 and E(X) = 4/3,
we therefore have that V (X) = 2 − (4/3)^2 = 2/9.
Exercises
x f(x)
0 0.001
1 0.010
2 0.044
3 0.117
4 0.205
5 0.246
6 0.205
7 0.117
8 0.044
9 0.010
10 0.001
Ans. (a) 5 (b) 2.5
(3) An insurance company offers an automobile policy structured as follows: The
company makes $600 with probability 0.95, it loses $300 with probability 0.03,
and it loses $20, 000 with probability 0.02. If X represents the company’s gain
on one of these policies, compute E(X).
Ans. $161
(4) Suppose X is a continuous random variable with pdf
f (x) = 1/5 for 3 ≤ x ≤ 8, and f (x) = 0 otherwise.
Compute (a) E(X) and (b) V (X).
Ans. (a) 5.5 (b) 25/12
(5) Suppose X is a continuous random variable with pdf
f (x) = 0 for x < 1, and f (x) = 3/x^4 if x ≥ 1.
Compute (a) E(X) and (b) SD(X).
Ans. (a) 3/2 (b) √3/2
(6) Suppose X is a continuous random variable with pdf
f (x) = 0 for x < 0, and f (x) = 3e^(−3x) if x ≥ 0.
Compute (a) E(X) and (b) V (X).
Ans. (a) 1/3 (b) 1/9
(7) Compute E(X) if X is a continuous random variable with pdf
f (x) = 1/[π(1 + x^2)].
Ans. E(X) does not exist.
(8) Compute SD(X) if X is a random variable representing a gambler’s gain on
a $100 roulette bet on red.
Ans. 99.8614
(9) Use the probability tails formula to compute the expected value of the con-
tinuous random variable X with probability density function
f (x) = e^(−x) if x ≥ 0, and f (x) = 0 if x < 0.
(10) Suppose X is a continuous random variable with cumulative distribution
function
F (x) = 0 if x < 0, x^2/4 if 0 ≤ x < 2, and 1 if x ≥ 2.
Compute (a) E(X) and (b) SD(X).
Ans. (a) 4/3 (b) √2/3
Chapter 3
Widely Used Discrete Random Variables

For the positive integer n, n factorial is defined by
n! = n(n − 1)(n − 2) · · · 1.
For the special case that n = 0, we have the definition 0! = 1. Factorials grow very
quickly. It’s the case that 6! = 720, 10! = 3,628,800, and 20! ≈ 2.433 × 10^18. A
decent calculator will compute factorials. With the TI-84 for example, to get 12!
input 12, select MATH-PRB-!, and press ENTER, to get 479,001,600.
We noted the number of four digit numbers is 10, 000. This gives us the number
of four digit PINs. If we want to count the number of four digit PINs with distinct
numerals (i.e. where none of the numerals in the PIN repeat), we could use the
multiplication rule to get
10 × 9 × 8 × 7 = 5, 040.
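Counts like these can be confirmed by brute force with the standard library (a sketch; the variable names are ours):

from itertools import permutations
from math import factorial

digits = "0123456789"
print(len(list(permutations(digits, 4))))   # 5040 PINs with distinct numerals
print(factorial(10) // factorial(6))        # 5040 again, as 10!/6!
print(10 ** 4)                              # 10000 four digit PINs in all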
Exercises
(1) A certain state’s license plates numbers consists of three letters A-Z followed
by three numerals 0-9. Compute the total number of license plates numbers
this state can have.
Ans. 17,576,000
(2) How many subsets with two elements does the set {a, b, c, d} have? List them.
Ans. C(4, 2) = 6. The subsets are {a, b}, {a, c}, {a, d}, {b, c}, {b, d}, {c, d}
(3) How many permutations of size two does abcd have? List them.
Ans. 4P2 = 12. The permutations are ab, ba, ac, ca, ad, da, bc, cb, bd, db, cd, dc
3.2. Bernoulli
Perhaps the simplest of all discrete random variables used for modeling purposes
is the Bernoulli. It’s named after Jacob Bernoulli, a Swiss mathematician from the
latter half of the seventeenth century. The random variable X is said to have a
Bernoulli distribution with parameter p if P (X = 1) = p and P (X = 0) = 1−p. The
parameter p of course has to be a positive number less than 1. Simple computations
will produce the mean and standard deviation of the Bernoulli. We have
E(X) = 0 · (1 − p) + 1 · p = p
and
SD(X) = √(E(X^2) − (E(X))^2) = √(0^2 · (1 − p) + 1^2 · p − p^2) = √(p(1 − p)).
3.3. Binomial
Suppose only 15% of airline passengers arriving at an international airport are
chosen for complete baggage scrutiny and that these passengers are selected at
random. What’s the probability that exactly three of the next ten passengers will
be selected? To answer this question we first deduce the probability that, of the
next ten passengers, the first three are selected and the following seven are not.
Since whether or not one passenger is selected is independent of whether or not
another is selected, we have that this probability is
(0.15)^3 (0.85)^7.
Now there are many different ways exactly three of the ten passengers can be
selected. It could be the first three or the last three, or it could be the third, fifth,
and ninth, etc. The number of such ways is equal to the number of subsets of three
elements there are in a set of ten elements. In other words, there are
C(10, 3) = 120
ways. Each one of the ways is equally likely, so we have that the answer to our
question is
C(10, 3)(0.15)^3 (0.85)^7 = 0.1298.
In general, if X counts the number of successes in n independent trials, each of
which succeeds with probability p, then
P (X = k) = C(n, k) p^k (1 − p)^(n−k)
for k = 0, 1, 2, ..., n. A random variable having this probability mass function is said
to be binomial with parameters n and p.
We used the Binomial Theorem for the penultimate equality. A similar compu-
tation along with some clever algebra allows us to compute E(X^2) = n(n − 1)p^2 + np.
We have
E(X^2) = E(X(X − 1) + X)
       = Σ_{k=0}^{n} k(k − 1) C(n, k) p^k (1 − p)^(n−k) + E(X)
       = Σ_{k=2}^{n} k(k − 1) C(n, k) p^k (1 − p)^(n−k) + np
       = n(n − 1)p^2 Σ_{k=2}^{n} [(n − 2)!/((k − 2)!(n − k)!)] p^(k−2) (1 − p)^((n−2)−(k−2)) + np
       = n(n − 1)p^2 Σ_{k=0}^{n−2} [(n − 2)!/(k!((n − 2) − k)!)] p^k (1 − p)^((n−2)−k) + np
       = n(n − 1)p^2 (p + 1 − p)^(n−2) + np
       = n(n − 1)p^2 + np.
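The binomial pmf is a one-liner with math.comb. This sketch (ours) reproduces the airport example and part (a) of exercise (1) below:

from math import comb

def binomial_pmf(n, p, k):
    # C(n, k) p^k (1 - p)^(n - k)
    return comb(n, k) * p**k * (1 - p) ** (n - k)

print(binomial_pmf(10, 0.15, 3))   # 0.1298, the baggage-scrutiny question
print(binomial_pmf(10, 0.5, 5))    # 0.2461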
Exercises
(1) If you toss a fair coin 10 times, what’s the probability you get (a) exactly
five heads, (b) exactly three heads, and (c) exactly seven heads? Round your
answers off to four decimal places.
Ans. (a) 0.2461 (b) 0.1172 (c) 0.1172
(2) If you roll a balanced die 20 times, what’s the probability you get (a) exactly
four 6’s? (b) at least two 6’s?
Ans. (a) 0.2022 (b) 0.8696
(3) If you take a 10 question true false exam by guessing on each question, what’s
the probability (a) you get exactly seven questions correct? (b) you pass the
exam by getting at least 60% of the questions correct?
Ans. (a) 120/1024 = 0.1172, (b) 0.377
(4) Assuming it’s equally likely a couple will have a boy or a girl, what’s the
probability that a couple having five children will have (a) exactly two boys ?
(b) all boys?
Ans. (a) 0.3125 (b) 0.03125
(5) Only 15% of motorists come to a complete stop at a certain four way stop
intersection. What’s the probability that of the next ten motorists to go
through that intersection (a) none come to a complete stop, (b) at least one
comes to a complete stop, and (c) exactly two come to a complete stop.
Ans. (a) 0.1969, (b) 0.8031, (c) 0.2759
(6) Given X is a binomial random variable with n = 20 and p = 3/4, compute
F (18) where F (x) is the CDF of X.
Ans. F (18) = P (X ≤ 18) = 1 − P (X = 19 or 20) = 0.9757
(7) Given X is a binomial random variable with n = 10 and p = 1/2, compute
P (µ − σ ≤ X ≤ µ + σ) where µ is the expected value and σ the standard
deviation of X.
Ans. P (µ − σ ≤ X ≤ µ + σ) = P (5 − √2.5 ≤ X ≤ 5 + √2.5) = P (X =
4, 5, or 6) = 0.65625.
(8) Given X is a binomial random variable with mean 6 and standard deviation
2, compute P (X = 5).
Ans. 0.1812
3.4. Hypergeometric
Another discrete random variable that is closely related to the binomial is the
hypergeometric. For this model, there are a number of trials where the probability
of success in each trial changes depending on what happens in previous trials.
Suppose we select n objects from a lot of N objects without replacement. Suppose,
moreover that M of the objects in the lot of N are of a characteristic of interest.
We can then compute the probability that k of the n objects we select are of
the characteristic. Letting X count the number of objects of the characteristic of
interest in the selection of n objects, we have
P (X = k) = C(M, k) C(N − M, n − k)/C(N, n).
Example 3.1. Suppose five cards are dealt without replacement from a well-
shuffled deck of 52 playing cards. What’s the probability exactly two of the five are
aces?
P (X = 2) = C(4, 2) C(52 − 4, 5 − 2)/C(52, 5) = 0.0399.
E(X) = n(M/N)
and
V (X) = n(M/N)(1 − M/N)[(N − n)/(N − 1)].
It’s interesting to note that if M → ∞ and N → ∞ in such a way that M/N stays
constant, say M/N = p, then E(X) = np and lim_{M,N→∞} V (X) = np(1 − p). Indeed,
when M and N are really large, M/N changes very little from selection to selection
in a small sample from the lot, so that the hypergeometric is approximated by
the binomial with p = M/N. After summarizing the important formulas for the
hypergeometric, we illustrate with an example.
Example 3.2. Suppose a city has 322,000 registered voters, 58% of whom support
a certain referendum. In a random sample of 20 voters from that city, what’s the
probability that exactly 12 support the referendum?
Letting X count the number in the sample who support the referendum, we
have that X is hypergeometric with N = 322, 000, M = (0.58)(322, 000) = 186, 760,
and n = 20. The answer is therefore
P (X = 12) = C(186760, 12) C(322000 − 186760, 20 − 12)/C(322000, 20) = 0.1774.
Since M and N are so large, the binomial with n = 20 and p = 0.58 gives us a good
approximation:
P (X = 12) ≈ C(20, 12)(0.58)^12 (1 − 0.58)^(20−12) = 0.1768.
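Both computations in Example 3.2 are quick to verify (a sketch with our own helper name; Python's integer arithmetic handles the huge binomial coefficients exactly):

from math import comb

def hypergeom_pmf(N, M, n, k):
    # C(M, k) C(N - M, n - k) / C(N, n)
    return comb(M, k) * comb(N - M, n - k) / comb(N, n)

print(hypergeom_pmf(52, 4, 5, 2))               # 0.0399, Example 3.1
print(hypergeom_pmf(322_000, 186_760, 20, 12))  # 0.1774
print(comb(20, 12) * 0.58**12 * 0.42**8)        # 0.1768, binomial approximation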
Exercises
3.5. Poisson
The Poisson distribution - named after the great French mathematician and physi-
cist Simeon Poisson of the early nineteenth century - arises as a discrete limiting
model of the binomial. We let X be binomial with parameters n and p and consider
a limiting version of this random variable where n → ∞ and p → 0 in such a way
that np stays constant, say np = λ. We note that for k = 0, 1, 2, ..., n,
lim_{n→∞, p→0} P (X = k)
  = lim_{n→∞, p→0} C(n, k) p^k (1 − p)^(n−k)
  = lim_{n→∞, p→0} [n(n − 1) · · · (n − k + 1)/k!] (λ/n)^k (1 − λ/n)^(n−k)
  = (λ^k/k!) lim_{n→∞, p→0} [n(n − 1) · · · (n − k + 1)/n^k] (1 − λ/n)^(n−k)
  = (λ^k/k!) lim_{n→∞, p→0} (1)(1 − 1/n)(1 − 2/n) · · · (1 − (k − 1)/n)(1 − λ/n)^n (1 − λ/n)^(−k)
  = (λ^k/k!)(1)(e^(−λ))(1)
  = e^(−λ) λ^k/k!.
The factor immediately preceding e^(−λ) in the second to last line in fact has limit 1
since it consists of the finite number k of factors, each having limit 1.
Accordingly, a random variable X is said to be Poisson with parameter λ if
P (X = k) = e^(−λ) λ^k/k!
for k = 0, 1, 2, 3, .... As the limiting derivation above shows, the Poisson with
parameter λ = np approximates a binomial random variable with parameters n
and p if n is large and p is close to zero. In practice, this approximation is good if
n ≥ 100, p ≤ 0.01, and np ≤ 20.
E(X) = V (X) = λ.
Note
E(X) = Σ_{k=0}^{∞} k · e^(−λ) λ^k/k!
     = Σ_{k=1}^{∞} k · e^(−λ) λ^k/k!
     = λe^(−λ) Σ_{k=1}^{∞} λ^(k−1)/(k − 1)!
     = λe^(−λ) Σ_{k=0}^{∞} λ^k/k!
     = λe^(−λ) e^λ
     = λ.
Also,
E(X(X − 1)) = Σ_{k=0}^{∞} k(k − 1) · e^(−λ) λ^k/k!
            = Σ_{k=2}^{∞} k(k − 1) · e^(−λ) λ^k/k!
            = λ^2 e^(−λ) Σ_{k=2}^{∞} λ^(k−2)/(k − 2)!
            = λ^2 e^(−λ) Σ_{k=0}^{∞} λ^k/k!
            = λ^2 e^(−λ) e^λ
            = λ^2.
We provide here the basic properties of the Poisson before applying them to an
example problem.
The Poisson random variable X with parameter λ has the probability mass
function
P (X = k) = e^(−λ) λ^k/k!,
for k = 0, 1, 2, 3, .... The mean and standard deviation are E(X) = λ and
SD(X) = √λ.
Example 3.3. If 0.8% of the population has a certain disease, compute the proba-
bility that exactly three of two hundred randomly chosen individuals will have the
disease. Carry out this computation to four decimal places using (a) the binomial
distribution and (b) the Poisson distribution.
We let X count the number of individuals in the sample to have the disease.
Then (a) according to the binomial model we have that
P (X = 3) = C(200, 3)(0.008)^3 (1 − 0.008)^(200−3) = 0.1382,
and (b) according to the Poisson model we have that
P (X = 3) ≈ e^(−200(0.008)) [(200)(0.008)]^3/3! = 0.1378.
You’ll note that the approximation from the Poisson is good to three decimal places.
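In code the comparison looks like this (a sketch; the names are ours):

from math import comb, exp, factorial

n, p, k = 200, 0.008, 3
lam = n * p
print(comb(n, k) * p**k * (1 - p) ** (n - k))   # 0.1382, exact binomial
print(exp(-lam) * lam**k / factorial(k))        # 0.1378, Poisson approximation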
The second is the probability that N (t) = k and that two or more events occur in
at least one subinterval. The reader will note that the second of these probabilities
is bounded above by
Σ_{k=1}^{n} o(t/n) = n · o(t/n) = t · [o(t/n)/(t/n)].
We provide an application.
Example 3.4. The number of tornadoes touching down per year in the two parish
Caddo/Bossier region is Poisson with mean 19. Compute (a) the probability that
there will be exactly 16 tornadoes to touch down in this region next year and (b)
that there will be exactly 40 to touch down over the next two years.
Exercises
3.6. Geometric
Suppose only eight percent of New Orleans residents attended the last Saints foot-
ball game. If you randomly select residents of the city trying to find one who
attended the game, what’s the probability that you don’t encounter an attendee
until the sixth selection? For this to happen, the first five selected would have had
to be non-attendees and the sixth an attendee. The probability would therefore be
(0.92)^5 (0.08) = 0.0527.
The fact that residents are selected randomly makes the selections independent,
allowing us to arrive at the answer by multiplying the probabilities of each of the
five non-attendees times the probability of the subsequent attendee. Those
who study random phenomena use what is called the geometric random variable to
model this situation.
We consider a succession of independent trials, each of which results in success
with probability p. The geometric random variable X counts the number of trials
required to encounter the first success. In the example just related, we have that
P (X = 6) = 0.0527. In general we have:
For the geometric random variable X with success probability p,
P (X = n) = (1 − p)^(n−1) p,
for n = 1, 2, 3, ....
The reader will note that the probability mass function is in fact legitimate
since the probabilities of all the values X can take sum to 1:
Σ_{n=1}^{∞} (1 − p)^(n−1) p = p Σ_{n=1}^{∞} (1 − p)^(n−1) = p · 1/(1 − (1 − p)) = 1.
To compute the mean, note that
E(X) = Σ_{n=1}^{∞} n(1 − p)^(n−1) p
     = p + Σ_{n=2}^{∞} n(1 − p)^(n−1) p
     = p + (1 − p) Σ_{n=2}^{∞} n(1 − p)^(n−2) p
     = p + (1 − p) Σ_{n=1}^{∞} (n + 1)(1 − p)^(n−1) p
     = p + (1 − p)(E(X) + Σ_{n=1}^{∞} (1 − p)^(n−1) p)
     = p + (1 − p)(E(X) + p · 1/(1 − (1 − p)))
     = p + (1 − p)(E(X) + 1)
     = (1 − p)E(X) + 1.
Solving E(X) = (1 − p)E(X) + 1 for E(X) yields E(X) = 1/p. Computing the
variance of X requires a bit more ingenuity. The classic derivation involves writing
X^2 = X(X − 1) + X to get a value for E(X^2).
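The formula E(X) = 1/p is easy to corroborate by simulation (our sketch, with p = 0.08 from the Saints example, so the sample mean should be near 1/0.08 = 12.5):

import random

def geometric_trial(p):
    n = 1
    while random.random() >= p:   # failure, with probability 1 - p
        n += 1
    return n

samples = [geometric_trial(0.08) for _ in range(100_000)]
print(sum(samples) / len(samples))   # about 12.5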
Exercises
(1) When flipping a balanced coin, what’s the probability that you get the first
heads on the fourth try?
Ans. 1/16
(2) If X is a random variable counting the number of times you have to flip
a balanced coin to get the first heads, (a) determine the probability mass
function for X, (b) compute E(X), and (c) compute V (X).
Ans. p(n) = 1/2^n, E(X) = V (X) = 2
3.7. Negative Binomial
A technique we introduce later will allow us to derive the formulas for the mean
and standard deviation. We summarize in the table.
For the negative binomial random variable X counting the number of in-
dependent trials (each with success probability p) necessary to obtain r
successes,
P (X = n) = C(n − 1, r − 1) p^r (1 − p)^(n−r),
for n = r, r + 1, r + 2, .... The mean and standard deviation are E(X) = r/p and
SD(X) = √(r(1 − p))/p.
Exercises
(1) What’s the probability you have to roll a die eight times in order to get two
6’s?
Ans. 0.0651
(2) What’s the probability you have to roll a die 15 times in order to get four 1’s?
Ans. 0.0378
(3) A real estate analyst knows that in a certain county 23% of the houses have
a selling price higher than $300, 000. What’s the probability that during the
next month in that county, the fifth house sold at a price of more than $300, 000
is the twentieth house sold that month?
Ans. 0.0495
Chapter 4
Widely Used Continuous Random Variables
4.1. Uniform
In previous chapters we studied theoretical continuous random variables just to
gain an understanding of probability density functions. Now we look at several
continuous random variables that are used extensively to model random phenom-
ena.
The uniform distribution on the interval [a,b], a < b, is given by the random
variable X with probability density function
f (x) = 1/(b − a) for a ≤ x ≤ b, and f (x) = 0 otherwise.
Note that f (x) is in fact a probability density function since f (x) ≥ 0 for all x and
∫_{−∞}^{∞} f (x) dx = ∫_a^b 1/(b − a) dx = [1/(b − a)] x |_a^b = (b − a)/(b − a) = 1.
In one of the exercises, the reader is asked to show that the random variable’s
mean and variance are (a + b)/2 and (b − a)^2/12, respectively. We box in the
basic formulas for the uniform distribution before presenting an example.
Example 4.1. When a clock’s battery runs out, the location at which the minute
hand stops is uniform with pdf
f (x) = 1/12 for 0 ≤ x ≤ 12, and f (x) = 0 otherwise.
The 0 and 12 refer to the hours on the clock face, of course. Compute the probability
that the minute hand stops somewhere between 10 and 11 o’clock.
P (10 < X < 11) = ∫_{10}^{11} (1/12) dx = x/12 |_{10}^{11} = 1/12.
Exercises
(1) Given that X is a uniform random variable on the interval [−15, 185], compute
(a) P (X < 0), (b) P (10 < X < 50), and (c) P (X = 85).
Ans. (a) 0.075 (b) 0.2 (c) 0
(2) Compute the 75th percentile of X in the previous problem.
Ans. 135
(3) If the random variable X is uniform on [a, b], show that (a) E(X) = (a + b)/2
and (b) V (X) = (b − a)^2/12.
(4) Use the formulas in the previous problem to compute (a) the mean and (b)
the standard deviation of the random variable in the first exercise in this set.
Ans. (a) 85 (b) 57.73503
(5) If F (x) is the CDF of a uniform random variable on the interval [2, 6], compute
(a) F (3), (b) F (4), and (c) F (100).
Ans. (a) 0.25 (b) 0.5 (c) 1
(6) Find a rule for the function F (x) in the previous exercise.
Ans. F (x) = 0 if x < 2, (x − 2)/4 if 2 ≤ x < 6, and 1 if x ≥ 6
(7) Find a rule for the CDF F (x) of the random variable X that is uniform on
the interval [a, b]. The rule should consist of the three pieces where x < a,
a ≤ x < b, and x ≥ b.
Ans. F (x) = 0 if x < a, (x − a)/(b − a) if a ≤ x < b, and 1 if x ≥ b
4.2. Exponential
The random variable X with parameter λ > 0 is said to be an exponential random
variable if it has probability density function
f (x) = λe^(−λx) for x ≥ 0, and f (x) = 0 otherwise.
Many phenomena, such as life times of organisms and interarrival times of
customers at a store or calls made to a telephone company, can be modeled with
the exponential distribution.
The reader will recall that we studied the random variable N (t) that counts
the number of events occurring during the time interval [0, t]. We imposed the
assumptions that (1) the probability that a single event occurs in the time interval
is λt plus a term that is small in relation to t, (2) the probability that two or more
events occur in the interval is small in relation to t, and (3) that which occurs in one
interval has no probability effect on what happens in another disjoint interval. The
resulting probability mass function at which we arrived for N (t) was the Poisson:
P (N (t) = k) = e^(−λt) (λt)^k/k!
for k = 0, 1, 2, ....
We now consider the time T on an interval up until which the first event occurs.
We have that
FT (t) = P (T ≤ t)
       = 1 − P (T > t)
       = 1 − P (N (t) = 0)
       = 1 − e^(−λt) (λt)^0/0!
       = 1 − e^(−λt).
Consequently,
fT (t) = F′T (t) = (d/dt)(1 − e^(−λt)) = λe^(−λt).
The reader will recognize this as the probability density function for the exponential
random variable with parameter λ. Subsequent interarrival times are independent
and follow the same distribution.
E(X) = ∫_0^{∞} x · λe^(−λx) dx
     = −xe^(−λx) |_0^{∞} + ∫_0^{∞} e^(−λx) dx
     = (0 − 0) − (1/λ) e^(−λx) |_0^{∞}
     = 1/λ.
Note that the computation of lim_{x→∞} xe^(−λx) in this integral requires use of
L’Hospital’s Rule:
lim_{x→∞} xe^(−λx) = lim_{x→∞} x/e^(λx) = lim_{x→∞} 1/(λe^(λx)) = 0.
We summarize the basics for the exponential distribution and then present
some examples.
Example 4.2. The time between calls at a phone company is exponentially dis-
tributed with mean 4 s. What’s the probability that the time between the next two
calls is more than 5 s?
We first find λ:
0.90 = P (X < 1) = ∫_0^1 λe^(−λx) dx = −e^(−λx) |_0^1 = −e^(−λ) + 1,
so e^(−λ) = 0.10 and λ = − ln 0.10 ≈ 2.3026.
Now we have
0.05 = P (X > t) = ∫_t^{∞} 2.3026 e^(−2.3026x) dx = · · · = e^(−2.3026t),
so that t = −(1/2.3026) ln 0.05 ≈ 1.301.
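The same two steps in code (ours):

import math

lam = -math.log(0.10)           # from P(X < 1) = 0.90
print(lam)                      # 2.3026
print(-math.log(0.05) / lam)    # t = 1.301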
Exercises
(1) Given X is an exponential random variable with mean 4, compute the prob-
abilities (a) P (X ≤ 4) and (b) P (X ≤ 2) to four decimal places.
Ans. (a) 0.6321 (b) 0.3935
(2) Given that X is an exponential random variable with parameter λ = 1, find
a rule for the distribution function F of X.
Ans. F (x) = 0 if x < 0, and F (x) = 1 − e^(−x) if x ≥ 0
(3) Find (a) the median and (b) the mean of the exponential random variable X
that has parameter λ = 1. Why is one greater than the other?
Ans. (a) ln 2 (b) 1
(4) Find (a) the mean, (b) the standard deviation, and the (c) median of the
exponential random variable X that has parameter λ = 3.
Ans. (a) 1/3, (b) 1/3, (c) −(1/3) ln(1/2) = 0.2310
(5) Suppose you model the lifetime of a certain battery with an exponential ran-
dom variable that has mean 2.5 hrs. According to this model, what’s the
probability that a randomly selected such battery lasts for more than three
hours?
Ans. 0.3012
(6) If X is an exponential random variable for which P (X < 1) = 0.80, find t so
that P (X > t) = 0.05.
Ans. 1.8614
4.3. Gamma
Playing an integral role in the gamma distribution is the gamma function
Γ(α) = ∫_0^{∞} e^(−t) t^(α−1) dt.
Integrating by parts, one can establish the relationship Γ(α) = (α − 1)Γ(α − 1).
Using this formula when α is a positive integer n and noting that Γ(1) = 1, we have
that
Γ(n) = (n − 1)!.
The probability density function for the gamma distribution with positive pa-
rameters α and β is given by
f (x) = [1/(Γ(α)β^α)] x^(α−1) e^(−x/β) for x ≥ 0, and f (x) = 0 otherwise.
The reader will note that the gamma distribution reduces to the exponential with
parameter λ for α = 1 and β = 1/λ. The gamma distribution models the waiting
time until the αth event occurs.
4.4. Normal
Many populations of values follow what is called a normal distribution. Examples
are weights of female individuals, heights of corn stalks, lengths of “8.5 × 11” sheets
of paper, etc. Moreover, one can apply a result called the Central Limit Theorem
to model the averages of random values from any distribution with the normal
distribution. Hence, the normal distribution plays a major role in statistics.
Putting all this together we see that the probability density function for the
normal random variable is a bell-shaped curve symmetric about x = µ. The larger
the value of σ the flatter the curve.
Example. The number of ounces of Coke in a “12 oz” can is normally distributed
with µ = 12 and σ = 0.2. What’s the probability that a randomly selected 12 oz
can of Coke has (a) between 11.9 and 12.1 oz? (b) between 11.5 and 12.5 oz? (c)
at least 11.8 oz?
(a) Letting f (x) be the pdf for the normal random variable X with mean 12
and standard deviation 0.2, we see that
P (11.9 < X < 12.1) = ∫_{11.9}^{12.1} f (x) dx ≈ 0.383.
The integral cannot be evaluated algebraically, so a graphing calculator or computer
algebra system is used.
(b) P (11.5 < X < 12.5) = ∫_{11.5}^{12.5} f (x) dx ≈ 0.988
(c) P (X ≥ 11.8) = ∫_{11.8}^{∞} f (x) dx = ∫_{11.8}^{12} f (x) dx + 0.5 ≈ 0.841.
Since f (x) is symmetric about x = 12, half of X’s probability is to the right of
12.
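There is no elementary antiderivative for the normal pdf, but math.erf gives the normal CDF directly; this sketch (ours) reproduces all three answers without numeric integration:

import math

def normal_cdf(x, mu, sigma):
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

mu, sigma = 12, 0.2
print(normal_cdf(12.1, mu, sigma) - normal_cdf(11.9, mu, sigma))   # 0.383
print(normal_cdf(12.5, mu, sigma) - normal_cdf(11.5, mu, sigma))   # 0.988
print(1 - normal_cdf(11.8, mu, sigma))                             # 0.841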
The standard normal random variable Z has mean 0 and standard deviation 1,
with probability density function
f (x) = (1/√(2π)) e^(−x^2/2).
A double integral in polar coordinates can be used to show that this function
integrates to one: if we let I = ∫_{−∞}^{∞} (1/√(2π)) e^(−x^2/2) dx, we have
I^2 = ∫_{−∞}^{∞} (1/√(2π)) e^(−x^2/2) dx · ∫_{−∞}^{∞} (1/√(2π)) e^(−y^2/2) dy
    = ∫_{−∞}^{∞} ∫_{−∞}^{∞} (1/(2π)) e^(−(x^2+y^2)/2) dx dy
    = ∫_0^{2π} ∫_0^{∞} (1/(2π)) e^(−r^2/2) r dr dθ
    = ∫_0^{2π} (1/(2π)) (−e^(−r^2/2)) |_0^{∞} dθ
    = ∫_0^{2π} (1/(2π)) dθ
    = 1,
so that I = 1.
Since there’s not a simple rule for the cumulative distribution function of the
standard normal random variable Z, statisticians typically use a table. The tabular
entries are for P (Z ≤ x), for x between zero and 3.59 in increments of 0.01. Since
Z has a probability density function symmetric about x = 0 and more than 99.9%
of Z’s probability falls between −3.5 and 3.5, one has more than enough values in
the table to take care of business. A common notation related to the table is that
of zα . The value zα is defined for all positive α < 1 by
α = P (Z > zα ).
z    0.00   0.01   0.02   0.03   0.04   0.05   0.06   0.07   0.08   0.09
0.0 0.5000 0.5040 0.5080 0.5120 0.5160 0.5199 0.5239 0.5279 0.5319 0.5359
0.1 0.5398 0.5438 0.5478 0.5517 0.5557 0.5596 0.5636 0.5675 0.5714 0.5753
0.2 0.5793 0.5832 0.5871 0.5910 0.5948 0.5987 0.6026 0.6064 0.6103 0.6141
0.3 0.6179 0.6217 0.6255 0.6293 0.6331 0.6368 0.6406 0.6443 0.6480 0.6517
0.4 0.6554 0.6591 0.6628 0.6664 0.6700 0.6736 0.6772 0.6808 0.6844 0.6879
0.5 0.6915 0.6950 0.6985 0.7019 0.7054 0.7088 0.7123 0.7157 0.7190 0.7224
0.6 0.7257 0.7291 0.7324 0.7357 0.7389 0.7422 0.7454 0.7486 0.7517 0.7549
0.7 0.7580 0.7611 0.7642 0.7673 0.7704 0.7734 0.7764 0.7794 0.7823 0.7852
0.8 0.7881 0.7910 0.7939 0.7967 0.7995 0.8023 0.8051 0.8078 0.8106 0.8133
0.9 0.8159 0.8186 0.8212 0.8238 0.8264 0.8289 0.8315 0.8340 0.8365 0.8389
1.0 0.8413 0.8438 0.8461 0.8485 0.8508 0.8531 0.8554 0.8577 0.8599 0.8621
1.1 0.8643 0.8665 0.8686 0.8708 0.8729 0.8749 0.8770 0.8790 0.8810 0.8830
1.2 0.8849 0.8869 0.8888 0.8907 0.8925 0.8944 0.8962 0.8980 0.8997 0.9015
1.3 0.9032 0.9049 0.9066 0.9082 0.9099 0.9115 0.9131 0.9147 0.9162 0.9177
1.4 0.9192 0.9207 0.9222 0.9236 0.9251 0.9265 0.9279 0.9292 0.9306 0.9319
1.5 0.9332 0.9345 0.9357 0.9370 0.9382 0.9394 0.9406 0.9418 0.9429 0.9441
1.6 0.9452 0.9463 0.9474 0.9484 0.9495 0.9505 0.9515 0.9525 0.9535 0.9545
1.7 0.9554 0.9564 0.9573 0.9582 0.9591 0.9599 0.9608 0.9616 0.9625 0.9633
1.8 0.9641 0.9649 0.9656 0.9664 0.9671 0.9678 0.9686 0.9693 0.9699 0.9706
1.9 0.9713 0.9719 0.9726 0.9732 0.9738 0.9744 0.9750 0.9756 0.9761 0.9767
2.0 0.9772 0.9778 0.9783 0.9788 0.9793 0.9798 0.9803 0.9808 0.9812 0.9817
2.1 0.9821 0.9826 0.9830 0.9834 0.9838 0.9842 0.9846 0.9850 0.9854 0.9857
2.2 0.9861 0.9864 0.9868 0.9871 0.9875 0.9878 0.9881 0.9884 0.9887 0.9890
2.3 0.9893 0.9896 0.9898 0.9901 0.9904 0.9906 0.9909 0.9911 0.9913 0.9916
2.4 0.9918 0.9920 0.9922 0.9925 0.9927 0.9929 0.9931 0.9932 0.9934 0.9936
2.5 0.9938 0.9940 0.9941 0.9943 0.9945 0.9946 0.9948 0.9949 0.9951 0.9952
2.6 0.9953 0.9955 0.9956 0.9957 0.9959 0.9960 0.9961 0.9962 0.9963 0.9964
2.7 0.9965 0.9966 0.9967 0.9968 0.9969 0.9970 0.9971 0.9972 0.9973 0.9974
2.8 0.9974 0.9975 0.9976 0.9977 0.9977 0.9978 0.9979 0.9979 0.9980 0.9981
2.9 0.9981 0.9982 0.9982 0.9983 0.9984 0.9984 0.9985 0.9985 0.9986 0.9986
3.0 0.9987 0.9987 0.9987 0.9988 0.9988 0.9989 0.9989 0.9989 0.9990 0.9990
3.1 0.9990 0.9991 0.9991 0.9991 0.9992 0.9992 0.9992 0.9992 0.9993 0.9993
3.2 0.9993 0.9993 0.9994 0.9994 0.9994 0.9994 0.9994 0.9995 0.9995 0.9995
3.3 0.9995 0.9995 0.9995 0.9996 0.9996 0.9996 0.9996 0.9996 0.9996 0.9997
3.4 0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9998
3.5 0.9998 0.9998 0.9998 0.9998 0.9998 0.9998 0.9998 0.9998 0.9998 0.9998
Example 4.4. We look at some particular cases for using the standard normal
table.
b) P (Z > 1.04) = 1 − P (Z ≤ 1.04) ≈ 1 − 0.8508 = 0.1492
We used the complement rule P (A^C) = 1 − P (A).
c)
P (−1 < Z < 2) = P (Z < 2) − P (Z ≤ −1)
               ≈ 0.9772 − P (Z ≥ 1)
               = 0.9772 − (1 − P (Z < 1))
               ≈ 0.9772 − (1 − 0.8413)
               = 0.8185
e) z.025 = 1.96 since P (Z ≤ 1.96) = 0.975 (so that P (Z > 1.96) = 0.025).
Exercises
(1) Use the table to compute (a) P (Z < 1.32), (b) P (Z > 0.64), (c) P (Z ≤
−2.06), (d) P (0.32 < Z < 1.56), (e) P (−1.54 < Z < 1.54), and (f) P (Z = 0)
Ans. (a) 0.9066 (b) 0.2611 (c) 0.0197 (d) 0.3151 (e) 0.8764 (f ) 0
(2) Use the table to compute (a) z.025 , (b) z.05 , and (c) z.01
Ans. (a) 1.96 (b) about half way between 1.64 and 1.65 so approximately
1.645 (c) a bit closer to 2.33 than to 2.32 so about 2.327
(3) If X is normal with µ = 1.4 and σ = 2.0, compute (a) P (X < 2.1), (b)
P (X > 1.0), (c) P (0.6 < X < 2.2), and (d) the 90th percentile of X.
Ans. (a) 0.6368
(4) The mean for scores on a certain test is 300, and the standard deviation is 16.
Assuming the test scores are normally distributed, compute the probability
that a random score on this test is at least 310. What score would be in the
95th percentile?
Ans. 0.2660, 326.3
(5) If X is normal with µ = −5 and σ = 4.2, compute (a) P (X < −7), (b)
P (X > 0), and (c) P (20 < X < 25).
Ans. (a) The table places the answer between 1 − 0.6844 = 0.3156 and
1 − 0.6808 = 0.3192. A decent graphing calculator will tell you the more
precise answer is 0.3170. (c) 0.0000
(6) If X is normal with mean µ and standard deviation σ, compute (a) P (µ − σ <
X < µ + σ), (b) P (µ − 2σ < X < µ + 2σ), and (c) P (µ − 3σ < X < µ + 3σ).
Ans. (a) 0.6827, (b) 0.9545, (c) 0.9973. Note that statisticians talk about
an “Empirical Rule” which states that for normal distributions the probabilities
that values are within one, two, and three standard deviations of the mean
are roughly 68%, 95%, and 99.7%, respectively.
(7) Compute to two decimal places the 90th, 95th, and 99th percentiles of the
standard normal.
Ans. 1.28, 1.64, 2.33
Chapter 5
Joint Probability Distributions

The joint cumulative distribution function of the random variables X and Y is
defined by
F (x, y) = P (X ≤ x, Y ≤ y).
The marginal probability mass function of X is given by
pX (x) = Σ_y p(x, y),
where p(x, y) is the joint probability mass function for X and Y . Similarly, we have
the marginal probability mass function of Y given by
pY (y) = Σ_x p(x, y).
We note that both X and Y can only take the values 0, 1, and 2. Thinking in
terms of pertinent hypergeometric distributions, one can compute the probabilities
in the table:
p(x, y)    y = 0     y = 1     y = 2
x = 0      1/45      10/45     10/45
x = 1      6/45      15/45     0
x = 2      3/45      0         0
The marginal probability mass function for Y is given by pY (0) = 1/45 + 6/45 +
3/45 = 10/45, pY (1) = 10/45 + 15/45 + 0 = 25/45, and pY (2) = 10/45 + 0 + 0 =
10/45.
For our discrete pair of random variables X and Y , we can define the conditional
probability mass function of X given Y = y by
pX|Y (x|y) = P (X = x | Y = y)
           = P (X = x, Y = y)/P (Y = y)
           = p(x, y)/pY (y).
We have
p(0, 1) 10/45 10 2
pX|Y (0|1) = = = = ,
pY (1) 25/45 25 5
p(1, 1) 15/45 15 3
pX|Y (1|1) = = = = , and
pY (1) 25/45 25 5
p(2, 1) 0
pX|Y (2|1) = = = 0.
pY (1) 25/45
It's even possible to have a conditional distribution for X given that a function of the random variables X and Y equals some constant. We see in our next example that if X and Y are independent Poisson random variables, then the conditional distribution of X given that X + Y equals a fixed natural number n is binomial.
Example 5.3. Suppose X and Y are independent Poisson random variables with parameters λX and λY, respectively. Calculate the probability mass function for X given X + Y = n.
We have
\begin{align*}
P(X = k \mid X + Y = n) &= \frac{P(X = k, X + Y = n)}{P(X + Y = n)}\\
&= \frac{P(X = k, Y = n - k)}{P(X + Y = n)}\\
&= \frac{P(X = k) \cdot P(Y = n - k)}{P(X + Y = n)}\\
&= \frac{e^{-\lambda_X}\lambda_X^k}{k!} \cdot \frac{e^{-\lambda_Y}\lambda_Y^{n-k}}{(n-k)!} \div \sum_{i=0}^{n} \frac{e^{-\lambda_X}\lambda_X^i}{i!} \cdot \frac{e^{-\lambda_Y}\lambda_Y^{n-i}}{(n-i)!}\\
&= e^{-(\lambda_X + \lambda_Y)} \frac{\lambda_X^k \lambda_Y^{n-k}}{k!(n-k)!} \div e^{-(\lambda_X + \lambda_Y)} \sum_{i=0}^{n} \frac{\lambda_X^i \lambda_Y^{n-i}}{i!(n-i)!}\\
&= \frac{\lambda_X^k \lambda_Y^{n-k}}{k!(n-k)!} \div \frac{1}{n!} \sum_{i=0}^{n} \frac{n!}{i!(n-i)!} \lambda_X^i \lambda_Y^{n-i}\\
&= \frac{\lambda_X^k \lambda_Y^{n-k}}{k!(n-k)!} \div \frac{1}{n!} (\lambda_X + \lambda_Y)^n\\
&= \frac{n!}{k!(n-k)!} \left(\frac{\lambda_X}{\lambda_X + \lambda_Y}\right)^k \left(\frac{\lambda_Y}{\lambda_X + \lambda_Y}\right)^{n-k}\\
&= \frac{n!}{k!(n-k)!} \left(\frac{\lambda_X}{\lambda_X + \lambda_Y}\right)^k \left(1 - \frac{\lambda_X}{\lambda_X + \lambda_Y}\right)^{n-k}.
\end{align*}
So the conditional distribution of X given X + Y = n is binomial with parameters n and λX/(λX + λY).
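A numerical spot check of this conclusion is straightforward. In the Python sketch below, the parameters lam_x = 2, lam_y = 3, and n = 6 are arbitrary illustrative choices: the conditional probabilities computed directly from Poisson probability mass functions match the binomial probability mass function just derived.

```python
# Comparing P(X = k | X + Y = n), computed from Poisson pmf's, with
# the binomial pmf with parameters n and lam_x / (lam_x + lam_y).
from math import comb, exp, factorial

lam_x, lam_y, n = 2.0, 3.0, 6   # hypothetical parameters

def pois(lam, k):
    return exp(-lam) * lam ** k / factorial(k)

denom = sum(pois(lam_x, i) * pois(lam_y, n - i) for i in range(n + 1))
q = lam_x / (lam_x + lam_y)
for k in range(n + 1):
    conditional = pois(lam_x, k) * pois(lam_y, n - k) / denom
    binomial = comb(n, k) * q ** k * (1 - q) ** (n - k)
    print(k, conditional, binomial)   # the two columns agree
```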
Thus far we have been discussing joint probability distributions for two discrete random variables. We can generalize the discussion to n discrete random variables. When dealing with the n random variables X1, X2, ..., Xn, for example, the function
\[
p(x_1, x_2, \ldots, x_n) = P(X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n)
\]
serves as a probability mass function. We have p(x1, x2, ..., xn) ≥ 0. Also
\[
\sum_{x_1} \cdots \sum_{x_n} p(x_1, x_2, \ldots, x_n) = 1
\]
and
\[
P((X_1, \ldots, X_n) \in A) = \sum_{(x_1, \ldots, x_n) \in A} \cdots \sum p(x_1, x_2, \ldots, x_n).
\]
An important example of such a joint distribution is the multinomial distribution, but we define the multinomial coefficient first. Suppose n is a positive integer and that the r nonnegative integers n1, ..., nr are such that
\[
n_1 + \cdots + n_r = n.
\]
Then the multinomial coefficient $\binom{n}{n_1, \ldots, n_r}$ is defined by
\[
\binom{n}{n_1, \ldots, n_r} = \frac{n!}{n_1!\, n_2! \cdots n_r!}.
\]
We list some quantities that this coefficient counts:
• The number of ways you can split n distinct objects into r distinct groups of sizes n1, n2, . . . , and nr, respectively.
• The number of n-letter words made up of r distinct letters used n1, n2, . . . , and nr times, respectively.
• The coefficient on $x_1^{n_1} \cdots x_r^{n_r}$ in the expansion of $(x_1 + \cdots + x_r)^n$.
For example, if you roll a die eight times, the number of ways you can obtain one 1, one 2, zero 3's, zero 4's, two 5's, and four 6's is
\[
\binom{8}{1, 1, 0, 0, 2, 4} = \frac{8!}{1!\,1!\,0!\,0!\,2!\,4!} = 840.
\]
Another example is to count how many words (including nonsensical ones) you can make from two a's, one c, three t's, and one z. The answer is
\[
\binom{7}{2, 1, 3, 1} = \frac{7!}{2!\,1!\,3!\,1!} = 420.
\]
A couple of these 420 words are aactttz and acatttz.
Example 5.4. Roll a die eight times. What's the probability (a) you obtain one 1, one 2, zero 3's, zero 4's, two 5's, and four 6's and (b) you obtain one 1, one 2, one 3, one 4, two 5's, and two 6's?
(a)
\[
\binom{8}{1, 1, 0, 0, 2, 4} (1/6)^1 (1/6)^1 (1/6)^0 (1/6)^0 (1/6)^2 (1/6)^4
\]
or
\[
\frac{8!}{1!\,1!\,0!\,0!\,2!\,4!} (1/6)^8 = \frac{840}{1{,}679{,}616} = 0.0005.
\]
(b)
\[
\binom{8}{1, 1, 1, 1, 2, 2} (1/6)^1 (1/6)^1 (1/6)^1 (1/6)^1 (1/6)^2 (1/6)^2
\]
or
\[
\frac{8!}{1!\,1!\,1!\,1!\,2!\,2!} (1/6)^8 = \frac{10{,}080}{1{,}679{,}616} = 0.0060.
\]
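Probabilities like those in Example 5.4 can be evaluated by machine as well. The following Python sketch is our own illustration; the helper multinomial_prob is a name we introduce here.

```python
# Evaluating the multinomial probabilities of Example 5.4.
from math import factorial

def multinomial_prob(counts, probs):
    """Probability of observing the given counts in a multinomial trial."""
    coef = factorial(sum(counts))
    for c in counts:
        coef //= factorial(c)
    prob = 1.0
    for c, q in zip(counts, probs):
        prob *= q ** c
    return coef * prob

die = [1 / 6] * 6
print(multinomial_prob([1, 1, 0, 0, 2, 4], die))   # ~ 0.0005
print(multinomial_prob([1, 1, 1, 1, 2, 2], die))   # ~ 0.0060
```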
Exercises
(1) Suppose X and Y are discrete random variables with probability mass function
as given in the table.
\[
\begin{array}{c|cc}
p(x, y) & y = 2 & y = 4 \\ \hline
x = 0 & 0.30 & 0.20 \\
x = 6 & 0.40 & 0.10
\end{array}
\]
Compute (a) P (X + Y < 5), (b) P (Y > X), and (c) E(X)
Ans.: (a) 0.50, (b) 0.50, and (c) 3
(2) Suppose X and Y are discrete random variables with probability mass function
as given in the table.
\[
\begin{array}{c|ccc}
p(x, y) & y = 3 & y = 5 & y = 10 \\ \hline
x = 4 & 0.25 & 0.10 & 0.05 \\
x = 6 & 0.20 & 0.02 & 0.38
\end{array}
\]
Find rules for the two marginal probability mass functions and compute
(a) P (Y ≥ 5), (b) P (X + Y ≤ 9), (c) E(X), and (d) E(Y ).
Ans.:
\[
p_X(x) = \begin{cases} 0.40 & \text{if } x = 4 \\ 0.60 & \text{if } x = 6 \\ 0 & \text{otherwise} \end{cases}
\qquad
p_Y(y) = \begin{cases} 0.45 & \text{if } y = 3 \\ 0.12 & \text{if } y = 5 \\ 0.43 & \text{if } y = 10 \\ 0 & \text{otherwise} \end{cases}
\]
(a) 0.55, (b) 0.55, (c) 5.2, (d) 6.25
5.2. Continuous Case

For a pair of continuous random variables X and Y, the joint cumulative distribution function is again given by
\[
F(x, y) = P(X \le x, Y \le y).
\]
As was the case with a pair of discrete random variables, the distribution function for X is given by $F_X(x) = \lim_{y \to \infty} F(x, y)$. Similarly, we have that the distribution function for Y is given by $F_Y(y) = \lim_{x \to \infty} F(x, y)$. The marginal probability density function of X is given by
\[
f_X(x) = \int_{-\infty}^{\infty} f(x, y)\, dy,
\]
where f(x, y) is the joint probability density function for X and Y. Similarly, the marginal probability density function of Y is given by
\[
f_Y(y) = \int_{-\infty}^{\infty} f(x, y)\, dx.
\]
We provide an example.
Example 5.5. Suppose X and Y are continuous random variables with joint probability density function
\[
f(x, y) = \begin{cases} \frac{1}{12}x + \frac{1}{24}y & \text{if } 0 \le x \le 3 \text{ and } 0 \le y \le 2 \\ 0 & \text{otherwise} \end{cases}
\]
Compute (a) P(X ≤ 2 and Y ≥ 1), (b) P(X + Y ≤ 1), (c) a rule for fX(x), (d) a rule for fY(y), and (e) E(X).
(a)
\begin{align*}
P(X \le 2, Y \ge 1) &= \int_0^2 \int_1^2 \left(\tfrac{1}{12}x + \tfrac{1}{24}y\right) dy\, dx\\
&= \int_0^2 \left(\tfrac{1}{12}xy + \tfrac{1}{48}y^2\right)\Big|_1^2\, dx\\
&= \int_0^2 \left(\tfrac{1}{6}x + \tfrac{1}{12} - \left(\tfrac{1}{12}x + \tfrac{1}{48}\right)\right) dx\\
&= \int_0^2 \left(\tfrac{1}{12}x + \tfrac{3}{48}\right) dx\\
&= \left(\tfrac{1}{24}x^2 + \tfrac{3}{48}x\right)\Big|_0^2\\
&= \frac{7}{24}.
\end{align*}
(b)
\begin{align*}
P(X + Y \le 1) &= \int_0^1 \int_0^{1-x} \left(\tfrac{1}{12}x + \tfrac{1}{24}y\right) dy\, dx\\
&= \int_0^1 \left(\tfrac{1}{12}xy + \tfrac{1}{48}y^2\right)\Big|_0^{1-x}\, dx\\
&= \int_0^1 \left(\tfrac{1}{12}(x - x^2) + \tfrac{1}{48}(1 - 2x + x^2)\right) dx\\
&= \int_0^1 \left(-\tfrac{3}{48}x^2 + \tfrac{1}{24}x + \tfrac{1}{48}\right) dx\\
&= \frac{1}{48}.
\end{align*}
(c)
\[
f_X(x) = \int_0^2 \left(\tfrac{1}{12}x + \tfrac{1}{24}y\right) dy = \left(\tfrac{1}{12}xy + \tfrac{1}{48}y^2\right)\Big|_0^2 = \tfrac{1}{6}x + \tfrac{1}{12},
\]
so
\[
f_X(x) = \begin{cases} \frac{1}{6}x + \frac{1}{12} & \text{if } 0 \le x \le 3 \\ 0 & \text{otherwise} \end{cases}
\]
(d)
\[
f_Y(y) = \int_0^3 \left(\tfrac{1}{12}x + \tfrac{1}{24}y\right) dx = \left(\tfrac{1}{24}x^2 + \tfrac{1}{24}xy\right)\Big|_0^3 = \tfrac{3}{8} + \tfrac{1}{8}y,
\]
so
\[
f_Y(y) = \begin{cases} \frac{1}{8}y + \frac{3}{8} & \text{if } 0 \le y \le 2 \\ 0 & \text{otherwise} \end{cases}
\]
(e)
\[
E(X) = \int_0^3 x\left(\tfrac{1}{6}x + \tfrac{1}{12}\right) dx = \int_0^3 \left(\tfrac{1}{6}x^2 + \tfrac{1}{12}x\right) dx = \left(\tfrac{1}{18}x^3 + \tfrac{1}{24}x^2\right)\Big|_0^3 = \frac{27}{18} + \frac{9}{24} = \frac{15}{8}.
\]
As was the case with pairs of discrete random variables, we can discuss conditional probability density functions associated with a joint probability density function for a pair of continuous random variables. The conditional probability density function of X given Y = y is given by
\[
f_{X|Y}(x|y) = \frac{f(x, y)}{f_Y(y)},
\]
where f(x, y) is the joint probability density function for the continuous random variables X and Y. Here, we're assuming y is a value for which fY(y) > 0.
Example 5.6. For the joint probability density function in the previous example, take y to be a value in [0, 2], and compute fX|Y(x|y).
We have
\[
f_{X|Y}(x|y) = \frac{\frac{1}{12}x + \frac{1}{24}y}{\frac{1}{8}y + \frac{3}{8}} = \frac{2x + y}{3y + 9},
\]
if 0 ≤ x ≤ 3. In the particular case that y = 1/3, we have
\[
f_{X|Y}\!\left(x \,\middle|\, \tfrac{1}{3}\right) = \frac{2x + \frac{1}{3}}{3\left(\frac{1}{3}\right) + 9} = \frac{1}{5}x + \frac{1}{30},
\]
if 0 ≤ x ≤ 3.
As is the case when X and Y are discrete, the continuous random variables X and Y are said to be independent if for all real numbers x and y,
\[
F(x, y) = F_X(x) F_Y(y).
\]
An equivalent definition is that
\[
f(x, y) = f_X(x) f_Y(y),
\]
where the f's are probability density functions.
Exercises
(1) Suppose X and Y are continuous random variables with joint probability density function
\[
f(x, y) = \begin{cases} \frac{1}{6} & \text{if } 0 \le x \le 3 \text{ and } 0 \le y \le 2 \\ 0 & \text{otherwise} \end{cases}
\]
Compute the probabilities (a) P(X ≤ 2), (b) P(X + Y ≤ 1), and (c) $P(Y \ge \frac{1}{9}X^2)$.
Ans. (a) 2/3, (b) 1/12, (c) 5/6
(2) Suppose X and Y are continuous random variables with joint probability density function
\[
f(x, y) = \begin{cases} c(x^3 + y) & \text{if } 0 \le x \le 1 \text{ and } 0 \le y \le 2 \\ 0 & \text{otherwise} \end{cases}
\]
Compute the value of c, find rules for the two marginal probability density functions, and compute the probabilities P(Y ≤ 1) and P(X < Y).
Ans.: c = 2/5,
\[
f_X(x) = \begin{cases} \frac{4}{5}(x^3 + 1) & \text{if } 0 \le x \le 1 \\ 0 & \text{otherwise} \end{cases}
\qquad
f_Y(y) = \begin{cases} \frac{2}{5}y + \frac{1}{10} & \text{if } 0 \le y \le 2 \\ 0 & \text{otherwise} \end{cases}
\]
P(Y ≤ 1) = 0.3, and P(X < Y) = 64/75.
(3) Determine if the random variables in the preceding problem are independent.
Ans.: They are not.
(4) Suppose the continuous random variables X and Y have the joint probability density function
\[
f(x, y) = \begin{cases} ab\, e^{-ax - by} & \text{if } x \ge 0 \text{ and } y \ge 0 \\ 0 & \text{otherwise} \end{cases}
\]
Here a and b are positive constants. This can model the lifetimes of two components, the first (X) having mean life 1/a and the second (Y) having mean life 1/b. Compute (a) the probability that both last past time t, i.e. P(X > t, Y > t), and (b) the probability that the first lasts longer than the second, i.e. P(X > Y).
Ans.: (a) $e^{-(a+b)t}$ (b) $\frac{b}{a+b}$
(5) Determine if the random variables in the preceding problem are independent.
Ans.: They are. Note
\[
F(s, t) = \int_0^s \int_0^t ab\, e^{-ax - by}\, dy\, dx = \int_0^s a e^{-ax}\, dx \int_0^t b e^{-by}\, dy = F_X(s) F_Y(t).
\]
(6) Suppose the continuous random variables X and Y have the joint probability density function
\[
f(x, y) = \begin{cases} cye^{-2x - y} & \text{if } x \ge 0 \text{ and } y \ge 0 \\ 0 & \text{otherwise} \end{cases}
\]
Compute (a) P(X < 1, Y < 1), (b) P(X > 2), (c) P(X < Y), and (d) P(X + Y < 1).
Ans.: c = 2, $P(X < 1, Y < 1) = 1 - \frac{2}{e} - \frac{1}{e^2} + \frac{2}{e^3} = 0.2285$, $P(X > 2) = e^{-4}$, $P(X < Y) = \frac{8}{9}$, and $P(X + Y < 1) = 1 - 2e^{-1} - e^{-2} = 0.1289$.
(7) Suppose X and Y are independent exponential random variables with means µX = 4.0 and µY = 5.0. Compute the probability P(X ≥ 5 and Y ≥ 5).
Ans. $P(X \ge 5, Y \ge 5) = e^{-5/4} \cdot e^{-1} = 0.1054$
(8) The random vector (X, Y) has probability density function
\[
f(x, y) = \begin{cases} c(x^3 + 2xy) & \text{if } 0 \le y \le x \text{ and } 0 \le x \le 2 \\ 0 & \text{otherwise} \end{cases}
\]
Compute (a) the value of c, (b) P(X > 2Y), (c) E(X), and (d) E(Y).
Ans. (a) $\frac{5}{52}$, (b) $\frac{21}{52}$, (c) $\frac{64}{39}$, (d) $\frac{12}{13}$
5.3. Functions of Joint Random Variables

Example 5.7. Suppose that X and Y are independent binomial random variables with the same probability of success p for each trial, say X is binomial with parameters m and p and Y with parameters n and p. Then for g(x, y) = x + y, find the distribution of W = g(X, Y).
Since W = X + Y counts the successes in m + n independent trials, each with probability of success p, W is binomial with parameters m + n and p.
Suppose now that X and Y are continuous random variables with joint probability density function fX,Y(x, y) and that g(x, y) is a real-valued two-variable function that yields the continuous random variable W = g(X, Y). To obtain the probability density function of W, we first calculate the cumulative distribution function of W and then take its derivative to get the probability density function. For example, if we let g(x, y) = x + y so that W = X + Y, we have
\begin{align*}
F_{X+Y}(w) &= \iint_{x + y \le w} f_{X,Y}(x, y)\, dy\, dx\\
&= \int_{-\infty}^{\infty} \int_{-\infty}^{w - x} f_{X,Y}(x, y)\, dy\, dx.
\end{align*}
Differentiating with respect to w then gives
\[
f_{X+Y}(w) = \int_{-\infty}^{\infty} f_{X,Y}(x, w - x)\, dx.
\]
Example 5.8. Suppose X and Y are independent standard normal random variables. Calculate the probability density function of X + Y.
\begin{align*}
f_{X+Y}(w) &= \int_{-\infty}^{\infty} f_{X,Y}(x, w - x)\, dx\\
&= \int_{-\infty}^{\infty} f_X(x) \cdot f_Y(w - x)\, dx\\
&= \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}} e^{-\frac{x^2}{2}} \cdot \frac{1}{\sqrt{2\pi}} e^{-\frac{(w - x)^2}{2}}\, dx\\
&= \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-(x^2 - wx + \frac{w^2}{4}) - \frac{w^2}{4}}\, dx\\
&= \frac{1}{2\pi} e^{-\frac{w^2}{4}} \int_{-\infty}^{\infty} e^{-(x - \frac{w}{2})^2}\, dx\\
&= \left(\sqrt{2\pi} \cdot \tfrac{1}{\sqrt{2}}\right) \frac{1}{2\pi} e^{-\frac{w^2}{4}} \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi} \cdot \frac{1}{\sqrt{2}}}\, e^{-\frac{(x - \frac{w}{2})^2}{2(1/\sqrt{2})^2}}\, dx\\
&= \frac{1}{\sqrt{2\pi} \cdot \sqrt{2}}\, e^{-\frac{w^2}{2(\sqrt{2})^2}} \cdot 1.
\end{align*}
The integral does in fact have a value of 1 since the integrand is the pdf for a normal distribution. The reader will recognize the pdf for X + Y as that of a normal random variable with mean 0 and standard deviation √2.
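A Monte Carlo experiment offers a quick illustration (not a proof) of this conclusion: simulated values of X + Y should have sample mean near 0 and sample standard deviation near √2. The seed and sample size in the sketch below are arbitrary choices of ours.

```python
# Simulating X + Y for independent standard normals and checking the
# sample mean and sample standard deviation.
import random
from math import sqrt

random.seed(0)
ws = [random.gauss(0, 1) + random.gauss(0, 1) for _ in range(100_000)]
m = sum(ws) / len(ws)
sd = sqrt(sum((w - m) ** 2 for w in ws) / (len(ws) - 1))
print(m, sd, sqrt(2))   # sd should be near 1.414
```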
Now we consider the pair of real-valued, two-variable functions g(x, y) and h(x, y) defined on the range of (X, Y). Assume that their first partial derivatives are continuous on this range. Assume also that the transformation defined by u = g(x, y) and v = h(x, y) is one-to-one. Then, defining the two random variables U and V by U = g(X, Y) and V = h(X, Y), according to the change of variables formula from multidimensional calculus, the joint probability density function for U and V is given by
\[
f_{U,V}(u, v) = \frac{1}{|J(x, y)|} f_{X,Y}(x, y),
\]
where (x, y) is the point in the range of (X, Y) for which g(x, y) = u and h(x, y) = v, and J(x, y) is the Jacobian
\[
J(x, y) = \begin{vmatrix} \frac{\partial g}{\partial x} & \frac{\partial g}{\partial y} \\[2pt] \frac{\partial h}{\partial x} & \frac{\partial h}{\partial y} \end{vmatrix}.
\]
There are no exercises for this section. We put the theory of functions of joint
probability distributions to work in the next section to compute expected values.
5.4. Expected Value, Variance, and Covariance

We note that if the two random variables X and Y are discrete and a and b are constants,
\begin{align*}
E(aX + bY) &= \sum_x \sum_y (ax + by)\, p_{X,Y}(x, y)\\
&= a \sum_x \sum_y x \cdot p_{X,Y}(x, y) + b \sum_x \sum_y y \cdot p_{X,Y}(x, y)\\
&= aE(X) + bE(Y).
\end{align*}
A similar derivation in the continuous case yields the same formula.
To compute the variance of X + Y in both the discrete and continuous cases, we note
\begin{align*}
V(X + Y) &= E(X + Y - (\mu_X + \mu_Y))^2\\
&= E((X - \mu_X) + (Y - \mu_Y))^2\\
&= E((X - \mu_X)^2 + 2(X - \mu_X)(Y - \mu_Y) + (Y - \mu_Y)^2)\\
&= E(X - \mu_X)^2 + 2E(X - \mu_X)(Y - \mu_Y) + E(Y - \mu_Y)^2\\
&= V(X) + 2E(X - \mu_X)(Y - \mu_Y) + V(Y).
\end{align*}
The middle term here, without the factor of 2, is referred to as the covariance of X and Y, labeled Cov(X, Y). So we have
\[
V(X + Y) = V(X) + V(Y) + 2\,\mathrm{Cov}(X, Y).
\]
We can establish a shortcut formula for the covariance in much the same way we did for the variance of a single random variable. We have
\[
\mathrm{Cov}(X, Y) = E(X - \mu_X)(Y - \mu_Y) = E(XY) - \mu_X \mu_Y = E(XY) - E(X)E(Y).
\]
Note that if X and Y are independent, then (in the discrete case)
\begin{align*}
E(XY) &= \sum_x \sum_y xy \cdot p_{X,Y}(x, y)\\
&= \sum_x \sum_y xy \cdot p_X(x) \cdot p_Y(y)\\
&= \sum_x x \cdot p_X(x) \sum_y y \cdot p_Y(y)\\
&= E(X)E(Y).
\end{align*}
The same formula can be established in a similar way in the continuous case. Hence, if X and Y are independent, Cov(X, Y) = E(XY) − E(X)E(Y) = 0.
The correlation coefficient of X and Y is defined by
\[
\rho = \frac{\mathrm{Cov}(X, Y)}{\sigma_X \sigma_Y}.
\]
The reader will note that when X and Y are independent, i.e. they have no relationship, ρ = 0 because the covariance is zero. In the case that Y is a linear function of X, say Y = aX + b with a and b being constants and a ≠ 0, we have ρ = 1 when a > 0 and ρ = −1 when a < 0.
Finally we note that ρ only takes values in the interval [−1, 1]. This can be verified without too much difficulty. Note that
\begin{align*}
0 &\le V\!\left(\frac{X - \mu_X}{\sigma_X} + \frac{Y - \mu_Y}{\sigma_Y}\right)\\
&= \left(\frac{1}{\sigma_X}\right)^2 V(X) + \left(\frac{1}{\sigma_Y}\right)^2 V(Y) + 2 \cdot \mathrm{Cov}\!\left(\frac{X - \mu_X}{\sigma_X}, \frac{Y - \mu_Y}{\sigma_Y}\right)\\
&= 1 + 1 + 2E\!\left(\frac{X - \mu_X}{\sigma_X} - 0\right)\!\left(\frac{Y - \mu_Y}{\sigma_Y} - 0\right)\\
&= 2 + 2\left(\frac{1}{\sigma_X \sigma_Y}\right) E(X - \mu_X)(Y - \mu_Y)\\
&= 2(1 + \rho).
\end{align*}
So 1 + ρ ≥ 0, which makes ρ ≥ −1. Similarly,
\begin{align*}
0 &\le V\!\left(\frac{X - \mu_X}{\sigma_X} - \frac{Y - \mu_Y}{\sigma_Y}\right)\\
&= 1 + 1 - 2 \cdot \mathrm{Cov}\!\left(\frac{X - \mu_X}{\sigma_X}, \frac{Y - \mu_Y}{\sigma_Y}\right)\\
&= 2 - 2\rho\\
&= 2(1 - \rho).
\end{align*}
We therefore have that 1 − ρ ≥ 0, so that ρ ≤ 1. We summarize: for any pair of random variables X and Y, −1 ≤ ρ ≤ 1.
We illustrate by computing ρ for the random variables X and Y of Example 5.5, where we found E(X) = 15/8, fX(x) = (1/6)x + 1/12 on [0, 3], and fY(y) = (1/8)y + 3/8 on [0, 2]. First,
\[
E(X^2) = \int_0^3 x^2\left(\tfrac{1}{6}x + \tfrac{1}{12}\right) dx = \int_0^3 \left(\tfrac{1}{6}x^3 + \tfrac{1}{12}x^2\right) dx = \left(\tfrac{1}{24}x^4 + \tfrac{1}{36}x^3\right)\Big|_0^3 = \frac{81}{24} + \frac{27}{36} = \frac{27}{8} + \frac{3}{4} = \frac{33}{8}.
\]
Next
\[
E(Y) = \int_0^2 y\left(\tfrac{1}{8}y + \tfrac{3}{8}\right) dy = \int_0^2 \left(\tfrac{1}{8}y^2 + \tfrac{3}{8}y\right) dy = \left(\tfrac{1}{24}y^3 + \tfrac{3}{16}y^2\right)\Big|_0^2 = \frac{1}{3} + \frac{3}{4} = \frac{4}{12} + \frac{9}{12} = \frac{13}{12},
\]
and
\[
E(Y^2) = \int_0^2 y^2\left(\tfrac{1}{8}y + \tfrac{3}{8}\right) dy = \int_0^2 \left(\tfrac{1}{8}y^3 + \tfrac{3}{8}y^2\right) dy = \left(\tfrac{1}{32}y^4 + \tfrac{1}{8}y^3\right)\Big|_0^2 = \frac{1}{2} + 1 = \frac{3}{2}.
\]
Finally
\begin{align*}
E(XY) &= \int_0^3 \int_0^2 xy\left(\tfrac{1}{12}x + \tfrac{1}{24}y\right) dy\, dx\\
&= \int_0^3 \int_0^2 \left(\tfrac{1}{12}x^2 y + \tfrac{1}{24}xy^2\right) dy\, dx\\
&= \int_0^3 \left(\tfrac{1}{24}x^2 y^2 + \tfrac{1}{72}xy^3\right)\Big|_0^2\, dx\\
&= \int_0^3 \left(\tfrac{1}{6}x^2 + \tfrac{1}{9}x\right) dx\\
&= \left(\tfrac{1}{18}x^3 + \tfrac{1}{18}x^2\right)\Big|_0^3\\
&= \frac{3}{2} + \frac{1}{2} = 2.
\end{align*}
This gives us
\[
\mathrm{Cov}(X, Y) = 2 - \left(\frac{15}{8}\right)\!\left(\frac{13}{12}\right) = -\frac{1}{32},
\qquad
\sigma_X^2 = \frac{33}{8} - \left(\frac{15}{8}\right)^2 = \frac{39}{64},
\qquad
\sigma_Y^2 = \frac{3}{2} - \left(\frac{13}{12}\right)^2 = \frac{47}{144}.
\]
Hence,
\[
\rho = \frac{-\frac{1}{32}}{\sqrt{\frac{39}{64} \cdot \frac{47}{144}}} = \frac{-\frac{8 \cdot 12}{32}}{\sqrt{39 \cdot 47}} = \frac{-3}{\sqrt{1833}} = -0.07007.
\]
To close the section, note that if we take the random variables X1, ..., Xn on the sample space, they have a joint pmf (if they're all discrete) or a joint pdf (if they're all continuous). In either case, we can expand the formulas for the expected value and variance of linear combinations of the Xi's and obtain
\[
E(a_1 X_1 + \cdots + a_n X_n) = a_1 E(X_1) + \cdots + a_n E(X_n)
\]
and
\[
V(X_1 + \cdots + X_n) = \sum_{k=1}^{n} V(X_k) + 2 \sum_{k < j} \mathrm{Cov}(X_k, X_j).
\]
If it's the case that the Xi's are independent, then the second formula becomes
\[
V(X_1 + \cdots + X_n) = \sum_{k=1}^{n} V(X_k).
\]
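The identity V(X + Y) = V(X) + V(Y) + 2 Cov(X, Y) is easy to confirm numerically on any small joint pmf. The table in the Python sketch below is a made-up example of ours, not one from the exercises.

```python
# A made-up joint pmf used to illustrate
# V(X + Y) = V(X) + V(Y) + 2 Cov(X, Y).
pmf = {(0, 2): 0.25, (0, 4): 0.25, (6, 2): 0.35, (6, 4): 0.15}

def E(g):
    """Expected value of g(X, Y) under the joint pmf."""
    return sum(g(x, y) * p for (x, y), p in pmf.items())

EX, EY = E(lambda x, y: x), E(lambda x, y: y)
VX = E(lambda x, y: x * x) - EX ** 2
VY = E(lambda x, y: y * y) - EY ** 2
cov = E(lambda x, y: x * y) - EX * EY
V_sum = E(lambda x, y: (x + y) ** 2) - (EX + EY) ** 2
print(V_sum, VX + VY + 2 * cov)   # the two printed values agree
```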
Exercises
(1) Compute the correlation coefficient ρ for the pair of discrete random variables
X and Y with probability mass function as given in the table.
\[
\begin{array}{c|cc}
p(x, y) & y = 2 & y = 4 \\ \hline
x = 0 & 0.30 & 0.20 \\
x = 6 & 0.40 & 0.10
\end{array}
\]
Ans.: −0.2182
(2) Compute the correlation coefficient ρ for the pair of discrete random variables
X and Y with probability mass function as given in the table.
\[
\begin{array}{c|ccc}
p(x, y) & y = 0 & y = 1 & y = 2 \\ \hline
x = 0 & 1/45 & 10/45 & 10/45 \\
x = 1 & 6/45 & 15/45 & 0 \\
x = 2 & 3/45 & 0 & 0
\end{array}
\]
Ans.: E(X) = 27/45, E(X²) = 33/45, E(Y) = 1, E(Y²) = 65/45, and E(XY) = 15/45, so
\[
\rho = \frac{\frac{15}{45} - \frac{27}{45}}{\sqrt{(0.37333)(0.44444)}} = -0.6547.
\]
(3) Compute the correlation coefficient ρ for the pair of discrete random variables
X and Y with probability mass function as given in the table.
\[
\begin{array}{c|ccc}
p(x, y) & y = 3 & y = 5 & y = 10 \\ \hline
x = 4 & 0.25 & 0.10 & 0.05 \\
x = 6 & 0.20 & 0.02 & 0.38
\end{array}
\]
Ans.: E(X) = 5.2, E(X²) = 28, E(Y) = 6.25, E(Y²) = 50.05, and E(XY) = 34, so
\[
\rho = \frac{34 - (5.2)(6.25)}{\sqrt{(28 - 5.2^2)(50.05 - 6.25^2)}} = \frac{1.5}{\sqrt{(0.96)(10.9875)}} = 0.4618.
\]
(4) Suppose the continuous random variables X and Y have the joint probability density function
\[
f(x, y) = \begin{cases} 2ye^{-2x - y} & \text{if } x \ge 0 \text{ and } y \ge 0 \\ 0 & \text{otherwise} \end{cases}
\]
Compute ρ.
Ans.: ρ = 0. Note that
\[
f_X(x) = 2e^{-2x} \int_0^{\infty} y e^{-y}\, dy = 2e^{-2x},
\qquad
f_Y(y) = y e^{-y} \int_0^{\infty} 2e^{-2x}\, dx = y e^{-y},
\]
and
\[
\int_0^{\infty}\!\!\int_0^{\infty} xy \cdot 2ye^{-2x - y}\, dy\, dx = \int_0^{\infty} x \cdot 2e^{-2x}\, dx \cdot \int_0^{\infty} y \cdot y e^{-y}\, dy,
\]
so E(XY) = E(X)E(Y).
Chapter 6

Sampling and Limit Theorems

6.1. Sample Mean and Variance

Given a random sample of n values x1, x2, ..., xn taken from the distribution of a random variable X, the statistics
\[
\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i
\]
and
\[
s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}
\]
are used to estimate E(X) and SD(X), respectively. These two statistics are called the sample mean and the sample standard deviation.
Before we take a random sample of n values from the distribution, we are uncertain as to what the ith of the n values will be. We assign the random variable Xi to the ith selection in the random sample and note that E(Xi) = µ and SD(Xi) = σ. We then define the random variables X̄ and S² as the sample mean and variance, respectively.
Sample Mean and Variance. The sample mean X̄ and sample variance S² are given by
\[
\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i
\]
and
\[
S^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (X_i - \bar{X})^2.
\]
We note that
\[
E(\bar{X}) = E\!\left(\frac{1}{n} \sum_{i=1}^{n} X_i\right) = \frac{1}{n} E\!\left(\sum_{i=1}^{n} X_i\right) = \frac{1}{n} \sum_{i=1}^{n} E(X_i) = \frac{1}{n} \cdot n\mu = \mu.
\]
Toward computing E(S²), note that
\begin{align*}
\sum_{i=1}^{n} (X_i - \bar{X})^2 &= \sum_{i=1}^{n} (X_i^2 - 2\bar{X}X_i + \bar{X}^2)\\
&= \sum_{i=1}^{n} X_i^2 - 2\bar{X} \sum_{i=1}^{n} X_i + n\bar{X}^2\\
&= \sum_{i=1}^{n} X_i^2 - 2\bar{X} \cdot n\bar{X} + n\bar{X}^2\\
&= \sum_{i=1}^{n} X_i^2 - n\bar{X}^2\\
&= \sum_{i=1}^{n} X_i^2 - n\left(\frac{1}{n} \sum_{i=1}^{n} X_i\right)^2\\
&= \sum_{i=1}^{n} X_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} X_i\right)^2.
\end{align*}
Note also that for any random variable X, V(X) = E(X²) − (E(X))², so that E(X²) = V(X) + (E(X))². Consequently,
\begin{align*}
E(S^2) &= \frac{1}{n - 1} E\!\left(\sum_{i=1}^{n} X_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} X_i\right)^2\right)\\
&= \frac{1}{n - 1}\left(\sum_{i=1}^{n} E(X_i^2) - \frac{1}{n} E\!\left(\left(\sum_{i=1}^{n} X_i\right)^2\right)\right)\\
&= \frac{1}{n - 1}\left(\sum_{i=1}^{n} (\sigma^2 + \mu^2) - \frac{1}{n}\left[V\!\left(\sum_{i=1}^{n} X_i\right) + \left(E\!\left(\sum_{i=1}^{n} X_i\right)\right)^2\right]\right)\\
&= \frac{1}{n - 1}\left(n(\sigma^2 + \mu^2) - \frac{1}{n}\left[n\sigma^2 + (n\mu)^2\right]\right)\\
&= \frac{1}{n - 1}\left(n\sigma^2 + n\mu^2 - \sigma^2 - n\mu^2\right)\\
&= \sigma^2.
\end{align*}
The random variables X̄ and S² are said to be unbiased estimators of µ and σ² since E(X̄) = µ and E(S²) = σ².
Now continuing to note that the Xi's we take as random readings from the population with mean µ and standard deviation σ are independent, we can make use of the result that says the variance of a sum of independent random variables is the sum of their variances to obtain
\[
V(\bar{X}) = V\!\left(\frac{1}{n} \sum_{i=1}^{n} X_i\right) = \left(\frac{1}{n}\right)^2 V\!\left(\sum_{i=1}^{n} X_i\right) = \frac{1}{n^2} \sum_{i=1}^{n} V(X_i) = \frac{1}{n^2} \sum_{i=1}^{n} \sigma^2 = \frac{1}{n^2} \cdot n\sigma^2 = \frac{\sigma^2}{n}.
\]
Consequently, $SD(\bar{X}) = \frac{\sigma}{\sqrt{n}}$. As one might expect, the standard deviation of the sample mean X̄ decreases as the sample size gets larger.
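A small simulation illustrates the unbiasedness of X̄ and S² and the formula SD(X̄) = σ/√n. In the sketch below we sample from the uniform distribution on (0, 1), for which µ = 1/2 and σ² = 1/12; the sample size, repetition count, and seed are arbitrary choices of ours.

```python
# Repeated samples of size n from the uniform(0, 1) population.
import random
from math import sqrt

random.seed(1)
n, reps = 10, 20_000
mu, sigma2 = 0.5, 1 / 12

xbars, s2s = [], []
for _ in range(reps):
    xs = [random.random() for _ in range(n)]
    xbar = sum(xs) / n
    xbars.append(xbar)
    s2s.append(sum((x - xbar) ** 2 for x in xs) / (n - 1))

print(sum(xbars) / reps, mu)        # E(X-bar) should be near mu
print(sum(s2s) / reps, sigma2)      # E(S^2) should be near sigma^2
m = sum(xbars) / reps
sd_xbar = sqrt(sum((v - m) ** 2 for v in xbars) / (reps - 1))
print(sd_xbar, sqrt(sigma2 / n))    # SD(X-bar) near sigma / sqrt(n)
```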
Exercises
(1) A random sample of size 100 is taken from a population with mean µ = 90.0
and standard deviation σ = 14.2. Compute the mean and standard deviation
of the sample mean.
Ans. E(X̄) = 90 and SD(X̄) = 1.42
(2) A random sample of size 30 is taken from a population with mean µ = 2.6 and
standard deviation σ = 0.125. Compute the mean and standard deviation of
the sample mean.
Ans. E(X̄) = 2.6 and SD(X̄) = 0.0228
(3) A population has standard deviation σ = 8.5. What's the minimum size of a sample from this population that will produce a sample mean with a standard deviation (a) of less than 1 and (b) of less than 0.5?
Ans.: (a) 73 (b) 290 (a sample of size 289 gives a standard deviation of exactly 0.5)
6.2. Law of Large Numbers

Chebyshev's Inequality.
Suppose k is a positive constant and X is a random variable with finite mean µ and standard deviation σ. Then
\[
P(|X - \mu| \ge k) \le \frac{\sigma^2}{k^2}.
\]
Proof: (For the continuous case; the discrete case is similar.) Suppose the continuous random variable X has pdf f(x), mean µ, and standard deviation σ. Then we have that
\begin{align*}
\sigma^2 &= \int_{-\infty}^{\infty} (x - \mu)^2 f(x)\, dx\\
&\ge \int_{-\infty}^{\mu - k} (x - \mu)^2 f(x)\, dx + \int_{\mu + k}^{\infty} (x - \mu)^2 f(x)\, dx\\
&\ge \int_{-\infty}^{\mu - k} k^2 f(x)\, dx + \int_{\mu + k}^{\infty} k^2 f(x)\, dx\\
&= k^2 \left(\int_{-\infty}^{\mu - k} f(x)\, dx + \int_{\mu + k}^{\infty} f(x)\, dx\right)\\
&= k^2\, P(X \le \mu - k \text{ or } X \ge \mu + k)\\
&= k^2\, P(|X - \mu| \ge k).
\end{align*}
Example 6.1. Scores on a certain standardized exam are 71.1 on average with a standard deviation of 12.1. Find a probability bound for a random score for this exam to be within 15 points of the mean.
By Chebyshev's Inequality, $P(|X - 71.1| \ge 15) \le \frac{12.1^2}{15^2} = 0.6507$, so
\[
P(|X - 71.1| < 15) \ge 1 - 0.6507 = 0.3493.
\]
Exercises
(1) For a random variable X with mean 10 and standard deviation 2, use Cheby-
shev’s Inequality to find an upper bound on the probability that X is either
less than or equal to 7 or greater than or equal to 13.
Ans. 4/9 = 0.4444
(2) If the random variable X is normal with mean 10 and standard deviation 2,
compute the probability that X is either less than or equal to 7 or greater
than or equal to 13.
Ans. 0.1336
(3) If the random variable X is Binomial with mean 12 and standard deviation
3, compute the probability that X is either less than or equal to 8 or greater
than or equal to 16. What is the upper bound that Chebyshev’s Inequality
gives for computing this probability?
Ans. 0.2422, 0.5625
6.3. Moment Generating Functions

The moment generating function of a random variable X is φ(t) = E(e^{tX}). For example, if X is Poisson with parameter λ, then
\begin{align*}
\varphi(t) &= \sum_{n=0}^{\infty} e^{tn} e^{-\lambda} \frac{\lambda^n}{n!}\\
&= e^{-\lambda} \sum_{n=0}^{\infty} \frac{(\lambda e^t)^n}{n!}\\
&= e^{-\lambda} e^{\lambda e^t}\\
&= e^{\lambda(e^t - 1)}.
\end{align*}
The series in the computation is the Maclaurin series for the exponential function, evaluated at λe^t.
Similarly, if X is binomial with parameters n and p, then
\begin{align*}
\varphi(t) &= \sum_{k=0}^{n} e^{tk} \binom{n}{k} p^k (1 - p)^{n - k}\\
&= \sum_{k=0}^{n} (e^t)^k \binom{n}{k} p^k (1 - p)^{n - k}\\
&= \sum_{k=0}^{n} \binom{n}{k} (pe^t)^k (1 - p)^{n - k}\\
&= (pe^t + 1 - p)^n.
\end{align*}
Hence,
\[
\varphi'(t) = n(pe^t + 1 - p)^{n - 1} pe^t
\]
and
\[
\varphi''(t) = n(n - 1)(pe^t + 1 - p)^{n - 2} pe^t \cdot pe^t + n(pe^t + 1 - p)^{n - 1} pe^t,
\]
so φ′(0) = np and φ″(0) = n(n − 1)p · p + np. This gives us
\[
E(X) = np,
\]
and the variance of X is
\[
E(X^2) - (E(X))^2 = n(n - 1)p^2 + np - (np)^2 = n^2 p^2 - np^2 + np - n^2 p^2 = np(1 - p).
\]
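The values φ′(0) = np and φ″(0) = n(n − 1)p² + np can be checked numerically by differentiating φ with finite differences. The parameters n = 10, p = 0.3, and the step h in the Python sketch below are arbitrary illustrative choices of ours.

```python
# Finite-difference check of phi'(0) = np and phi''(0) = n(n-1)p^2 + np
# for the binomial moment generating function.
from math import exp

n, p, h = 10, 0.3, 1e-5

def phi(t):
    return (p * exp(t) + 1 - p) ** n

d1 = (phi(h) - phi(-h)) / (2 * h)                # central first difference
d2 = (phi(h) - 2 * phi(0) + phi(-h)) / h ** 2    # central second difference
print(d1, n * p)                                 # both ~ 3.0
print(d2, n * (n - 1) * p ** 2 + n * p)          # both ~ 11.1
```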
We note two more important facts about moment generating functions. First, they are unique, so if one knows the moment generating function, one can deduce what random variable one is dealing with. Second, the moment generating function for the sum of two independent random variables is the product of the moment generating functions of the two random variables. This latter result has a simple derivation:
\[
\varphi_{X+Y}(t) = E(e^{t(X+Y)}) = E(e^{tX} e^{tY}) = E(e^{tX})\, E(e^{tY}) = \varphi_X(t)\, \varphi_Y(t),
\]
where the independence of X and Y justifies the third equality.
We conclude this section with the formulation of a theorem needed for the proof of the Central Limit Theorem that comes up later in this chapter.

Continuity Theorem for Moment Generating Functions. Suppose X, X1, X2, ... are random variables whose moment generating functions exist on an open interval containing 0 and that
\[
\lim_{n \to \infty} \varphi_{X_n}(t) = \varphi_X(t)
\]
for every t in that interval. Then
\[
\lim_{n \to \infty} P(X_n \le c) = P(X \le c)
\]
at every number c where the distribution function of X is continuous.
Proving the Continuity Theorem is beyond the scope of this text. The same is the case for the fact that moment generating functions are unique. The reader should note that random variables can be studied in a superior fashion via characteristic functions φX(t) = E(e^{itX}), where i is the imaginary number √−1. Understanding the mathematics behind these functions requires knowledge of complex analysis.
Exercises
(1) Calculate the moment generating function for a discrete random variable X
for which P (X = 2) = 3/4 and P (X = 5) = 1/4.
Ans. $\varphi(t) = \frac{3e^{2t}}{4} + \frac{e^{5t}}{4}$
(2) Calculate the moment generating function for a continuous random variable
X with probability density function
\[
f(x) = \begin{cases} 2x & \text{if } 0 \le x \le 1 \\ 0 & \text{otherwise} \end{cases}
\]
Ans. $\varphi(t) = \frac{2(te^t - e^t + 1)}{t^2}$
6.4. Sums of Independent Random Variables

Example 6.4. It is known that there are 19 tornadoes to touch down per year on average in Arkansas and two in Maine. The number of tornadoes to touch down in a region is accurately modeled as Poisson. What's the probability that there will be exactly 20 tornadoes to touch down in Arkansas and Maine combined next year (assuming that the number touching down in Arkansas is independent of the number in Maine)?
Since the sum of two independent Poisson random variables is Poisson with the mean being the sum of the individual means (the product of the moment generating functions is $e^{\lambda_1(e^t - 1)} e^{\lambda_2(e^t - 1)} = e^{(\lambda_1 + \lambda_2)(e^t - 1)}$, again a Poisson moment generating function), the answer is
\[
e^{-(19 + 2)} \frac{(19 + 2)^{20}}{20!} = 0.0867.
\]
The reader will note that we are making the assumption that the numbers of tornadoes to touch down in Arkansas and in Maine are independent.
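The single line of arithmetic is easy to confirm by machine:

```python
# The arithmetic of Example 6.4, checked directly.
from math import exp, factorial

lam = 19 + 2
print(exp(-lam) * lam ** 20 / factorial(20))   # ~ 0.0867
```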
We now determine the probability density function for the sum of n independent normal random variables. We start with a single normal random variable X with mean µ and standard deviation σ. We incorporate the exponent on the factor e^{tx} into the exponent including x² and complete the square in x to obtain the rule:
\begin{align*}
\varphi(t) &= \int_{-\infty}^{\infty} e^{tx} \frac{1}{\sqrt{2\pi}\sigma} e^{-\frac{(x - \mu)^2}{2\sigma^2}}\, dx\\
&= \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}\sigma} e^{-\frac{x^2 - 2(\mu + t\sigma^2)x + \mu^2}{2\sigma^2}}\, dx\\
&= \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}\sigma} e^{-\frac{x^2 - 2(\mu + t\sigma^2)x + (\mu + t\sigma^2)^2}{2\sigma^2} + \frac{2\mu t\sigma^2 + t^2\sigma^4}{2\sigma^2}}\, dx\\
&= e^{\mu t + \frac{t^2\sigma^2}{2}} \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}\sigma} e^{-\frac{(x - (\mu + t\sigma^2))^2}{2\sigma^2}}\, dx\\
&= e^{\mu t + \frac{t^2\sigma^2}{2}} \cdot 1\\
&= e^{\frac{t^2\sigma^2}{2} + \mu t}.
\end{align*}
The reader will note that the integral in the third-to-last line is in fact one since the integrand is the probability density function for a normal random variable with mean µ + tσ² and standard deviation σ.
Note now that for the independent normal random variables X1, X2, ..., Xn, with means µ1, µ2, ..., µn, respectively, and standard deviations σ1, σ2, ..., σn, respectively, we have
\begin{align*}
\varphi_{X_1 + X_2 + \cdots + X_n}(t) &= e^{\frac{t^2\sigma_1^2}{2} + \mu_1 t} \cdot e^{\frac{t^2\sigma_2^2}{2} + \mu_2 t} \cdots e^{\frac{t^2\sigma_n^2}{2} + \mu_n t}\\
&= e^{\frac{t^2(\sigma_1^2 + \cdots + \sigma_n^2)}{2} + (\mu_1 + \cdots + \mu_n)t}.
\end{align*}
This is the moment generating function for a normal random variable with mean µ1 + · · · + µn and standard deviation $\sqrt{\sigma_1^2 + \cdots + \sigma_n^2}$.
Now the sum Z1² + · · · + Zn² of the squares of independent standard normal random variables is referred to as the chi-squared random variable with n degrees of freedom. Note that the moment generating function for a chi-squared random variable with 1 degree of freedom, φ_{Z²}(t), is given by
\[
\varphi_{Z^2}(t) = E(e^{Z^2 t}) = \int_{-\infty}^{\infty} e^{tx^2} \frac{1}{\sqrt{2\pi}} e^{-\frac{x^2}{2}}\, dx = \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}} e^{-\frac{(1 - 2t)x^2}{2}}\, dx.
\]
Letting $u = \sqrt{1 - 2t} \cdot x$, we obtain $dx = \frac{1}{\sqrt{1 - 2t}}\, du$ and
\[
\varphi_{Z^2}(t) = \frac{1}{\sqrt{1 - 2t}} \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}} e^{-\frac{u^2}{2}}\, du = (1 - 2t)^{-\frac{1}{2}}
\]
for t < 1/2. It's therefore the case that the moment generating function for the chi-squared distribution with n degrees of freedom is
\[
\left((1 - 2t)^{-\frac{1}{2}}\right)^n = (1 - 2t)^{-\frac{n}{2}}.
\]
We will refer back to this moment generating function later when we ascertain the underlying distribution of a sample standard deviation.
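A Monte Carlo sketch can illustrate this formula: for a fixed t < 1/2, the sample average of e^{tW} over simulated chi-squared values W should be close to (1 − 2t)^{−n/2}. The choices n = 3, t = 0.1, and the seed below are arbitrary choices of ours.

```python
# Monte Carlo check of the chi-squared mgf (1 - 2t)^(-n/2).
import random
from math import exp

random.seed(2)
n, t, reps = 3, 0.1, 200_000
acc = 0.0
for _ in range(reps):
    w = sum(random.gauss(0, 1) ** 2 for _ in range(n))   # chi-squared(n)
    acc += exp(t * w)
print(acc / reps, (1 - 2 * t) ** (-n / 2))   # both ~ 1.40
```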
Exercises
(1) Suppose X and Y are independent Poisson RV’s with parameters 2 and 3,
respectively. Compute (a) P (X + Y = 6) and (b) P (X + Y ≥ 4).
Ans.: (a) $\frac{3125}{144e^5} = 0.1462$ (b) $1 - \frac{118}{3e^5} = 0.7350$
(2) Calls come in to a customer service center at a rate of 4.3 per minute. Assuming
calls arriving in two different minutes are independent, compute the probabil-
ity that (a) at least six calls come in in a two minute period and (b) exactly
25 calls come in in a five minute period. Hint: Use the Poisson.
Ans.: (a) 0.8578 (b) 0.0607
(3) Suppose X and Y are independent normal random variables with the mean
and standard deviation of X being 10 and 3, respectively, and the mean and
standard deviation of Y being 14 and 4, respectively. Compute (a) P (X +Y >
24) and (b) P (X + Y < 25).
Ans.: (a) 0.5000 (b) 0.5793
(4) Suppose X1 , ..., X10 are independent normal random variables, each with mean
2.0 and standard deviation 1.5. Compute (a) P (X1 + · · · + X10 < 23.5) and
(b) $P\!\left(1.8 \le \frac{X_1 + \cdots + X_{10}}{10} \le 2.2\right)$
Ans.: (a) 0.7697 (b) 0.3267
(5) Suppose X1 , ..., X20 are independent normal random variables, each with mean
0 and standard deviation 2. Compute (a) P (X1 + · · · + X20 < 1) and (b)
$P\!\left(-1.1 \le \frac{X_1 + \cdots + X_{20}}{20} \le 1.1\right)$
≤ 1.1)
Ans.: (a) 0.5445 (b) 0.9861
6.5. The Central Limit Theorem

We note that an equivalent way to write $\frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$ is $\frac{X_1 + \cdots + X_n - n\mu}{\sigma\sqrt{n}}$. The Central Limit Theorem says that for large n this standardized quantity is approximately standard normal. A more precise statement of the theorem is as follows:

The Central Limit Theorem. Suppose X1, X2, ... are independent, identically distributed random variables with mean µ and standard deviation σ (and with a moment generating function defined near 0). Then for every real number c,
\[
\lim_{n \to \infty} P\!\left(\frac{X_1 + \cdots + X_n - n\mu}{\sigma\sqrt{n}} \le c\right) = P(Z \le c),
\]
where Z is standard normal.
Proof: It's sufficient to prove the theorem in the case that µ = 0 because if we let Yn = Xn − µ we have
\[
\lim_{n \to \infty} P\!\left(\frac{X_1 + \cdots + X_n - n\mu}{\sigma\sqrt{n}} \le c\right) = \lim_{n \to \infty} P\!\left(\frac{Y_1 + \cdots + Y_n}{\sigma\sqrt{n}} \le c\right).
\]
We therefore assume µ = 0 and let $Z_n = \frac{X_1 + \cdots + X_n}{\sigma\sqrt{n}}$. Then
\begin{align*}
\varphi_{Z_n}(t) &= E\!\left(e^{tZ_n}\right)\\
&= \varphi_{X_1 + \cdots + X_n}\!\left(\frac{t}{\sigma\sqrt{n}}\right)\\
&= \varphi_{X_1}\!\left(\frac{t}{\sigma\sqrt{n}}\right) \cdots \varphi_{X_n}\!\left(\frac{t}{\sigma\sqrt{n}}\right)\\
&= \left[\varphi_{X_1}\!\left(\frac{t}{\sigma\sqrt{n}}\right)\right]^n.
\end{align*}
We now take a limit of the natural logarithm of φ_{Zn}(t). We have
\begin{align*}
\lim_{n \to \infty} \ln \varphi_{Z_n}(t) &= \lim_{n \to \infty} \ln\left[\varphi_{X_1}\!\left(\tfrac{t}{\sigma\sqrt{n}}\right)\right]^n\\
&= \lim_{n \to \infty} n \ln \varphi_{X_1}\!\left(\tfrac{t}{\sigma\sqrt{n}}\right)\\
&= \lim_{n \to \infty} \frac{\ln \varphi_{X_1}\!\left(\tfrac{t}{\sigma\sqrt{n}}\right)}{1/n}\\
&= \lim_{n \to \infty} \frac{\varphi'_{X_1}\!\left(\tfrac{t}{\sigma\sqrt{n}}\right)\left(-\tfrac{1}{2}\right)\tfrac{t}{\sigma n^{3/2}} \Big/ \varphi_{X_1}\!\left(\tfrac{t}{\sigma\sqrt{n}}\right)}{-1/n^2}\\
&= \lim_{n \to \infty} \frac{\varphi'_{X_1}\!\left(\tfrac{t}{\sigma\sqrt{n}}\right) t}{2\sigma\, \varphi_{X_1}\!\left(\tfrac{t}{\sigma\sqrt{n}}\right) n^{-1/2}}\\
&= \lim_{n \to \infty} \frac{\varphi''_{X_1}\!\left(\tfrac{t}{\sigma\sqrt{n}}\right)\left(-\tfrac{1}{2}\right)\tfrac{t}{\sigma n^{3/2}}\, t}{2\sigma\, \varphi'_{X_1}\!\left(\tfrac{t}{\sigma\sqrt{n}}\right)\left(-\tfrac{1}{2}\right)\tfrac{t}{\sigma n^{3/2}}\, n^{-1/2} + 2\sigma\, \varphi_{X_1}\!\left(\tfrac{t}{\sigma\sqrt{n}}\right)\left(-\tfrac{1}{2}\right) n^{-3/2}}\\
&= \lim_{n \to \infty} \frac{\varphi''_{X_1}\!\left(\tfrac{t}{\sigma\sqrt{n}}\right) \tfrac{t^2}{\sigma}}{2\varphi'_{X_1}\!\left(\tfrac{t}{\sigma\sqrt{n}}\right) \tfrac{t}{\sqrt{n}} + 2\sigma\, \varphi_{X_1}\!\left(\tfrac{t}{\sigma\sqrt{n}}\right)}\\
&= \frac{\sigma^2 \cdot \tfrac{t^2}{\sigma}}{0 + 2\sigma \cdot 1}\\
&= \frac{t^2}{2}.
\end{align*}
The reader will note that L'Hospital's Rule was used in going from line 3 to line 4 and then again from line 5 to line 6. The Rule is applicable since $\lim_{n \to \infty} \ln \varphi_{X_1}\!\left(\tfrac{t}{\sigma\sqrt{n}}\right) = \ln \varphi_{X_1}(0) = \ln 1 = 0$, and because $\lim_{n \to \infty} \varphi'_{X_1}\!\left(\tfrac{t}{\sigma\sqrt{n}}\right) t = \varphi'_{X_1}(0)\, t = \mu t = 0 \cdot t = 0$.
Hence,
\[
\lim_{n \to \infty} \varphi_{Z_n}(t) = e^{\frac{t^2}{2}}.
\]
Since $e^{\frac{t^2}{2}}$ is the moment generating function for the standard normal, we can apply the Continuity Theorem for Moment Generating Functions, and the proof is done.
A different formulation of the Central Limit Theorem is used to solve the next
example problem. We compute the probability that a sum of random variables is
within a certain interval. To apply the Central Limit Theorem, we convert this
sum to an average.
Example 6.6. A parcel carrier handles packages that weigh on average 16.4 lb with a standard deviation of 7.5 lb. What's the probability that the next 50 parcels she handles will weigh less than 750 lb altogether?
Let Xi be the weight in pounds of the ith package. Then the answer is
\begin{align*}
P(X_1 + X_2 + \cdots + X_{50} < 750) &= P\!\left(\frac{X_1 + X_2 + \cdots + X_{50}}{50} < \frac{750}{50}\right)\\
&= P(\bar{X} < 15)\\
&= P\!\left(Z < \frac{15 - 16.4}{7.5/\sqrt{50}}\right)\\
&= P(Z < -1.32)\\
&= 0.093.
\end{align*}
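The standardization is reproduced numerically in the sketch below; Phi is again our own erf-based standard normal cdf.

```python
# The standardization in Example 6.6, checked by machine.
from math import erf, sqrt

def Phi(z):
    return 0.5 * (1 + erf(z / sqrt(2)))

z = (15 - 16.4) / (7.5 / sqrt(50))
print(z, Phi(z))   # z ~ -1.32, probability ~ 0.093
```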
This allows us to write a pair of formulas that are of great use in approximating binomial probabilities with the standard normal. For X binomial with parameters n and p, and with the continuity correction of half a unit,
\[
P(X \le k) \approx P\!\left(Z \le \frac{k + 0.5 - np}{\sqrt{np(1 - p)}}\right)
\quad \text{and} \quad
P(X \ge k) \approx P\!\left(Z \ge \frac{k - 0.5 - np}{\sqrt{np(1 - p)}}\right).
\]
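The sketch below compares an exact binomial probability with this normal approximation for the setting of Exercise 4 below: fifty rolls of a balanced die, counting 6's.

```python
# Exact binomial probability of exactly eight 6's in fifty rolls,
# versus the normal approximation with the continuity correction.
from math import comb, erf, sqrt

n, p = 50, 1 / 6
mu, sd = n * p, sqrt(n * p * (1 - p))

def Phi(z):
    return 0.5 * (1 + erf(z / sqrt(2)))

exact = comb(n, 8) * p ** 8 * (1 - p) ** (n - 8)
approx = Phi((8.5 - mu) / sd) - Phi((7.5 - mu) / sd)
print(exact, approx)   # ~ 0.151 versus ~ 0.149
```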
Exercises
(1) A certain cereal company packages “14 oz” boxes of cereal. These boxes have
a mean weight of 14.1 oz with a standard deviation of 0.42 oz. What’s the
probability that 30 randomly selected boxes of this cereal have an average
weight of at least 13.9 oz?
Ans. 0.995
(2) A bartender pours glasses of wine that are 10.5 oz on average with a standard
deviation of 1.3 oz. (a) What’s the probability that the bartender pours at
least 10 oz for the next customer who requests a glass of wine? (b) What’s
the probability he pours at least 10 oz on average for the next 30 customers
who request a glass of wine? (c) What’s the probability that 430 oz of wine
will suffice for the next 40 customers for whom he pours glasses of wine?
Ans. (a) not enough information given to answer the problem, (b) 0.9824,
(c) 0.888
(3) Hank sells propane tanks that have a mass of 17.46 kg with a standard devi-
ation of 0.12 kg when empty. He loads 50 such tanks on the back of his truck.
What’s the probability that these 50 tanks have a mass of more than 875 kg
combined?
Ans. 0.0092
(4) If you roll a balanced die 50 times, what’s the probability you get (a) exactly
eight 6’s? (b) at least 10 6’s?
Ans. (a) 0.1510 (b) 0.3290
(5) Suppose you take a true false exam by guessing on every question and that
the lowest passing score is 60%. What’s the probability you pass if the exam
consists of (a) 30 questions, (b) 50 questions, (c) 100 questions?
(6) Suppose you take a multiple choice exam by guessing on every question and
that the lowest passing score is 60%. What’s the probability you pass if the
exam consists of 30 questions, each with four answers?
(7) Twelve percent of customers at a fast food restaurant order a root beer.
What’s the probability that at least 30 of the next 200 customers at that
restaurant will order a root beer?
Ans. 0.1157
(8) A facilities worker loads pieces of equipment onto a freight elevator that has a
capacity of 1, 800 pounds. The pieces weigh 56.5 lb on average with a standard
deviation of 18.5 lb. What’s the probability that the next 30 pieces that need
to be loaded on the elevator will be within the weight limit?
Ans. 0.851
(9) A particular laptop comes with a battery that will hold a charge for five hours
and 32 minutes on average with a standard deviation of one hour and 56
minutes. If you purchase 40 such laptops, what’s the probability that they
hold a charge on average for five hours or more?
Ans. 0.959
Chapter 7

Random Processes

7.1. Markov Chains

For a Markov chain X0, X1, X2, ... with the states 1, 2, ..., N and transition matrix P = (pjk), pjk is the probability of going from state j to state k in one step. The notation $p^{(m)}_{jk}$ is used to represent the probability of going from state j to state k in m steps. We have
\[
p^{(m)}_{jk} = P(X_{n+m} = k \mid X_n = j).
\]
We have that $p^{(m)}_{jk}$ is the (j, k) entry of the matrix power $P^m$.
Example 7.1. Suppose that in a certain region the weather is such that a sunny day is followed by a stormy day with probability 1/8 and a stormy day is followed by a stormy day with probability 1/3. Taking a sunny day to be State 1 and a stormy day to be State 2, compute the probability that on the third day after a sunny day it will be stormy.
The transition matrix is
\[
P = \begin{pmatrix} 7/8 & 1/8 \\ 2/3 & 1/3 \end{pmatrix},
\quad \text{and} \quad
P^3 \approx \begin{pmatrix} 0.8435 & 0.1565 \\ 0.8345 & 0.1655 \end{pmatrix}.
\]
Consequently, the answer to the question is $p^{(3)}_{12} = 0.1565$.
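Matrix powers like P³ are easy to compute by machine. The following dependency-free Python sketch reproduces the answer; the helper matmul is our own.

```python
# Example 7.1 checked numerically: raise the transition matrix to the
# third power and read off the (1, 2) entry (index (0, 1) in Python).
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

P = [[7/8, 1/8],
     [2/3, 1/3]]
P3 = matmul(matmul(P, P), P)
print(P3[0][1])   # ~ 0.1565
```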
If it's the case that for some positive integer m we have that
\[
p^{(m)}_{jk} > 0
\]
for all j, k = 1, 2, ..., N, then the Markov chain is said to be ergodic. When the chain is ergodic, the limit
\[
\lim_{n \to \infty} p^{(n)}_{jk} = \pi_k
\]
exists and does not depend on the starting state j (the Ergodic Theorem). The limits form a probability vector π = ⟨π1, π2, ..., πN⟩, which satisfies πP = π.
Example 7.2. Suppose that in a certain region the weather is such that a sunny day is followed by a stormy day with probability 1/8 and a stormy day is followed by a stormy day with probability 1/3. Taking a sunny day to be State 1 and a stormy day to be State 2, compute the probability vector π mentioned in the Ergodic Theorem.
Solving πP = π together with π1 + π2 = 1, the first equation gives (7/8)π1 + (2/3)π2 = π1, so π1 = (16/3)π2 and
\[
\pi = \left\langle \frac{16}{19}, \frac{3}{19} \right\rangle \approx \langle 0.8421, 0.1579 \rangle.
\]
The row vector x = ⟨x1, x2, ..., xN⟩ is said to be the state vector for the Markov chain at a given observation if the kth component xk is the probability the system is in the kth state at that time. To ascertain the state vector for a specific observation we use the transition matrix: if $x^{(0)}$ is the initial state vector, then the state vector after n steps is $x^{(n)} = x^{(0)} P^n$.
Example 7.3. For the weather chain above, compute the state vector on the fourth day after a sunny day.
Starting with $x^{(0)} = \langle 1, 0 \rangle$, we compute
\[
x^{(4)} = x^{(0)} P^4 \approx \langle 0.842, 0.158 \rangle.
\]
Hence, there's about an 84.2% chance it will be sunny and a 15.8% chance it will be stormy on the fourth day.
Example 7.4. Consider a random walk on the real line in which an entity either moves one step to the right with probability p or one step to the left with probability 1 − p. We compute the transition probability for going from state i to state j in exactly n steps.
To compute $p^{(n)}_{ij}$ we note that the transition has to consist of k steps to the right, where k is a nonnegative integer less than or equal to n, and n − k steps to the left. This has to happen in such a way that
\[
i + k - (n - k) = j.
\]
This implies 2k = n − i + j or
\[
k = \frac{n - i + j}{2}.
\]
Hence, the transition probability for n steps is
\[
p^{(n)}_{ij} = \begin{cases} \binom{n}{\frac{n - i + j}{2}}\, p^{\frac{n - i + j}{2}} (1 - p)^{n - \frac{n - i + j}{2}} & \text{if } \frac{n - i + j}{2} = 0, 1, 2, \ldots, \text{ or } n \\ 0 & \text{otherwise} \end{cases}
\]
Exercises
(1) Suppose a Markov chain X0 , X1 , ..., has the two states 1 and 2, and transition
matrix
\[
P = \begin{pmatrix} 0.6 & 0.4 \\ 0.3 & 0.7 \end{pmatrix}.
\]
(a) What’s the probability of going from State 2 to State 1 in one step? (b)
What’s the probability of going from State 2 to State 1 in two steps? (c) Is
the process ergodic? (d) If it is ergodic, compute the stable state vector π
with each component to four decimal places. (e) If the process starts with
state vector $x^{(0)} = \langle 1/2, 1/2 \rangle$, compute the state vector $x^{(2)}$.
Ans. (a) 0.3 (b) 0.39 (c) Yes (d) ⟨0.4286, 0.5714⟩ (e) ⟨0.435, 0.565⟩
(2) Suppose a Markov chain X0 , X1 , ..., has the three states 1, 2, and 3, and
transition matrix
\[
P = \begin{pmatrix} 1/2 & 1/4 & 1/4 \\ 1/3 & 1/2 & 1/6 \\ 1 & 0 & 0 \end{pmatrix}.
\]
(a) What’s the probability of going from State 2 to State 1 in one step? (b)
What’s the probability of going from State 2 to State 1 in two steps? (c) Is
the process ergodic? (d) If it is ergodic, compute the stable state vector π with each component to four decimal places. (e) If the process starts with state vector $x^{(0)} = \langle 0, 1, 0 \rangle$, compute the state vector $x^{(2)}$.
Ans. (a) 1/3 (b) 1/2 (c) Yes, since every entry in P² is positive
(d) ⟨0.5455, 0.2727, 0.1818⟩ (e) ⟨1/2, 1/3, 1/6⟩
(3) Place $100 bets on red in roulette so that you can be in a state of having either $0, $100, $200, $300, $400, or $500 in funds. Once you run out of money or accumulate $500 you stay where you are. (a) Write out the transition matrix for the six states. (b) If you start with $300, what's the probability you'll be broke after betting five times?
Ans. (a)
\[
P = \begin{pmatrix}
1 & 0 & 0 & 0 & 0 & 0 \\
20/38 & 0 & 18/38 & 0 & 0 & 0 \\
0 & 20/38 & 0 & 18/38 & 0 & 0 \\
0 & 0 & 20/38 & 0 & 18/38 & 0 \\
0 & 0 & 0 & 20/38 & 0 & 18/38 \\
0 & 0 & 0 & 0 & 0 & 1
\end{pmatrix}
\]
(b) 0.2548
(4) In a random walk, assume a particle can move one step to the right with probability 0.52 and one step to the left with probability 0.48. What then is the probability that the particle will move two steps to the right (a) in 20 steps and (b) in five steps?
Ans. (a) $\binom{20}{11}(0.52)^{11}(0.48)^9 = 0.1708$ (b) 0
Index
uniform distribution, 51
variance, 27, 31