Probability Lecture Notes
Probability Lecture Notes
Probability
Lecture Notes & Exercises
Preface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iii
0 Counting 1
Lecture 01 - Experiments and Outcomes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
Lecture 02 - The Fundamental Counting Principle . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
Lecture 03 - Permutations and Combinations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
Lecture 04 - The Binomial Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
Lecture 05 - Sets and Set Cardinalities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
1 Probability 27
Lecture 06 - Sample Spaces and Events . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
Lecture 07 - Axiomatic Probability Theory and Other Approaches . . . . . . . . . . . . . . . . . . 31
Lecture 08 - Calculating Probabilities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
Lecture 09 - Conditional Probability and Independent Events . . . . . . . . . . . . . . . . . . . . . 43
Lecture 10 - Bayes’ Rule . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
iii
Preface
This text is intended for the undergraduate course MATH-C 228 Probability in the Bachelor of Science in
Mathematics (BS Math) program of the Department of Mathematics and Statistics, College of Arts and
Sciences of Cebu Technological University - Main Campus.
The text is composed of lecture notes and worksheets for 32 lectures divided among 9 chapters in the
course. These contents are compliant to the Course Outline for Probability required by CHED Memorandum
Order No. 48, s. 2017 (PSGs for the BS Math/BS Applied Math program) with an added chapter (Chapter
0 - Counting) on counting techniques and basic combinatorial analysis as a preliminary chapter.
CMO No. 48, s. 2017 provides the following Learning Outcomes for Probability:
(3) define and give examples of mutually exclusive events and independent events;
(4) define a random variable and explain its usefulness in computing probabilities fo events;
(5) enumerate the properties of a cumulative distribution function and probability distribution function;
(6) derive the probability distribution function from the cumulative distribution function and vice versa;
(7) name some commonly used special discrete and continuous distributions and their properties;
(10) derive the distribution of a function of random variables using different techniques;
(12) explain and give the properties of a joint cumulative distribution function and joint probability
distribution;
(15) compute mathematical and conditional expectations involving functions of a random vector;
(16) construct sampling distributions and compute their means and variances;
(17) explain the law of large numbers and the Central Limit Theorem; and,
iv
These objectives are divided among Chapters 1 to 8 of the course contents. For Chapter 0 - Counting, the
following Learning Outcomes are added:
The Learning Outcomes are specified at the beginning of each chapter in this text. Furthermore, at the
beginning of each lecture, Lecture Objectives which address the chapter’s Learning Outcomes are enumerated.
This is then followed by Lecture Notes which presents the body of the lesson. The Lecture Notes is usually
composed of definitions, discussions, theorems, examples, and exercises. This is then followed by the Lecture
Worksheet which is further divided into three parts: Basic Concepts, Problem Solving, and Theoretical Work.
Basic Concepts ask questions regarding fundamental understanding of the lecture. Problem Solving gives
questions for which the tools developed in the Lecture Notes are to be applied. Lastly, the Theoretical Work
provides higher level theoretical development exercises which are mostly on proving or extending theorems,
developing new probability theory tools, and abstract reasoning.
Questions in the Basic Concepts are worth between one and three points depending on their level of
difficulty. Questions in Problem Solving are worth five points and are scored based on the following rubrics:
Theoretical Work questions are worth between five and ten points depending on the level of difficulty. They
are scored based on the following rubrics.
Very Very
Satisfactory Satisfactory Fair Unsatisfactory Unsatisfactory
Accuracy (40%) Every line in At most 75% of At most 50% of At most 25% of Every line in
the proof is the lines in the the lines in the thhe lines in the the proof is in-
mathematically proof are math- proof are math- proof are math- accurate
and logically ematically and ematically and ematically and
accurate logically accu- logically accu- logically accu-
rate rate rate
Completeness (30%) The proof is The proof The proof lacks The proof lacks The proof has
complete with lacks definition at most half of at most 75% of no body to ar-
parts including of new nota- its body its body rive at the con-
definition of tions (when clusion
new notations, applicable),
ypothesis, hypothesis, or
body, and conclusion but
conclusion complete with
its body
Readability (20%) The proof The proof The proof The proof There are no
is complete has appropri- has appropri- has appropri- appropriate
with appro- ate strings of ate strings of ate strings of strings of ex-
priate strings explanations explanations explanations planations
of explana- between at between at between at between the
tions between most 75% of most 50% of most 25% of lines of the
lines of proof the lines the lines the lines proof
for effective
communication
v
Organization (10%) The proof The lines of The proof is The proof The proof fails
is neat and the proof are neat but the is not neat to provide
the lines are well-organized lines are either and the lines any system of
well-organized and logically unorganized are either addressing the
and logically ordered but not or illogically unorganized given theorem
ordered neat ordered or illogically
ordered
vi
Chapter 0
Counting
This chapter is covers basic combinatorial analysis and set theory. Here, we layout some basic counting tech-
niques which will be used in calculating probabilities in Chapter 1. We will also review some basic concepts
in set theory especially on basic terms, set operations, and some important theorems and properties of sets
and set operations.
Lectures
Lecture 1. Experiments and Outcomes
Lecture 2. The Fundamental Counting Principle
Lecture Objectives
(1) Define experiments and outcomes
In this course, we will deal with what are called random experiments. Probability theory is the study of the
associated likelihood of occurrence of events in these experiments. To begin our understanding of the theory
of probability, we define random experiments and sample spaces.
(1) traffic management: testing new routes for PUJs to improve traffic flow
(2) disaster mitigation: proposing disaster and emergency plans to mitigate dangers from natural and
man-made disasters such as typhoons, earthquakes, and fire
(3) crime control : national and local government legislating new laws, regulations, and ordinances as
preventive measures for crimes
(4) promoting quality education: reviewing and revising curricula (or the entire education system itself)
in order to improve the quality of education in the country
In science, repeated experimentation is also done to aid scientific investigation. This includes the following
examples given by Hogg, McKean, & Craig (2018):
Note. An important property of an experiment is that before it is performed, there is no certainty of what
the outcome will be.
A random experiment is an experiment, where every possible outcome is known.
2
Definition 1.1. The set of all possible outcomes of a random experiment is called its sample space, denoted
S. Each possible outcome of a random experiment (i.e., each element in the sample space) is called a sample
point.
Example 1.2. In the experiment of tossing a coin, the sample space is:
S = {H, T }
where H, T are the sample points representing heads and tails, respectively.
(2) Suppose an experiment involves rolling a die and tossing a coin. Then, we have:
S = {(1, H), (2, H), (3, H), (4, H), (5, H), (6, H), (1, T ), (2, T ), (3, T ), (4, T ), (5, T ), (6, T )}.
Exercise. There are two bulbs in a room which may be turned ON (1) or OFF (0). List all the possible
cases for both bulbs.
Example 1.6. The sample space of tossing a coin thrice is:
S = {(H, H, H), (H, H, T ), (H, T, H), (H, T, T ), (T, H, H), (T, H, T ), (T, T, H), (T, T, T )}.
Exercise. In ordering a burger, there are three sizes: small, large, and medium, two kinds of patty: beef
and chicken, and two kinds of sauce: mayonnaise and cheese. Write out all possible choices of burgers.
Now we introduce a more systematic way of writing out the sample space of a given experiment: tree
diagramming. A tree diagram is a tree graph starting from one vertex branching out to levels. Each
level represents a subexperiment. The tree diagram is advantageous for experiments involving multiple
subexperiments and outcomes.
3
Example 1.7. The outcomes of tossing a coin thrice can be determined by the following tree diagram.
Thus, S = {(H, H, H), (H, H, T ), (H, T, H), (H, T, T ), (T, H, H), (T, H, T ), (T, T, H), (T, T, T )}.
Exercise. Francis has three shirts: blue, red, and yellow, and two pants: long and short. Use tree diagram-
ming to determine the sample space of what possible outfits Francis can wear.
Lastly, we discuss another systematic way of obtaining the sample space – the tabular method. The
tabular method is mostly derived from the notion of a Cartesian product. For an experiment with two
subexperiments, the sample points can be derived from the Cartesian product S1 × S2 , where S1 and S2 are
the sample spaces of the two subexperiments.
The tabular method can also be extended for experiments with more than two subexperiments as we will
illustrate later.
Example 1.8. Suppose an urn contains three chips numbered 1,2, and 3. An experiment involves picking
up two of the chips with replacement (i.e., the first chip is returned to the urn after it was picked). Then the
sample space can be obtained from the following table:
1 2 3
1 (1, 1) (1, 2) (1, 3)
2 (2, 1) (2, 2) (2, 3)
3 (3, 1) (3, 2) (3, 3)
Thus, S = {(1, 1), (1, 2), (1, 3), (2, 1), (2, 2), (2, 3), (3, 1), (3, 2), (3, 3)}.
When the experiment involves more than two subexperiments, we use multiple tables as illustrated below.
H T
H T (H,H) (H, H, H) (H, H, T )
H (H, H) (H, T ) (H,T) (H, T, H) (H, T, T )
T (T, H) (T, T ) (T,H) (T, H, H) (T, H, T )
(T,T) (T, T, H) (T, T, T )
Hence, S = {(H, H, H), (H, H, T ), (H, T, H), (H, T, T ), (T, H, H), (T, H, T ), (T, T, H), (T, T, T )}.
Exercise. Use tabular method to obtain the sample space of tossing a die twice.
4
Evaluation
A. Basic Concepts (2 points each)
(1) What is an experiment? Give one example.
(2) What is a random experiment?
(3) What are sample spaces and sample points?
(4) What are some techniques to obtain the sample space of an experiment and how is each done?
B. Problem Solving (5 points each)
(1) A code involves four digits of numbers from the set {0, 1, 2}. Use tree diagramming to obtain all
possible codes.
(2) Use listing method to obtain the sample space of picking a letter from the vowels of the English
alphabet.
(3) In buying a particular car model, Jay may choose between automatic or manual transmission, and
among four colors, black, white, sliver, and maroon. Use tabular method to obtain the sample space
of all choices that Jay has.
Notes
(1) Answer the Evaluation in sheets of short-sized white bond papers. They will be compiled at the end of
the term.
5
Lecture 02 - The Fundamental Counting Principle
Lecture Objectives
(1) State, prove, and apply the Fundamental Counting Principle
(2) Generalize and modify the Fundamental Counting Principle for specific cases of counting problems
Hence, there are 165 possible ways to choose a Prince Charming and a Muse.
Example 2.2. How many possible outcomes are there in rolling a die twice?
Solution. 6 6 = 36
Example 2.3. How many possible outcomes are there in tossing a coin twice?
Solution. 2 2 =4
Example 2.4. How many possible outcomes are there in rolling a die and tossing a coin?
Solution. 6 2 = 12
Exercise. Robert has 7 shirts and 3 pairs of pants. How many outfits can he possibly wear?
Proof (Fundamental Counting Principle). Let S1 = {x1 , x2 , . . . , xm } be the sample space of the first subex-
periment and S2 = {y1 , y2 , . . . , yn } be the sample space of the second subexperiment.
Then, the sample space of doing both subexperiments is given by
S1 × S2 = {(x, y) | x ∈ S1 , y ∈ S2 }.
From Set Theory, we know that |S1 × S2 | = |S1 ||S2 |. Therefore, |S1 × S2 | = mn.
6
The Fundamental Counting Principle can be extended to deal with experiments having more than two
subexperiments. This result is called the Generalized Fundamental Counting Principle (GFCP).
Theorem 2.2 (Generalized Fundamental Counting Principle). If an experiment is composed of r subex-
periments, having n1 , n2 , . . . , nr possible outcomes, respectively, then the experiment has n1 n2 · · · nr possible
outcomes.
Example 2.5. A customer who orders pizza can choose two types of crusts: thick and thin; three sizes:
small, medium, and large; and three types of toppings: Hawaiian, pepperoni, and BBQ. How many choices
of pizza are there in all?
Solution. There are three subexperiments namely type of crust, size, and type of toppings. Hence, we
need three boxes.
2 2 3 = 18
Exercise.
(1) In a city, there are two malls; each mall has three cinemas; and in each cinema, there are two kinds
of movie seats: deluxe and premium. How many ways can someone watch a movie?
(2) How many possible outcomes are there in rolling a die thrice?
(3) How many possible outcomes are there in tossing a coin thrice?
Example 2.6. An ATM Card is secured by a 4-digit PIN code. How many possible PIN codes can be made?
Solution.
10 10 10 10 = 10000
Sometimes, counting problems include restrictions such as the following.
Example 2.7. An ATM Card is secured by a 4-digit PIN code. How many possible PIN codes can be made
if the last digit must be even?
Solution.
10 10 10 5 = 5000
Example 2.8. For lunch, Elvi has to choose a main dish, a side dish and the type of rice. There are four
choices of main dish namely pork, fish, chicken, and beef; three choices of side dish, namely beans, vegetables,
and corn soup; two choices of types of rice namely plain and java. However, due to Elvi’s diet plans, she is
not allowed to eat the fish-beans-plain rice combo. How many choices of lunch combo does Elvi have?
Solution.
4 3 2 −1 = 23.
Suppose an experiment is to be repeated, i.e., to be performed multiple times. We say that the experi-
ment is with replacement if the outcome that occured in a repetition could still possibly occur in the next
repetitions. Otherwise, we say it is without replacement.
Example 2.9. Suppose two cards are drawn from the standard deck. How many possible outcomes are there
if they are drawn (a) with replacement, (b) without replacement?
Solution.
(a) 52 52 = 2704
7
(b) 52 51 = 2652
Example 2.10. Ten runners are racing to the Finish Line. How many possible ways can the Gold, Silver,
and Bronze medals be awarded?
Solution. 10 9 8 = 720
Example 2.11. How many 4-digit PIN Codes can be made if the first and last digits must be the same, and
the second and third must not be equal?
Solution. 10 10 9 1 = 900
Exercise. France, Bert, John, and Jules are members of a rock band. France can be the vocals and can
play any instrument. Bert can only play either drums or guitars. John and Jules both can be vocals and
play any instrument except the keyboard. How many ways can they arrange themselves in a performance?
n 0 1 2 3 4 5 6 7 8 9 10
n! 1 1 2 6 24 120 720 5040 40320 362880 3628800
(1) 7! 6! + 5!
(3)
4!
5!
(2)
3!
Solution.
Exercise. Evaluate:
(1) 8! 6!
(3)
4!2!
10! n!
(2) (4)
8! (n − 2)!
8
2.3 Linear Arrangements
Consider the following problem:
Example 2.13. How many ways can Alex, Bert, Charles, and Dennis arranged themselves if they sit in a
row?
Solution. 4 3 2 1 = 24
This can also be solved using the factorial 4! = 24.
Theorem 2.3 (Linear Arrangements). There are n! ways to rearrange n distinct objects.
Proof. Let there be n distinct objects. Then there are n possible objects that can placed in the first position.
Since this object cannot be placed again in the next position, there are n − 1 remaining objects that can
possibly be placed second. Furthermore, there are n − 2 on that can be placed next, and so on. This manner
continues until the last position in which only one object can possibly be placed. By GFCP we find that
there are
Exercise. How many ways can we rearrange the letters of the word CEBUANO?
Example 2.14. How many ways can we rearrange the letters of the word:
(a) WORD;
(b) WOOD?
Solution.
(a) 4! = 24
Thus, there are 12 ways to rearrange the letters of the word WOOD.
Theorem 2.4 (Linear Arrangement with Indistinct Elements). Let there be n distinct objects, each object
appearing r1 , r2 , . . . , rn times, respectively. Then, there are
(r1 + r2 + · · · + rn )!
r1 !r2 ! · · · rn !
9
Proof. Let there be n distinct objects each appearing r1 , r2 , . . . , rn times, respectively. Thus, there are
r1 + r2 + · · · + rn objects in all and by Linear Arrangement Theorem, there are (r1 + r2 + · · · + rn )! ways to
rearrange them disregarding the indistinctness of some elements.
Now by Linear Arrangement Theorem again, there are ri ! ways to rearrange the ith object. Since they
are indistinct, we have to deduct all these rearrangements of the ith object (for i = 1, 2, . . . , n), and that is
by dividing the total number of arrangements by r1 !r2 ! · · · rn !. Hence, there are
(r1 + r2 + · · · + rn )!
r1 !r2 ! · · · rn !
possible ways to rearrange them.
Example 2.15. How many ways can we rearrange the letters of the words
(a) WOOD;
(b) CRITIC?
Solution.
4!
(a) = 12
2!
6!
(b) = 180
2!2!
Exercise. How many ways can we rearrange the letters of the words:
(a) BOBBY;
(b) ELEMENTAL;
(c) PROBABILITY?
Example 2.16. How many ways can Alex, Bob, Charles, and Dennis arrange themselves around a table?
10
Theorem 2.5 (Circular Arrangements). There are (n − 1)! ways to arrange n objects around a circle
(assuming rotations are indistinct).
Proof. When n objects are to form a line, there are n! ways to arrange them. When ends of this line segment
are joined, we form a circle. This circle has n indistinct rotations that we have to remove from the total
n!
number of arrangements. Thus, there are = (n − 1)! ways to arrange them.
n
Example 2.17. How many ways can Alex, Bob, Charles, and Dennis arrange themselves around a circle:
(c) if Alex and Bob choose not to sit beside each other?
Solution.
(a) (4 − 1)! = 6
(b) (3 − 1)!2! = 4
Exercise. How many ways can Alex, Bob, Charles, Dennis, and Eugene arrange themselves around a circle
if:
(c) Alex and Bob choose not to sit beside each other;
Evaluation
A. Basic Concepts. (2 points each)
(1) State the Fundamental Counting Principle and the Generalized Fundamental Counting Principle.
(2) Explain how each are calculated: (a) linear arrangements; (b) linear arrangements with indistinct
elements; (c) circular arrangements.
(3) What is the difference between an experiment with and without replacement?
(4) Explain how one evaluates the factorial of a positive integer.
11
(4) Dave assembles a new desktop PC. There are 4 choices of monitor brands, 2 choices of keyboards, 1
choice of mouse, 4 choices of processors and 5 choices of mother How many ways can the letters of
the word COMMITTEE be rearranged?
(5) there are two 1-Peso coins, three 5-Peso coins, and two 10-Peso coins. How many ways can they be
piled up in one pile?
(6) How many ways can three friends sit around a circle?
(7) How many ways can six friends sit around a circle if two of them choose not to sit beside each other?
(8) How many 5-digit PIN Codes can possibly be created if the first digit must be prime, the second and
third must be odd, and the fourth and fifth cannot be the same digits?
(9) How many ways can a class with 13 students elect their president, vice-president, secretary, treasurer,
and auditor? (no student can serve two positions)
(10) How many ways can 5 friends be seated in a row?
(11) Suppose there are 3 pairs of couples. How many ways can they arrange themselves in a row if each
couple must sit beside each other?
C. Theoretical Work. (10 points each)
12
Lecture 03 - Permutations and Combinations
Lecture Objectives
(1) Define permutations and combinations
(2) Use permutation and combination functions to solve relevant counting problems
(3) Generalize and modify the concept of permutation and combination function for specific cases of problems
In addition to the counting techniques already discussed in the last lecture, we present here two important
counting functions namely permutation and combination. We start this lecture with a classical example
called the Handshake Problem.
Example 3.1 (The Handshake Problem). Five gentlemen attended a party. Each man shook hands with
every other man. How many handshakes were there in all?
Solution. The five vertices below represent the five men while the edges represent the handshakes:
Counting the edges, there are 10. Hence, there were 10 handshakes in all.
3.1 Permutations
Recall from Lecture 02 that there are n! ways to arrange n distinct objects. In this part of the lecture we
are to deal with n distinct objects and we take r of them (0 ≤ r ≤ n) to arrange. How many arrangements
are possible?
Example 3.2. How many ways can gold, silver, and bronze medals be awarded to four racers?
Solution. We denote the racers by A, B, C, D. The possible outcomes are:
ABC BAC CAB DAB
ACB BCA CBA DBA
ABD BAD CAD DAC
ADB BDA CDA DCA
ACD BCD CBD DBC
ADC BDC CDB DCB
13
This means that there are 24 possible outcomes.
The quicker way to solve the problem is through the permutation function which will be introduced by
the following theorem.
n!
Theorem 3.1. Let there be n distinct objects. Then there are ways to arrange r of them (0 ≤ r ≤ n).
(n − r)!
Proof. Let there be n objects. If we arrange r of them, then by GFCP, there are n(n − 1)(n − 2) · · · (n − k + 1)
ways to arrange them. Using the factorial notation, this number is equal to
n!
.
(n − r)!
4!
4 P3 =
(4 − 3)!
4!
=
1!
= 24.
Exercise.
(1) A valedictorian and a salutatorian have to be selected from a group of 10 students. How many possible
ways can this be done?
(2) A building owner hired 8 guards. Five of the guards have to be assigned to patrol a five-storey building
(one guard per floor), and the remaining will be reserved at the guard house. How many ways can they
be assigned?
(3) In a lottery, five balls in exact order are to be drawn from a total of 20 numbered balls. How many
possible winning entries are there?
3.2 Combinations
We have dealt with problems involving arrangements. In this section, we will look at problems in which the
arrangement or order of the elements does not matter. See the problem below.
Example 3.3. How many ways can we select three letters from the list: A, B, C,D, and E?
14
Hence, there are 10 possible selections.
Observe that for the selection ABC, its rearrangements namely ACB, BAC, BCA, CAB, and CBA did not count.
In other words, all 3! rearrangements of each selection count as one.
In general, if we select r objects from a set containing n distinct ones, we can take all possible permutations
n Pr and then divide by r!, the number of ways to rearrange each selection. This is called the combinatorial
function.
n!
n Cr = .
r!(n − r)!
Proof. We know that there are n Pr permutations. There are r! identical cases when arrangement does not
matter. Thus, there are
n Pr n!
= =n Cr
r! r!(n − r)!
possible outcomes.
5!
5 C3 =
3!(5 − 3)!
5!
=
3!2!
5 · 4 · 3!
=
3! · 2
= 10
Exercise.
(1) How many ways can we select 5 members to form a dance group from a class of 10 students?
(2) John wants to purchase 3 colors of paint for his school project. At the store, there are 7 colors available.
How many choices does John have?
15
Solution. Arrangement does not matter. Hence,
12 12!
=
5 5!(12 − 5)!
12 · 11 · 10 · 9 · 8 · 7!
=
5 · 4 · 3 · 2 · 1 · 7!
= 792
Example 3.5. From a class with 12 students, 5 have to be chosen to become the president, vice-president,
secretary, treasurer, and auditor. How many choices are there?
Solution. In this problem, the arrangement matters.
12!
12 P5 =
(12 − 5)!
12 · 11 · 10 · 9 · 8 · 7!
=
7!
= 95040
Exercise. Identify whether the following experiments are permutations or combinations, then solve the
problem.
(1) A class of 10 students has to elect a president, vice-president, secretary, treasurer, and auditor. How
many ways can these officers be selected?
(2) How many ways can 3 items be selected from a group of 6 items?
(3) Suppose that a bank serves 50 accounts. They wanted to randomly choose a sample of 10 accounts to
study in order to better understand the behavior of their clients. How many possible sets of samples can
be taken?
(4) How many ways can 3 cards be drawn sequentially from the standard deck?
Evaluation
A. Basic Concepts (2 points each)
(1) Differentiate when to use the permutation and combination functions.
B. Problem Solving (5 points each)
(1) A pet shop sells hamsters. If they have 15 available hamsters, how many choices do we have if we
want to buy 5?
(2) Ten chips are numbered 1 to 10. If three chips are to be drawn in exact order, how many possible
outcomes are there?
(3) International airports around the world are coded using three letters. How many possible codes can
be generated?
(4) How may possible winning combinations are there in the 6/42 Lotto draw?
C. Theoretical Work (10 points each)
(1) Generalize and give an explicit formula to solve the Handshake Problem for any number of gentlemen
attending the party.
16
Lecture 04 - The Binomial Theorem
Lecture Objectives
(1) State, prove and apply the binomial theorem
We first present an important identity formed by binomial coefficients and this use this to construct the
famous Pascal Triangle.
Proof. We have,
n n n! n!
+ = +
r r+1 r!(n − r)! (r + 1)!(n − r − 1)!
n! 1 1
= +
r!(n − r − 1)! n − r r + 1
n! n+1
=
r!(n − r − 1)! (n − r)(r + 1)
(n + 1)!
=
(r + 1)!(n − r)!
(n + 1)!
=
(r + 1)! [(n + 1) − (r + 1)]!
n+1
= .
r+1
The primary application of this identity is the construction of the Pascal Triangle:
17
From the Pascal Triangle, nr is the rth entry of the nth row (note however that we start counting with
In this section, we state and prove three basic properties of the binomial coefficient.
(i) n0 = nn = 1;
(ii) n1 = n−1
n
= n; and,
(iii) nr = n−r
n
.
n
n! n!
(i) 0 = = =1
0!(n − 0)! n!
n
n! n!
n = n!(n − n)! = n! = 1
n
n! n!
(ii) 1 = = =n
1!(n − 1)! (n − 1)!
n
n! n!
n−1 = (n − 1)![n − (n − 1)]! = (n − 1)! = n
n
n! n! n
(iii) n−r = = = r .
(n − r)![n − (n − r)]! r!(n − r)!
18
Theorem 4.3 (Binomial Theorem). Let x and y be variables and n any nonnegative integer. Then,
n
X n n−r r
(x + y)n = x y .
r=0
r
In other words,
n n n n−1 n n−2 2 n n
(x + y)n = x + x y+ x y + ··· + y .
0 1 2 n
Proof. (Binomial Theorem) When n = 0, the proof is trivial. For n > 0, we will use induction. When n = 1,
we have
1 n 1 1 1
(x + y) = x + y = x + y .
0 1
Suppose that for n = k,
k
k
X k
(x + y) = xk−r y r .
r=0
r
Then, for n = k + 1,
19
(2) (2x + y)3 = (2x)3 + 3(2x)2 y + 3(2x)y 2 + y 3 = 8x3 + 12x2 y + 6xy 2 + y 3
Exercise. Expand:
(1) (x + y)5
(3) (x + y + z)
The Binomial Theorem also has applications for theoretical work including the following corollary which
may be llustrated by summing up the rows of the Pascal Triangle.
Example 4.2. Team A and Team B play 5 basketball games. How many ways can A win four times and B
once?
So there are 5.
Using the Binomial Theorem, the numerical coefficient of the term with A4 B in the expansion of (A+B)5
gives the answer which is 5. Generally, for two outcomes A and B of an experiment n times, there are nr
ways for A to occur r times and B to occur n − r times. So for Example 4.2, we have
implying that there is 1 way for A to occur 5 times, 5 ways for A to occur 4 times and B once, 10 ways for
A to occur thrice and B twice, and so on.
Exercise. A coin is tossed 4 times. Give the number of ways for heads to occur 0, 1, 2, 3, and 4 times.
20
Evaluation
A. Basic Concepts (2 points each)
(1) Explain how the Pascal’s Triangle is constructed.
(2) How is the Pascal Triangle used?
(3) Illustrate Theorem 4.2 by reflecting it to the Pascal Triangle.
(4) How do we use the Binomial Theorem in expanding binomial powers?
(5) How do we use the Binomial Theorem as a counting technique?
B. Problem Solving (5 points each)
(1) Construct the Pascal Triangle down to n = 10.
(2) Expand:
(a) (x + y)7
(b) (2x − y)6
(c) (x − y + z)3
(3) Garry and Bobby play 5 chess games. How many of each possible outcome are there (Win-Draw-
Loss)?
C. Theoretical Work (10 points each)
n n(n − 1)
(1) Show that 2 = .
2
Pn
(2) Prove that k=1 (−1)k nk = 0.
Pn
(3) Prove that k=1 2k nk = 3n .
21
Lecture 05 - Sets and Set Cardinalities
Lecture Objectives
(1) Review basic concepts in set theory including basic definitions, set operations, and some important
theorems
Set theory is essential to perhaps all areas of mathematics. Most mathematicians believe that sets are the
second or third most important objects in mathematics after numbers or functions. Probability, being built
over sample spaces which are also sets, of course needs sufficient fundamental concepts of sets. In this lecture
we give a quick review of the most important lessons in Set Theory including basic definitions, set operations,
and set cardinalities which is also a topic on counting.
Example 5.1. Write the set of all vowels in the English alphabet.
Solution. S = {a, e, i, o, u}
A set B is said to be a subset of A if every element of B is also in A. We write B ⊆ A. In other words,
B ⊆ A iff x ∈ B ⇒ x ∈ A.
A = B iff A ⊆ B and B ⊆ A.
22
5.2 Set Operations
We will now go on the algebra of sets. Along our discussions, we will accompany set operations with Venn
diagrams. We will define four basic set operations followed by some of the most important identities. For all
the following definitions, we let A and B be sets.
Definition 5.1. The union of A and B, denoted A ∪ B, is the set of all elements of A or B. In other
words,
A ∪ B = {x | x ∈ A or x ∈ B} .
Definition 5.2. The intersection of A and B, denoted A ∩ B, is the set of all elements of A and B. In
other words,
A ∩ B = {x | x ∈ A and x ∈ B} .
Definition 5.3. The (universal) complement of A is the set of all elements that are not in A. In other
words,
Ac = {x | x ∈
/ A}
Definition 5.4. The relative complementof B from A is the set of all elements of A that are not in B.
In other words,
A − B = {x | x ∈ A and x ∈/ B} .
23
Example 5.2. Let U = {1, 2, 3, 4, 5, 6, 7, 8, 9, 19}, A = {2, 3, 4, 5, 6}, B = {5, 6, 7, 8, 9}, and C = {2, 4, 6, 8}.
Find:
(1) A ∪ B
(2) A ∩ C
(3) Ac
(4) B − C
Solution.
(1) A ∪ B = {2, 3, 4, 5, 6, 7, 8, 9}
(2) A ∩ B = {2, 4, 6}
(3) Ac = {1, 7, 8, 9, 10}
(4) B − C = {5, 7, 9}
Theorem 5.1. Let A, B, and C be sets. Then,
(i) (Identity Laws) A ∪ ∅ = A, A ∩ U = A;
(ii) (Domination Laws) A ∩ ∅ = ∅, A ∪ U = U ;
(iii) (Complement Laws) A ∪ Ac = U, A ∩ Ac = ∅;
(iv) (Idempotent Laws) A ∪ A = A, A ∩ A = A;
(v) (Associativity Laws) A ∪ (B ∪ C) = (A ∪ B) ∪ C, A ∩ (B ∩ C) = (A ∩ B) ∩ C;
(vi) (Commutativity Laws) A ∪ B = B ∪ A, A ∩ B = B ∩ A
(vii) (Distributivity Laws) A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C), A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C)
(viii) (De Morgan’s Laws) (A ∪ B)c = Ac ∩ B c , (A ∩ B)c = Ac ∪ B c
In proving these identities, the principle of set equality: “A = B if and only if A ⊆ B and B ⊆ A” can
be useful. We do this by assuming an element from the left hand side and showing that it is also an element
of the other side, and then the doing similarly for the reverse.
Proof. (Theorem 5.1) We only prove (i) and (viii). For (i), let x ∈ A ∪ ∅. Then, x ∈ A or x ∈ ∅. But, x ∈/∅
because ∅ is empty. Hence, x ∈ A. Thus, A ∪ ∅ ⊆ A. Now if x ∈ A, then x ∈ A or x ∈ ∅. This means that
x ∈ A ∪ ∅. This implies that A ⊆ A ∪ ∅.
Now for (viii), let x ∈ (A ∪ B)c . Then, x ∈
/ A ∪ B. Hence, x ∈ / B. Thus, x ∈ Ac and x ∈ B c .
/ A and x ∈
So x ∈ Ac ∩ B c . So (A ∪ B) ⊆ Ac ∩ B c . Now if x ∈ Ac ∩ B c , then x ∈ Ac and x ∈ B c . Hence, x ∈/ A and
x∈/ B. Thus it is not true to say that x ∈ A or x ∈ B, or in other words, it not true to say that x ∈ A ∪ B.
/ A ∪ B which means x ∈ (A ∪ B)c . Hence, Ac ∩ B c ⊆ (A ∪ B)c .
Thus, x ∈
24
5.3 Set Cardinality
Lastly, we proceed to the final lesson in counting: set cardinalities. Formally, if In = {1, 2, . . . , n} and there
exists a bijection from In to a finite set A, then the cardinality of A is |A| = n. Loosely speaking, the
cardinality of a finite set A is the number of elements in A. We present the following theorem without proof.
Example 5.3. There are 30 students in a classroom. Fourteen of them own a phone while 16 own a
computer. If 7 own neither phone or computer, how many own both?
Solution. If A is the set of those who own a phone and B the set of all those who own a computer, then,
|A| = 14, |B| = 16, and |A ∪ B| = 23. Now,
|A ∪ B| = |A| + |B| − |A ∩ B|
|A ∩ B| = |A| + |B| − |A ∪ B|
= 16 + 14 − 21
=9
25
26
Chapter 1
Probability
Probability Theory may be described as the “mathematics of chance”. We will first define sample spaces
and events. Our study is focused on the likelihood of occurrence of events in an experiment. Then, we will
establish probability as a measure of this likelihood. Through several examples, we will calculate probability
values of events which is the main feature of this chapter. In addition, we will also tackle a special redefinition
of probability intended for events of which occurrence depend on another event. Lastly, we will apply to
real-world problems an important result called Baye’s Rule.
Lectures
Lecture 6. Sample Spaces and Events
Lecture 7. Axiomatic Probability Theory and Other Approaches
Lecture 8. Calculating Probabilities
Lecture 9. Conditional Probability and Independent Events
Lecture Objectives
(1) Define sample space, sample point, event, sample size, and event size
(2) Write out sample spaces and events of a given experiment
(3) Use counting techniques to determine the sample size and event size of a sample space and an event
The four fundamental terms in Probability Theory are experiment, sample space, sample point, and event.
Experiment has already been understood in the previous chapter while the latter three terms correspond to
similar concepts from Set Theory.
S = {1, , 2, 3, 4, 5, 6}.
(2) A committee is organizing an event. They model the number of people who will possibly come in the said
event. Then sample space is given by
S = {0, 1, 2, 3, . . .}.
(3) The sample space of the experiment “gender of a newborn baby” is given by
S = {male, female}.
(4) The estimated time of arrival of a plane flight is between 30 and 45 minutes. The sample space is
S = {x ∈ R | 30 ≤ x ≤ 45}.
(5) A name is to be given to a newly born baby. The sample space is hard to write but the following might
illustrate it:
S = {Aaron, James, Anna, . . .}.
There are two types of sample spaces: discrete and continuous. A discrete sample space is one that is
countable. Examples of discrete sample spaces are in (1), (2), (3), and (5). A continuous sample space is
one which is uncountable. Example (4) illustrates this type of sample space.
A sample space may also be classified as either finite or infinite. For a discrete sample space, it is finite
if there is a finite number of sample points. For example, (1) and (3) are finite discrete sample spaces but (2)
and (5) are infinite discrete sample spaces because the number of possible outcomes in these experiments are
unbounded. A continuous sample space is finite if the values are bounded, and infinite otherwise. Example
(4) has a finite continuous sample space.
28
The concept of sample size is vital to our approach to studying probability theory. We shall separately
define sample size for the discrete case and continuous case. We denote the sample size of S by |S|.
For a finite discrete sample space, the sample size is its Cardinality. Hence, for Example (1), the sample
size is 6, for (3) is 2. For an infinite discrete sample space, we denote the sample size by ℵ0 (read as “aleph
null ”).
The sample size of continuous sample spaces is harder to define since a background in measure theory is
needed. However, so as not to require much prerequisite understanding, the continuous sample spaces to be
dealt with in this course are only intervals of real numbers, or unions thereof. We define the sample space
of an interval from x = a to x = b (regardless of whether it is open or closed) to be its length b − a. The
size of an infinite sample space is denoted by ∞. The sample size in (4) is 45 − 30 = 15.
6.2 Events
An event E is a subset of the sample space S.
Example 6.2. For the following examples, each item corresponds to the experiments in Example 6.1.
(1) In the experiment of rolling a die, the event described by “the outcome is even” is
E = {2, 4, 6}.
(2) In the second experiment, the event “less than 100 people will come” is given by
E = {0, 1, 2, . . . , 99}.
(3) The event that the plane arrives not earlier than 35 minutes but not later than 40 minutes is given by
the set
E = {x ∈ R | 35 ≤ x ≤ 40}.
(4) The event “the name of a newly born baby starts with the letter A” is given by
Set operations also play an important role in dealing with events. Either or or signify the union of two
events. For example, the phrase “either E or F ” means the event E ∪ F . The conjunction and signify the
intersection of two events. For example, the phrase “both E and F ” describe the event E ∩ F . The negator
“not” signifies the complement. For example, the event “not E” is E c . Lastly relative complement E − F
may be described as “E but not F ”.
Example 6.3. Suppose the sample space of an experiment is given to be
S = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}.
29
Solution.
The size of an event E, called event size and denoted |E|, is defined in a similar manner as for sample
size.
Evaluation
A. Basic Concepts (2 points each)
(1) What are the four undefined terms of probability theory? Describe each.
(2) Differentiate: discrete vs continuous, finite vs infinite sample spaces.
(3) What is the difference between the sample size of a discrete finite sample space and of a continuous
finite sample space? What about for infinite ones?
30
Lecture 07 - Axiomatic Probability Theory and Other
Approaches
Lecture Objectives
(1) Use axiomatic approach to prove properties of probability theoretically
(2) Familiarize other approaches to probability theory: classical, relative frequency, and subjective
What is probability? This question is not as simple as it reads. The ways to define probability may range
from a theoretical standpoint to an experimental conclusion, and may sometimes even be just someone’s
opinion. In this lecture we attempt to define probability using the axiomatic approach, and three other
approaches.
fheads (n)
P (heads) = lim .
n→∞ n
That means we need to toss a coin an infinite number of times in order to calculate the probability P (heads).
By this example, we can already see the obvious problem posed by this definition: it is practically impossible
to toss the coin an infinite number of times!
This problem may require us to look for other ways of obtaining probability. In the following sections of
this lecture, we will learn alternative ways of defining probability. We start with a theoretical way called the
axiomatic approach.
31
Axiom 1. 0 ≤ P (E) ≤ 1.
Axiom 2. P (S) = 1.
Axiom 1 states that probability ranges only between 0 and 1. If P (E) = 0, then E is said to be an
impossible event, while if P (E) = 1, then E is certain. Axiom 2 states that the probability of the sample
space is 1. That is, that the sample space is certain. Now, we state first that two events E and F are
mutually exclusive if and only if they are disjoint. That is,
Axiom 3 states that the probability of the union of two disjoint events is the sum of their probabilities.
From these axioms, we prove a few propositions.
1 = P (S)
= P (E ∪ E c )
= P (E) + P (E c )
Therefore, P (E c ) = 1 − P (E).
From Theorem 7.1, if p is the probability that E occurs, then, 1 − p is the probability that E does not
occur. For example, if the probability that a newborn baby is a boy is 43 , then the probability that it is a
girl is 14 .
P (F ) = P (E ∪ (E c ∩ F ))
= P (E) + P (E c ∩ F )
P (F ) ≥ P (E).
The above theorem tells us that larger events have larger probabilities. This gives an “order” for proba-
bility as a measure.
P (E ∪ F ) = P (E) + P (F ) − P (E ∩ F ).
32
Proof. Note that E ∪ F = E ∪ (E c ∩ F ) and that E and E c ∩ F are mutually exclusive. Now,
P (E ∪ F ) = P (E ∪ (E c ∩ F ))
= P (E) + P (E c ∩ F )
P (F ) = P ((E ∩ F ) ∪ (E c ∩ F ))
= P (E ∩ F ) + P (E c ∩ F )
P (E c ∩ F ) = P (F ) − P (E ∩ F )
P (E ∪ F ) = P (E) + P (F ) − P (E ∩ F ).
It can be shown from Theorem 7.1 that P (∅) = 0 (see next Exercise). Moreover, when E and F are
mutually exclusive, then Theorem 7.3 is shown to be consistent with Axiom 3.
Exercise. Show that P (∅) = 0.
The preceding axioms and theorems can be used to calculate probability values as the following examples
show.
Example 7.1. In a survey conducted by a company among its employees, it was found that 64% are
computer-literate and 25% own a service vehicle. Further analysis also shows that 9% are both computer-
literate and vehicle owners. Suppose we pick one employee in the company, what is the probability that he is
either computer-literate or a vehicle owner?
Solution. We have
P (computer-literate) = 0.64,
P (vehicle owner) = 0.25, and,
P (computer-literate ∩ vehicle owner) = 0.09.
Thus, the probability that the employee is either computer-literate or a vehicle owner is 0.80. We can
also say that the chance that he is either computer-literate or a vehicle owner is 80%.
In order to shorten our notation, we might denote the events into capital letter, say, E and F . We provide
a short-hand notation to rewrite the above solution.
Solution. Let
E : employee is computer-literate
F : employee is a vehicle owner
Then, P (E) = 0.64, P (F ) = 0.25, and P (E ∩ F ) = 0.09. Thus,
33
Exercise. What is the probability that the employee is not computer-literate and not a vehicle owner?
Exercise. Show that if E ⊆ F , then P (F − E) = P (F ) − P (E). [Hint: Note that A − B = A ∩ B c .]
Suppose we have three events E, F, G. Then, by repetitive application of Theorem 7.3,
P (E ∪ F ∪ G) = P (E ∪ (F ∪ G))
= P (E) + P (F ∪ G) − P (E ∩ (F ∪ G))
= P (E) + P (F ) + P (G) − P (F ∩ G) − P ((E ∩ F ) ∪ (E ∩ G))
= P (E) + P (F ) + P (G) − P (F ∩ G) − [P (E ∩ F ) + P (E ∩ G) − P ((E ∩ F ) ∩ (E ∩ G))]
= P (E) + P (F ) + P (G) − P (E ∩ F ) − P (E ∩ G) − P (F ∩ G) + P (E ∩ F ∩ E ∩ G)
= P (E) + P (F ) + P (G) − P (E ∩ F ) − P (E ∩ G) − P (F ∩ G) + P (E ∩ F ∩ G).
· · · + (−1)n+1 P (E1 ∩ E2 ∩ · · · ∩ En ).
For theoretical work, the axiomatic approach to probability theory is always used. These axioms and
theorems will be used in most of the following lectures we will have in this course. However, the axiomatic
approach may not be the most practical approach of obtaining probability values. The following sections
show other methods of calculating the probability of an event.
34
Example 7.4. Two cards are randomly drawn from the standard deck. What is the probability that both are
clubs?
Solution. There are |S| = 51 13
2 = 1275 possible outcomes and |E| = 2 = 78 possible outcomes that are
both clubs. Thus,
78
P (E) = ≈ 0.0612.
1275
Exercise. Two cards are successively drawn from the standard deck. What is the probability that the first
card is clubs and the second is diamonds?
Now we deal with examples of experiments in which the sample space S is continuous. Recall that for
an interval E = [a, b], we define the size of E by
|E| = b − a.
Example 7.5. A plane is expected to arrive in between 30 and 45 minutes. What is the probability that it
arrives between 36 and 39 minutes?
Solution. We have |S| = 45 − 30 = 15 and |E| = 39 − 36 = 3. Thus the probability that the plane arrives
in the said time interval is
3 1
P (E) = = .
15 5
Next, we move in to a more experimental approach to probability.
35
7.5 Subjective Approach
Sometimes, probability is simply given by one’s opinion. In this approach, the likelihood that an event will
occur is measured by personal belief. This appears many times especially when a mathematical formulation
to determine the probability is unachievable.
Example 7.8. A man plans on asking a girl to marry him. The girl says that the man has 60% chance of
getting a YES from her.
Subjective probability values are usually given by expert opinion such as the following example.
Example 7.9. A patient suffering from a serious disease consults a doctor. The doctor says that the patient
has 50% chance of surviving the disease.
Evaluation
A. Basic Concepts (2 points each)
(1) Create a flowchart that instructs when to use which among classical, relative frequency, and subjective
approach to use.
(2) Describe what probability is in your own words.
B. Problem Solving (5 points each)
(1)
C. Theoretical Work
36
Lecture 08 - Calculating Probabilities
Lecture Objectives
(1) Use counting techniques and classical method of assigning probability to calculate the probability of a
given event
(2) Apply axiomatic approach to probability to calculate probability values
In this lecture we will be dealing with a series of examples of problems that ask for probability values of
events. We will be applying what we have learned from the previous lectures, especially those from Lectures
2 to 5, and 7.
37
8.2 Examples Involving Linear Arrangements
Recall our basic permutation rule that there are n! ways of arranging n objects in a line. We apply this to
the following examples.
Example 8.4. Five friends went to watch a movie sitting in a row. What is the probability that Alex and
Bobby sit next to each other?
Solution. There are |S| = 5! = 120 possible ways to arrange themselves. Two count the number of ways
Alex and Bobby sit next to each other, we have
|E| = 2 · 4!
= 48.
Hence, the probability that they are sitting next to each other is
48 2
P (E) = = .
120 5
Example 8.5. Six runners A, B, C, D, E, and F are racing to the finish line. What is the probability that B
will come first and C will finish last?
Solution. There are |S| = 6! = 720 possible ways the race could end. There are |E| = 1 · 4 · 3 · 2 · 1 · 1 = 24
possible ways for B to finish first and C last. Thus, the probability for this event to happen is
24 1
P (E) = = .
720 30
Exercise. Calvin dreams to visit 5 European countries namely England, Spain, France, Italy, and Germany.
What is the probability that he will visit either England or Italy first and will not choose Spain as the last
country to visit?
Exercise. A jar contains 4 red, 3 green, and 2 blue chips. A chip is drawn one after another and the color
is recorded until there is no more chip inside the jar. What is the probability that the last chip drawn is
red?
38
8.4 Examples Involving Circular Permutations
Next we deal with experiments involving circular arrangements. We recall that there are n! ways to arrange
n objects in a circle. We apply these principles in solving some related problems.
Example 8.7. Four friends A, B, C, and D are to sit around a table. What is the probability that
(1) A and B sit together?
(2) A and B does not sit together?
Solution. There are |S| = (4 − 1)! = 6 ways to arrange them around a table. For (1), there are |E| =
(3 − 1)! · 2 = 4 ways for A and B to sit together. Thus, the probability that they sit together is
4 2
P (E) = = .
6 3
For (2), there are |E| = 6 − 4 = 2 for them not to sit together. Hence,
2 1
P (E) = = .
6 3
Exercise. A bracelet is to be decorated with diamond, ruby, emerald, jade, and pearl beads. What is the
probability that
(1) diamond and pearl are not placed next to each other; and,
(2) ruby, pearl, and ruby are together?
Exercise.
(1) Four boys and three girls have to be seated in a row. What is the probability that no two boys sit
together?
(2) Seven cyclists join a race. What is the probability that A, B, and C are in the top 3?
39
8.6 Examples Involving Combinations
n
Recall the combination function r defined by
n n!
= .
r r!(n − r)!
This function gives the number of possible ways to select r objects from a set of n, where arrangement does
not matter.
Example 8.10. A group of friends is composed of four boys B1 , B2 , B3 , and B4 and three girls G1 , G2 and
G3 . They received three free tickets to the zoo and should choose among themselves on who should go. What
is the probability that all who will go are boys?
Solution. They have a total of |S| = 72 = 21 choices. There are |E| = 43 = 4 ways to select three boys
to go. Hence, the probability that all the ones who will go are boys is
4
3 4
P (E) = 7
= .
2
21
Example 8.11. Three cards were drawn from the standard deck. What is the probability that all of them
are spades?
Exercise.
(1) Two cards are to be drawn from the standard deck. What is the probability that both are black cards?
(2) In a 6/42 Lotto draw, six balls are to be drawn from a set of 42 balls numbered 1 through 42. What is
the probability that all the winning entries are even?
This has an equivalent probability identity through Theorem 7.3 stating that for two events E and F
(regardless of whether mutually exclusive or not),
P (E ∪ F ) = P (E) + P (F ) − P (E ∩ F ).
Example 8.12. In a class, there are 50 students, of which 27 own a phone and 29 own a tablet. If 5 students
own neither phone nor tablet, what is the probability that a randomly chosen student owns both a phone and
a tablet?
40
Solution. Let E be the event that the student owns a phone and F the event that he or she owns a tablet.
27 29 5
Then, P (E) = 50 and P (F ) = 50 . Also, P (E ∪ F ) = 1 − 50 = 45 9
50 = 10 . This mens that
P (E ∩ F ) = P (E) + P (F ) − P (E ∪ F )
27 29 45
= + −
50 50 50
11
= .
50
Exercise. A school has two basketball teams: the Men’s Team and the Women’s Team playing in separate
leagues. According to the school’s 30-season history of joining both leagues, there was one season where
both teams won championship. The Men’s Team has been champions 4 times and the Women’s team has
been champions 3 times. What is the probability that if a random season is picked, at least one of the teams
was the champion?
41
Example 8.15. The probability that Lance will fail in Algebra is 0.2. The probability that he will fail in
Analysis is 0.1. The probability that he passes both subjects is 0.6. What is the probability that he will pass
at least one of the two subjects?
Solution. Let E be the event that he passes Algebra and F the event that he passes Analysis. Then,
P (E) = 1 − 0.2 = 0.8 and P (F ) = 1 − 0.1 = 0.9. Also, P (E ∩ F ) = 0.7. Hence,
P (E ∪ F ) = P (E) + P (F ) − P (E ∩ F )
= 0.8 + 0.9 − 0.6
= 0.9
42
Lecture 09 - Conditional Probability and Independent
Events
Lecture Objectives
(1) Define conditional probability and independence
Sometimes, we need the probability of an event with some information already available. This leads us to a
new important concept called conditional probability. Aside from this, conditional probability also provides
simpler computations for some probability calculation problems.
Example 9.1. Suppose a die is rolled twice. What is the probability that the sum is 8, given that the first
roll yields 2?
Notice that in this example, the event in question: “the sum is 8”, is preceded by a given event “first roll
yields 2”. The sample space S of rolling a die twice consists of 36 equally likely outcomes. However, when
it was given that the the outcome of the first roll was 2, the sample space was reduced to:
S 0 = {(2, 1), (2, 2), (2, 3), (2, 4), (2, 5), (2, 6)}
Solution (Example 9.1). There are |S 0 | = 6 outcomes in the (reduced) sample space. Of these, only 1
1
yields a sum of 8. Hence the probability that the sum is 8 given that the first roll is 2, is .
6
In such example, we are given with two events E (the sum is 8) and F (the first roll is 2), and we are
asked for the conditional probability of E given F , denoted P (E|F ) (read as “conditional probability of E
given F ”). The event F is called the condition.
Definition 9.1. Let E and F be events with P (F ) 6= 0. The conditional probability of E given F is
defined as
P (E ∩ F )
P (E|F ) = .
P (F )
43
6
Solution. Let E be the event that the sum is 8 and F the event that the first roll is 2. Then, P (F ) = 36 = 16 .
1
Also, P (E ∩ F ) = 36 because only (2, 6) is the outcome which has a some of 8 and a first roll of 2. Hence,
1
1
P (E|F ) = 36 = .
1 6
6
Example 9.2. What is the probability that the sum of two rolls of a die is even given that the first roll is
odd?
Solution. Let E be the event that the sum of the two rolls is even and F the event that the first roll is odd.
Then, F = {(j, k) | j is odd, k is any number from 1 to 6}. Hence, |F | = 3 · 6 = 18 so that P (F ) = 18 1
36 = 2 .
9 1
Also, E ∩ F = {(j, k) | j, k are odd}. Hence, |E ∩ F | = 3 · 3 = 9 so that P (E ∩ F ) = 36 = 4 . Now,
1
1
P (E|F ) = 4 = .
1 2
2
Exercise. According to research, 83.2% of fathers who beat their children were also beaten by their fathers
when they were young. If 18.1% of fathers beat their children, what percent of fathers both beat and were
beaten?
Exercise. If E and F are mutually exclusive and P (F ) 6= 0, what is P (E|F )?
Exercise. In a class of 50, 37 own a phone and 12 own a tablet, while 6 own neither phone nor tablet. If a
random student who owns a tablet is selected, what is the probability that he also owns a phone?
From the definition of the conditional probability, one may yield
given that P (E) 6= 0 and P (F ) 6= 0. We generalize this observation to any number of events E.
Theorem 9.1 (Multiplication Rule). Let E1 , E2 , . . . , En be events such that P (Ei ) 6= 0 for i = 1, 2, . . . , n.
Then,
Proof. Exercise.
44
which implies that conditional probability satisfies Axiom 2. Now if E1 and E2 are mutually exclusive events,
then E1 ∩ F and E2 ∩ F are also mutually exclusive. It follows that,
P ((E1 ∪ E2 ) ∩ F )
P ((E1 ∪ E2 )|F ) =
P (F )
P ((E1 ∩ F ) ∪ (E2 ∩ F ))
=
P (F )
P (E1 ∩ F ) + P (E2 ∩ F )
=
P (F )
P (E1 ∩ F ) P (E2 ∩ F )
= +
P (F ) P (F )
= P (E1 |F ) + P (E2 |F )
P (E c |F ) = 1 − P (E|F ).
Proof. We know that F = (E ∩ F ) ∪ (E c ∩ F ) and that E ∩ F and E c ∩ F are mutually exclusive. Thus,
1 = P (F |F )
P ((E ∩ F ) ∪ (E c ∩ F ))
=
P (F )
P (E ∩ F ) P (E c ∩ F )
= +
P (F ) P (F )
c
= P (E|F ) + P (E |F ).
Therefore, P (E c |F ) = 1 − P (E|F ).
P (E|F ) ≤ P (G|F ).
P (E ∩ F ) P (G ∩ F )
≤
P (F ) P (F )
P (E|F ) ≤ P (G|F ).
45
Proof. We have,
P ((E ∪ G) ∩ F )
P ((E ∪ G)|F ) =
P (F )
P ((E ∩ F ) ∪ (G ∩ F ))
=
P (F )
P (E ∩ F ) + P (G ∩ F ) − P (E ∩ F ∩ G ∩ F )
=
P (F )
P (E ∩ F ) P (G ∩ F ) P ((E ∩ G) ∩ F )
= + −
P (F ) P (F ) P (F )
= P (E|F ) + P (G|F ) − P ((E ∩ G)|F ).
Exercise. Suppose two dice are rolled. Let E be the event that the sum is 6 and F the event that the first
die is even. Find
(1) P (E|F )
(2) P (F |E)
Definition 9.2. Two events E and F are said to be independent if and only if
P (E ∩ F ) = P (E)P (F ).
Intuitively, independent events do not affect the probability of each other. For example, the gender of a
baby born from a mother this year does not affect the gender of the baby on her next pregnancy. Also, the
outcome of tossing a coin does not affect the outcome of the next toss.
Independent events should also be not confused with mutually exclusive events. Mutually exclusive events
are events that do not have a common outcome.
Example 9.3. Suppose a die is rolled twice. Let E be the event that the first roll is even and F the event
that the second roll is odd. Show that E and F are independent.
46
Solution. We know that |E| = 3 · 6 = 18 so that P (E) = 12 and |F | = 6 · 3 = 18 so that P (F ) = 21 . Also,
|E ∩ F | = 3 · 3 = 9 so that P (E ∩ F ) = 14 . Now,
1 1 1
P (E)P (F ) = = = P (E ∩ F ).
2 2 4
Exercise. Suppose four coins are tossed. Let E be the event that all coins land on heads and let F be the
event that the first coin is heads. Show that E and F are dependent.
Exercise. Suppose two dice are rolled. Verify if the following pairs of events are independent.
(1) “sum is 6” and “the first die is even”
(2) “sum is 6” and “the first die is 3”
(3) “sum is 7” and “the first die is even”
Independence can also be extended for three or more events as the next definition of multiple independence
states.
Definition 9.3. Let E, F, G be events. Then E, F, G are independent iff
(i) P (E ∩ F ∩ G) = P (E)P (F )P (G)
(ii) P (E ∩ F ) = P (E)P (F )
(iii) P (E ∩ G) = P (E)P (G)
(iv) P (F ∩ G) = P (F )P (G).
Exercise. Suppose a die is rolled thrice. Show that the events “first is odd”, “second is even”, and “third
is greater than 4” are independent.
P (E)
O(E) = .
P (E c )
P (E)
O(E) = .
1 − P (E)
47
Example 9.4. The probability that our basketball team wins tonight is 0.8. Find our odds of winning.
Solution. Let E be the event of winning tonight. Then P (E) = 0.8 and so,
0.8
O(E) =
1 − 0.8
= 2.
Exercise. Suppose four coins are tossed. What are the odds of getting at most three heads?
Exercise. Analyze the definition of odds to answer the following questions.
48
Lecture 10 - Bayes’ Rule
Lecture Objectives
(1) Derive Bayes’ formula
Bayes’ Rule is an important formula used in many practical applications. It deals with the relationship of a
given event to a family of events which partition the sample space.
(i) Bi 6= ∅ for i = 1, 2, . . . , n;
(iii) B1 ∪ B2 ∪ · · · ∪ Bn = S.
We start with the simplest case B = {B, B c } for some B such that P (B) 6= 0.
A = (A ∩ B) ∪ (A ∩ B c )
P (A) = P (A ∩ B) + P (A ∩ B c ).
Example 10.1. A certain drug test procedure can positively identify a drug user at a rate of 99.8%. It can
also successfully return negative at a rate of 98.7% if the person is not a drug user. In a certain community,
it is estimated that 5% of the residents are drug users. If a random drug testing is conducted for a certain
resident, what is the probability that the test will return positive?
49
Solution. Let A be the event that the test returns positive and B the event that the person is a drug user.
Then,
P (B) = 0.05
P (B c ) = 0.95
P (A|B) = 0.998
P (A|B c ) = 1 − P (Ac |B c )
= 1 − 0.987
= 0.013
Now,
Hence, the probability that the test will return positive to a randomly selected person is 0.06225 (or 6.23%).
Example 10.2. It was found in a research there are a very small population of left-handed individuals.
Around 7.2% of males and 4.0% of females are left-handed. Also, it is known that male and female populations
are (approximately) equal. What percentage of people in the world are left-handed?
Solution. Let A be the event that the person is left-handed and B the event that he is a male. Since male
and female populations are (approximately) equal, P (B) = P (B c ) = 0.5. From the problem,
P (A|B) = 0.072
P (A|B c ) = 0.040.
Hence,
Hence,
P (A|B)P (B)
P (B|A) = .
P (A)
This equation can be used to reverse the conditional probability from P (A|B) to P (B|A).
Example 10.3. 22% of males who took a college entrance exam passed. In the exam, 67% were males and
the passing rate is 25%. What percent of passing applicants were females?
Solution. Let A be the event that the applicant passes the exam and B be the event that the applicant is
male. Then,
P (A) = 0.25
P (B) = 0.67
B(A|B) = 0.22
50
Then, the probability that the passer is a male is,
(0.22)(0.67)
P (B|A) = = 0.5896.
(0.25)
Hence, the probability that the passer is a female is
1 − 0.5896 = 0.4104.
In other words, 41.04% of the passers were females.
Exercise. 20% of the employees of a company are managers. 90% of these managers have an insurance.
Also, it is known that in the company, 60% of those who do not have an insurance are not managers. What
is the proportion of the employees have insurance in the company?
Exercise. In a tournament, Germany sent 20 players; France sent 40 players, and England sent 60 players.
Exactly 10 players from each country have played this tournament last year. If the player who becomes the
champion played last year, what is the probability that he is not German?
51
52
Chapter 2
One weakness of our current approach to probability is that the domain of our probability function P (·) is
a family of sets — and not of real numbers. That is, there is a need to count the size of events and sample
spaces in order to obtain probability values. In this chapter, we introduce the notion of random variables.
In this way, we approach probability theory in a way that the domain of our probability function (which we
will later call probability distribution function or probability density function) is a set of real numbers.
Our primary objective in this chapter is to understand random variables, distribution and cumulative
distribution functions, and expected values.
Lectures
Lecture 11. The Notion of a Random Variable
Lecture 12. Probability Distribution Function
Lecture Objectives
(1) Define random variables
We prefer to calculate probability values directly as a function of some real number. This is not possible in
our previous treatment of probability where in for P (E), E is not a number but instead, a set. Our objective
for this lecture is to understand that through random variables, events will be represented by real numbers.
Definition 11.1. Let S be a sample space of some experiment. A random variable over S is a mapping
X : S → R.
For a real number a, the set X −1 (a) of all pre-images of a represents a unique event included in S.
In this manner, the event X −1 (a) is being represented by the real number a. Furthermore, the family
{X −1 (a)|X −1 (a) 6= ∅} form a partition of S.
Let us consider, for example, the experiment of rolling a die twice. The sample space is given by
Suppose we are interested with the probability that the sum of the two rolls is k. In this case we can define
X as the sum of the two rolls. That is,
X(1, 1) = 2
X(1, 2) = 3
X(1, 3) = 4
..
.
X(6, 6) = 12
The inverse image X −1 (k) gives the set of all sample points which give a sum of k, i.e., the event “the sum
54
is k”. In particular,
X −1 (2) = {(1, 1)}
X −1 (3) = {(1, 2), (2, 1)}
X −1 (4) = {(1, 3), (2, 2), (3, 1)}
X −1 (5) = {(1, 4), (2, 3), (3, 2), (4, 1)}
X −1 (6) = {(1, 5), (2, 4), (3, 3), (4, 2), (5, 1)}
X −1 (7) = {(1, 6), (2, 5), (3, 4), (4, 3), (5, 2), (6, 1)}
X −1 (8) = {(2, 6), (3, 5), (4, 4), (5, 3), (6, 2)}
X −1 (9) = {(3, 6), (4, 5), (5, 4), (6, 3)}
X −1 (10) = {(4, 6), (5, 5), (6, 4)}
X −1 (11) = {(5, 6), (6, 5)}
X −1 (12) = {(6, 6)}
Further, the event X −1 (k) is usually denoted by
X = k.
For example, X = 5 means the event {(1, 4), (2, 3), (3, 2), (4, 1)}. (We use X −1 (k) when we mean it as the
set and we use X = k when we mean the event.) Note further that when k 6= j, then X −1 (k) ∩ X −1 (j) = ∅
which means that the events X = k and X = j are mutually exclusive.
The range of X, denoted Ran(X), defined as the set of all k ∈ R such that X −1 (a) 6= ∅, also plays an
important role in the study of random variables. In the above example,
Ran(X) = {2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12}.
Example 11.1. Suppose a coin is tossed thrice. Define the random variable X as the number of heads in
the outcome. Map each sample point under X. Identify Ran(X) and give X −1 (k) for all k ∈ Ran(X).
Solution. We have
X(HHH) = 3 X(HHT ) = 2
X(HT H) = 2 X(HT T ) = 1
X(T HH) = 2 X(T HT ) = 1
X(T T H) = 1 X(T T T ) = 0
We also find that
Ran(X) = {0, 1, 2, 3}.
And,
X −1 (0) = {T T T }
X −1 (1) = {HT T, T HT, T T H}
X −1 (2) = {HHT, HT H, T HH}
X −1 (3) = {HHH}
Exercise. Suppose a coin is tossed twice. Define a random variable X as the number of tails in the outcome.
Map each sample point under X. Identify Ran(X) and give X −1 (k) for all k ∈ R(X).
Exercise. Suppose a die is rolled once. Define X to be 1 if the outcome is odd and 0 if the outcome is even.
Give all that were asked in the previous exercise.
Exercise. Suppose a die is rolled thrice.
55
11.2 Classification of Random Variables
In the previous chapter we have dealt with two kinds of sample spaces: discrete and continuous. The same
will be true for random variables. In general, random variables may be classified as discrete or continuous
and are defined in a very similar fashion as for sample spaces.
Although majority of our examples are for the discrete case, it can be realized later that continuous
random variables are more practical in terms of applications, and are sometimes even easier to deal with
theoretically. This is made possible by calculus. In fact, due to the lack of or the difficulty of using tech-
niques to manipulate integers, some discrete random variables are being approximated into their continuous
counterparts.
Since we are more familiar with discrete random variables, we define them first.
This means that all the examples we have mentioned in the previous section of this example are discrete.
Other examples of discrete random variables are:
(1) number of puppies a mother dog may give birth to in a single whelping;
Now consider the following example. Let X be the length of time that Jam spends in waiting for his turn
to be served in a fast food restaurant, measured from the instance he entered the store to the moment he
received his order. If we measure X in minutes, then X could possibly have decimal values. In fact, X can
have any real number value from zero to +∞. In other words,
(1) number of typhoons that enter the Philippine Area of Responsibility in a year
56
11.3 Probability in Terms of a Random Variable
The set X −1 (k) is the event of all outcomes such that X = k. The probability of this event may be denoted
as P (X = k).
Example 11.2. Suppose two dice are rolled. Define X as the sum of the two dice. Find P (X = k) for all
k ∈ R(X).
Solution. We have already seen from the previous section that Ran(X) = {2, 3, . . . , 12}. Now
1
P (X = 2) =
36
2 1
P (X = 3) = =
36 18
3 1
P (X = 4) = =
36 12
4 1
P (X = 5) = =
36 9
5
P (X = 6) =
36
6 1
P (X = 7) = =
36 6
5
P (X = 8) =
36
4 1
P (X = 9) = =
36 9
3 1
P (X = 10) = =
36 12
2 1
P (X = 11) = =
36 18
1
P (X = 12) =
36
Exercise. Find P (X = k) for each k ∈ Ran(X) in Example 11.1.
Example. Suppose a coin is tossed four times. What is the probability that there are:
(a) exactly two heads;
(b) no heads;
(c) at least two heads;
(d) not more than three heads;
(e) more than one head.
Solution.Let X be the number of heads. Then,
X −1 (0) = {TTTT}
X −1 (1) = {TTTH, TTHT, THTT, HTTT}
X −1 (2) = {TTHH, THTH, THTH, THHT, HTHT, HHTT}
X −1 (3) = {THHH, HTHH, HHTH, HHHT}
X −1 (4) = {HHHH}
57
Then,
1
P (X = 0) =
16
4 1
P (X = 1) = =
16 4
6 3
P (X = 2) = =
16 8
4 1
P (X = 3) = =
16 4
1
P (X = 4) =
16
Hence,
(a)
P (exactly two heads) = P (X = 2)
3
= .
8
(b)
P (no heads) = P (X = 0)
1
= .
16
(c)
P (at least two heads) = P (X ≥ 2)
= P (X = 2) + P (X = 3) + P (X = 4)
6 4 1
= + +
16 16 16
11
= .
16
(d)
P (not more than three heads) = P (X ≤ 3)
= P (X = 0) + P (X = 1) + P (X = 2) + P (X = 3)
1
=
16
(e)
P (more than one head) = P (X > 1)
= P (X = 2) + P (X = 3) + P (X = 4)
11
=
16
Exercise. Two dice are rolled. Let X be the sum of the two dice. Using random variables, what is the
probability that
(a) the outcome is even;
(b) the outcome is at least 10;
(c) the outcome is not a multiple of three and greater than 7?
58
Lecture 12 - Distributions
Lecture Objectives
(1) Define probability distribution function and cumulative distribution function
(3) Calculate probabilities using the probability distribution functions and cumulative distribution functions
of random variables
Now, we will put random variables into work — the distribution function. The distribution function works
for random variables in the same way as probability works for events. Through distributions, probability is
directly map from (the value of) the random variable (and not from an event).
Example 12.1. Let X be the number of tails in tossing a coin 4 times. Then, find the probability in terms
of pmf ’s that (a) there are exactly two heads and (b) there are an even number of heads.
3 1
Solution. (a) p(2) = , (b) p(0) + p(2) + p(4) = .
8 2
Exercise. Let X be a discrete random variable with Ran(X) = {1, 2, 3, 4, 5} and define p(k) = 1/5 for
k = 1, 2, 3, 4, 5. Find the probability that (a) X is even, (b) X ≥ 3.
A distribution function, when explicitly defined, may add convenience in the computation of probability
values. This is because pdf’s may serve as “formulas” in calculating probability. For example, in the
59
experiment of tossing a coin n times, and if X is defined as the number of heads, then
n 1
p(k) = .
k 2n
(The derivation of this formula will be dealt with more emphasis in Lecture 14.) With this, it will be easier
to answer problems like, “what is the probability that there will be 2 heads if n = 3?”
3 1
p(2) =
2 23
3
=
8
For another example, if X is the sum in rolling two dice, then
6 − |7 − k|
p(k) = .
36
There are many other examples which can be presented in the next chapter.
The first and most basic property of distribution functions comes from the fact that for an experiment
with sample space S, P (S) = 1.
Theorem 12.1 (Normalization Property for Discrete Random Variables). Let X be a discrete random
variable with pdf p(k). Then, X
p(k) = 1.
k∈R
P P
Proof. Since p(k) = 0 for k ∈
/ Ran(X), k∈R p(k) = k∈Ran(X) p(k). Now, we know that for each j, k ∈
Ran(X) with j 6= k, X = j and X = k are mutually exclusive. Thus,
X
p(k) = p(k1 ) + p(k2 ) + · · ·
k∈Ran(X)
= P (X = k1 ) + P (X = k2 ) + · · ·
= P (X −1 (k1 ) ∪ X −1 (k2 ) ∪ · · · )
= P (S)
= 1,
where k1 , k2 , . . . ∈ Ran(X).
This property may be applied, for example, in the following problem.
k−1
Example 12.2. Let X be a discrete random variable with Ran(X) = {1, 2, 3, 4, 5}. Find c if p(k) =
c
for k = 1, 2, 3, 4, 5.
Solution. Since we must have X
p(k) = 1,
k∈Ran(X)
then, we have
1 = p(1) + p(2) + p(3) + p(4) + p(5)
1−1 2−1 3−1 4−1 5−1
= + + + +
c c c c c
1 2 3 4
= + + +
c c c c
10
1=
c
60
Thus, c = 10.
Exercise. Let X be a discrete random variable with Ran(X) = {−2, −1, 0, 1, 2} with p(k) = ck 2 . Find
p(2).
Exercise. Suppose X is a discrete random variable with possible values Ran(X) = {1, 2, 3}. Is X well-
5
defined if p(k) = for k ∈ Ran(X)?
4 + k2
Definition 12.1. Let X be a continuous random variable. Then, the probability distribution function
f of X satisfies
Z b
P (a < X < b) = f (x)dx.
a
Ra
The definition coincides with the earlier remark that P (X = a) = P (a ≤ X ≤ a) = a
f (x)dx = 0.
Exercise. Let
2
x
if 0 ≤ x ≤ 1
f (x) = 3
0 if otherwise.
(1) P (0 ≤ X ≤ 0.2)
(3) P (−1 ≤ X ≤ 2)
Next, we give an equivalent of Theorem 12.1 for the case of continuous random variables.
Theorem 12.2 (Normalization Property of Continuous Random Variables). Let X be a continuous random
variable with pdf f (x). Then,
Z +∞
f (x)dx = 1.
−∞
Proof. The interval (−∞, ∞) corresponds the sample space S. Hence, 1 = P (S) = P (−∞ < X < +∞) =
R +∞
−∞
f (x)dx.
Theorem 12.3. Let f be integrable on an interval (a, b). If c ∈ (a, b), then
Z b Z c Z b
f (x)dx = f (x)dx + f (x)dx.
a a c
61
Example 12.3. Let X be a continuous random variable with pdf
(
0 if x < 0
f (x) = −x
.
ce if otherwise
Find c.
Solution. We have
Z +∞
1= f (c)dx
−∞
Z +∞
= ce−x dx
0
−x ∞
= c −e 0
=c
(1) (2)
0
if x < 0 ( 2
f (x) = cx + 3 if 0 ≤ x ≤ 10 cx − dx if x < 0
f (x) = c
x if x ≥ 0
0 if x > 10
d
Definition 12.2. Let X be a discrete rv with pdf p(k). Then, the cumulative distribution function
F (k) of X is defined by
Xa
F (a) = p(k).
k=−∞
62
Thus,
F (1) = 0
1 1
F (2) = 0 + =
10 10
1 1 3
F (3) = + =
10 5 10
3 3 3
F (4) = + =
10 10 5
3 2
F (5) = + = 1
5 5
for a = 1, 2, 3, 4, 5.
Exercise. Suppose a coin is tossed 4 times. Let X be the number of heads. Find the cdf of X.
One of the central lessons expected for this chapter is to obtain the cdf of a random variable given its
pdf, and vice versa.
Example 12.5. Calculate the constants in the given distribution. Then, find the cdf (express in terms of
a).
(2)
Theorem 12.4. Let X be a discrete rv with pdf p(k) and cdf F (a). Then,
lim F (a) = 1.
a→∞
We leave the above theorem without proof. In the case of finite discrete rv’s, we only need to look at the
maximum value in Ran(X) to illustrate this theorem. That is, if k 0 = max(Ran(X)), then F (k 0 ) = 1.
Since p(k) ≥ 0 for any k ∈ Ran(X), we know that F (a) does not ever decrease in value whatever p(k) is.
The following theorem formalizes this property.
Theorem 12.5. Let X be a discrete rv with cdf F (a). Then, a1 > a2 , F (a1 ) ≥ F (a2 ).
63
12.5 The Cumulative Distribution Function for Continuous RV’s
***
Definition 12.3. Let X be a continuous random variable with pdf f (x). Then the cumulative distribution
function of X is defined as Z a
F (a) = f (x)dx
−∞
for a ∈ R.
In other words,
F (a) = P (X ≤ a).
There are books which establish definitions of cdf’s for discrete rv’s. However, since discrete rv pdf’s are
already capable of giving probability easily and quickly, we will not anymore define them here.
Example 12.6. Let X be a continuous rv with pdf
(
0 if x < 0
f (x) = .
e−x if otherwise
9
=
10
Exercise. Let X be a continuous rv with pdf
(
0 if x ≤ 0,
f (x) = .
cx2 if otherwise
(a) Solve for c and (b) find F (4) and F (3); (c) find P (3 ≤ X ≤ 4).
Theorem 12.6. Let X be a continuous rv with pdf f (x). Then,
(i) if a < b, then F (a) < F (b); in other words, F is monotone nondecreasing; and,
(ii) lim F (a) = 0 and lim F (a) = 1.
a→−∞ a→∞
64
(ii) We have,
Z a
lim F (a) = lim f (x)dx
a→−∞ a→−∞ −∞
= 0;
and,
Z a
lim F (a) = lim f (x)dx
a→∞ a→∞ −∞
= 1.
When the value F (a) for varying a is needed in a problem, it might be more convenient to derive F (a)
out from a given f (x), or vice versa. We first deal with obtaining F (a) given f (x). Suppose we are given
with a pdf f (x) for a continuous rv X. The, by definition,
Z a
F (a) = f (x)dx.
−∞
Find F (a).
Solution. We have
Z Z
e−x dx = − e−x (−dx)
= −e−x + C.
Since f (x) = 0 for x ≤ 0. Hence, we only consider the domain interval (0, a). Thus,
65
Now we study the reverse process of obtaining the pdf f (x) of a given cdf F (a). Since
F 0 (a) = g 0 (a).
Example 12.8. Let X be a continuous rv with cdf F (a) = 1 − e−a for a > 0. Find its pdf.
Solution. We have
d
f (x) = (1 − e−x ) = e−x
dx
66
Lecture 13 - Expectation
Lecture Objectives
67
The expected value E[X] is sometimes called the theoretical
mean and may be denoted by µ. To calculate E[X], see the follow-
ing example.
Example 13.1. Let X be a discrete rv with pdf
k
p(k) =
10
with Ran(X) = {1, 2, 3, 4}. (Verify if p(k) is well-defined.) Find
E[X].
Solution. (We leave verifying p(k) is well-defined.) We write the
table:
k p(k) kp(k)
1 1
1
10 10
2 4
2
10 10
3 9
3
10 10
4 16
4
10 10
30
Total =3
10
Therefore, E[X] = 3.
Exercise. Let X be a discrete rv with Ran(X) = {0, 1, 2, 3} and
c(k + 1)
pdf p(k) = . Find c and E[X].
5
Exercise.
(a) Let X be the sum in rolling two dice. Find E[X].
(b) Suppose 4 coins are tossed and let X be the number of heads.
Find E[X].
68
Now let us turn our attention to expected values of functions of
random variables. A random variable may also be placed inside some
function φ, with the property that P (X = k) = P (φ(X) = φ(k)).
That is, if X is an rv, we may define Y = φ(X). For example, if
X is the outcome of rolling a die, and Y is defined by Y = 2X,
then we know that Ran(Y ) = {2, 4, 6, 8, 10, 12} because Ran(X) =
{1, 2, 3, 4, 5, 6}.
Example 13.2. Let X be a discrete rv with Ran(X) = {1, 2, 3, 4}
and p(1) = 0.1, p(2) = 0.2, p(3) = 0.3, p(4) = 0.4. Let Y = X 2.
Find E[Y ].
Solution. In the above example, we are asked to find the expected
value of Y = X 2 with range Ran(Y ) = {1, 4, 9, 16}. What we have
to understand is that the event X = k is being mapped to the event
Y = k 2. Since the function must preserve probability, we must have
pX (k) = pY (k 2). That is, pX (1) = pY (1) = 0.1, pX (2) = pY (4) =
0.2, pX (3) = pY (9) = 0.3, pX (4) = pY (16) = 0.4. Now,
k pY (k) kpY (k)
1 0.1 0.1
4 0.2 0.8
9 0.3 2.7
16 0.4 6.4
Total 10
We can generalize the next theorem from the above example.
Theorem 13.1. The expected value of a function of a random
variable may also be calculated by
X
E[φ(X)] = φ(k)p(k).
k∈Ran(X)
69
Proof.
Exercise. Find (i) E[2X −3] and (ii) E[X 2 −X] from the example
above.
= aE[X] + b
We can consider the results of Example 13.3 and part (i) of the
last exercise to verify this theorem. In particular, the above theorem
states that E[aX] = aE[X]. Also, it states the exercise below.
77
78
Chapter 3
(CO 3.3) Compute probabilities, means, and variances of special random variables
Lectures
Lecture 14. Special Discrete Random Variables
Lecture 15. Special Continuous Random Variables
Lecture 14 - Special Discrete Random Variables
Lecture Objectives
(1) Define special discrete random variables: uniform, Bernoulli, binomial, Poisson, hypergeometric, negative
binomial, and geometric distributions.
(2) Give examples of experiments that generate these special random variables.
(3) Calculate probabilities, means, and variances of these random variables using their corresponding special
distribution.
There are random variables that arise naturally from common experiments. This is why we realize that some
distributions are more important than others. We call these special random variables. When the experiment
yields a discrete sample space, we need a discrete random variable, of whose special cases are called special
discrete random variables.
In this lecture, we will study a few special discrete random variables including the corresponding experi-
ments that generate them and their distributions. Then, calculate probability of these random variables and
their common properties and measurements such as mean and variance.
X = Uni(n)
X = Bin(p, n)
80
In other words, the uniform discrete rv is one whose outcomes are equally likely. This rv occurs in experiments
with a (i) finite number of outcomes and (ii) each outcome is equally likely to occur. Examples include:
(2) drawing an object from an urn containing n objects, with each object having an equal likelihood of being
drawn;
(3) selecting a person from a group with n members, with each person having an equal likelihood of being
drawn.
In the uniform rv Uni(n), the parameter n is interpreted as the number of possible outcomes (or the
1
sample size) of the experiment. Hence p(k) = means the probability that the kth sample point occurs from
n
1
a set of n possible outcomes is . This might be the time where it is best to describe experiments as fair
n
experiments in order to assume equality of likelihood among all possible outcomes. So we might emphasize
fair die, fair coin, fair selection.
1
For example, the experiment rolling a fair die has the distribution p(k) = for k = 1, 2, 3, 4, 5, 6. So for
6
example, when asked for the probability that 3 occurs is written as
1
p(3) = .
6
Example 14.1. An urn contains 9 chips numbered 1 to 9. A chip is drawn randomly. Write the probability
that the chip numbered k is drawn using the notation for uniform discrete probability distribution. What is
the probability that 4 is drawn?
Solution. We let X be the number of the chip being drawn. Since each chip must be equally likely to be
drawn, then X is a uniform discrete rv with parameter n.
Exercise. A card is to be randomly drawn from the standard deck. Assign each card to the following
values of k: A♠ 7→ 1, 2♠ 7→ 2, . . . , K♠ 7→ 13, A♣ 7→ 14, . . . , A♦ 7→ 27, A♥ 7→ 40, . . . , K♥ 7→ 52. Express the
probability in terms of the notation above. What is the probability that 4♦ is drawn.
The following bar charts illustrate the effects of the parameter n to the probability distribution. The
vertical axis indicates the probability value for each value of the uniform rv indicated in the horizontal axis.
81
n=2 n=4
n=6 n = 20
We can see from the above figures that as n increases, the uniform probability value decreases. This gives
the common realization that as the sample size increases, the likelihood of each sample point decreases. We
simulate rolling a die 10 000 times. The figure below shows the relative frequencies of each outcome.
1
The dashed line represents the theoretical probability ≈ 0.1667. We can observe the closeness between
6
the actual relative frequencies and the theoretical probabilities.
Let us show that the uniform discrete distribution is well-defined. The range of the uniform discrete rv
is {1, 2, . . . , n}. Hence,
n n
X X 1 1
p(k) = =n = 1.
n n
k=1 k=1
82
Now we calculate the mean and variance:
n
X
µ= kp(k)
k=1
n
X k
=
n
k=1
n
1X
= k
n
k=1
n(n − 1)
=
2n
n−1
= .
2
Also,
n 2
2
X
2 n−1
σ = k p(k) −
2
k=1
n
1 X 2 (n − 1)2
= k −
n 4
k=1
n(n − 1)(2n + 1) (n − 1)2
= −
6n 4
(n − 1)(n + 5)
= .
12
Hence,
r
1 (n − 1)(n + 5)
σ= .
2 3
Example 14.2. Find the mean and variance of the experiment in Example 14.1.
83
The distribution induced by this random variable is called the Bernoulli distribution with parameter p,
where p = p(1). By complement law, p(0) = 1 − p. Hence we have
(
p if k = 1
p(k) = .
1 − p if k = 0
The following bar charts illustrate the effects of the parameter p to the Bernoulli probability distribution.
The vertical axis indicates the probability for each value of the Bernoulli rv in the horizontal axis.
To illustrate the Bernoulli rv, we simulate the experiment in the exercise above 10 000 times. The results
are illustrated in the chart below.
The actual relative frequencies are illustrated through the columns while the theoretical probabilities
11 13
p= ≈ 0.4583 (success) and 1 − p = ≈ 0.5417 (failure) are illustrated through the dotted lines.
24 24
84
Next, we compute the mean:
µ = 1 · p + 0 · (1 − p)
=p
σ 2 = [12 · p + 02 · (1 − p)] − p2
= p − p2
= p(1 − p).
The variance is
1 1 1
σ= 1− = .
2 2 4
Exercise. Find the mean and variance of the experiment in the previous exercise.
where k = 0, 1, 2, . . . , n. The binomial distribution is interpreted as the probability that success will occur k
times if a Bernoulli experiment with success probability p is repeated n times.
The derivation of p(k) may be taken from Lecture 04. If S is the success outcome and F the failure, then
the coefficients of the expansion of (S + F )n tell us the number of cases for every combination possible. In
n
particular, the coefficient k of the term of S k F n−k is the number of ways for k successes and n − k failures
to occur. Since each success has a probability p, we know that k such successes require a probability factor
of pk while since the failure probability is 1 − p, n − k failures require the probability factor (1 − p)n−k .
Example 14.3. Suppose a fair coin is tossed 5 times. Find the probability that there are exactly 3 heads.
1
Solution. We have n = 5 and p = , i.e., X = Bin( 12 , 5). Hence, for k = 3, we have
2
3 2
5 1 1
p(3) =
3 2 2
1
= 10
32
5
=
16
85
Exercise. In a basketball game, Team A has 0.7 probability of winning against Team B. In a 7-game series,
what is the probability that they win at least 5 games?
To illustrate the effects of the parameters p and n to the probability distribution of X = Bin(p, n), the
following figures are presented.
We can notice observable effects of both parameters p and n. One important property to see is that
the success probability p affects the “skewness” of the distribution. That is, p-values closer to 1 shifts
the distribution to be heavier on the right (or skewed to the left), while p-values closer to zero shifts
the distribution to the left (or skewed to the right). Further, the figure below shows the distribution of
X = Bin(0.5, 50).
The binomial distribution gives us a preview to the already-familiar “bell-curve” of the normal distribution
we already know from introductory statistics. We will see more of this relationship between binomial and
normal distribution in the next lecture.
Next, we simulate tossing 10 fair coins 10 000 times. If X is the number of heads in this experiment,
then the relative frequencies for X = 0, 1, 2, . . . , 10 are shown below.
86
The columns show the actual relative frequencies while the dashed lines are the theoretical values. We
also simulate X = Bin(0.75, 10).
n n
X X n
p(k) = pk (1 − p)n−k
k
k=0 k=0
= (p + 1 − p)n
= 1.
The computation of the binomial rv’s mean requires the use of qth moments. We have, the qth moment
E[X q ] of X:
n
X n
E[X q ] = k q pk (1 − p)n−k
k
k=0
n
q n
X
= k pk (1 − p)n−k
k
k=1
since the term with k = 0 zeroes out. Now, using the identity (proof is an exercise)
n n−1
k =n ,
k k−1
87
gives
n
X n−1 k
E[X q ] = k q−1 n p (1 − p)n−k
k−1
k=1
n
q−1 n − 1
X
= np k pk−1 (1 − p)n−k
k−1
k=1
Now if we let j = k − 1,
n−1
X n−1 j
E[X q ] = np (j + 1)q−1 p (1 − p)n−1−j
j=0
j
We can observe from the summation factor that this is the (q − 1)th moment of an r.v. Y + 1 where
Y = Bin(p, n − 1). Hence
E[X q ] = npE[(Y + 1)q−1 ].
So if q = 1, we have the mean of the binomial r.v. X = Bin(p, n):
E[X] = np.
Now, we calculate the variance. Since Var(X) = E[X 2 ] − µ2 , and if X = Bin(p, n), we have
Example 14.4. A coin is tossed 10 times. If X is the number of times head occurred, find the expected
value and variance of X.
Solution. Assuming that the coin is fair, we know that X = Bin(0.5, 20). Hence,
E[X] = 20 · 0.5
= 10.
(One may interpret the expected value as the average or theoretical number of times that head will occur.)
Further,
Exercise. Suppose that in a basketball game, TEAM A is twice more likely to win over TEAM B. (i) Find the
probability p that TEAM B wins. (ii) Find the expected number and variance of TEAM B wins in a 7-game series.
88
computationally. This is where the Poisson rv comes in. The primary role of this random variable is to
estimate the binomial rv especially for cases with large n and small p.
Recall the binomial distribution for an rv X = Bin(p, n):
n k
p(k) = p (1 − p)n−k
k
n!
= pk (1 − p)n−k
(n − k)!k!
λ
We define a constant λ = np so that p = . This means that p(k) becomes
n
k n−k
n! λ λ
p(k) = 1−
(n − k)!k! n n
n
n(n − 1)(n − 2) · · · (n − k + 1) λk (1 − λ/n)
= .
nk k! (1 − λ/n)k
(2) number of individuals with a rare characteristic, abnormality, or disorder in a large population; or,
In other words, a random variable X can take a Poisson distribution if it is a count of the occurrence some
rare value of a variable from a large (or infinite) population. Consider the example below.
Example 14.5. A publishing company prints books and averagely commits 3 typographical errors per 100
pages. What is the probability that it makes at most 5 errors in a 1000-page book?
Solution. Let X be the number of typos. Then the probability of committing an error in a page is
p = 3/100 = 0.03. Since the book has 1000 pages, we have λ = 1000 · 0.03 = 30. Thus,
30k
p(k) = e−30 .
k!
89
Then,
Exercise. WHO data shows that 0.1% of newborn babies suffer from Down’s Syndrome. In the Philippines,
around 5,000 babies are born everyday. What is the probability that at least one baby born today has
Down’s Syndrome?
Next, we compute the mean of the Poisson rv. Let X = Pos(λ) with parameter λ. Then,
∞
X λk
E[X] = ke−λ
k!
k=0
∞
X λk
= ke−λ
k!
k=1
∞
X λk−1
=λ e−λ where j = k − 1
(k − 1)!
k=1
∞
X λj
=λ e−λ
j=0
j!
= λ.
Next, we compute the variance. In order to do this, we first compute the second moment E[X 2 ]
∞
X λk
E[X 2 ] = k 2 e−λ
k!
k=0
∞
X λk−1
=λ ke−λ
(k − 1)!
k=1
∞
X λj
=λ (j + 1)e−λ where j = k − 1
j=0
j!
∞ j ∞
X
−λ λ
X λj
=λ je +λ e−λ
j=0
j! j=0
j!
= λ2 + λ
Var(X) = λ2 + λ − λ2
=λ
90
Bibliography
[1] Anderson, D.F., Seppäläinen, T., & Valkó, B. (2017). Introduction to probability.
[2] AoPS Online (2018). Pascal’s Identity. Retrieved from https://artofproblemsolving/wiki/index.
php/Pascal%27s Identity.
[8] Mood, A.M., Graybill, F.A., & Boes, D.C. (1963). Introduction to the theory of statistics.
[9] Ross, S.M. (2014). A first course in probability, 9th Ed.. Pearson.
[10] Schiller Jr., J.J., Spiegel, M.R., & Srinivasan, R.A. (2013). Schaum’s outline of probability and statistics:
897 solved problems + 20 videos. McGraw Hill Professional.
91