0% found this document useful (0 votes)
153 views97 pages

Probability Lecture Notes

Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
153 views97 pages

Probability Lecture Notes

Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd

MATH-C 228

Probability
Lecture Notes & Exercises

A.Y. 2019 - 2020

Julius D. Selle, M.S. Math

DEPARTMENT OF MATHEMATICS AND STATISTICS


College of Arts and Sciences
Cebu Technological University - Main Campus
Cebu City, Philippines
ii
Contents

Preface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iii

0 Counting 1
Lecture 01 - Experiments and Outcomes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
Lecture 02 - The Fundamental Counting Principle . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
Lecture 03 - Permutations and Combinations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
Lecture 04 - The Binomial Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
Lecture 05 - Sets and Set Cardinalities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22

1 Probability 27
Lecture 06 - Sample Spaces and Events . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
Lecture 07 - Axiomatic Probability Theory and Other Approaches . . . . . . . . . . . . . . . . . . 31
Lecture 08 - Calculating Probabilities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
Lecture 09 - Conditional Probability and Independent Events . . . . . . . . . . . . . . . . . . . . . 43
Lecture 10 - Bayes’ Rule . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49

2 Random Variables, Distribution Functions, and Expectations 53


Lecture 11 - The Notion of a Random Variable . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
Lecture 12 - Distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
Lecture 13 - Expectation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67

3 Special Random Variables 79


Lecture 14 - Special Discrete Random Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91

iii
Preface

This text is intended for the undergraduate course MATH-C 228 Probability in the Bachelor of Science in
Mathematics (BS Math) program of the Department of Mathematics and Statistics, College of Arts and
Sciences of Cebu Technological University - Main Campus.
The text is composed of lecture notes and worksheets for 32 lectures divided among 9 chapters in the
course. These contents are compliant to the Course Outline for Probability required by CHED Memorandum
Order No. 48, s. 2017 (PSGs for the BS Math/BS Applied Math program) with an added chapter (Chapter
0 - Counting) on counting techniques and basic combinatorial analysis as a preliminary chapter.
CMO No. 48, s. 2017 provides the following Learning Outcomes for Probability:

(1) define basic terms in probability;

(2) perform and compute probabilities from experiments;

(3) define and give examples of mutually exclusive events and independent events;

(4) define a random variable and explain its usefulness in computing probabilities fo events;

(5) enumerate the properties of a cumulative distribution function and probability distribution function;

(6) derive the probability distribution function from the cumulative distribution function and vice versa;

(7) name some commonly used special discrete and continuous distributions and their properties;

(8) give examples of experiments yielding special distributions;

(9) compute probabilities, means, and variances of special probability distributions;

(10) derive the distribution of a function of random variables using different techniques;

(11) explain the notion of a random vector;

(12) explain and give the properties of a joint cumulative distribution function and joint probability
distribution;

(13) derive conditional distributions and marginal distributions;

(14) explain and show independence of random variables;

(15) compute mathematical and conditional expectations involving functions of a random vector;

(16) construct sampling distributions and compute their means and variances;

(17) explain the law of large numbers and the Central Limit Theorem; and,

(18) discuss the importance of the Central Limit Theorem.

iv
These objectives are divided among Chapters 1 to 8 of the course contents. For Chapter 0 - Counting, the
following Learning Outcomes are added:

(1) use basic counting techniques to solve counting problems; and,

(2) master basic concepts in set theory.

The Learning Outcomes are specified at the beginning of each chapter in this text. Furthermore, at the
beginning of each lecture, Lecture Objectives which address the chapter’s Learning Outcomes are enumerated.
This is then followed by Lecture Notes which presents the body of the lesson. The Lecture Notes is usually
composed of definitions, discussions, theorems, examples, and exercises. This is then followed by the Lecture
Worksheet which is further divided into three parts: Basic Concepts, Problem Solving, and Theoretical Work.
Basic Concepts ask questions regarding fundamental understanding of the lecture. Problem Solving gives
questions for which the tools developed in the Lecture Notes are to be applied. Lastly, the Theoretical Work
provides higher level theoretical development exercises which are mostly on proving or extending theorems,
developing new probability theory tools, and abstract reasoning.
Questions in the Basic Concepts are worth between one and three points depending on their level of
difficulty. Questions in Problem Solving are worth five points and are scored based on the following rubrics:

5 - Solutions are complete, accurate, and organized


4 - Solutions are complete, accurate, but unorganized
3 - Solutions are complete but inaccurate
2 - Solutions are incomplete
1 - Solutions fail to address the problem
0 - No solutions

Theoretical Work questions are worth between five and ten points depending on the level of difficulty. They
are scored based on the following rubrics.

Very Very
Satisfactory Satisfactory Fair Unsatisfactory Unsatisfactory
Accuracy (40%) Every line in At most 75% of At most 50% of At most 25% of Every line in
the proof is the lines in the the lines in the thhe lines in the the proof is in-
mathematically proof are math- proof are math- proof are math- accurate
and logically ematically and ematically and ematically and
accurate logically accu- logically accu- logically accu-
rate rate rate
Completeness (30%) The proof is The proof The proof lacks The proof lacks The proof has
complete with lacks definition at most half of at most 75% of no body to ar-
parts including of new nota- its body its body rive at the con-
definition of tions (when clusion
new notations, applicable),
ypothesis, hypothesis, or
body, and conclusion but
conclusion complete with
its body
Readability (20%) The proof The proof The proof The proof There are no
is complete has appropri- has appropri- has appropri- appropriate
with appro- ate strings of ate strings of ate strings of strings of ex-
priate strings explanations explanations explanations planations
of explana- between at between at between at between the
tions between most 75% of most 50% of most 25% of lines of the
lines of proof the lines the lines the lines proof
for effective
communication

v
Organization (10%) The proof The lines of The proof is The proof The proof fails
is neat and the proof are neat but the is not neat to provide
the lines are well-organized lines are either and the lines any system of
well-organized and logically unorganized are either addressing the
and logically ordered but not or illogically unorganized given theorem
ordered neat ordered or illogically
ordered

The entire text was developed using LATEX.

vi
Chapter 0

Counting

This chapter is covers basic combinatorial analysis and set theory. Here, we layout some basic counting tech-
niques which will be used in calculating probabilities in Chapter 1. We will also review some basic concepts
in set theory especially on basic terms, set operations, and some important theorems and properties of sets
and set operations.

Intended Learning Outcomes


(CO 0.1) Use basic counting techniques to solve counting problems

(CO 0.2) Master basic concepts in set theory

Lectures
Lecture 1. Experiments and Outcomes
Lecture 2. The Fundamental Counting Principle

Lecture 3. Permutations and Combinations


Lecture 4. The Binomial Theorem
Lecture 5. Sets and Set Cardinalities
Lecture 01 - Experiments and Outcomes

Lecture Objectives
(1) Define experiments and outcomes

(2) Write out all possible outcomes of a given experiment

In this course, we will deal with what are called random experiments. Probability theory is the study of the
associated likelihood of occurrence of events in these experiments. To begin our understanding of the theory
of probability, we define random experiments and sample spaces.

1.1 Random Experiments


Experiments can help us in dealing with many problems in the real world including, for instance:

(1) traffic management: testing new routes for PUJs to improve traffic flow

(2) disaster mitigation: proposing disaster and emergency plans to mitigate dangers from natural and
man-made disasters such as typhoons, earthquakes, and fire

(3) crime control : national and local government legislating new laws, regulations, and ordinances as
preventive measures for crimes

(4) promoting quality education: reviewing and revising curricula (or the entire education system itself)
in order to improve the quality of education in the country

In science, repeated experimentation is also done to aid scientific investigation. This includes the following
examples given by Hogg, McKean, & Craig (2018):

(1) medical research: effect of the administration of a drug

(2) economics: prices of commodities at various time intervals

(3) agriculture: effect of a chemical fertilizer to a cereal grain

Note. An important property of an experiment is that before it is performed, there is no certainty of what
the outcome will be.
A random experiment is an experiment, where every possible outcome is known.

Example 1.1. The following are examples of random experiments.

(1) tossing a coin

(2) rolling a die

(3) drawing a card from a deck

2
Definition 1.1. The set of all possible outcomes of a random experiment is called its sample space, denoted
S. Each possible outcome of a random experiment (i.e., each element in the sample space) is called a sample
point.
Example 1.2. In the experiment of tossing a coin, the sample space is:
S = {H, T }
where H, T are the sample points representing heads and tails, respectively.

1.2 Sample Spaces


We discuss three ways of obtaining the sample space of an experiment:
(1) listing method;
(2) tree diagramming; and,
(3) tabular method.
In listing method, the sample space is obtained through naming each possible outcome.
Example 1.3. In rolling a die, the sample space is given by:
S = {1, 2, 3, 4, 5, 6}.
Example 1.4. The gender of a newly-born baby has the sample space:
S = {male, female}.
Sometimes, an experiment is a compound of two or more “subexperiments”. If there are two subexper-
iments, we denote each sample pont by (x1 , x2 ), where x1 and x2 are the outcomes of the first and second
subexperiments, respectively. If the experiment involves n subexperiments, we denote each outcome by the
n-tuple (x1 , x2 , . . . , xn ) where xi (for i = 1, 2, . . . , n) is the outcome of the ith subexperiment.
Example 1.5. The following are examples of experiments involving two subexperiments.
(1) In tossing a coin twice, the sample space is given by:
S = {(H, H), (H, T ), (T, H), (T, T )}.

(2) Suppose an experiment involves rolling a die and tossing a coin. Then, we have:
S = {(1, H), (2, H), (3, H), (4, H), (5, H), (6, H), (1, T ), (2, T ), (3, T ), (4, T ), (5, T ), (6, T )}.

Exercise. There are two bulbs in a room which may be turned ON (1) or OFF (0). List all the possible
cases for both bulbs.
Example 1.6. The sample space of tossing a coin thrice is:
S = {(H, H, H), (H, H, T ), (H, T, H), (H, T, T ), (T, H, H), (T, H, T ), (T, T, H), (T, T, T )}.
Exercise. In ordering a burger, there are three sizes: small, large, and medium, two kinds of patty: beef
and chicken, and two kinds of sauce: mayonnaise and cheese. Write out all possible choices of burgers.
Now we introduce a more systematic way of writing out the sample space of a given experiment: tree
diagramming. A tree diagram is a tree graph starting from one vertex branching out to levels. Each
level represents a subexperiment. The tree diagram is advantageous for experiments involving multiple
subexperiments and outcomes.

3
Example 1.7. The outcomes of tossing a coin thrice can be determined by the following tree diagram.

Thus, S = {(H, H, H), (H, H, T ), (H, T, H), (H, T, T ), (T, H, H), (T, H, T ), (T, T, H), (T, T, T )}.

Exercise. Francis has three shirts: blue, red, and yellow, and two pants: long and short. Use tree diagram-
ming to determine the sample space of what possible outfits Francis can wear.

Lastly, we discuss another systematic way of obtaining the sample space – the tabular method. The
tabular method is mostly derived from the notion of a Cartesian product. For an experiment with two
subexperiments, the sample points can be derived from the Cartesian product S1 × S2 , where S1 and S2 are
the sample spaces of the two subexperiments.
The tabular method can also be extended for experiments with more than two subexperiments as we will
illustrate later.

Example 1.8. Suppose an urn contains three chips numbered 1,2, and 3. An experiment involves picking
up two of the chips with replacement (i.e., the first chip is returned to the urn after it was picked). Then the
sample space can be obtained from the following table:

1 2 3
1 (1, 1) (1, 2) (1, 3)
2 (2, 1) (2, 2) (2, 3)
3 (3, 1) (3, 2) (3, 3)

Thus, S = {(1, 1), (1, 2), (1, 3), (2, 1), (2, 2), (2, 3), (3, 1), (3, 2), (3, 3)}.

When the experiment involves more than two subexperiments, we use multiple tables as illustrated below.

Example 1.9. In tossing a coin thrice, we have the following:

H T
H T (H,H) (H, H, H) (H, H, T )
H (H, H) (H, T ) (H,T) (H, T, H) (H, T, T )
T (T, H) (T, T ) (T,H) (T, H, H) (T, H, T )
(T,T) (T, T, H) (T, T, T )

Hence, S = {(H, H, H), (H, H, T ), (H, T, H), (H, T, T ), (T, H, H), (T, H, T ), (T, T, H), (T, T, T )}.

Exercise. Use tabular method to obtain the sample space of tossing a die twice.

4
Evaluation
A. Basic Concepts (2 points each)
(1) What is an experiment? Give one example.
(2) What is a random experiment?
(3) What are sample spaces and sample points?
(4) What are some techniques to obtain the sample space of an experiment and how is each done?
B. Problem Solving (5 points each)
(1) A code involves four digits of numbers from the set {0, 1, 2}. Use tree diagramming to obtain all
possible codes.
(2) Use listing method to obtain the sample space of picking a letter from the vowels of the English
alphabet.
(3) In buying a particular car model, Jay may choose between automatic or manual transmission, and
among four colors, black, white, sliver, and maroon. Use tabular method to obtain the sample space
of all choices that Jay has.

Notes
(1) Answer the Evaluation in sheets of short-sized white bond papers. They will be compiled at the end of
the term.

5
Lecture 02 - The Fundamental Counting Principle

Lecture Objectives
(1) State, prove, and apply the Fundamental Counting Principle
(2) Generalize and modify the Fundamental Counting Principle for specific cases of counting problems

2.1 Basic Counting Rules


In this lecture, we state the most basic result in combinatorics called the Fundamental Counting Principle
(FCP). This is also called the multiplication rule, mn-rule, or Cartesian product by other authors.
Theorem 2.1 (Fundamental Counting Principle). Suppose an experiment is composed of two subexperi-
ments, the first one having m possible outcomes and the second having n. Then the experiment has mn
possible outcomes.
This means that in order to calculate the number of outcomes for an experiment with two subexperiments,
multiply the number of possible outcomes of the two subexperiments.
Example 2.1. Suppose a class has 11 males and 15 females. How many ways can the class choose their
Prince Charming and Muse?
Solution. Two solve the problem, we aid our use of the FCP with the box method. That is we use two boxes
to represent the two subexperiments. Then, we write the number of possible outcomes inside each box.
11 15 = 165

Hence, there are 165 possible ways to choose a Prince Charming and a Muse.
Example 2.2. How many possible outcomes are there in rolling a die twice?
Solution. 6 6 = 36
Example 2.3. How many possible outcomes are there in tossing a coin twice?
Solution. 2 2 =4
Example 2.4. How many possible outcomes are there in rolling a die and tossing a coin?
Solution. 6 2 = 12
Exercise. Robert has 7 shirts and 3 pairs of pants. How many outfits can he possibly wear?
Proof (Fundamental Counting Principle). Let S1 = {x1 , x2 , . . . , xm } be the sample space of the first subex-
periment and S2 = {y1 , y2 , . . . , yn } be the sample space of the second subexperiment.
Then, the sample space of doing both subexperiments is given by
S1 × S2 = {(x, y) | x ∈ S1 , y ∈ S2 }.
From Set Theory, we know that |S1 × S2 | = |S1 ||S2 |. Therefore, |S1 × S2 | = mn.

6
The Fundamental Counting Principle can be extended to deal with experiments having more than two
subexperiments. This result is called the Generalized Fundamental Counting Principle (GFCP).
Theorem 2.2 (Generalized Fundamental Counting Principle). If an experiment is composed of r subex-
periments, having n1 , n2 , . . . , nr possible outcomes, respectively, then the experiment has n1 n2 · · · nr possible
outcomes.
Example 2.5. A customer who orders pizza can choose two types of crusts: thick and thin; three sizes:
small, medium, and large; and three types of toppings: Hawaiian, pepperoni, and BBQ. How many choices
of pizza are there in all?
Solution. There are three subexperiments namely type of crust, size, and type of toppings. Hence, we
need three boxes.
2 2 3 = 18
Exercise.
(1) In a city, there are two malls; each mall has three cinemas; and in each cinema, there are two kinds
of movie seats: deluxe and premium. How many ways can someone watch a movie?
(2) How many possible outcomes are there in rolling a die thrice?
(3) How many possible outcomes are there in tossing a coin thrice?
Example 2.6. An ATM Card is secured by a 4-digit PIN code. How many possible PIN codes can be made?
Solution.
10 10 10 10 = 10000
Sometimes, counting problems include restrictions such as the following.
Example 2.7. An ATM Card is secured by a 4-digit PIN code. How many possible PIN codes can be made
if the last digit must be even?
Solution.

10 10 10 5 = 5000

Example 2.8. For lunch, Elvi has to choose a main dish, a side dish and the type of rice. There are four
choices of main dish namely pork, fish, chicken, and beef; three choices of side dish, namely beans, vegetables,
and corn soup; two choices of types of rice namely plain and java. However, due to Elvi’s diet plans, she is
not allowed to eat the fish-beans-plain rice combo. How many choices of lunch combo does Elvi have?
Solution.
4 3 2 −1 = 23.
Suppose an experiment is to be repeated, i.e., to be performed multiple times. We say that the experi-
ment is with replacement if the outcome that occured in a repetition could still possibly occur in the next
repetitions. Otherwise, we say it is without replacement.
Example 2.9. Suppose two cards are drawn from the standard deck. How many possible outcomes are there
if they are drawn (a) with replacement, (b) without replacement?
Solution.
(a) 52 52 = 2704

7
(b) 52 51 = 2652

Example 2.10. Ten runners are racing to the Finish Line. How many possible ways can the Gold, Silver,
and Bronze medals be awarded?

Solution. 10 9 8 = 720

Example 2.11. How many 4-digit PIN Codes can be made if the first and last digits must be the same, and
the second and third must not be equal?

Solution. 10 10 9 1 = 900

Exercise. France, Bert, John, and Jules are members of a rock band. France can be the vocals and can
play any instrument. Bert can only play either drums or guitars. John and Jules both can be vocals and
play any instrument except the keyboard. How many ways can they arrange themselves in a performance?

2.2 Factorial Function


We define an important function in arithmetic called the factorial function.

Definition 2.1. For a nonnegative integer n, we define the factorial of n, denoted n! by


(
1 if n = 0
n! = .
n(n − 1)! if n > 0

We give the first few values of n!:

n 0 1 2 3 4 5 6 7 8 9 10
n! 1 1 2 6 24 120 720 5040 40320 362880 3628800

Example 2.12. Evaluate:

(1) 7! 6! + 5!
(3)
4!
5!
(2)
3!

Solution.

(1) 7! = 5040 6! + 5! 5!(6 + 1)


(3) = = 5 · 7= 35
4! 4!
5! 5 · 4 · 3!
(2) = = 5 · 4 = 20
3! 3!

Exercise. Evaluate:

(1) 8! 6!
(3)
4!2!
10! n!
(2) (4)
8! (n − 2)!

8
2.3 Linear Arrangements
Consider the following problem:

Example 2.13. How many ways can Alex, Bert, Charles, and Dennis arranged themselves if they sit in a
row?

Solution. 4 3 2 1 = 24
This can also be solved using the factorial 4! = 24.

Theorem 2.3 (Linear Arrangements). There are n! ways to rearrange n distinct objects.

Proof. Let there be n distinct objects. Then there are n possible objects that can placed in the first position.
Since this object cannot be placed again in the next position, there are n − 1 remaining objects that can
possibly be placed second. Furthermore, there are n − 2 on that can be placed next, and so on. This manner
continues until the last position in which only one object can possibly be placed. By GFCP we find that
there are

n n−1 n−2 ··· 1 = n(n − 1)(n − 2) · · · (1) = n!

Exercise. How many ways can we rearrange the letters of the word CEBUANO?

2.4 Arrangements with Indistinct Elements


Now we move to problems involving arrangements with indistinct elements. Consider the following example.

Example 2.14. How many ways can we rearrange the letters of the word:

(a) WORD;

(b) WOOD?

Solution.

(a) 4! = 24

(b) The possible rearrangements are:

WOOD OWDO DOOW ODWO


WODO OWOD DOWO ODOW
WDOO OOWD DWOO OODW

Thus, there are 12 ways to rearrange the letters of the word WOOD.

Theorem 2.4 (Linear Arrangement with Indistinct Elements). Let there be n distinct objects, each object
appearing r1 , r2 , . . . , rn times, respectively. Then, there are

(r1 + r2 + · · · + rn )!
r1 !r2 ! · · · rn !

ways to arrange them in a line.

9
Proof. Let there be n distinct objects each appearing r1 , r2 , . . . , rn times, respectively. Thus, there are
r1 + r2 + · · · + rn objects in all and by Linear Arrangement Theorem, there are (r1 + r2 + · · · + rn )! ways to
rearrange them disregarding the indistinctness of some elements.
Now by Linear Arrangement Theorem again, there are ri ! ways to rearrange the ith object. Since they
are indistinct, we have to deduct all these rearrangements of the ith object (for i = 1, 2, . . . , n), and that is
by dividing the total number of arrangements by r1 !r2 ! · · · rn !. Hence, there are

(r1 + r2 + · · · + rn )!
r1 !r2 ! · · · rn !
possible ways to rearrange them.

Example 2.15. How many ways can we rearrange the letters of the words

(a) WOOD;

(b) CRITIC?

Solution.
4!
(a) = 12
2!
6!
(b) = 180
2!2!

Exercise. How many ways can we rearrange the letters of the words:

(a) BOBBY;

(b) ELEMENTAL;

(c) PROBABILITY?

2.5 Circular Arrangements


Now we turn to circular arrangements. Consider the following problem.

Example 2.16. How many ways can Alex, Bob, Charles, and Dennis arrange themselves around a table?

Solution. All the possible arrangements are shown below:

Hence, there are 6 possible ways.

10
Theorem 2.5 (Circular Arrangements). There are (n − 1)! ways to arrange n objects around a circle
(assuming rotations are indistinct).

Proof. When n objects are to form a line, there are n! ways to arrange them. When ends of this line segment
are joined, we form a circle. This circle has n indistinct rotations that we have to remove from the total
n!
number of arrangements. Thus, there are = (n − 1)! ways to arrange them.
n
Example 2.17. How many ways can Alex, Bob, Charles, and Dennis arrange themselves around a circle:

(a) if there are no restrictions;

(b) if Alex and Bob choose to sit beside each other;

(c) if Alex and Bob choose not to sit beside each other?

Solution.

(a) (4 − 1)! = 6

(b) (3 − 1)!2! = 4

(c) (4 − 1)! − (3 − 1)!2! = 2

Exercise. How many ways can Alex, Bob, Charles, Dennis, and Eugene arrange themselves around a circle
if:

(a) there are no restrictions;

(b) Alex and Bob choose to sit beside each other;

(c) Alex and Bob choose not to sit beside each other;

(d) Alex, Bob, and Charles choose to sit together?

Evaluation
A. Basic Concepts. (2 points each)

(1) State the Fundamental Counting Principle and the Generalized Fundamental Counting Principle.
(2) Explain how each are calculated: (a) linear arrangements; (b) linear arrangements with indistinct
elements; (c) circular arrangements.
(3) What is the difference between an experiment with and without replacement?
(4) Explain how one evaluates the factorial of a positive integer.

B. Problem Solving. (5 points each) Find the sample size:

(1) Tossing a coin twice and rolling a die.


(2) The possible outfits to be worn by a girl if she has a red, a blue, and a white blouse, and a pink, a
white, and a green skirt.
(3) Mark wants to buy a new sedan from a local Toyota dealer. The dealer has three models: Vios,
Corolla, and Camry. The colors silver, black, white, maroon, and gray are available. How many
possible choices of cars does Mark have if he doesn’t like the white Camry?

11
(4) Dave assembles a new desktop PC. There are 4 choices of monitor brands, 2 choices of keyboards, 1
choice of mouse, 4 choices of processors and 5 choices of mother How many ways can the letters of
the word COMMITTEE be rearranged?
(5) there are two 1-Peso coins, three 5-Peso coins, and two 10-Peso coins. How many ways can they be
piled up in one pile?
(6) How many ways can three friends sit around a circle?
(7) How many ways can six friends sit around a circle if two of them choose not to sit beside each other?
(8) How many 5-digit PIN Codes can possibly be created if the first digit must be prime, the second and
third must be odd, and the fourth and fifth cannot be the same digits?
(9) How many ways can a class with 13 students elect their president, vice-president, secretary, treasurer,
and auditor? (no student can serve two positions)
(10) How many ways can 5 friends be seated in a row?
(11) Suppose there are 3 pairs of couples. How many ways can they arrange themselves in a row if each
couple must sit beside each other?
C. Theoretical Work. (10 points each)

(1) Prove the Generalized Fundamental Counting Principle.


n!
(2) Evaluate: .
2!(n − 2)!
(3) How many n-digit PIN Codes can be created?
n!
(4) Show that is an integer for any r, n ∈ Z+ with r ≤ n.
(n − r)!
n!
(5) Show that is an integer for any r, n ∈ Z+ with r ≤ n.
r!(n − r)!

12
Lecture 03 - Permutations and Combinations

Lecture Objectives
(1) Define permutations and combinations
(2) Use permutation and combination functions to solve relevant counting problems
(3) Generalize and modify the concept of permutation and combination function for specific cases of problems

In addition to the counting techniques already discussed in the last lecture, we present here two important
counting functions namely permutation and combination. We start this lecture with a classical example
called the Handshake Problem.
Example 3.1 (The Handshake Problem). Five gentlemen attended a party. Each man shook hands with
every other man. How many handshakes were there in all?
Solution. The five vertices below represent the five men while the edges represent the handshakes:

Counting the edges, there are 10. Hence, there were 10 handshakes in all.

3.1 Permutations
Recall from Lecture 02 that there are n! ways to arrange n distinct objects. In this part of the lecture we
are to deal with n distinct objects and we take r of them (0 ≤ r ≤ n) to arrange. How many arrangements
are possible?
Example 3.2. How many ways can gold, silver, and bronze medals be awarded to four racers?
Solution. We denote the racers by A, B, C, D. The possible outcomes are:
ABC BAC CAB DAB
ACB BCA CBA DBA
ABD BAD CAD DAC
ADB BDA CDA DCA
ACD BCD CBD DBC
ADC BDC CDB DCB

13
This means that there are 24 possible outcomes.

The quicker way to solve the problem is through the permutation function which will be introduced by
the following theorem.
n!
Theorem 3.1. Let there be n distinct objects. Then there are ways to arrange r of them (0 ≤ r ≤ n).
(n − r)!

Proof. Let there be n objects. If we arrange r of them, then by GFCP, there are n(n − 1)(n − 2) · · · (n − k + 1)
ways to arrange them. Using the factorial notation, this number is equal to

n!
.
(n − r)!

We will use the common notation


n!
n Pr = .
(n − r)!
The function n Pr is called the permutation function. Now, we go back to the example we had earlier.
Solution (Example 3.2). From n = 4, we take 3 to arrange.

4!
4 P3 =
(4 − 3)!
4!
=
1!
= 24.

Exercise.

(1) A valedictorian and a salutatorian have to be selected from a group of 10 students. How many possible
ways can this be done?

(2) A building owner hired 8 guards. Five of the guards have to be assigned to patrol a five-storey building
(one guard per floor), and the remaining will be reserved at the guard house. How many ways can they
be assigned?

(3) In a lottery, five balls in exact order are to be drawn from a total of 20 numbered balls. How many
possible winning entries are there?

3.2 Combinations
We have dealt with problems involving arrangements. In this section, we will look at problems in which the
arrangement or order of the elements does not matter. See the problem below.

Example 3.3. How many ways can we select three letters from the list: A, B, C,D, and E?

Solution. The possible selections are:

ABC ABD ABE ACD ACE


BCD BCE BDE CDE ADE

14
Hence, there are 10 possible selections.

Observe that for the selection ABC, its rearrangements namely ACB, BAC, BCA, CAB, and CBA did not count.
In other words, all 3! rearrangements of each selection count as one.
In general, if we select r objects from a set containing n distinct ones, we can take all possible permutations
n Pr and then divide by r!, the number of ways to rearrange each selection. This is called the combinatorial
function.

Definition 3.1. The combinatorial function is defined by

n!
n Cr = .
r!(n − r)!

Theorem 3.2. There are n Cr ways of selecting r objects from a set of n.

Proof. We know that there are n Pr permutations. There are r! identical cases when arrangement does not
matter. Thus, there are
n Pr n!
= =n Cr
r! r!(n − r)!
possible outcomes.

Solution (Example 3.3).

5!
5 C3 =
3!(5 − 3)!
5!
=
3!2!
5 · 4 · 3!
=
3! · 2
= 10

Exercise.

(1) How many ways can we select 5 members to form a dance group from a class of 10 students?

(2) John wants to purchase 3 colors of paint for his school project. At the store, there are 7 colors available.
How many choices does John have?

From this point forward, we will denote


 
n
n Cr = .
r

3.3 Differentiating Permutations and Combinations


The best way to tell between permutations and combinations is that
permutation - arrangement matters
combination - arrangement does not matter
Example 3.4. From a class with 12 students, 5 have to be chosen to attend a meeting. How many choices
are there?

15
Solution. Arrangement does not matter. Hence,
 
12 12!
=
5 5!(12 − 5)!
12 · 11 · 10 · 9 · 8 · 7!
=
5 · 4 · 3 · 2 · 1 · 7!
= 792
Example 3.5. From a class with 12 students, 5 have to be chosen to become the president, vice-president,
secretary, treasurer, and auditor. How many choices are there?
Solution. In this problem, the arrangement matters.
12!
12 P5 =
(12 − 5)!
12 · 11 · 10 · 9 · 8 · 7!
=
7!
= 95040

Exercise. Identify whether the following experiments are permutations or combinations, then solve the
problem.
(1) A class of 10 students has to elect a president, vice-president, secretary, treasurer, and auditor. How
many ways can these officers be selected?
(2) How many ways can 3 items be selected from a group of 6 items?
(3) Suppose that a bank serves 50 accounts. They wanted to randomly choose a sample of 10 accounts to
study in order to better understand the behavior of their clients. How many possible sets of samples can
be taken?
(4) How many ways can 3 cards be drawn sequentially from the standard deck?

Evaluation
A. Basic Concepts (2 points each)
(1) Differentiate when to use the permutation and combination functions.
B. Problem Solving (5 points each)
(1) A pet shop sells hamsters. If they have 15 available hamsters, how many choices do we have if we
want to buy 5?
(2) Ten chips are numbered 1 to 10. If three chips are to be drawn in exact order, how many possible
outcomes are there?
(3) International airports around the world are coded using three letters. How many possible codes can
be generated?
(4) How may possible winning combinations are there in the 6/42 Lotto draw?
C. Theoretical Work (10 points each)
(1) Generalize and give an explicit formula to solve the Handshake Problem for any number of gentlemen
attending the party.

16
Lecture 04 - The Binomial Theorem

Lecture Objectives
(1) State, prove and apply the binomial theorem

(2) Use the binomial theorem in solving relevant counting problems

4.1 The Pascal Identity and the Pascal Triangle


The number nr is also often referred to as a binomial coefficient because of its role in binomial expansion.


We first present an important identity formed by binomial coefficients and this use this to construct the
famous Pascal Triangle.

Theorem 4.1 (Pascal Identity). Let r, n ≥ 0 with r ≤ n. Then,


     
n n n+1
+ = .
r r+1 r+1

Proof. We have,
   
n n n! n!
+ = +
r r+1 r!(n − r)! (r + 1)!(n − r − 1)!
 
n! 1 1
= +
r!(n − r − 1)! n − r r + 1
 
n! n+1
=
r!(n − r − 1)! (n − r)(r + 1)
(n + 1)!
=
(r + 1)!(n − r)!
(n + 1)!
=
(r + 1)! [(n + 1) − (r + 1)]!
 
n+1
= .
r+1

The primary application of this identity is the construction of the Pascal Triangle:

17
From the Pascal Triangle, nr is the rth entry of the nth row (note however that we start counting with


a zeroth row and zeroth entry). For example, 42 = 6 is illustrated below:




In this section, we state and prove three basic properties of the binomial coefficient.

Theorem 4.2. Let n, r ≥ 0 with r ≤ n. Then,

(i) n0 = nn = 1;
 

(ii) n1 = n−1
n
 
= n; and,

(iii) nr = n−r
n
 
.

Proof. For n, r ≥ 0 and r ≤ n,

n
 n! n!
(i) 0 = = =1
0!(n − 0)! n!
n
 n! n!
n = n!(n − n)! = n! = 1

n
 n! n!
(ii) 1 = = =n
1!(n − 1)! (n − 1)!
n
 n! n!
n−1 = (n − 1)![n − (n − 1)]! = (n − 1)! = n

n
 n! n! n

(iii) n−r = = = r .
(n − r)![n − (n − r)]! r!(n − r)!

4.2 The Binomial Theorem


Binomial coefficients are named so due to their appearance in the expansion of (x + y)n . In this section, we
discuss how to expand binomials quickly.

18
Theorem 4.3 (Binomial Theorem). Let x and y be variables and n any nonnegative integer. Then,
n  
X n n−r r
(x + y)n = x y .
r=0
r

In other words,
       
n n n n−1 n n−2 2 n n
(x + y)n = x + x y+ x y + ··· + y .
0 1 2 n

Proof. (Binomial Theorem) When n = 0, the proof is trivial. For n > 0, we will use induction. When n = 1,
we have    
1 n 1 1 1
(x + y) = x + y = x + y .
0 1
Suppose that for n = k,
k  
k
X k
(x + y) = xk−r y r .
r=0
r
Then, for n = k + 1,

(x + y)k+1 = (x + y)(x + y)k


n  
X n n−r r
= (x + y) x y
r=0
r
        
k k k k−1 k k−2 2 k k
= (x + y) x + x y+ x y + ··· + y
0 1 2 k
       
k k+1 k k k k−1 2 k
= x + x y+ x y + ··· + xy k
0 1 2 k
       
k k k k−1 2 k k−2 3 k k+1
+ x y+ x y + x y + ··· + y
0 1 2 k
         
k + 1 k+1 k k k k
= x + + xk y + + xk−1 y 2 +
0 0 1 1 2
     
k k k + 1 k+1
··· + + xy k + y
k−1 k k+1
       
k + 1 k+1 k+1 k k+1 k + 1 k+1
= x + x y + ··· + xy k + y
0 1 k k+1
k+1
X k + 1 
= y k+1−r .
r=0
r

This completes our proof.


Example 4.1. Expand:
(1) (x + y)4
(2) (2x + y)3
(3) (3x − 2y)2
Solution.
(1) (x + y)4 = x4 + 4x3 y + 6x2 y 2 + 4xy 3 + y 4

19
(2) (2x + y)3 = (2x)3 + 3(2x)2 y + 3(2x)y 2 + y 3 = 8x3 + 12x2 y + 6xy 2 + y 3

(3) (3x − 2y)2 = (3x)2 + 2(3x)(−2y) + (−2y)2 = 9x2 − 12xy + 4y 2

Exercise. Expand:

(1) (x + y)5

(2) (−x + 2y)4

(3) (x + y + z)

The Binomial Theorem also has applications for theoretical work including the following corollary which
may be llustrated by summing up the rows of the Pascal Triangle.

Theorem 4.4. For n, r ≥ 0 with r ≤ n, we have


n  
X n
= 2n .
r=0
r

Proof. Let n, r ≥ 0 with r ≤ n. Then,


n   n  
X n X n
2n = (1 + 1)n = 1n−r 1r = .
r=0
r r=0
r

4.3 Applications of the Binomial Theorem to Counting


Aside from being a handy tool in expanding powers of binomials, the Binomial Theorem also has applications
to counting.

Example 4.2. Team A and Team B play 5 basketball games. How many ways can A win four times and B
once?

Solution. The possible ways are

AAAAB AABAA BAAAA


AAABA ABAAA

So there are 5.

Using the Binomial Theorem, the numerical coefficient of the term with A4 B in the expansion of (A+B)5
gives the answer which is 5. Generally, for two outcomes A and B of an experiment n times, there are nr
ways for A to occur r times and B to occur n − r times. So for Example 4.2, we have

(A + B)5 = A5 + 5A4 B + 10A3 + B 2 + 10A2 B 3 + 5AB 4 + B 5

implying that there is 1 way for A to occur 5 times, 5 ways for A to occur 4 times and B once, 10 ways for
A to occur thrice and B twice, and so on.

Exercise. A coin is tossed 4 times. Give the number of ways for heads to occur 0, 1, 2, 3, and 4 times.

20
Evaluation
A. Basic Concepts (2 points each)
(1) Explain how the Pascal’s Triangle is constructed.
(2) How is the Pascal Triangle used?
(3) Illustrate Theorem 4.2 by reflecting it to the Pascal Triangle.
(4) How do we use the Binomial Theorem in expanding binomial powers?
(5) How do we use the Binomial Theorem as a counting technique?
B. Problem Solving (5 points each)
(1) Construct the Pascal Triangle down to n = 10.
(2) Expand:
(a) (x + y)7
(b) (2x − y)6
(c) (x − y + z)3
(3) Garry and Bobby play 5 chess games. How many of each possible outcome are there (Win-Draw-
Loss)?
C. Theoretical Work (10 points each)

n n(n − 1)

(1) Show that 2 = .
2
Pn
(2) Prove that k=1 (−1)k nk = 0.

Pn
(3) Prove that k=1 2k nk = 3n .


21
Lecture 05 - Sets and Set Cardinalities

Lecture Objectives
(1) Review basic concepts in set theory including basic definitions, set operations, and some important
theorems

(2) Establish set counting formulas

(3) Use set counting formulas in solving relevant counting problems

Set theory is essential to perhaps all areas of mathematics. Most mathematicians believe that sets are the
second or third most important objects in mathematics after numbers or functions. Probability, being built
over sample spaces which are also sets, of course needs sufficient fundamental concepts of sets. In this lecture
we give a quick review of the most important lessons in Set Theory including basic definitions, set operations,
and set cardinalities which is also a topic on counting.

5.1 Basic Definitions


There are two terms that are not defined in set theory: sets and membership. A set is simply understood
as a well-defined collection or aggregate of objects. An object is a member of the set if it is part of the
collection. A member is also called an element of the set. If A is a set and x is an element of A, we write
x ∈ A. Otherwise, we write x ∈ / A.

Example 5.1. Write the set of all vowels in the English alphabet.

Solution. S = {a, e, i, o, u}
A set B is said to be a subset of A if every element of B is also in A. We write B ⊆ A. In other words,

B ⊆ A iff x ∈ B ⇒ x ∈ A.

For example, the set {a, e} is a subset of the set S above.


Two sets are equal if they have the same elements. We denote this by A = B. In other words, A and B
are equal if and only if every element of A is in B and every element of B is also in A. That is,

A = B iff A ⊆ B and B ⊆ A.

Clearly, for any set A, A ⊆ A.


There is a set which contains no elements called the empty set denoted ∅. There is also a set which
contains all elements (of interest) called the universal set U . Note that for any set A, ∅ ⊆ A ⊆ U . This
is because if ∅ * A, then there is an element x in ∅ such that x ∈ / A. But ∅ is empty arriving at a
contradiction. Also, if A * U , then there is an element x in A that is not in U But U should contain all
elements, arriving at another contradiction.
Subset relationship is also transitive. That is, if A ⊆ B and B ⊆ C, then A ⊆ C. This is because if
x ∈ A, then x ∈ B. This further implies that x ∈ C.

22
5.2 Set Operations
We will now go on the algebra of sets. Along our discussions, we will accompany set operations with Venn
diagrams. We will define four basic set operations followed by some of the most important identities. For all
the following definitions, we let A and B be sets.

Definition 5.1. The union of A and B, denoted A ∪ B, is the set of all elements of A or B. In other
words,
A ∪ B = {x | x ∈ A or x ∈ B} .

Definition 5.2. The intersection of A and B, denoted A ∩ B, is the set of all elements of A and B. In
other words,
A ∩ B = {x | x ∈ A and x ∈ B} .

Definition 5.3. The (universal) complement of A is the set of all elements that are not in A. In other
words,
Ac = {x | x ∈
/ A}

Definition 5.4. The relative complementof B from A is the set of all elements of A that are not in B.
In other words,
A − B = {x | x ∈ A and x ∈/ B} .

23
Example 5.2. Let U = {1, 2, 3, 4, 5, 6, 7, 8, 9, 19}, A = {2, 3, 4, 5, 6}, B = {5, 6, 7, 8, 9}, and C = {2, 4, 6, 8}.
Find:
(1) A ∪ B
(2) A ∩ C
(3) Ac
(4) B − C
Solution.
(1) A ∪ B = {2, 3, 4, 5, 6, 7, 8, 9}
(2) A ∩ B = {2, 4, 6}
(3) Ac = {1, 7, 8, 9, 10}
(4) B − C = {5, 7, 9}
Theorem 5.1. Let A, B, and C be sets. Then,
(i) (Identity Laws) A ∪ ∅ = A, A ∩ U = A;
(ii) (Domination Laws) A ∩ ∅ = ∅, A ∪ U = U ;
(iii) (Complement Laws) A ∪ Ac = U, A ∩ Ac = ∅;
(iv) (Idempotent Laws) A ∪ A = A, A ∩ A = A;
(v) (Associativity Laws) A ∪ (B ∪ C) = (A ∪ B) ∪ C, A ∩ (B ∩ C) = (A ∩ B) ∩ C;
(vi) (Commutativity Laws) A ∪ B = B ∪ A, A ∩ B = B ∩ A
(vii) (Distributivity Laws) A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C), A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C)
(viii) (De Morgan’s Laws) (A ∪ B)c = Ac ∩ B c , (A ∩ B)c = Ac ∪ B c
In proving these identities, the principle of set equality: “A = B if and only if A ⊆ B and B ⊆ A” can
be useful. We do this by assuming an element from the left hand side and showing that it is also an element
of the other side, and then the doing similarly for the reverse.
Proof. (Theorem 5.1) We only prove (i) and (viii). For (i), let x ∈ A ∪ ∅. Then, x ∈ A or x ∈ ∅. But, x ∈/∅
because ∅ is empty. Hence, x ∈ A. Thus, A ∪ ∅ ⊆ A. Now if x ∈ A, then x ∈ A or x ∈ ∅. This means that
x ∈ A ∪ ∅. This implies that A ⊆ A ∪ ∅.
Now for (viii), let x ∈ (A ∪ B)c . Then, x ∈
/ A ∪ B. Hence, x ∈ / B. Thus, x ∈ Ac and x ∈ B c .
/ A and x ∈
So x ∈ Ac ∩ B c . So (A ∪ B) ⊆ Ac ∩ B c . Now if x ∈ Ac ∩ B c , then x ∈ Ac and x ∈ B c . Hence, x ∈/ A and
x∈/ B. Thus it is not true to say that x ∈ A or x ∈ B, or in other words, it not true to say that x ∈ A ∪ B.
/ A ∪ B which means x ∈ (A ∪ B)c . Hence, Ac ∩ B c ⊆ (A ∪ B)c .
Thus, x ∈

24
5.3 Set Cardinality
Lastly, we proceed to the final lesson in counting: set cardinalities. Formally, if In = {1, 2, . . . , n} and there
exists a bijection from In to a finite set A, then the cardinality of A is |A| = n. Loosely speaking, the
cardinality of a finite set A is the number of elements in A. We present the following theorem without proof.

Theorem 5.2. For any sets A and B,

|A ∪ B| = |A| + |B| − |A ∩ B|.

Example 5.3. There are 30 students in a classroom. Fourteen of them own a phone while 16 own a
computer. If 7 own neither phone or computer, how many own both?
Solution. If A is the set of those who own a phone and B the set of all those who own a computer, then,
|A| = 14, |B| = 16, and |A ∪ B| = 23. Now,

|A ∪ B| = |A| + |B| − |A ∩ B|
|A ∩ B| = |A| + |B| − |A ∪ B|
= 16 + 14 − 21
=9

Hence, 9 students own both a phone and a computer.

A. Basic Concepts (2 points each)

(1) What is a set?


(2) In the lecture we talked about cardinality for finite sets. Define cardinality for non-finite sets.
B. Problem Solving (5 points each)
(1) Let A = {a, b, c, d, e}, B = {a, c, e}, C = {c, d, e}, D = {a, b, c, d, e, f}.
(a) B∪C
(b) Ac
(c) Ac ∪ B
(d) (A ∩ B)c
(e) (A − B) ∪ C
(2) In a room, there are 30 people. Sixteen of them can speak Waray while 15 can speak Ilonggo. If
there are 5 who can speak both Waray and Ilonggo, how many cannot speak either dialect?
C. Theoretical Work (10 points each)
(1) Let A ⊆ B. Show that A ∪ B = B.
(2) Prove that (A − B) ∪ (B − A) = (A ∪ B) − (A ∩ B).
(3) Prove four identities in Theorem 5.1 (except those which have already been proven.)
(4) Extend Theorem 5.2 to involve the union of three sets A ∪ B ∪ C. Prove your equation.

25
26
Chapter 1

Probability

Probability Theory may be described as the “mathematics of chance”. We will first define sample spaces
and events. Our study is focused on the likelihood of occurrence of events in an experiment. Then, we will
establish probability as a measure of this likelihood. Through several examples, we will calculate probability
values of events which is the main feature of this chapter. In addition, we will also tackle a special redefinition
of probability intended for events of which occurrence depend on another event. Lastly, we will apply to
real-world problems an important result called Baye’s Rule.

Intended Learning Outcomes


(CO 1.1) Define basic terms in probability
(CO 1.2) Perform and compute probabilities from experiments
(CO 1.3) Define and give examples of mutually exclusive events and independent events

Lectures
Lecture 6. Sample Spaces and Events
Lecture 7. Axiomatic Probability Theory and Other Approaches
Lecture 8. Calculating Probabilities
Lecture 9. Conditional Probability and Independent Events

Lecture 10. Baye’s Rule


Lecture 06 - Sample Spaces and Events

Lecture Objectives
(1) Define sample space, sample point, event, sample size, and event size
(2) Write out sample spaces and events of a given experiment
(3) Use counting techniques to determine the sample size and event size of a sample space and an event

The four fundamental terms in Probability Theory are experiment, sample space, sample point, and event.
Experiment has already been understood in the previous chapter while the latter three terms correspond to
similar concepts from Set Theory.

6.1 Sample Spaces


A sample space is the set of all possible outcomes of an experiment. Each element in a sample space S is
called a sample point.
Example 6.1. (1) In the experiment of rollng a diem the sample space is given by

S = {1, , 2, 3, 4, 5, 6}.

(2) A committee is organizing an event. They model the number of people who will possibly come in the said
event. Then sample space is given by
S = {0, 1, 2, 3, . . .}.

(3) The sample space of the experiment “gender of a newborn baby” is given by

S = {male, female}.

(4) The estimated time of arrival of a plane flight is between 30 and 45 minutes. The sample space is

S = {x ∈ R | 30 ≤ x ≤ 45}.

(5) A name is to be given to a newly born baby. The sample space is hard to write but the following might
illustrate it:
S = {Aaron, James, Anna, . . .}.
There are two types of sample spaces: discrete and continuous. A discrete sample space is one that is
countable. Examples of discrete sample spaces are in (1), (2), (3), and (5). A continuous sample space is
one which is uncountable. Example (4) illustrates this type of sample space.
A sample space may also be classified as either finite or infinite. For a discrete sample space, it is finite
if there is a finite number of sample points. For example, (1) and (3) are finite discrete sample spaces but (2)
and (5) are infinite discrete sample spaces because the number of possible outcomes in these experiments are
unbounded. A continuous sample space is finite if the values are bounded, and infinite otherwise. Example
(4) has a finite continuous sample space.

28
The concept of sample size is vital to our approach to studying probability theory. We shall separately
define sample size for the discrete case and continuous case. We denote the sample size of S by |S|.
For a finite discrete sample space, the sample size is its Cardinality. Hence, for Example (1), the sample
size is 6, for (3) is 2. For an infinite discrete sample space, we denote the sample size by ℵ0 (read as “aleph
null ”).
The sample size of continuous sample spaces is harder to define since a background in measure theory is
needed. However, so as not to require much prerequisite understanding, the continuous sample spaces to be
dealt with in this course are only intervals of real numbers, or unions thereof. We define the sample space
of an interval from x = a to x = b (regardless of whether it is open or closed) to be its length b − a. The
size of an infinite sample space is denoted by ∞. The sample size in (4) is 45 − 30 = 15.

6.2 Events
An event E is a subset of the sample space S.
Example 6.2. For the following examples, each item corresponds to the experiments in Example 6.1.
(1) In the experiment of rolling a die, the event described by “the outcome is even” is

E = {2, 4, 6}.

(2) In the second experiment, the event “less than 100 people will come” is given by

E = {0, 1, 2, . . . , 99}.

(3) The event that the plane arrives not earlier than 35 minutes but not later than 40 minutes is given by
the set
E = {x ∈ R | 35 ≤ x ≤ 40}.

(4) The event “the name of a newly born baby starts with the letter A” is given by

E = {Aaron, Anna, . . .}.

Set operations also play an important role in dealing with events. Either or or signify the union of two
events. For example, the phrase “either E or F ” means the event E ∪ F . The conjunction and signify the
intersection of two events. For example, the phrase “both E and F ” describe the event E ∩ F . The negator
“not” signifies the complement. For example, the event “not E” is E c . Lastly relative complement E − F
may be described as “E but not F ”.
Example 6.3. Suppose the sample space of an experiment is given to be

S = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}.

What are the events described in the following?

(a) even (f ) even or not more than 7


(b) odd
(g) odd and prime
(c) prime
(h) not prime
(d) greater than 4
(e) not more than 7 (i) even but not greater than 4

29
Solution.

(a) {2, 4, 6, 8, 10} (f) {1, 2, 3, 4, 5, 6, 7, 8, 10}


(b) {1, 3, 5, 7, 9}
(g) {3, 5, 7}
(c) {2, 3, 5, 7}
(h) {1, 4, 6, 8, 9, 10}
(d) {5, 6, 7, 8, 9, 10}
(e) {1, 2, 3, 4, 5, 6, 7} (i) {2, 4}

The size of an event E, called event size and denoted |E|, is defined in a similar manner as for sample
size.

Evaluation
A. Basic Concepts (2 points each)
(1) What are the four undefined terms of probability theory? Describe each.
(2) Differentiate: discrete vs continuous, finite vs infinite sample spaces.
(3) What is the difference between the sample size of a discrete finite sample space and of a continuous
finite sample space? What about for infinite ones?

B. Problem Solving (5 points each)


(1) Give the sample space of the following experiments and the event being described:
(a) rolling a die; not prime
(b) number of heads of tossing 5 coins; at least two heads
(c) height of a newly planted seed after three days; between 3.5 and 4.8cm
(d) going fishing and number of fishes caught; even number of fishes
(e) score in a 10-item exam; pass (passing rate is 60%)
(2) Let S = {x ∈ Z | 1 ≤ x ≤ 100}. Determine the size of the following events:
(a) perfect square
(b) multiple of 3
(c) does not exceed 30 but is greater than 16
(d) prime and less than 50
(e) even or not less than 90

30
Lecture 07 - Axiomatic Probability Theory and Other
Approaches

Lecture Objectives
(1) Use axiomatic approach to prove properties of probability theoretically

(2) Familiarize other approaches to probability theory: classical, relative frequency, and subjective

(3) Identify when each approach is appropriate.

What is probability? This question is not as simple as it reads. The ways to define probability may range
from a theoretical standpoint to an experimental conclusion, and may sometimes even be just someone’s
opinion. In this lecture we attempt to define probability using the axiomatic approach, and three other
approaches.

7.1 Definition of Probability


We want to define probability in way that it measures the likelihood of an event to occur. More precisely, we
want to define it in a way that it tells us how many times the event would occur if we repeat an experiment
in a LARGE number of times. Mathematically, the probability P (E) that the event E would occur is
defined as
fE (n)
P (E) = lim ,
n→∞ n
where fE (n) is the number of times E occurred after n trials of the experiment.
We will consider illustrating this definition through the simple experiment of tossing a coin. Suppose we
want to determine the probability that the outcome is heads (we may denote the event as “heads” and write
the probability as P (heads)). Then,

fheads (n)
P (heads) = lim .
n→∞ n
That means we need to toss a coin an infinite number of times in order to calculate the probability P (heads).
By this example, we can already see the obvious problem posed by this definition: it is practically impossible
to toss the coin an infinite number of times!
This problem may require us to look for other ways of obtaining probability. In the following sections of
this lecture, we will learn alternative ways of defining probability. We start with a theoretical way called the
axiomatic approach.

7.2 Axiomatic Probability Theory


Axiomatic probability theory is closely related to the discipline of measure theory. We define a set S which
we call the sample space. Any subset E of S is called an event. For each event E, we correspond a real
number P (E) called the probability of E which satisfies the following axioms:

31
Axiom 1. 0 ≤ P (E) ≤ 1.

Axiom 2. P (S) = 1.

Axiom 3. P (E ∪ F ) = P (E) + P (F ) if E and F are mutually exclusive.

Axiom 1 states that probability ranges only between 0 and 1. If P (E) = 0, then E is said to be an
impossible event, while if P (E) = 1, then E is certain. Axiom 2 states that the probability of the sample
space is 1. That is, that the sample space is certain. Now, we state first that two events E and F are
mutually exclusive if and only if they are disjoint. That is,

E and F are mutually exclusive ⇔ E ∩ F = ∅.

Axiom 3 states that the probability of the union of two disjoint events is the sum of their probabilities.
From these axioms, we prove a few propositions.

Theorem 7.1. For any event E,


P (E c ) = 1 − P (E).

Proof. We know that S = E ∪ E c and that E ∩ E c = ∅ Hence

1 = P (S)
= P (E ∪ E c )
= P (E) + P (E c )

Therefore, P (E c ) = 1 − P (E).

From Theorem 7.1, if p is the probability that E occurs, then, 1 − p is the probability that E does not
occur. For example, if the probability that a newborn baby is a boy is 43 , then the probability that it is a
girl is 14 .

Theorem 7.2. If E ⊆ F , then P (E) ≤ P (F ).

Proof. Since E ⊆ F , then,


F = E ∪ (E c ∩ F ) .
Note also that E and E c ∩ F are mutually exclusive. Then we have

P (F ) = P (E ∪ (E c ∩ F ))
= P (E) + P (E c ∩ F )

Since P (E c ∩ F ) ≥ 0, we can conclude that

P (F ) ≥ P (E).

The above theorem tells us that larger events have larger probabilities. This gives an “order” for proba-
bility as a measure.

Theorem 7.3. For any events E and F ,

P (E ∪ F ) = P (E) + P (F ) − P (E ∩ F ).

32
Proof. Note that E ∪ F = E ∪ (E c ∩ F ) and that E and E c ∩ F are mutually exclusive. Now,

P (E ∪ F ) = P (E ∪ (E c ∩ F ))
= P (E) + P (E c ∩ F )

Furthermore, since F = (E ∩ F ) ∪ (E c ∩ F ) and E ∩ F and E c ∩ F are mutually exclusive, we have

P (F ) = P ((E ∩ F ) ∪ (E c ∩ F ))
= P (E ∩ F ) + P (E c ∩ F )
P (E c ∩ F ) = P (F ) − P (E ∩ F )

Substituting this to the above equation, we have

P (E ∪ F ) = P (E) + P (F ) − P (E ∩ F ).

It can be shown from Theorem 7.1 that P (∅) = 0 (see next Exercise). Moreover, when E and F are
mutually exclusive, then Theorem 7.3 is shown to be consistent with Axiom 3.
Exercise. Show that P (∅) = 0.
The preceding axioms and theorems can be used to calculate probability values as the following examples
show.
Example 7.1. In a survey conducted by a company among its employees, it was found that 64% are
computer-literate and 25% own a service vehicle. Further analysis also shows that 9% are both computer-
literate and vehicle owners. Suppose we pick one employee in the company, what is the probability that he is
either computer-literate or a vehicle owner?
Solution. We have

P (computer-literate) = 0.64,
P (vehicle owner) = 0.25, and,
P (computer-literate ∩ vehicle owner) = 0.09.

Now by Theorem 7.3,

P (computer-literate ∪ vehicle owner) = 0.64 + 0.25 − 0.09


= 0.80.

Thus, the probability that the employee is either computer-literate or a vehicle owner is 0.80. We can
also say that the chance that he is either computer-literate or a vehicle owner is 80%.
In order to shorten our notation, we might denote the events into capital letter, say, E and F . We provide
a short-hand notation to rewrite the above solution.
Solution. Let
E : employee is computer-literate
F : employee is a vehicle owner
Then, P (E) = 0.64, P (F ) = 0.25, and P (E ∩ F ) = 0.09. Thus,

P (E ∪ F ) = 0.64 + 0.25 − 0.09


= 0.80.

33
Exercise. What is the probability that the employee is not computer-literate and not a vehicle owner?
Exercise. Show that if E ⊆ F , then P (F − E) = P (F ) − P (E). [Hint: Note that A − B = A ∩ B c .]
Suppose we have three events E, F, G. Then, by repetitive application of Theorem 7.3,

P (E ∪ F ∪ G) = P (E ∪ (F ∪ G))
= P (E) + P (F ∪ G) − P (E ∩ (F ∪ G))
= P (E) + P (F ) + P (G) − P (F ∩ G) − P ((E ∩ F ) ∪ (E ∩ G))
= P (E) + P (F ) + P (G) − P (F ∩ G) − [P (E ∩ F ) + P (E ∩ G) − P ((E ∩ F ) ∩ (E ∩ G))]
= P (E) + P (F ) + P (G) − P (E ∩ F ) − P (E ∩ G) − P (F ∩ G) + P (E ∩ F ∩ E ∩ G)
= P (E) + P (F ) + P (G) − P (E ∩ F ) − P (E ∩ G) − P (F ∩ G) + P (E ∩ F ∩ G).

This observation is generalized in the following theorem (presented without proof).


Theorem 7.4. Let E1 , E2 , . . . , En be events. Then,
n
! n
[ X X X
P Ei = P (Ei )− P (Ei ∩ Ej ) + · · · + (−1)r1 P (Ei1 ∩ Ei2 ∩ · · · ∩ Eir )+
i=1 i=1 i≤j i1 <i2 <···<ir

· · · + (−1)n+1 P (E1 ∩ E2 ∩ · · · ∩ En ).

For theoretical work, the axiomatic approach to probability theory is always used. These axioms and
theorems will be used in most of the following lectures we will have in this course. However, the axiomatic
approach may not be the most practical approach of obtaining probability values. The following sections
show other methods of calculating the probability of an event.

7.3 Classical Approach


What we have learned about sample size and event size will be important. In the classical approach to
probability theory, the probability P (E) of an event E is defined as
|E|
P (E) =
|S|
where |E| is the size of E and |S| is the size of S. In this approach, we assume that each outcome is equally
likely to occur, or in other words, there is a uniform likelihood among each outcome.
Example 7.2. Suppose we roll two dice. What is the probability that the sum is 7?
Solution. We know that there are |S| = 36 possible outcomes in rolling two dice. Of these, there are |E| = 4
whose sum is 7. Thus,
4 1
P (sum is 7) = = .
36 9
Example 7.3. Suppose a bowl contains 4 red, 5 blue, and 6 green chips. If we draw one chip from the bowl,
what is the probability that it is not green?
Solution. There are |S| = 4 + 5 + 6 = 15 possible outcomes and |E| = 6 green chips. Hence,
6 2
P (green) = = .
15 5
Thus,
2 3
P (not green) = 1 − = .
5 5

34
Example 7.4. Two cards are randomly drawn from the standard deck. What is the probability that both are
clubs?
Solution. There are |S| = 51 13
 
2 = 1275 possible outcomes and |E| = 2 = 78 possible outcomes that are
both clubs. Thus,
78
P (E) = ≈ 0.0612.
1275
Exercise. Two cards are successively drawn from the standard deck. What is the probability that the first
card is clubs and the second is diamonds?
Now we deal with examples of experiments in which the sample space S is continuous. Recall that for
an interval E = [a, b], we define the size of E by
|E| = b − a.
Example 7.5. A plane is expected to arrive in between 30 and 45 minutes. What is the probability that it
arrives between 36 and 39 minutes?
Solution. We have |S| = 45 − 30 = 15 and |E| = 39 − 36 = 3. Thus the probability that the plane arrives
in the said time interval is
3 1
P (E) = = .
15 5
Next, we move in to a more experimental approach to probability.

7.4 Relative Frequency Approach


Recall that
fE (n)
P (E) = lim .
n→∞ n
For a finite value of n, we call the ratio
fE (n)
n
the relative frequency of E in n trials. For a large value of n, the relative frequency approximates the real
probability value. For example, if we want to determine the probability that E occurs, we can repeat the
experiment to a large number of trials and count the number of times that E occurred.
Example 7.6. The best example of a relative frequency approach to obtaining the probability of an event is
the science of simulation. For instance, there is a bowl containing 4 white and 6 black balls. Suppose we
want to know the probability of obtaining a white ball when we randomly draw one from the bowl. We do
simulation by drawing a ball from the bowl, say, 1000 times. If out of the 1000 trials, the white ball occured
387 times, then we can say that
387
P (E) = = 0.387.
1000
2
We expect the result to be close to the classical probability of = 0.4.
5
Example 7.7. A store manager wants to determine the probability that a customer will purchase an item
once he enters the store. For a whole week, 2332 entered the store and 1345 purchased an item. Hence the
probability that an entering customer will purchase an item is
1345
P (E) = .
2332
Exercise. Use relative frequency method with n = 10 trials in order to obtain a probability of heads in
tossing a coin.
The next section gives another approach to obtaining the probability of an event.

35
7.5 Subjective Approach
Sometimes, probability is simply given by one’s opinion. In this approach, the likelihood that an event will
occur is measured by personal belief. This appears many times especially when a mathematical formulation
to determine the probability is unachievable.
Example 7.8. A man plans on asking a girl to marry him. The girl says that the man has 60% chance of
getting a YES from her.
Subjective probability values are usually given by expert opinion such as the following example.
Example 7.9. A patient suffering from a serious disease consults a doctor. The doctor says that the patient
has 50% chance of surviving the disease.

7.6 When to Use Which Approach?


The axiomatic approach is mainly used for theoretical work. however, even if the subject is non-theoretical
in nature, or even if the probability is obtained using any of the last three approaches, they are still bound to
the axioms of probability. That means whether classical, relative frequency, or subjective approach is used,
the axioms still have to be satisfied.
Below we enumerate some indicators of when and how to decide which approach to use.

Use classical approach when:


(1) the event and sample sizes can be determined; and,
(2) each outcome is assumed to be equally likely to occur with any other outcome.
Use relative frequency approach when:
(1) classical approach cannot be used; and,
(2) either:
(a) the experiment can be repeated (i.e., a simulation can be done n times); or,
(b) previous data is available to determine relative frequency.
Use subjective method when:
(1) both classical and relative frequency approaches cannot be used.

Evaluation
A. Basic Concepts (2 points each)
(1) Create a flowchart that instructs when to use which among classical, relative frequency, and subjective
approach to use.
(2) Describe what probability is in your own words.
B. Problem Solving (5 points each)
(1)
C. Theoretical Work

36
Lecture 08 - Calculating Probabilities

Lecture Objectives
(1) Use counting techniques and classical method of assigning probability to calculate the probability of a
given event
(2) Apply axiomatic approach to probability to calculate probability values

In this lecture we will be dealing with a series of examples of problems that ask for probability values of
events. We will be applying what we have learned from the previous lectures, especially those from Lectures
2 to 5, and 7.

8.1 Examples Involving the Fundamental Counting Principle


We will first deal with examples that will need the application of FCP for either or both the event and
sample sizes. Generally, the classical approach will be used in these problems.
Example 8.1. Suppose a class has 11 males and 15 females. What is the probability that either John or
Emmanuel will be chosen as prince charming and Rowella as muse?
Solution. There are |S| = 11 · 15 = 165 possible outcomes in the sample space. And the specified event has
|E| = 2 · 1 = 2 possible outcomes. Therefore, the probability that the said event occurs is
2
P (E) = .
165
Example 8.2. If a die is rolled twice, what is the probability that the sum is 6?
Solution. There are |S| = 6 · 6 = 36 possible outcomes in the sample space and the specified event has
|E| = 5 possible outcomes (specifically (1, 5), (2, 4), (3, 3), (4, 2), and (1, 5)). Therefore,
5
P (E) = .
36
Example 8.3. A PIN code is made of four digits. What is the probability that the digits of your friend’s
PIN code are all the same?
Solution. There are |S| = 104 = 10000 possible PIN codes in all. There are only |E| = 10 · 1 · 1 · 1 = 10 of
such PIN codes that are made of all the same digits. Hence, the probability that all digits are the same is
10 1
P (E) = = .
10000 1000
Exercise.
(1) Three cards are drawn one by one from the standard deck without replacement. What is the probability
that all three are hearts?
(2) A bowl contains 10 balls numbered 1 to 10. Two balls are drawn one by one with replacement. What is
the probability that both balls are odd?

37
8.2 Examples Involving Linear Arrangements
Recall our basic permutation rule that there are n! ways of arranging n objects in a line. We apply this to
the following examples.
Example 8.4. Five friends went to watch a movie sitting in a row. What is the probability that Alex and
Bobby sit next to each other?
Solution. There are |S| = 5! = 120 possible ways to arrange themselves. Two count the number of ways
Alex and Bobby sit next to each other, we have
|E| = 2 · 4!
= 48.
Hence, the probability that they are sitting next to each other is
48 2
P (E) = = .
120 5
Example 8.5. Six runners A, B, C, D, E, and F are racing to the finish line. What is the probability that B
will come first and C will finish last?
Solution. There are |S| = 6! = 720 possible ways the race could end. There are |E| = 1 · 4 · 3 · 2 · 1 · 1 = 24
possible ways for B to finish first and C last. Thus, the probability for this event to happen is
24 1
P (E) = = .
720 30

Exercise. Calvin dreams to visit 5 European countries namely England, Spain, France, Italy, and Germany.
What is the probability that he will visit either England or Italy first and will not choose Spain as the last
country to visit?

8.3 Examples Involving Arrangements with Indistinct Elements


Next, we deal with probability on experiments where the sample space is made by linear arrangements of
elements that are not necessarily distinct. Recall from Lecture 2 that for a set containing k kinds of elements
with n1 , n2 , . . . , nk indistinct elements of each type, there are
(n1 + n2 + · · · + nk )!
n1 !n2 ! · · · nk !
ways to arrange them.
Example 8.6. A computer program randomly scrambles any word. What is the probability that the first
three letters of EXCELLENCE will be in place after it is scrambled?
Solution. There are
10!
|S| = = 75600
4!2!
possible ways the computer program will scramble the word. Of these, there are
7!
|E| = = 420
3!2!
ways for the first three letters EXC to be in place. Hence the probability is
420 1
P (E) = = .
75600 180

Exercise. A jar contains 4 red, 3 green, and 2 blue chips. A chip is drawn one after another and the color
is recorded until there is no more chip inside the jar. What is the probability that the last chip drawn is
red?

38
8.4 Examples Involving Circular Permutations
Next we deal with experiments involving circular arrangements. We recall that there are n! ways to arrange
n objects in a circle. We apply these principles in solving some related problems.
Example 8.7. Four friends A, B, C, and D are to sit around a table. What is the probability that
(1) A and B sit together?
(2) A and B does not sit together?
Solution. There are |S| = (4 − 1)! = 6 ways to arrange them around a table. For (1), there are |E| =
(3 − 1)! · 2 = 4 ways for A and B to sit together. Thus, the probability that they sit together is
4 2
P (E) = = .
6 3
For (2), there are |E| = 6 − 4 = 2 for them not to sit together. Hence,
2 1
P (E) = = .
6 3

Exercise. A bracelet is to be decorated with diamond, ruby, emerald, jade, and pearl beads. What is the
probability that
(1) diamond and pearl are not placed next to each other; and,
(2) ruby, pearl, and ruby are together?

8.5 Examples Involving Permutations


Recall the permutation function P (n, r) defined by
n!
n Pr = .
(n − r)!
This function is used for counting arrangements of r objects from an n-set and is considered as an extension
of the generalized counting principle to experiments wherein the objects to be arranged are not necessarily
the entire set.
Example 8.8. A class has 6 boys and 4 girls. They have to elect 4 officers to become the President,
Vice-President, Secretary, and Treasurer. What is the probability that all 4 officers are boys?
Solution. There are |S| =10 P4 = 5040 possible ways to select the 4 officers from the class of 10 students.
Of these, there are |S| =6 P4 ways that all 4 officers are from the 6 boys. Hence, the probability is
360 1
P (E) = = .
5040 14
Example 8.9. For a password, a user has to enter a four-letter string from the English alphabet. What is
the probability that all four letters are consonants?
Solution. There are |S| =26 P4 = 358800 possible passwords. There are |E| =21 P4 = 143640 ways so that
all of them are consonants. The probability is
1197
P (E) = ≈ 0.4003
2990

Exercise.
(1) Four boys and three girls have to be seated in a row. What is the probability that no two boys sit
together?
(2) Seven cyclists join a race. What is the probability that A, B, and C are in the top 3?

39
8.6 Examples Involving Combinations
n

Recall the combination function r defined by
 
n n!
= .
r r!(n − r)!

This function gives the number of possible ways to select r objects from a set of n, where arrangement does
not matter.

Example 8.10. A group of friends is composed of four boys B1 , B2 , B3 , and B4 and three girls G1 , G2 and
G3 . They received three free tickets to the zoo and should choose among themselves on who should go. What
is the probability that all who will go are boys?

Solution. They have a total of |S| = 72 = 21 choices. There are |E| = 43 = 4 ways to select three boys
 

to go. Hence, the probability that all the ones who will go are boys is
4

3 4
P (E) = 7
 = .
2
21

Example 8.11. Three cards were drawn from the standard deck. What is the probability that all of them
are spades?

Solution. There are |S| = 52 13


 
3 possible ways to draw 3 cards from the standard deck. Of these, |E| = 3
are the possibilities of all 3 cards being spades. Thus,
13

3 39
P (E) = 52
 = .
3
11050

Exercise.

(1) Two cards are to be drawn from the standard deck. What is the probability that both are black cards?

(2) In a 6/42 Lotto draw, six balls are to be drawn from a set of 42 balls numbered 1 through 42. What is
the probability that all the winning entries are even?

8.7 Examples Involving Set Cardinalities


In this section, we use theorems we obtained from Lecture 08. Specifically, we recall that

|A ∪ B| = |A| + |B| − |A ∩ B|.

This has an equivalent probability identity through Theorem 7.3 stating that for two events E and F
(regardless of whether mutually exclusive or not),

P (E ∪ F ) = P (E) + P (F ) − P (E ∩ F ).

Example 8.12. In a class, there are 50 students, of which 27 own a phone and 29 own a tablet. If 5 students
own neither phone nor tablet, what is the probability that a randomly chosen student owns both a phone and
a tablet?

40
Solution. Let E be the event that the student owns a phone and F the event that he or she owns a tablet.
27 29 5
Then, P (E) = 50 and P (F ) = 50 . Also, P (E ∪ F ) = 1 − 50 = 45 9
50 = 10 . This mens that

P (E ∩ F ) = P (E) + P (F ) − P (E ∪ F )
27 29 45
= + −
50 50 50
11
= .
50
Exercise. A school has two basketball teams: the Men’s Team and the Women’s Team playing in separate
leagues. According to the school’s 30-season history of joining both leagues, there was one season where
both teams won championship. The Men’s Team has been champions 4 times and the Women’s team has
been champions 3 times. What is the probability that if a random season is picked, at least one of the teams
was the champion?

8.8 Examples Involving Continuous Sample Spaces (with Uniform


Distributions)
We have delimited what continuous sample spaces and events are being studied in this course, and that
is intervals of real numbers. We have also defined their corresponding sizes simply as the length of such
interval. That is, if S is the interval (a, b) where a ≤ b, then
|S| = b − a.
Note that if a = b, then |S| = 0. This can mean that |[a, b]| = |[a, a]| + |(a, b)| + |[b, b]| = |(a, b)| which means
to say that the size of an open interval (a, b) is the same as the size of the close interval [a, b]
Thus, the probability of an event E = (x, y) in a sample space S = (a, b) is given by
y−x
P (E) = .
b−a
Example 8.13. As the pilot estimates, the plane arrives between 8:20AM and 8:45AM. What is the proba-
bility that the plane arrives later than 8:30AM?
Solution. The sample size is |S| = 45 − 20 = 25. The event size is |E| = 45 − 30 = 15. Hence, the
probability that it arrives later than 8:30AM is
15 3
P (E) =
= .
25 5
Exercise. In the same situation as above, what is the probability that the plane arrives not later than
8:40AM but not earlier than 8:10AM?

8.9 Examples Involving the Use of the Axiomatic Approach


For the problems in this subsection, we will be using the axioms and theorems we established in Lecture 07.
Example 8.14. Let E and F be mutually exclusive events and G mutually exclusive with E but not with F .
If P (E) = P (F ) = P (G) = 0.4 and P (F ∩ G) = 0.2, find P (E ∪ F ∪ G).
Solution. We have,
P (E ∪ F ∪ G) = P (E ∪ (F ∪ G))
= P (E) + P (F ) + P (G) − P (E ∩ F ) − P (E ∩ G) − P (F ∩ G) + P (E ∩ F ∩ G)
= 0.4 + 0.4 + 0.4 − 0 − 0 − 0.2 + 0
=1

41
Example 8.15. The probability that Lance will fail in Algebra is 0.2. The probability that he will fail in
Analysis is 0.1. The probability that he passes both subjects is 0.6. What is the probability that he will pass
at least one of the two subjects?
Solution. Let E be the event that he passes Algebra and F the event that he passes Analysis. Then,
P (E) = 1 − 0.2 = 0.8 and P (F ) = 1 − 0.1 = 0.9. Also, P (E ∩ F ) = 0.7. Hence,

P (E ∪ F ) = P (E) + P (F ) − P (E ∩ F )
= 0.8 + 0.9 − 0.6
= 0.9

Exercise. Let P (E) = 0.1, P (F ) = 0.2, and P (E − F ) = 0.03. What is P (E ∪ F )?


Exercise. If the chance of Mark being chosen to be part of a dance group is 70%, the chance of Marvin
being chosen is 23%, and the chance of both being chosen is 4%, what is the probability that one and only
one of them is chosen?
Exercise. Let E, F, G, H be four pairwise mutually exclusive events such that P (E) = 2P (F ) = 3P (G) =
4P (H). Find P (E ∪ F ∪ G ∪ H).

42
Lecture 09 - Conditional Probability and Independent
Events

Lecture Objectives
(1) Define conditional probability and independence

(2) Compute conditional probability of conditional events

(3) Prove that two events are independent

Sometimes, we need the probability of an event with some information already available. This leads us to a
new important concept called conditional probability. Aside from this, conditional probability also provides
simpler computations for some probability calculation problems.

9.1 Definition of Conditional Probability


We start with the following example.

Example 9.1. Suppose a die is rolled twice. What is the probability that the sum is 8, given that the first
roll yields 2?

Notice that in this example, the event in question: “the sum is 8”, is preceded by a given event “first roll
yields 2”. The sample space S of rolling a die twice consists of 36 equally likely outcomes. However, when
it was given that the the outcome of the first roll was 2, the sample space was reduced to:

S 0 = {(2, 1), (2, 2), (2, 3), (2, 4), (2, 5), (2, 6)}

with only 6 elements.

Solution (Example 9.1). There are |S 0 | = 6 outcomes in the (reduced) sample space. Of these, only 1
1
yields a sum of 8. Hence the probability that the sum is 8 given that the first roll is 2, is .
6
In such example, we are given with two events E (the sum is 8) and F (the first roll is 2), and we are
asked for the conditional probability of E given F , denoted P (E|F ) (read as “conditional probability of E
given F ”). The event F is called the condition.

Definition 9.1. Let E and F be events with P (F ) 6= 0. The conditional probability of E given F is
defined as
P (E ∩ F )
P (E|F ) = .
P (F )

We solve Example 9.1 using the definition of conditional probability.

43
6
Solution. Let E be the event that the sum is 8 and F the event that the first roll is 2. Then, P (F ) = 36 = 16 .
1
Also, P (E ∩ F ) = 36 because only (2, 6) is the outcome which has a some of 8 and a first roll of 2. Hence,

1
1
P (E|F ) = 36 = .
1 6
6

Example 9.2. What is the probability that the sum of two rolls of a die is even given that the first roll is
odd?
Solution. Let E be the event that the sum of the two rolls is even and F the event that the first roll is odd.
Then, F = {(j, k) | j is odd, k is any number from 1 to 6}. Hence, |F | = 3 · 6 = 18 so that P (F ) = 18 1
36 = 2 .
9 1
Also, E ∩ F = {(j, k) | j, k are odd}. Hence, |E ∩ F | = 3 · 3 = 9 so that P (E ∩ F ) = 36 = 4 . Now,

1
1
P (E|F ) = 4 = .
1 2
2

Exercise. According to research, 83.2% of fathers who beat their children were also beaten by their fathers
when they were young. If 18.1% of fathers beat their children, what percent of fathers both beat and were
beaten?
Exercise. If E and F are mutually exclusive and P (F ) 6= 0, what is P (E|F )?
Exercise. In a class of 50, 37 own a phone and 12 own a tablet, while 6 own neither phone nor tablet. If a
random student who owns a tablet is selected, what is the probability that he also owns a phone?
From the definition of the conditional probability, one may yield

P (E ∩ F ) = P (F )P (E|F ) = P (E)P (F |E),

given that P (E) 6= 0 and P (F ) 6= 0. We generalize this observation to any number of events E.
Theorem 9.1 (Multiplication Rule). Let E1 , E2 , . . . , En be events such that P (Ei ) 6= 0 for i = 1, 2, . . . , n.
Then,

P (E1 ∩ E2 ∩ · · · ∩ En ) = P (E1 )P (E2 |E1 )P (E3 |(E1 ∩ E2 )) · · · P (En |(E1 ∩ E2 ∩ · · · ∩ En )).

Proof. Exercise.

9.2 Conditional Probability as a Probability


We have laid out in Lecture 07 the axioms of Probability Theory. Now, a natural question arises: does
conditional probability satisfy the axioms of an ordinary probability? It will also be convenient if we could
establish theorems stating important properties of conditional probability that are similar to those found in
Lecture 07.
Since 0 ≤ P (E ∩ F ) ≤ 1 and 0 < P (F ) ≤ 1, we see that 0 ≤ P (E|F ) ≤ 1. Hence, conditional probability
satisfies the Axiom 1. Next in the case of conditional probability, we reduce the sample space S to S 0 = F .
Now,
P (F ∩ F ) P (F )
P (F |F ) = = = 1.
P (F ) P (F )

44
which implies that conditional probability satisfies Axiom 2. Now if E1 and E2 are mutually exclusive events,
then E1 ∩ F and E2 ∩ F are also mutually exclusive. It follows that,

P ((E1 ∪ E2 ) ∩ F )
P ((E1 ∪ E2 )|F ) =
P (F )
P ((E1 ∩ F ) ∪ (E2 ∩ F ))
=
P (F )
P (E1 ∩ F ) + P (E2 ∩ F )
=
P (F )
P (E1 ∩ F ) P (E2 ∩ F )
= +
P (F ) P (F )
= P (E1 |F ) + P (E2 |F )

Hence, conditional probability satisfies Axiom 3. Therefore, conditional probability is a probability.


We now investigate properties of conditional probability comparable to those of the theorems in Lecture
07. First, we show the handy formula for the conditional probability of a complement.

Theorem 9.2. For any events E and F , with P (F ) 6= 0,

P (E c |F ) = 1 − P (E|F ).

Proof. We know that F = (E ∩ F ) ∪ (E c ∩ F ) and that E ∩ F and E c ∩ F are mutually exclusive. Thus,

1 = P (F |F )
P ((E ∩ F ) ∪ (E c ∩ F ))
=
P (F )
P (E ∩ F ) P (E c ∩ F )
= +
P (F ) P (F )
c
= P (E|F ) + P (E |F ).

Therefore, P (E c |F ) = 1 − P (E|F ).

Next, we show that conditional probability preserves order.

Theorem 9.3. Let E, F, G be events such that E ⊆ G and P (F ) 6= 0. Then

P (E|F ) ≤ P (G|F ).

Proof. If E ⊆ G, then E ∩ F ⊆ F ∩ G. Hence, P (E ∩ F ) ≤ P (G ∩ F ). This means that

P (E ∩ F ) P (G ∩ F )

P (F ) P (F )
P (E|F ) ≤ P (G|F ).

Lastly, we provide the following useful theorem.

Theorem 9.4. Let E, G, F be events with P (F ) 6= 0. Then,

P ((E ∪ G)|F ) = P (E|F ) + P (G|F ) − P ((E ∩ G)|F ).

45
Proof. We have,

P ((E ∪ G) ∩ F )
P ((E ∪ G)|F ) =
P (F )
P ((E ∩ F ) ∪ (G ∩ F ))
=
P (F )
P (E ∩ F ) + P (G ∩ F ) − P (E ∩ F ∩ G ∩ F )
=
P (F )
P (E ∩ F ) P (G ∩ F ) P ((E ∩ G) ∩ F )
= + −
P (F ) P (F ) P (F )
= P (E|F ) + P (G|F ) − P ((E ∩ G)|F ).

Exercise. Let E, G, F be events such that P (G) 6= 0. Prove that

P (E|F ) = P ((E ∩ G)|F ) + P ((E ∩ Gc )|F ).

Exercise. Suppose two dice are rolled. Let E be the event that the sum is 6 and F the event that the first
die is even. Find

(1) P (E|F )

(2) P (F |E)

9.3 Independent Events


In the previous section, we have observed that the probability of the event E is “affected” by the condition
F . In some cases, the probability of E remains invariant regardless of the occurrence or non-occurrence of
F . That is,
P (E|F ) = P (E).
If this is the case, we say that E and F are independent events. By definition of P (E|F ), we obtain the
equation
P (E ∩ F )
= P (E).
P (F )
This yields to the following definition of independent events.

Definition 9.2. Two events E and F are said to be independent if and only if

P (E ∩ F ) = P (E)P (F ).

Intuitively, independent events do not affect the probability of each other. For example, the gender of a
baby born from a mother this year does not affect the gender of the baby on her next pregnancy. Also, the
outcome of tossing a coin does not affect the outcome of the next toss.
Independent events should also be not confused with mutually exclusive events. Mutually exclusive events
are events that do not have a common outcome.

Example 9.3. Suppose a die is rolled twice. Let E be the event that the first roll is even and F the event
that the second roll is odd. Show that E and F are independent.

46
Solution. We know that |E| = 3 · 6 = 18 so that P (E) = 12 and |F | = 6 · 3 = 18 so that P (F ) = 21 . Also,
|E ∩ F | = 3 · 3 = 9 so that P (E ∩ F ) = 14 . Now,
  
1 1 1
P (E)P (F ) = = = P (E ∩ F ).
2 2 4

Therefore, E and F are independent.


Two events that are not independent are dependent. In other words, E and F are dependent if and
only if
P (E|F ) 6= P (E)
or that
P (E ∩ F ) 6= P (E)P (F ).

Exercise. Suppose four coins are tossed. Let E be the event that all coins land on heads and let F be the
event that the first coin is heads. Show that E and F are dependent.
Exercise. Suppose two dice are rolled. Verify if the following pairs of events are independent.
(1) “sum is 6” and “the first die is even”
(2) “sum is 6” and “the first die is 3”
(3) “sum is 7” and “the first die is even”
Independence can also be extended for three or more events as the next definition of multiple independence
states.
Definition 9.3. Let E, F, G be events. Then E, F, G are independent iff
(i) P (E ∩ F ∩ G) = P (E)P (F )P (G)
(ii) P (E ∩ F ) = P (E)P (F )
(iii) P (E ∩ G) = P (E)P (G)
(iv) P (F ∩ G) = P (F )P (G).
Exercise. Suppose a die is rolled thrice. Show that the events “first is odd”, “second is even”, and “third
is greater than 4” are independent.

9.4 Odds* (optional topic)


In practice, the odds of an event is sometimes used as an alternative to probability. It is defined simply as
the ratio of the probability that an event occurs to the probability that it does not occur. Formally, we give
the following definition.
Definition 9.4. Let E be an event. The odds of E is defined as

P (E)
O(E) = .
P (E c )

Since P (E c ) = 1 − P (E), one may have

P (E)
O(E) = .
1 − P (E)

47
Example 9.4. The probability that our basketball team wins tonight is 0.8. Find our odds of winning.
Solution. Let E be the event of winning tonight. Then P (E) = 0.8 and so,
0.8
O(E) =
1 − 0.8
= 2.

Exercise. Suppose four coins are tossed. What are the odds of getting at most three heads?
Exercise. Analyze the definition of odds to answer the following questions.

(1) Is odds a probability?


(2) How do we interpret O(E) = 1, O(E) > 1, and O(E) < 1?
Exercise. The odds that it will rain today is 3. What is the probability that it will rain today?

48
Lecture 10 - Bayes’ Rule

Lecture Objectives
(1) Derive Bayes’ formula

(2) Apply Bayes’ rule to solve relevant problems

Bayes’ Rule is an important formula used in many practical applications. It deals with the relationship of a
given event to a family of events which partition the sample space.

10.1 Basic Formulations


Let S be a sample space. A partition of S is a family B of events B1 , B2 , . . . , Bn such that

(i) Bi 6= ∅ for i = 1, 2, . . . , n;

(ii) Bi ∩ Bj = ∅ for i 6= j; and,

(iii) B1 ∪ B2 ∪ · · · ∪ Bn = S.

We start with the simplest case B = {B, B c } for some B such that P (B) 6= 0.

Theorem 10.1. For any events A and B with P (B) 6= 0, we have

P (A) = P (A|B)P (B) + P (A|B c )P (B c ).

Proof. For any event A, we know that

A = (A ∩ B) ∪ (A ∩ B c )

and that A ∩ B and A ∩ B c are mutually exclusive. Hence,

P (A) = P (A ∩ B) + P (A ∩ B c ).

By multiplication rule, it follows that

P (A) = P (A|B)P (B) + P (A|B c )P (B c ).

Now, we apply this theorem to some practical problems.

Example 10.1. A certain drug test procedure can positively identify a drug user at a rate of 99.8%. It can
also successfully return negative at a rate of 98.7% if the person is not a drug user. In a certain community,
it is estimated that 5% of the residents are drug users. If a random drug testing is conducted for a certain
resident, what is the probability that the test will return positive?

49
Solution. Let A be the event that the test returns positive and B the event that the person is a drug user.
Then,

P (B) = 0.05
P (B c ) = 0.95
P (A|B) = 0.998
P (A|B c ) = 1 − P (Ac |B c )
= 1 − 0.987
= 0.013

Now,

P (A) = (0.998)(0.05) + (0.013)(0.95)


= 0.06225.

Hence, the probability that the test will return positive to a randomly selected person is 0.06225 (or 6.23%).
Example 10.2. It was found in a research there are a very small population of left-handed individuals.
Around 7.2% of males and 4.0% of females are left-handed. Also, it is known that male and female populations
are (approximately) equal. What percentage of people in the world are left-handed?
Solution. Let A be the event that the person is left-handed and B the event that he is a male. Since male
and female populations are (approximately) equal, P (B) = P (B c ) = 0.5. From the problem,

P (A|B) = 0.072
P (A|B c ) = 0.040.

Hence,

P (A) = (0.072)(0.5) + (0.040)(0.5)


= 0.056.

Hence, around 5.6% of the people in the world are left-handed.


Since P (A ∩ B) = P (A|B)P (B) and P (B ∩ A) = P (B|A)B(A). This gives us the equation

P (A|B)P (B) = P (B|A)P (A).

Hence,
P (A|B)P (B)
P (B|A) = .
P (A)
This equation can be used to reverse the conditional probability from P (A|B) to P (B|A).
Example 10.3. 22% of males who took a college entrance exam passed. In the exam, 67% were males and
the passing rate is 25%. What percent of passing applicants were females?
Solution. Let A be the event that the applicant passes the exam and B be the event that the applicant is
male. Then,

P (A) = 0.25
P (B) = 0.67
B(A|B) = 0.22

50
Then, the probability that the passer is a male is,
(0.22)(0.67)
P (B|A) = = 0.5896.
(0.25)
Hence, the probability that the passer is a female is
1 − 0.5896 = 0.4104.
In other words, 41.04% of the passers were females.
Exercise. 20% of the employees of a company are managers. 90% of these managers have an insurance.
Also, it is known that in the company, 60% of those who do not have an insurance are not managers. What
is the proportion of the employees have insurance in the company?

10.2 Bayes’ Formula


We will generalize the equation
P (A|B)P (B) P (A|B)P (B)
P (B|A) = =
P (A) P (A|B)P (B) + P (A|B c )P (B c )
for any partition B = {B1 , B2 , . . . , Bn }.
Theorem 10.2 (Bayes’ Formula). Let B = {B1 , B2 , . . . , Bn } partition the sample space. with P (Bi ) > 0
for i = 1, 2, . . . , n. Then,
P (A|Bk )P (Bk )
P (Bk |A) = Pn .
i=1 P (A|Bi )P (Bi )
Proof. Exercise.
Example 10.4. The Roman Senate was composed of 124 senators. In the reign of Augustus Caesar, 64
senators were aged 41 to 50 years, 30 were aged 51 to 60, 24 were aged 61 to 70, and the rest were aged 71
to 80 years. In each of these age brackets, half voted to invade Gaul (modern day France), except for the
eldest bracket who all voted yes. From those who voted for the invasion, Nero chooses a General to plan the
attack. What is the probability that the General is aged 51 to 60?
Solution. Let A be the event that a senator votes for the invasion of Gaul.
age bracket proportion
B1 - 41 to 50 64
B2 - 51 to 60 30
B3 - 61 to 70 24
B4 - 71 to 80 6
Then, P (B1 ) = 64/124 = 16/31, P (B2 ) = 30/124 = 15/62, P (B3 ) = 24/124 = 6/31, P (B4 ) = 6/124 = 3/62.
The probability that the general is aged 51 to 60 given that he voted for the attack is
P (A|B3 )P (B3 )
P (B3 |A) =
P (A|B1 )P (B1 ) + P (A|B2 )P (B2 ) + P (A|B3 )P (B3 ) + P (A|B4 )P (B4 )
1
 6
2 31 
= 1  16  1
 15 1 6
 3

2 31 + 2 62 + 2 31 + (1) 62
12
= .
65

Exercise. In a tournament, Germany sent 20 players; France sent 40 players, and England sent 60 players.
Exactly 10 players from each country have played this tournament last year. If the player who becomes the
champion played last year, what is the probability that he is not German?

51
52
Chapter 2

Random Variables, Distribution


Functions, and Expectations

One weakness of our current approach to probability is that the domain of our probability function P (·) is
a family of sets — and not of real numbers. That is, there is a need to count the size of events and sample
spaces in order to obtain probability values. In this chapter, we introduce the notion of random variables.
In this way, we approach probability theory in a way that the domain of our probability function (which we
will later call probability distribution function or probability density function) is a set of real numbers.
Our primary objective in this chapter is to understand random variables, distribution and cumulative
distribution functions, and expected values.

Intended Learning Outcomes


(CO 2.1) Define a random variable and explain its usefulness in computing probabilities of events
(CO 2.2) Enumerate the properties of a cumulative distribution function and probability distribution
function
(CO 2.3) Define the probability distribution function from the cumulative distribution function and vice
versa

Lectures
Lecture 11. The Notion of a Random Variable
Lecture 12. Probability Distribution Function

Lecture 13. Mathematical Expectation


Lecture 11 - The Notion of a Random Variable

Lecture Objectives
(1) Define random variables

(2) Classify discrete and continuous random variables

(3) Transform sample spaces and events into random variables

We prefer to calculate probability values directly as a function of some real number. This is not possible in
our previous treatment of probability where in for P (E), E is not a number but instead, a set. Our objective
for this lecture is to understand that through random variables, events will be represented by real numbers.

11.1 Definition of a Random Variable


We define a random variable as follows.

Definition 11.1. Let S be a sample space of some experiment. A random variable over S is a mapping

X : S → R.

For a real number a, the set X −1 (a) of all pre-images of a represents a unique event included in S.
In this manner, the event X −1 (a) is being represented by the real number a. Furthermore, the family
{X −1 (a)|X −1 (a) 6= ∅} form a partition of S.
Let us consider, for example, the experiment of rolling a die twice. The sample space is given by

S = {(1, 1), (1, 2), (1, 3), . . . , (6, 6)}.

Suppose we are interested with the probability that the sum of the two rolls is k. In this case we can define
X as the sum of the two rolls. That is,

X(1, 1) = 2
X(1, 2) = 3
X(1, 3) = 4
..
.
X(6, 6) = 12

The inverse image X −1 (k) gives the set of all sample points which give a sum of k, i.e., the event “the sum

54
is k”. In particular,
X −1 (2) = {(1, 1)}
X −1 (3) = {(1, 2), (2, 1)}
X −1 (4) = {(1, 3), (2, 2), (3, 1)}
X −1 (5) = {(1, 4), (2, 3), (3, 2), (4, 1)}
X −1 (6) = {(1, 5), (2, 4), (3, 3), (4, 2), (5, 1)}
X −1 (7) = {(1, 6), (2, 5), (3, 4), (4, 3), (5, 2), (6, 1)}
X −1 (8) = {(2, 6), (3, 5), (4, 4), (5, 3), (6, 2)}
X −1 (9) = {(3, 6), (4, 5), (5, 4), (6, 3)}
X −1 (10) = {(4, 6), (5, 5), (6, 4)}
X −1 (11) = {(5, 6), (6, 5)}
X −1 (12) = {(6, 6)}
Further, the event X −1 (k) is usually denoted by
X = k.
For example, X = 5 means the event {(1, 4), (2, 3), (3, 2), (4, 1)}. (We use X −1 (k) when we mean it as the
set and we use X = k when we mean the event.) Note further that when k 6= j, then X −1 (k) ∩ X −1 (j) = ∅
which means that the events X = k and X = j are mutually exclusive.
The range of X, denoted Ran(X), defined as the set of all k ∈ R such that X −1 (a) 6= ∅, also plays an
important role in the study of random variables. In the above example,
Ran(X) = {2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12}.
Example 11.1. Suppose a coin is tossed thrice. Define the random variable X as the number of heads in
the outcome. Map each sample point under X. Identify Ran(X) and give X −1 (k) for all k ∈ Ran(X).
Solution. We have
X(HHH) = 3 X(HHT ) = 2
X(HT H) = 2 X(HT T ) = 1
X(T HH) = 2 X(T HT ) = 1
X(T T H) = 1 X(T T T ) = 0
We also find that
Ran(X) = {0, 1, 2, 3}.
And,
X −1 (0) = {T T T }
X −1 (1) = {HT T, T HT, T T H}
X −1 (2) = {HHT, HT H, T HH}
X −1 (3) = {HHH}
Exercise. Suppose a coin is tossed twice. Define a random variable X as the number of tails in the outcome.
Map each sample point under X. Identify Ran(X) and give X −1 (k) for all k ∈ R(X).
Exercise. Suppose a die is rolled once. Define X to be 1 if the outcome is odd and 0 if the outcome is even.
Give all that were asked in the previous exercise.
Exercise. Suppose a die is rolled thrice.

55
11.2 Classification of Random Variables
In the previous chapter we have dealt with two kinds of sample spaces: discrete and continuous. The same
will be true for random variables. In general, random variables may be classified as discrete or continuous
and are defined in a very similar fashion as for sample spaces.
Although majority of our examples are for the discrete case, it can be realized later that continuous
random variables are more practical in terms of applications, and are sometimes even easier to deal with
theoretically. This is made possible by calculus. In fact, due to the lack of or the difficulty of using tech-
niques to manipulate integers, some discrete random variables are being approximated into their continuous
counterparts.
Since we are more familiar with discrete random variables, we define them first.

Definition 11.2. A random variable X is discrete if Ran(X) is countable.

This means that all the examples we have mentioned in the previous section of this example are discrete.
Other examples of discrete random variables are:

(1) number of puppies a mother dog may give birth to in a single whelping;

(2) population of a city

(3) age (measured in number of years) of a person

(4) score in a basketball game

Now consider the following example. Let X be the length of time that Jam spends in waiting for his turn
to be served in a fast food restaurant, measured from the instance he entered the store to the moment he
received his order. If we measure X in minutes, then X could possibly have decimal values. In fact, X can
have any real number value from zero to +∞. In other words,

Ran(X) = {x ∈ R | x ≥ 0} = [0, +∞).

Clearly, in this case Ran(X) is uncountable. It follows that X is not discrete.

Definition 11.3. A random variable X is continuous if Ran(X) is uncountable.

Examples of continuous random variables are:

(1) weight of a newborn puppy

(2) land area of a city

(3) life time of a person

(4) length recorded by an athlete’s running long jump

Exercise. Determine whether the random variable is discrete or continuous

(1) number of typhoons that enter the Philippine Area of Responsibility in a year

(2) number of days that a typhoon stays in the country

(3) length of time that the typhoon stays in the country

(4) amount of honey produced in a single colony

(5) bee population in the colony

56
11.3 Probability in Terms of a Random Variable
The set X −1 (k) is the event of all outcomes such that X = k. The probability of this event may be denoted
as P (X = k).
Example 11.2. Suppose two dice are rolled. Define X as the sum of the two dice. Find P (X = k) for all
k ∈ R(X).
Solution. We have already seen from the previous section that Ran(X) = {2, 3, . . . , 12}. Now

1
P (X = 2) =
36
2 1
P (X = 3) = =
36 18
3 1
P (X = 4) = =
36 12
4 1
P (X = 5) = =
36 9
5
P (X = 6) =
36
6 1
P (X = 7) = =
36 6
5
P (X = 8) =
36
4 1
P (X = 9) = =
36 9
3 1
P (X = 10) = =
36 12
2 1
P (X = 11) = =
36 18
1
P (X = 12) =
36
Exercise. Find P (X = k) for each k ∈ Ran(X) in Example 11.1.
Example. Suppose a coin is tossed four times. What is the probability that there are:
(a) exactly two heads;
(b) no heads;
(c) at least two heads;
(d) not more than three heads;
(e) more than one head.
Solution.Let X be the number of heads. Then,
X −1 (0) = {TTTT}
X −1 (1) = {TTTH, TTHT, THTT, HTTT}
X −1 (2) = {TTHH, THTH, THTH, THHT, HTHT, HHTT}
X −1 (3) = {THHH, HTHH, HHTH, HHHT}
X −1 (4) = {HHHH}

57
Then,
1
P (X = 0) =
16
4 1
P (X = 1) = =
16 4
6 3
P (X = 2) = =
16 8
4 1
P (X = 3) = =
16 4
1
P (X = 4) =
16
Hence,
(a)
P (exactly two heads) = P (X = 2)
3
= .
8
(b)
P (no heads) = P (X = 0)
1
= .
16
(c)
P (at least two heads) = P (X ≥ 2)
= P (X = 2) + P (X = 3) + P (X = 4)
6 4 1
= + +
16 16 16
11
= .
16
(d)
P (not more than three heads) = P (X ≤ 3)
= P (X = 0) + P (X = 1) + P (X = 2) + P (X = 3)
1
=
16
(e)
P (more than one head) = P (X > 1)
= P (X = 2) + P (X = 3) + P (X = 4)
11
=
16
Exercise. Two dice are rolled. Let X be the sum of the two dice. Using random variables, what is the
probability that
(a) the outcome is even;
(b) the outcome is at least 10;
(c) the outcome is not a multiple of three and greater than 7?

58
Lecture 12 - Distributions

Lecture Objectives
(1) Define probability distribution function and cumulative distribution function

(2) State and prove some properties of distribution functions

(3) Calculate probabilities using the probability distribution functions and cumulative distribution functions
of random variables

Now, we will put random variables into work — the distribution function. The distribution function works
for random variables in the same way as probability works for events. Through distributions, probability is
directly map from (the value of) the random variable (and not from an event).

12.1 The Distribution Function


So far we have already defined random variables and how they work and help in establishing a “new sample
space” Ran(X) and events X = k. We have also calculated probabilities in terms of random variables.
The probability distribution function (pdf) p of a random variable X is a function R → [0, 1] such
that if E is an event, then P (E) = p(X(E)). In other words, p is an induced probability function by X.
(Take notice that we use the small letter p for pdf and capital P for probability.) When there is a need to
emphasize that X is the random variable which induced p, we will write pX . There is, however, a need to
separately define pdf’s for discrete and continuous rv’s. This shall be tackled in the next two sections.

12.2 Probability Mass Function (Discrete Random Variables)


Let X be a discrete random variable. Then the distribution p induced by X is called the probability mass
function defined simply as that
p(k) = P (X = k).
One may think of the pmf p(k) simply as a shorter notation for P (X = k). It is understood that when
a∈
/ Ran(X), X = a is empty so that p(a) = 0. This will come in handy in later lectures.

Example 12.1. Let X be the number of tails in tossing a coin 4 times. Then, find the probability in terms
of pmf ’s that (a) there are exactly two heads and (b) there are an even number of heads.
3 1
Solution. (a) p(2) = , (b) p(0) + p(2) + p(4) = .
8 2
Exercise. Let X be a discrete random variable with Ran(X) = {1, 2, 3, 4, 5} and define p(k) = 1/5 for
k = 1, 2, 3, 4, 5. Find the probability that (a) X is even, (b) X ≥ 3.

A distribution function, when explicitly defined, may add convenience in the computation of probability
values. This is because pdf’s may serve as “formulas” in calculating probability. For example, in the

59
experiment of tossing a coin n times, and if X is defined as the number of heads, then
 
n 1
p(k) = .
k 2n
(The derivation of this formula will be dealt with more emphasis in Lecture 14.) With this, it will be easier
to answer problems like, “what is the probability that there will be 2 heads if n = 3?”
 
3 1
p(2) =
2 23
3
=
8
For another example, if X is the sum in rolling two dice, then
6 − |7 − k|
p(k) = .
36
There are many other examples which can be presented in the next chapter.
The first and most basic property of distribution functions comes from the fact that for an experiment
with sample space S, P (S) = 1.
Theorem 12.1 (Normalization Property for Discrete Random Variables). Let X be a discrete random
variable with pdf p(k). Then, X
p(k) = 1.
k∈R
P P
Proof. Since p(k) = 0 for k ∈
/ Ran(X), k∈R p(k) = k∈Ran(X) p(k). Now, we know that for each j, k ∈
Ran(X) with j 6= k, X = j and X = k are mutually exclusive. Thus,
X
p(k) = p(k1 ) + p(k2 ) + · · ·
k∈Ran(X)

= P (X = k1 ) + P (X = k2 ) + · · ·
= P (X −1 (k1 ) ∪ X −1 (k2 ) ∪ · · · )
= P (S)
= 1,
where k1 , k2 , . . . ∈ Ran(X).
This property may be applied, for example, in the following problem.
k−1
Example 12.2. Let X be a discrete random variable with Ran(X) = {1, 2, 3, 4, 5}. Find c if p(k) =
c
for k = 1, 2, 3, 4, 5.
Solution. Since we must have X
p(k) = 1,
k∈Ran(X)

then, we have
1 = p(1) + p(2) + p(3) + p(4) + p(5)
1−1 2−1 3−1 4−1 5−1
= + + + +
c c c c c
1 2 3 4
= + + +
c c c c
10
1=
c

60
Thus, c = 10.
Exercise. Let X be a discrete random variable with Ran(X) = {−2, −1, 0, 1, 2} with p(k) = ck 2 . Find
p(2).

Exercise. Suppose X is a discrete random variable with possible values Ran(X) = {1, 2, 3}. Is X well-
5
defined if p(k) = for k ∈ Ran(X)?
4 + k2

12.3 Probability Density Function (Continuous Random Variables)


We now give the definition of distribution functions for continuous random variables. First, we have to
understand that the probability of a continuous random variable for a specific point is zero. That is, if X
is continous, the probability that X = k for some k ∈ R is zero, i.e., P (X = k) = 0. Instead, we define
probability for continuous random variables for intervals. For example, it makes more sense to ask the
probability of a < X < b for a, b ∈ R than to ask for the probability of X = k.

Definition 12.1. Let X be a continuous random variable. Then, the probability distribution function
f of X satisfies
Z b
P (a < X < b) = f (x)dx.
a
Ra
The definition coincides with the earlier remark that P (X = a) = P (a ≤ X ≤ a) = a
f (x)dx = 0.
Exercise. Let
 2
x
if 0 ≤ x ≤ 1
f (x) = 3
0 if otherwise.

be the associated pdf of a continuous random variable X. Find

(1) P (0 ≤ X ≤ 0.2)

(2) P (0.5 ≤ X ≤ 1.5)

(3) P (−1 ≤ X ≤ 2)

Next, we give an equivalent of Theorem 12.1 for the case of continuous random variables.

Theorem 12.2 (Normalization Property of Continuous Random Variables). Let X be a continuous random
variable with pdf f (x). Then,
Z +∞
f (x)dx = 1.
−∞

Proof. The interval (−∞, ∞) corresponds the sample space S. Hence, 1 = P (S) = P (−∞ < X < +∞) =
R +∞
−∞
f (x)dx.

It is important to recall the following theorem from Calculus:

Theorem 12.3. Let f be integrable on an interval (a, b). If c ∈ (a, b), then
Z b Z c Z b
f (x)dx = f (x)dx + f (x)dx.
a a c

61
Example 12.3. Let X be a continuous random variable with pdf
(
0 if x < 0
f (x) = −x
.
ce if otherwise

Find c.

Solution. We have
Z +∞
1= f (c)dx
−∞
Z +∞
= ce−x dx
0
−x ∞
 
= c −e 0
=c

Exercise. Find c (and d) of the following pdf’s:

(1)  (2)
0
 if x < 0 ( 2
f (x) = cx + 3 if 0 ≤ x ≤ 10 cx − dx if x < 0
f (x) = c
x if x ≥ 0

0 if x > 10

d

12.4 The Cumulative Distribution Function for Discrete RV’s


We have seen from the last two sections how pdf’s work in calculating probability. The cumulative distribution
function simplifies this problem. From this point forward, we will use the term probability distribution
function or pdf, taking into faith that the reader already understands whether it is a mass or a density
function.

Definition 12.2. Let X be a discrete rv with pdf p(k). Then, the cumulative distribution function
F (k) of X is defined by
Xa
F (a) = p(k).
k=−∞

Since p(k) = 0 for k ∈


/ Ran(X), we can reduce the summation above. That is, if k0 = min Ran(X), then
a
X
F (a) = p(k)
k=k0

where k0 = min Ran(X).


k−1
Example 12.4. Find the cdf of X with Ran(X) = {1, 2, 3, 4, 5} and p(k) = for k ∈ Ran(X).
10
Solution. We have
3
p(1) = 0 p(4) = 10
1
p(2) = 10 p(5) = 52
p(3) = 15

62
Thus,

F (1) = 0
1 1
F (2) = 0 + =
10 10
1 1 3
F (3) = + =
10 5 10
3 3 3
F (4) = + =
10 10 5
3 2
F (5) = + = 1
5 5

Or by definition, we may also have:


a
X k−1
F (a) =
10
k=1
a a
1 X 1 X
= k− 1
10 10
k=1 k=1
a(a + 1) a
= −
20 10
a(a − 1)
=
20

for a = 1, 2, 3, 4, 5.

Exercise. Suppose a coin is tossed 4 times. Let X be the number of heads. Find the cdf of X.

One of the central lessons expected for this chapter is to obtain the cdf of a random variable given its
pdf, and vice versa.

Example 12.5. Calculate the constants in the given distribution. Then, find the cdf (express in terms of
a).

(1) p(k) = ck 2 + k with Ran(X) = {1, 2, . . . , 10}

(2)

As a direct consequence of the normalization property, the following theorem is obtained.

Theorem 12.4. Let X be a discrete rv with pdf p(k) and cdf F (a). Then,

lim F (a) = 1.
a→∞

We leave the above theorem without proof. In the case of finite discrete rv’s, we only need to look at the
maximum value in Ran(X) to illustrate this theorem. That is, if k 0 = max(Ran(X)), then F (k 0 ) = 1.
Since p(k) ≥ 0 for any k ∈ Ran(X), we know that F (a) does not ever decrease in value whatever p(k) is.
The following theorem formalizes this property.

Theorem 12.5. Let X be a discrete rv with cdf F (a). Then, a1 > a2 , F (a1 ) ≥ F (a2 ).

63
12.5 The Cumulative Distribution Function for Continuous RV’s
***
Definition 12.3. Let X be a continuous random variable with pdf f (x). Then the cumulative distribution
function of X is defined as Z a
F (a) = f (x)dx
−∞
for a ∈ R.
In other words,
F (a) = P (X ≤ a).
There are books which establish definitions of cdf’s for discrete rv’s. However, since discrete rv pdf’s are
already capable of giving probability easily and quickly, we will not anymore define them here.
Example 12.6. Let X be a continuous rv with pdf
(
0 if x < 0
f (x) = .
e−x if otherwise

Find F (ln 10).


Solution. We have,
Z ln 10
F (ln 10) = f (x)dx
−∞
Z ln 10
= e−x dx
0
ln 10
= −e−x 0


9
=
10
Exercise. Let X be a continuous rv with pdf
(
0 if x ≤ 0,
f (x) = .
cx2 if otherwise

(a) Solve for c and (b) find F (4) and F (3); (c) find P (3 ≤ X ≤ 4).
Theorem 12.6. Let X be a continuous rv with pdf f (x). Then,
(i) if a < b, then F (a) < F (b); in other words, F is monotone nondecreasing; and,
(ii) lim F (a) = 0 and lim F (a) = 1.
a→−∞ a→∞

Proof. Let X be a continuous rv with pdf f (x).


(i) If a < b, then
Z b
F (b) = f (x)dx
−∞
Z a Z b
= f (x)dx + f (x)dx
−∞ a
= F (a) + P (a ≤ X ≤ b)
> F (a).

64
(ii) We have,
Z a
lim F (a) = lim f (x)dx
a→−∞ a→−∞ −∞
= 0;

and,
Z a
lim F (a) = lim f (x)dx
a→∞ a→∞ −∞
= 1.

When the value F (a) for varying a is needed in a problem, it might be more convenient to derive F (a)
out from a given f (x), or vice versa. We first deal with obtaining F (a) given f (x). Suppose we are given
with a pdf f (x) for a continuous rv X. The, by definition,
Z a
F (a) = f (x)dx.
−∞

Fram Calculus, we know that Z a


F (a) = lim f (x)dx.
y→−∞ y

Now if g(x) is a function such that g 0 (x) = f (x), then

F (a) = g(a) − lim g(y).


y→−∞

Example 12.7. Let X be a continuous rv with pdf


(
e−x if x > 0
f (x) = .
0 if x ≤ 0

Find F (a).
Solution. We have
Z Z
e−x dx = − e−x (−dx)

= −e−x + C.

Since f (x) = 0 for x ≤ 0. Hence, we only consider the domain interval (0, a). Thus,

F (a) = −e−a − (−e−0 ) = 1 − e−a for a > 0.

Exercise. Find F (a) for the following pdf’s:


 2
x
if 0 ≤ x ≤ 1
(a) f (x) = 3 .
0 if otherwise.
( 2
cx − dx if x < 0
(b) f (x) = c .
x if x ≥ 0
d

65
Now we study the reverse process of obtaining the pdf f (x) of a given cdf F (a). Since

F (a) = g(a) − lim g(y),


y→−∞

we can take the derivative of both sides with respect to a:

F 0 (a) = g 0 (a).

Hence, f (x) = F 0 (x).

Example 12.8. Let X be a continuous rv with cdf F (a) = 1 − e−a for a > 0. Find its pdf.
Solution. We have
d
f (x) = (1 − e−x ) = e−x
dx

66
Lecture 13 - Expectation

Lecture Objectives

(1) Define expectation.

(2) Calculate expected values and variances of given distributions.

(3) State and prove some properties of expectations, expected values,


and variances.

The central goal of this course is to provide theoretical foundation


to statistical methods. It is understood that Probability is a course
in mathematical statistics whose objective is to provide the mathe-
matical theory behind the development of different statistical tools.
Thus, we have already introduced distributions. In the next chapter,
we will study special distributions which play a vital role in statisti-
cal tools development. For this lecture, we will look at mathematical
expectation which serves as “measurements” of the distribution.

13.1 Expected Value - Discrete Random Variables

We start with the definition of expected value.


Definition 13.1. Let X be a discrete rv with pdf p. Then the
expected value E[X] of X is defined by
X
E[X] = kp(k).
k∈Ran(X)

67
The expected value E[X] is sometimes called the theoretical
mean and may be denoted by µ. To calculate E[X], see the follow-
ing example.
Example 13.1. Let X be a discrete rv with pdf
k
p(k) =
10
with Ran(X) = {1, 2, 3, 4}. (Verify if p(k) is well-defined.) Find
E[X].
Solution. (We leave verifying p(k) is well-defined.) We write the
table:
k p(k) kp(k)
1 1
1
10 10
2 4
2
10 10
3 9
3
10 10
4 16
4
10 10
30
Total =3
10
Therefore, E[X] = 3.
Exercise. Let X be a discrete rv with Ran(X) = {0, 1, 2, 3} and
c(k + 1)
pdf p(k) = . Find c and E[X].
5
Exercise.
(a) Let X be the sum in rolling two dice. Find E[X].
(b) Suppose 4 coins are tossed and let X be the number of heads.
Find E[X].
68
Now let us turn our attention to expected values of functions of
random variables. A random variable may also be placed inside some
function φ, with the property that P (X = k) = P (φ(X) = φ(k)).
That is, if X is an rv, we may define Y = φ(X). For example, if
X is the outcome of rolling a die, and Y is defined by Y = 2X,
then we know that Ran(Y ) = {2, 4, 6, 8, 10, 12} because Ran(X) =
{1, 2, 3, 4, 5, 6}.
Example 13.2. Let X be a discrete rv with Ran(X) = {1, 2, 3, 4}
and p(1) = 0.1, p(2) = 0.2, p(3) = 0.3, p(4) = 0.4. Let Y = X 2.
Find E[Y ].
Solution. In the above example, we are asked to find the expected
value of Y = X 2 with range Ran(Y ) = {1, 4, 9, 16}. What we have
to understand is that the event X = k is being mapped to the event
Y = k 2. Since the function must preserve probability, we must have
pX (k) = pY (k 2). That is, pX (1) = pY (1) = 0.1, pX (2) = pY (4) =
0.2, pX (3) = pY (9) = 0.3, pX (4) = pY (16) = 0.4. Now,
k pY (k) kpY (k)
1 0.1 0.1
4 0.2 0.8
9 0.3 2.7
16 0.4 6.4
Total 10
We can generalize the next theorem from the above example.
Theorem 13.1. The expected value of a function of a random
variable may also be calculated by
X
E[φ(X)] = φ(k)p(k).
k∈Ran(X)

69
Proof.

Example 13.3. In rolling a die, let X be the outcome. Find


E[X] and E[2X].
Solution. (We will only show for E[2X]). We have
k 2k p(k) 2kp(k)
1 2
1 2
6 6
1 4
2 4
6 6
1 6
3 6
6 6
1 8
4 8
6 6
1 10
5 10
6 6
1 12
6 12
6 6
Total 7
Hence, E[2X] = 7.

Exercise. Find (i) E[2X −3] and (ii) E[X 2 −X] from the example
above.

We now proceed to present some properties of expected values for


discrete rv’s.
Theorem 13.2. Let X be a discrete rv. Then, for any a, b ∈ R,
E[aX + b] = aE[X] + b.
70
Proof. We have
X
E[aX + b] = (ak + b)p(k)
k∈Ran(X)
X X
= akp(k) + bp(k)
k∈Ran(X) k∈Ran(X)
X X
=a kp(k) + b p(k)
k∈Ran(X) k∈Ran(X)

= aE[X] + b

We can consider the results of Example 13.3 and part (i) of the
last exercise to verify this theorem. In particular, the above theorem
states that E[aX] = aE[X]. Also, it states the exercise below.

Exercise. Show that E[c] = c, where c ∈ R is a constant.

Exercise. Show that E[φ1(X) + φ2(X)] = E[φ1(X)] + E[φ2(X)]


for any discrete rv X.

Sometimes, we call E[X n] the nth moment of X. Hence, we


call the mean µ = E[X] as the first moment, E[X 2] as the second
moment, and so on. Next, we will define variance.
Definition 13.2. Let X be a discrete rv. Then Var(X)
p = E[(X−
µ)2] is called the variance of X. Further, σ = V ar(X) is
called the standard deviation of X, so that V ar(X) may also
denoted by σ 2.
71
Example 13.4. Suppose 4 coins are tossed and let X be the
number of heads. Find σ.
Solution. We need the mean µ first, then the values (k − µ)2 to
find E[(X − µ)2]. We have
k p(k) kp(k) k − µ (k − µ)2 (k − µ)2p(k)
1 1
0 0 −1 1
16 16
4 4
1 0 0 0
16 16
6 12 6
2 1 1
16 16 16
4 12 16
3 2 4
16 16 16
1 4 9
4 3 9
16 16 16
2
µ=2 σ =2

Thus, σ = 2.

In practice, the mean µ is interpreted as the “center” of the dis-


tribution while the standard deviation σ is a measure of “spread” of
the distribution.

Exercise. Let X be the sum in rolling two dice. Find σ.

Below, we characterize the calculation of variance.


Theorem 13.3. Let X be a discrete rv. Then,
Var(X) = E[X 2] − (E[X])2.
72
Or in other words, since Var(X) = E[X 2] − µ2, the variance of
a discrete rv is the difference between the second moment and the
square of the first moment.
Proof of Theorem 13.3. We have,
Var(X) = E[(X − µ)2]
= E[X 2 − 2µX + µ2]
= E[X 2] − 2µE[X] + E[µ2]
= E[X 2] − 2µ2 + µ2
= E[X 2] − µ2
= E[X 2] − (E[X])2

Next, a property of variance.


Theorem 13.4. Let X be a discrete rv. Then,
Var(aX + b) = a2Var(X).
Note that b is not preserved in this property. This implies that
variance is translation invariant.

Proof of Theorem 13.4. We have,


Var(aX + b) = E[(aX + b)2] − (E[aX + b])2
= E[a2X 2 + 2abX + b2] − (aµ + b)2
= a2E[X 2] + 2abE[X] + b2 − (a2µ2 + 2abµ + b2)
= a2E[X 2] − a2µ2
= a2(E[X 2] − µ2)
= a2Var(X).
73
Exercise. What is Var(c) for any constant c?
13.2 Expected Value - Continuous Random Variables
We now proceed with an equivalent discussion of the previous section
for continuous random variables.
Definition 13.3. Let X be a continuous rv with pdf f (x). Then,
the expected value of X is defined by
Z ∞
E[X] = xf (x)dx.
−∞
Example 13.5. Let X be a continuous rv with pdf

0 if x < 0


f (x) = 2x if 0 ≤ x ≤ 1 .

0

if 0x > 1
(Verify if f (x) is well-defined.) Find E[X].
Solution. (We will skip verifying f (x) is well-defined.) We have,
Z ∞
E[X] = xf (x)dx
Z−∞ 1
= 2x2dx
0
Z 1
=2 x2dx
0 3 1
x
=2
3 0
2
= .
3
74
Exercise. Let X be a continuous rv with pdf



 0 if x < 0
cx2 if 0 ≤ x ≤ 2

f (x) = .


 2cx if 2 ≤ x ≤ 3

0 if x > 3
Determine c, then find E[X].
As in for discrete rv’s, we can also calculate expected values of
continuous rv’s.
Theorem 13.5. Let X be a continuous rv with pdf f (x). Then,
Z ∞
E[φ(X)] = φ(x)f (x)dx.
−∞
Proof. Exercise.
Exercise. Find E[2X − 6] and E[X 2] from Example 13.5.

We also provide the same property for the transformation aX + b.


Theorem 13.6. Let X be a continuous rv with pdf f (x). We
have
E[aX + b] = aE[X] + b
for constants a, b ∈ R.
Proof.
Z ∞
E[aX + b] = (ax + b)f (x)dx
−∞
Z ∞ Z ∞
=a xf (x)dx + b f (x)dx
−∞ −∞
= aE[X] + b.
75
Lastly we define variance for continuous rv’s which is completely
identical to that of discrete rv’s.
Definition 13.4. Let X be a continuous rv with pdf f (x). Then,
the variance of X is defined by
Var(X) = E[(X − µ)2].

To illustrate, see the following example.


Example 13.6. Find Var(X) in Example 13.5.
Solution. We have
Var(X) = E[(X − µ)2]
Z ∞ 2
2
= x− f (x)dx
−∞ 3
Z 1 2
2
= x− 2xdx
0 3
Z 1 Z 1 Z 1
8 8
=2 x3dx − x2dx + xdx
0  3
 0   03
1 8 1 8 1
=2 − +
4 3 3 3 2
8
= .
9
Exercise. Find Var(X) from the above exercise.

The same property of variances hold for continuous rv’s.


76
Theorem 13.7. Let X be a continuous rv with pdf f (x). For
any a, b ∈ R,
Var(aX + b) = a2Var(X).
Proof. Exercise.
We also have the following characterization:
Theorem 13.8. Let X be a continuous rv with pdf f (x). Then,
Var(X) = E[X 2] − (E[X])2.
Proof. Exercise.

77
78
Chapter 3

Special Random Variables

We have so far learned about random variables. We also understand


that random variables induce a probability function called distribu-
tions. We have seen a variety of examples of distributions both mass
(discrete) and density (continuous) functions. However, we will have
to realize that some distributions are more important than others.
“More important” in the sense that they may be naturally occurring
or are commonly used in the practice of statistics.
Our primary objective in this chapter is to name some important
discrete and continuous random variables and their distributions and
investigate them.
Intended Learning Outcomes
(CO 3.1) Name some commonly used special discrete and continuous random variables and their prop-
erties
(CO 3.2) Give examples of experiments yielding special random variables

(CO 3.3) Compute probabilities, means, and variances of special random variables

Lectures
Lecture 14. Special Discrete Random Variables
Lecture 15. Special Continuous Random Variables
Lecture 14 - Special Discrete Random Variables

Lecture Objectives
(1) Define special discrete random variables: uniform, Bernoulli, binomial, Poisson, hypergeometric, negative
binomial, and geometric distributions.
(2) Give examples of experiments that generate these special random variables.
(3) Calculate probabilities, means, and variances of these random variables using their corresponding special
distribution.

There are random variables that arise naturally from common experiments. This is why we realize that some
distributions are more important than others. We call these special random variables. When the experiment
yields a discrete sample space, we need a discrete random variable, of whose special cases are called special
discrete random variables.
In this lecture, we will study a few special discrete random variables including the corresponding experi-
ments that generate them and their distributions. Then, calculate probability of these random variables and
their common properties and measurements such as mean and variance.

14.1 Parameters and Other Conventions


Before we study some special rv’s, we introduce first parameters. Parameters characterize a special random
variable. That is, special random variables come in families and for each value of the parameters, we obtain
a unique member of that family.
Each special random variable may need different parameters. For example, the uniform discrete rv
requires only one parameter n which is the number of outcomes in the sample space of the experiment. The
Bernoulli and binomial rv’s require p and (p, n) respectively.
Because these distributions are special, we use special notations to replace the usual symbol X. For
example, the uniform discrete rv is denoted by

X = Uni(n)

where n is its parameter. The binomial rv is denoted by

X = Bin(p, n)

where n, p are its parameters.


We will still use the notation p(k) to denote the distribution of a random variable X. When it is necessary
to emphasize X being the rv involved, we may write pX (k).

14.2 Uniform Discrete Random Variable


The uniform discrete random variable X = Uni(n) with parameter n ≥ 1 is defined by the distribution
1
p(k) = for all k = 1, 2, . . . , n.
n

80
In other words, the uniform discrete rv is one whose outcomes are equally likely. This rv occurs in experiments
with a (i) finite number of outcomes and (ii) each outcome is equally likely to occur. Examples include:

(1) rolling a die;

(2) drawing an object from an urn containing n objects, with each object having an equal likelihood of being
drawn;

(3) selecting a person from a group with n members, with each person having an equal likelihood of being
drawn.

In the uniform rv Uni(n), the parameter n is interpreted as the number of possible outcomes (or the
1
sample size) of the experiment. Hence p(k) = means the probability that the kth sample point occurs from
n
1
a set of n possible outcomes is . This might be the time where it is best to describe experiments as fair
n
experiments in order to assume equality of likelihood among all possible outcomes. So we might emphasize
fair die, fair coin, fair selection.
1
For example, the experiment rolling a fair die has the distribution p(k) = for k = 1, 2, 3, 4, 5, 6. So for
6
example, when asked for the probability that 3 occurs is written as

1
p(3) = .
6

Example 14.1. An urn contains 9 chips numbered 1 to 9. A chip is drawn randomly. Write the probability
that the chip numbered k is drawn using the notation for uniform discrete probability distribution. What is
the probability that 4 is drawn?

Solution. We let X be the number of the chip being drawn. Since each chip must be equally likely to be
drawn, then X is a uniform discrete rv with parameter n.

Exercise. A card is to be randomly drawn from the standard deck. Assign each card to the following
values of k: A♠ 7→ 1, 2♠ 7→ 2, . . . , K♠ 7→ 13, A♣ 7→ 14, . . . , A♦ 7→ 27, A♥ 7→ 40, . . . , K♥ 7→ 52. Express the
probability in terms of the notation above. What is the probability that 4♦ is drawn.

The following bar charts illustrate the effects of the parameter n to the probability distribution. The
vertical axis indicates the probability value for each value of the uniform rv indicated in the horizontal axis.

81
n=2 n=4

n=6 n = 20

We can see from the above figures that as n increases, the uniform probability value decreases. This gives
the common realization that as the sample size increases, the likelihood of each sample point decreases. We
simulate rolling a die 10 000 times. The figure below shows the relative frequencies of each outcome.

1
The dashed line represents the theoretical probability ≈ 0.1667. We can observe the closeness between
6
the actual relative frequencies and the theoretical probabilities.
Let us show that the uniform discrete distribution is well-defined. The range of the uniform discrete rv
is {1, 2, . . . , n}. Hence,

n n  
X X 1 1
p(k) = =n = 1.
n n
k=1 k=1

82
Now we calculate the mean and variance:
n
X
µ= kp(k)
k=1
n
X k
=
n
k=1
n
1X
= k
n
k=1
n(n − 1)
=
2n
n−1
= .
2
Also,
n  2
2
X
2 n−1
σ = k p(k) −
2
k=1
n
1 X 2 (n − 1)2
= k −
n 4
k=1
n(n − 1)(2n + 1) (n − 1)2
= −
6n 4
(n − 1)(n + 5)
= .
12
Hence,
r
1 (n − 1)(n + 5)
σ= .
2 3
Example 14.2. Find the mean and variance of the experiment in Example 14.1.

Solution. Since n = 9, we have


9−1
µ= =4
2
and,
r r
1 8 · 14 7
σ= =2 .
2 3 3
Exercise. Find mean and standard deviation of the experiment in the last exercise.

14.3 Bernoulli Random Variable


We consider an experiment with two possible outcomes called success and failure. Any experiment of this
model is called a Bernoulli experiment. We define the random variable X = Ber(p) by
(
1 if success
X= .
0 if failure

83
The distribution induced by this random variable is called the Bernoulli distribution with parameter p,
where p = p(1). By complement law, p(0) = 1 − p. Hence we have
(
p if k = 1
p(k) = .
1 − p if k = 0

It is easy to show that p(k) is well-defined: p + (1 − p) = 1.


The Bernoulli rv commonly occurs in experiment with only two possible outcomes. Examples include
pass-fail experiments, coin toss, yes-no questions, true-false statements, etc. The parameter p is interpreted
as the probability of success.
For example, in the experiment of tossing a fair coin, we set head to be our success X = 1 while tail
to be our failure X = 0 (this assignment is not strict – head may also be assigned X = 0 and tail X = 1).
1
Since the coin is fair, we know that the probability of success (or head) is p = . This is now our parameter
2
p. That means our distribution is (
1
if X = 1
p(k) = 21 .
2 if X = 0
Exercise. Suppose a class has 11 males and 13 females. A student has to be randomly selected. Assign
X = 1 for a male student and X = 0 for a female student. Establish the Bernoulli distribution for this
experiment.

The following bar charts illustrate the effects of the parameter p to the Bernoulli probability distribution.
The vertical axis indicates the probability for each value of the Bernoulli rv in the horizontal axis.

p = 0.75 p = 0.5 p = 0.3

To illustrate the Bernoulli rv, we simulate the experiment in the exercise above 10 000 times. The results
are illustrated in the chart below.

The actual relative frequencies are illustrated through the columns while the theoretical probabilities
11 13
p= ≈ 0.4583 (success) and 1 − p = ≈ 0.5417 (failure) are illustrated through the dotted lines.
24 24

84
Next, we compute the mean:

µ = 1 · p + 0 · (1 − p)
=p

and the variance:

σ 2 = [12 · p + 02 · (1 − p)] − p2
= p − p2
= p(1 − p).

The mean of tossing a fair coin is


   
1 1 1
µ=1· +0· = .
2 2 2

The variance is  
1 1 1
σ= 1− = .
2 2 4
Exercise. Find the mean and variance of the experiment in the previous exercise.

14.4 Binomial Random Variable


We have previously defined a Bernoulli experiment. Now, a binomial experiment is a Bernoulli experiment
repeated in a fixed number of times. In this experiment, we are interested with the number of times success
occurred. For example, if we have a Bernoulli experiment with parameter p repeated n times, and we
define X = Bin(p, n) the number of times success occurred, then the distribution induced by X is called the
binomial distribution with parameters p, n, with 0 ≤ p ≤ 1 and n ∈ Z+ defined by
 
n k
p(k) = p (1 − p)n−k
k

where k = 0, 1, 2, . . . , n. The binomial distribution is interpreted as the probability that success will occur k
times if a Bernoulli experiment with success probability p is repeated n times.
The derivation of p(k) may be taken from Lecture 04. If S is the success outcome and F the failure, then
the coefficients of the expansion of (S + F )n tell us the number of cases for every combination possible. In
n

particular, the coefficient k of the term of S k F n−k is the number of ways for k successes and n − k failures
to occur. Since each success has a probability p, we know that k such successes require a probability factor
of pk while since the failure probability is 1 − p, n − k failures require the probability factor (1 − p)n−k .
Example 14.3. Suppose a fair coin is tossed 5 times. Find the probability that there are exactly 3 heads.
1
Solution. We have n = 5 and p = , i.e., X = Bin( 12 , 5). Hence, for k = 3, we have
2
   3  2
5 1 1
p(3) =
3 2 2
 
1
= 10
32
5
=
16

85
Exercise. In a basketball game, Team A has 0.7 probability of winning against Team B. In a 7-game series,
what is the probability that they win at least 5 games?

To illustrate the effects of the parameters p and n to the probability distribution of X = Bin(p, n), the
following figures are presented.

p = 0.5, n = 4 p = 0.5, n = 6 p = 0.5, n = 9

p = 0.25, n = 10 p = 0.5, n = 10 p = 0.9, n = 10

We can notice observable effects of both parameters p and n. One important property to see is that
the success probability p affects the “skewness” of the distribution. That is, p-values closer to 1 shifts
the distribution to be heavier on the right (or skewed to the left), while p-values closer to zero shifts
the distribution to the left (or skewed to the right). Further, the figure below shows the distribution of
X = Bin(0.5, 50).

The binomial distribution gives us a preview to the already-familiar “bell-curve” of the normal distribution
we already know from introductory statistics. We will see more of this relationship between binomial and
normal distribution in the next lecture.
Next, we simulate tossing 10 fair coins 10 000 times. If X is the number of heads in this experiment,
then the relative frequencies for X = 0, 1, 2, . . . , 10 are shown below.

86
The columns show the actual relative frequencies while the dashed lines are the theoretical values. We
also simulate X = Bin(0.75, 10).

Let us show that p(k) is well-defined for X = Bin(p, n). We have

n n  
X X n
p(k) = pk (1 − p)n−k
k
k=0 k=0
= (p + 1 − p)n
= 1.

The computation of the binomial rv’s mean requires the use of qth moments. We have, the qth moment
E[X q ] of X:

n  
X n
E[X q ] = k q pk (1 − p)n−k
k
k=0
n  
q n
X
= k pk (1 − p)n−k
k
k=1

since the term with k = 0 zeroes out. Now, using the identity (proof is an exercise)
   
n n−1
k =n ,
k k−1

87
gives
n  
X n−1 k
E[X q ] = k q−1 n p (1 − p)n−k
k−1
k=1
n  
q−1 n − 1
X
= np k pk−1 (1 − p)n−k
k−1
k=1

Now if we let j = k − 1,
n−1  
X n−1 j
E[X q ] = np (j + 1)q−1 p (1 − p)n−1−j
j=0
j

We can observe from the summation factor that this is the (q − 1)th moment of an r.v. Y + 1 where
Y = Bin(p, n − 1). Hence
E[X q ] = npE[(Y + 1)q−1 ].
So if q = 1, we have the mean of the binomial r.v. X = Bin(p, n):

E[X] = np.

Now, we calculate the variance. Since Var(X) = E[X 2 ] − µ2 , and if X = Bin(p, n), we have

Var(X) = E[X 2 ] − (E[X])2


= npE[Y + 1] − (np)2 where Y = Bin(p, n − 1)
2
= np(E[Y ] + 1) − (np)
= np[(n − 1)p + 1] − (np)2
= np(1 − p).

Example 14.4. A coin is tossed 10 times. If X is the number of times head occurred, find the expected
value and variance of X.
Solution. Assuming that the coin is fair, we know that X = Bin(0.5, 20). Hence,

E[X] = 20 · 0.5
= 10.

(One may interpret the expected value as the average or theoretical number of times that head will occur.)
Further,

Var(X) = 20 · 0.5(1 − 0.5)


= 5.

Exercise. Suppose that in a basketball game, TEAM A is twice more likely to win over TEAM B. (i) Find the
probability p that TEAM B wins. (ii) Find the expected number and variance of TEAM B wins in a 7-game series.

14.5 Poisson Random Variable


The binomial rv has rich applications because of its simplicity and practicality. However, in some experiments
with large number n of trials and small success probability p, the binomial pdf becomes difficult to handle

88
computationally. This is where the Poisson rv comes in. The primary role of this random variable is to
estimate the binomial rv especially for cases with large n and small p.
Recall the binomial distribution for an rv X = Bin(p, n):
 
n k
p(k) = p (1 − p)n−k
k
n!
= pk (1 − p)n−k
(n − k)!k!

λ
We define a constant λ = np so that p = . This means that p(k) becomes
n
 k  n−k
n! λ λ
p(k) = 1−
(n − k)!k! n n
n
n(n − 1)(n − 2) · · · (n − k + 1) λk (1 − λ/n)
= .
nk k! (1 − λ/n)k

Observe that when n → ∞,


n(n − 1)(n − 2) · · · (n − k + 1)
→ 1,
nk
 n
λ
1− → e−λ , and,
n
 k
λ
1− → 1.
n
The equation then becomes
λk
p(k) = e−λ .
k!
This is the distribution of a Poisson random variable X = Pos(λ) with parameter λ. The Poisson rv
takes values k = 0, 1, 2, . . .. It is a powerful estimator of the binomial Bin(p, n) when p is small and n is
large. Common examples include:

(1) number of machine errors in mass production;

(2) number of individuals with a rare characteristic, abnormality, or disorder in a large population; or,

(3) number of units of a slow-selling product sold in a day.

In other words, a random variable X can take a Poisson distribution if it is a count of the occurrence some
rare value of a variable from a large (or infinite) population. Consider the example below.

Example 14.5. A publishing company prints books and averagely commits 3 typographical errors per 100
pages. What is the probability that it makes at most 5 errors in a 1000-page book?

Solution. Let X be the number of typos. Then the probability of committing an error in a page is
p = 3/100 = 0.03. Since the book has 1000 pages, we have λ = 1000 · 0.03 = 30. Thus,

30k
p(k) = e−30 .
k!

89
Then,

P (X ≤ 5) = p(0) + p(1) + p(2) + p(3) + p(4) + p(5)


300 301 302 303 304 305
= e−30 + e−30 + e−30 + e−30 + e−30 + e−30
0! 1! 2! 3! 4! 5!
≈ 2.26 × 10−8

Exercise. WHO data shows that 0.1% of newborn babies suffer from Down’s Syndrome. In the Philippines,
around 5,000 babies are born everyday. What is the probability that at least one baby born today has
Down’s Syndrome?

Next, we compute the mean of the Poisson rv. Let X = Pos(λ) with parameter λ. Then,

X λk
E[X] = ke−λ
k!
k=0

X λk
= ke−λ
k!
k=1

X λk−1
=λ e−λ where j = k − 1
(k − 1)!
k=1

X λj
=λ e−λ
j=0
j!
= λ.

Next, we compute the variance. In order to do this, we first compute the second moment E[X 2 ]

X λk
E[X 2 ] = k 2 e−λ
k!
k=0

X λk−1
=λ ke−λ
(k − 1)!
k=1

X λj
=λ (j + 1)e−λ where j = k − 1
j=0
j!
∞ j ∞
X
−λ λ
X λj
=λ je +λ e−λ
j=0
j! j=0
j!
= λ2 + λ

Now since Var(X) = E[X 2 ] − (E[X])2 , we have

Var(X) = λ2 + λ − λ2

Example 14.6. In Example 14.5, find E[X] and Var(X).

Solution. Since λ = 30, we have E[X] = 30, and Var(X) = 30.

90
Bibliography

[1] Anderson, D.F., Seppäläinen, T., & Valkó, B. (2017). Introduction to probability.
[2] AoPS Online (2018). Pascal’s Identity. Retrieved from https://artofproblemsolving/wiki/index.
php/Pascal%27s Identity.

[3] Castagnoli, E., Cigola, M., & Peccati, L. (2017). Probability.


[4] Department of Combinatorics and Optimization, University of Waterloo (2016). Introduction to combi-
natorics: Course notes for Math 239.
[5] Grinstead, C.M. & Snell, J.L. (2012). Introduction to probability: Second revised edition. American
Mathematical Society.
[6] Hogg, R.V., McKean, J.W., & Craig, A.T. (2019). Introduction to mathematical statistics, 8th Ed..
Pearson Education.
[7] Larsen, R.J. & Marx, M.L. (2018). An introduction to mathematical statistics, 6th Ed.. Pearson.

[8] Mood, A.M., Graybill, F.A., & Boes, D.C. (1963). Introduction to the theory of statistics.
[9] Ross, S.M. (2014). A first course in probability, 9th Ed.. Pearson.
[10] Schiller Jr., J.J., Spiegel, M.R., & Srinivasan, R.A. (2013). Schaum’s outline of probability and statistics:
897 solved problems + 20 videos. McGraw Hill Professional.

[11] Yehudayoff, A. (2017). An introduction to combinatorics [Lecture notes]. Department of Mathematics,


Technion-IIT.

91

You might also like