Lecture Notes 1
SAMPLE SPACE AND PROBABILITY
PROBABILISTIC MODELS
(Figure: a random experiment produces an outcome; a subset of the possible outcomes is marked as event A.)
In probability theory, we usually call subsets of the sample space “events”.
(Figure: elements of a probabilistic model. A random experiment produces an outcome in the sample space Ω, the set of possible outcomes; a probability law assigns probabilities P(A), P(B) to events A, B ⊂ Ω.)
- The probability law encodes our knowledge or belief about which events are
more likely to occur compared to others.
P(A) represents our knowledge or belief about how likely event A is to occur.
Choosing an Appropriate Sample Space
Elements of the sample space (outcomes) should be mutually exclusive and
collectively exhaustive.
Mutually Exclusive
When the experiment is carried out, there is a unique outcome.
(If one outcome occurs, another cannot occur.)
Collectively Exhaustive
No matter what happens in the experiment, we always obtain an outcome that
has been included in the sample space.
(None of the possible outcomes is forgotten)
Example: Single roll of a 6-sided die.
Example (Continued)
Sequential Models: Many experiments have an inherently sequential character
For example: receiving successive digits at a communication receiver
(Figure: tree-based sequential description. From the root, the first-stage branches are H and T; each splits again into H and T, giving the leaves HH, HT, TH, TT.)
H : RESULT of the first toss / first stage
T : RESULT of the second toss / second stage
HT : OUTCOME of the experiment
Textbook: D. P. Bertsekas, J. N. Tsitsiklis, “Introduction to Probability”, 2nd Ed., Athena Scientific, 2008.
Sequential Models
Experiment: Two rolls of a 6-sided die
(Figure: two descriptions of the sample space. A 6 × 6 grid whose axes are the first and second roll, with each cell an outcome such as (1, 2); and a tree from the root with six first-roll branches, each splitting into six second-roll branches, with leaves (1, 1), (1, 2), …, (6, 6).)
1 : RESULT of the first roll
2 : RESULT of the second roll
(1, 2) : OUTCOME of the experiment
Probability Axioms
I. Nonnegativity: P(A) ≥ 0, for every event A.
II. Normalization: P(Ω) = 1
III. Additivity: If A and B are two disjoint events, then the probability of their union satisfies
P(A ∪ B) = P(A) + P(B)
“Probability is common sense reduced to calculation.”
Laplace
Let's try to understand this statement through a concrete example: a single toss of a coin.
There are two possible outcomes: heads (H) and tails (T).
If the coin is fair, i.e., we believe that heads and tails are “equally likely”, then we should assign equal probabilities to the possible outcomes: P(H) = P(T) = 0.5
Example 1.2. (Continued)
The additivity axiom implies that P({H, T}) = P(H) + P(T) = 1,
which is consistent with the normalization axiom.
Note that the comma (,) in the event {H, T} separates the elements of the set; it does not mean "and" in this notation. In fact, {H, T} = {H or T shows up} = {H} ∪ {T}.
Also, 1 = P(Ω) = P({H, T} ∪ ∅) = P({H, T}) + P(∅), so
P(∅) = 0
Another example: consider three disjoint events A1 , A2 ,and A3 . By using the additivity
axiom for two disjoint events repeatedly, we obtain that
P(A1 ∪ A2 ∪ A3) = P(A1 ∪ (A2 ∪ A3))
= P(A1) + P(A2 ∪ A3)
= P(A1) + P(A2) + P(A3)
Generalization: Following the same procedure, a more general property for the case of finitely many disjoint sets can be derived:
P(A1 ∪ A2 ∪ … ∪ An) = P(A1) + P(A2) + … + P(An)
Special case of the above property:
Suppose that we are dealing with an event that consists of a finite number of
possible outcomes. We want to calculate the probability of this event.
If the sample space consists of a finite number of possible outcomes, then the probability law is specified by the probabilities of the events that consist of a single element. In particular, the probability of any event {s1, s2, …, sk} is the sum of the probabilities of its individual elements:
P({s1, s2, …, sk}) = P(s1) + P(s2) + … + P(sk)
In the special case where the probabilities P(s1), P(s2), …, P(sn) are all the same (by necessity equal to 1/n, in view of the normalization axiom), we obtain the following:
P(A) = (number of elements of A) / n
For example, if A = {s1, s2, …, sk}:
P(A) = P({s1, s2, …, sk})
= P(s1) + P(s2) + … + P(sk)   (by using the discrete probability law)
= 1/n + 1/n + … + 1/n   (k terms)
= k/n = (number of elements of A) / n
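The discrete uniform law above can be checked by direct enumeration; a minimal Python sketch (the die and the event A are illustrative choices, not taken from the text):

```python
from fractions import Fraction

# Sample space: a single roll of a fair 6-sided die (illustrative).
omega = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}  # illustrative event: an even number shows up

# Discrete uniform law: P(A) = (number of elements of A) / n
p_A = Fraction(len(A), len(omega))
```

Here p_A works out to 1/2, matching k/n = 3/6.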
Example: Consider the experiment of rolling a pair of fair dice.
or Consider the experiment of two rolls of a fair die.
Properties of Probability Laws:
1) If A ⊂ B, then P(A) ≤ P(B)
2) P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
3) P(A ∪ B) ≤ P(A) + P(B)
4) P(A ∪ B ∪ C) = P(A) + P(Ac ∩ B) + P(Ac ∩ Bc ∩ C)
Proofs of these properties can be obtained analytically or by using Venn diagrams. They are given in the textbook. (HW)
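These properties can be spot-checked numerically with the discrete uniform law; a small sketch (the events A and B are illustrative, not from the text):

```python
from fractions import Fraction

# Single roll of a fair die under the discrete uniform law.
omega = set(range(1, 7))

def P(E):
    return Fraction(len(E), len(omega))

A = {1, 2, 3}  # illustrative events
B = {3, 4}

assert P(A | B) == P(A) + P(B) - P(A & B)  # property 2 (inclusion-exclusion)
assert P(A | B) <= P(A) + P(B)             # property 3 (union bound)
assert P(A & B) <= P(B)                    # property 1, with A ∩ B ⊂ B
```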
Continuous Models
Example 1.4. (Continued)
Note that the probabilities of the single-element events are not sufficient to
characterize the probability law for continuous models.
What can we do? “Area” or “length” may be used for defining a probability law.
This assignment satisfies the three probability axioms and qualifies as a legitimate
probability law.
Set Operations
Ac = {x ∈ Ω : x ∉ A}   Complement
A ∪ B = {x : x ∈ A or x ∈ B}   Union
A ∩ B = {x : x ∈ A and x ∈ B}   Intersection
SET THEORY
(Ac)c = A   Double Complement
A ∪ B = B ∪ A and A ∩ B = B ∩ A   Commutative Properties
A ∪ (B ∪ C) = (A ∪ B) ∪ C   Associative Properties
A ∩ (B ∩ C) = (A ∩ B) ∩ C
A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C)   Distributive Properties
A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C)
A ∪ Ac = Ω
A ∩ Ac = ∅
A ∪ Ω = Ω
A ∩ Ω = A
A partition of a set: If the non-empty subsets A1, …, An are disjoint and their union is Ω, then A1, …, An are said to form a partition of Ω.
(Figure: a Venn diagram of Ω divided into the disjoint regions A1, A2, A3; any event B is then split into the disjoint pieces B ∩ A1, B ∩ A2, B ∩ A3.)
Simple Examples:
o In an experiment involving two successive rolls of a die, you are told that the sum of the two rolls is 9. How likely is it that the first roll was a 6?
o In a word guessing game, the first letter of the word is a “t”. What is the likelihood that the second letter is an “h”?
Real-world problems:
o A spot shows up on a radar screen. How likely is it that it corresponds to an aircraft?
o How likely is it that a person has a disease given that a medical test was negative?
CONDITIONAL PROBABILITY
Fig. If B is known to have occurred, then A can occur only if A B occurs (Garcia,2008)
Knowledge that event B has occurred implies that the outcome of the experiment is in the
set B. In computing P A | B we can therefore view the experiment as now having the
reduced sample space B as shown in the figure above. The event A occurs in the reduced
sample space if and only if the outcome is in A B .
Definition: The probability of an event A under the condition that event B has occurred is called the conditional probability of “A given B”:
P(A | B) = P(A ∩ B) / P(B),   P(B) > 0
If the possible outcomes are finitely many and equally likely, then
P(A | B) = (number of elements of A ∩ B) / (number of elements of B).
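Applied to the first simple example above (two rolls of a die, given that the sum is 9), the counting form of the definition can be sketched as:

```python
from fractions import Fraction
from itertools import product

omega = list(product(range(1, 7), repeat=2))   # two rolls of a fair die
B = [w for w in omega if w[0] + w[1] == 9]     # sum of the two rolls is 9
A_and_B = [w for w in B if w[0] == 6]          # ...and the first roll is 6

# P(A | B) = |A ∩ B| / |B| for equally likely outcomes
p = Fraction(len(A_and_B), len(B))
```

B contains the four outcomes (3,6), (4,5), (5,4), (6,3), so p comes out to 1/4.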
Conditional Probabilities Specify a Legitimate Probability Law
Since P(A ∩ B) ≥ 0 and P(B) > 0, we have P(A | B) ≥ 0: the non-negativity axiom is satisfied.
P(Ω | B) = P(Ω ∩ B) / P(B) = P(B) / P(B) = 1: the normalization axiom is satisfied.
Verification of the additivity axiom: Let A1 and A2 be disjoint events associated with the same experiment.
P(A1 ∪ A2 | B) = P((A1 ∪ A2) ∩ B) / P(B)
= P((A1 ∩ B) ∪ (A2 ∩ B)) / P(B)   (note that A1 ∩ B and A2 ∩ B are disjoint sets)
= [P(A1 ∩ B) + P(A2 ∩ B)] / P(B)   (by using the additivity axiom for the unconditional probability law)
= P(A1 ∩ B) / P(B) + P(A2 ∩ B) / P(B)
= P(A1 | B) + P(A2 | B)
Example 1.6 We toss a fair coin three successive times. The events A and B are defined as follows:
A = {more heads than tails come up},
B = {first toss is a head}
Find the conditional probability P(A | B).
Ω = {HHH, HHT, HTH, THH, TTH, THT, HTT, TTT}
P(A | B) = P(A ∩ B) / P(B) = (3/8) / (4/8) = 3/4,  or
P(A | B) = (number of elements of A ∩ B) / (number of elements of B) = 3/4
(revised probability)
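The same answer can be obtained by enumerating the eight outcomes; a minimal sketch:

```python
from fractions import Fraction
from itertools import product

omega = list(product("HT", repeat=3))    # 8 equally likely outcomes
B = [w for w in omega if w[0] == "H"]    # first toss is a head
A_and_B = [w for w in B if w.count("H") > w.count("T")]  # more heads than tails

p = Fraction(len(A_and_B), len(B))       # counting form of P(A | B)
```

Of the four outcomes in B, three (HHH, HHT, HTH) have more heads than tails, so p is 3/4.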
Example Consider rolling a 6-sided die. Let
A = {number 1 shows up}, B = {an odd number shows up}, C = {number 1 or 2 shows up}
Ω = {1, 2, 3, 4, 5, 6}
P(A | B) = P(A ∩ B) / P(B) = P(A) / P(B) = (1/6) / (3/6) = 1/3
P(C | B) = P(C ∩ B) / P(B) = (1/6) / (3/6) = 1/3
P(B | C) = P(B ∩ C) / P(C) = (1/6) / (2/6) = 1/2
Using Conditional Probability for Modeling
P(Ac ∩ B) = ?
P(A ∩ Bc) = ?
Example 1.9. Radar Detection (Continued)
(Tree: first stage, aircraft present or not; second stage, alarm or not.)
P(A) = 0.05   Aircraft Present
P(Ac) = 0.95   Aircraft Absent
P(B | A) = 0.99   Detection
P(Bc | A) = 0.01   Missed Detection
P(B | Ac) = 0.10   False Alarm
Extending the preceding example, we have a general rule for calculating various
probabilities in conjunction with a tree-based sequential description of an experiment.
In particular:
(a) We set up the tree so that an event of interest is associated with a leaf. We view the
occurrence of the event as a sequence of steps, namely, the traversals of the branches
along the path from the root to the leaf.
(b) We record the conditional probabilities associated with the branches of the tree.
(c) We obtain the probability of a leaf by multiplying the probabilities recorded along the
corresponding path of the tree.
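For the radar tree above, rule (c) gives the leaf probabilities by multiplying along each root-to-leaf path; a sketch:

```python
# Branch probabilities from the radar example.
p_A = 0.05            # P(A): aircraft present
p_B_given_A = 0.99    # P(B | A): detection
p_B_given_Ac = 0.10   # P(B | A^c): false alarm rate

# Leaf probabilities: product of the probabilities along each path.
p_missed = p_A * (1 - p_B_given_A)          # P(A ∩ B^c), missed detection
p_false_alarm = (1 - p_A) * p_B_given_Ac    # P(A^c ∩ B), false alarm
```

p_missed evaluates to about 0.0005 and p_false_alarm to about 0.095.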
Multiplication Rule
P(A1 ∩ A2 ∩ … ∩ An) = P(A1) P(A2 | A1) P(A3 | A1 ∩ A2) ⋯ P(An | A1 ∩ A2 ∩ … ∩ An−1)
Verification of the multiplication rule:
The multiplication rule can be verified by writing
P(A1 ∩ … ∩ An) = P(A1) · [P(A1 ∩ A2) / P(A1)] · [P(A1 ∩ A2 ∩ A3) / P(A1 ∩ A2)] ⋯ [P(A1 ∩ … ∩ An) / P(A1 ∩ … ∩ An−1)]
= P(A1) P(A2 | A1) P(A3 | A1 ∩ A2) ⋯ P(An | A1 ∩ … ∩ An−1)
where for the second equality, we used the definition of conditional probability.
For the case of just two events, A1 and A2, the multiplication rule is simply the definition of conditional probability.
Example 1.10 Three cards are drawn from an ordinary 52-card deck without replacement
(drawn cards are not placed back in the deck). We wish to find the probability that none of
the three cards is a heart. We assume that at each step, each one of the remaining cards is
equally likely to be picked. By symmetry, this implies that every triplet of cards is equally
likely to be drawn.
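With the multiplication rule, the answer is the product of the three conditional probabilities along the "no heart" path; a sketch:

```python
from fractions import Fraction

# 39 of the 52 cards are not hearts; at each step the remaining
# cards are equally likely, so by the multiplication rule:
p_no_heart = Fraction(39, 52) * Fraction(38, 51) * Fraction(37, 50)
```

This comes out to 703/1700, roughly 0.41.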
TOTAL PROBABILITY THEOREM AND BAYES’ RULE
Total Probability Theorem:
Let A1, A2, …, An be disjoint events that form a partition of the sample space, and assume that P(Ai) > 0 for all i = 1, …, n. Then for any event B, we have
P(B) = P(A1 ∩ B) + P(A2 ∩ B) + … + P(An ∩ B)
= P(A1) P(B | A1) + P(A2) P(B | A2) + … + P(An) P(B | An)
Visualization and verification of the Total Probability Theorem
(Figure: the partition A1, A2, A3 splits event B into the disjoint pieces A1 ∩ B, A2 ∩ B, A3 ∩ B, so P(B) = P(A1 ∩ B) + P(A2 ∩ B) + P(A3 ∩ B).)
Total Probability Theorem
“Divide-and-Conquer” approach
The key is to choose the partition A1, …, An appropriately. This choice is often suggested by the problem structure.
Example 1.13 You enter a chess tournament where your probability of winning a game is 0.3
against half the players (call them type 1), 0.4 against a quarter of the players (call them type 2), and
0.5 against the remaining quarter of the players (call them type 3). You play a game against a
randomly chosen opponent. What is the probability of winning?
P(B) = P(A1) P(B | A1) + P(A2) P(B | A2) + P(A3) P(B | A3)
= 0.5 · 0.3 + 0.25 · 0.4 + 0.25 · 0.5
= 0.375
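The computation in Example 1.13 can be sketched as:

```python
priors = [0.5, 0.25, 0.25]   # P(A1), P(A2), P(A3): opponent types
p_win = [0.3, 0.4, 0.5]      # P(B | Ai): win probability against each type

# Total probability theorem: P(B) = sum over i of P(Ai) P(B | Ai)
p_B = sum(p * w for p, w in zip(priors, p_win))
```

p_B evaluates to 0.375, as above.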
Bayes’ Rule
Let A1, A2, …, An be disjoint events that form a partition of the sample space, and assume that P(Ai) > 0 for all i = 1, …, n. Then for any event B such that P(B) > 0, we have
P(Ai | B) = P(Ai) P(B | Ai) / P(B)   (reversing the order of conditioning)
= P(Ai) P(B | Ai) / [P(A1) P(B | A1) + P(A2) P(B | A2) + … + P(An) P(B | An)]
Bayes’ Rule
P(Ai) : Prior probability of event Ai
(Probability of event Ai without knowing that event B has occurred.)
P(Ai | B) : Posterior probability of event Ai
(Probability of event Ai knowing that event B has occurred.)
Verification
To verify Bayes' rule, note that by the definition of conditional probability, we have
P(Ai ∩ B) = P(Ai) P(B | Ai) and P(Ai ∩ B) = P(B) P(Ai | B),  so  P(Ai | B) = P(Ai) P(B | Ai) / P(B)
This yields the first equality. The second equality follows from the first by using the total probability theorem to rewrite P(B):
P(Ai | B) = P(Ai) P(B | Ai) / P(B) = P(Ai) P(B | Ai) / [P(A1) P(B | A1) + P(A2) P(B | A2) + … + P(An) P(B | An)]
Example 1.16 Radar Detection Let's return to the radar detection problem.
P(aircraft present | alarm) = P(A | B)
= P(A) P(B | A) / P(B)
= P(A) P(B | A) / [P(A) P(B | A) + P(Ac) P(B | Ac)]
= 0.05 · 0.99 / (0.05 · 0.99 + 0.95 · 0.1)
= 0.0495 / 0.1445
≈ 0.3426 !!!
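The same computation, sketched in Python:

```python
# Radar detection numbers from the example.
p_A, p_Ac = 0.05, 0.95
p_B_given_A, p_B_given_Ac = 0.99, 0.10

# Total probability theorem for the denominator, then Bayes' rule.
p_B = p_A * p_B_given_A + p_Ac * p_B_given_Ac
posterior = p_A * p_B_given_A / p_B
```

posterior comes out to about 0.3426: even after an alarm, an aircraft is present only about a third of the time.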
Example 1.17 Let's return to the chess problem of Example 1.13. Here, Ai is the event of getting an opponent of type i, and
P(A1) = 0.5, P(A2) = 0.25, P(A3) = 0.25
Suppose that you win. What is the probability P(A1 | B) that you had an opponent of type 1?
P(A1 | B) = P(A1) P(B | A1) / [P(A1) P(B | A1) + P(A2) P(B | A2) + P(A3) P(B | A3)]
= 0.5 · 0.3 / (0.5 · 0.3 + 0.25 · 0.4 + 0.25 · 0.5)
= 0.15 / 0.375
= 0.4
Example 1.18 A test for a certain rare disease is assumed to be correct 95% of the time: if a
person has the disease, the test results are positive with probability 0.95, and if the person does
not have the disease, the test results are negative with probability 0.95. A random person drawn
from a certain population has probability 0.001 of having the disease. Given that the
person just tested positive, what is the probability of having the disease?
P(A | B) = P(A) P(B | A) / [P(A) P(B | A) + P(Ac) P(B | Ac)]
= 0.001 · 0.95 / (0.001 · 0.95 + 0.999 · 0.05)
≈ 0.0187
very unlikely (less than 2%) !!!
Overview- Total Probability Theorem, Multiplication Rule and
Bayes’ Rule
The key is to choose the partition A1, …, An appropriately, and this choice is often suggested by the problem structure.
Overview-(continued)
(Diagram: a partition A1, A2, … of Ω, with branch probabilities P(A1), P(A2), … and P(B | A1), P(B | A2), ….)
EVENTS: A1 ∩ B, A2 ∩ B, …
PROBABILITIES (obtained by the Multiplication Rule): P(A1 ∩ B) = P(A1) P(B | A1), P(A2 ∩ B) = P(A2) P(B | A2), …
PROBABILITY (obtained by the Total Probability Theorem, i.e., additivity): P(B) = P(A1 ∩ B) + P(A2 ∩ B) + …
BAYES' RULE: P(A1 | B) = P(A1) P(B | A1) / P(B)
INDEPENDENCE
If the occurrence of an event B does not alter the probability that A has occurred, i.e.,
P(A | B) = P(A)
then we say that event A is independent of event B.
By using the definition of conditional probability, P(A | B) = P(A ∩ B) / P(B), the definition of independence can be given as
P(A ∩ B) = P(A) P(B)
We adopt this latter relation as the definition of independence because it can be used even when P(B) = 0, in which case P(A | B) is undefined.
For example: Let A have zero probability. Then P(A ∩ B) = 0, since A ∩ B is a smaller event, and
P(A ∩ B) = 0 = P(A) P(B)
so a zero-probability event is independent of any other event.
On the other hand, independence is not easily visualized in terms of the sample space. A common first thought is that two events are independent if they are disjoint, but in fact the opposite is true: two disjoint events A and B with P(A) > 0 and P(B) > 0 are never independent, since their intersection A ∩ B is empty and has probability 0, while P(A) P(B) > 0.
For example, an event A and its complement Ac are not independent (unless P(A) = 0 or P(A) = 1), since knowledge that A has occurred provides precise information about whether Ac has occurred.
Example 1.19. Consider an experiment involving two successive rolls of a 4-sided die in
which all 16 possible outcomes are equally likely and have probability 1/16.
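The events used in Example 1.19 are not reproduced here; as an illustrative check of independence in this sample space (the events A and B below are assumed for illustration, not taken from the text):

```python
from fractions import Fraction
from itertools import product

omega = list(product(range(1, 5), repeat=2))   # 16 equally likely outcomes

def P(E):
    return Fraction(len(E), len(omega))

A = [w for w in omega if w[0] == 1]            # first roll is 1 (illustrative)
B = [w for w in omega if w[1] == 1]            # second roll is 1 (illustrative)
A_and_B = [w for w in A if w in B]

independent = P(A_and_B) == P(A) * P(B)        # 1/16 == (1/4)(1/4)
```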
Example 1.19 (Continued)
Conditional Independence
Given an event C, the events A and B are called conditionally independent if
P A B | C P A | C PB | C
Example 1.20: Consider two independent fair coin tosses, in which all four possible
outcomes are equally likely. Let
For the case of three events, A1, A2, and A3, independence amounts to satisfying the four conditions
P(A1 ∩ A2) = P(A1) P(A2),
P(A1 ∩ A3) = P(A1) P(A3),
P(A2 ∩ A3) = P(A2) P(A3),
P(A1 ∩ A2 ∩ A3) = P(A1) P(A2) P(A3).
The fourth condition does not follow from the first three. Conversely, the fourth condition
does not imply the first three. See Examples 1.22 and 1.23 (Homework).
Independent Trials and the Binomial Probabilities
Example:
Consider an experiment that consists of n independent tosses of a biased coin, in which
probability of “heads” is p and probability of “tails” is 1-p. In this experiment,
independence means that the events A1, A2, …, An are independent, where Ai = {ith toss is a head}.
Independent Trials and Binomial Probabilities
(Tree: three tosses of a biased coin; each branch is labeled p for H and 1 − p for T. Because of independence, P(H | H) = p, and each leaf probability is the product along its path.)
P(HHH) = p³
P(HHT) = p²(1 − p)
P(HTH) = p²(1 − p)
P(HTT) = p(1 − p)²
P(THH) = p²(1 − p)
P(THT) = p(1 − p)²
P(TTH) = p(1 − p)²
P(TTT) = (1 − p)³
In this example, any particular (or any given) outcome (3-long sequence of heads and
tails) that involves k heads and 3 – k tails has probability
p^k (1 − p)^(3−k).
This formula can be extended to the case of general n tosses. In this case, the probability of any particular (or any given) n-long sequence that contains k heads and n − k tails is
p^k (1 − p)^(n−k).
The event of obtaining k heads in n tosses comprises all possible n-long sequences that contain k heads. Therefore, its probability can be calculated by summing the probabilities of all distinct n-toss sequences that contain k heads, since the distinct n-toss sequences are mutually exclusive:
p(k) = (n choose k) · p^k (1 − p)^(n−k)   (Binomial Probabilities)
(n choose k) : the number of distinct n-toss sequences that contain k heads (read as “n choose k”)
The numbers (n choose k) are known as the binomial coefficients:
(n choose k) = n! / (k! (n − k)!),   k = 0, 1, 2, …, n
where for any positive integer i we have
i! = 1 · 2 · 3 ⋯ i,   and 0! = 1 (by definition)
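The binomial formula can be sketched directly with the standard-library binomial coefficient:

```python
from math import comb

def binomial_pmf(n, k, p):
    """P(exactly k heads in n independent tosses with P(heads) = p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)
```

For instance, binomial_pmf(3, 2, 0.5) gives 0.375: three of the eight equally likely 3-toss sequences contain exactly two heads.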
COUNTING
The calculation of probabilities often involves counting the number of outcomes in
various events. We have already seen two contexts where such counting arises.
1) When the sample space has a finite number of equally likely outcomes, so that the
discrete uniform probability law applies. Then, the probability of any event A is given
by
P(A) = (number of elements of A) / (number of elements of Ω)
and involves counting the elements of A and of Ω.
2) When we want to calculate the probability of an event A whose outcomes are finitely many and each has the same known probability p. Then
P(A) = p · (number of elements of A)
and involves counting the elements of A.
The Counting Principle: The counting principle is based on a divide-and-conquer
approach, whereby the counting is broken down into stages through the use of a tree.
(Figure: a tree with four stages, with n1, n2, n3, n4 choices at the successive stages.)
The Counting Principle: Consider a process that consists of r stages. Suppose that stage i has ni possible results, for each possible sequence of results of the previous stages. Then the total number of possible results of the r-stage process is
n1 · n2 ⋯ nr
Example 1.26. A telephone number is a 7-digit sequence, but the first digit has to be
different from 0 or 1. How many distinct telephone numbers are there?
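By the Counting Principle, the count works out as follows (a sketch):

```python
# 8 choices (2-9) for the first digit, 10 for each of the other six.
n_phone = 8 * 10**6
```

So there are 8,000,000 distinct telephone numbers.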
Example 1.27. Consider an n-element set s1 , s2 , , sn . How many subsets does it have
(including itself and the empty set)?
We can visualize the choice of a subset as a sequential process where we examine one element at a time and decide whether to include it in the set or not. There will be a total of n stages, and a binary choice at each stage. Therefore the number of subsets is
2 · 2 ⋯ 2 (n times) = 2^n.
Two types of counting arguments: permutations and combinations.
Permutation: Selection of k objects out of n objects, paying attention to the order of the selection, is called a permutation.
k-permutations:
Assume that there are n distinct objects and we want to select k of them. In how many different ways can this be done when the order of the selection matters?
The answer is simple:
o Any of the n objects can be chosen to be the first one.
o After choosing the first one, there are only n − 1 possible choices for the second one.
o Given the choice of the first two, there remain n − 2 available objects for the third selection, etc.
o At the kth selection, k − 1 objects have already been selected, so there are n − k + 1 objects available.
By the Counting Principle, the number of possible sequences, called k-permutations, is
n(n − 1) ⋯ (n − k + 1) = n(n − 1) ⋯ (n − k + 1)(n − k) ⋯ 2 · 1 / [(n − k) ⋯ 2 · 1] = n! / (n − k)!
Example Let us select two objects out of the four objects A, B, C, and D, paying attention to the order of the selection:
AB, BA, AC, CA, AD, DA, BC, CB, BD, DB, CD, DC.
n! / (n − k)! = 4! / 2! = 4 · 3 = 12
Example 1.28. Let us count the number of words that consist of four distinct letters.
This problem can be interpreted as counting the number of 4-permutations of the 26 letters in the English alphabet.
n! / (n − k)! = 26! / 22! = 26 · 25 · 24 · 23 = 358,800
Example 1.29. Homework
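Both counts above are available from the standard library; a quick check:

```python
from math import factorial, perm

# Two out of four objects, order mattering: 4!/2! = 12
assert perm(4, 2) == 12
# Four distinct letters out of 26: 26!/22! = 358,800
assert perm(26, 4) == 358_800
# perm(n, k) agrees with the n!/(n-k)! formula
assert perm(26, 4) == factorial(26) // factorial(22)
```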
Combination: Selection of k objects out of n objects, without paying any attention to the order of the selection, is called a combination.
The number of possible combinations is given by
(n choose k) = n! / (k! (n − k)!)
Note that each combination is associated with k! “duplicate” k-permutations (its orderings).
Example 1.30. The number of combinations of two out of the four letters A, B, C, and D is found by letting n = 4 and k = 2. It is
(4 choose 2) = 4! / (2! (4 − 2)!) = 6
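The k! "duplicates" relation between permutations and combinations can be checked directly:

```python
from math import comb, factorial, perm

# Each combination of k objects corresponds to k! ordered selections,
# so C(n, k) = P(n, k) / k!
assert comb(4, 2) == perm(4, 2) // factorial(2)   # 12 / 2 = 6
```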
A more general type of counting: partitions.
Recall that a combination can be viewed as a partition of the set in two: one part contains
k elements and the other contains the remaining n-k. Let us generalize by considering
partitions into more than two subsets.
Partitions:
We are given an n-element set and nonnegative integers n1, n2, …, nr whose sum is equal to n:
n1 + n2 + … + nr = n
We consider partitions of the set into r disjoint subsets, with the ith subset containing
exactly ni elements. How many ways are there to divide n objects into r disjoint groups?
Partitions and Multinomial Coefficients:
Each group can be formed one at a time:
o There are (n choose n1) ways of forming the first group.
o There are (n − n1 choose n2) ways of forming the second group, and so on.
By the Counting Principle, the total number of ways is
(n choose n1) · (n − n1 choose n2) · (n − n1 − n2 choose n3) ⋯ (n − n1 − ⋯ − nr−1 choose nr)
which is equal to
[n! / (n1! (n − n1)!)] · [(n − n1)! / (n2! (n − n1 − n2)!)] ⋯ [(n − n1 − ⋯ − nr−1)! / (nr! (n − n1 − ⋯ − nr)!)]
= n! / (n1! n2! ⋯ nr!)
since the factorials (n − n1)!, (n − n1 − n2)!, … cancel in the telescoping product and n − n1 − ⋯ − nr = 0. The quantity
(n choose n1, n2, …, nr) = n! / (n1! n2! ⋯ nr!)
is called the Multinomial Coefficient.
Recall that a combination splits the set into two groups:
(n choose k) = (n choose k, n − k) = n! / (k! (n − k)!)
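A minimal sketch of the multinomial coefficient (the function name is ours, not from the text):

```python
from math import factorial

def multinomial(n, parts):
    """n! / (n1! n2! ... nr!), valid when sum(parts) == n."""
    if sum(parts) != n:
        raise ValueError("parts must sum to n")
    result = factorial(n)
    for ni in parts:
        result //= factorial(ni)  # each division is exact
    return result
```

For two groups, multinomial(n, [k, n - k]) reduces to the binomial coefficient n choose k.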
Example 1.33. A class consisting of 4 graduate and 12 undergraduate students is randomly
divided into four groups of 4. What is the probability that each group includes a graduate
student?
We first determine the nature of the sample space. A typical outcome is a particular way of
partitioning the 16 students into four groups of 4. We take the term “randomly” to mean
that every possible partition is equally likely, so that the probability question can be reduced
to one of counting.
(16 choose 4, 4, 4, 4) = 16! / (4! 4! 4! 4!)
Example 1.33 (Continued)
Let us now focus on the event that each group contains a graduate student. Two stages:
o The number of ways of distributing the four graduate students to the four groups is 4!.
o The number of ways of distributing the remaining 12 undergraduate students to the four groups, with three undergraduates per group, is
(12 choose 3, 3, 3, 3) = 12! / (3! 3! 3! 3!)
The total number of elements N_A of the event of interest is obtained by the Counting Principle:
N_A = 4! · 12! / (3! 3! 3! 3!)
The probability of this event is
P(A) = N_A / N_S = [4! · 12! / (3! 3! 3! 3!)] / [16! / (4! 4! 4! 4!)]
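The final ratio can be evaluated exactly; a sketch:

```python
from fractions import Fraction
from math import factorial

# N_S: number of ways to split 16 students into four groups of 4.
N_S = factorial(16) // factorial(4)**4
# N_A: 4! ways to place the graduate students, times the number of ways
# to split the 12 undergraduates into four groups of 3.
N_A = factorial(4) * (factorial(12) // factorial(3)**4)

p = Fraction(N_A, N_S)
```

p reduces to 64/455, roughly 0.14.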