Lecture Notes 1

Why is probability important for electrical engineers?

Probability provides a framework to deal with "uncertainty".

Our main objective in this course is to develop the art of describing "uncertainty" in terms of probabilistic models, as well as the skill of probabilistic reasoning (inference).

SAMPLE SPACE AND PROBABILITY

PROBABILISTIC MODELS

A probabilistic model is a mathematical description of an uncertain situation.

Every probabilistic model involves an underlying process, called the experiment, that will produce exactly one out of several possible outcomes.

Elements of a Probabilistic Model:

• The sample space Ω (the set of all possible outcomes)
• The probability law

[Figure: a random experiment produces an outcome in the sample space (the set of possible outcomes); an event A is a subset of the sample space.]

In probability theory, we usually call subsets of the sample space "events".

[Figure: the probability law assigns a number P(A) to event A, P(B) to event B, and so on, for the events of the sample space.]

A probability law is a function that assigns a number to each event. The probability law encodes our knowledge or belief about which events are more likely to occur than others.

P(A) represents our knowledge or belief about how likely event A is to occur.

If the outcome is inside the set A: "event A occurred".
If the outcome is outside the set A: "event A did not occur".

Choosing an Appropriate Sample Space

The elements of the sample space (the outcomes) should be mutually exclusive and collectively exhaustive.

Mutually Exclusive
When the experiment is carried out, there is a unique outcome.
(If one outcome occurs, another cannot occur.)

Collectively Exhaustive
No matter what happens in the experiment, we always obtain an outcome that has been included in the sample space.
(None of the possible outcomes is forgotten.)

In addition: How much detail should we include? The sample space should have enough detail to distinguish between all outcomes while avoiding irrelevant details. For example, the possibility of the coin landing on its edge is generally not included in the probabilistic model.

• Random Experiment: an experiment whose result is uncertain before it is performed.
• Trial: a single performance of a random experiment.
• Outcome: the result of a trial.
• Sample Space S: the set of all possible outcomes of a random experiment.
• Event: a subset of the sample space S (to which a probability can be assigned); a collection of possible outcomes, including the entire sample space and the empty set.
• Sure Event: the sample space S (an event certain to occur).
• Impossible Event: the empty set ∅ (an event that cannot occur).

Difference between "outcomes" and "events" (for a single coin toss):

Outcomes: {H}, {T}
Events: {H}, {T}, {H, T}, ∅

Example: Single roll of a 6-sided die.

Discrete sample space: Ω = {1, 2, 3, 4, 5, 6}

Some events:
A1 = {an odd number shows up} = {1, 3, 5}
A2 = {a number less than 3 shows up} = {1, 2}
A3 = {an even number shows up} = {2, 4, 6}
A4 = {2 shows up} = {2}   (single outcome)
A5 = {a number greater than 6 shows up} = ∅   (no outcome)
A6 = {a number from 1 to 6 shows up} = {1, 2, 3, 4, 5, 6} = Ω   (all outcomes)

Example (continued):

A7 = {2 or 4 shows up} = {2} ∪ {4} = {2, 4}   (*)
A8 = {2 and 4 show up} = {2} ∩ {4} = ∅   (*)

(*) When we think probabilistically, i.e., when we talk about events,
- the word "and" rather than the word "intersection",
- the word "or" rather than the word "union"
are more convenient.

A ∩ B is the event that A occurred "and" B occurred.
A ∪ B is the event that A occurred "or" B occurred.

Sequential Models: Many experiments have an inherently sequential character.
For example: receiving successive digits at a communication receiver.

Tree-Based Sequential Description

Experiment: Two tosses of a coin

[Figure: a tree whose root branches into H and T for the first toss; each branch splits again into H and T for the second toss, giving the four leaves HH, HT, TH, TT.]

H: RESULT of the first toss / first stage
T: RESULT of the second toss / second stage
HT: OUTCOME of the experiment

Textbook: D. P. Bertsekas, J. N. Tsitsiklis, "Introduction to Probability", 2nd Ed., Athena Scientific, 2008.
Sequential Models

Experiment: Two rolls of a 6-sided die

Two equivalent descriptions of the sample space:

[Figure: left, a description in a 2-dimensional grid, with the first roll (1-6) on the horizontal axis and the second roll (1-6) on the vertical axis; right, a tree-based sequential description, with the root branching into 1-6 for the first roll and each branch splitting again into 1-6 for the second roll, ending in the 36 leaves (1,1), (1,2), ..., (6,6).]

1: RESULT of the first roll
2: RESULT of the second roll
(1,2): OUTCOME of the experiment

The probability law P satisfies the following axioms.

Probability Axioms

I. Nonnegativity: P(A) ≥ 0 for every event A.

II. Normalization: The probability of the entire sample space Ω is equal to 1:
P(Ω) = 1

III. Additivity: If A and B are two disjoint events, then the probability of their union satisfies
P(A ∪ B) = P(A) + P(B)

"Probability is common sense reduced to calculation." - Laplace

Let's try to understand this statement through a concrete example: constructing a probability law starting from some common-sense assumptions about a discrete model.

Example 1.2. Consider an experiment involving a single coin toss. There are two possible outcomes: heads (H) and tails (T).

The sample space is Ω = {H, T}.

The events in this sample space are {H, T}, {H}, {T}, ∅.

If the coin is fair, i.e., we believe that heads and tails are "equally likely", then we should assign equal probabilities to the two possible outcomes: P({H}) = P({T}) = 0.5.

Example 1.2. (continued)

The additivity axiom implies that P({H, T}) = P({H}) + P({T}) = 1, which is consistent with the normalization axiom.

Note that the comma (,) in the event {H, T} is used to separate the elements; it does not mean "and" in this notation. In fact, {H, T} = {H or T shows up} = {H} ∪ {T}.

The nonnegativity axiom implies that P(any event) ≥ 0.

Thus, the probability law is given by
P({H, T}) = 1, P({H}) = 0.5, P({T}) = 0.5, P(∅) = 0
and satisfies all three axioms. This is a "legitimate" probability law.

Other probability laws:
P({H, T}) = 1, P({H}) = 0.7, P({T}) = 0.3, P(∅) = 0: legitimate (unfair coin)
P({H, T}) = 1, P({H}) = 0.5, P({T}) = 0.3, P(∅) = 0: not legitimate (the outcome probabilities do not add up to 1)

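To make the "legitimate vs. not legitimate" distinction concrete, here is a minimal Python sketch (our own illustration, not from the textbook; the function name and tolerance are arbitrary choices) that checks the nonnegativity and normalization axioms for a finite sample space. Additivity then holds automatically, since in a discrete model the probability of an event is defined as the sum over its elements:

```python
def is_legitimate(outcome_probs, tol=1e-12):
    """Check a candidate discrete probability law, given as a
    {outcome: probability} dict over a finite sample space."""
    if any(p < 0 for p in outcome_probs.values()):
        return False  # violates the nonnegativity axiom
    # Normalization: the outcome probabilities must sum to 1.
    return abs(sum(outcome_probs.values()) - 1.0) <= tol

print(is_legitimate({"H": 0.5, "T": 0.5}))  # True  (fair coin)
print(is_legitimate({"H": 0.7, "T": 0.3}))  # True  (unfair coin)
print(is_legitimate({"H": 0.5, "T": 0.3}))  # False (sums to 0.8)
```
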
There are many natural properties of a probability law that can be derived from the above axioms.

For example, the normalization and additivity axioms imply that

1 = P(Ω) = P(Ω ∪ ∅) = P(Ω) + P(∅) = 1 + P(∅),

which shows that

P(∅) = 0

Another example: consider three disjoint events A1, A2, and A3. By using the additivity axiom for two disjoint events repeatedly, we obtain

P(A1 ∪ A2 ∪ A3) = P(A1 ∪ (A2 ∪ A3))
                = P(A1) + P(A2 ∪ A3)
                = P(A1) + P(A2) + P(A3)

Generalization: Following the same procedure, a more general property for the case of finitely many sets can be derived:

If the events A1, ..., An are disjoint, then

P(A1 ∪ ... ∪ An) = P(A1) + ... + P(An)

Special case of the above property:

Suppose that we are dealing with a set that consists of a finite number of possible outcomes, and we want to calculate its probability. This finite set can be written as a union of single-element sets:

{s1, s2, ..., sk} = {s1} ∪ ... ∪ {sk}

Therefore

P({s1, s2, ..., sk}) = P({s1}) + P({s2}) + ... + P({sk})

[Figure: an event A = {s1, ..., sk} drawn as a subset of a sample space with elements s1, ..., sn.]

See "Discrete Probability Law" for the formal definition.

Discrete Probability Law

If the sample space consists of a finite number of possible outcomes, then the probability law is specified by the probabilities of the events that consist of a single element. In particular, the probability of any event {s1, s2, ..., sk} is the sum of the probabilities of its individual elements:

P({s1, s2, ..., sk}) = P({s1}) + P({s2}) + ... + P({sk})

Notation: Curly brackets (braces) are used to denote sets.
P({si}): more precise
P(si): simpler

In the special case where the probabilities P(s1), P(s2), ..., P(sn) are all the same (by necessity equal to 1/n, in view of the normalization axiom), we obtain the following:

Discrete Uniform Probability Law

If the sample space consists of n possible outcomes which are equally likely (i.e., all single-element events have the same probability), then the probability of any event A is given by

P(A) = (number of elements of A) / n

For example, if A = {s1, s2, ..., sk}, then

P(A) = P({s1, s2, ..., sk})
     = P(s1) + P(s2) + ... + P(sk)   (by the discrete probability law)
     = 1/n + 1/n + ... + 1/n
     = k/n = (number of elements of A) / n

Example: Consider the experiment of rolling a pair of fair dice (equivalently, two rolls of a fair die).

Ω = {(1,1), (1,2), ..., (6,6)}

The fairness assumption means that each of the 36 possible outcomes has the same probability, 1/36.

[Figure: the sample space of the pair of rolls, drawn as a 6-by-6 grid with the first roll on the horizontal axis and the second roll on the vertical axis.]

By the discrete uniform probability law:

P(the sum of the rolls is even) = 18/36 = 1/2
P(the sum of the rolls is odd) = 18/36 = 1/2
P(the first roll is larger than the second roll) = 15/36
P(at least one roll is equal to 4) = 11/36

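Since the discrete uniform law reduces probability to counting, the four probabilities above can be verified by brute-force enumeration. A short Python sketch (our own illustration, not part of the textbook):

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))  # the 36 equally likely pairs

def prob(event):
    # Discrete uniform law: count the elements of the event, divide by 36.
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

print(prob(lambda o: (o[0] + o[1]) % 2 == 0))  # sum even:        1/2
print(prob(lambda o: (o[0] + o[1]) % 2 == 1))  # sum odd:         1/2
print(prob(lambda o: o[0] > o[1]))             # first > second:  5/12 (= 15/36)
print(prob(lambda o: 4 in o))                  # at least one 4:  11/36
```
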
Properties of Probability Laws:

1) If A ⊂ B, then P(A) ≤ P(B).
2) P(A ∪ B) = P(A) + P(B) − P(A ∩ B).
3) P(A ∪ B) ≤ P(A) + P(B).
4) P(A ∪ B ∪ C) = P(A) + P(A^c ∩ B) + P(A^c ∩ B^c ∩ C).

Proofs of these properties can be obtained analytically or by using Venn diagrams. The proofs are given in the textbook. (HW)

Continuous Models

Discrete models: the coin-tossing experiment, the die-rolling experiment.
Continuous model: a wheel of fortune, continuously calibrated between 0 and 1.

Example 1.4. A wheel of fortune is continuously calibrated from 0 to 1, so the possible outcomes are the numbers in the interval Ω = [0, 1]. What is the probability of an event consisting of a single element?

Answer: It cannot be positive, because then, by the additivity axiom, events with a sufficiently large number of elements would have probability larger than 1. Therefore, the probability of any event that consists of a single element must be 0.

Continuous Models

Example 1.4. (continued)

Note that the probabilities of the single-element events are not sufficient to characterize the probability law for continuous models.

What can we do? "Area" or "length" may be used to define a probability law.

What is the probability of a subinterval [a, b]?

Assign probability b − a to any subinterval [a, b] of [0, 1]; the probability of a set is then calculated by evaluating its "length". This assignment satisfies the three probability axioms and qualifies as a legitimate probability law.

P([a, b]) = (length of the interval of interest) / (length of the sample space) = (b − a)/1 = b − a   (continuous uniform probability law)

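As a sanity check of this law, one can simulate the wheel: the fraction of spins that land in [a, b] should approach b − a. A minimal Monte Carlo sketch (our own; the endpoints, seed, and trial count are arbitrary):

```python
import random

random.seed(0)
a, b = 0.25, 0.60
trials = 100_000
# Each spin is a uniform draw from [0, 1); count how many land in [a, b].
hits = sum(1 for _ in range(trials) if a <= random.random() <= b)
print(hits / trials)  # close to b - a = 0.35
```
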
A Quick Review: SET THEORY
We use set operations; therefore, we need a quick review of set theory.

Notation and Terminology

{x | x satisfies P}, where P is a certain property; "|" means "such that".

Set Operations
• A^c = {x ∈ Ω : x ∉ A}   (complement)
• A ∪ B = {x : x ∈ A or x ∈ B}   (union)
• A ∩ B = {x : x ∈ A and x ∈ B}   (intersection)

A ∩ B = ∅: A and B are "mutually exclusive" or "disjoint".

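These operations map directly onto Python's built-in sets, which makes the definitions easy to experiment with (a small illustration of ours, reusing the die events from earlier):

```python
omega = {1, 2, 3, 4, 5, 6}  # sample space of a single die roll
A = {1, 3, 5}               # an odd number shows up
B = {1, 2}                  # a number less than 3 shows up

print(A | B)            # union A ∪ B:            {1, 2, 3, 5}
print(A & B)            # intersection A ∩ B:     {1}
print(omega - A)        # complement A^c:         {2, 4, 6}
print(A & (omega - A))  # A and A^c are disjoint: set()
```
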
The Algebra of Sets

• (A^c)^c = A   (double complement)
• A ∪ B = B ∪ A and A ∩ B = B ∩ A   (commutative properties)
• A ∪ (B ∪ C) = (A ∪ B) ∪ C and A ∩ (B ∩ C) = (A ∩ B) ∩ C   (associative properties)
• A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C) and A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C)   (distributive properties)
• A ∪ A^c = Ω
• A ∩ Ω = A
• A ∩ ∅ = ∅
• A ∪ ∅ = A

A partition of a set: If the non-empty subsets A1, ..., An are disjoint and their union is Ω, then A1, ..., An are said to form a partition of Ω.

[Figure: Venn diagrams over a sample space Ω: a partition A1, A2, A3 of Ω; three sets A, B, C that are mutually exclusive (disjoint, but not covering Ω); three sets A, B, C that are collectively exhaustive (covering Ω, but overlapping); and three sets A, B, C that are both mutually exclusive and collectively exhaustive, i.e., a partition of Ω.]

CONDITIONAL PROBABILITY
Conditional probability provides us a way to reason (infer) about the outcome of an experiment, based on partial information.

Simple examples:
o In an experiment involving two successive rolls of a die, you are told that the sum of the two rolls is 9. How likely is it that the first roll was a 6?
o In a word-guessing game, the first letter of the word is a "t". What is the likelihood that the second letter is an "h"?

Real-world problems:
o A spot shows up on a radar screen. How likely is it to correspond to an aircraft?
o How likely is it that a person has a disease given that a medical test was negative?

[Figure: if B is known to have occurred, then A can occur only if A ∩ B occurs (Garcia, 2008).]

Knowledge that event B has occurred implies that the outcome of the experiment is in the set B. In computing P(A | B) we can therefore view the experiment as now having the reduced sample space B, as shown in the figure above. The event A occurs in the reduced sample space if and only if the outcome is in A ∩ B.

The "reduced" or "new" sample space: B

Definition: The probability of an event A under the condition that event B has occurred is called the conditional probability of "A given B":

P(A | B) = P(A ∩ B) / P(B),   P(B) > 0

• Conditional probabilities can be viewed as a probability law on a new universe B.

• If the possible outcomes are finitely many and equally likely, then

P(A | B) = (number of elements of A ∩ B) / (number of elements of B).

Conditional Probabilities Specify a Legitimate Probability Law

• P(A ∩ B) ≥ 0 and P(B) > 0, so P(A | B) ≥ 0: the nonnegativity axiom is satisfied.

• P(Ω | B) = P(Ω ∩ B) / P(B) = P(B) / P(B) = 1: the normalization axiom is satisfied.

• Verification of the additivity axiom: Let A1 and A2 be disjoint events related to the same experiment. Then

P(A1 ∪ A2 | B) = P((A1 ∪ A2) ∩ B) / P(B)
              = P((A1 ∩ B) ∪ (A2 ∩ B)) / P(B)   (note that A1 ∩ B and A2 ∩ B are disjoint sets)
              = [P(A1 ∩ B) + P(A2 ∩ B)] / P(B)   (by the additivity axiom for the (unconditional) probability law)
              = P(A1 ∩ B)/P(B) + P(A2 ∩ B)/P(B)
              = P(A1 | B) + P(A2 | B)

Example 1.6. We toss a fair coin three successive times. The events A and B are defined as follows:
A = {more heads than tails come up},  B = {first toss is a head}
Find the conditional probability P(A | B).

Ω = {HHH, HHT, HTH, THH, TTH, THT, HTT, TTT}

A = {HHH, HHT, HTH, THH}  ⇒  P(A) = 4/8 = 1/2   (initial probability)
B = {HHH, HHT, HTH, HTT}  ⇒  P(B) = 4/8 = 1/2
A ∩ B = {HHH, HHT, HTH}  ⇒  P(A ∩ B) = 3/8

P(A | B) = P(A ∩ B) / P(B) = (3/8) / (4/8) = 3/4, or
P(A | B) = (number of elements of A ∩ B) / (number of elements of B) = 3/4
(revised probability)

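The counting shortcut for equally likely outcomes is easy to check by enumerating the eight sequences directly; a brief Python sketch (our own illustration):

```python
from itertools import product

omega = ["".join(t) for t in product("HT", repeat=3)]  # 8 equally likely outcomes
A = {o for o in omega if o.count("H") > o.count("T")}  # more heads than tails
B = {o for o in omega if o[0] == "H"}                  # first toss is a head

# P(A | B) = |A ∩ B| / |B| when all outcomes are equally likely.
print(len(A & B), "/", len(B))  # 3 / 4
print(len(A & B) / len(B))      # 0.75
```
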
Example: Consider rolling a 6-sided die. Let
A = {number 1 shows up},  B = {an odd number shows up},  C = {number 1 or 2 shows up}

Ω = {1, 2, 3, 4, 5, 6}

P(A | B) = P(A ∩ B) / P(B) = P(A) / P(B) = (1/6) / (3/6) = 1/3

P(C | B) = P(C ∩ B) / P(B) = P(A) / P(B) = (1/6) / (3/6) = 1/3   (since C ∩ B = {1} = A)

P(B | C) = P(B ∩ C) / P(C) = (1/6) / (2/6) = 1/2

HW: Example 1.7, Example 1.8

Using Conditional Probability for Modeling

When constructing probabilistic models for experiments that have a sequential character, it is often natural and convenient to first specify conditional probabilities and then use them to determine unconditional probabilities.

The rule P(A ∩ B) = P(B) P(A | B), which is a restatement of the definition of conditional probability, is often helpful in this process.

Example 1.9. Radar Detection

If an aircraft is present in a certain area, a radar detects it and generates an alarm signal with probability 0.99. If an aircraft is not present, the radar generates a (false) alarm with probability 0.10. We assume that an aircraft is present with probability 0.05. What is the probability of no aircraft presence and a false alarm? What is the probability of aircraft presence and no detection?

Let A and B be the events
A = {an aircraft is present},  B = {the radar generates an alarm}
A^c = {an aircraft is not present},  B^c = {the radar does not generate an alarm}

P(A^c ∩ B) = ?
P(A ∩ B^c) = ?

Example 1.9. Radar Detection (continued)

[Figure: a tree. From the root: "aircraft present" A with P(A) = 0.05, and "aircraft not present" A^c with P(A^c) = 0.95. From A: alarm B with P(B | A) = 0.99, and no alarm B^c with P(B^c | A) = 0.01 (missed detection). From A^c: alarm B with P(B | A^c) = 0.10 (false alarm), and no alarm B^c with P(B^c | A^c) = 0.90.]

P(false alarm) = P(not present and alarm) = P(A^c ∩ B) = P(A^c) P(B | A^c) = 0.95 × 0.10 = 0.095

P(missed detection) = P(present and no detection) = P(A ∩ B^c) = P(A) P(B^c | A) = 0.05 × 0.01 = 0.0005

Using Conditional Probability for Modeling

Extending the preceding example, we have a general rule for calculating various probabilities in conjunction with a tree-based sequential description of an experiment. In particular:

(a) We set up the tree so that an event of interest is associated with a leaf. We view the occurrence of the event as a sequence of steps, namely, the traversals of the branches along the path from the root to the leaf.

(b) We record the conditional probabilities associated with the branches of the tree.

(c) We obtain the probability of a leaf by multiplying the probabilities recorded along the corresponding path of the tree.

Multiplication Rule

P(A1 ∩ A2 ∩ ... ∩ An) = P(A1) P(A2 | A1) P(A3 | A1 ∩ A2) ··· P(An | A1 ∩ A2 ∩ ... ∩ A(n−1))

Verification of the multiplication rule: The multiplication rule can be verified by writing

P(A1 ∩ ... ∩ An) = P(A1) · [P(A1 ∩ A2) / P(A1)] · [P(A1 ∩ A2 ∩ A3) / P(A1 ∩ A2)] ··· [P(A1 ∩ ... ∩ An) / P(A1 ∩ ... ∩ A(n−1))]
                = P(A1) P(A2 | A1) P(A3 | A1 ∩ A2) ··· P(An | A1 ∩ ... ∩ A(n−1))

where for the second equality we used the definition of conditional probability.

For the case of just two events, A1 and A2, the multiplication rule is simply the definition of conditional probability.

Multiplication Rule

Example 1.10. Three cards are drawn from an ordinary 52-card deck without replacement (drawn cards are not placed back in the deck). We wish to find the probability that none of the three cards is a heart. We assume that at each step, each one of the remaining cards is equally likely to be picked. By symmetry, this implies that every triplet of cards is equally likely to be drawn.

Let Ai = {the i-th card is not a heart}, i = 1, 2, 3. We want to calculate P(A1 ∩ A2 ∩ A3).

[Figure: a tree in which each stage branches into "not a heart" and "heart"; the branch probabilities along the all-"not a heart" path are 39/52, 38/51, 37/50, and the "heart" branches have probabilities 13/52, 13/51, 13/50.]

P(A1) = 39/52,   P(A2 | A1) = 38/51,   P(A3 | A1 ∩ A2) = 37/50

P(A1 ∩ A2 ∩ A3) = P(A1) P(A2 | A1) P(A3 | A1 ∩ A2) = (39/52) · (38/51) · (37/50)

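The multiplication-rule product can be evaluated exactly and cross-checked by simulation; a sketch of ours (the seed and trial count are arbitrary):

```python
from fractions import Fraction
import random

# Exact value via the multiplication rule.
p_exact = Fraction(39, 52) * Fraction(38, 51) * Fraction(37, 50)
print(p_exact, float(p_exact))  # 703/1700 ≈ 0.4135

# Monte Carlo check: draw 3 cards without replacement, count "no hearts".
random.seed(0)
deck = ["heart"] * 13 + ["other"] * 39
trials = 100_000
hits = sum("heart" not in random.sample(deck, 3) for _ in range(trials))
print(hits / trials)  # should be close to 0.4135
```
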
TOTAL PROBABILITY THEOREM AND BAYES’ RULE
Total Probability Theorem:

Let A1, A2, ..., An be disjoint events that form a partition of the sample space, and assume that P(Ai) > 0 for all i = 1, ..., n. Then for any event B, we have

P(B) = P(A1 ∩ B) + P(A2 ∩ B) + ... + P(An ∩ B)
     = P(A1) P(B | A1) + P(A2) P(B | A2) + ... + P(An) P(B | An)

[Figure: a partition A1, A2, A3 of the sample space Ω, with an event B cut into the pieces A1 ∩ B, A2 ∩ B, A3 ∩ B; this visualizes and verifies the total probability theorem.]

Total Probability Theorem: a "Divide-and-Conquer" Approach

- Partition (divide) the sample space into a number of scenarios (events).
- Then calculate P(B) as the weighted average of its conditional probabilities.

• The key is to choose the partition A1, ..., An appropriately. This choice is often suggested by the problem structure.

[Diagram: applying the multiplication rule along each branch of the sequential tree, and then adding the resulting branch probabilities, yields the total probability theorem.]

Example 1.13. You enter a chess tournament where your probability of winning a game is 0.3 against half the players (call them type 1), 0.4 against a quarter of the players (call them type 2), and 0.5 against the remaining quarter of the players (call them type 3). You play a game against a randomly chosen opponent. What is the probability of winning?

Let Ai be the event of playing with an opponent of type i. We have
P(A1) = 0.5,  P(A2) = 0.25,  P(A3) = 0.25

Let also B be the event of winning. We have
P(B | A1) = 0.3,  P(B | A2) = 0.4,  P(B | A3) = 0.5

P(B) = P(A1) P(B | A1) + P(A2) P(B | A2) + P(A3) P(B | A3)
     = 0.5 × 0.3 + 0.25 × 0.4 + 0.25 × 0.5
     = 0.375

HW: Example 1.15

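The computation is a one-line weighted average; a minimal Python sketch of ours:

```python
priors = [0.5, 0.25, 0.25]   # P(A1), P(A2), P(A3): opponent-type probabilities
win_given = [0.3, 0.4, 0.5]  # P(B | Ai): probability of winning against each type

# Total probability theorem: P(B) is the weighted average of the conditionals.
p_win = sum(p * w for p, w in zip(priors, win_given))
print(p_win)  # 0.375
```
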
Bayes' Rule

Let A1, A2, ..., An be disjoint events that form a partition of the sample space, and assume that P(Ai) > 0 for all i = 1, ..., n. Then for any event B such that P(B) > 0, we have

P(Ai | B) = P(Ai) P(B | Ai) / P(B)   (reversing the order of conditioning)

          = P(Ai) P(B | Ai) / [P(A1) P(B | A1) + P(A2) P(B | A2) + ... + P(An) P(B | An)]

          = P(Ai) P(B | Ai) / Σ_{j=1}^{n} P(Aj) P(B | Aj)

P(Ai): the prior probability of event Ai
(the probability of Ai without knowing that event B has occurred)

P(Ai | B): the posterior probability of event Ai
(the probability of Ai knowing that event B has occurred)

Verification: To verify Bayes' rule, note that by the definition of conditional probability, we have

P(Ai ∩ B) = P(Ai) P(B | Ai) = P(B) P(Ai | B)  ⇒  P(Ai | B) = P(Ai) P(B | Ai) / P(B)

This yields the first equality. The second equality follows from the first by using the total probability theorem to rewrite P(B):

P(Ai | B) = P(Ai) P(B | Ai) / P(B) = P(Ai) P(B | Ai) / [P(A1) P(B | A1) + ... + P(An) P(B | An)]

Example 1.16. Radar Detection. Let's return to the radar detection problem:

A = {an aircraft is present},  B = {the radar generates an alarm}
A^c = {an aircraft is not present},  B^c = {the radar does not generate an alarm}

We are given that P(A) = 0.05, P(B | A) = 0.99, P(B | A^c) = 0.10.

Applying Bayes' rule with A1 = A and A2 = A^c:

P(aircraft present | alarm) = P(A | B)
  = P(A) P(B | A) / P(B)
  = P(A) P(B | A) / [P(A) P(B | A) + P(A^c) P(B | A^c)]
  = (0.05 × 0.99) / (0.05 × 0.99 + 0.95 × 0.10)
  = 0.0495 / 0.1445
  ≈ 0.3426 (!)

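This posterior computation is worth packaging as a small helper; the sketch below (our own, with a function name of our choosing) implements Bayes' rule on top of the total probability theorem:

```python
def bayes_posterior(priors, likelihoods):
    """Posterior P(Ai | B) for a partition A1..An, given the priors
    P(Ai) and the likelihoods P(B | Ai)."""
    # The total probability theorem gives the denominator P(B).
    p_b = sum(p * l for p, l in zip(priors, likelihoods))
    return [p * l / p_b for p, l in zip(priors, likelihoods)]

# Radar detection: A1 = aircraft present, A2 = aircraft not present.
post = bayes_posterior([0.05, 0.95], [0.99, 0.10])
print(post[0])  # P(A | B) ≈ 0.3426
```
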
Example 1.17. Let's return to the chess problem of Example 1.13. Here Ai is the event of getting an opponent of type i, and
P(A1) = 0.5,  P(A2) = 0.25,  P(A3) = 0.25

Also, B is the event of winning, and
P(B | A1) = 0.3,  P(B | A2) = 0.4,  P(B | A3) = 0.5

Suppose that you win. What is the probability P(A1 | B) that you had an opponent of type 1?

Using Bayes' rule, we have

P(A1 | B) = P(A1) P(B | A1) / [P(A1) P(B | A1) + P(A2) P(B | A2) + P(A3) P(B | A3)]
          = (0.5 × 0.3) / (0.5 × 0.3 + 0.25 × 0.4 + 0.25 × 0.5)
          = 0.4

Example 1.18. A test for a certain rare disease is assumed to be correct 95% of the time: if a person has the disease, the test results are positive with probability 0.95, and if the person does not have the disease, the test results are negative with probability 0.95. A random person drawn from a certain population has probability 0.001 of having the disease. Given that the person just tested positive, what is the probability of having the disease?

Let's define the events A and B as
A = {the person has the disease}
B = {the test results are positive}

P(A | B) = P(A) P(B | A) / [P(A) P(B | A) + P(A^c) P(B | A^c)]
         = (0.001 × 0.95) / (0.001 × 0.95 + 0.999 × 0.05)
         ≈ 0.0187

So even after a positive test, having the disease is still very unlikely (less than 2%) (!)

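The same arithmetic in a few self-contained lines (our own sketch):

```python
p_disease = 0.001           # prior P(A)
p_pos_given_disease = 0.95  # P(B | A)
p_pos_given_healthy = 0.05  # P(B | A^c): false-positive rate

# Total probability theorem for P(B), then Bayes' rule.
p_pos = p_disease * p_pos_given_disease + (1 - p_disease) * p_pos_given_healthy
print(p_disease * p_pos_given_disease / p_pos)  # ≈ 0.0187
```
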
Overview: Total Probability Theorem, Multiplication Rule, and Bayes' Rule

The key is to choose the partition A1, ..., An appropriately, and this choice is often suggested by the problem structure.

[Diagram: for a partition A1, A2, A3 of the sample space, the multiplication rule gives the probabilities P(Ai ∩ B) = P(Ai) P(B | Ai) along the branches; since B = (A1 ∩ B) ∪ (A2 ∩ B) ∪ (A3 ∩ B), additivity yields the total probability theorem P(B) = P(A1 ∩ B) + P(A2 ∩ B) + P(A3 ∩ B).]

Bayes' rule then reverses the conditioning: P(A1 | B) = P(A1) P(B | A1) / P(B).

INDEPENDENCE
If the occurrence of an event B does not alter the probability that A has occurred, i.e.,

P(A | B) = P(A),

then we say that event A is independent of event B.

By the definition of conditional probability, P(A | B) = P(A ∩ B) / P(B), so the definition of independence can be given as

P(A ∩ B) = P(A) P(B)

We adopt this latter relation as the definition of independence because it can be used even when P(B) = 0, in which case P(A | B) is undefined.

For example, let A have zero probability. Then P(A ∩ B) = 0, since A ∩ B is a smaller event than A, and so P(A ∩ B) = 0 = P(A) P(B): a zero-probability event is independent of any other event.

• Symmetric property: If A is independent of B, then B is independent of A, and we can say that A and B are independent events.

Independence is often easy to grasp intuitively. For example, if the occurrence of two events is controlled by distinct and noninteracting physical processes, such events will turn out to be independent.

On the other hand, independence is not easily visualized in terms of the sample space. A common first thought is that two events are independent if they are disjoint, but in fact the opposite is true: two disjoint events A and B with P(A) > 0 and P(B) > 0 are never independent, since their intersection A ∩ B is empty and has probability 0.

For example, an event A and its complement A^c are not independent (unless P(A) = 0 or P(A) = 1), since knowledge that A has occurred provides precise information about whether A^c has occurred.

Example 1.19. Consider an experiment involving two successive rolls of a 4-sided die, in which all 16 possible outcomes are equally likely and have probability 1/16.

b) Are the events
A = {1st roll is a 1},  B = {sum of the two rolls is a 5}
independent?

P(A ∩ B) = P(the result of the two rolls is (1,4)) = 1/16

P(A) = (number of elements of A) / (total number of possible outcomes) = 4/16

P(B) = (number of elements of B) / (total number of possible outcomes) = 4/16, since B = {(1,4), (2,3), (3,2), (4,1)}

P(A ∩ B) = P(A) P(B), so the events A and B are independent.

Example 1.19 (continued)

c) Are the events
A = {maximum of the two rolls is 2},  B = {minimum of the two rolls is 2}
independent?

P(A ∩ B) = P(the result of the two rolls is (2,2)) = 1/16

A = {(1,2), (2,1), (2,2)}
P(A) = (number of elements of A) / (total number of possible outcomes) = 3/16

B = {(2,2), (3,2), (2,3), (4,2), (2,4)}
P(B) = (number of elements of B) / (total number of possible outcomes) = 5/16

P(A ∩ B) ≠ P(A) P(B), so the events A and B are not independent.

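Both parts of Example 1.19 can be settled mechanically by testing the defining identity P(A ∩ B) = P(A)P(B) over the 16 outcomes; a Python sketch of ours:

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 5), repeat=2))  # 16 equally likely pairs

def prob(event):
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

def independent(ev_a, ev_b):
    # Exact test of P(A ∩ B) == P(A) * P(B) using rational arithmetic.
    return prob(lambda o: ev_a(o) and ev_b(o)) == prob(ev_a) * prob(ev_b)

# Part (b): first roll is 1 vs. sum is 5.
print(independent(lambda o: o[0] == 1, lambda o: sum(o) == 5))    # True
# Part (c): maximum is 2 vs. minimum is 2.
print(independent(lambda o: max(o) == 2, lambda o: min(o) == 2))  # False
```
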
Conditional Independence

Given an event C, the events A and B are called conditionally independent if

P(A ∩ B | C) = P(A | C) P(B | C)

Example 1.20. Consider two independent fair coin tosses, in which all four possible outcomes are equally likely. Let

H1 = {1st toss is a head},  H2 = {2nd toss is a head},
D = {the two tosses have different results}

The events H1 and H2 are (unconditionally) independent. But

P(H1 | D) = 1/2,  P(H2 | D) = 1/2,  P(H1 ∩ H2 | D) = 0   (new universe: {HT, TH})

so that

P(H1 ∩ H2 | D) ≠ P(H1 | D) P(H2 | D)

and H1, H2 are not conditionally independent.

• Independence does not imply conditional independence, and vice versa.

Independence of a Collection of Events

The events A1, A2, ..., An are said to be independent if

P(∩_{i ∈ S} Ai) = ∏_{i ∈ S} P(Ai)   for every subset S of {1, 2, ..., n}.

For the case of three events A1, A2, and A3, independence amounts to satisfying the four conditions

P(A1 ∩ A2) = P(A1) P(A2),
P(A1 ∩ A3) = P(A1) P(A3),
P(A2 ∩ A3) = P(A2) P(A3),
P(A1 ∩ A2 ∩ A3) = P(A1) P(A2) P(A3).

The fourth condition does not follow from the first three. Conversely, the fourth condition does not imply the first three. See Examples 1.22 and 1.23 (Homework).

Independent Trials and the Binomial Probabilities

• Independent Trials: If an experiment involves a sequence of independent but identical stages, then each stage is called an independent trial.
• Bernoulli Trials: In particular, if there are only two possible results at each stage, then the sequence of these independent trials is called a sequence of independent Bernoulli trials.

Example: Consider an experiment that consists of n independent tosses of a biased coin, in which the probability of "heads" is p and the probability of "tails" is 1 − p. In this experiment, independence means that the events A1, A2, ..., An are independent, where

Ai = {i-th toss is a head}

This experiment can be visualized for n = 3 by the following tree-based sequential description:

[Figure: a three-stage tree. Each stage branches into H (probability p) and T (probability 1 − p); by independence, P(H | H) = p at every stage. The eight leaves and their probabilities are:
P(HHH) = p^3,  P(HHT) = p^2 (1 − p),  P(HTH) = p^2 (1 − p),  P(HTT) = p (1 − p)^2,
P(THH) = p^2 (1 − p),  P(THT) = p (1 − p)^2,  P(TTH) = p (1 − p)^2,  P(TTT) = (1 − p)^3.]

In this example, any particular (given) outcome (a 3-long sequence of heads and tails) that involves k heads and 3 − k tails has probability

p^k (1 − p)^(3−k).

This formula extends to the case of general n tosses: the probability of any particular n-long sequence that contains k heads and n − k tails is

p^k (1 − p)^(n−k).

Let us now consider the probability

p(k) = P(k heads come up in an n-toss sequence)

The probability above accounts for all possible n-long sequences that contain k heads. Therefore it can be calculated by summing the probabilities of all distinct n-toss sequences that contain k heads, since these distinct sequences are mutually exclusive:

p(k) = C(n, k) p^k (1 − p)^(n−k)   (binomial probabilities)

C(n, k): the number of distinct n-toss sequences that contain k heads (read as "n choose k").

The numbers C(n, k) are known as the binomial coefficients:

C(n, k) = n! / (k! (n − k)!),   k = 0, 1, 2, ..., n

where for any positive integer i we have

i! = 1 · 2 · 3 ··· i,   0! = 1 (by definition).

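A direct transcription of the formula, using the standard-library binomial coefficient (our own sketch; the values of n and p are arbitrary):

```python
from math import comb

def binomial_pmf(n, k, p):
    # Probability of exactly k heads in n independent tosses,
    # each with P(head) = p.
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 3, 0.6
for k in range(n + 1):
    print(k, binomial_pmf(n, k, p))

# Normalization: the probabilities over k = 0..n sum to 1.
print(sum(binomial_pmf(n, k, p) for k in range(n + 1)))  # 1.0 (up to rounding)
```
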
COUNTING
The calculation of probabilities often involves counting the number of outcomes in various events. We have already seen two contexts where such counting arises.

1) When the sample space Ω has a finite number of equally likely outcomes, so that the discrete uniform probability law applies. Then the probability of any event A is given by

P(A) = (number of elements of A) / (number of elements of Ω)

and involves counting the elements of A and of Ω.

2) When we want to calculate the probability of an event A with a finite number of equally likely outcomes, each of which has an already known probability p. Then the probability of A is given by

P(A) = p × (number of elements of A)

and involves counting the elements of A.

The Counting Principle: The counting principle is based on a divide-and-conquer approach, whereby the counting is broken down into stages through the use of a tree.

[Figure: a tree whose stages 1, 2, 3, 4 offer n1, n2, n3, n4 choices respectively; the equally likely outcomes are the leaves.]

Consider a process that consists of r stages. Suppose that:
o There are n1 possible results for the first stage.
o For every possible result of the first stage, there are n2 possible results at the second stage.
o More generally, for all possible results of the first i − 1 stages, there are ni possible results at the i-th stage.

Then the total number of possible results of the r-stage process is

n1 · n2 ··· nr

Example 1.26. A telephone number is a 7-digit sequence, but the first digit has to be different from 0 or 1. How many distinct telephone numbers are there?

8 · 10 · 10 · 10 · 10 · 10 · 10 = 8 × 10^6

Example 1.27. Consider an n-element set {s1, s2, ..., sn}. How many subsets does it have (including itself and the empty set)?

We can visualize the choice of a subset as a sequential process where we examine one element at a time and decide whether or not to include it in the subset. There is a total of n stages, and a binary choice at each stage. Therefore the number of subsets is

2 · 2 ··· 2 (n times) = 2^n.

Two types of counting arguments: permutations and combinations.

Permutation: A selection of k objects out of n objects in which we pay attention to the order of the selection is called a permutation.

k-permutations: Assume that there are n distinct objects and we want to select k of them. How many different ways are there when the order of the selection matters?

The answer is simple:
o Any of the n objects can be chosen to be the first one.
o After choosing the first, there are only n − 1 possible choices for the second.
o Given the choice of the first two, there remain n − 2 available objects for the third selection, etc.
o At the k-th selection, k − 1 objects have already been selected, so n − (k − 1) objects are available for the last selection.

By the counting principle, the number of possible sequences, called k-permutations, is

n (n − 1) ··· (n − k + 1) = [n (n − 1) ··· (n − k + 1) · (n − k) ··· 2 · 1] / [(n − k) ··· 2 · 1] = n! / (n − k)!

Example: Let us select two objects out of the four objects A, B, C, and D, paying attention to the order of the selection.

The 2-permutations of the objects A, B, C, and D are

AB, BA, AC, CA, AD, DA, BC, CB, BD, DB, CD, DC.

n! / (n − k)! = 4! / 2! = 4 · 3 = 12

Example 1.28. Let us count the number of words that consist of four distinct letters. This can be interpreted as counting the number of 4-permutations of the 26 letters in the English alphabet:

n! / (n − k)! = 26! / 22! = 26 · 25 · 24 · 23 = 358,800

Example 1.29. Homework

Combination: A selection of k objects out of n objects without paying any attention to the order of the selection is called a combination.

The 2-permutations of the letters A, B, C, and D are

AB, BA, AC, CA, AD, DA, BC, CB, BD, DB, CD, DC.

However, the combinations of two out of these four letters are

AB, AC, AD, BC, BD, CD

The number of possible combinations is given by

C(n, k) = n! / (k! (n − k)!)

Note that each combination is associated with k! "duplicate" permutations.

Example 1.30. The number of combinations of two out of the four letters A, B, C, and D is found by letting n = 4 and k = 2. It is

C(4, 2) = 4! / (2! (4 − 2)!) = 6

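Python's standard library exposes both counts directly, and itertools lets us list the selections themselves to see the k! collapse; a small illustration of ours:

```python
from math import comb, perm
from itertools import combinations, permutations

letters = "ABCD"
# 2-permutations: order matters, 4!/(4-2)! = 12 sequences.
print(perm(4, 2), ["".join(t) for t in permutations(letters, 2)])
# Combinations: order ignored, 4!/(2!(4-2)!) = 6 selections; each one
# collapses 2! = 2 duplicate permutations into a single selection.
print(comb(4, 2), ["".join(t) for t in combinations(letters, 2)])
```
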
A more general type of counting: partitions.

Recall that a combination can be viewed as a partition of the set in two: one part contains k elements and the other contains the remaining n − k. Let us generalize by considering partitions into more than two subsets.

Partitions: We are given an n-element set and nonnegative integers n1, n2, ..., nr with

n1 + n2 + ... + nr = n

We consider partitions of the set into r disjoint subsets, with the i-th subset containing exactly ni elements. How many ways are there to divide the n objects into the r disjoint groups?

Partitions and Multinomial Coefficients: The groups can be formed one at a time.

o There are C(n, n1) ways of forming the first group.
o There are C(n − n1, n2) ways of forming the second group, and so on.
o By the counting principle, the number of partitions is

C(n, n1) · C(n − n1, n2) · C(n − n1 − n2, n3) ··· C(n − n1 − ··· − n(r−1), nr)

which is equal to

[n! / (n1! (n − n1)!)] · [(n − n1)! / (n2! (n − n1 − n2)!)] ··· = n! / (n1! n2! ··· nr!)

since all the intermediate factorials cancel. This quantity, written (n; n1, n2, ..., nr), is called the multinomial coefficient.

Recall that a combination is the special case of two groups:

C(n, k) = (n; k, n − k) = n! / (k! (n − k)!)

Example 1.33. A class consisting of 4 graduate and 12 undergraduate students is randomly divided into four groups of 4. What is the probability that each group includes a graduate student?

We first determine the nature of the sample space. A typical outcome is a particular way of partitioning the 16 students into four groups of 4. We take the term "randomly" to mean that every possible partition is equally likely, so the probability question can be reduced to one of counting.

According to our discussion, there are

(16; 4, 4, 4, 4) = 16! / (4! 4! 4! 4!)

different partitions; this is the size of the sample space, N_S.

Example 1.33 (continued)

Let us now focus on the event that each group contains a graduate student. Count in two stages:
o The number of ways of distributing the four graduate students to the four groups is 4!.
o The number of ways of distributing the remaining 12 undergraduate students to the four groups, with each group getting three undergraduates, is

(12; 3, 3, 3, 3) = 12! / (3! 3! 3! 3!)

By the counting principle, the total number of elements of the event of interest is

N_A = 4! · 12! / (3! 3! 3! 3!)

The probability of this event is

P(A) = N_A / N_S = [4! · 12! / (3! 3! 3! 3!)] / [16! / (4! 4! 4! 4!)]

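Evaluating this ratio exactly takes only a few lines; a sketch of ours (the multinomial helper is our own naming, not a library function):

```python
from fractions import Fraction
from math import factorial

def multinomial(n, parts):
    # n! / (n1! * n2! * ... * nr!), requiring n1 + ... + nr = n.
    assert sum(parts) == n
    out = factorial(n)
    for p in parts:
        out //= factorial(p)  # each partial quotient is an integer
    return out

n_s = multinomial(16, [4, 4, 4, 4])                 # size of the sample space
n_a = factorial(4) * multinomial(12, [3, 3, 3, 3])  # every group gets a grad student
p = Fraction(n_a, n_s)
print(p, float(p))  # 64/455 ≈ 0.1407
```
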
