0% found this document useful (0 votes)
56 views32 pages

DMS 5.1-Probability

Slides for a Course Based on the TextDiscrete Mathematics & Its Applications (5th Edition)by Kenneth H. Rosen

Uploaded by

Dhamodhar Reddy
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPT, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
56 views32 pages

DMS 5.1-Probability

Slides for a Course Based on the TextDiscrete Mathematics & Its Applications (5th Edition)by Kenneth H. Rosen

Uploaded by

Dhamodhar Reddy
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPT, PDF, TXT or read online on Scribd
You are on page 1/ 32

Module #19 – Probability

University of Florida
Dept. of Computer & Information Science & Engineering

COT 3100
Applications of Discrete Structures
Dr. Michael P. Frank

Slides for a Course Based on the Text


Discrete Mathematics & Its Applications
(5th Edition)
by Kenneth H. Rosen

08/28/25 (c)2001-2003, Michael P. Frank 1


Module #19 – Probability

Module #19:
Probability Theory

Rosen 5th ed., ch. 5 (§§5.1-5.3)


~26 slides, ~1 lecture

08/28/25 (c)2001-2003, Michael P. Frank 2


Module #19 – Probability

Why Probability?
• In the real world, we often don’t know
whether a given proposition is true or false.
• Probability theory gives us a way to reason
about propositions whose truth is uncertain.
• It is useful in weighing evidence,
diagnosing problems, and analyzing
situations whose exact details are unknown.

08/28/25 (c)2001-2003, Michael P. Frank 3


Module #19 – Probability

Random Variables
• A “random variable” V is any variable whose
value is unknown, or whose value depends on
the precise situation.
– E.g., the number of students in class today
– Whether it will rain tonight (Boolean variable)
• Let the domain of V be dom[V]≡{v1,…,vn}
– Infinite domains can also be dealt with if needed.
• The proposition V=vi may have an uncertain
truth value, and may be assigned a probability.
08/28/25 (c)2001-2003, Michael P. Frank 4
Module #19 – Probability
Information Capacity
• The information capacity I[V] of a random variable V
with a finite domain can be defined as the logarithm
(with indeterminate base) of the size of the domain of V,
I[V] :≡ log |dom[V]|.
– The log’s base determines the associated information unit!
• Taking the log base 2 yields an information unit of 1 bit b = log 2.
– Related units include the nybble N = 4 b = log 16 (1 hexadecimal digit),
– and more famously, the byte B = 8 b = log 256.
• Other common logarithmic units that can be used as units of
information:
– the nat, or e-fold n = log e,
» widely known in thermodynamics as Boltzmann’s constant k.
– the bel or decade or order of magnitude (D = log 10),
– and the decibel or dB = D/10 = (log 10)/10 ≈ log 1.2589
• Example: An 8-bit register has 28 = 256 possible values.
– Its information capacity is thus: log 256 = 8 log 2 = 8 b!
• Or 2N, or 1B, or loge256 ≈ 5.545 n, or log10256 = 2.408 D, or 24.08 dB
08/28/25 (c)2001-2003, Michael P. Frank 5
Module #19 – Probability

Experiments & Sample Spaces


• A (stochastic) experiment is any process by which
a given random variable V gets assigned some
particular value, and where this value is not
necessarily known in advance.
– We call it the “actual” value of the variable, as
determined by that particular experiment.
• The sample space S of the experiment is just
the domain of the random variable, S = dom[V].
• The outcome of the experiment is the specific
value vi of the random variable that is selected.

08/28/25 (c)2001-2003, Michael P. Frank 6


Module #19 – Probability

Events
• An event E is any set of possible outcomes in S…
– That is, E  S = dom[V].
• E.g., the event that “less than 50 people show up for our next
class” is represented as the set {1, 2, …, 49} of values of the
variable V = (# of people here next class).
• We say that event E occurs when the actual value
of V is in E, which may be written VE.
– Note that VE denotes the proposition (of uncertain
truth) asserting that the actual outcome (value of V) will
be one of the outcomes in the set E.

08/28/25 (c)2001-2003, Michael P. Frank 7


Module #19 – Probability

Probability
• The probability p = Pr[E]  [0,1] of an event E is
a real number representing our degree of certainty
that E will occur.
– If Pr[E] = 1, then E is absolutely certain to occur,
• thus VE has the truth value True.
– If Pr[E] = 0, then E is absolutely certain not to occur,
• thus VE has the truth value False.
– If Pr[E] = ½, then we are maximally uncertain about
whether E will occur; that is,
• VE and VE are considered equally likely.
– How do we interpret other values of p?
Note: We could also define probabilities for more
general propositions, as well as events.
08/28/25 (c)2001-2003, Michael P. Frank 8
Module #19 – Probability

Four Definitions of Probability


• Several alternative definitions of probability
are commonly encountered:
– Frequentist, Bayesian, Laplacian, Axiomatic
• They have different strengths &
weaknesses, philosophically speaking.
– But fortunately, they coincide with each other
and work well together, in the majority of cases
that are typically encountered.

08/28/25 (c)2001-2003, Michael P. Frank 9


Module #19 – Probability

Probability: Frequentist Definition


• The probability of an event E is the limit, as n→∞, of
the fraction of times that we find VE over the course
of n independent repetitions of (different instances of)
the same experiment. nV E
Pr[ E ] :lim
• Some problems with this definition: n  n

– It is only well-defined for experiments that can be


independently repeated, infinitely many times!
• or at least, if the experiment can be repeated in principle, e.g., over
some hypothetical ensemble of (say) alternate universes.
– It can never be measured exactly in finite time!
• Advantage: It’s an objective, mathematical definition.
08/28/25 (c)2001-2003, Michael P. Frank 10
Module #19 – Probability

Probability: Bayesian Definition


• Suppose a rational, profit-maximizing entity R is
offered a choice between two rewards:
– Winning $1 if and only if the event E actually occurs.
– Receiving p dollars (where p[0,1]) unconditionally.
• If R can honestly state that he is completely
indifferent between these two rewards, then we say
that R’s probability for E is p, that is, PrR[E] :≡ p.
• Problem: It’s a subjective definition; depends on the
reasoner R, and his knowledge, beliefs, & rationality.
– The version above additionally assumes that the utility of
money is linear.
• This assumption can be avoided by using “utils” (utility units)
instead of dollars.

08/28/25 (c)2001-2003, Michael P. Frank 11


Module #19 – Probability

Probability: Laplacian Definition


• First, assume that all individual outcomes in the
sample space are equally likely to each other…
– Note that this term still needs an operational definition!
• Then, the probability of any event E is given by,
Pr[E] = |E|/|S|. Very simple!
• Problems: Still needs a definition for equally likely,
and depends on the existence of some finite sample
space S in which all outcomes in S are, in fact,
equally likely.

08/28/25 (c)2001-2003, Michael P. Frank 12


Module #19 – Probability

Probability: Axiomatic Definition


• Let p be any total function p:S→[0,1] such that
∑s p(s) = 1.
• Such a p is called a probability distribution.
• Then, the probability under p
of any event ES is just:
Prp [ E ] : p ( s )
sE

• Advantage: Totally mathematically well-defined!
– This definition can even be extended to apply to infinite
sample spaces, by changing ∑→∫, and calling p a
probability density function or a probability measure.
• Problem: Leaves operational meaning unspecified.
08/28/25 (c)2001-2003, Michael P. Frank 13
Module #19 – Probability

Probabilities of Mutually
Complementary Events
• Let E be an event in a sample space S.
• Then, E represents the complementary
event, saying that the actual value of VE.
• Theorem: Pr[E] = 1 − Pr[E]
– This can be proved using the Laplacian
definition of probability, since Pr[E] = |E|/|S| =
(|S|−|E|)/|S| = 1 − |E|/|S| = 1 − Pr[E].
• Other definitions can also be used to prove it.

08/28/25 (c)2001-2003, Michael P. Frank 14


Module #19 – Probability

Probability vs. Odds Exercise:


Express the
• You may have heard the term “odds.” probability
– It is widely used in the gambling community. p as a function
• This is not the same thing as probability! of the odds in
– But, it is very closely related. favor O.
• The odds in favor of an event E means the relative
probability of E compared with its complement E.
O(E) :≡ Pr(E)/Pr(E).
– E.g., if p(E) = 0.6 then p(E) = 0.4 and O(E) = 0.6/0.4 = 1.5.
• Odds are conventionally written as a ratio of integers.
– E.g., 3/2 or 3:2 in above example. “Three to two in favor.”
• The odds against E just means 1/O(E). “2 to 3 against”
08/28/25 (c)2001-2003, Michael P. Frank 15
Module #19 – Probability

Example 1: Balls-and-Urn
• Suppose an urn contains 4 blue balls and 5 red balls.
• An example experiment: Shake up the urn, reach in
(without looking) and pull out a ball.
• A random variable V: Identity of the chosen ball.
• The sample space S: The set of
all possible values of V:
– In this case, S = {b1,…,b9}
b1 b
• An event E: “The ball chosen is 2
b7 b9
b3 b5
blue”: E = { ______________ }
b4 b b8
• What are the odds in favor of E? 6

• What is the probability of E? (Use Laplacian def’n.)


08/28/25 (c)2001-2003, Michael P. Frank 16
Module #19 – Probability

Example 2: Seven on Two Dice


• Experiment: Roll a pair of
fair (unweighted) 6-sided dice.
• Describe a sample space for this
experiment that fits the Laplacian definition.
• Using this sample space, represent an event E
expressing that “the upper spots sum to 7.”
• What is the probability of E?

08/28/25 (c)2001-2003, Michael P. Frank 17


Module #19 – Probability

Probability of Unions of Events


• Let E1,E2  S = dom[V].
• Then we have: Theorem:
Pr[E1 E2] = Pr[E1] + Pr[E2] − Pr[E1E2]
– By the inclusion-exclusion principle, together
with the Laplacian definition of probability.
• You should be able to easily flesh out the
proof yourself at home.

08/28/25 (c)2001-2003, Michael P. Frank 18


Module #19 – Probability

Mutually Exclusive Events


• Two events E1, E2 are called mutually
exclusive if they are disjoint: E1E2 = 
– Note that two mutually exclusive events cannot
both occur in the same instance of a given
experiment.
• For mutually exclusive events,
Pr[E1  E2] = Pr[E1] + Pr[E2].
– Follows from the sum rule of combinatorics.
08/28/25 (c)2001-2003, Michael P. Frank 19
Module #19 – Probability

Exhaustive Sets of Events


• A set E = {E1, E2, …} of events in the sample
space S is called exhaustive iff Ei S .
• An exhaustive set E of events that are all mutually
exclusive with each other has the property that
 Pr[ Ei ] 1.
• You should be able to easily prove this theorem,
using either the Laplacian or Axiomatic
definitions of probability from earlier.

08/28/25 (c)2001-2003, Michael P. Frank 20


Module #19 – Probability

Independent Events
• Two events E,F are called independent if
Pr[EF] = Pr[E]·Pr[F].
• Relates to the product rule for the number
of ways of doing two independent tasks.
• Example: Flip a coin, and roll a die.
Pr[(coin shows heads)  (die shows 1)] =
Pr[coin is heads] × Pr[die is 1] = ½×1/6 =1/12.

08/28/25 (c)2001-2003, Michael P. Frank 21


Module #19 – Probability

Conditional Probability
• Let E,F be any events such that Pr[F]>0.
• Then, the conditional probability of E given F,
written Pr[E|F], is defined as
Pr[E|F] :≡ Pr[EF]/Pr[F].
• This is what our probability that E would turn out
to occur should be, if we are given only the
information that F occurs.
• If E and F are independent then Pr[E|F] = Pr[E].
 Pr[E|F] = Pr[EF]/Pr[F] = Pr[E]×Pr[F]/Pr[F] = Pr[E]

08/28/25 (c)2001-2003, Michael P. Frank 22


Module #19 – Probability

Prior and Posterior Probability


• Suppose that, before you are given any information about the
outcome of an experiment, your personal probability for an
event E to occur is p(E) = Pr[E].
– The probability of E in your original probability distribution p is called
the prior probability of E.
• This is its probability prior to obtaining any information about the outcome.
• Now, suppose someone tells you that some event F (which may
overlap with E) actually occurred in the experiment.
– Then, you should update your personal probability for event E to occur,
to become p′(E) = Pr[E|F] = p(E∩F)/p(F).
• The conditional probability of E, given F.
– The probability of E in your new probability distribution p′ is called the
posterior probability of E.
• This is its probability after learning that event F occurred.
• After seeing F, the posterior distribution p′ is defined by letting

08/28/25
p′(v) = p({v}∩F)/p(F)(c)2001-2003,
for eachMichael
individual
P. Frank
outcome vS. 23
Module #19 – Probability

Visualizing Conditional Probability


• If we are given that event F occurs, then
– Our attention gets restricted to the subspace F.
• Our posterior probability for E (after seeing
F) corresponds Entire sample space S
to the fraction
of F where E Event E Event F
occurs also. Event
E∩F
• Thus, p′(E)=
p(E∩F)/p(F).
08/28/25 (c)2001-2003, Michael P. Frank 24
Module #19 – Probability

Conditional Probability Example


• Suppose I choose a single letter out of the 26-letter English
alphabet, totally at random.
– Use the Laplacian assumption on the sample space {a,b,..,z}. st
– What is the (prior) probability
1 9
vowels letters
that the letter is a vowel?
• Pr[Vowel] = __ / __ .
z w r
• Now, suppose I tell you that the k b c
letter chosen happened to be in y u a t
the first 9 letters of the alphabet. d f
x e
– Now, what is the conditional (or s o i h g l
posterior) probability that the letter
is a vowel, given this information? p n j v q m
• Pr[Vowel | First9] = ___ / ___ . Sample Space S
08/28/25 (c)2001-2003, Michael P. Frank 25
Module #19 – Probability

Bayes’ Rule
• One way to compute the probability that a
hypothesis H is correct, given some data D:
Pr[ D | H ] Pr[ H ]
Pr[ H | D] 
Pr[ D] Rev. Thomas Bayes
1702-1761
• This follows directly from the definition of
conditional probability! (Exercise: Prove it at home.)
• This rule is the foundation of Bayesian methods for
probabilistic reasoning, which are very powerful, and
widely used in artificial intelligence applications:
– For data mining, automated diagnosis, pattern recognition,
statistical modeling, even evaluating scientific hypotheses!
08/28/25 (c)2001-2003, Michael P. Frank 26
Module #19 – Probability

Expectation Values
• For any random variable V having a numeric domain, its
expectation value or expected value or weighted average
value or (arithmetic) mean value Ex[V], under the
probability distribution Pr[v] = p(v), is defined as

Vˆ :Ex[V ] :Ex p [V ] :  v p(v).


vdom [V ]
• The term “expected value” is very widely used for this.
– But this term is somewhat misleading, since the “expected”
value might itself be totally unexpected, or even impossible!
• E.g., if p(0)=0.5 & p(2)=0.5, then Ex[V]=1, even though p(1)=0 and so
we know that V≠1!
• Or, if p(0)=0.5 & p(1)=0.5, then Ex[V]=0.5 even if V is an integer
variable!
08/28/25 (c)2001-2003, Michael P. Frank 27
Module #19 – Probability

Derived Random Variables


• Let S be a sample space over values of a random
variable V (representing possible outcomes).
• Then, any function f over S can also be considered
to be a random variable (whose actual value f(V) is
derived from the actual value of V).
• If the range R = range[f] of f is numeric, then the
mean value Ex[f] of f can still be defined, as
fˆ Ex[ f ]  p ( s ) f ( s )
sS

08/28/25 (c)2001-2003, Michael P. Frank 28


Module #19 – Probability

Linearity of Expectation Values


• Let X1, X2 be any two random variables
derived from the same sample space S, and
subject to the same underlying distribution.
• Then we have the following theorems:
Ex[X1+X2] = Ex[X1] + Ex[X2]
Ex[aX1 + b] = aEx[X1] + b
• You should be able to easily prove these for
yourself at home.

08/28/25 (c)2001-2003, Michael P. Frank 29


Module #19 – Probability

Variance & Standard Deviation


• The variance Var[X] = σ2(X) of a random variable
X is the expected value of the square of the
difference between the value of X and its
expectation value Ex[X]:
Var [ X ] : X ( s )  Ex p [ X ] p ( s )
2

sS

• The standard deviation or root-mean-square


(RMS) difference of X is σ(X) :≡ Var[X]1/2.

08/28/25 (c)2001-2003, Michael P. Frank 30


Module #19 – Probability

Entropy
• The entropy H of a probability distribution p over a
sample space S over outcomes is a measure of our
degree of uncertainty about the actual outcome.
– It measures the expected amount of increase in our known
information that would result from learning the outcome.
H ( p ) :Ex p [log p  1 ]   p ( s ) log p ( s )
sS
• The base of the logarithm gives the corresponding unit
of entropy; base 2 → 1 bit, base e → 1 nat (as before)
– 1 nat is also known as “Boltzmann’s constant” kB & as the
“ideal gas constant” R, and was first discovered physically
08/28/25 (c)2001-2003, Michael P. Frank 31
Module #19 – Probability

Visualizing Entropy
Sample Nonuniform vs. Uniform Probability Distributions Improbability (Inverse Probability)
1
80
0.8 Improb-
Probability
ability
60
0.6
(1 out of N) 40
0.4

0.2
20

0 0
1 2 3 1 2 3
4 5 6 4 5 6
7 8 9 7 8 9
State Index
10
State Index
10

Log Improbability (Information of Discovery) Boltzmann-Gibbs-Shannon Entropy


(Expected Log Improbability)
7 0.5
6 0.4
5
Log 0.3
Base 2 4
Bits
of 3 0.2
Improb. 2
0.1
1
0 0
1 2 3
1 2 3 S1 4 5
4 5 6 6 7 8
7 8 9 9 10
State Index 10 State Index
08/28/25 (c)2001-2003, Michael P. Frank 32

You might also like