Module #19 – Probability
University of Florida
Dept. of Computer & Information Science & Engineering
COT 3100
Applications of Discrete Structures
Dr. Michael P. Frank
Slides for a Course Based on the Text
Discrete Mathematics & Its Applications
(5th Edition)
by Kenneth H. Rosen
08/28/25 (c)2001-2003, Michael P. Frank 1
Module #19:
Probability Theory
Rosen 5th ed., ch. 5 (§§5.1-5.3)
~26 slides, ~1 lecture
Why Probability?
• In the real world, we often don’t know
whether a given proposition is true or false.
• Probability theory gives us a way to reason
about propositions whose truth is uncertain.
• It is useful in weighing evidence,
diagnosing problems, and analyzing
situations whose exact details are unknown.
Random Variables
• A “random variable” V is any variable whose
value is unknown, or whose value depends on
the precise situation.
– E.g., the number of students in class today
– Whether it will rain tonight (Boolean variable)
• Let the domain of V be dom[V]≡{v1,…,vn}
– Infinite domains can also be dealt with if needed.
• The proposition V=vi may have an uncertain
truth value, and may be assigned a probability.
Information Capacity
• The information capacity I[V] of a random variable V
with a finite domain can be defined as the logarithm
(with indeterminate base) of the size of the domain of V,
I[V] :≡ log |dom[V]|.
– The log’s base determines the associated information unit!
• Taking the log base 2 yields an information unit of 1 bit b = log 2.
– Related units include the nybble N = 4 b = log 16 (1 hexadecimal digit),
– and more famously, the byte B = 8 b = log 256.
• Other common logarithmic units that can be used as units of
information:
– the nat, or e-fold n = log e,
» widely known in thermodynamics as Boltzmann’s constant k.
– the bel or decade or order of magnitude (D = log 10),
– and the decibel or dB = D/10 = (log 10)/10 ≈ log 1.2589
• Example: An 8-bit register has 2^8 = 256 possible values.
– Its information capacity is thus: log 256 = 8 log 2 = 8 b!
• Or 2 N, or 1 B, or log_e 256 ≈ 5.545 n, or log_10 256 ≈ 2.408 D, or 24.08 dB
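The unit conversions above follow directly from the definition I[V] :≡ log |dom[V]|; a minimal Python sketch (illustrative, not part of the original slides) computes the same capacity in bits, nats, and decades:

```python
import math

def capacity_bits(domain_size):
    """Information capacity I[V] = log2 |dom[V]|, measured in bits."""
    return math.log2(domain_size)

# An 8-bit register has 2**8 = 256 possible values.
bits = capacity_bits(256)      # 8.0 bits
nats = math.log(256)           # same capacity in nats (base-e log)
decades = math.log10(256)      # same capacity in decades (base-10 log)

print(bits, nats, decades)
```

The choice of base only rescales the answer, which is why the slide treats "bit", "nat", and "decade" as interchangeable units of the same quantity.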
Experiments & Sample Spaces
• A (stochastic) experiment is any process by which
a given random variable V gets assigned some
particular value, and where this value is not
necessarily known in advance.
– We call it the “actual” value of the variable, as
determined by that particular experiment.
• The sample space S of the experiment is just
the domain of the random variable, S = dom[V].
• The outcome of the experiment is the specific
value vi of the random variable that is selected.
Events
• An event E is any set of possible outcomes in S…
– That is, E ⊆ S = dom[V].
• E.g., the event that “less than 50 people show up for our next
class” is represented as the set {1, 2, …, 49} of values of the
variable V = (# of people here next class).
• We say that event E occurs when the actual value
of V is in E, which may be written V∈E.
– Note that V∈E denotes the proposition (of uncertain
truth) asserting that the actual outcome (value of V) will
be one of the outcomes in the set E.
Probability
• The probability p = Pr[E] ∈ [0,1] of an event E is
a real number representing our degree of certainty
that E will occur.
– If Pr[E] = 1, then E is absolutely certain to occur,
• thus V∈E has the truth value True.
– If Pr[E] = 0, then E is absolutely certain not to occur,
• thus V∈E has the truth value False.
– If Pr[E] = ½, then we are maximally uncertain about
whether E will occur; that is,
• V∈E and V∉E are considered equally likely.
– How do we interpret other values of p?
Note: We could also define probabilities for more
general propositions, as well as events.
Four Definitions of Probability
• Several alternative definitions of probability
are commonly encountered:
– Frequentist, Bayesian, Laplacian, Axiomatic
• They have different strengths &
weaknesses, philosophically speaking.
– But fortunately, they coincide with each other
and work well together, in the majority of cases
that are typically encountered.
Probability: Frequentist Definition
• The probability of an event E is the limit, as n→∞, of
the fraction of times that we find V∈E over the course
of n independent repetitions of (different instances of)
the same experiment:
Pr[E] :≡ lim_{n→∞} n_{V∈E} / n
• Some problems with this definition:
– It is only well-defined for experiments that can be
independently repeated, infinitely many times!
• or at least, if the experiment can be repeated in principle, e.g., over
some hypothetical ensemble of (say) alternate universes.
– It can never be measured exactly in finite time!
• Advantage: It’s an objective, mathematical definition.
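The frequentist limit can be approximated (never measured exactly, as the slide notes) by simulation; this hedged Python sketch estimates the probability of rolling an even number on a fair die from a large but finite number of repetitions:

```python
import random

def frequentist_estimate(event, experiment, n):
    """Estimate Pr[E] as the fraction of n independent repetitions
    of the experiment whose outcome lands in the event set."""
    hits = sum(1 for _ in range(n) if experiment() in event)
    return hits / n

random.seed(42)                       # fixed seed for reproducibility
roll = lambda: random.randint(1, 6)   # one roll of a fair 6-sided die
even = {2, 4, 6}

est = frequentist_estimate(even, roll, 100_000)
print(est)  # close to 0.5, but only an approximation after finitely many trials
```

The estimate wobbles around the true value 1/2 and only converges in the n→∞ limit, which is exactly the definitional difficulty the slide raises.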
Probability: Bayesian Definition
• Suppose a rational, profit-maximizing entity R is
offered a choice between two rewards:
– Winning $1 if and only if the event E actually occurs.
– Receiving p dollars (where p ∈ [0,1]) unconditionally.
• If R can honestly state that he is completely
indifferent between these two rewards, then we say
that R’s probability for E is p, that is, PrR[E] :≡ p.
• Problem: It’s a subjective definition; depends on the
reasoner R, and his knowledge, beliefs, & rationality.
– The version above additionally assumes that the utility of
money is linear.
• This assumption can be avoided by using “utils” (utility units)
instead of dollars.
Probability: Laplacian Definition
• First, assume that all individual outcomes in the
sample space are equally likely…
– Note that this term still needs an operational definition!
• Then, the probability of any event E is given by,
Pr[E] = |E|/|S|. Very simple!
• Problems: Still needs a definition for equally likely,
and depends on the existence of some finite sample
space S in which all outcomes in S are, in fact,
equally likely.
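The Laplacian rule Pr[E] = |E|/|S| is just counting; a minimal Python sketch (an illustration, not part of the original slides) for the event "a fair die shows more than 4":

```python
from fractions import Fraction

def laplace_pr(event, sample_space):
    """Laplacian probability: Pr[E] = |E| / |S|,
    assuming all outcomes in S are equally likely."""
    assert event <= sample_space      # E must be a subset of S
    return Fraction(len(event), len(sample_space))

S = set(range(1, 7))                  # sample space: one fair die
E = {n for n in S if n > 4}           # event: "roll greater than 4"
print(laplace_pr(E, S))               # 1/3
```

Exact `Fraction` arithmetic keeps the |E|/|S| structure visible instead of collapsing it to a decimal.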
Probability: Axiomatic Definition
• Let p be any total function p:S→[0,1] such that
∑_{s∈S} p(s) = 1.
• Such a p is called a probability distribution.
• Then, the probability under p
of any event E ⊆ S is just:
Pr_p[E] :≡ ∑_{s∈E} p(s)
• Advantage: Totally mathematically well-defined!
– This definition can even be extended to apply to infinite
sample spaces, by changing ∑→∫, and calling p a
probability density function or a probability measure.
• Problem: Leaves operational meaning unspecified.
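The axiomatic definition can be sketched directly in Python: a distribution is any mapping from outcomes to [0,1] summing to 1, and an event's probability is the sum over its outcomes. The loaded-die numbers below are purely illustrative assumptions:

```python
from fractions import Fraction

# A probability distribution p : S -> [0,1] with total mass 1
# (a hypothetical loaded die, chosen just for illustration).
p = {1: Fraction(1, 4), 2: Fraction(1, 8), 3: Fraction(1, 8),
     4: Fraction(1, 8), 5: Fraction(1, 8), 6: Fraction(1, 4)}
assert sum(p.values()) == 1           # the axiom that makes p a distribution

def pr(event):
    """Axiomatic probability of an event E ⊆ S: Pr_p[E] = sum of p(s) over s in E."""
    return sum(p[s] for s in event)

print(pr({1, 6}))                     # 1/2
```

Note how nothing here says what the numbers *mean* operationally, which is exactly the limitation the slide points out.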
Probabilities of Mutually
Complementary Events
• Let E be an event in a sample space S.
• Then, Ē represents the complementary
event, saying that the actual value of V∉E.
• Theorem: Pr[Ē] = 1 − Pr[E]
– This can be proved using the Laplacian
definition of probability, since Pr[Ē] = |Ē|/|S| =
(|S|−|E|)/|S| = 1 − |E|/|S| = 1 − Pr[E].
• Other definitions can also be used to prove it.
Probability vs. Odds
• You may have heard the term “odds.”
– It is widely used in the gambling community.
• This is not the same thing as probability!
– But, it is very closely related.
• The odds in favor of an event E means the relative
probability of E compared with its complement Ē:
O(E) :≡ Pr(E)/Pr(Ē).
– E.g., if Pr(E) = 0.6 then Pr(Ē) = 0.4 and O(E) = 0.6/0.4 = 1.5.
• Odds are conventionally written as a ratio of integers.
– E.g., 3/2 or 3:2 in the above example. “Three to two in favor.”
• The odds against E just means 1/O(E). “2 to 3 against”
• Exercise: Express the probability p as a function of the odds in favor, O.
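The slide's exercise asks for p as a function of O; solving O = p/(1−p) for p gives p = O/(1+O). A short Python sketch (illustrative only) checks the conversion both ways on the slide's example:

```python
from fractions import Fraction

def odds_in_favor(p):
    """O(E) = Pr(E) / Pr(complement of E) = p / (1 - p)."""
    return p / (1 - p)

def prob_from_odds(O):
    """Inverse relation (the slide's exercise): p = O / (1 + O)."""
    return O / (1 + O)

p = Fraction(3, 5)            # Pr(E) = 0.6, as in the slide's example
O = odds_in_favor(p)
print(O)                      # 3/2, i.e. "three to two in favor"
assert prob_from_odds(O) == p # round-trips back to the original probability
```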
Example 1: Balls-and-Urn
• Suppose an urn contains 4 blue balls and 5 red balls.
• An example experiment: Shake up the urn, reach in
(without looking) and pull out a ball.
• A random variable V: Identity of the chosen ball.
• The sample space S: The set of
all possible values of V:
– In this case, S = {b1,…,b9}
• An event E: “The ball chosen is
blue”: E = { ______________ }
[Figure: an urn containing the nine balls b1–b9.]
• What are the odds in favor of E?
• What is the probability of E? (Use Laplacian def’n.)
Example 2: Seven on Two Dice
• Experiment: Roll a pair of
fair (unweighted) 6-sided dice.
• Describe a sample space for this
experiment that fits the Laplacian definition.
• Using this sample space, represent an event E
expressing that “the upper spots sum to 7.”
• What is the probability of E?
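A Laplacian sample space for this experiment is the 36 equally likely ordered pairs of faces; this hedged Python sketch enumerates it and the event "the upper spots sum to 7":

```python
from fractions import Fraction
from itertools import product

# Laplacian sample space: all 36 equally likely ordered pairs of die faces.
S = list(product(range(1, 7), repeat=2))
E = [(a, b) for (a, b) in S if a + b == 7]   # "the upper spots sum to 7"

print(len(E), len(S))                        # 6 outcomes out of 36
print(Fraction(len(E), len(S)))              # 1/6
```

Using *ordered* pairs is what makes every outcome equally likely; the 21 unordered pairs would not satisfy the Laplacian assumption.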
Probability of Unions of Events
• Let E1, E2 ⊆ S = dom[V].
• Then we have: Theorem:
Pr[E1 ∪ E2] = Pr[E1] + Pr[E2] − Pr[E1 ∩ E2]
– By the inclusion-exclusion principle, together
with the Laplacian definition of probability.
• You should be able to easily flesh out the
proof yourself at home.
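The theorem can be checked numerically under the Laplacian definition; a small Python sketch (with arbitrarily chosen die-roll events, purely for illustration):

```python
from fractions import Fraction

S = set(range(1, 7))                  # one fair die, Laplacian probabilities
pr = lambda E: Fraction(len(E), len(S))

E1 = {1, 2, 3}                        # "roll at most 3"
E2 = {2, 4, 6}                        # "roll is even"

# Inclusion-exclusion: Pr[E1 ∪ E2] = Pr[E1] + Pr[E2] - Pr[E1 ∩ E2]
lhs = pr(E1 | E2)
rhs = pr(E1) + pr(E2) - pr(E1 & E2)
print(lhs, rhs)                       # both 5/6
```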
Mutually Exclusive Events
• Two events E1, E2 are called mutually
exclusive if they are disjoint: E1 ∩ E2 = ∅.
– Note that two mutually exclusive events cannot
both occur in the same instance of a given
experiment.
• For mutually exclusive events,
Pr[E1 ∪ E2] = Pr[E1] + Pr[E2].
– Follows from the sum rule of combinatorics.
Exhaustive Sets of Events
• A set E = {E1, E2, …} of events in the sample
space S is called exhaustive iff ∪_i Ei = S.
• An exhaustive set E of events that are all mutually
exclusive with each other has the property that
∑_i Pr[Ei] = 1.
• You should be able to easily prove this theorem,
using either the Laplacian or Axiomatic
definitions of probability from earlier.
Independent Events
• Two events E,F are called independent if
Pr[E ∩ F] = Pr[E]·Pr[F].
• Relates to the product rule for the number
of ways of doing two independent tasks.
• Example: Flip a coin, and roll a die.
Pr[(coin shows heads) ∧ (die shows 1)] =
Pr[coin is heads] × Pr[die is 1] = ½ × 1/6 = 1/12.
Conditional Probability
• Let E,F be any events such that Pr[F]>0.
• Then, the conditional probability of E given F,
written Pr[E|F], is defined as
Pr[E|F] :≡ Pr[E ∩ F]/Pr[F].
• This is what our probability that E would turn out
to occur should be, if we are given only the
information that F occurs.
• If E and F are independent then Pr[E|F] = Pr[E].
Pr[E|F] = Pr[E ∩ F]/Pr[F] = Pr[E]×Pr[F]/Pr[F] = Pr[E]
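The definition is easy to evaluate under the Laplacian model; a minimal Python sketch (with illustrative die-roll events chosen here, not taken from the slides):

```python
from fractions import Fraction

S = set(range(1, 7))                  # one fair die
pr = lambda E: Fraction(len(E), len(S))

E = {2, 4, 6}                         # "roll is even"
F = {4, 5, 6}                         # "roll is greater than 3"

# Conditional probability: Pr[E|F] = Pr[E ∩ F] / Pr[F]
cond = pr(E & F) / pr(F)
print(cond)                           # 2/3
```

Learning that F occurred shrinks the effective sample space to F's three outcomes, two of which are even, hence 2/3 rather than the prior 1/2.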
Prior and Posterior Probability
• Suppose that, before you are given any information about the
outcome of an experiment, your personal probability for an
event E to occur is p(E) = Pr[E].
– The probability of E in your original probability distribution p is called
the prior probability of E.
• This is its probability prior to obtaining any information about the outcome.
• Now, suppose someone tells you that some event F (which may
overlap with E) actually occurred in the experiment.
– Then, you should update your personal probability for event E to occur,
to become p′(E) = Pr[E|F] = p(E∩F)/p(F).
• The conditional probability of E, given F.
– The probability of E in your new probability distribution p′ is called the
posterior probability of E.
• This is its probability after learning that event F occurred.
• After seeing F, the posterior distribution p′ is defined by letting
p′(v) = p({v}∩F)/p(F) for each individual outcome v∈S.
Visualizing Conditional Probability
• If we are given that event F occurs, then
– Our attention gets restricted to the subspace F.
• Our posterior probability for E (after seeing F)
corresponds to the fraction of F where E also occurs.
• Thus, p′(E) = p(E∩F)/p(F).
[Figure: Venn diagram of the entire sample space S, showing events E and F and their overlap E∩F.]
Conditional Probability Example
• Suppose I choose a single letter out of the 26-letter English
alphabet, totally at random.
– Use the Laplacian assumption on the sample space {a,b,…,z}.
– What is the (prior) probability
that the letter is a vowel?
• Pr[Vowel] = __ / __ .
• Now, suppose I tell you that the
letter chosen happened to be in
the first 9 letters of the alphabet.
– Now, what is the conditional (or
posterior) probability that the letter
is a vowel, given this information?
• Pr[Vowel | First9] = ___ / ___ .
[Figure: the 26-letter sample space S, with the vowels and the first 9 letters highlighted.]
Bayes’ Rule
• One way to compute the probability that a
hypothesis H is correct, given some data D:
Pr[H|D] = Pr[D|H]·Pr[H] / Pr[D]
[Portrait: Rev. Thomas Bayes, 1702–1761.]
• This follows directly from the definition of
conditional probability! (Exercise: Prove it at home.)
• This rule is the foundation of Bayesian methods for
probabilistic reasoning, which are very powerful, and
widely used in artificial intelligence applications:
– For data mining, automated diagnosis, pattern recognition,
statistical modeling, even evaluating scientific hypotheses!
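A worked numeric sketch in Python may help; the setup (one of two coins chosen at random, one of them double-headed) and its numbers are hypothetical, chosen only to exercise the rule:

```python
from fractions import Fraction

# Hypothetical setup: a coin is chosen at random, with Pr[H] = 1/2 that it
# is a double-headed coin; it is flipped once, and the data D is "heads".
pr_H = Fraction(1, 2)            # prior probability of the hypothesis
pr_D_given_H = Fraction(1, 1)    # a double-headed coin always shows heads
pr_D_given_notH = Fraction(1, 2) # a fair coin shows heads half the time

# Total probability: Pr[D] = Pr[D|H]·Pr[H] + Pr[D|¬H]·Pr[¬H]
pr_D = pr_D_given_H * pr_H + pr_D_given_notH * (1 - pr_H)

# Bayes' rule: Pr[H|D] = Pr[D|H]·Pr[H] / Pr[D]
posterior = pr_D_given_H * pr_H / pr_D
print(posterior)                 # 2/3
```

Seeing heads raises the probability of the double-headed hypothesis from 1/2 to 2/3, showing how data updates a prior into a posterior.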
Expectation Values
• For any random variable V having a numeric domain, its
expectation value or expected value or weighted average
value or (arithmetic) mean value Ex[V], under the
probability distribution Pr[v] = p(v), is defined as
V̂ :≡ Ex[V] :≡ Ex_p[V] :≡ ∑_{v∈dom[V]} v·p(v).
• The term “expected value” is very widely used for this.
– But this term is somewhat misleading, since the “expected”
value might itself be totally unexpected, or even impossible!
• E.g., if p(0)=0.5 & p(2)=0.5, then Ex[V]=1, even though p(1)=0 and so
we know that V≠1!
• Or, if p(0)=0.5 & p(1)=0.5, then Ex[V]=0.5 even if V is an integer
variable!
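The weighted-average definition, applied to the slide's own example distribution, can be sketched in a few lines of Python:

```python
from fractions import Fraction

def ex(p):
    """Expectation value Ex[V] = sum over v of v * p(v)."""
    return sum(v * pv for v, pv in p.items())

# The slide's example: p(0) = p(2) = 1/2 gives Ex[V] = 1,
# even though Pr[V=1] = 0, so the "expected" value is impossible!
p = {0: Fraction(1, 2), 2: Fraction(1, 2)}
print(ex(p))    # 1
```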
Derived Random Variables
• Let S be a sample space over values of a random
variable V (representing possible outcomes).
• Then, any function f over S can also be considered
to be a random variable (whose actual value f(V) is
derived from the actual value of V).
• If the range R = range[f] of f is numeric, then the
mean value Ex[f] of f can still be defined, as
f̂ :≡ Ex[f] :≡ ∑_{s∈S} p(s)·f(s)
Linearity of Expectation Values
• Let X1, X2 be any two random variables
derived from the same sample space S, and
subject to the same underlying distribution.
• Then we have the following theorems:
Ex[X1+X2] = Ex[X1] + Ex[X2]
Ex[aX1 + b] = aEx[X1] + b
• You should be able to easily prove these for
yourself at home.
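Both linearity theorems can be verified numerically; this sketch (with arbitrarily chosen derived variables over a die roll, for illustration only) checks them with exact arithmetic:

```python
from fractions import Fraction

# Two derived random variables over the same die-roll sample space
S = range(1, 7)
p = Fraction(1, 6)                   # uniform distribution over S
X1 = lambda s: s                     # the roll itself
X2 = lambda s: s * s                 # its square

ex = lambda X: sum(X(s) * p for s in S)

# Ex[X1 + X2] = Ex[X1] + Ex[X2]
assert ex(lambda s: X1(s) + X2(s)) == ex(X1) + ex(X2)
# Ex[a·X1 + b] = a·Ex[X1] + b, with a = 3, b = 5 (arbitrary constants)
assert ex(lambda s: 3 * X1(s) + 5) == 3 * ex(X1) + 5

print(ex(X1), ex(X2))                # 7/2 91/6
```

Note the first identity needs no independence assumption, which is what makes linearity of expectation so widely applicable.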
Variance & Standard Deviation
• The variance Var[X] = σ2(X) of a random variable
X is the expected value of the square of the
difference between the value of X and its
expectation value Ex[X]:
Var[X] :≡ Ex[(X − Ex_p[X])²] = ∑_{s∈S} (X(s) − Ex_p[X])²·p(s)
• The standard deviation or root-mean-square
(RMS) difference of X is σ(X) :≡ Var[X]1/2.
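These definitions, applied to a single fair die roll (an illustrative choice, not from the slides), look like this in Python:

```python
from fractions import Fraction
import math

S = range(1, 7)
p = Fraction(1, 6)                       # one fair die, uniform distribution
X = lambda s: s                          # the roll itself

mean = sum(X(s) * p for s in S)          # Ex[X] = 7/2
var = sum((X(s) - mean) ** 2 * p for s in S)   # Var[X]
sigma = math.sqrt(var)                   # standard deviation (RMS difference)

print(var)      # 35/12
print(sigma)    # about 1.708
```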
Entropy
• The entropy H of a probability distribution p over a
sample space S over outcomes is a measure of our
degree of uncertainty about the actual outcome.
– It measures the expected amount of increase in our known
information that would result from learning the outcome.
H(p) :≡ Ex_p[log p⁻¹] = −∑_{s∈S} p(s)·log p(s)
• The base of the logarithm gives the corresponding unit
of entropy; base 2 → 1 bit, base e → 1 nat (as before)
– 1 nat is also known as “Boltzmann’s constant” kB & as the
“ideal gas constant” R, and was first discovered physically.
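The entropy formula is easy to evaluate in base 2; this Python sketch (with illustrative distributions chosen here) shows that a uniform distribution achieves the full capacity log |S|, while concentrating the probability reduces uncertainty:

```python
import math

def entropy_bits(p):
    """H(p) = -sum of p(s)*log2 p(s), in bits; terms with p(s)=0 contribute 0."""
    return -sum(ps * math.log2(ps) for ps in p if ps > 0)

uniform = [0.25] * 4              # maximal uncertainty over 4 outcomes
skewed = [0.5, 0.25, 0.125, 0.125]
certain = [1.0, 0.0, 0.0, 0.0]    # outcome already known: no uncertainty

print(entropy_bits(uniform))      # 2.0 bits = log2 4, the full capacity
print(entropy_bits(skewed))       # 1.75 bits
print(entropy_bits(certain))      # 0.0 bits
```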
Visualizing Entropy
[Figure: four bar-chart panels contrasting a sample nonuniform probability distribution with the uniform distribution over 10 states:
(1) Sample Nonuniform vs. Uniform Probability Distributions (probability, 1 out of N, vs. state index);
(2) Improbability (Inverse Probability);
(3) Log Improbability (Information of Discovery), in bits (log base 2);
(4) Boltzmann-Gibbs-Shannon Entropy (Expected Log Improbability).]