Preface
This handbook is designed to provide postgraduate students with a rigorous
yet accessible introduction to probability theory. The approach emphasizes:
• Conceptual Understanding: Each definition is accompanied by detailed explanations and motivations
• Visual Learning: Extensive use of diagrams and visualizations to
illustrate abstract concepts
• Logical Development: Careful progression from basic to advanced
topics with clear connections
• Practical Applications: Examples drawn from statistics, computer
science, and real-world problems
How to Use This Book:
• Read chapters sequentially as concepts build upon each other
• Work through all examples and problems to reinforce understanding
• Use the margin notes and key idea boxes for quick reference
• Refer to the appendix for proofs and additional technical details
Contents

Preface

1 Foundations of Probability Theory
  1.1 Set Theory Fundamentals
    1.1.1 Basic Concepts and Definitions
    1.1.2 Set Operations
    1.1.3 De Morgan's Laws
    1.1.4 Generalized Set Operations
  1.2 Set Theory in Probability
    1.2.1 Sample Spaces and Events
    1.2.2 Probability Measures

2 Advanced Measure-Theoretic Foundations
  2.1 σ-Algebras and Measurable Spaces
  2.2 Measure Theory Foundations
  2.3 Advanced Limit Concepts
  2.4 Complete Measures and Extensions
  2.5 Borel Sets and Measurable Functions
  2.6 Measurable Functions
  2.7 Probability Measures
  2.8 Random Variables
  2.9 Integration Theory

3 Advanced Probability Theory
  3.1 Inverse Functions and Measurability
  3.2 Probability Spaces
  3.3 Random Variables and Distributions
  3.4 Integration and Expectation
  3.5 Convergence of Random Variables

4 Probability Foundations
  4.1 Classical Probability
  4.2 Empirical Probability
  4.3 Subjective Probability
  4.4 Axiomatic Probability
  4.5 Comparison of Probability Interpretations
  4.6 Basic Properties of Probability Measures
  4.7 Continuity Theorems
  4.8 Inclusion-Exclusion Principle
  4.9 Advanced Properties
  4.10 Applications and Examples

5 Discrete Probability Theory
  5.1 Discrete Probability Spaces
  5.2 Finite Probability Spaces
  5.3 Countable Probability Spaces
  5.4 Discrete Random Variables
  5.5 Important Discrete Distributions
  5.6 Expectation and Variance

6 Continuous Probability Theory
  6.1 Continuous Probability Spaces
  6.2 Important Continuous Distributions
  6.3 Continuous Random Variables
  6.4 Normal Distribution
  6.5 Expectation and Variance

7 Distribution Functions and Their Properties
  7.1 Cumulative Distribution Functions
  7.2 Types of Distributions
  7.3 Decomposition of Distributions
  7.4 Convergence of Distribution Functions
  7.5 Empirical Distribution Functions
  7.6 Applications and Examples

A Detailing on selected concepts
  A.1 De Morgan's Laws
  A.2 Validating Borel Sets
    A.2.1 Validation Method
    A.2.2 Non-Borel Examples
    A.2.3 Verification Techniques
  A.3 Borel-Cantelli Lemma
  A.4 Radon-Nikodym Theorem
  A.5 Applications
Chapter 1
Foundations of Probability
Theory
1.1 Set Theory Fundamentals
1.1.1 Basic Concepts and Definitions
Definition 1.1 (Set). A set is a well-defined collection of distinct objects,
called elements or members. Sets are typically denoted by capital letters
(A, B, C, . . .) and their elements by lowercase letters (a, b, c, . . .).
Formally, a set A in a universe Ω can be defined by its characteristic function:
χ_A(ω) = 1 if ω ∈ A, and χ_A(ω) = 0 if ω ∉ A
Remark 1.2. The concept of a set is fundamental in probability theory
because:
• Sample spaces are sets of all possible outcomes
• Events are subsets of the sample space
• Probability measures are functions defined on collections of sets
1.1.2 Set Operations
Definition 1.3 (Basic Set Operations). For sets A, B ⊆ Ω:
• Union: A ∪ B = {ω ∈ Ω : ω ∈ A or ω ∈ B}
• Intersection: A ∩ B = {ω ∈ Ω : ω ∈ A and ω ∈ B}
• Complement: A^c = {ω ∈ Ω : ω ∉ A}
• Set Difference: A \ B = {ω ∈ Ω : ω ∈ A and ω ∉ B}
• Symmetric Difference: A△B = (A \ B) ∪ (B \ A)
Example 1.4. Let Ω = {1, 2, 3, 4, 5}, A = {1, 2, 4}, B = {2, 4, 5}:
A ∪ B = {1, 2, 4, 5}
A ∩ B = {2, 4}
Ac = {3, 5}
A \ B = {1}
A△B = {1, 5}
Figure 1.1: Venn diagram illustrating basic set operations
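These operations map directly onto Python's built-in set type; a small sanity check of Example 1.4 (an illustrative script, not part of the formal development):

```python
# Verify the set operations of Example 1.4 with Python's built-in sets.
Omega = {1, 2, 3, 4, 5}
A = {1, 2, 4}
B = {2, 4, 5}

union = A | B              # A ∪ B
intersection = A & B       # A ∩ B
complement_A = Omega - A   # A^c (relative to Omega)
difference = A - B         # A \ B
symmetric_diff = A ^ B     # A △ B = (A \ B) ∪ (B \ A)

print(union, intersection, complement_A, difference, symmetric_diff)
```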
1.1.3 De Morgan’s Laws
Theorem 1.5 (De Morgan’s Laws). For any two sets A, B ⊆ Ω:
1. (A ∪ B)^c = A^c ∩ B^c
2. (A ∩ B)^c = A^c ∪ B^c
Proof of First Law. We prove both inclusions:
1. (A ∪ B)^c ⊆ A^c ∩ B^c:
• Let x ∈ (A ∪ B)^c
• Then x ∉ A ∪ B by definition of complement
• Thus x ∉ A and x ∉ B (negation of "or")
• Therefore x ∈ A^c and x ∈ B^c
• Hence x ∈ A^c ∩ B^c
2. A^c ∩ B^c ⊆ (A ∪ B)^c:
• Let x ∈ A^c ∩ B^c
• Then x ∉ A and x ∉ B
• Thus x ∉ A ∪ B
• Therefore x ∈ (A ∪ B)^c
Key Idea
De Morgan’s Laws establish a duality between union and intersection
through complementation. This duality is fundamental in probability
theory when dealing with events and their complements.
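De Morgan's Laws can also be checked exhaustively on a small universe; the sketch below verifies both identities for all 64 pairs of subsets of {1, 2, 3}:

```python
from itertools import chain, combinations

# Exhaustively check De Morgan's Laws (Theorem 1.5) for every pair of
# subsets of a small universe: 8 subsets, so 8 * 8 = 64 pairs.
Omega = {1, 2, 3}

def subsets(s):
    s = list(s)
    return [set(c) for c in
            chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))]

checked = 0
for A in subsets(Omega):
    for B in subsets(Omega):
        assert Omega - (A | B) == (Omega - A) & (Omega - B)  # (A ∪ B)^c = A^c ∩ B^c
        assert Omega - (A & B) == (Omega - A) | (Omega - B)  # (A ∩ B)^c = A^c ∪ B^c
        checked += 1
print(checked)
```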
1.1.4 Generalized Set Operations
Definition 1.6 (Generalized Union and Intersection). For an indexed family of sets {Ai}_{i∈I}:
• ⋃_{i∈I} Ai = {ω ∈ Ω : ∃i ∈ I such that ω ∈ Ai}
• ⋂_{i∈I} Ai = {ω ∈ Ω : ∀i ∈ I, ω ∈ Ai}
Example 1.7 (Nested Intervals). Consider the sequence of sets An = (0, 1 + 1/n) for n ∈ N:
⋂_{n=1}^∞ An = (0, 1]
This shows that countable intersections can behave differently than finite intersections.
Theorem 1.8 (Extended De Morgan's Laws). For any collection {Ai}_{i∈I}:
(⋃_{i∈I} Ai)^c = ⋂_{i∈I} Ai^c
(⋂_{i∈I} Ai)^c = ⋃_{i∈I} Ai^c
1.2 Set Theory in Probability
1.2.1 Sample Spaces and Events
Definition 1.9 (Sample Space). The sample space Ω is the set of all
possible outcomes of a random experiment.
Definition 1.10 (Event). An event is a subset of the sample space to which
we can assign probability. The collection of all events forms a σ-algebra F .
Example 1.11. For rolling a die:
• Sample space: Ω = {1, 2, 3, 4, 5, 6}
• Events: "Even number" = {2, 4, 6}, "Prime number" = {2, 3, 5}
1.2.2 Probability Measures
Definition 1.12 (Probability Measure). A function P : F → [0, 1] is a
probability measure if:
• P (Ω) = 1
• For countable disjoint events {Ai}, P(⋃_i Ai) = Σ_i P(Ai)
Proposition 1.13 (Basic Properties). For any events A, B:
• P (Ac ) = 1 − P (A)
• If A ⊆ B, then P (A) ≤ P (B)
• P (A ∪ B) = P (A) + P (B) − P (A ∩ B)
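These properties can be verified on the die space of Example 1.11, where P(A) = |A|/6; a quick sketch using exact rational arithmetic:

```python
from fractions import Fraction

# Check Proposition 1.13 on the fair-die space of Example 1.11,
# where P(A) = |A|/6 for any A ⊆ Ω.
Omega = {1, 2, 3, 4, 5, 6}

def P(A):
    return Fraction(len(A), len(Omega))

even = {2, 4, 6}
prime = {2, 3, 5}

complement_rule = P(Omega - even) == 1 - P(even)
monotonicity = P({2, 4}) <= P(even)   # {2, 4} ⊆ even
addition_rule = P(even | prime) == P(even) + P(prime) - P(even & prime)
print(complement_rule, monotonicity, addition_rule)
```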
Chapter 2
Advanced Measure-Theoretic
Foundations
2.1 σ-Algebras and Measurable Spaces
Definition 2.1 (σ-Algebra). A collection F of subsets of X is a σ-algebra if:
1. X ∈ F
2. Closed under complements: A ∈ F ⇒ A^c ∈ F
3. Closed under countable unions: {An}_{n=1}^∞ ⊆ F ⇒ ⋃_{n=1}^∞ An ∈ F
The pair (X, F) is called a measurable space.
Theorem 2.2 (Generated σ-Algebra). For any collection C of subsets of X, there exists a smallest σ-algebra σ(C) containing C, called the σ-algebra generated by C.
Proof. Let {Fi}_{i∈I} be the family of all σ-algebras containing C. Then:
σ(C) = ⋂_{i∈I} Fi
We verify the σ-algebra properties:
1. Each Fi contains X, so X ∈ σ(C)
2. If A ∈ σ(C), then A ∈ Fi for all i, hence A^c ∈ Fi for all i, so A^c ∈ σ(C)
3. For countable unions {An} ⊆ σ(C), each An ∈ Fi for all i, hence ⋃ An ∈ Fi for all i, so ⋃ An ∈ σ(C)
Minimality follows since any σ-algebra containing C is one of the Fi.
Example 2.3 (Borel σ-Algebra). The Borel σ-algebra B(R^n) is generated by:
• Open sets: σ(open sets)
• Equivalent generators: closed sets, intervals (a, b], or {(a1, b1] × · · · × (an, bn]}
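On a finite universe the generated σ-algebra of Theorem 2.2 can be computed directly by closing the collection under complements and unions until a fixed point is reached; a sketch (the universe and generator below are illustrative, and on a finite set countable unions reduce to finite ones):

```python
# Compute σ(C) on a finite universe X by iterating closure under
# complements and pairwise unions until nothing new appears.  On a
# finite set this fixed point is the smallest σ-algebra containing C.
def generate_sigma_algebra(X, C):
    X = frozenset(X)
    F = {frozenset(), X} | {frozenset(c) for c in C}
    while True:
        new = {X - A for A in F}                 # close under complements
        new |= {A | B for A in F for B in F}     # close under unions
        if new <= F:
            return F
        F |= new

X = {1, 2, 3, 4}
sigma = generate_sigma_algebra(X, [{1}])
print(sorted(sorted(s) for s in sigma))
# σ({{1}}) = {∅, {1}, {2,3,4}, X}
```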
2.2 Measure Theory Foundations
Definition 2.4 (Measure). A function µ : F → [0, ∞] is a measure if:
1. µ(∅) = 0
2. Countable additivity: For disjoint {An}_{n=1}^∞ ⊆ F,
µ(⋃_{n=1}^∞ An) = Σ_{n=1}^∞ µ(An)
The triple (X, F , µ) is a measure space.
Theorem 2.5 (Continuity from Below). Let (X, F, µ) be a measure space and An ↑ A (i.e., An ⊆ An+1 and A = ⋃_{n=1}^∞ An). Then:
lim_{n→∞} µ(An) = µ(A)
Proof. Construct disjoint sets B1 = A1, Bn = An \ An−1 for n ≥ 2. Then:
µ(A) = µ(⋃_{n=1}^∞ An) = µ(⋃_{n=1}^∞ Bn)
     = Σ_{n=1}^∞ µ(Bn)   (countable additivity)
     = lim_{N→∞} Σ_{n=1}^N µ(Bn) = lim_{N→∞} µ(⋃_{n=1}^N Bn)
     = lim_{N→∞} µ(A_N)
Theorem 2.6 (Carathéodory's Extension Theorem). Let µ be a premeasure on an algebra A. Then there exists a unique extension of µ to a measure on σ(A).
Proof Sketch. 1. Construct the outer measure: µ*(E) = inf {Σ_{i=1}^∞ µ(Ai) : E ⊆ ⋃_{i=1}^∞ Ai, Ai ∈ A}
2. Identify the µ*-measurable sets M satisfying:
µ*(E) = µ*(E ∩ M) + µ*(E ∩ M^c) for all E ⊆ X
3. Show M is a σ-algebra containing A
4. Prove µ*|M is the desired measure extension
2.3 Advanced Limit Concepts
Definition 2.7 (Set Convergence). For a sequence {An} in F:
• lim sup An = ⋂_{n=1}^∞ ⋃_{k=n}^∞ Ak = {ω : ω ∈ An for infinitely many n}
• lim inf An = ⋃_{n=1}^∞ ⋂_{k=n}^∞ Ak = {ω : ω ∈ An for all but finitely many n}
When lim sup An = lim inf An = A, we say An → A.
Theorem 2.8 (Borel-Cantelli Lemma). For any measure space (X, F, µ):
1. If Σ_{n=1}^∞ µ(An) < ∞, then µ(lim sup An) = 0
2. If µ is a probability measure and {An} are independent with Σ_{n=1}^∞ µ(An) = ∞, then µ(lim sup An) = 1
Proof. (1) Note that for every n:
µ(lim sup An) = µ(⋂_{n=1}^∞ ⋃_{k=n}^∞ Ak) ≤ µ(⋃_{k=n}^∞ Ak) ≤ Σ_{k=n}^∞ µ(Ak) → 0
as n → ∞ by the convergence of the series.
(2) Using independence:
µ(⋂_{k=n}^N Ak^c) = Π_{k=n}^N (1 − µ(Ak)) ≤ exp(−Σ_{k=n}^N µ(Ak)) → 0 as N → ∞
Thus µ(lim inf An^c) = 0 and µ(lim sup An) = 1.
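The first half of the lemma rests on the tail sums Σ_{k≥n} µ(Ak) vanishing; a numerical sketch contrasting a summable and a non-summable choice of µ(Ak) (the sequences 1/k² and 1/k are illustrative, and the infinite tails are truncated at N = 100000):

```python
# Tail sums Σ_{k=n}^{N} p(k) for a summable sequence p(k) = 1/k²
# (they shrink toward 0, forcing µ(lim sup A_n) = 0) versus the
# divergent sequence p(k) = 1/k (they stay large).
def tail_sum(p, n, N=100_000):
    return sum(p(k) for k in range(n, N + 1))

summable_tails = [tail_sum(lambda k: 1 / k**2, n) for n in (1, 10, 100, 1000)]
divergent_tails = [tail_sum(lambda k: 1 / k, n) for n in (1, 10, 100, 1000)]

print(summable_tails)   # decreasing toward 0
print(divergent_tails)  # each tail still large (≈ ln(N/n))
```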
2.4 Complete Measures and Extensions
Definition 2.9 (Complete Measure). A measure space (X, F, µ) is complete if:
∀E ∈ F with µ(E) = 0, ∀F ⊆ E, F ∈ F
Theorem 2.10 (Completion Theorem). For any measure space (X, F, µ), there exists a smallest complete measure space (X, F̄, µ̄) extending it.
Proof. Define:
F̄ = {E ∪ N : E ∈ F, N ⊆ F ∈ F with µ(F) = 0}
µ̄(E ∪ N) = µ(E)
Verification:
• F̄ is a σ-algebra
• µ̄ is well-defined and extends µ
• µ̄ is complete
• Minimality follows from construction
2.5 Borel Sets and Measurable Functions
Definition 2.11 (Borel σ-Algebra). The Borel σ-algebra B(R) is the smallest σ-algebra containing all open sets in R. Equivalently, it is generated by:
• Open intervals: σ({(a, b) : a < b})
• Closed intervals: σ({[a, b] : a ≤ b})
• Left-open intervals: σ({(a, b] : a < b})
Theorem 2.12 (Cardinality of Borel Sets). The Borel σ-algebra has cardinality 2^ℵ0 (the same as R), strictly smaller than 2^(2^ℵ0) (the cardinality of the power set of R).
Proof. 1. Construct the Borel hierarchy:
Σ⁰₁ = open sets
Π⁰₁ = closed sets
Σ⁰α = {⋃_{n∈N} An : An ∈ Π⁰_{βn}, βn < α}
Π⁰α = {A^c : A ∈ Σ⁰α}
2. Show B(R) = ⋃_{α<ω₁} Σ⁰α, where ω₁ is the first uncountable ordinal.
3. Each level has cardinality 2^ℵ0, and a union over ℵ₁ levels preserves this.
2.6 Measurable Functions
Definition 2.13 (Measurable Function). A function f : (X, F ) → (R, B(R))
is measurable if:
∀B ∈ B(R), f −1 (B) ∈ F
Equivalently, any of these conditions suffice:
• ∀a ∈ R, {x : f (x) ≤ a} ∈ F
• ∀a ∈ R, {x : f (x) > a} ∈ F
• ∀open U ⊆ R, f −1 (U ) ∈ F
Theorem 2.14 (Operations Preserving Measurability). Let f, g : (X, F ) →
(R, B(R)) be measurable. Then:
1. f + g is measurable
2. f · g is measurable
3. max(f, g) and min(f, g) are measurable
4. If g ≠ 0, f /g is measurable
5. For continuous ϕ : R → R, ϕ ◦ f is measurable
Proof for f + g. For any a ∈ R:
{x : f(x) + g(x) > a} = ⋃_{r∈Q} ({x : f(x) > r} ∩ {x : g(x) > a − r}) ∈ F
since:
• The union is countable
• Each {x : f(x) > r} ∈ F and {x : g(x) > a − r} ∈ F
• F is closed under countable unions and finite intersections
2.7 Probability Measures
Definition 2.15 (Probability Measure). A function P : F → [0, 1] is a probability measure if:
1. P(∅) = 0, P(Ω) = 1
2. Countable additivity: For disjoint {An}_{n=1}^∞ ⊆ F,
P(⋃_{n=1}^∞ An) = Σ_{n=1}^∞ P(An)
The triple (Ω, F, P) is a probability space.
Theorem 2.16 (Continuity of Probability Measures). For any probability measure P:
1. If An ↑ A (i.e., A1 ⊆ A2 ⊆ · · · and A = ⋃ An), then
lim_{n→∞} P(An) = P(A)
2. If An ↓ A (i.e., A1 ⊇ A2 ⊇ · · · and A = ⋂ An), then
lim_{n→∞} P(An) = P(A)
Proof for the increasing sequence. 1. Construct disjoint sets:
B1 = A1, Bn = An \ An−1 for n ≥ 2
2. Then:
A = ⋃_{n=1}^∞ Bn   (disjoint union)
3. By countable additivity:
P(A) = Σ_{n=1}^∞ P(Bn) = lim_{N→∞} Σ_{n=1}^N P(Bn) = lim_{N→∞} P(A_N)
2.8 Random Variables
Definition 2.17 (Random Variable). A random variable is a measurable
function X : (Ω, F , P ) → (R, B(R)).
Theorem 2.18 (Measurability of Limits). Let {Xn } be a sequence of random
variables. Then:
1. supn Xn , inf n Xn are random variables
2. lim supn→∞ Xn , lim inf n→∞ Xn are random variables
3. If limn→∞ Xn exists pointwise, it is a random variable
Proof for sup_n Xn. For any a ∈ R:
{ω : sup_n Xn(ω) ≤ a} = ⋂_{n=1}^∞ {ω : Xn(ω) ≤ a} ∈ F
since:
• Each {ω : Xn (ω) ≤ a} ∈ F by measurability of Xn
• F is closed under countable intersections
2.9 Integration Theory
Definition 2.19 (Lebesgue Integral). For a random variable X ≥ 0:
1. For simple X = Σ_{i=1}^n ai I_{Ai}:
∫ X dP = Σ_{i=1}^n ai P(Ai)
2. For general X ≥ 0:
∫ X dP = sup {∫ Y dP : 0 ≤ Y ≤ X, Y simple}
3. For general X:
∫ X dP = ∫ X⁺ dP − ∫ X⁻ dP
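For a simple random variable the integral in part 1 is just a finite weighted sum; a minimal sketch on the fair-die space (the pieces and weights are made up for illustration):

```python
from fractions import Fraction

# ∫ X dP for a simple function X = Σ a_i · I_{A_i} on a finite Ω,
# computed as Σ a_i P(A_i) per Definition 2.19 (illustrative values).
Omega = {1, 2, 3, 4, 5, 6}   # fair die, P({ω}) = 1/6

def P(A):
    return Fraction(len(A), 6)

# X = 1·I_{{1,2,3}} + 5·I_{{4,5,6}}  (constant on each piece)
pieces = [(1, {1, 2, 3}), (5, {4, 5, 6})]
integral = sum(a * P(A) for a, A in pieces)
print(integral)  # 1·(1/2) + 5·(1/2) = 3
```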
Theorem 2.20 (Monotone Convergence Theorem). If 0 ≤ Xn ↑ X pointwise, then:
lim_{n→∞} ∫ Xn dP = ∫ X dP
Proof. 1. ∫ Xn dP ≤ ∫ X dP by monotonicity, so lim sup ∫ Xn dP ≤ ∫ X dP.
2. For any simple Y ≤ X with 0 ≤ Y ≤ c, define:
Yn = min(Y, Xn)
Then Yn ↑ Y and:
∫ Y dP = lim_{n→∞} ∫ Yn dP ≤ lim inf_{n→∞} ∫ Xn dP
3. Take the supremum over all simple Y ≤ X to get:
∫ X dP ≤ lim inf_{n→∞} ∫ Xn dP
Chapter 3
Advanced Probability Theory
3.1 Inverse Functions and Measurability
Definition 3.1 (Inverse Function). For a bijective function f : (X, F ) →
(Y, G ) between measurable spaces:
• The inverse f −1 : Y → X satisfies f −1 (y) = x ⇐⇒ f (x) = y
• For any B ⊆ Y , f −1 (B) = {x ∈ X : f (x) ∈ B}
Theorem 3.2 (Measurability of Inverse Functions). If f : (X, F) → (Y, G) is bijective and measurable, then:
1. f⁻¹ is measurable when F = σ(f⁻¹(G))
2. For a continuous bijection f : R → R, f⁻¹ is Borel measurable
Proof. 1. The collection {A ⊆ X : f(A) ∈ G} is a σ-algebra, since images under a bijection respect complements and unions, and it contains every generator f⁻¹(B) with B ∈ G (because f(f⁻¹(B)) = B). Hence it contains F = σ(f⁻¹(G)), and (f⁻¹)⁻¹(A) = f(A) ∈ G for all A ∈ F.
2. For the Borel case:
• {A ⊆ R : f(A) is Borel} is a σ-algebra, again by bijectivity
• It contains the open sets: a continuous bijection of R is strictly monotone, hence an open map
• Hence it contains all Borel sets
3.2 Probability Spaces
Definition 3.3 (Complete Probability Space). A probability space (Ω, F, P) is complete if:
∀N ∈ F with P(N) = 0, ∀A ⊆ N, A ∈ F
Theorem 3.4 (Completion Theorem). For any probability space (Ω, F, P), there exists a smallest complete extension (Ω, F̄, P̄).
Proof. Define:
F̄ = {E ∪ N : E ∈ F, N ⊆ F ∈ F with P(F) = 0}
P̄(E ∪ N) = P(E)
Verification:
• F̄ is a σ-algebra
• P̄ is well-defined and complete
• Minimality follows from construction
3.3 Random Variables and Distributions
Definition 3.5 (Random Variable). A random variable is a measurable
function:
X : (Ω, F , P ) → (R, B(R))
Its distribution is the pushforward measure:
PX (B) = P (X −1 (B)) = P ({ω : X(ω) ∈ B})
Theorem 3.6 (Properties of Distribution Functions). For any random variable X, its CDF FX(x) = P(X ≤ x) satisfies:
1. Monotonicity: x ≤ y ⇒ FX (x) ≤ FX (y)
2. Right-continuity: limx→a+ FX (x) = FX (a)
3. Limits: limx→−∞ FX (x) = 0, limx→+∞ FX (x) = 1
Proof. 1. Monotonicity follows from {X ≤ x} ⊆ {X ≤ y}.
2. For right-continuity:
{X ≤ a} = ⋂_{n=1}^∞ {X ≤ a + 1/n}
By continuity of probability measures:
FX(a) = lim_{n→∞} FX(a + 1/n)
3. The limit properties follow from:
lim_{x→−∞} {X ≤ x} = ∅,  lim_{x→∞} {X ≤ x} = Ω
3.4 Integration and Expectation
Definition 3.7 (Lebesgue Integral). For a random variable X ≥ 0 on (Ω, F, P):
1. For simple X = Σ_{i=1}^n ai I_{Ai}:
E[X] = ∫ X dP = Σ_{i=1}^n ai P(Ai)
2. For general X ≥ 0:
E[X] = sup {E[Y] : 0 ≤ Y ≤ X, Y simple}
3. For general X:
E[X] = E[X⁺] − E[X⁻]
Theorem 3.8 (Dominated Convergence Theorem). If Xn → X pointwise
and |Xn | ≤ Y with E[Y ] < ∞, then:
lim E[Xn ] = E[X]
n→∞
Proof Sketch. 1. Apply Fatou’s Lemma to Y + Xn and Y − Xn :
E[X] ≤ lim inf E[Xn ]
E[−X] ≤ lim inf E[−Xn ] = − lim sup E[Xn ]
2. Combine to get:
lim sup E[Xn ] ≤ E[X] ≤ lim inf E[Xn ]
3. Hence the limit exists and equals E[X].
3.5 Convergence of Random Variables
Definition 3.9 (Modes of Convergence). For random variables {Xn } and
X:
• Almost sure: P (limn→∞ Xn = X) = 1
• Probability: ∀ϵ > 0, limn→∞ P (|Xn − X| > ϵ) = 0
• Lp : limn→∞ E[|Xn − X|p ] = 0
• Distribution: limn→∞ FXn (x) = FX (x) at continuity points
Theorem 3.10 (Relationships Between Convergences).
a.s. ⇒ Probability ⇒ Distribution
Lp ⇒ Probability
Probability ⇒ Subsequence a.s.
Proof of Probability ⇒ Subsequence a.s. 1. Choose ϵk = 1/k and nk such that:
P(|X_{nk} − X| > 1/k) < 1/2^k
2. By Borel-Cantelli, P(|X_{nk} − X| > 1/k i.o.) = 0.
3. Hence X_{nk} → X almost surely.
Chapter 4
Probability Foundations
4.1 Classical Probability
Definition 4.1 (Classical Probability). For a finite sample space Ω with N equally likely outcomes:
P(A) = |A| / |Ω| = (number of favorable outcomes) / (total number of outcomes)
Theorem 4.2 (Properties of Classical Probability). For any events A, B ⊆ Ω:
1. 0 ≤ P(A) ≤ 1
2. P(Ω) = 1
3. If A ∩ B = ∅, then P(A ∪ B) = P(A) + P(B)
4. P(A^c) = 1 − P(A)
Proof. 1. Since 0 ≤ |A| ≤ N, we have 0 ≤ |A|/N ≤ 1.
2. P(Ω) = N/N = 1.
3. If A ∩ B = ∅, then |A ∪ B| = |A| + |B|, so:
P(A ∪ B) = (|A| + |B|)/N = P(A) + P(B)
4. Since A ∩ A^c = ∅ and A ∪ A^c = Ω:
1 = P(Ω) = P(A) + P(A^c) ⇒ P(A^c) = 1 − P(A)
4.2 Empirical Probability
Definition 4.3 (Empirical Probability). For an event A occurring nA times in n trials:
P(A) = lim_{n→∞} nA/n
Theorem 4.4 (Strong Law of Large Numbers). For i.i.d. trials with true probability p:
P(lim_{n→∞} nA/n = p) = 1
Sketch. 1. Use the Borel-Cantelli lemma to show:
∀ϵ > 0, Σ_{n=1}^∞ P(|nA/n − p| > ϵ) < ∞
2. Apply Kolmogorov's zero-one law to conclude almost sure convergence.
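The convergence nA/n → p can be watched in simulation; a sketch with a fair coin (the seed and the sample sizes are arbitrary choices made for reproducibility):

```python
import random

# Simulated relative frequencies n_A/n for a fair coin (p = 1/2):
# by the SLLN the sequence settles near p as n grows.
random.seed(42)  # fixed seed so the run is reproducible

p = 0.5
freqs = {}
hits = 0
for n in range(1, 100_001):
    hits += random.random() < p
    if n in (10, 1000, 100_000):
        freqs[n] = hits / n
print(freqs)
deviation = abs(freqs[100_000] - p)
```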
4.3 Subjective Probability
Definition 4.5 (Subjective Probability). A personal degree of belief P(A) ∈ [0, 1] satisfying:
1. P(Ω) = 1
2. For mutually exclusive A1, ..., An:
P(⋃_{i=1}^n Ai) = Σ_{i=1}^n P(Ai)
Theorem 4.6 (Dutch Book Argument). A subjective probabilist who violates
the probability axioms can be forced into a sure loss (Dutch book).
Proof. Suppose an agent assigns P(A) + P(A^c) < 1 for some event A. At the agent's own prices, a bookmaker can buy from the agent:
• a ticket paying 1 if A occurs, at price P(A)
• a ticket paying 1 if A^c occurs, at price P(A^c)
The agent collects P(A) + P(A^c) < 1 in stakes but must pay out exactly 1 whichever outcome occurs, a guaranteed loss.
4.4 Axiomatic Probability
Definition 4.7 (Probability Space). A triple (Ω, F , P ) where:
• Ω is the sample space
• F is a σ-algebra on Ω
• P : F → [0, 1] satisfies Kolmogorov’s axioms
Theorem 4.8 (Extension Theorem). Any countably additive P on an algebra A has a unique extension to σ(A).
Key Steps. 1. Define the outer measure:
P*(E) = inf {Σ_{i=1}^∞ P(Ai) : E ⊆ ⋃_{i=1}^∞ Ai, Ai ∈ A}
2. Show M = {E ⊆ Ω : P*(F) = P*(F ∩ E) + P*(F ∩ E^c) for all F ⊆ Ω} is a σ-algebra containing A.
3. P*|M is the desired extension.
4.5 Comparison of Probability Interpretations
Theorem 4.9 (Equivalence in Regular Cases). For i.i.d. trials with equally
likely outcomes:
Pclassical (A) = Pempirical (A) = Psubjective (A)
when all are well-defined.
Proof. 1. Classical: By symmetry, P(A) = |A|/|Ω|.
2. Empirical: By the SLLN, nA/n → |A|/|Ω| almost surely.
3. Subjective: For symmetric beliefs, one must assign P(A) = |A|/|Ω| to avoid a Dutch book.
Remark 4.10. The axiomatic framework subsumes all interpretations when
properly specified.
4.6 Basic Properties of Probability Measures
Theorem 4.11 (Fundamental Properties). For any probability space (Ω, F , P )
and events A, B ∈ F :
1. P (∅) = 0
2. P (Ac ) = 1 − P (A)
3. If A ⊆ B, then P (B \ A) = P (B) − P (A)
4. P (A ∪ B) = P (A) + P (B) − P (A ∩ B)
5. Boole's Inequality: P(⋃_{i=1}^n Ai) ≤ Σ_{i=1}^n P(Ai)
Proof of Property 3. For A ⊆ B:
B = A ∪ (B \ A) (disjoint union)
P (B) = P (A) + P (B \ A) by additivity
⇒ P (B \ A) = P (B) − P (A)
Proof of Property 4. Using the disjoint decomposition:
A ∪ B = (A \ B) ∪ (A ∩ B) ∪ (B \ A)
P (A ∪ B) = P (A \ B) + P (A ∩ B) + P (B \ A)
= [P (A \ B) + P (A ∩ B)] + [P (B \ A) + P (A ∩ B)] − P (A ∩ B)
= P (A) + P (B) − P (A ∩ B)
4.7 Continuity Theorems
Theorem 4.12 (Continuity from Below). For any increasing sequence A1 ⊆ A2 ⊆ · · · with An ↑ A = ⋃_{n=1}^∞ An:
lim_{n→∞} P(An) = P(A)
Proof. Construct disjoint sets B1 = A1, Bn = An \ An−1 for n ≥ 2. Then:
A = ⋃_{n=1}^∞ Bn   (disjoint union)
P(A) = Σ_{n=1}^∞ P(Bn)   by countable additivity
     = lim_{N→∞} Σ_{n=1}^N P(Bn) = lim_{N→∞} P(A_N)
Theorem 4.13 (Continuity from Above). For any decreasing sequence A1 ⊇ A2 ⊇ · · · with An ↓ A = ⋂_{n=1}^∞ An:
lim_{n→∞} P(An) = P(A)
Proof. Consider the complements A^c_n, which form an increasing sequence:
A^c_n ↑ A^c = ⋃_{n=1}^∞ A^c_n
lim_{n→∞} P(An) = 1 − lim_{n→∞} P(A^c_n) = 1 − P(A^c) = P(A)
4.8 Inclusion-Exclusion Principle
Theorem 4.14 (General Inclusion-Exclusion). For events A1, . . . , An:
P(⋃_{i=1}^n Ai) = Σ_{k=1}^n (−1)^{k+1} Sk
where:
Sk = Σ_{1≤i1<···<ik≤n} P(A_{i1} ∩ · · · ∩ A_{ik})
Proof. By induction on n. The base case (n = 2) is the familiar formula:
P(A1 ∪ A2) = P(A1) + P(A2) − P(A1 ∩ A2)
The inductive step uses:
P(⋃_{i=1}^{n+1} Ai) = P(⋃_{i=1}^n Ai) + P(An+1) − P(⋃_{i=1}^n (Ai ∩ An+1))
and applies the inductive hypothesis to both unions.
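The theorem is easy to check mechanically; the sketch below compares the alternating sum against the directly computed union probability for three illustrative events on the fair-die space:

```python
from itertools import combinations
from fractions import Fraction

# Inclusion-exclusion (Theorem 4.14) on the fair-die space: the
# alternating sum over the S_k reproduces P(A_1 ∪ ... ∪ A_n) exactly.
Omega = {1, 2, 3, 4, 5, 6}

def P(A):
    return Fraction(len(A), 6)

events = [{2, 4, 6}, {2, 3, 5}, {4, 5, 6}]   # even, prime, ≥ 4

def union_prob_incl_excl(events):
    n = len(events)
    total = Fraction(0)
    for k in range(1, n + 1):
        S_k = sum(P(set.intersection(*sub)) for sub in combinations(events, k))
        total += (-1) ** (k + 1) * S_k
    return total

direct = P(set.union(*events))
via_ie = union_prob_incl_excl(events)
print(direct, via_ie)
```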
4.9 Advanced Properties
Theorem 4.15 (Bonferroni Inequalities). For any events A1, . . . , An and m ≤ n:
1. For odd m: P(⋃_{i=1}^n Ai) ≤ Σ_{k=1}^m (−1)^{k+1} Sk
2. For even m: P(⋃_{i=1}^n Ai) ≥ Σ_{k=1}^m (−1)^{k+1} Sk
Theorem 4.16 (Fréchet Inequalities). For any events A1, . . . , An:
max(0, Σ_{i=1}^n P(Ai) − (n − 1)) ≤ P(⋂_{i=1}^n Ai) ≤ min_i P(Ai)
max_i P(Ai) ≤ P(⋃_{i=1}^n Ai) ≤ min(1, Σ_{i=1}^n P(Ai))
Proof of Lower Bound. Use induction and:
P(A1 ∩ A2) ≥ P(A1) + P(A2) − 1
The general case follows by iterating this inequality.
4.10 Applications and Examples
Example 4.17 (Poincaré's Formula). For events A1, . . . , An:
P(⋂_{i=1}^n A^c_i) = 1 − P(⋃_{i=1}^n Ai) = Σ_{k=0}^n (−1)^k Sk
where S0 = 1. This gives the probability that none of the events occur.
Example 4.18 (Matching Problem). For random permutations, the probability of at least one fixed point is:
1 − Σ_{k=0}^n (−1)^k / k! ≈ 1 − e⁻¹ ≈ 0.632
as n → ∞, using inclusion-exclusion.
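For small n both the exact answer and the series can be computed; a brute-force check over all permutations of 7 elements (n = 7 is an arbitrary illustrative size):

```python
from math import factorial, e
from itertools import permutations

# Matching problem: probability of at least one fixed point, computed
# by brute force over all 7! permutations and by the
# inclusion-exclusion series 1 − Σ_{k=0}^{n} (−1)^k / k!.
n = 7
fixed = sum(any(p[i] == i for i in range(n)) for p in permutations(range(n)))
brute = fixed / factorial(n)
series = 1 - sum((-1) ** k / factorial(k) for k in range(n + 1))
print(brute, series, 1 - 1 / e)  # already close to 1 − 1/e at n = 7
```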
Chapter 5
Discrete Probability Theory
5.1 Discrete Probability Spaces
Definition 5.1 (Discrete Probability Space). A probability space (Ω, F , P )
is discrete if:
• Ω is countable (finite or countably infinite)
• F = P(Ω) (the power set)
• There exists a probability mass function p : Ω → [0, 1] with:
P(A) = Σ_{ω∈A} p(ω) for all A ⊆ Ω
Theorem 5.2 (Properties of Discrete Measures). For any discrete probability space:
1. P({ω}) = p(ω) for all ω ∈ Ω
2. Σ_{ω∈Ω} p(ω) = 1
3. For any A ⊆ Ω, P(A) = Σ_{ω∈A} P({ω})
Proof. 1. By definition with A = {ω}.
2. Since P(Ω) = 1 and Ω is countable:
1 = P(Ω) = Σ_{ω∈Ω} P({ω}) = Σ_{ω∈Ω} p(ω)
3. Follows directly from countable additivity.
5.2 Finite Probability Spaces
Definition 5.3 (Finite Equiprobable Space). A discrete space with |Ω| = N < ∞ and:
p(ω) = 1/N for all ω ∈ Ω
Theorem 5.4 (Combinatorial Probability). In a finite equiprobable space:
P(A) = |A| / |Ω| = (number of favorable outcomes) / (total number of outcomes)
Proof.
P(A) = Σ_{ω∈A} p(ω) = Σ_{ω∈A} 1/N = |A|/N
Example 5.5 (Poker Hands). The probability of a full house (3 cards of one rank, 2 of another):
P(Full House) = [13 · C(4,3) · 12 · C(4,2)] / C(52,5) ≈ 0.00144
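The count behind Example 5.5 can be reproduced with binomial coefficients via math.comb:

```python
from math import comb

# Full house count: choose the rank of the triple, 3 of its 4 suits,
# then the rank of the pair and 2 of its 4 suits.
favorable = 13 * comb(4, 3) * 12 * comb(4, 2)
total = comb(52, 5)
p_full_house = favorable / total
print(favorable, total, round(p_full_house, 5))
```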
5.3 Countable Probability Spaces
Definition 5.6 (Geometric Distribution). For 0 < p < 1, the geometric distribution on N has:
p(k) = (1 − p)^{k−1} p  for k = 1, 2, 3, . . .
Theorem 5.7 (Memoryless Property). For geometric X and k, m ≥ 1:
P (X > k + m | X > k) = P (X > m)
Proof.
P(X > k + m | X > k) = P(X > k + m) / P(X > k) = (1 − p)^{k+m} / (1 − p)^k = (1 − p)^m = P(X > m)
5.4 Discrete Random Variables
Definition 5.8 (Discrete Random Variable). A function X : Ω → R where
X(Ω) is countable.
Theorem 5.9 (Probability Mass Function). For discrete X, the PMF pX : R → [0, 1] is:
pX(x) = P(X = x) = P({ω : X(ω) = x})
satisfying:
1. pX(x) ≥ 0 for all x
2. Σ_{x∈X(Ω)} pX(x) = 1
Proof. 1. Since P is non-negative.
2. Since X(Ω) is countable and P(Ω) = 1:
Σ_{x∈X(Ω)} P(X = x) = P(⋃_{x∈X(Ω)} {ω : X(ω) = x}) = P(Ω) = 1
5.5 Important Discrete Distributions
Definition 5.10 (Poisson Distribution). For λ > 0, the Poisson PMF is:
p(k) = e^{−λ} λ^k / k!  for k = 0, 1, 2, . . .
Theorem 5.11 (Poisson Limit Theorem). If n → ∞ with npn → λ, then:
C(n, k) pn^k (1 − pn)^{n−k} → e^{−λ} λ^k / k!
Sketch. 1. With λn = npn, write:
C(n, k) pn^k (1 − pn)^{n−k} = [n(n − 1) · · · (n − k + 1) / n^k] · (λn^k / k!) · (1 − λn/n)^{n−k}
2. Take limits:
n(n − 1) · · · (n − k + 1) / n^k → 1
(1 − λn/n)^{n−k} → e^{−λ}
5.6 Expectation and Variance
Definition 5.12 (Expectation). For discrete X with PMF pX:
E[X] = Σ_{x∈X(Ω)} x pX(x)
when the sum converges absolutely.
Theorem 5.13 (Linearity of Expectation). For discrete X, Y and a, b ∈ R:
E[aX + bY ] = aE[X] + bE[Y ]
Proof.
E[aX + bY] = Σ_{ω∈Ω} (aX(ω) + bY(ω)) P({ω})
           = a Σ_ω X(ω) P({ω}) + b Σ_ω Y(ω) P({ω})
           = aE[X] + bE[Y]
Definition 5.14 (Variance).
Var(X) = E[(X − E[X])2 ] = E[X 2 ] − (E[X])2
Example 5.15 (Binomial Variance). For X ∼ Binomial(n, p):
Var(X) = np(1 − p)
Chapter 6
Continuous Probability Theory
6.1 Continuous Probability Spaces
Definition 6.1 (Continuous Probability Space). A probability space (Ω, F , P )
is continuous if:
• Ω is uncountable (typically Rn )
• F contains the Borel σ-algebra
• There exists a probability density function f : Ω → [0, ∞) with:
P(A) = ∫_A f(x) dx for all measurable A ⊆ Ω
Theorem 6.2 (Properties of Continuous Measures). For any continuous probability space:
1. P({x}) = 0 for all x ∈ Ω
2. ∫_{−∞}^{∞} f(x) dx = 1
3. For any a ≤ b, P([a, b]) = ∫_a^b f(x) dx
Proof. 1. Singletons have zero Lebesgue measure.
2. By normalization:
P(Ω) = ∫_Ω f(x) dx = 1
3. Follows from the definition of Lebesgue integration.
6.2 Important Continuous Distributions
Definition 6.3 (Uniform Distribution). For a < b, the uniform PDF on [a, b] is:
f(x) = 1/(b − a) for x ∈ [a, b], and f(x) = 0 otherwise
Theorem 6.4 (Uniform Properties). For X ∼ Uniform(a, b):
1. E[X] = (a + b)/2
2. Var(X) = (b − a)²/12
Proof. 1. E[X] = ∫_a^b x · 1/(b − a) dx = (b² − a²)/(2(b − a)) = (a + b)/2
2. E[X²] = ∫_a^b x² · 1/(b − a) dx = (b³ − a³)/(3(b − a))
Var(X) = E[X²] − (E[X])² = (b − a)²/12
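Both formulas can be checked by simple numerical integration (midpoint rule; the interval [2, 5] and the grid size are arbitrary):

```python
# Check E[X] = (a+b)/2 and Var(X) = (b−a)²/12 for Uniform(a, b)
# with a midpoint-rule numerical integration against the density.
a, b = 2.0, 5.0
N = 100_000
dx = (b - a) / N
xs = [a + (i + 0.5) * dx for i in range(N)]
density = 1 / (b - a)

mean = sum(x * density * dx for x in xs)
second = sum(x * x * density * dx for x in xs)
var = second - mean**2
print(mean, var)   # ≈ 3.5 and ≈ 0.75
```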
6.3 Continuous Random Variables
Definition 6.5 (Continuous Random Variable). A measurable function X :
Ω → R with absolutely continuous distribution.
Theorem 6.6 (Cumulative Distribution Function). For continuous X, the CDF FX : R → [0, 1] is:
FX(x) = P(X ≤ x) = ∫_{−∞}^x fX(t) dt
and satisfies:
1. lim_{x→−∞} FX(x) = 0, lim_{x→∞} FX(x) = 1
2. FX is absolutely continuous
3. fX(x) = d/dx FX(x) almost everywhere
6.4 Normal Distribution
Definition 6.7 (Standard Normal). The standard normal PDF is:
ϕ(x) = (1/√(2π)) e^{−x²/2}
with CDF Φ(x) = ∫_{−∞}^x ϕ(t) dt.
Theorem 6.8 (General Normal). For X ∼ N(µ, σ²):
fX(x) = (1/(σ√(2π))) e^{−(x−µ)²/(2σ²)}
6.5 Expectation and Variance
Definition 6.9 (Expectation). For continuous X with PDF fX:
E[X] = ∫_{−∞}^{∞} x fX(x) dx
when the integral converges absolutely.
Theorem 6.10 (Law of the Unconscious Statistician). For measurable g:
E[g(X)] = ∫_{−∞}^{∞} g(x) fX(x) dx
Definition 6.11 (Variance).
Var(X) = E[(X − E[X])²] = ∫_{−∞}^{∞} (x − µ)² fX(x) dx
Chapter 7
Distribution Functions and
Their Properties
7.1 Cumulative Distribution Functions
Definition 7.1 (Cumulative Distribution Function). For a random variable
X on (Ω, F , P ), the CDF FX : R → [0, 1] is:
FX (x) = P (X ≤ x) = P ({ω ∈ Ω : X(ω) ≤ x})
Theorem 7.2 (Characterization of CDFs). A function F : R → [0, 1] is a
CDF iff:
1. F is non-decreasing
2. F is right-continuous: limy↓x F (y) = F (x)
3. limx→−∞ F (x) = 0 and limx→∞ F (x) = 1
Proof. (⇒) 1. Monotonicity: For x ≤ y, {X ≤ x} ⊆ {X ≤ y} implies
F (x) ≤ F (y).
2. Right-continuity: For xn ↓ x, {X ≤ xn } ↓ {X ≤ x}, so by continuity
from above:
F (xn ) = P (X ≤ xn ) ↓ P (X ≤ x) = F (x)
3. Limits: {X ≤ x} ↓ ∅ as x → −∞ and {X ≤ x} ↑ Ω as x → ∞.
(⇐) Construction of a probability measure via the Carathéodory extension theorem.
7.2 Types of Distributions
Definition 7.3 (Discrete Distribution). A random variable X is discrete if its range is countable. Its CDF is a step function:
FX(x) = Σ_{xi ≤ x} p(xi)
where p(xi) = P(X = xi) is the probability mass function.
Example 7.4 (Bernoulli Distribution). For X ∼ Bernoulli(p):
FX(x) = 0 for x < 0, FX(x) = 1 − p for 0 ≤ x < 1, FX(x) = 1 for x ≥ 1
with jumps of size 1 − p at 0 and p at 1.
Definition 7.5 (Absolutely Continuous Distribution). X is absolutely continuous if its CDF can be expressed as:
FX(x) = ∫_{−∞}^x fX(t) dt
for some density function fX ≥ 0 with ∫_{−∞}^{∞} fX(t) dt = 1.
Theorem 7.6 (Radon-Nikodym for Probability Measures). If FX is absolutely continuous, then fX = F′X almost everywhere.
Example 7.7 (Normal Distribution). For X ∼ N(µ, σ²):
FX(x) = (1/(σ√(2π))) ∫_{−∞}^x e^{−(t−µ)²/(2σ²)} dt
which is differentiable everywhere with:
fX(x) = (1/(σ√(2π))) e^{−(x−µ)²/(2σ²)}
7.3 Decomposition of Distributions
Theorem 7.8 (Lebesgue Decomposition). Any CDF F can be uniquely decomposed as:
F = αFd + βFac + γFsc
where α + β + γ = 1 and:
• Fd is discrete (pure jumps)
• Fac is absolutely continuous
• Fsc is singular continuous (continuous but not absolutely continuous)
Example 7.9 (Cantor Distribution). The Cantor function Fsc is:
• Continuous everywhere
• Differentiable almost everywhere with F′_{sc} = 0 a.e.
• Not absolutely continuous
• Supported on the Cantor set (measure zero)
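The Cantor function itself is easy to evaluate from the ternary expansion of its argument. A minimal sketch (the digit-by-digit algorithm below is the standard construction, not taken from the text; `cantor` is our name):

```python
def cantor(x, depth=40):
    """Approximate the Cantor function on [0, 1] from ternary digits of x.

    A ternary digit 1 ends the expansion (x lies in a removed middle third,
    where the function is constant); digits 0 and 2 become binary 0 and 1.
    """
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    value, weight = 0.0, 0.5
    for _ in range(depth):
        x *= 3.0
        digit = int(x)
        x -= digit
        if digit == 1:
            return value + weight
        value += weight * (digit // 2)
        weight /= 2.0
    return value

# Continuous with derivative 0 a.e., yet it climbs from 0 to 1:
assert cantor(0.0) == 0.0 and cantor(1.0) == 1.0
# 1/4 has ternary expansion 0.020202..., so its value is 0.0101..._2 = 1/3
assert abs(cantor(0.25) - 1 / 3) < 1e-8
# Monotone on a grid (up to floating-point digit noise)
ys = [cantor(i / 500) for i in range(501)]
assert all(a <= b + 1e-6 for a, b in zip(ys, ys[1:]))
```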
7.4 Convergence of Distribution Functions
Definition 7.10 (Weak Convergence). A sequence Fn converges weakly
to F (Fn ⇒ F ) if:
    \lim_{n \to \infty} F_n(x) = F(x)
at all continuity points x of F .
Theorem 7.11 (Helly’s Selection Theorem). Every sequence of CDFs has a
subsequence that converges weakly to some right-continuous, non-decreasing
function F (possibly defective: F (∞) − F (−∞) < 1).
Theorem 7.12 (Lévy Continuity Theorem). For characteristic functions
ϕ_n(t) = E[e^{itX_n}]:

    F_n \Rightarrow F \iff \varphi_n(t) \to \varphi(t) \ \text{for all } t

where ϕ is the characteristic function of F.
7.5 Empirical Distribution Functions
Definition 7.13 (Empirical CDF). For i.i.d. samples X_1, . . . , X_n from F,
the empirical CDF is:

    \hat{F}_n(x) = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}_{(-\infty, x]}(X_i)
Theorem 7.14 (Glivenko-Cantelli).

    \sup_{x \in \mathbb{R}} |\hat{F}_n(x) - F(x)| \xrightarrow{a.s.} 0

Proof. 1. For fixed x, \hat{F}_n(x) → F(x) a.s. by the SLLN applied to the
indicators 1{X_i ≤ x}.
2. Uniform convergence follows from the monotonicity of F and \hat{F}_n:
pointwise a.s. convergence on a finite grid of quantiles of F controls the
supremum over all of R.
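A seeded simulation illustrates the uniform convergence. This is a sketch, not part of the text: the Kolmogorov-Smirnov-style statistic below exploits the fact that the supremum of |F̂_n − F| is attained at (or just before) a sample point.

```python
import math
import random

random.seed(0)

def ks_distance(samples, cdf):
    """sup_x |F_hat_n(x) - F(x)| for the empirical CDF of `samples`."""
    xs = sorted(samples)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        fx = cdf(x)
        # The sup is attained at a sample point, from the left or the right
        d = max(d, abs(i / n - fx), abs((i - 1) / n - fx))
    return d

exp_cdf = lambda x: 1.0 - math.exp(-x)  # Exponential(1) CDF

d_small = ks_distance([random.expovariate(1.0) for _ in range(100)], exp_cdf)
d_large = ks_distance([random.expovariate(1.0) for _ in range(10000)], exp_cdf)

# By the DKW inequality, sup-distance > 3/sqrt(n) has probability <= 2e^{-18}
assert d_small < 3 / math.sqrt(100)
assert d_large < 3 / math.sqrt(10000)
```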
Theorem 7.15 (Donsker’s Theorem).
    \sqrt{n}(\hat{F}_n - F) \Rightarrow B \circ F
where B is a standard Brownian bridge.
7.6 Applications and Examples
Example 7.16 (Exponential Distribution Validation). For F (x) = 1 − e−λx
(x ≥ 0):
• Right-continuous: limy↓x F (y) = F (x)
• Monotone: F ′ (x) = λe−λx > 0 for x > 0
• Limits: F (0+ ) = 0, F (∞) = 1
Example 7.17 (Rayleigh Distribution). For F(x) = 1 − e^{−x²/2} (x ≥ 0):
• F′(x) = x e^{−x²/2} is non-negative
• \int_0^\infty x e^{-x^2/2} \, dx = 1
• Corresponds to X = \sqrt{Y_1^2 + Y_2^2} where Y_1, Y_2 ∼ N(0, 1) are independent
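The distributional identity in the last bullet can be checked by seeded simulation. A sketch (sample size, seed, and evaluation point are our arbitrary choices):

```python
import math
import random

random.seed(1)
n = 100_000
# X = sqrt(Y1^2 + Y2^2) with Y1, Y2 independent standard normals
xs = [math.hypot(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(n)]

x0 = 1.0
empirical = sum(1 for x in xs if x <= x0) / n
theoretical = 1.0 - math.exp(-x0 ** 2 / 2)  # F(1) = 1 - e^{-1/2} ~ 0.3935

# Standard error here is about 0.0015, so 0.02 is a generous tolerance
assert abs(empirical - theoretical) < 0.02
```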
Appendix A
Detailing on selected concepts
A.1 De Morgan’s Laws
Complete Proof of Both Laws. We provide a more detailed proof using logical
equivalences:
1. First Law: (A ∪ B)^c = A^c ∩ B^c
For any x ∈ Ω:
x ∈ (A ∪ B)^c ⇔ x ∉ A ∪ B
⇔ ¬(x ∈ A or x ∈ B)
⇔ x ∉ A and x ∉ B    (by De Morgan's laws in logic)
⇔ x ∈ A^c and x ∈ B^c
⇔ x ∈ A^c ∩ B^c
2. Second Law: (A ∩ B)^c = A^c ∪ B^c
For any x ∈ Ω:
x ∈ (A ∩ B)^c ⇔ x ∉ A ∩ B
⇔ ¬(x ∈ A and x ∈ B)
⇔ x ∉ A or x ∉ B    (by De Morgan's laws in logic)
⇔ x ∈ A^c or x ∈ B^c
⇔ x ∈ A^c ∪ B^c
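The element-wise argument above translates directly into finite set operations, so both laws can be checked exhaustively on a small universe (a sketch with a concrete Ω of our choosing):

```python
from itertools import combinations

Omega = set(range(6))

def complement(S):
    # Complement relative to the universe Omega
    return Omega - S

# All 2^6 = 64 subsets of Omega
subsets = [set(c) for r in range(len(Omega) + 1)
           for c in combinations(Omega, r)]

for A in subsets:
    for B in subsets:
        # First law: (A u B)^c = A^c n B^c
        assert complement(A | B) == complement(A) & complement(B)
        # Second law: (A n B)^c = A^c u B^c
        assert complement(A & B) == complement(A) | complement(B)
```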
A.2 Validating Borel Sets
Definition A.1 (Borel Set). A Borel set is any set in a topological space
that can be formed from open sets (or equivalently, closed sets) through:
• Countable unions
• Countable intersections
• Complements
The collection of all Borel sets forms a σ-algebra called the Borel σ-algebra,
denoted B(X).
A.2.1 Validation Method
To verify if a set A is Borel:
1. Check if A can be constructed from open/closed sets using countable
operations
2. Verify membership in the Borel σ-algebra:
A ∈ B(X)
3. Alternatively, locate it at some level of the Borel hierarchy (see Theorem A.4)
Example A.2 (Standard Borel Sets). The following are Borel sets in R with
the usual topology:
• All open intervals (a, b)
• All closed intervals [a, b]
• The rational numbers Q (countable union of singletons)
• The Cantor set (closed, hence Borel; a countable intersection of finite unions of closed intervals)
Rationals are Borel:

    \mathbb{Q} = \bigcup_{q \in \mathbb{Q}} \{q\}

Each singleton {q} is closed (hence Borel), and the countable union preserves
Borel measurability.
Example A.3 (Constructing Borel Sets). The set A = (0, 1) ∪ {2} is Borel
because:
• (0, 1) is open (hence Borel)
• {2} is closed (hence Borel)
• The union of two Borel sets is Borel
A.2.2 Non-Borel Examples
Example (Vitali Set). There exists a subset V ⊂ [0, 1] that is not Borel
measurable. Such sets:
• Require the Axiom of Choice to construct
• Are not Lebesgue measurable
• Cannot be obtained through countable operations on open/closed sets
Example (Non-Borel Sets in a Product Space). Let A ⊂ R² be the set of points
with both coordinates rational:

    A = \mathbb{Q} \times \mathbb{Q}

While this appears simple, if we equip R² with the σ-algebra generated by
products of Borel sets, A is Borel. However, more exotic sets like:
B = {(x, y) : x − y ∈ V }
where V is a Vitali set, are typically non-Borel.
A.2.3 Verification Techniques
Theorem A.4 (Borel Hierarchy). The Borel σ-algebra can be stratified into
levels:
• Σ01 : Open sets
• Π01 : Closed sets
• Σ02 : Countable unions of closed sets (Fσ )
• Π02 : Countable intersections of open sets (Gδ )
• And so on through transfinite induction
A set is Borel if it appears at some level of this hierarchy.
Example A.5 (Gδ but not Fσ). The set of irrational numbers R \ Q is:
• Gδ: can be written as \bigcap_{q \in \mathbb{Q}} (\mathbb{R} \setminus \{q\})
• Not Fσ in R (this requires the Baire category theorem)
but still Borel, as it appears in the hierarchy.
A.3 Borel-Cantelli Lemma
Theorem A.6 (Borel-Cantelli Lemma). Let (Ω, F, P) be a probability space
and {A_n}_{n=1}^∞ a sequence of events.
1. If \sum_{n=1}^{\infty} P(A_n) < \infty, then P(\limsup A_n) = 0.
2. If {A_n} are independent and \sum_{n=1}^{\infty} P(A_n) = \infty, then
P(\limsup A_n) = 1.
where

    \limsup A_n = \bigcap_{n=1}^{\infty} \bigcup_{k=n}^{\infty} A_k

(“infinitely many A_n occur").
Proof of Part 1.

    P(\limsup A_n) = P\left( \bigcap_{n=1}^{\infty} \bigcup_{k=n}^{\infty} A_k \right)
                   = \lim_{n \to \infty} P\left( \bigcup_{k=n}^{\infty} A_k \right)    (continuity of measure)
                   \le \lim_{n \to \infty} \sum_{k=n}^{\infty} P(A_k)    (subadditivity)
                   = 0

since the tail of a convergent series tends to 0.
Proof of Part 2. Using independence and the inequality 1 − x ≤ e^{−x}:

    P\left( \bigcap_{k=n}^{N} A_k^c \right) = \prod_{k=n}^{N} (1 - P(A_k))
        \le \exp\left( -\sum_{k=n}^{N} P(A_k) \right) \to 0 \quad \text{as } N \to \infty

Thus:

    P\left( \bigcup_{k=n}^{\infty} A_k \right) = 1 - P\left( \bigcap_{k=n}^{\infty} A_k^c \right) = 1

and:

    P(\limsup A_n) = \lim_{n \to \infty} P\left( \bigcup_{k=n}^{\infty} A_k \right) = 1
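The dichotomy can be illustrated by a seeded simulation: for independent events A_n with P(A_n) = p_n, a summable tail almost never fires past a cutoff, while a divergent tail almost always does. A sketch (the cutoff, horizon, and trial count are arbitrary choices of ours):

```python
import random

random.seed(2)

def tail_occurrence_fraction(p, trials=50, start=101, horizon=2000):
    """Fraction of trials in which some A_n with n >= start occurs,
    where the A_n are independent events with P(A_n) = p(n)."""
    hits = 0
    for _ in range(trials):
        if any(random.random() < p(n) for n in range(start, horizon + 1)):
            hits += 1
    return hits / trials

# Summable: sum 1/n^2 < infinity, tail probability past n=100 is about 0.01
frac_summable = tail_occurrence_fraction(lambda n: 1.0 / n ** 2)
# Divergent: sum 1/n = infinity; here P(some A_n, 101 <= n <= 2000)
# equals 1 - prod(1 - 1/n) = 1 - 100/2000 = 0.95 exactly
frac_divergent = tail_occurrence_fraction(lambda n: 1.0 / n)

assert frac_summable < 0.3
assert frac_divergent > 0.5
```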
A.4 Radon-Nikodym Theorem
Definition A.7 (Absolute Continuity). A measure ν is absolutely con-
tinuous with respect to µ (denoted ν ≪ µ) if:
µ(E) = 0 =⇒ ν(E) = 0 ∀E ∈ F
Theorem A.8 (Radon-Nikodym). Let (Ω, F) be a measurable space with
σ-finite measures µ and ν. If ν ≪ µ, then there exists a measurable f : Ω →
[0, ∞) such that:

    \nu(E) = \int_E f \, d\mu \quad \forall E \in \mathcal{F}

This f is called the Radon-Nikodym derivative, denoted dν/dµ.
Sketch of Proof. The proof proceeds in several steps:
1. Finite Measures Case:
• Consider the set F = \{ f \ge 0 : \int_E f \, d\mu \le \nu(E) \ \forall E \}
• Take f_0 to be a maximal element (exists by Zorn's Lemma)
• Show that E ↦ \nu(E) - \int_E f_0 \, d\mu defines the zero measure
2. σ-finite Extension:
• Decompose Ω = \bigcup_{n=1}^{\infty} Ω_n with µ(Ω_n), ν(Ω_n) < ∞
• Apply the finite case to each Ω_n to get f_n
• Define f = \sum_{n=1}^{\infty} f_n 1_{Ω_n}
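On a finite space the theorem is elementary: when µ assigns positive mass to every point, dν/dµ is simply the pointwise ratio of the masses. A minimal sketch (the toy measures below are ours):

```python
from itertools import combinations

Omega = ["a", "b", "c"]
mu = {"a": 0.2, "b": 0.3, "c": 0.5}   # reference measure, positive everywhere
nu = {"a": 0.1, "b": 0.6, "c": 0.3}   # nu << mu holds trivially here

# Radon-Nikodym derivative: f = dnu/dmu is the pointwise ratio of masses
f = {w: nu[w] / mu[w] for w in Omega}

# Check nu(E) = integral over E of f dmu for every event E
for r in range(len(Omega) + 1):
    for E in combinations(Omega, r):
        lhs = sum(nu[w] for w in E)
        rhs = sum(f[w] * mu[w] for w in E)
        assert abs(lhs - rhs) < 1e-12
```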
Corollary A.9 (Existence of Conditional Expectation). For X ∈ L¹(Ω, F, P)
and a sub-σ-algebra G ⊆ F, the conditional expectation E[X | G] is the Radon-
Nikodym derivative of ν(E) = \int_E X \, dP with respect to P|_G.
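For a finite Ω the corollary reduces to averaging over the blocks of the partition that generates G. A sketch with a fair die and G = σ({odd, even}) (the example is ours, not from the text):

```python
# Fair die: Omega = {1,...,6}, P uniform, X(w) = w.
# G is generated by the partition {odd faces} / {even faces}.
Omega = [1, 2, 3, 4, 5, 6]
P = {w: 1 / 6 for w in Omega}
X = {w: float(w) for w in Omega}

partition = [{1, 3, 5}, {2, 4, 6}]

# E[X|G] is constant on each block B, with value (int_B X dP) / P(B),
# i.e. the Radon-Nikodym derivative of E -> int_E X dP w.r.t. P restricted to G.
cond_exp = {}
for B in partition:
    pB = sum(P[w] for w in B)
    value = sum(X[w] * P[w] for w in B) / pB
    for w in B:
        cond_exp[w] = value

assert abs(cond_exp[1] - 3.0) < 1e-12   # average of 1, 3, 5
assert abs(cond_exp[2] - 4.0) < 1e-12   # average of 2, 4, 6
# Tower property: E[E[X|G]] = E[X] = 3.5
assert abs(sum(cond_exp[w] * P[w] for w in Omega) - 3.5) < 1e-12
```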
A.5 Applications
Example A.10 (Borel-Cantelli in Random Walks). For a simple symmetric random
walk {S_n} on Z^d, the event {S_n = 0 infinitely often} has probability 1 if
d ≤ 2 and 0 if d ≥ 3 (Pólya's recurrence theorem).
Example A.11 (Radon-Nikodym in Finance). In risk-neutral pricing, the
Radon-Nikodym derivative connects the physical and risk-neutral measures:

    \frac{dQ}{dP} = \exp\left( -\int_0^T \theta_t \, dW_t - \frac{1}{2} \int_0^T \theta_t^2 \, dt \right)
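In the simplest case of a constant market price of risk θ over a single horizon, the density reduces to exp(−θW_T − θ²T/2), and the change of measure can be checked by weighted Monte Carlo: weighting a P-expectation by dQ/dP gives the Q-expectation, under which W_T has mean −θT. A seeded sketch (parameters and names ours):

```python
import math
import random

random.seed(3)
theta, T, n = 0.5, 1.0, 200_000

est_mean, est_mass = 0.0, 0.0
for _ in range(n):
    w = random.gauss(0.0, math.sqrt(T))               # W_T under P
    L = math.exp(-theta * w - 0.5 * theta ** 2 * T)   # dQ/dP for constant theta
    est_mass += L
    est_mean += w * L
est_mass /= n   # should be ~1: a density ratio integrates to 1 under P
est_mean /= n   # E_P[W_T dQ/dP] = E_Q[W_T] = -theta * T

assert abs(est_mass - 1.0) < 0.02
assert abs(est_mean - (-theta * T)) < 0.03
```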