Random Variables & Independence
STA 211: The Mathematics of Regression
Niccolo Anceschi Ph.D.
Slides adapted from lectures by Prof. Jerry Reiter
Introduction
▶ Probability Theory: the math of randomness
▶ A random variable is an abstract object that can take a set of
possible values with certain probabilities, rather than a single
deterministic value
▶ Useful to model experimental outcomes
▶ Statistical models are probability distributions with unknown
parameters.
▶ The problems considered by probability and statistics are
inverse to each other.
▶ In probability theory, we start from an underlying process whose
randomness or uncertainty is modeled by random variables, and we
work out what outcomes to expect.
▶ In statistics we observe something that has happened, and try
to figure out what underlying process would explain those
observations.
Marginal Distribution and Expectation of RVs
A random variable X ∈ R can be discrete or continuous.
▶ Discrete: The marginal distribution is given by the probability
mass function (PMF) $p_X : \mathbb{Z} \to [0, 1]$ defined as
\[
p_X(x) = \Pr(X = x) \quad \forall x \in \mathbb{Z}
\]
▶ Continuous: The marginal distribution is given by a
probability density function (PDF) $f_X : \mathbb{R} \to [0, \infty)$ defined through
\[
\Pr(a \le X < b) = \int_a^b f_X(x)\, dx
\]
In this case, we have $\Pr(X = x) = 0$ for every $x \in \mathbb{R}$
▶ For a real-valued function $g$, we can calculate expectations:
\[
E(g(X)) =
\begin{cases}
\sum_{x \in \mathbb{Z}} g(x)\, p_X(x) & \text{if } X \text{ is discrete} \\[4pt]
\int_{\mathbb{R}} g(x)\, f_X(x)\, dx & \text{if } X \text{ is continuous}
\end{cases}
\]
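As a quick numerical illustration of the discrete case, the minimal Python sketch below evaluates $E(g(X))$ for a fair six-sided die; the choice $g(x) = x^2$ is arbitrary and only serves as an example.

```python
import numpy as np

# Support and PMF of a fair six-sided die: p_X(x) = 1/6 for x = 1, ..., 6
support = np.arange(1, 7)
pmf = np.full(6, 1 / 6)

# An arbitrary real-valued function g; here g(x) = x^2 as an example
def g(x):
    return x ** 2

# E(g(X)) = sum over the support of g(x) * p_X(x)
expectation = np.sum(g(support) * pmf)
print(expectation)  # 91/6 ≈ 15.17
```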
Marginal distributions do not tell us everything
Suppose I tell you:
▶ X is a toss of a fair die,
▶ Y is a toss of a fair die.
Do you have all the information about X and Y now?
No. Because we can still have many different scenarios:
1. I toss one die X. Y = X is the result of the same toss.
2. I toss two dice: X is the first toss while Y is the second.
3. I toss one die X. Y = 7 − X is the number at the bottom
(not the top) of the tossed die.
In all cases, X and Y have the marginal distribution of a fair die
toss.
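One way to see this concretely is by simulation. The Python sketch below (sample size and seed are arbitrary choices) simulates the three scenarios and shows that Y has essentially the same empirical marginal in each, even though the joint behavior of (X, Y) differs completely.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000  # number of simulated experiments (arbitrary choice)

# Scenario 1: same toss, Y = X
x1 = rng.integers(1, 7, size=n)
y1 = x1

# Scenario 2: two independent tosses
x2 = rng.integers(1, 7, size=n)
y2 = rng.integers(1, 7, size=n)

# Scenario 3: opposite faces, Y = 7 - X
x3 = rng.integers(1, 7, size=n)
y3 = 7 - x3

# In every scenario the empirical marginal of Y is close to 1/6 per face.
for label, y in [("same toss", y1), ("two tosses", y2), ("opposite face", y3)]:
    freqs = np.bincount(y, minlength=7)[1:] / n
    print(label, np.round(freqs, 3))
```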
Joint Distribution for discrete RVs
All the information (called joint distribution) about two discrete
RVs X ,Y is captured by a joint PMF:
\[
p_{X,Y}(x, y) = \Pr(X = x, Y = y),
\]
where the comma is interpreted as an “and” (or intersection).
For example, in the previous scenario:
1. Same toss X = Y: $p_{X,Y}(x, y) = 1/6$ if $x = y \in \{1, \dots, 6\}$
and $p_{X,Y}(x, y) = 0$ otherwise
2. Two tosses X, Y: $p_{X,Y}(x, y) = 1/6^2 = 1/36$ for any $x, y \in \{1, \dots, 6\}$
3. Opposite sides Y = 7 − X: $p_{X,Y}(x, y) = 1/6$ if $x \in \{1, \dots, 6\}$
and $y = 7 - x$, and $p_{X,Y}(x, y) = 0$ otherwise.
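These three joint PMFs can also be written out explicitly as 6 × 6 tables. A minimal Python sketch, with rows indexing x and columns indexing y (a layout chosen here purely for illustration):

```python
import numpy as np

# Scenario 1 (same toss): mass 1/6 on the diagonal x = y
p_same = np.diag(np.full(6, 1 / 6))

# Scenario 2 (two independent tosses): mass 1/36 on every pair (x, y)
p_indep = np.full((6, 6), 1 / 36)

# Scenario 3 (opposite faces): mass 1/6 where y = 7 - x, i.e. the anti-diagonal
p_opposite = np.fliplr(np.diag(np.full(6, 1 / 6)))

# Each table is a valid joint PMF: non-negative and summing to 1
for p in (p_same, p_indep, p_opposite):
    assert np.isclose(p.sum(), 1.0)
```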
Basic properties of the joint PMF
Law of total probability:
\[
\sum_{x \in \mathbb{Z}} \sum_{y \in \mathbb{Z}} p_{X,Y}(x, y) = 1.
\]
Recovery of marginal distributions:
\[
p_X(x) = \sum_{y \in \mathbb{Z}} p_{X,Y}(x, y) \quad \text{and} \quad p_Y(y) = \sum_{x \in \mathbb{Z}} p_{X,Y}(x, y)
\]
Finally, we can calculate the expectation of a function g of X, Y
using the formula:
\[
E(g(X, Y)) = \sum_{x \in \mathbb{Z}} \sum_{y \in \mathbb{Z}} g(x, y)\, p_{X,Y}(x, y)
\]
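A short numerical check of these properties, using the scenario-3 table as an example (any valid joint PMF would work the same way; the function $g(x, y) = (x - y)^2$ is an arbitrary choice):

```python
import numpy as np

faces = np.arange(1, 7)

# Joint PMF of scenario 3 (Y = 7 - X) as a 6x6 table: rows index x, columns index y
p_xy = np.fliplr(np.diag(np.full(6, 1 / 6)))

# Law of total probability: the whole table sums to 1
print(p_xy.sum())  # 1.0

# Marginals: sum over the other variable
p_x = p_xy.sum(axis=1)  # p_X(x) = sum_y p_{X,Y}(x, y)
p_y = p_xy.sum(axis=0)  # p_Y(y) = sum_x p_{X,Y}(x, y)
print(p_x, p_y)  # both are [1/6, ..., 1/6]

# E(g(X, Y)) for, e.g., g(x, y) = (x - y)^2
gx, gy = np.meshgrid(faces, faces, indexing="ij")
print(np.sum((gx - gy) ** 2 * p_xy))  # 70/6 ≈ 11.67
```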
Suggested exercise #1: For X and Y in all previous scenarios 1-3:
(i) Show X and Y have the marginal PMF $p_X(t) = p_Y(t) = 1/6$
if $t \in \{1, \dots, 6\}$.
(ii) Compute $E(X \cdot Y)$.
(iii) Compute $E(X + Y)$.
For the third part, you can use the fact that for any constants
$a, b \in \mathbb{R}$, it holds that
\[
E(aX + bY) = aE(X) + bE(Y).
\]
This is called the linearity of expectations.
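A simulation sketch checking this linearity property numerically, with arbitrary constants a = 2 and b = −3 and two independent tosses (linearity holds regardless of how X and Y depend on each other, so the same check works with Y = X or Y = 7 − X as well):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000  # number of simulated tosses (arbitrary)

# Two independent fair-die tosses (scenario 2)
x = rng.integers(1, 7, size=n)
y = rng.integers(1, 7, size=n)

a, b = 2.0, -3.0  # arbitrary constants
lhs = np.mean(a * x + b * y)            # empirical E(aX + bY)
rhs = a * np.mean(x) + b * np.mean(y)   # a E(X) + b E(Y)
print(lhs, rhs)  # both close to 2 * 3.5 - 3 * 3.5 = -3.5
```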
Joint PDF for continuous RVs
Given two continuous RVs X, Y, their joint distribution is captured
by a joint PDF $f_{X,Y} : \mathbb{R} \times \mathbb{R} \to [0, \infty)$ such that:
\[
\Pr(a_1 \le X < b_1,\, a_2 \le Y < b_2) = \int_{a_2}^{b_2} \int_{a_1}^{b_1} f_{X,Y}(x, y)\, dx\, dy,
\]
where the comma is interpreted as an “and” (or intersection).
Using the above condition, it is possible to show:
▶ Law of total probability: $\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f_{X,Y}(x, y)\, dx\, dy = 1$
▶ Recovery of marginal PDFs: $f_X(x) = \int_{-\infty}^{\infty} f_{X,Y}(x, y)\, dy$
and $f_Y(y) = \int_{-\infty}^{\infty} f_{X,Y}(x, y)\, dx$
Finally, if X and Y are continuous RVs,
\[
E(g(X, Y)) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} g(x, y)\, f_{X,Y}(x, y)\, dx\, dy
\]
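As a rough numerical illustration, the sketch below approximates these double integrals on a grid for one assumed example density, $f_{X,Y}(x, y) = e^{-x-y}$ for $x, y \ge 0$ (two independent Exp(1) variables); the truncation at 20 and the grid resolution are arbitrary choices for the sketch.

```python
import numpy as np

# Example joint PDF (assumption for illustration): f(x, y) = exp(-x - y), x, y >= 0
def f_xy(x, y):
    return np.exp(-x - y)

# Approximate the double integrals with a Riemann sum on a truncated grid
grid = np.linspace(0, 20, 2001)          # [0, 20] covers essentially all the mass here
dx = grid[1] - grid[0]
x, y = np.meshgrid(grid, grid, indexing="ij")
density = f_xy(x, y)

total_mass = density.sum() * dx * dx      # should be close to 1 (law of total probability)
e_xy = (x * y * density).sum() * dx * dx  # E(XY); equals E(X)E(Y) = 1 here by independence
print(total_mass, e_xy)
```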
Independence
Two discrete RVs X, Y are called independent if
\[
p_{X,Y}(x, y) = p_X(x)\, p_Y(y) \quad \forall x, y \in \mathbb{Z}.
\]
Similarly, two continuous RVs are called independent if
\[
f_{X,Y}(x, y) = f_X(x)\, f_Y(y) \quad \forall x, y \in \mathbb{R}.
\]
Thus X and Y are independent if their joint PMF (or PDF) is a
product of their marginals.
Suggested exercise #2: Consider X and Y in the previous
scenarios 1-3. Show that X and Y are independent in scenario 2
and dependent in scenarios 1 and 3.
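In the discrete case, this factorization can be checked numerically by comparing the joint table with the outer product of its marginals. A minimal sketch (the helper name is_independent is an illustrative choice, and the two tables shown are only examples):

```python
import numpy as np

def is_independent(p_xy, tol=1e-12):
    """Check whether a discrete joint PMF table factorizes into its marginals."""
    p_x = p_xy.sum(axis=1)   # marginal of X (rows)
    p_y = p_xy.sum(axis=0)   # marginal of Y (columns)
    return np.allclose(p_xy, np.outer(p_x, p_y), atol=tol)

# Two independent fair-die tosses (scenario 2)
p_indep = np.full((6, 6), 1 / 36)
print(is_independent(p_indep))   # True

# Same toss, Y = X (scenario 1)
p_same = np.diag(np.full(6, 1 / 6))
print(is_independent(p_same))    # False
```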
Measuring dependence using covariance
Given two RVs X , Y ∈ R, one way to measure the dependence
between them is through covariance:
\[
\mathrm{Cov}(X, Y) = E[(X - E(X))(Y - E(Y))].
\]
Correlation is the normalized form defined as:
\[
\mathrm{Corr}(X, Y) = \frac{\mathrm{Cov}(X, Y)}{\sqrt{\mathrm{Var}(X)}\,\sqrt{\mathrm{Var}(Y)}} \in [-1, 1]
\]
Suggested exercise #3:
(i) Show that, if X and Y are independent, then
\[
\mathrm{Cov}(g(X), h(Y)) = 0
\]
for all real-valued functions g, h.
The converse also holds: if $\mathrm{Cov}(g(X), h(Y)) = 0$ for all such g, h,
then X and Y are independent. (Note that $\mathrm{Cov}(X, Y) = 0$ alone does
not imply independence.)
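A small sketch showing how covariance and correlation can be computed directly from a discrete joint PMF table (the helper name cov_corr is an illustrative choice; the independent-tosses table is used only as an example):

```python
import numpy as np

faces = np.arange(1, 7)

def cov_corr(p_xy):
    """Covariance and correlation of (X, Y) from a 6x6 joint PMF table."""
    x, y = np.meshgrid(faces, faces, indexing="ij")
    ex, ey = np.sum(x * p_xy), np.sum(y * p_xy)
    cov = np.sum((x - ex) * (y - ey) * p_xy)
    var_x = np.sum((x - ex) ** 2 * p_xy)
    var_y = np.sum((y - ey) ** 2 * p_xy)
    return cov, cov / np.sqrt(var_x * var_y)

# Two independent fair-die tosses: covariance and correlation are both 0
p_indep = np.full((6, 6), 1 / 36)
print(cov_corr(p_indep))
```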
Covariance identities
Suggested exercise #4:
(i) Compute Cov (X , Y ) in the previous scenarios 1-3.
You may use the covariance identity:
\[
\mathrm{Cov}(X, Y) = E[XY] - E(X)E(Y).
\]
The special case X = Y yields $\mathrm{Var}(X) = E(X^2) - E(X)^2$.
Covariance depends on the scale but not on the translation:
\[
\mathrm{Cov}(aX + b,\, cY + d) = a\, c\, \mathrm{Cov}(X, Y).
\]
Finally, covariance appears in the formula for the variance of a sum:
\[
\mathrm{Var}(X + Y) = \mathrm{Var}(X) + \mathrm{Var}(Y) + 2\, \mathrm{Cov}(X, Y).
\]
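As a final numerical check, the sketch below verifies the variance-of-a-sum formula by simulation in scenario 3, where X and Y are strongly negatively dependent (seed and sample size are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000  # number of simulated tosses (arbitrary)

# Scenario 3: Y = 7 - X, a strongly (negatively) dependent pair
x = rng.integers(1, 7, size=n).astype(float)
y = 7 - x

lhs = np.var(x + y)                                   # Var(X + Y); here X + Y = 7, so 0
rhs = np.var(x) + np.var(y) + 2 * np.cov(x, y)[0, 1]  # Var(X) + Var(Y) + 2 Cov(X, Y)
print(lhs, rhs)  # both approximately 0
```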