Lecture 5: The multivariate normal distribution
The bivariate normal distribution
Suppose µx, µy, σx > 0, σy > 0 and −1 < ρ < 1 are constants (the strict inequalities ensure that the matrix Σ below is invertible).
Define the 2 × 2 matrix Σ by
\[
\Sigma = \begin{pmatrix} \sigma_x^2 & \rho\sigma_x\sigma_y \\ \rho\sigma_x\sigma_y & \sigma_y^2 \end{pmatrix}.
\]
Then define a joint probability density function by
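\[
f_{X,Y}(x, y) = \frac{1}{2\pi\sqrt{\det\Sigma}} \exp\left(-\frac{1}{2} Q(x, y)\right)
\]
where
\[
Q(x, y) = (\mathbf{x} - \boldsymbol{\mu})^T \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu})
\]
and
\[
\mathbf{x} = \begin{pmatrix} x \\ y \end{pmatrix}, \qquad \boldsymbol{\mu} = \begin{pmatrix} \mu_x \\ \mu_y \end{pmatrix}.
\]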
If random variables (X , Y ) have joint probability density given by
fX ,Y above, then we say that (X , Y ) have a bivariate normal
distribution and write
\[
(X, Y)^T \sim N_2(\boldsymbol{\mu}, \Sigma).
\]
It can be proved that the function fX ,Y (x, y ) integrates to 1 and
therefore defines a valid joint pdf.
The notes contain expansions of Q(x, y ) and det Σ.
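For reference, the expansions can be written out directly from the definition of Σ. The following is a sketch obtained by inverting the 2 × 2 matrix by hand; the layout in the notes may differ.

```latex
\det\Sigma = \sigma_x^2\sigma_y^2(1 - \rho^2),
\qquad
\Sigma^{-1} = \frac{1}{\sigma_x^2\sigma_y^2(1 - \rho^2)}
\begin{pmatrix} \sigma_y^2 & -\rho\sigma_x\sigma_y \\ -\rho\sigma_x\sigma_y & \sigma_x^2 \end{pmatrix},

Q(x, y) = \frac{1}{1 - \rho^2}
\left[ \frac{(x - \mu_x)^2}{\sigma_x^2}
 - \frac{2\rho(x - \mu_x)(y - \mu_y)}{\sigma_x\sigma_y}
 + \frac{(y - \mu_y)^2}{\sigma_y^2} \right].
```

Setting ρ = 0 in Q(x, y) leaves a sum of two squared standardised terms, so the joint density factorises into the product of two univariate normal densities.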
Remarks
1 The vector µ = (µx, µy)^T is called the mean vector and the
matrix Σ is called the covariance matrix (or sometimes
variance-covariance matrix).
2 Functions of the form F(x) = x^T Σ^{-1} x are called quadratic
forms. Quadratic forms are functions R^n → R which satisfy
certain properties. They crop up in several areas of
mathematics and statistics.
3 The matrix Σ and its inverse Σ^{-1} are positive definite. A
symmetric matrix A is positive definite if
x^T A x > 0
for all non-zero vectors x.
4 It follows that when µx = µy = 0, Q(x, y ) is a positive
definite quadratic form.
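Positive definiteness can be checked numerically. The sketch below uses hypothetical parameter values (σx = 1, σy = 2, ρ = 0.75, chosen only for illustration) and relies on the standard fact that a symmetric matrix is positive definite exactly when all its eigenvalues are strictly positive.

```r
# Hypothetical parameter values, chosen for illustration only.
sigma_x <- 1
sigma_y <- 2
rho <- 0.75

Sigma <- matrix(c(sigma_x^2,               rho * sigma_x * sigma_y,
                  rho * sigma_x * sigma_y, sigma_y^2),
                nrow = 2, byrow = TRUE)

# A symmetric matrix is positive definite exactly when all of its
# eigenvalues are strictly positive.
eig <- eigen(Sigma, symmetric = TRUE)$values
is_pd <- all(eig > 0)

# The determinant should agree with sigma_x^2 * sigma_y^2 * (1 - rho^2).
det_direct  <- det(Sigma)
det_formula <- sigma_x^2 * sigma_y^2 * (1 - rho^2)

# Evaluate the quadratic form v^T Sigma^{-1} v at one non-zero vector v.
v <- c(1, -1)
q_val <- drop(t(v) %*% solve(Sigma) %*% v)

print(is_pd)
print(c(det_direct, det_formula))
print(q_val)
```

Evaluating the quadratic form at a single vector does not prove positive definiteness; the eigenvalue check is the part that covers all non-zero vectors.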
Pictures
[Figures: contour plots in the xy-plane, with x and y each running from −3 to 3 and contours labelled 80%, 90%, 95% and 99%. One plot for each of the following cases:
σx = σy, ρ = 0
σx = 2σy, ρ = 0
2σx = σy, ρ = 0
σx = σy, ρ = 0.75
σx = σy, ρ = −0.75
2σx = σy, ρ = 0.75]
Comments
1 Q(x, y ) ≥ 0 with equality only when x = µ. It follows that
the density function has its mode at x = µ.
2 Changing the values of µx , µy does not change the shape of
the plots, but corresponds to a translation of the xy -plane
i.e. changing µx , µy just shifts the contours / surface to a
new mode position.
3 The contours of equal density are circular when σx = σy and
ρ = 0 and elliptical when σx ≠ σy or ρ ≠ 0.
4 σx and σy control the extent to which the distribution is
dispersed.
5 The parameter ρ is the correlation of X , Y
i.e. Cor (X , Y ) = ρ. Thus for non-zero ρ, the contours are at
an angle to the axes.
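Contours of this kind can be drawn in base R. The sketch below assumes µx = µy = 0, σx = σy = 1 and ρ = 0.75 (illustrative values only). It uses the standard result, not stated in these slides, that Q(X, Y) has a chi-squared distribution with 2 degrees of freedom, so the contour Q = qchisq(p, 2) encloses probability p.

```r
# Hypothetical parameters: zero mean, sigma_x = sigma_y = 1, rho = 0.75.
mu_x <- 0; mu_y <- 0
sigma_x <- 1; sigma_y <- 1
rho <- 0.75

# Expanded quadratic form Q(x, y), vectorised over x and y.
Q <- function(x, y) {
  zx <- (x - mu_x) / sigma_x
  zy <- (y - mu_y) / sigma_y
  (zx^2 - 2 * rho * zx * zy + zy^2) / (1 - rho^2)
}

xs <- seq(-3, 3, length.out = 121)
ys <- seq(-3, 3, length.out = 121)
q_grid <- outer(xs, ys, Q)

# Assumption: Q(X, Y) ~ chi-squared with 2 degrees of freedom, so the
# contour at level qchisq(p, 2) encloses probability p.
probs <- c(0.80, 0.90, 0.95, 0.99)
q_levels <- qchisq(probs, df = 2)

contour(xs, ys, q_grid, levels = q_levels,
        labels = paste0(probs * 100, "%"),
        xlab = "x", ylab = "y")
```

Changing `rho`, `sigma_x` or `sigma_y` shows the effects described in the comments above: the ellipses stretch along the axis with the larger standard deviation and tilt when ρ is non-zero.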
Marginals and conditionals
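Suppose (X, Y)^T ∼ N_2(µ, Σ). Then:
1 The marginal distributions are normal:
\[
X \sim N(\mu_x, \sigma_x^2) \quad\text{and}\quad Y \sim N(\mu_y, \sigma_y^2).
\]
2 The conditional distributions are normal:
\[
X \mid Y = y \sim N\left(\mu_x + \rho\frac{\sigma_x}{\sigma_y}(y - \mu_y),\ \sigma_x^2(1 - \rho^2)\right)
\]
and
\[
Y \mid X = x \sim N\left(\mu_y + \rho\frac{\sigma_y}{\sigma_x}(x - \mu_x),\ \sigma_y^2(1 - \rho^2)\right).
\]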
3 When ρ = 0, X and Y are independent.
4 Linear combinations of X and Y are also normally distributed:
\[
aX + bY \sim N\left(a\mu_x + b\mu_y,\ a^2\sigma_x^2 + b^2\sigma_y^2 + 2ab\rho\sigma_x\sigma_y\right)
\]
where a, b are constants.
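These results can be checked by simulation. The sketch below uses hypothetical values (µx = 2, µy = 3, σx = 1, σy = 2, ρ = 0.5, a = 1, b = 3), draws X from its marginal and Y from its conditional given X = x, and then compares the sample mean and variance of aX + bY with the formula in result 4.

```r
set.seed(1)  # fixed seed so the sketch is reproducible

# Hypothetical parameter values and coefficients, for illustration only.
mu_x <- 2; mu_y <- 3
sigma_x <- 1; sigma_y <- 2
rho <- 0.5
a <- 1; b <- 3

n <- 200000
# Draw X from its marginal, then Y from its conditional given X = x.
x <- rnorm(n, mean = mu_x, sd = sigma_x)
y <- rnorm(n, mean = mu_y + rho * (sigma_y / sigma_x) * (x - mu_x),
           sd = sigma_y * sqrt(1 - rho^2))

z <- a * x + b * y

# Theoretical mean and variance of aX + bY from result 4.
mean_theory <- a * mu_x + b * mu_y
var_theory  <- a^2 * sigma_x^2 + b^2 * sigma_y^2 +
  2 * a * b * rho * sigma_x * sigma_y

print(c(sample = mean(z), theory = mean_theory))
print(c(sample = var(z), theory = var_theory))
print(c(sample = cor(x, y), theory = rho))
```

The sample values should be close to, but not exactly equal to, the theoretical ones; the gap shrinks as `n` grows.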
Example 5.1
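Suppose (X, Y)^T ∼ N_2(µ, Σ) where µx = 2, µy = 3, σx = 1,
σy = 1 and ρ = 0.5.
Simulate a sample of size 500 from this distribution and draw a
scatter plot.
Use simulation to find Pr(X^2 + Y^2 < 9).
Solution
The marginal distribution of X is X ∼ N(2, 1^2).
Using the formula for the conditional distribution,
\[
Y \mid X = x \sim N\left(\mu_y + \rho\frac{\sigma_y}{\sigma_x}(x - \mu_x),\ \sigma_y^2(1 - \rho^2)\right) = N(3 + 0.5(x - 2),\ 0.75).
\]
We can therefore simulate x from the marginal distribution of X and then simulate y from the conditional distribution of Y given X = x.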
Simulation results
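    npts = 500
    # draw X from its marginal distribution N(2, 1)
    x = rnorm(npts, mean = 2, sd = 1)
    # draw Y from its conditional distribution given X = x
    y = rnorm(npts, mean = 3 + 0.5 * (x - 2), sd = sqrt(0.75))
    plot(x, y)  # scatter plot of the simulated sample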
[Figure: scatter plot of the simulated sample, y against x, with x from 0 to 5 and y from 0 to 6.]
Probability calculation
To find Pr(X^2 + Y^2 < 9) approximately, simulate a larger sample and
compute the proportion of points that fall in the region:
    npts = 10000
    x = rnorm(npts, mean = 2, sd = 1)
    y = rnorm(npts, mean = 3 + 0.5 * (x - 2), sd = sqrt(0.75))
    f = x^2 + y^2
    # proportion of simulated points with x^2 + y^2 < 9
    sum(f < 9) / npts
Answer ≈ 0.2776
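Because the answer is a Monte Carlo estimate, it changes from run to run. A minimal sketch of how to quantify that uncertainty follows. It uses the usual binomial standard error for a proportion and a normal-approximation 95% interval; the seed and the names `p_hat`, `se` and `ci` are choices made here, not part of the original solution.

```r
set.seed(42)  # arbitrary seed, only to make the run reproducible

npts <- 10000
x <- rnorm(npts, mean = 2, sd = 1)
y <- rnorm(npts, mean = 3 + 0.5 * (x - 2), sd = sqrt(0.75))

# Indicator of the event X^2 + Y^2 < 9 for each simulated point.
inside <- (x^2 + y^2) < 9

# Estimated probability and its binomial standard error.
p_hat <- mean(inside)
se <- sqrt(p_hat * (1 - p_hat) / npts)

# Approximate 95% confidence interval for the true probability.
ci <- p_hat + c(-1.96, 1.96) * se

print(p_hat)
print(se)
print(ci)
```

With 10000 points the standard error is roughly 0.005, so only the first two decimal places of an estimate such as 0.2776 are reliable.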
Extra example
Suppose
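\[
\begin{pmatrix} X \\ Y \end{pmatrix} \sim N_2\left( \begin{pmatrix} 4 \\ 1 \end{pmatrix}, \begin{pmatrix} 8 & 2 \\ 2 & 5 \end{pmatrix} \right).
\]
The random variable Z is defined by Z = X + 3Y. What is the
distribution of Z?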
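Solution
We have Z = X + 3Y. Using result 4 (linear combinations) under Marginals and conditionals, we have
E[Z] = 1 × µx + 3 × µy = 1 × 4 + 3 × 1 = 7.
Now from the variance-covariance matrix, we have σx^2 = 8, σy^2 = 5 and ρσxσy = 2.
Thus
\[
\mathrm{Var}(Z) = 1^2 \times \sigma_x^2 + 3^2 \times \sigma_y^2 + 2 \times 1 \times 3 \times (\rho\sigma_x\sigma_y)
= 1 \times 8 + 9 \times 5 + 2 \times 1 \times 3 \times 2
= 65.
\]
Therefore Z ∼ N(7, 65).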
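The same numbers can be obtained in matrix form: writing Z = a^T (X, Y)^T with a = (1, 3)^T, the mean is a^T µ and the variance is a^T Σ a, which is result 4 written with vectors. A short sketch that checks the arithmetic (the variable names are chosen here for illustration):

```r
mu    <- c(4, 1)
Sigma <- matrix(c(8, 2,
                  2, 5), nrow = 2, byrow = TRUE)
a <- c(1, 3)  # coefficients of Z = 1 * X + 3 * Y

# Mean of Z: a^T mu.
mean_z <- sum(a * mu)

# Variance of Z: a^T Sigma a.
var_z <- drop(t(a) %*% Sigma %*% a)

print(mean_z)  # 7
print(var_z)   # 65
```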
The multivariate normal distribution
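The multivariate normal distribution is defined on vectors in R^n.
Suppose that X is a random vector with n entries, i.e.
X = (X_1, ..., X_n)^T.
Then
X ∼ N_n(µ, Σ)
if X_1, ..., X_n have joint PDF given by
\[
f_{\mathbf{X}}(\mathbf{x}) = \frac{1}{(2\pi)^{n/2}\sqrt{\det\Sigma}} \exp\left(-\frac{1}{2} Q(\mathbf{x})\right)
\]
where
\[
Q(\mathbf{x}) = (\mathbf{x} - \boldsymbol{\mu})^T \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu}).
\]
When n = 2 the normalising constant reduces to 1/(2π√(det Σ)), as in the bivariate case.
This definition makes sense for any column vector µ ∈ R^n and any
symmetric positive definite n × n matrix Σ.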
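The density can be evaluated directly from this definition. The sketch below defines a hypothetical helper `dmvn_sketch` (not a function from any package) using the standard normalising constant (2π)^(n/2) √(det Σ), which equals 2π √(det Σ) when n = 2. It then checks two special cases against R's built-in `dnorm`.

```r
# Hypothetical helper: multivariate normal density at a single point x.
# Assumes Sigma is a symmetric positive definite n x n matrix.
dmvn_sketch <- function(x, mu, Sigma) {
  n <- length(mu)
  d <- x - mu
  # Quadratic form Q(x) = (x - mu)^T Sigma^{-1} (x - mu).
  Q <- drop(t(d) %*% solve(Sigma, d))
  exp(-0.5 * Q) / ((2 * pi)^(n / 2) * sqrt(det(Sigma)))
}

# Case n = 1: should agree with the univariate normal density.
val1 <- dmvn_sketch(0.3, mu = 1, Sigma = matrix(4))
ref1 <- dnorm(0.3, mean = 1, sd = 2)

# Case n = 2 with rho = 0: should agree with the product of two
# univariate normal densities.
val2 <- dmvn_sketch(c(1.5, 2), mu = c(2, 3), Sigma = diag(c(1, 4)))
ref2 <- dnorm(1.5, mean = 2, sd = 1) * dnorm(2, mean = 3, sd = 2)

print(c(val1, ref1))
print(c(val2, ref2))
```

Using `solve(Sigma, d)` avoids forming Σ^{-1} explicitly, which is usually the more stable choice numerically.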
Remarks
1 The vector µ is the mean of the distribution and Σ is called
the covariance matrix.
2 All the marginal distributions of X are normal. (We do not
specify their parameters here, however).
3 Similarly, all the conditional distributions of X are normal.
(Again, we do not specify the parameters of these
distributions here).