The Gaussian Distribution
Machine Learning and Pattern Recognition
Chris Williams
School of Informatics, University of Edinburgh
August 2014
(All of the slides in this course have been adapted from
previous versions by Charles Sutton, Amos Storkey, David Barber.)
Outline
A useful model for real-valued quantities
Univariate Gaussian
Multivariate Gaussian
Maximum likelihood estimation
Class conditional classification
Reading: Murphy 4.1.2, 4.1.3 (without proof), 4.2 up to end
of 4.2.1; or Barber 8.4 up to start of 8.4.1 and 8.8 up to start
of 8.8.2.
The Gaussian Distribution
The Gaussian distribution is one of the most common
distributions over continuous variables.
The one-dimensional Gaussian distribution is given by

$$P(x \mid \mu, \sigma^2) = \mathcal{N}(x; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$

$x \sim \mathcal{N}(\mu, \sigma^2)$ ($x$ is distributed as ...).
$\mu$ is the mean of the Gaussian and $\sigma^2$ is the variance.
If $\mu = 0$ and $\sigma^2 = 1$ then $\mathcal{N}(x; \mu, \sigma^2)$ is called a standard Gaussian.
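As a minimal sketch, this density can be evaluated directly in Python with numpy (the function name gauss_pdf and the test point are our choices, not from the slides):

```python
import numpy as np

def gauss_pdf(x, mu, sigma2):
    """Univariate Gaussian density N(x; mu, sigma^2)."""
    return np.exp(-(x - mu) ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)

# Standard Gaussian at x = 0: the peak, 1/sqrt(2*pi) ~ 0.3989
print(gauss_pdf(0.0, mu=0.0, sigma2=1.0))
```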
Plot
[Figure: density curve of a standard one-dimensional Gaussian distribution]
All Gaussians have the same shape subject to scaling and
displacement.
If $x$ is distributed $\mathcal{N}(\mu, \sigma^2)$, then $y = (x - \mu)/\sigma$ is distributed $\mathcal{N}(0, 1)$.
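A quick numpy check of this standardization (the seed, sample size, and parameters $\mu = 3$, $\sigma = 2$ are our choices):

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 3.0, 2.0
x = rng.normal(mu, sigma, size=100_000)  # x ~ N(3, 4)

y = (x - mu) / sigma                     # standardize
print(y.mean(), y.std())                 # close to 0 and 1
```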
Normalization
Remember all distributions must integrate to one. The $\sqrt{2\pi\sigma^2}$ is called a normalization constant; it ensures this is the case.
Hence tighter Gaussians have higher peaks:
[Figure: Gaussian densities with different variances]
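To see the normalization numerically, a small numpy check (the grid and the three variances are arbitrary choices): the area under each density stays at one while the peak height grows as the variance shrinks.

```python
import numpy as np

x = np.linspace(-10.0, 10.0, 10_001)
dx = x[1] - x[0]
for sigma2 in (0.25, 1.0, 4.0):
    p = np.exp(-x**2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)
    # Riemann-sum area is ~1 for every variance; the peak is 1/sqrt(2*pi*sigma2)
    print(sigma2, p.sum() * dx, p.max())
```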
Maximum Likelihood Estimation
Maximum likelihood: set $\beta = 1/\sigma^2$ and take derivatives:

$$\log P(X \mid \mu, \beta) = \frac{N}{2}\log\beta - \frac{N}{2}\log(2\pi) - \frac{\beta}{2}\sum_n (x^n - \mu)^2$$

$$\frac{\partial}{\partial\mu}\log P(X \mid \mu, \beta) = \beta\sum_n (x^n - \mu)$$

$$\frac{\partial}{\partial\beta}\log P(X \mid \mu, \beta) = \frac{N}{2\beta} - \frac{1}{2}\sum_n (x^n - \mu)^2$$

Hence equating derivatives to zero:

$$\mu = \frac{1}{N}\sum_n x^n \quad\text{and}\quad \sigma^2 = \frac{1}{N}\sum_n (x^n - \mu)^2.$$
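A numpy sketch of these two estimators on synthetic data (the seed, sample size, and true parameters $\mu = 2$, $\sigma = 3$ are our choices):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(2.0, 3.0, size=50_000)    # draws from N(mu=2, sigma^2=9)

mu_hat = x.mean()                        # (1/N) sum_n x^n
sigma2_hat = ((x - mu_hat) ** 2).mean()  # (1/N) sum_n (x^n - mu_hat)^2
print(mu_hat, sigma2_hat)                # close to 2 and 9
```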
Multivariate Gaussian
The vector $\mathbf{x}$ is multivariate Gaussian if, for mean $\boldsymbol{\mu}$ and covariance matrix $\Sigma$, it is distributed according to

$$P(\mathbf{x} \mid \boldsymbol{\mu}, \Sigma) = \frac{1}{|2\pi\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(\mathbf{x} - \boldsymbol{\mu})^T \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu})\right)$$
The univariate Gaussian is a special case of this.
Shorthand: $\mathbf{x} \sim \mathcal{N}(\boldsymbol{\mu}, \Sigma)$
$\Sigma$ is called a covariance matrix, i.e., each element satisfies $\Sigma_{ij} = \mathrm{Cov}(X_i, X_j)$, where
$$\mathrm{Cov}(X_i, X_j) = E[(X_i - \mu_i)(X_j - \mu_j)]$$
$\Sigma$ must be symmetric and positive definite (a minimal sketch of the density follows below).
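A minimal numpy sketch of this density; the function name mvn_pdf and the toy parameter values are ours, and scipy.stats.multivariate_normal.pdf can serve as a cross-check:

```python
import numpy as np

def mvn_pdf(x, mu, Sigma):
    """Multivariate Gaussian density N(x; mu, Sigma), as in the formula above."""
    d = x - mu
    quad = d @ np.linalg.solve(Sigma, d)              # (x - mu)^T Sigma^{-1} (x - mu)
    norm = np.sqrt(np.linalg.det(2 * np.pi * Sigma))  # |2 pi Sigma|^{1/2}
    return np.exp(-0.5 * quad) / norm

mu = np.array([0.0, 0.0])
Sigma = np.array([[2.0, 0.5], [0.5, 1.0]])
print(mvn_pdf(np.array([0.0, 0.0]), mu, Sigma))
```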
Multivariate Gaussian: Picture
[Figure: surface plot of a two-dimensional Gaussian density]
Mahalanobis Distance
$$d^2(\mathbf{x}_i, \mathbf{x}_j) = (\mathbf{x}_i - \mathbf{x}_j)^T \Sigma^{-1} (\mathbf{x}_i - \mathbf{x}_j)$$

$d^2(\mathbf{x}_i, \mathbf{x}_j)$ is called the Mahalanobis distance between $\mathbf{x}_i$ and $\mathbf{x}_j$.
If $\Sigma$ is diagonal, the contours of $d^2$ are axis-aligned ellipsoids.
If $\Sigma$ is not diagonal, the contours of $d^2$ are rotated ellipsoids:
$$\Sigma = U \Lambda U^T$$
where $\Lambda$ is diagonal and $U$ is a rotation matrix (the eigendecomposition of $\Sigma$).
$\Sigma$ is positive definite $\Leftrightarrow$ the entries in $\Lambda$ are positive (see the sketch below).
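A short numpy sketch (mahalanobis2 is our name; the covariance values are toy numbers):

```python
import numpy as np

def mahalanobis2(xi, xj, Sigma):
    """Squared Mahalanobis distance (xi - xj)^T Sigma^{-1} (xi - xj)."""
    d = xi - xj
    return d @ np.linalg.solve(Sigma, d)

Sigma = np.array([[2.0, 0.5], [0.5, 1.0]])
print(mahalanobis2(np.array([1.0, 0.0]), np.array([0.0, 0.0]), Sigma))

# Eigendecomposition Sigma = U Lambda U^T: all eigenvalues are positive
# if and only if Sigma is positive definite
lam, U = np.linalg.eigh(Sigma)
print(lam)
```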
Multivariate Gaussian: Maximum Likelihood
The maximum likelihood estimates can be found in the same way:

$$\boldsymbol{\mu} = \frac{1}{N}\sum_{n=1}^{N} \mathbf{x}^n$$

$$\Sigma = \frac{1}{N}\sum_{n=1}^{N} (\mathbf{x}^n - \boldsymbol{\mu})(\mathbf{x}^n - \boldsymbol{\mu})^T$$

Sometimes the Gaussian is parameterized in terms of the precision matrix $\Lambda = \Sigma^{-1}$. The estimators are sketched in numpy below.
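The same estimators on synthetic data (the seed and the true parameters are our choices):

```python
import numpy as np

rng = np.random.default_rng(2)
mu_true = np.array([1.0, -1.0])
Sigma_true = np.array([[2.0, 0.8], [0.8, 1.0]])
X = rng.multivariate_normal(mu_true, Sigma_true, size=20_000)  # rows are x^n

mu_hat = X.mean(axis=0)             # (1/N) sum_n x^n
D = X - mu_hat
Sigma_hat = D.T @ D / len(X)        # (1/N) sum_n (x^n - mu)(x^n - mu)^T
print(mu_hat, Sigma_hat, sep="\n")  # close to mu_true and Sigma_true
```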
Example
The data.
[Figure: scatter plot of the data]
Example
The data. The maximum likelihood fit.
[Figure: the same data with the maximum likelihood Gaussian fit overlaid]
Class conditional classification
Example
Suppose you have variables position and class, where position is a location in D-dimensional space. Suppose you have data D consisting of examples of position and class. If we assume that all the points with a particular class label are Gaussian, describe how, using the data, you could predict the class for a previously unseen position (and give the accuracy of the prediction).
Class conditional classification
Learning: Fit a Gaussian to the data in each class (class-conditional fitting). This gives $p(\text{position} \mid \text{class})$.
Find an estimate for the probability of each class (see last lecture): $p(\text{class})$.
Inference: Given a new position, we can ask: what is the probability of this point being generated by each of the Gaussians?
Better still, give a probability using Bayes' rule:
$$P(\text{class} \mid \text{position}) \propto P(\text{position} \mid \text{class})\, P(\text{class})$$
Then we can get the ratio $P(\text{class}=1 \mid \text{position}) / P(\text{class}=0 \mid \text{position})$.
The decision boundary for two classes is where this ratio is one (both steps are sketched below).
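A sketch of the learning and inference steps, assuming numpy and scipy are available; the names fit and posterior, and the use of the biased (maximum likelihood) covariance, are our own illustrative choices:

```python
import numpy as np
from scipy.stats import multivariate_normal

def fit(X, y):
    """Fit one Gaussian per class, plus class priors, by maximum likelihood."""
    params = {}
    for c in np.unique(y):
        Xc = X[y == c]
        mu = Xc.mean(axis=0)
        Sigma = np.cov(Xc, rowvar=False, bias=True)  # (1/N) ML estimate
        params[c] = (mu, Sigma, len(Xc) / len(X))    # last entry: p(class = c)
    return params

def posterior(params, x):
    """p(class | position) by Bayes' rule, normalized over the classes."""
    joint = {c: multivariate_normal.pdf(x, mu, Sigma) * prior
             for c, (mu, Sigma, prior) in params.items()}
    Z = sum(joint.values())
    return {c: v / Z for c, v in joint.items()}
```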
Key Facts About Gaussians
Sums of independent Gaussian RVs are Gaussian.
Linear Gaussian models are jointly Gaussian. In general, let
$$p(\mathbf{x}) = \mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}_x, \Sigma_x)$$
$$p(\mathbf{y} \mid \mathbf{x}) = \mathcal{N}(\mathbf{y} \mid A\mathbf{x} + \mathbf{b}, \Sigma_n)$$
Then $p(\mathbf{x}, \mathbf{y})$ is Gaussian, and so is $p(\mathbf{x} \mid \mathbf{y})$. See Murphy 4.3.
If $p(\mathbf{x}, \mathbf{y})$ is a multivariate Gaussian, both the marginals $p(\mathbf{x}), p(\mathbf{y})$ and the conditionals $p(\mathbf{x} \mid \mathbf{y}), p(\mathbf{y} \mid \mathbf{x})$ are Gaussian.
Inference in Gaussian models
Partition the variables into two groups, $\mathbf{x}_1$ and $\mathbf{x}_2$:

$$\boldsymbol{\mu} = \begin{pmatrix} \boldsymbol{\mu}_1 \\ \boldsymbol{\mu}_2 \end{pmatrix}, \qquad \Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}$$

Then $p(\mathbf{x}_1 \mid \mathbf{x}_2) = \mathcal{N}(\boldsymbol{\mu}_{1|2}, \Sigma_{1|2})$ with

$$\boldsymbol{\mu}_{1|2} = \boldsymbol{\mu}_1 + \Sigma_{12}\Sigma_{22}^{-1}(\mathbf{x}_2 - \boldsymbol{\mu}_2)$$

$$\Sigma_{1|2} = \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}$$

For a proof see e.g. Section 4.3.4 of Murphy (2012) (not examinable). A numerical sketch follows below.
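A minimal numpy sketch of this conditioning for a two-dimensional joint (all the block values are our own toy numbers):

```python
import numpy as np

# Toy joint over (x1, x2) in the block partition above
mu1, mu2 = np.array([0.0]), np.array([1.0])
S11, S12 = np.array([[2.0]]), np.array([[0.8]])
S21, S22 = S12.T, np.array([[1.0]])

x2 = np.array([2.0])  # observed value of x2

# mu_{1|2} = mu1 + S12 S22^{-1} (x2 - mu2)
mu_cond = mu1 + S12 @ np.linalg.solve(S22, x2 - mu2)
# Sigma_{1|2} = S11 - S12 S22^{-1} S21
S_cond = S11 - S12 @ np.linalg.solve(S22, S21)
print(mu_cond, S_cond)  # parameters of p(x1 | x2)
```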
Summary
A useful model for real-valued quantities
Univariate Gaussian
Multivariate Gaussian
Maximum likelihood estimation
Class conditional classification