Supervised Learning
https://slds-lmu.github.io/i2ml/
Exercise sheet: Curse of Dimensionality
Exercise 1: High-dimensional Gaussian Distributions
Consider a random vector $X = (X_1, \ldots, X_p)^\top \sim N(0, I)$, i.e., a multivariate normally distributed vector with mean vector zero and covariance matrix being the identity matrix of dimension $p \times p$. In this case, the coordinates $X_1, \ldots, X_p$ are i.i.d., each with distribution $N(0, 1)$.
(a) Show that $E(\|X\|_2^2) = p$ and $\mathrm{Var}(\|X\|_2^2) = 2p$, where $\|\cdot\|_2$ is the Euclidean norm.
Hint: $E_{Y \sim N(0,1)}(Y^4) = 3$.
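For orientation, a sketch of how the hint can enter (using that $X_1^2, \ldots, X_p^2$ are i.i.d. with $E(X_i^2) = 1$):
\[
E(\|X\|_2^2) = \sum_{i=1}^p E(X_i^2) = p, \qquad
\mathrm{Var}(\|X\|_2^2) = \sum_{i=1}^p \mathrm{Var}(X_i^2) = p\left(E(X_1^4) - E(X_1^2)^2\right) = p\,(3 - 1) = 2p.
\]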
(b) Use (a) to infer that $|E(\|X\|_2 - \sqrt{p})| \leq \frac{1}{\sqrt{p}}$ by using the following steps:
(i) Write $\|X\|_2 - \sqrt{p} = \underbrace{\frac{\|X\|_2^2 - p}{2\sqrt{p}}}_{=(1)} - \underbrace{\frac{(\|X\|_2^2 - p)^2}{2\sqrt{p}\,(\|X\|_2 + \sqrt{p})^2}}_{=(2)}$ (an algebraic check of this identity is sketched after step (iv)).
(ii) Compute E[(1)].
(iii) Note that $0 \leq E[(2)] \leq \frac{\mathrm{Var}(\|X\|_2^2)}{2p^{3/2}}$ holds due to $\|X\|_2 \geq 0$.
(iv) Put (i)–(iii) together.
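For the identity in step (i), one possible algebraic check (writing $a = \|X\|_2$ for brevity):
\[
\frac{a^2 - p}{2\sqrt{p}} - \frac{(a^2 - p)^2}{2\sqrt{p}\,(a + \sqrt{p})^2}
= \frac{(a^2 - p)\big[(a + \sqrt{p})^2 - (a^2 - p)\big]}{2\sqrt{p}\,(a + \sqrt{p})^2}
= \frac{(a^2 - p)\cdot 2\sqrt{p}\,(a + \sqrt{p})}{2\sqrt{p}\,(a + \sqrt{p})^2}
= \frac{a^2 - p}{a + \sqrt{p}} = a - \sqrt{p},
\]
using $(a + \sqrt{p})^2 - (a^2 - p) = 2a\sqrt{p} + 2p = 2\sqrt{p}\,(a + \sqrt{p})$.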
(c) Use (b) to infer that $\mathrm{Var}(\|X\|_2) \leq 2$ by using the following steps:
(i) Write $\mathrm{Var}(\|X\|_2) = \mathrm{Var}(\|X\|_2 - \sqrt{p})$.
(ii) For any random variable $Y$ it holds that $\mathrm{Var}(Y) \leq E(Y^2)$.
(iii) If you encounter the term $E[\|X\|_2]$, write it as $E[\underbrace{\|X\|_2 - \sqrt{p}}_{=(\ast)} + \sqrt{p}]$ and use (b) for $(\ast)$.
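One way the three steps can be combined, sketched for orientation (with $E(\|X\|_2^2) = p$ from (a) and $E(\|X\|_2) \geq \sqrt{p} - \frac{1}{\sqrt{p}}$ implied by (b)):
\[
\mathrm{Var}(\|X\|_2) = \mathrm{Var}(\|X\|_2 - \sqrt{p}) \leq E\big[(\|X\|_2 - \sqrt{p})^2\big]
= 2p - 2\sqrt{p}\, E(\|X\|_2) \leq 2p - 2\sqrt{p}\left(\sqrt{p} - \tfrac{1}{\sqrt{p}}\right) = 2.
\]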
(d) Now let $X' = (X_1', \ldots, X_p')^\top \sim N(0, I)$ be another multivariate normally distributed vector with mean vector zero and covariance matrix being the identity matrix of dimension $p \times p$. Further, assume that $X$ and $X'$ are independent, so that $Z := \frac{X - X'}{\sqrt{2}} \sim N(0, I)$. Conclude from the previous that
\[
E\big(\|X - X'\|_2 - \sqrt{2p}\,\big) \leq \sqrt{\frac{2}{p}} \qquad \text{and} \qquad \mathrm{Var}\big(\|X - X'\|_2\big) \leq 4.
\]
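A possible reduction to parts (b) and (c), sketched: since $X - X' = \sqrt{2}\, Z$ with $Z \sim N(0, I)$,
\[
\|X - X'\|_2 = \sqrt{2}\,\|Z\|_2, \qquad
E\big(\|X - X'\|_2 - \sqrt{2p}\,\big) = \sqrt{2}\, E\big(\|Z\|_2 - \sqrt{p}\,\big), \qquad
\mathrm{Var}\big(\|X - X'\|_2\big) = 2\,\mathrm{Var}\big(\|Z\|_2\big),
\]
so the bounds $\frac{1}{\sqrt{p}}$ and $2$ from (b) and (c), applied to $Z$, rescale to $\sqrt{\frac{2}{p}}$ and $4$.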
(e) From the cosine rule we can infer that for any $x, x' \in \mathbb{R}^p$ it holds that
\[
\langle x, x' \rangle = \frac{1}{2}\left(\|x\|_2^2 + \|x'\|_2^2 - \|x - x'\|_2^2\right).
\]
Use this to show that $E\langle X, X' \rangle = 0$. Moreover, derive that $\mathrm{Var}(\langle X, X' \rangle) = p$.
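As a cross-check of the two target values (independent of the cosine-rule route asked for above), a direct coordinate-wise computation using independence and the standard normal moments gives
\[
E\langle X, X' \rangle = \sum_{i=1}^p E(X_i)\, E(X_i') = 0, \qquad
\mathrm{Var}(\langle X, X' \rangle) = E\Big[\Big(\sum_{i=1}^p X_i X_i'\Big)^2\Big] = \sum_{i=1}^p E(X_i^2)\, E\big((X_i')^2\big) = p,
\]
since all mixed terms $E(X_i X_j)\, E(X_i' X_j')$ with $i \neq j$ vanish.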
(f) For different dimensions $p$, e.g., $p \in \{1, 2, 4, 8, \ldots, 1024\}$, create two sets, each consisting of 100 i.i.d. random observations from $N(0, I)$, and
(i) compute the average Euclidean length of (one of) the sampled sets and compare it to $\sqrt{p}$;
(ii) compute the average Euclidean distance between the sampled sets and compare it to $\sqrt{2p}$;
(iii) compute the average inner products between the sampled sets;
(iv) compute in (i)–(iii) also the empirical variances of the respective terms.
Visualize your results in an appropriate manner.
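A minimal simulation sketch for part (f), assuming Python with NumPy and Matplotlib; the seed, the pairing of all observations for distances and inner products, and the plotting layout are choices not prescribed by the exercise:

import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)        # fixed seed, arbitrary choice
dims = [2 ** k for k in range(11)]    # p = 1, 2, 4, ..., 1024
n = 100                               # observations per set

avg_len, var_len = [], []
avg_dist, var_dist = [], []
avg_ip, var_ip = [], []

for p in dims:
    X1 = rng.standard_normal((n, p))  # first set of 100 i.i.d. N(0, I) observations
    X2 = rng.standard_normal((n, p))  # second, independent set

    # (i) Euclidean lengths of the first set
    lengths = np.linalg.norm(X1, axis=1)
    # (ii) Euclidean distances between all pairs from the two sets
    dists = np.linalg.norm(X1[:, None, :] - X2[None, :, :], axis=2).ravel()
    # (iii) inner products between all pairs from the two sets
    ips = (X1 @ X2.T).ravel()

    # averages and (iv) empirical variances
    avg_len.append(lengths.mean())
    var_len.append(lengths.var(ddof=1))
    avg_dist.append(dists.mean())
    var_dist.append(dists.var(ddof=1))
    avg_ip.append(ips.mean())
    var_ip.append(ips.var(ddof=1))

# visualize the averages against the theoretical references sqrt(p) and sqrt(2p);
# the stored empirical variances can be tabulated or plotted analogously
dims_arr = np.array(dims)
fig, axes = plt.subplots(1, 3, figsize=(12, 3.5))
axes[0].plot(dims, avg_len, "o-", label="average length")
axes[0].plot(dims, np.sqrt(dims_arr), "--", label="sqrt(p)")
axes[1].plot(dims, avg_dist, "o-", label="average distance")
axes[1].plot(dims, np.sqrt(2 * dims_arr), "--", label="sqrt(2p)")
axes[2].plot(dims, avg_ip, "o-", label="average inner product")
for ax in axes:
    ax.set_xscale("log", base=2)
    ax.set_xlabel("dimension p")
    ax.legend()
plt.tight_layout()
plt.show()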