High-Dimensional Space
Prof. Asim Tewari
IIT Bombay
Dimensionality reduction
• Feature selection: Feature selection approaches try to find a subset
of the original variables (also called features or attributes)
– Filter strategy (e.g. information gain)
– Wrapper strategy (e.g. search guided by accuracy)
– Embedded strategy (features are added or removed while building the model, based on the prediction errors)
• Feature projection: Feature projection transforms the data in the high-dimensional space to a space of fewer dimensions. The data transformation may be linear, as in principal component analysis (PCA, sketched below), but many nonlinear dimensionality reduction techniques also exist. For multidimensional data, a tensor representation can be used for dimensionality reduction through multilinear subspace learning.
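To make linear feature projection concrete, here is a minimal PCA sketch in Python/NumPy; the function name pca_project and the random toy data are illustrative choices, not part of the lecture:

```python
import numpy as np

# Minimal PCA sketch: project centered data onto its top-k principal
# directions (the right singular vectors of the centered data matrix).
# pca_project and the toy data below are illustrative, not from the lecture.
def pca_project(X, k):
    Xc = X - X.mean(axis=0)                    # center each feature
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T                       # coordinates in top-k subspace

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))                 # 500 samples in 10 dimensions
Z = pca_project(X, k=2)                        # reduced to 2 dimensions
print(Z.shape)                                 # (500, 2)
```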
Markov’s inequality
• Let x be a nonnegative random variable. Then for a > 0,
$$P(x \ge a) \le \frac{E(x)}{a}.$$
Proof: For a continuous nonnegative random variable x with probability density p,
$$E(x) = \int_0^\infty x\,p(x)\,dx \ge \int_a^\infty x\,p(x)\,dx \ge a \int_a^\infty p(x)\,dx = a\,P(x \ge a).$$
Thus,
$$P(x \ge a) \le \frac{E(x)}{a}.$$
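A quick Monte Carlo sanity check of the inequality; the exponential test distribution is an arbitrary illustrative choice:

```python
import numpy as np

# Monte Carlo check of Markov's inequality, P(x >= a) <= E(x)/a,
# for a nonnegative random variable (here: exponential with mean 1).
rng = np.random.default_rng(0)
x = rng.exponential(scale=1.0, size=1_000_000)
for a in (1.0, 2.0, 5.0):
    lhs = np.mean(x >= a)       # empirical tail probability P(x >= a)
    rhs = x.mean() / a          # Markov bound E(x)/a
    print(f"a={a}: P(x>=a)={lhs:.4f} <= E(x)/a={rhs:.4f}")
```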
Chebyshev’s inequality
• Let x be a random variable. Then for c > 0,
$$P\big(|x - E(x)| \ge c\big) \le \frac{\mathrm{Var}(x)}{c^2}.$$
Proof: Apply Markov's inequality to the nonnegative random variable $(x - E(x))^2$, noting that $|x - E(x)| \ge c$ exactly when $(x - E(x))^2 \ge c^2$.
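A similar numerical check; the standard normal test distribution is an arbitrary illustrative choice:

```python
import numpy as np

# Monte Carlo check of Chebyshev's inequality,
# P(|x - E(x)| >= c) <= Var(x)/c^2, for a standard normal x.
rng = np.random.default_rng(0)
x = rng.normal(size=1_000_000)
for c in (1.0, 2.0, 3.0):
    lhs = np.mean(np.abs(x - x.mean()) >= c)   # empirical tail probability
    rhs = x.var() / c**2                        # Chebyshev bound
    print(f"c={c}: P(|x-E(x)|>=c)={lhs:.4f} <= Var/c^2={rhs:.4f}")
```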
Law of Large Numbers
• Let $x_1, x_2, \ldots, x_n$ be n independent samples of a random variable x. Then for any $\epsilon > 0$,
$$P\!\left(\left|\frac{x_1 + x_2 + \cdots + x_n}{n} - E(x)\right| \ge \epsilon\right) \le \frac{\mathrm{Var}(x)}{n\,\epsilon^2}.$$
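A small simulation illustrating the statement; the Uniform[0,1] distribution is an arbitrary illustrative choice:

```python
import numpy as np

# Law of Large Numbers, empirically: the sample mean of n draws
# concentrates around E(x) as n grows (x ~ Uniform[0,1], E(x) = 0.5).
rng = np.random.default_rng(0)
for n in (10, 1_000, 100_000):
    sample_mean = rng.uniform(size=n).mean()
    print(f"n={n:>7}: sample mean = {sample_mean:.4f} (E(x) = 0.5)")
```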
Law of Large Numbers
Proof: By independence, the sample mean $\frac{1}{n}(x_1 + \cdots + x_n)$ has expectation $E(x)$ and variance $\mathrm{Var}(x)/n$, so the bound follows directly from Chebyshev's inequality.
Law of Large Numbers
Implications of the Law of Large Numbers:
• If we draw a point x from a d-dimensional Gaussian with unit variance, then it will lie close to the sphere of radius $\sqrt{d}$.
This is because $|x|^2 = \sum_{i=1}^{d} x_i^2$ is a sum of d independent terms, each with expectation $E(x_i^2) = 1$, so by the Law of Large Numbers
$$|x|^2 \approx d.$$
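This concentration is easy to observe numerically; the dimensions chosen below are arbitrary illustrative values:

```python
import numpy as np

# Draw points from a d-dimensional standard Gaussian and check that
# their norms concentrate around sqrt(d).
rng = np.random.default_rng(0)
for d in (10, 100, 10_000):
    x = rng.normal(size=(1000, d))          # 1000 points in R^d
    norms = np.linalg.norm(x, axis=1)
    print(f"d={d:>6}: mean |x| = {norms.mean():8.2f}, sqrt(d) = {np.sqrt(d):8.2f}")
```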
Law of Large Numbers
Implications of the Law of Large Numbers:
• If we draw two points y and z from a d-dimensional Gaussian with unit variance, then they will be approximately orthogonal.
This is since for all i, $E(y_i z_i) = E(y_i)\,E(z_i) = 0$, so by the Law of Large Numbers
$$y \cdot z = \sum_{i=1}^{d} y_i z_i \approx 0.$$
Therefore, $|y - z|^2 = |y|^2 + |z|^2 - 2\,y \cdot z \approx |y|^2 + |z|^2$.
Thus, by the Pythagorean theorem, the two points y and z must be approximately orthogonal.
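A numerical illustration; the dimensions chosen are arbitrary:

```python
import numpy as np

# Two independent Gaussian points in high dimension are nearly orthogonal:
# the cosine of the angle between them shrinks like 1/sqrt(d).
rng = np.random.default_rng(0)
for d in (10, 100, 10_000):
    y, z = rng.normal(size=(2, d))
    cos_angle = y @ z / (np.linalg.norm(y) * np.linalg.norm(z))
    print(f"d={d:>6}: cos(angle) = {cos_angle:+.4f}")
```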
Volume of Objects in High Dimensions
Volume of the Unit Ball
• The volume V(d) of the unit ball in d dimensions is
$$V(d) = \frac{\pi^{d/2}}{\Gamma\!\left(\frac{d}{2} + 1\right)} = \frac{2\,\pi^{d/2}}{d\,\Gamma\!\left(\frac{d}{2}\right)}.$$
• Because the Gamma function in the denominator grows faster than $\pi^{d/2}$, $V(d) \to 0$ as $d \to \infty$: in high dimensions the unit ball occupies a vanishingly small fraction of the cube $[-1,1]^d$ that contains it.
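A short computation of V(d), done in log space to avoid overflow (illustrative sketch):

```python
import math

# Volume of the unit ball, V(d) = pi^(d/2) / Gamma(d/2 + 1),
# computed via log-Gamma to avoid overflow; it vanishes as d grows.
def unit_ball_volume(d):
    return math.exp((d / 2) * math.log(math.pi) - math.lgamma(d / 2 + 1))

for d in (2, 5, 10, 20, 100):
    print(f"d={d:>3}: V(d) = {unit_ball_volume(d):.3e}")
```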
Volume near the Equator
• An interesting fact about the unit ball in high
dimensions is that most of its volume is
concentrated near its “equator.” In particular,
for any unit-length vector v defining “north,”
most of the volume of the unit ball lies in the
thin slab of points whose dot product with v
has magnitude O(1/ √d).
Volume near the Equator
Let A denote the portion of the ball with $x_1 \ge \frac{c}{\sqrt{d-1}}$, and let H denote the upper hemisphere.
We can then show that the ratio of the volume of A to the volume of H goes to zero by calculating an upper bound on volume(A) and a lower bound on volume(H) and proving that the ratio of these bounds goes to zero.
Volume near the Equator
Upper bound: $\mathrm{volume}(A) \le \frac{V(d-1)}{c\,\sqrt{d-1}}\,e^{-c^2/2}$
Lower bound: $\mathrm{volume}(H) \ge \frac{V(d-1)}{2\,\sqrt{d-1}}$
Thus:
$$\frac{\mathrm{volume}(A)}{\mathrm{volume}(H)} \le \frac{\frac{V(d-1)}{c\sqrt{d-1}}\,e^{-c^2/2}}{\frac{V(d-1)}{2\sqrt{d-1}}} = \frac{2}{c}\,e^{-c^2/2},$$
which goes to zero as c increases.
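An empirical check of the slab concentration, using the standard trick of sampling uniformly from the ball by normalizing Gaussians and scaling radii by $U^{1/d}$; the choices d = 100 and c = 2 are arbitrary illustrative values:

```python
import numpy as np

# Most of the unit ball's volume lies in the slab |x_1| <= c/sqrt(d-1).
# Uniform samples in the ball: normalize Gaussians, scale radii by U^(1/d).
rng = np.random.default_rng(0)
d, n, c = 100, 100_000, 2.0
g = rng.normal(size=(n, d))
points = g / np.linalg.norm(g, axis=1, keepdims=True)   # uniform on sphere
points *= rng.uniform(size=(n, 1)) ** (1 / d)           # uniform in ball
in_slab = np.abs(points[:, 0]) <= c / np.sqrt(d - 1)
print(f"fraction in slab: {in_slab.mean():.4f} "
      f"(lower bound from above: {1 - (2 / c) * np.exp(-c**2 / 2):.4f})")
```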
Random Projection
Johnson-Lindenstrauss Lemma
• The projection $f : \mathbb{R}^d \to \mathbb{R}^k$. Pick k Gaussian vectors $u_1, u_2, \ldots, u_k$ in $\mathbb{R}^d$ with unit-variance coordinates. For any vector v, define the projection f(v) by:
$$f(v) = (u_1 \cdot v,\; u_2 \cdot v,\; \ldots,\; u_k \cdot v).$$
(The projection f(v) is the vector of dot products of v with the $u_i$.) Then the Johnson-Lindenstrauss Lemma states that:
• With high probability, $|f(v)| \approx \sqrt{k}\,|v|$.
• And for any two vectors $v_1$ and $v_2$, $f(v_1 - v_2) = f(v_1) - f(v_2)$, since f is linear.
Together, these imply that with high probability $|f(v_1) - f(v_2)| \approx \sqrt{k}\,|v_1 - v_2|$: pairwise distances are preserved up to the scale factor $\sqrt{k}$.
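A minimal sketch of the projection and the norm scaling; the dimensions d and k below are arbitrary illustrative choices:

```python
import numpy as np

# Random projection in the spirit of Johnson-Lindenstrauss: project a
# d-dimensional vector onto k Gaussian directions and check that its
# norm scales by roughly sqrt(k).
rng = np.random.default_rng(0)
d, k = 10_000, 200
U = rng.normal(size=(k, d))          # k Gaussian vectors u_1, ..., u_k
v = rng.normal(size=d)
f_v = U @ v                          # f(v) = (u_1 . v, ..., u_k . v)
ratio = np.linalg.norm(f_v) / (np.sqrt(k) * np.linalg.norm(v))
print(f"|f(v)| / (sqrt(k) |v|) = {ratio:.4f}")   # close to 1
```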