High-Dimensional Space
Prof. Asim Tewari
IIT Bombay
Dimensionality reduction
• Feature selection: Feature selection approaches try to find a subset
of the original variables (also called features or attributes)
– Filter strategy (e.g. information gain)
– Wrapper strategy (e.g. search guided by accuracy)
– Embedded strategy (features are added or removed while building the model, based on the prediction errors)
• Feature projection: Feature projection transforms the data in the high-dimensional space to a space of fewer dimensions. The data transformation may be linear, as in principal component analysis (PCA, sketched below), but many nonlinear dimensionality reduction techniques also exist. For multidimensional data, a tensor representation can be used for dimensionality reduction through multilinear subspace learning.
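To make linear feature projection concrete, here is a minimal PCA sketch in Python/NumPy; the function name pca_project and the random toy data are illustrative choices, not part of the lecture:

```python
import numpy as np

# Minimal PCA sketch: project centered data onto its top-k principal
# directions (the right singular vectors of the centered data matrix).
# pca_project and the toy data below are illustrative, not from the lecture.
def pca_project(X, k):
    Xc = X - X.mean(axis=0)                    # center each feature
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T                       # coordinates in top-k subspace

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))                 # 500 samples in 10 dimensions
Z = pca_project(X, k=2)                        # reduced to 2 dimensions
print(Z.shape)                                 # (500, 2)
```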
Markov’s inequality
• Let x be a nonnegative random variable. Then for a > 0,
$$P(x \ge a) \le \frac{E(x)}{a}.$$
Proof: For a continuous nonnegative random variable x with probability density p,
$$E(x) = \int_0^\infty x\,p(x)\,dx \ge \int_a^\infty x\,p(x)\,dx \ge a \int_a^\infty p(x)\,dx = a\,P(x \ge a).$$
Thus,
$$P(x \ge a) \le \frac{E(x)}{a}.$$
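A quick Monte Carlo sanity check of the inequality; the exponential test distribution is an arbitrary illustrative choice:

```python
import numpy as np

# Monte Carlo check of Markov's inequality, P(x >= a) <= E(x)/a,
# for a nonnegative random variable (here: exponential with mean 1).
rng = np.random.default_rng(0)
x = rng.exponential(scale=1.0, size=1_000_000)
for a in (1.0, 2.0, 5.0):
    lhs = np.mean(x >= a)       # empirical tail probability P(x >= a)
    rhs = x.mean() / a          # Markov bound E(x)/a
    print(f"a={a}: P(x>=a)={lhs:.4f} <= E(x)/a={rhs:.4f}")
```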
Chebyshev’s inequality
• Let x be a random variable. Then for c > 0,
$$P\big(|x - E(x)| \ge c\big) \le \frac{\mathrm{Var}(x)}{c^2}.$$
Proof: Apply Markov's inequality to the nonnegative random variable $(x - E(x))^2$, noting that $|x - E(x)| \ge c$ exactly when $(x - E(x))^2 \ge c^2$.
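A similar numerical check; the standard normal test distribution is an arbitrary illustrative choice:

```python
import numpy as np

# Monte Carlo check of Chebyshev's inequality,
# P(|x - E(x)| >= c) <= Var(x)/c^2, for a standard normal x.
rng = np.random.default_rng(0)
x = rng.normal(size=1_000_000)
for c in (1.0, 2.0, 3.0):
    lhs = np.mean(np.abs(x - x.mean()) >= c)   # empirical tail probability
    rhs = x.var() / c**2                        # Chebyshev bound
    print(f"c={c}: P(|x-E(x)|>=c)={lhs:.4f} <= Var/c^2={rhs:.4f}")
```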
Law of Large Numbers
• Let $x_1, x_2, \ldots, x_n$ be n independent samples of a random variable x. Then for any $\epsilon > 0$,
$$P\!\left(\left|\frac{x_1 + x_2 + \cdots + x_n}{n} - E(x)\right| \ge \epsilon\right) \le \frac{\mathrm{Var}(x)}{n\,\epsilon^2}.$$
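A small simulation illustrating the statement; the Uniform[0,1] distribution is an arbitrary illustrative choice:

```python
import numpy as np

# Law of Large Numbers, empirically: the sample mean of n draws
# concentrates around E(x) as n grows (x ~ Uniform[0,1], E(x) = 0.5).
rng = np.random.default_rng(0)
for n in (10, 1_000, 100_000):
    sample_mean = rng.uniform(size=n).mean()
    print(f"n={n:>7}: sample mean = {sample_mean:.4f} (E(x) = 0.5)")
```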
Law of Large Numbers
Proof: By independence, the sample mean $\frac{1}{n}(x_1 + \cdots + x_n)$ has expectation $E(x)$ and variance $\mathrm{Var}(x)/n$, so the bound follows directly from Chebyshev's inequality.
Law of Large Numbers
Implications of the Law of Large Numbers:
• If we draw a point x from a d-dimensional Gaussian with unit variance, then it will lie close to the sphere of radius $\sqrt{d}$.
This is because $|x|^2 = \sum_{i=1}^{d} x_i^2$ is a sum of d independent terms, each with expectation $E(x_i^2) = 1$, so by the Law of Large Numbers
$$|x|^2 \approx d.$$
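This concentration is easy to observe numerically; the dimensions chosen below are arbitrary illustrative values:

```python
import numpy as np

# Draw points from a d-dimensional standard Gaussian and check that
# their norms concentrate around sqrt(d).
rng = np.random.default_rng(0)
for d in (10, 100, 10_000):
    x = rng.normal(size=(1000, d))          # 1000 points in R^d
    norms = np.linalg.norm(x, axis=1)
    print(f"d={d:>6}: mean |x| = {norms.mean():8.2f}, sqrt(d) = {np.sqrt(d):8.2f}")
```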
Law of Large Numbers
Implications of the Law of Large Numbers:
• If we draw two points y and z from a d-dimensional Gaussian with unit variance, then they will be approximately orthogonal.
This is since for all i, $E(y_i z_i) = E(y_i)\,E(z_i) = 0$, so by the Law of Large Numbers
$$y \cdot z = \sum_{i=1}^{d} y_i z_i \approx 0.$$
Therefore, $|y - z|^2 = |y|^2 + |z|^2 - 2\,y \cdot z \approx |y|^2 + |z|^2$.
Thus, by the Pythagorean theorem, the two points y and z must be approximately orthogonal.
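A numerical illustration; the dimensions chosen are arbitrary:

```python
import numpy as np

# Two independent Gaussian points in high dimension are nearly orthogonal:
# the cosine of the angle between them shrinks like 1/sqrt(d).
rng = np.random.default_rng(0)
for d in (10, 100, 10_000):
    y, z = rng.normal(size=(2, d))
    cos_angle = y @ z / (np.linalg.norm(y) * np.linalg.norm(z))
    print(f"d={d:>6}: cos(angle) = {cos_angle:+.4f}")
```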
Volume of Objects in High Dimensions
Volume of the Unit Ball
• The volume V(d) of the unit ball in d dimensions is
$$V(d) = \frac{\pi^{d/2}}{\Gamma\!\left(\frac{d}{2} + 1\right)} = \frac{2\,\pi^{d/2}}{d\,\Gamma\!\left(\frac{d}{2}\right)}.$$
• Because the Gamma function in the denominator grows faster than $\pi^{d/2}$, $V(d) \to 0$ as $d \to \infty$: in high dimensions the unit ball occupies a vanishingly small fraction of the cube $[-1,1]^d$ that contains it.
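A short computation of V(d), done in log space to avoid overflow (illustrative sketch):

```python
import math

# Volume of the unit ball, V(d) = pi^(d/2) / Gamma(d/2 + 1),
# computed via log-Gamma to avoid overflow; it vanishes as d grows.
def unit_ball_volume(d):
    return math.exp((d / 2) * math.log(math.pi) - math.lgamma(d / 2 + 1))

for d in (2, 5, 10, 20, 100):
    print(f"d={d:>3}: V(d) = {unit_ball_volume(d):.3e}")
```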
Volume near the Equator
• An interesting fact about the unit ball in high
dimensions is that most of its volume is
concentrated near its “equator.” In particular,
for any unit-length vector v defining “north,”
most of the volume of the unit ball lies in the
thin slab of points whose dot product with v
has magnitude O(1/ √d).
Volume near the Equator
Let A denote the portion of the ball with $x_1 \ge \frac{c}{\sqrt{d-1}}$, and let H denote the upper hemisphere.
We can then show that the ratio of the volume of A to the volume of H goes to zero by calculating an upper bound on volume(A) and a lower bound on volume(H) and proving that the ratio of these bounds goes to zero.
Volume near the Equator
Upper bound: $\mathrm{volume}(A) \le \frac{V(d-1)}{c\,\sqrt{d-1}}\,e^{-c^2/2}$
Lower bound: $\mathrm{volume}(H) \ge \frac{V(d-1)}{2\,\sqrt{d-1}}$
Thus:
$$\frac{\mathrm{volume}(A)}{\mathrm{volume}(H)} \le \frac{\frac{V(d-1)}{c\sqrt{d-1}}\,e^{-c^2/2}}{\frac{V(d-1)}{2\sqrt{d-1}}} = \frac{2}{c}\,e^{-c^2/2},$$
which goes to zero as c increases.
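An empirical check of the slab concentration, using the standard trick of sampling uniformly from the ball by normalizing Gaussians and scaling radii by $U^{1/d}$; the choices d = 100 and c = 2 are arbitrary illustrative values:

```python
import numpy as np

# Most of the unit ball's volume lies in the slab |x_1| <= c/sqrt(d-1).
# Uniform samples in the ball: normalize Gaussians, scale radii by U^(1/d).
rng = np.random.default_rng(0)
d, n, c = 100, 100_000, 2.0
g = rng.normal(size=(n, d))
points = g / np.linalg.norm(g, axis=1, keepdims=True)   # uniform on sphere
points *= rng.uniform(size=(n, 1)) ** (1 / d)           # uniform in ball
in_slab = np.abs(points[:, 0]) <= c / np.sqrt(d - 1)
print(f"fraction in slab: {in_slab.mean():.4f} "
      f"(lower bound from above: {1 - (2 / c) * np.exp(-c**2 / 2):.4f})")
```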
Random Projection
Johnson-Lindenstrauss Lemma
• The projection $f : \mathbb{R}^d \to \mathbb{R}^k$. Pick k Gaussian vectors $u_1, u_2, \ldots, u_k$ in $\mathbb{R}^d$ with unit-variance coordinates. For any vector v, define the projection f(v) by:
$$f(v) = (u_1 \cdot v,\; u_2 \cdot v,\; \ldots,\; u_k \cdot v).$$
(The projection f(v) is the vector of dot products of v with the $u_i$.) Then the Johnson-Lindenstrauss Lemma states that:
• With high probability, $|f(v)| \approx \sqrt{k}\,|v|$.
• And for any two vectors $v_1$ and $v_2$, $f(v_1 - v_2) = f(v_1) - f(v_2)$, since f is linear.
Together, these imply that with high probability $|f(v_1) - f(v_2)| \approx \sqrt{k}\,|v_1 - v_2|$: pairwise distances are preserved up to the scale factor $\sqrt{k}$.
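A minimal sketch of the projection and the norm scaling; the dimensions d and k below are arbitrary illustrative choices:

```python
import numpy as np

# Random projection in the spirit of Johnson-Lindenstrauss: project a
# d-dimensional vector onto k Gaussian directions and check that its
# norm scales by roughly sqrt(k).
rng = np.random.default_rng(0)
d, k = 10_000, 200
U = rng.normal(size=(k, d))          # k Gaussian vectors u_1, ..., u_k
v = rng.normal(size=d)
f_v = U @ v                          # f(v) = (u_1 . v, ..., u_k . v)
ratio = np.linalg.norm(f_v) / (np.sqrt(k) * np.linalg.norm(v))
print(f"|f(v)| / (sqrt(k) |v|) = {ratio:.4f}")   # close to 1
```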