
Dimensionality Reduction

Machine Learning, Sem VI, AI&DS
UNIT – VI
By Dr. Himani Deshpande (TSEC)


COMPONENTS OF AN ML MODEL

One major component: Data, i.e., information in the form of features.
Features

• Features are the available information about a system; they are its distinguishing characteristics, and multiple features together define the system.
• A dataset consists of the features of multiple records.
• Each feature, or column, represents a measurable piece of data that can be used for analysis.
Machine learning

[Figure: Available Information (Dataset) → Data Pre-Processing → Prediction Algorithm → Final Outcome]
Machine learning

[Figure: Available Information (Dataset) → Data Pre-Processing → Prediction Algorithm → Final Outcome, where FEATURE SELECTION narrows all features down to selected features before the prediction algorithm.]
What is dimensionality reduction?

• Dimensionality reduction is the task of reducing the number of features in a dataset.
• In machine learning tasks like regression or classification, there are often too many variables to work with. These variables are also called features.
• The higher the number of features, the more difficult it is to model them; this is known as the curse of dimensionality.
• Additionally, some of these features can be quite redundant, adding noise to the dataset, and it makes no sense to have them in the training data. This is where the feature space needs to be reduced.
What is dimensionality reduction? (1)

“The process of dimensionality reduction essentially transforms data from high-dimensional feature space to a low-dimensional feature space.”

• Simultaneously, it is also important that meaningful properties present in the data are not lost during the transformation.
• Dimensionality reduction is commonly used in data visualization to understand and interpret the data, and in machine learning or deep learning techniques to simplify the task at hand.
Curse of Dimensionality

• It is well known that ML algorithms need a large amount of data to learn invariances, patterns, and representations.
• If this data comprises a large number of features, this can lead to the curse of dimensionality.
• The curse of dimensionality describes the fact that, in order to estimate an arbitrary function with a certain accuracy, the number of samples required grows exponentially with the number of features (the dimensionality).
• This is especially true with big data, which yields more sparsity.
Curse of dimensionality

As the number of features increases:

• The volume of the feature space increases exponentially.
• Data becomes increasingly sparse in the space it occupies.
• Sparsity makes it difficult to achieve statistical significance for many methods.
• All distances start to converge to a common value.
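
A small NumPy sketch (illustrative, not from the slides) of the last point: as d grows, the ratio between the smallest and largest pairwise distance drifts toward 1, so distances become nearly indistinguishable.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
for d in [2, 10, 100, 1000]:
    X = rng.random((n, d))                           # n points uniform in [0, 1]^d
    sq = (X ** 2).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * (X @ X.T)   # squared pairwise distances
    dists = np.sqrt(np.clip(d2[np.triu_indices(n, k=1)], 0, None))
    print(f"d={d:5d}  min/max distance = {dists.min() / dists.max():.3f}")
```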
Curse of Dimensionality (1)

• Increasing the number of features will not always improve classification accuracy.
• In practice, the inclusion of more features might actually lead to worse performance.
• The number of training examples required increases exponentially with the dimensionality d, i.e., as k^d, where k is the number of bins per feature. For example, with k = 3 bins per feature, one feature yields 3^1 = 3 bins, two features 3^2 = 9 bins, and three features 3^3 = 27 bins, each of which needs training examples.

[Figure: a feature space discretized with k = 3, showing 3^1, 3^2 and 3^3 bins as d grows from 1 to 3.]
Curse of Dimensionality (2)

• Sparsity in data usually refers to features having a value of zero. If the data has a lot of sparse features, the space and computational complexity increase.
• Models trained on sparse data tend to perform poorly on the test dataset; they are not able to generalize well and hence overfit.
• When the data is sparse, samples are difficult to cluster, as high-dimensional data causes every observation to appear equidistant from every other.
• If data is meaningful and non-redundant, then there will be regions where similar data points come together and cluster; furthermore, these clusters must be statistically significant.
Curse of Dimensionality (3)

• Issues that arise with high-dimensional data:
• Running a risk of overfitting the machine learning model.
• Difficulty in clustering similar features.
• Increased space and computational time complexity.
• Non-sparse (dense) data is data with non-zero features; it contains information that is both meaningful and non-redundant.
• To tackle the curse of dimensionality, methods like dimensionality reduction are used.
• Dimensionality reduction techniques are very useful for transforming sparse features into dense features.
Dimensionality Reduction

• What is the objective?
• Choose an optimum set of features of lower dimensionality to improve classification accuracy.
• Different methods exist to reduce dimensionality.
Dimensionality Reduction Methods

• Feature Selection
• Feature Extraction
Approach for Dimensionality Reduction

FEATURE SELECTION
Choose the best subset of size ‘n’ from the available ‘d’ features.
Remove non-informative, inconsistent, irrelevant and redundant features.

FEATURE EXTRACTION
Given the feature set, extract ‘m’ new features by linear or non-linear combination of all the d features.
E.g.: PCA, LDA/MDA
Dimensionality Reduction (1)

Feature extraction: finds a set of new features (i.e., through some mapping f(·)) from the existing features. The mapping f(·) could be linear or non-linear.

$$\mathbf{x}=\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_N \end{bmatrix} \xrightarrow{\;f(\mathbf{x})\;} \mathbf{y}=\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_K \end{bmatrix},\qquad K \ll N$$

Feature selection: chooses a subset of the original features.

$$\mathbf{x}=\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_N \end{bmatrix} \;\rightarrow\; \mathbf{y}=\begin{bmatrix} x_{i_1} \\ x_{i_2} \\ \vdots \\ x_{i_K} \end{bmatrix},\qquad K \ll N$$
Feature selection

• In machine learning and statistics, feature selection, also known as variable selection, attribute selection or variable subset selection, is the process of selecting a subset of relevant features (variables, predictors) for use in model construction.
Feature selection

• A feature selection algorithm can be seen as the combination of:
→ a search technique for proposing new feature subsets, along with
→ an evaluation measure which scores the different feature subsets.
When feature selection is important

• Noisy data
• Non-informative features
• Too many features compared to samples
• Complex models
• Real-world samples that are inhomogeneous with the training and test samples

In such cases feature selection helps:
→ Less information is required to predict (e.g., a disease).
→ A smaller dataset is required.
Why feature selection

Feature selection techniques are used for several reasons:

• simplification of models to make them easier to interpret by researchers/users,
• shorter training times,
• less storage required,
• avoiding the curse of dimensionality,
• enhanced generalization by reducing overfitting,
• improved accuracy by using relevant features.
Why feature selection

[Figure: feature selection also makes models easier to debug.]
METHODS FOR FEATURE SELECTION
Feature Selection

[Figure]

Non-labeled Data

[Figure]
FILTER METHOD

• Filter methods work on the uniqueness of features and rely on statistical methods.
• Filter methods use criteria like information, dependency, consistency, distance, etc., to measure the weightage of features.
• These methods focus on the usefulness of every feature in identifying each problem class.
• Filter methods are known for having a low computational burden and do not face overfitting issues.
• Due to the low computation requirements, they work faster.
• Some ranking methods used by filter methods are the correlation coefficient, mutual information, F-score, chi-square test, etc.
Filter method

The most common search strategy, used by technical and non-technical professionals:
• Score each feature individually for its ability to discriminate the outcome.
• Rank features by score.
• Select the top k ranked features (a sketch follows the example table below).

Example data:

A   B   C     D   E   outcome
1   1   11.5  4   1   7
1   2   12.6  5   1   8
0   3   4.0   4   1   3
1   4   8.9   4   1   6
0   5   5.2   5   0   4
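
A minimal sketch of this score–rank–select loop on the table above, assuming absolute Pearson correlation with the outcome as the per-feature score (any univariate score works the same way):

```python
import numpy as np

# Columns A-E from the example table, plus the outcome column.
X = np.array([[1, 1, 11.5, 4, 1],
              [1, 2, 12.6, 5, 1],
              [0, 3,  4.0, 4, 1],
              [1, 4,  8.9, 4, 1],
              [0, 5,  5.2, 5, 0]], dtype=float)
y = np.array([7, 8, 3, 6, 4], dtype=float)
names = ["A", "B", "C", "D", "E"]

# Score each feature individually, rank, and keep the top k.
scores = [abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])]
ranked = sorted(zip(names, scores), key=lambda t: t[1], reverse=True)
k = 3
print(ranked)                              # features ranked by score
print([name for name, _ in ranked[:k]])   # top-k selected features
```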
FILTER METHOD EXAMPLES

F-score (Filter method)

Notation (for a binary outcome):
• the positive set contains the instances with outcome ‘1’, the negative set those with outcome ‘0’;
• $\bar{x}_i$ : average of the i-th feature over the whole dataset;
• $\bar{x}_i^{(+)}$ : average of the i-th feature over the positive set;
• $\bar{x}_i^{(-)}$ : average of the i-th feature over the negative set;
• $x_{k,i}^{(+)}$ : i-th feature of the k-th positive instance (similarly $x_{k,i}^{(-)}$).

$$F(i) = \frac{\left(\bar{x}_i^{(+)} - \bar{x}_i\right)^2 + \left(\bar{x}_i^{(-)} - \bar{x}_i\right)^2}{A + B}$$

with

$$A = \frac{1}{n_+ - 1}\sum_{k=1}^{n_+}\left(x_{k,i}^{(+)} - \bar{x}_i^{(+)}\right)^2, \qquad B = \frac{1}{n_- - 1}\sum_{k=1}^{n_-}\left(x_{k,i}^{(-)} - \bar{x}_i^{(-)}\right)^2$$

• The numerator indicates the discrimination between the positive and negative sets.
• A indicates the spread within the positive set; B indicates the spread within the negative set.
• A larger F-score means the feature is more discriminative.
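
A direct NumPy sketch of the formula above, assuming a binary label vector where 1 marks the positive set:

```python
import numpy as np

def f_score(x, y):
    """F-score of one feature column x against binary labels y (1 = +ve)."""
    pos, neg = x[y == 1], x[y == 0]
    # Numerator: discrimination between the positive and negative sets.
    num = (pos.mean() - x.mean()) ** 2 + (neg.mean() - x.mean()) ** 2
    # A and B: spread within the positive and negative sets.
    A = ((pos - pos.mean()) ** 2).sum() / (len(pos) - 1)
    B = ((neg - neg.mean()) ** 2).sum() / (len(neg) - 1)
    return num / (A + B)

x = np.array([11.5, 12.6, 4.0, 8.9, 5.2])   # feature C from the earlier table
y = np.array([1, 1, 0, 1, 0])               # its outcome thresholded at 5
print(f_score(x, y))
```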
Chi-square test (χ²)

• First used by Karl Pearson.
• A simple and widely used non-parametric test (it makes no assumptions about the population).
• Calculate the value of χ².
• The test is based on frequencies, not on parameters.
• It is also called the goodness-of-fit test.
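
A minimal sketch using scikit-learn's chi2 scorer (assuming non-negative features such as counts, which the test requires) to rank features by their chi-square statistic:

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, chi2

X, y = load_iris(return_X_y=True)            # iris features are non-negative
selector = SelectKBest(score_func=chi2, k=2).fit(X, y)
print(selector.scores_)                      # chi-square statistic per feature
print(selector.get_support(indices=True))    # indices of the top-2 features
```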
MUTUAL INFORMATION
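
Mutual information scores the statistical dependency between each feature and the class label; a minimal sketch, assuming scikit-learn's mutual_info_classif scorer:

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import mutual_info_classif

X, y = load_iris(return_X_y=True)
mi = mutual_info_classif(X, y, random_state=0)
print(mi)   # one mutual-information score per feature; higher = more dependent
```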


Wrapper method

• Selects a subset of features that together have good predictive power, as opposed to ranking features individually.
• Evaluation criterion: classification error.
WRAPPER METHOD

• Wrapper methods are “classifier dependent”.
• Results obtained by an ML algorithm (SVM, Naive Bayes, KNN, etc.) judge the relevance of each subset of parameters.
• Based on the classification accuracy, these methods evaluate the “goodness” of the selected feature subset directly, which helps to get effective results.
• Wrapper methods follow a recursive feature elimination or selection process.
• These methods face overfitting issues: the classifier may have a bias which could possibly increase the classification error.
WRAPPER METHODS

• Backward Elimination
• Forward Selection
• Recursive Feature Elimination
Forward selection

Forward selection is an iterative method in which we start with no features in the model.
Forward Selection

1. Start with a null model.
2. Fit all one-variable models in turn. Pick the model with the best accuracy.
3. Then fit all two-variable models that contain the variable selected in step 2. Pick the one for which the added variable gives the best accuracy.
4. Continue in this way until adding further variables does not improve the accuracy (a sketch follows the illustration).

[Figure: from candidate Features I–VI, the selected feature set grows one feature at a time, e.g., Feature II, then VI, III, V, I.]
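
A sketch of this procedure, assuming scikit-learn's SequentialFeatureSelector: it wraps an estimator and greedily adds the feature that most improves cross-validated accuracy.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
sfs = SequentialFeatureSelector(LogisticRegression(max_iter=1000),
                                n_features_to_select=2,
                                direction="forward", cv=5)
sfs.fit(X, y)
print(sfs.get_support(indices=True))   # indices of the selected features
```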
Backward Elimination

In backward elimination, we start with all the features and remove the least significant feature at each iteration, i.e., the one whose removal most improves the performance of the model.

We repeat this until no improvement is observed on removal of features (see the sketch below).

[Figure: starting from Features I–VI, features are removed one at a time.]
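
With the same wrapper as in the forward example (again assuming scikit-learn's SequentialFeatureSelector), backward elimination is just direction="backward":

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
sbs = SequentialFeatureSelector(LogisticRegression(max_iter=1000),
                                n_features_to_select=2,
                                direction="backward", cv=5)  # start full, drop per round
sbs.fit(X, y)
print(sbs.get_support(indices=True))
```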
Recursive Feature Elimination

It is a greedy optimization algorithm which aims to find the best-performing feature subset.

It repeatedly creates models and sets aside the best- or worst-performing feature at each iteration.

It constructs the next model with the remaining features until all the features are exhausted.

It then ranks the features based on the order of their elimination (a sketch follows).

[Figure: Features I–VI are eliminated one at a time and ranked by elimination order.]
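
A sketch assuming scikit-learn's RFE: the estimator is refit repeatedly, the weakest feature is set aside each round, and ranking_ records the elimination order (rank 1 = kept until the end).

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=2)
rfe.fit(X, y)
print(rfe.ranking_)   # rank per feature; selected features get rank 1
print(rfe.support_)   # boolean mask of the selected features
```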
Embedded

Embedded methods have the potential to classify as well as rank features.

→ Embedded methods combine the qualities of filter and wrapper methods.
→ They are implemented by algorithms that have their own built-in feature selection methods.
→ Examples (an L1 sketch follows the list):
• L1 (LASSO) regularization
• Decision tree
• Random Forest
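
A sketch of the first example, assuming scikit-learn: L1 (LASSO) regularization drives uninformative coefficients to exactly zero during training, so selection is built into the model itself.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Lasso

X, y = load_diabetes(return_X_y=True)
lasso = Lasso(alpha=1.0).fit(X, y)
print(lasso.coef_)                      # zero coefficients mark dropped features
selected = np.flatnonzero(lasso.coef_)  # indices of the surviving features
print(selected)
```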
Decision tree

• Entropy defines the randomness in the data (for a binary outcome):

Entropy(S) = −p(yes) log2 p(yes) − p(no) log2 p(no)

• Information Gain measures the reduction in entropy and decides the relevance of attributes. Gain(S, A) reflects the additional information about S provided by attribute A and is called the information gain:

Gain(S, A) = E(S) − E(S, A)

RANDOM FOREST for feature selection

• Works as ensemble learning, i.e., many weak classifiers combine to form a strong classifier.
• Used in the literature as an embedded method for feature selection.
• Solves the overfitting issue of a single decision tree.
• Multiple decision trees are formed with diversity in the data:
➢ a randomly selected sample for each decision tree;
➢ a randomly selected set of attributes for each decision tree.
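
A sketch assuming scikit-learn's RandomForestClassifier: the ensemble's impurity-based importances rank the features.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
for idx, imp in enumerate(rf.feature_importances_):
    print(f"feature {idx}: importance {imp:.3f}")   # higher = more useful
```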
FEATURE SELECTION: FILTER vs WRAPPER vs EMBEDDED

FILTER
• Selects each feature individually.
• Generic set of methods which do not incorporate a specific ML algorithm.
• Much faster in terms of time complexity.
• Less prone to overfitting.

WRAPPER
• Counts the importance of a feature as part of a subset.
• Evaluates on a specific ML algorithm to find optimal features.
• High computation time for datasets with many features.
• High chances of overfitting.

EMBEDDED
• Selects features within each iteration of the model training phase.
• Embeds feature selection in the model building process.
• In between filter and wrapper in terms of time complexity.
• Generally used to reduce overfitting.
Hybrid Approach

• Filter methods are said to get the features relevant to the class, and they avoid redundancy.
• Filter methods are good in terms of evaluating the dataset.
• Wrapper methods provide optimized features and higher accuracy of prediction and classification.
• Wrapper and embedded methods help in selecting a better feature set for the classification algorithm.

The literature suggests that both filter and wrapper methods have their own sets of advantages and disadvantages. Selecting relevant features by combining methods shows improvement in classification accuracy, speeds up the process and reduces the error rate.
Feature Extraction
Feature Extraction

• From a mathematical point of view, finding an optimum mapping y = f(x) is equivalent to optimizing an objective criterion.
• Different methods use different objective criteria, e.g.:
• Minimize information loss: represent the data as accurately as possible in the lower-dimensional space.
• Maximize discriminatory information: enhance the class-discriminatory information in the lower-dimensional space.
Feature Extraction

Popular linear feature extraction methods:

• Principal Component Analysis (PCA): seeks a projection that preserves as much information in the data as possible.
• Linear Discriminant Analysis (LDA): seeks a projection that best discriminates the data.
• Singular Value Decomposition (SVD): a widely used technique to decompose a matrix into several component matrices (a sketch follows).
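
As a quick illustration of the SVD route (a NumPy sketch on random data, not a method prescribed by these slides): keeping only the k largest singular values yields a rank-k, lower-dimensional representation.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((100, 20))                          # 100 samples, 20 features
U, s, Vt = np.linalg.svd(X - X.mean(axis=0), full_matrices=False)

k = 3
X_reduced = U[:, :k] * s[:k]                       # 100 x k low-dim scores
X_approx = X_reduced @ Vt[:k] + X.mean(axis=0)     # rank-k reconstruction
print(X_reduced.shape, np.linalg.norm(X - X_approx))  # reconstruction error
```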
Principal Component Analysis (PCA)
PCA

PCA is a technique used to:
➢ reduce the dimensionality of the data set;
➢ identify new, meaningful underlying features;
➢ lose minimum information.

PCA can also be used for denoising and data compression.
PCA

• PCA transforms a large set of variables into a smaller one that still contains most of the information.
• It does this by identifying linear combinations of the original variables that best explain the variance of all the variables.
• These linear combinations are called principal components.
PCA

• Principal Component Analysis (PCA) is a well-known unsupervised dimensionality reduction technique that constructs relevant features/variables through linear (linear PCA) or non-linear (kernel PCA) combinations of the original variables (features).
Considering any two variables without PCA

[Figure]

PCA

[Figures]
Principal Components

• The construction of relevant features is achieved by linearly transforming correlated variables into a smaller number of uncorrelated variables.
• This is done by projecting (dot product) the original data into the reduced PCA space using the eigenvectors of the covariance/correlation matrix, aka the principal components (PCs).
• The resulting projected data are essentially linear combinations of the original data capturing most of the variance in the data.
PCA

PCA is an orthogonal transformation of the data into a series of uncorrelated data living in the reduced PCA space, such that the first component (PC1) explains the most variance in the data, with each subsequent component explaining less.
Principal Component Analysis

[Figures]
PCA

PCA is based on:
→ variance & covariance
→ eigenvectors & eigenvalues
Eigenvectors: diagram representation

[Figure: eigenvectors with eigenvalues λ1 and λ2]
Eigenvectors

• In data science and machine learning, eigenvalues are used in PCA for dimensionality reduction.
• The eigenvectors corresponding to the largest eigenvalues capture the most essential features of the data, helping reduce complexity while retaining important patterns.
Eigenvalues & Eigenvectors

[Figure]
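
A small NumPy sketch of the defining relation C v = λ v on a hypothetical 2×2 covariance matrix (values chosen only for illustration):

```python
import numpy as np

cov = np.array([[2.0, 1.0],
                [1.0, 2.0]])                 # hypothetical covariance matrix
eigvals, eigvecs = np.linalg.eig(cov)
print(eigvals)                               # eigenvalues 3 and 1 (order may vary)
print(eigvecs)                               # columns are the eigenvectors
v, lam = eigvecs[:, 0], eigvals[0]
print(np.allclose(cov @ v, lam * v))         # True: cov @ v equals lambda * v
```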
PCA Steps
Step 1 [Figure]

Step 2 [Figure]

Step 3 [Figure]

Step 4 [Figure]

Step 5 [Figure]
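
A minimal end-to-end NumPy sketch, assuming the steps above follow the standard five-step PCA recipe (standardize, compute the covariance matrix, eigen-decompose it, sort and keep the top components, project):

```python
import numpy as np

def pca(X, n_components=2):
    # Step 1: standardize each feature (zero mean, unit variance)
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # Step 2: covariance matrix of the standardized data
    C = np.cov(Z, rowvar=False)
    # Step 3: eigenvalues and eigenvectors of the covariance matrix
    eigvals, eigvecs = np.linalg.eigh(C)          # eigh: C is symmetric
    # Step 4: sort by eigenvalue (descending) and keep the top components
    order = np.argsort(eigvals)[::-1]
    components = eigvecs[:, order[:n_components]]
    # Step 5: project the data onto the principal components
    return Z @ components, eigvals[order]

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
scores, variances = pca(X, n_components=2)
print(scores.shape)                               # (200, 2)
print(variances / variances.sum())                # variance explained per PC
```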
PCA

Things to keep in mind while implementing PCA:
• Scale features when applying PCA (a pipeline sketch follows).
• Your accuracy might drop.
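
A sketch of the scaling advice, assuming scikit-learn: placing StandardScaler before PCA in a pipeline ensures every feature contributes on a comparable scale.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, _ = load_iris(return_X_y=True)
pipe = make_pipeline(StandardScaler(), PCA(n_components=2))
X_2d = pipe.fit_transform(X)                          # scaled, then projected
print(X_2d.shape)                                     # (150, 2)
print(pipe.named_steps["pca"].explained_variance_ratio_)
```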
THANK YOU
