Machine Learning, UNIT – VI
AI&DS, Sem VI
By Dr. Himani Deshpande (TSEC)
One Major Component: Data
Information in the form of Features
Features
• Features are the available information about a system and are referred to as its distinguishing characteristics; multiple features group together to define a system.
• A dataset consists of the features of multiple records.
• Each feature, or column, represents a measurable piece of data that can be used for analysis.
Machine learning pipeline:
Available Information (Dataset) → Data Pre-Processing → Prediction Algorithm → Final Outcome

With FEATURE SELECTION, the pre-processing step reduces all features to a selected subset:
Available Information (Dataset) → Data Pre-Processing (all features → selected features) → Prediction Algorithm → Final Outcome
What is dimensionality reduction?
• Dimensionality reduction is the task of reducing the number of features in a dataset.
• In machine learning tasks like regression or classification, there are often too many variables to work with. These variables are also called features.
• The higher the number of features, the more difficult it is to model them; this is known as the curse of dimensionality.
• Additionally, some of these features can be quite redundant, adding noise to the dataset, and it makes no sense to have them in the training data. This is where the feature space needs to be reduced.
What is dimensionality reduction? (1)
• Dimensionality reduction refers to "the transformation of data from a high-dimensional space into a low-dimensional space."
• Simultaneously, it is also important that meaningful properties present in the data are not lost during the transformation.
• Dimensionality reduction is commonly used in data visualization to understand and interpret the data, and in machine learning or deep learning techniques to simplify the task at hand.
Curse of Dimensionality
• It is well known that ML algorithms need a large amount of data to learn invariance, patterns, and representations.
• If this data comprises a large number of features, this can lead to the curse of dimensionality.
• The curse of dimensionality describes the fact that, in order to estimate an arbitrary function with a certain accuracy, the number of samples required grows exponentially with the number of features (the dimensionality).
• This is especially true with big data, which yields more sparsity.
Curse of dimensionality
• Data becomes increasingly sparse in the space it occupies.
• Sparsity makes it difficult to achieve statistical significance for many methods.
• All pairwise distances start to converge to a common value.
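A quick way to see this distance concentration empirically is to sample random points in increasing dimensions and compare nearest and farthest distances. The sketch below is a hedged illustration (not from the slides), using NumPy:

```python
import numpy as np

rng = np.random.default_rng(0)

# For random points in [0,1]^d, the gap between the nearest and farthest
# neighbour shrinks relative to the distances themselves as d grows.
for d in (2, 10, 100, 1000):
    X = rng.random((500, d))
    dists = np.linalg.norm(X - X[0], axis=1)[1:]  # distances from point 0
    ratio = (dists.max() - dists.min()) / dists.min()
    print(f"d={d:5d}  relative spread (max-min)/min = {ratio:.3f}")
# The relative spread tends toward 0, i.e. all distances converge.
```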
Curse of Dimensionality (1)
• Increasing the number of features will not always improve classification accuracy; it might actually lead to worse performance.
• The number of training examples required increases exponentially with the dimensionality d (i.e., as k^d, where k is the number of bins per feature).
[Figure: histogram example with k = 3 bins per feature, showing the bin count growing with dimension]
Curse of Dimensionality (2)
• When models are fit to a sparse, high-dimensional dataset, they are not able to generalize well. Hence they overfit.
• When the data is sparse, samples are difficult to cluster, as high-dimensional data causes every observation to appear equidistant from every other.
• If data is meaningful and non-redundant, then there will be regions where similar data points come together and cluster; furthermore, these clusters must be statistically significant.
Curse of Dimensionality (3)
• Issues that arise with high-dimensional data are:
  • Running a risk of overfitting the machine learning model.
  • Difficulty in clustering similar features.
  • Increased space and computational time complexity.
• Non-sparse (dense) data is data with non-zero features; it also contains information that is both meaningful and non-redundant.
• To tackle the curse of dimensionality, methods like dimensionality reduction are used.
• Dimensionality reduction techniques are very useful for transforming sparse features into dense features.
Dimensionality Reduction
• What is the objective?
  • Choose an optimum set of features of lower dimensionality to improve classification accuracy.
• Different methods exist to reduce dimensionality.
Dimensionality Reduction Methods
• Feature Selection
• Feature Extraction
FEATURE SELECTION
Choose the best subset of size 'n' from the available 'd' features.
Remove non-informative, inconsistent, irrelevant and redundant features.

FEATURE EXTRACTION
Given the feature set, extract 'm' new features by a linear or non-linear combination of all the d features.
E.g., PCA, LDA/MDA.
Dimensionality Reduction (1)
• Feature extraction: finds a set of new features (i.e., through some mapping f(·)) from the existing features. The mapping f(·) could be linear or non-linear.
• Feature selection: chooses a subset of the original features.

Feature extraction:

$$\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_N \end{bmatrix} \xrightarrow{\;f(\mathbf{x})\;} \mathbf{y} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_K \end{bmatrix}, \qquad K \ll N$$

Feature selection:

$$\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_N \end{bmatrix} \rightarrow \mathbf{y} = \begin{bmatrix} x_{i_1} \\ x_{i_2} \\ \vdots \\ x_{i_K} \end{bmatrix}, \qquad K \ll N$$
Feature selection
• Feature selection, also known as variable selection, attribute selection or variable subset selection, is the process of selecting a subset of relevant features (variables, predictors) for use in model construction.
Feature selection
• A feature selection algorithm can be seen as the combination of
  → a search technique for proposing new feature subsets, along with
  → an evaluation measure which scores the different feature subsets.
When feature selection is important
• Noisy data
• Non-informative features
• Too many features compared to samples
• Complex models
• Samples in real scenarios that are inhomogeneous with the training and test samples
→ Less information is required to predict, e.g., a disease.
→ A smaller dataset is required.
Why feature selection
• simpler models that are easier to interpret,
• shorter training times,
• less storage requirement,
• avoiding the curse of dimensionality,
• enhanced generalization by reducing overfitting,
• improved accuracy by using only relevant features.
Why feature selection (contd.)
[Figure: additional benefits, including models that are easier to debug]

METHODS FOR FEATURE SELECTION
Feature Selection
[Figure-only slide]

Non-labeled Data
[Figure-only slide]
FEATURE SELECTION: Filter Method
• It relies on statistical methods.
• Filter methods use criteria like information, dependency, consistency, distance, etc. to measure the weightage of features.
• These methods focus on the usefulness of every feature in identifying each problem class.
• The filter method is known for having less computational burden and does not face overfitting issues.
• Due to lower computation requirements, it works faster.
• Some ranking methods used by filter methods are the correlation coefficient, mutual information, F-score, chi-square test, etc.
Filter method
A professional typically proceeds as follows:
• Score each feature individually for its ability to discriminate the outcome.
• Rank features by score.
• Select the top k ranked features.

  A  B  C     D  E  outcome
  1  1  11.5  4  1  7
  1  2  12.6  5  1  8
  0  3  4     4  1  3
  1  4  8.9   4  1  6
  0  5  5.2   5  0  4
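A minimal sketch of this score-rank-select procedure, assuming scikit-learn is available (the ANOVA F-statistic `f_classif` stands in for whichever per-feature score is chosen):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif

# Toy data standing in for the A..E table above (hypothetical).
X, y = make_classification(n_samples=200, n_features=5, n_informative=3,
                           random_state=0)

# Score each feature individually, rank, and keep the top k.
selector = SelectKBest(score_func=f_classif, k=3)
X_selected = selector.fit_transform(X, y)

print("per-feature scores:", np.round(selector.scores_, 2))
print("selected columns:  ", selector.get_support(indices=True))
```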
FILTER METHOD EXAMPLE: F-SCORE
For the i-th feature:

$$F(i) = \frac{\left(\bar{x}_i^{+} - \bar{x}_i\right)^2 + \left(\bar{x}_i^{-} - \bar{x}_i\right)^2}{A + B}$$

where $\bar{x}_i$ is the mean of the i-th feature over the whole dataset, $\bar{x}_i^{+}$ and $\bar{x}_i^{-}$ are its means over the positive and negative instances, and $x_{k,i}^{+}$ is the i-th feature in the k-th positive instance.

• Numerator: indicates the discrimination between the positive and negative sets.
• Denominators: measure the scatter within each set,

$$A = \frac{1}{n_{+} - 1} \sum_{k=1}^{n_{+}} \left(x_{k,i}^{+} - \bar{x}_i^{+}\right)^2, \qquad B = \frac{1}{n_{-} - 1} \sum_{k=1}^{n_{-}} \left(x_{k,i}^{-} - \bar{x}_i^{-}\right)^2$$
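As a sanity check, the formula translates directly into NumPy. This is a hedged sketch of the F-score above, not library code; the binary labels are hypothetical (the table's numeric outcome has been binarized for illustration):

```python
import numpy as np

def f_score(x, y):
    """F-score of one feature x (1-D array) for binary labels y in {0, 1}."""
    pos, neg = x[y == 1], x[y == 0]
    num = (pos.mean() - x.mean()) ** 2 + (neg.mean() - x.mean()) ** 2
    a = ((pos - pos.mean()) ** 2).sum() / (len(pos) - 1)  # scatter in +ve set
    b = ((neg - neg.mean()) ** 2).sum() / (len(neg) - 1)  # scatter in -ve set
    return num / (a + b)

x = np.array([11.5, 12.6, 4.0, 8.9, 5.2])   # feature C from the table above
y = np.array([1, 1, 0, 1, 0])               # hypothetical binary outcome
print(round(f_score(x, y), 3))
```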
CHI-SQUARE (χ²) TEST
• Calculate the value of χ².
• The test is based on frequencies and not on parameters.
• It is also called the goodness-of-fit test.
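The statistic itself, stated here for completeness (the slide only names the test), compares observed and expected frequencies:

$$\chi^2 = \sum_{j} \frac{(O_j - E_j)^2}{E_j}$$

where $O_j$ is the observed frequency and $E_j$ the expected frequency of category $j$; a large value indicates the feature and the class are not independent.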
MUTUAL INFORMATION
• Measures the dependency between features and the class, and can score a feature subset as a whole, as opposed to ranking features individually.
[Figure: evaluation of a feature subset via classification error]
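For reference (the slide gives the name only), the mutual information between a discrete feature X and class Y is:

$$I(X;Y) = \sum_{x}\sum_{y} p(x, y)\,\log \frac{p(x, y)}{p(x)\,p(y)}$$

It is zero when X and Y are independent and grows as the feature becomes more informative about the class. In practice, a per-feature estimate can be assumed to come from something like scikit-learn's `mutual_info_classif`.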
WRAPPER METHOD
• Results obtained by an ML algorithm (SVM, Naive Bayes, KNN, etc.) judge the relevance of each subset of parameters.
• Based on the classification accuracy, these methods evaluate the "goodness" of the selected feature subset directly, which helps to get effective results.
• The wrapper method follows a recursive feature elimination or selection process.
• These methods face overfitting issues: classifiers may have a bias which could possibly increase classification error.
WRAPPER METHODS
• Forward Selection
• Backward Elimination
• Recursive Feature Elimination
Forward Selection
1. Start with no features in the model.
2. Fit all one-variable models in turn. Pick the model with the best accuracy.
3. Then fit all two-variable models that contain the variable selected in step 2. Pick the one for which the added variable gives the best accuracy.
4. Continue in this way until adding further variables does not improve the accuracy.
[Figure: the selected feature set grows one feature per iteration, e.g., Feature VI → Feature III → Feature V → Feature I]
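A minimal sketch of this procedure, assuming scikit-learn's `SequentialFeatureSelector` (set `direction="backward"` for the elimination variant on the next slide):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression

X, y = load_breast_cancer(return_X_y=True)

# Greedily add the feature that most improves cross-validated accuracy.
sfs = SequentialFeatureSelector(
    LogisticRegression(max_iter=5000),
    n_features_to_select=5,
    direction="forward",      # "backward" = backward elimination
    scoring="accuracy",
    cv=5,
)
sfs.fit(X, y)
print("chosen feature indices:", sfs.get_support(indices=True))
```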
Backward Elimination
In backward elimination, we start with all the features and remove the least significant feature at each iteration, which improves the performance of the model.
We repeat this until no improvement is observed on removal of features.
[Figure: full feature set Feature I … Feature VI being pruned]
Recursive Feature Elimination
• It repeatedly creates models and keeps aside the best- or worst-performing feature at each iteration.
• It constructs the next model with the remaining features until all the features are exhausted.
• It then ranks the features based on the order of their elimination.
[Figure: features eliminated one per round, from Feature I … Feature VI]
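A hedged sketch using scikit-learn's `RFE`, which implements exactly this eliminate-and-rank loop (the choice of a linear SVM as the underlying estimator is an assumption):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import RFE
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

# A linear SVM supplies coef_ weights; RFE drops the weakest feature each round.
rfe = RFE(estimator=SVC(kernel="linear"), n_features_to_select=5, step=1)
rfe.fit(X, y)

print("kept features:", rfe.get_support(indices=True))
print("elimination ranks (1 = kept):", rfe.ranking_)
```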
Embedded Methods
→ Embedded methods combine the qualities of filter and wrapper methods.
→ They are implemented by algorithms that have their own built-in feature selection methods.
→ Examples (a LASSO sketch follows this list):
  • L1 (LASSO) regularization
  • Decision tree
  • Random Forest
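A minimal sketch of L1-based embedded selection, assuming scikit-learn; the L1 penalty drives uninformative coefficients exactly to zero, so selection falls out of model fitting itself (the `alpha` value is an arbitrary choice for the demo):

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

X, y = load_diabetes(return_X_y=True)
X = StandardScaler().fit_transform(X)   # L1 penalties are scale-sensitive

lasso = Lasso(alpha=0.5).fit(X, y)      # alpha chosen arbitrarily for the demo
kept = np.flatnonzero(lasso.coef_)      # features with non-zero coefficients
print("surviving feature indices:", kept)
```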
Decision tree
• Entropy defines the randomness in the data with respect to the outcome:

$$\text{Entropy}(S) = -p(\text{yes}) \log_2 p(\text{yes}) - p(\text{no}) \log_2 p(\text{no})$$

• Information Gain measures the reduction in entropy from splitting on a feature, and so decides the relevance of attributes:

$$\text{Gain}(S, A) = \text{Entropy}(S) - \sum_{v \in \text{Values}(A)} \frac{|S_v|}{|S|}\,\text{Entropy}(S_v)$$

It reflects the additional information about S provided by attribute A and is called the information gain.
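A small worked sketch of these two formulas in plain NumPy (the yes/no counts are made up for illustration):

```python
import numpy as np

def entropy(counts):
    """Entropy(S) = -sum p*log2(p) over class proportions."""
    p = np.asarray(counts, dtype=float)
    p = p[p > 0] / p.sum()
    return -(p * np.log2(p)).sum()

# Hypothetical: 9 "yes" / 5 "no" overall; attribute A splits S into
# subsets with (6 yes, 2 no) and (3 yes, 3 no).
e_s = entropy([9, 5])
gain = e_s - (8 / 14) * entropy([6, 2]) - (6 / 14) * entropy([3, 3])
print(f"Entropy(S) = {e_s:.3f}, Gain(S, A) = {gain:.3f}")
```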
Random Forest
• Used in the literature as an embedded method for feature selection.
• Solves the overfitting issue of a single decision tree.
• Multiple decision trees are formed with diversity in the data:
  ➢ a randomly selected sample for each decision tree;
  ➢ a randomly selected set of attributes for each decision tree.
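A brief sketch of how a random forest doubles as a feature selector via its impurity-based importances (a common pattern, assuming scikit-learn):

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

X, y = load_breast_cancer(return_X_y=True)

rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# Rank features by mean impurity decrease across the ensemble.
order = np.argsort(rf.feature_importances_)[::-1]
print("top 5 features by importance:", order[:5])
```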
FEATURE SELECTION: FILTER vs WRAPPER vs EMBEDDED
• Filter: selects each feature individually; a generic set of methods that do not incorporate a specific ML algorithm; much faster in terms of time complexity; less prone to overfitting.
• Wrapper: scores the importance of a feature as part of a subset; evaluates on a specific ML algorithm to find optimal features; high computation time for datasets with many features; high chance of overfitting.
• Embedded: selects features during each iteration of the model-training phase; embeds selection in the model-building process; in between filter and wrapper in terms of time complexity; generally used to reduce overfitting.
Hybrid Approach
• Filter methods are said to pick the features relevant to the class and avoid redundancy; wrapper methods provide optimized features and higher accuracy of prediction and classification.
• Filter methods are good at evaluating the dataset; wrapper and embedded methods help in selecting a better feature set for the classification algorithm.
• The literature suggests that both filter and wrapper methods have their own sets of advantages and disadvantages.
• Selecting relevant features by combining methods shows improvement in classification accuracy, speeds up the process and reduces the error rate.
FEATURE EXTRACTION
Feature Extraction
• From a mathematical point of view, finding an optimum mapping y = f(x) is equivalent to optimizing an objective criterion.
• Different methods use different objective criteria, e.g.:
  • Minimize information loss: represent the data as accurately as possible in the lower-dimensional space.
  • Maximize discriminatory information: enhance the class-discriminatory information in the lower-dimensional space.
Feature Extraction (contd.)
• Popular linear feature extraction methods:
  • Principal Component Analysis (PCA): seeks a projection that preserves as much information in the data as possible.
  • Linear Discriminant Analysis (LDA): seeks a projection that best discriminates the data.
  • Singular Value Decomposition (SVD): a widely used technique to decompose a matrix into several component matrices.
Principal Component Analysis (PCA)
PCA
PCA is a technique used to:
➢ reduce the dimensionality of the data set;
➢ identify new, meaningful underlying features;
➢ lose minimum information.
➢ PCA can also be used for denoising and data compression.
PCA
• PCA transforms a large set of variables into a smaller one that still contains most of the information.
• It does this by identifying linear combinations of the original variables that best explain the variance of all the variables.
• These linear combinations are called principal components.
PCA (contd.)
• PCA is a dimensionality-reduction technique that constructs relevant features/variables through
  • linear (linear PCA) or
  • non-linear (kernel PCA)
  combinations of the original variables.
Considering any two variables without PCA
[Figure-only slide]

PCA
[Figure-only slides]
PCA (contd.)
• This is done by projecting (dot product) the original data onto the reduced PCA space using the eigenvectors of the covariance/correlation matrix, aka the principal components (PCs).
• The resulting projected data are essentially linear combinations of the original data capturing most of the variance in the data.
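In symbols (standard notation, not taken verbatim from the slides): if the columns of $\mathbf{W}$ are the top-$k$ eigenvectors of the covariance matrix of the centered data $\mathbf{X}$, the projection is a single matrix product:

$$\mathbf{Y} = \mathbf{X}\,\mathbf{W}, \qquad \mathbf{X} \in \mathbb{R}^{n \times d},\; \mathbf{W} \in \mathbb{R}^{d \times k},\; \mathbf{Y} \in \mathbb{R}^{n \times k}$$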
• PCA yields a set of uncorrelated data living in the reduced PCA space, such that the first component (PC1) explains the most variance in the data, with each subsequent component explaining less.
Principal Component Analysis
[Figure-only slides]
Background for PCA
→ Variance & Covariance
→ Eigenvectors & Eigenvalues
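For reference, the two ingredients in standard notation (added here for completeness; the slides only name them):

$$\operatorname{cov}(x, y) = \frac{1}{n-1} \sum_{k=1}^{n} (x_k - \bar{x})(y_k - \bar{y}), \qquad \mathbf{C}\,\mathbf{v} = \lambda\,\mathbf{v}$$

where $\mathbf{C}$ is the covariance matrix, and each eigenvector $\mathbf{v}$ with eigenvalue $\lambda$ gives a principal direction whose explained variance is $\lambda$.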
Eigenvectors
[Diagram: eigenvectors with eigenvalues λ₁, λ₂]
Eigenvalues in PCA
• In data science and machine learning, eigenvalues are used in PCA for dimensionality reduction.
• The eigenvectors corresponding to the largest eigenvalues capture the most essential features of the data, helping reduce complexity while retaining important patterns.
PCA Steps
[Steps 1–5 are shown as figures on the original slides; a code sketch of the standard procedure follows below.]
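Since the step slides are figures only, here is a hedged sketch of the usual five steps (standardize, covariance matrix, eigendecomposition, sort and select components, project), written in plain NumPy; this follows the standard textbook procedure and is not code taken from the slides:

```python
import numpy as np

def pca(X, k):
    # Step 1: standardize the data (zero mean, unit variance per feature).
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # Step 2: compute the covariance matrix of the standardized data.
    C = np.cov(Z, rowvar=False)
    # Step 3: eigendecomposition of the (symmetric) covariance matrix.
    eigvals, eigvecs = np.linalg.eigh(C)
    # Step 4: sort eigenvectors by decreasing eigenvalue, keep the top k.
    order = np.argsort(eigvals)[::-1]
    W = eigvecs[:, order[:k]]
    # Step 5: project the data onto the principal components.
    return Z @ W, eigvals[order]

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5)) @ rng.normal(size=(5, 5))  # correlated toy data
Y, variances = pca(X, k=2)
print(Y.shape, np.round(variances / variances.sum(), 3))  # explained-variance ratios
```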
PCA
Things to keep in mind while implementing PCA:
• Scale the features before applying PCA.
• Your accuracy might drop (PCA discards some information).
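The scaling advice translates directly into a pipeline; a short sketch assuming scikit-learn:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

# Scaling first ensures no single feature dominates the variance PCA maximizes.
model = make_pipeline(StandardScaler(), PCA(n_components=5),
                      LogisticRegression(max_iter=5000))
print("CV accuracy:", cross_val_score(model, X, y, cv=5).mean().round(3))
```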
THANK YOU