Deep Learning Unit 3 Notes

This document discusses dimensionality reduction techniques including Linear Factor Models, PCA, LDA, and Auto Encoders, outlining their definitions, advantages, and disadvantages. It emphasizes the importance of reducing high-dimensional data for improved algorithm performance and visualization, while also addressing the limitations of these methods. Additionally, it introduces manifold learning as a solution for non-linear dimensionality reduction, detailing various algorithms like Isomap, Locally Linear Embeddings, and t-SNE.


UNIT III

DIMENSIONALITY REDUCTION

Syllabus: Linear (PCA, LDA) and manifolds, metric learning - Auto encoders and
dimensionality reduction in networks - Introduction to Convnet - Architectures –
AlexNet, VGG, Inception, ResNet - Training a Convnet: weights initialization, batch
normalization, hyperparameter optimization.

3.1 Linear Factor Models:


Linear factor models are used as building blocks of mixture models or of larger, deep probabilistic models. A linear factor model is defined by the use of a stochastic linear decoder function that generates x by adding noise to a linear transformation of h. It allows us to discover explanatory factors that have a simple joint distribution. A linear factor model describes the data-generation process as follows: first we sample the explanatory factors h from a distribution

h ∼ p(h)

and then we generate the observed variables by applying a linear transformation to h and adding noise:

x = Wh + b + noise

Figure 1: Linear Factor Model
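
A tiny NumPy sketch of this generative process (illustrative only, not from the notes; the factor dimension, the weight matrix W, the bias b and the Gaussian noise level are all assumptions):

import numpy as np

rng = np.random.default_rng(0)
n_factors, n_observed = 3, 5

W = rng.normal(size=(n_observed, n_factors))   # weights of the linear decoder
b = rng.normal(size=n_observed)                # bias term

h = rng.normal(size=n_factors)                 # sample the factors: h ~ p(h), here standard normal
noise = 0.1 * rng.normal(size=n_observed)      # additive Gaussian observation noise
x = W @ h + b + noise                          # x = Wh + b + noise
print(x)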

3.2 Dimensionality Reduction:


➢ High-dimensional data is challenging to work with and often redundant
➢ It is natural to try to reduce its dimensionality
➢ We reduce the dimensionality by feature combination, i.e., we combine the old features X
to create new features Y, as shown below

Figure 2: Dimensionality Reduction

3.3 Principal Component Analysis (PCA):
Principal Component Analysis, or simply PCA, is a statistical procedure
concerned with elucidating the covariance structure of a set of variables. In particular, it
allows us to identify the principal directions in which the data varies.
For example, in figure 3A, suppose that the triangles represent a two-variable data
set which we have measured in the X-Y coordinate system. The principal direction in
which the data varies is shown by the U axis, and the second most important
direction is the V axis, orthogonal to it. If we place the U-V axis system at the mean of
the data, it gives us a compact representation. If we transform each (X, Y) coordinate
into its corresponding (U, V) value, the data is de-correlated, meaning that the
covariance between the U and V variables is zero. For a given set of data, principal
component analysis finds the axis system defined by the principal directions of variance
(i.e., the U-V axis system in figure 3A). The directions U and V are called the principal
components.

Figure 3A: PCA for Data Representation Figure 3B: PCA Dimension Reduction
If the variation in a data set is caused by some natural property, or is caused by random
experimental error, then we may expect it to be normally distributed. In this case we
show the nominal extent of the normal distribution by a hyper-ellipse (the two-
dimensional ellipse in the example). The hyper-ellipse encloses data points that are
thought of as belonging to a class. It is drawn at a distance beyond which the probability
of a point belonging to the class is low, and can be thought of as a class boundary.
If the variation in the data is caused by some other relationship, then PCA gives
us a way of reducing the dimensionality of a data set. Consider two variables that are
nearly linearly related, as shown in figure 3B. As in figure 3A, the principal direction in
which the data varies is shown by the U axis, and the secondary direction by the V axis.
However, in this case all the V coordinates are very close to zero. We may assume,
for example, that they are only non-zero because of experimental noise. Thus, in the U-V
axis system we can represent the data set by the single variable U and discard V, and we
have reduced the dimensionality of the problem by 1.

3.3.1. Computing the Principal Components


In computational terms the principal components are found by calculating the
eigenvectors and eigenvalues of the data covariance matrix. This process is equivalent
to finding the axis system in which the covariance matrix is diagonal. The eigenvector
with the largest eigenvalue is the direction of greatest variation, the one with the second
largest eigenvalue is the (orthogonal) direction with the next highest variation, and so
on. To see how the computation is done we will give a brief review of eigenvectors and
eigenvalues. Let A be an n × n matrix. The eigenvalues of A are defined as the roots of:

det(A − λI) = |A − λI| = 0

where I is the n × n identity matrix. This equation is called the characteristic equation
(or characteristic polynomial) and has n roots.
Let λ be an eigenvalue of A. Then there exists a vector x such that:
Ax = λx

The vector x is called an eigenvector of A associated with the eigenvalue λ. Notice that
there is no unique solution for x in the above equation. It is a direction vector only and
can be scaled to any magnitude. To find a numerical solution for x we need to set one of
its elements to an arbitrary value, say 1, which gives us a set of simultaneous equations
to solve for the other elements. If there is no solution, we repeat the process with another
element. Ordinarily we normalize the final values so that x has length one, that is
xᵀx = 1. Suppose we have a 3 × 3 matrix A with eigenvectors x1, x2, x3 and eigenvalues
λ1, λ2, λ3, so that:

Ax1 = λ1x1    Ax2 = λ2x2    Ax3 = λ3x3

Putting the eigenvectors as the columns of a matrix, Φ = [x1 x2 x3], gives the matrix equation:

AΦ = ΦΛ

where Λ is the diagonal matrix of eigenvalues. We normalised the eigenvectors to unit
magnitude, and they are orthogonal, so:

ΦΦᵀ = ΦᵀΦ = I

which means that:

ΦᵀAΦ = Λ    and    A = ΦΛΦᵀ

Now let us consider how this applies to the covariance matrix in the PCA process. Let Σ
be an n × n covariance matrix. There is an orthogonal n × n matrix Φ whose columns are
eigenvectors of Σ, and a diagonal matrix Λ whose diagonal elements are the eigenvalues
of Σ, such that

ΦᵀΣΦ = Λ

We can look on the matrix of eigenvectors Φ as a linear transformation which, in the
example of figure 3A, transforms data points in the [X, Y] axis system into the [U, V]
axis system. In the general case the linear transformation given by Φ transforms the
data points into a data set where the variables are uncorrelated. The covariance matrix
of the data in the new coordinate system is Λ, which has zeros in all the off-diagonal
elements.
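
As a quick numerical check (a minimal NumPy sketch, not part of the original notes), we can verify that the matrix of eigenvectors Φ of a covariance matrix diagonalises it, i.e. ΦᵀΣΦ = Λ:

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))             # 500 observations of 3 variables
X[:, 1] = 0.8 * X[:, 0] + 0.1 * X[:, 1]   # introduce some correlation

Sigma = np.cov(X, rowvar=False)           # 3 x 3 covariance matrix
eigvals, Phi = np.linalg.eigh(Sigma)      # eigh: for symmetric matrices, ascending eigenvalues

Lambda = Phi.T @ Sigma @ Phi              # should be (numerically) diagonal
print(np.round(Lambda, 6))                # off-diagonal elements are ~0
print(eigvals)                            # equal to the diagonal of Lambda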

3.3.2 Steps involved in PCA:
o Start with data for n observations on p variables
o Form a matrix of size n × p
o Calculate the covariance matrix
o Calculate the eigenvectors and eigenvalues
o Choose the principal components and form a feature vector
o Derive the new data set
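
These steps translate directly into NumPy; a minimal sketch (illustrative only, the function and variable names are mine, not from the notes):

import numpy as np

def pca(X, k):
    # X is an (n x p) data matrix: n observations on p variables
    X_centered = X - X.mean(axis=0)              # centre each variable
    Sigma = np.cov(X_centered, rowvar=False)     # p x p covariance matrix
    eigvals, eigvecs = np.linalg.eigh(Sigma)     # eigenvalues/eigenvectors of the symmetric matrix
    order = np.argsort(eigvals)[::-1]            # sort by decreasing variance
    components = eigvecs[:, order[:k]]           # feature vector: top-k principal components
    return X_centered @ components               # derive the new (n x k) data set

# Example: reduce 5-dimensional data to its 2 principal components
X = np.random.default_rng(1).normal(size=(200, 5))
Y = pca(X, k=2)
print(Y.shape)                                   # (200, 2)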

3.3.3 PCA Advantages:


1. Removes Correlated Features:

In a real-world scenario, it is very common to get thousands of features in our dataset.
You cannot run your algorithm on all the features, as it will reduce the performance of
your algorithm, and it will not be easy to visualize that many features in any kind of
graph. Hence the data set should be reduced. We need to find out the correlation among
the features (correlated variables). Finding correlation manually in thousands of features
is nearly impossible, frustrating and time-consuming. PCA performs this task effectively.
After implementing PCA on your dataset, all the principal components are independent
of one another; there is no correlation among them.
2. Improves Algorithm Performance:

With so many features, the performance of your algorithm will drastically degrade.
PCA is a very common way to speed up a machine learning algorithm by getting rid of
correlated variables which don't contribute to any decision making. The training time of
the algorithms reduces significantly with a smaller number of features. So, if the input
dimensions are too high, then using PCA to speed up the algorithm is a reasonable
choice.
3. Reduces Overfitting:
Overfitting mainly occurs when there are too many variables in the dataset. So,
PCA helps in overcoming the overfitting issue by reducing the number of features.

4. Improves Visualization: It is hard to visualize and understand data in high
dimensions. PCA transforms high-dimensional data into a low-dimensional
representation (for example, two dimensions) so that it can be visualized easily.

3.3.4. Disadvantages of PCA

1. Independent variables become less interpretable: After implementing PCA on the
dataset, your original features will turn into principal components. Principal components
are linear combinations of your original features. Principal components are not as
readable and interpretable as the original features.

2. Data standardization is a must before PCA: You must standardize your data before
implementing PCA; otherwise PCA will not be able to find the optimal principal
components.

3. Information loss: Although principal components try to cover the maximum variance
among the features in a dataset, if we don't select the number of principal components
with care, we may miss some information compared to the original list of features.

3.4 Linear Discriminant Analysis (LDA):


Linear Discriminant Analysis, as its name suggests, is a linear model for classification
and dimensionality reduction. It is most commonly used for feature extraction in pattern
classification problems.

3.4.1 Need for LDA:

• Logistic regression performs well for binary classification but fails in the case of
multi-class classification problems with well-separated classes, while LDA handles these
quite efficiently.
• LDA can also be used in data pre-processing to reduce the number of features, just as
PCA does, which reduces the computing cost significantly.

3.4.2. Limitations:

• Linear decision boundaries may not effectively separate non-linearly separable classes.
More flexible boundaries are desired.
• In cases where the number of features exceeds the number of observations, LDA might
not perform as desired. This is called the Small Sample Size (SSS) problem; regularization
is required.

Linear Discriminant Analysis, also called Normal Discriminant Analysis or Discriminant
Function Analysis, is a dimensionality reduction technique that is commonly used for
supervised classification problems. It is used for modeling differences between groups, i.e.,
separating two or more classes. It projects features from a higher-dimensional space onto
a lower-dimensional space. For example, suppose we have two classes and we need to
separate them efficiently. Classes can have multiple features. Using only a single feature
to classify them may result in some overlapping, so we keep increasing the number of
features for proper classification.

3.4.3 Steps involved in LDA:


There are three key steps.
(i) Calculate the separability between different classes. This is also known as the between-
class variance and is defined as the distance between the means of the different classes.
(ii) Calculate the within-class variance. This is the distance between the mean and the
samples of each class.
(iii) Construct the lower-dimensional space that maximizes Step 1 (between-class variance)
and minimizes Step 2 (within-class variance).
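
A minimal scikit-learn sketch of LDA used for supervised dimensionality reduction (illustrative only; the Iris dataset and variable names are assumptions, not from the notes):

from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)          # 4 features, 3 classes

# LDA can project onto at most (number of classes - 1) = 2 discriminant axes here
lda = LinearDiscriminantAnalysis(n_components=2)
X_lda = lda.fit_transform(X, y)            # supervised: the class labels y are used

print(X_lda.shape)                         # (150, 2)
print(lda.explained_variance_ratio_)       # share of between-class variance per axis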

3.4.4. Pros & Cons of LDA


Advantages of LDA:

1. Simple prototype classifier: the distance to the class mean is used, so it is simple to interpret.

2. Decision boundary is linear: It’s simple to implement and the classification is robust.

3. Dimension reduction: It provides an informative low-dimensional view of the data,
which is useful both for visualization and for feature engineering.

Shortcomings of LDA:
1. Linear decision boundaries may not adequately separate the classes.
Support for more general boundaries is desired.

2. In a high-dimensional setting, LDA uses too many parameters. A regularized version
of LDA is desired.

3. Support for more complex prototype classification is desired.

3.5. Manifold Learning:

➢ Manifold learning for dimensionality reduction has recently gained much attention to
assist image processing tasks such as segmentation, registration, tracking, recognition,
and computational anatomy.
➢ The drawbacks of PCA in handling dimensionality reduction for non-linear, curved
surfaces necessitated the development of more advanced algorithms such as manifold
learning.
➢ There are different variants of manifold learning that solve the problem of reducing the
dimensions of data and feature sets obtained from real-world problems representing
uneven, curved surfaces.
➢ This kind of data representation selectively chooses data points from a low-
dimensional manifold that is embedded in a high-dimensional space, in an attempt to
generalize linear frameworks like PCA.
❖ Locally, a manifold looks like a flat, featureless space that behaves like Euclidean space.
Manifold learning problems are unsupervised: the algorithm learns the high-dimensional
structure of the data from the data itself, without the use of predetermined
classifications and without losing information about important characteristics of the
original variables.
❖ The goal of manifold-learning algorithms is to recover the original domain structure,
up to some scaling and rotation. The nonlinearity of these algorithms allows them to
reveal the domain structure even when the manifold is not linearly embedded.
❖ Manifold learning algorithms are divided into two categories:
➢ Global methods: allow high-dimensional data to be mapped from a high-dimensional
to a low-dimensional space such that global properties are preserved. Examples include
Multidimensional Scaling (MDS) and Isomap, covered in the following sections.
➢ Local methods: allow high-dimensional data to be mapped to a low-dimensional space
such that local properties are preserved. Examples are Locally Linear Embedding (LLE),
Laplacian Eigenmaps (LE), Local Tangent Space Alignment (LTSA) and Hessian
Eigenmapping (HLLE).
➢ Three popular manifold learning algorithms:
❑ IsoMap (Isometric Mapping)

Isomap seeks a lower-dimensional representation that maintains ‘geodesic distances’
between the points. A geodesic distance is a generalization of distance to curved
surfaces. Hence, instead of measuring distance as the straight-line Euclidean distance
given by the Pythagorean theorem, Isomap optimizes distances along the discovered
manifold.
❑ Locally Linear Embeddings
Locally Linear Embedding uses a collection of local, tangent linear patches to model a
manifold. It can be thought of as performing a PCA on each of these neighborhoods
locally to produce a linear hyperplane, then comparing the results globally to find the
best nonlinear embedding. The goal of LLE is to ‘unroll’ or ‘unpack’ the structure of the
data in a distorted fashion, so LLE embeddings often have a high density in the center
with extending rays.
❑ t-SNE
t-SNE, which stands for t-distributed Stochastic Neighbor Embedding, is one of the most
popular choices for high-dimensional visualization. The algorithm converts relationships
in the original space into t-distributions, i.e., normal-like distributions for small sample
sizes with relatively unknown standard deviations. This makes t-SNE very sensitive to
local structure, a common theme in manifold learning. It is considered the go-to
visualization method because of the many advantages it possesses.
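
All three algorithms are available in scikit-learn; a minimal comparison sketch (illustrative only, using a toy S-curve dataset as an assumption):

from sklearn.datasets import make_s_curve
from sklearn.manifold import Isomap, LocallyLinearEmbedding, TSNE

# 3-D points lying on a curved 2-D manifold (an S-shaped sheet)
X, color = make_s_curve(n_samples=1000, random_state=0)

embeddings = {
    "Isomap": Isomap(n_neighbors=10, n_components=2),
    "LLE": LocallyLinearEmbedding(n_neighbors=10, n_components=2),
    "t-SNE": TSNE(n_components=2, perplexity=30, random_state=0),
}

for name, model in embeddings.items():
    Y = model.fit_transform(X)       # unsupervised: no labels are used
    print(name, Y.shape)             # each method returns a (1000, 2) embedding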

3.6. Auto Encoders:

An Auto Encoder is an unsupervised artificial neural network that attempts to encode
the data by compressing it into lower dimensions (the bottleneck layer, or code) and then
decoding the data to reconstruct the original input. The bottleneck layer (or code) holds
the compressed representation of the input data. In an Auto Encoder the number of
output units must be equal to the number of input units, since we are attempting to
reconstruct the input data.
Auto Encoders usually consist of an encoder and a decoder. The encoder encodes the
provided data into a lower dimension, which is the size of the bottleneck layer, and the
decoder decodes the compressed data back into its original form. The number of neurons
in the layers of the encoder decreases as we move through further layers, whereas the
number of neurons in the layers of the decoder increases as we move through further
layers. There are three layers in the encoder and the decoder in the following example:
the encoder contains 32, 16 and 7 units in each layer respectively, and the decoder
contains 7, 16 and 32 units in each layer respectively. The code size, i.e. the number of
neurons in the bottleneck, must be less than the number of features in the data. Before
feeding the data into the Auto Encoder, the data must be scaled between 0 and 1 using a
min-max scaler, since we are going to use the sigmoid activation function in the output
layer, which outputs values between 0 and 1. When we use Auto Encoders for
dimensionality reduction, we extract the bottleneck layer and use it to reduce the
dimensions. This process can be viewed as feature extraction.
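
A minimal Keras sketch of the encoder/decoder described above (a sketch only: the input is assumed to have 64 features, the 32-16-7 layer sizes follow the example in the text, and the training data here is random stand-in data):

import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

n_features = 64                      # assumed input dimensionality; code size 7 < 64

# Encoder: 32 -> 16 -> 7 (bottleneck / code), Decoder: 16 -> 32 -> n_features
inputs = keras.Input(shape=(n_features,))
x = layers.Dense(32, activation="relu")(inputs)
x = layers.Dense(16, activation="relu")(x)
code = layers.Dense(7, activation="relu", name="bottleneck")(x)
x = layers.Dense(16, activation="relu")(code)
x = layers.Dense(32, activation="relu")(x)
outputs = layers.Dense(n_features, activation="sigmoid")(x)   # data scaled to [0, 1]

autoencoder = keras.Model(inputs, outputs)
autoencoder.compile(optimizer="adam", loss="mse")

# Train to reconstruct the (min-max scaled) input, then keep only the encoder
X = np.random.rand(1000, n_features).astype("float32")        # stand-in for scaled data
autoencoder.fit(X, X, epochs=5, batch_size=32, verbose=0)

encoder = keras.Model(inputs, code)
X_reduced = encoder.predict(X, verbose=0)                     # (1000, 7): the reduced features
print(X_reduced.shape)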

The type of Auto Encoder that we are using here is a deep Auto Encoder, where the
encoder and the decoder are symmetrical. Auto Encoders do not necessarily have a
symmetrical encoder and decoder; a non-symmetrical encoder and decoder can be used
as well. Types of Auto Encoders are:

• Deep Autoencoder
• Sparse Autoencoder
• Under complete Autoencoder
• Variational Autoencoder
• LSTM Autoencoder

3.6.2. Hyperparameters of an Auto Encoder

◆ Code size or the number of units in the bottleneck layer


◆ Input and output size, which is the number of features in the data
◆ Number of neurons or nodes per layer
◆ Number of layers in encoder and decoder.
◆ Activation function
◆ Optimization function

Figure 4: Auto Encoders

3.7. AlexNet:

Figure 5: Alexnet Architecture

The AlexNet model was proposed in 2012 in the research paper named "ImageNet
Classification with Deep Convolutional Neural Networks" by Alex Krizhevsky and his
colleagues.

➢ AlexNet has eight layers with learnable parameters
➢ The model has five convolution layers, used in combination with max pooling,
followed by 3 fully connected layers
➢ The fully connected layers use the ReLU activation, except the output layer
➢ They found that using ReLU as the activation function accelerated the training
process by almost six times
➢ They also used dropout layers, which prevented the model from overfitting
➢ The model is trained on the ImageNet dataset. The ImageNet dataset has around 14
million images across 1000 classes
➢ The input to this model is images of size 227X227X3
➢ The first convolution layer has 96 filters of size 11X11 with stride 4
➢ The activation function used in this layer is ReLU. The output feature map is 55X55X96
➢ Next, we have the first max-pooling layer, of size 3X3 and stride 2
➢ Next, the filter size is reduced to 5X5 and 256 such filters are added.
The stride value is 1 and the padding is 2. The activation function used is again ReLU. The
output size we get is 27X27X256
➢ Next we have a max-pooling layer of size 3X3 with stride 2. The resulting feature
map size is 13X13X256
➢ The third convolution operation, with 384 filters of size 3X3, stride 1 and padding 1,
is done next. In this stage the activation function used is ReLU. The output feature map
is of shape 13X13X384

➢ Then comes the fourth convolution operation with 384 filters of size 3X3. The stride value
along with the padding is 1. The output size remains unchanged at 13X13X384
➢ After this, we have the final convolution layer of size 3X3 with 256 such filters. The
stride and padding are set to 1, and the activation function is ReLU. The resulting
feature map is of shape 13X13X256

If we look at the architecture now, the number of filters increases as we go deeper,
hence more features are extracted as we move deeper into the architecture.
Also, the filter size reduces, and the spatial size of the feature maps decreases as we move deeper.
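
Putting the layer-by-layer description together, here is a hedged Keras sketch of an AlexNet-style network (the 4096-unit fully connected layers and the 0.5 dropout rate follow the original paper rather than these notes, and the padding of 2 on the 5X5 layer is expressed as 'same' padding):

from tensorflow import keras
from tensorflow.keras import layers

# Conv/pool stack as described above: 96@11x11/4, pool, 256@5x5, pool,
# 384@3x3, 384@3x3, 256@3x3, pool, then three fully connected layers.
model = keras.Sequential([
    keras.Input(shape=(227, 227, 3)),
    layers.Conv2D(96, 11, strides=4, activation="relu"),       # -> 55x55x96
    layers.MaxPooling2D(pool_size=3, strides=2),               # -> 27x27x96
    layers.Conv2D(256, 5, padding="same", activation="relu"),  # -> 27x27x256
    layers.MaxPooling2D(pool_size=3, strides=2),               # -> 13x13x256
    layers.Conv2D(384, 3, padding="same", activation="relu"),  # -> 13x13x384
    layers.Conv2D(384, 3, padding="same", activation="relu"),  # -> 13x13x384
    layers.Conv2D(256, 3, padding="same", activation="relu"),  # -> 13x13x256
    layers.MaxPooling2D(pool_size=3, strides=2),               # -> 6x6x256
    layers.Flatten(),
    layers.Dense(4096, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(4096, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(1000, activation="softmax"),                  # 1000 ImageNet classes
])
model.summary()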

3.8. VGG-16
➢ The major shortcoming of AlexNet, too many hyper-parameters, was solved by VGG
Net by replacing the large kernel-sized filters (11 and 5 in the first and second convolution
layers, respectively) with multiple 3×3 kernel-sized filters one after another.
➢ The architecture developed by Simonyan and Zisserman was the 1st runner up of the
Visual Recognition Challenge of 2014.
➢ The architecture consists of 3*3 convolutional filters with a stride of 1 and 2*2 max
pooling layers with a stride of 2.
➢ Padding is kept same to preserve the dimension.
➢ There are 16 layers in the network. The input image is in RGB format with dimensions
of 224*224*3, and it passes through 5 blocks of convolution layers (with 64, 128, 256, 512
and 512 filters respectively), each followed by max pooling.
➢ The output of these layers is fed into three fully connected layers and a SoftMax
function in the output layer.
➢ In total there are 138 million parameters in VGG Net
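
Because the architecture is standardized, a pretrained VGG-16 can be loaded directly from Keras; a short sketch (assuming TensorFlow/Keras is installed and the ImageNet weights can be downloaded):

from tensorflow.keras.applications import VGG16

# Load VGG-16 with ImageNet weights; it expects 224x224x3 RGB inputs
model = VGG16(weights="imagenet", include_top=True)
model.summary()                    # 16 weight layers, roughly 138 million parameters

# For feature extraction / transfer learning, drop the fully connected head
feature_extractor = VGG16(weights="imagenet", include_top=False,
                          input_shape=(224, 224, 3))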

Figure 6: VGG Architecture

3.9 ResNet:

ResNet, the winner of the ILSVRC-2015 competition, is a deep network with over 100 layers.
Residual networks (ResNets) follow a sequential approach similar to VGG nets; however,
they also use "skip connections" and batch normalization, which help to train deep layers
without hampering performance. After VGG nets, as CNNs were getting deeper, it was
becoming hard to train them because of the vanishing gradient problem, which makes the
derivatives extremely small; as a result, the overall performance saturates or even
degrades. The idea of skip connections came from highway networks, where gated
shortcut connections were used.
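
A hedged Keras sketch of a single residual block with a skip connection and batch normalization (an illustrative building block, not the full network; the filter count and kernel size are assumptions):

from tensorflow import keras
from tensorflow.keras import layers

def residual_block(x, filters=64):
    # y = F(x) + x : the shortcut lets gradients bypass the convolutions
    shortcut = x
    y = layers.Conv2D(filters, 3, padding="same")(x)
    y = layers.BatchNormalization()(y)
    y = layers.Activation("relu")(y)
    y = layers.Conv2D(filters, 3, padding="same")(y)
    y = layers.BatchNormalization()(y)
    y = layers.Add()([y, shortcut])          # skip connection
    return layers.Activation("relu")(y)

inputs = keras.Input(shape=(56, 56, 64))
outputs = residual_block(inputs)
keras.Model(inputs, outputs).summary()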

3.10 Inception Net:

Figure 7: Inception Net

The Inception network, also known as GoogLeNet, was proposed by developers at Google
in "Going Deeper with Convolutions" in 2014. The motivation for Inception Net comes
from the presence of sparse features: salient parts of an image can have a large
variation in size. Because of this, selecting the right kernel size becomes extremely
difficult, as big kernels are suited to global features and small kernels to locally located
features. Inception Net resolves this by stacking multiple kernels at the same level;
typically, it uses 5*5, 3*3 and 1*1 filters in one go.
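
A hedged Keras sketch of a single inception module that applies 1*1, 3*3 and 5*5 filters in parallel at the same level and concatenates the results (the filter counts are assumptions; the real GoogLeNet module also adds 1*1 bottleneck convolutions and a pooling branch):

from tensorflow import keras
from tensorflow.keras import layers

def inception_module(x, f1=64, f3=128, f5=32):
    # Run 1x1, 3x3 and 5x5 convolutions in parallel and stack their feature maps
    branch1 = layers.Conv2D(f1, 1, padding="same", activation="relu")(x)
    branch3 = layers.Conv2D(f3, 3, padding="same", activation="relu")(x)
    branch5 = layers.Conv2D(f5, 5, padding="same", activation="relu")(x)
    return layers.Concatenate(axis=-1)([branch1, branch3, branch5])

inputs = keras.Input(shape=(28, 28, 192))
outputs = inception_module(inputs)           # -> (28, 28, 64 + 128 + 32)
keras.Model(inputs, outputs).summary()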

3.11. Hyperparameter Optimization:


Hyperparameter optimization in machine learning aims to find the hyperparameters
of a given machine learning algorithm that deliver the best performance as measured
on a validation set. Hyperparameters, in contrast to model parameters, are set by the
machine learning engineer before training: the number of trees in a random forest is a
hyperparameter, while the weights in a neural network are model parameters learned
during training. Hyperparameter optimization finds a combination of hyperparameters
that returns an optimal model, one which reduces a predefined loss function and in turn
increases the accuracy on given independent data.

Hyperparameter Optimization methods

➢ Manual Hyperparameter Tuning


➢ Grid Search
➢ Random Search
➢ Bayesian Optimization
➢ Gradient-based Optimization
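
A small scikit-learn sketch contrasting grid search and random search (illustrative only; the model, the parameter ranges and the dataset are assumptions):

from scipy.stats import randint
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

X, y = load_digits(return_X_y=True)

# Grid search: exhaustively tries every combination in the grid
grid = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [50, 100, 200], "max_depth": [5, 10, None]},
    cv=3,
)
grid.fit(X, y)
print("grid search best:", grid.best_params_, grid.best_score_)

# Random search: samples a fixed number of random combinations
rand = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions={"n_estimators": randint(50, 300), "max_depth": randint(3, 20)},
    n_iter=10, cv=3, random_state=0,
)
rand.fit(X, y)
print("random search best:", rand.best_params_, rand.best_score_)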

