UT-1-Machine Learning Lecture Notes-1

Syllabus

Unit No 01: Introduction to Machine Learning

Introduction to Machine Learning, comparison of Machine Learning with traditional programming, ML vs AI vs Data Science.
Types of learning: supervised, unsupervised, semi-supervised, and reinforcement learning techniques.
Models of Machine Learning: geometric models, probabilistic models, logical models, grouping and grading models, parametric and non-parametric models.
Features: feature transformation, etc.


Machine Learning

● Learning: The ability to improve behaviour based on experience is called learning.
● Machine: A mechanically, electrically, or electronically operated device for performing a task is a machine.
● Machine Learning: Machine learning explores algorithms that learn from (or build models from) data; those models are then used for prediction, decision making, and solving tasks.

Definition: A computer program is said to learn from experience E (data) with respect to some class of tasks T (prediction, classification, etc.) and performance measure P if its performance on tasks in T, as measured by P, improves with experience E.

● Machine Learning is a subset of Artificial Intelligence which focuses mainly on machines learning from their experience and making predictions based on that experience.
● It enables computers or machines to make data-driven decisions rather than being explicitly programmed to carry out a certain task.
● These programs or algorithms are designed in such a way that they learn and improve over time as they are exposed to new data.
Flow of Machine Learning

● A Machine Learning algorithm is trained using a training data set to create a model.
● When new input data is introduced to the ML algorithm, it makes a prediction on the basis of the model.
● The prediction is evaluated for accuracy, and if the accuracy is acceptable, the Machine Learning algorithm is deployed.
● If the accuracy is not acceptable, the Machine Learning algorithm is trained again and again with an augmented training data set.
Comparison of Machine Learning with Traditional Programming

● Traditional programming is a manual process, meaning a person (a programmer) creates the program by manually formulating, or coding, the rules.
● We have the input data, and a programmer writes a program that uses that data and runs on a computer to produce the desired output.
● In Machine Learning, on the other hand, the input data and the output are fed to an algorithm to create a program.
● In traditional programming one has to manually formulate/code the rules, while in Machine Learning the algorithm automatically formulates the rules from the data, which is very powerful.
● If traditional programming is automation, then machine learning is automating the process of automation. A minimal sketch contrasting the two approaches follows.
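To make the contrast concrete, here is a minimal sketch (assuming scikit-learn is installed; the "is it hot?" task and its data are invented for illustration). The traditional program encodes the rule by hand, while the ML version learns an equivalent rule from input/output examples.

```python
# Hypothetical toy task: decide whether a temperature reading is "hot".
from sklearn.tree import DecisionTreeClassifier

# Traditional programming: the programmer hand-codes the rule.
def is_hot_traditional(temp_c):
    return temp_c >= 30  # the threshold is fixed by a human

# Machine Learning: input data and desired outputs are fed to an
# algorithm, which formulates the rule (a learned threshold) itself.
X = [[10], [18], [25], [29], [31], [35], [40]]      # input data
y = [False, False, False, False, True, True, True]  # desired outputs

model = DecisionTreeClassifier().fit(X, y)          # "creates the program"
print(is_hot_traditional(33), model.predict([[33]])[0])  # True True
```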
(Figures: diagrams comparing Machine Learning with traditional programming.)

ML vs AI vs Data Science

Data Science:
● Based on strict analytical evidence
● Deals with structured & unstructured data
● Includes various data operations
Artificial Intelligence:
● Imparts human intellect to machines
● Uses logic and decision trees
● Includes Machine Learning
Machine Learning:
● Subset of AI
● Uses statistical models
● Machines improve with experience

Data Science vs. Artificial Intelligence

● Data science deals with pre-processing, analyzing, visualizing, and predicting from data, whereas AI implements predictive models to forecast future events.
● Data science relies on statistical techniques, while AI leverages computer algorithms.
● Data science uses many more tools than AI, because there are multiple steps for analyzing data and extracting insights from it.
● In data science, the focus remains on building models that use statistical insights, whereas in AI the aim is to build models that can emulate human intelligence.
● Data science strives to find hidden patterns in raw and unstructured data, while AI is about assigning autonomy to data models.
Data Science vs. Machine Learning

● To be precise, Machine Learning fits within the purview of data science.
● The main difference between data science and machine learning lies in the fact that data science is much broader in scope: while it focuses on algorithms and statistics (like machine learning), it also deals with the entire data-processing pipeline.
● Data science is essentially used to extract insights from data, while machine learning is about the techniques that data scientists use so that machines learn from data.
● Data science actually banks on tools such as machine learning and data analytics.
Artificial Intelligence vs. Machine Learning

● Artificial intelligence essentially makes machines act out human intelligence, while ML deals with learning from past data without being explicitly programmed.
● AI focuses on making systems that can solve complex problems, while ML aims to make machines learn from available data and generate accurate outputs.
● AI works towards maximizing the chances of success, while ML is concerned with understanding patterns and giving accurate results.
● AI involves the processes of learning, reasoning, and self-correction, while ML deals with learning and self-correction only when introduced to new data.
● Artificial Intelligence deals with structured, unstructured, and semi-structured data, while Machine Learning deals only with structured and semi-structured data.
Types of Learning

● As with any method, there are different ways to train machine learning algorithms, each with its own advantages and disadvantages.
● In ML, there are two kinds of data: labeled data and unlabeled data.
● Labeled data has both the input and output parameters in a completely machine-readable pattern, but requires a lot of human labour to label the data to begin with.
● Unlabeled data has only one or none of the parameters in a machine-readable form. This negates the need for human labour, but requires more complex solutions.
● There are also some types of machine learning algorithms that are used in very specific use-cases, but three main methods are used today.
1. Supervised Machine Learning

● Supervised learning is one of the most basic types of machine learning.
● In this type, the machine learning algorithm is trained on labeled data.
● Even though the data needs to be labeled accurately for this method to work, supervised learning is extremely powerful when used in the right circumstances.
● In supervised learning, the ML algorithm is given a small training dataset to work with.
● This training dataset is a smaller part of the bigger dataset and serves to give the algorithm a basic idea of the problem, solution, and data points to be dealt with.
● The training dataset is also very similar to the final dataset in its characteristics and provides the algorithm with the labeled parameters required for the problem.
● The algorithm then finds relationships between the parameters given, essentially establishing a cause-and-effect relationship between the variables in the dataset.
● At the end of the training, the algorithm has an idea of how the data works and of the relationship between the input and the output.
● In supervised learning, learning data comes with descriptions, labels, targets, or desired outputs, and the objective is to find a general rule that maps inputs to outputs.
● This kind of learning data is called labeled data.
● The learned rule is then used to label new data with unknown outputs.
● Supervised learning involves building a machine learning model that is based on labeled samples.
● For example, if we build a system to estimate the price of a plot of land or a house based on various features, such as size, location, and so on, we first need to create a database and label it. We need to teach the algorithm what features correspond to what prices. Based on this data, the algorithm will learn how to calculate the price of real estate using the values of the input features.
● Supervised learning is commonly used in real-world applications, such as face and speech recognition, product or movie recommendations, and sales forecasting.
● Supervised learning deals with learning a function from available training data: a learning algorithm analyzes the training data and produces a derived function that can be used for mapping new examples.
● Supervised learning can be further classified into two types: Regression and Classification.
● Regression trains on and predicts a continuous-valued response, for example predicting real estate prices.
● When the output Y is discrete-valued it is classification, and when Y is continuous it is regression.
a) Regression

● Regression algorithms are used if there is a relationship between the input variable and the output variable.
● They are used for the prediction of continuous variables, such as weather forecasting, market trends, etc. Popular regression algorithms include (a sketch follows this list):
i) Linear Regression
ii) Regression Trees
iii) Non-Linear Regression
iv) Bayesian Linear Regression
v) Polynomial Regression
vi) Logistic Regression
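A minimal linear-regression sketch with scikit-learn (the house-size/price numbers are invented for illustration):

```python
from sklearn.linear_model import LinearRegression

# Invented toy data: house size in square feet -> price in lakhs.
X = [[500], [750], [1000], [1250], [1500]]
y = [25, 37, 50, 62, 75]

model = LinearRegression().fit(X, y)     # learn m and c in f(x) = mx + c
print(model.coef_[0], model.intercept_)  # learned slope and intercept
print(model.predict([[1100]]))           # predict a continuous value
```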
b) Classification

● Classification attempts to find the appropriate class label, such as analyzing positive/negative sentiment, male and female persons, secure and unsecure loans, etc.
● Classification algorithms are used when the output variable is categorical, which means there are discrete classes such as Yes-No, Male-Female, True-False, etc.
● Common examples of supervised classification include classifying emails into spam and not-spam categories, labeling web pages based on their content, and voice recognition.
Popular classification algorithms include (a sketch follows this list):
i) Decision Trees
ii) Random Forest
iii) Support Vector Machines
iv) Neural Networks
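A minimal classification sketch, again with scikit-learn (the spam-detection features and labels are invented for illustration):

```python
from sklearn.ensemble import RandomForestClassifier

# Invented toy features per email: [num_links, num_spam_words, length_kb]
X = [[12, 9, 1], [0, 0, 4], [7, 5, 1], [1, 0, 6], [9, 8, 2], [0, 1, 5]]
y = ["spam", "not-spam", "spam", "not-spam", "spam", "not-spam"]

clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
print(clf.predict([[10, 7, 1]]))  # -> ['spam'] : a discrete class label
```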
2. Unsupervised Machine Learning

● Unsupervised machine learning holds the advantage of being able to work with unlabeled data.
● This means that human labour is not required to make the dataset machine-readable, allowing much larger datasets to be worked on by the program.
● In supervised learning, the labels allow the algorithm to find the exact nature of the relationship between any two data points.
● Unsupervised learning, however, does not have labels to work off of, resulting in the creation of hidden structures.
● Relationships between data points are perceived by the algorithm in an abstract manner, with no input required from human beings.
● The creation of these hidden structures is what makes unsupervised learning algorithms versatile.
● Instead of a defined and set problem statement, unsupervised learning algorithms can adapt to the data by dynamically changing hidden structures.
● This offers more post-deployment development than supervised learning algorithms.
● Unsupervised learning is used to detect anomalies and outliers, such as fraud or defective equipment, or to group customers with similar behaviours for a sales campaign.
● It is the opposite of supervised learning: there is no labeled data here.
● When learning data contains only some indications without any description or labels, it is up to the coder or to the algorithm to find the structure of the underlying data, to discover hidden patterns, or to determine how to describe the data. This kind of learning data is called unlabeled data.
● Suppose that we have a number of data points and we want to classify them into several groups. We may not know exactly what the criteria of classification would be.
● So, an unsupervised learning algorithm tries to classify the given dataset into a certain number of groups in an optimum way.
● Unsupervised learning algorithms are extremely powerful tools for analyzing data and for identifying patterns and trends.
● They are most commonly used for clustering similar input into logical groups.
● Unsupervised learning has two types: Clustering and Association.
a) Clustering:
● Clustering is a method of grouping objects into clusters such that objects with the most similarities remain in one group and have few or no similarities with the objects of another group.
● Cluster analysis finds the commonalities between the data objects and categorizes them as per the presence and absence of those commonalities.

b) Association:
● An association rule is an unsupervised learning method which is used for finding relationships between variables in a large database.
● It determines the set of items that occur together in the dataset.
● Association rules make marketing strategy more effective; for example, people who buy item X (say, bread) also tend to purchase item Y (butter/jam).
● A typical example of an association rule is Market Basket Analysis. A minimal clustering sketch follows.
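A minimal clustering sketch with scikit-learn's KMeans (the 2-D points are invented for illustration):

```python
from sklearn.cluster import KMeans

# Invented 2-D points forming two loose groups.
X = [[1, 2], [1, 4], [2, 3], [8, 8], [9, 9], [8, 10]]

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)           # cluster index assigned to each point
print(km.cluster_centers_)  # centroids: the "exemplars" of each group
```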
Some popular unsupervised learning algorithms:

● K-means clustering
● KNN (K-Nearest Neighbours)
● Hierarchical clustering
Supervised vs. Unsupervised Learning

1. Supervised learning algorithms are trained using labeled data. Unsupervised learning algorithms are trained using unlabeled data.
2. A supervised learning model takes direct feedback to check whether it is predicting the correct output or not. An unsupervised learning model does not take any feedback.
3. A supervised learning model predicts the output. An unsupervised learning model finds the hidden patterns in data.
4. In supervised learning, input data is provided to the model along with the output. In unsupervised learning, only input data is provided to the model.
5. The goal of supervised learning is to train the model so that it can predict the output when it is given new data. The goal of unsupervised learning is to find the hidden patterns and useful insights from the unknown dataset.
3. Semi-Supervised Machine Learning

● The most basic disadvantage of any supervised learning algorithm is that the dataset has to be hand-labeled, either by a Machine Learning Engineer or a Data Scientist. This is a very costly process, especially when dealing with large volumes of data. The most basic disadvantage of any unsupervised learning is that its application spectrum is limited.
● To counter these disadvantages, the concept of Semi-Supervised Learning was introduced.
● It is partly supervised and partly unsupervised.
● If some learning samples are labeled but some others are not, then it is semi-supervised learning.
● It makes use of a small amount of labeled data together with a large amount of unlabeled data for training.
● Semi-supervised learning is applied in cases where it is expensive to acquire a fully labeled dataset, while it is more practical to label a small subset.
● An analogy: supervised learning is where a student is under the supervision of a teacher at both home and school; unsupervised learning is where a student has to figure out a concept himself; and semi-supervised learning is where a teacher teaches a few concepts in class and gives questions as homework which are based on similar concepts.
4. Reinforcement Learning

● Reinforcement learning directly takes inspiration from how human beings learn from data in their lives.
● It features an algorithm that improves upon itself and learns from new situations using a trial-and-error method.
● Favourable outputs are encouraged or 'reinforced', and non-favourable outputs are discouraged or 'punished'.
● Based on the psychological concept of conditioning, reinforcement learning works by putting the algorithm in a work environment with an interpreter and a reward system.
● In every iteration of the algorithm, the output result is given to the interpreter, which decides whether the outcome is favourable or not.
● If the program finds the correct solution, the interpreter reinforces the solution by providing a reward to the algorithm.
● If the outcome is not favourable, the algorithm is forced to reiterate until it finds a better result.
● In most cases, the reward system is directly tied to the effectiveness of the result.
● In typical reinforcement learning use-cases, such as finding the shortest route between two points on a map, the solution is not an absolute value.
● Instead, it takes on a score of effectiveness, expressed as a percentage value.
● The higher this percentage value is, the more reward is given to the algorithm.
● Thus, the program is trained to give the best possible solution for the best possible reward.
● Here the learning data gives feedback so that the system adjusts to dynamic conditions in order to achieve a certain objective. The system evaluates its performance based on the feedback responses and reacts accordingly.
● The best-known instances include self-driving cars and the Go-playing algorithm AlphaGo.
● There are two important learning models in reinforcement learning: the Markov Decision Process and Q-learning. A minimal Q-learning sketch follows.
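A minimal tabular Q-learning sketch (the corridor environment, reward, and hyperparameters are invented for illustration; this is not a full MDP treatment):

```python
import random

# Invented toy environment: a 5-state corridor, states 0..4,
# actions 0=left / 1=right, reward 1 for reaching the goal state 4.
N_STATES, GOAL = 5, 4
alpha, gamma = 0.5, 0.9                    # learning rate, discount factor
Q = [[0.0, 0.0] for _ in range(N_STATES)]  # the Q-table

for _ in range(300):                       # episodes of trial and error
    s = 0
    while s != GOAL:
        a = random.choice((0, 1))          # explore: try an action
        s2 = max(0, s - 1) if a == 0 else s + 1
        r = 1.0 if s2 == GOAL else 0.0     # the "interpreter" grants a reward
        # Q-learning update: move Q[s][a] toward reward + discounted future value
        Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
        s = s2

# The learned greedy policy chooses "right" (action 1) in every state.
print([Q[s].index(max(Q[s])) for s in range(GOAL)])  # -> [1, 1, 1, 1]
```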
Models of Machine Learning

1. Geometric models
2. Probabilistic models
3. Logical models
4. Grouping and grading models
5. Parametric and non-parametric models
1. Geometric Models

● In geometric models, features can be described as points in two dimensions (x- and y-axes) or in a three-dimensional space (x, y, and z).
● Even when features are not geometric by nature, they can be modeled in a geometric manner (for example, temperature as a function of time can be modeled on two axes).
● In geometric models, there are two ways we can impose similarity.
● We can use geometric concepts like lines or planes to segment (classify) the instance space. These are called Linear models.
● Alternatively, we can use the geometric notion of distance to represent similarity. In this case, if two points are close together, they have similar values for their features and thus can be classed as similar. We call such models Distance-based models.
a) Linear Models
● Linear models are relatively simple. In this case, the function is represented as a linear combination of its inputs.
● Thus, if x1 and x2 are two scalars or vectors of the same dimension and a and b are arbitrary scalars, then ax1 + bx2 represents a linear combination of x1 and x2.
● In the simplest case, where f(x) represents a straight line, we have an equation of the form f(x) = mx + c, where c represents the intercept and m represents the slope.
● Linear models are parametric, which means that they have a fixed form with a small number of numeric parameters that need to be learned from data.
● For example, in f(x) = mx + c, m and c are the parameters that we are trying to learn from the data.
● This technique is different from tree or rule models, where the structure of the model (e.g., which features to use in the tree, and where) is not fixed in advance.
● Linear models are stable, i.e., small variations in the training data have only a limited impact on the learned model.
● In contrast, tree models tend to vary more with the training data, as the choice of a different split at the root of the tree typically means that the rest of the tree is different as well.
● As a result of having relatively few parameters, linear models have low variance and high bias.
● This implies that linear models are less likely to overfit the training data than some other models. However, they are more likely to underfit.
● For example, if we want to learn the boundaries between countries based on labeled data, then linear models are not likely to give a good approximation.

b) Distance-based Models
● Distance-based models are the second class of geometric models.
● Like linear models, distance-based models are based on the geometry of the data.
● As the name implies, distance-based models work on the concept of distance.
● In the context of machine learning, the concept of distance is not based merely on the physical distance between two points.
● Instead, we could think of the distance between two points as depending on the mode of transport between them.
● Travelling between two cities by plane covers less distance than by train, because the plane's path is unrestricted.
● Similarly, in chess, the concept of distance depends on the piece used; for example, a Bishop can move diagonally.
● Thus, depending on the entity and the mode of travel, the concept of distance can be experienced differently.
● The distance metrics commonly used are Euclidean, Minkowski, Manhattan, and Mahalanobis.
● Distance is applied through the concepts of neighbours and exemplars.
● Neighbours are points in proximity with respect to the distance measure expressed through exemplars.
● Exemplars are either centroids, which find a centre of mass according to a chosen distance metric, or medoids, which find the most centrally located data point.
● The most commonly used centroid is the arithmetic mean, which minimizes squared Euclidean distance to all other points.
● The algorithms under the geometric model include KNN, Linear Regression, SVM, Logistic Regression, etc. A small distance-metric sketch follows.
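A minimal sketch of the common distance metrics in plain Python (the two sample points are invented):

```python
import math

def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def manhattan(p, q):
    return sum(abs(a - b) for a, b in zip(p, q))

def minkowski(p, q, r):
    # r = 1 gives Manhattan, r = 2 gives Euclidean
    return sum(abs(a - b) ** r for a, b in zip(p, q)) ** (1 / r)

p, q = (1, 2), (4, 6)
print(euclidean(p, q), manhattan(p, q), minkowski(p, q, 2))  # 5.0 7 5.0
```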
2. Probabilistic Models

● The third family of machine learning algorithms is the probabilistic models.
● The k-nearest neighbour algorithm uses the idea of distance (e.g., Euclidean distance) to classify entities, and logical models use a logical expression to partition the instance space.
● Probabilistic models, by contrast, use the idea of probability to classify new entities.
● Probabilistic models see features and target variables as random variables.
● The process of modeling represents and manipulates the level of uncertainty with respect to these variables.
● There are two types of probabilistic models: predictive and generative.
● Predictive probability models use the idea of a conditional probability distribution P(Y | X), from which Y can be predicted from X.
● Generative models estimate the joint distribution P(Y, X). Once we know the joint distribution, we can derive any conditional or marginal distribution involving the same variables.
● Thus, a generative model is capable of creating new data points and their labels, knowing the joint probability distribution.
● The joint distribution looks for a relationship between two variables.
● Once this relationship is inferred, it is possible to infer new data points.
● The algorithms under probabilistic models include Naïve Bayes, Gaussian Process Regression, etc.
Naïve Bayes is an example of a probabilistic classifier.

● The goal of any probabilistic classifier is, given a set of features (x_0 through x_n) and a set of classes (c_0 through c_k), to determine the probability of the features occurring in each class and to return the most likely class.
● Therefore, for each class, we need to calculate P(c_i | x_0, …, x_n).
● We can do this using Bayes' rule: P(c_i | x_0, …, x_n) = P(x_0, …, x_n | c_i) · P(c_i) / P(x_0, …, x_n).
● The Naïve Bayes algorithm is based on the idea of conditional probability.
● Conditional probability is the probability that something will happen, given that something else has already happened.
● The task of the algorithm is then to look at the evidence and to determine the likelihood of each class, returning the most likely one. A minimal sketch follows.
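A minimal Naïve Bayes sketch with scikit-learn's GaussianNB (the toy data is invented for illustration):

```python
from sklearn.naive_bayes import GaussianNB

# Invented toy data: [height_cm, weight_kg] -> class label.
X = [[180, 80], [175, 77], [160, 55], [158, 52], [182, 85], [163, 57]]
y = ["male", "male", "female", "female", "male", "female"]

nb = GaussianNB().fit(X, y)
print(nb.predict([[170, 68]]))        # most likely class
print(nb.predict_proba([[170, 68]]))  # P(c_i | features) for each class
```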
3. Logical Models

● Logical models use a logical expression to divide the instance space into segments and hence construct grouping models.
● A logical expression is an expression that returns a Boolean value, i.e., a True or False outcome.
● Once the data is grouped using a logical expression, the data is divided into homogeneous groupings for the problem we are trying to solve.
● For example, for a classification problem, all the instances in a group belong to one class.
● There are mainly two kinds of logical models: tree models and rule models.
● Rule models consist of a collection of implications or IF-THEN rules. For tree-based models, the 'if-part' defines a segment and the 'then-part' defines the behaviour of the model for this segment. Rule models follow the same reasoning.
● Tree models can be seen as a particular type of rule model where the if-parts of the rules are organized in a tree structure.
● Both tree models and rule models use the same approach to supervised learning. The approach can be summarized in two strategies:
a) We could first find the body of the rule (the concept) that covers a sufficiently homogeneous set of examples and then find a label to represent the body.
b) Alternatively, we could approach it from the other direction, i.e., first select a class we want to learn and then find rules that cover examples of that class.
● A simple tree-based model is the well-known Titanic survival tree (the diagram is not reproduced here).
● The tree shows survival numbers of passengers on the Titanic ("sibsp" is the number of spouses or siblings aboard).
● The values under the leaves show the probability of survival and the percentage of observations in the leaf.
● The model can be summarized as: your chances of survival were good if you were (i) a female or (ii) a male younger than 9.5 years with fewer than 2.5 siblings.
● To understand logical models further, we need to understand the idea of Concept Learning. Concept Learning involves learning logical expressions or concepts from examples.
● The idea of Concept Learning fits in well with the idea of machine learning, i.e., inferring a general function from specific training examples.
● Concept learning forms the basis of both tree-based and rule-based models.
● More formally, Concept Learning involves acquiring the definition of a general category from a given set of positive and negative training examples of the category.
● A formal definition of Concept Learning is "the inferring of a Boolean-valued function from training examples of its input and output."
● In concept learning, we only learn a description for the positive class and label everything that doesn't satisfy that description as negative.
● The algorithms under logical models include Decision Tree, Random Forest, etc. A small decision-tree sketch follows.
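A minimal decision-tree sketch with scikit-learn (the Titanic-like rows are invented for illustration):

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Invented toy rows: [is_male, age, sibsp] -> survived (1) or not (0).
X = [[1, 40, 0], [0, 30, 1], [1, 8, 1], [1, 8, 4], [0, 22, 0], [1, 35, 2]]
y = [0, 1, 1, 0, 1, 0]

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
# The learned IF-THEN structure: if-parts segment the instance space,
# then-parts give the predicted class for each segment.
print(export_text(tree, feature_names=["is_male", "age", "sibsp"]))
```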
4. Grouping and Grading Models

The key difference between grouping and grading is the way they handle the instance space.

a) Grouping Models:
● Grouping models break up the instance space into groups, or segments, the number of which is determined at training time.
● They have a fixed resolution; that is, they cannot distinguish instances beyond that resolution.
● At the finest resolution, grouping models assign the majority class to all instances that fall into a segment.
● They determine the right segments and label all the objects in each segment.
● For example, a tree model splits the instance space into smaller subsets. Trees are usually of limited depth and don't contain all the available features. The subsets at the leaves of the tree partition the instance space with some finite resolution. Instances filtered into the same leaf of the tree are treated the same, regardless of any features not in the tree that might be able to distinguish them.
b) Grading Models:
● Grading models don't use the notion of a segment.
● They form one global model over the instance space.
● Grading models are usually able to distinguish between arbitrary instances, no matter how similar they are.
● Their resolution is, in theory, infinite, particularly when working in a Cartesian instance space.
● SVMs and other geometric classifiers are examples of grading models.
● They work in a Cartesian instance space and exploit the minute differences between instances.
● Some models combine features of both grouping and grading models.
● Linear classifiers are the primary example of a grading model. Instances on a line or plane parallel to the decision boundary can't be distinguished by a linear model, and there are infinitely many such segments.
5. Parametric and Non-Parametric Models

a) Parametric Models:
● Assumptions can greatly simplify the learning process, but can also limit what can be learned. Algorithms that simplify the function to a known form are called parametric machine learning algorithms.
● A learning model that summarizes data with a set of parameters of fixed size (independent of the number of training examples) is called a parametric model.
● No matter how much data you throw at a parametric model, it won't change its mind about how many parameters it needs.
These algorithms involve two steps:
1. Select a form for the function.
2. Learn the coefficients for the function from the training data.
● An easy-to-understand functional form for the mapping function is a line, as used in linear regression: b0 + b1*x1 + b2*x2 = 0
● where b0, b1, and b2 are the coefficients of the line that control the intercept and slope, and x1 and x2 are two input variables.
● Assuming the functional form of a line greatly simplifies the learning process.
● Now, all we need to do is estimate the coefficients of the line equation and we have a predictive model for the problem.
● Often the assumed functional form is a linear combination of the input variables, and as such parametric machine learning algorithms are often also called "linear machine learning algorithms".
● The problem is that the actual unknown underlying function may not be a linear function like a line.
● It could be almost a line and require some minor transformation of the input data to work right.
● Or it could be nothing like a line, in which case the assumption is wrong and the approach will produce poor results.
Some more examples of parametric machine learning algorithms include:
1. Logistic Regression
2. Linear Discriminant Analysis
3. Perceptron
4. Naive Bayes
5. Simple Neural Networks
Benefits of parametric models:
● Simpler: these methods are easier to understand, and results are easier to interpret.
● Speed: parametric models are very fast to learn from data.
● Less data: they do not require as much training data and can work well even if the fit to the data is not perfect.
Limitations of parametric models:
● Constrained: by choosing a functional form, these methods are highly constrained to the specified form.
● Limited complexity: the methods are more suited to simpler problems.
● Poor fit: in practice, the methods are unlikely to match the underlying mapping function.
b) Non-Parametric Models:
● Algorithms that do not make strong assumptions about the form of the mapping function are called nonparametric machine learning algorithms.
● By not making assumptions, they are free to learn any functional form from the training data.
● Nonparametric methods are good when you have a lot of data and no prior knowledge, and when you don't want to worry too much about choosing just the right features.
● Nonparametric methods seek to best fit the training data in constructing the mapping function, while maintaining some ability to generalize to unseen data.
● As such, they are able to fit a large number of functional forms.
● An easy-to-understand nonparametric model is the k-nearest neighbours algorithm, which makes predictions for a new data instance based on the k most similar training patterns.
● The method does not assume anything about the form of the mapping function, other than that patterns that are close are likely to have a similar output variable.
Some more examples of popular nonparametric machine learning algorithms are (a KNN sketch follows this list):
1. k-Nearest Neighbours
2. Decision Trees like CART and C4.5
3. Support Vector Machines
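A minimal k-nearest-neighbours sketch with scikit-learn (the toy data is invented for illustration):

```python
from sklearn.neighbors import KNeighborsClassifier

# Invented toy 2-D points with class labels.
X = [[1, 1], [1, 2], [2, 1], [6, 6], [7, 7], [6, 7]]
y = [0, 0, 0, 1, 1, 1]

knn = KNeighborsClassifier(n_neighbors=3).fit(X, y)
# Prediction is the majority class among the 3 closest training points;
# nothing about the mapping function's form is assumed.
print(knn.predict([[2, 2], [6, 5]]))  # -> [0 1]
```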
Benefits of nonparametric machine learning algorithms:
● Flexibility: capable of fitting a large number of functional forms.
● Power: no assumptions (or weak assumptions) about the underlying function.
● Performance: can result in higher-performance models for prediction.
Limitations of nonparametric machine learning algorithms:
● More data: they require a lot more training data to estimate the mapping function.
● Slower: they are a lot slower to train, as they often have far more parameters to train.
● Overfitting: there is more of a risk of overfitting the training data, and it is harder to explain why specific predictions are made.
Data Formats in Machine Learning

● Each data format represents how the input data is laid out in memory.
● This is important, as each machine learning application performs well for a particular data format and worse for others.
● Interchanging between various data formats, and choosing the correct format, is a major optimization technique.
There are four types of data formats:
1. NHWC
2. NCHW
3. NCDHW
4. NDHWC
Real-life Applications of ML

1. NLP: chatbots & virtual assistants, text summarization, sentiment analysis, etc.
2. Image Processing: facial and fingerprint recognition, medical diagnosis, etc.
3. Speech Recognition: voice command control, voice biometrics, etc.
4. Industry Automation: autonomous driving, predictive maintenance, etc.
5. Anomaly Detection: fraud detection, spam filtering, etc.
6. Predictive Analytics: data analytics, weather forecasting, personalized advertising, etc.
7. E-Commerce: credit scoring, product recommendations, dynamic pricing, etc.
Important Elements of Machine Learning

Data Formats in Machine Learning

Each letter in the formats denotes a particular aspect/dimension of the data:
● N: Batch size: the number of images passed together as a group for inference.
● C: Channel: the number of data components that make up a data point in the input data. It is 3 for opaque (RGB) images and 4 for transparent (RGBA) images.
● H: Height: the height (measurement along the y-axis) of the input data.
● W: Width: the width (measurement along the x-axis) of the input data.
● D: Depth: the depth of the input data.
1) NHWC
NHWC denotes (Batch size, Height, Width, Channel). This means there is a 4D array where the first dimension represents the batch size, the second the height, and so on. This 4D array is laid out in memory in row-major order. Hence, you can visualize the memory layout to imagine which operations will access consecutive memory (fast) or memory separated by other data (slow).

2) NCHW
NCHW denotes (Batch size, Channel, Height, Width). This means there is a 4D array where the first dimension represents the batch size, and so on. This 4D array is laid out in memory in row-major order.

3) NCDHW
NCDHW denotes (Batch size, Channel, Depth, Height, Width). This means there is a 5D array where the first dimension represents the batch size, and so on. This 5D array is laid out in memory in row-major order.

4) NDHWC
NDHWC denotes (Batch size, Depth, Height, Width, Channel). This means there is a 5D array where the first dimension represents the batch size, and so on. This 5D array is laid out in memory in row-major order. A small NumPy layout sketch follows.
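A minimal sketch of converting between layouts with NumPy (the shapes are invented for illustration):

```python
import numpy as np

# A batch of 8 RGB images, 32x32, in NHWC layout.
nhwc = np.zeros((8, 32, 32, 3), dtype=np.float32)

# Reorder the axes to NCHW: move Channel from the last position to second.
nchw = np.transpose(nhwc, (0, 3, 1, 2))
print(nhwc.shape, nchw.shape)  # (8, 32, 32, 3) (8, 3, 32, 32)

# ascontiguousarray actually rewrites memory into row-major order
# for the new axis ordering, rather than just changing strides.
nchw = np.ascontiguousarray(nchw)
```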
Learnability in Machine Learning

● Learnability is a quality of products and interfaces that allows users to quickly become familiar with them and to make good use of all their features and capabilities.
● Learnability is one component of usability, and is often heard in the context of user interface or user experience design, as well as usability and user acceptance testing.
● A very learnable interface or product is sometimes said to be intuitive, because the user can immediately grasp how to interact with the system.
● First-time learnability refers to the degree of ease with which a user can learn a newly encountered system without referring to documentation, such as manuals, user guides, or FAQ (frequently asked questions) lists.
● One element of first-time learnability is discoverability, which is the degree of ease with which the user can find all the elements and features of a new system when they first encounter it.
● Learnability over time, on the other hand, is the capacity of a user to gain expertise in working with a given system through repeated interaction.
● There are three different aspects of learnability:
1. First-use learnability
2. Steepness of the learning curve
3. Efficiency of the ultimate plateau
● Relatively simple systems with good learnability are said to have short or steep learning curves, meaning that most learning associated with the system happens very quickly, after which the rate of learning levels off or plateaus.
● More complex systems typically involve a longer (shallower) learning curve.
● Within any system that applies standards to measurement, a steep learning curve refers to something easily learned.
● As displayed in a graph, for example, the steepness indicates that the degree of learning obtained rises quickly.
● Contrary to the term's actual definition, however, most people use the term steep learning curve to indicate difficulty, similarly to the way that a steep hill is difficult to climb.
Measuring Learnability

● High learnability contributes to usability.
● It results in quick system onboarding, which translates to low training costs.
● Additionally, good learnability can result in high satisfaction, because users will feel confident in their abilities.
● If the system and corresponding tasks are complex, and ones that users access frequently, your product may be a good case for a learnability study.
Statistical Learning

● Statistics is a collection of tools that you can use to get answers to important questions about data.
● You can use descriptive statistical methods to transform raw observations into information that you can understand and share.
● You can use inferential statistical methods to reason from small samples of data to whole domains.
● Statistical learning theory is a framework for machine learning that draws from statistics and functional analysis.
● It deals with finding a predictive function based on the data presented.
● The main idea in statistical learning theory is to build a model that can draw conclusions from data and make predictions.
Statistical Learning Approaches

1. Statistics in Data Preparation
Statistical methods are required in the preparation of train and test data for your machine learning model. This includes techniques for:
● Outlier detection
● Missing value imputation
● Data sampling
● Data scaling
● Variable encoding, and much more
A basic understanding of data distributions, descriptive statistics, and data visualization is required to help you identify the methods to choose when performing these tasks.
2. Statistics in Model Evaluation
Statistical methods are required when evaluating the skill of a machine learning model on data not seen during training. This includes techniques for:
● Data sampling
● Data resampling
● Experimental design
Resampling techniques such as k-fold cross-validation are often well understood by machine learning practitioners, but the rationale for why this method is required is not.
3. Statistics in Model Selection
Statistical methods are required when selecting a final model or model configuration to use for a predictive modeling problem. These include techniques for:
● Checking for a significant difference between results
● Quantifying the size of the difference between results
This might include the use of statistical hypothesis tests.
4. Statistics in Model Presentation
Statistical methods are required when presenting the skill of a final model to stakeholders. This includes techniques for:
● Summarizing the expected skill of the model on average
● Quantifying the expected variability of the skill of the model in practice
This might include estimation statistics such as confidence intervals.
5. Statistics in Prediction
Statistical methods are required when making a prediction with a finalized model on new data. This includes techniques for:
● Quantifying the expected variability of the prediction
This might include estimation statistics such as prediction intervals.
6. Problem Framing: requires the use of exploratory data analysis and data mining.
7. Data Cleaning: requires the use of outlier detection, imputation, and more.
8. Data Selection: requires the use of data sampling and feature selection methods.
9. Model Configuration: requires the use of statistical hypothesis tests and estimation statistics.
Principal Component Analysis (PCA)

Principal Component Analysis is an unsupervised learning algorithm that is used for dimensionality reduction in machine learning.

It is a statistical process that converts the observations of correlated features into a set of linearly uncorrelated features with the help of an orthogonal transformation.

These new transformed features are called the Principal Components.

It is one of the popular tools used for exploratory data analysis and predictive modeling.

It is a technique for drawing strong patterns from the given dataset by reducing the dimensionality.

PCA generally tries to find the lower-dimensional surface onto which to project the high-dimensional data.

PCA works by considering the variance of each attribute, because a high-variance attribute shows a good split between the classes, and hence it reduces the dimensionality.

Some real-world applications of PCA are image processing, movie recommendation systems, and optimizing the power allocation in various communication channels.

It is a feature extraction technique, so it retains the important variables and drops the least important variables.

The PCA algorithm is based on some mathematical concepts such as:
• Mean
• Variance and Covariance
• Eigenvalues and Eigenvectors

• Mean: The mean, also known as the average, is a measure of the central tendency of a dataset. It is calculated by summing up all the values in the dataset and dividing by the number of values.

For a dataset with n values x1, x2, x3, …, xn, the mean μ is given by:

μ = (1/n) ∑i=1:n xi

Mean Example

For the dataset {4, 8, 6, 5, 3, 7}:

μ = (4 + 8 + 6 + 5 + 3 + 7) / 6 = 33/6 = 5.5


• Variance
Variance measures the dispersion of a dataset, indicating how much the values differ from the mean. It is the average of the squared differences from the mean.

Variance Formula
For a dataset with n values x1, x2, x3, …, xn, the variance σ² is given by:

σ² = (1/n) ∑i=1:n (xi − μ)²

Variance Example
For the dataset {4, 8, 6, 5, 3, 7} with mean μ = 5.5:

σ² = [(4 − 5.5)² + (8 − 5.5)² + (6 − 5.5)² + (5 − 5.5)² + (3 − 5.5)² + (7 − 5.5)²] / 6

σ² = 17.5/6 = 2.92
• Covariance

Covariance provides insight into how two variables are related to one another.
More precisely, covariance refers to the measure of how two random variables in a data set will change together.

A positive covariance means the variables at hand are positively related, meaning they move in the same direction.
A negative covariance means the variables are inversely related, or that they move in opposite directions.

The formula for covariance is:

cov(X, Y) = ∑i=1:N (xi − x̄)(yi − ȳ) / N

In this formula, X represents the independent variable, Y represents the dependent variable, N represents the number of data points in the sample, x̄ represents the mean of X, and ȳ represents the mean of the dependent variable Y.
• Standard Deviation
Standard deviation is the square root of the variance, which is the average of the squared differences from the mean.

Standard Deviation Formula
For a dataset with n values x1, x2, x3, …, xn, the standard deviation σ is given by:

σ = √[ (1/n) ∑i=1:n (xi − μ)² ]

Standard Deviation Example
For the dataset {4, 8, 6, 5, 3, 7}, with variance σ² = 2.92:
σ = 1.71
A quick NumPy check of these statistics follows.
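A quick sketch verifying the mean, variance, and standard-deviation examples above with NumPy:

```python
import numpy as np

data = np.array([4, 8, 6, 5, 3, 7])
print(np.mean(data))  # 5.5
print(np.var(data))   # 2.9166... (population variance, divides by n)
print(np.std(data))   # 1.7078... (square root of the variance)
```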
Eigenvalue

The specific set of scalars associated with a system of linear equations is known as the eigenvalues. They are most commonly employed in matrix equations.

The German term 'eigen' denotes 'proper' or 'characteristic'.

As a result, an eigenvalue can also be referred to as a characteristic value, a characteristic root, a proper value, or a latent root.

An eigenvalue is a scalar that is used to transform an eigenvector. The fundamental formula is

Ax = λx

The eigenvalue of A is the number or scalar value λ.
Eigenvector

When a linear transformation is applied, eigenvectors are the non-zero vectors that do not change direction; they vary only by a scalar quantity.

In a nutshell, if A is a linear transformation of a vector space V and x is a non-zero vector in V, then x is an eigenvector of A if A(x) is a scalar multiple of x.

The set of all the eigenvectors with the same eigenvalue, together with the zero vector, makes up the eigenspace for that eigenvalue. The zero vector itself, however, is not an eigenvector.

If A is an n×n matrix and λ is an eigenvalue of A, then x, a non-zero vector, is called an eigenvector if it fulfils the expression:

Ax = λx

Here x is an eigenvector of A corresponding to the eigenvalue λ.

Some common terms used in the PCA algorithm:

Dimensionality:
The number of features or variables present in the given dataset. More simply, it is the number of columns present in the dataset.

Correlation:
Signifies how strongly two variables are related to each other, such that if one changes, the other variable also changes. The correlation value ranges from −1 to +1. Here, −1 occurs if the variables are inversely proportional to each other, and +1 indicates that the variables are directly proportional to each other.

Orthogonal:
Indicates that the variables are not correlated with each other, and hence the correlation between the pair of variables is zero.

Eigenvectors:
If there is a square matrix M and a non-zero vector v, then v is an eigenvector of M if Mv is a scalar multiple of v.

Covariance Matrix:
A matrix containing the covariances between pairs of variables is called a covariance matrix.
Covariance Matrix

The variance-covariance matrix is a square matrix whose diagonal elements represent the variances and whose off-diagonal components express the covariances.

The covariance of a variable can take any real value: positive, negative, or zero.

A positive covariance suggests that the two variables have a positive relationship, whereas a negative covariance indicates an inverse relationship. If two elements do not vary together, they have a zero covariance.

Covariance Matrix Example
Let's say there are two data sets, X = [10, 5] and Y = [3, 9].
Using the sample (n − 1) convention, the variance of set X = 12.5, the variance of set Y = 18, and the covariance between both variables is −15.

The covariance matrix is then:

[ var(X)     cov(X, Y) ]   [ 12.5  −15 ]
[ cov(X, Y)  var(Y)    ] = [ −15    18 ]

In the formulas above, μ is the mean of the population, x̄ is the mean of the sample, n is the number of observations, and xi is the i-th observation in dataset x.
How to Find a Covariance Matrix?
The dimensions of a covariance matrix are determined by the number of variables in a given data set.

If there are only two variables in a set, then the covariance matrix has two rows and two columns. Similarly, if a data set has three variables, then its covariance matrix has three rows and three columns.

The data below pertains to marks scored by Anna, Caroline, and Laura in Psychology and History. Make a covariance matrix.

Student    Psychology (X)    History (Y)
Anna       80                70
Caroline   63                20
Laura      100               50
The following steps have to be followed:

Step 1: Find the mean of variable X. Sum up all the observations in variable X and divide the sum obtained by the number of terms. Thus, (80 + 63 + 100)/3 = 81.

Step 2: Subtract the mean from all observations: (80 − 81), (63 − 81), (100 − 81).

Step 3: Take the squares of the differences obtained above and then add them up. Thus, (80 − 81)² + (63 − 81)² + (100 − 81)².

Step 4: Find the variance of X by dividing the value obtained in Step 3 by 1 less than the total number of observations:
var(X) = [(80 − 81)² + (63 − 81)² + (100 − 81)²] / (3 − 1) = 343.
Step 5: Similarly, repeat steps 1 to 4 to calculate the variance of Y:
var(Y) = 633.333.
Step 6: Choose a pair of variables.

Step 7: Subtract the mean of the first variable (X) from all observations: (80 − 81), (63 − 81), (100 − 81).

Step 8: Repeat the same for variable Y: (70 − 47), (20 − 47), (50 − 47).

Step 9: Multiply the corresponding terms: (80 − 81)(70 − 47), (63 − 81)(20 − 47), (100 − 81)(50 − 47).

Step 10: Find the covariance by adding these values and dividing them by (n − 1):

Cov(X, Y) = [(80 − 81)(70 − 47) + (63 − 81)(20 − 47) + (100 − 81)(50 − 47)] / (3 − 1) = 260.

Step 11: Use the general formula for the covariance matrix to arrange the terms. The matrix becomes:

[ 343  260
  260  633.333 ]

A NumPy check of this worked example follows.
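A quick sketch verifying the worked example with NumPy (np.cov uses the same n − 1 convention):

```python
import numpy as np

psychology = [80, 63, 100]       # X
history = [70, 20, 50]           # Y

C = np.cov(psychology, history)  # 2x2 sample covariance matrix
print(C)
# [[343.      260.    ]
#  [260.      633.3333]]
```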
Principal Components Analysis (PCA)

An exploratory technique used to reduce the dimensionality of the data set to 2D or 3D.

It can be used to:
• Reduce the number of dimensions in data
• Find patterns in high-dimensional data
• Visualize data of high dimensionality

Example applications:
• Face recognition
• Image compression
• Gene expression analysis

The main goal of Principal Component Analysis (PCA) is to reduce the dimensionality of a dataset while preserving the most important patterns or relationships between the variables, without any prior knowledge of the target variables.
Principal Component Analysis (PCA) is used to reduce the dimensionality of a data set by finding a new set of variables, smaller than the original set of variables, retaining most of the sample's information, and useful for the regression and classification of data.

PCA is a technique for dimensionality reduction that identifies a set of orthogonal axes, called principal components, that capture the maximum variance in the data.

The principal components (PCs) are linear combinations of the original variables in the dataset and are ordered in decreasing order of importance.

The total variance captured by all the principal components is equal to the total variance in the original dataset.

The first principal component captures the most variation in the data; the second principal component captures the maximum variance that is orthogonal to the first principal component, and so on.

In Principal Component Analysis, it is assumed that the information is carried in the variance of the features; that is, the higher the variation in a feature, the more information that feature carries.
Applications of PCA

Data visualization, feature selection, and data compression.
• In data visualization, PCA can be used to plot high-dimensional data in two or three dimensions, making it easier to interpret.
• In feature selection, PCA can be used to identify the most important variables in a dataset.
• In data compression, PCA can be used to reduce the size of a dataset without losing important information.
PCA (Principal Component Analysis) Steps

Step 1: Standardization
First, we need to standardize our dataset to ensure that each variable has a mean of 0 and a standard deviation of 1:

Z = (X − μ) / σ

Here, μ is the mean of the independent features, μ = {μ1, μ2, …, μm}, and σ is the standard deviation of the independent features, σ = {σ1, σ2, …, σm}.

Step 2: Covariance Matrix Computation
Covariance measures the strength of joint variability between two or more variables, indicating how much they change in relation to each other. To find the covariance we can use the formula:

cov(x1, x2) = ∑i=1:n (x1i − x̄1)(x2i − x̄2) / (n − 1)

The value of a covariance can be positive, negative, or zero:
• Positive: as x1 increases, x2 also increases.
• Negative: as x1 increases, x2 decreases.
• Zero: no direct relation.
Step 3: Compute the Eigenvalues and Eigenvectors of the Covariance Matrix to Identify the Principal Components

Let A be a square n×n matrix and X be a non-zero vector for which

AX = λX

for some scalar value λ. Then λ is known as an eigenvalue of matrix A, and X is known as an eigenvector of matrix A for the corresponding eigenvalue.

It can also be written as:

AX − λX = 0
(A − λI)X = 0

where I is the identity matrix of the same shape as matrix A. The above conditions hold only if (A − λI) is non-invertible (i.e., a singular matrix). That means

|A − λI| = 0

From this equation we can find the eigenvalues λ, and the corresponding eigenvectors can then be found using the equation AX = λX.
A compact NumPy sketch of these three steps follows.
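A minimal sketch of Steps 1-3 with NumPy (the toy data matrix is invented for illustration):

```python
import numpy as np

# Invented toy data: 5 samples x 2 features.
X = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0]])

# Step 1: standardize each feature to mean 0, standard deviation 1.
Z = (X - X.mean(axis=0)) / X.std(axis=0)

# Step 2: covariance matrix of the standardized data (features in columns).
C = np.cov(Z, rowvar=False)

# Step 3: eigenvalues/eigenvectors, sorted by decreasing eigenvalue.
vals, vecs = np.linalg.eigh(C)        # eigh: C is symmetric
order = np.argsort(vals)[::-1]
vals, vecs = vals[order], vecs[:, order]

# Project onto the first principal component (dimensionality 2 -> 1).
pc1_scores = Z @ vecs[:, 0]
print(vals, pc1_scores)
```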
Summary of PCA

• The Principal Component Analysis (PCA) technique was introduced by the mathematician Karl Pearson in 1901.
• Principal Component Analysis (PCA) is a statistical procedure that uses an orthogonal transformation to convert a set of correlated variables into a set of uncorrelated variables.
• PCA is the most widely used tool in exploratory data analysis and in machine learning for predictive models.
• Principal Component Analysis (PCA) is an unsupervised learning algorithm technique used to examine the interrelations among a set of variables.
• It is also known as general factor analysis, where regression determines a line of best fit.
Principal Components Analysis (PCA) Ideas

Does the data set 'span' the whole of d-dimensional space?

For a matrix of m samples x n genes, create a new covariance matrix of size n x n.

Transform some large number of variables into a smaller number of uncorrelated variables called principal components (PCs).

PCA was developed to capture as much of the variation in the data as possible.
Principal Component Analysis

See online tutorials such as [Link]

[Figure: scatter of data points in the (X1, X2) plane. Y1, the first eigenvector, points along the direction of largest variance (key observation: variance = largest). Y2, the second eigenvector, is orthogonal to Y1 and is ignorable.]
Principal Component Analysis: one attribute first

Question: how much spread is in the data along the axis (distance to the mean)?

Temperature: 42, 40, 24, 30, 15, 18, 15, 30, 15, 30, 35, 30, 40, 30

Variance = (standard deviation)²

s² = ∑i=1..n (Xi − X̄)² / (n − 1)
Now consider two dimensions: X = Temperature, Y = Humidity

(X, Y) pairs: (40, 90), (40, 90), (40, 90), (30, 90), (15, 70), (15, 70), (15, 70), (30, 90), (15, 70), (30, 70), (30, 70), (30, 90), (40, 70), (30, 90)

Covariance measures the correlation between X and Y:
• cov(X, Y) = 0: no linear relation (uncorrelated)
• cov(X, Y) > 0: X and Y move in the same direction
• cov(X, Y) < 0: X and Y move in opposite directions

cov(X, Y) = ( ∑i=1..n (Xi − X̄)(Yi − Ȳ) ) / (n − 1)
More than two attributes: covariance matrix

The covariance matrix contains covariance values between all possible pairs of dimensions (= attributes):

C(n×n) = (cij), where cij = cov(Dimi, Dimj)

Example for three attributes (x, y, z):

      | cov(x,x)  cov(x,y)  cov(x,z) |
C  =  | cov(y,x)  cov(y,y)  cov(y,z) |
      | cov(z,x)  cov(z,y)  cov(z,z) |
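For illustration, NumPy's np.cov builds this matrix directly (toy values of our own):

```python
import numpy as np

# Five samples of three attributes (x, y, z); values are illustrative
data = np.array([
    [2.5, 2.4, 1.2],
    [0.5, 0.7, 0.9],
    [2.2, 2.9, 1.5],
    [1.9, 2.2, 1.1],
    [3.1, 3.0, 1.8],
])

# rowvar=False: each column is one attribute, each row one sample
C = np.cov(data, rowvar=False)
print(C.shape)              # (3, 3): entries cov(x,x), cov(x,y), cov(x,z), ...
print(np.allclose(C, C.T))  # True: the covariance matrix is symmetric
```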
Eigenvalues & eigenvectors

Vectors x having the same direction as Ax are called eigenvectors of A (A is an n by n matrix).

In the equation Ax = λx, λ is called an eigenvalue of A.

Example:

| 2  3 | | 3 |   | 12 |       | 3 |
| 2  1 | | 2 | = |  8 |  = 4 × | 2 |

so x = (3, 2) is an eigenvector of this A with eigenvalue λ = 4.
Principal components

First principal component (PC1): the eigenvalue with the largest absolute value indicates that the data have the largest variance along its eigenvector, the direction along which there is the greatest variation.

Second principal component (PC2): the direction with the maximum variation left in the data, orthogonal to PC1.

In general, only a few directions manage to capture most of the variability in the data.
Steps of PCA

• Let X̄ be the mean vector (taking the mean of all rows).

• Adjust the original data by the mean: X′ = X − X̄.

• Compute the covariance matrix C of the adjusted data X′.

• Find the eigenvectors and eigenvalues of C: for matrix C, the eigenvectors are the (column) vectors e having the same direction as Ce, i.e. Ce = λe, where λ is called an eigenvalue of C. Equivalently, Ce = λe ⇒ (C − λI)e = 0.

Most data mining packages do this for you.
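As the notes say, most packages do these steps for you; here is a minimal scikit-learn sketch (the dataset and parameter values are our own illustration):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 5))        # illustrative data: 200 samples, 5 features

pca = PCA(n_components=2)            # keep the top 2 principal components
Y = pca.fit_transform(X)             # mean-centers X, then projects it

print(Y.shape)                       # (200, 2)
print(pca.explained_variance_)       # eigenvalues lambda_j of the covariance matrix
print(pca.explained_variance_ratio_ * 100)  # % of total variance per component
```

Note that scikit-learn's PCA mean-centers the data but does not standardize it; standardize beforehand if the features are on different scales.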
Eigenvalues

• Calculate eigenvalues λ and eigenvectors x for the covariance matrix.

Eigenvalues λj are used to calculate the percentage of total variance Vj captured by each component j:

Vj = 100 × λj / (λ1 + λ2 + ⋯ + λn)  [%]
Principal components - Variance

[Figure: bar chart of variance (%) explained by each component PC1–PC10; the percentage drops off steeply after the first few components.]
Transformed Data

• Eigenvalue λj corresponds to the variance on component j.
• Thus, sort the eigenvectors by λj.
• Take the first p eigenvectors ei, where p is the number of top eigenvalues.
• These are the directions with the largest variances.

| yi1 |   | e1ᵀ | | xi1 − x̄1 |
| yi2 | = | e2ᵀ | | xi2 − x̄2 |
| ... |   | ... | |    ...    |
| yip |   | epᵀ | | xin − x̄n |
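A short NumPy sketch of this projection step (synthetic data; the eigenvectors are sorted by decreasing eigenvalue as described above):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 4))                 # illustrative data: 50 samples, 4 features

C = np.cov(X - X.mean(axis=0), rowvar=False)
eigvals, eigvecs = np.linalg.eigh(C)
order = np.argsort(eigvals)[::-1]            # sort by decreasing eigenvalue
eigvecs = eigvecs[:, order]

p = 2                                        # number of top eigenvectors kept
E_p = eigvecs[:, :p]                         # directions with the largest variances
Y = (X - X.mean(axis=0)) @ E_p               # y_i = E_p^T (x_i - x_bar), for all i
print(Y.shape)                               # (50, 2)
```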
An Example

Mean1 = 24.1, Mean2 = 53.8

X1   X2   X1'    X2'
19   63   -5.1    9.25
39   74   14.9   20.25
30   87    5.9   33.25
30   23    5.9  -30.75
15   35   -9.1  -18.75
15   43   -9.1  -10.75
15   32   -9.1  -21.75
30   73    5.9   19.25

[Figure: two scatter plots, one of the original data (X1, X2) and one of the mean-adjusted data (X1', X2').]
Covariance Matrix

C = |  75   106 |
    | 106   482 |

Eigenvectors:
e1 = (-0.98, -0.21), λ1 = 51.8
e2 = (0.21, -0.98), λ2 = 560.2

Thus the second eigenvector is more important!
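A quick NumPy check of this worked example (the slide's matrix appears to use the population covariance, dividing by n rather than n − 1, and its printed eigenvalues look approximate; the qualitative conclusion, that the second eigenvector carries far more variance, still holds):

```python
import numpy as np

X1 = np.array([19., 39., 30., 30., 15., 15., 15., 30.])
X2 = np.array([63., 74., 87., 23., 35., 43., 32., 73.])

# bias=True divides by n, which matches the slide's matrix closely
C = np.cov(X1, X2, bias=True)
print(np.round(C))        # approximately [[75, 106], [106, 482]]

eigvals, eigvecs = np.linalg.eigh(C)
print(eigvals)            # one small and one much larger eigenvalue
print(eigvecs)            # columns are the corresponding eigenvectors
```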
If we only keep one dimension: e2

We keep the dimension of e2 = (0.21, −0.98) and obtain the final data as:

yi = (0.21  −0.98) (xi1', xi2')ᵀ = 0.21 × xi1' − 0.98 × xi2'

yi: −10.14, −16.72, −31.35, 31.37, 16.46, 8.62, 19.40, −17.63

[Figure: the eight samples plotted along the single new axis e2.]
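A tiny NumPy sketch reproducing these projected values from the mean-adjusted data in the example above:

```python
import numpy as np

# Mean-adjusted columns X1' and X2' from the example table
X1_adj = np.array([-5.1, 14.9, 5.9, 5.9, -9.1, -9.1, -9.1, 5.9])
X2_adj = np.array([9.25, 20.25, 33.25, -30.75, -18.75, -10.75, -21.75, 19.25])

e2 = np.array([0.21, -0.98])
y = e2 @ np.vstack([X1_adj, X2_adj])   # y_i = 0.21 * x_i1' - 0.98 * x_i2'
print(np.round(y, 2))
# [-10.14 -16.72 -31.35  31.37  16.46   8.62  19.4  -17.63]
```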
Linear Discriminant Analysis (LDA)
Linear Discriminant Analysis (LDA), also known as Normal Discriminant Analysis
or Discriminant Function Analysis, is a dimensionality reduction technique primarily
utilized in supervised classification problems.

It facilitates the modeling of distinctions between groups, effectively separating two
or more classes.

LDA operates by projecting features from a higher-dimensional space into a lower-dimensional one.

In machine learning, LDA serves as a supervised learning algorithm specifically designed for classification tasks, aiming to identify a linear combination of features that optimally segregates classes within a dataset.
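For a quick usage sketch, scikit-learn implements this algorithm as LinearDiscriminantAnalysis (the dataset here is our own illustration):

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Illustrative two-class data: 100 samples, 4 features
rng = np.random.default_rng(7)
X = np.vstack([rng.normal(0, 1, size=(50, 4)),
               rng.normal(2, 1, size=(50, 4))])
y = np.array([0] * 50 + [1] * 50)    # class labels make LDA supervised

lda = LinearDiscriminantAnalysis(n_components=1)  # at most (classes - 1) components
X_1d = lda.fit_transform(X, y)       # project 4-D data onto one discriminant axis
print(X_1d.shape)                    # (100, 1)
print(lda.score(X, y))               # classification accuracy on the training data
```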
Suppose we have two sets of data points belonging to
two different classes that we want to classify.

LDA creates a new axis and projects the data onto it in a way that maximizes the separation of the two categories, reducing the 2D graph to a 1D graph.

Two criteria are used by LDA to create the new axis:

1. Maximize the distance between the means of the two classes.
2. Minimize the variation within each class.
In the above graph, it can be seen that a new axis (in
red) is generated and plotted in the 2D graph such that
it maximizes the distance between the means of the
two classes and minimizes the variation within each
class.

This newly generated axis increases the separation between the data points of the two classes.

After generating this new axis using the above-mentioned criteria, all the data points of the classes are plotted on this new axis, as shown in the figure given below.
Drawback of LDA:

But Linear Discriminant Analysis fails when the means of the distributions are shared: it becomes impossible for LDA to find a new axis that makes both classes linearly separable.

In such cases, we use non-linear discriminant analysis.


LDA can be performed in 5 steps:

1. Compute the mean vectors for the different classes from the dataset.

2. Compute the scatter matrices (between-class and within-class scatter matrices).

3. Compute the eigenvectors and corresponding eigenvalues for the scatter matrices.

4. Sort the eigenvectors by decreasing eigenvalues and choose the k eigenvectors with the largest eigenvalues.

5. Use this eigenvector matrix to transform the samples onto the new subspace.
Summarizing the LDA approach in 5 steps (a code sketch of these steps is given below):

1. Compute the d-dimensional mean vectors for the different classes from the dataset.

2. Compute the scatter matrices (between-class and within-class scatter matrix).

3. Compute the eigenvectors (e1, e2, ..., ed) and corresponding eigenvalues (λ1, λ2, ..., λd) for the scatter matrices.

4. Sort the eigenvectors by decreasing eigenvalues and choose the k eigenvectors with the largest eigenvalues to form a d×k dimensional matrix W (where every column represents an eigenvector).

5. Use this d×k eigenvector matrix to transform the samples onto the new subspace. This can be summarized by the matrix multiplication Y = X × W (where X is an n×d-dimensional matrix representing the n samples, and Y are the transformed n×k-dimensional samples in the new subspace).
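Below is a minimal NumPy sketch of these 5 steps for a two-class problem (the data and variable names are our own illustration, following the recipe above rather than any particular library's API):

```python
import numpy as np

# Two illustrative classes with d = 3 features each
rng = np.random.default_rng(0)
X0 = rng.normal(0, 1, size=(40, 3))       # class 0 samples
X1 = rng.normal(2, 1, size=(40, 3))       # class 1 samples
X = np.vstack([X0, X1])

# Step 1: d-dimensional mean vectors per class, plus the overall mean
m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
m = X.mean(axis=0)

# Step 2: within-class and between-class scatter matrices
S_W = (X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)
S_B = (len(X0) * np.outer(m0 - m, m0 - m)
       + len(X1) * np.outer(m1 - m, m1 - m))

# Step 3: eigenvectors/eigenvalues of S_W^{-1} S_B
eigvals, eigvecs = np.linalg.eig(np.linalg.inv(S_W) @ S_B)

# Step 4: sort by decreasing eigenvalue and keep the top k = 1 column as W
order = np.argsort(eigvals.real)[::-1]
W = eigvecs[:, order[:1]].real            # d x k matrix

# Step 5: transform the samples onto the new subspace: Y = X W
Y = X @ W                                 # n x k projected samples
print(Y.shape)                            # (80, 1)
```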

