Machine Learning
Techniques
Tech Master Edu Machine Learning
Agenda
Chapter 1 : Introduction
Chapter 2 : Regression
Bayesian Learning
Support Vector Machine (SVM)
Chapter 3 : Decision Tree Learning
Instance-Based Learning
Chapter 4 : Artificial Neural Networks
Deep Learning
Chapter 5 : Reinforcement Learning
Genetic Algorithms
Key Points
• Subset of artificial intelligence.
• Use algorithms and statistical models.
• Improve performance on a specific task over time.
• Without being explicitly programmed.
• The primary goal is to teach computers to:
• Recognize patterns in data.
• Make predictions or decisions based on patterns.
Machine learning is a field of
artificial intelligence that
involves training computer
systems to learn from data and
improve their performance on
a specific task without being
explicitly programmed. The
goal of machine learning is to
enable computers to recognize
patterns in data and make
predictions or decisions based
on those patterns.
Learning
In machine learning, "learning" refers to the process of training an
algorithm to recognize patterns and make predictions or decisions
based on data.
• The algorithm is fed a large amount of training data.
• The training data consists of input-output pairs.
• Learn the underlying patterns and relationships
between the input and output.
• Goal : Make accurate predictions on new, unseen
data.
Supervised Learning
• Computer program is trained on a labeled dataset.
• Consists of input data and corresponding output data.
• Learns a mapping from the input data (features or predictors) to
the output data (label or target variable).
• Can predict the output for new input data.
• It is like having a teacher or a guide who helps you learn.
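A minimal sketch of supervised learning, assuming a hypothetical toy dataset (the fruit measurements and labels below are invented for illustration). It uses a 1-nearest-neighbor rule, which is only one of many possible supervised algorithms:

```python
import math

# Hypothetical labeled dataset: (weight in grams, length in cm) -> label.
training_data = [
    ((150.0, 7.0), "apple"),
    ((170.0, 7.5), "apple"),
    ((120.0, 18.0), "banana"),
    ((130.0, 20.0), "banana"),
]

def predict(features):
    """Return the label of the closest training example (1-nearest neighbor)."""
    nearest = min(training_data, key=lambda pair: math.dist(pair[0], features))
    return nearest[1]

# Predict the output for new, unseen input data.
print(predict((160.0, 7.2)))
print(predict((125.0, 19.0)))
```

The labeled pairs play the role of the "teacher": the program only generalizes from examples whose correct answer was given.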
Unsupervised Learning
• Computer program is trained on an unlabeled dataset.
• Means output data is not provided during training.
• Must find patterns and relationships in the data on its own.
• Program explores the data, looking for patterns and tries to
group them together based on similarities.
• It is like exploring a new city without a map or a guide.
Reinforcement Learning
• A model is trained to make decisions based on trial and error.
• It takes actions and receives feedback on those actions.
• It receives rewards for making correct decisions and penalties for
making incorrect decisions.
• It helps to learn how to make better decisions in the future.
• Commonly used for game playing, robotics, and control systems.
• It is like how children learn new behaviors and skills.
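The trial-and-error loop can be sketched with tabular Q-learning, one common reinforcement learning method. The corridor environment, reward values, and constants below are assumptions chosen for illustration, not part of the original material:

```python
import random

N_STATES = 5          # states 0..4 in a corridor; state 4 is the goal
ACTIONS = (-1, 1)     # move left or move right
ALPHA, GAMMA = 0.5, 0.9

def step(state, action):
    """Hypothetical environment: +1 reward at the goal, small penalty otherwise."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if next_state == N_STATES - 1 else -0.01
    return next_state, reward

rng = random.Random(0)
q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

for _episode in range(500):               # repeated trial and error
    state = 0
    for _step in range(100):              # cap the episode length
        action = rng.choice(ACTIONS)      # explore with random actions
        next_state, reward = step(state, action)
        done = next_state == N_STATES - 1
        best_next = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
        # Q-learning update: move the estimate toward reward + discounted future value.
        q[(state, action)] += ALPHA * (reward + GAMMA * best_next - q[(state, action)])
        state = next_state
        if done:
            break

# Greedy policy: the best-valued action in each non-goal state.
policy = [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)]
print(policy)
```

After training, the greedy policy should move right (toward the reward) from every state, which is the "better decisions in the future" that the feedback is meant to produce.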
Well defined learning problems
A well-defined learning problem is one that is well formulated and has
clear objectives, input data, and output data.
A well-defined problem is necessary for effective algorithm
development and evaluation.
Characteristics :
• Clear objectives : The problem should have clear
objectives and goals that are well-defined and
measurable.
• Input data : The problem should have input data that is relevant
to the problem, well-structured, and easily accessible.
• Output data : The problem should have output data that is well-
defined and related to the objectives of the problem.
• Evaluation metrics : The problem should have well-defined
evaluation metrics that can be used to measure the performance
of the algorithm.
Example
• Image classification
• Sentiment analysis and
• Recommendation systems.
These problems have clear objectives, input data (such as images or
text), output data (such as labels or ratings), and evaluation metrics
(such as accuracy or F1 score) that can be used to evaluate the
performance of the algorithm.
Designing a learning system
Designing a learning system involves several steps, including :
1. Define the problem:
Ø Define the problem you want to solve.
Ø Specify the inputs, desired outputs, and evaluation metrics.
2. Collect and preprocess data:
Ø Collect and preprocess the data to train and test the system.
Ø This involves cleaning, transforming, and normalizing the data.
3. Choose an algorithm:
Ø Select an appropriate algorithm.
Ø This includes choosing a supervised or unsupervised learning
algorithm, as well as tuning the parameters of the algorithm to
optimize its performance.
4. Train the model:
Ø Train the model on the training data.
Ø This involves running the algorithm on the training data and
adjusting the model parameters based on the feedback provided by
the evaluation metrics.
5. Test the model:
Ø Test the model on data it has not seen to evaluate its performance.
Ø Identify any issues or limitations with the model and suggest
areas for improvement.
6. Deploy the model:
Ø Deploy the model in a production environment.
Ø This may involve integrating the model into an existing system or
developing a new application that utilizes the model.
History of Machine Learning
1. 1950s - 1960s
The first machine learning algorithms were developed, based on
ideas from psychology and statistics.
2. In the 1970s,
Machine learning began to be used for practical applications, such
as image recognition and natural language processing.
3. In the early 1990s,
Support vector machines (SVMs) were introduced, which allowed
for more accurate and efficient classification of data.
4. In the 1990s,
The rise of statistical learning algorithms, such as Bayesian
networks and Hidden Markov Models (HMMs), led to breakthroughs
in areas such as speech recognition and machine translation.
5. In 1997,
The IBM chess-playing computer Deep Blue beat the world chess
champion, Garry Kasparov.
6. In the 2000s and 2010s,
The rise of deep learning, built on architectures such as
convolutional neural networks (CNNs) and recurrent neural networks
(RNNs), allowed for more complex and sophisticated machine learning
applications.
7. The 21st Century
Today, machine learning is used in a wide range of fields :
a) Game playing: such as AlphaGo.
b) Image recognition and object detection: self-driving cars,
security cameras.
c) Speech recognition: such as Siri, Google Assistant, and Alexa.
d) Recommender systems: such as Netflix and Amazon.
e) Medical diagnosis and prediction
f) Fraud detection
g) Natural language processing: sentiment analysis and chatbots.
Introduction To Machine Learning Approaches
• Artificial Neural Network
• Clustering
• Reinforcement Learning
• Decision Tree Learning
• Bayesian Networks
• Support Vector Machine
• Genetic Algorithm
Artificial Neural Network (ANN)
• Inspired by the structure and function of the human brain.
• Computational model that consists of interconnected nodes, called
neurons, which are organized into layers.
• Each neuron receives input from other neurons, processes that
input, and then sends an output signal to other neurons in the
network.
• The input layer receives input data, which is then passed through
one or more hidden layers, where the data is transformed and
processed.
• The final output is produced by the output layer, which provides
the result of the computation.
• ANNs can be used for a wide range of machine learning tasks, such as:
Ø Image and speech recognition
Ø Natural language processing and
Ø Time-series prediction.
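The flow from input layer through a hidden layer to the output layer can be sketched as a forward pass. The weights, biases, layer sizes, and the choice of a sigmoid activation are hypothetical values picked for illustration; a real network would learn its weights from data:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dense_layer(inputs, weights, biases):
    """Each neuron takes a weighted sum of its inputs plus a bias, then applies sigmoid."""
    return [
        sigmoid(sum(w * x for w, x in zip(neuron_weights, inputs)) + b)
        for neuron_weights, b in zip(weights, biases)
    ]

# Hypothetical hand-picked parameters: 2 inputs -> 2 hidden neurons -> 1 output.
hidden_weights = [[0.5, -0.4], [0.3, 0.8]]
hidden_biases = [0.0, -0.1]
output_weights = [[1.2, -0.7]]
output_biases = [0.05]

def forward(inputs):
    hidden = dense_layer(inputs, hidden_weights, hidden_biases)   # hidden layer
    return dense_layer(hidden, output_weights, output_biases)     # output layer

output = forward([1.0, 2.0])
print(output)
```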
Clustering
• Type of unsupervised machine learning algorithm.
• Groups similar data points together based on their similarity or
distance to one another.
• The goal is to identify patterns in data without any prior knowledge
of the structure of the data or the labels associated with the data.
• There are many different clustering algorithms, each with its own
strengths and weaknesses.
• The most common clustering algorithms include:
Ø k-means clustering
Ø hierarchical clustering and
Ø density-based clustering
• Clustering has many applications in machine learning and data
analysis, such as :
Ø Customer segmentation
Ø Image and text analysis and
Ø Anomaly detection
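As an illustration of grouping points by distance, here is a minimal k-means sketch. The sample points and the choice of initial centroids are assumptions made for the example; practical implementations usually pick initial centroids randomly and check for convergence:

```python
import math

def k_means(points, centroids, iterations=10):
    """Plain k-means: alternate between assigning points and recomputing centroids."""
    clusters = []
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else c
            for cluster, c in zip(clusters, centroids)
        ]
    return centroids, clusters

# Hypothetical unlabeled 2-D points forming two visible groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, clusters = k_means(points, centroids=[points[0], points[3]])
print(centroids)
```

No labels are used anywhere: the grouping comes only from distances between points.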
Decision Tree Learning
• Type of supervised machine learning algorithm that is used for
classification and regression tasks.
• Works by constructing a decision tree that represents a sequence
of decisions and their possible consequences.
• Each node in the decision tree represents a decision based on the
value of an input feature, and each branch represents the
possible outcomes of that decision.
• The leaf nodes of the tree represent the final prediction or
decision based on the input features.
• Decision trees have many advantages, such as being easy to
interpret and visualize, and requiring minimal data preparation.
• There are many different algorithms and variations of decision tree
learning, such as CART (Classification and Regression Trees), ID3
(Iterative Dichotomiser 3), and C4.5.
Bayesian Networks
• Based on Bayes' theorem, which is a mathematical formula that
describes how to update the probability of a hypothesis given new
evidence.
• One application of Bayesian learning is Bayesian networks.
• A Bayesian network is a probabilistic graphical model that uses
Bayesian inference for probability computations.
• Bayesian networks are ideal for taking an event that occurred and
predicting the likelihood that any one of several possible known
causes was the contributing factor.
• Bayesian networks can be used for modeling complex systems and
making predictions in various domains.
• They are often used in applications such as:
Ø Natural language processing
Ø Computer vision and
Ø Robotics
Support Vector Machine
• Support Vector Machine (SVM) is a type of supervised machine
learning algorithm that can be used for classification or regression
tasks.
• SVM tries to find the best possible boundary between the
different classes in the data by finding a hyperplane that
maximizes the margin between the two classes.
• The data points are represented as vectors in a high-dimensional
space.
• It is often used in applications such as image recognition, text
classification, and bioinformatics.
Genetic Algorithm
• Genetic algorithm is a type of optimization algorithm.
• It is inspired by the process of natural selection in biological
evolution.
• It starts by generating a set of random solutions and then
tests them (calculates their fitness) to see how well they perform.
• The best solutions are then combined to create new solutions, just
like how a baby inherits traits from its parents.
• The process is repeated until a good enough solution is found or a
stopping criterion is met.
• Commonly used to generate high-quality solutions for optimization
problems and search problems.
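The generate, test, and recombine loop can be sketched as follows. The objective (count the 1-bits in a bit string), the population size, and the mutation rate are all assumptions chosen to keep the example small:

```python
import random

rng = random.Random(42)
GENES, POP_SIZE, GENERATIONS = 20, 30, 40

def fitness(individual):
    """Toy objective (an assumption for illustration): count the 1-bits."""
    return sum(individual)

def crossover(parent_a, parent_b):
    point = rng.randrange(1, GENES)           # single-point crossover
    return parent_a[:point] + parent_b[point:]

def mutate(individual, rate=0.02):
    return [1 - g if rng.random() < rate else g for g in individual]

# Start from random solutions.
population = [[rng.randint(0, 1) for _ in range(GENES)] for _ in range(POP_SIZE)]
initial_best = max(map(fitness, population))

for _ in range(GENERATIONS):
    population.sort(key=fitness, reverse=True)
    parents = population[: POP_SIZE // 2]      # selection: keep the fitter half
    children = [
        mutate(crossover(rng.choice(parents), rng.choice(parents)))
        for _ in range(POP_SIZE - len(parents))
    ]
    population = parents + children            # parents survive unchanged

best = max(population, key=fitness)
print(fitness(best), "out of", GENES)
```

Because the fitter half always survives unchanged, the best fitness in the population can never decrease from one generation to the next.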
Issues in machine learning and data science
There are many issues and challenges in machine learning and data
science. Some of the most significant ones include:
• Data quality: Machine learning algorithms are only as
good as the data they are trained on. If the data is of poor quality,
contains errors or inconsistencies, or is biased in any way, the
resulting model will also be flawed.
• Interpretability: As machine learning models become more
complex, it can be difficult to understand how they arrived at their
predictions or decisions.
• Privacy and security: Machine learning algorithms often require
access to large amounts of sensitive data, such as personal
information, medical records, or financial data.
• Overfitting and underfitting: Machine learning models can suffer
from overfitting or underfitting, which can result in poor
performance on new data.
• Limited data: In some domains, such as medical research or rare
event prediction, there may be limited data available for training
machine learning models.
• Computational cost: Machine learning algorithms can be
computationally intensive and may require significant amounts of
processing power or specialized hardware to run efficiently.
Regression
• Regression is a statistical method used to model the relationship
between a dependent variable and one or more independent
variables.
• The goal of regression analysis is to identify the strength of the
relationship between the variables and to use that relationship to
make predictions about the dependent variable based on the
values of the independent variables.
• The model can take various forms, such as linear regression, logistic
regression, and polynomial regression, depending on the nature of
the relationship between the variables.
Linear Regression
• It is a statistical method used to establish a relationship between a
dependent variable and one or more independent variables.
• Linear regression makes predictions for continuous/real or
numeric values such as sales, salary, age, product price, etc.
• The goal of the analysis is to find the line of best fit that represents
the relationship between the two variables.
• The equation for a simple linear regression model is:
y = mx + b
• where y is the dependent variable, x is the independent variable, m
is the slope of the line, and b is the y-intercept.
• It can be used for both prediction and inference (testing hypotheses).
• The linear regression model provides a sloped straight line
representing the relationship between the variables.
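For the simple model y = mx + b, the line of best fit can be computed in closed form with ordinary least squares. This is a minimal sketch on made-up data points that lie exactly on y = 2x + 1, so the recovered slope and intercept are easy to check:

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = m*x + b with one independent variable."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # Slope: covariance of x and y divided by the variance of x.
    m = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    b = mean_y - m * mean_x        # the fitted line passes through the means
    return m, b

xs = [1.0, 2.0, 3.0, 4.0]          # hypothetical independent variable
ys = [3.0, 5.0, 7.0, 9.0]          # hypothetical dependent variable (y = 2x + 1)
m, b = fit_line(xs, ys)
prediction = m * 5.0 + b           # predict y for a new x
print(m, b, prediction)
```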
Types of Linear Regression
Linear regression can be further divided into two types:
Simple Linear Regression
Ø A single independent variable is used to predict the value of
a numerical dependent variable.
Ø The line is represented by the equation:
y = b0 + b1*x
Ø where y is the dependent variable, x is the independent
variable, b0 is the y-intercept, and b1 is the slope of the line.
Multiple Linear Regression
Ø More than one independent variable is used to predict the
value of a numerical dependent variable.
Ø The equation for a multiple linear regression model is:
y = b0 + b1*x1 + b2*x2 + ... + bn*xn
Ø where y is the dependent variable, x1, x2, ..., xn are the
independent variables, b0 is the y-intercept, and b1, b2, ..., bn
are the coefficients (slopes) of the independent variables.
Linear Regression Line
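A straight line showing the relationship between the dependent and
independent variables is called a regression line. A regression line
can show two types of relationship: a positive linear relationship
(the dependent variable increases as the independent variable
increases) or a negative linear relationship (the dependent variable
decreases as the independent variable increases).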
Logistic Regression
• Logistic regression is a statistical method used to model the
relationship between a dependent variable and one or more
independent variables when the dependent variable is binary or
categorical.
• It is used to predict the probability of an event occurring based on
the values of the independent variables.
• The dependent variable is usually modeled as a binary variable,
with two possible outcomes such as 0 or 1, yes or no, or true or
false.
• For example, it could be used to predict whether a customer will
buy a product based on their age, income, and gender.
• The logistic regression model uses a function called the logistic
function or sigmoid function to transform the output of a linear
equation into a value between 0 and 1.
• The equation for logistic regression is:
P(y=1) = 1 / (1 + e^(-z))
• where P(y=1) is the probability of the dependent variable y being
equal to 1, z is the linear combination of the independent
variables, and e is the base of the natural logarithm.
• Logistic regression is commonly used in fields such as healthcare,
finance, and marketing to predict the likelihood of outcomes such
as disease diagnosis, credit risk, or customer behavior.
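The sigmoid transformation can be sketched directly from the formula P(y=1) = 1 / (1 + e^(-z)). The coefficients below are hypothetical, not fitted to any data; in practice they would be estimated from training examples:

```python
import math

def sigmoid(z):
    """Logistic function: maps any real z to a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_probability(features, weights, bias):
    # z is the linear combination of the independent variables.
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)

# Hypothetical coefficients for (age, income in thousands) -> P(customer buys).
weights, bias = [0.04, 0.02], -3.0
p = predict_probability([35.0, 60.0], weights, bias)
label = 1 if p >= 0.5 else 0       # threshold the probability to get a class
print(round(p, 3), label)
```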
Bayesian Learning
• Bayesian learning is a fundamental statistical approach to the
problem of pattern classification.
• In Bayesian learning, a model is specified with parameters that
describe the probability distribution of the data.
• These parameters are initially set using prior knowledge or
assumptions, and then updated based on the observed data using
Bayes' theorem.
• The updated probabilities are used to make predictions about new
data.
• Bayesian learning is useful in situations where there is limited data
or where the data is noisy or uncertain.
• Some examples of Bayesian learning applications include :
Ø spam filtering,
Ø medical diagnosis,
Ø image recognition,
Ø natural language processing,
Ø speech recognition and
Ø recommendation systems.
Bayes' Theorem
• Bayes' theorem is a mathematical concept that describes the
relationship between conditional probabilities of two events.
• It is used in artificial intelligence and machine learning to build
probabilistic models and make decisions based on uncertain
data.
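In its usual form the theorem reads P(H|E) = P(E|H) * P(H) / P(E). The sketch below applies it to a diagnostic test; the prevalence, sensitivity, and false-positive rate are hypothetical numbers chosen only to show the calculation:

```python
def bayes_posterior(prior, likelihood, false_positive_rate):
    """P(H | E) via Bayes' theorem, with P(E) expanded by total probability."""
    evidence = likelihood * prior + false_positive_rate * (1.0 - prior)
    return likelihood * prior / evidence

# Hypothetical numbers: 1% prevalence, 95% sensitivity, 5% false-positive rate.
posterior = bayes_posterior(prior=0.01, likelihood=0.95, false_positive_rate=0.05)
print(round(posterior, 3))
```

Even with a fairly accurate test, the posterior stays low here because the prior probability of the hypothesis is small.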
Concept learning
• Concept learning refers to the process of building a probabilistic
model that can recognize and generalize patterns in data and use
that knowledge to classify new examples.
• Bayesian concept learning involves using Bayesian inference to
update the probabilities of hypotheses based on observed data.
• Bayesian concept learning is particularly useful in situations where
there is uncertainty or noise in the data.
• It allows us to incorporate our prior knowledge and beliefs into the
model.
The process of Bayesian concept learning involves:
Ø Defining a prior probability distribution over the space of possible
hypotheses, which represents our prior beliefs about the
classification task.
Ø Collecting a set of training examples, where each example is labeled
with the correct classification.
Ø Using the training examples and Bayes' theorem to compute the
posterior probability distribution over the space of possible
hypotheses, which represents our updated beliefs about the
classification task.
Ø Using the posterior distribution to make predictions about new,
unseen examples, by selecting the hypothesis that has the highest
posterior probability.
Bayes optimal classifier
• The Bayes Optimal Classifier is a probabilistic model that predicts
the most probable outcome for a new instance using the Bayes
Theorem.
• The Bayes Optimal Classifier chooses the class that has the greatest
a posteriori probability of occurrence, and it is the theoretically
optimal classifier for a given classification problem.
• It’s also related to Maximum a Posteriori (MAP), a probabilistic
framework for determining the most likely hypothesis for a training
dataset.
Take a hypothesis space that has 3 hypotheses h1, h2, and h3.
The posterior probabilities of the hypotheses are as follows:
h1 -> 0.4, h2 -> 0.3, h3 -> 0.3
Hence, h1 is the MAP hypothesis. (MAP => max posterior)
• This model is also referred to as the Bayes optimal learner, the
Bayes classifier, Bayes optimal decision boundary, or the Bayes
optimal discriminant function.
• A Bayes optimal classifier is a system that classifies new cases
according to the equation:
v* = argmax_{vj in V} sum_{hi in H} P(vj|hi) * P(hi|D)
• where vj is a candidate classification for the new instance, V is
the set of possible classifications, H is the set of hypotheses, hi is
a given hypothesis, P(vj|hi) is the probability of vj given
hypothesis hi, and P(hi|D) is the posterior probability of the
hypothesis hi given the data D.
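The difference between picking the MAP hypothesis and the Bayes optimal classification can be sketched with the posteriors from the example above (h1 = 0.4, h2 = 0.3, h3 = 0.3). The labels each hypothesis predicts are an assumption added for illustration:

```python
posteriors = {"h1": 0.4, "h2": 0.3, "h3": 0.3}       # P(h | D)
# Assumption for illustration: each hypothesis predicts one label with certainty.
predictions = {"h1": "+", "h2": "-", "h3": "-"}

# MAP: take the single most probable hypothesis and use its prediction.
map_hypothesis = max(posteriors, key=posteriors.get)
map_label = predictions[map_hypothesis]

def bayes_optimal(labels):
    # For each candidate label v, sum P(v | h) * P(h | D) over all hypotheses.
    # Here P(v | h) is 1 if h predicts v and 0 otherwise.
    scores = {
        v: sum(p for h, p in posteriors.items() if predictions[h] == v)
        for v in labels
    }
    return max(scores, key=scores.get), scores

optimal_label, scores = bayes_optimal(["+", "-"])
print(map_label, optimal_label, scores)
```

Under these assumed predictions the MAP hypothesis h1 says "+", but the combined weight of h2 and h3 makes "-" the more probable classification.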
Outlook      Yes  No   P(Outlook|Yes)  P(Outlook|No)
Sunny        2    3    2/9             3/5
Overcast     4    0    4/9             0/5
Rain         3    2    3/9             2/5
Total        9    5    100%            100%

Temperature  Yes  No   P(Temp|Yes)     P(Temp|No)
Hot          2    2    2/9             2/5
Mild         4    2    4/9             2/5
Cold         3    1    3/9             1/5
Total        9    5    100%            100%

Play         Count     Probability
Yes          9         9/14
No           5         5/14
Total        14        100%
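P(Yes) * P(Sunny|Yes) * P(Hot|Yes) = 9/14 * 2/9 * 2/9 = 0.0317
P(No) * P(Sunny|No) * P(Hot|No) = 5/14 * 3/5 * 2/5 = 0.0857
Normalizing so that the two values sum to 1:
P(Yes|Sunny, Hot) = 0.27    P(No|Sunny, Hot) = 0.73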
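The scores for the instance (Sunny, Hot) can be recomputed from the counts in the tables by multiplying each class prior by the per-feature conditional probabilities. This is a sketch using exact fractions; the dictionary layout and function name are illustrative choices:

```python
from fractions import Fraction

# Class priors and conditional probabilities taken from the tables above.
prior = {"yes": Fraction(9, 14), "no": Fraction(5, 14)}
likelihood = {
    "yes": {"sunny": Fraction(2, 9), "hot": Fraction(2, 9)},
    "no": {"sunny": Fraction(3, 5), "hot": Fraction(2, 5)},
}

def score(label, features):
    """Unnormalized score: P(label) times the product of P(feature | label)."""
    result = prior[label]
    for feature in features:
        result *= likelihood[label][feature]
    return result

scores = {label: score(label, ["sunny", "hot"]) for label in prior}
total = sum(scores.values())
# Dividing by the total turns the scores into probabilities that sum to 1.
posterior = {label: float(s / total) for label, s in scores.items()}
print({k: float(v) for k, v in scores.items()}, posterior)
```

The unnormalized scores come out to about 0.0317 for Yes and 0.0857 for No, so "No" is the predicted class for a sunny, hot day.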
Naive Bayes classifier
• A Naive Bayes classifier is a probabilistic machine learning model
that is used for classification tasks. The classifier is based on
Bayes' theorem.
• Naive Bayes is not a single algorithm but a family of classification
algorithms that share a common principle: every pair of features is
assumed to be independent of each other, given the class.
• Assumption: The fundamental Naive Bayes assumption is that each
feature makes an independent and equal contribution to the outcome.
• Note: The assumptions made by Naive Bayes are not generally
correct in real-world situations.
Since the denominator is the same for every class, we can drop it when
comparing classes. Doing so is optional, but it saves time and
calculations.
Now if we send our test data, suppose test = (Cow, Medium, Black)
Probability of petting an animal :
Bayesian belief network
• Bayesian Belief Network is a Probabilistic Graphical Model (PGM)
that represents conditional dependencies between random
variables through a Directed Acyclic Graph (DAG).
• It is also called a Bayes network, belief network, decision network,
or Bayesian model.
• The Bayesian Belief Network is based on the Bayes Theorem.
• A Bayesian network can be used for building models from data and
expert opinion, and it consists of two parts:
Ø Directed Acyclic Graph
Ø Table of conditional probabilities
• The generalized form of a Bayesian network that represents and
solves decision problems under uncertain knowledge is known as an
influence diagram.
• A Bayesian network graph is made up of nodes and arcs (directed
links), where:
• The nodes in the graph represent random variables and the edges
that connect the nodes represent the relationships between the
random variables.
• Bayesian networks provide useful benefits as a probabilistic model.
For example:
• Visualization : The model provides a direct way to visualize
the structure of the model and motivate the design of new
models.
• Relationships : Provides insights into the presence and
absence of the relationships between random variables.
• Computations : Provides a way to structure complex
probability calculations.
E-M Algorithm
• The E-M (Expectation-Maximization) algorithm is an iterative
optimization method used to estimate the parameters of
probabilistic models when there are missing or incomplete data.
• It is a technique for finding maximum likelihood estimates when
latent (unobserved) variables are present, so it is commonly applied
to latent variable models.
• The EM algorithm underlies several unsupervised machine learning
methods, such as clustering with Gaussian mixture models.
• Here's a high-level explanation of the E-M algorithm:
1. Initialization: Start by initializing the parameters of the model
with some initial values.
2. Expectation step (E - step): Using the current parameters, estimate
(guess) all missing values in the dataset, so that after completing
this step no value is missing.
3. Maximization step (M - step): Use the data completed in the E-step
to update the parameters.
4. Iteration: Repeat the E-step and M-step until the values converge.
Convergence is reached when the parameter estimates (or the
likelihood) change by less than a small threshold between iterations.
The E-M algorithm keeps going back and forth between estimating
probabilities and updating the guess, gradually improving the
accuracy of the guess.
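The back-and-forth between the two steps can be sketched for a mixture of two one-dimensional Gaussians, where the "missing" information is which component generated each point. The data, initial means, and fixed iteration count are assumptions for illustration; a fuller implementation would stop based on a convergence check:

```python
import math

def gaussian(x, mean, var):
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

def em_two_gaussians(data, initial_means, iterations=50):
    means = list(initial_means)
    variances, weights = [1.0, 1.0], [0.5, 0.5]
    for _ in range(iterations):
        # E-step: estimate how responsible each component is for each point.
        resp = []
        for x in data:
            likes = [w * gaussian(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(likes)
            resp.append([like / total for like in likes])
        # M-step: update the parameters using the responsibility-weighted data.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk
            variances[k] = max(var, 1e-6)      # guard against a collapsing variance
            weights[k] = nk / len(data)
    return means, variances, weights

# Hypothetical data drawn around two centers, 1.0 and 5.0.
data = [1.0, 1.2, 0.8, 1.1, 0.9, 5.0, 5.2, 4.8, 5.1, 4.9]
means, variances, weights = em_two_gaussians(data, initial_means=[0.0, 6.0])
print(means, weights)
```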
Support vector machine
• Support Vector Machine or SVM is a supervised Learning algorithm,
which is used for Classification as well as Regression problems.
• The goal of the SVM algorithm is to find the best decision
boundary, which is called a hyperplane.
• The hyperplane segregates n-dimensional space into classes so that
we can easily put a new data point in the correct category in the
future.
• SVM chooses the extreme points/vectors that help in creating the
hyperplane.
• These extreme cases are called support vectors.
• SVM algorithm can be used for Face detection, image classification,
text categorization, etc.
Types of SVM
• Linear SVM
Linear SVM is used for linearly separable data. If a dataset can be
classified into two classes by using a single straight line, the data
is termed linearly separable, and the classifier used is called a
Linear SVM classifier.
• Non-linear SVM
Non-linear SVM is used for data that is not linearly separable. If a
dataset cannot be classified by using a straight line, the data is
termed non-linear, and the classifier used is called a Non-linear
SVM classifier.
Kernel
• A kernel is a function that is used to transform the original feature
space into a higher-dimensional space.
• The kernel function enables SVM to effectively handle non-linear
classification problems. It allows SVM to find a non-linear
decision boundary by working in the transformed feature space, even
though the original data may not be linearly separable.
Types of Support vector kernel
1. Linear Kernel
• The linear kernel is the simplest kernel function.
• It computes the dot product of two input vectors in the original
feature space, without introducing any new dimensions.
• It works well when the data is linearly separable, meaning the
classes can be separated by a straight line or hyperplane.
• The linear kernel is mostly preferred for text-classification
problems.
2. Polynomial Kernel
• The polynomial kernel implicitly introduces new dimensions made up
of polynomial combinations of the original features, up to a
specified degree.
• It captures non-linear relationships between the data points.
3. Gaussian Kernel
• The Gaussian kernel is also known as the Radial Basis Function
(RBF) kernel.
• The RBF kernel is widely used and is effective for handling non-
linearly separable data.
• It maps the data into an infinite-dimensional space where each
data point is represented as a Gaussian function.
• The RBF kernel is K(x1, x2) = exp(-gamma * ||x1 - x2||^2).
• The gamma parameter controls the influence of each training
example on the decision boundary.
• Here, x1 and x2 are two data points (feature vectors) from the data
you are trying to classify.
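The three kernels can be written as small functions that take two feature vectors and return a similarity value. This is a sketch in the commonly used forms; the default values for `degree`, `coef0`, and `gamma` are illustrative assumptions, not recommendations:

```python
import math

def linear_kernel(x1, x2):
    """Dot product in the original feature space."""
    return sum(a * b for a, b in zip(x1, x2))

def polynomial_kernel(x1, x2, degree=2, coef0=1.0):
    """(x1 . x2 + coef0) ** degree"""
    return (linear_kernel(x1, x2) + coef0) ** degree

def rbf_kernel(x1, x2, gamma=0.5):
    """exp(-gamma * ||x1 - x2||^2); equals 1 for identical points."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x1, x2))
    return math.exp(-gamma * squared_distance)

a, b = [1.0, 2.0], [2.0, 0.0]      # two hypothetical data points
print(linear_kernel(a, b), polynomial_kernel(a, b), rbf_kernel(a, b))
```

A larger gamma makes the RBF value fall off faster with distance, so each training example influences a smaller region.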
Hyperplane
• A hyperplane is a decision boundary that separates different
classes or groups of data points in a higher-dimensional feature
space.
• A hyperplane is a subspace with one less dimension than the
feature space.
• If there are 2 features, the hyperplane is a straight line, and
if there are 3 features, the hyperplane is a 2-dimensional
plane.
• We always choose the hyperplane that has the maximum margin, which
means the maximum distance between the hyperplane and the nearest
data points of each class.
Properties of SVM
Support Vector Machines (SVM) have several important properties
that contribute to their effectiveness and popularity as a machine
learning algorithm:
1. Maximal Margin: SVM aims to find a hyperplane that maximizes
the margin, which is the distance between the hyperplane and the
closest data points from each class.
2. Non-linearity with Kernels: SVM can handle non-linearly
separable data by utilizing kernel functions. Kernels transform the
data into a higher-dimensional feature space where a linear
decision boundary can be found.
3. Support Vectors: SVM uses a subset of the training data called
support vectors. These are the data points that are closest to the
decision boundary.
4. Sparsity: SVM often has a sparse solution, meaning that the
decision boundary is determined by only a small number of
support vectors rather than the entire training dataset.
5. Versatility: SVM can be applied to both classification and
regression tasks.
6. Control of Complexity: SVM provides control over the model's
complexity through the choice of hyperparameters, such as the
regularization parameter C and the kernel parameters.
Issues in SVM
1. Scalability with Large Datasets: SVMs can become computationally
expensive and memory-intensive, particularly with large-scale
datasets.
2. Lack of Probabilistic Interpretation: SVMs inherently provide a
decision boundary that separates classes, but they do not directly
provide probabilistic outputs.
3. Imbalanced Data: When dealing with imbalanced datasets where
the number of samples in different classes is significantly unequal,
SVMs may be biased towards the majority class.
4. Interpretability: SVMs tend to provide good predictive performance,
but they may not offer direct interpretability of the learned model.
Agenda
• Introduction to Decision Tree Learning
• Inductive Bias
• Inductive Inference
• Example - Tennis Dataset
• Entropy
• Gini Impurity
• Information Gain
• Issues in Decision tree learning
Decision Tree Learning
• Type of supervised machine learning algorithm that is used for
classification and regression tasks.
• Works by constructing a decision tree that represents a sequence
of decisions and their possible consequences.
• Each node in the decision tree represents a decision based on the
value of an input feature, and each branch represents the possible
outcomes of that decision.
• The leaf nodes of the tree represent the final prediction or
decision based on the input features.
• The goal is to create a model that predicts the value of a target
variable by learning simple decision rules inferred from the data
features.
Inductive Bias
• The inductive bias refers to the assumptions or constraints that
shape how the decision tree algorithm builds the tree and makes
predictions.
• The decision tree algorithm has an inductive bias towards using
hierarchical if-else rules to represent the relationships between
features and the target variable.
• It recursively splits the data on one feature at a time (binary
splits in CART, multiway splits in ID3).
• Each split is chosen by evaluating features one at a time, which
means that the algorithm treats features as unrelated when making
splitting decisions.
Inductive Inference
• Inductive inference refers to the process of drawing general
conclusions or making predictions based on limited observations
or examples.
• We start with a set of specific observations or examples and aim to
derive general principles or rules that can be applied to new,
unseen situations.
• The goal is to make reliable predictions or generalizations beyond
the observed data.
• Inductive inference involves reasoning from specific instances to
general principles.
Example Tennis Dataset
Entropy
• Entropy measures the amount of uncertainty or randomness in a
set of data.
• It provides a measure of how unpredictable or uncertain the
outcomes of a random variable are.
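Entropy is commonly computed as H(S) = -sum(p_i * log2(p_i)) over the class proportions p_i. A minimal sketch, applied to the 9 "Yes" and 5 "No" totals from the weather tables earlier in this material:

```python
import math

def entropy(counts):
    """Shannon entropy (in bits) of a class distribution given as counts."""
    total = sum(counts)
    # Classes with zero examples contribute nothing, so they are skipped.
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

play_entropy = entropy([9, 5])      # 9 "Yes" and 5 "No" examples
print(round(play_entropy, 3))
```

An evenly split set such as `entropy([7, 7])` gives the maximum of 1 bit for two classes, while a pure set such as `entropy([4, 0])` gives 0.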
Gini Impurity
• Gini impurity is a measure of impurity or disorder used in decision
tree algorithms for classification tasks.
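Gini impurity is commonly computed as 1 - sum(p_i^2) over the class proportions p_i. A minimal sketch, again using the 9 "Yes" and 5 "No" totals from the weather tables as the example set:

```python
def gini_impurity(counts):
    """Gini impurity of a class distribution given as counts."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

play_gini = gini_impurity([9, 5])   # 9 "Yes" and 5 "No" examples
print(round(play_gini, 4))
```

For two classes the value ranges from 0 (a pure node) to 0.5 (an even split).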
Entropy vs. Gini Impurity
Information gain
• Information gain is a concept used in decision tree algorithms for
feature selection.
• It helps determine which feature is the most informative or useful
for making predictions.
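Information gain for a feature A is usually computed as IG(S, A) = H(S) - sum over values v of (|S_v| / |S|) * H(S_v), where S_v is the subset of rows with A = v. The sketch below assumes rows are Python dictionaries; the toy rows and the names `information_gain`, `windy`, `cloudy`, and `play` are hypothetical and are not the dataset used in the slides.

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    return sum((c / total) * math.log2(total / c) for c in Counter(labels).values())

def information_gain(rows, feature, target):
    """Entropy of the target minus the weighted entropy after splitting on `feature`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[feature]].append(row[target])
    before = entropy([row[target] for row in rows])
    after = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return before - after

# Hypothetical toy rows for illustration only.
rows = [
    {"windy": "yes", "cloudy": "yes", "play": "no"},
    {"windy": "yes", "cloudy": "no", "play": "no"},
    {"windy": "no", "cloudy": "yes", "play": "yes"},
    {"windy": "no", "cloudy": "no", "play": "yes"},
]
print(information_gain(rows, "windy", "play"))   # windy separates the classes perfectly
print(information_gain(rows, "cloudy", "play"))  # cloudy leaves the classes evenly mixed
```

A feature that separates the classes perfectly recovers all of the original entropy, while a feature whose subsets are as mixed as the full set gains nothing.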
Issues in Decision Tree learning
1. Overfitting: Decision trees can easily overfit the training data if
they become too complex and capture noise or irrelevant patterns.
2. High Variance: Decision trees are prone to high variance, meaning
they can be sensitive to small changes in the training data.
3. Handling Missing Data: Basic decision tree algorithms cannot use
missing values directly; common workarounds are either ignoring
instances with missing values or filling in (imputing) those values.
4. Handling Imbalanced Data: Decision trees can be biased towards
majority classes in imbalanced datasets, leading to poor
predictions for minority classes.
5. Computational Complexity: Constructing decision trees can be
computationally expensive, especially for large datasets.
ID3 Algorithm
• The ID3 (Iterative Dichotomiser 3) algorithm is a decision tree
learning algorithm.
• It is used for classification tasks and is based on the concept of
information gain.
• The algorithm follows a greedy approach: at each step it selects
the attribute that yields the maximum Information Gain (IG), that is,
the lowest weighted Entropy (H) after the split.
• The algorithm iteratively (repeatedly) dichotomizes (divides) the
data into two or more groups at each step.
• ID3 uses a top-down greedy approach to build a decision tree.
• The top-down approach means that we start building the tree
from the top and the greedy approach means that at each iteration
we select the best feature at the present moment to create a node.
ID3 Steps
➢ Calculate the Information Gain of each feature.
➢ Make a decision tree node using the feature with the maximum
Information Gain.
➢ Split the dataset into subsets based on the selected feature's
values.
➢ If all rows in a subset belong to the same class, make the current
node a leaf node with that class as its label.
➢ Repeat on each subset with the remaining features until we run out
of features or every branch ends in a leaf node.
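The steps above can be sketched as a short recursive function. This is a minimal illustration under stated assumptions, not a full ID3 implementation: rows are dictionaries, the tree is a nested dictionary, a majority vote is used when features run out, and the toy rows and the names `id3`, `predict`, `a`, `b`, and `label` are hypothetical.

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    total = len(labels)
    return sum((c / total) * math.log2(total / c) for c in Counter(labels).values())

def split(rows, feature):
    """Group rows by their value for `feature`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[feature]].append(row)
    return groups

def information_gain(rows, feature, target):
    after = sum(len(g) / len(rows) * entropy([r[target] for r in g])
                for g in split(rows, feature).values())
    return entropy([r[target] for r in rows]) - after

def id3(rows, features, target):
    labels = [r[target] for r in rows]
    # Leaf: all rows share one class, or no features remain (assumed fallback: majority class).
    if len(set(labels)) == 1 or not features:
        return Counter(labels).most_common(1)[0][0]
    # Greedy choice: the feature with the highest information gain on these rows.
    best = max(features, key=lambda f: information_gain(rows, f, target))
    remaining = [f for f in features if f != best]
    return {best: {value: id3(subset, remaining, target)
                   for value, subset in split(rows, best).items()}}

def predict(tree, row):
    """Walk the nested dictionary until a leaf (a class label) is reached."""
    while isinstance(tree, dict):
        feature = next(iter(tree))
        tree = tree[feature][row[feature]]
    return tree

# Hypothetical toy rows for illustration only.
rows = [
    {"a": "yes", "b": "yes", "label": "pos"},
    {"a": "yes", "b": "no", "label": "neg"},
    {"a": "no", "b": "yes", "label": "neg"},
    {"a": "no", "b": "no", "label": "neg"},
]
tree = id3(rows, ["a", "b"], "label")
print(tree)
print(predict(tree, {"a": "yes", "b": "yes"}))
```

Note that `predict` raises a `KeyError` for a feature value never seen during training; a practical implementation would need a fallback for that case.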
IG calculation for Fever:
For the Fever feature, 8 rows have the value YES and 6 rows have the
value NO.
The block below demonstrates the calculation of Information Gain for
Fever.
Next, we calculate the IG for the features “Cough” and “Breathing
Issues”.
Since the feature Breathing Issues has the highest Information Gain,
it is used to create the root node.
Hence, after this initial step our tree looks like this:
Next, from the remaining two unused features, namely, Fever and
Cough, we decide which one is the best for the left branch of
Breathing Issues.
Since the left branch of Breathing Issues denotes YES, we will work
with a subset of the original data, i.e., the set of rows having YES
as the value in the Breathing Issues column. These 8 rows are shown
below:
Next, we calculate the IG for the features Fever and Cough using the
subset Sʙʏ (Set Breathing Issues Yes), which is shown above:
Note: For the IG calculation, the Entropy is calculated from the
subset Sʙʏ and not from the original dataset S.
IG of Fever is greater than that of Cough, so we select Fever as the left
branch of Breathing Issues.
Our tree now looks like this:
Instance Based Learning
• Instance-based learning refers to a class of algorithms that make
predictions or classifications based on specific instances or
examples from the training data.
• It works without explicitly building a general model.
• These algorithms store the training instances and use them directly
during the prediction phase.
• The k-nearest neighbors algorithm and case-based reasoning are
examples of instance-based learning algorithms.
Model Based Learning
• It is a machine learning approach that involves building a model
from the training data in order to make predictions or
classifications on new, unseen instances.
• Model-based learning focuses on constructing an explicit
representation of the underlying patterns or relationships in the
data.
• Here are a few examples of model-based learning algorithms and
techniques:
➢ Linear Regression
➢ Decision tree
➢ Support vector machine
K-Nearest Neighbors (K-NN)
• The k-nearest neighbors (k-NN) algorithm is a popular instance-
based learning algorithm used for classification and regression
tasks in machine learning.
• It is a non-parametric algorithm, meaning it does not make any
assumptions about the underlying data distribution.
• The KNN algorithm assumes that similar things exist in close
proximity.
• KNN captures the idea of similarity (sometimes called distance,
proximity, or closeness) mathematically, by calculating the distance
between data points.
Steps for Algorithm
1. Prepare the Data
2. Choose the Value of k (a common rule of thumb is k = sqrt(n), where
n is the number of data points in the training data; an odd value of
k is preferred to avoid tied votes between two classes)
3. Calculate Distance (for example, Euclidean distance, Manhattan
distance, or a cosine-based distance) between the query point and
each training point
4. Find the k Nearest Neighbors
5. Classify or Predict
6. Output the Result
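The steps above can be sketched in a few lines of Python. This is a minimal illustration assuming Euclidean distance and a simple majority vote; the function names and the training points are hypothetical and are not the data plotted in the slides.

```python
import math
from collections import Counter

def euclidean(p, q):
    """Straight-line distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def knn_classify(train, query, k):
    """train: list of (point, label) pairs. Returns the majority label of the k nearest points."""
    # Sort training points by distance to the query and keep the k closest.
    neighbors = sorted(train, key=lambda item: euclidean(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Hypothetical training points for illustration only.
train = [((1, 1), "blue"), ((2, 1), "blue"), ((1, 2), "blue"),
         ((8, 8), "red"), ((9, 8), "red"), ((8, 9), "red")]
print(knn_classify(train, (2, 2), k=3))
print(knn_classify(train, (7, 7), k=3))
```

Sorting every training point for each query is the simplest approach and is adequate for small datasets; larger datasets usually call for a spatial index.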
Example
Suppose we have a dataset which can be plotted as follows:
Now, we need to classify the new data point shown as a black dot (at
point (60, 60)) into the blue or the red class.
We assume K = 3, i.e., the algorithm finds the three nearest data
points. The diagram above shows the three nearest neighbors of the
black dot. Two of those three lie in the red class, so the black dot
is also assigned to the red class.
Locally weighted regression
• Locally Weighted Regression (LWR) is a non-parametric regression
algorithm used to model the relationship between variables.
• For each query point, it fits a linear regression model that gives
more weight to nearby data points.
• Ordinary linear regression assumes that the relationship between
the inputs and the output is approximately linear.
• But what if the data does not follow a linear trend? Can we still
apply the idea of regression? Yes: we can fit the regression locally,
and this is called locally weighted regression.
• LWR works by assigning weights to the known points based on
their proximity to the unknown point.
• The closer a known point is to the unknown point, the higher its
weight.
• This means that the known points that are closer to the unknown
point have a stronger influence on the prediction.
Here's how Locally Weighted Regression works:
1. Input
The algorithm takes as input a training dataset consisting of
instances with input variables (features) and corresponding
output values (targets), as well as a query instance for which the
regression prediction is desired.
2. Weight Calculation
For each training instance in the dataset, a weight is calculated
based on its proximity to the query instance. The most common
weight function used is the Gaussian kernel, which assigns higher
weights to instances closer to the query instance and lower weights
to instances farther away.
3. Local Regression
A weighted regression model is fit to the training instances using the
calculated weights.
4. Prediction
Once the local regression model is fitted, the prediction for the
query instance is made by applying the learned model to the query
instance's input variables.
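The four steps above can be sketched for a single input variable. This is a minimal illustration, assuming a Gaussian kernel w_i = exp(-(x_i - x_q)^2 / (2 * tau^2)) and a straight-line local model solved in closed form. The bandwidth name `tau`, the function name `lwr_predict`, and the sample data are assumptions made for the example.

```python
import math

def lwr_predict(xs, ys, query, tau):
    """Predict y at `query` with a locally weighted straight-line fit (1-D input)."""
    # Step 2: Gaussian kernel weights shrink as training points get farther from the query.
    ws = [math.exp(-((x - query) ** 2) / (2 * tau ** 2)) for x in xs]
    # Step 3: weighted sums needed for weighted least squares on y = intercept + slope * x.
    sw = sum(ws)
    swx = sum(w * x for w, x in zip(ws, xs))
    swy = sum(w * y for w, y in zip(ws, ys))
    swxx = sum(w * x * x for w, x in zip(ws, xs))
    swxy = sum(w * x * y for w, x, y in zip(ws, xs, ys))
    slope = (sw * swxy - swx * swy) / (sw * swxx - swx ** 2)
    intercept = (swy - slope * swx) / sw
    # Step 4: apply the local model to the query point.
    return intercept + slope * query

# Hypothetical non-linear data (y = x squared) for illustration only.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [x ** 2 for x in xs]
print(lwr_predict(xs, ys, query=2.5, tau=1.0))
```

A new local model is fitted for every query, so prediction cost grows with the size of the training set. A smaller `tau` makes the fit more local but also more sensitive to noise, and a very small `tau` can drive all weights toward zero.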
Radial Basis Function Network
• A Radial Basis Function Network (RBFN) is a type of artificial neural
network that uses radial basis functions as activation functions.
• RBFNs are commonly used for regression and classification tasks.
• An RBFN typically consists of three layers: an input layer, a hidden
layer with radial basis functions, and an output layer.
• The input layer receives the input features.
• The hidden layer contains radial basis function neurons, which
are responsible for transforming the input data to a higher-
dimensional space.
• The output layer performs the final regression or classification
based on the transformed data.
Here are the key components and characteristics of a Radial Basis
Function Network:
1. Architecture
2. Radial basis function
3. Center and weights
4. Training
5. Prediction
RBFNs have several advantages, including their ability to approximate
complex non-linear relationships, efficient training procedures, and
good generalization capabilities.
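The three-layer structure described above can be sketched as follows. This is a minimal illustration under several assumptions that the slides do not specify: Gaussian basis functions, one hidden unit centered on each training point, a shared width `sigma`, no output bias, and output weights obtained by solving a linear system exactly. All function names and the XOR-style data are hypothetical.

```python
import math

def rbf(x, center, sigma):
    """Gaussian radial basis function: depends only on the distance from x to the center."""
    dist_sq = sum((a - b) ** 2 for a, b in zip(x, center))
    return math.exp(-dist_sq / (2 * sigma ** 2))

def solve(A, b):
    """Solve A w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    w = [0.0] * n
    for i in reversed(range(n)):
        w[i] = (M[i][n] - sum(M[i][j] * w[j] for j in range(i + 1, n))) / M[i][i]
    return w

def train_rbfn(X, y, sigma):
    # Assumption: one hidden unit per training point, so the hidden-layer matrix is square.
    centers = [tuple(x) for x in X]
    hidden = [[rbf(x, c, sigma) for c in centers] for x in X]
    weights = solve(hidden, y)  # output-layer weights
    return centers, weights

def predict_rbfn(x, centers, weights, sigma):
    # Output layer: weighted sum of the hidden-layer activations.
    return sum(w * rbf(x, c, sigma) for w, c in zip(weights, centers))

# Hypothetical XOR-style data: not linearly separable in the input space.
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
y = [0.0, 1.0, 1.0, 0.0]
centers, weights = train_rbfn(X, y, sigma=1.0)
for x in X:
    print(x, round(predict_rbfn(x, centers, weights, sigma=1.0), 3))
```

Placing a center on every training point reproduces the training targets but does not scale; practical networks usually pick fewer centers (for example, by clustering) and fit the output weights by least squares.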
Case Based Learning
• Case-based learning, also known as case-based reasoning (CBR), is
a machine learning approach that solves new problems by reusing
solutions from similar past problems.
• It is a form of instance-based learning where the training instances
are called "cases" and are represented by a combination of
features and their corresponding solutions.
• Applications:
➢ pattern recognition
➢ diagnosis
➢ troubleshooting and planning
In case-based learning, the process involves four main steps:
1. Retrieve
Given a new problem or query, retrieve similar cases from the
case base.
2. Reuse
Adapt or modify the solution of the retrieved cases to fit the
current problem.
3. Revise
Assess the quality and appropriateness of the adapted solution.
4. Retain
Store the new problem, the adapted solution, and potentially the
revised solution as a new case in the case base for future use.
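The four steps above can be sketched as one small loop. This is a deliberately simplified illustration: cases are (feature tuple, solution) pairs, retrieval uses Euclidean distance, reuse copies the retrieved solution unchanged, and revision is delegated to a caller-supplied check. The case base contents and every function name here are hypothetical.

```python
import math

# Hypothetical case base: each case pairs a feature tuple with a stored solution.
case_base = [
    ((1.0, 1.0), "solution A"),
    ((5.0, 5.0), "solution B"),
]

def retrieve(problem, cases):
    """Retrieve: find the stored case whose features are closest to the new problem."""
    return min(cases, key=lambda case: math.dist(case[0], problem))

def reuse(case):
    """Reuse: simplest possible adaptation, take the retrieved solution unchanged."""
    return case[1]

def revise(problem, proposed, evaluate):
    """Revise: `evaluate` returns a corrected solution, or None if the proposal is acceptable."""
    corrected = evaluate(problem, proposed)
    return proposed if corrected is None else corrected

def retain(problem, solution, cases):
    """Retain: store the solved problem as a new case for future use."""
    cases.append((problem, solution))

def solve_with_cbr(problem, cases, evaluate):
    nearest = retrieve(problem, cases)
    proposed = reuse(nearest)
    final = revise(problem, proposed, evaluate)
    retain(problem, final, cases)
    return final

def accept_all(problem, proposed):
    # Placeholder check that accepts every proposed solution.
    return None

answer = solve_with_cbr((1.2, 0.9), case_base, accept_all)
print(answer, len(case_base))
```

In a real system the reuse and revise steps carry most of the domain knowledge; here they are stubs so that the control flow of the cycle stays visible.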