
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

UNIT III SUPERVISED LEARNING


Introduction to machine learning – Linear Regression Models: Least squares, single
& multiple variables, Bayesian linear regression, gradient descent, Linear Classification
Models: Discriminant function – Probabilistic discriminative model - Logistic regression,
Probabilistic generative model – Naive Bayes, Maximum margin classifier – Support vector
machine, Decision Tree, Random forests.

PART - A

1. What is Machine Learning?


Machine learning is a branch of computer science that deals with building systems that automatically learn and improve with experience. For example, robots are programmed so that they can perform tasks based on data they gather from sensors. In short, the system automatically learns programs from data.

2. Mention the difference between Data Mining and Machine learning?


Machine learning relates to the study, design and development of algorithms that give computers the capability to learn without being explicitly programmed. Data mining, by contrast, is the process of extracting knowledge or previously unknown interesting patterns from (often unstructured) data.

3. What is ‘Overfitting’ in Machine learning?


In machine learning, 'overfitting' occurs when a statistical model describes random error or noise instead of the underlying relationship. Overfitting is normally observed when a model is excessively complex, that is, when it has too many parameters relative to the number of training examples. A model that has been overfit exhibits poor predictive performance on unseen data.

4. Why overfitting happens?


The possibility of overfitting exists because the criterion used for training the model (fit to the training data) is not the same as the criterion used to judge the efficacy of the model (performance on unseen data).

5. How can you avoid overfitting?


Overfitting can be avoided by using a lot of data; it typically happens when you try to learn from a small dataset. If you have only a small dataset and are forced to build a model from it, you can use a technique known as cross-validation. In this method the dataset is split into two sections, a testing and a training dataset: the training dataset is used to fit the model, while the testing dataset only evaluates it. The model is given a dataset of known data on which training is run (the training set) and a dataset of unknown data against which it is tested. The idea of cross-validation is to define a dataset to "test" the model during the training phase.
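
As a concrete illustration, the following is a minimal k-fold cross-validation sketch using scikit-learn; the iris dataset and logistic-regression estimator are placeholder choices, not prescribed by the text above.

# A minimal cross-validation sketch using scikit-learn (illustrative only).
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)          # small example dataset
model = LogisticRegression(max_iter=1000)  # any estimator works here

# 5-fold cross-validation: the data is split into 5 parts; each part
# serves once as the test set while the rest is used for training.
scores = cross_val_score(model, X, y, cv=5)
print("Fold accuracies:", scores)
print("Mean accuracy:", scores.mean())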

6. What are the five popular algorithms of Machine Learning?


 Decision Trees
 Neural Networks (back propagation)
 Probabilistic networks
 Nearest Neighbor
 Support vector machines
7. What are the different Algorithm techniques in Machine Learning?
The different types of techniques in Machine Learning are
 Supervised Learning
 Unsupervised Learning
 Semi-supervised Learning
 Reinforcement Learning
 Transduction
 Learning to Learn

8. What are the three stages to build the hypotheses or model in machine
learning?
 Model building
 Model testing
 Applying the model

9. What is ‘Training set’ and ‘Test set’?


In various areas of information science, such as machine learning, the set of data used to discover a potentially predictive relationship is known as the 'training set'. The training set is the set of examples given to the learner, while the test set is used to test the accuracy of the hypotheses generated by the learner; it is a set of examples held back from the learner. The training set and the test set are kept distinct.
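
A minimal sketch of holding back a test set, assuming scikit-learn is available; the dataset and the 80/20 split ratio are illustrative choices.

# Splitting data into a training set and a held-back test set (illustrative).
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# 80% of the examples go to the learner (training set); 20% are
# held back to test the accuracy of the learned hypothesis.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

print(len(X_train), "training examples,", len(X_test), "test examples")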

10. What are the advantages of Naive Bayes?


The Naive Bayes classifier converges more quickly than discriminative models such as logistic regression, so less training data is needed. (Its main limitation, by contrast, is that it cannot learn interactions between features.)

11. What is the main key difference between supervised and unsupervised
machine learning?
Supervised learning: the supervised learning technique needs labelled data to train the model. For example, to solve a classification problem (a supervised learning task), you need labelled data to train the model and to classify the data into your labelled groups.
Unsupervised learning: unsupervised learning does not need any labelled dataset. This is the main key difference between supervised learning and unsupervised learning.
12. What is a Linear Regression?
In simple terms, linear regression is adopting a linear approach to modeling
the relationship between a dependent variable (scalar response) and one or more
independent variables (explanatory variables). In case you have one explanatory
variable, you call it a simple linear regression. In case you have more than one
independent variable, you refer to the process as multiple linear regression.

13. What are the disadvantages of the linear regression model?


One of the most significant demerits of the linear model is that it is sensitive to outliers, which can affect the overall result. Another notable demerit of the linear model is overfitting; similarly, underfitting is also a significant disadvantage of the linear model.

14. What is the difference between classification and regression?


Classification is used to produce discrete results: it classifies data into specific categories, for example classifying emails into spam and non-spam categories. Regression analysis, on the other hand, is used when dealing with continuous data, for example predicting stock prices at a certain point in time.

15. What is the difference between stochastic gradient descent (SGD) and gradient
descent (GD)?
Both algorithms are methods for finding a set of parameters that minimize a
loss function by evaluating parameters against data and then making adjustments.
In standard gradient descent, you'll evaluate all training samples for each set of
parameters. This is akin to taking big, slow steps toward the solution. In stochastic
gradient descent, you'll evaluate only 1 training sample for the set of parameters
before updating them. This is akin to taking small, quick steps toward the
solution.
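
The toy sketch below contrasts the two update rules on a one-parameter linear model; the data, learning rates, and iteration counts are illustrative assumptions.

import numpy as np

# Minimizing MSE for a 1-D linear model y ≈ w*x with batch gradient
# descent (all samples per step) vs. stochastic gradient descent
# (one sample per step).
rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 100)
y = 3.0 * x + rng.normal(0, 0.1, 100)    # true slope = 3

# Batch gradient descent: one big, slow step uses every sample.
w = 0.0
for _ in range(100):
    grad = np.mean(2 * (w * x - y) * x)  # gradient over ALL samples
    w -= 0.5 * grad
print("GD estimate:", w)

# Stochastic gradient descent: many small, quick steps, one sample each.
w = 0.0
for _ in range(100):
    i = rng.integers(len(x))             # pick ONE random sample
    grad = 2 * (w * x[i] - y[i]) * x[i]
    w -= 0.05 * grad
print("SGD estimate:", w)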

16. What are the different types of least squares?


Least squares problems fall into two categories: linear or ordinary least
squares and nonlinear least squares, depending on whether or not the residuals are
linear in all unknowns. The linear least-squares problem occurs in statistical
regression analysis; it has a closed-form solution.
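
Because the linear problem has a closed-form solution, it can be solved directly via the normal equations, w = (XᵀX)⁻¹Xᵀy. Below is a NumPy sketch on synthetic data; the design matrix and noise level are illustrative.

import numpy as np

# Linear least squares via its closed-form solution (illustrative data).
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(50), rng.uniform(0, 1, 50)])  # [1, x] design matrix
true_w = np.array([2.0, 3.0])                              # intercept, slope
y = X @ true_w + rng.normal(0, 0.1, 50)

# lstsq solves the least-squares problem in a numerically stable way.
w, *_ = np.linalg.lstsq(X, y, rcond=None)
print("Recovered intercept and slope:", w)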

17. What is the difference between least squares regression and multiple regression?
The goal of multiple linear regression is to model the linear relationship
between the explanatory (independent) variables and response (dependent)
variables. In essence, multiple regression is the extension of ordinary least-squares (OLS) regression because it involves more than one explanatory variable.
18. What is the principle of least squares?
The "Principle of Least Squares" states that the most probable values of a system of unknown quantities, upon which observations have been made, are obtained by making the sum of the squares of the errors a minimum.

19. What are some advantages to using Bayesian linear regression?


Bayesian regression is not a single algorithm but a different approach to statistical inference. The major advantage is that Bayesian processing recovers the whole range of inferential solutions (a full posterior distribution), rather than only a point estimate and a confidence interval as in classical regression.
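
As a hedged sketch of this idea, scikit-learn's BayesianRidge estimator returns a predictive mean together with a standard deviation rather than a bare point estimate; the synthetic data below is purely illustrative.

import numpy as np
from sklearn.linear_model import BayesianRidge

# Bayesian linear regression sketch: the model yields a predictive
# distribution, not just a point estimate.
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, (30, 1))
y = 2.0 + 3.0 * X[:, 0] + rng.normal(0, 0.1, 30)

model = BayesianRidge()
model.fit(X, y)

# return_std=True gives the standard deviation of the predictive
# distribution alongside the mean prediction.
mean, std = model.predict([[0.5]], return_std=True)
print("Prediction:", mean[0], "+/-", std[0])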

20. What are the advantages of Bayesian Regression?


 Extremely efficient when the dataset is tiny.
 Particularly well-suited for online learning, as opposed to batch learning where we know the complete dataset before we begin training the model; this is because Bayesian regression can be used without having to store the data.
 The Bayesian technique is mathematically robust and has been successfully applied; using it requires no additional prior knowledge of the dataset.

21. What are the disadvantages of Bayesian Regression?


 The model's inference process can take some time.
 The Bayesian strategy is not worthwhile if there is a lot of data available for the dataset, as the regular (frequentist) probability approach then does the task more effectively.

22. What are types of classification models?


 Logistic Regression
 Naive Bayes
 K-Nearest Neighbors
 Decision Tree
 Support Vector Machines

23. Why is random forest better than SVM?


Random Forest is intrinsically suited for multiclass problems, while SVM is intrinsically two-class; for a multiclass problem, SVM must reduce it into multiple binary classification problems. Random Forest also works well with a mixture of numerical and categorical features.
24. What is probabilistic discriminative model?
Discriminative models are a class of supervised machine learning models that make predictions by estimating the conditional probability P(y|x) directly. To use a generative model instead, more unknowns must be estimated: one has to estimate the probability of each class, P(y), and the probability of the observation given the class, P(x|y).

25. What is SVM?


It is a supervised learning algorithm used for both classification and regression problems. A type of discriminative modelling, the support vector machine (SVM) creates a decision boundary to segregate n-dimensional space into classes. The best decision boundary is called a hyperplane, and it is created by choosing the extreme points called the support vectors.

26. What is Decision Tree Classification?


A decision tree builds classification (or regression) models as a tree structure, with datasets broken up
into ever-smaller subsets while developing the decision tree, literally in a tree-like way with branches and
nodes. Decision trees can handle both categorical and numerical data.

PART B

1. Explain the types of learning in machine learning.

Types of Machine Learning

Machine learning is a subset of AI, which enables the machine to automatically learn from data,
improve performance from past experiences, and make predictions. Machine learning contains a set
of algorithms that work on a huge amount of data. Data is fed to these algorithms to train them, and
on the basis of training, they build the model & perform a specific task.

These ML algorithms help to solve different business problems like Regression, Classification,
Forecasting, Clustering, and Associations, etc.
Based on the methods and way of learning, machine learning is divided into mainly four types, which
are:
1. Supervised Machine Learning
2. Unsupervised Machine Learning
3. Semi-Supervised Machine Learning
4. Reinforcement Learning
Supervised Machine Learning
Supervised machine learning is based on supervision. It means in the supervised learning
technique, we train the machines using the "labelled" dataset, and based on the training, the machine
predicts the output. Here, the labelled data specifies that some of the inputs are already mapped to the
output. More precisely, we can say: first, we train the machine with the input and corresponding output, and then we ask the machine to predict the output using the test dataset.

Example: Suppose we have an input dataset of cats and dog images. So, first, we will provide
the training to the machine to understand the images, such as the shape & size of the tail of cat and
dog, Shape of eyes, colour, height (dogs are taller, cats are smaller), etc. After completion of training,
we input the picture of a cat and ask the machine to identify the object and predict the output. Now,
the machine is well trained, so it will check all the features of the object, such as height, shape, color,
eyes, ears, tail, etc., and find that it's a cat. So, it will put it in the Cat category. This is the process of
how the machine identifies the objects in Supervised Learning.
The main goal of the supervised learning technique is to map the input variable(x) with the output
variable(y). Some real-world applications of supervised learning are Risk Assessment, Fraud
Detection, Spam filtering, etc.
Supervised machine learning can be classified into two types of problems, which are given
below:
o Classification
o Regression

a) Classification
Classification algorithms are used to solve the classification problems in which the output variable is
categorical, such as "Yes" or "No", "Male" or "Female", "Red" or "Blue", etc. The classification algorithms
predict the categories present in the dataset. Some real-world examples of classification algorithms
are Spam Detection, Email filtering, etc.
Some popular classification algorithms are given below:
o Random Forest Algorithm
o Decision Tree Algorithm
o Logistic Regression Algorithm
o Support Vector Machine Algorithm

b) Regression
Regression algorithms are used to solve regression problems in which there is a linear relationship
between input and output variables. These are used to predict continuous output variables, such as
market trends, weather prediction, etc.

Some popular Regression algorithms are given below:


o Simple Linear Regression Algorithm
o Multivariate Regression Algorithm
o Decision Tree Algorithm
o Lasso Regression

Advantages:
o Since supervised learning works with a labelled dataset, we can have an exact idea about the classes of objects.
o These algorithms are helpful in predicting the output on the basis of prior experience.
Disadvantages:
o These algorithms are not able to solve complex tasks.
o It may predict the wrong output if the test data is different from the training data.
o It requires lots of computational time to train the algorithm.

Applications of Supervised Learning


Some common applications of Supervised Learning are given below:
o Image Segmentation
o Medical Diagnosis
o Fraud Detection
o Spam Detection
o Speech Recognition

2. Unsupervised Machine Learning


Unsupervised learning is different from the Supervised learning technique; as its name
suggests, there is no need for supervision. It means, in unsupervised machine learning, the machine is
trained using the unlabeled dataset, and the machine predicts the output without any supervision. In
unsupervised learning, the models are trained with the data that is neither classified nor labelled, and
the model acts on that data without any supervision.

Unsupervised Learning can be further classified into two types, which are given below:
o Clustering
o Association

Advantages:
o These algorithms can be used for more complicated tasks than supervised ones, because they work on unlabeled datasets.
o Unsupervised algorithms are preferable for various tasks as getting the unlabeled dataset is
easier as compared to the labelled dataset.
Disadvantages:
o The output of an unsupervised algorithm can be less accurate, as the dataset is not labelled and the algorithms are not trained with the exact output in advance.
o Working with Unsupervised learning is more difficult as it works with the unlabelled dataset
that does not map with the output.

Applications of Unsupervised Learning

o Network Analysis
o Recommendation Systems


2. Explain in details about regression models- linear regression models?
Linear Regression in Machine Learning
a. Linear regression is one of the easiest and most popular Machine Learning algorithms. It
is a statistical method that is used for predictive analysis. Linear regression makes
predictions for continuous/real or numeric variables such as sales, salary, age, product
price, etc.

b. The linear regression algorithm shows a linear relationship between a dependent (y) variable and one or more independent (x) variables, hence it is called linear regression. Since linear regression shows a linear relationship, it finds how the value of the dependent variable changes according to the value of the independent variable.

c. The linear regression model provides a sloped straight line representing the relationship between the variables. Mathematically, it can be represented as:

y = a0 + a1x + ε
Here,
Y= Dependent Variable (Target Variable)
X= Independent Variable (predictor Variable)
a0= intercept of the line (Gives an additional degree of freedom)
a1 = Linear regression coefficient (scale factor to each input value).
ε = random error

The values for x and y variables are training datasets for Linear Regression model representation.

Types of Linear Regression


Linear regression can be further divided into two types of the algorithm:
o Simple Linear Regression:
If a single independent variable is used to predict the value of a numerical dependent variable, then such a Linear Regression algorithm is called Simple Linear Regression.
o Multiple Linear Regression:
If more than one independent variable is used to predict the value of a numerical dependent variable, then such a Linear Regression algorithm is called Multiple Linear Regression.

Linear Regression Line


A straight line showing the relationship between the dependent and independent variables is called a regression line. A regression line can show two types of relationship:

o Positive Linear Relationship:
If the dependent variable increases on the Y-axis as the independent variable increases on the X-axis, then such a relationship is termed a positive linear relationship.

o Negative Linear Relationship:
If the dependent variable decreases on the Y-axis as the independent variable increases on the X-axis, then such a relationship is called a negative linear relationship.

Finding the best fit line:

When working with linear regression, our main goal is to find the best fit line, which means the error between predicted values and actual values should be minimized. The best fit line will have the least error. Different values for the weights or coefficients of the line (a0, a1) give different lines of regression, so we need to calculate the best values for a0 and a1 to find the best fit line; to calculate this we use a cost function.
Cost function-
o Different values for the weights or coefficients of the line (a0, a1) give different lines of regression, and the cost function is used to estimate the values of the coefficients for the best fit line.
o Cost function optimizes the regression coefficients or weights. It measures how a linear
regression model is performing.
o We can use the cost function to find the accuracy of the mapping function, which maps the
input variable to the output variable. This mapping function is also known as Hypothesis
function.

For Linear Regression, we use the Mean Squared Error (MSE) cost function, which is the average of the squared errors between the predicted values and actual values. It can be written as:

MSE = (1/N) Σ (yi − (a1xi + a0))²

Where,
N = total number of observations
yi = actual value
(a1xi + a0) = predicted value

Residuals: The distance between the actual value and the predicted value is called the residual. If the observed points are far from the regression line, the residuals will be high and so the cost function will be high. If the scatter points are close to the regression line, the residuals will be small and hence the cost function will be small.

Gradient Descent:
o Gradient descent is used to minimize the MSE by calculating the gradient of the cost function.
o A regression model uses gradient descent to update the coefficients of the line by reducing the cost function.
o This is done by starting from randomly selected coefficient values and then iteratively updating them to reach the minimum of the cost function, as sketched below.
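
A minimal gradient-descent sketch for simple linear regression, assuming the MSE cost above; the synthetic data, learning rate, and iteration count are illustrative.

import numpy as np

# Gradient descent for y = a0 + a1*x: start from arbitrary coefficients
# and repeatedly step in the direction that reduces the MSE.
rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 100)
y = 4.0 + 2.5 * x + rng.normal(0, 0.1, 100)  # true a0 = 4, a1 = 2.5

a0, a1 = 0.0, 0.0
lr = 0.1                                     # learning rate
for _ in range(2000):
    error = (a1 * x + a0) - y                # predicted - actual
    a0 -= lr * np.mean(2 * error)            # d(MSE)/d(a0)
    a1 -= lr * np.mean(2 * error * x)        # d(MSE)/d(a1)

print("Learned intercept and slope:", a0, a1)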

Model Performance:

The Goodness of fit determines how the line of regression fits the set of observations. The process of
finding the best model out of various models is called optimization.

3. Explain in details about regression models- linear classification models.

Logistic Regression in Machine Learning

o Logistic regression is one of the most popular Machine Learning algorithms, which comes
under the Supervised Learning technique. It is used for predicting the categorical dependent
variable using a given set of independent variables.
o Logistic regression predicts the output of a categorical dependent variable. Therefore the
outcome must be a categorical or discrete value. It can be either Yes or No, 0 or 1, true or
False, etc. but instead of giving the exact value as 0 and 1, it gives the probabilistic values
which lie between 0 and 1.
o Logistic Regression is very similar to Linear Regression, except in how they are used.
Linear Regression is used for solving Regression problems, whereas Logistic regression is
used for solving the classification problems.
o In Logistic regression, instead of fitting a regression line, we fit an "S" shaped logistic
function, which predicts two maximum values (0 or 1).
o The curve from the logistic function indicates the likelihood of something such as whether the
cells are cancerous or not, a mouse is obese or not based on its weight, etc.
o Logistic Regression is a significant machine learning algorithm because it has the ability to
provide probabilities and classify new data using continuous and discrete datasets.
o Logistic Regression can be used to classify observations using different types of data and can easily determine the most effective variables for the classification.

Logistic Function (Sigmoid Function):

o The sigmoid function is a mathematical function used to map the predicted values to
probabilities.
o It maps any real value into another value within a range of 0 and 1.
o The output of logistic regression must be between 0 and 1 and cannot go beyond this limit, so it forms an "S"-shaped curve. The S-form curve is called the Sigmoid function or the logistic function.
o In logistic regression, we use the concept of a threshold value, which defines the probability of either 0 or 1: values above the threshold tend to 1, and values below the threshold tend to 0 (see the sketch below).
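
A small NumPy sketch of the sigmoid and the threshold rule just described; the 0.5 threshold is the conventional default, an illustrative assumption rather than something fixed by the text.

import numpy as np

# The sigmoid maps any real value into (0, 1); a threshold (here 0.5)
# then converts the probability into a class label.
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-4.0, -1.0, 0.0, 1.0, 4.0])
p = sigmoid(z)
print("Probabilities:", p)                    # values squashed into (0, 1)
print("Predicted classes:", (p >= 0.5).astype(int))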

Assumptions for Logistic Regression:

o The dependent variable must be categorical in nature.


o The independent variable should not have multi-collinearity.

Logistic Regression Equation:

The Logistic regression equation can be obtained from the Linear Regression equation. The mathematical steps to get the Logistic Regression equation are given below:
o We know the equation of the straight line can be written as:

y = b0 + b1x1 + b2x2 + ... + bnxn

o In Logistic Regression, y can be between 0 and 1 only, so to capture this let's divide the above equation by (1−y):

y / (1−y), which is 0 for y = 0 and infinity for y = 1

o But we need a range between −[infinity] and +[infinity]; taking the logarithm, the equation becomes:

log[y / (1−y)] = b0 + b1x1 + b2x2 + ... + bnxn

The above equation is the final equation for Logistic Regression.


Type of Logistic Regression:

Logistic Regression can be classified into three types:


o Binomial: In binomial Logistic regression, there can be only two possible types of the
dependent variables, such as 0 or 1, Pass or Fail, etc.
o Multinomial: In multinomial Logistic regression, there can be 3 or more possible unordered types of the dependent variable, such as "cat", "dog", or "sheep".
o Ordinal: In ordinal Logistic regression, there can be 3 or more possible ordered types of
dependent variables, such as "low", "Medium", or "High".
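
As a concrete illustration of binomial logistic regression, here is a short scikit-learn sketch; the synthetic two-class data and the 75/25 split are illustrative assumptions.

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Binomial logistic regression on synthetic two-class data.
X, y = make_classification(n_samples=200, n_features=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

clf = LogisticRegression()
clf.fit(X_train, y_train)

# predict_proba returns the probabilistic values between 0 and 1;
# predict applies the default 0.5 threshold to give the discrete class.
print("Probabilities:", clf.predict_proba(X_test[:3]))
print("Classes:", clf.predict(X_test[:3]))
print("Test accuracy:", clf.score(X_test, y_test))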

4. Explain in detail about support vector machine?


Support Vector Machine Algorithm
a. Support Vector Machine or SVM is one of the most popular Supervised Learning algorithms,
which is used for Classification as well as Regression problems. However, primarily, it is used
for Classification problems in Machine Learning.

b. The goal of the SVM algorithm is to create the best line or decision boundary that can
segregate n-dimensional space into classes so that we can easily put the new data point in the
correct category in the future. This best decision boundary is called a hyperplane.

c. SVM chooses the extreme points/vectors that help in creating the hyperplane. These extreme cases are called support vectors, and hence the algorithm is termed Support Vector Machine. In a typical diagram, two different categories are separated by such a decision boundary or hyperplane.
Example: SVM can be understood with the example that we used in the KNN classifier. Suppose we see a strange cat that also has some features of dogs; if we want a model that can accurately identify whether it is a cat or a dog, such a model can be created using the SVM algorithm. We will first train our model with lots of images of cats and dogs so that it can learn their different features, and then we test it with this strange creature. Because SVM creates a decision boundary between the two classes (cat and dog) and chooses the extreme cases (the support vectors), it will consider the extreme cases of cats and dogs when classifying it.

SVM algorithm can be used for Face detection, image classification, text categorization, etc.

Types of SVM

SVM can be of two types:


o Linear SVM: Linear SVM is used for linearly separable data, which means that if a dataset can be classified into two classes by using a single straight line, then such data is termed linearly separable data, and the classifier used is called a Linear SVM classifier.
o Non-linear SVM: Non-linear SVM is used for non-linearly separable data, which means that if a dataset cannot be classified by using a straight line, then such data is termed non-linear data, and the classifier used is called a Non-linear SVM classifier.

Hyperplane and Support Vectors in the SVM algorithm:

Hyperplane:
d. There can be multiple lines/decision boundaries to segregate the classes in n-dimensional
space, but we need to find out the best decision boundary that helps to classify the data
points. This best boundary is known as the hyperplane of SVM.

e. The dimensions of the hyperplane depend on the number of features present in the dataset, which means that if there are 2 features, then the hyperplane will be a straight line, and if there are 3 features, then the hyperplane will be a 2-dimensional plane.

Support Vectors:
The data points or vectors that are closest to the hyperplane and which affect the position of the hyperplane are termed support vectors. Since these vectors support the hyperplane, they are called support vectors.

How does SVM work?

Linear SVM:
The working of the SVM algorithm can be understood by using an example. Suppose we have a dataset that has two tags (green and blue), and the dataset has two features, x1 and x2. We want a classifier that can classify the pair (x1, x2) of coordinates as either green or blue.
Hence, the SVM algorithm helps to find the best line or decision boundary; this best boundary or region is called a hyperplane. The SVM algorithm finds the closest points of the two classes to the line. These points are called support vectors. The distance between the vectors and the hyperplane is called the margin, and the goal of SVM is to maximize this margin. The hyperplane with the maximum margin is called the optimal hyperplane.
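
The sketch below fits a linear SVM on two synthetic blobs standing in for the green and blue tags and reads back the support vectors; scikit-learn's SVC and the toy data are illustrative choices.

from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two well-separated blobs play the role of the two tags (green/blue)
# with features x1 and x2.
X, y = make_blobs(n_samples=100, centers=2, random_state=6)

clf = SVC(kernel="linear", C=1.0)
clf.fit(X, y)

# The fitted model exposes the support vectors, i.e. the points
# closest to the maximum-margin hyperplane.
print("Support vectors per class:", clf.n_support_)
print("Support vectors:\n", clf.support_vectors_)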

Non-Linear SVM:
If data is linearly arranged, we can separate it by using a straight line, but for non-linear data we cannot draw a single straight line. To separate such data points, we need to add one more dimension. For linear data we have used the two dimensions x and y, so for non-linear data we will add a third dimension z. It can be calculated as:

z = x² + y²

By adding the third dimension, the sample space becomes linearly separable, and SVM can divide the datasets into classes with a plane. Since we are in 3-D space, the decision surface looks like a plane parallel to the x-axis. If we convert it back to 2-D space by taking z = 1, the decision boundary becomes a circle of radius 1 in the original (x, y) space.
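
The sketch below makes the z = x² + y² construction explicit: points inside and outside a circle are not linearly separable in (x, y), but a linear SVM separates them once the third dimension is added (in practice a kernel such as RBF performs this lifting implicitly). The data and the circle threshold are illustrative.

import numpy as np
from sklearn.svm import SVC

# Labels follow a circular boundary: not separable by a line in 2-D.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, (200, 2))
y = (X[:, 0]**2 + X[:, 1]**2 > 0.5).astype(int)

# Lift to 3-D by adding z = x^2 + y^2; a flat plane now separates the classes.
z = (X[:, 0]**2 + X[:, 1]**2).reshape(-1, 1)
X3 = np.hstack([X, z])

clf = SVC(kernel="linear").fit(X3, y)
print("Training accuracy in the lifted 3-D space:", clf.score(X3, y))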
5. Explain in detail about decision tree and random forest?

Decision Tree Classification Algorithm


o Decision Tree is a Supervised learning technique that can be used for both classification and Regression problems, but mostly it is preferred for solving Classification problems. It is a tree-structured classifier, where internal nodes represent the features of a dataset, branches represent the decision rules and each leaf node represents the outcome.
o In a Decision tree, there are two types of nodes: the Decision Node and the Leaf Node.
o Decision nodes are used to make any decision and have multiple branches, whereas Leaf nodes are the outputs of those decisions and do not contain any further branches.

o The decisions or tests are performed on the basis of the features of the given dataset. It is a graphical representation for getting all the possible solutions to a problem/decision based on given conditions.

o It is called a decision tree because, similar to a tree, it starts with the root node, which expands on further branches and constructs a tree-like structure. In order to build a tree, we use the CART algorithm, which stands for Classification And Regression Tree algorithm.
o A decision tree simply asks a question and, based on the answer (Yes/No), further splits the tree into subtrees.

There are various algorithms in Machine learning, so choosing the best algorithm for the given dataset and problem is the main point to remember while creating a machine learning model. Below are two reasons for using the Decision tree:
o Decision Trees usually mimic human thinking ability while making
a decision, so it is easy to understand.
o The logic behind the decision tree can be easily understood because
it shows a tree-like structure.

Decision Tree Terminologies


➢ Root Node: Root node is from where the decision tree starts. It
represents the entire dataset, which further gets divided into two or
more homogeneous sets.
➢ Leaf Node: Leaf nodes are the final output node, and the tree
cannot be segregated further after getting a leaf node.
➢ Splitting: Splitting is the process of dividing the decision
node/root node into sub-nodes according to the given conditions.
➢ Branch/Sub Tree: A subtree formed by splitting the tree.
➢ Pruning: Pruning is the process of removing the unwanted branches from the
tree.
➢ Parent/Child node: The root node of the tree is called the parent
node, and other nodes are called the child nodes.

How does the Decision Tree algorithm Work?


In a decision tree, to predict the class of a given dataset, the algorithm starts from the root node of the tree. The algorithm compares the value of the root attribute with the corresponding attribute of the record (the real dataset) and, based on the comparison, follows the branch and jumps to the next node. For the next node, the algorithm again compares the attribute value with the other sub-nodes and moves further. It continues the process until it reaches a leaf node of the tree. The complete process can be better understood using the algorithm below:
Step-1: Begin the tree with the root node, say S, which contains the complete dataset.
Step-2: Find the best attribute in the dataset using an Attribute Selection Measure (ASM).
Step-3: Divide S into subsets that contain possible values for the best attribute.
Step-4: Generate the decision tree node, which contains the best attribute.
Step-5: Recursively make new decision trees using the subsets of the dataset created in Step-3. Continue this process until a stage is reached where you cannot further classify the nodes; the final node is called a leaf node.

Example: Suppose there is a candidate who has a job offer and wants to decide whether he should accept the offer or not. To solve this problem, the decision tree starts with the root node (the Salary attribute, chosen by ASM). The root node splits further into the next decision node (distance from the office) and one leaf node based on the corresponding labels. The next decision node further splits into one decision node (Cab facility) and one leaf node. Finally, the decision node splits into two leaf nodes (Accepted offer and Declined offer).
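
A minimal decision-tree sketch, assuming scikit-learn (whose trees are built with CART, as mentioned above); the iris dataset and the depth limit are illustrative choices.

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

# Fit a small classification tree: at each node, CART picks the best
# attribute and recursively splits the dataset.
X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(X, y)

# export_text prints the learned tree: each internal node is a test
# on a feature, and each leaf is a predicted class.
print(export_text(tree, feature_names=load_iris().feature_names))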

Advantages of the Decision Tree

o It is simple to understand, as it follows the same process that a human follows while making any decision in real life.
o It can be very useful for solving decision-related problems.
o It helps to think about all the possible outcomes for a problem.
o There is less requirement of data cleaning compared to other algorithms.

Disadvantages of the Decision Tree


o The decision tree contains lots of layers, which makes it complex.
o It may have an overfitting issue, which can be resolved using the Random Forest algorithm (see the sketch below).
o For more class labels, the computational complexity of the decision tree may
increase.
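
To illustrate the overfitting point above, the hedged sketch below compares a single unconstrained decision tree with a random forest (an ensemble of trees trained on random subsets of the data and features) on synthetic data; the forest typically generalises better.

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# A deep single tree usually fits the training set perfectly (overfits),
# while the forest trades a little training accuracy for better test accuracy.
tree = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)

print("Single tree   - train:", tree.score(X_tr, y_tr), " test:", tree.score(X_te, y_te))
print("Random forest - train:", forest.score(X_tr, y_tr), " test:", forest.score(X_te, y_te))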
