


Comparative Analysis of Machine Learning Algorithms: Random Forest Algorithm, Naive Bayes Classifier and KNN - A Survey

Akshay Gole, Sankalp Singh, Prathmesh Kanherkar, P. R. Abhishek
Department of Computer Engineering,
St. Vincent Pallotti College of Engineering & Technology,
Nagpur, Maharashtra, India.

Prof. Pallavi Wankhede
Assistant Professor, Department of Computer Engineering,
St. Vincent Pallotti College of Engineering & Technology,
Nagpur, Maharashtra, India.

Abstract— Machine learning is a branch of computer science in which a computer predicts the next task to be performed by analysing the data provided to it. The computer can access data in the form of digitised training sets or through interaction with the environment. The primary goal of this paper is to provide a general comparison of the Random Forest algorithm, the Naive Bayes Classifier, and the KNN algorithm across all aspects. The Random Forest Classifier is made up of many decision trees: to promote an uncorrelated forest, the algorithm leverages randomization when forming each individual tree, and then uses the forest's collective predictive power to make accurate decisions. The Naive Bayes Classifier is a simple and effective classification method that aids in the development of fast machine learning models capable of making quick predictions. The K-Nearest Neighbour (KNN) algorithm can be used to handle both classification and regression problems. These algorithms are surveyed on the basis of aim, methodology, advantages and disadvantages.

1. INTRODUCTION

Machine learning, in short, is the science of getting computers to act automatically without explicit programming. We've been able to use machine learning for many things over the past decade, from self-driving cars to speech recognition and web search, as well as a vastly better understanding of our genomes. There are a lot of things you probably do every day that use machine learning without your even knowing it.[1] Often, machine learning is classified by how an algorithm improves its ability to make predictions. We can categorize learning approaches into five groups: supervised, unsupervised, semi-supervised, reinforcement, and ensemble learning. Based on what they want to predict, data scientists choose which algorithm to use.

2. LITERATURE REVIEW

This section covers the relevance of machine learning, as well as the Random Forest method, the Naive Bayes Classifier, and the KNN algorithm.

2.1 Machine learning:

As noted in the introduction, machine learning is the science of getting computers to act automatically without explicit programming, and its learning approaches fall into the supervised, unsupervised, semi-supervised, reinforcement, and ensemble categories.

Ensemble learning: To understand the Random Forest machine learning algorithm, we first need to understand ensemble learning. Ensemble learning refers to a method of making predictions based on the predictions of several different models.[2] Because they combine individual models, ensemble models are more flexible and less sensitive to the data.

Bagging and boosting are the most popular ensemble learning methods:

Bagging: A set of individual models is trained simultaneously, each on a random subset of the data.

Boosting: Individual models are trained sequentially; throughout the learning process, each new model learns from the mistakes of the previous one.[2]
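To make the distinction concrete, here is a minimal illustrative sketch (ours, not the paper's) of bagging and boosting using scikit-learn; the synthetic dataset, the choice of AdaBoost as the boosting example, and the parameter values are assumptions for demonstration only:

from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.model_selection import cross_val_score

# Synthetic classification data, just for illustration.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Bagging: independent models (decision trees by default) trained in
# parallel on bootstrap samples of the data.
bagging = BaggingClassifier(n_estimators=50, random_state=0)

# Boosting: models trained sequentially, each one focusing on the
# mistakes of its predecessor.
boosting = AdaBoostClassifier(n_estimators=50, random_state=0)

for name, model in [("bagging", bagging), ("boosting", boosting)]:
    print(name, cross_val_score(model, X, y, cv=5).mean())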
2.2 Random Forest:

Random forests are ensemble models that use bagging as the ensemble method and decision trees as the individual models. By averaging the predictions of the trees, the model performs better than any single decision tree alone.[2] The prediction is formed differently for the two problem types:
● Regression: the prediction is the average of the predictions of all the decision trees.
● Classification: the prediction is the class label with the most votes across all decision trees.

A random forest is constructed by fitting a large number of decision trees to bootstrap samples of the training dataset. Unlike plain bagging, random forest also selects a random subset of the input features (columns or variables) at each split point during tree construction. Building a decision tree involves selecting a split point based on the values of the input variables, so restricting the features considered at each split to a random subset makes the trees of the ensemble more diverse. This results in less correlated predictions and errors from the individual trees, and these less correlated trees often perform better than bagged decision trees when their predictions are averaged.[2]

The number of random features to consider at each split point is probably the most important hyperparameter for tuning random forests. As a heuristic, for regression it should be set to one third of the number of input features [3]:

num_features_for_split = total_input_features / 3

and for classification to the square root of the number of input features [4]:

num_features_for_split = sqrt(total_input_features)

The depth of the decision trees is another important hyperparameter. Deeper trees overfit more, but they are also less correlated with one another, which may enhance ensemble performance; depths of 1 to 10 levels may be effective.[2] As a last step, you can choose how many decision trees to include in the ensemble; this number is often increased until no further improvement is observed.
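As a concrete companion to these tuning notes, the sketch below uses scikit-learn's RandomForestClassifier; the dataset is synthetic and the hyperparameter values simply follow the heuristics above, not any result from the paper:

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, n_features=25, random_state=0)

model = RandomForestClassifier(
    n_estimators=100,     # trees in the ensemble; raise until scores plateau
    max_features="sqrt",  # sqrt(total_input_features) heuristic for classification
    max_depth=10,         # tree depth; the text suggests 1 to 10 levels
    random_state=0,
)
print(cross_val_score(model, X, y, cv=5).mean())

# For regression, RandomForestRegressor(max_features=1/3) matches the
# one-third-of-features heuristic (a float means a fraction of features).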
Advantages of Random Forest [5]:
1. It reduces the overfitting of decision trees and improves accuracy.
2. It is flexible enough to be used for both classification and regression problems.
3. It can be used for both categorical and continuous values.
4. It automates the process of filling in missing values in the data.
5. It uses a rule-based approach, so it does not require data normalization.

Disadvantages of Random Forest [5]:
1. Because it builds numerous trees and combines their outputs, it requires a lot of computational power and resources.
2. It requires a great deal of training time, since it combines many decision trees.
3. Because it uses an ensemble of decision trees, it is hard to interpret and does not directly indicate the significance of each variable.

Applications of Random Forest:

Banking analysis carries a high risk of profit and loss and therefore requires a lot of effort; customer analysis is one of the most common studies in the banking industry. Random forests are well suited to detecting fraudulent transactions and to problems such as calculating the likelihood of a customer defaulting on a loan.[5]
1. Random forests can be used in the pharmaceutical industry to assess the potential of a particular medicine or to identify the chemical composition needed for a medicine.
2. Hospitals can use them to identify the illnesses a patient suffers from, a patient's cancer risk, and many other diseases whose treatment depends on early diagnosis and research.

2.3 Naïve Bayes

Naïve Bayes is a classification algorithm that applies Bayes' theorem of probability to predict the classes of an unknown dataset. In simpler terms, the Naïve Bayes algorithm treats each feature of the given dataset as independent, irrespective of its relation to any of the other features.
Bayes' theorem provides a way to calculate the posterior probability P(c|x) from P(c), P(x), and P(x|c):

P(c|x) = P(x|c) × P(c) / P(x)

where
• P(c|x) is the posterior probability of the class given the predictor,
• P(c) is the prior probability of the class,
• P(x|c) is the likelihood of the predictor given the class,
• P(x) is the prior probability of the predictor.

There are three types of Naïve Bayes classifiers:

Multinomial Naïve Bayes: generally used when the task at hand is document classification, for example classifying documents into types such as sports magazines or political magazines.

Bernoulli Naïve Bayes: similar to the Multinomial Naïve Bayes classifier, with the difference that the Bernoulli classifier's predictors are boolean variables, i.e. they take values only of the form yes or no.

Gaussian Naïve Bayes: the predictors take continuous values instead of discrete values, and the conditional probability becomes the Gaussian density

P(x_i|y) = (1 / sqrt(2π σ_y²)) exp(−(x_i − μ_y)² / (2σ_y²))

where μ_y and σ_y² are the mean and variance of feature x_i within class y.
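A brief illustrative sketch (ours; the Iris dataset is only an example) of Gaussian Naïve Bayes with scikit-learn, whose GaussianNB estimator fits the per-class mean μ_y and variance σ_y² used in the formula above:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB  # MultinomialNB / BernoulliNB also exist

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

nb = GaussianNB().fit(X_train, y_train)
print("accuracy:", nb.score(X_test, y_test))
# Posterior probabilities P(c|x) for the first test sample.
print("P(c|x):", nb.predict_proba(X_test[:1]))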
Advantages
• It is much faster and easier to predict classes for a dataset.
• When the independence assumptions hold, the Naïve Bayes classifier performs better than most other existing models.
• It performs better with categorical variables than with numerical variables.

Disadvantages
• The independence assumptions are a big factor in its predictions; if they do not hold, Naïve Bayes classifiers fail to give the correct output.
• If a label is observed in the test data but not in the training data, the model assigns it a probability of zero and is unable to make predictions for it.

Applications
• Document classification
• Email filtering and spam filtering
• Construction of recommendation systems used in data mining
• Real-time predictions

2.4 K-Nearest Neighbours (KNN Algorithm)

The k-nearest neighbours algorithm is a non-parametric supervised learning method first developed in 1951 by Joseph Hodges and Evelyn Fix, and later expanded upon by Thomas Cover.

The KNN algorithm is used to solve classification as well as regression problems. It works by finding the distances between a query and all the examples in the data and selecting the specified number of examples (K) closest to the query; it then votes for the most frequent label (in the case of classification) or averages the labels (in the case of regression).

For both classification and regression, choosing the right K for the data is done by trying several different values of K and picking the one that works best for our needs, as sketched below.
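This search over K can be written in a few lines; the sketch below (ours, with the Iris dataset as an arbitrary stand-in) tries several values of K and keeps the one with the best cross-validated score:

from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

best_k, best_score = None, 0.0
for k in range(1, 16):  # candidate values of K to try
    score = cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5).mean()
    if score > best_score:
        best_k, best_score = k, score
print("best K:", best_k, "score:", best_score)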

Advantages

KNN is widely used because of the advantages it offers. It is very simple to implement and understand. The KNN algorithm has no explicit training step; all the work happens during prediction itself. As new data is added to the dataset, the prediction adjusts without a new model having to be retrained. Also, since there is only a single hyperparameter, the value of K, hyperparameter tuning is quite easy.

Disadvantages

Like all other algorithms, the KNN algorithm isn't perfect. When there is a large amount of data to process, the prediction complexity becomes very high, and the same holds for high-dimensional data. The KNN algorithm is also sensitive when features have different ranges, and noisy data can result in over-fitting or under-fitting.

Application

The KNN algorithm has applications in various fields. Some of its common applications are:
1. Facial recognition systems.
2. Recommendation systems.
3. Predicting various factors in the agricultural sector.

The KNN algorithm is used on platforms such as Netflix and Amazon, where the user or customer is given recommendations for movies, series, products, etc., based on their previous searches or watch history.

3. COMPARATIVE ANALYSIS

This section compares the above-mentioned algorithms with respect to a few important parameters and, at the end, evaluates their overall accuracy. The analysis is based on the dataset used in [6].

Table 1. An analysis of three widely used supervised classification algorithms.[6]

Parameter                                  | Random Forest | Naive Bayes    | k-NN
Speed of learning                          | Average       | Best           | Best
Classification speed                       | Best          | Best           | Worst
Performance with missing values            | Average       | Best           | Worst
Performance with irrelevant features       | Average       | Good           | Good
Noise tolerance                            | Good          | Average        | Average
Performance on discrete/binary attributes  | Good          | Average        | Average
Clarity of classification prediction       | Best          | Best           | Average
Parameter handling for the model           | Average       | Best           | Average
Overall accuracy                           | Best (84.13%) | Worst (80.14%) | Good (83.65%)

The table above compares these widely used and popular supervised classification algorithms. Accuracy is determined by comparing confusion matrices: as a measure of performance, accuracy is the ratio of correct predictions to all observations, and it is the most intuitive measure. The accuracies are obtained by applying the algorithms to the dataset.[6]
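As a small illustration of how such accuracy figures are read off a confusion matrix (our sketch, with made-up labels, not the data behind Table 1):

import numpy as np
from sklearn.metrics import confusion_matrix

y_true = [0, 0, 1, 1, 2, 2, 2]   # hypothetical true labels
y_pred = [0, 1, 1, 1, 2, 2, 0]   # hypothetical predictions

cm = confusion_matrix(y_true, y_pred)
# Accuracy = correct predictions (matrix diagonal) / all observations.
print(cm)
print("accuracy:", np.trace(cm) / cm.sum())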
4. CONCLUSION

In this survey we examined three basic algorithms in depth: the Random Forest method, the Naive Bayes Classifier, and the KNN algorithm, and compared them on a number of parameters. This paper will aid researchers in determining which of these three algorithms is the best to use in their future research.

REFERENCES

[1] https://www.geeksforgeeks.org/machine-learning/
[2] https://machinelearningmastery.com/random-forest-ensemble-in-python/
[3] Page 199, Applied Predictive Modeling, 2013.
[4] Page 387, Applied Predictive Modeling, 2013.
[5] https://www.mygreatlearning.com/blog/random-forest-algorithm/#AdvantagesandDisadvantagesofRandomForest
[6] Sen, Pratap, Hajra, Mahimarnab & Ghosh, Mitadru (2020). Supervised Classification Algorithms in Machine Learning: A Survey and Review. 10.1007/978-981-13-7403-6_11.