Experiment 5
(a) Execute Logistic Regression with the help of a properly identified data set. Analyse
the result and identify how well the model performed on the test set. Brief the steps that
you followed to analyse the data set.
(b) Implement Logistic Regression using python.
What is Logistic Regression?
Logistic regression is used for binary classification, where a sigmoid function takes the
independent variables as input and produces a probability value between 0 and 1.
For example, given two classes, Class 0 and Class 1: if the value of the logistic function for
an input is greater than 0.5 (the threshold value), the input belongs to Class 1; otherwise, it
belongs to Class 0.
It is referred to as regression because it is an extension of linear regression, but it is mainly
used for classification problems.
Key Points:
● Logistic regression predicts the output of a categorical dependent variable. Therefore,
the outcome must be a categorical or discrete value.
● It can be either Yes or No, 0 or 1, True or False, etc., but instead of giving the exact
values 0 and 1, it gives probabilistic values which lie between 0 and 1.
● In Logistic regression, instead of fitting a regression line, we fit an “S” shaped logistic
function, which predicts two maximum values (0 or 1).
Linear Regression Equation:
y = b0 + b1x1 + b2x2 + ... + bnxn
where y is the dependent variable and x1, x2, ..., xn are the explanatory variables.
Sigmoid Function:
σ(z) = 1 / (1 + e^(−z))
Applying the sigmoid function to the linear regression equation gives the logistic regression
model:
log(y / (1 − y)) = b0 + b1x1 + b2x2 + ... + bnxn
Logistic Function – Sigmoid Function
● The sigmoid function is a mathematical function used to map the predicted values to
probabilities.
● It maps any real value into another value within the range of 0 and 1. Since the output
of logistic regression must lie between 0 and 1 and cannot go beyond this limit, the
function forms a curve in the “S” form.
● The S-form curve is called the sigmoid function or the logistic function.
● In logistic regression, we use the concept of a threshold value, which decides between
0 and 1: values above the threshold tend to 1, and values below the threshold tend to 0
(see the short sketch after this list).
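The following is a minimal, self-contained sketch of the sigmoid function and the 0.5
threshold rule described above (the input values are illustrative only, and this snippet is
separate from the program below):
import numpy as np

def sigmoid(z):
    # maps any real value into the open interval (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-4.0, -1.0, 0.0, 1.0, 4.0])
probs = sigmoid(z)
print(probs)                       # probabilities between 0 and 1
print((probs > 0.5).astype(int))   # threshold at 0.5 -> Class 0 or Class 1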
Program: -
#importing libraries
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import seaborn as sns
import matplotlib.pyplot as plt
df= pd.read_csv("iris.csv") #importing dataset and making dataframe
df.head() #showing top 5 data entries
df.describe() #describes our data
df.info() #gives information about the columns
df.shape #tells us the number of rows and columns [rows, columns]
# Output: (150, 5)
print(df["variety"].value_counts())
sns.countplot(x="variety", data=df)
plt.figure(figsize=(8,4))
sns.heatmap(df.corr(numeric_only=True),annot=True,fmt=".0%") #heatmap of the correlation matrix calculated by df.corr() (numeric columns only)
plt.show()
# We'll use seaborn's FacetGrid to color the scatterplot by species
sns.FacetGrid(df, hue="variety", height=5).map(plt.scatter, "sepal.length", "sepal.width").add_legend()
from sklearn.linear_model import LogisticRegression # for Logistic Regression algorithm
from sklearn.model_selection import train_test_split #to split the dataset for training and testing
from sklearn import metrics #for checking the model accuracy
X=df.iloc[:,0:4]
Y=df["variety"]
X.head()
X_train,X_test,Y_train,Y_test=train_test_split(X,Y,test_size=0.25,random_state=0) #split the main data into train and test
# the attribute test_size=0.25 splits the data into a 75%/25% ratio: train=75% and test=25%
print("Train Shape",X_train.shape)
print("Test Shape",X_test.shape)
log = LogisticRegression()
log.fit(X_train,Y_train)
prediction=log.predict(X_test)
print('The accuracy of the Logistic Regression is',metrics.accuracy_score(Y_test,prediction))
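To brief how well the model performed on the test set, one straightforward follow-up (a
short sketch using sklearn's standard metrics, not part of the original listing) is to print the
confusion matrix and the per-class report:
print(metrics.confusion_matrix(Y_test, prediction))      # rows: true class, columns: predicted class
print(metrics.classification_report(Y_test, prediction)) # precision/recall/F1 for each variety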
Experiment 6
Execute the Naïve Bayes algorithm with suitable data set and do proper
analysis on the result. Also implement Naïve Bayes algorithm using
python.
Naïve Bayes Classifier Algorithm
● Naïve Bayes algorithm is a supervised learning algorithm, which is based on Bayes
theorem and used for solving classification problems.
● It is mainly used in text classification that includes a high-dimensional training
dataset.
● The Naïve Bayes Classifier is one of the simplest and most effective classification
algorithms, helping to build fast machine learning models that can make quick
predictions.
● It is a probabilistic classifier, which means it predicts on the basis of the probability of
an object.
● Some popular applications of the Naïve Bayes algorithm are spam filtering, sentiment
analysis, and classifying articles.
Why is it called Naïve Bayes?
The Naïve Bayes algorithm is made up of two words, Naïve and Bayes, which can be
described as:
Naïve: It is called naïve because it assumes that the occurrence of a certain feature is
independent of the occurrence of other features. For example, if a fruit is identified on the
basis of colour, shape, and taste, then a red, spherical, and sweet fruit is recognised as an
apple. Hence each feature individually contributes to identifying it as an apple without
depending on the others (a small numeric sketch of this assumption follows below).
Bayes: It is called Bayes because it depends on the principle of Bayes' theorem.
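The following is a small numeric illustration of the independence assumption (the numbers
are hypothetical, chosen only for this sketch): the joint likelihood of the features given the
class factorises into a product of per-feature likelihoods.
# hypothetical per-feature likelihoods given the class "apple"
p_red_given_apple = 0.8
p_spherical_given_apple = 0.9
p_sweet_given_apple = 0.7
# the naive assumption: multiply the individual likelihoods
p_features_given_apple = p_red_given_apple * p_spherical_given_apple * p_sweet_given_apple
print(p_features_given_apple)  # 0.504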
Bayes' Theorem:
o Bayes' theorem is also known as Bayes' Rule or Bayes' law, which is used to
determine the probability of a hypothesis with prior knowledge. It depends on the
conditional probability.
o The formula for Bayes' theorem is given as:
P(A|B) = P(B|A) · P(A) / P(B)
Where,
P(A|B) is Posterior probability: Probability of hypothesis A on the observed event B.
P(B|A) is Likelihood probability: Probability of the evidence given that the hypothesis is true.
P(A) is Prior Probability: Probability of the hypothesis before observing the evidence.
P(B) is Marginal Probability: Probability of the evidence.
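As a quick worked example of the formula (with hypothetical numbers): let A = "an email
is spam" and B = "the email contains the word 'offer'".
p_a = 0.20           # P(A): prior probability that an email is spam
p_b_given_a = 0.60   # P(B|A): likelihood of the word given spam
p_b = 0.25           # P(B): marginal probability of the word overall
p_a_given_b = p_b_given_a * p_a / p_b  # Bayes' theorem
print(p_a_given_b)   # 0.48 -> posterior probability of spam given the word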
Program: -
#importing libraries
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.naive_bayes import GaussianNB
from sklearn import metrics
from sklearn.model_selection import train_test_split #to split the dataset for training and testing
df= pd.read_csv("iris.csv") #importing dataset and making dataframe
df.head() #showing top 5 data entries
df.describe() #describes our data
df.info() #gives information about the columns
df.shape #tells us the number of rows and columns [rows, columns]
# Output: (150, 5)
print(df["variety"].value_counts())
sns.countplot(x="variety", data=df)
plt.figure(figsize=(8,4))
sns.heatmap(df.corr(numeric_only=True),annot=True,fmt=".0%") #heatmap of the correlation matrix calculated by df.corr() (numeric columns only)
plt.show()
# We'll use seaborn's FacetGrid to color the scatterplot by species
sns.FacetGrid(df, hue="variety", height=5).map(plt.scatter, "sepal.length", "sepal.width").add_legend()
X=df.iloc[:,0:4] #feature columns
Y=df["variety"] #target column
X_train,X_test,Y_train,Y_test=train_test_split(X,Y,test_size=0.4,random_state=0) #split the main data into train and test
# the attribute test_size=0.4 splits the data into a 60%/40% ratio: train=60% and test=40%
print("Train Shape",X_train.shape)
print("Test Shape",X_test.shape)
# Output:
# Train Shape (90, 4)
# Test Shape (60, 4)
#Creating the Naive Bayes classifier model
gnb = GaussianNB()
gnb.fit(X_train, Y_train)
# Output: GaussianNB(priors=None, var_smoothing=1e-09)
# making predictions on the testing set
y_pred = gnb.predict(X_test)
print("Gaussian Naive Bayes model accuracy(in %):", metrics.accuracy_score(Y_test,
y_pred)*100)
Gaussian Naive Bayes model accuracy(in %): 93.33333333333333
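As an extra check on the fitted model (a sketch, not part of the original output), one can ask
GaussianNB for the per-class probabilities of a single new flower measurement:
sample = pd.DataFrame([[5.1, 3.5, 1.4, 0.2]], columns=X_train.columns) # hypothetical measurements
print(gnb.predict(sample))        # predicted variety
print(gnb.predict_proba(sample))  # per-class probabilities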
Experiment 7
Identify a data set for executing the Decision Tree algorithm to implement
using python and analyse the same with cross validation and percentage
split.
Decision Tree Algorithm
• Decision Tree is a Supervised learning technique that can be used for both
classification and Regression problems, but mostly it is preferred for solving
Classification problems.
• It is a tree-structured classifier, where internal nodes represent the features of a
dataset, branches represent the decision rules and each leaf node represents the
outcome.
• In a Decision tree, there are two nodes, which are the Decision Node and Leaf Node.
• Decision nodes are used to make any decision and have multiple branches, whereas
Leaf nodes are the output of those decisions and do not contain any further branches.
• The decisions or the test are performed on the basis of features of the given dataset.
• It is a graphical representation for getting all the possible solutions to a
problem/decision based on given conditions.
• It is called a decision tree because, similar to a tree, it starts with the root node, which
expands on further branches and constructs a tree-like structure.
A decision tree simply asks a question and, based on the answer (Yes/No), it further splits
the tree into subtrees.
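The following is a minimal self-contained sketch (toy data, separate from the program
below) showing how a fitted tree encodes its root node, decision nodes, and leaf nodes as
rules:
from sklearn.tree import DecisionTreeClassifier, export_text

X_toy = [[0, 0], [1, 0], [0, 1], [1, 1]]  # two binary features
y_toy = [0, 1, 0, 1]                      # the label follows the first feature
tree = DecisionTreeClassifier().fit(X_toy, y_toy)
print(export_text(tree, feature_names=["feature_0", "feature_1"]))
# the printed rules show the root split, the branches, and the leaf outcomes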
Program: -
# Load libraries
import pandas as pd
import seaborn as sns
from sklearn.tree import DecisionTreeClassifier # Import Decision Tree Classifier
from sklearn.model_selection import train_test_split # Import train_test_split function
from sklearn import metrics #Import scikit-learn metrics module for accuracy calculation
# load dataset
pima = pd.read_csv("diabetes.csv")
pima.head()
pima.describe()
pima.info()
print(pima["Outcome"].value_counts())
sns.countplot(x="Outcome", data=pima)
#split dataset in features and target variable
feature_cols = ['Pregnancies', 'Glucose', 'BloodPressure',
'SkinThickness','Insulin','BMI','DiabetesPedigreeFunction','Age']
X = pima[feature_cols] # Features
y = pima.Outcome # Target variable
# Split dataset into training set and test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1) # 70% training and 30% test
# Create Decision Tree classifer object
clf = DecisionTreeClassifier()
# Train Decision Tree Classifer
clf = clf.fit(X_train,y_train)
#Predict the response for test dataset
y_pred = clf.predict(X_test)
# Model Accuracy, how often is the classifier correct?
print("Accuracy:",metrics.accuracy_score(y_test, y_pred))
# Output: Accuracy: 0.70995670995671
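The experiment also asks for an analysis with cross validation in addition to the percentage
split above. A short sketch using sklearn's cross_val_score on the same features and target:
from sklearn.model_selection import cross_val_score

scores = cross_val_score(DecisionTreeClassifier(random_state=1), X, y, cv=5) # 5-fold CV
print("5-fold CV accuracies:", scores)
print("Mean CV accuracy:", scores.mean())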
Experiment 8
Identify / prepare a data set for executing the K-Means algorithm. Implement K-
Means using python. Do a proper analysis of the result by visualizing the
clusters and by changing K.
What is K-Means Algorithm?
• K-Means Clustering is an Unsupervised Learning algorithm, which groups the
unlabeled dataset into different clusters.
• Here K defines the number of pre-defined clusters that need to be created in the
process, as if K=2, there will be two clusters, and for K=3, there will be three clusters,
and so on.
• It allows us to cluster the data into different groups and provides a convenient way to
discover the categories of groups in an unlabeled dataset on its own, without the need
for any training.
• It is a centroid-based algorithm, where each cluster is associated with a centroid.
• The main aim of this algorithm is to minimize the sum of distances between the data
point and their corresponding clusters.
• The algorithm takes the unlabeled dataset as input, divides the dataset into k
clusters, and repeats the process until it finds the best clusters.
• The value of k should be predetermined in this algorithm.
• The k-means clustering algorithm mainly performs two tasks:
• Determines the best value for K center points or centroids by an iterative process.
• Assigns each data point to its closest k-center. The data points that are near a
particular k-center form a cluster.
Hence each cluster has data points with some commonalities, and it is away from other
clusters.
The working of the K-Means algorithm is explained in the below steps:
• Step-1: Select the number K to decide the number of clusters.
• Step-2: Select K random points as centroids (they may be points other than those
from the input dataset).
• Step-3: Assign each data point to its closest centroid, which will form the
predefined K clusters.
• Step-4: Calculate the variance and place a new centroid for each cluster.
• Step-5: Repeat the third step, which means reassign each data point to the new
closest centroid of its cluster.
• Step-6: If any reassignment occurs, then go to step-4 else go to FINISH.
• Step-7: The model is ready.
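The following is a minimal NumPy sketch of the steps above (illustrative only; the program
at the end of this experiment uses sklearn's KMeans instead):
import numpy as np

def kmeans_sketch(points, k, n_iter=100, seed=42):
    rng = np.random.default_rng(seed)
    # Step-2: pick K random points from the data as the initial centroids
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Step-3: assign each point to its closest centroid
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step-4: recompute each centroid as the mean of its cluster
        # (keep the old centroid if a cluster happens to be empty)
        new_centroids = np.array([points[labels == j].mean(axis=0)
                                  if np.any(labels == j) else centroids[j]
                                  for j in range(k)])
        # Step-6: stop when no centroid moves any more
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids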
How to choose the value of "K number of clusters" in K-means Clustering?
• The performance of the K-means clustering algorithm depends upon highly efficient
clusters that it forms.
• But choosing the optimal number of clusters is a big task.
• There are some different ways to find the optimal number of clusters, but here we are
discussing the most appropriate method to find the number of clusters or value of K.
Elbow Method
• The Elbow method is one of the most popular ways to find the optimal number of
clusters. This method uses the concept of WCSS value. WCSS stands for Within
Cluster Sum of Squares, which defines the total variations within a cluster.
• The formula to calculate the value of WCSS (for 3 clusters) is:
WCSS = Σ(Pi in Cluster1) distance(Pi, C1)² + Σ(Pi in Cluster2) distance(Pi, C2)²
+ Σ(Pi in Cluster3) distance(Pi, C3)²
• When WCSS is plotted against the number of clusters, the graph shows a sharp bend
that looks like an elbow, which is why this approach is known as the elbow method.
Program: -
# importing libraries
import numpy as nm
import matplotlib.pyplot as mtp
import pandas as pd
from sklearn.cluster import KMeans
# Importing the dataset
dataset = pd.read_csv('Mall_Customers.csv')
print(dataset.head())
x = dataset.iloc[:, [3, 4]].values
wcss_list= [] #Initializing the list for the values of WCSS
#Using a for loop to iterate K from 1 to 10
for i in range(1, 11):
    kmeans = KMeans(n_clusters=i, init='k-means++', random_state= 42)
    kmeans.fit(x)
    wcss_list.append(kmeans.inertia_)
mtp.plot(range(1, 11), wcss_list)
mtp.title('The Elbow Method Graph')
mtp.xlabel('Number of clusters(k)')
mtp.ylabel('wcss_list')
mtp.show()
#training the K-means model on a dataset
kmeans = KMeans(n_clusters=5, init='k-means++', random_state= 42)
y_predict=kmeans.fit_predict(x)
#visualizing the clusters
mtp.scatter(x[y_predict == 0, 0], x[y_predict == 0, 1], s = 100, c = 'blue', label = 'Cluster 1') #first cluster
mtp.scatter(x[y_predict == 1, 0], x[y_predict == 1, 1], s = 100, c = 'green', label = 'Cluster 2') #second cluster
mtp.scatter(x[y_predict == 2, 0], x[y_predict == 2, 1], s = 100, c = 'red', label = 'Cluster 3') #third cluster
mtp.scatter(x[y_predict == 3, 0], x[y_predict == 3, 1], s = 100, c = 'cyan', label = 'Cluster 4') #fourth cluster
mtp.scatter(x[y_predict == 4, 0], x[y_predict == 4, 1], s = 100, c = 'magenta', label = 'Cluster 5') #fifth cluster
mtp.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1], s = 300, c = 'yellow', label = 'Centroid')
mtp.title('Clusters of customers')
mtp.xlabel('Annual Income (k$)')
mtp.ylabel('Spending Score (1-100)')
mtp.legend(loc='lower center')
mtp.show()
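The experiment also asks to analyse the result by changing K. One way to do this (a sketch
that reuses the feature matrix x from the program above) is to compare silhouette scores for
several values of K; higher scores indicate better-separated clusters:
from sklearn.metrics import silhouette_score

for k in range(2, 8):
    km = KMeans(n_clusters=k, init='k-means++', random_state=42)
    labels = km.fit_predict(x)
    print("K =", k, "-> silhouette score =", round(silhouette_score(x, labels), 3))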