ML Lab Manual
Enrolment No:
Name of Student:_
________________ ______________________
Head of Department Faculty
(Dr. B. J. Makwana)
Index
Sr. No. | Name of Experiment | Page No. | Date | Sign
1 | Introduction to Python Programming for Machine Learning
2 | Generate a synthetic data set and use the linear regression technique to develop a model, and evaluate it on test samples.
3 | Write a program for Logistic Regression to classify IRIS data for two features (sepal length and width).
4 | Write a program for the concept of decision tree to develop a piecewise linear model and test it as well.
5 | Write a program for the kNN algorithm for classification of the IRIS dataset.
6 | Write a program using the PCA algorithm for dimensionality reduction on the Olivetti dataset, and follow it with the kNN algorithm for face recognition.
7 | Write a program using the Bayes algorithm for email classification (spam or non-spam) for the open-sourced data set from the UC Irvine Machine Learning Repository.
8 | Write a program using SVM on the IRIS dataset and carry out classification.
9 | Write a program using the SVM algorithm on the Boston house price prediction dataset to predict the price of houses from certain features.
Experiment No. 1
Theory :
Important Libraries
1. Pandas
Pandas is an open-source Python Library used for high-performance data manipulation and
data analysis using its powerful data structures.
With the help of Pandas, we can accomplish the following five steps in data processing:
• Load
• Prepare
• Manipulate
• Model
• Analyze
Key Features of Pandas
• Fast and efficient DataFrame object with default and customized indexing.
• Tools for loading data into in-memory data objects from different file formats.
• Data alignment and integrated handling of missing data.
• Reshaping and pivoting of data sets.
• Label-based slicing, indexing and subsetting of large data sets.
• Columns from a data structure can be deleted or inserted.
• Group-by functionality for aggregation and transformations.
• High performance merging and joining of data.
• Time Series functionality.
Pandas provides two primary data structures:
• Series
• DataFrame
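A small illustrative snippet covering a few of the features listed above (the column names are made up for demonstration):
import pandas as pd

# A tiny DataFrame with one missing value
df = pd.DataFrame({'student': ['Asha', 'Ravi', 'Meera'],
                   'marks': [85, None, 92]})
print(df['marks'].mean())                             # missing value (NaN) is skipped automatically
df['marks'] = df['marks'].fillna(df['marks'].mean())  # integrated handling of missing data
print(df.sort_values('marks'))                        # label-based manipulation of the data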
The SciPy library of Python is built to work with NumPy arrays and provides many user-friendly and efficient numerical routines, such as routines for numerical integration and optimization. Together, they run on all popular operating systems, are quick to install and are free of charge. NumPy and SciPy are easy to use, yet powerful enough to be relied on by some of the world's leading scientists and engineers.
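A short sketch of the NumPy/SciPy workflow described above, using scipy.integrate.quad for numerical integration:
import numpy as np
from scipy import integrate

x = np.linspace(0, np.pi, 100)                      # NumPy array of sample points
area, abs_error = integrate.quad(np.sin, 0, np.pi)  # numerical integration of sin(x) over [0, pi]
print(area)                                         # approximately 2.0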
2. Scikit-learn
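Scikit-learn provides ready-made implementations of the machine-learning algorithms used in the later experiments (linear and logistic regression, decision trees, kNN, naive Bayes, SVM, PCA) behind a uniform fit/predict interface. A minimal sketch, using the Iris data loaded through the library itself:
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
model = KNeighborsClassifier(n_neighbors=3)  # any estimator exposes the same fit/predict interface
model.fit(X, y)
print(model.predict(X[:5]))                  # predicted classes for the first five samples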
3. Matplotlib
• Matplotlib is a Python library used to create 2D graphs and plots by using Python scripts.
• It has a module named pyplot which makes plotting easy by providing features to control line styles, font properties, axis formatting, etc.
• It supports a very wide variety of graphs and plots, namely histograms, bar charts, power spectra, error charts, etc.
• It is used along with NumPy to provide an environment that is an effective open-source alternative to MATLAB. It can also be used with graphics toolkits like PyQt and wxPython.
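A short pyplot sketch of the kind of plot used throughout the experiments:
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-np.pi, np.pi, 100)
plt.plot(x, np.sin(x), color='black', linestyle='--')  # line-style control via pyplot
plt.xlabel('x')
plt.ylabel('sin(x)')
plt.title('A simple pyplot example')
plt.show()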
Hands on :
Output :
Conclusion:
Experiment No. 2
Problem Statement:
Generate a synthetic data set using the following function, and split it into training, validation, and testing sample points. Use the linear regression technique to develop a model, and evaluate it on test samples. ℵ is Gaussian noise.
y = x/2 + sin(x) + ℵ
Steps:
1. Import libraries
2. Prepare data: First we will prepare some data for demonstrating linear regression. To keep things simple we will assume we have a single input feature. Let us use the following function to generate our data:
y = x/2 + sin(x) + ℵ
3. Split the dataset into training, validation and test sets.
4. Evaluate the model.
Program:
1. Import libraries
import numpy as np
from sklearn import linear_model, datasets, tree
import matplotlib.pyplot as plt
%matplotlib inline
2. Prepare data: First we will prepare some data for demonstrating linear regression. To keep things simple we will assume we have a single input feature. Let us use the following function to generate our data:
y = x/2 + sin(x) + ℵ
number_of_samples = 100
x = np.linspace(-np.pi, np.pi, number_of_samples)
y = 0.5*x+np.sin(x)+np.random.random(x.shape) #noise term (np.random.randn would give the Gaussian noise ℵ of the problem statement)
plt.scatter(x,y,color='black') #Plot y-vs-x in dots
plt.xlabel('x-input feature')
plt.ylabel('y-target values')
plt.title('Fig 1: Data for linear regression')
plt.show()
3. Split the dataset into training, validation and test sets: It is always encouraged in machine learning to split the available data into training, validation and test sets. The training set is used to train the model. The model is evaluated on the validation set after every episode of training. The performance on the validation set gives a measure of how well the model generalizes. Various hyperparameters of the model are tuned to improve performance on the validation set. Finally, when the model is completely optimized and ready for deployment, it is evaluated on the test data and the performance is reported in the final description of the model.
In this example we do a 70%-15%-15% random split of the data between the training, validation and test sets respectively.
random_indices = np.random.permutation(number_of_samples)
#Training set
x_train = x[random_indices[:70]]
y_train = y[random_indices[:70]]
#Validation set
x_val = x[random_indices[70:85]]
y_val = y[random_indices[70:85]]
#Test set
x_test = x[random_indices[85:]]
y_test = y[random_indices[85:]]
Fit a line to the data
Linear regression learns to fit a hyperplane to our data in the feature space. For one-dimensional data, the hyperplane reduces to a straight line. We will fit a line to our data using sklearn.linear_model.LinearRegression.
#sklearn takes the inputs as matrices. Hence we reshape the arrays into column matrices
x_train_for_line_fitting = np.matrix(x_train.reshape(len(x_train),1))
y_train_for_line_fitting = np.matrix(y_train.reshape(len(y_train),1))
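The fitting step itself is not shown in the listing; a minimal sketch using sklearn.linear_model.LinearRegression, followed by a plot of the fitted line:
model = linear_model.LinearRegression()
model.fit(x_train_for_line_fitting, y_train_for_line_fitting)
plt.scatter(x_train, y_train, color='black')
plt.plot(x, np.asarray(model.predict(np.matrix(x.reshape(len(x),1)))).ravel(), color='blue')
plt.xlabel('x-input feature')
plt.ylabel('y-target values')
plt.title('Line fitted to the training data')
plt.show()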
Now that we have our model ready, we must evaluate it. In a linear regression scenario, it is common to evaluate the model in terms of the mean squared error on the validation and test sets.
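A minimal sketch of that evaluation:
val_predictions = np.asarray(model.predict(np.matrix(x_val.reshape(len(x_val),1)))).ravel()
test_predictions = np.asarray(model.predict(np.matrix(x_test.reshape(len(x_test),1)))).ravel()
mean_val_error = np.mean((y_val - val_predictions)**2)
mean_test_error = np.mean((y_test - test_predictions)**2)
print('Validation MSE:', mean_val_error)
print('Test MSE:', mean_test_error)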
Output :
Conclusion :
Experiment No. 3
Problem Statement:
Write a program for Logistic Regression to classify IRIS data for two features
(sepal length and width).
Iris Dataset:
• The Iris flower data set consists of 50 samples from each of three species of Iris Flowers — Iris Setosa,
Iris Virginica and Iris Versicolor .
• The Iris flower data set was introduced by the British statistician and biologist Ronald Fisher in his 1936
paper “The use of multiple measurements in taxonomic problems”.
• Iris data is a multivariate data set.
• Four features measured from each sample are —sepal length, sepal width, petal length and petal width, in
centimeters.
• It consists of a set of 150 records under 5 attributes — Sepal length, Sepal width, Petal length, Petal
width and Class-Labels(Species).
Objective:
Given the sepal length, sepal width, petal length and petal width, classify the Iris flower into one of the
three species — Setosa, Virginica and Versicolor.
Steps:
1. Import libraries
2. Prepare data
3. Evaluate the model.
Program:
1. Import libraries
import numpy as np
from sklearn import linear_model, datasets, tree
import matplotlib.pyplot as plt
%matplotlib inline
2. Prepare data: The data has 4 input-features and 3 output-classes. For simplicity we will
use only two features: sepal-length and sepal-width (both in cm) and two output
classes: Iris Setosa and Iris Versicolour.
iris = datasets.load_iris()
X = iris.data[:,:2] #Choosing only the first two input-features
Y = iris.target
#The first 50 samples are class 0 and the next 50 samples are class 1
X = X[:100]
Y = Y[:100]
number_of_samples = len(Y)
#Splitting into training, validation and test sets
random_indices = np.random.permutation(number_of_samples)
#Training set
num_training_samples = int(number_of_samples*0.7)
x_train = X[random_indices[:num_training_samples]]
y_train = Y[random_indices[:num_training_samples]]
#Validation set
num_validation_samples = int(number_of_samples*0.15)
x_val = X[random_indices[num_training_samples : num_training_samples+num_validation_samples]]
y_val = Y[random_indices[num_training_samples : num_training_samples+num_validation_samples]]
#Test set
num_test_samples = int(number_of_samples*0.15)
x_test = X[random_indices[-num_test_samples:]]
y_test = Y[random_indices[-num_test_samples:]]
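The class-wise arrays used for the scatter plot below are not defined in the listing; a minimal sketch, assuming they are taken from the training data:
X_class0 = x_train[y_train == 0]  # Iris Setosa training points
X_class1 = x_train[y_train == 1]  # Iris Versicolour training points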
plt.scatter([X_class0[:,0]],[X_class0[:,1]],color='red')
plt.scatter([X_class1[:,0]], [X_class1[:,1]],color='blue')
plt.xlabel('sepal length')
plt.ylabel('sepal width')
plt.legend(['class 0','class 1'])
plt.title('Fig 3: Visualization of training data')
plt.show()
Now we fit a linear decision boundary through the feature space that separates the two classes well.
We use sklearn.linear_model.LogisticRegression.
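The fitting of the classifier and the mesh grid used to visualise the decision boundary are not shown in the listing; a minimal sketch (the 0.02 step size is an illustrative choice):
model = linear_model.LogisticRegression()
model.fit(x_train, y_train)
x_min, x_max = X[:, 0].min() - 0.5, X[:, 0].max() + 0.5
y_min, y_max = X[:, 1].min() - 0.5, X[:, 1].max() + 0.5
xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.02), np.arange(y_min, y_max, 0.02))
Z = model.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)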
plt.pcolormesh(xx, yy, Z, cmap=plt.cm.Paired)
plt.show()
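The predictions evaluated below are assumed to be obtained from the fitted model on the validation and test sets:
validation_set_predictions = model.predict(x_val)
test_set_predictions = model.predict(x_test)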
validation_misclassification_percentage = 0
for i in range(len(validation_set_predictions)):
    if validation_set_predictions[i] != y_val[i]:
        validation_misclassification_percentage += 1
validation_misclassification_percentage *= 100/len(y_val)
print ('validation misclassification percentage =', validation_misclassification_percentage, '%')
test_misclassification_percentage = 0
for i in range(len(test_set_predictions)):
    if test_set_predictions[i] != y_test[i]:
        test_misclassification_percentage += 1
test_misclassification_percentage *= 100/len(y_test)
print ('test misclassification percentage =', test_misclassification_percentage, '%')
Output :
Conclusion :
Experiment No. 4
Problem Statement:
For the synthetic dataset used in experiment 2, write a program for the concept
of decision tree to develop a piecewise linear model and test it as well.
Steps:
1. Import libraries
2. Prepare data
3. Split the data into training, validation and test sets
4. Fit model
5. Evaluate the model.
Program:
1. Import libraries
import numpy as np
from sklearn import linear_model, datasets, tree
import matplotlib.pyplot as plt
%matplotlib inline
2. Prepare data:
number_of_samples = 100
x = np.linspace(-np.pi, np.pi, number_of_samples)
y = 0.5*x+np.sin(x)+np.random.random(x.shape)
plt.scatter(x,y,color='black') #Plot y-vs-x in dots
plt.xlabel('x-input feature')
plt.ylabel('y-target values')
plt.title('Fig 5: Data for linear regression')
plt.show()
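The split into training, validation and test sets is not repeated in the listing; it is assumed to be the same 70%-15%-15% random split used in Experiment 2:
random_indices = np.random.permutation(number_of_samples)
x_train, y_train = x[random_indices[:70]], y[random_indices[:70]]
x_val, y_val = x[random_indices[70:85]], y[random_indices[70:85]]
x_test, y_test = x[random_indices[85:]], y[random_indices[85:]]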
maximum_depth_of_tree = np.arange(10)+1
train_err_arr = []
val_err_arr = []
test_err_arr = []
#sklearn takes the inputs as matrices. Hence we reshape the arrays into column matrices
x_train_for_line_fitting = np.matrix(x_train.reshape(len(x_train),1))
y_train_for_line_fitting = np.matrix(y_train.reshape(len(y_train),1))
for depth in maximum_depth_of_tree:
    model = tree.DecisionTreeRegressor(max_depth=depth)
    model.fit(x_train_for_line_fitting, y_train_for_line_fitting)
    mean_train_error = np.mean((y_train - model.predict(x_train.reshape(len(x_train),1)).ravel())**2)
    mean_val_error = np.mean((y_val - model.predict(x_val.reshape(len(x_val),1)).ravel())**2)
    mean_test_error = np.mean((y_test - model.predict(x_test.reshape(len(x_test),1)).ravel())**2)
    train_err_arr.append(mean_train_error)
    val_err_arr.append(mean_val_error)
    test_err_arr.append(mean_test_error)
    print('Maximum depth:', depth, '\nTraining MSE:', mean_train_error, '\nValidation MSE:', mean_val_error, '\nTest MSE:', mean_test_error)
plt.figure()
plt.plot(train_err_arr,c='red')
plt.plot(val_err_arr,c='blue')
plt.plot(test_err_arr,c='green')
plt.legend(['Training error', 'Validation error', 'Test error'])
plt.title('Variation of error with maximum depth of tree')
plt.show()
Output :
Conclusion :
Experiment No. 5
Problem Statement:
Write a program for the kNN algorithm for classification of the IRIS dataset.
Iris Dataset:
• The Iris flower data set consists of 50 samples from each of three species of Iris Flowers — Iris Setosa,
Iris Virginica and Iris Versicolor .
• The Iris flower data set was introduced by the British statistician and biologist Ronald Fisher in his 1936
paper “The use of multiple measurements in taxonomic problems”.
• Iris data is a multivariate data set.
• Four features measured from each sample are —sepal length, sepal width, petal length and petal width, in
centimeters.
• It consists of a set of 150 records under 5 attributes — Sepal length, Sepal width, Petal length, Petal
width and Class-Labels(Species).
Objective:
Given the sepal length, sepal width, petal length and petal width, classify the Iris flower into one of the
three species — Setosa, Virginica and Versicolor.
Program:
1. Import Libraries
import numpy as np
from sklearn import datasets, neighbors, linear_model, tree
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier
from sklearn.datasets import load_iris, fetch_olivetti_faces
from sklearn.model_selection import train_test_split
from sklearn.decomposition import PCA as RandomizedPCA
from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix
import matplotlib.pyplot as plt
from time import time
%matplotlib inline
2. Prepare dataset
First we will prepare the dataset. The dataset we choose is a modified version of the Iris dataset. We
choose only the first two input feature dimensions viz sepal-length and sepal-width (both in cm) for
ease of visualization.
iris = load_iris()
X = iris.data[:,:2] #Choosing only the first two input-features
Y = iris.target
number_of_samples = len(Y)
print(number_of_samples)
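The random split into training and test sets is not shown in the listing; a minimal sketch, assuming a 70%-30% split:
random_indices = np.random.permutation(number_of_samples)
num_training_samples = int(number_of_samples*0.7)
#Training set
x_train = X[random_indices[:num_training_samples]]
y_train = Y[random_indices[:num_training_samples]]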
#Test set
x_test = X[random_indices[num_training_samples:]]
y_test = Y[random_indices[num_training_samples:]]
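The class-wise arrays used for the scatter plots are not defined in the listing; a minimal sketch, assuming they are built from the training data:
X_class0 = x_train[y_train == 0]
X_class1 = x_train[y_train == 1]
X_class2 = x_train[y_train == 2]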
plt.scatter([X_class0[:,0]],[ X_class0[:,1]],color='red')
plt.scatter([X_class1[:,0]],[ X_class1[:,1]],color='blue')
plt.scatter([X_class2[:,0]], [X_class2[:,1]],color='green')
plt.xlabel('sepal length')
plt.ylabel('sepal width')
plt.legend(['class 0','class 1','class 2'])
plt.title('Fig 1: Visualization of training data')
plt.show()
Note that the first class is linearly separable from the other two classes but the second and
third classes are not linearly separable from each other.
Now that our training data is ready we will jump right into the classification task. Just to remind you, the K-nearest neighbor algorithm is a non-parametric learning algorithm and does not learn a parameterized function that maps the input to the output. Rather, it looks up the training set every time it is asked to classify a point and finds the K nearest neighbors of the query point. The class of the majority of those neighbors is output as the class of the query point.
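The kNN model used for the prediction below is not created in the listing; a minimal sketch, assuming K = 5 (an illustrative choice) and scikit-learn's KNeighborsClassifier:
model = KNeighborsClassifier(n_neighbors=5)
model.fit(x_train, y_train)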
query_point = np.array([5.9,2.9])
true_class_of_query_point = 1
predicted_class_for_query_point = model.predict([query_point])
print("Query point: {}".format(query_point))
print("True class of query point: {}".format(true_class_of_query_point))
query_point.shape
neighbors_object = neighbors.NearestNeighbors(n_neighbors=10)
neighbors_object.fit(x_train)
distances_of_nearest_neighbors, indices_of_nearest_neighbors_of_query_point = neighbors_object.kneighbors([query_point])
nearest_neighbors_of_query_point = x_train[indices_of_nearest_neighbors_of_query_point[0]]
print("The query point is: {}\n".format(query_point))
print("The nearest neighbors of the query point are:\n{}\n".format(nearest_neighbors_of_query_point))
print("The classes of the nearest neighbors are: {}\n".format(y_train[indices_of_nearest_neighbors_of_query_point[0]]))
print("Predicted class for query point: {}".format(predicted_class_for_query_point[0]))
plt.scatter([X_class0[:,0]], [X_class0[:,1]],color='red')
plt.scatter([X_class1[:,0]], [X_class1[:,1]],color='blue')
plt.scatter([X_class2[:,0]], [X_class2[:,1]],color='green')
plt.scatter(query_point[0], query_point[1],marker='^',s=75,color='black')
plt.scatter(nearest_neighbors_of_query_point[:,0], nearest_neighbors_of_query_point[:,1], marker='s', s=150, color='yellow', alpha=0.30)
plt.xlabel('sepal length')
plt.ylabel('sepal width')
plt.legend(['class 0','class 1','class 2'])
plt.title('Fig 3: Working of the K-NN classification algorithm')
plt.show()
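The helper evaluate_performance computes the misclassification percentage on the test set, one test point at a time:
def evaluate_performance(model, x_test, y_test):
    test_set_predictions = [model.predict(x_test[i].reshape((1, len(x_test[i]))))[0] for i in range(x_test.shape[0])]
    test_misclassification_percentage = 0
    for i in range(len(test_set_predictions)):
        if test_set_predictions[i] != y_test[i]:
            test_misclassification_percentage += 1
    test_misclassification_percentage *= 100/len(y_test)
    return test_misclassification_percentage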
print("Evaluating K-NN classifier:")
test_err = evaluate_performance(model, x_test, y_test)
print('test misclassification percentage = {}%'.format(test_err))
Output :
Conclusion :
Experiment No. 6
Problem Statement:
Write a program using the PCA algorithm for dimensionality reduction on the Olivetti dataset, and follow it with the kNN algorithm for face recognition.
Program:
1. Import Libraries
import numpy as np
from sklearn import datasets, neighbors, linear_model, tree
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier
from sklearn.datasets import load_iris, fetch_olivetti_faces
from sklearn.model_selection import train_test_split
from sklearn.decomposition import PCA as RandomizedPCA
from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix
import matplotlib.pyplot as plt
from time import time
%matplotlib inline
2. Prepare dataset
First we will prepare the dataset. The dataset we choose is a modified version of the Iris dataset. We
choose only the first two input feature dimensions viz sepal-length and sepal-width (both in cm) for
ease of visualization.
iris = load_iris()
X = iris.data[:,:2] #Choosing only the first two input-features
Y = iris.target
number_of_samples = len(Y)
print(number_of_samples)
#Test set
x_test = X[random_indices[num_training_samples:]]
y_test = Y[random_indices[num_training_samples:]]
plt.scatter([X_class0[:,0]],[ X_class0[:,1]],color='red')
plt.scatter([X_class1[:,0]],[ X_class1[:,1]],color='blue')
plt.scatter([X_class2[:,0]], [X_class2[:,1]],color='green')
plt.xlabel('sepal length')
plt.ylabel('sepal width')
plt.legend(['class 0','class 1','class 2'])
plt.title('Fig 1: Visualization of training data')
plt.show()
Note that the first class is linearly separable from the other two classes but the second and
third classes are not linearly separable from each other.
Now that our training data is ready we will jump right into the classification task. Just to remind you, the K-nearest neighbor algorithm is a non-parametric learning algorithm and does not learn a parameterized function that maps the input to the output. Rather, it looks up the training set every time it is asked to classify a point and finds the K nearest neighbors of the query point. The class of the majority of those neighbors is output as the class of the query point.
Let's see how the algorithm works. We choose the first point in the test set as our query point.
query_point = np.array([5.9,2.9])
true_class_of_query_point = 1
predicted_class_for_query_point = model.predict([query_point])
print("Query point: {}".format(query_point))
print("True class of query point: {}".format(true_class_of_query_point))
query_point.shape
neighbors_object = neighbors.NearestNeighbors(n_neighbors=10)
neighbors_object.fit(x_train)
distances_of_nearest_neighbors, indices_of_nearest_neighbors_of_query_point = neighbors_object.kneighbors([query_point])
nearest_neighbors_of_query_point = x_train[indices_of_nearest_neighbors_of_query_point[0]]
print("The query point is: {}\n".format(query_point))
print("The nearest neighbors of the query point are:\n{}\n".format(nearest_neighbors_of_query_point))
print("The classes of the nearest neighbors are: {}\n".format(y_train[indices_of_nearest_neighbors_of_query_point[0]]))
print("Predicted class for query point: {}".format(predicted_class_for_query_point[0]))
plt.scatter([X_class0[:,0]], [X_class0[:,1]],color='red')
plt.scatter([X_class1[:,0]], [X_class1[:,1]],color='blue')
plt.scatter([X_class2[:,0]], [X_class2[:,1]],color='green')
plt.scatter(query_point[0], query_point[1],marker='^',s=75,color='black')
plt.scatter(nearest_neighbors_of_query_point[:,0], nearest_neighbors_of_query_point[:,1], marker='s', s=150, color='yellow', alpha=0.30)
plt.xlabel('sepal length')
plt.ylabel('sepal width')
plt.legend(['class 0','class 1','class 2'])
plt.title('Fig 3: Working of the K-NN classification algorithm')
plt.show()
def evaluate_performance(model, x_test, y_test):
    test_set_predictions = [model.predict(x_test[i].reshape((1, len(x_test[i]))))[0] for i in range(x_test.shape[0])]
    test_misclassification_percentage = 0
    for i in range(len(test_set_predictions)):
        if test_set_predictions[i] != y_test[i]:
            test_misclassification_percentage += 1
    test_misclassification_percentage *= 100/len(y_test)
    return test_misclassification_percentage
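A minimal sketch of the PCA + kNN face-recognition pipeline on the Olivetti faces described in the problem statement, assuming 150 principal components and K = 5 (both illustrative choices) and reusing the imports listed above:
faces = fetch_olivetti_faces()
X_faces = faces.data    # 400 face images of 40 subjects, each flattened to 4096 pixel features
y_faces = faces.target  # subject labels 0-39
X_train_f, X_test_f, y_train_f, y_test_f = train_test_split(X_faces, y_faces, test_size=0.25, random_state=42)
pca = PCA(n_components=150, whiten=True)   # dimensionality reduction: 4096 -> 150 features
X_train_pca = pca.fit_transform(X_train_f)
X_test_pca = pca.transform(X_test_f)
knn = KNeighborsClassifier(n_neighbors=5)  # kNN classification on the reduced features
knn.fit(X_train_pca, y_train_f)
y_pred = knn.predict(X_test_pca)
print(classification_report(y_test_f, y_pred))
print(confusion_matrix(y_test_f, y_pred))
Whitening the principal components is a common choice before distance-based classifiers such as kNN; the number of components and K can be tuned for better accuracy.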
Output :
Conclusion :
Experiment No. 7
Problem Statement:
Write a program using the Bayes algorithm for email classification (spam or non-spam) for the open-sourced data set from the UC Irvine Machine Learning Repository.
Program:
import numpy as np
from sklearn.model_selection import train_test_split
datafile = open('C:/Users/AntennaPC/Desktop/spambase.data','r')
data = []
for line in datafile:
    line = [float(element) for element in line.rstrip('\n').split(',')]
    data.append(np.asarray(line))
num_features = 48
X = [data[i][:num_features] for i in range(len(data))]
y = [int(data[i][-1]) for i in range(len(data))]
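The train/test split and the per-class partition of the training emails are not shown in the listing; a minimal sketch, assuming an 80%-20% split:
X_train, X_test, y_train, y_test = train_test_split(np.asarray(X), np.asarray(y), test_size=0.2, random_state=0)
X_train_class_0 = X_train[y_train == 0]  # non-spam training emails
X_train_class_1 = X_train[y_train == 1]  # spam training emails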
num_class_0 = float(len(X_train_class_0))
num_class_1 = float(len(X_train_class_1))
prior_probability_class_0 = num_class_0/(num_class_0 + num_class_1)
prior_probability_class_1 = num_class_1/(num_class_0 + num_class_1)
log_prior_class_0 = np.log10(prior_probability_class_0)
log_prior_class_1 = np.log10(prior_probability_class_1)
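The likelihood function whose return statement appears next is not defined in the listing; a minimal sketch, assuming a per-feature Gaussian likelihood estimated from the class-wise training data (base-10 logarithms, to match the priors above):
def calculate_log_likelihoods_with_naive_bayes(feature_vector, Class=0):
    # Per-class feature means and variances; a small constant avoids division by zero
    X_class = X_train_class_0 if Class == 0 else X_train_class_1
    mu = np.mean(X_class, axis=0)
    var = np.var(X_class, axis=0) + 1e-6
    # Naive independence assumption: sum the per-feature Gaussian log10-likelihoods
    log_likelihood = np.sum(-0.5*np.log10(2*np.pi*var) - ((feature_vector - mu)**2/(2*var))*np.log10(np.e))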
    return log_likelihood
def calculate_class_posteriors(feature_vector):
    log_likelihood_class_0 = calculate_log_likelihoods_with_naive_bayes(feature_vector, Class=0)
    log_likelihood_class_1 = calculate_log_likelihoods_with_naive_bayes(feature_vector, Class=1)
    log_posterior_class_0 = log_likelihood_class_0 + log_prior_class_0
    log_posterior_class_1 = log_likelihood_class_1 + log_prior_class_1
    return log_posterior_class_0, log_posterior_class_1
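The decision rule classify_spam used below is not in the listing; a minimal sketch that labels an email as spam (class 1) when its log-posterior is larger:
def classify_spam(feature_vector):
    log_posterior_class_0, log_posterior_class_1 = calculate_class_posteriors(feature_vector)
    return 1 if log_posterior_class_1 > log_posterior_class_0 else 0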
#Predict spam or not on the test set
predictions = []
for email in X_test:
    predictions.append(classify_spam(email))
for i in range(100):
    print(predictions[i], y_test[i])
Output :
Conclusion :
Experiment No. 8
Problem Statement:
Write a program using SVM on IRIS dataset and carry out classification.
Iris Dataset:
• The Iris flower data set consists of 50 samples from each of three species of Iris Flowers — Iris Setosa,
Iris Virginica and Iris Versicolor .
• The Iris flower data set was introduced by the British statistician and biologist Ronald Fisher in his 1936
paper “The use of multiple measurements in taxonomic problems”.
• Iris data is a multivariate data set.
• Four features measured from each sample are —sepal length, sepal width, petal length and petal width, in
centimeters.
• It consists of a set of 150 records under 5 attributes — Sepal length, Sepal width, Petal length, Petal
width and Class-Labels(Species).
Objective:
Given the sepal length, sepal width, petal length and petal width, classify the Iris flower into one of the
three species — Setosa, Virginica and Versicolor.
Program:
1. Import Libraries
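The import statements are not reproduced in the listing; the code below assumes:
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets, svm
from sklearn.model_selection import train_test_split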
2. Prepare dataset
iris = datasets.load_iris()
X = iris.data[:,:2]
y = iris.target
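The split into training and test sets is not shown; a minimal sketch, assuming a 75%-25% random split using the train_test_split imported above:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)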
3. Use Support Vector Machine with different kinds of kernels and evaluate
performance
def evaluate_on_test_data(model=None):
    predictions = model.predict(X_test)
    correct_classifications = 0
    for i in range(len(y_test)):
        if predictions[i] == y_test[i]:
            correct_classifications += 1
    accuracy = 100*correct_classifications/len(y_test)  # Accuracy as a percentage
    return accuracy
kernels = ('linear', 'poly', 'rbf')
accuracies = []
for index, kernel in enumerate(kernels):
    model = svm.SVC(kernel=kernel)
    model.fit(X_train, y_train)
    acc = evaluate_on_test_data(model)
    accuracies.append(acc)
    print("{} % accuracy obtained with kernel = {}".format(acc, kernel))
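The decision-boundary plot below relies on a fitted classifier clf and a mesh grid xx, yy that the listing does not define; a minimal sketch, assuming the linear-kernel model and a 0.02 grid step:
clf = svm.SVC(kernel='linear').fit(X_train, y_train)
x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.02), np.arange(y_min, y_max, 0.02))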
Z = clf.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
plt.pcolormesh(xx, yy, Z, cmap=plt.cm.Paired)
plt.show()
5. Check the support vectors
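A minimal way to inspect them, assuming the fitted linear-kernel model clf from the sketch above:
print("Number of support vectors per class:", clf.n_support_)
print("Support vectors:\n", clf.support_vectors_)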
Output :
Conclusion :
Experiment No. 9
Problem Statement:
Write a program using SVM algorithm for Boston house price prediction dataset to predict
price of houses from certain features.
Program:
1. Import Libraries
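The import and data-preparation steps are not reproduced in the listing; a minimal sketch, assuming an older scikit-learn release in which load_boston is still available (it was removed in scikit-learn 1.2) and a 75%-25% split:
import numpy as np
from sklearn import svm
from sklearn.datasets import load_boston              # available in scikit-learn < 1.2
from sklearn.model_selection import train_test_split

boston = load_boston()
X = boston.data    # 13 features describing each house/neighbourhood
y = boston.target  # median house price
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)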
3. Use Support Vector Machine with different kinds of kernels and evaluate
performance
def evaluate_on_test_data(model=None):
    predictions = model.predict(X_test)
    sum_of_squared_error = 0
    for i in range(len(y_test)):
        err = (predictions[i] - y_test[i])**2
        sum_of_squared_error += err
    mean_squared_error = sum_of_squared_error/len(y_test)
    RMSE = np.sqrt(mean_squared_error)
    return RMSE
kernels = ('linear', 'rbf')
RMSE_vec = []
for index, kernel in enumerate(kernels):
    model = svm.SVR(kernel=kernel)
    model.fit(X_train, y_train)
    RMSE = evaluate_on_test_data(model)
    RMSE_vec.append(RMSE)
    print("RMSE={} obtained with kernel = {}".format(RMSE, kernel))
Output :
Conclusion :