Machine Learning Lab Manual
M.Kalipatti, Mettur (TK), Salem (DT) – 636453
BONAFIDE CERTIFICATE
Name : …………………………………………………………
Degree : …………………………………………………………
Branch : …………………………………………………………
Reg.No. : …………………………………………………………
Certified that this is the bonafide record of the work done by the above
student in ........................................................................ Laboratory
during the academic year …………………………………
INTERNAL EXAMINER                                EXTERNAL EXAMINER
LAB MANNERS
Students must be present in proper dress code and wear the ID card.
Students should enter the log-in and log-out time in the log register without
fail.
Students are not allowed to download pictures, music, videos
or files without the permission of respective lab in-charge.
Students should wear their own lab coats and bring observation notebooks to
the laboratory classes regularly.
Record of experiments done in a particular class should be submitted in the next
lab class.
Students who do not submit the record notebook in time will not be allowed to
do the next experiment and will not be given attendance for that laboratory
class.
Students will not be allowed to leave the laboratory until they complete the
experiment.
Students are advised to switch off the monitors and CPUs when they leave the
lab.
Students are advised to arrange the chairs properly when they leave the lab.
College
VISION
To emerge as a frontline institution of professional learning by inculcating quality
standards in technical education, a high standard of discipline, and leadership qualities to
meet global standards.
MISSION
We dedicate and commit ourselves to achieving, sustaining and fostering unmatched
excellence in technical education through global standards, to enrich the domain of
engineering, and to nurture in students the talents and ethical values needed to meet
challenging technology for the betterment of the nation.
Department
VISION
To promote innovative research and consultancy through effectual teaching and
learning processes to develop emerging technology solutions for the benefit of industry and
society.
MISSION
Imparting quality, value-based technical education and producing technology
professionals with innovative thoughts and inspiring leadership skills.
Having rational thinking for design and development of cutting-edge products by
engaging with industry stakeholders to fulfill the global demands and standards.
Strengthening the core competence in the domain of Artificial Intelligence and Data
Science.
Enabling the graduates to adapt to the evolving technologies through strong
fundamentals and lifelong learning.
Program Outcomes (POs)
PO1: To apply knowledge of mathematics, science, engineering fundamentals and
computer science theory to solve complex problems in Computer Science and
Engineering.
Program Specific Outcomes (PSOs)
1. Evolve AI based efficient domain specific processes for effective decision making in
several domains such as business and governance.
2. Arrive at actionable foresight, insight, and hindsight from data for solving business and
engineering problems.
3. Create, select and apply the theoretical knowledge of AI and data analysis along with
practical industrial tools and techniques to manage and solve wicked societal problems.
4. Capable of developing data analysis, knowledge representation and knowledge
engineering, and hence capable of coordinating complex projects.
5. Able to carry out fundamental research to cater to the critical needs of society through
cutting edge technologies of AI.
AD3461 MACHINE LEARNING LABORATORY          L T P C
                                            0 0 4 2
COURSE OBJECTIVES:
To understand the data sets and apply suitable algorithms for selecting the
appropriate features for analysis.
To learn to implement supervised machine learning algorithms on standard
datasets and evaluate the performance.
To experiment with unsupervised machine learning algorithms on standard
datasets and evaluate the performance.
To build the graph based learning models for standard data sets.
To compare the performance of different ML algorithms and select the suitable
one based on the application.
LIST OF EXPERIMENTS:
1. For a given set of training data examples stored in a .CSV file, implement and demonstrate
the Candidate-Elimination algorithm to output a description of the set of all hypotheses
consistent with the training examples.
2. Write a program to demonstrate the working of the decision tree based ID3 algorithm. Use
an appropriate data set for building the decision tree and apply this knowledge to classify a new
sample.
3. Build an Artificial Neural Network by implementing the Backpropagation algorithm and test
the same using appropriate data sets.
4. Write a program to implement the naïve Bayesian classifier for a sample training data set
stored as a .CSV file and compute the accuracy with a few test data sets.
5. Implement naïve Bayesian Classifier model to classify a set of documents and measure the
accuracy, precision, and recall.
6. Write a program to construct a Bayesian network to diagnose CORONA infection using
standard WHO Data Set.
7. Apply EM algorithm to cluster a set of data stored in a .CSV file. Use the same data set for
clustering using the k-Means algorithm. Compare the results of these two algorithms.
8. Write a program to implement k-Nearest Neighbour algorithm to classify the iris data set.
Print both correct and wrong predictions.
9. Implement the non-parametric Locally Weighted Regression algorithm in order to fit data
points. Select an appropriate data set for your experiment and draw graphs.
COURSE OUTCOMES:
At the end of this course, the students will be able to:
CO1:Apply suitable algorithms for selecting the appropriate features for analysis.
CO2:Implement supervised machine learning algorithms on standard datasets and evaluate the
performance.
CO3:Apply unsupervised machine learning algorithms on standard datasets and evaluate the
performance.
CO4:Build the graph based learning models for standard data sets.
CO5:Assess and compare the performance of different ML algorithms and select the suitable
one based on the application.
CO's - PO's & PSO's MAPPING

CO's  PO1 PO2 PO3 PO4 PO5 PO6 PO7 PO8 PO9 PO10 PO11 PO12  PSO1 PSO2 PSO3
CO1    2   2   2   1   -   -   -   -   1   2    3    3     3    2    1
CO2    2   1   1   3   2   -   -   -   3   2    3    2     3    1    1
CO3    2   2   1   1   2   -   -   -   1   1    1    1     2    3    3
CO4    2   2   3   3   2   -   -   -   1   2    1    1     1    2    2
CO5    2   2   3   1   2   -   -   -   3   1    1    1     2    1    2
AVG    2   2   2   2   2   -   -   -   2   2    2    2     2    2    2

1 - LOW, 2 - MEDIUM, 3 - HIGH, '-' - NO CORRELATION
EX.NO | Date | Name of the Exercise | Pg.No | Date of Completion | Marks | Sign | Remarks
EX.No: 01 CANDIDATE-ELIMINATION Learning Algorithm
DATE:
AIM:
To implement and demonstrate, for a given set of training data examples stored in a .CSV file,
the Candidate-Elimination algorithm to output a description of the set of all hypotheses
consistent with the training examples.
The CANDIDATE-ELIMINATION algorithm computes the version space containing all hypotheses from H
that are consistent with an observed sequence of training examples. Initialize G to the set of
maximally general hypotheses in H and S to the set of maximally specific hypotheses in H. Then,
for each training example d:
• If d is a positive example
  • Remove from G any hypothesis inconsistent with d
  • For each hypothesis s in S that is not consistent with d:
    • Remove s from S
    • Add to S all minimal generalizations h of s such that h is consistent with d and some
      member of G is more general than h
    • Remove from S any hypothesis that is more general than another hypothesis in S
• If d is a negative example
  • Remove from S any hypothesis inconsistent with d
  • For each hypothesis g in G that is not consistent with d:
    • Remove g from G
    • Add to G all minimal specializations h of g such that h is consistent with d and some
      member of S is more specific than h
    • Remove from G any hypothesis that is less general than another hypothesis in G
(The generality test these steps rely on is sketched below.)
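The following is a minimal sketch (the helper name and data are our own, not part of the lab
program) of the "more general than or equal to" test used by the boundary updates above, for
attribute-vector hypotheses in which '?' matches any value:

# Hypotheses are attribute vectors; '?' matches anything.
def more_general(h1, h2):
    # True if h1 is more general than or equal to h2
    return all(a == '?' or a == b for a, b in zip(h1, h2))

print(more_general(['sunny', '?'], ['sunny', 'warm']))   # True
print(more_general(['rainy', '?'], ['sunny', 'warm']))   # False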
CANDIDATE-ELIMINATION algorithm using version spaces

Training Examples (the classic EnjoySport data set, which the program below assumes):

Sky   | AirTemp | Humidity | Wind   | Water | Forecast | EnjoySport
sunny | warm    | normal   | strong | warm  | same     | yes
sunny | warm    | high     | strong | warm  | same     | yes
rainy | cold    | high     | strong | warm  | change   | no
sunny | warm    | high     | strong | cool  | change   | yes
Program:
import numpy as np
import pandas as pd

# Load the training examples (file name assumed; point it at your own .CSV file)
data = pd.read_csv('enjoysport.csv')
concepts = np.array(data.iloc[:, 0:-1])
print(concepts)
target = np.array(data.iloc[:, -1])
print(target)

def learn(concepts, target):
    # S starts at the first positive example, G at the most general hypotheses
    specific_h = concepts[0].copy()
    print(specific_h)
    general_h = [["?" for i in range(len(specific_h))]
                 for i in range(len(specific_h))]
    print(general_h)
    for i, h in enumerate(concepts):
        if target[i] == "yes":          # positive example: generalize S
            for x in range(len(specific_h)):
                if h[x] != specific_h[x]:
                    specific_h[x] = '?'
                    general_h[x][x] = '?'
        if target[i] == "no":           # negative example: specialize G
            for x in range(len(specific_h)):
                if h[x] != specific_h[x]:
                    general_h[x][x] = specific_h[x]
                else:
                    general_h[x][x] = '?'
        print(specific_h)
        print(general_h)
    # drop the rows of G that stayed fully general
    indices = [i for i, val in enumerate(general_h)
               if val == ['?'] * len(specific_h)]
    for i in indices:
        general_h.remove(['?'] * len(specific_h))
    return specific_h, general_h

s_final, g_final = learn(concepts, target)
print("Final Specific_h:", s_final, sep="\n")
print("Final General_h:", g_final, sep="\n")
Output:
Final Specific_h:
Final General_h:
RESULT:
Thus the CANDIDATE-ELIMINATION algorithm was implemented for the given training examples and
the final specific and general hypothesis boundaries of the version space were displayed
successfully.
Ex.No:02 Decision tree based ID3 algorithm
DATE:
AIM:
Write a program to demonstrate the working of the decision tree based ID3 algorithm. Use an
appropriate data set for building the decision tree and apply this knowledge to classify a new sample.
ALGORITHM:
ID3 Algorithm: ID3(Examples, Target_attribute, Attributes)

Examples are the training examples. Target_attribute is the attribute whose value is to be
predicted by the tree. Attributes is a list of other attributes that may be tested by the
learned decision tree. Returns a decision tree that correctly classifies the given Examples.

• Create a Root node for the tree
• If all Examples are positive, Return the single-node tree Root, with label = +
• If all Examples are negative, Return the single-node tree Root, with label = -
• If Attributes is empty, Return the single-node tree Root, with label = most common value
  of Target_attribute in Examples
• Otherwise Begin
  • A ← the attribute from Attributes that best classifies Examples (highest information gain)
  • The decision attribute for Root ← A
  • For each possible value vi of A:
    • Add a new tree branch below Root, corresponding to the test A = vi
    • Let Examples_vi be the subset of Examples that have value vi for A
    • If Examples_vi is empty
      • Then below this new branch add a leaf node with label = most common value of
        Target_attribute in Examples
      • Else below this new branch add the subtree
        ID3(Examples_vi, Target_attribute, Attributes – {A})
• End
• Return Root
ENTROPY:
Entropy measures the impurity of a collection of examples. For a collection S with c classes,

    Entropy(S) = Σ_{i=1..c} −p_i log2(p_i)

where p_i is the proportion of S belonging to class i.

INFORMATION GAIN:
The information gain, Gain(S, A), of an attribute A relative to a collection of examples
S, is defined as

    Gain(S, A) = Entropy(S) − Σ_{v ∈ Values(A)} (|S_v| / |S|) · Entropy(S_v)

For example, for a collection with 9 positive and 5 negative examples,
Entropy(S) = −(9/14)log2(9/14) − (5/14)log2(5/14) ≈ 0.940.
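As a quick standalone check of these formulas (separate from the lab program), the snippet
below reproduces the textbook figures for a collection of 9 positive and 5 negative examples
that an attribute splits into subsets of 8 (6+/2−) and 6 (3+/3−):

import math

# Entropy of a binary collection and the gain of one attribute split
def H(pos, neg):
    total = pos + neg
    e = 0.0
    for c in (pos, neg):
        if c:
            p = c / total
            e -= p * math.log2(p)
    return e

S = H(9, 5)                                     # ≈ 0.940
gain = S - (8/14) * H(6, 2) - (6/14) * H(3, 3)  # ≈ 0.048
print(round(S, 3), round(gain, 3))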
Training Dataset: the program below assumes a PlayTennis-style categorical data set in
data3.csv, with the attribute names in the header row and the class label in the last column.
Program:
import math
import csv

def load_csv(filename):
    lines = csv.reader(open(filename, "r"))
    dataset = list(lines)
    headers = dataset.pop(0)
    return dataset, headers

class Node:
    def __init__(self, attribute):
        self.attribute = attribute
        self.children = []
        self.answer = ""

def subtables(data, col, delete):
    # partition the rows of data by the values found in column col
    dic = {}
    coldata = [row[col] for row in data]
    attr = list(set(coldata))
    counts = [0] * len(attr)
    r = len(data)
    c = len(data[0])
    for x in range(len(attr)):
        for y in range(r):
            if data[y][col] == attr[x]:
                counts[x] += 1
    for x in range(len(attr)):
        dic[attr[x]] = [[0 for i in range(c)] for j in
                        range(counts[x])]
        pos = 0
        for y in range(r):
            if data[y][col] == attr[x]:
                if delete:
                    del data[y][col]
                dic[attr[x]][pos] = data[y]
                pos += 1
    return attr, dic

def entropy(S):
    attr = list(set(S))
    if len(attr) == 1:      # a pure collection has zero entropy
        return 0
    counts = [0, 0]         # binary classification assumed, as in the data set
    for i in range(2):
        counts[i] = sum([1 for x in S if attr[i] == x]) / (len(S) * 1.0)
    sums = 0
    for cnt in counts:
        sums += -1 * cnt * math.log(cnt, 2)
    return sums

def compute_gain(data, col):
    attr, dic = subtables(data, col, delete=False)
    total_size = len(data)
    entropies = [0] * len(attr)
    ratio = [0] * len(attr)
    total_entropy = entropy([row[-1] for row in data])
    for x in range(len(attr)):
        ratio[x] = len(dic[attr[x]]) / (total_size * 1.0)
        entropies[x] = entropy([row[-1] for row in
                                dic[attr[x]]])
        total_entropy -= ratio[x] * entropies[x]
    return total_entropy

def build_tree(data, features):
    lastcol = [row[-1] for row in data]
    if (len(set(lastcol))) == 1:    # all examples share one label: leaf node
        node = Node("")
        node.answer = lastcol[0]
        return node
    n = len(data[0]) - 1
    gains = [0] * n
    for col in range(n):
        gains[col] = compute_gain(data, col)
    split = gains.index(max(gains))   # attribute with the highest information gain
    node = Node(features[split])
    fea = features[:split] + features[split + 1:]
    attr, dic = subtables(data, split, delete=True)
    for x in range(len(attr)):
        child = build_tree(dic[attr[x]], fea)
        node.children.append((attr[x], child))
    return node

def print_tree(node, level):
    if node.answer != "":
        print(" " * level, node.answer)
        return
    print(" " * level, node.attribute)
    for value, n in node.children:
        print(" " * (level + 1), value)
        print_tree(n, level + 2)

def classify(node, x_test, features):
    if node.answer != "":
        print(node.answer)
        return
    pos = features.index(node.attribute)
    for value, n in node.children:
        if x_test[pos] == value:
            classify(n, x_test, features)

'''Main program'''
dataset, features = load_csv("data3.csv")
node1 = build_tree(dataset, features)
print("The decision tree for the dataset using ID3 algorithm is")
print_tree(node1, 0)
testdata, features = load_csv("data3_test.csv")
for xtest in testdata:
    print("The test instance:", xtest)
    print("The label for test instance:", end="  ")
    classify(node1, xtest, features)
Output:
RESULT:
Thus the decision tree based ID3 algorithm was demonstrated, the tree was built from the
given dataset, and a new sample was classified successfully.
Ex.No:03 Back propagation algorithm
DATE:
AIM:
To build an Artificial Neural Network by implementing the Back propagation algorithm and test
the same using appropriate data sets.
ALGORITHM:
BACKPROPAGATION (training_examples, ƞ, n_in, n_hidden, n_out)

Each training example is a pair of the form (x⃗, t⃗), where x⃗ is the vector of network input
values and t⃗ is the vector of target network output values. ƞ is the learning rate (e.g.,
0.05); n_in is the number of network inputs, n_hidden the number of units in the hidden
layer, and n_out the number of output units. The input from unit i into unit j is denoted
x_ji, and the weight from unit i to unit j is denoted w_ji.

• Create a feed-forward network with n_in inputs, n_hidden hidden units, and n_out output
  units.
• Initialize all network weights to small random numbers.
• Until the termination condition is met, Do for each (x⃗, t⃗) in training_examples:
  1. Input the instance x⃗ to the network and compute the output o_u of every unit u in the
     network.
  2. For each output unit k, calculate its error term δ_k = o_k(1 − o_k)(t_k − o_k).
  3. For each hidden unit h, calculate its error term δ_h = o_h(1 − o_h) Σ_k w_kh δ_k.
  4. Update each network weight: w_ji ← w_ji + ƞ δ_j x_ji.
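To make the update rule concrete, here is one hand-worked gradient step for a single sigmoid
unit; the inputs, target, weights and learning rate are made up for illustration, and the full
network program follows:

import numpy as np

# One gradient step for a single sigmoid unit (illustrative values; lr = 0.5)
x = np.array([1.0, 0.5])        # inputs x_ji
t = 1.0                         # target output
w = np.array([0.2, -0.3])       # weights w_ji
o = 1 / (1 + np.exp(-(w @ x)))  # step 1: forward pass, o = sigmoid(w . x)
delta = o * (1 - o) * (t - o)   # step 2: output error term
w = w + 0.5 * delta * x         # step 4: w_ji <- w_ji + lr * delta * x_ji
print(o, delta, w)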
PROGRAM:
import numpy as np

# Training data from the standard lab version of this program:
# two inputs (hours slept, hours studied) and one output (test score)
X = np.array(([2, 9], [1, 5], [3, 6]), dtype=float)
y = np.array(([92], [86], [89]), dtype=float)
X = X / np.amax(X, axis=0)   # normalize inputs column-wise
y = y / 100

#Sigmoid Function
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

#Derivative of sigmoid (x is already sigmoid(x))
def derivatives_sigmoid(x):
    return x * (1 - x)

#Variable initialization
epoch = 5000                 #Setting training iterations
lr = 0.1                     #Setting learning rate
inputlayer_neurons = 2
hiddenlayer_neurons = 3
output_neurons = 1

#Weight and bias initialization
wh = np.random.uniform(size=(inputlayer_neurons, hiddenlayer_neurons))
bh = np.random.uniform(size=(1, hiddenlayer_neurons))
wout = np.random.uniform(size=(hiddenlayer_neurons, output_neurons))
bout = np.random.uniform(size=(1, output_neurons))

for i in range(epoch):
    #Forward Propagation
    hinp1 = np.dot(X, wh)
    hinp = hinp1 + bh
    hlayer_act = sigmoid(hinp)
    outinp1 = np.dot(hlayer_act, wout)
    outinp = outinp1 + bout
    output = sigmoid(outinp)
    #Backpropagation
    EO = y - output
    outgrad = derivatives_sigmoid(output)
    d_output = EO * outgrad
    EH = d_output.dot(wout.T)
    hiddengrad = derivatives_sigmoid(hlayer_act)
    d_hiddenlayer = EH * hiddengrad
    wout += hlayer_act.T.dot(d_output) * lr
    wh += X.T.dot(d_hiddenlayer) * lr

print("Input: \n" + str(X))
print("Actual Output: \n" + str(y))
print("Predicted Output: \n", output)
Output:
RESULT:
Thus an Artificial Neural Network was built by implementing the Backpropagation algorithm
and tested using an appropriate data set successfully.
Ex.No:04 Naive Bayesian classifier
DATE:
AIM:
To write a program to implement the naïve Bayesian classifier for a sample training data set
stored as a .CSV file. Compute the accuracy of the classifier, considering few test data sets.
ALGORITHM:
Bayes' Theorem is stated as:

    P(h|D) = P(D|h) P(h) / P(D)

Where,
P(h|D) is the probability of hypothesis h given the data D. This is called the posterior
probability.
P(D|h) is the probability of data D given that the hypothesis h was true.
P(h) is the probability of hypothesis h being true. This is called the prior probability of h.
P(D) is the probability of the data. This is called the prior probability of D.

After calculating the posterior probability for a number of different hypotheses h, we are
interested in finding the most probable hypothesis h ∈ H given the observed data D. Any such
maximally probable hypothesis is called a maximum a posteriori (MAP) hypothesis, hMAP. It is
found by using Bayes theorem to calculate the posterior probability of each candidate
hypothesis:

    hMAP = argmax_{h ∈ H} P(D|h) P(h)

(Ignoring P(D) since it is a constant)
A Gaussian Naive Bayes algorithm is a special type of Naïve Bayes algorithm. It is
specifically used when the features have continuous values, and it is assumed that all the
features follow a Gaussian (normal) distribution.

We calculate the probabilities for input values for each class using a frequency. With
real-valued inputs, we can calculate the mean and standard deviation of input values (x) for
each class to summarize the distribution. This means that in addition to the probabilities
for each class, we must also store the mean and standard deviation of each input variable
for each class.

The probability density function for the normal distribution is defined by two parameters,
the mean μ and the standard deviation σ:

    f(x) = (1 / (σ √(2π))) · exp(−(x − μ)² / (2σ²))

Calculating the mean and standard deviation of each input variable for each class summarizes
the distribution; the density function then gives the likelihood of a new value x under each
class.
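For instance, the density can be evaluated with a few standalone lines of Python (the numbers
are assumed for illustration); this is exactly the per-attribute likelihood that the lab
program below multiplies together:

import math

# Likelihood of x under a class summarized by (mean, stdev)
def gaussian_pdf(x, mean, stdev):
    exponent = math.exp(-((x - mean) ** 2) / (2 * stdev ** 2))
    return (1 / (math.sqrt(2 * math.pi) * stdev)) * exponent

# e.g. likelihood of x = 71.5 for a class with mean 73.0 and stdev 6.2
print(gaussian_pdf(71.5, 73.0, 6.2))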
Example: Refer the link
http://chem-eng.utoronto.ca/~datamining/dmc/naive_bayesian.htm
Examples:
The data set used in this program is the Pima Indians Diabetes problem.
This data set is comprised of 768 observations of medical details for Pima Indian
patients. The records describe instantaneous measurements taken from the patient such
as their age, the number of times pregnant and blood workup. All patients are women
aged 21 or older. All attributes are numeric, and their units vary from attribute to
attribute.
Each record has a class value that indicates whether the patient suffered an onset of
diabetes within 5 years of when the measurements were taken (1) or not (0).
Sample Examples:
PROGRAM:
import csv
import random
import math

def loadcsv(filename):
    lines = csv.reader(open(filename, "r"))
    dataset = list(lines)
    for i in range(len(dataset)):
        # convert every attribute from string to float
        dataset[i] = [float(x) for x in dataset[i]]
    return dataset

def splitdataset(dataset, splitratio):
    trainsize = int(len(dataset) * splitratio)
    trainset = []
    copy = list(dataset)
    while len(trainset) < trainsize:
        #generate indices for the dataset list randomly to pick ele for
        #training data
        index = random.randrange(len(copy))
        trainset.append(copy.pop(index))
    return [trainset, copy]

def separatebyclass(dataset):
    separated = {}   # class values are the keys, instances the values
    for i in range(len(dataset)):
        vector = dataset[i]
        if (vector[-1] not in separated):
            separated[vector[-1]] = []
        separated[vector[-1]].append(vector)
    return separated

def mean(numbers):
    return sum(numbers) / float(len(numbers))

def stdev(numbers):
    avg = mean(numbers)
    variance = sum([pow(x - avg, 2) for x in
                    numbers]) / float(len(numbers) - 1)
    return math.sqrt(variance)

def summarize(dataset):
    summaries = [(mean(attribute), stdev(attribute)) for
                 attribute in zip(*dataset)]
    del summaries[-1]    # exclude the class label column
    return summaries

def summarizebyclass(dataset):
    separated = separatebyclass(dataset)
    #print(separated)
    summaries = {}
    for classvalue, instances in separated.items():
        #summarize is used to cal to mean and std
        summaries[classvalue] = summarize(instances)
    return summaries

def calculateprobability(x, mean, stdev):
    exponent = math.exp(-(math.pow(x - mean, 2) /
                          (2 * math.pow(stdev, 2))))
    return (1 / (math.sqrt(2 * math.pi) * stdev)) * exponent

def calculateclassprobabilities(summaries, inputvector):
    probabilities = {}
    for classvalue, classsummaries in summaries.items():
        probabilities[classvalue] = 1
        for i in range(len(classsummaries)):
            mean, stdev = classsummaries[i]
            x = inputvector[i]   # the test vector's attribute value is passed
            probabilities[classvalue] *= calculateprobability(x, mean, stdev)
    return probabilities

def predict(summaries, inputvector):
    probabilities = calculateclassprobabilities(summaries,
                                                inputvector)
    bestLabel, bestProb = None, -1
    for classvalue, probability in probabilities.items():
        #assigns that class which has the highest prob
        if bestLabel is None or probability > bestProb:
            bestProb = probability
            bestLabel = classvalue
    return bestLabel

def getpredictions(summaries, testset):
    predictions = []
    for i in range(len(testset)):
        result = predict(summaries, testset[i])
        predictions.append(result)
    return predictions

def getaccuracy(testset, predictions):
    correct = 0
    for i in range(len(testset)):
        if testset[i][-1] == predictions[i]:
            correct += 1
    return (correct / float(len(testset))) * 100.0

def main():
    filename = 'naivedata.csv'
    splitratio = 0.67
    dataset = loadcsv(filename)
    trainingset, testset = splitdataset(dataset, splitratio)
    print('Split {0} rows into train={1} and test={2} rows'.format(
        len(dataset), len(trainingset), len(testset)))
    # prepare model
    summaries = summarizebyclass(trainingset)
    #print(summaries)
    # test model
    predictions = getpredictions(summaries, testset)
    accuracy = getaccuracy(testset, predictions)
    print('Accuracy of the classifier is : {0}%'.format(accuracy))

main()
Output:
RESULT:
Thus the program to implement the naïve Bayesian classifier for a sample training data set
stored as a .CSV file and to compute the accuracy of the classifier on a few test data sets
has been executed successfully.
Ex.No:05 Naive Bayesian Classifier model
DATE:
AIM:
Assuming a set of documents that need to be classified, to use the naïve Bayesian
Classifier model to perform this task. Built-in Java classes/API can be used to write the
program. Calculate the accuracy, precision, and recall for your data set.
ALGORITHM:
LEARN_NAIVE_BAYES_TEXT (Examples, V)

Examples is a set of text documents along with their target values. V is the set of all
possible target values. This function learns the probability terms P(wk|vj), describing the
probability that a randomly drawn word from a document in class vj will be the English word
wk. It also learns the class prior probabilities P(vj).

CLASSIFY_NAIVE_BAYES_TEXT (Doc)

Returns the estimated target value for the document Doc, considering each word position in
Doc that contains a token found in the learned vocabulary. A minimal sketch of both
procedures is given below.
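The sketch assumes Laplace (add-one) smoothing and toy documents of our own; the lab
program below uses scikit-learn's MultinomialNB instead:

import math
from collections import Counter

def learn_naive_bayes_text(docs, labels):
    vocab = {w for d in docs for w in d.split()}
    prior, cond = {}, {}
    for v in set(labels):
        texts = [d for d, l in zip(docs, labels) if l == v]
        prior[v] = len(texts) / len(docs)                  # P(vj)
        counts = Counter(w for d in texts for w in d.split())
        n = sum(counts.values())
        cond[v] = {w: (counts[w] + 1) / (n + len(vocab))   # P(wk|vj), add-one smoothed
                   for w in vocab}
    return prior, cond

def classify_naive_bayes_text(doc, prior, cond):
    # log-space scores; unknown words are skipped
    scores = {v: math.log(prior[v]) +
                 sum(math.log(cond[v][w]) for w in doc.split() if w in cond[v])
              for v in prior}
    return max(scores, key=scores.get)

prior, cond = learn_naive_bayes_text(["good movie", "bad movie"], ["pos", "neg"])
print(classify_naive_bayes_text("good film movie", prior, cond))   # -> pos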
Data set:
Program:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn import metrics

msg = pd.read_csv('naivetext.csv', names=['message', 'label'])
msg['labelnum'] = msg.label.map({'pos': 1, 'neg': 0})
X = msg.message
y = msg.labelnum
print(X)
print(y)

#splitting the dataset into train and test data
xtrain, xtest, ytrain, ytest = train_test_split(X, y)

#building the document-term matrix
count_vect = CountVectorizer()
xtrain_dtm = count_vect.fit_transform(xtrain)
xtest_dtm = count_vect.transform(xtest)
print(count_vect.get_feature_names())   # get_feature_names_out() in newer scikit-learn
df = pd.DataFrame(xtrain_dtm.toarray(),
                  columns=count_vect.get_feature_names())

#training the multinomial naive Bayes classifier
clf = MultinomialNB().fit(xtrain_dtm, ytrain)
predicted = clf.predict(xtest_dtm)

#printing accuracy, Confusion matrix, Precision and Recall
print('Accuracy of the classifier is',
      metrics.accuracy_score(ytest, predicted))
print('Confusion matrix')
print(metrics.confusion_matrix(ytest, predicted))
print('The value of Precision',
      metrics.precision_score(ytest, predicted))
print('The value of Recall',
      metrics.recall_score(ytest, predicted))
Output:
RESULT:
Thus, assuming a set of documents that need to be classified, the naïve Bayesian Classifier
model was used to perform this task, and the accuracy, precision, and recall for the data
set were calculated successfully.
Ex.No:06 Bayesian network
DATE:
AIM:
To write a program to construct a Bayesian network considering medical data. Use this model to
demonstrate the diagnosis of heart patients using standard Heart Disease Data Set. You can use
Java/Python ML library classes/API.
ALGORITHM:
Theory: A Bayesian network is a directed acyclic graph in which each edge corresponds to a
conditional dependency, and each node corresponds to a unique random variable.
Bayesian network consists of two major parts: a directed acyclic graph and a set of conditional
probability distributions
• The conditional probability distribution of a node (random variable) is defined for every possible
outcome of the preceding causal node(s). For illustration, consider the following example. Suppose we
attempt to turn on our computer, but the computer does not start (observation/evidence). We would
like to know which of the possible causes of computer failure is more likely. In this simplified illustration,
we assume only two possible causes of this misfortune: electricity failure and computer malfunction.
The corresponding directed acyclic graph, with an edge from each cause node into the
computer-failure node, is depicted in the figure below.
The goal is to calculate the posterior conditional probability distribution of each of the possible
unobserved causes given the observed evidence, i.e. P [Cause | Evidence]
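As an illustration, the two-cause example above can be sketched with pgmpy; the variable
names and all CPD values here are assumptions made for illustration only:

from pgmpy.models import BayesianNetwork        # named BayesianModel in older pgmpy
from pgmpy.factors.discrete import TabularCPD
from pgmpy.inference import VariableElimination

# Two possible causes, one observed effect
model = BayesianNetwork([('Electricity', 'Computer'), ('Malfunction', 'Computer')])
cpd_e = TabularCPD('Electricity', 2, [[0.9], [0.1]])     # 0 = ok, 1 = failed (assumed)
cpd_m = TabularCPD('Malfunction', 2, [[0.95], [0.05]])
cpd_c = TabularCPD('Computer', 2,
                   [[0.99, 0.10, 0.10, 0.01],            # P(Computer = starts | E, M)
                    [0.01, 0.90, 0.90, 0.99]],           # P(Computer = fails  | E, M)
                   evidence=['Electricity', 'Malfunction'], evidence_card=[2, 2])
model.add_cpds(cpd_e, cpd_m, cpd_c)
model.check_model()

infer = VariableElimination(model)
# P[Cause | Evidence]: which cause is more likely given the computer failed?
print(infer.query(['Electricity'], evidence={'Computer': 1}))
print(infer.query(['Malfunction'], evidence={'Computer': 1}))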
Data Set: Title: Heart Disease Databases The Cleveland database contains 76 attributes, but all published
experiments refer to using a subset of 14 of them. In particular, the Cleveland database is the only one
that has been used by ML researchers to this date. The "Heartdisease" field refers to the presence of
heart disease in the patient. It is integer valued from 0 (no presence) to 4.
Attribute Information (excerpt):
3. cp: chest pain type
• Value 4: asymptomatic
7. restecg: resting electrocardiographic results
• Value 0: normal
• Value 2: showing probable or definite left ventricular hypertrophy by Estes'
criteria
11.slope: the slope of the peak exercise ST segment
• Value 1: upsloping
• Value 2: flat
• Value 3: downsloping
Program:
import numpy as np
import pandas as pd
import csv
from pgmpy.estimators import MaximumLikelihoodEstimator
from pgmpy.models import BayesianModel   # named BayesianNetwork in recent pgmpy
from pgmpy.inference import VariableElimination

#read the Cleveland heart disease data
heartDisease = pd.read_csv('heart.csv')
heartDisease = heartDisease.replace('?', np.nan)

#display the data
print('Sample instances from the dataset are given below')
print(heartDisease.head())

print('\n Attributes and datatypes')
print(heartDisease.dtypes)

#model the Bayesian network structure
model = BayesianModel([('age','heartdisease'),('sex','heartdisease'),
    ('exang','heartdisease'),('cp','heartdisease'),
    ('heartdisease','restecg'),('heartdisease','chol')])

#learn the CPDs using maximum likelihood estimation
print('\nLearning CPD using Maximum likelihood estimators')
model.fit(heartDisease, estimator=MaximumLikelihoodEstimator)

#inference with the Bayesian network
print('\n Inferencing with Bayesian Network:')
HeartDiseasetest_infer = VariableElimination(model)

print('\n 1. Probability of HeartDisease given evidence= restecg :1')
q1 = HeartDiseasetest_infer.query(variables=['heartdisease'],
                                  evidence={'restecg': 1})
print(q1)

print('\n 2. Probability of HeartDisease given evidence= cp:2')
q2 = HeartDiseasetest_infer.query(variables=['heartdisease'],
                                  evidence={'cp': 2})
print(q2)
Output:
RESULT:
Thus the program to construct a Bayesian network considering medical data and to use this
model to demonstrate the diagnosis of heart patients using the standard Heart Disease Data
Set has been executed successfully.
Ex.No:07 EM algorithm
DATE:
AIM:
To apply EM algorithm to cluster a set of data stored in a .CSV file. Use the same data set for
clustering using k-Means algorithm. Compare the results of these two algorithms and comment on the
quality of clustering. You can add Java/Python ML library classes/API in the program.
PROGRAM:
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture
from sklearn import datasets, preprocessing
import sklearn.metrics as sm
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

iris = datasets.load_iris()
X = pd.DataFrame(iris.data)
X.columns = ['Sepal_Length','Sepal_Width','Petal_Length','Petal_Width']
y = pd.DataFrame(iris.target)
y.columns = ['Targets']

# K-Means clustering
model = KMeans(n_clusters=3)
model.fit(X)

plt.figure(figsize=(14,7))
colormap = np.array(['red', 'lime', 'black'])

# Plot the real classification
plt.subplot(1, 2, 1)
plt.scatter(X.Petal_Length, X.Petal_Width, c=colormap[y.Targets], s=40)
plt.title('Real Classification')
plt.xlabel('Petal Length')
plt.ylabel('Petal Width')

# Plot the K-Means classification
plt.subplot(1, 2, 2)
plt.scatter(X.Petal_Length, X.Petal_Width, c=colormap[model.labels_], s=40)
plt.title('K-Means Classification')
plt.xlabel('Petal Length')
plt.ylabel('Petal Width')

# Standardize the data before fitting the Gaussian mixture (EM)
scaler = preprocessing.StandardScaler()
scaler.fit(X)
xsa = scaler.transform(X)
xs = pd.DataFrame(xsa, columns=X.columns)
#xs.sample(5)

gmm = GaussianMixture(n_components=3)
gmm.fit(xs)
y_gmm = gmm.predict(xs)
#y_cluster_gmm

# Plot the EM (GMM) classification in its own figure
plt.figure(figsize=(7,7))
plt.scatter(X.Petal_Length, X.Petal_Width, c=colormap[y_gmm], s=40)
plt.title('GMM Classification')
plt.xlabel('Petal Length')
plt.ylabel('Petal Width')
plt.show()
OUTPUT:
RESULT:
Thus the EM algorithm was applied to cluster a set of data stored in a .CSV file, the same
data set was clustered using the k-Means algorithm, and the results of the two algorithms
were compared and the quality of clustering commented on successfully.
Ex.No:08 k-Nearest Neighbour algorithm
DATE:
AIM:
To write a program to implement k-Nearest Neighbour algorithm to classify the iris data set.
Print both correct and wrong predictions. Java/Python ML library classes can be used for this problem.
ALGORITHM:
1. Load the iris data set and split it into training and test sets.
2. Choose the number of neighbours k (here k = 5).
3. For each test instance, compute the distance (e.g., Euclidean) to every training instance.
4. Select the k nearest training instances and assign the class label by majority vote, as
   sketched below.
5. Compare the predictions with the actual labels and report correct and wrong predictions
   along with accuracy metrics.
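A minimal sketch of the distance-and-vote rule in step 4, on toy data of our own (the lab
program below uses scikit-learn's KNeighborsClassifier instead):

import numpy as np
from collections import Counter

def knn_predict(x, X_train, y_train, k=5):
    d = np.linalg.norm(X_train - x, axis=1)               # Euclidean distances
    nearest = np.argsort(d)[:k]                           # indices of the k nearest
    return Counter(y_train[nearest]).most_common(1)[0][0] # majority vote

X_train = np.array([[1.0, 1.1], [1.2, 0.9], [8.0, 8.2]])
y_train = np.array([0, 0, 1])
print(knn_predict(np.array([1.1, 1.0]), X_train, y_train, k=3))   # -> 0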
PROGRAM:
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import classification_report, confusion_matrix

""" Iris Plants Dataset: the dataset contains 150 instances (50 in each of three
classes); the four attributes describe the plant and the fifth column is
the Class.
"""
iris = datasets.load_iris()

""" The x variable contains the first four columns (the attributes) of the dataset
and y contains the class labels.
"""
x = iris.data
y = iris.target
print(x)
print(y)

""" Splits the dataset into 70% train data and 30% test data. This
means that out of total 150 records, the training set will contain
105 records and the test set contains 45 of those records.
"""
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3)

classifier = KNeighborsClassifier(n_neighbors=5)
classifier.fit(x_train, y_train)
y_pred = classifier.predict(x_test)

""" Prints each prediction, marking it correct or wrong.
"""
for i in range(len(y_test)):
    status = 'Correct' if y_test[i] == y_pred[i] else 'Wrong'
    print('Sample:', x_test[i], 'Actual:', y_test[i],
          'Predicted:', y_pred[i], status)

print('Confusion Matrix')
print(confusion_matrix(y_test, y_pred))
print('Accuracy Metrics')
print(classification_report(y_test, y_pred))
OUTPUT:
RESULT:
Thus the program to implement the k-Nearest Neighbour algorithm to classify the iris data
set and to print both correct and wrong predictions has been completed successfully.
Ex.No:09 Locally Weighted Regression algorithm
DATE:
AIM:
To implement the non-parametric Locally Weighted Regression algorithm in order to fit data
points. Select appropriate data set for your experiment and draw graphs.
ALGORITHM:
Loess/Lowess Regression: Loess regression is a nonparametric technique that uses local
weighted regression to fit a smooth curve through points in a scatter plot.

Lowess Algorithm: Locally weighted regression is a very powerful nonparametric model used
in statistical learning. The weights are given by a kernel function (k or w) which can be
chosen arbitrarily.

Algorithm
1. Read the Given data Sample to X and the curve (linear or non linear) to Y
2. Set the value for the smoothening parameter or free parameter, say τ
3. Set the bias / point of interest x0, which is a subset of X
4. Determine the weight matrix using the kernel w(x, x0) = exp(−(x − x0)² / (2τ²))
5. Determine the value of the model term parameter β using β(x0) = (XᵀWX)⁻¹ XᵀWY
6. Prediction = x0 · β(x0) (see the numeric check below)
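As a tiny numeric check of steps 4 to 6 (the three data points and τ are made up; the full
programs follow):

import numpy as np

# beta(x0) = (X^T W X)^(-1) X^T W Y evaluated at one point of interest
X = np.array([[1, 1.0], [1, 2.0], [1, 3.0]])   # bias column + x values
Y = np.array([1.2, 1.9, 3.2])
x0, tau = np.array([1, 2.0]), 0.5
w = np.exp(-((X[:, 1] - x0[1]) ** 2) / (2 * tau ** 2))   # step 4: kernel weights
W = np.diag(w)
beta = np.linalg.pinv(X.T @ W @ X) @ (X.T @ W @ Y)       # step 5
print(x0 @ beta)                                         # step 6: prediction at x = 2.0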
PROGRAM:
import numpy as np
from bokeh.plotting import figure, show
from bokeh.layouts import gridplot

def local_regression(x0, X, Y, tau):
    # add bias term
    x0 = np.r_[1, x0]
    X = np.c_[np.ones(len(X)), X]
    # fit model: normal equations with kernel information
    xw = X.T * radial_kernel(x0, X, tau)
    beta = np.linalg.pinv(xw @ X) @ xw @ Y
    # predict value
    return x0 @ beta

def radial_kernel(x0, X, tau):
    # kernel weights for prediction
    return np.exp(np.sum((X - x0) ** 2, axis=1) / (-2 * tau * tau))

n = 1000
# generate dataset
X = np.linspace(-3, 3, num=n)
print("The Data Set (10 Samples) X :\n", X[1:10])
Y = np.log(np.abs(X ** 2 - 1) + .5)
print("The Fitting Curve Data Set (10 Samples) Y :\n", Y[1:10])
# jitter X
X += np.random.normal(scale=.1, size=n)

domain = np.linspace(-3, 3, num=300)

def plot_lwr(tau):
    # prediction through regression over the domain
    prediction = [local_regression(x0, X, Y, tau) for x0 in domain]
    plot = figure(width=400, height=400)   # plot_width/plot_height in older Bokeh
    plot.title.text = 'tau=%g' % tau
    plot.scatter(X, Y, alpha=.3)
    plot.line(domain, prediction, line_width=2, color='red')
    return plot

show(gridplot([
    [plot_lwr(10.), plot_lwr(1.)],
    [plot_lwr(0.1), plot_lwr(0.01)]]))
"""
Spyder Editor
"""
Page 52
import matplotlib
import pandas as pd
import numpy.linalg as np
m,n = np1.shape(xmat)
weights = np1.mat(np1.eye((m)))
for j in range(m):
weights[j,j] = np1.exp(diff*diff.T/(-2.0*k**2))
return weights
def localWeight(point,xmat,ymat,k):
wei = kernel(point,xmat,k)
W = (X.T*(wei*X)).I*(X.T*(wei*ymat.T))
return W
def localWeightRegression(xmat,ymat,k):
m,n = np1.shape(xmat)
ypred = np1.zeros(m)
for i in range(m):
ypred[i] = xmat[i]*localWeight(xmat[i],xmat,ymat,k)
return ypred
Page 53
# load data points
data = pd.read_csv('tips.csv')
bill = np1.array(data.total_bill)
tip = np1.array(data.tip)
mbill = np1.mat(bill)
m= np1.shape(mbill)[1]
one = np1.mat(np1.ones(m))
#print(X)
#set k here
ypred = localWeightRegression(X,mtip,0.3)
SortIndex = X[:,1].argsort(0)
xsort = X[SortIndex][:,0]
fig = plt.figure()
ax = fig.add_subplot(1,1,1)
ax.scatter(bill,tip, color='green')
plt.xlabel('Total bill')
plt.ylabel('Tip')
plt.show();
RESULT:
Thus the non-parametric Locally Weighted Regression algorithm was implemented to fit data
points, an appropriate data set was selected, and the graphs were drawn successfully.