DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
MACHINE LEARNING LAB
INTRODUCTION TO LAB:
Machine learning is used for everything from automating mundane tasks to offering intelligent insights,
and industries in every sector try to benefit from it. You may already be using a device that relies on it,
for example a wearable fitness tracker like Fitbit or an intelligent home assistant like Google Home.
There are many more examples of ML in use:
Prediction: Machine learning can be used in prediction systems. In a loan example, to compute
the probability of a default, the system needs to classify the available data into groups.
Image recognition: Machine learning can be used for face detection in an image. There
is a separate category for each person in a database of several people.
Speech recognition: This is the translation of spoken words into text. It is used in voice
searches and more. Voice user interfaces include voice dialing, call routing, and appliance
control. It can also be used for simple data entry and the preparation of structured documents.
Medical diagnosis: ML is trained to recognize cancerous tissues.
Financial industry and trading: Companies use ML in fraud investigations and credit checks.
Types of Machine Learning
Machine learning algorithms can be classified into 3 types:
1. Supervised Learning
2. Unsupervised Learning
3. Reinforcement Learning
Overview of Supervised Learning Algorithm
In supervised learning, an AI system is presented with labeled data, which means that each
data point is tagged with the correct label.
The goal is to approximate the mapping function so well that when you have new input data (x),
you can predict the output variable (Y) for that data.
For example, in a spam filter we first take some mails and mark them as ‘Spam’ or ‘Not
Spam’. This labeled data is used to train the supervised model.
Once the model is trained, we can test it on new mails and check whether it
predicts the right output.
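The train-then-test flow described above can be sketched in a few lines of plain Python. This is only a toy illustration: the four training mails, the word-count scoring rule, and the function names train and predict are all assumptions made for the example, not part of the lab programs.

```python
from collections import Counter

# Hypothetical labeled training mails: (text, label)
training_mails = [
    ("win a free prize now", "Spam"),
    ("free money offer", "Spam"),
    ("meeting agenda for monday", "Not Spam"),
    ("lunch at noon tomorrow", "Not Spam"),
]

def train(mails):
    """Count how often each word appears under each label."""
    counts = {"Spam": Counter(), "Not Spam": Counter()}
    for text, label in mails:
        counts[label].update(text.split())
    return counts

def predict(counts, text):
    """Pick the label whose training words overlap most with the new mail."""
    scores = {label: sum(c[w] for w in text.split()) for label, c in counts.items()}
    return max(scores, key=scores.get)

model = train(training_mails)
print(predict(model, "free prize inside"))         # Spam
print(predict(model, "agenda for lunch meeting"))  # Not Spam
```

The two test mails were not in the training data, which is the point of the test step: the model is judged on inputs it has not seen.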
Types of Supervised learning
Classification: A classification problem is when the output variable is a category, such as
“red” or “blue”, or “disease” or “no disease”.
Regression: A regression problem is when the output variable is a real value, such as
“dollars” or “weight”.
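The difference between the two output types can be seen side by side in a small sketch. The data values below are hypothetical, and the two methods chosen (1-nearest-neighbor for classification, an ordinary least-squares line for regression) are just simple representatives of each family.

```python
# Classification: the output is a category.
# Hypothetical data: (measurement, label)
labeled = [(1.0, "no disease"), (1.5, "no disease"), (3.5, "disease"), (4.0, "disease")]

def classify(x):
    """1-nearest-neighbor: copy the label of the closest training point."""
    nearest = min(labeled, key=lambda pair: abs(pair[0] - x))
    return nearest[1]

# Regression: the output is a real value.
# Hypothetical data: heights and the corresponding weights
heights = [60.0, 62.0, 64.0, 66.0]
weights = [110.0, 120.0, 130.0, 140.0]

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

slope, intercept = fit_line(heights, weights)
print(classify(3.2))             # a category: disease
print(slope * 65.0 + intercept)  # a real number: 135.0
```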
Overview of Unsupervised Learning Algorithm
In unsupervised learning, an AI system is presented with unlabeled, uncategorized data, and the
system’s algorithms act on the data without prior training. The output depends on the coded
algorithms. Subjecting a system to unsupervised learning is one way of testing AI.
Types of Unsupervised learning:
Clustering: A clustering problem is where you want to discover the inherent groupings in the
data, such as grouping customers by purchasing behavior.
Association: An association rule learning problem is where you want to discover rules that
describe large portions of your data, such as people that buy X also tend to buy Y.
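Both ideas can be illustrated without any labels. The sketch below is a minimal example under assumed data: the spend values, the shopping baskets, the starting centroids, and the helper names kmeans_1d and confidence are all hypothetical.

```python
# Clustering: group customers by (hypothetical) monthly spend using 1-D k-means.
spend = [10.0, 12.0, 11.0, 50.0, 52.0, 49.0]

def kmeans_1d(values, centroids, iterations=10):
    """Plain 1-D k-means: assign each value to its nearest centroid, then re-average."""
    for _ in range(iterations):
        groups = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            groups[nearest].append(v)
        # Move each centroid to the mean of its group (keep it if the group is empty)
        centroids = [sum(g) / len(g) if g else c for g, c in zip(groups, centroids)]
    return centroids

centroids = kmeans_1d(spend, centroids=[10.0, 50.0])
print(centroids)  # one low-spend group and one high-spend group

# Association: how often do people who buy X also buy Y?
baskets = [{"bread", "butter"}, {"bread", "butter", "milk"}, {"bread"}, {"milk"}]

def confidence(x, y):
    """Confidence of the rule x -> y: share of baskets with x that also contain y."""
    with_x = [b for b in baskets if x in b]
    return sum(1 for b in with_x if y in b) / len(with_x)

print(confidence("bread", "butter"))  # 2 of the 3 bread baskets also contain butter
```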
Overview of Reinforcement Learning
A reinforcement learning algorithm, or agent, learns by interacting with its environment. The agent
receives rewards for performing correctly and penalties for performing incorrectly. The agent learns
without intervention from a human by maximizing its reward and minimizing its penalty. It is closely
related to dynamic programming and trains algorithms using a system of reward and punishment.
For example, suppose the agent is given 2 options, i.e. a path with water or a path
with fire. A reinforcement algorithm works on a reward system, i.e. if the agent takes the fire path,
then reward points are subtracted and the agent learns that it should avoid the fire path. If it had
chosen the water path, the safe path, then some points would have been added to the reward
points, and the agent would learn which path is safe and which is not.
In short, by leveraging the rewards obtained, the agent improves its knowledge of the environment
in order to select the next action.
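The water-or-fire choice can be simulated with a very small agent. The sketch below is one possible formulation, not the only one: it assumes a reward of +1 for water and -1 for fire, an epsilon-greedy action choice, and a simple running-average update. The learning rate, exploration rate, and episode count are illustrative values.

```python
import random

random.seed(42)  # fixed seed so the run is repeatable

# Rewards assumed for illustration: the safe path adds a point, fire subtracts one
REWARD = {"water": 1.0, "fire": -1.0}

q = {"water": 0.0, "fire": 0.0}  # the agent's current estimate of each path's value
alpha = 0.5                      # learning rate (assumed)
epsilon = 0.2                    # probability of exploring a random path (assumed)

for episode in range(200):
    if random.random() < epsilon:
        action = random.choice(list(q))  # explore: try a random path
    else:
        action = max(q, key=q.get)       # exploit: take the best-known path
    reward = REWARD[action]
    # Move the estimate a step toward the reward just received
    q[action] += alpha * (reward - q[action])

best_path = max(q, key=q.get)
print(q)
print("Learned to prefer:", best_path)
```

No one tells the agent which path is safe. It only sees the rewards, and its estimate for the fire path can never rise above zero while the estimate for the water path can never fall below it.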
PROGRAM 1: The probability that it is Friday and that a student is absent is 3 %. Since there are 5
school days in a week, the probability that it is Friday is 20 %. What is the probability that a student
is absent given that today is Friday? Apply Bayes' rule in Python to get the result.
SOURCE CODE:
# User input: the probability that it is Friday AND the student is absent
abonfri = float(input("Enter the probability that it is Friday and a student is absent: "))
# The probability that a given day is Friday
prothatfri = float(input("Enter the probability that a given day is Friday: "))
# Bayes' rule: P(absent | Friday) = P(Friday and absent) / P(Friday)
absgivenfri = abonfri / prothatfri
print("The probability that the student is absent given that today is Friday is", absgivenfri)
O/P:
Note: Write the remaining two example programs.
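For checking the result without typing inputs, the same calculation can be wrapped in a function and called with the values from the problem statement (3 % and 20 %). The function name conditional_probability and the range check are additions made for this sketch.

```python
def conditional_probability(p_a_and_b, p_b):
    """Return P(A | B) = P(A and B) / P(B)."""
    if not 0 < p_b <= 1:
        raise ValueError("P(B) must be in the interval (0, 1]")
    return p_a_and_b / p_b

p_absent_and_friday = 0.03  # from the problem statement
p_friday = 0.20             # from the problem statement

p_absent_given_friday = conditional_probability(p_absent_and_friday, p_friday)
print(f"P(absent | Friday) = {p_absent_given_friday:.2f}")  # P(absent | Friday) = 0.15
```

So a student is absent on 15 % of Fridays.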
PROGRAM 2: EXTRACT THE DATA FROM DATABASE USING PYTHON
Method - I
SOURCE CODE:
import mysql.connector
# Connect to the MySQL server
my_database = mysql.connector.connect(host="localhost", user="root", password="root",
                                      database="mysql")
cursor = my_database.cursor()
# Insert three rows into the player1 table using a parameterized query
sql = "insert into player1(name, jersey_no, age, score) values (%s, %s, %s, %s)"
player2 = [('sachin', 10, 20, 50), ('kohile', 20, 30, 100), ('dhoni', 40, 35, 110)]
cursor.executemany(sql, player2)
my_database.commit()
# Read the rows back and print them
cursor.execute("select * from player1")
for row in cursor.fetchall():
    print(row)
output:
Method - II
'''Aim: Extract the data from database using python
=================================
Explanation:
=================================
===> First, you need to create a table (students) in a MySQL database (SampleDB).
---> Open the command prompt and execute the following command to enter the MySQL prompt.
--> mysql -u root -p
Then execute the following commands at the MySQL prompt to create the table in the
database.
--> create database SampleDB;
--> use SampleDB;
--> CREATE TABLE students (sid VARCHAR(10),sname VARCHAR(10),age int);
--> INSERT INTO students VALUES('s521','Jhon Bob',23);
--> INSERT INTO students VALUES('s522','Dilly',22);
--> INSERT INTO students VALUES('s523','Kenney',25);
--> INSERT INTO students VALUES('s524','Herny',26);
===> Next, open the command prompt and execute the following command to install the
MySQL connector package, which lets Python connect to the MySQL database.
--> pip install mysql-connector-python (Windows)
--> sudo apt-get install python3-mysql.connector (Linux)
===============================
Source Code :
=============================== '''
import mysql.connector
# Create the connection object
myconn = mysql.connector.connect(host="localhost", user="root", passwd="",
                                 database="SampleDB")
# Create the cursor object
cur = myconn.cursor()
# Execute the query
cur.execute("select * from students")
# Fetch the rows from the cursor object
result = cur.fetchall()
print("Student Details are :")
# Print the result
for x in result:
    print(x)
# Close the cursor and the connection (a SELECT needs no commit)
cur.close()
myconn.close()
Output:
PROGRAM 3: IMPLEMENT K-NEAREST NEIGHBORS CLASSIFICATION
USING PYTHON
SOURCE CODE:
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap
from sklearn.neighbors import KNeighborsClassifier
from sklearn.datasets import load_iris
n_neighbors=15
#Loading data
iris=load_iris()
#print(irisData)
X=iris.data[:, :2] #We only take the first two features
#print(X)
y=iris.target
h=.2 # step size in the mesh
# create color maps
cmap_light=ListedColormap(['#FFAAAA','#AAFFAA','#AAAAFF'])
cmap_bold=ListedColormap(['#FF0000','#00FF00','#0000FF'])
for weights in ['uniform', 'distance']:
    # we create an instance of Neighbor classifier and fit the data.
    clf = KNeighborsClassifier(n_neighbors=n_neighbors, weights=weights)
    clf.fit(X, y)
    # Plot the decision boundary. We will assign a color to each
    # point in the mesh [x_min,x_max]*[y_min,y_max].
    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))
    Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])
    # Put the result into a color plot
    Z = Z.reshape(xx.shape)
    plt.figure()
    plt.pcolormesh(xx, yy, Z, cmap=cmap_light, shading='auto')
    # Plot the training points also
    plt.scatter(X[:, 0], X[:, 1], c=y, cmap=cmap_bold)
    plt.xlim(xx.min(), xx.max())
    plt.ylim(yy.min(), yy.max())
    plt.title("3-Class classification (k=%i,weights='%s')" % (n_neighbors, weights))
# Show both figures once the loop has drawn them
plt.show()
Output:
SOURCE CODE:
# Import necessary modules
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_iris
# Loading data
irisData = load_iris()
#print(irisData)
# Create feature and target arrays
X = irisData.data
y = irisData.target
# Split into training and test set
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size = 0.2, random_state=42)
print(X_test)
print(y_test)
knn = KNeighborsClassifier(n_neighbors=7,weights='distance')
knn.fit(X_train, y_train)
# Predict on dataset which model has not seen before
print(knn.predict(X_test))
output:
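To see what KNeighborsClassifier does internally, the core of the method can be written by hand: measure the distance from the query to every training point, keep the k closest, and take a majority vote. The sketch below assumes six hypothetical 2-D points and uniform weights. It is an illustration of the idea, not scikit-learn's implementation.

```python
import math
from collections import Counter

# Hypothetical 2-D training points with class labels
train = [((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((0.9, 1.1), "A"),
         ((4.0, 4.2), "B"), ((4.1, 3.9), "B"), ((3.8, 4.0), "B")]

def knn_predict(query, k=3):
    """Classify query by majority vote among its k nearest training points."""
    # Sort training points by Euclidean distance to the query and keep the first k
    neighbors = sorted(train, key=lambda item: math.dist(query, item[0]))[:k]
    # Majority vote among the labels of those neighbors
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

print(knn_predict((1.1, 1.0)))  # A
print(knn_predict((3.9, 4.1)))  # B
```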
PROGRAM 4: Given the following data, which specify classifications for nine
combinations of VAR1 and VAR2, predict a classification for a case where
VAR1=0.906 and VAR2=0.606, using the result of k-means clustering with 3 means
(i.e., 3 centroids).
SOURCE CODE:
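import pandas as pd
from matplotlib import pyplot as plt
from sklearn.cluster import KMeans
# Raw string so the backslashes in the Windows path are not treated as escapes
data = pd.read_csv(r"D:\ML LAB\Lab_Final\Lab_Final\kmeansdata.csv")
print(data)
# Cluster on both variables together, not on VAR1 alone
X = data[['VAR1', 'VAR2']]
# n_init and random_state are illustrative choices that make the run repeatable
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
clusters = kmeans.fit_predict(X)
print(clusters)
# Assign the new case to its nearest centroid
new_case = pd.DataFrame([[0.906, 0.606]], columns=['VAR1', 'VAR2'])
print("Cluster of the new case:", kmeans.predict(new_case)[0])
# Plot the points coloured by cluster, with the centroids in red
plt.scatter(X['VAR1'], X['VAR2'], c=clusters)
plt.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1], s=300, c='red')
plt.show()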
CSV FILE:
OUTPUT:
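Once k-means has produced its 3 centroids, predicting for the new case amounts to finding the centroid closest to (0.906, 0.606). The sketch below shows that step alone. The centroid coordinates are hypothetical placeholders, since the real ones depend on the contents of kmeansdata.csv, and the helper name nearest_centroid is an assumption.

```python
import math

# Hypothetical centroids; the real values come from fitting k-means on the CSV data
centroids = [(1.0, 0.6), (0.3, 0.4), (1.7, 0.9)]

def nearest_centroid(point, centroids):
    """Return the index of the centroid with the smallest Euclidean distance to point."""
    distances = [math.dist(point, c) for c in centroids]
    return distances.index(min(distances))

new_case = (0.906, 0.606)  # VAR1 and VAR2 from the problem statement
cluster = nearest_centroid(new_case, centroids)
print("New case falls in cluster", cluster)
```

The class predicted for the new case is then the class that is most common among the training rows in that cluster.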
PROGRAM 5: The Following Training Examples Map Descriptions Of Individuals Onto High,
Medium And Low Credit-Worthiness.
medium skiing design single twenties no ->highRisk
high golf trading married forties yes ->lowRisk
low speedway transport married thirties yes ->medRisk
medium football banking single thirties yes ->lowRisk
high flying media married fifties yes ->highRisk
low football security single twenties no ->medRisk
medium golf media single thirties yes ->medRisk
medium golf transport married forties yes ->lowRisk
high skiing banking single thirties yes ->highRisk
low golf unemployed married forties yes ->highRisk
SOURCE CODE:
import pandas as pd
# Raw string so the backslashes in the Windows path are not treated as escapes
data = pd.read_csv(r"D:\ML LAB\Lab_Final\Lab_Final\Credit-Worthiness.csv")
def unc_prob(val, attr):
    # Unconditional probability P(attr = val)
    val_count = 0
    for ele in data[attr]:
        if ele == val:
            val_count += 1
    return val_count / len(data[attr])
def cond_prob(val1, attr1, val2, attr2):
    # Conditional probability P(attr1 = val1 | attr2 = val2)
    val_count1, data_count = 0, 0
    # Walk the two columns row by row so that values from the same record are compared
    for ele1, ele2 in zip(data[attr1], data[attr2]):
        if ele2 == val2:
            data_count += 1
            if ele1 == val1:
                val_count1 += 1
    return val_count1 / data_count
inp_value, inp_attr = input("Enter the value name and attribute name for which you want to find the unconditional probability, with a space in between: ").split()
inp_value1, inp_attr1, inp_value2, inp_attr2 = input("Enter the value name and attribute name for which you want to find the conditional probability, then the given value name and attribute name: ").split()
ele_unc_prob = unc_prob(inp_value, inp_attr)
ele_cond_prob = cond_prob(inp_value1, inp_attr1, inp_value2, inp_attr2)
print(ele_unc_prob)
print(ele_cond_prob)
CSV FILE:
OUTPUT:
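The two probabilities can also be checked by hand against the ten training examples listed in the problem statement, without the CSV file. In the sketch below the rows are copied from that list, while the column names (income, recreation, job, status, age_group, home_owner, risk) are assumed for illustration because the source does not name the attributes.

```python
# Column names are assumptions; the problem statement lists the values only
columns = ["income", "recreation", "job", "status", "age_group", "home_owner", "risk"]

# The ten training examples from the problem statement
rows = """medium skiing design single twenties no highRisk
high golf trading married forties yes lowRisk
low speedway transport married thirties yes medRisk
medium football banking single thirties yes lowRisk
high flying media married fifties yes highRisk
low football security single twenties no medRisk
medium golf media single thirties yes medRisk
medium golf transport married forties yes lowRisk
high skiing banking single thirties yes highRisk
low golf unemployed married forties yes highRisk""".splitlines()

records = [dict(zip(columns, line.split())) for line in rows]

def unc_prob(value, attr):
    """P(attr = value): share of all records with that value."""
    return sum(r[attr] == value for r in records) / len(records)

def cond_prob(value1, attr1, value2, attr2):
    """P(attr1 = value1 | attr2 = value2): share within the records where attr2 = value2."""
    given = [r for r in records if r[attr2] == value2]
    return sum(r[attr1] == value1 for r in given) / len(given)

print(unc_prob("golf", "recreation"))                    # 4 of 10 records
print(cond_prob("single", "status", "medRisk", "risk"))  # 2 of the 3 medRisk records
```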
PROGRAM 6: IMPLEMENT LINEAR REGRESSION USING PYTHON.
SOURCE CODE:
# A program to illustrate linear regression
# Import Libraries
import pandas as pd
import matplotlib.pyplot as plt
from sklearn import linear_model
# Load Dataset
data=pd.read_csv(r'D:\ML LAB\Lab_Final\Lab_Final\weight-height.csv') # raw string keeps the backslashes literal
# Quick view about the data
#print(data)
#data.plot(kind='scatter',x='Height',y='Weight')
#data.plot(kind='box')
#plt.show()
#print(data.corr())
# Change to DataFrame variables
Height=pd.DataFrame(data['Height'])
Weight=pd.DataFrame(data['Weight'])
print(Weight)
print(Height)
# Build Linear Regression Model
lm=linear_model.LinearRegression()
model=lm.fit(Height,Weight)
print(model.coef_)
print(model.intercept_)
print(model.score(Height,Weight))#Evaluate the model
Height_new=pd.DataFrame({'Height':[65,60,68]}) # same column name as the training data
Weight_new=model.predict(Height_new)
Weight_new=pd.DataFrame(Weight_new)
df=pd.concat([Height_new,Weight_new],axis=1,keys=['Height_new','Weight_new'])
print(df)
# Visualize the result
data.plot(kind='scatter',x='Height',y='Weight')
#Plotting the regression line
plt.plot(Height,model.predict(Height),color='red',linewidth=2)
# Plotting the predicted values
plt.scatter(Height_new,Weight_new,color='red')
plt.show()
CSV FILE:
OUTPUT:
PROGRAM 7: IMPLEMENT NAÏVE BAYES THEOREM TO CLASSIFY THE
ENGLISH TEXT
SOURCE CODE:
import numpy as np, pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.metrics import confusion_matrix, accuracy_score
sns.set() # use seaborn plotting style
# Load the dataset
data = fetch_20newsgroups()
print(data)
# Get the text categories
text_categories = data.target_names
# define the training set
train_data = fetch_20newsgroups(subset="train", categories=text_categories)
# define the test set
test_data = fetch_20newsgroups(subset="test", categories=text_categories)
print("We have {} unique classes".format(len(text_categories)))
print("We have {} training samples".format(len(train_data.data)))
print("We have {} test samples".format(len(test_data.data)))
# let's have a look at some training data
print(train_data.data[5])
# Build the model
model = make_pipeline(TfidfVectorizer(), MultinomialNB())
# Train the model using the training data
model.fit(train_data.data, train_data.target)
# Predict the categories of the test data
predicted_categories = model.predict(test_data.data)
print(np.array(test_data.target_names)[predicted_categories])
# plot the confusion matrix
mat = confusion_matrix(test_data.target, predicted_categories)
sns.heatmap(mat.T, square = True, annot=True, fmt = "d",
xticklabels=train_data.target_names,yticklabels=train_data.target_names)
plt.xlabel("true labels")
plt.ylabel("predicted label")
plt.show()
print("The accuracy is {}".format(accuracy_score(test_data.target, predicted_categories)))
OUTPUT:
PROGRAM 8: IMPLEMENT AN ALGORITHM TO DEMONSTRATE THE
SIGNIFICANCE OF GENETIC ALGORITHM
SOURCE CODE:
import numpy
def cal_pop_fitness(equation_inputs, pop):
    # Fitness of each solution: sum of products between its weights and the inputs
    fitness = numpy.sum(pop*equation_inputs, axis=1)
    return fitness
def select_mating_pool(pop, fitness, num_parents):
    # Select the fittest individuals of the current generation as parents
    parents = numpy.empty((num_parents, pop.shape[1]))
    for parent_num in range(num_parents):
        max_fitness_idx = numpy.where(fitness == numpy.max(fitness))
        max_fitness_idx = max_fitness_idx[0][0]
        parents[parent_num, :] = pop[max_fitness_idx, :]
        # Mark the chosen individual so it is not selected again
        fitness[max_fitness_idx] = -99999999999
    return parents
def crossover(parents, offspring_size):
    offspring = numpy.empty(offspring_size)
    # Single-point crossover at the middle of the chromosome
    crossover_point = numpy.uint8(offspring_size[1]/2)
    for k in range(offspring_size[0]):
        parent1_idx = k%parents.shape[0]
        parent2_idx = (k+1)%parents.shape[0]
        # First half of the genes from parent 1, second half from parent 2
        offspring[k, 0:crossover_point] = parents[parent1_idx, 0:crossover_point]
        offspring[k, crossover_point:] = parents[parent2_idx, crossover_point:]
    return offspring
def mutation(offspring_crossover):
    # Add a random value to one gene (index 4) of each offspring
    for idx in range(offspring_crossover.shape[0]):
        random_value = numpy.random.uniform(-1.0, 1.0)
        offspring_crossover[idx, 4] = offspring_crossover[idx, 4] + random_value
    return offspring_crossover
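The listing above defines the four genetic operators but no loop that applies them. A minimal sketch of such a loop is given here in plain Python so that it runs on its own. The equation coefficients, population size, number of parents, and number of generations are assumed values, and the mutation step perturbs a randomly chosen gene rather than the fixed index 4 used above.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

# Assumed coefficients: fitness(w) = sum of w_i * x_i. Not given in the listing.
equation_inputs = [4.0, -2.0, 3.5, 5.0, -11.0, -4.7]
num_weights = len(equation_inputs)
pop_size, num_parents, num_generations = 8, 4, 20  # assumed settings

def fitness(individual):
    return sum(w * x for w, x in zip(individual, equation_inputs))

def select_parents(population):
    # Keep the num_parents fittest individuals
    return sorted(population, key=fitness, reverse=True)[:num_parents]

def crossover(parents, num_offspring):
    # Single-point crossover at the middle of the chromosome
    point = num_weights // 2
    return [parents[k % len(parents)][:point] + parents[(k + 1) % len(parents)][point:]
            for k in range(num_offspring)]

def mutate(offspring):
    # Perturb one randomly chosen gene of each child
    for child in offspring:
        gene = random.randrange(num_weights)
        child[gene] += random.uniform(-1.0, 1.0)
    return offspring

# Random initial population of candidate weight vectors
population = [[random.uniform(-4.0, 4.0) for _ in range(num_weights)]
              for _ in range(pop_size)]
initial_best = max(fitness(ind) for ind in population)

for generation in range(num_generations):
    parents = select_parents(population)
    children = mutate(crossover(parents, pop_size - num_parents))
    # Parents survive unchanged, so the best fitness can never decrease
    population = parents + children

final_best = max(fitness(ind) for ind in population)
print("Best fitness before:", initial_best)
print("Best fitness after: ", final_best)
```

Because the parents are carried into the next generation unchanged, the best fitness either stays the same or improves from one generation to the next, which is the behaviour the program is meant to demonstrate.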
PROGRAM 9: IMPLEMENT THE FINITE WORDS CLASSIFICATION SYSTEM USING
BACK-PROPAGATION ALGORITHM
SOURCE CODE:
import pandas as pd
msg = pd.read_csv('D:/python/backprapogation1.csv', names=['message', 'label'])
print("Total Instances of Dataset: ", msg.shape[0])
msg['labelnum'] = msg.label.map({'pos': 1, 'neg': 0})
X = msg.message
y = msg.labelnum
from sklearn.model_selection import train_test_split
Xtrain, Xtest, ytrain, ytest = train_test_split(X, y)
from sklearn.feature_extraction.text import CountVectorizer
count_v = CountVectorizer()
Xtrain_dm = count_v.fit_transform(Xtrain)
Xtest_dm = count_v.transform(Xtest)
df = pd.DataFrame(Xtrain_dm.toarray(),columns=count_v.get_feature_names_out())
print(df[0:5])
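# Multi-layer perceptron: its weights are trained with back-propagation.
# hidden_layer_sizes, max_iter and random_state are illustrative choices.
from sklearn.neural_network import MLPClassifier
clf = MLPClassifier(hidden_layer_sizes=(10,), max_iter=1000, random_state=1)
clf.fit(Xtrain_dm, ytrain)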
pred = clf.predict(Xtest_dm)
# Pair each test document with its own prediction
for doc, p in zip(Xtest, pred):
    p = 'pos' if p == 1 else 'neg'
    print("%s -> %s" % (doc, p))
from sklearn.metrics import accuracy_score, confusion_matrix, precision_score, recall_score
print('Accuracy Metrics: \n')
print('Accuracy: ', accuracy_score(ytest, pred))
print('Recall: ', recall_score(ytest, pred))
print('Precision: ', precision_score(ytest, pred))
print('Confusion Matrix: \n', confusion_matrix(ytest, pred))
CSV FILE:
OUTPUT: