Aml Lab

The document outlines various machine learning tasks implemented in Python, including a Bayesian Classifier for sentiment analysis, a Bayesian network for heart disease diagnosis, Locally Weighted Regression, EM algorithm for clustering, k-Nearest Neighbors for classifying the iris dataset, and Random Forest classification. Each task includes code snippets, accuracy metrics, and outputs demonstrating the effectiveness of the models. The document serves as a comprehensive guide for implementing these algorithms using Python's machine learning libraries.

Uploaded by sakethg167

Bayesian Classifier model to perform this task. Built-in Java classes/API can be
used to write the program. Calculate the accuracy, precision, and recall for your
data set.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score, precision_score, recall_score, confusion_matrix

df = pd.read_csv('document.csv', names=['message', 'label'])


df.dropna(subset=['message', 'label'], inplace=True)
df['labelnum'] = df.label.map({'pos': 1, 'neg': 0})

X = df['message']
y = df['labelnum']

Xtrain, Xtest, ytrain, ytest = train_test_split(X, y, test_size=0.3, random_state=42)

vectorizer = CountVectorizer()
Xtrain_vec = vectorizer.fit_transform(Xtrain)
Xtest_vec = vectorizer.transform(Xtest)

model = MultinomialNB()
model.fit(Xtrain_vec, ytrain)

pred = model.predict(Xtest_vec)

print("\nPredictions:")
for text, prediction in zip(Xtest, pred):
    print(f"{text} -> {'pos' if prediction == 1 else 'neg'}")

print("\nAccuracy Metrics:")
print("Accuracy:", accuracy_score(ytest, pred))
print("Precision:", precision_score(ytest, pred, zero_division=0))
print("Recall:", recall_score(ytest, pred, zero_division=0))
print("Confusion Matrix:\n", confusion_matrix(ytest, pred))
output:
Predictions:
I love this sandwich -> pos
This is an amazing place -> pos
He is my sworn enemy -> neg
I do not like this restaurant -> neg
This is my best work -> neg
I am sick and tired of this place -> neg
Accuracy Metrics:
Accuracy: 0.8333333333333334
Precision: 1.0
Recall: 0.6666666666666666
Confusion Matrix:
 [[3 0]
 [1 2]]
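
The reported metrics follow directly from the confusion matrix. As a quick check, the sketch below recomputes them by hand, assuming scikit-learn's layout (rows are actual classes, columns are predicted classes, in the order neg, pos). The variable names are illustrative and not part of the original program.

```python
# Confusion matrix from the output above: rows = actual, columns = predicted,
# class order [neg (0), pos (1)], which is the layout scikit-learn uses.
cm = [[3, 0],
      [1, 2]]

tn, fp = cm[0]  # actual neg: true negatives, false positives
fn, tp = cm[1]  # actual pos: false negatives, true positives

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp) if (tp + fp) else 0.0  # mirrors zero_division=0
recall = tp / (tp + fn) if (tp + fn) else 0.0

print("Accuracy:", accuracy)
print("Precision:", precision)
print("Recall:", recall)
```

The single false negative corresponds to "This is my best work", which the model labelled neg.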
5. Write a program to construct a Bayesian network considering medical data. Use
this model to demonstrate the diagnosis of heart patients using standard Heart
Disease Data Set. You can use Java/Python ML library classes/API.
import pandas as pd
from pgmpy.models import BayesianNetwork
from pgmpy.estimators import MaximumLikelihoodEstimator
from pgmpy.inference import VariableElimination

data = pd.read_csv("heartdisease.csv")
heart_disease = pd.DataFrame(data)
print(heart_disease.head())

model = BayesianNetwork([
    ('age', 'Lifestyle'),
    ('Gender', 'Lifestyle'),
    ('Family', 'heartdisease'),
    ('Lifestyle', 'diet'),
    ('diet', 'cholestrol'),
    ('cholestrol', 'heartdisease')])

model.fit(heart_disease, estimator=MaximumLikelihoodEstimator)

HeartDisease_infer = VariableElimination(model)

print('For age Enter { SuperSeniorCitizen:0, SeniorCitizen:1, MiddleAged:2, Youth:3, Teen:4 }')
print('For Gender Enter { Male:0, Female:1 }')
print('For Family History Enter { Yes:1, No:0 }')
print('For diet Enter { High:0, Medium:1 }')
print('For Lifestyle Enter { Athlete:0, Active:1, Moderate:2, Sedentary:3 }')
print('For cholesterol Enter { High:0, BorderLine:1, Normal:2 }')

evidence = {
    'age': int(input('Enter age: ')),
    'Gender': int(input('Enter Gender: ')),
    'Family': int(input('Enter Family History: ')),
    'diet': int(input('Enter diet: ')),
    'Lifestyle': int(input('Enter Lifestyle: ')),
    'cholestrol': int(input('Enter cholesterol: '))}

q = HeartDisease_infer.query(variables=['heartdisease'], evidence=evidence)
print(q)

output:
+----------------+-------------+
| heartdisease | phi(%) |
+================+=============+
| heartdisease_0 | 35.67 |
| heartdisease_1 | 64.33 |
+----------------+-------------+
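
For discrete variables, maximum likelihood estimation amounts to filling each conditional probability table with relative frequencies counted from the data. The pure-Python sketch below shows that idea for a single parent and child pair. The eight rows are made up for illustration and are not taken from heartdisease.csv, and the `cpd` helper is a hypothetical name, not a pgmpy function.

```python
from collections import Counter

# Hypothetical (cholestrol, heartdisease) observations, for illustration only.
rows = [(0, 1), (0, 1), (0, 0), (1, 1), (1, 0), (2, 0), (2, 0), (2, 1)]

pair_counts = Counter(rows)                       # count of each (parent, child)
parent_counts = Counter(parent for parent, _ in rows)

def cpd(child, parent):
    """Estimate P(heartdisease = child | cholestrol = parent) by counting."""
    return pair_counts[(parent, child)] / parent_counts[parent]

# One row of the table per parent state; each row sums to 1.
for parent in sorted(parent_counts):
    print(parent, [round(cpd(child, parent), 3) for child in (0, 1)])
```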
6. Implement the non-parametric Locally Weighted Regression algorithm in order to
fit data points. Select appropriate data set for your experiment and draw graphs.
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

def kernel(point, xmat, k):
    # Diagonal matrix of Gaussian weights: rows of xmat near `point` weigh more.
    m, n = np.shape(xmat)
    weights = np.asmatrix(np.eye(m))
    for j in range(m):
        diff = point - xmat[j]
        weights[j, j] = np.exp((diff * diff.T)[0, 0] / (-2.0 * k ** 2))
    return weights

def localWeight(point, xmat, ymat, k):
    # Weighted least-squares coefficients for the fit centred on `point`.
    wei = kernel(point, xmat, k)
    W = (xmat.T * (wei * xmat)).I * (xmat.T * (wei * ymat.T))
    return W

def localWeightRegression(xmat, ymat, k):
    # Fit a separate local model at every row of xmat and predict there.
    m, n = np.shape(xmat)
    ypred = np.zeros(m)
    for i in range(m):
        ypred[i] = (xmat[i] * localWeight(xmat[i], xmat, ymat, k))[0, 0]
    return ypred

data = pd.read_csv('10-dataset.csv')
bill = np.array(data.total_bill)
tip = np.array(data.tip)

# np.asmatrix is used because the np.mat alias was removed in NumPy 2.0.
mbill = np.asmatrix(bill)
mtip = np.asmatrix(tip)
m = np.shape(mbill)[1]
one = np.asmatrix(np.ones(m))
X = np.hstack((one.T, mbill.T))

ypred = localWeightRegression(X, mtip, 0.5)

SortIndex = X[:, 1].argsort(0)
xsort = X[SortIndex][:, 0]

fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
ax.scatter(bill, tip, color='green')
ax.plot(X[SortIndex][:, 1], ypred[SortIndex], color='red', linewidth=5)
plt.xlabel('Total bill')
plt.ylabel('Tip')
plt.show()
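
For a single feature, the same locally weighted fit can be written without matrices. The sketch below is a simplified pure-Python version under that assumption: it applies the Gaussian kernel weight exp(-(x - x0)^2 / (2k^2)) and solves the weighted least-squares line in closed form. The function name `lwr_predict` and the sample data are illustrative only.

```python
import math

def lwr_predict(x0, xs, ys, k):
    """Predict y at x0 with a locally weighted straight-line fit."""
    # Gaussian kernel weights: points near x0 count more than distant ones.
    w = [math.exp(-((x - x0) ** 2) / (2.0 * k ** 2)) for x in xs]
    sw = sum(w)
    sx = sum(wi * x for wi, x in zip(w, xs))
    sy = sum(wi * y for wi, y in zip(w, ys))
    sxx = sum(wi * x * x for wi, x in zip(w, xs))
    sxy = sum(wi * x * y for wi, x, y in zip(w, xs, ys))
    # Closed-form weighted least squares for the slope and intercept.
    slope = (sw * sxy - sx * sy) / (sw * sxx - sx ** 2)
    intercept = (sy - slope * sx) / sw
    return intercept + slope * x0

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.0, 5.0, 7.0, 9.0, 11.0]  # exactly y = 2x + 1
print(lwr_predict(2.5, xs, ys, k=0.5))
```

Because the sample lies exactly on a line, every local fit recovers that line, so the prediction at 2.5 is 6 up to floating-point rounding. A smaller k narrows the neighbourhood and makes the curve follow the data more closely on noisy inputs.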

7. Apply EM algorithm to cluster a set of data stored in a .CSV file. Use the same data set
for clustering using k-Means algorithm. Compare the results of these two algorithms and
comment on the quality of clustering. You can add Java/Python ML library classes/API in
the program.
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture
from sklearn.metrics import silhouette_score

iris = load_iris()
X = iris.data

kmeans = KMeans(n_clusters=3, n_init=10)
kmeans_labels = kmeans.fit_predict(X)
kmeans_silhouette = silhouette_score(X, kmeans_labels)

em = GaussianMixture(n_components=3)
em_labels = em.fit_predict(X)
em_silhouette = silhouette_score(X, em_labels)

print("k-Means Silhouette Score:", kmeans_silhouette)
print("EM (Gaussian Mixture) Silhouette Score:", em_silhouette)

print("\nClustering Results:")
print("k-Means Labels:", kmeans_labels)
print("EM (Gaussian Mixture) Labels:", em_labels)

output:
k-Means Silhouette Score: 0.5528190123564091
EM (Gaussian Mixture) Silhouette Score: 0.5011761635067202
Clustering Results:
k-Means Labels: [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 2 2 2 2 1 2 2 2 2 2 2
1 1 2 2 2 2 1 2 1 2 1 2 2 1 1 2 2 2 2 2 1 2 2 2 2 1 2 2 2 1 2 2 2 1 2 2 1]
EM (Gaussian Mixture) Labels: [1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 0 2 0 2 0
0 0 0 2 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2]
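
Cluster numbers are arbitrary: in the output above k-Means calls the first group 0 while EM calls it 1, so the two label arrays cannot be compared element by element. One way to compare the two results, not used in the original program, is pair counting (the Rand index): the fraction of point pairs that both clusterings either keep together or keep apart. The sketch below is a minimal pure-Python version with made-up toy labelings.

```python
from itertools import combinations

def rand_index(labels_a, labels_b):
    """Fraction of point pairs on which two clusterings agree."""
    agree = 0
    total = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        agree += same_a == same_b  # both together, or both apart
        total += 1
    return agree / total

# Toy labelings (hypothetical): a and b are the same grouping with the
# cluster numbers swapped; c moves one point to a different group.
a = [0, 0, 0, 1, 1, 2]
b = [1, 1, 1, 0, 0, 2]
c = [1, 1, 0, 0, 0, 2]
print(rand_index(a, b))
print(rand_index(a, c))
```

Passing `kmeans_labels` and `em_labels` to the same function gives a single agreement figure that is unaffected by how each algorithm happens to number its clusters.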
8. Write a program to implement k-Nearest Neighbour algorithm to classify the iris data set.
Print both correct and wrong predictions. Java/Python ML library classes can be used for this
problem.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

iris = load_iris()
X = iris.data
y = iris.target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)
predictions = knn.predict(X_test)

correct_predictions = 0
wrong_predictions = 0

for i in range(len(predictions)):
    if predictions[i] == y_test[i]:
        correct_predictions += 1
    else:
        wrong_predictions += 1
        print(f"Wrong prediction: Predicted {predictions[i]}, Actual {y_test[i]}")

accuracy = correct_predictions / len(predictions)

print(f"Accuracy: {accuracy}")
print(f"Correct predictions: {correct_predictions}")
print(f"Wrong predictions: {wrong_predictions}")
print("Accuracy (from sklearn):", accuracy_score(y_test, predictions))
output:
Accuracy: 1.0
Correct predictions: 30
Wrong predictions: 0
Accuracy (from sklearn): 1.0
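
To make the voting rule concrete, here is a minimal pure-Python sketch of k-nearest-neighbour classification: sort the training points by Euclidean distance to the query and take the majority label among the closest k. It is an illustration of the idea rather than a description of how KNeighborsClassifier is implemented, and the points and the `knn_predict` name are hypothetical.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Classify query by majority vote among its k nearest training points."""
    by_distance = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Hypothetical 2-D points forming two well-separated groups.
train_X = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
train_y = [0, 0, 0, 1, 1, 1]
print(knn_predict(train_X, train_y, (1.1, 1.0)))
print(knn_predict(train_X, train_y, (5.0, 4.8)))
```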
9.3 Write a Program to implement Random Forest Algorithm.
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
# Load dataset
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=42)
# Train Random Forest
# n_estimators=100 means the model will create 100 decision trees in the forest.
clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)
predictions = clf.predict(X_test)
print("Accuracy:", accuracy_score(y_test, predictions))

Output: Accuracy: 1.0
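
The ensemble idea behind a random forest is bootstrap aggregation (bagging): each tree is trained on a resample of the data drawn with replacement, and the trees vote on the final label. The sketch below illustrates only that part, under simplifying assumptions: it uses depth-1 trees (stumps) on a single made-up feature and omits the random feature selection that a full random forest adds. All names and data are hypothetical.

```python
import random
from collections import Counter

def train_stump(sample):
    """Fit a one-feature threshold rule (a depth-1 tree) to (x, label) pairs."""
    best = None
    for threshold, _ in sample:
        left = [label for x, label in sample if x <= threshold]
        right = [label for x, label in sample if x > threshold]
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0] if right else left_label
        errors = (sum(lab != left_label for lab in left)
                  + sum(lab != right_label for lab in right))
        if best is None or errors < best[0]:
            best = (errors, threshold, left_label, right_label)
    return best[1:]  # (threshold, left_label, right_label)

def forest_predict(stumps, x):
    """Majority vote over all stumps."""
    votes = Counter(left if x <= t else right for t, left, right in stumps)
    return votes.most_common(1)[0][0]

rng = random.Random(42)
data = [(1.0, 0), (1.5, 0), (2.0, 0), (6.0, 1), (6.5, 1), (7.0, 1)]
# Each stump sees its own bootstrap sample: len(data) draws with replacement.
stumps = [train_stump([rng.choice(data) for _ in data]) for _ in range(25)]
print(forest_predict(stumps, 0.5), forest_predict(stumps, 7.5))
```

An individual stump can be wrong when its resample happens to miss a class, which is why the vote is taken over many of them.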
