MACHINE LEARNING Page 1
1. Implement and demonstrate the FIND-S algorithm for finding the
most specific hypothesis based on a given set of training data
samples. Read the training data from a .CSV file.
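import pandas as pd


def read_data(file_path):
    """Load the training examples from a CSV file and return them as rows."""
    df = pd.read_csv(file_path)
    print("Training Data:")
    print(df)
    return df.values.tolist()


def find_s_algorithm(training_data):
    """Return the most specific hypothesis that covers every positive example."""
    hypothesis = None
    for row in training_data:
        # FIND-S ignores negative examples; the last column is the label.
        if str(row[-1]).strip().lower() == "yes":
            if hypothesis is None:
                # Start from the first positive example.
                hypothesis = list(row[:-1])
            else:
                # Keep attribute values that agree; generalize mismatches to "?".
                hypothesis = [h if h == r else "?"
                              for h, r in zip(hypothesis, row[:-1])]
    return hypothesis


file_path = "training_data.csv"
training_data = read_data(file_path)
print("Most Specific Hypothesis:", find_s_algorithm(training_data))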
OUTPUT
Training Data:
Most Specific Hypothesis: ['D3', '?', '?', '?', '?']
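Because the contents of training_data.csv are not shown, the following is a minimal sketch of the FIND-S generalization rule on a small in-memory data set. The four rows are an assumption (weather examples in the style of the EnjoySport data), and the function name `find_s` is hypothetical.

```python
# Illustrative data only: four weather examples, label in the last column.
examples = [
    ["Sunny", "Warm", "Normal", "Strong", "Warm", "Same", "Yes"],
    ["Sunny", "Warm", "High", "Strong", "Warm", "Same", "Yes"],
    ["Rainy", "Cold", "High", "Strong", "Warm", "Change", "No"],
    ["Sunny", "Warm", "High", "Strong", "Cool", "Change", "Yes"],
]


def find_s(rows):
    """Most specific hypothesis covering all positive rows (hypothetical helper)."""
    hypothesis = None
    for *attributes, label in rows:
        if label != "Yes":
            continue  # FIND-S skips negative examples
        if hypothesis is None:
            hypothesis = list(attributes)  # start from the first positive example
        else:
            # Keep values that agree, replace values that differ with "?"
            hypothesis = [h if h == a else "?" for h, a in zip(hypothesis, attributes)]
    return hypothesis


result = find_s(examples)
print(result)  # ['Sunny', 'Warm', '?', 'Strong', '?', '?']
```

Each attribute keeps its value only while every positive example agrees on it, which is why the negative third row has no effect on the result.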
CAMBRIDGE INSTITUTE OF TECHNOLOGY MCA 2024-25
2. For a given set of training data examples stored in a .CSV file,
implement and demonstrate the Candidate-Elimination algorithm to
output a description of the set of all hypotheses consistent with the
training examples.
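import csv

# Load the data from the CSV file (no header row; the last column is the label)
data = []
with open('data.csv') as file:
    for row in csv.reader(file):
        data.append(row)

# Initialize S (specific) and G (general)
S = data[0][:-1]  # Assumes the first row is a positive example
G = [['?' for _ in range(len(S))]]  # Most general hypothesis

# Process each example
for example in data:
    inputs, output = example[:-1], example[-1]
    if output == "Yes":
        # Generalize S just enough to cover the positive example
        for i in range(len(S)):
            if S[i] != inputs[i]:
                S[i] = '?'
        # Remove hypotheses from G that reject the positive example
        G = [g for g in G
             if all(g[i] == '?' or g[i] == inputs[i] for i in range(len(S)))]
    elif output == "No":
        new_G = []
        for g in G:
            # A hypothesis that already rejects the negative example is kept as is
            if any(g[i] != '?' and g[i] != inputs[i] for i in range(len(S))):
                if g not in new_G:
                    new_G.append(g)
                continue
            # Otherwise specialize g, one attribute at a time, using values from S
            for i in range(len(S)):
                if g[i] == '?' and S[i] != '?' and S[i] != inputs[i]:
                    new_hypo = g.copy()
                    new_hypo[i] = S[i]
                    if new_hypo not in new_G:
                        new_G.append(new_hypo)
        G = new_G

# Output the result
print("Final Specific Hypothesis (S):", S)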
print("Final General Hypotheses (G):")
for g in G:
    print(g)
OUTPUT:
Final Specific Hypothesis (S): ['Sunny', 'Warm', '?', 'Strong', '?', '?']
Final General Hypotheses (G):
['Sunny', '?', '?', '?', '?', '?']
['?', 'Warm', '?', '?', '?', '?']
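One way to sanity-check the boundaries printed above is to test each hypothesis against the training examples: every hypothesis in S and G should accept all positive examples and reject all negative ones. The sketch below assumes data.csv holds the four weather examples listed in the code (the file's contents are not shown in this record); `covers` and `is_consistent` are hypothetical helper names.

```python
# Assumed contents of data.csv (illustrative; the file itself is not shown).
examples = [
    ["Sunny", "Warm", "Normal", "Strong", "Warm", "Same", "Yes"],
    ["Sunny", "Warm", "High", "Strong", "Warm", "Same", "Yes"],
    ["Rainy", "Cold", "High", "Strong", "Warm", "Change", "No"],
    ["Sunny", "Warm", "High", "Strong", "Cool", "Change", "Yes"],
]

# Boundaries copied from the output above.
S = ['Sunny', 'Warm', '?', 'Strong', '?', '?']
G = [['Sunny', '?', '?', '?', '?', '?'],
     ['?', 'Warm', '?', '?', '?', '?']]


def covers(hypothesis, inputs):
    """True if every attribute is '?' or matches the example's value."""
    return all(h == '?' or h == x for h, x in zip(hypothesis, inputs))


def is_consistent(hypothesis, rows):
    """True if the hypothesis accepts exactly the rows labelled 'Yes'."""
    return all(covers(hypothesis, row[:-1]) == (row[-1] == 'Yes') for row in rows)


checks = [is_consistent(h, examples) for h in [S] + G]
print(checks)  # [True, True, True]
```

A hypothesis that fails this check cannot belong to the version space, so the same helpers can be used to spot mistakes in a hand-worked trace.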
3. Write a program to demonstrate the working of the decision tree
based ID3 algorithm. Use an appropriate data set for building the
decision tree and apply this knowledge to classify a new sample.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Small illustrative network-traffic data set
data = {
    'protocol': [1, 2, 1, 3, 2, 1, 3, 1, 2, 3],
    'duration': [10, 50, 5, 60, 15, 7, 80, 20, 40, 90],
    'src_bytes': [100, 200, 50, 400, 120, 70, 500, 140, 220, 600],
    'dst_bytes': [50, 80, 30, 150, 60, 40, 200, 70, 90, 300],
    'attack': [0, 1, 0, 1, 0, 0, 1, 0, 1, 1],  # 0 = Normal, 1 = Attack
}
df = pd.DataFrame(data)

X = df.drop(columns=['attack'])
y = df['attack']
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# scikit-learn builds CART trees; criterion='entropy' selects splits by
# information gain, the measure ID3 uses.
clf = DecisionTreeClassifier(criterion='entropy', random_state=42)
clf.fit(X_train, y_train)

y_pred = clf.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))

# Classify a new sample: protocol=1, duration=25, src_bytes=130, dst_bytes=75
new_sample = pd.DataFrame([[1, 25, 130, 75]], columns=X.columns)
prediction = clf.predict(new_sample)
print("New Sample Classification (0=Normal, 1=Attack):", prediction[0])
OUTPUT
Accuracy: 0.3333333333333333
New Sample Classification (0=Normal, 1=Attack): 0
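ID3 chooses the attribute to split on by information gain, the drop in entropy after partitioning the data on that attribute. The sketch below computes both for the `protocol` and `attack` columns of the data set above. The helper names `entropy` and `information_gain` are hypothetical, and `protocol` is treated as a categorical attribute, as classic ID3 would treat it.

```python
from collections import Counter
from math import log2

# Columns copied from the data set in the program above.
protocol = [1, 2, 1, 3, 2, 1, 3, 1, 2, 3]
attack = [0, 1, 0, 1, 0, 0, 1, 0, 1, 1]


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(attribute, labels):
    """Entropy of the labels minus the weighted entropy after splitting."""
    total = len(labels)
    remainder = 0.0
    for value in set(attribute):
        subset = [l for a, l in zip(attribute, labels) if a == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder


base = entropy(attack)
gain = information_gain(protocol, attack)
print(f"Entropy of attack: {base:.3f}")             # 1.000
print(f"Information gain of protocol: {gain:.3f}")  # 0.725
```

Protocols 1 and 3 each map to a single class, so only the three rows with protocol 2 contribute to the remaining entropy.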
4. Build an Artificial Neural Network by implementing the
Backpropagation algorithm and test the same using appropriate data
sets.
import numpy as np

# Activation function
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Derivative of the sigmoid, written in terms of its output s = sigmoid(x)
def sigmoid_deriv(s):
    return s * (1 - s)

# Input (XOR)
X = np.array([[0,0],[0,1],[1,0],[1,1]])
y = np.array([[0],[1],[1],[0]])

# Random weights
np.random.seed(1)
w1 = np.random.rand(2,2)  # input -> hidden
w2 = np.random.rand(2,1)  # hidden -> output

# Training loop
for _ in range(10000):
    # Forward pass
    h = sigmoid(np.dot(X, w1))
    o = sigmoid(np.dot(h, w2))
    # Backward pass: compute both error terms before changing any weights
    d_o = (y - o) * sigmoid_deriv(o)
    d_h = d_o.dot(w2.T) * sigmoid_deriv(h)
    # Weight updates (learning rate 1)
    w2 += h.T.dot(d_o)
    w1 += X.T.dot(d_h)

# Output
print("Output:")
print(o.round(3))
Output:
[[0.033]
[0.931]
[0.931]
[0.093]]
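The weight updates in the training loop are gradient-descent steps on the squared error E = 0.5 * sum((y - o)^2). A quick way to confirm that the backpropagated terms really are the gradient of that error is to compare them with finite differences. The sketch below does this for the same 2-2-1 sigmoid network; the function names `loss`, `backprop_gradients` and `numerical_gradient`, and the step size `eps`, are assumptions made for illustration.

```python
import numpy as np


def sigmoid(x):
    return 1 / (1 + np.exp(-x))


def loss(w1, w2, X, y):
    """Squared error E = 0.5 * sum((y - o)^2) for the 2-2-1 network."""
    o = sigmoid(sigmoid(X @ w1) @ w2)
    return 0.5 * np.sum((y - o) ** 2)


def backprop_gradients(w1, w2, X, y):
    """Analytic gradients dE/dw1 and dE/dw2 obtained by backpropagation."""
    h = sigmoid(X @ w1)
    o = sigmoid(h @ w2)
    d_o = (o - y) * o * (1 - o)        # error term at the output layer
    d_h = (d_o @ w2.T) * h * (1 - h)   # error term at the hidden layer
    return X.T @ d_h, h.T @ d_o


def numerical_gradient(f, w, eps=1e-6):
    """Central finite differences of f() with respect to each entry of w."""
    grad = np.zeros_like(w)
    for idx in np.ndindex(*w.shape):
        original = w[idx]
        w[idx] = original + eps
        up = f()
        w[idx] = original - eps
        down = f()
        w[idx] = original  # restore the weight
        grad[idx] = (up - down) / (2 * eps)
    return grad


X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)
rng = np.random.default_rng(1)
w1, w2 = rng.random((2, 2)), rng.random((2, 1))

g1, g2 = backprop_gradients(w1, w2, X, y)
n1 = numerical_gradient(lambda: loss(w1, w2, X, y), w1)
n2 = numerical_gradient(lambda: loss(w1, w2, X, y), w2)
gradients_match = np.allclose(g1, n1, atol=1e-6) and np.allclose(g2, n2, atol=1e-6)
print("Backprop matches finite differences:", gradients_match)
```

The training loop adds `h.T.dot((y - o) * ...)` to the weights, which is the negative of `dE/dw2` here, so each iteration is a gradient-descent step with learning rate 1.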
5. Assuming a set of documents that need to be classified, use the naïve
Bayesian Classifier model to perform this task. Built-in Java
classes/API can be used to write the program. Calculate the accuracy,
precision, and recall for your data set.
import pandas as pd
from sklearn.naive_bayes import CategoricalNB
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import train_test_split

# Load dataset
df = pd.read_csv("C:/Users/ashwi/OneDrive/Documents/python programs/training_data.csv")

# Encode categorical values as integers, one encoder per column
for column in df.columns:
    df[column] = LabelEncoder().fit_transform(df[column])

# Split features and target
X = df.drop(columns=['play_tennis'])
y = df['play_tennis']
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Train a naive Bayes model for categorical features; min_categories covers
# category codes that appear only in the test split
model = CategoricalNB(min_categories=(X.max() + 1).to_numpy())
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
print(f"Accuracy:{accuracy:.2f}")
print(f"Precision: {precision:.2f}")
print(f"Recall:{recall:.2f}")

# Predict for a sample input (encoded values for all 5 features)
sample = pd.DataFrame([[2, 1, 0, 1, 2]], columns=X.columns)
prediction = model.predict(sample)
print("Prediction (1 = Yes, 0 = No):", prediction[0])
OUTPUT:
Accuracy:0.67
Precision: 0.67
Recall:1.00
Prediction (1 = Yes, 0 = No): 1
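The task statement refers to classifying documents. A minimal sketch of that setting is shown below, assuming scikit-learn's `CountVectorizer` for bag-of-words features and `MultinomialNB` as the classifier. The eight short documents and their labels are made up for illustration (1 = positive, 0 = negative).

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.naive_bayes import MultinomialNB

# Illustrative documents only (not taken from any real data set).
train_docs = [
    "great movie loved it",
    "wonderful acting great plot",
    "terrible movie hated it",
    "boring plot awful acting",
]
train_labels = [1, 1, 0, 0]

test_docs = [
    "great wonderful movie",
    "awful boring movie",
    "loved the acting",
    "hated the plot",
]
test_labels = [1, 0, 1, 0]

# Turn each document into a vector of word counts.
vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(train_docs)
X_test = vectorizer.transform(test_docs)  # words unseen in training are ignored

# Multinomial naive Bayes with the default Laplace smoothing (alpha=1.0).
model = MultinomialNB()
model.fit(X_train, train_labels)
predicted = model.predict(X_test).tolist()

print("Predicted:", predicted)  # [1, 0, 1, 0]
print("Accuracy:", accuracy_score(test_labels, predicted))
print("Precision:", precision_score(test_labels, predicted))
print("Recall:", recall_score(test_labels, predicted))
```

With so few documents the perfect scores say little about real performance; the point is only the pipeline of vectorizing, fitting and scoring.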
6. Write a program to implement k-Nearest Neighbour algorithm to
classify the iris data set. Print both correct and wrong predictions.
Java/Python ML library classes can be used for this problem.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Small two-feature data set with two well-separated classes
data = {
    'feature1': [1, 2, 3, 6, 7, 8],
    'feature2': [2, 3, 3, 6, 7, 8],
    'label': [0, 0, 0, 1, 1, 1]
}
df = pd.DataFrame(data)

X_train, X_test, y_train, y_test = train_test_split(
    df[['feature1', 'feature2']], df['label'], test_size=0.3, random_state=42)

knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)
y_pred = knn.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)

# Collect (features, actual) for hits and (features, actual, predicted) for misses
correct = [(X_test.iloc[i].tolist(), y_test.iloc[i])
           for i in range(len(y_test)) if y_test.iloc[i] == y_pred[i]]
wrong = [(X_test.iloc[i].tolist(), y_test.iloc[i], y_pred[i])
         for i in range(len(y_test)) if y_test.iloc[i] != y_pred[i]]

print("\nCorrect Predictions:", correct)
print("\nWrong Predictions:", wrong)
OUTPUT
Accuracy: 0.0
Correct Predictions: []
Wrong Predictions: [([1, 2], 0,1), ([2, 3], 0,1)]
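The task statement names the iris data set, while the program above uses a six-row toy table. As a sketch of the same steps on iris, assuming scikit-learn's bundled `load_iris` loader and keeping k = 3 and the 70/30 split:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Load the 150-sample iris data set that ships with scikit-learn.
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=42)

knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)
y_pred = knn.predict(X_test)

# Sort every test sample into the correct or wrong list, with readable names.
correct, wrong = [], []
for features, actual, predicted in zip(X_test, y_test, y_pred):
    record = (features.tolist(),
              str(iris.target_names[actual]),
              str(iris.target_names[predicted]))
    if actual == predicted:
        correct.append(record)
    else:
        wrong.append(record)

print("Accuracy:", len(correct) / len(y_test))
print("Correct predictions:")
for record in correct:
    print("  features=%s actual=%s predicted=%s" % record)
print("Wrong predictions:")
for record in wrong:
    print("  features=%s actual=%s predicted=%s" % record)
```

With 105 training samples, each test point has neighbours from its own class nearby, unlike the toy split above where only one class-0 row is left for training.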