ML_LAB_PROGRAM_3 25-26
1. Implement and demonstrate the Find-S algorithm for finding the most specific
hypothesis based on a given set of training data samples. Read the training data from
a .CSV file and show the output for test cases. Develop an interactive program and
compare the result with an implementation of the List-Then-Eliminate algorithm.
import csv
from itertools import product

num_attributes = 6
data = []

print("\nThe Given Training Data Set:\n")
with open('Book1.csv', 'r') as csvfile:
    reader = csv.reader(csvfile)
    for row in reader:
        data.append(row)
        print(row)

# Discard the header row so that only training instances remain
data = data[1:]

print("\nThe initial value of hypothesis:")
hypothesis = ['0'] * num_attributes
print(hypothesis)

# Initialize hypothesis with the first training example
# (positive in this dataset, as Find-S requires)
for j in range(num_attributes):
    hypothesis[j] = data[0][j]

print("\nFind-S: Finding a Maximally Specific Hypothesis\n")
for i in range(len(data)):
    # Find-S generalizes only on positive examples; negatives are ignored
    if data[i][num_attributes].lower() == 'yes':
        for j in range(num_attributes):
            if data[i][j] != hypothesis[j]:
                hypothesis[j] = '?'
    print(f"For Training Instance No: {i+1} the hypothesis is: {hypothesis}")

print("\nThe Maximally Specific Hypothesis for the given Training Examples:\n")
print(hypothesis)
# === List-Then-Eliminate ===
# Collect the domain (set of unique values) of each attribute
domains = [set() for _ in range(num_attributes)]
for instance in data:
    for i in range(num_attributes):
        domains[i].add(instance[i])

# Generate every hypothesis in the space: each attribute takes either
# a concrete value from its domain or the wildcard '?'
all_hypotheses = list(product(*[list(values) + ['?'] for values in domains]))

# Keep only the hypotheses consistent with every training example
consistent_hypotheses = []
for h in all_hypotheses:
    consistent = True
    for instance in data:
        label = instance[num_attributes].lower()
        match = all(h[i] == instance[i] or h[i] == '?' for i in range(num_attributes))
        if (label == 'yes' and not match) or (label == 'no' and match):
            consistent = False
            break
    if consistent:
        consistent_hypotheses.append(h)

print("\nList-Then-Eliminate: Consistent Hypotheses:")
for i, h in enumerate(consistent_hypotheses):
    print(f"{i + 1}. {h}")
print(f"\nTotal consistent hypotheses found: {len(consistent_hypotheses)}")
OUTPUT:-
The Given Training Data Set:
['Sky', 'AirTemp', 'Humidity', 'Wind', 'Water', 'Forecast', 'EnjoySport']
['Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same', 'Yes']
['Sunny', 'Warm', 'High', 'Strong', 'Warm', 'Same', 'Yes']
['Rainy', 'Cold', 'High', 'Strong', 'Warm', 'Change', 'No']
['Sunny', 'Warm', 'High', 'Strong', 'Cool', 'Change', 'Yes']
The initial value of hypothesis:
['0', '0', '0', '0', '0', '0']
Find-S: Finding a Maximally Specific Hypothesis

For Training Instance No: 1 the hypothesis is: ['Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same']
For Training Instance No: 2 the hypothesis is: ['Sunny', 'Warm', '?', 'Strong', 'Warm', 'Same']
For Training Instance No: 3 the hypothesis is: ['Sunny', 'Warm', '?', 'Strong', 'Warm', 'Same']
For Training Instance No: 4 the hypothesis is: ['Sunny', 'Warm', '?', 'Strong', '?', '?']

The Maximally Specific Hypothesis for the given Training Examples:

['Sunny', 'Warm', '?', 'Strong', '?', '?']
List-Then-Eliminate: Consistent Hypotheses:
1. ('Sunny', 'Warm', '?', 'Strong', '?', '?')
2. ('Sunny', 'Warm', '?', '?', '?', '?')
3. ('Sunny', '?', '?', 'Strong', '?', '?')
4. ('Sunny', '?', '?', '?', '?', '?')
5. ('?', 'Warm', '?', 'Strong', '?', '?')
6. ('?', 'Warm', '?', '?', '?', '?')
Total consistent hypotheses found: 6
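The assignment also asks to show the output for test cases. A minimal sketch of that step, assuming the hypothesis learned above; the classify helper and the test instance are illustrative additions, not part of the original program:

def classify(hypothesis, instance):
    # An instance is predicted positive iff it agrees with every non-'?' attribute
    return all(h == '?' or h == v for h, v in zip(hypothesis, instance))

test_case = ['Sunny', 'Warm', 'High', 'Strong', 'Cool', 'Change']  # hypothetical test case
print("Prediction:", "Yes" if classify(hypothesis, test_case) else "No")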
2. For a given set of training data examples stored in a .CSV file, implement and
demonstrate the Candidate-Elimination algorithm. Output a description of the set of all
hypotheses consistent with the training examples.
# Candidate Elimination Algorithm
import csv

# Load dataset
with open('sports.csv', 'r') as f:
    reader = csv.reader(f)
    data = list(reader)
headers = data[0]
data = data[1:]

# Initialize S with the first positive example and G with the most general hypothesis
for row in data:
    if row[-1].lower() == "yes":
        S = row[:-1]
        break
G = [['?'] * len(S)]

print("Initial S:", S)
print("Initial G:", G)
print("-" * 50)

# Candidate Elimination process
for row in data:
    instance, label = row[:-1], row[-1].lower()
    if label == "yes":  # Positive instance: generalize S, prune G
        for i in range(len(S)):
            if S[i] != instance[i]:
                S[i] = '?'
        G = [g for g in G if all(g[i] == '?' or g[i] == instance[i] for i in range(len(S)))]
    else:  # Negative instance: specialize G
        new_G = []
        for g in G:
            if all(g[i] == '?' or g[i] == instance[i] for i in range(len(S))):
                # g wrongly covers the negative instance, so replace it with
                # minimal specializations that exclude the instance
                for i in range(len(S)):
                    if g[i] == '?':
                        values = set(r[i] for r in data)
                        for val in values:
                            if val != instance[i]:
                                new_hyp = g.copy()
                                new_hyp[i] = val
                                # Keep a specialization only if it is still more
                                # general than (or equal to) S
                                if all(new_hyp[j] == '?' or new_hyp[j] == S[j] for j in range(len(S))):
                                    new_G.append(new_hyp)
            else:
                new_G.append(g)  # g already excludes the negative instance
        G = new_G
    print(f"Instance: {instance}, Label: {label}")
    print("S:", S)
    print("G:", G)
    print("-" * 50)

print("Final S:", S)
print("Final G:", G)
OUTPUT:-
Initial S: ['Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same']
Initial G: [['?', '?', '?', '?', '?', '?']]
--------------------------------------------------
Instance: ['Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same'], Label: yes
S: ['Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same']
G: [['?', '?', '?', '?', '?', '?']]
--------------------------------------------------
Instance: ['Sunny', 'Warm', 'High', 'Strong', 'Warm', 'Same'], Label: yes
S: ['Sunny', 'Warm', '?', 'Strong', 'Warm', 'Same']
G: [['?', '?', '?', '?', '?', '?']]
--------------------------------------------------
Instance: ['Rainy', 'Cold', 'High', 'Strong', 'Warm', 'Change'], Label: no
S: ['Sunny', 'Warm', '?', 'Strong', 'Warm', 'Same']
G: [['Sunny', '?', '?', '?', '?', '?'], ['?', 'Warm', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', 'Same']]
--------------------------------------------------
Instance: ['Sunny', 'Warm', 'High', 'Weak', 'Cool', 'Change'], Label: yes
S: ['Sunny', 'Warm', '?', '?', '?', '?']
G: [['Sunny', '?', '?', '?', '?', '?'], ['?', 'Warm', '?', '?', '?', '?']]
--------------------------------------------------
Final S: ['Sunny', 'Warm', '?', '?', '?', '?']
Final G: [['Sunny', '?', '?', '?', '?', '?'], ['?', 'Warm', '?', '?', '?', '?']]
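Final S and Final G bound the version space, so together they can classify new data: an instance is confidently positive if even S covers it, confidently negative if no member of G covers it, and ambiguous otherwise. A minimal sketch of that check, assuming the S and G computed above; the matches helper and the test instance are illustrative additions:

def matches(h, x):
    # h covers x iff every non-'?' attribute of h agrees with x
    return all(a == '?' or a == b for a, b in zip(h, x))

x = ['Sunny', 'Warm', 'Normal', 'Weak', 'Cool', 'Change']  # hypothetical test instance
if matches(S, x):
    print("Positive: even the specific boundary S covers it")
elif not any(matches(g, x) for g in G):
    print("Negative: no member of the general boundary G covers it")
else:
    print("Ambiguous: hypotheses in the version space disagree")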
3. Apply preprocessing (Data Cleaning, Integration and Transformation) on suitable
data. For example: identify and delete rows that contain duplicate data, and identify
and delete columns that contain a single value, by considering an appropriate dataset.
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

# 1. Create Sample Datasets
data1 = {
    'ID': [1, 2, 3, 3, 4, 5],
    'Name': ['Alice', 'Bob', 'Cathy', 'Cathy', 'David', np.nan],  # Missing Name
    'Age': [25, 30, 22, 22, np.nan, 35],  # Missing Age
    'City': ['NY', 'LA', 'NY', 'NY', 'TX', 'TX'],
    'Constant': [1, 1, 1, 1, 1, 1]  # Single-value column
}
data2 = {
    'ID': [6, 7],
    'Name': ['Frank', 'Grace'],
    'Age': [40, 29],
    'City': ['LA', np.nan],  # Missing City
    'Constant': [1, 1]
}
df1 = pd.DataFrame(data1)
df2 = pd.DataFrame(data2)

# Integrate the two datasets
df = pd.concat([df1, df2], ignore_index=True)

# 2. Visualize Missing Values BEFORE Cleaning
plt.figure(figsize=(7, 4))
sns.heatmap(df.isnull(), cbar=False, cmap="viridis")
plt.title("Missing Values BEFORE Cleaning")
plt.show()

# 3. Handle Missing Values
df_cleaned = df.fillna({
    'Name': 'Unknown',
    'Age': df['Age'].mean(),  # Fill numeric column with the mean
    'City': 'Unknown'
})

# Remove duplicate rows
df_cleaned = df_cleaned.drop_duplicates()

# Drop single-value columns
df_final = df_cleaned.loc[:, df_cleaned.nunique() > 1]

# 4. Visualize Missing Values AFTER Cleaning
plt.figure(figsize=(7, 4))
sns.heatmap(df_final.isnull(), cbar=False, cmap="viridis")
plt.title("Missing Values AFTER Cleaning")
plt.show()

print("\n---- Final Preprocessed Dataset ----")
print(df_final)
OUTPUT:-
---- Final Preprocessed Dataset ----
ID Name Age City
0 1 Alice 25.0 NY
1 2 Bob 30.0 LA
2 3 Cathy 22.0 NY
4 4 David 29.0 TX
5 5 Unknown 35.0 TX
6 6 Frank 40.0 LA
7 7 Grace 29.0 Unknown
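The task statement asks to identify the duplicate rows and single-value columns, not only delete them. A minimal sketch of that reporting step, assuming the integrated DataFrame df from above:

# Show every occurrence of a duplicated row before dropping
print("Duplicate rows:")
print(df[df.duplicated(keep=False)])

# List columns that hold a single unique value before dropping
single_value_cols = [col for col in df.columns if df[col].nunique(dropna=False) == 1]
print("Single-value columns:", single_value_cols)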
4. Apply the ID3 algorithm to build a Decision Tree from an appropriate dataset and
predict the class of a new data point.
import pandas as pd
import math
from collections import Counter

# Example dataset (commented out because the CSV file is used instead)
# data = {
#     'Outlook': ['Sunny', 'Sunny', 'Overcast', 'Rainy', 'Rainy', 'Rainy', 'Overcast',
#                 'Sunny', 'Sunny', 'Rainy', 'Sunny', 'Overcast', 'Overcast', 'Rainy'],
#     'Temperature': ['Hot', 'Hot', 'Hot', 'Mild', 'Cool', 'Cool', 'Cool',
#                     'Mild', 'Cool', 'Mild', 'Mild', 'Mild', 'Hot', 'Mild'],
#     'Humidity': ['High', 'High', 'High', 'High', 'Normal', 'Normal', 'Normal',
#                  'High', 'Normal', 'Normal', 'Normal', 'High', 'Normal', 'High'],
#     'Wind': ['Weak', 'Strong', 'Weak', 'Weak', 'Weak', 'Strong', 'Strong',
#              'Weak', 'Weak', 'Weak', 'Strong', 'Strong', 'Weak', 'Strong'],
#     'PlayTennis': ['No', 'No', 'Yes', 'Yes', 'Yes', 'No', 'Yes',
#                    'No', 'Yes', 'Yes', 'Yes', 'Yes', 'Yes', 'No']
# }
# df = pd.DataFrame(data)
# print("Play Tennis Dataset:\n")
# print(df)

# Load dataset from CSV
df = pd.read_csv('PlayTennis.csv')
print("Play Tennis Dataset:\n")
print(df)

# Entropy(S) = -sum_i p_i * log2(p_i), where p_i is the proportion of class i
def entropy(target_col):
    elements, counts = zip(*Counter(target_col).items())
    entropy_val = 0
    for i in range(len(elements)):
        p = counts[i] / sum(counts)
        entropy_val -= p * math.log2(p)
    return entropy_val

# Gain(S, A) = Entropy(S) - sum_v (|S_v| / |S|) * Entropy(S_v),
# where S_v is the subset of S with attribute A equal to value v
def InfoGain(data, split_attribute_name, target_name="PlayTennis"):
    total_entropy = entropy(data[target_name])
    vals, counts = zip(*Counter(data[split_attribute_name]).items())
    weighted_entropy = 0
    for i in range(len(vals)):
        subset = data[data[split_attribute_name] == vals[i]]
        weighted_entropy += (counts[i] / sum(counts)) * entropy(subset[target_name])
    information_gain = total_entropy - weighted_entropy
    return information_gain

# ID3 Algorithm
def ID3(data, original_data, features, target_attribute_name="PlayTennis",
        parent_node_class=None):
    # If all target values are the same, return that value (pure node)
    if len(Counter(data[target_attribute_name])) == 1:
        return list(Counter(data[target_attribute_name]).keys())[0]
    # If the dataset is empty, return the mode target value of the original data
    elif len(data) == 0:
        return Counter(original_data[target_attribute_name]).most_common(1)[0][0]
    # If the feature space is empty, return the parent node's class
    elif len(features) == 0:
        return parent_node_class
    else:
        # Most common target value of the current node
        parent_node_class = Counter(data[target_attribute_name]).most_common(1)[0][0]

        # Select the feature that best splits the dataset (highest information gain)
        item_values = [InfoGain(data, feature, target_attribute_name) for feature in features]
        best_feature_index = item_values.index(max(item_values))
        best_feature = features[best_feature_index]

        # Create the tree structure
        tree = {best_feature: {}}

        # Remove the chosen feature from the feature space
        remaining_features = [i for i in features if i != best_feature]

        # Grow a branch under the root node for each possible value of best_feature
        for value in data[best_feature].unique():
            subset = data[data[best_feature] == value]
            subtree = ID3(subset, original_data, remaining_features,
                          target_attribute_name, parent_node_class)
            tree[best_feature][value] = subtree
        return tree

# Build the tree
features = list(df.columns)
features.remove("PlayTennis")
tree = ID3(df, df, features)
print("\nDecision Tree:\n", tree)
OUTPUT:-
Play Tennis Dataset:
Outlook Temperature Humidity Wind PlayTennis
0 Sunny Hot High Weak No
1 Sunny Hot High Strong No
2 Overcast Hot High Weak Yes
3 Rainy Mild High Weak Yes
4 Rainy Cool Normal Weak Yes
5 Rainy Cool Normal Strong No
6 Overcast Cool Normal Strong Yes
7 Sunny Mild High Weak No
8 Sunny Cool Normal Weak Yes
9 Rainy Mild Normal Weak Yes
10 Sunny Mild Normal Strong Yes
11 Overcast Mild High Strong Yes
12 Overcast Hot Normal Weak Yes
13 Rainy Mild High Strong No
Decision Tree:
{'Outlook': {'Sunny': {'Humidity': {'High': 'No', 'Normal': 'Yes'}}, 'Overcast': 'Yes', 'Rainy':
{'Wind': {'Weak': 'Yes', 'Strong': 'No'}}}}
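The program above only builds the tree; the task also asks to predict the class of a new data point. A minimal sketch of that step, assuming the nested-dictionary tree printed above; the predict helper and the sample query are illustrative additions, not part of the original program:

def predict(tree, instance, default='Yes'):
    # Walk the nested dictionary until a leaf (a class label string) is reached
    if not isinstance(tree, dict):
        return tree
    attribute = next(iter(tree))  # the attribute tested at this node
    value = instance.get(attribute)
    if value not in tree[attribute]:
        return default  # unseen attribute value: fall back to a default class
    return predict(tree[attribute][value], instance, default)

new_point = {'Outlook': 'Sunny', 'Temperature': 'Cool',
             'Humidity': 'High', 'Wind': 'Strong'}  # hypothetical new data point
print("Predicted class:", predict(tree, new_point))

On the tree above, this query follows Outlook = Sunny to the Humidity test and returns 'No' for Humidity = High.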