ML Lab Programs

ML_LAB_PROGRAM_3 25-26

1. Implement and demonstrate the Find-S algorithm for finding the most specific
hypothesis based on a given set of training data samples. Read the training data from
a .CSV file and show the output for test cases. Develop an interactive program and
compare the result with an implementation of the List-Then-Eliminate algorithm.

import csv
from itertools import product

num_attributes = 6
data = []

print("\nThe Given Training Data Set:\n")
with open('Book1.csv', 'r') as csvfile:
    reader = csv.reader(csvfile)
    header = next(reader)  # skip the header row so it is not treated as an example
    print(header)
    for row in reader:
        data.append(row)
        print(row)

print("\nThe initial value of hypothesis:")
hypothesis = ['0'] * num_attributes
print(hypothesis)

# Initialize the hypothesis with the first positive example
for row in data:
    if row[num_attributes].lower() == 'yes':
        hypothesis = row[:num_attributes]
        break

print("\nFind-S: Finding a Maximally Specific Hypothesis\n")
for i in range(len(data)):
    if data[i][num_attributes].lower() == 'yes':
        for j in range(num_attributes):
            if data[i][j] != hypothesis[j]:
                hypothesis[j] = '?'  # generalize the attribute on a mismatch
    print(f"For Training Instance No: {i+1} the hypothesis is: {hypothesis}")

print("\nThe Maximally Specific Hypothesis for the given Training Examples:\n")
print(hypothesis)

# === List-Then-Eliminate ===
# Collect the domain (set of unique values) of each attribute
domains = [set() for _ in range(num_attributes)]
for instance in data:
    for i in range(num_attributes):
        domains[i].add(instance[i])

# Generate every possible hypothesis: each attribute is either a concrete
# value from its domain or the wildcard '?' (sorted for a stable order)
all_hypotheses = list(product(*[sorted(values) + ['?'] for values in domains]))

# Keep only the hypotheses consistent with every training example
consistent_hypotheses = []
for h in all_hypotheses:
    consistent = True
    for instance in data:
        label = instance[num_attributes].lower()
        match = all(h[i] == instance[i] or h[i] == '?' for i in range(num_attributes))
        if (label == 'yes' and not match) or (label == 'no' and match):
            consistent = False
            break
    if consistent:
        consistent_hypotheses.append(h)

print("\nList-Then-Eliminate: Consistent Hypotheses:")
for i, h in enumerate(consistent_hypotheses):
    print(f"{i + 1}. {h}")

print(f"\nTotal consistent hypotheses found: {len(consistent_hypotheses)}")

OUTPUT:-

The Given Training Data Set:

['Sky', 'AirTemp', 'Humidity', 'Wind', 'Water', 'Forecast', 'EnjoySport']

['Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same', 'Yes']

['Sunny', 'Warm', 'High', 'Strong', 'Warm', 'Same', 'Yes']

['Rainy', 'Cold', 'High', 'Strong', 'Warm', 'Change', 'No']

['Sunny', 'Warm', 'High', 'Strong', 'Cool', 'Change', 'Yes']

The initial value of hypothesis:

['0', '0', '0', '0', '0', '0']


Find-S: Finding a Maximally Specific Hypothesis

For Training Instance No: 1 the hypothesis is: ['Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same']

For Training Instance No: 2 the hypothesis is: ['Sunny', 'Warm', '?', 'Strong', 'Warm', 'Same']

For Training Instance No: 3 the hypothesis is: ['Sunny', 'Warm', '?', 'Strong', 'Warm', 'Same']

For Training Instance No: 4 the hypothesis is: ['Sunny', 'Warm', '?', 'Strong', '?', '?']

The Maximally Specific Hypothesis for the given Training Examples:

['Sunny', 'Warm', '?', 'Strong', '?', '?']

List-Then-Eliminate: Consistent Hypotheses:

1. ('Sunny', 'Warm', '?', 'Strong', '?', '?')

2. ('Sunny', 'Warm', '?', '?', '?', '?')

3. ('Sunny', '?', '?', 'Strong', '?', '?')

4. ('Sunny', '?', '?', '?', '?', '?')

5. ('?', 'Warm', '?', 'Strong', '?', '?')

6. ('?', 'Warm', '?', '?', '?', '?')

Total consistent hypotheses found: 6


2. For a given set of training data examples stored in a .CSV file, implement and
demonstrate the Candidate-Elimination algorithm. Output a description of the set of all
hypotheses consistent with the training examples.

# Candidate Elimination Algorithm
import csv

# Load dataset
with open('sports.csv', 'r') as f:
    reader = csv.reader(f)
    data = list(reader)

headers = data[0]
data = data[1:]

# Initialize S with the first positive example and G with the most general hypothesis
for row in data:
    if row[-1].lower() == "yes":
        S = row[:-1]
        break

G = [['?'] * len(S)]

print("Initial S:", S)
print("Initial G:", G)
print("-" * 50)


# Candidate Elimination process
for row in data:
    instance, label = row[:-1], row[-1].lower()
    if label == "yes":  # Positive instance: generalize S, prune G
        for i in range(len(S)):
            if S[i] != instance[i]:
                S[i] = '?'
        G = [g for g in G if all(g[i] == '?' or g[i] == instance[i] for i in range(len(S)))]
    else:  # Negative instance: specialize G
        new_G = []
        for g in G:
            if all(g[i] == '?' or g[i] == instance[i] for i in range(len(S))):
                # g wrongly covers the negative instance, so specialize it
                for i in range(len(S)):
                    if g[i] == '?':
                        values = sorted(set(r[i] for r in data))
                        for val in values:
                            if val != instance[i]:
                                new_hyp = g.copy()
                                new_hyp[i] = val
                                # Keep only specializations that remain more general than S
                                if all(new_hyp[j] == '?' or new_hyp[j] == S[j] for j in range(len(S))):
                                    new_G.append(new_hyp)
            else:
                new_G.append(g)
        G = new_G
    print(f"Instance: {instance}, Label: {label}")
    print("S:", S)
    print("G:", G)
    print("-" * 50)

print("Final S:", S)
print("Final G:", G)

OUTPUT:-

Initial S: ['Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same']

Initial G: [['?', '?', '?', '?', '?', '?']]

--------------------------------------------------

Instance: ['Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same'], Label: yes

S: ['Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same']

G: [['?', '?', '?', '?', '?', '?']]

--------------------------------------------------

Instance: ['Sunny', 'Warm', 'High', 'Strong', 'Warm', 'Same'], Label: yes

S: ['Sunny', 'Warm', '?', 'Strong', 'Warm', 'Same']

G: [['?', '?', '?', '?', '?', '?']]

--------------------------------------------------

Instance: ['Rainy', 'Cold', 'High', 'Strong', 'Warm', 'Change'], Label: no

S: ['Sunny', 'Warm', '?', 'Strong', 'Warm', 'Same']


G: [['Sunny', '?', '?', '?', '?', '?'], ['?', 'Warm', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', 'Same']]

--------------------------------------------------

Instance: ['Sunny', 'Warm', 'High', 'Weak', 'Cool', 'Change'], Label: yes

S: ['Sunny', 'Warm', '?', '?', '?', '?']

G: [['Sunny', '?', '?', '?', '?', '?'], ['?', 'Warm', '?', '?', '?', '?']]

--------------------------------------------------

Final S: ['Sunny', 'Warm', '?', '?', '?', '?']

Final G: [['Sunny', '?', '?', '?', '?', '?'], ['?', 'Warm', '?', '?', '?', '?']]


3. Apply a Preprocessing (Data Cleaning, Integration and Transformation) activity on
suitable data. For example: identify and delete rows that contain duplicate data, and
identify and delete columns that contain a single value, each using an appropriate dataset.

import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

# 1. Create Sample Datasets
data1 = {
    'ID': [1, 2, 3, 3, 4, 5],
    'Name': ['Alice', 'Bob', 'Cathy', 'Cathy', 'David', np.nan],  # Missing Name
    'Age': [25, 30, 22, 22, np.nan, 35],                          # Missing Age
    'City': ['NY', 'LA', 'NY', 'NY', 'TX', 'TX'],
    'Constant': [1, 1, 1, 1, 1, 1]                                # Single-value column
}

data2 = {
    'ID': [6, 7],
    'Name': ['Frank', 'Grace'],
    'Age': [40, 29],
    'City': ['LA', np.nan],  # Missing City
    'Constant': [1, 1]
}

df1 = pd.DataFrame(data1)
df2 = pd.DataFrame(data2)

# Integrate datasets
df = pd.concat([df1, df2], ignore_index=True)

# 2. Visualize Missing Values BEFORE Cleaning
plt.figure(figsize=(7, 4))
sns.heatmap(df.isnull(), cbar=False, cmap="viridis")
plt.title("Missing Values BEFORE Cleaning")
plt.show()

# 3. Handle Missing Values
df_cleaned = df.fillna({
    'Name': 'Unknown',
    'Age': df['Age'].mean(),  # Fill numeric column with the mean
    'City': 'Unknown'
})

# Remove duplicate rows
df_cleaned = df_cleaned.drop_duplicates()

# Drop single-value columns
df_final = df_cleaned.loc[:, df_cleaned.nunique() > 1]

# 4. Visualize Missing Values AFTER Cleaning
plt.figure(figsize=(7, 4))
sns.heatmap(df_final.isnull(), cbar=False, cmap="viridis")
plt.title("Missing Values AFTER Cleaning")
plt.show()

print("\n---- Final Preprocessed Dataset ----")
print(df_final)
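
The program statement asks to first identify the duplicate rows and single-value columns; a minimal inspection sketch over the integrated df above (run before the cleaning steps) makes that identification explicit:

# Inspection sketch: report what the removal steps above will delete
print("Duplicate rows found:", df.duplicated().sum())               # expected: 1
single_value_cols = [c for c in df.columns if df[c].nunique() <= 1]
print("Single-value columns found:", single_value_cols)             # expected: ['Constant']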


OUTPUT:-

---- Final Preprocessed Dataset ----

ID Name Age City

0 1 Alice 25.0 NY

1 2 Bob 30.0 LA

2 3 Cathy 22.0 NY

4 4 David 29.0 TX

5 5 Unknown 35.0 TX

6 6 Frank 40.0 LA

7 7 Grace 29.0 Unknown


4. Apply the ID3 algorithm to build a Decision Tree from an appropriate dataset and
predict the class of a new data point.

import pandas as pd
import math
from collections import Counter

# Example dataset (commented out because the CSV file is used instead)
# data = {
#     'Outlook': ['Sunny', 'Sunny', 'Overcast', 'Rainy', 'Rainy', 'Rainy', 'Overcast',
#                 'Sunny', 'Sunny', 'Rainy', 'Sunny', 'Overcast', 'Overcast', 'Rainy'],
#     'Temperature': ['Hot', 'Hot', 'Hot', 'Mild', 'Cool', 'Cool', 'Cool',
#                     'Mild', 'Cool', 'Mild', 'Mild', 'Mild', 'Hot', 'Mild'],
#     'Humidity': ['High', 'High', 'High', 'High', 'Normal', 'Normal', 'Normal',
#                  'High', 'Normal', 'Normal', 'Normal', 'High', 'Normal', 'High'],
#     'Wind': ['Weak', 'Strong', 'Weak', 'Weak', 'Weak', 'Strong', 'Strong',
#              'Weak', 'Weak', 'Weak', 'Strong', 'Strong', 'Weak', 'Strong'],
#     'PlayTennis': ['No', 'No', 'Yes', 'Yes', 'Yes', 'No', 'Yes',
#                    'No', 'Yes', 'Yes', 'Yes', 'Yes', 'Yes', 'No']
# }
# df = pd.DataFrame(data)

# Load dataset from CSV
df = pd.read_csv('PlayTennis.csv')

print("Play Tennis Dataset:\n")
print(df)

# Function to calculate entropy
def entropy(target_col):
    elements, counts = zip(*Counter(target_col).items())
    entropy_val = 0
    for i in range(len(elements)):
        p = counts[i] / sum(counts)
        entropy_val -= p * math.log2(p)
    return entropy_val
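
As a sanity check against the dataset below: with 9 Yes and 5 No labels, entropy(df['PlayTennis']) evaluates to -(9/14)*log2(9/14) - (5/14)*log2(5/14) ≈ 0.940 bits.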

# Function to calculate Information Gain
def InfoGain(data, split_attribute_name, target_name="PlayTennis"):
    total_entropy = entropy(data[target_name])
    vals, counts = zip(*Counter(data[split_attribute_name]).items())
    weighted_entropy = 0
    for i in range(len(vals)):
        subset = data[data[split_attribute_name] == vals[i]]
        weighted_entropy += (counts[i] / sum(counts)) * entropy(subset[target_name])
    information_gain = total_entropy - weighted_entropy
    return information_gain
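
For reference, on this dataset the gains evaluate to roughly 0.246 for Outlook, 0.151 for Humidity, 0.048 for Wind, and 0.029 for Temperature, which is why Outlook is selected as the root of the tree below.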

# ID3 Algorithm
def ID3(data, original_data, features, target_attribute_name="PlayTennis",
        parent_node_class=None):
    # If all target values have the same value, return that value
    if len(Counter(data[target_attribute_name])) == 1:
        return list(Counter(data[target_attribute_name]).keys())[0]
    # If the dataset is empty, return the mode target value of the original data
    elif len(data) == 0:
        return Counter(original_data[target_attribute_name]).most_common(1)[0][0]
    # If the feature space is empty, return the parent node class
    elif len(features) == 0:
        return parent_node_class
    else:
        # Get the most common target value of the current node
        parent_node_class = Counter(data[target_attribute_name]).most_common(1)[0][0]
        # Select the feature that best splits the dataset
        item_values = [InfoGain(data, feature, target_attribute_name) for feature in features]
        best_feature_index = item_values.index(max(item_values))
        best_feature = features[best_feature_index]
        # Create the tree structure
        tree = {best_feature: {}}
        # Remove the chosen feature from the feature space
        remaining_features = [f for f in features if f != best_feature]
        # Grow a branch under the root node for each possible value of best_feature
        for value in data[best_feature].unique():
            subset = data[data[best_feature] == value]
            subtree = ID3(subset, original_data, remaining_features,
                          target_attribute_name, parent_node_class)
            tree[best_feature][value] = subtree
        return tree

# Build the tree
features = list(df.columns)
features.remove("PlayTennis")
tree = ID3(df, df, features)

print("\nDecision Tree:\n", tree)


OUTPUT:-

Play Tennis Dataset:

Outlook Temperature Humidity Wind PlayTennis

0 Sunny Hot High Weak No

1 Sunny Hot High Strong No

2 Overcast Hot High Weak Yes

3 Rainy Mild High Weak Yes

4 Rainy Cool Normal Weak Yes

5 Rainy Cool Normal Strong No

6 Overcast Cool Normal Strong Yes

7 Sunny Mild High Weak No

8 Sunny Cool Normal Weak Yes

9 Rainy Mild Normal Weak Yes

10 Sunny Mild Normal Strong Yes

11 Overcast Mild High Strong Yes

12 Overcast Hot Normal Weak Yes

13 Rainy Mild High Strong No

Decision Tree:

{'Outlook': {'Sunny': {'Humidity': {'High': 'No', 'Normal': 'Yes'}}, 'Overcast': 'Yes', 'Rainy':
{'Wind': {'Weak': 'Yes', 'Strong': 'No'}}}}
