22cs503 Machine Learning Lab
The Candidate Elimination Algorithm learns from examples to identify concepts. It does this by refining a set
of possible descriptions (hypotheses) based on whether examples are positive matches or not.
Algorithm :
1. Initialize the specific hypothesis S to the first positive example and the general hypothesis G to the most general boundary (all '?').
2. For each positive example: for every attribute, if the attribute value matches the hypothesis value in S, do nothing; otherwise generalize S by replacing that attribute with '?'.
3. For each negative example: for every attribute, if the attribute value differs from S, specialize the corresponding entry of G with the value from S; otherwise set it to '?'.
4. Remove from G any hypothesis that is still fully general, then output the final S and G.
Program:
import numpy as np
import pandas as pd

data = pd.read_csv('EnjoySport.csv')
concepts = np.array(data.iloc[:, 0:-1])
print("\nInstances are:\n", concepts)
target = np.array(data.iloc[:, -1])

# Initialize the specific hypothesis to the first instance and the
# general hypothesis to the most general boundary (all '?')
specific_h = concepts[0].copy()
general_h = [['?' for _ in range(len(specific_h))] for _ in range(len(specific_h))]

for i, h in enumerate(concepts):
    if target[i] == "yes":      # positive example: generalize specific_h
        for x in range(len(specific_h)):
            if h[x] != specific_h[x]:
                specific_h[x] = '?'
                general_h[x][x] = '?'
    if target[i] == "no":       # negative example: specialize general_h
        for x in range(len(specific_h)):
            if h[x] != specific_h[x]:
                general_h[x][x] = specific_h[x]
            else:
                general_h[x][x] = '?'

print("\nFinal Specific Hypothesis:\n", specific_h)
# Drop the rows of general_h that remained completely general
indices = [i for i, val in enumerate(general_h) if val == ['?', '?', '?', '?', '?', '?']]
for i in indices:
    general_h.remove(['?', '?', '?', '?', '?', '?'])
print("\nFinal General Hypothesis:\n", general_h)
Result:
Thus the above program has been Executed Successfully.
EXP NO:2 2A. R-Squared Error
DATE:
Aim:
To write a program to compute the R-Squared error for a regression model.
Algorithm:
1. Obtain the actual target values y and compute their mean y̅.
2. Calculate the Total Sum of Squares (TSS) by subtracting the mean y̅ from each observation yi, squaring the difference, and summing over all values: TSS = Σ (yi − y̅)².
3. Estimate the model parameters using a suitable regression model such as Linear Regression or an SVM regressor and obtain the predictions y_predi.
4. Calculate the Residual Sum of Squares (RSS) by subtracting each predicted value y_predi from yi, squaring these differences, and summing all n terms: RSS = Σ (yi − y_predi)².
5. Compute R² = 1 − RSS / TSS and print it.
Program:
print("\n")
y = [1, 2, 3, 6]
print("Error", r_squared)
Output:
Result:
Aim:
To write a program to compute the Adjusted R-Squared error for a regression model.
Algorithm:
2. Calculate the Total Sum of Squares (TSS) by subtracting each observation yi from y̅, then squaring it and
summing these square differences across all the values. It is denoted by
2. We calculate the Sum of squares due to regression which is denoted by RSS. This is calculated by subtracting
each predicted value of y denoted by y_predi from yi squaring these differences and then summing all the n terms.
Program:
def adjusted_r2(r_squared, n, k):
    adjusted_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - k - 1)
    return adjusted_r_squared

r_squared_value = 0.75  # R-squared of the fitted model
n_obs = 100             # number of observations
n_pred = 3              # number of predictors
print("Adjusted R-Squared:", adjusted_r2(r_squared_value, n_obs, n_pred))
Output:
Result:
Aim:
To write a program to compute the Mean Absolute Error (MAE) between actual and predicted values.
Algorithm:
1. Input y and yCap, where y is the array of actual target values and yCap is the array of predicted target values.
2. Initialize var = 0 and, for every index i, accumulate the absolute error: var += |yi − yiCap|.
3. Divide var by the number of samples n to obtain MAE = var / n and print it.
Program:
import numpy as np

def mean_absolute_error(y_true, y_pred):
    n = len(y_true)
    var = np.sum(np.abs(np.array(y_true) - np.array(y_pred))) / n
    return var

y_true = [3, -0.5, 2, 7]   # actual values (assumed for illustration)
y_pred = [2.5, 0.0, 2, 8]  # predicted values (assumed for illustration)
var_res = mean_absolute_error(y_true, y_pred)
print(f"MAE: {var_res}")
Output:
Result:
Aim:
To write a program to compute the Mean Squared Error (MSE) between actual and predicted values.
Algorithm:
1. Fit a regression line (or use given predictions) so that an equation relating X to Y is available.
2. Insert the X values into the equation found in step 1 to get the respective predicted values Yi Cap.
3. Subtract the predicted values Yi Cap from the original Y values; these differences are the error terms, also known as the vertical distances of the points from the regression line: Yi − Yi Cap.
4. Square each error term, sum them, and divide by the number of samples n: MSE = (1/n) Σ (Yi − Yi Cap)².
Program:
import numpy as np

def mean_squared_error(y_true, y_pred):
    squared_diff = (np.array(y_true) - np.array(y_pred)) ** 2
    mse = np.mean(squared_diff)
    return mse

y_true = [3, -0.5, 2, 7]   # actual values (assumed for illustration)
y_pred = [2.5, 0.0, 2, 8]  # predicted values (assumed for illustration)
mse = mean_squared_error(y_true, y_pred)
print(f"mse: {mse}")
Output:
Result:
EX NO:03 MACHINE LEARNING MODEL EVALUATION TECHNIQUES
DATE:
A. CONFUSION MATRIX
AIM:
To write a program to implement a machine learning model evaluation technique using a confusion matrix.
ALGORITHM:
1. Initialize matrix: Create a square matrix with dimensions (N x N), where N is the number of classes in your
classification problem. Initialize all values in the matrix to zero.
2. Iterate over predictions and ground truth: For each prediction-ground truth pair in your dataset:
If the predicted class matches the actual class:
Increment the corresponding entry in the confusion matrix (true positive).
If the predicted class does not match the actual class:
Increment the entry in the confusion matrix corresponding to the actual class and the
predicted class (false positive for the predicted class, false negative for the actual class).
3. Calculate metrics: Optionally, you can calculate various evaluation metrics using the values in the
confusion matrix, such as accuracy, precision, recall, and F1-score.
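As a quick illustration of step 3, the four counts can be turned into the usual metrics; a minimal sketch (the counts passed in are assumed example values, and the TP, TN, FP, FN names match those extracted in the program below):
def evaluation_metrics(TP, TN, FP, FN):
    # Standard metrics derived from the confusion-matrix counts
    accuracy = (TP + TN) / (TP + TN + FP + FN)
    precision = TP / (TP + FP)
    recall = TP / (TP + FN)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

print(evaluation_metrics(TP=4, TN=3, FP=2, FN=1))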
Program:
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix
y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
conf_matrix = confusion_matrix(y_true, y_pred)
sns.heatmap(conf_matrix, annot=True, fmt="d", cmap="Blues", cbar=False)
plt.xlabel("Predicted")
plt.ylabel("True")
plt.title("Confusion Matrix")
plt.show()
TP = conf_matrix[1, 1]
TN = conf_matrix[0, 0]
FP = conf_matrix[0, 1]
FN = conf_matrix[1, 0]
print("True Positive (TP):", TP)
print("True Negative (TN):", TN)
print("False Positive (FP):", FP)
print("False Negative (FN):", FN)
print("\n")
Output:
Result:
Thus the above program has been Executed Successfully.
B. F1 Score
AIM:
To write a program to implement a machine learning model evaluation technique using the F1 score.
ALGORITHM:
1. Compute Precision and Recall for each class from the confusion-matrix counts, then compute the per-class F1 score as F1 = 2 × (Precision × Recall) / (Precision + Recall).
2. Aggregate F1 Scores:
Depending on your needs, you may want to aggregate the F1 scores in different ways. Common approaches
include:
● Micro-average: Compute F1 score globally by considering total true positives, false positives, and false
negatives across all classes.
● Macro-average: Compute the average F1 score across all classes without considering class imbalance.
● Weighted-average: Compute the average F1 score across all classes with each class weighted by its
support (the number of true instances for each class).
Program:
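The program body is not present in this copy; a minimal sketch using scikit-learn's f1_score with the three averaging modes described above (the label lists are assumed example values):
from sklearn.metrics import f1_score

# Assumed example labels for a 3-class problem
y_true = [0, 1, 2, 2, 1, 0, 1, 2, 0, 1]
y_pred = [0, 2, 2, 2, 1, 0, 0, 2, 0, 1]

print("Micro-average F1:", f1_score(y_true, y_pred, average='micro'))
print("Macro-average F1:", f1_score(y_true, y_pred, average='macro'))
print("Weighted-average F1:", f1_score(y_true, y_pred, average='weighted'))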
Output:
Result:
Thus the above Program has been Executed Successfully.
C. AUC-ROC Curve
AIM:
To write a program to implement a machine learning model evaluation technique using the AUC-ROC curve.
ALGORITHM:
1. Sort Predictions: Sort the predictions generated by your binary classification algorithm in descending order of
confidence scores.
2. Initialize Variables: Set variables for true positives (TP), false positives (FP), true negatives (TN), and false
negatives (FN) to zero.
3. Iterate Through Sorted Predictions:
● Start iterating through the sorted predictions.
● For each prediction:
● If the prediction is a true positive (actual positive and predicted positive), increment TP.
● If the prediction is a false positive (actual negative but predicted positive), increment FP.
● If the prediction is a true negative (actual negative and predicted negative), increment TN.
● If the prediction is a false negative (actual positive but predicted negative), increment FN.
4. Calculate True Positive Rate (TPR) and False Positive Rate (FPR):
● True Positive Rate (TPR) is calculated as TPR = TP / (TP + FN)
● False Positive Rate (FPR) is calculated as FPR = FP / (FP + TN)
5. Plot ROC Curve: Plot the FPR on the x-axis and the TPR on the y-axis.
6. Calculate AUC-ROC:
● Calculate the area under the ROC curve using various methods such as trapezoidal rule or other
numerical integration methods.
Program:
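The program body is not present in this copy either; a minimal sketch using scikit-learn's roc_curve and auc on assumed example labels and scores:
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, auc

# Assumed true labels and predicted scores for illustration
y_true = [0, 0, 1, 1, 0, 1, 1, 0, 1, 0]
y_scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.7, 0.3, 0.6, 0.5]

fpr, tpr, thresholds = roc_curve(y_true, y_scores)
roc_auc = auc(fpr, tpr)

plt.plot(fpr, tpr, label=f"ROC curve (AUC = {roc_auc:.2f})")
plt.plot([0, 1], [0, 1], linestyle='--')  # chance line
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.title("AUC-ROC Curve")
plt.legend()
plt.show()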
Output:
Result:
Thus the above program has been Executed Successfully.
EX NO:04 IDENTIFYING FEATURE CORRELATION USING PCA
DATE:
AIM:
To identify feature correlation using Principal Component Analysis (PCA).
ALGORITHM:
1. Generate (or load) the dataset and standardise the features.
2. Compute the covariance matrix of the standardised data.
3. Compute its eigenvalues and eigenvectors and sort them in descending order of eigenvalue.
4. Compute the explained variance ratio of each component and plot the cumulative explained variance.
5. Examine the weights of the first principal component to see how strongly each original feature contributes.
6. Project the data onto the principal components and compute the correlation matrix of the component scores.
Program:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

np.random.seed(42)
data = np.random.rand(100, 5)

# Standardise the data, then compute the covariance matrix and its eigen-decomposition
data_std = (data - data.mean(axis=0)) / data.std(axis=0)
cov_matrix = np.cov(data_std, rowvar=False)
eigenvalues, eigenvectors = np.linalg.eigh(cov_matrix)

# Sort the components by decreasing eigenvalue
sorted_indices = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[sorted_indices]
eigenvectors = eigenvectors[:, sorted_indices]

# Explained variance and its cumulative sum
explained_variance_ratio = eigenvalues / np.sum(eigenvalues)
cumulative_explained_variance = np.cumsum(explained_variance_ratio)
plt.plot(range(1, 6), cumulative_explained_variance, marker='o')
plt.xlabel('Number of Components')
plt.ylabel('Cumulative Explained Variance')
plt.show()

# Weights of the first principal component for each original feature
feature_weights = pd.DataFrame(eigenvectors[:, 0],
                               index=['Feature 1', 'Feature 2', 'Feature 3', 'Feature 4', 'Feature 5'],
                               columns=['Weight'])
feature_weights.plot(kind='bar', legend=None)
plt.ylabel('Weight')
plt.show()

# Project the data onto the principal components and check that the scores are uncorrelated
pca_df = pd.DataFrame(data_std @ eigenvectors, columns=[f'PC{i+1}' for i in range(5)])
correlation_matrix = pca_df.corr()
print(correlation_matrix)
Output:
Result:
Aim :
To implement and interpret canonical covariates (Canonical Correlation Analysis) with a heatmap.
Algorithm :
1. Load the Iris dataset and split its features into two views: X1 (sepal length and width) and X2 (petal length and width).
2. Fit a CCA model with two components on the pair (X1, X2).
3. Extract the canonical weight matrices x_weights_ and y_weights_ of the fitted model.
4. Plot the weights of each view as a heatmap and interpret which original features contribute most to each canonical variate.
Program:
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.cross_decomposition import CCA
from sklearn.datasets import load_iris
iris = load_iris()
X1 = iris.data[:, :2]
X2 = iris.data[:, 2:]
cca = CCA(n_components=2)
cca.fit(X1, X2)
canonical_coef_X1 = cca.x_weights_
canonical_coef_X2 = cca.y_weights_
plt.figure(figsize=(10, 5))
plt.subplot(1, 2, 1)
sns.heatmap(canonical_coef_X1.T, annot=True, cmap="coolwarm", xticklabels=iris.feature_names[:2],
            yticklabels=[f"CCA {i+1}" for i in range(canonical_coef_X1.shape[1])])
plt.title('Canonical Coefficients for X1')
plt.subplot(1, 2, 2)
sns.heatmap(canonical_coef_X2.T, annot=True, cmap="coolwarm", xticklabels=iris.feature_names[2:],
            yticklabels=[f"CCA {i+1}" for i in range(canonical_coef_X2.shape[1])])
plt.title('Canonical Coefficients for X2')
plt.tight_layout()
plt.show()
Output:
Result:
Thus the above program has been Executed Successfully.
AIM:
To implement Python code for Feature Selection, Feature Transformation, and Feature Extraction.
ALGORITHM:
1. Load the Dataset: Load your dataset into a pandas DataFrame or any other suitable data structure.
2. Feature Selection:
a. Decide on a feature selection method based on your problem and data characteristics. Common methods include:
● Filter methods: Use statistical measures like correlation, chi-square test, or mutual information to select
features.
● Wrapper methods: Train a machine learning model and select features based on their impact on model
performance (e.g., recursive feature elimination; see the sketch after this list).
● Embedded methods: Select features as part of the model training process (e.g., L1 regularization).
b. Implement the selected feature selection method using libraries like scikit-learn or feature selection algorithms directly.
3. Feature Transformation:
a. Choose a feature transformation technique suitable for your data. Common techniques include:
● Standardization (scaling): Scale features to have a mean of 0 and a standard deviation of 1.
● Normalization: Scale features to a specified range (e.g., [0, 1]).
● PCA (Principal Component Analysis): Transform features into a lower-dimensional space while
preserving the most important information.
● LDA (Linear Discriminant Analysis): Find linear combinations of features that best separate different classes.
b. Implement the chosen feature transformation technique using libraries like scikit-learn.
4. Feature Extraction:
a. Decide on a feature extraction method suitable for your problem. Common techniques include:
● PCA: Extract principal components that capture the maximum variance in the data.
● LDA: Extract linear discriminants that maximize class separability.
● t-SNE (t-distributed Stochastic Neighbor Embedding): Non-linear dimensionality reduction technique
for visualizing high-dimensional data.
● Autoencoders: Train neural networks to learn compact representations of data.
b. Implement the chosen feature extraction method using libraries like scikit-learn or deep learning frameworks like TensorFlow or PyTorch.
5. Apply Feature Selection, Transformation, and Extraction:
● Apply the selected feature selection, transformation, and extraction methods to your dataset.
6. Evaluate the Performance:
● Evaluate the performance of your model using the selected features, transformed features, or
extracted features.
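The sketch referred to in the feature-selection step: wrapper-style selection with recursive feature elimination (RFE). The logistic-regression estimator and the choice of keeping two features are assumptions for illustration; the main program below uses a chi-square filter instead.
from sklearn.datasets import load_iris
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
# Recursively drop the weakest feature until two remain
rfe = RFE(estimator=LogisticRegression(max_iter=1000), n_features_to_select=2)
rfe.fit(X, y)
print("Selected feature mask:", rfe.support_)
print("Feature ranking:", rfe.ranking_)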
Program:
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt
iris = load_iris()
X, y = iris.data, iris.target
selector = SelectKBest(chi2, k=3).fit(X, y)
selected_features_indices = selector.get_support(indices=True)
selected_features = [iris.feature_names[i] for i in selected_features_indices]
X_new = selector.transform(X)
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_scaled)
print("Original shape:", X.shape)
print("Shape after feature selection:", X_new.shape)
print("Shape after feature transformation:", X_scaled.shape)
print("Shape after feature extraction:", X_pca.shape)
print("Selected features:", selected_features)
plt.figure(figsize=(14, 5))
plt.subplot(1, 3, 1)
plt.scatter(X_new[:, 0], X_new[:, 1], c=y, cmap='viridis', edgecolor='k')
plt.title('Feature Selection')
plt.subplot(1, 3, 2)
plt.scatter(X_scaled[:, 0], X_scaled[:, 1], c=y, cmap='viridis', edgecolor='k')
plt.title('Feature Transformation')
plt.subplot(1, 3, 3)
plt.scatter(X_pca[:, 0], X_pca[:, 1], c=y, cmap='viridis', edgecolor='k')
plt.title('Feature Extraction (PCA)')
plt.tight_layout()
plt.show()
Output:
Result:
Thus the above program has been Executed Successfully.
Aim :
Implement the non-parametric Locally Weighted Regression algorithm in order to fit data points. Select the
appropriate data set for your experiment and draw graphs.
Algorithm:
1. Calculate distances:
o Compute the Euclidean distance between new_point and each data point in X.
o Store the distances in a vector distances.
2. Calculate weights:
o Use the kernel_function to calculate weights for each data point based on its distance.
o The weight of a data point is typically higher if it's closer to new_point.
o Store the weights in a vector weights.
3. Create weighted data:
o Multiply each row of X by its corresponding weight from weights.
o Multiply y by weights.
o Store the weighted data in weighted_X and weighted_y, respectively.
4. Fit a linear model:
o Use ordinary least squares (OLS) to fit a linear model to the weighted data weighted_X and
weighted_y.
5. Predict:
o Use the fitted linear model to predict the value for new_point.
Program
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
df = pd.read_csv('/content/sample_data/tips.csv')
features = np.array(df.total_bill)
labels = np.array(df.tip)
m = features.shape[0]
mtip = np.mat(labels)
data = np.hstack((np.ones((m, 1)), np.mat(features).T))
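The listing above stops after building the design matrix. A minimal sketch of the remaining steps (Gaussian kernel weights, a weighted least-squares fit at every query point, and the plot) follows; the bandwidth k = 0.5 is an assumed value:
def local_weight_regression(xmat, ymat, k):
    m = xmat.shape[0]
    ypred = np.zeros(m)
    for i in range(m):
        # Gaussian kernel weights centred on query point i
        weights = np.mat(np.eye(m))
        for j in range(m):
            diff = xmat[j] - xmat[i]
            weights[j, j] = float(np.exp(diff * diff.T / (-2.0 * k ** 2)))
        # Weighted least-squares solution for this query point
        W = (xmat.T * (weights * xmat)).I * (xmat.T * weights * ymat.T)
        ypred[i] = float(xmat[i] * W)
    return ypred

ypred = local_weight_regression(data, mtip, k=0.5)
order = np.array(data[:, 1].argsort(axis=0)).flatten()
plt.scatter(features, labels, color='blue', alpha=0.5)
plt.plot(features[order], ypred[order], color='red', linewidth=2)
plt.xlabel('Total Bill')
plt.ylabel('Tip')
plt.show()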
OUTPUT
Result:
Thus the above program has been Executed Successfully.
AIM:
To Implement and demonstrate the working of the decision tree-based ID3 algorithm
ALGORITHM:
1. Load the dataset and collect the candidate attributes (every column except the target "answer").
2. Compute the entropy of the current set of examples from the proportion of "yes" and "no" answers.
3. For every candidate attribute, compute its information gain; pick the attribute with the highest gain as the decision node.
4. For each value of the chosen attribute, form the subset of matching examples: if the subset has zero entropy, attach a leaf with its class label; otherwise recurse with the remaining attributes.
5. Print the resulting tree and use it to classify new examples.
PROGRAM:
import pandas as pd
import math
import numpy as np
data = pd.read_csv("/content/sample_data/3-dataset.csv")
features = [feat for feat in data]
features.remove("answer")
class Node:
def __init__(self):
self.children = []
self.value = ""
self.isLeaf = False
self.pred = ""
def entropy(examples):
pos = 0.0
neg = 0.0
for _, row in examples.iterrows():
if row["answer"] == "yes":
pos += 1
else:
neg += 1
if pos == 0.0 or neg == 0.0:
return 0.0
else:
p = pos / (pos + neg)
n = neg / (pos + neg)
return -(p * math.log(p, 2) + n * math.log(n, 2))
def info_gain(examples, attr):
uniq = np.unique(examples[attr])
#print ("\n",uniq)
gain = entropy(examples)
#print ("\n",gain)
for u in uniq:
subdata = examples[examples[attr] == u]
#print ("\n",subdata)
sub_e = entropy(subdata)
gain -= (float(len(subdata)) / float(len(examples))) * sub_e
#print ("\n",gain)
return gain
def ID3(examples, attrs):
root = Node()
max_gain = 0
max_feat = ""
for feature in attrs:
#print ("\n",examples)
gain = info_gain(examples, feature)
if gain > max_gain:
max_gain = gain
max_feat = feature
root.value = max_feat
#print ("\nMax feature attr",max_feat)
uniq = np.unique(examples[max_feat])
#print ("\n",uniq)
for u in uniq:
#print ("\n",u)
subdata = examples[examples[max_feat] == u]
#print ("\n",subdata)
if entropy(subdata) == 0.0:
newNode = Node()
newNode.isLeaf = True
newNode.value = u
newNode.pred = np.unique(subdata["answer"])
root.children.append(newNode)
else:
dummyNode = Node()
dummyNode.value = u
new_attrs = attrs.copy()
new_attrs.remove(max_feat)
child = ID3(subdata, new_attrs)
dummyNode.children.append(child)
root.children.append(dummyNode)
return root
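The listing above only builds the tree. Producing the output shown below also needs a routine to print the tree and one to classify a new example; a sketch of such helpers (the names printTree and classify are assumptions):
def printTree(root, depth=0):
    # Indent by depth; leaves show the attribute value and the predicted answer
    print("\t" * depth, end="")
    if root.isLeaf:
        print(root.value, "->", root.pred)
    else:
        print(root.value)
        for child in root.children:
            printTree(child, depth + 1)

def classify(root, new):
    # Follow the branch whose value matches the new example's attribute value
    for child in root.children:
        if child.value == new[root.value]:
            return child.pred if child.isLeaf else classify(child.children[0], new)

root = ID3(data, features)
print("Decision Tree is:")
printTree(root)
print("------------------")
new = {'outlook': 'sunny', 'temperature': 'hot', 'humidity': 'normal', 'wind': 'strong'}
print("Predicted Label for new example", new, "is:", classify(root, new))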
Output:
Decision Tree is:
outlook
overcast -> ['yes']
rain
wind
strong -> ['no']
sunny
humidity
high -> ['no']
------------------
Predicted Label for new example {'outlook': 'sunny', 'temperature': 'hot', 'humidity': 'normal', 'wind': 'strong'} is:
['yes']
Result:
Thus the above program has been Executed Successfully.
AIM:
To Implement and demonstrate the working of a simple Support Vector Machines(SVM) using a dataset
ALGORITHM:
1. Load the Iris dataset and keep only the first two features so the decision boundary can be visualised.
2. Split the data into training and testing sets.
3. Train an SVM classifier on the training set.
4. Predict the labels of the test set and print the classification report and confusion matrix.
5. Plot the decision boundary of the trained classifier over the two features.
PROGRAM:
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn import svm
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, confusion_matrix
# Load dataset
iris = datasets.load_iris()
X = iris.data[:, :2] # Use only the first two features for visualization
y = iris.target
# Split the data and train an SVM classifier (linear kernel and split parameters assumed)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
clf = svm.SVC(kernel='linear')
clf.fit(X_train, y_train)

# Make predictions and report the results
y_pred = clf.predict(X_test)
print(classification_report(y_test, y_pred))
print(confusion_matrix(y_test, y_pred))
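# plot_decision_boundary is not defined in this copy of the listing; a simple
# assumed implementation that shades the predicted class over a grid of the two
# selected features is sketched here.
def plot_decision_boundary(clf, X, y):
    xx, yy = np.meshgrid(np.linspace(X[:, 0].min() - 1, X[:, 0].max() + 1, 200),
                         np.linspace(X[:, 1].min() - 1, X[:, 1].max() + 1, 200))
    Z = clf.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
    plt.contourf(xx, yy, Z, alpha=0.3, cmap=plt.cm.coolwarm)
    plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.coolwarm, edgecolors='k')
    plt.xlabel(iris.feature_names[0])
    plt.ylabel(iris.feature_names[1])
    plt.show()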
plot_decision_boundary(clf, X, y)
OUTPUT:
precision recall f1-score support
accuracy 0.80 45
macro avg 0.78 0.77 0.77 45
weighted avg 0.81 0.80 0.80 45
[[19 0 0]
[ 0 7 6]
[ 0 3 10]]
Result:
Thus the above program has been Executed Successfully.
AIM:
Implement a k-Nearest Neighbour algorithm to classify the iris data set. Print both correct and wrong
predictions
ALGORITHM:
1. Load the Iris dataset and split it into training and testing sets.
2. Fit a k-Nearest Neighbour classifier (k = 3) on the training data.
3. Predict the class of every test sample.
4. Compare each prediction with the actual label and print whether it is correct or wrong.
5. Compute and print the overall accuracy.
PROGRAM:
# k-Nearest Neighbour Algorithm for Iris Dataset Classification
import numpy as np
import pandas as pd
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

iris = datasets.load_iris()
X = iris.data
y = iris.target

# Split the dataset into training and testing sets (split parameters assumed)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)

# Make predictions and report each one as correct or wrong
predictions = knn.predict(X_test)
for i in range(len(predictions)):
    if predictions[i] == y_test[i]:
        print(f"Correct Prediction: {predictions[i]} for Actual: {y_test[i]}")
    else:
        print(f"Wrong Prediction: {predictions[i]} for Actual: {y_test[i]}")

# Print accuracy
accuracy = accuracy_score(y_test, predictions)
print(f"Accuracy: {accuracy * 100:.2f}%")
OUTPUT
Accuracy: 100.00%
RESULT:
AIM:
To implement market basket analysis using association rules
ALGORITHM:
1. Load a transaction dataset and fix a minimum support threshold and a minimum confidence (or lift) threshold.
2. One-hot encode the transactions so that each row is an invoice and each column records whether an item occurs in it.
3. Run the Apriori algorithm to find all frequent itemsets, i.e. itemsets whose support (the fraction of transactions containing them) meets the minimum support.
4. Generate association rules from the frequent itemsets and keep those whose confidence (support of the rule divided by support of its antecedent) and lift meet the chosen thresholds.
5. Sort and inspect the resulting rules, e.g. by lift.
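As a quick illustration of these measures (the counts below are toy numbers, not taken from the retail data used in the program):
# Support, confidence and lift for a toy rule {A} -> {B}
n_transactions = 100
n_A, n_B, n_AB = 20, 30, 12                         # assumed counts for illustration
support_AB = n_AB / n_transactions                  # 0.12
confidence_A_B = n_AB / n_A                         # 0.60
lift_A_B = confidence_A_B / (n_B / n_transactions)  # 2.0
print(support_AB, confidence_A_B, lift_A_B)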
PROGRAM:
#Import all relevant libraries
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from mlxtend.frequent_patterns import apriori
from mlxtend.frequent_patterns import association_rules
#Load the file into pandas
df = pd.read_excel("http://archive.ics.uci.edu/ml/machine-learning-databases/00352/Online%20Retail.xlsx")
#Check the first 5 rows of the dataframe
df.head()
df.info()
df.isna().sum()
df.dropna(inplace=True)
len(df)
#Convert the InvoiceNo column to string
df["InvoiceNo"] = df["InvoiceNo"].astype('str')
len(df)
#Check the distribution of transactions per country.
top10 = df["Country"].value_counts().head(10)
top10
#Create a pie chart to show distribution of transactions
plt.figure(figsize=[8,8])
plt.pie(top10,labels=top10.index, autopct = '%0.0f%%',labeldistance=1.3)
plt.title("Distribution of Transactions by Country")
plt.show()
#Group, sum, unstack and set index of dataframe
basket = df[df['Country'] =="United Kingdom"]\
.groupby(['InvoiceNo', 'Description'])["Quantity"]\
.sum().unstack()\
.reset_index().fillna(0)\
.set_index("InvoiceNo")
basket.head()
#Create function to hot encode the values
def encode_values(x):
if x <= 0:
return 0
if x >= 1:
return 1
basket_encoded = basket.applymap(encode_values)
basket_encoded
#filter for only invoices with 2 or more items
basket_filtered = basket_encoded[(basket_encoded > 0).sum(axis=1) >= 2]
basket_filtered
#Generate the frequent itemsets
frequent_itemsets = apriori(basket_filtered, min_support=0.03,
use_colnames=True).sort_values("support",ascending=False)
frequent_itemsets.head(10)
#Apply association rules
assoc_rules = association_rules(frequent_itemsets, metric="lift",
min_threshold=1).sort_values("lift",ascending=False).reset_index(drop=True)
assoc_rules
OUTPUT:
AIM:
To build an ANN by implementing the Single-layer Perceptron. Test it using appropriate
data sets.
ALGORITHM:
1. Initialize parameters: set the weights and the bias to zero, and choose a learning rate and number of iterations.
2. Define the activation: a step function that outputs 1 when the weighted sum is non-negative and 0 otherwise.
3. Training loop: for every training sample, compute the linear output, apply the activation, and update the weights and bias with the perceptron rule (update = learning_rate × (y − y_predicted)).
4. Testing: apply the trained weights and bias to the test samples and report the predicted classes.
PROGRAM:
import numpy as np

class SingleLayerPerceptron:
    def __init__(self, learning_rate=0.01, n_iter=1000):
        self.learning_rate = learning_rate
        self.n_iter = n_iter
        self.weights = None
        self.bias = None

    def activation_function(self, x):
        return np.where(x >= 0, 1, 0)  # step activation

    def fit(self, X, y):
        self.weights = np.zeros(X.shape[1])
        self.bias = 0
        for _ in range(self.n_iter):
            for idx, x_i in enumerate(X):
                linear_output = np.dot(x_i, self.weights) + self.bias
                y_predicted = self.activation_function(linear_output)
                # Perceptron update rule
                update = self.learning_rate * (y[idx] - y_predicted)
                self.weights += update * x_i
                self.bias += update

    def predict(self, X):
        return self.activation_function(np.dot(X, self.weights) + self.bias)
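To produce the predictions shown in the output, the class can be tested on a small truth-table style dataset; the AND-gate inputs below are an assumption chosen to match that output:
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])   # assumed test data (AND gate)
y = np.array([0, 0, 0, 1])
perceptron = SingleLayerPerceptron(learning_rate=0.1, n_iter=10)
perceptron.fit(X, y)
print("Predictions:", perceptron.predict(X))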
OUTPUT:
Predictions: [0 0 0 1]
RESULT:
Ex 13 MULTIPLE-LAYER PERCEPTRON.
DATE:
AIM:
To implement a Multi-layer Perceptron and test it using an appropriate data set.
ALGORITHM:
1. Import necessary libraries
2. Generate a dataset
3. Split the dataset into training and testing sets
4. Create a Multiple-layer Perceptron model
5. Train the model
6. Make predictions
7. Calculate accuracy
PROGRAM:
import numpy as np
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score
# Generate a dataset
X, y = make_moons(n_samples=1000, noise=0.1, random_state=42)

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Create and train a Multiple-layer Perceptron model (hidden layer sizes assumed)
mlp = MLPClassifier(hidden_layer_sizes=(16, 16), max_iter=1000, random_state=42)
mlp.fit(X_train, y_train)

# Make predictions
y_pred = mlp.predict(X_test)

# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy:.2f}')
OUTPUT:
Accuracy: 1.00
RESULT:
Ex 14 RBF NETWORK
DATE:
AIM:
To Build a RBF Network to calculate the fitness function with five neurons.
ALGORITHM:
1. Choose the number of hidden neurons and select their centers (randomly or with a clustering method).
2. Use a heuristic or optimization technique to determine the widths of the RBF functions.
3. Compute the Gaussian RBF activation of every input sample with respect to each center.
4. Use linear regression or other methods to train the weights between the hidden layer and the output layer.
5. Predict the fitness value of a sample as the weighted sum of its RBF activations.
PROGRAM:
# Radial Basis Function Network for Fitness Calculation
import numpy as np
from sklearn.metrics import pairwise

class RBFNetwork:
    def __init__(self, n_neurons):
        self.n_neurons = n_neurons
        self.centers = np.random.rand(n_neurons, 2)  # Random centers in 2D space
        self.sigmas = np.random.rand(n_neurons)      # Random widths for each neuron
        self.weights = None

    def _activations(self, X):
        # Gaussian RBF activation of every sample with respect to every center
        dists = pairwise.euclidean_distances(X, self.centers)
        return np.exp(-(dists ** 2) / (2 * self.sigmas ** 2))

    def fit(self, X):
        # Fitness target assumed here: the sum of each sample's inputs (for illustration)
        y = X.sum(axis=1)
        G = self._activations(X)
        # Train the output weights with linear least squares
        self.weights, *_ = np.linalg.lstsq(G, y, rcond=None)

    def predict(self, X):
        return self._activations(X) @ self.weights

# Example usage
if __name__ == "__main__":
    rbf_network = RBFNetwork(n_neurons=5)
    X = np.random.rand(10, 2)  # 10 random input samples
    rbf_network.fit(X)
    fitness = rbf_network.predict(X)
    print(fitness)
OUTPUT:
AIM:
To implement a deep learning model using TensorFlow.
ALGORITHM:
Import libraries: TensorFlow is imported to implement the model. tf.keras is used for high-level deep
learning tasks.
Load the dataset: The dataset, such as MNIST or CIFAR-10, is loaded. Both training and testing sets include
images and their respective labels.
Preprocess the dataset: The image data is normalized to have pixel values between 0 and 1 to improve the
convergence of the model.
Conv2D layer: Performs convolution with 32 filters and a 3x3 kernel to detect features in the images.
MaxPooling2D layer: Reduces the spatial size of the feature maps, reducing computational load and
focusing on the most prominent features.
Flatten layer: Converts the 2D matrix to a 1D vector for the fully connected layers.
Dense layers: Fully connected layers. The final Dense layer has a number of neurons equal to the number
of output classes, and it uses the softmax activation function for multi-class classification.
Compile the model: The Adam optimizer is used for optimization, categorical cross-entropy for the loss function,
and accuracy is tracked as a metric.
Train the model: The model is trained for 10 epochs with a batch size of 32, using the training images and labels,
while validating against the test set.
Evaluate the model: The model's performance is evaluated using the test dataset to get the accuracy.
PROGRAM:
# General imports
%matplotlib inline
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import openml as oml
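# --- This copy of the listing jumps from the imports straight to prediction.
# --- A sketch of the missing data loading, CNN definition, and training follows;
# --- loading MNIST through tf.keras (rather than openml) is an assumption made here.
import tensorflow as tf

(X_train, y_train), (X_test, y_test) = tf.keras.datasets.mnist.load_data()
X_train = X_train.reshape(-1, 28, 28, 1) / 255.0   # normalise pixels to [0, 1]
X_test = X_test.reshape(-1, 28, 28, 1) / 255.0

# Conv2D -> MaxPooling2D -> Flatten -> Dense layers, as described in the algorithm
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax')
])
# sparse_categorical_crossentropy is used because the labels are integer class ids
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
model.fit(X_train, y_train, epochs=10, batch_size=32, validation_data=(X_test, y_test))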
predictions = model.predict(X_test)
np.set_printoptions(precision=7)
sample_id = 0  # index of the test image to display (assumed)
fig, axes = plt.subplots(1, 1, figsize=(2, 2))
axes.imshow(X_test[sample_id].reshape(28, 28), cmap=plt.cm.gray_r)
axes.set_xlabel("True label: {}".format(y_test[sample_id]))
axes.set_xticks([])
axes.set_yticks([])
test_loss, test_acc = model.evaluate(X_test, y_test)
print('Test accuracy:', test_acc)
OUTPUT:
RESULT: