FML_LAB FILE
Practical File
Submitted in partial fulfillment for the evaluation of
“Fundamentals of Machine Learning-Lab”
Submitted By:
Students' Names: Kunal Saini, Aditya Jain, Aditi Jain, Ananya Tyagi, Aditya Pachisia
Enrolment Numbers: 07317711621, 07617711621, 08217711621, 08817711621, 09117711621
Branch & Section: AIML(B)
Submitted To:
Dr. Sonakshi Vij
Index
Experiment No. | Experiment Details | Date | Grade/Evaluation | Sign
Experiment 1: Study and Implement Linear Regression.
Abstract:
Code:

import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Load the dataset
data = pd.read_csv('Seed_Data.csv')

# Select the 'area' and 'lengthOfKernel' columns for linear regression
X = data[['area']]  # Predictor variable
y = data['lengthOfKernel']  # Target variable

# Create a linear regression model
model = LinearRegression()

# Fit the model to the data
model.fit(X, y)

# Perform predictions
y_pred = model.predict(X)

# Calculate the mean squared error
mse = mean_squared_error(y, y_pred)
print("Mean Squared Error:", mse)

# Plot the linear regression line
plt.scatter(X, y, color='blue', label='Data')
plt.plot(X, y_pred, color='red', label='Linear Regression')
plt.xlabel('area')
plt.ylabel('lengthOfKernel')
plt.title('Linear Regression: area vs lengthOfKernel')
plt.legend()
plt.show()

# Select the 'compactness' and 'widthOfKernel' columns for linear regression
X = data[['compactness']]  # Predictor variable
y = data['widthOfKernel']  # Target variable

# Create a linear regression model
model = LinearRegression()

# Fit the model to the data
model.fit(X, y)

# Perform predictions
y_pred = model.predict(X)

# Calculate the mean squared error
mse = mean_squared_error(y, y_pred)
print("Mean Squared Error:", mse)

# Plot the linear regression line
plt.scatter(X, y, color='blue', label='Data')
plt.plot(X, y_pred, color='red', label='Linear Regression')
plt.xlabel('compactness')
plt.ylabel('widthOfKernel')
plt.title('Linear Regression: compactness vs widthOfKernel')
plt.legend()
plt.show()

# Select the 'perimeter' and 'asymmetryCoefficient' columns for linear regression
X = data[['perimeter']]  # Predictor variable
y = data['asymmetryCoefficient']  # Target variable

# Create a linear regression model
model = LinearRegression()

# Fit the model to the data
model.fit(X, y)

# Perform predictions
y_pred = model.predict(X)

# Calculate the mean squared error
mse = mean_squared_error(y, y_pred)
print("Mean Squared Error:", mse)

# Plot the linear regression line
plt.scatter(X, y, color='blue', label='Data')
plt.plot(X, y_pred, color='red', label='Linear Regression')
plt.xlabel('perimeter')
plt.ylabel('asymmetryCoefficient')
plt.title('Linear Regression: perimeter vs asymmetryCoefficient')
plt.legend()
plt.show()
Output:
Mean Squared Error: 0.019054032881784515
Mean Squared Error: 0.05962293563300296
Mean Squared Error: 2.1436398314362863
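
As a side note, the fitted line can be inspected directly through the model's coefficients. A minimal, self-contained sketch on synthetic data (the y = 2x + 1 example and variable names are illustrative, not from the experiment above):

import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data following y = 2x + 1 with small noise
rng = np.random.default_rng(0)
X = rng.uniform(10, 20, size=(50, 1))
y = 2 * X.ravel() + 1 + rng.normal(0, 0.1, size=50)

model = LinearRegression().fit(X, y)
print("slope:", model.coef_[0])        # expected to be close to 2
print("intercept:", model.intercept_)  # expected to be close to 1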
Experiment 2: Study and Implement Logistic Regression.
Abstract:
Code:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

# Load the dataset
data = pd.read_csv('Seed_Data.csv')

# Select the 'lengthOfKernel' and 'lengthofkernelgroove' columns for logistic regression
X = data[['lengthOfKernel']]  # Predictor variable
y = data['lengthofkernelgroove']  # Target variable

# Reshape the 'lengthOfKernel' feature into a single-feature column vector
X = np.array(X).reshape(-1, 1)

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create a logistic regression model
model = LogisticRegression()

# Fit the model to the training data
model.fit(X_train, y_train)

# Perform predictions on the test data
y_pred = model.predict(X_test)

# Calculate the accuracy of the model
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)

# Create a confusion matrix
confusion_mat = confusion_matrix(y_test, y_pred)
print("Confusion Matrix:")
print(confusion_mat)

# Plot the logistic regression curve
X_range = np.linspace(X.min(), X.max(), 100).reshape(-1, 1)
y_proba = model.predict_proba(X_range)[:, 1]
plt.scatter(X, y, color='blue', label='Data')
plt.plot(X_range, y_proba, color='red', label='Logistic Regression')
plt.xlabel('LengthOfKernel')
plt.ylabel('lengthofkernelgroove')
plt.title('Logistic Regression: LengthOfKernel vs lengthofkernelgroove')
plt.legend()
plt.show()

# Select the 'perimeter' and 'lengthofkernelgroove' columns for logistic regression
X = data[['perimeter']]  # Predictor variable
y = data['lengthofkernelgroove']  # Target variable

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create a logistic regression model
model = LogisticRegression()

# Fit the model to the training data
model.fit(X_train, y_train)

# Perform predictions on the test data
y_pred = model.predict(X_test)

# Calculate the accuracy of the model
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)

# Create a confusion matrix
confusion_mat = confusion_matrix(y_test, y_pred)
print("Confusion Matrix:")
print(confusion_mat)

# Plot the logistic regression curve
X_range = np.linspace(X.min(), X.max(), 100).reshape(-1, 1)
y_proba = model.predict_proba(X_range)[:, 1]
plt.scatter(X, y, color='blue', label='Data')
plt.plot(X_range, y_proba, color='red', label='Logistic Regression')
plt.xlabel('Perimeter')
plt.ylabel('Lengthofkernelgroove')
plt.title('Logistic Regression: Perimeter vs Lengthofkernelgroove')
plt.legend()
plt.show()
Output:
Accuracy: 0.5024390243902439
Confusion Matrix:
[[39 63]
[39 64]]
[Figure: Logistic Regression: LengthOfKernel vs lengthofkernelgroove]
Accuracy: 0.624390243902439
Confusion Matrix:
[[66 36]
[41 62]]
[Figure: Logistic Regression: Perimeter vs Lengthofkernelgroove]
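
For reference, the probabilities plotted above come from the logistic (sigmoid) function applied to a linear score. A minimal sketch on a toy binary dataset (illustrative values, not the seed data) showing that sigmoid(w*x + b) reproduces predict_proba:

import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = LogisticRegression().fit(X, y)
w, b = clf.coef_[0, 0], clf.intercept_[0]

# Manual sigmoid of the linear score matches the model's probability
x_new = 3.5
p_manual = 1.0 / (1.0 + np.exp(-(w * x_new + b)))
p_model = clf.predict_proba([[x_new]])[0, 1]
print(p_manual, p_model)  # the two values agree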
Experiment 3: Study and Implement K Nearest Neighbour (KNN).
Abstract:
Code:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, confusion_matrix

# Load the dataset
data = pd.read_csv('Seed_Data.csv')

# Select the relevant features and the lengthofkernelgroove target variable
X = data[['perimeter', 'area', 'lengthOfKernel', 'asymmetryCoefficient', 'compactness']]
y = data['lengthofkernelgroove']

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create a KNN classifier with k=5
knn = KNeighborsClassifier(n_neighbors=5)

# Fit the model to the training data
knn.fit(X_train, y_train)

# Perform predictions on the test data
y_pred = knn.predict(X_test)

# Calculate the accuracy of the model
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)

# Create a confusion matrix
confusion_mat = confusion_matrix(y_test, y_pred)
print("Confusion Matrix:")
print(confusion_mat)

# Scatter plot of perimeter vs. lengthOfKernel, coloured by class
plt.scatter(data['perimeter'], data['lengthOfKernel'], c=data['lengthofkernelgroove'], cmap='coolwarm')
plt.xlabel('Perimeter')
plt.ylabel('LengthOfKernel')
plt.title('KNN')
plt.colorbar(label='Lengthofkernelgroove')
plt.show()

# Histogram of asymmetryCoefficient for each class
plt.hist(data[data['lengthofkernelgroove'] == 0]['asymmetryCoefficient'], bins=30, alpha=0.5, label='Lengthofkernelgroove 0')
plt.hist(data[data['lengthofkernelgroove'] == 1]['asymmetryCoefficient'], bins=30, alpha=0.5, label='Lengthofkernelgroove 1')
plt.xlabel('asymmetryCoefficient')
plt.ylabel('Frequency')
plt.title('KNN')
plt.legend()
plt.show()
Output:
[Figures: KNN scatter plot of perimeter vs lengthOfKernel; KNN histogram of asymmetryCoefficient by class]
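
The choice of k matters for KNN. A small sketch, assuming the same Seed_Data.csv and columns as above, that sweeps a few values of k and prints test accuracy (the specific k values are arbitrary):

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

data = pd.read_csv('Seed_Data.csv')
X = data[['perimeter', 'area', 'lengthOfKernel', 'asymmetryCoefficient', 'compactness']]
y = data['lengthofkernelgroove']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Larger k averages over more neighbours and smooths the decision boundary
for k in (1, 3, 5, 7, 9):
    knn = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    print("k =", k, "accuracy =", knn.score(X_test, y_test))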
Experiment 4: Study and Implement Classification using SVM.
Abstract:
Code:

import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.svm import SVR
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score

# Load the CSV file into a pandas dataframe
df = pd.read_csv('Seed_Data.csv')

# Split the data into features and target variable
X = df.drop(['compactness'], axis=1)  # Features
y = df['compactness']  # Target variable

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train the linear regression model
model = LinearRegression()
model.fit(X_train, y_train)

# Make predictions on the test set
y_pred = model.predict(X_test)

# Evaluate the performance of the model using the mean squared error and R-squared metrics
print("Mean squared error:", mean_squared_error(y_test, y_pred))
print("R-squared:", r2_score(y_test, y_pred))

# Train the SVM model
model = SVR(kernel='linear')
model.fit(X_train, y_train)

# Make predictions on the test set
y_pred = model.predict(X_test)

# Evaluate the performance of the model using the mean squared error and R-squared metrics
print("Mean squared error:", mean_squared_error(y_test, y_pred))
print("R-squared:", r2_score(y_test, y_pred))

import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import r2_score

# Load the CSV file into a pandas dataframe
df = pd.read_csv('Seed_Data.csv')

# Extract the relevant features
X = df[['area']]
y = df['widthOfKernel']

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Scale the data
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Train the SVM model
svm_regressor = SVR(kernel='rbf')
svm_regressor.fit(X_train_scaled, y_train)

# Make predictions on the test data
y_pred = svm_regressor.predict(X_test_scaled)

# Calculate the coefficient of determination (R^2) on the test data
r2 = r2_score(y_test, y_pred)
print('R^2 score:', r2)

# Plot the predicted values and the actual values on the test data
plt.figure(figsize=(10, 5))
plt.scatter(X_test, y_test, color='black')
plt.plot(X_test, y_pred, color='blue', linewidth=3)
plt.xlabel('area')
plt.ylabel('widthOfKernel')
plt.title('Support Vector Regression')
plt.show()

# Extract the relevant features
X = df[['lengthOfKernel']]
y = df['asymmetryCoefficient']

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Scale the data
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Train the SVM model
svm_regressor = SVR(kernel='rbf')
svm_regressor.fit(X_train_scaled, y_train)

# Make predictions on the test data
y_pred = svm_regressor.predict(X_test_scaled)

# Calculate the coefficient of determination (R^2) on the test data
r2 = r2_score(y_test, y_pred)
print('R^2 score:', r2)

# Plot the predicted values and the actual values on the test data
plt.figure(figsize=(10, 5))
plt.scatter(X_test, y_test, color='black')
plt.plot(X_test, y_pred, color='blue', linewidth=3)
plt.xlabel('lengthOfKernel')
plt.ylabel('asymmetryCoefficient')
plt.title('Support Vector Regression')
plt.show()
Output:
R^2 score: -0.22790423264754622
[Figure: Support Vector Regression, area vs widthOfKernel]
R^2 score: -0.024969824035952604
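
Since the experiment title refers to classification, here is a minimal SVC sketch on synthetic data (make_classification is used purely for illustration; the seed data above was modelled with SVR instead):

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic two-class problem, just to show the classification API
Xc, yc = make_classification(n_samples=200, n_features=4, random_state=42)
Xc_train, Xc_test, yc_train, yc_test = train_test_split(Xc, yc, test_size=0.2, random_state=42)

clf = SVC(kernel='rbf')
clf.fit(Xc_train, yc_train)
print("SVC test accuracy:", clf.score(Xc_test, yc_test))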
Experiment 5: Study and Implement Bagging using Random Forests.
Abstract:
Code:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix

# Load the dataset
data = pd.read_csv('Seed_Data.csv')

# Select the features and the lengthofkernelgroove target variable
X = data.drop('lengthofkernelgroove', axis=1)
y = data['lengthofkernelgroove']

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create a Random Forest classifier with 100 trees
rf = RandomForestClassifier(n_estimators=100, random_state=42)

# Fit the model to the training data
rf.fit(X_train, y_train)

# Perform predictions on the test data
y_pred = rf.predict(X_test)

# Calculate the accuracy of the model
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)

# Create a confusion matrix
confusion_mat = confusion_matrix(y_test, y_pred)
print("Confusion Matrix:")
print(confusion_mat)

# Feature importance
feature_importance = rf.feature_importances_
feature_names = X.columns

# Sort feature importance in descending order
sorted_indices = np.argsort(feature_importance)[::-1]
sorted_feature_names = feature_names[sorted_indices]
sorted_feature_importance = feature_importance[sorted_indices]

# Bar plot of feature importance
plt.figure(figsize=(10, 6))
sns.barplot(x=sorted_feature_importance, y=sorted_feature_names)
plt.xlabel('Feature Importance')
plt.ylabel('Features')
plt.title('Random Forest: Feature Importance')
plt.show()

# Heatmap of confusion matrix
plt.figure(figsize=(8, 6))
sns.heatmap(confusion_mat, annot=True, cmap='Blues', fmt='d')
plt.xlabel('Predicted')
plt.ylabel('Actual')
plt.title('Random Forest: Confusion Matrix')
plt.show()
Output:
[Figures: Random Forest feature-importance bar plot; Random Forest confusion-matrix heatmap]
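
A random forest is bagging applied to decision trees (plus random feature selection at each split). For comparison, a minimal sketch of scikit-learn's generic BaggingClassifier, whose default base estimator is a decision tree, on synthetic data (illustrative, not the seed data):

from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split

Xb, yb = make_classification(n_samples=300, random_state=42)
Xb_train, Xb_test, yb_train, yb_test = train_test_split(Xb, yb, test_size=0.2, random_state=42)

# 100 trees, each trained on a bootstrap sample of the training data
bag = BaggingClassifier(n_estimators=100, random_state=42)
bag.fit(Xb_train, yb_train)
print("Bagging test accuracy:", bag.score(Xb_test, yb_test))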
Experiment 6: Study and Implement Naive Bayes.
Abstract:
Code:

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import confusion_matrix, classification_report

# Load the dataset
data = pd.read_csv("Seed_Data.csv")

# Split the dataset into features (X) and the lengthofkernelgroove target variable (y)
X = data.drop("lengthofkernelgroove", axis=1)
y = data["lengthofkernelgroove"]

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create a Gaussian Naive Bayes model
naive_bayes = GaussianNB()

# Train the model
naive_bayes.fit(X_train, y_train)

# Make predictions on the test set
y_pred = naive_bayes.predict(X_test)

# Create a confusion matrix
cm = confusion_matrix(y_test, y_pred)

# Plot the confusion matrix
plt.figure(figsize=(8, 6))
sns.heatmap(cm, annot=True, cmap="Blues", fmt="d", cbar=False)
plt.title("Confusion Matrix")
plt.xlabel("Predicted")
plt.ylabel("Actual")
plt.show()

# Create a classification report
report = classification_report(y_test, y_pred)

# Print the classification report
print("Classification Report:")
print(report)
Output:
[Figure: Naive Bayes confusion-matrix heatmap; classification report printed to console]
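
Gaussian Naive Bayes models each feature per class as a normal distribution and combines them via Bayes' rule; the resulting class posteriors are exposed through predict_proba. A minimal sketch on toy one-dimensional data (illustrative values):

import numpy as np
from sklearn.naive_bayes import GaussianNB

# Two well-separated one-dimensional classes
X = np.array([[1.0], [1.2], [0.8], [3.0], [3.2], [2.8]])
y = np.array([0, 0, 0, 1, 1, 1])

nb = GaussianNB().fit(X, y)
print(nb.predict_proba([[2.0]]))  # posterior P(class | x) for a point between the classes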
Experiment 7: Study and Implement Decision Tree Classification.
Abstract:
Code:

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, plot_tree
from sklearn.metrics import confusion_matrix, classification_report

# Load the dataset
data = pd.read_csv("Seed_Data.csv")

# Split the dataset into features (X) and the lengthofkernelgroove target variable (y)
X = data.drop("lengthofkernelgroove", axis=1)
y = data["lengthofkernelgroove"]

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create a Decision Tree classifier
decision_tree = DecisionTreeClassifier()

# Train the model
decision_tree.fit(X_train, y_train)

# Plot the decision tree
plt.figure(figsize=(12, 8))
plot_tree(decision_tree, feature_names=X.columns, class_names=['0', '1'], filled=True)
plt.title("Decision Tree")
plt.show()

# Make predictions on the test set
y_pred = decision_tree.predict(X_test)

# Create a confusion matrix
cm = confusion_matrix(y_test, y_pred)

# Plot the confusion matrix
plt.figure(figsize=(8, 6))
sns.heatmap(cm, annot=True, cmap="Blues", fmt="d", cbar=False)
plt.title("Confusion Matrix")
plt.xlabel("Predicted")
plt.ylabel("Actual")
plt.show()

# Generate the classification report
report = classification_report(y_test, y_pred)
print("Classification Report:")
print(report)
Output:
[Figures: Decision Tree plot; confusion-matrix heatmap; classification report printed to console]
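
An unconstrained decision tree tends to overfit; limiting max_depth is the usual remedy. A minimal sketch on synthetic data comparing a few depths (the chosen depths are arbitrary):

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

Xd, yd = make_classification(n_samples=300, random_state=42)
Xd_train, Xd_test, yd_train, yd_test = train_test_split(Xd, yd, test_size=0.2, random_state=42)

# max_depth=None lets the tree grow until leaves are pure, which often overfits
for depth in (2, 4, None):
    tree = DecisionTreeClassifier(max_depth=depth, random_state=42).fit(Xd_train, yd_train)
    print("max_depth =", depth, "accuracy =", tree.score(Xd_test, yd_test))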