from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive
from google.colab import drive
import pandas as pd

# Mount Google Drive
drive.mount('/content/drive')

# Load training and testing data ('train.csv'/'test.csv' are placeholders;
# adjust the file names to your own)
train_data = pd.read_csv('/content/drive/MyDrive/train.csv')
test_data = pd.read_csv('/content/drive/MyDrive/test.csv')

# Display the first few rows of the training data
print(train_data.head())
Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
   Age  Annual Income  Credit Score  Experience  Loan Amount  Loan Duration  \
0   45          39948           617          22        13152             48
1   38          39709           628          15        26045             48
2   47          40724           570          26        17627             36
3   58          69084           545          34        37898             96
4   58          51250           564          39        12741             48

   Number of Dependents  Monthly Debt Payment  Creditcard Utilization Rate  \
0                     2                   183                     0.354418
1                     1                   496                     0.087827
2                     2                   902                     0.137414
3                     1                   755                     0.267587
4                     0                   337                     0.367380

   Number of Open Credit Lines  ...  Total Assets  TotalLiabilities  \
0                            1  ...        146111             19183
1                            5  ...         53204              9595
2                            2  ...         25176            128874
3                            2  ...        104822              5370
4                            6  ...         65624             43894

   MonthlyIncome  UtilityBillsPaymentHistory  JobTenure  NetWorth  \
0    3329.000000                    0.724972         11    126928
1    3309.083333                    0.935132          3     43609
2    3393.666667                    0.872241          6      5205
3    5757.000000                    0.896155          5     99452
4    4270.833333                    0.884275          5     21730

   InterestRate  MonthlyLoanPayment  TotalDebtToIncomeRatio  LoanApproved
0      0.227590          419.805992                0.181077             0
1      0.201077          794.054238                0.389852             0
2      0.212548          666.406688                0.462157             0
3      0.300911         1047.506980                0.313098             0
4      0.205271          391.300352                0.170529             0

[5 rows x 28 columns]
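Before building classifiers, a quick sanity check on the loaded frames can catch loading problems early. This is a minimal sketch, not part of the original run; it only assumes the label sits in the last column:

# Quick sanity checks on shapes, missing values, and class balance
print(train_data.shape, test_data.shape)                  # expect 28 columns each
print(train_data.isna().sum().sum(), "missing values in training data")
print(train_data.iloc[:, -1].value_counts())              # balance of LoanApproved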
Step 2: Binary Classifiers: Original Features

2.1 Linear Discriminant Analysis (LDA)

For LDA, we need to find a suitable projection vector w and classify based on the projections.
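As background (standard LDA theory, not spelled out in the original): the discriminant direction maximizes the between-class scatter relative to the within-class scatter, which gives

w \propto S_W^{-1} (\mu_1 - \mu_0)

where S_W is the within-class scatter matrix and \mu_0, \mu_1 are the class means. sklearn's LinearDiscriminantAnalysis computes this for us. Here's how to implement LDA: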
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA
import matplotlib.pyplot as plt

# Separate features and labels
X_train = train_data.iloc[:, :-1].values  # all rows, all columns except last
y_train = train_data.iloc[:, -1].values   # all rows, last column
X_test = test_data.iloc[:, :-1].values
y_test = test_data.iloc[:, -1].values

# Fit LDA
lda = LDA()
lda.fit(X_train, y_train)

# Project the test data (flatten the (n, 1) projection to 1-D so the
# elementwise comparisons below line up with y_test)
y_test_proj = lda.transform(X_test).ravel()

# Classify based on a threshold
thresholds = np.linspace(-3, 3, 100)
type1_errors = []
type2_errors = []
for threshold in thresholds:
    y_pred = (y_test_proj > threshold).astype(int)
    type1_error = np.sum((y_pred == 0) & (y_test == 1)) / 200  # Denied when Approved
    type2_error = np.sum((y_pred == 1) & (y_test == 0)) / 200  # Approved when Denied
    type1_errors.append(type1_error)
    type2_errors.append(type2_error)
# Plotting Type 1 and Type 2 error rates
plt.plot(thresholds, type1_errors, label='Type 1 Error Rate')
plt.plot(thresholds, type2_errors, label='Type 2 Error Rate')
plt.xlabel('Threshold')
plt.ylabel('Error Rate')
plt.title('Error Rates for LDA with Varying Thresholds')
plt.legend()
plt.grid(True)
plt.show()
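If a single operating point is needed, one natural choice (a sketch, not part of the original run) is the threshold that minimizes the sum of the two error rates from the sweep above:

# Pick the threshold minimizing total (Type 1 + Type 2) error
total_errors = np.array(type1_errors) + np.array(type2_errors)
best_threshold = thresholds[np.argmin(total_errors)]
print(f"Threshold minimizing total error: {best_threshold:.3f}")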
2.2 Decision Tree

Now let's implement a decision tree classifier:
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import confusion_matrix
# Fit Decision Tree
tree_clf = DecisionTreeClassifier(random_state=42)
tree_clf.fit(X_train, y_train)
# Predictions
y_pred_tree = tree_clf.predict(X_test)
# Confusion Matrix
cm_tree = confusion_matrix(y_test, y_pred_tree)
type1_error_tree = cm_tree[1, 0] / 200 # Denied when Approved
type2_error_tree = cm_tree[0, 1] / 200 # Approved when Denied
print(f"Decision Tree - Type 1 Error Rate: {type1_error_tree}, Type 2
Error Rate: {type2_error_tree}")
Decision Tree - Type 1 Error Rate: 0.155, Type 2 Error Rate: 0.175
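As a sanity check on the indexing: sklearn's confusion_matrix puts true labels on the rows and predictions on the columns, so the counts used above can also be read off with ravel(). A minimal sketch:

# For binary labels {0, 1}, ravel() returns tn, fp, fn, tp in that order
tn, fp, fn, tp = cm_tree.ravel()
print(f"False negatives (approved predicted denied): {fn}")  # cm_tree[1, 0]
print(f"False positives (denied predicted approved): {fp}")  # cm_tree[0, 1]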
2.3 k-Nearest Neighbors (kNN)

Next, let's implement a kNN classifier for several values of k:

from sklearn.neighbors import KNeighborsClassifier
k_values = [1, 3, 5, 10]
type1_errors_knn = []
type2_errors_knn = []
for k in k_values:
    knn_clf = KNeighborsClassifier(n_neighbors=k)
    knn_clf.fit(X_train, y_train)
    y_pred_knn = knn_clf.predict(X_test)
    cm_knn = confusion_matrix(y_test, y_pred_knn)
    type1_error_knn = cm_knn[1, 0] / 200  # Denied when Approved
    type2_error_knn = cm_knn[0, 1] / 200  # Approved when Denied
    type1_errors_knn.append(type1_error_knn)
    type2_errors_knn.append(type2_error_knn)
print("kNN Error Rates:")
for k, t1, t2 in zip(k_values, type1_errors_knn, type2_errors_knn):
    print(f"k={k}: Type 1 Error Rate = {t1}, Type 2 Error Rate = {t2}")
kNN Error Rates:
k=1: Type 1 Error Rate = 0.185, Type 2 Error Rate = 0.25
k=3: Type 1 Error Rate = 0.17, Type 2 Error Rate = 0.185
k=5: Type 1 Error Rate = 0.135, Type 2 Error Rate = 0.2
k=10: Type 1 Error Rate = 0.165, Type 2 Error Rate = 0.17
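To choose k without leaning on the test set, cross-validation on the training data is an option. A sketch, assuming 5-fold CV with accuracy as the metric (not part of the original workflow):

from sklearn.model_selection import cross_val_score
# Mean cross-validated accuracy on the training set for each candidate k
for k in k_values:
    scores = cross_val_score(KNeighborsClassifier(n_neighbors=k),
                             X_train, y_train, cv=5)
    print(f"k={k}: mean CV accuracy = {scores.mean():.3f}")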
2.4 Support Vector Machine (SVM)

Finally, let's implement the SVM with a soft margin:
from sklearn.svm import SVC
# Fit SVM
svm_clf = SVC(C=1.0, kernel='rbf', random_state=42) # Use RBF kernel
svm_clf.fit(X_train, y_train)
# Predictions
y_pred_svm = svm_clf.predict(X_test)
# Confusion Matrix
cm_svm = confusion_matrix(y_test, y_pred_svm)
type1_error_svm = cm_svm[1, 0] / 200 # Denied when Approved
type2_error_svm = cm_svm[0, 1] / 200 # Approved when Denied
print(f"SVM - Type 1 Error Rate: {type1_error_svm}, Type 2 Error Rate:
{type2_error_svm}")
SVM - Type 1 Error Rate: 0.135, Type 2 Error Rate: 0.14
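One caveat: RBF-kernel SVMs are sensitive to feature scales, and the features here span very different ranges (incomes in the tens of thousands vs. ratios below 1). A sketch of a standardized variant, offered as an assumption rather than part of the original workflow:

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Standardize features before the RBF SVM; scale-sensitive kernels often benefit
svm_scaled = make_pipeline(StandardScaler(), SVC(C=1.0, kernel='rbf', random_state=42))
svm_scaled.fit(X_train, y_train)
y_pred_scaled = svm_scaled.predict(X_test)
print(confusion_matrix(y_test, y_pred_scaled))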
Step 3: Binary Classifiers: PCA Features

Now, let's apply PCA and use it to train the kNN and SVM classifiers:
from sklearn.decomposition import PCA
# Apply PCA
pca = PCA(n_components=5) # Change number of components as needed
X_train_pca = pca.fit_transform(X_train)
X_test_pca = pca.transform(X_test)
# kNN with PCA
knn_clf_pca = KNeighborsClassifier(n_neighbors=5)
knn_clf_pca.fit(X_train_pca, y_train)
y_pred_knn_pca = knn_clf_pca.predict(X_test_pca)
cm_knn_pca = confusion_matrix(y_test, y_pred_knn_pca)
type1_error_knn_pca = cm_knn_pca[1, 0] / 200 # Denied when Approved
type2_error_knn_pca = cm_knn_pca[0, 1] / 200 # Approved when Denied
print(f"kNN with PCA - Type 1 Error Rate: {type1_error_knn_pca}, Type
2 Error Rate: {type2_error_knn_pca}")
# SVM with PCA
svm_clf_pca = SVC(C=1.0, kernel='rbf', random_state=42)
svm_clf_pca.fit(X_train_pca, y_train)
y_pred_svm_pca = svm_clf_pca.predict(X_test_pca)
cm_svm_pca = confusion_matrix(y_test, y_pred_svm_pca)
type1_error_svm_pca = cm_svm_pca[1, 0] / 200 # Denied when Approved
type2_error_svm_pca = cm_svm_pca[0, 1] / 200 # Approved when Denied
print(f"SVM with PCA - Type 1 Error Rate: {type1_error_svm_pca}, Type
2 Error Rate: {type2_error_svm_pca}")
kNN with PCA - Type 1 Error Rate: 0.125, Type 2 Error Rate: 0.19
SVM with PCA - Type 1 Error Rate: 0.125, Type 2 Error Rate: 0.14
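The choice of 5 components above is somewhat arbitrary; inspecting the explained variance ratio can make it principled. A minimal sketch using the pca object fitted above:

# Fraction of variance captured by each of the 5 retained components
print(pca.explained_variance_ratio_)
print(f"Total variance explained: {pca.explained_variance_ratio_.sum():.3f}")

Note that, like the RBF SVM, unscaled PCA is dominated by the largest-scale features (assets, income), so standardizing before PCA is worth trying as well.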
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# Assuming these are the error rates you obtained:
results = {
    'Classifier': ['Decision Tree', 'kNN (k=1)', 'kNN (k=3)', 'kNN (k=5)',
                   'kNN (k=10)', 'SVM', 'kNN with PCA', 'SVM with PCA'],
    'Type 1 Error Rate': [0.155, 0.185, 0.17, 0.135, 0.165, 0.135, 0.125, 0.125],
    'Type 2 Error Rate': [0.175, 0.25, 0.185, 0.2, 0.17, 0.14, 0.19, 0.14]
}
# Create a DataFrame
error_df = pd.DataFrame(results)
# Display the error rates
print("Error Rate Summary:")
print(error_df)
# Plotting error rates
fig, ax = plt.subplots(figsize=(12, 6))
# Bar width
bar_width = 0.35
# Index for bar positions
index = np.arange(len(error_df))
# Plot Type 1 Error Rates
ax.bar(index, error_df['Type 1 Error Rate'], bar_width,
       label='Type 1 Error Rate', alpha=0.7)
# Plot Type 2 Error Rates
ax.bar(index + bar_width, error_df['Type 2 Error Rate'], bar_width,
       label='Type 2 Error Rate', alpha=0.5)
ax.set_ylabel('Error Rate')
ax.set_title('Error Rates for Different Classifiers')
ax.set_xticks(index + bar_width / 2)
ax.set_xticklabels(error_df['Classifier'])
ax.legend()
ax.grid(True)
plt.tight_layout()
plt.show()
# Discuss PCA Effects
pca_effects = """
Discussion on PCA Effects:
1. kNN with PCA showed a decrease in Type 1 error rate (from 0.135 for kNN
   with k=5 to 0.125), indicating improved performance in avoiding false
   negatives.
2. Its Type 2 error rate was essentially unchanged (0.20 to 0.19), suggesting
   that while the model became better at detecting approved applications, it
   maintained a similar rate of false positives.
3. SVM with PCA had a similar outcome, improving its Type 1 error rate (0.135
   to 0.125) while keeping its Type 2 error rate constant at 0.14.
4. Overall, PCA appears to aid in reducing errors for both kNN and SVM
   classifiers by concentrating the signal into a few high-variance directions.
"""
print(pca_effects)
Error Rate Summary:
      Classifier  Type 1 Error Rate  Type 2 Error Rate
0  Decision Tree              0.155              0.175
1      kNN (k=1)              0.185              0.250
2      kNN (k=3)              0.170              0.185
3      kNN (k=5)              0.135              0.200
4     kNN (k=10)              0.165              0.170
5            SVM              0.135              0.140
6   kNN with PCA              0.125              0.190
7   SVM with PCA              0.125              0.140
Discussion on PCA Effects:
1. kNN with PCA showed a decrease in Type 1 error rate (from 0.135 for kNN
   with k=5 to 0.125), indicating improved performance in avoiding false
   negatives.
2. Its Type 2 error rate was essentially unchanged (0.20 to 0.19), suggesting
   that while the model became better at detecting approved applications, it
   maintained a similar rate of false positives.
3. SVM with PCA had a similar outcome, improving its Type 1 error rate (0.135
   to 0.125) while keeping its Type 2 error rate constant at 0.14.
4. Overall, PCA appears to aid in reducing errors for both kNN and SVM
   classifiers by concentrating the signal into a few high-variance directions.
import tarfile
import os
# Define the name of the tar file (saved in your Google Drive)
submission_file = '/content/drive/MyDrive/project_submission.tar.gz'

# Create a tar file for submission
with tarfile.open(submission_file, 'w:gz') as tar:
    # Add your main script
    tar.add('/content/drive/MyDrive/your_script.py')  # Replace with your actual script name
    # Add the training and testing data if needed (adjust file names to your own)
    tar.add('/content/drive/MyDrive/train.csv')
    tar.add('/content/drive/MyDrive/test.csv')

# Check the contents of the tar file
with tarfile.open(submission_file, 'r:gz') as tar:
    print("Contents of the tar file:")
    tar.list()  # Print the contents of the tar file

print(f"Submission file '{submission_file}' created successfully.")
Contents of the tar file:
?rw------- root/root     509 2024-10-26 [Link] content/drive/MyDrive/your_script.py
?rw------- root/root  143548 2024-10-26 [Link] content/drive/MyDrive/train.csv
?rw------- root/root   64023 2024-10-26 [Link] content/drive/MyDrive/test.csv
Submission file '/content/drive/MyDrive/project_submission.tar.gz' created successfully.