P2) Code Email Spam Detection

10/15/24, 5:28 PM Email Spam Classification

In [14]: import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score
from sklearn.neighbors import KNeighborsClassifier

In [15]: df = pd.read_csv("./emails.csv")

In [16]: df.head()

Out[16]:
   Email No.  the  to  ect  and  for  of    a  you  hou  ...  connevey  jay  valued  lay  infrastructure
0    Email 1    0   0    1    0    0   0    2    0    0  ...         0    0       0    0               0
1    Email 2    8  13   24    6    6   2  102    1   27  ...         0    0       0    0               0
2    Email 3    0   0    1    0    0   0    8    0    0  ...         0    0       0    0               0
3    Email 4    0   5   22    0    5   1   51    2   10  ...         0    0       0    0               0
4    Email 5    7   6   17    1    5   2   57    0    9  ...         0    0       0    0               0

5 rows × 3002 columns

In [17]: df.isnull().sum()

Out[17]:
Email No. 0
the 0
to 0
ect 0
and 0
..
military 0
allowing 0
ff 0
dry 0
Prediction 0
Length: 3002, dtype: int64
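
The per-column listing above is truncated in the display, so it can be easier to reduce the check to a single number. The following is a minimal sketch of that idea; `toy` is a hypothetical stand-in for the notebook's `df` (the real data comes from emails.csv), and the variable names are illustrative only.

```python
import pandas as pd

# Hypothetical stand-in for the notebook's `df`; not the real emails.csv data.
toy = pd.DataFrame({
    "Email No.": ["Email 1", "Email 2", "Email 3"],
    "the": [0, 8, 0],
    "to": [0, 13, 0],
    "Prediction": [0, 0, 1],
})

# Total number of missing cells across the whole frame.
total_missing = int(toy.isnull().sum().sum())

# Names of the columns that contain at least one missing value (empty if none).
cols_with_missing = toy.columns[toy.isnull().any()].tolist()

print(total_missing, cols_with_missing)
```

On the real frame the same two expressions would be applied to `df` instead of `toy`.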

In [18]: X = df.iloc[:,1:3001]
X

localhost:8888/nbconvert/html/OneDrive/Desktop/BE/Final Yr/SPPU-CSE-SEM7-Codes-main/ML/2. Email Spam Classification/Email Spam Classificati… 1/3



Out[18]:
      the  to  ect  and  for  of    a  you  hou  in  ...  enhancements  connevey  jay  valued  lay
0       0   0    1    0    0   0    2    0    0   0  ...             0         0    0       0    0
1       8  13   24    6    6   2  102    1   27  18  ...             0         0    0       0    0
2       0   0    1    0    0   0    8    0    0   4  ...             0         0    0       0    0
3       0   5   22    0    5   1   51    2   10   1  ...             0         0    0       0    0
4       7   6   17    1    5   2   57    0    9   3  ...             0         0    0       0    0
...   ...  ..  ...  ...  ...  ..  ...  ...  ...  ..  ...           ...       ...  ...     ...  ...
5167    2   2    2    3    0   0   32    0    0   5  ...             0         0    0       0    0
5168   35  27   11    2    6   5  151    4    3  23  ...             0         0    0       0    0
5169    0   0    1    1    0   0   11    0    0   1  ...             0         0    0       0    0
5170    2   7    1    0    2   1   28    2    0   8  ...             0         0    0       0    0
5171   22  24    5    1    6   5  148    8    2  23  ...             0         0    0       0    0


5172 rows × 3000 columns

In [19]: Y = df.iloc[:,-1].values
Y

Out[19]: array([0, 0, 0, ..., 1, 1, 0], dtype=int64)

In [20]: train_x,test_x,train_y,test_y = train_test_split(X,Y,test_size = 0.25)

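In [21]: svc = SVC(C=1.0, kernel='rbf', gamma='auto')
# C is the regularization parameter: regularization strength is inversely
# proportional to C, and the penalty is a squared L2 penalty.
# As C increases, regularization weakens and the model tends to overfit.
# kernel='rbf' selects the radial basis function kernel.
# gamma is the kernel coefficient (used by the 'rbf', 'poly' and 'sigmoid' kernels);
# gamma='auto' sets it to 1 / n_features. As gamma increases, the model tends to overfit.
svc.fit(train_x, train_y)
y_pred2 = svc.predict(test_x)
# accuracy_score expects (y_true, y_pred)
print("Accuracy Score for SVC : ", accuracy_score(test_y, y_pred2))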

Accuracy Score for SVC : 0.8979118329466357
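
The comments in the cell above say that larger gamma pushes the model toward overfitting. One way to see this is to compare training and test accuracy for a few gamma values. The sketch below does that on synthetic data from `make_classification`, not on emails.csv; the sample sizes, the gamma grid, and all `*_demo` names are assumptions chosen for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in data (assumption): 400 samples, 20 features.
X_demo, y_demo = make_classification(
    n_samples=400, n_features=20, n_informative=5, random_state=0
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=0, stratify=y_demo
)

# Fit one RBF SVC per gamma value and record (train accuracy, test accuracy).
results = {}
for gamma in (0.001, 0.05, 10.0):
    model = SVC(C=1.0, kernel="rbf", gamma=gamma).fit(X_tr, y_tr)
    results[gamma] = (model.score(X_tr, y_tr), model.score(X_te, y_te))

for gamma, (train_acc, test_acc) in results.items():
    print(f"gamma={gamma}: train={train_acc:.3f} test={test_acc:.3f}")
```

A large gap between training and test accuracy at a given gamma is the usual sign of overfitting; the same loop could be run over values of C.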

In [22]: X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size = 0.2, random_stat

In [23]: knn = KNeighborsClassifier(n_neighbors=7)

In [24]: knn.fit(X_train, y_train)

Out[24]: KNeighborsClassifier(n_neighbors=7)

In [25]: print(knn.predict(X_test))

[0 0 1 ... 0 1 0]


In [26]: print(knn.score(X_test, y_test))

0.8685990338164251
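
The SVC and KNN scores in this notebook come from two different splits (test_size = 0.25 for the SVC, 0.2 for KNN), so they are not measured on the same test emails. A rough sketch of a like-for-like comparison is shown below: both models are fitted and scored on one shared, seeded, stratified split. It uses synthetic data from `make_classification` rather than emails.csv, and the seed value and `*_demo` names are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Synthetic stand-in data (assumption): 500 samples, 30 features.
X_demo, y_demo = make_classification(
    n_samples=500, n_features=30, n_informative=8, random_state=42
)

# One shared split, seeded for reproducibility and stratified on the label.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

# Same hyperparameters as the notebook's two models.
models = {
    "SVC": SVC(C=1.0, kernel="rbf", gamma="auto"),
    "KNN": KNeighborsClassifier(n_neighbors=7),
}

# Fit each model on the same training rows and score on the same test rows.
scores = {}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    scores[name] = accuracy_score(y_te, model.predict(X_te))

print(scores)
```

With the notebook's data, `X_demo` and `y_demo` would be replaced by `X` and `Y`.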
