Spam Detection System Using ML Techniques

The document describes a project to design and develop a spam detection system for emails using classifiers. It involves preparing data from a publicly available email dataset, training classifiers like Naive Bayes and SVM on the data, validating the classifiers on separate data, and testing the classifiers on new emails to classify them as spam or clean. The results show that Support Vector Machine and Naive Bayes Multinomial classifiers performed better in terms of precision and recall compared to other classifiers. The system provides a way to filter spam emails and can be enhanced further.

SPAM DETECTION AND FILTERING
By Prasanna Kunchavaram
Introduction

PROBLEM:
Unsolicited commercial email, commonly known as spam, is a pressing problem on the Internet.

It undermines the usability of the email system and consumes storage space, which delays system response.

GOAL :
The goal of the project is to design and develop a spam detection system for emails using classifiers such as Naive Bayes, Naive Bayes Multinomial, K-Nearest Neighbour (KNN) and Support Vector Machine (SVM).
Data Preparation

Publicly available email message data from the lingspam data set is used in the project.

The data is downloaded from
http://www.aueb.gr/users/ion/data/lingspam_public.tar.gz

The downloaded data contains a series of folders, each holding both clean and spam messages. As part of data preparation, it needs to be restructured into separate clean and spam folders for training and validation.

A special pruning method is designed and implemented to restructure the data for use in the project.
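The restructuring step could look roughly like the following. This is only a sketch under stated assumptions, not the pruning method used in the project: it assumes messages are `.txt` files, that spam messages can be recognised by a file-name prefix (`spmsg` here), and that the destination folder lies outside the source folder. The function name `restructure` and its parameters are hypothetical.

```python
import shutil
from pathlib import Path


def restructure(src_root: Path, dst_root: Path, spam_prefix: str = "spmsg") -> dict:
    """Copy every message under src_root into dst_root/clean or dst_root/spam."""
    counts = {"clean": 0, "spam": 0}
    for label in counts:
        (dst_root / label).mkdir(parents=True, exist_ok=True)

    # sorted() collects all source paths before any file is copied.
    for path in sorted(src_root.rglob("*.txt")):
        # Assumption: spam messages are identified by their file-name prefix.
        label = "spam" if path.name.startswith(spam_prefix) else "clean"
        # Prefix the parent folder name so files from different parts cannot collide.
        target = dst_root / label / f"{path.parent.name}_{path.name}"
        shutil.copy2(path, target)
        counts[label] += 1
    return counts
```

The returned counts give a quick sanity check that no message was lost during restructuring.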
Steps involved in Spam Filtering

1) Training: The system is trained with two-class data (clean and spam).

2) Validation: The classifier is validated on structured data of known classes, checking how the system classifies it in order to measure its performance.

3) Testing: The classifier is tested by classifying raw email messages into two classes, 'clean' (copied into the Inbox folder) and 'spam' (copied into the Spam folder).
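The three steps map onto the ten parts of the lingspam data set as described on the Training, Validation and Testing slides (Parts 1 to 7, Parts 8 and 9, and Part 10). A minimal sketch of that assignment follows; the function name `phase_for_part` is hypothetical.

```python
def phase_for_part(part_number: int) -> str:
    """Return the phase a lingspam part is used for in this project."""
    if not 1 <= part_number <= 10:
        raise ValueError("lingspam parts are numbered 1 to 10")
    if part_number <= 7:
        return "training"
    if part_number <= 9:
        return "validation"
    return "testing"


# Group the part numbers by phase.
phases = {}
for part in range(1, 11):
    phases.setdefault(phase_for_part(part), []).append(part)
```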
Main Screen
Training

In this phase, we train the system with data of known classes.
I am using Parts 1 to 7 of the lingspam data set as training data.
The input to the training step is the path to the structured data, which is already classified into two classes.
The system then builds the model using one of the following algorithms:
Naive Bayes
Naive Bayes Multinomial
K-Nearest Neighbour
Support Vector Machine
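To illustrate what building a model means for one of these algorithms, here is a small standalone sketch of a multinomial Naive Bayes classifier with Laplace smoothing. It is not the WEKA implementation used in the project. It assumes each message is already tokenised into a list of words, and the names `train_multinomial_nb`, `classify` and `alpha` are chosen for illustration only.

```python
import math
from collections import Counter


def train_multinomial_nb(documents, labels, alpha=1.0):
    """Build class priors and smoothed word log-likelihoods from tokenised documents."""
    classes = sorted(set(labels))
    vocabulary = {word for doc in documents for word in doc}
    word_counts = {c: Counter() for c in classes}
    doc_counts = Counter(labels)
    for doc, label in zip(documents, labels):
        word_counts[label].update(doc)

    model = {"log_prior": {}, "log_likelihood": {}}
    for c in classes:
        model["log_prior"][c] = math.log(doc_counts[c] / len(labels))
        # Laplace smoothing: add alpha to every word count in the vocabulary.
        total = sum(word_counts[c].values()) + alpha * len(vocabulary)
        model["log_likelihood"][c] = {
            w: math.log((word_counts[c][w] + alpha) / total) for w in vocabulary
        }
    return model


def classify(model, doc):
    """Return the class with the highest log-posterior score for a tokenised document."""
    scores = {}
    for c, prior in model["log_prior"].items():
        likelihood = model["log_likelihood"][c]
        # Words never seen during training are skipped.
        scores[c] = prior + sum(likelihood[w] for w in doc if w in likelihood)
    return max(scores, key=scores.get)
```

Working in log space avoids numerical underflow when many small word probabilities are multiplied together.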
Training Screen
Validation

In this step, we validate the model built during the training phase.
I am using Parts 8 and 9 of the lingspam data set for validation.
The input to validation is a data set of known classes.
The system then classifies the data into classes using the model.
This phase checks how well the system classifies the data by measuring its performance (for example, Precision and Recall), so that the algorithms can be compared and the best-performing method selected.
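Precision and Recall can be computed from the known and predicted classes as sketched below, treating 'spam' as the positive class. Precision is TP / (TP + FP) and Recall is TP / (TP + FN). The function name `precision_recall` and the choice to return 0.0 when a denominator is zero are assumptions made for this example.

```python
def precision_recall(actual, predicted, positive="spam"):
    """Return (precision, recall) for the positive class."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    fp = sum(1 for a, p in pairs if a != positive and p == positive)
    fn = sum(1 for a, p in pairs if a == positive and p != positive)
    # Guard against division by zero when nothing is predicted or labelled positive.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

For a spam filter, low precision means clean mail is being sent to the Spam folder, while low recall means spam is reaching the Inbox.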
Validation Screen
Testing/Classification

In this step, we use the classifier to classify new email messages.
I am using Part 10 of the lingspam data set as input to test the model.
The system then classifies the given data set into two classes, 'spam' and 'clean'.
Each classified file is then copied into the Inbox folder if it is classified as clean, or into the Spam folder otherwise.
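The copying step might be sketched as follows. This is an illustration rather than the project's code: `route_messages` is a hypothetical name, and `classify` stands for any function that takes the message text and returns 'clean' or 'spam'.

```python
import shutil
from pathlib import Path


def route_messages(message_paths, classify, inbox_dir, spam_dir):
    """Copy each message into inbox_dir or spam_dir according to classify(text)."""
    inbox_dir, spam_dir = Path(inbox_dir), Path(spam_dir)
    inbox_dir.mkdir(parents=True, exist_ok=True)
    spam_dir.mkdir(parents=True, exist_ok=True)

    routed = {}
    for path in map(Path, message_paths):
        # Undecodable bytes are replaced so that one bad message cannot stop the run.
        text = path.read_text(encoding="utf-8", errors="replace")
        # Anything not classified as clean goes to the Spam folder.
        target_dir = inbox_dir if classify(text) == "clean" else spam_dir
        shutil.copy2(path, target_dir / path.name)
        routed[path.name] = target_dir.name
    return routed
```

Copying rather than moving leaves the original test data untouched, so the same messages can be reclassified with a different algorithm.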
Test Screen
Results

[Figure: Precision and Recall of NB, NB_Multinomial, KNN and SVM on Dataset1 to Dataset5]
Results (continued)
[Figure: Correctly classified (TP+TN) and wrongly classified (FP+FN) messages for NB, NB_Multinomial, KNN and SVM on Dataset1 to Dataset5]
Conclusion

The spam filter system is implemented using the WEKA Java API, with four algorithms.

Based on the results, the Support Vector Machine and Naive Bayes Multinomial classifiers appear to perform better at classification than the others.

The system can be further enhanced by adding the ability to read web pages, so that web browsing can be made a spam-free experience by avoiding advertisements and unwanted malicious websites.
DEMO
THANK YOU
