
Spam Email Classifier Using SVM

Made by: Ahannach Yassine
         EL Garte Mouhcine

1. Introduction
This report presents the development and evaluation of a Spam Email Classifier
based on a Support Vector Machine (SVM). The results include key evaluation
metrics such as precision, recall, and F1-score, along with insights from the
confusion matrix and a time analysis used to assess the model's efficiency and
reliability in classifying spam and non-spam emails.

2. Feature Extraction and Preprocessing


This script processes an email dataset to identify unique words and their
frequency of occurrence. The cleaned and processed data is then saved to a CSV
file, which can be used as a feature set for machine learning tasks like spam
detection.

1. text_cleanup(text):
   o Removes punctuation and stopwords (e.g., "the", "is").
   o Converts words to lowercase for consistency.
   o Output: a list of cleaned words from the input text.
2. extract_unique_words(data_path):
   o Reads the dataset: assumes the emails are in the text column of a CSV file.
   o Processes each email:
      - Cleans the text with text_cleanup.
      - Lemmatizes words (e.g., "running" → "run").
      - Filters out short words (≤ 2 characters) and numbers.
   o Counts occurrences of unique words in a dictionary.
   o Saves the word frequencies as [Link].
3. Key Logic:
   o Lemmatization ensures related word forms are grouped together.
   o Short words and digits are ignored to focus on meaningful tokens.
   o Sorting by frequency helps identify the most important words.
4. Output:
   o A CSV file ([Link]) with two columns:
      - word: the unique word.
      - count: its frequency in the dataset.
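
The report does not reproduce the script itself, so the following is a minimal sketch of the two functions described above, assuming NLTK for stopword removal and lemmatization; the output filename unique_words.csv is a placeholder, since the actual name is not given here.

```python
import csv
import string
from collections import Counter

import pandas as pd
import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer

nltk.download("stopwords", quiet=True)
nltk.download("wordnet", quiet=True)

STOPWORDS = set(stopwords.words("english"))
lemmatizer = WordNetLemmatizer()


def text_cleanup(text):
    """Lowercase the text, strip punctuation, and drop stopwords."""
    text = text.lower().translate(str.maketrans("", "", string.punctuation))
    return [word for word in text.split() if word not in STOPWORDS]


def extract_unique_words(data_path):
    """Count lemmatized words across all emails and save them to a CSV."""
    data = pd.read_csv(data_path)  # emails assumed to be in a 'text' column
    counts = Counter()
    for email in data["text"].astype(str):
        for word in text_cleanup(email):
            lemma = lemmatizer.lemmatize(word)
            # Filter out short words (<= 2 characters) and numbers
            if len(lemma) > 2 and not lemma.isdigit():
                counts[lemma] += 1
    # Sort by frequency so the most frequent words come first
    with open("unique_words.csv", "w", newline="") as f:  # placeholder filename
        writer = csv.writer(f)
        writer.writerow(["word", "count"])
        writer.writerows(counts.most_common())
```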

Important Lines
- Text cleaning: punctuation and stopword removal in text_cleanup.
- Lemmatization and counting: reducing words to their base forms and tallying occurrences.
- Save to CSV: writing the sorted word frequencies to disk.

Role in the Lab


This script extracts features (unique words with frequencies) from email text,
which will later be used as input for machine learning models in spam
classification.

Preprocessing and Feature Extraction


- Processes the email data to generate a feature matrix where:
   o rows represent individual emails;
   o columns represent the frequency of predefined words in each email.
- Labels emails as ham (1) or spam (-1).
- Saves the feature matrix as [Link] for use in machine learning.

Key Components
1. Input Files:
   o data_path: CSV file containing the emails (text column) and their labels (label column, "ham" or "spam").
   o words: list of unique words (from [Link]) used as features.
2. process_emails Function:
   o Initial setup: reads the email dataset and initializes a zero matrix for word counts.
   o Word processing: splits each email into words, lemmatizes each word to its base form, filters out stopwords, punctuation, short words, and numbers, and counts occurrences of each word from the words list.
   o Assign labels: maps "ham" to 1 and "spam" to -1.
   o Save results: writes the word frequencies and labels to [Link].
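
A minimal sketch of process_emails under the same assumptions, reusing text_cleanup and the lemmatizer from the previous sketch; the output filename features.csv is again a placeholder:

```python
import csv

import numpy as np
import pandas as pd

# text_cleanup and lemmatizer as defined in the previous sketch


def process_emails(data_path, words):
    """Build a matrix of word counts per email plus a numeric label column."""
    data = pd.read_csv(data_path)
    word_index = {word: i for i, word in enumerate(words)}
    counts = np.zeros((len(data), len(words)), dtype=int)

    for row, email in enumerate(data["text"].astype(str)):
        for token in text_cleanup(email):
            lemma = lemmatizer.lemmatize(token)
            # Count only words that appear in the predefined feature list
            if len(lemma) > 2 and not lemma.isdigit() and lemma in word_index:
                counts[row, word_index[lemma]] += 1

    # Map "ham" -> 1 and "spam" -> -1
    labels = data["label"].map({"ham": 1, "spam": -1})

    with open("features.csv", "w", newline="") as f:  # placeholder filename
        writer = csv.writer(f)
        writer.writerow(list(words) + ["output"])
        for row in range(len(data)):
            writer.writerow(list(counts[row]) + [labels.iloc[row]])
```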

Key Logic
1. Feature Extraction: updates the count for each word found in the predefined words list.
2. Label Assignment: assigns a numeric label to each email for classification tasks.
3. Save to CSV: combines the word frequencies and label into a single row per email.

Output
- [Link]:
   o columns for each word (from words);
   o an additional output column for the email label (1 for ham, -1 for spam).
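
As an illustration, with three hypothetical feature words the file would look like this (the first data row is a spam email, the second a ham email):

```
free,offer,meeting,output
2,1,0,-1
0,0,3,1
```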

Important Lines
1. Reading Unique Words: ensures feature consistency by using the predefined words list.
2. Filtering and Lemmatization: ensures only meaningful words are included in the count.
3. Writing Results: saves the feature matrix for further analysis.

Role in the Lab


- Converts cleaned email text into the structured feature matrix required for training machine learning models.
- Bridges the gap between raw data and supervised learning tasks.

Support Vector Machine (SVM) Model


Training and Evaluation
- Implements a Support Vector Machine (SVM) model to classify emails as spam or ham.
- Uses the feature matrix generated from the word frequencies ([Link]).
- Trains the model, evaluates it, and saves it for future predictions.

Key Components
1. Input File:
   o [Link]: contains the word frequency features for each email and a label (1 for ham, -1 for spam).
2. Functions:
   o train_svm(X_train, y_train): trains an SVM model using a linear kernel and returns the trained model.
   o save_model(model, vectorizer): saves the trained SVM model and (optionally) a vectorizer to disk using pickle. Why save? It enables reuse without retraining.
   o load_model(): loads the SVM model and vectorizer from disk for predictions.
main() Function
1. Data Preparation:
   o Reads the [Link] file.
   o Splits the data into features (X), the word frequency counts for each email, and labels (y), 1 for ham and -1 for spam.
   o Divides the dataset into training (70%) and testing (30%) subsets.
2. Training: trains the SVM model on the training data.
3. Evaluation: predicts labels for the test data and reports the results.

Output
1. Console Output: a detailed classification report showing model performance.
2. Saved Files:
   o svm_spam_classifier.pkl: the trained SVM model.
   o (Optionally) tfidf_vectorizer.pkl: for text preprocessing (not used here).
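
A sketch of main() under the 70/30 split described above, reusing train_svm and save_model from the previous sketch; the input filename features.csv is a placeholder:

```python
import pandas as pd
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split


def main():
    data = pd.read_csv("features.csv")  # placeholder filename
    X = data.drop(columns=["output"])   # word frequency counts
    y = data["output"]                  # 1 for ham, -1 for spam

    # 70% training / 30% testing split
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=42
    )

    model = train_svm(X_train, y_train)
    y_pred = model.predict(X_test)

    print(classification_report(y_test, y_pred))
    print(confusion_matrix(y_test, y_pred))

    save_model(model)


if __name__ == "__main__":
    main()
```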

Key Lines
1. Splitting the Dataset: ensures a 70% training and 30% testing split.
2. Training the SVM: uses a linear kernel, which is well suited to text classification.
3. Classification Report: provides metrics such as precision, recall, and F1-score.
4. Saving the Model: persists the trained classifier to disk with pickle.

Role in the Lab


- This script performs the final step of the pipeline:
   o builds the machine learning model;
   o evaluates its performance;
   o saves the model for deployment or future predictions.

Model Performance and Evaluation Results


Key Metrics Explained:
1. Precision:
   o Measures how many of the emails classified as spam are actually spam.
   o High precision (up to 0.99) indicates the model is good at avoiding false positives (ham misclassified as spam).
2. Recall:
   o Measures how many of the actual spam emails were correctly classified.
   o High recall (up to 0.96) shows the model effectively captures most spam emails.
3. Trade-off:
   o Precision and recall balance the model's ability to avoid false positives against its ability to avoid false negatives.
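
In terms of true positives (TP), false positives (FP), and false negatives (FN) for the spam class, these metrics follow the standard definitions:

```latex
\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad
\mathrm{Recall} = \frac{TP}{TP + FN}, \qquad
F_1 = 2 \cdot \frac{\mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}
```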

Confusion Matrix:
The table shows:
- Ham (non-spam): correctly classified (Ham-Ham) or misclassified as spam (Ham-Spam).
- Spam: correctly classified (Spam-Spam) or misclassified as non-spam (Spam-Ham).
For instance, in one model:
- Ham correctly classified: 1095 emails.
- Ham misclassified as spam: 16 emails.
- Spam correctly classified: 389 emails.
- Spam misclassified as ham: 52 emails.
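
Plugging these counts into the definitions above for the spam class (TP = 389, FP = 16, FN = 52) gives Precision = 389 / (389 + 16) ≈ 0.96 and Recall = 389 / (389 + 52) ≈ 0.88 for this particular model.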

Performance Trends:
- Precision remains consistently high across all models (0.92-0.99).
- Recall varies between 0.91 and 0.96, indicating some fluctuation in capturing all spam.
- The best results show a precision of 0.99 and a recall of 0.95, with ~190 seconds spent.

Time Analysis:
- Each model spends ~189-280 seconds on training and evaluation.
- Total time for processing and evaluating all models: 5302.99 seconds (~1.5 hours).

Conclusion:
The results demonstrate that the spam classifier performs well, with consistently high precision and recall, making it reliable for distinguishing between spam and ham. The variation in metrics across models suggests room for optimization, balancing precision and recall while reducing processing time.
