Detection of Spyware by Mining Executable Files

The document describes a method for detecting spyware by mining executable files. Features are extracted from binary files and reduced, and machine learning algorithms such as Naive Bayes and SVM are then trained on the reduced feature sets to classify files as spyware or legitimate software.

Uploaded by swamishailu


Objectives
The main objective of our project is to establish a method for spyware detection
based on data mining techniques. These techniques are commonly used for information
retrieval and classification. We apply them with only one change: the input consists
of computer programs rather than text documents.
In this project, binary features are extracted from executable files. A feature
reduction method is then used to obtain a subset of the data, which serves as a
training set for automatically generating classifiers. The generated classifiers
are used to classify new, previously unseen binaries as either legitimate
software or spyware. We will choose a value of n (the n-gram size) that yields high
performance and a machine learning algorithm that produces high accuracy.

Project idea
The goal of the project is to detect spyware by using data mining and machine
learning. We use the Waikato Environment for Knowledge Analysis (WEKA) to perform
the experiments. WEKA is a suite of machine learning algorithms and analysis tools
that is used in practice for solving data mining problems. First, we extract features
from the binary files. We then apply a feature reduction method to reduce the
complexity of the data set. Finally, we convert the reduced feature set into the
Attribute Relation File Format (ARFF). ARFF files are ASCII text files that contain
a set of data instances, each described by a set of features. Figure 2.1 shows the
steps involved in our proposed method.

Figure 2.1: Proposed System

We organized our work into the following stages:

1. Data Collection
2. Byte Sequence Generation
3. N-gram Generation
4. Feature Extraction
5. Feature Reduction
6. ARFF Generation
7. Model Training

Step 1: Data Collection

Our data set consists of two classes of binary files:

(1) Benign files
(2) Spyware files.

Step 2: Byte Sequence Generation

This step converts each binary file in both classes into a byte sequence.
We use xxd, a UNIX utility that produces a hexadecimal dump of a file, for the
conversion.
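
As a rough illustration of this step, the following Python sketch produces the same kind of plain hexadecimal byte sequence that `xxd -p` prints, only without line wrapping. The function names `to_byte_sequence` and `file_to_byte_sequence` and the sample bytes are assumptions made for this example; they are not part of the original project.

```python
from pathlib import Path


def to_byte_sequence(data: bytes) -> str:
    """Return the bytes as one continuous lowercase hex string."""
    return data.hex()


def file_to_byte_sequence(path: str) -> str:
    """Read a binary file and return its hex byte sequence."""
    return to_byte_sequence(Path(path).read_bytes())


# Hypothetical sample: four bytes as they might appear at the start of a binary.
sample = b"MZ\x90\x00"
print(to_byte_sequence(sample))  # 4d5a9000
```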

Step 3: N-gram Generation

This step splits the byte sequences into n-grams of the desired size n (namely 4, 5
and 6). An n-gram is a sequence of n elements. The step also ensures that each
line of output contains exactly one n-gram, so the length of a line equals n.
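
The n-gram step could be sketched as follows. The text does not say whether consecutive n-grams overlap; this sketch assumes a sliding window that advances one byte at a time, which is a common choice but an assumption here. The name `byte_ngrams` is hypothetical.

```python
def byte_ngrams(data: bytes, n: int) -> list[str]:
    """Return every run of n consecutive bytes as a hex string."""
    if n <= 0:
        raise ValueError("n must be positive")
    # A window of n bytes slides over the data one byte at a time.
    return [data[i:i + n].hex() for i in range(len(data) - n + 1)]


sample = bytes([0x4D, 0x5A, 0x90, 0x00, 0x03])
for gram in byte_ngrams(sample, 4):
    print(gram)  # one n-gram per line
```

With n = 4 the five sample bytes yield two n-grams; a file shorter than n bytes yields none.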

Step 4: Feature Extraction

We extract the features by using two different approaches: Common Feature Based
Extraction (CFBE) and Frequency Based Feature Extraction (FBFE). Both methods are
used to obtain Reduced Feature Sets (RFSs) which are then used to generate the Attribute
Relation File Format (ARFF) files.
1. Frequency Based Feature Extraction (FBFE):
In FBFE, the frequency of each n-gram in each class is calculated.
2. Common Feature Based Extraction (CFBE):
In CFBE, the common n-grams are extracted from each class.
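
A minimal sketch of the two extraction approaches for a single class is given below. It assumes that FBFE counts occurrences over all files of the class and that CFBE keeps each distinct n-gram of the class once; both are one reading of the description above, and the function names are hypothetical.

```python
from collections import Counter


def fbfe_counts(ngrams_per_file: list[list[str]]) -> Counter:
    """FBFE sketch: total occurrences of each n-gram across one class."""
    counts = Counter()
    for grams in ngrams_per_file:
        counts.update(grams)
    return counts


def cfbe_features(ngrams_per_file: list[list[str]]) -> set[str]:
    """CFBE sketch (assumed reading): each distinct n-gram of the class, once."""
    return {gram for grams in ngrams_per_file for gram in grams}


# Hypothetical n-gram lists for two files of the same class.
spyware = [["aabbccdd", "aabbccdd", "11223344"], ["aabbccdd", "55667788"]]
print(fbfe_counts(spyware)["aabbccdd"])  # 3
print(sorted(cfbe_features(spyware)))
```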

Step 5: Feature Reduction

In FBFE, all n-grams whose frequency lies within a specified range (50-500) are
kept and those with lower frequencies (1-49) are discarded. In CFBE, only one
representation of each feature is considered per class. To obtain the Reduced
Feature Sets (RFSs) for CFBE and FBFE, the unique n-grams of both classes are
merged.
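
The reduction and merge can be sketched as below. The sketch assumes the 50-500 range is inclusive at both ends and that n-grams above 500 are dropped as well; the text only states the range, so treat both points as assumptions. All names and sample counts are hypothetical.

```python
from collections import Counter


def reduce_fbfe(counts: Counter, low: int = 50, high: int = 500) -> set[str]:
    """Keep n-grams whose class frequency lies in [low, high]."""
    return {gram for gram, c in counts.items() if low <= c <= high}


def merge_classes(benign: set[str], spyware: set[str]) -> list[str]:
    """Merge the unique n-grams of both classes into one sorted feature list."""
    return sorted(benign | spyware)


# Hypothetical per-class frequencies.
benign_counts = Counter({"aabbccdd": 120, "11223344": 7})
spyware_counts = Counter({"aabbccdd": 60, "55667788": 300, "99aabbcc": 501})
rfs = merge_classes(reduce_fbfe(benign_counts), reduce_fbfe(spyware_counts))
print(rfs)  # ['55667788', 'aabbccdd']
```

For CFBE the same `merge_classes` call would apply directly to the per-class sets of distinct n-grams.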

Step 6: ARFF Generation (Data Set Generation)

This step generates two ARFF databases: a frequency based feature database
and a common feature based database. All attributes in the databases are treated
as Boolean attributes. The ARFF generation process searches for every n-gram in
all byte sequences of a class and assigns the corresponding attribute a value of
1 if the n-gram is present and 0 if it is not.
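
To make the file layout concrete, here is a hedged sketch that writes such a data set as ARFF text. ARFF has no dedicated Boolean type, so the sketch declares each n-gram as a nominal attribute with the values {0,1}; the relation name, the `ng_` attribute prefix, the class labels and the function name `build_arff` are all assumptions for illustration.

```python
def build_arff(relation, features, instances):
    """Build ARFF text with one 0/1 attribute per n-gram plus a class label.

    instances: list of (set_of_ngrams_in_file, class_label) pairs.
    """
    lines = [f"@relation {relation}", ""]
    for gram in features:
        # The ng_ prefix is a stylistic choice to keep names clearly non-numeric.
        lines.append(f"@attribute ng_{gram} {{0,1}}")
    lines.append("@attribute class {benign,spyware}")
    lines += ["", "@data"]
    for grams, label in instances:
        # 1 if the n-gram is present in the file, 0 otherwise.
        row = ["1" if gram in grams else "0" for gram in features]
        lines.append(",".join(row + [label]))
    return "\n".join(lines) + "\n"


features = ["55667788", "aabbccdd"]
instances = [({"aabbccdd"}, "benign"), ({"55667788", "aabbccdd"}, "spyware")]
print(build_arff("spyware_ngrams", features, instances))
```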

Step 7: Model Training

The ARFF file is used as input to WEKA for applying machine learning
algorithms. The algorithms used in the experiment are: ZeroR, Naive Bayes, SVM
(Support Vector Machines), J48, Random Forest and JRip.
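
Of the listed algorithms, ZeroR is the simplest: it ignores all attributes and always predicts the majority class, which makes it a useful baseline for the other classifiers. The following sketch shows that idea in plain Python. It is not WEKA's implementation, and the labels and function names are hypothetical.

```python
from collections import Counter


def zero_r_fit(labels: list[str]) -> str:
    """Return the majority class, which ZeroR predicts for every instance."""
    return Counter(labels).most_common(1)[0][0]


def accuracy(predicted: list[str], actual: list[str]) -> float:
    """Fraction of predictions that match the actual labels."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Hypothetical training and test labels.
train_labels = ["benign", "benign", "benign", "spyware"]
test_labels = ["benign", "spyware", "spyware", "benign"]
majority = zero_r_fit(train_labels)
predictions = [majority] * len(test_labels)
print(majority, accuracy(predictions, test_labels))  # benign 0.5
```

Any classifier that cannot beat this baseline on the same data has learned nothing useful from the n-gram features.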

Hardware Requirements

Pentium processor, 1.6 GHz or faster

RAM, 128 MB or more

HDD, 40 GB or more.

Software Requirements

Platform: Linux OS

Language: Java

Editor: gedit

WEKA (Machine Learning Tool)
