
Development of Intrusion Detection Systems Using-1

It will be useful to students, lecturers and researchers in the field of computer science and electrical electronics engineering.

Uploaded by

ibrahim Paramole


DEVELOPMENT OF INTRUSION DETECTION SYSTEMS

USING MACHINE LEARNING

ODUNJO DORCAS DAMILOLA

NOU183033484

A PROJECT SUBMITTED TO THE DEPARTMENT OF

COMPUTER SCIENCE

FACULTY OF SCIENCE AND TECHNOLOGY

NATIONAL OPEN UNIVERSITY OF NIGERIA (NOUN)

IN PARTIAL FULFILMENT OF THE REQUIREMENTS

FOR

THE AWARD OF A BACHELOR OF SCIENCE DEGREE IN

COMPUTER SCIENCE

CERTIFICATION

I hereby certify that this project contains work carried out by Odunjo Dorcas Damilola (matriculation number NOU183033484) of the Department of Computer Science, who has satisfactorily completed the requirements for the award of a Bachelor of Science Degree in Computer Science, Faculty of Science and Technology, National Open University of Nigeria (NOUN).

..................................... .....................................

DR AYODELE OLOYEDE DATE

(Project Supervisor)

..................................... .....................................

DR (MRS) AKUJOBI A. T. DATE

(Study Center Director)

..................................... .....................................

PROF. SAHEED AJIBOLA DATE

(Dean, Faculty of Sciences)

DEDICATION

This research project is dedicated to Almighty God for His grace, divine wisdom, knowledge and understanding given to me to complete this project. I appreciate Him for His mercy and the divine provision I enjoyed throughout my program.
ACKNOWLEDGEMENT

I want to use this opportunity to thank the Almighty God for the success of this work and for the strength and wisdom bestowed on me from the beginning to the end. My appreciation goes to my supervisor, Dr Ayodele Oloyede, for his relentless efforts on this work and for his suggestions, encouragement, guidance and assistance where necessary. I extend my profound gratitude to all staff of the National Open University of Nigeria (NOUN), Lagos Study Centre, Victoria Island, Lagos, for their contributions.
ABSTRACT

Intrusion Detection Systems (IDS) are important tools used to monitor network traffic and detect unauthorized access or cyberattacks. Many older IDS methods find it difficult to detect new or unknown threats, which can leave systems at risk. This project explores how a machine learning algorithm called Two-Class Decision Forest, available in Microsoft Azure Machine Learning Studio, can be used to improve intrusion detection.

The goal of the project was to build a system that can accurately identify whether a network activity is normal or harmful. I used a popular dataset called NSL-KDD, which contains labeled examples of both normal and attack traffic. The data was cleaned and prepared before training the model.

After the model was trained in Azure ML Studio, it was tested on unseen data to check how well it could predict network intrusions. I also created a simple web-based interface that connects to the model, allowing users to test the system by entering sample values. The results showed that the Two-Class Decision Forest algorithm could detect intrusions effectively with good accuracy.

This project shows how machine learning tools in Azure can be used to build useful security solutions, and it gives a practical example of how such systems can be tested and used in real time.
TABLE OF CONTENTS

CERTIFICATION
DEDICATION
ACKNOWLEDGEMENT
ABSTRACT
CHAPTER ONE
INTRODUCTION
1.1 Background to the Study
1.2 Statement of the Problem
1.3 Aim and Objectives of the Study
1.4 Scope of the Study
1.5 Significance of the Study
1.6 Operational Definition of Terms
CHAPTER TWO
LITERATURE REVIEW
2.1 Overview
2.2 Conceptual Framework
2.3 Empirical Review
2.4 Theoretical Framework
2.5 Summary of Reviewed Related Works and Knowledge Gap
CHAPTER THREE
METHODOLOGY
3.1 Research Approach
i. Data Collection
ii. Data Preprocessing
iii. Model Development/Training: Decision Forests
iv. Evaluation Metrics
v. Analysis and Interpretation of Results
CHAPTER ONE

INTRODUCTION

1.1 Background to the Study

In today’s world, the internet is used for almost everything, from banking and shopping to communication and business operations. As more people and companies depend on online systems, the need to protect sensitive information from cyberattacks has become more important than ever. One key way to do this is by using Intrusion Detection Systems (IDS). An IDS is a system that monitors computer networks to spot suspicious activities or security threats. It acts like a security guard that alerts users when something unusual is happening (Kumar and Patel, 2021).

Traditionally, most IDS were built to recognize attacks by comparing incoming traffic to a list of known attack patterns. These are called signature-based systems. While they are good at identifying past threats, they cannot detect new types of attacks that do not match any known pattern. This limitation makes it easy for modern cybercriminals to bypass them using unknown or modified methods (Ali et al., 2023). To solve this problem, researchers have started using machine learning (ML) techniques to improve IDS performance.

Machine learning gives computers the ability to learn patterns from data without being explicitly programmed. With this approach, an IDS can be trained on large sets of network data to learn what normal behavior looks like and then flag activity that does not fit that pattern. ML-based IDS are more flexible than traditional methods and are often better at detecting new or unexpected attacks (Nguyen et al., 2022). They can also be designed to scale to large volumes of network traffic.
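To make the contrast between the two approaches concrete, the sketch below compares a signature check with a simple statistical baseline learned from normal traffic. It is only an illustration and is not the method used in this project, which trains a supervised Two-Class Decision Forest. The signature strings, the `src_bytes` values and the three-standard-deviation threshold `k` are all hypothetical choices.

```python
from statistics import mean, stdev

# Hypothetical attack signatures (patterns seen in past attacks).
KNOWN_SIGNATURES = ["/etc/passwd", "' OR '1'='1"]


def signature_match(payload: str) -> bool:
    """Signature-based check: flag a payload only if it contains a known pattern."""
    return any(sig in payload for sig in KNOWN_SIGNATURES)


def build_baseline(normal_values):
    """Learn what 'normal' looks like for one numeric feature (mean and spread)."""
    return mean(normal_values), stdev(normal_values)


def is_anomalous(value, baseline, k=3.0):
    """Anomaly-based check: flag values more than k standard deviations from the mean."""
    mu, sigma = baseline
    return abs(value - mu) > k * sigma


# Made-up src_bytes values from connections assumed to be normal.
normal_src_bytes = [200, 215, 190, 230, 205, 210, 198, 221]
baseline = build_baseline(normal_src_bytes)

print(signature_match("GET /../../etc/passwd"))  # True: contains a known pattern
print(signature_match("GET /index.html"))        # False: no known pattern
print(is_anomalous(50_000, baseline))            # True: far from the learned baseline
print(is_anomalous(212, baseline))               # False: close to the learned baseline
```

The signature check misses any attack whose pattern is not yet in its list. The baseline check can flag unusual traffic it has never seen before, at the cost of occasional false alarms on traffic that is unusual but legitimate.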

In this project, I used the Two-Class Decision Forest algorithm, a machine learning model that combines multiple decision trees to make predictions. It is well suited to detecting attacks because it is accurate, fast to train, and relatively robust to noisy or unbalanced data. The model was built using Microsoft Azure Machine Learning Studio, a platform that allows users to build ML models without writing complex code. By training the model on a well-known dataset called NSL-KDD, this project aims to build a working IDS that can help identify network intrusions in a real-world environment (Zhang et al., 2025).

1.2 Statement of the Problem

Many Intrusion Detection Systems (IDS) have been developed using different machine learning algorithms, but not all of them perform well in real-world situations. For example, Logistic Regression is commonly used for binary classification tasks, but it often struggles when the classes overlap or when the relationship between inputs and outputs is not linear. In intrusion detection, this can lead to misclassifications, especially when the data is complex (Olaoye et al., 2022).

Another popular method, Support Vector Machines (SVMs), can handle both linear and non-linear data, but SVMs become very slow and memory-intensive on large datasets. They also require careful tuning of settings such as the kernel and regularization parameters, which makes them harder to use in practical scenarios (Yassin and Alshamrani, 2021). Similarly, K-Nearest Neighbors (KNN) is simple to understand but performs poorly with large data because it stores the entire training set in memory and compares each new record against it during prediction. It is also highly sensitive to noise and irrelevant features (Li and Huang, 2023).

Naive Bayes, another commonly used algorithm, assumes that all features are independent of one another given the class, which is often not true for network data. This can lead to inaccurate results because relationships between network activities are usually complex (Bello et al., 2024). Neural networks, especially deep learning models, have been used in some studies and can offer high accuracy. However, they require considerable computing power and training time, and they are often seen as “black-box” models, meaning their decisions are hard to explain (Chowdhury and Sahu, 2023).

In contrast, the Two-Class Decision Forest algorithm used in this project offers a better balance of speed, accuracy, and simplicity. It builds multiple decision trees and combines their outputs to make more reliable predictions. This method is less likely to overfit the training data than a single decision tree and is relatively robust to noisy or unbalanced datasets. It also requires little parameter tuning and is easier to explain than more advanced models such as neural networks (Zhang et al., 2025).

Because of these advantages, the Two-Class Decision Forest was chosen for this project. It allowed the model to be trained efficiently in Azure Machine Learning Studio and made it easier to test the system on benchmark intrusion data.
1.3 Aim and Objectives of the Study

This study aims to develop a decision forest model for an Intrusion Detection System (IDS).

Specific Objectives

1. To develop a Two-Class Decision Forest model for intrusion detection.

2. To implement the developed model.

3. To evaluate the performance of the Two-Class Decision Forest model using accuracy, precision, and F1-score.

1.4 Scope of the Study

This project focuses on building and testing an Intrusion Detection System (IDS) using the Two-Class Decision Forest algorithm in Microsoft Azure Machine Learning Studio. The model was trained with the NSL-KDD dataset, which includes labeled examples of both normal and attack network traffic.

The study covers:

● Importing and preparing the dataset using built-in Azure tools.

● Training and evaluating the model using key performance metrics.

● Creating a simple web-based interface that allows users to enter sample data and get results.

● Testing the model’s ability to distinguish normal traffic from common attack types such as Denial of Service (DoS), Probe, Remote-to-Local (R2L), and User-to-Root (U2R). Because the model is a two-class classifier, these attacks are detected as a single "attack" class rather than identified by type.
1.5 Significance of the Study

This study is important because it shows how machine learning can be used to improve network security in a simple and practical way. Many traditional Intrusion Detection Systems (IDS) are limited because they rely only on known attack patterns. This project demonstrates how a machine learning model trained using the Two-Class Decision Forest algorithm can help detect both common and less familiar types of attacks.

The use of Microsoft Azure Machine Learning Studio makes this project accessible, especially to people without advanced programming skills. Using Azure’s drag-and-drop interface, the model was built, trained, and evaluated with minimal code. This approach shows that modern tools can make machine learning more user-friendly and practical, even for small teams or individuals.

The NSL-KDD dataset used in the project provides a good variety of attack types. The project shows how this dataset can be used to train a model that detects major network threats such as DoS, Probe, R2L, and U2R. The model’s performance was measured using standard metrics such as accuracy, precision, recall, and F1-score, giving a clear picture of how well it works.

In addition, a simple web-based interface was created to show how the system could be used in practice. This part of the project lets users test the model by entering sample values, making the IDS more interactive and easier to understand.
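How a web page might pass a user's values to the trained model can be sketched as a JSON request to a deployed scoring endpoint. This is a hypothetical sketch, not the interface built in this project. The endpoint URL, the API key, the `{"Inputs": {"input1": [...]}}` payload layout and the feature values are all placeholders, and the exact format depends on how the model was deployed. The function only builds the request and does not send it.

```python
import json
import urllib.request


def build_scoring_request(scoring_uri, api_key, record):
    """Build (but do not send) an HTTPS POST asking a deployed model to score one record.

    The payload layout is an assumption for illustration; the real structure and
    input name depend on how the endpoint was deployed.
    """
    payload = {"Inputs": {"input1": [record]}}
    return urllib.request.Request(
        scoring_uri,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": f"Bearer {api_key}"},
        method="POST",
    )


# Hypothetical endpoint URL, key and input values, for illustration only.
sample = {"duration": 0, "protocol_type": "tcp", "service": "http", "src_bytes": 215, "dst_bytes": 45076}
req = build_scoring_request("https://example.invalid/score", "YOUR-API-KEY", sample)
# Sending it with urllib.request.urlopen(req) would return the endpoint's JSON response.
```

Keeping request construction separate from sending makes the form-to-payload step easy to check without a live endpoint.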

Overall, this project can benefit:

● Students and researchers, by showing a clear and practical example of machine learning in cybersecurity.

● IT professionals, by offering a simple model that can be improved or expanded for real use.

● Organizations, by providing a low-cost approach to testing IDS with machine learning.

By focusing on easy-to-use tools and clear results, the study adds to current research in cybersecurity and helps bridge the gap between academic work and real-world applications (Zhang et al., 2025; Ali et al., 2023; Bello et al., 2024).

1.6 Operational Definition of Terms

1. Intrusion Detection System (IDS):

A system that monitors network traffic and identifies suspicious or harmful activities. In this project, the IDS is built using a machine learning model that classifies whether a connection is normal or an attack.

2. Machine Learning (ML):

A branch of artificial intelligence that allows computers to learn from data and make decisions without being explicitly programmed. In this project, ML is used to train the model to detect network intrusions.

3. Two-Class Decision Forest:

A supervised machine learning algorithm that combines the results of many decision trees to make a final prediction. It is used in this project to classify network traffic as either normal or an intrusion. This method is known for being accurate, fast, and relatively easy to interpret.

4. Microsoft Azure Machine Learning Studio:

A cloud-based platform that allows users to build, train, and deploy machine learning models using a visual, no-code interface. It was used to build and train the IDS in this project.

5. NSL-KDD Dataset:

A benchmark dataset for evaluating intrusion detection systems. It contains labeled records of both normal traffic and different types of attack traffic, such as DoS, Probe, R2L, and U2R. It was used to train and test the model in this study.

6. False Positive Rate (FPR):

The proportion of normal network traffic that the IDS wrongly flags as an attack. A lower false positive rate means fewer false alarms, which makes the system more reliable.

7. Precision:

The proportion of alerts raised by the IDS that correspond to real attacks. High precision means the system does not raise many false alarms.

8. Recall:

The proportion of real attacks that the system successfully detects, also known as sensitivity or detection rate. High recall means the system is good at catching intrusions.

9. F1-Score:

The harmonic mean of precision and recall, which combines both into a single number. It gives a balanced measure of how well the model performs.

10. Denial-of-Service (DoS) Attack:

A type of cyberattack in which a system is flooded with traffic or requests to make it unavailable to legitimate users. The IDS in this project is trained to detect such attacks.

11. Remote-to-Local (R2L) Attack:

An attack in which a remote user without an account tries to gain unauthorized local access to a machine over the network. This type of attack is included in the training data.

12. User-to-Root (U2R) Attack:

An attack in which a user with ordinary access tries to gain administrator (root) privileges on a system. The model was trained to recognize this type of threat.

13. Probe Attack:

A type of attack in which an attacker scans the network to find weak spots that could be exploited later. This is one of the attack types the IDS can detect.

14. Web Interface:

A simple web page created in this project, where users can enter values and check whether the model detects an intrusion. It connects to the trained model to show live predictions.
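The metric definitions above can be made precise with a confusion matrix that treats "attack" as the positive class. The sketch below is an illustration only, not the evaluation code used in this project, where the metrics were produced by Azure ML Studio. The eight example labels are invented purely to show the arithmetic.

```python
def ids_metrics(y_true, y_pred, positive="attack"):
    """Compute IDS metrics from true and predicted labels, with `positive` as the attack class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # attacks caught
    fp = sum(t != positive and p == positive for t, p in pairs)  # false alarms
    fn = sum(t == positive and p != positive for t, p in pairs)  # attacks missed
    tn = sum(t != positive and p != positive for t, p in pairs)  # normal traffic passed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "fpr": fp / (fp + tn) if fp + tn else 0.0,
        # F1 is the harmonic mean of precision and recall.
        "f1": 2 * precision * recall / (precision + recall) if precision + recall else 0.0,
    }


# Made-up labels for eight connections, purely to show the arithmetic.
y_true = ["attack", "attack", "attack", "normal", "normal", "normal", "normal", "attack"]
y_pred = ["attack", "attack", "normal", "normal", "attack", "normal", "normal", "attack"]
print(ids_metrics(y_true, y_pred))
```

In this example TP = 3, FN = 1, FP = 1 and TN = 3, so accuracy, precision, recall and F1-score are all 0.75 and the false positive rate is 0.25. Each value can be checked by hand.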

CHAPTER TWO

LITERATURE REVIEW

2.1 Overview

Intrusion Detection Systems (IDS) are designed to monitor network activity and alert users when there are signs of unauthorized access or attacks. Over the years, the rise in cybercrime has made these systems a critical part of digital security. However, traditional IDS often rely on fixed rules or known attack patterns, which makes them ineffective against new or evolving threats (Ali et al., 2023).

To address this limitation, researchers have explored the use of machine learning (ML) in intrusion detection. ML allows systems to learn from data and detect unusual patterns that may signal an attack, including attacks that have not been seen before. This approach can improve both detection speed and accuracy, and it has become one of the most promising areas in cybersecurity research (Nguyen et al., 2022).

This project builds on that idea by using a specific ML algorithm, the Two-Class Decision Forest, to classify network traffic as either normal or an intrusion. The goal is to improve detection accuracy and reduce false alarms while keeping the system simple and efficient enough for practical use.

2.2 Conceptual Framework

This project is built around three key concepts: Intrusion Detection Systems (IDS), machine learning algorithms (specifically, the Two-Class Decision Forest), and the NSL-KDD dataset.
● Intrusion Detection Systems (IDS):

IDS monitor data flowing through a network and try to identify any abnormal or malicious activity. Traditional IDS often fail to detect new threats because they depend on signatures, which are known patterns of past attacks. In recent studies, ML-based IDS have shown better performance because they can detect unusual behavior, not just known attacks (Zhang et al., 2025).

Fig 1. ML-based systems efficiently analyze network traffic (Wang et al., 2024).

● Machine Learning in IDS:

Machine learning helps IDS learn from large sets of network data. Instead of relying on fixed rules, the model is trained to recognize normal activity and then flag anything that does not fit that pattern. This makes the system more adaptable to modern attacks. Algorithms such as Decision Trees, Random Forests, and Support Vector Machines have been tested in IDS, but many of them either require heavy tuning or become too slow with large datasets (Olaoye et al., 2022). The Two-Class Decision Forest, an ensemble method closely related to Random Forests, combines multiple decision trees into one strong model and performs well on classification tasks such as intrusion detection (Chowdhury and Sahu, 2023).

Fig 2. Hybrid ML workflow (SPE - Hernandez, 2023).

● The NSL-KDD Dataset:

The NSL-KDD dataset is widely used in intrusion detection research. It is an improved version of the KDD Cup 1999 dataset, created to remove duplicate and redundant records. It contains labeled examples of both normal and attack traffic, including DoS, R2L, U2R, and Probe attacks. Although the attack categories are not evenly represented (R2L and U2R records remain rare), the dataset is considered more suitable than its predecessor for training ML models in academic projects (Bello et al., 2024).

Fig 3. Flow chart of load identification decision forests algorithm (Zhao et al., ResearchGate, 2022)

Together, these components form the base of this project: a smart IDS trained on NSL-KDD using a Two-Class Decision Forest model, built and tested in Azure Machine Learning Studio.
Fig 5. Model Performance Metrics (MarkovML, 2023).

2.3 Empirical Review

Several researchers have worked on improving Intrusion Detection Systems (IDS) using machine learning. Many of these studies tested different algorithms to see which one performs best for detecting various types of network attacks. However, only a few of them focused specifically on Decision Forest-based models or used platforms such as Azure Machine Learning Studio for implementation.
For example, Olaoye et al. (2022) compared multiple machine learning models, including Decision Trees, Logistic Regression, and Naive Bayes, using the NSL-KDD dataset. Their results showed that Decision Tree models had better detection accuracy, especially for complex attacks, but they warned about overfitting when only a single tree was used.

Zhang et al. (2025) expanded on this by testing ensemble learning methods, including Random Forests and Decision Forests. Their findings showed that models such as the Two-Class Decision Forest, which combine many decision trees, are more stable, reduce errors, and handle unbalanced data better than single-tree methods or simpler models such as Logistic Regression.

Nguyen et al. (2022) evaluated Support Vector Machines (SVM), K-Nearest Neighbors (KNN), and Decision Forests on intrusion detection tasks. They concluded that while SVMs had good precision, they required long training times and fine-tuning. Decision Forests, on the other hand, delivered high accuracy without much adjustment, making them more practical for quick deployment.

Another recent study by Bello et al. (2024) highlighted the importance of using clean and balanced datasets. They pointed out that the NSL-KDD dataset remains relevant because it avoids many of the issues found in older datasets, such as data duplication and imbalance, which can affect model performance.

Lastly, Chowdhury and Sahu (2023) tested deep learning models such as LSTM and CNN on network intrusion data and reported high detection accuracy. However, they also noted that such models need more computing power and are harder to explain. For student projects or small organizations, simpler and more transparent algorithms such as Decision Forests are often a better choice.

These studies support the decision to use a Two-Class Decision Forest for this project. It offers a good balance between accuracy, speed, and ease of use, especially when implemented on a platform like Azure ML Studio, which simplifies the training and evaluation process.

2.4 Theoretical Framework

The main theory behind this project is ensemble learning, a machine learning approach in which multiple models are combined to solve a problem better than a single model could. In this project, the Two-Class Decision Forest algorithm is used. It is based on an ensemble method called bagging (bootstrap aggregating), in which each decision tree is trained separately on a random sample of the training data drawn with replacement, and the trees' results are combined to make a final decision (Zhang et al., 2025).

Each decision tree in the forest makes a prediction, and the final output is obtained by combining these predictions, classically by majority vote. (Azure's documentation describes its implementation as summing the trees' label histograms and normalizing the result into class probabilities, rather than counting hard votes.) Combining many trees reduces the chance of the errors that a single tree might make. Because of this, decision forests are more stable and accurate, especially when dealing with noisy or unbalanced data, which is often the case with network traffic.

This theory works well for intrusion detection because cyberattacks can appear in many different forms, and a single model might not be able to catch all of them. By using multiple trees trained on slightly different samples of the data, the decision forest can handle a wide range of attack types more effectively than individual classifiers.
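Bagging with majority voting can be sketched in a few lines of plain Python. To keep the example short, each "tree" is a one-split decision stump rather than a full tree, and no random feature selection is used. It is therefore a simplified illustration of the idea, not Azure's Two-Class Decision Forest. The two toy features (`src_bytes`, `count`) and their values are invented.

```python
import random
from collections import Counter


def majority(labels, default=None):
    """Most common label in the list, or `default` if the list is empty."""
    return Counter(labels).most_common(1)[0][0] if labels else default


def fit_stump(rows, labels):
    """Fit a depth-1 decision tree: the feature/threshold split with the fewest errors."""
    fallback = majority(labels)
    best = None
    for f in range(len(rows[0])):
        for threshold in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= threshold]
            right = [y for r, y in zip(rows, labels) if r[f] > threshold]
            left_label, right_label = majority(left, fallback), majority(right, fallback)
            errors = sum(y != left_label for y in left) + sum(y != right_label for y in right)
            if best is None or errors < best[0]:
                best = (errors, f, threshold, left_label, right_label)
    _, f, threshold, left_label, right_label = best
    return lambda x: left_label if x[f] <= threshold else right_label


def fit_forest(rows, labels, n_trees=25, seed=0):
    """Bagging: train each stump on a bootstrap sample drawn with replacement."""
    rng = random.Random(seed)
    n = len(rows)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        forest.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx]))
    return forest


def forest_predict(forest, x):
    """Aggregate the individual trees' predictions by majority vote."""
    return majority([tree(x) for tree in forest])


# Invented (src_bytes, count) records: small, quiet connections are normal.
rows = [(100, 1), (120, 2), (90, 1), (110, 3), (5000, 50), (6000, 60), (5500, 40), (7000, 55)]
labels = ["normal"] * 4 + ["attack"] * 4
forest = fit_forest(rows, labels)
print(forest_predict(forest, (80, 1)), forest_predict(forest, (8000, 70)))
```

An occasional bootstrap sample may contain only one class and produce a stump that is always wrong for the other class. The majority vote across 25 stumps outweighs such individual mistakes, which is the point of bagging.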

The Two-Class Decision Forest also benefits from the divide-and-conquer principle, in which the data is repeatedly split on feature values to create decision paths. Each tree's decision paths can be inspected, which makes the model easier to understand and explain than black-box models. This is important when applying it in real-world environments.
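The divide-and-conquer idea can be sketched as a recursive tree builder that splits the data, then builds a subtree on each part. This sketch scores splits with Gini impurity, which is one common criterion. The report does not state which criterion Azure uses internally, so treat this as an illustration. The features (`src_bytes`, `num_failed_logins`) and the seven records are invented.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 0 for a pure node, larger when the classes are mixed."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels):
    """Return (score, feature, threshold) for the split with the lowest weighted Gini."""
    best = None
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue  # a useful split must send records both ways
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best


def build_tree(rows, labels, depth=0, max_depth=3):
    """Divide and conquer: split the data, then build a subtree on each part."""
    if gini(labels) == 0.0 or depth == max_depth:
        return Counter(labels).most_common(1)[0][0]  # leaf: majority label
    split = best_split(rows, labels)
    if split is None:
        return Counter(labels).most_common(1)[0][0]
    _, f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return {
        "feature": f,
        "threshold": t,
        "left": build_tree([r for r, _ in left], [y for _, y in left], depth + 1, max_depth),
        "right": build_tree([r for r, _ in right], [y for _, y in right], depth + 1, max_depth),
    }


def predict(tree, x):
    """Follow the decision path from the root down to a leaf."""
    while isinstance(tree, dict):
        tree = tree["left"] if x[tree["feature"]] <= tree["threshold"] else tree["right"]
    return tree


# Invented (src_bytes, num_failed_logins) records.
rows = [(200, 0), (300, 0), (260, 0), (250, 4), (280, 5), (9000, 0), (8500, 0)]
labels = ["normal"] * 3 + ["attack"] * 4
tree = build_tree(rows, labels)
print(tree)
```

On this toy data the builder first splits on `src_bytes` (at most 300). It then splits the small-traffic group on `num_failed_logins`, which gives the readable path "small traffic with no failed logins leads to normal". This kind of path is what makes individual trees explainable.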

In summary, the theoretical foundation of ensemble learning supports the use of decision forests in IDS by improving accuracy, reducing overfitting, and making the system more reliable and interpretable (Bello et al., 2024).

2.5 Summary of Reviewed Related Works and Knowledge Gap

From the literature reviewed, it is clear that machine learning has become a useful tool for improving Intrusion Detection Systems (IDS). Many studies have explored different algorithms such as Support Vector Machines, K-Nearest Neighbors, Logistic Regression, and Neural Networks. While these models can perform well under certain conditions, they often require complex tuning or high computing power, or they produce results that are difficult to explain (Nguyen et al., 2022; Chowdhury and Sahu, 2023).

On the other hand, Decision Tree-based models have shown promising results due to their simplicity and fast execution. Recent research suggests that combining multiple decision trees using ensemble learning techniques such as bagging can improve performance. The Two-Class Decision Forest algorithm is one such method, offering good accuracy, low false positive rates, and better handling of unbalanced data, all of which are important in IDS applications (Zhang et al., 2025; Bello et al., 2024).

However, there is still a gap in studies that show how these models can be applied in a simple and practical way using modern tools such as Azure Machine Learning Studio. Most research focuses on complex models or environments that are not accessible to beginners or students. Also, very few works demonstrate how to connect a trained model to an interface that allows users to interact with it in real time.

This project addresses these gaps by:

● Using the Two-Class Decision Forest in a no-code, cloud-based platform (Azure ML Studio),

● Training and testing it on the NSL-KDD dataset,

● And building a simple web interface to demonstrate how IDS can work in practice.

This approach not only supports the findings of past studies but also shows how intrusion detection can be implemented in a user-friendly and realistic way.
CHAPTER THREE

METHODOLOGY

3.1 Research Approach

The system was developed using Microsoft Azure Machine Learning Studio, a cloud-based platform that allows users to build machine learning models in a visual, no-code environment. The project used the NSL-KDD dataset for both training and testing.

The methodology includes several key phases:

● Uploading and preparing the dataset in Azure ML,

● Preprocessing the data using built-in modules,

● Training a Two-Class Decision Forest model,

● Evaluating the model’s performance,

● And creating a basic interface for users to test the system with real inputs.

The entire process was completed using Azure’s drag-and-drop Designer interface, with a Python notebook used at the end to export results. Each phase of the project is explained in the sections that follow.

Fig 3.1. Framework of a developed Two-Class Decision Forest algorithm against the existing Naive Bayes model.
i. Data Collection

The dataset used in this project is the NSL-KDD dataset, which is commonly used in intrusion detection research. It is an improved version of the original KDD Cup 1999 dataset, created mainly to remove duplicate records that biased models toward the most frequent record types. The dataset includes labeled records of both normal and attack network activities, making it well suited to training and testing a machine learning model for intrusion detection.

The NSL-KDD dataset contains different types of attacks, such as:

● DoS (Denial-of-Service) – attacks that try to make a service unavailable.

● Probe – attempts to scan and gather information about the network.

● R2L (Remote-to-Local) – attempts by an attacker outside the network to gain local access to a machine.

● U2R (User-to-Root) – attempts by a normal user to gain administrative (root) control.
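The Two-Class Decision Forest predicts only normal versus attack, so the individual attack names in the dataset have to be collapsed into a single "attack" label. The four categories above remain useful for reporting results. The following is a minimal sketch under the assumption that the label column holds the original attack names. Some Kaggle copies replace these names with a single "anomaly" value, which `binary_label` also maps to "attack". The report does not say exactly how this step was carried out in Azure, and the mapping below covers only the attack names from the original KDD Cup 1999 training data.

```python
# Partial mapping from attack labels to the four categories. Only the 22 attack
# names of the original KDD Cup 1999 training data are listed; the NSL-KDD test
# set contains further names that would need to be added.
ATTACK_CATEGORY = {
    "back": "DoS", "land": "DoS", "neptune": "DoS", "pod": "DoS",
    "smurf": "DoS", "teardrop": "DoS",
    "ipsweep": "Probe", "nmap": "Probe", "portsweep": "Probe", "satan": "Probe",
    "ftp_write": "R2L", "guess_passwd": "R2L", "imap": "R2L", "multihop": "R2L",
    "phf": "R2L", "spy": "R2L", "warezclient": "R2L", "warezmaster": "R2L",
    "buffer_overflow": "U2R", "loadmodule": "U2R", "perl": "U2R", "rootkit": "U2R",
}


def attack_category(label: str) -> str:
    """Return 'normal', one of DoS/Probe/R2L/U2R, or 'unknown' for unlisted names."""
    name = label.strip().lower()
    if name == "normal":
        return "normal"
    return ATTACK_CATEGORY.get(name, "unknown")


def binary_label(label: str) -> str:
    """Collapse any label into the two classes a Two-Class Decision Forest predicts."""
    return "normal" if label.strip().lower() == "normal" else "attack"


for raw in ["normal", "neptune", "guess_passwd", "mscan"]:
    print(raw, attack_category(raw), binary_label(raw))
```

Any name missing from the mapping is still treated as an attack by `binary_label`, so the two-class labels stay correct even when the category table is incomplete.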

The dataset was downloaded from Kaggle, where it was already cleaned and structured in CSV format. It was then uploaded into Azure Machine Learning Studio for use in the training pipeline. In Azure ML, the dataset was loaded into the workspace and connected directly to other modules in the experiment.

The figure above shows the dataset in my Azure ML workspace and its connection to the pipeline.

The dataset includes important features such as protocol type, service, source bytes, destination bytes, duration, login status, and others. These were used by the model to learn how to classify network traffic as either normal or an intrusion.
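Outside Azure, loading the CSV and separating text categories from numeric features could look like the sketch below. It uses only Python's standard `csv` module. The column names follow NSL-KDD feature names (only a subset is shown), but the label column name `class` is an assumption because Kaggle copies of the dataset differ. The two data rows are invented.

```python
import csv
import io

# Columns treated as text categories; all other feature columns are parsed as numbers.
CATEGORICAL = {"protocol_type", "service", "flag"}


def load_records(csv_text, label_column="class"):
    """Parse NSL-KDD-style CSV text into feature dicts and a list of labels."""
    features, labels = [], []
    for row in csv.DictReader(io.StringIO(csv_text)):
        labels.append(row.pop(label_column).strip())
        features.append({k: v if k in CATEGORICAL else float(v) for k, v in row.items()})
    return features, labels


# Two invented rows using a subset of the NSL-KDD feature names.
sample_csv = """duration,protocol_type,service,flag,src_bytes,dst_bytes,logged_in,class
0,tcp,http,SF,215,45076,1,normal
0,tcp,private,S0,0,0,0,neptune
"""
X, y = load_records(sample_csv)
print(y)                  # ['normal', 'neptune']
print(X[0]["src_bytes"])  # 215.0
```

Separating categorical and numeric columns at load time mirrors the later Edit Metadata step, where the text columns are marked as categorical.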

ii. Data Preprocessing

Before training the model, the dataset needed to be cleaned and prepared. This step is called data preprocessing, and it helps improve the model’s accuracy by removing errors, fixing data types, and selecting useful features. All preprocessing steps were done using the built-in modules in Azure Machine Learning Studio Designer.

The main preprocessing steps were:
1. Clean Missing Data

Although the NSL-KDD dataset from Kaggle was already cleaned, the Clean Missing Data module was added to ensure that any unexpected gaps in the values were handled properly. This step ensures consistency across all the rows before training the model.

The figure above shows the "Clean Missing Data" module in my pipeline.
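The Clean Missing Data module offers several cleaning modes, such as substituting the column mean or removing the entire row. The report does not say which mode was selected. The sketch below reproduces those two behaviours in plain Python to show what each does. It is not the module's implementation, and the sample rows are invented.

```python
from statistics import mean


def clean_missing(rows, columns, mode="mean"):
    """Handle missing (None) values in the given numeric columns.

    mode="mean": replace each missing value with the mean of the values present.
    mode="remove_row": drop any row with a missing value in those columns.
    """
    if mode == "remove_row":
        return [r for r in rows if all(r.get(c) is not None for c in columns)]
    cleaned = [dict(r) for r in rows]  # copy so the input rows are left unchanged
    for c in columns:
        present = [r[c] for r in rows if r.get(c) is not None]
        fill = mean(present) if present else 0.0
        for r in cleaned:
            if r.get(c) is None:
                r[c] = fill
    return cleaned


rows = [
    {"src_bytes": 100.0, "duration": 0.0},
    {"src_bytes": None, "duration": 2.0},
    {"src_bytes": 300.0, "duration": None},
]
print(clean_missing(rows, ["src_bytes", "duration"]))
print(clean_missing(rows, ["src_bytes", "duration"], mode="remove_row"))
```

Mean substitution keeps every record but invents values. Row removal keeps only real values but loses records. On an already-clean dataset such as the Kaggle copy of NSL-KDD, both choices should leave the data unchanged.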

2. Edit Metadata

Some of the data columns, especially the text-based ones such as protocol_type, service, and flag, were originally classified as string data. These were converted to the categorical data type using the Edit Metadata module. This allowed the model to treat them as categories rather than free text, and it prepared them for the one-hot encoding step that follows.
The figure above shows how the Edit Metadata module was used to change data types.

3. Convert to Indicator Values

After the data types were updated, the Convert to Indicator Values module was used to apply one-hot encoding to the categorical columns. This step creates a separate 0/1 column for each category (e.g., TCP, UDP, and ICMP for protocol_type) so that the model can use them during training.

The figure above shows the Convert to Indicator Values module applied to the dataset.
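One-hot encoding can be sketched in a few lines of Python. The example assumes rows stored as dictionaries and uses a hypothetical "column-value" naming scheme for the new indicator columns; Azure's module may name its output columns differently.

```python
def one_hot_encode(rows, columns):
    """Replace each categorical column with one 0/1 indicator column per distinct value."""
    levels = {col: sorted({row[col] for row in rows}) for col in columns}
    encoded = []
    for row in rows:
        # Keep every non-categorical column unchanged.
        new_row = {key: value for key, value in row.items() if key not in columns}
        for col in columns:
            for level in levels[col]:
                new_row[f"{col}-{level}"] = 1 if row[col] == level else 0
        encoded.append(new_row)
    return encoded


rows = [
    {"protocol_type": "tcp", "src_bytes": 181},
    {"protocol_type": "udp", "src_bytes": 0},
    {"protocol_type": "icmp", "src_bytes": 520},
]
encoded = one_hot_encode(rows, ["protocol_type"])
# encoded[0] == {"src_bytes": 181, "protocol_type-icmp": 0, "protocol_type-tcp": 1, "protocol_type-udp": 0}
```

Each record ends up with exactly one 1 among the indicator columns derived from a given original column, so the model sees protocol types as separate yes/no features instead of an arbitrary numeric ordering.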

4. Select Columns in Dataset

This step was used to remove any columns that were not needed for training. Unnecessary features (such as certain flags or identifiers) were excluded to reduce noise and help the model focus on the most useful inputs.

The figure above shows the "Select Columns in Dataset" module connected in my pipeline.
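The report does not list exactly which columns were removed. As one illustration of removing noise, the sketch below drops columns that hold the same value in every row (in NSL-KDD, num_outbound_cmds is commonly reported to be constant) while protecting the label column. The helper names and sample data are hypothetical.

```python
def constant_columns(rows):
    """Return the names of columns whose value is identical in every row."""
    return {col for col in rows[0] if len({row[col] for row in rows}) == 1}


def select_columns(rows, excluded):
    """Keep every column except those in `excluded`."""
    return [{k: v for k, v in row.items() if k not in excluded} for row in rows]


rows = [
    {"duration": 0, "src_bytes": 181, "num_outbound_cmds": 0, "label": "normal"},
    {"duration": 2, "src_bytes": 239, "num_outbound_cmds": 0, "label": "attack"},
]
to_drop = constant_columns(rows) - {"label"}  # never drop the target column
selected = select_columns(rows, to_drop)
# to_drop == {"num_outbound_cmds"}
```

A column that never changes cannot help separate normal traffic from attacks, so removing it shrinks the input without losing information.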

All these preprocessing steps were connected and executed in sequence within the Azure ML Studio interface, so the cleaned and transformed data passed directly into the training phase.

iii. Model Development/Training: Decision Forests

After preprocessing, the cleaned dataset was ready for model training. In this project, the Two-Class Decision Forest algorithm was selected to build the Intrusion Detection System. This algorithm is available as a built-in module in Microsoft Azure Machine Learning Studio Designer and is designed for binary classification problems, such as predicting whether a network activity is normal or an attack.

Why Two-Class Decision Forest?

This model combines many decision trees into a single, stronger ensemble. Each tree is trained on a different random sample of the training data, and the trees' individual outputs are aggregated, in effect by voting, to make the final prediction. This approach helps improve accuracy, reduce overfitting, and handle unbalanced data better than a single decision tree (Zhang et al., 2025).

It is also easier to use than many other models because:

● It requires minimal configuration.

● It runs quickly, even on large datasets.

● It outputs both a predicted label and a probability score for each record, which makes its predictions easy to inspect.
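The ensemble idea described above (many trees, each trained on a different random sample, combined by voting) can be sketched without any libraries. This is a simplified illustration, not Azure's implementation: each "tree" here is a one-split decision stump, samples are drawn by bootstrap resampling (with replacement), and the output is the fraction of stumps voting "attack". The feature name count and the toy data are assumptions chosen for readability.

```python
import random
from collections import Counter


def train_stump(rows, labels, feature):
    """Fit a one-split tree on `feature`: choose the threshold with the fewest training errors."""
    best = None
    for threshold in sorted({row[feature] for row in rows}):
        left = [y for row, y in zip(rows, labels) if row[feature] <= threshold]
        right = [y for row, y in zip(rows, labels) if row[feature] > threshold]
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0] if right else left_label
        errors = sum(y != left_label for y in left) + sum(y != right_label for y in right)
        if best is None or errors < best[0]:
            best = (errors, threshold, left_label, right_label)
    _, threshold, left_label, right_label = best
    return lambda row: left_label if row[feature] <= threshold else right_label


def train_forest(rows, labels, feature, n_trees=15, seed=0):
    """Bagging: train each stump on a bootstrap sample drawn with replacement."""
    rng = random.Random(seed)
    n = len(rows)
    forest = []
    for _ in range(n_trees):
        sample = [rng.randrange(n) for _ in range(n)]
        forest.append(train_stump([rows[i] for i in sample], [labels[i] for i in sample], feature))
    return forest


def attack_probability(forest, row):
    """Share of trees that vote 'attack' for this row."""
    return sum(tree(row) == "attack" for tree in forest) / len(forest)


# Toy data: a hypothetical 'count' feature (connections to the same host) separating the classes.
rows = [{"count": c} for c in (1, 2, 3, 5, 120, 150, 180, 200)]
labels = ["normal"] * 4 + ["attack"] * 4
forest = train_forest(rows, labels, "count")
```

Because each stump sees a slightly different sample, their individual errors tend to differ, and combining the votes smooths them out; this is the intuition behind the reduced overfitting attributed to decision forests.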

Steps Taken in Azure ML Studio:

1. The Train Model module was used to connect the processed data to the Two-Class Decision Forest.

2. The target (label) column was set to the column indicating whether each record was “normal” or an “attack”.

3. The Score Model module was added to generate predictions (scored labels and probabilities) for records the model had not been trained on.

4. The Evaluate Model module was used to compare those predictions with the actual labels and report key metrics such as accuracy, precision, recall, and F1-score.

The figure above shows the model pipeline with Train Model, Score Model, and Evaluate Model connected.
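The hand-off from Score Model to Evaluate Model can be pictured as two small steps: turning each record's attack probability into a label with a decision threshold, then tallying how those labels compare with the true labels. The sketch below assumes a threshold of 0.5 and the string labels "attack" and "normal"; it illustrates the logic and is not Azure's code.

```python
def score(probabilities, threshold=0.5):
    """Convert attack probabilities into predicted labels using a decision threshold."""
    return ["attack" if p >= threshold else "normal" for p in probabilities]


def confusion_counts(actual, predicted, positive="attack"):
    """Count true/false positives and negatives, treating `positive` as the attack class."""
    pairs = list(zip(actual, predicted))
    return {
        "TP": sum(a == positive and p == positive for a, p in pairs),
        "FP": sum(a != positive and p == positive for a, p in pairs),
        "TN": sum(a != positive and p != positive for a, p in pairs),
        "FN": sum(a == positive and p != positive for a, p in pairs),
    }


actual = ["attack", "normal", "normal", "attack"]
predicted = score([0.9, 0.2, 0.7, 0.4])  # ["attack", "normal", "attack", "normal"]
counts = confusion_counts(actual, predicted)  # {"TP": 1, "FP": 1, "TN": 1, "FN": 1}
```

Raising the threshold makes the model flag fewer records as attacks, which usually lowers false positives at the cost of missing more attacks.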

The entire model was trained and tested in the Designer interface, which allowed the components to be connected in a logical flow. The final output was used to check how well the model could detect intrusions in unseen data.

iv. Evaluation Metrics

After training, it was important to check how well the model performed. This was done with the Evaluate Model module in Azure Machine Learning Studio, which compares the model’s predictions with the actual labels in the dataset. The evaluation focused on five main metrics: accuracy, precision, recall, F1-score, and false positive rate.

These metrics show whether the model catches attacks without raising too many false alarms.

1. Accuracy

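Accuracy is the percentage of all predictions, for both normal and attack traffic, that the model got right. High accuracy means the model is reliable overall, although on an unbalanced dataset it can hide weak detection of the smaller class, which is why the other metrics are also needed.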

2. Precision

Precision is the proportion of records predicted as attacks that were actually attacks. High precision means fewer false alarms.

3. Recall

Recall is the proportion of actual attacks that the model detected. High recall means the model is good at catching threats.

4. F1-Score

F1-score is the harmonic mean of precision and recall. It gives a single value that is high only when the model both finds true attacks and avoids false alarms.

5. False Positive Rate (FPR)

This measures how often the model mistakenly flags normal traffic as an attack, that is, the share of normal records classified as attacks. A lower FPR is better because it means fewer unnecessary alerts.
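The five metrics above can all be computed from four confusion-matrix counts: true positives (TP, attacks flagged as attacks), false positives (FP, normal traffic flagged as attacks), true negatives (TN, normal traffic passed as normal) and false negatives (FN, missed attacks). The function below writes out the standard formulas; the counts in the usage line are made-up numbers for illustration, not this project's results.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute the five evaluation metrics from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0  # share of attack alerts that were real
    recall = tp / (tp + fn) if tp + fn else 0.0  # share of real attacks that were caught
    # Harmonic mean: high only when precision and recall are both high.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0  # share of normal traffic wrongly flagged
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1, "fpr": fpr}


# Illustrative counts only: 100 attack records and 100 normal records.
metrics = classification_metrics(tp=90, fp=5, tn=95, fn=10)
# accuracy 0.925, precision ~0.947, recall 0.9, f1 ~0.923, fpr 0.05
```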

All these results were obtained from the Evaluate Model module in Azure ML Studio. The metrics confirmed that the Two-Class Decision Forest model detected intrusions with a good level of accuracy and few false positives.

The figure above displays the Evaluate Model output showing performance metrics.

v. Analysis and Interpretation of Results

Once the model had been trained and evaluated, the next step was to interpret its performance from the metrics provided. The evaluation results confirmed that the Two-Class Decision Forest model was effective in detecting intrusions in the NSL-KDD dataset.

1. Accuracy and Overall Performance

The model achieved a high accuracy score, meaning it correctly classified most of the network traffic records. This shows that the model learned the patterns of both normal and malicious behavior effectively. The good accuracy also suggests that the preprocessing steps (data cleaning, one-hot encoding, and feature selection) helped, although confirming this would require comparing against a model trained without them.

The figure above shows the summary of the scored results used for analysis.

2. Precision and Recall

The precision value was strong, meaning the model did not raise many false alarms. This is important because frequent false alerts can lead users to ignore real threats. The recall score was also high, showing that the model detected a large proportion of actual intrusions. Together, this balance means the system is both careful and alert, which is exactly what a good IDS needs.

3. F1-Score and False Positive Rate

The F1-score confirmed the balance between precision and recall. Because it is the harmonic mean of the two, a high F1-score means that neither precision nor recall is low. The false positive rate was low, which is especially important in real-world applications because it reduces unnecessary warnings that can distract security teams or users.

4. Model Reliability and Simplicity

One of the biggest strengths of this model is that it performed well without complicated setup or tuning. It worked well with the default settings in Azure ML Studio, which suggests that Two-Class Decision Forest is both powerful and easy to use.

5. Exporting Results for Further Analysis

To make the results more accessible, a Python notebook in Azure ML Studio was used to export the scored data to a CSV file. This made it easier to analyze the results and to test the system further outside Azure.

The figure above shows the use of a Jupyter Notebook for exporting results.
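How the scored dataset is loaded into the notebook depends on the Azure ML SDK version and is not shown in the report, so the sketch below starts from records already in memory and writes them out with Python's standard csv module. The file name scored_results.csv and the column names are hypothetical. If the scored data is already a pandas DataFrame, its to_csv method does the same job in one call.

```python
import csv


def export_scored_rows(rows, path):
    """Write a list of dictionaries (one per scored record) to a CSV file with a header row."""
    if not rows:
        raise ValueError("no rows to export")
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)


# Hypothetical scored records: true label, predicted label and attack probability.
scored = [
    {"label": "attack", "Scored Labels": "attack", "Scored Probabilities": 0.93},
    {"label": "normal", "Scored Labels": "normal", "Scored Probabilities": 0.08},
]
export_scored_rows(scored, "scored_results.csv")
```

The resulting file can be opened in a spreadsheet or reloaded with csv.DictReader for further analysis; note that all values are read back as strings.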

In summary, the results showed that the chosen model was a good fit for intrusion detection. It achieved strong performance without needing high computing power or advanced tuning, which supports the aim of building a practical, efficient, and user-friendly IDS.

REFERENCES

Ali, A., Musa, K. and Usman, H., 2023. Improving signature-based intrusion detection systems with machine learning models: A survey. International Journal of Cybersecurity, 9(1), pp.33-45.

Bello, T.M., Ogunyemi, A.K. and Salisu, I., 2024. A critical analysis of the NSL-KDD dataset for intrusion detection research. Journal of Computer Science and Security, 11(2), pp.21-32.

Chowdhury, S. and Sahu, P., 2023. Comparative performance of deep learning and ensemble learning in intrusion detection. African Journal of Computer Research, 7(1), pp.54-67.

Kumar, S. and Patel, V., 2021. Machine learning-based intrusion detection: A review of supervised methods. International Journal of Advanced Computer Science, 12(4), pp.92-101.

Li, W. and Huang, Y., 2023. Evaluating distance-based classifiers for network intrusion detection. Journal of Intelligent Computing Systems, 5(3), pp.18-29.

Nguyen, L., Okafor, C. and Adeyemi, T., 2022. Machine learning approaches for detecting cyberattacks in real-time systems. Journal of Cyber Intelligence, 8(2), pp.44-59.

Olaoye, A.A., Danjuma, M. and Eze, C., 2022. Performance comparison of classification algorithms for intrusion detection using NSL-KDD dataset. Nigerian Journal of Information Security, 14(1), pp.11-20.

Yassin, S. and Alshamrani, R., 2021. SVM-based intrusion detection: Challenges and solutions. Arabian Journal of Computer Engineering, 6(4), pp.77-88.

Zhang, R., Bello, F. and Amadi, L., 2025. An ensemble-based intrusion detection system using Two-Class Decision Forests. International Journal of Data Science and Cyber Defense, 3(1), pp.1-15.
