Slide 1: Hello everyone, I'll be discussing the compatibility of different designs for
malware detection systems and briefly demonstrating our code. I will be comparing
five prototypes, each using different back-end technologies, models, core
functionalities and extra functionalities.
- Front-End Technologies: All prototypes use React JS for the front end, combined
with Material UI and Bootstrap. This gives a smooth, responsive UI across all designs.
- Back-End Technologies:
Prototypes 1, 2 and 3 use Python, Flask, and MySQL.
Prototypes 4 and 5 use Python, Django, and MongoDB.
=> The choice of back-end technology is determined by the client's requirements
and the specific needs of each design. For example, user information should be
stored in a relational database. Flask is more suitable for lightweight, flexible
systems such as small applications and quick prototypes, while Django is a
full-featured framework better suited to larger projects.
- Source Code: For each prototype, our source code is available on GitHub with
instructions on how to run each system on a local machine. Links for each prototype
are as follows
Slide 2: Next, I will show you the different models that we selected for each
prototype and discuss their performance based on key metrics such as accuracy
and training time.
- Dataset: All prototypes use PE Extracted Features for their datasets.
- Models:
Prototype 1 uses Random Forest, Extreme Gradient Boosting and a hybrid model
using an ensemble method.
Prototype 2 uses K-Nearest Neighbors, Light Gradient-Boosting Machine
and a hybrid model using a stacking method.
Prototype 3 uses a Convolutional Neural Network with Random Forest
hybrid model and a Convolutional Neural Network with Extreme Gradient Boosting
hybrid model.
Prototype 4 uses Gradient Boosting, Support Vector Machine and a hybrid model
using an ensemble method.
Prototype 5 uses Naive Bayes, Support Vector Machine and a hybrid model.
=> We implemented all of these prototypes and evaluated their performance
based on training time and accuracy to make the final decision on which one
would be best suited for our final solution. Let’s now look at the metrics.
- Accuracy Metrics: Read from screen
- Training Duration: Read from screen
=> Based on these performance metrics, we decided to eliminate the
combination of Gradient Boosting and Support Vector Machine, as this hybrid
model took 7 minutes to train, which is too long and would negatively impact the
user experience. Additionally, we dropped the Naive Bayes and Support Vector
Machine hybrid model because it achieved an accuracy of only 0.6 to 0.7, which is
lower than the other hybrid models that we implemented.
- Core Functionalities: All solutions allow admins to upload datasets and train
models. Once the training is complete, the system visualizes the results. Users
can then use these trained models to predict malware.
- Extra Functionalities:
Prototype 1 allows users to upload a PE file for feature extraction, which is
then used for malware prediction.
Prototype 2 allows admins to keep training models with new datasets over
time. The system then visualizes the results in a chart, displaying progress over
time with timelines.
Prototype 3 allows users to compare predictions from two different trained
models, along with accuracy probability metrics.
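
To make the hybrid models and the two metrics more concrete, here is a minimal sketch of a stacking hybrid measured on accuracy and training time. It is an illustration only, not the code from any prototype: the data is synthetic (scikit-learn's make_classification stands in for the PE Extracted Features), scikit-learn's GradientBoostingClassifier stands in for Light Gradient-Boosting Machine, and the function names build_stacking_model and train_and_evaluate are hypothetical.

```python
import time

from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the PE extracted features (assumption, not project data).
X, y = make_classification(
    n_samples=600, n_features=20, n_informative=8, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)


def build_stacking_model():
    """Hypothetical helper: two base learners combined by a meta-learner."""
    base_learners = [
        ("knn", KNeighborsClassifier(n_neighbors=5)),
        # Stand-in for LightGBM so the sketch only needs scikit-learn.
        ("gb", GradientBoostingClassifier(n_estimators=50, random_state=0)),
    ]
    # The final estimator learns from the base learners' cross-validated outputs.
    return StackingClassifier(
        estimators=base_learners,
        final_estimator=LogisticRegression(max_iter=1000),
        cv=3,
    )


def train_and_evaluate(model):
    """Fit the model and return (test accuracy, training time in seconds)."""
    start = time.perf_counter()
    model.fit(X_train, y_train)
    training_seconds = time.perf_counter() - start
    accuracy = accuracy_score(y_test, model.predict(X_test))
    return accuracy, training_seconds


accuracy, training_seconds = train_and_evaluate(build_stacking_model())
print(f"accuracy={accuracy:.3f} training_time={training_seconds:.2f}s")
```

Running the same train_and_evaluate helper over every candidate model would give one accuracy figure and one training time per model, which is the kind of comparison used to drop the slow or low-accuracy hybrids.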
Slide 3: Code Demonstration
Now, I will show you a brief code demonstration to showcase our implementation
of the three selected designs.
The top three screenshots are the database diagrams for each solution, showing
the relationships between the tables.
Below them are the code structures for our three solutions. Each solution
contains a client folder and a server folder. The client folder stores our React code,
which is responsible for the user interface. In the server folder, we store the API
routes using Flask. The model folder contains the code that interacts with the
database, and the controller folder holds the code for services such as training
ML and DL models and running predictions. The test folder contains unit tests for
each system. Each screenshot also displays the libraries that we use on the client
side and server side of each design.
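
As a rough illustration of how a Flask route in the server folder might hand a request to a controller function, here is a minimal sketch. The route path /api/predict, the function names predict_malware and predict_route, and the placeholder scoring rule are all assumptions made for illustration; they are not taken from the prototypes' actual code.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


def predict_malware(features):
    """Hypothetical controller function.

    A real controller would load a trained model and call it here; this
    placeholder rule only exists so the sketch runs on its own.
    """
    score = sum(features) / len(features)
    return {"label": "malware" if score > 0.5 else "benign", "score": score}


@app.route("/api/predict", methods=["POST"])
def predict_route():
    """Hypothetical API route: validate the JSON body, then call the controller."""
    payload = request.get_json(silent=True) or {}
    features = payload.get("features")
    is_valid = (
        isinstance(features, list)
        and len(features) > 0
        and all(isinstance(value, (int, float)) for value in features)
    )
    if not is_valid:
        return jsonify({"error": "features must be a non-empty list of numbers"}), 400
    return jsonify(predict_malware(features))


# Exercise the route without starting a server, using Flask's test client.
client = app.test_client()
response = client.post("/api/predict", json={"features": [0.9, 0.8, 0.7]})
bad_response = client.post("/api/predict", json={})
print(response.status_code, response.get_json())
print(bad_response.status_code, bad_response.get_json())
```

Keeping the route thin and putting the prediction logic in a separate function mirrors the split between API routes and controller code described above, and it lets the unit tests call the controller directly.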
Now, we will move on to demonstrate our applications.