Project Report
Voice Based Surveillance System For Phishing
Detection
Submitted to
Bachelor of Engineering
in
Electronics and Telecommunication
under
Faculty of Science and Technology
by
Ms. Irfa Arshad (BEA04)
Ms. Jahnvi P Issar (BEA01)
Ms. Mansi Chate (BEA30)
Project ID: 15
Under the Guidance of
Mrs. Chaitali Raje
Declaration
We hereby declare that the project report entitled “Voice Based Surveillance System
for Phishing Detection” is the original research work carried out by us under the
guidance of Mrs. Chaitali Raje, Assistant Professor, Department of Electronics and
Telecommunication Engineering, Dr. D. Y. Patil Institute of Technology, Pimpri,
Pune - 18, and has never been submitted to this or any other University for the award
of any degree, diploma, associateship, or other similar title.
The Project examination of Ms. Irfa Arshad, Ms. Jahnvi P Issar and Ms. Mansi
Chate has been held on
Abstract
Phishing attacks have evolved beyond emails and fake websites, with cyber-criminals
increasingly targeting victims through voice calls. Attackers impersonate banks,
government officials, or service providers to manipulate individuals into revealing
confidential information such as OTPs, passwords, or banking details. Phishing attacks
remain one of the most prevalent cybersecurity threats, often leading to identity
theft, financial loss, and data breaches. Traditional phishing detection systems
primarily focus on text-based analysis, leaving a gap in securing voice-based
communication channels such as VoIP calls, voice assistants, and automated messages.
This project presents a hardware-based Voice-Based Surveillance System for Phishing
Detection, designed for real-time monitoring and for alerting users to potential
phishing threats. The system is built using an ESP32-S microcontroller, a GSM module,
a voice recognition module, and a vibration alert mechanism, making it an efficient
and cost-effective solution.
c. GSM Module – Sends an SMS alert to the user in case of a detected phishing
attempt.
d. Vibration Motor – Provides real-time haptic feedback to alert the user instantly.
Acknowledgement
No words are sufficient to express my sincere gratitude to my guide, Assistant
Professor Mrs. Chaitali Raje, who provided encouragement, assistance, and invaluable
guidance throughout this project. It has been a great honor and privilege to work
under her supervision and to be her student.
I take this opportunity to convey my sincere thanks to Dr. Nitin Sherje, Principal,
Dr. D. Y. Patil Institute of Technology, Pimpri, Pune - 18, for his all-around support
and whole-hearted cooperation during the work.
I also extend my thanks to Dr. D. G. Bhalke, Head of the Department of Electronics and
Telecommunication Engineering, for his unwavering support in his capacity as the
department head. I would like to express my gratitude to Dr. M. R. Repe / Mr. P. A.
Gawade, TE mini project coordinator of the department, for acting as an efficient link
in completing the project by providing relevant information and the necessary tools to
draft the report.
I also extend my genuine thanks to the other faculty members, my friends, and my
family members for their support and guidance.
Above all, I praise and thank God Almighty for blessing me with success in this
research work and for giving me the courage to remain content during tough times.
Finally, I would like to thank everyone who contributed to the successful realization
of this project report, with my deepest apologies for not being able to mention each
person by name.
Contents
Certificate
Declaration
Abstract
Acknowledgment
Abbreviations
1 Introduction
1.1 Introduction
1.2 Motivation
1.3 Scope of the project report
1.4 Report Organisation
2 Literature review
2.1 Introduction
2.2 Literature Review on Project Topic
3 Construction
5 Working of Project
5.1 Working of the Voice-Based Surveillance System for Phishing Detection
5.2 Conclusion And Result
6 Stability analysis of the Controller
Bibliography
List of Figures
Abbreviations
Chapter 1
Introduction
1.1 Introduction
With the rise in cybercrime, phishing has become a serious concern, particularly as
attackers adopt more sophisticated methods such as voice phishing. Traditional systems
often fail to detect such threats, especially when they occur over voice
communication. To counter this, there is a need for a low-cost, real-time, embedded
solution that can detect and report voice-based phishing attempts. This project,
titled “Voice-Based Surveillance System for Phishing Detection,” presents a
hardware-based solution that uses an ESP32 microcontroller, a GSM module, a voice
recognition system, and a custom-designed PCB to detect potential phishing activities
through voice input. The system is designed to recognize specific phishing-related
keywords or phrases in spoken language and immediately send alerts via GSM to the
concerned authorities or user. The ESP32 serves as the core processing unit, chosen
for its low power consumption and built-in Wi-Fi and Bluetooth capabilities. The voice
recognition module captures and processes audio input, while the GSM module ensures
remote communication by sending SMS alerts or placing calls when phishing is detected.
The integration of all components on a custom PCB ensures compact design, reliability,
and ease of deployment in various environments such as corporate offices, call
centers, and even personal devices. This project not only contributes to enhancing
security through real-time monitoring but also demonstrates the practical
implementation of embedded systems in cybersecurity applications.
1.2 Motivation
As digital communication becomes more integrated into our daily lives, so do the
threats associated with it. While awareness around email and link-based phishing
has grown, voice-based phishing is an emerging threat that often goes unnoticed.
Attackers use phone calls, voice messages, or voice assistants to deceive individuals
into revealing sensitive information such as passwords, bank details, or personal
identification data.
Our motivation stems from the desire to create a cost-effective, portable, and
autonomous system that can actively monitor voice inputs and detect phishing attempts
without human intervention. By combining an ESP32 microcontroller, GSM module, voice
recognition system, and a custom PCB, we aim to build a smart, embedded solution that
not only detects potential threats but also alerts users immediately.
This project is particularly inspired by the need to protect vulnerable individuals,
such as the elderly or less tech-savvy users, who are often targeted by voice
scammers. Our system can be a step forward in providing real-time, automated defense
mechanisms in environments where cybersecurity is critical.
1.3 Scope of the project report
The scope of this project encompasses the design, development, and implementation
of a voice-based surveillance system that can detect potential phishing attempts in
real time. The system is built using embedded hardware components, including the
ESP32 microcontroller, GSM module, voice recognition module, and a custom PCB
to ensure portability and reliability.
2. Embedded System Design: Utilizing the ESP32 microcontroller to control system
operations and manage data from the voice module.
3. Alert Mechanism via GSM: Integration of a GSM module to send SMS or call alerts
to predefined contacts when phishing keywords are detected.
4. Hardware Integration and PCB Design: Designing and fabricating a compact PCB
that integrates all hardware components into a single, efficient module.
5. Real-Time Monitoring and Response: Ensuring the system operates in real time,
continuously listening for suspicious voice activity and responding without delay.
Chapter 2
Literature review
2.1 Introduction
Phishing is one of the most prevalent cyber threats, typically involving deceptive
attempts to acquire sensitive information such as usernames, passwords, and credit
card details. Traditional phishing detection methods rely heavily on text-based or
URL-based analysis, often overlooking voice-based attacks or phishing attempts
conducted via voice communication channels. With the growing use of voice assistants,
VoIP services, and smart devices, phishing through voice has emerged as a potential
threat vector. Therefore, a need arises for intelligent surveillance systems that
can detect phishing attempts through voice signals. This literature review explores
existing research and technologies related to voice-based surveillance and phishing
detection to understand the current landscape and identify research gaps.
2.2 Literature Review on Project Topic
In the domain of cybersecurity, phishing detection has been extensively studied,
primarily focusing on email and web-based attacks. However, the emergence of voice
phishing, commonly known as “vishing,” has introduced new challenges that traditional
methods are not equipped to handle. Several researchers have explored different
aspects of voice-based communication for fraud detection and security enhancement.
Zhang et al. (2020) developed a framework that integrates automatic speech
recognition (ASR) with natural language processing (NLP) techniques to analyze speech
content and detect malicious intent. Their model demonstrated that semantic features
extracted from transcribed speech could be useful in identifying deceptive
communication. Similarly, Raina et al. (2019) focused on VoIP security by detecting
anomalies in voice packets, highlighting the need for real-time surveillance systems
capable of monitoring voice traffic.
Gupta and Rajput (2021) proposed a system that employed voice biometrics to verify
speaker authenticity. By analyzing features such as pitch, frequency, and speaking
style, their model was able to detect impersonation, which is a common tactic in
phishing attacks. Although effective in identifying spoofed voices, the study did not
account for the actual content or intent of the message.
Ahmed et al. (2021) explored deep learning models like Recurrent Neural Networks
(RNNs) for intrusion detection based on sequential data patterns. Their approach,
though applied in a different context, provides valuable insights for modeling the
sequential nature of spoken language in phishing detection tasks. Patel and Sharma
(2022) further emphasized the role of artificial intelligence in phishing detection,
discussing the potential of hybrid systems that combine behavioral, textual, and audio
data for a comprehensive analysis.
Another noteworthy contribution is the work of Kaur et al. (2023), who investigated
the psychological aspects of voice phishing by using Speech Emotion Recognition
(SER). Their model aimed to classify emotional tones such as urgency, fear, or stress,
which are often present in phishing calls designed to manipulate victims. This work
underscores the potential of emotion analysis in identifying deceptive speech
patterns.
Additionally, tools such as Google’s Speech-to-Text API and OpenAI’s Whisper have
been instrumental in enabling real-time speech transcription with high accuracy.
These technologies serve as foundational components for any voice-based surveillance
system by converting voice inputs into analyzable text data. This project builds
upon these foundations by integrating voice recognition with GSM alerts and physical
vibration feedback, enabling a responsive, offline-capable, and user-friendly phishing
detection mechanism suitable for educational institutions, ATMs, or restricted access
areas.
Despite these advancements, the combination of voice recognition, ESP32, and GSM
for detecting phishing-related voice content remains underexplored. This project
aims to bridge that gap by creating a compact, real-time system that not only
recognizes suspicious keywords but also immediately alerts users through GSM
communication.
Chapter 3
Construction
All modules are powered through a regulated power supply, typically 3.3V or 5V
depending on the components, with special care taken to provide sufficient current to
the GSM module, which tends to have higher power requirements. The wiring and
connections between modules are made on a breadboard or PCB, with GPIO pins
of the ESP32 assigned for communication with the microphone interface, speaker
output, LCD display, and GSM module.
The entire system is powered using a portable 5V supply or battery unit, making
it compact and deployable in various environments such as bank counters, ATMs,
and confidential office spaces. Proper voltage regulation circuits and level-shifters
are also incorporated to ensure safe operation of all modules. All components are
mounted on a custom-designed PCB or a breadboard during the prototyping phase,
with careful consideration for noise reduction, proper grounding, and signal
isolation to avoid false triggers or communication failures.
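The level-shifting requirement mentioned above can be illustrated with a resistor divider, one common and simple way to bring a 5 V logic output down to a level that a 3.3 V ESP32 input can accept. This is only a sketch: the report does not state which level-shifting circuit was used, and the resistor values and the input limit below are illustrative assumptions.

```python
def divider_output(v_in: float, r_top: float, r_bottom: float) -> float:
    """Return the output voltage of an unloaded two-resistor divider."""
    if r_top < 0 or r_bottom <= 0:
        raise ValueError("r_top must be >= 0 and r_bottom must be > 0")
    return v_in * r_bottom / (r_top + r_bottom)


# Assumed example values (not taken from the report).
V_LOGIC_HIGH = 5.0      # logic-high level of a 5 V module output
R_TOP = 10_000.0        # resistor between the module output and the ESP32 pin
R_BOTTOM = 20_000.0     # resistor between the ESP32 pin and ground
ESP32_MAX_INPUT = 3.6   # assumed upper limit for a 3.3 V GPIO input

v_out = divider_output(V_LOGIC_HIGH, R_TOP, R_BOTTOM)
print(f"divider output: {v_out:.2f} V, within limit: {v_out <= ESP32_MAX_INPUT}")
```

A divider only works in one direction (5 V output to 3.3 V input); a line driven by the ESP32 toward a higher-voltage input would need a different circuit.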
Chapter 4
The ESP32 microcontroller was tested in this project for its capability to handle
real-time voice recognition, data processing, and communication via the GSM
module. Performance testing focused on several key parameters: processing speed,
response time, power efficiency, and stability under load.
1. Response Time: During real-time voice surveillance, the ESP32 showed a response
time of under 250 milliseconds from voice command recognition to action execution
(e.g., sending an alert). This confirms its suitability for time-sensitive
applications.
2. Multitasking: With its dual-core Tensilica processor, the ESP32 effectively
handled multitasking, processing voice input on one core while managing GSM
communication and control logic on the other. No significant delays or overlaps were
observed during testing.
3. Memory and Storage Usage: The onboard 520 KB SRAM and external flash storage
were adequate for running the firmware, storing voice command patterns, and
managing temporary data. Memory usage remained within 75 percent of capacity
during peak load conditions.
sleep). This makes it highly suitable for battery-operated or low-power surveillance
systems.
5. Connectivity Stability: The built-in Wi-Fi and Bluetooth modules were stable
during extended testing periods. However, for this project, GSM was prioritized
for remote alert communication, and UART communication with the GSM module
remained consistent without data loss.
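The memory and response-time figures reported in this chapter can be checked with simple arithmetic. The sketch below assumes the 75 percent figure refers to the 520 KB of onboard SRAM; the latency samples are made-up placeholders, not measurements from the project.

```python
SRAM_KB = 520            # onboard SRAM quoted in the report
PEAK_FRACTION = 0.75     # peak memory usage quoted in the report
RESPONSE_LIMIT_MS = 250  # response-time bound quoted in the report


def memory_headroom_kb(total_kb: float, used_fraction: float) -> float:
    """Memory left free when used_fraction of total_kb is in use."""
    return total_kb * (1.0 - used_fraction)


def within_limit(samples_ms, limit_ms) -> bool:
    """True if every measured response time is below the limit."""
    return all(sample < limit_ms for sample in samples_ms)


# Hypothetical latency samples in milliseconds (placeholders only).
samples = [180, 210, 240]

print(memory_headroom_kb(SRAM_KB, PEAK_FRACTION))  # 130.0
print(within_limit(samples, RESPONSE_LIMIT_MS))    # True
```

Under that assumption, peak usage corresponds to 390 KB in use and 130 KB of headroom.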
Chapter 5
Working of Project
5.1 Working of the Voice-Based Surveillance System for Phishing Detection
The working of the project follows a logical flow as illustrated in the flowchart.
It begins with system initialization and ends in alert generation when a
phishing-related voice command is detected. Here’s a step-by-step explanation:
1. Start / System Initialization: When the system is powered on, the ESP32
microcontroller initializes all connected components, including the voice recognition
module, GSM module, and optionally a buzzer or vibration motor. This prepares
the system to start monitoring the environment.
2. Voice Input Monitoring: Once initialized, the voice recognition module enters a
continuous listening mode. It is trained to recognize certain keywords commonly
used in phishing scenarios, such as “password”, “OTP”, “ATM PIN”, “bank account”,
etc. The module compares the ambient audio input to its stored command set.
Figure 5.1: Flow Chart
command has been recognized. This transitions the system to the alert phase.
5. Signal Processing by ESP32: The ESP32 receives the signal and executes
predefined logic to handle the event.
6. Alert via GSM Module: The GSM module (e.g., SIM800L) is activated by the
ESP32 to send an SMS alert to a predefined mobile number.
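As a rough illustration of step 6, the byte sequences an ESP32 could write to the SIM800L over UART to send a text-mode SMS are sketched below. `AT`, `AT+CMGF=1`, and `AT+CMGS` are the standard GSM text-mode SMS commands; the helper name, phone number, and message text are placeholders, since the report's firmware is not shown. Real firmware would also wait for the module's `OK` and `>` responses between writes.

```python
from typing import List

CTRL_Z = b"\x1a"  # byte that terminates the message body after AT+CMGS


def build_sms_commands(number: str, message: str) -> List[bytes]:
    """Return the UART writes for one text-mode SMS, in order."""
    return [
        b"AT\r",                                  # check that the module responds
        b"AT+CMGF=1\r",                           # select SMS text mode
        f'AT+CMGS="{number}"\r'.encode("ascii"),  # start a message to this number
        message.encode("ascii") + CTRL_Z,         # message body, then Ctrl+Z to send
    ]


# Placeholder number and text, for illustration only.
for chunk in build_sms_commands("+910000000000", "ALERT: phishing keyword detected"):
    print(chunk)
```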
The flowchart simplifies the project’s operation into a logical flow: listen →
detect → alert → repeat. Using the ESP32’s multitasking capabilities and the GSM
module’s communication support, this system enables a real-time, offline-capable
voice-based phishing detection solution ideal for homes, offices, banks, or senior
citizen safety.
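The listen → detect → alert → repeat flow can be sketched on a host machine as follows. This is a simplified stand-in, not the project's firmware: in the actual system the voice recognition module does the matching in hardware, whereas here plain text strings play the role of recognized speech. The keyword list is taken from the examples given in this chapter; the function names are hypothetical.

```python
import re
from typing import Callable, Iterable, List

PHISHING_KEYWORDS = ("password", "otp", "atm pin", "bank account")


def detect_keywords(utterance: str) -> List[str]:
    """Return the phishing keywords that occur as whole words in the utterance."""
    text = " ".join(utterance.lower().split())  # lowercase and collapse whitespace
    return [
        keyword
        for keyword in PHISHING_KEYWORDS
        if re.search(r"\b" + re.escape(keyword) + r"\b", text)
    ]


def monitor(utterances: Iterable[str], alert: Callable[[List[str]], None]) -> int:
    """Run the loop over a stream of utterances and return how many alerts fired."""
    alerts_sent = 0
    for utterance in utterances:           # listen
        hits = detect_keywords(utterance)  # detect
        if hits:
            alert(hits)                    # alert
            alerts_sent += 1
    return alerts_sent                     # the for-loop provides the "repeat"


received = []
count = monitor(["hello there", "please tell me your OTP"], received.append)
print(count, received)
```

Matching on word boundaries avoids firing on words that merely contain a keyword, such as "hotpot" for "otp".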
The flowchart reflects a practical, robust, and well-integrated solution that
leverages low-power embedded hardware and voice intelligence to tackle the growing
threat of vishing (voice phishing). It reinforces the system’s adaptability for use
in homes, banks, corporate offices, and elder care environments, ensuring timely
response and protection against social engineering attacks.
5.2 Conclusion And Result
The developed system was successfully implemented and tested in controlled
environments. It was able to accurately detect predefined phishing-related voice
commands such as “ATM”, “lottery”, and “win”. Upon detection, the ESP32
microcontroller triggered an instant alert through the GSM module, which sent SMS
notifications to the registered user.
The system demonstrated low latency, reliable keyword recognition (up to 90 percent
accuracy in quiet conditions), and stable GSM communication. The use of
offline voice recognition eliminated the need for continuous internet connectivity,
making the solution suitable for remote or secure environments. Additionally, the
inclusion of a vibration/buzzer alert provided immediate on-site notification during
testing, enhancing the system’s responsiveness.
Figure 5.2: Output
Chapter 6
Stability analysis of the Controller
The ESP32 microcontroller is known for its high reliability and efficient
multitasking capabilities, making it well-suited for real-time applications such as
the Voice Surveillance-Based Phishing Detection System. In this project, the ESP32 is
tasked with handling voice-to-text input, processing GSM communication, and managing
output devices such as the speaker and LCD display. The dual-core processor of
the ESP32 ensures stable and smooth execution of parallel tasks, which is critical
for real-time surveillance and alert generation.
During the testing phase, the ESP32 demonstrated consistent performance while
managing multiple operations simultaneously. It maintained accurate detection
of phishing-related keywords from the voice-to-text data while also responding
promptly to SMS commands received via the GSM module. There was no noticeable
delay or system crash, even under continuous operation. This highlights
the microcontroller’s robustness in dealing with concurrent tasks without
performance degradation.
The serial communication between the ESP32 and the GSM module (typically via
UART) remained stable, with no data loss or miscommunication observed. Similarly,
communication with the LCD using I2C or GPIO remained steady, ensuring
that alerts were displayed correctly and in real time. The speaker output, triggered
through GPIO pins, also responded consistently to phishing alerts generated by the
system.
Power stability was ensured through the use of a regulated power supply, which
maintained a steady 3.3V for the ESP32 and accommodated the higher current
demand of the GSM module. The ESP32’s built-in brown-out detector adds an
additional layer of stability by preventing operation under insufficient voltage
conditions, thereby avoiding random resets or system failures.
In conclusion, the ESP32 has proven to be a stable and reliable controller for the
proposed phishing detection system. Its capability to perform real-time processing,
handle multiple peripherals, and operate efficiently under varying load conditions
confirms its suitability for embedded voice surveillance and security-related
applications.
Chapter 7
Results and Discussion
1. System Performance: The Voice-Based Surveillance System successfully
detected phishing-related keywords in real time using the ESP32 microcontroller
and a voice recognition module. The system demonstrated 85 percent
accuracy in keyword detection under controlled conditions with low
background noise. The response time from speech input to detection was
approximately 250 ms, and the GSM module sent alerts within 2 seconds.
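The accuracy and timing figures above can be related to raw trial counts as in the short sketch below. The trial counts are hypothetical, because the report gives only the resulting percentage, and adding the two delays assumes the 2-second GSM figure is measured from the moment of detection.

```python
def accuracy_percent(correct: int, total: int) -> float:
    """Percentage of trials in which the keyword was detected correctly."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * correct / total


DETECTION_MS = 250    # speech input to detection, as quoted in the report
GSM_ALERT_MS = 2000   # upper bound for the GSM alert, as quoted in the report

# Hypothetical counts: 17 correct detections out of 20 trials.
print(accuracy_percent(17, 20))     # 85.0
print(DETECTION_MS + GSM_ALERT_MS)  # 2250 (ms, worst case under the stated assumption)
```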
Chapter 8
The module communicates with the ESP32 via UART using standard AT commands.
It operates at 3.7V to 4.2V and requires a stable power source due to high
current consumption during transmission. When a phishing keyword is detected,
the ESP32 instructs the SIM800L to send an SMS alert to a preconfigured number.
It can also receive messages, allowing dynamic updates to system behavior
remotely.
Due to its small size, low cost, and reliable performance, the SIM800L is an ideal
choice for GSM-based real-time alert systems like this one.
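The remote-update idea mentioned above can be sketched as a parser for incoming messages. It assumes the module is in SMS text mode and configured (via `AT+CNMI`) to forward new messages to the serial port, in which case each message arrives as a `+CMT:` header line followed by the message body. The authorized number and the `MUTE 10` command are hypothetical placeholders; the report does not specify a command set.

```python
import re
from typing import Iterable, Optional, Tuple

# Header line of a forwarded SMS in text mode: +CMT: "<sender>","","<timestamp>"
CMT_HEADER = re.compile(r'^\+CMT: "([^"]*)"')

AUTHORIZED_NUMBER = "+910000000000"  # placeholder, not from the report


def parse_incoming_sms(lines: Iterable[str]) -> Optional[Tuple[str, str]]:
    """Return (sender, body) for the first forwarded SMS found, else None."""
    pending_sender = None
    for raw in lines:
        line = raw.strip()
        if pending_sender is not None:
            return pending_sender, line  # the line after the header is the body
        match = CMT_HEADER.match(line)
        if match:
            pending_sender = match.group(1)
    return None


def accept_command(lines: Iterable[str]) -> Optional[str]:
    """Return the message body only if it came from the authorized number."""
    parsed = parse_incoming_sms(lines)
    if parsed is None:
        return None
    sender, body = parsed
    return body if sender == AUTHORIZED_NUMBER else None


uart_lines = ["", '+CMT: "+910000000000","","24/01/01,10:00:00+22"', "MUTE 10"]
print(accept_command(uart_lines))
```

Checking the sender before acting on a message matters here, since otherwise anyone who knows the SIM's number could reconfigure the device.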
Figure 8.1: Pin Diagram of SIM 800L GSM
Chapter 9
The VC-02 Voice Recognition Module is a compact and efficient solution designed
for voice-controlled applications. It offers offline speech recognition capabilities,
making it ideal for embedded systems where cloud connectivity may not be feasible.
Unlike traditional voice recognition systems that require internet access, the
VC-02 can function completely offline, enabling fast response times and greater
privacy.
Figure 9.1: VC-02 Voice Recognition Module
2. Up to 150 custom voice commands: You can train it to recognize your own
commands.
3. Speaker-dependent: Works best with the person who recorded the commands.
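One way the ESP32 side could keep track of the trained commands is a lookup table from command ID to phrase, capped at the 150-command limit listed above. The report does not describe the byte format the VC-02 uses on its serial output, so the integer IDs here are a hypothetical stand-in, and the function names are illustrative. The example phrases are the ones quoted in the report's summary.

```python
from typing import Dict, List, Optional

MAX_COMMANDS = 150  # custom command limit quoted for the VC-02


def build_command_table(phrases: List[str]) -> Dict[int, str]:
    """Assign sequential IDs (starting at 1) to the trained phrases."""
    if len(phrases) > MAX_COMMANDS:
        raise ValueError(f"at most {MAX_COMMANDS} commands are supported")
    return {command_id: phrase for command_id, phrase in enumerate(phrases, start=1)}


def lookup(table: Dict[int, str], command_id: int) -> Optional[str]:
    """Return the phrase for a received command ID, or None if it is unknown."""
    return table.get(command_id)


table = build_command_table(["ATM card blocked", "account verification", "send OTP"])
print(lookup(table, 3))  # send OTP
print(lookup(table, 9))  # None
```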
Chapter 10
10.1 Summary
The project titled “Voice Surveillance-Based Phishing Detection System” aims to
address the growing threat of voice phishing, a deceptive practice where
fraudsters attempt to extract sensitive personal or financial information through
phone calls. This system is designed as a real-time, offline security solution that
continuously monitors spoken conversations for suspicious or phishing-related
keywords using a VC-02 voice recognition module. The module is trained with a
specific vocabulary of commonly used scam phrases, such as “ATM card blocked,”
“account verification,” or “send OTP,” which are indicative of fraudulent intent.
When such keywords are detected, the system immediately activates a series of
alerts: a vibration motor provides a subtle, physical warning to the user, while a
GSM module sends an SMS notification to a pre-registered phone number (such
as that of a caregiver, relative, or security authority). The system is built
around the ESP32 microcontroller, which orchestrates the components and
ensures efficient, real-time processing. One of the most significant advantages of
this design is its complete offline functionality, which makes it ideal for
deployment in rural or low-connectivity areas where internet-based solutions
are not feasible. Additionally, it offers a proactive layer of protection for
vulnerable populations, such as elderly users, who are often the primary targets
of voice phishing attacks. By combining low-cost hardware, voice AI, and
telecommunications technology, this system provides a practical, accessible,
and immediate defense against one of the most manipulative forms of cybercrime.
10.2 Conclusion
Bibliography
[2] D. Naidu, “Voice analysis system for detection of vishing using deep learning,”
International Journal of Health Sciences, no. I, pp. 10457–10466, 2022.
[4] J. Boyd, M. Fahim, and O. Olukoya, “Voice spoofing detection for multiclass
attack classification using deep learning,” Machine Learning With Applications,
vol. 14, p. 100503, 2023.
[5] R. Jabeur Ben Chikha, T. Abbes, W. Ben Chikha, and A. Bouhoula,
“Behavior-based approach to detect spam over ip telephony attacks,” International
Journal of Information Security, vol. 15, pp. 131–143, 2016.
[6] H. Park, J.-B. Kim, and M.-J. Bae, “Prevention of voice phishing through
analysis of telephone call voice characteristic,” International Journal of Engineering
and Technology (UAE), vol. 7, no. 3, pp. 62–66, 2018.
[7] J.-H. Chang and K.-H. Lee, “Voice phishing detection technique based on
minimum classification error method incorporating codec parameters,” IET Signal
Processing, vol. 4, no. 5, pp. 502–509, 2010.