SMART SURVEILLANCE SYSTEM USING MACHINE LEARNING
Abstract: In an era where safety and security are top priorities in residential and commercial environments, traditional surveillance systems fall short because they rely on manual human monitoring and delayed response mechanisms. This paper proposes a smart surveillance system that uses machine learning techniques to enhance situational awareness and real-time threat detection. It integrates computer vision techniques for object monitoring, facial recognition, noise detection, and entry/exit monitoring. Developed with Python and OpenCV, the system employs algorithms such as the Structural Similarity Index (SSIM), Haar cascades, and Local Binary Patterns Histograms (LBPH) to monitor anomalies and recognize individuals. This reduces monitoring fatigue and response delay, offering a scalable and efficient alternative to conventional surveillance. The proposed system is therefore well suited to high-risk locations such as airports, malls, and other public infrastructure. It aims to contribute to the emerging field of intelligent video surveillance, especially proactive security enforcement.
Index Terms - Computer Vision, Object Detection, Real-Time Alert System, Face Recognition.
I. INTRODUCTION
Surveillance technology has advanced rapidly as concerns about security threats in public and private spaces continue to grow. Classical closed-circuit television (CCTV) surveillance is limited by its dependence on human monitoring and post-incident analysis. Research conducted by the National Institute of Justice documents the difficulty even highly trained operators have in detecting incidents solely through manual observation of video feeds (Norris & Armstrong, 2007). This inefficiency has driven demand for systems that can intelligently detect, process, and interpret data on their own.
A smart surveillance system equipped with machine learning and computer vision capabilities represents a significant advance in real-time detection, classification, and alert generation. Such systems can learn environmental patterns, distinguish between normal and anomalous behavior, and respond accordingly. Using Haar cascade classifiers for face detection and LBPH for facial recognition, the proposed system improves detection rates and reduces false positives (Ahonen et al., 2006).
This study describes the design and development of a Python-based graphical user interface (GUI) application that performs fundamental surveillance tasks such as object tracking, visitor detection, noise detection, and face identification. Combining these capabilities shifts surveillance from a reactive to a proactive mode, ensuring a prompt security response while freeing security staff from continuous manual monitoring. With this intention, the project aims to contribute to AI-assisted surveillance systems that are adaptive, scalable, and efficient.
II. LITERATURE SURVEY
Intelligent surveillance systems have largely been built on the premise that AI has an increasing role to play in security applications. Classical video monitoring systems demanded constant human supervision, which inevitably became ineffective and, on occasion, error-prone. Recent studies show that introducing machine learning techniques has transformed surveillance systems into intelligent, self-contained systems that can interpret visual data and react to abnormal activities in real time (Sreenu & Saleem Durai, 2019). Numerous machine learning algorithms have been proposed for detection, tracking, and facial recognition. Viola and Jones introduced fast object detectors using Haar-like features and cascade classifiers in 2001, and these still form the backbone of face detection modules today. This technique is complemented by the Local Binary Pattern Histogram (LBPH) method, which provides robust facial feature extraction under varying illumination conditions and face orientations (Ahonen et al., 2006). More recently, convolutional neural networks (CNNs) have gained prominence for object and face recognition, offering higher precision but usually at higher computational cost (Huang et al., 2020).
Structural similarity measures such as SSIM (Wang et al., 2004) have been used to compare image frames and detect changes in the environment for motion detection and scene understanding. This method is particularly suited to detecting the removal or addition of an object in the surveillance frames. Other approaches use background modelling with Gaussian mixture models (GMM) and optical flow analysis, which allow moving objects to be detected and tracked even in complex environments (Maddalena & Petrosino, 2008).
Existing systems still face problems such as false alarms, limited scalability, and high computational load. The present system aims to address these issues by exploiting lightweight, well-studied algorithms such as Haar cascades and LBPH, together with real-time decision logic, so that it remains usable in low-resource environments such as small businesses and residential settings.
III. REQUIREMENTS
The proposed security system packages computer vision and machine learning functions into a desktop application. Since the system is primarily software-based, it requires only moderate hardware and software specifications to support real-time processing and user interaction.
3.1 SOFTWARE REQUIREMENTS
The system is implemented in Python 3.x, an open-source, high-level programming language with a rich library ecosystem for machine learning and image processing. The main libraries used are:
• OpenCV: A widely used computer vision library offering tools for image processing, face detection, and motion tracking (Bradski & Kaehler, 2020).
• NumPy: Supports numerical operations and array manipulation.
• Scikit-image: Provides additional image-processing functions such as filtering, segmentation, and feature extraction.
• Tkinter: Used to build the graphical user interface (GUI) for user interaction.
• Django: Allows the surveillance system to be extended to the Web, enabling remote access.
The system runs on Windows, macOS, or Linux with any Python 3 version.
3.2 HARDWARE REQUIREMENTS
Recommended configuration for best performance:
• A standard laptop or desktop PC with at least 8 GB of RAM, a multi-core processor, and, ideally, GPU acceleration for faster frame analysis.
• A webcam with 720p resolution or higher for reliable face and motion detection.
• Adequate storage for saved video frames and detection snapshots.
• External components such as LED lighting can improve face recognition in low-light environments.
This configuration balances performance and cost efficiency, making the system accessible for small-scale business or home surveillance applications.
IV. METHODOLOGY
The proposed smart surveillance system is modular in design. Its four basic functions are object monitoring, facial recognition, noise detection, and visitor tracking. All of these modules are implemented with lightweight yet sufficiently robust computer vision algorithms suited to real-time processing on consumer-grade hardware.
Figure 4: Dataflow Diagram
The project comprises the following elements: object monitoring, noise detection, facial recognition, and visitor in/out detection. The functions that can be carried out with this project are listed below:
• Monitor
• Identify the person
• Detect the noise
• In and out Detection
1. Monitoring Feature
Figure 1: Structural Similarity Index Matrix Chart
The backbone of object monitoring is the comparison of successive video frames using the Structural Similarity Index Measure (SSIM), a perception-based metric that weighs changes in luminance, contrast, and structure (Wang et al., 2004). When major variations are detected between two frames, suggesting that an object has been moved or removed, the system flags the event and raises an alert. This works well when the camera is static and the background is consistent.
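A minimal sketch of this frame comparison is given below. It assumes scikit-image 0.16 or newer (for skimage.metrics.structural_similarity) and a static webcam; the 0.80 similarity threshold and the window name are illustrative choices rather than values taken from the implemented system.

import cv2
from skimage.metrics import structural_similarity as ssim

cap = cv2.VideoCapture(0)
_, reference = cap.read()                          # reference frame captured at start-up
reference_gray = cv2.cvtColor(reference, cv2.COLOR_BGR2GRAY)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # SSIM near 1.0 means the scene is unchanged; a drop suggests an object was added or removed
    score, _ = ssim(reference_gray, gray, full=True)
    if score < 0.80:                               # assumed threshold, tuned per scene
        print("Object change detected, SSIM =", round(score, 3))
    cv2.imshow("Monitor", frame)
    if cv2.waitKey(30) & 0xFF == ord("q"):
        break

cap.release()
cv2.destroyAllWindows()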
2. Identify Feature
This feature determines whether the person in the frame is known or unknown. This is done in two steps:
2.1 Detect Faces in the Frames
This is done with Haar cascade classifiers, which are built into Python's OpenCV module.
Figure 2: Working of Cascade Classifier
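The detection step can be sketched as follows using the frontal-face cascade bundled with the pip opencv-python package; the input file name and the detectMultiScale parameters are illustrative assumptions.

import cv2

# Frontal-face Haar cascade shipped with the opencv-python package
face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

frame = cv2.imread("frame.jpg")                    # any captured frame (placeholder file name)
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

# detectMultiScale returns one (x, y, w, h) rectangle per detected face
faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
for (x, y, w, h) in faces:
    cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)

cv2.imwrite("faces.jpg", frame)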
2.2 Use the LBPH Algorithm for Face Recognition
LBPH uses four parameters: radius, neighbors, grid x, and grid y. LBPH first generates an intermediate image, based on the radius and neighbors parameters, that represents the local texture of the original image; histograms computed over the grid cells then form the face descriptor.
Figure 2.1.1: LBPH for Face Recognition
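A recognition sketch is shown below. It assumes the opencv-contrib-python build, since the cv2.face module is not part of the base OpenCV package; the model file name and the confidence cut-off of 70 are illustrative assumptions.

import cv2

# The four LBPH parameters noted above, set to OpenCV's defaults
recognizer = cv2.face.LBPHFaceRecognizer_create(
    radius=1, neighbors=8, grid_x=8, grid_y=8)
recognizer.read("lbph_model.yml")                  # model trained earlier on known faces

face_gray = cv2.imread("detected_face.jpg", cv2.IMREAD_GRAYSCALE)
label, confidence = recognizer.predict(face_gray)

# Lower confidence means a closer match; the cut-off is an assumed tuning value
if confidence < 70:
    print("Known person, label =", label)
else:
    print("Unknown person")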
3. Detection of Noise in the Frame
Frame differencing is an established approach for detecting such noise, since any anomaly in motion or visual interference appears as pixel-level change. Essentially, the absolute difference between two consecutive grayscale frames is computed, and sufficiently large differences are interpreted as movement or interference. The method is computationally efficient and very effective for indoor surveillance, especially when background noise is comparatively low (Lavanya, 2014).
Figure 3: Noise detection Grid matrix
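The frame-differencing step can be sketched as follows; the pixel threshold of 25 and the minimum changed-pixel count of 500 are assumed tuning values, not figures reported for the system.

import cv2

cap = cv2.VideoCapture(0)
_, prev = cap.read()
prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    diff = cv2.absdiff(prev_gray, gray)                    # pixel-level change between frames
    _, mask = cv2.threshold(diff, 25, 255, cv2.THRESH_BINARY)
    if cv2.countNonZero(mask) > 500:                       # enough changed pixels -> movement/interference
        print("Noise detected in frame")
    prev_gray = gray
    cv2.imshow("Noise detection", frame)
    if cv2.waitKey(30) & 0xFF == ord("q"):
        break

cap.release()
cv2.destroyAllWindows()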
4. Visitor Entry/Exit Monitoring
This function checks whether anyone has entered or exited the monitored space. It works as follows:
1. The frame is first scanned for motion.
2. If motion is present, the system assesses whether it is moving from the left or from the right.
If motion travels from left to right, the frame is captured and the event is treated as an entry; motion from right to left is treated as an exit.
This feature therefore does not rely on sophisticated mathematics: motion is detected, a rectangle is drawn around the moving region, and its coordinates are checked to establish which side the motion came from and in which direction it is heading, as illustrated in the sketch below.
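A sketch of this rule is given below, assuming OpenCV 4 (whose findContours returns two values); the helper names and thresholds are introduced only for illustration.

import cv2

def motion_box(prev_gray, gray, pixel_thresh=25):
    """Bounding box (x, y, w, h) of the largest changed region, or None if nothing moved."""
    diff = cv2.absdiff(prev_gray, gray)
    _, mask = cv2.threshold(diff, pixel_thresh, 255, cv2.THRESH_BINARY)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None
    return cv2.boundingRect(max(contours, key=cv2.contourArea))

def classify_direction(prev_box, box):
    """'entered' when the box centre moves left-to-right, 'exited' for right-to-left."""
    prev_cx = prev_box[0] + prev_box[2] // 2
    cx = box[0] + box[2] // 2
    if cx > prev_cx:
        return "entered"
    if cx < prev_cx:
        return "exited"
    return "no movement"

In practice the two boxes would come from consecutive frames of the live feed, so the comparison amounts to tracking the horizontal position of the visitor over time.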
V. RESULTS
1. Home Page
Figure 5.1: Home Page
Home Page: When a user logs into the smart surveillance system, they are directed to the home page. It gives the user easy access to the various functionalities and an overview of the available features (a minimal interface sketch is given after the feature list below).
Face Recognition: Recognizes and identifies people based on their facial traits using machine learning methods; it can be used for security purposes such as admitting only authorized persons to restricted areas.
Stolen Objects: Uses machine learning methods to check whether any objects have been removed from the monitored location. It can be used to keep an eye on valuable items, such as jewellery or electronics, and alert the user if anything disappears.
Visitor Detection: Allows users to keep track of and monitor site visits. It can be used to follow visitor movements across the property and ensure that only authorized individuals are given access.
Alarm System: Employs machine-learning logic to determine when an alarm should be activated. It can be used to detect intrusions or other security breaches, meaning the user is alerted when an alarm has been triggered.
Motion in a Window: Uses machine learning methods to detect any movement within the camera's range. It can be used to watch for intruders or unusual activity and alerts the user once movement is detected.
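A minimal interface sketch, assuming only Tkinter from the standard library, is shown below; the button labels mirror the features above, and the placeholder handler simply prints the selection until the corresponding module is wired in.

import tkinter as tk

def placeholder(name):
    # Stand-in handler; in the real application this would launch the chosen module
    return lambda: print(name, "selected")

root = tk.Tk()
root.title("Smart Surveillance System")

features = ("Motion in a Window", "Stolen Objects", "Face Recognition",
            "Visitor Detection", "Record", "Alarm")
for feature in features:
    tk.Button(root, text=feature, width=30,
              command=placeholder(feature)).pack(padx=10, pady=4)

root.mainloop()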
Figure 5.2: Motion in a Window
The premise of motion detection is the continuous analysis of the live webcam video within the camera's field of view in order to detect some form of activity or motion. Activity is recognized as motion only when the frame differencing and thresholding procedures detect pixel changes significant enough to pass the predetermined threshold. This means that some false positives may still be triggered by insignificant changes such as illumination shifts or airflow in the background (Lavanya, 2014). Upon detection of motion, a bounding box is placed around the area of activity and an alarm is set off.
2. Stolen Objects
Figure 5.3: Stolen Objects
During initialization, the system captures reference frames and periodically compares them with newly acquired frames to check for object placement or removal. The Structural Similarity Index (SSIM) measures changes in luminance, contrast, and structure as perceived by the human eye, making it possible to identify visual inconsistencies caused by tampering or theft (Wang et al., 2004). Alerts are triggered if major differences are identified.
3. Identification
Figure 5.4: Identification
The facial recognition module consists of two stages: detection and classification. At the detection stage, human faces are located by OpenCV's Haar Cascade classifier, and the detected faces are then passed to the LBPH algorithm for recognition. The model is trained on a dataset of 500 labeled images to provide accurate real-time recognition. If an unknown person appears in the camera view, the system logs the event and can also raise an alarm (Kumar et al., 2018).
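A training sketch is shown below. The dataset/<person_id>/<image>.jpg folder layout (with numeric person IDs) is an assumption made for illustration, and the opencv-contrib-python build is again required for cv2.face.

import os
import cv2
import numpy as np

faces, labels = [], []
for person_id in os.listdir("dataset"):                    # one sub-folder per person (assumed layout)
    person_dir = os.path.join("dataset", person_id)
    for name in os.listdir(person_dir):
        img = cv2.imread(os.path.join(person_dir, name), cv2.IMREAD_GRAYSCALE)
        if img is not None:
            faces.append(img)
            labels.append(int(person_id))                  # folder name doubles as the numeric label

recognizer = cv2.face.LBPHFaceRecognizer_create()
recognizer.train(faces, np.array(labels))
recognizer.write("lbph_model.yml")                         # read later by the identification module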
4. Record
Figure 5.5: Record
The "Record" feature captures the live video feed from the camera. It operates by periodically taking frames from the camera stream and saving them as video files on the system's storage device.
To use it, the user simply selects the "Record" option, and the system starts recording and storing frames from the camera feed. The video is kept in a file format that can be viewed with common media players or video editing programs.
The "Record" feature not only records video but also adds a date and time stamp to each frame, taken from the system clock. When reviewing footage after an incident or security breach, this makes it simple to find and examine recordings from specific dates or times.
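A sketch of such a recording loop is given below; the XVID codec, the 20 fps rate, and the output file name are assumptions rather than values reported for the system.

import cv2
from datetime import datetime

cap = cv2.VideoCapture(0)
width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
writer = cv2.VideoWriter("recording.avi",
                         cv2.VideoWriter_fourcc(*"XVID"), 20.0, (width, height))

while True:
    ok, frame = cap.read()
    if not ok:
        break
    stamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S")   # system-clock date/time stamp
    cv2.putText(frame, stamp, (10, 30), cv2.FONT_HERSHEY_SIMPLEX,
                0.7, (0, 255, 255), 2)
    writer.write(frame)
    cv2.imshow("Recording", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

writer.release()
cap.release()
cv2.destroyAllWindows()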
5. Alarm
Figure 5.6: Alarm
The "Alarm" feature generates sound alerts in response to detected motion. The function monitors the camera feed and looks for any motion or movement within the frame.
Upon detection of movement, the system triggers the alarm and emits a sound alert to notify the user or administrator. The alarm can be configured to stop after a predetermined time, or adjusted to different volumes or frequencies.
If there is no motion within the frame, no sound is produced, which reduces false alarms and minimizes unnecessary noise and disturbance.
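One way to realize this behaviour is sketched below. winsound is a Windows-only standard-library module, so the beep, the fallback, and the thresholds should be read as assumptions rather than the system's exact implementation.

import cv2

def play_alarm():
    try:
        import winsound
        winsound.Beep(1000, 500)                   # 1 kHz tone for 0.5 s (Windows only)
    except ImportError:
        print("\a[ALARM] motion detected")         # terminal-bell fallback on other platforms

def motion_detected(prev_gray, gray, pixel_thresh=25, min_changed=1000):
    diff = cv2.absdiff(prev_gray, gray)
    _, mask = cv2.threshold(diff, pixel_thresh, 255, cv2.THRESH_BINARY)
    return cv2.countNonZero(mask) > min_changed    # stay silent below this, limiting false alarms

cap = cv2.VideoCapture(0)
_, prev = cap.read()
prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    if motion_detected(prev_gray, gray):
        play_alarm()
    prev_gray = gray
    cv2.imshow("Alarm", frame)
    if cv2.waitKey(30) & 0xFF == ord("q"):
        break

cap.release()
cv2.destroyAllWindows()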
6. Visitors Detection
Figure 5.7: Visitors Detection
People are tracked by analyzing their directional motion. Changes in position are observed over time, generating instant alerts for movements entering or leaving the frame. An entry is detected when the object moves from left to right; conversely, an exit is detected as right-to-left motion. This logic is implemented by comparing the coordinates of bounding boxes and enables effective people-flow analysis without additional sensors.
VI. CONCLUSION
This paper has presented the design and implementation of a smart surveillance system that integrates multiple machine-learning and computer-vision methods which, working together, enhance conventional security systems. By automating tasks with algorithms including Haar cascades for face detection, LBPH for recognition, SSIM for structural comparison, and frame differencing for motion detection, the system improves aspects of surveillance that have traditionally been manual and error-prone. The modular architecture supports real-time monitoring, object tracking, and visitor identification, allowing better situational awareness in confined environments. Testing in a controlled indoor setting has shown acceptable accuracy, response time, and usability, indicating suitability for small-scale commercial and residential deployments.
VII. FUTURE SCOPE
The current system forms a strong foundation for intelligent surveillance, but accuracy in complex environments can be improved by integrating deep learning models. Other upcoming features may include weapon detection, fire or smoke detection, and audio anomaly detection. Extending the system to interface with cloud storage and mobile notifications will allow remote access and instant alerts. Integrating edge computing would also reduce processing latency and dependence on central servers, further improving system efficiency and scalability.
REFERENCES
Ahonen, T., Hadid, A., & Pietikainen, M. (2006). Face Description with Local Binary Patterns: Application to Face Recognition.
IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(12), 2037–2041.
Bradski, G., & Kaehler, A. (2020). Learning OpenCV 4: Computer vision with Python. O’Reilly Media.
Huang, K., Li, G., & Liu, S. (2020). Learning channel-wise spatio-temporal representations for video salient object detection.
Neurocomputing, 403, 325–336.
Kumar, A., Kaur, A., & Kumar, M. (2018). Face detection techniques: a review. Artificial Intelligence Review, 52(2), 927–948.
Lavanya, M. P. (2014). Real Time Motion Detection Using Background Subtraction Method and Frame Difference. International
Journal of Science and Research (IJSR), 3(6), 1857–1861.
Maddalena, L., & Petrosino, A. (2008). A Self-Organizing Approach to Background Subtraction for Visual Surveillance
Applications. IEEE Transactions on Image Processing, 17(7), 1168–1177.
Sreenu, G., & Saleem Durai, M. A. (2019). Intelligent video surveillance: a review through deep learning techniques for crowd
analysis. Journal of Big Data, 6(1).
Viola, P., & Jones, M. (2001). Rapid object detection using a boosted cascade of simple features. Proceedings of the 2001 IEEE
Computer Society Conference on Computer Vision and Pattern Recognition, 1, I–I.
Wang, Z., Bovik, A. C., Sheikh, H. R., & Simoncelli, E. P. (2004). Image quality assessment: From error visibility to structural
similarity. IEEE Transactions on Image Processing, 13(4), 600–612.