Theft Detection Using Deep Learning
Research Article
Keywords: YOLO (You Only Look Once), OpenCV (Open-Source Computer Vision Library), NCRB (National Crime Records Bureau), ML (Machine Learning)
DOI: https://doi.org/10.21203/rs.3.rs-3540282/v1
License: This work is licensed under a Creative Commons Attribution 4.0 International License.
Abstract— In earlier times, crime detection relied solely on human observation, with no efficient methods of detection. The advent of CCTV cameras marked a significant advance, but manual review of video footage is a time-consuming process. Now that Artificial Intelligence (AI) and Machine Learning (ML) have made significant strides, intelligent systems that automate crime detection in CCTV surveillance have become essential. Such systems can not only detect crimes but also classify them and alert nearby police stations and ambulances, thereby contributing to the reduction of crime rates. Object detection and tracking in computer vision have gained widespread attention due to their diverse applications, including surveillance and security systems, and researchers continue to improve the accuracy and efficiency of these processes. Our system aims to enhance security and enable swift responses to potential threats by performing real-time object detection on live video feeds. The system can be further optimized through the integration of specialized hardware for even more robust and efficient crime detection.

Keywords— YOLO (You Only Look Once), OpenCV (Open-Source Computer Vision Library), NCRB (National Crime Records Bureau), ML (Machine Learning).

I. INTRODUCTION

Theft is a prevalent global crime, accounting for a significant portion of criminal offenses; according to the National Crime Records Bureau (NCRB), theft incidents make up approximately 80% of all reported crimes. The consequences of rising theft rates are not only financial but also emotional for victims. This underscores the pressing need for a surveillance system that is user-friendly, minimizes false alarms, reduces human intervention, and is cost-effective.

Machine learning (ML) techniques offer a valuable avenue for developing such systems. They can be instrumental in achieving several key objectives, including:

i) Detecting motion in otherwise static environments.

ii) Recognizing facial expressions and identifying individuals wearing masks using ML models.

iii) Detecting suspicious activities in the vicinity, including the presence of weapons, and promptly notifying the relevant authorities.

It is worth noting that crimes, including theft, often exhibit patterns that can be predicted by analyzing large volumes of data, and these patterns, once identified, are invaluable to law enforcement. Unfortunately, many thefts go unreported due to societal pressures and other factors. Intelligent systems can detect theft incidents quickly, bypassing the need for individuals to report them and automatically alerting the appropriate authorities. This proactive approach can help curb theft-related activity and enhance overall security.

II. RELATED WORK

1) The authors of this paper propose machine learning (ML) models for real-time handgun identification in surveillance footage. Their approach uses a sliding window and a region-based technique to detect handguns. They find that the Faster Region-based Convolutional Neural Network (Faster R-CNN) gives faster and more precise results, achieving a precision of 84.21%, a recall of 100%, and a higher true-negative rate. To make security-alarm decisions when a firearm is detected, they introduce the Alert Activation Time per Interval (AATpI) mechanism, which confirms the presence of a handgun over the following frames before raising an alert, leading to more reliable alarm decisions. Notable strengths of the work include real-time weapon detection, testing on low-quality YouTube footage, and results that matched expectations. However, the system cannot detect handguns in the background or on fast-moving objects, and it detects only handguns, limiting its scope to a single type of firearm.
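For illustration, the consecutive-frame confirmation idea behind AATpI can be sketched as follows; the confirmation window k and the per-frame detector interface are assumptions of this sketch, not details of the cited implementation.

from collections import deque

def aatpi_alarm(detections, k=5):
    # `detections` is an iterable of booleans, one per frame, produced by any
    # per-frame handgun detector; k is an assumed confirmation window length.
    window = deque(maxlen=k)
    for frame_idx, detected in enumerate(detections):
        window.append(detected)
        # Fire the alarm only once a weapon has been seen in k consecutive frames.
        if len(window) == k and all(window):
            return frame_idx
    return None  # no alarm raised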
2) In this paper, the authors use the "change of trajectory by theta angle" method proposed by W. Kobanne to detect suspicious motion. The method compares two angles, theta 1 and theta 2; if theta 1 is greater than theta 2, suspicious behavior is flagged through object tracking. The system they developed incorporates multiple levels of surveillance, each involving careful monitoring of the actions in every video frame using machine learning models trained for specific tasks. These models consider various parameters to assess whether a behavior qualifies as criminal. The decision-making process integrates the outputs of several sub-models, each with its own priority, covering mask detection, weapon detection, pose detection, and motion detection. By aggregating the outputs of these sub-models, the system makes informed decisions about potential criminal activity.

3) In their paper, the authors apply the Markov chain rule to estimate the probabilities associated with different types of crime. They use a Transition Probability Matrix (TPM) to predict the probability of the next crime occurrence. The TPM consists of two components: a probability vector derived from the training dataset and a matrix characterizing the Markov chain. In this context, the crime vector holds the probabilities estimated from the dataset, and the transition matrix is built from them. The authors introduce a "crime growth factor", which quantifies how likely one type of crime (e.g., crime A) is to occur on day d + 1 given that another type of crime (e.g., crime B) occurred on day d; these growth factors are then converted into probability values. To calculate the factors, each day is divided into four parts, which are eventually merged into a single matrix. Finally, the Naïve Bayes algorithm is applied to identify the primary hotspots where these crimes are most likely to occur, based on the calculated probabilities.

4) In their paper, the authors recognize the importance of a substantial dataset of weapon images for training machine learning models. They collected the images manually from Google, saved them as ".jpg" files in a folder named "images", and gathered at least 50 images for each distinct weapon class to ensure a diverse and representative dataset. Before training, a preprocessing step resized all images to a uniform 416x416 pixels; standardizing the image dimensions streamlines batch processing and keeps it computationally efficient and consistent. The goal of this data acquisition and preprocessing was to facilitate the training of machine learning models for object detection, the computer vision task of identifying and localizing objects within digital images or video frames. By preparing a comprehensive dataset and resizing images to a consistent size, the authors laid the foundation for effective object detection model training and subsequent analysis.

5) The authors of this project used the UCF-Crime dataset, which contains 128 hours of surveillance video: 1,900 lengthy surveillance recordings covering different abnormalities such as accidents, shoplifting, robberies, and other events. The resulting model detects crime without human involvement and alerts the police to speed up the response. Several pre-trained models, such as GoogLeNet and VGGNet-19, were considered because they are well trained and recognize objects with few mistakes; VGGNet-19 was chosen for its high accuracy, as it can classify and recognize items in real time.

6) In their project, the authors plan to employ various object detection approaches, including Faster R-CNN, RetinaNet, the Single Shot MultiBox Detector (SSD), and YOLO. Of these, YOLO gives the highest accuracy for detecting objects in the real world and is particularly suited to real-time scenarios. YOLO uses neural networks to perform object detection and incorporates several key techniques:

a) Residual Blocks: Residual blocks are a fundamental component of YOLO's network architecture. They help address the vanishing-gradient problem and allow deeper networks to be trained, which can capture more complex features in the input data.

b) Bounding Box Regression: YOLO uses bounding box regression to refine the locations of detected objects, which helps localize objects accurately within the image.

c) Intersection over Union (IOU): IOU is a key metric in object detection. It measures the overlap between predicted bounding boxes and ground-truth boxes; YOLO uses it to evaluate detection accuracy and to decide which detections to keep.

In YOLO, bounding boxes are weighted by the class probabilities the model assigns to objects in the image during inference; the final weights determine which bounding boxes are kept as valid detections. Each bounding box is described by the coordinates of its center, its width, its height, and a confidence score. This representation makes YOLO well suited to applications that require fast and robust object detection. Overall, the use of YOLO and its associated techniques aims to provide efficient and accurate object detection, particularly in real-time scenarios where rapid detection is crucial.
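For reference, the IOU between a predicted box and a ground-truth box is the area of their intersection divided by the area of their union. A minimal sketch, assuming boxes are given in (x1, y1, x2, y2) corner format:

def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Intersection rectangle
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    if inter == 0:
        return 0.0
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    return inter / float(area_a + area_b - inter)

Detections whose IOU with a ground-truth box exceeds a chosen threshold (commonly 0.5) are counted as correct.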
7) In their paper [7], the authors present a CCTV surveillance system designed to automatically detect gestures or signs of aggression and brutality in real time. The system consists of two main modules, each serving a distinct purpose:

a) Object Detection: The first module detects objectionable objects such as guns and knives, which is crucial for identifying potential threats within the surveillance footage.

b) Abnormal Human Activity Detection: The second module identifies abnormal human activities, such as aggressive gestures or actions that may indicate violence, and aims to recognize patterns of behavior that are not typical for a given context.

The primary objective of the system is to minimize human intervention in monitoring CCTV feeds and to reduce false alarms. To achieve this, surveillance is activated only when there is movement in a room, which conserves resources and preserves privacy when surveillance is unnecessary. Machine learning techniques are leveraged throughout: Faster R-CNN (Region-based Convolutional Neural Network) provides accurate and efficient object detection, while optical flow is used to estimate and track motion patterns. When potential criminal activity or signs of aggression are detected, the system triggers alerts or buzzers and can notify the relevant authorities, such as law enforcement agencies, to enable a prompt response. The authors also mention potential future enhancements, including night-vision capability based on infrared image enhancement, which would further improve the system's effectiveness in low-light conditions.

8) In research paper [8], the authors present a comprehensive comparative study of machine learning algorithms for image recognition, focusing on four algorithms: Convolutional Neural Networks (CNNs), Support Vector Machines (SVMs), Random Forests (RFs), and k-Nearest Neighbors (KNN). The study evaluates these algorithms on several critical criteria:

a) Accuracy: how well each algorithm correctly classifies and recognizes objects or patterns in images.

b) Computational Efficiency: the computational resources each algorithm requires, including training and inference times.

c) Robustness: how well the models perform under different conditions, such as noisy images and fluctuations in image quality.

The paper also examines the architecture and training process of Deep Belief Networks (DBNs) and Decision Trees (DTs), highlighting in particular the interpretability and scalability of these methods. The comparative results offer valuable insights for researchers and practitioners working on image recognition, helping them choose the most suitable machine learning algorithm for their specific needs and requirements. In summary, research paper [8] provides a rigorous analysis of machine learning algorithms for image recognition, clarifying their strengths and weaknesses in terms of accuracy, computational efficiency, and robustness.

9) In the paper by [9], the authors propose a method for larceny (shoplifting) detection that combines Inception V3 with a bidirectional long short-term memory network (BiLSTM). The key components and findings of the study are:

a) Inception V3: Inception V3 is a CNN architecture noted for its effectiveness in capturing both local and global features. It employs convolutional filters of different sizes to capture details at multiple scales, making it suitable for detecting suspicious activities or objects in surveillance footage.

b) Bidirectional long short-term memory (BiLSTM): BiLSTM is a type of recurrent neural network that excels at modeling temporal dependencies in sequential data. It processes the input sequence in both the forward and backward directions, so it can use both past and future context, which is particularly useful for analyzing sequences of video frames.

c) Method Combination: The proposed method combines the strengths of Inception V3 and BiLSTM: Inception V3 is used for feature extraction, while BiLSTM is employed for sequence analysis. The combination aims to provide comprehensive feature extraction and temporal modeling for shoplifting detection.

d) Dataset: The study uses a dataset called "Shoplift-23", which consists of 900 videos in two classes, shoplifting and non-shoplifting. Each video contributes 90 frames of training input, giving a total of 81,000 frames, and the problem is treated as a supervised classification task.

e) Model Evaluation: Several approaches, including a 2D CNN, a 3D CNN, and the proposed model, are evaluated using an 80:20 random split for training and validation.

f) Results: The proposed method surpasses the baselines in accuracy, precision, recall, and F1-score, achieving an accuracy of 82%, precision of 88.80%, recall of 78.40%, and F1-score of 83.01%.

g) Reasons for Superior Performance: The paper attributes the superior performance to the model's multi-scale processing, efficient use of parameters, regularization techniques, and BiLSTM's ability to capture long-range dependencies in sequential video data.

In summary, the authors combine Inception V3 and BiLSTM for shoplifting detection and show that the method outperforms baseline approaches on a dataset of surveillance videos. Its success is attributed to its feature extraction capabilities, sequence modeling, and various architectural optimizations.
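The "CNN feature extractor followed by a BiLSTM" pattern described above can be sketched in Keras as follows; the clip length of 90 frames matches the dataset description, while the 299x299 input size, frozen backbone, and 128-unit LSTM are assumptions of this sketch rather than the cited authors' exact configuration.

import tensorflow as tf
from tensorflow.keras import layers, models

# Assumed input: clips of 90 frames resized to 299x299 RGB, with a binary
# label (shoplifting / non-shoplifting).
frames_per_clip, h, w = 90, 299, 299

cnn = tf.keras.applications.InceptionV3(include_top=False, weights="imagenet",
                                        pooling="avg", input_shape=(h, w, 3))
cnn.trainable = False  # use Inception V3 purely as a frozen feature extractor

clip_in = layers.Input(shape=(frames_per_clip, h, w, 3))
feats = layers.TimeDistributed(cnn)(clip_in)          # one 2048-d vector per frame
seq = layers.Bidirectional(layers.LSTM(128))(feats)   # temporal modeling over the clip
out = layers.Dense(1, activation="sigmoid")(seq)      # shoplifting probability

model = models.Model(clip_in, out)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])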
III. METHODOLOGY

The components listed below are the main steps of our system:

1. Acquiring real-time footage
2. Applying the different techniques of our model
3. Deciding whether a theft activity has occurred based on these techniques

During data preprocessing, a unique vocabulary is generated, consisting of the words that appear in all of the dataset's image captions. This vocabulary is saved for reference in subsequent steps, and the data is prepared for further processing.

B) Data Acquisition and Analysis:
The image frames obtained from the video are processed and passed through various machine learning models. Each of these models performs a specific task in a predefined sequence, focusing on different evaluation parameters.
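A minimal dispatch loop of this kind is sketched below; the sub-model names and their callable interface are placeholders for illustration rather than a real API.

import cv2

def run_pipeline(source, sub_models):
    # `sub_models` is assumed to be a list of (name, callable) pairs, e.g.
    # [("mask", mask_model), ("weapon", weapon_model), ("motion", motion_model),
    #  ("theft", theft_model)], applied to each frame in a fixed order.
    cap = cv2.VideoCapture(source)  # 0 for a webcam, or an RTSP URL / file path
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        results = {}
        for name, model in sub_models:   # predefined sequence of sub-models
            results[name] = model(frame)
        if results.get("theft") or results.get("weapon"):
            print("ALERT: suspicious activity detected")  # placeholder for notification
    cap.release()

In this phase, the following machine learning models are utilized: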
1) Mask Detection:
Detecting instances of robbery and identifying individuals wearing masks is a key application of computer vision and machine learning. To accomplish this, a machine learning model is trained to classify images or video frames into distinct categories, including "normal", "robbery", "mask", and "no mask". Transfer learning with pre-trained models such as ResNet or MobileNet can be employed to speed up training. To detect masks specifically, object detection techniques such as YOLO or Faster R-CNN are used to locate faces within the frames and then determine whether a mask is present.
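A minimal transfer-learning sketch for such a frame classifier, assuming a MobileNetV2 backbone, a 224x224 input, and the four classes listed above; the layer sizes are illustrative only.

import tensorflow as tf
from tensorflow.keras import layers, models

# Pre-trained backbone, kept frozen initially, with a small classification head.
base = tf.keras.applications.MobileNetV2(include_top=False, weights="imagenet",
                                         input_shape=(224, 224, 3), pooling="avg")
base.trainable = False

model = models.Sequential([
    base,
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.3),
    layers.Dense(4, activation="softmax"),  # normal / robbery / mask / no mask
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

Face localization with YOLO or Faster R-CNN runs separately; the cropped face regions (or whole frames) are then fed to this classifier.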
2) Weapon Detection:
Weapon detection is a critical task in many security applications, and the k-nearest neighbors (kNN) algorithm can be used as the classifier in such scenarios. Here is an overview of how kNN can be used for weapon detection: candidate regions of a frame are described by feature vectors, and each region is assigned the majority label among its k nearest neighbors in the training set.
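A minimal sketch of this kNN classification step, assuming X is a matrix of feature vectors extracted from candidate regions (e.g., CNN embeddings or HOG descriptors) and y holds labels such as "gun", "knife", or "none"; these names and k = 5 are illustrative choices.

from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split

def train_weapon_knn(X, y, k=5):
    # Hold out 20% of the labeled regions to estimate accuracy.
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                        random_state=42)
    clf = KNeighborsClassifier(n_neighbors=k)
    clf.fit(X_train, y_train)
    print("held-out accuracy:", clf.score(X_test, y_test))
    return clf

# At inference time, each candidate region's feature vector is classified by
# the majority label among its k nearest training neighbors:
#   label = clf.predict(region_features.reshape(1, -1))[0]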
3) Object Detection:
Selecting an object detection framework or library depends on the specific requirements of the application. A brief overview of the options considered:

a) YOLO - Real-time performance: YOLO is renowned for its high-precision object detection capabilities. It can process images or video frames very quickly, making it suitable for applications requiring low-latency responses, such as autonomous vehicles and real-time surveillance.

b) Faster R-CNN (FR-CNN) - Accuracy: Faster R-CNN is known for its accuracy in object detection. It typically achieves high precision and recall rates, making it suitable for applications where detecting objects with high precision is crucial, such as medical imaging or fine-grained object recognition.
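As an illustration of running YOLO on live frames with OpenCV's DNN module, a minimal sketch follows; the yolov3.cfg and yolov3.weights file names, the 416x416 input size, and the thresholds are assumptions of this sketch.

import cv2
import numpy as np

# Assumed YOLOv3 configuration and weight files.
net = cv2.dnn.readNetFromDarknet("yolov3.cfg", "yolov3.weights")
layer_names = net.getUnconnectedOutLayersNames()

def detect(frame, conf_threshold=0.5, nms_threshold=0.4):
    h, w = frame.shape[:2]
    blob = cv2.dnn.blobFromImage(frame, 1 / 255.0, (416, 416), swapRB=True, crop=False)
    net.setInput(blob)
    outputs = net.forward(layer_names)

    boxes, confidences, class_ids = [], [], []
    for output in outputs:
        for det in output:
            scores = det[5:]
            class_id = int(np.argmax(scores))
            confidence = float(scores[class_id])
            if confidence > conf_threshold:
                # YOLO returns center/width/height relative to the image size.
                cx, cy, bw, bh = det[0] * w, det[1] * h, det[2] * w, det[3] * h
                boxes.append([int(cx - bw / 2), int(cy - bh / 2), int(bw), int(bh)])
                confidences.append(confidence)
                class_ids.append(class_id)

    # Non-maximum suppression keeps only the strongest overlapping boxes.
    keep = cv2.dnn.NMSBoxes(boxes, confidences, conf_threshold, nms_threshold)
    return [(class_ids[i], confidences[i], boxes[i]) for i in np.array(keep).flatten()]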
4) Motion Detection:
Here we use the "change of trajectory by theta angle" method proposed by W. Kobanne to detect suspicious motion. It consists of the following steps:

a) Commence: we manually draw a rectangle encompassing the object in the first frame.

b) The interesting points of the object are then extracted from the frame.

c) 'x' is the displacement from the start along the horizontal axis, and 'y' is the displacement from the start along the vertical axis.

d) After every ten frames, a mean displacement vector is determined, together with the inclination theta between two successive mean displacement vectors a and b, using cos θ = (a · b) / (|a| |b|).
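A minimal sketch of this computation, assuming the tracked point coordinates are available per frame as NumPy arrays; the 10-frame window follows step (d), while the suspicion threshold is an assumed parameter.

import numpy as np

def mean_displacements(points_per_frame, window=10):
    """points_per_frame: list of (N, 2) arrays of tracked point coordinates."""
    vectors = []
    for start in range(0, len(points_per_frame) - window, window):
        disp = points_per_frame[start + window] - points_per_frame[start]
        vectors.append(disp.mean(axis=0))  # mean displacement over the window
    return vectors

def angle_between(a, b):
    """theta from cos(theta) = (a . b) / (|a| |b|)."""
    cos_theta = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9)
    return np.degrees(np.arccos(np.clip(cos_theta, -1.0, 1.0)))

def is_suspicious(vectors, threshold_deg=45.0):
    # Flag a sharp change of trajectory between successive mean displacement vectors.
    return any(angle_between(v1, v2) > threshold_deg
               for v1, v2 in zip(vectors, vectors[1:]))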
5) Theft Detection using CNN:
Real-time theft detection using Convolutional Neural Networks (CNNs) is a challenging but achievable task. To implement a real-time theft detection system, factors such as low latency, real-time video processing, and efficient use of computational resources must be considered.

a) We design a CNN architecture suitable for real-time video frame analysis, considering lightweight models such as MobileNet, SqueezeNet, or custom architectures optimized for real-time performance.

b) We optimize the model for real-time processing by reducing its size, applying quantization, and leveraging hardware acceleration (e.g., GPUs or TPUs) where available.

c) CNNs are widely used for image and video analysis. They are effective for detecting theft in images and video frames, especially when combined with object detection and tracking techniques.
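A minimal real-time inference sketch along these lines; the model file name, the 224x224 input size, and the 0.8 alert threshold are placeholders, not our final configuration.

import cv2
import numpy as np
import tensorflow as tf

# Hypothetical trained lightweight classifier producing a theft probability.
model = tf.keras.models.load_model("theft_mobilenet.h5")

def theft_probability(frame):
    resized = cv2.resize(frame, (224, 224))
    batch = np.expand_dims(resized.astype("float32") / 255.0, axis=0)
    return float(model.predict(batch, verbose=0)[0][0])

cap = cv2.VideoCapture(0)  # live camera feed
while True:
    ok, frame = cap.read()
    if not ok:
        break
    if theft_probability(frame) > 0.8:  # assumed alert threshold
        print("Possible theft detected")
cap.release()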
j) Continuous Improvement: Recognizing that theft detection is an ongoing effort, strategies and technologies must be continually refined to adapt to evolving threats.

IV. CONCLUSION
In conclusion, theft detection and security challenges require a multi-faceted and adaptive approach. Utilizing advanced algorithms, machine learning, and continuous improvement efforts can enhance the effectiveness of security systems. The dynamic nature of security threats underscores the need for ongoing research, development, and vigilance in addressing these challenges.
REFERENCES

[8] Castillo, A., Tabik, S., Pérez, F., Olmos, R. and Herrera, F., "Brightness guided preprocessing for automatic cold steel weapon detection in surveillance videos with deep learning", Neurocomputing, 330, pp. 151-161, 2019.
"Detection and Classification using YOLOv3", Dept. of Electronics and Communication Engineering, SDM College of Engineering and Technology, Dharwad, India, International Journal of Engineering Research & Technology (IJERT), ISSN: 2278-0181, IJERTV10IS020078, Vol. 10, Issue 02, February 2021.
[13] Joseph Redmon, Santosh Divvala, Ross Girshick, Ali Farhadi, "You Only Look Once: Unified, Real-Time Object Detection", University of Washington, Allen Institute for AI, Facebook AI Research.
[14] Tanvir Ahmad, Yinglong Ma, Muhammad Yahya, Belal Ahmad, Shah Nazir, and Amin ul Haq, Hindawi Scientific Programming Volume.
[15] Y. Lee, T. Song, H. Kim, D. K. Hant, and H. Ko, "Hostile intent and behaviour detection in elevators", in 4th International Conference on Imaging for Crime Detection and Prevention 2011 (ICDP 2011), pp. 1-6, London, 2011.
[17] Toshev, A. and Szegedy, C., "DeepPose: Human pose estimation via deep neural networks", in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1653-1660, 2014.
[18] Xu, X., Tang, J., Zhang, X., Liu, X., Zhang, H. and Qiu, Y., "Exploring techniques for vision based human activity recognition: Methods, systems, and evaluation", Sensors, 13(2), pp. 1635-1650, 2013.
[21] Pérez-Hernández, F., Tabik, S., Lamas, A., Olmos, R., Fujita, H. and Herrera, F., "Object detection binary classifiers methodology based on deep learning to identify small objects handled similarly: Application in video surveillance", Knowledge-Based Systems, 194, p. 105590, 2020.
[22] Simo-Serra, E., Ramisa, A., Alenyà, G., Torras, C. and Moreno-Noguer, F., "Single image 3D human pose estimation from noisy observations", in 2012 IEEE Conference on Computer Vision and Pattern Recognition, pp. 2673-2680, IEEE, June 2012.
Access, 8, pp. 133330-133348, 2020.
[24] Zhang, L., Zhu, G., Shen, P. and Song, J., "Learning Spatiotemporal Features Using 3DCNN and Convolutional LSTM for Gesture Recognition", in Proceedings of the 2017 IEEE International Conference on Computer Vision Workshops (ICCVW), Venice, Italy, 22-29 October 2017, pp. 3120-3128.
[25] Ogwueleka, F.N., Misra, S., Colomo-Palacios, R. and Fernandez, L., "Neural Network and Classification Approach in Identifying Customer Behavior in the Banking Sector: A Case Study of an International Bank", Hum. Factors Ergon. Manuf. Serv. Ind., 25, pp. 28-42.