
Academic Journal of Science and Technology

ISSN: 2771-3032 | Vol. 5, No. 1, 2023

Vehicle Object Detection Based on Deep Learning


Zhaoming Zhou, Hui Li
School of Mechatronic Engineering, Southwest Petroleum University, Chengdu 610500, China

Abstract: With the continuous improvement of science and technology and living standards, cars have become a necessary
means of transportation for people to travel, which is bound to be followed by traffic accidents, so it is particularly necessary to
detect or identify cars. Aiming at the above problems, this paper focuses on the application of deep learning in vehicle recognition
problems, and is committed to finding a vehicle recognition algorithm with high recognition rate and stability. This paper focuses
on the analysis of SSD algorithm and its basic theory convolutional neural network. Finally, this paper uses the pictures of
stationary vehicles and a film video respectively to identify the stationary and moving state of vehicles. In the process of image
recognition, the rationality and accuracy of the proposed method are verified according to the accuracy rate given above the
image.
Keywords: Deep learning, Vehicle detection, OpenCV, SSD algorithm.

1. Introduction
With the continuous development of science and technology, cars have become an indispensable means of transportation in people's daily lives, which is also a fundamental cause of traffic congestion and a contributing factor in traffic accidents. The existence of these two situations increases the work burden of traffic police and related units. Beyond that, tracking vehicles is the most immediate problem facing government agencies. In order to alleviate this burden and even eliminate the various possible problems, intelligent monitoring of vehicles has become an effective means. Deep learning not only addresses these problems but is now being applied to every aspect of life. Deep learning was first proposed by Hinton[1] in 2006, and has since played an irreplaceable role in daily life. Therefore, scholars at home and abroad have conducted extensive research on both the theory and the applications of deep learning. Hayit Greenspan et al.[2], recognizing the power of convolutional neural networks (CNN) in visual tasks, used CNN to study localization in medical image processing, providing help for future medical image processing and the development of the medical field. Amodei et al.[3] replaced an entire pipeline of hand-designed components with neural networks; end-to-end learning enabled them to process a wide variety of speech, including noisy environments, accents, and different languages. This method was used to recognize two distinct languages, English and Mandarin, opening a successful door for future speech recognition work. Finally, regarding the advantages of deep learning in natural language processing, Ronan Collobert et al.[4] described a single convolutional neural network architecture that, given a sentence, outputs a series of language processing predictions: part-of-speech tags, chunks, named entity tags, semantic roles, and the likelihood that semantically similar words and sentences will have meaning. The entire network uses weight sharing (an instance of multitask learning) to train all of these tasks together. Domestic research on deep learning has also made great progress and produced many results. Ding Hong and Rao Wanxian[5] used a deep belief network to detect human behavior. Before using the deep belief network for recognition, the authors preprocessed the data, denoising it with wavelet techniques and then applying the PCA method for main-frequency analysis; the processed data was then used to train the deep belief network, yielding a model for human behavior recognition that is as effective as other machine learning methods. To improve image recognition accuracy, Gao Yuan et al.[6] proposed a wide residual super-resolution neural network based on depthwise-separable convolution, which divides the channels of the convolutional layer into several groups and normalizes the data of each group.
Based on the subject of vehicle target detection, this paper selects the OpenCV toolbox[7] and uses the Single Shot MultiBox Detector (SSD)[8,9] combined with a deep neural network (DNN)[10,11] to achieve accurate and efficient vehicle positioning and recognition.

2. The Theoretical Basis of The Research
In this paper, the MobileNet SSD in OpenCV is combined with the deep neural network module to construct a vehicle target detector.

2.1. Deep neural network (DNN)
A DNN divides its layers into three categories according to position: the input layer, hidden layers, and the output layer. Fig. 1 shows the structure.

Figure 1. Neural network structure

Adjacent layers of a DNN can be fully connected. Specifically, each neuron in layer $i$ is connected with every neuron in layer $i+1$, and the connection can be expressed as Equation (1):

$z^{(i+1)} = \sum w^{(i)} x^{(i)} + b^{(i)}$ (1)

In Equation (1), $z^{(i+1)}$ represents the output of a neuron in layer $i+1$, $w^{(i)}$ represents the weight vector of the neurons in layer $i$, $x^{(i)}$ represents the values of the neurons in layer $i$, and $b^{(i)}$ represents the bias of the neurons in layer $i$.
The parameters between neurons in a DNN are defined as shown in Fig. 2.

Figure 2. Connection relationship between the two layers of neurons

It can be seen from the figure that neurons in adjacent layers are connected by weights. In the figure, $w_{11}$ represents the weight of the connection between the first neuron of the first layer and the first neuron of the next layer, and $w_{34}$ represents the weight of the connection between the third neuron of the first layer and the fourth neuron of the next layer. Taking the output of the first neuron in the second layer as an example, as shown in Equation (2):

$z_1 = w_{11} x_1 + w_{21} x_2 + w_{31} x_3 + b_1$ (2)

In Equation (2), $x_1$, $x_2$ and $x_3$ respectively represent the values of the three neurons in the first layer, $b_1$ represents the bias, and the $w$ terms represent the weights. The forward propagation of feature values in a neural network is obtained by stacking multiple units as in Fig. 2 until the final output.

2.2. Convolutional neural networks (CNN)
The classic CNN model is shown in Fig. 3, in which a 2D image is input directly into the network and convolved with several trainable convolution kernels to generate corresponding feature maps, forming layer C1. Each feature map in layer C1 is then subsampled to reduce its size, forming layer S1; typically the pool size is 2×2. This process is repeated in layers C2 and S2. After sufficient features have been extracted, the 2D pixel raster is converted into 1D data and fed into a traditional neural network classifier.

Figure 3. Convolutional neural network structure
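The C1/S1 stage just described — convolution with a trainable kernel followed by 2×2 mean pooling — can be sketched in a few lines of NumPy. This is only an illustration: the 6×6 image and the averaging kernel are made-up values, not data from the paper.

```python
import numpy as np

def conv2d_valid(img, kernel):
    """'Valid' 2D convolution: slide the (flipped) kernel over the image."""
    kh, kw = kernel.shape
    k = np.flipud(np.fliplr(kernel))  # true convolution flips the kernel
    out_h = img.shape[0] - kh + 1
    out_w = img.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * k)
    return out

def mean_pool_2x2(fmap):
    """2x2 mean pooling: each output pixel is the average of a 2x2 block."""
    h, w = fmap.shape[0] // 2 * 2, fmap.shape[1] // 2 * 2
    f = fmap[:h, :w]
    return (f[0::2, 0::2] + f[0::2, 1::2] + f[1::2, 0::2] + f[1::2, 1::2]) / 4.0

img = np.arange(36, dtype=float).reshape(6, 6)   # toy 6x6 "image"
kernel = np.ones((3, 3)) / 9.0                   # toy averaging kernel
c1 = conv2d_valid(img, kernel)                   # C1: 4x4 feature map
s1 = mean_pool_2x2(c1)                           # S1: 2x2 map after pooling
print(c1.shape, s1.shape)                        # (4, 4) (2, 2)
```

Each pooled value is simply the mean of the corresponding 2×2 block of C1, matching the subsampling rule described in the text.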

In the convolution layer, the feature maps of the previous layer are divided into many local regions and convolved with trainable kernels. After applying an activation function to the convolution result, new output feature maps are obtained. Let the $l$-th layer be a convolutional layer; the $j$-th output feature map can be expressed as:

$X_j^{l} = f\left(\sum_{i \in M_j} X_i^{l-1} * K_{ij}^{l} + b_j^{l}\right)$ (3)

In Equation (3), $M_j$ represents the set of input maps connected to the $j$-th kernel, $K_{ij}^{l}$ is a parameter of the convolution kernel, $b_j^{l}$ is a bias, and $f$ is the activation function.
In the subsampling layer, the most common method is mean pooling over a 2×2 region; that is, the average of the 4 points in the region serves as the new pixel value.
Parameter estimation in a CNN still uses the gradient algorithm of back propagation. However, according to the characteristics of CNN, some modifications are needed in a few specific steps.
Assume that the residual error vector propagated to the raster layer is $d$. Specifically, it can be expressed as formula (4):

$d = (d_1, d_2, \dots, d_n)$ (4)

Since rasterization is a 2D-to-1D transformation, it is only necessary to reorganize the residual error vector from 1D into a 2D matrix and pass it back to the subsampling layer.
When propagating back from an S layer to a C layer, different pooling methods correspond to different back-propagation processes for the residual error. For mean pooling, we simply distribute the residual error of the current point evenly over the corresponding 4 points. Denote the residual at a point of layer S as $\Delta_q$; after upsampling, the error transmitted to layer C can be expressed as Equation (5):

$\Delta_p = \mathrm{upsample}(\Delta_q)$ (5)

Next, the trainable parameters in layer C are analyzed. Layer C has two tasks in back propagation: passing the residuals backward and updating its own parameters. According to the BP algorithm, and considering the convolution operation, the formula for updating a parameter $\theta$ in the convolution layer is:

$\frac{\partial E}{\partial \theta} = \mathrm{rot180}\left(\sum_{q} X_q * \mathrm{rot180}(\Delta_p)\right)$ (6)

In the equation, $\mathrm{rot180}(\cdot)$ represents a 180-degree rotation of the matrix.
If feature map $q'$ in the preceding S layer is connected to the set $C_p$ of maps in convolution layer $p$, the residual error propagated back to $q'$ can be expressed as:

$\Delta_{q'} = \sum_{p \in C_p} \Delta_p * \mathrm{rot180}(K_{pq'}) \cdot X_{q'}$ (7)

After updating all parameters, the network completes one round of training, which is performed over all training samples until the entire network meets the training requirements.

2.3. Principle of the SSD algorithm
The SSD algorithm is a branch of CNN-based methods. The SSD convolutional neural network is a supervised deep learning model, mainly composed of three kinds of layers: convolutional layers, activation function layers and pooling layers. In a standard SSD (SSD300), 8732 bounding boxes and 8732 scores per category are generated for each input image. The output after the non-maximum suppression step is the final detection result. Non-maximum suppression is an algorithm that eliminates redundant bounding boxes based on their scores and overlaps; in other words, it attempts to merge all boxes proposed for the same object. In the early layers of SSD, a standard architecture called the base network is used for high-quality image representation (truncated before any classification layer).

3. Vehicle Object Detection Based on SSD

3.1. Object detection analysis based on OpenCV
This chapter combines the MobileNet SSD and the DNN module in OpenCV to build a deep learning object detector. The code is introduced in several parts. First, create a new project file in Python and write code to import the necessary OpenCV toolkits, as shown in Fig. 4.

Figure 4. Importing and setting the tool package

Next, configure the necessary settings for target detection; the settings can be specified manually or left at their defaults. --image sets the path to the input image, --prototxt sets the location of the Caffe prototxt file, --model sets the location of the pre-trained model, and --confidence sets the minimum probability threshold for filtering out weak detections, with a default value of 20%. After completing this preparatory work, set the class labels and detection box colors for the targets, as shown in Fig. 5.
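The four command-line settings described above might be wired up with argparse as follows. This is a sketch, since the paper's actual code is only shown as a screenshot (Fig. 4), and the file names in the demo invocation are illustrative:

```python
import argparse

# Build the four settings described in the text; in the real script,
# cv2 and numpy would also be imported here (Fig. 4).
ap = argparse.ArgumentParser()
ap.add_argument("--image", required=True, help="path to the input image")
ap.add_argument("--prototxt", required=True, help="path to the Caffe prototxt file")
ap.add_argument("--model", required=True, help="path to the pre-trained Caffe model")
ap.add_argument("--confidence", type=float, default=0.2,
                help="minimum probability threshold for filtering weak detections")

# Illustrative invocation; a real script would call ap.parse_args() on sys.argv.
args = vars(ap.parse_args([
    "--image", "car.jpg",
    "--prototxt", "MobileNetSSD_deploy.prototxt",
    "--model", "MobileNetSSD_deploy.caffemodel",
]))
print(args["confidence"])  # 0.2, i.e. the 20% default
```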

Figure 5. Setting labels and colors
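If the detector is the common VOC-trained MobileNet-SSD Caffe model (an assumption; the paper only names "bus" and "car" from its label set), the label-and-color setup of Fig. 5 typically looks like:

```python
import numpy as np

# The 20 VOC object classes plus "background", as used by the common
# Caffe MobileNet-SSD model; "bus" and "car" are the two studied here.
CLASSES = ["background", "aeroplane", "bicycle", "bird", "boat", "bottle",
           "bus", "car", "cat", "chair", "cow", "diningtable", "dog",
           "horse", "motorbike", "person", "pottedplant", "sheep", "sofa",
           "train", "tvmonitor"]
# One random border color per class, drawn once so boxes stay consistent.
COLORS = np.random.uniform(0, 255, size=(len(CLASSES), 3))
```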

As shown in Fig. 5, the class labels set by this program include the two categories studied in this paper, "bus" and "car", which can be detected accurately; the border color for detected objects is also set here. After setting the relevant parameters, the model needed for the analysis in this paper must be loaded. The model-loading code is shown in Fig. 6.

Figure 6. Loading the model

After loading the model, you need to load the query image and perform blob analysis, with the blob fed forward through the network, as shown in Fig. 7. Blob analysis examines connected regions of similar pixels in an image, each of which is called a blob. Blob analysis tools can separate objects from the background and compute the number, location, shape, orientation and size of objects, as well as the topology between related blobs.

Figure 7. Feature extraction

In Fig. 7, the image is first loaded, then its height and width are extracted, and a 300×300-pixel blob is computed from the image. After feature extraction is complete, the features are passed forward through the network, as shown in Fig. 8.

Figure 8. Setting the forwarding network
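Setting the input and running the forward pass is two lines. For SSD models the result is a 4-D array of shape (1, 1, N, 7), one row per candidate detection: [image_id, class_id, confidence, x1, y1, x2, y2], with coordinates as fractions of the input. A sketch:

```python
def run_network(net, blob):
    # Feed the preprocessed blob in and compute one forward pass;
    # 'net' is the cv2.dnn Net loaded earlier, 'blob' the 300x300 input.
    net.setInput(blob)
    detections = net.forward()
    return detections
```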

In Fig. 8, we set the network input, compute the forward pass, and store the result as the detections. The running time of this step depends strongly on the size of the input features and the chosen model, but the method chosen in this paper can run on an ordinary CPU without a GPU. After all configurations are complete, cyclic detection must be implemented to determine the locations of the detected objects. Fig. 9 shows the detection process.

Figure 9. Target detection

In Fig. 9 we first loop over the detections, remembering that multiple objects can be detected in a single image, and check the confidence (i.e., probability) associated with each detection. If the confidence is high enough (that is, above the threshold), we display the prediction in the terminal and draw it on the image with text and a colored bounding box. The specific detection steps are as follows:
Step 1: Loop over the candidate detections and extract the confidence value of each;
Step 2: Compare the confidence value against the minimum threshold; if it is greater, extract the class index label and compute the bounding box around the detected object;
Step 3: From the bounding box, extract the corner coordinates, which are used to draw a rectangle and display text;
Step 4: Build the display text, a label containing the class name and the confidence;
Step 5: Print the label to the terminal, then use the previously extracted coordinates to draw a colored rectangle around the object;
Step 6: Set the label display position, which is not critical to the detection itself. Normally the label is displayed above the rectangle, but if there is no space, it is displayed just below the top of the rectangle; this can be adjusted to the reader's preference;
Step 7: Finally, use the values just computed to overlay the colored text on the image.
After all the detection steps are completed, the detected image is displayed to the reader, as shown in Fig. 10.

Figure 10. Detection result

So far, the OpenCV-based target detection system implements all the functions of static detection. For dynamic detection, one only needs to change the image input to a video input in the program. The target detection results for both states are shown in the next chapter.

3.2. Vehicle detection analysis based on OpenCV
In general, a vehicle is in one of two states: moving or stationary. A relatively complete vehicle recognition system based on computer vision consists of five parts: vehicle image acquisition, vehicle image preprocessing, vehicle image segmentation, vehicle image feature extraction, and vehicle recognition. These five parts apply equally to moving and stationary vehicles. Vehicle image information can be captured by a high-speed camera or taken from a video stream collected by a high-definition camera. After acquisition, the image information must be preprocessed; this includes denoising and image enhancement. After preprocessing, the image is segmented. The purpose of segmentation is to determine whether a vehicle is present; if so, the next step, feature extraction, can be carried out. Classification mainly involves building a vehicle classification model from vehicle sample information and adjusting the relevant parameters of the model, on which the whole vehicle identification task is finally based.

4. The Case Analysis of Vehicle Detection

4.1. Static vehicle image recognition
First, a vehicle image is detected according to the requirements of this paper. The original image and detection results are shown in Fig. 11.

(a) Original vehicle image (b) Vehicle detection result


Figure 11. Comparison of vehicle detection
The vehicle in Fig. 11(a) is detected, and the results are shown in Fig. 11(b). It can be seen that the detection algorithm correctly frames the car, with a detection accuracy as high as 99.99%. To rule out chance, this paper gives a comparison of another group of test results, shown in Fig. 12.

(a) Original vehicle image (b) Vehicle detection result


Figure 12. Comparison diagram of vehicle detection

To show that the detection result in Fig. 11 is not accidental, the material in Fig. 12 adds a different background, and the vehicle is located much farther away. Nevertheless, according to the detection results, the method still identifies the vehicle well: as seen in Fig. 12, the vehicle recognition accuracy reaches 99.43%. From the standpoint of probability theory, we can say this is a car. The above are single-target detection results; to further illustrate the correctness of the method, multi-target detection results are shown in Fig. 13.

(a) Original vehicle image (b) Vehicle detection result


Figure 13. Comparison of vehicle detection results

Fig. 13 shows that the detection method proposed in this paper can detect not only a single vehicle but also multiple vehicles. As shown in Fig. 13(b), the detection accuracies for the two vehicles are 99.99% and 97.73% respectively, which further illustrates the accuracy of the results.

4.2. Dynamic vehicle image recognition
For dynamic vehicle image recognition, this paper captures a film video and identifies the vehicles in it. The recognition process is shown in Fig. 14.

(a) Dynamic vehicle identification I (b) Dynamic vehicle identification II

(c) Dynamic vehicle identification III (d) Dynamic vehicle identification IV

(e) Dynamic vehicle identification V (f) Dynamic vehicle identification VI


Figure 14. Dynamic vehicle identification diagram

Fig. 14 shows the vehicle recognition process in the film material used in this paper; the six subfigures show the cars that appear as the recognition results. It can be seen that the algorithm provided in this paper also realizes recognition of whole vehicles while they are in motion. The figure includes the recognition of both single and multiple vehicles, and of vehicles in both good and dim light. In any of these environments, the recognition method in this paper identifies the dynamic process of the vehicles well.

5. Conclusion
This paper takes vehicle recognition as its research objective and studies the application of deep learning in image processing, covering image information acquisition, vehicle image preprocessing, vehicle image segmentation, vehicle image feature extraction and vehicle recognition. In the research process, the convolutional neural network method of deep learning is studied, and the SSD algorithm based on convolutional neural networks is used to realize recognition of vehicle images. In the identification process itself, OpenCV, an open-source toolbox, is used to detect the target object, and detection of both the stationary and the moving state of the vehicle is realized.

References
[1] Hinton G E, Osindero S, Teh Y W. A fast learning algorithm for deep belief nets[J]. Neural Computation, 2006, 18(7): 1527-1554.
[2] Greenspan H, Van Ginneken B, Summers R M. Guest editorial deep learning in medical imaging: Overview and future promise of an exciting new technique[J]. IEEE Transactions on Medical Imaging, 2016, 35(5): 1153-1159.
[3] Amodei D, Ananthanarayanan S, Anubhai R, et al. Deep Speech 2: End-to-end speech recognition in English and Mandarin[C]. International Conference on Machine Learning, 2016: 173-182.
[4] Collobert R, Weston J. A unified architecture for natural language processing: Deep neural networks with multitask learning[C]. Proceedings of the 25th International Conference on Machine Learning. ACM, 2008: 160-167.
[5] Ding Hong, Rao Wanxian. Design of a human behavior detection system based on deep learning[J]. Electronic Technology and Software Engineering, 2019, No.155(09): 79-80.

[6] Gao Yuan, Wang Xiaochen, Qin Pinle, et al. Super-resolution reconstruction of medical images based on depthwise-separable convolution and wide residual network[J]. Journal of Computer Applications, 2019, 39(09): 2731-2737.
[7] Jibbe M K, Kannan S. Method to handle demand based dynamic cache allocation between SSD and RAID cache: U.S. Patent Application 12/070,531[P]. 2009-8-20.
[8] Liu W, Anguelov D, Erhan D, et al. SSD: Single shot multibox detector[C]. European Conference on Computer Vision. Springer, Cham, 2016: 21-37.
[9] Qin Xiaowen, Wen Zhifang, Qiao Weiwei. Image processing based on OpenCV[J]. Electronic Testing, 2011, No.229(07): 39-41.
[10] Qian Y, Fan Y, Hu W, et al. On the training aspects of deep neural network (DNN) for parametric TTS synthesis[C]. 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2014: 3829-3833.
[11] Lu Hongtao, Zhang Qinchuan. Applications of deep convolutional neural networks in computer vision[J]. Journal of Data Acquisition and Processing, 2016, 31(01): 1-17.
