Article
A Multi-Scale Target Detection Method Using an Improved
Faster Region Convolutional Neural Network Based on
Enhanced Backbone and Optimized Mechanisms
Qianyong Chen, Mengshan Li * , Zhenghui Lai, Jihong Zhu and Lixin Guan
College of Physics and Electronic Information, Gannan Normal University, Ganzhou 341000, China;
[email protected] (Q.C.); [email protected] (Z.L.); [email protected] (J.Z.);
[email protected] (L.G.)
* Correspondence: [email protected]
Abstract: Currently, existing deep learning methods exhibit many limitations in multi-target detection,
such as low accuracy and high rates of false detection and missed detections. This paper proposes
an improved Faster R-CNN algorithm, aiming to enhance the algorithm’s capability in detecting
multi-scale targets. This algorithm has three improvements based on Faster R-CNN. Firstly, the
new algorithm uses the ResNet101 network for feature extraction of the detection image, which
achieves stronger feature extraction capabilities. Secondly, the new algorithm integrates Online Hard
Example Mining (OHEM), Soft non-maximum suppression (Soft-NMS), and Distance Intersection
Over Union (DIOU) modules, which improves the positive and negative sample imbalance and the
problem of small targets being easily missed during model training. Finally, the Region Proposal
Network (RPN) is simplified to achieve a faster detection speed and a lower miss rate. The multi-scale
training (MST) strategy is also used to train the improved Faster R-CNN to achieve a balance between
detection accuracy and efficiency. Compared to the other detection models, the improved Faster
R-CNN demonstrates significant advantages in terms of mAP@0.5, F1-score, and log-average miss
rate (LAMR). The model proposed in this paper provides valuable insights and inspiration for many
fields, such as smart agriculture, medical diagnosis, and face recognition.
Keywords: DIoU; improved Faster R-CNN; multi-scale target detection; ResNet101; Soft-NMS

1. Introduction

Object detection has long been a research focus for computer vision, and it has been extensively applied in areas such as face recognition, medical image diagnosis, and road detection [1–3]. At present, deep learning-based target detection methods can be broadly categorized into two main classes [4,5]. One class is the two-stage object detection approach typified by Region-based Convolutional Neural Networks (R-CNNs), and the other class is the one-stage approach typified by You Only Look Once (YOLO). These two types of algorithms have their own characteristics and advantages [6]. One-stage target detection algorithms detect targets directly on the original image without a region proposal step, so these algorithms are relatively faster, but the detection accuracy decreases when detecting different multi-scale targets. Two-stage object detection algorithms exhibit relatively higher detection accuracy but at the cost of slower processing speeds. Driven by the rapid advancement of deep learning, target detection algorithms have achieved impressive gains in both accuracy and processing speed. R-CNN [7] is the seminal work in object detection algorithms, which computes over candidate regions generated by the selective search method, and further applies SVM classification and bounding box regression. Consequently, R-CNN consumes too much time in image processing, resulting in low detection efficiency. He et al. [8] introduced the Faster R-CNN on the basis of R-CNN, which introduced the Region Proposal Network (RPN) to generate candidate regions and utilized shared convolutional
features to further improve detection accuracy and efficiency. Cai et al. [9] introduced Cas-
cade R-CNN, which is best characterized by cascading classifiers and multi-stage training
to improve detection accuracy and speed. Wan et al. [10] proposed an improved version
of Faster R-CNN with optimized convolutional and pooling layers for detecting a wide
range of fruits and achieving higher accuracy. Yang et al. [11] introduced an improved
strawberry detection algorithm based on Mask R-CNN, which resulted in a substantial
improvement in model generalization and robustness. After the R-CNN family of algo-
rithms, Redmon et al. [12] proposed YOLO as an alternative to R-CNN. Unlike R-CNN,
YOLO directly predicts classifications and regressions from features, using a single fully
connected layer for both tasks. This design enhances speed and efficiency in processing.
However, the disadvantage of YOLO is that its generalization ability and robustness are not
strong, and it is easy to miss small targets. With the aim of improving the above problems
effectively, Liu et al. [13] introduced the Single Shot MultiBox Detector (SSD) family of
algorithms. Zhu et al. [14] used SSD to detect fruits on mango trees with an F1 of 0.91.
Anagnostis et al. [15] used SSD to categorize infected trees in walnut orchards and the
method detected whether walnut leaves were infected with 87% accuracy. Tian et al. [16]
proposed EasyRP-R-CNN, a convolution-based framework for cyclone detection. The
method was improved based on Region of Interest and achieved satisfactory detection
accuracy. Li et al. [17] proposed a lightweight convolutional neural network, WearNet, to
achieve automatic detection of scratches on contact sliding parts such as metal molding,
and the classification accuracy of the method can reach 94.16%.
All of the methods mentioned above have some problems. (1) These methods detect and recognize only a single target category, or none at all, so they cannot satisfy multi-target detection tasks and are not able to accurately localize and recognize small targets. (2) These
methods do not have strong feature extraction capabilities for small targets and cannot
extract enough information about the target features. Thus, they can generate noise in the
detection region, resulting in a decrease in accuracy. (3) These methods do not achieve a
balance between detection accuracy and speed to meet the real-time demands of detection
tasks. For the purpose of improving the detection accuracy in a multi-scale target environ-
ment, after considering accuracy and detection efficiency, this paper chooses to use Faster
R-CNN as a baseline to detect different multi-scale targets on the Pascal VOC (Visual Object
Classes) dataset [18]. In this paper, the following modifications are made while reducing
model computation and improving model detection performance:
(1) ResNet101 [19] is employed as the trunk network in the improved Faster R-CNN,
which enhances the feature extraction capabilities of the model.
(2) The Online Hard Example Mining (OHEM) algorithm [20] is used to help the model learn
hard-to-classify samples more efficiently, which in turn enhances the model’s capacity
for generalization. The Soft non-maximum suppression (Soft-NMS) algorithm [21] and
the Distance Intersection Over Union (DIOU) algorithm [22] are used to optimize the ex-
cessive bounding boxes generated by the RPN and their overlap degree, which enhances
the accuracy of detecting small targets and improves the issue of missed target detection.
(3) The RPN structure is optimized by adding an anchor box with a scale of 64 and using
a smaller convolutional kernel to achieve bounding box regression. Employing the
multi-scale training (MST) method to train the improved Faster R-CNN [23] achieves
a balance between detection accuracy and speed.
By comparing the four networks, VGG16, ResNet34, ResNet50, and ResNet101 [19], ResNet101 is chosen as the trunk network. The introduction of DIOU can increase the effectiveness of the improved Faster R-CNN regarding the problems of slow convergence of the target detection loss function and target regression localization accuracy. The introduction of OHEM enables the improved Faster R-CNN to mine difficult samples in the dynamic training process. This can improve the issue of imbalance between positive and negative samples during the training process. As depicted in Figure 1, the feature map from the ResNet101 trunk network is input into the optimized RPN. At this point, a large number of candidate proposal boxes are generated on the feature map. Soft-NMS is used to eliminate redundant target proposal boxes. It can reduce the miss rate of small targets by gradually decreasing the confidence score of overlapping proposal boxes.
2.1.1. Improved Backbone Network
VGG16 [25] serves as the trunk network for the original Faster R-CNN. In general, data expansion and increasing network depth methods can be used to improve model performance, and network depth is very important for optimizing network performance [26]. ResNet [27] makes it possible for information to skip certain layers directly by introducing direct connections across layers in the network. This effectively improves the problems of gradient vanishing and gradient explosion. ResNet mainly uses convolutional operations instead of fully connected layers, which reduces the number of network parameters and can effectively avoid overfitting problems. Common ResNet structures are ResNet34, ResNet50, and ResNet101 [28]. One of the most significant features of ResNet50 compared to ResNet34 is the introduction of a new bottleneck residual block structure. It comprises a sequence of a 1 × 1 convolutional layer, followed by a 3 × 3 convolutional layer, and then another 1 × 1 convolutional layer. This structure allows ResNet50 to have stronger feature representation while maintaining the depth of the model. ResNet101 has an additional set of convolutional blocks compared to ResNet50, which contains multiple residual units. The deeper network structure allows ResNet101 to further enhance its expressive and learning capabilities, allowing it to better capture image details and semantic information, as shown in Figure 2.

ResNet101 consists of two fundamental blocks, as depicted in Figure 2a and Figure 2b, respectively, named Conv Block and Identity Block. The Conv Block's input and output dimensions are different and cannot be connected in series; its function is to change the network dimension. The input dimensions and output dimensions of the Identity Block are the same and can be connected in series to deepen the network. ResNet101 consists of multiple residual units in each convolutional block, as illustrated in Figure 2c. Each residual unit performs three convolutional operations. The shortcut connections of residual units and the identity mapping help address the issues of gradient vanishing or exploding, which can lead to a decrease in detection accuracy. As shown in Figure 2d, ResNet101 consists of five convolutional layers and has a depth of 101 layers, which can provide stronger feature extraction capabilities.

Figure 2. ResNet101 network composition. (a) Conv Block. (b) Identity Block. (c) Residual block. (d) ResNet101 network structure.
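To make the bottleneck structure described above concrete, a minimal PyTorch sketch of one residual unit is given below. The channel sizes and the projection shortcut are illustrative assumptions, not the exact configuration used by the authors.

```python
import torch
import torch.nn as nn

class Bottleneck(nn.Module):
    """One ResNet bottleneck residual unit: 1x1 -> 3x3 -> 1x1 convolutions
    plus a shortcut connection, as described for ResNet50/ResNet101."""
    def __init__(self, in_channels, mid_channels, out_channels, stride=1):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels, mid_channels, kernel_size=1, bias=False)
        self.bn1 = nn.BatchNorm2d(mid_channels)
        self.conv2 = nn.Conv2d(mid_channels, mid_channels, kernel_size=3,
                               stride=stride, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(mid_channels)
        self.conv3 = nn.Conv2d(mid_channels, out_channels, kernel_size=1, bias=False)
        self.bn3 = nn.BatchNorm2d(out_channels)
        self.relu = nn.ReLU(inplace=True)
        # "Conv Block" variant: projection shortcut when dimensions change;
        # "Identity Block" variant: plain identity when they match.
        if stride != 1 or in_channels != out_channels:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_channels, out_channels, kernel_size=1,
                          stride=stride, bias=False),
                nn.BatchNorm2d(out_channels))
        else:
            self.shortcut = nn.Identity()

    def forward(self, x):
        identity = self.shortcut(x)
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.relu(self.bn2(self.conv2(out)))
        out = self.bn3(self.conv3(out))
        return self.relu(out + identity)

# Example: one unit of a deeper stage (the channel numbers are placeholders).
block = Bottleneck(in_channels=256, mid_channels=128, out_channels=512, stride=2)
feat = block(torch.randn(1, 256, 100, 100))
```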
In the fourth chapter, comparative experiments are conducted for VGG16, ResNet34, ResNet50, and ResNet101. Experimental results indicate that ResNet101 outperforms the other three networks in overall detection performance. Therefore, ResNet101 has been selected as the trunk network for the improved Faster R-CNN.

2.1.2. Modifying the Region Proposal Network

In the object detection process, CNNs are usually used to extract image features. These features are convolved and pooled through several convolution and pooling operations to produce a smaller feature map (also referred to as an activation map). The activation map contains semantic information and location information taken from the image, and then the activation map is input into the RPN. The RPN [8,29] will first further extract features from the input activation map through a shared convolutional layer, and then it will initially generate a set of predefined anchor boxes at each spatial position of the activation map to accomplish object detection. These predefined anchor boxes generated during the detection process usually have different width-to-height ratios as well as areas. Figure 3 shows the schematic of the optimized RPN.

For each anchor box, the RPN uses a binary classifier to predict whether it contains a target. It outputs the probability that each anchor box belongs to the foreground or background to complete the initial classification prediction. The conventional RPN generates nine anchor boxes at each spatial position of the activation map, with aspect ratios of 1:1, 1:2, and 2:1 and scales of 128, 256, and 512, respectively, as shown in Figure 3a. In order to improve the accuracy of detecting small targets and reduce the miss rate, a new anchor box with a scale of 64 is added to the RPN while the aspect ratios remain unchanged [30], so that there are a total of 12 anchor boxes at each position of the feature map, which is then able to better capture small-scale targets, as shown in Figure 3b. In addition to the classification predictions, the RPN completes an initial bounding box regression on the positive samples (anchor boxes containing a target) with the goal of accurately predicting the target's bounding box position. The output is the translation and scaling parameters relative to each positive-sample anchor box. As depicted in Figure 3c, the structure of the RPN is simplified, and only a 3 × 3 convolutional kernel is used to generate 256 feature maps. The benefits of using a 3 × 3 convolution kernel include reducing the number of parameters and the computational burden, improving computational efficiency, capturing local features, effectively utilizing boundary information, and simplifying the network design. The 3 × 3 convolution kernel is able to maintain model simplicity and computational efficiency while ensuring the effectiveness and accuracy of feature extraction.

Figure 3. Optimized RPN schematic. (a) RPN sliding window and anchors. (b) Modified RPN anchors. (c) Modified RPN.
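A minimal sketch of how the extra 64 scale changes the per-position anchor set is shown below. The function name and the base-size convention are assumptions for illustration, not the exact anchor code of the modified RPN.

```python
import itertools
import torch

def make_anchors(scales=(64, 128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return (len(scales)*len(ratios), 4) anchors centered at the origin,
    encoded as (x1, y1, x2, y2). Adding the 64 scale to the usual
    (128, 256, 512) gives 12 anchors per feature-map position."""
    anchors = []
    for scale, ratio in itertools.product(scales, ratios):
        w = scale * (ratio ** 0.5)   # width and height chosen so that
        h = scale / (ratio ** 0.5)   # w * h == scale**2 and w / h == ratio
        anchors.append([-w / 2, -h / 2, w / 2, h / 2])
    return torch.tensor(anchors)

anchors = make_anchors()
print(anchors.shape)  # torch.Size([12, 4])
```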
Figure 4. Schematic representation of the different levels of overlap of the bounding box. (a) Three possible scenarios when IoUs are identical. (b) DIOU loss for bounding box regression.
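The DIOU loss referenced in Figure 4 and in the ablation study augments the IoU term with a normalized center-point distance penalty [22]. A minimal sketch of that standard formulation is given below; it illustrates the published definition rather than the authors' exact implementation.

```python
import torch

def diou_loss(pred, target, eps=1e-7):
    """DIoU loss for boxes given as (x1, y1, x2, y2) tensors of shape (N, 4):
    1 - IoU + (squared center distance) / (squared diagonal of enclosing box)."""
    # Intersection and union for the IoU term.
    lt = torch.max(pred[:, :2], target[:, :2])
    rb = torch.min(pred[:, 2:], target[:, 2:])
    wh = (rb - lt).clamp(min=0)
    inter = wh[:, 0] * wh[:, 1]
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    iou = inter / (area_p + area_t - inter + eps)

    # Squared distance between box centers.
    center_p = (pred[:, :2] + pred[:, 2:]) / 2
    center_t = (target[:, :2] + target[:, 2:]) / 2
    rho2 = ((center_p - center_t) ** 2).sum(dim=1)

    # Squared diagonal of the smallest box enclosing both boxes.
    enc_lt = torch.min(pred[:, :2], target[:, :2])
    enc_rb = torch.max(pred[:, 2:], target[:, 2:])
    c2 = ((enc_rb - enc_lt) ** 2).sum(dim=1) + eps

    return (1 - iou + rho2 / c2).mean()
```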
(2) OHEM

During the object detection training process, the activation maps from the trunk network are input into the RPN, at which time a large number of candidate proposal boxes are generated. Generally, the IOU threshold is set to 0.5 and proposal boxes with an IOU above 0.5 are retained as positive samples, while those below 0.5 are treated as negative samples. This leads to significantly more positive samples than negative ones. As a result, the model may overlook difficult negative samples that are challenging to detect. These difficult samples can contribute higher loss values, improving the overall detection performance of the model.

In order to enable more difficult samples to be used in the dynamic process of training, OHEM is introduced into the improved Faster R-CNN [20]. Figure 5 illustrates the working principle of OHEM. OHEM [33] improves the training effect of the model by dynamically selecting and processing those difficult samples that the model currently has difficulty classifying during the training process. Specifically, it first detects the input image during each training session and selects the samples with higher loss as difficult samples according to the loss value. These difficult samples are then utilized for backpropagation and parameter updating so that the model learns to correctly classify difficult cases faster, thus improving its overall performance. The OHEM training strategy in the dynamic process of training can send the target samples that are difficult to detect into the network again for deep learning training. This makes the network more sensitive to the detection target, which in turn improves the target detection accuracy.
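A minimal sketch of the hard-example selection step is shown below: per-ROI losses are computed, the highest-loss ROIs are kept, and only those contribute to the backward pass. The model interface, the fraction kept, and the helper names are assumptions for illustration.

```python
import torch

def select_hard_examples(per_roi_loss, num_hard):
    """Online hard example mining: given a 1-D tensor of per-ROI losses,
    return the indices of the num_hard ROIs with the highest loss."""
    num_hard = min(num_hard, per_roi_loss.numel())
    _, hard_idx = torch.topk(per_roi_loss, k=num_hard)
    return hard_idx

def ohem_step(model, images, rois, targets, loss_fn, optimizer, num_hard=128):
    """Illustrative training step; loss_fn is assumed to return one loss per ROI."""
    logits = model(images, rois)                  # forward pass on all ROIs
    per_roi_loss = loss_fn(logits, targets)       # shape: (num_rois,)
    hard_idx = select_hard_examples(per_roi_loss.detach(), num_hard)
    loss = per_roi_loss[hard_idx].mean()          # backpropagate only hard ROIs
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```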
Figure 5. Diagram of the OHEM algorithm.
(3) Soft-NMS

NMS is a very important algorithm in target detection. Its basic idea is to sort the proposal boxes by their confidence scores and retain the one with the highest score. In this process, if the overlap between two proposal boxes exceeds a set threshold (generally 0.5), the box with the lower score is discarded, and the one with the higher score is retained. Therefore, the NMS score is based solely on the classification confidence, without considering the localization accuracy of the bounding box. This means that the classification and localization confidences are not positively correlated.

To effectively address some of the limitations of NMS, Soft-NMS is adopted in the improved Faster R-CNN [21]. Its linear weighted equation is defined as Equation (2).
$$
S_i =
\begin{cases}
S_i, & \mathrm{IOU}(M, b_i) < N_t \\
S_i \left(1 - \mathrm{IOU}(M, b_i)\right), & \mathrm{IOU}(M, b_i) \ge N_t
\end{cases}
\tag{2}
$$
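A minimal sketch of this linear re-weighting rule is given below (the symbols are defined in the paragraph that follows). It is a direct transcription of Equation (2) for illustration, not an optimized implementation.

```python
import torch

def box_iou_single(m, boxes, eps=1e-7):
    """IoU between one box m (4,) and boxes (N, 4), all as (x1, y1, x2, y2)."""
    lt = torch.max(m[:2], boxes[:, :2])
    rb = torch.min(m[2:], boxes[:, 2:])
    wh = (rb - lt).clamp(min=0)
    inter = wh[:, 0] * wh[:, 1]
    area_m = (m[2] - m[0]) * (m[3] - m[1])
    area_b = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area_m + area_b - inter + eps)

def soft_nms_linear(boxes, scores, iou_thresh=0.5, score_thresh=0.001):
    """Linear Soft-NMS: instead of deleting boxes that overlap the current best
    box by more than Nt, decay their scores by (1 - IoU)."""
    scores = scores.clone()
    keep = []
    idx = torch.arange(scores.numel())
    while idx.numel() > 0:
        best = int(torch.argmax(scores[idx]))
        m = idx[best]
        keep.append(int(m))
        idx = torch.cat([idx[:best], idx[best + 1:]])        # remaining boxes
        if idx.numel() == 0:
            break
        iou = box_iou_single(boxes[m], boxes[idx])
        decay = torch.where(iou >= iou_thresh, 1.0 - iou, torch.ones_like(iou))
        scores[idx] = scores[idx] * decay
        idx = idx[scores[idx] > score_thresh]                 # drop negligible boxes
    return keep
```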
Here, Si is the proposal box confidence score and M is the proposal box with the highest score. The set of all proposal boxes during training is denoted by b, the area of overlap between M and the proposal boxes in set b is denoted by IOU(M, bi), and Nt is the IOU threshold set at the beginning. The difference between Soft-NMS and NMS is the way overlapping bounding boxes are handled. Traditional NMS directly removes other bounding boxes that overlap with the highest-scoring box above a certain threshold, which may result in some correct detections being mistakenly removed. In contrast, Soft-NMS does not remove these overlapping bounding boxes, but gradually reduces their scores. Soft-NMS works by attenuating the scores proportionally to the degree of overlap between the bounding box and the highest-scoring box [34]. The larger the overlap, the more the score is attenuated. In this way, Soft-NMS is able to retain more valuable detection results, reduce missed detections, and improve the accuracy of target detection.

2.1.4. Multi-Scale Training

The MST [23] method can enhance the model's adaptability and generalization capabilities when detecting a variety of target sizes. An image pyramid called MST [35] is used in the training process of the CNN. As shown in Figure 6, image training is achieved by randomly inputting images of different scales within a given range of image sizes, which enables the target detection model to adapt to targets of different scales. In the testing phase, the same image at different scales is input for multiple detections. Finally, Soft-NMS is employed to integrate all detection results, which enables the detection model to cover as many targets as possible at multiple scales and improves the robustness of the detection model. During the feature extraction phase, the generated activation map will be significantly smaller than the original image. This can make it challenging for the model to focus on the details of small targets. Therefore, by providing the model with larger and richer images, its detection capabilities can be enhanced effectively [36]. In this paper, MST is used to train the improved Faster R-CNN. The training samples have image lengths ranging from 380 to 640 and image widths spanning 300 to 450.
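A minimal sketch of the multi-scale input step is shown below: each training image is resized to a randomly drawn size inside the stated length (380–640) and width (300–450) windows. The interpolation mode and the omission of bounding-box rescaling are simplifying assumptions for illustration.

```python
import random
import torch
import torch.nn.functional as F

def multi_scale_resize(image, length_range=(380, 640), width_range=(300, 450)):
    """Resize a (C, H, W) image tensor to a randomly sampled size so the detector
    sees targets at many scales during training. In a full pipeline the ground
    truth boxes would be rescaled by the same factors."""
    new_h = random.randint(*width_range)
    new_w = random.randint(*length_range)
    resized = F.interpolate(image.unsqueeze(0), size=(new_h, new_w),
                            mode="bilinear", align_corners=False)
    return resized.squeeze(0)

img = torch.rand(3, 400, 400)
print(multi_scale_resize(img).shape)  # e.g. torch.Size([3, 412, 563])
```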
Figure 6. Diagram of the multi-scale training strategy.
$$\mathrm{AP} = \int_0^1 p(r)\,\mathrm{d}r \times 100\%$$

$$F1 = \frac{2PR}{P + R} \times 100\%$$

$$\mathrm{LAMR} = \frac{1}{M} \sum_{i=1}^{M} \log(\mathrm{MR}_i)$$

$$\mathrm{mAP} = \frac{1}{K} \sum_{i=1}^{K} \mathrm{AP}_i \times 100\%$$
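As a quick illustration of how these evaluation metrics combine, a small sketch is given below; the variable names and the example numbers are placeholders rather than values from the paper.

```python
import math

def f1_score(precision, recall):
    """F1 = 2PR / (P + R), reported as a percentage."""
    return 2 * precision * recall / (precision + recall) * 100.0

def mean_ap(per_class_ap):
    """mAP: average of the K per-class AP values (given as fractions)."""
    return sum(per_class_ap) / len(per_class_ap) * 100.0

def log_average_miss_rate(miss_rates):
    """Average of the log miss rates over M operating points, per the formula above."""
    return sum(math.log(mr) for mr in miss_rates) / len(miss_rates)

print(f1_score(0.62, 0.59))             # placeholder precision/recall
print(mean_ap([0.71, 0.80, 0.76]))      # placeholder per-class APs
print(log_average_miss_rate([0.3, 0.25, 0.4]))
```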
Figure 7. Data enhancement effect diagram.
3.2. Results

All experiments are conducted on the Colab cloud server platform, using a Tesla T4 graphics card with 16 GB of video memory and Windows as the operating system. The deep learning framework used is PyTorch 1.9, with 100 training epochs and all images
uniformly resized to 400 × 400. To improve model robustness and efficiency, the expanded
dataset is divided into three categories, A, B, and C [30], and the proportional allocation of
the dataset in each category is depicted in Table 2.
The improved Faster R-CNN is first applied to the training set of three datasets and the
results are recorded. During the training process, model error is minimized by adjusting
various training parameters over 100 iterations of training. Finally, the improved Faster R-CNN is evaluated using the test data from the three datasets. Figure 8 shows the experimental
result curves and data distributions during the training process.
Figure 8. The result curves and data distribution for datasets A, B, and C. (a) Training accuracy curve. (b) Distribution of training accuracy data. (c) Training loss curve. (d) Distribution of training loss data. (e) Test accuracy curve. (f) Distribution of test accuracy data. (g) Test loss curve. (h) Distribution of test loss data.
By observing the changes and positions of the curves, we know that the accuracy curves of dataset A are steadily increasing and located at the top of the three curves. This indicates that the improved Faster R-CNN has the best performance on dataset A. Meanwhile, the loss value curve of dataset A decreases steadily. The fluctuation is small, located at the bottom of the three curves, and gradually tends to be stable. This also indicates that the improved Faster R-CNN has stronger stability and robustness on dataset A. The variation of these curves can be seen in Figure 8a,c,e,g. Accordingly, by observing the data distribution of the experimental results, it can be found that, compared to the training and testing sets of datasets B and C, the accuracy values of dataset A are more centrally distributed and have higher positions, while the loss values are at lower distribution positions and more compactly distributed. The improved Faster R-CNN is adequately trained on dataset A with better generalization performance. This is because 80% of the images in dataset A are used for training so that the model can extract richer information about image features. Based on these experimental results, dataset A is used for all subsequent experiments.

4. Discussion

4.1. Comparison of Trunk Networks

In this section, comparative experiments [24] are conducted for VGG16, ResNet34, ResNet50, and ResNet101. A total of 15,000 images of dataset A are divided in the ratio of 8:1:1, and SGD is employed to optimize the model [42]. The momentum is 0.9, the learning rate is 0.001, and the F1 threshold is 0.5. After 100 training epochs, the model training curve converges, indicating that the model has reached an optimal solution. The improved Faster R-CNN improves the capability of target detection by combining different trunk networks. Table 3 shows the experimental results on the test set.

Table 3. Predictive performance of different trunk networks.
Methods mAP@0.5 (%) F1 (%) LAMR (%)
VGG16 72.7 55.3 31.4
ResNet34 73.3 56.4 30.8
ResNet50 73.8 56.2 30.2
ResNet101 74.9 57.2 29.5
As can be seen in Table 3, the trunk network with a residual structure has stable detection
performance in the multi-scale target category. With the increase of network layers, the
detection accuracy of ResNet50 surpasses that of VGG16 and there are fewer missed detections
for targets. Since the residual unit module inside the ResNet101 network is connected by a
multilevel residual network, this makes it better able to capture the local feature information
of multi-scale targets. ResNet101 has the best detection effect, with mAP@0.5 reaching 74.9%
and the LAMR reaching 29.5%. Therefore, from Table 3, it can be seen that the improvement
of the Faster R-CNN trunk network is effective. ResNet101 performs better in multi-scale
target category detection, and the miss rate for small targets is lower.
Figure 9 compares the detection results of the four trunk networks. Focusing on the
first two columns, when using ResNet101 as the feature extractor, detection performance
is notably improved compared to the other trunk networks. The target detection scores
are generally higher, the target localization is more accurate, and there are no target
misdetections or missed detections. VGG16 and ResNet50 mistakenly detected the ponytail
as a person, while ResNet34 missed the couch and did not detect the chair in the upper right
corner. In the third resultant image containing only small targets, ResNet101 can accurately
detect the person and the occluded bicycle in the image. This further demonstrates better
detection performance when using ResNet101.
Figure 9. Visual comparison of detection results for four backbone networks.
4.2. Comparison of Different Object Detectors

To demonstrate that the proposed method has better detection effectiveness and accuracy, experiments are also performed on the test set of the expanded dataset for RetinaNet [43], Faster R-CNN [8], Mask R-CNN [44], YOLOv4 [45], and Cascade R-CNN [9]. The SGD optimizer is used to optimize the model with a momentum of 0.9 and a learning rate of 0.001, with the learning rate decaying by a factor of 0.1 every 20 epochs. By continuously adjusting the training parameters, after 100 epochs, the training curves of each comparison model gradually level off. This indicates that the model training process is relatively smooth. Table 4 shows the mean values of each metric for the 20 targets detected by the six comparison models on the test set. Figure 10 shows a comparison of the metric values and their data distributions for the 20 detection targets of the six comparison models on the test set.
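The optimizer settings described above map directly onto standard PyTorch components; a minimal sketch is shown below. The model being optimized and the absence of weight decay are placeholders, since they are not specified in the text.

```python
import torch

model = torch.nn.Linear(10, 2)  # placeholder for the detection model
optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
# Decay the learning rate by a factor of 0.1 every 20 epochs.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=20, gamma=0.1)

for epoch in range(100):
    # ... one epoch of training on the expanded dataset ...
    scheduler.step()
```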
Table 4. Experimental mean results for six comparison models.

Methods mAP@0.5 (%) F1 (%) LAMR (%) T (s)
RetinaNet 75.6 58.5 29.2 0.155
Faster R-CNN 72.7 55.3 31.4 0.147
Mask R-CNN 75.8 58.2 28.5 0.153
YOLOv4 76.5 59.2 27.2 0.132
Cascade R-CNN 76.2 59.7 28.1 0.139
This Paper 77.8 60.6 26.5 0.163
The improved Faster R-CNN improves mAP@0.5 to 77.8%, F1 to 60.6%, and the LAMR to 26.5%. Compared to the other five detection models, the proposed method has better performance. Specifically, compared with the original Faster R-CNN, the mAP@0.5 is improved by 5.1%, the F1 is improved by 5.3%, and the LAMR is reduced by 4.9%. This indicates that the proposed method is effective in improving the detection rate of small targets.
Figure 10. Comparison of experimental result values and their data distribution for 20 detection targets for each comparison model. (a,b) Comparisons of AP values. (c,d) Comparisons of F1 values. (e,f) Comparisons of LAMR values.
Observing Figure 10a,b, it can be seen that 15 AP values of the proposed approach
are located in the first position, and the data distribution of the prediction results is more
centralized than the other compared models. This indicates that the overall detection accuracy
of the model is higher. From Figure 10c,d, it can be seen that 17 F1 values are in the leading
position, among which two F1 values exceed 0.8, and the distribution intervals of the F1 values
are relatively higher. This indicates that the model is more adaptable when facing multi-scale
target categories. As shown in Figure 10e,f, the prediction results of the proposed approach
all have LAMR values below 0.5, and the miss rate is also reduced for small targets such as
pottedplant (1), sheep (4), and bottle (5). The reason for this may be that the bounding box
optimization mechanism, as well as the introduced multi-scale training strategy, plays a role.
The data distribution interval of the LAMR is also relatively lower. This indicates that the
model is able to focus on small targets that are difficult to detect when performing multi-scale
target category detection, and the model is more robust and stable. Figure 11 shows the
visualization of the detection image after mosaic data enhancement. Compared to the other
detection models, although the proposed approach is not the best in each target category, the
overall effect is excellent. The very small car in the upper right corner, as well as the occluded
chair in the lower left corner, can be accurately detected, and both target detection scores are
relatively high. This demonstrates that the proposed approach can better adapt to the different sizes and shapes of targets in different multi-scale detection target environments and improve the accuracy of small target detection.
Figure 11. Visual comparison of detection results for six comparison models.
Table 5. Results of the ablation experiments.

Method ResNet101 RPN DIOU OHEM Soft-NMS MST mAP@0.5 (%) F1 (%) LAMR (%)
Faster R-CNN (VGG16) - - - - - - 72.7 55.3 31.4
Faster R-CNN (ResNet101) ✓ - - - - - 74.9 57.2 29.5
Improve1 ✓ ✓ - - - - 75.2 57.4 29.2
Improve2 ✓ - ✓ - - - 75.1 57.3 29.3
Improve3 ✓ - - ✓ - - 75.4 57.6 28.9
Improve4 ✓ - - - ✓ - 75.0 57.4 29.3
Improve5 ✓ ✓ ✓ ✓ ✓ - 75.5 58.1 28.7
This paper ✓ ✓ ✓ ✓ ✓ ✓ 77.8 60.6 26.5
As depicted in Table 5, the addition of the modified RPN to the Faster R-CNN
(ResNet101) increases the mAP@0.5 by 0.3% and the F1 by 0.2%. The introduction of DIOU improves the mAP@0.5 by 0.2% and the F1 by 0.1%. It can be seen that the introduction of OHEM increases the mAP@0.5 by 0.5% and the F1 by 0.4%. This shows
its improvement for the problems of sample imbalance and insufficient training of hard
case samples in the target detection task. After replacing NMS with Soft-NMS, the miss
rate is reduced, while performance and accuracy are improved. Finally, by combining
the modules with a multi-scale training strategy, an improved Faster R-CNN is obtained
with a mAP of 77.8%, which is 5.1% higher than the original Faster R-CNN, and the
F1 is improved by 5.3%. It is worth noting that the LAMR of the proposed approach
compared to Improve5 is lower. This improvement is attributed to the efficacy of the
MST [16] in reducing the miss rate and enhancing the model’s recognition capabilities for
small targets. The partial detection results for Faster R-CNN, Improved5, and Improved
Faster R-CNN are shown in Figure 12. A comparison shows that Faster R-CNN is not
very effective in detecting cows and bottles with small sizes and misses some small
targets. While Improved5 improves this phenomenon, the network can detect some
small targets that were originally missed. The best detection was achieved by Improved
Faster R-CNN, which had more accurate edge localization, detected more small targets,
and largely correctly identified overlapping and low-contrast targets. The detection
performance of this model is much stronger.
Figure 12. Visual comparison of detection results for ablation experiments.
5. Conclusions

This paper proposes a novel two-stage object detection model for detecting multi-scale objects from diverse categories. The model introduces improvements to the Faster R-CNN and its trunk feature extraction network. DIOU, OHEM, and Soft-NMS are used to improve the problems of unbalanced positive and negative samples and the target miss rate during model training. The RPN is also optimized, and the proposed approach is trained by employing a multi-scale training strategy. Comparison experiments with trunk networks verify that using the ResNet101 feature extraction network is more advantageous. The validity of the proposed approach is further confirmed by comparison experiments with other detection models. Ablation experiments are also conducted to verify that the modules in the proposed approach can indeed be useful.
Author Contributions: Q.C. and M.L. designed the study; Q.C. and Z.L. performed the research; M.L.
and J.Z. conceived the idea; Q.C., J.Z. and Z.L. provided and analyzed the data; Q.C., M.L. and L.G.
helped perform the analysis with constructive discussions. All authors have read and agreed to the
published version of the manuscript.
Funding: This research was funded by the National Natural Science Foundation of China (Grant
Numbers: 51663001, 52063002, 42061067 and 61741202).
Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.
Data Availability Statement: The dataset is available free of charge at kaggle. (https://www.kaggle.
com/datasets/qianyongchen/dataset, accessed on 10 August 2024).
Acknowledgments: The authors thank the anonymous reviewers and editors for their valuable
comments and constructive suggestions.
Conflicts of Interest: The authors declare no conflicts of interest.
References
1. Zeng, N.; Wu, P.; Wang, Z.; Li, H.; Liu, W.; Liu, X. A Small-Sized Object Detection Oriented Multi-Scale Feature Fusion Approach
with Application to Defect Detection. IEEE Trans. Instrum. Meas. 2022, 71, 1–14. [CrossRef]
2. Deng, Y.; Hu, X.L.; Li, B.; Zhang, C.X.; Hu, W.M. Multi-scale self-attention-based feature enhancement for detection of targets
with small image sizes. Pattern Recogn. Lett. 2023, 166, 46–52. [CrossRef]
3. Ma, Y.L.; Wang, Q.Q.; Cao, L.; Li, L.; Zhang, C.J.; Qiao, L.S.; Liu, M.X. Multi-Scale Dynamic Graph Learning for Brain Disorder
Detection with Functional MRI. IEEE Trans. Neur. Syst. Rehabil. 2023, 31, 3501–3512. [CrossRef] [PubMed]
4. Menezes, A.G.; de Moura, G.; Alves, C.; de Carvalho, A. Continual Object Detection: A review of definitions, strategies, and
challenges. Neural Netw. 2023, 161, 476–493. [CrossRef] [PubMed]
5. Xu, S.B.; Zhang, M.H.; Song, W.; Mei, H.B.; He, Q.; Liotta, A. A systematic review and analysis of deep learning-based underwater
object detection. Neurocomputing 2023, 527, 204–232. [CrossRef]
6. Goswami, P.K.; Goswami, G. A Comprehensive Review on Real Time Object Detection using Deep Learing Model. In Proceedings
of the 2022 11th International Conference on System Modeling & Advancement in Research Trends (SMART), Moradabad, India,
16–17 December 2022; pp. 1499–1502.
7. Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich feature hierarchies for accurate object detection and semantic segmentation. In
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Columbus, OH, USA, 23–28 June 2014;
IEEE: Piscataway, NJ, USA, 2014; pp. 580–587.
8. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Trans.
Pattern Anal. 2017, 28, 1137–1149. [CrossRef] [PubMed]
9. Cai, Z.; Vasconcelos, N. Cascade r-cnn: Delving into high quality object detection. In Proceedings of the IEEE Conference on
Computer Vision and Pattern Recognition(CVPR), Salt Lake City, UT, USA, 18–22 June 2018; IEEE: Piscataway, NJ, USA, 2018;
pp. 6154–6162.
10. Wan, S.; Goudos, S. Faster R-CNN for multi-class fruit detection using a robotic vision system. Comput. Netw. 2020, 168, 107036.
[CrossRef]
11. Yu, Y.; Zhang, K.; Yang, L.; Zhang, D. Fruit detection for strawberry harvesting robot in non-structural environment based on
Mask-RCNN. Comput. Electron. Agric. 2019, 163, 104846. [CrossRef]
12. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; IEEE: Piscataway,
NJ, USA, 2016; pp. 779–788.
13. Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.-Y.; Berg, A.C. Ssd: Single shot multibox detector. In Proceedings of
the 2016 14th European Conference on Computer Vision (ECCV), Amsterdam, The Netherlands, 11–14 October 2016; pp. 21–37.
14. Liang, Q.; Zhu, W.; Long, J.; Wang, Y.; Sun, W.; Wu, W. A real-time detection framework for on-tree mango based on SSD network.
In Proceedings of the 2018 11th International Conference on Intelligent Robotics and Applications (ICIRA), Newcastle, NSW,
Australia, 9–11 August 2018; pp. 423–436.
15. Anagnostis, A.; Tagarakis, A.C.; Asiminari, G.; Papageorgiou, E.; Kateris, D.; Moshou, D.; Bochtis, D. A deep learning approach
for anthracnose infected trees classification in walnut orchards. Comput. Electron. Agric. 2021, 182, 105998. [CrossRef]
16. Tian, X.X.; Bi, C.K.; Han, J.; Yu, C. EasyRP-R-CNN: A fast cyclone detection model. Vis. Comput. 2024, 40, 4829–4841. [CrossRef]
17. Li, W.; Zhang, L.C.; Wu, C.H.; Cui, Z.X.; Niu, C. A new lightweight deep neural network for surface scratch detection. Int. J. Adv.
Manuf. Tech. 2022, 123, 1999–2015. [CrossRef] [PubMed]
18. Tong, K.; Wu, Y. Rethinking PASCAL-VOC and MS-COCO dataset for small object detection. J. Vis. Commun. Image R. 2023,
93, 103830. [CrossRef]
19. Demir, A.; Yilmaz, F.; Kose, O. Early detection of skin cancer using deep learning architectures: Resnet-101 and inception-v3. In
Proceedings of the 2019 Medical Technologies Congress (TIPTEKNO 2019), Izmir, Turkey, 3–5 October 2019; pp. 1–4.
20. Shrivastava, A.; Gupta, A.; Girshick, R. Training region-based object detectors with online hard example mining. In Proceedings
of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; IEEE:
Piscataway, NJ, USA, 2016; pp. 761–769.
21. Bodla, N.; Singh, B.; Chellappa, R.; Davis, L.S. Soft-NMS--improving object detection with one line of code. In Proceedings of the
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; IEEE: Piscataway,
NJ, USA, 2017; pp. 5561–5569.
22. Zheng, Z.; Wang, P.; Liu, W.; Li, J.; Ye, R.; Ren, D. Distance-IoU loss: Faster and better learning for bounding box regression. In
Proceedings of the 2020 20th AAAI Conference on Artificial Intelligence (AAAI-20), New York, NY, USA, 7–12 February 2020;
pp. 12993–13000.
23. Tian, R.; Shi, H.; Guo, B.; Zhu, L. Multi-scale object detection for high-speed railway clearance intrusion. Appl. Intell. 2022, 52,
3511–3526. [CrossRef]
24. Wang, H.; Xiao, N. Underwater object detection method based on improved Faster RCNN. Appl. Sci. 2023, 13, 2746. [CrossRef]
25. Lu, X.; Wang, H.; Zhang, J.J.; Zhang, Y.T.; Zhong, J.; Zhuang, G.H. Research on J wave detection based on transfer learning and
VGG16. Biomed. Signal. Process. 2024, 95, 106420. [CrossRef]
26. Pal, S.K.; Pramanik, A.; Maiti, J.; Mitra, P. Deep learning in multi-object detection and tracking: State of the art. Appl. Intell. 2021,
51, 6400–6429. [CrossRef]
27. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on
Computer Vision and Pattern Recognition(CVPR), Las Vegas, NV, USA, 27–30 June 2016; IEEE: Piscataway, NJ, USA, 2016;
pp. 770–778.
28. Corso, M.P.; Stefenon, S.F.; Singh, G.; Matsuo, M.V.; Perez, F.L.; Leithardt, V.R.Q. Evaluation of visible contamination on power
grid insulators using convolutional neural networks. Electr. Eng. 2023, 105, 3881–3894. [CrossRef]
29. Chan, S.X.; Tao, J.; Zhou, X.L.; Bai, C.; Zhang, X.Q. Siamese Implicit Region Proposal Network with Compound Attention for
Visual Tracking. IEEE Trans. Image Process. 2022, 31, 1882–1894. [CrossRef] [PubMed]
30. Sha, G.; Wu, J.; Yu, B. The improved faster-RCNN for spinal fracture lesions detection. J. Intell. Fuzzy Syst. 2022, 42, 5823–5837.
[CrossRef]
31. Zhou, D.; Fang, J.; Song, X.; Guan, C.; Yin, J.; Dai, Y.; Yang, R. Iou loss for 2d/3d object detection. In Proceedings of the 2019
International Conference on 3D Vision (3DV), Québec City, QC, Canada, 16–19 September 2019; pp. 85–94.
32. Shen, Y.Y.; Zhang, F.Z.; Liu, D.; Pu, W.H.; Zhang, Q.L. Manhattan-distance IOU loss for fast and accurate bounding box regression
and object detection. Neurocomputing 2022, 500, 99–114. [CrossRef]
33. Wang, Z.H.; Jiang, Q.P.; Zhao, S.S.; Feng, W.S.; Lin, W.S. Deep Blind Image Quality Assessment Powered by Online Hard Example
Mining. IEEE Trans. Multimed. 2023, 25, 4774–4784. [CrossRef]
34. Li, W.B.; Wang, Q.; Gao, S. PF-YOLOv4-Tiny: Towards Infrared Target Detection on Embedded Platform. Intell. Autom. Soft
Comput. 2023, 37, 921–938. [CrossRef]
35. Xiao, L.; Wu, B.; Hu, Y. Surface defect detection using image pyramid. IEEE Sens. J. 2020, 20, 7181–7188. [CrossRef]
36. Sun, P.; Zhang, R.; Jiang, Y.; Kong, T.; Xu, C.; Zhan, W.; Tomizuka, M.; Li, L.; Yuan, Z.; Wang, C.; et al. Sparse R-CNN: End-to-End
Object Detection with Learnable Proposals. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition
(CVPR), Nashville, TN, USA, 20–25 June 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 14449–14458.
37. Fang, H.; Ding, L.; Wang, L.; Chang, Y.; Yan, L.; Han, J. Infrared Small UAV Target Detection Based on Depthwise Separable
Residual Dense Network and Multiscale Feature Fusion. IEEE Trans. Instrum. Meas. 2022, 71, 1–20. [CrossRef]
38. Smart, P.D.S.; Thanammal, K.K.; Sujatha, S.S. An Ontology Based Multilayer Perceptron for Object Detection. Comput. Syst. Sci.
Eng. 2023, 44, 2065–2080. [CrossRef]
39. Zhang, X.; Zhao, C.; Luo, H.Z.; Zhao, W.Q.; Zhong, S.; Tang, L.; Peng, J.Y.; Fan, J.P. Automatic learning for object detection.
Neurocomputing 2022, 484, 260–272. [CrossRef]
40. Chen, J.A.; Tam, D.; Raffel, C.; Bansal, M.; Yang, D.Y. An Empirical Survey of Data Augmentation for Limited Data Learning in
NLP. Trans. Assoc. Comput. Linguist 2023, 11, 191–211. [CrossRef]
41. Shi, J.; Ghazzai, H.; Massoud, Y. Differentiable Image Data Augmentation and Its Applications: A Survey. IEEE Trans. Pattern
Anal. 2024, 46, 1148–1164. [CrossRef]
42. Gower, R.M.; Loizou, N.; Qian, X.; Sailanbayev, A.; Shulgin, E.; Richtárik, P. SGD: General analysis and improved rates. In
Proceedings of the International Conference on Machine Learning (ICML), Long Beach, CA, USA, 9–15 June 2019; pp. 5200–5209.
43. Lin, T.-Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal loss for dense object detection. In Proceedings of the IEEE International
Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 2980–2988.
44. He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask r-cnn. In Proceedings of the IEEE International Conference on Computer Vision
(ICCV), Venice, Italy, 22–29 October 2017; pp. 2961–2969.
45. Bochkovskiy, A.; Wang, C.-Y.; Liao, H.-Y.M. Yolov4: Optimal speed and accuracy of object detection. arXiv 2020, arXiv:2004.10934.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.