Scientific Programming
Volume 2022, Article ID 1188974, 12 pages
https://doi.org/10.1155/2022/1188974
Research Article
Machine Vision-Based Object Detection Strategy for Weld Area
Received 27 February 2022; Revised 18 March 2022; Accepted 21 March 2022; Published 11 April 2022
Copyright © 2022 Chenhua Liu et al. This is an open access article distributed under the Creative Commons Attribution License,
which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
In noisy industrial environments, welded parts develop various types of defects in the weld area during the welding process, and these must be removed by polishing; manual polishing suffers from low efficiency and high labor intensity, so machine vision is used to automate the polishing and achieve continuous, efficient work. In this study, the two-stage Faster R-CNN object detection algorithm is applied, taking a V-shaped welded thick plate as the research object. A workpiece dataset covering different lighting conditions and angles is established, six region proposal networks are used for transfer learning, the convergence of different Batch and Mini-Batch settings is compared, and the effect of FLOPs and the number of network parameters on the model is explored. The optimal learning rate is selected for training to form a weld area object detection network for weld plate workpieces under few samples. The study shows that the VGG16 model performs best in weld seam area recognition, with 91.68% average accuracy and a 25.02 ms average detection time on the validation set; it can effectively identify weld seam areas in various industrial environments and provide location information for the subsequent automatic grinding by robotic arms.
Figure 3: Image data enhancement: (a) original image, (b) Gaussian-filtered image, (c) image with salt-and-pepper noise, (d) image with random rotation, (e) image with random flip, and (f) image with random crop.
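The augmentations of Figure 3 can be sketched with plain NumPy; the noise amount and crop fraction below are illustrative values, not the paper's settings, and Gaussian filtering and arbitrary-angle rotation are omitted for brevity:

```python
import numpy as np

rng = np.random.default_rng(0)

def salt_and_pepper(img, amount=0.02):
    """Flip a random fraction of pixels to black (pepper) or white (salt)."""
    out = img.copy()
    mask = rng.random(img.shape) < amount
    out[mask] = rng.choice([0, 255], size=int(mask.sum()))
    return out

def random_flip(img):
    """Mirror the image horizontally with probability 0.5."""
    return img[:, ::-1] if rng.random() < 0.5 else img

def random_crop(img, frac=0.9):
    """Cut out a random window whose sides are `frac` of the original."""
    h, w = img.shape[:2]
    ch, cw = int(h * frac), int(w * frac)
    y = int(rng.integers(0, h - ch + 1))
    x = int(rng.integers(0, w - cw + 1))
    return img[y:y + ch, x:x + cw]

img = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)
aug = random_crop(random_flip(salt_and_pepper(img)))
print(aug.shape)  # (57, 57)
```

Chaining such operations over the captured workpiece images is what expands the small original set into the larger training set described in the text.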
and the final output is the ROI feature map. Finally, the fully connected layer and the Softmax classifier determine whether the candidate region is the weld region and output the exact location of the bounding box [16].

Six convolutional neural networks pretrained on the NEU dataset, VGG16 [17], VGG19 [18], Googlenet [19], Resnet50 [20], Alexnet [21], and Lenet [22], are used as RPNs for transfer learning, so that the Faster R-CNN model first obtains the underlying feature weights of the images and then transfers this feature information to the task of weld region recognition, achieving the goal that the model can recognize accurate weld regions from a small number of samples.

Initial training was conducted using SGD (stochastic gradient descent) with the learning rate set to 0.1, the momentum factor set to 0.9, the maximum number of Epochs set to 30, and dropout set to 0.1; the six feature region candidate networks were trained in turn. During the training process, the loss function L is given by equations (1), (2), and (3):

L({pi}, {ti}) = (1/Ncls) Σi Lcls(pi, pi*) + λ (1/Nreg) Σi pi* Lreg(ti, ti*),  (1)

Lcls(pi, pi*) = −log[pi pi* + (1 − pi)(1 − pi*)],  (2)

Lreg(ti, ti*) = R(ti − ti*),  (3)

where pi is the predicted probability that anchor i contains a weld region, pi* is the ground-truth label, ti and ti* are the predicted and ground-truth bounding-box offsets, Ncls and Nreg are normalization terms, and λ balances the two loss terms.
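A numerical sketch of equations (1)-(3); R is taken to be the smooth L1 loss, as is standard for Faster R-CNN [16], and the three anchors are toy values:

```python
import numpy as np

def smooth_l1(x):
    """Robust regression loss R of equation (3): quadratic near zero,
    linear in the tails."""
    ax = np.abs(x)
    return np.where(ax < 1.0, 0.5 * x ** 2, ax - 0.5)

def rpn_loss(p, p_star, t, t_star, lam=1.0):
    """Multi-task loss of equation (1) over a mini-batch of anchors.

    p         -- predicted probability that each anchor is a weld region
    p_star    -- ground-truth label (1 = weld region, 0 = background)
    t, t_star -- predicted / ground-truth box offsets, shape (N, 4)
    """
    eps = 1e-7
    # Equation (2): binary log loss written without branching on the label.
    l_cls = -np.log(np.clip(p * p_star + (1.0 - p) * (1.0 - p_star), eps, 1.0))
    # Equation (3): regression term, counted only for positive anchors.
    l_reg = smooth_l1(t - t_star).sum(axis=1)
    n_cls = len(p)
    n_reg = max(p_star.sum(), 1.0)
    return l_cls.sum() / n_cls + lam * (p_star * l_reg).sum() / n_reg

# Three toy anchors: two welds (label 1) and one background (label 0).
p = np.array([0.9, 0.2, 0.8])
p_star = np.array([1.0, 0.0, 1.0])
t = np.zeros((3, 4))
t_star = np.zeros((3, 4))
print(round(rpn_loss(p, p_star, t, t_star), 5))  # 0.18388
```

With perfect box offsets only the classification term of equation (2) contributes; any box error on a positive anchor adds the λ-weighted regression term.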
The number of parameters [28] for the convolutional and fully connected layers is calculated as shown in equations (8), (9), and (10):

ParasConv = n × (h × w × c + 1),  (8)

ParasFull = Weightin × Weightout,  (9)

ParasNum = Σi (ParasConvi + ParasFulli),  (10)

where the sum in (10) runs over all layers i, c is the number of input channels, n is the number of output channels, h is the height of the convolutional kernel, and w is the width of the convolutional kernel. The pooling layer does not need to calculate the number of parameters. A low number of parameters prevents the model from capturing the weld area features, making the model underfit. A high number of parameters will cause the model to occupy too much memory space, and the memory access cost (MAC) will increase. To measure the model complexity of the candidate region network, the number of floating-point operations, FLOPs, is introduced.

To compute FLOPs, this study assumes convolution is implemented as a sliding window [29] and that the nonlinearity function is computed free. For the specified convolution kernel, it is calculated as in equation (11):

FLOPs = 2HW(CinK² + 1)Cout,  (11)

where H, W, and Cin are the height, width, and number of channels of the output feature map, K is the kernel width (assumed to be symmetric), and Cout is the number of output channels [30]. For a fully connected layer, it is calculated as shown in equation (12):

FLOPs = (2Din − 1)Dout,  (12)

where Din is the input dimensionality and Dout is the output dimensionality. The performance parameters of each RPN are shown in Table 2.

3. Result and Analysis

3.1. Recognition Results of Weld Areas by Different Feature Extraction Networks. This study uses a V-shaped welded steel plate with dimensions of 30.00 cm × 17.00 cm × 0.50 cm and a V-shaped opening angle of 45°; the weld seam is formed with the following welding parameters: the steel plate material is mild steel Q215, the welding method is consumable-electrode gas metal arc welding (GMAW) with multilayer multipass welding, the shielding gas is argon, the welding current is 200 A, the welding wire diameter is 2 mm, and the welding speed is 2 mm per second. Based on the recognition effect on the weld seam of the same V-shaped welding plate, the convolutional neural network with the best effect is selected as the RPN, and the model is fine-tuned on this basis for comparative analysis of weld seam recognition in different work scenarios.

With FLOPs held consistent, the accuracy and training loss curves are plotted using the initial training parameters set in Section 2.2, using the six different RPNs on the training and validation sets, respectively. The result is shown in Figure 5.

The statistical training results obtained by changing the RPN in the Faster R-CNN model are plotted in Figure 6 and summarized in Table 3, where AP (%)-T is the average precision on the training set and AP (%)-V is the average precision on the validation set.

100, 200, 300, 400, and 500 different weld plate images are taken from the weld plate test set as input images, and their average detection times are recorded to evaluate the operational efficiency of the algorithm (Figure 7).

Resnet50, used as the RPN, achieves the highest average accuracy in the training and validation sets, and the score of the recognized weld area reaches 94.34% (Figure 8(d)). The Resnet series provides a shortcut connection mechanism, which ensures a reasonable recognition rate even though Resnet50 is 50 layers deep. The increase in the number of layers, however, raises the FLOPs to 4 GFLOPs, and the average detection time of Resnet50 is 92.03 ms.

When Alexnet is used as the RPN, the FLOPs are 727 MFLOPs and the average detection time is the shortest at 18.02 ms; because it has only 5 convolutional layers, its learning effect is not as good as that of the deeper networks (Figure 8(b)), and its average recognition accuracy is 75.50% (Figure 8(a)). During training, the Alexnet accuracy and loss functions gradually converge, but the loss curve on the validation set fails to approach the loss curve on the training set, so the resulting weld candidate box contains more background (weld plate region) parts. VGG19 can identify the weld area more completely (Figure 8(f)) with 20 GFLOPs and 20,483,904 parameters, as shown in Table 2, resulting in a memory occupation of 548 M and a
Figure 6: Training curves of different RPNs: (a) LeNet, (b) GoogleNet, (c) VGG16, (d) VGG19, (e) ResNet50, and (f) AlexNet.
Figure 7: Runtime with different numbers of test sets. Detection times (ms) recovered from the chart, by number of test images (n):

Network      n=100    n=200    n=300    n=400    n=500
Googlenet    155.50   153.80   151.60   150.30   148.70
Resnet50      93.50    92.80    91.50    91.00    91.35
VGG19         72.45    71.70    69.30    68.15    70.90
VGG16         26.48    23.15    24.75    25.30    25.42
Alexnet       18.70    18.40    17.50    17.30    18.20
LeNet         12.50    12.00    12.53    12.80    11.92
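The per-network average detection times quoted in the text (92.03 ms for Resnet50, 25.02 ms for VGG16, 18.02 ms for Alexnet, 70.50 ms for VGG19) can be cross-checked against the values read off Figure 7; note that the row-to-network assignment below is inferred from those quoted averages, so treat it as an assumption:

```python
# Detection times (ms) read off Figure 7 for test-set sizes 100..500.
times = {
    "Googlenet": [155.50, 153.80, 151.60, 150.30, 148.70],
    "Resnet50":  [93.50, 92.80, 91.50, 91.00, 91.35],
    "VGG19":     [72.45, 71.70, 69.30, 68.15, 70.90],
    "VGG16":     [26.48, 23.15, 24.75, 25.30, 25.42],
    "Alexnet":   [18.70, 18.40, 17.50, 17.30, 18.20],
    "LeNet":     [12.50, 12.00, 12.53, 12.80, 11.92],
}
averages = {net: sum(t) / len(t) for net, t in times.items()}
for net, avg in sorted(averages.items(), key=lambda kv: kv[1]):
    print(f"{net:10s} {avg:6.2f} ms")
```

The computed averages reproduce every figure quoted in the text, which supports the recovered assignment.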
Figure 8: Recognition results of different RPN models for the same weld plate (comparison of six different RPNs using the same weld plate workpiece): (a) Alexnet, (b) Googlenet, (c) Lenet, (d) Resnet50, (e) VGG16, and (f) VGG19.
bloated model with a detection time of 70.50 ms. The worst performance is LeNet, which has the shortest average detection time, occupies only 2.32 MFLOPs, and has the smallest number of parameters among the six networks. Because it is only 7 layers deep, it cannot learn the weld region's image features in depth; it is better suited to scenarios requiring shallow networks and can only identify some features of the weld region in this study (Figure 8(c)).

The average accuracy of VGG16, 91.55%, is slightly lower than that of Resnet50. VGG16 uses several consecutive 3×3 convolutional kernels instead of the larger convolutional kernels of AlexNet; for a given receptive field, stacked small convolutional kernels perform better than large ones, because the additional nonlinear layers increase the network depth and allow more complex patterns to be learned. The FLOPs value of VGG16 is lower than that of VGG19 in the same series, while the number of parameters is increased to ensure the learning effect and reduce the occupied memory space. Its average elapsed time is lower, only 25.02 ms, the confidence level of the identified weld region reaches 91.30%, and the entire weld region is framed (Figure 8(e)). Therefore, the Faster R-CNN model based on the VGG16 candidate region network is selected as the model to identify the weld region.

3.2. Optimization of Models. As shown in the result of VGG16 in Figure 5, the Faster R-CNN based on VGG16 for weld area recognition achieves 91.55% accuracy on the training set, while the accuracy on the validation set cannot converge to the same level as on the training set. In order to reduce the loss value to convergence on both the training and validation sets, different learning rates and different numbers of Epochs are tried; since a Mini-Batch is more suitable for models trained on small-sample datasets, the model is optimized on this basis. For comparison with the initial training parameters in Section 2.2, the Mini-Batch size is chosen to be 128; the time cost of training is shown in Table 4.
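The complexity measures used above, equations (8)-(12), translate directly into code. A minimal sketch; the layer shapes below (VGG16's first 3×3 convolution over RGB input and an AlexNet-style 4096-to-1000 fully connected layer) are standard examples, not values taken from Table 2:

```python
def conv_params(n, h, w, c):
    """Equation (8): an n-filter conv layer with h x w kernels over c input
    channels stores n*(h*w*c) weights plus n biases."""
    return n * (h * w * c + 1)

def fc_params(w_in, w_out):
    """Equation (9): dense weight matrix between two layers (biases ignored,
    as in the text)."""
    return w_in * w_out

def conv_flops(H, W, c_in, k, c_out):
    """Equation (11): multiply-adds of a sliding-window convolution,
    evaluated once per position of the H x W output map."""
    return 2 * H * W * (c_in * k * k + 1) * c_out

def fc_flops(d_in, d_out):
    """Equation (12): multiply-adds of a fully connected layer."""
    return (2 * d_in - 1) * d_out

print(conv_params(64, 3, 3, 3))        # 1792
print(conv_flops(224, 224, 3, 3, 64))  # 179830784
print(fc_params(4096, 1000))           # 4096000
print(fc_flops(4096, 1000))            # 8191000
```

Summing conv_params and fc_params over every layer gives ParasNum of equation (10).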
Table 5: Accuracy results of training models with different numbers of Epochs and different learning rates.

Learning rate   Epochs: training accuracy (validation accuracy) (%)
                30             35             40             45             50             55             60
0.1             89.57 (81.63)  91.18 (82.34)  93.88 (83.65)  92.03 (84.55)  92.86 (85.64)  92.01 (84.63)  91.61 (83.52)
0.01            93.85 (84.56)  94.10 (86.78)  94.33 (87.12)  94.85 (89.73)  95.17 (91.23)  94.01 (90.55)  93.51 (89.77)
0.001           94.56 (86.64)  94.84 (88.50)  95.25 (89.66)  95.36 (90.16)  95.66 (92.38)  93.83 (90.56)  92.84 (89.71)
0.0001          95.75 (89.31)  95.89 (89.65)  95.91 (91.68)  95.85 (92.36)  95.93 (92.45)  94.59 (91.63)  93.88 (91.09)
0.00001         94.79 (88.71)  94.97 (89.08)  95.48 (90.16)  95.12 (91.74)  95.25 (91.65)  94.77 (91.02)  93.87 (91.87)
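Table 5 can also be scanned programmatically. A sketch over the validation accuracies, with the values transcribed from the table:

```python
# Validation accuracies (%) from Table 5; rows = learning rates, cols = Epochs.
epochs = [30, 35, 40, 45, 50, 55, 60]
val_acc = {
    0.1:     [81.63, 82.34, 83.65, 84.55, 85.64, 84.63, 83.52],
    0.01:    [84.56, 86.78, 87.12, 89.73, 91.23, 90.55, 89.77],
    0.001:   [86.64, 88.50, 89.66, 90.16, 92.38, 90.56, 89.71],
    0.0001:  [89.31, 89.65, 91.68, 92.36, 92.45, 91.63, 91.09],
    0.00001: [88.71, 89.08, 90.16, 91.74, 91.65, 91.02, 91.87],
}
best = max(
    ((lr, ep, acc) for lr, row in val_acc.items() for ep, acc in zip(epochs, row)),
    key=lambda t: t[2],
)
print(best)  # (0.0001, 50, 92.45)
```

The raw maximum sits at a learning rate of 0.0001 with 50 Epochs; the study settles on 40 Epochs at the same learning rate (91.68% validation accuracy), trading a small accuracy margin for lower training time.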
The plasticity of the VGG16-based Faster R-CNN weld region detection model is verified by adjusting the Epochs and the learning rate; the optimal network model is then obtained, and the results are shown in Table 5.

In Table 5, using VGG16 as the RPN for transfer learning at different learning rates, the model's accuracy gradually improves with increasing training times and starts to converge on both the training and validation sets. Reducing the learning rate of the network model can effectively improve the average accuracy of the overall model at the same number of training sessions. When the learning rate is 0.1 or 0.01, the network converges very slowly, increasing the time needed to find the optimal value and not achieving optimal training accuracy. When the learning rate is 0.0001 and Epochs is 30, the accuracy of the network on the training set reaches 95.75%; as Epochs keeps increasing, the training accuracy keeps converging, but the growth is slow, and the average accuracy after 45 iterations is lower than before. Continued iterative training would increase the algorithm's time complexity without bringing a noticeable improvement in training accuracy. When the learning rate is 0.00001, the effect is not as good as with a learning rate of 0.0001, because the model hovers around the optimal value, does not converge, and appears to overfit. Each learning rate shows an overall decreasing trend after 40 Epochs, which is due to overfitting of the network during training. Therefore, in this study, VGG16 is used as the RPN, and the two-stage Faster R-CNN is trained on the weld plate workpiece dataset, keeping the other parameters of Section 2.2 unchanged. The batch is changed to a Mini-Batch of size 128, and at a learning rate of 0.0001 and 40 Epochs, the 500 weld plates in the test set are used as the target for the Precision-Recall curve. The initial accuracy threshold was set to 0.8500 using a non-maximum suppression mechanism, and starting from the point [0, 0.8500], the threshold was decremented over the Recall interval. The model treats results scoring above this threshold as positive samples and those below it as negative samples. As can be seen in Figure 9, as the threshold is gradually reduced, more and more weld samples are predicted to be positive, and the curve reaches the (1, 0) coordinate point, indicating that the object detection model can identify the weld seam in the image data as a positive sample. As in equation (7), the area enclosed by the curve and the coordinate axes is the AP. As shown in Figure 9, this area occupies almost the entire coordinate plane, indicating the strong recognition performance of the model.

As shown in Figure 10, the background color is similar to that of the weld plate workpiece at different angles and in different working environments, yet the network can still identify the weld area of both weld plate workpieces with a score above 90%. As shown in Figure 11, a complex industrial environment with different angles and a large field of view is used for recognition; although there are multiple weld parts in the image, the model is able to recognize the complete weld area, and some smaller weld areas in the lower-left corner of the image are also accurately recognized.

In Figure 12, the model is tested by placing it in different working scenarios. Figures 12(a), 12(b), and 12(c) show the weld target detection results at different angles. It can be seen that the recognition accuracy is almost always above 90% and is close to the accuracy obtained on the training set. The material color in the background of the recognition scene is similar to that of the foreground weld plate, yet the object detection network can still accurately frame the weld area's location. Figures 12(d), 12(e), and 12(f) show the effect of weld target detection under the influence of a side light source at a normal overhead angle; the weld area is identified fairly completely, with an accuracy of about 92%. Figures 12(g), 12(h), and 12(i)
[Figure 9: Precision-Recall curve of the model; the area enclosed by the curve and the axes covers nearly the whole unit square.]
[Figures 10 and 11: recognition results on weld plate workpieces, with detection scores between roughly 0.72 and 0.95 marked on the bounding boxes.]
show the weld seam target detection results with laser interference and multiangle recognition. The target detection network can still frame the weld seam in a noisy environment, with a maximum recognition accuracy of 93.17% under laser and multiangle interference. The experimental results in Figure 12 show the robustness of this neural network model for the identification of weld areas in different working environments.
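The threshold-sweeping construction of the Precision-Recall curve described above can be sketched end to end; the synthetic scores and labels are illustrative, while the 0.8500 starting threshold follows the text:

```python
import numpy as np

def pr_curve(scores, labels, start=0.85, steps=50):
    """Sweep the score threshold down from `start` toward 0, recording one
    (recall, precision) point per threshold; detections at or above the
    threshold count as positive predictions."""
    precisions, recalls = [], []
    for thr in np.linspace(start, 0.0, steps):
        pred = scores >= thr
        tp = np.sum(pred & (labels == 1))
        fp = np.sum(pred & (labels == 0))
        fn = np.sum(~pred & (labels == 1))
        precisions.append(tp / max(tp + fp, 1))
        recalls.append(tp / max(tp + fn, 1))
    return np.array(recalls), np.array(precisions)

def average_precision(recalls, precisions):
    """AP as the area enclosed by the PR curve and the coordinate axes:
    precision summed over each increment of recall, starting from recall 0."""
    order = np.argsort(recalls)
    r, p_sorted = recalls[order], precisions[order]
    return float(np.sum(np.diff(np.concatenate([[0.0], r])) * p_sorted))

# Toy detections: weld regions (label 1) tend to score high, background low.
rng = np.random.default_rng(1)
labels = rng.integers(0, 2, 200)
scores = np.clip(0.7 * labels + 0.3 * rng.random(200), 0.0, 1.0)
r, p = pr_curve(scores, labels)
print(round(average_precision(r, p), 3))
```

Because the toy scores separate the two classes cleanly, the curve hugs the top of the unit square and the AP approaches 1, mirroring the behavior described for Figure 9.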
[Figure 12 (continued): detection results in different working scenarios; bounding-box scores between roughly 0.89 and 0.94 are marked on panels (d)-(i).]