Article
Fatigue Crack Detection Based on Semantic Segmentation
Using DeepLabV3+ for Steel Girder Bridges
Xuejun Jia 1,2 , Yuxiang Wang 3, * and Zhen Wang 4, *
Abstract: Artificial intelligence technology is receiving more and more attention in structural health
monitoring. Fatigue crack detection in steel box girders in long-span bridges is an important and
challenging task. This paper presents a semantic segmentation network model for this task based
on DeepLabv3+, ResNet50, and active learning. First, the classification network ResNet50
is re-tuned using the crack image dataset. Second, with the re-tuned ResNet50 as the backbone
network, a crack semantic segmentation network was constructed based on DeepLabv3+, which was
trained with the assistance of active learning. Finally, optimization for the probability threshold of
the pixel category was performed to improve the pixel-level detection accuracy. Tests show that,
compared with the crack detection network based on conventional ResNet50, this model can improve
MIoU from 0.6181 to 0.7241.
Citation: Jia, X.; Wang, Y.; Wang, Z. Fatigue Crack Detection Based on Semantic Segmentation Using
DeepLabV3+ for Steel Girder Bridges. Appl. Sci. 2024, 14, 8132. https://doi.org/10.3390/app14188132
Academic Editor: José António Correia
Received: 23 July 2024
Revised: 31 August 2024
Accepted: 3 September 2024
Published: 10 September 2024
Copyright: © 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access
article distributed under the terms and conditions of the Creative Commons Attribution (CC BY)
license (https://creativecommons.org/licenses/by/4.0/).
1. Introduction
With the continuous advancements in bridge design and construction technology, long-span
steel bridges have developed rapidly. Among these, steel box-girder bridges have become
popular due to their light weight, high torsional rigidity, and other advantages. However,
due to the coupling effect between initial material defects and dynamic vehicle loads,
fatigue cracks often occur in steel bridge joints, especially around welded joints.
Fatigue cracks in steel girder bridges pose significant safety risks and can lead to
catastrophic failures if not detected and repaired in time. Traditional inspection methods,
which rely heavily on manual visual inspections, are labor-intensive, time-consuming, and
prone to human error. In recent years, advancements in artificial intelligence (AI) have
revolutionized the field of structural health monitoring, offering more efficient, accurate,
and automated solutions for fatigue crack detection. Techniques employed for fatigue crack
detection in steel girder bridges include machine learning-based techniques, convolutional
neural networks (CNNs), and semantic segmentation networks, among others.
Machine learning (ML) has been widely applied in structural health monitoring for
feature extraction and pattern recognition. Various ML algorithms, such as support vector
machines (SVMs), k-nearest neighbor (KNN), and decision trees, have been employed
to classify and detect cracks in steel structures. For example, Zhang et al. [1] used SVM
to classify features extracted from acoustic emission signals, achieving a high detection
accuracy for fatigue cracks in steel beams. Similarly, Li et al. [2] implemented a KNN-based
approach to analyze strain data collected from sensors on steel girder bridges and
successfully identified crack initiation and propagation phases.
$$\mathrm{IoU}_i = \frac{TP_i}{TP_i + FP_i + FP_j} \qquad (1)$$
where $TP_i$ is the number of pixels for true crack prediction, and $FP_j$ is the pixel number for
false background (non-crack) prediction. MIoU is defined as the average of the IoU of two
categories, namely:
$$\mathrm{MIoU} = \frac{\mathrm{IoU}_i + \mathrm{IoU}_j}{2} \qquad (2)$$
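The two definitions can be illustrated with a minimal Python sketch. It assumes binary label masks in which 1 marks crack and 0 marks background, and it reads $FP_i$ as the number of pixels falsely predicted as crack, which the notation suggests but this passage does not state explicitly. The function names and the toy masks are hypothetical and are not taken from the paper, whose experiments were run in MATLAB.

```python
import numpy as np

def iou_per_class(pred, truth, cls):
    """IoU of class `cls` following Equation (1): TP_i / (TP_i + FP_i + FP_j)."""
    pred_c = (pred == cls)
    truth_c = (truth == cls)
    tp_i = np.logical_and(pred_c, truth_c).sum()    # correctly predicted as cls
    fp_i = np.logical_and(pred_c, ~truth_c).sum()   # falsely predicted as cls (assumed meaning of FP_i)
    fp_j = np.logical_and(~pred_c, truth_c).sum()   # cls pixels falsely predicted as the other class
    return tp_i / (tp_i + fp_i + fp_j)

def mean_iou(pred, truth):
    """MIoU following Equation (2): average of crack IoU and background IoU."""
    return 0.5 * (iou_per_class(pred, truth, 1) + iou_per_class(pred, truth, 0))

# Toy 2 x 4 masks (hypothetical data): 1 = crack, 0 = background.
truth = np.array([[1, 1, 0, 0],
                  [0, 0, 0, 0]])
pred = np.array([[1, 0, 1, 0],
                 [0, 0, 0, 0]])

print(iou_per_class(pred, truth, 1))  # crack IoU
print(iou_per_class(pred, truth, 0))  # background IoU
print(mean_iou(pred, truth))
```

In this toy case the crack class has one true positive, one false crack pixel, and one missed crack pixel, so its IoU is 1/3, while the background IoU is 5/7. The gap shows how a small crack class can pull MIoU down even when almost all background pixels are correct.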
Figure 2. Fatigue crack detection model.
3.2. Classification Network ResNet50_crack
(1) Initial dataset for classification network
The original image dataset includes 120 pictures with a size of 4928 × 3264 pixels or
5152 × 3864 pixels. A total of 100 images are randomly selected for training, while the
remaining 20 images are used for testing the network. Generally, the image size has a
certain impact on the training results of the deep learning model [13,14]. Subsequently, these
large images and ground truths are cropped into samples with a size of 224 × 224 pixels,
resulting in 32,128 small image samples and corresponding ground truths. However, most
of these samples are intact and less informative for crack detection. For improving the
computation efficiency of the training process, these intact samples can be removed from
the dataset. Therefore, pixel variances in samples are evaluated and sorted, and then
the 60% of samples with the smallest variance are discarded. Subsequently, samples are
classified into crack and background datasets according to their ground truth. This process
is schematically shown in Figure 3a.
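The cropping and variance-based screening described above can be sketched as follows. This is an illustrative Python sketch, not the authors' MATLAB implementation: it assumes non-overlapping 224 × 224 crops with partial border strips dropped, and the helper names (`tiles_per_image`, `crop_tiles`, `keep_high_variance`) are hypothetical. Under that assumption, an image of 4928 × 3264 pixels yields 308 samples and one of 5152 × 3864 pixels yields 391; a split of 84 and 16 training images between the two sizes would reproduce the reported 32,128 samples, although the paper does not state the split.

```python
import numpy as np

TILE = 224  # sample size stated in the text

def tiles_per_image(width, height, tile=TILE):
    """Count non-overlapping tile x tile crops (border remainders dropped; an assumption)."""
    return (width // tile) * (height // tile)

def crop_tiles(image, tile=TILE):
    """Cut an image array into non-overlapping tiles, row by row."""
    h, w = image.shape[:2]
    return [image[r:r + tile, c:c + tile]
            for r in range(0, h - tile + 1, tile)
            for c in range(0, w - tile + 1, tile)]

def keep_high_variance(samples, discard_fraction=0.6):
    """Return indices of samples kept after discarding the lowest-variance fraction."""
    variances = np.array([s.var() for s in samples])
    order = np.argsort(variances)                      # ascending pixel variance
    n_drop = int(round(discard_fraction * len(samples)))
    return np.sort(order[n_drop:]).tolist()            # indices also apply to the ground truths

# Toy image (hypothetical data): six tiles, only the first and last contain texture.
rng = np.random.default_rng(0)
image = np.zeros((2 * TILE, 3 * TILE))
image[:TILE, :TILE] = rng.random((TILE, TILE))
image[TILE:, 2 * TILE:] = rng.random((TILE, TILE))

samples = crop_tiles(image)
kept = keep_high_variance(samples)
print(tiles_per_image(4928, 3264), tiles_per_image(5152, 3864))
print(len(samples), kept)
```

Returning indices rather than the samples themselves keeps each retained image sample paired with its ground-truth crop, which the later split into crack and background datasets relies on.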
(2) Image screening based on active learning
The concept of active learning [11,12] originally referred to the case of semi-supervised
learning (that is, a part of the data would be labeled, with the remaining parts unlabeled),
where the algorithm could actively select the samples that were more informative and
representative. These samples were artificially labeled and then added to the dataset
for training. This paper adopts this idea of active learning to screen the most important
samples (namely, those with larger cross-entropy) for re-tuning the classification network
ResNet50_crack. The specific process is as follows, as shown in Figure 3b.
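The screening criterion can be sketched in a few lines of Python. The sketch assumes that each sample already has class probabilities predicted by the current classifier and a known label, and that a fixed fraction of the samples with the largest cross-entropy is retained. The retained fraction, the function names, and the toy probabilities are hypothetical; the text only states that samples with larger cross-entropy are regarded as more important.

```python
import numpy as np

def cross_entropy(probs, labels, eps=1e-12):
    """Per-sample cross-entropy -log p(true class); probs has shape (N, C), labels shape (N,)."""
    p_true = probs[np.arange(len(labels)), labels]
    return -np.log(np.clip(p_true, eps, 1.0))

def select_informative(probs, labels, fraction=0.5):
    """Indices of the samples with the largest cross-entropy (fraction is an assumed parameter)."""
    losses = cross_entropy(probs, labels)
    n_keep = max(1, int(round(fraction * len(labels))))
    return np.argsort(losses)[::-1][:n_keep]

# Toy predictions (hypothetical data): columns are [background, crack] probabilities.
probs = np.array([[0.90, 0.10],
                  [0.40, 0.60],
                  [0.20, 0.80],
                  [0.55, 0.45]])
labels = np.array([0, 0, 1, 1])

selected = select_informative(probs, labels)
print(selected)
```

Here samples 1 and 3 are selected because the classifier assigns their true classes probabilities of only 0.40 and 0.45, which gives them the largest cross-entropy values.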
Appl. Sci. 2024, 14, 8132 5 of 11
Figure 3. Tuning process of classification network ResNet50_crack. (a): dataset construction; (b):
active learning; (c): tuning process for ResNet50_crack.
Figure 4. The training process of the semantic segmentation network DeepLabv3+.
4. Comparative Studies of Different Models
This section describes the comparison results of different models. In order to verify
the effectiveness of the improved strategy proposed in this paper, we conducted comparative
experiments based on the DeepLabv3+ algorithm. The first and second models are
DeepLabv3+ with ResNet50 and MobileNet-v2 [15] as backbone networks, respectively. They are
compared with the proposed one. It should be noted that the difference between the first
network and the proposed one is that the proposed one is based on ResNet50 that has been
re-tuned as a classification network using a crack detection dataset. Every model was
trained for 20 epochs.
This comparison is carried out on the test dataset, which contains 20 raw pictures. All
investigations are performed on a Lenovo P910 workstation equipped with an Nvidia GeForce
GTX 1080 Ti GPU (Nvidia, Santa Clara, CA, USA) and MATLAB 2020a.
The training results are shown in Table 1. The lowest validation accuracy is seen in
MobileNet-v2 and is 99.00%, and the highest is ResNet50_crack, which is 99.26%. Compared
with MobileNet-v2, ResNet50_crack and ResNet50 are deeper and thus have stronger
generalization ability for samples with complex backgrounds and finer cracks. Compared
with ResNet50, ResNet50_crack has undergone crack classification pre-training, and thus
the training of the semantic segmentation network is easier and has higher accuracy. Table 1
shows that the training process is slow but acceptable.
| Network Type | Time (h) | Training Accuracy of Crack Pixels | Training Accuracy of Background Pixels | Global Accuracy | Validation Accuracy |
|---|---|---|---|---|---|
| Model 1: DeeplabV3+, ResNet50 | 18.9 | 78.52% | 99.74% | 99.71% | 99.11% |
| Model 2: DeeplabV3+, MobileNet-v2 | 14.7 | 70.09% | 99.67% | 99.64% | 99.00% |
| Proposed Model: DeeplabV3+, ResNet50_crack, Active Learning | 12.25 | 75.59% | 99.89% | 99.86% | 99.26% |
| Network Type | IoU (Crack) | IoU (Background) | MIoU |
|---|---|---|---|
| Model 1: DeeplabV3+, ResNet50 | 0.2391 | 0.9971 | 0.6181 |
| Model 2: DeeplabV3+, MobileNet-v2 | 0.1910 | 0.9964 | 0.5937 |
| Proposed Model: DeeplabV3+, ResNet50_crack, Active Learning | 0.3897 | 0.9986 | 0.6942 |
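As a consistency check, the tabulated MIoU values can be reproduced from the crack and background IoU values with Equation (2). The worked arithmetic below uses only numbers from the table; the subscripts are labels added here for readability.

```latex
\begin{align*}
\mathrm{MIoU}_{\text{Model 1}}  &= \frac{0.2391 + 0.9971}{2} = 0.6181 \\
\mathrm{MIoU}_{\text{Model 2}}  &= \frac{0.1910 + 0.9964}{2} = 0.5937 \\
\mathrm{MIoU}_{\text{Proposed}} &= \frac{0.3897 + 0.9986}{2} = 0.69415 \approx 0.6942
\end{align*}
```

Because the background IoU is close to 1 for all three models, the differences in MIoU come almost entirely from the crack IoU, which rises from 0.2391 for Model 1 to 0.3897 for the proposed model.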
The crack distributions predicted by Model 1 and the proposed model are compared
in Figure 5 together with the ground truth. Clearly, the proposed model can provide results
that are more consistent with the ground truth.
threshold means that fewer pixels are classified as crack, indicating fewer true positive
crack pixels and lower accuracy. As defined in Equation (1), the IoU increases in that
its denominator is effectively reduced. Notably, the threshold step herein is 0.05 to save
computation time.
It can be seen from Figure 6 that the maximum values of the IoU and MIoU must
correspond to a threshold around 0.9. In order to precisely identify the optimal threshold,
a smaller sampling step, namely, 0.01, is chosen and adopted for analysis, with the results
plotted in Figure 7. The results show that when the threshold is 0.89, the highest MIoU is
0.7466 for the training dataset and 0.7231 for the test dataset. This means that 0.89 is the
optimal threshold value.
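The two-stage search described above, a coarse sweep with a step of 0.05 followed by a fine sweep with a step of 0.01 around the coarse optimum, can be sketched in Python as follows. This is a hedged illustration on toy data, not the authors' MATLAB procedure: it assumes the threshold is applied to the per-pixel crack probability, that the ground truth contains both classes, and that the fine sweep covers the coarse optimum ± 0.05. All function names and the sample values are hypothetical.

```python
import numpy as np

def miou_at_threshold(crack_prob, truth, threshold):
    """MIoU when pixels with crack probability >= threshold are labeled as crack."""
    pred = crack_prob >= threshold
    truth = truth.astype(bool)
    ious = []
    for p, t in ((pred, truth), (~pred, ~truth)):      # crack class, then background class
        inter = np.logical_and(p, t).sum()
        union = np.logical_or(p, t).sum()              # equals TP + FP + FN for this class
        ious.append(inter / union)                     # assumes both classes occur in truth
    return float(np.mean(ious))

def best_threshold(crack_prob, truth, lo, hi, step):
    """Evaluate MIoU on a regular threshold grid and return (threshold, MIoU) of the maximum."""
    n = int(round((hi - lo) / step)) + 1
    grid = np.round(lo + step * np.arange(n), 2)
    scores = [miou_at_threshold(crack_prob, truth, t) for t in grid]
    k = int(np.argmax(scores))                         # first maximum if several tie
    return float(grid[k]), scores[k]

def coarse_to_fine(crack_prob, truth):
    """Coarse sweep with step 0.05, then fine sweep with step 0.01 around the coarse optimum."""
    t0, _ = best_threshold(crack_prob, truth, 0.05, 0.95, 0.05)
    return best_threshold(crack_prob, truth, max(t0 - 0.05, 0.01), min(t0 + 0.05, 0.99), 0.01)

# Toy pixels (hypothetical data): three crack pixels and one confident false alarm at 0.885.
truth = np.array([1, 1, 1, 0, 0, 0, 0, 0, 0, 0])
crack_prob = np.array([0.97, 0.93, 0.905, 0.885, 0.60, 0.30, 0.20, 0.10, 0.05, 0.02])

threshold, score = coarse_to_fine(crack_prob, truth)
print(threshold, score)
```

The coarse pass limits the number of evaluations over the whole probability range, and the fine pass spends its evaluations only near the coarse maximum, which mirrors the motivation given in the text for starting with the larger step.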
7. Conclusions
This paper presents a semantic segmentation network model for fatigue crack detection
based on DeepLabv3+, ResNet50, and active learning. The classification network ResNet50
is re-tuned using the crack image dataset, leading to ResNet50_crack, which is adopted
as the backbone network for DeepLabv3+ to construct a crack semantic segmentation
network. This network was trained with the assistance of active learning, followed by the
optimization of the probability threshold of the pixel category. Compared with the crack
detection network based on conventional ResNet50, this model can improve MIoU from
0.6181 to 0.7241. In future research, we are to further improve the detection accuracy of the
model for small cracks and to achieve the automatic and fast calculation of crack width,
length, and other information based on the Transformer network.
References
1. Zhang, Y.; Li, H.; Wang, Y. Fatigue Crack Detection in Steel Beams Using Support Vector Machines. J. Struct. Health Monit. 2020,
15, 234–246.
2. Li, Z.; Zhao, J.; Chen, Q. K-Nearest Neighbors-Based Crack Detection Using Strain Data from Steel Girder Bridges. Struct. Control
Health Monit. 2019, 26, e2467.
3. Kim, S.; Park, J.; Choi, J. Drone-Based Crack Detection in Steel Bridges Using Convolutional Neural Networks. Autom. Constr.
2021, 126, 103675.
4. Yang, Y.; Yao, X. Multi-Scale Convolutional Neural Network for Crack Detection in Steel Bridges. Eng. Struct. 2023, 259, 114245.
5. Chen, L.-C.; Papandreou, G.; Kokkinos, I.; Murphy, K.; Yuille, A.L. DeepLab: Semantic Image Segmentation with Deep
Convolutional Nets, Atrous Convolution, and Fully Connected CRFs. IEEE Trans. Pattern Anal. Mach. Intell. 2018, 40, 834–848.
[CrossRef]
6. Chen, L.-C.; Zhu, Y.; Papandreou, G.; Schroff, F.; Adam, H. Encoder-Decoder with Atrous Separable Convolution for Semantic
Image Segmentation. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September
2018; Springer: Cham, Switzerland, 2018; pp. 801–818.
7. Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. In Medical Image
Computing and Computer-Assisted Intervention (MICCAI); Springer: Cham, Switzerland, 2015; Volume 9351, pp. 234–241.
8. Xu, K.; Zhang, J.; Li, W. Transfer Learning for Crack Detection in Steel Bridges. Comput.-Aided Civ. Infrastruct. Eng. 2023, 38,
265–278.
9. Long, J.; Shelhamer, E.; Darrell, T. Fully Convolutional Networks for Semantic Segmentation. In Proceedings of the IEEE
Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 3431–3440.
10. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Conference on
Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
11. Settles, B. Active Learning Literature Survey; Computer Sciences Technical Report 1648; University of Wisconsin-Madison: Madison,
WI, USA, 2009.
12. Wang, Z.; Xu, G.; Ding, Y.; Wu, B.; Lu, G. A vision-based active learning convolutional neural network model for concrete surface
crack detection. Adv. Struct. Eng. 2020, 23, 2952–2964. [CrossRef]
13. Rukundo, O. Effects of Image Size on Deep Learning. Electronics 2023, 12, 985. [CrossRef]
14. Rukundo, O. Evaluation of extra pixel interpolation with mask processing for medical image segmentation with deep learning.
Signal Image Video Process. 2024, 18, 1–8. [CrossRef]
15. Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A.; Chen, L.-C. MobileNetV2: Inverted Residuals and Linear Bottlenecks. In
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 4510–4520.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.