Article

Automatic Detection of Railway Faults Using Neural Networks: A Comparative Study of Transfer Learning Models and YOLOv11
Omar Rodríguez-Abreo 1 , Mario A. Quiroz-Juárez 1, * , Idalberto Macías-Socarras 2 ,
Juvenal Rodríguez-Reséndiz 3 , Juan M. Camacho-Pérez 4 , Gabriel Carcedo-Rodríguez 5,6
and Enrique Camacho-Pérez 5,7, *

1 Centro de Física Aplicada y Tecnología Avanzada, Universidad Nacional Autónoma de México,
Santiago de Querétaro 76230, Mexico; [Link]@[Link]
2 Facultad de Ciencias Agrarias, Universidad Estatal Península de Santa Elena, Santa Elena (UPSE),
La Libertad 240204, Ecuador; imacias@[Link]
3 Facultad de Ingeniería, Universidad Autónoma de Querétaro, Querétaro 76010, Mexico; juvenal@[Link]
4 Departamento de Sistemas y Computación, Tecnológico Nacional de México/I.T. Mérida,
Mérida 97118, Mexico; [Link]@[Link]
5 Red de Investigación OAC Optimización, Automatización y Control, El Marqués, Queretaro 76240, Mexico;
gabrielcarcedo@[Link]
6 Instituto de Investigaciones en Matemáticas Aplicadas y en Sistemas, Unidad Académica del Estado de
Yucatán, Universidad Nacional Autónoma de México, Mérida 97357, Mexico
7 Facultad de Ingeniería, Universidad Autónoma de Yucatán, Mérida 97000, Mexico
* Correspondence: maqj@[Link] (M.A.Q.-J.); [Link]@[Link] (E.C.-P.)

Abstract: Developing reliable railway fault detection systems is crucial for ensuring both safety and operational efficiency. Various artificial intelligence frameworks, especially deep learning models, have shown significant potential in enhancing fault detection within railway infrastructure. This study explores the application of deep learning models for railway fault detection, focusing on both transfer learning architectures and a novel classification framework. Transfer learning was utilized with architectures such as ResNet50V2, Xception, VGG16, MobileNet, and InceptionV3, which were fine-tuned to classify railway track images into defective and non-defective categories. Additionally, the state-of-the-art YOLOv11 model was adapted for the same classification task, leveraging advanced data augmentation techniques to achieve high accuracy. Among the transfer learning models, VGG16 demonstrated the best performance with a test accuracy of 89.18%. However, YOLOv11 surpassed all models, achieving a test accuracy of 92.64% while maintaining significantly lower computational demands. These findings underscore the versatility of deep learning models and highlight the potential of YOLOv11 as an efficient and accurate solution for railway fault classification tasks.

Keywords: deep learning; transfer learning; YOLOv11; railway fault detection; safety; maintenance

Academic Editor: Maciej Kruszyna

Received: 10 November 2024; Revised: 10 December 2024; Accepted: 24 December 2024; Published: 28 December 2024

Citation: Rodríguez-Abreo, O.; Quiroz-Juárez, M.A.; Macías-Socarras, I.; Rodríguez-Reséndiz, J.; Camacho-Pérez, J.M.; Carcedo-Rodríguez, G.; Camacho-Pérez, E. Automatic Detection of Railway Faults Using Neural Networks: A Comparative Study of Transfer Learning Models and YOLOv11. Infrastructures 2025, 10, 3. [Link]infrastructures10010003

Copyright: © 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license ([Link]licenses/by/4.0/).

1. Introduction

Railway infrastructure is essential for global economic and sustainable development, providing efficient transport for goods, people, and services while contributing to reduced carbon emissions. A critical component of maintaining this infrastructure is Structural Health Monitoring (SHM), which evaluates the performance of various subsystems and enables the early detection of faults to prevent major damage and costly shutdowns. Railway systems operate in diverse environments and are exposed to threats posed by time, location, and weather conditions. Cracks and poor track conditions are among the primary causes of derailments. Manual inspections are resource-intensive, time-consuming, and prone to human error. Thus, an automated system is necessary to accurately assess track conditions and prevent derailments.
Currently, several challenges limit the effective detection of faults on railway lines. Because
of the length of the tracks, manual inspection is slow and takes a long time to complete, even
though the task must be performed quickly and continuously, and it is also subject to bias from
human subjectivity [1,2]. On the other hand, if enough personnel were available to conduct
manual inspections in times comparable to technological solutions, the number of operators
would have to increase considerably, generating high costs. Therefore, manual inspections are
resource-intensive, time-consuming, and prone to human error, and an automated system is
necessary to accurately assess track conditions and prevent derailments.
In recent years, there has been significant progress in developing SHM technologies,
utilizing various advanced methods. Shafique et al. [3] developed an automatic system using
acoustic analysis to detect track faults like wheel burns and superelevation, achieving a 97%
accuracy with machine learning models such as random forests and decision trees. Rifat
et al. [4] proposed a solar-powered autonomous vehicle equipped with ultrasonic sensors
for crack detection, stopping trains upon detecting faults. Gálvez et al. [5] introduced a
hybrid model-based approach (HyMA) combining physics-based and data-driven models for
HVAC system diagnostics, achieving 92.60% accuracy. Liu et al. [6] focused on fault detection
in high-speed railway systems under hybrid AC/DC grids using a neural network-based
monitoring system, validated through real-time and off-line experiments.
Ghosh et al. [7] compared Fast Fourier Transformation and Discrete Wavelet Trans-
formation for detecting track faults, finding the latter more effective in real-time scenarios.
These methods achieved outstanding accuracies of 100% and 99.85%, respectively, in detect-
ing rail corrugations and cracks during simulations. However, they presented limitations in
terms of computational speed and required additional adaptations, such as GPS sensors,
to locate faults efficiently. Andrusca et al. [8] discussed using sensors to monitor impedance
bonds, providing real-time warnings for faults. Siddiqui et al. [9] presented an IoT-based
system for real-time fault localization on Pakistani railway tracks, achieving 98.4% accuracy
with a multilayer perceptron model.
Yu et al. [10] used an enhanced YOLOv4-tiny model deployed on FPGA platforms, which
achieved a speed of 295.9 FPS and an mAP of 95.1% in detecting defects in track fasteners
from an unmanned aerial vehicle (UAV). Tong et al. [11] implemented
networks such as ARTNet, designed for track detection from aerial imagery at varying
angles, offering real-time detection capabilities with inference rates of up to 233 FPS and an
average F1 of 76.12. In addition to image-based systems, real-time monitoring solutions
with onboard sensors on trains offer a complementary perspective for fault detection.
Moriello et al. [12] developed an IoT-based system able to identify anomalies in
freight train carriage components through vibration analysis, enhancing capabilities such
as early detection, better resource allocation, and improved railway safety by enabling
preventive interventions.
The utilization of deep learning in railway fault detection systems has significantly
enhanced safety and maintenance across various railway components. Oh et al. [13]
categorized deep learning applications into four main areas, including infrastructure, train
components, operations, and station safety, discussing challenges and opportunities within
each. Wang et al. [14] highlighted the effectiveness of one-stage deep learning models like
YOLOv2 for rapid and accurate component inspection, achieving a 93% mAP at 35 IPS.
Shim et al. [15] focused on anomaly detection in wheel flats, employing signal processing
techniques and a modified LeNet deep learning model, which improved detection accuracy
using spectrogram images. Kapoor et al. [16] developed a system for obstacle detection on
tracks using a deep classifier network and 2-D Singular Spectrum Analysis, achieving 85.2%
accuracy under varying illumination conditions. Thendral et al. [17] introduced a computer
vision-based crack detection system using Gabor transforms and neural networks, attaining
94.9% accuracy. Wang et al. [18] proposed the YOLOv5s-VF network for detecting rail
surface defects, featuring a sharpening attention mechanism and adaptive spatial feature
fusion, resulting in 93.5% accuracy and a 114.9 fps detection speed. Feng et al. [19] leveraged
MobileNet and YOLO-inspired architectures for real-time defect detection, demonstrating
the method's potential in industrial applications. He et al. [20] enhanced obstacle
detection with an improved YOLOv4 network, optimizing performance for medium- and
long-distance obstacles with an mAP of 93% on NVIDIA Jetson AGX.
Sresakoolchai and Kaewunruen [21] employed deep learning models, including CNN and RNN, to
detect and evaluate the severity of combined rail defects, achieving 99% accuracy for both
settlement and dipped joint detection, with CNN providing high accuracy for the dipped
joint severity and RNN for the settlement severity. These studies collectively highlight
the advancements and diverse methodologies in railway fault detection, emphasizing
the importance of automation and real-time monitoring to ensure safety and efficiency in
railway systems.
The YOLOv11 model implementation in our study offers an advanced approach for
automated fault detection in railways, especially in tasks that require real-time processing
and high accuracy. In contrast to previous versions such as YOLOv4 or YOLOv5, which
have been adapted for the detection of defects in specific track elements such as rail
fasteners, our model was optimized with data augmentation techniques and lightweight
configurations to address a wide range of structural defects and railway contexts. In
comparison, Brintha et al. [22] describe the FOD-YOLO model that specializes in detecting
objects and defects in fasteners, achieving an accuracy of 98.14% and an mAP of 96.85%,
thanks to the incorporation of modules such as SE-CSPDarkNet53 and SE-EfficientNet,
designed to improve attention to specific features and the processing efficiency. Although
both approaches employ YOLO-based architectures, our implementation of YOLOv11
stands out by integrating an enhanced generalization capability, with a test accuracy of
92.64% and a lightweight framework of 2.6 million parameters, facilitating its deployment
in low-power devices and diverse operating environments.
This study aims to develop a robust and reliable system for railway fault detection,
specifically targeting critical issues like track cracks through advanced image analysis with
transfer learning. By focusing on image-based analysis, this approach provides an efficient
solution for real-time track condition monitoring, enabling the early detection of structural
defects that could lead to derailments or costly shutdowns.

2. Materials and Methods


This article describes a structured approach for detecting faults in railway tracks
through deep learning, addressed in several key stages to optimize model performance.
The first stage focused on dataset formation and image preprocessing. The second stage
covered the architecture definition and deep learning model selection based on transfer
learning. In the third stage, these architectures were trained using the combined dataset.
This process involved standard techniques like the Adam optimizer and categorical cross-
entropy loss function, optimizing each model based on the required number of epochs to
prevent overfitting. To mitigate overfitting, the fourth stage employed data augmentation
techniques, thereby increasing the number of training images with additional transforma-
tions like random flipping, rotation, translation, and brightness adjustments. This approach
helped reduce the overfitting observed in the previous stage, improving model general-
ization. Finally, in the last stage, fine-tuning was applied to the models obtained in the
previous stage. In this step, the upper layers of the pre-trained models were unlocked
to allow their weights to be specifically adjusted to the rail dataset, optimizing the final
accuracy. Through fine-tuning, the VGG16, MobileNet, and InceptionV3 models achieved
adequate performance for real-world applications, providing accuracy and reliability in
detecting tracks in poor conditions.

2.1. Stage 1: Dataset Formation and Image Preprocessing


2.1.1. Dataset
Three publicly available datasets [23–25], as described in [26], were used in this study.
These datasets included a total of 2225 images collected from a variety of railway tracks,
including old, new, and abandoned lines, to capture a diverse range of faults specific to
meter gauge and broad gauge tracks. The images, annotated manually to distinguish
defective and non-defective examples, featured critical railway components such as rails
and fasteners. These components are often linked to severe accidents, making their accurate
detection essential. The dataset included images captured from different angles to ensure
diversity and generalizability.
To optimize model performance and improve generalization, data augmentation tech-
niques were applied to the training data. These transformations included random flipping,
rotation, translation, and brightness adjustments, simulating real-world variations and en-
hancing the model’s robustness to unseen data. Figure 1 provides representative examples
from the dataset, illustrating diverse fault types such as rail fractures (Figure 1a–d), wear
and broken hooks (Figure 1e–h), and critical breakages with artifacts (Figure 1i–l).


Figure 1. Samples from the dataset illustrating various fault types on railway tracks. (a–d) Examples
of rail fractures, showing cracks and separations on the rail surface; (e–h) wear and broken hooks,
highlighting the degradation and damage to rail fasteners; (i–l) critical breakages with artifacts,
demonstrating severe failures such as complete rail splits and displacements.

While the current dataset is diverse, the inclusion of railway images captured under
varying climatic conditions is recommended for future research. This enhancement would
further validate the adaptability and robustness of the proposed models across a wider
range of real-world scenarios.

2.1.2. Image Resizing


The image dataset underwent several preprocessing steps to ensure compatibility with
the deep learning architectures and to maximize model performance. Initially, all images
were resized to dimensions of 256 × 256 × 3 using bilinear interpolation, chosen for its
ability to maintain image quality during resizing. This step ensured uniformity across
input data and compatibility with the pre-trained architectures.
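To make the preprocessing concrete, the following minimal sketch shows one way to build such a resized input pipeline with Keras; the directory path, folder layout, and batch size are illustrative assumptions, not details taken from the paper.

```python
import tensorflow as tf

# Hypothetical folder layout: railway_images/{defective,non_defective}/*.jpg
# image_dataset_from_directory resizes every image to the requested size,
# here with bilinear interpolation, and keeps the 3 RGB channels.
raw_ds = tf.keras.utils.image_dataset_from_directory(
    "railway_images",             # assumed path
    labels="inferred",
    label_mode="categorical",     # two-class one-hot labels for the Softmax head
    image_size=(256, 256),
    interpolation="bilinear",
    batch_size=32,                # assumed batch size
)
```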

2.1.3. Dataset Splitting


The dataset was normalized to scale pixel values to the [0, 1] range, enhancing stability
during training. It was then divided into three subsets: training, validation, and testing,
as detailed in Table 1, ensuring the diverse representation of defect types in each subset.
Special care was taken to maintain a balanced distribution of defective and non-defective
images to prevent bias during model training and evaluation. The variations observed in
Table 1 resulted from the data augmentation process, which expanded the training set to
improve model performance and generalization while keeping the validation and testing
sets consistent for reliable evaluation.
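A schematic of the normalization and balanced split is sketched below; the array files, the random seed, and the use of scikit-learn's train_test_split are assumptions chosen to mirror the stratified split described above, with subset sizes taken from Table 1 (Stage 3).

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical pre-built arrays of resized images and integer class ids
# (0 = non-defective, 1 = defective); the file names are illustrative only.
images = np.load("railway_images.npy").astype("float32") / 255.0  # scale to [0, 1]
labels = np.load("railway_labels.npy")

# Two stratified splits reproduce the 1517 / 477 / 231 subsets of Table 1
# while keeping a similar defective / non-defective ratio in each subset.
x_train, x_rest, y_train, y_rest = train_test_split(
    images, labels, train_size=1517, stratify=labels, random_state=42)
x_val, x_test, y_val, y_test = train_test_split(
    x_rest, y_rest, train_size=477, stratify=y_rest, random_state=42)
```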

Table 1. Size of dataset subsets in every stage.

Stage   Training   Validation   Test
3       1517       477          231
4       6068       477          231
5       6068       477          231

2.1.4. Data Augmentation


In Stages 4 and 5, the training dataset was expanded from 1517 to 6068 images by ap-
plying various transformation techniques to enhance model generalization and robustness.
The transformations applied included the following:
• Random Flip: Mirroring images randomly along the horizontal and/or vertical axes.
• Random Rotation: Rotation applied at random angles within the range of −0.25π
to 0.25π.
• Translation: Shifting images by 20% along both the horizontal and vertical axes to
simulate positional shifts and enhance robustness.
• Brightness Adjustment: Modifying the image brightness randomly within the range
of −0.2 to 0.2, where −1.0 results in a completely black image and 1.0 in a fully white
image, to account for varying lighting conditions.
For these transformations, the Keras layers RandomFlip, RandomRotation,
RandomTranslation, and RandomBrightness were utilized. Figure 2 illustrates all the applied
image transformations. This significantly increased the diversity of the training data,
enhancing the model's ability to generalize to unseen scenarios.
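A minimal sketch of this augmentation pipeline, built from the Keras preprocessing layers named above, is shown below; note that RandomRotation expresses its factor as a fraction of a full turn, so 0.125 corresponds to the ±0.25π range listed earlier, and stacking the layers into a single Sequential model is an illustrative choice.

```python
import tensorflow as tf
from tensorflow.keras import layers

# Augmentation pipeline mirroring the transformations listed above.
augmentation = tf.keras.Sequential([
    layers.RandomFlip("horizontal_and_vertical"),      # random mirroring
    layers.RandomRotation(factor=0.125),                # +/- 0.25*pi radians
    layers.RandomTranslation(height_factor=0.2,
                             width_factor=0.2),         # 20% positional shifts
    layers.RandomBrightness(factor=0.2),                # delta in [-0.2, 0.2]
])

# Applied on the fly to a batch of training images:
# augmented_batch = augmentation(image_batch, training=True)
```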
The resizing and splitting of data were performed prior to training the architectures
in Stages 3, 4, and 5. The data augmentation process was applied before training in
Stages 4 and 5 to further enhance the diversity of the training set.

Figure 2. Example of all transformations in the data augmentation process.

2.2. Stage 2: Architecture Definition and Model Selection for Transfer Learning
A transfer learning process was performed using the following architectures:
ResNet50V2, Xception, VGG16, MobileNet, and InceptionV3. All of these architectures
were accessed through the Keras Applications module in Python. The models utilized
pre-trained weights from the ImageNet dataset, which are readily available within the
Keras Applications module. These pre-trained weights provided a strong starting point for
the models, enabling them to leverage prior knowledge and adapt efficiently to the task of
railway fault detection.
Table 2 describes each pre-trained model along with the performance metrics on the
ImageNet database. This information was sourced from the Keras documentation [27]. As
explained in [27], the accuracy values represent the models’ performance on the ImageNet
validation dataset, while the time per inference step was calculated as the average over
30 batches, each with a size of 32, repeated 10 times. The hardware specifications used
included an AMD EPYC Processor (with IBPB) with 92 cores, 1.7 TB of RAM, and a Tesla
A100 GPU.

Table 2. Performance on ImageNet of the pre-trained models considered for transfer learning.

Model         Size (MB)   Top-1 Accuracy   Top-5 Accuracy   Parameters   Depth   CPU (ms/step)   GPU (ms/step)
ResNet50V2    98          76.0%            93.0%            25.6 M       103     45.6            4.4
Xception      88          79.0%            94.5%            22.9 M       81      109.4           8.1
VGG16         528         71.3%            90.1%            138.4 M      16      69.5            4.2
MobileNet     16          70.4%            89.5%            4.3 M        55      22.6            3.4
InceptionV3   92          77.9%            93.7%            23.9 M       189     42.2            6.9

From these pre-trained models, their fully connected top layers were not used. Instead,
a custom model was built on top of these pre-trained models. This custom model consisted
of three layers: a flatten layer, a fully connected layer, and an output layer. Additionally, an
input layer was implemented before the pre-trained models. Figure 3 illustrates the final
structure of the defined models.

Figure 3. The diagram illustrates the overall structure of the deep learning model used for railway
track fault detection.

Each architecture consists of the following components:


• Input Layer: Handles the input data, consisting of images with dimensions of 256 × 256 × 3.
• Pre-trained Model: Used without its fully connected top layer, allowing for the inte-
gration of a custom classification head.
• Flatten Layer: Converts the output array from the pre-trained model into a one-
dimensional vector, preparing it for the fully connected layers.
• Fully Connected Layer: Contains 128 neurons with ReLU activation to introduce
non-linearity and enhance the model’s learning capability.
• Output Layer: A fully connected layer with 2 neurons and a Softmax activation
function. The problem is formulated as a categorical classification task with two classes:
one representing non-defective tracks and the other representing defective tracks.
The choice of 2 neurons and the Softmax function ensure that the model outputs a
probability distribution over the two classes, facilitating accurate predictions.
As can be observed, the only differences between the architectures lie in their middle
layers, which correspond to the pre-trained models. The input layer, flatten layer, fully
connected layer, and output layer remain identical across all the proposed architectures.
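A minimal Keras sketch of this shared structure is given below, using VGG16 as the interchangeable backbone; any of the other four pre-trained models would be dropped into the same slot, and everything outside the backbone follows the layer list above.

```python
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG16

# Backbone without its fully connected top; weights pre-trained on ImageNet.
base = VGG16(weights="imagenet", include_top=False, input_shape=(256, 256, 3))
base.trainable = False  # kept "blocked" (frozen) during Stages 3 and 4

model = models.Sequential([
    layers.Input(shape=(256, 256, 3)),      # Input Layer
    base,                                   # Pre-trained Model
    layers.Flatten(),                       # Flatten Layer
    layers.Dense(128, activation="relu"),   # Fully Connected Layer
    layers.Dense(2, activation="softmax"),  # Output Layer: defective / non-defective
])
```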
The proposed architectures were trained in three phases, with a model generated for
each architecture at each phase:
• Phase 1: Transfer learning was applied using the combined dataset consisting of
Dataset 1 and Dataset 2.
• Phase 2: Transfer learning was conducted again, this time using an augmented version
of the training data from the combined dataset to improve model generalization.
• Phase 3: A fine-tuning process was applied to all models obtained in Phase 2, allowing
for further optimization and adaptation to the specific dataset.
Each phase corresponded to a project stage, spanning from Stage 3 to Stage 5.

2.3. Stage 3: Training Architectures with Combined Dataset


The training process for each defined architecture utilized the Adam optimizer and
categorical cross-entropy as the loss function. To mitigate overfitting, the number of epochs
varied across models. This approach was necessary because different models require
different numbers of epochs to effectively learn a task. Some models tend to overfit more
quickly than others, losing their generalization capability after fewer epochs.
Categorical cross-entropy was chosen as the loss function because the problem was
categorized as a classification task. Consequently, the Softmax activation function was
employed in the output layer to produce probability distributions over the classes.
In this stage, as shown in Table 1, the models were trained using a dataset of
1517 images and validated with 477 images. Finally, testing was conducted on the re-
maining 231 images from the combined dataset. The number of epochs used for each model
is detailed in Table 3.

Table 3. Number of training epochs per stage for each architecture.

Model         Stage 3   Stage 4   Stage 5
ResNet50V2    20        15        7
Xception      20        10        5
VGG16         5         5         3
MobileNet     10        10        10
InceptionV3   30        30        15

As a result, five models were generated, with the accuracy and loss values for both the
training and validation data recorded for subsequent analysis. Each model was evaluated
using a sample of 231 images.
During the training phase, the pre-trained model embedded within each defined
architecture was blocked, meaning its weights remained unchanged throughout the train-
ing process.
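Continuing the hedged model and splitting sketches above, a Stage 3 training step along these lines might look as follows; the epoch count shown is the Stage 3 value for VGG16 from Table 3, and the one-hot conversion is included so the integer labels from the splitting sketch match the categorical cross-entropy loss.

```python
import tensorflow as tf

# One-hot encode the integer class ids so they match the 2-neuron Softmax output
# and the categorical cross-entropy loss used throughout the paper.
y_train_oh = tf.keras.utils.to_categorical(y_train, num_classes=2)
y_val_oh = tf.keras.utils.to_categorical(y_val, num_classes=2)
y_test_oh = tf.keras.utils.to_categorical(y_test, num_classes=2)

# Backbone frozen (base.trainable = False); only the custom head is learned.
model.compile(
    optimizer="adam",                      # Adam optimizer
    loss="categorical_crossentropy",       # categorical cross-entropy loss
    metrics=["accuracy"],
)

history = model.fit(
    x_train, y_train_oh,                   # 1517 training images (Table 1)
    validation_data=(x_val, y_val_oh),     # 477 validation images
    epochs=5,                              # Table 3, Stage 3 value for VGG16
)

test_loss, test_acc = model.evaluate(x_test, y_test_oh)   # 231 test images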

2.4. Stage 4: Training Architectures with the Augmented Dataset


To improve the performance metrics obtained by the models in Stage 3, the architec-
tures listed in Table 2 were re-trained using the augmented training dataset. This dataset
was expanded as described in Section 2.1.4.
In this stage, the training data increased to 6068 images (see Table 1), while the
validation and test samples remained the same size as in Stage 3.
The number of epochs for the ResNet50V2 and Xception models was adjusted (refer
to Table 3) to account for the expanded training dataset and to address the learning perfor-
mance observed in Stage 3, specifically concerning their loss values. More details can be
found in Section 3.
Learning metrics, such as the accuracy and loss, were documented for further analysis.
Similarly to the previous stage, the pre-trained model within each architecture was kept
blocked during training to prevent any updates to its weights.

2.5. Stage 5: Fine-Tuning the Models Obtained in Stage 4


To achieve better results, the models obtained in Stage 4 underwent a fine-tuning
process. For this step, the pre-trained components of each model from Stage 4 were
unblocked to allow their weights to be updated during training. This adjustment aimed
to better adapt the models to the specific dataset used in this problem. In previous stages
(Stage 3 and Stage 4), the models retained weights pre-trained on the ImageNet dataset.

The number of epochs for fine-tuning was set to approximately half of the epochs
used for training in Stage 4 for all models, except MobileNet and VGG16. For these two
models, the number of epochs was increased beyond half because they achieved better
results in Stage 4. More details on this can be found in Section 3.
The models were fine-tuned using the augmented training dataset (see Table 1), vali-
dated on the same set of 477 images, and tested on a sample of 231 images, similarly to the
setup in Stage 4.
The training and test results were recorded to evaluate whether the fine-tuning process
positively impacted model performance.
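The fine-tuning step can be summarized by the sketch below, again continuing the earlier ones; the learning rate and the augmented-array names are assumptions (the paper does not report a fine-tuning learning rate), while the epoch count follows Table 3 (Stage 5, VGG16).

```python
import tensorflow as tf

# "Unblock" the pre-trained backbone so its weights can be updated.
base.trainable = True

# Recompile with a small learning rate to avoid destroying the pre-trained
# features; the exact value here is an illustrative assumption.
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-5),
    loss="categorical_crossentropy",
    metrics=["accuracy"],
)

model.fit(
    x_train_aug, y_train_aug_oh,           # hypothetical augmented set of 6068 images
    validation_data=(x_val, y_val_oh),
    epochs=3,                              # Table 3, Stage 5 value for VGG16
)
```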

2.6. YOLOv11: Training Methodology and Data Augmentation


The YOLOv11 model was developed using a unique methodology that diverged from
the approaches employed for the other models in this study. The default transformations in
the Python library Ultralytics were used to prepare the training data. These transformations
were applied randomly using the randaugment method, which involved adjustments to
the hue, saturation, and brightness, rotation, horizontal and vertical translation, scaling,
and horizontal flipping. Unlike traditional data augmentation techniques that increase
the dataset size by generating additional samples, this approach ensured that unique
datasets were created for each training epoch. This was achieved by applying random
transformations to the entire training dataset for every iteration, resulting in dynamic
augmentation throughout the training process. The loss values recorded at the end of each
epoch corresponded to these augmented training datasets. After each training epoch, the
model was evaluated using the original, untransformed training and validation datasets.
This approach ensured that the accuracy values reported for training, validation, and
testing were consistent with those of the other models, as the datasets used for validation
and testing were identical across all evaluations. By maintaining this consistency, the study
enabled a reliable comparison of YOLOv11’s performance with that of the other models.
The YOLOv11 model utilized the same base training dataset as the other models analyzed
in this research. Despite the differences in the data augmentation methodology, the validation
and test datasets remained unmodified, ensuring uniformity in the evaluation process. This
consistency allowed for a robust comparison of YOLOv11’s effectiveness against that of the
previously evaluated models in terms of both accuracy and computational efficiency.
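For reference, a minimal Ultralytics sketch of this setup is shown below; the dataset path, epoch count, and image size are illustrative assumptions, and the library's built-in augmentation (hue/saturation/value shifts, rotation, translation, scaling, and flips) is applied on the fly at every epoch as described above.

```python
from ultralytics import YOLO

# Nano classification variant of YOLO11, with pre-trained Ultralytics weights.
model = YOLO("yolo11n-cls.pt")

# Classification training expects a dataset root with train/val (and optionally
# test) subfolders, each containing one directory per class.
results = model.train(
    data="railway_dataset",   # assumed dataset root
    epochs=50,                # illustrative; the per-model value is not reported here
    imgsz=256,                # match the 256 x 256 inputs used elsewhere
)

# Accuracy on the held-out split, evaluated with the untransformed images.
metrics = model.val()
```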

3. Results
3.1. Analysis of Stage 3: Initial Training and Overfitting
To analyze the performance of the models, the test results were compared with the
learning metrics (accuracy and loss) obtained during training. Figure 4 summarizes the
learning process of each model from Stage 3. The final values of accuracy and loss recorded
at the last epoch for each model are presented in Table 4.
According to Figure 4, all the models exhibited signs of overfitting, as evidenced by
the gap between the validation and training curves for both accuracy and loss. Table 4
further confirms this, with test accuracy values being significantly lower than the training
accuracy values recorded at the last epoch.
While most models achieved acceptable training accuracy values of over 85% (except
ResNet50V2), the poor performance on validation and test data indicated that these models
were not suitable for real-world applications in their current form.

Figure 4. Stage 3 models’ learning curves.

Table 4. Training, validation, and testing results per stage for each model.

Stage   Model         Training Loss   Validation Loss   Training Accuracy   Validation Accuracy   Test Accuracy (%)
3       ResNet50V2    0.8003          4.0292            0.8154              0.6541                59.74
3       Xception      0.1221          0.6924            0.9618              0.7442                73.16
3       VGG16         0.1641          0.3751            0.9341              0.8491                77.06
3       MobileNet     0.2407          0.4387            0.8985              0.8029                79.65
3       InceptionV3   0.2381          0.9418            0.9038              0.6646                64.07
4       ResNet50V2    1.4049          1.1115            0.7121              0.7400                69.70
4       Xception      0.4775          0.6022            0.7615              0.7191                66.23
4       VGG16         0.2009          0.2764            0.9103              0.9140                85.71
4       MobileNet     0.3216          0.3953            0.8543              0.8176                80.09
4       InceptionV3   0.5548          0.5729            0.7050              0.6897                67.97
5       ResNet50V2    0.4190          1.2387            0.8131              0.7044                58.44
5       Xception      0.3942          0.4959            0.8153              0.7547                77.49
5       VGG16         0.0539          0.1747            0.9789              0.9560                89.18
5       MobileNet     0.2036          0.2903            0.9100              0.8826                85.71
5       InceptionV3   0.2591          0.3838            0.8856              0.8407                85.71

3.2. Analysis of Stage 4: Mitigating Overfitting Through Data Augmentation


To mitigate the overfitting observed in Stage 3, two changes were implemented:
• An augmented training dataset was created, as described in Section 2.4.
• The number of epochs was adjusted for ResNet50V2 and Xception, given their high
ratio of validation to training loss values.

The results for the Stage 4 models are presented in Table 4 and Figure 5. As shown in
Figure 5, the validation and training curves tended to converge, with no significant gap,
indicating a reduction in overfitting. This conclusion is further supported by the closer
alignment between the test, training, and validation accuracy values in Table 4.
Despite improvements, the training accuracy for ResNet50V2, Xception, and Incep-
tionV3 remained below 85%, which is generally insufficient for real-world applications. On
the other hand, VGG16 and MobileNet showed promising performance, achieving over
90% accuracy during training and validation.

Figure 5. Stage 4 models’ learning curves.

3.3. Analysis of Stage 5: Fine-Tuning to Enhance Accuracy


In Stage 5, fine-tuning was applied to the models from Stage 4 to further adapt them
to the specific dataset. The number of epochs for fine-tuning was set to approximately
half of those used in Stage 4 for ResNet50V2, Xception, and InceptionV3. For VGG16 and
MobileNet, the epochs were slightly increased due to their better performance metrics in
Stage 4 (Table 4).
The fine-tuned results are shown in Table 4 and Figure 6. While fine-tuning improved
the accuracy for all models except ResNet50V2, it also introduced signs of overfitting in
some cases. For ResNet50V2, fine-tuning led to increased training accuracy but decreased
validation and test accuracy, confirming overfitting.
VGG16 emerged as the best performing model, achieving accuracy values above 89%
across all metrics. MobileNet and InceptionV3 also demonstrated acceptable performance,
though they did not surpass VGG16. While Xception’s performance improved, it remained
insufficient for practical applications.

Figure 6. Stage 5 models’ learning curves.

Table 4 summarizes the final loss and accuracy values for each model. The most
notable observation is the consistent improvement in VGG16’s performance, achieving the
highest test accuracy of 89.18% after fine-tuning in Stage 5. MobileNet and InceptionV3
also demonstrated robust performance, with test accuracies of 85.71%, positioning them
as reliable alternatives for deployment. Noteworthy as well is the marked reduction in
overfitting observed in Stage 4 for all models, evidenced by the convergence of training
and validation metrics. These trends highlight the effectiveness of data augmentation and
fine-tuning in enhancing the generalization capabilities of the models.
3.4. Performance of the YOLOv11 Model
The loss curves for the YOLOv11 model demonstrated consistent convergence without
any signs of overfitting, indicating that the training process was well regularized and robust,
as shown in Figure 7. While the accuracy curves did not fully converge, the stable behavior
of the loss curves suggested that the model remained viable for practical applications,
particularly in scenarios requiring reliable and efficient detection.
At the end of the training process, the YOLOv11 model achieved final accuracy values
of 99.87% on the original training dataset, 96.24% on the validation dataset, and 92.64%
on the test dataset. These results surpassed those obtained by all other models in this
study, including VGG16, which had previously shown the highest accuracy. The superior
performance of YOLOv11, especially on the validation and test datasets, underscores its
effectiveness in detecting railway faults with high precision.
In addition to its accuracy, YOLOv11 offered significant computational advantages.
The nano version of the model, YOLOv11n, contains only 2.6 million parameters, which is
considerably fewer than VGG16 (138.4 million parameters) and even fewer than MobileNetV2
(3.5 million parameters). This reduced parameter count translates to lower computational
resource requirements, making YOLOv11n particularly well suited for deployment in
real-world environments where efficiency and scalability are critical.

Figure 7. Training and validation loss and accuracy curves for the YOLOv11 model.

3.5. Comparison Between VGG16 and YOLOv11n


The performance comparison between the best transfer learning model, VGG16, and
the YOLOv11n model highlighted key differences in accuracy and computational efficiency.
As shown in Table 5, YOLOv11n demonstrated superior accuracy across all datasets, achiev-
ing 99.87% on the training set, 96.24% on the validation set, and 92.64% on the test set,
outperforming VGG16, which achieved a lower test accuracy of 89.18%. Additionally,
YOLOv11n’s lightweight architecture, with only 2.6 million parameters, is significantly
more resource-efficient compared to VGG16’s 138.4 million parameters, making it particu-
larly well suited for real-world applications where computational resources are limited.

Table 5. Comparison of the best transfer learning model (VGG16) and YOLOv11n.

Metric                        VGG16               YOLOv11n
Training Accuracy (%)         97.89               99.87
Validation Accuracy (%)       95.60               96.24
Test Accuracy (%)             89.18               92.64
Number of Parameters (M)      138.4               2.6
Computational Efficiency      High Resource Use   Low Resource Use

This balance between high accuracy and a reduced computational demand positions
YOLOv11n as the most viable option for real-time railway fault detection, while still
acknowledging the strong performance of VGG16.

4. Discussion
In this study, a binary classification approach was utilized to distinguish between
defective and non-defective railway tracks as a foundational step toward automated fault
detection. This simplification enabled an efficient method for detecting structural defects,
particularly cracks, which represent a significant safety concern. However, this approach
inherently limits the ability to identify a broader spectrum of defect types, such as fastener
issues and wear patterns, which are critical for comprehensive railway maintenance.
The dataset primarily included images labeled as either cracked or non-defective,
which constrained the capacity of deep learning models such as VGG16 and MobileNet
to generalize beyond these categories. While fine-tuning improved model performance
significantly, achieving an accuracy above 89% in VGG16, the absence of diverse labeled ex-
amples for other defect types impeded the exploration of more granular classification tasks.
Future work will focus on expanding the dataset to incorporate a wider variety of defect
categories and implementing multiclass classification frameworks. These advancements
aim to provide a deeper understanding of defect characteristics and enhance the practical
utility of fault detection systems in real-time scenarios.
The incorporation of YOLOv11 in this study demonstrated its ability to surpass trans-
fer learning models in both accuracy and computational efficiency. By leveraging advanced
data augmentation techniques and a lightweight architecture with only 2.6 million pa-
rameters, YOLOv11 achieved a test accuracy of 92.64%, outperforming VGG16 (89.18%).
This indicates that models specifically designed for efficient image processing, such as
YOLOv11, can provide robust performance for classification tasks while requiring fewer
computational resources, making them more suitable for practical deployment in con-
strained environments.
Compared with other studies on railway fault detection, our findings support the
growing consensus that deep learning models, including transfer learning architectures
and specialized frameworks like YOLOv11, can effectively enhance fault detection accuracy
and efficiency. Previous research, such as the study by Gibert et al. [28], utilized a fully
convolutional DCNN for material classification and semantic segmentation in railway
images. In contrast, our methodology focused on transfer learning with pre-trained models
like ResNet50V2, Xception, and VGG16, as well as adapting YOLOv11 for classification
tasks. While Gibert et al. achieved 93.35% accuracy in material classification, our approach
using YOLOv11 demonstrated that advanced frameworks can achieve similar or better
performance with a lower computational overhead.
Compared to Chandran et al. [29], who combined image processing with CNN and
ResNet-50 models to detect missing railway fasteners, our study compared transfer learn-
ing and YOLOv11 to classify railway tracks into defective and non-defective categories.
YOLOv11, in particular, achieved superior accuracy with a more efficient architecture, high-
lighting its suitability for real-world deployment. Similarly, Yilmazer and Karakose [30]
achieved high accuracy for fastener defect detection using drone-acquired images and
ResNet-based models. Our study, relying on ground-level imagery and simpler setups,
demonstrated comparable accuracy, underscoring the versatility of YOLOv11 for classifica-
tion tasks across diverse imaging methods.
In a similar study, Yilmazer and Karakose [30] utilized an autonomous drone to collect
aerial railway images, applying Faster R-CNN for object detection and ResNet101v2 for
defect classification. Their approach achieved an accuracy of 95% for fastener defects and
98% for rail surface defects, benefiting from drone-acquired stability and video preprocess-
ing. In contrast, our approach, which used ground-level images, demonstrated comparable
accuracy in fault detection, particularly for fastener issues, by optimizing transfer learning
strategies without requiring aerial imaging setups. This highlights that model fine-tuning
can yield effective results across diverse data collection methods.
Studies such as those by Guo et al. [31] and Min et al. [32] focused on YOLO-based ob-
ject detection frameworks optimized for real-time applications. While Guo et al. achieved
a mean average precision of 94.4% with YOLOv4, and Min et al. achieved 96.1% accuracy
with an improved YOLOX, their methodologies prioritized real-time performance. In contrast,
our adaptation of YOLOv11 for classification highlights its effectiveness in scenarios
where computational efficiency and accuracy are prioritized over real-time speed.
Overall, these comparisons emphasize the complementary strengths of transfer learn-
ing models and frameworks like YOLOv11. While transfer learning provides a practical and
adaptable approach for fault detection across multiple defect types, YOLOv11’s lightweight
architecture and high accuracy offer an efficient alternative for real-world deployment in
constrained environments. Future work should explore integrating multiclass classifica-
tion capabilities into YOLO-based frameworks and transfer learning models to address a
broader range of railway defects with enhanced precision.

5. Conclusions
This research has demonstrated the effectiveness of using deep learning architectures,
including both transfer learning models and specialized frameworks like YOLOv11, in the
development of an automated and reliable system for railway defect detection, a critical
area for safety and operational efficiency. A methodology involving a combination of multi-
phase training and a controlled fine-tuning process was applied to the selected models,
enabling the identification of optimal architectures for the successful classification of rail de-
fects while minimizing the training time. Notable transfer learning models include VGG16,
MobileNet, and InceptionV3. Data augmentation and a fine-tuning process were essential
to improve model generalization and mitigate overfitting. The high accuracy achieved
by VGG16 underscores its feasibility for real-world applications in railway monitoring
systems, while MobileNet and InceptionV3 also demonstrated strong performance.
The inclusion of YOLOv11 further highlighted its potential as a state-of-the-art frame-
work for railway fault classification. The model achieved a test accuracy of 92.64%, surpass-
ing the best results obtained by VGG16 and other deep learning models used in this research.
Additionally, the nano version of YOLOv11, with only 2.6 million parameters, proved to
be the most computationally efficient architecture, requiring significantly fewer resources
while maintaining superior performance. These results position YOLOv11 as the most
viable option for real-world applications, particularly in scenarios where computational
resources are limited, such as in edge devices or embedded systems.
Despite some limitations, such as the constrained dataset focused on the binary clas-
sification of defects, our proposed approach confirms the potential of deep learning for
monitoring railway infrastructure. Expanding the dataset to include a broader range of
defect categories and implementing multiclass classification frameworks could further
enhance the system’s practical utility. Future work should also focus on refining YOLOv11’s
training processes, exploring real-time implementation, and optimizing deployment on
embedded systems to improve the scalability and efficiency. These advancements would
contribute significantly to ensuring railway safety and operational reliability while reducing
maintenance costs through predictive fault detection.

Author Contributions: Conceptualization, O.R.-A., E.C.-P. and G.C.-R.; methodology, O.R.-A., J.M.C.-P.
and I.M.-S.; software, O.R.-A., E.C.-P. and G.C.-R.; validation, O.R.-A., M.A.Q.-J., E.C.-P. and I.M.-S.;
formal analysis, O.R.-A., E.C.-P. and I.M.-S.; investigation, O.R.-A., E.C.-P. and I.M.-S.; resources,
O.R.-A., M.A.Q.-J., E.C.-P. and G.C.-R.; data curation, J.M.C.-P., E.C.-P. and G.C.-R.; writing—original
draft preparation, O.R.-A., E.C.-P. and I.M.-S.; writing—review and editing, O.R.-A., E.C.-P., M.A.Q.-J.
and I.M.-S.; visualization, O.R.-A., E.C.-P. and I.M.-S.; supervision, M.A.Q.-J., J.R.-R., O.R.-A., E.C.-P.
and I.M.-S.; project administration, O.R.-A., E.C.-P. and I.M.-S.; funding acquisition, O.R.-A., J.R.-R.,
M.A.Q.-J., J.M.C.-P., E.C.-P. and I.M.-S. All authors have read and agreed to the published version of
the manuscript.

Funding: This work was funded by the Consejo Nacional de Humanidades, Ciencias y Tecnologías
(CONAHCyT) of Mexico under Grant CF-2023-I-1496 and the Dirección General de Asuntos del
Personal Académico (DGAPA), National Autonomous University of Mexico (UNAM), under Project
UNAMPAPIIT TA101023 and the postdoctoral fellowship DGAPA-UNAM for O.R.-A.

Data Availability Statement: Data will be made available on request.

Conflicts of Interest: The authors declare that they have no known competing financial interests or
personal relationships that could have appeared to influence the work reported in this paper.

References
1. Rampriya, R.; Jenefa, A.; Prathiba, S.B.; Julus, L.J.; Selvaraj, A.K.; Rodrigues, J.J. Fault Detection and Semantic Segmentation on
Railway Track Using Deep Fusion Model. IEEE Access 2024, 12, 136183–136201. [CrossRef]
2. Zheng, D.; Li, L.; Zheng, S.; Chai, X.; Zhao, S.; Tong, Q.; Wang, J.; Guo, L. A Defect Detection Method for Rail Surface and
Fasteners Based on Deep Convolutional Neural Network. Comput. Intell. Neurosci. 2021, 2021, 2565500. [CrossRef] [PubMed]
3. Shafique, R.; Siddiqui, H.U.R.; Rustam, F.; Ullah, S.; Siddique, M.A.; Lee, E.; Ashraf, I.; Dudley, S. A novel approach to railway
track faults detection using acoustic analysis. Sensors 2021, 21, 6221. [CrossRef]
4. Rifat, A.; Pandao, P.; Babu, B.S. Solar powered fault detection system for railway tracks. Eur. J. Electr. Eng. Comput. Sci. 2022,
6, 39–43. [CrossRef]
5. Gálvez, A.; Diez-Olivan, A.; Seneviratne, D.; Galar, D. Fault detection and RUL estimation for railway HVAC systems using a
hybrid model-based approach. Sustainability 2021, 13, 6828. [CrossRef]
6. Liu, Q.; Liang, T.; Dinavahi, V. Real-time hierarchical neural network based fault detection and isolation for high-speed railway
system under hybrid AC/DC grid. IEEE Trans. Power Deliv. 2020, 35, 2853–2864. [CrossRef]
7. Ghosh, C.; Verma, A.; Verma, P. Real time fault detection in railway tracks using Fast Fourier Transformation and Discrete
Wavelet Transformation. Int. J. Inf. Technol. 2022, 14, 31–40. [CrossRef]
8. Andrusca, M.; Adam, M.; Dragomir, A.; Lunca, E.; Seeram, R.; Postolache, O. Condition monitoring system and faults detection
for impedance bonds from railway infrastructure. Appl. Sci. 2020, 10, 6167. [CrossRef]
9. Siddiqui, H.U.R.; Saleem, A.A.; Raza, M.A.; Zafar, K.; Munir, K.; Dudley, S. IoT based railway track faults detection and
localization using acoustic analysis. IEEE Access 2022, 10, 106520–106533. [CrossRef]
10. Yu, Q.; Liu, A.; Yang, X.; Diao, W. An Improved Lightweight Deep Learning Model and Implementation for Track Fastener Defect
Detection with Unmanned Aerial Vehicles. Electronics 2024, 13, 1781. [CrossRef]
11. Tong, L.; Jia, L.; Geng, Y.; Liu, K.; Qin, Y.; Wang, Z. Anchor-adaptive railway track detection from unmanned aerial vehicle
images. Comput.-Aided Civ. Infrastruct. Eng. 2023, 38, 2666–2684. [CrossRef]
12. Moriello, R.S.L.; Caputo, E.; Gargiulo, F.; de Alteriis, G.; Donvito, A.; Bitonto, P.; Alfano, A. An Internet of Things-based Solution
for Monitoring Freight Train Carriages. In Proceedings of the 2024 IEEE International Workshop on Metrology for Industry 4.0 &
IoT (MetroInd4.0 & IoT), Firenze, Italy, 29–31 May 2024; IEEE: Piscataway, NJ, USA, 2024; pp. 94–99.
13. Oh, K.; Yoo, M.; Jin, N.; Ko, J.; Seo, J.; Joo, H.; Ko, M. A review of deep learning applications for railway safety. Appl. Sci. 2022,
12, 10572. [CrossRef]
14. Wang, T.; Yang, F.; Tsui, K.L. Real-time detection of railway track component via one-stage deep learning networks. Sensors 2020,
20, 4325. [CrossRef] [PubMed]
15. Shim, J.; Koo, J.; Park, Y.; Kim, J. Anomaly detection method in railway using signal processing and deep learning. Appl. Sci.
2022, 12, 12901. [CrossRef]
16. Kapoor, R.; Goel, R.; Sharma, A. An intelligent railway surveillance framework based on recognition of object and railway track
using deep learning. Multimed. Tools Appl. 2022, 81, 21083–21109. [CrossRef] [PubMed]
17. Thendral, R.; Ranjeeth, A. Computer vision system for railway track crack detection using deep learning neural network. In
Proceedings of the 2021 3rd International Conference on Signal Processing and Communication (ICPSC), Coimbatore, India,
13–14 May 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 193–196.
18. Wang, M.; Li, K.; Zhu, X.; Zhao, Y. Detection of surface defects on railway tracks based on deep learning. IEEE Access 2022,
10, 126451–126465. [CrossRef]
19. Feng, J.H.; Yuan, H.; Hu, Y.Q.; Lin, J.; Liu, S.W.; Luo, X. Research on deep learning method for rail surface defect detection. IET
Electr. Syst. Transp. 2020, 10, 436–442. [CrossRef]
20. He, D.; Zou, Z.; Chen, Y.; Liu, B.; Yao, X.; Shan, S. Obstacle detection of rail transit based on deep learning. Measurement 2021,
176, 109241. [CrossRef]
21. Sresakoolchai, J.; Kaewunruen, S. Detection and severity evaluation of combined rail defects using deep learning. Vibration 2021,
4, 341–356. [CrossRef]
22. Brintha, K.; Joseph Jawhar, S. FOD-YOLO NET: Fasteners fault and object detection in railway tracks using deep yolo network 1.
J. Intell. Fuzzy Syst. 2024, 46, 1–15. [CrossRef]
23. Adnan, A.; Hossain, S.; Shihab, R.; Ibne, S. Railway Track Fault Detection, Dataset2 (Fastener). Available online: [Link]
com/datasets/ashikadnan/railway-track-fault-detection-dataset2fastener (accessed on 4 November 2024).
24. Adnan, A.; Hossain, S.; Shihab, R.; Ibne, S. Railway Track Fault Detection. Available online: [Link]
salmaneunus/railway-track-fault-detection (accessed on 4 November 2024).
25. Ibne, S.; Hossain, S.; RIDWAN, A. Railway-Rail-Fault_Detection. Available online: [Link]
salmaneunus/newdatasetridwanrailjanuary/discussion?sort=undefined (accessed on 4 November 2024).
26. Eunus, S.I.; Hossain, S.; Ridwan, A.E.M.; Adnan, A.; Islam, M.S.; Karim, D.Z.; Alam, G.R.; Uddin, J. ECARRNet: An Efficient
LSTM-Based Ensembled Deep Neural Network Architecture for Railway Fault Detection. AI 2024, 5, 482–503. [CrossRef]
27. Team, K. Keras Documentation: Keras Applications—[Link]. Available online: [Link] (accessed
on 30 October 2024).
28. Gibert, V.; Patel, M.; Chellappa, R. Semantic Segmentation of Railway Track Images with Deep Convolutional Neural Networks.
Available online: [Link] (accessed on
1 November 2024).
29. Chandran, P.; Asber, J.; Thiery, F.; Odelius, J.; Rantatalo, M. An investigation of railway fastener detection using image processing
and augmented deep learning. Sustainability 2021, 13, 12051. [CrossRef]
30. Yilmazer, M.; Karakose, M. Fastener and rail surface defects detection with deep learning techniques. Int. J. Adv. Intell. Inform.
2024, 10, 253–264. [CrossRef]
31. Guo, F.; Qian, Y.; Shi, Y. Real-time railroad track components inspection based on the improved YOLOv4 framework. Autom.
Constr. 2021, 125, 103596. [CrossRef]
32. Min, Y.; Guo, J.; Yang, K. Research on real-time detection algorithm of rail-surface defects based on improved YOLOX. J. Appl.
Sci. Eng. 2022, 26, 799–810.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.
