Predictive Maintenance For Smart Industry
A Thesis Submitted to
the Graduate School
İzmir Institute of Technology
in Partial Fulfillment of the Requirements for the Degree of
MASTER OF SCIENCE
In Computer Engineering
by
Asad ASADZADE
December 2020
İzmir
ACKNOWLEDGEMENTS
First of all, I would like to give my special thanks to my supervisor Assoc. Prof.
Tolga AYAV for all kinds of support and guidance to make this study happen. Also, I
am very grateful to him for sharing his time and knowledge with me.
I would also like to thank my friends for not leaving me alone and for motivating me
whenever I needed it.
I am infinitely grateful to my family for their lifelong support, including during this
stressful phase.
ABSTRACT
After the Internet of Things (IoT) developed rapidly, it started to be used in many industrial
areas. Thanks to IoT, data reflecting the health of equipment and other important systems are
collected. When these data are processed correctly, they provide important information about the
production process. For example, they can be used to build machine learning systems that predict
when various components will fail. Maintenance can then be carried out before a component breaks
down, and replacement can be performed if necessary. This strategy, called predictive maintenance,
provides industries with advantages such as maximizing the lifetime of components, reducing extra
costs and saving time.
In this study, we applied the Adaptive Random Forests (ARF) method, which is based on stream
learning, to the Turbofan Engine Degradation Simulation Datasets provided by NASA to estimate the
remaining useful lifetime of jet engines. We also discuss the advantages of stream learning over
batch learning and compare our results with those of other batch-learning-based studies on the
same datasets.
ÖZET
To my family
TABLE OF CONTENTS
3.4. Experimental Results
REFERENCES
LIST OF FIGURES
Figure 2.1. Maintenance strategies and their roles (Source: [7])
Figure 2.2. Machine Learning techniques (Source: medium.com)
Figure 2.3. Investment in time and memory for batch and stream learning (Source: [9])
Figure 2.4. The stream learning algorithm cycle (Source: [10])
Figure 2.5. Distribution changes over time (Source: [11])
Figure 2.6. Comparison of HT and HAT on drifts (Source: [15])
Figure 2.7. Filter feature selection method (Source: [21])
Figure 2.8. Wrapper feature selection method (Source: [21])
Figure 2.9. Embedded feature selection method (Source: [21])
Figure 3.1. Simplified diagram of engine simulated in C-MAPSS (Source: [24])
Figure 3.2. The first and last five rows of train dataset (FD001)
Figure 3.3. (a,b,c,d,e,f) Six sensor readings until engine 1 fails
Figure 3.4. Performance evaluation of ARF-Reg on the FD001 dataset
Figure 3.5. Performance evaluation of ARF-Reg on the FD002 dataset
Figure 3.6. Performance evaluation of ARF-Reg on the FD003 dataset
Figure 3.7. Performance evaluation of ARF-Reg on the FD004 dataset
Figure 3.8. Predicted and actual values of RUL for FD001 dataset
Figure 3.9. Predicted and actual values of RUL for FD003 dataset
Figure 3.10. Performance evaluations of ARF-Reg and HAT methods
LIST OF TABLES
ABBREVIATIONS
CHAPTER 1
INTRODUCTION
Industry entered a period of rapid advancement, the 'Industry 4.0' era, with the emergence and
development of Internet of Things (IoT) technology. Today, sensors and other devices collect large
amounts of data in several important sectors, such as environmental science and industry. For
example, sensors integrated into factory equipment produce data about the equipment's working
conditions, such as heating and vibration. Correctly processing and analyzing these data helps in
making important decisions about the production process. In addition, the health of this equipment
and of other vital systems is continuously monitored and analyzed with data-driven methods [7].
Maintenance of systems and equipment is important because it directly affects the production
process. Unnecessary or untimely maintenance may waste time, incur extra costs and, most
importantly, sometimes endanger lives. For this reason, maintenance must be planned and carried out
only when it is necessary.
Recently, the Predictive Maintenance strategy has attracted more attention than other strategies
because it can address the difficulties above. Its goal is to predict when maintenance will be
needed, using data-driven predictive tools such as Machine Learning. Nowadays this strategy is used
in many industrial areas [1].
Since Machine Learning (ML), Deep Learning and Artificial Intelligence are popular and effective
approaches, they are used in many areas, and several experimental studies have applied ML and Deep
Learning to predictive maintenance. The methods used include Linear Regression, Support Vector
Machines, Random Forests, Decision Trees, K-Nearest Neighbors, Convolutional Neural Networks (CNN)
and Long Short-Term Memory (LSTM) networks. Carvalho et al. [7] carried out a literature review of
predictive maintenance with machine learning. It covered 36 batch-learning-based studies published
up to 2019, which used different datasets for either classification or regression tasks. Kulkarni
et al. [31] applied Random Forest as a classification method to detect problems in refrigerators at
any point in time and obtained 95% precision and 46% recall. In another study, Random Forest was
applied to a dataset containing various sensor measurements from a cutting machine [32]. Early
breakdowns of the machine were predicted as a classification task with 95% overall accuracy. Biswal
et al. [33] proposed an Artificial Neural Network to classify faulty and healthy wind turbine
components and obtained 92.6% classification accuracy. In another study [25], Long Short-Term Memory
was applied to an engine dataset to classify engine health into four classes (fail, repair, caution
and healthy), with 85% accuracy. Dong et al. [35] used a jet engine dataset containing a set of
sensor measurements. They applied LSTM-RNN and Naive Bayesian methods to a regression task:
estimating the remaining useful lifetime (RUL) of jet engines. In their study, LSTM-RNN outperformed
the Naive Bayesian method, with an RMSE of 17.84. Kanawaday et al. [34] used four supervised models
to forecast potential breakdowns and quality defects of a slitting machine: Naive Bayes, Support
Vector Machine, Classification And Regression Tree and Deep Neural Network. The Deep Neural Network
performed best, with 95.69% accuracy.
The following related studies used the Turbofan Engine Degradation Simulation Datasets, which we
also use in this study to predict the RUL of jet engines. In all of them, the performance of the
methods was measured with the Root Mean Squared Error (RMSE). Note that all these methods are based
on batch learning.
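These studies report RMSE, and the experiments in Chapter 3 also report MSE and MAE. The following minimal Python sketch shows how the three metrics relate when computed from actual and predicted RUL values. The function name `regression_errors` is our own and is not part of any library.

```python
import math


def regression_errors(y_true, y_pred):
    """Return (MSE, MAE, RMSE) for equal-length sequences of actual and predicted RUL."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / len(errors)   # mean squared error
    mae = sum(abs(e) for e in errors) / len(errors)  # mean absolute error
    return mse, mae, math.sqrt(mse)                  # RMSE is the square root of MSE


# Example: two engines with actual RUL 112 and 98 cycles, predicted as 120 and 103
mse, mae, rmse = regression_errors([112, 98], [120, 103])
print(mse, mae, round(rmse, 4))
```

Because RMSE squares the errors before averaging, it penalizes large errors more heavily than MAE does.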
Table 1.1. shows the RMSE values of the methods used in each study. In the first study, ten
different ML algorithms were applied to each of the four datasets. Random Forest performed better
than the other methods, and the best result was obtained on the first dataset (FD001) [1]. In the
second study, LSTM was applied only to FD001, yielding an RMSE of 11.42 [2]. In the third study, a
CNN was applied to all four datasets [3], and the best result, an RMSE of 18.45, was obtained on
FD001. The last study [4] used LSTM, as the second study did, but on all four datasets rather than
FD001 only. LSTM performed better on FD001 than on the other datasets.
Table 1.1. RMSE values of methods in related studies.
IoT sensors collect large amounts of data about equipment and other systems over time, and these
data are used to monitor and analyze the health of important systems. After a while, the sensor
data become big data, which is difficult for batch learning methods to process. Moreover, in
dynamic environments the data change over time. Batch models cannot keep up with new data, which
may cause the model to break down, so they must be retrained as the data change. Stream learning
methods, in contrast, process each new sample one at a time and update themselves, so they can
adapt to data that change over time.
The purpose of this thesis is to apply a Machine Learning method to estimate the Remaining Useful
Lifetime (RUL) of jet engines. We treat the problem as a regression problem and use the Turbofan
Engine Degradation Simulation Datasets provided by NASA. The other studies on these datasets are
based on batch learning. Unlike them, we use Adaptive Random Forests, an ML method adapted for
stream learning, and show that our results compare reasonably well with those of batch learning
methods.
1.2. Outline of Thesis
CHAPTER 2
THEORETICAL BACKGROUND
3. Predictive Maintenance (PdM): Using predictive tools such as ML techniques, this strategy
predicts when maintenance should take place based on historical data about the equipment or parts.
Accurately predicting failures before they occur eliminates problems such as wasted time, extra
costs and operational safety risks. Figure 2.1. shows the general maintenance strategies and their
aims. As can be seen, each strategy has its own role.
2.2. Machine Learning
• Data collection: this step defines how data are gathered for an ML model and how the most
valuable data for the model are selected.
• Data preprocessing: this step can be divided into sub-steps such as data transformation and
feature selection. In most cases, feature values in datasets have very different scales, which
can mislead the model during prediction. Transformation therefore rescales features from one
interval to another, which is very important for building a more accurate model. Feature selection
is also one of the most necessary steps for building a reliable prediction model. It aims to select
the features that are most relevant to the target.
• Model selection: in this step, the most appropriate ML model for accurate prediction in the
given application is selected.
ML techniques are generally divided into the following three types: ‘Supervised
Learning’, ‘Unsupervised Learning’ and ‘Reinforcement Learning’. Figure 2.2.
illustrates these ML techniques.
Figure 2.2. Machine Learning techniques (Source: medium.com).
models estimate which of thousands of faces a photo belongs to.
Regression tasks are also Supervised Learning tasks. The difference between Classification and
Regression lies in the predictions: they are numeric values in Regression and categorical values
in Classification [8]. There are many popular ML algorithms for Supervised Learning tasks, such as
Logistic Regression, K-Nearest Neighbors, Decision Trees, Support Vector Machine, Naive Bayes and
Random Forest.
2.3. Batch and Streaming Learning
cope with such changes. As the examples of dynamic environments above show, batch learning systems
cannot easily adapt to rapidly changing environments, which causes problems such as model
degradation. In batch learning systems, users generally train the model after sufficient data have
been collected, and two major problems arise. First, batch models cannot incorporate newly arriving
data, so all steps must be rerun as new data are collected. Second, one must decide whether the
model should be retrained on the new data only. In general, the amount of variation in the data is
taken into account before retraining the model on new data only. Stream learning models, in
contrast, update themselves with each new sample and can adapt to such changes [9].
Figure 2.3. shows the relationship between predictive performance and
investment in resources such as time and memory for both batch and stream learning.
Figure 2.3. Investment in time and memory for batch and stream learning (Source: [9]).
The investment required by batch learning grows as the data grow, whereas it stays constant in
stream learning. It is very important that stream learning models not only achieve high performance
but can also process an infinite incoming data stream. In stream learning, each incoming sample
should be processed as fast as possible, so that the model is ready for the next one.
2.4. Machine Learning for Streaming Data
$$(\vec{x}_1, y_1), (\vec{x}_2, y_2), \ldots, (\vec{x}_T, y_T)$$
where T → ∞, $\vec{x}_t$ is the feature vector and $y_t$ its target value. The values of $y_t$ are
continuous in regression tasks and categorical in classification tasks. The goal is to predict the
target value $y$ corresponding to each new incoming $\vec{x}$.
Unlike batch learning, which uses all data at once in the training process, stream learning
handles the data one sample at a time and updates the model with each new sample $(\vec{x}_t, y_t)$.
In addition, stream learning models must fulfill the following requirements, which batch learning
models do not need to meet [10].
1. Process only one sample at a time, and handle each sample only once.
2. Use a limited amount of memory. Because data streams can continue endlessly, it is not
practical to store them.
3. Work in a limited amount of time. Delays that could cause the algorithm to fail should be
avoided.
4. Be updated constantly and always be ready to predict.
The repeated cycle in Figure 2.4. shows how a stream learning algorithm is used and the steps in
which the requirements must be met. In the first step, the algorithm receives the next available
sample from the stream, fulfilling Requirement 1. In the second step, the algorithm processes the
received sample and must fulfill Requirements 2 and 3. In the last step, the algorithm makes a
prediction and must be ready to receive the next sample (Requirement 4). Stream learning algorithms
must fulfill all of these requirements in order to process data that arrive continuously.
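The following minimal Python sketch illustrates the cycle. The model is a hypothetical online linear regressor trained by stochastic gradient descent, chosen only because it is short. It is not ARF, and the names `OnlineLinearRegressor`, `predict_one`, `learn_one` and `synthetic_stream` are our own. The sketch assumes a predict-then-update order. Each sample is received once, used for a prediction by the current model, then used for a constant-time, constant-memory update and discarded.

```python
import random


class OnlineLinearRegressor:
    """Linear model y ~ w.x + b, updated with one SGD step per incoming sample."""

    def __init__(self, n_features, learning_rate=0.05):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = learning_rate

    def predict_one(self, x):
        return sum(wi * xi for wi, xi in zip(self.w, x)) + self.b

    def learn_one(self, x, y):
        # One gradient step on the squared error of this single sample
        error = self.predict_one(x) - y
        self.w = [wi - self.lr * error * xi for wi, xi in zip(self.w, x)]
        self.b -= self.lr * error


def synthetic_stream(n_samples, seed=0):
    """Hypothetical stream generating y = 2*x1 - 3*x2 + 1 plus small noise."""
    rng = random.Random(seed)
    for _ in range(n_samples):
        x = [rng.random(), rng.random()]
        yield x, 2 * x[0] - 3 * x[1] + 1 + rng.gauss(0, 0.01)


def run_cycle(stream, model):
    """Receive, predict, update; returns the absolute error on the last sample."""
    last_error = None
    for x, y in stream:                  # receive one sample, seen only once (Req. 1)
        y_hat = model.predict_one(x)     # the model is always ready to predict (Req. 4)
        last_error = abs(y - y_hat)
        model.learn_one(x, y)            # constant time and memory per sample (Req. 2 and 3)
    return last_error                    # no sample is stored after it is processed


model = OnlineLinearRegressor(n_features=2)
print(run_cycle(synthetic_stream(20000), model), model.w, model.b)
```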
Figure 2.4. The stream learning algorithm cycle (Source: [10]).
In ML, concept drift is a change over time in the relationship between features and targets.
Changes in p(y|X) are known as real concept drift. This type of drift refers to changes in p(y|X)
regardless of whether p(X) also changes. In stream learning, concept drift is generally defined as
follows:
$$\exists X : p_{t_0}(X, y) \neq p_{t_1}(X, y)$$
where $p_{t_0}$ and $p_{t_1}$ denote the joint distributions of the features X and targets y at time
points $t_0$ and $t_1$, respectively [11].
Over time, changes in data distributions i.e., concept drift, may occur in different
ways. Figure 2.5. illustrates these different types of changes.
• Sudden/Abrupt: the data switch suddenly from one concept to another.
• Incremental: unlike sudden/abrupt drift, there is no sudden transition between concepts, and
intermediate concepts arise during the transition.
• Gradual: during the transition, examples belonging to both concepts appear at random.
• Reoccurring: old concepts reappear at some point while the data stream continues.
• Outlier: outliers are not concept drift, because they are generally far from the mean of the
samples in the dataset and do not persist in the data stream.
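To make these drift types concrete, the following Python sketch generates a hypothetical sequence of concept indicators. A value of 0.0 stands for the old concept and 1.0 for the new one. Fractional values, which occur only for incremental drift, stand for intermediate concepts. The function name and the `start` and `width` parameters are illustrative assumptions, not taken from [11]. Outliers are deliberately not modeled, since they are not concept drift.

```python
import random


def concept_sequence(kind, n=3000, start=1000, width=500, seed=0):
    """Return, for each time step, which concept generates the sample (0.0 old, 1.0 new)."""
    rng = random.Random(seed)
    seq = []
    for t in range(n):
        # fraction of the transition completed at time t (0 before start, 1 after start + width)
        progress = min(max((t - start) / width, 0.0), 1.0)
        if kind == "sudden":
            c = 1.0 if t >= start else 0.0                  # instant switch at `start`
        elif kind == "incremental":
            c = progress                                    # intermediate concepts in between
        elif kind == "gradual":
            c = 1.0 if rng.random() < progress else 0.0     # samples from both concepts, mixed
        elif kind == "reoccurring":
            c = 1.0 if start <= t < start + width else 0.0  # old concept returns later
        else:
            raise ValueError(f"unknown drift kind: {kind}")
        seq.append(c)
    return seq


gradual = concept_sequence("gradual")
print(sum(gradual[1000:1500]) / 500)  # share of new-concept samples during the transition
```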
Since stream learning methods constantly update themselves, they can adapt to concept drift.
Nevertheless, several methods have been proposed in the literature to detect concept drift and
react to it quickly and efficiently. For example, the Hoeffding Tree (HT) [12] and the Hoeffding
Adaptive Tree (HAT) [13] are both decision trees designed for stream learning. The difference
between them is that HT has no special mechanism to detect concept drift, whereas HAT detects it
using ADaptive WINdowing (ADWIN) [14].
ADWIN is one of the most common drift detection methods. It keeps a window of the most recently
received samples and splits it into two sub-windows, W0 (older) and W1 (newer), to determine
whether the distribution has changed. The average values of the samples in W0 and W1 are compared.
If they differ significantly, a change signal is raised, meaning that drift has been detected. W0
is then discarded, W1 takes its place, and a new W1 window starts collecting incoming samples.
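The following minimal Python sketch illustrates this idea. It assumes input values scaled to [0, 1], for example a stream of 0/1 prediction errors. The sketch checks every split of the window into an older part W0 and a newer part W1 using a Hoeffding-style bound. Whenever a split differs significantly, it drops the oldest samples until no such split remains, which amounts to discarding W0 as described above. The published ADWIN algorithm uses a compressed bucket structure to do this efficiently, and this sketch omits it. The class name and the `delta` and `max_window` values are assumptions.

```python
import math
from collections import deque


class SimpleAdwin:
    """Simplified ADWIN-style change detector for values in [0, 1]."""

    def __init__(self, delta=0.002, max_window=1000):
        self.delta = delta                       # confidence parameter of the test
        self.window = deque(maxlen=max_window)   # most recent values, oldest on the left

    def update(self, value):
        """Add one value; return True if a change was detected."""
        self.window.append(value)
        detected = False
        while self._significant_split():
            self.window.popleft()                # discard the oldest sample of W0
            detected = True
        return detected

    def _significant_split(self):
        values = list(self.window)
        n = len(values)
        total = sum(values)
        delta_prime = self.delta / max(n, 1)
        head_sum = 0.0
        for i in range(1, n):                    # W0 = values[:i] (older), W1 = values[i:] (newer)
            head_sum += values[i - 1]
            n0, n1 = i, n - i
            mean0, mean1 = head_sum / n0, (total - head_sum) / n1
            m = 1.0 / (1.0 / n0 + 1.0 / n1)     # combined sub-window size used in the bound
            eps_cut = math.sqrt(math.log(4.0 / delta_prime) / (2.0 * m))
            if abs(mean0 - mean1) > eps_cut:
                return True
        return False


detector = SimpleAdwin()
stream = [0.0] * 200 + [1.0] * 100               # abrupt change at index 200
first_detection = next(i for i, v in enumerate(stream) if detector.update(v))
print(first_detection)
```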
If concept drift is not handled correctly, it can seriously affect the performance of the
predictive model. Stream learning models can adapt to concept drift because they update themselves
constantly. Batch learning models cannot cope with drift when it occurs, because they were trained
once on a fixed dataset (the training data).
In the following example, a stream classification task is handled with two classification models,
HoeffdingTreeClassifier (HT) and HoeffdingAdaptiveTreeClassifier (HAT) [15]. HAT is the adaptive
version of HT: unlike HT, it can detect drift (using ADWIN). The data used in this example contain
three gradual drifts, at 5000, 10000 and 15000 samples. Figure 2.6. illustrates the performance of
HT and HAT and their reactions to the concept drifts in this dataset.
As seen in Figure 2.6., both models perform almost the same at the beginning. After 5000 and 15000
samples, drift occurs and the performance of both models decreases, but HAT recovers faster. We can
also see that after 1000 samples HT performs better than HAT, but only for a short time. Overall,
HAT reacts more effectively to drift and performs better than HT.
passes over the input data are required to build the bootstrap samples for the trees that are
combined in a Random Forest. However, multiple passes over the input data are not practicable in
stream learning. Therefore, two requirements must be fulfilled before random forests can be applied
to data streams. First, an online bagging process must be used. Second, each leaf split must
consider only a limited set of features. To achieve the second requirement, the base trees must be
modified so that the feature set is effectively restricted to a random subset of m features for
each split (m < M, where M is the total number of features). To address the first requirement, a
bagging method called 'leveraging bagging' was proposed [18].
Adaptive Random Forests (ARF) is an adaptive version of Random Forests for stream learning. In
general, stream learning models can adapt to concept drift because they update themselves
continuously. However, some stream learning methods also use explicit drift detection methods.
For example, ARF uses the ADWIN drift detection method by default. In addition, ARF uses Hoeffding
trees [19] as base learners.
Hyper-parameter tuning, which aims to find the optimal parameter values for a model, is also very
important for ML models to reach high accuracy. The literature offers several hyper-parameter
tuning methods for batch learning models, but such tuning is very difficult to apply to stream
learning models. The naive solution is to separate the first incoming samples from the stream and
find optimal parameter values on this subset, but it is debatable how effective this approach is
[20]. Another advantage of ARF is that it achieves highly reliable results without further
parameter tuning. ARF has only a few main parameters: m (feature subset size), n (number of trees)
and a parameter that controls the drift detection method [18].
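The two modifications described above can be sketched in a few lines of Python with NumPy. Online bagging replaces bootstrap resampling with a Poisson-distributed weight, drawn for every tree and every incoming sample. ARF uses λ = 6, following leveraging bagging. In addition, each split considers only a random subset of m of the M features. This illustrates the sampling steps only and is not the ARF implementation. The function names and the choice M = 10, m = 3 are assumptions.

```python
import numpy as np


def online_bagging_weights(n_trees, lam=6.0, rng=None):
    """Draw, for one incoming sample, how many times each tree should learn from it.

    A weight of 0 means the tree skips the sample; this replaces bootstrap
    resampling, which would require multiple passes over the data.
    """
    rng = np.random.default_rng() if rng is None else rng
    return rng.poisson(lam=lam, size=n_trees)


def random_feature_subset(n_features_total, m, rng=None):
    """Pick m of the M feature indices (m < M) that a leaf may consider when splitting."""
    if not 0 < m < n_features_total:
        raise ValueError("expected 0 < m < M")
    rng = np.random.default_rng() if rng is None else rng
    return np.sort(rng.choice(n_features_total, size=m, replace=False))


rng = np.random.default_rng(42)
weights = online_bagging_weights(n_trees=10, rng=rng)     # one weight per tree
subset = random_feature_subset(n_features_total=10, m=3, rng=rng)
print(weights, subset)
```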
The processes above are also required for stream learning. Unlike batch learning, however, stream
learning requires these steps to be performed continuously during data streaming, which makes them
very complicated. For example, rescaling a feature to a certain range in streaming data is a complex
process. Statistics of the data such as the maximum and minimum are not known a priori, because new
data keep arriving. Due to the lack of studies on online preprocessing of streaming data,
preprocessing is therefore handled offline in most stream learning tasks [20].
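The difficulty can be seen in a minimal Python sketch of an online Min-Max scaler. This is a hypothetical helper that updates a feature's minimum and maximum as samples arrive. Because these statistics keep changing, the same raw value can be mapped to different scaled values at different times. Values outside the range seen so far also fall outside [0, 1].

```python
class OnlineMinMaxScaler:
    """Min-Max scaling for one feature, with min and max updated from the stream."""

    def __init__(self):
        self.min = None
        self.max = None

    def learn_one(self, x):
        self.min = x if self.min is None else min(self.min, x)
        self.max = x if self.max is None else max(self.max, x)

    def transform_one(self, x):
        if self.min is None or self.max == self.min:
            return 0.0                      # range still undefined: map to 0 by convention
        return (x - self.min) / (self.max - self.min)


scaler = OnlineMinMaxScaler()
for value in (5.0, 10.0):
    scaler.learn_one(value)
before = scaler.transform_one(7.5)          # range so far is [5, 10]
scaler.learn_one(0.0)                       # a new minimum arrives
after = scaler.transform_one(7.5)           # range is now [0, 10]
print(before, after)
```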
Wrapper: in this method, a feature subset is first selected to train the model. New features are
then added to the subset, or previously selected ones are removed, until an appropriate subset
containing the most relevant features is found. The disadvantage of this method is that it requires
a lot of computation. Figure 2.8. illustrates the Wrapper feature selection method.
Embedded: in this method, the search for the most relevant feature subset happens during the
learning process. The purpose is to find the features that are most effective for model accuracy
while the model is being built. Figure 2.9. depicts the Embedded feature selection method.
Filter methods are less prone to overfitting than wrapper and embedded methods, and they are also
faster than both [22].
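The following minimal Python sketch shows why wrapper methods are expensive. It implements greedy forward selection, one common wrapper strategy, which only adds features and never removes them. Every candidate subset requires fitting and scoring a model. Here the model is ordinary least squares, scored by RMSE on a validation split. The model, the scoring and all names are assumptions chosen for brevity.

```python
import numpy as np


def validation_rmse(cols, X_tr, y_tr, X_va, y_va):
    """Fit least squares (with intercept) on the chosen columns, return validation RMSE."""
    A_tr = np.column_stack([X_tr[:, cols], np.ones(len(X_tr))])
    A_va = np.column_stack([X_va[:, cols], np.ones(len(X_va))])
    coef, *_ = np.linalg.lstsq(A_tr, y_tr, rcond=None)
    return float(np.sqrt(np.mean((A_va @ coef - y_va) ** 2)))


def forward_selection(X_tr, y_tr, X_va, y_va, max_features):
    """Greedily add the feature that most improves validation RMSE; stop when none does."""
    selected, remaining, best = [], list(range(X_tr.shape[1])), np.inf
    while remaining and len(selected) < max_features:
        # one model fit per remaining candidate: this is the wrapper's computational cost
        scores = {j: validation_rmse(selected + [j], X_tr, y_tr, X_va, y_va) for j in remaining}
        j_best = min(scores, key=scores.get)
        if scores[j_best] >= best:
            break
        best = scores[j_best]
        selected.append(j_best)
        remaining.remove(j_best)
    return selected, best


rng = np.random.default_rng(0)
X = rng.normal(size=(400, 5))
y = 3 * X[:, 1] - 2 * X[:, 3] + rng.normal(scale=0.1, size=400)
chosen, score = forward_selection(X[:300], y[:300], X[300:], y[300:], max_features=3)
print(chosen, round(score, 3))
```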
2.4.6. Evaluation Techniques
Due to the infinite nature of streaming data, evaluation methods such as train/test splits and
cross-validation, which are used for models built on batch data, are not suitable for models built
on streaming data. Instead, the following performance evaluation methods have been suggested for
predictive models on streaming data [23]:
Holdout or Periodic Holdout: in this method, a predefined amount of data from the stream is set
aside to test the model. These test samples are renewed after they have been used or after a
certain period. Thus, only the predefined samples, not all samples in the stream, are used to test
the model.
Prequential Evaluation (PE): in this method, each sample in the stream is first used to test the
model and then used to train (update) it. Thus, the model is always tested on samples it has not
seen before, and no separate predefined test data are needed: all data are used to test-then-train
the model. Because the model is updated continuously, its predictions can become more accurate over
time.
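Prequential evaluation can be sketched as a test-then-train loop. The following Python sketch uses a hypothetical baseline that predicts the running mean of the targets seen so far. It reports two values: the average error over all samples seen so far and the error over a window of recent samples. These are similar in spirit to the "mean" and "current" values reported in Chapter 3. The window size and all names are assumptions.

```python
from collections import deque


class RunningMeanRegressor:
    """Baseline model: predicts the mean of all targets learned so far (0.0 initially)."""

    def __init__(self):
        self.count, self.mean = 0, 0.0

    def predict_one(self, x):
        return self.mean

    def learn_one(self, x, y):
        self.count += 1
        self.mean += (y - self.mean) / self.count    # incremental mean, O(1) memory


def prequential_mse(stream, model, window=200):
    """Test-then-train over the stream; returns (mean MSE, MSE on the last `window` samples)."""
    total, n = 0.0, 0
    recent = deque(maxlen=window)
    for x, y in stream:
        sq_error = (y - model.predict_one(x)) ** 2   # 1) test on a sample the model has not seen
        total += sq_error
        n += 1
        recent.append(sq_error)
        model.learn_one(x, y)                        # 2) then train on the same sample
    if n == 0:
        raise ValueError("empty stream")
    return total / n, sum(recent) / len(recent)


stream = [([0.0], 5.0)] * 10                         # ten samples, constant target 5.0
print(prequential_mse(stream, RunningMeanRegressor(), window=5))
```

Only the first prediction is wrong here (the model starts at 0.0), so the mean error stays positive while the windowed error drops to zero.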
CHAPTER 3
The experiments in this study were carried out in Jupyter Notebook using the Python programming
language. We used the numpy, pandas, seaborn and scikit-learn libraries for ML and the
scikit-multiflow library for stream learning.
Python is a general-purpose programming language, like C++, JavaScript, C# and Java. In recent
years, however, its advantages over other languages have further fueled interest in it. It is easy
and flexible, it is more standard than its competitors in the field of Data Science, it works on
almost all operating systems, it has a large number of libraries, and it is free.
Jupyter Notebook is a web-based technology that allows us to write code in independent cells.
Moreover, Jupyter Notebook makes it easy to carry out processes such as data cleaning, statistical
modeling, data visualization and ML.
NumPy (Numerical Python) is the core library for numerical computing in Python. It makes
mathematical operations on multidimensional arrays and matrices easy.
Pandas is one of the most popular Python libraries for data analysis. Its backend source code is
written in C and Python, providing highly optimized performance.
Seaborn is a widely used data visualization library based on matplotlib. It makes it easy to draw
high-level, informative and easily interpretable graphics.
Scikit-learn is an open-source Python library that provides a single framework implementing a
variety of ML, preprocessing, cross-validation and visualization algorithms.
Scikit-multiflow is an ML library written in Python for multi-output learning and stream data. Note
that scikit-multiflow differs from scikit-learn, which is designed for batch learning.
3.2. Dataset Description
There are four datasets, which differ from each other in their operational settings. Each one has
training, test and RUL sets in the repository. In this study, all four datasets (FD001, FD002,
FD003 and FD004) are used in the experiments. Each dataset contains the ID of each engine, 21 sensor
measurements, three settings describing the operating conditions of the engines, and the working
time in cycles. Table 3.1. describes the dataset in more detail [25].
Table 3.1. Description of the dataset (Source: [25]).
ID ID of Engine
Setting1 Altitude
Engines run normally at the beginning of each time series, but after some cycles they start to
malfunction and then fail. Each of the four datasets contains train, test and ground-truth RUL
subsets.
Figure 3.2. The first and last five rows of train dataset (FD001)
Figure 3.3. (a,b,c,d,e,f) shows six sensor readings as line plots for engine 1 in the FD001
dataset. Each plot covers the period until the engine fails. The readings of sensor1 and sensor16
do not change from beginning to end. In contrast, the values of sensor4 and sensor11 increase and
those of sensor7 and sensor14 decrease, reflecting the degradation of the engine.
3.3. Data Preprocessing
As noted above, each of FD001, FD002, FD003 and FD004 consists of separate train and test sets. In
this study, we use the whole dataset by combining the train and test sets of each. Table 3.2. shows
the number of samples in each set and their sum.
The training and test sets consist of ID, cycle, operational setting and sensor columns. To use
these data with streaming models, we first need to calculate the true RUL value at each cycle and
add these values to the dataset as a column. RUL is the remaining number of cycles until the engine
fails:
$$\mathrm{RUL} = \text{last\_cycle} - \text{cycle}$$
where last_cycle is the last cycle of the engine, taken from the Cycle feature in the dataset, and
cycle is the current cycle. After adding the RUL values, we drop the ID feature. It only identifies
the engine, so it is not informative for prediction and does not affect RUL.
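The following minimal pandas sketch illustrates this labeling step. It assumes that the engine identifier and cycle columns are named `ID` and `Cycle`, based on Table 3.1 and the text above. It should be applied to each train or test file separately, before the sets are combined, because engine IDs restart in every file. Test trajectories end before failure. For them, the optional `final_rul` mapping (a hypothetical parameter) adds the ground-truth RUL given for the last observed cycle.

```python
import pandas as pd


def add_rul(df, final_rul=None):
    """Add an RUL column (remaining cycles until failure) and drop the ID column.

    RUL = last_cycle - cycle, where last_cycle is the largest Cycle of the same engine.
    final_rul: optional dict {engine ID: ground-truth RUL at its last observed cycle}.
    """
    out = df.copy()
    last_cycle = out.groupby("ID")["Cycle"].transform("max")
    out["RUL"] = last_cycle - out["Cycle"]
    if final_rul is not None:
        out["RUL"] = out["RUL"] + out["ID"].map(final_rul)
    return out.drop(columns="ID")   # ID only identifies the engine


train = pd.DataFrame({"ID": [1, 1, 1, 2, 2], "Cycle": [1, 2, 3, 1, 2]})
print(add_rul(train)["RUL"].tolist())
```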
3.3.1. Data Transformation
In most cases, feature values in datasets have very different scales, which can mislead the model
during prediction. Transformation is therefore very important for building a more accurate model.
In this study, we used the Min-Max transformation to rescale features to the [0, 1] range. The
Min-Max transformation rescales features from one interval to another using the following
equation:
$$x'_i = \frac{x_i - \min(x)}{\max(x) - \min(x)} \qquad (4)$$
where $x_i$ is an input value, $\min(x)$ and $\max(x)$ are the minimum and maximum values of the
feature, and $x'_i$ is the transformed value corresponding to $x_i$.
$$\rho_{X,Y} = \frac{\mathrm{cov}(X, Y)}{\sigma_X \sigma_Y} \qquad (5)$$
other words, we tried multiple combinations to determine the optimal number of sensors for each
dataset and obtained the best results in the case described above.
FD001 s2, s3, s4, s7, s11, s12, s15, s17, s20, s21
FD002 s3, s4, s9, s11, s14, s15, s16, s17, s20, s21
FD003 s2, s3, s4, s7, s8, s9, s10, s11, s12, s13, s14, s17
FD004 s2, s3, s4, s9, s10, s11, s14, s15, s16, s17
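As an illustration of a correlation-based filter, the following Python sketch computes ρ from equation (5) between each feature and the target. It keeps the features whose absolute correlation reaches a threshold. The threshold value, the feature names and the toy data are assumptions. As described above, the final sensor subsets in this study were chosen by trying multiple combinations. Constant sensors, such as sensor1 and sensor16 in Figure 3.3., have zero variance, and the sketch assigns them a correlation of 0.

```python
import numpy as np


def pearson(x, y):
    """rho = cov(X, Y) / (sigma_X * sigma_Y); returns 0.0 if either input is constant."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc, yc = x - x.mean(), y - y.mean()
    denom = np.sqrt((xc ** 2).sum() * (yc ** 2).sum())
    return 0.0 if denom == 0 else float((xc * yc).sum() / denom)


def correlation_filter(features, target, threshold=0.5):
    """Keep the feature names whose |rho| with the target is at least `threshold`."""
    scores = {name: pearson(values, target) for name, values in features.items()}
    kept = [name for name, rho in scores.items() if abs(rho) >= threshold]
    return kept, scores


rul = [10, 8, 6, 4, 2]                          # toy target values
features = {
    "s1": [1, 1, 1, 1, 1],                      # constant sensor
    "s2": [1, 2, 3, 4, 5],                      # rises as RUL falls
    "s3": [2, 1, 2, 1, 2],                      # unrelated pattern
}
kept, scores = correlation_filter(features, rul)
print(kept, scores)
```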
• pretrain_size is the number of samples used to train the model before any prediction takes
place.
• show_plot displays the “current” and “mean” values of the evaluation metrics as a dynamic plot.
Parameter       Value
max_samples     33727
pretrain_size   200
show_plot       True
n_wait          1
metrics         ['mean_square_error', 'mean_absolute_error',
                'running_time', 'model_size', 'true_vs_predicted']
output_file     'filename.csv'
The dynamic plot shows the performance of the model on the data from the stream. Performance is
evaluated in two ways, Mean and Current, at various points on the plot. Mean is the average
performance on the data seen so far, and Current is the performance on the most recent data. We also
measured resource usage in terms of time and memory. Figure 3.4. shows the performance evaluation
of the ARF-Reg model on the FD001 dataset.
ARF-Reg achieves an MSE of 308.2660 and an MAE of 8.9876. We also calculated the RMSE of ARF-Reg,
which equals 17.5517. Note that running_time and model_size are shown only with their current values
on the dynamic plot. In total, training on the FD001 samples took 1865.33 seconds and testing took
161.14 seconds.
Table 3.5. shows the PE parameter settings for the FD002 dataset in detail.
Table 3.5. Parameter values of Prequential Evaluation for FD002 dataset.
Parameter       Value
max_samples     87750
pretrain_size   1000
show_plot       True
n_wait          1
metrics         ['mean_square_error', 'mean_absolute_error',
                'running_time', 'model_size', 'true_vs_predicted']
output_file     'filename1.csv'
We used the same ARF-Reg parameter settings as mentioned at the beginning of this section.
Figure 3.5. illustrates the performance evaluation of the ARF-Reg model with several metrics.
On the FD002 dataset, ARF-Reg achieves an MSE of 316.2103 and an MAE of 8.2481, and the additionally
calculated RMSE is 17.7823. The total training and testing times were 5387.36 seconds and 436.35
seconds, respectively. For the FD003 dataset, the PE parameters are set as in Table 3.5., except
that max_samples is changed to 41315, the number of samples in FD003. We then evaluate the
performance of ARF-Reg on this dataset.
According to Figure 3.6., MSE is 1648.4953, MAE is 23.2139 and the calculated RMSE is 40.6016. The
total training and testing times were 3436.48 seconds and 341.85 seconds, respectively.
The following results come from the performance evaluation of ARF-Reg on the FD004 dataset. The
same PE parameter values are used, again changing max_samples to 102463, the number of samples in
the FD004 dataset. Figure 3.7. depicts detailed information about the performance of ARF-Reg.
Figure 3.7. Performance evaluation of ARF-Reg on the FD004 dataset.
As we can see on the right side of Figure 3.7, the MSE and MAE values are 625.4140 and
9.6218, respectively, and the calculated RMSE value is 25.0082. The total training and test
times of the samples were 6257.61 seconds and 518.58 seconds, respectively. In all datasets,
each sample was trained in approximately 0.04 to 0.08 seconds and tested in approximately
0.004 to 0.008 seconds.
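As a quick consistency check, the RMSE values above follow from the reported MSE values via RMSE = sqrt(MSE), and the per-sample times can be approximated by dividing the total times by max_samples. The sketch below, in plain Python, recomputes both from the figures reported for FD002 to FD004 (FD001's sample count is not restated in this section, so it is left out). The per-sample times are rough estimates, because in prequential evaluation the pretraining samples are trained on but not tested. The recomputed RMSE values agree with the reported ones to within 1e-4, and FD003 comes out at roughly 0.083 seconds of training per sample, at the upper end of the stated range.

```python
import math

# Reported results for FD002-FD004, copied from the text above:
# MSE, RMSE, total train time (s), total test time (s), max_samples.
REPORTED = {
    "FD002": {"mse": 316.2103, "rmse": 17.7823, "train_s": 5387.36, "test_s": 436.35, "n": 87750},
    "FD003": {"mse": 1648.4953, "rmse": 40.6016, "train_s": 3436.48, "test_s": 341.85, "n": 41315},
    "FD004": {"mse": 625.4140, "rmse": 25.0082, "train_s": 6257.61, "test_s": 518.58, "n": 102463},
}


def summarize(results):
    """Derive RMSE and rough per-sample times from the reported totals."""
    summary = {}
    for name, r in results.items():
        summary[name] = {
            # RMSE is the square root of MSE.
            "rmse": math.sqrt(r["mse"]),
            # Approximation: divides by max_samples, ignoring the pretraining split.
            "train_per_sample": r["train_s"] / r["n"],
            "test_per_sample": r["test_s"] / r["n"],
        }
    return summary


if __name__ == "__main__":
    for name, s in summarize(REPORTED).items():
        print(f"{name}: RMSE={s['rmse']:.4f}, "
              f"train/sample={s['train_per_sample']:.4f} s, "
              f"test/sample={s['test_per_sample']:.4f} s")
```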
Finally, we summarize the performance evaluation of the ARF-Reg model for the four
datasets we used in a single table. Table 3.6 lists the MSE, MAE, and RMSE values of
ARF-Reg for each of the four datasets.
Table 3.6. MSE, MAE and RMSE values of ARF-Reg for each of the four datasets.
It is important to note that the above results are based on the whole dataset. That is,
we combined the train and test sets and handled them together, because every sample was
used first for testing and then for training. However, we also calculated the RMSE values
of ARF-Reg on the test sets alone for comparison with the related studies mentioned in
Chapter 1. Table 3.7 depicts the RMSE values of ARF-Reg on the test sets.
Table 3.7. RMSE values of ARF-Reg on the test sets.
Dataset  RMSE
FD001    18.1560
FD002    18.4601
FD003    48.4478
FD004    27.9401
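Because the stream combines the train and test sets, the test-set RMSE has to be computed only over the samples that originally belonged to the test set. Below is a minimal sketch in plain Python, assuming each sample of the combined stream carries a boolean flag marking its origin (how the test samples were identified is not stated in this section). The function names and the toy values in the usage block are hypothetical.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error of two equal-length, non-empty sequences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def rmse_on_test_part(y_true, y_pred, is_test):
    """RMSE restricted to samples that came from the original test set.

    y_true, y_pred: RUL values and prequential predictions for the whole
    combined (train + test) stream; is_test: one boolean flag per sample.
    """
    pairs = [(t, p) for t, p, flag in zip(y_true, y_pred, is_test) if flag]
    return rmse([t for t, _ in pairs], [p for _, p in pairs])


if __name__ == "__main__":
    # Hypothetical toy values: two train-origin samples, then two test-origin samples.
    y_true = [10.0, 8.0, 6.0, 4.0]
    y_pred = [11.0, 8.0, 5.0, 6.0]
    is_test = [False, False, True, True]
    print("whole stream:", rmse(y_true, y_pred))
    print("test part only:", rmse_on_test_part(y_true, y_pred, is_test))
```

The two numbers differ because the test-only figure ignores errors made on train-origin samples, which is why the values in Table 3.7 are not directly comparable with those in Table 3.6.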
Figure 3.8. Predicted and actual values of RUL for FD001 dataset.
Based on the test set of the FD001 dataset, Figure 3.8 compares the actual RUL values with
the values predicted by ARF-Reg. The x axis and y axis of Figure 3.8 show the ID number and
the RUL cycles of each engine, respectively. For instance, according to the actual value,
engine 1 will fail after 112 cycles, while our model predicted 120 cycles. Similarly,
according to Figure 3.8, engine 2 will fail after 98 cycles, and the prediction of ARF-Reg
is 103 cycles.
Figure 3.9 likewise compares the actual RUL values with our ARF-Reg predictions for the
FD003 dataset. Considering the differences between the predicted and actual values in this
figure and in Figure 3.8, we can easily see that ARF-Reg performs better on the FD001
dataset than on the FD003 dataset.
Figure 3.9. Predicted and actual values of RUL for FD003 dataset.
As seen in Figure 3.9, while the actual RUL of the first engine is 44 cycles, our ARF-Reg
model estimated it as 48. In another case, the actual RUL of engine 96 is 113 cycles, but
our model predicts 123 cycles.
scikit-multiflow also offers further advantages, one of which is the ability to evaluate
multiple models at the same time. As an example, we compared the performance of ARF-Reg,
the model used in our study, with that of the HAT model. Figure 3.10 illustrates the
performance evaluations of both methods. The MSE and MAE values are shown on the right side
of the dynamic plot, and the time (training and testing) and memory values for each method
are shown at the bottom of the plot.
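The test-then-train loop behind such a side-by-side comparison can be sketched without the library. In the sketch below, two deliberately simple hypothetical regressors stand in for ARF-Reg and HAT. They are not those algorithms. Every model sees each sample in the same order, is tested on it first, and only then learns from it, while separate squared-error and absolute-error totals are kept for each model. In scikit-multiflow itself, we believe the equivalent is passing a list of models (with `model_names`) to `EvaluatePrequential.evaluate`.

```python
class RunningMeanRegressor:
    """Hypothetical stand-in model: predicts the mean of all targets seen so far."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0

    def predict_one(self, x):
        return self.mean

    def learn_one(self, x, y):
        self.n += 1
        self.mean += (y - self.mean) / self.n  # incremental mean update


class LastValueRegressor:
    """Hypothetical stand-in model: predicts the most recently seen target."""

    def __init__(self):
        self.last = 0.0

    def predict_one(self, x):
        return self.last

    def learn_one(self, x, y):
        self.last = y


def prequential_multi(stream, models, pretrain_size=0):
    """Evaluate several models on one stream with test-then-train.

    stream: iterable of (x, y) pairs; models: dict name -> model.
    The first `pretrain_size` samples are only used for training.
    Returns dict name -> {"mse": ..., "mae": ..., "n": tested samples}.
    """
    stats = {name: {"se": 0.0, "ae": 0.0, "n": 0} for name in models}
    for i, (x, y) in enumerate(stream):
        for name, model in models.items():
            if i >= pretrain_size:
                err = y - model.predict_one(x)  # test first ...
                stats[name]["se"] += err * err
                stats[name]["ae"] += abs(err)
                stats[name]["n"] += 1
            model.learn_one(x, y)  # ... then train on the same sample
    return {
        name: {"mse": s["se"] / s["n"], "mae": s["ae"] / s["n"], "n": s["n"]}
        for name, s in stats.items() if s["n"] > 0
    }
```

For example, `prequential_multi(stream, {"mean": RunningMeanRegressor(), "last": LastValueRegressor()}, pretrain_size=1000)` would use the same pretraining size as Table 3.5.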
Figure 3.10. Performance evaluations of ARF-Reg and HAT methods.
CHAPTER 4
4.1. Conclusion
In this study, a regression approach based on ARF-Reg, one of the most widely used stream
learning methods, is proposed to predict the Remaining Useful Lifetime of engines. Four
Turbofan Engine Degradation Simulation Datasets taken from the NASA data repository were
used in this study. The main objective of the study is to apply a stream learning method
(Adaptive Random Forests for Regression) to these datasets and to discuss the advantages of
stream learning over conventional batch learning.
First, in each of the four datasets, we combined the train and test sets to handle them
together because of the nature of stream learning, and then a data preparation step was
applied to make the data suitable for the ML model. Data preprocessing operations such as
feature selection and data transformation were handled offline, because there is currently
no established solution for such operations in stream learning. The Min-Max transformation
method was applied to rescale the features into the [0, 1] range, and then Pearson's
Correlation Coefficient was used to select the most relevant features.
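The two offline steps above can be sketched in plain Python as follows. This is a minimal illustration assuming each feature is stored as a list of values keyed by sensor name. The helper names and the correlation cut-off `threshold=0.5` are hypothetical, since the selection criterion is not stated in this section.

```python
import math


def min_max_scale(column):
    """Rescale a list of numbers into [0, 1]; constant columns map to 0.0."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def select_features(features, target, threshold=0.5):
    """Min-Max scale each feature, then keep those with |r| >= threshold.

    features: dict name -> list of values; target: list of RUL values.
    threshold is a hypothetical cut-off, not a value taken from the thesis.
    """
    scaled = {name: min_max_scale(col) for name, col in features.items()}
    scores = {name: pearson(col, target) for name, col in scaled.items()}
    kept = [name for name, r in scores.items() if abs(r) >= threshold]
    return scaled, scores, kept
```

Because Pearson's coefficient is unchanged by a positive linear rescaling, applying Min-Max scaling before or after computing the correlations selects the same features. Constant sensors receive a score of 0.0 and are therefore always dropped.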
The Prequential Evaluation method, in which each sample is used first to test and then to
train, was applied to evaluate the performance of ARF-Reg on each of the four datasets. We
evaluated the performance of ARF-Reg using Mean Squared Error and Mean Absolute Error, which
are metrics reported by Prequential Evaluation. We also calculated the Root Mean Squared
Error values of ARF-Reg for each dataset.
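For reference, the three metrics can be written for the prequential setting as follows. The notation is ours: $f_{i-1}$ is the model after it has learned from the first $i-1$ samples, $\hat{y}_i$ is its prediction for sample $i$ made before it learns from $(x_i, y_i)$, and $n$ is the number of tested samples.

```latex
\hat{y}_i = f_{i-1}(x_i), \qquad f_i = \mathrm{learn}\left(f_{i-1}, (x_i, y_i)\right)

\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2, \qquad
\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n} \left|y_i - \hat{y}_i\right|, \qquad
\mathrm{RMSE} = \sqrt{\mathrm{MSE}}
```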
As mentioned previously, the performance evaluation results were obtained from the whole
dataset because the train and test sets were combined. In addition, we calculated the Root
Mean Squared Error values of ARF-Reg on the test set only, in order to compare with other
batch learning based studies in which model performance was evaluated on the test set. The
results indicated that the overall performance of ARF-Reg was reasonable compared with
other studies that used the same datasets. Moreover, stream learning is a more effective
approach than batch learning in dynamic environments, because stream learning models update
themselves as new data arrive and can cope with concept drift. In addition, in systems based
on batch learning, the required resources such as memory and training time grow as the data
increase over time. In stream learning systems, however, resources are managed more
effectively, which makes stream learning an advantageous approach for big data applications.
There are various data preprocessing methods for batch learning in the literature, but the
same cannot be said for stream learning. Due to the nature of stream learning, the data
preprocessing steps must be performed online and systematically, that is, in a way that can
be applied to streaming data. In addition, hyperparameter tuning of streaming models must be
adapted to streaming data. There are currently no established solutions to these issues in
the literature, so it is very difficult to apply stream learning methods efficiently to
real-world applications.
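To illustrate why online preprocessing is harder than its offline counterpart, here is a minimal sketch of a Min-Max scaler that updates its bounds one sample at a time, assuming samples arrive as dictionaries of feature values. The class and method names are hypothetical, and this is one simple option, not an established solution. Because the bounds keep widening, the same raw value can be mapped to different scaled values at different times, which is exactly the kind of inconsistency offline scaling avoids.

```python
class OnlineMinMaxScaler:
    """Hypothetical Min-Max scaler whose bounds are updated per sample.

    Only the running minimum and maximum of each feature are stored,
    so memory does not grow with the length of the stream.
    """

    def __init__(self):
        self.mins = {}
        self.maxs = {}

    def learn_one(self, x):
        """Update the running bounds with one sample (dict feature -> value)."""
        for name, value in x.items():
            self.mins[name] = min(value, self.mins.get(name, value))
            self.maxs[name] = max(value, self.maxs.get(name, value))

    def transform_one(self, x):
        """Scale one sample into [0, 1] using the bounds seen so far.

        Values outside the current bounds (possible if transform_one is
        called before learn_one) are clipped to [0, 1].
        """
        out = {}
        for name, value in x.items():
            lo = self.mins.get(name, value)
            hi = self.maxs.get(name, value)
            scaled = 0.0 if hi == lo else (value - lo) / (hi - lo)
            out[name] = min(max(scaled, 0.0), 1.0)
        return out


if __name__ == "__main__":
    scaler = OnlineMinMaxScaler()
    for x in [{"s": 10.0}, {"s": 20.0}, {"s": 15.0}, {"s": 30.0}]:
        scaler.learn_one(x)
        print(x["s"], "->", scaler.transform_one(x)["s"])
```

In this run the value 20.0 is first scaled to 1.0, but once 30.0 has been seen the same value maps to 0.5.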
Future studies may focus on finding solutions to such challenges in order to apply stream
learning techniques to real-world problems and to evaluate model performance more
realistically. Several stream learning methods may also be applied to different kinds of
datasets and compared with each other.
REFERENCES
[1] Mathew, V., Toby, T., Singh, V., Rao, B. M., & Kumar, M. G. (2017, December).
Prediction of Remaining Useful Lifetime (RUL) of turbofan engine using machine
learning. In 2017 IEEE International Conference on Circuits and Systems
(ICCS) (pp. 306-311). IEEE.
[2] Bruneo, D., & De Vita, F. (2019, June). On the use of LSTM networks for Predictive
Maintenance in Smart Industries. In 2019 IEEE International Conference on Smart
Computing (SMARTCOMP) (pp. 241-248). IEEE.
[3] Babu, G. S., Zhao, P., & Li, X. L. (2016, April). Deep convolutional neural network
based regression approach for estimation of remaining useful life. In International
conference on database systems for advanced applications (pp. 214-228). Springer,
Cham.
[4] Zheng, S., Ristovski, K., Farahat, A., & Gupta, C. (2017, June). Long short-term
memory network for remaining useful life estimation. In 2017 IEEE international
conference on prognostics and health management (ICPHM) (pp. 88-95). IEEE.
[5] Fornlöf, V. (2016). Improved remaining useful life estimations for on-condition parts
in aircraft engines (Doctoral dissertation, University of Skövde).
[6] Susto, G. A., Schirru, A., Pampuri, S., McLoone, S., & Beghi, A. (2014). Machine
learning for predictive maintenance: A multiple classifier approach. IEEE
Transactions on Industrial Informatics, 11(3), 812-820.
[7] Carvalho, T. P., Soares, F. A., Vita, R., Francisco, R. D. P., Basto, J. P., & Alcalá,
S. G. (2019). A systematic literature review of machine learning methods applied
to predictive maintenance. Computers & Industrial Engineering, 137, 106024.
[8] Baştanlar, Y., & Özuysal, M. (2014). Introduction to machine learning. miRNomics:
microRNA biology and computational analysis.
[9] Montiel, J. (2020). Learning from evolving data streams. Proceedings of the 19th
Python in Science Conference.
[10] Bifet, A., Holmes, G., Kirkby, R., & Pfahringer, B. (2011). Data stream mining: A
practical approach. Journal of Machine Learning Research, 11 (2010), 1601-1604.
[11] Gama, J., Žliobaitė, I., Bifet, A., Pechenizkiy, M., & Bouchachia, A. (2014). A
survey on concept drift adaptation. ACM computing surveys (CSUR), 46(4), 1-37.
[12] Domingos, P., & Hulten, G. (2000, August). Mining high-speed data streams.
In Proceedings of the sixth ACM SIGKDD international conference on Knowledge
discovery and data mining (pp. 71-80).
[13] Bifet, A., & Gavaldà, R. (2009, August). Adaptive learning from evolving data
streams. In International Symposium on Intelligent Data Analysis (pp. 249-260).
Springer, Berlin, Heidelberg.
[14] Bifet, A., & Gavalda, R. (2007, April). Learning from time-changing data with
adaptive windowing. In Proceedings of the 2007 SIAM international conference on
data mining (pp. 443-448). Society for Industrial and Applied Mathematics.
[15] Montiel, J. (2020). Learning from evolving data streams. Proceedings of the 19th
Python in Science Conference.
[18] Gomes, H. M., Bifet, A., Read, J., Barddal, J. P., Enembreck, F., Pfahringer, B., ...
& Abdessalem, T. (2017). Adaptive random forests for evolving data stream
classification. Machine Learning, 106(9-10), 1469-1495.
[19] Domingos, P., & Hulten, G. (2000, August). Mining high-speed data streams.
In Proceedings of the sixth ACM SIGKDD international conference on Knowledge
discovery and data mining (pp. 71-80).
[20] Gomes, H. M., Read, J., Bifet, A., Barddal, J. P., & Gama, J. (2019). Machine
learning for streaming data: state of the art, challenges, and opportunities. ACM
SIGKDD Explorations Newsletter, 21(2), 6-22.
[21] AlNuaimi, N., Masud, M. M., Serhani, M. A., & Zaki, N. (2019). Streaming feature
selection algorithms for big data: A survey. Applied Computing and Informatics.
[22] Montiel López, J. (2019). Fast and slow machine learning (Doctoral dissertation,
Université Paris-Saclay (ComUE)).
[23] Gama, J., Sebastião, R., & Rodrigues, P. P. (2009, June). Issues in evaluation of
stream learning algorithms. In Proceedings of the 15th ACM SIGKDD international
conference on Knowledge discovery and data mining (pp. 329-338).
[24] Frederick, D. K., DeCastro, J. A., & Litt, J. S. (2007). User's guide for the
commercial modular aero-propulsion system simulation (C-MAPSS).
[25] Aydin, O., & Guldamlasioglu, S. (2017, April). Using LSTM networks to predict
engine condition on large scale data processing framework. In 2017 4th International
Conference on Electrical and Electronic Engineering (ICEEE) (pp. 281-285). IEEE.
[26] Qi, L., Zhanbao, G., Diyin, T., & Baoan, L. (2016). Remaining useful life estimation
for deteriorating.
[27] Dawid, A. P. (1984). Present position and potential developments: Some personal
views: Statistical theory: The prequential approach. Journal of the Royal Statistical
Society: Series A (General), 147(2), 278-290.
[28] Montiel, J., Read, J., Bifet, A., & Abdessalem, T. (2018). Scikit-multiflow: A
multi-output streaming framework. The Journal of Machine Learning Research, 19(1),
2915-2919.
[29] Ramírez-Gallego, S., Krawczyk, B., García, S., Woźniak, M., & Herrera, F. (2017).
A survey on data preprocessing for data stream mining: Current status and future
directions. Neurocomputing, 239, 39-57.
[30] Gomes, H. M., Barddal, J. P., Ferreira, L. E. B., & Bifet, A. (2018, April). Adaptive
random forests for data stream regression. In ESANN.
[31] Kulkarni, K., Devi, U., Sirighee, A., Hazra, J., & Rao, P. (2018, June). Predictive
maintenance for supermarket refrigeration systems using only case temperature data.
In 2018 Annual American Control Conference (ACC) (pp. 4640-4645). IEEE.
[32] Paolanti, M., Romeo, L., Felicetti, A., Mancini, A., Frontoni, E., & Loncarski, J.
(2018, July). Machine learning approach for predictive maintenance in industry 4.0.
In 2018 14th IEEE/ASME International Conference on Mechatronic and Embedded
Systems and Applications (MESA) (pp. 1-6). IEEE.
[33] Biswal, S., & Sabareesh, G. R. (2015, May). Design and development of a wind
turbine test rig for condition monitoring studies. In 2015 International Conference
on Industrial Instrumentation and Control (ICIC) (pp. 891-896). IEEE.
[34] Kanawaday, A., & Sane, A. (2017, November). Machine learning for predictive
maintenance of industrial machines using IoT sensor data. In 2017 8th IEEE
International Conference on Software Engineering and Service Science
(ICSESS) (pp. 87-90). IEEE.
[35] Dong, D., Li, X. Y., & Sun, F. Q. (2017, July). Life prediction of jet engines based
on LSTM-recurrent neural networks. In 2017 Prognostics and System Health
Management Conference (PHM-Harbin) (pp. 1-6). IEEE.