Sensor time-series of aircraft engines

Can one predict when an engine breaks down?

1. Introduction

Can one predict when an engine or device breaks down?

This seems a pure engineering question. But nowadays it is also a data science question. More concretely, it is an important question wherever engines and devices use data to guide maintenance, such as aircraft engines1, wind turbine engines2, and rotating machinery3. With regard to human safety and logistic planning, it is not a good idea to simply wait until an engine breaks down. It is essential to plan maintenance proactively to avoid costly downtime.

Proactive planning is enabled by multiple sensor technologies that provide powerful data about the state of machines and devices. Sensors capture data such as temperature, fan and core speed, static pressure, and so on. Can we use these data to predict, within certain margins, how long an aircraft engine will keep functioning without failure? And if so, how do we do it?

This is the question the concept of remaining useful life (RUL) attempts to answer. It aims to estimate the remaining time an item, component, or system is able to function in accordance with its intended purpose before warranting replacement. The present blog shows how to use deep learning in Python (Keras) to predict the RUL. It is meant to provide an example case study, not an exhaustive and ultimate solution.
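To make the RUL target concrete: for run-to-failure training data, the target at every time-point can be derived as the unit's final observed cycle minus the current cycle. A minimal sketch on hypothetical toy data (the column names `unit` and `cycles` are chosen to match the dataset used below):

```python
import pandas as pd

# Hypothetical toy data: two engines, run to failure, a few cycles each
df = pd.DataFrame({'unit':   [1, 1, 1, 2, 2],
                   'cycles': [1, 2, 3, 1, 2]})

# In run-to-failure training data, the RUL at each row is the unit's
# final observed cycle minus the current cycle
df['RUL'] = df.groupby('unit')['cycles'].transform('max') - df['cycles']
print(df['RUL'].tolist())  # [2, 1, 0, 1, 0]
```

The test data below differ precisely in that the final cycle is *not* the failure point, which is why their RUL must be predicted rather than computed.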

There is a lack of real data to answer this question. However, data simulations have been made and provide a unique resource. One such fascinating simulation is provided by the C-MAPSS data1. It provides train data that show sensor-based time-series up to the timepoint the engine breaks down. In contrast, the test data consist of sensor-based time-series that end a "random" time before the breakdown. The key aim is to estimate the RUL of the testset, that is, how much time is left after the last recorded time-point.

2. Modules

Python 2 was used for the current blog; Python 3 can be used with a few adaptations to the code. The following modules will be used:

In [1]:
import os
import pandas as pd
import numpy as np
np.random.seed(1337)
import requests, zipfile, StringIO
from IPython.display import Image
import matplotlib as mpl
import matplotlib.pyplot as plt
from sklearn.metrics import mean_squared_error

from keras.models import Sequential
from keras.layers.core import Dense, Activation
from keras.wrappers.scikit_learn import KerasRegressor
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler
Using TensorFlow backend.
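For readers on Python 3, the main adaptation to the imports above is that the StringIO module no longer exists; binary content such as a downloaded ZIP file goes through io.BytesIO instead. A small self-contained sketch (the in-memory ZIP is only there to demonstrate the replacement):

```python
from io import BytesIO   # Python 3 replacement for StringIO.StringIO
import zipfile

# Demonstrate on a small in-memory ZIP that BytesIO is a drop-in
# replacement for binary content such as r.content
buf = BytesIO()
with zipfile.ZipFile(buf, 'w') as z:
    z.writestr('hello.txt', 'hello')
buf.seek(0)
with zipfile.ZipFile(buf) as z:
    print(z.namelist())  # ['hello.txt']
```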

The following settings will be used to avoid exponential notation in output and tables, and to display 50 rows at most:

In [2]:
pd.set_option('display.float_format', lambda x: '%.3f' % x)
pd.options.display.max_rows=50

3. Sketch of the question

A simplified lineplot best illustrates the question at hand. Given a fictive temperature sensor for which we have the first 9 cycles, are we able to predict the remaining cycles?

The kinds of questions that can be addressed for such a time-series:

  • Can we forecast the temperature of future time-points?
  • Can we predict how long the time-series continues until a pre-defined event happens?
In [3]:
A=[60,63,67,74,77,81,82,87,92]
B=[92,99,104,110,116,125]
C = np.append(np.repeat(np.nan, len(A)-1), B)
plt.figure(figsize = (16, 8))
plt.plot(A, color='red', linewidth=3)
plt.plot(C, 'r:', linewidth=3)
plt.axvline(x=len(A)-1, color='grey', linestyle=':', linewidth=4)
plt.axvline(x=len(C)-1, color='grey', linestyle=':', linewidth=4)
plt.title('Example temperature sensor', fontsize=16)
plt.xlabel('# Cycles', fontsize=16)
plt.ylabel('Degrees', fontsize=16)
plt.xticks(fontsize=16)
plt.yticks(fontsize=16)
plt.show()

4. Loading the data

The data concern the C-MAPSS set and can be found at: https://ti.arc.nasa.gov/tech/dash/groups/pcoe/prognostic-data-repository/ under "Turbofan Engine Degradation Simulation Data Set", provided as a ZIP file. The one we are using for the current purpose is FD001, but the reader is encouraged to use the others as well (more data usually means better prediction power). It can be downloaded either manually or using the script below, which downloads and extracts the content of the ZIP file from the requested URL into your home directory4.

In [4]:
r = requests.get('https://ti.arc.nasa.gov/c/6/', stream=True)
z = zipfile.ZipFile(StringIO.StringIO(r.content))
z.extractall()

Now that the files are in your home directory, you can read them using Pandas read_csv. We will load a train- and testset, as well as an RUL set. The RUL set contains the true values for remaining useful life, to which we will compare the values predicted for the testset.

In [5]:
train = pd.read_csv('train_FD001.csv', parse_dates=False, delimiter=" ", decimal=".", header=None)
test = pd.read_csv('test_FD001.csv', parse_dates=False, delimiter=" ", decimal=".", header=None)
RUL = pd.read_csv('RUL_FD001.csv', parse_dates=False, delimiter=" ", decimal=".", header=None)

5. Missing values

First, we need to drop the last two columns, since they contain only missing values. This is probably due to trailing whitespace in the csv file. It is always wise to verify this, as the owners of the data may edit the files at any moment. The following code prints the number of missing values for each column in the train- and testset:

In [6]:
tableNA = pd.concat([train.isnull().sum(), test.isnull().sum()], axis=1)
tableNA.columns = ['train', 'test']
tableNA
Out[6]:
train test
0 0 0
1 0 0
2 0 0
3 0 0
4 0 0
5 0 0
6 0 0
7 0 0
8 0 0
9 0 0
10 0 0
11 0 0
12 0 0
13 0 0
14 0 0
15 0 0
16 0 0
17 0 0
18 0 0
19 0 0
20 0 0
21 0 0
22 0 0
23 0 0
24 0 0
25 0 0
26 20631 13096
27 20631 13096

We will first drop the two columns that consist only of missing values:

In [7]:
train.drop(train.columns[[-1,-2]], axis=1, inplace=True)
test.drop(test.columns[[-1,-2]], axis=1, inplace=True)
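A more self-documenting alternative (a sketch on toy data, not the author's code) is dropna with how='all', which removes exactly the columns in which every value is missing:

```python
import numpy as np
import pandas as pd

# Toy frame with a trailing all-NaN column, mimicking the artifact above
df = pd.DataFrame({'a': [1, 2], 'b': [3, 4], 'c': [np.nan, np.nan]})

# how='all' drops only columns that are entirely missing
df = df.dropna(axis=1, how='all')
print(list(df.columns))  # ['a', 'b']
```

Unlike dropping columns by position, this approach keeps working if the owners of the data remove the trailing whitespace at some point.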

And then we will give names to all columns:

In [8]:
cols = ['unit', 'cycles', 'op_setting1', 'op_setting2', 'op_setting3', 's1', 's2', 's3', 's4', 's5', 's6', 's7', 's8', 's9', 's10', 's11', 's12', 's13', 's14', 's15', 's16', 's17', 's18', 's19', 's20', 's21']
train.columns = cols
test.columns = cols

And so the train- and testset look as follows:

In [9]:
train.head()
Out[9]:
unit cycles op_setting1 op_setting2 op_setting3 s1 s2 s3 s4 s5 ... s12 s13 s14 s15 s16 s17 s18 s19 s20 s21
0 1 1 -0.001 -0.000 100.000 518.670 641.820 1589.700 1400.600 14.620 ... 521.660 2388.020 8138.620 8.419 0.030 392 2388 100.000 39.060 23.419
1 1 2 0.002 -0.000 100.000 518.670 642.150 1591.820 1403.140 14.620 ... 522.280 2388.070 8131.490 8.432 0.030 392 2388 100.000 39.000 23.424
2 1 3 -0.004 0.000 100.000 518.670 642.350 1587.990 1404.200 14.620 ... 522.420 2388.030 8133.230 8.418 0.030 390 2388 100.000 38.950 23.344
3 1 4 0.001 0.000 100.000 518.670 642.350 1582.790 1401.870 14.620 ... 522.860 2388.080 8133.830 8.368 0.030 392 2388 100.000 38.880 23.374
4 1 5 -0.002 -0.000 100.000 518.670 642.370 1582.850 1406.220 14.620 ... 522.190 2388.040 8133.800 8.429 0.030 393 2388 100.000 38.900 23.404

5 rows × 26 columns

In [10]:
test.head()
Out[10]:
unit cycles op_setting1 op_setting2 op_setting3 s1 s2 s3 s4 s5 ... s12 s13 s14 s15 s16 s17 s18 s19 s20 s21
0 1 1 0.002 0.000 100.000 518.670 643.020 1585.290 1398.210 14.620 ... 521.720 2388.030 8125.550 8.405 0.030 392 2388 100.000 38.860 23.373
1 1 2 -0.003 -0.000 100.000 518.670 641.710 1588.450 1395.420 14.620 ... 522.160 2388.060 8139.620 8.380 0.030 393 2388 100.000 39.020 23.392
2 1 3 0.000 0.000 100.000 518.670 642.460 1586.940 1401.340 14.620 ... 521.970 2388.030 8130.100 8.444 0.030 393 2388 100.000 39.080 23.417
3 1 4 0.004 0.000 100.000 518.670 642.440 1584.120 1406.420 14.620 ... 521.380 2388.050 8132.900 8.392 0.030 391 2388 100.000 39.000 23.374
4 1 5 0.001 0.000 100.000 518.670 642.510 1587.190 1401.920 14.620 ... 522.150 2388.030 8129.540 8.403 0.030 390 2388 100.000 38.990 23.413

5 rows × 26 columns

And the RUL data look as follows, meaning that the remaining useful life of the first unit is 112 cycles, of the second unit 98 cycles, etc.

In [11]:
RUL.head()
Out[11]:
0
0 112
1 98
2 69
3 82
4 91
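As a sanity check on what these numbers mean: the total lifetime of a test unit is its last observed cycle plus its RUL. A sketch with hypothetical last-cycle values (only the 112 and 98 come from RUL.head() above):

```python
import pandas as pd

last_cycle = pd.Series([31, 49])   # hypothetical last observed cycle per test unit
rul        = pd.Series([112, 98])  # true RUL, as in RUL.head() above

# Total lifetime = last observed cycle + remaining useful life
print((last_cycle + rul).tolist())  # [143, 147]
```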

6. Outliers and flat lines

To know whether there are outliers (extreme values) in the data, we can use descriptive statistics, train.describe().transpose(), and check whether the min. and max. values lie far away from the central tendency. As we can see below, this is not the case for any of the sensors.

However, if we look carefully, we can see something else that is quite remarkable: for several sensors the min. and max. values are identical and the standard deviation (std) is zero. In time-series, this is called a flat line: there is no activity, possibly caused by sensor malfunctioning.

In [12]:
train.describe().transpose()
Out[12]:
count mean std min 25% 50% 75% max
unit 20631.000 51.507 29.228 1.000 26.000 52.000 77.000 100.000
cycles 20631.000 108.808 68.881 1.000 52.000 104.000 156.000 362.000
op_setting1 20631.000 -0.000 0.002 -0.009 -0.002 0.000 0.002 0.009
op_setting2 20631.000 0.000 0.000 -0.001 -0.000 0.000 0.000 0.001
op_setting3 20631.000 100.000 0.000 100.000 100.000 100.000 100.000 100.000
s1 20631.000 518.670 0.000 518.670 518.670 518.670 518.670 518.670
s2 20631.000 642.681 0.500 641.210 642.325 642.640 643.000 644.530
s3 20631.000 1590.523 6.131 1571.040 1586.260 1590.100 1594.380 1616.910
s4 20631.000 1408.934 9.001 1382.250 1402.360 1408.040 1414.555 1441.490
s5 20631.000 14.620 0.000 14.620 14.620 14.620 14.620 14.620
s6 20631.000 21.610 0.001 21.600 21.610 21.610 21.610 21.610
s7 20631.000 553.368 0.885 549.850 552.810 553.440 554.010 556.060
s8 20631.000 2388.097 0.071 2387.900 2388.050 2388.090 2388.140 2388.560
s9 20631.000 9065.243 22.083 9021.730 9053.100 9060.660 9069.420 9244.590
s10 20631.000 1.300 0.000 1.300 1.300 1.300 1.300 1.300
s11 20631.000 47.541 0.267 46.850 47.350 47.510 47.700 48.530
s12 20631.000 521.413 0.738 518.690 520.960 521.480 521.950 523.380
s13 20631.000 2388.096 0.072 2387.880 2388.040 2388.090 2388.140 2388.560
s14 20631.000 8143.753 19.076 8099.940 8133.245 8140.540 8148.310 8293.720
s15 20631.000 8.442 0.038 8.325 8.415 8.439 8.466 8.585
s16 20631.000 0.030 0.000 0.030 0.030 0.030 0.030 0.030
s17 20631.000 393.211 1.549 388.000 392.000 393.000 394.000 400.000
s18 20631.000 2388.000 0.000 2388.000 2388.000 2388.000 2388.000 2388.000
s19 20631.000 100.000 0.000 100.000 100.000 100.000 100.000 100.000
s20 20631.000 38.816 0.181 38.140 38.700 38.830 38.950 39.430
s21 20631.000 23.290 0.108 22.894 23.222 23.298 23.367 23.618
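Instead of scanning the table by eye, zero-variance columns can also be found programmatically. A sketch on toy data (the column names here are hypothetical):

```python
import pandas as pd

# Toy frame: 'flat' never changes, 'active' does
df = pd.DataFrame({'flat': [1.3, 1.3, 1.3], 'active': [0.1, 0.5, 0.2]})

# A flat line is simply a column with zero standard deviation
flat_cols = df.columns[df.std() == 0].tolist()
print(flat_cols)  # ['flat']
```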

The sensors s1, s5, s10, s16, s18, and s19, as well as op_setting3, will for this reason be removed from further analyses:

In [13]:
train.drop(['s1', 's5', 's10', 's16', 's18', 's19', 'op_setting3'], axis=1, inplace=True)
test.drop(['s1', 's5', 's10', 's16', 's18', 's19', 'op_setting3'], axis=1, inplace=True)

The distributions of the remaining columns look as follows:

In [14]:
train.hist(bins=50, figsize=(18,16))
plt.show()

7. Exploratory analyses of the max. number of cycles per unit

Exploratory data analyses provide insight into the aircraft engines in action. For example, it is good to have an idea of the maximum lifetime of the 100 different units. The barplots below show that there is large variation across units in the max. number of cycles, and that, as expected, the number of recorded cycles is shorter for the testset than for the trainset.

In [15]:
cyclestrain = train.groupby('unit', as_index=False)['cycles'].max()
cyclestest = test.groupby('unit', as_index=False)['cycles'].max()
In [16]:
fig = plt.figure(figsize = (16,12))
fig.add_subplot(1,2,1)
bar_labels = list(cyclestrain['unit'])
bars = plt.bar(list(cyclestrain['unit']), cyclestrain['cycles'], color='red')
plt.ylim([0, 400])
plt.xlabel('Units', fontsize=16)
plt.ylabel('Max. Cycles', fontsize=16)
plt.title('Max. Cycles per unit in trainset', fontsize=16)
plt.xticks(np.arange(min(bar_labels)-1, max(bar_labels)+1, 5.0), fontsize=12)
plt.yticks(fontsize=12)
fig.add_subplot(1,2,2)
bars = plt.bar(list(cyclestest['unit']), cyclestest['cycles'], color='grey')
plt.ylim([0, 400])
plt.xlabel('Units', fontsize=16)
plt.ylabel('Max. Cycles', fontsize=16)
plt.title('Max. Cycles per unit in testset', fontsize=16)
plt.xticks(np.arange(min(bar_labels)-1, max(bar_labels)+1, 5.0), fontsize=12)
plt.yticks(fontsize=12)
plt.show()
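The claim that test series are shorter can also be verified numerically by comparing the mean max. cycles of the two sets. A sketch on toy numbers (the real values come from the cyclestrain and cyclestest tables computed above):

```python
import pandas as pd

# Toy max-cycle tables in the shape produced by the groupby above
cyclestrain = pd.DataFrame({'unit': [1, 2, 3], 'cycles': [192, 287, 179]})
cyclestest  = pd.DataFrame({'unit': [1, 2, 3], 'cycles': [31, 49, 126]})

# Train units run to failure; test units are cut off early, so the
# train mean should exceed the test mean
print(cyclestrain['cycles'].mean() > cyclestest['cycles'].mean())  # True
```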