IoT Data Analytics Using Deep Learning
Xiaofeng Xie, Di Wu, Siping Liu, Renfa Li
Key Laboratory for Embedded and Networking Computing of Hunan Province, Hunan University
Di Wu is the corresponding author (Email: [email protected]).
Abstract: Deep learning is a popular machine learning approach that has made significant progress across traditional machine learning areas. Internet of Things (IoT) and smart city deployments are generating large amounts of time-series sensor data in need of analysis, and applying deep learning to these domains has become an important research topic. The long short-term memory (LSTM) network has proven well suited to handling and predicting important events with long intervals and delays in time series, because it can maintain long-term memory. Stacked LSTM hidden layers also make it possible to learn high-level temporal features without the fine tuning and preprocessing that other techniques require. In this paper, we construct a stacked LSTM recurrent neural network and use a training set of normal time series to build a prediction model. We then use the prediction errors of this model to fit a Gaussian naive Bayes model that detects whether an original sample is abnormal. We call this method LSTM-Gauss-NBayes for short. We evaluate it on three real-world data sets that together exhibit long-term, short-term, and even very weak time dependence. The experimental results show that LSTM-Gauss-NBayes is an effective and robust model.
1. Introduction
In many areas, such as the natural sciences, social sciences, and engineering, phenomena are best analyzed with time-series data. In the field of the Internet of Things [1][2], time series data can be generated by weather stations, RFID tags, IT infrastructure components [3], and other sensors [4], as shown in Fig. 1(a). In every business process and IoT sensor application, such time series data can be used for process optimization or knowledge discovery. When carefully analyzed, they can reveal operational trends, patterns, variability, changes, covariation, cyclical abnormalities, anomalies, and abnormal value rates. Traditional time series processing techniques use statistical indicators, such as the cumulative sum (CUSUM) and the exponentially weighted moving average (EWMA) over a time window [5], to detect potential changes in the distribution; the length of the time window usually needs to be predetermined, and the result is highly dependent on this parameter. In addition, there are sequence models, such as conditional random fields, Kalman filters, and Markov models, that deal with sequential data but are ill-equipped to learn long-range dependencies. Moreover, such models require domain knowledge or feature engineering, and thus provide fewer opportunities for accidental discovery. In contrast, neural network learning techniques allow unforeseen structures to be found.
Recurrent neural networks (RNNs) [6] can theoretically solve tasks with long delays without requiring predefined time steps. However, because of the simplicity of their hidden-layer units' structure, exploding or vanishing gradients [7] easily occur on longer time series tasks. The long short-term memory (LSTM) neural network [8][9] is a
variant of the recurrent neural network that can effectively solve the vanishing and exploding gradient problems by introducing a set of memory units.
Anomaly detection in time series data is an important research direction. In view of the difficulties and challenges it faces, the method proposed in this paper lets an LSTM neural network learn the trend of future time steps; that is, it uses the LSTM network as a prediction model. We train a stacked LSTM model on normal time series data only. The prediction errors on future time steps are then fed into a naive Bayes model [10] with Gaussian class-conditional distributions to identify abnormal behavior, as shown in Fig. 2.
Since the LSTM network has not previously been used in this setup, we first validate its utility and compare its performance against a set of strong baselines: a long short-term memory neural network classifier (LSTM NN) and a multi-layer perceptron model (MLP). The optimization goal of the stacked LSTM prediction model is to compute the loss only at the final sequence step. To improve generalization, we also apply dropout [11] during training, which further improves the performance of the model. The experiments later in this paper confirm that this is an effective method.
The remainder of this paper is organized as follows. Section 2 reviews time series processing. Section 3 describes the challenges of anomaly detection in time series processing and our solution. Section 4 applies the stacked LSTM-Gauss-NBayes method and two baseline methods, the LSTM NN model and the MLP model, to three real-world datasets and analyzes the results. Section 5 concludes with a summary and future outlook.
Figure 1. (a) The data collection process in a smart city: data generated by sensors, humans, and mobile terminals are sent through network equipment such as switches and routers and stored on cloud servers. (b) The characteristics of IoT time-series data (dynamic, high-dimensional, time-dependent) and their analysis using deep learning.
[Figure 2 pipeline: initial data passes through data cleaning, down-sampling, and normalization into training, validation, and testing sets; a stacked LSTM model is trained and applied to the test data; prediction errors are computed to form an error data set, whose training portion fits a Gaussian naive Bayes model by maximum likelihood estimation, and whose testing portion yields the recognition result.]
Figure 2. Construction procedure of the anomaly detection model.
2. Development and Trends of Time Series Processing in IoT
2.1 The Application of Statistical Learning Methods to Time Series Data
A time series is a sequence of numbers in chronological order. Traditional time series analysis uses mathematical statistical methods to analyze such a sequence and predict the future development of things. Its basic principle is: first, detect continuous changes in the sequence and apply past data to predict the development trend; next, take the randomness of the time series into account. The process may be affected by random factors, for which the weighted average method from statistical analysis is used on the historical data. A representative model among traditional time series methods is ARIMA [12], the autoregressive integrated moving average model. ARIMA transforms a non-stationary time series into a stationary one by d-order differencing. Then the autocorrelation coefficients (ACF) and the partial autocorrelation coefficients (PACF) are computed for the stationary series. Through analysis of the autocorrelation and partial autocorrelation plots, the optimal orders p and q are obtained, and the ARIMA model is constructed from these parameters. The method is simple and easy to master, but it has low accuracy and is only suitable for short-term prediction. Time series prediction generally reflects three kinds of actual change: trend, cyclical change, and randomness. Traditional time series analysis is commonly used in macroeconomic control of a national economy, regional comprehensive development planning, enterprise management, market potential forecasting, meteorological forecasting, hydrological forecasting, earthquake precursor forecasting, crop pest disaster forecasting, environmental pollution control, ecological balance, astronomy, and oceanography.
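As a concrete illustration of this workflow, the following minimal Python sketch (our own, not code from the paper; it assumes the statsmodels library and a toy series, with orders chosen arbitrarily rather than from ACF/PACF plots) fits an ARIMA model and forecasts ahead:

import numpy as np
from statsmodels.tsa.arima.model import ARIMA

# Toy non-stationary series; in practice this would be the IoT time series.
rng = np.random.default_rng(0)
series = np.cumsum(rng.normal(size=200))

# order=(p, d, q): d=1 differencing makes the series stationary; p and q
# would normally be read off the ACF/PACF plots as described above.
result = ARIMA(series, order=(2, 1, 1)).fit()
print(result.forecast(steps=10))  # point forecasts for the next 10 steps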
2.2 Application of Neural Network in Internet of Things (IoT)
A neural network is a widely interconnected network of simple neurons that can adapt and simulate the response of the biological nervous system to real-world objects. A BP (back propagation) neural network has the ability to learn, memorize, associate, induce, generalize, extract features, tolerate faults, and introspect. It can extract complex relationships between input and output, even when the relationship itself is in flux. Recently, BP neural networks have been widely used to solve identification and prediction problems. In economic fields such as business-cycle analysis, economic time series forecasting, securities portfolio optimization, and stock forecasting, they have achieved results that conventional economic methods cannot. At the same time, as a newer time series prediction method, a BP neural network can approximate nonlinear quantities with high prediction accuracy. By exploiting the relationship between earlier and later points in a time series, taking past observations as the BP network's input and the future value as its output, one builds a time series prediction model. From a mathematical point of view, the BP network becomes a nonlinear function from the input to the output values.
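This windowing step can be sketched as follows (an illustration of ours, not the paper's code; the window length and toy series are made up):

import numpy as np

def make_windows(series, window):
    """Arrange past observations into (input window, next value) pairs."""
    X = np.array([series[i:i + window] for i in range(len(series) - window)])
    y = series[window:]
    return X, y

series = np.sin(np.linspace(0, 20, 500))   # toy series
X, y = make_windows(series, window=10)     # X: (490, 10), y: (490,)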
2.3 Application of Recurrent Neural Network in Internet of Things
Nevertheless, although BP neural networks can sometimes achieve good results in time series processing, traditional neural networks assume that all inputs (and outputs) are independent of each other, which is a poor assumption for many tasks. The recurrent neural network (RNN) differs from the general feedforward BP neural network by memorizing previous information and applying it to the calculation of the current output: the hidden-layer nodes are connected across time steps, so the input to the hidden layer includes not only the output of the input layer but also the hidden layer's own output from the previous time step. Theoretically, an RNN can process sequences of any length. In practice, however, to reduce the complexity of the model, it is often assumed that the current state is related only to the previous several states. Although a simple recurrent neural network can in theory establish dependencies between states separated by long time intervals, the simplicity of its hidden-layer units' structure means that exploding or vanishing gradients are likely to occur on relatively long time series tasks. As a result, only short-range dependencies can be learned.
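Concretely, a simple RNN updates its hidden state by a recurrence of the form

h_t = \tanh(W_x x_t + W_h h_{t-1} + b),

and it is the repeated multiplication by W_h during back-propagation through this recurrence that makes gradients vanish or explode over long sequences [7].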
2.4 Use the LSTM Neural Network to Process Time Series
The long short-term memory (LSTM) neural network is a variant of the recurrent neural network that can effectively solve the vanishing and exploding gradient problems by introducing a set of memory units. It allows the network to learn when to forget the historical information in a memory unit and when to update the unit with new information. At time t, the memory unit c_t records all historical information up to the current moment and is controlled by three "gates": the input gate i_t, the forget gate f_t, and the output gate o_t, whose element values lie in [0, 1]. These models are well suited to data sets that contain a time dimension (such as web or server activity logs, sensor data from hardware or medical devices, financial transactions, or call records). Only the current state and some of the previous states are needed to train the network, and LSTM can track dependencies and relationships across many time steps. A typical feedforward neural network fed with a window of events could also be used, but the required window size changes over time; the feedforward approach limits us to the dependencies captured by the window, so that solution is not flexible. LSTM networks have been widely used in many sequence learning tasks. For example, given a word sequence, we may need to predict the likelihood of each word based on the previous words; LSTM language models allow us to measure how likely a sentence is, which is an important component of machine translation (since high-probability sentences are typically correct). The LSTM network can also predict the sequence of speech segments and their probabilities [13], given an input sequence of acoustic signals from sound waves. Other applications include handwriting recognition [14] and generating image descriptions [15].
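For reference, a standard formulation of the LSTM cell updates (following [8]; c_t is the memory unit, written S_t in Fig. 3, and \odot denotes element-wise multiplication) is:

i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i)
f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f)
o_t = \sigma(W_o x_t + U_o h_{t-1} + b_o)
\tilde{c}_t = \tanh(W_c x_t + U_c h_{t-1} + b_c)
c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t
h_t = o_t \odot \tanh(c_t)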
3. Deep Learning for Anomaly Detection in IoT Data
3.1 Challenge in Time Series Anomaly Detection
Time series data are high-dimensional, complex, dynamic, and noisy, as shown in Fig. 1(b). If data mining is carried out directly on the original time series, it not only costs considerable storage and computation but also hurts the accuracy and reliability of the algorithm. How to effectively preprocess time series data without losing its key information is therefore a central problem; reducing the dimensionality of the data and removing noise are the key goals of preprocessing. Noisy data increases the complexity of anomaly detection on the given time series. At the same time, when abnormal data are unavailable or sparse, it is difficult to learn a classifier that separates normal from abnormal sequences.
Meanwhile, there are recursive relationships between points and between events, and a single point in isolation carries little analytical value. Time series data mining therefore needs to consider the logical and recursive relationships among events, and when the relevant time span is very large, this becomes a major challenge for anomaly detection on time series.
3.2 Use the Stacked LSTM-Gauss-NBayes Model to Detect Anomalies
To address these challenges of time series data processing, we first apply down-sampling to obtain a characteristic subsequence of the original time series. Down-sampling reduces the dimensionality of the original series and makes it easier to learn patterns. At the same time, to speed up the convergence of the model, we normalize the data with min-max normalization, a linear transformation of the original data that maps the values to the interval [0, 1].
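A minimal sketch of this preprocessing (our own illustration; the paper publishes no code, averaging is only one common choice of down-sampling aggregation, and the file name is hypothetical):

import numpy as np

def downsample(series, factor):
    """Average every `factor` consecutive points to reduce dimensionality."""
    n = len(series) // factor * factor
    return series[:n].reshape(-1, factor).mean(axis=1)

def min_max_normalize(series):
    """Linear transformation mapping values to [0, 1]."""
    lo, hi = series.min(), series.max()
    return (series - lo) / (hi - lo)

raw = np.loadtxt("sensor_readings.txt")  # hypothetical raw IoT series
x = min_max_normalize(downsample(raw, factor=4))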
Because IoT time series exhibit both long-term and short-term dependencies, we adopt an LSTM neural network structure. In our LSTM, the input layer corresponds to a time series, and the number of LSTM units in each hidden layer corresponds to the number of time steps of the series. We use two hidden layers, that is, stacked LSTM hidden layers (as shown in Fig. 3). For the output layer, we use a fully connected layer on top of the highest LSTM layer, followed by an element-wise sigmoid activation function. We use the least-squares loss as the cost function for this model and train it with the back propagation through time (BPTT) algorithm. We divide the data into a training set of normal data, denoted D_normal_train, a validation set of normal data, denoted D_normal_valid, a test set of normal data, denoted D_normal_test, and a test set of abnormal data, denoted D_abnormal_test; in real IoT time series, abnormal samples are relatively scarce. We let the stacked LSTM prediction model train only on the normal data set D_normal_train, with its hyper-parameters determined on the validation set D_normal_valid. We then feed the test sets D_normal_test and D_abnormal_test into the trained model to obtain predictions for the normal and the abnormal data, respectively, and compute the difference between the real data and the predicted data. This yields an error data set containing the errors of both normal and abnormal data.
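A minimal Keras-style sketch of this predictor (an assumption on our part: the paper implements it in TensorFlow but publishes no code, and the layer widths, sample length, and variable names here are made up):

import numpy as np
import tensorflow as tf

time_steps = 32  # hypothetical length of one input sample

# Two stacked LSTM hidden layers, a fully connected layer with an
# element-wise sigmoid on top, and a least-squares (MSE) loss computed
# on the prediction of the final sequence step only.
model = tf.keras.Sequential([
    tf.keras.layers.LSTM(64, return_sequences=True,
                         input_shape=(time_steps, 1)),
    tf.keras.layers.Dropout(0.2),   # dropout [11] against over-fitting
    tf.keras.layers.LSTM(64),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer=tf.keras.optimizers.Adagrad(), loss="mse")

# Train on normal data only, then collect prediction errors for both
# test sets (X_*/y_* assumed prepared as described above):
# model.fit(X_normal_train, y_normal_train, epochs=1000,
#           validation_data=(X_normal_valid, y_normal_valid))
# errors = np.abs(model.predict(X_test)[:, 0] - y_test)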
Next, we take the error at each point of each test sample (from D_normal_test and D_abnormal_test) as an attribute of the error data set. We divide the error data set into a training set E_train and a test set E_test, where the label y belongs to the set {0, 1}, with 1 denoting abnormal. We can therefore model the target value y with a Bernoulli distribution whose parameter is estimated from the training set E_train. At the same time, we assume that each numerical attribute of the error data set follows a Gaussian distribution; in practice this strong assumption is usually very effective and produces robust results. We then set up a Gaussian probability density function for the conditional probability of each attribute, whose parameters are obtained by maximum likelihood estimation on the training set E_train, and use it to compute the conditional probability of an attribute value given a class. By the independence assumption of naive Bayes, we multiply these per-attribute conditional probabilities to obtain the conditional probability of a sample given a class, that is, P(x | y = 0) and P(x | y = 1), where x and y stand for a sample and its label, respectively. Then, according to Bayes' formula, we compute the posterior probability of each class for every sample in the test set E_test, i.e., whether the sample is abnormal. Because this is anomaly detection, we use precision, recall, and the F_beta score to measure our model, where beta > 0 weighs the relative importance of recall against precision. Here we use the F_1 score, because for the data sets in this paper precision and recall are equally important.
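This Gaussian naive Bayes stage can be sketched in a few lines (our own illustration under the assumptions above; scikit-learn's GaussianNB would serve equally well):

import numpy as np

class GaussNBayes:
    """Naive Bayes with Gaussian class-conditional densities over errors."""
    def fit(self, E, y):
        self.classes = np.unique(y)  # {0: normal, 1: abnormal}
        self.prior = np.array([(y == c).mean() for c in self.classes])
        # Maximum-likelihood estimates of per-attribute mean and variance
        self.mu = np.array([E[y == c].mean(axis=0) for c in self.classes])
        self.var = np.array([E[y == c].var(axis=0) + 1e-9 for c in self.classes])
        return self

    def predict(self, E):
        # log P(y=c) + sum_j log N(e_j; mu_cj, var_cj), by independence
        log_post = np.log(self.prior) + np.stack(
            [-0.5 * (np.log(2 * np.pi * self.var[c])
                     + (E - self.mu[c]) ** 2 / self.var[c]).sum(axis=1)
             for c in range(len(self.classes))], axis=1)
        return self.classes[log_post.argmax(axis=1)]

# Usage: y_pred = GaussNBayes().fit(E_train, y_train).predict(E_test)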
Figure 3. (a) The unrolled stacked-LSTM prediction model; the LSTM units within a hidden layer are fully connected through recurrent connections, and each unit of the lower LSTM hidden layer is connected to every unit of the LSTM hidden layer above it through feedforward connections. (b) The internal structure of an LSTM unit, where sigma and tanh denote the activation functions, X_t is the model input, h_t and h_{t-1} are the outputs of the LSTM unit at sequence step t and the previous step, and S_t and S_{t-1} are the values of the LSTM memory unit at sequence step t and the previous step, respectively.
4. Experiments
In this section, to demonstrate the effectiveness of our method, we implement our algorithm on Google's deep learning platform, TensorFlow, and use an Nvidia GTX 1070 to accelerate model training. We validate our model against two comparison methods, the LSTM NN model and the MLP model. The LSTM NN model is constructed from an input layer, two hidden layers with LSTM memory blocks, and a classification layer, and uses the cross-entropy loss as its cost function. The MLP model is a conventional neural network constructed from an input layer, multiple hidden layers, and a classification layer; here we build a multi-layer perceptron with three hidden layers of 10, 20, and 10 units, respectively, whose cost function is also the cross-entropy loss.
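As a sketch of this MLP baseline (the 10/20/10 layer widths, Adagrad optimizer, and cross-entropy cost come from the text; the hidden activations and the single sigmoid output are our assumptions):

import tensorflow as tf

mlp = tf.keras.Sequential([
    tf.keras.layers.Dense(10, activation="relu"),    # activation assumed
    tf.keras.layers.Dense(20, activation="relu"),
    tf.keras.layers.Dense(10, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # classification layer
])
mlp.compile(optimizer=tf.keras.optimizers.Adagrad(),
            loss="binary_crossentropy")              # cross-entropy cost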
All models are trained on 80 percent of the data and tested on 10 percent, with the remaining 10 percent used as a validation set. We use the adaptive gradient algorithm (Adagrad) to train each model for 1000 epochs. To prevent over-fitting, we use regularization techniques and dropout to reduce the complexity of these models, and we use 5-fold cross-validation to select their hyper-parameters. We consider three real-world data sets: power data, loop sensor data, and land sensor data. All of them are time series; some are cyclical and some are irregular.
Table 1. The autocorrelation coefficients of each data set at different delays. The autocorrelation coefficient quantitatively describes the relation between past events and current events.

Dataset               ACF (delay k=1)   ACF (delay k=5)   ACF (delay k=10)   Time dependency
Power dataset         0.79              -0.78             0.56               long
Loop sensor dataset   0.71              0.40              0.05               short
Land sensor dataset   0.32              0.13              0.08               very weak
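The lag-k autocorrelation coefficient reported in Table 1 can be computed as, for instance:

import numpy as np

def acf(x, k):
    """Lag-k autocorrelation coefficient of a 1-D series."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    return float((x[:-k] * x[k:]).sum() / (x * x).sum())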
4.1 Data Set Description
Power data: This data set records a user's power consumption for one year, sampled every 15 minutes. We down-sample the original power data into one sample per week, and the resulting data constitute the input samples for our model. Under normal circumstances, power consumption is relatively high on the first 5 days of the week and relatively low on the weekend. As can be seen in Fig. 4(a), the power consumption curve shows 5 peaks in the first 5 days, followed by a trough over the next two days. If a sample has troughs in the first five days of a week, or crests in the following two days, we can regard it as an anomaly. In addition, the data are noisy, so the peaks do not appear at exactly the same time of day.
Loop sensor data set: This data set consists mainly of counts of vehicles passing near a stadium, collected by a loop sensor, and only while a game was held in the stadium. From Table 1, its ACF falls below 0.5 at delay 5, so this data set is short-term time-dependent. To analyze and use the data better, we screened the original time series, keeping only the data from 1 hour before each game, the game itself, and 2 hours after the game. From Fig. 4(c), the time series has a small peak in each of its first and second halves, with a small valley in the middle; right after this valley, the vehicle count rises rapidly, which is consistent with a sharp increase in traffic after the game. Such behavior is considered normal for this time series.
Land sensor data set: This data set consists mainly of land humidity readings collected every 12 minutes by a sensor. Unlike the previous two data sets, which are somewhat cyclical, this data set is an irregular and very weakly time-dependent time series, as shown in Table 1. Over time, its values fluctuate randomly within a certain range, and anomalies also appear irregularly, which makes our model validation more challenging. To analyze these data better, we similarly down-sampled the original time series, taking 10 hours of humidity data as one sample to train our model. As can be seen from Fig. 4(f), anomalies may have occurred in both the first and the second half of the time series.
Table 2. Experimental results: the performance of each model on the three data sets under four classification indicators (Accuracy, Precision, Recall, F1).

Dataset               Method               Accuracy   Precision   Recall   F1
Power dataset         LSTM-Gauss-NBayes    0.969      1.000       0.941    0.962
                      LSTM NN              0.905      0.846       0.931    0.886
                      MLP                  0.873      0.843       0.925    0.882
Loop sensor dataset   LSTM-Gauss-NBayes    0.953      0.932       0.976    0.954
                      LSTM NN              0.870      0.867       0.897    0.881
                      MLP                  0.824      0.790       0.819    0.804
Land sensor dataset   LSTM-Gauss-NBayes    0.971      0.917       0.946    0.931
                      LSTM NN              0.818      0.859       0.769    0.812
                      MLP                  0.818      0.889       0.727    0.800
[Figure 4 panels: (a) power, normal; (b) power, abnormal; (c) loop sensor, normal; (d) loop sensor, abnormal; (e) land sensor, normal; (f) land sensor, abnormal. Anomalies are marked in the abnormal panels.]
Figure 4. Predicted results of the model on the three data sets; the orange lines represent the model predictions, and the blue lines represent the real values. The left panels show normal samples and the right panels abnormal samples; the red arrows mark possible anomalies.
4.2 Analysis and Comparison of Experimental Results
To evaluate our model comprehensively, we use four indicators: accuracy, precision, recall, and F_1. Generally speaking, accuracy suffices to judge a model whose goal is plain classification; but since our goal is to identify whether a sample is abnormal, precision and recall are the important metrics for evaluating our model. Precision mainly judges whether the classifier correctly identifies anomalies; in other words, of the samples identified as abnormal, how many really are abnormal. Recall mainly judges whether the classifier finds all the abnormal samples. The F_beta index combines the two indicators: if beta is less than 1, precision is weighted more heavily; conversely, if beta is greater than 1, recall has a greater impact on the quality assessment. This experiment uses the F_1 indicator because, for the data sets involved in this article, precision and recall are equally important to us.
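For completeness, with P denoting precision and R denoting recall, the score used here is

F_\beta = (1 + \beta^2) \cdot \frac{P \cdot R}{\beta^2 P + R},

with F_1 obtained at \beta = 1.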
Table 2 shows the performance of each model under the four classification indices on the three data sets; on every data set, the highest value of each metric is achieved by the LSTM-Gauss-NBayes model. For the power data set, Table 1 shows that the ACF is still above 0.5 at delay 10, so this is a data set with long-term time dependency, whose current values are inextricably linked to the data before them. As can be seen from Table 2, this is why the models with LSTM hidden layers are generally better than an ordinary hidden layer on each indicator. However, since the data set has a certain periodicity (electricity consumption is higher in the first 5 days and relatively low in the following two days), feature extraction is also easy for an ordinary neural network model. Thus the performance of the multi-layer perceptron model (MLP) is also good, with an F1 score of 88.2 percent.
The loop sensor data set is also closely tied to temporal order. However, compared with the power data set, its characteristics are less obvious: its features are harder to mine, and its abnormal and normal time series are harder to tell apart. Accordingly, Table 2 shows that both comparison methods give unsatisfactory results on this data set; in particular, the precision of anomaly detection for the multi-layer perceptron model (MLP) is poor, reaching only 79 percent. Nevertheless, on this data set the LSTM-Gauss-NBayes model proposed in this paper is superior to the other two methods, with every indicator far ahead of theirs.
The land sensor data set does not consistently depend on chronological order; in other words, it is difficult to predict future values given only past ones. Nevertheless, as can be seen from Table 2, our method's results are unexpectedly good, with every indicator reaching a high value. From the F_1 scores in Table 2, the long short-term memory neural network (LSTM NN) model and the multi-layer perceptron (MLP) model are less effective for anomaly detection on this data set, possibly because the data set's features are not obvious and its values fluctuate frequently within a range. It can also be found in Table 2 that on this data set the MLP achieves a better precision of anomaly detection than the LSTM NN, which may be because this time series does not exhibit strong time dependence.
5. Conclusion
In this paper, we propose a combination of a long short-term memory neural network and a naive Bayes model with Gaussian distributions for anomaly detection. The LSTM neural network models normal time series behavior, and its prediction errors are then fed into the Gaussian naive Bayes model to detect anomalies. The LSTM-Gauss-NBayes method produces good results on three real-world datasets that together exhibit long-term, short-term, and even very weak time dependence. Compared with the LSTM NN model and the MLP model, our model always gives better or comparable results, indicating that it is robust. Future work may consider replacing the per-attribute Gaussian distributions of the naive Bayes model with multidimensional Gaussian discriminant analysis, which can exploit the relationships between the attributes of the error data set through the covariance matrix. In general, the LSTM-Gauss-NBayes method is a viable candidate in the field of anomaly detection.
References
[1] Y. Zhang, L. Bao, S. Yang, M. Welling and D. Wu, "Localization Algorithms for Wireless
Sensor Retrieval," in The Computer Journal, vol. 53, no. 10, pp. 1594-1605, Dec. 2010.
[2] Z. Qin, D. Wu, Z. Xiao, B. Fu and Z. Qin, "Modeling and Analysis of Data Aggregation From
Convergecast in Mobile Sensor Networks for Industrial IoT," in IEEE Transactions on
Industrial Informatics, vol. 14, no. 10, pp. 4457-4467, Oct. 2018.
[3] D. Wu et al., "Towards Distributed SDN: Mobility Management and Flow Scheduling in
Software Defined Urban IoT," in IEEE Transactions on Parallel and Distributed Systems, vol.
31, no. 6, pp. 1400-1418, 1 June 2020.
[4] F. Shi, D. Wu, D. I. Arkhipov, Q. Liu, A. C. Regan and J. A. McCann, "ParkCrowd: Reliable
Crowdsensing for Aggregation and Dissemination of Parking Space Information," in IEEE
Transactions on Intelligent Transportation Systems, vol. 20, no. 11, pp. 4032-4044, Nov. 2019.
[5] M. Basseville and I. V. Nikiforov, “Detection of abrupt changes: theory and application,”
Prentice Hall, 1993.
[6] Alex Graves, “Supervised sequence labelling with recurrent neural networks,” Heidelberg,
Springer, vol. 385, 2012.
[7] Razvan Pascanu, Tomas Mikolov, and Yoshua Bengio, “On the difficulty of training recurrent
neural networks,” arXiv: 1211.5063, 2012.
[8] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural computation, vol. 9,
no. 8, 1997, pp. 1735–1780.
[9] D. Wu, Z. Jiang, X. Xie, X. Wei, W. Yu and R. Li, "LSTM Learning With Bayesian and
Gaussian Processing for Anomaly Detection in Industrial IoT," in IEEE Transactions on
Industrial Informatics, vol. 16, no. 8, pp. 5244-5253, Aug. 2020.
[10] Jensen, F. V., “An introduction to Bayesian networks,” London, UCL press, vol. 210, 1996,
pp. 1-178.
[11] Geoffrey E. Hinton, Nitish Srivastava, Alex Krizhevsky, Ilya Sutskever, Ruslan R.
Salakhutdinov., “Improving neural networks by preventing co-adaptation of feature
detectors,” arXiv: 1207.0580, 2012.
[12] George E. P. Box, Gwilym M. Jenkins, Gregory C. Reinsel, Greta M. Ljung., “Time series
analysis: forecasting and control (Fifth Edition),” John Wiley & Sons, 2015.
[13] Alex Graves, Navdeep Jaitly., “Towards End-to-End Speech Recognition with Recurrent
Neural Networks,” International Conference on Machine Learning 2014, Beijing, China,
June 2014, pp. 1764-1772.
[14] X. Zhang, F. Yin, Y. Zhang, C. Liu and Y. Bengio, "Drawing and Recognizing Chinese
Characters with Recurrent Neural Network," in IEEE Transactions on Pattern Analysis and
Machine Intelligence, vol. 40, no. 4, pp. 849-862, 1 April 2018.
[15] A. Karpathy and L. Fei-Fei, "Deep Visual-Semantic Alignments for Generating Image
Descriptions," in IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39,
no. 4, pp. 664-676, 1 April 2017.