Artificial Intelligence Algorithm For Optimal Time
Digital Object Identifier 10.1109/ACCESS.2020.2981488
School of Humanities and Media, Pingxiang University, Pingxiang Jiangxi, 337055, China.
ABSTRACT A large number of studies on the modeling, simulation, and prediction of time series data skip model selection and directly apply a particular model to the analysis. To address this limitation, three artificial intelligence models often applied to time series analysis, the hidden Markov model, the artificial neural network model, and the autoregressive moving average model, are studied, and model selection based on a simulation comparison method is investigated. Through the study of nonlinear integration methods, in which intelligent-system techniques are used to learn the weighting pattern, both the generalization ability of the model and its degree of fit to the sample data are significantly improved. At the same time, numerical simulations are performed on the various models, and the characteristics of the time series each model generates are investigated; based on these characteristics, a theory and an algorithm of model selection are proposed and applied in an empirical analysis. For the artificial intelligence models commonly used in time series analysis, such as the autoregressive moving average model, the artificial neural network model, and the hidden Markov model, the simulation comparison method can be used when selecting a research model. The experimental results show that the time series data generated by the different models have different mathematical and physical characteristics, which provides a basis for model selection, and that the selection theory is practical: the model selected by the theory shows a good fit and good prediction performance.
INDEX TERMS Time series; data model; artificial intelligence algorithm; weight pattern; generalization
ability
series. The prediction accuracy of multilayer perceptron and improve the robustness of the iterative estimation model,
(1VB, P), radial basis function network (RBF) and the multiple support vector regression (MSVR) model is used
conditional heteroscedasticity model is empirically compared. in the research of iterative time series analysis [43-46], and it
The results show that in the prediction of exchange rate time is used in three benchmark data sets. The performance of the
series, neural network model and conditional model is compared with SVR model and SVR direct model,
heteroscedasticity model are each of them can give effective which proves the validity of the MSVR model. A multi-
predictions [11-14], but it is clear that the overall output support vector regression based on the multi-input
performance of the neural network is better than the output strategy is proposed creatively [47-49], namely M-
conditional heteroscedastic model. However, neural networks SVR based on the 1VBM0 strategy, and the effectiveness of
tend to fall into a local minimum in the time series prediction the method is simulated with the help of simulated data sets
problem. In response to this defect, BPNN and Adaptive and real data sets. In addition, from the perspective of
Differential Evolution Algorithm (ADE) are combined to prediction accuracy and calculation cost, the performance of
present a hybrid model [15-19], namely ADE- BPNN to three SVR models based on different strategies is compared
improve the goodness of fit of sample data for time series and analyzed. The analysis results show that among the three
analysis. And using two real data sets, the operability and models compared, the M-SVR based on the 1VBM0 strategy
good performance of the proposed hybrid method are is acceptable. At the cost of computing, the best model
confirmed, and the proposed ADE-BPNN method can accuracy can be achieved in multi-step timing analysis
significantly increase the good fitting performance compared problems. Many application literatures directly apply one or
with separate models such as BPNN or ARIMA. Time series more of the models to directly analyze time series data
forecasting has always been a research problem of interest in without establishing a more comprehensive model selection
many application fields, such as stock price forecasting, theory. Before establishing a time series model, analyze and
temperature forecasting, hydrological time series forecasting, compare which data is suitable for use. Class model. In fact,
power load forecasting, network traffic forecasting, and so on. time series data are very different in terms of their own
Time series prediction is to predict the future data or trends characteristics. For example, in terms of autocorrelation,
from historical and current data by analyzing the rules or there are short-term, medium-term, and long-term differences.
trends of time series over time. Time series prediction A model that can only describe short-term correlation is
methods include classical time series analysis [20-23], neural obviously used to analyze time series with long correlation.
networks [24-26], and expert systems [27-30]. Time series The data is inappropriate.
prediction can be performed by mining time series, When model-based data mining methods are used to
discovering sequence rules, and using rule knowledge to process streaming data, the challenging problems are:
predict. An algorithm for discovering sequence rules was automatic selection of models, easy updating of models, and
proposed. The idea of Apriori algorithm in association training samples that appear in streaming form. In the process
analysis was used to mine sequence rules. Three algorithms, of streaming data, data-driven models have a lot of
AprioriAll, AprioriSome, and DynamicSome, were proposed limitations, because the parameters of these models only
[31-33]. An algorithm for finding numerical association rules appear as fitted parameters, without considering the internal
from multiple synchronized data streams is proposed [34]. mechanism that constitutes the data. Time series data are
The clustering method is used to symbolize the time series very different in terms of their own characteristics. For
first, a sequence pattern mining algorithm is used to find the example, in terms of autocorrelation, there are short-term,
rules in the symbol [35, 36]. An evolutionary rule based on medium-term, and long-term differences. A model that can
expert system was proposed, which combined fuzzy logic only describe short-term correlation is obviously used to
and rule inference for the analysis of stock market activities analyze time-series data with long correlation suitable. The
[37, 38]. Using the methods of fuzzy logic, ANN, and purpose of this article is to answer what kind of time series
evolutionary computation, the trend of the Nasdaq-100 index the above three models can describe, so as to select the time
value and the Nasdaq-100 index of six other companies were series data analysis model based on this, and establish a
predicted [39]. A time series association rule discovery computer intelligent time series model selection method. The
algorithm based on the cross-correlation succession tree selection theory of this model is helpful to guide researchers
model was proposed. The method of using sequence rules to to better perform data pre-analysis and pre-processing, and
make predictions is limited by the knowledge of domain improve the efficiency of establishing time series analysis
experts and has certain limitations [40-42]. In general models. Based on computer intelligence, the analysis of time
iterative methods in timing analysis, multi-step estimation is series data is better promoted, making the modeling more
an iteration based on one-step estimation. However, even if targeted and the results more accurate.
the one-step prediction model is very accurate, repeating the
iterative process of one-step prediction will accumulate II. CHARACTERISTIC ANALYSIS OF TIME SERIES
prediction errors, resulting in poor prediction performance. In DATA GENERATED BY VARIOUS MODELS
order to reduce the cumulative error in the iterative process
Through this discussion, we find that the data generated by these three types of models obey the following rules. The data generated by a hidden Markov model is first-period correlated and, by the KPSS stationarity test, non-stationary. The data generated by a neural network model is long-term correlated, and its stationarity depends on the structure of the model. The data generated by an autoregressive moving average model is short-term correlated and stationary (under certain parameter conditions). Based on this description, the model selection theory here rests on the following premise: data suitable for a model should be consistent with the characteristics of the data that model generates. This is a necessary condition, because if a time series is long-term correlated it cannot have been generated by a hidden Markov model, and if a hidden Markov model is nevertheless fitted to such data, there will not be a good modeling and prediction effect. Therefore, the following model selection algorithm is proposed, as shown in Table 1:

TABLE 1 TIME SERIES DATA MODEL SELECTION PROGRAM PSEUDO CODE
(1) Input data;
(2) Perform correlation test and stationarity test on the data;
If the data is first-period correlated then …

The class of linear models for stationary series includes the autoregressive model (AR model), the moving average model (MA model), and the hybrid model (ARMA model). In essence, the ARMA method is a linear model with finitely many parameters. The principle framework of stationary time series analysis that meets this condition has been perfected, and it is widely used in many fields. The ARMA model uses a linear model with finitely many parameters to characterize the autocorrelation of a time series. Not only is this conducive to sequence analysis and structure processing, but the finite-parameter linear model can also describe very common random phenomena, and the accuracy of the actual fit can meet practical needs. In addition, linear prediction theory can be extended from the structure of the finite-parameter linear model. Therefore, the study of timing analysis problems based on the ARMA model has a theoretically important position in the fields of signal processing, economic prediction, state estimation, control, and pattern recognition. In the conditional-heteroscedasticity form adopted later in this paper, the series $r_t$ can be written as

$$r_t = \varphi_0 + \sum_{i=1}^{M}\varphi_i r_{t-i} + \varepsilon_t + \sum_{i=1}^{R}\theta_i \varepsilon_{t-i} \qquad (5)$$

$$\varepsilon_t = u_t\sqrt{h_t} \qquad (6)$$

$$h_t = \alpha_0 + \sum_{i=1}^{p}\alpha_i \varepsilon_{t-i}^2 + \sum_{j=1}^{q}\beta_j h_{t-j} \qquad (7)$$
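Following the rules stated above (hidden Markov model: first-period correlated and non-stationary; ARMA: short-term correlated and stationary; neural network: long-term correlated), a minimal sketch of the Table 1 selection program might look as follows. The statsmodels tests, the lag windows, and the 0.2 autocorrelation cutoff are illustrative assumptions, not values from the paper:

```python
import numpy as np
from statsmodels.tsa.stattools import acf, kpss

def select_model(x, short_lag=5, long_lag=50, threshold=0.2):
    """Sketch of the Table 1 selection program: route a series to
    HMM, ARMA, or ANN by its autocorrelation span and stationarity.
    The lag windows and the 0.2 cutoff are illustrative assumptions."""
    r = acf(x, nlags=long_lag, fft=True)            # sample autocorrelation
    _, p_value, *_ = kpss(x, regression="c", nlags="auto")
    stationary = p_value > 0.05                     # KPSS: H0 = stationary

    if abs(r[1]) > threshold and all(abs(r[2:short_lag]) < threshold) and not stationary:
        return "hidden Markov model"                # first-period related, non-stationary
    if all(abs(r[long_lag - 5:]) < threshold) and stationary:
        return "ARMA model"                         # short-term correlation, stationary
    if any(abs(r[short_lag:]) > threshold):
        return "neural network model"               # long-term correlation
    return "inconclusive: compare candidate fits"

# Example: an AR(1) series should route to the ARMA model.
rng = np.random.default_rng(0)
x = np.empty(500); x[0] = 0.0
for t in range(1, 500):
    x[t] = 0.6 * x[t - 1] + rng.normal()
print(select_model(x))
```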
The network can contain multiple hidden layers, and the number of nodes or neurons in a hidden layer may be larger or smaller; the larger the number, the more pronounced the nonlinearity of the neural network and the more robust the network model.

For a neural network with an n-m-l structure, that is, n input nodes, m hidden nodes, and l output nodes, in the forward propagation process the input signal first flows into the input layer and then passes through each hidden layer in turn; finally it is transmitted to the output layer, where an output signal is formed. If the output signal meets the given output requirement, the calculation ends; otherwise, training turns to its second stage, the back-propagation of the error. The forward propagation is computed as follows. For the hidden layer:

$$net_j = \sum_{i=1}^{n} v_{ij} x_i + \theta_j, \qquad y_j = f(net_j), \qquad j = 1, 2, \ldots, m \qquad (8)$$

For the output layer:

$$net_k = \sum_{j=1}^{m} w_{jk} y_j + \theta_k, \qquad o_k = f(net_k), \qquad k = 1, 2, \ldots, l \qquad (9)$$

The excitation functions of both the hidden layer and the output layer are the unipolar sigmoid $f(x) = 1/(1 + e^{-x})$, which is continuously differentiable with $f'(x) = f(x)[1 - f(x)]$. When the actual output of the network is not consistent with the expected output, an output error E is produced:

$$E = \frac{1}{2}\sum_{k=1}^{l}(d_k - o_k)^2 \qquad (10)$$

Expanding this error back to the hidden layer gives

$$E = \frac{1}{2}\sum_{k=1}^{l}\left[d_k - f(net_k)\right]^2 \qquad (11)$$

The network error is a function of the weights $w_{jk}$ and $v_{ij}$ of each layer, so the magnitude of E changes as the weights change. When correcting the weights, the error should decrease as fast as possible, so the weights are corrected along the negative gradient direction; that is, each weight correction is proportional to the negative gradient of the error:

$$\Delta w_{jk} = -\eta\frac{\partial E}{\partial w_{jk}}, \; j = 1, \ldots, m, \; k = 1, \ldots, l; \qquad \Delta v_{ij} = -\eta\frac{\partial E}{\partial v_{ij}}, \; i = 1, \ldots, n, \; j = 1, \ldots, m \qquad (12)$$

where $\eta$ is the learning rate, a preset constant, usually $0 < \eta < 1$.

Suppose there are m p-dimensional vectors $(x_1, x_2, \ldots, x_m)$. The set of all possible linear combinations of these vectors, namely $k_1 x_1 + \cdots + k_m x_m$, forms a linear space, called the space spanned by x. Similarly, a vector composed of different lag orders of the original time series can be expanded into a new vector space, whose generating vectors form a set of basis of the prediction space. Proper selection of the basis of the prediction space can help solve many problems encountered in the application of neural network methods. Specifically, an appropriate set of basis not only captures the potential characteristics of the input variables but also avoids the computational difficulties caused by the non-uniqueness or multicollinearity of the parameters. Therefore, principal component analysis (PCA), a dimensionality-reduction technique whose goal is to find a few principal components that explain most of the sample variance, becomes a natural choice. In PCA, the original variables are transformed into new variables that are orthogonal to each other, which greatly simplifies the calculation, especially when the original variables are highly correlated. In addition, after the dimension reduction some noise may be removed and more of the information carrying the fundamental characteristics is retained, which benefits the subsequent data analysis.

Regarding the first part of the BPNN model, namely determining the connection weights between the input layer and the hidden layer, the PCA-based solution involves two aspects. First, the initial matrix of weights, or loadings, contains the correlations of all variables and factors; these factor loadings represent the degree of agreement between the variables and the principal components. Second, by associating a subset of the original variables with a principal component, the resulting variables reflect the characteristics of the data-generating process.

In addition, mapping raw data into a low-dimensional space can greatly improve the performance of pattern recognition or prediction compared with working in the high-dimensional space. Although the mapping may lose some information, the ultimate goal is to build a new set of basis that minimizes the number of variables while still spanning the original data space. PCA is well suited to this goal because, as a dimension-reduction method, it extracts the main features of the prediction space while reducing its dimension. Moreover, the proportion of the variation of the prediction space explained by each principal component yields a natural ordering of the input variables, which allows nonlinear functions of the original variables to be considered without losing overall degrees of freedom in the parameter estimation process.

The BPNN model likewise modifies the model parameters with the goal of minimizing the sum of squared errors. For testing purposes, an additive Gaussian error structure is assumed during parameter estimation:

$$y_t = g(r_t, \theta) + \varepsilon_t \qquad (13)$$
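As a concrete illustration of Eqs. (8)-(12), the following is a minimal NumPy sketch of one training step of the forward pass and gradient update for an n-m-l network with sigmoid activations. The network sizes, learning rate, and data are illustrative, and the bias terms $\theta$ are omitted for brevity:

```python
import numpy as np

def sigmoid(x):                          # unipolar sigmoid, f'(x) = f(x)(1 - f(x))
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(1)
n, m, l, eta = 4, 3, 1, 0.5              # n-m-l structure, learning rate 0 < eta < 1
V = rng.normal(scale=0.5, size=(n, m))   # input-to-hidden weights v_ij
W = rng.normal(scale=0.5, size=(m, l))   # hidden-to-output weights w_jk

def train_step(x, d):
    # Forward pass, Eqs. (8)-(9): y_j = f(net_j), o_k = f(net_k).
    y = sigmoid(x @ V)                   # hidden-layer output y_j
    o = sigmoid(y @ W)                   # network output o_k
    # Output error, Eq. (10): E = 1/2 * sum_k (d_k - o_k)^2.
    E = 0.5 * np.sum((d - o) ** 2)
    # Back-propagation, Eq. (12): corrections proportional to -dE/dw.
    delta_o = (d - o) * o * (1 - o)          # output-layer error term
    delta_y = (delta_o @ W.T) * y * (1 - y)  # hidden-layer error term
    W += eta * np.outer(y, delta_o)          # Delta w_jk = -eta * dE/dw_jk
    V += eta * np.outer(x, delta_y)          # Delta v_ij = -eta * dE/dv_ij
    return E

x, d = rng.random(n), np.array([1.0])
for _ in range(200):
    E = train_step(x, d)
print(f"final error {E:.6f}")            # error shrinks toward zero
```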
In the stage of neural network model structure selection, it is necessary to evaluate the contribution of each basis variable from the perspective of explanatory ability. This can be achieved by choosing between two specific model settings, that is, by using a hypothesis test to determine the optimal number of hidden nodes in the neural network:

$$H_0: y_t = g(r_t, \theta) + \varepsilon_t \qquad H_1: y_t = g(r_t, \theta) + h(r_t, \varphi) + \varepsilon_t \qquad (14)$$

C. EMD-LSSVM model
When implementing the EMD decomposition, two prerequisites must first be met: first, the number of local extreme values of the time series equals the number of zero crossings, or the two differ by at most 1; second, the local mean of the sequence is zero, that is, the time series signal is locally symmetric about the time axis, so that the upper envelope generated by the local maxima and the lower envelope generated by the local minima have a mean of zero. A data sequence x(t) (t = 1, 2, ..., n) can be EMD-decomposed by the following process:
(1) Find all local extreme values in x(t). Use cubic spline interpolation to connect all local maxima into the upper envelope $x_u(t)$; similarly, connect all local minima into the lower envelope $x_l(t)$.
(2) Calculate the mean envelope from the upper and lower envelopes obtained in (1):

$$m_1(t) = \frac{x_u(t) + x_l(t)}{2} \qquad (15)$$

(3) Subtracting the mean envelope $m_1(t)$ from the original sequence x(t) generates the first component $d_1(t)$:

$$d_1(t) = x(t) - m_1(t) \qquad (16)$$

(4) Check whether $d_1(t)$ meets the requirements of an intrinsic mode function, using the sifting criterion

$$SD = \sum_{t=1}^{n}\frac{[d_{k-1}(t) - d_k(t)]^2}{d_{k-1}^2(t)} \qquad (17)$$

When the number of training samples is small, the confidence range grows with the VC dimension of the learning machine; that is, the deviation of the actual risk from the empirical risk gradually increases. Therefore, choosing a learning machine that is too complex, that is, a neural network with too high a VC dimension, often fails to give good results. This "over-learning" phenomenon occurs mainly because, in a small-sample situation, an improperly designed network structure or algorithm leads to a large confidence range: even though the empirical risk may be small, the enlarged confidence range greatly reduces the generalization ability.

Within this theoretical framework of generalization, a new strategy, different from the empirical risk minimization criterion, has been proposed in statistical learning. Specifically, a sequence of function subsets is first constructed from the function set and ordered by VC dimension; then the minimum empirical risk within each subset is found, and finally the subset with the smallest sum of minimum empirical risk and confidence range among all subsets is selected. The SVM transforms the problem of finding an optimal hyperplane between two different classes into a maximum-margin classification problem, and the maximum-margin problem is in fact a quadratic programming problem with inequality constraints.

D. Nonlinear integrated prediction model
Assume a training sample set $(x_u, y_u)$, u = 1, 2, ..., m; our goal is to find the most appropriate functional relationship f = f(x) for prediction. Suppose n separate prediction techniques are available and, for any x, the output of the i-th technique is $f_i(x)$. We combine the n separate techniques into an integrated prediction, whose general form is

$$f(x) = \sum_{i=1}^{n} w_i f_i(x) \qquad (18)$$

where f(x) is the integrated prediction result and $w_i$ (i = 1, 2, ..., n) is the weight of each individual prediction technique in the integration.

The nonlinear integrated prediction model does not simply assign fixed weights to the prediction results of each integrated member; rather, it learns the weight pattern by means of artificial intelligence so as to make the most of the information contained in the data after the single prediction models have been fitted, and this is reflected in the learned weights. In this respect it is better than the linear integration model. To sum up, the nonlinear integrated prediction model has certain advantages over the other models mentioned above in terms of fitting accuracy and generalization ability. Following the notation above, the general steps of nonlinear integrated modeling can be summarized as follows:
① Take the data set: obtain the expected output d(x) and the prediction results of the individual models constituting the integrated prediction model, that is, $f_i(x)$, i = 1, 2, ..., n; these constitute the data set of the nonlinear integrated modeling, which is divided into two parts, a training set and a test set;
② Train the weight pattern: use the training set obtained in step ① and a nonlinear technique, such as the artificial intelligence methods used in this article (BPNN and SVM), to train the weight pattern of the nonlinear integrated prediction model, determining the weight of each individual prediction technique, that is, $w_i(x)$, i = 1, 2, ..., n, and finally determine the optimal model structure;
③ Test the model performance: use the test set obtained in step ① and the optimal structure obtained in step ② to test the nonlinear prediction model, quantifying the prediction effect or performance of the nonlinear integrated model.

Generally, a nonlinear integrated prediction model can be regarded as a nonlinear information processing system. Assuming that the prediction results of the n individual prediction techniques are $y_i$, i = 1, 2, ..., n, the nonlinear integrated prediction model in this paper can be described by

$$y = g(y_1, y_2, \ldots, y_n) \qquad (19)$$

where g is a nonlinear function and $(y_1, y_2, \ldots, y_n)$ is the input vector of the model. In the BPNN nonlinear integrated prediction model, the weights of the integrated model are determined by the BPNN to realize this nonlinear mapping. In this case, the input of the neural network is the prediction result $y_i$ of each individual prediction technique, the model output is the result of the BPNN nonlinear integrated prediction, and the expected output is the corresponding true sample value.

IV. HMM-based time series artificial intelligence algorithm
Time series differ from static data in that their values change over time. Time series exist in a wide range of fields, from scientific computing, engineering, business, finance, economics, and health care to government departments. Cluster analysis of time series has also been studied extensively; these studies include clustering based on the original time series, feature-based time series clustering, and model-based time series clustering.

A. Time series clustering
Common time series clustering algorithms mainly include the partitioning (dividing) method and the hierarchical method. Partition-based clustering randomly selects k objects as initial class centers, computes the distance from each object to the class centers, assigns each object to the nearest class, then recalculates the new class centers, and so on until the assignments no longer change. Hierarchical clustering organizes data objects into a tree; according to whether the hierarchy is built top-down or bottom-up, it is divided into splitting and agglomerative types. The splitting method treats all objects as belonging to one class and gradually splits downward into more and smaller classes until each object forms its own class or an end condition is met. The agglomerative method treats each object as an independent class and merges data objects from the bottom up until a certain end condition is met or all objects have been merged into one class. By mapping the data sequence space to a model space, various existing clustering algorithms can be applied in the model space. A clustering algorithm combining partitioning and hierarchy is proposed here; Table 2 and Table 3 describe the HMM-based hierarchical clustering of time series and the HMM-based combined partitioning-hierarchical clustering, respectively.

TABLE 2 HIERARCHICAL CLUSTERING ALGORITHM BASED ON HMM
Input: $O = \{O_1, O_2, \ldots, O_n\}$
Output: results of clustering
Method:
1) Train each sequence $O_i$ as an HMM $\lambda_i$;
2) Construct the distance matrix $D = D(O_i, O_j)$ from the likelihood $P(O_i \mid \lambda_j)$ or the distance between the models;
3) Use the agglomerative hierarchical clustering algorithm to cluster by the distance matrix D.

TABLE 3 CLUSTERING ALGORITHM BASED ON HMM-BASED PARTITIONING AND LAYERING
Input: $O = \{O_1, O_2, \ldots, O_n\}$
Output: results of clustering
Method:
1) Class division: divide the set of time series into k clusters;
2) Train each cluster as an HMM $\lambda_i$;
3) Construct the distance matrix $D = D(\lambda_i, \lambda_j)$;
4) Use the agglomerative hierarchical clustering algorithm to cluster by the distance matrix D.

Suppose G and C are data sets with k classes each. The similarity measure of a clustering is defined as

$$Sim(G, C) = \frac{1}{k}\sum_{i=1}^{k}\max_{1 \le j \le k} Sim(G_i, C_j) \qquad (20)$$

The HBHCTS (HMM-Based Hierarchical Clustering Time-Series) algorithm is mainly divided into three parts: the formation of the initial partitions, hierarchical aggregation, and the automatic selection of clustering results. The initial partition is formed by scanning the time series set in a single pass and comparing the currently accessed time series with the existing models (partitions): if a suitable model exists, the series is added to it; otherwise a new model is created. Whether a model is suitable is judged by a minimum-distance threshold. We can obtain relevant prior knowledge by examining the distribution of this threshold, which is easier to determine than specifying the initial number of partitions in advance. After the initial partitions are formed, hierarchical clustering is used to merge them. The evaluation of the clustering results is similar to the Dunn index method, and the largest value indicates the optimal clustering result. The algorithm flowchart is shown in Figure 6.
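A minimal sketch of the Table 2 procedure is given below, assuming hmmlearn's GaussianHMM and SciPy's agglomerative clustering; the symmetrized negative cross log-likelihood used as the distance $D(O_i, O_j)$ is one common choice, not necessarily the paper's exact metric:

```python
import numpy as np
from hmmlearn.hmm import GaussianHMM
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

def hmm_hier_cluster(seqs, n_states=3, n_clusters=2):
    """Table 2 sketch: one HMM per sequence, likelihood-based distances,
    then agglomerative clustering on the distance matrix."""
    models = []
    for s in seqs:                                   # 1) train lambda_i on O_i
        m = GaussianHMM(n_components=n_states, n_iter=50, random_state=0)
        m.fit(s.reshape(-1, 1))
        models.append(m)
    n = len(seqs)
    D = np.zeros((n, n))                             # 2) distance matrix
    for i in range(n):
        for j in range(i + 1, n):
            # Symmetrized negative cross log-likelihood as D(O_i, O_j).
            d = -(models[i].score(seqs[j].reshape(-1, 1))
                  + models[j].score(seqs[i].reshape(-1, 1))) / 2
            D[i, j] = D[j, i] = d
    D -= D.min()                                     # shift so distances are >= 0
    np.fill_diagonal(D, 0.0)
    Z = linkage(squareform(D), method="average")     # 3) agglomerative clustering
    return fcluster(Z, t=n_clusters, criterion="maxclust")

rng = np.random.default_rng(3)
seqs = [rng.normal(0, 1, 200) for _ in range(4)] + [rng.normal(5, 1, 200) for _ in range(4)]
print(hmm_hier_cluster(seqs))                        # two groups expected
```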
FIGURE 7. Corrected-rate of different clustering methods on the synthesized data
FIGURE 8. Histogram of the random variable x

In the above experiment, the distance threshold sfit of HBHCTS was set to 0.08. This threshold decides whether to build a new model for a new sequence, which is achieved by calculating the distance between the sequence and the existing HMM models. If the threshold is set too large, sequences of different classes are merged by mistake; the smaller the threshold, the larger the number of classes generated in the initial partition, which increases the computational complexity of the subsequent hierarchical clustering.

Experiments show that the distribution has the characteristics of a normal distribution. As shown in Figure 8, its statistics are a mean of 0.007, a standard deviation of 0.025, minimum and maximum values of -3.2 and 3.48, and a kurtosis value of 19.97. According to the 3σ rule of the normal distribution, the points that fall into the interval $(u - 3\sigma, u + 3\sigma)$ account for about 99.7% of the entire distribution, so these points can be considered to belong to a normal distribution at a high confidence level. We therefore set the threshold sfit to 3σ, which merges the sequences as much as possible while ensuring the correctness of the initial partitioning of the clustering algorithm.

Regarding the distribution of the random variable X, we have also experimented with other models. Three models are selected. The first is an HMM with five hidden states, where the mean of each hidden state's output distribution is randomly set between 0 and 5 and the variance randomly between 0 and 1. The second model is similar to the first, except that the variance is randomly set between 0 and 10. The third model is the previously mentioned HMM 1 model. Random sequences are generated from these models to count the distribution of the random variable X, the kurtosis of X is calculated, and the random experiment is repeated 10 times. The experimental results are shown in Table 5: the kurtosis values are all close to 3.0. It can be seen that, for the sequences of all three models, the random variable x approximately follows a normal distribution.

TABLE 5 KURTOSIS VALUES OF THE RANDOM VARIABLE X OBTAINED FROM DIFFERENT SEQUENCES
Experiment | Model 1 | Model 2 | Model 3
1 | 3.0466 | 2.9967 | 3.0439
2 | 3.0712 | 3.1822 | 3.1086
3 | 2.9072 | 3.0495 | 3.0819
4 | 3.1131 | 3.1148 | 3.0132
5 | 3.0039 | 3.0066 | 3.0261
6 | 3.0049 | 3.1095 | 3.0485
7 | 3.0661 | 3.0849 | 2.9713
8 | 3.0333 | 3.0272 | 2.9836
9 | 3.0983 | 3.0592 | 2.9796
10 | 2.9786 | 2.9831 | 3.0326
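The kurtosis-based normality judgment behind Table 5 can be reproduced along the following lines, assuming SciPy's kurtosis with Pearson's definition (a normal law has kurtosis 3); the stand-in Gaussian data and trial count are illustrative:

```python
import numpy as np
from scipy.stats import kurtosis

rng = np.random.default_rng(4)
for trial in range(3):                        # repeat random experiments as in Table 5
    x = rng.normal(size=5000)                 # stand-in for the model-generated variable X
    k = kurtosis(x, fisher=False)             # Pearson kurtosis; ~3.0 for a normal law
    mu, sigma = x.mean(), x.std()
    inside = np.mean(np.abs(x - mu) <= 3 * sigma)   # 3-sigma rule: ~99.7% inside
    print(f"trial {trial}: kurtosis={k:.4f}, within 3 sigma={inside:.4f}")
```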
If, in the initial partition, sequences of different classes are divided into the same region, wrong partitioning is introduced. Because hierarchical divisive clustering, as in the HBHCTS and Hier-moHMMs methods, does not consider class splitting, it is important that the initial partition places sequences belonging to the same class in the same region. In order to test the effectiveness of HBHCTS, we performed experiments on the error rate of the partition sequences of the initial partition of HBHCTS. As shown in Figure 9, as the distance threshold increases, the initial partition error rate also increases.
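For concreteness, the single-pass initial partitioning described above might look like the following sketch; the per-sample negative log-likelihood used as the distance, and hence the scale of the sfit threshold, are illustrative assumptions rather than the paper's exact metric:

```python
import numpy as np
from hmmlearn.hmm import GaussianHMM

def initial_partition(seqs, sfit=0.08, n_states=3):
    """Single-pass initial partitioning: assign each series to the first
    existing HMM within distance sfit, otherwise start a new model."""
    models, partitions = [], []
    for s in seqs:
        X = s.reshape(-1, 1)
        # Distance of the series to each existing model (lower = closer).
        dists = [-m.score(X) / len(X) for m in models]
        if dists and min(dists) < sfit:
            partitions[int(np.argmin(dists))].append(s)   # join closest partition
        else:
            m = GaussianHMM(n_components=n_states, n_iter=30, random_state=0)
            m.fit(X)
            models.append(m)                              # create a new model
            partitions.append([s])
    return partitions

rng = np.random.default_rng(5)
seqs = [rng.normal(0, 1, 150) for _ in range(3)] + [rng.normal(8, 1, 150) for _ in range(3)]
print([len(p) for p in initial_partition(seqs, sfit=2.0)])   # expect two partitions of 3
```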
FIGURE 9. Number of initial partitions at different distance thresholds
FIGURE 11. Partial autocorrelation diagram of the time series

V. Experimental verification
In the empirical research of this paper, the models used, the ARMA-GARCH model, the BPNN model, the EMD-SVR model, and the nonlinear integrated prediction model, are all implemented in MATLAB (MathWorks).

According to the results of the autocorrelation analysis, the partial autocorrelation analysis, and the AIC criterion, the autoregressive order p of the ARMA model is set to 4 and the moving average order q to 6. The ARCH and GARCH orders of the GARCH model are selected as 3 and 2, respectively. The autocorrelation diagram is shown in Figure 10 below, and Figure 11 is the partial autocorrelation diagram.

In the S-BPNN model, in order to avoid the subjectivity of artificially selecting the number of input nodes, the number of input nodes is set to 6 according to the autocorrelation and partial autocorrelation analyses from the ARMA modeling process; that is, 6 input variables are selected. PCA was applied to these 6 variables simultaneously, and 6 principal components were obtained. The hypothesis test with a confidence level of 99% described in the previous section is used to determine whether a principal component remains in the model. In the test statistic L of the hypothesis test, n = 2264, s = 1, k = 3, and $SSE = \sum_{i=1}^{n}(d_i - o_i)^2$. As a result, two principal components remain in the network model of conditional mean prediction; in other words, the optimal network structure of the obtained neural network is 6-2-1.

By introducing PCA and hypothesis testing into the neural network model selection process, the resulting BPNN model has several satisfactory features. First, the BPNN model does not need any assumption about the functional relationship between lagged returns and future returns. Second, by orthogonalizing the input space, possible multicollinearity is eliminated and the uniqueness of the hidden nodes is guaranteed. Third, the step-by-step
selection process selects the most streamlined model, which ensures that the training data are not overfitted. Finally, it reduces the computational cost required to find the best model structure.

The RBF kernel function has two hyperparameters, C and γ, namely the penalty factor and the inverse of the Gaussian kernel bandwidth. For a specific problem the optimal values of C and γ naturally cannot be determined in advance, so it is necessary to perform model selection, that is, a search over the parameter pair (C, γ). Choosing the best-performing parameter pair from the many candidates is the ultimate goal of this model selection, where the best-performing pair is the one that enables the support vector machine to predict the test data most accurately. Among the methods for achieving this goal, cross-validation can avoid over-fitting and control the variance of the model performance, ensuring its stability; cross-validation methods are commonly used to determine tuning parameters and to compare model performance.

In the nonlinear integrated prediction model, the first 205 prediction sample values are used as the training set and the last 50 prediction sample values as the test set. In addition to the simple average integration, the S-BPNN and LSSVM are used to build the integrated time series analysis models; the baseline integrated prediction model is the arithmetic average of the individual results, that is, the simple average integrated model. The BPNN and SVM nonlinear integrated prediction models use the neural network method and support vector machine technology, respectively, to determine the weights of the three separate prediction models in the integrated model. Moreover, considering that the neural network method easily falls into a local optimum, when the S-BPNN nonlinear integrated modeling is performed, the program is run 100 times and the average of the results is taken as the final S-BPNN nonlinear integrated prediction model. The performance of the ARMA-GARCH model, the S-BPNN model, the EMD-LSSVM model, and the three integrated prediction models, namely the simple average integration model, the BPNN integrated time series analysis model, and the SVM integrated time series analysis model, is shown in Table 6.

TABLE 6 MODEL PERFORMANCE OF EACH TIME SERIES PREDICTION MODEL
Model | NMSE | Rank | Dstat | Rank
ARMA(4,6)-GARCH(3,2) | 3.2243 | 6 | 48% | 4
S-BPNN | 1.0086 | 5 | 46% | 5
EMD-LSSVM | 1.5324 | 4 | 54% | 2
Simple average integrated prediction (benchmark) | 1.2026 | 3 | 51% | 3
Nonlinear Integrated Prediction (BPNN) | 1.1129 | 2 | 52% | 6
Nonlinear Integrated Prediction (SVM) | 0.7384 | 1 | 63% | 1

As can be seen from Table 6, the three separate prediction models, the ARMA(4,6)-GARCH(3,2) model, the S-BPNN model, and the EMD-LSSVM model, are compared with the three integrated prediction models, the simple average integrated model, the BPNN integrated prediction model, and the SVM integrated prediction model. The prediction performance of the SVM integrated prediction model is the best among the six models: not only is its NMSE the smallest, but its Dstat is also the largest.

In the sample, the difference between the fitted data and the real data is shown in Figure 12; the fit is good, with a maximum difference of 0.8, and on average the differences fluctuate within ±0.8.

FIGURE 12. Differences between model data and real data

The following is a comparison group. Based on the data used above as the test set for the fitting set, the mechanism for establishing an autoregressive moving average model is as follows. First, the model order is determined: generally, the model with the smaller AIC and BIC is selected according to the AIC and Schwarz criteria. The lag orders for several cases are shown in Table 7.

TABLE 7 SELECTION OF LAG ORDER OF ARMA MODEL
Numbering | AR | MA | AIC | Schwarz
1 | 3 | 6 | -2.813 | -2.661
2 | 5 | 3 | -2.782 | -2.656
3 | 4 | 2 | -2.772 | -2.651
4 | 3 | 2 | -2.749 | -2.690
5 | 2 | 1 | -2.712 | -2.660

The following examines the effect of model extrapolation: the fitted model is applied to the remaining 19 data points, and the differences (real data minus model-generated data) are shown in Figure 13.
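A minimal sketch of the SVM-based nonlinear integration and the (C, γ) search described above is given below, assuming scikit-learn's SVR and GridSearchCV; the candidate grids, the synthetic stand-in series, and the exact NMSE/Dstat definitions are illustrative assumptions, while the 205/50 split follows the text:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVR

rng = np.random.default_rng(2)
y_true = np.sin(np.linspace(0, 20, 255)) + 0.1 * rng.normal(size=255)  # stand-in target
# Stand-ins for the three individual models' predictions f_i(x) in Eq. (18).
preds = np.column_stack([y_true + 0.2 * rng.normal(size=255) for _ in range(3)])

X_train, y_train = preds[:205], y_true[:205]   # first 205 values: training set
X_test, y_test = preds[205:], y_true[205:]     # last 50 values: test set

# Cross-validated search over the RBF hyperparameters C and gamma.
grid = GridSearchCV(
    SVR(kernel="rbf"),
    {"C": [0.1, 1, 10, 100], "gamma": [0.01, 0.1, 1]},
    cv=5, scoring="neg_mean_squared_error",
)
grid.fit(X_train, y_train)

resid = y_test - grid.predict(X_test)
nmse = np.mean(resid**2) / np.var(y_test)      # one common NMSE definition
# Dstat: share of steps whose direction of change is predicted correctly.
dstat = np.mean(np.sign(np.diff(y_test)) == np.sign(np.diff(grid.predict(X_test))))
print(grid.best_params_, f"NMSE={nmse:.3f}", f"Dstat={dstat:.2%}")
```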
REFERENCES
[16] Ling Teng, Junwu Zhu, Bin Li. A Voting Aggregation Algorithm for Optimal Social Satisfaction[J]. Mobile Networks & Applications, vol.23, pp.344-351, Sep 2017.
[17] R. Arunkumar, V. Jothiprakash, Kirty Sharma. Artificial Intelligence Techniques for Predicting and Mapping Daily Pan Evaporation[J]. Journal of the Institution of Engineers, vol.98, no.3, pp.219-231, Aug 2017.
[18] Mahmud M S, Meesad P. An innovative recurrent error-based neuro-fuzzy system with momentum for stock price prediction[J]. Soft Computing, vol.20, no.10, pp.4173-4191, June 2015.
[19] Andrzej Janusz, Marek Grzegorowski, Marcin Michalak. Predicting seismic events in coal mines based on underground sensor measurements[J]. Engineering Applications of Artificial Intelligence, vol.64, pp.83-94, Sep 2017.
[20] Mabel González, Christoph Bergmeir, Isaac Triguero. Self-labeling techniques for semi-supervised time series classification: an empirical study[J]. Knowledge & Information Systems, vol.55, pp.493-518, Aug 2017.
[21] Ozoegwu C G. The solar energy assessment methods for Nigeria: The current status, the future directions and a neural time series method[J]. Renewable & Sustainable Energy Reviews, vol.92, pp.146-159, Sep 2018.
[22] Selmo Eduardo Rodrigues Júnior, Ginalber Luiz de Oliveira Serra. A novel intelligent approach for state space evolving forecasting of seasonal time series[J]. Engineering Applications of Artificial Intelligence, vol.64, pp.272-285, Sep 2017.
[23] Diana M. Sánchez-Silva, Héctor G. Acosta-Mesa, Tania Romo-González. Semi-Automatic Analysis for Unidimensional Immunoblot Images to Discriminate Breast Cancer Cases Using Time Series Data Mining[J]. International Journal of Pattern Recognition and Artificial Intelligence, vol.32, no.1, pp.18604-18621, March 2018.
[24] Pritpal Singh. Neuro-Fuzzy Hybridized Model for Seasonal Rainfall Forecasting: A Case Study in Stock Index Forecasting[J]. Studies in Computational Intelligence, vol.611, pp.361-385, Aug 2016.
[25] Xuemin Xing, Debao Wen, Hsing-Chung Chang. Highway Deformation Monitoring Based on an Integrated CRInSAR Algorithm: Simulation and Real Data Validation[J]. International Journal of Pattern Recognition & Artificial Intelligence, vol.32, no.8, pp.185036-185051, May 2018.
[26] Xiao-Xia Yin, Sillas Hadjiloucas, Yanchun Zhang. Pattern identification of biomedical images with time series: Contrasting THz pulse imaging with DCE-MRIs[J]. Artificial Intelligence in Medicine, vol.67, no.3, pp.1-23, Feb 2016.
[27] Wanjawa B W. Evaluating the Performance of ANN Prediction System at Shanghai Stock Market in the Period[J]. vol.5, no.3, pp.124-145, Dec 2016.
[28] Hirata T, Kuremoto T, Obayashi M, et al. Deep Belief Network Using Reinforcement Learning and Its Applications to Time Series Forecasting[J]. Lecture Notes in Computer Science, vol.6, no.8, pp.30-37, Sep 2016.
[29] Atencia M, Sandoval F, Prieto A. Advances in computational intelligence: Selected and improved papers of the 12th International Work-Conference on Artificial Neural Networks (IWANN 2013)[J]. vol.164, no.21, pp.1-4, Sep 2015.
[30] Patty Kostkova, Jane Mani-Saada, Gemma Madle. Agent-Based Up-to-date Data Management in National electronic Library for Communicable Disease[J]. Concurrency Practice and Experience, vol.34, pp.105-124, May 2018.
[31] Wen J, Kow Y M, Chen Y. Online games and family ties: Influences of social networking game on family relationship[J]. Lecture Notes in Computer Science, vol.6948, pp.250-264, Sep 2017.
[32] Felix G, Gonzalo Nápoles, Falcon R, et al. A Review on Methods and Software for Fuzzy Cognitive Maps[J]. Artificial Intelligence Review, vol.52, pp.1707-1737, Aug 2017.
[33] Miquel L. Alomar, Vincent Canals, Nicolas Perez-Mora. FPGA-Based Stochastic Echo State Networks for Time-Series Forecasting[J]. Computational Intelligence & Neuroscience, vol.2016, pp.892-901, Aug 2016.
[34] Nikolaos Kariotoglou, Maryam Kamgarpour, Tyler H. Summers. The Linear Programming Approach to Reach-Avoid Problems for Markov Decision Processes[J]. Mathematics, vol.60, pp.263-285, Oct 2016.
[35] Haiyang Yu, Zhihai Wu, Dongwei Chen. Probabilistic Prediction of Bus Headway Using Relevance Vector Machine Regression[J]. IEEE Transactions on Intelligent Transportation Systems, vol.18, no.7, pp.1772-1781, Jan 2016.
[36] Nema M K, Khare D, Chandniha S K. Application of artificial intelligence to estimate the reference evapotranspiration in sub-humid Doon valley[J]. Applied Water Science, vol.7, no.5, pp.3903-3910, Mar 2017.
[37] Philipp Grohs, Fabian Hornung, Arnulf Jentzen. A proof that artificial neural networks overcome the curse of dimensionality in the numerical approximation of Black-Scholes partial differential equations[J]. Papers, vol.2, no.2, pp.1314-1325, Sep 2018.
[38] Tawfek Mahmoud, Zhao Yang Dong, Jin Ma. Advanced method for short-term wind power prediction with multiple observation points using extreme learning machines[J]. Journal of Engineering, vol.1, no.1, pp.29-38, Oct 2017.
[39] Priya Nayar, Bhim Singh, Sukumar Mishra. Neural Network based Control of SG based Standalone Generating System with Energy Storage for Power Quality Enhancement[J]. Journal of the Institution of Engineers, vol.98, no.4, pp.405-413, Sep 2016.
[40] Albrecht S V, Beck J C, Buckeridge D L, et al. Reports on the 2015 AAAI Workshop Series[J]. AI Magazine, vol.36, no.2, pp.90-101, June 2015.
[41] Shikha Gupta, Nikita Basant, Premanjali Rai. Modeling the binding affinity of structurally diverse industrial chemicals to carbon using the artificial intelligence approaches[J]. Environmental Science & Pollution Research International, vol.22, no.22, pp.17810-17821, July 2015.
[42] Wuhui Chen, Incheon Paik, Zhenni Li. Topology-Aware Optimal Data Placement Algorithm for Network Traffic Optimization[J]. IEEE Transactions on Computers, vol.65, no.8, pp.2603-2617, May 2016.
[43] Raul Cristian Scarlat, Georg Heygster, Leif Toudal Pedersen. Experiences With an Optimal Estimation Algorithm for Surface and Atmospheric Parameter Retrieval From Passive Microwave Data in the Arctic[J]. IEEE Journal of Selected Topics in Applied Earth Observations & Remote Sensing, vol.3, pp.1-14, Sep 2017.
[44] Mohammad Altaher, Omaima Nomir. Euler-Lagrange as Pseudo-metric of the RRT algorithm for optimal-time trajectory of flight simulation model in high-density obstacle environment[J]. Robotica, vol.5, pp.1-19, Dec 2015.
[45] Ankita Sinha, Prasanta K. Jana. MRF: MapReduce based Forecasting Algorithm for Time Series Data[J]. Procedia Computer Science, vol.132, pp.92-102, Dec 2018.
[46] Goran Klepac, Robert Kopal, Leo Mršić. REFII Model as a Base for Data Mining Techniques Hybridization with Purpose of Time Series Pattern Recognition[J].