
International Journal of Applied Engineering Research ISSN 0973-4562 Volume 12, Number 14 (2017) pp. 4610-4623
© Research India Publications. http://www.ripublication.com

Extreme Learning Machine: A Review

Musatafa Abbas Abbood Albadr and Sabrina Tiuna
CAIT, Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia, Bangi, Selangor, Malaysia.

Abstract
Feedforward neural networks (FFNN) have been utilised for various research in machine learning and have gained significantly wide acceptance. However, it was recently noted that the feedforward neural network functions more slowly than needed, which has created critical bottlenecks in its applications. Extreme Learning Machines (ELM) were suggested as alternative learning algorithms to FFNN. ELM is based on single-hidden layer feedforward neural networks (SLFN): it selects hidden nodes randomly and analytically determines their output weights. This review aims to, first, present a short mathematical explanation of the basic ELM. Second, because of its notable simplicity, efficiency, and remarkable generalisation performance, ELM has had wide uses in various domains, such as computer vision, biomedical engineering, control and robotics, system identification, etc. Thus, this review also aims to present a complete view of these ELM advances for different applications. Finally, ELM's strengths and weaknesses are presented, along with its future perspectives.

Keywords: Extreme Learning Machine, Single-Hidden Layer Feedforward Neural Networks.

INTRODUCTION
Ever since the popular backpropagation (BP) algorithm was introduced, feedforward neural networks (FNN) have been studied well and used widely (Rumelhart et al. 1988). The traditional BP algorithm is a first-order gradient method for optimising parameters. However, it suffers from slow convergence and the local-minimum problem. Researchers have suggested different techniques to improve the optimality or efficiency of FNN training, such as subset selection methods (Chen et al. 1991; Li et al. 2005), second-order optimisation methods (Hagan & Menhaj 1994; Wilamowski & Yu 2010), or global optimisation methods (Yao 1993; Branke 1995). Despite exhibiting better generalisation performance or faster training speed than the BP algorithm, the majority of these methods are still not capable of guaranteeing a global optimal solution.

It has been recently proposed that Extreme Learning Machines (ELM) can be used to train single hidden layer feedforward neural networks (SLFNs). In ELM, the hidden nodes are initiated randomly and then fixed without iterative tuning. Furthermore, ELM's hidden nodes do not even need to be neuron alike. The only free parameters it has to learn are the connections (or weights) between the output layer and the hidden layer. As such, ELM is formulated as a linear-in-the-parameter model, and training ultimately amounts to solving a linear system. Unlike traditional FNN learning methods, ELM is significantly more efficient and has a greater tendency to reach a global optimum. Theoretical studies have shown that ELM maintains the SLFNs' universal approximation capability even when it works with randomly generated hidden nodes (Huang et al. 2006; Huang & Chen 2007; Huang & Chen 2008). With frequently utilised activation functions, ELM can achieve the traditional FNN's almost optimal generalisation bound, in which all the parameters are learnt (Liu et al. 2015). ELM's advantages over traditional FNN algorithms in generalisation and efficiency have been observed on a vast range of problems from various fields (Huang et al. 2006; Huang et al. 2012). ELM has generally been observed to be more efficient than least squares support vector machines (LS-SVMs) (Suykens & Vandewalle 1999), support vector machines (SVMs) (Cortes & Vapnik 1995), and other advanced algorithms. Empirical studies revealed that ELM's generalisation ability is comparable or even superior to that of SVMs and SVM variants (Huang et al. 2006; Huang et al. 2012; Fernández-Delgado et al. 2014; Huang et al. 2014). ELM and SVM were compared in detail in (Huang 2014) and (Huang et al. 2012). In the past decade, ELM applications and theories have been investigated extensively. From a learning efficiency standpoint, ELM's original design has three objectives: high learning accuracy, least human intervention, and fast learning speed (as demonstrated in Fig. 1). The original ELM model has been equipped with various extensions to make it more suitable and efficient for specific applications. The authors of (Huang et al. 2015) wrote a review paper that did a wide-ranging study on ELM. Also, (Huang et al. 2011) gave a literature survey of ELM's applications and theories.


Since then, there has been more active research on ELM. From a theoretical point of view, ELM's universal approximation capability was investigated further in (Huang et al. 2012). ELM's generalisation ability was studied using the framework of the localized generalisation error model (LGEM) (Xi-Zhao et al. 2013) and statistical learning theory (Liu et al. 2012; Lin et al. 2015; Liu et al. 2015). Many ELM variants have been suggested to meet specific application requirements. For example, the test time must be minimized in cost-sensitive learning, which needs a compact network so that the test time budget can be satisfied. In this context, ELM was successfully adapted to attain high compactness in network size (Deng et al. 2011; He et al. 2011; MartíNez-MartíNez et al. 2011; Yang et al. 2012; Du et al. 2013; Lahoz et al. 2013; Li et al. 2013; Yang et al. 2013; Bai et al. 2014; Wang et al. 2014). ELM extensions for noisy/missing data (Miche et al. 2010; Man et al. 2011; Horata et al. 2013; Yu et al. 2013), online sequential data (Liang et al. 2006; Lan et al. 2009; Rong et al. 2009; Zhao et al. 2012; Ye et al. 2013), imbalanced data (Horata et al. 2013; Zong et al. 2013; Huang et al. 2014), etc. have also been observed. Furthermore, apart from its uses in regression and traditional classification tasks, ELM's applications have recently been extended to feature selection, clustering, and representational learning (Benoít et al. 2013; Kasun et al. 2013; Huang et al. 2014). This review will first give a short mathematical explanation of the basic ELM. Next, it will present a roadmap for the newest optimisations of ELM and ELM's applications. Finally, ELM's strengths and weaknesses will be presented.

It should be noted that the randomised strategies of ELM learning frameworks for nonlinear feature construction have attracted a large amount of interest in the machine learning and computational intelligence community (Rahimi & Recht 2008; Rahimi & Recht 2008; Rahimi & Recht 2009; Saxe et al. 2011; Le et al. 2013; Widrow et al. 2013). These approaches have a close relationship to ELM, and a number of them can be considered special cases since they share many common properties. For instance, (Rahimi & Recht 2009) introduced the Random Kitchen Sinks (RKS), a special kind of ELM that restricts the construction of its hidden layer to the Fourier basis. The No-Prop algorithm demonstrated in (Widrow et al. 2013) possesses a spirit similar to that of ELM; however, it trains its output weights with the Least Mean Square (LMS) method.

Figure 1: Learning targets of ELM framework (Huang et al. 2015)

The succeeding parts of this review are organised into the following sections: in Section 2, the formulation of classical ELM is introduced; Sections 3 and 4 offer an intensive review of ELM's extensions and improvements for its various applications; Section 5 presents ELM's strengths and weaknesses, along with its future perspectives; Section 6 presents the conclusion of the paper.

CLASSICAL EXTREME LEARNING MACHINES
This section introduces the classical ELM model along with its basic variants for supervised regression and classification (Huang et al. 2004; Huang et al. 2006; Huang et al. 2011; Huang et al. 2012). The feature mappings, hidden nodes, and feature space of ELM were suggested for use in "generalised" single-hidden layer feedforward networks, in which the hidden layer does not have to be neuron alike (Huang & Chen 2007; Huang & Chen 2008; Huang et al. 2012). ELM's output function for generalised SLFNs is represented by the following equation:

f_L(x) = Σ_{i=1}^{L} β_i h_i(x) = h(x)β                (1)

where β = [β_1, …, β_L]^T is the output weight vector between the hidden layer of L nodes and the m ≥ 1 output nodes, and h(x) = [h_1(x), …, h_L(x)] is the nonlinear feature mapping of ELM (Fig. 2), i.e., the output (row) vector of the hidden layer with respect to the input x. h_i(x) is the output of the ith hidden node. The output functions of the hidden nodes are not always unique; different output functions can be used in different hidden neurons. Specifically, in real applications, h_i(x) can be written as

h_i(x) = G(a_i, b_i, x),   a_i ∈ R^d, b_i ∈ R                (2)

where G(a, b, x) (with hidden node parameters (a, b)) is a nonlinear piecewise continuous function that satisfies the ELM universal approximation capability theorems (Huang et al. 2006; Huang & Chen 2007; Huang & Chen 2008).


Figure 2: ELM architecture; the hidden nodes in ELM can be combinatorial nodes that are made up of different types of computational nodes (Huang & Chen 2007). The L random hidden neurons (which need not be algebraic-sum based) or other ELM feature mappings take the d input nodes, and a different output function can be used in different neurons: h_i(x) = G_i(a_i, b_i, x). The network can serve problem-based optimisation constraints such as feature learning, clustering, regression, and classification.

Table 1: Commonly used mapping functions in ELM

Sigmoid function:               G(a, b, x) = 1 / (1 + exp(−(a·x + b)))
Hyperbolic tangent function:    G(a, b, x) = (1 − exp(−(a·x + b))) / (1 + exp(−(a·x + b)))
Gaussian function:              G(a, b, x) = exp(−b ‖x − a‖)
Multiquadric function:          G(a, b, x) = (‖x − a‖ + b²)^(1/2)
Hard limit function:            G(a, b, x) = 1 if a·x + b ≤ 0, and 0 otherwise
Cosine function/Fourier basis:  G(a, b, x) = cos(a·x + b)

Essentially, there are two main stages involved when ELM trains an SLFN: (1) randomised feature mapping and (2) solving of the linear parameters. During the first stage, the hidden layer is randomly initialised by ELM so that the input data can be mapped into a feature space (known as the ELM feature space) using nonlinear mapping functions (see Fig. 3). This first stage differentiates ELM from numerous existing learning algorithms like SVM, which utilises kernel functions to map features, or deep neural networks (Bengio 2009), which utilise Auto-Encoders/Auto-Decoders or Restricted Boltzmann Machines (RBM) for feature learning. In ELM, the nonlinear mapping functions can be any nonlinear piecewise continuous functions; Table 1 shows some of the most commonly used ones.

Figure 3: ELM feature mappings and feature space (Huang et al. 2015)

Instead of being explicitly trained, ELM randomly generates the hidden node parameters (a, b) according to any continuous probability distribution, independently of the training data. This results in notable efficiency in comparison to traditional BP neural networks. Aside from the activation functions presented in Table 1, other special mapping functions are also utilised in ELM and its variants, like those used in wavelet ELM (Cao et al. 2010; Malathi et al. 2010; Malathi et al. 2011; Avci & Coteli 2012) and fuzzy ELM (Qu et al. 2011; Daliri 2012; Zhang & Ji 2013).
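As an illustration of the first stage only, the sketch below (Python/NumPy, with assumed shapes and invented function names) draws random hidden-node parameters from a continuous distribution and maps a batch of inputs into the ELM feature space with two of the mappings from Table 1; any other nonlinear piecewise continuous function from the table could be substituted.

```python
import numpy as np

def sigmoid_map(X, A, b):
    # G(a, b, x) = 1 / (1 + exp(-(a.x + b)))  (Table 1, sigmoid)
    return 1.0 / (1.0 + np.exp(-(X @ A + b)))

def gaussian_map(X, C, b):
    # G(a, b, x) = exp(-b * ||x - a||)  (Table 1, Gaussian), one centre per hidden node
    dists = np.linalg.norm(X[:, None, :] - C[None, :, :], axis=2)
    return np.exp(-b * dists)

# Random hidden-node parameters, generated independently of the training data
rng = np.random.default_rng(0)
d, L, N = 8, 50, 100                          # input dimension, hidden nodes, samples
X = rng.normal(size=(N, d))
A = rng.uniform(-1, 1, (d, L)); b1 = rng.uniform(-1, 1, L)
C = rng.uniform(-1, 1, (L, d)); b2 = rng.uniform(0.1, 1, L)

H_sigmoid = sigmoid_map(X, A, b1)             # N x L ELM feature-space representation
H_gaussian = gaussian_map(X, C, b2)           # the same inputs under a different mapping
```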


Basic Extreme Learning Machine
(Huang et al. 2006) proposed the original ELM algorithm that can be used to train SLFNs. The main idea in ELM is that the hidden layer weights and biases are randomly generated, while the output weights are calculated through the least-squares solution defined by the targets and the outputs of the hidden layer. Figure 4 shows an overview of the training algorithm and the ELM structure; a brief description of ELM is given below.
Figure 4: Diagram of the Extreme Learning Machine, with an input layer (inputs x_j1, …, x_jn), a hidden layer, and an output layer (output t_j) (Huang et al. 2011)
Where:
N refers to the number of distinct training samples (X_i, t_i), with X_i = [x_i1, x_i2, …, x_in]^T ∈ R^n and t_i = [t_i1, t_i2, …, t_im]^T ∈ R^m.
L represents the number of hidden layer nodes.
g(x) represents the activation function, and the network output is modelled by the following equation:

Σ_{i=1}^{L} β_i g_i(X_j) = Σ_{i=1}^{L} β_i g_i(W_i · X_j + b_i),   j = 1, …, N                (3)

Where:
W_i = [W_i1, W_i2, …, W_in]^T represents the weight vector connecting the ith hidden node to the input nodes.
β_i = [β_i1, β_i2, …, β_im]^T represents the weight vector connecting the ith hidden node to the output nodes.
b_i represents the threshold of the ith hidden node.
W_i · X_j represents the inner product of W_i and X_j. The output nodes, however, are chosen to be linear.

Standard SLFNs with L hidden nodes and activation function g(x) can approximate these N samples with zero error, that is, Σ_{j=1}^{N} ‖o_j − t_j‖ = 0, meaning that there exist β_i, W_i, and b_i such that

Σ_{i=1}^{L} β_i g_i(W_i · X_j + b_i) = t_j,   j = 1, …, N                (4)

The N equations above can be written compactly as

Hβ = T                (5)

Where:

H(W_1, …, W_L, b_1, …, b_L, X_1, …, X_N) =
[ g(W_1·X_1 + b_1)   …   g(W_L·X_1 + b_L) ]
[        ⋮                        ⋮        ]
[ g(W_1·X_N + b_1)   …   g(W_L·X_N + b_L) ]   (an N × L matrix),

β = [β_1^T; …; β_L^T]   (L × m)   and   T = [t_1^T; …; t_N^T]   (N × m).

Equation (5) is thus a linear system, and the output weights β can be determined analytically as its least-squares solution:

β = H†T

where H† is the Moore–Penrose generalised inverse of H. The output weights are therefore calculated through a single mathematical transformation, which does away with the lengthy training phase in which the network parameters are iteratively adjusted with suitably chosen learning parameters (such as the learning rate and the number of iterations).

Huang et al. (2006) defined these variables such that H is the hidden layer output matrix of the neural network; the ith column of H describes the output of the ith hidden node with respect to the inputs. They also showed that when the activation function g is infinitely differentiable, the required number of hidden nodes satisfies L ≤ N.
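To make the above procedure concrete, the following short Python/NumPy sketch implements the two ELM stages as written here: a randomly generated and then fixed hidden layer, followed by the least-squares solution β = H†T computed with the Moore–Penrose pseudo-inverse. It is an illustrative reading of equations (1)–(5) under our own assumptions (sigmoid activation, uniform random initialisation, invented class and variable names), not the reference implementation of (Huang et al. 2006).

```python
import numpy as np

class BasicELM:
    """Minimal ELM for targets T of shape (N, m)."""

    def __init__(self, n_hidden, seed=0):
        self.L = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # H[j, i] = g(W_i . X_j + b_i) with a sigmoid activation g
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def fit(self, X, T):
        d = X.shape[1]
        # Stage 1: random input weights W and biases b, fixed afterwards
        self.W = self.rng.uniform(-1.0, 1.0, (d, self.L))
        self.b = self.rng.uniform(-1.0, 1.0, self.L)
        H = self._hidden(X)                     # N x L hidden-layer output matrix
        # Stage 2: beta = H† T via the Moore-Penrose pseudo-inverse
        self.beta = np.linalg.pinv(H) @ T
        return self

    def predict(self, X):
        return self._hidden(X) @ self.beta      # f(x) = h(x) beta

# Example usage on synthetic regression data
rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, (200, 3))
T = np.sin(X.sum(axis=1, keepdims=True))        # N x 1 target
model = BasicELM(n_hidden=40).fit(X, T)
pred = model.predict(X)
```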

VARIANTS OF ELM
This section summarises and briefly introduces several typical variants of ELM.
Incremental ELM
(Huang et al. 2006) developed the Incremental Extreme Learning Machine (I-ELM) to construct a feedforward network incrementally. I-ELM adds randomly generated nodes to the hidden layer one by one, and freezes the output weights of the existing hidden nodes whenever a new hidden node is added. I-ELM is efficient not only for SLFNs with continuous (and differentiable) activation functions, but also for SLFNs with piecewise continuous activation functions (such as the threshold function). In this context of I-ELM, Huang et al. presented the convex I-ELM (CI-ELM) and the enhanced I-ELM (EI-ELM). Unlike I-ELM, CI-ELM (Huang & Chen 2007) recalculates the output weights of the existing hidden nodes when a new hidden node is added. Compared to I-ELM, CI-ELM achieves more compact network architectures and faster convergence rates while retaining the efficiency and simplicity of I-ELM. EI-ELM (Huang & Chen 2008) allows a maximum number of hidden nodes to be specified, and users need not set other control parameters manually. Unlike the original I-ELM, EI-ELM chooses the optimal hidden node, i.e., the one giving the smallest residual error at each learning step, among several randomly generated candidate nodes. EI-ELM is therefore able to achieve a much more compact network architecture and a faster convergence rate. Moreover, (Huang et al. 2008) proposed an improved I-ELM with fully complex hidden nodes, which extended I-ELM from the real domain to the complex domain.

Pruning ELM
The use of too few or too many hidden nodes may lead to underfitting or overfitting in pattern classification. Given this, (Rong et al. 2008) developed a pruned-ELM (P-ELM) algorithm that provides an automated and systematic way to design an ELM network. P-ELM starts with a large number of hidden nodes and then removes the lowly relevant or irrelevant hidden nodes by considering their relevance to the class labels during the learning process. As a result, the architectural design of ELM can be automated. Simulation results revealed that P-ELM yields compact network classifiers that generate robust prediction accuracy and fast response on unseen data compared to the standard BP, ELM, and MRAN. P-ELM is mostly suitable for pattern classification problems.

Error-minimised ELM
(Feng et al. 2009) developed an error-minimisation-based method for ELM (EM-ELM). This method is able to grow hidden nodes one by one or group by group and automatically determine the number of hidden nodes in generalised SLFNs. During network growth, the output weights are updated incrementally, which pointedly lowers the computational complexity. For sigmoid-type hidden nodes, the simulation results revealed that this technique can significantly lower ELM's computational complexity and help formulate an efficient ELM implementation.

Two-stage ELM
To achieve a parsimonious solution for the preliminary ELM network structure, (Lan et al. 2010) proposed a systematic two-stage algorithm named TS-ELM. During the first stage, a forward recursive algorithm is applied to choose the hidden nodes from the randomly generated candidates at each step, and the chosen nodes are added to the network until the stopping criterion is met. The significance of each hidden node is thereby determined by its net contribution after being added to the network. During the second stage, the selected hidden nodes are reviewed in order to remove the unimportant nodes from the network, which significantly reduces the network complexity. The empirical studies conducted on six cases revealed that TS-ELM, with a significantly smaller network structure, can achieve performance similar to or better than that of EM-ELM.
Online sequential ELM
When using the conventional ELM, all of the training data must be available for training. However, in real applications the training data may arrive one by one or chunk by chunk. (Liang et al. 2006) proposed a sequential learning algorithm called the online sequential extreme learning machine (OS-ELM), which can work with both additive and RBF nodes in a unified framework. OS-ELM with additive nodes randomly generates the input weights connecting the input nodes to the hidden nodes, together with the hidden biases, and then analytically determines the output weights based on the hidden nodes' output. Unlike other sequential learning algorithms, OS-ELM only requires the number of hidden nodes to be specified, as in the conventional ELM. To enhance OS-ELM's performance and bring ensemble networks into the sequential learning mode, (Lan et al. 2009) developed an integrated network structure referred to as the ensemble of online sequential extreme learning machine (EOS-ELM). EOS-ELM is made up of several OS-ELM networks, and the final measure of network performance is computed as the average of the outputs of every OS-ELM in the ensemble. Furthermore, to reflect the timeliness of the training data in the learning process, (Zhao et al. 2012) developed an improved EOS-ELM called the online sequential extreme learning machine with forgetting mechanism (FOS-ELM). This algorithm retains EOS-ELM's advantages and enhances its learning effect by quickly discarding outdated data during the learning process, in order to lessen its adverse effect on subsequent learning.
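To illustrate how output weights can be updated chunk by chunk in the spirit of OS-ELM, the sketch below applies the standard recursive least-squares recursion to the ELM output weights. It is a simplified, hedged rendering under our own assumptions (a fixed sigmoid hidden layer, an initial chunk with at least L samples, invented helper names), not the exact algorithm of (Liang et al. 2006).

```python
import numpy as np

def hidden(X, W, b):
    # Fixed random hidden layer shared by all chunks
    return 1.0 / (1.0 + np.exp(-(X @ W + b)))

def oselm_init(X0, T0, W, b):
    """Initial chunk (needs at least L samples): least squares plus P = (H'H)^-1."""
    H0 = hidden(X0, W, b)
    P = np.linalg.inv(H0.T @ H0)
    beta = P @ H0.T @ T0
    return beta, P

def oselm_update(beta, P, Xk, Tk, W, b):
    """Recursive least-squares update of beta and P for one new chunk (Xk, Tk)."""
    Hk = hidden(Xk, W, b)
    S = np.linalg.inv(np.eye(Hk.shape[0]) + Hk @ P @ Hk.T)
    P = P - P @ Hk.T @ S @ Hk @ P
    beta = beta + P @ Hk.T @ (Tk - Hk @ beta)
    return beta, P
```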
Evolutionary ELM
Typically, the number of hidden neurons is randomly determined when applying ELM. However, ELM may require a higher number of hidden neurons as a result of the random determination of the hidden biases and input weights. (Zhu et al. 2005) proposed a novel learning algorithm called the evolutionary extreme learning machine (E-ELM), which optimises the hidden biases and input weights and then determines the output weights. In E-ELM, the input weights and hidden biases are optimised using a modified differential evolution (DE) algorithm, while the output weights are analytically determined using the Moore–Penrose (MP) generalised inverse. Experimental results revealed that E-ELM achieves good generalisation performance with more compact networks, and is superior to other algorithms such as GALS, BP, and the original ELM.

Voting-based ELM
Since the hidden node parameters in ELM are randomly assigned and remain the same during training, ELM may not achieve the optimal classification boundary, so the samples nearest to the classification boundary run the risk of being misclassified. Therefore, Cao et al. (2012) developed an improved algorithm referred to as the voting-based extreme learning machine (V-ELM), whose aim is to lessen the number of misclassified samples near the classification boundary. The main idea of V-ELM is to conduct multiple independent ELM trainings rather than a single one, and to make the final decision by majority voting over their results (Cao et al. 2012). V-ELM was able to improve the classification performance, lessen the number of misclassified samples, and reduce the variance among different realisations. Based on simulations conducted on numerous real-world classification datasets, V-ELM generally performed better than the original ELM algorithm and even other recent classification algorithms.
may require higher amounts of hidden neurons as a result of
the random determination of the hidden biases and input
weights. (Zhu et al. 2005) proposed a novel learning algorithm


Fully complex ELM
To extend the ELM algorithm's application, (Li et al. 2005) developed a fully complex extreme learning algorithm called C-ELM, in which the reach of the ELM algorithm was extended from the real domain to the complex domain. Like ELM, the hidden layer biases and input weights of C-ELM are selected randomly on the basis of some continuous probability distribution. Afterwards, the output weights are computed analytically instead of being tuned iteratively. C-ELM was then utilised to equalise a complex nonlinear channel with QAM signals.
Symmetric ELM
(Liu et al. 2013) proposed a modified ELM algorithm known as the symmetric ELM (S-ELM). This algorithm turns the hidden neurons' original activation function into a function that is symmetric with respect to the samples' input variables. Theoretically, S-ELM preserves the capacity to approximate N arbitrary distinct samples with zero error. The simulation results showed that, by exploiting prior knowledge of symmetry, S-ELM can obtain faster learning speed, better generalisation performance, and a more compact network architecture.
EXPERIMENTAL STUDY OF ELM AND APPLICATIONS
Comprehensive empirical studies have been performed on the performance of ELM and its variants, and in the past years ELM has also been compared with other advanced learning algorithms. Typically, ELM is easy to implement, fast and stable in training, and accurate in prediction and modelling.
Comparison with SVM and its variants
A detailed empirical study of the generalisation performance and training efficiency of ELM for regression and classification was given in Huang, Zhou, et al. (2012). Comparisons were made with LS-SVM and classical SVM on over forty data sets, and various kinds of activation functions in ELM were studied. The experimental results verified that ELM exhibits similar or better generalisation performance for binary classification and regression, and pointedly better generalisation performance on multiclass classification data sets. Furthermore, ELM had a significantly faster learning speed (reaching up to several orders of magnitude) and better scalability.

Comparison with deep learning
Kasun et al. (2013) developed an ELM-based auto-encoder (ELM-AE) that can be used for classification and representational learning. Their experiments tested a 784-700-700-15000-10 multi-layer ELM network on the popular MNIST data set, which contains 60,000 images of handwritten digits for training and 10,000 images for testing. The results showed that, compared to other state-of-the-art deep learning techniques, the multi-layer ELM-AE achieved comparable accuracy and was significantly faster in training (see Table 2). Kasun et al. (2013) also examined how the ELM auto-encoder acquires and learns feature representations. They made 10 mini data sets containing the digits 0–9 taken from the MNIST data set, and each mini data set was sent through an ELM-AE (network structure: 784-20-784). They observed that the output weights β of the ELM-AE actually capture valuable information from the original images. Additionally, Huang et al. (2014) demonstrated that the unsupervised ELM performed better than the deep auto-encoder in clustering and embedding tasks.

Table 2: Performance comparison of ELM-AE with state-of-the-art deep networks on the MNIST data set (Kasun et al. 2013)

Algorithms                               Testing accuracy (%)    Training time
ELM-AE                                   99.03                   444.655 s
Deep belief network (DBN)                98.87                   20,580 s
Deep Boltzmann machine (DBM)             99.05                   68,246 s
Stacked auto-encoder (SAE)               98.6                    >17 h
Stacked denoising auto-encoder (SDAE)    98.72                   >17 h

Extreme Learning Machine for Speech Application
Comprehensive empirical studies have also been performed in language identification, and several attempts have been made to build an ELM-based language classifier as a replacement for the classical SVM. (Xu et al. 2015) formulated a new type of extreme learning machine, which they then applied to language identification, called the Regularised Minimum Class Variance Extreme Learning Machine (RMCVELM). The algorithm's core goal is to lessen the structural risk, the empirical risk, and the intra-class variance. The authors assessed it in terms of execution time and accuracy, and found that it performed better than SVM in execution time while achieving comparable classification accuracy. (Lan et al. 2013) also applied the extreme learning machine to speaker recognition, using text-independent speaker data, and compared the results obtained with those of SVM; they found that ELM has higher accuracy and faster execution. (Han et al. 2014) attempted to identify the speaker's emotion using an extreme learning machine as classifier and a DNN as feature extractor. They found that ELM and Kernel ELM (KELM), when combined with the DNN, have the highest accuracies compared to all the other baseline approaches. (Muthusamy et al. 2015) utilised ELM with another classifier on various types of audio-related classification problems and also addressed emotion recognition based on the speaker's audio, using GMM model features as inputs for the classifier. The authors stress the power of GMM-based features in offering discriminative factors that can be used to classify emotions.

ELM for medical/biomedical applications
Medical or biomedical data typically have high dimensional features or a large number of samples. Thus, medical or biomedical data analysis often utilises advanced machine learning techniques like SVM. Since ELM offers many advantages compared to other learning algorithms, its application in this area is of clear interest. Indeed, many encouraging results have been observed in recent years on the application of ELM to predict protein-protein interactions (You et al. 2013), EEG-based vigilance estimation (Shi & Lu 2013), epileptic EEG pattern recognition (Yuan et al. 2011; Song et al. 2012; Song & Zhang 2013), transmembrane beta-barrel chain detection (Savojardo et al. 2011), an eye-control method for eye-based computer interaction (Barea et al. 2012), spike sorting with overlap resolution based on a hybrid noise-assisted methodology (Adamos et al. 2010), lie detection (Gao et al. 2013), electrocardiogram (ECG) analysis (Karpagachelvi et al. 2012), liver parenchyma segmentation (Huang et al. 2012), diagnosis of hepatitis (Kaya & Uyar 2013) and thyroid disease (Li et al. 2012), arrhythmia classification in ECG (Kim et al. 2009), detection of mycobacterium tuberculosis in tissue sections (Osman et al. 2012), protein secondary structure prediction (Saraswathi et al. 2012), and metagenomic taxonomic classification (Rasheed & Rangwala 2012).

ELM for computer vision
ELM has had successful applications in various computer vision tasks, such as human action recognition (Minhas et al. 2010; Minhas et al. 2012), face recognition (Mohammed et al. 2011; Zong & Huang 2011; Choi et al. 2012; Baradarani et al. 2013; Marqués & Graña 2013; He et al. 2014), terrain-based navigation (Kan et al. 2013), and fingerprint matching (Yang et al. 2013).

ELM for image processing
ELM is also considered an attractive technique for image processing. For instance, (An & Bhanu 2012) introduced an efficient image super-resolution method based on ELM, whose aim is to generate high-resolution images from low-resolution inputs. In the training process, the input is extracted from the image features, and the high-frequency components taken from the original high-resolution images are used as the target values. ELM then learns a model capable of mapping the interpolated image onto the high-frequency components. Once training is done, the learned model can predict the high-frequency components from low-resolution images. (Li et al. 2013) used ELM for burn state recognition of rotary kilns. (Chang et al. 2010) used ELM for change detection of land cover and land use. Moreover, ELM was utilised for image classification by (Cao et al. 2013; Bazi et al. 2014), and it was used to assess perceived image quality (Decherchi et al. 2013; Suresh et al. 2009). ELM was also utilised in semantic concept detection for videos (Lu et al. 2013). Image deblurring can likewise be performed using filters learned by ELM (Wang et al. 2011). SAE-ELM was utilised in a multi-level forecasting model for coal mine water inrush (Zhao & Hu 2014).

ELM for system modelling and prediction
Because traditional neural networks have had wide uses in system prediction and modelling, ELM also has great potential in the development of accurate and efficient models for these applications. (Xu et al. 2013) proposed an ELM-based predictor for real-time frequency stability assessment (FSA) of power systems. The predictor's inputs are the power system operational parameters, while its output is the frequency stability margin, which measures the power system's stability degree subject to a contingency. Using off-line training and a frequency stability database, the predictor can be applied online for real-time FSA. They tested the predictor on New England's 10-generator 39-bus test system, and the simulation results revealed that it is capable of accurately and efficiently predicting frequency stability. ELM was also utilised for electricity price forecasting (Chen et al. 2012), sales forecasting (Wong & Guo 2010; Chen & Ou 2011), temperature prediction of molten steel (Tian & Mao 2010), security assessment of wind power systems (Xu et al. 2012; Xu et al. 2012), drying system modelling (Balbay et al. 2012), etc. Because of its notable advantages, many other applications have adopted ELM; based on past literature, we can witness its successful applications in control system design, text analysis, chemical process monitoring, feature selection, clustering, ranking, and representational learning. Furthermore, ELM can be applied to more potential fields as well.


STRENGTHS AND WEAKNESSES OF ELM
The majority of the literature reviewed considers ELM to be a good machine learning tool. ELM's major strength is that the hidden layer's learning parameters, including the input weights and biases, do not have to be iteratively tuned as in conventionally trained SLFNs (Huang et al. 2004; Huang et al. 2006; Huang et al. 2012; Ding et al. 2014; Lin et al. 2015; Liu et al. 2015; Ebtehaj et al. 2016). Because of this, ELM achieves faster speeds and lower costs (Huang et al. 2015), and it is highly favoured in machine learning compared to its predecessors. Other commendable attributes of ELM include good generalisation accuracy and performance (Huang et al. 2015), a simple learning algorithm (Zhang et al. 2016), improved efficiency (Huang et al. 2006; Ding et al. 2014), non-linear transformation during its training phase, a unified solution to different practical applications (Huang 2015), absence of local minima and overfitting (Huang et al. 2006; Huang 2015), the need for fewer optimisations compared to SVM and LS-SVM, and a computational cost similar to that of SVM (Zhang et al. 2016). More importantly, ELM is able to bridge the gap between biological learning machines and conventional learning machines (Huang et al. 2015), which was the goal of (Huang et al. 2004) (cited in (Huang et al. 2015)), a pioneering work in the study of ELM.

Despite its many advantages, ELM still has some flaws. For example, it was observed that the classification boundary may not be optimal, since the hidden layer's learning parameters remain the same during training (Cao et al. 2012; Ding et al. 2014). Furthermore, ELM struggles to manage large, high-dimensional data (Huang et al. 2015; Zhang et al. 2016), since it needs more hidden nodes than conventional tuning-based algorithms. It is also not straightforward to parallelise, because its training revolves around a pseudo-inverse computation (Oneto et al. 2016). A number of these challenges are already being addressed by modifications, optimisations, and hybridisations. However, much of the current ELM literature still makes the following recommendations for further research: (i) theoretical proof and application of the optimal number of hidden nodes, (ii) approximation of generalisation performance, (iii) generalisation capability for managing high-dimensional data (Deng et al. 2016; Liu et al. 2016; Oneto et al. 2016; Wang et al. 2016), and (iv) modification of the ELM algorithm for distributed and parallel computation (Ding et al. 2014; Bodyanskiy et al. 2016). (Huang 2014) also stressed the need to examine the connection between ELM and related algorithms, such as the random forest algorithm.

CONCLUSION
This paper presented a comprehensive review of the ELM algorithm, with emphasis on its applications and variants. The aim of the paper is to show that ELM is a valuable tool for research applications, as it can provide more accurate results and save calculation time in classification, regression, and other similar problems. However, some of the ELM algorithm's open problems have yet to be solved. The following concerns are still open and may be worth studying in the future: (i) theoretical proof and application of the optimal number of hidden nodes, (ii) approximation of generalisation performance, (iii) generalisation capability for managing high-dimensional data (Deng et al. 2016; Liu et al. 2016; Oneto et al. 2016; Wang et al. 2016), and (iv) modification of the ELM algorithm for distributed and parallel computation (Ding et al. 2014; Bodyanskiy et al. 2016). (Huang 2014) also stressed the need to examine the connection between ELM and related algorithms, such as the random forest algorithm.

ACKNOWLEDGEMENTS
This project is funded by the Malaysian government under research code FRGS/1/2016/ICT02/UKM/01/14.

REFERENCES
[1] Adamos, D. A., N. A. Laskaris, E. K. Kosmidis & G. Theophilidis 2010. NASS: an empirical approach to spike sorting with overlap resolution based on a hybrid noise-assisted methodology. Journal of neuroscience methods 190(1): 129-142.
[2] An, L. & B. Bhanu 2012. Image super-resolution by extreme learning machine. Image processing (ICIP), 2012 19th IEEE International Conference on. pp. 2209-2212.
[3] Avci, E. & R. Coteli 2012. A new automatic target recognition system based on wavelet extreme learning machine. Expert Systems with Applications 39(16): 12340-12348.
[4] Bai, Z., G.-B. Huang, D. Wang, H. Wang & M. B. Westover 2014. Sparse extreme learning machine for classification. IEEE Transactions on Cybernetics 44(10): 1858-1870.
[5] Balbay, A., Y. Kaya & O. Sahin 2012. Drying of black cumin (Nigella sativa) in a microwave assisted drying system and modeling using extreme learning machine. Energy 44(1): 352-357.
[6] Baradarani, A., Q. J. Wu & M. Ahmadi 2013. An efficient illumination invariant face recognition framework via illumination enhancement and DD-DTCWT filtering. Pattern Recognition 46(1): 57-72.

[7] Barea, R., L. Boquete, S. Ortega, E. López & J. Rodríguez-Ascariz 2012. EOG-based eye movements codification for human computer interaction. Expert Systems with Applications 39(3): 2677-2683.
[8] Bazi, Y., N. Alajlan, F. Melgani, H. AlHichri, S. Malek & R. R. Yager 2014. Differential evolution extreme learning machine for the classification of hyperspectral images. IEEE Geoscience and Remote Sensing Letters 11(6): 1066-1070.
[9] Bengio, Y. 2009. Learning deep architectures for AI. Foundations and Trends in Machine Learning 2(1): 1-127.
[10] BenoíT, F., M. Van Heeswijk, Y. Miche, M. Verleysen & A. Lendasse 2013. Feature selection for nonlinear models with extreme learning machines. Neurocomputing 102: 111-124.
[11] Bodyanskiy, Y., O. Vynokurova, I. Pliss, G. Setlak & P. Mulesa 2016. Fast learning algorithm for deep evolving GMDH-SVM neural network in data stream mining tasks. Data Stream Mining & Processing (DSMP), IEEE First International Conference on. pp. 257-262.
[12] Branke, J. 1995. Evolutionary algorithms for neural network design and training. In Proceedings of the First Nordic Workshop on Genetic Algorithms and its Applications.
[13] Cao, F., B. Liu & D. S. Park 2013. Image classification based on effective extreme learning machine. Neurocomputing 102: 90-97.
[14] Cao, J., Z. Lin & G.-B. Huang 2010. Composite function wavelet neural networks with extreme learning machine. Neurocomputing 73(7): 1405-1416.
[15] Cao, J., Z. Lin, G.-B. Huang & N. Liu 2012. Voting based extreme learning machine. Information Sciences 185(1): 66-77.
[16] Chang, N.-B., M. Han, W. Yao, L.-C. Chen & S. Xu 2010. Change detection of land use and land cover in an urban region with SPOT-5 images and partial Lanczos extreme learning machine. Journal of Applied Remote Sensing 4(1): 043551.
[17] Chen, F. & T. Ou 2011. Sales forecasting system based on Gray extreme learning machine with Taguchi method in retail industry. Expert Systems with Applications 38(3): 1336-1345.
[18] Chen, S., C. F. Cowan & P. M. Grant 1991. Orthogonal least squares learning algorithm for radial basis function networks. IEEE Transactions on Neural Networks 2(2): 302-309.
[19] Chen, X., Z. Y. Dong, K. Meng, Y. Xu, K. P. Wong & H. Ngan 2012. Electricity price forecasting with extreme learning machine and bootstrapping. IEEE Transactions on Power Systems 27(4): 2055-2062.
[20] Choi, K., K.-A. Toh & H. Byun 2012. Incremental face recognition for large-scale social network services. Pattern Recognition 45(8): 2868-2883.
[21] Cortes, C. & V. Vapnik 1995. Support vector machine. Machine Learning 20(3): 273-297.
[22] Daliri, M. R. 2012. A hybrid automatic system for the diagnosis of lung cancer based on genetic algorithm and fuzzy extreme learning machines. Journal of Medical Systems 36(2): 1001-1005.
[23] Decherchi, S., P. Gastaldo, R. Zunino, E. Cambria & J. Redi 2013. Circular-ELM for the reduced-reference assessment of perceived image quality. Neurocomputing 102: 78-89.
[24] Deng, J., K. Li & G. W. Irwin 2011. Fast automatic two-stage nonlinear model identification based on the extreme learning machine. Neurocomputing 74(16): 2422-2429.
[25] Deng, W.-Y., Z. Bai, G.-B. Huang & Q.-H. Zheng 2016. A fast SVD-hidden-nodes based extreme learning machine for large-scale data analytics. Neural Networks 77: 14-28.
[26] Deng, W.-Y., Q.-H. Zheng, S. Lian, L. Chen & X. Wang 2010. Ordinal extreme learning machine. Neurocomputing 74(1): 447-456.
[27] Ding, S., X. Xu & R. Nie 2014. Extreme learning machine and its applications. Neural Computing & Applications 25.
[28] Du, D., K. Li, G. W. Irwin & J. Deng 2013. A novel automatic two-stage locally regularized classifier construction method using the extreme learning machine. Neurocomputing 102: 10-22.
[29] Ebtehaj, I., H. Bonakdari & S. Shamshirband 2016. Extreme learning machine assessment for estimating sediment transport in open channels. Engineering with Computers 32(4): 691-704.
[30] Feng, G., G.-B. Huang, Q. Lin & R. Gay 2009. Error minimized extreme learning machine with growth of hidden nodes and incremental learning. IEEE Transactions on Neural Networks 20(8): 1352-1357.
[31] Fernández-Delgado, M., E. Cernadas, S. Barro, J. Ribeiro & J. Neves 2014. Direct Kernel Perceptron (DKP): Ultra-fast kernel ELM-based classification with non-iterative closed-form weight calculation. Neural Networks 50: 60-71.
[32] Gao, J., Z. Wang, Y. Yang, W. Zhang, C. Tao, J. Guan & N. Rao 2013. A novel approach for lie detection based on F-score and extreme learning machine. PloS one 8(6): e64704.
[33] Hagan, M. T. & M. B. Menhaj 1994. Training feedforward networks with the Marquardt algorithm. IEEE Transactions on Neural Networks 5(6): 989-993.
[34] Han, K., D. Yu & I. Tashev 2014. Speech emotion recognition using deep neural network and extreme learning machine. Interspeech. pp. 223-227.

[35] He, B., D. Xu, R. Nian, M. van Heeswijk, Q. Yu, Y. Miche & A. Lendasse 2014. Fast Face Recognition Via Sparse Coding and Extreme Learning Machine. Cognitive Computation 6(2): 264-277.
[36] He, Q., C. Du, Q. Wang, F. Zhuang & Z. Shi 2011. A parallel incremental extreme SVM classifier. Neurocomputing 74(16): 2532-2540.
[37] Horata, P., S. Chiewchanwattana & K. Sunat 2013. Robust extreme learning machine. Neurocomputing 102: 31-44.
[38] Huang, G.-B. 2014. An insight into extreme learning machines: random neurons, random features and kernels. Cognitive Computation 6(3): 376-390.
[39] Huang, G.-B. 2015. What are extreme learning machines? Filling the gap between Frank Rosenblatt's dream and John von Neumann's puzzle. Cognitive Computation 7(3): 263-278.
[40] Huang, G.-B. & L. Chen 2007. Convex incremental extreme learning machine. Neurocomputing 70(16): 3056-3062.
[41] Huang, G.-B. & L. Chen 2008. Enhanced random search based incremental extreme learning machine. Neurocomputing 71(16): 3460-3468.
[42] Huang, G.-B., L. Chen & C. K. Siew 2006. Universal approximation using incremental constructive feedforward networks with random hidden nodes. IEEE Trans. Neural Networks 17(4): 879-892.
[43] Huang, G.-B., M.-B. Li, L. Chen & C.-K. Siew 2008. Incremental extreme learning machine with fully complex hidden nodes. Neurocomputing 71(4): 576-583.
[44] Huang, G.-B., D. H. Wang & Y. Lan 2011. Extreme learning machines: a survey. International Journal of Machine Learning and Cybernetics 2(2): 107-122.
[45] Huang, G.-B., H. Zhou, X. Ding & R. Zhang 2012. Extreme learning machine for regression and multiclass classification. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics) 42(2): 513-529.
[46] Huang, G.-B., Q.-Y. Zhu & C.-K. Siew 2004. Extreme learning machine: a new learning scheme of feedforward neural networks. Neural Networks, 2004. Proceedings. 2004 IEEE International Joint Conference on. 2, pp. 985-990.
[47] Huang, G.-B., Q.-Y. Zhu & C.-K. Siew 2006. Extreme learning machine: theory and applications. Neurocomputing 70(1): 489-501.
[48] Huang, G., G.-B. Huang, S. Song & K. You 2015. Trends in extreme learning machines: A review. Neural Networks 61: 32-48.
[49] Huang, G., S. Song, J. N. Gupta & C. Wu 2014. Semi-supervised and unsupervised extreme learning machines. IEEE Transactions on Cybernetics 44(12): 2405-2417.
[50] Huang, W., Z. Tan, Z. Lin, G.-B. Huang, J. Zhou, C. Chui, Y. Su & S. Chang 2012. A semi-automatic approach to the segmentation of liver parenchyma from 3D CT images with Extreme Learning Machine. Engineering in Medicine and Biology Society (EMBC), 2012 Annual International Conference of the IEEE. pp. 3752-3755.
[51] Kan, E. M., M. H. Lim, Y. S. Ong, A. H. Tan & S. P. Yeo 2013. Extreme learning machine terrain-based navigation for unmanned aerial vehicles. Neural Computing & Applications: 1-9.
[52] Karpagachelvi, S., M. Arthanari & M. Sivakumar 2012. Classification of electrocardiogram signals with support vector machines and extreme learning machine. Neural Computing & Applications 21(6): 1331-1339.
[53] Kasun, L. L. C., H. Zhou, G.-B. Huang & C. M. Vong 2013. Representational learning with ELMs for big data.
[54] Kaya, Y. & M. Uyar 2013. A hybrid decision support system based on rough set and extreme learning machine for diagnosis of hepatitis disease. Applied Soft Computing 13(8): 3429-3438.
[55] Kim, J., H. S. Shin, K. Shin & M. Lee 2009. Robust algorithm for arrhythmia classification in ECG using extreme learning machine. Biomedical Engineering Online 8(1): 31.
[56] Lahoz, D., B. Lacruz & P. M. Mateo 2013. A multi-objective micro genetic ELM algorithm. Neurocomputing 111: 90-103.
[57] Lan, Y., Z. Hu, Y. C. Soh & G.-B. Huang 2013. An extreme learning machine approach for speaker recognition. Neural Computing and Applications 22(3-4): 417-425.
[58] Lan, Y., Y. C. Soh & G.-B. Huang 2009. Ensemble of online sequential extreme learning machine. Neurocomputing 72(13): 3391-3395.
[59] Lan, Y., Y. C. Soh & G.-B. Huang 2010. Two-stage extreme learning machine for regression. Neurocomputing 73(16): 3028-3038.
[60] Le, Q., T. Sarlós & A. Smola 2013. Fastfood-approximating kernel expansions in loglinear time. Proceedings of the International Conference on Machine Learning. 85.
[61] Li, B., Y. Li & X. Rong 2013. The extreme learning machine learning algorithm with tunable activation function. Neural Computing & Applications: 1-9.
[62] Li, K., J.-X. Peng & G. W. Irwin 2005. A fast nonlinear model identification method. IEEE Transactions on Automatic Control 50(8): 1211-1216.
[63] Li, L.-N., J.-H. Ouyang, H.-L. Chen & D.-Y. Liu 2012. A computer aided diagnosis system for thyroid disease using extreme learning machine. Journal of Medical Systems 36(5): 3327-3337.

[64] Li, M.-B., G.-B. Huang, P. Saratchandran & N. Sundararajan 2005. Fully complex extreme learning machine. Neurocomputing 68: 306-314.
[65] Li, W., D. Wang & T. Chai 2013. Burning state recognition of rotary kiln using ELMs with heterogeneous features. Neurocomputing 102: 144-153.
[66] Liang, N.-Y., G.-B. Huang, P. Saratchandran & N. Sundararajan 2006. A fast and accurate online sequential learning algorithm for feedforward networks. IEEE Transactions on Neural Networks 17(6): 1411-1423.
[67] Lin, S., X. Liu, J. Fang & Z. Xu 2015. Is extreme learning machine feasible? A theoretical assessment (Part II). IEEE Transactions on Neural Networks and Learning Systems 26(1): 21-34.
[68] Liu, D., Y. Wu & H. Jiang 2016. FP-ELM: An online sequential learning algorithm for dealing with concept drift. Neurocomputing 207: 322-334.
[69] Liu, X., C. Gao & P. Li 2012. A comparative analysis of support vector machines and extreme learning machines. Neural Networks 33: 58-66.
[70] Liu, X., P. Li & C. Gao 2013. Symmetric extreme learning machine. Neural Computing & Applications: 1-8.
[71] Liu, X., S. Lin, J. Fang & Z. Xu 2015. Is extreme learning machine feasible? A theoretical assessment (Part I). IEEE Transactions on Neural Networks and Learning Systems 26(1): 7-20.
[72] Lu, B., G. Wang, Y. Yuan & D. Han 2013. Semantic concept detection for video based on extreme learning machine. Neurocomputing 102: 176-183.
[73] Malathi, V., N. Marimuthu & S. Baskar 2010. Intelligent approaches using support vector machine and extreme learning machine for transmission line protection. Neurocomputing 73(10): 2160-2167.
[74] Malathi, V., N. Marimuthu, S. Baskar & K. Ramar 2011. Application of extreme learning machine for series compensated transmission line protection. Engineering Applications of Artificial Intelligence 24(5): 880-887.
[75] Man, Z., K. Lee, D. Wang, Z. Cao & C. Miao 2011. A new robust training algorithm for a class of single-hidden layer feedforward neural networks. Neurocomputing 74(16): 2491-2501.
[76] Marqués, I. & M. Graña 2013. Fusion of lattice independent and linear features improving face identification. Neurocomputing 114: 80-85.
[77] MartíNez-MartíNez, J. M., P. Escandell-Montero, E. Soria-Olivas, J. D. MartíN-Guerrero, R. Magdalena-Benedito & J. GóMez-Sanchis 2011. Regularized extreme learning machine for regression problems. Neurocomputing 74(17): 3716-3721.
[78] Miche, Y., A. Sorjamaa, P. Bas, O. Simula, C. Jutten & A. Lendasse 2010. OP-ELM: optimally pruned extreme learning machine. IEEE Transactions on Neural Networks 21(1): 158-162.
[79] Minhas, R., A. Baradarani, S. Seifzadeh & Q. J. Wu 2010. Human action recognition using extreme learning machine based on visual vocabularies. Neurocomputing 73(10): 1906-1917.
[80] Minhas, R., A. A. Mohammed & Q. J. Wu 2012. Incremental learning in human action recognition based on snippets. IEEE Transactions on Circuits and Systems for Video Technology 22(11): 1529-1541.
[81] Mohammed, A. A., R. Minhas, Q. J. Wu & M. A. Sid-Ahmed 2011. Human face recognition based on multidimensional PCA and extreme learning machine. Pattern Recognition 44(10): 2588-2597.
[82] Muthusamy, H., K. Polat & S. Yaacob 2015. Improved emotion recognition using Gaussian Mixture Model and extreme learning machine in speech and glottal signals. Mathematical Problems in Engineering 2015.
[83] Oneto, L., F. Bisio, E. Cambria & D. Anguita 2016. Statistical learning theory and ELM for big social data analysis. IEEE Computational Intelligence Magazine 11(3): 45-55.
[84] Osman, M. K., M. Y. Mashor & H. Jaafar 2012. Performance comparison of extreme learning machine algorithms for mycobacterium tuberculosis detection in tissue sections. Journal of Medical Imaging and Health Informatics 2(3): 307-312.
[85] Qu, Y., C. Shang, W. Wu & Q. Shen 2011. Evolutionary Fuzzy Extreme Learning Machine for Mammographic Risk Analysis. International Journal of Fuzzy Systems 13(4).
[86] Rahimi, A. & B. Recht 2008. Random features for large-scale kernel machines. Advances in Neural Information Processing Systems. pp. 1177-1184.
[87] Rahimi, A. & B. Recht 2008. Uniform approximation of functions with random bases. Communication, Control, and Computing, 2008 46th Annual Allerton Conference on. pp. 555-561.
[88] Rahimi, A. & B. Recht 2009. Weighted sums of random kitchen sinks: Replacing minimization with randomization in learning. Advances in Neural Information Processing Systems. pp. 1313-1320.
[89] Rasheed, Z. & H. Rangwala 2012. Metagenomic taxonomic classification using extreme learning machines. Journal of Bioinformatics and Computational Biology 10(05): 1250015.
[90] Rong, H.-J., G.-B. Huang, N. Sundararajan & P. Saratchandran 2009. Online sequential fuzzy extreme learning machine for function approximation and classification problems. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics) 39(4): 1067-1072.

[92] Rumelhart, D. E., G. E. Hinton & R. J. Williams 1988. Learning representations by back-propagating errors. Cognitive modeling 5(3): 1.
[93] Saraswathi, S., J. L. Fernández-Martínez, A. Koliński, R. L. Jernigan & A. Kloczkowski 2012. Fast learning optimized prediction methodology (FLOPRED) for protein secondary structure prediction. Journal of molecular modeling: 1-15.
[94] Savojardo, C., P. Fariselli & R. Casadio 2011. Improving the detection of transmembrane β-barrel chains with N-to-1 extreme learning machines. Bioinformatics 27(22): 3123-3128.
[95] Saxe, A., P. W. Koh, Z. Chen, M. Bhand, B. Suresh & A. Y. Ng 2011. On random weights and unsupervised feature learning. Proceedings of the 28th international conference on machine learning (ICML-11). pp. 1089-1096.
[96] Shi, L.-C. & B.-L. Lu 2013. EEG-based vigilance estimation using extreme learning machines. Neurocomputing 102: 135-143.
[97] Song, Y., J. Crowcroft & J. Zhang 2012. Automatic epileptic seizure detection in EEGs based on optimized sample entropy and extreme learning machine. Journal of neuroscience methods 210(2): 132-146.
[98] Song, Y. & J. Zhang 2013. Automatic recognition of epileptic EEG patterns via extreme learning machine and multiresolution feature extraction. Expert Systems with Applications 40(14): 5477-5489.
[99] Suresh, S., R. V. Babu & H. Kim 2009. No-reference image quality assessment using modified extreme learning machine classifier. Applied Soft Computing 9(2): 541-552.
[100] Suykens, J. A. & J. Vandewalle 1999. Least squares support vector machine classifiers. Neural processing letters 9(3): 293-300.
[101] Tian, H.-X. & Z.-Z. Mao 2010. An ensemble ELM based on modified AdaBoost.RT algorithm for predicting the temperature of molten steel in ladle furnace. IEEE Transactions on Automation Science and Engineering 7(1): 73-80.
[102] Wang, H., Z. Xu & W. Pedrycz 2016. An overview on the roles of fuzzy set techniques in big data processing: Trends, challenges and opportunities. architecture 2: 14.
[103] Wang, L., Y. Huang, X. Luo, Z. Wang & S. Luo 2011. Image deblurring with filters learned by extreme learning machine. Neurocomputing 74(16): 2464-2474.
[104] Wang, N., M. J. Er & M. Han 2014. Parsimonious extreme learning machine using recursive orthogonal least squares. IEEE transactions on neural networks and learning systems 25(10): 1828-1841.
[105] Widrow, B., A. Greenblatt, Y. Kim & D. Park 2013. The No-Prop algorithm: A new learning algorithm for multilayer neural networks. Neural Networks 37: 182-188.
[106] Wilamowski, B. M. & H. Yu 2010. Neural network learning without backpropagation. IEEE Transactions on Neural Networks 21(11): 1793-1803.
[107] Wong, W. & Z. Guo 2010. A hybrid intelligent model for medium-term sales forecasting in fashion retail supply chains using extreme learning machine and harmony search algorithm. International Journal of Production Economics 128(2): 614-624.
[108] Xi-Zhao, W., S. Qing-Yan, M. Qing & Z. Jun-Hai 2013. Architecture selection for networks trained with extreme learning machine using localized generalization error model. Neurocomputing 102: 3-9.
[109] Xu, J., W.-Q. Zhang, J. Liu & S. Xia 2015. Regularized minimum class variance extreme learning machine for language recognition. EURASIP Journal on Audio, Speech, and Music Processing 2015(1): 22.
[110] Xu, Y., Y. Dai, Z. Y. Dong, R. Zhang & K. Meng 2013. Extreme learning machine-based predictor for real-time frequency stability assessment of electric power systems. Neural computing & applications: 1-8.
[111] Xu, Y., Z. Y. Dong, Z. Xu, K. Meng & K. P. Wong 2012. An intelligent dynamic security assessment framework for power systems with wind power. IEEE Transactions on industrial informatics 8(4): 995-1003.
[112] Xu, Y., Z. Y. Dong, J. H. Zhao, P. Zhang & K. P. Wong 2012. A reliable intelligent system for real-time dynamic security assessment of power systems. IEEE Transactions on Power Systems 27(3): 1253-1263.
[113] Yang, J., S. Xie, S. Yoon, D. Park, Z. Fang & S. Yang 2013. Fingerprint matching based on extreme learning machine. Neural computing & applications: 1-11.
[114] Yang, Y., Y. Wang & X. Yuan 2012. Bidirectional extreme learning machine for regression problem and its learning effectiveness. IEEE Transactions on Neural Networks and Learning Systems 23(9): 1498-1505.
[115] Yang, Y., Y. Wang & X. Yuan 2013. Parallel chaos search based incremental extreme learning machine. Neural Processing Letters: 1-25.
[116] Yao, X. 1993. A review of evolutionary artificial neural networks. International journal of intelligent systems 8(4): 539-567.
[117] Ye, Y., S. Squartini & F. Piazza 2013. Online sequential extreme learning machine in nonstationary environments. Neurocomputing 116: 94-101.


[118] You, Z.-H., Y.-K. Lei, L. Zhu, J. Xia & B. Wang 2013. Prediction of protein-protein interactions from amino acid sequences with ensemble extreme learning machines and principal component analysis. BMC bioinformatics 14(8): S10.
[119] Yu, Q., Y. Miche, E. Eirola, M. Van Heeswijk, E. Séverin & A. Lendasse 2013. Regularized extreme learning machine for regression with missing data. Neurocomputing 102: 45-51.
[120] Yuan, Q., W. Zhou, S. Li & D. Cai 2011. Epileptic EEG classification based on extreme learning machine and nonlinear features. Epilepsy research 96(1): 29-38.
[121] Zhang, J., L. Feng & B. Wu 2016. Local extreme learning machine: local classification model for shape feature extraction. Neural Computing and Applications 27(7): 2095-2105.
[122] Zhang, L., J. Li & H. Lu 2016. Saliency detection via extreme learning machine. Neurocomputing 218: 103-112.
[123] Zhang, W. & H. Ji 2013. Fuzzy extreme learning machine for classification. Electronics Letters 49(7): 448-450.
[124] Zhao, J., Z. Wang & D. S. Park 2012. Online sequential extreme learning machine with forgetting mechanism. Neurocomputing 87: 79-89.
[125] Zhao, Z. & M. Hu 2014. Multi-level forecasting model of coal mine water inrush based on self-adaptive evolutionary extreme learning machine. Appl. Math. Inf. Sci. Lett. 2(3): 103-110.
[126] Zhu, Q.-Y., A. K. Qin, P. N. Suganthan & G.-B. Huang 2005. Evolutionary extreme learning machine. Pattern recognition 38(10): 1759-1763.
[127] Zong, W. & G.-B. Huang 2011. Face recognition based on extreme learning machine. Neurocomputing 74(16): 2541-2551.
[128] Zong, W., G.-B. Huang & Y. Chen 2013. Weighted extreme learning machine for imbalance learning. Neurocomputing 101: 229-242.
