1998, ICSC Symposium on Neural Computation
A cascade correlation learning network (CCLN) is a popular supervised learning architecture that gradually grows its hidden neurons with fixed nonlinear activation functions, adding one neuron at a time during the course of training. Because the activation functions are fixed, the cascaded connections from the existing neurons to each new candidate neuron are required to approximate high-order nonlinearity. The major drawback of a CCLN is that the error surface is very rugged and unsmooth, because the maximum-correlation criterion consistently pushes the hidden neurons to their saturated extreme values instead of their active region. To alleviate this drawback of the original CCLN, two new cascade-correlation learning networks (CCLNS1 and CCLNS2) are proposed which smooth the error surface. Smoothing is performed by (re)training the gains of the hidden neurons' activation functions. In CCLNS1, smoothing is enabled by using the sign functions of the neurons' outputs in the cascaded connections, while in CCLNS2 each hidden neuron has two activation functions: a fixed one for the cascaded connections and a trainable one for the connections to the neurons in the output layer. The performances of the network structures are tested by training them to approximate three nonlinear functions. Both proposed structures exhibit much better performance than the original CCLN, and CCLNS1 gives slightly better results than CCLNS2.
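A rough illustration (not the authors' implementation) of the gain-retraining idea: each hidden unit's sigmoid is given an explicit gain parameter that is trained alongside the weights, so the unit can be pulled back into its active region rather than sitting at a saturated extreme. The function names and update form below are assumptions for the sketch.

```python
import numpy as np

def sigmoid_with_gain(x, w, g):
    """Sigmoid hidden unit with an explicit, trainable gain g (illustrative).
    A smaller gain flattens the nonlinearity and keeps the unit in its
    active region instead of at a saturated extreme."""
    return 1.0 / (1.0 + np.exp(-g * (x @ w)))

def gain_gradient(x, w, g, delta):
    """Gradient of the unit's output with respect to its gain, usable for a
    backprop-style update of g (same illustrative assumptions)."""
    y = sigmoid_with_gain(x, w, g)
    return delta * y * (1.0 - y) * (x @ w)
```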
1989
Cascade-Correlation is a new architecture and supervised learning algorithm for artificial neural networks. Instead of just adjusting the weights in a network of fixed topology, Cascade-Correlation begins with a minimal network, then automatically trains and adds new hidden units one by one, creating a multi-layer structure. Once a new hidden unit has been added to the network, its input-side weights are frozen. This unit then becomes a permanent feature-detector in the network, available for producing outputs or for creating other, more complex feature detectors. The Cascade-Correlation architecture has several advantages over existing algorithms: it learns very quickly, the network determines its own size and topology, it retains the structures it has built even if the training set changes, and it requires no back-propagation of error signals through the connections of the network.
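A minimal NumPy sketch of the candidate-training step described above, for a single candidate unit and a single output; the function name, learning rate, and gradient-ascent loop are illustrative assumptions rather than Fahlman and Lebiere's reference code. The candidate's input weights are adjusted to maximize the magnitude of the covariance between its output and the residual network error, and are frozen once the unit is installed.

```python
import numpy as np

def train_candidate(X, residual_error, n_steps=200, lr=0.05, rng=None):
    """Train one candidate hidden unit to maximize the covariance between
    its output and the network's residual error (illustrative sketch)."""
    rng = np.random.default_rng() if rng is None else rng
    w = rng.normal(scale=0.1, size=X.shape[1])   # candidate input weights

    for _ in range(n_steps):
        v = np.tanh(X @ w)                       # candidate activation
        e = residual_error - residual_error.mean()
        vc = v - v.mean()
        S = vc @ e                               # covariance with residual error
        dS_dw = X.T @ (e * (1.0 - v ** 2))       # gradient of S w.r.t. w
        w += lr * np.sign(S) * dS_dw             # ascend |S|

    return w                                     # frozen after installation
```

After the candidate is installed with the returned (now frozen) input weights, the output weights are retrained and the next candidate is trained against the new residual error.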
IEEE Transactions on Neural Networks, 1994
The Cascade Correlation [1] is a very flexible, efficient and fast algorithm for supervised learning. It incrementally builds the network by adding hidden units one at a time, until the desired input/output mapping is achieved. It connects all the previously installed units to the new unit being added. Consequently, each new unit in effect adds a new layer and the fan-in of the hidden and output units keeps on increasing as more units get added. The resulting structure could be hard to implement in VLSI, because the connections are irregular and the fan-in is unbounded. Moreover, the depth or the propagation delay through the resulting network is directly proportional to the number of units and can be excessive. We have modified the algorithm to generate networks with restricted fan-in and small depth (propagation delay) by controlling the connectivity. Our results reveal that there is a tradeoff between connectivity and other performance attributes like depth, total number of independent parameters, learning time, etc. When the number of inputs or outputs is small relative to the size of the training set, a higher connectivity usually leads to faster learning, and fewer independent parameters, but it also results in unbounded fan-in and depth. Strictly layered architectures with restricted connectivity, on the other hand, need more epochs to learn and use more parameters, but generate more regular structures, with smaller, limited fan-in and significantly smaller depth (propagation delay), and may be better suited for VLSI implementations. When the number of inputs or outputs is not very small compared to the size of the training set, however, a strictly layered topology is seen to yield an overall better performance.
The architecture of an Artificial Neural Network (ANN) is chosen based on the problem domain; it is applied during the training phase on sample data and then used to infer results for the remaining data in the testing phase. Normally, the architecture consists of three layers: an input layer with one node per known input value, an output layer with one node per result to be computed from the input and hidden nodes, and a hidden layer in between. The number of nodes in the hidden layer is usually decided heuristically, so that an optimum value is reached in a reasonable number of iterations with the other parameters at their default values. This study focuses on Cascade-Correlation Neural Networks (CCNN) trained with the Back-Propagation (BP) algorithm, which determine the number of hidden neurons during the training phase itself by appending one unit per iteration until the error condition is satisfied, giving a promising result on the optimum number of neurons in the hidden layer.
Analytical Chemistry, 1998
A novel neural network has been devised that combines the advantages of cascade correlation and computational temperature constraints. The combination of advantages yields a nonlinear calibration method that is easier to use, stable, and faster than back-propagation networks. Cascade correlation networks adjust only a single unit at a time, so they train very rapidly when compared to backpropagation networks. Cascade correlation networks determine their topology during training. In addition, the hidden units are not readjusted once they have been trained, so these networks are capable of incremental learning and caching. With the cascade architecture, temperature may be optimized for each hidden unit. Computational temperature is a parameter that controls the fuzziness of a hidden unit's output. The magnitude of the change in covariance with respect to temperature is maximized. This criterion avoids local minima, forces the hidden units to model larger variances in the data, and generates hidden units that furnish fuzzy logic. As a result, models built using temperature-constrained cascade correlation networks are better at interpolation or generalization of the design points. These properties are demonstrated for exemplary linear interpolations, a nonlinear interpolation, and chemical data sets for which the numbers of chlorine atoms in polychlorinated biphenyl molecules are predicted from mass spectra.
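A hedged sketch of a temperature-controlled hidden unit of the kind described above; the finite-difference estimate of d(covariance)/dT stands in for whatever exact criterion the paper uses, and all names and shapes are illustrative assumptions.

```python
import numpy as np

def hidden_unit(X, w, T):
    """Logistic hidden unit whose 'temperature' T controls output fuzziness:
    a large T gives a soft, fuzzy response; a small T approaches a hard threshold."""
    return 1.0 / (1.0 + np.exp(-(X @ w) / T))

def dcov_dT(X, w, T, residual_error, eps=1e-3):
    """Finite-difference estimate of d(covariance)/dT, the quantity whose
    magnitude is maximized when choosing the unit's temperature (sketch)."""
    def cov(t):
        v = hidden_unit(X, w, t)
        return np.dot(v - v.mean(), residual_error - residual_error.mean())
    return (cov(T + eps) - cov(T - eps)) / (2 * eps)
```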
1986
A high-order correlation tensor formalism for neural networks is described. The model can simulate autoassociative, heteroassociative, as well as multiassociative memory. For the autoassociative model, simulation results show a drastic ...
1995
It is often difficult to predict the optimal neural network size for a particular application. Constructive or destructive methods that add or subtract neurons, layers, connections, etc. might offer a solution to this problem. We prove that one method, Recurrent Cascade Correlation, due to its topology, has fundamental limitations in representation and thus in its learning capabilities. With monotone (i.e. sigmoid) and hard-threshold activation functions it cannot represent certain finite state automata. We give a preliminary approach to getting around these limitations by devising a simple constructive training method that adds neurons during training while still preserving the powerful fully recurrent structure. We illustrate this approach with simulations that learn many examples of regular grammars that the Recurrent Cascade Correlation method is unable to learn.
International Journal of Production Research, 2020
The performance and learning speed of the Cascade Correlation neural network (CasCor) may not be optimal because of redundant hidden units in the cascade architecture and the tuning of connection weights. This study explores the limitations of CasCor and its variants and proposes a novel constructive neural network (CNN). The basic idea is to compute the input connection weights by generating linearly independent hidden units from an orthogonal linear transformation, and the output connection weights by connecting the hidden units to the output units in a linear relationship. The work is unique in that few attempts have been made to analytically determine the connection weights on both sides of the network. Experimental work on real energy application problems, such as predicting power plant electrical output, predicting seismic hazards to prevent fatal accidents, and reducing energy consumption by predicting building occupancy, shows that analytically calculating the connection weights and generating non-redundant hidden units improves the convergence of the network. The proposed CNN is compared with state-of-the-art machine learning algorithms, and the work demonstrates that it predicts a wide range of applications better than other methods.
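Analytically determined output-side weights of the kind described above typically amount to a linear least-squares solve from hidden-unit activations to the targets; the sketch below shows only that step, under assumed shapes, and does not reproduce the paper's orthogonal transformation for the input side.

```python
import numpy as np

def solve_output_weights(H, Y):
    """Given hidden-unit activations H (n_samples x n_hidden) and targets Y,
    compute output connection weights by linear least squares (sketch)."""
    H_bias = np.hstack([H, np.ones((H.shape[0], 1))])   # append a bias column
    W_out, *_ = np.linalg.lstsq(H_bias, Y, rcond=None)
    return W_out
```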
International Journal of Applied Mathematics and Computer Science
This contribution presents a framework for a hybrid cascade neural network based on a specific kind of neo-fuzzy element and a new adaptive training rule. The main trait of the offered system is its ability to keep adding cascades until the required accuracy is reached. A distinctive rapid training procedure is also covered for this case, which makes it possible to operate with non-stationary data streams and to provide online training of multiple parameters. A new training criterion is examined for handling non-stationary objects. Additionally, the inference order and the number of membership relations inside the extended neo-fuzzy neuron can always be increased.
1992
In [5], a new incremental cascade network architecture was presented. This paper discusses the properties of such cascade networks and investigates their generalization abilities under the particular constraint of small data sets. The evaluation is done for cascade networks consisting of local linear maps, using the Mackey-Glass time series prediction task as a benchmark. Our results indicate that, to bring the potential of large networks to bear on the problem of extracting information from small data sets without running the risk of overfitting, deeply cascaded network architectures are more favorable than shallow broad architectures that contain the same number of nodes.
ArXiv, 2020
In this paper, a novel stepwise learning approach based on estimating the desired outputs of the premise parts by solving a constrained optimization problem is proposed. This learning approach does not require backpropagating the output error to learn the premise-part parameters. Instead, the near-best output values of the rules' premise parts are estimated, and their parameters are changed to reduce the error between the current premise-part outputs and the estimated desired ones. The proposed learning method therefore avoids error backpropagation, which leads to vanishing gradients and, consequently, to getting stuck in a local optimum. The proposed method does not need any initialization method. This learning method is used to train a new Takagi-Sugeno-Kang (TSK) Fuzzy Neural Network with correlated fuzzy rules, including many parameters in both the premise and consequent parts, while avoiding getting stuck in a local optimum due to vanishing gradients. To learn the proposed network parameters...
2004
Cascade correlation (CC) has proven to be an effective tool for simulating human learning. One important class of problem-solving tasks can be thought of as establishing appropriate connections between inputs and outputs. A CC network initially attempts to solve the task with a minimal network configuration; when the task cannot be solved, it is powered up by recruiting a hidden unit to capture the uncaptured aspects of the input-output relationship, until a satisfactory degree of performance is reached. Knowledge-based CC (KBCC) has a similar mechanism, but instead of recruiting hidden units, it can recruit other networks previously trained on similar tasks. In this paper we demonstrate the usefulness of these network tools for simulating the learning behavior of human subjects.
The constructive topology of the cascade correlation algorithm makes it a popular choice for many researchers wishing to utilize neural networks. However, for multimodal problems, the mean squared error of the approximation increases significantly as the number of modes increases. The components of this error will comprise both bias and variance and we provide formulae for estimating these values from mean squared errors alone. We achieve a near threefold reduction in the overall error by using early stopping and ensembling. Also described is a new subdivision technique that we call patchworking. Patchworking, when used in combination with early stopping and ensembling, can achieve an order of magnitude improvement in the error. Also presented is an approach for validating the quality of a neural network's training, without the explicit use of a testing dataset.
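The exact bias and variance formulae from the paper are not reproduced here; the sketch below shows one common way to estimate the two components from an ensemble's member predictions alone, with illustrative names and shapes.

```python
import numpy as np

def ensemble_bias_variance(member_preds, targets):
    """Estimate the (squared) bias and the variance of an ensemble from its
    members' predictions -- a common decomposition, not necessarily the exact
    formulae used in the paper (sketch).

    member_preds: array of shape (n_members, n_samples)
    targets:      array of shape (n_samples,)
    """
    mean_pred = member_preds.mean(axis=0)                 # ensemble average
    bias_sq = np.mean((mean_pred - targets) ** 2)         # error of the average
    variance = np.mean((member_preds - mean_pred) ** 2)   # spread of the members
    return bias_sq, variance
```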
Soft Computing, 2014
This paper proposes a new architecture and learning algorithms for a hybrid cascade neural network with pool optimization in each cascade. The proposed system is different from existing cascade systems in its capability to operate in an online mode, which allows it to work with nonstationary and stochastic nonlinear chaotic signals with the required accuracy. Compared to conventional analogs, the proposed system provides computational simplicity and possesses both tracking and filtering capabilities. Keywords Hybrid system • Learning method • Neo-fuzzy neuron • Cascade network Communicated by V. Loia.
Behaviormetrika, 1999
Feed-forward neural network models approximate nonlinear functions connecting inputs to outputs. The cascade correlation (CC) learning algorithm allows networks to grow dynamically, starting from the simplest network topology, to solve increasingly more difficult problems. It has been demonstrated that the CC network can solve a wide range of problems, including those for which other kinds of networks (e.g., back-propagation networks) have been found to fail. In this paper we show the mechanism and characteristics of nonlinear function learning and representation in CC networks, their generalization capabilities, the effects of environmental bias, etc., using a variety of knowledge representation analysis tools.
Proceedings of the World …, 2010
Intrinsic qualities of the cascade correlation algorithm make it a popular choice for many researchers wishing to utilize neural networks. Problems arise when the required outputs are highly multimodal over the input domain: the mean squared error of the approximation increases significantly as the number of modes increases. By applying ensembling and early stopping, we show that this error can be reduced by a factor of three. We also present a new technique based on subdivision that we call patchworking. When used in combination with early stopping and ensembling, the mean improvement in error exceeds a factor of 10 in some cases.
1997
The current study investigates a method for avoiding the overfitting/overtraining problem in Artificial Neural Networks (ANN), based on a combination of two algorithms: Early Stopping and Ensemble averaging (ESE). We show that ESE improves the prediction ability of ANNs trained according to the Cascade Correlation Algorithm. A simple algorithm to estimate the generalization ability of the method using the Leave-One-Out technique is proposed and discussed. In the accompanying paper, the problem of optimal selection of training cases for accelerated learning of the ESE method is considered.
Journal of Chemical Information and Computer Sciences, 1998
Cascade Learning (CL) [20] is a new adaptive approach to training deep neural networks. It is particularly suited to transfer learning, as learning is achieved in a layerwise fashion, enabling the transfer of selected layers to optimize the quality of the transferred features. In the domain of Human Activity Recognition (HAR), where resource consumption is critical, CL is of particular interest as it has demonstrated the ability to achieve significant reductions in computational and memory costs with negligible performance loss. In this paper, we evaluate the use of CL and compare it to end-to-end (E2E) learning in various transfer learning experiments, all applied to HAR. We consider transfer learning across objectives, for example transferring features learned for opening a door to opening a dishwasher. We additionally consider transfer across sensor locations on the body, as well as across datasets. Over all of our experiments, we find that CL achieves state-of-the-art performance for transfer learning in comparison to previously published work, improving F1 scores by over 15%. In comparison to E2E learning, CL performs similarly in terms of F1 scores, with the additional advantage of requiring fewer parameters. Finally, the overall results on HAR classification performance and memory requirements demonstrate that CL is a good approach for transfer learning.
The 2006 Ieee International Joint Conference on Neural Network Proceedings, 2006
Wrapper-based feature selection is attractive because wrapper methods are able to optimize the features they select to the specific learning algorithm. Unfortunately, wrapper methods are prohibitively expensive to use with neural nets. We present an internal wrapper feature selection method for Cascade Correlation (C2) nets called C2FS that is 2-3 orders of magnitude faster than external wrapper feature selection. This new internal wrapper feature selection method selects features at the same time hidden units are being added to the growing C2 net architecture. Experiments with five test problems show that C2FS feature selection usually improves accuracy and squared error while dramatically reducing the number of features needed for good performance. Comparison to feature selection via an information theoretic ordering on features (gain ratio) shows that C2FS usually yields better performance and always uses substantially fewer features.
2016
This paper presents a multi-core programming model that implements the cascade correlation neural network technique to enhance the classification phase of a pattern recognition system. It is based on combining the strengths of both approaches in order to construct an efficient Parallel Cascade Correlation Neural Network (P-CC-NN) system. In this work, a complex pattern recognition case, 3D facial data, is used to examine the proposed system and ensure its effectiveness; experimental results are presented using 360 3D facial images, each containing 96 distinguishable features. The results show a significant improvement in execution time, about 31 minutes (a 4.6-times speedup) compared with 146.5 minutes for the serial version, and this topology achieved an accuracy of 94%. This work is the first approach to handle the classification challenges of different pattern recognition applications using multi-core techniques.
The effect of correlations in neural networks is investigated by considering biased input and output patterns. Statistical mechanics is applied to study training times and internal potentials of the MINOVER and ADALINE learning algorithms. For the latter, a direct extension to generalization ability is obtained. Comparison with computer simulations shows good agreement with theoretical predictions. With biased patterns, we find ...