1993
The non-linear complexities of neural networks make network solutions difficult to understand. Sanger's contribution analysis is here extended to the analysis of networks automatically generated by the cascade-correlation learning algorithm. Because such networks have cross connections that bypass hidden layers, standard analyses of hidden unit activation patterns are insufficient. A contribution is defined as the product of an output weight and the associated activation on the sending unit, whether that sending unit is an input or a hidden unit, multiplied by the sign of the output target for the current input pattern. Intercorrelations among contributions, as gleaned from the matrix of contributions × input patterns, can be subjected to principal components analysis (PCA) to extract the main features of variation in the contributions. Such an analysis is applied to three problems: continuous XOR, arithmetic comparison, and distinguishing between two interlocking spirals. In...
2000
Contribution analysis is a useful tool for the analysis of cross-connected networks such as those generated by the cascade-correlation learning algorithm. Networks with cross connections that bypass hidden layers pose particular difficulties for standard analyses of hidden unit activation patterns. A contribution is defined as the product of an output weight and the associated activation on the sending unit. Previously, such contributions have been multiplied by the sign of the output target for a particular input pattern. The present work shows that a principal components analysis (PCA) of unscaled contributions yields more interesting insights than comparable analyses of contributions scaled by the sign of output targets.
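The contribution definition shared by these two abstracts lends itself to a short sketch. Below is a minimal, illustrative NumPy version of how a contribution matrix might be assembled, with optional scaling by the sign of the targets, and reduced by a covariance-based PCA; the array layouts and function names are assumptions, not the authors' code.

```python
import numpy as np

def contribution_matrix(activations, output_weights, targets=None):
    """Contributions of each sending unit (input or hidden) to one output unit.

    activations    : (n_patterns, n_senders) activations of input and hidden units
    output_weights : (n_senders,) weights from each sending unit to the output
    targets        : optional (n_patterns,) output targets; if given, contributions
                     are scaled by the sign of the target (the 1993 variant),
                     otherwise they are left unscaled (the 2000 variant).
    Returns a (n_senders, n_patterns) matrix of contributions x input patterns.
    """
    contribs = activations * output_weights              # output weight * sending activation
    if targets is not None:
        contribs = contribs * np.sign(targets)[:, None]  # optional sign scaling
    return contribs.T

def pca_of_contributions(contribs, n_components=2):
    """Covariance-based PCA (via SVD) of the contribution matrix."""
    centered = contribs - contribs.mean(axis=1, keepdims=True)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    explained = (s ** 2) / np.sum(s ** 2)
    return u[:, :n_components], explained[:n_components]
```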
1989
Cascade-Correlation is a new architecture and supervised learning algorithm for artificial neural networks. Instead of just adjusting the weights in a network of fixed topology, Cascade-Correlation begins with a minimal network, then automatically trains and adds new hidden units one by one, creating a multi-layer structure. Once a new hidden unit has been added to the network, its input-side weights are frozen. This unit then becomes a permanent feature-detector in the network, available for producing outputs or for creating other, more complex feature detectors. The Cascade-Correlation architecture has several advantages over existing algorithms: it learns very quickly, the network determines its own size and topology, it retains the structures it has built even if the training set changes, and it requires no back-propagation of error signals through the connections of the network.
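The growth loop described above can be made concrete with a compact, runnable approximation. The sketch below fits output weights by least squares and trains a single tanh candidate by gradient ascent on its covariance with the residual error; the real algorithm uses a pool of candidates, a correlation criterion, and quickprop updates, so all names and training details here are simplifying assumptions.

```python
import numpy as np

def train_outputs(H, y):
    """Least-squares fit of the output weights on the current sending units (plus bias)."""
    Hb = np.hstack([H, np.ones((H.shape[0], 1))])
    w, *_ = np.linalg.lstsq(Hb, y, rcond=None)
    return w, y - Hb @ w                                  # output weights, residual error

def train_candidate(H, residual, steps=200, lr=0.1):
    """Gradient ascent on the covariance between a tanh candidate and the residual error."""
    Hb = np.hstack([H, np.ones((H.shape[0], 1))])
    w = 0.1 * np.random.randn(Hb.shape[1])
    r = residual - residual.mean()
    for _ in range(steps):
        a = np.tanh(Hb @ w)
        cov = (a - a.mean()) @ r
        w += lr * np.sign(cov) * (Hb.T @ (r * (1.0 - a ** 2))) / len(a)
    return w                                              # input-side weights, frozen once installed

def cascade_correlation(X, y, max_hidden=8, tol=1e-3):
    H = X.copy()                                          # sending units: inputs, then inputs + hidden
    for _ in range(max_hidden):
        w_out, resid = train_outputs(H, y)
        if np.mean(resid ** 2) < tol:
            break                                         # task solved; stop recruiting
        w_cand = train_candidate(H, resid)
        Hb = np.hstack([H, np.ones((H.shape[0], 1))])
        H = np.hstack([H, np.tanh(Hb @ w_cand)[:, None]]) # unit becomes a permanent feature detector
    return train_outputs(H, y)[0], H
```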
IEEE Transactions on Neural Networks, 1994
Cascade-Correlation [1] is a very flexible, efficient, and fast algorithm for supervised learning. It incrementally builds the network by adding hidden units one at a time, until the desired input/output mapping is achieved. It connects all the previously installed units to the new unit being added. Consequently, each new unit in effect adds a new layer, and the fan-in of the hidden and output units keeps increasing as more units are added. The resulting structure could be hard to implement in VLSI, because the connections are irregular and the fan-in is unbounded. Moreover, the depth, or the propagation delay through the resulting network, is directly proportional to the number of units and can be excessive. We have modified the algorithm to generate networks with restricted fan-in and small depth (propagation delay) by controlling the connectivity. Our results reveal that there is a tradeoff between connectivity and other performance attributes such as depth, total number of independent parameters, and learning time. When the number of inputs or outputs is small relative to the size of the training set, a higher connectivity usually leads to faster learning and fewer independent parameters, but it also results in unbounded fan-in and depth. Strictly layered architectures with restricted connectivity, on the other hand, need more epochs to learn and use more parameters, but generate more regular structures with smaller, limited fan-in and significantly smaller depth (propagation delay), and may be better suited for VLSI implementations. When the number of inputs or outputs is not very small compared to the size of the training set, however, a strictly layered topology is seen to yield an overall better performance.
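As one concrete illustration of "controlling the connectivity", the helper below bounds a new candidate's fan-in by letting it see all original inputs but only the k most recently installed hidden units; this is a sketch of the general idea only, not any of the specific connectivity-control schemes evaluated in the paper.

```python
def allowed_senders(n_inputs, n_hidden_installed, k_recent):
    """Indices of sending units a new candidate may connect to under a fan-in bound:
    all inputs, plus at most the k_recent most recently installed hidden units.
    (Illustrative only; the paper also studies strictly layered topologies.)"""
    first_hidden = n_inputs
    start = first_hidden + max(0, n_hidden_installed - k_recent)
    return list(range(n_inputs)) + list(range(start, first_hidden + n_hidden_installed))
```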
Behaviormetrika, 1999
Feed-forward neural network models approximate nonlinear functions connecting inputs to outputs. The cascade correlation (CC) learning algorithm allows networks to grow dynamically, starting from the simplest network topology, to solve increasingly more difficult problems. It has been demonstrated that the CC network can solve a wide range of problems, including those for which other kinds of networks (e.g., back-propagation networks) have been found to fail. In this paper we show the mechanism and characteristics of nonlinear function learning and representations in CC networks, their generalization capabilities, the effects of environmental bias, etc., using a variety of knowledge representation analysis tools.
The effect of correlations in neural networks is investigated by considering biased input and output patterns. Statistical mechanics is applied to study training times and internal potentials of the MINOVER and ADALINE learning algorithms. For the latter, a direct extension to generalization ability is obtained. Comparison with computer simulations shows good agreement with theoretical predictions. With biased patterns, we find ...
ICSC Symposium on Neural Computation, 1998
A cascade correlation learning network (CCLN) is a popular supervised learning architecture that gradually grows the hidden neurons of fixed nonlinear activation functions, adding neurons one by one to the network during the course of training. Because of the fixed activation functions, the cascaded connections from the existing neurons to the new candidate neuron are required to approximate high-order nonlinearity. The major drawback of a CCLN is that the error surface is very zigzag and unsmooth, due to the use of the maximum correlation criterion that consistently pushes the hidden neurons to their saturated extreme values instead of their active region. To alleviate this drawback of the original CCLN, two new cascade-correlation learning networks (CCLNS1 and CCLNS2) are proposed, which enable smoothing of the error surface. Smoothing is performed by (re)training the gains of the hidden neurons' activation functions. In CCLNS1, smoothing is enabled by using the sign functions of the neurons' outputs in the cascaded connections, while in CCLNS2 each hidden neuron has two activation functions: a fixed one for the cascaded connections and a trainable one for the connections to the neurons in the output layer. The performances of the network structures are tested by training them to approximate three nonlinear functions. Both proposed structures exhibit much better performance than the original CCLN, while CCLNS1 gives slightly better results than CCLNS2.
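A minimal sketch of the CCLNS2-style idea, assuming a hidden unit whose frozen input weights feed a fixed tanh for the cascaded connections and a second tanh with a trainable gain for the output-side connection; the class and attribute names are illustrative, not the authors' implementation.

```python
import numpy as np

class GainedHiddenUnit:
    """Hidden unit with two activation functions (in the spirit of CCLNS2)."""
    def __init__(self, w_in, gain=1.0):
        self.w_in = np.asarray(w_in)   # input-side weights, frozen after recruitment
        self.gain = gain               # trainable gain, retrained to smooth the error surface

    def net(self, x):
        return x @ self.w_in

    def cascade_output(self, x):
        return np.tanh(self.net(x))              # fixed activation seen by later candidates

    def output_side(self, x):
        return np.tanh(self.gain * self.net(x))  # trainable-gain activation seen by the output layer
```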
Journal of the Society of Dyers and Colourists, 1998
IEEE Transactions on Neural Networks, 1996
In this paper, we present two learning mechanisms for artificial neural networks (ANN's) that can be applied to solve classification problems with binary outputs. These mechanisms are used to reduce the number of hidden units of an ANN when trained by the cascade-correlation learning algorithm (CAS). Since CAS adds hidden units incrementally as learning proceeds, it is difficult to predict the number of hidden units required when convergence is reached. Further, learning must be restarted when the number of hidden units is larger than expected. Our key idea in this paper is to provide alternatives in the learning process and to select the best alternative dynamically based on run-time information obtained. Mixed-mode learning (MM), our first algorithm, provides alternative output matrices so that learning is extended to find one of the many one-to-many mappings instead of finding a unique one-to-one mapping. Since the objective of learning is relaxed by this transformation, the number of learning epochs can be reduced. This in turn leads to a smaller number of hidden units required for convergence. Population-based learning for ANN's (PLAN), our second algorithm, maintains alternative network configurations to select promising networks to train at run time, based on error information obtained and time remaining. This dynamic scheduling avoids training possibly unpromising ANN's to completion before exploring new ones. We show the performance of these two mechanisms by applying them to solve the two-spiral problem, a two-region classification problem, and the Pima Indian diabetes diagnosis problem.
The 2006 Ieee International Joint Conference on Neural Network Proceedings, 2006
Wrapper-based feature selection is attractive because wrapper methods are able to optimize the features they select to the specific learning algorithm. Unfortunately, wrapper methods are prohibitively expensive to use with neural nets. We present an internal wrapper feature selection method for Cascade Correlation (C2) nets called C2FS that is 2-3 orders of magnitude faster than external wrapper feature selection. This new internal wrapper feature selection method selects features at the same time hidden units are being added to the growing C2 net architecture. Experiments with five test problems show that C2FS feature selection usually improves accuracy and squared error while dramatically reducing the number of features needed for good performance. Comparison to feature selection via an information theoretic ordering on features (gain ratio) shows that C2FS usually yields better performance and always uses substantially fewer features.
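One way to picture an internal wrapper of this kind: each time the growing net is about to add a hidden unit, candidate units are restricted to the features selected so far plus one not-yet-used feature, and the feature whose candidate scores best joins the selected set. The sketch below illustrates that general idea only; the argument names and the scoring hooks are assumptions, not the C2FS procedure itself.

```python
def internal_wrapper_step(selected, unused, train_candidate, score):
    """Greedy feature-selection step interleaved with hidden-unit recruitment:
    try one new feature per candidate, keep the best (illustrative sketch).

    train_candidate(features) -> trained candidate unit (user-supplied)
    score(candidate)          -> validation quality of that candidate (user-supplied)
    """
    best_feature = max(unused, key=lambda f: score(train_candidate(selected + [f])))
    return selected + [best_feature], [f for f in unused if f != best_feature]
```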
Connection Science, 2007
Cascade-correlation (cascor) networks grow by recruiting hidden units to adjust their computational power to the task being learned. The standard cascor algorithm recruits each hidden unit on a new layer, creating deep networks. In contrast, the flat cascor variant adds all recruited hidden units on a single hidden layer. Student-teacher network approximation tasks were used to investigate the ability of flat and standard cascor networks to learn the input-output mapping of other, randomly initialized flat and standard cascor networks. For low-complexity approximation tasks, there was no significant performance difference between flat and standard student networks. Contrary to the common belief that standard cascor does not generalize well due to cascading weights creating deep networks, we found that both standard and flat cascor generalized well on problems of varying complexity. On high-complexity tasks, flat cascor networks had fewer connection weights and learned with less computational cost than standard networks did.
2004
Cascade correlation (CC) has proven to be an effective tool for simulating human learning. One important class of problem solving tasks can be thought of as establishing appropriate connections between inputs and outputs. A CC network initially attempts to solve the task with a minimal network configuration, but when the task cannot be solved, it is powered up by recruiting a hidden unit to capture the uncaptured aspects of the input-output relationship until a satisfactory degree of performance is reached. Knowledge-based CC (KBCC) has a similar mechanism, but instead of recruiting hidden units, it can recruit other networks previously trained with similar tasks. In this paper we demonstrate the usefulness of these network tools for simulating learning behavior by human subjects.
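The shared recruitment step of CC and KBCC can be pictured as choosing, from a candidate pool, whichever candidate's output correlates most strongly with the network's residual error; in KBCC the pool may contain whole previously trained networks as well as fresh hidden units. The helper below is an illustrative sketch only, with the candidates assumed to be callables mapping inputs to activations.

```python
import numpy as np

def best_recruit(candidates, residual, X):
    """Pick the candidate (unit or previously trained network) whose output
    correlates most strongly with the residual error (illustrative sketch)."""
    def corr(a, r):
        a, r = a - a.mean(), r - r.mean()
        return abs(a @ r) / (np.linalg.norm(a) * np.linalg.norm(r) + 1e-12)
    scores = [corr(np.asarray(c(X), dtype=float).ravel(), residual) for c in candidates]
    return candidates[int(np.argmax(scores))]
```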
Journal of Physics A: Mathematical and General, 1997
On-line learning in layered perceptrons is often hampered by plateaus in the time dependence of the performance. Studies on backpropagation in networks with a small number of input units have revealed that correlations between subsequently presented patterns shorten the length of such plateaus. We show how to extend the statistical mechanics framework to quantitatively check the effect of correlations on learning in networks with a large number of input units. The surprisingly compact description we obtain makes it possible to derive properties ...
IEEE Transactions on Signal Processing, 1994
In this paper we provide theoretical foundations for a new neural model for singular value decomposition based on an extension of the Hebbian learning rule called the cross-coupled Hebbian rule. The model extracts the SVD of the cross-correlation matrix of two stochastic signals and is an extension of previous work on neural-network-related principal component analysis (PCA). We prove the asymptotic convergence of the network to the principal (normalized) singular vectors of the cross-correlation, and we provide simulation results which suggest that the convergence is exponential. The new model may have useful applications in problems of filtering for signal processing and signal detection.
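For orientation, here is a minimal sketch of one common cross-coupled Hebbian-style update with Oja-type normalization, which converges toward the principal singular vectors of the cross-correlation matrix E[x yᵀ]; it is offered as an assumption-laden illustration of the idea, not necessarily the exact rule analyzed in the paper.

```python
import numpy as np

def cross_coupled_hebbian(X, Y, lr=0.01, epochs=50):
    """Estimate the principal left/right singular vectors of the cross-correlation
    matrix E[x y^T] with a cross-coupled Hebbian-style rule (illustrative form)."""
    dx, dy = X.shape[1], Y.shape[1]
    rng = np.random.default_rng(0)
    w = rng.standard_normal(dx); w /= np.linalg.norm(w)   # left singular vector estimate
    v = rng.standard_normal(dy); v /= np.linalg.norm(v)   # right singular vector estimate
    for _ in range(epochs):
        for x, y in zip(X, Y):
            a, b = w @ x, v @ y                           # coupled projections
            w += lr * b * (x - a * w)                     # Hebbian term driven by the other channel
            v += lr * a * (y - b * v)
    return w, v
```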
In this paper, we present a new learning mechanism called mixed-mode learning. This learning mechanism transforms an existing supervised learning algorithm from one that finds a unique one-to-one mapping into an algorithm that finds one of the many one-to-many mappings. Since the objective of learning is relaxed by this transformation, the number of learning epochs can be reduced significantly. We show in this paper that mixed-mode learning can be applied in the well-known cascade correlation learning algorithm to reduce the number of hidden units required for convergence when applied to classification problems whose desired outputs are binary. Our experimental results confirm this reduction, although they show that more learning time is required than that of cascade correlation. In general, reducing the number of hidden units at the expense of additional learning time is usually desirable.
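The "one-to-many" relaxation for binary outputs can be pictured with a small illustrative error function: any concrete output vector whose values already lie on the correct side of a threshold counts as correct, so many output vectors satisfy the same binary target pattern. The thresholds and the hinge-style form below are assumptions for illustration, not the paper's exact alternative-output-matrix construction.

```python
import numpy as np

def relaxed_binary_error(outputs, targets, low=0.2, high=0.8):
    """Zero error whenever an output is already on the correct side of its
    threshold; otherwise the distance to that threshold (illustrative only)."""
    outputs = np.asarray(outputs, dtype=float)
    targets = np.asarray(targets)
    return np.where(targets == 1,
                    np.maximum(0.0, high - outputs),   # target 1: want output >= high
                    np.maximum(0.0, outputs - low))    # target 0: want output <= low
```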
Connection Science, 2001
The architecture of an Artificial Neural Network (ANN) is chosen for the problem domain; it is applied during the training phase to sample data and then used to infer results for the remaining data in the testing phase. Normally, the architecture consists of three layers: an input layer whose number of nodes equals the number of known values on hand, an output layer whose nodes hold the results to be computed from the input and hidden nodes, and a hidden layer in between. The number of nodes in the hidden layer is usually decided heuristically, so that a good value is found within a reasonable number of iterations while the other parameters keep their default values. This study focuses on Cascade-Correlation Neural Networks (CCNN) trained with the Back-Propagation (BP) algorithm, which determine the number of hidden neurons during the training phase itself by appending one neuron per iteration until the error condition is satisfied, giving promising results for the optimal number of neurons in the hidden layer.
Journal of Chemical Education, 1994
Journal of Chemical Information and Computer Sciences, 1998
Cascade Learning (CL) [20] is a new adaptive approach to training deep neural networks. It is particularly suited to transfer learning, as learning is achieved in a layerwise fashion, enabling the transfer of selected layers to optimize the quality of transferred features. In the domain of Human Activity Recognition (HAR), where the consideration of resource consumption is critical, CL is of particular interest as it has demonstrated the ability to achieve significant reductions in computational and memory costs with negligible performance loss. In this paper, we evaluate the use of CL and compare it to end-to-end (E2E) learning in various transfer learning experiments, all applied to HAR. We consider transfer learning across objectives, for example "opening the door" features transferred to "opening the dishwasher". We additionally consider transfer across sensor locations on the body, as well as across datasets. Over all of our experiments, we find that CL achieves state-of-the-art performance for transfer learning in comparison to previously published work, improving F1 scores by over 15%. In comparison to E2E learning, CL performs similarly in terms of F1 scores, with the additional advantage of requiring fewer parameters. Finally, the overall results considering HAR classification performance and memory requirements demonstrate that CL is a good approach for transfer learning.
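A rough PyTorch-style sketch of layerwise (cascade-style) training with frozen transferred blocks is given below; the block/head structure, the single temporary classifier per stage, and all names are simplifying assumptions rather than the paper's exact procedure.

```python
import torch
import torch.nn as nn

def train_cascade(blocks, head_dims, data_loader, n_classes, epochs=5, lr=1e-3):
    """Train blocks one at a time: earlier blocks act as frozen feature extractors,
    each new block gets a temporary output head, and its weights are frozen before
    the next block is added, so selected blocks can later be transferred alone."""
    trained = nn.ModuleList()
    loss_fn = nn.CrossEntropyLoss()
    for block, feat_dim in zip(blocks, head_dims):
        head = nn.Linear(feat_dim, n_classes)                 # temporary classifier for this stage
        opt = torch.optim.Adam(list(block.parameters()) + list(head.parameters()), lr=lr)
        for _ in range(epochs):
            for x, y in data_loader:
                with torch.no_grad():                         # earlier blocks stay fixed
                    for prev in trained:
                        x = prev(x)
                loss = loss_fn(head(block(x)), y)
                opt.zero_grad(); loss.backward(); opt.step()
        for p in block.parameters():
            p.requires_grad = False                           # freeze before growing further
        trained.append(block)
    return trained
```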