2000
Contribution analysis is a useful tool for the analysis of cross-connected networks such as those generated by the cascade-correlation learning algorithm. Networks with cross connections that supersede hidden layers pose particular difficulties for standard analyses of hidden unit activation patterns. A contribution is defined as the product of an output weight and the associated activation on the sending unit. Previously such contributions have been multiplied by the sign of the output target for a particular input pattern. The present work shows that a principal components analysis (PCA) of unscaled contributions yields more interesting insights than comparable analyses of contributions scaled by the sign of output targets.
1994
The non-linear complexities of neural networks make network solutions difficult to understand. Sanger's contribution analysis is here extended to the analysis of networks automatically generated by the cascade-correlation learning algorithm. Because such networks have cross connections that supersede hidden layers, standard analyses of hidden unit activation patterns are insufficient. A contribution is defined as the product of an output weight and the associated activation on the sending unit, whether that sending unit is an input or a hidden unit, multiplied by the sign of the output target for the current input pattern. Intercorrelations among contributions, as gleaned from the matrix of contributions × input patterns, can be subjected to principal components analysis (PCA) to extract the main features of variation in the contributions. Such an analysis is applied to three problems: continuous XOR, arithmetic comparison, and distinguishing between two interlocking spirals. In all three cases, this technique yields useful insights into network solutions that are consistent across several networks.
1995
Understanding knowledge representations in neural nets has been a difficult problem. Principal components analysis (PCA) of contributions (products of sending activations and connection weights) has yielded valuable insights into knowledge representations, but much of this work has focused on the correlation matrix of contributions. The present work shows that analyzing the variance-covariance matrix of contributions yields more valid insights by taking account of weights.
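The pipeline described in these abstracts can be sketched in a few lines. The NumPy example below is an illustrative stand-in, not the authors' code: the activations, output weights, and targets are random placeholders, and the sketch covers both the unscaled and sign-scaled contribution variants as well as the choice between the covariance and correlation matrix for the PCA.

```python
# Hedged sketch of contribution analysis followed by PCA, assuming a trained
# network's sending-unit activations and output weights are already available.
import numpy as np

rng = np.random.default_rng(0)

def contributions(acts, out_weights, targets=None):
    """acts: (patterns, senders) activations feeding one output unit;
    out_weights: (senders,) weights into that output unit;
    targets: optional (patterns,) targets for the sign-scaled variant."""
    C = acts * out_weights                      # one contribution per sender, per pattern
    if targets is not None:
        C = C * np.sign(targets)[:, None]       # scaled by the sign of the output target
    return C

def pca_of_contributions(C, use_covariance=True):
    """PCA of contributions; the covariance matrix keeps weight magnitudes,
    the correlation matrix discards them."""
    Cc = C - C.mean(axis=0)
    if not use_covariance:
        Cc = Cc / Cc.std(axis=0)
    M = Cc.T @ Cc / (len(C) - 1)
    vals, vecs = np.linalg.eigh(M)
    order = np.argsort(vals)[::-1]
    return vals[order], vecs[:, order], Cc @ vecs[:, order]   # values, loadings, scores

# Usage with random stand-ins for a trained network's activations and weights
acts = rng.uniform(size=(50, 6))                # 50 patterns, 6 sending units
w_out = rng.normal(size=6)
targets = rng.choice([-1.0, 1.0], size=50)
C = contributions(acts, w_out)                  # unscaled, as in the 2000 abstract
vals, loadings, scores = pca_of_contributions(C, use_covariance=True)
print(np.round(vals / vals.sum(), 2))           # proportion of variance per component
```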
IEEE Transactions on Neural Networks, 1994
Cascade-Correlation [1] is a very flexible, efficient and fast algorithm for supervised learning. It incrementally builds the network by adding hidden units one at a time, until the desired input/output mapping is achieved. It connects all the previously installed units to the new unit being added. Consequently, each new unit in effect adds a new layer, and the fan-in of the hidden and output units keeps increasing as more units are added. The resulting structure can be hard to implement in VLSI, because the connections are irregular and the fan-in is unbounded. Moreover, the depth, or the propagation delay through the resulting network, is directly proportional to the number of units and can be excessive. We have modified the algorithm to generate networks with restricted fan-in and small depth (propagation delay) by controlling the connectivity. Our results reveal that there is a tradeoff between connectivity and other performance attributes such as depth, total number of independent parameters, and learning time. When the number of inputs or outputs is small relative to the size of the training set, higher connectivity usually leads to faster learning and fewer independent parameters, but it also results in unbounded fan-in and depth. Strictly layered architectures with restricted connectivity, on the other hand, need more epochs to learn and use more parameters, but generate more regular structures, with smaller, limited fan-in and significantly smaller depth (propagation delay), and may be better suited for VLSI implementations. When the number of inputs or outputs is not very small compared to the size of the training set, however, a strictly layered topology is seen to yield overall better performance.
1989
Cascade-Correlation is a new architecture and supervised learning algorithm for artificial neural networks. Instead of just adjusting the weights in a network of fixed topology, Cascade-Correlation begins with a minimal network, then automatically trains and adds new hidden units one by one, creating a multi-layer structure. Once a new hidden unit has been added to the network, its input-side weights are frozen. This unit then becomes a permanent feature-detector in the network, available for producing outputs or for creating other, more complex feature detectors. The Cascade-Correlation architecture has several advantages over existing algorithms: it learns very quickly, the network determines its own size and topology, it retains the structures it has built even if the training set changes, and it requires no back-propagation of error signals through the connections of the network.
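A minimal sketch of the Cascade-Correlation growth loop may help fix ideas. This is not Fahlman and Lebiere's implementation: it uses plain gradient ascent on the covariance objective for a single candidate (rather than Quickprop over a candidate pool) and linear output units fit by least squares, so every name and parameter below is an illustrative assumption.

```python
# Hedged sketch of the Cascade-Correlation growth loop: output phase, candidate
# phase, install-and-freeze, repeat. Simplifications are noted in the lead-in.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_outputs(H, T):
    # Output phase: fit linear output weights to the current feature matrix
    # (inputs, bias, and all frozen hidden units) by least squares.
    W, *_ = np.linalg.lstsq(H, T, rcond=None)
    return W

def train_candidate(H, E, lr=1.0, epochs=2000):
    # Candidate phase: train one unit to maximize the magnitude of the
    # covariance between its activation and the residual output error E.
    w = rng.normal(scale=0.5, size=H.shape[1])
    Ec = E - E.mean(axis=0)
    for _ in range(epochs):
        v = sigmoid(H @ w)
        S = (v - v.mean()) @ Ec                          # covariance with each output's error
        grad = H.T @ ((Ec @ np.sign(S)) * v * (1 - v))   # ascend |covariance|
        w += lr * grad / len(H)
    return w

def cascor(X, T, max_hidden=8, tol=1e-4):
    H = np.hstack([X, np.ones((len(X), 1))])             # start minimal: inputs plus bias
    W = train_outputs(H, T)
    for _ in range(max_hidden):
        E = H @ W - T                                    # residual error with linear outputs
        if np.mean(E ** 2) < tol:
            break
        w_new = train_candidate(H, E)                    # recruit the trained candidate ...
        H = np.hstack([H, sigmoid(H @ w_new)[:, None]])  # ... and freeze it as a new feature
        W = train_outputs(H, T)
    return H, W

# Usage: grow a network for XOR
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], float)
T = np.array([[0], [1], [1], [0]], float)
H, W = cascor(X, T)
print(np.round(H @ W, 2), H.shape[1] - 3, "hidden units recruited")
```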
Behaviormetrika, 1999
Feed-forward neural network models approximate nonlinear functions connecting inputs to outputs. The cascade correlation (CC) learning algorithm allows networks to grow dynamically, starting from the simplest network topology, to solve increasingly more difficult problems. It has been demonstrated that the CC network can solve a wide range of problems, including those for which other kinds of networks (e.g., back-propagation networks) have been found to fail. In this paper we show the mechanism and characteristics of nonlinear function learning and representations in CC networks, their generalization capabilities, the effects of environmental bias, etc., using a variety of knowledge representation analysis tools.
1994
This paper demonstrates that a skeletal structure of a network emerges when independent noises are added to the inputs of the hidden units of a Multilayer Perceptron during learning by error backpropagation. By analyzing the average behavior of the error backpropagation algorithm under such noises, it is shown that the weights from the hidden units to the output units tend to get smaller and the outputs of the hidden units tend to be 0 or 1. These tendencies have been demonstrated in experiments on learning a pattern classification problem.
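The manipulation described here is easy to reproduce in miniature. The sketch below adds independent Gaussian noise to the hidden units' net inputs of a small backpropagation network; the network size, noise level, learning rate, and XOR task are illustrative assumptions, and the final prints simply expose the quantities the abstract makes claims about.

```python
# Hedged sketch: backpropagation with independent noise injected into the
# hidden units' net inputs during training only.
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def train_noisy_mlp(X, T, n_hidden=4, noise_std=0.5, lr=1.0, epochs=20000):
    Xb = np.hstack([X, np.ones((len(X), 1))])            # inputs plus bias
    W1 = rng.normal(scale=0.5, size=(Xb.shape[1], n_hidden))
    W2 = rng.normal(scale=0.5, size=(n_hidden + 1, T.shape[1]))
    for _ in range(epochs):
        noise = noise_std * rng.normal(size=(len(X), n_hidden))
        h = sigmoid(Xb @ W1 + noise)                     # noise added to hidden-unit inputs
        hb = np.hstack([h, np.ones((len(X), 1))])
        y = sigmoid(hb @ W2)
        d_y = (y - T) * y * (1 - y)                      # output deltas (squared error)
        d_h = (d_y @ W2[:-1].T) * h * (1 - h)            # hidden deltas
        W2 -= lr * hb.T @ d_y / len(X)
        W1 -= lr * Xb.T @ d_h / len(X)
    return W1, W2

# Usage: XOR; afterwards, inspect the quantities the abstract makes claims about
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], float)
T = np.array([[0], [1], [1], [0]], float)
W1, W2 = train_noisy_mlp(X, T)
Xb = np.hstack([X, np.ones((4, 1))])
print(np.round(sigmoid(Xb @ W1), 2))                     # hidden outputs (reported to approach 0/1)
print(np.round(np.abs(W2[:-1]), 2))                      # hidden-to-output weight magnitudes
```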
PLoS ONE, 2013
Graph representations of brain connectivity have attracted a lot of recent interest, but existing methods for dividing such graphs into connected subnetworks have a number of limitations in the context of neuroimaging. This is an important problem because most cognitive functions would be expected to involve some but not all brain regions. In this paper we outline a simple approach for decomposing graphs, which may be based on any measure of interregional association, into coherent "principal networks". The technique is based on an eigendecomposition of the association matrix, and is closely related to principal components analysis. We demonstrate the technique using cortical thickness and diffusion tractography data, showing that the subnetworks which emerge are stable, meaningful and reproducible. Graph-theoretic measures of network cost and efficiency may be calculated separately for each principal network. Unlike some other approaches, all available connectivity information is taken into account, and vertices may appear in none or several of the subnetworks. Subject-by-subject "scores" for each principal network may also be obtained, under certain circumstances, and related to demographic or cognitive variables of interest.
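A toy version of the eigendecomposition step can be sketched as follows. The loading threshold and the fixed number of retained components are illustrative assumptions rather than the paper's exact criteria, and the association matrix is synthetic.

```python
# Hedged sketch: decompose an interregional association matrix into
# "principal networks" via eigendecomposition; thresholds are assumptions.
import numpy as np

def principal_networks(assoc, n_networks=3, loading_threshold=0.2):
    # assoc: symmetric region-by-region association matrix (any association measure)
    eigvals, eigvecs = np.linalg.eigh(assoc)
    order = np.argsort(eigvals)[::-1]                    # strongest components first
    networks = []
    for k in order[:n_networks]:
        v = eigvecs[:, k]
        members = np.where(np.abs(v) >= loading_threshold)[0]
        networks.append({"eigenvalue": eigvals[k],
                         "regions": members,
                         "loadings": v[members]})
    return networks

# Usage: a toy association matrix over 6 regions with two correlated blocks
A = np.eye(6)
A[:3, :3] += 0.6
A[3:, 3:] += 0.4
for net in principal_networks(A, n_networks=2):
    print(net["regions"], np.round(net["loadings"], 2))
```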
Proceedings of the International …, 2000
2004
Cascade correlation (CC) has proven to be an effective tool for simulating human learning. One important class of problem-solving tasks can be thought of as establishing appropriate connections between inputs and outputs. A CC network initially attempts to solve the task with a minimal network configuration, but when the task cannot be solved, it is powered up by recruiting a hidden unit to capture the uncaptured aspects of the input-output relationship, until a satisfactory degree of performance is reached. Knowledge-based CC (KBCC) has a similar mechanism, but instead of recruiting hidden units, it can recruit other networks previously trained on similar tasks. In this paper we demonstrate the usefulness of these network tools for simulating learning behavior by human subjects.
In this paper, we present a new learning mechanism called mixed-mode learning. This learning mechanism transforms an existing supervised learning algorithm from one that finds a unique one-to-one mapping into one that finds one of many one-to-many mappings. Since the objective of learning is relaxed by this transformation, the number of learning epochs can be reduced significantly. We show in this paper that mixed-mode learning can be applied in the well-known cascade correlation learning algorithm to reduce the number of hidden units required for convergence when applied to classification problems whose desired outputs are binary. Our experimental results confirm this reduction, although they show that more learning time is required than with cascade correlation. In general, reducing the number of hidden units at the expense of additional learning time is usually desirable.
IEEE Transactions on Signal Processing, 1994
In this paper we provide theoretical foundations for a new neural model for singular value decomposition based on an extension of the Hebbian learning rule called the cross-coupled Hebbian rule. The model extracts the SVD of the cross-correlation matrix of two stochastic signals and extends previous work on neural-network-related principal component analysis (PCA). We prove the asymptotic convergence of the network to the principal (normalized) singular vectors of the cross-correlation, and we provide simulation results which suggest that the convergence is exponential. The new model may have useful applications in the problems of filtering for signal processing and signal detection.
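The single-component version of the cross-coupled Hebbian rule can be sketched directly from the description above: two Oja-style units, each driven by the other's output. The learning rate, the toy signals, and the batch-SVD comparison are assumptions added for illustration; the deflation scheme for extracting further singular vector pairs is omitted.

```python
# Hedged sketch of the cross-coupled Hebbian rule for the first singular pair
# of the cross-correlation matrix E[x y^T] of two signals.
import numpy as np

rng = np.random.default_rng(0)

def cross_coupled_hebbian(X, Y, lr=0.005, epochs=50):
    # X: (n, dx) samples of the first signal; Y: (n, dy) samples of the second
    w = rng.normal(size=X.shape[1]); w /= np.linalg.norm(w)
    v = rng.normal(size=Y.shape[1]); v /= np.linalg.norm(v)
    for _ in range(epochs):
        for x, y in zip(X, Y):
            a, b = w @ x, v @ y                          # projections onto current estimates
            w += lr * b * (x - a * w)                    # cross-coupled, Oja-style updates
            v += lr * a * (y - b * v)
    return w, v

# Usage: two noisy signals sharing one latent source
n, dx, dy = 2000, 5, 4
S = rng.normal(size=(n, 1))
X = S @ rng.normal(size=(1, dx)) + 0.1 * rng.normal(size=(n, dx))
Y = S @ rng.normal(size=(1, dy)) + 0.1 * rng.normal(size=(n, dy))
w, v = cross_coupled_hebbian(X, Y)
U, _, Vt = np.linalg.svd(X.T @ Y / n)                    # batch SVD for comparison
print(np.abs(w / np.linalg.norm(w) @ U[:, 0]),
      np.abs(v / np.linalg.norm(v) @ Vt[0]))             # both near 1 when aligned
```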
Journal of Physics A: Mathematical and General, 1997
On-line learning in layered perceptrons is often hampered by plateaus in the time dependence of the performance. Studies on backpropagation in networks with a small number of input units have revealed that correlations between subsequently presented patterns shorten the length of such plateaus. We show how to extend the statistical mechanics framework to quantitatively check the effect of correlations on learning in networks with a large number of input units. The surprisingly compact description we obtain makes it possible to derive properties
Connection Science, 2007
Cascade-correlation (cascor) networks grow by recruiting hidden units to adjust their computational power to the task being learned. The standard cascor algorithm recruits each hidden unit on a new layer, creating deep networks. In contrast, the flat cascor variant adds all recruited hidden units on a single hidden layer. Student-teacher network approximation tasks were used to investigate the ability of flat and standard cascor networks to learn the input-output mapping of other, randomly initialized flat and standard cascor networks. For low-complexity approximation tasks, there was no significant performance difference between flat and standard student networks. Contrary to the common belief that standard cascor does not generalize well because cascading weights create deep networks, we found that both standard and flat cascor generalized well on problems of varying complexity. On high-complexity tasks, flat cascor networks had fewer connection weights and learned with less computational cost than standard networks did.
The effect of correlations in neural networks is investigated by considering biased input and output patterns. Statistical mechanics is applied to study training times and internal potentials of the MINOVER and ADALINE learning algorithms. For the latter, a direct extension to generalization ability is obtained. Comparison with computer simulations shows good agreement with theoretical predictions. With biased patterns, we find
2000
We extend previous research on parameter dynamics of digital filters to examine weight sensitivity and interdependence in feedforward networks. Weight sensitivity refers to the effect of small weight perturbations on the network's output, and weight interdependence refers to the degree of co-linearity between weights. A combined measure of the weight space (τ), defined as the ratio of weight interdependence to sensitivity, is explored in networks with hidden-unit activation functions of different complexity in the contexts of learning (1) a nonlinearly separable bivariate normal classification task, (2) the XOR problem, (3) sigmoidal functions, and (4) sine functions. Simulations show that networks with more complex activation functions give rise to a smaller τ and more rapid learning, suggesting that weight sensitivity and interdependence together are indicative of network complexity and are predictive of learning efficiency.
Characterizing Network Complexity and Learning Efficiency by the Ratio of Weight Interdependence to Sensitivity
1 Introduction
Mathematical models of human cognition must explain complex phenomena but, at the same time, need to retain parsimony of description. Similarly, in practical applications, neural networks should learn rapidly and generalize extensively without excessive computational complexity (e.g., Hinton and van Camp, 1993; Hochreiter and Schmidhuber, 1997). The learning capabilities of a network can be readily ascertained; however, the measurement of model complexity continues to be subject to theoretical debate and research. For example, three primary indicators of model complexity have been proposed: the number of parameters, the model's functional form, and the range of its parameter space. In network terms, the number of parameters corresponds to the number of units and weights, whereas functional form and range correspond to the nature of the activation function of those units. Although the number of units and adjustable weights can be readily verified, they are poor indicators of computational power because they ignore the functional form of the activation function and thus usually do not capture the intrinsic degrees of freedom of a network. A network's computational capacity derives instead from the feature space defined
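Because the abstract defines sensitivity and interdependence only informally, the sketch below is one hedged operationalization, not the paper's measure: sensitivity is taken as the average output change per small weight perturbation (finite differences), interdependence as the average absolute cosine between those per-weight output-change vectors, and τ as their ratio.

```python
# Hedged operationalization of weight sensitivity, interdependence, and tau
# via finite differences on a tiny two-layer network; all details are assumptions.
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def forward(weights, X, shapes):
    # A tiny two-layer network; `weights` is a flat vector unpacked per `shapes`
    n1 = shapes[0][0] * shapes[0][1]
    W1 = weights[:n1].reshape(shapes[0])
    W2 = weights[n1:].reshape(shapes[1])
    return sigmoid(sigmoid(X @ W1) @ W2)

def weight_space_measures(weights, X, shapes, eps=1e-4):
    base = forward(weights, X, shapes).ravel()
    J = np.empty((len(weights), base.size))              # output change per perturbed weight
    for i in range(len(weights)):
        w = weights.copy()
        w[i] += eps
        J[i] = (forward(w, X, shapes).ravel() - base) / eps
    norms = np.linalg.norm(J, axis=1)
    sensitivity = norms.mean()                           # average effect of a small perturbation
    U = J / np.maximum(norms, 1e-12)[:, None]
    cos = np.abs(U @ U.T)
    k = len(weights)
    interdependence = (cos.sum() - k) / (k * (k - 1))    # average |cosine| between weight effects
    return sensitivity, interdependence, interdependence / sensitivity

shapes = [(2, 3), (3, 1)]
weights = rng.normal(scale=0.5, size=2 * 3 + 3 * 1)
X = rng.uniform(-1.0, 1.0, size=(20, 2))
print(weight_space_measures(weights, X, shapes))
```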
Connection Science, 2001
Journal of Chemical Information and Computer Sciences, 1998
Cascade Learning (CL) [20] is a new adaptive approach to train deep neural networks. It is particularly suited to transfer learning, as learning is achieved in a layerwise fashion, enabling the transfer of selected layers to optimize the quality of transferred features. In the domain of Human Activity Recognition (HAR), where the consideration of resource consumption is critical, CL is of particular interest as it has demonstrated the ability to achieve significant reductions in computational and memory costs with negligible performance loss. In this paper, we evaluate the use of CL and compare it to end-to-end (E2E) learning in various transfer learning experiments, all applied to HAR. We consider transfer learning across objectives, for example features learned for opening a door transferred to opening a dishwasher. We additionally consider transfer across sensor locations on the body, as well as across datasets. Over all of our experiments, we find that CL achieves state-of-the-art performance for transfer learning in comparison to previously published work, improving F1 scores by over 15%. In comparison to E2E learning, CL performs similarly in terms of F1 scores, with the additional advantage of requiring fewer parameters. Finally, the overall results considering HAR classification performance and memory requirements demonstrate that CL is a good approach for transfer learning.
International Statistical Review, 2017
PCA is a statistical method that is directly related to EVD and SVD. Neural-network-based PCA methods estimate principal components online from the input data sequences, which is especially suitable for high-dimensional data, since it avoids computing a large covariance matrix, and for tracking nonstationary data, where the covariance matrix changes slowly over time. Neural networks and algorithms for PCA are described in this chapter; the algorithms given here are typically unsupervised learning methods. PCA has been widely used in engineering and scientific disciplines, such as pattern recognition, data compression and coding, image processing, high-resolution spectrum analysis, and adaptive beamforming. PCA is based on the spectral analysis of the second moment matrix that statistically characterizes a random vector. PCA is directly related to SVD, and the most common way to perform PCA is via the SVD of a data matrix; however, the capability of SVD is limited for very large data sets. It is well known that preprocessing usually maps a high-dimensional space to a low-dimensional space with the least information loss, a process known as feature extraction. PCA is a well-known feature extraction method, and it allows the removal of the second-order correlation among given random processes. By calculating the eigenvectors of the covariance matrix of the input vector, PCA linearly transforms a high-dimensional input vector into a low-dimensional one whose components are uncorrelated.
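One concrete instance of the neural PCA algorithms referred to here is Sanger's generalized Hebbian algorithm, sketched below. The learning-rate schedule and the toy data are illustrative assumptions.

```python
# Hedged sketch of online PCA with Sanger's generalized Hebbian algorithm (GHA).
import numpy as np

rng = np.random.default_rng(0)

def gha(X, n_components, lr=0.005, epochs=30):
    # Rows of W converge toward the leading principal directions, one sample at a time
    W = rng.normal(scale=0.1, size=(n_components, X.shape[1]))
    for _ in range(epochs):
        for x in X:
            y = W @ x
            W += lr * (np.outer(y, x) - np.tril(np.outer(y, y)) @ W)
        lr *= 0.9                                        # slowly anneal the step size
    return W

# Usage: centred data with a clear dominant direction
X = rng.normal(size=(1000, 5)) * np.array([2.0, 1.2, 1.0, 0.7, 0.4])
X -= X.mean(axis=0)
W = gha(X, n_components=2)
_, _, Vt = np.linalg.svd(X, full_matrices=False)         # batch PCA for comparison
print(np.abs(W / np.linalg.norm(W, axis=1, keepdims=True) @ Vt[:2].T).round(2))
```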
1992
In [5], a new incremental cascade network architecture was presented. This paper discusses the properties of such cascade networks and investigates their generalization abilities under the particular constraint of small data sets. The evaluation is done for cascade networks consisting of local linear maps, using the Mackey-Glass time series prediction task as a benchmark. Our results indicate that, to bring the potential of large networks to bear on the problem of extracting information from small data sets without running the risk of overfitting, deeply cascaded network architectures are more favorable than shallow broad architectures that contain the same number of nodes.