2000, Knowledge and Information Systems
In this paper, an additional entropy penalty term is used to steer the direction of the hidden nodes' activations in the process of learning. A state with minimum entropy means that most nodes are operating in the non-linear zones (i.e. saturation zones) near the extreme ends of the Sigmoid curve. As the training proceeds, redundant hidden nodes' activations are pushed towards their extreme values, corresponding to a low-entropy state, while some relevant nodes remain active in the linear zone. As training progresses, more nodes get into saturation zones. The early creation of such nodes may impair generalisation performance. To prevent the network from being driven into saturation before it can really learn, an entropy cycle is proposed in this paper to dampen the creation of such inactive nodes in the early stage of training. At the end of training, these inactive nodes can then be eliminated without affecting the performance of the original network. The concept has been successfully applied to pruning in two classification problems. The experiments indicate that redundant nodes are pruned, resulting in optimal network topologies.
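The abstract above describes an entropy penalty on hidden activations: low entropy corresponds to activations near the extremes of the Sigmoid. The abstract does not give the exact formula, but one plausible sketch is the binary entropy of each Sigmoid activation, summed over the hidden layer; minimising it pushes activations towards 0 or 1:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def activation_entropy_penalty(h, eps=1e-12):
    """Binary entropy of each hidden activation h in (0, 1), summed.

    Low entropy <=> activations near 0 or 1 (the saturation zones);
    adding this term to the training loss pushes redundant nodes to
    their extreme values, as the abstract describes.
    NOTE: an illustrative form, not necessarily the paper's exact term.
    """
    h = np.clip(h, eps, 1.0 - eps)
    return -np.sum(h * np.log(h) + (1.0 - h) * np.log(1.0 - h))
```

An activation of 0.5 (mid-linear zone) contributes ln 2 per node, while a saturated activation contributes almost nothing, so gradient descent on this term drives nodes towards saturation.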
IEEE SMC'99 Conference Proceedings. 1999 IEEE International Conference on Systems, Man, and Cybernetics (Cat. No.99CH37028), 1999
In this paper, an entropy penalty term is used to steer the direction of the hidden nodes' activations in the process of learning. A state with minimum entropy means that nodes are operating near the extreme values of the Sigmoid curve. As the training proceeds, redundant hidden nodes' activations are pushed towards their extreme values, while relevant nodes remain active in the linear region of the Sigmoid curve. The early creation of redundant nodes may impair generalisation. To prevent the network from being driven into saturation before it can really learn, an entropy cycle is proposed to dampen the early creation of such redundant nodes.
International journal of neural systems, 2003
In this paper, an entropy term is used in the learning phase of a neural network. As learning progresses, more hidden nodes get into saturation. The early creation of such hidden nodes may impair generalisation. Hence an entropy approach is proposed to dampen the early creation of such nodes by using a new computation called the entropy cycle. Entropy learning also helps to increase the importance of relevant nodes while dampening the less important nodes. At the end of learning, the less important nodes can then be pruned to reduce the memory requirements of the neural network.
ASEAN Journal on Science and Technology for Development
In this paper, an entropy term is used in the learning phase of a neural network. As learning progresses, more hidden nodes get into saturation. The early creation of such hidden nodes may impair generalisation. Hence an entropy approach is proposed to dampen the early creation of such nodes. Entropy learning also helps to increase the importance of relevant nodes while dampening the less important nodes. At the end of learning, the less important nodes can then be eliminated to reduce the memory requirements of the neural network.
Biological and Artificial Intelligence Environments, 2005
One way of using entropy criteria in learning systems is to minimize the entropy of the error between two variables: typically, one is the output of the learning system and the other is the target. This framework has been used for regression. In this paper we show how to use the minimization of the entropy of the error for classification. Minimizing the entropy of the error implies a constant value for the errors; in general, this does not imply that the value of the errors is zero. In regression, this problem is solved by shifting the final result so that its average equals the average value of the desired target. We prove that, under mild conditions, this algorithm makes the error converge to zero in a classification problem and can thus be used for classification.
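The shift described above (removing the constant error offset left by entropy minimization) can be sketched in a few lines; the helper name is hypothetical, not from the paper:

```python
import numpy as np

def shift_outputs(outputs, targets):
    """Entropy minimization drives the errors to a constant value, not
    necessarily zero.  Shifting the outputs so their mean matches the
    target mean removes that constant offset, as described for the
    regression setting.  (Illustrative helper, not the paper's code.)
    """
    outputs = np.asarray(outputs, dtype=float)
    errors = outputs - np.asarray(targets, dtype=float)
    return outputs - errors.mean()
```

If every output is off by the same constant (here 0.5), the shift recovers the targets exactly; the paper's contribution is showing that in classification the residual error then converges to zero under mild conditions.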
2013
Design of artificial neural networks poses an important practical question: how to choose an adequate size of neural architecture for a given application. One popular way to overcome this problem is to start with an oversized structure and then prune it to obtain a simpler network with good generalization performance. This paper presents a pruning algorithm based on the pseudo-entropy of hidden neurons. Pruning is performed by iteratively training the network to a certain performance criterion and then removing the hidden neuron whose individual pseudo-entropy exceeds a preselected threshold, set slightly higher than the average value of the network's pseudo-entropy, until no more can be removed. This approach is validated on an academic example and tested on an induction motor modeling problem. Compared with two modified versions of the Optimal Brain Surgeon (OBS) algorithm, the developed method gives interesting results with easier computation.
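The pruning loop described in this abstract (remove neurons whose pseudo-entropy exceeds a threshold slightly above the network average, repeat until none qualify) can be sketched as follows. The retraining step between removals is omitted, and the function name and `margin` parameter are illustrative assumptions:

```python
import numpy as np

def prune_by_pseudo_entropy(entropies, margin=1.05):
    """Iteratively drop the hidden neuron whose pseudo-entropy exceeds a
    threshold slightly above the surviving neurons' average.

    `entropies` maps neuron index -> pseudo-entropy; returns the indices
    of the neurons that survive.  In the real algorithm the network is
    retrained after each removal; this sketch only models the selection.
    """
    keep = list(range(len(entropies)))
    while True:
        vals = np.array([entropies[i] for i in keep])
        threshold = margin * vals.mean()   # "slightly higher than average"
        over = [i for i in keep if entropies[i] > threshold]
        if not over:
            return keep
        # remove the single worst neuron, then re-evaluate the average
        worst = max(over, key=lambda i: entropies[i])
        keep.remove(worst)
```

Because the threshold is recomputed from the surviving neurons after each removal, the loop terminates once the remaining pseudo-entropies are all close to their own average.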
We explore the role of entropy manipulation during learning in supervised multiple layer perceptron classifiers. Entropy maximization [1][2] or mutual information maximization [3] is the criterion for unsupervised blind signal separation or feature extraction. In contrast, we show that for a 2-layer MLP classifier, conditional entropy minimization in the internal layer is a necessary condition for error minimization in the mapping from the input to the output. The relationship between entropy and the expected volume and mass of a convex hull constructed from n sample points is examined. We show that minimizing the expected hull volume may have more desirable gradient dynamics when compared to minimizing entropy. We show that entropy by itself has some geometrical invariance with respect to expected hull volumes. We develop closed-form expressions for the expected convex hull mass and volumes in R^1 and relate these to error probability. Finally we show that learning in an MLP may be accomplished solely by minimization of the conditional expected hull volumes and the expected volume of the "intensity of collision."
Neural Networks, 1989
Layered feedforward networks are viewed as multistage encoders. This view provides a link between neural networks and information theory and leads to a new measure for the performance of hidden units as well as output units. The measure, called conditional class entropy, not only allows existing networks to be judged but is also the basis of a new training algorithm with which an optimum number of neurons with optimum connecting weights can be found.
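The conditional class entropy measure described above scores a unit by how well its output separates the classes. The abstract gives no formula, but a plausible sketch is H(class | quantized unit activation), where a low value means the unit's activation already determines the class (all names and the binning scheme are illustrative assumptions):

```python
import numpy as np

def conditional_class_entropy(activations, labels, n_bins=2):
    """H(class | quantized unit activation), in bits.

    Activations in [0, 1) are quantized into n_bins; within each bin the
    class entropy is computed and weighted by the bin's probability.
    Zero means the unit's output fully determines the class label.
    (An illustrative reading of the measure, not the paper's exact one.)
    """
    bins = np.minimum((np.asarray(activations) * n_bins).astype(int),
                      n_bins - 1)
    labels = np.asarray(labels)
    h = 0.0
    for b in np.unique(bins):
        mask = bins == b
        p_bin = mask.mean()                       # P(activation in bin b)
        _, counts = np.unique(labels[mask], return_counts=True)
        p = counts / counts.sum()                 # P(class | bin b)
        h += p_bin * -np.sum(p * np.log2(p))
    return h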
Neural Processing Letters, 2011
Optimizing the structure of neural networks is an essential step for the discovery of knowledge from data. This paper deals with a new approach which determines the insignificant input and hidden neurons to detect the optimum structure of a feedforward neural network. The proposed pruning algorithm, called neural network pruning by significance (N2PS), is based on a new significance measure which is calculated from the Sigmoidal activation value of the node and all the weights of its outgoing connections. It considers all nodes with significance value below the threshold as insignificant and eliminates them. The advantages of this approach are illustrated by implementing it on six different real datasets, namely iris, breast-cancer, hepatitis, diabetes, ionosphere and wave. The results show that the proposed algorithm is quite efficient in pruning a significant number of neurons from the neural network models without sacrificing the network's performance.
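The N2PS significance measure combines a node's Sigmoidal activation value with the weights of its outgoing connections. The abstract does not state the exact combination; one plausible sketch takes the activation magnitude times the summed absolute outgoing weight (the function names and this particular form are assumptions):

```python
import numpy as np

def node_significance(activation, outgoing_weights):
    """A hypothetical significance score built, as in N2PS, from the
    node's Sigmoidal activation value and all its outgoing weights.
    Here: |activation| times total absolute outgoing weight."""
    return abs(activation) * np.sum(np.abs(outgoing_weights))

def keep_significant(activations, outgoing, threshold):
    """Indices of nodes whose significance meets the threshold; the
    rest would be pruned as insignificant."""
    return [j for j, a in enumerate(activations)
            if node_significance(a, outgoing[j]) >= threshold]
```

A node that barely activates, or whose outgoing weights are all near zero, scores low under any such measure, which is what makes it safe to eliminate.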
We have previously proposed the use of quadratic Renyi's error entropy with a Parzen density estimator with Gaussian kernels as an alternative optimality criterion for supervised neural network training, and showed that it produces better performance on the test data compared to the MSE. The error entropy criterion imposes the minimization of average information content in the error signal rather than simply minimizing the energy as MSE does. Recently, we developed a nonparametric entropy estimator for Renyi's definition that makes possible the use of any entropy order and any suitable kernel function in Parzen density estimation. The new estimator reduces to the previously used estimator for the special choice of Gaussian kernels and quadratic entropy. In this paper, we briefly present the new criterion and how to apply it to MLP training. We also address the issue of global optimization by the control of the kernel size in the Parzen window estimation.
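For the special case the abstract mentions (quadratic entropy with Gaussian kernels), the quadratic Renyi entropy of the error has a well-known Parzen-window estimator: the negative log of the mean pairwise Gaussian kernel evaluated on error differences. A sketch under that assumption:

```python
import numpy as np

def quadratic_renyi_entropy(errors, sigma=1.0):
    """Quadratic Renyi entropy of the error signal, estimated via a
    Parzen density estimate with Gaussian kernels of width sigma:
    H2 = -log(information potential).  Minimizing H2 concentrates the
    error distribution rather than just shrinking its energy (MSE).
    """
    e = np.asarray(errors, dtype=float)
    diffs = e[:, None] - e[None, :]
    # The convolution of two Gaussians of width sigma is a Gaussian
    # of variance 2*sigma^2, evaluated at each pairwise difference.
    var = 2.0 * sigma ** 2
    kernel = np.exp(-diffs ** 2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)
    information_potential = kernel.mean()
    return -np.log(information_potential)
```

A constant error vector minimizes this estimate (all pairwise differences are zero, so the information potential is maximal), illustrating why entropy minimization must be paired with the mean-shift correction discussed in the related abstract above. The abstract's kernel-size control for global optimization corresponds to annealing `sigma` during training.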