2005, Biological and Artificial Intelligence Environments
One way of using entropy criteria in learning systems is to minimize the entropy of the error between two variables: typically, one is the output of the learning system and the other is the target. This framework has been used for regression. In this paper we show how to use the minimization of the entropy of the error for classification. The minimization of the entropy of the error implies a constant value for the errors; this, in general, does not imply that the value of the errors is zero. In regression, this problem is solved by shifting the final result so that its average equals the average value of the desired target. We prove that, under mild conditions, this algorithm, when used in a classification problem, makes the error converge to zero and can thus be used in classification.
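A minimal sketch of this framework, assuming a quadratic Rényi entropy of the error estimated with a Gaussian Parzen window and a toy linear model; the helper names and the numerical gradient below are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def quadratic_renyi_entropy(errors, sigma=0.5):
    """Parzen-window estimate of the quadratic Renyi entropy of the errors:
    H2 = -log( (1/N^2) * sum_ij G(e_i - e_j; 2*sigma^2) )."""
    e = np.asarray(errors, dtype=float).reshape(-1, 1)
    diff = e - e.T                                   # pairwise error differences
    two_var = 2.0 * sigma ** 2
    kernel = np.exp(-diff ** 2 / (2.0 * two_var)) / np.sqrt(2.0 * np.pi * two_var)
    return -np.log(kernel.mean())

def fit_mee_linear(X, y, lr=0.1, epochs=200, sigma=0.5):
    """Toy linear model trained by gradient descent on the error-entropy
    criterion, followed by the mean-matching shift used in regression."""
    w = np.zeros(X.shape[1])
    eps = 1e-5
    for _ in range(epochs):
        h0 = quadratic_renyi_entropy(y - X @ w, sigma)
        grad = np.zeros_like(w)
        for k in range(w.size):                      # numerical gradient, for brevity
            w_try = w.copy(); w_try[k] += eps
            grad[k] = (quadratic_renyi_entropy(y - X @ w_try, sigma) - h0) / eps
        w -= lr * grad
    bias = np.mean(y - X @ w)                        # shift so the mean error is zero
    return w, bias
```

In regression the final shift absorbs the constant error; the point made here is that, for classification targets and under mild conditions, the error itself tends to zero.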
We explore the role of entropy manipulation during learning in supervised multiple layer perceptron classifiers. Entropy maximization [1][2] or mutual information maximization [3] is the criterion for unsupervised blind signal separation or feature extraction. In contrast, we show that for a 2-layer MLP classifier, conditional entropy minimization in the internal layer is a necessary condition for error minimization in the mapping from the input to the output. The relationship between entropy and the expected volume and mass of a convex hull constructed from n sample points is examined. We show that minimizing the expected hull volume may have more desirable gradient dynamics when compared to minimizing entropy. We show that entropy by itself has some geometrical invariance with respect to expected hull volumes. We develop closed form expressions for the expected convex hull mass and volumes in R^1 and relate these to error probability. Finally, we show that learning in an MLP may be accomplished solely by minimization of the conditional expected hull volumes and the expected volume of the "intensity of collision".
Knowledge and Information Systems, 2000
In this paper, an additional entropy penalty term is used to steer the direction of the hidden nodes' activations in the process of learning. A state with minimum entropy means that most nodes are operating in the non-linear zones (i.e. saturation zones) near the extreme ends of the Sigmoid curve. As the training proceeds, redundant hidden nodes' activations are pushed towards their extreme values, corresponding to a low-entropy state with maximum information, while some relevant nodes remain active in the linear zone. As training progresses, more nodes get into saturation zones. The early creation of such nodes may impair generalisation performance. To prevent the network from being driven into saturation before it can really learn, an entropy cycle is proposed in this paper to dampen the creation of such inactive nodes in the early stage of training. At the end of training, these inactive nodes can then be eliminated without affecting the performance of the original network. The concept has been successfully applied for pruning in two classification problems. The experiments indicate that redundant nodes are pruned, resulting in optimal network topologies.
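As a rough illustration of this idea, the sketch below adds a binary-entropy penalty over sigmoid hidden activations to the usual cost and switches it on and off across epochs as a stand-in for the proposed entropy cycle; the exact penalty form and schedule used in the paper may differ:

```python
import numpy as np

def hidden_entropy_penalty(hidden_activations, eps=1e-12):
    """Binary entropy summed over sigmoid hidden activations in (0, 1).
    Low values mean most units sit in the saturated ends of the sigmoid."""
    a = np.clip(hidden_activations, eps, 1.0 - eps)
    return -np.sum(a * np.log(a) + (1.0 - a) * np.log(1.0 - a))

def penalised_cost(mse, hidden_activations, epoch, cycle_length=50, beta=1e-3):
    """Total cost = MSE + beta(t) * entropy penalty, where beta(t) follows a
    simple on/off 'entropy cycle' so saturation is not encouraged too early."""
    beta_t = beta if (epoch // cycle_length) % 2 == 1 else 0.0
    return mse + beta_t * hidden_entropy_penalty(hidden_activations)
```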
ASEAN Journal on Science and Technology for Development
In this paper, an entropy term is used in the learning phase of a neural network. As learning progresses, more hidden nodes get into saturation. The early creation of such hidden nodes may impair generalisation. Hence an entropy approach is proposed to dampen the early creation of such nodes. The entropy learning also helps to increase the importance of relevant nodes while dampening the less important nodes. At the end of learning, the less important nodes can then be eliminated to reduce the memory requirements of the neural network.
IEEE SMC'99 Conference Proceedings. 1999 IEEE International Conference on Systems, Man, and Cybernetics (Cat. No.99CH37028), 1999
In this paper, an entropy penalty term is used to steer the direction of the hidden nodes' activations in the process of learning. A state with minimum entropy means that nodes are operating near the extreme values of the Sigmoid curve. As the training proceeds, redundant hidden nodes' activations are pushed towards their extreme values, while relevant nodes remain active in the linear region of the Sigmoid curve. The early creation of redundant nodes may impair generalisation. To prevent the network from being driven into saturation before it can really learn, an entropy cycle is proposed to dampen the early creation of such redundant nodes.
Perspectives in Neural Computing, 1998
Artificial neural networks are capable of constructing complex decision boundaries and over the recent years they have been widely used in many practical applications ranging from business to medical diagnosis and technical problems. A large number of error functions have been proposed in the literature to achieve a better predictive power. However, only a few works employ Tsallis statistics, which has successfully been applied in other fields. This paper undertakes the effort to examine the q-generalized function based on Tsallis statistics as an alternative error measure in neural networks. The results indicate that the Tsallis entropy error function can be successfully applied in neural networks, yielding satisfactory results.
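A hedged sketch of what a q-generalized error measure could look like, using the Tsallis q-logarithm in place of the natural logarithm of a cross-entropy loss; the exact functional form used in the paper is not reproduced here, and the function names are illustrative:

```python
import numpy as np

def q_log(x, q):
    """Tsallis q-logarithm; reduces to the natural log as q -> 1."""
    if abs(q - 1.0) < 1e-8:
        return np.log(x)
    return (x ** (1.0 - q) - 1.0) / (1.0 - q)

def tsallis_error(targets, outputs, q=1.5, eps=1e-12):
    """q-generalized cross-entropy style error: -mean_n sum_i t_i * ln_q(y_i).
    A plausible form for illustration; the paper's definition may differ."""
    y = np.clip(outputs, eps, 1.0)
    return -np.mean(np.sum(targets * q_log(y, q), axis=-1))
```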
1995
A new learning algorithm is developed for the design of statistical classifiers minimizing the rate of misclassification. The method, which is based on ideas from information theory and analogies to statistical physics, assigns data to classes in probability. The distributions are chosen to minimize the expected classification error while simultaneously enforcing the classifier's structure and a level of "randomness" measured by Shannon's entropy. Achievement of the classifier structure is quantified by an associated cost. The constrained optimization problem is equivalent to the minimization of a Helmholtz free energy, and the resulting optimization method is a basic extension of the deterministic annealing algorithm that explicitly enforces structural constraints on assignments while reducing the entropy and expected cost with temperature. In the limit of low temperature, the error rate is minimized directly and a hard classifier with the requisite structure is obtained. This learning algorithm can be used to design a variety of classifier structures. The approach is compared with standard methods for radial basis function design and is demonstrated to substantially outperform other design methods on several benchmark examples, while often retaining design complexity comparable to, or only moderately greater than, that of strict descent-based methods.
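The sketch below illustrates the core mechanism under simplifying assumptions: class assignments follow a Gibbs distribution over per-class costs, and lowering the temperature trades entropy against expected cost until a hard classifier remains. The structural constraints and specific classifier parameterization from the paper are omitted:

```python
import numpy as np

def gibbs_assignments(costs, temperature):
    """Probabilistic assignments minimizing expected cost - T * entropy:
    p_j proportional to exp(-cost_j / T).  costs has shape (n_samples, n_classes)."""
    logits = -costs / temperature
    logits -= logits.max(axis=1, keepdims=True)      # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)

def anneal(costs, t_start=1.0, t_stop=1e-3, rate=0.9):
    """Lower the temperature; in the low-temperature limit the soft assignments
    approach the hard minimum-cost classifier."""
    t = t_start
    p = gibbs_assignments(costs, t)
    while t > t_stop:
        p = gibbs_assignments(costs, t)
        t *= rate
    return p.argmax(axis=1)
```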
Artificial neural networks are currently one of the most commonly used classifiers and over the recent years they have been successfully used in many practical applications, including banking and finance, health and medicine, engineering and manufacturing. A large number of error functions have been proposed in the literature to achieve a better predictive power. However, only a few works employ Tsallis statistics, although the method itself has been successfully applied in other machine learning techniques. This paper undertakes the effort to examine the q-generalized function based on Tsallis statistics as an alternative error measure in neural networks. In order to validate different performance aspects of the proposed function and to enable identification of its strengths and weaknesses, an extensive simulation was prepared based on an artificial benchmarking dataset. The results indicate that the Tsallis entropy error function can be successfully introduced in neural networks, yielding satisfactory results and handling class imbalance, noise in the data, and the use of non-informative predictors.
Entropy
Measuring the predictability and complexity of time series using entropy is an essential tool for designing and controlling a nonlinear system. However, the existing methods have some drawbacks related to the strong dependence of entropy on the parameters of the methods. To overcome these difficulties, this study proposes a new method for estimating the entropy of a time series using the LogNNet neural network model. The LogNNet reservoir matrix is filled with time series elements according to our algorithm. The accuracy of the classification of images from the MNIST-10 database is considered as the entropy measure and denoted by NNetEn. The novelty of the entropy calculation is that the time series is involved in mixing the input information in the reservoir. Greater complexity in the time series leads to a higher classification accuracy and higher NNetEn values. We introduce a new time series characteristic called time series learning inertia that determines the learning rate of the neural network...
We have previously proposed the use of quadratic Renyi's error entropy with a Parzen density estimator with Gaussian kernels as an alternative optimality criterion for supervised neural network training, and showed that it produces better performance on the test data compared to the MSE. The error entropy criterion imposes the minimization of average information content in the error signal rather than simply minimizing the energy as MSE does. Recently, we developed a nonparametric entropy estimator for Renyi's definition that makes possible the use of any entropy order and any suitable kernel function in Parzen density estimation. The new estimator reduces to the previously used estimator for the special choice of Gaussian kernels and quadratic entropy. In this paper, we briefly present the new criterion and how to apply it to MLP training. We also address the issue of global optimization by the control of the kernel size in the Parzen window estimation.
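A compact sketch of the kind of nonparametric Rényi entropy estimator described here, for an arbitrary order alpha (alpha != 1) and a pluggable Parzen kernel; with alpha = 2 and a Gaussian kernel it reduces to the quadratic case. The kernel size sigma is the quantity whose control is suggested for global optimization. Names and defaults are illustrative:

```python
import numpy as np

def renyi_entropy_parzen(errors, alpha=2.0, sigma=0.5, kernel=None):
    """Resubstitution-style Renyi entropy estimator of order alpha (alpha != 1):
        H_alpha = 1/(1-alpha) * log( mean_i [ p_hat(e_i) ]^(alpha-1) ),
    where p_hat is a Parzen density built from the error samples themselves."""
    e = np.asarray(errors, dtype=float)
    if kernel is None:
        # default: Gaussian kernel of width sigma
        kernel = lambda u: np.exp(-u ** 2 / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))
    diff = e.reshape(-1, 1) - e.reshape(1, -1)
    p_hat = kernel(diff).mean(axis=1)                # density estimate at each sample
    return np.log(np.mean(p_hat ** (alpha - 1.0))) / (1.0 - alpha)
```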
Neurocomputing, 2019
Entropy models the added information associated with data uncertainty, proving that stochasticity is not purely random. This paper explores the potential improvement of machine learning methodologies through the incorporation of entropy analysis in the learning process. A multi-layer perceptron is applied to identify patterns in previous forecasting errors achieved by a machine learning methodology. The proposed learning approach is adaptive to the training data through a retraining process that includes only the most recent and relevant data, thus excluding misleading information from the training process. The learnt error patterns are then combined with the original forecasting results in order to improve forecasting accuracy, using the Rényi entropy to determine the degree to which the original forecast should be adjusted in light of the learnt error patterns. The proposed approach is combined with eleven different machine learning methodologies and applied to the forecasting of electricity market prices using real data from the Iberian electricity market operator, OMIE. Results show that, through the identification of patterns in the forecasting error, the proposed methodology is able to improve the learning algorithms' forecasting accuracy and reduce the variability of their forecasting errors.
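The combination rule below is purely hypothetical and only illustrates the general idea: a learnt error correction is weighted by how structured (low Rényi entropy) the recent error distribution is, so that noisy, high-entropy error patterns contribute less to the adjustment. The paper's actual weighting scheme is not reproduced here:

```python
import numpy as np

def renyi_entropy(p, alpha=2.0):
    """Renyi entropy of a discrete distribution p (a normalized histogram)."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return np.log(np.sum(p ** alpha)) / (1.0 - alpha)

def combine_forecast(original, predicted_error, error_history, alpha=2.0, bins=10):
    """Hypothetical rule: weight the learnt correction by 1 - H/H_max of the
    recent error histogram (1 = very structured errors, 0 = pure noise)."""
    hist, _ = np.histogram(error_history, bins=bins)
    p = hist / hist.sum()
    weight = 1.0 - renyi_entropy(p, alpha) / np.log(bins)
    return original + weight * predicted_error
```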
International journal of neural systems, 2003
In this paper, an entropy term is used in the learning phase of a neural network. As learning progresses, more hidden nodes get into saturation. The early creation of such hidden nodes may impair generalisation. Hence an entropy approach is proposed to dampen the early creation of such nodes by means of a new computation called the entropy cycle. Entropy learning also helps to increase the importance of relevant nodes while dampening the less important nodes. At the end of learning, the less important nodes can then be pruned to reduce the memory requirements of the neural network.
Neural Networks, 1989
Layered feedforward networks are viewed as multistage encoders. This view provides a link between neural networks and information theory and leads to a new measure for the performance of hidden units as well as output units. The measure, called conditional class entropy, not only allows existing networks to be judged but is also the basis of a new training algorithm with which an optimum number of neurons with optimum connecting weights can be found.
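A small sketch of how a conditional class entropy for a single hidden or output unit might be estimated, assuming the unit's activation is discretized into bins; the binning procedure and names are illustrative, not the paper's exact method:

```python
import numpy as np

def conditional_class_entropy(activations, labels, bins=10):
    """Estimate H(class | unit) for one unit by binning its activation and
    averaging the class entropy within each bin, weighted by bin occupancy."""
    activations = np.asarray(activations, dtype=float)
    labels = np.asarray(labels)
    edges = np.linspace(activations.min(), activations.max(), bins + 1)
    bin_idx = np.clip(np.digitize(activations, edges) - 1, 0, bins - 1)
    n = len(labels)
    h = 0.0
    for b in range(bins):
        mask = bin_idx == b
        if not mask.any():
            continue
        _, counts = np.unique(labels[mask], return_counts=True)
        p = counts / counts.sum()
        h += (mask.sum() / n) * (-np.sum(p * np.log2(p)))
    return h
```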
IEEE Transactions on Signal Processing, 2002
This paper investigates error-entropy minimization in adaptive systems training. We prove the equivalence between minimization of the error's Rényi entropy of order α and minimization of a Csiszár distance measure between the densities of the desired and system outputs. A nonparametric estimator for Rényi's entropy is presented, and it is shown that the global minimum of this estimator is the same as the actual entropy. The performance of the error-entropy-minimization criterion is compared with mean-square-error minimization in the short-term prediction of a chaotic time series and in nonlinear system identification.
Pattern Analysis and Applications, 2015
Most of the existing classification methods are aimed at minimization of empirical risk (through some simple point-based error measured with a loss function) with added regularization. We propose to approach this problem in a more information-theoretic way by investigating the applicability of entropy measures as a classification model objective function. We focus on quadratic Renyi's entropy and the connected Cauchy-Schwarz Divergence, which leads to the construction of Extreme Entropy Machines (EEM).
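The sketch below shows the Cauchy-Schwarz divergence between two class-conditional densities estimated with Gaussian Parzen windows on one-dimensional projections; it illustrates the objective mentioned here rather than the full EEM construction, and the names are illustrative:

```python
import numpy as np

def gaussian_ip(x, y, sigma):
    """Cross information potential: (1/NM) * sum_ij G(x_i - y_j; 2*sigma^2)."""
    diff = np.asarray(x, float).reshape(-1, 1) - np.asarray(y, float).reshape(1, -1)
    two_var = 2.0 * sigma ** 2
    return np.mean(np.exp(-diff ** 2 / (2 * two_var)) / np.sqrt(2 * np.pi * two_var))

def cauchy_schwarz_divergence(x_pos, x_neg, sigma=0.5):
    """D_CS(p, q) = -log( V(p,q) / sqrt(V(p,p) * V(q,q)) ) with Parzen/Gaussian
    estimates; larger values mean better-separated class-conditional densities."""
    v_pq = gaussian_ip(x_pos, x_neg, sigma)
    v_pp = gaussian_ip(x_pos, x_pos, sigma)
    v_qq = gaussian_ip(x_neg, x_neg, sigma)
    return -np.log(v_pq / np.sqrt(v_pp * v_qq))
```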
2004 IEEE International Joint Conference on Neural Networks (IEEE Cat. No.04CH37541), 2004
There are many existing supervised learning algorithms based on the technique of gradient descent. A problem with these algorithms is that they occasionally converge to undesired local minima. This paper builds on the theory of nonextensive statistical mechanics to develop a new adaptive gradient-based learning scheme that applies a sign-based weight adjustment, inspired by the Rprop algorithm, to a perturbed version of the original error function. The perturbations are characterized by the q entropic index of the nonextensive entropy, and their impact is controlled by means of regularization. This approach modifies the error landscape at each iteration, allowing the algorithm to explore previously unavailable regions of the error surface and possibly escape undesired local minima. The performance of the adaptive scheme is empirically evaluated using problems from the UCI Repository of Machine Learning Databases and other classic benchmarks.
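A loose sketch of the two ingredients described here, under stated assumptions: a sign-based (Rprop-inspired) weight adjustment, and a gradient perturbation drawn from a heavy-tailed, q-dependent distribution whose influence decays through a regularization factor. The paper characterizes its perturbations differently; this is only illustrative:

```python
import numpy as np

def sign_update(weights, grads, step=0.01):
    """Rprop-inspired adjustment: only the sign of each partial derivative is used,
    decoupling the step size from the gradient magnitude."""
    return weights - step * np.sign(grads)

def perturbed_grads(grads, epoch, q=1.5, lam0=0.1, decay=0.99, rng=None):
    """Hypothetical perturbation: q-Gaussian noise (via its Student-t equivalent,
    df = (3-q)/(q-1) for 1 < q < 3) added to the gradient, with a decaying
    regularization factor so early epochs explore and later epochs settle."""
    rng = rng or np.random.default_rng(0)
    lam = lam0 * decay ** epoch
    df = max((3.0 - q) / (q - 1.0), 1.0)
    noise = rng.standard_t(df, size=np.shape(grads))
    return grads + lam * noise
```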
IEEE Transactions on Signal Processing, 2003
Recently, we have proposed the minimum error entropy (MEE) criterion as an information theoretic alternative to the widely used mean square error criterion in supervised adaptive system training. For this purpose, we have formulated a nonparametric estimator for Renyi's entropy that employs Parzen windowing. Mathematical investigation of the proposed entropy estimator revealed interesting insights about the process of information theoretical learning. This new estimator and the associated criteria have been applied to the supervised and unsupervised training of adaptive systems in a wide range of problems successfully. In this paper, we analyze the structure of the MEE performance surface around the optimal solution, and we derive the upper bound for the step size in adaptive linear neuron (ADALINE) training with the steepest descent algorithm using MEE. In addition, the effects of the entropy order and the kernel size in Parzen windowing on the shape of the performance surface and the eigenvalues of the Hessian at and around the optimal solution are investigated. Conclusions from the theoretical analyses are illustrated through numerical examples.
Neural Networks, IEEE …, 1999
2014
There are many methods for determining classification accuracy. In this paper, the significance of the entropy of training signatures in classification is shown. The entropy of training signatures of a raw digital image represents the heterogeneity of the brightness values of the pixels in different bands. This implies that an image comprising a homogeneous lu/lc category will be associated with nearly the same reflectance values, resulting in a very low entropy value. On the other hand, an image characterized by the occurrence of diverse lu/lc categories will consist of largely differing reflectance values, due to which the entropy of such an image would be relatively high. This concept leads to analyses of classification accuracy. Although entropy has been used many times in RS and GIS, its use in the determination of classification accuracy is a new approach.
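For illustration, a short sketch of the underlying quantity: the Shannon entropy of the brightness-value histogram of one image band, which is low for a homogeneous lu/lc patch and high for a mixed one. Function and parameter names are illustrative:

```python
import numpy as np

def signature_entropy(band, bins=256):
    """Shannon entropy (bits) of the brightness-value histogram of one band,
    assuming integer brightness values in [0, bins)."""
    hist, _ = np.histogram(np.ravel(band), bins=bins, range=(0, bins))
    p = hist / hist.sum()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))
```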