2012, Entropy
The minimum error entropy (MEE) criterion has been receiving increasing attention due to its promising prospects for applications in signal processing and machine learning. In the context of Bayesian estimation, the MEE criterion seeks an estimate of a random variable, based on another random variable, such that the entropy of the estimation error is minimized. Several theoretical results on this topic have been reported. In this work, we present some further results on MEE estimation. The contributions are twofold: (1) we extend a recent result on the minimum entropy of a mixture of unimodal and symmetric distributions to a more general case, and prove that if the conditional distributions are generalized uniformly dominated (GUD), the dominant alignment is the MEE estimator; (2) we show by examples that the MEE estimator (not limited to singular cases) may be non-unique even if the error distribution is restricted to be zero-mean (unbiased).
Entropy, 2014
The minimum error entropy (MEE) criterion has been successfully used in fields such as parameter estimation, system identification, and supervised machine learning. In general, there is no explicit expression for the optimal MEE estimate unless certain constraints are imposed on the conditional distribution. A recent paper proved that if the conditional density is conditionally symmetric and unimodal (CSUM), then the optimal MEE estimate (with Shannon entropy) equals the conditional median. In this study, we extend this result to generalized MEE estimation, where the optimality criterion is the Rényi entropy or, equivalently, the α-order information potential (IP).
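The equivalence between Rényi entropy and the α-order information potential mentioned above can be checked numerically. The sketch below (our own illustration, not code from the paper) computes the IP of a Gaussian density by quadrature and recovers the Rényi entropy via H_α = log(V_α)/(1−α), comparing against the known closed form for a Gaussian:

```python
import numpy as np

def renyi_entropy_from_ip(p, x, alpha):
    """Rényi entropy of order alpha via the information potential
    V_alpha = ∫ p(x)^alpha dx, using H_alpha = log(V_alpha) / (1 - alpha)."""
    dx = x[1] - x[0]
    ip = np.sum(p ** alpha) * dx          # α-order information potential
    return np.log(ip) / (1.0 - alpha)

# Standard Gaussian density on a wide grid.
sigma, alpha = 1.0, 2.0
x = np.linspace(-10.0, 10.0, 20001)
p = np.exp(-x**2 / (2 * sigma**2)) / np.sqrt(2 * np.pi * sigma**2)

h_numeric = renyi_entropy_from_ip(p, x, alpha)
# Closed form for a Gaussian: H_alpha = 0.5*log(2*pi*sigma^2) - log(alpha) / (2*(1-alpha))
h_exact = 0.5 * np.log(2 * np.pi * sigma**2) - np.log(alpha) / (2 * (1 - alpha))
print(h_numeric, h_exact)
```

Minimizing the Rényi entropy of the error is thus the same as maximizing (for α > 1) or minimizing (for α < 1) the error's information potential.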
IEEE transactions on neural networks and learning systems, 2016
The minimum error entropy (MEE) criterion is an important and highly effective optimization criterion in information theoretic learning (ITL). For regression problems, MEE aims at minimizing the entropy of the prediction error so that the estimated model preserves as much information of the data-generating system as possible. In many real-world applications, the MEE estimator can significantly outperform the well-known minimum mean square error (MMSE) estimator and shows strong robustness to noise, especially when the data are contaminated by non-Gaussian (multimodal, heavy-tailed, discrete-valued, and so on) noises. In this brief, we present some theoretical results on the robustness of MEE. For a one-parameter linear errors-in-variables (EIV) model and under some conditions, we derive a region that contains the MEE solution, which suggests that the MEE estimate can be very close to the true value of the unknown parameter even in the presence of arbitrarily large outliers in both input and output…
Information Sciences, 1994
The principle of minimum error entropy estimation, as found in the work of Weidemann and Stear, is reformulated as a problem of finding optimum locations of probability densities in a given mixture such that the resulting (differential) entropy is minimized. New results concerning the entropy lower bound are derived. Continuity of the entropy and attainment of the minimum entropy are proved in the case where the mixture is finite. Some other examples and situations, in particular that of symmetric unimodal densities, are studied in more detail.
IEEE Access
The past decade has seen a rapid application of information theoretic learning (ITL) criteria in robust signal processing and machine learning problems. In the ITL literature it is generally observed that, under non-Gaussian assumptions, especially when the data are corrupted by heavy-tailed or multimodal non-Gaussian distributions, information theoretic criteria such as minimum error entropy (MEE) outperform second-order statistical ones. The objective of this research is to investigate this superior performance of the MEE criterion against that of minimum mean square error (MSE). Having found similar results for MEE- and MSE-based methods in the non-Gaussian environment under particular conditions, we seek a precise demarcation between this occasional similarity and occasional outperformance. Based on the theoretical findings, we reveal a better touchstone for the outperformance of MEE versus MSE.
Entropy
Entropy and relative entropy measures play a crucial role in mathematical information theory. The relative entropies are also widely used in statistics under the name of divergence measures, which link these two fields of science through the minimum divergence principle. Divergence measures are popular among statisticians as many of the corresponding minimum divergence methods lead to robust inference in the presence of outliers in the observed data; examples include the φ-divergence, the density power divergence, the logarithmic density power divergence and the recently developed family of logarithmic super divergence (LSD). In this paper, we present an alternative information theoretic formulation of the LSD measures as a two-parameter generalization of the relative α-entropy, which we refer to as the general (α, β)-entropy. We explore its relation with various other entropies and divergences, which also generates a two-parameter extension of the Rényi entropy measure as a by-product. This paper is primarily focused on the geometric properties of the relative (α, β)-entropy, or the LSD measures; we prove their continuity and convexity in both arguments, along with an extended Pythagorean relation under a power transformation of the domain space. We also derive a set of sufficient conditions under which the forward and the reverse projections of the relative (α, β)-entropy exist and are unique. Finally, we briefly discuss the potential applications of the relative (α, β)-entropy or the LSD measures in statistical inference, in particular for robust parameter estimation and hypothesis testing. Our results on the reverse projection of the relative (α, β)-entropy establish, for the first time, the existence and uniqueness of the minimum LSD estimators. Numerical illustrations are also provided for the problem of estimating the binomial parameter.
Nucleation and Atmospheric Aerosols, 2015
The main objective of this tutorial article is first to review the main inference tools using the Bayesian approach, entropy, information theory, and their corresponding geometries. This review is focused mainly on the ways these tools have been used in data, signal, and image processing. After a short introduction of the different quantities related to the Bayes rule, the entropy and the Maximum Entropy Principle (MEP), relative entropy and the Kullback-Leibler divergence, and Fisher information, we study their use in different fields of data and signal processing such as: entropy in source separation, Fisher information in model order selection, different maximum entropy based methods in time series spectral estimation and, finally, general linear inverse problems.
Evolutionary Computation, 2005
Estimation of Distribution Algorithms (EDAs) have been proposed as an extension of genetic algorithms. In this paper we explain the relationship of EDAs to algorithms developed in statistics, artificial intelligence, and statistical physics. The major design issues are discussed within a general interdisciplinary framework. It is shown that maximum entropy approximations play a crucial role. All proposed algorithms try to minimize the Kullback-Leibler divergence (KLD) between the unknown distribution p(x) and a class q(x) of approximations. However, the Kullback-Leibler divergence is not symmetric. Approximations which suppose that the function to be optimized is additively decomposed (ADF) minimize KLD(q||p); the methods which learn the approximate model from data minimize KLD(p||q). This minimization is identical to maximizing the log-likelihood. In the paper three classes of algorithms are discussed. FDA uses the ADF to compute an approximate factorization of the unknown distribution…
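The asymmetry of the Kullback-Leibler divergence noted in the abstract is easy to verify directly. The following sketch (our own toy example with arbitrary distributions, not from the paper) computes KLD(p||q) and KLD(q||p) for two discrete distributions and shows that the two values differ:

```python
import numpy as np

def kld(p, q):
    """Kullback-Leibler divergence KLD(p||q) for discrete distributions."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    mask = p > 0                      # terms with p_i = 0 contribute 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

# Two discrete distributions over the same support (illustrative values).
p = np.array([0.7, 0.2, 0.1])
q = np.array([0.3, 0.3, 0.4])

forward = kld(p, q)   # KLD(p||q), minimized when fitting q to data from p
reverse = kld(q, p)   # KLD(q||p), minimized by ADF-style approximations
print(forward, reverse)  # the two values differ: KLD is not symmetric
```

Which direction is minimized matters in practice: minimizing KLD(p||q) spreads q over all modes of p, while minimizing KLD(q||p) tends to lock q onto a single mode.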
Information-theoretic learning often requires the use of the probability density function (pdf), entropy, or mutual information. In this appendix, we provide a brief overview of some efficient methods for estimating the pdf as well as the entropy function. The pdf and entropy estimators discussed here are practically useful because of their simplicity and their basis in sample statistics. For simplicity of discussion, we restrict our attention to continuous, real-valued univariate random variables, for which the estimators of the pdf and its associated entropy are sought. Definition D.1 A real-valued Lebesgue-integrable function p(x) (x ∈ R) is called a pdf if it satisfies F(x) = ∫_{−∞}^{x} p(t) dt, where F(x) is a cumulative probability distribution function. A pdf is everywhere nonnegative and its integral from −∞ to +∞ equals 1; namely, p(x) ≥ 0 and ∫_{−∞}^{+∞} p(x) dx = 1. Definition D.2 Given the pdf of a continuous random variable x, its differential Shannon entropy is defined as H(x) = E[−log p(x)] = −∫_{−∞}^{+∞} p(x) log p(x) dx.
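A minimal plug-in illustration of Definition D.2 (the histogram estimator and sample sizes below are our own choices, not taken from the appendix): estimate the pdf with a histogram, then substitute it into the differential-entropy formula.

```python
import numpy as np

def histogram_entropy(samples, bins=50):
    """Plug-in estimate of differential entropy: estimate the pdf with a
    histogram, then compute H ≈ -Σ p_i log(p_i) + log(Δ), where p_i are
    bin probabilities and Δ is the bin width."""
    counts, edges = np.histogram(samples, bins=bins)
    width = edges[1] - edges[0]
    probs = counts / counts.sum()
    probs = probs[probs > 0]                    # empty bins contribute 0
    return float(-np.sum(probs * np.log(probs)) + np.log(width))

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=100_000)
h_est = histogram_entropy(x)
h_true = 0.5 * np.log(2 * np.pi * np.e)         # ≈ 1.4189 for N(0, 1)
print(h_est, h_true)
```

For a standard Gaussian the estimate should land close to the closed-form value 0.5·log(2πe); the residual gap comes from binning and finite-sample bias.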
Journal of Statistical Planning and Inference, 1986
The problem considered is simultaneous estimation of scale parameters and their reciprocals from p independent gamma distributions under a scale-invariant loss function first introduced in James and Stein (1961). Under mild restrictions on the shape parameters, the best scale-invariant estimators are shown to be admissible for p = 2. For p ≥ 3, a general technique is developed for improving upon the best scale-invariant estimators. Improvement on the generalized Bayes estimators of a vector involving certain powers of the scale parameter is also obtained.
The problem of estimation of density functionals like entropy and mutual information has received much attention in the statistics and information theory communities. A large class of estimators of functionals of the probability density suffer from the curse of dimensionality, wherein the mean squared error (MSE) decays increasingly slowly as a function of the sample size T as the dimension d of the samples increases. In particular, the rate is often glacially slow, of order O(T^(−γ/d)), where γ > 0 is a rate parameter. Examples of such estimators include kernel density estimators, k-nearest neighbor (k-NN) density estimators, k-NN entropy estimators, intrinsic dimension estimators and other examples. In this paper, we propose a weighted affine combination of an ensemble of such estimators, where optimal weights can be chosen such that the weighted estimator converges at a much faster, dimension-invariant rate of O(T^(−1)). Furthermore, we show that these optimal weights can be determined by solving a convex optimization problem which can be performed offline and does not require training data. We illustrate the superior performance of our weighted estimator for two important applications: (i) estimating the Panter-Dite distortion-rate factor and (ii) estimating the Shannon entropy for testing the probability distribution of a random sample.
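The weighted-ensemble idea can be sketched as follows. This is a toy illustration only: the base estimators are simple histogram plug-ins indexed by bin count, and the convex weights are uniform rather than the optimized weights the paper obtains from its offline convex program.

```python
import numpy as np

def hist_entropy(samples, bins):
    """Simple histogram plug-in estimate of differential entropy."""
    counts, edges = np.histogram(samples, bins=bins)
    width = edges[1] - edges[0]
    probs = counts / counts.sum()
    probs = probs[probs > 0]
    return float(-np.sum(probs * np.log(probs)) + np.log(width))

rng = np.random.default_rng(1)
x = rng.normal(size=50_000)

# Ensemble of base estimators indexed by bin count; the paper chooses the
# weights by convex optimization, here we simply use uniform convex weights.
bin_counts = [20, 40, 80]
weights = np.full(len(bin_counts), 1.0 / len(bin_counts))  # sum to 1
estimates = np.array([hist_entropy(x, b) for b in bin_counts])
h_ensemble = float(weights @ estimates)
print(h_ensemble)  # near the true N(0,1) entropy 0.5*log(2*pi*e)
```

The point of the construction is that a suitable affine combination can cancel the leading bias terms of the individual estimators, which is what yields the faster O(T^(−1)) MSE rate.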