1998, IEEE Transactions on Information Theory
While Kolmogorov complexity is the accepted absolute measure of information content in an individual finite object, a similarly absolute notion is needed for the information distance between two individual objects, for example, two pictures. We give several natural definitions of a universal information metric, based on length of shortest programs for either ordinary computations or reversible (dissipationless) computations. It turns out that these definitions are equivalent up to an additive logarithmic term. We show that the information distance is a universal cognitive similarity distance. We investigate the maximal correlation of the shortest programs involved, the maximal uncorrelation of programs (a generalization of the Slepian-Wolf theorem of classical information theory), and the density properties of the discrete metric spaces induced by the information distances. A related distance measures the amount of nonreversibility of a computation. Using the physical theory of reversible computation, we give an appropriate (universal, anti-symmetric, and transitive) measure of the thermodynamic work required to transform one object into another object by the most efficient process. Information distance between individual objects is needed in pattern recognition where one wants to express effective notions of "pattern similarity" or "cognitive similarity" between individual objects and in thermodynamics of computation where one wants to analyse the energy dissipation of a computation from a particular input to a particular output.
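For reference, a sketch of the central definition behind this abstract (up to the additive logarithmic terms mentioned above), where K(·|·) denotes conditional Kolmogorov complexity; the symbol E is used here generically for the "max distance":

```latex
% Information distance between objects x and y, up to O(log) additive terms:
E(x, y) \;=\; \max\{\, K(x \mid y),\; K(y \mid x) \,\}
```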
ArXiv, 2020
We consider the notion of information distance between two objects x and y introduced by Bennett, Gacs, Li, Vitanyi, and Zurek [1] as the minimal length of a program that computes x from y as well as y from x, and study different versions of this notion. It was claimed by Mahmud [11] that the prefix version of information distance equals max(K(x|y), K(y|x)) + O(1) (this equality with logarithmic precision was one of the main results of the paper by Bennett, Gacs, Li, Vitanyi, and Zurek). We show that this claim is false, but it does hold if the information distance is at least superlogarithmic.
Journal of Computer and System Sciences, 2011
Normalized information distance (NID) uses the theoretical notion of Kolmogorov complexity, which for practical purposes is approximated by the length of the compressed version of the file involved, using a real-world compression program. This practical application is called 'normalized compression distance' and it is trivially computable. It is a parameter-free similarity measure based on compression, and is used in pattern recognition, data mining, phylogeny, clustering, and classification. The complexity properties of its theoretical precursor, the NID, have been open. We show that the NID is neither upper semicomputable nor lower semicomputable up to any reasonable precision.
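A minimal sketch of the practical 'normalized compression distance' described above, assuming Python's zlib as a stand-in for the real-world compressor; the formula is the standard NCD, and the example strings are purely illustrative:

```python
import zlib

def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two byte strings.

    Approximates the (uncomputable) normalized information distance by
    replacing Kolmogorov complexity with the length of the zlib-compressed
    string. Any real-world compressor can be substituted for zlib.
    """
    cx = len(zlib.compress(x))
    cy = len(zlib.compress(y))
    cxy = len(zlib.compress(x + y))
    return (cxy - min(cx, cy)) / max(cx, cy)

# Near-identical strings score close to 0, unrelated ones closer to 1.
print(ncd(b"the quick brown fox" * 20, b"the quick brown fox" * 19 + b"!"))
print(ncd(b"the quick brown fox" * 20, bytes(range(256)) * 2))
```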
2000
For many systems characterized as "complex/living/intelligent" the spatio-temporal patterns exhibited on different scales differ markedly from one another. For example, the biomass distribution of a human body "looks very different" depending on the spatial scale at which one examines that biomass. Conversely, the density patterns at different scales in "dead/simple" systems (e.g., gases, mountains, crystals) do not vary significantly from one another. Accordingly, we argue that the degrees of self-dissimilarity between the various scales with which a system is examined constitute a complexity "signature" of that system. Such signatures can be empirically measured for many real-world data sets concerning spatio-temporal densities, be they mass densities, species densities, or symbol densities. This allows one to compare the complexity signatures of wholly different kinds of systems (e.g., systems involving information density in a digital computer, vs. species densities in a rainforest, vs. capital density in an economy, etc.). Such signatures can also be clustered, to provide an empirically determined taxonomy of "kinds of systems" that share organizational traits. The precise measure of dissimilarity between scales that we propose is the amount of extra information on one scale beyond that which exists on a different scale. This "added information" is perhaps most naturally determined using a maximum entropy inference of the distribution of patterns at the second scale, based on the provided distribution at the first scale. We briefly discuss using our measure with other inference mechanisms (e.g., Kolmogorov complexity-based inference).
2006
Abstract-We introduce a definition of similarity based on Tversky's set-theoretic linear contrast model and on information-theoretic principles. The similarity measures the residual entropy with respect to a random object. This residual entropy similarity strongly captures context, which we conjecture is important for similarity-based statistical learning. Properties of the similarity definition are established and examples illustrate its characteristics. We show that a previously-defined information-theoretic similarity is also set-theoretic, and compare it to the residual entropy similarity. The similarity between random objects is also treated. I. INTRODUCTION Similarity definitions are important for a range of classification, clustering, and other pattern recognition tasks. A recent review of the issues in assessing similarity for pattern recognition is given in [1]. Many similarity-based pattern recognition solutions have used application-specific notions of similarity. Her...
IEEE Transactions on Information Theory, 2004
A new class of distances appropriate for measuring similarity relations between sequences, say one type of similarity per distance, is studied. We propose a new "normalized information distance," based on the noncomputable notion of Kolmogorov complexity, and show that it is in this class and it minorizes every computable distance in the class (that is, it is universal in that it discovers all computable similarities). We demonstrate that it is a metric and call it the similarity metric. This theory forms the foundation for a new practical tool. To evidence generality and robustness, we give two distinctive applications in widely divergent areas using standard compression programs like gzip and GenCompress. First, we compare whole mitochondrial genomes and infer their evolutionary history. This results in a first completely automatically computed whole mitochondrial phylogeny tree. Secondly, we fully automatically compute the language tree of 52 different languages.
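For reference, the normalized information distance proposed in this paper can be written as follows, up to the additive precision discussed there:

```latex
% Normalized information distance (the "similarity metric"):
\mathrm{NID}(x, y) \;=\; \frac{\max\{\, K(x \mid y),\; K(y \mid x) \,\}}{\max\{\, K(x),\; K(y) \,\}}
```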
1997
For systems usually characterized as complex/living/intelligent, the spatio-temporal patterns exhibited on different scales differ markedly from one another. (E.g., the biomass distribution of a human body looks very different depending on the spatial scale at which one examines that biomass.) Conversely, the density patterns at different scales in non-living/simple systems (e.g., gases, mountains, crystals) do not vary significantly from one another. Such self-dissimilarity can be empirically measured on almost any real-world data set involving spatio-temporal densities, be they mass densities, species densities, or symbol densities. Accordingly, taking a system's (empirically measurable) self-dissimilarity over various scales as a complexity "signature" of the system, we can compare the complexity signatures of wholly different kinds of systems (e.g., systems involving information density in a digital computer vs. systems involving species densities in a rainforest, vs. capital density in an economy etc.). Signatures can also be clustered, to provide an empirically determined taxonomy of kinds of systems that share organizational traits. Many of our candidate self-dissimilarity measures can also be calculated (or at least approximated) for physical models. The measure of dissimilarity between two scales that we finally choose is the amount of extra information on one of the scales beyond that which exists on the other scale. It is natural to determine this "added information" using a maximum entropy inference of the pattern at the second scale, based on the provided pattern at the first scale. We briefly discuss using our measure with other inference mechanisms (e.g., Kolmogorov complexity-based inference, fractal-dimension preserving inference, etc.).
Complexity, 2007
For many systems characterized as "complex" the patterns exhibited on different scales differ markedly from one another. For example, the biomass distribution in a human body "looks very different" depending on the scale at which one examines it. Conversely, the patterns at different scales in "simple" systems (e.g., gases, mountains, crystals) vary little from one scale to another. Accordingly, the degrees of self-dissimilarity between the patterns of a system at various scales constitute a complexity "signature" of that system. Here we present a novel quantification of self-dissimilarity. This signature can, if desired, incorporate a novel information-theoretic measure of the distance between probability distributions that we derive here. Whatever distance measure is chosen, our quantification of self-dissimilarity can be measured for many kinds of real-world data. This allows comparisons of the complexity signatures of wholly different kinds of systems (e.g., systems involving information density in a digital computer vs. species densities in a rain forest vs. capital density in an economy, etc.). Moreover, in contrast to many other suggested complexity measures, evaluating the self-dissimilarity of a system does not require one to already have a model of the system. These facts may allow self-dissimilarity signatures to be used as the underlying observational variables of an eventual overarching theory relating all complex systems. To illustrate self-dissimilarity, we present several numerical experiments. In particular, we show that the underlying structure of the logistic map is picked out by the self-dissimilarity signature of time series produced by that map.
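These papers quantify self-dissimilarity via a maximum-entropy extrapolation between scales. The sketch below is only a crude one-dimensional illustration of that idea, not the authors' construction: it measures, in bits, how much the observed pair (scale-2) statistics of a symbol string exceed what a maximum-entropy (independent-symbol) extrapolation of its scale-1 statistics predicts. Function names and example strings are assumptions for illustration.

```python
from collections import Counter
from math import log2

def block_dist(s: str, k: int) -> dict:
    """Empirical distribution of length-k blocks (non-overlapping windows)."""
    blocks = [s[i:i + k] for i in range(0, len(s) - k + 1, k)]
    n = len(blocks)
    return {b: c / n for b, c in Counter(blocks).items()}

def self_dissimilarity(s: str) -> float:
    """Extra information (bits) in the scale-2 statistics beyond what a
    maximum-entropy (independent) extrapolation of the scale-1 statistics
    predicts, measured as a KL divergence."""
    p1 = block_dist(s, 1)
    p2 = block_dist(s, 2)
    kl = 0.0
    for pair, p in p2.items():
        q = p1.get(pair[0], 0.0) * p1.get(pair[1], 0.0)  # max-entropy prediction
        if p > 0 and q > 0:
            kl += p * log2(p / q)
    return kl

print(self_dissimilarity("ab" * 500))                 # strong structure across scales
print(self_dissimilarity("abbaabababbbaaab" * 60))    # weaker structure
```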
International Journal of Modern Nonlinear Theory and Application, 2014
This paper presents an information theoretic approach to the concept of intelligence in the computational sense. We introduce a probabilistic framework from which computational intelligence is shown to be an entropy-minimizing process at the local level. Using this new scheme, we develop a simple data-driven clustering example and discuss its applications.
Canadian Conference on Electrical and Computer Engineering 2004 (IEEE Cat. No.04CH37513), 2004
After reviewing unnormalized and normalized information distances based on incomputable notions of Kolmogorov complexity, we discuss how Kolmogorov complexity can be approximated by data compression algorithms. We argue that optimal algorithms for data compression with side information can be successfully used to approximate the normalized distance. Next, we discuss an alternative information distance, which is based on the relative entropy rate (also known as Kullback-Leibler divergence), and compression-based algorithms for its estimation. Based on available biological and linguistic data, we arrive at the unexpected conclusion that in Bioinformatics and Computational Linguistics this alternative distance is more relevant and important than the ones based on Kolmogorov complexity.
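For reference, the relative entropy rate between two stationary sources P and Q referred to above is the standard quantity below (when the limit exists):

```latex
% Relative entropy (Kullback-Leibler divergence) rate between sources P and Q:
D(P \,\|\, Q) \;=\; \lim_{n \to \infty} \frac{1}{n} \sum_{x^n} P(x^n) \log \frac{P(x^n)}{Q(x^n)}
```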
Information Theory and Statistical Learning, 2009
The normalized information distance is a universal distance measure for objects of all kinds. It is based on Kolmogorov complexity and thus uncomputable, but there are ways to utilize it. First, compression algorithms can be used to approximate the Kolmogorov complexity if the objects have a string representation. Second, for names and abstract concepts, page count statistics from the World Wide Web can be used. These practical realizations of the normalized information distance can then be applied to machine learning tasks, especially clustering, to perform feature-free and parameter-free data mining. This chapter discusses the theoretical foundations of the normalized information distance and both practical realizations. It presents numerous examples of successful real-world applications based on these distance measures, ranging from bioinformatics to music clustering to machine translation.
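For the web-based realization mentioned above, the page-count statistics enter through the normalized web (Google) distance; for reference, with f(x) the number of pages containing x, f(x, y) the number containing both, and N the number of indexed pages:

```latex
% Normalized web distance from page counts (a sketch of the standard form):
\mathrm{NWD}(x, y) \;=\; \frac{\max\{\log f(x),\, \log f(y)\} - \log f(x, y)}{\log N - \min\{\log f(x),\, \log f(y)\}}
```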
We present a new similarity measure based on information-theoretic measures which is superior to the Normalized Compression Distance for clustering problems and inherits the useful properties of conditional Kolmogorov complexity.
2009
According to the transformational approach to similarity, two objects are judged to be more similar the simpler the transformation of one of the object representations into the other. This approach draws inspiration from the mathematical theory of Kolmogorov complexity, but otherwise remains an informal theory to this day. In this paper we investigate several different ways in which the informal theory of transformational similarity can be understood, providing a formalization for each possible reading. We then study the computational (in)tractability of each formalization for a variety of parameter settings. Our results have both theoretical and empirical implications for transformational approaches to similarity.
This work contributes to the design and understanding of similarity and dissimilarity in AI, in order to increase their general utility. A formal definition for each concept is proposed, together with a set of fundamental properties. A main body of results is compiled by applying transformation functions. The behavior of these properties under the transformations is studied and shown to be an important consideration. Several examples illustrate the proposed framework.
Proceedings DCC 2000. Data Compression Conference
Modern information theory is founded on the ideas of Hartley and Shannon, amongst others. From a practitioner's standpoint, Shannon's probabilistic framework carries certain impediments for the practical measurement of information, such as requiring a priori knowledge of a source's characteristics. Moreover, such a statistical formulation of entropy is an asymptotic limit, meaningful only within the context of an ensemble of messages. It thus fails to address the notion of an individual string having information content in and of itself. However, in 1953, Cherry [1] demonstrated that Shannon's entropy could be viewed equivalently as a measure of the average number of selections required to identify each message symbol from the alphabet. Here the terminology contrasts with Shannon's probabilistic formulation, with the process of counting selection steps appearing to be meaningful for individual, isolated, finite strings. We explore this alternative approach in the context of a recursive hierarchical pattern copying (RHPC) algorithm, which we use to measure the complexity of finite strings in terms of the number of steps required to recursively construct the string from its alphabet. From this we compute an effective rate of steps-per-symbol required for linearly constructing the string. By Cherry's interpretation of Shannon's entropy, we infer this as giving asymptotic equivalence between the two approaches, but perhaps the real significance of this new way to measure information is its applicability and usefulness in evaluating individual finite strings.
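The RHPC algorithm itself is not specified in this abstract; the sketch below merely illustrates the general idea of measuring a string by construction steps, using an LZ78-style parse in which each step copies a previously built pattern and appends one new symbol. It is an illustrative stand-in, not the authors' algorithm; names and example strings are assumptions.

```python
def construction_steps(s: str) -> int:
    """Count the steps needed to build s by repeatedly copying a previously
    seen pattern plus one new symbol (an LZ78-style parse; an illustrative
    stand-in for the paper's RHPC algorithm, not that algorithm itself)."""
    dictionary = {""}
    phrase = ""
    steps = 0
    for ch in s:
        if phrase + ch in dictionary:
            phrase += ch
        else:
            dictionary.add(phrase + ch)
            steps += 1          # one copy-and-extend step emits this phrase
            phrase = ""
    return steps + (1 if phrase else 0)

# Steps-per-symbol is lower for highly regular strings.
print(construction_steps("abracadabra" * 50) / (11 * 50))
print(construction_steps("abcdefghij" * 55) / (10 * 55))
```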
Modern Physics Letters A, 2011
We introduce a concept of distance between physical theories described by an action. The definition of the distance is based on the relative entropy. We briefly discuss potential physical applications.
2020
Recently, it has been argued that entropy can be a direct measure of complexity, where a smaller value of entropy indicates lower system complexity and a larger value indicates higher system complexity. We dispute this view and propose a universal measure of complexity that is based on Gell-Mann’s view of complexity. Our universal measure of complexity is based on a non-linear transformation of time-dependent entropy, in which the system state with the highest complexity is the most distant from all the states of the system of lesser or no complexity. We show that the most complex state is the optimally mixed state composed of pure states, i.e., of the most regular and the most disordered states that the space of states of a given system allows. A parsimonious paradigmatic example of the simplest system with a small and a large number of degrees of freedom is shown to support this methodology. Several important features of this universal measure are pointed out, especially its flexibili...
Software - Practice and Experience, 2010
Normalized information distance (NID) uses the theoretical notion of Kolmogorov complexity, which for practical purposes is approximated by the length of the compressed version of the file involved, using a real-world compression program. This practical application is called 'normalized compression distance' and it is trivially computable. It is a parameter-free similarity measure based on compression, and is used in pattern recognition, data mining, phylogeny, clustering, and classification.
In this paper, we introduce a new information-theoretic approach to study the complexity of an image. The new framework we present here is based on considering the information channel that goes from the histogram to the regions of the partitioned image, maximizing the mutual information. Image complexity has been related to the entropy of the image intensity histogram. This disregards the spatial distribution of pixels, as well as the fact that a complexity measure must take into account at what level one wants to describe an object. We define the complexity by using two measures which take into account the level at which the image is considered. One is the number of partitioning regions needed to extract a given ratio of information from the image. The other is the compositional complexity given by the Jensen-Shannon divergence of the partitioned image.
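A minimal sketch of the compositional-complexity idea described above, assuming a toy image already partitioned into regions with known normalized intensity histograms; the Jensen-Shannon divergence of the partition is the mixture entropy minus the size-weighted region entropies, and the region histograms below are illustrative assumptions:

```python
from math import log2

def entropy(hist):
    """Shannon entropy (bits) of a normalized histogram."""
    return -sum(p * log2(p) for p in hist if p > 0)

def js_divergence(region_hists, weights):
    """Jensen-Shannon divergence of the regions' intensity histograms,
    weighted by relative region size: H(mixture) - sum_i w_i H(hist_i)."""
    mixture = [sum(w * h[k] for w, h in zip(weights, region_hists))
               for k in range(len(region_hists[0]))]
    return entropy(mixture) - sum(w * entropy(h) for w, h in zip(weights, region_hists))

# Toy example: two equally sized regions with very different intensity histograms.
dark = [0.7, 0.2, 0.1, 0.0]
light = [0.0, 0.1, 0.2, 0.7]
print(js_divergence([dark, light], [0.5, 0.5]))   # high divergence: heterogeneous image
print(js_divergence([dark, dark], [0.5, 0.5]))    # zero: homogeneous image
```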