2000, IEEE Transactions on Information Theory
We consider a new class of information sources called word-valued sources in order to investigate coding algorithms based upon string parsing. A word-valued source is defined as a pair of an independent and identically distributed (i.i.d.) source with a countable alphabet and a function that maps each symbol into a finite sequence over a finite alphabet. A word-valued source is in general a nonstationary process with countably many states. If the word function is prefix-free, the entropy rate admits a simple expression and the Asymptotic Equipartition Property (AEP) holds.
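As a rough illustration of the model (the word function f, the distribution p, and the entropy-rate formula in the comments below are illustrative assumptions, not taken from the paper), a word-valued source can be simulated by drawing i.i.d. symbols and concatenating their images under a prefix-free word function:

```python
import random
from math import log2

# Hypothetical example: a 3-symbol i.i.d. source and a prefix-free word function.
p = {"a": 0.5, "b": 0.3, "c": 0.2}
f = {"a": "0", "b": "10", "c": "11"}

def emit(n_symbols, seed=0):
    """Concatenate the words f(X_1) f(X_2) ... for i.i.d. symbols X_i ~ p."""
    rng = random.Random(seed)
    syms = rng.choices(list(p), weights=list(p.values()), k=n_symbols)
    return "".join(f[s] for s in syms)

# With a prefix-free f, a natural candidate for the entropy rate of the binary output
# (bits per output symbol) is H(p) / E[|f(X)|]; this is consistent with, but not
# quoted from, the abstract's "simple expression".
H = -sum(q * log2(q) for q in p.values())
mean_len = sum(p[s] * len(f[s]) for s in p)
print(emit(10), H / mean_len)
```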
IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, 2003
Nishiara and Morita defined an i.i.d. word-valued source as a pair of an i.i.d. source with a countable alphabet and a function that transforms each symbol into a word over a finite alphabet. They showed the asymptotic equipartition property (AEP) of the i.i.d. word-valued source and discussed the relation with source coding algorithms based on a string-parsing approach. However, their model is restricted to the i.i.d. case, and no universal code for a class of word-valued sources is discussed. In this paper, we generalize the i.i.d. word-valued source to the ergodic word-valued source, defined by an ergodic source with a countable alphabet and a function from each symbol to a word. We show the existence of the entropy rate of the ergodic word-valued source and give a formula for it. Moreover, we prove the recurrence time theorem for the ergodic word-valued source with a finite alphabet. This result clarifies that the Ziv-Lempel code (ZL77 code) is universal for the ergodic word-valued source.
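The role of recurrence times can be illustrated numerically; the sketch below is a toy Python illustration (not the paper's construction) that computes the first return time R_n of the initial n-block, a quantity that recurrence time theorems of this kind relate to the entropy rate via log R_n / n:

```python
from math import log2

def recurrence_time(x, n):
    """Smallest k >= 1 such that x[k:k+n] equals the initial n-block x[:n],
    or None if no recurrence is found within the available data."""
    block = x[:n]
    for k in range(1, len(x) - n + 1):
        if x[k:k + n] == block:
            return k
    return None

# Toy usage on a long binary string (e.g. the output of a word-valued source).
x = "0110101101001101011010011010110100110101" * 50
for n in (4, 8, 12):
    R = recurrence_time(x, n)
    if R is not None:
        print(n, R, log2(R) / n)   # log2(R_n)/n is the quantity compared with the entropy rate
```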
IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, 2006
Recently, a word-valued source has been proposed as a new class of information source models. A word-valued source is regarded as a source with a probability distribution over a word set. Although a word-valued source is in general nonstationary, it has been proved that the entropy rate of the source exists and the Asymptotic Equipartition Property (AEP) holds when the word set of the source is prefix-free. However, when the word set is not prefix-free (non-prefix-free), only an upper bound on the entropy density rate for an i.i.d. word-valued source has been derived so far. In this paper, we newly derive a lower bound on the entropy density rate for an i.i.d. word-valued source with a finite non-prefix-free word set. Some numerical examples are then given in order to investigate the behavior of the bounds.
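The difficulty with a non-prefix-free word set is that one output sequence may be produced by several different word sequences, which is what makes the entropy density rate harder to pin down. A minimal Python sketch with an illustrative word set:

```python
def parses(s, words):
    """All ways of writing s as a concatenation of words from `words`."""
    if s == "":
        return [[]]
    out = []
    for w in words:
        if s.startswith(w):
            out += [[w] + rest for rest in parses(s[len(w):], words)]
    return out

# A non-prefix-free word set: "0" is a proper prefix of "01".
W = ["0", "01", "1"]
print(parses("011", W))   # [['0', '1', '1'], ['01', '1']] -- the output is ambiguous
```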
arXiv (Cornell University), 2022
We propose a generalization of the asymptotic equipartition property to discrete sources with an ambiguous alphabet, and prove that it holds for irreducible stationary Markov sources with an arbitrary distinguishability relation. Our definition is based on the limiting behavior of graph parameters appearing in a recent dual characterization of the Shannon capacity, evaluated at subgraphs of strong powers of the confusability graph induced on high-probability subsets. As a special case, our results give an information-theoretic interpretation of the graph entropy rate of such sources.
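For orientation on the graph-theoretic ingredients (not the paper's ambiguous-alphabet setting itself), the toy Python sketch below uses the 5-cycle C5 as a stand-in confusability graph and checks the classical fact that its independence number is 2 while its strong square contains an independent set of size 5:

```python
from itertools import combinations

def adj_c5(i, j):
    """Adjacency (confusability) in the 5-cycle C5."""
    return (i - j) % 5 in (1, 4)

def adj_strong(u, v):
    """Adjacency in the strong product C5 x C5: distinct vertex pairs that are
    equal or adjacent in every coordinate."""
    return u != v and all(a == b or adj_c5(a, b) for a, b in zip(u, v))

# Independence number of C5 itself, by brute force over all vertex subsets.
alpha = max(len(S) for r in range(1, 6) for S in combinations(range(5), r)
            if all(not adj_c5(i, j) for i, j in combinations(S, 2)))
print("alpha(C5) =", alpha)   # 2

# A size-5 independent set in the strong square, so alpha(C5 x C5) >= 5.
S = [(i, (2 * i) % 5) for i in range(5)]
print(all(not adj_strong(u, v) for u, v in combinations(S, 2)))   # True
```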
2006
We prove a coding theorem for the class of variable-to-fixed length codes and memoryless processes using a generalized version of the average word length and Rényi's entropy. Further, a generalized version of Tunstall's algorithm is introduced and its optimality is proved.
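For reference, the classical (non-generalized) Tunstall construction repeatedly expands the most probable dictionary word until the dictionary is full; a minimal Python sketch, assuming a memoryless source with known symbol probabilities:

```python
import heapq

def tunstall(probs, M):
    """Build a Tunstall (variable-to-fixed length) dictionary with at most M words.

    probs: symbol -> probability for a memoryless source; M >= alphabet size.
    Returns a complete, prefix-free list of parse words; each word is then mapped
    to a fixed-length index.
    """
    K = len(probs)
    assert M >= K
    heap = [(-q, a) for a, q in probs.items()]   # max-heap via negated probabilities
    heapq.heapify(heap)
    n_leaves = K
    while n_leaves + K - 1 <= M:                 # expanding a leaf adds K-1 net leaves
        neg_q, w = heapq.heappop(heap)
        for a, q in probs.items():
            heapq.heappush(heap, (neg_q * q, w + a))
        n_leaves += K - 1
    return sorted(w for _, w in heap)

print(tunstall({"0": 0.7, "1": 0.3}, 4))   # ['000', '001', '01', '1']
```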
IEEE Transactions on Information Theory, 1981
Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete, 1969
The aim of this paper is to provide a mathematically rigorous and sufficiently general treatment of the basic information-theoretic problems concerning sources with symbols of different costs and noiseless coding in a general sense. The main new concepts defined in this paper are the entropy rate (entropy per unit cost) of a source with respect to a stochastic cost scale and the encoding (in particular, decodable encoding) of a source in a general sense. On the basis of these concepts, we prove some general theorems on the relation of entropy rates with respect to different cost scales and on the effect of encoding on the entropy rate. In particular, the "principle of conservation of entropy" and the "noiseless coding theorem" are proved under very general conditions.
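For orientation only (the paper works with general stochastic cost scales), the simplest special case of entropy per unit cost, for a memoryless source X with deterministic positive symbol costs c(x), takes the familiar form:

```latex
% Entropy per unit cost, memoryless special case (illustrative; the paper's
% definition covers general stochastic cost scales).
H_c(X) \;=\; \frac{H(X)}{\mathbb{E}[c(X)]}
       \;=\; \frac{-\sum_{x} p(x)\log p(x)}{\sum_{x} p(x)\,c(x)} .
```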
The minimum average number of bits needed to describe a random variable is its entropy. This supposes knowledge of the distribution of the random variable. On the other hand, universal compression supposes that the distribution of the random variable, while unknown, belongs to a known set P of distributions. Such universal descriptions for the random variable are agnostic to the identity of the distribution in P. But because they are not matched exactly to the underlying distribution of the random variable, the average number of bits they use is higher, and the excess over the entropy is the redundancy. This formulation is fundamental to problems not just in compression, but also in estimation and prediction, and has a wide variety of applications from language modeling to insurance. In this paper, we study the redundancy of universal encodings of strings generated by independent identically distributed (i.i.d.) sampling from a set P of distributions over a countable support. We first show that if describing a single sample from P incurs finite redundancy, then P is tight, but that the converse does not always hold. If a single sample can be described with finite worst-case regret (a more stringent formulation than redundancy above), then it is known that describing length-n i.i.d. samples only incurs a diminishing (in n) redundancy per symbol as n increases. However, we show it is possible that a collection P incurs finite redundancy, yet description of length-n i.i.d. samples incurs a constant redundancy per symbol encoded. We then show a sufficient condition on P such that length-n i.i.d. samples will incur diminishing redundancy per symbol encoded.
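The two figures of merit contrasted here are usually defined as follows for a code with length function ℓ over the common support (single-letter versions; the block versions replace X by X^n):

```latex
% Average-case redundancy and worst-case regret of a length function \ell
% with respect to a collection \mathcal{P} (standard definitions).
\bar R(\ell, \mathcal{P}) \;=\; \sup_{p \in \mathcal{P}}
    \Bigl( \mathbb{E}_p[\ell(X)] - H(p) \Bigr),
\qquad
\hat R(\ell, \mathcal{P}) \;=\; \sup_{p \in \mathcal{P}} \; \sup_{x}
    \Bigl( \ell(x) - \log \tfrac{1}{p(x)} \Bigr).
```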
The problem of predicting a sequence x_1, x_2, ... generated by a discrete source with unknown statistics is considered. Each letter x_{t+1} is predicted using only the information in the word x_1 x_2 ... x_t. This problem is of great importance for data compression, because of its use in estimating probability distributions for PPM algorithms and other adaptive codes. On the other hand, such prediction is a classical problem which has received much attention. Its history can be traced back to Laplace. We address the problem where the sequence is generated by an i.i.d. source with some large (or even infinite) alphabet. A method is presented for which the redundancy of the code goes to 0 even for an infinite alphabet.
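For context, the Laplace (add-one) predictor referred to in the historical remark is the classical finite-alphabet baseline; a brief Python sketch (the paper's method for large or infinite alphabets is different and not reproduced here):

```python
from collections import Counter

def laplace_predictor(past, alphabet_size):
    """Classical add-one (Laplace) estimator for a known finite alphabet:
    P(next = a | past) = (count of a in past + 1) / (len(past) + alphabet_size)."""
    counts = Counter(past)
    t = len(past)
    return lambda a: (counts[a] + 1) / (t + alphabet_size)

p = laplace_predictor("abracadabra", alphabet_size=26)
print(p("a"), p("z"))   # a frequent letter vs. an unseen one
```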
Entropy, 2014
The minimum expected number of bits needed to describe a random variable is its entropy, assuming knowledge of the distribution of the random variable. On the other hand, universal compression describes data supposing that the underlying distribution is unknown, but that it belongs to a known set P of distributions. However, since universal descriptions are not matched exactly to the underlying distribution, the number of bits they use on average is higher, and the excess over the entropy is the redundancy. In this paper, we study the redundancy incurred by the universal description of strings of positive integers (Z^+), the strings being generated independently and identically distributed (i.i.d.) according to an unknown distribution over Z^+ in a known collection P. We first show that if describing a single symbol incurs finite redundancy, then P is tight, but that the converse does not always hold. If a single symbol can be described with finite worst-case regret (a more stringent formulation than redundancy above), then it is known that describing length-n i.i.d. strings incurs only vanishing (to zero) redundancy per symbol as n increases. On the contrary, we show it is possible that the description of a single symbol from an unknown distribution of P incurs finite redundancy, yet the description of length-n i.i.d. strings incurs a constant (> 0) redundancy per symbol encoded. We then show a sufficient condition on single-letter marginals such that length-n i.i.d. samples will incur vanishing redundancy per symbol encoded.
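A concrete classical way to describe positive integers with a single fixed code is the Elias gamma code, whose codeword for n uses about 2·log2(n) + 1 bits; it is sketched below only as a reference point (the paper studies redundancy relative to a collection P, not this particular code):

```python
def elias_gamma(n):
    """Elias gamma codeword for a positive integer n: floor(log2 n) zeros
    followed by the binary expansion of n (which begins with a 1)."""
    assert n >= 1
    b = bin(n)[2:]
    return "0" * (len(b) - 1) + b

for n in (1, 2, 3, 4, 9):
    print(n, elias_gamma(n))   # 1 -> '1', 2 -> '010', 3 -> '011', 4 -> '00100', 9 -> '0001001'
```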
IEEE Transactions on Information Theory, 1987
An expression is obtained for the optimum distortion theoretically attainable when an information source with a finite alphabet is encoded at a fixed rate with respect to a single-letter fidelity criterion. The expression is demonstrated by means of an appropriate coding theorem and converse. This new result generalizes the coding theorem of Shannon for stationary ergodic sources, the coding theorem of Gray-Davisson for stationary nonergodic sources, that of Gray-Saadat for asymptotically mean stationary sources, and that of Ziv for an individual sequence.
This article presents the calculation of the entropy of a system with a Zipfian distribution. It shows that a communication system tends to present an exponent value close to, but greater than, one. This choice maximizes entropy while enabling the retention of a feasible and growing lexicon. This result is in accordance with what is observed in natural languages and with the balance between the speaker's and listener's communication efforts. On the other hand, the entropy of the communicating source is very sensitive to the exponent value as well as to the length of the observable data. Slight deviations in these parameters may lead to very different entropy measurements. A comparison of the proposed estimation with the entropy measure of written texts yields errors on the order of 0.3 bits and 0.05 bits for non-smoothed and smoothed distributions, respectively.
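The sensitivity described above is easy to reproduce numerically; the Python sketch below computes the entropy of a truncated Zipf distribution for a few exponents and truncation lengths (the particular values are illustrative):

```python
from math import log2

def zipf_entropy(alpha, N):
    """Entropy in bits of the Zipf distribution p(k) proportional to k**(-alpha),
    truncated to ranks 1..N."""
    weights = [k ** (-alpha) for k in range(1, N + 1)]
    Z = sum(weights)
    return -sum((w / Z) * log2(w / Z) for w in weights)

# Small changes in the exponent or in the truncation length shift the entropy noticeably.
for alpha in (1.01, 1.1, 1.5, 2.0):
    print(alpha, round(zipf_entropy(alpha, 10_000), 3), round(zipf_entropy(alpha, 100_000), 3))
```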
We prove an asymptotic relationship between certain longest match-lengths along a single realization of a stationary process and its entropy rate. Given a process X = {X_n; n ∈ Z} and a realization x from X, we define N_i(x) as the length of the shortest substring starting at x_i that does not appear as a contiguous substring of (x_{i-N}, x_{i-N+1}, ..., x_{i-1}). We show that, for a class of stationary processes with finite state space (including all i.i.d. and mixing Markov processes of all orders), the limit of N_i(x)/log N as N → ∞ equals 1/H almost surely, where H is the entropy rate of the process.
2003
Common deterministic measures of the information content of symbolic strings revolve around the resources used in describing or parsing the string. The well-known and successful Lempel-Ziv parsing process is described briefly and compared to the lesser-known Titchener parsing process, which might have certain theoretical advantages in the study of the nature of deterministic information in strings. Common to the two methods, we find that the maximal complexity is asymptotic to hn/log n, where h is a probabilistic entropy and n is the length of the string. By considering a generic parsing process that can be used to define string complexity, it is shown that this complexity bound appears as a consequence of counting unique words, rather than being a result specific to any particular parsing process.
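The unique-word-counting view is easiest to see for LZ78-style incremental parsing, sketched below in Python; this illustrates the standard phrase count whose growth is bounded by roughly n/log n, and is not Titchener's T-complexity:

```python
def lz78_phrase_count(s):
    """Number of phrases in the LZ78 incremental parsing of s: each phrase is the
    shortest prefix of the remaining text that has not yet appeared as a phrase."""
    seen, phrase, count = set(), "", 0
    for ch in s:
        phrase += ch
        if phrase not in seen:
            seen.add(phrase)
            count += 1
            phrase = ""
    return count + (1 if phrase else 0)   # a trailing repeated phrase still counts

# A highly repetitive string yields far fewer phrases than a string of distinct symbols.
print(lz78_phrase_count("ababababababababab"), lz78_phrase_count("abcdefghijklmnopqr"))
```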
IEEE Transactions on Information Theory, 1994
We consider the problem of source coding. We investigate the cases of known and unknown statistics. The efficiency of the compression codes can be estimated by three characteristics: 1) the redundancy (r), defined as the maximal difference between the average codeword length and the Shannon entropy in case the letters are generated by a Bernoulli source;
Electronic Proceedings in Theoretical Computer Science, 2011
Most text algorithms build data structures on words, mainly trees, such as digital trees (tries) or binary search trees (BSTs). The mechanism which produces the symbols of the words (one symbol at each unit time) is called a source in information-theoretic contexts. The probabilistic behaviour of the trees built on words emitted by the same source depends on two factors: the algorithmic properties of the tree, together with the information-theoretic properties of the source. Very often these two factors are considered in an oversimplified way: from the algorithmic point of view, the cost of the BST is measured only in terms of the number of comparisons between words; from the information-theoretic point of view, only simple sources (memoryless sources or Markov chains) are studied. We wish to perform here a realistic analysis, dealing with both a general source and a realistic cost for the data structures: we take into account comparisons between symbols, and we consider a general model of source, related to a dynamical system, which is called a dynamical source. Our methods are close to analytic combinatorics, and our main object of interest is the generating function of the source, Λ(s), which is here of Dirichlet type. Such an object transforms probabilistic properties of the source into analytic properties. The tameness of the source, defined through analytic properties of Λ(s), appears to be central in the analysis and is studied precisely for the class of dynamical sources. We focus here on arithmetical conditions, of diophantine type, which are sufficient to imply tameness on a domain with hyperbolic shape. The paper first recalls, in Section 1, general facts on sources and trees and defines the probabilistic model chosen for the analysis; it then states the two main theorems (Theorems 1 and 2), which establish the possible probabilistic behaviour of trees provided that the source is tame, with the tameness notions defined in a general framework and then studied for simple sources (memoryless sources and Markov chains). Section 2 focuses on a general model of sources, the dynamical sources, which contains the simple sources as a subclass, presents sufficient conditions under which tameness can be proved, and compares these tameness properties to those of simple sources, exhibiting both resemblances and differences between the two classes. Section 1 opens with general sources: throughout the paper, an ordered (possibly denumerably infinite) alphabet Σ := {a_1, a_2, ..., a_r} is fixed. A probabilistic source, which produces infinite words of Σ^N, is specified by the set {p_w : w ∈ Σ*} of fundamental probabilities, where p_w is the probability that an infinite word begins with the finite prefix w. It is furthermore assumed that π_k := sup{p_w : w ∈ Σ^k} tends to 0 as k → ∞.
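To see why Λ(s) is a Dirichlet-type series, it helps to write out the memoryless special case, where the fundamental probabilities factor over symbols (a standard specialization stated here under the usual conventions, not quoted from the paper):

```latex
% Memoryless source with symbol probabilities (p_i): p_w = p_{w_1} \cdots p_{w_k}
% for a word w of length k, so the generating function of the source collapses to
% a geometric series (valid on the half-plane where \lambda(s) < 1):
\Lambda(s) \;=\; \sum_{w \in \Sigma^*} p_w^{\,s}
           \;=\; \sum_{k \ge 0} \Bigl( \sum_{i} p_i^{\,s} \Bigr)^{k}
           \;=\; \frac{1}{1 - \lambda(s)},
\qquad \lambda(s) := \sum_{i} p_i^{\,s}.
```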
IEEE Transactions on Information Theory, 2000
This paper sheds light on universal coding with respect to classes of memoryless sources over a countable alphabet defined by an envelope function with finite and non-decreasing hazard rate. We prove that the auto-censuring (AC) code introduced by Bontemps is adaptive with respect to the collection of such classes. The analysis builds on the tight characterization of universal redundancy rates in terms of metric entropy, and on a careful analysis of the performance of the AC-coding algorithm. The latter relies on non-asymptotic bounds for maxima of samples from discrete distributions with finite and non-decreasing hazard rate.
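Under the usual conventions (assumed here, not quoted from the paper), an envelope class collects the distributions on positive integers dominated pointwise by the envelope f, and the hazard-rate condition concerns the quantity:

```latex
% Envelope class and discrete hazard rate of the envelope f (standard conventions,
% assumed for illustration): the class is \{ p : p(k) \le f(k) \text{ for all } k \},
% and the hazard rate required to be finite and non-decreasing is
h_f(k) \;=\; \frac{f(k)}{\sum_{j \ge k} f(j)} .
```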
IEEE Transactions on Information Theory, 1979
A combinatorial approach is proposed for proving the classical source coding theorems for a finite memoryless stationary source (giving achievable rates and the error probability exponent). This approach provides a sound heuristic justification for the widespread appearance of entropy and divergence (Kullback's discrimination) in source coding. The results are based on the notion of a composition class: a set made up of all the distinct source sequences of a given length which are permutations of one another. The asymptotic growth rate of any composition class is precisely an entropy. For a finite memoryless constant source, all members of a composition class have equal probability; the probability of any given class is therefore equal to the number of sequences in the class times the probability of an individual sequence in the class. The number of different composition classes is algebraic in the block length, whereas the probability of a composition class is exponential, and the probability exponent is a divergence. Thus if a codeword is assigned to all sequences whose composition classes have rate less than some rate R, the probability of error is asymptotically the probability of the most probable composition class of rate greater than R. This is expressed in terms of a divergence. No use is made either of the law of large numbers or of Chebyshev's inequality.
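The counting facts behind this composition-class (method-of-types) argument can be stated compactly; here T_Q^n denotes the composition class of empirical type Q at block length n over a finite alphabet, and P is the true source distribution (standard bounds, given for orientation):

```latex
% Standard type-counting bounds: polynomially many classes, exponentially large
% classes, and exponentially small class probabilities with a divergence exponent.
\#\{\text{composition classes of length } n\} \;\le\; (n+1)^{|\mathcal{X}|},
\qquad
(n+1)^{-|\mathcal{X}|}\, 2^{\,nH(Q)} \;\le\; |T_Q^n| \;\le\; 2^{\,nH(Q)},
\qquad
(n+1)^{-|\mathcal{X}|}\, 2^{-nD(Q\|P)} \;\le\; P^n(T_Q^n) \;\le\; 2^{-nD(Q\|P)}.
```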
Applied Mathematics and Computation, 2002
The concept of entropy plays a major part in communication theory. The Shannon entropy is a measure of uncertainty with respect to a priori probability distribution. In algorithmic information theory the information content of a message is measured in terms of the size in bits of the smallest program for computing that message. This paper discusses the classical entropy and entropy rate for discrete or continuous Markov sources, with finite or continuous alphabets, and their relations to program-size complexity and algorithmic probability. The accent is on ideas, constructions and results; no proofs will be given.
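As a concrete instance of the entropy rate discussed here, for a stationary, irreducible Markov source with a finite alphabet, transition matrix P = (P_ij) and stationary distribution π, the entropy rate takes the standard form:

```latex
% Entropy rate of a stationary finite-alphabet Markov source (standard formula).
H \;=\; -\sum_{i} \pi_i \sum_{j} P_{ij} \log P_{ij} .
```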
European Transactions on Telecommunications, 1993
We show that there is a universal noiseless source code for the class of all countably infinite memoryless sources for which a fixed given uniquely decodable code has finite expected codeword length. This source code is derived from a class of distribution estimation procedures which are consistent in expected information divergence.
IEEE Transactions on Information Theory, 1998
We discuss a family of estimators for the entropy rate of a stationary ergodic process and prove their pointwise and mean consistency under a Doeblin-type mixing condition. The estimators are Cesàro averages of longest match-lengths, and their consistency follows from a generalized ergodic theorem due to Maker. We provide examples of their performance on English text, and we generalize our results to countable alphabet processes and to random fields.
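One common variant of such an estimator (a rough Python sketch under assumed conventions; the exact windowing and the Doeblin-type mixing condition of the paper are not reproduced here) averages match lengths normalized by log(i+1) and inverts the result:

```python
from math import log2
import random

def match_length(x, i):
    """1 + length of the longest prefix of x[i:] that also occurs as a contiguous
    substring of the past x[:i] (one common variant of the match length L_i)."""
    l = 0
    while i + l < len(x) and x[i:i + l + 1] in x[:i]:
        l += 1
    return l + 1

def entropy_estimate(x):
    """Cesaro-average match-length estimator of the entropy rate in bits/symbol:
    invert the average of L_i / log2(i+1) over positions i = 1, ..., n-1."""
    terms = [match_length(x, i) / log2(i + 1) for i in range(1, len(x))]
    return len(terms) / sum(terms)

# Toy usage: fair coin flips have entropy rate 1 bit/symbol, so the estimate
# should come out roughly near 1 for a reasonably long sample.
rng = random.Random(0)
x = "".join(rng.choice("01") for _ in range(1000))
print(entropy_estimate(x))
```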