2009 International Symposium on Signals, Circuits and Systems, 2009
In this paper, an information analysis for lossless compression of a large class of discrete sources is performed. The lossless compression is carried out by means of a Huffman code with an alphabet A of size M. A matrix characterization of the encoding as a source with memory is given. The information quantities H(S,A), H(S), H(A), H(A|S), H(S|A), and I(S,A), as well as the minimum average code word length, are derived. Three extreme cases, p = M-1, p = 0, and (M = 2, p = 1), have been analyzed.
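For reference, these quantities are connected by the standard information-theoretic identities (a general fact, not a result specific to this paper):

```latex
H(S,A) = H(S) + H(A \mid S) = H(A) + H(S \mid A), \qquad
I(S;A) = H(S) - H(S \mid A) = H(A) - H(A \mid S).
```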
2009 International Symposium on Signals, Circuits and Systems, 2009
We analyze the lossless compression of a large class of discrete, complete, and memoryless sources performed by a generalized Huffman code with an alphabet of M letters. Given the number of source messages N, the alphabet size M, and the number p of code words on each level of the graph except the last two, we determine the unknown encoding parameters: the number n of levels in the encoding graph, the number q of code words on level n-1, the number k of groups of M nodes, and the number m of remaining nodes on the last level. The average code word length is also computed. Two extreme cases, p = 0 and p = M-1, have been analyzed.
A Huffman code is a particular type of optimal prefix code that is commonly used for lossless data compression. The process of finding such a code is known as Huffman coding. The output of Huffman's algorithm can be viewed as a variable-length code table for encoding source symbols; the algorithm derives this table from the estimated probability or frequency of occurrence of each possible value of the source symbol. In this paper, we present a new approach to measuring the performance and redundancy of two coding methods: Huffman coding and minimum-variance Huffman coding. After obtaining the code word for each symbol, we compress the result again at the level of its binary values, 0 and 1, using binary coding. This is applied to both approaches; the process is called Double Huffman coding. Finally, we show that this produces better results than plain Huffman coding.
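A minimal sketch of the code-table construction described above (standard binary Huffman coding only; the Double Huffman step is not reproduced here), assuming Python:

```python
import heapq
from collections import Counter

def huffman_table(text):
    """Build a variable-length prefix-code table from symbol frequencies."""
    freq = Counter(text)
    if len(freq) == 1:  # degenerate single-symbol source
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, tiebreak, {symbol: code}); codes grow as trees merge.
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        w1, _, t1 = heapq.heappop(heap)  # two least-weight subtrees
        w2, _, t2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in t1.items()}
        merged.update({s: "1" + c for s, c in t2.items()})
        heapq.heappush(heap, (w1 + w2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]

table = huffman_table("abracadabra")
encoded = "".join(table[s] for s in "abracadabra")
```

Frequent symbols (here `a`) receive short codes, rare ones longer codes; the resulting code is prefix-free, so the bit stream decodes unambiguously.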
Problems of Information Transmission, 2012
The compression-complexity trade-off of lossy compression algorithms that are based on a random codebook or a random database is examined. Motivated, in part, by recent results of Gupta-Verdú-Weissman (GVW) and their underlying connections with the pattern-matching scheme of Kontoyiannis' lossy Lempel-Ziv algorithm, we introduce a non-universal version of the lossy Lempel-Ziv method (termed LLZ). The optimality of LLZ for memoryless sources is established, and its performance is compared to that of the GVW divide-and-conquer approach. Experimental results indicate that the GVW approach often yields better compression than LLZ, but at the price of much higher memory requirements. To combine the advantages of both, we introduce a hybrid algorithm (HYB) that utilizes both the divide-and-conquer idea of GVW and the single-database structure of LLZ. It is proved that HYB shares with GVW the exact same rate-distortion performance and implementation complexity, while, like LLZ, requiring less memory, by a factor which may become unbounded, depending on the choice of the relevant design parameters. Experimental results are also presented, illustrating the performance of all three methods on data generated by simple discrete memoryless sources. In particular, the HYB algorithm is shown to outperform existing schemes for the compression of some simple discrete sources with respect to the Hamming distortion criterion.
IEEE Transactions on Information Theory, 2004
It has long been known that the compression redundancy of independent and identically distributed (i.i.d.) strings increases to infinity as the alphabet size grows. It is also apparent that any string can be described by separately conveying its symbols, and its pattern-the order in which the symbols appear. Concentrating on the latter, we show that the patterns of i.i.d. strings over all, including infinite and even unknown, alphabets, can be compressed with diminishing redundancy, both in block and sequentially, and that the compression can be performed in linear time. To establish these results, we show that the number of patterns is the Bell number, that the number of patterns with a given number of symbols is the Stirling number of the second kind, and that the redundancy of patterns can be bounded using results of Hardy and Ramanujan on the number of integer partitions. The results also imply an asymptotically optimal solution for the Good-Turing probability-estimation problem.
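The combinatorial facts cited here are easy to check numerically. A small sketch, assuming Python, computes the Bell numbers via the Bell triangle:

```python
def bell_numbers(n):
    """Return [B(0), ..., B(n)] via the Bell triangle.
    B(k) counts the set partitions of k elements, i.e. the number of
    distinct patterns of i.i.d. strings of length k."""
    row = [1]
    bells = [1]  # B(0) = 1
    for _ in range(n):
        nxt = [row[-1]]          # each row starts with the previous row's last entry
        for v in row:
            nxt.append(nxt[-1] + v)
        row = nxt
        bells.append(row[0])
    return bells
```

For example, the strings of length 2 have exactly B(2) = 2 patterns, "11" and "12", depending on whether the two symbols coincide.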
Communications of the ACM, 1987
Journal of Computer Science Applications and Information Technology, 2017
In today's world, storing large amounts of data efficiently is a significant issue because of limited storage. At every moment we need to transfer large volumes of information soundly and correctly, yet the storage space required to hold this information is limited, and communication is likewise constrained by the capacity of the available lines. These limitations drive the need for data compression. Compression reduces memory usage, data transmission time, and communication bandwidth, and thus ultimately cost. One of the most popular and widely used techniques for compressing data efficiently and effectively is Huffman compression. It is a lossless compression technique that enables the restoration of a file to its original state, without the loss of a single bit of data, when the file is uncompressed. It can be made more efficient by reducing the memory requirements for the Huffman tree. This article aims at reducing the tree size in Huffman coding and explores a new memory-efficient technique to store the Huffman tree; we also design corresponding encoding and decoding algorithms. Our proposed technique is more efficient than existing techniques: representing the Huffman tree structure requires 9n-2 bits in the worst, average, and best cases. The results obtained are a significant improvement over previous work.
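The paper's 9n-2-bit representation itself is not detailed in this abstract. As a hypothetical baseline for comparison, a standard succinct encoding stores one structure bit per node in preorder plus 8 bits per leaf symbol, or 10n-1 bits for n symbols (2n-1 structure bits for the 2n-1 nodes of a full binary tree, plus 8n symbol bits); a sketch in Python, with the tree as nested tuples and leaves as one-character strings:

```python
def serialize(node):
    """Preorder: '1' + 8-bit symbol for a leaf, '0' + both children for an
    internal node. Uses 10n - 1 bits for a Huffman tree with n symbols."""
    if isinstance(node, str):  # leaf holds a single 1-byte symbol
        return "1" + format(ord(node), "08b")
    left, right = node
    return "0" + serialize(left) + serialize(right)

def deserialize(bits, i=0):
    """Inverse of serialize; returns (tree, next_index)."""
    if bits[i] == "1":
        return chr(int(bits[i + 1:i + 9], 2)), i + 9
    left, i = deserialize(bits, i + 1)
    right, i = deserialize(bits, i)
    return (left, right), i
```

Any scheme below this 10n-1-bit baseline, such as the 9n-2 bits claimed above, saves roughly one bit per symbol.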
2007 IEEE International Symposium on Information Theory, 2007
In this paper we investigate universal data compression with side information at the decoder by leveraging traditional universal data compression algorithms. Specifically, consider a source network with feedback in which a finite alphabet source X = {X_i}_{i=0}^∞ is to be encoded and transmitted, and another finite alphabet source Y = {Y_i}_{i=0}^∞ is available only to the decoder as side information correlated with X. Assuming that the encoder and decoder share a uniform i.i.d. (independent and identically distributed) random database that is independent of (X, Y), we propose a string matching-based (variable-rate) block coding algorithm with a simple progressive encoder for the feedback source network. Instead of using standard joint typicality decoding, this algorithm derives its decoding rule from the codeword length function of a traditional universal lossless coding algorithm. As a result, neither the encoder nor the decoder assumes any prior knowledge of the joint distribution of (X, Y) or even the achievable rates. It is proven that for any (X, Y) in the class of all stationary, ergodic source-side information pairs with finite alphabet, the average number of bits per letter transmitted from the encoder to the decoder (compression rate) goes arbitrarily close to the conditional entropy rate H(X|Y) of X given Y asymptotically, and the average number of bits per letter transmitted from the decoder to the encoder (feedback rate) goes to 0 asymptotically.
2015
This thesis makes several contributions to the field of data compression. Lossless data compression algorithms shorten the description of input objects, such as sequences of text, in a way that allows perfect recovery of the original object. Such algorithms exploit the fact that input objects are not uniformly distributed: by allocating shorter descriptions to more probable objects and longer descriptions to less probable objects, the expected length of the compressed output can be made shorter than the object’s original description. Compression algorithms can be designed to match almost any given probability distribution over input objects. This thesis employs probabilistic modelling, Bayesian inference, and arithmetic coding to derive compression algorithms for a variety of applications, making the underlying probability distributions explicit throughout. A general compression toolbox is described, consisting of practical algorithms for compressing data distributed by various fund...
A data compression scheme that exploits locality of reference, such as occurs when words are used frequently over short intervals and then fall into long periods of disuse, is described. The scheme is based on a simple heuristic for self-organizing sequential search and on variable-length encodings of integers. We prove that it never performs much worse than Huffman coding and can perform substantially better; experiments on real files show that its performance is usually quite close to that of Huffman coding. Our scheme has many implementation advantages: it is simple, allows fast encoding and decoding, and requires only one pass over the data to be compressed (static Huffman coding takes two passes).
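A minimal sketch, assuming Python, of the move-to-front heuristic that such a self-organizing scheme uses (the variable-length integer encoding of the emitted ranks is omitted):

```python
def mtf_encode(text, alphabet):
    """Emit each symbol's current rank, then move it to the front of the table.
    Recently used symbols get small ranks, exploiting locality of reference."""
    table = list(alphabet)
    out = []
    for ch in text:
        rank = table.index(ch)
        out.append(rank)
        table.insert(0, table.pop(rank))
    return out

def mtf_decode(ranks, alphabet):
    """Inverse transform: look up each rank and replay the same table moves."""
    table = list(alphabet)
    out = []
    for r in ranks:
        ch = table[r]
        out.append(ch)
        table.insert(0, table.pop(r))
    return "".join(out)
```

On bursty input such as "aaabbbabb" the ranks are mostly 0s and 1s, which small-integer codes then compress well.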
International Journal of Advance Research and Innovative Ideas in Education, 2018
This project is aimed at optimizing source files by using Huffman coding (lossless data compression) in today's vastly expanding technical environment, where quality data transmission has become necessary. It has applications in fields where it is important that the original and compressed data be identical, as in the zip file format, and is often used as a component within lossy data compression techniques such as the mp3 encoder and other lossy encoders. For this, we use MATLAB (R2015), where the result of Huffman's algorithm is viewed as a variable-length code table. The algorithm derives the table from an estimated probability/frequency of occurrence (weight) for each possible value of the source symbol.
Lossless data compression through exploiting redundancy in a sequence of symbols is a well-studied field in computer science and information theory. One way to achieve compression is to statistically model the data and estimate model parameters. In practice, most general purpose data compression algorithms model the data as stationary sequences of 8-bit symbols. While this model fits the currently used computer architectures and the vast majority of information representation standards very well, other models may have both computational and information theoretic merits, being more efficient to implement or fitting some data more closely. In addition, compression algorithms based on the 8-bit symbol model perform very poorly on data represented by binary sequences not aligned with byte boundaries, either because the fixed symbol length is not a multiple of 8 bits (e.g. DNA sequences) or because the symbols of the source are encoded into bit sequences of variable length. Throughout this thesis, we assume that the source alphabet consists of blocks of equal size of elementary symbols (typically bits), and address the impact of this block size on lossless compression algorithms in general and in the context of so-called block-sorting compression algorithms in particular. These algorithms are quite popular both in theory and in practice and are the subjects of intensive research with many interesting results in recent years.
IEEE Transactions on Information Theory, 1994
We consider the problem of source coding. We investigate the cases of known and unknown statistics. The efficiency of compression codes can be estimated by three characteristics: 1) the redundancy (r), defined as the maximal difference between the average codeword length and the Shannon entropy in the case where the letters are generated by a Bernoulli source;
International Journal of Computer Applications, 2012
Compression reduces redundancy in the data representation and thus reduces its storage requirement. The task of compression consists of two components: an encoding algorithm that takes a message and generates a "compressed" representation, and a decoding algorithm that reconstructs the original message, or some approximation of it, from the compressed representation. Many algorithms are available for compressing data; some achieve lossless compression and some are good at lossy compression. In this paper, the proposed technique improves the compression ratio and compression efficiency of Huffman coding on data. The proposed technique also works with image files, which the conventional Huffman algorithm cannot handle. This paper also outlines the use of data normalization on text data to remove redundancy and so achieve more compression.
IEEE Transactions on Communications, 2004
Encyclopedia of GIS, 2008
This paper surveys a variety of data compression methods spanning almost 40 years of research, from the work of Shannon, Fano, and Huffman in the late 1940s to a technique developed in 1986. The aim of data compression is to reduce redundancy in stored or communicated data, thus increasing effective data density. Data compression has important applications in the areas of file storage and distributed systems. Concepts from information theory as they relate to the goals and evaluation of data compression methods are discussed briefly. A framework for evaluation and comparison of methods is constructed and applied to the algorithms presented. Comparisons of both a theoretical and an empirical nature are reported, and possibilities for future research are suggested.
Since the discovery of the Huffman encoding scheme in 1952, Huffman codes have been widely used for efficient storage of data, images, and video. Huffman coding has been the subject of numerous investigations over the past 60 years, and many techniques have been proposed. It remains an important field, as it significantly reduces storage requirements and communication cost. In this paper we direct our work mainly at repeated Huffman coding, a lossless coding technique, and study its performance. In repeated Huffman coding, the Huffman encoding technique is applied repeatedly to the binary file obtained from the previous pass. Since the technique is applied to a file that carries a header describing the Huffman tree, compression cannot be applied indefinitely: while the encoded message is expected to shrink in every pass, encoding the tree itself is an overhead in each pass, so the number of useful repetitions depends on how efficiently the Huffman tree can be represented. A memory-efficient representation of a Huffman tree is presented in this paper, along with experimental results on ultimate compression ratios for different types of files. Keywords: Huffman coding, repeated Huffman coding, block Huffman coding, tree clustering.
2014
Huffman encoding is often improved by using block codes; for example, a 3-block would be an alphabet consisting of each possible combination of three characters. We take the approach of starting with a base alphabet and expanding it to include frequently occurring aggregates of symbols. We prove that the change in compressed message length caused by the introduction of a new aggregate symbol can be expressed as the difference of two entropies, dependent only on the probabilities and length of the introduced symbol; the expression is independent of the probabilities of all other symbols in the alphabet. This measure of information gain for a new symbol can be applied in data compression methods. We also demonstrate, with a simple experiment, that aggregate symbol alphabets, as opposed to mutually exclusive alphabets, have the potential to provide good levels of compression. Finally, compression gain as defined in this paper may also be useful for feature selection.
Though Huffman codes [2,3,4,5,9] have shown their power in data compression, some issues have gone unnoticed. In the present paper, we address the random properties of data compressed via Huffman coding. Randomized computation is the only known method for many notoriously difficult #P-complete problems, such as the permanent and some network reliability problems [1,7,8,10]. According to Kolmogorov complexity [6,10], a truly random binary string is very difficult to compress, and for any fixed length there exist incompressible strings. In other words, incompressible strings tend to carry a higher degree of randomness. We study this phenomenon via Huffman coding, taking compressed data as a random source that provides coin flips for randomized algorithms. A simple randomized algorithm is proposed to calculate the value of π with the compressed data as random numbers. Experimental results show that compressed data via Huffman coding does provide a better approxi...