2011
Efficient access to large data collections is nowadays an important problem for many research areas and applications. A recent conception of the time-space relationship suggests a strong relation between data compression and algorithms in the comparison model. In this sense, efficient algorithms could be used to induce compressed representations of the data they process. Examples of this relationship include unbounded search algorithms and integer encodings, adaptive sorting algorithms and compressed representations of permutations, and union algorithms and encodings for bit vectors. In this thesis, we propose to study the time-space relationship on different data types. We aim to define new compression schemes and compressed data structures based on adaptive algorithms that work over these data types, and to evaluate their practicality in data compression applications.
Communications of the ACM, 1986
A data compression scheme that exploits locality of reference, such as occurs when words are used frequently over short intervals and then fall into long periods of disuse, is described. The scheme is based on a simple heuristic for self-organizing sequential search and on variable-length encodings of integers. We prove that it never performs much worse than Huffman coding and can perform substantially better; experiments on real files show that its performance is usually quite close to that of Huffman coding. Our scheme has many implementation advantages: it is simple, allows fast encoding and decoding, and requires only one pass over the data to be compressed (static Huffman coding takes two passes).
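The scheme combines a self-organizing sequential-search heuristic with variable-length integer codes. A minimal sketch of that combination in Python, assuming a plain move-to-front list and Elias gamma codes (the concrete heuristic, the code choice, and the simplified handling of first occurrences are illustrative assumptions, not the paper's exact construction):

```python
def elias_gamma(n):
    """Elias gamma code for n >= 1: (len(bin(n)) - 1) zeros, then n in binary."""
    b = bin(n)[2:]
    return "0" * (len(b) - 1) + b

def mtf_encode(words):
    """Emit each word's 1-based position in a self-organizing list, then move
    the word to the front; words repeated over short intervals get short codes."""
    table, out = [], []
    for w in words:
        if w in table:
            pos = table.index(w)
            table.pop(pos)
        else:
            pos = len(table)  # simplification: a real coder would also spell out new words
        out.append(elias_gamma(pos + 1))
        table.insert(0, w)
    return out

print(mtf_encode("to be or not to be".split()))
```

Words that recur within a short window keep landing near the front of the list, so their positions, and hence their gamma codes, stay short; that is exactly the locality of reference the scheme exploits.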
Dagstuhl Reports, 2018
From the 8th of July 2018 to the 13th of July 2018, a Dagstuhl Seminar took place with the topic “Synergies between Adaptive Analysis of Algorithms, Parameterized Complexity, Compressed Data Structures and Compressed Indices”. There, 40 participants from as many as 14 distinct countries and four distinct research areas, dealing with running-time analysis and space-usage analysis of algorithms and data structures, gathered to discuss results and techniques to “go beyond the worst case” for classes of structurally restricted inputs, both for (fast) algorithms and (compressed) data structures. The seminar consisted of (1) a first session of personal introductions, each participant presenting their expertise and themes of interest in two slides; (2) a series of four technical talks; and (3) a larger series of presentations of open problems, with ample time left for the participants to gather and work on such open problems. Seminar July 8–13, 2018 – http://www.dagstuhl.de/18281 2012 ACM S...
Algorithms, 2019
Nowadays, a variety of data compressors (or archivers) is available, each of which has its merits, and it is impossible to single out the best ones. Thus, one faces the problem of choosing the best method to compress a given file, and this problem becomes more important the larger the file is. It seems natural to try all the compressors and then choose the one that gives the shortest compressed file, then transfer (or store) the index number of the best compressor (this requires log m bits, if m is the number of compressors available) along with the compressed file. The only problem is the time, which increases substantially due to the need to compress the file m times (in order to find the best compressor). We suggest a method of data compression whose performance is close to optimal, but for which the extra time needed is relatively small: the ratio of this extra time and the total time of calculation can be limited, in an asymptotic manner, by an arbitrary positive constant. In short, the main i...
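The baseline strategy described above (compress with every available method, keep the shortest output, and prepend the index of the winning compressor) is easy to state in code. A sketch, where the specific choice of zlib, bz2 and lzma as the m compressors is purely an assumption for illustration:

```python
import bz2, lzma, math, zlib

COMPRESSORS = [zlib.compress, bz2.compress, lzma.compress]  # m = 3, illustrative choice

def best_of_m(data: bytes) -> bytes:
    """Try every compressor, keep the shortest output, and prepend the winner's
    index. One whole byte is spent on the index here; information-theoretically
    it only needs ceil(log2(m)) bits."""
    outputs = [c(data) for c in COMPRESSORS]
    best = min(range(len(outputs)), key=lambda k: len(outputs[k]))
    return bytes([best]) + outputs[best]

blob = best_of_m(b"abracadabra" * 1000)
print(len(blob), "bytes; winner index", blob[0],
      "; index needs", math.ceil(math.log2(len(COMPRESSORS))), "bits")
```

The point of the paper is precisely that this brute-force selection multiplies the compression time by m, and that a more careful scheme can keep the extra time an arbitrarily small fraction of the total.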
Information Processing & Management, 1976
This paper describes a formalism to construct some kinds of algorithms useful to represent a structure over a set of data. It proves that if we do not take into account the cost of an algorithm, one can partially replace the memory by an algorithm. It also proves that the remaining memory part is independent of the construction process. It then evaluates the effects of the algorithms' representation cost and gives the resulting memory gain obtained in two particular examples.
Algorithms - ESA 2015, 2015
We consider the problem of storing a dynamic string S over an alphabet Σ = {1, ..., σ} in compressed form. Our representation supports insertions and deletions of symbols and answers three fundamental queries: access(i, S) returns the i-th symbol in S, rank_a(i, S) counts how many times a symbol a occurs among the first i positions in S, and select_a(i, S) finds the position where a symbol a occurs for the i-th time. We present the first fully-dynamic data structure for arbitrarily large alphabets that achieves optimal query times for all three operations and supports updates with worst-case time guarantees. Ours is also the first fully-dynamic data structure that needs only nH_k + o(n log σ) bits, where H_k is the k-th order entropy and n is the string length. Moreover, our representation supports extraction of a substring S[i..i+ℓ] in optimal O(log n / log log n + ℓ / log_σ n) time.
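To make the three query types concrete, here is a naive reference implementation of their semantics, using linear-time scans rather than the compressed nH_k-bit structure of the paper:

```python
def access(i, S):
    """Return the i-th symbol of S (1-based)."""
    return S[i - 1]

def rank(a, i, S):
    """Count how many times symbol a occurs among the first i positions of S."""
    return S[:i].count(a)

def select(a, i, S):
    """Return the 1-based position of the i-th occurrence of a in S (0 if absent)."""
    seen = 0
    for pos, c in enumerate(S, start=1):
        if c == a:
            seen += 1
            if seen == i:
                return pos
    return 0

S = "abracadabra"
print(access(4, S), rank("a", 8, S), select("a", 3, S))  # a 4 6
```

The contribution of the paper is supporting exactly these operations, plus insertions and deletions, in optimal time within compressed space, which the plain scans above obviously do not achieve.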
IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 2000
In this paper, we propose a new two-stage hardware architecture that combines the features of both the parallel dictionary LZW (PDLZW) and an approximated adaptive Huffman (AH) algorithm. In this architecture, an ordered list instead of the tree-based structure is used in the AH algorithm to speed up the compression data rate. The resulting architecture not only outperforms the AH algorithm at the cost of only one-fourth of the hardware resources, but is also competitive with the performance of the LZW algorithm (compress). In addition, both the compression and decompression rates of the proposed architecture are greater than those of the AH algorithm, even in the case where the latter is realized in software.
2006
We propose measures for compressed data structures, in which space usage is measured in a data-aware manner. In particular, we consider the fundamental dictionary problem on set data, where the task is to construct a data structure to represent a set S of n items out of a universe U = {0, ..., u − 1} and support various queries on S. We use a well-known data-aware measure for set data called gap to bound the space of our data structures.
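As a concrete reading of the gap measure: a sorted set is charged, per element, only the number of bits needed to write the difference to its predecessor. A small sketch of one common way to compute that bound (the exact definition used in the paper may differ in lower-order terms):

```python
import math

def gap_measure(sorted_set):
    """Data-aware space bound for a strictly increasing sequence: the sum over
    elements of ceil(log2(gap + 1)), where gap is the distance to the predecessor."""
    prev, bits = -1, 0
    for x in sorted_set:
        bits += math.ceil(math.log2(x - prev + 1))
        prev = x
    return bits

S = [2, 3, 7, 20, 21, 22, 90]
print(gap_measure(S), "bits, versus", len(S) * 7, "bits at ceil(log2 u) = 7 bits per item for u = 100")
```

Clustered sets have many small gaps and therefore a much smaller gap measure than the worst-case n log u bound, which is what makes the measure data-aware.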
Lecture Notes in Computer Science, 2004
In this paper, we introduce a new approach to adaptive coding which utilizes Stochastic Learning-based Weak Estimation (SLWE) techniques to adaptively update the probabilities of the source symbols. We present the corresponding encoding and decoding algorithms, as well as the details of the probability updating mechanisms. Once these probabilities are estimated, they can be used in a variety of data encoding schemes, and we have demonstrated this, in particular, for the adaptive Fano scheme and an adaptive entropy-based scheme that resembles the well-known arithmetic coding. We include empirical results using the latter adaptive schemes on real-life files that possess a fair degree of non-stationarity. As opposed to higher-order statistical models, our schemes require linear space complexity, and compress with nearly 10% better efficiency than the traditional adaptive coding methods.
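The probability updating mechanism can be illustrated with the multinomial SLWE-style rule usually cited for this estimator: after observing a symbol, all other probabilities shrink multiplicatively and the observed symbol absorbs the freed mass. Treat the exact rule and the parameter value as assumptions for illustration:

```python
def slwe_update(p, observed, lam=0.99):
    """One SLWE-style update: multiply every other probability by lam and give
    the released probability mass to the observed symbol (estimates stay normalized)."""
    q = [lam * pj for pj in p]
    q[observed] = 1.0 - sum(q[:observed] + q[observed + 1:])
    return q

p = [0.25, 0.25, 0.25, 0.25]
for sym in [0, 0, 1, 0, 2, 0]:
    p = slwe_update(p, sym)
print([round(x, 3) for x in p])  # the estimate drifts toward the recently observed mix
```

Because the update forgets old observations geometrically, the estimates can track non-stationary sources, which is what the adaptive Fano and arithmetic-style coders in the paper exploit.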
International Conference of the Chilean Computer Science Society, 2010
Compact representation of integer values is a key feature for data compression. A compressed representation allows us to store more integers within less space, and therefore to work in faster levels of the memory hierarchy. Adaptive searching leads to compressed representations of integers. In scenarios where the cost of updating the encoding is high, approaches that perform operations over the integers dynamically are preferable. In this paper we present a framework to create integer encodings, based on augmenting search algorithms in the comparison model. We show that such algorithms share common properties that allow supporting arithmetic and logic operations over the encoded integers. Finally, we propose a practical approach to support these operations over Elias integer encodings.
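The link between unbounded (doubling) search and integer codes can be made concrete with Elias gamma: the unary prefix plays the role of the doubling phase that locates the length of the number, and the binary suffix corresponds to the final binary search within that range. A sketch of the standard gamma encoder and decoder (illustrative background, not the paper's augmented framework):

```python
def gamma_encode(n):
    """Elias gamma: (|bin(n)| - 1) zeros, then n written in binary (n >= 1)."""
    b = bin(n)[2:]
    return "0" * (len(b) - 1) + b

def gamma_decode(bits, pos=0):
    """Decode one gamma code starting at pos; return (value, position after the code)."""
    zeros = 0
    while bits[pos] == "0":
        zeros += 1
        pos += 1
    value = int(bits[pos:pos + zeros + 1], 2)
    return value, pos + zeros + 1

stream = "".join(gamma_encode(n) for n in (1, 5, 17))
pos, decoded = 0, []
while pos < len(stream):
    v, pos = gamma_decode(stream, pos)
    decoded.append(v)
print(stream, decoded)  # 100101000010001 [1, 5, 17]
```

Supporting arithmetic and logic operations directly over such codes, as the paper proposes, avoids decoding to plain integers before every operation.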
2014 Data Compression Conference, 2014
List update is a key step during Burrows-Wheeler transform (BWT) compression. Previous work has shown that careful study of the list update step leads to better BWT compression. Surprisingly, the theoretical study of list update algorithms for compression has lagged behind their use in practice. To be more precise, the standard model by Sleator and Tarjan for list update considers a linear cost-of-access model, while compression incurs a logarithmic cost of access, i.e., accessing item i in the list has cost Θ(i) in the standard model but Θ(log i) in compression applications. These models have been shown, in general, not to be equivalent. This paper has two contributions:
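The gap between the two cost models is easy to see on a small example: under move-to-front the sequence of accessed positions is the same, but the Sleator-Tarjan model charges i per access while a BWT-style compressor pays roughly log i bits. A toy comparison, assuming move-to-front as the list update rule:

```python
import math

def mtf_positions(seq):
    """1-based positions at which each symbol is found under move-to-front
    (new symbols are simply appended at the back, a simplification)."""
    table, positions = [], []
    for s in seq:
        if s not in table:
            table.append(s)
        i = table.index(s)
        positions.append(i + 1)
        table.insert(0, table.pop(i))
    return positions

pos = mtf_positions("mississippi")
linear_cost = sum(pos)                             # standard model: Theta(i) per access
log_cost = sum(math.log2(p) + 1 for p in pos)      # compression model: ~log i bits per access
print(pos, linear_cost, round(log_cost, 1))
```

An algorithm that is competitive under the linear model need not be competitive under the logarithmic one, which is why the two models are not equivalent in general.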
Encyclopedia of GIS, 2008
This paper surveys a variety of data compression methods spanning almost 40 years of research, from the work of Shannon, Fano, and Huffman in the late 1940s to a technique developed in 1986. The aim of data compression is to reduce redundancy in stored or communicated data, thus increasing effective data density. Data compression has important applications in the areas of file storage and distributed systems. Concepts from information theory, as they relate to the goals and evaluation of data compression methods, are discussed briefly. A framework for evaluation and comparison of methods is constructed and applied to the algorithms presented. Comparisons of both theoretical and empirical natures are reported, and possibilities for future research are suggested.
Combinatorial Pattern Matching, 2011
LRM-Trees are an elegant way to partition a sequence of values into sorted consecutive blocks and to express the relative position of the first element of each block within a previous block. They were used to encode ordinal trees and to index integer arrays in order to support range minimum queries on them. We describe how they yield many other convenient results in a variety of areas, from data structures to algorithms: some compressed succinct indices for range minimum queries; a new adaptive sorting algorithm; and a compressed succinct data structure for permutations supporting direct and inverse application, all the faster the more compressible the permutation is. As part of our preliminary review work, we also give an overview of the, sometimes redundant, terminology relating to succinct data structures and indices.
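A minimal sketch of the LRM-Tree construction as commonly defined, where each position's parent is the nearest position to its left holding a strictly smaller value (computed with a stack); whether this matches every detail of the paper's variant is an assumption:

```python
def lrm_parents(A):
    """parent[i] = nearest j < i with A[j] < A[i], or -1 for left-to-right minima,
    which hang from an artificial root."""
    parent, stack = [], []
    for i, v in enumerate(A):
        while stack and A[stack[-1]] >= v:
            stack.pop()
        parent.append(stack[-1] if stack else -1)
        stack.append(i)
    return parent

A = [5, 3, 4, 4, 7, 2, 6]
print(lrm_parents(A))  # [-1, -1, 1, 1, 3, -1, 5]
```

Values increase along every downward path of this tree, which is the property the compressed range-minimum indices and the adaptive sorting algorithm mentioned above build on.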
Algorithms and Computation, 2010
We present a data structure that stores a sequence s[1..n] over alphabet [1..σ] in nH_0(s) + o(n)(H_0(s)+1) bits, where H_0(s) is the zero-order entropy of s. This structure supports the queries access, rank and select, which are fundamental building blocks for many other compressed data structures, in worst-case time O(lg lg σ) and average time O(lg H_0(s)). The worst-case complexity matches the best previous results, yet these had been achieved with data structures using nH_0(s) + o(n lg σ) bits. On highly compressible sequences the o(n lg σ) bits of the redundancy may be significant compared to the nH_0(s) bits that encode the data. Our representation, instead, compresses the redundancy as well. Moreover, our average-case complexity is unprecedented. Our technique is based on partitioning the alphabet into characters of similar frequency. The subsequence corresponding to each group can then be encoded using fast uncompressed representations without harming the overall compression ratios, even in the redundancy. The result also improves upon the best current compressed representations of several other data structures. For example, we achieve (i) compressed redundancy, retaining the best time complexities, for the smallest existing full-text self-indexes; (ii) compressed permutations π with times for π() and π^{-1}() improved to loglogarithmic; and (iii) the first compressed representation of dynamic collections of disjoint sets. We also point out various applications to inverted indexes, suffix arrays, binary relations, and data compressors. Our structure is practical on large alphabets. Our experiments show that, as predicted by theory, it dominates the space/time tradeoff map of all the sequence representations, both in synthetic and application scenarios.
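The core idea, partitioning the alphabet into classes of symbols of similar frequency and storing each class's subsequence separately, can be sketched as follows; the class function floor(log2(n / occurrences)) used here is one natural choice and is an assumption, not necessarily the paper's exact definition:

```python
import math
from collections import Counter

def partition_alphabet(s):
    """Assign each symbol a class based on its frequency (rare symbols get higher
    classes) and split s into one subsequence per class."""
    n, freq = len(s), Counter(s)
    cls = {a: int(math.log2(n / c)) for a, c in freq.items()}
    subsequences = {}
    for ch in s:
        subsequences.setdefault(cls[ch], []).append(ch)
    return cls, {k: "".join(v) for k, v in subsequences.items()}

classes, subs = partition_alphabet("abracadabra_abracadabra")
print(classes)
print(subs)
```

Within one class all symbols have roughly the same probability, so its subsequence can be stored with a fast, essentially uncompressed local representation while the overall space stays close to the zero-order entropy.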
1992
Data may be compressed using textual substitution. Textual substitution identifies repeated substrings and replaces some or all substrings by pointers to another copy. We construct an incremental algorithm for a specific textual substitution method: coding a text with respect to a dictionary. With this incremental algorithm it is possible to combine two coded texts in constant time.
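A toy version of coding a text with respect to a fixed dictionary, replacing dictionary phrases by pointers and keeping everything else as literals. The greedy longest-match rule and the token format are illustrative assumptions; the paper's incremental algorithm and its constant-time combination of coded texts are not reproduced here:

```python
def dict_encode(text, dictionary):
    """Greedy longest-match coding against a fixed dictionary: each output token is
    either ('ptr', index of the matched phrase) or ('lit', a single character)."""
    out, i = [], 0
    while i < len(text):
        best = None
        for j, phrase in enumerate(dictionary):
            if text.startswith(phrase, i) and (best is None or len(phrase) > len(dictionary[best])):
                best = j
        if best is not None:
            out.append(("ptr", best))
            i += len(dictionary[best])
        else:
            out.append(("lit", text[i]))
            i += 1
    return out

D = ["the ", "quick ", "brown "]
print(dict_encode("the quick brown fox", D))
```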
ACM Transactions on Algorithms, 2006
We report on a new experimental analysis of high-order entropy-compressed suffix arrays, which retains the theoretical performance of previous work and represents an improvement in practice.
Proceedings of the Twenty-Eighth Annual ACM-SIAM Symposium on Discrete Algorithms, 2017
We show that the compressed suffix array and the compressed suffix tree of a string $T$ can be built in $O(n)$ deterministic time using $O(n\log\sigma)$ bits of space, where $n$ is the string length and $\sigma$ is the alphabet size. Previously described deterministic algorithms either run in time that depends on the alphabet size or need $\omega(n\log \sigma)$ bits of working space. Our result has immediate applications to other problems, such as yielding the first linear-time LZ77 and LZ78 parsing algorithms that use $O(n \log\sigma)$ bits.
Data compression has important applications in the field of file storage and distributed systems. It helps in reducing redundancy in stored or communicated data. This paper studies various compression techniques and analyzes the approaches used in data compression. Furthermore, information theory concepts that relate to the aims and evaluation of data compression methods are briefly discussed. A framework for the evaluation and comparison of various compression algorithms is constructed and applied to the algorithms presented here. This paper reports on both the theoretical and practical nature of compression algorithms. Moreover, it also discusses future possibilities for research work in the field of data compression.