2015, WALCOM: Algorithms and Computation
In recent years, there has been an explosion of interest in succinct data structures, which store the given data in compact or compressed formats and answer queries on the data rapidly while it is still in its compressed format. Our focus in this talk is to introduce encoding data structures. Encoding data structures consider the data together with the queries and aim to store only as much information about the data as is needed to answer the queries. Once this is done, the original data can be deleted. In many cases, one can obtain space-efficient encoding data structures even when the original data is incompressible.
2015 7th International Conference of Soft Computing and Pattern Recognition (SoCPaR), 2015
Succinct data structures are introduced to efficiently solve a given problem while representing the data using as little space as possible. However, the full potential of succinct data structures has not been utilized in software-based implementations due to the large storage size and the memory access bottleneck. This paper proposes a hardware-oriented data compression method to reduce the storage space without increasing the processing time. We use a parallel processing architecture to reduce the decompression overhead. According to the evaluation, we can compress the data by 37.5% and still have fast data access with small decompression overhead.
Algorithms - ESA 2015, 2015
We consider the problem of storing a dynamic string $S$ over an alphabet $\Sigma = \{1, \ldots, \sigma\}$ in compressed form. Our representation supports insertions and deletions of symbols and answers three fundamental queries: $\mathrm{access}(i, S)$ returns the $i$-th symbol in $S$, $\mathrm{rank}_a(i, S)$ counts how many times a symbol $a$ occurs among the first $i$ positions in $S$, and $\mathrm{select}_a(i, S)$ finds the position where a symbol $a$ occurs for the $i$-th time. We present the first fully-dynamic data structure for arbitrarily large alphabets that achieves optimal query times for all three operations and supports updates with worst-case time guarantees. Ours is also the first fully-dynamic data structure that needs only $nH_k + o(n \log \sigma)$ bits, where $H_k$ is the $k$-th order entropy and $n$ is the string length. Moreover, our representation supports extraction of a substring $S[i..i+\ell]$ in optimal $O(\log n / \log\log n + \ell / \log_\sigma n)$ time.
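The three queries defined in this abstract can be pinned down with a naive reference sketch. This is only meant to fix the semantics (1-based positions, as in the abstract); it runs in linear time on an uncompressed string and is not the compressed dynamic structure the paper describes. The function names and argument order are illustrative assumptions.

```python
def access(i, s):
    """Return the i-th symbol of s (positions are 1-based)."""
    return s[i - 1]


def rank(a, i, s):
    """Count how many times symbol a occurs among the first i positions of s."""
    return s[:i].count(a)


def select(a, i, s):
    """Return the 1-based position of the i-th occurrence of a, or None."""
    seen = 0
    for pos, sym in enumerate(s, start=1):
        if sym == a:
            seen += 1
            if seen == i:
                return pos
    return None  # fewer than i occurrences of a in s
```

For the string `"abracadabra"`, the symbol `a` occurs at positions 1, 4, 6, 8 and 11, so `rank` and `select` are inverse to each other in the sense that `rank(a, select(a, i, s), s) == i` whenever the i-th occurrence exists.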
ACM SIGMOD Record, 2001
Over the last decades, improvements in CPU speed have outpaced improvements in main memory and disk access rates by orders of magnitude, enabling the use of data compression techniques to improve the performance of database systems. Previous work describes the benefits of compression for numerical attributes, where data is stored in compressed format on disk. Despite the abundance of string-valued attributes in relational schemas, there is little work on compression for string attributes in a database context. Moreover, none of the previous work suitably addresses the role of the query optimizer: during query execution, data is either eagerly decompressed when it is read into main memory, or data lazily stays compressed in main memory and is decompressed on demand only. In this paper, we present an effective approach for database compression based on lightweight, attribute-level compression techniques. We propose a Hierarchical Dictionary Encoding strategy that intelligently selects ...
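To make the idea of attribute-level dictionary compression concrete, here is a minimal sketch of plain (flat) dictionary encoding for a string column, with an equality predicate evaluated on the codes so the column stays compressed. It is an assumption-laden illustration, not the hierarchical strategy of the paper, whose details are truncated above; all function names are hypothetical.

```python
import bisect


def dict_encode(column):
    """Replace each string by a small integer code into a sorted dictionary."""
    dictionary = sorted(set(column))
    code_of = {value: code for code, value in enumerate(dictionary)}
    codes = [code_of[value] for value in column]
    return dictionary, codes


def dict_decode(dictionary, codes):
    """Eager decompression: rebuild the original column."""
    return [dictionary[code] for code in codes]


def select_equal(dictionary, codes, value):
    """Row indices where column == value, comparing integer codes only."""
    k = bisect.bisect_left(dictionary, value)
    if k == len(dictionary) or dictionary[k] != value:
        return []  # value does not occur in the column at all
    return [row for row, code in enumerate(codes) if code == k]
```

The predicate translates the constant into a code once and then scans integers, which corresponds to the lazy option described in the abstract: rows are decompressed only if the query actually needs their string values.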
2011
Efficient access to large data collections is nowadays an interesting problem for many research areas and applications. A recent conception of the time-space relationship suggests a strong relation between data compression and algorithms in the comparison model. In this sense, efficient algorithms could be used to induce compressed representations of the data they process. Examples of this relationship include unbounded search algorithms and integer encodings, adaptive sorting algorithms and compressed representation of permutations, or union algorithms and encoding for bit vectors. In this thesis, we propose to study the time-space relationship on different data types. We aim to define new compression schemes and compressed data structures based on adaptive algorithms that work over these data types, and to evaluate their practicality in data compression applications.
ArXiv, 2021
We present a new universal source code for distributions of unlabeled binary and ordinal trees that achieves optimal compression to within lower order terms for all tree sources covered by existing universal codes. At the same time, it supports answering many navigational queries on the compressed representation in constant time on the word-RAM; this is not known to be possible for any existing tree compression method. The resulting data structures, “hypersuccinct trees”, hence combine the compression achieved by the best known universal codes with the operation support of the best succinct tree data structures. We apply hypersuccinct trees to obtain a universal compressed data structure for range-minimum queries. It has constant query time and the optimal worst-case space usage of $2n + o(n)$ bits, but the space drops to $1.736n + o(n)$ bits on average for random permutations of $n$ elements, and $2 \lg \binom{n}{r} + o(n)$ for arrays with $r$ increasing runs, respectively. Both results are optima...
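The leading terms of the three space bounds quoted above can be compared with a little arithmetic. The sketch below counts maximal increasing runs in an array and evaluates $2n$, $1.736n$ and $2 \lg \binom{n}{r}$; the $o(n)$ terms are dropped, so the numbers only indicate the leading-order behaviour, and the helper names are assumptions made for illustration.

```python
import math


def count_increasing_runs(values):
    """Number of maximal strictly increasing runs in a non-empty list."""
    runs = 1
    for prev, cur in zip(values, values[1:]):
        if cur <= prev:  # a descent (or tie) starts a new run
            runs += 1
    return runs


def rmq_space_leading_terms(values):
    """Leading terms (in bits) of the bounds quoted in the abstract."""
    n = len(values)
    r = count_increasing_runs(values)
    return {
        "worst_case": 2 * n,
        "random_permutation_average": 1.736 * n,
        "runs_bound": 2 * math.log2(math.comb(n, r)),
    }
```

For an array that is already sorted there is a single run, $\binom{n}{1} = n$, and the runs-based term is only $2 \lg n$ bits, far below the $2n$ worst case.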
Database Engineering …, 1997
This paper addresses the question of how information-theoretically derived compact representations can be applied in practice to improve storage and processing efficiency in DBMS. Compact data representation has the potential for savings in storage, access and processing costs throughout the systems architecture and may alter the balance of usage between disk and solid state storage. To realise the potential performance benefits, however, novel systems engineering must be adopted to ensure that compression/decompression overheads are limited. This paper describes a basic approach to storage and processing of relations in a highly compressed form. A vertical column-wise representation is adopted in which columns can dynamically vary incrementally in both length and width. To achieve good performance, query processing is carried out directly on the compressed relational representation using a compressed representation of the query, thus avoiding decompression overheads. Measurements of performance of the Hibase prototype implementation are compared with those obtained from conventional DBMS.
2007
We present a framework to dynamize succinct data structures, to encourage their use over non-succinct versions in a wide variety of important application areas. Our framework can dynamize most state-of-the-art succinct data structures for dictionaries, ordinal trees, labeled trees, and text collections. Of particular note is its direct application to XML indexing structures that answer subpath queries [2]. Our framework focuses on achieving information-theoretically optimal space along with near-optimal update/query bounds. As the main part of our work, we consider the following problem central to text indexing: Given a text $T$ over an alphabet $\Sigma$, construct a compressed data structure answering the queries $\mathrm{char}(i)$, $\mathrm{rank}_s(i)$, and $\mathrm{select}_s(i)$ for a symbol $s \in \Sigma$. Many data structures consider these queries for static text $T$ [5,3,16,4]. We build on these results and give the best known query bounds for the dynamic version of this problem, supporting arbitrary insertions and deletions of symbols in $T$. Specifically, with an amortized update time of $O(n^{\varepsilon})$, any static succinct data structure $D$ for $T$, taking $t(n)$ time for queries, can be converted by our framework into a dynamic succinct data structure that supports $\mathrm{rank}_s(i)$, $\mathrm{select}_s(i)$, and $\mathrm{char}(i)$ queries in $O(t(n) + \log\log n)$ time, for any constant $\varepsilon > 0$. When $|\Sigma| = \mathrm{polylog}(n)$, we achieve $O(1)$ query times. Our update/query bounds are near-optimal with respect to the lower bounds from [13].
Lecture Notes in Computer Science, 2014
Engineering efficient implementations of compact and succinct structures is a time-consuming and challenging task, since there is no standard library of easy-to-use, highly optimized, and composable components. One consequence is that measuring the practical impact of new theoretical proposals is a difficult task, since older baseline implementations may not rely on the same basic components, and reimplementing from scratch can be very time-consuming. In this paper we present a framework for experimentation with succinct data structures, providing a large set of configurable components, together with tests, benchmarks, and tools to analyze resource requirements. We demonstrate the functionality of the framework by recomposing succinct solutions for document retrieval.
Combinatorial Pattern Matching, 2011
LRM-Trees are an elegant way to partition a sequence of values into sorted consecutive blocks, and to express the relative position of the first element of each block within a previous block. They were used to encode ordinal trees and to index integer arrays in order to support range minimum queries on them. We describe how they yield many other convenient results in a variety of areas, from data structures to algorithms: some compressed succinct indices for range minimum queries; a new adaptive sorting algorithm; and a compressed succinct data structure for permutations supporting direct and indirect application in time that is shorter the more compressible the permutation is. As part of our review of preliminary work, we also give an overview of the sometimes redundant terminology relating to succinct data structures and indices.
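As a rough illustration of the structure, the sketch below computes a parent array under the assumption that the parent of each position is its previous strictly smaller value, with `-1` standing in for an artificial root smaller than every element. The abstract itself does not spell out the definition, so this convention and the function name are assumptions, and the code is not the compressed encoding from the paper.

```python
def lrm_parents(values):
    """Parent of each index: nearest index to the left holding a strictly
    smaller value, or -1 (artificial root) if no such index exists."""
    parents = []
    stack = []  # indices whose values are strictly increasing, bottom to top
    for i, v in enumerate(values):
        # Discard candidates that are not strictly smaller than v.
        while stack and values[stack[-1]] >= v:
            stack.pop()
        parents.append(stack[-1] if stack else -1)
        stack.append(i)
    return parents
```

Each index is pushed and popped at most once, so the whole array is processed in linear time. The children of the artificial root are exactly the left-to-right minima of the sequence.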
ACM SIGMOD Record, 2000
Proceedings of the 2009 ACM SIGMOD International Conference on Management of data, 2009
Lecture Notes in Computer Science, 2001
Proceedings of the 25th …, 2002
Proceedings 2000 International Database Engineering and Applications Symposium (Cat. No.PR00789), 2000
Proceedings of 16th International Conference on Data Engineering (Cat. No.00CB37073)
Information Processing & Management, 1976
Database and Expert Systems …, 2008
ACM Transactions on Programming Languages and Systems, 1979