2004, Journal of Computer and System Sciences
An implicit data structure for the dictionary problem maintains n data values in the first n locations of an array in such a way that it efficiently supports the operations insert, delete and search. No information other than that in O(1) memory cells and in the input data is to be retained; and the only operations performed on the data values (other than reads and writes) are comparisons. This paper describes the implicit B-tree, a new data structure supporting these operations in O(log_B n) block transfers like in regular B-trees, under the realistic assumption that a block stores B = O(log n) keys, so that reporting r consecutive keys in sorted order has a cost of O(log_B n + r/B) block transfers. En route a number of space-efficient techniques for handling segments of a large array in a memory hierarchy are developed. Being implicit, the proposed data structure occupies exactly ⌈n/B⌉ blocks of memory after each update, where n is the number of keys after each update and B is the number of keys contained in a memory block. In main memory, the time complexity of the operations is O(log² n / log log n), disproving a conjecture of the mid-1980s.
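The implicit setting can be made concrete with the simplest implicit dictionary: a sorted array. The n keys occupy the first n cells, and the only extra state is the counter n, so it uses O(1) extra words. The sketch below is that baseline and not the implicit B-tree. Search takes O(log n) comparisons, but insert and delete shift O(n) elements, which is exactly the cost the implicit B-tree avoids. The class name and the helper `blocks_used` (which computes the ⌈n/B⌉ block count for a contiguous layout) are hypothetical illustrations.

```python
from bisect import bisect_left


class SortedArrayDict:
    """Baseline implicit dictionary: keys live in data[0:n] in sorted order.

    Apart from the array itself, only the count n is stored (O(1) extra space).
    """

    def __init__(self, capacity: int) -> None:
        self.data = [None] * capacity  # only the first n cells are meaningful
        self.n = 0

    def search(self, key) -> bool:
        i = bisect_left(self.data, key, 0, self.n)  # binary search over data[0:n]
        return i < self.n and self.data[i] == key

    def insert(self, key) -> None:
        if self.n == len(self.data):
            raise OverflowError("array is full")
        i = bisect_left(self.data, key, 0, self.n)
        # Shift data[i:n] one cell to the right to open a slot: O(n) moves.
        self.data[i + 1 : self.n + 1] = self.data[i : self.n]
        self.data[i] = key
        self.n += 1

    def delete(self, key) -> bool:
        i = bisect_left(self.data, key, 0, self.n)
        if i == self.n or self.data[i] != key:
            return False
        # Shift data[i+1:n] one cell to the left to close the gap.
        self.data[i : self.n - 1] = self.data[i + 1 : self.n]
        self.n -= 1
        self.data[self.n] = None
        return True


def blocks_used(n: int, block_size: int) -> int:
    """Blocks needed to hold n keys packed contiguously: ceil(n / B)."""
    return -(-n // block_size)
```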
2005
We present the interpolation search tree (ISB-tree), a new cache-aware indexing scheme that supports update operations (insertions and deletions) in O(1) worst-case (w.c.) block transfers and search operations in O(log_B log n) expected block transfers, where B represents the disk block size and n denotes the number of stored elements. The expected search bound holds with high probability for a large class of (unknown) input distributions. The w.c. search bound of our indexing scheme is O(log_B n) block transfers. Our update and expected search bounds constitute a considerable improvement over the O(log_B n) w.c. block transfer bounds for search and update operations achieved by the B-tree and its numerous variants. This is also suggested by a set of preliminary experiments we have carried out. Our indexing scheme is based on an externalization of a main memory data structure based on interpolation search.
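The ISB-tree is described as externalizing a main-memory structure built on interpolation search. For reference, here is a minimal sketch of plain interpolation search over a sorted in-memory list of integers. It is not the ISB-tree itself. Instead of probing the middle, it probes where the key would sit if keys were spread uniformly, which is what yields the well-known O(log log n) expected probes on uniformly distributed keys.

```python
def interpolation_search(a: list[int], x: int) -> int:
    """Return an index i with a[i] == x in the sorted list a, or -1 if absent."""
    lo, hi = 0, len(a) - 1
    while lo <= hi and a[lo] <= x <= a[hi]:
        if a[hi] == a[lo]:
            # All keys in a[lo..hi] are equal; avoid dividing by zero.
            return lo if a[lo] == x else -1
        # Linear interpolation of the probe position; stays within [lo, hi]
        # because a[lo] <= x <= a[hi].
        pos = lo + (x - a[lo]) * (hi - lo) // (a[hi] - a[lo])
        if a[pos] == x:
            return pos
        if a[pos] < x:
            lo = pos + 1
        else:
            hi = pos - 1
    return -1
```

On skewed inputs the probe can degrade toward a linear scan. That is why the abstract pairs the expected bound with a separate worst-case O(log_B n) guarantee.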
SIAM Journal on Computing, 2014
This paper presents a general technique for optimally transforming any dynamic data structure that operates on atomic and indivisible keys by constant-time comparisons into a data structure that handles unbounded-length keys whose comparison cost is not constant. Examples of such keys are strings, multi-dimensional points, multiple-precision numbers, multi-key data (e.g. records), XML paths, URL addresses, etc. The technique is more general than previous work, as it does not require exploiting any particular underlying structure. The only requirement is that the insertion of a key must identify its predecessor or its successor. Using the proposed technique, online suffix tree construction can be done in worst-case time O(log n) per input symbol (as opposed to the amortized O(log n) time per symbol achieved by previously known algorithms). To our knowledge, our algorithm is the first that achieves O(log n) worst-case time per input symbol. Searching for a pattern of length m in the resulting suffix tree takes O(min(m log |Σ|, m + log n) + tocc) time, where tocc is the number of occurrences of the pattern. The paper also describes further applications and shows how to obtain alternative methods for suffix sorting, dynamic lowest common ancestors and order maintenance. The technical features of the proposed technique for a given data structure D are as follows. The new data structure D′ is obtained from D by augmenting the latter with an oracle for strings, extending the functionalities of the Dietz-Sleator list for order maintenance. The space complexity of D′ is S(n) + O(n) memory cells for storing n keys, where S(n) denotes the space complexity of D. Then, each operation involving O(1) keys taken from D′ requires O(T(n)) time, where T(n) denotes the time complexity of the corresponding operation originally supported in D.
Each operation involving a key y not stored in D′ takes O(T(n) + |y|) time, where |y| denotes the length of y. For the special case where the oracle handles suffixes of a string, the achieved insertion time is O(T(n)).
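The reason keys already stored in D′ compare in O(1) time is that an order-maintenance list gives each key an integer label consistent with sorted order. A long string comparison is paid for only once, when a new key's predecessor is located. The sketch below (class and method names are hypothetical) uses naive labelling: each new element gets the midpoint label between its neighbours, and everything is relabelled when no integer gap remains. Dietz and Sleator's structure relabels only locally to reach O(1) amortized updates; this sketch does not attempt that.

```python
class OrderList:
    """Order-maintenance list with integer labels (naive global relabelling).

    compare(x, y) is O(1) because it only compares labels. Elements must be
    hashable and distinct. list.index makes insert_after O(n) here.
    """

    GAP = 1 << 20  # spacing used when (re)assigning labels

    def __init__(self, first) -> None:
        self.order = [first]     # elements in list order
        self.label = {first: 0}  # element -> integer label

    def compare(self, x, y) -> int:
        """Return -1, 0 or 1 according to the positions of x and y."""
        lx, ly = self.label[x], self.label[y]
        return (lx > ly) - (lx < ly)

    def insert_after(self, x, y) -> None:
        """Insert the new element y immediately after the existing element x."""
        i = self.order.index(x) + 1
        self.order.insert(i, y)
        lo = self.label[x]
        if i + 1 < len(self.order):
            hi = self.label[self.order[i + 1]]
        else:
            hi = lo + 2 * self.GAP  # y is last: leave room after it
        if hi - lo < 2:
            self._relabel()  # no free integer strictly between lo and hi
        else:
            self.label[y] = (lo + hi) // 2

    def _relabel(self) -> None:
        # Spread all labels (including the new element's) evenly again.
        for k, e in enumerate(self.order):
            self.label[e] = k * self.GAP
```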
Journal of Computer and System Sciences, 1980
We consider representations of data structures in which the relative ordering of the values stored is implicit in the pattern in which the elements are retained, rather than explicit in pointers. Several implicit schemes for storing data are introduced to permit efficient implementation of the instructions insert, delete and search. Θ(N^(1/2)) basic operations are shown to be necessary and sufficient, in the worst case, to perform these instructions provided that the data elements are kept in some fixed partial order. We demonstrate, however, that the upper bound can be reduced to O(N^(1/3) log N) if arrangements other than fixed partial orders are used.
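One well-known fixed partial order with square-root behaviour is the biparental heap (beap), generally credited to this paper. The sketch below shows only beap search and assumes all rows are full, so n is a triangular number. Row r holds r + 1 elements, and each element is no larger than its two "children" in the next row. Starting at the bottom-left corner, every comparison discards a whole column or diagonal, so the walk takes about 2h = O(√n) steps. The function name is our own.

```python
from math import isqrt


def beap_search(a: list, x) -> int:
    """Return the index of x in the beap stored in list a, or -1.

    Element (r, c) is stored at index r*(r+1)//2 + c and satisfies
    a(r, c) <= a(r+1, c) and a(r, c) <= a(r+1, c+1).
    """
    n = len(a)
    h = (isqrt(8 * n + 1) - 1) // 2  # number of rows
    assert h * (h + 1) // 2 == n, "sketch assumes full rows"
    r, c = h - 1, 0  # start at the bottom-left corner
    while 0 <= c <= r:
        i = r * (r + 1) // 2 + c
        if a[i] == x:
            return i
        if a[i] > x:
            r -= 1  # x cannot be on this element's down-right diagonal: go up
        elif r + 1 < h:
            r, c = r + 1, c + 1  # x cannot be at or above here in column c: go down-right
        else:
            c += 1  # already on the last row: step right
    return -1
```

A fully sorted array is also a valid beap. The non-sorted example in the test below shows the search only relies on the partial order.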
We propose the PATRICIA-hypercube-tree, or PH-tree, a multi-dimensional data storage and indexing structure. It is based on binary PATRICIA-tries combined with hypercubes for efficient data access. Space efficiency is achieved by combining prefix sharing with a space-optimised implementation. This leads to storage space requirements that are comparable to, or below, those of the same data stored in non-index structures such as arrays of objects. The storage structure also serves as a multi-dimensional index on all dimensions of the stored data. This enables efficient access to stored data via point and range queries. We explain the concept of the PH-tree, demonstrate the performance of a sample implementation on various datasets, and compare it to other spatial indices such as the kD-tree. The experiments show that for larger datasets beyond 10^7 entries, the PH-tree increasingly and consistently outperforms other structures in terms of space efficiency, query performance and update performance. For some highly skewed datasets, it even shows super-constant performance, becoming faster for larger datasets.
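To make "binary tries combined with hypercubes" concrete: for k dimensions, a node at bit depth d can be addressed by taking bit d of every coordinate and concatenating them into a k-bit number. That number selects one of the node's 2^k hypercube slots. The sketch below shows only this address computation, assuming non-negative integer coordinates of a fixed bit width. The PATRICIA-style skipping of bits shared by all entries below a node is omitted, and the function names are hypothetical.

```python
def hypercube_address(point: tuple[int, ...], bit: int) -> int:
    """Combine bit number `bit` of every coordinate into one k-bit address.

    Dimension 0 becomes the most significant bit of the address.
    """
    addr = 0
    for coord in point:
        addr = (addr << 1) | ((coord >> bit) & 1)
    return addr


def addresses_along_path(point: tuple[int, ...], width: int) -> list[int]:
    """Hypercube addresses visited from the root (most significant bit) down."""
    return [hypercube_address(point, b) for b in range(width - 1, -1, -1)]
```

For example, the 2-D point (5, 3), that is (0b101, 0b011), descends through slots 0b10, 0b01, 0b11.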
Acta Informatica, 1982
In this paper we explore the use of weak B-trees to represent sorted lists. In weak B-trees each node has at least a and at most b sons, where 2a < b. We analyse the worst-case cost of sequences of insertions and deletions in weak B-trees. This leads to a new data structure (level-linked weak B-trees) for representing sorted lists when the access pattern exhibits a (time-varying) locality of reference. Our structure is substantially simpler than the one proposed in [7], yet it has many of its properties. Our structure is as simple as the one proposed in [5], but ours can treat arbitrary sequences of insertions and deletions, whereas theirs can only treat non-interacting insertions and deletions. We also show that weak B-trees support concurrent operations efficiently.
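A small arithmetic check, our own illustration rather than the paper's analysis, shows why the condition 2a < b matters. An overflowing node with b + 1 children splits into two halves. An underflowing node with a - 1 children fuses with a minimal sibling of a children. When 2a < b, both outcomes land strictly inside (a, b), so a rebalancing step does not leave a node sitting on a threshold where the next operation would rebalance it again. With b = 2a, a split produces a node with exactly a children. The function names are hypothetical.

```python
def split_sizes(b: int) -> tuple[int, int]:
    """Child counts of the two nodes produced by splitting a node with b + 1 children."""
    return (b + 1) // 2, (b + 2) // 2


def fuse_size(a: int) -> int:
    """Child count after fusing a node with a - 1 children and a sibling with a."""
    return 2 * a - 1


def has_slack(a: int, b: int) -> bool:
    """True if split and fuse results both stay strictly between a and b."""
    lo, hi = split_sizes(b)
    f = fuse_size(a)
    return a < lo and hi < b and a < f < b
```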
wwwdb.inf.tu-dresden.de
Abstract: Efficient data structures for in-memory indexing are gaining importance due to (1) the exponentially increasing amount of data, (2) the growing main-memory capacity, and (3) the gap between main-memory and CPU speed. Consequently, there are high performance demands for ...
Proceedings of the 2014 ACM symposium on Principles of distributed computing - PODC '14, 2014
In this paper we present a novel algorithm for concurrent lock-free internal binary search trees (BSTs) and implement a Set abstract data type (ADT) based on it. We show that in the presented lock-free BST algorithm the amortized step complexity of each set operation (Add, Remove and Contains) is O(H(n) + c), where H(n) is the height of a BST with n nodes and c is the contention during the execution. Our algorithm adapts to contention measures according to the read-write load. If the situation is read-heavy, the operations avoid helping pending concurrent Remove operations during traversal and adapt to interval contention. However, for write-heavy situations we let an operation help a pending Remove, even though it is not obstructed, and so adapt to the tighter point contention. It uses single-word compare-and-swap (CAS) operations. We show that our algorithm has improved disjoint-access parallelism compared to similar existing algorithms. We prove that the presented algorithm is linearizable. To the best of our knowledge, this is the first algorithm for any concurrent tree data structure in which the modify operations are performed with an additive term of contention measure.
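The core idea of CAS-based tree updates can be sketched in a few lines. A thread reads an empty child pointer and tries to swing it to its new node with a single compare-and-swap; if another thread won the race, the CAS fails and the operation retries. This sketch covers only Add and Contains. The difficult part of the paper, Remove with adaptive helping, is omitted. Python exposes no user-level hardware CAS, so `AtomicRef` (a hypothetical helper) emulates one with a lock around the single read-compare-write step. This is therefore an illustration of the retry pattern, not a truly lock-free implementation.

```python
import threading


class AtomicRef:
    """Emulated single-word CAS cell (a lock stands in for the hardware instruction)."""

    def __init__(self, value=None) -> None:
        self._value = value
        self._lock = threading.Lock()

    def get(self):
        return self._value

    def cas(self, expected, new) -> bool:
        with self._lock:
            if self._value is expected:
                self._value = new
                return True
            return False


class Node:
    def __init__(self, key) -> None:
        self.key = key
        self.left = AtomicRef()
        self.right = AtomicRef()


class CasBSTSet:
    """Insert-only internal BST set whose child pointers change only via CAS."""

    def __init__(self) -> None:
        self.root = AtomicRef()

    def add(self, key) -> bool:
        new = Node(key)
        while True:  # retry loop
            slot = self.root
            cur = slot.get()
            while cur is not None:
                if key == cur.key:
                    return False  # already present
                slot = cur.left if key < cur.key else cur.right
                cur = slot.get()
            # Swing the empty pointer to the new node; fails if another
            # thread filled this slot first, in which case we retry.
            if slot.cas(None, new):
                return True

    def contains(self, key) -> bool:
        cur = self.root.get()
        while cur is not None:
            if key == cur.key:
                return True
            cur = (cur.left if key < cur.key else cur.right).get()
        return False
```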
Lecture Notes in Computer Science, 2013
This paper proposes a new lock-based concurrent binary tree using a methodology for writing concurrent data structures. This methodology limits the high contention induced by today's multicore environments to produce efficient alternatives to the most widely used search structures. Data structures are generally constrained to guarantee a big-oh step complexity even in the presence of concurrency. By contrast, our methodology guarantees the big-oh complexity only in the absence of contention and limits the contention when concurrency appears. The key concept lies in dividing update operations into an eager abstract access, which returns rapidly for efficiency, and a lazy structural adaptation, which may be postponed to diminish contention. Our evaluation clearly shows that our lock-based tree is up to 2.2× faster than the most recent lock-based tree algorithm we are aware of.
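The split between an eager abstract access and a lazy structural adaptation can be illustrated with a single-threaded sketch. It is not the paper's lock-based tree. `remove` only flags a node as deleted, which is fast and leaves the shape unchanged. A separate `restructure` pass later drops flagged nodes and rebalances. In a concurrent setting, that pass could run when contention is low. A full rebuild stands in here for whatever local adaptation the paper performs, and all names are hypothetical.

```python
class _Node:
    __slots__ = ("key", "left", "right", "deleted")

    def __init__(self, key) -> None:
        self.key, self.left, self.right, self.deleted = key, None, None, False


class LazyBST:
    """BST with logical deletion (eager) and deferred restructuring (lazy)."""

    def __init__(self) -> None:
        self.root = None

    def _find(self, key):
        parent, cur = None, self.root
        while cur is not None and cur.key != key:
            parent, cur = cur, (cur.left if key < cur.key else cur.right)
        return parent, cur

    def insert(self, key) -> bool:
        """Return True if key was not live before the call."""
        parent, cur = self._find(key)
        if cur is not None:  # node exists: revive it if it was flagged
            was_deleted, cur.deleted = cur.deleted, False
            return was_deleted
        node = _Node(key)
        if parent is None:
            self.root = node
        elif key < parent.key:
            parent.left = node
        else:
            parent.right = node
        return True

    def remove(self, key) -> bool:
        _, cur = self._find(key)
        if cur is None or cur.deleted:
            return False
        cur.deleted = True  # eager abstract access: no structural change
        return True

    def contains(self, key) -> bool:
        _, cur = self._find(key)
        return cur is not None and not cur.deleted

    def restructure(self) -> None:
        """Lazy structural adaptation: drop flagged nodes and rebalance."""
        keys = []

        def inorder(n):
            if n is not None:
                inorder(n.left)
                if not n.deleted:
                    keys.append(n.key)
                inorder(n.right)

        def build(lo, hi):
            if lo >= hi:
                return None
            mid = (lo + hi) // 2
            n = _Node(keys[mid])
            n.left, n.right = build(lo, mid), build(mid + 1, hi)
            return n

        inorder(self.root)
        self.root = build(0, len(keys))
```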
In multimedia databases, spatial index structures based on trees (like the R-tree and M-tree) have proven efficient and scalable for low-dimensional data retrieval. However, if the data dimensionality is too high, the hierarchy of nested regions (represented by the tree nodes) becomes spatially indistinct. Query processing then deteriorates into inefficient index traversal (in terms of random-access I/O costs), and in such cases tree-based indexes are less efficient than sequential search. This is mainly due to repeated access to many nodes at the top levels of the tree. In this paper we propose a modified storage layout for tree-based indexes in which nodes belonging to the same tree level are stored together. Such level-ordered storage makes it possible to prefetch several top levels of the tree into the buffer pool with only a few contiguous I/O operations, or even a single one (i.e. a one-seek read). The experimental results show that our approach can speed up tree-based search significantly.
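A level-ordered layout is simply a breadth-first serialization of the nodes. Each level then occupies a contiguous run, and the top L levels form a prefix of the file that one sequential read can fetch. The sketch below records where each level starts. Node keys stand in for node pages, and the mapping from positions to page offsets is left out. All names are hypothetical.

```python
from collections import deque


class Node:
    def __init__(self, key, children=()) -> None:
        self.key = key
        self.children = list(children)


def level_order_layout(root: Node) -> tuple[list, list[int]]:
    """Serialize nodes breadth-first so that each tree level is contiguous.

    Returns (layout, level_start), where level_start[d] is the position of the
    first node of depth d, followed by a sentinel equal to len(layout).
    """
    layout, level_start = [], []
    frontier = deque([(root, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == len(level_start):
            level_start.append(len(layout))  # first node of a new level
        layout.append(node.key)
        frontier.extend((c, depth + 1) for c in node.children)
    level_start.append(len(layout))
    return layout, level_start


def top_levels_span(level_start: list[int], levels: int) -> range:
    """Positions covering the top `levels` levels: a single contiguous read."""
    levels = min(levels, len(level_start) - 1)
    return range(0, level_start[levels])
```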
WSEAS Transactions on Computers, 2008
Abstract: Trees are frequently used data structures for fast access to the stored data. Data structures like arrays, vectors and linked lists are limited by the trade-off between the ability to perform a fast search and the ability to resize easily. Binary Search Trees are an ...
International Journal of Computers & Technology
In this paper, a novel data structure is proposed to dynamically insert and delete segments. Unlike standard segment trees, the proposed data structure permits insertion of a segment whose interval range extends beyond the interval range of the existing tree, which is the interval between the minimum and maximum end points of all the segments. Moreover, the proposed tree has fewer nodes than the dynamic version of the standard segment tree, and in practice it answers both stabbing and range queries much faster than standard segment trees.
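The abstract does not describe the proposed tree's internals, so the following is not that structure. It is a standard pointer-based segment tree over integer coordinates, with one addition that shows one way a segment outside the current range can be accommodated. When a segment falls outside the root's range, a new root of twice the size is placed above the old one, so existing nodes are kept as they are. Segments are half-open [s, e) with s < e. Class and method names are hypothetical.

```python
class SegNode:
    def __init__(self, lo: int, hi: int) -> None:
        self.lo, self.hi = lo, hi  # covers the integer range [lo, hi)
        self.segs = []             # segments whose canonical cover includes this node
        self.left = self.right = None


class GrowingSegmentTree:
    """Segment tree whose root range doubles on demand (sizes stay powers of two)."""

    def __init__(self) -> None:
        self.root = SegNode(0, 1)

    def _grow(self, s: int, e: int) -> None:
        while s < self.root.lo or e > self.root.hi:
            old, size = self.root, self.root.hi - self.root.lo
            if s < old.lo:  # extend to the left: old root becomes the right child
                new = SegNode(old.lo - size, old.hi)
                new.right = old
            else:           # extend to the right: old root becomes the left child
                new = SegNode(old.lo, old.hi + size)
                new.left = old
            self.root = new

    def insert(self, s: int, e: int) -> None:
        if s >= e:
            raise ValueError("segment must be non-empty")
        self._grow(s, e)
        self._insert(self.root, s, e)

    def _insert(self, node: SegNode, s: int, e: int) -> None:
        if s <= node.lo and node.hi <= e:  # node fully covered by the segment
            node.segs.append((s, e))
            return
        mid = (node.lo + node.hi) // 2
        if s < mid:
            if node.left is None:
                node.left = SegNode(node.lo, mid)
            self._insert(node.left, s, e)
        if e > mid:
            if node.right is None:
                node.right = SegNode(mid, node.hi)
            self._insert(node.right, s, e)

    def stab(self, p: int) -> list:
        """All stored segments [s, e) with s <= p < e."""
        out, node = [], self.root
        if not (node.lo <= p < node.hi):
            return out
        while node is not None:
            out.extend(node.segs)
            mid = (node.lo + node.hi) // 2
            node = node.left if p < mid else node.right
        return out
```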
Information Processing Letters, 1996
We present I/O-efficient fully persistent B-Trees that support range searches at any version in O(log_B n + t/B) I/Os and updates at any version in O(log_B n + log_2 B) amortized I/Os, using space O(m/B) disk blocks. By n we denote the number of elements in the accessed version, by m the total number of updates, by t the size of the query's output, and by B the disk block size. The result improves the previous fully persistent B-Trees of Lanka and Mays by a factor of O(log_B m) for the range query complexity and O(log_B n) for the update complexity. To achieve the result, we first present a new B-Tree implementation that supports searches and updates in O(log_B n) I/Os, using O(n/B) blocks of space. Moreover, every update makes in the worst case a constant number of modifications to the data structure. We make these B-Trees fully persistent using an I/O-efficient method for full persistence that is inspired by the node-splitting method of Driscoll et al. The method we present is interesting...
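"Full persistence" means any past version can be both queried and updated, and each update produces a new version. The simplest way to see this is path copying on an in-memory binary search tree. An insert copies only the nodes on its search path and shares every other subtree with the version it started from. This is not the paper's I/O-efficient node-splitting method: path copying costs O(height) new nodes per update. The sketch only shows the version semantics, and its names are hypothetical.

```python
from typing import NamedTuple, Optional


class PNode(NamedTuple):
    key: int
    left: Optional["PNode"]
    right: Optional["PNode"]


def insert(root: Optional[PNode], key: int) -> Optional[PNode]:
    """Return the root of a new version containing key; the old version is untouched.

    Only nodes on the search path are copied; other subtrees are shared.
    """
    if root is None:
        return PNode(key, None, None)
    if key < root.key:
        return root._replace(left=insert(root.left, key))
    if key > root.key:
        return root._replace(right=insert(root.right, key))
    return root  # key already present: reuse this version's node


def keys(root: Optional[PNode]) -> list[int]:
    """In-order keys of one version."""
    return [] if root is None else keys(root.left) + [root.key] + keys(root.right)
```

Because `v1` remains usable after `v2 = insert(v1, 3)`, calling `insert(v1, 8)` branches a second history from the same version. That branching is the capability that separates full persistence from partial persistence.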
2008 IEEE International Conference on Communications, 2008
A Bloom Filter is an efficient randomized data structure for membership queries on a set with a certain known false positive probability. A Counting Bloom Filter (CBF) supports the same operations on dynamic sets that can be updated via insertions and deletions, at the cost of larger memory requirements. This paper presents a novel hierarchical data structure, called the Blooming Tree, that replicates the functionality of a CBF with lower memory consumption and tunable false positive probability. The hierarchical multi-layer design of Blooming Trees allows the structure to be distributed across different memory levels, thus exploiting small but fast on-chip memories for the most frequently accessed substructures. The proposed algorithm is compared with existing schemes on a target platform: Intel IXP2XXX Network Processors (NPs).
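For reference, here is a minimal sketch of the Counting Bloom Filter that the Blooming Tree is said to replicate; the Blooming Tree itself is not shown. Each of the m positions holds a small counter rather than a bit, so deletions decrement counters. Deriving the k positions by salting SHA-256 with the hash index is our own assumption for illustration, not a scheme taken from the paper.

```python
import hashlib


class CountingBloomFilter:
    """m counters; each item maps to k counter positions."""

    def __init__(self, m: int, k: int) -> None:
        self.m, self.k = m, k
        self.counts = [0] * m

    def _positions(self, item: str) -> list[int]:
        # Assumed hashing scheme: salt one cryptographic hash with the index i.
        return [
            int.from_bytes(hashlib.sha256(f"{i}:{item}".encode()).digest()[:8], "big") % self.m
            for i in range(self.k)
        ]

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.counts[p] += 1

    def remove(self, item: str) -> None:
        """Remove an item previously added; removing a non-member corrupts the filter."""
        for p in self._positions(item):
            self.counts[p] -= 1

    def __contains__(self, item: str) -> bool:
        # May give a false positive, never a false negative for current members.
        return all(self.counts[p] > 0 for p in self._positions(item))
```

Memory is the cost the abstract refers to. Each counter needs several bits (commonly 4) where a plain Bloom filter needs one.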
Usage of Data Structures In Presenting a Tree Structure for Storing and Searching Large Lists Of Order, 2019
This paper tries to throw light on the usage of data structures in the field of information retrieval. Information retrieval is an area of study that is gaining momentum as the need and urge to share and explore information grows day by day. Data structures have long been an area of research in computer science, and the need for efficient data structures has become even more important as data grows exponentially. Sorting lists is a problem of great concern, and researchers continue to work on optimizing sorting algorithms. So far, lists have been sorted using linear arrays. Because a linear array has a limited size and is time-consuming to traverse, sorting with this type of data structure is undesirable. Sorting a linear list requires scanning it once for each element and comparing that element with the other elements, and searching requires scrolling through the elements until the desired one is reached. A better approach is to split the original list into two smaller lists, which reduces this cost to O(log n). This paper presents a tree structure for storing and searching that can order large lists in O(n) time. Its search over the list elements also does not depend on the size or type of the list and takes constant time (O(1)).
B-tree and R-tree are two basic index structures, and many variants of each have been proposed since. Different variants are used in specific applications to optimize performance. In this paper, different variants of the B-tree and R-tree are discussed and compared. Index structures differ in terms of structure, query support, data type support and application. The structures of the indexes are discussed first: the B-tree and its variants, and then the R-tree and its variants. Some example structures are also shown to give a clearer idea. Finally, all structures are compared with respect to complexity, query type support, data type support and application.
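The defining idea of the R-tree family is that each internal entry stores the minimum bounding rectangle (MBR) of its child. A range query can then skip any subtree whose MBR misses the query rectangle. The sketch below builds a hand-made two-level tree and runs a range query. Insertion and split heuristics, where variants such as the R*-tree mainly differ, are omitted, and all names are hypothetical.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def intersects(self, other: "Rect") -> bool:
        """Closed rectangles intersect iff they overlap on both axes."""
        return (self.xmin <= other.xmax and other.xmin <= self.xmax
                and self.ymin <= other.ymax and other.ymin <= self.ymax)


def mbr(rects: list[Rect]) -> Rect:
    """Minimum bounding rectangle of a non-empty list of rectangles."""
    return Rect(min(r.xmin for r in rects), min(r.ymin for r in rects),
                max(r.xmax for r in rects), max(r.ymax for r in rects))


def build_root(leaves: list[list[Rect]]) -> list[tuple[Rect, list[Rect]]]:
    """Root entries of a two-level tree: (MBR of leaf, leaf)."""
    return [(mbr(leaf), leaf) for leaf in leaves]


def range_query(root: list[tuple[Rect, list[Rect]]], q: Rect) -> list[Rect]:
    out = []
    for box, leaf in root:
        if box.intersects(q):  # otherwise the whole leaf is skipped
            out.extend(r for r in leaf if r.intersects(q))
    return out
```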
Foundations and Trends in Databases, 2010
Invented about 40 years ago and called ubiquitous less than 10 years later, B-tree indexes have been used in a wide variety of computing systems from handheld devices to mainframes and server farms. Over the years, many techniques have been added to the basic design in order to improve efficiency or to add functionality. Examples include separation of updates to structure or contents, utility operations such as non-logged yet transactional index creation, and robust query processing such as graceful degradation during index-to-index navigation. This survey reviews the basics of B-trees and of B-tree indexes in databases, transactional techniques and query processing techniques related to B-trees, B-tree utilities essential for database operations, and many optimizations and improvements. It is intended both as a survey and as a reference, enabling researchers to compare index innovations with advanced B-tree techniques and enabling professionals to select features, functions, and tradeoffs most appropriate for their data management challenges.
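As a reminder of the basics the survey starts from, a B-tree lookup descends from the root, binary-searching the sorted keys inside each node (one node typically corresponds to one disk page). It follows child i when the key falls between keys[i-1] and keys[i]. The sketch below shows only this search on a hand-built tree. Insertion with node splits is omitted, and so is the B+-tree convention of keeping records only in leaves. Names are hypothetical.

```python
from bisect import bisect_left
from dataclasses import dataclass, field


@dataclass
class BTreeNode:
    keys: list = field(default_factory=list)      # sorted keys of this node
    children: list = field(default_factory=list)  # empty for a leaf, else len(keys) + 1 nodes


def btree_search(node: BTreeNode, key) -> bool:
    """Descend from node, binary-searching the sorted keys at each level."""
    while True:
        i = bisect_left(node.keys, key)
        if i < len(node.keys) and node.keys[i] == key:
            return True
        if not node.children:  # reached a leaf without finding key
            return False
        node = node.children[i]  # child i holds keys between keys[i-1] and keys[i]
```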
2015 IEEE 31st International Conference on Data Engineering, 2015
With prices of main memory constantly decreasing, people nowadays are more interested in performing their computations in main memory, leaving the high I/O costs of traditional disk-based systems out of the equation. This change of paradigm, however, poses new challenges for the way data should be stored and indexed in main memory in order to be processed efficiently. Traditional data structures, like the venerable B-tree, were designed to work on disk-based systems, but they are no longer the way to go in main-memory systems, at least not in their original form, due to poor cache utilization on the systems they run on. As a result, there has been a considerable amount of research during the last decade on index data structures for main-memory systems. Among the most recent and most interesting of these is the recently proposed adaptive radix tree ARTful (ART for short). The authors of ART presented experiments indicating that ART was clearly a better choice than other recent tree-based data structures like FAST and B+-trees. However, ART was not the first adaptive radix tree. To the best of our knowledge, the first was the Judy Array (Judy for short), and a comparison between ART and Judy was not shown. Moreover, the same set of experiments indicated that only a hash table was competitive with ART. The hash table used by the authors of ART in their study was a chained hash table, but this kind of hash table can be suboptimal in terms of space and performance due to its potentially high use of pointers. In this paper we present a thorough experimental comparison between ART, Judy, two variants of hashing via quadratic probing, and three variants of Cuckoo hashing. These hashing schemes are known to be very efficient. For our study we consider whether the data structures are to be used as a non-covering index (relying on an additional store) or as a covering index (covering key-value pairs).
We consider both OLAP and OLTP scenarios. Our experiments strongly indicate that neither ART nor Judy is competitive with the aforementioned hashing schemes in terms of performance and, in the case of ART, sometimes not even in terms of space.
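Quadratic probing is one of the pointer-free schemes compared above. Colliding keys are placed in the same flat array at offsets that grow quadratically from the home slot, so no per-entry pointers are needed. The abstract does not give the authors' exact probe sequence, load factor or hash function. This sketch assumes triangular-number offsets (0, 1, 3, 6, ...) with a power-of-two capacity, a common variant that is guaranteed to visit every slot. It uses Python's built-in `hash` and supports only insert and lookup; deletion would need tombstones. The class name is hypothetical.

```python
class QuadraticProbingMap:
    """Open-addressing map (insert/lookup only) with triangular-number probing."""

    def __init__(self, capacity: int = 8) -> None:
        assert capacity > 0 and capacity & (capacity - 1) == 0, "capacity must be a power of two"
        self.keys = [None] * capacity
        self.vals = [None] * capacity
        self.size = 0

    def _probe(self, key):
        mask = len(self.keys) - 1
        pos = hash(key) & mask
        for i in range(len(self.keys)):
            yield pos
            pos = (pos + i + 1) & mask  # steps 1, 2, 3, ... give offsets i*(i+1)/2

    def put(self, key, value) -> None:
        if key is None:
            raise ValueError("None is reserved as the empty-slot marker")
        if 2 * (self.size + 1) > len(self.keys):  # keep load factor <= 1/2
            self._resize(2 * len(self.keys))
        for pos in self._probe(key):
            if self.keys[pos] is None:
                self.keys[pos], self.vals[pos] = key, value
                self.size += 1
                return
            if self.keys[pos] == key:
                self.vals[pos] = value
                return
        raise RuntimeError("unreachable: triangular probing covers every slot")

    def get(self, key, default=None):
        for pos in self._probe(key):
            if self.keys[pos] is None:
                return default  # an empty slot ends the probe sequence
            if self.keys[pos] == key:
                return self.vals[pos]
        return default

    def _resize(self, capacity: int) -> None:
        old = [(k, v) for k, v in zip(self.keys, self.vals) if k is not None]
        self.keys, self.vals, self.size = [None] * capacity, [None] * capacity, 0
        for k, v in old:
            self.put(k, v)
```

Storing keys and values side by side in the table corresponds to the covering-index case in the abstract. A non-covering index would store only a key and a reference into a separate store.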
Fast data insertion, deletion and searching, which are popular for managing data dynamically in today's storage systems, also affect the system's performance. These criteria depend heavily on how the attributes of the underlying algorithm are handled, because the algorithm determines how much data the system can hold and how much throughput it can achieve. B+ tree-based indexing scales logarithmically with the amount of data and is therefore widely used in distributed file systems. However, the system's scalability depends solely on the order and height of the tree. The proposed system modifies the traditional B+ tree into a power-of-2-based form for data expansion and is designed for an object-based file system.
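The dependence of scalability on order and height can be made concrete. A completely full tree with fanout f and h levels holds up to f^h entries, so the minimum height for n entries is the smallest h with f^h ≥ n. The helper below is hypothetical and treats every node, leaves included, as holding up to f entries. Real trees are somewhat taller because nodes are not always full. The power-of-two fanouts in the example are only an illustration; the abstract does not detail the paper's power-of-2 scheme.

```python
def min_height(n: int, fanout: int) -> int:
    """Smallest number of levels h >= 1 such that fanout**h >= n (full tree)."""
    if n < 1 or fanout < 2:
        raise ValueError("need n >= 1 and fanout >= 2")
    height, capacity = 1, fanout
    while capacity < n:
        capacity *= fanout
        height += 1
    return height
```

For one million entries, a fanout of 2^4 = 16 needs 5 levels, while a fanout of 2^8 = 256 needs only 3.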