2008
Abstract We study the problem of optimal skip placement in an inverted list. Assuming the query distribution to be known in advance, we formally prove that an optimal skip placement can be computed quite efficiently. Our best algorithm runs in time O(n log n), n being the length of the list.
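The abstract does not spell out the placement scheme, so the sketch below only illustrates the setting: a posting list carrying skip pointers that let a membership lookup jump over runs of entries. The √n spacing used here is just the common textbook heuristic, not the optimal placement the paper computes; all names are hypothetical.

```python
# Toy posting list with skip pointers. Illustrative only: skips are placed
# every ceil(sqrt(n)) entries, not by the paper's optimal placement algorithm.
import math

def build_skips(postings):
    """Return (index, docid) skip pointers over a sorted posting list."""
    n = len(postings)
    step = max(1, math.isqrt(n))
    return [(i, postings[i]) for i in range(0, n, step)]

def member(postings, skips, target):
    """Check whether `target` occurs in `postings`, using skips to narrow the scan."""
    lo = 0
    for idx, docid in skips:
        if docid <= target:
            lo = idx          # safe to jump: everything before idx is < target
        else:
            break
    for doc in postings[lo:]:
        if doc == target:
            return True
        if doc > target:
            return False
    return False

postings = [2, 5, 8, 13, 21, 34, 55, 89, 144, 233]
skips = build_skips(postings)
print(member(postings, skips, 55), member(postings, skips, 60))  # True False
```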
Skip lists are a data structure that can be used in place of balanced trees. Skip lists use probabilistic balancing rather than strictly enforced balancing and as a result the algorithms for insertion and deletion in skip lists are much simpler and significantly faster than equivalent algorithms for balanced trees.
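Pugh's structure is short enough to sketch directly. The following minimal Python skip list uses the usual p = 1/2 coin-flip to pick node levels; it is an illustrative reimplementation of the idea, not code from the paper.

```python
import random

class Node:
    def __init__(self, key, level):
        self.key = key
        self.forward = [None] * level   # one forward pointer per level

class SkipList:
    MAX_LEVEL = 16

    def __init__(self, p=0.5):
        self.p = p
        self.head = Node(None, self.MAX_LEVEL)
        self.level = 1

    def _random_level(self):
        lvl = 1
        while random.random() < self.p and lvl < self.MAX_LEVEL:
            lvl += 1
        return lvl

    def search(self, key):
        x = self.head
        for i in reversed(range(self.level)):      # start at the top level
            while x.forward[i] and x.forward[i].key < key:
                x = x.forward[i]                   # move right while keys are smaller
        x = x.forward[0]
        return x is not None and x.key == key

    def insert(self, key):
        update = [self.head] * self.MAX_LEVEL
        x = self.head
        for i in reversed(range(self.level)):
            while x.forward[i] and x.forward[i].key < key:
                x = x.forward[i]
            update[i] = x                          # rightmost node before key on level i
        lvl = self._random_level()
        self.level = max(self.level, lvl)
        node = Node(key, lvl)
        for i in range(lvl):
            node.forward[i] = update[i].forward[i]
            update[i].forward[i] = node

sl = SkipList()
for k in [3, 7, 1, 9, 4]:
    sl.insert(k)
print(sl.search(7), sl.search(5))   # True False
```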
Skip graphs are a kind of distributed data structure based on skip lists. They provide the full functionality of a balanced tree in a distributed system. Skip graphs are mostly used for searching peer-to-peer (p2p) networks. Because they support queries by key ordering, they improve on other search tools that are based only on hash-table functionality. Compared to skip lists and other tree data structures, they are very resilient and can tolerate a large fraction of node failures. Simple and straightforward algorithms can be used to construct a skip graph, insert new nodes into it, search it, and detect and repair errors introduced by node failures.
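As a rough illustration of the structure (not of the distributed algorithms themselves), a skip graph assigns each node a random membership vector and, at level i, links together the nodes whose vectors share the same first i bits; level 0 is a single sorted list of all nodes. The centralized sketch below, with hypothetical names, only builds these level-wise groupings.

```python
import random

def build_skip_graph_levels(keys, max_level=4, seed=0):
    """Assign each node a random membership vector and group nodes, per level i,
    by the first i bits of that vector; each group is kept in sorted key order.
    Centralized sketch of the structure only: real construction, search and
    repair are distributed algorithms run by the nodes themselves."""
    rng = random.Random(seed)
    mvec = {k: tuple(rng.randint(0, 1) for _ in range(max_level)) for k in keys}
    levels = []
    for i in range(max_level + 1):
        groups = {}
        for k in sorted(keys):
            groups.setdefault(mvec[k][:i], []).append(k)
        levels.append(groups)
    return levels

for i, groups in enumerate(build_skip_graph_levels([3, 8, 15, 21, 30, 42])):
    print("level", i, list(groups.values()))
```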
Theoretical Computer Science, 2004
We consider the problem of searching for a given element in a partially ordered set. More precisely, we address the problem of computing efficiently near-optimal search strategies for typical partial orders. We consider two classical models for random partial orders, the random graph model and the uniform model. We shall show that certain simple, fast algorithms are able to produce nearly-optimal search strategies for typical partial orders under the two models of random partial orders that we consider. For instance, our algorithm for the random graph model produces, in linear time, a search strategy that makes O((log n)^{1/2} log log n) more queries than the optimal strategy, for almost all partial orders on n elements. Since we need to make at least lg n = log_2 n queries for any n-element partial order, our result tells us that one may efficiently devise near-optimal search strategies for almost all partial orders in this model (the problem of determining an optimal strategy is NP-hard, as proved recently in [1]).
Theoretical Computer Science, 2006
To any sequence of real numbers (a_n)_{n ≥ 0}, we can associate another sequence (â_s)_{s ≥ 0}, which Knuth calls its binomial transform. This transform is defined through the rule â_s = B_s[a_n] = Σ_{n ≥ 0} (−1)^n (s choose n) a_n.
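Since (s choose n) vanishes for n > s when s is a nonnegative integer, the transform of a finite prefix can be evaluated directly. The helper below is just a small illustration of the definition, not anything from the paper.

```python
from math import comb

def binomial_transform(a):
    """Given a prefix a[0..N-1] of a sequence, return the transformed prefix
    â_s = sum_n (-1)^n * C(s, n) * a_n for s = 0..N-1.
    For nonnegative integer s, C(s, n) = 0 when n > s, so the sum is finite."""
    N = len(a)
    return [sum((-1) ** n * comb(s, n) * a[n] for n in range(s + 1)) for s in range(N)]

# Example: the transform of a_n = 2^n is (1 - 2)^s = (-1)^s.
print(binomial_transform([1, 2, 4, 8]))   # [1, -1, 1, -1]
```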
Lecture Notes in Computer Science, 2011
In large web search engines the performance of Information Retrieval systems is a key issue. Block-based compression methods are often used to improve the search performance, but current self-indexing techniques are not adapted to such data structures and provide suboptimal performance. In this paper, we present SkipBlock, a self-indexing model for block-based inverted lists. Based on a cost model, we show that it is possible to achieve significant improvements in both search performance and the structure's storage space.
Bell System Technical Journal, 1983
In this paper, we introduce two new kinds of biased search trees: biased a, b trees and pseudo-weight-balanced trees. A biased search tree is a data structure for storing a sorted set in which the access time for an item depends on its estimated access frequency in such a way that the average access time is small. Bent, Sleator, and Tarjan were the first to describe classes of biased search trees that are easy to update; such trees have applications not only in efficient table storage but also in various network optimization algorithms. Our biased a, b trees generalize the biased 2, b trees of Bent, Sleator, and Tarjan. They provide a biased generalization of B-trees and are suitable for use in paged external memory, whereas previous kinds of biased trees are suitable for internal memory. Our pseudo-weight-balanced trees are a biased version of weight-balanced trees much simpler than Bent's version. Weight balance is the natural kind of balance to use in designing biased trees; pseudo-weight-balanced trees are especially easy to implement and analyze. I. INTRODUCTION The following problem, which we shall call the dictionary problem, occurs frequently in computer science. Given a totally ordered universe U, we wish to maintain one or more subsets of U under the following operations, where R and S denote any subsets of U and i denotes any item in U: access(i, S): if item i is in S, return a pointer to its location. Otherwise, return a special null pointer.
2011 Second International Conference on Networking and Computing, 2011
In this paper, we introduce a generalization of the distributed sorting problem to a chain network. This problem consists of sorting values held by processes that are separated from each other by any number of intermediate processes which can relay values but do not hold values of their own. We solve this problem on a chain network by proposing a silent self-stabilizing distributed algorithm. This algorithm converges from any initial configuration to a terminal configuration where the values are sorted along the chain in increasing order from left to right.
Proceedings of the 33rd …, 2007
The general problem of answering top-k queries can be modeled using lists of data items sorted by their local scores. The most efficient algorithm proposed so far for answering top-k queries over sorted lists is the Threshold Algorithm (TA). However, TA may still incur a lot of useless accesses to the lists. In this paper, we propose two new algorithms which stop much sooner. First, we propose the best position algorithm (BPA), which executes top-k queries more efficiently than TA. For any database instance (i.e. set of sorted lists), we prove that BPA stops at least as early as TA, and that its execution cost is never higher than that of TA. We show that the position at which BPA stops can be (m-1) times lower than that of TA, where m is the number of lists. We also show that the execution cost of our algorithm can be (m-1) times lower than that of TA. Second, we propose the BPA2 algorithm, which is much more efficient than BPA. We show that the number of accesses to the lists done by BPA2 can be about (m-1) times lower than that of BPA. Our performance evaluation shows that over our test databases, BPA and BPA2 achieve significant performance gains in comparison with TA.
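TA itself is standard: do sorted access on all m lists in parallel, fetch each newly seen item's remaining scores by random access, and stop once the current k-th best aggregate reaches the threshold formed from the last scores seen under sorted access. The sketch below uses sum as the aggregation function and illustrates plain TA only, not the BPA or BPA2 algorithms proposed in the paper.

```python
import heapq

def threshold_algorithm(lists, k):
    """lists: each a list of (item, score) pairs sorted by score descending.
    Returns the k items with the highest sum of scores (missing scores count as 0)."""
    index = [dict(lst) for lst in lists]          # random access: item -> score per list
    best = {}                                     # item -> aggregated score
    for depth in range(max(len(lst) for lst in lists)):
        last_seen = []
        for lst, idx in zip(lists, index):
            if depth >= len(lst):
                continue
            item, score = lst[depth]
            last_seen.append(score)
            if item not in best:
                best[item] = sum(d.get(item, 0.0) for d in index)
        threshold = sum(last_seen)
        top_k = heapq.nlargest(k, best.items(), key=lambda kv: kv[1])
        if len(top_k) == k and top_k[-1][1] >= threshold:
            return top_k                          # k-th best already matches the threshold
    return heapq.nlargest(k, best.items(), key=lambda kv: kv[1])

l1 = [("a", 0.9), ("b", 0.8), ("c", 0.1)]
l2 = [("a", 0.7), ("b", 0.6), ("c", 0.2)]
print(threshold_algorithm([l1, l2], 1))           # [('a', 1.6)], stops after one round
```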
2014
We propose a new way to build a combined list from K base lists, each containing N items. A combined list consists of top segments of various sizes from each base list, so that the total size of all top segments equals N. A sequence of item requests is processed and the goal is to minimize the total number of misses. That is, we seek to build a combined list that contains all the frequently requested items. We first consider the special case of disjoint base lists. There, we design an efficient algorithm that computes the best combined list for a given sequence of requests. In addition, we develop a randomized online algorithm whose expected number of misses is close to that of the best combined list chosen in hindsight. We prove lower bounds that show that the expected number of misses of our randomized algorithm is close to the optimum. In the presence of duplicate items, we show that computing the best combined list is NP-hard. We show that our algorithms still apply to a linearized notion of loss in this case. We expect that this new way of aggregating lists will find many ranking applications.
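For the disjoint case the offline optimum has a simple statement: choose prefix lengths t_1 + … + t_K = N that maximize the number of requests falling inside the chosen prefixes. The brute-force dynamic program below computes that optimum for illustration; it is not the paper's efficient algorithm or its online counterpart, and all names in it are hypothetical.

```python
from collections import Counter

def best_combined_list(base_lists, requests, N):
    """base_lists: K disjoint base lists of items; requests: sequence of requested items.
    Returns the maximum number of hits achievable by a combined list built from top
    segments whose sizes sum to N (brute-force O(K*N^2) DP, for illustration only)."""
    freq = Counter(requests)
    # hits[j][t] = number of requests served by the top-t prefix of list j
    hits = []
    for lst in base_lists:
        h, acc = [0], 0
        for item in lst[:N]:
            acc += freq.get(item, 0)
            h.append(acc)
        hits.append(h)
    NEG = float("-inf")
    dp = [0] + [NEG] * N                 # dp[s] = best hit count using total segment size s
    for h in hits:
        new = [NEG] * (N + 1)
        for s in range(N + 1):
            if dp[s] == NEG:
                continue
            for t in range(len(h)):
                if s + t <= N and dp[s] + h[t] > new[s + t]:
                    new[s + t] = dp[s] + h[t]
        dp = new
    return dp[N]

lists = [["a", "b", "c"], ["x", "y", "z"], ["p", "q", "r"]]
reqs = ["a", "x", "a", "p", "q", "x", "a"]
print(best_combined_list(lists, reqs, N=3))   # 6 hits: keep {a}, {x}, {p}; only "q" misses
```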
2007
In this paper we give a finer separation of several known paging algorithms. This is accomplished using a new technique that we call relative interval analysis. The technique compares the fault rate of two paging algorithms across the entire range of inputs of a given size rather than in the worst case alone. Using this technique we characterize the relative performance of LRU and LRU-2, as well as LRU and FWF, among others. We also show that lookahead is beneficial for a paging algorithm, a fact that is well known in practice but was, until recently, not supported by theory.
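As a toy illustration of comparing fault counts of two algorithms on the same inputs (not of relative interval analysis itself), the snippet below simulates LRU and FWF with a cache of k pages on one request sequence.

```python
from collections import OrderedDict

def lru_faults(requests, k):
    """Faults of LRU with a cache of k pages."""
    cache, faults = OrderedDict(), 0
    for p in requests:
        if p in cache:
            cache.move_to_end(p)
        else:
            faults += 1
            if len(cache) >= k:
                cache.popitem(last=False)        # evict the least recently used page
            cache[p] = True
    return faults

def fwf_faults(requests, k):
    """Faults of Flush-When-Full: empty the whole cache when a fault finds it full."""
    cache, faults = set(), 0
    for p in requests:
        if p not in cache:
            faults += 1
            if len(cache) >= k:
                cache.clear()                    # flush everything
            cache.add(p)
    return faults

seq = [1, 2, 3, 1, 2, 4, 1, 2, 3, 4]
print(lru_faults(seq, 3), fwf_faults(seq, 3))    # 6 8
```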
Symposium on the Theory of Computing, 1991
Workshop on Distributed Data and Structures, 2000
We propose DSL, a new Scalable Distributed Data Structure for the dictionary problem, based on a version of Skip Lists, as an alternative to both random trees and deterministic height-balanced trees. Our scheme exhibits, with high probability, logarithmic search time, constant reconstruction time, and linear space overhead. Additionally, at the expense of two additional pointers per internal node,
Lecture Notes in Computer Science, 2011
We introduce a generalization of the distributed sorting problem to a chain network. Our problem consists of sorting values held by processes that are separated from each other by any number of intermediate processes which can relay values but do not hold values of their own. We solve this problem on a chain network by proposing a silent self-stabilizing distributed algorithm.
Information Processing Letters, 1981
Theoretical Computer Science, 2000
We introduce the concept of presorting algorithms, quantifying and evaluating the performance of such algorithms by the average reduction in the number of inversions. Stages of well-known algorithms such as Shellsort and quicksort are evaluated in this framework and shown to cause a meaningful drop in the inversion statistic. The expected value, variance and generating function for the decrease in the number of inversions are computed. The possibility of "presorting" a sorting algorithm is also investigated under a similar framework.
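The inversion count is the statistic being tracked; the snippet below just measures how one Shellsort gap pass reduces it on a random permutation, as a small illustration rather than the paper's exact-distribution analysis.

```python
import random

def inversions(a):
    """Count pairs i < j with a[i] > a[j] (O(n^2), fine for a small demo)."""
    return sum(a[i] > a[j] for i in range(len(a)) for j in range(i + 1, len(a)))

def gap_insertion_pass(a, gap):
    """One Shellsort pass: insertion-sort each of the `gap` interleaved subsequences."""
    a = a[:]
    for start in range(gap):
        for i in range(start + gap, len(a), gap):
            x, j = a[i], i
            while j >= gap and a[j - gap] > x:
                a[j] = a[j - gap]
                j -= gap
            a[j] = x
    return a

perm = list(range(50))
random.shuffle(perm)
print("inversions before:", inversions(perm),
      "after gap-5 pass:", inversions(gap_insertion_pass(perm, 5)))
```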
Proceedings. Fifth International Conference on High Performance Computing (Cat. No. 98EX238), 1998
We consider the problem of ranking an N element list on a P processor EREW PRAM. Recent work on this problem has shown the importance of grain size. While several optimal O(N/P + log P) time list ranking algorithms are known, Reid-Miller and Blelloch recently showed that these do not lead to good implementations in practice [6, 7], because of the fine-grained nature of these algorithms. In Reid-Miller and Blelloch's experiments the best performance was obtained by an O(N/P + log^2 P) time coarse-grained randomized algorithm devised by them. We build upon their idea and present a coarse-grained randomized algorithm that runs in time O(N/P + log P), and is thus also optimal. Our algorithm simplifies some of the ideas from [6, 7]; these simplifications might be of interest to implementors.
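To illustrate what list ranking computes (rather than the coarse-grained PRAM algorithm of the paper), the sketch below runs textbook Wyllie-style pointer jumping sequentially: in each round, every node adds its successor's rank to its own and then skips over that successor.

```python
def list_rank(successor):
    """Pointer jumping on a linked list given as successor[i] (None at the tail).
    Returns rank[i] = number of links from node i to the tail.
    This is only the textbook illustration of the problem, not the
    coarse-grained O(N/P + log P) algorithm discussed in the abstract."""
    n = len(successor)
    nxt = list(successor)
    rank = [0 if nxt[i] is None else 1 for i in range(n)]
    changed = True
    while changed:
        changed = False
        new_nxt, new_rank = nxt[:], rank[:]
        for i in range(n):                 # "in parallel": read old arrays, write new ones
            if nxt[i] is not None:
                new_rank[i] = rank[i] + rank[nxt[i]]
                new_nxt[i] = nxt[nxt[i]]
                changed = True
        nxt, rank = new_nxt, new_rank
    return rank

# List 0 -> 2 -> 1 -> 3 (tail); ranks are distances to the tail.
print(list_rank([2, 3, 1, None]))   # [3, 1, 2, 0]
```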
Information and Control, 1986
1991
The paging problem is that of deciding which pages to keep in a memory of k pages in order to minimize the number of page faults. We develop the marking algorithm, a randomized on-line algorithm for the paging problem. We prove that its expected cost on any sequence of requests is within a factor of 2H_k of optimum. (Here H_k is the k-th harmonic number, which is roughly ln k.) The best such factor that can be achieved is H_k. This is in contrast to deterministic algorithms, which cannot be guaranteed to be within a factor smaller than k of optimum. An alternative to comparing an on-line algorithm with the optimum off-line algorithm is the idea of comparing it to several other on-line algorithms. We have obtained results along these lines for the paging problem. Given a set of on-line algorithms
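The algorithm itself is a few lines: requests are served in phases; a hit marks the page, and a fault evicts a uniformly random unmarked page, starting a new phase (and clearing all marks) when every cached page is marked. A minimal simulation, assuming a cache of k pages:

```python
import random

def marking_algorithm(requests, k):
    """Simulate the randomized marking algorithm on a request sequence.
    Returns the number of page faults for a cache holding k pages."""
    cache, marked, faults = set(), set(), 0
    for page in requests:
        if page in cache:
            marked.add(page)                      # hit: just mark the page
            continue
        faults += 1
        if len(cache) >= k:
            if not (cache - marked):              # every page is marked: new phase
                marked.clear()
            cache.remove(random.choice(sorted(cache - marked)))  # evict random unmarked page
        cache.add(page)
        marked.add(page)
    return faults

random.seed(0)
print(marking_algorithm([1, 2, 3, 1, 4, 1, 2, 5, 1, 2], k=3))
```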
Lecture Notes in Computer Science, 2011
Discrete & Computational Geometry, 2010
In approximate halfspace range counting, one is given a set P of n points in R^d and an ε > 0, and the goal is to preprocess P into a data structure which can answer efficiently queries of the form: given a halfspace h, compute an estimate of the number of points of P in h, up to a relative error of ε. Several recent papers have addressed this problem, including a study by the authors [18], which is based, as is the present paper, on Cohen's technique for approximate range counting. In this approach, one chooses a small number of random permutations of P, and then constructs, for each permutation π, a data structure that answers efficiently minimum range queries: given a query halfspace h, find the minimum-rank element (according to π) in P ∩ h. By repeating this process for all chosen permutations, the approximate count can be obtained, with high probability, using a certain averaging process over the minimum-rank outputs. In the previous study, the authors have constructed such a data structure in R^3, using a combinatorial result about the overlay of minimization diagrams in a randomized incremental construction of lower envelopes. In the present work, we propose an alternative approach to the range-minimum problem, based on cuttings, which achieves better performance. Specifically, it uses, for each permutation, O(n^⌊d/2⌋ log^(1−⌊d/2⌋) n) expected storage and preprocessing time, and answers a range-minimum query in O(log n) expected time. We also present a different approach, based on so-called "antennas", which is very simple to implement, although the bounds on its expected storage, preprocessing, and query costs are worse by polylogarithmic factors.