Faster Compressed Top-k Document Retrieval

Jeffrey Vitter

Abstract

Let $\mathcal{D} = \{d_1, d_2, \ldots, d_D\}$ be a given collection of $D$ string documents of total length $n$. Our task is to index $\mathcal{D}$ such that, whenever a pattern $P$ (of length $p$) and an integer $k$ arrive as a query, the $k$ documents in which $P$ appears most often can be listed efficiently. In this paper, we propose a compressed index taking $2|CSA| + D \log \frac{n}{D} + O(D) + o(n)$ bits of space, which answers a query in $O(t_{sa} \log k \log^{\epsilon} n)$ time per reported document. This improves on the $O(t_{sa} \log k \log^{1+\epsilon} n)$ time per reported document of the previously best-known index with (asymptotically) the same space requirements [Belazzougui and Navarro, SPIRE 2011]. Here, $|CSA|$ represents the size (in bits) of the compressed suffix array (CSA) of the text obtained by concatenating all documents in $\mathcal{D}$, and $t_{sa}$ is the time for decoding a suffix array value using the CSA.
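
To make the query in the abstract concrete, the following is a minimal sketch of top-k document retrieval over a plain, uncompressed suffix array of the concatenated documents. It is not the compressed index proposed in the paper and has none of its space or time guarantees; it only illustrates what a query returns. The names `build_index` and `top_k`, the `"\x00"` separator, and the tie-breaking rule (lower document number first) are assumptions made for this illustration.

```python
import heapq
from bisect import bisect_left, bisect_right
from collections import Counter


def build_index(docs):
    """Concatenate the documents and build a plain suffix array (toy version)."""
    sep = "\x00"  # assumption: this character occurs in no document or pattern
    text = sep.join(docs) + sep
    # Start offset of each document inside the concatenated text.
    starts, pos = [], 0
    for d in docs:
        starts.append(pos)
        pos += len(d) + 1
    # Naive construction by sorting whole suffixes; only suitable for tiny inputs.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    return text, sa, starts


def top_k(index, pattern, k):
    """Return up to k (document number, occurrence count) pairs, most frequent first."""
    text, sa, starts = index
    p = len(pattern)
    # Length-p prefixes of the suffixes, in suffix array order, are sorted,
    # so the suffixes starting with the pattern form one contiguous range.
    prefixes = [text[i:i + p] for i in sa]
    lo = bisect_left(prefixes, pattern)
    hi = bisect_right(prefixes, pattern)
    # Map each matching text position to its document and count per document.
    counts = Counter(bisect_right(starts, sa[j]) - 1 for j in range(lo, hi))
    # Ties are broken in favour of the lower document number (an arbitrary choice).
    return heapq.nlargest(k, counts.items(), key=lambda kv: (kv[1], -kv[0]))


index = build_index(["banana", "ananas", "bandana"])
print(top_k(index, "ana", 2))
```

Here document numbers are 0-based positions in the input list, and overlapping occurrences of the pattern are counted separately. This sketch rebuilds the prefix list on every query and touches every occurrence of the pattern, which is the per-occurrence cost that indexes for this problem are designed to avoid.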

Related papers

Faster Compact Top-k Document Retrieval

Roberto Konow

An optimal index solving top-k document retrieval [Navarro and Nekrich, SODA'12] takes O(m + k) time for a pattern of length m, but its space is at least 80n bytes for a collection of n symbols. We reduce it to between 1.5n and 3n bytes, with O(m + (k + log log n) log log n) time, on typical texts. The index is up to 25 times faster than the best previous compressed solutions, and requires at most 5% more space in practice (and in some cases as little as one half). Apart from replacing classical by compressed data structures, our main idea is to replace suffix tree sampling by frequency thresholding to achieve compression.
