2002, Proceedings of the 25th …
Compression reduces both the size of indexes and the time needed to evaluate queries. In this paper, we revisit the compression of inverted lists of document postings that store the position and frequency of indexed terms, considering two approaches to improving retrieval efficiency: better implementation and better choice of integer compression schemes. First, we propose several simple optimisations to well-known integer compression schemes, and show experimentally that these lead to significant reductions in time. Second, we explore the impact of choice of compression scheme on retrieval efficiency.
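The pipeline the abstract refers to — storing postings as d-gaps and coding each gap with a variable-length integer code — can be sketched as follows. This is a generic variable-byte coder, one of the well-known scheme families such papers optimise, not the authors' tuned implementation:

```python
def vbyte_encode(gaps):
    """Encode positive integers as variable-byte codes:
    7 payload bits per byte, high bit set on the final byte."""
    out = bytearray()
    for g in gaps:
        while g >= 128:
            out.append(g & 0x7F)
            g >>= 7
        out.append(g | 0x80)
    return bytes(out)

def vbyte_decode(data):
    """Decode a variable-byte stream back into the original integers."""
    nums, cur, shift = [], 0, 0
    for b in data:
        cur |= (b & 0x7F) << shift
        if b & 0x80:            # high bit marks the last byte of a code
            nums.append(cur)
            cur, shift = 0, 0
        else:
            shift += 7
    return nums

# Postings are stored as d-gaps (differences of successive docIDs).
docids = [3, 7, 8, 150, 1000]
gaps = [docids[0]] + [b - a for a, b in zip(docids, docids[1:])]
assert vbyte_decode(vbyte_encode(gaps)) == gaps
```

Small gaps fit in one byte, which is why docID reordering techniques that shrink gaps (discussed in later entries) interact so strongly with the choice of code.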
Information Processing & Management, 2018
Text search engines are a fundamental tool nowadays. Their efficiency relies on a popular and simple data structure: inverted indexes. They store an inverted list per term of the vocabulary. The inverted list of a given term stores, among other things, the document identifiers (docIDs) of the documents that contain the term. Currently, inverted indexes can be stored efficiently using integer compression schemes. Previous research has also studied how an optimized document ordering can be used to assign docIDs to the document database. This yields important improvements in index compression and query processing time. In this paper we show that a hybrid compression approach on the inverted lists is more effective in this scenario, with two main contributions:
• First, we introduce a document reordering approach that aims at generating runs of consecutive docIDs in a properly selected subset of the inverted lists of the index.
• Second, we introduce hybrid compression approaches that combine gap and run-length encodings within inverted lists, in order to take advantage not only of small gaps, but also of long runs of consecutive docIDs generated by our document reordering approach.
Our experimental results indicate a reduction of about 10%-30% in the space usage of the whole index (regarding docIDs only), compared with the most efficient state-of-the-art results. Also, decompression speed is up to 1.22 times faster if the runs of consecutive docIDs must be explicitly decompressed, and up to 4.58 times faster if implicit decompression of these runs is allowed (e.g., representing the runs as intervals in the output). Finally, we also improve the query processing time of AND queries (by up to 12%), WAND queries (by up to 23%), and full (non-ranked) OR queries (by up to 86%), outperforming the best existing approaches.
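A minimal sketch of the gap-plus-run-length idea: maximal runs of consecutive docIDs become (gap, run-length) pairs. The pair representation is a simplification for illustration, not the paper's actual on-disk format:

```python
def hybrid_encode(docids):
    """Encode a sorted docID list as (gap_to_run_start, run_length)
    pairs, where a run is a maximal block of consecutive docIDs."""
    out, prev, i = [], 0, 0
    while i < len(docids):
        j = i
        while j + 1 < len(docids) and docids[j + 1] == docids[j] + 1:
            j += 1                      # extend the run of consecutive IDs
        out.append((docids[i] - prev, j - i + 1))
        prev = docids[j]
        i = j + 1
    return out

def hybrid_decode(pairs):
    """Expand (gap, run) pairs back into the original docID list."""
    ids, prev = [], 0
    for gap, run in pairs:
        start = prev + gap
        ids.extend(range(start, start + run))
        prev = start + run - 1
    return ids

docids = [4, 5, 6, 7, 20, 33, 34, 35]
pairs = hybrid_encode(docids)   # [(4, 4), (13, 1), (13, 3)]
assert hybrid_decode(pairs) == docids
```

Note that `hybrid_decode` could instead yield the (start, run) intervals directly, which corresponds to the "implicit decompression" case where the speedup reported above is largest.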
2012
Finding desired information in a large data set is a difficult problem. Information retrieval is concerned with the structure, analysis, organization, storage, searching, and retrieval of information. The index is the main constituent of an IR system. Nowadays, the exponential growth of information makes index structures so large that they affect the quality of IR systems. Compressing the index structure is therefore the main contribution of this paper. We compress the document numbers in inverted file entries using a new coding technique based on run-length encoding. Our coding mechanism uses a dedicated code that operates on top of run-length coding. Our experiments show that, on average, our mechanism compresses 67.34% better than the other techniques.
Proceedings of the 23rd International Conference on Database and Expert Systems Applications, 2012
To sustain the tremendous workloads they face on a daily basis, Web search engines employ highly compressed data structures known as inverted indexes. Previous works demonstrated that organizing the inverted lists of the index in individual blocks of postings leads to significant efficiency improvements. Moreover, the recent literature has shown that current state-of-the-art compression strategies such as PForDelta and VSEncoding perform well when used to encode the lists' docIDs. In this paper we examine their performance when used to compress positional values. We expose their drawbacks and introduce PFBC, a simple yet efficient encoding scheme which encodes the positional data of an inverted list block using a fixed number of bits. PFBC allows direct access to the required data by avoiding costly look-ups and unnecessary information decoding, achieving positional decompression several times faster than state-of-the-art approaches.
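The fixed-bit idea behind a scheme like PFBC can be illustrated as follows: pack a block of positions at one bit width (the minimum that fits the largest value), so the i-th value can be extracted directly without decoding its predecessors. The function names and the single-integer packing are illustrative only, not the paper's implementation:

```python
def fixed_bit_pack(values):
    """Pack a block of non-negative integers at one fixed bit width.
    Returns (width, packed); real codecs write the payload as bytes."""
    width = max(v.bit_length() for v in values) or 1
    packed = 0
    for i, v in enumerate(values):
        packed |= v << (i * width)
    return width, packed

def fixed_bit_get(width, packed, i):
    """Random access to the i-th value: shift and mask, no sequential
    decoding of earlier entries required."""
    return (packed >> (i * width)) & ((1 << width) - 1)

positions = [5, 17, 42, 3]
width, packed = fixed_bit_pack(positions)   # width = 6 bits (fits 42)
assert [fixed_bit_get(width, packed, i) for i in range(4)] == positions
```

The direct-access property is exactly what variable-length codes such as PForDelta give up for positions, which is the drawback the abstract alludes to.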
Computing Research Repository, 2007
A compressed full-text self-index represents a text in a compressed form and still answers queries efficiently. This technology represents a breakthrough over the text indexing techniques of the previous decade, whose indexes required several times the size of the text. Although it is relatively new, this technology has matured up to a point where theoretical research is giving way to practical developments. Nonetheless this requires significant programming skills, a deep engineering effort, and a strong algorithmic background to dig into the research results. To date only isolated implementations and focused comparisons of compressed indexes have been reported, and they missed a common API, which prevented their re-use or deployment within other applications.
2010
The field of compressed data structures seeks to achieve fast search time, but using a compressed representation, ideally requiring less space than that occupied by the original input data. The challenge is to construct a compressed representation that provides the same functionality and speed as traditional data structures. In this invited presentation, we discuss some breakthroughs in compressed data structures over the course of the last decade that have significantly reduced the space requirements for fast text and document indexing. One interesting consequence is that, for the first time, we can construct data structures for text indexing that are competitive in time and space with the well-known technique of inverted indexes, but that provide more general search capabilities. Several challenges remain, and we focus in this presentation on two in particular: building I/O-efficient search structures when the input data are so massive that external memory must be used, and incorporating notions of relevance in the reporting of query answers.
Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval, 2013
Text search engines are a fundamental tool nowadays. Their efficiency relies on a popular and simple data structure: inverted indexes. Currently, inverted indexes can be represented very efficiently using index compression schemes. Recent investigations also study how an optimized document ordering can be used to assign document identifiers (docIDs) to the document database. This yields important improvements in index compression and query processing time. In this paper we follow this line of research, yet from a different perspective. We propose a docID reassignment method that allows one to focus on a given subset of inverted lists to improve their performance. We then use run-length encoding to compress these lists (as many consecutive 1s are generated). We show that this approach improves not only the performance of the particular subset of inverted lists, but also that of the whole inverted index. Our experimental results indicate a reduction of about 10% in the space usage of the whole index (regarding docIDs only), and up to 30% if we regard only the particular subset of lists on which the docID reassignment was focused. Also, decompression speed is up to 1.22 times faster if the runs must be explicitly decompressed, and up to 4.58 times faster if implicit decompression of runs is allowed. Finally, we also improve the Document-at-a-Time query processing time of AND queries (by up to 12%), WAND queries (by up to 23%), and full (non-ranked) OR queries (by up to 86%).
Journal of Experimental Algorithmics, 2009
A compressed full-text self-index represents a text in a compressed form and still answers queries efficiently. This represents a significant advancement over the (full-)text indexing techniques of the previous decade, whose indexes required several times the size of the text. Although it is relatively new, this algorithmic technology has matured up to a point where theoretical research is giving way to practical developments. Nonetheless this requires significant programming skills, a deep engineering effort, and a strong algorithmic background to dig into the research results. To date only isolated implementations and focused comparisons of compressed indexes have been reported, and they missed a common API, which prevented their re-use or deployment within other applications. The goal of this article is to fill this gap. First, we present the existing implementations of compressed indexes from a practitioner's point of view. Second, we introduce the Pizza&Chili site, which of...
Computer, 2000
In this article we discuss recent methods for compressing the text and the index of text retrieval systems. By compressing both the complete text and the index, the total amount of space is less than half the size of the original text alone. Most surprisingly, the time required to build the index and also to answer a query is much less than if the index and text had not been compressed. This is one of the few cases where there is no space-time trade-off. Moreover, the text can be kept compressed all the time, allowing updates when changes occur in the compressed text.
Citeseer
Any sorting algorithm in the comparison model defines an encoding scheme for permutations. As adaptive sorting algorithms perform o(n lg n) comparisons on restricted classes of permutations, each defines one or more compression schemes for permutations. In the case of the compression schemes inspired by Adaptive Merge Sort, a small amount of additional data makes it possible to support, in good time, access and reversed access to the compressed permutation without decompressing it. In this paper we explore the application of two of these compressed succinct data structures to the encoding of inverted lists and of suffix arrays, and show experimentally that they yield a practical self-index on practical data sets, from natural language to biological data.
… of the 2004 ACM symposium on …, 2004
Let D = {d_1, d_2, ..., d_D} be a given collection of D string documents of total length n. Our task is to index D such that, whenever a pattern P (of length p) and an integer k come as a query, the k documents in which P appears the largest number of times can be listed efficiently. In this paper, we propose a compressed index taking 2|CSA| + D log(n/D) + O(D) + o(n) bits of space, which answers a query with O(t_SA log k log n) time per reported document. This improves the O(t_SA log k log^{1+ε} n) per-document report time of the previously best-known index with (asymptotically) the same space requirements [Belazzougui and Navarro, SPIRE 2011]. Here, |CSA| represents the size (in bits) of the compressed suffix array (CSA) of the text obtained by concatenating all documents in D, and t_SA is the time for decoding a suffix-array value using the CSA.
Journal of the ACM, 2005
We design two compressed data structures for the full-text indexing problem that support efficient substring searches using roughly the space required for storing the text in compressed form.
In this paper, we propose a new web search engine model based on index-query bit-level compression. The model incorporates two bit-level compression layers, both implemented at the back-end processor (server) side: one layer resides after the indexer, acting as a second compression layer to generate a double-compressed index, and the second resides after the query parser, compressing queries to enable bit-level compressed index-query search. This contributes to reducing the size of the index file as well as disk I/O overheads, and consequently yields a higher retrieval rate and performance. The data compression scheme used in this model is the adaptive character wordlength (ACW(n,s)) scheme, which is an asymmetric, lossless, bit-level scheme that permits compressed index-query search. Results investigating the performance of the ACW(n,s) scheme are presented and discussed.
Pattern matching on text data has been a fundamental field of Computer Science for nearly 40 years. Databases supporting full-text indexing functionality on text data are now widely used by biologists. In the theoretical literature, the most popular internal-memory index structures are the suffix trees and the suffix arrays, and the most popular external-memory index structure is the string B-tree. However, the practical applicability of these indexes has been limited mainly because of their space consumption and I/O issues. These structures use a lot more space (almost 20 to 50 times more) than the original text data and are often disk-resident. Ferragina and Manzini (2005) and Grossi and Vitter (2005) gave the first compressed text indexes with efficient query times in the internal-memory model. Recently, Chien et al. (2008) presented a compact text index in the external memory based on the concept of the Geometric Burrows-Wheeler Transform. They also presented lower bounds which suggested that it may be hard to obtain a good index structure in the external memory. In this paper, we investigate this issue from a practical point of view. On the positive side, we show an external-memory text indexing structure (based on R-trees and KD-trees) that saves space by about an order of magnitude compared to the standard String B-tree. While saving space, these structures also maintain I/O efficiency comparable to that of the String B-tree. We also show various space vs. I/O efficiency trade-offs for our structures.
SEBD, 2004
Granting efficient access to the index is a key issue for the performance of Web Search Engines (WSEs). In order to enhance memory utilization and favor fast query resolution, WSEs use Inverted File (IF) indexes where the posting lists are stored as sequences of d-gaps (i.e., differences between successive document identifiers) compressed using variable-length encoding methods. This paper describes the use of a lightweight clustering algorithm aimed at assigning identifiers to documents in a way that minimizes the average value of the d-gaps. The simulations performed on a real dataset, i.e. the Google contest collection, show that our approach yields an IF index which is, depending on the d-gap encoding chosen, up to 23% smaller than one built over randomly assigned document identifiers. Moreover, we show, both analytically and empirically, that the complexity of our algorithm is linear in space and time.
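The effect that docID assignment has on d-gaps can be illustrated with a toy measurement. The index and the remapping below are hypothetical, and the clustering algorithm itself is not reproduced; the point is only that grouping similar documents under nearby identifiers shrinks the average gap:

```python
def average_d_gap(index):
    """Average d-gap over all posting lists of a term -> docID-set index;
    smaller gaps compress better under variable-length codes."""
    total, count = 0, 0
    for postings in index.values():
        prev = 0
        for d in sorted(postings):
            total += d - prev
            prev = d
            count += 1
    return total / count

# Toy index: term -> set of docIDs under the original assignment.
index = {"web": {1, 5, 9}, "search": {2, 6, 10}, "engine": {3, 7, 11}}

# A hypothetical cluster-friendly permutation: documents sharing a term
# receive consecutive identifiers.
remap = {1: 1, 5: 2, 9: 3, 2: 4, 6: 5, 10: 6, 3: 7, 7: 8, 11: 9}
clustered = {t: {remap[d] for d in ds} for t, ds in index.items()}

assert average_d_gap(clustered) < average_d_gap(index)
```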
Proceedings of the 34th international ACM SIGIR conference on Research and development in Information - SIGIR '11, 2011
Inverted indexes are the most fundamental and widely used data structures in information retrieval. For each unique word occurring in a document collection, the inverted index stores a list of the documents in which this word occurs. Compression techniques are often applied to further reduce the space requirement of these lists. However, the index has a shortcoming, in that only predefined pattern queries can be supported efficiently. In the case of string documents where word boundaries are undefined, if we have to index all the substrings of a given document, then the storage quickly becomes quadratic in the data size. Also, if we want to apply the same type of index for querying phrases or sequences of words, then the inverted index will end up storing redundant information. In this paper, we show the first set of inverted indexes which work naturally for strings as well as phrase searching. The central idea is to exclude document d from the inverted list of a string P if every occurrence of P in d is subsumed by another string of which P is a prefix. With this we show that our space utilization is close to optimal. Techniques from succinct data structures are deployed to achieve compression while allowing fast access in terms of frequency- and document-ID-based retrieval. Compression and speed trade-offs are evaluated for different variants of the proposed index. For phrase searching, we show that our indexes compare favorably against a typical inverted index deploying position-wise intersections. We also show efficient top-k retrieval under relevance metrics like frequency and tf-idf.
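The basic structure described in the opening sentences can be sketched in a few lines; this is the textbook word-level index with a conjunctive query by list intersection, not the compressed string index the paper proposes:

```python
def build_inverted_index(docs):
    """Map each unique word to the sorted list of docIDs containing it."""
    index = {}
    for doc_id, text in enumerate(docs):
        for word in set(text.split()):
            index.setdefault(word, []).append(doc_id)
    return index  # docIDs are appended in increasing order, so lists stay sorted

docs = ["fast text search", "compressed text index", "fast index"]
index = build_inverted_index(docs)
assert index["text"] == [0, 1]
# AND query = intersection of the terms' posting lists.
assert sorted(set(index["fast"]) & set(index["index"])) == [2]
```

The paper's shortcoming argument starts exactly here: this structure only answers queries over the predefined word tokens, so indexing all substrings or phrases this way would blow up the space.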
Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining
Dictionary-based compression schemes provide fast decoding operation, typically at the expense of reduced compression effectiveness compared to statistical or probability-based approaches. In this work, we apply dictionary-based techniques to the compression of inverted lists, showing that the high degree of regularity that these integer sequences exhibit is a good match for certain types of dictionary methods, and that an important new trade-off balance between compression effectiveness and compression efficiency can be achieved. Our observations are supported by experiments using the document-level inverted index data for two large text collections, and a wide range of other index compression implementations as reference points. Those experiments demonstrate that the gap between efficiency and effectiveness can be substantially narrowed.
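A toy illustration of dictionary-based compression applied to gap sequences: frequent fixed-width gap patterns become dictionary entries, and lists are rewritten as references plus literals. Real schemes use far richer dictionaries and bit-level references; everything below (pattern width, dictionary size, symbol format) is an assumption for illustration:

```python
from collections import Counter

def build_dictionary(gap_lists, width=2, size=4):
    """Collect the most frequent fixed-width gap patterns across all
    lists; these become the dictionary entries."""
    counts = Counter()
    for gaps in gap_lists:
        for i in range(0, len(gaps) - width + 1, width):
            counts[tuple(gaps[i:i + width])] += 1
    return [p for p, _ in counts.most_common(size)]

def dict_encode(gaps, dictionary, width=2):
    """Replace known patterns with ('D', index) references; emit
    remaining gaps as ('L', gap) literals."""
    out, i = [], 0
    while i < len(gaps):
        pat = tuple(gaps[i:i + width])
        if pat in dictionary:
            out.append(("D", dictionary.index(pat)))
            i += width
        else:
            out.append(("L", gaps[i]))
            i += 1
    return out

def dict_decode(symbols, dictionary):
    """Expand references through the dictionary; pass literals through."""
    gaps = []
    for kind, v in symbols:
        gaps.extend(dictionary[v] if kind == "D" else [v])
    return gaps

lists = [[1, 1, 1, 1, 7], [1, 1, 2, 3]]
d = build_dictionary(lists)          # (1, 1) is the most frequent pattern
for gaps in lists:
    assert dict_decode(dict_encode(gaps, d), d) == gaps
```

Decoding is a table look-up per symbol, which is the fast-decode property the abstract highlights for dictionary-based methods.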
Lecture Notes in Computer Science, 2010
We prove that a document collection, represented as a unique sequence T of n terms over a vocabulary Σ, can be represented in nH_0(T) + o(n)(H_0(T) + 1) bits of space, such that a conjunctive query t_1 ∧ ⋯ ∧ t_k can be answered in O(kδ log log |Σ|) adaptive time, where δ is the instance difficulty of the query, as defined by Barbay and Kenyon in their SODA'02 paper, and H_0(T) is the empirical entropy of order 0 of T. As a comparison, using an inverted index plus the adaptive intersection algorithm by Barbay and Kenyon takes O(kδ log(n_M/δ)) time, where n_m and n_M are the lengths of the shortest and longest occurrence lists, respectively, among those of the query terms. Thus, we can replace an inverted index by a more space-efficient in-memory encoding, outperforming the query performance of inverted indexes when the ratio n_M/δ is ω(log |Σ|).
The self-index is a kind of highly compressed, self-contained full-text index. It is designed for indexing plain texts in order to reduce permanent storage as well as to enhance searching performance. Beyond being a sequence of characters, text usually has specific internal structure. The data record, as a basic model of structured data, is therefore employed to represent and organize this widespread form of data. In this paper, we design and implement an approach to building the self-index for data records via a text medium. Our approach indexes the data records through an intermediate text which accommodates aligned record fields by stuffing delimiters among them. By theoretical analysis, we give worst-case upper bounds on the permanent space of our approach. In addition, we report a series of experimental results to validate the correctness and efficiency of the proposed approach.
Lecture Notes in Computer Science, 2013
Document listing is the problem of preprocessing a set of sequences, called documents, so that later, given a short string called the pattern, we retrieve the documents where the pattern appears. While optimal-time and linear-space solutions exist, the current emphasis is on reducing the space requirements. Current document listing solutions build on compressed suffix arrays. This paper is the first attempt to solve the problem using a Lempel-Ziv compressed index of the text collection. We show that the resulting solution is very fast at outputting most of the resulting documents, taking more time for the final ones. This makes the index particularly useful for interactive scenarios or when listing some documents is sufficient. Yet, it also offers a competitive space/time trade-off when returning the full answers.