Joins on encoded and partitioned data

Jae-Gil Lee; Gopi Attaluri; Ronald Barber; Naresh Chainani; Oliver Draese; Frederick Ho; Stratos Idreos; Min-Soo Kim; Sam Lightstone; Guy Lohman; Konstantinos Morfonios; Keshava Murthy; Ippokratis Pandis; Lin Qiao; Vijayshankar Raman; Vincent Kulandai Samy; Richard Sidle; Knut Stolze; Liping Zhang

2014, Proceedings of the VLDB Endowment

Abstract

Compression has historically been used to reduce the cost of storage, I/Os from that storage, and buffer pool utilization, at the expense of the CPU required to decompress data every time it is queried. However, significant additional CPU efficiencies can be achieved by deferring decompression as late in query processing as possible and performing query processing operations directly on the still-compressed data. In this paper, we investigate the benefits and challenges of performing joins on compressed (or encoded) data. We demonstrate the benefit of independently optimizing the compression scheme of each join column, even though join predicates relating values from multiple columns may require translation of the encoding of one join column into the encoding of the other. We also show the benefit of compressing "payload" data other than the join columns "on the fly," to minimize the size of hash tables used in the join. By partitioning the domain of each column ...
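The idea of joining two columns that were encoded independently can be illustrated with a small sketch. This is not the paper's implementation: it assumes plain dictionary encoding (each distinct value mapped to a dense integer code), and the helper names `build_dictionary`, `translation_table`, and `join_on_codes` are hypothetical. The probe side's codes are translated into the build side's encoding, so the join itself compares only integer codes and never decodes the values.

```python
def build_dictionary(values):
    """Assign a dense integer code to each distinct value (sorted for determinism)."""
    return {v: code for code, v in enumerate(sorted(set(values)))}


def encode(values, dictionary):
    """Replace each value by its code under the given dictionary."""
    return [dictionary[v] for v in values]


def translation_table(src_dict, dst_dict):
    """Map codes of the source encoding to codes of the destination encoding.

    Values absent from the destination dictionary get no entry, so rows
    carrying them can be dropped without ever being decoded.
    """
    return {code: dst_dict[v] for v, code in src_dict.items() if v in dst_dict}


def join_on_codes(build_codes, probe_codes, probe_to_build):
    """Hash join on codes; returns (build_row, probe_row) index pairs."""
    table = {}
    for row, code in enumerate(build_codes):
        table.setdefault(code, []).append(row)

    result = []
    for row, code in enumerate(probe_codes):
        translated = probe_to_build.get(code)
        if translated is None:
            continue  # value does not occur on the build side
        for build_row in table.get(translated, []):
            result.append((build_row, row))
    return result


# Hypothetical example data: each column gets its own dictionary.
dim_keys = ["DE", "FR", "US"]
fact_keys = ["US", "BR", "DE", "US"]

dim_dict = build_dictionary(dim_keys)
fact_dict = build_dictionary(fact_keys)

dim_codes = encode(dim_keys, dim_dict)
fact_codes = encode(fact_keys, fact_dict)

fact_to_dim = translation_table(fact_dict, dim_dict)
pairs = join_on_codes(dim_codes, fact_codes, fact_to_dim)
```

The translation table is built once per pair of dictionaries, so its cost depends on the number of distinct values rather than the number of rows.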

Zhiyuan Chen

ACM SIGMOD Record, 2001

Over the last decades, improvements in CPU speed have outpaced improvements in main memory and disk access rates by orders of magnitude, enabling the use of data compression techniques to improve the performance of database systems. Previous work describes the benefits of compression for numerical attributes, where data is stored in compressed format on disk. Despite the abundance of string-valued attributes in relational schemas, there is little work on compression for string attributes in a database context. Moreover, none of the previous work suitably addresses the role of the query optimizer: during query execution, data is either eagerly decompressed when it is read into main memory, or data lazily stays compressed in main memory and is decompressed on demand only. In this paper, we present an effective approach for database compression based on lightweight, attribute-level compression techniques. We propose a Hierarchical Dictionary Encoding strategy that intelligently selects ...
