2005
Compression is a well-known technique used by many database management systems ("DBMSs") to increase performance [4, 5, 14]. However, not much research has been done into how compression can be used within column-oriented architectures. Storing data in columns increases the similarity between adjacent records, thus increasing the compressibility of the data. In addition, compression schemes not traditionally used in row-oriented DBMSs can be applied to column-oriented systems. This thesis presents a column-oriented query executor designed to operate directly on compressed data. We show that operating directly on compressed data can improve query performance. Additionally, the choice of compression scheme depends on the expected query workload, suggesting that for ad-hoc queries we may wish to store a column redundantly under different coding schemes. Furthermore, the executor is designed to be extensible so that the addition of new compression schemes does not impact operator implementation. The executor is part of a larger database system, known as CStore [10].
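To make the idea of operating directly on compressed data concrete, here is a minimal, hedged sketch (not the CStore executor itself; the helper names are invented for illustration): a SUM aggregate evaluated on a run-length-encoded column without decompressing it.

```python
# Illustrative sketch (not the CStore executor): an aggregate operator that
# works directly on a run-length-encoded (RLE) column. Each run is a
# (value, run_length) pair, so SUM never needs to decompress the column.

from typing import Iterable, List, Tuple

Run = Tuple[int, int]  # (value, run_length)

def rle_encode(values: Iterable[int]) -> List[Run]:
    """Compress a repetitive column into (value, length) runs."""
    runs: List[Run] = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1] = (v, runs[-1][1] + 1)
        else:
            runs.append((v, 1))
    return runs

def sum_compressed(runs: Iterable[Run]) -> int:
    """SUM evaluated on the compressed representation: value * run_length."""
    return sum(value * length for value, length in runs)

column = [3, 3, 3, 7, 7, 9]   # a column with adjacent duplicates
runs = rle_encode(column)      # [(3, 3), (7, 2), (9, 1)]
assert sum_compressed(runs) == sum(column)
```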
Database Engineering …, 1997
This paper addresses the question of how information-theoretically derived compact representations can be applied in practice to improve storage and processing efficiency in DBMS. Compact data representation has the potential for savings in storage, access and processing costs throughout the systems architecture and may alter the balance of usage between disk and solid-state storage. To realise the potential performance benefits, however, novel systems engineering must be adopted to ensure that compression/decompression overheads are limited. This paper describes a basic approach to storage and processing of relations in a highly compressed form. A vertical, column-wise representation is adopted in which columns can dynamically vary incrementally in both length and width. To achieve good performance, query processing is carried out directly on the compressed relational representation using a compressed representation of the query, thus avoiding decompression overheads. Measurements of performance of the Hibase prototype implementation are compared with those obtained from conventional DBMS.
International Journal of Advanced Computer Science and Applications
Compression of data in traditional relational database management systems significantly improves system performance by decreasing the size of the data, which results in less data transfer time within the communication environment and higher efficiency in I/O operations. Column-oriented database management systems should perform even better, since each attribute is stored in a separate column, so that its sequential values are stored and accessed sequentially on the disk. That further increases compression efficiency, as the entire column is compressed/decompressed at once. The aim of this research is to determine whether data compression could improve the performance of HBase, running on a small-sized Hadoop cluster consisting of one name node and nine data nodes. The test scenario includes performing Insert and Select queries on multiple records with and without data compression. Four data compression algorithms are tested, since they are natively supported by HBase: SNAPPY, LZO, LZ4 and GZ. Results show that data compression in HBase highly improves system performance in terms of storage saving. It shrinks data 5 to 10 times (depending on the algorithm) without any noticeable additional CPU load. That allows smaller but significantly faster SSD disks to be used as the cluster's primary data storage. Furthermore, the substantial decrease in network traffic is an additional benefit with a major impact on big data processing.
ACM SIGMOD Record, 2001
Over the last decades, improvements in CPU speed have outpaced improvements in main memory and disk access rates by orders of magnitude, enabling the use of data compression techniques to improve the performance of database systems. Previous work describes the benefits of compression for numerical attributes, where data is stored in compressed format on disk. Despite the abundance of string-valued attributes in relational schemas, there is little work on compression for string attributes in a database context. Moreover, none of the previous work suitably addresses the role of the query optimizer: during query execution, data is either eagerly decompressed when it is read into main memory, or data lazily stays compressed in main memory and is decompressed on demand only. In this paper, we present an effective approach for database compression based on lightweight, attribute-level compression techniques. We propose a Hierarchical Dictionary Encoding strategy that intelligently selects ...
ACM SIGMOD Record, 2000
In this paper, we show how compression can be integrated into a relational database system. Specifically, we describe how the storage manager, the query execution engine, and the query optimizer of a database system can be extended to deal with compressed data. Our main result is that compression can significantly improve the response time of queries if very light-weight compression techniques are used. We present such light-weight compression techniques and give the results of running the TPC-D benchmark on a compressed and a non-compressed database using the AODB database system, an experimental database system that was developed at the Universities of Mannheim and Passau. Our benchmark results demonstrate that compression indeed offers high performance gains (up to 50%) for I/O-intensive queries and moderate gains for CPU-intensive queries. Compression can, however, also increase the running time of certain update operations. In all, we recommend extending today's database systems with lightweight compression techniques and making extensive use of this feature.
arXiv (Cornell University), 2020
In this paper, we present MorphStore, an open-source in-memory columnar analytical query engine with a novel holistic compression-enabled processing model. Basically, compression using lightweight integer compression algorithms already plays an important role in existing in-memory column-store database systems, but mainly for base data. In particular, during query processing, these systems only keep the data compressed until an operator cannot process the compressed data directly, whereupon the data is decompressed, but not recompressed. Thus, the full potential of compression during query processing is not exploited. To overcome that, we developed a novel compression-enabled processing model as presented in this paper. As we are going to show, the continuous usage of compression for all base data and all intermediates is very beneficial to reduce the overall memory footprint as well as to improve the query performance.
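The following is a small illustrative sketch, not MorphStore's actual operator interface, of what it means for an operator to consume and produce compressed intermediates: a selection over an RLE-encoded column that emits its result as an RLE-encoded 0/1 column, so the next operator again receives compressed input.

```python
# Minimal sketch of keeping intermediates compressed (assumed interfaces;
# not MorphStore's actual operators): a selection on an RLE column that
# emits its result as an RLE-compressed bitmap.

def select_gt(runs, threshold):
    """Evaluate value > threshold on RLE input; return an RLE 0/1 bitmap."""
    out = []
    for value, length in runs:
        bit = 1 if value > threshold else 0
        if out and out[-1][0] == bit:
            out[-1] = (bit, out[-1][1] + length)   # extend the previous run
        else:
            out.append((bit, length))
    return out

print(select_gt([(3, 3), (7, 2), (9, 1)], 5))      # [(0, 3), (1, 3)]
```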
Proceedings of 16th International Conference on Data Engineering (Cat. No.00CB37073)
Decision-support applications in emerging environments require that SQL query results or intermediate results be shipped to clients for further analysis and presentation. These clients may use low bandwidth connections or have severe storage restrictions. Consequently, there is a need to compress the results of a query for efficient transfer and client-side access. This paper explores a variety of techniques that address this issue. Instead of using a fixed method, we choose a combination of compression methods that use statistical and semantic information of the query results to enhance the effect of compression. To represent such a combination, we present a framework of "compression plans" formed by composing primitive compression operators. We also present optimization algorithms that enumerate valid compression plans and choose an optimal plan. Our experiments show that our techniques achieve significant performance improvement over standard compression tools like WinZip.
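As a hedged illustration of the "compression plan" idea (the primitive operators and interfaces below are invented for the example; the paper's actual primitives and plan enumeration are more elaborate), a plan can be modelled as a composition of per-column compression operators:

```python
# Hypothetical sketch: a compression "plan" as a sequence of primitive
# compression operators applied to one column of a query result.

def dict_encode(column):
    """Replace each value with a small integer code plus a dictionary."""
    dictionary = {v: i for i, v in enumerate(sorted(set(column)))}
    return [dictionary[v] for v in column], dictionary

def delta_encode(codes):
    """Store differences between consecutive codes (often smaller numbers)."""
    return [codes[0]] + [b - a for a, b in zip(codes, codes[1:])]

def apply_plan(column, plan):
    """Apply a sequence of primitive compression operators to one column."""
    data, meta = column, []
    for op in plan:
        result = op(data)
        data, extra = result if isinstance(result, tuple) else (result, None)
        meta.append(extra)
    return data, meta

compressed, metadata = apply_plan(["NY", "NY", "CA", "TX", "TX"],
                                  [dict_encode, delta_encode])
```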
2012
Through this study, we propose two algorithms. The first algorithm describes the concept of compressing domains at the attribute level, and we call it "Attribute Domain Compression". This algorithm can be implemented on both row and columnar databases. The idea behind the algorithm is to reduce the size of large databases so as to store them optimally. The second algorithm is also applicable to both kinds of databases but works best for columnar databases. The idea behind the algorithm is to generalize the tuple domains by assigning a value, say n, such that all of the other n-1 tuples, or at least as many as possible, can be identified.
Proceedings of the 25th …, 2002
Compression reduces both the size of indexes and the time needed to evaluate queries. In this paper, we revisit the compression of inverted lists of document postings that store the position and frequency of indexed terms, considering two approaches to improving retrieval efficiency: better implementation and better choice of integer compression schemes. First, we propose several simple optimisations to well-known integer compression schemes, and show experimentally that these lead to significant reductions in time. Second, we explore the impact of choice of compression scheme on retrieval efficiency.
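One widely used integer compression scheme in this line of work is variable-byte coding; the sketch below is a textbook version for illustration, not the authors' optimised implementation, and uses the convention that the high bit marks the last byte of each number.

```python
# Textbook variable-byte (v-byte) coding: each integer is split into 7-bit
# chunks; the high bit of the final byte terminates the number.

def vbyte_encode(numbers):
    out = bytearray()
    for n in numbers:
        chunk = []
        while True:
            chunk.append(n & 0x7F)
            n >>= 7
            if n == 0:
                break
        chunk[0] |= 0x80                 # high bit set on the last 7-bit chunk
        out.extend(reversed(chunk))
    return bytes(out)

def vbyte_decode(data):
    numbers, n = [], 0
    for byte in data:
        n = (n << 7) | (byte & 0x7F)
        if byte & 0x80:                  # terminator byte reached
            numbers.append(n)
            n = 0
    return numbers

gaps = [5, 1, 130, 3]                    # document gaps in a postings list
assert vbyte_decode(vbyte_encode(gaps)) == gaps
```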
1999
Efficient query processing is critical in a data warehouse environment because the warehouse is very large, queries are often ad hoc and complex, and decision support applications typically require interactive response times. Existing approaches often use indexes to speed up such queries. However, the addition of index structures can significantly increase storage costs. In this paper, we consider the application of compression techniques to data warehouses. In particular, we examine a recently proposed access structure for warehouses known as DataIndexes, and discuss the application of several well-known compression methods to this approach. We also include a brief performance analysis, which indicates that the DataIndexing approach is well-suited to compression techniques in many cases.
IBM Journal of Research and Development, 2018
In this paper, we describe how the IBM z14 processor, together with Db2 for z/OS Version 12, can improve data compression rates and thus reduce data storage requirements and cost for large databases. The new processor improves on the compression hardware accelerator available in earlier IBM Z generations, by adding new hardware algorithms that increase the compression ratio and extend the applicability to additional data structures in Db2 for z/OS databases. A new entropy coding step employed after Ziv-Lempel compression reduces the size of data compressed with the prior algorithms by 30% on average. Moreover, the new order-preserving compression algorithm enables database index compression, reducing index sizes by roughly 30%. This results in an overall improvement of 30% of the database size for many applications, with associated benefits in storage requirements, I/O (input/output) bandwidth, and buffer pool efficiency.
2014
Domain encoding is a common technique to compress the columns of a column store and to accelerate many types of queries at the same time. It is based on the assumption that most columns contain a relatively small set of distinct values, in particular string columns. In this paper, we argue that domain encoding is not the end of the story. In real world systems, we observe that a substantial amount of the columns are of string types. Moreover, most of the memory space is consumed by only a small fraction of these columns. To address this issue, we make three main contributions: First we survey several approaches and variants for dictionary compression, i. e., data structures that store the dictionary of domain encoding in a compressed way. As expected, there is a trade-off between size of the data structure and its access performance. This observation can be used to compress rarely accessed data more than frequently accessed data. Furthermore the question which approach has the best ...
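For illustration, one classic way to compress the dictionary itself is front coding of the sorted string entries, storing only the length of the prefix shared with the previous entry plus the new suffix; this is a simplified sketch, and the paper surveys several such structures and their space/performance trade-offs.

```python
# Illustrative front coding of a sorted string dictionary: each entry is
# stored as (shared_prefix_length, remaining_suffix).

def front_code(sorted_strings):
    coded, prev = [], ""
    for s in sorted_strings:
        prefix = 0
        while prefix < min(len(prev), len(s)) and prev[prefix] == s[prefix]:
            prefix += 1
        coded.append((prefix, s[prefix:]))
        prev = s
    return coded

def front_decode(coded):
    strings, prev = [], ""
    for prefix, suffix in coded:
        s = prev[:prefix] + suffix
        strings.append(s)
        prev = s
    return strings

words = ["compress", "compression", "computer", "database"]
assert front_decode(front_code(words)) == words
```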
Journal of Computers, 2009
Loss-less data compression is potentially attractive in database applications for storage cost reduction and performance improvement. The existing compression architectures work well for small to large databases and provide good performance. But these systems can ...
The Computer Journal, 1998
Future database applications will require significant improvements in performance beyond the capabilities of conventional disk based systems. This paper describes a new approach to database systems architecture, which is intended to take advantage of solid-state memory in combination with data compression to provide substantial performance improvements. The compressed data representation is tailored to the data manipulation operations requirements. The architecture has been implemented and measurements of performance are compared to those obtained using other high-performance database systems. The results indicate from one to five orders of magnitude speed-up in retrieval, equivalent or slightly faster performance during insertion (and compression) of data, while achieving approximately one order of magnitude compression in data volume. The resultant architecture is thus capable of greater cost/effectiveness than conventional approaches.
Proceedings of the 2009 ACM SIGMOD International Conference on Management of data, 2009
Column-oriented database systems [19, 23] perform better than traditional row-oriented database systems on analytical workloads such as those found in decision support and business intelligence applications. Moreover, recent work [1, 24] has shown that lightweight compression schemes significantly improve the query processing performance of these systems. One such lightweight compression scheme is to use a dictionary in order to replace long (variable-length) values of a certain domain with shorter (fixed-length) integer codes. In order to further improve expensive query operations such as sorting and searching, column-stores often use order-preserving compression schemes. In contrast to the existing work, in this paper we argue that order-preserving dictionary compression does not only pay off for attributes with a small fixed domain size but also for long string attributes with a large domain size which might change over time. Consequently, we introduce new data structures that efficiently support an order-preserving dictionary compression for (variable-length) string attributes with a large domain size that is likely to change over time. The main idea is that we model a dictionary as a table that specifies a mapping from string values to arbitrary integer codes (and vice versa), and we introduce a novel indexing approach that provides efficient access paths to such a dictionary while compressing the index data. Our experiments show that our data structures are as fast as (or in some cases even faster than) other state-of-the-art data structures for dictionaries while being less memory intensive.
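A minimal sketch of order-preserving dictionary compression follows (assuming a static dictionary for simplicity; the paper's contribution is an indexed, updatable dictionary for large, changing domains): codes are assigned in sorted string order, so range predicates and sorting can be evaluated directly on the integer codes.

```python
# Hedged sketch of order-preserving dictionary compression: codes follow the
# sort order of the strings, so string comparisons become integer comparisons.

def build_order_preserving_dict(values):
    distinct = sorted(set(values))
    encode = {v: i for i, v in enumerate(distinct)}   # string -> integer code
    decode = distinct                                  # integer code -> string
    return encode, decode

names = ["Smith", "Adams", "Jones", "Smith", "Brown"]
encode, decode = build_order_preserving_dict(names)
codes = [encode[n] for n in names]                     # column stored as codes

# A range predicate on strings becomes an integer comparison on the codes:
lo, hi = encode["Brown"], encode["Jones"]
matches = [decode[c] for c in codes if lo <= c <= hi]  # ['Jones', 'Brown']
```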
Security and Privacy, 2019
A digital watermark is a secret pattern of bits inserted into data objects to verify the integrity of the contents or to identify ownership. It is an effective technique for protecting databases against data copyright infringement, piracy, and data tampering, in both relational and not-only-Structured Query Language (NoSQL) databases, including columnar databases. Unfortunately, most existing digital watermarking schemes do not take into consideration the side effects that watermarking might have on the database's important characteristics, such as data compression and overall performance. Thus, we need database watermarking schemes that are tailored to the target database system's specific architecture so that they do not interfere with the database's characteristics and performance. A columnar database is a type of NoSQL database in which data are stored in a column-wise manner. This facilitates better query processing (by leaving the unrelated attributes on the disk) as well as a higher compression ratio. In this research, we propose a distortion-free fragile watermarking scheme for the columnar database architecture that does not interfere with its underlying data compression scheme or its overall performance. The proposed scheme is flexible and can be adapted to various distributions of data. We tested our proposed scheme on both synthetic and real-world data, and proved its effectiveness.

Keywords: columnar databases, data integrity, database security, digital watermarking, information hiding

Databases have served the business intelligence industry well over the last few decades. However, in this current era of "Big Data," where increasingly large volumes of data are added to business organizations' databases at high speed, fast analysis of data is becoming more difficult. A columnar database (DB) is a type of NoSQL database where data is stored in a column-wise manner, with all values of an attribute clustered together. In contrast to row-wise relational databases, a columnar DB boosts the performance of data analytic transactions by reducing the amount of data that needs to be read from the disk, thus decreasing the bottleneck between Random Access Memory (RAM) and the hard drive. For instance, in a traditional relational database, answering a range query would require loading the set of records that falls in the range and then executing a projection query to discard the unwanted attributes. In a columnar DB, in contrast, answering a range query requires loading only the columns that contain the attributes associated with the query. Therefore, query performance is often increased, especially when the database is very large. Another crucial advantage of storing data in columns is that it lends itself naturally to compression. Columnar DBs achieve a much higher compression ratio and allow many queries to be answered on the compressed data.
Proceedings 2003 VLDB Conference, 2003
The Oracle RDBMS recently introduced an innovative compression technique for reducing the size of relational tables.
Soft Computing, 2018
Structured data are one of the most important segments in the realm of big data analysis that have undeniably prevailed over the years. In recent years, column-oriented design has become a frequent practice to organize structured data in analytical systems. The storage systems that organize data in a column-wise manner are often referred to as column stores. Column-oriented databases or warehouses, and spreadsheet applications in particular, have recently become a popular and convenient tool for column-wise data processing and analysis. At the same time, the volume of data is increasing at an extreme rate, which, despite the decrease in pricing of storage systems, stresses the importance of data compression. Apart from a resounding performance gain in large read-mostly data repositories, column-oriented data are easily compressible, which enables efficient query processing and pushes the peak of the overall performance. Many compression algorithms, including Run Length Encoding (RLE), exploit the similarity among the column values, where repetitions of the same value form columnar runs that can be found in most database systems. This paper presents a comprehensive analysis and comparison of common and well-known meta-heuristics for columnar run minimization, based on standard implementations using real datasets. We have analyzed genetic algorithms, simulated annealing, cuckoo search, particle swarm optimization, Tabu search, and the bat algorithm. The first three, being the most efficient, have undergone sensitivity analysis on synthetic datasets to fine-tune ...
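The sketch below illustrates the objective such meta-heuristics minimise, namely the total number of columnar runs induced by a given row ordering; the search algorithms themselves (genetic algorithms, simulated annealing, and so on) are not shown, and the helper is an illustrative assumption rather than the paper's implementation.

```python
# Count the total number of RLE runs over all columns for one row ordering;
# run minimisation searches for the ordering that makes this number small.

def count_runs(rows):
    """Total columnar runs for a given ordering of the rows."""
    if not rows:
        return 0
    runs = len(rows[0])                        # every column starts one run
    for prev, curr in zip(rows, rows[1:]):
        runs += sum(1 for a, b in zip(prev, curr) if a != b)
    return runs

table = [("DE", "red"), ("US", "red"), ("DE", "blue"), ("DE", "red")]
print(count_runs(table))                        # 6 runs in the given order
print(count_runs(sorted(table)))                # 4 runs after sorting the rows
```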
International Journal of Data Mining & Knowledge Management Process, 2013
A column-store database stores data column by column. The need for column-store databases arose from the need for efficient query processing in read-intensive relational databases, for which extensive research has been performed on efficient data storage and query processing. This paper gives an overview of the storage and performance optimization techniques used in column-stores.
IEEE Transactions on Knowledge and Data Engineering, 1997
Disk I/O has long been a performance bottleneck for very large databases. Database compression can be used to reduce disk I/O bandwidth requirements for large data transfers. In this paper, we explore the compression of large statistical databases and propose techniques for organizing the compressed data such that standard database operations such as retrievals, inserts, deletes and modifications are supported. We examine the applicability and performance of three methods. Two of these are adaptations of existing methods, but the third, called Tuple Differential Coding (TDC) [16], is a new method that allows conventional access mechanisms to be used with the compressed data to provide efficient access. We demonstrate how the performance of queries that involve large data transfers can be improved with these database compression techniques.
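The following is a simplified, hedged sketch of the differential idea behind tuple coding (not the exact TDC scheme from the paper): each tuple is mapped to a single integer via a mixed-radix combination of its attribute values, the relation is kept sorted, and only the differences between consecutive tuple numbers are stored.

```python
# Simplified differential coding of tuples: map each tuple to one integer
# using its attribute domain sizes, sort, and store successive differences.

def tuple_to_int(tup, domain_sizes):
    n = 0
    for value, size in zip(tup, domain_sizes):
        n = n * size + value              # mixed-radix combination
    return n

def diff_encode(tuples, domain_sizes):
    numbers = sorted(tuple_to_int(t, domain_sizes) for t in tuples)
    return [numbers[0]] + [b - a for a, b in zip(numbers, numbers[1:])]

# Two attributes with (hypothetical) domain sizes 10 and 100.
print(diff_encode([(1, 5), (1, 7), (3, 2)], (10, 100)))   # [105, 2, 195]
```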