2008 Canadian Conference on Electrical and Computer Engineering, 2008
The largest queries in data warehouses and decision support systems use hybrid hash join to relate information in multiple tables. Hybrid hash join functions independently of the data distributions of the join relations. Real-world data sets, however, are not uniformly distributed and often contain significant skew. Although partition skew has been studied for hash joins, no prior work has examined how exploiting data skew can improve performance. In this paper, we present histojoin, a join algorithm that uses histograms to identify data skew and improve join performance. Experimental results show that for skewed data sets, histojoin performs significantly fewer I/O operations and is 20 to 60% faster than hybrid hash join.
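A minimal sketch of the core idea, assuming a precomputed histogram of probe-side key frequencies; the function names, the memory model, and the simplified fallback pass are our illustration, not the paper's implementation:

```python
# Histogram-guided, skew-aware hash join (simplified illustration).
from collections import defaultdict

def skew_aware_hash_join(build, probe, histogram, memory_budget):
    """Join lists of (key, payload) tuples; `histogram` maps key -> frequency."""
    # Pin build tuples for the hottest probe-side keys in memory.
    hot_keys = set(sorted(histogram, key=histogram.get, reverse=True)[:memory_budget])
    hot_table, spilled_build = defaultdict(list), []
    for key, payload in build:
        if key in hot_keys:
            hot_table[key].append(payload)
        else:
            spilled_build.append((key, payload))   # would go to a partition file

    results, spilled_probe = [], []
    for key, payload in probe:
        if key in hot_keys:                        # frequent keys join immediately
            results.extend((key, b, payload) for b in hot_table[key])
        else:                                      # the rest spill as usual
            spilled_probe.append((key, payload))

    # Stand-in for the hybrid-hash fallback over the spilled partitions.
    cold_table = defaultdict(list)
    for key, payload in spilled_build:
        cold_table[key].append(payload)
    for key, payload in spilled_probe:
        results.extend((key, b, payload) for b in cold_table[key])
    return results
```

Because the skewed probe keys match in memory, the spilled partitions, and hence the I/O, shrink in proportion to the skew.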
Proceedings of the Seventh International Conference on Data Engineering, 1991
Parallel processing of relational queries has received considerable attention of late. However, in the presence of data skew, the speedup from conventional parallel join algorithms can be very limited, due to load imbalances among the various processors. Even a single large skew element can cause a processor to become overloaded.
… of the sixteenth international conference on …, 1990
The Super Database Computer (SDC) is a high-performance relational database server for a join-intensive environment under development at University of Tokyo. SDC is designed to execute a join in a highly parallel way. Compared to other join algorithms, a hash-based ...
2013 IEEE 10th International Conference on High Performance Computing and Communications & 2013 IEEE International Conference on Embedded and Ubiquitous Computing, 2013
The performance of parallel distributed data management systems becomes increasingly important with the rise of Big Data. Parallel joins have been widely studied in both the parallel processing and the database communities. Nevertheless, most of the algorithms developed so far do not consider data skew, which naturally exists in various applications. State-of-the-art methods designed to handle this problem are based on extensions to either of the two prevalent conventional approaches to parallel joins: the hash-based and duplication-based frameworks. In this paper, we introduce a novel parallel join framework, query-based distributed join (QbDJ), for handling data skew on distributed architectures. Further, we present an efficient implementation of the method based on the asynchronous partitioned global address space (APGAS) parallel programming model. We evaluate the performance of our approach on a cluster of 192 cores (16 nodes) and datasets of 1 billion tuples with different skews. The results show that the method is scalable and runs faster, with less network communication, than the state-of-the-art PRPD approach in [1] under high data skew.
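For context, the PRPD-style baseline routes tuples roughly as sketched below: ordinary keys are hash-partitioned, while tuples with skewed keys on one side stay where they are and the matching tuples from the other side are duplicated to every worker. This is our hedged reconstruction of the routing rule only; the names, side labels, and policy are assumptions, not code from either paper:

```python
# Hypothetical sketch of the routing decision in a PRPD-style partitioner.

def route(key, side, skewed_keys, n_workers, local_id):
    """Return the worker ids that should receive a tuple with this key.

    side: "R" for the relation whose keys are skewed, "S" for the other.
    """
    if key in skewed_keys:
        if side == "R":                  # skewed side: keep tuples local
            return [local_id]
        return list(range(n_workers))    # other side: duplicate (broadcast)
    return [hash(key) % n_workers]       # non-skewed: plain hash partitioning
```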
Proceedings of the 2009 ACM symposium on Applied Computing, 2009
Hash joins combine massive relations in data warehouses, decision support systems, and scientific data stores. Faster hash join performance significantly improves query throughput, response time, and overall system performance. In this work, we demonstrate how using join cardinality improves hash join performance. The key contribution is the development of an algorithm to determine join cardinality in an arbitrary query plan. We implemented early hash join and the join cardinality algorithm in PostgreSQL. Experimental results demonstrate that early hash join has an immediate response time that is an order of magnitude faster than the existing hybrid hash join implementation. One-to-one joins execute up to 50% faster and perform significantly fewer I/Os, and one-to-many joins have similar or better performance over all memory sizes.
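To illustrate the kind of optimization that known join cardinality enables, here is a sketch of a one-to-one hash join in which each probe tuple stops at its first match and evicts it, shrinking the hash table as the join runs. This is our simplification of the idea, not the PostgreSQL implementation the paper describes:

```python
# One-to-one hash join exploiting cardinality (illustrative sketch).

def hash_join_one_to_one(build, probe):
    """Join lists of (key, payload) tuples; build keys are unique by assumption."""
    table = {key: payload for key, payload in build}
    results = []
    for key, payload in probe:
        match = table.pop(key, None)   # first match is the only match: evict it
        if match is not None:
            results.append((key, match, payload))
    return results
```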
Joins are among the most frequently executed operations. Several fast join algorithms have been developed and extensively studied; these can be categorized as sort-merge, hash-based, and index-based algorithms. While all three types of algorithms exhibit excellent performance over most data, ameliorating the performance degradation in the presence of skew has been investigated only for hash-based algorithms. This paper examines the negative ramifications of skew in sort-merge join, and proposes several refinements of sort-merge join that deal effectively with data skew. Experiments show that some of these algorithms also impose virtually no penalty in the absence of data skew, and are thereby suitable for replacing existing sort-merge implementations in relational DBMSs. We also show how band sort-merge join performance is significantly enhanced with these refinements.
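The spot where value skew hurts a plain sort-merge join is the cross product over runs of equal keys, which the paper's refinements aim to contain. A minimal baseline sketch of our own, showing where those runs arise:

```python
# Plain sort-merge join; long runs of equal keys on both sides force a
# quadratic cross product, which is exactly where skew degrades performance.

def sort_merge_join(r, s):
    r = sorted(r, key=lambda t: t[0])   # (key, payload) tuples, sorted by key
    s = sorted(s, key=lambda t: t[0])
    i = j = 0
    results = []
    while i < len(r) and j < len(s):
        if r[i][0] < s[j][0]:
            i += 1
        elif r[i][0] > s[j][0]:
            j += 1
        else:
            key = r[i][0]
            # Gather the full run of equal keys on each side.
            i2, j2 = i, j
            while i2 < len(r) and r[i2][0] == key:
                i2 += 1
            while j2 < len(s) and s[j2][0] == key:
                j2 += 1
            results.extend((key, a[1], b[1]) for a in r[i:i2] for b in s[j:j2])
            i, j = i2, j2
    return results
```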
Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, 2015
Science applications are accumulating an ever-increasing amount of multidimensional data. Although some of it can be processed in a relational database, much of it is better suited to array-based engines. As such, it is important to optimize the query processing of these systems. This paper focuses on efficient query processing of join operations within an array database. These engines invariably "chunk" their data into multidimensional tiles that they use to efficiently process spatial queries. As such, traditional relational algorithms need to be substantially modified to take advantage of array tiles. Moreover, most n-dimensional science data is unevenly distributed in array space because its underlying observations rarely follow a uniform pattern. It is crucial that the optimization of array joins be skew-aware. In addition, owing to the scale of science applications, their query processing usually spans multiple nodes. This further complicates the planning of array joins. In this paper, we introduce a join optimization framework that is skew-aware for distributed joins. This optimization consists of two phases. In the first, a logical planner selects the query's algorithm (e.g., merge join), the granularity of its tiles, and the reorganization operations needed to align the data. The second phase implements this logical plan by assigning tiles to cluster nodes using an analytical cost model. Our experimental results, on both synthetic and real-world data, demonstrate that this optimization framework speeds up array joins by up to 2.5X in comparison to the baseline.
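As a rough illustration of the second phase only, a greedy assignment of tiles to nodes by estimated cost keeps skewed (expensive) tiles from piling onto one node. The greedy policy and the cost map are our stand-ins for the paper's analytical model:

```python
# Greedy, cost-balanced tile-to-node assignment (illustrative stand-in).
import heapq

def assign_tiles(tile_costs, n_nodes):
    """tile_costs: {tile_id: estimated_cost}; returns {tile_id: node_id}."""
    heap = [(0.0, node) for node in range(n_nodes)]   # (current load, node)
    heapq.heapify(heap)
    assignment = {}
    # Place the most expensive tiles first, each on the least-loaded node.
    for tile, cost in sorted(tile_costs.items(), key=lambda kv: -kv[1]):
        load, node = heapq.heappop(heap)
        assignment[tile] = node
        heapq.heappush(heap, (load + cost, node))
    return assignment
```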
New Generation Computing, 1988
Recent investigations into reducing the complexity of the relational join operation have introduced several hash-based partitioned-join strategies. All of these strategies depend upon the costly operation of data space partitioning before the join can be carried out. We had previously introduced a partitioned-join based on a dynamic and order-preserving multidimensional data organization called DYOP. The present study extends the earlier research on DYOP and constructs a simulation model. The simulation studies on DYOP and subsequent comparisons of all the partitioned-join methodologies, including DYOP, have shown that the space utilization of DYOP improves with an increasing number of attributes. Furthermore, the DYOP-based join outperforms all the hash-based methodologies by greatly reducing the total I/O bandwidth required for the entire partitioned-join operation. The comparison model is independent of architectural issues such as multiprocessing, multiple disk usage, and large memory availability, all of which help to further increase the efficiency of the operation.
1996
The widening performance gap between CPU and disk is significant for hash join performance. Most current hash join methods try to reduce the volume of data transferred between memory and disk. In this paper, we try to reduce hash-join times by reducing random I/O. We study how current algorithms incur random I/O, and propose a new hash join method, Seq+, that converts much of the random I/O to sequential I/O. Seq+ uses a new organization for hash buckets on disk, and larger input and output buffer sizes. We introduce the technique of batch writes to reduce the bucket-write cost, and the concepts of write- and read-groups of hash buckets to reduce the bucket-read cost. We derive a cost model for our method, and present formulas for choosing various algorithm parameters, including input and output buffer sizes. Our performance study shows that the new hash join method performs many times better than current algorithms under various environments. Since our cost functions underestimate the cost of current algorithms and overestimate the cost of Seq+, the actual performance gain of Seq+ is likely to be even greater.
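A hedged sketch of the batch-write idea: instead of issuing one random write per bucket page, pages accumulate in per-bucket buffers and are flushed together in a single sequential pass. The class, buffer sizes, and flush policy are our assumptions; a real implementation would also track per-bucket offsets on disk:

```python
# Batched bucket writes: many random writes become one sequential pass.

class BatchedBucketWriter:
    def __init__(self, out, n_buckets, pages_per_batch=64):
        self.out = out                     # file-like object opened for writing
        self.buffers = [[] for _ in range(n_buckets)]
        self.pending = 0
        self.pages_per_batch = pages_per_batch

    def add(self, bucket, page):
        """Buffer one page for `bucket`; flush when the batch is full."""
        self.buffers[bucket].append(page)
        self.pending += 1
        if self.pending >= self.pages_per_batch:
            self.flush()

    def flush(self):
        # One sequential write per batch rather than one seek per page.
        for bucket, pages in enumerate(self.buffers):
            for page in pages:
                self.out.write(page)
            self.buffers[bucket] = []
        self.pending = 0
```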
Proceedings of the 20th International Conference on Data Engineering, 2004
This paper introduces the hash-merge join algorithm (HMJ, for short); a new non-blocking join algorithm that deals with data items from remote sources via unpredictable, slow, or bursty network traffic. The HMJ algorithm is designed with two goals in mind: (1) Minimize the time to produce the first few results, and (2) Produce join results even if the two sources of the join operator occasionally get blocked. The HMJ algorithm has two phases: The hashing phase and the merging phase. The hashing phase employs an in-memory hash-based join algorithm that produces join results as quickly as data arrives. The merging phase is responsible for producing join results if the two sources are blocked. Both phases of the HMJ algorithm are connected via a flushing policy that flushes in-memory parts into disk storage once the memory is exhausted. Experimental results show that HMJ combines the advantages of two state-of-the-art non-blocking join algorithms (XJoin and Progressive Merge Join) while avoiding their shortcomings.
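A sketch of the hashing phase only, in the spirit of a symmetric, non-blocking hash join: each arriving tuple probes the other input's in-memory table and is then inserted into its own, so results stream out as data arrives. The flushing policy and merging phase are omitted, and the names are ours rather than HMJ's:

```python
# Hashing phase of a non-blocking (symmetric) hash join, simplified.
from collections import defaultdict

def symmetric_hash_join(stream):
    """stream yields ('L' | 'R', key, payload) in arrival order."""
    tables = {"L": defaultdict(list), "R": defaultdict(list)}
    for side, key, payload in stream:
        other = "R" if side == "L" else "L"
        for match in tables[other][key]:       # probe the opposite table first
            yield (key, payload, match) if side == "L" else (key, match, payload)
        tables[side][key].append(payload)      # then insert for future probes
    # On memory pressure, HMJ would flush partitions to disk and later
    # produce the remaining results in its merging phase.
```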