2012, 2012 IEEE 28th International Conference on Data Engineering
Top-k pairs queries have received significant attention from the research community. k-closest pairs queries, k-furthest pairs queries and their variants are among the most well-studied special cases of top-k pairs queries. In this paper, we present the first approach to answer a broad class of top-k pairs queries over sliding windows. Our framework handles multiple top-k pairs queries, and each query may use a different scoring function, a different value of k and a different sliding window size. Although the number of possible pairs in the sliding window is quadratic in the number of objects N in the window, we answer the top-k pairs query efficiently by maintaining a small subset of pairs, called the K-skyband, which is expected to consist of O(K log(N/K)) pairs. For all the queries that use the same scoring function, we need to maintain only one K-skyband. We present efficient techniques for K-skyband maintenance and query answering. We conduct a detailed complexity analysis and show that the expected cost of our approach is reasonably close to the lower bound cost. We experimentally verify this by comparing our approach with a specially designed supreme algorithm that assumes the existence of an oracle and meets the lower bound cost.

when the divergence between the two stocks returns to normal. A top-k pairs query can be issued to obtain the pairs of stocks that are correlated (e.g., they belong to the same business sector and have similar fundamentals such as market caps, dividends, etc.) and display different trends. Pair-trading can be profitable only if the trader is the first one to capitalize on the opportunity [7]. Hence, the trader may want to continuously monitor the top-k pairs from the most recent data (e.g., a sliding window containing the most recent n items). Consider another example of an online auction website.
A user may be interested in finding pairs of products that have similar specifications but are sold at very different prices (i.e., different final bids). Such pairs may be used to understand users' behavior and market trends, e.g., a suitable bidding time for buyers and a suitable bid closing time for sellers. An analyst or a user may issue the following query to obtain the top-k pairs of such products sold during the last 7 days.

    Select a.id, b.id from auction a, auction b
    where a.id < b.id
    order by dist(a.spec,b.spec)-|a.bid-b.bid|
    limit k window [7 days]

Here dist(a.spec, b.spec) computes the distance (or difference) between the specifications of the two products, and |a.bid − b.bid| denotes the absolute difference between the final bids they receive. Note that the query prefers pairs of products that have a small difference between their specifications but a large difference between their selling prices. The condition a.id < b.id ensures that a pair (a, b) is not repeated as (b, a). While the above example uses a simple scoring function, users of real-world applications may specify more sophisticated scoring functions. Our framework allows users to define arbitrarily complex scoring functions. A query that retrieves the top-k pairs among the most recent n data items (i.e., a sliding window of size n) and uses the scoring function s is denoted as Q(k,n,s).

A. Contributions

Our framework has the following features.

Unified framework. To the best of our knowledge, we are the first to study top-k pairs queries over sliding windows. We present a unified framework that efficiently solves the