Tree Pattern Aggregation for Scalable XML Data Dissemination

Minos Garofalakis

Tree Pattern Aggregation for Scalable XML Data Dissemination

Minos Garofalakis

2002, VLDB '02: Proceedings of the 28th International Conference on Very Large Databases

visibility

…

description

12 pages

link

1 file

Sign up for access to the world's latest research

checkGet notified about relevant papers

checkSave papers to use in your research

checkJoin the discussion with peers

checkTrack your impact

Abstract

With the rapid growth of XML-document traffic on the Internet, scalable content-based dissemination of XML documents to a large, dynamic group of consumers has become an important research challenge. To indicate the type of content that they are interested in, data consumers typically specify their subscriptions using some XML pattern specification language (e.g., XPath). Given the large volume of subscribers, system scalability and efficiency mandate the ability to aggregate the set of consumer subscriptions to a smaller set of content specifications, so as to both reduce their storagespace requirements as well as speed up the documentsubscription matching process. In this paper, we provide the first systematic study of subscription aggregation where subscriptions are specified with tree patterns (an important subclass of XPath expressions). The main challenge is to aggregate an input set of tree patterns into a smaller set of generalized tree patterns such that: (1) a given space constraint on the total size of the subscriptions is met, and (2) the loss in precision (due to aggregation) during document filtering is minimized. We propose an efficient tree-pattern aggregation algorithm that makes effective use of document-distribution statistics in order to compute a precise set of aggregate tree patterns within the allotted space budget. As part of our solution, we also develop several novel algorithms for tree-pattern containment and minimization, as well as "least-upper-bound" computation for a set of tree patterns. These results are of interest in their own right, and can prove useful in other domains, such as XML query optimization. Extensive results from a prototype implementation validate our approach.

Songting Chen

IEEE Transactions on Knowledge and Data Engineering, 2008

An XML publish/subscribe system needs to filter a large number of queries over XML streams. Most existing systems only consider filtering the simple XPath statements. In this paper, we focus on filtering of the more complex Generalized-Tree-Pattern (GTP) queries. Our filtering mechanism is based on a novel Tree-of-Path (TOP) encoding scheme, which compactly represents the path matches for the entire document. First, we show that the TOP encodings can be efficiently produced via shared bottom-up path matching. Second, with the aid of this TOP encoding, we can 1) achieve polynomial time and space complexity for postprocessing, 2) avoid redundant predicate evaluations, 3) allow an efficient duplicate-free and merge join-based algorithm for merging multiple encoded path matches, and 4) simplify the processing of GTP queries. Overall, our approach maximizes the sharing opportunity across queries by exploiting the suffix as well as prefix sharing. At the same time, our TOP encodings allow efficient postprocessing for GTP queries. Extensive performance studies show that GFilter not only achieves significantly better filtering performance than state-ofthe-art algorithms but also is capable of efficiently filtering the more complex GTP queries.

Log In

Tree Pattern Aggregation for Scalable XML Data Dissemination

Sign up for access to the world's latest research

Abstract

Related papers

Related papers

Related topics