Academia.edu no longer supports Internet Explorer.
To browse Academia.edu and the wider internet faster and more securely, please take a few seconds to upgrade your browser.
2002, VLDB '02: Proceedings of the 28th International Conference on Very Large Databases
…
12 pages
1 file
With the rapid growth of XML-document traffic on the Internet, scalable content-based dissemination of XML documents to a large, dynamic group of consumers has become an important research challenge. To indicate the type of content that they are interested in, data consumers typically specify their subscriptions using some XML pattern specification language (e.g., XPath). Given the large volume of subscribers, system scalability and efficiency mandate the ability to aggregate the set of consumer subscriptions to a smaller set of content specifications, so as to both reduce their storagespace requirements as well as speed up the documentsubscription matching process. In this paper, we provide the first systematic study of subscription aggregation where subscriptions are specified with tree patterns (an important subclass of XPath expressions). The main challenge is to aggregate an input set of tree patterns into a smaller set of generalized tree patterns such that: (1) a given space constraint on the total size of the subscriptions is met, and (2) the loss in precision (due to aggregation) during document filtering is minimized. We propose an efficient tree-pattern aggregation algorithm that makes effective use of document-distribution statistics in order to compute a precise set of aggregate tree patterns within the allotted space budget. As part of our solution, we also develop several novel algorithms for tree-pattern containment and minimization, as well as "least-upper-bound" computation for a set of tree patterns. These results are of interest in their own right, and can prove useful in other domains, such as XML query optimization. Extensive results from a prototype implementation validate our approach.
IEEE Transactions on Knowledge and Data Engineering, 2008
An XML publish/subscribe system needs to filter a large number of queries over XML streams. Most existing systems only consider filtering the simple XPath statements. In this paper, we focus on filtering of the more complex Generalized-Tree-Pattern (GTP) queries. Our filtering mechanism is based on a novel Tree-of-Path (TOP) encoding scheme, which compactly represents the path matches for the entire document. First, we show that the TOP encodings can be efficiently produced via shared bottom-up path matching. Second, with the aid of this TOP encoding, we can 1) achieve polynomial time and space complexity for postprocessing, 2) avoid redundant predicate evaluations, 3) allow an efficient duplicate-free and merge join-based algorithm for merging multiple encoded path matches, and 4) simplify the processing of GTP queries. Overall, our approach maximizes the sharing opportunity across queries by exploiting the suffix as well as prefix sharing. At the same time, our TOP encodings allow efficient postprocessing for GTP queries. Extensive performance studies show that GFilter not only achieves significantly better filtering performance than state-ofthe-art algorithms but also is capable of efficiently filtering the more complex GTP queries.
2007
Publish/subscribe (or pub/sub) systems perform asynchronous message transmission, from publishers to subscribers, without any of the parties having knowledge of the other. The pub/sub infrastructure manages the delivery of the messages, which is guided by user subscriptions that specify the type of information the subscribers are interested in. Since XML prevails as the standard for information exchange, efficient XML-aware pub/sub systems become necessary. Within that context, we propose VA-RoXSum, a novel message representation scheme that aggregates the content of messages in a space efficient manner. Coupled with specialized processing algorithms that operate on its aggregated content, the VA-RoXSum enables the batch processing of groups of messages and considerably improves the performance of the subscription-guided filtering task. Our preliminary experiments show that a pub/sub infrastructure with VA-RoXSum achieves up to two orders of magnitude faster matching, compared with state-of-the-art alternatives, which operate on the original messages. * Research partially supported by Lotus Interworks and UC Micro.
Dissemination systems are used to route information received from many publishers individually to multiple subscribers. The core of a dissemination system consists of an efficient filtering engine deciding what part of an incoming message goes to which recipient. Within this paper we are proposing a chunking framework of XML documents to speed up the filtering process for a set of registered subscriptions based on XPath expressions. The problem which will be leveraged by the proposed chunk- ing scheme is based on the observation that the execution time of XPath expressions increases with the size of the underlying XML document. The proposed chunking strategy is based on the idea of sharing XPath prefixes among the query set addition- ally extended by individually selected nodes to be able to handle XPath-filter expres- sions. Extensive tests showed substantial performance gains.
kunz-pc.sce.carleton.ca
Journal of Advanced Computational Intelligence and Intelligent Informatics
In this paper, we present a polynomial-time algorithm for TPQ (tree pattern queries) minimization without XML constraints involved. The main idea of the algorithm is a dynamic programming strategy to find all the matching subtrees within a TPQ. A matching subtree implies a redundancy and should be removed in such a way that the semantics of the original TPQ is not damaged. Our algorithm consists of two parts: one for subtree recognization and the other for subtree deletion. Both of them needs only O(<I>n</I>2) time, where <I>n</I> is the number of nodes in a TPQ.
Lecture Notes in Computer Science, 2004
We propose an efficient approach for finding relevant XML data twigs defined by unordered query tree specifications. We use the tree signatures as the index structure and find qualifying patterns through integration of structurally consistent query path qualifications. An efficient algorithm is proposed and its implementation tested on real-life data collections.
The VLDB Journal The International Journal on Very Large Data Bases, 2002
We propose a novel index structure, termed XTrie, that supports the efficient filtering of XML documents based on XPath expressions. Our XTrie index structure offers several novel features that make it especially attractive for largescale publish/subscribe systems. First, XTrie is designed to support effective filtering based on complex XPath expressions (as opposed to simple, single-path specifications). Second, our XTrie structure and algorithms are designed to support both ordered and unordered matching of XML data. Third, by indexing on sequences of element names organized in a trie structure and using a sophisticated matching algorithm, XTrie is able to both reduce the number of unnecessary index probes as well as avoid redundant matchings, thereby providing extremely efficient filtering. Our experimental results over a wide range of XML document and XPath expression workloads demonstrate that our XTrie index structure outperforms earlier approaches by wide margins.
Lecture Notes in Computer Science
Fragmentation techniques for XML data are gaining momentum within both distributed and centralized XML query engines and pose novel and unrecognized challenges to the community. Albeit not novel, and clearly inspired by the classical divide et impera principle, fragmentation for XML trees has been proved successful in boosting the querying performance, and in cutting down the memory requirements. However, fragmentation considered so far has been driven by semantics, i.e. built around query predicates. In this paper, we propose a novel fragmentation technique that founds on structural constraints of XML documents (size, tree-width, and tree-depth) and on special-purpose structure histograms able to meaningfully summarize XML documents. This allows us to predict bounding intervals of structural properties of output (XML) fragments for efficient query processing of distributed XML data. An experimental evaluation of our study confirms the effectiveness of our fragmentation methodology on some representative XML data sets.
2004
We propose an efficient approach for finding relevant XML data twigs defined by unordered query tree specifications. We use the tree signatures as the index structure and find qualifying patterns through integration of structurally consistent query path qualifications. An efficient technique is proposed and its implementation tested on real-life data collections.
2007 IEEE 23rd International Conference on Data Engineering, 2007
Content-based routing is the primary form of communication within publish/subscribe systems. In those systems data transmission is performed by sophisticated overlay networks of content-based routers, which match data messages against registered subscriptions and forward them based on this matching. Despite their inherent complexities, such systems are expected to deliver information in a timely and scalable fashion. As a result, their successful deployment is a strenuous task. Relevant efforts have so far focused on the construction of the overlay network and the filtering of messages at each broker. However, the efficient transmission of messages has received less attention. In this work, we propose a solution that gracefully handles the transmission task, while providing performance benefits for the matching task as well. Along those lines, we design RoXSum, a message representation scheme that aggregates the routing information from multiple documents in a way that permits subscription matching directly on the aggregated content. Our performance study shows that RoXSum is a viable and effective technique, as it speeds up message routing for more than an order of magnitude.
Loading Preview
Sorry, preview is currently unavailable. You can download the paper by clicking the button above.
Proceedings of the 2008 EDBT Ph.D. workshop, 2008
Lecture Notes in Computer Science, 2003
Lecture Notes in Computer Science, 2002
2015 IEEE International Conference on Computer and Information Technology; Ubiquitous Computing and Communications; Dependable, Autonomic and Secure Computing; Pervasive Intelligence and Computing
Journal of Discrete Algorithms, 2004
2007 International Conference on Multimedia and Ubiquitous Engineering (MUE'07), 2007
Journal of Advanced Computational Intelligence and Intelligent Informatics
Lecture Notes in Computer Science, 2003
Encyclopedia of Information Communication Technology