Journal of Advanced Computational Intelligence and Intelligent Informatics
In this paper, we present a polynomial-time algorithm for minimizing TPQs (tree pattern queries) in the absence of XML constraints. The main idea of the algorithm is a dynamic programming strategy that finds all the matching subtrees within a TPQ. A matching subtree indicates a redundancy and should be removed in such a way that the semantics of the original TPQ are preserved. Our algorithm consists of two parts, one for subtree recognition and the other for subtree deletion. Each part needs only O(n²) time, where n is the number of nodes in the TPQ.
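The dynamic-programming idea of recognizing matching subtrees can be illustrated with a small sketch. It is not the paper's algorithm. It assumes patterns with child edges only and no wildcards, and the names `PNode`, `match_table`, and `prune` are hypothetical. The table is filled bottom-up over all node pairs. The work for a pair (u, v) is proportional to deg(u)·deg(v), so the whole table costs O(n²). A child subtree is then dropped when it maps into a sibling that is kept.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # eq=False keeps identity hashing, so nodes can serve as dict keys
class PNode:
    label: str
    children: list = field(default_factory=list)


def postorder(root):
    """List the nodes of a pattern, children before parents."""
    out = []

    def walk(n):
        for c in n.children:
            walk(c)
        out.append(n)

    walk(root)
    return out


def match_table(root):
    """match[(u, v)] is True when subtree(u) maps into subtree(v): same label,
    and every child of u maps into some child of v (child edges only)."""
    nodes = postorder(root)
    match = {}
    for v in nodes:  # every child of v was handled in an earlier iteration
        for u in nodes:
            match[(u, v)] = u.label == v.label and all(
                any(match[(cu, cv)] for cv in v.children) for cu in u.children
            )
    return match


def prune(node, match):
    """Remove each child subtree that maps into a sibling that is still kept."""
    kept = []
    for i, c in enumerate(node.children):
        others = kept + node.children[i + 1:]
        if not any(match[(c, o)] for o in others):
            kept.append(c)
    node.children = kept
    for c in kept:
        prune(c, match)


def to_xpath(n):
    return n.label + "".join(f"[{to_xpath(c)}]" for c in n.children)


# a[b[c]][b][b[c]]: the bare b and one copy of b[c] are redundant
root = PNode("a", [PNode("b", [PNode("c")]), PNode("b"), PNode("b", [PNode("c")])])
prune(root, match_table(root))
print(to_xpath(root))  # a[b[c]]
```

When two sibling subtrees are identical, the greedy pass keeps the later copy. Subtrees removed earlier are no longer offered as targets, so one copy always survives.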
Encyclopedia of Information Communication Technology
XML employs a tree-structured model for representing data. Queries in XML query languages, for example, XPath (World Wide Web Consortium, 1999), XQuery (World Wide Web Consortium, 2001), XML-QL (Deutsch, Fernandez, Florescu, Levy, & Suciu, 1999), and Quilt (Chamberlin, Clark, Florescu, & Stefanescu, 1999; Chamberlin, Robie, & Florescu, 2000), typically specify patterns of selection predicates on multiple elements that have some specified tree-structured relationships. For instance, the XPath expression a[b[c and //d]]/b[c and e//d] asks for any node of type b that is a child of some node of type a. In addition, the b-node is the parent of some c-node and some e-node, as well as an ancestor of some d-node. In general, such an expression can be represented by a tree structure, as shown in Figure 1(a). In such a tree pattern, the nodes are types from Σ ∪ {*} (* is a wildcard, matching any node type), and edges are parent-child or ancestor-descendant relationships. Among all the...
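The tree pattern for a[b[c and //d]]/b[c and e//d] can be written down explicitly and matched against a small document. This is a minimal sketch using Python's standard `xml.etree.ElementTree`. The `QNode` class and the `embeds`/`answers` helpers are hypothetical names for illustration. The sketch follows the reading given above, where `//d` is a descendant edge from its own b-node. In standard XPath, a `//` at the start of a predicate is evaluated from the document root, so the relative form would be `.//d`.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class QNode:
    label: str            # element type, or "*" as the wildcard
    axis: str = "/"       # edge from the parent: "/" (child) or "//" (descendant)
    children: list = field(default_factory=list)


def embeds(q, e):
    """True if pattern node q and its whole subtree can be placed at element e."""
    if q.label not in ("*", e.tag):
        return False
    for qc in q.children:
        if qc.axis == "/":
            candidates = list(e)
        else:
            candidates = [d for d in e.iter() if d is not e]  # proper descendants
        if not any(embeds(qc, d) for d in candidates):
            return False
    return True


def answers(doc_root, pattern, output):
    """Elements bound to `output` (a "/" child of the pattern root) in some embedding."""
    result = []
    for y in doc_root.iter():
        if embeds(pattern, y):
            result.extend(x for x in y if embeds(output, x))
    return result


# a[b[c and //d]]/b[c and e//d], reading //d as a descendant edge from its b
out_b = QNode("b", "/", [QNode("c"), QNode("e", "/", [QNode("d", "//")])])
pattern = QNode("a", "/", [QNode("b", "/", [QNode("c"), QNode("d", "//")]), out_b])

doc = ET.fromstring(
    "<r><a>"
    "<b id='0'><c/><x><d/></x></b>"
    "<b id='1'><c/><e><f><d/></f></e></b>"
    "<b id='2'><c/><e/></b>"
    "</a></r>"
)
print([x.get("id") for x in answers(doc, pattern, out_b)])  # ['1']
```

Branch b0 satisfies the predicate on a but has no e child, and b2's e has no d below it. Only b1 is returned.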
Journal of Advanced Computational Intelligence and Intelligent Informatics
In this paper, we provide a polynomial-time tree pattern query minimization algorithm whose efficiency stems from two key observations: (i) inherently redundant "components" often exist inside the rudimentary query provided by the user; and (ii) irredundant nodes may become redundant when constraints such as co-occurrence and required child/descendant are given. We show that the algorithm obtained by first augmenting the input tree pattern using the constraints, and then applying minimization, always finds the unique minimal equivalent of the original query. We complement our analytical results with an experimental study that shows the effectiveness of our tree pattern minimization techniques.
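Observation (ii) can be seen in a deliberately small case: a leaf that a constraint already guarantees adds nothing to the query. The sketch below is a local, leaf-only simplification. It is not the augment-then-minimize algorithm described in the abstract. The constraint tables `REQUIRED_CHILD` and `REQUIRED_DESC` and all function names are hypothetical. Without the constraints, every node in the example pattern would be irredundant.

```python
from dataclasses import dataclass, field


@dataclass
class QNode:
    label: str
    axis: str = "/"   # edge from the parent: "/" or "//"
    children: list = field(default_factory=list)


# Hypothetical integrity constraints, keyed by element type.
REQUIRED_CHILD = {"b": {"c"}}   # every b element has a c child
REQUIRED_DESC = {"a": {"d"}}    # every a element has a d descendant


def drop_implied_leaves(node):
    """Bottom-up: remove leaf pattern nodes that the constraints already guarantee."""
    child_ok = REQUIRED_CHILD.get(node.label, set())
    desc_ok = child_ok | REQUIRED_DESC.get(node.label, set())  # a required child is also a descendant
    kept = []
    for c in node.children:
        drop_implied_leaves(c)  # c may become a leaf once its own implied leaves are gone
        implied = not c.children and (
            c.label in child_ok if c.axis == "/" else c.label in desc_ok
        )
        if not implied:
            kept.append(c)
    node.children = kept
    return node


def to_xpath(n):
    steps = "".join(
        f"[{'.//' if c.axis == '//' else ''}{to_xpath(c)}]" for c in n.children
    )
    return n.label + steps


q = QNode("a", "/", [QNode("b", "/", [QNode("c")]), QNode("d", "//"), QNode("e")])
print(to_xpath(drop_implied_leaves(q)))  # a[b][e]
```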
2005
Popular XML languages, like XPath, use "tree-pattern" queries to select nodes based on their structural characteristics. While many processing methods have already been proposed for such queries, none of them has found its way into any of the existing "lightweight" XML engines (i.e., engines without optimization modules). The main reason is the lack of a systematic comparison of query methods under a common storage model. In this work, we aim to fill this gap and answer two important questions: what the relative similarities and important differences among the tree-pattern query methods are, and whether there is a prominent method among them, in terms of effectiveness and robustness, that an XML processor should support. For the first question, we propose a novel classification of the methods according to their matching process. We then describe a common storage model and demonstrate that the access pattern of each class conforms, or can be adapted to conform, to this model. Finally, we perform an experimental evaluation to compare their relative performance. Based on the evaluation results, we conclude that the family of holistic processing methods, which provides performance guarantees, is the most robust alternative for such an environment.
cerc.wvu.edu
An XML tree pattern query, represented as a labeled tree, is essentially a complex selection predicate on both the structure and the content of an XML document. Tree pattern matching has been identified as a core operation in querying XML data. We distinguish between two ...
2010 International Conference on Information Science and Applications, 2010
Since the Extensible Markup Language (XML) emerged as a new standard for information representation and exchange on the Internet, storing, indexing, and querying XML documents have been among the major issues in database research. In this paper, we study tree pattern matching and discuss a new algorithm for processing ordered tree pattern queries, which considers not only ancestor/descendant relationships but also the left-to-right ordering of query nodes. This kind of tree matching has many practical applications, such as linguistic analysis, content-based video retrieval, computational biology, and data mining. The time complexity of the new algorithm is bounded by O(|D|⋅|Q| + |T|⋅leaf_Q), and its space overhead by O(leaf_T⋅leaf_Q), where T stands for a document tree, Q for a tree pattern, and D for the largest data stream among all the data streams associated with the nodes in Q. Each data stream contains the database nodes that match the predicate at a query node q. leaf_T (leaf_Q) denotes the number of leaf nodes of T (resp. Q). In addition, the algorithm can be adapted to an indexing environment that uses XB-trees. Experimental results show that our algorithm is promising.
The VLDB Journal The International Journal on Very Large Data Bases, 2002
Tree patterns form a natural basis to query tree-structured data such as XML and LDAP. To improve the efficiency of tree pattern matching, it is essential to quickly identify and eliminate redundant nodes in the pattern. In this paper, we study tree pattern minimization both in the absence and in the presence of integrity constraints (ICs) on the underlying tree-structured database. In the absence of ICs, we develop a polynomial-time query minimization algorithm called CIM, whose efficiency stems from two key properties: (i) a node cannot be redundant unless its children are; and (ii) the order of elimination of redundant nodes is immaterial. When ICs are considered for minimization, we develop a technique for query minimization based on three fundamental operations: augmentation (an adaptation of the well-known chase procedure), minimization (based on homomorphism techniques), and reduction. We show the surprising result that the algorithm, referred to as ACIM, obtained by first augmenting the tree pattern using ICs, and then applying CIM, always finds the unique minimal equivalent query. While ACIM is polynomial time, it can be expensive in practice because of its inherent non-locality. We then present a fast algorithm, CDM, that identifies and eliminates local redundancies due to ICs, based on propagating "information labels" up the tree pattern. CDM can be applied prior to ACIM to improve minimization efficiency. We complement our analytical results with an experimental study that shows the effectiveness of our tree pattern minimization techniques.
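Constraint-independent minimization can be illustrated with a deliberately naive homomorphism-based sketch. The sketch makes these assumptions:

- Patterns have child ("/") and descendant ("//") edges and no wildcards.
- The pattern root is the only output node.
- A child subtree is redundant when it maps into a disjoint part of the pattern that satisfies its edge to the parent.

This is not CIM itself, which is built for efficiency. The sketch simply rescans after each deletion, relying on property (ii) above that the elimination order does not matter. `TNode`, `maps`, `redundant`, and `minimize` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # identity-based equality and hashing for pattern nodes
class TNode:
    label: str
    axis: str = "/"   # edge from the parent: "/" (child) or "//" (descendant)
    children: list = field(default_factory=list)


def descendants(n):
    """Proper descendants of n, in preorder."""
    for c in n.children:
        yield c
        yield from descendants(c)


def maps(u, v):
    """Homomorphism test: can subtree(u) be mapped into subtree(v) with u -> v?"""
    if u.label != v.label:
        return False
    for cu in u.children:
        if cu.axis == "/":
            targets = [cv for cv in v.children if cv.axis == "/"]
        else:
            targets = list(descendants(v))
        if not any(maps(cu, t) for t in targets):
            return False
    return True


def redundant(p, c):
    """Child c of p is redundant if subtree(c) maps onto a node w outside it
    that satisfies c's edge to p."""
    if c.axis == "/":
        pool = [w for w in p.children if w is not c and w.axis == "/"]
    else:
        inside = {c, *descendants(c)}
        pool = [w for w in descendants(p) if w not in inside]
    return any(maps(c, w) for w in pool)


def minimize(root):
    """Delete redundant subtrees one at a time until none is left."""
    changed = True
    while changed:
        changed = False
        for p in [root, *descendants(root)]:
            victim = next((c for c in p.children if redundant(p, c)), None)
            if victim is not None:
                p.children.remove(victim)
                changed = True
                break
    return root


def to_xpath(n):
    return n.label + "".join(
        f"[{'.//' if c.axis == '//' else ''}{to_xpath(c)}]" for c in n.children
    )


# a[b[.//d]][.//d]: the second d is implied by the first
q = TNode("a", "/", [TNode("b", "/", [TNode("d", "//")]), TNode("d", "//")])
print(to_xpath(minimize(q)))  # a[b[.//d]]
```

A "//" child may map onto any proper descendant of its parent, but a "/" child only onto another "/" child. For this reason a[b][.//b] reduces to a[b], while a[.//b][c] is left unchanged.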
2004
We propose an efficient approach for finding relevant XML data twigs defined by unordered query tree specifications. We use the tree signatures as the index structure and find qualifying patterns through integration of structurally consistent query path qualifications. An efficient technique is proposed and its implementation tested on real-life data collections.
XML is a self-describing data representation format with a flexible structure. With hundreds of XML-based languages developed, XML is widely accepted as a standard for data representation and information exchange over the Internet. A major advantage of XML is that it allows users to create their own tags. The growing popularity of XML has led businesses and enterprises to query XML data more frequently, and there is an increasing demand for efficient and effective query processing on XML data. Query processing takes an XML file as input. Such a file can be viewed as an XML tree using a DOM parser, which is mainly used to store, access, and manipulate the XML tree. We propose a new search engine, named XML Search engine, for pattern matching.
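The abstract does not describe how the proposed search engine works internally. The sketch below covers only the DOM step. It uses Python's standard `xml.dom.minidom` to load a document as a tree and locate elements by a short parent-child tag path. The helper names `element_paths` and `find_by_path` are hypothetical.

```python
from xml.dom import minidom


def element_paths(node, prefix=()):
    """Yield (tag path from the root, element) for every element below a DOM node."""
    for child in node.childNodes:
        if child.nodeType == child.ELEMENT_NODE:
            path = prefix + (child.tagName,)
            yield path, child
            yield from element_paths(child, path)


def find_by_path(doc, steps):
    """Elements whose root-to-node tag path ends with the given parent-child steps."""
    tail = tuple(steps)
    return [el for path, el in element_paths(doc) if path[-len(tail):] == tail]


doc = minidom.parseString(
    "<lib><book><title>XML</title></book><mag><title>DB</title></mag></lib>"
)
hits = find_by_path(doc, ["book", "title"])
print([h.firstChild.data for h in hits])  # ['XML']
```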
ACM SIGMOD Record, 2001
Tree patterns form a natural basis to query tree-structured data such as XML and LDAP. Since the efficiency of tree pattern matching against a tree-structured database depends on the size of the pattern, it is essential to identify and eliminate redundant nodes in the pattern, and to do so as quickly as possible. In this paper, we study tree pattern minimization both in the absence and in the presence of integrity constraints (ICs) on the underlying tree-structured database. When no ICs are considered, we call the process of minimizing a tree pattern constraint-independent minimization. We develop a polynomial-time algorithm called CIM for this purpose. CIM's efficiency stems from two key properties: (i) a node cannot be redundant unless its children are, and (ii) the order of elimination of redundant nodes is immaterial. When ICs are considered for minimization, we refer to it as constraint-dependent minimization. For tree-structured databases, required child/descendant and type ...
Information Systems, 2009
Tree pattern matching is a fundamental problem that has a wide range of applications in Web data management, XML processing, and selective data dissemination. In this paper we develop efficient algorithms for the tree homeomorphism problem, i.e., the problem of matching a tree pattern with exclusively transitive (descendant) edges. We first prove that deciding whether there is a tree homeomorphism is LOGSPACE-complete, improving on the current LOGCFL upper bound. Furthermore, we develop a practical algorithm for the tree homeomorphism decision problem that is both space- and time-efficient. The algorithm is in LOGDCFL and its space consumption is strongly bounded, while the running time is linear in the size of the data tree. This algorithm immediately generalizes to the problem of matching the tree pattern against all subtrees of the data tree, preserving the mentioned efficiency properties.
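To make the problem concrete, here is a simple bottom-up dynamic program for descendant-only patterns. It reports every data node at which the whole pattern can be rooted, so it also covers the "all subtrees" variant. It runs in roughly O(|T|·|Q|) time. It is not the LOGDCFL algorithm from the abstract and makes no attempt at its space bounds. `Node` and `homeomorphic_roots` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # identity hashing so nodes can be stored in sets
class Node:
    label: str
    children: list = field(default_factory=list)


def homeomorphic_roots(data_root, pattern_root):
    """Data nodes at which the pattern (all edges are descendant edges) can be rooted.

    Bottom-up: for each data node v, `below` holds the pattern nodes that embed at
    some proper descendant of v; q embeds at v when labels agree and every child
    of q is in `below`."""
    pattern_nodes, stack = [], [pattern_root]
    while stack:
        q = stack.pop()
        pattern_nodes.append(q)
        stack.extend(q.children)

    roots = []

    def visit(v):
        below = set()
        for child in v.children:
            below |= visit(child)
        here = {q for q in pattern_nodes
                if q.label == v.label and all(qc in below for qc in q.children)}
        if pattern_root in here:
            roots.append(v)
        return below | here   # pattern nodes embeddable at v or below it

    visit(data_root)
    return roots


data = Node("r", [Node("a", [Node("x", [Node("b")])]), Node("a", [Node("c")])])
pattern = Node("a", [Node("b")])  # a//b
print(len(homeomorphic_roots(data, pattern)))  # 1: only the first a has a b below it
```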
Lecture Notes in Computer Science, 2004
We propose an efficient approach for finding relevant XML data twigs defined by unordered query tree specifications. We use the tree signatures as the index structure and find qualifying patterns through integration of structurally consistent query path qualifications. An efficient algorithm is proposed and its implementation tested on real-life data collections.
IEEE Transactions on Knowledge and Data Engineering, 2008
An XML publish/subscribe system needs to filter a large number of queries over XML streams. Most existing systems consider filtering only simple XPath statements. In this paper, we focus on filtering the more complex Generalized-Tree-Pattern (GTP) queries. Our filtering mechanism is based on a novel Tree-of-Path (TOP) encoding scheme, which compactly represents the path matches for the entire document. First, we show that the TOP encodings can be efficiently produced via shared bottom-up path matching. Second, with the aid of this TOP encoding, we can 1) achieve polynomial time and space complexity for postprocessing, 2) avoid redundant predicate evaluations, 3) allow an efficient duplicate-free, merge-join-based algorithm for merging multiple encoded path matches, and 4) simplify the processing of GTP queries. Overall, our approach maximizes the sharing opportunity across queries by exploiting suffix as well as prefix sharing. At the same time, our TOP encodings allow efficient postprocessing for GTP queries. Extensive performance studies show that our system, GFilter, not only achieves significantly better filtering performance than state-of-the-art algorithms but is also capable of efficiently filtering the more complex GTP queries.
2003
With the growing importance of XML in data exchange, much research has been done in providing flexible query facilities to extract data from structured XML documents. In this paper, we propose ViST, a novel index structure for searching XML documents. By representing both XML documents and XML queries in structure-encoded sequences, we show that querying XML data is equivalent to finding subsequence matches. Unlike index methods that disassemble a query into multiple sub-queries and then join the results of these sub-queries to provide the final answers, ViST uses tree structures as the basic unit of query to avoid expensive join operations. Furthermore, ViST provides a unified index on both content and structure of the XML documents, hence it has a performance advantage over methods indexing either just content or structure. ViST supports dynamic index update, and it relies solely on B+-trees without using any specialized data structures that are not well supported by DBMSs. Our experiments show that ViST is effective, scalable, and efficient in supporting structural queries.
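Structure-encoded sequences can be sketched as follows. Each node becomes a (symbol, prefix) pair in preorder, where the prefix is the path of symbols from the root to the node's parent. A query then matches when its sequence is a (non-contiguous) subsequence of the document's sequence. This is only the encoding idea under simplifying assumptions. ViST itself indexes the pairs in B+-trees and supports wildcards, and the function names here are hypothetical.

```python
def structure_encode(tree, prefix=()):
    """Preorder list of (symbol, prefix) pairs; tree is (symbol, [children]) and
    prefix is the tuple of symbols on the path from the root to the node's parent."""
    symbol, children = tree
    seq = [(symbol, prefix)]
    for child in children:
        seq += structure_encode(child, prefix + (symbol,))
    return seq


def is_subsequence(query_seq, doc_seq):
    """Non-contiguous subsequence test (each `in` consumes the iterator)."""
    it = iter(doc_seq)
    return all(item in it for item in query_seq)


doc = ("P", [("S", [("N", []), ("L", [])]), ("B", [("L", [])])])
q1 = ("P", [("B", [("L", [])])])   # P/B/L
q2 = ("P", [("B", [("N", [])])])   # P/B/N
d = structure_encode(doc)
print(is_subsequence(structure_encode(q1), d), is_subsequence(structure_encode(q2), d))  # True False
```

A naive subsequence check can report false matches for branching queries. Suppose P has two S children, one with an N child and one with an L child. The query P/S[N][L] is then a subsequence of the document's encoding even though no single S has both children.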
Lecture Notes in Computer Science, 2006
Query processing performance in XML databases can be greatly enhanced by the usage of materialized views whose content has been stored in the database. This requires a method for identifying query subexpressions matching the views, a process known as view-based query rewriting. This process is quite complex for relational databases, and all the more daunting on XML databases. Current XML materialized view proposals are based on tree patterns, since query navigation is conceptually close to such patterns. However, the existing algorithms for extracting tree patterns from XQuery do not detect patterns across nested query blocks. Thus, complex, useful tree pattern views may be missed by the rewriting algorithm. We present a novel tree pattern extraction algorithm from XQuery queries, able to identify larger patterns than previous methods. Our algorithm has been implemented in an XML database prototype [5]. We study its performance, and the overall benefits of our tree pattern identification approach.
International Journal of Web Information Systems, 2008
Purpose: Efficient processing of XML queries is critical for XML data management and related applications. Previously proposed techniques are unsatisfactory. The purpose of this paper is to present Determined, a new prototype system designed for XML query processing and optimization from a system perspective. With Determined, a number of novel techniques for XML query processing are proposed and demonstrated. Design/methodology/approach: The methodology emphasizes query pattern minimization, logic-level optimization, and efficient query execution. Accordingly, three lines of investigation have been pursued in the context of Determined: XML tree pattern query (TPQ) minimization; logic-level XML query optimization utilizing deterministic transformation; and specialized algorithms for fast XML query execution. Findings: Developed and demonstrated were a runtime-optimal and powerful algorithm for XML TPQ minimization; a unique logic-level XML query optimization approach that solely pursues...
2013
Extensible Markup Language (XML) has gained importance in web and middleware development since the end of the last millennium. It is a write-it-yourself markup language used to describe data, and it allows more precise structuring of that data than is possible with more rigid markup languages. XML is a product of the World Wide Web Consortium (W3C). As XML is involved in information access, intermediate processing on XML should not become a bottleneck that degrades data-access performance. The data structure of choice for representing an XML document is a tree. Twig pattern matching on XML trees is a core operation for the optimal evaluation of XML queries, but the optimality of any pattern matching algorithm depends on the labeling scheme applied to the logical tree of the XML document on which the twig pattern is to be matched. Most existing labeling schemes compute the labels by traversing the logical tree of the XML document in some order, which is considered for prediction of rela...
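Dewey labeling is one well-known traversal-based scheme. It lets structural relationships be predicted from labels alone, and it is given here only as an illustration, not as the scheme this work proposes. Each node's label extends its parent's label by its sibling position. An ancestor's label is therefore a proper prefix of its descendant's label, and sorting labels lexicographically recovers document order. The function names are hypothetical.

```python
def dewey_labels(tree, label=(1,)):
    """Assign Dewey labels in preorder; tree is (tag, [children])."""
    tag, children = tree
    labels = {label: tag}
    for i, child in enumerate(children, start=1):
        labels.update(dewey_labels(child, label + (i,)))
    return labels


def is_ancestor(a, d):
    return len(a) < len(d) and d[:len(a)] == a


def is_parent(a, d):
    return len(d) == len(a) + 1 and d[:len(a)] == a


labels = dewey_labels(("book", [("title", []), ("chapter", [("section", [])])]))
print(sorted(labels))  # [(1,), (1, 1), (1, 2), (1, 2, 1)]
print(is_ancestor((1,), (1, 2, 1)), is_parent((1,), (1, 2, 1)))  # True False
```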
2006
The evaluation of XPath expressions can be handled as a tree embedding problem. In this paper, we propose two strategies for this problem: one based on ordered-tree embedding and the other on unordered-tree embedding. For ordered-tree embedding, our algorithm needs only O(|T|⋅|P|) time and O(|T|⋅|P|) space, where |T| and |P| stand for the numbers of nodes in the target tree T and the pattern tree P, respectively. For unordered-tree embedding, we give an algorithm that needs O(|T|⋅|P|⋅2^k) time, where k is the largest outdegree of any node in P.
2002
XML queries typically specify patterns of selection predicates on multiple elements that have some specified tree-structured relationships. The primitive tree-structured relationships are parent-child and ancestor-descendant, and finding all occurrences of these relationships in an XML database is a core operation for XML query processing.
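One common way to find these relationships is to label every element with a (start, end, level) region. Then a is an ancestor of d exactly when a.start < d.start and d.end < a.end. It is a parent when, in addition, a.level + 1 == d.level. The sketch below is a simplified stack-based ancestor-descendant join of the kind commonly used for this operation. The encoding, the names `Region`, `encode`, and `ancestor_descendant_join`, and the example tree are illustrative assumptions.

```python
from typing import NamedTuple


class Region(NamedTuple):
    tag: str
    start: int
    end: int
    level: int


def encode(tree):
    """Assign (start, end, level) regions in one DFS; tree is (tag, [children]).
    Returns regions sorted by start, i.e. in document order."""
    regions, counter = [], 0

    def walk(node, level):
        nonlocal counter
        tag, children = node
        counter += 1
        start, slot = counter, len(regions)
        regions.append(None)  # reserve the preorder slot, fill in once `end` is known
        for child in children:
            walk(child, level + 1)
        counter += 1
        regions[slot] = Region(tag, start, counter, level)

    walk(tree, 0)
    return regions


def ancestor_descendant_join(ancestors, descendants):
    """All (a, d) pairs with a an ancestor of d; both inputs sorted by start."""
    out, stack, i = [], [], 0
    for d in descendants:
        # push candidate ancestors that start before d, keeping the stack a nested chain
        while i < len(ancestors) and ancestors[i].start < d.start:
            a = ancestors[i]
            while stack and stack[-1].end < a.start:
                stack.pop()
            stack.append(a)
            i += 1
        # drop regions that closed before d began; the rest all contain d
        while stack and stack[-1].end < d.start:
            stack.pop()
        out.extend((a, d) for a in stack)
    return out


regions = encode(("a", [("b", [("a", [("b", [])])]), ("b", [])]))
A = [r for r in regions if r.tag == "a"]
B = [r for r in regions if r.tag == "b"]
pairs = ancestor_descendant_join(A, B)
print([(a.start, d.start) for a, d in pairs])  # [(1, 2), (1, 4), (3, 4), (1, 8)]
print([(a.start, d.start) for a, d in pairs if a.level + 1 == d.level])  # [(1, 2), (3, 4), (1, 8)]
```

Because regions in a tree are either nested or disjoint, the stack always holds a chain of nested regions. Every entry left on the stack when a descendant is reached therefore contains it, and the output is produced in descendant order without a nested-loop comparison.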
XML queries are based on path expressions, which are composed of elements connected to each other in a tree pattern structure called a Query Tree Pattern (QTP). Thus, the core operation of XML query processing is finding all instances of a QTP in the XML document. A number of methods have been offered for QTP matching, but they process too many elements of the XML document, most of which have no chance of participating in the final result. The existing techniques have many limitations and disadvantages, which are illustrated in detail in Chapter III. In this thesis, the author proposes a novel method that does not blindly process the elements of the document. The author abstracts the structural relationships inside XML documents to evaluate XML queries. In contrast to existing methods, the proposed method processes only elements that have a chance to produce a result and ignores those that are definitely not part of any final result. An XML query is either a cha...