2004, Università di Pisa eBooks
…
20 pages
1 file
A bottom-up subtree P of a labeled unordered tree T is such that, for each internal vertex u of P, all the children of u in T are also vertices of P, and the labels in corresponding positions also match. We aim to find all the occurrences of a pattern tree P of m vertices as a bottom-up subtree of a text tree T of n vertices, m ≤ n. If the labels are single characters of a constant or of an n-integer alphabet Σ, the problem is solved in O(m + log n) time and Θ(m) additional space, after a preprocessing of T is done in Θ(n) time and Θ(n) additional space. Note that the number of occurrences of P in T does not appear in the search time. For more complex labels the running times increase, becoming a function of the total length of all the labels in T and P if such labels are sequences of characters. Regarding T as a static text and P as the contents of a query on T, and assuming m = o(n), the response time for each P is sublinear in the size of the overall structure.
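To make the definition above concrete, here is a minimal brute-force sketch in Python. It is not the O(m + log n) method of the paper, only a direct check of the definition: a leaf of P may map to any vertex of T with the same label, while an internal vertex of P must be put in one-to-one correspondence with all the children of its image in T. The names `Node`, `matches`, and `occurrences` are hypothetical, and the backtracking search over child assignments can take exponential time in the worst case.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A vertex of a labeled unordered tree (hypothetical helper type)."""
    label: str
    children: list = field(default_factory=list)


def matches(p, t):
    """Return True if pattern vertex p can be mapped onto text vertex t."""
    if p.label != t.label:
        return False
    if not p.children:
        # A leaf of P places no constraint on the children of t.
        return True
    if len(p.children) != len(t.children):
        # An internal vertex of P must cover all the children of t.
        return False
    return _assign(p.children, list(t.children))


def _assign(ps, ts):
    """Backtracking search for a bijection between ps and ts."""
    if not ps:
        return True
    first, rest = ps[0], ps[1:]
    for i, t in enumerate(ts):
        if matches(first, t) and _assign(rest, ts[:i] + ts[i + 1:]):
            return True
    return False


def occurrences(pattern, text):
    """Collect every vertex of text at which pattern occurs."""
    found = []
    stack = [text]
    while stack:
        v = stack.pop()
        if matches(pattern, v):
            found.append(v)
        stack.extend(v.children)
    return found
```

Because children are unordered, a greedy left-to-right pairing of children is not enough: a leaf of P can match several children of T, so the sketch backtracks over the possible pairings.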
Information Processing Letters, 1992
In this paper we study subtree isomorphism and its relationships with some important symbolic problems. Recently, a linear time algorithm for ordered subtree isomorphism was given by Mäkinen. We note that a subtree of a rooted tree T, in Mäkinen's paper, refers to the tree rooted at any vertex in T. We consider ordered subtree isomorphism when a broader definition of subtree is used, viz., a subtree of T is any connected subgraph, including v, of the tree rooted at any vertex v in T. We give some evidence to show that a linear time algorithm for this problem is unlikely. We also give an almost linear time reduction from the important problem of pattern matching to ordered subtree isomorphism. Finally, we present simpler linear time algorithms for ordered and unordered subtree isomorphism (with the more restrictive definition of subtree).
2010 International Conference on Information Science and Applications, 2010
Since the extensible markup language XML emerged as a new standard for information representation and exchange on the Internet, the problem of storing, indexing, and querying XML documents has been among the major issues of database research. In this paper, we study tree pattern matching and discuss a new algorithm for processing ordered tree pattern queries, which considers not only ancestor/descendant relationships but also the left-to-right ordering of query nodes. This kind of tree matching has many practical applications, such as linguistic analysis, content-based video retrieval, computational biology, and data mining. The time complexity of the new algorithm is bounded by O(|D| ⋅ |Q| + |T| ⋅ leaf_Q) and its space overhead by O(leaf_T ⋅ leaf_Q), where T stands for a document tree, Q for a tree pattern, and D for the largest data stream among all the data streams associated with the nodes in Q. Each data stream contains the database nodes that match the predicate at a node q. leaf_T (resp. leaf_Q) denotes the number of leaf nodes of T (resp. Q). In addition, the algorithm can be adapted to an indexing environment in which XB-trees are used. Experiments have been conducted, which show that our algorithm is promising.
Information Processing Letters, 1989
Citeseer
This paper deals with succinct representations of data types motivated by applications in posting lists for search engines, in querying XML documents, and in the more general setting (which extends XML) of multi-labeled trees, where several labels can be assigned to each node of a tree. To find the set of references corresponding to a set of keywords, one typically intersects the list of references associated with each keyword. We view this instead as having a single list of objects [n] = {1, . . . , n} (the references), each of which has a subset of the labels [σ] = {1, . . . , σ} (the keywords) associated with it. We are able to find the objects associated with an arbitrary set of keywords in time O(δk lg lg σ) using a data structure requiring only t(lg σ + o(lg σ)) bits, where δ is the number of steps required by a non-deterministic algorithm to check the answer, k is the number of keywords in the query, σ is the size of the set from which the keywords are chosen, and t is the number of associations between references and keywords. The data structure is succinct in that it differs from the space needed to write down all t occurrences of keywords by only a lower order term. An XML document is, for our purpose, a labeled rooted tree. We deal primarily with "non-recursive labeled trees", where no label occurs more than once on any root to leaf path. We find the set of nodes whose path from the root includes a set of keywords in the same time, O(δk lg lg σ), on a representation of the tree using essentially minimum space, 2n + n(lg σ + o(lg σ)) bits, where n is the number of nodes in the tree. If we permit nodes to have multiple labels, this space bound becomes 2n + t(lg σ + o(lg σ)) bits, that is, the information theoretic lower bound for an ordinal tree (a node can have an arbitrary number of children ordered from left to right) plus that for the multiple labeling, where t is the total number of labels assigned.
In proving those results, we consider two data structures of independent interest: we give an encoding for σ by n boolean matrices, using optimal space and supporting in time O(lg lg σ) the operators access (the value at the intersection of a row and a column), rank (how many matches occur in this row to the left of this entry, or how many are in this column and above), and select (find the r-th match in this row, or in this column); and we give an encoding for labeled trees of n nodes and σ labels, using optimal space and supporting in time O(lg lg σ) the label-based operator labeltree_desc(a, x), which finds the first descendant of x labeled a.
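The three matrix operators named above can be illustrated with a plain, non-succinct sketch in Python. This is only meant to pin down what access, rank, and select return; it stores sorted position lists and answers queries by binary search, so it meets neither the optimal space bound nor the O(lg lg σ) time bound of the paper. The class name `BoolMatrix`, the method names, 0-based indices, and the reading of rank as "strictly to the left" (or "strictly above") are assumptions made for this example.

```python
from bisect import bisect_left


class BoolMatrix:
    """Naive boolean matrix with access/rank/select (hypothetical sketch)."""

    def __init__(self, n_rows, n_cols, ones):
        # rows[r] holds the sorted columns of the 1-entries in row r;
        # cols[c] holds the sorted rows of the 1-entries in column c.
        self.rows = [[] for _ in range(n_rows)]
        self.cols = [[] for _ in range(n_cols)]
        for r, c in sorted(set(ones)):
            self.rows[r].append(c)
            self.cols[c].append(r)

    def access(self, r, c):
        """Value at the intersection of row r and column c."""
        i = bisect_left(self.rows[r], c)
        return i < len(self.rows[r]) and self.rows[r][i] == c

    def row_rank(self, r, c):
        """Number of 1-entries in row r strictly to the left of column c."""
        return bisect_left(self.rows[r], c)

    def col_rank(self, r, c):
        """Number of 1-entries in column c strictly above row r."""
        return bisect_left(self.cols[c], r)

    def row_select(self, r, k):
        """Column of the k-th (1-based) 1-entry in row r, or None."""
        return self.rows[r][k - 1] if 1 <= k <= len(self.rows[r]) else None

    def col_select(self, c, k):
        """Row of the k-th (1-based) 1-entry in column c, or None."""
        return self.cols[c][k - 1] if 1 <= k <= len(self.cols[c]) else None
```

In the keyword setting of the abstract, one would read rows as labels and columns as objects (or the other way round), so that select enumerates the objects carrying a given label.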
Theoretical Computer Science, 2007
2006
The evaluation of XPath expressions can be handled as a tree embedding problem. In this paper, we propose two strategies for this problem: one based on ordered-tree embedding and the other on unordered-tree embedding. For the ordered-tree embedding, our algorithm needs only O(|T|⋅|P|) time and O(|T|⋅|P|) space, where |T| and |P| stand for the numbers of nodes in the target tree T and the pattern tree P, respectively. For the unordered-tree embedding, we give an algorithm that needs O(|T|⋅|P|⋅2) time, where k is the largest outdegree of any node in P.
Information Systems, 2009
Tree pattern matching is a fundamental problem that has a wide range of applications in Web data management, XML processing, and selective data dissemination. In this paper we develop efficient algorithms for the tree homeomorphism problem, i.e., the problem of matching a tree pattern with exclusively transitive (descendant) edges. We first prove that deciding whether there is a tree homeomorphism is LOGSPACE-complete, improving on the current LOGCFL upper bound. Furthermore, we develop a practical algorithm for the tree homeomorphism decision problem that is both space- and time-efficient. The algorithm is in LOGDCFL and space consumption is strongly bounded, while the running time is linear in the size of the data tree. This algorithm immediately generalizes to the problem of matching the tree pattern against all subtrees of the data tree, preserving the mentioned efficiency properties.
ABSTRACT. Tree pattern matching is an interesting special problem which occurs as a crucial step in a number of programming tasks, for instance, design of interpreters for nonprocedural programming languages, automatic implementations of abstract data types, code optimization in compilers, symbolic computation, context searching in structure editors, and automatic theorem proving. As with the sorting problem, the variations in requirements and resources for each application seem to preclude a uniform, universal solution to the tree-pattern-matching problem. Instead, a collection of well-analyzed techniques, from which specific applications may be selected and adapted, should be sought. Five new techniques for tree pattern matching are presented, analyzed for time and space complexity, and compared with previously known methods. Particularly important are applications where the same patterns are matched against many subjects and where a subject may be modified incrementally. Therefore, methods which spend some time preprocessing patterns in order to improve the actual matching time are included.
Proceedings of the twenty-ninth annual ACM symposium on Theory of computing - STOC '97, 1997
The main goal of this paper is to give an efficient algorithm for the Tree Pattern Matching problem. We also introduce and give an efficient algorithm for the Subset Matching problem. The Subset Matching problem is to find all occurrences of a pattern string p of length m in a text string t of length n, where each pattern and text location is a set of characters drawn from some alphabet. The pattern is said to occur at text position i if the set p[j] is a subset of the set t[i + j - 1], for all j, 1 ≤ j ≤ m. We give an O((s + n) log² m log(s + n)) randomized algorithm for this problem, where s denotes the sum of the sizes of all the sets. Then we reduce the Tree Pattern Matching problem to a number of instances of the Subset Matching problem. This reduction takes linear time and the sum of the sizes of the Subset Matching problems obtained is also linear. Coupled with our first result, this implies an O(n log² m log n) time randomized algorithm for the Tree Pattern Matching problem.
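The Subset Matching definition above can be checked directly with a short Python sketch. This is the naive quadratic test of the definition, not the randomized algorithm of the paper; the function name `subset_match` is hypothetical, and positions are 0-based here, so the condition becomes pattern[j] ⊆ text[i + j] for 0 ≤ j < m.

```python
def subset_match(text, pattern):
    """Return all 0-based positions i where pattern occurs in text.

    text and pattern are sequences of sets of characters. The pattern
    occurs at i if pattern[j] is a subset of text[i + j] for every j.
    Runs in O(n * m) set comparisons (naive sketch).
    """
    n, m = len(text), len(pattern)
    return [
        i
        for i in range(n - m + 1)
        if all(pattern[j] <= text[i + j] for j in range(m))
    ]


if __name__ == "__main__":
    text = [{"a", "b"}, {"b"}, {"a", "c"}, {"a", "b"}, {"b", "c"}]
    pattern = [{"a"}, {"b"}]
    print(subset_match(text, pattern))
```

Ordinary string matching is the special case in which every location holds a singleton set.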
In this paper, we consider two kinds of unordered tree matching for evaluating tree pattern queries in XML databases. For the first kind of unordered tree matching, we propose a new algorithm, which runs in O(|D||Q|) time, where Q is a tree pattern and D is a largest data stream associated with a node of Q. It can also be adapted to an indexing environment in which XB-trees are used to speed up disk access. Experiments have been conducted, showing that the new algorithm is promising. For the second kind of tree matching, the so-called strict unordered tree matching, we show that the problem is NP-complete by a reduction from the satisfiability problem.
Lecture Notes in Computer Science, 1991
Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, 2014
IEEE Transactions on Knowledge and Data Engineering, 2000
ACM SIGMOD Record, 2004
Scientific and Statistical …, 2002
Encyclopedia of Information Communication Technology
Proceedings of the 14th ACM SIGACT-SIGPLAN …, 1987
IEEE Transactions on Knowledge and Data Engineering, 1994
IEEE Transactions on Systems, Man, and Cybernetics, 1994
SIAM Journal on Computing, 2003
Knowledge and Information Systems, 2005
IEEE Transactions on Pattern Analysis and Machine Intelligence, 1998
The VLDB Journal The International Journal on Very Large Data Bases, 2002