Academia.edu no longer supports Internet Explorer.
To browse Academia.edu and the wider internet faster and more securely, please take a few seconds to upgrade your browser.
2009
In this paper we present Regular XPath (RXPath), which is a natural extension of XPath with regular expressions over paths that has the same computational properties as XPath: linear-time query evaluation and exponential-time reasoning. To establish these results, we devise a unifying automata-theoretic framework based on two-way weak alternating tree automata. Specifically, we consider automata that have infinite runs on finite trees.
Information and Computation, 2014
Automata on infinite trees Expressive Description Logics (DLs) have been advocated as formalisms for modeling the domain of interest in various application areas, including the Semantic Web, data and information integration, peer-to-peer data management, and ontology-based data access. An important requirement there is the ability to answer complex queries beyond instance retrieval, taking into account constraints expressed in a knowledge base. We consider this task for positive 2-way regular path queries (P2RPQs) over knowledge bases in the expressive DL ZIQ. P2RPQs are more general than conjunctive queries, union of conjunctive queries, and regular path queries from the literature. They allow regular expressions over roles and data joins that require inverse paths. The DL ZIQ extends the core DL ALC with qualified number restrictions, inverse roles, safe Boolean role expressions, regular expressions over roles, and concepts of the form ∃S.Self in the style of the DL SRIQ. Using techniques based on two-way tree-automata, we first provide as a stepping stone an elegant characterization of TBox and ABox satisfiability testing which gives us a tight ExpTime bound for this problem (under unary number encoding). We then establish a double exponential upper bound for answering P2RPQs over ZIQ knowledge bases; this bound is tight. Our result significantly pushes the frontier of 2ExpTime decidability of query answering in expressive DLs, both with respect to the query language and the considered DL. Furthermore, by reducing the well known DL SRIQ to ZIQ (with an exponential blow-up in the size of the knowledge base), we also provide a tight 2ExpTime upper bound for knowledge base satisfiability in SRIQ and establish the decidability of query answering for this significant fragment of the new OWL 2 standard.
2015
HAL is a multi-disciplinary open access archive for the deposit and dissemination of sci-entific research documents, whether they are pub-lished or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers. L’archive ouverte pluridisciplinaire HAL, est destinée au dépôt et a ̀ la diffusion de documents scientifiques de niveau recherche, publiés ou non, émanant des établissements d’enseignement et de recherche français ou étrangers, des laboratoires publics ou privés.
Journal of Artificial Intelligence Research, 2015
We investigate model theoretic properties of XPath with data (in)equality tests over the class of data trees, i.e., the class of trees where each node contains a label from a finite alphabet and a data value from an infinite domain. We provide notions of (bi)simulations for XPpath logics containing the child, parent, ancestor and descendant axes to navigate the tree. We show that these notions precisely characterize the equivalence relation associated with each logic. We study formula complexity measures consisting of the number of nested axes and nested subformulas in a formula; these notions are akin to the notion of quantifier rank in first-order logic. We show characterization results for fine grained notions of equivalence and (bi)simulation that take into account these complexity measures. We also prove that positive fragments of these logics correspond to the formulas preserved under (non-symmetric) simulations. We show that the logic including the child axis is equivalent to...
1998
Query languages for data with irregular structure use regular path expressions for navigation. This feature is useful for querying data where parts of the structure is either unknown, unavailable to the user, or changes frequently. Naive execution of regular path expressions is inefficient however, because it ignores any structure in the data. We describe two optimization techniques for queries with regular path expressions. Both rely on graph schemas for specifying partial knowledge about the data's structure. Query pruning uses this structure to restrict navigation to only a fragment of the data; we give an efficient algorithm for rewriting any regular path expression query into a pruned one. Query rewriting using state extents can eliminate or reduce navigation altogether; it is reminiscent of optimizing relational queries using indices. There may be several ways to optimize a query using state extents; we give a polynomial space algorithm that finds all such optimizations. For restricted forms of regular path expressions, the algorithm is provably efficient. We also give an efficient approximation algorithm that works on all regular path expressions
2003
Abstract We present SPEX, a streamed and progressive evaluation of regular path expressions with XPath-like qualifiers against XML streams. SPEX proceeds as follows. An expression is translated in linear time into a network of transducers, most of them having 1-DPDT equivalents. Every stream message is then processed once by the entire network and result fragments are output on the fly.
2004
XPath 2.0 path expressions can observe and preserve the document order and identity of XML values in a document. In particular, their semantics requires that the complete result and the result of each individual step in a path expression be in document order and duplicate-free. Implementations of this semantics often guarantee correctness by inserting explicit operations that sort and remove duplicates after each step. Such operations, however, can be redundant, because an intermediate result may already be ...
We study the complexity of two central XML processing problems. The first is XPath 1.0 query processing, which has been shown to be in PTime in previous work. We prove that both the data complexity and the query complexity of XPath 1.0 fall into lower (highly parallelizable) complexity classes, while the combined complexity is PTime-hard. Subsequently, we study the sources of this hardness and identify a large and practically important fragment of XPath 1.0 for which the combined complexity is LogCFL-complete and, therefore, in the highly parallelizable complexity class NC 2 . The second problem is the complexity of validating XML documents against various typing schemes like Document Type Definitions (DTDs), XML Schema Definitions (XSDs), and tree automata, both with respect to data and to combined complexity. For data complexity, we prove that validation is in LogSpace and depends crucially on how XML data is represented. For the combined complexity, we show that the complexity ranges from LogSpace to LogCFL, depending on the typing scheme.
Proceedings IEEE Advances in Digital Libraries 2000, 2000
We address the problem of tight XML schemas and propose regular tree automata to model XML data. We show that the tree automata model is more powerful that the XML DTDs and is closed under main algebraic operations. We introduce the XML query algebra based the tree automata model, and discuss the query optimization and query pruning techniques. Finally, we show the conversion of tree automata schema into XML DTDs.
2003
XML queries are usually expressed by means of XPath expressions identifying portions of the selected documents. An XPath expression defines a way of navigating an XML tree and returns the set of nodes which are reachable from one or more starting nodes through the paths specified by the expression. The problem of efficiently answering XPath queries is very interesting and has recently received increasing attention by the research community. In particular, an increasing effort has been devoted to define effective optimization techniques for XPath queries. One of the main issues related to the optimization of XPath queries is their minimization. The minimization of XPath queries has been studied for limited fragments of XPath, containing only the descendent, the child and the branch operators. In this work, we address the problem of minimizing XPath queries for a more general fragment, containing also the wildcard operator. We characterize the complexity of the minimization of XPath queries, stating that it is NP-hard, and propose an algorithm for computing minimum XPath queries. Moreover, we identify an interesting tractable case and propose an ad hoc algorithm handling the minimization of this kind of queries in polynomial time.
2004
XPath 2.0 path expressions can observe and preserve the document order and identity of XML values in a document. In particular, their semantics requires that the complete result and the result of each individual step in a path expression be in document order and duplicate-free. Implementations of this semantics often guarantee correctness by inserting explicit operations that sort and remove duplicates after each step. Such operations, however, can be redundant, because an intermediate result may already be ...
We present an algorithm to solve XPath decision problems under regular tree type constraints and show its use to statically typecheck XPath queries. To this end, we prove the decidability of a logic with converse for finite ordered trees whose time complexity is a simple exponential of the size of a formula. The logic corresponds to the alternation free modal µ-calculus without greatest fixpoint restricted to finite trees where formulas are cycle-free.
2014
We investigate model theoretic properties of XPath with data (in)equality tests over the class of data trees, i.e., the class of trees where each node contains a label from a nite alphabet and a data value from an innite domain. We provide notions of (bi)simulations for XPath logics containing the child, descendant, parent and ancestor axes to navigate the tree. We show that these notions precisely characterize the equivalence relation associated with each logic. We study formula complexity measures consisting of the number of nested axes and nested subformulas in a formula; these notions are akin to the notion of quantier rank in rst-order logic. We show characterization results for ne grained notions of equivalence and (bi)simulation that take into account these complexity measures. We also prove that positive fragments of these logics correspond to the formulas preserved under (non-symmetric) simulations. We show that the logic including the child axis is equivalent to the fragme...
2003
We propose a new, highly scalable and efficient technique for evaluating node-selecting queries on XML trees which is based on recent advances in the theory of tree automata. Our query processing techniques require only two linear passes over the XML data on disk, and their main memory requirements are in principle independent of the size of the data. The overall running time is O(m + n), where m only depends on the query and n is the size of the data.
Our experimental analysis of several popular XPath processors reveals a striking fact: Query evaluation in each of the systems requires time exponential in the size of queries in the worst case. We show that XPath can be processed much more efficiently, and propose main-memory algorithms for this problem with polynomial-time combined query evaluation complexity. Moreover, we present two fragments of XPath for which linear-time query processing algorithms exist.
Lecture Notes in Computer Science, 2005
lille3.fr/mostrare N-ary queries in trees select sets of n-tuples of nodes. We propose and investigate representation formalisms for n-ary queries by tree automata, both for ranked and unranked trees. We show that existential run-based queries capture MSO in the n-ary case, as well as universal run-based queries. We then characterize queries by runs of unambiguous tree automata, and show how to decide whether an MSO defined query belongs to this class.
2004
XPath was introduced as the standard language for addressing parts of XML documents, and has been widely adopted by practioners and theoreticaly studied. We aim at building a logical framework for formal study and analysis of XPath and have to face the combinatorial complexity of formal proofs caused by XPath expressive power. We chose the Coq proof assistant and its powerful inductive constructions to rigorously investigate XPath peculiarities. We focus in this paper on a basic modeling of XPath syntax and semantics, and make two contributions. First, we propose a new formal semantics, which is an interpretation of paths as first order logic propositions that turned out to greatly simplify our formal proofs. Second, we formally prove that this new interpretation is equivalent to previously known XPath denotational semantics , opening perspectives for more ambitious mathematical characterizations. We illustrate our Coq based model through several examples and we develop a formal proof of a simple yet significant XPath property that compare quite favorably to a former informal proof proposed in .
2010
We provide complete axiomatizations for several fragments of Core XPath, the navigational core of XPath 1.0 introduced by Gottlob, Koch and Pichler. A complete axiomatization for a given fragment is a set of equivalences from which every other valid equivalence is derivable; equivalences can be thought of as (undirected) rewrite rules. Specifically, we axiomatize single axis fragments of Core XPath as well as full Core XPath. Our completeness proofs use results and techniques from modal logic.
Proceedings of the 12th International Conference on Extending Database Technology Advances in Database Technology - EDBT '09, 2009
The problem of rewriting a query using a materialized view is studied for a well known fragment of XPath that includes the following three constructs: wildcards, descendant edges and branches. In earlier work, determining the existence of a rewriting was shown to be coNP-hard, but no tight complexity bound was given. While it was argued that Σ p 3 is an upper bound, the proof was based on results that have recently been refuted. Consequently, the exact complexity (and even decidability) of this basic problem has been unknown, and there have been no practical rewriting algorithms if the query and the view use all the three constructs mentioned above. It is shown that under fairly general conditions, there are only two candidates for rewriting and hence, the problem can be practically solved by two containment tests. In particular, under these conditions, determining the existence of a rewriting is coNP-complete. The proofs utilize various novel techniques for reasoning about XPath patterns. For the general case, the exact complexity remains unknown, but it is shown that the problem is decidable.
Lecture Notes in Computer Science, 2004
The problem of selecting nodes in unranked trees is the most basic querying problem for XML. We propose stepwise tree automata for querying unranked trees. Stepwise tree automata can express the same monadic queries as monadic Datalog and monadic second-order logic. We prove this result by reduction to the ranked case, via a new systematic correspondence that relates unranked and
2005
Automata for unranked trees form a foundation for XML schemas, querying and pattern languages. We study the problem of efficiently minimizing such automata. We start with the unranked tree automata (UTAs) that are standard in database theory, assuming bottomup determinism and that horizontal recursion is represented by deterministic finite automata. We show that minimal UTAs in that class are not unique and that minimization is np-hard. We then study more recent automata classes that do allow for polynomial time minimization. Among those, we show that bottom-up deterministic stepwise tree automata yield the most succinct representations.