2001
Queries navigate semistructured data via path expressions, and can be accelerated using an index. Our solution encodes paths as strings, and inserts those strings into a special index that is highly optimized for long and complex keys. We describe the Index Fabric, an indexing structure that provides the efficiency and flexibility we need. We discuss how "raw paths" are used to optimize ad hoc queries over semistructured data, and how "refined paths" optimize specific access paths. Although we can use knowledge about the queries and structure of the data to create refined paths, no such knowledge is needed for raw paths. A performance study shows that our techniques, when implemented on top of a commercial relational database system, outperform the more traditional approach of using the commercial system's indexing mechanisms to query the XML.
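As a rough illustration of encoding paths as strings, the sketch below flattens each root-to-leaf path of a small XML document into one `path|value` key and answers lookups by prefix search over the sorted keys. The sorted list stands in for the specialized index; it is not the Index Fabric structure itself, and the function names and the `|` separator are assumptions made for this example.

```python
import bisect
import xml.etree.ElementTree as ET


def encode_paths(xml_text):
    """Return sorted 'path|value' keys, one per leaf element that has text."""
    root = ET.fromstring(xml_text)
    keys = []

    def walk(node, prefix):
        path = prefix + "/" + node.tag
        children = list(node)
        if not children and node.text and node.text.strip():
            keys.append(path + "|" + node.text.strip())
        for child in children:
            walk(child, path)

    walk(root, "")
    keys.sort()
    return keys


def lookup(keys, path, value_prefix=""):
    """Return every key that starts with path + '|' + value_prefix."""
    needle = path + "|" + value_prefix
    start = bisect.bisect_left(keys, needle)
    matches = []
    for key in keys[start:]:
        if not key.startswith(needle):
            break  # keys sharing a prefix are contiguous in sorted order
        matches.append(key)
    return matches


doc = (
    "<invoice>"
    "<buyer><name>ABC Corp</name></buyer>"
    "<seller><name>Acme Inc</name></seller>"
    "</invoice>"
)
keys = encode_paths(doc)
buyer_hits = lookup(keys, "/invoice/buyer/name")
seller_hits = lookup(keys, "/invoice/seller/name", "Ac")
```

Because no schema knowledge is used to build the keys, this corresponds loosely to the "raw path" case; a "refined path" would need a key designed for one specific access pattern.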
XML, a semistructured format, is the most promising and dominant data format for processing and representing data on the Internet. XML data has no fixed schema; it evolves and is self-describing, which makes it harder to manage than, for example, relational data. XML queries differ from relational queries in that the former are expressed as path expressions. The efficient handling of structural relationships has become a key factor in XML query processing. It is therefore a major challenge for the database community to design query processing techniques and storage methods that can manage semistructured data efficiently. The main contribution of this paper is a method for querying semistructured data that uses a bitmap to represent the path-value relationship and compresses the bitmap to save space. The presented bitmap indexing and querying scheme, termed BIQS, stores the element path, word token, attribute and document number in a dynamically created matrix structure. We use word, attribute and path dictionaries to construct the bitmap structure. This paper describes an algorithm that queries semistructured data more time-efficiently than other relational and semistructured query processing techniques. The BIQS structure improves storage and query performance through compression of the semistructured data.
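A minimal sketch of a path-value bitmap, assuming one bitmap per (path, word) pair with one bit per document number. This is not the BIQS structure itself: the dictionaries and the compression step described in the abstract are omitted, and `build_index` and `query` are hypothetical names. Python integers serve as arbitrary-length bitmaps.

```python
from collections import defaultdict


def build_index(docs):
    """docs[i] is a list of (path, word) pairs found in document number i."""
    index = defaultdict(int)
    for doc_no, pairs in enumerate(docs):
        for path, word in pairs:
            index[(path, word)] |= 1 << doc_no  # set the bit for this document
    return index


def query(index, conditions, n_docs):
    """Return document numbers that satisfy every (path, word) condition."""
    result = (1 << n_docs) - 1  # start with all documents selected
    for condition in conditions:
        result &= index.get(condition, 0)  # bitwise AND narrows the candidates
    return [d for d in range(n_docs) if (result >> d) & 1]


docs = [
    [("/book/title", "xml"), ("/book/author", "smith")],
    [("/book/title", "xml"), ("/book/author", "jones")],
    [("/book/title", "sql"), ("/book/author", "smith")],
]
index = build_index(docs)
xml_titles = query(index, [("/book/title", "xml")], len(docs))
xml_by_smith = query(
    index, [("/book/title", "xml"), ("/book/author", "smith")], len(docs)
)
```

Keying each bitmap on the (path, word) pair, rather than keeping separate path and word bitmaps, avoids matching a document in which the word occurs only under a different path.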
International Journal of Metadata, Semantics and Ontologies, 2018
In today's digitally connected world, diverse applications use data in various formats. The flexible nature of XML has motivated applications in fields ranging from technical to financial to move towards XML representation. This shift has increased the number of XML documents on the web exponentially. Such unprecedented growth warrants research into efficient methodologies for accelerating query processing of XML documents. This paper proposes a new indexing structure which combines terminal sibling nodes at the same level into a single path, thereby reducing the search space while querying. The main advantage of this index is that it can process branch (twig) queries efficiently, with fewer lookups and decompositions than existing approaches. The results also show that such queries are processed with equal or better performance compared to the existing approaches.
2014
Optimizing XML queries is an intensively studied problem in the field of databases of late. The topic has a host of applications, viz., web-scale XML and keyword search. In this paper, we address the problem of efficient execution of XML path queries (commonly known as XPath queries), branch queries and wild-card queries. Our index structure assists in fast identification of child-parent as well as ancestor-descendant relationship, thus increasing the efficiency of XPath query execution. Both XML data and queries possess an inherent tree structure and, thus, fast child-parent lookup is a necessity to improve performance. We propose a holistic hybrid index structure that combines the Extended Dewey labeling scheme with the CTree index structure to leverage advantages of both the mechanisms. Our index structure is capable of catering to all the queries (single path, branch and wild-card queries), with equal or better performance metrics when compared to the state-of-the-art.
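To make the child-parent and ancestor-descendant lookups concrete, here is a small sketch using plain Dewey labels, in which each node's label is its parent's label plus its sibling position. This is the basic Dewey scheme only, not the Extended Dewey variant or the CTree hybrid the abstract proposes; the tree representation and function names are assumptions for illustration.

```python
def assign_labels(tree, label=(1,)):
    """Yield (label, tag) in document order for a (tag, [children]) tree."""
    tag, children = tree
    yield label, tag
    for position, child in enumerate(children, start=1):
        yield from assign_labels(child, label + (position,))


def is_ancestor(a, d):
    """True if label a is a proper prefix of label d."""
    return len(a) < len(d) and d[:len(a)] == a


def is_parent(p, c):
    """True if label c extends label p by exactly one component."""
    return len(c) == len(p) + 1 and c[:-1] == p


tree = ("bib", [("book", [("title", []), ("author", [])]), ("article", [])])
labels = list(assign_labels(tree))
```

Both relationship tests compare labels only, so no tree traversal is needed once the labels have been assigned.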
2006
The mark-up language XML (Extensible Mark-up Language) has recently come to be understood as a new approach to data modeling. A well-formed XML document or a set of documents is an XML database, and the associated DTD or schema specified in the language XML Schema is its database schema. Implementing a system that stores and queries XML documents efficiently (a so-called native XML database) requires the development of new techniques that make it possible to index an XML document in a way that provides efficient evaluation of a user query. Most XML query languages are based on the language XPath and use a form of path expressions for composing more general queries. In the paper we compare element-based and path-based approaches to indexing XML data. In element-based approaches, a query is evaluated step by step. Each step produces many elements that may be rejected in the next evaluation step. In the paper we show that the previously published multi-dimensional path-based ...
2012
Database indices are fundamental data structures that improve the speed of data retrieval operations. In this paper, we focus on native XML database systems and provide an elementary survey of existing approaches for indexing semistructured data employed in selected academic open-source systems. Considering the requirements set for a particular system, ExDB, and the results of the accomplished research, we provide a design proposal of the indexing facility and discuss the properties of the solution we plan to subsequently realize.
1998
Query languages for object bases have been enriched by generalized path expressions that allow for attribute and path variables. Optimizing queries containing generalized path expressions has attracted some interest. However, many interesting queries still require a full scan over the whole object base. This unbearable situation can best be remedied by utilizing index structures. However, traditional database indexes fail to support generalized path expressions.
ACM Transactions on Database …, 2004
IEEE Transactions on Knowledge and Data Engineering, 1998
We present a new access method, called the path dictionary index (PDI) method, for supporting nested queries on object-oriented databases. PDI supports object traversal and associative search, respectively, with a path dictionary and a set of attribute indexes built on top of the path dictionary. We discuss issues on indexing and query processing in object-oriented databases; describe the operations of the new mechanism; develop cost models for its storage overhead and query and update costs; and compare the new mechanism to the path index method. The result shows that the path dictionary index method is significantly better than the path index method over a wide range of parameters in terms of retrieval and update costs and that the storage overhead grows slowly with the number of indexed attributes.
2009
We introduce a new technique for fast computation of structural join "pattern trees" in XML. Using a small amount of pre-computed path information, typically small enough to fit easily in main memory, we are able to render structural join computation almost independent of data set size. Our technique is amenable to bit-mapped processing, leading to further speed-up. In this paper, we present our technique and experimentally evaluate its performance.
Lecture Notes in Computer Science, 2004
While there are many proposals for path indexes on XML documents, none of them is perfectly suited for indexing large-scale collections of interlinked XML documents. Existing strategies lack support for intra- or inter-document links, require large amounts of time to build or space to store the index, or cannot efficiently answer connection queries. This paper presents the FliX framework for connection indexing that supports large, heterogeneous document collections with many links, using the existing path indexes as building blocks. We introduce some example configurations of the framework that are appropriate for many important application scenarios. Experiments show the feasibility of our approach.
The most promising and dominant data format for data processing and representation on the Internet is the semistructured data form termed XML. XML data has no fixed schema; it evolved and is self describing which results in management difficulties compared to, for example, relational data. It is therefore a major challenge for the database community to design query processing techniques and storage methods that can retrieve semistructured data efficiently. In this paper, we present a querying scheme for semistructured data views of relational form. The proposed technique stores element-paths, attributes, contents of the element paths and attributes, and XML processing instructions in a dynamic relational structure termed as Multi-XML-Data-Structure (MXDS).
Lecture Notes in Computer Science, 1999
XML queries are based on path expressions, which are composed of elements connected to each other in a tree pattern structure called a Query Tree Pattern (QTP). Thus, the core operation of XML query processing is finding all instances of a QTP in the XML document. A number of methods have been offered for QTP matching, but they process too many elements of the XML document, most of which have no opportunity to participate in the final result. The existing techniques have many limitations and disadvantages, which are illustrated in detail in Chapter III. In this thesis, the author proposes a novel method which does not blindly process elements of the document. The author abstracts structural relationships inside the XML documents to evaluate the XML queries. In contrast to the existing methods, in the proposed method only elements which have a chance to produce a result are processed, and those which are definitely not part of any final result are ignored. An XML query is either a cha...
Advances in Database Technology - EDBT 2004, 2004
In this paper we present HOPI, a new connection index for XML documents based on the concept of the 2-hop cover of a directed graph introduced by Cohen et al. In contrast to most of the prior work on XML indexing we consider not only paths with child or parent relationships between the nodes, but also provide space- and time-efficient reachability tests along the ancestor, descendant, and link axes to support path expressions with wildcards in our XXL search engine. We improve the theoretical concept of a 2-hop cover by developing scalable methods for index creation on very large XML data collections with long paths and extensive cross-linkage. Our experiments show substantial savings in the query performance of the HOPI index over previously proposed index structures in combination with low space requirements.
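The idea behind a 2-hop cover can be sketched as follows: each node u keeps a set of "out" centers it can reach and each node v a set of "in" centers that can reach it, and u reaches v when the two sets share a center. The construction below is a simple pruned breadth-first labeling chosen for brevity; it is an assumption of this example and not HOPI's index-creation method or the cover algorithm of Cohen et al. All names are hypothetical.

```python
from collections import deque


def bfs(adj, start):
    """Return the set of nodes reachable from start (including start)."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adj[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def reaches(l_out, l_in, u, v):
    """2-hop test: u reaches v if their labels share a center.

    Each node is treated as an implicit center of its own labels.
    """
    return u == v or bool((l_out[u] | {u}) & (l_in[v] | {v}))


def build_labels(nodes, edges):
    fwd = {n: [] for n in nodes}
    bwd = {n: [] for n in nodes}
    for a, b in edges:
        fwd[a].append(b)
        bwd[b].append(a)
    l_out = {n: set() for n in nodes}
    l_in = {n: set() for n in nodes}
    # Try well-connected nodes first so they cover many pairs as centers.
    order = sorted(nodes, key=lambda n: -(len(fwd[n]) + len(bwd[n])))
    for center in order:
        for v in bfs(fwd, center):
            if not reaches(l_out, l_in, center, v):
                l_in[v].add(center)  # center reaches v
        for u in bfs(bwd, center):
            if not reaches(l_out, l_in, u, center):
                l_out[u].add(center)  # u reaches center
    return l_out, l_in


nodes = ["a", "b", "c", "d", "e"]
edges = [("a", "c"), ("b", "c"), ("c", "d"), ("c", "e")]
l_out, l_in = build_labels(nodes, edges)
```

In this small graph every connection passes through `c`, so the labels stay much smaller than the full transitive closure would be.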
Proceedings 2004 VLDB Conference, 2004
As XML usage grows for both data-centric and document-centric applications, introducing native support for XML data in relational databases brings significant benefits. It provides a more mature platform for the XML data model and serves as the basis for interoperability between relational and XML data. Whereas query processing on XML data shredded into one or more relational tables is well understood, it provides limited support for the XML data model. XML data can be persisted as a byte sequence (BLOB) in columns of tables to support the XML model more faithfully. This introduces new challenges for query processing such as the ability to index the XML blob for good query performance. This paper reports novel techniques for indexing XML data in the upcoming version of Microsoft® SQL Server™, and how it ties into the relational framework for query processing.
2003
With the growing importance of XML in data exchange, much research has been done in providing flexible query facilities to extract data from structured XML documents. In this paper, we propose ViST, a novel index structure for searching XML documents. By representing both XML documents and XML queries in structure-encoded sequences, we show that querying XML data is equivalent to finding subsequence matches. Unlike index methods that disassemble a query into multiple sub-queries, and then join the results of these sub-queries to provide the final answers, ViST uses tree structures as the basic unit of query to avoid expensive join operations. Furthermore, ViST provides a unified index on both content and structure of the XML documents, hence it has a performance advantage over methods indexing either just content or structure. ViST supports dynamic index update, and it relies solely on B+ trees without using any specialized data structures that are not well supported by DBMSs. Our experiments show that ViST is effective, scalable, and efficient in supporting structural queries.
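A toy sketch of the sequence idea, assuming a structure-encoded sequence is the preorder list of (symbol, prefix) pairs, where the prefix is the path of tags from the root to the node's parent. A query tree is encoded the same way and tested as a non-contiguous subsequence of the document sequence. Values, wildcards and the B+ tree index are left out, and naive subsequence matching of this kind may report false positives for some branching queries, so this is illustrative only. The function names are hypothetical.

```python
def encode(tree, prefix=()):
    """Encode a (tag, [children]) tree as a preorder list of (tag, prefix)."""
    tag, children = tree
    sequence = [(tag, prefix)]
    for child in children:
        sequence.extend(encode(child, prefix + (tag,)))
    return sequence


def is_subsequence(query_seq, doc_seq):
    """True if query_seq occurs in doc_seq in order, not necessarily adjacent."""
    remaining = iter(doc_seq)
    # 'item in remaining' consumes the iterator up to the first match.
    return all(item in remaining for item in query_seq)


doc = ("P", [("S", [("N", []), ("L", [])]), ("B", [("L", [])])])
doc_seq = encode(doc)

query_hit = encode(("P", [("S", [("L", [])])]))
query_miss = encode(("P", [("B", [("N", [])])]))
```

The whole query is matched as one unit, which mirrors the abstract's point about avoiding a split into sub-queries followed by joins.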
Journal of Computing and Information Technology
Querying nested data has become one of the most challenging issues in retrieving desired information from the Web. Today diverse applications generate a tremendous amount of data in different formats. The data and information exchanged on the Web are commonly expressed in nested representations such as XML and JSON. Unlike data in a traditional database system, they do not possess a rigid schema. In general, nested data is managed by storing the data and its structure separately, which significantly reduces the performance of data retrieval. Efficiently processing queries that locate the exact positions of elements has become a major challenge. Various indexing structures have been proposed in the literature to improve the performance of query processing on nested structures. Most past research on nested structures concentrates on the structure alone. This paper proposes a new index structure which combines siblings of the terminal nodes into one path and thereby processes twig queries efficiently, with fewer lookups and joins. The proposed approach is compared with some of the existing approaches. The results also show that such queries are processed with better performance compared to the existing approaches. ACM CCS (2012) Classification: → Information systems → Data management systems → Query language Information systems → World Wide Web → Web searching and information discovery → Web search engines → Web indexing
The explosive growth of XML has led to an increasing need for scalable XML retrieval systems. Our XML retrieval system, the SQLGenerator, stores XML of any schema in a fixed-schema relational database and supports a full-featured semistructured query language, XML-QL, through optimized translation of its semantics to relational SQL queries. This paper examines the scalability of this approach with respect to increasing data size. We index four XML collections ranging in size from 500MB to 2GB that were generated using a standard XML generator, XBench. We then compare the execution times of 11 standard XBench queries, covering a wide range of semistructured query features, whose semantics were directly translatable from their original XQuery language to XML-QL. Although it is difficult to estimate the theoretical baseline for scalability of these query features in an RDBMS, many of the queries' runtimes grow linearly with respect to the size of the document collection.
2004
Path expressions are ubiquitous in XML processing languages. Existing approaches evaluate a path expression by selecting nodes that satisfy the tag-name and value constraints and then joining them according to the structural constraints. We propose a novel approach, next-of-kin (NoK) pattern matching, to speed up the node-selection step, and to reduce the join size significantly in the second step.
Indexing XML data to facilitate query processing has been a popular subject of study in recent years. Most previous studies can be classified into three categories: path indexing, node indexing and sequence-based indexing. Many of them cannot answer both single-path and branching queries with various value predicates very efficiently. In this paper, we propose a novel compact tree (Ctree) structure, which provides not only concise path summaries but also detailed element relationships, and a configurable index scheme based on data statistics. We develop an efficient Ctree-based method for processing a tree structure query with various value constraints. Efficiency of our method is achieved by: (1) summarizing a large database into a condensed structure view to prune irrelevant search space; (2) evaluating a tree structure query directly without expensive join operations; (3) using Ctree properties such as trivial groups and bi-direction to reduce query processing time; (4) using ...