2011
Abstract Distributing data collections by fragmenting them is an effective way of improving the scalability of a database system. While the distribution of relational data is well understood, the unique characteristics of the XML data and query model present challenges that require different distribution techniques. In this paper, we show how XML data can be fragmented horizontally and vertically.
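The two fragmentation styles named in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: the sample catalog, the price predicates, and the function names `horizontal_fragments` and `vertical_fragments` are all assumptions made for illustration. Horizontal fragmentation is shown as routing whole subtrees by a predicate; vertical fragmentation is shown as splitting each subtree by element tag while keeping a position key so the pieces can be rejoined.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, not taken from the paper.
DOC = """<catalog>
  <book><title>A</title><price>10</price><review>good</review></book>
  <book><title>B</title><price>50</price><review>dense</review></book>
  <book><title>C</title><price>70</price><review>long</review></book>
</catalog>"""

def horizontal_fragments(root, predicates):
    """Assign each child subtree to the first fragment whose predicate it satisfies."""
    fragments = {name: [] for name in predicates}
    for child in root:
        for name, pred in predicates.items():
            if pred(child):
                fragments[name].append(child)
                break
    return fragments

def vertical_fragments(root, tag_groups):
    """Split every child subtree by tag; (position, values) pairs allow rejoining."""
    fragments = {name: [] for name in tag_groups}
    for pos, child in enumerate(root):
        for name, tags in tag_groups.items():
            kept = {sub.tag: sub.text for sub in child if sub.tag in tags}
            fragments[name].append((pos, kept))
    return fragments

root = ET.fromstring(DOC)
h = horizontal_fragments(root, {
    "cheap": lambda b: int(b.findtext("price")) < 40,
    "expensive": lambda b: int(b.findtext("price")) >= 40,
})
v = vertical_fragments(root, {"core": {"title", "price"}, "reviews": {"review"}})
```

In this sketch a query that only touches titles and prices would need the `core` fragment alone, while a query restricted to inexpensive books would need only the `cheap` fragment.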
2010
Abstract Distributing data collections by fragmenting them is an effective way of improving the scalability of a database system. While the distribution of relational data is well understood, the unique characteristics of XML data and its query model present challenges that require different distribution techniques. In this paper, we show how XML data can be fragmented horizontally and vertically.
2009
ABSTRACT Distributing data collections by fragmenting them is an effective way of improving the scalability of relational database systems. The unique characteristics of XML data present challenges that require different distribution techniques to achieve scalability. In this paper, we propose solutions to two of the problems encountered in distributed query processing and optimization on XML data, namely localization and pruning.
2010
The increasing volume of data stored as XML documents makes fragmentation techniques an attractive response to performance issues in query processing. Fragmented databases are feasible only if there is a transparent way to query the distributed database. Fragments allow for intra-query parallel processing and data reduction. This paper presents our methodology for XQuery query processing over distributed XML databases. The methodology comprises the steps of query decomposition, data localization, and global optimization. It can be used in an XML database or in a system that publishes homogeneous views of semi-autonomous databases. The methodology has been implemented, and experimental results show performance improvements of up to 95% compared with a centralized environment.
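A rough sketch of the decomposition and localization steps described above, under strong simplifying assumptions: the fragments are plain Python lists rather than XML stores, each fragment is defined by a half-open price range, and the query is a range selection rather than XQuery. The names `FRAGMENTS`, `localize`, `run_subquery`, and `distributed_query` are hypothetical and do not come from the paper.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical horizontal fragments: each site holds (title, price) items
# whose price lies in the half-open range [lo, hi).
FRAGMENTS = {
    "site1": {"range": (0, 40), "items": [("A", 10), ("D", 35)]},
    "site2": {"range": (40, 100), "items": [("B", 50), ("C", 70)]},
}

def localize(query_range):
    """Data localization: keep only fragments whose range overlaps the query range."""
    q_lo, q_hi = query_range
    return [name for name, frag in FRAGMENTS.items()
            if frag["range"][0] < q_hi and q_lo < frag["range"][1]]

def run_subquery(site, query_range):
    """Sub-query evaluated locally on one fragment."""
    q_lo, q_hi = query_range
    return [title for title, price in FRAGMENTS[site]["items"] if q_lo <= price < q_hi]

def distributed_query(query_range):
    """Decompose into per-site sub-queries, run them in parallel, and union the results."""
    sites = localize(query_range)
    with ThreadPoolExecutor() as pool:
        parts = pool.map(lambda site: run_subquery(site, query_range), sites)
        return sorted(title for part in parts for title in part)
```

Skipping fragments that cannot contribute is the data-reduction effect, and evaluating the remaining sub-queries concurrently is the intra-query parallelism mentioned in the abstract.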
2010
Abstract Experience with relational systems has shown that distribution is an effective way of improving the scalability of query evaluation. In this paper, we show how distributed query evaluation can be performed in a vertically partitioned XML database system. We propose a novel technique for constructing distributed execution plans that is independent of local query evaluation strategies. We then present a number of optimizations that allow us to further improve the performance of distributed query execution.
2005
The data volume of XML repositories and the response time of query processing have become critical issues for many applications, especially those on the Web. An interesting way to improve query processing performance is to reduce the size of XML databases through fragmentation techniques. However, traditional fragmentation definitions do not directly apply to collections of XML documents. This work formalizes the fragmentation definition for collections of XML documents, and proposes an architecture for XQuery processing on top of fragmented XML data. This architecture was implemented in a system prototype named PartiX, which exploits intra-query parallelism on top of XQuery-enabled sequential DBMS modules. We have analyzed several experimental settings, and our results showed a performance improvement with a scale-up factor of up to 72 against centralized databases.
XML is a standard for data exchange between web applications such as e-commerce, e-learning and other web portals. The volume of XML data on the web has grown substantially, and in order to retrieve or store these data effectively, it is recommended that they be physically or virtually fragmented and distributed across different nodes. Fragmentation design consists of two parts: the fragmentation operation and the fragmentation method. There are three kinds of fragmentation operation (horizontal, vertical and hybrid), which determine how the XML should be fragmented. The aim of this paper is to give an overview of fragmentation design considerations.
Lecture Notes in Computer Science
Fragmentation techniques for XML data are gaining momentum within both distributed and centralized XML query engines and pose novel and unrecognized challenges to the community. Albeit not novel, and clearly inspired by the classical divide et impera principle, fragmentation of XML trees has proved successful in boosting query performance and in cutting down memory requirements. However, the fragmentation considered so far has been driven by semantics, i.e. built around query predicates. In this paper, we propose a novel fragmentation technique that builds on structural constraints of XML documents (size, tree-width, and tree-depth) and on special-purpose structure histograms able to meaningfully summarize XML documents. This allows us to predict bounding intervals of structural properties of output (XML) fragments for efficient query processing of distributed XML data. An experimental evaluation of our study confirms the effectiveness of our fragmentation methodology on some representative XML data sets.
The Journal of Supercomputing, 2007
XML is a flexible and powerful tool that enables information and security sharing in heterogeneous environments. Scalable technologies are needed to effectively manage the growing volumes of XML data. A wide variety of methods exist for storing and searching XML data; the two most common techniques are conventional tree-based and relational approaches. Tree-based approaches represent XML as a tree and use indexes and path join algorithms to process queries. In contrast, the relational approach utilizes the power of a mature relational database to store and search XML. This method relationally maps XML queries to SQL and reconstructs the XML from the database results. To date, the limited acceptance of the relational approach to XML processing is due to the need to redesign the relational schema each time a new XML hierarchy is defined. We, in contrast, describe a relational approach that uses a fixed schema, eliminating the need for schema redesign at the expense of potentially longer runtimes. We show, however, that these potentially longer runtimes are still significantly shorter than those of the tree approach. We use a popular XML benchmark to compare the scalability of both approaches. We generated large collections of heterogeneous XML documents ranging in size from 500MB to 8GB using the XBench benchmark. The scalability of each method was measured by running XML queries that cover a wide range of XML search features on each collection. We measure the scalability of each method over different query features as the collection size increases. In addition, we examine the performance of each method as the result size and the number of predicates increase. Our results show that our relational approach provides a scalable approach to XML retrieval by leveraging existing relational database optimizations. Furthermore, we show that the relational approach typically outperforms the tree-based approach while scaling consistently over all collections studied.
2008
In this paper, we propose a new structure-based approach, called Xregion, to storing XML data in relational databases. Our approach first partitions an XML document into several disjoint regions according to the cardinality of element nodes, and then maps these regions into separate relations. The experimental results demonstrate that the proposed approach dramatically improves the performance of queries on the XML data over the generic mapping approaches.
To achieve good performance when processing queries on huge XML data sets on cluster machines, the data partitioning and placement strategy is one of the key factors. In this paper we propose a multidimensional data structure for maintaining XML data partitions, specifically for holistic twig join processing. Initially, we construct the multidimensional data structure from statistical information on XML documents and query executions recorded in the past, such that dimensions of the data structure such as XML tag, query, and XML document can be composed. We also outline a series of multidimensional analysis operations for generating and maintaining XML data partitions in three basic steps: document clustering, query clustering and partition refinement. Each step yields partitions with their associated costs, which are computed by a cost function. As part of partition refinement, partitions with considerably high costs that reside on overloaded machines are analyzed to select an appropriate multidimensional analysis operation for refinement. Finally, we show the effectiveness of our proposed method by measuring the cost distribution across cluster machines.
2007
In this paper, we propose a new structure-based approach, called Xregion, to store XML data in relational databases. Our approach first partitions an XML document into several disjoint regions according to the cardinality of element nodes, and then maps these regions into separate relations. Our experimental results demonstrate that the proposed approach dramatically improves the performance of queries on the XML data over the existing approaches.
Data & Knowledge Engineering, 2005
Some XML query processors operate on an internal representation of XML documents and can leverage neither the XML storage structure nor the possible access methods dedicated to this storage structure. Such query processors are often used in organizations that usually process transient XML documents received from other organizations. In this paper, we propose a different approach to accelerating query execution on XML source documents in such environments. The approach is based on the notion of query equivalence of XML documents with respect to a query. Under this equivalence, we propose two different document transformation strategies which prune parts of the documents irrelevant to the query, just before executing the query itself. The proposed transformations are implemented and evaluated using a two-level index structure: a structural directory capturing document paths and an inverted index of tag offsets.
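The pruning idea can be illustrated with a small sketch that works directly on a parsed tree. This is an assumption-laden simplification: the paper's transformations use a structural directory and an inverted index of tag offsets, whereas the hypothetical `prune` function below simply walks the tree and supports only a path of child steps. It keeps the subtrees a query can reach and drops everything else, so the query returns the same answer on the pruned document.

```python
import copy
import xml.etree.ElementTree as ET

def prune(elem, steps):
    """Copy elem, keeping only descendants reachable along the child-step path `steps`."""
    if not steps:
        # elem itself is a query result: keep its whole subtree.
        return copy.deepcopy(elem)
    kept = ET.Element(elem.tag, dict(elem.attrib))
    kept.text = elem.text
    for child in elem:
        if child.tag == steps[0]:
            kept.append(prune(child, steps[1:]))
    return kept

# Hypothetical transient document and query path book/title.
source = ET.fromstring(
    "<bib><book><title>X</title><abstract>long text</abstract></book>"
    "<journal><title>J</title></journal></bib>"
)
pruned = prune(source, ["book", "title"])
```

Here `pruned.findall("book/title")` and `source.findall("book/title")` yield the same titles, while the `abstract` and `journal` subtrees never reach the query processor.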
IOSR Journal of Engineering, 2012
The growing use of the XML data format for global information exchange calls for an effective XML data management system. With the rapid growth of XML data on the internet, we are confronted with big data issues, and managing massive XML data is becoming a new research direction. Conventional centralized data management technologies are limited in efficiency, throughput and maintenance cost. These limitations, coupled with the increasing use of XML data in different areas, have triggered the need for a better method of structuring large data sets in order to improve query performance. Issues concerning ways to efficiently partition large XML documents into a more manageable form are yet to be addressed. At the same time, it is essential to ensure that the partitioning method preserves the hierarchical structure of the XML data. An effective data management system for storing and querying large document repositories is required. Managing large XML repositories means storing and querying XML data sets within either an XML-enabled database or a native XML database. The limitations of these XML databases are addressed in this project using a partitioning algorithm, the Object Based Data Partition Algorithm. It structures large XML data logically by partitioning it into object-based XML components.
International Journal of Grid and Utility Computing, 2012
Due to the rapid growth of XML representation for information exchange, XML databases have been widely adopted in a variety of applications. This paper presents two layers of optimisation for dealing with large XML databases: (1) OXDP (Object-Based Methodology for XML Data Partitioning), which has been developed to partition XML data efficiently, and (2) OXiP (Object-Based XML Indexing for Partitions), which is an indexing and linking mechanism for partitioned data. OXDP provides optimal XML data partitioning based on an object's semantic features, which improves XML data query performance. The OXiP method tokenises all rooted label paths and preserves the pathways within each XML object partition. The semantic-based data partition ultimately enhances the notion of a frequently accessed data subset, which is an advantageous feature in our proposed methods to decrease the time to answer queries. Experimentally, OXDP and OXiP can achieve an order of magnitude performance improvement for querying XML data.
Lecture Notes in Computer Science, 2001
In this paper, we present a data and an execution model that allow for efficient storage and retrieval of XML documents in a relational database. The data model is strictly based on the notion of binary associations: by decomposing XML documents into small, flexible and semantically homogeneous units we are able to exploit the performance potential of vertical fragmentation. Moreover, our approach provides clear and intuitive semantics, which facilitates the definition of a declarative query algebra. Our experimental results with large collections of XML documents demonstrate the effectiveness of the techniques proposed.
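A minimal sketch of decomposing a document into binary associations, assuming one relation per root-to-node path. The function name `decompose`, the object-identifier scheme, and the sample document are illustrative assumptions, not the paper's data model in full: each path maps to a list of (parent oid, child oid) pairs, and text content is stored as (oid, text) pairs under the same path.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from itertools import count

def decompose(xml_text):
    """Return (edges, texts): path -> [(parent_oid, child_oid)] and path -> [(oid, text)]."""
    root = ET.fromstring(xml_text)
    edges = defaultdict(list)
    texts = defaultdict(list)
    oids = count(1)  # hypothetical oid scheme: document-order counter

    def visit(elem, path, oid):
        text = (elem.text or "").strip()
        if text:
            texts[path].append((oid, text))
        for child in elem:
            child_oid = next(oids)
            child_path = f"{path}/{child.tag}"
            edges[child_path].append((oid, child_oid))
            visit(child, child_path, child_oid)

    visit(root, root.tag, next(oids))
    return dict(edges), dict(texts)

edges, texts = decompose(
    "<bib><book><title>X</title></book><book><title>Y</title></book></bib>"
)
```

Because every relation holds only nodes reached by the same path, a query on `bib/book/title` reads one small relation instead of scanning the whole document, which is the vertical-fragmentation benefit the abstract refers to.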
ACM SIGMOD …, 2001
There has been recent interest in using relational database systems to store and query XML documents. Each of the techniques proposed in this context works by (a) creating tables for the purpose of storing XML documents (also called relational schema generation), (b) storing XML documents by shredding them into rows in the created tables, and (c) converting queries over XML documents into SQL queries over the created tables. Since relational schema generation is a physical database design issue (dependent on factors such as the nature of the data, the query workload and availability of schemas), there have been many techniques proposed for this purpose. Currently, each relational schema generation technique requires its own query processor to efficiently convert queries over XML documents into SQL queries over the created tables. In this paper, we present an efficient technique whereby the same query processor can be used for all such relational schema generation techniques. This greatly simplifies the task of relational schema generation by eliminating the need to write a special-purpose query processor for each new solution to the problem. In addition, our proposed technique enables users to query seamlessly across relational data and XML documents. This provides users with unified access to both relational and XML data without them having to deal with separate databases.
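Steps (a) to (c) can be sketched with Python's standard `sqlite3` module. The single `node` table used here is one simple, generic schema chosen for illustration; it is an assumption and not the schema generation technique of any particular paper. The function names `shred` and `titles_of_books` are likewise hypothetical, and the SQL is a hand-written translation of the path `/bib/book/title`.

```python
import sqlite3
import xml.etree.ElementTree as ET

def shred(conn, xml_text):
    """(a) Create a generic node table and (b) shred the document into rows."""
    conn.execute(
        "CREATE TABLE node (id INTEGER PRIMARY KEY, parent INTEGER, tag TEXT, text TEXT)"
    )
    next_id = 0

    def insert(elem, parent):
        nonlocal next_id
        next_id += 1
        node_id = next_id
        conn.execute(
            "INSERT INTO node VALUES (?, ?, ?, ?)",
            (node_id, parent, elem.tag, (elem.text or "").strip()),
        )
        for child in elem:
            insert(child, node_id)

    insert(ET.fromstring(xml_text), None)

def titles_of_books(conn):
    """(c) SQL counterpart of the path /bib/book/title: one self-join per step."""
    sql = """
        SELECT t.text
        FROM node r
        JOIN node b ON b.parent = r.id
        JOIN node t ON t.parent = b.id
        WHERE r.parent IS NULL AND r.tag = 'bib'
          AND b.tag = 'book' AND t.tag = 'title'
        ORDER BY t.id
    """
    return [row[0] for row in conn.execute(sql)]

conn = sqlite3.connect(":memory:")
shred(conn, "<bib><book><title>X</title></book><book><title>Y</title></book></bib>")
```

With a different table layout the SQL in `titles_of_books` would have to be rewritten, which illustrates why each schema generation technique has traditionally needed its own query translator.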
2005
Vertical partitioning is a well-known technique for optimizing query performance in relational databases. An extreme form of this technique, which we call vectorization, is to store each column separately. We use a generalization of vectorization as the basis for a native XML store. The idea is to decompose an XML document into a set of vectors that contain the data values and a compressed skeleton that describes the structure. In order to query this representation and produce results in the same vectorized format, we consider a practical fragment of XQuery and introduce the notion of query graphs and a novel graph reduction algorithm that allows us to leverage relational optimization techniques as well as to reduce the unnecessary loading of data vectors and decompression of skeletons. A preliminary experimental study based on some scientific and synthetic XML data repositories in the order of gigabytes supports the claim that these techniques are scalable and have the potential to provide performance comparable with established relational database technology.
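The split into a structural skeleton and per-path data vectors can be sketched as follows. This is a simplified illustration under stated assumptions: the skeleton here is a plain nested tuple and is not compressed as in the paper, and the function name `vectorize` and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

def vectorize(xml_text):
    """Return (skeleton, vectors): nested (tag, children) tuples and path -> list of values."""
    vectors = defaultdict(list)

    def skeleton(elem, path):
        text = (elem.text or "").strip()
        if text:
            vectors[path].append(text)  # values are appended in document order
        return (elem.tag, [skeleton(c, f"{path}/{c.tag}") for c in elem])

    root = ET.fromstring(xml_text)
    return skeleton(root, root.tag), dict(vectors)

skel, vecs = vectorize(
    "<db><item><name>a</name><price>3</price></item>"
    "<item><name>b</name><price>4</price></item></db>"
)

# An aggregate over prices touches a single vector and never loads the names.
total_price = sum(int(p) for p in vecs["db/item/price"])
```

As with column stores in the relational setting, a query that needs only one path reads only that path's vector, which is the loading cost the graph reduction algorithm in the abstract aims to avoid.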
Knowledge-Based Systems, 2011
With the rapid emergence of XML as a data exchange standard over the Web, storing and querying XML data have become critical issues. The two main approaches to storing XML data are (1) to employ traditional storage such as relational databases, object-oriented databases and so on, and (2) to create an XML-specific native storage. The storage representation affects the efficiency of query processing. In this paper, firstly, we review the two approaches for storing XML data. Secondly, we review various query optimization techniques such as indexing, labeling and join algorithms to enhance query processing in both approaches. Next, we suggest an indexing classification scheme and discuss some of the current trends in indexing methods, which indicate a clear shift towards hybrid indexing.
Very Large Data Bases, 2006
XQuery and SQL/XML are powerful new languages for querying XML data. However, they contain a number of stumbling blocks that users need to be aware of to get the expected results and performance. For example, certain language features make it hard if not impossible to exploit XML indexes. The major database vendors provide XQuery and SQL/XML support in their current or upcoming product releases. In this paper, we identify common pitfalls gleaned from the experiences of early adopters of this functionality. We illustrate these pitfalls through concrete examples, explain the unexpected query behavior, and show alternative formulations of the queries that behave and perform as anticipated. As a result, we provide guidelines for XQuery and SQL/XML users, feedback on the language standards, and food for thought for emerging languages and APIs.
Lecture Notes in Computer Science, 2002
With XML rapidly gaining popularity as the standard for data exchange on the World Wide Web, a variety of XML management systems (XMLMS) are becoming available. The choice of an XMLMS is made difficult by the significant difference in the expressive power of the queries and the performance shown by these XMLMS. Most XMLMS are legacy systems (mostly relational) extended to load, query, and publish data in XML format. A few are native XMLMS and capture all the characteristics of XML data representation. This paper looks at expressive power and efficiency of various XMLMS. The performance analysis relies on the testbed provided by XOO7, a benchmark derived from OO7 to capture both data and document characteristics of XML. We present efficiency results for two native XMLMS, an XML-enabled semi-structured data management system and an XML-enabled RDBMS, which emphasize the need for a delicate balance between the data-centric and document-centric aspects of XML query processing.