2007, ACM Transactions on Information Systems
XML is a rather verbose representation of semistructured data, which may require huge amounts of storage space. We propose a summarized representation of XML data, based on the concept of instance pattern, which can both provide succinct information and be directly queried. The physical representation of instance patterns exploits itemsets or association rules to summarize the content of XML datasets. Instance patterns may be used to answer queries, possibly only partially, either when fast and approximate answers are required or when the actual dataset is not available, for example because it is currently unreachable. Experiments on large XML documents show that instance patterns allow a significant reduction in storage space while preserving almost entirely the completeness of the query result. Furthermore, they provide fast query answers and scale well with the size of the dataset, thus overcoming the document size limitation of most current XQuery engines.
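To make the idea of summarizing content with association rules concrete, here is a minimal sketch. It is not the paper's instance-pattern construction. It assumes each record element becomes a "transaction" of (tag, value) items, and it computes the usual support and confidence of one rule. The function names `transactions` and `rule_stats` and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET


def transactions(xml_text, record_tag):
    """Turn each <record_tag> element into a set of (child tag, text) items."""
    root = ET.fromstring(xml_text)
    return [
        {(child.tag, (child.text or "").strip()) for child in rec}
        for rec in root.iter(record_tag)
    ]


def rule_stats(txns, antecedent, consequent):
    """Return (support, confidence) of the rule antecedent -> consequent."""
    n_ante = sum(1 for t in txns if antecedent <= t)
    n_both = sum(1 for t in txns if (antecedent | consequent) <= t)
    support = n_both / len(txns) if txns else 0.0
    confidence = n_both / n_ante if n_ante else 0.0
    return support, confidence


# Hypothetical sample data, used only for illustration.
DOC = """<papers>
  <paper><venue>VLDB</venue><topic>XML</topic></paper>
  <paper><venue>VLDB</venue><topic>XML</topic></paper>
  <paper><venue>VLDB</venue><topic>OLAP</topic></paper>
  <paper><venue>TOIS</venue><topic>XML</topic></paper>
</papers>"""

txns = transactions(DOC, "paper")
support, confidence = rule_stats(txns, {("venue", "VLDB")}, {("topic", "XML")})
```

A rule such as "venue = VLDB implies topic = XML", stored with its support and confidence, can answer a query like "which topics appear in VLDB papers?" approximately without touching the original document, which is the kind of trade-off the abstract describes.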
Proceedings of the 2008 EDBT Ph.D. workshop, 2008
The importance of efficient XML query processing increases along with the usage and pervasiveness of XML. Studying the properties of important fragments of XML query languages and designing accurate structural summaries (including indexes and statistical summaries) are all critical ingredients in solving this problem. However, up to this point there has been a gap between the theoretical and engineering efforts made in the context of XML. We draw on research methodologies used in relational query languages and database design and apply them to the study of XPath and the design of structural summaries for XML. In particular, we study the roles that various fragments of the XPath algebra play in distinguishing data components in an XML document, and leverage the results in designing novel structural indexes and statistical summaries for more efficient XML query processing and more accurate result size estimation.
2008
XML is a rather verbose representation of semistructured data, which may require huge amounts of storage space. Several summarized representations of XML data have been proposed, which can both provide succinct information and be directly queried. In this chapter, we focus on compact representations based on the extraction of association rules from XML datasets.
International Journal of Computer Applications, 2012
XML is recognized as a standard for data storage and exchange in web applications, because it is self-describing, extensible, and stored as a text document. Despite these features, XML has an inherent limitation: verbosity. Because of the strong presence of XML in database technology and its inherent verbosity, there is an ever-increasing need for compact XML storage that can be used for efficient indexing and querying. The proposed technique creates a structure index, which is a compact summarization of the XML document, and a data index, which groups and stores the contents of all similar paths in one place. Based on this compact storage, a novel query algorithm is proposed that can answer XPath queries very efficiently. This approach dramatically reduces the storage requirement for XML and processes XPath queries efficiently. The implementation of this technique and its comparison with other techniques confirm our claim.
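The structure index and data index described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the paper's data structures or query algorithm. It assumes "similar paths" means identical root-to-node label paths, and the names `build_indexes` and `query` are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def build_indexes(xml_text):
    """Return (structure_index, data_index) for an XML string.

    structure_index: sorted list of the distinct root-to-node label paths.
    data_index: maps each label path to the text values stored under it.
    """
    root = ET.fromstring(xml_text)
    data_index = defaultdict(list)
    paths = set()

    def walk(node, prefix):
        path = prefix + "/" + node.tag
        paths.add(path)
        text = (node.text or "").strip()
        if text:
            data_index[path].append(text)
        for child in node:
            walk(child, path)

    walk(root, "")
    return sorted(paths), dict(data_index)


def query(data_index, path):
    """Answer a simple absolute child-axis path with one dictionary lookup."""
    return data_index.get(path, [])


# Hypothetical sample data, used only for illustration.
DOC = """<library>
  <book><title>A</title><year>2001</year></book>
  <book><title>B</title><year>2005</year></book>
</library>"""

structure_index, data_index = build_indexes(DOC)
titles = query(data_index, "/library/book/title")
```

Each distinct path is stored once regardless of how many elements share it, so repeated tags cost no extra space in the structure index, and a simple path query becomes a lookup rather than a document traversal.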
Lecture Notes in Computer Science, 2005
In this article we investigate a novel execution paradigm, ML-like pattern matching, for XML query processing. We show that this paradigm is well adapted to a common and frequent set of queries, and we argue that it is a candidate for executing XML queries far more efficiently than current XPath-based query mechanisms. We support our claim by comparing the performance of XPath-based queries with that of pattern-based ones, and by comparing the latter with the two most efficient XQuery processors we are aware of.
Studies in Fuzziness and Soft Computing, 2006
XML was born to represent, exchange and publish information on the Web, but it has now spread to many other applications. Due to this success, the W3C has proposed a new query language, XQuery, specifically designed to query XML data. XQuery returns exact answers to queries; however, when applied to large XML repositories or warehouses, such precise queries may have high response times. Our research proposes a methodology for the semi-automatic derivation of summarized documents (synopses) for massive, heterogeneous XML datasets, with the final aim of producing rules that transform queries on the original datasets into queries on the summarized dataset.
Lecture Notes in Computer Science, 2007
Retrieval queries that combine structural constraints with keyword search pose new challenges for retrieval systems. This paper presents TReX, a new retrieval system for XML. TReX uses structural summaries to efficiently retrieve elements given structural constraints. TReX can efficiently return either all the answers to a given query or only the top-k answers. In this paper, we discuss our participation in the ad-hoc track of the annual Initiative for the Evaluation of XML Retrieval (INEX) workshop. Specifically, we investigate the use of summaries and the flexibility they provide when dealing with structural constraints. We present an algorithm for retrieval using summaries. Finally, experimental results are presented showing that TReX answers queries efficiently and effectively.
XML is becoming prevalent in data presentation and data exchange on the internet. One important issue in the XML research community is how to query XML documents to extract and restructure information. Currently, XQuery based on XPath is the most promising standard. In this paper, we discuss limitations of XPath and XQuery, and propose a generalization of XPath called XTree that overcomes these limitations. Using XTree, multiple variable bindings can be instantiated in one expression, and XTree expressions, which represent a tree rather than a path, can be used in both the querying part and the result construction part of a query. Based on XTree, we develop an XTree query language, which is more compact and convenient to use than XQuery, and supports common query operations such as join, negation, grouping, and recursion in a direct way. We describe an algorithm that converts XTree query scripts to XQuery scripts. This algorithm not only provides a means of executing queries written in the XTree query language but also highlights differences between the two query languages.
Lecture Notes in Computer Science, 2006
Statistical summaries in relational databases mainly focus on the distribution of data values and have been found useful for various applications, such as query evaluation and data storage. As XML has become widely used, e.g. for online data exchange, the need for corresponding statistical summaries for XML has become evident. While relational techniques may be applicable to the data values in XML documents, novel techniques are required for summarizing the structures of XML documents. In this paper, we propose metrics for major structural properties of XML documents, in particular nestings of entities and one-to-many relationships. Our technique differs from existing ones in that we generate a quantitative summary of an XML structure. Using our approach, we show that some popular real-world and synthetic XML benchmark datasets are highly skewed, hardly hierarchical, and contain few recursions. We hope this preliminary finding sheds light on how to improve the design of XML benchmarks and experiments.
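As a rough illustration of what a quantitative structural summary might look like, the sketch below computes three simple numbers for a document. These are assumed stand-ins, not the metrics proposed in the paper: maximum nesting depth, maximum fan-out (a crude proxy for one-to-many relationships), and the count of elements nested inside an ancestor with the same tag (a crude proxy for recursion). The function name `structure_metrics` is hypothetical.

```python
import xml.etree.ElementTree as ET


def structure_metrics(xml_text):
    """Return simple structural statistics for an XML string."""
    root = ET.fromstring(xml_text)
    stats = {"max_depth": 0, "max_fanout": 0, "recursive_nodes": 0}

    def walk(node, depth, ancestors):
        stats["max_depth"] = max(stats["max_depth"], depth)
        # len(node) is the number of direct child elements.
        stats["max_fanout"] = max(stats["max_fanout"], len(node))
        if node.tag in ancestors:
            stats["recursive_nodes"] += 1
        for child in node:
            walk(child, depth + 1, ancestors | {node.tag})

    walk(root, 1, frozenset())
    return stats


# Hypothetical sample data, used only for illustration.
DOC = "<a><b><c/><c/><c/></b><b><a/></b></a>"
metrics = structure_metrics(DOC)
```

Comparing such numbers across datasets is one simple way to see whether a benchmark document is shallow, skewed in fan-out, or free of recursion.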
Proceedings of the 4th Wseas International Conference on Artificial Intelligence Knowledge Engineering Data Bases, 2005
Many index techniques for storing and querying XML documents have been studied recently, and many of them use coordinate-based methods. However, update operations and the query processing needed to express structural relations among elements, attributes and texts impose a large burden. In this paper, we propose an efficient, extensible index technique based on pattern information. It supports containment queries and pattern queries, and it does not cause serious performance degradation even under frequent update operations. Managing the pattern information of the XML Schema can reduce the number of nodes that participate in processing containment relationship queries among elements. Overall, performance improves because fewer nodes need to be traversed.