2007
XML is a rather verbose representation of semistructured data, which may require huge amounts of storage space. We propose a summarized representation of XML data, based on the concept of instance pattern, which can both provide succinct information and be directly queried. The physical representation of instance patterns exploits itemsets or association rules to summarize the content of XML datasets.
2008
XML is a rather verbose representation of semistructured data, which may require huge amounts of storage space. Several summarized representations of XML data have been proposed, which can both provide succinct information and be directly queried. In this chapter, we focus on compact representations based on the extraction of association rules from XML datasets.
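As a rough illustration of how association rules can summarize XML content, the sketch below treats each record element as a set of (tag, value) items and reports single-item rules with their support and confidence. It is a minimal sketch under stated assumptions, not the representation proposed in the chapter: the sample document, the function names `transactions` and `mine_rules`, and the thresholds are all hypothetical.

```python
import xml.etree.ElementTree as ET
from itertools import permutations

# Hypothetical sample data, invented for this illustration.
SAMPLE = """
<papers>
  <paper><venue>EDBT</venue><topic>XML</topic></paper>
  <paper><venue>EDBT</venue><topic>XML</topic></paper>
  <paper><venue>EDBT</venue><topic>Streams</topic></paper>
  <paper><venue>ICDE</venue><topic>XML</topic></paper>
</papers>
"""

def transactions(xml_text, record_tag):
    """Turn every <record_tag> element into a set of (child tag, text) items."""
    root = ET.fromstring(xml_text)
    return [
        frozenset((child.tag, (child.text or "").strip()) for child in record)
        for record in root.iter(record_tag)
    ]

def mine_rules(txns, min_support=0.5, min_confidence=0.6):
    """Return rules (lhs, rhs, support, confidence) with one item on each side."""
    total = len(txns)
    items = sorted({item for txn in txns for item in txn})
    rules = []
    for lhs, rhs in permutations(items, 2):
        lhs_count = sum(1 for txn in txns if lhs in txn)
        both_count = sum(1 for txn in txns if lhs in txn and rhs in txn)
        if lhs_count == 0:
            continue
        support = both_count / total          # fraction of records with both items
        confidence = both_count / lhs_count   # fraction of lhs records that also have rhs
        if support >= min_support and confidence >= min_confidence:
            rules.append((lhs, rhs, support, confidence))
    return rules

RULES = mine_rules(transactions(SAMPLE, "paper"))
for lhs, rhs, support, confidence in RULES:
    print(f"{lhs} => {rhs}  support={support:.2f} confidence={confidence:.2f}")
```

The resulting rule list is much smaller than the document and can answer approximate questions such as "which topic usually appears with venue EDBT?" without rescanning the data.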
Proceedings of the 2008 EDBT Ph.D. workshop, 2008
The importance of efficient XML query processing increases along with the usage and pervasiveness of XML. Studying the properties of important fragments of XML query languages and designing accurate structural summaries (including indexes and statistical summaries) are all critical ingredients in solving this problem. However, up to this point there has been a gap between the theoretical and engineering efforts in the context of XML. We draw on research methodologies used in relational query languages and database design and apply them to the study of XPath and the design of structural summaries for XML. In particular, we study the roles various fragments of the XPath algebra play in distinguishing data components in an XML document, and leverage the results in designing novel structural indexes and statistical summaries for more efficient XML query processing and more accurate result size estimation.
Lecture Notes in Computer Science, 2006
Statistical summaries in relational databases mainly focus on the distribution of data values and have been found useful for various applications, such as query evaluation and data storage. As XML has become widely used, e.g. for online data exchange, the need for corresponding statistical summaries in XML has become evident. While relational techniques may be applicable to the data values in XML documents, novel techniques are required for summarizing the structures of XML documents. In this paper, we propose metrics for major structural properties of XML documents, in particular nestings of entities and one-to-many relationships. Our technique differs from existing ones in that we generate a quantitative summary of an XML structure. Using our approach, we illustrate that some popular real-world and synthetic XML benchmark datasets are indeed highly skewed, hardly hierarchical, and contain few recursions. We hope this preliminary finding sheds light on improving the design of XML benchmarks and experiments.
Models, Methods, and Applications
In this work we describe the TreeRuler tool, which makes it possible for inexperienced users to access huge XML (or relational) datasets. TreeRuler encompasses two main features: (1) it mines all the frequent association rules from input documents without any a priori specification of the desired results, and (2) it provides quick, summarized, and thus often approximate answers to users' queries, by using the previously mined knowledge. TreeRuler has been developed in the scenario of the Odyssey EU project dealing with information about crimes, both for the relational and the XML data model. In this chapter we mainly focus on the objectives, strategies, and difficulties encountered in the XML context.
2004
In this work we propose a flexible approach to extract and evaluate association rules on XML documents. We describe two kinds of association rules: structural associations and value associations. A structural association allows one to capture the similarity of an XML document with respect to a given structure, while a value association allows one to capture the similarity of the information contained in the XML document with respect to a given scenario. Moreover, we show how it is possible to compose these associations in order to describe complex association rules on XML documents.
Journal of Intelligent Information Systems, 2009
Text search engines are inadequate for indexing and searching XML documents because they ignore metadata and aggregation structure implicit in the XML documents. On the other hand, the query languages supported by specialized XML search engines are very complex. In this paper, we present a simple yet flexible query language, and develop its semantics to enable intuitively appealing extraction of
With the rapid growth of XML (Extensible Markup Language) on the Internet, retrieval of XML data has become an active research topic. As XML collections keep expanding, search and retrieval engines need to operate over sets of XML documents. XML documents contain not only textual information but also information about the logical structure of the documents. This logical structure is a tree encoded by the XML tags. In XML retrieval, elements and components of a document are retrieved, not the whole document. Content-based retrieval of XML documents has received the most attention over the past few years, emerging mainly from the NEXI initiative [1]. The aim of XML retrieval is to return the relevant parts of an XML document, exploiting the document structure to respond to users' needs [2]. Information retrieval systems often differ from relational databases. In XML retrieval, users' information needs are expressed as queries that include key phrases and structural hints. The structure specifies the XML elements in the collection from which the system should retrieve information [3]. In XML documents, structure and content are separable. In response to a query, an information retrieval system returns a ranked list of documents, and the user then examines them in order, starting from the highest rank. Since the number of XML components is generally high, users need XML retrieval systems that let them retrieve and review individual components of the content. One approach is to use summarization, which is useful in interactive information retrieval. In interactive XML retrieval, a summary can be associated with any document part returned by the XML retrieval system [6].
International Journal of Computer Applications, 2012
XML is recognized as a standard for data storage and exchange in web applications, because it is self-describing, extensible, and stored as a text document. Despite these features, XML has an inherent limitation: verbosity. Because of the strong presence of XML in database technology and its inherent verbosity, there is an ever-increasing need to design compact storage for XML that can be used for efficient indexing and querying. The proposed technique creates a structure index, which is a compact summarization of the XML document, and a data index, which groups and stores the contents of all similar paths in one place. Based on this compact storage, a novel query algorithm is proposed that can answer XPath queries very efficiently. This approach dramatically reduces the storage requirement for XML while enabling efficient processing of XPath queries. The implementation of this technique and a comparison with other techniques confirm our claim.
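The idea of pairing a structure index with a data index that groups the contents of similar paths can be sketched as follows. This is a simplified illustration, not the paper's storage format or query algorithm: the function `build_indexes`, the sample document, and the choice of root-to-node tag paths as the grouping key are assumptions made for the example.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sample document, invented for this illustration.
SAMPLE = """
<lib>
  <book><title>A</title><year>2001</year></book>
  <book><title>B</title></book>
</lib>
"""

def build_indexes(xml_text):
    """Return (structure, data): per-path element counts and per-path text values."""
    root = ET.fromstring(xml_text)
    structure = defaultdict(int)   # distinct root-to-node path -> number of elements
    data = defaultdict(list)       # distinct root-to-node path -> text values, in document order

    def walk(elem, prefix):
        path = f"{prefix}/{elem.tag}"
        structure[path] += 1
        text = (elem.text or "").strip()
        if text:
            data[path].append(text)
        for child in elem:
            walk(child, path)

    walk(root, "")
    return dict(structure), dict(data)

STRUCTURE, DATA = build_indexes(SAMPLE)
print(STRUCTURE)
print(DATA)
```

Under these assumptions, a simple path query such as `/lib/book/title` reduces to one dictionary lookup in `DATA`, and each repeated path is stored only once in `STRUCTURE`, which is where the space saving comes from.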
2009
XML has been explored by both research and industry communities. More than 5500 papers were published on different aspects of XML. With so many publications, it is hard for someone to decide where to start. Hence, this paper presents some of the research topics on XML, namely: XML on relational databases, query processing, views, data matching, and schema evolution. It then summarizes some (some!) of the most relevant or traditional papers on those subjects.
and order restriction(<).
IEICE Transactions on Information and Systems
As data integration over the Web has become an increasing demand, there is a growing desire to use XML as a standard format for data exchange. To share their grammars efficiently, most XML documents in use are associated with a document structure description, such as a DTD or an XML Schema. However, this document structure information is not utilized efficiently in previously proposed techniques for XML query processing. In this paper, we present a novel technique that reduces the disk I/O complexity of XML query processing. We design a schema-based numbering scheme called SPAR that incorporates both structure information and tag names extracted from the DTD or XML Schema. Based on SPAR, we develop a mechanism called VirtualJoin that significantly reduces the disk I/O workload for processing XML queries. As shown by experiments, VirtualJoin outperforms many prior techniques.
Lecture Notes in Computer Science, 2007
Retrieval queries that combine structural constraints with keyword search are placing new challenges on retrieval systems. This paper presents TReX, a new retrieval system for XML. TReX uses structural summaries to efficiently retrieve elements given structural constraints. TReX can efficiently return either all the answers to a given query or only the top-k answers. In this paper, we discuss our participation in the annual Initiative for the Evaluation of XML Retrieval (INEX) workshop in the ad-hoc track. Specifically, we investigate the use of summaries and the flexibility they provide when dealing with structural constraints. We present an algorithm for retrieval using summaries. Finally, experimental results are presented showing that TReX answers queries efficiently and effectively.
eXtensible Markup Language (XML) is one of the standard data representations used in various applications. Summarizing an XML document into a concise, readable summary that provides all important information is valuable, as it saves both time and effort. This paper presents the main approaches for summarizing XML documents based on both their structure and their data content.
2008 IEEE 24th International Conference on Data Engineering, 2008
The nature of semistructured data in web collections is evolving. Increasingly, XML web documents (or documents exchanged via web services) are valid with regard to a schema, yet the actual structure of such documents exhibits significant variations across collections for several reasons: the schema is very lax (e.g., RSS feeds), the schema is large and different subsets are used (e.g., industry standards like UBL), or open content models allow arbitrary schemas to be mixed (e.g., RSS extensions like those used for podcasting). Many web development tasks that incorporate XPath queries to process XML documents require an understanding of the actual structure present in the collection.
Identifying the issues with current frameworks is itself a critical task. This review identifies issues related to association rule mining on XML data. The basic preliminary concepts of association rule mining are presented in this work. Association rule mining has proven to be a powerful idea for mining enormous amounts of data. In recent years, the vast majority of information exchange has been done with XML (eXtensible Markup Language). Many promising techniques have been identified and developed for mining XML data. In this paper, XML data analysis is summarized and its importance for association rule extraction is illustrated. We focus on a variety of strategies and methodologies that are useful and that mark XML data analysis as an important field. This work surveys the association rule strategies applied successfully to XML data over the last decade.
Proceedings of the 2006 ACM symposium on …, 2006
Lecture Notes in Computer Science, 2005
In this article we investigate a novel execution paradigm, ML-like pattern matching, for XML query processing. We show that such a paradigm is well adapted to a common and frequent set of queries and argue that it is a candidate for executing XML queries far more efficiently than current XPath-based query mechanisms. We support our claim by comparing the performance of XPath-based queries with pattern-based ones, and by comparing the latter with the two most efficient XQuery processors we are aware of.
Advances in Scalable Web Information Integration and Service, 2007
XML is emerging as a de facto standard for information exchange over the Web, while businesses and enterprises generate and exchange large amounts of XML data daily. One of the major challenges is how to query this data efficiently. Queries typically can be represented as twig patterns. Some researchers have developed algorithms that reduce the intermediate results that are generated during query processing, while others have introduced labeling schemes that encode the position of elements, enabling queries to be answered by accessing the labels without traversing the original XML documents. In this paper we outline optimizations that are based on semantics of the data being queried, and introduce efficient algorithms for content and keyword searches in XML databases. If the semantics are known we can further optimize the query processing, but if the semantics are unknown we revert to the traditional query processing approaches.
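The labeling schemes mentioned above can be illustrated with a region (interval) encoding, one widely used family of such schemes: each element receives a (start, end, level) label, and ancestor or parent relationships are then decided from the labels alone. This is a minimal sketch and not necessarily the scheme used in the paper; the function names `label`, `is_ancestor`, and `is_parent` and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET

def label(xml_text):
    """Assign (tag, start, end, level) labels with one depth-first traversal."""
    root = ET.fromstring(xml_text)
    labels = []
    counter = 0

    def visit(elem, level):
        nonlocal counter
        counter += 1
        start = counter              # number assigned when the element is entered
        for child in elem:
            visit(child, level + 1)
        counter += 1                 # number assigned when the element is left
        labels.append((elem.tag, start, counter, level))

    visit(root, 0)
    return sorted(labels, key=lambda item: item[1])  # document order

def is_ancestor(a, d):
    """a is an ancestor of d when a's interval strictly contains d's interval."""
    return a[1] < d[1] and d[2] < a[2]

def is_parent(a, d):
    """a is the parent of d when it is an ancestor exactly one level above."""
    return is_ancestor(a, d) and d[3] == a[3] + 1

# Hypothetical sample document, invented for this illustration.
LABELS = {item[0]: item for item in label("<a><b><c/></b><d/></a>")}
print(LABELS)
```

With these labels, a twig pattern edge such as `a//c` or `b/c` can be checked by comparing integers, so the original document does not need to be traversed during the structural join.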