2003
Information technology is widely adopting XML for information exchange. As messaging standards migrate to XML, there is growing concern about the size of XML messages compared to binary-formatted messages. XML compression can help mitigate the risk of exceeding the capacity of current communication resources. However, it is critical that compression technologies do not hinder XML query processing efficiency. Schema-aware compression and stream-based query processing seem to provide the best combination because decoded values can be accessed by the filtering processor without producing the original XML document. When query specifications are predefined, further optimizations can be made by embedding path-related information for each tag and attribute in the compressed message. A similar approach could be adopted by other XML processes (such as XSL, XForms and others) by augmenting compressed information with specific ‘hints’ to improve performance. 1 Compressing and filter...
Advances in Databases and Information Systems, 2007
This paper describes a new XML compression scheme that offers both high compression ratios and short query response time. Its core is a fully reversible transform featuring substitution of every word in an XML document using a semi-dynamic dictionary, effective encoding of dictionary indices, as well as numbers, dates and times found in the document, and grouping data within the same structural context in individual containers. The results of conducted tests show that the proposed scheme attains compression ratios rivaling the best available algorithms, and fast compression, decompression, and query processing.
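The idea of grouping data from the same structural context into individual containers can be illustrated with a minimal sketch. It is not the scheme described in the abstract: the function names `group_into_containers` and `compress_containers` are hypothetical, the container key is assumed to be the root-to-element path, and `zlib` stands in for whatever back-end encoder a real compressor would use.

```python
import zlib
import xml.etree.ElementTree as ET
from collections import defaultdict


def group_into_containers(xml_text):
    """Return {path: [text values]}, one container per root-to-element path."""
    root = ET.fromstring(xml_text)
    containers = defaultdict(list)

    def walk(elem, path):
        current = path + "/" + elem.tag
        text = (elem.text or "").strip()
        if text:
            # Values that share a path tend to look alike (all years, all titles),
            # which is what makes per-container compression attractive.
            containers[current].append(text)
        for child in elem:
            walk(child, current)

    walk(root, "")
    return dict(containers)


def compress_containers(containers):
    """Compress each container on its own; zlib is only a stand-in back end."""
    return {
        path: zlib.compress("\n".join(values).encode("utf-8"))
        for path, values in containers.items()
    }


doc = (
    "<lib>"
    "<book><year>2003</year><title>A</title></book>"
    "<book><year>2007</year><title>B</title></book>"
    "</lib>"
)
containers = group_into_containers(doc)
packed = compress_containers(containers)
```

Because each container is compressed separately, a query that touches only `/lib/book/year` would need to decompress just that one container in this sketch.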
2012
Extensible Markup Language (XML) is proposed as a standardized data format designed for specifying and exchanging data on the Web. With the proliferation of mobile devices, such as palmtop computers, as a means of communication in recent years, it is reasonable to expect that in the foreseeable future, a massive amount of XML data will be generated and exchanged between applications in order to perform dynamic computations over the Web. However, XML is by nature verbose, since terseness in XML markup is not considered a pressing issue from the design perspective. In practice, XML documents are usually large in size as they often contain much redundant data. The size problem hinders the adoption of XML, since it substantially increases the costs of data processing, data storage, and data exchanges over the Web. As the common generic text compressors, such as Gzip, Bzip2, WinZip, PKZIP, or MPEG-7 (BiM), are not able to produce usable XML compressed data, many XML-specific compression technologies have been recently proposed. The essential idea of these technologies is that, by utilizing the exposed structure information in the input XML document during the compression process, they pursue two important goals at the same time. First, they aim at achieving a good compression ratio and time compared to the generic text compressors. Second, they aim at generating a compressed XML document that is able to support efficient evaluation of queries over the data. This paper surveys some of the adaptive compression techniques for XML, namely XMill, XPress, and XGrind.
Database and Expert Systems …, 2008
With the rapidly increasing popularity of XML as a data format, there is a large demand for efficient techniques in storing and querying XML documents. However XML is by nature verbose, due to repeatedly used tags that describe data. For this reason the storage requirements of XML can be excessive and lead to increased costs for data manipulation. Therefore, it seems natural to use compression techniques to increase the efficiency of storing and querying XML data. In this paper, we propose a new approach called SCQX for Storing, Compressing and Querying XML documents. This approach compresses the structure of an XML document based on exploiting repetitive consecutive tags in the structure, and then SCQX stores the compressed XML structure and the data separately in a robust storage structure that includes a set of access support structures to guarantee fast query performance. Moreover, SCQX supports querying of the compressed XML structure directly and efficiently without requiring decompression. An experimental evaluation on sets of XML data shows the effectiveness of our approach.
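Compressing structure by exploiting repetitive consecutive tags can be sketched as a run-length encoding over sibling elements. This is an illustrative assumption about what such a compaction could look like, not the SCQX storage format; the function name `compact_structure` and the tuple layout are hypothetical, and data values are ignored because the abstract says they are stored separately.

```python
import xml.etree.ElementTree as ET
from itertools import groupby


def compact_structure(elem):
    """Return (tag, [(child_structure, repeat_count), ...]) for an element.

    Consecutive children whose compacted structure is identical are merged
    into a single entry that carries a repeat count.
    """
    child_structures = [compact_structure(child) for child in elem]
    runs = [
        (structure, len(list(group)))
        for structure, group in groupby(child_structures)
    ]
    return (elem.tag, runs)


doc = (
    "<list>"
    "<item><v>1</v></item>"
    "<item><v>2</v></item>"
    "<item><v>3</v></item>"
    "<end/>"
    "</list>"
)
structure = compact_structure(ET.fromstring(doc))
```

In this sketch the three `item` subtrees collapse into one entry with a count of 3, so the structural part of a document with thousands of identical records would stay small.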
2008
We outline in this paper the main contributions of the XQueC project. XQueC, namely XQuery processor and Compressor, is the first compression tool to seamlessly allow XQuery queries in the compressed domain. It includes a set of data structures, that basically shred the XML document into suitable chunks linked to each other, thus disagreeing with the 'homomorphic' principle so far adopted in previous XML compressors. According to this principle, the compressed document is homomorphic to the original document. Moreover, in order to avoid the time consumption due to compressing and decompressing intermediate query results, XQueC applies 'lazy' decompression by issuing the queries directly in the compressed domain.
XML is a standard for exchanging and presenting information on the Web because it makes data flexible in representation and easily portable. However, XML data is also recognized as verbose, since repeated tags and structures heavily increase the size of the data. This verbosity poses many challenges for conventional query processing and data exchange, and it increases the overhead on bandwidth- and memory-limited devices. XML compression and optimization are one solution to the verbosity problem. Many effective XML compressors, such as XMill, have been proposed to solve the data size problem, but they do not address the problem of running queries on compressed XML data. Other compressors have been proposed to query compressed XML data. However, the compression ratio of these compressors is usually worse than that of XMill and that of the generic compressor gzip, while their query performance and the expressive power of the query language they support are inadequate. The main objective of this work is twofold: first, the design and development of an XML compression method, and second, the optimization of existing XML compression methods. In addition, the increased size affects both query processing and data exchange: XML files require far more storage space and network bandwidth.
International Journal of Database Management Systems, 2011
Extensible Mark-up Language (XML) was designed to carry data and provides a platform for defining one's own tags. XML documents are large by nature. As a result, there has been an ever-growing need for an efficient storage structure and high-performance techniques for querying. QUICX (Query and Update Support for Indexed and Compressed XML) is a compact storage structure that gives a higher compression ratio than many of the queriable compressors available today. The data are compressed using the LZW approach and stored compactly. Indexing is done for the data stored in the containers, thereby further increasing the compression ratio. The indexes also reduce the time for querying the storage structure. Index files are created only for highly redundant files that are large in size. QUICX supports simple, aggregate, conditional, compound, nested and correlation predicate queries. All queries except simple queries use the index files. The reduced execution time proves the efficiency of the indexing technique.
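The LZW approach named in this abstract is a standard dictionary coder, and a textbook version can be sketched as follows. This is generic LZW over bytes, not the QUICX implementation: the function names are hypothetical, codes are kept as a plain list of integers rather than packed into bits, and the dictionary is allowed to grow without limit.

```python
def lzw_compress(data):
    """Encode bytes as a list of integer codes using textbook LZW."""
    table = {bytes([i]): i for i in range(256)}
    current = b""
    codes = []
    for value in data:
        candidate = current + bytes([value])
        if candidate in table:
            current = candidate
        else:
            codes.append(table[current])
            table[candidate] = len(table)  # next free code
            current = bytes([value])
    if current:
        codes.append(table[current])
    return codes


def lzw_decompress(codes):
    """Rebuild the original bytes from a list of LZW codes."""
    if not codes:
        return b""
    table = {i: bytes([i]) for i in range(256)}
    previous = table[codes[0]]
    output = [previous]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == len(table):
            # The code refers to the entry that is about to be defined.
            entry = previous + previous[:1]
        else:
            raise ValueError("invalid LZW code")
        output.append(entry)
        table[len(table)] = previous + entry[:1]
        previous = entry
    return b"".join(output)


container = b"<item/><item/><item/><item/>"
codes = lzw_compress(container)
restored = lzw_decompress(codes)
```

Repetitive container data, such as the repeated tag text above, yields fewer codes than input bytes because longer and longer repeated substrings are replaced by single dictionary indices.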
International Journal of Web Information Systems, 2008
Purpose: Efficient processing of XML queries is critical for XML data management and related applications. Previously proposed techniques are unsatisfactory. The purpose of this paper is to present Determined, a new prototype system designed for XML query processing and optimization from a system perspective. With Determined, a number of novel techniques for XML query processing are proposed and demonstrated. Design/methodology/approach: The methodology emphasizes query pattern minimization, logic-level optimization, and efficient query execution. Accordingly, three lines of investigation have been pursued in the context of Determined: XML tree pattern query (TPQ) minimization; logic-level XML query optimization utilizing deterministic transformation; and specialized algorithms for fast XML query execution. Findings: Developed and demonstrated were: a runtime optimal and powerful algorithm for XML TPQ minimization; a unique logic-level XML query optimization approach that solely pursues...
IEEE Transactions on Knowledge and Data Engineering, 2008
An XML publish/subscribe system needs to filter a large number of queries over XML streams. Most existing systems only consider filtering the simple XPath statements. In this paper, we focus on filtering of the more complex Generalized-Tree-Pattern (GTP) queries. Our filtering mechanism is based on a novel Tree-of-Path (TOP) encoding scheme, which compactly represents the path matches for the entire document. First, we show that the TOP encodings can be efficiently produced via shared bottom-up path matching. Second, with the aid of this TOP encoding, we can 1) achieve polynomial time and space complexity for postprocessing, 2) avoid redundant predicate evaluations, 3) allow an efficient duplicate-free and merge join-based algorithm for merging multiple encoded path matches, and 4) simplify the processing of GTP queries. Overall, our approach maximizes the sharing opportunity across queries by exploiting the suffix as well as prefix sharing. At the same time, our TOP encodings allow efficient postprocessing for GTP queries. Extensive performance studies show that GFilter not only achieves significantly better filtering performance than state-of-the-art algorithms but also is capable of efficiently filtering the more complex GTP queries.
XML is a popular meta-language in widespread use across a variety of application domains. However, its verbose nature has limited its acceptance in cases where a more succinct textual or binary data encoding format can be used. In this report, we describe AXECHOP, an XML-conscious compressor which uses a grammar-based approach to exploit the possibly significant structural redundancies within XML documents in order to achieve significant rates of compression.
Lecture Notes in Computer Science, 2009
Due to the growing popularity of XML as a data exchange and storage format, the need to develop efficient techniques for storing and querying XML documents has emerged. A common approach to achieve this is to use labeling techniques. However, their main problem is that they either do not support updating XML data dynamically or impose huge storage requirements. On the other hand, with the verbosity and redundancy problem of XML, which can lead to increased cost for processing XML documents, compaction of XML documents has become an increasingly important research issue. In this paper, we propose an approach called CXDLS combining the strengths of both, labeling and compaction techniques. Our approach exploits repetitive consecutive subtrees and tags for compacting the structure of XML documents by taking advantage of the ORDPATH labeling scheme. In addition it stores the compacted structure and the data values separately. Using our proposed approach, it is possible to support efficient query and update processing on compacted XML documents and to reduce storage space dramatically. Results of a comprehensive performance study are provided to show the advantages of CXDLS.
2005
XML is a popular meta-language that facilitates the interchange and access of data. However, XML's verbose nature may increase the size of a data set as much as ten-fold. In this paper, we present a novel technique for lossless XML compression, called TREECHOP, which supports querying of compressed XML data without requiring full decompression. Unlike other query-capable XML compression schemes, TREECHOP requires only a single pass over the input document during the compression process, resulting in an efficient, online operation that is well-suited for transmission of compressed XML documents over a network.
Lecture Notes in Computer Science, 2013
Exploiting large volumes of XML (eXtensible Markup Language) data with limited storage space requires a dedicated and reliable treatment for compressing and querying the data. This work studies these processes in order to combine them via a mediator, making it possible to query compressed XML data without recourse to decompression. We propose a new technique to compress, re-index and query XML data while improving the XMill and B+Tree algorithms. We show the reliability and the speed-up of the proposed querying system in terms of response time and answer accuracy.
IOSR Journal of Computer Engineering, 2013
This research work demonstrates the extraction, compression and query processing of XML documents for adaptive compression techniques and efficient query evaluation. The proposed algorithms for XML compression and efficient query evaluation are: feasible XML compression using a data compression algorithm, and a query processor using SAX parsing and interfaces. It is shown that the proposed techniques for XML data compression pave the way for better compression and improve the compression ratio and performance of the compressor system.
cs.uic.edu
XML has grown into a widely used and highly developed technology, due in part to the subcomponents built around the technology (advanced parsers, frameworks, libraries, etc). The use of XML reduces development time and increases the robustness of distributed applications. Due to these ...
XML has been acknowledged as the de facto standard for data representation and exchange over the World Wide Web. Being self-describing grants XML its great flexibility and wide acceptance, but it is also the cause of its main drawback: its large size. The large document size means that the amount of information that has to be transmitted, processed, stored, and queried is often larger than that of other data formats. Several XML compression techniques have been introduced to deal with these problems. In this paper, we provide a complete survey of the state of the art in XML compression techniques. In addition, we present an extensive experimental study of the available implementations of these techniques. We report the behavior of nine XML compressors using a large corpus of XML documents which covers the different natures and scales of XML documents. In addition to assessing and comparing the performance characteristics of the evaluated XML compression tools, the study also tries to assess the effectiveness and practicality of using these tools in the real world. Finally, we provide some guidelines and recommendations to help developers and users make an effective decision when selecting the most suitable XML compression tool for their needs.
Data & Knowledge Engineering, 2005
Some XML query processors operate on an internal representation of XML documents and can leverage neither the XML storage structure nor the possible access methods dedicated to this storage structure. Such query processors are often used in organizations that usually process transient XML documents received from other organizations. In this paper, we propose a different approach to accelerating query execution on XML source documents in such environments. The approach is based on the notion of query equivalence of XML documents with respect to a query. Under this equivalence, we propose two different document transformation strategies which prune parts of the documents irrelevant to the query, just before executing the query itself. The proposed transformations are implemented and evaluated using a two-level index structure: a structural directory capturing document paths and an inverted index of tag offsets.
2010
With current methods for query evaluation over XML data streams, the adoption of certain types of buffering techniques is unavoidable. In many circumstances, the buffer size may increase exponentially, which can cause a memory bottleneck. Some optimization techniques have been proposed to solve the problem. However, the limit of these techniques has been defined by a concurrency lower bound and has been theoretically proved. In this paper, we show through an empirical study that this lower bound can be broken by taking semantic information into account for buffer reduction. To demonstrate this, we built a SAX-based XML stream query evaluation system and designed an algorithm that consumes buffers in line with the concurrency lower bound. After a further analysis of the lower bound, we designed several semantic rules for the purpose of breaking the lower bound and incorporated these rules in the lower bound algorithm. Experiments are conducted to show that the algorithms deploying semantic rules individually and collectively all significantly outperform the lower bound algorithm that does not consider semantic information.
International Conference on Management of Data, 2009
The XML stream querying problem involves evaluating a given, potentially large, set of query expressions on a continuous stream of XML messages. Since the messages arrive continuously, it is essential that the query processing rate matches the data arrival rate. Therefore, it is necessary to index the given set of query expressions appropriately to enable real-time processing of the streaming XML data. In this paper we propose a simple and scalable system for the XML stream querying problem. The system indexes the queries compactly using a query guide and uses simple integer stacks to efficiently process the stream. Our experiments demonstrate that the new system outperforms the classical stream query processor YFilter by sizeable margins without asking for more index space. Also, the system shows good time and space scalability with respect to query workload and stream size.
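The combination of a shared query index and an integer stack can be illustrated with a deliberately small sketch. It is not the system from the abstract: it assumes the query guide is a plain prefix trie, it supports only absolute child-axis paths such as `/a/b/c` (no descendant axis, wildcards, or predicates), and the names `QueryGuide`, `StreamFilter`, and `filter_document` are hypothetical.

```python
import xml.sax


class QueryGuide:
    """Prefix trie over simple child-axis paths; nodes are integer ids."""

    def __init__(self, queries):
        self.children = [{}]   # children[node_id] maps tag -> child node_id
        self.matches = [[]]    # matches[node_id] lists the queries ending there
        for query in queries:
            node = 0
            for tag in query.strip("/").split("/"):
                if tag not in self.children[node]:
                    self.children.append({})
                    self.matches.append([])
                    self.children[node][tag] = len(self.children) - 1
                node = self.children[node][tag]
            self.matches[node].append(query)


class StreamFilter(xml.sax.ContentHandler):
    """Tracks the current position in the guide with a stack of integers."""

    DEAD = -1  # marks a document path that no query shares as a prefix

    def __init__(self, guide):
        super().__init__()
        self.guide = guide
        self.stack = [0]       # start at the guide root
        self.matched = set()

    def startElement(self, name, attrs):
        top = self.stack[-1]
        if top == self.DEAD:
            node = self.DEAD
        else:
            node = self.guide.children[top].get(name, self.DEAD)
        self.stack.append(node)
        if node != self.DEAD:
            self.matched.update(self.guide.matches[node])

    def endElement(self, name):
        self.stack.pop()


def filter_document(xml_text, queries):
    """Return the subset of queries matched by at least one element."""
    handler = StreamFilter(QueryGuide(queries))
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.matched


result = filter_document(
    "<a><b><c/></b><d/></a>",
    ["/a/b/c", "/a/d", "/a/x", "/a/b"],
)
```

Queries that share a prefix share trie nodes, so each start tag costs one dictionary lookup and one push in this sketch, regardless of how many queries are registered.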
IEICE Transactions on Information and Systems
As data integration over the Web has become an increasing demand, there is a growing desire to use XML as a standard format for data exchange. For sharing their grammars efficiently, most of the XML documents in use are associated with a document structure description, such as DTD or XML schema. However, the document structure information is not utilized efficiently in previously proposed techniques of XML query processing. In this paper, we present a novel technique that reduces the disk I/O complexity of XML query processing. We design a schema-based numbering scheme called SPAR that incorporates both structure information and tag names extracted from DTD or XML schema. Based on SPAR, we develop a mechanism called VirtualJoin that significantly reduces disk I/O workload for processing XML queries. As shown by experiments, VirtualJoin outperforms many prior techniques.
Due to its flexibility and ease of use, XML is nowadays widely used in a vast number of application areas, and new information is increasingly being encoded as XML documents. Therefore, it is important to provide a repository for XML documents which supports efficient management and storage of XML data. For this purpose, many proposals have been made, the most common of which are node labeling schemes. On the other hand, XML repeatedly uses tags to describe the data itself. This self-describing nature makes XML verbose, with the result that the storage requirements of XML are often expanded and can be excessive. In addition, the increased size leads to increased costs for data manipulation. Therefore, it also seems natural to use compression techniques to increase the efficiency of storing and querying XML data. In our previous works, we aimed at combining the advantages of both areas (labeling and compaction technologies). Specifically, we took advantage of XML structural peculiarities to reduce storage space requirements and to improve the efficiency of XML query processing using labeling schemes. In this paper, we continue our investigations on variations of binary string encoding forms to decrease the label size. We also report experimental results examining the impact of binary string encoding on the query performance and on the storage size needed to store the compacted XML documents.