2007, Advances in Databases and Information Systems
13 pages
This paper describes a new XML compression scheme that offers both high compression ratios and short query response time. Its core is a fully reversible transform featuring substitution of every word in an XML document using a semi-dynamic dictionary, effective encoding of dictionary indices, as well as numbers, dates and times found in the document, and grouping data within the same structural context in individual containers. The results of conducted tests show that the proposed scheme attains compression ratios rivaling the best available algorithms, and fast compression, decompression, and query processing.
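As a rough illustration of the word-substitution step described above, the following Python sketch builds a dictionary in a first pass over the document and then replaces each word with its dictionary index. It is a minimal sketch under stated assumptions, not the paper's implementation: the tokenization rule, the frequency ordering, and the function names (build_dictionary, encode, decode) are all hypothetical, and the encoding of indices, numbers, dates, and times, as well as the grouping into containers, is omitted.

```python
import re
from collections import Counter

# Assumption: a "word" is a run of \w characters; everything else is kept verbatim.
TOKEN = re.compile(r"\w+|\W+")   # alternating word / non-word runs, no character is dropped
WORD = re.compile(r"\w+")


def build_dictionary(text):
    """First pass: list the distinct words, most frequent first."""
    counts = Counter(t for t in TOKEN.findall(text) if WORD.fullmatch(t))
    return [word for word, _ in counts.most_common()]


def encode(text, dictionary):
    """Second pass: replace each word with its integer index in the dictionary."""
    index = {word: i for i, word in enumerate(dictionary)}
    return [index[t] if WORD.fullmatch(t) else t for t in TOKEN.findall(text)]


def decode(tokens, dictionary):
    """Reverse the substitution: integers become words, strings pass through."""
    return "".join(dictionary[t] if isinstance(t, int) else t for t in tokens)


if __name__ == "__main__":
    doc = "<a><b>red fish</b><b>blue fish</b></a>"
    words = build_dictionary(doc)
    packed = encode(doc, words)
    assert decode(packed, words) == doc   # the transform is fully reversible
```

Ordering the dictionary by frequency gives the most common words the smallest indices, which a later index-encoding stage could exploit; whether the paper orders its dictionary this way is not stated in the abstract.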
2012
Extensible Markup Language (XML) was proposed as a standardized data format for specifying and exchanging data on the Web. With the proliferation of mobile devices, such as palmtop computers, as a means of communication in recent years, it is reasonable to expect that in the foreseeable future a massive amount of XML data will be generated and exchanged between applications in order to perform dynamic computations over the Web. However, XML is by nature verbose, since terseness in XML markup was not considered a pressing issue from the design perspective. In practice, XML documents are usually large because they often contain much redundant data. The size problem hinders the adoption of XML, since it substantially increases the costs of data processing, data storage, and data exchange over the Web. Because common generic text compressors, such as Gzip, Bzip2, WinZip, PKZIP, or MPEG-7 (BiM), are not able to produce usable compressed XML data, many XML-specific compression technologies have recently been proposed. The essential idea of these technologies is that, by utilizing the exposed structure information in the input XML document during the compression process, they pursue two important goals at the same time. First, they aim at achieving a good compression ratio and compression time compared to the generic text compressors. Second, they aim at generating a compressed XML document that supports efficient evaluation of queries over the data. This paper surveys some of the adaptive compression techniques for XML, namely XMill, XPRESS, and XGrind.
Database and Expert Systems …, 2008
With the rapidly increasing popularity of XML as a data format, there is a large demand for efficient techniques in storing and querying XML documents. However XML is by nature verbose, due to repeatedly used tags that describe data. For this reason the storage requirements of XML can be excessive and lead to increased costs for data manipulation. Therefore, it seems natural to use compression techniques to increase the efficiency of storing and querying XML data. In this paper, we propose a new approach called SCQX for Storing, Compressing and Querying XML documents. This approach compresses the structure of an XML document based on exploiting repetitive consecutive tags in the structure, and then SCQX stores the compressed XML structure and the data separately in a robust storage structure that includes a set of access support structures to guarantee fast query performance. Moreover, SCQX supports querying of the compressed XML structure directly and efficiently without requiring decompression. An experimental evaluation on sets of XML data shows the effectiveness of our approach.
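The abstract above does not say how SCQX encodes repeated tags, so the following Python sketch only illustrates the general idea of exploiting repetitive consecutive tags: a run of identical tag names is collapsed into a single (tag, count) pair. The function names collapse_tags and expand_tags are hypothetical, and the access support structures mentioned in the abstract are not modeled.

```python
from itertools import groupby


def collapse_tags(tags):
    """Collapse each run of identical consecutive tag names into (tag, run_length)."""
    return [(tag, sum(1 for _ in run)) for tag, run in groupby(tags)]


def expand_tags(runs):
    """Rebuild the original tag sequence from (tag, run_length) pairs."""
    return [tag for tag, count in runs for _ in range(count)]


if __name__ == "__main__":
    structure = ["book", "author", "author", "author", "title"]
    runs = collapse_tags(structure)
    print(runs)
    assert expand_tags(runs) == structure   # the collapsed form loses no information
```

Because the run lengths are kept, a query that only needs to know whether or how often a tag occurs at a position could in principle be answered on the collapsed form without expanding it.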
XML is a standard for exchanging and presenting information on the Web because it makes data flexible in representation and easily portable. However, XML data is also recognized as verbose, since repeated tags and structures greatly increase the size of the data. This verbosity poses many challenges for conventional query processing and data exchange, and it increases the overhead on bandwidth- and memory-limited devices. XML compression and optimization are among the solutions to the verbosity problem. Many effective XML compressors, such as XMill, have been proposed to solve the data size problem, but they do not address the problem of running queries on compressed XML data. Other compressors have been proposed that can query compressed XML data. However, the compression ratio of these compressors is usually worse than that of XMill and that of the generic compressor gzip, while their query performance and the expressive power of the query language they support are inadequate. The objective of this work is twofold: first, to design and develop an XML compression method; second, to optimize existing XML compression methods. In addition, the increased size affects both query processing and data exchange, because XML files require much more storage space and network bandwidth.
XML has been acknowledged as the de facto standard for data representation and exchange over the World Wide Web. Being self-describing grants XML its great flexibility and wide acceptance, but it is also the cause of its main drawback: its large size. The large document size means that the amount of information that has to be transmitted, processed, stored, and queried is often larger than that of other data formats. Several XML compression techniques have been introduced to deal with these problems. In this paper, we provide a complete survey of the state of the art in XML compression techniques. In addition, we present an extensive experimental study of the available implementations of these techniques. We report the behavior of nine XML compressors using a large corpus of XML documents that covers the different natures and scales of XML documents. In addition to assessing and comparing the performance characteristics of the evaluated XML compression tools, the study also tries to assess the effectiveness and practicality of using these tools in the real world. Finally, we provide guidelines and recommendations to help developers and users select the most suitable XML compression tool for their needs.
Software: Practice and …, 2008
The innate verbosity of the Extensible Markup Language remains one of its main weaknesses, especially when large XML documents are concerned. This problem can be solved with the aid of XML-specialized compression algorithms.
2005
XML is a popular meta-language that facilitates the interchange and access of data. However, XML's verbose nature may increase the size of a data set as much as ten-fold. In this paper, we present a novel technique for lossless XML compression, called TREECHOP, which supports querying of compressed XML data without requiring full decompression. Unlike other query-capable XML compression schemes, TREECHOP requires only a single pass over the input document during the compression process, resulting in an efficient, online operation that is well-suited for transmission of compressed XML documents over a network.
The Extensible Markup Language (XML) has been acknowledged as the de facto standard for data exchange over the Web and for data representation. Its main drawback, however, is its large size. The large document size means that the amount of information that has to be stored, transmitted, and queried is often larger than that of other data formats. Several XML compression techniques have been introduced to deal with these problems. In this paper, we present an experimental study of available XML compression techniques and provide guidelines to help users select the most suitable XML compression tool according to their needs.
In the last few years, XML has become the standard for data interchange in most applications over the Web. Most enterprise-level applications and middleware software have adopted XML as their standard medium for data interchange, so in these days of Internet and Web technology the popularity of XML is huge. The power of XML lies in its self-describing abilities. This same ability makes XML verbose, introducing a significant amount of redundancy that adds no particular value. The increased size of XML documents places a burden on applications and on network bandwidth, so there is a clear requirement for a good and effective data compression algorithm for XML. Traditional ZIP utilities such as GZIP compress XML data into a non-readable format, which means the data cannot be queried until it is decompressed. In this paper, I present XLARGE, an efficient XML compression utility that I developed. XLARGE has shown a significant improvement in compression ratio over GZIP.
CAD Systems in …, 2007
The main drawback of the XML format seems to be its verbosity, a key problem especially in case of large documents. Therefore, efficient encoding of XML constitutes an important research issue. In this work, we describe a preprocessing transform meant to be used with popular LZ77-style compressors. We show experimentally that our transform, albeit quite simple, leads to better compression ratios than existing XML-aware compressors. Moreover, it offers high decoding speed, which often is of utmost priority.
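The abstract above does not describe the transform itself, so the following Python sketch uses a stand-in to show the general shape of a reversible preprocessing step placed in front of an LZ77-style compressor (here zlib, whose DEFLATE format is LZ77-based). The stand-in replaces every closing tag with the short marker </> and restores the names with a stack on the way back. It assumes well-formed XML with no comments, CDATA sections, or processing instructions, no whitespace inside closing tags, and no ">" inside attribute values; the function names are hypothetical.

```python
import re
import zlib

CLOSING_TAG = re.compile(r"</[A-Za-z_][\w.\-]*>")
# Matches either the short marker or an opening / self-closing tag.
MARKER_OR_OPEN = re.compile(r"</>|<([A-Za-z_][\w.\-]*)[^<>]*>")


def shorten_closing_tags(xml):
    """Forward transform: the element name in a closing tag is redundant, so drop it."""
    return CLOSING_TAG.sub("</>", xml)


def restore_closing_tags(transformed):
    """Inverse transform: track open elements on a stack to recover each closing name."""
    stack = []

    def fix(match):
        if match.group(0) == "</>":
            return "</" + stack.pop() + ">"
        if not match.group(0).endswith("/>"):   # self-closing tags open nothing
            stack.append(match.group(1))
        return match.group(0)

    return MARKER_OR_OPEN.sub(fix, transformed)


def deflated_size(text):
    """Size in bytes after zlib compression at the default level."""
    return len(zlib.compress(text.encode("utf-8")))


if __name__ == "__main__":
    xml = '<r><item id="1">a</item><item id="2"/><item>b</item></r>'
    transformed = shorten_closing_tags(xml)
    assert restore_closing_tags(transformed) == xml
    print(deflated_size(xml), deflated_size(transformed))
```

On a document this small the two sizes say little; any gain from such a transform would have to be measured on realistic inputs.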
IOSR Journal of Computer Engineering, 2013
This research work demonstrates the extraction, compression, and query processing of XML documents for adaptive compression techniques and efficient query evaluation. It proposes algorithms for XML compression and efficient query evaluation: a feasible XML compression scheme based on a data compression algorithm, and a query processor using SAX parsing and interfaces. It is shown that the proposed techniques for XML data compression pave the way for better compression and improve the compression ratio and performance of the compressor system.
International Journal of Database Management Systems, 2011
Lecture Notes in Computer Science, 2006
Lecture Notes in Computer Science, 2013
2010 International Conference on Computer Information Systems and Industrial Management Applications (CISIM), 2010
Communications in Computer and Information Science, 2011
Lecture Notes in Computer Science, 2009
Proceedings of the 12th International Conference on Web Information Systems and Technologies, 2016
Computing Research Repository, 2008
Knowledge Representation Meets Databases, 2001
Lecture Notes in Computer Science, 2009
International Journal of Data Warehousing and Mining, 2014