Due to its flexibility and ease of use, XML is now widely used in a vast number of application areas, and new information is increasingly being encoded as XML documents. It is therefore important to provide a repository for XML documents that supports efficient management and storage of XML data. Many proposals have been made for this purpose, the most common being node labeling schemes. On the other hand, XML repeatedly uses tags to describe the data itself. This self-describing nature makes XML verbose, so its storage requirements are often inflated and can be excessive. In addition, the increased size leads to increased costs for data manipulation. It therefore also seems natural to use compression techniques to increase the efficiency of storing and querying XML data. In our previous work, we aimed at combining the advantages of both areas (labeling and compaction technologies). Specifically, we took advantage of XML's structural peculiarities to reduce storage space requirements and to improve the efficiency of XML query processing using labeling schemes. In this paper, we continue our investigation of variations of binary string encoding forms to decrease the label size. We also report experimental results that examine the impact of binary string encoding on query performance and on the storage size needed to store the compacted XML documents.
Since the Extensible Markup Language (XML) became an official World Wide Web Consortium recommendation in 1998, it has emerged as the predominant mechanism for data storage and exchange, in particular over the World Wide Web. Due to its flexibility and ease of use, XML is now widely used in a vast number of application areas, and new information is increasingly being encoded as XML documents. Because of this widespread use and the large amounts of data represented in XML, it is important to provide a repository for XML documents that supports efficient management and storage of XML data. The logical structure of an XML document is an ordered tree of nodes, so establishing the relationships between nodes is essential for processing the structural part of queries, and tree navigation is essential for answering XML queries. Many proposals have been made for this purpose, the most common being node labeling schemes. On the other hand, XML repeatedly uses tags to describe the data itself. This self-describing nature makes XML verbose, so its storage requirements are often inflated and can be excessive. In addition, the increased size leads to increased costs for data manipulation. It therefore also seems natural to use compression techniques to increase the efficiency of storing and querying XML data. In our previous work, we aimed at combining the advantages of both areas (labeling and compaction technologies). Specifically, we took advantage of XML's structural peculiarities to reduce storage space requirements and to improve the efficiency of XML query processing using labeling schemes. In this paper, we continue our investigation of variations of binary string encoding forms to decrease the label size. We also report experimental results that examine the impact of binary string encoding on reducing the storage size needed to store the compacted XML documents.
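To make the role of node labels concrete, the following is a minimal sketch of how structural relationships can be decided from labels alone, without navigating the tree. It uses plain Dewey Order labels (tuples of sibling positions) as an assumption for illustration; it is not the labeling scheme or binary encoding studied in this paper, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Tuple

Label = Tuple[int, ...]  # Dewey-style label: one sibling position per tree level


def label_tree(root: ET.Element) -> Dict[Label, str]:
    """Assign a Dewey-style label to every element and map it to its tag."""
    labels: Dict[Label, str] = {}

    def visit(node: ET.Element, label: Label) -> None:
        labels[label] = node.tag
        for position, child in enumerate(node, start=1):
            visit(child, label + (position,))

    visit(root, (1,))
    return labels


def is_ancestor(a: Label, d: Label) -> bool:
    """a is an ancestor of d if a is a proper prefix of d."""
    return len(a) < len(d) and d[: len(a)] == a


def is_parent(p: Label, c: Label) -> bool:
    """p is the parent of c if c is p plus exactly one more component."""
    return len(c) == len(p) + 1 and c[:-1] == p


def precedes(a: Label, b: Label) -> bool:
    """Document order coincides with lexicographic order on the labels."""
    return a < b


labels = label_tree(ET.fromstring("<a><b><c/></b><d/></a>"))
```

With labels of this kind, ancestor, parent and document-order tests reduce to prefix and lexicographic comparisons, which is why label size directly affects both storage and query cost.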
Lecture Notes in Computer Science, 2009
Due to the growing popularity of XML as a data exchange and storage format, the need to develop efficient techniques for storing and querying XML documents has emerged. A common approach to achieve this is to use labeling techniques. However, their main problem is that they either do not support updating XML data dynamically or impose huge storage requirements. On the other hand, because the verbosity and redundancy of XML can lead to increased costs for processing XML documents, compaction of XML documents has become an increasingly important research issue. In this paper, we propose an approach called CXDLS that combines the strengths of both labeling and compaction techniques. Our approach exploits repetitive consecutive subtrees and tags to compact the structure of XML documents by taking advantage of the ORDPATH labeling scheme. In addition, it stores the compacted structure and the data values separately. Using our proposed approach, it is possible to support efficient query and update processing on compacted XML documents and to reduce storage space dramatically. Results of a comprehensive performance study are provided to show the advantages of CXDLS.
Proceedings of the 12th International Conference on Web Information Systems and Technologies, 2016
XML is the de facto standard for data representation and communication over the web, so there is a lot of interest in querying XML data. Most approaches require the data to be labelled to indicate structural relationships between elements. This is simple when the data does not change but complex when it does. In the day-to-day management of XML databases over the web, it is usual that more information is inserted over time than deleted. Frequent insertions can lead to large labels, which have a detrimental impact on query performance and can cause overflow problems. Many researchers have shown that prefix encoding usually gives the highest compression ratio in comparison to other encoding schemes. Nonetheless, none of the existing prefix encoding methods has been applied to XML labels. This research investigates compressing XML labels via different prefix-encoding methods in order to reduce the occurrence of overflow problems and improve query performance. The paper also presents a comparison of the performance of several prefix encodings in terms of encoding/decoding time and compressed code size.
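As an illustration of what prefix-encoding a label can look like, here is a minimal sketch that encodes the integer components of a label with Elias gamma codes. The abstract does not name the prefix codes it evaluates, so the choice of Elias gamma and the function names are assumptions made only for this example.

```python
from typing import Iterable, List


def gamma_encode(n: int) -> str:
    """Elias gamma code of a positive integer, returned as a bit string."""
    if n < 1:
        raise ValueError("Elias gamma is defined for positive integers only")
    binary = bin(n)[2:]                      # binary digits without the '0b' prefix
    return "0" * (len(binary) - 1) + binary  # unary length prefix, then the value


def encode_label(components: Iterable[int]) -> str:
    """Concatenate the gamma codes of all label components."""
    return "".join(gamma_encode(c) for c in components)


def decode_label(bits: str) -> List[int]:
    """Decode a well-formed concatenation of gamma codes back into integers."""
    out: List[int] = []
    i = 0
    while i < len(bits):
        zeros = 0
        while bits[i] == "0":                # count the unary length prefix
            zeros += 1
            i += 1
        out.append(int(bits[i : i + zeros + 1], 2))
        i += zeros + 1
    return out


encoded = encode_label((1, 5, 3))
```

Because no code word is a prefix of another, the components can be concatenated without separators and still be decoded unambiguously, and small component values get short codes.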
2007 2nd International Conference on Digital Information Management, 2007
In recent years, the method of assigning labels to the nodes of an XML tree has been attracting more attention. Various functions of an RDBMS can be easily utilized by storing the labeled XML documents in a relational database. However, in traditional labeling methods, a number of nodes need to be relabeled when the XML documents are updated. To address this problem, we proposed the DO-VLEI code, which combines the VLEI code with the Dewey Order method. The DO-VLEI code is effective in reducing the update cost, but the label size increases rapidly when handling large XML documents. To reduce the label size, we presented the Compressed-bit-string DO-VLEI (C-DO-VLEI) code. However, it is difficult to handle the length of C-DO-VLEI because it is a variable-length code. In this paper, we propose two effective methods, VLEI-ABL and VLEI-EOL, for handling the code length of C-DO-VLEI. We perform experiments to compare the storage consumption of the proposed methods with that of the previously known ORDPATH. The experimental results show that our methods considerably outperform ORDPATH.
Database and Expert Systems …, 2008
With the rapidly increasing popularity of XML as a data format, there is a large demand for efficient techniques for storing and querying XML documents. However, XML is by nature verbose, due to the repeatedly used tags that describe data. For this reason, the storage requirements of XML can be excessive and lead to increased costs for data manipulation. Therefore, it seems natural to use compression techniques to increase the efficiency of storing and querying XML data. In this paper, we propose a new approach called SCQX for Storing, Compressing and Querying XML documents. This approach compresses the structure of an XML document by exploiting repetitive consecutive tags in the structure. SCQX then stores the compressed XML structure and the data separately in a robust storage structure that includes a set of access support structures to guarantee fast query performance. Moreover, SCQX supports querying the compressed XML structure directly and efficiently without requiring decompression. An experimental evaluation on sets of XML data shows the effectiveness of our approach.
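The idea of exploiting repetitive consecutive tags can be illustrated with a simple run-length view of an element's children. This is only a sketch under that assumption; it is not the SCQX storage structure, and `compact_children` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET
from itertools import groupby
from typing import List, Tuple


def compact_children(node: ET.Element) -> List[Tuple[str, int]]:
    """Collapse each run of consecutive same-tag children into (tag, count)."""
    tags = (child.tag for child in node)
    return [(tag, sum(1 for _ in run)) for tag, run in groupby(tags)]


doc = ET.fromstring("<lib><book/><book/><book/><cd/><book/></lib>")
runs = compact_children(doc)
```

Only consecutive repetitions are merged, so sibling order is preserved and the original child sequence can be rebuilt by expanding each pair.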
2007
In order to efficiently determine structural relationships among XML elements and to avoid re-labeling for updates, much research on labeling schemes has recently been conducted. However, harmonious support of both efficient query processing and updating has not been achieved. In this paper, we propose an efficient XML encoding and labeling scheme, called EXEL, which is a variant of the region numbering scheme using bit strings. In order to generate the ordinal and insert-friendly bit strings in EXEL, a novel binary encoding method is devised. We also devise a labeling scheme for a newly inserted node that incurs no re-labeling of pre-existing labels. These encoding and inserting methods are the bases of efficient query processing and the complete avoidance of re-labeling for updates. Moreover, EXEL supports all structural relationships in XPath, and the relationships can be checked by SQL statements supported by an RDBMS. Finally, the experimental results show that EXEL provides fairly reasonable query processing performance while completely avoiding re-labeling for updates.
This has generated an increasing need for robust, high performance XML database systems, which are able to not only query and update XML data efficiently, but also store it in a compact representation. There have been many proposals to manage XML documents. However, two common strategies are available to provide robust storage and efficient query processing. The first is based on numbering schemes for gathering structural information from XML documents and storing it in such a way that allows quick identification of structural relationships between nodes. This identification plays a crucial role in efficient XML query processing. The second strategy tries to reduce the size of XML documents through compaction techniques. While a naive representation of XML documents leads to excessive redundancy, the compaction of XML documents not only reduces the amount of disk space occupied by the data, but also enhances query processing speed. The thesis presents different solutions for the eff...
XML is a standard for exchanging and presenting information on the Web because it makes data flexible in representation and easily portable. However, XML data is also recognized as verbose, since repeated tags and structures heavily increase the size of the data. This verbosity gives rise to many challenges for conventional query processing and data exchange, and it increases the overhead on bandwidth- and memory-limited devices. XML compression and optimization are among the solutions to the verbosity problem. Many effective XML compressors, such as XMill, have been proposed to solve the data size problem, but they do not address the problem of running queries on compressed XML data. Other compressors have been proposed to query compressed XML data. However, the compression ratio of these compressors is usually worse than that of XMill and that of the generic compressor gzip, while their query performance and the expressive power of the query language they support are inadequate. The main objective of this work is twofold: first, the design and development of an XML compression method, and second, the optimization of existing XML compression methods. In addition, the increased size affects both query processing and data exchange, since XML files require much more storage space and network bandwidth.
2012
Extensible Markup Language (XML) is proposed as a standardized data format designed for specifying and exchanging data on the Web. With the proliferation of mobile devices, such as palmtop computers, as a means of communication in recent years, it is reasonable to expect that in the foreseeable future, a massive amount of XML data will be generated and exchanged between applications in order to perform dynamic computations over the Web. However, XML is by nature verbose, since terseness in XML markup is not considered a pressing issue from the design perspective. In practice, XML documents are usually large in size as they often contain much redundant data. The size problem hinders the adoption of XML, since it substantially increases the costs of data processing, data storage, and data exchange over the Web. As the common generic text compressors, such as Gzip, Bzip2, WinZip, PKZIP, or MPEG-7 (BiM), are not able to produce usable XML compressed data, many XML-specific compression technologies have recently been proposed. The essential idea of these technologies is that, by utilizing the exposed structure information in the input XML document during the compression process, they pursue two important goals at the same time. First, they aim at achieving a good compression ratio and time compared to the generic text compressors. Second, they aim at generating a compressed XML document that is able to support efficient evaluation of queries over the data. This paper presents a survey of some of the adaptive compression techniques for XML, namely XMill, XPRESS, and XGrind.
XML has been acknowledged as the de facto standard for data representation and exchange over the World Wide Web. Being self-describing grants XML its great flexibility and wide acceptance, but it is also the cause of its main drawback, namely its huge size. The huge document size means that the amount of information that has to be transmitted, processed, stored, and queried is often larger than that of other data formats. Several XML compression techniques have been introduced to deal with these problems. In this paper, we provide a complete survey of the state of the art of XML compression techniques. In addition, we present an extensive experimental study of the available implementations of these techniques. We report the behavior of nine XML compressors using a large corpus of XML documents that covers the different natures and scales of XML documents. In addition to assessing and comparing the performance characteristics of the evaluated XML compression tools, the study also tries to assess the effectiveness and practicality of using these tools in the real world. Finally, we provide some guidelines and recommendations to help developers and users make an effective decision when selecting the most suitable XML compression tool for their needs.
The Extensible Markup Language (XML) has been acknowledged as the de facto standard for data exchange and data representation over the web. Its main drawback, however, is its huge size. The huge document size means that the amount of information that has to be stored, transmitted, and queried is often larger than that of other data formats. Several XML compression techniques have been introduced to deal with these problems. In this paper, we present an experimental study of available XML compression techniques and provide guidelines to help users make an effective decision when selecting the most suitable XML compression tool according to their needs.
Advances in Databases and Information Systems, 2007
This paper describes a new XML compression scheme that offers both high compression ratios and short query response time. Its core is a fully reversible transform featuring substitution of every word in an XML document using a semi-dynamic dictionary, effective encoding of dictionary indices, as well as numbers, dates and times found in the document, and grouping data within the same structural context in individual containers. The results of conducted tests show that the proposed scheme attains compression ratios rivaling the best available algorithms, and fast compression, decompression, and query processing.
Journal of Software, 2011
Recently, researchers have proposed a number of labeling schemes. Among these, the approaches that can extract structural information between nodes and process queries efficiently stand out. However, most of these labeling schemes do not support update operations well. To make updates easier, some methods keep intervals between label numbers, but this requires complete relabeling when the intervals are used up. Several labeling schemes support dynamic XML documents, but most of them allow only leaf node insertions. OrdPathX supports both leaf node insertions and internal node insertions. Inspired by the way OrdPathX inserts internal nodes, and extending the C-DO-VLEI code, in this paper we propose the two dimensions VLEI code. We discuss how this labeling scheme labels nodes and how the structural information of nodes can be obtained from their labels. We design experiments to evaluate the efficiency of producing labels, the storage consumption, and the query performance of the proposed two dimensions VLEI code, and compare them with those of OrdPathX.
Journal of Computer Science, 2010
Problem statement: In order to facilitate XML query processing, labeling schemes are used to determine the structural relationships between XML nodes. However, labeling schemes have to relabel the existing nodes or recalculate the label values when a new node is inserted into the XML document during the XML update process. EXEL, as a labeling scheme, is able to avoid relabeling existing nodes during the XML update process. It is also able to compute the structural relationship between nodes effectively. However, in the case of skewed insertions, where nodes are always inserted at a fixed place, the label size of the EXEL scheme increases very fast. Approach: This study discussed how to control the increment of label size for the EXEL scheme. In addition, EXEL does not consider the process of deleting labels. We also studied how to reuse the deleted labels for future label insertions. Results: We proposed an algorithm that is able to control the label size increment. Conclusion: It required less storage to store the inserted binary bit strings and thus can improve query performance.
Lecture Notes in Computer Science, 2009
Recently, labeling methods to extract and reconstruct the structural information of XML data, which are important for many applications such as XPath query and keyword search, have been attracting more attention. To achieve efficient structural information extraction, in this paper we propose the C-DO-VLEI code, a novel update-friendly bit-vector encoding scheme based on register-length bit operations combined with the properties of Dewey Order numbers, which cannot be implemented in other relevant existing schemes such as ORDPATH. Meanwhile, the proposed method also achieves lower storage consumption because it requires neither a prefix schema nor any reserved codes for node insertion. We performed experiments to evaluate and compare the performance and storage consumption of the proposed method with those of the ORDPATH method. Experimental results show that the execution times for extracting depth information and parent node labels using the C-DO-VLEI code are about 25% and 15% less, respectively, and the average label size using the C-DO-VLEI code is about 24% smaller, compared with ORDPATH.
Journal of Systems and Software, 2009
In this paper, we propose an efficient encoding and labeling scheme for XML, called EXEL, which is a variant of the region labeling scheme using ordinal and insert-friendly bit strings. We devise a binary encoding method to generate the ordinal bit strings, and an algorithm to insert a new bit string between existing bit strings without any influence on the order of preexisting bit strings. This binary encoding method and bit string insertion algorithm are the bases of the efficient query processing and the complete avoidance of re-labeling for updates. We present query processing and update processing methods based on EXEL. In addition, the Stack-Tree-Desc algorithm is used for an efficient structural join, and String B-tree indexing is utilized to improve the join performance. Finally, the experimental results show that EXEL enables complete avoidance of re-labeling for updates while providing fairly reasonable query processing performance.
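The following sketch shows one well-known way to insert a new bit string between two existing ones without touching either, assuming labels are bit strings that end in "1" and are compared lexicographically. It is offered as an illustration of the general idea and is not necessarily EXEL's exact algorithm; `insert_between` is a hypothetical name.

```python
from typing import Optional


def insert_between(left: Optional[str], right: Optional[str]) -> str:
    """Return a bit string ending in '1' that sorts strictly between left and right.

    Assumes both bounds (when given) end in '1' and that left < right
    lexicographically. None means there is no bound on that side.
    """
    if left is None and right is None:
        return "1"
    if right is None:
        return left + "1"            # extend the left bound
    if left is None or len(left) < len(right):
        return right[:-1] + "01"     # step just below the right bound
    return left + "1"                # step just above the left bound


a, b = "01", "1"
middle = insert_between(a, b)
deeper = insert_between("1", "1011")
```

Python compares strings lexicographically, so `a < middle < b` can be checked directly. Repeated insertions at the same position make the new strings longer, which is the label growth under skewed insertions that several of the abstracts above discuss.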
IJEIT ON ENGINEERING AND INFORMATION TECHNOLOGY, 2023
Nowadays, Extensible Markup Language (XML) is a dominant technology for formatting and exchanging data across the Internet. Updating and retrieving massive amounts of XML data is an interesting and active research area. In addition, indexing XML data is a significant task for improving the efficiency of XML queries. Node labelling is the technique used to index XML data efficiently. Many labelling schemes have been proposed. However, these schemes have many limitations and shortcomings. Therefore, this paper proposes a new XML labelling scheme that addresses the efficiency of XML query performance. Four experiments were designed in order to evaluate the scheme. The results of these experiments suggest that the proposed scheme achieved the target results and showed an improvement in the performance and efficiency of labelling XML documents.
Lecture Notes in Computer Science, 2013
Exploiting large volumes of XML (eXtensible Markup Language) data with limited storage space requires the development of a special and reliable treatment to compress the data and query it. This work studies these processes in order to combine them via a mediator, while facilitating the querying of compressed XML data without recourse to decompression. We propose a new technique to compress, re-index and query XML data that improves on the XMill and B+Tree algorithms. We show the reliability and the speed-up of the proposed querying system in terms of response time and answer accuracy.
2011
In this paper, we report experimental results of our approach for retrieval over large-scale XML collections, which aims to improve both the efficiency and the effectiveness of XML retrieval. We propose a new XML compression algorithm that supports Absolute Document XPath Indexing and the Score Sharing Algorithm through a Top-Down Scheme approach. We found that these steps reduce the size of the data by 91.87% compared to GPX, and reduce the Score Sharing processing time to 44.18% of what it was before the compression. In terms of processing time, our system required an average of one second per topic on INEX-IEEE and an average of ten seconds per topic on INEX-Wiki, which is better than the GPX system. In addition, we give a comprehensive description of our XML retrieval system, with performance experiments on large-scale corpora from the INEX collections.
2012
XML is recognized as a standard for data storage and exchange for web applications. This is because it has certain unique features: it is self-describing, it is extensible, and it is stored in the form of a text document. In spite of these features, XML has an inherent limitation of verbosity. Because of the strong presence of XML in database technology and its inherent verbosity, there is an ever-increasing need to design compact storage for XML that can be effectively utilized for efficient indexing and querying of XML. The proposed technique creates a structure index, which is a compact summarization of the XML document, and a data index, which groups and stores the contents of all similar paths in one place. Based on this compact storage, a novel query algorithm is proposed that can answer XPath queries very efficiently. This approach dramatically reduces the storage requirement for XML, coupled with efficient processing of XPath queries. The implementation of this technique and a comparison with other techniques confirm our claim.