2014, International Journal of Data Warehousing and Mining
In this paper, the authors present an approach to efficiently compress XML OLAP cubes. They propose a multidimensional snowflake schema of the cube as the basic physical configuration. The cube is then composed of one XML fact document and as many XML documents as there are dimension hierarchy members. The basic configuration is reorganized in two ways by deliberately adding data redundancy, in order to achieve a better compression ratio on the one hand and to improve query response time on the other. In the second configuration, all the documents of the cube are merged into a single XML document. In the third configuration, each reference between the fact and the dimensions, or between the members of a dimension hierarchy, is replaced by the whole referenced XML fragment. The authors apply a new compression technique named XCC to the three physical configurations of the cube. They demonstrate the efficiency of the third configuration before and after compression, and they also show the efficiency of their compression technique when applied to XML OLAP cubes.
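The third configuration described above (replacing each reference with the referenced fragment) can be illustrated with a minimal sketch. The element names, the `id`/`ref` attributes, and the helpers `index_by_id` and `inline_refs` are assumptions made for illustration only; the abstract does not specify the cube's actual XML vocabulary, and this is not the authors' implementation.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical snowflake-style cube: facts reference products, products reference categories.
FACTS = """<facts>
  <fact><product ref="p1"/><amount>10</amount></fact>
  <fact><product ref="p2"/><amount>7</amount></fact>
  <fact><product ref="p1"/><amount>3</amount></fact>
</facts>"""
PRODUCTS = """<products>
  <product id="p1"><name>Pen</name><category ref="c1"/></product>
  <product id="p2"><name>Ink</name><category ref="c1"/></product>
</products>"""
CATEGORIES = """<categories>
  <category id="c1"><name>Stationery</name></category>
</categories>"""


def index_by_id(xml_text):
    """Map each member's id attribute to its element."""
    return {el.get("id"): el for el in ET.fromstring(xml_text)}


def inline_refs(element, lookup):
    """Replace every child that carries a ref attribute with a deep copy
    of the referenced fragment, then recurse into the result."""
    for i, child in enumerate(list(element)):
        ref = child.get("ref")
        if ref is not None:
            fragment = copy.deepcopy(lookup[child.tag][ref])
            fragment.tail = child.tail
            element[i] = fragment
            child = fragment
        inline_refs(child, lookup)


lookup = {"product": index_by_id(PRODUCTS), "category": index_by_id(CATEGORIES)}
facts = ET.fromstring(FACTS)
inline_refs(facts, lookup)
print(ET.tostring(facts, encoding="unicode"))
```

The result is deliberately redundant: the same product and category fragments are repeated under every fact. That repetition removes the need to follow references at query time and gives a compressor long repeated substrings to exploit.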
Lecture Notes in Computer Science, 2004
Hierarchical clustering has been proved an effective means for physically organizing large fact tables since it reduces significantly the I/O cost during ad hoc OLAP query evaluation. In this paper, we propose a novel multidimensional file structure for organizing the most detailed data of a cube, the CUBE File. The CUBE File achieves hierarchical clustering of the data, enabling fast access via hierarchical restrictions. Moreover, it imposes a low storage cost and adapts perfectly to the extensive sparseness of the data space achieving a high compression rate. Our results show that the CUBE File outperforms the most effective method proposed up to now for hierarchically clustering the cube, resulting in 7-9 times less I/Os on average for all workloads tested. Thus, it achieves a higher degree of hierarchical clustering. Moreover, the CUBE File imposes a 2-3 times lower storage cost.
2011 IEEE 12th International Conference on Mobile Data Management, 2011
In this paper, we present a complete demonstration of Hand-OLAP, a Java-based distributed system that relies on intelligent data cube compression techniques for effectively and efficiently supporting OLAP in mobile environments. Hand-OLAP is based on an innovative systematic technique according to which first a two-dimensional OLAP view of interest is extracted from the target multidimensional data cube via the so-called OLAP dimension flattening process, and then this view is compressed by means of a meaningful semantics-based data cube compression approach. The compressed two-dimensional view is finally delivered to mobile devices, and used to support interactive OLAP exploration and querying tasks in an off-line manner.
2012
Extensible Markup Language (XML) is proposed as a standardized data format designed for specifying and exchanging data on the Web. With the proliferation of mobile devices, such as palmtop computers, as a means of communication in recent years, it is reasonable to expect that in the foreseeable future a massive amount of XML data will be generated and exchanged between applications in order to perform dynamic computations over the Web. However, XML is by nature verbose, since terseness in XML markup is not considered a pressing issue from the design perspective. In practice, XML documents are usually large because they often contain much redundant data. This size problem hinders the adoption of XML, since it substantially increases the costs of data processing, data storage, and data exchange over the Web. As the common generic text compressors, such as Gzip, Bzip2, WinZip, PKZIP, or MPEG-7 (BiM), are not able to produce usable compressed XML data, many XML-specific compression technologies have recently been proposed. The essential idea of these technologies is that, by utilizing the exposed structure information in the input XML document during the compression process, they pursue two important goals at the same time. First, they aim at achieving a good compression ratio and time compared to the generic text compressors. Second, they aim at generating a compressed XML document that is able to support efficient evaluation of queries over the data. This paper surveys some of the adaptive compression techniques for XML, namely XMill, XPRESS, and XGrind.
Database and Expert Systems …, 2008
With the rapidly increasing popularity of XML as a data format, there is a large demand for efficient techniques in storing and querying XML documents. However XML is by nature verbose, due to repeatedly used tags that describe data. For this reason the storage requirements of XML can be excessive and lead to increased costs for data manipulation. Therefore, it seems natural to use compression techniques to increase the efficiency of storing and querying XML data. In this paper, we propose a new approach called SCQX for Storing, Compressing and Querying XML documents. This approach compresses the structure of an XML document based on exploiting repetitive consecutive tags in the structure, and then SCQX stores the compressed XML structure and the data separately in a robust storage structure that includes a set of access support structures to guarantee fast query performance. Moreover, SCQX supports querying of the compressed XML structure directly and efficiently without requiring decompression. An experimental evaluation on sets of XML data shows the effectiveness of our approach.
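The idea of exploiting repetitive consecutive tags and storing structure apart from data can be sketched as follows. This is a simplified illustration under stated assumptions, not the SCQX algorithm itself: it collapses consecutive sibling subtrees with identical structure into a run with a count, and the names `compact_structure` and `split_document` are hypothetical.

```python
import xml.etree.ElementTree as ET
from itertools import groupby


def compact_structure(element):
    """Return (tag, runs), where runs is a list of (child_structure, count)
    pairs and consecutive children with identical structure share one run."""
    children = [compact_structure(child) for child in element]
    runs = [(struct, len(list(group))) for struct, group in groupby(children)]
    return (element.tag, runs)


def split_document(xml_text):
    """Separate the compacted structure from the text values (document order)."""
    root = ET.fromstring(xml_text)
    structure = compact_structure(root)
    values = [el.text.strip() for el in root.iter() if el.text and el.text.strip()]
    return structure, values


doc = ("<lib><book><t>A</t><a>X</a></book>"
       "<book><t>B</t><a>Y</a></book><dvd>Z</dvd></lib>")
structure, values = split_document(doc)
print(structure)  # the two <book> subtrees are stored once with a count of 2
print(values)
```

A real system would also need access structures that map a position in the compacted structure back to the right entries in the value store; the sketch omits these.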
2007 IEEE 23rd International Conference on Data Engineering, 2007
With increasing amounts of data being exchanged and even generated or stored in XML, a natural question is how to perform OLAP on XML data, which can be structurally heterogeneous (e.g., parse trees) and/or marked-up text documents. A core operator for OLAP is the data cube. While the relational cube can be extended in a straightforward way to XML, we argue such an extension would not address the specific issues posed by XML. While in a relational warehouse, facts are flat records and dimensions may have hierarchies, in an XML warehouse, both facts and dimensions may be hierarchical. Second, XML is flexible: (a) an element may have missing or repeated subelements; (b) different instances of the same element type may have different structure. We identify the challenges introduced by these features of XML for cube definition and computation. We propose a definition for cube adapted for XML data warehouse, including a suitably generalized specification mechanism. We define a cube lattice over the aggregates so defined. We then identify properties of this cube lattice that can be leveraged to allow optimized computation of the cube. Finally, we present the results of an extensive performance evaluation experiment gauging the behavior of alternative algorithms for cube computation.
2009
The explosion in data volume and the variety of data sources pose problems for storing data in heterogeneous repositories for online analytical processing (OLAP). Over the years, Extensible Markup Language (XML) has emerged as a standard language for integrating and exchanging structured and semi-structured data from heterogeneous and miscellaneous data sources.
Information analytical systems provide data warehouse analysis to users at all levels, based on data from operational systems. In big organizations such as banks, information analytical systems process and generate large amounts of information daily. The databases of these organizations hold a great deal of information, comprising financial, official, and accounting transactions and so on. Compressing these data has therefore always been important, and it is known to be a serious challenge. Several algorithms and various methods have been proposed to reach this goal, but improving the compression achieved by these methods remains important and is still progressing. Besides compression, security, that is, ensuring that this information is unavailable to unknown users, is an important issue. In recent years, however, no secure compression method has been provided for the data of information analytical systems. Moreover, with the advent of cloud infrastructure and the migration of traditional storage systems to cloud storage servers, we can use their benefits, such as scalability, flexibility, cost reduction, and access from everywhere. In this paper we use the XCC Advanced algorithm for compression in cloud infrastructure to store XML OLAP documents in the cloud. To improve security and keep the data unavailable to unknown users, we also use secure encryption for data transactions. The results show that the new method can provide a proportional structure for storing XML OLAP documents in a cloud data center and compress them optimally, which improves the speed of data transfer and reduces costs.
This has generated an increasing need for robust, high performance XML database systems, which are able to not only query and update XML data efficiently, but also store it in a compact representation. There have been many proposals to manage XML documents. However, two common strategies are available to provide robust storage and efficient query processing. The first is based on numbering schemes for gathering structural information from XML documents and storing it in such a way that allows quick identification of structural relationships between nodes. This identification plays a crucial role in efficient XML query processing. The second strategy tries to reduce the size of XML documents through compaction techniques. While a naive representation of XML documents leads to excessive redundancy, the compaction of XML documents not only reduces the amount of disk space occupied by the data, but also enhances query processing speed. The thesis presents different solutions for the eff...
Proceedings of the 5th ACM international workshop on Data Warehousing and OLAP - DOLAP '02, 2002
On-Line Analytical Processing (OLAP) is a powerful method for analysing large volumes of data warehouse data. Typically, the data for an OLAP database is collected from a set of data repositories such as operational databases. This data set is often huge, and it may not be known in advance what data is required and when to perform the desired data analysis tasks. Sometimes some parts of the data are only needed occasionally. Therefore, keeping the OLAP database constantly up-to-date is not only a highly demanding task but may also be overkill in practice.
Advances in Databases and Information Systems, 2007
This paper describes a new XML compression scheme that offers both high compression ratios and short query response time. Its core is a fully reversible transform featuring substitution of every word in an XML document using a semi-dynamic dictionary, effective encoding of dictionary indices, as well as numbers, dates and times found in the document, and grouping data within the same structural context in individual containers. The results of conducted tests show that the proposed scheme attains compression ratios rivaling the best available algorithms, and fast compression, decompression, and query processing.
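The word-substitution step can be illustrated with a minimal reversible transform. This sketch makes several simplifying assumptions that are not in the abstract: the dictionary is built statically in a first pass over the document rather than semi-dynamically, frequent words simply get smaller indices, and the special encoding of numbers, dates, and times as well as the grouping into containers are left out. The function names are hypothetical.

```python
import re
from collections import Counter


def build_dictionary(text):
    """List the distinct words, most frequent first (ties broken alphabetically)."""
    counts = Counter(re.findall(r"\w+", text))
    return sorted(counts, key=lambda word: (-counts[word], word))


def encode(text, dictionary):
    """Replace every word with its dictionary index; keep markup and spacing as-is."""
    index = {word: i for i, word in enumerate(dictionary)}
    tokens = re.findall(r"\w+|\W+", text)
    return [index[tok] if tok in index else tok for tok in tokens]


def decode(encoded, dictionary):
    """Invert encode(): integers are looked up, strings are copied through."""
    return "".join(dictionary[t] if isinstance(t, int) else t for t in encoded)


xml_text = "<item><name>pen</name></item><item><name>ink</name></item>"
dictionary = build_dictionary(xml_text)
encoded = encode(xml_text, dictionary)
assert decode(encoded, dictionary) == xml_text  # the transform is fully reversible
print(dictionary)
print(encoded)
```

Because frequent words receive small indices, a later entropy or variable-length coding stage can represent them in fewer bits.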
Lecture Notes in Computer Science, 2009
Due to the growing popularity of XML as a data exchange and storage format, the need to develop efficient techniques for storing and querying XML documents has emerged. A common approach to achieve this is to use labeling techniques. However, their main problem is that they either do not support updating XML data dynamically or impose huge storage requirements. On the other hand, with the verbosity and redundancy problem of XML, which can lead to increased cost for processing XML documents, compaction of XML documents has become an increasingly important research issue. In this paper, we propose an approach called CXDLS combining the strengths of both, labeling and compaction techniques. Our approach exploits repetitive consecutive subtrees and tags for compacting the structure of XML documents by taking advantage of the ORDPATH labeling scheme. In addition it stores the compacted structure and the data values separately. Using our proposed approach, it is possible to support efficient query and update processing on compacted XML documents and to reduce storage space dramatically. Results of a comprehensive performance study are provided to show the advantages of CXDLS.
Background: Bioinformatics data describe the chemical interactions or protein structures that form the genes in living bodies. The best way to make these data easy to store, transmit, retrieve, and unify is to represent them in XML. However, XML documents suffer from high redundancy in their structure. Objective: This paper presents a new XML compressor (BioXComp) that compresses Bioinformatics XML documents (Bio-XML) and retrieves information from the compressed XML documents without the need to decompress them. The proposed algorithm depends on generating the Structure Indexed Tree (SIT) for the Bio-XML document and using it for retrieval according to different types of queries. Results: BioXComp achieves a 68.7% compression ratio and retrieves information based on different kinds of XQuery queries. Conclusion: As the importance of using XML documents to represent Bioinformatics data increases, the need to compress these data increases as well, so that the documents can be transmitted using less bandwidth. Instead of decompressing these files each time the user needs to retrieve a specific portion of them, BioXComp provides a way to retrieve the information from the compressed data using different types of XQuery queries.
2005
This paper proposes the compression of data in Relational Database Management Systems (RDBMS) using existing text compression algorithms. Although the technique proposed is general, we believe it is particularly advantageous for the compression of medium size and large dimension tables in data warehouses. In fact, dimensions usually have a high number of text attributes and a reduction in their size has a big impact in the execution time of queries that join dimensions with fact tables. In general, the high complexity and long execution time of most data warehouse queries make the compression of dimension text attributes (and possible text attributes that may exist in the fact table, such as false facts) an effective approach to speed up query response time. The proposed approach has been evaluated using the well-known TPC-H benchmark and the results show that speed improvements greater than 40% can be achieved for most of the queries.
Proc. DMDW, 2003
Abstract. On-Line Analytical Processing (OLAP) is a powerful method for analysing large warehouse data. Typically, the data for an OLAP database is collected from a set of data repositories such as e.g. operational databases. This data set is often huge, and it may not be known in ...
IOSR Journal of Computer Engineering, 2013
This research work demonstrates the extraction, compression, and query processing of XML documents using adaptive compression techniques and efficient query evaluation. Proposed here are algorithms for XML compression and efficient query evaluation: feasible XML compression using a data compression algorithm, and a query processor using SAX parsing and interfaces. It is shown that using the proposed techniques for XML data compression will pave the way for better compression and improve the compression ratio and performance of the compressor system.
cs.uic.edu
XML has grown into a widely used and highly developed technology, due in part to the subcomponents built around the technology (advanced parsers, frameworks, libraries, etc). The use of XML reduces development time and increases the robustness of distributed applications. Due to these ...
Multidimensional arrays are widely used in a large number of scientific research areas. It is necessary to develop a suitable scheme to compress a multidimensional array efficiently so that it takes comparatively little memory. In this paper we propose a new scheme, Extendible Array based Compressed Row Storage (EXCRS), to compress large multidimensional arrays. The main idea of this scheme is to compress a multidimensional array using the CRS method. In this scheme, multidimensional data represented as a set of two-dimensional arrays can be extended in any direction at any time. The subarrays obtained from the EA (Extendible Array) are then compressed using the CRS method. To evaluate our proposed scheme, we implemented traditional compression schemes such as Bitmap, Header, CRS/CCS, and Extendible Array (EA) with the MS Visual C++ 6.0 compiler. The experimental results show that EACRS outperforms Bitmap, Header, and CRS.
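The CRS (Compressed Row Storage) building block mentioned above can be sketched for a single two-dimensional subarray. This shows only standard CRS on one dense list-of-lists input; the extendible-array bookkeeping of the proposed scheme is not covered, and the function names are chosen for illustration.

```python
def crs_compress(matrix):
    """Compress a 2D array into CRS form: non-zero values, their column
    indices, and row_ptr, where row i occupies values[row_ptr[i]:row_ptr[i + 1]]."""
    values, col_idx, row_ptr = [], [], [0]
    for row in matrix:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)
                col_idx.append(j)
        row_ptr.append(len(values))
    return values, col_idx, row_ptr


def crs_get(crs, i, j):
    """Read element (i, j) directly from the compressed form."""
    values, col_idx, row_ptr = crs
    for k in range(row_ptr[i], row_ptr[i + 1]):
        if col_idx[k] == j:
            return values[k]
    return 0


sparse = [
    [0, 5, 0],
    [0, 0, 0],
    [7, 0, 9],
]
crs = crs_compress(sparse)
print(crs)  # ([5, 7, 9], [1, 0, 2], [0, 1, 1, 3])
```

Storage drops from rows × columns cells to roughly two entries per non-zero value plus one pointer per row, which pays off when the array is sparse.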
XML is a standard for exchanging and presenting information on the Web because it makes data flexible in representation and easily portable. However, XML data is also recognized as verbose, since repeated tags and structures heavily increase the size of the data. This verbosity poses many challenges for conventional query processing and data exchange, and it increases the overhead on bandwidth- and memory-limited devices. XML compression and optimization are among the solutions to the verbosity problem. Many effective XML compressors, such as XMill, have been proposed to solve the data size problem, but they do not address the problem of running queries on compressed XML data. Other compressors have been proposed to query compressed XML data. However, the compression ratio of these compressors is usually worse than that of XMill and that of the generic compressor gzip, while their query performance and the expressive power of the query language they support are inadequate. The main objective of this work is twofold: first, the design and development of an XML compression method, and second, the optimization of existing XML compression methods. In addition, the increased size affects both query processing and data exchange: XML files require a lot more storage space and network bandwidth.
2010
In this paper, we provide further extensions of Hand-OLAP, a Java-based distributed system for enabling OLAP in mobile environments via intelligent data cube compression approaches. These extensions aim at integrating innovative semantics representation and management models within compressed OLAP views, in order to improve the data cube compression process itself, and to support an improved summarized, OLAP-like knowledge fruition from multidimensional data cubes throughout mobile devices. We complete our analytical contribution by means of an experimental evaluation of the novel semantics-based data cube compression approach on well-known benchmark data cubes, which definitely confirms to us the efficiency and the reliability of our proposed research.
XML has gained prominence as a data storage and exchange format for web applications. This is because of certain features that are unique to XML, such as self-description, extensibility, and non-proprietary text document storage. In spite of all these unique features, XML has an inherent limitation: verbosity. This size problem should be dealt with efficiently so that good compression is achieved and, at the same time, the compressed data is directly queryable, i.e. it should not require decompression at query time. The proposed technique creates a new query engine based on novel three-dimensional indexes consisting of a structure index, an attribute index, and a content index. The structure index consists of all unique root-to-leaf paths of the XML document. The content index stores the contents path-wise, i.e. all the contents of one particular path class are stored in one file, and the attribute index is created in a manner similar to the content index. Based on this three-dimensional compact storage, a new query engine is proposed which can answer XPath queries very efficiently. This approach dramatically reduces the storage requirement for XML, coupled with efficient processing of XPath queries.
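The three indexes described above can be sketched in a few lines. This is an illustration under assumptions, not the paper's storage layout: the indexes are kept in in-memory dictionaries rather than one file per path class, paths are written as simple `/a/b/c` strings, and `build_indexes` is a hypothetical name.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def build_indexes(xml_text):
    """Return (structure, content, attributes):
    structure  - unique root-to-leaf paths in first-seen order
    content    - leaf path -> list of text values, in document order
    attributes - attribute path -> list of attribute values"""
    root = ET.fromstring(xml_text)
    structure = []
    content = defaultdict(list)
    attributes = defaultdict(list)

    def walk(element, prefix):
        path = f"{prefix}/{element.tag}"
        for name, value in element.attrib.items():
            attributes[f"{path}/@{name}"].append(value)
        if len(element) == 0:  # leaf element
            if path not in content:
                structure.append(path)
            content[path].append((element.text or "").strip())
        for child in element:
            walk(child, path)

    walk(root, "")
    return structure, dict(content), dict(attributes)


doc = ('<lib><book id="1"><title>A</title></book>'
       '<book id="2"><title>B</title></book></lib>')
structure, content, attributes = build_indexes(doc)
print(structure)   # ['/lib/book/title']
print(content)     # {'/lib/book/title': ['A', 'B']}
print(attributes)  # {'/lib/book/@id': ['1', '2']}
```

A simple path query such as `/lib/book/title` then becomes a dictionary lookup in the content index, with no need to rebuild the original document.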