2005, Lecture Notes in Computer Science
11 pages
A large number of XML documents have recently become available on the Internet. This trend has motivated many researchers to analyze them multidimensionally, in the same way as relational data. In this paper, we propose a new framework for multidimensional analysis of XML documents, which we call XML-OLAP. We base XML-OLAP on XML warehouses in which both fact data and dimension data are stored as XML documents. We build XML cubes from XML warehouses. We propose a new multidimensional expression language for XML cubes, which we call XML-MDX. XML-MDX statements target XML cubes and use XQuery expressions to designate the measure data. They specify text mining operators for aggregating the text that constitutes the measure data. We evaluate XML-OLAP by applying it to a U.S. patent XML warehouse, using XML-MDX queries that demonstrate that XML-OLAP is effective for multidimensional analysis of U.S. patents.
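The idea of grouping XML facts by dimensions and aggregating a text measure can be illustrated with a minimal Python sketch. It is not XML-MDX and not the authors' implementation: the element names (`patent`, `year`, `class`, `abstract`), the sample data, and the `top_keywords` operator (a simple most-frequent-terms count standing in for a text mining operator) are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict

# Hypothetical fact documents; element names and contents are illustrative only.
FACTS = """
<patents>
  <patent><year>2003</year><class>database</class><abstract>xml query index</abstract></patent>
  <patent><year>2003</year><class>database</class><abstract>xml cube query</abstract></patent>
  <patent><year>2004</year><class>network</class><abstract>packet routing</abstract></patent>
</patents>
"""

def top_keywords(texts, k=2):
    """Stand-in text aggregation: the k most frequent terms in one cell."""
    counts = Counter(word for text in texts for word in text.split())
    # Sort by descending count, then alphabetically, so ties are stable.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:k]]

def text_cube(xml_text, dims, measure):
    """Group fact elements by dimension sub-elements and aggregate the text measure."""
    root = ET.fromstring(xml_text)
    cells = defaultdict(list)
    for fact in root:
        key = tuple(fact.findtext(d) for d in dims)
        cells[key].append(fact.findtext(measure, default=""))
    return {key: top_keywords(texts) for key, texts in cells.items()}

cube = text_cube(FACTS, dims=["year", "class"], measure="abstract")
```

Here each cell of `cube` is keyed by a (year, class) pair and holds keywords rather than a numeric sum, which is the main departure from a relational cube.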
Concepts and Competitive Analytics, 2009
With the emergence of semi-structured data formats (such as XML), the storage of documents in centralised facilities appeared as a natural adaptation of data warehousing technology. Nowadays, OLAP (On-Line Analytical Processing) systems face a growing volume of non-numeric data. This chapter presents a framework for the multidimensional analysis of textual data in an OLAP sense. Document structure, metadata, and contents are converted into subjects of analysis (facts) and analysis axes (dimensions) within an adapted conceptual multidimensional schema. This schema represents the concepts that a decision maker can manipulate to express analyses. This broadens the possibilities for multidimensional analysis, as a user may gain insight into a collection of documents.
… Journal of Data …, 2006
Nowadays, most organizations deal with complex data having different formats and coming from different sources. The XML formalism is evolving and becoming a promising solution for modelling and warehousing these data in decision support systems. Nevertheless, classical OLAP tools are still not capable of analyzing such data. In this paper, we combine OLAP and data mining to support advanced analysis of complex data. We provide a generalized OLAP operator, called OpAC, based on agglomerative hierarchical clustering (AHC). OpAC is suited to all types of data since it deals with data cubes modelled in XML. Our operator produces meaningful aggregates of facts that express semantic similarities. Criteria for evaluating the partitions of aggregates are proposed to help choose the best partition. Furthermore, we developed a Web application for our operator. We also provide performance experiments and conduct a case study on XML documents from the breast cancer research domain.
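As a rough illustration of aggregating facts by similarity rather than by a fixed hierarchy, the following Python sketch runs a small agglomerative hierarchical clustering. It is not OpAC itself: the abstract does not state the linkage criterion or distance, so single linkage and Euclidean distance are assumptions, and the numeric `facts` are hypothetical feature vectors.

```python
from itertools import combinations

def ahc(points, k):
    """Merge the two closest clusters until k remain (single linkage, Euclidean)."""
    clusters = [[p] for p in points]

    def distance(a, b):
        # Single linkage: distance between the closest pair of members.
        return min(
            sum((x - y) ** 2 for x, y in zip(p, q)) ** 0.5
            for p in a
            for q in b
        )

    while len(clusters) > k:
        # Pick the pair of cluster indices (i < j) with the smallest distance.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: distance(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

# Hypothetical facts described by two numeric features each.
facts = [(1.0, 1.0), (1.2, 0.9), (5.0, 5.1), (5.2, 4.9), (9.0, 0.5)]
aggregates = ahc(facts, k=3)
```

Stopping at different values of `k` yields different partitions of the same facts, which is where evaluation criteria for choosing among partitions would come in.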
2007 IEEE 23rd International Conference on Data Engineering, 2007
With increasing amounts of data being exchanged and even generated or stored in XML, a natural question is how to perform OLAP on XML data, which can be structurally heterogeneous (e.g., parse trees) and/or marked-up text documents. A core operator for OLAP is the data cube. While the relational cube can be extended in a straightforward way to XML, we argue such an extension would not address the specific issues posed by XML. First, while in a relational warehouse facts are flat records and dimensions may have hierarchies, in an XML warehouse both facts and dimensions may be hierarchical. Second, XML is flexible: (a) an element may have missing or repeated subelements; (b) different instances of the same element type may have different structure. We identify the challenges introduced by these features of XML for cube definition and computation. We propose a definition of the cube adapted to XML data warehouses, including a suitably generalized specification mechanism. We define a cube lattice over the aggregates so defined. We then identify properties of this cube lattice that can be leveraged to allow optimized computation of the cube. Finally, we present the results of an extensive performance evaluation experiment gauging the behavior of alternative algorithms for cube computation.
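The effect of missing and repeated subelements on a cube lattice can be seen in a small Python sketch. This is one possible semantics chosen for illustration, not the paper's definition: a missing dimension is bucketed under `None`, a fact with repeated values is counted once in each matching cell, and the element names and data are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from itertools import combinations, product

# Hypothetical facts: the second has a repeated <author>, the third has no <year>.
DOCS = """
<articles>
  <article><year>2006</year><author>A</author></article>
  <article><year>2006</year><author>A</author><author>B</author></article>
  <article><author>B</author></article>
</articles>
"""

def dim_values(fact, dim):
    """All values of a dimension for one fact; [None] marks a missing subelement."""
    values = [e.text for e in fact.findall(dim)]
    return values or [None]

def cube_lattice(xml_text, dims):
    """COUNT aggregate for every subset of dims (each subset is one lattice node)."""
    root = ET.fromstring(xml_text)
    lattice = {}
    for r in range(len(dims) + 1):
        for group_by in combinations(dims, r):
            counts = Counter()
            for fact in root:
                # A fact with repeated values falls into several cells,
                # but is counted at most once per cell.
                keys = set(product(*(dim_values(fact, d) for d in group_by)))
                for key in keys:
                    counts[key] += 1
            lattice[group_by] = dict(counts)
    return lattice

lattice = cube_lattice(DOCS, ["year", "author"])
```

Under this choice the `author` cells sum to 4 although there are only 3 facts, so a coarser aggregate cannot simply be derived by adding up a finer one; this is the kind of issue that a flat relational cube does not face.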
Lecture Notes in Computer Science, 2011
Despite a decade of research in OLAP systems, very few works attempt to tackle the problem of analysing data extracted from XML text-rich documents. These documents are loosely structured XML documents mainly composed of text. This paper details conceptual design steps of multidimensional databases from such documents. With the use of an adapted multidimensional conceptual model, the design process allows the integration of data extracted from text-rich XML documents within an adapted OLAP system.
Annals of Information Systems, 2008
There has been a lot of research on OLAP (On-Line Analytical Processing) systems during the past decade. These systems allow decision makers to improve their decisions. Despite numerous multidimensional conceptual models, none tackles the problem of analysing data extracted from text-rich XML documents. These documents hold a large amount of information that is unavailable to current OLAP systems. Moreover, the implementation of such a system requires an adapted design process. In this paper, we present an adapted "galaxy" model for the analysis of text-rich XML documents. This model is associated with an adapted design process and a tool that handles all automated tasks of the process.
2004
Part IV puts forward experimental results of the described approaches. In Chapter 10, we show experimental results of the multi-dimensional approach to indexing XML data. Various XPath queries are tested. Chapter 11 describes experimental results of the signature multi-dimensional data structures. The following chapter presents experimental results of the multi-dimensional approach to term indexing. These results demonstrate the efficiency of the multi-dimensional forest for indexing points of different dimensions. Finally, the Conclusion summarizes the contributions and discusses possibilities for future work.
International Journal of Strategic Information Technology and Applications
The Diamond model is a multidimensional model dedicated to XML document warehouses. It considers structured and unstructured data simultaneously. Furthermore, it orders the semantics of documents via a specific semantic dimension linked to conventional dimensions, thus breaking the classical orthogonality rule of dimensions. After giving an overview of their three-phase quasi-automatic approach for the generation of the Diamond model, the authors focus on the Diamond-Gen software tool that supports the proposed approach. The authors illustrate the Diamond-Gen functionalities and assess the tool through an experimental study using a set of 1500 XML documents drawn from the PubMed collection.
Lecture Notes in Computer Science, 2014
As the amount of textual information grows explosively in various kinds of business systems, it becomes more and more essential to analyze both structured data and unstructured textual data simultaneously. However, information contained in unstructured data (documents and so on) is only partially used in business intelligence (BI). Indeed, On-Line Analytical Processing (OLAP) cubes, which are the main support of BI analysis in decision support systems, have focused on structured data. This is the reason why OLAP is being extended to unstructured textual data. In this paper, we introduce the innovative "Diamond" multidimensional model that will serve as a basis for semantic OLAP on XML documents, and then we describe the meta-modeling, generation, and implementation of the Diamond multidimensional model.
Proceedings of the 5th ACM international workshop on Data Warehousing and OLAP - DOLAP '02, 2002
On-Line Analytical Processing (OLAP) is a powerful method for analysing large volumes of data warehouse data. Typically, the data for an OLAP database is collected from a set of data repositories such as operational databases. This data set is often huge, and it may not be known in advance what data is required, and when, to perform the desired data analysis tasks. Some parts of the data may be needed only occasionally. Therefore, keeping the OLAP database constantly up to date is not only a highly demanding task but may also be overkill in practice.
Proceedings of the 2nd International Conference on Agents and Artificial Intelligence, 2010
The rapid growth of semi-structured sources raises the need for designing and implementing environments for knowledge discovery from XML data. This paper presents an Inductive Database System in which raw data, mining models and domain knowledge are represented as XML documents, stored inside native XML databases. In particular, we discuss our experiences in the design and development of XQuake, a mining query language that extends XQuery. Features of the language are an intuitive syntax, good expressiveness and the capability of dealing uniformly with raw data, induced knowledge and background knowledge. The language is presented by means of examples, and a sketch of its implementation and an evaluation of its performance are given. 1 www.dblp.uni-trier.de/xml/ 2 www.dbis.informatik.uni-goettingen.de/Mondial/ Romei A. and Turini F. (2010). XQUAKE-An XQuery-like Language for Mining XML Data.
Information Systems, 2010
Advances in Data Warehousing and Mining, 2009
Proceedings of the 2008 EDBT Ph.D. workshop, 2008
2015 IEEE 9th International Conference on Research Challenges in Information Science (RCIS), 2015
Lecture Notes in Computer Science
Software: Practice and Experience, 2008