2009, Information Systems …
While most business applications typically operate on structured data that can be effectively managed using relational databases, some applications use more complex semistructured data that lacks a stable schema. XML techniques are available for the management of semistructured data, but such techniques tend to be ineffective when applied to large amounts of heterogeneous data, in particular in applications with complex query requirements. We describe an approach that relies on the mapping of multiple semistructured data sets to object-relational structures and uses an object-relational database to support complex query requirements. As an example we use weakly heterogeneous oceanographic data.
Fifth Computer Science and Engineering …
2002
Multidimensional Semistructured Data (MSSD) are semistructured data that present different facets under different contexts (i.e. alternative worlds). For the representation of MSSD various formalisms have been proposed by the authors, both syntactic (such as mssd-expressions and MXML) as well as graphical (such as Multidimensional OEM). In this paper we present an infrastructure for handling MSSD. This infrastructure provides appropriate tools for building MSSD applications, and is independent from any particular application that uses it. We also present a graphical interface, called MSSDesigner, that provides access to the infrastructure, and we describe OEM History, an MSSD application that supports keeping track of temporal changes in semistructured databases.
2007
Organizations have to work with large volumes of information in different formats. In this context, tools and formats such as XML are necessary to integrate these data and solve the heterogeneity problem. In this paper we propose a new software tool called XDS. XDS (eXtensible Data Sources) is a new system for integrating data from relational databases, native XML databases and XML documents. Libraries, likewise, keep their bibliographic catalogues in different sources and formats. Therefore, we show the use and validity of the XDS system in this environment, obtaining the results in a bibliographic format such as MODS (Metadata Object Description Schema). In this way, the resources of different bibliographic catalogues can be queried and the results obtained in a common format such as MODS.
Intl. Workshop on Knowledge Representation meets Databases (KRDB), 2001
Nowadays, data can be represented and stored in different formats, ranging from unstructured data, typical of file systems, to semistructured data, typical of Web sources, to highly structured data, typical of relational database systems. The need therefore arises to define new models and approaches for uniformly handling data sources with different formats and structures, and for obtaining a global, integrated, and uniform representation. In this paper we present three approaches to data integration and propose a unifying framework integrating the various methodologies and incorporating techniques developed separately. We also present the architecture of a metadata repository supporting the integration framework.
2011
Many Web data sources and APIs make their data available in XML, JSON, or a domain-specific semi-structured format, with the goal of making the data easily accessible and usable by Web application developers. Although such data formats are more machine-processable than pure text documents, managing and analyzing such data at large scale is often nontrivial. This is mainly due to the lack of a well-defined (or understood) structure and clear semantics in such data formats, which could result in poor data quality. In the xCurator project, we add structure to such data with the goal of publishing it on the Web as Linked Data. We enhance the quality of such data by: extracting entities, their types, and their relationships to other entities; performing entity (and entity type) identification; merging duplicate entities (and entity types); linking related entities (internally and to external sources); and publishing the results on the Web as high-quality Linked Data. This is all done in a lightweight, easy-to-use, and scalable framework that effectively incorporates user feedback in all phases. We describe the initial framework of our system and report the results of using our system for managing large volumes of (user-generated) data on the Web in several real-world applications.
Motivated to a large extent by the substantial and growing prominence of the World-Wide Web and the potential benefits that may be obtained by applying database concepts and techniques to web data management, new data models and query languages have emerged that contend with web data. These models organize data in graphs where nodes denote objects or values and edges are labeled with single words or phrases. Nodes are described by the labels of the paths that lead to them, and these descriptions serve as the basis for querying. This paper proposes an extensible framework for capturing and querying meta-data properties in a semistructured data model. Properties such as temporal aspects of data, prices associated with data access, quality ratings associated with the data, and access restrictions on the data are considered. Specifically, the paper defines an extensible data model and an accompanying query language that provides new facilities for matching, slicing, collapsing, and coalescing properties. It also briefly introduces an implemented, SQL-like query language for the extended data model that includes additional constructs for the effective querying of graphs with properties.
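As a rough illustration of the graph model described in this abstract, the sketch below stores labeled edges and describes each node by the label paths that lead to it from a root. It is not the paper's data model or query language: the toy graph, the function names `label_paths` and `query`, and the `max_depth` bound are all assumptions made for the example, and the meta-data properties (time, price, quality, access) are left out.

```python
from collections import defaultdict

# Hypothetical toy graph: (source, edge label, target). Names are illustrative only.
EDGES = [
    ("root", "movie", "m1"),
    ("m1", "title", "t1"),
    ("m1", "actor", "a1"),
    ("a1", "name", "n1"),
    ("root", "actor", "a1"),
]

def label_paths(edges, root, max_depth=4):
    """Map each node to the set of label paths (tuples) that lead to it from root."""
    out = defaultdict(list)
    for src, label, dst in edges:
        out[src].append((label, dst))
    paths = defaultdict(set)
    stack = [(root, ())]
    while stack:
        node, path = stack.pop()
        if len(path) >= max_depth:  # depth bound so that cycles terminate
            continue
        for label, dst in out[node]:
            new_path = path + (label,)
            paths[dst].add(new_path)
            stack.append((dst, new_path))
    return paths

def query(paths, label_path):
    """Return the nodes reached by exactly this sequence of edge labels."""
    wanted = tuple(label_path)
    return sorted(node for node, found in paths.items() if wanted in found)

PATHS = label_paths(EDGES, "root")
```

In this toy graph the node `a1` is described by two label paths, `actor` and `movie.actor`, so a query on either path returns it.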
2006
An important reality when integrating scientific data is the fact that data may often be "missing", partially specified, or conflicting. Therefore, in this paper, we present an assertion-based data model that captures both value-based and structure-based "nulls" in data. We also introduce the QUEST system, which leverages the proposed model for Query-driven Exploration of Semistructured data with conflicTs and partial knowledge. Our approach to integration lies in enabling researchers to observe and resolve conflicts in the data by considering the context provided by the data requirements of a given research question. In particular, we discuss how path compatibility can be leveraged, within the context of a query, to develop a high-level understanding of conflicts and nulls in data.
Journal of Digital Information Management, 2005
In this paper we present an ontology-based method for formalizing implicit semantics, and we suggest mechanisms to semantically integrate XML schemas and documents as well. After a survey of database interoperability, we present our semantic integration approach by explaining the nature of ontology. The article then presents our integration method for XML data and schemas using a generic ontology.
Information Systems, 2004
Multidimensional semistructured data (MSSD) are semistructured data that present different facets under different contexts. Context represents alternative worlds, and is expressed by assigning values to a set of user-defined variables called dimensions. The notion of context has been incorporated in the Object Exchange Model (OEM), and the extended model is called Multidimensional OEM (MOEM), a graph model for MSSD. In this paper, we explain in detail how MOEM can represent the history of an OEM database. We discuss how MOEM properties are applied in the case of
World Wide Web, 2001
World Wide Web, 4: 79-99 (2001) © 2001 Kluwer Academic Publishers. Modelling and Manipulating Multidimensional Data in Semistructured Databases. Raymond K. Wong and Franky Lam, School of Computer ...
IOSR Journal of Engineering, 2012
Information retrieval from heterogeneous information systems is necessary but challenging, because data is stored and represented in different data models in different information systems. Integrating information from heterogeneous data sources into a single data source faces the major challenge of information transformation: the different formats and constraints must be handled during data transformation in order to integrate the information systems, and doing so is not cost-effective. This paper introduces the idea of information integration, based on search criteria, from heterogeneous data sources into a single data source. Every element of an information source, such as an entity, field, or relation, is mapped to a component of a new single text source, which is created every time the heterogeneous information systems are searched; the result is saved into a new text file. This approach allows us to create a new text file and delete an existing file, modify the wrapper, make modifications later, and manage data retrieval in a simple, unified style. The architecture is flexible enough to incorporate a variety of data models and query capabilities accessed through various protocols. It is possible to select logically related information from all available legacy data sources.
TR00-004, University of Florida, …, 2000
Lecture Notes in Computer Science, 2001
Semi-structured data has become prevalent with the growth of the Internet. The data is usually stored in a traditional database system or in a specialized repository. While many information providers have presented their databases on the web as semi-structured data, other information providers are developing repositories for new applications. One such application is e-commerce, which is emerging as a major web-supported application assisting business transactions between multiple parties via the network and involving large amounts of data. Designing a "good" semi-structured database is increasingly crucial to prevent data redundancy, inconsistency and updating anomalies. In this paper, we propose a conceptual approach to designing semi-structured databases. A conceptual layer based on the popular Entity-Relationship (ER) model is employed to remove anomalies and redundancies at the semantic level. We give an algorithm that maps an ER diagram involving composite attributes, weak entity types, recursive, n-ary and ISA relationship sets, and aggregations to a semi-structured schema graph (S3-Graph) used to represent semi-structured data. Our study reveals similarities between the S3-Graph and the hierarchical model and nested relations, in that all have limitations in modeling situations with nonhierarchical relationships given their tree-like structures.
IAEME PUBLICATION, 2020
Multidimensional Analysis or On-Line Analytical Processing (OLAP) has generally been used for the analysis of structured data, in the context of data warehouses. However, these techniques are not appropriate for managing or analyzing semi-structured data, such as XML (eXtensible Mark-up Language) documents. In this paper, we propose a meta-model of heterogeneous semi-structured data (Structure and Content) and a multidimensional model based on the galaxy model in order to analyze the content of this data type from several perspectives.
In the past decade, research in heterogeneous database integration has established a good and solid framework to alleviate this task. However, work remains to be done to make these achievements easy to implement. In our project, we shall develop a software tool using XML for integrating and querying disparate heterogeneous information as unified XML views.
2002
XML has quickly emerged as a dominant standard for data representation and exchange on the World Wide Web because of its many good features, such as well-formed structure and semantic support. Research on semistructured data over the last several years has focused on data models, query languages, and systems where the database is modeled in some form of a labeled, directed graph. Processing sophisticated queries on semistructured data is not easy because of the complexity of the graph structure and the lack of corresponding schemata associated with it. To deal with such problems, the paper proposes an approach to process semistructured data with XML. Although there are many similarities between semistructured data and XML, there are some differences. A key difference is that the current XML DOM only supports tree structures and does not directly support graph structures. To deal with such differences, the paper proposes two approaches that treat an XML document as a semantic graph and as a literal tree, which are the foundation for transforming semistructured data into XML documents for processing. For this purpose, several algorithms are designed to transform semistructured data into XML documents and an XML-Schema document based on the schema tree extracted from the original semistructured data. To ensure that semistructured data can be reconstructed from XML documents, this transformation must be lossless. Finally, the paper also presents an algorithm for reconstructing semistructured data.
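To make the tree-versus-graph difference concrete, here is a minimal sketch of one common way to serialize a labeled graph into an XML tree and read the edges back. It is not the paper's algorithm. It assumes an `id`/`ref` attribute convention for shared or cyclic nodes, and the graph, the `db` root element, and the function names `to_xml` and `edges_from_xml` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical graph: node -> list of (edge label, child). Leaf nodes carry values.
GRAPH = {
    "root": [("person", "p1"), ("person", "p2")],
    "p1": [("name", "v1"), ("friend", "p2")],
    "p2": [("name", "v2"), ("friend", "p1")],
}
VALUES = {"v1": "Ann", "v2": "Bob"}

def to_xml(graph, values, root):
    """Serialize the graph as a tree; revisited nodes become ref pointers."""
    seen = {root}

    def build(parent, label, node):
        elem = ET.SubElement(parent, label)
        if node in values:            # leaf: store the value as element text
            elem.text = values[node]
            return
        if node in seen:              # already emitted: point back, do not copy
            elem.set("ref", node)
            return
        seen.add(node)
        elem.set("id", node)
        for child_label, child in graph.get(node, []):
            build(elem, child_label, child)

    top = ET.Element("db")            # assumed wrapper element for the root node
    top.set("id", root)
    for label, child in graph.get(root, []):
        build(top, label, child)
    return top

def edges_from_xml(elem):
    """Recover (source, label, target-or-value) triples from the tree."""
    src = elem.get("id")
    triples = set()
    for child in elem:
        target = child.get("id") or child.get("ref") or child.text
        triples.add((src, child.tag, target))
        if child.get("id"):           # only id-bearing elements have outgoing edges
            triples |= edges_from_xml(child)
    return triples

TREE = to_xml(GRAPH, VALUES, "root")
```

Under these assumptions the round trip preserves every labeled edge, including the `p1`/`p2` cycle, which is the sense of "lossless" used in this sketch; identifiers of value leaves are not preserved.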
PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON VERY LARGE DATA BASES, 1999
Motivated to a large extent by the substantial and growing prominence of the World-Wide Web and the potential benefits that may be obtained by applying database concepts and techniques to web data management, new data models and query languages have emerged that contend with the semistructured nature of web data. These models organize data in graphs. The nodes in a graph denote objects or values, and each edge is labeled with a single word or phrase. Nodes are described by the labels of the paths that lead to ...
Citeseer
Semi-structured data is becoming increasingly important with the introduction of XML and related languages and technologies. The recent shift from DTDs (document type definitions) to XML-Schema for XML data highlights the importance of a schema definition for semi-structured data ...