2006, Information Systems
This paper presents a novel approach for the integration of a set of XML Schemas. The proposed approach is specialized for XML, is almost automatic, semantic and "light". As a further original feature, it is parametric with respect to a "severity" level against which the integration task is performed. The paper describes the approach in full detail, illustrates various theoretical results, presents the experiments we performed to test it and, finally, compares it with various related approaches already proposed in the literature.
Lecture Notes in Computer Science, 2002
We define an object-oriented data model called XSDM (XML Schema Data Model) and present a graphical representation of XML Schema integration. The architecture has three layers: pre-integration, comparison and integration. During pre-integration, the schema expressed in XML Schema notation is read and converted into the XSDM notation. During the comparison phase, correspondences as well as conflicts between elements are identified. During the integration phase, conflict resolution, restructuring and merging of the initial schemas take place to obtain the global schema.
Web Information Systems and Technologies, 2007
This paper presents an ontology-based integration approach for XML data. The approach rests on two pillars. The first is based on the analysis of formal languages and XML grammars. The second is based on the analysis of ontologies and domain ontologies. The keystone of this architecture, which bridges the two pillars, is the concept of schematic marks introduced in this paper. These schematic marks make it possible to link the syntactic level and the semantic level in our integration framework.
Lecture Notes in Computer Science, 2005
This paper presents a detailed integration process for XML schemata called BInXS. BInXS adopts a global-as-view integration approach that builds a global schema from a set of heterogeneous XML schemata related to the same application domain. This bottom-up approach maps all element and attribute definitions in the XML schemata to corresponding concepts in the global schema, allowing access to all data available at the XML sources. The integration process is performed semi-automatically over conceptual representations of the XML schemata, which provides a better understanding of the semantics of the XML data to be unified. A conceptual schema is generated by a set of conversion rules that are applied to a schema definition for XML data. Since this conceptual schema is the result of a meticulous analysis of the XML logical model, it is able to abstract the particularities of semistructured and XML data, such as elements with mixed content and elements with alternative representations. Therefore, the subsequent unification of such conceptual schemata implicitly deals with structural conflicts inherent to semistructured and XML data. In addition, BInXS supports a mapping strategy based on XPath expressions in order to maintain correspondences between global concepts and data at the XML sources.
5 Figure 2 (b) is a logical abstraction of a schema defined through a DTD (Document Type Definition) or an XSD (XML Schema Definition) [5]
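The idea of keeping XPath-based correspondences between global concepts and source data can be illustrated with a minimal sketch. It is not the BInXS implementation: the concept name `Author.name`, the source identifiers, the paths and the `resolve` helper are all hypothetical, and only the limited XPath subset supported by Python's `xml.etree.ElementTree` is used.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping table: one global concept, one XPath per source.
MAPPINGS = {
    "Author.name": [
        ("s1", "./book/author"),
        ("s2", "./paper/writer/name"),
    ],
}

def resolve(concept, sources, mappings):
    """Return (source_id, text) pairs for every node mapped to a global concept.

    `sources` maps a source id to the parsed root element of that source.
    """
    results = []
    for source_id, path in mappings.get(concept, []):
        for node in sources[source_id].findall(path):
            results.append((source_id, (node.text or "").strip()))
    return results
```

A query posed on the global concept is thus answered by evaluating each stored path against its own source, so the sources keep their original structure.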
2001
XML is emerging as the standard for semistructured data representation and data exchange on the Web. In this context, data integration mechanisms are required to provide a unified view of semantically related information from the same domain. In this paper, a bottom-up integration process is proposed to solve this problem. In this approach, an ontology is generated from the semantic integration of conceptual schemata derived from DTDs. The process is semi-automatic, relying on the intervention of a human expert to provide semantic adjustments. The resulting ontology is a unified vocabulary for semistructured concepts present in several XML sources; it keeps mapping information to DTD elements and attributes, and it acts as a global schema for user queries. The overall integration process is briefly presented through examples.
The Semantic Web—ISWC …, 2002
… of the 23rd Brazilian symposium on …, 2008
There are two major problems in merging instances from different sources in order to build a data warehouse: entity identification ambiguity and attribute value conflict. In this paper we propose a data model that facilitates the resolution of attribute value conflicts by explicitly representing them in the integrated schema. In this model, the data warehouse is an XML tree populated with data imported from one or more XML sources, and nodes are annotated with provenance information. The purpose of the annotations is twofold. First, they represent the origin of every element in the data warehouse. This information is essential for determining the quality of the data and the amount of trust one places in it. Second, they allow the portion of the source XML tree used to populate the warehouse to be reconstructed. This capability is important if one needs the original document to compare with new releases from the same source in order to incrementally update the warehouse. Algorithms for populating the warehouse according to the proposed model and for reconstructing the source data are presented. We also report results from an experimental study conducted to determine the impact of the annotations on the size of the warehouse.
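The two roles of provenance annotations described above (recording origin, and allowing the imported source portion to be rebuilt) can be sketched as follows. This is only an illustration under stated assumptions, not the paper's algorithms: the attribute name `prov-source` and both function names are hypothetical, and provenance is stored as a plain XML attribute on every node.

```python
import copy
import xml.etree.ElementTree as ET

PROV = "prov-source"  # hypothetical attribute name used for the annotation

def import_with_provenance(warehouse, source_root, source_id):
    """Copy each child of source_root into the warehouse, tagging every node."""
    for child in source_root:
        clone = copy.deepcopy(child)
        for node in clone.iter():
            node.set(PROV, source_id)  # record where this node came from
        warehouse.append(clone)

def reconstruct(warehouse, source_id, root_tag):
    """Rebuild the imported portion of one source from the annotated warehouse."""
    root = ET.Element(root_tag)
    for child in warehouse:
        if child.get(PROV) == source_id:
            clone = copy.deepcopy(child)
            for node in clone.iter():
                node.attrib.pop(PROV, None)  # strip the annotation again
            root.append(clone)
    return root
```

Annotating every node, as this sketch does, makes the size overhead of provenance easy to see, which is the cost the abstract says the experimental study measures.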
2005
Data integration is the problem of combining data residing at different sources and providing the user with a virtual view, called the global schema, which is independent of the model and the physical origin of the sources. Whereas many data integration systems and theoretical works have been proposed for relational data, XML data integration has so far received little attention. Our goal is therefore to address some of its related issues. In particular, we highlight two major issues that emerge in the XML context: (i) the global schema may be characterized by a set of constraints, expressed by means of a DTD and XML integrity constraints; (ii) the concept of node identity requires introducing semantic criteria to identify nodes coming from different sources. We propose a formal framework for XML data integration systems based on an expressive XML global schema, a set of XML data sources and a set of mappings specified by means of a simple tree language. Then, we define an identification function that aims at globally identifying nodes coming from different sources. Finally, we propose algorithms to answer queries under different assumptions for the mappings.
Proceedings. International Database Engineering and Applications Symposium, 2004. IDEAS '04., 2004
XML is becoming the standard for data interchange on the web. However, XML and its schema languages do not express semantics but rather structure, such as nesting information. Therefore, semantically equivalent documents often present different document structures. In this paper, we provide an ontology-based framework that aims to make two XML documents interoperate at the semantic level while retaining their nesting structure. In our global-as-view approach, we generate an RDF ontology for each of the participating XML documents, which preserves the nesting structure of the document. An RDF global ontology is the result of merging the individual ontologies. The global ontology unifies the query access and establishes semantic connections among the underlying individual databases. We consider two types of queries: those that are posed on the global ontology and those that are posed on any of the XML documents, in a P2P fashion. The former type is processed using query translation from an RDF query to an XML query. The latter type entails bidirectional query processing: the translation from an XML query to an RDF query followed by the translation from an RDF query to an XML query. To ensure the correctness of the answer to the query in the latter case, we introduce the concept of reversibility of the query translation.
Bncod, 2004
This paper describes the integration of XML data sources within the AutoMed heterogeneous data integration system. The paper presents a description of the overall framework, as well as an overview of and comparison with related work and implemented solutions by other researchers. The main contribution of this research is an algorithm for the integration of XML data sources, based on graph restructuring of their schemas.
Lecture Notes in Computer Science, 2003
Various XML instances from different data sources can model the same real-world object. Query processing or view definition over these sources demands instance integration. In this context, integration means identifying which data instances represent the same real-world object, as well as resolving ambiguities in the representation of this object. The entity identification problem in XML is more complex than in structured databases: XML data, as originally conceived, do not necessarily have a notion of primary key or object identifier. Thus, it is necessary to adopt a mechanism that identifies the instances at the moment of data integration. This paper presents a proposal for assigning identifiers to XML instances, based on the use of Skolem functions and the XPath recommendation, as proposed by the W3C.
2.1 State of the Art
Universal Key: This is the simplest method for data integration. It is based on the existence of a common key between the instances to be integrated [2, 14, 16, 17, 23]. However, this approach is restrictive, since the sources do not always have a common key, as is the case with XML data.
Key Equivalence Specified by the User: This approach requires the user to specify equivalences between the instances, for example, using a mapping table from the local identifiers of each source to the global identifiers in the integrated system. This technique is used in [1, 18, 20]. The disadvantage of this technique is that the mapping table can be large and difficult to maintain, since it is handled manually by the database administrator rather than automatically.
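As a rough illustration of identifiers built from Skolem functions over XPath-selected values, the sketch below derives a deterministic identifier from the values found at a list of key paths, so that instances from different sources with the same key values receive the same identifier. The function name `skolem_id`, the choice of key paths, the case and whitespace normalization, and the SHA-1 digest used to keep the identifier short are all assumptions made for this example, not the paper's definition.

```python
import hashlib
import xml.etree.ElementTree as ET

def skolem_id(name, element, key_paths):
    """Return an identifier term such as 'Book(3f2a...)' for an XML instance.

    The identifier depends only on the function name and on the normalized
    text found at each path in key_paths (ElementTree's XPath subset).
    """
    values = [(element.findtext(path) or "").strip().lower() for path in key_paths]
    digest = hashlib.sha1("|".join(values).encode("utf-8")).hexdigest()[:12]
    return f"{name}({digest})"
```

Because the function is deterministic, no mapping table has to be maintained by hand: two instances are treated as the same object exactly when their key values agree after normalization.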
2001
The paper describes a prototype tool, named DIXSE, which supports the integration of XML Document Type Definitions (DTDs) into a common conceptual schema. The mapping from each individual DTD into the common schema is used to automatically generate wrappers for XML documents, which conform to a given DTD. These wrappers are used to populate the common conceptual schema thereby achieving data integration for XML documents.
Data & Knowledge Engineering, 2008
The availability of large amounts of heterogeneous, distributed web data necessitates the integration of XML data from multiple XML sources. For example, many e-commerce companies currently offer similar products but use different XML schemas with possibly different ontologies. When any two such companies merge, or cooperate to serve customers, there is a need for an integrated schema and query mechanism for the interoperability of applications. In applications like comparison shopping, there is a need for the illusion of a centralized, homogeneous information system. In this paper, we propose an XML Schema integration and querying methodology. We define an object-oriented data model called XSDM (XML Schema Data Model) and present a graphical representation of XML Schema for the purpose of schema integration. We use a three-layered architecture for XML Schema integration. The three layers are pre-integration, comparison and integration, and they can conceptually be regarded as three phases of the integration process. During pre-integration, the schemas expressed in XML Schema notation are read and converted into the XSDM notation. During the comparison phase, correspondences as well as conflicts between elements are identified. During the integration phase, conflict resolution, restructuring and merging of the initial schemas take place to obtain the global schema. We define integration policies for integrating element definitions as well as their datatypes and attributes. An integrated global schema forms the basis for querying a set of local XML documents. We discuss various strategies for rewriting the global query over the global schema into sub-queries over the local schemas. Their respective local schemas validate the sub-queries over the local XML documents. This requires the identification and use of mapping rules and relationships between the local schemas.
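The three phases named above can be sketched in a few lines. This is a deliberately simplified illustration, not XSDM: each schema is reduced to a flat map from element name to declared type, the function names are hypothetical, and the conflict policy (fall back to `xs:string`) is an assumption chosen only to make the example concrete.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

def pre_integrate(xsd_text):
    """Pre-integration: read an XML Schema and reduce it to a {name: type} map."""
    root = ET.fromstring(xsd_text)
    return {el.get("name"): el.get("type", "anyType")
            for el in root.iter(XS + "element") if el.get("name")}

def compare(a, b):
    """Comparison: split shared names into correspondences and type conflicts."""
    shared = a.keys() & b.keys()
    correspondences = sorted(n for n in shared if a[n] == b[n])
    conflicts = sorted(n for n in shared if a[n] != b[n])
    return correspondences, conflicts

def integrate(a, b, conflicts):
    """Integration: merge both maps and resolve each conflict by policy."""
    merged = {**a, **b}
    for name in conflicts:
        merged[name] = "xs:string"  # assumed policy: widen to a common type
    return merged
```

A real integration policy would also have to handle nesting, attributes and cardinalities, which this flat model ignores.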
Modern information systems may exploit numerous XML formats for communication. Each message may have its own XML format for data representation, which causes problems with the integration and evolution of their schemas. Manual integration and management of the evolution of the XML formats may be very hard. We tackled this problem in our previous work; however, for simplicity, we omitted the possibility of exploiting reusable schema parts.
Label Streams, Semantics Utilization and Data Query Technologies
In XML Data Integration, data/metadata merging and query processing are indispensable. Specifically, merging integrates multiple disparate (heterogeneous and autonomous) input data sources together for further usage, while query processing is one main reason why the data need to be integrated in the first place. Besides, when supported with appropriate user feedback techniques, queries can also provide contexts in which conflicts among the input sources can be interpreted and resolved. The flexibility of XML structure provides opportunities for alleviating some of the difficulties that other less flexible data types face in the presence of uncertainty; yet, this flexibility also introduces new challenges in merging multiple sources and query processing over integrated data. In this chapter, the authors discuss two alternative ways XML data/schema can be integrated: conflict-eliminating (where the result is cleaned from any conflicts that the different sources might have with each ot...
2004
Reconciling knowledge from multiple heterogeneous data sources has been a major focus of database research for more than a decade. As a standard for exchanging business data on the WWW, XML should provide the ability to express data and the semantics among them. Most application data are stored in relational databases, owing to their popularity and the rich development experience with them. Therefore, how to provide a proper mapping approach from the relational model to the XML model has become a major research problem in the field of information exchange, sharing and integration. The model needs to be integrated while maintaining the semantic knowledge among the data. The aim of this paper is to provide an overview of XML-based data integration on semantic knowledge. At the end of the paper, we review some methodologies from the existing literature.
Lecture Notes in Computer Science, 2012
Querying XML databases and their integration is an emerging area in applications related to database management systems. XML documents having similar data may differ in their organization. Hence a single query may not produce uniform results in all the documents. Different queries have to be generated for each of these hierarchies. To avoid this overhead, it is proposed to restructure and integrate different schemas by transforming all the XML schemas to a unique schema. We propose an algorithm to generate a unified query to extract results from this unique schema. Our work is demonstrated using an application of Primary Health Care (PHC) data maintenance system.
Journal of Digital Information Management, 2005
In this paper we present an ontology-based method for formalizing implicit semantics, and we suggest mechanisms to semantically integrate XML schemas as well as documents. After a survey of database interoperability, we present our semantic integration approach by explaining the nature of ontology. The article then presents our integration method for XML data and schemas using a generic ontology.
J. Univers. Comput. Sci., 2004
In this paper we propose X-Global, a novel, almost automatic and semantic system for integrating a set of XML sources. X-Global is parametric w.r.t. the flexibility level against which the integration task is performed; indeed, it can operate on rigid contexts, when two concepts are merged only if they have exactly the same meaning, as well as on flexible and informal situations, when two concepts are merged if they have close, even if not exactly identical, meanings. In this paper we describe the system in all details, illustrate various theoretical results as well as several experiments we have carried out for verifying its performance. Finally, we examine related literature and point out similarities and differences between X-Global and several other approaches already proposed in the past.
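The effect of a flexibility parameter can be illustrated with a toy sketch: concepts are merged only when a similarity score reaches a threshold, so a threshold of 1.0 corresponds to the rigid setting and lower values to more flexible ones. This is not the X-Global algorithm. In particular, string similarity from Python's `difflib` stands in for the semantic similarity the paper relies on, and the function names and the greedy grouping strategy are assumptions.

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """Stand-in similarity in [0, 1]; a real system would compare meanings."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def merge_concepts(names, threshold):
    """Greedily group concept names whose similarity to a group's first member
    reaches the threshold; each resulting group is one merged concept."""
    groups = []
    for name in names:
        for group in groups:
            if similarity(name, group[0]) >= threshold:
                group.append(name)
                break
        else:
            groups.append([name])
    return groups
```

With the names `author`, `authors` and `title`, a threshold of 1.0 keeps three separate concepts, while a threshold of 0.8 merges `author` and `authors`.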
In this paper we propose a novel, almost automatic and semantic approach for integrating a set of XML sources. The approach is parametric w.r.t. the flexibility degree against which the integration task is performed; indeed, it can operate on rigid contexts, when two concepts are merged only if they have exactly the same meaning, as well as on flexible and informal situations, when two concepts are merged if they have close, even if not exactly identical, meanings. This paper describes the approach in all details, illustrates various theoretical results, examines the related literature and points out similarities and differences between the proposed approach and the existing ones.
0-7803-7983-7/03/$17.00 ©2003 IEEE
An x-component is characterized by its name, its type, its cardinality and, if it is an element, by its content specification. These last three features are better specified by the following definitions.