2003, Web, Web-Services, and Database Systems
Recently, schema matching has found considerable interest in both research and practice. Determining matching components of database or XML schemas is needed in many applications, e.g. for E-business and data integration. Various schema matching systems have been developed to solve the problem semi-automatically. While there have been some evaluations, the overall effectiveness of currently available automatic schema matching systems is largely unclear. This is because the evaluations were conducted in diverse ways making it difficult to assess the effectiveness of each single system, let alone to compare their effectiveness. In this paper we survey recently published schema matching evaluations. For this purpose, we introduce the major criteria that influence the effectiveness of a schema matching approach and use these criteria to compare the various systems. Based on our observations, we discuss the requirements for future match implementations and evaluations.
The VLDB Journal, 2001
Schema matching is a basic problem in many database application domains, such as data integration, E-business, data warehousing, and semantic query processing. In current implementations, schema matching is typically performed manually, which has significant limitations. On the other hand, previous research papers have proposed many techniques to achieve a partial automation of the match operation for specific application domains. We present a taxonomy that covers many of these existing approaches, and we describe the approaches in some detail. In particular, we distinguish between schema-level and instance-level, element-level and structure-level, and language-based and constraint-based matchers. Based on our classification we review some previous match implementations thereby indicating which part of the solution space they cover. We intend our taxonomy and review of past work to be useful when comparing different approaches to schema matching, when developing a new match algorithm, and when implementing a schema matching component.
VLDB '02: Proceedings of the 28th International Conference on Very Large Databases, 2002
Schema matching is the task of finding semantic correspondences between elements of two schemas. It is needed in many database applications, such as integration of web data sources, data warehouse loading and XML message mapping. To reduce the amount of user effort as much as possible, automatic approaches combining several match techniques are required. While such match approaches have found considerable interest recently, the problem of how to best combine different match algorithms still requires further work. We have thus developed the COMA schema matching system as a platform to combine multiple matchers in a flexible way. We provide a large spectrum of individual matchers, in particular a novel approach aiming at reusing results from previous match operations, and several mechanisms to combine the results of matcher executions. We use COMA as a framework to comprehensively evaluate the effectiveness of different matchers and their combinations for real-world schemas. The results obtained so far show the superiority of combined match approaches and indicate the high value of reuse-oriented strategies.
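The idea of combining the results of several matchers can be illustrated with a minimal sketch. This is not COMA's implementation: the two matchers, the averaging step, the threshold value, and all function names below are assumptions chosen only to show one simple way of aggregating per-matcher similarity scores and selecting candidate correspondences.

```python
from difflib import SequenceMatcher


def name_similarity(a, b):
    """Hypothetical matcher 1: character-level similarity of element names."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def token_similarity(a, b):
    """Hypothetical matcher 2: Jaccard overlap of underscore-separated tokens."""
    ta, tb = set(a.lower().split("_")), set(b.lower().split("_"))
    return len(ta & tb) / len(ta | tb)


def combine(source, target, matchers, threshold=0.5):
    """Average the scores of all matchers for each element pair and keep
    the pairs whose combined score reaches the (assumed) threshold."""
    result = {}
    for s in source:
        for t in target:
            score = sum(m(s, t) for m in matchers) / len(matchers)
            if score >= threshold:
                result[(s, t)] = score
    return result


matches = combine(
    ["cust_name"],
    ["cust_name", "order_id"],
    [name_similarity, token_similarity],
)
print(matches)
```

Averaging is only one possible aggregation; taking the maximum or a weighted sum of the matcher scores would fit the same structure.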
2007
Schema matching is the process of developing semantic matches between two or more schemas. The purpose of schema matching is generally either to merge two or more databases, or to enable queries on multiple, heterogeneous databases to be formulated on a single schema (Doan and Halevy 2005). This paper develops a taxonomy of schema matching approaches, classifying them by a combination of the schema matching technique and the type of data used by that technique. Schema matching techniques are categorized as being based on rules, learning, or ontology, and the type of data used is categorized as being based on schema elements or instance data. This taxonomy is an extension to previous work, and significant current research efforts are categorized using this taxonomy. Several of these research efforts are profiled and their categorization in the taxonomy is explored. The current research is used to identify the directions in which future research is headed.
VLDB Journal
Schema matching is a basic problem in many database application domains, such as data integration, E-business, data warehousing, and semantic query processing. In current implementations, schema matching is typically performed manually, which has significant limitations. On the other hand, in previous research many techniques have been proposed to achieve a partial automation of the Match operation for specific application domains. We present a taxonomy that covers many of the existing approaches, and we describe these approaches in some detail. In particular, we distinguish between schema- and instance-level, element- and structure-level, and language- and constraint-based matchers. Based on our classification we review some previous match implementations thereby indicating which part of the solution space they cover. We intend our taxonomy and review of past work to be useful when comparing different approaches to schema matching, when developing a new match algorithm, and when implementing a schema matching component.
Indonesian Journal of Electrical Engineering and Computer Science, 2018
The main concern of schema matching is how to support the merging decision by providing matches between attributes of different schemas. Many works in the literature utilize database instances to detect the correspondence between attributes. Most of these previous works aim at improving the match accuracy. We observed that no technique managed to provide accurate matching for different types of data. In other words, some of the techniques treat numeric values as strings. Similarly, other techniques process textual instances as numeric, and this negatively influences the process of discovering the match and compromises the matching result. Thus, a practical comparative study between syntactic and semantic techniques is needed. The study emphasizes analyzing these techniques to determine the strengths and weaknesses of each technique. This paper aims at comparing two different instance-based matching techniques, namely (i) regular expression and (ii) Google similarity, to identify the match between attributes. Several analyses have been conducted on real and synthetic data sets to evaluate the performance of these techniques with respect to Precision (P), Recall (R) and F-Measure.
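Precision, Recall and F-Measure for a match result are conventionally computed against a reference set of correct correspondences. The sketch below uses the standard definitions; the function name and the sample attribute pairs are illustrative assumptions and are not taken from the paper.

```python
def evaluate(found, reference):
    """Return (precision, recall, f_measure) for a set of found
    correspondences against a reference (gold standard) set."""
    found, reference = set(found), set(reference)
    true_positives = len(found & reference)
    precision = true_positives / len(found) if found else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical attribute pairs, for illustration only.
found = {("a", "x"), ("b", "y"), ("c", "z")}
reference = {("a", "x"), ("b", "y"), ("d", "w"), ("e", "v")}
p, r, f = evaluate(found, reference)
print(p, r, f)
```

Here two of the three found pairs are correct and two of the four reference pairs are found, so precision is 2/3, recall is 1/2, and the F-Measure is their harmonic mean, 4/7.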
Lecture Notes in Computer Science, 2005
Schema and ontology matching is a critical problem in many application domains, such as semantic web, schema/ontology integration, data warehouses, e-commerce, etc. Many different matching solutions have been proposed so far. In this paper we present a new classification of schema-based matching techniques that builds on top of the state of the art in both schema and ontology matching. Some innovations are in introducing new criteria which are based on (i) general properties of matching techniques, (ii) interpretation of input information, and (iii) the kind of input information. In particular, we distinguish between approximate and exact techniques at schema-level; and syntactic, semantic, and external techniques at element- and structure-level. Based on the classification proposed we overview some of the recent schema/ontology matching systems, pointing out which part of the solution space they cover. The proposed classification provides a common conceptual basis, and, hence, can be used for comparing different existing schema/ontology matching techniques and systems as well as for designing new ones, taking advantage of state-of-the-art solutions. For more information on the topic (e.g., tutorials, relevant events), please visit the Ontology Matching web-site at www.OntologyMatching.org
Lecture Notes in Electrical Engineering, 2013
Schema matching has been one of the basic tasks in almost every data-intensive distributed application, such as enterprise information integration, collaborating web services, web catalogue integration, and schema-based point-to-point database systems. Typical schema matchers operate manually and use a set of matching algorithms with a composition function, applying them in an arbitrary manner, which results in wasteful computations and needs manual specification for different domains. Recently, some schema matching strategies with partial or full automation have been proposed. One such schema matching strategy is OntoMatch. In this paper, we propose an element-level automated linguistic-based schema matching strategy motivated by the concept of OntoMatch, with more powerful matching algorithms and definite property construction for matcher selection that produces better output. Experimental results are also provided to support the claim of improvement.
2011
In a paper published in the 2001 VLDB Conference, we proposed treating generic schema matching as an independent problem. We developed a taxonomy of existing techniques, a new schema matching algorithm, and an approach to comparative evaluation. Since then, the field has grown into a major research topic. We briefly summarize the new techniques that have been developed and applications of the techniques in the commercial world. We conclude by discussing future trends and recommendations for further work.
Lecture Notes in Computer Science, 2005
We view match as an operator that takes two graph-like structures (e.g., XML schemas) and produces a mapping between the nodes of these graphs that correspond semantically to each other. Semantic schema matching is based on the two ideas: (i) we discover mappings by computing semantic relations (e.g., equivalence, more general); (ii) we determine semantic relations by analyzing the meaning (concepts, not labels) which is codified in the elements and the structures of schemas. In this paper we present basic and optimized algorithms for semantic schema matching, and we discuss their implementation within the S-Match system. We also validate the approach and evaluate S-Match against three state of the art matching systems. The results look promising, in particular for what concerns quality and performance.
Today schema matching is a basic task in almost every data intensive distributed application, namely enterprise information integration, collaborating web services, ontology based agents communication, web catalogue integration and schema based P2P database systems. There has been a plethora of algorithms and techniques researched in schema matching and integration for data interoperability. Numerous surveys have been presented in the past to summarize this research. The requirement for extending the previous surveys has been created because of the mushrooming of the dynamic nature of these data intensive applications. Indeed, evolving large scale distributed information systems are further pushing the schema matching research to utilize the processing power not available in the past and directly increasing the industry investment proportion in the matching domain. This article reviews the latest application domains in which schema matching is being utilized. The paper gives a detailed insight about the desiderata for schema matching and integration in the large scale scenarios. Another panorama which is covered by this survey is the shift from manual to automatic schema matching. Finally the paper presents the state of the art in large scale schema matching, classifying the tools and prototypes according to their input, output and execution strategies and algorithms.
Information Systems, 2007
Current schema matching approaches still have to improve for large and complex schemas. The large search space increases the likelihood for false matches as well as execution times. Further difficulties for schema matching are posed by the high expressive power and versatility of modern schema languages, in particular user-defined types and classes, component reuse capabilities, and support for distributed schemas and namespaces. To better assist the user in matching complex schemas, we have developed a new generic schema matching tool, COMA++, providing a library of individual matchers and a flexible infrastructure to combine the matchers and refine their results. Different match strategies can be applied including a new scalable approach to identify context-dependent correspondences between schemas with shared elements and a fragment-based match approach which decomposes a large match task into smaller tasks. We conducted a comprehensive evaluation of the match strategies using large e-Business standard schemas. Besides providing helpful insights for future match implementations, the evaluation demonstrated the practicability of our system for matching large schemas.
International Journal of Advanced Computer Science and Applications
Schema matching is a crucial issue in applications that involve multiple databases from heterogeneous sources. Schema matching has evolved from a manual process to a semi-automated process to effectively guide users in finding commonalities between schema elements. New models are generally developed using a combination of methods to improve the effectiveness of schema matching results. Our previous research has developed a prototype of hybrid schema matching utilizing a combination of a constraint-based method and an instance-based method. The innovation of this paper is a mathematical formulation of a hybrid schema matching model, so that it can be run for different cases and become the basis of development to improve the effectiveness of output and/or efficiency during the schema matching process. The developed mathematical model serves to perform the main task in the schema matching process: it matches the similarity between attributes, calculates the similarity value of the attribute pair, and specifies the matching attribute pair. Based on the test results, a hybrid schema matching model is more effective than the constraint-based method or instance-based method run individually. The more matching criteria used in schema matching, the better the mapping results. The model developed is limited to schema matching processes in relational model databases.
International Journal of Electrical and Computer Engineering (IJECE), 2016
Schema matching is a critical problem in many applications for integrating data and information, achieving interoperability, and handling other cases caused by schematic heterogeneity. Schema matching evolved from a manual process on a specific domain toward new models and methods that are semi-automatic and more general, so that it can more effectively guide the user in generating a mapping among elements of two schemas or ontologies. This paper is a summary of a literature review on schema matching models and prototypes from the last 25 years, describing the progress, research challenges, and opportunities for new models, methods, and/or prototypes.
International Journal of Distributed Systems and Technologies, 2010
With the development and the use of a large variety of DB schemas and ontologies in many domains (e.g. semantic web, digital libraries, life science, etc.), matching techniques are called on to overcome the challenge of aligning and reconciling these different interrelated representations. The matching field is becoming a very attractive research topic. In this paper, we are interested in studying the scalable matching problem. We survey the approaches and tools of large scale matching, when a large number of schemas/ontologies and attributes are involved. We attempt to cover a variety of techniques for schema matching called Pair-wise and Holistic. This domain is highly active, and scalable matching needs many more advances. Therefore, we propose our scalable schema matching methodology that deals with the creation of a hybrid approach combining these techniques. Our architecture includes a prematching approach based on XML schemas decomposition. As shown by our experiments, our proposed methodology has been evaluated and implemented in the PLASMA (Platform for LArge Scale MAtching) prototype.
Schema Matching and Mapping, 2010
The increasing demand for matching and mapping tasks in modern integration scenarios has led to a plethora of tools for facilitating these tasks. While the plethora made these tools available to a broader audience, it led to some confusion regarding the exact nature, goals, core functionalities, expected features, and basic capabilities of these tools. Above all, it made performance measurements of these systems and their distinction a difficult task. The need for design and development of comparison standards that will allow the evaluation of these tools is becoming apparent. These standards are particularly important to mapping and matching system users since they allow them to evaluate the relative merits of the systems and make the right business decisions. They are also important to mapping system developers, since they offer a way of comparing the system against competitors, and motivating improvements and further development. Finally, they are important to researchers since they serve as illustrations of the existing system limitations, triggering further research in the area. In this work we provide a generic overview of the existing efforts on benchmarking schema matching and mapping tasks. We offer a comprehensive description of the problem, list the basic comparison criteria and techniques and provide a description of the main functionalities and characteristics of existing systems.
2012 IEEE 28th International Conference on Data Engineering, 2012
Mapping complex metadata structures is crucial in a number of domains such as data integration, ontology alignment or model management. To speed up the generation of such mappings, automatic matching systems were developed to compute mapping suggestions that can be corrected by a user. However, constructing and tuning match strategies still requires a high manual effort by matching experts as well as correct mappings to evaluate generated mappings. We therefore propose a self-configuring schema matching system that is able to automatically adapt to the given mapping problem at hand. Our approach is based on analyzing the input schemas as well as intermediate matching results. A variety of matching rules use the analysis results to automatically construct and adapt an underlying matching process for a given match task. We comprehensively evaluate our approach on different mapping problems from the schema, ontology and model management domains. The evaluation shows that our system is able to robustly return good quality mappings across different mapping problems and domains.
Proceedings of the ICSOFT 2006 - International conference on Software and Data Technologies, pp 115-120, Setubal, Portugal, 11-14 Sept. 2006.
In order to deal with the problem of semantic and schematic heterogeneity in collaborative networks, matching components among database schemas need to be identified and heterogeneity needs to be resolved, by creating the corresponding mappings in a process called schema matching. One important step in this process is the identification of the syntactic and semantic similarity among elements from different schemas, usually referred to as Linguistic Matching. The Linguistic Matching component of a schema matching and integration system, called SASMINT, is the focus of this paper. Unlike other systems, which typically utilize only a limited number of similarity metrics, SASMINT makes an effective use of NLP techniques for the Linguistic Matching and proposes a weighted usage of several syntactic and semantic similarity metrics. Since it is not easy for the user to determine the weights, SASMINT provides a component called Sampler as another novelty, to support automatic generation of weights.
Computer Systems: Science & Engineering, 2010
Schema matching plays a key role in many different applications, such as schema integration, data integration, data warehousing, data transformation, E-commerce, peer-to-peer data management, ontology matching and integration, semantic Web, semantic query processing, etc. Manual matching is expensive and error-prone, so it is important to develop techniques to automate the schema matching process. In this paper, we present a solution for the automated XML schema matching problem which produces semantic mappings between corresponding schema elements of given source and target schemas. This solution contributes to solving the automated XML schema matching problem more comprehensively and efficiently. Our solution is based on combining linguistic similarity, data type compatibility and structural similarity of XML schema elements. After describing our solution, we present experimental results that demonstrate the effectiveness of this approach.
Journal of Digital Information Management, 2010
Schema matching is a basic problem in many database application domains, such as data integration. The problem of schema matching can be formulated as follows, "given two schemas, S_i and S_j, find the most plausible correspondences between the elements of S_i and S_j, exploiting all available information, such as the schemas, instance data, and auxiliary sources" [24]. Given the rapidly increasing number of data sources to integrate and due to database heterogeneities, manually identifying schema matches is a tedious, time consuming, error-prone, and therefore expensive process. As systems become able to handle more complex databases and applications, their schemas become large, further increasing the number of matches to be performed. Thus, automating this process, to make it faster and less labor-intensive, has been one of the main tasks in data integration. However, it is not possible to determine fully automatically the different correspondences between schemas, primarily because of the differing and often not explicated or documented semantics of the schemas. Several solutions to the issues of schema matching have been proposed. Nevertheless, these solutions are still limited, as they do not explore most of the available information related to schemas and thus affect the result of integration. This paper presents an approach for matching schemas of heterogeneous relational databases that utilizes most of the information related to schemas, which indirectly explores the implicit semantics of the schemas and further improves the results of the integration.
Lecture Notes in Computer Science, 2009
Schema matching is the problem of finding relationships among concepts across heterogeneous data sources (heterogeneous in format and in structure). Starting from the "hidden meaning" associated to schema labels (i.e. class/attribute names) it is possible to discover relationships among the elements of different schemata. Lexical annotation (i.e. annotation w.r.t. a thesaurus/lexical resource) helps in associating a "meaning" to schema labels. However, accuracy of semi-automatic lexical annotation methods on real-world schemata suffers from the abundance of non-dictionary words such as compound nouns and word abbreviations. In this work, we address this problem by proposing a method to perform schema labels normalization which increases the number of comparable labels. Unlike other solutions, the method semi-automatically expands abbreviations and annotates compound terms, with a minimal manual effort. We empirically prove that our normalization method helps in the identification of similarities among schema elements of different data sources, thus improving schema matching accuracy.
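As a rough illustration of what schema label normalization involves, the sketch below splits compound labels and expands abbreviations from a small hand-written table. It is a simplified stand-in, not the semi-automatic method of the paper: the abbreviation table, the splitting rules, and the function name are all assumptions made for this example.

```python
import re

# Hypothetical abbreviation table; a real system would draw on a lexical
# resource and on user feedback rather than a fixed dictionary.
ABBREVIATIONS = {
    "cust": "customer",
    "addr": "address",
    "qty": "quantity",
    "no": "number",
}


def normalize_label(label):
    """Split a schema label into lowercase words and expand known abbreviations."""
    # Insert a space at camelCase boundaries, e.g. "custAddr" -> "cust Addr".
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", label)
    # Split on whitespace, underscores and hyphens.
    tokens = re.split(r"[\s_\-]+", spaced)
    return [ABBREVIATIONS.get(t.lower(), t.lower()) for t in tokens if t]


print(normalize_label("custAddr"))
print(normalize_label("order_qty"))
```

After normalization, labels such as `custAddr` and `customer_address` yield the same word list, which makes them directly comparable by a downstream matcher.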