2014, Proceedings of the 26th International Conference on Scientific and Statistical Database Management - SSDBM '14
Automatic schema matching algorithms are typically only concerned with finding attribute correspondences. However, real-world data integration problems often require matchings whose arguments span all three types of elements in relational databases: relation, attribute, and data value. This paper introduces the definitions and semantics of three additional correspondence types concerning both schema and data values. These correspondences cover the higher-order mappings identified in a seminal paper by Krishnamurthy, Litwin, and Kent. It is shown that these correspondences can be automatically translated to tuple generating dependencies (tgds), and thus this research is compatible with data integration applications that leverage tgds.
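For readers unfamiliar with tgds, the general shape of a tuple generating dependency is shown below, followed by one illustrative instance. The relations Emp and Staff and their attributes are hypothetical and are not taken from the paper; the instance only sketches how a plain attribute correspondence might be written as a tgd, not how the paper's three new correspondence types are translated.

```latex
% General form: every match of the source pattern \varphi implies
% the existence of a match of the target pattern \psi.
\forall \bar{x}\,\bigl(\varphi(\bar{x}) \rightarrow \exists \bar{y}\,\psi(\bar{x},\bar{y})\bigr)

% Hypothetical instance: each Emp(name, dept) tuple in the source
% yields a Staff tuple in the target with some (unknown) id.
\forall n, d\,\bigl(\mathrm{Emp}(n, d) \rightarrow \exists i\,\mathrm{Staff}(i, n, d)\bigr)
```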
Today, schema matching is a basic task in almost every data-intensive distributed application, including enterprise information integration, collaborating web services, ontology-based agent communication, web catalogue integration, and schema-based P2P database systems. A plethora of algorithms and techniques for schema matching and integration has been researched to support data interoperability. Numerous surveys have summarized this research. The rapid growth and dynamic nature of these data-intensive applications now call for extending those surveys. Indeed, evolving large-scale distributed information systems are pushing schema matching research to exploit processing power that was not available in the past, and are directly increasing industry investment in the matching domain. This article reviews the latest application domains in which schema matching is being used. It gives detailed insight into the desiderata for schema matching and integration in large-scale scenarios. The survey also covers the shift from manual to automatic schema matching. Finally, the paper presents the state of the art in large-scale schema matching, classifying the tools and prototypes according to their input, output, and execution strategies and algorithms.
VLDB '02: Proceedings of the 28th International Conference on Very Large Databases, 2002
Schema matching is the task of finding semantic correspondences between elements of two schemas. It is needed in many database applications, such as integration of web data sources, data warehouse loading and XML message mapping. To reduce the amount of user effort as much as possible, automatic approaches combining several match techniques are required. While such match approaches have found considerable interest recently, the problem of how to best combine different match algorithms still requires further work. We have thus developed the COMA schema matching system as a platform to combine multiple matchers in a flexible way. We provide a large spectrum of individual matchers, in particular a novel approach aiming at reusing results from previous match operations, and several mechanisms to combine the results of matcher executions. We use COMA as a framework to comprehensively evaluate the effectiveness of different matchers and their combinations for real-world schemas. The results obtained so far show the superiority of combined match approaches and indicate the high value of reuse-oriented strategies.
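The idea of combining several matchers can be illustrated with a minimal sketch. The two matchers below (name similarity and data type equality), the averaging step, and the threshold are all assumptions made for illustration; they are not COMA's actual matchers or combination strategies.

```python
from difflib import SequenceMatcher

def name_matcher(a, b):
    """Similarity of attribute names in [0, 1] (hypothetical matcher)."""
    return SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()

def type_matcher(a, b):
    """1.0 if the declared data types agree, else 0.0 (hypothetical matcher)."""
    return 1.0 if a["type"] == b["type"] else 0.0

def combine(matchers, source, target, threshold=0.5):
    """Average the matcher scores and keep the best target per source attribute."""
    result = {}
    for s in source:
        best_name, best_score = None, 0.0
        for t in target:
            score = sum(m(s, t) for m in matchers) / len(matchers)
            if score > best_score:
                best_name, best_score = t["name"], score
        if best_score >= threshold:
            result[s["name"]] = (best_name, best_score)
    return result

# Toy schemas, invented for this example.
source = [
    {"name": "CustName", "type": "string"},
    {"name": "ZipCode", "type": "string"},
]
target = [
    {"name": "customer_name", "type": "string"},
    {"name": "zip", "type": "string"},
    {"name": "age", "type": "int"},
]

matches = combine([name_matcher, type_matcher], source, target)
for src, (tgt, score) in matches.items():
    print(f"{src} -> {tgt} ({score:.2f})")
```

Averaging is only one possible aggregation; taking the maximum or a weighted sum would slot into the same `combine` function.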
VLDB Journal
Schema matching is a basic problem in many database application domains, such as data integration, E-business, data warehousing, and semantic query processing. In current implementations, schema matching is typically performed manually, which has significant limitations. On the other hand, in previous research many techniques have been proposed to achieve a partial automation of the Match operation for specific application domains. We present a taxonomy that covers many of the existing approaches, and we describe these approaches in some detail. In particular, we distinguish between schema- and instance-level, element- and structure-level, and language- and constraint-based matchers. Based on our classification we review some previous match implementations, thereby indicating which part of the solution space they cover. We intend our taxonomy and review of past work to be useful when comparing different approaches to schema matching, when developing a new match algorithm, and when implementing a schema matching component.
The VLDB Journal, 2001
Schema matching is a basic problem in many database application domains, such as data integration, E-business, data warehousing, and semantic query processing. In current implementations, schema matching is typically performed manually, which has significant limitations. On the other hand, previous research papers have proposed many techniques to achieve a partial automation of the match operation for specific application domains. We present a taxonomy that covers many of these existing approaches, and we describe the approaches in some detail. In particular, we distinguish between schema-level and instance-level, element-level and structure-level, and language-based and constraint-based matchers. Based on our classification we review some previous match implementations, thereby indicating which part of the solution space they cover. We intend our taxonomy and review of past work to be useful when comparing different approaches to schema matching, when developing a new match algorithm, and when implementing a schema matching component.
The schema-matching problem, at its most basic level, is the problem of mapping schema elements in one information repository to corresponding elements in a second repository. Schema matching is one of the key challenges in information integration. The technique in the existing paper is an instance-based technique. I contend that this technique is not the best choice as an addition to a suite of automated schema mapping tools. I propose a new usage-based schema matching technique. The proposed technique exploits the usage information of the attributes in the query logs to find matches, rather than relying on the schema information or the data instances. The existing weighted graph matching algorithms for schema matching are compared with the newly proposed methods: 1) Direct Tree Search, 2) Ullman, 3) Clique, and 4) Hash coding. The performance and accuracy of the various schema matching techniques are compared and analyzed.
IEEE Transactions on Knowledge and Data Engineering, 2017
Schema Matching (SM) and Record Matching (RM) are two necessary steps in integrating multiple relational tables with different schemas, where SM unifies the schemas and RM detects records referring to the same real-world entity. The two processes have been thoroughly studied separately, but little attention has been paid to the interaction of SM and RM. In this work, we find that, even when they are alternated in a simple manner, SM and RM can benefit from each other to reach better integration performance (i.e., in terms of precision and recall). Therefore, combining SM and RM is a promising solution for improving data integration. To this end, we define novel matching rules for SM and RM, respectively: every SM decision is made based on intermediate RM results, and vice versa, such that SM and RM can be performed alternately. The quality of integration is guaranteed by a Matching Likelihood Estimation model and the control of semantic drift, which prevent mismatch magnification. To reduce the computational cost, we design an index structure based on q-grams and a greedy search algorithm that together can reduce the overhead of the interaction by around 90 percent. Extensive experiments on three data collections show that the combination and interaction between SM and RM significantly outperforms previous works that conduct SM and RM separately.
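To make the q-gram index concrete, the following is a minimal sketch of a generic q-gram inverted index used to shortlist candidate record pairs. The padding scheme, the function names, and the `min_shared` cutoff are assumptions for illustration; the paper's actual index structure and greedy search are not reproduced here.

```python
from collections import defaultdict

def qgrams(text, q=3):
    """Return the set of q-grams of a lowercased, '#'-padded string."""
    padded = "#" * (q - 1) + text.lower() + "#" * (q - 1)
    return {padded[i:i + q] for i in range(len(padded) - q + 1)}

def build_index(values, q=3):
    """Map each q-gram to the ids of the values that contain it."""
    index = defaultdict(set)
    for vid, value in enumerate(values):
        for gram in qgrams(value, q):
            index[gram].add(vid)
    return index

def candidates(index, query, q=3, min_shared=2):
    """Ids of indexed values sharing at least min_shared q-grams with query."""
    counts = defaultdict(int)
    for gram in qgrams(query, q):
        for vid in index.get(gram, ()):
            counts[vid] += 1
    return {vid for vid, n in counts.items() if n >= min_shared}

# Toy records, invented for this example.
records = ["John Smith", "Jon Smith", "Alice Brown"]
index = build_index(records)
found = candidates(index, "John Smyth")
print(sorted(records[i] for i in found))
```

Only the shortlisted candidates need a full similarity comparison, which is where an index of this kind saves work.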
IEEE Transactions on Knowledge and Data Engineering, 2008
Schema matching is one of the key challenges in information integration. It is a labor-intensive and time-consuming process. To alleviate the problem, many automated solutions have been proposed. Most of the existing solutions mainly rely upon textual similarity of the data to be matched. However, there exist instances of the schema-matching problem for which they do not even apply. Such problem instances typically arise when the column names in the schemas and the data in the columns are opaque or very difficult to interpret. In our previous work, we proposed a two-step technique to address this problem. In the first step, we measure the dependencies between attributes within tables using an information-theoretic measure and construct a dependency graph for each table capturing the dependencies among attributes. In the second step, we find matching node pairs across the dependency graphs by running a graph-matching algorithm. In our previous work, we experimentally validated the accuracy of the approach. One remaining challenge is the computational complexity of the graph-matching problem in the second step. The problem instance we are facing is the weighted graph-matching problem to which no efficient solution has yet been found. In this paper, we extend the previous work by improving the second phase of the algorithm incorporating efficient approximation algorithms into the framework.
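The first step described above can be sketched as follows, assuming mutual information is the information-theoretic measure (the abstract does not name it). The table, the column names, and the function names are hypothetical, and the graph-matching step is not shown.

```python
import math
from collections import Counter
from itertools import combinations

def mutual_information(xs, ys):
    """Empirical mutual information (in bits) between two equal-length columns."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )

def dependency_graph(table):
    """Weighted graph: one edge per column pair, weighted by mutual information."""
    return {
        (a, b): mutual_information(table[a], table[b])
        for a, b in combinations(sorted(table), 2)
    }

# Toy table with opaque column names and values, invented for this example.
table = {
    "c1": ["a", "a", "b", "b"],
    "c2": ["x", "x", "y", "y"],  # fully determined by c1
    "c3": ["p", "q", "p", "q"],  # independent of c1 and c2
}

graph = dependency_graph(table)
for edge, weight in graph.items():
    print(edge, round(weight, 3))
```

Because the edge weights depend only on value co-occurrence, not on column names or readable values, two such graphs can be compared even when the schemas are opaque.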
Today, schema matching is a basic problem in almost every data-intensive distributed application, including enterprise information integration, collaborating web services, ontology-based agent communication, web catalogue integration, and schema-based P2P database systems. A plethora of algorithms and techniques for schema matching and integration has been researched to support data interoperability. Numerous surveys have summarized this research. The rapid growth and dynamic nature of these data-intensive applications now call for extending those surveys. Today, data is viewed as a semantic entity, motivating new algorithms and strategies. Evolving large-scale distributed information systems are further pushing schema matching research to exploit processing power that was not available in the past, thus directly increasing industry investment in the matching domain. This article reviews the latest ...
2011
In a paper published in the 2001 VLDB Conference, we proposed treating generic schema matching as an independent problem. We developed a taxonomy of existing techniques, a new schema matching algorithm, and an approach to comparative evaluation. Since then, the field has grown into a major research topic. We briefly summarize the new techniques that have been developed and applications of the techniques in the commercial world. We conclude by discussing future trends and recommendations for further work.