2011, ACM Transactions on Database Systems
Schema mappings are high-level specifications that describe the relationship between two database schemas; they are considered to be the essential building blocks in data exchange and data integration, and have been the object of extensive research. Since in real-life applications schema mappings can be quite complex, it is important to develop methods and tools for understanding, explaining, and refining schema mappings. A promising approach to this end is to use "good" data examples that illustrate the schema mapping at hand. We develop a foundation for the systematic investigation of data examples and obtain a number of results on both the capabilities and the limitations of data examples in explaining and understanding schema mappings. We focus on schema mappings specified by source-to-target tuple generating dependencies (s-t tgds) and investigate the following problem: which classes of s-t tgds can be "uniquely characterized" by a finite set of data examples?...
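For intuition, a minimal illustration (the relation names here are ours, not the paper's): the s-t tgd

    Emp(e, d) → ∃m Mgr(d, m)

states that every department d occurring in the source relation Emp must have some manager m in the target relation Mgr. A data example for this mapping is a pair of instances, e.g. the source {Emp(alice, sales)} together with the target {Mgr(sales, N)} for a labeled null N; the characterization question asks when finitely many such pairs pin the mapping down uniquely within a given class of s-t tgds.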
Proceedings of the 2011 international conference on Management of data - SIGMOD '11, 2011
A schema mapping is a specification of the relationship between a source schema and a target schema. Schema mappings are fundamental building blocks in data integration and data exchange and, as such, obtaining the right schema mapping constitutes a major step towards the integration or exchange of data. Up to now, schema mappings have typically been specified manually or have been derived using mapping-design systems that automatically generate a schema mapping from a visual specification of the relationship between two schemas.
2013
Embracing Incompleteness in Schema Mappings. Patricia C. Rodriguez-Gianolli, Doctor of Philosophy, Graduate Department of Computer Science, University of Toronto, 2013. Various forms of information integration have become ubiquitous in current Business Intelligence (BI) technologies. In many cases, the semantic relationship between heterogeneous data sources is specified using high-level declarative rules, called schema mappings. For decades, Skolem functions have been regarded as an important tool in schema mappings as they permit a precise representation of incomplete information. The powerful mapping language of second-order tuple generating dependencies (SO tgds) permits arbitrary Skolem functions and has been proven to be the right class for modeling many integration problems, such as composition and correlation of mappings. This language is strictly more powerful than the languages used in many integration systems, including source-to-target and nested tgds, which are both first-order…
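As a small illustration of why Skolem functions matter here (schema names invented for the example): the SO tgd

    ∃f ∀e ∀d ( Emp(e, d) → Rep(e, f(e)) )

assigns every employee e a representative f(e). The Skolem term f(e) records that the representative is unknown but functionally determined by e, so the two source tuples Emp(e, d1) and Emp(e, d2) yield the same representative in the target; a plain existential variable in a first-order s-t tgd cannot express this functional form of incomplete information.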
Lecture Notes in Computer Science, 2009
The Clio project provides tools that vastly simplify information integration. Information integration requires data conversions to bring data in different representations into a common form. Key contributions of Clio are the definition of non-procedural schema mappings to describe the relationship between data in heterogeneous schemas, a new paradigm in which we view the mapping creation process as one of query discovery, and algorithms for automatically generating queries for data transformation from the mappings. Clio provides algorithms to address the needs of two major information integration problems, namely, data integration and data exchange. In this chapter, we present our algorithms for both schema mapping creation via query discovery, and for query generation for data exchange. These algorithms can be used in pure relational, pure XML, nested relational, or mixed relational and nested contexts.
The VLDB Journal, 2011
Schema mappings are high-level specifications that describe the relationship between two database schemas. They are an important tool in several areas of database research, notably in data integration and data exchange. However, a concrete theory of schema mapping optimization, including the formulation of optimality criteria and the construction of algorithms for computing optimal schema mappings, is completely lacking to date. The goal of this work is to fill this gap. We start by presenting a system of rewrite rules to minimize sets of source-to-target tuple-generating dependencies (st-tgds, for short). Moreover, we show that the result of this minimization is unique up to variable renaming. Hence, our optimization also yields a schema mapping normalization. By appropriately extending our rewrite rule system, we also provide a normalization of schema mappings containing equality-generating target dependencies (egds). An important application of such a normalization is in the area of defining the semantics of query answering in data exchange, since several definitions in this area depend on the concrete syntactic representation of the st-tgds. This is, in particular, the case for queries with negated atoms and for aggregate queries. The normalization of schema mappings allows us to eliminate the effect of the concrete syntactic representation of the st-tgds from the semantics of query answering. We discuss in detail how our results can be fruitfully applied to aggregate queries.
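A small example of the kind of redundancy such rewrite rules eliminate (the rules are ours, for illustration): in the set

    { Emp(e, d) → Dept(d),   Emp(e, d) ∧ Senior(e) → Dept(d) }

the second st-tgd is logically implied by the first, since any tuple satisfying its stronger premise already satisfies Emp(e, d); minimization therefore removes it, and by the paper's uniqueness result the reduced set is canonical up to variable renaming.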
2005
Schema mappings are high-level specifications that describe the relationship between database schemas. Schema mappings are prominent in several different areas of database management, including database design, information integration, data exchange, metadata management, and peer-to-peer data management systems. Our main aim in this paper is to present an overview of recent advances in data exchange and metadata management, where the schema mappings are between relational schemas. In addition, we highlight some research issues and directions for future work.
Proceedings of the 12th …, 2009
Schema mappings define relationships between schemas in a declarative way. We demonstrate MVT, a mapping validation tool that allows the designer to ask whether the mapping has certain desirable properties. The answers to these questions provide information on whether the mapping adequately matches the intended needs and requirements. MVT is able to deal with a highly expressive class of mappings and database schemas, which allows the use of negations, order comparisons and null values. The tool not only provides a Boolean answer as the test result, but also feedback on that result. Depending on the tested property and on the test result, the provided feedback can be in the form of example schema instances, or in the form of an explanation, that is, highlighting the mapping assertions and schema constraints responsible for the result.
Proceedings of the 26th International Conference on Scientific and Statistical Database Management - SSDBM '14, 2014
Automatic schema matching algorithms are typically only concerned with finding attribute correspondences. However, real world data integration problems often require matchings whose arguments span all three types of elements in relational databases: relation, attribute and data value. This paper introduces the definitions and semantics of three additional correspondence types concerning both schema and data values. These correspondences cover the higher-order mappings identified in a seminal paper by Krishnamurthy, Litwin, and Kent. It is shown that these correspondences can be automatically translated to tuple generating dependencies (tgds), and thus this research is compatible with data integration applications that leverage tgds.
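To make the correspondence-to-tgd translation concrete (a hypothetical example, not taken from the paper): a higher-order correspondence that turns a source relation name into a target data value, say from a relation Q1Sales(product, amount) to Sales(quarter, product, amount), can be expressed as the tgd

    Q1Sales(p, a) → Sales('Q1', p, a)

where the constant 'Q1' encodes the former relation name. Ordinary attribute correspondences translate analogously, with existential variables standing in for unmatched target attributes.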
Conceptual Modeling – ER 2007, 2007
Schema mappings come in different flavors: simple correspondences are produced by schema matchers, while intensional mappings are used for schema integration. However, the execution of mappings requires a formalization based on the extensional semantics of models. This problem is aggravated if multiple metamodels are involved. In this paper we present extensional mappings, based on second-order tuple generating dependencies, between models in our Generic Role-based Metamodel GeRoMe. By using a generic metamodel, our mappings support data translation between heterogeneous metamodels. Our mapping representation provides grouping functionalities that allow for complete restructuring of data, which is necessary for handling nested data structures such as XML and object-oriented models. Furthermore, we present an algorithm for mapping composition and for optimization of the composition result. To verify the genericity, correctness, and composability of our approach, we implemented a data translation tool and mapping export for several data manipulation languages.
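A classic illustration (due to Fagin et al., abbreviated here) of why composition calls for second-order tgds: take M1 = { Emp(e) → ∃m Mgr(e, m) } and M2 = { Mgr(e, m) → Mgr1(e, m),  Mgr(e, e) → SelfMgr(e) }. Their composition is expressed by the SO tgd

    ∃f ( ∀e (Emp(e) → Mgr1(e, f(e)))  ∧  ∀e (Emp(e) ∧ e = f(e) → SelfMgr(e)) )

and by no finite set of first-order s-t tgds; the Skolem function f records, across the two rules, which manager was chosen for each employee.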
The VLDB Journal, 2012
One of the main steps towards integration or exchange of data is to design the mappings that describe the (often complex) relationships between the source schemas or formats and the desired target schema. In this paper, we introduce a new operator, called MapMerge, that can be used to correlate multiple, independently designed schema mappings of smaller scope into larger schema mappings. This allows a more modular construction of complex mappings from various types of smaller mappings, such as schema correspondences produced by a schema matcher or pre-existing mappings that were designed by either a human user or via mapping tools. In particular, the new operator also enables a new "divide-and-merge" paradigm for mapping creation, where the design is divided (on purpose) into smaller components that are easier to create and understand, and where MapMerge is used to automatically generate a meaningful overall mapping. We describe our MapMerge algorithm and demonstrate the feasibility of our implementation on several real and synthetic mapping scenarios. In our experiments, we make use of a novel similarity measure between two database instances with different schemas that quantifies the preservation of data associations. We show experimentally that MapMerge improves the quality of the schema mappings, by significantly increasing the similarity between the input source instance and the generated target instance.
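A small example of the association-preservation issue MapMerge addresses (schemas invented for illustration): given the two independently designed mappings

    Dept(d, b) → TDept(d, b)      and      Emp(e, d) → ∃d′ TEmp(e, d′)

a naive union loses the fact that an employee's department in the target should be the one recorded in the source; a correlated mapping such as Emp(e, d) ∧ Dept(d, b) → TEmp(e, d) ∧ TDept(d, b) preserves that association, which is the kind of improvement the paper's instance-similarity measure is designed to reward.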
Proceedings of the twenty-seventh ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems - PODS '08, 2008
A schema mapping is a high-level specification that describes the relationship between two database schemas. As schema mappings constitute the essential building blocks of data exchange and data integration, an extensive investigation of the foundations of schema mappings has been carried out in recent years. Even though several different aspects of schema mappings have been explored in considerable depth, the study of schema-mapping optimization remains largely uncharted territory to date.
The VLDB Journal, 2012
The inversion of schema mappings has been identified as one of the fundamental operators for the development of a general framework for metadata management. During the last few years, three alternative notions of inversion for schema mappings have been proposed (Fagin-inverse [11], quasi-inverse [15] and maximum recovery [4]). However, these notions lack some fundamental properties which limit their practical applicability: most of them are expressed in languages including features that are difficult to use in practice, some of these inverses are not guaranteed to exist for mappings specified with source-to-target tuple-generating dependencies (st-tgds), and it has been futile to search for a meaningful mapping language that is closed under any of these notions of inverse. In this paper, we develop a framework for the inversion of schema mappings that fulfills all of the above requirements. It is based on the notion of C-maximum recovery, for a query language C, a notion designed to generate inverse mappings that recover back only the information that can be retrieved with queries in C. By focusing on the language of conjunctive queries (CQ), we are able to find a mapping language that contains the class of st-tgds, is closed under CQ-maximum recovery, and for which the chase procedure can be used to exchange data efficiently. A preliminary version of this article appeared in PVLDB [3].
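For intuition, a textbook-style example (not from the paper): the mapping specified by the st-tgd A(x, y) → B(x) loses the second attribute, so it has no Fagin-inverse; a maximum recovery is given by B(x) → ∃y A(x, y), which recovers exactly the source information still visible through the target. The C-maximum recovery studied in the paper relativizes this idea to a query language C, such as conjunctive queries.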
2013
Abstract. A schema mapping is a formal specification of the relationship holding between the databases conforming to two given schemas, called source and target, respectively. While in the general case a schema mapping is specified in terms of assertions relating two queries in some given language, various simplified forms of mappings, in particular LAV and GAV, have been considered, based on desirable properties that these forms enjoy.
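Concretely (standard forms, with example rules of our own): a GAV assertion has a single atom on the target side, as in Emp(e, d) ∧ Dept(d, b) → WorksIn(e, b), while a LAV assertion has a single atom on the source side, as in Emp(e, d) → ∃b WorksIn(e, b) ∧ Dept(d, b); general GLAV assertions such as s-t tgds allow conjunctions on both sides. The simplified forms are attractive because of properties like these: under GAV, for instance, query answering reduces to unfolding the target relations into their source definitions.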
2008 IEEE 24th International Conference on Data Engineering, 2008
Many data integration solutions in the market today include tools for schema mapping, to help users visually relate elements of different schemas. Schema elements are connected with lines, which are interpreted as mappings, i.e. high-level logical expressions capturing the relationship between source and target data-sets; these are compiled into queries and programs that convert source-side data instances into target-side instances. This paper describes Clip, an XML Schema mapping tool distinguished from existing tools in that mappings explicitly specify structural transformations in addition to value couplings. Since Clip maps hierarchical XML schemas, lines appear naturally nested. We describe the transformation semantics associated with our "lines" and how they combine to form mappings that are more expressive than those generated by Clio, a well-known mapping tool. Further, we extend Clio's mapping generation algorithms to generate Clip's mappings.
2011
Recent results in schema-mapping and data-exchange research may be considered the starting point for a new generation of systems, capable of dealing with a significantly larger class of applications. In this paper we demonstrate the first of these second-generation systems, called ++Spicy. We introduce a number of scenarios from a variety of data management tasks, such as data fusion, data cleaning, and ETL, and show how, based on the system, schema mapping and data exchange techniques can be very effectively applied in these contexts. We compare ++Spicy to previous generations of tools to show that it represents a much-needed advancement in the field.
Proceedings of the 13th International Conference on Extending Database Technology - EDBT '10, 2010
The specification of schema mappings has proved to be time- and resource-consuming, and has been recognized as a critical bottleneck to the large-scale deployment of data integration systems. In an attempt to address this issue, dataspaces have been proposed as a data management abstraction that aims to reduce the up-front cost required to set up a data integration system by gradually specifying schema mappings through interaction with end users in a pay-as-you-go fashion. As a step in this direction, we explore an approach for incrementally annotating schema mappings using feedback obtained from end users. In doing so, we do not expect users to examine mapping specifications; rather, they comment on results to queries evaluated using the mappings. Using annotations computed on the basis of user feedback, we present a method for selecting, from the set of candidate mappings, those to be used for query evaluation, considering user requirements in terms of precision and recall. In doing so, we cast mapping selection as an optimization problem. Mapping annotations may reveal that the quality of schema mappings is poor. We also show how feedback can be used to support the derivation of better-quality mappings from existing mappings through refinement. An evolutionary algorithm is used to efficiently and effectively explore the large space of mappings that can be obtained through refinement. The results of evaluation exercises show the effectiveness of our solution for annotating, selecting and refining schema mappings.
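Since the abstract casts mapping selection as an optimization over annotated candidates, the following deliberately simplified Python sketch conveys the flavor of that idea (it is not the paper's algorithm; the mapping names and scores are invented): among mappings annotated with precision and recall estimates, keep those meeting a user's precision requirement and prefer higher recall.

    # Simplified sketch of precision/recall-driven mapping selection
    # (illustrative only; not the algorithm of the paper).
    def select_mappings(candidates, min_precision):
        """candidates: list of (name, precision, recall) annotations from feedback."""
        admissible = [c for c in candidates if c[1] >= min_precision]
        # Among admissible mappings, prefer those promising the highest recall.
        return sorted(admissible, key=lambda c: c[2], reverse=True)

    annotated = [("m1", 0.92, 0.40), ("m2", 0.55, 0.80), ("m3", 0.85, 0.65)]
    print(select_mappings(annotated, min_precision=0.80))
    # -> [('m3', 0.85, 0.65), ('m1', 0.92, 0.4)]

The paper's actual method additionally refines mappings with an evolutionary algorithm; the sketch only shows the selection step.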
Schema Matching and Mapping, 2010
The increasing demand for matching and mapping tasks in modern integration scenarios has led to a plethora of tools for facilitating these tasks. While this plethora made the tools available to a broader audience, it also led to some confusion regarding the exact nature, goals, core functionalities, expected features and basic capabilities of these tools. Above all, it made measuring the performance of these systems, and distinguishing among them, a difficult task. The need for the design and development of comparison standards that allow the evaluation of these tools is becoming apparent. These standards are particularly important to users of mapping and matching systems, since they allow them to evaluate the relative merits of the systems and make the right business decisions. They are also important to mapping system developers, since they offer a way of comparing a system against competitors and motivating improvements and further development. Finally, they are important to researchers, since they serve as illustrations of the limitations of existing systems, triggering further research in the area. In this work we provide a generic overview of the existing efforts on benchmarking schema matching and mapping tasks. We offer a comprehensive description of the problem, list the basic comparison criteria and techniques, and describe the main functionalities and characteristics of existing systems.
2007
Abstract. In many applications it is important to find a meaningful relationship between the schemas of a source and a target database. This relationship is expressed in terms of declarative logical expressions called schema mappings. The more successful previous solutions have relied on inputs such as simple element correspondences between schemas, in addition to local schema constraints such as keys and referential integrity.