Papers by Mauricio Hernandez
Proceedings of the 13th International Conference on Database Theory - ICDT '10, 2010
We examine schema mappings from a type-theoretic perspective and aim to facilitate and formalize the reuse of mappings. Starting with the mapping language of Clio, we present a type-checking algorithm such that typeable mappings are necessarily satisfiable. We add type variables to the schema language and present a theory of polymorphism, including a sound and complete type inference algorithm and a semantic notion of a principal type of a mapping. Principal types, which intuitively correspond to the minimum amount of schema structure required by the mappings, have an important application for mapping reuse. Concretely, we show that mappings can be reused, with the same semantics, on any schemas as long as these schemas are expansions (i.e., subtypes) of the principal types.
Artemis
Proceedings of the VLDB Endowment, 2009

Proceedings of the VLDB Endowment, 2008
Data exchange is the process of converting an instance of one schema into an instance of a different schema according to a given specification. Recent data exchange systems have largely dealt with the case where the schemas are given a priori and transformations can only migrate data from the first schema to an instance of the second schema. In particular, the ability to perform data-metadata translations, transformations in which data is converted into metadata or metadata is converted into data, is largely ignored. This paper provides a systematic study of the data exchange problem with data-metadata translation capabilities. We describe the problem, our solution, implementation and experiments. Our solution is a principled and systematic extension of the existing data exchange framework, from the constructs required in the visual interface to specify data-metadata correspondences, which naturally extend the traditional value correspondences, to the constructs required for the mapping language to specify data-metadata translations, and to the algorithms required for generating mappings and queries that perform the exchange.
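As a rough illustration of what a data-metadata translation means, the sketch below pivots values of one column into attribute names (data to metadata) and back again (metadata to data). It is not the paper's mapping language or algorithm; the function names, column names, and sample rows are all hypothetical.

```python
from collections import defaultdict

def data_to_metadata(rows, key, name_col, value_col):
    """Pivot: each distinct value of name_col becomes an attribute name."""
    grouped = defaultdict(dict)
    for row in rows:
        grouped[row[key]][row[name_col]] = row[value_col]
    return [{key: k, **attrs} for k, attrs in grouped.items()]

def metadata_to_data(rows, key, name_col, value_col):
    """Unpivot: each attribute name other than key becomes a value of name_col."""
    return [
        {key: row[key], name_col: attr, value_col: val}
        for row in rows
        for attr, val in row.items()
        if attr != key
    ]

# Hypothetical source instance: quarters are stored as data values.
source = [
    {"company": "acme", "quarter": "q1", "sales": 10},
    {"company": "acme", "quarter": "q2", "sales": 12},
    {"company": "globex", "quarter": "q1", "sales": 7},
]

# In the target instance, the quarters have become attribute names (metadata).
pivoted = data_to_metadata(source, "company", "quarter", "sales")
restored = metadata_to_data(pivoted, "company", "quarter", "sales")
```

Here the target schema is not known a priori: its attribute names depend on the quarter values present in the source instance, which is the situation the abstract says schema-to-schema systems largely ignore.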
Explaining missing answers to SPJUA queries
Proceedings of the VLDB Endowment, 2010
This paper addresses the problem of explaining missing answers in queries that include selection, projection, join, union, aggregation and grouping (SPJUA). Explaining missing answers of queries is useful in various scenarios, including query understanding and debugging. We present a general framework for the generation of these explanations based on source data. We describe the algorithms used to generate
MapMerge
Proceedings of the VLDB Endowment, 2010
2008 IEEE 24th International Conference on Data Engineering, 2008
This paper describes Orchid, a system that converts declarative mapping specifications into data flow specifications (ETL jobs) and vice versa. Orchid provides an abstract operator model that serves as a common model for both transformation paradigms; both mappings and ETL jobs are transformed into instances of this common model. As an additional benefit, instances of this common model can be optimized and deployed into multiple target environments. Orchid is being deployed in FastTrack, a data transformation toolkit in IBM Information Server.

2008 IEEE 24th International Conference on Data Engineering, 2008
Many data integration solutions in the market today include tools for schema mapping, to help users visually relate elements of different schemas. Schema elements are connected with lines, which are interpreted as mappings, i.e. high-level logical expressions capturing the relationship between source and target data-sets; these are compiled into queries and programs that convert source-side data instances into target-side instances. This paper describes Clip, an XML Schema mapping tool distinguished from existing tools in that mappings explicitly specify structural transformations in addition to value couplings. Since Clip maps hierarchical XML schemas, lines appear naturally nested. We describe the transformation semantics associated with our "lines" and how they combine to form mappings that are more expressive than those generated by Clio, a well-known mapping tool. Further, we extend Clio's mapping generation algorithms to generate Clip's mappings.
Clio
ACM SIGMOD Record, 2001

The VLDB Journal, 2012
One of the main steps towards integration or exchange of data is to design the mappings that describe the (often complex) relationships between the source schemas or formats and the desired target schema. In this paper, we introduce a new operator, called MapMerge, that can be used to correlate multiple, independently designed schema mappings of smaller scope into larger schema mappings. This allows a more modular construction of complex mappings from various types of smaller mappings such as schema correspondences produced by a schema matcher or pre-existing mappings that were designed by either a human user or via mapping tools. In particular, the new operator also enables a new "divide-and-merge" paradigm for mapping creation, where the design is divided (on purpose) into smaller components that are easier to create and understand, and where MapMerge is used to automatically generate a meaningful overall mapping. We describe our MapMerge algorithm and demonstrate the feasibility of our implementation on several real and synthetic mapping scenarios. In our experiments, we make use of a novel similarity measure between two database instances with different schemas that quantifies the preservation of data associations. We show experimentally that MapMerge improves the quality of the schema mappings, by significantly increasing the similarity between the input source instance and the generated target instance.
International Conference on Data Engineering, 2002
Merging and coalescing data from multiple and diverse sources into different data formats continues to be an important problem in modern information systems. Schema matching (the process of matching elements of a source schema with elements of a target schema) and schema mapping (the process of creating a query that maps between two disparate schemas) are at the heart of data integration systems. We demonstrate Clio, a semi-automatic schema mapping tool developed at the IBM Almaden Research Center. In this paper, we showcase Clio's mapping engine which allows mapping to and from relational and XML schemas, and takes advantage of data constraints in order to preserve data associations.
SIGMOD Record, 2001
The Clio Project: Managing Heterogeneity. Renée J. Miller (Univ. of Toronto); Mauricio A. Hernández, Laura M. Haas, C. T. Howard Ho, Ronald Fagin, Lucian Popa, Lingling Yan (IBM Almaden Research Center) ...

International Conference on Management of Data, 2005
Clio, the IBM Research system for expressing declarative schema mappings, has progressed in the past few years from a research prototype into a technology that is behind some of IBM's mapping technology. Clio provides a declarative way of specifying schema mappings between either XML or relational schemas. Mappings are compiled into an abstract query graph representation that captures the transformation semantics of the mappings. The query graph can then be serialized into different query languages, depending on the kind of schemas and systems involved in the mapping. Clio currently produces XQuery, XSLT, SQL, and SQL/XML queries. In this paper, we revisit the architecture and algorithms behind Clio. We then discuss some implementation issues, optimizations needed for scalability, and general lessons learned on the road towards creating an industrial-strength tool.