An Ontology-based Methodology for Reusing Data Cleaning Knowledge

Ricardo Almeida; Paulo Maio; Paulo Oliveira; João Barroso

2015, Proceedings of the 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management

Abstract

Organizations' demand to integrate several heterogeneous data sources, together with an ever-increasing volume of data, is revealing the presence of data quality problems. Currently, most data cleaning approaches (for the detection and correction of data quality problems) are tailored to data sources that have the same schema and share the same data model (e.g., the relational model). Moreover, these approaches depend heavily on a domain expert to specify the data cleaning operations. This paper extends a previously proposed data cleaning methodology that reuses cleaning knowledge specified for other data sources. The methodology is further detailed and refined by specifying the requirements that a data cleaning operations vocabulary must satisfy. Ontologies in RDF/OWL are proposed as the data model for an abstract representation of data schemas, regardless of the underlying data model (e.g., relational or graph). Existing approaches, methods and techniques that support the implementation of the proposed methodology in general, and of the data cleaning operations vocabulary in particular, are also presented and discussed.
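
As an illustration of the reuse idea described in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes that cleaning operations are stated against ontology-level properties and that each data source supplies a mapping from its own field names to those properties. All identifiers (`ex:Person.age`, `CleaningOperation`, `detect`, the field names and the sample records) are hypothetical, and plain Python dictionaries stand in for an RDF/OWL representation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical ontology-level property identifiers (assumed for illustration).
AGE = "ex:Person.age"
EMAIL = "ex:Person.email"


@dataclass(frozen=True)
class CleaningOperation:
    """A detection rule stated against an ontology property, not a source field."""
    name: str
    concept: str
    is_valid: Callable[[object], bool]


# Cleaning knowledge specified once, independently of any concrete schema.
OPERATIONS: List[CleaningOperation] = [
    CleaningOperation("age-in-range", AGE,
                      lambda v: isinstance(v, int) and 0 <= v <= 150),
    CleaningOperation("email-has-at", EMAIL,
                      lambda v: isinstance(v, str) and "@" in v),
]


def detect(records: List[Dict[str, object]],
           mapping: Dict[str, str],
           operations: List[CleaningOperation]) -> List[Tuple[int, str, str]]:
    """Return (record index, source field, operation name) for each violation.

    `mapping` translates a source field name into an ontology property;
    fields without a mapping are ignored.
    """
    problems = []
    for index, record in enumerate(records):
        for field_name, value in record.items():
            concept = mapping.get(field_name)
            for op in operations:
                if op.concept == concept and not op.is_valid(value):
                    problems.append((index, field_name, op.name))
    return problems


# Source A: rows of a relational table (assumed sample data).
relational_rows = [
    {"cust_age": 34, "cust_mail": "ana@example.org"},
    {"cust_age": -5, "cust_mail": "rui.example.org"},
]
relational_mapping = {"cust_age": AGE, "cust_mail": EMAIL}

# Source B: node properties from a graph store (assumed sample data).
graph_nodes = [{"yearsOld": 200, "contact": "eva@example.org"}]
graph_mapping = {"yearsOld": AGE, "contact": EMAIL}

# The same operations are reused on both sources through their mappings.
relational_problems = detect(relational_rows, relational_mapping, OPERATIONS)
graph_problems = detect(graph_nodes, graph_mapping, OPERATIONS)
```

In this sketch, only the mapping changes between the two sources; the operations themselves are written once. A fuller treatment would replace the dictionaries with an actual RDF/OWL model and a dedicated operations vocabulary, which is what the paper sets requirements for.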

Fátima Rodrigues

This paper describes an ontology-based approach to data cleaning. Data cleaning is the process of detecting and correcting errors in databases. An ontology is a formal, explicit specification of a shared conceptualization of a domain. Our approach to data cleaning requires a set of ontologies describing the domains represented by the classes and their attributes. Using the ontology-based approach, we are able to clean data not only of syntactic errors but also of some classes of semantic errors.
