2010
Data cleaning and Extract-Transform-Load processes are usually modeled as graphs of data transformations. These graphs typically involve a large number of data transformations, and must handle large amounts of data. The involvement of the users responsible for executing the corresponding programs over real data is important to tune data transformations and to manually correct data items that cannot be treated automatically.
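The graph-of-transformations model described above can be sketched in a few lines. This is a minimal illustration, not an API from any of the papers listed here; the `CleaningGraph` class, step names, and sample records are all invented for the example.

```python
# Minimal sketch of a data cleaning graph: each node is a transformation
# applied to every record, and the list order stands in for the edges of
# a linear transformation graph. Names are illustrative only.

class CleaningGraph:
    def __init__(self):
        self.steps = []  # (name, transformation) pairs, applied in order

    def add_step(self, name, fn):
        self.steps.append((name, fn))
        return self

    def run(self, records):
        for name, fn in self.steps:
            records = [fn(r) for r in records]
        return records

graph = CleaningGraph()
graph.add_step("trim", lambda r: {k: v.strip() if isinstance(v, str) else v
                                  for k, v in r.items()})
graph.add_step("lower_email", lambda r: {**r, "email": r["email"].lower()})

cleaned = graph.run([{"name": "  Ada ", "email": "ADA@EXAMPLE.ORG"}])
```

Real cleaning graphs are DAGs with branching and merging, but the record-by-record application of tuned transformations is the core idea.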
Lecture Notes in Computer Science, 2011
Data cleaning and ETL processes are usually modeled as graphs of data transformations. The involvement of the users responsible for executing these graphs over real data is important to tune data transformations and to manually correct data items that cannot be treated automatically. In this paper, to better support user involvement in data cleaning processes, we equip a data cleaning graph with data quality constraints, which help users identify the points of the graph and the records that need their attention, and with manual data repairs, which represent the way users can provide the feedback required to manually clean some data items. We provide preliminary experimental results that show the significant gains obtained with the use of data cleaning graphs.
2001
The problem of data cleaning, which consists of removing inconsistencies and errors from original data sets, is well known in the area of decision support systems and data warehouses. However, for some applications, existing ETL (Extraction, Transformation, Loading) and data cleaning tools for writing data cleaning programs are insufficient. One important challenge is the design of a data flow graph that effectively generates clean data; another common difficulty is the lack of explanation of cleaning results and of user interaction facilities for tuning a data cleaning program. This paper presents a solution to this problem that enables users to express user interactions declaratively and to tune data cleaning programs.
ArXiv, 2017
Data cleaning refers to the process of detecting and fixing errors in the data. Human involvement is instrumental at several stages of this process, e.g., to identify and repair errors, to validate computed repairs, etc. There is currently a plethora of data cleaning algorithms addressing a wide range of data errors (e.g., detecting duplicates, violations of integrity constraints, missing values, etc.). Many of these algorithms involve a human in the loop; however, the human involvement is usually tightly coupled to the underlying cleaning algorithms. There is currently no end-to-end data cleaning framework that systematically involves humans in the cleaning pipeline regardless of the underlying cleaning algorithms. In this paper, we highlight key challenges that need to be addressed to realize such a framework. We present a design vision and discuss scenarios that motivate the need for such a framework to judiciously assist humans in the cleaning process. Finally, we present directions to implement ...
Proceedings of the Workshop on Human-In-the-Loop Data Analytics
The importance of data cleaning systems has grown continuously in recent years. Especially for real-time streaming applications, it is crucial to identify, and possibly remove, anomalies in the data on the fly before further processing. The main challenge, however, lies in the construction of an appropriate data cleaning pipeline, which is complicated by the dynamic nature of streaming applications. To simplify this process and help data scientists explore and understand the incoming data, we propose an interactive data cleaning system for streaming applications. In this paper, we list requirements for such a system and present our implementation to overcome the stated issues. Our demonstration shows how a data cleaning pipeline can be interactively created, executed, and monitored at runtime. We also present several tools, such as the automated advisor and the adaptive visualizer, that engage the user in the data cleaning process and help them understand the behavior of the pipeline.
Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data, 2014
Data cleaning techniques usually rely on some quality rules to identify violating tuples, and then fix these violations using some repair algorithms. Oftentimes, the rules, which are related to the business logic, can only be defined on some target report generated by transformations over multiple data sources. This creates a situation where the violations detected in the report are decoupled in space and time from the actual source of errors. In addition, applying the repair on the report would need to be repeated whenever the data sources change. Finally, even if repairing the report is possible and affordable, this would be of little help towards identifying and analyzing the actual sources of errors for future prevention of violations at the target. In this paper, we propose a system to address this decoupling. The system takes quality rules defined over the output of a transformation and computes explanations of the errors seen on the output. This is performed both at the target level to describe these errors and at the source level to prescribe actions to solve them. We present scalable techniques to detect, propagate, and explain errors. We also study the effectiveness and efficiency of our techniques using the TPC-H Benchmark for different scenarios and classes of quality rules. * Work partially done while at QCRI.
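As a hedged illustration of the kind of quality rule such a system takes as input, the sketch below flags tuples in a report that violate a functional dependency zip → city. The column names and data are invented for the example; the error-propagation and explanation machinery of the paper is not modeled here.

```python
from collections import defaultdict

# Detect tuples violating a functional dependency lhs -> rhs: any lhs
# value mapped to more than one rhs value marks all its tuples as suspect.
def fd_violations(rows, lhs, rhs):
    groups = defaultdict(set)
    for row in rows:
        groups[row[lhs]].add(row[rhs])
    bad_keys = {k for k, vals in groups.items() if len(vals) > 1}
    return [row for row in rows if row[lhs] in bad_keys]

report = [
    {"zip": "10001", "city": "New York"},
    {"zip": "10001", "city": "NYC"},            # conflicting city for same zip
    {"zip": "94105", "city": "San Francisco"},
]
violating = fd_violations(report, "zip", "city")
```

Detecting such violations on the report is the easy half; the paper's contribution lies in tracing them back through the transformation to the offending source tuples.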
Proceedings of the Workshop on Human-In-the-Loop Data Analytics - HILDA'19
Data cleaning refers to the process of detecting and fixing errors in the data. Human involvement is instrumental at several stages of this process, such as providing rules or validating computed repairs. There is a plethora of data cleaning algorithms addressing a wide range of data errors (e.g., detecting duplicates, violations of integrity constraints, and missing values). Many of these algorithms involve a human in the loop; however, the human involvement is usually tightly coupled to the underlying cleaning algorithms. In a real data cleaning pipeline, several data cleaning operations are performed using different tools. High-level reasoning over these tools, when they are combined to repair the data, has the potential to unlock useful use cases for involving humans in the cleaning process. Additionally, we believe there is an opportunity to benefit from recent advances in active learning methods to minimize the effort humans have to spend verifying data items produced by tools or humans. There is currently no end-to-end data cleaning framework that systematically involves humans in the cleaning pipeline regardless of the underlying cleaning algorithms. In this paper, we present opportunities that this framework could offer and highlight key challenges that need to be addressed to realize this vision. We present a design vision and discuss scenarios that motivate the need for this framework to judiciously assist humans in the cleaning process.
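The active-learning idea above, spending scarce human attention on the repairs the system is least sure about, can be sketched as follows. The scoring scheme (distance of model confidence from 0.5) and all names are assumptions for illustration, not the paper's method.

```python
# Hedged sketch of uncertainty-based selection of repairs for human review:
# rank candidate repairs by how close their confidence is to 0.5 (maximum
# uncertainty) and hand the top `budget` of them to the human.
def select_for_review(candidates, budget):
    """candidates: list of (record_id, proposed_repair, confidence in [0, 1]).
    Returns the ids whose repairs the system is least sure about."""
    ranked = sorted(candidates, key=lambda c: abs(c[2] - 0.5))
    return [c[0] for c in ranked[:budget]]

picks = select_for_review(
    [("r1", "fix_a", 0.95), ("r2", "fix_b", 0.55), ("r3", "fix_c", 0.51)],
    budget=2,
)
```

With a review budget of 2, the confident repair `r1` is applied automatically while the uncertain `r3` and `r2` go to the human.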
We classify data quality problems that are addressed by data cleaning and provide an overview of the main solution approaches. Data cleaning is especially required when integrating heterogeneous data sources and should be addressed together with schema-related data transformations. In data warehouses, data cleaning is a major part of the so-called ETL process. We also discuss current tool support for data cleaning.
ITEGAM- Journal of Engineering and Technology for Industrial Applications (ITEGAM-JETIA), 2020
One of the great challenges in obtaining knowledge from data sources is ensuring consistency and non-duplication of stored information. Many techniques have been proposed to minimize the cost of this work and to allow data to be analyzed and properly corrected. However, other aspects essential to the success of the data cleaning process span several technological areas: performance, semantics, and autonomy of the process. Against this backdrop, we developed an automated, configurable data cleaning environment based on training and physical-semantic data similarity, aiming to provide a more efficient and extensible tool for correcting information that covers problems not yet fully explored, such as the semantics and autonomy of the cleaning process. Among its objectives, the developed work reduces user interaction in the process of analyzing and correcting data inconsistencies and duplications. With a properly calibrated environment, the efficiency is significant, covering approximately 90% of the inconsistencies in the database with no false positives. We also demonstrate approaches that, besides detecting and treating inconsistencies and duplications of positive cases, address detected false positives and the negative impact they may have on the data cleaning process, whether manual or automated, which is not yet widely discussed in the literature. The most significant contribution of this work is the developed tool, which, without user interaction, automatically analyzes and eliminates 90% of the inconsistencies and duplications of information contained in a database, with no occurrence of false positives. The test results proved the effectiveness of all the developed features, relevant to each module of the proposed architecture, and the experiments demonstrated the effectiveness of the tool in several scenarios.
Proceedings of 16th International Conference on Data Engineering (Cat. No.00CB37073), 2000
Data integration solutions dealing with large amounts of data have been strongly required in the last few years. Besides the traditional data integration problems (e.g., schema integration, local-to-global schema mappings), three additional data problems have to be dealt with: (1) the absence of universal keys across different databases, known as the object identity problem; (2) the existence of keyboard errors in the data; and (3) the presence of inconsistencies in data coming from multiple sources. Dealing with these problems is globally called the data cleaning process. In this work, we propose a framework which offers the fundamental services required by this process: data transformation, duplicate elimination and multi-table matching. These services are implemented using a set of purposely designed macro-operators. Moreover, we propose an SQL extension for specifying each of the macro-operators. One important feature of the framework is the ability to explicitly include human interaction in the process. The main novelty of the work is that the framework permits the following performance optimizations, which are tailored for data cleaning applications: mixed evaluation, neighborhood hash join, decision push-down and short-circuited computation. We measure the benefits of each.
Data assessment and data cleaning tasks have traditionally been addressed through procedural solutions. Most of the time, those solutions have been applicable to specific problems and domains. In the last few years we have seen the emergence of more generic solutions; and also of declarative and rule-based specifications of the intended solutions of data cleaning processes. In this chapter we review some of those recent developments.
International Journal of Knowledge-Based Organizations, 2011
The quality of the real-world data being fed into a data warehouse is a major concern today. As the data comes from a variety of sources, it must be checked for errors and anomalies before being loaded into the data warehouse. There may be exact or approximate duplicate records in the source data. The presence of incorrect or inconsistent data can significantly distort the results of analyses, often negating the potential benefits of information-driven approaches. This paper addresses issues related to the detection and correction of such duplicate records. It also analyzes data quality and the various factors that degrade it. A brief analysis of existing work is discussed, pointing out its major limitations, and a new framework is proposed that improves on the existing technique.
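Approximate duplicate detection of the kind discussed above can be sketched with Python's standard `difflib`; the 0.85 threshold is an arbitrary illustrative choice, and real systems use domain-tuned similarity functions and blocking to avoid comparing every pair.

```python
from difflib import SequenceMatcher

# Two field values are treated as approximate duplicates when their
# similarity ratio exceeds a threshold; exact duplicates score 1.0.
def is_approx_duplicate(a, b, threshold=0.85):
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold
```

For example, "Jon Smith" would be flagged as an approximate duplicate of "John Smith", while "Mary Jones" would not.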
International Journal of Computer Applications, 2013
Data cleansing (or data scrubbing) is the activity of detecting and correcting errors and inconsistencies in a data warehouse. Poor-quality data, i.e., dirty data present in a data mart, can thus be avoided using various data cleaning strategies, leading to more accurate and hence more reliable decision making. Quality data can only be produced by cleaning and pre-processing the data prior to loading it into the data warehouse.
Proceedings of the 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management, 2015
The organizations' demand to integrate several heterogeneous data sources and an ever-increasing volume of data is revealing the presence of quality problems in data. Currently, most of the data cleaning approaches (for detection and correction of data quality problems) are tailored for data sources with the same schema and sharing the same data model (e.g., relational model). On the other hand, these approaches are highly dependent on a domain expert to specify the data cleaning operations. This paper extends a previously proposed data cleaning methodology that reuses cleaning knowledge specified for other data sources. The methodology is further detailed/refined by specifying the requirements that a data cleaning operations vocabulary must satisfy. Ontologies in RDF/OWL are proposed as the data model for an abstract representation of the data schemas, no matter which data model is used (e.g., relational; graph). Existing approaches, methods and techniques that support the implementation of the proposed methodology, in general, and specifically of the data cleaning operations vocabulary are also presented and discussed in this paper.
2021
In this paper, we study the explainability of automated data cleaning pipelines and propose CLeanEX, a solution that can generate explanations for the pipelines automatically selected by an automated cleaning system, given it can provide its corresponding cleaning pipeline search space. We propose meaningful explanatory features that are used to describe the pipelines and generate predicate-based explanation rules. We compute quality indicators for these explanations and propose a multi-objective optimization algorithm to select the optimal set of explanations for user-defined objectives. Preliminary experiments show the need for multi-objective optimization for the generation of high-quality explanations that can be either intrinsic to the single selected cleaning pipeline or relative to the other pipelines that were not selected by the automated cleaning system. We also show that CLeanEX is a promising step towards generating automatically insightful explanations, while catering t...
Ijca Proceedings on National Conference on Role of Engineers in National Building, 2014
A data warehouse contains large volumes of data, and data quality is an important issue in data warehousing projects. Many business decision processes are based on the data entered in the data warehouse, so improving data quality is necessary for accurate data. Data may include text errors, quantitative errors, or even duplication. There are several ways to remove such errors and inconsistencies from the data. Data cleaning is a process of detecting and correcting inaccurate data. Different types of algorithms, such as the Improved PNRS algorithm, the Quantitative algorithm, and the Transitive algorithm, are used for the data cleaning process. In this paper, an attempt has been made to clean the data in the data warehouse by combining different approaches to data cleaning. Text data is cleaned by the Improved PNRS algorithm, quantitative data is cleaned by special rules (an enhanced technique), and, lastly, duplicated data is removed by the transitive closure algorithm. By applying these algorithms one after another on the data sets, the accuracy of the dataset is increased.
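The transitive-closure step mentioned above (if record A matches B and B matches C, all three belong to one duplicate group) is commonly implemented with union-find. The sketch below is a generic illustration of that step under that assumption, not the paper's exact algorithm.

```python
# Group records into duplicate clusters given pairwise match decisions,
# taking the transitive closure with a union-find structure.
def duplicate_clusters(n, pairs):
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        parent[find(a)] = find(b)

    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())

clusters = duplicate_clusters(5, [(0, 1), (1, 2), (3, 4)])
```

Here records 0, 1, and 2 collapse into one cluster even though 0 and 2 were never directly compared.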
2006
There is no magic solution for data cleaning: the user always has to specify the cleaning operations to perform, and a huge number of operations may have to be specified. Yet this is the condition for detecting and correcting data quality problems successfully. Most cleaning operations are generic enough to be applied to different databases; these operations may be limited to databases of the same domain or so general that they are domain independent. The traditional approach to data cleaning is to specify the operations at the database schema level, so several changes are required to reuse a cleaning operation in another database. This paper presents an approach that supports the interoperability of operations among different databases. This is achieved through an ontological level that supports the conceptual specification of cleaning operations. This abstraction level isolates them from the schemas of the databases and allows them to be reused easily.
Dirty data is a serious problem for businesses, leading to incorrect decision making, inefficient daily operations, and ultimately wasting both time and money. A variety of integrity constraints, such as Conditional Functional Dependencies (CFDs), have been studied for data cleaning. Data repairing methods based on these constraints are good at detecting inconsistencies but are limited in how they correct data; worse, they can even introduce new errors. Based on Master Data Management principles, a new class of data quality rules known as Editing Rules (eRs) tells how to fix errors, pointing out which attributes are wrong and what values they should take. However, finding data quality rules is an expensive process that involves intensive manual effort. In this paper, we develop pattern mining techniques for discovering eRs from existing source relations (possibly dirty) with respect to master relations (supposed to be clean and accurate). In this setting, we propose a new semantics of eRs taking...
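As a hedged illustration of how an editing rule repairs data from master data, the sketch below copies a target attribute from a master tuple that agrees with the dirty tuple on the rule's key attributes. The attribute names, data, and exact-match scheme are invented for the example and simplify the formal eR semantics.

```python
# Apply a single editing rule: when a (possibly dirty) source tuple agrees
# with a (clean) master tuple on key_attrs, overwrite fix_attrs with the
# master's values. Names are illustrative only.
def apply_editing_rule(source, master, key_attrs, fix_attrs):
    fixed = []
    for t in source:
        for m in master:
            if all(t[a] == m[a] for a in key_attrs):
                t = {**t, **{a: m[a] for a in fix_attrs}}
                break
        fixed.append(t)
    return fixed

fixed = apply_editing_rule(
    source=[{"zip": "02139", "city": "Bstn"}],          # dirty city value
    master=[{"zip": "02139", "city": "Cambridge"}],     # trusted master data
    key_attrs=["zip"],
    fix_attrs=["city"],
)
```

Unlike a constraint-based repair, which only knows the tuple is inconsistent, the rule prescribes both which attribute is wrong and the value it should take.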