A data quality dimension is a term for a quality measure that applies to data elements at many levels, including the attribute, record, table, and system, or to more abstract groupings such as a business unit, company, or product range. This paper presents a thorough analysis of three data quality dimensions: completeness, relevance, and duplication. It also covers the commonly used techniques for each dimension. For completeness, predictive value imputation, distribution-based imputation, KNN, and other methods are investigated. The relevance dimension is explored through filter and wrapper approaches, rough set theory, hybrid feature selection, and other techniques. Duplication is investigated through techniques such as K-medoids, the standard duplicate elimination algorithm, online record matching, and sorted blocks.