2005
14 pages
The exploration of data to extract information or knowledge to support decision making is a critical success factor for an organization in today's society. However, several problems can affect data quality. These problems have a negative effect on the results extracted from data, affecting their usefulness and correctness. In this context, it is important to know and understand the data problems. This paper presents a taxonomy of data quality problems, organizing them by granularity levels of occurrence. A formal definition is presented for each problem included. The taxonomy provides rigorous definitions, which are information-richer than the textual definitions used in previous works. These definitions are useful for the development of a data quality tool that automatically detects the identified problems.
Journal of Data and Information Quality
In today's society, the exploration of one or more databases to extract information or knowledge to support management is a critical success factor for an organization. However, it is well known that several problems can affect data quality. These problems have a negative effect on the results extracted from data, influencing their correctness and validity. In this context, it is important to understand these data problems both theoretically and in practice. This paper presents a taxonomy of data quality problems, derived from real-world databases. The taxonomy organizes the problems at different levels of abstraction. Methods to detect data quality problems, represented as binary trees, are also proposed for each abstraction level. The paper also compares this taxonomy with others already proposed in the literature.
Nowadays, activities and decision making in an organization are based on data and on information obtained from data analysis, which provides various services for constructing reliable and accurate processes. As data are significant resources in all organizations, the quality of data is critical for managers and operating processes to identify related performance issues. Moreover, high-quality data can increase the opportunity for achieving top services in an organization. However, identifying the various aspects of data quality, from definitions, dimensions, and types to strategies and techniques, is essential to equip methods and processes for improving data. This paper focuses on a systematic review of data quality dimensions for use in a proposed framework that combines data mining and statistical techniques to measure dependencies among dimensions and to illustrate how extracting knowledge can increase process quality.
International Journal of Business Information Systems, 2016
Data quality is significant to companies, but it is an issue that can be challenging to approach and operationalise. This study focuses on data quality from the perspective of operationalisation by analysing the practices of a company that is a world leader in its business. A model is proposed for managing data quality to enable evaluation and operationalisation. The results indicate that data quality is best ensured when organisation-specific aspects are taken into account. The model acknowledges the needs of different data domains, particularly those that have master data characteristics. The proposed model can provide a starting point for operationalising data quality assessment and improvement. The consequent appreciation of data quality improves data maintenance processes, IT solutions, data quality and relevant expertise, all of which form the basis for handling the origins of products.
2015
Data quality (DQ) has been studied in significant depth over the last two decades and has received attention from both the academic and the practitioner community. Over that period, a large number of data quality dimensions have been identified in the course of research and practice. While it is important to embrace the diversity of views of data quality, it is equally important for the data quality research and practitioner community to be united in the consistent interpretation of this foundational concept. In this paper, we provide a step towards this consistent interpretation. Through a systematic review of research and practitioner literature, we identify previously published data quality dimensions and embark on the analysis and consolidation of the overlapping and inconsistent definitions. We stipulate that the shared understanding facilitated by this consolidation is a necessary prelude to generic and declarative forms of requirements modeling for data quality.
Integrity, Internal Control and Security in Information Systems, 2002
This paper first examines various issues on data quality and provides an overview of current research in the area. Then it focuses on research at the MITRE Corporation to use annotations to manage data quality. Next some of the emerging directions in data quality including managing quality for the semantic web and the relationships between data quality and data mining will be discussed. Finally some of the directions for data quality will be provided.
Proceedings of the 2017 Federated Conference on Computer Science and Information Systems, 2017
The research discusses how to describe data quality and what should be taken into account when developing a universal data quality management solution. The proposed approach is to create quality specifications for each kind of data object and to make them executable. The specification can be executed step-by-step according to business process descriptions, ensuring the gradual accumulation of data in the database and data quality checking according to the specific use case. The described approach can be applied to check the completeness, accuracy, timeliness and consistency of accumulated data.
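As a rough illustration of what an executable quality specification might look like, the sketch below expresses one rule per dimension named in the abstract (completeness, accuracy, timeliness, consistency) and runs them against a single record. It is not the authors' notation or tool: the `Rule` class, the `make_order_spec` and `run_spec` functions, and the order-record fields are all hypothetical names chosen for the example, and the step-by-step execution along a business process is not modelled.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Callable, Dict, List

Record = Dict[str, object]  # hypothetical data object: a flat order record


@dataclass(frozen=True)
class Rule:
    """One executable requirement, tagged with the quality dimension it covers."""
    dimension: str
    description: str
    check: Callable[[Record], bool]


def make_order_spec(today: date, max_age_days: int = 30) -> List[Rule]:
    """Build an illustrative specification for the assumed 'order' data object."""
    max_age = timedelta(days=max_age_days)
    return [
        Rule("completeness", "customer_id is present",
             lambda r: bool(r.get("customer_id"))),
        Rule("accuracy", "quantity is a positive integer",
             lambda r: isinstance(r.get("quantity"), int) and r["quantity"] > 0),
        Rule("timeliness", f"updated within the last {max_age_days} days",
             lambda r: r.get("updated") is not None
             and today - r["updated"] <= max_age),
        Rule("consistency", "shipped date is not before ordered date",
             lambda r: r.get("shipped") is None
             or (r.get("ordered") is not None and r["ordered"] <= r["shipped"])),
    ]


def run_spec(spec: List[Rule], record: Record) -> List[str]:
    """Execute every rule and return a description of each one that fails."""
    return [f"{rule.dimension}: {rule.description}"
            for rule in spec if not rule.check(record)]


if __name__ == "__main__":
    spec = make_order_spec(today=date(2024, 1, 31))
    record = {"customer_id": "", "quantity": 3,
              "updated": date(2024, 1, 20),
              "ordered": date(2024, 1, 10), "shipped": date(2024, 1, 12)}
    for failure in run_spec(spec, record):
        print(failure)
```

Keeping each rule as data (a dimension label, a description, and a predicate) means the same specification can be rerun as the record accumulates more fields, which is one plausible reading of the gradual checking the abstract describes.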
JUCS - Journal of Universal Computer Science, 2020
The paper proposes a new data object-driven approach to data quality evaluation. It consists of three main components: (1) a data object, (2) data quality requirements, and (3) data quality evaluation process. As data quality is of relative nature, the data object and quality requirements are (a) use-case dependent and (b) defined by the user in accordance with his needs. All three components of the presented data quality model are described using graphical Domain Specific Languages (DSLs). In accordance with Model-Driven Architecture (MDA), the data quality model is built in two steps: (1) creating a platform-independent model (PIM), and (2) converting the created PIM into a platform-specific model (PSM). The PIM comprises informal specifications of data quality. The PSM describes the implementation of a data quality model, thus making it executable, enabling data object scanning and detecting data quality defects and anomalies. The proposed approach was applied to open data sets, ...
Proceedings of the 16th International …, 2011
Data quality (DQ) assessment can be significantly enhanced with the use of the right DQ assessment methods, which provide automated solutions to assess DQ. The range of DQ assessment methods is very broad: from data profiling and semantic profiling to data matching and data validation. This paper gives an overview of current methods for DQ assessment and classifies the DQ assessment methods into an existing taxonomy of DQ problems. Specific examples of the placement of each DQ method in the taxonomy are provided and illustrate why the method is relevant to the particular taxonomy position. The gaps in the taxonomy, where no current DQ methods exist, show where new methods are required and can guide future research and DQ tool development.
Australasian Database Conference, 2011
Data Quality is a cross-disciplinary and often domain-specific problem due to the importance of fitness for use in the definition of data quality metrics. It has been the target of research and development for over four decades by business analysts, solution architects, database experts and statisticians, to name a few. However, the changing landscape of data quality challenges indicates the need for holistic solutions. As a first step towards bridging any gaps between the various research communities, we undertook a comprehensive literature study of data quality research published in the last two decades. In this study we considered a broad range of Information System (IS) and Computer Science (CS) publication (conference and journal) outlets. The main aims of the study were to understand the current landscape of data quality research, to create better awareness of (lack of) synergies between various research communities, and, subsequently, to direct attention towards holistic solutions. In this paper, we present a summary of the findings from the study, which include a taxonomy of data quality problems, identification of the top themes, outlets and main trends in data quality research, as well as a detailed thematic analysis that outlines the overlaps and distinctions between the focus of IS and CS publications.
Handbook of Data Quality, 2013
This handbook is motivated by the presence of diverse communities within the area of data quality management, which have individually contributed a wealth of knowledge on data quality research and practice. The chapter presents a snapshot of these contributions from both research and practice, and highlights the background and rationale for the handbook.
IEEE Transactions on Knowledge and Data Engineering, 1995
Journal of Theoretical and Applied Information Technology, 2017
Future Computing and Informatics Journal, 2021
Journal of Theoretical and Applied Information Technology, 2019
Information & Management, 1980
2015 IEEE 21st Pacific Rim International Symposium on Dependable Computing (PRDC), 2015
Annual Review of Statistics and Its Application
Faculty of Science and Technology School of Information Technology, 2010
Information & Management, 1980
Proceedings of the 5th International Conference on Data Management Technologies and Applications, 2016