2011, Proceedings of the 16th International …
Data quality (DQ) assessment can be significantly enhanced with the use of the right DQ assessment methods, which provide automated solutions to assess DQ. The range of DQ assessment methods is very broad: from data profiling and semantic profiling to data matching and data validation. This paper gives an overview of current methods for DQ assessment and classifies them into an existing taxonomy of DQ problems. Specific examples of the placement of each DQ method in the taxonomy are provided to illustrate why the method is relevant to that particular taxonomy position. The gaps in the taxonomy, where no current DQ methods exist, show where new methods are required and can guide future research and DQ tool development.
Various techniques have been proposed to enable organisations to assess the current quality level of their data. Unfortunately, organisations have many different requirements related to data quality (DQ) assessment because of domain and context differences. Due to this range of possible requirements, organisations may be forced to select an assessment technique (AT) that is not wholly suitable for their needs. We therefore propose and evaluate the Hybrid Approach to assessing DQ, which demonstrates that it is possible to develop new techniques for assessing DQ, suitable for any set of requirements, while leveraging the best practices proposed by existing ATs.
Data quality (DQ) assessment and improvement in larger information systems would often not be feasible without suitable "DQ methods": algorithms that can be executed automatically by computer systems to detect and/or correct problems in datasets. These methods are already essential today, and they will become even more important as the quantity of data in organisational systems grows. This paper reviews existing methods for both DQ assessment and improvement and classifies them according to the DQ problem and problem context. Six gaps have been identified in the classification, where no current DQ methods exist; these show where new methods are required and serve as a guide for future research and DQ tool development.
2009
Abstract. Poor quality data may be detected and corrected by performing various quality assurance activities that rely on techniques with different efficacy and cost. In this paper, we propose a quantitative approach for measuring and comparing the effectiveness of these data quality (DQ) techniques. Our definitions of effectiveness are inspired by measures proposed in Information Retrieval.
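A minimal sketch of how such IR-inspired effectiveness measures could be computed for a single DQ technique, assuming a known ground-truth set of erroneous record identifiers; the function and data below are illustrative and not the paper's exact formulation.

def effectiveness(flagged, truly_erroneous):
    """IR-inspired effectiveness of a DQ detection technique (illustrative only)."""
    true_positives = flagged & truly_erroneous
    precision = len(true_positives) / len(flagged) if flagged else 0.0
    recall = len(true_positives) / len(truly_erroneous) if truly_erroneous else 1.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}

# Example: a validation rule flags records 1, 2 and 7; records 2, 7 and 9 are truly erroneous.
print(effectiveness({1, 2, 7}, {2, 7, 9}))   # precision = recall = 0.67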
Proceedings of the 2017 Federated Conference on Computer Science and Information Systems, 2017
The research discusses how to describe data quality and what should be taken into account when developing a universal data quality management solution. The proposed approach is to create quality specifications for each kind of data object and to make them executable. A specification can be executed step by step according to business process descriptions, ensuring the gradual accumulation of data in the database and checking data quality according to the specific use case. The described approach can be applied to check the completeness, accuracy, timeliness and consistency of accumulated data.
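A minimal sketch of what an executable, per-object quality specification might look like under the approach described, assuming a simple rule list evaluated against a record at a given business-process step; the object, steps and rules are hypothetical, not the authors' specification language.

from datetime import date

# Hypothetical quality specification for a "customer" data object: each rule names
# the dimension it checks and is executed at a specific step of the business process.
customer_spec = {
    "registration": [
        ("completeness", lambda r: r.get("name") not in (None, "")),
        ("accuracy",     lambda r: "@" in r.get("email", "")),
    ],
    "invoicing": [
        ("consistency",  lambda r: r.get("country") != "LV" or r.get("vat", "").startswith("LV")),
        ("timeliness",   lambda r: r.get("updated") is not None
                                   and (date.today() - r["updated"]).days <= 365),
    ],
}

def check(record, step, spec):
    """Run only the rules relevant to the current use case (process step)."""
    return [(dimension, rule(record)) for dimension, rule in spec.get(step, [])]

record = {"name": "SIA Example", "email": "info@example.lv",
          "country": "LV", "vat": "LV40001234567", "updated": date.today()}
print(check(record, "invoicing", customer_spec))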
Proceedings of the 8th international workshop on Software quality - WoSQ '11, 2011
This industrial contribution describes a tool-supported approach to assessing the quality of relational databases. The approach combines two separate audits: an audit of the database structure as described in the schema and an audit of the database content at a given point in time. The audit of the database schema checks for design weaknesses, data rule violations and deviations from the original data model. It also measures the size, complexity and structural quality of the database. The audit of the database content compares the state of selected data attributes to identify incorrect data and checks for missing and redundant records. The purpose is to initiate a data clean-up process to ensure or restore the quality of the data.
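A small sketch of the content-audit side only, run against an assumed in-memory customer table; the schema, the duplicate-key check and the missing-value check are illustrative of such audits, not the tool described in the paper.

import sqlite3

# Illustrative content audit: redundant records share a business key,
# incorrect or missing data shows up as NULL or empty attribute values.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (customer_no TEXT, name TEXT, email TEXT);
    INSERT INTO customer VALUES ('C1', 'Ann', 'ann@example.com'),
                                ('C1', 'Ann', 'ann@example.com'),   -- redundant record
                                ('C2', 'Bo',  '');                  -- missing email
""")

duplicates = conn.execute("""
    SELECT customer_no, COUNT(*) FROM customer
    GROUP BY customer_no HAVING COUNT(*) > 1
""").fetchall()

missing_email = conn.execute("""
    SELECT COUNT(*) FROM customer WHERE email IS NULL OR TRIM(email) = ''
""").fetchone()[0]

print(f"redundant keys: {len(duplicates)}, rows missing email: {missing_email}")
conn.close()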
2005
The exploration of data to extract information or knowledge to support decision making is a critical success factor for an organization in today's society. However, several problems can affect data quality. These problems have a negative effect on the results extracted from the data, affecting their usefulness and correctness. In this context, it is quite important to know and understand the data problems. This paper presents a taxonomy of data quality problems, organizing them by granularity levels of occurrence. A formal definition is presented for each problem included. The taxonomy provides rigorous definitions, which carry more information than the textual definitions used in previous works. These definitions are useful for the development of a data quality tool that automatically detects the identified problems.
19º Simposio Brasileiro …, 2004
To satisfy complex user requirements, information systems need to integrate data from several, possibly autonomous, data sources. One challenge in such an environment is to provide the user with data that meets their quality requirements. These requirements are difficult to satisfy because of the strong heterogeneity of the sources. In this paper we address the problem of data quality evaluation in data integration systems. We present a framework which is a first attempt to formalize the evaluation of data quality. It is based on a graph model of the data integration system, which allows us to define evaluation methods and demonstrate propositions in terms of graph properties. To illustrate our approach, we also present a first experiment with the data freshness quality factor and show how the framework is used to evaluate this factor under different scenarios.
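As a rough illustration of how a freshness factor might be evaluated for an integrated result, the sketch below assumes the freshness of the result is bounded by the stalest contributing source; the source names, timestamps and this simple rule are assumptions, not the framework's actual graph-based evaluation.

from datetime import datetime, timezone

# Hypothetical extraction timestamps of the sources feeding one integrated query result.
source_extracted = {
    "crm": datetime(2024, 5, 1, 8, 0, tzinfo=timezone.utc),
    "erp": datetime(2024, 5, 1, 6, 30, tzinfo=timezone.utc),
    "web": datetime(2024, 4, 30, 23, 0, tzinfo=timezone.utc),
}

def result_age_hours(sources, now):
    """Age of the integrated result, driven by the stalest source it depends on."""
    oldest = min(sources.values())
    return (now - oldest).total_seconds() / 3600.0

now = datetime(2024, 5, 1, 12, 0, tzinfo=timezone.utc)
print(f"integrated result is {result_age_hours(source_extracted, now):.1f} hours old")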
Journal of Data and Information Quality
In today's society the exploration of one or more databases to extract information or knowledge to support management is a critical success factor for an organization. However, it is well known that several problems can affect data quality. These problems have a negative effect on the results extracted from data, affecting their correctness and validity. In this context, it is quite important to understand these data problems both in theory and in practice. This paper presents a taxonomy of data quality problems, derived from real-world databases. The taxonomy organizes the problems at different levels of abstraction. Methods to detect data quality problems, represented as binary trees, are also proposed for each abstraction level. The paper also compares this taxonomy with others already proposed in the literature.
The notion of data quality cannot be separated from the context in which the data is produced or used. Recently, a conceptual framework for capturing context-dependent data quality assessment has been proposed. According to it, a database D is assessed with respect to a context, which is modeled as an external system containing additional data, metadata, and definitions of quality predicates. The instance D is "put in context" via schema mappings; after contextual processing of the data, a collection of alternative clean versions D′ of D is produced. The quality of D is measured in terms of its distance to this class. In this work we extend contexts for data quality assessment by including multidimensional data, which makes it possible to analyze data from multiple perspectives and at different degrees of granularity. It is possible to navigate through dimensional hierarchies in order to reach the data that is needed for quality assessment. More precisely, we introduce contextual hierarchies as components of contexts for data quality assessment. The resulting contexts are then represented as ontologies written in description logic.
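A hedged sketch of a distance-based quality measure in this spirit: the quality of an instance D is taken as its closeness to the nearest clean version D′ produced by contextual processing, here using the symmetric difference of tuple sets as the distance; the paper's actual distance function and normalisation may differ.

def distance_based_quality(d, clean_versions):
    """Quality of instance d as closeness to the nearest contextually clean version,
    using symmetric difference as a purely illustrative distance."""
    if not d or not clean_versions:
        return 0.0
    nearest = min(len(d ^ c) for c in clean_versions)
    return max(0.0, 1.0 - nearest / len(d))

# d's tuples, compared with two alternative clean versions produced by the context.
d = {("p1", "in_er"), ("p2", "in_er"), ("p3", "in_er")}
clean = [{("p1", "in_er"), ("p2", "in_er")}, {("p1", "in_er"), ("p3", "in_icu")}]
print(round(distance_based_quality(d, clean), 2))   # 0.67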
2021
The effective functioning of data-intensive applications usually requires that the underlying datasets be of high quality. The required quality depends on the task the data will be used for. However, it is possible to identify task-independent data quality dimensions which relate solely to the data themselves and can be extracted with the help of rule mining/pattern mining. In order to assess and improve data quality, we propose an ontological approach to report triples that violate data quality rules. Our goal is to provide data stakeholders with a set of methods and techniques to guide them in assessing and improving data quality.
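A minimal sketch of reporting quality-violating triples, assuming RDF data handled with rdflib and a hypothetical completeness rule that every ex:Person carries exactly one ex:email; the vocabulary and rule are illustrative, not the authors' ontology or method.

from rdflib import Graph, Namespace
from rdflib.namespace import RDF

EX = Namespace("http://example.org/")

# Small inline dataset; the vocabulary and the rule below are illustrative assumptions.
ttl = """
@prefix ex: <http://example.org/> .
ex:alice a ex:Person ; ex:email "alice@example.org" .
ex:bob   a ex:Person .
"""
g = Graph()
g.parse(data=ttl, format="turtle")

# Hypothetical completeness rule: every ex:Person should carry exactly one ex:email.
for person in g.subjects(RDF.type, EX.Person):
    emails = list(g.objects(person, EX.email))
    if len(emails) != 1:
        print(f"violation: {person} ex:email (found {len(emails)} values)")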
As the difficulties of data handling increase, the need to improve data quality arises in order to reduce its insidious effects on performance. To analyse the cause of the problem, it is best to begin a data quality improvement plan by assessing all the scenarios previously affected, and this assessment must include values for measuring the quality of data. Data quality should be measured in order to evaluate the importance of the information and how it can be improved. The primary question, however, is what should be measured and how, since, as the saying attributed to Peter Drucker goes, "if you can't measure it, you can't manage it". The first step is to focus on the data elements that are considered critical based on the needs of the business users. Data quality can then be measured and improved using a metrics methodology. This paper shows how data quality can be quantified for selected dimensions. First, several requirements for defining a measurement metric are stated. Then the analysis of metrics is discussed with respect to a company's requirements for improving data quality. On the basis of existing approaches, new metrics for the dimensions completeness and timeliness that meet the defined requirements are then derived. Lastly, the derived metric for timeliness is evaluated in a case study.
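A hedged sketch of metrics in the spirit described: a ratio-based completeness metric over critical attributes and an exponential-decay timeliness metric, a form often used in the DQ metrics literature; the critical fields, decline rate and sample values are illustrative assumptions, not the paper's derivation.

import math

def completeness(records, critical_fields):
    """Share of critical attribute values that are actually populated."""
    total = len(records) * len(critical_fields)
    filled = sum(1 for r in records for f in critical_fields if r.get(f) not in (None, ""))
    return filled / total if total else 1.0

def timeliness(age_in_years, decline_rate):
    """Exponential-decay timeliness: probability the value is still up to date,
    assuming the attribute becomes outdated at a constant rate per year."""
    return math.exp(-decline_rate * age_in_years)

records = [{"name": "Ann", "email": "ann@example.com"}, {"name": "Bo", "email": ""}]
print(completeness(records, ["name", "email"]))                    # 0.75
print(round(timeliness(age_in_years=2.0, decline_rate=0.2), 3))    # ~0.670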
Nowadays, activities and decision making in an organization are based on data and on information obtained from data analysis, which supports the construction of reliable and accurate processes. As data are a significant resource in every organization, the quality of data is critical for managers and operating processes in identifying related performance issues. Moreover, high-quality data can increase the opportunity to achieve top services in an organization. However, identifying the various aspects of data quality, from definitions and dimensions to types, strategies and techniques, is essential in order to equip methods and processes for improving data. This paper presents a systematic review of data quality dimensions for use in a proposed framework that combines data mining and statistical techniques to measure dependencies among dimensions and illustrates how the extracted knowledge can increase process quality.
International Journal of …, 2011
We present a Heterogeneous Data Quality Methodology (HDQM) for Data Quality (DQ) assessment and improvement that considers all types of data managed in an organization, namely structured data represented in databases, semistructured data usually represented in XML, and unstructured data represented in documents. We also define a meta-model in order to describe the relevant knowledge managed in the methodology. The different types of data are translated into a common conceptual representation. We consider two dimensions widely analyzed in the specialist literature and used in practice: Accuracy and Currency. The methodology provides stakeholders involved in DQ management with a complete set of phases for data quality assessment and improvement. A non-trivial case study from the business domain is used to illustrate and validate the methodology.
Future Computing and Informatics Journal, 2021
Achieving a high level of data quality is considered one of the most important assets for any small, medium or large organization. Data quality is a central concern for both practitioners and researchers who deal with traditional or big data. The level of data quality is measured through several quality dimensions. A high percentage of current studies focuses on assessing and applying data quality to traditional data. As we are in the era of big data, attention should be paid to the tremendous volume of generated and processed data, 80% of which is unstructured. However, the initiatives for creating big data quality evaluation models are still under development. This paper investigates the data quality dimensions that are most used for both traditional and big data in order to identify the metrics and techniques used to measure and handle each dimension. A complete definition for each traditional and big data quality dimension, metrics and handling t...
JUCS - Journal of Universal Computer Science, 2020
The paper proposes a new data object-driven approach to data quality evaluation. It consists of three main components: (1) a data object, (2) data quality requirements, and (3) a data quality evaluation process. As data quality is relative in nature, the data object and quality requirements are (a) use-case dependent and (b) defined by the user in accordance with their needs. All three components of the presented data quality model are described using graphical Domain Specific Languages (DSLs). In accordance with Model-Driven Architecture (MDA), the data quality model is built in two steps: (1) creating a platform-independent model (PIM), and (2) converting the created PIM into a platform-specific model (PSM). The PIM comprises informal specifications of data quality. The PSM describes the implementation of a data quality model, thus making it executable, enabling data object scanning and detecting data quality defects and anomalies. The proposed approach was applied to open data sets, ...
Faculty of Science and Technology School of Information Technology, 2010
The assessment of data quality is a key success factor for organisational performance. It helps managers and executives clearly identify and reveal defective data in their information systems, and consequently minimises or eliminates the risks associated with decisions based on poor data. Despite the importance of data quality assessment, limited research has been conducted on providing an objective data quality assessment. Researchers and practitioners usually rely on an error ratio metric to quantify abnormal data. However, this approach is insufficient for providing a complete quality assessment, since errors can be distributed across databases both randomly and systematically. This study introduces a decision rule method for providing a comprehensive quality assessment, which captures and allocates quality change at an early stage in organisational information systems. A decision rule can also be extended to answer important questions such as the degree of randomness and the probability distribution of errors. These advantages will significantly reduce the time and costs associated with performing quality assessment tasks. More importantly, the efficiency and effectiveness of the decision rule for assessing data quality enables management to make accurate decisions, reflecting positively on organisational values.
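The error ratio metric that the study contrasts itself with can be sketched as below; the decision rule extension (randomness degree, probability distribution of errors) is the study's contribution and is not reproduced here.

def error_ratio(total_records, erroneous_records):
    """Simple error ratio: share of records found to be defective."""
    return erroneous_records / total_records if total_records else 0.0

# Example: 1,250 defective rows in a 50,000-row table.
print(error_ratio(50_000, 1_250))   # 0.025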
2015
Data quality (DQ) has been studied in significant depth over the last two decades and has received attention from both the academic and the practitioner community. Over that period a large number of data quality dimensions have been identified in the course of research and practice. While it is important to embrace the diversity of views of data quality, it is equally important for the data quality research and practitioner community to be united in a consistent interpretation of this foundational concept. In this paper, we provide a step towards this consistent interpretation. Through a systematic review of research and practitioner literature, we identify previously published data quality dimensions and embark on the analysis and consolidation of the overlapping and inconsistent definitions. We stipulate that the shared understanding facilitated by this consolidation is a necessary prelude to generic and declarative forms of requirements modeling for data quality.
International Journal of Business Information Systems, 2016
Data quality has significance to companies, but is an issue that can be challenging to approach and operationalise. This study focuses on data quality from the perspective of operationalisation by analysing the practices of a company that is a world leader in its business. A model is proposed for managing data quality to enable evaluation and operationalisation. The results indicate that data quality is best ensured when organisation specific aspects are taken into account. The model acknowledges the needs of different data domains, particularly those that have master data characteristics. The proposed model can provide a starting point for operationalising data quality assessment and improvement. The consequent appreciation of data quality improves data maintenance processes, IT solutions, data quality and relevant expertise, all of which form the basis for handling the origins of products.
Integrity, Internal Control and Security in Information Systems, 2002
This paper first examines various issues in data quality and provides an overview of current research in the area. Then it focuses on research at the MITRE Corporation on using annotations to manage data quality. Next, some of the emerging directions in data quality, including managing quality for the semantic web and the relationship between data quality and data mining, are discussed. Finally, some future directions for data quality are outlined.
Techniques for assessing data quality along different dimensions have been discussed in the data quality management (DQM) literature. In recent years, researchers and practitioners have underscored the importance of contextual quality assessment, highlighting its contribution to decision-making. The current data quality measurement methods, however, are often derived from impartial data and system characteristics, disconnected from the business and decision-making context. This paper suggests that, with the increased attention to contextual aspects, there is a need to revise current data quality measurement methods and consider alternatives that better reflect contextual evaluation. As a step in this direction, this study develops content-based measurement methods for commonly used quality dimensions: completeness, validity, accuracy, and currency. The measurements are based on Intrinsic Value, a conceptual measure of the business value associated with the evaluated data. Intrinsic value is used as a scaling factor that allows aggregation of quality measurements from the single data item up to higher-level data collections. The proposed value-based quality measurement models are illustrated with a few examples, and their implications for data management research and practice are discussed.
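A hedged sketch of value-based aggregation in the spirit described: item-level quality scores are scaled by a hypothetical intrinsic-value weight and rolled up into a collection-level measure; the weights and scores below are illustrative, not the paper's measurement models.

def value_weighted_quality(items):
    """Aggregate item-level quality scores, weighting each item by its intrinsic value."""
    total_value = sum(value for value, _ in items)
    if not total_value:
        return 0.0
    return sum(value * quality for value, quality in items) / total_value

# (intrinsic value, quality score in [0, 1]) per data item -- illustrative numbers.
items = [(100.0, 1.0), (50.0, 0.8), (10.0, 0.0)]
print(round(value_weighted_quality(items), 3))   # 0.875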