Techniques for assessing data quality along different dimensions have been discussed in the data quality management (DQM) literature. In recent years, researchers and practitioners have underscored the importance of contextual quality assessment, highlighting its contribution to decision-making. The current data quality measurement methods, however, are often derived from impartial data and system characteristics, disconnected from the business and decision-making context. This paper suggests that with the increased attention to the contextual aspects, there is a need to revise current data quality measurement methods and consider alternatives that better reflect contextual evaluation. As a step in this direction, this study develops content-based measurement methods for commonly used quality dimensions: completeness, validity, accuracy, and currency. The measurements are based on Intrinsic Value, a conceptual measure of the business value associated with the evaluated data. Intrinsic value is used as a scaling factor that allows aggregation of quality measurements from the single data item to higher-level data collections. The proposed value-based quality measurement models are illustrated with a few examples, and their implications for data management research and practice are discussed.
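To make the aggregation idea concrete, the following is a minimal sketch of value-weighted aggregation of item-level quality scores into a collection-level score; the field names, the [0, 1] score range, and the weighting scheme are illustrative assumptions, not the paper's exact formulation.

```python
# Hypothetical sketch: roll up item-level quality scores into a collection-level
# score, weighting each item by its intrinsic (business) value.

def value_weighted_quality(items):
    """items: list of dicts with 'quality' in [0, 1] and 'value' >= 0 (assumed fields)."""
    total_value = sum(it["value"] for it in items)
    if total_value == 0:
        return 0.0
    return sum(it["quality"] * it["value"] for it in items) / total_value

records = [
    {"quality": 1.0, "value": 500.0},   # high-value record, fully correct
    {"quality": 0.5, "value": 20.0},    # low-value record, partially flawed
]
print(value_weighted_quality(records))  # ~0.98: dominated by the high-value data
```

Under this kind of scheme, defects in high-value data pull the aggregate score down far more than defects in low-value data, which is the contextual behaviour the abstract argues for.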
Current and Future Trends, 2009
Maintaining data at a high quality is critical to organizational success. Firms, aware of the consequences of poor data quality, have adopted methodologies and policies for measuring, monitoring, and improving it (Redman, 1996; Eckerson, 2002). Today's quality measurements are typically driven by physical characteristics of the data (e.g., item counts, time tags, or failure rates) and assume an objective quality standard, disregarding the context in which the data is used. The alternative is to derive quality metrics from data content and evaluate them within specific usage contexts. The former approach is termed structure-based (or structural), and the latter, content-based (Ballou and Pazer, 2003). In this chapter we propose a novel framework to assess data quality within specific usage contexts and link it to data utility (or utility of data) - a measure of the value contribution associated with data within specific usage contexts. Our utility-driven framework addresses the li...
2015
Data quality (DQ) has been studied in significant depth over the last two decades and has received attention from both the academic and the practitioner community. Over that period of time a large number of data quality dimensions have been identified over the course of research and practice. While it is important to embrace the diversity of views of data quality, it is equally important for the data quality research and practitioner community to be united in the consistent interpretation of this foundational concept. In this paper, we provide a step towards this consistent interpretation. Through a systematic review of research and practitioner literature, we identify previously published data quality dimensions and embark on the analysis and consolidation of the overlapping and inconsistent definitions. We stipulate that the shared understanding facilitated by this consolidation is a necessary prelude to generic and declarative forms of requirements modeling for data quality.
Nowadays, activities and decision-making in an organization are based on data and information obtained from data analysis, which supports the construction of reliable and accurate processes. As data are significant resources in all organizations, the quality of data is critical for managers and operating processes to identify related performance issues. Moreover, high-quality data can increase the opportunity to achieve top services in an organization. However, identifying the various aspects of data quality, from definitions and dimensions to types, strategies, and techniques, is essential for equipping methods and processes to improve data. This paper presents a systematic review of data quality dimensions for use in a proposed framework that combines data mining and statistical techniques to measure dependencies among dimensions and illustrates how the extracted knowledge can increase process quality.
Data and information obtained from data analysis are an essential asset for constructing and supporting information systems. As data is a significant resource, its quality is critical to the effectiveness of business processes. Relationships among the four major data quality dimensions for process improvement are often neglected. For this reason, this study proposes a reliable framework to support process activities in information systems. The study focuses on four critical quality dimensions: accuracy, completeness, consistency, and timeliness. A survey approach was conducted using a questionnaire, and the responses were assessed to measure the reliability and validity of the survey. Factor analysis and Cronbach's alpha were applied to interpret the results. The results show that the items of each data quality dimension and improvement process are reliable and valid. This framework can be used to evaluate data quality in an information system to improve the involved processes. The survey items and their factor loadings were:
Accuracy: AQ1 "This information is correct." (0.939); AQ2 "This information is incorrect." (R) (0.872); AQ3 "This information is accurate." (0.914); AQ4 "This information is reliable." (0.797)
Completeness: ComQ1 "This information includes all necessary values." (0.888); ComQ2 "This information is incomplete." (R) (0.858); ComQ3 "This information is complete." (0.844)
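As an illustration of the reliability test mentioned above, here is a small sketch that computes Cronbach's alpha from raw questionnaire responses; the example data and the 1-5 Likert scale are assumptions for illustration, not the study's dataset.

```python
import numpy as np

def cronbach_alpha(scores):
    """Cronbach's alpha: scores is 2-D, rows = respondents, columns = items."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]
    item_vars = scores.var(axis=0, ddof=1).sum()   # sum of per-item variances
    total_var = scores.sum(axis=1).var(ddof=1)     # variance of respondent totals
    return (k / (k - 1)) * (1 - item_vars / total_var)

# Hypothetical example: 5 respondents rating 3 accuracy items on a 1-5 scale.
responses = [[5, 4, 5], [4, 4, 4], [3, 2, 3], [5, 5, 4], [2, 2, 2]]
print(round(cronbach_alpha(responses), 3))  # values near 0.8-0.9 indicate good reliability
```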
Various techniques have been proposed to enable organisations to assess the current quality level of their data. Unfortunately, organisations have many different requirements related to data quality (DQ) assessment because of domain and context differences. Due to the gamut of possible requirements, organisations may be forced to select an assessment technique which may not be wholly suitable for their requirements. Therefore, we propose and evaluate the Hybrid Approach to assessing DQ, which demonstrates that it is possible to develop new techniques for assessing DQ, suitable for any set of requirements, while leveraging the best practices proposed by existing assessment techniques (ATs).
Future Computing and Informatics Journal, 2021
Achieving a high level of data quality is considered one of the most important assets for any small, medium or large organization. Data quality is a major focus for both practitioners and researchers who deal with traditional or big data. The level of data quality is measured through several quality dimensions. A high percentage of current studies focuses on assessing and applying data quality to traditional data. As we are in the era of big data, attention should be paid to the tremendous volume of generated and processed data, of which 80% is unstructured. However, the initiatives for creating big data quality evaluation models are still under development. This paper investigates the data quality dimensions that are most commonly used in both traditional and big data to identify the metrics and techniques that are used to measure and handle each dimension. A complete definition for each traditional and big data quality dimension, metrics and handling t...
International Journal of Business Information Systems, 2016
Data quality has significance to companies, but is an issue that can be challenging to approach and operationalise. This study focuses on data quality from the perspective of operationalisation by analysing the practices of a company that is a world leader in its business. A model is proposed for managing data quality to enable evaluation and operationalisation. The results indicate that data quality is best ensured when organisation specific aspects are taken into account. The model acknowledges the needs of different data domains, particularly those that have master data characteristics. The proposed model can provide a starting point for operationalising data quality assessment and improvement. The consequent appreciation of data quality improves data maintenance processes, IT solutions, data quality and relevant expertise, all of which form the basis for handling the origins of products.
Australasian Database Conference, 2011
Data quality is a cross-disciplinary and often domain-specific problem due to the importance of fitness for use in the definition of data quality metrics. It has been the target of research and development for over four decades by business analysts, solution architects, database experts and statisticians, to name a few. However, the changing landscape of data quality challenges indicates the need for holistic solutions. As a first step towards bridging any gaps between the various research communities, we undertook a comprehensive literature study of data quality research published in the last two decades. In this study we considered a broad range of Information System (IS) and Computer Science (CS) publication (conference and journal) outlets. The main aims of the study were to understand the current landscape of data quality research, to create better awareness of (lack of) synergies between various research communities, and, subsequently, to direct attention towards holistic solutions. In this paper, we present a summary of the findings from the study, which include a taxonomy of data quality problems, identification of the top themes, outlets and main trends in data quality research, as well as a detailed thematic analysis that outlines the overlaps and distinctions between the focus of IS and CS publications.
2005
The value of management decisions, the security of our nation, and the very foundations of our business integrity are all dependent on the quality of data and information. However, the quality of the data and information is dependent on how that data or information will be used. This paper proposes a theory of data quality based on the five principles defined by J. M. Juran for product and service quality and extends Wang et al.'s 1995 framework for data quality research. It then examines the data and information quality literature from journals within the context of this framework.
Due to the increase in the predicaments of data handling, the need to improve the quality of data arises in order to reduce its insidious effects on performance. Once the cause of the hindrance is analyzed, it is best to commence a data quality improvement plan by assessing all previously affected scenarios and establishing values for measuring the quality of the data. The quality of data should be measured in order to evaluate the importance of the information and how it can be improved. Nevertheless, the primary matter is to understand what should be measured and how, since, as Peter Drucker put it, "if you can't measure it, you can't manage it." The foremost step here is to focus on the elements of the data that are considered critical based on the needs of the business user. The quality of data can be measured and improved using a metrics methodology. This paper shows how data quality can be quantified for selected dimensions. First, several requirements for defining a measurement metric are stated. Then, the analysis of metrics is discussed with respect to the requirements of a company seeking to improve data quality. After that, on the basis of available approaches, new metrics for the dimensions completeness and timeliness that meet the defined requirements are derived. Lastly, the derived metric for timeliness is evaluated in a case study.
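A rough sketch of how such metrics are often operationalised in the literature: completeness as the share of non-missing values, and timeliness as an exponential decay over the attribute's age. The decline rate, the missing-value markers, and the example data are assumptions for illustration, not the metrics derived in this work.

```python
import math
from datetime import datetime, timezone

def completeness(values, missing=(None, "", "N/A")):
    """Fraction of values that are not flagged as missing (markers are assumed)."""
    return sum(v not in missing for v in values) / len(values)

def timeliness(last_updated, decline_rate_per_year):
    """Exponential-decay timeliness in (0, 1]; decline rate is an assumed parameter."""
    age_years = (datetime.now(timezone.utc) - last_updated).days / 365.25
    return math.exp(-decline_rate_per_year * age_years)

addresses = ["Main St 1", "", "Oak Ave 7", None]
print(completeness(addresses))                                        # 0.5
print(timeliness(datetime(2020, 1, 1, tzinfo=timezone.utc), 0.2))     # decays with age
```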
IEEE Transactions on Knowledge and Data Engineering, 1995
Organizational databases are pervaded with data of poor quality. However, there has not been an analysis of the data quality literature that provides an overall understanding of state-of-the-art research in this area. Using an analogy between product manufacturing and data manufacturing, this paper develops a framework for analyzing data quality research, and uses it as the basis for organizing the data quality literature. This framework consists of seven elements: management responsibilities, operation and assurance costs, research and development, production, distribution, personnel management, and the legal function. The analysis reveals that most research efforts focus on operation and assurance costs, research and development, and production of data products. Unexplored research topics and unresolved issues are identified and directions for future research are provided.
Proceedings of the 16th International …, 2011
Data quality (DQ) assessment can be significantly enhanced with the use of the right DQ assessment methods, which provide automated solutions to assess DQ. The range of DQ assessment methods is very broad: from data profiling and semantic profiling to data matching and data validation. This paper gives an overview of current methods for DQ assessment and classifies the DQ assessment methods into an existing taxonomy of DQ problems. Specific examples of the placement of each DQ method in the taxonomy are provided and illustrate why the method is relevant to the particular taxonomy position. The gaps in the taxonomy, where no current DQ methods exist, show where new methods are required and can guide future research and DQ tool development.
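For illustration, here is a small sketch of two of the method families named above, data profiling and data validation, applied to a hypothetical email column; the profiling statistics and the validation rule are simplified assumptions rather than any specific tool's behaviour.

```python
import re

def profile_column(values):
    """Basic data profiling: null rate and distinct count for one column."""
    non_null = [v for v in values if v is not None]
    return {
        "null_rate": 1 - len(non_null) / len(values),
        "distinct": len(set(non_null)),
    }

def validate_emails(values):
    """Rule-based validation: return values that fail a simple email pattern."""
    pattern = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
    return [v for v in values if v is not None and not pattern.match(v)]

emails = ["a@example.com", None, "not-an-email", "b@example.org"]
print(profile_column(emails))   # {'null_rate': 0.25, 'distinct': 3}
print(validate_emails(emails))  # ['not-an-email']
```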
Handbook of Data Quality, 2013
This handbook is motivated by the presence of diverse communities within the area of data quality management, which have individually contributed a wealth of knowledge on data quality research and practice. The chapter presents a snapshot of these contributions from both research and practice, and highlights the background and rationale for the handbook.
Decision Support Systems, 2006
In the complex decision environments that characterize e-business settings, it is important to permit decision-makers to proactively manage data quality. In this paper we propose a decision-support framework that permits decision-makers to gauge quality both in an objective (context-independent) and in a context-dependent manner. The framework is based on the information product approach and uses the Information Product Map (IPMAP). We illustrate its application in evaluating data quality using completeness, a data quality dimension that is acknowledged as important. A decision-support tool (IPView) for managing data quality that incorporates the proposed framework is also described.
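A minimal sketch of the objective versus context-dependent completeness distinction described above; the customer record, attribute sets, and tasks are hypothetical and are not taken from the IPMAP/IPView framework itself.

```python
def completeness(record, attributes):
    """Share of the listed attributes that are populated in the record."""
    return sum(record.get(a) is not None for a in attributes) / len(attributes)

customer = {"name": "Ada", "email": None, "phone": "555-0100", "birthdate": None}

all_attrs = ["name", "email", "phone", "birthdate"]   # objective, context-independent view
phone_survey_attrs = ["name", "phone"]                # one usage context
mail_campaign_attrs = ["name", "email"]               # another usage context

print(completeness(customer, all_attrs))              # 0.5
print(completeness(customer, phone_survey_attrs))     # 1.0: fit for this task
print(completeness(customer, mail_campaign_attrs))    # 0.5: not fit for this task
```

The same record can thus be complete enough for one decision task and deficient for another, which is why the context-dependent view matters.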
International Journal of Advanced Computer Science and Applications
Data-related expertise is a central and determining factor in the success of many organizations. Big Tech companies have developed an operational environment that extracts benefit from collected data to increase the efficiency and effectiveness of daily operations and services offered. However, in a complex economic environment, with transparent accounting and financial management, it is not possible to solve data quality issues with "dollars" without prior justification and measurable indicators. The overall goal is not to improve data quality by any means, but to plan cost-effective data quality projects that benefit the organization. This knowledge is particularly relevant for organizations with little or no experience in the field of data quality assessment and improvement. Indeed, it is important that the costs and benefits associated with data quality are explicit and, above all, quantifiable for both business managers and IT analysts. Organizations must also evaluate the different scenarios related to the implementation of data quality projects. The optimal scenario must provide the best financial and business value and meet the specifications in terms of time, resources and cost. The approach presented in this paper is an evaluation-oriented approach. For data quality projects, it evaluates the positive impact on the organization's financial and business objectives, which can be linked to the positive value of quality improvement, and the implementation complexity, which can be coupled with the costs of quality improvement. This paper also attempts to translate implementation complexity empirically into costs expressed in monetary terms.
Journal of Applied Sciences, 2013
Improving data quality is a basic step for all companies and organizations, as it increases the opportunity to achieve top services. The aim of this study was to validate and adapt instruments for the four major data quality dimensions in different information systems. The four important quality dimensions used in this study were accuracy, completeness, consistency and timeliness. A questionnaire was developed, validated and used for collecting data from users of different information systems, and was administered to 50 respondents who use different information systems. Inferential statistics and descriptive analysis were employed to measure and validate the factors contributing to the quality improvement process. This study was compared with related parts of previous studies and showed that the instrument is valid for measuring quality dimensions and the improvement process. Content validity, reliability and factor analysis were applied to 24 items to compute the results. The results showed that the instrument is reliable and valid. The results also suggest that the instrument can be used as a basic foundation for organization managers to design data quality improvement processes.
Proceedings of the 2017 Federated Conference on Computer Science and Information Systems, 2017
The research discusses how to describe data quality and what should be taken into account when developing a universal data quality management solution. The proposed approach is to create quality specifications for each kind of data object and to make them executable. The specification can be executed step by step according to business process descriptions, ensuring the gradual accumulation of data in the database and data quality checking according to the specific use case. The described approach can be applied to check the completeness, accuracy, timeliness and consistency of accumulated data.
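One way such executable specifications might be sketched is a named rule list per kind of data object that can be run at a given process step; the object kind, attributes, and rules below are illustrative assumptions, not the authors' specification language.

```python
# Hypothetical sketch of executable quality specifications, keyed by data object kind.
SPEC = {
    "customer": [
        ("completeness: email present", lambda r: bool(r.get("email"))),
        ("accuracy: age in plausible range", lambda r: 0 < r.get("age", -1) < 120),
    ],
}

def check(kind, record):
    """Run every rule for the given object kind; return the names of failed checks."""
    return [name for name, rule in SPEC[kind] if not rule(record)]

print(check("customer", {"email": "x@example.com", "age": 200}))
# ['accuracy: age in plausible range']
```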
The notion of data quality cannot be separated from the context in which the data is produced or used. Recently, a conceptual framework for capturing context-dependent data quality assessment has been proposed. According to it, a database D is assessed with respect to a context, which is modeled as an external system containing additional data, metadata, and definitions of quality predicates. The instance D is "put in context" via schema mappings; and after contextual processing of the data, a collection of alternative clean versions D′ of D is produced. The quality of D is measured in terms of its distance to this class. In this work we extend contexts for data quality assessment by including multidimensional data, which makes it possible to analyze data from multiple perspectives and at different degrees of granularity. It is possible to navigate through dimensional hierarchies in order to reach the data that is needed for quality assessment. More precisely, we introduce contextual hierarchies as components of contexts for data quality assessment. The resulting contexts are later represented as ontologies written in description logic.
Annual Review of Statistics and Its Application
Data, and hence data quality, transcend all boundaries of science, commerce, engineering, medicine, public health, and policy. Data quality has historically been addressed by controlling the measurement processes, controlling the data collection processes, and through data ownership. For many data sources being leveraged into data science, this approach to data quality may be challenged. To understand that challenge, a historical and disciplinary perspective on data quality, highlighting the evolution and convergence of data concepts and applications, is presented.
2013
Data quality (DQ) has been studied in significant depth over the last two decades and has received attention from both the academic and the practitioner community. Over that period of time a large number of data quality dimensions have been identified over the course of research and practice. While it is important to embrace the diversity of views of data quality, it is equally important for the data quality research and practitioner community to be united in the consistent interpretation of this foundational concept. In this paper, we provide a step towards this consistent interpretation by providing a lens to analyse the dimensions towards developing clear and concise metrics to manage DQ. Through a systematic review of research and practitioner literature, we identify previously published data quality dimensions and embark on the analysis and consolidation of the overlapping and inconsistent definitions.