2002, Integrity, Internal Control and Security in Information Systems
This paper first examines various issues in data quality and provides an overview of current research in the area. It then focuses on research at the MITRE Corporation on using annotations to manage data quality. Next, it discusses emerging directions in data quality, including managing quality for the Semantic Web and the relationships between data quality and data mining. Finally, it outlines directions for future work on data quality.
2012 IEEE Sixth International Conference on Semantic Computing, 2012
The increasing size and availability of web data make data quality a core challenge in many applications. Principles of data quality are recognized as essential to ensure that data are fit for their intended use in operations, decision-making, and planning. However, with the rise of the Semantic Web, new data quality issues appear and require deeper consideration. In this paper, we propose to extend data quality principles to the context of the Semantic Web. Based on our extensive industrial experience in data integration, we identify five main classes of data quality principles suited to the Semantic Web. For each class, we list the principles involved at all stages of the data management process. Following these principles will provide a sound basis for better decision-making within organizations and will maximize long-term data integration and interoperability.
2016
Data Quality (DQ) is an important component of any kind of decision process. The subject has been widely studied for several years, but it has not received sufficient attention from the Semantic Web community. In this paper, a DQ taxonomy for RDF data is presented. The proposed taxonomy organizes the problems at different levels of abstraction. We also show SPARQL query templates, which can be instantiated into concrete data quality test queries to detect DQ problems in RDF datasets. The proposed taxonomy includes more types of problems than any of the existing taxonomies for the Semantic Web.
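To make the template idea concrete, here is a minimal sketch in Python with rdflib: a generic template for one common DQ test (instances of a class that lack a required property) is instantiated into a runnable query. The class and property names are invented placeholders, not taken from the paper.

```python
from rdflib import Graph

# Generic template: find subjects of a given class lacking a required property.
# {cls} and {prop} are the slots instantiated per concrete test.
TEMPLATE = """
SELECT ?s WHERE {{
  ?s a <{cls}> .
  FILTER NOT EXISTS {{ ?s <{prop}> ?v . }}
}}
"""

def missing_property_test(graph: Graph, cls: str, prop: str):
    """Instantiate the template into a concrete DQ test query and run it."""
    return [str(row.s) for row in graph.query(TEMPLATE.format(cls=cls, prop=prop))]

g = Graph()
g.parse(data="""
@prefix ex: <http://example.org/> .
ex:alice a ex:Person ; ex:name "Alice" .
ex:bob   a ex:Person .
""", format="turtle")

# ex:bob fails the completeness test: a Person without a name.
print(missing_property_test(g, "http://example.org/Person", "http://example.org/name"))
```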
2021
The effective functioning of data-intensive applications usually requires that the dataset be of high quality. The required quality depends on the task the data will be used for. However, it is possible to identify task-independent data quality dimensions that relate solely to the data themselves and can be extracted with the help of rule mining/pattern mining. To assess and improve data quality, we propose an ontological approach to reporting triples that violate data quality rules. Our goal is to provide data stakeholders with a set of methods and techniques that guide them in assessing and improving data quality.
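As a rough sketch of reporting rule-violating triples, the following Python snippet checks a cardinality rule of the kind that pattern mining might surface (every Book has exactly one isbn); the dataset and the rule are invented for illustration, not drawn from the paper.

```python
from collections import defaultdict

# Triples as (subject, predicate, object); dataset invented for illustration.
triples = [
    ("book1", "type", "Book"), ("book1", "isbn", "978-0-13-468599-1"),
    ("book2", "type", "Book"),                      # missing isbn
    ("book3", "type", "Book"), ("book3", "isbn", "978-1"), ("book3", "isbn", "978-2"),
]

def violations(triples, cls, prop, min_card=1, max_card=1):
    """Report subjects of `cls` whose `prop` cardinality breaks the mined rule."""
    members = {s for s, p, o in triples if p == "type" and o == cls}
    counts = defaultdict(int)
    for s, p, o in triples:
        if p == prop and s in members:
            counts[s] += 1
    return {s: counts[s] for s in members if not min_card <= counts[s] <= max_card}

print(violations(triples, "Book", "isbn"))  # e.g. {'book2': 0, 'book3': 2}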
I examine the relationship between semantics and data quality and discuss suitable methods for data integration. As a rare example of a data quality based semantic technology, I take the ClearCore product from Infoshare Ltd. I guide the reader through the process of using ClearCore and give examples from financial and retail services projects. The term ontology is found useful in describing the process of data integration, which comes to be seen as uncovering the ontology that underlies the data. The stages of validation and matching involve the specification of different kinds of business rules, constraining and generative respectively. The software helps to elicit these rules from human analysts and thus behaves as an expert system, accumulating a domain-specific knowledge base.
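A generic illustration of the two rule kinds described here, in Python: a constraining rule validates a record, while a generative rule derives a matching key for linking records. Nothing below reflects ClearCore's actual API; all field names are invented.

```python
# Constraining rule (validation): a predicate every record must satisfy.
def valid_account(rec: dict) -> bool:
    return bool(rec.get("account_id")) and rec.get("balance", 0) >= 0

# Generative rule (matching): derive a candidate key used to link records
# that may refer to the same real-world entity.
def match_key(rec: dict) -> str:
    return f"{rec['surname'].strip().lower()}|{rec['postcode'].replace(' ', '').upper()}"

a = {"account_id": "A1", "balance": 10, "surname": "Smith ", "postcode": "sw1a 1aa"}
b = {"account_id": "A2", "balance": 5,  "surname": "smith",  "postcode": "SW1A1AA"}
assert valid_account(a) and valid_account(b)
assert match_key(a) == match_key(b)  # the two records become match candidates
```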
2005
Technological advances and the internet have favoured the appearance of a great diversity of web applications through which organizations conduct their business in an increasingly competitive environment. A decisive factor for this competitiveness is assuring the data quality of the web applications used. In recent years, several research works on data quality have been carried out. In this paper, we present a systematic review of that research.
2007
I would like to thank Tiziana Catarci, Helena Galhardas, and Mokrane Bouzeghoub for kindly agreeing to serve as readers and reviewers for this dissertation of "Habilitation à Diriger des Recherches" and to participate in the jury.
Proceedings of the 16th International …, 2011
Data quality (DQ) assessment can be significantly enhanced with the use of the right DQ assessment methods, which provide automated solutions to assess DQ. The range of DQ assessment methods is very broad: from data profiling and semantic profiling to data matching and data validation. This paper gives an overview of current methods for DQ assessment and classifies the DQ assessment methods into an existing taxonomy of DQ problems. Specific examples of the placement of each DQ method in the taxonomy are provided and illustrate why the method is relevant to the particular taxonomy position. The gaps in the taxonomy, where no current DQ methods exist, show where new methods are required and can guide future research and DQ tool development.
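To ground one of the method families listed (data profiling), here is a minimal Python sketch that computes a few standard profiling statistics for one column; the column name and records are invented for illustration.

```python
def profile(rows, column):
    """Basic data profiling: completeness, distinctness, and max value length."""
    values = [r.get(column) for r in rows]
    present = [v for v in values if v not in (None, "")]
    return {
        "completeness": len(present) / len(values) if values else 0.0,
        "distinct_ratio": len(set(present)) / len(present) if present else 0.0,
        "max_length": max((len(str(v)) for v in present), default=0),
    }

rows = [{"email": "a@x.org"}, {"email": ""}, {"email": "a@x.org"}]
print(profile(rows, "email"))
# {'completeness': 0.666..., 'distinct_ratio': 0.5, 'max_length': 7}
```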
Nowadays, activities and decision making in an organization are based on data and on information obtained from data analysis, which provides various services for constructing reliable and accurate processes. As data are a significant resource in all organizations, the quality of data is critical for managers and operating processes to identify related performance issues. Moreover, high-quality data can increase the opportunity for achieving top services in an organization. However, identifying the various aspects of data quality, from definitions, dimensions, and types to strategies and techniques, is essential to equip methods and processes for improving data. This paper presents a systematic review of data quality dimensions for use in a proposed framework that combines data mining and statistical techniques to measure dependencies among dimensions, and illustrates how the extracted knowledge can increase process quality.
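As a minimal sketch of measuring a statistical dependency between two quality dimensions, the snippet below computes a Pearson correlation over per-record dimension scores; the scores are invented, and the paper's actual framework may use different techniques.

```python
import numpy as np

# Per-record scores for two quality dimensions (values invented for illustration).
completeness = np.array([1.0, 0.8, 0.9, 0.4, 0.7])
accuracy     = np.array([0.9, 0.7, 0.9, 0.5, 0.6])

# One simple dependency measure between dimensions: Pearson correlation.
r = np.corrcoef(completeness, accuracy)[0, 1]
print(f"completeness/accuracy correlation: {r:.2f}")
```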
The notion of data quality cannot be separated from the context in which the data is produced or used. Recently, a conceptual framework for capturing context-dependent data quality assessment has been proposed. According to it, a database instance D is assessed with respect to a context, which is modeled as an external system containing additional data, metadata, and definitions of quality predicates. The instance D is "put in context" via schema mappings; after contextual processing of the data, a collection of alternative clean versions D′ of D is produced. The quality of D is measured in terms of its distance to this class. In this work we extend contexts for data quality assessment with multidimensional data, which allows data to be analyzed from multiple perspectives and at different degrees of granularity. It is possible to navigate through dimensional hierarchies to reach the data needed for quality assessment. More precisely, we introduce contextual hierarchies as components of contexts for data quality assessment. The resulting contexts are then represented as ontologies written in description logic.
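A small Python sketch of the distance-based quality measure: instances are modeled as sets of tuples, and the quality of D is its minimal distance to the class of clean versions D′. The symmetric-difference distance is an assumption made here for concreteness; the framework itself leaves the distance function open.

```python
def distance(d, d_clean):
    """Symmetric-difference distance between two instances (sets of tuples)."""
    return len(d ^ d_clean)

def quality(d, clean_versions):
    """Quality of D as its minimal distance to the class of clean versions D'."""
    return min(distance(d, dc) for dc in clean_versions)

D = {("alice", 30), ("bob", -1)}
clean_versions = [{("alice", 30), ("bob", 41)}, {("alice", 30)}]
print(quality(D, clean_versions))  # 1: dropping the bad tuple is the closest repair
```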
Information & Management, 1980
Until recently, data quality was poorly understood and seldom achieved, yet it is essential to the effective use of information systems. This paper discusses the nature and importance of data quality. The role of data quality is placed in the life cycle framework. Many new concepts, tools and techniques from both programming languages and database management systems are presented and related to data quality. In particular, the concept of a database constraint is considered in detail. Some current limitations and research directions are proposed.
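A minimal sketch of a database constraint enforcing a data quality rule, using Python's built-in sqlite3 (the schema is invented for illustration): the CHECK constraint rejects invalid data at insertion time.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# A CHECK constraint encodes a data quality rule directly in the schema.
con.execute("""
CREATE TABLE employee (
    id     INTEGER PRIMARY KEY,
    name   TEXT NOT NULL,
    salary REAL CHECK (salary >= 0)
)""")
con.execute("INSERT INTO employee VALUES (1, 'Ada', 5000)")
try:
    con.execute("INSERT INTO employee VALUES (2, 'Bob', -10)")
except sqlite3.IntegrityError as e:
    print("rejected by constraint:", e)
```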
Lecture Notes in Computer Science, 2003
Web Information Systems (WIS's) [12] are characterized by the presentation to a wide audience of a large amount of data, the quality of which can be very heterogeneous. There are several reasons for this variety, but a significant reason is the conflict between the need of ...
2021
Data is of high quality if it is fit for its intended use in operations, decision-making, and planning. A colossal amount of linked data is available on the web. However, it is difficult to understand how well linked data fits a given modeling task because of the defects present in the data. Faults that emerge in linked data spread far and wide, affecting all the services designed around it. Addressing linked data quality deficiencies requires identifying quality problems, assessing quality, and refining the data to improve its quality. This study aims to identify existing end-to-end frameworks for the assessment and improvement of data quality. One important finding is that most of the work deals with only one aspect rather than a combined approach. Another is that most frameworks aim at solving problems related to DBpedia. Therefore, a standard scalable system is required that integrates the identification of quality issues, their evaluation, and the improvement of linked data quality. This survey contributes to understanding the state of the art of data quality evaluation and data quality improvement. A solution based on ontology is also proposed to build an end-to-end system that analyzes the root causes of quality violations.
Journal of Data and Information Quality
In today's society, the exploration of one or more databases to extract information or knowledge to support management is a critical success factor for an organization. However, it is well known that several problems can affect data quality. These problems have a negative effect on the results extracted from data, influencing their correctness and validity. In this context, it is quite important to understand these data problems both theoretically and in practice. This paper presents a taxonomy of data quality problems derived from real-world databases. The taxonomy organizes the problems at different levels of abstraction. Methods to detect data quality problems, represented as binary trees, are also proposed for each abstraction level. The paper also compares this taxonomy with others already proposed in the literature.
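To illustrate what a binary-tree-style detection method can look like, here is a Python sketch for one problem family (duplicate records): each internal node is a yes/no test and each leaf is a diagnosis. The specific tests and record layout are invented, not the paper's.

```python
def classify(r1, r2):
    """Binary-tree style detection of duplicate records.
    Internal nodes are yes/no tests; leaves are diagnoses."""
    if r1["key"] == r2["key"]:          # node 1: same natural key?
        if r1 == r2:                    # node 2: all attributes equal?
            return "exact duplicate"
        return "inconsistent duplicate"
    return "distinct records"

a = {"key": "42", "name": "Ann", "city": "Lisbon"}
b = {"key": "42", "name": "Ann", "city": "Porto"}
print(classify(a, b))  # inconsistent duplicate
```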
2016 IEEE Tenth International Conference on Semantic Computing (ICSC), 2016
The Web has meanwhile been complemented by a Web of Data; examples are the Linked Open Data cloud, the RDFa and Microformats data increasingly embedded in ordinary Web pages, and the schema.org initiative. However, the Web of Data shares many characteristics with the original Web of documents, among them varying quality. There is a large variety of data quality dimensions and measures. Hence, assessing quality in terms of fitness for use with respect to a certain use case is challenging. In this article, we present a comprehensive and extensible framework for the automatic assessment of linked data quality, within which we implemented around 30 data quality metrics. A particular focus of our work is on scalability and support for the evolution of data. Regarding scalability, we follow a stream processing approach, which provides an easy interface for the integration of domain-specific quality measures. With regard to the evolution of data, we introduce data quality assessment as a stage of a holistic data life cycle.
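A minimal sketch of the stream processing idea in Python: the metric below (fraction of integer-typed literals that actually parse) is computed in a single pass with constant memory, so it scales to arbitrarily large inputs. The metric and tuple layout are assumptions for illustration, not the framework's actual interface.

```python
def streaming_literal_validity(triples):
    """One-pass (streaming) metric: fraction of xsd:integer literals that parse.
    Only running counters are kept, so memory stays constant with data size."""
    seen = valid = 0
    for s, p, o, dtype in triples:
        if dtype == "xsd:integer":
            seen += 1
            try:
                int(o)
                valid += 1
            except ValueError:
                pass
    return valid / seen if seen else 1.0

stream = [("ex:a", "ex:age", "42",   "xsd:integer"),
          ("ex:b", "ex:age", "old",  "xsd:integer"),
          ("ex:c", "ex:name", "Cleo", "xsd:string")]
print(streaming_literal_validity(stream))  # 0.5
```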
Lecture Notes in Business Information Processing, 2011
We motivate, formalize and investigate the notions of data quality assessment and data quality query answering as context-dependent activities. Contexts for the assessment and usage of a data source at hand are modeled as collections of external databases, which can be materialized or virtual, together with mappings within the collections and with the data source at hand. In this way, the context becomes "the complement" of the data source with respect to a data integration system. The proposed model allows for natural extensions, like considering data quality predicates, and even more expressive ontologies for data quality assessment.
2005
The exploration of data to extract information or knowledge to support decision making is a critical success factor for an organization in today's society. However, several problems can affect data quality. These problems have a negative effect on the results extracted from data, affecting their usefulness and correctness. In this context, it is quite important to know and understand these data problems. This paper presents a taxonomy of data quality problems, organizing them by the granularity level at which they occur. A formal definition is presented for each problem included. The taxonomy provides rigorous definitions, which are richer in information than the textual definitions used in previous works. These definitions are useful for the development of a data quality tool that automatically detects the identified problems.
[Table: classification of data integration systems (DIS, DW & MIS, VMS, CIS, RS, P2P) along the dimensions of autonomy (no/yes) and heterogeneity (semi/total).]
2017
This paper reflects on six years of developing semantic data quality tools and curation systems for both a large-scale social sciences data collection and a major web of data hub. This experience has led the author to believe in using organisational value as a mechanism for automating data quality management to deal with Big Data volumes and variety. However, there are many challenges in developing these automated systems, and this discussion paper sets out a set of challenges with respect to the current state of the art and identifies a number of potential avenues for researchers to tackle them.