2020, JUCS - Journal of Universal Computer Science
The paper proposes a new data object-driven approach to data quality evaluation. It consists of three main components: (1) a data object, (2) data quality requirements, and (3) a data quality evaluation process. As data quality is relative in nature, the data object and quality requirements are (a) use-case dependent and (b) defined by the user in accordance with their needs. All three components of the presented data quality model are described using graphical Domain Specific Languages (DSLs). In accordance with Model-Driven Architecture (MDA), the data quality model is built in two steps: (1) creating a platform-independent model (PIM), and (2) converting the created PIM into a platform-specific model (PSM). The PIM comprises informal specifications of data quality. The PSM describes the implementation of the data quality model, thus making it executable, enabling data object scanning and the detection of data quality defects and anomalies. The proposed approach was applied to open data sets, ...
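The three components can be pictured with a minimal sketch: a data object, use-case-specific quality requirements, and an evaluation process that scans the object and reports defects. This is not the paper's implementation; all names and the record layout are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class DataObject:
    name: str
    records: list[dict[str, Any]]

@dataclass
class QualityRequirement:
    description: str
    check: Callable[[dict[str, Any]], bool]   # True if a record satisfies the requirement

def evaluate(obj: DataObject, requirements: list[QualityRequirement]) -> list[str]:
    """Scan the data object and collect human-readable defect messages."""
    defects = []
    for i, record in enumerate(obj.records):
        for req in requirements:
            if not req.check(record):
                defects.append(f"{obj.name}[{i}]: violates '{req.description}'")
    return defects

# Example use case: a company register where every entry needs a non-empty name.
companies = DataObject("company", [{"name": "Acme"}, {"name": ""}])
reqs = [QualityRequirement("name must be non-empty", lambda r: bool(r.get("name")))]
print(evaluate(companies, reqs))   # -> ["company[1]: violates 'name must be non-empty'"]
```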
Procedia Computer Science, 2017
The paper discusses an external solution for data quality management in information systems. In contrast to traditional data quality assurance methods, the proposed approach uses a domain specific language (DSL) for describing data quality models. Data quality models consist of graphical diagrams whose elements contain requirements for a data object's values and procedures for the data object's analysis. The DSL interpreter makes the data quality model executable, thereby enabling the measurement and improvement of data quality. The described approach can be applied: (1) to check the completeness, accuracy and consistency of accumulated data; (2) to support data migration in cases when the software architecture and/or data models are changed; (3) to gather data from different data sources and transfer them to a data warehouse.
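The core idea of an executable data quality model can be illustrated with a small sketch: a declarative specification (standing in for the graphical DSL diagram) whose elements hold requirements on a data object's fields, plus an interpreter that executes the model against concrete records. The model format and field names below are assumptions, not the paper's actual DSL.

```python
quality_model = {
    "object": "invoice",
    "fields": {
        "number":   {"required": True},                              # completeness
        "amount":   {"required": True, "min": 0},                    # accuracy (value range)
        "currency": {"required": True, "allowed": {"EUR", "USD"}},   # consistency
    },
}

def interpret(model, records):
    """Execute the quality model: return a list of (record_index, message) defects."""
    defects = []
    for i, rec in enumerate(records):
        for field, rules in model["fields"].items():
            value = rec.get(field)
            if rules.get("required") and value in (None, ""):
                defects.append((i, f"{field} is missing"))
                continue
            if "min" in rules and value is not None and value < rules["min"]:
                defects.append((i, f"{field} below minimum {rules['min']}"))
            if "allowed" in rules and value not in rules["allowed"]:
                defects.append((i, f"{field} has unexpected value {value!r}"))
    return defects

invoices = [{"number": "A-1", "amount": 100, "currency": "EUR"},
            {"number": "",    "amount": -5,  "currency": "GBP"}]
print(interpret(quality_model, invoices))
```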
Proceedings of the 2017 Federated Conference on Computer Science and Information Systems, 2017
The research discusses how to describe data quality and what should be taken into account when developing a universal data quality management solution. The proposed approach is to create quality specifications for each kind of data object and to make them executable. A specification can be executed step by step according to business process descriptions, ensuring the gradual accumulation of data in the database and data quality checking according to the specific use case. The described approach can be applied to check the completeness, accuracy, timeliness and consistency of accumulated data.
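A hedged sketch of this step-by-step execution: at each business process step only the attributes expected to be filled in by then are checked, so gradually accumulated data is not flagged too early, and a timeliness rule is evaluated alongside completeness. Step names, fields and the timeliness threshold are illustrative assumptions, not the paper's specification.

```python
from datetime import datetime, timedelta

spec_by_step = {
    "application_received": ["applicant_name", "submitted_at"],
    "application_reviewed": ["applicant_name", "submitted_at", "reviewer", "decision"],
}

def check_step(step, record, max_age_days=30):
    """Check completeness of step-relevant fields and timeliness of the timestamp."""
    problems = [f"missing {f}" for f in spec_by_step[step] if not record.get(f)]
    ts = record.get("submitted_at")
    if ts and datetime.now() - ts > timedelta(days=max_age_days):
        problems.append("submitted_at older than allowed (timeliness)")
    return problems

record = {"applicant_name": "J. Doe", "submitted_at": datetime.now() - timedelta(days=45)}
print(check_step("application_received", record))   # timeliness defect only
print(check_step("application_reviewed", record))   # also missing reviewer and decision
```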
2009 Ninth International Conference on Quality Software, 2009
Inadequate levels of Data Quality (DQ) in Information Systems (IS) pose a very important problem for organizations, which therefore seek to assure data quality from the earliest stages of information system development. This paper proposes incorporating mechanisms into software development methodologies in order to integrate users' DQ requirements and assure data quality from the beginning of development. It provides a framework consisting of well-defined processes, activities and tasks that can be incorporated into an existing software development methodology, such as METRICA V3, and thereby assure the data quality of software products created according to that methodology. The presented extension is a guideline and can be extended and applied to other development methodologies such as the Unified Development Process.
2019
This research is an extension of a data object-driven approach to data quality evaluation that allows data object quality to be analysed in the scope of multiple data objects. The previously presented approach was used to analyse one particular data object, mainly focusing on syntactic analysis. With the extension, the quality of a primary data object can be analysed against an unlimited number of secondary data objects. This allows a more comprehensive, in-depth contextual analysis of the data object. The analysis was applied to open data sets, comparing previously obtained results with the results of applying the extended approach, underlining the importance and benefits of the given extension.
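A minimal sketch of the extension's idea: a primary data object is analysed not only syntactically but contextually, against one or more secondary data objects. The datasets and field names below are illustrative assumptions.

```python
primary = [  # e.g. records from one open data set
    {"company_id": "40003000001", "city": "Riga"},
    {"company_id": "99999999999", "city": "Riga"},
]

secondary_register = {"40003000001", "40103000002"}     # e.g. an official company register
secondary_cities = {"Riga", "Liepaja", "Daugavpils"}    # e.g. an official place-name list

def contextual_check(records, register, cities):
    """Flag primary records whose values are not confirmed by the secondary objects."""
    defects = []
    for i, rec in enumerate(records):
        if rec["company_id"] not in register:
            defects.append((i, "company_id not found in secondary register"))
        if rec["city"] not in cities:
            defects.append((i, "city not found in secondary city list"))
    return defects

print(contextual_check(primary, secondary_register, secondary_cities))
```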
Thirty years ago, software was not considered a concrete value. Everyone agreed on its importance, but it was not regarded as a good or a possession. Nowadays, software is part of the balance sheet of an organization. Data is slowly following the same path. The information owned by an organization is an important part of its assets, and information can be used as a competitive advantage. However, data has long been underestimated by the software community. Methods and techniques usually apply to software (including data schemata), while the data itself has often been treated as an external problem. Validation and verification techniques usually assume that data is provided by an external agent and concentrate only on the software.
International Journal of Business Information Systems, 2016
Data quality has significance to companies, but is an issue that can be challenging to approach and operationalise. This study focuses on data quality from the perspective of operationalisation by analysing the practices of a company that is a world leader in its business. A model is proposed for managing data quality to enable evaluation and operationalisation. The results indicate that data quality is best ensured when organisation specific aspects are taken into account. The model acknowledges the needs of different data domains, particularly those that have master data characteristics. The proposed model can provide a starting point for operationalising data quality assessment and improvement. The consequent appreciation of data quality improves data maintenance processes, IT solutions, data quality and relevant expertise, all of which form the basis for handling the origins of products.
2005
The exploration of data to extract information or knowledge to support decision making is a critical success factor for an organization in today's society. However, several problems can affect data quality. These problems have a negative effect on the results extracted from data, affecting their usefulness and correctness. In this context, it is quite important to know and understand the data problems. This paper presents a taxonomy of data quality problems, organizing them by granularity levels of occurrence. A formal definition is presented for each problem included. The taxonomy provides rigorous definitions, which are richer in information than the textual definitions used in previous works. These definitions are useful for the development of a data quality tool that automatically detects the identified problems.
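To illustrate what a formal, machine-checkable definition of a data quality problem can look like at different granularity levels (attribute value, tuple, relation), here is a hedged sketch. These particular predicates are illustrative and are not the taxonomy's actual definitions.

```python
def missing_value(value) -> bool:
    """Attribute-level problem: the value is absent."""
    return value is None or value == ""

def domain_violation(value, domain: set) -> bool:
    """Attribute-level problem: the value lies outside its declared domain."""
    return value not in domain

def duplicate_tuples(relation: list[tuple]) -> bool:
    """Relation-level problem: the same tuple occurs more than once."""
    return len(relation) != len(set(relation))

rows = [("Ann", "F"), ("Bob", "X"), ("Ann", "F")]
print(missing_value(""), domain_violation("X", {"F", "M"}), duplicate_tuples(rows))
# -> True True True
```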
Proceedings of the 16th International …, 2011
Data quality (DQ) assessment can be significantly enhanced with the use of the right DQ assessment methods, which provide automated solutions to assess DQ. The range of DQ assessment methods is very broad: from data profiling and semantic profiling to data matching and data validation. This paper gives an overview of current methods for DQ assessment and classifies the DQ assessment methods into an existing taxonomy of DQ problems. Specific examples of the placement of each DQ method in the taxonomy are provided and illustrate why the method is relevant to the particular taxonomy position. The gaps in the taxonomy, where no current DQ methods exist, show where new methods are required and can guide future research and DQ tool development.
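One of the assessment method families mentioned above, data profiling, can be sketched in a few lines: summarise a column so that obvious quality issues (missing values, out-of-pattern entries) become visible. The data, the date pattern and the chosen statistics are illustrative assumptions, not a specific tool's output.

```python
import re
from collections import Counter

def profile_column(values, pattern=r"^\d{4}-\d{2}-\d{2}$"):
    """Return simple profiling statistics for a list of raw column values."""
    missing = sum(1 for v in values if v in (None, ""))
    non_matching = sum(1 for v in values if v and not re.match(pattern, str(v)))
    return {
        "rows": len(values),
        "missing": missing,
        "distinct": len(set(values)),
        "pattern_violations": non_matching,
        "top_values": Counter(values).most_common(3),
    }

dates = ["2021-05-01", "2021-05-01", "", "01/05/2021", None]
print(profile_column(dates))
```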
Proceedings of the 8th international workshop on Software quality - WoSQ '11, 2011
This industrial contribution describes a tool support approach to assessing the quality of relational databases. The approach combines two separate audits: an audit of the database structure as described in the schema and an audit of the database content at a given point in time. The audit of the database schema checks for design weaknesses, data rule violations and deviations from the original data model. It also measures the size, complexity and structural quality of the database. The audit of the database content compares the state of selected data attributes to identify incorrect data and checks for missing and redundant records. The purpose is to initiate a data clean-up process to ensure or restore the quality of the data.
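The content-audit idea can be sketched as queries against a relational table: one query surfaces data rule violations, another finds redundant (duplicate) records. Table and column names are illustrative; the tool described in the paper is not reproduced here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer(id INTEGER, name TEXT, birth_year INTEGER);
    INSERT INTO customer VALUES
        (1, 'Ann', 1985), (2, '', 2090), (3, 'Bob', 1970), (3, 'Bob', 1970);
""")

# Data rule violations: empty names or implausible birth years.
bad = conn.execute(
    "SELECT id, name, birth_year FROM customer "
    "WHERE name = '' OR birth_year NOT BETWEEN 1900 AND 2025").fetchall()

# Redundant records: identical rows occurring more than once.
dupes = conn.execute(
    "SELECT id, name, birth_year, COUNT(*) FROM customer "
    "GROUP BY id, name, birth_year HAVING COUNT(*) > 1").fetchall()

print("rule violations:", bad)
print("redundant records:", dupes)
```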
Iq, 2008
Nowadays, data plays a key role in organizations, and management of its quality is becoming an essential activity. As part of such required management, organizations need to draw up processes for measuring the data quality (DQ) levels of their organizational units, taking into account the particularities of different scenarios, available resources, and characteristics of the data used in them. Given that there are not many works in the literature related to this objective, this paper proposes a methodology, abbreviated MMPRO, to develop processes for measuring DQ. MMPRO is based on ISO/IEC 15939. Although it is a software quality standard, we believe it can be successfully applied in this context because of the similarities between software and data. The proposed methodology consists of four activities: (1) Establish and sustain the DQ measurement commitment, (2) Plan the DQ Measurement Process, (3) Perform the DQ Measurement Process, and (4) Evaluate the DQ Measurement Process. These four activities are divided into tasks. For each task, input and output products are listed, as well as a set of useful techniques and tools, many of them borrowed from the Software Engineering field.
Information & Management, 1980
Until recently, data quality was poorly understood and seldom achieved, yet it is essential to the effective use of information systems. This paper discusses the nature and importance of data quality. The role of data quality is placed in the life cycle framework. Many new concepts, tools and techniques from both programming languages and database management systems are presented and related to data quality. In particular, the concept of a database constraint is considered in detail. Some current limitations and research directions are proposed.
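The database constraint idea can be shown in a minimal, modern sketch: declarative constraints in the schema reject low-quality data at insertion time. The table and rules below are illustrative assumptions; the paper itself predates this particular SQL dialect.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE employee(
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL CHECK (length(name) > 0),
        salary REAL CHECK (salary >= 0)
    )
""")

conn.execute("INSERT INTO employee VALUES (1, 'Ann', 1200.0)")   # accepted
try:
    conn.execute("INSERT INTO employee VALUES (2, '', -50.0)")   # rejected by CHECK
except sqlite3.IntegrityError as e:
    print("constraint violation:", e)
```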
2010
A successful information system is the one that meets its design goals. Expressing these goals and subsequently translating them into a working solution is a major challenge for information systems engineering. This thesis adopts the concepts and techniques from goal-oriented (software) requirements engineering research for conceptual database design, with a focus on data quality issues. Based on a real-world case study, a goal-oriented process is proposed for database requirements analysis and modeling.
2015
Data quality (DQ) has been studied in significant depth over the last two decades and has received attention from both the academic and the practitioner community. Over that period of time a large number of data quality dimensions have been identified in due course of research and practice. While it is important to embrace the diversity of views of data quality, it is equally important for the data quality research and practitioner community to be united in the consistent interpretation of this foundational concept. In this paper, we provide a step towards this consistent interpretation. Through a systematic review of research and practitioner literature, we identify previously published data quality dimensions and embark on the analysis and consolidation of the overlapping and inconsistent definitions. We stipulate that the shared understanding facilitated by this consolidation is a necessary prelude to generic and declarative forms of requirements modeling for data quality.
Nowadays, activities and decision making in organizations are based on data and on information obtained from data analysis, which supports the construction of reliable and accurate processes. As data are a significant resource in all organizations, the quality of data is critical for managers and operating processes to identify related performance issues. Moreover, high quality data can increase the opportunity for achieving top services in an organization. However, identifying the various aspects of data quality, from definitions and dimensions to types, strategies and techniques, is essential to equip methods and processes for improving data. This paper focuses on a systematic review of data quality dimensions for use in a proposed framework that combines data mining and statistical techniques to measure dependencies among dimensions and to illustrate how the extracted knowledge can increase process quality.
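A hedged sketch of the statistical side of such a framework: score each record on two quality dimensions and measure the dependency between the dimension scores with a correlation coefficient. The scoring rules and data are illustrative assumptions, not the paper's framework.

```python
import numpy as np

records = [
    {"email": "a@example.com", "updated_days_ago": 10},
    {"email": "",              "updated_days_ago": 400},
    {"email": "b@example.com", "updated_days_ago": 30},
    {"email": "",              "updated_days_ago": 900},
]

# Per-record scores for two dimensions (1.0 = satisfied, 0.0 = violated).
completeness = np.array([1.0 if r["email"] else 0.0 for r in records])
timeliness   = np.array([1.0 if r["updated_days_ago"] <= 365 else 0.0 for r in records])

# Pearson correlation between the two dimension scores across records.
print(np.corrcoef(completeness, timeliness)[0, 1])   # 1.0 for this toy data
```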
19º Simposio Brasileiro …, 2004
To satisfy complex user requirements, information systems need to integrate data from several, possibly autonomous, data sources. One challenge in such an environment is to provide the user with data meeting their requirements in terms of quality. These requirements are difficult to satisfy because of the strong heterogeneity of the sources. In this paper we address the problem of data quality evaluation in data integration systems. We present a framework which is a first attempt to formalize the evaluation of data quality. It is based on a graph model of the data integration system, which allows us to define evaluation methods and demonstrate propositions in terms of graph properties. To illustrate our approach, we also present a first experiment with the data freshness quality factor and show how the framework is used to evaluate this factor under different scenarios.
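Under assumed semantics (not the paper's formal framework), evaluating data freshness over a graph model of a data integration system might look like the following sketch: the age of a derived node is the age of its stalest input plus the delay of the integration step that produces it. Node names, ages and delays are illustrative.

```python
integration_graph = {            # node -> list of (input_node, processing_delay_hours)
    "source_A": [],
    "source_B": [],
    "merged":   [("source_A", 1.0), ("source_B", 2.0)],
    "report":   [("merged", 0.5)],
}
source_age_hours = {"source_A": 3.0, "source_B": 10.0}

def age(node):
    """Recursively compute the age (inverse freshness) of a node in hours."""
    inputs = integration_graph[node]
    if not inputs:
        return source_age_hours[node]
    return max(age(parent) + delay for parent, delay in inputs)

print(age("report"))   # 10.0 + 2.0 + 0.5 = 12.5 hours
```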
Various techniques have been proposed to enable organisations to assess the current quality level of their data. Unfortunately, organisations have many different requirements related to data quality (DQ) assessment because of domain and context differences. Due to the gamut of possible requirements, organisations may be forced to select an assessment technique (AT) which may not be wholly suitable for their requirements. Therefore, we propose and evaluate the Hybrid Approach to assessing DQ, which demonstrates that it is possible to develop new techniques for assessing DQ, suitable for any set of requirements, while leveraging the best practices proposed by existing ATs.
The notion of data quality cannot be separated from the context in which the data is produced or used. Recently, a conceptual framework for capturing context-dependent data quality assessment has been proposed. According to it, a database instance D is assessed with respect to a context, which is modeled as an external system containing additional data, metadata, and definitions of quality predicates. The instance D is "put in context" via schema mappings; after contextual processing of the data, a collection of alternative clean versions D′ of D is produced. The quality of D is measured in terms of its distance to this class. In this work we extend contexts for data quality assessment by including multidimensional data, which allows data to be analyzed from multiple perspectives and at different degrees of granularity. It is possible to navigate through dimensional hierarchies in order to reach the data that is needed for quality assessment. More precisely, we introduce contextual hierarchies as components of contexts for data quality assessment. The resulting contexts are later represented as ontologies written in description logic.
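The distance-based quality measure can be illustrated with a hedged sketch: given an instance D and a class of alternative clean versions produced by contextual processing, quality is inversely related to the distance from D to the closest clean version. Here distance is simply the number of differing cells; the framework's actual distance notion is more refined, and the data is illustrative.

```python
D = [("p1", "Riga", 25), ("p2", "riga", 17)]

clean_versions = [
    [("p1", "Riga", 25), ("p2", "Riga", 17)],   # fixes only the casing
    [("p1", "Riga", 25), ("p2", "Riga", 18)],   # fixes casing and age
]

def cell_distance(inst_a, inst_b):
    """Count cells that differ between two instances with aligned tuples."""
    return sum(a != b for ta, tb in zip(inst_a, inst_b) for a, b in zip(ta, tb))

quality_distance = min(cell_distance(D, clean) for clean in clean_versions)
print(quality_distance)   # 1: D differs from the nearest clean version in one cell
```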