Thirty years ago, software was not considered a concrete asset. Everyone agreed on its importance, but it was not treated as a good or possession. Nowadays, software appears on an organization's balance sheet, and data is slowly following the same path. The information owned by an organization is an important part of its assets, and it can be used as a competitive advantage. However, data has long been underestimated by the software community: methods and techniques are usually applied to software (including data schemata), while the data itself is treated as an external concern. Validation and verification techniques typically assume that data is provided by an external agent and concentrate only on the software.
The quality of data is important in research working with data sets, because poor data quality may lead to invalid results. Data sets contain measurements that are associated with metrics and entities; however, in some data sets it is not always clear which entities have been measured and exactly which metrics have been used, which means that measurements could be misinterpreted. In this study, we develop a framework for data quality assessment that determines whether a data set has sufficient information to support the correct interpretation of data for analysis in empirical research. The framework incorporates a data set metamodel and a quality assessment process to evaluate data set quality. To evaluate the effectiveness of our framework, we conducted a user study. We used observations, a questionnaire and a think-aloud protocol to gain insight into the framework through participants' thought processes while they applied it. The results of our study provide evidence that most participants successfully applied the definitions of data set category elements and the formal definitions of data quality issues to the data sets. Further work is needed to reproduce our results with more participants and to determine whether the data quality framework generalizes to other types of data sets.
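As a minimal illustration of the kind of metamodel such a framework might define (the class names and the single check below are hypothetical sketches, not taken from the paper), a data set can be modeled as measurements that must each reference a declared entity and a declared metric; a measurement that references neither is precisely the interpretability problem the abstract describes:

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Metric:
    name: str          # e.g. "loc"
    unit: str          # e.g. "lines of code"

@dataclass(frozen=True)
class Entity:
    name: str          # e.g. "module parser"
    kind: str          # e.g. "class", "method", "file"

@dataclass
class Measurement:
    entity: str        # name of the measured entity
    metric: str        # name of the metric used
    value: float

@dataclass
class DataSet:
    entities: dict[str, Entity] = field(default_factory=dict)
    metrics: dict[str, Metric] = field(default_factory=dict)
    measurements: list[Measurement] = field(default_factory=list)

    def interpretability_issues(self) -> list[str]:
        """Report measurements whose entity or metric is undeclared."""
        issues = []
        for m in self.measurements:
            if m.entity not in self.entities:
                issues.append(f"unknown entity {m.entity!r}")
            if m.metric not in self.metrics:
                issues.append(f"unknown metric {m.metric!r}")
        return issues

ds = DataSet()
ds.metrics["loc"] = Metric("loc", "lines of code")
ds.measurements.append(Measurement("Foo", "loc", 120.0))
print(ds.interpretability_issues())  # ["unknown entity 'Foo'"]
```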
2010
In recent years many organizations have come to realize the importance of maintaining data at appropriate levels of quality in their information systems (IS). We therefore consider it essential to introduce mechanisms into organizational IS to ensure acceptable quality levels in their data. Only by means of such mechanisms will users be able to trust the data they use for the task at hand. These mechanisms must be developed to satisfy users' data quality requirements when they use specific functionalities of the IS. From our point of view as software engineering researchers, both these data quality requirements and the remaining software requirements must be dealt with in an appropriate manner. Since the goal of our research is to establish means of developing software mechanisms aimed at managing data quality in IS, we decided to begin by carrying out a survey of related methodological and technical issues to depict the current state of the field, using the systematic review technique. This paper presents the principal results of the survey, along with the conclusions reached.
2009 Ninth International Conference on Quality Software, 2009
Inadequate levels of Data Quality (DQ) in Information Systems (IS) pose a very important problem for organizations, which therefore seek to assure data quality from the earliest stages of information system development. This paper proposes incorporating mechanisms into software development methodologies in order to integrate users' DQ requirements, with the aim of assuring data quality from the beginning of development. It presents a framework of well-defined processes, activities and tasks that can be incorporated into an existing software development methodology, such as METRICA V3, so that software products created according to that methodology have assured data quality. The extension presented is a guideline, and it can be adapted and applied to other development methodologies such as the Unified Development Process.
Lecture Notes in Computer Science, 2014
The data collected and analyzed in any software engineering experiment are key artifacts; however, these data might contain errors. We propose a data quality model specific to data obtained from software engineering experiments, which provides a framework for analyzing and improving these data. We apply the model to two controlled experiments, which results in the discovery of data quality problems that need to be addressed. We conclude that data quality issues have to be considered before obtaining the experimental results.
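A minimal sketch of the kind of check such a model might drive (the column names, valid ranges and treatment labels are illustrative assumptions, not the paper's model): flagging missing, duplicated, or out-of-range observations in a controlled-experiment data table before analysis begins:

```python
# Hypothetical pre-analysis screening of controlled-experiment data.
rows = [
    {"subject": "S01", "treatment": "TDD",  "time_min": 42.0},
    {"subject": "S02", "treatment": "TDD",  "time_min": None},   # missing value
    {"subject": "S02", "treatment": "TDD",  "time_min": None},   # duplicate row
    {"subject": "S03", "treatment": "CTRL", "time_min": -5.0},   # out of range
]

def screen(rows, valid_treatments=("TDD", "CTRL"), time_range=(0.0, 480.0)):
    problems = []
    seen = set()
    for i, r in enumerate(rows):
        key = (r["subject"], r["treatment"], r["time_min"])
        if key in seen:
            problems.append((i, "duplicate observation"))
        seen.add(key)
        if r["time_min"] is None:
            problems.append((i, "missing time_min"))
        elif not time_range[0] <= r["time_min"] <= time_range[1]:
            problems.append((i, "time_min out of range"))
        if r["treatment"] not in valid_treatments:
            problems.append((i, "unknown treatment"))
    return problems

for idx, issue in screen(rows):
    print(f"row {idx}: {issue}")
```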
2013 20th Asia-Pacific Software Engineering Conference (APSEC), 2013
Background: The quality of data sets used in software engineering research is of the utmost importance. To ensure the credibility of results obtained from a data set, the quality of the data must be examined. Objective: This study provides an overview of recent research (2008-2012) involving data quality in software engineering data sets, with the goal of understanding what research addresses data quality, and in particular of determining to what degree researchers have addressed data quality issues, in order to evaluate the trustworthiness of their results. Method: We performed a systematic mapping study to investigate the treatment of data quality issues in software engineering research. A total of 64 papers published from 2008 to 2012 explicitly address quality issues in software engineering data sets. These studies were classified according to data quality topic, data set and data quality problem. Results: We found that only 31 studies gave serious consideration to how the quality of the data affected their results. We observed a lack of clear and consistent terminology regarding data quality, especially with respect to the kinds of quality problems a data set might have. As a first step towards addressing this problem, we propose a model that describes the lifecycle research data goes through when used in research. Conclusions: The results suggest that researchers should pay more attention to the quality of data sets in order to produce trustworthy data for reliable empirical research, and that the research community needs to better understand and communicate data quality issues.
2010
A successful information system is one that meets its design goals. Expressing these goals and subsequently translating them into a working solution is a major challenge for information systems engineering. This thesis adopts concepts and techniques from goal-oriented (software) requirements engineering research for conceptual database design, with a focus on data quality issues. Based on a real-world case study, a goal-oriented process is proposed for database requirements analysis and modeling.
IQ, 2008
Nowadays, data plays a key role in organizations, and managing its quality is becoming an essential activity. As part of such management, organizations need to draw up processes for measuring the data quality (DQ) levels of their organizational units, taking into account the particularities of different scenarios, the available resources, and the characteristics of the data used in them. Given that there are few works in the literature related to this objective, this paper proposes a methodology, abbreviated MMPRO, for developing processes to measure DQ. MMPRO is based on ISO/IEC 15939. Although this is a software quality standard, we believe it can be successfully applied in this context because of the similarities between software and data. The proposed methodology consists of four activities: (1) Establish and sustain the DQ measurement commitment, (2) Plan the DQ measurement process, (3) Perform the DQ measurement process, and (4) Evaluate the DQ measurement process. These four activities are divided into tasks. For each task, input and output products are listed, as well as a set of useful techniques and tools, many of them borrowed from the Software Engineering field.
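As a rough sketch of how such a measurement process could be tracked in practice (the activity names follow the abstract; every task and output listed below is a hypothetical placeholder, since the paper's actual task breakdown is not reproduced here):

```python
# Sketch: MMPRO's four activities as a checklist structure.
MMPRO_PLAN = {
    "1. Establish and sustain the DQ measurement commitment": {
        "tasks": ["secure management sponsorship", "assign measurement roles"],
        "outputs": ["commitment record"],
    },
    "2. Plan the DQ measurement process": {
        "tasks": ["select data and DQ dimensions", "define metrics and thresholds"],
        "outputs": ["measurement plan"],
    },
    "3. Perform the DQ measurement process": {
        "tasks": ["collect measurements", "analyse results"],
        "outputs": ["DQ measurement report"],
    },
    "4. Evaluate the DQ measurement process": {
        "tasks": ["review usefulness of metrics", "identify process improvements"],
        "outputs": ["evaluation report", "improvement actions"],
    },
}

def next_tasks(plan, done):
    """Return the first activity, in order, that still has pending tasks."""
    for activity, detail in plan.items():
        pending = [t for t in detail["tasks"] if t not in done]
        if pending:
            return activity, pending
    return None, []

activity, pending = next_tasks(MMPRO_PLAN, done={"secure management sponsorship"})
print(activity, pending)
```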
2015
Data quality (DQ) has been studied in significant depth over the last two decades and has received attention from both the academic and the practitioner community. Over that period of time a large number of data quality dimensions have been identified in due course of research and practice. While it is important to embrace the diversity of views of data quality, it is equally important for the data quality research and practitioner community to be united in the consistent interpretation of this foundational concept. In this paper, we provide a step towards this consistent interpretation. Through a systematic review of research and practitioner literature, we identify previously published data quality dimensions and embark on the analysis and consolidation of the overlapping and inconsistent definitions. We stipulate that the shared understanding facilitated by this consolidation is a necessary prelude to generic and declarative forms of requirements modeling for data quality.
International Journal of Business Information Systems, 2016
Data quality has significance to companies, but is an issue that can be challenging to approach and operationalise. This study focuses on data quality from the perspective of operationalisation by analysing the practices of a company that is a world leader in its business. A model is proposed for managing data quality to enable evaluation and operationalisation. The results indicate that data quality is best ensured when organisation specific aspects are taken into account. The model acknowledges the needs of different data domains, particularly those that have master data characteristics. The proposed model can provide a starting point for operationalising data quality assessment and improvement. The consequent appreciation of data quality improves data maintenance processes, IT solutions, data quality and relevant expertise, all of which form the basis for handling the origins of products.
Procedia Computer Science, 2017
The paper discusses an external solution for data quality management in information systems. In contrast to traditional data quality assurance methods, the proposed approach uses a domain-specific language (DSL) to describe data quality models. Data quality models consist of graphical diagrams whose elements contain requirements for a data object's values and procedures for the data object's analysis. A DSL interpreter makes the data quality model executable, thereby enabling the measurement and improvement of data quality. The described approach can be applied: (1) to check the completeness, accuracy and consistency of accumulated data; (2) to support data migration when the software architecture and/or data models change; (3) to gather data from different data sources and transfer them to a data warehouse.
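A minimal sketch of what an executable quality model could look like once interpreted (the rule names, record fields and email pattern are hypothetical; the paper's DSL is graphical, whereas this reduces each diagram element to a named predicate):

```python
# Hypothetical executable form of a data quality model: each rule is a
# named predicate over a record; the "interpreter" simply runs them all.
import re

quality_model = {
    # completeness: mandatory fields must be present and non-empty
    "customer.name is filled":
        lambda r: bool(r.get("name")),
    # accuracy (syntactic): email must match a plausible pattern
    "customer.email is well-formed":
        lambda r: re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", r.get("email", "")) is not None,
    # consistency: a discount is only allowed for loyalty members
    "discount implies loyalty membership":
        lambda r: not r.get("discount") or r.get("loyalty_member", False),
}

def evaluate(record, model):
    """Return the names of all violated rules."""
    return [name for name, rule in model.items() if not rule(record)]

record = {"name": "Ada", "email": "ada@example", "discount": True}
print(evaluate(record, quality_model))
# -> ['customer.email is well-formed', 'discount implies loyalty membership']
```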
Information & Management, 1980
Until recently, data quality was poorly understood and seldom achieved, yet it is essential to the effective use of information systems. This paper discusses the nature and importance of data quality. The role of data quality is placed in the life cycle framework. Many new concepts, tools and techniques from both programming languages and database management systems are presented and related to data quality. In particular, the concept of a database constraint is considered in detail. Some current limitations and research directions are proposed.
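To make the notion of a database constraint concrete (a present-day illustration under assumed schema and rules, not the paper's own notation), a constraint can be viewed as a predicate over the database that must hold after every update; the sketch below checks two such predicates before committing an insert:

```python
# Illustrative database constraints: predicates over the whole table
# that must hold after every update (schema and rules are hypothetical).
employees = [
    {"id": 1, "name": "Lee", "salary": 52000, "manager_id": None},
    {"id": 2, "name": "Kim", "salary": 48000, "manager_id": 1},
]

def constraint_salary_positive(table):
    return all(row["salary"] > 0 for row in table)

def constraint_manager_exists(table):     # referential integrity
    ids = {row["id"] for row in table}
    return all(row["manager_id"] is None or row["manager_id"] in ids
               for row in table)

CONSTRAINTS = [constraint_salary_positive, constraint_manager_exists]

def insert(table, row):
    """Apply the insert only if every constraint still holds afterwards."""
    candidate = table + [row]
    violated = [c.__name__ for c in CONSTRAINTS if not c(candidate)]
    if violated:
        raise ValueError(f"insert rejected, violates: {violated}")
    table.append(row)

insert(employees, {"id": 3, "name": "Ito", "salary": 51000, "manager_id": 2})
# insert(employees, {"id": 4, "name": "Bo", "salary": -1, "manager_id": 9})
#   -> ValueError: violates both constraints
```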
Journal on Advances in Theoretical and Applied Informatics
Users need to trust the data managed by the software applications that form part of Information Systems (IS), which means that organizations should assure adequate levels of quality in the data managed by their IS. Therefore, the ability of an IS to manage data with an adequate level of quality should be a basic requirement for all organizations. In order to meet this requirement, aspects and elements related to data quality (DQ) should be taken into account from the earliest stages of software development, i.e. "data quality by design". Since DQ is considered a multidimensional and largely context-dependent concept, managing all specific requirements is a complex task. The main goal of this paper is to introduce a specific methodology aimed at identifying and eliciting DQ requirements coming from different user viewpoints. These specific requirements can then be treated like other requirements (both functional and non-functional) during the development...
2002
Abstract: Data quality is a topic addressed in statistics, management and computer science, among many other scientific fields. This article considers the problem of defining data quality from the computer science point of view. Several proposals of dimensions (or characteristics) that contribute to the definition of data quality are compared, and a "basic" definition of the concept is introduced.
JUCS - Journal of Universal Computer Science, 2020
The paper proposes a new data object-driven approach to data quality evaluation. It consists of three main components: (1) a data object, (2) data quality requirements, and (3) a data quality evaluation process. As data quality is relative in nature, the data object and quality requirements are (a) use-case dependent and (b) defined by users in accordance with their needs. All three components of the presented data quality model are described using graphical Domain Specific Languages (DSLs). In accordance with Model-Driven Architecture (MDA), the data quality model is built in two steps: (1) creating a platform-independent model (PIM), and (2) converting the created PIM into a platform-specific model (PSM). The PIM comprises informal specifications of data quality. The PSM describes the implementation of the data quality model, thus making it executable, enabling data object scanning and the detection of data quality defects and anomalies. The proposed approach was applied to open data sets, ...
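A rough, non-graphical sketch of the three components (the paper builds them with graphical DSLs and an MDA PIM-to-PSM step; the field names and the two requirement sets here are hypothetical), illustrating how the same data object can satisfy one use case and fail another:

```python
# Sketch of the three components: (1) a data object, (2) use-case-specific
# quality requirements, (3) an evaluation process. Names are illustrative.

# (1) Data object: the fields a use case cares about.
address = {"street": "Elizabetes iela 1", "city": "Riga", "postal_code": ""}

# (2) Requirements, defined per use case, reflecting the relative
# nature of data quality.
requirements = {
    "newsletter": {          # city-level targeting only
        "city": lambda v: bool(v),
    },
    "parcel_delivery": {     # needs a full, well-formed address
        "street":      lambda v: bool(v),
        "city":        lambda v: bool(v),
        "postal_code": lambda v: bool(v) and len(v) >= 4,
    },
}

# (3) Evaluation process: scan the object against one use case's rules.
def evaluate(obj, use_case):
    return [field for field, ok in requirements[use_case].items()
            if not ok(obj.get(field, ""))]

print(evaluate(address, "newsletter"))       # [] -> fit for this use
print(evaluate(address, "parcel_delivery"))  # ['postal_code']
```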
2005
The exploration of data to extract information or knowledge to support decision making is a critical success factor for an organization in today's society. However, several problems can affect data quality. These problems have a negative effect on the results extracted from the data, affecting their usefulness and correctness. In this context, it is quite important to know and understand these data problems. This paper presents a taxonomy of data quality problems, organizing them by the granularity level at which they occur. A formal definition is presented for each problem included. The taxonomy provides rigorous definitions, which are richer in information than the textual definitions used in previous works. These definitions are useful for the development of a data quality tool that automatically detects the identified problems.
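A small sketch of how problems at different granularity levels might be detected automatically (the two levels and the three example problems below, missing value, domain violation and duplicate tuple, are illustrative choices, not the paper's full taxonomy):

```python
# Illustrative detectors at two granularity levels.
table = [
    {"id": 1, "age": 34},
    {"id": 2, "age": None},    # attribute level: missing value
    {"id": 3, "age": 212},     # attribute level: domain violation
    {"id": 1, "age": 34},      # relation level: duplicate tuple
]

def attribute_level(table):
    """Problems confined to a single attribute value."""
    for i, row in enumerate(table):
        if row["age"] is None:
            yield (i, "missing value in 'age'")
        elif not 0 <= row["age"] <= 130:
            yield (i, "domain violation in 'age'")

def relation_level(table):
    """Problems only visible across multiple tuples."""
    seen = {}
    for i, row in enumerate(table):
        key = tuple(sorted(row.items()))
        if key in seen:
            yield (i, f"duplicate of tuple {seen[key]}")
        else:
            seen[key] = i

for finding in [*attribute_level(table), *relation_level(table)]:
    print(finding)
```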
Journal of Systems and Software, 2021
The most successful organizations in the world are data-driven businesses. Data is at the core of many organizations' business and is one of their most important assets, since the decisions they make can be no better than the data on which they are based. For this reason, organizations need to be able to trust their data. One important activity that helps to achieve data reliability is the evaluation and certification of the quality level of organizational data repositories. This paper describes the results of applying a data quality evaluation and certification process to the repositories of three European organizations belonging to different sectors. We present findings from the point of view of both the data quality evaluation team and the organizations that underwent the evaluation process. Several benefits were explicitly recognised by the organizations involved after achieving the data quality certification for their repositories (e.g., long-term organizational sustainability, better internal knowledge of data, and more efficient management of data quality). As a result of this experience, we have also identified a set of best practices aimed at enhancing the data quality evaluation process.
Proceedings of the 2017 Federated Conference on Computer Science and Information Systems, 2017
The research discusses how to describe data quality and what should be taken into account when developing a universal data quality management solution. The proposed approach is to create quality specifications for each kind of data object and to make them executable. A specification can be executed step by step according to business process descriptions, ensuring the gradual accumulation of data in the database and checking data quality according to the specific use case. The described approach can be applied to check the completeness, accuracy, timeliness and consistency of accumulated data.
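A compact sketch of such step-wise checking (the process step names, fields and the one-day timeliness rule are hypothetical): each business process step validates only the data expected to have accumulated by that point, so a partially filled record is not flagged prematurely:

```python
# Hypothetical step-wise quality specification: each business process
# step checks only the data expected to exist by that step.
from datetime import datetime, timedelta

order = {"created_at": datetime.now() - timedelta(days=2),
         "customer_id": "C-17", "items": [], "paid_at": None}

steps = [
    ("registered", lambda o: bool(o["customer_id"])),
    ("filled",     lambda o: len(o["items"]) > 0),
    ("paid",       lambda o: o["paid_at"] is not None),
    # timeliness: an order should not stay unpaid longer than one day
    ("timely",     lambda o: o["paid_at"] is not None
                             or datetime.now() - o["created_at"] <= timedelta(days=1)),
]

def check_until(order, step_name):
    """Run checks gradually, up to and including the given process step."""
    for name, ok in steps:
        print(f"{name}: {'ok' if ok(order) else 'VIOLATED'}")
        if name == step_name:
            break

check_until(order, "registered")   # early step: only registration checked
check_until(order, "timely")       # late step: unpaid-too-long is flagged
```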
2005
Technological advances and the internet have favoured the appearance of a great diversity of web applications through which organizations conduct their business in an increasingly competitive environment. A decisive factor for this competitiveness is assuring the data quality of the web applications used. In recent years, several research works on data quality have been carried out. In this paper, we present a systematic review of this research.