Executable Data Quality Models

Jānis Bičevskis

Executable Data Quality Models

Jānis Bičevskis

2017, Procedia Computer Science

Sign up for access to the world's latest research

checkGet notified about relevant papers

checkSave papers to use in your research

checkJoin the discussion with peers

checkTrack your impact

Abstract

The paper discusses an external solution for data quality management in information systems. In contradiction to traditional data quality assurance methods, the proposed approach provides the usage of a domain specific language (DSL) for description data quality models. Data quality models consists of graphical diagrams, which elements contain requirements for data object's values and procedures for data object's analysis. The DSL interpreter makes the data quality model executable therefore ensuring measurement and improving of data quality. The described approach can be applied: (1) to check the completeness, accuracy and consistency of accumulated data; (2) to support data migration in cases when software architecture and/or data models are changed; (3) to gather data from different data sources and to transfer them to data warehouse.

Figures (3)

Fig. 1. (a) ER model of WTT; (b) Syntactic control of input record.

Fig. 2. (a) contextual control on interrelated data; (b) implementation of contextual control on interrelated data. The ETL procedure of data warehouse receives messages containing five values of attributes: < Proj_ID, Dev_ID, Wt_date, Wt_hours, Wt_work_descr >

Fig. 3. (a) Contextual control on the entire data set; (b) implementation of contextual control on the entire data set. This paper emphasizes the role of executable data quality model with three successive data quality controls. The informally described controls should be replaced be executab e software routines. The syntactic control or correctness of input fields may be implemented by using external code libraries, accordingly this topic will not be discussed further. The contextual control on interrelated data can course, the statements will be individual for each usage or task. be described by SQL statements (see Fig.2b). O: The contextual control on the entire data set may also be described using SQL statements although the statements could be rather complicated (see Fig.3b). Therefore it is proposed to create individual data quality models for each usage instead of one universal data quality contro solution. Because the data quality controls may be implemented by software routines or SQL statements, the dat quality model is executable. In the example given some of executable data quality management activities are no explicitly defined, for instance, assigning of values to variables in software or sending of error messages (activity SendMessage) to users. These activities may be unique but still quite trivial and therefore they are not addressed it the following discussions.

Jānis Bičevskis

2018

The paper discusses an external solution for data quality management in information systems. In contradiction to traditional data quality assurance methods, the proposed approach provides the usage of a domain specific language (DSL) for description data quality models. Data quality models consists of graphical diagrams, which elements contain requirements for data object’s values and procedures for data object’s analysis. The DSL interpreter makes the data quality model executable therefore ensuring measurement and improving of data quality. The described approach can be applied: (1) to check the completeness, accuracy and consistency of accumulated data; (2) to support data migration in cases when software architecture and/or data models are changed; (3) to gather data from different data sources and to transfer them to data warehouse. © 2016 The Authors. Published by Elsevier B.V. Peer-review under responsibility of organizing committee of the scientific committee of the internat...

Log In

Executable Data Quality Models

Sign up for access to the world's latest research

Abstract

Related papers

Related topics

Related papers