Testing ETL (Extract, Transform, and Load) procedures is a vital phase of data warehouse (DW) testing; it is arguably the most complex phase, because it directly affects the quality of data. Automated testing has proved to be a valuable tool for improving the quality of DW systems, whereas manual testing is time-consuming and error-prone, so automating the tests attains good Data Quality (DQ) at lower time and cost. In this paper the authors propose a testing framework that automates data quality testing at the ETL stage. Datasets of different volumes (from 10,000 to 50,000 records) are used to evaluate the effectiveness of the proposed automated ETL testing. The experimental results show that the proposed testing framework is effective in detecting errors across the different data volumes.
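To make the kind of check such a framework automates concrete, here is a minimal, generic Python sketch (not the authors' framework); the staging/warehouse table names and columns are hypothetical:

# Minimal sketch of an automated ETL data-quality check: compare row counts
# and detect NULL values between a staging (source) table and the warehouse
# (target) table. Table and column names are hypothetical.
import sqlite3

def check_row_counts(conn, source_table, target_table):
    """Completeness check: every extracted row should reach the target."""
    src = conn.execute(f"SELECT COUNT(*) FROM {source_table}").fetchone()[0]
    tgt = conn.execute(f"SELECT COUNT(*) FROM {target_table}").fetchone()[0]
    return {"source_rows": src, "target_rows": tgt, "passed": src == tgt}

def check_not_null(conn, table, column):
    """Validity check: key columns must not contain NULLs after loading."""
    nulls = conn.execute(
        f"SELECT COUNT(*) FROM {table} WHERE {column} IS NULL").fetchone()[0]
    return {"table": table, "column": column, "null_rows": nulls, "passed": nulls == 0}

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE stg_orders (order_id INTEGER, amount REAL);
        CREATE TABLE dw_orders  (order_id INTEGER, amount REAL);
        INSERT INTO stg_orders VALUES (1, 10.0), (2, 20.0), (3, NULL);
        INSERT INTO dw_orders  VALUES (1, 10.0), (2, 20.0);
    """)
    print(check_row_counts(conn, "stg_orders", "dw_orders"))
    print(check_not_null(conn, "dw_orders", "amount"))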
International Journal of Computer and Electrical Engineering, 2009
For truthful reporting and decision-making, a major challenge in the data warehouse industry is ensuring quality data. The Extraction, Transformation and Loading (ETL) module is crucial to attaining high-quality data for a data warehouse. In-house development of ETL solutions with improvised algorithms may result in unknown errors at the logical or technical level. To assure data quality, one has to understand the prevailing data quality assurance practices. This paper empirically analyzes the impact of automated ETL testing on the data quality of the data warehouse. Data quality was observed before and after the introduction of automated ETL testing. Statistical analysis indicated a substantial increase in data quality after the introduction of automated ETL testing.
Journal of Statistics and Management Systems, 2017
Extraction-transformation-loading (ETL) tools are pieces of software that extract data from various sources; clean, customize, reformat, and integrate it; and insert it into a data warehouse. The ETL process in data warehousing is responsible for pulling data out of the operational systems and placing it into the data warehouse. Constructing the ETL process is one of the largest tasks in building a warehouse. In this paper we explain the ETL process, ETL testing challenges, and ETL testing techniques.
Journal on Today's Ideas - Tomorrow's Technologies, 2014
In today's scenario, extraction-transformation-loading (ETL) tools have become important pieces of software responsible for integrating heterogeneous information from several sources. Carrying out the ETL process is potentially complex, hard, and time-consuming. Organisations nowadays are concerned with vast quantities of data. Data quality is concerned with technical issues in the data warehouse environment. Research in the last few decades has laid increasing stress on data quality issues in the data warehouse ETL process. Data quality can be ensured by cleaning the data prior to loading it into the warehouse. Since the data is collected from various sources, it comes in various formats; standardizing those formats and cleaning the data is a prerequisite for a clean data warehouse environment. Data quality attributes such as accuracy, correctness, consistency, and timeliness are required for a knowledge discovery process. The purpose of this research is to address data quality issues at all stages of data warehousing, namely 1) data sources, 2) data integration, 3) data staging, and 4) data warehouse modelling and schematic design, and to formulate a descriptive classification of the causes of these issues. The discovered knowledge is used to repair the data deficiencies. This work proposes a framework for quality of extraction, transformation and loading of data into a warehouse.
2012
In the current trend, every software development, enhancement, or maintenance project includes some quality assurance activities. Quality assurance attempts to prevent defects by concentrating on the process of producing the product, rather than on detecting defects after the product is built. Regression testing means rerunning test cases from existing test suites to build confidence that software changes have no unintended side effects. A data warehouse obtains data from a number of operational source systems, which can be relational tables, an ERP package, etc. The data from these sources is converted and loaded into the data warehouse in a suitable form; this process is called Extraction, Transformation and Loading (ETL). In addition to the target database, there is another database to store the metadata, called the metadata repository. This database contains data about data: descriptions of the source data, the target data, and how the source data has been transformed into target data. In data warehouse migration or enhancement projects, data quality checking includes ensuring that all expected data is loaded, that data is transformed correctly according to design specifications, comparing record counts between the source data, the data loaded into the warehouse, and the rejected records, and validating the correct processing of ETL-generated fields such as surrogate keys. The quality check process also involves validating that the data types in the warehouse are as specified in the design and/or the data model. In our work, we have automated regression testing for ETL activities, which saves effort and resources while being more accurate and less prone to issues. The authors experimented with around 338 regression test cases; manual testing takes around 800 hours, whereas with RTA it takes around 88 hours, a reduction of 84%. This paper explains the process of automating the regression suite for data quality testing in data warehouse systems.
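As an illustration of two of the checks listed above (record-count reconciliation and data-type validation), here is a minimal Python sketch; it is not the RTA tool described in the paper, and the table name, expected schema, and counts are invented for the example:

# Minimal sketch of two regression-style ETL checks:
# (1) loaded + rejected record counts reconcile with the source count,
# (2) target column data types match the design specification.
import sqlite3

EXPECTED_SCHEMA = {"customer_key": "INTEGER", "customer_name": "TEXT", "created_at": "TEXT"}

def test_record_count_reconciliation(source_count, loaded_count, rejected_count):
    assert loaded_count + rejected_count == source_count, (
        f"{source_count} source rows, but {loaded_count} loaded + {rejected_count} rejected")

def test_target_data_types(conn, table, expected_schema):
    actual = {row[1]: row[2] for row in conn.execute(f"PRAGMA table_info({table})")}
    assert actual == expected_schema, f"schema mismatch: {actual}"

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE dim_customer (customer_key INTEGER, customer_name TEXT, created_at TEXT)")
    test_record_count_reconciliation(source_count=1000, loaded_count=990, rejected_count=10)
    test_target_data_types(conn, "dim_customer", EXPECTED_SCHEMA)
    print("regression checks passed")

Checks of this form can be rerun unchanged after every ETL change, which is what makes them suitable for an automated regression suite.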
ipcsit.net
The aim of this article is to present a partial proposal for a data warehouse testing methodology.
In this paper we discuss the data quality problems that are addressed during the data cleaning phase. Data cleaning is one of the important processes during ETL and is especially required when integrating heterogeneous data sources. This problem should be addressed together with schema-related data transformations. At the end we also discuss the current tools that support data cleaning.
Information Technology And Control
A data warehouse should be tested for data quality on a regular basis, preferably as part of each ETL cycle. That way, a certain degree of confidence in the data warehouse reports can be achieved, and potential data errors are more likely to be corrected in time. In this paper, we present an algorithm primarily intended for integration testing in the data warehouse environment, though it is more widely applicable. It is a generic, time-constrained, metadata-driven algorithm that compares large database tables in order to attain the best global overview of the data sets' differences in a given time frame. When there is not enough time available, the algorithm produces coarse, less precise estimates of all the differences; if allowed enough time, it pinpoints the exact differences. This paper presents the algorithm in detail, evaluates it on the data of a real project and on the TPC-H data set, and comments on its usability. The tests show that the algorithm outperforms the relational engine when the percentage of differences in the database is relatively small, which is typical for data warehouse ETL environments.
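The paper's algorithm is not reproduced here, but the general bucket-hashing idea behind time-constrained table comparison can be sketched as follows (a generic Python illustration; the fan-out, hashing scheme, and time budget are arbitrary choices, not the authors' design):

# Generic illustration: compare two tables by hashing buckets of rows;
# mismatched buckets are refined into smaller buckets until the time budget
# runs out, after which remaining differences are reported coarsely.
import hashlib, time

def bucket_hash(rows):
    h = hashlib.sha256()
    for key, value in sorted(rows):
        h.update(f"{key}|{value}".encode())
    return h.hexdigest()

def compare(table_a, table_b, time_budget_s=1.0, fanout=4):
    """table_a/table_b: dicts mapping primary key -> row value."""
    deadline = time.monotonic() + time_budget_s
    keys = sorted(set(table_a) | set(table_b))
    pending = [keys]                 # key ranges still suspected to differ
    coarse, exact = [], []
    while pending:
        chunk = pending.pop()
        rows_a = [(k, table_a.get(k)) for k in chunk]
        rows_b = [(k, table_b.get(k)) for k in chunk]
        if bucket_hash(rows_a) == bucket_hash(rows_b):
            continue                              # identical bucket, skip
        if len(chunk) == 1:
            exact.append(chunk[0])                # pinpointed a differing key
        elif time.monotonic() > deadline:
            coarse.append((chunk[0], chunk[-1]))  # out of time: coarse estimate
        else:
            step = max(1, len(chunk) // fanout)
            pending.extend(chunk[i:i + step] for i in range(0, len(chunk), step))
    return exact, coarse

a = {i: f"row-{i}" for i in range(1000)}
b = dict(a); b[42] = "changed"; b.pop(500)
print(compare(a, b))

The appeal of this shape is that the precision of the answer degrades gracefully with the time allowed, rather than failing outright when the budget is exhausted.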
Knowledge discovery is the process of extracting knowledge from a large amount of data. The quality of the knowledge generated by the knowledge discovery process greatly affects the quality of the resulting decisions. Existing data must be qualified and tested to ensure that knowledge discovery processes can produce knowledge or information that is useful and usable, since it feeds strategic decision making in an organization. A data warehouse is created by combining multiple operational databases and external data, and this process is very vulnerable to incomplete, inconsistent, and noisy data. Data mining provides mechanisms to remedy these deficiencies before the data is finally stored in the data warehouse. This research presents techniques to improve the quality of information in the data warehouse.
Lecture Notes on Data Engineering and Communications Technologies, 2021
In the area of knowledge science, the data warehouse plays an important role in data mining, data analytics, and decision making. The Extraction, Transformation and Load (ETL) methodology is widely utilized in developing a data warehouse. In today's competitive business world, mergers and acquisitions are quite common, and they require the extraction, transformation, and loading of huge amounts of structured data. This paper is concerned with the improvement of Dynamic ETL (D-ETL) by adding noise-free filtering and missing data handling methods. The existing approach is modified to use the standard extraction technique, with ETL performing progressive extraction within the overall extraction process. In this paper, we propose a new Efficient ETL technique, an updated version of D-ETL that adds an attribute selection and noise reduction technique.
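For illustration only (this is not the D-ETL implementation described in the paper), missing-data handling and noise filtering before loading might look like the following Python sketch, using median imputation and a median-absolute-deviation filter on a hypothetical numeric column:

# Generic preprocessing sketch: fill missing numeric values with the column
# median, then drop rows whose value lies far outside the typical range.
from statistics import median

def handle_missing(rows, column):
    values = [r[column] for r in rows if r[column] is not None]
    fill = median(values)
    for r in rows:
        if r[column] is None:
            r[column] = fill          # simple median imputation
    return rows

def filter_noise(rows, column, k=3.0):
    values = sorted(r[column] for r in rows)
    med = median(values)
    mad = median(abs(v - med) for v in values) or 1.0   # median absolute deviation
    return [r for r in rows if abs(r[column] - med) <= k * mad]

rows = [{"id": 1, "amount": 10.0}, {"id": 2, "amount": None},
        {"id": 3, "amount": 11.0}, {"id": 4, "amount": 9000.0}]
clean = filter_noise(handle_missing(rows, "amount"), "amount")
print(clean)        # the extreme row is filtered out before loading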
The Extract-Transform-Load (ETL) process in data warehousing involves extracting data from source databases, transforming it into a form suitable for research and analysis, and loading it into a data warehouse. ETL processes can use complex transformations involving sources and targets that use different schemas, databases, and technologies, which make ETL implementations fault-prone. In this paper, we present an approach for validating ETL processes using automated balancing tests that check for various types of discrepancies between the source and target data. We formalize three categories of properties, namely, completeness, consistency, and syntactic validity that must be checked during testing. Our approach uses the rules provided in the ETL specifications to generate source-to-target mappings, from which balancing test assertions are generated for each property. We evaluated the approach on a real-world health data warehouse project and revealed 11 previously undetected faults. Using mutation analysis, we demonstrated that our auto-generated assertions can detect faults in the data inside the target data warehouse.
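A minimal sketch of the three kinds of balancing assertions named above (completeness, consistency, syntactic validity) is shown here; the record layout and the cents-to-dollars transformation rule are hypothetical and are not taken from the paper:

# Illustrative balancing assertions written as plain Python over in-memory rows.
import re

source = [{"id": 1, "amount_cents": 1050, "email": "a@x.com"},
          {"id": 2, "amount_cents": 250,  "email": "b@x.com"}]
target = [{"id": 1, "amount_usd": 10.50, "email": "a@x.com"},
          {"id": 2, "amount_usd": 2.50,  "email": "b@x.com"}]

# Completeness: every source record appears exactly once in the target.
assert {r["id"] for r in source} == {r["id"] for r in target}

# Consistency: aggregate value is preserved under the declared transformation.
assert abs(sum(r["amount_cents"] for r in source) / 100
           - sum(r["amount_usd"] for r in target)) < 1e-9

# Syntactic validity: loaded values conform to the target format rules.
assert all(re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", r["email"]) for r in target)

print("balancing assertions passed")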
… ACM twelfth international workshop on Data …, 2009
Many data quality projects are integrated into data warehouse projects without enough time allocated for the data quality part, which creates a need for a quicker data quality process implementation that can easily be adopted as the first stage of a data warehouse implementation. We show that many data quality rules can be implemented in a similar way and can thus be generated from metadata tables that store information about the rules. These generated rules are then used to check data in designated tables and to mark erroneous records, or to perform certain updates of invalid data. We also store information about rule violations in order to enable analysis of such data, which can give significant insight into the source systems. The entire data quality process is integrated into the ETL process in order to make loading the data warehouse as automated, as correct, and as quick as possible. Only a small number of records are left for manual inspection and reprocessing.
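A minimal sketch of the metadata-driven idea follows, assuming a hypothetical dq_rules table and SQLite for brevity (not the authors' implementation); each rule row is turned into an update that marks violating records and a statement that logs the violation count:

# Metadata-driven data-quality rules: table, column and rule names are illustrative.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stg_customer (id INTEGER, email TEXT, age INTEGER, dq_error TEXT);
    INSERT INTO stg_customer VALUES (1, 'a@x.com', 30, NULL),
                                    (2, NULL,      25, NULL),
                                    (3, 'b@x.com', -4, NULL);
    CREATE TABLE dq_rules (rule_id INTEGER, target_table TEXT, condition TEXT, message TEXT);
    INSERT INTO dq_rules VALUES (1, 'stg_customer', 'email IS NULL', 'missing email'),
                                (2, 'stg_customer', 'age < 0',       'negative age');
    CREATE TABLE dq_violations (rule_id INTEGER, target_table TEXT, violating_rows INTEGER);
""")

for rule_id, table, condition, message in conn.execute("SELECT * FROM dq_rules").fetchall():
    # Mark erroneous records so the load can route them to manual inspection.
    conn.execute(f"UPDATE {table} SET dq_error = ? WHERE {condition}", (message,))
    # Store rule-violation statistics for later analysis of the source systems.
    count = conn.execute(f"SELECT COUNT(*) FROM {table} WHERE {condition}").fetchone()[0]
    conn.execute("INSERT INTO dq_violations VALUES (?, ?, ?)", (rule_id, table, count))

print(conn.execute("SELECT id, dq_error FROM stg_customer").fetchall())
print(conn.execute("SELECT * FROM dq_violations").fetchall())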
Procedia Computer Science
The accuracy and relevance of Business Intelligence & Analytics (BI&A) rely on the ability to bring high data quality to the data warehouse from both internal and external sources using the ETL process. The latter is complex and time-consuming as it manages data with heterogeneous content and diverse quality problems. Ensuring data quality requires tracking quality defects along the ETL process. In this paper, we present the main ETL quality characteristics. We provide an overview of the existing ETL process data quality approaches. We also present a comparative study of some commercial ETL tools to show how much these tools consider data quality dimensions. To illustrate our study, we carry out experiments using an ETL dedicated solution (Talend Data Integration) and a data quality dedicated solution (Talend Data Quality). Based on our study, we identify and discuss quality challenges to be addressed in our future research.
Commonly, DW development methodologies pay little attention to the problem of data quality and completeness. One of the common mistakes made during the planning of a data warehousing project is to assume that data quality will be addressed during testing. In addition to reviewing data warehouse development methodologies, this paper introduces a new approach to data warehouse development based on integrating data quality into the whole data warehouse development cycle, denoted integrated requirement analysis for designing data warehouse (IRADAH). This paper shows that data quality is not only an integral part of a data warehouse project but remains a sustained and ongoing activity.
International Journal of …, 2010
Data quality is a critical factor for the success of data warehousing projects. If data is of inadequate quality, then the knowledge workers who query the data warehouse and the decision makers who receive the information cannot trust the results. In order to obtain clean and reliable data, it is imperative to focus on data quality. While many data warehouse projects do take data quality into consideration, it is often treated as a delayed afterthought. Even QA after ETL is not good enough; the quality process needs to be incorporated into the ETL process itself. Data quality has to be maintained for individual records, and even small pieces of information, to ensure the accuracy of the complete database. Data quality is an increasingly serious issue for organizations large and small, and it is central to all data integration initiatives. Before data can be used effectively in a data warehouse, or in customer relationship management, enterprise resource planning, or business analytics applications, it needs to be analyzed and cleansed. To sustain high-quality data, organizations need to apply ongoing data cleansing processes and procedures and to monitor and track data quality levels over time. Otherwise, poor data quality will lead to increased costs, breakdowns in the supply chain, and inferior customer relationship management. Defective data also hampers business decision making and efforts to meet regulatory compliance responsibilities. The key to successfully addressing data quality is to get business professionals centrally involved in the process. We have analyzed the possible causes of data quality issues through an exhaustive survey and discussions with data warehouse groups working in various organizations in India and abroad. We expect this paper to help warehouse modelers and designers analyse and implement quality warehouse and business intelligence applications.
IOSR Journal of Engineering, 2013
To ensure data quality for an enterprise data repository, various data quality tools focusing on this issue are used. The scope of these tools is moving from specific applications to a more global perspective, so as to ensure data quality at every level. A more organized framework is needed to help managers choose these tools so that data repositories or data warehouses can be maintained in an efficient way. Data quality tools are used in data warehousing to prepare the data and ensure that clean data populates the warehouse, thus enhancing the usability of the warehouse. This research focuses on the various data quality tools which have been used and implemented successfully in preparing the examination data of the University of Kashmir for the production of results. This paper also proposes a mapping of data quality tools to the processes involved in efficient data migration to a data warehouse.
2004
Today’s informational entanglement makes it crucial to enforce adequate management systems. Data warehousing systems appeared with the specific mission of providing adequate contents for data analysis, ensuring gathering, processing and maintenance of all data elements thought valuable. Data analysis in general, data mining and on-line analytical processing facilities, in particular, can achieve better, sharper results, because data quality is finally taken into account. The available elements must be submitted to an intensive processing before being able to integrate them into the data warehouse. Each data warehousing system embraces extraction, transformation and loading processes which are in charge of all the processing concerning the data preparation towards its integration into the data warehouse. Usually, data is scoped at several stages, inspecting data and schema issues and filtering all those elements that do not comply with the established rules. This paper proposes an ag...
Data warehouse (DW) testing is a very critical stage in DW development because decisions are made based on the information resulting from the DW, so testing the quality of the resulting information supports the trustworthiness of the DW system. A number of approaches have been proposed to describe how the testing process should take place in the DW environment. In this paper we briefly present these testing approaches, and then use a proposed matrix that structures DW testing routines to evaluate and compare them. An analysis of the comparison matrix then highlights the weak points in the available DW testing approaches. Finally, we point out the requirements for achieving a homogeneous DW testing framework and conclude our work.
International Journal of Recent Technology and Engineering, 2019
Data quality testing, database testing, and ETL testing are all different techniques for testing a data warehouse environment. Testing the data has become very important, as it must be guaranteed that the data is accurate for further manipulation and decision making. Many approaches and tools have emerged that support and define the test cases to be used, their functionality, and whether they can be automated. The most trending approach has been the automation of data warehouse testing using tools. The tools started by supporting only the automated execution of scripts, helping developers write a test case once and run it multiple times; they then evolved to automate the creation of the testing scripts and to offer their services as complete applications that support both the creation and the execution of test cases, claiming that users can work without deep expertise or technical skills, simply as end users of the tool's GUI. The banking sector differs from other industries: a data warehouse in banking collects data from multiple sources and branches, with differing formats and quality, and this data must be transformed, loaded into the data warehouse, and classified into data marts to be used in dashboards and projects that depend on high-quality, accurate data for decision making and prediction. In this paper we propose a strategy for data warehouse testing that automates all the test cases needed in a banking environment.
International Journal of Innovative Technology and Exploring Engineering, 2019
Data quality (DQ) is as old as data itself. In the last few years it has become clear that DQ cannot be ignored during the construction and use of a data warehouse (DW), as it is a major and critical issue for the knowledge experts, workers, and decision makers who test and query the data, and for organizational trust and customer satisfaction. Low data quality leads to high costs, losses in the supply chain, and degraded customer relationship management. Hence, before data is used in a DW, CRM (Customer Relationship Management), ERP (Enterprise Resource Planning), or business analytics application, it needs to be analyzed and cleansed to ensure its quality. In this paper, we identify problems related to dirty data and try to solve them.