Data warehouse (DW) testing is a critical stage of DW development, because decisions are made based on the information the DW produces; verifying the quality of that information therefore supports the trustworthiness of the DW system. A number of approaches have been proposed to describe how testing should take place in the DW environment. In this paper we briefly present these testing approaches, and then use a proposed matrix that structures DW testing routines to evaluate and compare them. An analysis of the comparison matrix highlights the weaknesses of the available DW testing approaches. Finally, we point out the requirements for achieving a homogeneous DW testing framework, and conclude our work.
ipcsit.net
The aim of this article is to present a partial proposal for a data warehouse testing methodology.
International Journal of Recent Technology and Engineering, 2019
Data quality testing, database testing, and ETL testing are distinct techniques for testing a data warehouse environment. Testing the data has become very important, as the data must be guaranteed accurate before it is used for further manipulation and decision making. Many approaches and tools have emerged that support and define the test cases to be used, their functionality, and whether they can be automated. The most prominent trend has been automating data warehouse testing with tools. Early tools only automated the execution of scripts, letting developers write a test case once and run it multiple times; later tools also automated the creation of the test scripts and evolved into complete applications supporting both the creation and the execution of test cases, claiming that users need no particular expertise or technical skill and can work entirely through the tool's GUI. The banking sector differs from other industries in that its data warehouse collects data from multiple sources and multiple branches, with differing data formats and quality, which must then be transformed, loaded into the data warehouse, and classified into data marts used by dashboards and projects that depend on high-quality, accurate data for decision making and prediction. In this paper we propose a data warehouse testing strategy that automates all the test cases needed in a banking environment.
In the past few years, the data warehouse (DW) has regained experts' interest due to the paradigm shift from data storage to data analysis. During the development of DWs, data passes through a number of transformations and is staged in multiple storage areas, which might lead to data corruption and/or manipulation. Hence, testing DWs is a vital stage in the DW development life cycle. In this paper, we present a DW testing approach that is adjustable to fit multiple DW architectures, and demonstrate its applicability on three case studies to outline the flexibility and generality of the proposed approach.
2012
Today, every software development, enhancement, or maintenance project includes some quality assurance activities. Quality assurance attempts to prevent defects by concentrating on the process of producing the product, rather than on detecting defects after the product is built. Regression testing means rerunning test cases from existing test suites to build confidence that software changes have no unintended side effects. A data warehouse obtains its data from a number of operational source systems, which can be relational tables, an ERP package, etc. The data from these sources is converted and loaded into the data warehouse in a suitable form; this process is called Extraction, Transformation, and Loading (ETL). In addition to the target database, there is another database that stores the metadata, called the metadata repository. This database contains data about data: descriptions of the source data, the target data, and how the source data has been transformed into target data. In data warehouse migration or enhancement projects, the data quality checking process includes ensuring that all expected data is loaded, that data is transformed correctly according to design specifications, comparing record counts between source data loaded into the warehouse and rejected records, and validating correct processing of ETL-generated fields such as surrogate keys. The quality check process also involves validating that the data types in the warehouse are as specified in the design and/or the data model. In our work, we have automated regression testing for ETL activities, which saves effort and resources while being more accurate and less prone to error. In our experiments with around 338 regression test cases, manual testing takes around 800 hours, while the regression test automation (RTA) takes around 88 hours, a reduction of about 89%. This paper explains the process of automating the regression suite for data quality testing in data warehouse systems.
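The data quality checks named in the abstract above (record counts, surrogate-key processing) can be illustrated with a small sketch. This is not the paper's tool; the table and column names (`src_orders`, `dw_orders`, `order_sk`) and the in-memory SQLite setup are assumptions made for the example.

```python
import sqlite3

# Hypothetical sketch of two automated ETL regression checks:
#   1. row counts match between source and warehouse target
#   2. the ETL-generated surrogate key is populated and unique
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE src_orders (order_id INTEGER, amount REAL);
    CREATE TABLE dw_orders  (order_sk INTEGER PRIMARY KEY,
                             order_id INTEGER, amount REAL);
    INSERT INTO src_orders VALUES (1, 10.0), (2, 25.5), (3, 7.25);
    -- a toy ETL load that assigns surrogate keys
    INSERT INTO dw_orders SELECT rowid, order_id, amount FROM src_orders;
""")

def check_counts(con):
    # all expected rows were loaded
    src = con.execute("SELECT COUNT(*) FROM src_orders").fetchone()[0]
    tgt = con.execute("SELECT COUNT(*) FROM dw_orders").fetchone()[0]
    return src == tgt

def check_surrogate_keys(con):
    # surrogate keys are non-null and unique
    dup = con.execute("""SELECT COUNT(*) - COUNT(DISTINCT order_sk)
                         FROM dw_orders""").fetchone()[0]
    nulls = con.execute("""SELECT COUNT(*) FROM dw_orders
                           WHERE order_sk IS NULL""").fetchone()[0]
    return dup == 0 and nulls == 0

print(check_counts(con), check_surrogate_keys(con))  # True True
```

Automating such checks as a suite is what makes rerunning them on every ETL change cheap, which is the point of the regression approach described above.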
2015
Data warehouse (DW) projects are undertakings that require integration of disparate sources of data, a well-defined mapping of the source data to the reconciled data, and effective Extract, Transform, and Load (ETL) processes. Owing to the complexity of data warehouse projects, great emphasis must be placed on an agile-based approach with properly developed and executed test plans throughout the various stages of designing, developing, and implementing the data warehouse, to mitigate budget overruns, missed deadlines, low customer satisfaction, and outright project failures. Yet there are often attempts to test the data warehouse exactly like traditional back-end databases and legacy applications, or to downplay the role of quality assurance (QA) and testing, which only serve to fuel frustration with, and mistrust of, data warehouse and business intelligence (BI) systems. In spite of this, there are a number of steps that can be taken to ensure DW/BI solutions are successful, highly trusted, and stable. In particular, adopting a Data Vault (DV)-based Enterprise Data Warehouse (EDW) can simplify and enhance various aspects of testing, and curtail delays common in non-DV-based DW projects. A major area of focus in this research is raw DV loads from source systems, keeping transformations to a minimum in the ETL process which loads the DV from the source. Certain load errors, classified as permissible errors and enforced by business rules, are kept in the Data Vault until correct values are supplied. Major transformation activities are pushed further downstream to the next ETL process, which loads and refreshes the Data Mart (DM) from the Data Vault.
Testing ETL (Extract, Transform, and Load) procedures is an important and vital phase of testing a data warehouse (DW); it is arguably the most complex phase, because it directly affects the quality of data. It has been shown that automated testing is a valuable tool for improving the quality of DW systems, while manual testing is time-consuming and error-prone, so automating tests improves data quality (DQ) in less time and at lower cost. In this paper the authors propose a testing framework to automate data quality testing at the ETL stage. Datasets of different volumes (ranging from 10,000 to 50,000 records) are used to evaluate the effectiveness of the proposed automated ETL testing. The experimental results show that the proposed testing framework is effective in detecting errors across the different data volumes.
Information Technology And Control
A data warehouse should be tested for data quality on a regular basis, preferably as part of each ETL cycle. That way, a certain degree of confidence in the data warehouse reports can be achieved, and potential data errors are more likely to be corrected in a timely manner. In this paper, we present an algorithm primarily intended for integration testing in the data warehouse environment, though it is more widely applicable. It is a generic, time-constrained, metadata-driven algorithm that compares large database tables in order to attain the best global overview of the data sets' differences within a given time frame. When there is not enough time available, the algorithm is capable of producing coarse, less precise estimates of all data set differences; given enough time, the algorithm will pinpoint exact differences. This paper presents the algorithm in detail, evaluates it on the data of a real project and on the TPC-H data set, and comments on its usability. The tests show that the algorithm outperforms the relational engine when the percentage of differences in the database is relatively small, which is typical for data warehouse ETL environments.
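The coarse-to-fine idea behind such a comparison can be sketched in a few lines. This is an illustrative toy, not the paper's algorithm: rows are grouped into buckets by key, bucket-level digests are compared first (the cheap, coarse pass), and only mismatched buckets are examined row by row (the precise pass). The bucket count and the tuple data are invented for the example.

```python
import hashlib

def digest(rows):
    # order-independent digest of a bucket of rows
    h = hashlib.sha256()
    for r in sorted(rows):
        h.update(repr(r).encode())
    return h.hexdigest()

def compare(a, b, key=lambda r: r[0], buckets=4):
    # coarse pass: hash-partition both tables and compare bucket digests;
    # fine pass: list exact differing rows only for mismatched buckets
    grp_a = {i: [] for i in range(buckets)}
    grp_b = {i: [] for i in range(buckets)}
    for r in a:
        grp_a[hash(key(r)) % buckets].append(r)
    for r in b:
        grp_b[hash(key(r)) % buckets].append(r)
    diffs = []
    for i in range(buckets):
        if digest(grp_a[i]) != digest(grp_b[i]):
            diffs += sorted(set(grp_a[i]) ^ set(grp_b[i]))
    return diffs

src = [(1, "a"), (2, "b"), (3, "c")]
tgt = [(1, "a"), (2, "X"), (3, "c")]
print(compare(src, tgt))  # [(2, 'X'), (2, 'b')]
```

A time-constrained variant, as the abstract describes, would stop after the coarse pass when the budget is exhausted and report only which buckets differ, trading precision for speed.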
Verification and validation are two important processes in the software system lifecycle. Despite the importance of these processes, a recent survey has shown that testing of data warehouse systems is currently neglected. The survey participants named, among other reasons, modest budgets and the lack of appropriate tools as potential causes of this circumstance. In order to address these reasons, the paper at hand presents an evaluation of unit testing tools suitable for data warehouse testing. To address the modest-budget problem, the range of evaluation candidates is limited to no-charge, open source solutions, namely AnyDbTest, BI.Quality, DbFit, DbUnit, NDbUnit, SQLUnit, TSQLUnit, and utPLSQL. The evaluation follows the IEEE 14102-2010 guidelines for the evaluation and selection of computer-aided software engineering tools, in order to guarantee benefits from a practitioners' as well as a scientific point of view. It results in a detailed overview of how the testing tools meet criteria such as ...
The Extract-Transform-Load (ETL) process in data warehousing involves extracting data from source databases, transforming it into a form suitable for research and analysis, and loading it into a data warehouse. ETL processes can use complex transformations involving sources and targets that use different schemas, databases, and technologies, which make ETL implementations fault-prone. In this paper, we present an approach for validating ETL processes using automated balancing tests that check for various types of discrepancies between the source and target data. We formalize three categories of properties, namely, completeness, consistency, and syntactic validity that must be checked during testing. Our approach uses the rules provided in the ETL specifications to generate source-to-target mappings, from which balancing test assertions are generated for each property. We evaluated the approach on a real-world health data warehouse project and revealed 11 previously undetected faults. Using mutation analysis, we demonstrated that our auto-generated assertions can detect faults in the data inside the target data warehouse.
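The three property categories named in the abstract above can be made concrete with a small sketch. The column names and the cents-to-dollars transformation rule are invented for illustration; the paper's actual assertions are generated from ETL specifications, which this toy does not do.

```python
# Toy source and target rows; an assumed ETL rule converts cents to dollars.
source = [{"id": 1, "amount_cents": 1000},
          {"id": 2, "amount_cents": 2550}]
target = [{"id": 1, "amount_usd": 10.00},
          {"id": 2, "amount_usd": 25.50}]

def complete(src, tgt):
    # Completeness: every source record reaches the target, and vice versa.
    return {r["id"] for r in src} == {r["id"] for r in tgt}

def consistent(src, tgt):
    # Consistency: target values agree with the specified transformation.
    tgt_by_id = {r["id"]: r for r in tgt}
    return all(abs(tgt_by_id[r["id"]]["amount_usd"]
                   - r["amount_cents"] / 100) < 1e-9 for r in src)

def valid(tgt):
    # Syntactic validity: target values conform to the target schema.
    return all(isinstance(r["amount_usd"], float) and r["amount_usd"] >= 0
               for r in tgt)

print(complete(source, target), consistent(source, target), valid(target))
# True True True
```

In a real balancing test these checks would run as SQL against the source databases and the warehouse, one generated assertion per mapped attribute, rather than over in-memory rows.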
Journal of Statistics and Management Systems, 2017
Proceedings of the 28th International Conference on Software Engineering and Knowledge Engineering, 2016
2017
International Journal of Computer and Electrical Engineering, 2009
IEE Proceedings - Software, 2002