Using Materialized View in Data Warehousing
Environment
- A case study
Infosys Technologies Ltd.
By
Dhruba Satya Mukherjee
Doc No. Version Date Author(s) Reviewer(s)
1 0.00a May 30h 05 Infosys
Using Materialized View in Data warehousing Env – a case [Link] Page 1 of 7
Acknowledgements
I am grateful to my project managers Manoranjan Nayak, Nayan Kumar Puthal
and Suman Saha under whose leadership we are executing the project
ABNBASLT. Last, but definitely not the least, thanks to our team mates, who
provided many of the inputs to create this document.
Thank You,
Dhruba Satya Mukherjee
Using Materialized View in Data warehousing Env – a case [Link] Page 2 of 7
TABLE OF CONTENTS
DECLARATION..............................................................................................................................................2
ACKNOWLEDGEMENTS...........................................................................................................................3
INTRODUCTION..........................................................................................................................................4
CASE OVERVIEW........................................................................................................................................5
SOLUTIONS...................................................................................................................................................5
FINAL RECOMMENDATION....................................................................................................................6
REFERENCES.................................................................................................................................................8
Using Materialized View in Data warehousing Env – a case [Link] Page 3 of 7
Introduction
This document contains a detailed description of the usage of Materialized View in Data
Warehousing environment.
Case Overview:
ABN AMRO Services Company North America (NA), is trying to comply with Basel II
specification and guidelines. ABN AMRO deals with various types of products and hence
needs to capture various kinds of data coming from different operational source systems.
This implies that data from divergent operational source systems should be brought into
one central repository/DW (Called FRW (Financial Reporting Warehouse) at ABN). The
data should be cleaned and massaged to make sure that it is consistent with data coming
from other source systems. The data mainly comes as flat files and it needs to be loaded
first to a staging area (ODS). From ODS the data will be loaded to the
Warehouse(ODW). All these ETL activities are to be accomplished by the help of an
ETL tool i.e Informatica PowerCenter 7.1
The ODW basically is a set of relational tables. Each table is capable housing different
types of data. For example one table can contain balance data, some other table may
contain income related data or Customer data .
To capture balance related data they need to do some avarage calculation and the base
table is a huge one (containing more than 120 million rows). It is surely going to affect
the performance of the ETL code.
Solutions:
Alternative 1:
The average calculation can be done in the ETL code itself by adding an Aggregator
Transformation.
Cons:
1. The ODS to ODW ETL mapping is already bulky . Therefore adding one more
Aggregator transformation is not going to help much.
2. The entire ETL load will wait for the Aggregator to complete its operation. The
main load window will be affected.
Pros:
3. Easy to implement
4. Easy to understand and maintain
Using Materialized View in Data warehousing Env – a case [Link] Page 4 of 7
Alternative 2:
Creating a Materialized View on the base table using group by clause and refreshing the
materialized view in a separate flow (outside the main load window) ON DEMAND
refresh.
Pros:
1. The performance of the main ETL flow will not get affected.
2. Refreshing the Materialized view can be controlled separately without affecting
the main load window.
3. Once the Materialized View gets refreshed , doing a lookup on the materialized
view will be much easier than calculating the average from the base table.
Basically the Materialized View will store the average values of the base table.
4. By creating partitions on the base table the Materialized View refresh can be
made faster.
Cons:
1. A separate stored procedure is required to refresh the Materialized View. Other
wise the Materialized View refresh process needs to be automated.
2. A Materialized View occupies space therefore to maintain a Materialized View
we need space in the database.
Alternative 3:
Creating a Materialized View on the base table using group by clause and refreshing the
materialized view in the same flow (outside the main load window) ON COMMIT
refresh.
Cons:
1. The performance of the main ETL flow will be affected. The refresh will be done
immediately after commit.
2. Refreshing the Materialized view can not be controlled separately without
affecting the main load window.
3. A Materialized View occupies space therefore to maintain a Materialized View
we need space in the database.
Cons:
Using Materialized View in Data warehousing Env – a case [Link] Page 5 of 7
4. Once the Materialized View gets refreshed , doing a lookup on the materialized
view will be much easier than calculating the average from the base table.
Basically the Materialized View will store the average values of the base table.
5. No external procedure is required to refresh the Materialized View.
Alternative 3:
The average calculation can be done in the ETL code by adding an Aggregator
Transformation. But multiple partitions and sub partitions will be created on the base
table so that we can take advantage of Partition Prunning.
Pros:
1. Easy to implement
2. Performace will be faster because the data will be searched in a single partition
only.
3. Easy to understand and maintain
Cons:
4. The ODS to ODW ETL mapping is already bulky. Therefore adding one more
Aggregator transformation may affect the performance.
5. The entire ETL load will wait for the Aggregator to complete its operation. The
main load window will be affected.
Final Recommendation:
We went for the alternative 2: as this turned out to be performance wise faster than the
other two. Based on the test result a decision was taken to follow the 2 nd approach i.e
using a Materialized View where the base table has been partitioned.
Details of the Solution
A separate Materialized View is created for each Source System on the same base table.
The Refresh Mechanism was made ON DEMAND and a PL/SQL procedure was created
to refresh the Materialized View. Using the REFRESH () method of the DBMS_MVIEW
package the Materialized view is refreshed.
Base table Definition
CREATE TABLE AVG_BAL_TAB_ODS
(
SRC_SYSTEM_ID VARCHAR2(6) NULL,
SRC_TABLE_NAME VARCHAR2(30) NULL,
PK_VALUE1 VARCHAR2(30) NULL,
Using Materialized View in Data warehousing Env – a case [Link] Page 6 of 7
PK_VALUE2 VARCHAR2(30) NULL,
PK_VALUE3 VARCHAR2(30) NULL,
PK_VALUE4 VARCHAR2(30) NULL,
PK_VALUE5 VARCHAR2(30) NULL,
YEAR NUMBER(4) NULL,
MONTH NUMBER(2) NULL,
DAY NUMBER(2) NULL,
PROCESS_DATE DATE NULL,
AMT_FIELD_1 NUMBER(15,3) NULL,
AMT_FIELD_2 NUMBER(15,3) NULL,
AMT_FIELD_3 NUMBER(15,3) NULL,
AMT_FIELD_4 NUMBER(15,3) NULL,
DLY_MTLY_FLAG CHAR(1) NULL
)
PARTITION BY RANGE (YEAR, MONTH)
SUBPARTITION BY LIST (SRC_SYSTEM_ID)
Materialized View Structure
CREATE MATERIALIZED VIEW AVG_BAL_TAB_ODS_<Source System Name>
BUILD DEFFERED
REFRESH COMPLETE ON DEMAND
AS
SELECT
TEMP.SRC_SYSTEM_ID,
TEMP.SRC_TABLE_NAME,
TEMP.PK_VALUE1,
TEMP.PK_VALUE2,
TEMP.PK_VALUE3,
TEMP.PK_VALUE4,
TEMP.PK_VALUE5,
[Link],
[Link],
SUM(TEMP.AMT_FIELD_1) TOT_AMT_1,
SUM(TEMP.AMT_FIELD_2) TOT_AMT_2,
SUM(TEMP.AMT_FIELD_3) TOT_AMT_3,
SUM(TEMP.AMT_FIELD_4) TOT_AMT_4
FROM
AVG_BAL_TAB_ODS TEMP
WHERE TEMP.SRC_SYSTEM_ID='<Source System ID>'
GROUP BY
TEMP.SRC_SYSTEM_ID, TEMP.SRC_TABLE_NAME, TEMP.PK_VALUE1,
TEMP.PK_VALUE2, TEMP.PK_VALUE3,TEMP.PK_VALUE4, TEMP.PK_VALUE5,
[Link], [Link];
In ODW instead of doing the lookup on the AVG_BAL_TAB_ODS table the lookup is
done on the Materialized View.
References:
Oracle 9i SQL Reference
Using Materialized View in Data warehousing Env – a case [Link] Page 7 of 7