Unit 3
Data Warehouse Architecture
Generally a data warehouses adopts three-tier architecture. Following are
the three tiers of the data warehouse architecture.
Bottom Tier - The bottom tier of the architecture is the data
warehouse database server. It is the relational database system. We
use the back end tools and utilities to feed data into the bottom tier.
These back end tools and utilities perform the Extract, Clean, Load,
and refresh functions.
Middle Tier - In the middle tier, we have the OLAP Server that can be
implemented in either of the following ways.
o By Relational OLAP (ROLAP), which is an extended relational
database management system. The ROLAP maps the operations
on multidimensional data to standard relational operations.
o By Multidimensional OLAP (MOLAP) model, which directly
implements the multidimensional data and operations.
Top-Tier - This tier is the front-end client layer. This layer holds the
query tools and reporting tools, analysis tools and data mining tools.
Data Warehouse Models
From the perspective of data warehouse architecture, we have the following
data warehouse models:
1
Virtual Warehouse
Data mart
Enterprise Warehouse
Virtual Warehouse
The view over an operational data warehouse is known as a virtual
warehouse. It is easy to build a virtual warehouse. Building a virtual
warehouse requires excess capacity on operational database servers.
Data Mart
Data mart contains a subset of organization-wide data. This subset of data is
valuable to specific groups of an organization. In other words, we can claim
that data marts contain data specific to a particular group. For example, the
marketing data mart may contain data related to items, customers, and
sales. Data marts are confined to subjects.
Enterprise Warehouse
An enterprise warehouse collects all the information and the subjects
spanning an entire organization. It provides us enterprise-wide data
integration. The data is integrated from operational systems and external
information providers. This information can vary from a few gigabytes to
hundreds of gigabytes, terabytes or beyond.
Warehouse Manager
A warehouse manager is responsible for the warehouse management
process. It consists of third-party system software, C programs, and shell
scripts. The size and complexity of warehouse managers varies between
specific solutions. A warehouse manager includes the following:
The controlling process
Stored procedures or C with SQL
Backup/Recovery tool
SQL Scripts
2
Operations Performed by Warehouse Manager
A warehouse manager analyzes the data to perform consistency and
referential integrity checks.
Creates indexes, business views, partition views against the base data.
Generates new aggregations and updates existing aggregations.
Generates normalizations.
Transforms and merges the source data into the published data
warehouse.
Backup the data in the data warehouse.
Online Analytical Processing (OLAP)
Online Analytical Processing Server (OLAP) is based on the multidimensional
data model. It allows managers, and analysts to get an insight of the
information through fast, consistent, and interactive access to information.
OLAP Operations
Since OLAP servers are based on multidimensional view of data, we will
discuss OLAP operations in multidimensional data. Here is the list of OLAP
operations:
Roll-up
Drill-down
Slice and dice
Pivot (rotate)
Roll-up: Roll-up performs aggregation on a data cube in any of the following
ways: By climbing up a concept hierarchy for a dimension or by dimension
reduction. The following diagram illustrates how roll-up works.
3
Roll-up is performed by climbing up a concept hierarchy for the dimension
location. Initially the concept hierarchy was "street < city < province <
country". On rolling up, the data is aggregated by ascending the location
hierarchy from the level of city to the level of country. The data is grouped
into cities rather than countries. When roll-up is performed, one or more
dimensions from the data cube are removed.
Drill-down: Drill-down is the reverse operation of roll-up. It is performed by
either of the following ways: By stepping down a concept hierarchy for a
dimension or by introducing a new dimension. The following diagram
illustrates how drill-down works:
4
Drill-down is performed by stepping down a concept hierarchy for the
dimension time. Initially the concept hierarchy was "day < month < quarter
< year." On drilling down, the time dimension is descended from the level of
quarter to the level of month. When drill-down is performed, one or more
dimensions from the data cube are added. It navigates the data from less
detailed data to highly detailed data.
Slice: The slice operation selects one particular dimension from a given cube
and provides a new sub-cube. Consider the following diagram that shows
how slice works.
5
Here Slice is performed for the dimension "time" using the criterion time =
"Q1". It will form a new sub-cube by selecting one or more dimensions.
Dice: Dice selects two or more dimensions from a given cube and provides a
new sub-cube. Consider the following diagram that shows the dice operation.
The dice operation on the cube based on the following selection criteria
involves three dimensions.
(location = "Toronto" or "Vancouver")
(time = "Q1" or "Q2")
(item =" Mobile" or "Modem")
Pivot: The pivot operation is also known as rotation. It rotates the data axes
in view in order to provide an alternative presentation of data. Consider the
following diagram that shows the pivot operation.
6
OLAP vs OLTP
OLAP OLTP
Involves historical processing of Involves day-to-day processing.
information.
OLAP systems are used by OLTP systems are used by clerks,
knowledge workers such as DBAs, or database professionals.
executives, managers and analysts.
Useful in analyzing the business. Useful in running the business.
Based on Star Schema, Snowflake, Based on Entity Relationship Model.
Schema and Fact Constellation
Schema.
Provides summarized and Provides primitive and highly
consolidated data. detailed data.
Highly flexible. Provides high performance.
Types of OLAP Servers
We have four types of OLAP servers:
Relational OLAP (ROLAP)
Multidimensional OLAP (MOLAP)
Hybrid OLAP (HOLAP)
Relational OLAP:- Relational OLAP servers are placed between relational
back-end server and client front-end tools. To store and manage the
7
warehouse data, the relational OLAP uses relational or extended-relational
DBMS. ROLAP includes the following:
Implementation of aggregation navigation logic
Optimization for each DBMS back-end
Additional tools and services
ROLAP includes the following components:
Database server
ROLAP server
Front-end tool.
Advantages
ROLAP servers can be easily used with existing RDBMS.
Data can be stored efficiently, since no zero facts can be stored.
ROLAP tools do not use pre-calculated data cubes.
Disadvantages
Poor query performance.
Some limitations of scalability depending on the technology
architecture that is utilized.
Multidimensional OLAP: Multidimensional OLAP (MOLAP) uses array-based
multidimensional storage engines for multidimensional views of data. With
multidimensional data stores, the storage utilization may be low if the data
set is sparse. Therefore, many MOLAP servers use two levels of data storage
representation to handle dense and sparse data-sets. MOLAP includes the
following components:
8
Database server.
MOLAP server.
Front-end tool.
Advantages
MOLAP allows fastest indexing to the pre-computed summarized data.
Easier to use, therefore MOLAP is suitable for inexperienced users.
Disadvantages
MOLAP are not capable of containing detailed data.
The storage utilization may be low if the data set is sparse.
Hybrid OLAP (HOLAP)
Hybrid OLAP is a combination of both ROLAP and MOLAP. It offers higher
scalability of ROLAP and faster computation of MOLAP. HOLAP servers allows
to store the large data volumes of detailed information. The aggregations are
stored separately in MOLAP store.
MOLAP vs. ROLAP
MOLAP ROLAP
Information retrieval is fast. Information retrieval is
comparatively slow.
Uses sparse array to store Uses relational table.
data-sets.
MOLAP is best suited for ROLAP is best suited for
inexperienced users, since it experienced users.
9
is very easy to use.
Maintains a separate It may not require space other
database for data cubes. than available in the Data
warehouse.
DBMS facility is weak. DBMS facility is strong.
10