Data Access
Lecture 2
Conducted by
Ms. Akila Brahmana
Department of ICT
Faculty of Technology
University of Ruhuna
Objectives
Describe the importance of defining the access strategy early
in the development of the warehouse.
Identify the different categories of data access tools.
Identify the different database models that support OLAP
query tools.
Identify OLAP query techniques.
Identify non-OLAP tools for data access.
Identify factors influencing query tool choice.
Data Warehouse vs. Heterogeneous DBMS
Traditional heterogeneous DB integration:
Build wrappers/mediators on top of heterogeneous databases
Query driven approach
When a query is posed to a client site, a meta-dictionary is used to
translate the query into queries appropriate for individual
heterogeneous sites involved, and the results are integrated into a
global answer set
Complex information filtering, compete for resources
Data warehouse: update-driven, high performance
Information from heterogeneous sources is integrated in
advance and stored in warehouses for direct query and analysis
Data Warehouse vs. Operational DBMS
OLTP (on-line transaction processing)
Major task of traditional relational DBMS
Day-to-day operations: purchasing, inventory, banking, manufacturing, payroll, registration,
accounting, etc.
OLAP (on-line analytical processing)
Major task of data warehouse system
Data analysis and decision making
Distinct features (OLTP vs. OLAP):
User and system orientation: customer vs. market
Data contents: current, detailed vs. historical, consolidated
Database design: ER + application vs. star + subject
View: current, local vs. evolutionary, integrated
Access patterns: update vs. read-only but complex queries
OLTP vs. OLAP
OLTP OLAP
Users clerk, IT professional knowledge worker
Function day to day operations decision support
DB design application-oriented subject-oriented
Data current, up-to-date detailed, flat relational historical, summarized,
isolated multidimensional integrated,
consolidated
Usage repetitive ad-hoc
Access read/write lots of scans
index/hash on prim. key
Unit of work short, simple transaction complex query
# records accessed tens millions
#users thousands Hundreds
DB size 100MB-GB 100GB-TB
Metric transaction throughput query throughput, response
Data Access Tools
OLAP tools
Multidimensional data access
Relational data access
Standard reporting tools
Relational data access primarily
Multidimensional access possibly
Data mining tools
OLAP and DSS Compared
OLAP - used for multidimensional analysis
DSS - provides a system enabling decision making
OLAP tools provide a DSS capability
OLAP for the warehouse provides analytical power
OLAP Rules
1. Multidimensional conceptual view
2. Transparency
3. Accessibility
4. Consistent reporting performance
5. Client-server architecture
OLAP Rules
6. Multi-user support
7. Unrestricted cross dimensional operations
8. Intuitive data manipulation
9. Flexible reporting
10. Unlimited dimensions and aggregation levels
Relational Database Model
Table Columns
Rows
Key values to join
The data is found at the intersection of a row
and a column
Multidimensional Database Model
Customer Store
Store
Time Time
SALES FINANCE
Product
GL_Line
The data is found at the intersection of
dimensions
Comparison of Models
Property Relational Multidimensional
Data representation Two dimension Multiple dimensions
Data extraction Specific rows Specific dimensions
Results Tool specific Matrix
Multidimensional Queries
What are my ten best and worst products, in terms of percentage
growth, in 2019 compared with 2018?
What is the relative growth of my largest selling product?
How did each of my products grow last year?
Chart the four-week moving-average trend for sales over the last
two fiscal years.
How profitable is each of my sales territories, if overhead is
allocated on a sales versus head count basis?
Project next quarter’s sales based on the last 38 months’ figures
for the Northern Region.
OLAP Models
MOLAP ROLAP
Desktop
OLAP
Server
Desktop
MD Server OLAP
Database
Server
Warehouse
The MOLAP Model
The application layer stores data in a DSS Client
multidimensional structure.
The presentation layer provides the
multidimensional view.
MOLAP
Application Layer
Engine
Warehouse
The MOLAP Model
Data
DSS Client
Arrays
Cached
Offloaded from server
Efficient storage and processing MOLAP
Application Layer
Engine
Complexity hidden from the user
Analysis using pre aggregated
summaries and pre calculated measures
Warehouse
The ROLAP Model
The warehouse stores atomic data. DSS Client
The application layer generates SQL for
the two dimensional view.
The presentation layer provides the
multidimensional view.
ROLAP
Application Layer
Engine
Multiple
SQL
Warehouse
Server
The ROLAP Model
Data and metadata in Server
DSS Client
Multidimensional views of data
High connectivity
Unlimited ROLAP
Application Layer
Database size
Engine
Query criteria
Multiple
SQL
Complex SQL generated by tool
Warehouse
Server
OLAP Query Characteristics
Access to large amounts of data
Analysis of data relationships by many business criteria
Analysis of data by time
Display of data across different dimensions
Complex calculations using formula
Quick response
Standard Query Techniques
Customer
Why?
Slice /Dice Time
What? Account
Why?
Why?
Drill-down
Standard Query Techniques
Why? Drill-up
What?
Drill-across
Why?
Why?
Pivoting
Standard Query Techniques
Roll up (drill-up): summarize data
by climbing up hierarchy or by dimension reduction
Drill down (roll down): reverse of roll-up
from higher level summary to lower level summary or detailed data, or
introducing new dimensions
Slice and dice:
project and select
Standard Query Techniques
Pivot (rotate):
reorient the cube, visualization, 3D to series of 2D planes.
Other operations
drill across: involving (across) more than one fact table
drill through: through the bottom level of the cube to its backend
relational tables (using SQL)
A Sample Data Cube
Example: Standard Query Techniques
Subject: Sales
Slice
The Slice OLAP operations takes
one specific dimension from a cube
given and represents a new sub-
cube, which provides information
from another point of view.
It can create a new sub-cube by
choosing one or more dimensions.
The use of Slice implies the
specified granularity level of the
dimension.
Dice
OLAP Dice emphasizes two or
more dimensions from a cube
given and suggests a new sub-
cube, as well as Slice operation
does.
Drill Up
Drill-up is an operation to gather
data from the cube either by
ascending a concept hierarchy for a
dimension or by dimension reduction
in order to receive measures at a less
detailed granularity.
So that to see a broader perspective
in compliance with the concept
hierarchy a user has to group
columns and unite the values.
Drill down
Drill-down is an operation
opposite to Drill-up. It is carried
out either by descending a
concept hierarchy for a
dimension or by adding a new
dimension.
It lets a user deploy highly
detailed data from a less
detailed cube.
Pivot
This OLAP operation rotates
the axes of a cube to provide
an alternative view of the
data cube.
Pivot clusters the data with
other dimensions which helps
analyze the performance of a
company or enterprise.
Understanding OLAP Standard Query Techniques
Given a fact table with sales data (for example sales(market#, product#,
time#, amount)) and relevant dimension tables, write an SQL statement
that slices the cube to select sales only in week 2, and dice it by regions.
Assuming that we have the following dimension tables:
Market (Market_ID, City, Region)
Product (Product_ID, Name, Category, Price)
Time (Time_ID, Week, Month, Quarter)
Sales (Market_ID, Product_ID, Time_ID, Amount)
Answer
The following query provides a solution:
SELECT M.Region, SUM (Amount)
FROM Sales S, Time T, Market M
WHERE T.Time_ID = S.Time_ID AND T.Week =“Week2”
AND S.Market_ID = M.Market_ID
GROUP BY M. Region
Understanding OLAP Standard Query Techniques
Calculate the following query
given the table (sales) below:
SELECT Time, Region,
Department, sum(Profit) AS
Profit
FROM sales
GROUP BY ROLLUP(Time,
Region, Department)
Answer
Non-OLAP Tools
Specialized tools for DSS
May access relational or multidimensional data
Access operational and warehouse data
Ad-hoc query
Reporting
Exploration
Non-OLAP Tools
Spreadsheets
12345.00 100% ABC CO
12780.00 110% GMBH LTD
2345787.00 230% GBUK INC
87877.98 200% FFR ASSOC
5678.00 -10% MCD CO
Choosing a Reporting Architecture
Business needs
User adaptability Good
GUI interface MOLAP
Query
Computer architecture Performance
Network architecture ROLAP
OK
Openness
Simple Complex
Performance Analysis
Management
Enterprise-wide perspective
Tools Comparison
Relational
Known environment
Use with operational and warehouse systems
No complex analysis functions
Multidimensional
Quick access to data
Extensive libraries of complex functions
Strong modeling and forecasting capabilities
Use with operational and warehouse systems
Difficulty of changing dimensions
Summary
Data access strategy
Data access tools
Database models for OLAP
OLAP query techniques
Non-OLAP tools
Query tool choice
Thank You!