0% found this document useful (0 votes)
2 views39 pages

Modul1 IntroductionToOLAP

The document discusses Online Analytical Processing (OLAP), its strengths, limitations of SQL, and typical OLAP queries. It explains the concept of multidimensional databases, data cubes, and various operations like slicing, dicing, roll-up, and drill-down. Additionally, it highlights the importance of structured data and the use of OLAP tools for decision support in business analytics.

Uploaded by

kashishj405
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
2 views39 pages

Modul1 IntroductionToOLAP

The document discusses Online Analytical Processing (OLAP), its strengths, limitations of SQL, and typical OLAP queries. It explains the concept of multidimensional databases, data cubes, and various operations like slicing, dicing, roll-up, and drill-down. Additionally, it highlights the importance of structured data and the use of OLAP tools for decision support in business analytics.

Uploaded by

kashishj405
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 39

OLAP

-By Ujwala Bharambe


II. On-Line Analytical Processing
(OLAP)

Making Decision Support


Possible
Limitations of SQL

“A Freshman in
Business needs a
Ph.D. in SQL”

-- Ralph Kimball

© Dr.Ujwala Bharambe
3
Typical OLAP Queries
• Write a multi-table join to compare sales for each product line YTD this year vs.
last year.
• Repeat the above process to find the top 5 product contributors to margin.
• Repeat the above process to find the sales of a product line to new vs. existing
customers.
• Repeat the above process to find the customers that have had negative sales
growth.

© Dr.Ujwala Bharambe
4
What Is OLAP?
• Online Analytical Processing - coined by
EF Codd in 1994 paper contracted by
Arbor Software*
• Generally synonymous with earlier terms such as Decisions
Support, Business Intelligence, Executive Information System
• OLAP = Multidimensional Database
• MOLAP: Multidimensional OLAP (Arbor Essbase, Oracle
Express)
• ROLAP: Relational OLAP (Informix MetaCube, Microstrategy DSS
Agent)

* Reference: http://www.arborsoft.com/essbase/wht_ppr/coddTOC.html

© Dr.Ujwala Bharambe
5
Strengths of OLAP
• It is a powerful visualization paradigm
• It provides fast, interactive response times
• It is good for analyzing time series
• It can be useful to find some clusters and outliers
• Many vendors offer OLAP tools

© Dr.Ujwala Bharambe
6
OLAP Is FASMI
• Fast
• Analysis
• Shared
• Multidimensional
• Information

Nigel Pendse, Richard Creath - The OLAP Report


© Dr.Ujwala Bharambe
7
Multi-dimensional Data

• “Hey…I sold $100M worth of goods”


Dimensions: Product, Region, Time
Hierarchical summarization paths
W
S Product Region Time
N
Industry Country Year
Product

Juice
Cola
Milk
Category Region Quarter
Cream
Toothpaste
Soap Product City Month Week
1 2 34 5 6 7
Month
8
© Dr.Ujwala Bharambe Office Day
Data Cube Lattice
• Cube lattice
• ABC
AB AC BC
A B C
none
• Can materialize some groupbys, compute others on demand
• Question: which groupbys to materialze?
• Question: what indices to create
• Question: how to organize data (chunks, etc)

© Dr.Ujwala Bharambe
9
Visualizing Neighbors is simpler
Month Store Sales
1 2 3 4 5 6 7 8 Apr 1
Apr Apr 2
Apr 3
May Apr 4
Jun Apr 5
Jul Apr 6
Apr 7
Aug Apr 8
Sep May 1
Oct May 2
May 3
Nov May 4
Dec May 5
Jan May 6
May 7
Feb May 8
Mar Jun 1
Jun 2
© Dr.Ujwala Bharambe 10
A Visual Operation: Pivot (Rotate)

10
Juice
47
Cola
Milk 30
Cream 12 Product

3/1 3/2 3/3 3/4

11
Date © Dr.Ujwala Bharambe
“Slicing and Dicing”

The Telecomm Slice


Product

Household

Telecomm

Video Europe
Far East
Audio India

Retail Direct Special Sales Channel


© Dr.Ujwala Bharambe
12
Roll-up and Drill Down
Higher Level of
Aggregation
• Sales Channel
• Region
• Country
• State
• Location Address
• Sales Representative

Low-level
Details
© Dr.Ujwala Bharambe
13
Nature of OLAP Analysis
• Aggregation -- (total sales,
percent-to-total)
• Comparison -- Budget vs. Expenses
• Ranking -- Top 10, quartile analysis
• Access to detailed and aggregate
data
• Complex criteria specification
• Visualization

© Dr.Ujwala Bharambe 14
Organizationally Structured Data
• Different Departments look at the same detailed data
in different ways. Without the detailed,
organizationally structured data as a foundation,
there is no reconcilability of data

marketing

sales
finance
manufacturing
© Dr.Ujwala Bharambe 15
Multidimensional Spreadsheets
• Analysts need spreadsheets that
support
• pivot tables (cross-tabs)
• drill-down and roll-up
• slice and dice
• sort
• selections
• derived attributes
• Popular in retail domain

© Dr.Ujwala Bharambe 16
OLAP - Data Cube

• Idea: analysts need to group data in many different ways


• eg. Sales(region, product, prodtype, prodstyle, date, saleamount)
• saleamount is a measure attribute, rest are dimension attributes
• groupby every subset of the other attributes
• materialize (precompute and store) groupbys to give online response
• Also: hierarchies on attributes: date -> weekday,
date -> month -> quarter -> year

© Dr.Ujwala Bharambe
17
SQL Extensions
• Front-end tools require
• Extended Family of Aggregate Functions
• rank, median, mode
• Reporting Features
• running totals, cumulative totals
• Results of multiple group by
• total sales by month and total sales by product
• Data Cube

© Dr.Ujwala Bharambe
18
Relational OLAP: 3 Tier DSS
Data Warehouse ROLAP Engine Decision Support Client

Database Layer Application Logic Layer Presentation Layer

Store atomic Generate SQL Obtain multi-


data in industry execution plans in dimensional
standard the ROLAP engine reports from the
RDBMS. to obtain OLAP DSS Client.
functionality.
© Dr.Ujwala Bharambe
19
MD-OLAP: 2 Tier DSS
MDDB Engine MDDB Engine Decision Support Client

Database Layer Application Logic Layer Presentation Layer

Store atomic data in a proprietary Obtain multi-


data structure (MDDB), pre-calculate dimensional
as many outcomes as possible, obtain reports from the
OLAP functionality via proprietary DSS Client.
algorithms running against this data.
© Dr.Ujwala Bharambe
20
Typical OLAP Problems
Data Explosion

70000 Data Explosion Syndrome


65536
60000
50000
40000
Number of Aggregations

30000
20000
16384
10000
4096
0 16 81 256 1024
2 3 4 5 6 7 8

(4 levels in each dimension) Number of Dimensions


© Dr.Ujwala Bharambe
21 Microsoft TechEd’98
Granularity in Warehouse
• Can not answer some questions with summarized data
• Did Anand call Seshadri last month? Not possible to answer if total duration of
calls by Anand over a month is only maintained and individual call details are
not.
• Detailed data too voluminous

© Dr.Ujwala Bharambe
22
Granularity in Warehouse
• Tradeoff is to have dual level of granularity
• Store summary data on disks
• 95% of DSS processing done against this data
• Store detail on tapes
• 5% of DSS processing against this data

© Dr.Ujwala Bharambe
23
Vertical Partitioning
Acct. Interest
Name Balance Date Opened Address
No Rate

Frequently
accessed Rarely
accessed
Acct. Acct. Interest
Balance Name Date Opened Address
No No Rate

Smaller table
and so less I/O
© Dr.Ujwala Bharambe
24
Derived Data
• Introduction of derived (calculated data) may often help
• Have seen this in the context of dual levels of granularity
• Can keep auxiliary views and indexes to speed up query processing

© Dr.Ujwala Bharambe
25
Aggregates
• Add up amounts for day 1
• In SQL: SELECT sum(amt) FROM SALE
WHERE date = 1

sale prodId storeId date amt


p1 s1 1 12
p2 s1 1 11
p1 s3 1 50
p2 s2 1 8
81
p1 s1 2 44
p1 s2 2 4

© Dr.Ujwala Bharambe 26
Aggregates
• Add up amounts by day
• In SQL: SELECT date, sum(amt) FROM SALE
GROUP BY date

sale prodId storeId date amt


p1 s1 1 12
p2 s1 1 11 ans date sum
p1 s3 1 50 1 81
p2 s2 1 8 2 48
p1 s1 2 44
p1 s2 2 4

© Dr.Ujwala Bharambe 27
Another Example
• Add up amounts by day, product
• In SQL: SELECT date, sum(amt) FROM SALE
GROUP BY date, prodId
sale prodId storeId date amt
p1 s1 1 12 sale prodId date amt
p2 s1 1 11 p1 1 62
p1 s3 1 50 p2 1 19
p2 s2 1 8
p1 s1 2 44 p1 2 48
p1 s2 2 4

rollup
drill-down

© Dr.Ujwala Bharambe 28
Aggregates
• Operators: sum, count, max, min, median, ave
• “Having” clause
• Using dimension hierarchy
• average by region (within store)
• maximum by month (within date)

© Dr.Ujwala Bharambe 29
Data Cube

Fact table view: Multi-dimensional cube:


sale prodId storeId amt
p1 s1 12 s1 s2 s3
p2 s1 11 p1 12 50
p1 s3 50 p2 11 8
p2 s2 8

dimensions = 2

© Dr.Ujwala Bharambe 30
3-D Cube
Fact table view: Multi-dimensional cube:

sale prodId storeId date amt


p1 s1 1 12
p2 s1 1 11 s1 s2 s3
day 2
p1 s3 1 50 p1 44 4
p2 s2 1 8 p2 s1 s2 s3
p1 s1 2 44 day 1
p1 12 50
p1 s2 2 4 p2 11 8

dimensions = 3

© Dr.Ujwala Bharambe 31
Example
roll-up to region
Dimensions:
NY
SF
Time, Product, Store
roll-up to brand
LA
Attributes:
10
Product (upc, price, …)
Juice
Store …
Product

Milk 34
56 …
Coke
Cream 32 Hierarchies:
Soap 12 Product → Brand → …
Bread 56 roll-up to week Day → Week → Quarter
M T W Th F S S
Store → Region → Country
Time
56 units of bread sold in LA on M

© Dr.Ujwala Bharambe 32
Cube Aggregation: Roll-up
Example: computing sums
s1 s2 s3
day 2 ...
p1 44 4
p2 s1 s2 s3
day 1
p1 12 50
p2 11 8

s1 s2 s3
sum 67 12 50
s1 s2 s3
p1 56 4 50
p2 11 8
129
sum
rollup p1 110
p2 19
drill-down
© Dr.Ujwala Bharambe 33
Cube Operators for Roll-up
s1 s2 s3
day 2 ...
p1 44 4
p2 s1 s2 s3
day 1
p1 12 50
p2 11 8 sale(s1,*,*)

s1 s2 s3
sum 67 12 50
s1 s2 s3
p1 56 4 50
p2 11 8
129
sum
sale(s2,p2,*) p1 110
p2 19 sale(*,*,*)

© Dr.Ujwala Bharambe 34
Extended Cube

* s1 s2 s3 *
p1 56 4 50 110
p2 11 8 19
day 2 *
s1 67
s2 12
s3 *50 129
p1 44 4 48
p2
s1 s2 s3 *
day 1
p1
*
12
44 4
50 62
48 sale(*,p2,*)
p2 11 8 19
* 23 8 50 81

© Dr.Ujwala Bharambe 35
Aggregation Using Hierarchies
s1 s2 s3
day 2
p1 44 4
store
p2 s1 s2 s3
day 1
p1 12 50
p2 11 8
region

country

region A region B
p1 56 54
p2 11 8
(store s1 in Region A;
stores s2, s3 in Region B)

© Dr.Ujwala Bharambe 36
Slicing
s1 s2 s3
day 2
p1 44 4
p2 s1 s2 s3
day 1
p1 12 50
p2 11 8

TIME = day 1

s1 s2 s3
p1 12 50
p2 11 8

© Dr.Ujwala Bharambe 37
Slicing & Sales
($ millions)
Products Time
Pivoting Store s1 Electronics
d1
$5.2
d2

Toys $1.9
Clothing $2.3
Cosmetics $1.1
Store s2 Electronics $8.9
Toys $0.75
Clothing $4.6
Cosmetics $1.5
Sales
($ millions)
Products d1
Store s1 Store s2
Store s1 Electronics $5.2 $8.9
Toys $1.9 $0.75
Clothing $2.3 $4.6
Cosmetics $1.1 $1.5
Store s2 Electronics
Toys
Clothing
© Dr.Ujwala Bharambe 38
Summary of Operations
• Aggregation (roll-up)
• aggregate (summarize) data to the next higher dimension element
• e.g., total sales by city, year → total sales by region, year
• Navigation to detailed data (drill-down)
• Selection (slice) defines a subcube
• e.g., sales where city =‘Gainesville’ and date = ‘1/15/90’
• Calculation and ranking
• e.g., top 3% of cities by average income
• Visualization operations (e.g., Pivot)
• Time functions
• e.g., time average

© Dr.Ujwala Bharambe 39

You might also like