0% found this document useful (0 votes)
135 views13 pages

Lecture4 - DATA WAREHOUSING PDF

The document discusses data warehousing and decision support. It explains that a data warehouse is a separate database from transaction systems, structured for analysis rather than transactions. Data is integrated from multiple sources and structured for queries. The data warehouse allows historical and aggregated data analysis to support decision making. Data flows into the warehouse from sources, and is then available to users through querying and reporting tools.

Uploaded by

Ahmad Rijal
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
135 views13 pages

Lecture4 - DATA WAREHOUSING PDF

The document discusses data warehousing and decision support. It explains that a data warehouse is a separate database from transaction systems, structured for analysis rather than transactions. Data is integrated from multiple sources and structured for queries. The data warehouse allows historical and aggregated data analysis to support decision making. Data flows into the warehouse from sources, and is then available to users through querying and reporting tools.

Uploaded by

Ahmad Rijal
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 13

Data Warehousing

• Data for Decision Support


• OLAP & Data Warehousing
Data Warehousing
• Data Flows in the Data Warehouse
• Data in the Data Warehouse

Chapter 8

Data for Decision Support


• Data Types:
– Data
Data for Decision Support – Information
– Knowledge

• Data Origin
– Internal
Traditional Data Management – External
– Personal

1
Data for Decision Support
• Databases
– Relational
– Hierarchical & Network OLAP & Data Warehousing
– Object-Oriented
– Multimedia & Document Systems
• Data Problems
– Incorrect data
– Data not timely
Databases for Decision Support
– Data not measured or indexed properly
– Needed data do not exist

In the late 1980s, an interesting intersection


of ideas & technologies... Data Analysis Limitations
• DSS: data modeling & analysis tools • Organizations have been collecting data for many years.
However, much of that data is hardly used in
• Databases: tables & query languages sophisticated data analysis for decision support
• EIS: consolidation, drill-down & data cubes purposes.
– TPS databases are not structured to support analysis.
• User Interfaces: focus on end user – SQL has difficulty answering common business
questions.
– Complex queries may take days to answer.
And a need for more extensive access to
– Some questions may take several queries to answer.
information for decision support purposes!
Sales ratio of promoted vs. unpromoted products?
Quantity shipped to discount stores vs. department stores?

2
Data Analysis Limitations The Problem: OLTP
• Consider a typical business analysis problem: • OLTP = On-Line Transaction Processing
– Find the share of total sales represented by each
product in different markets, categories, and periods, – OLTP is the basic architecture underlying
compared with the same period a year ago. relational database-based transaction processing
• To do so, you would calculate the percentage systems.
each number is of the total of its column, a – Data is broken up into relational tables, with
simple and common concept. relationships described using foreign keys.
• However, in a classic relational database these – OLTP is optimized for the entry and retrieval of
calculations and display would require definition individual pieces of fact.
of a separate view, requiring complex SQL – OLTP was not designed to support large-scale
commands. data analysis.

The Solution: OLAP The Data Warehouse


• OLAP = On-Line • Most applications of OLAP are not known as
Analytical Processing such, but rather as Data Warehouses.
– A database architecture – A data archive in which the data has been
not designed for optimal structured and formatted for analysis purposes.
data processing but for – A database system separate from the transaction
optimal data analysis!
processing system so as not to interfere with the
– The differences in data TPS performance.
processing and data
analysis suggest that – The current foundation for Decision Support.
this would be a
separate database
system. Source: OlapReport.com

3
The Data Warehouse The Data Warehouse
• A managed database in which the data is: – Time-variant:
– Subject-oriented: • Contains informational data (historic data along a
time-axis) rather than operational data
• Designed for decision-support rather than for TPS.
(momentary snapshot).
– Integrated: – Aggregated:
• Contains data from different legacy & DB-systems.
• Data exists at different levels of granularity
– Nonvolatile: • to support management needs
• New data is integrated with existing data, rather
than replacing it.

William Inmon, 1990, 1992

The Data Warehouse The Data Warehouse


• "A data warehouse is a copy of transaction • Additional Data Warehouse characteristics:
data specifically structured for query and – Web based
analysis“ – Relational/multidimensional
(Kimball, 1996) – Client/server
– Real-time
– Include metadata

4
Why do we need a Data Warehouse? Data Warehousing & Legacy Systems

• Business Intelligence • The Data Warehouse:


– One version of the truth - consistency – Allows existing legacy systems to continue in
– Cleaner data - scrubbed operation.
– Security – Consolidates inconsistent data from the
– Accessibility various legacy systems into a coherent set.
– Historical data – Reaps benefits from vital information about
current operations.
– Many applications
– Provides fertile ground to architect new
• mining, strategic, tactical, performance
operational systems.

Data Warehouse Usage The Data Mart


• Data mining software - based on OLAP - is • A Data Mart is a subset of the Data
multidimensional query and analysis software Warehouse
which can actively hunt for patterns and add
– Most users only need a subset of DW data
value to the data it is searching.
– Makes it easier for users to navigate
– Replicated Data Marts are fed DW data only
– Independent Data Marts maintain their own
databases

5
Operational Units
• Operational data stores (ODS):
– A type of database often used as an interim
area for a data warehouse. It includes short- Data Flows in the Data Warehouse
term data from multiple sources in a near
real-time manner to support short-term
decision making
• Oper marts: Putting the Data Warehouse together
– An operational data mart, which allows for
multi-dimensional data analysis of ODS data.

Data Flows in a Data Warehouse Data Flows in a Data Warehouse


• Inflow - bring data into data warehouse. • Inflow
• Upflow - add value to data through – Legacy data is cleaned up and fed into the
summarizing, packaging, and distributing. data warehouse.
• Downflow - archive data with low business
value. Data Sources
• Outflow - make data available to users through E-Mail Current
access and delivery. Detail
Wire Services
• Metaflow - describe data and data flows to External Databases capture
repair transfer
constantly refine and improve the data validate apply
Legacy Data
warehouse.

6
Inflow isn’t easy... Inflow isn’t easy...
• Because of incorrect, contaminated, or just plain
missing data, the migration phase to a data
warehouse can dominate the entire project! Source 1: 9-12-2009
Source 2: Sep-12-09 09/12/2009
• Integrating data about a single entity from Source 3: September 12, 2009
multiple sources is often difficult.
• Automated tools are available to help out:
Source 1: IBM
– Data Migration tools for data extraction Source 2: I.B.M. ?
– Data Scrubbing tools for cleaning up data. Source 3: IBM Corporation
– Data Auditing tools for patterns and structures
contained within the data.

Inflow By Any Other Name… Inflow By Any Other Name…

• Extraction, transformation, and load (ETL)


– A data warehousing process that consists of:
– Extraction - reading data from a database
– Transformation - converting the extracted
data from its previous form into the form in
which it needs to be so that it can be placed
into a data warehouse or simply another
database
– Load - putting the data into the data
The ETL Process
warehouse

7
Inflow By Any Other Name… Data Flows in a Data Warehouse
• Enterprise application integration (EAI) • Upflow
– A technology that provides a vehicle for – Highly detailed data Heavily Summarized
pushing data from source systems into a data is aggregated,
warehouse summarized, and
otherwise processed Lightly
• Enterprise information integration (EII) in the warehouse to Summarized
– A tool space that provides real-time data make it easier for
integration from various sources, such as users to get quick
relational databases, Web services, and responses to their
queries. Current
multidimensional databases
Detail

Data Flows in a Data Warehouse Data Flows in a Data Warehouse


• Outflow 90% of • Downflow Heavily Summarized
– Users run queries – With new data
queries to get Heavily Summarized constantly being Lightly
the information
they need. In a added, move Summarized
correctly 10% of marginally useful old
structured data Lightly queries data to off to an
warehouse, Summarized Current
archive where it can
most queries will Detail
go against be accessed if
highly Current needed.
summarized
data. Detail Users Older Detail

8
Data Flows in a Data Warehouse Data Flows in a Data Warehouse
• The reason for downflow: data loses business Monthly sales by product
Heavily Summarized
value over time! line 2001-2011

Lightly Weekly sales by sub-


Business Need product line 2006-2011
Hierarchical Storage Mgmt Summarized
for Data
Data Warehouse
Current Sales detail 2007-2011
Storage Area Networks
Detail
Online Databases Storage Archive

Active Data Inactive Data


Older Detail Sales detail 1990-2006

Data Flows in a Data Warehouse Data Warehouse Architectures

• Metaflow Data Warehouse Framework


– Users & IS need to know about the data in the Data Custom Apps
Mart
warehouse: where it is located, what it represents,
where it came from, etc. This information is Mktg
Reporting Tools
Middleware

Legacy
metadata - data about the data. Select Metadata Data
Extract Reports Mart
Data Sources Data Transform Query Tools
OLTP
Storage Users Integrate Enterprise
Risk
Mgmt
Maintain
Data Warehouse
Preparation
System Regulate & Business External Data
OLAP Tools
Modeling Synthesize Modeling Mart
Engineering Web Browsers
METADATA

9
Data Warehouse Architectures Data Warehouse Architectures

Common Architecture of a Three-Tier Data Warehouse

Client Workstation Application Server Database Server

Inmon’s Data Warehousing Target


Architecture
Real-Time Data Warehouse
• Real-time or Active Data Warehouses load
Primitive Derived operational data as soon as it becomes
Data Data available
– Immediate data analysis
– View of organization’s changes over time
– Support for real-time decision making
Operational Departmental
Data
Warehouse Individual

10
The Star Schema
• A currently popular data architecture for data
warehouses is the Star Schema.
Data in the Data Warehouse – Simple structure with relatively few tables.
– Data is not fully normalized as in OLTP.
– Star design is easily understood by analysts
and end users, especially those without much
The Star Schema database familiarity.
– The star design eases data analysis: more
analysis possibilities and simpler queries.

The Star Schema Table Types The Star Schema


• Fact Tables (Major Tables): large tables which • Sales database with simple star design:
contain the quantitative or factual business data. Time Dimension
Fact
– Example: a fact table might contain sales revenue for Period_ID
company products for each customer in each Period_ID
Period_Descr
geographical market over a period of time. Quarter
Product_ID
Space Dimension
• Dimension Tables (Minor Tables): smaller tables Year Market_ID
Market_ID
which contain descriptive data that reflect the Product Dimension Units
dimensions of a business. Dollars Market_Descr
Product_ID District
Discount_%
– Example: dimension tables which define the Region
customers, products, geographic markets, and time Prod_Descr
Brand
periods used in the fact table. Size

11
Nine Decisions in Data Design Estimating Data Warehouse Size
• Choose the subject matter Grocery Store Star Schema
• Decide what a fact table represents
Time Dimension Product Dimension
Fact Table
• Identify and conform the dimensions
Time-ID Product-ID
• Choose the facts Day ID
Time-ID SKU ID
• Store pre-calculations in the fact table Week ID
Month ID
Store-ID Package ID
Brand ID
• Round out the dimension tables Quarter ID Product-ID
Category ID
Year ID Promotion-ID
• Choose the duration of the database Sales Facts
Subcategory ID

• The need to track slowly changing dimensions Store Dimension


Promotion Dimension
• Decide the query priorities and the query modes Store-ID
State ID
Promotion-ID
Promo Type ID
District ID
Media Type ID
Region ID
Source: Univ. of North Carolina - Greensboro Display Type ID

Estimating Data Warehouse Size Data Warehouse Pros & Cons


• Time dimension: 2 years X 365 days = 730 days • Direct benefits of a data warehouse
• Store dimension: 300 stores, reporting sales each day
• Product dimension: 30,000 products in each store, of – Allows end users to perform extensive
which 3,000 sell each day in a given store analysis
• Promotion dimension: a sold item appears in only one – Allows a consolidated view of corporate data
promotion condition in a store on a day
• Number of base fact records = 730 X 300 X 3000 X 1 =
– Better and more timely information A
657 million records – Enhanced system performance
• Number of key fields = 4; Number of fact fields = 4; – Simplification of data access
Total fields = 8
• Base fact table size = 657 million X 8 fields X 4 bytes =
21 GB

12
Data Warehouse Pros & Cons Data Warehouse Pros & Cons
• Indirect benefits result from end users • Data warehousing implementation issues
using these direct benefits – Implementing a data warehouse is generally
– Enhance business knowledge a massive effort that must be planned and
– Present competitive advantage executed according to established methods
– Enhance customer service and satisfaction – Data warehouses are expensive and time-
consuming to develop
– Facilitate decision making
– Benefits might not be immediately obvious
– Help in reforming business processes
– Management support and championing is
vital to success

13

You might also like