0% found this document useful (0 votes)

99 views89 pages

Data Warehouse for Information Processing

This document discusses key concepts related to data warehousing and online analytical processing (OLAP). It begins with an introduction to data warehouses and their purpose in supporting business decision making. The document then covers architectural components of data warehouses including data sources, the data warehouse itself, and business intelligence tools. It also discusses dimensional modeling, extract-transform-load processes, and techniques for efficiently querying and indexing multidimensional data.

Uploaded by

Harshad Nawghare

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

99 views89 pages

Data Warehouse for Information Processing

Uploaded by

Harshad Nawghare

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 89

Data Warehousing & Mining

B. Tech and MBA. Tech. (Information Technology)

Semester-V

Chapter 2 Architecture and Infrastructure & Data

Representation
Chapter 3 Information access and delivery

By. Prof. Bhushan Inje

Outline
• Architectural components
• Infrastructure and metadata
• Principles of dimensional modeling
• Dimensional modeling advance topics
• Data Extraction, Transformation and Loading
• Data quality

By. Prof. Bhushan Inje

Introduction
• A data warehouse is a collection of corporate information,
derived directly from operational systems and some external data sources.
• Its specific purpose is to support business decisions, not business
operations.
• This is what a data warehouse is all about, helping your business ask
“What if?” questions.
• The answers to these questions will ensure your business is proactive,
instead of reactive, a necessity in today’s information age.
What is a Data Warehouse?
• Data warehouse provides architectures and tools for business executives to
systematically organise, understand, and use their data to make strategic
decisions.
• In simple terms, a data warehouse refers to a database that is maintained
separately from an organization’s operational databases.
• According to W. H. Inman, a leading architect in the construction of data
warehouse systems,
“a data warehouse is a subject-oriented, integrated, time-variant, and
nonvolatile collection of data in support of management’s decision making
process.”
Cont.…
• Let us understand the four key words in more detail as
follows:
– Subject-oriented
– Integrated
– Time-variant
– Non-volatile
Use of Data Warehouses in organizations
• Many organizations are creating data warehouse to support
business decision-making activities for the following reasons:
– To increasing customer focus.
– To reposition products and managing product portfolios.
– To analyzing operations and looking for sources of profit.
– To managing the customer relationships
– Data warehousing is also very useful from the point of view of
heterogeneous database integration.
Differences between operational Database systems
and Data Warehouses
• On Line Transactional Processing (OLTP)
• On Line Analytical Processing (OLAP).
Characteristics of Data Warehouse
• Subject oriented
• Integrated
• Time variant
• Non volatile
Data Warehouse components
Data Warehouse components
• Data sources
• Data Warehouse
• Reporting
– Business intelligence tools
– Executive information systems (known more widely as Dashboard (business)
– OLAP Tools
– Data Mining

• Metadata
• Operations
•
Data Warehouse components Cont..

• Optional components
– Dependent Data Marts
– Logical Data Marts
– Operational Data Store
Designing the Data Warehouse
Data Warehouse Architecture
• Why do Business analysts need Data Warehouse?
Process of Data Warehouse Design
A data warehouse can be built using three approaches:
1. A top-down approach
2. A bottom-up approach
3. A combination of both approaches
• In general, the warehouse design process consists of the
following steps:
A three-tier Data
Warehouse
architecture
OLAP server architectures
There are three different possible designs:
1. Relational OLAP (ROLAP)
2. Multidimensional OLAP (MOLAP)
3. Hybrid OLAP (HOLAP)
Getting Multidimensional Data out of the Warehouse
A Multidimensional Data Model
• From Tables and Spreadsheets to Data Cubes
• What is a data cube?
– A data cube allows data to be modeled and viewed in
multiple dimensions. It is defined by dimensions and
facts
– Dimensions are the perspectives or entities with respect
to which an organization wants to keep records.
Cont.…
Cont.…
Schemas for Multidimensional Databases

• Stars, Snowflakes, and Fact Constellations:

• Star schema:
– (1) a large central table (fact table) containing the bulk of the data,
with no redundancy, and
– (2) a set of smaller attendant tables (dimension tables), one for each
dimension.
Examples for Defining Star, Snowflake,
and Fact Constellation Schemas
• Data warehouses and data marts can be defined using two language primitives,
one for cube definition and one for dimension definition. The cube definition
statement has the following syntax:
Measures: Their Categorization and Computation

• Note that a multidimensional point in the data cube space can be defined
by a set of dimension-value pairs, for example, (time = “Q1”, location =
“Vancouver”, item = “computer”).
• A data cube measure is a numerical function that can be evaluated
at each point in the data cube space.
• Measures can be organized into three categories (i.e., distributive,
algebraic, holistic), based on the kind of aggregate functions used.
• Distributive: An aggregate function is distributive if it can be
computed in a distributed manner
• Algebraic: An aggregate function is algebraic if it can be
computed by an algebraic function with M arguments
• Holistic: An aggregate function is holistic if there is no constant
bound on the storage size needed to describe a sub aggregate.
Concept Hierarchies
• A concept hierarchy defines a sequence of mappings from a set of low-
level concepts to higher-level, more general concepts.
OLAP Operations in the Multidimensional Data Model
• “How are concept hierarchies useful in OLAP?”
OLAP Operations
• Roll-up- The roll-up operation (also called the drill-up operation by some vendors)
performs aggregation on a data cube, either by climbing up a concept hierarchy for
a dimension or by dimension reduction.
• Drill-down- Drill-down is the reverse of roll-up. It navigates from less detailed data
to more detailed data. Drill-down can be realized by either stepping down a concept
hierarchy for a dimension or introducing additional dimensions.
• Slice and dice- The slice operation performs a selection on one dimension of the
given cube, resulting in a sub-cube
• Pivot (rotate)- Pivot (also called rotate) is a visualization operation that rotates the
data axes in view in order to provide an alternative presentation of the data.
• Other OLAP operations
– Drill-across
– Drill-through
OLAP Systems versus Statistical Databases

• Many of the characteristics of OLAP systems, such as the use

of a multidimensional data model and concept hierarchies, the
association of measures with dimensions, and the notions of
roll-up and drill-down, also exist in earlier work on statistical
databases (SDBs).
• A statistical database is a database system that is designed to
support statistical applications. Similarities between the two
types of systems are rarely discussed, mainly due to
differences in terminology and application domains.
A Starnet Query Model for Querying Multidimensional
Databases
A Starnet Query Model for Querying Multidimensional
Databases
Data Warehouse Back-End Tools and Utilities
• Data extraction, which typically gathers data from multiple, heterogeneous, and
external sources
• Data cleaning, which detects errors in the data and rectifies them when possible
• Data transformation, which converts data from legacy or host format to
warehouse format
• Load, which sorts, summarizes, consolidates, computes views, checks integrity,
and builds indices and partitions
• Refresh, which propagates the updates from the data sources to the warehouse
Types of OLAP Servers: ROLAP versus MOLAP
versus HOLAP
• Relational OLAP (ROLAP) servers:
• Multidimensional OLAP (MOLAP) servers:
• Hybrid OLAP (HOLAP) servers:
• Specialized SQL servers:
“How are data actually stored in ROLAP and MOLAP
architectures?”
Data Warehouse Implementation
• Efficient Computation
of Data Cubes
– The compute cube
Operator and the
Curse of
Dimensionality
“How many cuboids are there in an n-dimensional data cube?”

• If there were no hierarchies associated with each dimension, then the total
number of cuboids for an n-dimensional data cube, as we have seen above,
is 2n.
• such as in the hierarchy “day < month < quarter < year”.

where Li is the number of levels associated with dimension i.

Partial Materialization: Selected Computation of Cuboids

• There are three choices for data cube materialization given a base cuboid:
1. No materialization:
2. Full materialization:
3. Partial materialization:

• The partial materialization of cuboids or subcubes should consider three factors:

(1) identify the subset of cuboids or subcubes to materialize;
(2) exploit the materialized cuboids or subcubes during query processing; and
(3) efficiently update the materialized cuboids or subcubes during load and refresh.
Indexing OLAP Data
• How to index OLAP data by bitmap indexing and join indexing.
Efficient Processing of OLAP Queries

1. Determine which operations should be performed on the

available cuboids:
2.Determine to which materialized cuboid(s) the relevant
operations should be applied:
• “Which of the above four cuboids should be
selected to process the query?”
• “How would the costs of each cuboid
compare if used to process the query?”
From Data Warehousing to Data Mining
• “How do data warehousing and OLAP relate to data mining?”
• Data Warehouse Usage
– There are three kinds of data warehouse applications:
• Information processing
• Analytical processing
• Data mining
• “How does data mining relate to information processing and
on-line analytical processing?”
• “Do OLAP systems perform data mining?
• Are OLAP systems actually data mining systems?”
From On-Line Analytical Processing to
On-Line Analytical Mining
• On-line analytical mining (OLAM) (also called OLAP mining)
• OLAM is particularly important for the following reasons:
– High quality of data in data warehouses
– Available information processing infrastructure surrounding data
warehouses
– OLAP-based exploratory data analysis
– On-line selection of data mining functions
Architecture for
On-Line
Analytical Mining
Data
Warehouse
Deployment
Data Warehouse Deployment
• Lifecycle for data warehouse deployment project:
– 0. Project Scoping and Planning
– 1. Requirement
– 2. Front-End Design
– 3. Warehouse Schema Design
– 4. OLTP to data warehouse mapping
– 5. Implementation
– 6. Deployment
– 7. Management and Maintenance of the system
Growth and maintenance of Data warehouse

• Monitoring The Data Warehouse

– Collection of Statistics
– The following is a random list that includes statistics for different uses. You will find most
of these applicable to your environment.
• Physical disk storage space utilization
• Number of times the DBMS is looking for space in blocks or causes fragmentation
• Memory buffer activity
Collection of Statistics Cont..
• Buffer cache usage
• Input–output performance
• Memory management Profile of the warehouse content, giving number of distinct
entity occurrences (example: number of customers, products, etc.)
• Size of each database table Accesses to fact table records
• Usage statistics relating to subject areas
• Numbers of completed queries by time slots during the day
• Time each user stays online with the data warehouse
• Total number of distinct users per day
• Maximum number of users during time slots daily
• Duration of daily incremental loads
• Count of valid users
• Query response times
• Number of reports run each day
• Number of active tables in the database
Using Statistics for Growth Planning
• We indicate below the types of action that are prompted by the monitoring
statistics:
– Allocate more disk space to existing database tables Plan for new disk
space for additional tables
– Modify file block management parameters to minimize fragmentation
– Create more summary tables to handle large number of queries looking
for summary information
– Reorganize the staging area files to handle more data volume
– Add more memory buffers and enhance buffer management Upgrade
database servers
– Offload report generation to another middle tier
– Smooth out peak usage during the 24-hour cycle
– Partition tables to run loads in parallel and to manage backups
MANAGING THE DATA WAREHOUSE
Platform
Upgrades

Ongoing Managing
Fine- Data
Tuning Growth

Information
Delivery Storage
Enhanceme Management
nts

Data
ETL
Model Management
Revisions
Thank You

Data Mining Unit 1
No ratings yet
Data Mining Unit 1
46 pages
Unit 2 DATA WAREHOUSE AND DATA MART
No ratings yet
Unit 2 DATA WAREHOUSE AND DATA MART
17 pages
ML Module1
No ratings yet
ML Module1
56 pages
Mapping Data Warehouse to Multiprocessor
No ratings yet
Mapping Data Warehouse to Multiprocessor
34 pages
Data Warehousing & OLAP Guide
No ratings yet
Data Warehousing & OLAP Guide
35 pages
Unit2 Olap
No ratings yet
Unit2 Olap
13 pages
Data Warehousing and OLAP Overview
No ratings yet
Data Warehousing and OLAP Overview
38 pages
UNIT2DM
No ratings yet
UNIT2DM
63 pages
Data Mining and Warehousing (203105431) : Sandeep Jangir, Assistant Professor
No ratings yet
Data Mining and Warehousing (203105431) : Sandeep Jangir, Assistant Professor
44 pages
Chapter 2.introduction To Data Warehouse
No ratings yet
Chapter 2.introduction To Data Warehouse
49 pages
DMDW 1 2nd Module
No ratings yet
DMDW 1 2nd Module
29 pages
Unit 2 - Data Science BCA
No ratings yet
Unit 2 - Data Science BCA
20 pages
Multidimensional
No ratings yet
Multidimensional
77 pages
Data Warehousing Explained
No ratings yet
Data Warehousing Explained
21 pages
DWM Unit 1
No ratings yet
DWM Unit 1
67 pages
Data Warehousing Guide for IT Students
No ratings yet
Data Warehousing Guide for IT Students
77 pages
Data Warehousing and Multidimensional Models
No ratings yet
Data Warehousing and Multidimensional Models
71 pages
Data Warehouse
No ratings yet
Data Warehouse
71 pages
R20-DMT Unit-I
No ratings yet
R20-DMT Unit-I
24 pages
CST466-M1 - Ktunotes - in
No ratings yet
CST466-M1 - Ktunotes - in
24 pages
Mod1 Data Warehouse
No ratings yet
Mod1 Data Warehouse
30 pages
03 04OLAP SKJ Edited Oct 1, 2024
No ratings yet
03 04OLAP SKJ Edited Oct 1, 2024
93 pages
Data Warehouse Fundamentals and Design
No ratings yet
Data Warehouse Fundamentals and Design
36 pages
Data Warehousing and Mining Overview
No ratings yet
Data Warehousing and Mining Overview
14 pages
Bca DM Unit Ii
No ratings yet
Bca DM Unit Ii
17 pages
Data Warehouse Concepts Overview
No ratings yet
Data Warehouse Concepts Overview
58 pages
An Overview of Data Warehousing and OLAP Technology What Is Decision Support?
No ratings yet
An Overview of Data Warehousing and OLAP Technology What Is Decision Support?
4 pages
Data Warehousing Overview and Concepts
No ratings yet
Data Warehousing Overview and Concepts
60 pages
What Is Data Warehouse?: Data Mining by IK Unit 2
No ratings yet
What Is Data Warehouse?: Data Mining by IK Unit 2
21 pages
Unit-2 1
No ratings yet
Unit-2 1
60 pages
DM 6
No ratings yet
DM 6
29 pages
DW&DM Material
No ratings yet
DW&DM Material
107 pages
DWDM 2
No ratings yet
DWDM 2
15 pages
Datascience Unit 02 1
No ratings yet
Datascience Unit 02 1
53 pages
Data Warehouse and Mining
No ratings yet
Data Warehouse and Mining
7 pages
Data Warehouses and Data Cubes
No ratings yet
Data Warehouses and Data Cubes
21 pages
Bi Unit 4
No ratings yet
Bi Unit 4
40 pages
Data Warehouse Overview and Architecture
No ratings yet
Data Warehouse Overview and Architecture
19 pages
Data Warehouse and OLAP Overview
No ratings yet
Data Warehouse and OLAP Overview
32 pages
Data Warehousing and OLAP Concepts
No ratings yet
Data Warehousing and OLAP Concepts
26 pages
Data Warehousing & OLAP Insights
No ratings yet
Data Warehousing & OLAP Insights
53 pages
MultiDimensional Data Model
No ratings yet
MultiDimensional Data Model
22 pages
FALLSEM2023-24 CSI3010 ETH VL2023240104197 2023-07-26 Reference-Material-I
No ratings yet
FALLSEM2023-24 CSI3010 ETH VL2023240104197 2023-07-26 Reference-Material-I
28 pages
Unit 2
No ratings yet
Unit 2
34 pages
DWDM 2
No ratings yet
DWDM 2
16 pages
Data Warehousing & Mining Guide
No ratings yet
Data Warehousing & Mining Guide
29 pages
Data Warehouse & OLAP Essentials
No ratings yet
Data Warehouse & OLAP Essentials
17 pages
Data Warehousing & OLAP Guide
No ratings yet
Data Warehousing & OLAP Guide
20 pages
Data Warehouse - Unit-2 - S
No ratings yet
Data Warehouse - Unit-2 - S
21 pages
DMDW Operations
No ratings yet
DMDW Operations
65 pages
CH - 3
No ratings yet
CH - 3
45 pages
Unit 2
No ratings yet
Unit 2
32 pages
Unit 1
No ratings yet
Unit 1
141 pages
What Is A Data Warehouse?
No ratings yet
What Is A Data Warehouse?
47 pages
Module 1
No ratings yet
Module 1
71 pages
DMDW Mid 1 Solution
No ratings yet
DMDW Mid 1 Solution
29 pages
DWDM Unit - I Notes
No ratings yet
DWDM Unit - I Notes
24 pages
DM Lect4
No ratings yet
DM Lect4
31 pages
Understanding Digital Footprints
No ratings yet
Understanding Digital Footprints
9 pages
Oops Practical
No ratings yet
Oops Practical
17 pages
Delphi Multi Property Edition ISM Guide EU Based MPE (Watford)
No ratings yet
Delphi Multi Property Edition ISM Guide EU Based MPE (Watford)
12 pages
Embedded Lab Manual
No ratings yet
Embedded Lab Manual
76 pages
Next Generation Clickjacking: New Attacks Against Framed Web Pages
No ratings yet
Next Generation Clickjacking: New Attacks Against Framed Web Pages
54 pages
Byungjin: Omnious. Co., LTD
No ratings yet
Byungjin: Omnious. Co., LTD
2 pages
KPI Metrics & Excel Resources Guide
No ratings yet
KPI Metrics & Excel Resources Guide
6 pages
UDO ID IO Ifl IO 111 OI: Perpustakaan Ump
No ratings yet
UDO ID IO Ifl IO 111 OI: Perpustakaan Ump
24 pages
2 Evolution of Computer
No ratings yet
2 Evolution of Computer
9 pages
212-89 Exam: Certified Incident Handler Q&A
No ratings yet
212-89 Exam: Certified Incident Handler Q&A
4 pages
Solutions to Password Fatigue
No ratings yet
Solutions to Password Fatigue
2 pages
SDLC Assignment 1 BKC18400
67% (3)
SDLC Assignment 1 BKC18400
24 pages
Class 5 L-2 Input & Output Devices
0% (1)
Class 5 L-2 Input & Output Devices
61 pages
HCIP-Access V2.5 Exam Outline
No ratings yet
HCIP-Access V2.5 Exam Outline
3 pages
BBA Computer Basics Exam Guide
No ratings yet
BBA Computer Basics Exam Guide
2 pages
03-Huawei OceanStor RAID 2.0 Technical Pre-Sales Training V1.2-Wang Zhen
No ratings yet
03-Huawei OceanStor RAID 2.0 Technical Pre-Sales Training V1.2-Wang Zhen
16 pages
Internet Evolution: A Brief History
No ratings yet
Internet Evolution: A Brief History
21 pages
ISO 27001 Statement of Applicability Guide
100% (3)
ISO 27001 Statement of Applicability Guide
4 pages
Alcatel-Lucent 9365 Base Station Router Femto
No ratings yet
Alcatel-Lucent 9365 Base Station Router Femto
4 pages
Chapter 1 Introduction To .NET Framework
No ratings yet
Chapter 1 Introduction To .NET Framework
13 pages
NIS Microproject in Cyber Operations
100% (2)
NIS Microproject in Cyber Operations
14 pages
Class S. ForIT Students Modified
No ratings yet
Class S. ForIT Students Modified
5 pages
The Hackers
No ratings yet
The Hackers
2 pages
Strong and Weak Typing
No ratings yet
Strong and Weak Typing
4 pages
Splunk - Actualtests.splk 1001.v2019!09!10.by - Owen.43q
No ratings yet
Splunk - Actualtests.splk 1001.v2019!09!10.by - Owen.43q
19 pages
Media and Information Literacy-Lesson 2
100% (1)
Media and Information Literacy-Lesson 2
31 pages
System Maintenance Changelog
No ratings yet
System Maintenance Changelog
261 pages
REJETI JAHNAVI Crimes in Digital Marketing
No ratings yet
REJETI JAHNAVI Crimes in Digital Marketing
10 pages
Kelso 300ppt
No ratings yet
Kelso 300ppt
14 pages
Problem Statement: Travel Website 2021-22
No ratings yet
Problem Statement: Travel Website 2021-22
8 pages

Data Warehouse for Information Processing

Uploaded by

Data Warehouse for Information Processing

Uploaded by

Data Warehousing & Mining

B. Tech and MBA. Tech. (Information Technology)

Chapter 2 Architecture and Infrastructure & Data

By. Prof. Bhushan Inje

By. Prof. Bhushan Inje

• Stars, Snowflakes, and Fact Constellations:

• Many of the characteristics of OLAP systems, such as the use

where Li is the number of levels associated with dimension i.

• The partial materialization of cuboids or subcubes should consider three factors:

1. Determine which operations should be performed on the

• Monitoring The Data Warehouse

You might also like