0% found this document useful (0 votes)
177 views27 pages

Datawarehousing Chap01

This document provides an overview of data warehousing. It defines data warehousing as subject-oriented, integrated, time-variant, and non-volatile collection of data to support management decision making. The benefits of data warehousing include increased productivity, competitive advantage, and returns on investment. Key components of a data warehouse architecture include operational data sources, an operational data store, a load manager, warehouse manager, query manager, and end-user access tools. Data flows through the system via inflow, upflow, downflow, outflow, and meta-flow processes.

Uploaded by

rahul_mhatre_26
Copyright
© Attribution Non-Commercial (BY-NC)
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPT, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
177 views27 pages

Datawarehousing Chap01

This document provides an overview of data warehousing. It defines data warehousing as subject-oriented, integrated, time-variant, and non-volatile collection of data to support management decision making. The benefits of data warehousing include increased productivity, competitive advantage, and returns on investment. Key components of a data warehouse architecture include operational data sources, an operational data store, a load manager, warehouse manager, query manager, and end-user access tools. Data flows through the system via inflow, upflow, downflow, outflow, and meta-flow processes.

Uploaded by

rahul_mhatre_26
Copyright
© Attribution Non-Commercial (BY-NC)
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPT, PDF, TXT or read online on Scribd
You are on page 1/ 27

Data Warehousing

 Outline
• What is data warehousing
• The benefit of data warehousing
• Differences between OLTP and data warehousing
• The architecture of data warehouse
• The main components
• Data flows
• Tools and technologies
• Integration
• The importance of managing meta-data
• Data marts
 What is data warehousing?
• data warehousing is subject-oriented,
integrated, time-variant, and non-volatile
collection of data in support of management’s
decision-making process.
• a data warehouse is data management and
data analysis
• data webhouse is a distributed data
warehouse that is implement over the web
with no central data repository
• goal: is to integrate enterprise wide corporate
data into a single reository from which users
can easily run queries
 What is data warehousing?
• Subject-orientedWH is organized around the major subjects of the
enterprise..rather than the major application areas.. This is reflected in
the need to store decision-support data rather than application-oriented
data
• Integratedbecause the source data come together from different
enterprise-wide applications systems. The source data is often
inconsistent using..The integrated data source must be made
consistent to present a unified view of the data to the users
• Time-variantthe source data in the WH is only accurate and valid at
some point in time or over some time interval. The time-variance of the
data warehouse is also shown in the extended time that the data is
held, the implicit or explicit association of time with all data, and the fact
that the data represents a series of snapshots
• Non-volatiledata is not update in real time but is refresh from OS on
a regular basis. New data is always added as a supplement to DB,
rather than replacement. The DB continually absorbs this new data,
incrementally integrating it with previous data
 The benefits of data
warehousing
• The potential benefits of data
warehousing are high returns on
investment..
• substantial competitive advantage..
• increased productivity of corporate
decision-makers..
 The difference bewteen
OLTP and data warehousing
• A DBMS built for online transaction
processing (OLTP) is generally
regarded as unsuitable for data
warehousing because each system is
designed with a differing set of
requirements in mind
• example: OLTP systems are design to maximize the transaction
processing capacity, while data warehouses are designed to
support ad hoc query processing
comparision of OLTP systems and data
OLTP systems Datasystem
warehousing warehousing
systems
Hold current data Holds historical data
Stores detailed data Stores detailed, lightly, and highly
Data is dynamic summarized data
Repetitive processing Data is largely static
High level of transaction throughput Ad hoc, unstructured, and heuristic
Predictable pattern of usage processing
Transaction-driven Medium to how level of transaction
Application-orented throughput
Supports day-to-day decisions Unpredictable pattern of usage
Serves large number of clerical/operation Analysis driven
users Subject-oriented
supports strategic decisions
Serves relatively how number of
managerial users
 Problems
• Underestimation of resources for data loading
• Hidden problems with source systems
• Required data not captured
• Increased end-user demands
• Data homogenization
• High demand for resources
• Data ownership
• High maintenance
• Long-duration projects
• Complexity of integration
 The architecture
Operatio
nal Reporting, query,
data application development,
source1 and EIS(executive
High
Query Manage summa information system) tools
Meta-
Operati rized
data
onal data
data Lightly
source summa
2
Load Manager
rized
data
Operati
onal
data
source
Detaile
d data
DBMS
Warehouse Manager OLAP(online analytical
n processing) tools

Operati
onal
data
store
(ods)
Operational data store
(ODS)
Data mining
Archiv
e/back
up
data End-user
access
Typical architecture of a data tools
warehouse
 The main components
• Operational data sourcesfor the DW is supplied from
mainframe operational data held in first generation hierarchical
and network databases, departmental data held in proprietary file
systems, private data held on workstaions and private serves
and external systems such as the Internet, commercially
available DB, or DB assoicated with and organization’s suppliers
or customers
• Operational datastore(ODS)is a repository of
current and integrated operational data used for analysis. It is
often structured and supplied with data in the same way as the
data warehouse, but may in fact simply act as a staging area for
data to be moved into the warehouse
 The main components
• load manageralso called the frontend component, it
performance all the operations associated with the extraction and
loading of data into the warehouse. These operations include
simple transformations of the data to prepare the data for entry
into the warehouse
• warehouse managerperforms all the operations associated
with the management of the data in the warehouse. The
operations performed by this component include analysis of data
to ensure consistency, transformation and merging of source
data, creation of indexes and views, generation of
denormalizations and aggregations, and archiving and backing-
up data
 The main components
• query manageralso called backend component, it performs
all the operations associated with the management of user
queries. The operations performed by this component include
directing queries to the appropriate tables and scheduling the
execution of queries
• detailed, lightly and lightly summarized
data,archive/backup data
• meta-data
• end-user access toolscan be categorized into five main
groups: data reporting and query tools, application development
tools, executive information system (EIS) tools, online analytical
processing (OLAP) tools, and data mining tools
 Data flows
• Inflow- The processes associated with the extraction, cleansing, and
loading of the data from the source systems into the data warehouse.

• upflow- The process associated with adding value to the data in the
warehouse through summarizing, packaging , packaging, and distribution of the
data
• downflow- The processes associated with archiving and backing-up
of data in the warehouse

• outflow- The process associated with making the data availabe to the
end-users

• Meta-flow- The processes associated with the management of the


meta-data
Reporting, query,application
Operatio Warehouse Manager development, and EIS (executive
nal
data
information system) tools
source1 Meta-flow High
Meta-
summa
data
rized
data
Inflow Outflow
Lightly
Load summa
Warehouse
rized
Manager
Manager OLAP (online
data Query Manage
Operati Upflow analytical processing)
onal tools
data
source
Detaile
d data
DBMS
n

Operati
onal
data
store Data mining tools
(ods)
End-user
Downflow access
Archiv tools
e/back
up
data

Information flows of a data


warehouse
 Tools and Technologies
• The critical steps in the construction of a data
warehouse:
a. Extraction
b. Cleansing
c. Transformation
• after the critical steps, loading the results into
target system can be carried out either by
separate products, or by a single, categories:
• code generators
• database data replication tools
• dynamic transformation engines
 Data Warehouse
DBSM(integration)
• due to the maturity of such products, most
relational databases will integrate predictably
with other types of software
• The reqirements for data warehose RDBMS
• Load performance
• Load processing
• Data quality management
• Query perfomance
• Terabyte scalability
• Mass user scalability
• Networked data warehouse
• Warehouse administration
• Integrated dimensional analysis
• Advanced query funtionlity
 The importance of
managing meta-

data(integration)
The integration of meta-data, that is ”data about data”
• Meta-data is used for a variety of purposes and the management
of it is a critical issue in achieving a fully integrated data
warehouse
• The major purpose of meta-data is to show the pathway back to
where the data began, so that the warehouse administrators
know the history of any item in the warehouse
• The meta-data associated with data transformation and loading
must describe the source data and any changes that were made
to the data
• The meta-data associated with data management describes the
data as it is stored in the warehouse
• The meta-data is required by the query manager to generate
appropriate queries, also is associated with the user of queries
• The major integration issue is how to synchronize the various
types of meta-data use throughout the data warehouse. The
challenge is to synchronize meta-data between different products
from different vendors using different meta-data stores
• Two major standards for meta-data and modeling in the areas of
data warehousing and component-based development-
MDC(Meta Data Coalition) and OMG(Object Management
Group)
 Administration and
Management Tools
• a data warehouse requires tools to support the
administration and management of such complex
enviroment.
• for the various types of meta-data and the day-to-day
operations of the data warehouse, the administration
and management tools must be capable of
supporting those tasks:
• monitoring data loading from multiple sources
• data quality and integrity checks
• managing and updating meta-data
• monitoring database performance to ensure efficient query
response times and resource utilization
• auditing data warehouse usage to provide user chargeback
information
• replicating, subsetting, and distributing data
• maintaining effient data storage management
• purging data;
• archiving and backing-up data
• implementing recovery following failure
• security management
 Data mart
• data mart a subset of a data
warehouse that supports the
requirements of particular department
or business function
• The characteristics that differentiate
data marts and data warehouses
include:
• a data mart focuses on only the requirements of users
associated with one department or business function
• data marts do not normally contain detailed operational data,
unlike data warehouses
• as data marts contain less data compared with data warehouses,
data marts are more easily understood and navigated
Operatio Warehouse Manager
nal
data
source1 High
Reporting, query,application developmen
Meta- summa
and EIS(executive information system) to
data rized
Operati data
onal
data Lightly
summa Query
source Load Manage
2 Warehouse
rized Manager
Manager data
Operati
onal OLAP(online analytical
data
source
Detaile
d data DBMS processing) tools

Operati
onal
data
store
(ods) Data mining
(First Tier)
(Third Tier)
Operational data store (ODS)
Archiv
e/back End-user
up access
data tools
Data Mart
summari
zed
data(Rel
ational
databas
Summe)
arized
data
(Multi-
dimen (Second Tier)
sion
databa
Typical data warehouse adn data martse)
architecture
Reasons for creating a data
mart
• To give users access to the data they need to analyze most often
• To provide data in a form that matches the collective view of the
data by a group of users in a department or business function
• To improve end-user response time due to the reduction in the
volume of data to be accessed
• To provide appropriately structured data as ditated by the
requirements of end-user access tools
• Normally use less data so tasks such as data cleansing, loading,
transformation, and integration are far easier, and hence
implementing and setting up a data mart is simpler than
establishing a corporate data warehouse
• The cost of implementing data marts is normally less than that
required to establish a data warehouse
• The potential users of a data mart are more clearly defined and
can be more easily targeted to obtain support for a data mart
project rather than a corporate data warehouse project
data marts issues
• data mart functionalitythe capabilities of data
marts have increased with the growth in their popularity
• data mart sizethe performance deteriorates as
data marts grow in size, so need to reduce the size
of data marts to gain improvements in performance
• data mart load performancetwo critical
components: end-user response time and data
loading performanceto increment DB updating so
that only cells affected by the change are updated
and not the entire MDDB structure
• users’ access to data in multiple martsone
approach is to replicate data between different data marts or,
alternatively, build virtual data martit is views of several
physical data marts or the corporate data warehouse tailored to
meet the requirements of specific groups of users
• data mart internet/intranet accessit’s products sit
between a web server and the data analysis product.Internet/intranet
offers users low-cost access to data marts and the data WH using web
browsers.
• data mart administrationorganization can not easily
perform administration of multiple data marts, giving rise to issues such
as data mart versioning, data and meta-data consistency and integrity,
enterprise-wide security, and performance tuning . Data mart
administrative tools are commerciallly available
• data mart installationdata marts are becoming
increasingly complex to build. Vendors are offering products
referred to as ”data mart in a box” that provide a low-cost source
of data mart tools

You might also like