Data Warehousing & Business Intelligence - I
KALINGA INSTITUTE OF INDUSTRIAL
TECHNOLOGY
School Of Computer
Engineering
Datamining and Dr. Amiya Ranjan Panda
Assistant Professor [II]
Data warehousing School of Computer Engineering,
Kalinga Institute of Industrial Technology (KIIT),
(CS 2004) Deemed to be University,Odisha
3 Credit Lecture Note 06
Acknoledgement
2
A Special
Thanks to
J. Han and M. Kamber.
&
Tan, Steinbach, Kumar
for their slides and books, which I have
used for preparation of these slides.
Chapter Contents
3
q What is the need for BI (Business Intelligence)?
q What is Data Warehousing?
q Key Terminology to DWH Architecture
Ø OLTP Vs OLAP
Ø ETL
Ø Data Mart
Ø Metadata
q DWH Architecture
What is Business Intelligence (BI) ?
4
q Data, Information, Decision
q BI as a decision process
q BI as an information system
q Business Inteligence is a set of theories, methodologies, architectures, and
technologies that transforms raw data into meaningful and useful information
for business purposes.
q It provides reports but it can't predict future.
q Helps to answer the questions of current problem.
q Input- past data, Output- present solution.
Information Processing
5
Modification Reporting
Deletion Transactional Analytical Analysis
Processing Processing
(Focus on TYPES (Focus on
individual data multiple data
Data item processing) item processing) Decision
insertion support
Transmission Transformation
In this sense, BI focuses on
analytical data processing
What is Data Warehouse
6
q A data warehouse is a central repository of data management system that
collects, manages data from various sources designed to enable and support
business intelligence (BI) activities, especially analytics. It is used to help the
organization in taking decisions.
Ø A data warehouse centralizes and consolidates large amounts of data
from multiple sources.
Ø They store current and historical data in one single place.
Ø Data in the data warehouse must have strong analytical
characteristics.
Ø Creating data to be analytical requires that it be subject- oriented,
integrated, time-referenced, and non-volatile.
Ø Support information processing by providing a solid platform
Ø of consolidated, historical data for analysis.
Data Warehouses Vs Operational Database Systems
7
q A special database system called data warehouse or data mart is often used to
store enterprise data.
Ø The purpose of a data warehouse is to organize lots of stable data for ease of
analysis and retrieval.
Ø It deals with analyzing data and making decisions, often major, about how
the enterprise will operate now, and in the future.
q OLAP: On Line Analytical Processing: Describes processing at warehouse
q Traditional (operational) relational databases facilitate data management and
transaction processing. They have two limitations for data analysis and decision
support
Ø Performance
ü They are transaction oriented (data insert, update, move, etc.)
ü Not optimized for complex data analysis
ü Usually do not hold historical data
Ø Heterogeneity
ü Individual databases usually manage data in very different ways, even
in the same organization (not to mention external data sources which
may be dramatically different).
q OLTP: On Line Transaction Processing: Describes processing at operational
sites
Data Warehouses Vs Operational Database Systems -
Functional point of view
8
Key Data warehouse Operational Database
Basic A data warehouse is a repository for Operational Database are those databases
structured, filtered data that has where data changes frequently
already been processed for a specific
purpose
Data Structure Data warehouse has denormalized It has normalized schema
schema
Transaction Optimized for bulk loads and large Optimized for a common and known set
Optimization complex, unpredictable queries. of transactions.
Performance It is fast for analysis queries It is slow for analytics queries
Type of Data It focuses on historical data It focuses on current transactional data
Uses Case It is used for OLAP It is used for OLTP
Data Updates Batch updates Continuous updates
Query Handling Usually very complex queries Simple to complex queries
8
Design point of view
9
ER Diagram
Star Schema
9
Why Do We Need Data Warehouses?
10
q Consolidation of information resources
q Improved query performance
q Separate research and decision support functions from the operational systems
q Foundation for data mining, data visualization, advanced reporting and OLAP
tools
q Example of Data Warehousing:
Ø E-commerce website. They have a warehouse. Where the organization project the
demand for the products, that were to procure from the supplier and store it in the
warehouse. The good is dispatched by the supplier.
Ø The customers directly interact with the e-commerce website for the product. The
managers of these e-commerce business houses directly orders the supplier and get
the product.
Ø The products come to the business houses from different sources by maintaining
different databases.
ü Grocery items (database1)
ü Fashion items like clothes, shoes, cosmetics (database2)
ü Computer, laptops, mobile and electronic goods (database3)
10
Data Warehouses Example...
11
Ø Now , lets consider 100 suppliers will contact the manufacturer for a specific
product which is demanded by the consumers, imagine the strain that is in
manufacturer to supply those products.
Ø From the customers prospective, there is a delay in getting the product which you
will be never acceptable.
Ø If you place an order in an e-commerce website, you must except the product
should be deliver in 24-48 hrs. If it is delayed by 2-3 days it impact the customers
satisfaction levels. So the business house may lose the customers.
11
Applications of Data Warehousing
12
Sector Usage
Airline Helps in airline system management operations like crew
assignment, analyzes of route, frequent flyer program
discount schemes for passenger, etc.
Banking It is used in the banking sector to manage the resources
available on the desk effectively.
Healthcare sector Used to strategize and predict outcomes, create patient's
treatment reports, etc. Advanced machine learning, big data
enable datawarehouse systems can predict ailments.
Insurance sector Used to analyze data patterns, customer trends, and to
track market movements quickly.
Retail chain Helps you to track items, identify the buying pattern of the
customer, promotions and also used for determining
pricing policy.
Telecommunication Used for product promotions, sales decisions and to
make distribution decisions.
12
Database vs Data Warehousing
13
Database Data Warehouse
Purpose Is designed to record Is designed to analyze
Processing Method The database uses the Online Transactional Data warehouse uses Online Analytical
Processing (OLTP) Processing (OLAP).
Usage The database helps to perform fundamental Data warehouse allows you to analyze your
operations for your business business.
Tables and Joins Tables and joins of a database are complex as Table and joins are simple in a data
they are normalized. warehouse because they are denormalized.
Orientation Is an application-oriented collection of data It is a subject-oriented collection of data
Storage limit Generally limited to a single application Stores data from any number of applications
Availability Data is available real-time Data is refreshed from source systems as
and when needed
Usage ER modeling techniques are used for designing. Data modeling techniques are used for
designing.
Technique Capture data Analyze data
Data Type Data stored in the Database is up to date. Current and Historical Data is stored in Data
Warehouse. May not be up to date.
Storage of data Flat Relational Approach method is used for Data Ware House uses dimensional and
data storage. normalized approach for the data structure.
Example: Star and snowflake schema.
Query Type Simple transaction queries are used. Complex queries are used for analysis
purpose.
Data Summary Detailed Data is stored in a database. It stores highly summarized data.
13
Comparison of OLTP and OLAP Systems
14
Feature OLTP OLAP
Characteristics Operational Processing Informational processing
Orientation Transaction Analysis
User Clerk,DBA, Database Knowledge worker (manager, Executive,
professional Analyst)
Function day-to-day operation long-term informational
requirements decision support
DB design ER-based, application oriented star/snowflake, subject oriented
Data current, guaranteed up to date historic, accuracy maintained over time
Summarization Primitive, highly detailed summarized, consolidated
view detailed, flat relational summarized, multidimensional
Unit of work short, simple transaction complex query
Access read and write mostly read
Focus data in information out
Operations index / hash on primary key lots of scans
# of records access tens millions
Number of users thousands hundreds
DB size GB to High order GB >= TB
Piority High performance and high- High flexibility, end-user autonoy
availability
Metric Transaction throughput query throughput, response time
14
Examples of OLTP and OLAP
15
q OLTP
Ø A railway reservation server which records the transaction of the passengers.
Ø A supermarket server which records the every product
Ø purchased from the market.
Ø A bank server which records every time the transactions made by any
account
q OLAP
Ø An insurance company wants to know the number of policies each agent has
sold.
Ø A bank Manager wants know how many customers are utilizing the ATM of
that branch.
15
Examples of OLTP and OLAP...
16
q OLTP- To record each and every transactions. Take a real life example of ATM.
Every transaction is recorded in a OLTP system. ATM is not the only feeder to this
OLTP system. When you will go to the bank and perform some transaction that is
also recorded. So there are multiple sources to feed to this OLTP. The
disadvantages are:
Ø To have a query, one has to combine this multiples sources which
Ø are different formatting of their own.
Ø The number of transactions. (100s of customers, 1000s of queries)
Ø So this only to record the transactions.
Ø Railway Reservation System:
ü For reservation, one can opt internet, mobile, railway station.
ü Can go to 'n' number of agents spread across the town. These are multiple
desperate sources. When u will use internet to book the ticket the format is totally
different from u use a mobile.
ü The data types are changing. So this multiple desperate sources is difficult for
queering. It is an hindrance to Analytical Processing. So for reporting at the end
of the day we need a OLAP system. It is an alternate system.
16
Operational vs. Informational Systems
17
Information Access Today
17
Operational vs. Informational Systems
18
Ord. Operational
Mafg.
Entry Systems
Information Access Today
18
Operational vs. Informational Systems
19
Operational
Systems
Informational
Systems
Information Access Today
19
Operational vs. Informational Systems
20
Operational
Systems
Informational
Estimating Marketing Product
& Analysis Systems Planning
Systems
Information Access Today
20
Operational vs. Informational Systems
21
Operational
Systems
Information
Delivery System
Informational
Systems
Information Access Today
21
Operational vs. Informational Systems
22
Operational
Systems
Information
Data Warehousing is fundamentally
Delivery System
an issue of Enterprise Data Architecture
Informational
Systems
Information Access Today
22
Operational vs. Informational Systems
23
Information
Delivery System
Informational
Systems
23
Operational vs. Informational Systems
24
Operational
Systems
Data
Information
Data Warehouse
Delivery System
Marts
Informational
Systems
24
Operational vs. Informational Systems
25
Information
Delivery System
25
Operational vs. Informational Systems
26
Information
Delivery System
26
Three-tier Data Warehousing Architecture
27
27
Traditional Data Warehousing and Business Intelligence
28
28
Traditional Data Warehousing and Business Intelligence
29
q Data Source Layer: defining which data will be loaded into the system and analyzed.
Ø Text Files
Ø OLTP, Databases
Ø XML
Ø JSON
Ø Spreadsheet Files
q Source Data Examples
Ø Retail POS system
Ø Web Site
Ø DBMS
29
Traditional Data Warehousing and Business Intelligence
30
q ETL (Extract, Transform, and Load) and Staging Layer:
– Extraction : accessing and extracting the data from the source systems,
including database, flat files, spreadsheets, etc.
– Transformation : data cleanse, change the extracted data to a format and structure that
conform to the destination data.
– Loading : load the data to the destination database, and check for data integrity
• Tools to move data to staging DB
• Staging DB is a temporary storage to be loaded to DWH
• Staging DB could be operational reporting tool/platform
30
Traditional Data Warehousing and Business Intelligence
31
q Data Warehouse:-
Ø Used for reporting
Ø A scalable DB storing historical enterprise data
Ø Online Analytical processing
Ø Not used for transaction processing
31
Traditional Data Warehousing and Business Intelligence
32
q Access Layer:-
Ø Data Mart for business fast query (Star Schema)
Ø OLAP uses a multidimensional data model, allowing for complex
Ø analytical and adhoc queries with a rapid execution time
Ø Data mining for mostly in structured data format
32
Traditional Data Warehousing and Business Intelligence
33
q Presentation Layer:-
Ø Need to gather requirements from Business Units for Visualization and Touch points
Ø Need to identify data sources and method to deliver results
Ø Enterprise dashboards, reports and alerts that present findings from the
Ø analysis
33
Data Warehouse Models
34
q Three Data Warehouse Models
Ø Enterprise warehouse
ü collects all of the information about subjects spanning the entire
organization
Ø Data Mart
ü a subset of corporate-wide data that is of value to a specific groups of
users. Its scope is confined to specific, selected groups, such as
marketing data mart
ü Independent vs. dependent (directly from warehouse) data mart
Ø Virtual warehouse
ü A set of views over operational databases
ü Only some of the possible summary views may be
ü materialized
34
Data Marts
35
q Data mart is a smaller version of the
Datawarehouse.
q Data mart deal with a signle subject.
q Data marts are focused on one area and they
draw data from a limited number of sources.
q Time taken to build the data is very low
compared to the time taken to build a
Datawarehouse.
q Data Mart helps to enhance user's response
time due to reduction in volume of data
q It provides easy access to frequently requested
data.
q Type of Data Mart
Ø Dependent: Dependent data marts are created by drawing data directly from
operational, external or both sources.
Ø Independent: Independent data mart is created without the use of a central
data warehouse.
Ø Hybrid: This type of data marts can take data from data warehouses or
operational systems.
35
From the Data Warehouse to Data Marts
36
q Data Mart – A logical subset of the complete data warehouse. Often viewed as a
restriction of the data warehouse to a single business process or to a group of
related business processes targeted toward a particular business group.
36
A recommended approach for data warehouse development.
37
37
Dimensions of Data Warehousing
38
Performance
Security
Connection to
the Operational Data
Ease of
Use Flexibility
Distributed Data
Quality
Scalability
38
Extraction, Transformation, and Loading (ETL)
39
q Data extraction
Ø get data from multiple, heterogeneous, and external sources
q Data cleaning
Ø detect errors in the data and rectify them when possible
q Data transformation
Ø convert data from legacy or host format to warehouse format
q Load
Ø sort, summarize, consolidate, compute views, check integrity, and build
indicies and partitions
q Refresh
Ø propagate the updates from the data sources to the warehouse
39
Recommended Text and Reference Books
40
q Text Book:
Ø J. Han and M. Kamber. Data Mining: Concepts and Techniques. Morgan
Kaufmann, 3rd ed., 2011
q Reference Books:
Ø H. Dunham. Data Mining: Introductory and Advanced Topics. Pearson
Education. 2006.
Ø I. H. Witten and E. Frank. Data Mining: Practical Machine Learning Tools
and Techniques. Morgan Kaufmann. 2000.
Ø D. Hand, H. Mannila and P. Smyth. Principles of Data Mining.Prentice-Hall.
2001.
40
41