Data Warehouse &
Data Mining
Database and Datawarehouse
• DWH constitute entire information base for all time
• DB constitute real time information
• DWH supports DB and Business Intelligence
• DB is used to run the business
• DWH is how to run the business
Database and Big data
Database Big Data
Volume GB TB or PB
Generate rate Per day, per hour Per second and per minisecond
Structure Structured Unstructured, semistructured
Data Source Centralized Distributed
Data integration easy Difficult
Data Store RDBMS HDFS, noSQL
Data Structure Static Schema Dynamic Schema
A producer wants to Know
Data, Data everywhere yet…
What is a Datawarehouse?
Data Warehousing- A process
What is a data warehousing
Structured Data
• Fixed Format
• Can be accessed from database by simple search engine algorithm.
• Ex: Employee Table: Emp details, salary, job position etc can be found.
Semistructured
• Personal Data in XML file
Unstructured Data
• Image
• Video
• Audio
• CCTV Survillience
Characteristics of Datawarehouse
Data Marts: Structured and
Unstructured Data
Data Query
Making A Data warehouse:
Extraction
ETL: Transformation Process
ETL Process: Loading
Data Warehouse Architecture
OLAP Vs OLTP
OLAP Operations
• OLAP databases are
divided into one or more
cubes and these cubes
are known as Hyper-
cubes.
• It is based on
multidimensional data
model and allows the user
to query on multi-
dimensional data (eg.
Delhi -> 2018 -> Sales
data).
OLAP Operations
• Drill Down
• In drill-down operation, the less
detailed data is converted into
highly detailed data. It can be
done by:Moving down in the
concept hierarchy
• Adding a new dimension
• In the cube given in overview
section, the drill down operation
is performed by moving down in
the concept hierarchy
of Time dimension (Quarter ->
Month).
OLAP Operations
• Roll Up
• It is just opposite of the drill-
down operation. It performs
aggregation on the OLAP cube.
It can be done by:Climbing up in
the concept hierarchy
• Reducing the dimensions
• In the cube given in the
overview section, the roll-up
operation is performed by
climbing up in the concept
hierarchy of Location dimension
(City -> Country).
OLAP Operations
• Slice
• It selects a single
dimension from the OLAP
cube which results in a
new sub-cube creation. In
the cube given in the
overview section, Slice is
performed on the
dimension Time = “Q1”.
OLAP Operations
• Dice
• It selects a sub-cube from the
OLAP cube by selecting two or
more dimensions. In the cube
given in the overview section,
a sub-cube is selected by
selecting following
dimensions with
criteria:Location = “Delhi” or
“Kolkata”
• Time = “Q1” or “Q2”
• Item = “Car” or “Bus”