Course Name: Data Warehousing and Mining
Subject Code:24MCA012C03
Program: MCA 2nd
by
Aadil Ahmad
Introduction to Data Warehousing
A data warehouse is a centralized repository for storing large volumes of
structured, unstructured and historical data from multiple sources. It enables
data analysis, reporting, and decision-making.
Purpose:
• Integrates data from different sources like databases, applications, and
external systems.
• Provides a single version of truth for business intelligence and analytics.
Importance:
• Facilitates strategic decision-making.
• Improves data quality and consistency.
• Supports advanced analytics and reporting tools.
www.ctuniversity.in Campus : Ludhiana
www.ctuniversity.in Campus : Ludhiana
www.ctuniversity.in Campus : Ludhiana
Decision Support Systems (DSS)
A Decision Support System (DSS) is a computer-based information system
designed to assist in decision-making by analysing large volumes of data and
providing actionable insights.
Purpose
Supports semi-structured and unstructured decision-making processes.
Enhances managerial decision-making with data-driven insights.
Components
Data Management: Stores and retrieves relevant data from internal and
external sources.
Model Management: Includes tools and algorithms for data analysis (e.g.,
forecasting, optimization).
User Interface: Enables interaction with the system through dashboards,
reports, and queries.
www.ctuniversity.in Campus : Ludhiana
www.ctuniversity.in Campus : Ludhiana
Benefits and Real-World Applications
Benefits
• Improves decision efficiency and accuracy.
• Supports scenario analysis and predictive modeling.
• Facilitates better strategic planning.
Applications
• Healthcare: Clinical decision support systems.
• Finance: Risk analysis and portfolio management.
• Supply Chain: Inventory optimization.
• Marketing: Customer segmentation and campaign analysis.
www.ctuniversity.in Campus : Ludhiana
Data Warehouse Modeling
Data Warehouse Modelling involves designing the structure of a data
warehouse to organize and integrate data effectively for analysis and reporting.
Steps
• Understand Requirements
• Design Conceptual Schema
• Develop Logical Schema
• Implement Physical Schema
• Optimize Performance
www.ctuniversity.in Campus : Ludhiana
www.ctuniversity.in Campus : Ludhiana
Granularity in the Data Warehouse
Granularity means the level of detail in the data stored in a data warehouse.
It shows how specific or summarized the data is.
Types of Granularity:
1. Fine Granularity:
– Data is very detailed (e.g., every transaction).
– Good For:
• In-depth analysis (e.g., daily sales for each product).
– Drawbacks:
• Needs more storage space.
• Slower to process large queries.
www.ctuniversity.in Campus : Ludhiana
Granularity in the Data Warehouse
2. Coarse Granularity:
– Data is summarized (e.g., total sales for a month).
– Good For:
• Quick reporting and analysis.
• Saves storage space.
– Drawbacks:
• Less detailed, so you can’t analyze individual transactions.
www.ctuniversity.in Campus : Ludhiana
www.ctuniversity.in Campus : Ludhiana
Granularity in the Data Warehouse
How to Decide?
• Think about:
– What you need: Do you need detailed data or just summaries?
– Storage: Do you have enough space for detailed data?
– Speed: Do you need fast reports?
Example:
• Fine Granularity: Daily sales for each store.
• Coarse Granularity: Monthly sales for each region
www.ctuniversity.in Campus : Ludhiana
www.ctuniversity.in Campus : Ludhiana
5 Data Warehouse Life Cycle 6
3
1
4 2
www.ctuniversity.in Campus : Ludhiana
www.ctuniversity.in Campus : Ludhiana
Types of Data warehouse Architecture
Single-Tier Architecture
• The objective of a single layer is to minimize the amount of data stored.
This goal is to remove data redundancy. This architecture is not frequently
used in practice.
www.ctuniversity.in Campus : Ludhiana
Single-Tier Architecture
www.ctuniversity.in Campus : Ludhiana
Types of Data warehouse Architecture
Two-Tier Architecture
• Two-layer architecture is one of the Data Warehouse layers which
separates physically available sources and data warehouse.
• This architecture is not expandable and also not supporting a large
number of end-users. It also has connectivity problems because of
network limitations.
www.ctuniversity.in Campus : Ludhiana
Two-Tier Architecture
www.ctuniversity.in Campus : Ludhiana
Types of Data warehouse Architecture
Three-Tier Architecture
It consists of the Top, Middle and Bottom Tier.
• Bottom Tier: The database of the Data warehouse servers as the bottom
tier. It is usually a relational database system. Data is cleansed,
transformed, and loaded into this layer using back-end tools.
• Middle Tier: The middle tier in Data warehouse is an OLAP. For a user, this
application tier presents an abstracted view of the database. This layer
also acts as a mediator between the end-user and the database.
• Top-Tier: The top tier is a front-end client layer. Top tier is the tools and
API that you connect and get data out from the data warehouse. It could
be Query tools, reporting tools, managed query tools, Analysis tools and
Data mining tools.
www.ctuniversity.in Campus : Ludhiana
Three-Tier Architecture
www.ctuniversity.in Campus : Ludhiana
Online Analytical Processing (OLAP)
OLAP is a technology used to analyze large volumes of data from different
perspectives for decision-making. It enables interactive, fast, and multi-
dimensional analysis.
Key Features of OLAP
• Multidimensional Analysis:
– Data is viewed across dimensions (e.g., time, product, region).
• Drill Down and Roll Up:
– Drill Down: View detailed data (e.g., sales by city).
– Roll Up: View summarized data (e.g., sales by country).
• Slice and Dice:
– Focus on specific data by selecting dimensions (e.g., sales in 2023 for
Product A).
www.ctuniversity.in Campus : Ludhiana
www.ctuniversity.in Campus : Ludhiana
Types of OLAP Systems
1. MOLAP (Multidimensional OLAP):
– Stores data in a cube format. Well suited for scenarios where data
does not frequently change
– Advantage: Very fast for pre-aggregated queries.
– Example: Analyzing sales trends across time and regions.
2. ROLAP (Relational OLAP):
– Works directly with relational databases. Well suited for large datasets
and scenarios where data is subject to frequent updates.
– Advantage: Handles large datasets.
– Example: Analyzing data stored in relational tables.
3. HOLAP (Hybrid OLAP):
– Combines MOLAP and ROLAP features.
– Advantage: Balances performance and scalability.
www.ctuniversity.in Campus : Ludhiana