Yadavrao Tasgaonkar Institute of Engineering & Technology, Karjat

Semester-V A.Y. 2025-26 Subject: Data Warehousing and Mining

Lecture Notes

Q.1 Explain data warehouse features.

Ans. i. A data warehouse is characterized by four key features:

1. Subject-oriented data
2. Integrated data
3. Time-variant data
4. Non-volatile data

ii. Let us examine each of these defining features of a data warehouse. How is this data
different from the data in any operational system?

• Subject-oriented Data:

- In every industry, data sets in operational systems are organized around individual
applications to support those particular operational systems.
- In contrast, data in the data warehouse is stored by business subject, not by
application. Business subjects differ from enterprise to enterprise. The fig. below
distinguishes between how data is stored in operational systems and in the data
warehouse.
- For example, claims is a critical business subject for an insurance company. Claims
under automobile insurance policies are processed in the Auto Insurance application.
- Similarly, claims data for workers’ compensation insurance is organized in the Workers’
Compensation Insurance application.
- In the data warehouse, claims data from both applications is stored together under the
single subject of claims.

• Integrated Data:
- For proper decision making, you need to pull together all relevant data from the various
applications.
- The fig. below illustrates a simple process of data integration for a banking
institution. The data fed into the subject area of account in the data warehouse comes
from three different operational applications.
- Before moving the data into the data warehouse, you have to go through a process of
transformation, consolidation and integration of the source data.

Here are some of the items that need standardization:

- Naming Conventions
- Codes
- Data Attributes
- Measurements
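
The standardization items listed above can be sketched in a few lines of Python. This is only an illustration, not a real ETL tool: the three source applications, their field names, the gender codes and the paise/rupee units are all hypothetical and chosen just to show naming conventions, codes and measurements being brought to one common format.

```python
# Hypothetical records from three operational banking applications.
# Each application uses its own naming convention, codes and units.
savings_rows = [{"cust_nm": "A. Rao", "sex": "M", "bal_paise": 1250000}]
checking_rows = [{"customer": "A. Rao", "gender": "male", "balance_rs": 800.50}]
loans_rows = [{"CustName": "S. Patil", "Gender": "F", "OutstandingRs": 45000}]

# One agreed code set for the warehouse (assumption: 'M' / 'F').
GENDER_CODES = {"m": "M", "male": "M", "f": "F", "female": "F"}


def standardize(name, gender, amount_rupees, source):
    """Return one record in the warehouse's common format."""
    return {
        "customer_name": name.strip(),
        "gender": GENDER_CODES[gender.strip().lower()],
        "amount_rupees": round(float(amount_rupees), 2),
        "source_system": source,
    }


def integrate():
    """Transform rows from all three sources into the 'account' subject area."""
    account = []
    for r in savings_rows:
        # Measurement fix: this source stores paise, the warehouse stores rupees.
        account.append(standardize(r["cust_nm"], r["sex"], r["bal_paise"] / 100, "savings"))
    for r in checking_rows:
        account.append(standardize(r["customer"], r["gender"], r["balance_rs"], "checking"))
    for r in loans_rows:
        account.append(standardize(r["CustName"], r["Gender"], r["OutstandingRs"], "loans"))
    return account


account_rows = integrate()
for row in account_rows:
    print(row)
```

After integration every row has the same attribute names, the same gender codes and the same unit of measurement, whichever application it came from.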
• Time-variant Data:
- In an operational system, the stored data reflects current information because these
systems support day-to-day operations.
- A data warehouse, by the very nature of its purpose, has to contain historical
data, not just current values.
- Every data structure in the data warehouse contains a time element.
- The time-variant nature of the data in a data warehouse:
  - Allows analysis of the past.
  - Relates information to the present.
  - Enables forecasts for the future.

• Non-volatile Data:
- As shown in the fig. below, business transactions update the operational system
databases in real time, but not every business transaction updates the data warehouse.
- We add, change or delete data in an operational system database as each transaction
happens, but we do not usually update the data in the data warehouse in that way.
- You don’t delete data in the data warehouse in real time.
- The data in the data warehouse is therefore not as volatile as the data in an operational database.
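
The contrast between a volatile operational store and a time-variant, non-volatile warehouse can be sketched as follows. This is a simplified illustration under assumed names: the account id, the amounts and the monthly snapshot dates are made up, and a real warehouse load would be far more involved.

```python
from datetime import date

# Operational system: keeps only the current balance per account (volatile).
operational = {"ACC-1": 500}

# Data warehouse: append-only history; every row carries a time element.
warehouse = []


def apply_transaction(account_id, amount):
    """A business transaction overwrites the current value in place."""
    operational[account_id] += amount


def load_snapshot(snapshot_date):
    """Periodic load: append dated copies, never update or delete old rows."""
    for account_id, balance in operational.items():
        warehouse.append(
            {"account_id": account_id, "balance": balance, "as_of": snapshot_date}
        )


load_snapshot(date(2025, 7, 1))
apply_transaction("ACC-1", -200)   # operational value changes immediately
apply_transaction("ACC-1", 50)
load_snapshot(date(2025, 8, 1))    # warehouse gains a new row; the old row is kept

print(operational)        # only the current balance
for row in warehouse:     # the dated history
    print(row)
```

The operational dictionary ends up holding one current value, while the warehouse list keeps both dated rows, which is what makes analysis of the past possible.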

Q.2 What is prediction? Explain the linear regression method.

Ans. Prediction: In the context of linear regression, prediction is the process of
estimating the value of a dependent variable from the values of one or more independent
variables, assuming a linear relationship exists between them. Consider, for example,
y = kx
where x is the known value, y is the value to be predicted and k is a constant.

Linear Regression Method for Prediction

• Modeling the Relationship:
1. Linear regression establishes a mathematical model that describes the relationship
between the dependent variable (the variable to be predicted, often denoted as ‘y’)
and the independent variables (the predictor variables, often denoted as ‘x’).
2. This relationship is represented by a linear equation. For simple linear regression
(one independent variable):
y = a + bx
3. For multiple linear regression (multiple independent variables):
y = a + b1x1 + b2x2 + ... + bnxn
4. In these equations, ‘a’ is the y-intercept: the value of y when all x variables are
zero. Each ‘b’ is a slope: the change in y per unit change in the corresponding x.

• Training the Model:

1. The linear regression model is “trained” using a dataset of known values for both the
independent and dependent variables.
2. During this training phase, statistical methods are used to determine the values
of ‘a’ and ‘b’ (b1, b2, ..., bn) that best fit the observed data, minimizing the difference
between the actual ‘y’ values and the predicted ‘y’ values.
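
The notes do not name a specific statistical method. A common choice is ordinary least squares, which picks ‘a’ and ‘b’ so that the sum of squared differences between actual and predicted ‘y’ values is as small as possible. The sketch below assumes that method for the simple case of one independent variable; the function names and the training data are illustrative only.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (a, b) for y = a + b*x using ordinary least squares."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # b = sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x)^2)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def sum_squared_error(xs, ys, a, b):
    """The quantity least squares minimizes: sum of (actual - predicted)^2."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


# Illustrative training data lying exactly on y = 1 + 2x.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
a, b = fit_simple_linear_regression(xs, ys)
print(a, b)
print(sum_squared_error(xs, ys, a, b))
```

Because the sample points lie exactly on a line, the fitted values recover that line and the error is zero; with real, noisy data the error would be small but not zero.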

• Making the Prediction:

1. Once the model is trained and the parameters (a and b) are determined, you can use the
model to predict the value of the dependent variable.
2. You simply plug the new ‘x’ values into the established linear equation, and the equation
outputs the predicted ‘y’ value.
Example:
If you have historical data on advertising spend (independent variable) and the corresponding
sales figures (dependent variable), linear regression can be used to find the linear
relationship between them and then to predict sales for a planned level of advertising spend.

Q.3 Explain with example any four OLAP operations.

Ans. A combination of multiple types of technologies is needed for building a data warehouse.
- The range is wide: data modelling, data extraction, data transformation, database
management systems, control modules, alert system agents, query tools, analysis tools,
report writers and so on.
- There is no scarcity of vendors and products.
- These multivendor products have to cooperate and work together in your data warehouse.
- When you use a database from one vendor, the query and reporting tool from another
vendor, and the OLAP product from yet another vendor, these three products have no standard
method for exchanging data.

OLAP (Online Analytical Processing) is a technology used in data warehousing to enable fast,
multi-dimensional analysis of large datasets.
1. It is a computing technology that provides fast, consistent, and interactive access to data for
analysis.
2. It is typically used with a data warehouse but can also be used with data lakes.
3. The core of the OLAP data model is multi-dimensional, which allows users to explore data along
different dimensions (time, product, geography).
4. It pre-calculates and aggregates data to enable quick query responses.

How OLAP works with data warehouse:

- Data is extracted from various sources and loaded into the data warehouse.
- The data warehouse is optimized for analytical queries, not for real-time transactions.
- OLAP servers are then connected to the data warehouse and provide tools for users to query,
analyze and report on the data.
- Users can use OLAP tools to perform operations such as rolling up to a higher level of
aggregation, drilling down into detail, and slicing and dicing data by different dimensions.
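
Four of these operations (roll-up, drill-down, slice and dice) can be illustrated on a tiny set of sales facts. The sketch below uses plain Python rather than a real OLAP server; the data, the column names and the helper functions `aggregate`, `slice_` and `dice` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical fact rows: (year, quarter, product, city, sales)
FACTS = [
    (2024, "Q1", "Laptop", "Mumbai", 100),
    (2024, "Q1", "Phone", "Mumbai", 80),
    (2024, "Q2", "Laptop", "Pune", 120),
    (2024, "Q2", "Phone", "Mumbai", 90),
    (2025, "Q1", "Laptop", "Pune", 150),
    (2025, "Q1", "Phone", "Pune", 70),
]
COLUMNS = ("year", "quarter", "product", "city", "sales")
ROWS = [dict(zip(COLUMNS, f)) for f in FACTS]


def aggregate(rows, dims):
    """Sum sales grouped by the given dimension columns."""
    totals = defaultdict(int)
    for r in rows:
        totals[tuple(r[d] for d in dims)] += r["sales"]
    return dict(totals)


def slice_(rows, dim, value):
    """Slice: fix ONE dimension to a single value."""
    return [r for r in rows if r[dim] == value]


def dice(rows, **allowed):
    """Dice: keep rows whose values fall in the allowed sets of SEVERAL dimensions."""
    return [r for r in rows if all(r[d] in vals for d, vals in allowed.items())]


# Roll-up: from quarter level up to year level (less detail).
by_year = aggregate(ROWS, ["year"])
# Drill-down: from year level down to year + quarter level (more detail).
by_year_quarter = aggregate(ROWS, ["year", "quarter"])
# Slice: only the product 'Laptop'.
laptop_rows = slice_(ROWS, "product", "Laptop")
# Dice: sub-cube for year 2024, product Phone, cities Mumbai or Pune.
sub_cube = dice(ROWS, year={2024}, product={"Phone"}, city={"Mumbai", "Pune"})

print(by_year)
print(by_year_quarter)
print(aggregate(laptop_rows, ["city"]))
print(aggregate(sub_cube, ["quarter"]))
```

Roll-up and drill-down change the level of detail along a dimension hierarchy, while slice and dice select a part of the cube by fixing one or several dimensions.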

Key benefits of OLAP:

- Faster query performance.
- Improved business intelligence.
- Enhanced reporting capabilities.
- Support for complex queries.

Q.4 Explain different OLAP operations on multi-dimensional data.

Ans. OLAP (Online Analytical Processing) utilizes a multi-dimensional data model, often represented
as a cube, to enable fast and interactive analysis of data.
- The multi-dimensional structure, with dimensions like time, product, and location, facilitates
quick access to summarized information and supports sophisticated business intelligence
and decision making.

Multi-dimensional Data Model (OLAP cubes):

- At the heart of OLAP is the concept of a multi-dimensional data cube, which is a data structure
designed for efficient storage and retrieval of data across multiple dimensions.

Dimensions:
- Dimensions represent the different perspectives from which data can be analyzed (e.g. time,
product, geography, customer).

Measures:
- Measures are the numerical values that are analyzed within the cube (e.g. sales
figures, profit margin).

How it works:
1. Data is pre-aggregated: OLAP systems pre-calculate and store aggregated data across
different dimensions, allowing for rapid query performance.
2. Users interact with the cube: users can “slice and dice” the data, drilling down into details or rolling
up to a higher level of summarization.
3. Complex calculations are performed efficiently: because data is pre-aggregated, OLAP systems
can handle complex calculations and aggregations much faster than a traditional relational
database.
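
The idea of pre-aggregation can be sketched as follows: totals for every combination of dimension values are computed once, so that a later query becomes a single lookup instead of a scan over the detail rows. This is a toy illustration with made-up data; `build_cube` and `lookup` are hypothetical names, and real OLAP servers use far more compact storage.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical detail rows: (year, product, region, sales)
DETAIL = [
    (2024, "Laptop", "West", 100),
    (2024, "Phone", "West", 80),
    (2024, "Laptop", "South", 120),
    (2025, "Laptop", "West", 150),
]
DIMS = ("year", "product", "region")


def build_cube(rows):
    """Pre-compute sales totals for every combination of dimension values.

    Keys look like (("product", "Laptop"), ("year", 2024)); the empty
    key () holds the grand total.
    """
    cube = defaultdict(int)
    for *dim_values, sales in rows:
        pairs = list(zip(DIMS, dim_values))
        for k in range(len(pairs) + 1):
            for subset in combinations(pairs, k):
                cube[tuple(sorted(subset))] += sales
    return dict(cube)


def lookup(cube, **fixed):
    """Answer an aggregate query with one dictionary lookup."""
    return cube.get(tuple(sorted(fixed.items())), 0)


CUBE = build_cube(DETAIL)
print(lookup(CUBE))                                # grand total
print(lookup(CUBE, product="Laptop"))              # one dimension fixed
print(lookup(CUBE, year=2024, region="West"))      # two dimensions fixed
```

The cost is paid once when the cube is built; the trade-off is extra storage for the stored aggregates.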

Example:
- Imagine a sales database. An OLAP cube could represent it with dimensions like “Time” (years,
quarters, months), “Product” (different product categories) and “Location” (regions).

Q.5 Demonstrate with diagram Data Mining Architecture.

Ans. - Data mining is the process of discovering interesting and useful knowledge from large
amounts of data stored in databases.
- Data mining is usually applied to a data warehouse.
- The fig. shows the architecture of a data mining system. Its components are described below.

Database, Data Warehouse or other information repository:

- This includes one or more databases, a data warehouse, or any other information repository.
- Data cleaning and data integration techniques have to be applied to the data before a data
mining algorithm can be applied to it.

Database or Data Warehouse Server:

- The database or data warehouse server fetches the relevant data, based on the user’s data
mining request.

Knowledge base:
- It contains the domain knowledge that is used to guide the search or used for evaluation of
the interestingness of resulting patterns.

Data Mining Engine:

- It consists of a set of functional modules for tasks such as characterization, association,
classification, cluster analysis and evolution analysis.

Pattern Evaluation Module:

- This component employs interestingness measures and other threshold values and
interacts with the data mining engine so as to retrieve only the interesting results.
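
As a rough sketch of this filtering step, the code below scores candidate association rules by support and confidence, two commonly used interestingness measures, and keeps only those that pass both thresholds. The transactions, the candidate rules and the threshold values are made up, and the notes themselves do not say which measures a given system uses.

```python
# Hypothetical market-basket transactions.
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]


def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    hits = sum(1 for t in TRANSACTIONS if itemset <= t)
    return hits / len(TRANSACTIONS)


def confidence(antecedent, consequent):
    """support(antecedent and consequent) / support(antecedent)."""
    base = support(antecedent)
    if base == 0:
        return 0.0  # the antecedent never occurs, so the rule has no evidence
    return support(antecedent | consequent) / base


def evaluate(candidate_rules, min_support, min_confidence):
    """Keep only rules that pass both interestingness thresholds."""
    interesting = []
    for antecedent, consequent in candidate_rules:
        s = support(antecedent | consequent)
        c = confidence(antecedent, consequent)
        if s >= min_support and c >= min_confidence:
            interesting.append((sorted(antecedent), sorted(consequent), s, c))
    return interesting


# Candidate rules as they might arrive from the data mining engine.
candidates = [
    ({"bread"}, {"milk"}),
    ({"butter"}, {"milk"}),
    ({"milk"}, {"bread"}),
]
results = evaluate(candidates, min_support=0.5, min_confidence=0.7)
for rule in results:
    print(rule)
```

With these thresholds the rule from butter to milk is dropped because too few transactions contain both items, while the two bread and milk rules are kept.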

Graphical User Interface (GUI):

- This module sits between the user and the data mining system, allowing the user to
use the system by specifying a query or a task such as characterization,
association, classification or cluster analysis.

Q.6 Explain with example (1) Star Schema and (2) Snowflake Schema.
Ans. - In this section, we will use the fact and dimension tables to prepare the logical design of the
data warehouse.
- Dimensional modelling is the technique used for designing tables for a data warehouse.
- It is a design technique to structure the business dimensions and metrics.
- Once the fact and dimension tables have been formed, the question is how these tables should be
arranged in the dimensional model. The star schema and the snowflake schema are two such arrangements.

The Star Schema:

- Imagine a dimensional model with the fact table in the middle and the dimension tables
arranged around the fact table.
- This model represents a star formation, with the fact table at the core and the dimension tables along
the points of the star.
- This particular arrangement is thus called a star schema.
- Example: the fig. shows a simple star schema. It has an order fact table in the middle and four
dimension tables: customer, salesperson, time and product.
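
A star schema like the one described can be sketched with Python's built-in `sqlite3` module. The table names, column names and sample rows below are assumptions made for illustration; only the overall shape (one order fact table surrounded by customer, salesperson, time and product dimensions) comes from the notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_customer    (customer_key INTEGER PRIMARY KEY, customer_name TEXT);
CREATE TABLE dim_salesperson (salesperson_key INTEGER PRIMARY KEY, salesperson_name TEXT);
CREATE TABLE dim_time        (time_key INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
CREATE TABLE dim_product     (product_key INTEGER PRIMARY KEY, product_name TEXT, category TEXT);

-- Fact table in the middle: one foreign key per dimension plus the metrics.
CREATE TABLE fact_order (
    customer_key    INTEGER REFERENCES dim_customer(customer_key),
    salesperson_key INTEGER REFERENCES dim_salesperson(salesperson_key),
    time_key        INTEGER REFERENCES dim_time(time_key),
    product_key     INTEGER REFERENCES dim_product(product_key),
    quantity        INTEGER,
    order_amount    REAL
);

INSERT INTO dim_customer    VALUES (1, 'Asha'), (2, 'Ravi');
INSERT INTO dim_salesperson VALUES (1, 'Meera');
INSERT INTO dim_time        VALUES (1, 2025, 1), (2, 2025, 2);
INSERT INTO dim_product     VALUES (1, 'Laptop', 'Electronics'), (2, 'Desk', 'Furniture');
INSERT INTO fact_order      VALUES (1, 1, 1, 1, 2, 1000.0),
                                   (2, 1, 1, 2, 1, 300.0),
                                   (1, 1, 2, 1, 1, 500.0);
""")

# A typical star join: the fact table joins directly to each dimension it needs.
rows = conn.execute("""
    SELECT p.category, t.month, SUM(f.order_amount)
    FROM fact_order f
    JOIN dim_product p ON p.product_key = f.product_key
    JOIN dim_time t    ON t.time_key = f.time_key
    GROUP BY p.category, t.month
    ORDER BY p.category, t.month
""").fetchall()
print(rows)
```

Every dimension is exactly one join away from the fact table, which is what keeps queries against a star schema simple.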

The Snowflake Schema:

- It is a variation of the star schema model in which some or all dimension tables are
normalized, thereby further splitting the data into additional tables.
- The resulting schema graph forms a shape similar to a snowflake.
- Thus, the snowflake schema is a more complex data warehouse model than the star schema.
- Example: a product dimension table that also holds category details can be normalized into a
product table and a separate category table. Snowflake schemas are generally used when a
dimension table becomes very big and when a star schema can’t represent the complexity of the
data structure.
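
Normalizing a dimension can be sketched with Python's built-in `sqlite3` module. In this hypothetical example the category attributes are moved out of the product dimension into a table of their own; all table names, column names and rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- The category attributes move out of the product dimension into their own table.
CREATE TABLE dim_category (category_key INTEGER PRIMARY KEY, category_name TEXT);
CREATE TABLE dim_product  (
    product_key  INTEGER PRIMARY KEY,
    product_name TEXT,
    category_key INTEGER REFERENCES dim_category(category_key)
);
CREATE TABLE fact_order (
    product_key  INTEGER REFERENCES dim_product(product_key),
    order_amount REAL
);

INSERT INTO dim_category VALUES (1, 'Electronics'), (2, 'Furniture');
INSERT INTO dim_product  VALUES (1, 'Laptop', 1), (2, 'Phone', 1), (3, 'Desk', 2);
INSERT INTO fact_order   VALUES (1, 1000.0), (2, 400.0), (3, 300.0);
""")

# Reaching the category takes two joins: fact -> product -> category.
snowflake_rows = conn.execute("""
    SELECT c.category_name, SUM(f.order_amount)
    FROM fact_order f
    JOIN dim_product p  ON p.product_key = f.product_key
    JOIN dim_category c ON c.category_key = p.category_key
    GROUP BY c.category_name
    ORDER BY c.category_name
""").fetchall()
print(snowflake_rows)
```

Each category name is stored only once, which saves space in a large dimension, at the price of an extra join whenever a query needs it.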
