DATA MINING AND WAREHOUSE
T. Dilly Babu
1. Data Warehouse Architecture :
❖ A data warehouse is a system used for reporting and data
analysis.
❖ Its architecture typically has three tiers.
Bottom Tier:
A database server where data is stored.
Middle Tier:
An OLAP server that helps analyze data quickly.
Top Tier:
The front-end tools used by users to get reports or analyze data.
2. ETL (Extract, Transform, Load) Process :
❖ ETL stands for Extract, Transform, and Load.
❖ It is the process of:
➢ Extracting data from different sources.
➢ Transforming it into a proper format (cleaning and organizing it).
➢ Loading it into a data warehouse.
❖ This process is essential for making sure the data is
accurate and ready for analysis.
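The three ETL steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the CSV text, the column names, the `sales` table, and the in-memory SQLite database standing in for the warehouse are all hypothetical and not part of the original notes.

```python
import csv
import io
import sqlite3

# Hypothetical raw export from a source system (assumed data, for illustration only).
RAW_CSV = """order_id,customer,amount
1, alice ,100.50
2,BOB,
3,carol,75
"""

def extract(text):
    """Extract: read rows from a CSV source into dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: trim and normalize names, drop rows with a missing amount."""
    cleaned = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue  # incomplete record, skipped in this sketch
        cleaned.append(
            (int(row["order_id"]), row["customer"].strip().title(), float(amount))
        )
    return cleaned

def load(rows, conn):
    """Load: write the cleaned rows into a warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (order_id INTEGER, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")  # in-memory database stands in for the warehouse
load(transform(extract(RAW_CSV)), conn)
etl_rows = conn.execute(
    "SELECT order_id, customer, amount FROM sales ORDER BY order_id"
).fetchall()
print(etl_rows)
```

Real ETL pipelines usually log or repair incomplete rows instead of silently dropping them; dropping is chosen here only to keep the sketch short.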
3. OLAP (Online Analytical Processing) :
❖ OLAP stands for Online Analytical Processing.
❖ OLAP is a tool that helps users analyze data quickly in many ways.
❖ It allows for viewing data along different dimensions.
❖ OLAP makes it easier to find patterns or trends in large datasets.
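Viewing the same data along different dimensions can be illustrated with a small roll-up and slice in plain Python. This is a rough sketch, not how an OLAP server is implemented: the sales facts, the dimension names, and the helper functions `roll_up` and `slice_facts` are assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical sales facts: (region, product, quarter, amount).
FACTS = [
    ("North", "Laptop", "Q1", 1200),
    ("North", "Phone", "Q1", 800),
    ("South", "Laptop", "Q1", 900),
    ("South", "Laptop", "Q2", 1100),
    ("North", "Phone", "Q2", 700),
]
# Position of each dimension inside a fact tuple.
DIMENSIONS = {"region": 0, "product": 1, "quarter": 2}

def roll_up(facts, *dims):
    """Sum the amount measure grouped by the chosen dimensions."""
    totals = defaultdict(int)
    for fact in facts:
        key = tuple(fact[DIMENSIONS[d]] for d in dims)
        totals[key] += fact[3]
    return dict(totals)

def slice_facts(facts, dim, value):
    """Keep only the facts where one dimension has a fixed value."""
    return [f for f in facts if f[DIMENSIONS[dim]] == value]

by_region = roll_up(FACTS, "region")
by_product_quarter = roll_up(FACTS, "product", "quarter")
q1_by_region = roll_up(slice_facts(FACTS, "quarter", "Q1"), "region")
print(by_region)
print(q1_by_region)
```

The same fact list answers three different questions (totals per region, per product and quarter, and per region within Q1) without changing the underlying data, which is the idea behind multidimensional analysis.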
4. Star and Snowflake Schemas :
❖ These are two ways to organize data in a data warehouse:
Star Schema:
A simple structure where dimension tables connect directly to a
central fact table.
Snowflake Schema:
➢ A more complex version where dimensions are split into
smaller tables.
➢ Both help improve the speed and efficiency of data queries.
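A star schema can be sketched with Python's built-in `sqlite3` module. The table names (`fact_sales`, `dim_product`, `dim_store`), the columns, and the sample rows are hypothetical and chosen only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables describe the "who/what/where" of each sale.
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE dim_store (store_id INTEGER PRIMARY KEY, city TEXT);
-- The central fact table holds the measure plus a key into each dimension.
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    store_id INTEGER REFERENCES dim_store(store_id),
    amount REAL
);
INSERT INTO dim_product VALUES (1, 'Laptop', 'Electronics'), (2, 'Desk', 'Furniture');
INSERT INTO dim_store VALUES (1, 'Chennai'), (2, 'Delhi');
INSERT INTO fact_sales VALUES (1, 1, 1000.0), (2, 1, 250.0), (1, 2, 1200.0);
""")

# A typical star-schema query: join the fact table to its dimensions and aggregate.
star_rows = conn.execute("""
SELECT p.category, s.city, SUM(f.amount)
FROM fact_sales f
JOIN dim_product p ON p.product_id = f.product_id
JOIN dim_store s ON s.store_id = f.store_id
GROUP BY p.category, s.city
ORDER BY p.category, s.city
""").fetchall()
print(star_rows)
```

In a snowflake version of this sketch, `category` would move out of `dim_product` into its own table, and the query would need one more join to reach it.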
5. Data Marts :
❖ A data mart is a smaller, focused part of a data
warehouse.
❖ It stores data related to one specific department or subject,
like sales, marketing, or finance.
❖ It helps that department quickly access the data it needs
without searching the whole data warehouse.
Types of Data Marts:
1. Dependent Data Mart :
Gets its data from the main data warehouse.
2. Independent Data Mart :
Built directly from different data sources (not from a warehouse).
3. Hybrid Data Mart :
Uses both warehouse and other sources.
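One possible way to picture a dependent data mart is as a view that selects one department's rows from the warehouse. The `warehouse` table, the `sales_mart` view, and the rows below are assumptions for illustration; real data marts are often separate physical stores, not views.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE warehouse (department TEXT, item TEXT, amount REAL);
INSERT INTO warehouse VALUES
    ('sales', 'order-1', 500.0),
    ('finance', 'invoice-7', 320.0),
    ('sales', 'order-2', 150.0),
    ('marketing', 'campaign-3', 90.0);
-- Dependent data mart: derived only from the warehouse table.
CREATE VIEW sales_mart AS
    SELECT item, amount FROM warehouse WHERE department = 'sales';
""")

# The sales team queries its mart without scanning other departments' data.
mart_rows = conn.execute(
    "SELECT item, amount FROM sales_mart ORDER BY item"
).fetchall()
print(mart_rows)
```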
6. Data Preprocessing :
❖ Data preprocessing is the process of cleaning and
preparing raw data before it is used in data mining or
machine learning.
❖ Raw data is often incomplete, inconsistent, noisy
(contains errors), or not in the right format. Preprocessing
makes the data accurate, clean, and ready for analysis.
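A few common preprocessing steps (trimming text, standardizing case, removing duplicates, and filling missing values) can be sketched as follows. The records, field names, and the choice to fill a missing age with the mean are assumptions for this example; other strategies are equally valid.

```python
from statistics import mean

# Hypothetical raw records with stray spaces, mixed case, a gap, and a duplicate.
RAW = [
    {"name": " Asha ", "age": "25", "city": "chennai"},
    {"name": "Ravi", "age": "", "city": "Chennai"},
    {"name": "Meena", "age": "35", "city": "CHENNAI "},
    {"name": "Ravi", "age": "", "city": "Chennai"},
]

def preprocess(records):
    """Clean text fields, drop exact duplicates, and fill missing ages."""
    cleaned, seen = [], set()
    for r in records:
        name = r["name"].strip()
        city = r["city"].strip().title()  # consistent formatting
        age = int(r["age"]) if r["age"].strip() else None  # mark missing values
        key = (name, city, age)
        if key in seen:
            continue  # duplicate record
        seen.add(key)
        cleaned.append({"name": name, "age": age, "city": city})
    # Fill missing ages with the mean of the known ages (one simple strategy).
    known = [r["age"] for r in cleaned if r["age"] is not None]
    fill = mean(known)
    for r in cleaned:
        if r["age"] is None:
            r["age"] = fill
    return cleaned

clean_records = preprocess(RAW)
print(clean_records)
```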
7. Clustering :
❖ Clustering is a data mining technique used to
group similar data items together based on their
features or patterns.
❖ Unlike classification, clustering does not use
predefined labels. It tries to discover natural
groupings in the data.
❖ It is used to understand hidden patterns in
data.
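As one example of a clustering method, here is a minimal one-dimensional k-means sketch. The notes do not name a specific algorithm, so k-means is a choice made for illustration, and the data points, starting centroids, and the function name `k_means_1d` are assumptions.

```python
def k_means_1d(points, centroids, iterations=10):
    """Group numbers around centroids by repeating assign-then-update steps."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid's cluster.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)
        ]
    return centroids, clusters

# No labels are given; the groups emerge from the values themselves.
points = [1, 2, 3, 10, 11, 12]
final_centroids, final_clusters = k_means_1d(points, centroids=[1, 12])
print(final_centroids, final_clusters)
```

Fixed starting centroids keep this sketch deterministic; practical implementations usually pick them at random and may run several times.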
8. Anomaly Detection :
❖ Anomaly detection is a technique used to identify unusual
or unexpected data that does not follow the normal pattern.
❖ These unusual items are called anomalies, outliers, or
exceptions.
❖ It helps detect fraud, errors, or rare events.
❖ It helps identify problems early before they become
serious.
❖ It is useful in security, healthcare, finance, and
monitoring systems.
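A simple way to flag unusual values is the z-score rule: mark any value that lies more than a chosen number of standard deviations from the mean. This is one basic technique among many, and the transaction amounts, the threshold of 2.0, and the function name `find_anomalies` are assumptions for illustration.

```python
from statistics import mean, pstdev

def find_anomalies(values, threshold=2.0):
    """Return values whose distance from the mean exceeds threshold * std dev."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        return []  # all values identical, nothing stands out
    return [v for v in values if abs(v - mu) / sigma > threshold]

# Hypothetical transaction amounts: one value is far from the rest.
transactions = [52, 48, 50, 51, 49, 50, 47, 53, 500]
anomalies = find_anomalies(transactions)
print(anomalies)
```

A large outlier also inflates the mean and standard deviation it is measured against, so the threshold has to be chosen with the data size in mind; robust alternatives based on the median are often preferred for that reason.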
9. Applications of Data Mining :
❖ Data Mining is the process of finding patterns, trends,
or useful information from large sets of data.
❖ This information is then used to make better decisions,
predict future outcomes, and improve
performance in many industries.
Marketing:
To find target customers.
Healthcare:
To predict diseases.
Banking:
To detect fraud.
10. Difference between Data Warehousing and Data Mining :
Aspect | Data Warehousing | Data Mining
Purpose | Store and manage large amounts of data | Extract patterns and insights from the data
Function | Collect, clean, and organize data | Analyze data to discover trends, predictions, etc.
Process | Data loading, transformation, and storage | Pattern recognition, classification, clustering
Output | A well-structured database or repository | Models, rules, predictions, and knowledge
Focus | Data integration and storage | Data analysis and knowledge discovery
Users | Database administrators, IT teams | Data analysts, data scientists, business analysts
Example | A centralized system holding sales data from stores | Finding customer buying patterns from sales data
thank you