
Unit 1 - Data Warehouse Introduction

Prepared by: Varun Rao (Dean, Data Science & AI)


For: Data Science - 1st years

An idea on Data Warehouse


A Data Warehouse is a relational database management system (RDBMS) that is constructed to support query and analysis rather than transaction processing. It can be loosely described as any centralized data repository which can be queried for business benefit. A data warehouse is typically used to connect and analyze business data from heterogeneous sources. The data warehouse is the core of the BI system, which is built for data analysis and reporting.

A data warehouse system is also known by the following names:


● Decision Support System (DSS)
● Executive Information System
● Management Information System
● Business Intelligence Solution
● Analytic Application
● Data Warehouse

A Data Warehouse can be viewed as a data system with the following attributes:

● It is a database designed for investigative tasks, using data from various applications.
● It supports a relatively small number of clients with relatively long interactions.
● It includes current and historical data to provide a historical perspective of information.
● Its usage is read-intensive.
● It contains a few large tables.

Steps in Data Warehousing
The following steps are involved in the process of data warehousing:

1. Extraction of data – A large amount of data is gathered from various sources.
2. Cleaning of data – Once the data is compiled, it goes through a cleaning process. The data is scanned for errors, and any error found is either corrected or excluded.
3. Conversion of data – After being cleaned, the format is changed from the database format to a warehouse format.
4. Storing in a warehouse – Once converted to the warehouse format, the data stored in the warehouse goes through processes such as consolidation and summarization to make it easier and more coordinated to use. As sources get updated over time, more data is added to the warehouse.
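
As a rough illustration of the four steps above, here is a minimal extract-transform-load (ETL) sketch in Python. The file name sales.csv, the SQLite file warehouse.db, and the column names are hypothetical placeholders, and the cleaning and summarization rules are deliberately simple.

```python
# Minimal ETL sketch: extract from a CSV source, clean, convert, and load
# into a warehouse table. File and column names are illustrative only.
import sqlite3
import pandas as pd

# 1. Extraction: gather raw data from a source system (here, a CSV export).
raw = pd.read_csv("sales.csv")          # hypothetical source file

# 2. Cleaning: drop rows with errors (missing keys, negative amounts).
clean = raw.dropna(subset=["order_id", "amount"])
clean = clean[clean["amount"] >= 0]

# 3. Conversion: reshape into the warehouse format (typed, summarized).
clean["order_date"] = pd.to_datetime(clean["order_date"])
daily_summary = (clean
                 .groupby(clean["order_date"].dt.date)["amount"]
                 .sum()
                 .reset_index(name="total_amount"))

# 4. Storing: append the summarized data to the warehouse table.
with sqlite3.connect("warehouse.db") as conn:
    daily_summary.to_sql("daily_sales", conn, if_exists="append", index=False)
```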

Goals of Data Warehousing


● To help reporting as well as analysis

● Maintain the organization's historical information

● Be the foundation for decision making.

Benefits of Data Warehouse

1. Understand business trends and make better forecasting decisions.
2. Data warehouses are designed to process enormous amounts of data.
3. The structure of data warehouses is more accessible for end-users to navigate, understand, and query.
4. Queries that would be complex in many normalized databases can be easier to build and maintain in data warehouses.
5. Data warehousing is an efficient method of managing demand for lots of information from lots of users.
6. Data warehousing provides the capability to analyze large amounts of historical data.

Difference between KDD and Data Mining

KDD (Knowledge Discovery in Databases) is a field of computer science that includes the tools and theories to help humans extract useful and previously unknown information (i.e. knowledge) from large collections of digitized data. KDD consists of several steps, and Data Mining is one of them. Data Mining is the application of specific algorithms to extract patterns from data.

KDD is a computer science field specializing in extracting previously unknown and interesting information from raw data. KDD is the whole process of trying to make sense of data by developing appropriate methods or techniques. This process deals with mapping low-level data into other forms that are more compact, abstract, and useful. This is achieved by creating short reports, modeling the process that generated the data, and developing predictive models that can predict future cases.

What is Data Mining?

As mentioned above, Data Mining is only a step within the overall KDD process. There are two major data mining goals, defined by the goal of the application: verification and discovery. Verification is verifying the user's hypothesis about the data, while discovery is automatically finding interesting patterns. There are four major data mining tasks: clustering, classification, regression, and association (summarization). Clustering is identifying similar groups from unstructured data. Classification is learning rules that can be applied to new data. Regression is finding functions with minimal error to model the data. Association is looking for relationships between variables.

KDD Process Steps


Knowledge discovery in the database process includes the following steps, such as:

1. Goal identification: Develop an understanding of the application domain and the relevant prior knowledge, and identify the goal of the KDD process from the customer's perspective.

2. Creating a target data set: Selecting a data set, or focusing on a subset of variables or data samples, on which discovery is to be performed.

3. Data cleaning and preprocessing: Basic operations include removing noise if appropriate, collecting the necessary information to model or account for noise, deciding on strategies for handling missing data fields, and accounting for time-sequence information and known changes.

4. Data reduction and projection: Finding useful features to represent the data, depending on the purpose of the task. The effective number of variables under consideration may be reduced through dimensionality reduction or transformation methods, or invariant representations for the data can be found.

5. Matching process objectives: Matching the goals of the KDD process (from step 1) to a particular data mining method, for example summarization, classification, regression, clustering, and others.

6. Modeling and exploratory analysis and hypothesis selection: Choosing the data mining algorithm(s) and selecting the method(s) to be used for searching for data patterns. This process includes deciding which models and parameters may be appropriate (e.g., models for categorical data are different from models for vectors over the reals) and matching a particular data mining method with the overall approach of the KDD process (for example, the end-user might be more interested in understanding the model than in its predictive capabilities).

7. Data Mining: Searching for patterns of interest in a particular representational form or a set of such representations, including classification rules or trees, regression, and clustering. The user can significantly aid the data mining method by correctly performing the preceding steps.

8. Presentation and evaluation: Interpreting the mined patterns, possibly returning to some of the steps between steps 1 and 7 for additional iterations. This step may also involve visualization of the extracted patterns and models, or visualization of the data given the extracted models.

9. Taking action on the discovered knowledge: Using the knowledge directly, incorporating the knowledge into another system for further action, or simply documenting it and reporting it to stakeholders. This process also includes checking for and resolving potential conflicts with previously believed (or extracted) knowledge.
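
As a rough, non-prescriptive illustration, the sketch below strings several of these steps together on a synthetic numeric dataset using scikit-learn. The choices of imputation, scaling, PCA for reduction, clustering as the mining method, and the silhouette score for evaluation are assumptions made for the example, not requirements of the KDD framework.

```python
# Sketch of a KDD-style pipeline on synthetic data: create a target data set,
# clean/preprocess it, reduce dimensionality, mine patterns, and evaluate them.
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)

# Step 2: create a target data set (synthetic, with some missing values).
X = rng.normal(size=(200, 6))
X[rng.random(X.shape) < 0.05] = np.nan

# Step 3: data cleaning and preprocessing (handle missing fields, scale).
X_clean = SimpleImputer(strategy="mean").fit_transform(X)
X_scaled = StandardScaler().fit_transform(X_clean)

# Step 4: data reduction and projection (dimensionality reduction).
X_reduced = PCA(n_components=2).fit_transform(X_scaled)

# Steps 5-7: choose a mining method (clustering here) and search for patterns.
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_reduced)

# Step 8: evaluate the discovered patterns (silhouette as a simple measure).
print("silhouette:", silhouette_score(X_reduced, labels))
```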

Stages of the Data Mining Process

Data Mining is a process of discovering various models, summaries, and derived values from a given collection of data. The general experimental procedure adapted to data-mining problems involves the following steps:

1. State the problem and formulate the hypothesis – In this step, a modeler usually specifies a set of variables for the unknown dependency and, if possible, a general form of this dependency as an initial hypothesis. There may also be several hypotheses formulated for a single problem at this stage. This first step requires the combined expertise of an application domain and a data-mining model. In practice, it usually means close interaction between the data-mining expert and the application expert. In successful data-mining applications, this cooperation does not stop in the initial phase; it continues during the whole data-mining process.
2. Collect the data – This step is concerned with how the data are generated and collected. In general, there are two distinct possibilities. The first is when the data-generation process is under the control of an expert (modeler); this approach is known as a designed experiment. The second possibility is when the expert cannot influence the data-generation process; this is known as the observational approach. An observational setting, namely random data generation, is assumed in most data-mining applications. Typically, the sampling distribution is completely unknown after the data are collected, or it is partially and implicitly given in the data-collection procedure. It is important, however, to understand how data collection affects the theoretical distribution of the data, since such prior knowledge is often useful for modeling and, later, for the final interpretation of results. It is also important to make sure that the data used for estimating a model and the data used later for testing and applying the model come from the same unknown sampling distribution. If this is not the case, the estimated model cannot be successfully used in the final application of the results.
3. Data preprocessing – In the observational setting, data is usually "collected" from existing databases, data warehouses, and data marts. Data preprocessing usually includes at least two common tasks (a small preprocessing sketch follows this list):
○ (i) Outlier detection (and removal): Outliers are unusual data values that are not consistent with most observations. Commonly, outliers result from measurement errors and coding and recording errors, and, sometimes, they are natural, abnormal values. Such non-representative samples can seriously affect the models produced later. There are two strategies for dealing with outliers: detect and eventually remove outliers as a part of the preprocessing phase, or develop robust modeling methods that are insensitive to outliers.
○ (ii) Scaling, encoding, and selecting features: Data preprocessing includes several steps such as variable scaling and different types of encoding. For example, one feature with a range of [0, 1] and another with a range of [100, 1000] will not have the same weight in the applied technique; they will also influence the final data-mining results differently. Therefore, it is recommended to scale them and bring both features to the same weight for further analysis. Also, application-specific encoding methods usually achieve dimensionality reduction by providing a smaller number of informative features for subsequent data modeling.
4. Estimate the model – The selection and implementation of the appropriate data-mining technique is the main task in this phase. This process is not straightforward; usually, in practice, the implementation is based on several models, and selecting the best one is an additional task.
5. Interpret the model and draw conclusions – In most cases, data-mining models should help in decision making. Hence, such models need to be interpretable in order to be useful, because humans are not likely to base their decisions on complex "black-box" models. Note that the goals of accuracy of the model and accuracy of its interpretation are somewhat contradictory. Usually, simple models are more interpretable, but they are also less accurate. Modern data-mining methods are expected to yield highly accurate results using high-dimensional models. The problem of interpreting these models, which is also very important, is considered a separate task, with specific techniques to validate the results.
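
The following is a minimal NumPy sketch of the two preprocessing tasks described in step 3: outlier removal using an assumed z-score cutoff of 3, followed by min-max scaling to [0, 1]. The data and thresholds are invented for illustration only.

```python
# Preprocessing sketch: z-score based outlier removal followed by min-max
# scaling, so features with very different ranges get comparable weight.
import numpy as np

rng = np.random.default_rng(1)
# Two features with very different ranges, as in the example above.
data = np.column_stack([rng.uniform(0, 1, 100), rng.uniform(100, 1000, 100)])
data[0] = [5.0, 9000.0]   # inject one obvious outlier row

# (i) Outlier detection and removal: drop rows with any |z-score| > 3.
z = (data - data.mean(axis=0)) / data.std(axis=0)
inliers = data[(np.abs(z) <= 3).all(axis=1)]

# (ii) Scaling: bring every feature into the range [0, 1].
mins, maxs = inliers.min(axis=0), inliers.max(axis=0)
scaled = (inliers - mins) / (maxs - mins)

print(scaled.min(axis=0), scaled.max(axis=0))   # each column now spans [0, 1]
```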

Data Mining Techniques

1. Association

Association analysis is the discovery of association rules showing attribute-value conditions that occur frequently together in a given set of data. Association analysis is widely used for market basket or transaction data analysis. Association rule mining is a significant and exceptionally active area of data mining research. One method of association-based classification, called associative classification, consists of two steps. In the first step, association rules are generated using a modified version of the standard association rule mining algorithm known as Apriori. The second step constructs a classifier based on the association rules discovered.
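
Below is a small, self-contained sketch of the idea behind Apriori-style association mining on a toy basket data set: count the support of item pairs and keep rules whose support and confidence exceed assumed thresholds. The transactions and thresholds are invented for illustration, and the sketch only goes up to 2-itemsets rather than implementing the full level-wise Apriori algorithm.

```python
# Toy market-basket sketch: frequent item pairs and simple association rules.
from itertools import combinations
from collections import Counter

transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "butter"},
    {"bread", "milk", "butter"},
]
min_support, min_confidence = 0.4, 0.6
n = len(transactions)

# Support counts for single items and for item pairs.
item_counts = Counter(item for t in transactions for item in t)
pair_counts = Counter(pair for t in transactions
                      for pair in combinations(sorted(t), 2))

# Frequent pairs: support = count / number of transactions.
frequent_pairs = {p: c / n for p, c in pair_counts.items() if c / n >= min_support}

# Rules A -> B with confidence = support(A, B) / support(A).
for (a, b), support in frequent_pairs.items():
    for lhs, rhs in [(a, b), (b, a)]:
        confidence = support / (item_counts[lhs] / n)
        if confidence >= min_confidence:
            print(f"{lhs} -> {rhs}  support={support:.2f}  confidence={confidence:.2f}")
```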

2. Classification

Classification is the process of finding a set of models (or functions) that describe and distinguish data classes or concepts, for the purpose of being able to use the model to predict the class of objects whose class label is unknown. The derived model is based on the analysis of a set of training data (i.e. data objects whose class label is known).
Data mining uses different types of classifiers (a short example follows this list):
● Decision Tree
● SVM(Support Vector Machine)
● Generalized Linear Models
● Bayesian Classification
● Classification by Backpropagation
● K-NN Classifier (K-nearest neighbor)
● Rule-Based Classification
● Frequent-Pattern Based Classification
● Rough set theory
● Fuzzy Logic
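
As a brief example of one classifier from the list above (a decision tree), the scikit-learn sketch below trains on the bundled Iris data set and predicts class labels for unseen samples. The split ratio and tree depth are arbitrary choices made for the example.

```python
# Decision tree classification sketch: learn a model from labeled training
# data and use it to predict class labels for unseen samples.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

model = DecisionTreeClassifier(max_depth=3, random_state=42)
model.fit(X_train, y_train)               # learn rules from the training data
predictions = model.predict(X_test)       # apply them to new, unseen data

print("accuracy:", accuracy_score(y_test, predictions))
```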

3. Prediction

Data prediction is a two-step process, similar to that of data classification. However, for prediction, we do not use the term "class label attribute" because the attribute for which values are being predicted is continuous-valued (ordered) rather than categorical (discrete-valued and unordered). The attribute can be referred to simply as the predicted attribute. Prediction can be viewed as the construction and use of a model to assess the class of an unlabeled object, or to assess the value or value ranges of an attribute that a given object is likely to have.
4. Clustering

Unlike classification and prediction, which analyze class-labeled data objects or attributes, clustering analyzes data objects without consulting a known class label. In general, the class labels do not exist in the training data simply because they are not known to begin with. Clustering can be used to generate these labels. The objects are clustered based on the principle of maximizing the intra-class similarity and minimizing the inter-class similarity. That is, clusters of objects are created so that objects inside a cluster have high similarity to each other, but are very dissimilar to objects in other clusters. Each cluster that is generated can be seen as a class of objects, from which rules can be inferred. Clustering can also facilitate taxonomy formation, that is, the organization of observations into a hierarchy of classes that group similar events together.
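
A short sketch of clustering unlabeled data follows, using agglomerative (hierarchical) clustering from scikit-learn on synthetic blobs; the number of clusters and the synthetic data are assumptions made for illustration.

```python
# Clustering sketch: group unlabeled objects so that objects within a cluster
# are similar to each other and dissimilar to objects in other clusters.
from sklearn.datasets import make_blobs
from sklearn.cluster import AgglomerativeClustering

# Synthetic, unlabeled data (the generated labels are discarded).
X, _ = make_blobs(n_samples=150, centers=3, random_state=0)

# Hierarchical (agglomerative) clustering with an assumed cluster count.
labels = AgglomerativeClustering(n_clusters=3).fit_predict(X)

print("cluster sizes:", [list(labels).count(k) for k in range(3)])
```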

5. Regression

Regression can be defined as a statistical modeling method in which previously obtained data is used to predict a continuous quantity for new observations. This classifier is also known as the Continuous Value Classifier. There are two types of regression models: linear regression and multiple linear regression models.
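
Here is a minimal multiple linear regression sketch with scikit-learn on synthetic data; the coefficients used to generate the data are arbitrary, and the fitted model simply recovers them approximately.

```python
# Multiple linear regression sketch: fit a continuous target from two
# predictor variables and predict a value for a new observation.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 2))
# Synthetic target: y = 3*x1 + 2*x2 + 5 plus a little noise.
y = 3 * X[:, 0] + 2 * X[:, 1] + 5 + rng.normal(0, 0.5, 100)

model = LinearRegression().fit(X, y)
print("coefficients:", model.coef_, "intercept:", model.intercept_)
print("prediction for [4, 7]:", model.predict([[4.0, 7.0]]))
```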

6. Artificial Neural network (ANN) Classifier Method

An artificial neural network (ANN), also referred to simply as a "neural network" (NN), is a computational model based on biological neural networks. It consists of an interconnected collection of artificial neurons. A neural network is a set of connected input/output units where each connection has a weight associated with it. During the learning phase, the network learns by adjusting the weights so as to be able to predict the correct class label of the input samples. Neural network learning is also referred to as connectionist learning due to the connections between units. Neural networks involve long training times and are therefore more appropriate for applications where this is feasible.
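
To illustrate how a network "learns by adjusting the weights", here is a tiny single-neuron (perceptron) sketch in NumPy trained on the logical AND function; the learning rate and epoch count are arbitrary choices, and a real ANN would have many such units arranged in layers.

```python
# Single-neuron sketch: each input connection has a weight, and learning
# adjusts those weights until the unit predicts the correct class label.
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])   # inputs
y = np.array([0, 0, 0, 1])                        # AND target labels

weights = np.zeros(2)
bias = 0.0
learning_rate = 0.1

for epoch in range(20):
    for xi, target in zip(X, y):
        prediction = 1 if xi @ weights + bias > 0 else 0
        error = target - prediction
        # Connectionist learning: nudge each weight by its contribution.
        weights += learning_rate * error * xi
        bias += learning_rate * error

print("learned weights:", weights, "bias:", bias)
print("predictions:", [(1 if xi @ weights + bias > 0 else 0) for xi in X])
```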

7. Outlier Detection

A database may contain data objects that do not comply with the general
behavior or model of the data. These data objects are Outliers. The investigation
of OUTLIER data is known as OUTLIER MINING. An outlier may be detected
using statistical tests which assume a distribution or probability model for the
data, or using distance measures where objects having a small fraction of “close”
neighbors in space are considered outliers.
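
The following small NumPy sketch shows the distance-based idea mentioned above: a point is flagged as an outlier when it has too few neighbors within a chosen radius. The radius and neighbor threshold are arbitrary assumptions for the example.

```python
# Distance-based outlier detection sketch: flag points that have fewer than
# `min_neighbors` other points within distance `radius`.
import numpy as np

rng = np.random.default_rng(2)
points = rng.normal(0, 1, size=(50, 2))
points = np.vstack([points, [[8.0, 8.0]]])   # one far-away, obvious outlier

radius, min_neighbors = 1.5, 3

# Pairwise Euclidean distances between all points.
diffs = points[:, None, :] - points[None, :, :]
distances = np.sqrt((diffs ** 2).sum(axis=-1))

# Count neighbors within the radius (excluding the point itself).
neighbor_counts = (distances <= radius).sum(axis=1) - 1
outliers = np.where(neighbor_counts < min_neighbors)[0]

print("outlier indices:", outliers)
```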

Knowledge Representation

Knowledge representation is the presentation of knowledge to the user for visualization in terms of trees, tables, rules, graphs, charts, matrices, etc.
For example: Histograms

Histograms
● A histogram provides a representation of the distribution of values of a single attribute.
● It consists of a set of rectangles that reflects the counts or frequencies of
the classes present in the given data.
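
A quick sketch of building a histogram over a single attribute with NumPy follows; the data and number of bins are arbitrary choices for the example.

```python
# Histogram sketch: counts of a single attribute's values across equal-width bins.
import numpy as np

rng = np.random.default_rng(3)
ages = rng.integers(18, 65, size=200)     # one attribute, e.g. customer age

counts, bin_edges = np.histogram(ages, bins=5)
for count, left, right in zip(counts, bin_edges[:-1], bin_edges[1:]):
    print(f"[{left:5.1f}, {right:5.1f})  {'#' * (count // 2)}")
```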

Geometric projection visualization techniques

Techniques used for geometric projection visualization are:

i. Scatter-plot matrices
It consists of scatter plots of all possible pairs of variables in a dataset.

ii. Hyper slice

It is an extension of scatter-plot matrices. It represents a multidimensional function as a matrix of orthogonal two-dimensional slices.

iii. Parallel coordinates

● The axes are defined by equally spaced parallel vertical lines.
● A point in Cartesian coordinates corresponds to a polyline in parallel coordinates.
Icon-based visualization techniques
● Icon-based visualization techniques are also known as iconic display
techniques.
● Each multidimensional data item is mapped to an icon.
● This technique allows visualization of large amounts of data.
● The most commonly used technique is Chernoff faces.

Some of the visualization techniques are:

i. Dimensional stacking

● In dimensional stacking, an n-dimensional attribute space is partitioned into 2-dimensional subspaces.
● Attribute values are partitioned into various classes.
● Each element is displayed in two-dimensional space in the form of an x-y plot.
● The most important attributes are used on the outer levels.

ii. Mosaic plot

● A mosaic plot gives a graphical representation of successive decompositions.
● Rectangles are used to represent the counts of categorical data, and at every stage the rectangles are split in parallel.

iii. Worlds within worlds

● Worlds within worlds is useful for generating an interactive hierarchy of displays.
● The innermost world must have a function and the two most important parameters.
● The remaining parameters are fixed at constant values.
● Through this, an N-dimensional view of the data is possible using devices such as data gloves and stereo displays, including rotation, scaling (inner) and translation (inner/outer).
● Using queries, static interaction is possible.

iv. Tree maps


● Tree map visualization techniques are well suited for displaying large amounts of hierarchically structured data.
● The visualization space is divided into multiple rectangles that are ordered according to a quantitative variable.
● The levels in the hierarchy are shown as rectangles containing other rectangles.
● Each set of rectangles on the same level in the hierarchy represents a category, a column, or an expression in a data set.

v. Visualizing complex data and relations

● This technique is used to visualize non-numeric data, for example text, pictures, blog entries, and product reviews.
● A tag cloud is a visualization method that helps to understand the information in user-generated tags.
● It is also possible to arrange the tags alphabetically or according to user preferences, with different font sizes and colors.
