Introduction Lecture1gghhhhh


Uploaded by

fetnbadani

Data Mining

— Introduction —

Why Data Mining?
• The explosive growth of data (abundant data): from terabytes
  to petabytes
  • Data collection and data availability
  • Automated data collection tools, database systems, the Web,
    a computerized society
• We are drowning in data, but starving for knowledge!
• "Necessity is the mother of invention": data mining is the
  automated analysis of massive data sets

Defining Data Mining
• Sifting through very large amounts of data for useful
  information. Data mining uses artificial intelligence
  techniques, neural networks, and advanced statistical tools
  (such as cluster analysis) to reveal trends, patterns, and
  relationships that might otherwise have remained
  undetected. In contrast to an expert system (which draws
  inferences from the given data on the basis of a given set of
  rules), data mining attempts to discover hidden rules
  underlying the data. Also called data surfing.

Data Mining Techniques
• The most commonly used techniques in data mining are:
1- Artificial neural networks: Non-linear predictive models that
   learn through training and resemble biological neural networks
   in structure.
2- Decision trees: Tree-shaped structures that represent sets of
   decisions. These decisions generate rules for the classification
   of a dataset.
3- Genetic algorithms: Optimization techniques that use
   processes such as genetic combination, mutation, and natural
   selection in a design based on the concepts of evolution.
4- Nearest neighbor method: A technique that classifies each
   record in a dataset based on a combination of the classes of
   the k record(s) most similar to it in a historical dataset (where
   k ≥ 1). Sometimes called the k-nearest neighbor technique.
5- Rule induction: The extraction of useful if-then rules from data
   based on statistical significance.
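
The nearest neighbor method described above can be sketched in a few lines. This is a minimal illustration, not a production classifier: the function name knn_classify, the Euclidean distance, the majority vote, and the six "historical" records are all assumptions made up for the example.

```python
import math
from collections import Counter

def knn_classify(query, history, k=3):
    """Return the majority class among the k historical records closest to query."""
    # Sort the historical records by Euclidean distance to the query point.
    nearest = sorted(history, key=lambda rec: math.dist(rec[0], query))[:k]
    # Combine the classes of the k nearest records by simple majority vote.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Made-up historical dataset: (attribute vector, class label).
history = [
    ((1.0, 1.0), "low"), ((1.5, 2.0), "low"), ((2.0, 1.0), "low"),
    ((6.0, 6.5), "high"), ((7.0, 7.0), "high"), ((6.5, 8.0), "high"),
]

first = knn_classify((1.2, 1.4), history)
second = knn_classify((6.8, 7.1), history)
print(first, second)
```

With k = 1 the method simply copies the class of the single most similar record; larger k trades sensitivity to noise against blurring of class boundaries.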

Applications of Data Mining

• There is a rapidly growing body of successful applications in
  a wide range of areas as diverse as:
  • analysis of organic compounds
  • weather forecasting
  • predicting share of television audiences
  • medical diagnosis
  • financial forecasting
  • automatic abstracting
  • credit card fraud detection
  • targeted marketing
  • electric load prediction
  • toxic hazard analysis

Application Examples
• These areas are only a sample. Some examples of applications
  (potential or actual) are:
1– A supermarket chain mines its customer transaction data to
   optimise targeting of high-value customers.
2– A credit card company can use its data warehouse of
   customer transactions for fraud detection.
3– A major hotel chain can use survey databases to identify
   attributes of a "high-value" prospect.
4– Predicting the probability of default for consumer loan
   applications, improving the ability to identify bad loans.
5– Reducing fabrication flaws in VLSI chips.
6– Data mining systems can sift through vast quantities of data
   collected during the semiconductor fabrication process to
   identify conditions that are causing yield problems.
7– Predicting audience share for television programmers,
   allowing television executives to arrange show schedules to
   maximize market share and increase advertising revenues.
8– Predicting the probability that a cancer patient will respond
   to chemotherapy, thus reducing health-care costs without
   affecting quality of care.

Knowledge Discovery in Databases (KDD) Process

• The KDD process is defined as "the nontrivial process of
  identifying valid, novel, potentially useful, and ultimately
  understandable (comprehensible) patterns in data"
  [Fayyad et al. (1996)].
  • Valid: are the discovered patterns representative of the data?
  • Novel: are the discovered patterns new to the organization?
  • Useful: can the organization use the discovered patterns?
  • Comprehensible: can we understand the discovered patterns?

Knowledge Discovery (KDD) Process

• Data mining: core of the knowledge discovery process

[Figure: the KDD process, from bottom to top: Databases; Data Cleaning and Data Integration; Data Warehouse; Selection and Transformation; Task-relevant Data; Data Mining; Pattern Evaluation]

KDD Process: Several Key Steps
1. Preprocessing steps:
   • Data cleaning (to remove noise and inconsistent data).
   • Data integration (where multiple data sources may be
     combined).
   • Data transformation (where data are transformed into
     forms appropriate for mining).
2. Data mining (an essential process where intelligent
   methods are applied in order to extract data patterns).
3. Post-processing steps:
   • Pattern evaluation (to identify the truly interesting patterns).
   • Knowledge presentation (presenting the mined knowledge to
     the user: rules, tables, pie/bar charts, concept hierarchies,
     trees, etc.).
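
The steps above can be sketched as a chain of small functions. Everything in this sketch is an assumption made for illustration: the two data sources, the field names, the min-max scaling, and the "mining" step, which is only an average per region standing in for a real mining method.

```python
# Two hypothetical data sources keyed by customer id (made-up values).
sales = [
    {"id": 1, "spend": 120.0},
    {"id": 2, "spend": None},   # missing value, removed during cleaning
    {"id": 3, "spend": 80.0},
    {"id": 4, "spend": 200.0},
]
regions = {1: "urban", 2: "rural", 3: "urban", 4: "rural"}

def clean(records):
    """Data cleaning: drop records with missing values."""
    return [r for r in records if r["spend"] is not None]

def integrate(records, region_by_id):
    """Data integration: combine the sales source with the region source."""
    return [{**r, "region": region_by_id[r["id"]]} for r in records
            if r["id"] in region_by_id]

def transform(records):
    """Data transformation: min-max scale spend into [0, 1]."""
    low = min(r["spend"] for r in records)
    high = max(r["spend"] for r in records)
    span = (high - low) or 1.0  # avoid division by zero if all values are equal
    return [{**r, "scaled": (r["spend"] - low) / span} for r in records]

def mine(records):
    """Stand-in for data mining: average scaled spend per region."""
    groups = {}
    for r in records:
        groups.setdefault(r["region"], []).append(r["scaled"])
    return {region: sum(v) / len(v) for region, v in groups.items()}

def evaluate(patterns, threshold=0.5):
    """Pattern evaluation: keep only patterns above an interest threshold."""
    return {k: v for k, v in patterns.items() if v > threshold}

interesting = evaluate(mine(transform(integrate(clean(sales), regions))))
print(interesting)  # knowledge presentation, here just a printed table
```

Keeping each step as its own function mirrors the slide: preprocessing, mining, and post-processing can each be swapped out without touching the others.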
What Is Data Mining?

• Data mining (knowledge discovery from data)
  • Extraction of interesting (non-trivial, implicit, previously
    unknown and potentially useful) patterns or knowledge
    from huge amounts of data
  • Data mining: a misnomer?
• Alternative names
  • Knowledge discovery (mining) in databases (KDD),
    knowledge extraction, data/pattern analysis, data
    dredging, information harvesting, etc.

Data Mining: Confluence of Multiple Disciplines

[Figure: Data Mining at the center, surrounded by Database Technology, Statistics, Machine Learning, Visualization, Pattern Recognition, Algorithm, and Other Disciplines]

Data Mining Functionalities

• General functionality
  • Descriptive data mining
    Find human-interpretable patterns that describe
    the data.
  • Predictive data mining
    Use some variables to predict unknown or future
    values of other variables.

Data Mining Tasks…
• Classification [Predictive]
• Clustering [Descriptive]
• Association Rule Discovery [Descriptive]
• Regression [Predictive]
• Deviation Detection [Predictive]

Classification: Definition
• Given a collection of records (training set)
  • Each record contains a set of attributes; one of
    the attributes is the class.
• Find a model for the class attribute as a
  function of the values of the other attributes.
• Goal: previously unseen records should be
  assigned a class as accurately as possible.
  • A test set is used to determine the accuracy of
    the model. Usually, the given data set is
    divided into training and test sets, with the training
    set used to build the model and the test set used
    to validate it.
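
The training/test procedure can be sketched as follows, using the ten labelled records of the classification example in these slides. The 7/3 split and the "model" (always predict the majority class of the training set) are assumptions chosen only to keep the example short; a real classifier would use the other attributes.

```python
from collections import Counter

# (refund, marital status, taxable income in K, cheat): the ten labelled records.
records = [
    ("Yes", "Single", 125, "No"), ("No", "Married", 100, "No"),
    ("No", "Single", 70, "No"), ("Yes", "Married", 120, "No"),
    ("No", "Divorced", 95, "Yes"), ("No", "Married", 60, "No"),
    ("Yes", "Divorced", 220, "No"), ("No", "Single", 85, "Yes"),
    ("No", "Married", 75, "No"), ("No", "Single", 90, "Yes"),
]

def train_majority(training_set):
    """Build a deliberately naive model: the most frequent class label."""
    return Counter(r[-1] for r in training_set).most_common(1)[0][0]

def accuracy(predicted, actual):
    """Fraction of records whose predicted class matches the true class."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)

# Hold-out split (an arbitrary choice): first 7 records train, last 3 test.
training_set, test_set = records[:7], records[7:]
model = train_majority(training_set)
predictions = [model for _ in test_set]
test_accuracy = accuracy(predictions, [r[-1] for r in test_set])
print(model, test_accuracy)
```

The low test accuracy of this baseline is the point of the split: the model is judged on records it did not see while being built.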
Classification Example
Attribute types: Refund (categorical), Marital Status (categorical),
Taxable Income (continuous), Cheat (class).

Training set (used to learn the classifier):

Tid  Refund  Marital Status  Taxable Income  Cheat
1    Yes     Single          125K            No
2    No      Married         100K            No
3    No      Single          70K             No
4    Yes     Married         120K            No
5    No      Divorced        95K             Yes
6    No      Married         60K             No
7    Yes     Divorced        220K            No
8    No      Single          85K             Yes
9    No      Married         75K             No
10   No      Single          90K             Yes

Test set (the model assigns the unknown class):

Refund  Marital Status  Taxable Income  Cheat
No      Single          75K             ?
Yes     Married         50K             ?
No      Married         150K            ?
Yes     Divorced        90K             ?
No      Single          40K             ?
No      Married         80K             ?

Flow: Training Set → Learn Classifier → Model; the Model is then applied to the Test Set.
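
To make "applying a model" concrete, here is a small sketch. The rules in predict_cheat are hand-written for illustration and are an assumption: they are not learned by an algorithm and are not given in the slides; they merely happen to be consistent with the ten training records.

```python
def predict_cheat(refund, marital_status, income_k):
    """Hypothetical hand-written rule set standing in for a learned model."""
    if refund == "Yes":
        return "No"
    if marital_status == "Married":
        return "No"
    # Remaining cases: no refund, single or divorced; split on income (in K).
    return "Yes" if income_k >= 80 else "No"

# Training set from the example: (refund, marital status, income in K, cheat).
training = [
    ("Yes", "Single", 125, "No"), ("No", "Married", 100, "No"),
    ("No", "Single", 70, "No"), ("Yes", "Married", 120, "No"),
    ("No", "Divorced", 95, "Yes"), ("No", "Married", 60, "No"),
    ("Yes", "Divorced", 220, "No"), ("No", "Single", 85, "Yes"),
    ("No", "Married", 75, "No"), ("No", "Single", 90, "Yes"),
]

# Test set from the example: the class is unknown.
test = [
    ("No", "Single", 75), ("Yes", "Married", 50), ("No", "Married", 150),
    ("Yes", "Divorced", 90), ("No", "Single", 40), ("No", "Married", 80),
]

train_accuracy = sum(
    predict_cheat(r, m, i) == c for r, m, i, c in training
) / len(training)
test_predictions = [predict_cheat(*record) for record in test]
print(train_accuracy, test_predictions)
```

Agreement with the training set says nothing about accuracy on unseen records; that is what a labelled test set is for.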
Clustering Definition
• Given a set of data points, each having a
  set of attributes, and a similarity measure
  among them, find clusters such that
  • Data points in one cluster are more similar to
    one another.
  • Data points in separate clusters are less similar
    to one another.
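
One common way to find such clusters is k-means, sketched below. The slides do not name a specific algorithm, so the choice of k-means, Euclidean distance as the (dis)similarity measure, seeding with the first k points, and the six sample points are all assumptions for illustration.

```python
import math

def kmeans(points, k, max_iter=100):
    """Plain k-means: alternate between assigning points and moving centroids."""
    centroids = list(points[:k])  # assumption: seed with the first k points
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:  # no point changed cluster: converged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = tuple(sum(axis) / len(members)
                                     for axis in zip(*members))
    return labels, centroids

# Made-up 2-D points forming two visibly separate groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.0), (8.5, 9.5)]
labels, centroids = kmeans(points, k=2)
print(labels)
```

The result depends on the seeding; practical implementations usually try several random starts and keep the best one.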
Association Rule Discovery: Definition
• Association rule mining searches for interesting relationships
  among items in a given dataset.
• Which items are frequently purchased by my customers?
  (Market basket analysis.)

TID  Items
1    Bread, Coke, Milk
2    Beer, Bread
3    Beer, Coke, Diaper, Milk
4    Beer, Bread, Diaper, Milk
5    Coke, Diaper, Milk

Rules discovered:
{Milk} → {Coke} (support = 0.6, confidence = 0.75)
{Diaper, Milk} → {Beer}

If a customer buys diapers and milk, then he is very
likely to buy beer.

So, don't be surprised if you find six-packs stacked
next to diapers!
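
As a rough sketch of how such rules can be found, the code below enumerates every single-item rule {a} → {b} over the five transactions and keeps those meeting minimum support and confidence. The thresholds (0.6 and 0.75) and the restriction to single items are assumptions for brevity; real miners such as Apriori handle larger itemsets and prune the search.

```python
from itertools import permutations

transactions = [
    {"Bread", "Coke", "Milk"},
    {"Beer", "Bread"},
    {"Beer", "Coke", "Diaper", "Milk"},
    {"Beer", "Bread", "Diaper", "Milk"},
    {"Coke", "Diaper", "Milk"},
]

def count(itemset):
    """Number of transactions containing every item of itemset."""
    return sum(itemset <= t for t in transactions)

def find_rules(min_support=0.6, min_confidence=0.75):
    """Return (antecedent, consequent, support, confidence) for rules {a} -> {b}."""
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in permutations(items, 2):
        both = count({a, b})
        support = both / len(transactions)
        if support < min_support:
            continue
        confidence = both / count({a})
        if confidence >= min_confidence:
            rules.append((a, b, support, confidence))
    return rules

rules = find_rules()
for a, b, s, c in rules:
    print(f"{{{a}}} -> {{{b}}}  support={s:.2f}  confidence={c:.2f}")
```

Among the rules printed is {Milk} -> {Coke} with support 0.60 and confidence 0.75, matching the rule on the slide.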
Regression
• Predict the value of a given continuous-valued
  variable based on the values of other variables,
  assuming a linear or nonlinear model of
  dependency.
• Extensively studied in statistics and in the neural network field.
• Examples:
  • Predicting sales amounts of a new product based
    on advertising expenditure.
  • Predicting wind velocities as a function of
    temperature, humidity, air pressure, etc.
  • Time series prediction of stock market indices.
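
For the linear case, a least-squares line through (x, y) pairs is the simplest sketch. The function name fit_line and the advertising/sales figures are made up for illustration; they are not data from the slides.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance of x and y divided by variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up figures: advertising expenditure vs. sales amount.
ad_spend = [1.0, 2.0, 3.0, 4.0]
sales = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(ad_spend, sales)
predicted_sales = slope * 5.0 + intercept  # prediction for a new spend of 5.0
print(slope, intercept, predicted_sales)
```

A nonlinear model of dependency would replace the straight line with another function family, but the idea of fitting parameters to minimise prediction error stays the same.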
Deviation/Anomaly Detection
• Detect significant deviations from normal behavior
• Applications:
  • Credit card fraud detection
  • Network intrusion detection
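
One simple way to flag deviations, sketched below, scores each value by its distance from the median in units of the median absolute deviation (MAD). This particular score, the 0.6745 scaling constant, the 3.5 threshold, and the transaction amounts are assumptions for illustration; the slides do not prescribe a method.

```python
from statistics import median

def flag_anomalies(values, threshold=3.5):
    """Return the values whose robust (median/MAD) score exceeds threshold."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        return []  # no spread in the data, so nothing can be scored
    # 0.6745 rescales the MAD so the score is comparable to a z-score.
    return [v for v in values if 0.6745 * abs(v - med) / mad > threshold]

# Made-up card transaction amounts with one suspicious entry.
amounts = [20, 22, 19, 21, 20, 23, 18, 500]
suspicious = flag_anomalies(amounts)
print(suspicious)
```

The median and MAD are used instead of the mean and standard deviation because a single extreme value inflates the latter and can hide itself, especially in small samples.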
Are All the "Discovered" Patterns Interesting?
• Data mining may generate thousands of patterns; not all of
  them are interesting.
• Interestingness measures
  • A pattern is interesting if it is easily understood by humans, valid
    on new or test data with some degree of certainty, potentially
    useful, novel, or validates some hypothesis that a user seeks to
    confirm.
• Objective vs. subjective interestingness measures
  • Objective (data driven): based on statistics and structures of
    patterns, e.g., support, confidence (degree of certainty), etc.
  • Subjective (user driven): based on the user's belief in the data, e.g.,
    unexpectedness (contradicting a user's belief), novelty (previously
    unknown), actionability (use of discovered knowledge), etc.
Pattern Interestingness Measure

• Simplicity
  e.g., (association) rule length, (decision) tree size
• Certainty (A → B)
  e.g., confidence = #(A and B) / #(A), classification
  reliability or accuracy, certainty factor, rule strength, rule
  quality
• Support = #(A and B) / #(Domain)
• Coverage = #(A and B) / #(B)
• Novelty
  not previously known, surprising
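
The three count-based measures can be computed directly from their definitions on this slide. As a worked sketch, the code below evaluates them for the rule {Diaper, Milk} → {Beer} on the five market-basket transactions used earlier in these slides; the function name measures is an assumption, and "Domain" is taken to mean the full set of transactions.

```python
transactions = [
    {"Bread", "Coke", "Milk"},
    {"Beer", "Bread"},
    {"Beer", "Coke", "Diaper", "Milk"},
    {"Beer", "Bread", "Diaper", "Milk"},
    {"Coke", "Diaper", "Milk"},
]

def measures(a, b, domain):
    """Confidence, support and coverage of the rule A -> B, as defined on the slide."""
    n_a = sum(a <= t for t in domain)         # #(A)
    n_b = sum(b <= t for t in domain)         # #(B)
    n_ab = sum((a | b) <= t for t in domain)  # #(A and B)
    return {
        "confidence": n_ab / n_a,
        "support": n_ab / len(domain),
        "coverage": n_ab / n_b,
    }

result = measures({"Diaper", "Milk"}, {"Beer"}, transactions)
print(result)
```

Here #(A) = 3, #(B) = 3 and #(A and B) = 2, so confidence is 2/3, support is 2/5 and coverage is 2/3.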
