Lecturenotes Data Mining

Data mining is the process of analyzing data from different perspectives and summarizing it into useful information. It involves discovering patterns and relationships within large datasets. Common techniques include classification, clustering, association rule mining, and prediction. Decision trees and clustering are popular algorithms. The CRISP-DM methodology provides a standardized process for conducting a data mining project through phases of business understanding, data understanding, data preparation, modeling, evaluation, and deployment.

Uploaded by

tanyah Lloyd

DATA MINING

• It is the process of analyzing data from different perspectives and summarizing it into useful information: information that can be used to increase revenue, cut costs, or both. (http://www.anderson.ucla.edu)
• Also defined as the process of extracting valid, previously unknown, comprehensible, and actionable information from large databases and using it to make crucial business decisions. (Connolly & Begg, 2005)
• Technically, it is a process of discovering meaningful patterns and relationships that lie hidden within very large databases. (Seidman, 2001)
• Refers to the mining or discovery of new information, in terms of patterns or rules, from vast amounts of data.
• The keyword here is patterns. So what is a pattern?
• A pattern is a set of events that occur with enough frequency in the dataset to reveal a relationship between them. Revealing the relationship is usually an inductive reasoning process.
THE MATHEMATICS OF DATA MINING

• Mathematicians have provided an ideal framework within which to conduct data mining, called Euclidean space; the mathematical theory describing it is known as linear algebra.
• So what is Euclidean space?
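The notes pose this question without answering it in the text. As a brief sketch using standard linear algebra (the symbols below are not taken from the notes): a record with n numeric attributes can be treated as a point in n-dimensional Euclidean space, where length and distance are defined as follows.

```latex
\mathbb{R}^n = \{(x_1, \dots, x_n) : x_i \in \mathbb{R}\}, \qquad
\|x\| = \sqrt{\sum_{i=1}^{n} x_i^2}, \qquad
d(x, y) = \|x - y\| = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}
```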
GOALS OF DATA MINING

• Prediction
• Classification
• Optimization
• Identification
STYLES OF DATA MINING
• Directed data mining takes the form of predictive modelling, where we know exactly what we want to predict.
• It classifies data for use in making predictions or estimates, with the goal of deriving target values.
• For example, banks may use it to predict defaulters on loans, and businesses may use it to decide whom to market their products to.
• It uses popular data mining algorithms such as decision trees (discussed in detail later).
• Undirected data mining finds patterns in the data and leaves it up to the user to determine whether or not these patterns are important.
• Data is placed in a format that makes it easier for us to make sense of it.
• The most commonly used algorithm is clustering, which groups data together based on common characteristics (discussed in detail later).
• One can then take one of the derived clusters and apply the decision tree algorithm to it, so as to focus on a particular segment of the cluster.
DATA MINING METHODOLOGY
DATA MINING ALGORITHMS

• A data mining algorithm is a well-defined procedure that takes data as input and produces models or patterns as output.
DECISION TREES

• This algorithm analyzes the data and creates a repeating series of branches until no more relevant branches can be made.
• The end result is a tree structure (binary in algorithms such as CART) in which the splits can be followed along specific criteria to reach the most desired result.
• Decision Tree (DT):
– A tree where the root and each internal node is labeled with a question.
– The arcs represent each possible answer to the associated question.
– Each leaf node represents a prediction of a solution to the problem.
• A popular technique for classification; the leaf node indicates the class to which the corresponding tuple belongs.
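A minimal sketch of how a record is classified by following questions from the root to a leaf. The tree itself (attribute names, answers, and class labels for a loan-default example) is a hypothetical illustration, not something given in the notes, and the sketch covers only classification with an existing tree, not how the tree is learned from data.

```python
# Hypothetical tree for a loan-default example; the questions, answers
# and class labels are illustrative assumptions, not taken from the notes.
TREE = {
    "question": "income_band",  # attribute asked about at the root
    "answers": {
        "low": {
            "question": "has_collateral",  # internal node
            "answers": {"yes": "repay", "no": "default"},
        },
        "medium": "repay",  # leaf: predicted class
        "high": "repay",
    },
}


def classify(node, record):
    """Follow the arc matching each answer until a leaf (class label) is reached."""
    while isinstance(node, dict):
        answer = record[node["question"]]
        node = node["answers"][answer]
    return node


applicant = {"income_band": "low", "has_collateral": "no"}
print(classify(TREE, applicant))
```

Each dictionary is an internal node labeled with a question, each key under "answers" is an arc, and each plain string is a leaf naming the predicted class.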
CLUSTERING
• This algorithm groups data into clusters.
• The goal of clustering is to place records into groups such that records in a group are similar to each other and dissimilar to records in other groups.
• An important facet of clustering is the similarity function that is used.
• The Euclidean distance (the ordinary, straight-line distance between two points) can be used to measure similarity.
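A small sketch of Euclidean distance used as the similarity function, with each record placed in the group whose centre is nearest. The records, the centres, and the function names are made up for illustration. This shows only a single assignment step; a complete algorithm such as k-means (not named in the notes) would also recompute the centres and repeat.

```python
import math


def euclidean(p, q):
    """Straight-line distance between two equal-length numeric points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def assign_to_clusters(records, centres):
    """Place each record in the group whose centre is nearest."""
    clusters = {i: [] for i in range(len(centres))}
    for rec in records:
        nearest = min(range(len(centres)), key=lambda i: euclidean(rec, centres[i]))
        clusters[nearest].append(rec)
    return clusters


# Made-up 2-D records and two hand-picked centres (assumptions for illustration).
records = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 9.5)]
centres = [(1.0, 1.5), (8.5, 9.0)]
print(assign_to_clusters(records, centres))
```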
ASSOCIATION RULE MINING

• It is an important data mining model, initially used for Market Basket Analysis to find how items purchased by customers are related.
• Given a set of transactions, find rules that will predict the occurrence of an item based on the occurrences of other items in the transaction.

Market-Basket transactions (table not reproduced)

Example of Association Rules

{Diaper} → {Beer},
{Milk, Bread} → {Eggs, Coke},
{Beer, Bread} → {Milk}

Implication means co-occurrence, not causality!
DEFINITION: ASSOCIATION RULE
● Association Rule
– An implication expression of the form X → Y, where X and Y are itemsets
– Example: {Milk, Diaper} → {Beer}

● Rule Evaluation Metrics
– Support (s)
◆ Fraction of transactions that contain both X and Y
– Confidence (c)
◆ Measures how often items in Y appear in transactions that contain X
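Written as formulas, the two metrics look like this. The notation is an assumption introduced here, not taken from the notes: σ(·) is the number of transactions containing an itemset, and |T| is the total number of transactions.

```latex
s(X \rightarrow Y) = \frac{\sigma(X \cup Y)}{|T|}, \qquad
c(X \rightarrow Y) = \frac{\sigma(X \cup Y)}{\sigma(X)}
```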
MINING ASSOCIATION RULES
Example of Rules:
{Milk,Diaper} → {Beer} (s=0.4, c=0.67)
{Milk,Beer} → {Diaper} (s=0.4, c=1.0)
{Diaper,Beer} → {Milk} (s=0.4, c=0.67)
{Beer} → {Milk,Diaper} (s=0.4, c=0.67)
{Diaper} → {Milk,Beer} (s=0.4, c=0.5)
{Milk} → {Diaper,Beer} (s=0.4, c=0.5)

Observations:
• All the above rules are binary partitions of the same itemset:
{Milk, Diaper, Beer}
• Rules originating from the same itemset have identical support but
can have different confidence
• Thus, we may decouple the support and confidence requirements
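The observations above can be checked with a short sketch. The transaction table is not reproduced in these notes, so the five baskets below are an assumed stand-in, chosen only because they are consistent with the support and confidence values quoted for the rules; the function names are likewise illustrative.

```python
# Assumed data: the transaction table is not reproduced in the notes, so these
# five baskets are an illustrative stand-in consistent with the quoted figures.
TRANSACTIONS = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]


def count(itemset, transactions):
    """Number of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t)


def support(x, y, transactions):
    """Fraction of all transactions that contain both X and Y."""
    return count(x | y, transactions) / len(transactions)


def confidence(x, y, transactions):
    """Of the transactions containing X, the fraction that also contain Y."""
    return count(x | y, transactions) / count(x, transactions)


# Three binary partitions of the same itemset {Milk, Diaper, Beer}.
rules = [
    ({"Milk", "Diaper"}, {"Beer"}),
    ({"Milk", "Beer"}, {"Diaper"}),
    ({"Diaper"}, {"Milk", "Beer"}),
]
for x, y in rules:
    s = support(x, y, TRANSACTIONS)
    c = confidence(x, y, TRANSACTIONS)
    print(f"{sorted(x)} -> {sorted(y)}: s={s:.2f}, c={c:.2f}")
```

Support depends only on the combined itemset, so it is the same for every rule built from {Milk, Diaper, Beer}. Confidence divides by the count of the left-hand side alone, which is why it varies from rule to rule.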
CROSS-INDUSTRY STANDARD PROCESS FOR DATA MINING (CRISP-DM)
CRISP-DM: OVERVIEW

• CRISP-DM is a comprehensive data mining methodology and process model that provides anyone, from novices to data mining experts, with a complete blueprint for conducting a data mining project.
• CRISP-DM breaks down the life cycle of a data mining project into six phases.
CRISP-DM: PHASES

Business Understanding
• Understanding project objectives and requirements; data mining problem definition
Data Understanding
• Initial data collection and familiarization; identifying data quality issues; initial, obvious results
Data Preparation
• Record and attribute selection; data cleansing
Modeling
• Run the data mining tools
Evaluation
• Determine if results meet business objectives; identify business issues that should have been addressed earlier
Deployment
• Put the resulting models into practice; set up for continuous mining of the data
