
Lecture 1

This document provides an overview of data mining. It discusses data mining techniques like classification, clustering, association rule mining and sequential pattern mining. It describes the steps involved in a knowledge discovery process including data selection, cleaning, transformation, mining and evaluation. Examples of large datasets and applications of data mining are also presented. The document outlines the origins, functionalities and process of data mining.

Uploaded by

Subhashini Reddy


Data Mining

Data Mining Overview

• Data warehouses and OLAP (Online Analytical Processing)
• Association Rules Mining
• Clustering: Hierarchical and Partition approaches
• Classification: Decision Trees and Bayesian classifiers
• Sequential Pattern Mining
• Advanced topics: graph mining, privacy-preserving data mining, outlier detection, spatial data mining
What is Data Mining?
• Data Mining is:
(1) The efficient discovery of previously unknown, valid, potentially useful, understandable patterns in large datasets
(2) The analysis of (often large) observational data sets to find unsuspected relationships and to summarize the data in novel ways that are both understandable and useful to the data owner
Overview of terms
• Data: a set of facts (items) D, usually stored in a database
• Pattern: an expression E in a language L that describes a subset of facts
• Attribute: a field in an item i in D
• Interestingness: a function $I_{D,L}$ that maps an expression E in L into a measure space M
Overview of terms
• The Data Mining Task:
For a given dataset D, language of facts L, interestingness function $I_{D,L}$ and threshold c, find the expression E such that $I_{D,L}(E) > c$ efficiently.
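The task definition above can be made concrete with a small sketch. Everything here is an illustrative assumption rather than part of the lecture: the toy dataset `D`, the choice of itemsets as the pattern language L, and the use of relative frequency as the interestingness function. The brute-force enumeration is also not the "efficient" search the definition asks for; it only shows the shape of the problem.

```python
from itertools import combinations

# Hypothetical dataset D: each fact (item) is a set of attribute values.
D = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def patterns(dataset, max_size=2):
    """Enumerate expressions E in a toy language L: itemsets of up to max_size values."""
    values = sorted(set().union(*dataset))
    for size in range(1, max_size + 1):
        for combo in combinations(values, size):
            yield frozenset(combo)

def interestingness(pattern, dataset):
    """Toy I_{D,L}: the fraction of facts in D that the pattern describes."""
    return sum(pattern <= fact for fact in dataset) / len(dataset)

def mine(dataset, c):
    """Return every expression E whose interestingness exceeds the threshold c."""
    scored = ((E, interestingness(E, dataset)) for E in patterns(dataset))
    return {E: score for E, score in scored if score > c}

result = mine(D, c=0.5)
for E, score in result.items():
    print(sorted(E), score)
```

With the strict inequality, patterns that score exactly c are excluded, which matches the ">" in the definition.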
Knowledge Discovery
Steps of a KDD Process
• Learning the application domain
– Relevant prior knowledge and goals of application
• Creating a target data set: data selection
• Data cleaning and preprocessing (may take 60% of effort!)
• Data reduction and transformation
– Find useful features, dimensionality/variable reduction
• Choosing functions of data mining
– Summarization, classification, regression, association, clustering
• Choosing the mining algorithm(s)
• Data mining: search for patterns of interest
• Pattern evaluation and knowledge presentation
– Visualization, transformation, removing redundant patterns, etc.
• Use of discovered knowledge
Architecture: Typical Data Mining System
[Figure: layered architecture, from top to bottom]
• Graphical user interface
• Pattern evaluation
• Data mining engine
• Database or data warehouse server
• Data cleaning, data integration, and filtering
• Databases and data warehouse
Supporting component: knowledge base
Data Mining: On What Kinds of Data?
• Relational database
• Data warehouse
• Transactional database
• Advanced databases and information repositories
– Spatial and temporal data
– Time-series data
– Stream data
– Multimedia database
– Text databases & WWW

Examples of Large Datasets
• Government: IRS, NGA, …
• Large corporations
– Walmart: 20M transactions per day
– Mobil: 100 TB geological databases
– AT&T: 300M calls per day
– Credit card companies
• Scientific
– NASA, EOS project: 50 GB per hour
– Environmental datasets
Examples of Data Mining Applications
1. Fraud detection: credit cards, phone cards
2. Marketing: customer targeting
3. Data Warehousing: Walmart
4. Astronomy
5. Molecular biology
How Data Mining is used
1. Identify the problem
2. Use data mining techniques to transform the data into information
3. Act on the information
4. Measure the results
The Data Mining Process
1. Understand the domain
2. Create a dataset:
– Select the interesting attributes
– Data cleaning and preprocessing
3. Choose the data mining task and the specific algorithm
4. Interpret the results, and possibly return to step 2
Origins of Data Mining
• Draws ideas from machine learning/AI, pattern recognition, statistics, and database systems
• Must address:
– Enormity of data
– High dimensionality of data
– Heterogeneous, distributed nature of data
[Figure: Data Mining at the overlap of Statistics, AI / Machine Learning, and Database systems]
Data Mining Functionalities
• Concept description: Characterization and discrimination
– Generalize, summarize, and contrast data characteristics
• Association (correlation and causality)
– Diaper → Beer [support 0.5%, confidence 75%]
• Classification and Prediction
– Construct models (functions) that describe and distinguish classes or concepts for future prediction
– Presentation: decision tree, classification rule, neural network
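The bracketed pair on the Diaper → Beer rule above follows the usual association-rule convention of [support, confidence]. The sketch below shows how the two numbers are computed. The five transactions are hypothetical toy data, so the values differ from the 0.5% and 75% quoted on the slide, and the function names are illustrative.

```python
# Hypothetical transactions, each a set of purchased items.
transactions = [
    {"diaper", "beer", "milk"},
    {"diaper", "beer"},
    {"diaper", "bread"},
    {"beer", "chips"},
    {"diaper", "beer", "chips"},
]

def count(itemset, transactions):
    """Number of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions)

def support(antecedent, consequent, transactions):
    """Fraction of all transactions containing both sides of the rule."""
    return count(antecedent | consequent, transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Among transactions containing the antecedent, the fraction that
    also contain the consequent."""
    return (count(antecedent | consequent, transactions)
            / count(antecedent, transactions))

s = support({"diaper"}, {"beer"}, transactions)
c = confidence({"diaper"}, {"beer"}, transactions)
print(f"Diaper -> Beer [{s:.0%}, {c:.0%}]")
```

Support measures how often the rule applies at all; confidence measures how reliable it is when it does. A rule can have high confidence and still be uninteresting if its support is negligible.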

Data Mining Functionalities
• Cluster analysis
– Class label is unknown: group data to form new classes, e.g., cluster houses to find distribution patterns
– Maximizing intra-class similarity and minimizing inter-class similarity
• Outlier analysis
– Outlier: a data object that does not comply with the general behavior of the data
– Useful in fraud detection, rare events analysis
• Trend and evolution analysis
– Trend and deviation: regression analysis
– Sequential pattern mining, periodicity analysis
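As one simple illustration of outlier analysis from the list above, the sketch below flags values that lie far from the mean in units of standard deviation (a z-score rule). This is only one of many possible approaches and is not prescribed by the lecture; the threshold of 2.0 and the transaction amounts are assumptions chosen for the example.

```python
from statistics import mean, pstdev

def zscore_outliers(values, threshold=2.0):
    """Return the values whose distance from the mean exceeds
    `threshold` population standard deviations."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All values are identical, so nothing deviates from the rest.
        return []
    return [v for v in values if abs(v - mu) / sigma > threshold]

# Hypothetical card transaction amounts with one unusual charge.
amounts = [20, 22, 19, 21, 23, 20, 18, 22, 21, 500]
outliers = zscore_outliers(amounts)
print(outliers)
```

Because a single extreme value inflates both the mean and the standard deviation, this rule works best on small contamination; rules based on the median are a common alternative when that assumption does not hold.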

Data Mining: Confluence of Multiple Disciplines
[Figure: Data Mining at the center, drawing on]
• Database systems
• Statistics
• Machine learning
• Visualization
• Algorithms
• Other disciplines
