Data Mining for Business Insights

Data mining is the automated analysis of massive data sets to discover hidden patterns and relationships. It has grown out of the inability to analyze growing amounts of data using traditional methods, and the ability to economically store large datasets. Data mining uses techniques from machine learning, statistics, pattern recognition and visualization to extract useful information from large datasets. It is a core component of the knowledge discovery process.

Data Mining

Md Tabrez Nafis
Department of Computer Science & Engineering
JAMIA HAMDARD, New Delhi

Why Data Mining?

 The Explosive Growth of Data: from terabytes to petabytes

 Data collection and data availability
 Automated data collection tools, database systems, Web,
computerized society
 Major sources of abundant data
 Business: Web, e-commerce, transactions, stocks, …
 Science: Remote sensing, bioinformatics, scientific simulation, …
 Society and everyone: news, digital cameras, YouTube
 We are drowning in data, but starving for knowledge!
 “Necessity is the mother of invention”—Data mining—Automated
analysis of massive data sets

Evolution of Sciences
 Before 1600, empirical science
 1600-1950s, theoretical science
 Each discipline has grown a theoretical component. Theoretical models often
motivate experiments and generalize our understanding.
 1950s-1990s, computational science
 Over the last 50 years, most disciplines have grown a third, computational branch
(e.g. empirical, theoretical, and computational ecology, or physics, or linguistics.)
 Computational Science traditionally meant simulation. It grew out of our inability to
find closed-form solutions for complex mathematical models.
 1990-now, data science
 The flood of data from new scientific instruments and simulations
 The ability to economically store and manage petabytes of data online
 The Internet and computing Grid that makes all these archives universally accessible
 Scientific info. management, acquisition, organization, query, and visualization tasks
scale almost linearly with data volumes. Data mining is a major new challenge!

Evolution of Database Technology
 1960s:
 Data collection, database creation, IMS and network DBMS
 1970s:
 Relational data model, relational DBMS implementation
 1980s:
 RDBMS, advanced data models (extended-relational, OO, deductive, etc.)
 Application-oriented DBMS (spatial, scientific, engineering, etc.)
 1990s:
 Data mining, data warehousing, multimedia databases, and Web
databases
 2000s:
 Stream data management and mining
 Data mining and its applications
 Web technology (XML, data integration) and global information systems

What Is Data Mining?

 Data mining (knowledge discovery from data)

 Extraction of interesting (non-trivial, implicit, previously
unknown and potentially useful) patterns or knowledge from
huge amounts of data
 Data mining: a misnomer?
 Alternative names
 Knowledge discovery (mining) in databases (KDD), knowledge
extraction, data/pattern analysis, data archeology, data
dredging, information harvesting, business intelligence, etc.

Knowledge Discovery (KDD) Process

 Data mining: the core of the knowledge discovery process

[Figure: the KDD process as a pipeline. Databases → Data Cleaning and Data Integration → Data Warehouse → Selection → Task-relevant Data → Data Mining → Pattern Evaluation]

Data Mining and Business Intelligence

[Figure: pyramid of layers, listed here from top to bottom; the potential to support business decisions increases toward the top]
 Decision Making (End User)
 Data Presentation: Visualization Techniques (Business Analyst)
 Data Mining: Information Discovery (Data Analyst)
 Data Exploration: Statistical Summary, Querying, and Reporting
 Data Preprocessing/Integration, Data Warehouses (DBA)
 Data Sources: Paper, Files, Web documents, Scientific experiments, Database Systems

Data Mining: Confluence of Multiple Disciplines

[Figure: Data Mining at the center, drawing on Database Technology, Statistics, Machine Learning, Visualization, Pattern Recognition, Algorithms, and Other Disciplines]

Why Not Traditional Data Analysis?
 Tremendous amount of data
 Algorithms must be highly scalable to handle data volumes such as
terabytes
 High-dimensionality of data
 Micro-array may have tens of thousands of dimensions
 High complexity of data
 Data streams and sensor data
 Time-series data, temporal data, sequence data
 Structured data, social networks
 Heterogeneous databases
 Spatial, multimedia, text and Web data
 Software programs, scientific simulations
 New and sophisticated applications

Database Processing vs. Data Mining Processing

Aspect   Database processing              Data mining processing
Query    Well defined; SQL                Poorly defined; no precise query language
Data     Operational data                 Not operational data
Output   Precise; a subset of database    Fuzzy; not a subset of database

Query Examples
 Database
– Find all credit applicants with last name of Smith.
– Identify customers who have purchased more
than Rs. 10,000 in the last month.
– Find all customers who have purchased milk

 Data Mining
– Find all credit applicants who are poor credit
risks. (classification)
– Identify customers with similar buying habits.
(Clustering)
– Find all items which are frequently purchased
with milk. (association rules)

Architecture of Data Mining System

[Figure: layered architecture of a data mining system, described here from the user down to the data]
 User interface: communicates between users and the data mining system. Visualizes results or performs exploration on data and schemas.
 Pattern evaluation: tests for interestingness of a pattern.
 Mining engine: performs functionalities like characterization, association, classification, prediction, etc.
 Data server: is responsible for fetching relevant data based on the user request.
 Domain knowledge: the information about the domain we are mining, such as concept hierarchies used to organize attributes onto various levels of abstraction. Also contains user beliefs, which can be used to assess the interestingness of a pattern, and thresholds.
 Data sources: this is usually the source of data. The data may require cleaning and integration.

Basic Data Mining Tasks
 Classification maps data into predefined
groups or classes
 Supervised learning
 Prediction
 Regression

 Clustering groups similar data together into
clusters.
 Unsupervised learning
 Segmentation
 Partitioning
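As a rough illustration of classification as supervised learning, the sketch below assigns a new record to one of two predefined classes using a 1-nearest-neighbor rule. The training records, the two features (age and income in thousands), and the class labels are all hypothetical, and nearest-neighbor is only one of many possible classifiers; the slides do not prescribe a method.

```python
# Hypothetical labeled training data: (age, income in thousands) -> credit risk class.
training = [
    ((25, 20), "poor risk"),
    ((22, 15), "poor risk"),
    ((45, 90), "good risk"),
    ((35, 60), "good risk"),
]


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length feature tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def classify(sample, training):
    """Return the class label of the closest training example (1-nearest neighbor)."""
    features, label = min(
        training, key=lambda example: squared_distance(example[0], sample)
    )
    return label


print(classify((40, 80), training))
print(classify((24, 18), training))
```

The classes exist before any new record is seen, which is what separates this from clustering, where the groups themselves have to be discovered.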

Basic Data Mining Tasks (cont’d)
 Link Analysis uncovers relationships among data.
 Affinity Analysis
 Association Rules
 Sequential Analysis determines sequential patterns.

Multi-Dimensional View of Data Mining
 Data to be mined
 Relational, data warehouse, transactional, stream, object-
oriented/relational, active, spatial, time-series, text, multi-media,
heterogeneous, legacy, WWW
 Knowledge to be mined
 Characterization, discrimination, association, classification, clustering,
trend/deviation, outlier analysis, etc.
 Multiple/integrated functions and mining at multiple levels
 Techniques utilized
 Database-oriented, data warehouse (OLAP), machine learning, statistics,
visualization, etc.
 Applications adapted
 Retail, telecommunication, banking, fraud analysis, bio-data mining, stock
market analysis, text mining, Web mining, etc.

Data Mining: Classification Schemes

 General functionality
 Descriptive data mining
 Predictive data mining
 Different views lead to different classifications
 Data view: Kinds of data to be mined
 Knowledge view: Kinds of knowledge to be discovered
 Method view: Kinds of techniques utilized
 Application view: Kinds of applications adapted

Data Mining Functionalities
 Multidimensional concept description: Characterization and
discrimination
 Generalize, summarize, and contrast data characteristics, e.g.,
dry vs. wet regions
 Frequent patterns, association, correlation vs. causality
 Bread → Butter [support 0.5%, confidence 75%] (Correlation or causality?)
 Classification and prediction
 Construct models (functions) that describe and distinguish
classes or concepts for future prediction
 E.g., classify countries based on (climate), or classify cars
based on (gas mileage)
 Predict some unknown or missing numerical values

Data Mining Functionalities (2)
 Cluster analysis
 Class label is unknown: Group data to form new classes, e.g.,
cluster houses to find distribution patterns
 Maximizing intra-class similarity & minimizing interclass similarity
 Outlier analysis
 Outlier: Data object that does not comply with the general behavior
of the data
 Noise or exception? Useful in fraud detection, rare events analysis
 Trend and evolution analysis
 Trend and deviation: e.g., regression analysis
 Sequential pattern mining: e.g., digital camera → large SD memory
 Periodicity analysis
 Similarity-based analysis
 Other pattern-directed or statistical analyses
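To make the idea of an outlier concrete, here is a minimal sketch that flags values far from the bulk of the data. It uses the median absolute deviation (MAD), one common robust choice that the slides do not prescribe. The transaction amounts are made up, and the 0.6745 scaling constant and 3.5 cutoff are conventional defaults rather than values from the source.

```python
from statistics import median


def mad_outliers(values, threshold=3.5):
    """Return the values whose modified z-score exceeds the threshold."""
    med = median(values)
    # Median absolute deviation: typical distance of a value from the median.
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        return []  # no spread around the median, so this rule flags nothing
    return [v for v in values if 0.6745 * abs(v - med) / mad > threshold]


# Hypothetical card transaction amounts; one is far from the rest.
amounts = [52, 48, 50, 47, 51, 49, 53, 500]
print(mad_outliers(amounts))
```

Whether a flagged value is noise or a genuine exception (for example, fraud) still has to be decided by the analyst.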

Why Data Mining?—Potential Applications

 Data analysis and decision support

 Market analysis and management
 Target marketing, customer relationship management (CRM),
market basket analysis, cross selling, market segmentation
 Risk analysis and management
 Forecasting, customer retention, improved underwriting,
quality control, competitive analysis
 Fraud detection and detection of unusual patterns (outliers)
 Other Applications
 Text mining (news group, email, documents) and Web mining
 Stream data mining
 Bioinformatics and bio-data analysis

Ex. 1: Market Analysis and Management
 Where does the data come from?—Credit card transactions, loyalty cards,
discount coupons, customer complaint calls, plus (public) lifestyle studies
 Target marketing
 Find clusters of “model” customers who share the same characteristics: interest,
income level, spending habits, etc.
 Determine customer purchasing patterns over time
 Cross-market analysis: find associations/correlations between product sales,
and predict based on such associations
 Customer profiling—What types of customers buy what products (clustering
or classification)
 Customer requirement analysis
 Identify the best products for different groups of customers
 Predict what factors will attract new customers
 Provision of summary information
 Multidimensional summary reports
 Statistical summary information (data central tendency and variation)

Ex. 2: Corporate Analysis & Risk Management

 Finance planning and asset evaluation

 cash flow analysis and prediction
 contingent claim analysis to evaluate assets
 cross-sectional and time series analysis (financial-ratio, trend
analysis, etc.)
 Resource planning
 summarize and compare the resources and spending
 Competition
 monitor competitors and market directions
 group customers into classes and apply a class-based pricing procedure
 set pricing strategy in a highly competitive market

Ex. 3: Fraud Detection & Mining Unusual Patterns

 Approaches: Clustering & model construction for frauds, outlier analysis

 Applications: Health care, retail, credit card service, telecomm.
 Auto insurance: ring of collisions
 Money laundering: suspicious monetary transactions
 Medical insurance
 Professional patients, ring of doctors, and ring of references
 Unnecessary or correlated screening tests
 Telecommunications: phone-call fraud
 Phone call model: destination of the call, duration, time of day or
week. Analyze patterns that deviate from an expected norm
 Retail industry
 Analysts estimate that 38% of retail shrink is due to dishonest
employees

Mining for Knowledge
 Knowledge in the form of rules
 If <condition_1>&<condition_2>& …&<condition_n> Then
<conclusion>
 Types of knowledge
 Association
 Presence of one set of items/attributes implies presence of
another set.
 Classification
 Given examples of objects belonging to different groups,
develop profile of each group in terms of attributes of the
objects.
 Clustering.
 Unsupervised grouping of similar records based on attributes.
 Prediction (temporal and spatial).
 Historical records collected at fixed periods of time.

Mining Association Rules

 The presence of one set of items in a transaction
implies the presence of another set of items
 30% of people who buy bread also buy butter.
 The presence of an attribute value in a record
implies the presence of another
 60% of patients with these symptoms also have that
symptom.

Data Mining Functionalities:
Mining Frequent Patterns
Frequent patterns are the patterns that occur
frequently in the data. Patterns can include
itemsets, sequences and subsequences.
A frequent itemset refers to a set of items that
often appear together in a transactional data set,
e.g., bread and milk.
Data Mining Functionalities:
Mining Frequent Patterns
Association Rules

Single-Dimension Association Rule
buys(X, “computer”) => buys(X, “software”) [support = 1%, confidence = 50%]
Confidence: if a customer buys a computer, there is a 50% chance that the customer will buy software as well.
Support: 1% of all the transactions under analysis show that computer and software are purchased together.

Multi-Dimension Association Rule
age(X, “20..29”) ^ income(X, “40K..49K”) => buys(X, “laptop”) [support = 2%, confidence = 60%]

Association rules are discarded as uninteresting if they do not satisfy both a minimum support threshold and a minimum confidence threshold.
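The support and confidence figures attached to a rule can be computed directly from transaction data. The sketch below does this for a rule of the form "computer => software" over a small, made-up list of transactions; the function names and the thresholds passed to `is_interesting` are illustrative assumptions, not part of the slides.

```python
# Hypothetical toy transactions; each is the set of items bought together.
transactions = [
    {"computer", "software"},
    {"computer"},
    {"printer", "paper"},
    {"computer", "software", "printer"},
    {"paper"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """support(antecedent and consequent together) / support(antecedent)."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


def is_interesting(antecedent, consequent, transactions, min_support, min_confidence):
    """Keep a rule only if it meets both minimum thresholds."""
    both = set(antecedent) | set(consequent)
    return (
        support(both, transactions) >= min_support
        and confidence(antecedent, consequent, transactions) >= min_confidence
    )


rule_support = support({"computer", "software"}, transactions)
rule_confidence = confidence({"computer"}, {"software"}, transactions)
print(f"support = {rule_support:.0%}, confidence = {rule_confidence:.0%}")
```

Here two of five transactions contain both items and three contain a computer, so the rule has support 40% and confidence about 67%.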
An Example Association Rule

 Mobile Telecom Data

 Provided by a telecom company.
 Over 200 relational tables and transactional data
of over 30,000 records.
 Examples of discovered association rules
 60% of customers who call from New Delhi call Mumbai.
 77% of customers whose average call duration is greater
than 5 minutes make an average of over 80
phone calls per month.

Data Mining Functionalities:
Classification and Prediction
Classification is the process of finding a model (or function) that describes and
distinguishes data classes or concepts. The model is derived based on the
analysis of a set of training data and is used to predict the class label of objects.

Representation of Derived model

IF-THEN Rules

Decision Tree

Neural Network
Data Mining Functionalities:
Classification and Prediction
Prediction models continuous-valued functions, i.e., it is used to predict missing or
unavailable numeric data values rather than class labels.
Prediction can be used for both numeric prediction and class label prediction.
Regression analysis is a statistical method used for numeric prediction.
Classification and regression may need to be preceded by relevance analysis,
which attempts to identify attributes that are significantly relevant to the
classification and regression process. Such attributes will be selected for the
classification and regression process. Other attributes, which are irrelevant, can
then be excluded from consideration.
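Numeric prediction by regression can be sketched with an ordinary least-squares line fit. The example below is a minimal illustration with a single predictor; the data values (years of card use against monthly spend) are invented, and real regression analysis would normally involve more attributes and a check of the fit.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept for one predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical records: years of card use and monthly spend (in thousands).
years = [1, 2, 3, 4, 5]
spend = [30, 35, 40, 45, 50]

slope, intercept = fit_line(years, spend)
# Predict the unavailable numeric value for a customer with 6 years of use.
predicted = slope * 6 + intercept
print(slope, intercept, predicted)
```

The output is a number rather than a class label, which is the distinction the slide draws between prediction and classification.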
Mining Classification Rules
[Figure: patient records (symptoms, diseases) are labeled Recovered or Never recovered; the question for a new patient is whether they will recover or not]

An Example of Classification
 Credit card data
 Each transaction contains transaction date, amount, and a set of
items purchased, etc.
 Each customer record contains gender, age, education
background, etc.
 Example of rules discovered:
 IF use of card >= 9 months continuously & no. of transactions <= 2
THEN Cash Advance = Yes.
 Actionable item:
 Promote credit services to potential customers who require cash
advances.
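A discovered IF-THEN rule like the one above can be applied to customer records as a simple predicate. The sketch below is only an illustration: the `Customer` fields and the sample records are hypothetical names chosen to mirror the rule's two conditions.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    name: str
    months_of_continuous_use: int
    num_transactions: int


def needs_cash_advance(customer):
    """IF card use >= 9 months continuously AND transactions <= 2 THEN Cash Advance = Yes."""
    return customer.months_of_continuous_use >= 9 and customer.num_transactions <= 2


# Hypothetical customer records.
customers = [
    Customer("A", 12, 1),
    Customer("B", 3, 1),
    Customer("C", 10, 7),
]

# Customers matched by the rule are candidates for the credit-services promotion.
targets = [c.name for c in customers if needs_cash_advance(c)]
print(targets)
```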

Data Mining Functionalities:
Cluster Analysis
Clustering analyzes data objects without consulting
class labels.
Clustering can be used to generate class labels for
a group of data when no labels existed at the
beginning.
The objects are clustered or grouped based on the
principle of maximizing the intra-class similarity and
minimizing the inter-class similarity.
Discovering Clusters
Dividing them up into groups according to similarity
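One way to divide records into groups by similarity is k-means, sketched below in plain Python. The slides do not name an algorithm, so k-means is an assumption here, as are the 2-D points and the choice of two starting centroids.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans(points, centroids, iterations=10):
    """Alternate between assigning points and recomputing centroids."""
    centroids = list(centroids)
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: squared_distance(p, centroids[i]),
            )
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = [
            mean_point(c) if c else centroids[i] for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # assignments are stable
        centroids = new_centroids
    return centroids, clusters


# Hypothetical 2-D records forming two visibly separate groups.
points = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 9), (8.5, 9.5)]
centroids, clusters = kmeans(points, centroids=[(1, 1), (9, 9)])
print(clusters)
```

No class labels are supplied; the two groups emerge only from the distances between records.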

Classification ≠Clustering

Classification: What is the difference between good and bad customers? (classes are given: Good Customers, Bad Customers)

Clustering: How can I group the customers?

Discovering Sequential Patterns
 People who have purchased a VCR are three
times more likely to purchase a camcorder
two to four months after the purchase.

 If the price of Stock A increases by more than
10% and the price of Stock B decreases by
less than 2% today, then the price of Stock C
will increase by 5% two days later.

An Example of Sequential Pattern Mining
 Electricity consumption data:
 A set of time series each associated with an
industrial user.
 Each time series represents an electricity load
profile of a user at a certain premise.
 Reading of electricity load taken every 30 min.
 The Goal
 Identify companies with similar electricity load
profiles using data mining.
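Finding companies with similar load profiles comes down to comparing time series. The sketch below compares every pair of profiles by Euclidean distance and reports the closest pair. The company names and readings are invented, each profile is shortened to six readings (half-hourly data would give 48 per day), and Euclidean distance is just one possible similarity measure.

```python
from itertools import combinations
from math import dist

# Hypothetical load profiles: one short series of readings per industrial user.
profiles = {
    "factory_a": [10, 12, 30, 32, 31, 12],
    "factory_b": [11, 13, 29, 33, 30, 11],
    "bakery_c": [35, 30, 12, 10, 11, 33],
}


def most_similar_pair(profiles):
    """Return the pair of users whose profiles are closest in Euclidean distance."""
    return min(
        combinations(sorted(profiles), 2),
        key=lambda pair: dist(profiles[pair[0]], profiles[pair[1]]),
    )


print(most_similar_pair(profiles))
```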

Web Log Mining

 Web Servers register a log entry for every single
access they get.
 A huge number of accesses (hits) are registered and
collected in an ever-growing web log.
 Web log mining:
 Understand general access patterns and trends.
 Better structure and grouping of resource providers.
 Adaptive Sites -- Web site restructures itself automatically.
 Personalization.
 Target customers for electronic commerce
 Identify potential prime advertisement locations

An Example of Web Log Mining

 Given a web access log file

 Provided by an airline company.
 The Goal
 Analyze user access patterns
 e.g. Page A --> Page B --> Page C --> …
 Predict which page the viewer will arrive at after accessing certain URLs.
 Results:
 IF Page = Destination Information & Next Page = Flight
Schedules THEN Next Page = XxxAir Travel Packages
 IF Day of week = Wed. & Time = Non-office hour
THEN duration = long
 Actionable Items
 Golden time for advertisements is on Wed. during non-office
hours.
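A next-page rule of the kind shown above can be derived by counting which page follows each pair of consecutive pages in the access log. The sketch below assumes the log has already been split into per-visitor sessions; the sessions and page names are made up for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical sessions: the ordered pages each visitor viewed.
sessions = [
    ["Destination Information", "Flight Schedules", "Travel Packages"],
    ["Destination Information", "Flight Schedules", "Travel Packages"],
    ["Home", "Flight Schedules", "Fares"],
    ["Destination Information", "Flight Schedules", "Fares"],
]


def next_page_counts(sessions):
    """Count, for each pair of consecutive pages, which page came next."""
    counts = defaultdict(Counter)
    for pages in sessions:
        for i in range(len(pages) - 2):
            counts[(pages[i], pages[i + 1])][pages[i + 2]] += 1
    return counts


counts = next_page_counts(sessions)
context = ("Destination Information", "Flight Schedules")
page, hits = counts[context].most_common(1)[0]
rule_confidence = hits / sum(counts[context].values())
print(f"IF {context[0]} -> {context[1]} THEN {page} ({rule_confidence:.0%})")
```

In this toy log, two of the three sessions that pass through the two context pages continue to Travel Packages, so the rule holds with a confidence of about 67%.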

KDD Process: Several Key Steps
 Learning the application domain
 relevant prior knowledge and goals of application
 Creating a target data set: data selection
 Data cleaning and preprocessing: (may take 60% of effort!)
 Data reduction and transformation
 Find useful features, dimensionality/variable reduction, invariant
representation
 Choosing functions of data mining
 summarization, classification, regression, association, clustering
 Choosing the mining algorithm(s)
 Data mining: search for patterns of interest
 Pattern evaluation and knowledge presentation
 visualization, transformation, removing redundant patterns, etc.
 Use of discovered knowledge

Are All the “Discovered” Patterns Interesting?

 Data mining may generate thousands of patterns: Not all of them
are interesting
 Suggested approach: Human-centered, query-based, focused mining

Requirements and Challenges
 Variety of data types.
 Noisy and incomplete data
 The interestingness problem.
 Different kinds of knowledge.
 Different levels of abstraction.
 Expression and visualization of data mining
results.
 Efficiency and scalability of data mining
algorithms.

 Thank You
