DMiningKuliah1 (Introduction)

The document provides an overview of data mining, its significance, and its applications across various fields. It discusses the process of Knowledge Data Discovery (KDD), the tasks involved in data mining such as classification, estimation, prediction, clustering, and association, as well as potential applications in market analysis, risk management, and fraud detection. The document emphasizes the importance of data mining in extracting valuable insights from the vast amounts of data generated in today's world.

Uploaded by

lala lele

Data Mining

February 9, 2023 Data Mining 1


Introduction
■ Why data mining?
■ What is Data Mining / Knowledge Discovery in Databases (KDD)?
■ Origins of Data Mining
■ Potential Applications
■ Data Mining: On what kind of data?
■ Data Mining Functionalities
■ OLAP Mining System



Why Data Mining: Trends Leading to Data Flood
More data is generated:
■ Bank, telecom, and other business transactions
■ Scientific data: astronomy, biology, etc.
■ Web, text, and e-commerce


Scale Of Data



Data Growth Rate
■ Twice as much information was created in 2002 as in 1999 (roughly 30% annual growth)
■ Other growth rate estimates are even higher
■ And THE PROBLEM IS:
  ■ Very little data will ever be looked at by a human
  ■ We are drowning in data, but starving for knowledge
■ Knowledge Discovery is NEEDED to make sense of and use the data.

Why Mine Data?
■ There is often information “hidden” in the data that is not readily evident
■ Human analysts may take weeks to discover useful information
■ Much of the data is never analyzed at all


Why Mine Data?



What Is Data Mining: Many Names of Data Mining
■ Data Fishing, Data Dredging: 1960-
  ■ used by statisticians (as a derogatory term)
■ Data Mining: 1990-
  ■ used by the database and business communities
  ■ in 2003, acquired a bad image because of TIA (Total Information Awareness)
■ Knowledge Discovery in Databases: 1989-
  ■ used by the AI and machine learning communities
■ also Data Archaeology, Information Harvesting, Information Discovery, Knowledge Extraction, ...

Currently: Data Mining and Knowledge Discovery in Databases (KDD) are used interchangeably

Knowledge Discovery in Databases (KDD)
■ Knowledge Discovery in Databases is the non-trivial process of identifying
  ■ valid
  ■ novel
  ■ potentially useful
  ■ and ultimately understandable
  patterns in data.
from Advances in Knowledge Discovery and Data Mining, Fayyad, Piatetsky-Shapiro, Smyth, and Uthurusamy (Chapter 1), AAAI/MIT Press 1996

What is (not) Data Mining?
What is not Data Mining?
■ Looking up a phone number in a phone directory
■ Querying a Web search engine for information about “Amazon”
What is Data Mining?
■ Finding that certain names are more prevalent in certain US locations (O’Brien, O’Rourke, O’Reilly… in the Boston area)
■ Grouping together similar documents returned by a search engine according to their context (e.g. Amazon rainforest, Amazon.com, etc.)
Origins of Data Mining
■ Draws ideas from machine learning/AI, pattern recognition, statistics, and database systems


Data Mining: Confluence of Multiple Disciplines
[Figure: Data Mining at the center, surrounded by Database Technology, Statistics, Machine Learning, Visualization, Information Science, and Other Disciplines]

What is Data Mining: A KDD Process
■ Data mining: the core of the Knowledge Discovery in Databases process.
[Figure: Databases → Data Cleaning and Data Integration → Data Warehouse → Selection → Task-relevant Data → Data Mining → Pattern Evaluation]

Steps of a KDD Process
1. Learning the application domain
  ■ relevant prior knowledge and goals of application
2. Creating a target data set → data selection
3. Data cleaning and preprocessing (may take 60% of effort!)
4. Data reduction and transformation
  ■ find useful features, dimensionality/variable reduction, invariant representation
5. Choosing functions of data mining
  ■ summarization, classification, regression, association, clustering
6. Choosing the mining algorithm(s)
7. Data mining → search for patterns of interest
8. Pattern evaluation and knowledge presentation
  ■ visualization, transformation, removing redundant patterns, etc.
9. Use of discovered knowledge
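The steps above can be sketched as a small pipeline. This is only an illustration under stated assumptions: the records, the field names, and the helpers `select`, `clean`, `transform`, `mine`, and `evaluate` are all hypothetical, and the "mining" step is a plain co-occurrence count standing in for a real algorithm.

```python
from collections import Counter

# Hypothetical raw records; None marks a missing value.
RAW = [
    {"id": 1, "age": 25, "city": "Boston", "bought": "beer"},
    {"id": 2, "age": None, "city": "Boston", "bought": "diapers"},
    {"id": 3, "age": 41, "city": "Austin", "bought": "beer"},
    {"id": 4, "age": 38, "city": "Boston", "bought": "beer"},
]


def select(records, fields):
    """Step 2: keep only the task-relevant fields."""
    return [{f: r[f] for f in fields} for r in records]


def clean(records):
    """Step 3: drop records that contain a missing value."""
    return [r for r in records if all(v is not None for v in r.values())]


def transform(records):
    """Step 4: reduce the numeric age to a coarse band."""
    return [
        {**r, "age": "under_30" if r["age"] < 30 else "30_plus"}
        for r in records
    ]


def mine(records):
    """Step 7: count (age band, product) co-occurrences."""
    return Counter((r["age"], r["bought"]) for r in records)


def evaluate(patterns, min_count):
    """Step 8: keep only patterns seen at least min_count times."""
    return {p: n for p, n in patterns.items() if n >= min_count}


def run_pipeline(records):
    data = select(records, ["age", "bought"])
    data = clean(data)
    data = transform(data)
    return evaluate(mine(data), min_count=2)


print(run_pipeline(RAW))
```

In practice each stage is far more involved, and, as the slide notes, cleaning and preprocessing alone may take most of the effort.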

Data Mining and Business Intelligence
[Figure: pyramid of layers, with increasing potential to support business decisions toward the top; the role associated with each layer is given in parentheses]
■ Making Decisions (End User)
■ Data Presentation: Visualization Techniques (Business Analyst)
■ Data Mining: Information Discovery (Data Analyst)
■ Data Exploration: Statistical Analysis, Querying and Reporting (Data Analyst)
■ Data Warehouses / Data Marts: OLAP, MDA (DBA)
■ Data Sources: Paper, Files, Information Providers, Database Systems, OLTP (DBA)

Architecture of a Typical Data Mining System
[Figure: layered architecture, listed top to bottom, with a knowledge-base alongside]
■ Graphical user interface
■ Pattern evaluation
■ Data mining engine
■ Database / data warehouse server
■ Data cleaning & data integration, filtering
■ Databases and Data Warehouse
What Tasks Can Data Mining Accomplish?

The most common data mining tasks:
■ Description
■ Classification
■ Estimation
■ Prediction
■ Clustering
■ Association


Task 1: Description
■ Find ways to describe patterns and trends lying within data.
■ For example:
  ■ A pollster may uncover evidence that those who have been laid off are less likely to support the present incumbent in the presidential election.
  ■ Descriptions of patterns and trends often suggest possible explanations: those who have been laid off are now less well off financially than before the incumbent was elected, and so would tend to prefer an alternative.


Task 1: Description
■ The models should be as transparent as possible.
■ High-quality description can often be accomplished by exploratory data analysis, a graphical method of exploring data in search of patterns and trends.


Task 2: Classification
The data mining model examines a large set of records, each record containing information on the target variable as well as a set of input or predictor variables.
■ For example, consider an excerpt from a data set in which the target variable is income bracket.
■ After it “learns” from the data, the algorithm can classify new records for which no information about income bracket is available.


Task 2: Classification
Examples of classification tasks in business and research include:
■ Determining whether a particular credit card transaction is fraudulent
■ Placing a new student into a particular track with regard to special needs
■ Assessing whether a mortgage application is a good or bad credit risk
■ Diagnosing whether a particular disease is present
■ Determining whether a will was written by the actual deceased, or fraudulently by someone else
■ Identifying which type of drug a patient should be prescribed, based on certain patient characteristics
■ Etc.


Task 2: Classification
■ Common data mining methods used for classification are:
  ■ k-nearest neighbor
  ■ decision tree
  ■ neural network
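As an illustration of the first method, here is a minimal k-nearest neighbor sketch using only the standard library. The training records, the two predictors (age and years of education), and the `low`/`high` income brackets are hypothetical; the features are left unscaled for brevity, which a real application would usually not do.

```python
import math
from collections import Counter


def knn_classify(train, query, k=3):
    """Return the majority label among the k training points nearest to query."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical records: (age, years_of_education) -> income bracket
TRAIN = [
    ((25, 12), "low"),
    ((30, 12), "low"),
    ((28, 14), "low"),
    ((45, 18), "high"),
    ((50, 16), "high"),
    ((48, 20), "high"),
]

# Classify two new records whose income bracket is unknown.
print(knn_classify(TRAIN, (47, 17)))
print(knn_classify(TRAIN, (27, 13)))
```

The new record is assigned the bracket held by most of its closest neighbors, so the "model" is simply the stored training data plus a distance measure.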



Task 3: Estimation
■ Similar to classification except that the target variable is numerical rather than categorical.
■ Models are built using “complete” records, which provide the value of the target variable as well as the predictors.
■ Then, for new observations, estimates of the value of the target variable are made, based on the values of the predictors.


Task 3: Estimation
Examples of estimation tasks in business and research include:
■ Estimating the amount of money a randomly chosen family of four will spend for back-to-school shopping this fall.
■ Estimating the percentage decrease in rotary movement sustained by a National Football League running back with a knee injury.
■ Estimating the number of points per game that Patrick Ewing will score when double-teamed in the playoffs.
■ Estimating the grade-point average (GPA) of a graduate student, based on that student’s undergraduate GPA.
■ Estimating a person's yearly income from personal data, e.g. age, job, home address, etc.
■ Etc.


Task 3: Estimation
■ Common data mining methods used for estimation are:
  ■ Statistical analysis:
    ■ Point estimation
    ■ Confidence interval estimations
    ■ Simple linear regression
    ■ Multiple regression
    ■ Correlation
  ■ Neural networks
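Simple linear regression, one of the methods listed above, can be sketched for the graduate GPA example. The GPA values below are made up for illustration, and `fit_line` is a hypothetical helper implementing ordinary least squares for a single predictor.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one predictor: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical "complete" records: undergraduate GPA and graduate GPA.
undergrad = [2.0, 3.0, 4.0]
graduate = [2.5, 3.0, 3.5]

slope, intercept = fit_line(undergrad, graduate)

# Estimate the graduate GPA of a new student with a 3.5 undergraduate GPA.
estimate = slope * 3.5 + intercept
print(slope, intercept, estimate)
```

The model is fitted on records where the target is known, then applied to a new observation where only the predictor is available, which is exactly the estimation setup described earlier.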

Task 4: Prediction
Similar to classification and estimation, except that for prediction, the results lie in the future.
■ For example, predicting the price of a stock three months in the future.


Task 4: Prediction
Examples of prediction tasks in business and research include:
■ Predicting the price of a stock three months into the future
■ Predicting the percentage increase in traffic deaths next year if the speed limit is increased
■ Predicting the winner of this fall’s baseball World Series, based on a comparison of team statistics
■ Predicting whether a particular molecule in drug discovery will lead to a profitable new drug for a pharmaceutical company

Task 4: Prediction
■ Any of the methods and techniques used for classification and estimation may also be used for prediction. These include:
  ■ Statistical methods
  ■ Neural networks
  ■ Decision tree
  ■ k-nearest neighbor

Task 5: Clustering
■ Grouping of records, observations, or cases into classes of similar objects.
■ A cluster is a collection of records that are similar to one another, and dissimilar to records in other clusters.
■ The clustering task does not try to classify, estimate, or predict the value of a target variable.
■ It seeks to segment the entire data set into relatively homogeneous subgroups or clusters.


Task 5: Clustering
■ For example, the PRIZM segmentation system describes every U.S. zip code area in terms of distinct lifestyle types.
■ For illustration, the clusters for zip code 90210, Beverly Hills, California, are:
  ■ Cluster 01: Blue Blood Estates
  ■ Cluster 10: Bohemian Mix
  ■ Cluster 02: Winner’s Circle
  ■ Cluster 07: Money and Brains
  ■ Cluster 08: Young Literati


Task 5: Clustering
Common data mining methods used for clustering are:
■ Hierarchical clustering (AGNES, DIANA, etc.)
■ Partitional clustering (k-means, PAM, etc.)
■ DBSCAN
■ Kohonen networks
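To make partitional clustering concrete, here is a minimal k-means sketch (Lloyd's algorithm) using only the standard library. The points are hypothetical, and taking the first k points as initial centroids is a simplification chosen to keep the example deterministic; real implementations usually use random or k-means++ initialization.

```python
import math


def kmeans(points, k, iterations=10):
    """Lloyd's algorithm, using the first k points as initial centroids."""
    centroids = list(points[:k])
    clusters = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        for i, members in enumerate(clusters):
            if members:
                centroids[i] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, clusters


# Hypothetical records with two numeric attributes each.
POINTS = [(1.0, 1.0), (9.0, 9.0), (1.0, 2.0), (2.0, 1.0), (8.0, 9.0), (9.0, 8.0)]

centroids, clusters = kmeans(POINTS, k=2)
print(centroids)
print(clusters)
```

No target variable is involved: the algorithm only uses distances between records to form relatively homogeneous groups.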



Task 6: Association
■ Finding which attributes “go together.”
■ Most prevalent in the business world.
■ It is also known as affinity analysis or market basket analysis.
■ The task of association seeks to uncover rules for quantifying the relationship between two or more attributes.

Task 6: Association
■ For example, a particular supermarket may find that of the 1000 customers shopping on a Thursday night, 200 bought diapers, and of those 200 who bought diapers, 50 bought beer.
■ Thus, the association rule would be “If buy diapers, then buy beer” with a support of 50/1000 = 5% (the share of all customers who bought both items) and a confidence of 50/200 = 25% (the share of diaper buyers who also bought beer). The antecedent alone, diapers, appears in 200/1000 = 20% of the transactions.
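The diapers-and-beer figures can be checked with a short sketch. It uses the usual definitions, which are stated here as an assumption about what the slide intends: the support of an itemset is the fraction of transactions containing it, the support of a rule is the support of both sides together, and the confidence is the fraction of antecedent transactions that also contain the consequent. What the other 800 customers bought is not given, so `other` is a placeholder.

```python
def count(transactions, items):
    """Number of transactions that contain every item in `items`."""
    items = set(items)
    return sum(1 for t in transactions if items <= t)


def support(transactions, items):
    """Fraction of all transactions that contain every item in `items`."""
    return count(transactions, items) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Fraction of antecedent transactions that also contain the consequent."""
    both = set(antecedent) | set(consequent)
    return count(transactions, both) / count(transactions, antecedent)


# 1000 Thursday-night baskets: 200 with diapers, 50 of those also with beer.
baskets = (
    [{"diapers", "beer"}] * 50
    + [{"diapers"}] * 150
    + [{"other"}] * 800  # placeholder contents
)

print(support(baskets, {"diapers"}))             # antecedent only
print(support(baskets, {"diapers", "beer"}))     # rule support
print(confidence(baskets, {"diapers"}, {"beer"}))
```

Algorithms such as Apriori search for all rules whose support and confidence exceed chosen thresholds; this sketch only evaluates one rule.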



Task 6: Association
Examples of association tasks in business and research include:
■ Examining the proportion of children whose parents read to them who are themselves good readers
■ Predicting degradation in telecommunications networks
■ Finding out which items in a supermarket are purchased together and which items are never purchased together
■ Determining the proportion of cases in which a new drug will exhibit dangerous side effects
■ Cross-selling analysis of products
■ Optimizing the performance of online banner advertisements that present discount offers on various investment products

Task 6: Association
Common data mining methods used for association are:
■ Apriori algorithm
■ FP-tree (FP-growth)
■ Generalized Rule Induction (GRI) method
■ Etc.


Potential Applications
■ Database analysis and decision support
■ Market analysis and management
  ■ target marketing, customer relationship management, market basket analysis, cross-selling, market segmentation
■ Risk analysis and management
  ■ forecasting, customer retention, improved underwriting, quality control, competitive analysis
■ Fraud detection and management
■ Other Applications
  ■ Text mining (news, email, documents) and Web analysis
  ■ Intelligent query answering

Market Analysis and Management (1)
■ The data sources
  ■ Sales transactions, credit card transactions, loyalty cards, discount coupons, customer complaint calls, plus (public) lifestyle studies
■ Target marketing
  ■ Find clusters of “model” customers who share the same characteristics: interest, income level, spending habits, etc.
■ Determine customer purchasing patterns over time
  ■ Conversion of a single to a joint bank account: marriage, etc.
■ Cross-market analysis
  ■ Associations/correlations between product sales
  ■ Prediction based on the association information

Market Analysis and Management (2)
■ Customer profiling
  ■ data mining can tell you what types of customers buy what products (clustering or classification)
■ Identifying customer requirements
  ■ identifying the best products for different customers
  ■ finding what factors will attract new customers
■ Providing summary information
  ■ various multidimensional summary reports
  ■ statistical summary information (data central tendency and variation)

Corporate Analysis and Risk Management
■ Finance planning and asset evaluation:
  ■ cash flow analysis and prediction
  ■ claim analysis to evaluate assets
  ■ cross-sectional and time series analysis (financial-ratio, trend analysis, etc.)
■ Resource planning:
  ■ summarize and compare the resources and spending
■ Competition:
  ■ monitor competitors and market directions
  ■ group customers into classes and apply a class-based pricing procedure
  ■ set pricing strategy in a highly competitive market


Successful e-commerce – Case Study



Fraud Detection and Management (1)
■ Applications
  ■ widely used in health care, retail, credit card services, telecommunications (phone card fraud), etc.
■ Approach
  ■ use historical data to build models of fraudulent behavior and use data mining to help identify similar instances
■ Examples
  ■ auto insurance: detect a group of people who stage accidents to collect on insurance
  ■ money laundering: detect suspicious money transactions (US Treasury's Financial Crimes Enforcement Network)
  ■ medical insurance: detect professional patients, rings of doctors, and rings of references

Fraud Detection and Management (2)
■ Detecting inappropriate medical treatment
  ■ The Australian Health Insurance Commission found that in many cases blanket screening tests were requested (saving A$1 million per year).
■ Detecting telephone fraud
  ■ Telephone call model: destination of the call, duration, time of day or week. Analyze patterns that deviate from an expected norm.
  ■ British Telecom identified discrete groups of callers with frequent intra-group calls, especially on mobile phones, and broke a multimillion-dollar fraud.
■ Retail
  ■ Analysts estimate that 38% of retail shrink is due to dishonest employees.
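The idea of analyzing call patterns that "deviate from an expected norm" can be sketched with a simple z-score rule on call duration. This is an illustrative assumption, not the method used by any of the organizations named above: the durations are made up, `flag_deviations` is a hypothetical helper, and the threshold of 3 standard deviations is only a common convention.

```python
import statistics


def flag_deviations(history, new_values, threshold=3.0):
    """Return the new values lying more than `threshold` sample standard
    deviations away from the mean of the historical values."""
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)
    return [v for v in new_values if abs(v - mean) > threshold * stdev]


# Hypothetical call durations in minutes for one subscriber.
HISTORY = [4, 5, 6, 5, 4, 6, 5, 5]
NEW_CALLS = [5, 6, 45]

print(flag_deviations(HISTORY, NEW_CALLS))
```

A fuller call model would combine duration with the destination and the time of day or week, and would flag unusual combinations rather than a single attribute.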

Other Applications
■ Sports
  ■ IBM Advanced Scout analyzed NBA game statistics (shots blocked, assists, and fouls) to gain a competitive advantage for the New York Knicks and Miami Heat
■ Astronomy
  ■ JPL and the Palomar Observatory discovered 22 quasars with the help of data mining
■ Internet Web Surf-Aid
  ■ IBM Surf-Aid applies data mining algorithms to Web access logs for market-related pages to discover customer preferences and behavior, analyze the effectiveness of Web marketing, improve Web site organization, etc.
■ Detecting the spread of diseases: pandemics, epidemics, and plagues

Data Mining: On What Kind of Data?
■ Relational databases
■ Data warehouses
■ Transactional databases
■ Advanced DB and information repositories
  ■ Object-oriented and object-relational databases
  ■ Spatial databases
  ■ Time-series data and temporal data
  ■ Text databases and multimedia databases
  ■ Heterogeneous and legacy databases
  ■ WWW

Thanks