Chapter 6. Classification and Prediction
What is classification? What is prediction?
Issues regarding classification and prediction
Classification by decision tree induction
Bayesian classification
Rule-based classification
Prediction
Accuracy and error measures
Summary
Classification vs. Prediction
Classification
predicts categorical class labels (discrete or nominal)
constructs a model based on the training set and the values
(class labels) of a classifying attribute, and uses it to
classify new data
Prediction
models continuous-valued functions, i.e., predicts
unknown or missing values
Typical applications
Credit approval
Target marketing
Medical diagnosis
Fraud detection
Classification—A Two-Step Process
Model construction: describing a set of predetermined classes
Each tuple/sample is assumed to belong to a predefined class,
as determined by the class label attribute
The set of tuples used for model construction is training set
The model is represented as classification rules, decision trees,
or mathematical formulae
Model usage: for classifying future or unknown objects
Estimate accuracy of the model
The known label of each test sample is compared with the
model's classification result
Accuracy rate is the percentage of test set samples that are
correctly classified by the model
Test set is independent of training set, otherwise over-fitting
will occur
If the accuracy is acceptable, use the model to classify data
tuples whose class labels are not known
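A minimal sketch of the two steps, assuming scikit-learn is available; the toy arrays, encodings, and names below are illustrative, not from the slides:

```python
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Toy dataset: features encoded as numbers, labels as 0/1,
# e.g., (rank, years) -> tenured yes/no.
X = [[0, 3], [0, 7], [2, 2], [1, 7], [0, 6], [1, 3]]
y = [0, 1, 1, 1, 0, 0]

# Step 1: model construction on the training set.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=1/3, random_state=0)
model = DecisionTreeClassifier().fit(X_train, y_train)

# Step 2: model usage -- estimate accuracy on the independent test set,
# then classify new tuples whose class labels are unknown.
print("accuracy:", accuracy_score(y_test, model.predict(X_test)))
print("prediction for (professor, 4 years):", model.predict([[2, 4]]))
```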
Process (1): Model Construction
Training data is fed to a classification algorithm, which produces the classifier (model).

NAME   RANK            YEARS   TENURED
Mike   Assistant Prof  3       no
Mary   Assistant Prof  7       yes
Bill   Professor       2       yes
Jim    Associate Prof  7       yes
Dave   Assistant Prof  6       no
Anne   Associate Prof  3       no

Learned model: IF rank = 'professor' OR years > 6 THEN tenured = 'yes'
Process (2): Using the Model in Prediction
Classifier
Testing
Data Unseen Data
(Jeff, Professor, 4)
NAME RANK YEARS TENURED
T om A ssistant P rof 2 no Tenured?
M erlisa A ssociate P rof 7 no
G eorge P rofessor 5 yes
Joseph A ssistant P rof 7 yes
January 27, 2015 Data Mining: Concepts and Techniques 5
Supervised vs. Unsupervised Learning
Supervised learning (classification)
Supervision: The training data (observations,
measurements, etc.) are accompanied by labels
indicating the class of the observations
New data is classified based on the training set
Unsupervised learning (clustering)
The class labels of the training data are unknown
Given a set of measurements, observations, etc., the aim is to
establish the existence of classes or clusters in the data
Issues: Data Preparation
Data cleaning
Preprocess data in order to reduce noise and handle
missing values
Relevance analysis (feature selection)
Remove the irrelevant or redundant attributes
Data transformation
Generalize and/or normalize data
Issues: Evaluating Classification Methods
Accuracy
classifier accuracy: how well the model predicts the class label
predictor accuracy: how close the predicted value is to the
actual value of the predicted attribute
Speed
time to construct the model (training time)
time to use the model (classification/prediction time)
Robustness: handling noise and missing values
Scalability: efficiency in disk-resident databases
Interpretability
understanding and insight provided by the model
Other measures, e.g., goodness of rules, such as decision
tree size or compactness of classification rules
Decision Tree Induction: Training Dataset
This follows an example of Quinlan's ID3.

age     income   student   credit_rating   buys_computer
<=30    high     no        fair            no
<=30    high     no        excellent       no
31…40   high     no        fair            yes
>40     medium   no        fair            yes
>40     low      yes       fair            yes
>40     low      yes       excellent       no
31…40   low      yes       excellent       yes
<=30    medium   no        fair            no
<=30    low      yes       fair            yes
>40     medium   yes       fair            yes
<=30    medium   yes       excellent       yes
31…40   medium   no        excellent       yes
31…40   high     yes       fair            yes
>40     medium   no        excellent       no
Output: A Decision Tree for "buys_computer"

age?
  <=30   -> student?
              no  -> no
              yes -> yes
  31..40 -> yes
  >40    -> credit_rating?
              excellent -> no
              fair      -> yes
Algorithm for Decision Tree Induction
Basic algorithm (a greedy algorithm)
Tree is constructed in a top-down recursive divide-and-conquer
manner
At start, all the training examples are at the root
Attributes are categorical (if continuous-valued, they are
discretized in advance)
Examples are partitioned recursively based on selected attributes
Test attributes are selected on the basis of a heuristic or
statistical measure (e.g., information gain)
Conditions for stopping partitioning
All samples for a given node belong to the same class
There are no remaining attributes for further partitioning –
majority voting is employed for classifying the leaf
There are no samples left
Algorithm for Decision Tree Induction

Algorithm: Generate_decision_tree. Generate a decision tree from the given training data.
Input: samples, the training samples, represented by discrete-valued attributes; attribute_list, the set of candidate attributes.
Output: A decision tree.
Method:
(1) create a node N;
(2) if samples are all of the same class C then
(3)     return N as a leaf node labelled with the class C;
(4) if attribute_list is empty then
(5)     return N as a leaf node labelled with the most common class in samples; // majority voting
(6) select test_attribute, the attribute in attribute_list with the highest information gain;
(7) label node N with test_attribute;
(8) for each known value ai of test_attribute: // partition the samples
(9)     grow a branch from node N for the condition test_attribute = ai;
(10)    let si be the set of samples in samples for which test_attribute = ai;
(11)    if si is empty then
(12)        attach a leaf labelled with the most common class in samples;
(13)    else attach the node returned by Generate_decision_tree(si, attribute_list - test_attribute).
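A compact Python rendering of the Generate_decision_tree pseudocode above; a sketch, not the book's code, assuming samples are (attribute-dict, label) pairs:

```python
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def info_gain(samples, attr):
    labels = [c for _, c in samples]
    gain = entropy(labels)
    for value in {s[attr] for s, _ in samples}:
        subset = [c for s, c in samples if s[attr] == value]
        gain -= (len(subset) / len(samples)) * entropy(subset)
    return gain

def generate_decision_tree(samples, attribute_list):
    labels = [c for _, c in samples]
    if len(set(labels)) == 1:                        # steps (2)-(3)
        return labels[0]
    if not attribute_list:                           # steps (4)-(5): majority vote
        return Counter(labels).most_common(1)[0][0]
    best = max(attribute_list, key=lambda a: info_gain(samples, a))  # step (6)
    node = {best: {}}                                # step (7)
    for value in {s[best] for s, _ in samples}:      # steps (8)-(10)
        s_i = [(s, c) for s, c in samples if s[best] == value]
        # steps (11)-(13); s_i is never empty here because branch values
        # are drawn from the samples themselves
        node[best][value] = generate_decision_tree(
            s_i, [a for a in attribute_list if a != best])
    return node
```

Run on the 14-tuple buys_computer table, this splits on age at the root, matching the tree shown earlier.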
Attribute Selection Measure: Information Gain (ID3/C4.5)

Select the attribute with the highest information gain.
Let $p_i$ be the probability that an arbitrary tuple in D belongs to class $C_i$, estimated by $|C_{i,D}|/|D|$.
Expected information (entropy) needed to classify a tuple in D:
    $Info(D) = -\sum_{i=1}^{m} p_i \log_2(p_i)$
Information needed (after using A to split D into v partitions) to classify D:
    $Info_A(D) = \sum_{j=1}^{v} \frac{|D_j|}{|D|} \times Info(D_j)$
Information gained by branching on attribute A:
    $Gain(A) = Info(D) - Info_A(D)$
Attribute Selection: Information Gain

Class P: buys_computer = "yes" (9 tuples); Class N: buys_computer = "no" (5 tuples), using the training dataset shown earlier.

$Info(D) = I(9,5) = -\frac{9}{14}\log_2\frac{9}{14} - \frac{5}{14}\log_2\frac{5}{14} = 0.940$

age     pi   ni   I(pi, ni)
<=30    2    3    0.971
31…40   4    0    0
>40     3    2    0.971

$Info_{age}(D) = \frac{5}{14} I(2,3) + \frac{4}{14} I(4,0) + \frac{5}{14} I(3,2) = 0.694$

$Gain(age) = Info(D) - Info_{age}(D) = 0.246$
$Gain(income) = 0.029$
$Gain(student) = 0.151$
$Gain(credit\_rating) = 0.048$

Age yields the highest information gain, so it becomes the splitting attribute. (A worked check follows.)
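Verifying the worked numbers on this slide; a small helper computes the entropy I(p, n) of a two-class node from its yes/no counts:

```python
from math import log2

def I(p, n):
    total = p + n
    return -sum(x / total * log2(x / total) for x in (p, n) if x)

info_D = I(9, 5)                                              # 0.940
info_age = 5/14 * I(2, 3) + 4/14 * I(4, 0) + 5/14 * I(3, 2)   # 0.694
print(round(info_D, 3), round(info_age, 3), round(info_D - info_age, 3))
# -> 0.94 0.694 0.247
# (the slides round intermediates first: 0.940 - 0.694 = 0.246)
```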
Gini Index (CART, IBM IntelligentMiner)

If a data set D contains examples from n classes, the Gini index is $gini(D) = 1 - \sum_{j=1}^{n} p_j^2$, where $p_j$ is the relative frequency of class j in D.
Ex. D has 9 tuples in buys_computer = "yes" and 5 in "no":
    $gini(D) = 1 - \left(\frac{9}{14}\right)^2 - \left(\frac{5}{14}\right)^2 = 0.459$
Suppose the attribute income partitions D into 10 tuples in D1: {low, medium} and 4 in D2: {high}:
    $gini_{income \in \{low, medium\}}(D) = \frac{10}{14}\, gini(D_1) + \frac{4}{14}\, gini(D_2) = 0.443$
The splits on {low, high} and {medium, high} give 0.458 and 0.450, so {low, medium} (and {high}) is the best binary split for income, since it minimizes the Gini index (see the check below).
All attributes are assumed continuous-valued
May need other tools, e.g., clustering, to get the possible split values
Can be modified for categorical attributes
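Checking the Gini numbers; gini() takes the class counts of a partition, and the counts for D1 (7 yes, 3 no) and D2 (2 yes, 2 no) are taken from the training table:

```python
def gini(*counts):
    n = sum(counts)
    return 1 - sum((c / n) ** 2 for c in counts)

gini_D = gini(9, 5)                                # 0.459
# income split {low, medium} vs {high}
gini_split = 10/14 * gini(7, 3) + 4/14 * gini(2, 2)
print(round(gini_D, 3), round(gini_split, 3))      # -> 0.459 0.443
```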
Overfitting and Tree Pruning

Overfitting: An induced tree may overfit the training data
    Too many branches, some may reflect anomalies due to noise or outliers
    The resulting decision trees are more complex than necessary
    Poor accuracy for unseen samples
Two approaches to avoid overfitting
    Prepruning: Halt tree construction early—do not split a node if this
    would result in the goodness measure falling below a threshold
        Difficult to choose an appropriate threshold
Post pruning
– Trim the nodes of the decision tree in a
bottom-up fashion
– If generalization error improves after trimming,
replace sub-tree by a leaf node.
– Class label of leaf node is determined from
majority class of instances in the sub-tree
Classification in Large Databases
Classification—a classical problem extensively studied by
statisticians and machine learning researchers
Scalability: Classifying data sets with millions of examples
and hundreds of attributes with reasonable speed
Why decision tree induction in data mining?
relatively faster learning speed (than other classification
methods)
convertible to simple and easy to understand
classification rules
can use SQL queries for accessing databases
comparable classification accuracy with other methods
Scalable Decision Tree Induction Methods in Data Mining Studies

SLIQ
    builds an index for each attribute; only the class list and the current attribute list reside in memory
    handles disk-resident data sets using disk-resident attribute lists and a memory-resident class list
    memory restrictions remain when the training set is too large
    performance degrades when the class list becomes too large
SPRINT
    constructs an attribute-list data structure
    removes all memory restrictions
    designed to be easily parallelized
Scalable Decision Tree Induction Methods in Data Mining Studies (cont.)

PUBLIC
    integrates tree splitting and tree pruning: stops growing the tree earlier
RainForest
    separates the scalability aspects from the criteria that determine the quality of the tree
    builds an AVC-list (attribute, value, class label)
    RainForest reports a speedup over SPRINT
Bayesian Classification: Why?
A statistical classifier: performs probabilistic prediction,
i.e., predicts class membership probabilities
Foundation: Based on Bayes’ Theorem.
Performance: A simple Bayesian classifier, naïve Bayesian
classifier, has comparable performance with decision tree
and selected neural network classifiers
Incremental: Each training example can incrementally
increase/decrease the probability that a hypothesis is
correct — prior knowledge can be combined with observed
data
Standard: Even when Bayesian methods are
computationally intractable, they can provide a standard
of optimal decision making against which other methods
can be measured
Bayes' Theorem: Basics
Let X be a data sample (“evidence”): class label is unknown
Let H be a hypothesis that X belongs to class C
Classification is to determine P(H|X), the probability that
the hypothesis holds given the observed data sample X
P(H) (prior probability), the initial probability
E.g., X will buy computer, regardless of age, income, …
P(X): probability that sample data is observed
P(X|H) (likelihood), the probability of observing
the sample X, given that the hypothesis holds
E.g., Given that X will buy computer, the prob. that X is
31..40, medium income
Bayes' Theorem

Given training data X, the posterior probability of a hypothesis H, P(H|X), follows Bayes' theorem:
    $P(H|X) = \frac{P(X|H)\,P(H)}{P(X)}$
Informally, this can be written as
    posterior = likelihood × prior / evidence
Predicts X belongs to Ci iff the probability P(Ci|X) is the highest among all the P(Ck|X) for all the k classes
Practical difficulty: requires initial knowledge of many probabilities, and significant computational cost
Towards Naïve Bayesian Classifier

Let D be a training set of tuples and their associated class labels, and each tuple is represented by an n-D attribute vector X = (x1, x2, …, xn)
Suppose there are m classes C1, C2, …, Cm.
Classification is to derive the maximum posterior probability, i.e., the maximal P(Ci|X)
This can be derived from Bayes' theorem:
    $P(C_i|X) = \frac{P(X|C_i)\,P(C_i)}{P(X)}$
Since P(X) is constant for all classes, only
    $P(C_i|X) \propto P(X|C_i)\,P(C_i)$
needs to be maximized
Derivation of Naïve Bayes Classifier

A simplified assumption: attributes are conditionally independent (i.e., no dependence relation between attributes):
    $P(X|C_i) = \prod_{k=1}^{n} P(x_k|C_i) = P(x_1|C_i) \times P(x_2|C_i) \times \cdots \times P(x_n|C_i)$
This greatly reduces the computation cost: only counts the class distribution
If Ak is categorical, P(xk|Ci) is the # of tuples in Ci having value xk for Ak divided by |Ci,D| (# of tuples of Ci in D)
If Ak is continuous-valued, P(xk|Ci) is usually computed based on a Gaussian distribution with mean μ and standard deviation σ:
    $g(x, \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$
and P(xk|Ci) is
    $P(x_k|C_i) = g(x_k, \mu_{C_i}, \sigma_{C_i})$
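A direct transcription of the Gaussian density used for continuous attributes; the class statistics in the call are illustrative (mu and sigma would be estimated per class from the training data):

```python
from math import exp, pi, sqrt

def g(x, mu, sigma):
    # Gaussian density g(x, mu, sigma) from the formula above
    return (1 / (sqrt(2 * pi) * sigma)) * exp(-((x - mu) ** 2) / (2 * sigma ** 2))

# e.g., P(age = 35 | Ci) with placeholder class statistics mu=38, sigma=12
print(g(35, 38, 12))
```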
Naïve Bayesian Classifier: Training Dataset

Classes:
    C1: buys_computer = 'yes'
    C2: buys_computer = 'no'
Training data: the 14-tuple buys_computer table shown earlier.
Data sample to classify:
    X = (age <= 30, income = medium, student = yes, credit_rating = fair)
Naïve Bayesian Classifier: An Example
P(Ci): P(buys_computer = “yes”) = 9/14 = 0.643
P(buys_computer = “no”) = 5/14= 0.357
Compute P(X|Ci) for each class
P(age = “<=30” | buys_computer = “yes”) = 2/9 = 0.222
P(age = “<= 30” | buys_computer = “no”) = 3/5 = 0.6
P(income = “medium” | buys_computer = “yes”) = 4/9 = 0.444
P(income = “medium” | buys_computer = “no”) = 2/5 = 0.4
P(student = “yes” | buys_computer = “yes”) = 6/9 = 0.667
P(student = “yes” | buys_computer = “no”) = 1/5 = 0.2
P(credit_rating = “fair” | buys_computer = “yes”) = 6/9 = 0.667
P(credit_rating = “fair” | buys_computer = “no”) = 2/5 = 0.4
X = (age <= 30 , income = medium, student = yes, credit_rating = fair)
P(X|Ci) : P(X|buys_computer = “yes”) = 0.222 x 0.444 x 0.667 x 0.667 = 0.044
P(X|buys_computer = “no”) = 0.6 x 0.4 x 0.2 x 0.4 = 0.019
P(X|Ci)*P(Ci) : P(X|buys_computer = “yes”) * P(buys_computer = “yes”) = 0.028
P(X|buys_computer = “no”) * P(buys_computer = “no”) = 0.007
Therefore, X belongs to class (“buys_computer = yes”)
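Reproducing the worked example; the conditional probabilities are the counts from the 14-tuple training table shown earlier:

```python
p_yes, p_no = 9/14, 5/14

# P(xk | Ci) for X = (age<=30, income=medium, student=yes, credit=fair)
likelihood_yes = (2/9) * (4/9) * (6/9) * (6/9)     # ~0.044
likelihood_no  = (3/5) * (2/5) * (1/5) * (2/5)     # ~0.019
print(round(likelihood_yes * p_yes, 3))            # -> 0.028
print(round(likelihood_no * p_no, 3))              # -> 0.007  => classify as "yes"
```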
Example - 2

Outlook    Temperature   Humidity   Windy   Class
sunny      hot           high       false   N
sunny      hot           high       true    N
overcast   hot           high       false   P
rain       mild          high       false   P
rain       cool          normal     false   P
rain       cool          normal     true    N
overcast   cool          normal     true    P
sunny      mild          high       false   N
sunny      cool          normal     false   P
rain       mild          normal     false   P
sunny      mild          normal     true    P
overcast   mild          high       true    P
overcast   hot           normal     false   P
rain       mild          high       true    N

An unseen sample: X = <rain, hot, high, false>
Play-tennis example: estimating P(xi|C)

P(p) = 9/14, P(n) = 5/14

outlook:      P(sunny|p) = 2/9      P(sunny|n) = 3/5
              P(overcast|p) = 4/9   P(overcast|n) = 0
              P(rain|p) = 3/9       P(rain|n) = 2/5
temperature:  P(hot|p) = 2/9        P(hot|n) = 2/5
              P(mild|p) = 4/9       P(mild|n) = 2/5
              P(cool|p) = 3/9       P(cool|n) = 1/5
humidity:     P(high|p) = 3/9       P(high|n) = 4/5
              P(normal|p) = 6/9     P(normal|n) = 2/5
windy:        P(true|p) = 3/9       P(true|n) = 3/5
              P(false|p) = 6/9      P(false|n) = 2/5

(Estimated from the play-tennis table on the previous slide.)
Play-tennis example: classifying X
An unseen sample X = <rain, hot, high, false>
P(X|p)·P(p) =
P(rain|p)·P(hot|p)·P(high|p)·P(false|p)·P(p) =
3/9·2/9·3/9·6/9·9/14 = 0.010582
P(X|n)·P(n) =
P(rain|n)·P(hot|n)·P(high|n)·P(false|n)·P(n) =
2/5·2/5·4/5·2/5·5/14 = 0.018286
Sample X is classified in class n (don’t play)
Naïve Bayesian Classifier: Comments
Advantages
Easy to implement
Good results obtained in most of the cases
Disadvantages
Assumption: class conditional independence, therefore
loss of accuracy
Practically, dependencies exist among variables
E.g., hospital patients: profile (age, family history, etc.),
symptoms (fever, cough, etc.), disease (lung cancer, diabetes, etc.)
Dependencies among these cannot be modeled by the Naïve
Bayesian Classifier
How to deal with these dependencies?
Bayesian Belief Networks
Bayesian Belief Networks

A Bayesian belief network allows conditional independencies among subsets of the variables
A graphical model of causal relationships
    Represents dependency among the variables
    Gives a specification of the joint probability distribution
Nodes: random variables; Links: dependency
Example: in the graph X -> Z <- Y -> P, X and Y are the parents of Z, and Y is the parent of P; there is no dependency between Z and P
The graph has no loops or cycles (it is a DAG)
Bayesian Belief Network: An Example
Family The conditional probability table
Smoker
History (CPT) for variable LungCancer:
(FH, S) (FH, ~S) (~FH, S) (~FH, ~S)
LC 0.8 0.5 0.7 0.1
~LC 0.2 0.5 0.3 0.9
LungCancer Emphysema
CPT shows the conditional probability for each
possible combination of its parents. The CPT for
a variable Z specifies the conditional distribution
P(Z/Parents(Z)).
P(Lungcancer=“yes” | FamilyHistory = “yes” ,
PositiveXRay Dyspnea smoker=“yes”)=0.8
Bayesian Belief Networks Derivation of the probability of a particular
combination of values of X, from CPT:
n
P( x1 ,..., xn ) P( xi | Parents( X i ))
January 27, 2015 i 1 45
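A sketch of how a belief network evaluates a joint assignment: each node carries a CPT indexed by its parents' values. Only the LungCancer CPT is from the slide; the priors here are made-up placeholders:

```python
cpt_lung_cancer = {          # P(LC = yes | FH, S), from the CPT above
    (True, True): 0.8, (True, False): 0.5,
    (False, True): 0.7, (False, False): 0.1,
}
p_fh = 0.10                  # placeholder prior P(FamilyHistory = yes)
p_s = 0.30                   # placeholder prior P(Smoker = yes)

def joint(fh, s, lc):
    """P(FH, S, LC) = P(FH) * P(S) * P(LC | FH, S): the product over parents."""
    p_lc = cpt_lung_cancer[(fh, s)]
    return ((p_fh if fh else 1 - p_fh)
            * (p_s if s else 1 - p_s)
            * (p_lc if lc else 1 - p_lc))

print(joint(True, True, True))   # 0.10 * 0.30 * 0.8 = 0.024
```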
Chapter 6. Classification and Prediction
What is classification? What is prediction?
Issues regarding classification and prediction
Classification by decision tree induction
Bayesian classification
Rule-based classification
Prediction
Accuracy and error measures
Summary
What Is Prediction?
(Numerical) prediction is similar to classification
construct a model
use model to predict continuous or ordered value for a given input
Prediction is different from classification
Classification predicts categorical class labels
Prediction models continuous-valued functions
Major method for prediction: regression
model the relationship between one or more independent or
predictor variables and a dependent or response variable
Regression analysis
Linear and multiple regression
Non-linear regression
Other regression methods: generalized linear model, Poisson
regression, log-linear models, regression trees
Linear Regression

Linear regression: involves a response variable y and a single predictor variable x:
    y = w0 + w1 x
where w0 (y-intercept) and w1 (slope) are regression coefficients
Method of least squares: estimates the best-fitting straight line:
    $w_1 = \frac{\sum_{i=1}^{|D|} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{|D|} (x_i - \bar{x})^2}, \qquad w_0 = \bar{y} - w_1 \bar{x}$
Multiple linear regression: involves more than one predictor variable
    Training data is of the form (X1, y1), (X2, y2), …, (X|D|, y|D|)
    Ex. For 2-D data, we may have: y = w0 + w1 x1 + w2 x2
    Solvable by extension of the least squares method
    Many nonlinear functions can be transformed into the above
Regression - Example
Table shows a set of X Y
paired data where X is Years Salary (in $
Experience 1000s)
the number of years of 3 30
work experience of a 8 57
college graduate and y 9 64
is the corresponding 13 72
salary of the graduate. 3 36
6 43
Y = 23.6 + 3.5X 11 59
Predict the salary for a 21 90
graduate with 10 yrs of 1 20
experience. 16 83
Y = 58.6$
January 27, 2015 Data Mining: Concepts and Techniques 49
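Fitting the table above with the least-squares formulas from the previous slide; the data are exactly the ten (X, Y) pairs shown:

```python
xs = [3, 8, 9, 13, 3, 6, 11, 21, 1, 16]
ys = [30, 57, 64, 72, 36, 43, 59, 90, 20, 83]

x_bar = sum(xs) / len(xs)
y_bar = sum(ys) / len(ys)
w1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
      / sum((x - x_bar) ** 2 for x in xs))
w0 = y_bar - w1 * x_bar
print(round(w0, 1), round(w1, 1))   # -> 23.2 3.5
# (the slide's 23.6 comes from rounding w1 to 3.5 before computing w0)
print(round(w0 + w1 * 10, 1))       # -> 58.6, salary in $1000s for 10 years
```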
Nonlinear Regression

Some nonlinear models can be modeled by a polynomial function
A polynomial regression model can be transformed into a linear regression model. For example,
    y = w0 + w1 x + w2 x^2 + w3 x^3
is convertible to linear form with the new variables x2 = x^2, x3 = x^3:
    y = w0 + w1 x + w2 x2 + w3 x3
Other functions, such as the power function, can also be transformed into a linear model
Some models are intractably nonlinear (e.g., a sum of exponential terms)
    It is still possible to obtain least squares estimates through extensive calculation on more complex formulae
Chapter 6. Classification and Prediction
What is classification? What is prediction?
Issues regarding classification and prediction
Classification by decision tree induction
Bayesian classification
Rule-based classification
Prediction
Accuracy and error measures
Summary
Evaluating the Accuracy of a Classifier or Predictor (I)

Holdout method
    Given data is randomly partitioned into two independent sets
        Training set (e.g., 2/3) for model construction
        Test set (e.g., 1/3) for accuracy estimation
    [Figure: the data is split into a training set, from which the classifier is derived, and a test set, on which its accuracy is estimated.]
Random sampling: a variation of holdout
    Repeat holdout k times; accuracy = avg. of the accuracies obtained
Evaluating the Accuracy of a Classifier or Predictor (II)

Cross-validation (k-fold, where k = 10 is most popular)
    Randomly partition the data into k mutually exclusive subsets D1, …, Dk, each of approximately equal size
    At the i-th iteration, use Di as the test set and the others as the training set
    Accuracy estimate = (overall number of correct classifications from the k iterations) / (total number of samples in the initial data)
Leave-one-out: k folds where k = # of tuples, for small-sized data
Stratified cross-validation: folds are stratified so that the class distribution in each fold is approximately the same as that in the initial data
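A minimal k-fold cross-validation loop, assuming scikit-learn; the iris data and decision tree are placeholders for any dataset and learner:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import StratifiedKFold
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
correct, total = 0, 0
folds = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
for train_idx, test_idx in folds.split(X, y):
    model = DecisionTreeClassifier().fit(X[train_idx], y[train_idx])
    correct += (model.predict(X[test_idx]) == y[test_idx]).sum()
    total += len(test_idx)
# accuracy estimate = overall correct classifications / total samples
print("accuracy estimate:", correct / total)
```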
Ensemble Methods: Increasing the Accuracy
Ensemble methods
Use a combination of models to increase accuracy
Combine a series of k learned models, M1, M2, …, Mk,
with the aim of creating an improved model M*
Popular ensemble methods
Bagging: averaging the prediction over a collection of
classifiers
Boosting: weighted vote with a collection of classifiers
Ensemble: combining a set of heterogeneous classifiers
Bagging: Bootstrap Aggregation

Analogy: Diagnosis based on multiple doctors' majority vote
Training
    Given a set D of d tuples, at each iteration i, a training set Di of d tuples is sampled with replacement from D (i.e., bootstrap)
    A classifier model Mi is learned for each training set Di
Classification: classify an unknown sample X
    Each classifier Mi returns its class prediction
    The bagged classifier M* counts the votes and assigns the class with the most votes to X
Prediction: can be applied to the prediction of continuous values by taking the average value of each prediction for a given test tuple
Accuracy
    Often significantly better than a single classifier derived from D
    For noisy data: not considerably worse, more robust
    Proven improved accuracy in prediction
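A bare-bones bagging sketch: bootstrap-sample D, train one tree per sample, and take a majority vote; scikit-learn and its iris data are assumed stand-ins:

```python
import numpy as np
from collections import Counter
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
rng = np.random.default_rng(0)
models = []
for i in range(25):                        # k = 25 bootstrap rounds
    idx = rng.integers(0, len(X), len(X))  # sample d tuples with replacement
    models.append(DecisionTreeClassifier().fit(X[idx], y[idx]))

def bagged_predict(x):
    votes = [int(m.predict([x])[0]) for m in models]
    return Counter(votes).most_common(1)[0][0]   # class with the most votes

print(bagged_predict(X[0]), y[0])
```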
Boosting
Analogy: Consult several doctors, based on a combination of weighted
diagnoses—weight assigned based on the previous diagnosis accuracy
How does boosting work?
Weights are assigned to each training tuple
A series of k classifiers is iteratively learned
After a classifier Mi is learned, the weights are updated to allow the
subsequent classifier, Mi+1, to pay more attention to the training
tuples that were misclassified by Mi
The final M* combines the votes of each individual classifier, where
the weight of each classifier's vote is a function of its accuracy
The boosting algorithm can be extended for the prediction of
continuous values
Comparing with bagging: boosting tends to achieve greater accuracy,
but it also risks overfitting the model to misclassified data
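The reweighting scheme described above is what AdaBoost implements; a usage sketch with scikit-learn's AdaBoostClassifier, using iris data as a stand-in:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
# Each round reweights the training tuples toward those misclassified so far;
# the final vote weights each classifier by its accuracy.
clf = AdaBoostClassifier(n_estimators=50)
print(cross_val_score(clf, X, y, cv=10).mean())
```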
Classifier Accuracy Measures and Confusion Matrix

t_pos: true positives (e.g., "cancer" samples that were correctly classified as such)
t_neg: true negatives ("not_cancer" samples that were correctly classified as such)
f_pos: false positives ("not_cancer" samples that were incorrectly labeled as "cancer")
f_neg: false negatives ("cancer" samples that were incorrectly labeled as "not_cancer")
pos is the number of positive samples; neg is the number of negative samples

Actual \ Predicted   C1      C2
C1                   t_pos   f_neg
C2                   f_pos   t_neg
Classifier Accuracy Measures
classes buy_computer = yes buy_computer = no total recognition(%)
buy_computer = yes 6954 46 7000 99.34
buy_computer = no 412 2588 3000 86.27
total 7366 2634 10000 95.52
Accuracy of a classifier M, acc(M): percentage of test set tuples that are
correctly classified by the model M
Error rate (misclassification rate) of M = 1 – acc(M)
Given m classes, CMi,j, an entry in a confusion matrix, indicates #
of tuples in class i that are labeled by the classifier as class j
Alternative accuracy measures (e.g., for cancer diagnosis)
sensitivity = t-pos/pos /* true positive recognition rate */
specificity = t-neg/neg /* true negative recognition rate */
precision = t-pos/(t-pos + f-pos)
accuracy = sensitivity * pos/(pos + neg) + specificity * neg/(pos + neg)
This model can also be used for cost-benefit analysis
January 27, 2015 Data Mining: Concepts and Techniques 59
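Computing the alternative measures from the confusion matrix above:

```python
t_pos, f_neg = 6954, 46     # actual "yes" row
f_pos, t_neg = 412, 2588    # actual "no" row
pos, neg = t_pos + f_neg, f_pos + t_neg

sensitivity = t_pos / pos                 # true positive recognition rate
specificity = t_neg / neg                 # true negative recognition rate
precision = t_pos / (t_pos + f_pos)
accuracy = sensitivity * pos / (pos + neg) + specificity * neg / (pos + neg)
print(round(sensitivity, 4), round(specificity, 4),
      round(precision, 4), round(accuracy, 4))
# -> 0.9934 0.8627 0.9441 0.9542
```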
Predictor Error Measures

Measure predictor accuracy: measure how far off the predicted value is from the actual known value
Loss function: measures the error between yi and the predicted value yi'
    Absolute error: $|y_i - y_i'|$
    Squared error: $(y_i - y_i')^2$
Test error (generalization error): the average loss over the test set
    Mean absolute error: $\frac{1}{d}\sum_{i=1}^{d} |y_i - y_i'|$
    Mean squared error: $\frac{1}{d}\sum_{i=1}^{d} (y_i - y_i')^2$
    Relative absolute error: $\frac{\sum_{i=1}^{d} |y_i - y_i'|}{\sum_{i=1}^{d} |y_i - \bar{y}|}$
    Relative squared error: $\frac{\sum_{i=1}^{d} (y_i - y_i')^2}{\sum_{i=1}^{d} (y_i - \bar{y})^2}$
The mean squared error exaggerates the presence of outliers
Popularly used: the (square) root mean squared error and, similarly, the root relative squared error
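The four loss measures above as straightforward functions; y and y_pred in the example call are illustrative values:

```python
def mae(y, y_pred):
    return sum(abs(a - b) for a, b in zip(y, y_pred)) / len(y)

def mse(y, y_pred):
    return sum((a - b) ** 2 for a, b in zip(y, y_pred)) / len(y)

def rae(y, y_pred):   # relative absolute error
    y_bar = sum(y) / len(y)
    return (sum(abs(a - b) for a, b in zip(y, y_pred))
            / sum(abs(a - y_bar) for a in y))

def rse(y, y_pred):   # relative squared error
    y_bar = sum(y) / len(y)
    return (sum((a - b) ** 2 for a, b in zip(y, y_pred))
            / sum((a - y_bar) ** 2 for a in y))

y, y_pred = [3.0, 5.0, 8.0], [2.5, 5.5, 9.0]
print(mae(y, y_pred), mse(y, y_pred) ** 0.5)   # MAE and RMSE
```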