Knowledge Discovery and Data Mining
The Classification Problem
EL Moukhtar ZEMMOURI
ENSAM-Meknès
2023-2024
The classification problem
• Task:
  • Given a set of pre-classified examples, build a model or classifier to classify new cases.
  • Classes are known for the examples used to build the classifier.
  • ⇒ Supervised learning problem
  • ⇒ Learn by example
• A classifier can be:
  • a set of rules, a decision tree, a neural network, etc.
• Typical applications:
• Credit approval, direct marketing, fraud detection, medical diagnosis, …
The classification problem
• Learn a method for predicting the instance class from pre-labeled (classified) instances
• Given a set of points from known classes, what is the class of a new point?
[Figure: scatter plot of labeled points in the (A1, A2) attribute space, with a new unlabeled point to classify]
The classification problem
• Formal definition:
  • Let D = { x_i = (x_i1, x_i2, …, x_id) / i = 1..n } be a multidimensional dataset of n examples, characterized by d attributes A_1, A_2, …, A_d (predictors)
  • ⇒ D is an n×d data matrix
  • Let Y = { y_i / i = 1..n } be a set of labels (classes) for D
  • ⇒ Each data point x_i ∈ D is labeled by a class y_i ∈ C
  • Where C = {c_1, …, c_k} is a finite set of k predefined classes (discrete)
• The classification problem consists of using the training set (D, Y) to learn a model M (classifier) that allows predicting the class y of a new data point x
  • If card(C) = 2, then the problem is a binary classification (k = 2)
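To make this notation concrete, here is a minimal Python sketch; the data values and the use of scikit-learn's DecisionTreeClassifier are illustrative assumptions, not part of the course material.

```python
# Minimal sketch of the (D, Y) setting (illustrative values).
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# D: n x d data matrix (here n = 6 examples described by d = 2 attributes A1, A2)
D = np.array([[1.0, 2.1],
              [0.9, 1.8],
              [3.2, 0.5],
              [3.0, 0.7],
              [1.1, 2.5],
              [2.9, 0.4]])

# Y: one class label y_i in C = {"a", "b"} per row of D  (card(C) = 2 -> binary problem)
Y = np.array(["a", "a", "b", "b", "a", "b"])

M = DecisionTreeClassifier().fit(D, Y)   # learn the model M from the training set (D, Y)
x = np.array([[3.1, 0.6]])               # a new data point x
print(M.predict(x))                      # predicted class y, here ['b']
```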
Classification vs Regression
• (D, Y) is the training set
• Classification problem: the predicted attribute is discrete
  • ⇒ predicting a discrete class label
• Regression problem: the predicted attribute is numeric (continuous)
  • ⇒ predicting a continuous quantity
• Predictors may be nominal or numeric
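A small sketch of the distinction (the scikit-learn estimators and toy values are assumptions for illustration):

```python
# Same predictors, two kinds of targets: a discrete label vs. a continuous quantity.
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

X = [[25, 50000], [40, 80000], [35, 60000]]   # predictors (numeric here)

y_class = ["no", "yes", "yes"]     # discrete class label -> classification
y_reg = [1200.0, 4500.0, 3100.0]   # continuous quantity  -> regression

clf = DecisionTreeClassifier().fit(X, y_class)
reg = DecisionTreeRegressor().fit(X, y_reg)

print(clf.predict([[38, 75000]]))   # a class label, e.g. ['yes']
print(reg.predict([[38, 75000]]))   # a number, e.g. [4500.]
```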
Examples of applications
• Customer target marketing (direct marketing)
• Labels correspond to the user interest in a particular product,
• The attributes may correspond to the demographic profiles of the customers,
• Training examples also contain previous buying behavior,
• The training examples are used to learn whether a customer, with a known demographic
profile, but unknown buying behavior, may be interested in a particular product or not.
Examples of applications
• Document categorization and filtering
• Many online applications require real-time classification of documents,
• These are used to organize the documents under specific topics of a portal,
• Previous examples of manually classified documents from each topic may be available,
• The features correspond to the words in the document,
• The class labels correspond to the various topics : politics, sports, current events, …
Examples of applications
• Medical disease management
• The features may be extracted from patient medical tests and treatments,
• The class label may correspond to treatment outcomes,
• The task is to predict treatment outcomes.
• Intrusion detection
• The sequences of customer activity in a computer system may be used to predict the
possibility of intrusions.
Classification models
• Various models have been designed for data classification
• Decision trees,
• Rule-based classifiers,
• Probabilistic models,
• Instance-based classifiers,
• Support vector machines,
• Neural networks,
• …
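For reference, a hypothetical mapping of these model families to scikit-learn estimators (rule-based classifiers have no direct scikit-learn counterpart); this mapping is my addition, not part of the slides:

```python
# Illustrative scikit-learn counterparts of the model families listed above.
from sklearn.tree import DecisionTreeClassifier      # decision trees
from sklearn.naive_bayes import GaussianNB           # probabilistic model (naive Bayes)
from sklearn.neighbors import KNeighborsClassifier   # instance-based (k nearest neighbors)
from sklearn.svm import SVC                          # support vector machine
from sklearn.neural_network import MLPClassifier     # neural network (multi-layer perceptron)

models = [DecisionTreeClassifier(), GaussianNB(),
          KNeighborsClassifier(), SVC(), MLPClassifier()]
```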
The weather data
• Fictitious dataset D
• It concerns the conditions that are suitable for playing some unspecified game.
• Given data of past days, construct a model or rules to predict what to do on a new day.
• Example:
  • (sunny, cool, normal, true)
  • Play or not?

Rows are instances, columns are attributes:

 #   Outlook    Temp   Humidity   Windy   Play
 1   Sunny      Hot    High       False   No
 2   Sunny      Hot    High       True    No
 3   Overcast   Hot    High       False   Yes
 4   Rainy      Mild   High       False   Yes
 5   Rainy      Cool   Normal     False   Yes
 6   Rainy      Cool   Normal     True    No
 7   Overcast   Cool   Normal     True    Yes
 8   Sunny      Mild   High       False   No
 9   Sunny      Cool   Normal     False   Yes
10   Rainy      Mild   Normal     False   Yes
11   Sunny      Mild   Normal     True    Yes
12   Overcast   Mild   High       True    Yes
13   Overcast   Hot    Normal     False   Yes
14   Rainy      Mild   High       True    No
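The same dataset transcribed as a Python list (reused in the OneR example later); the variable names are mine:

```python
# The weather data above as (Outlook, Temp, Humidity, Windy, Play) tuples.
weather = [
    ("Sunny",    "Hot",  "High",   False, "No"),
    ("Sunny",    "Hot",  "High",   True,  "No"),
    ("Overcast", "Hot",  "High",   False, "Yes"),
    ("Rainy",    "Mild", "High",   False, "Yes"),
    ("Rainy",    "Cool", "Normal", False, "Yes"),
    ("Rainy",    "Cool", "Normal", True,  "No"),
    ("Overcast", "Cool", "Normal", True,  "Yes"),
    ("Sunny",    "Mild", "High",   False, "No"),
    ("Sunny",    "Cool", "Normal", False, "Yes"),
    ("Rainy",    "Mild", "Normal", False, "Yes"),
    ("Sunny",    "Mild", "Normal", True,  "Yes"),
    ("Overcast", "Mild", "High",   True,  "Yes"),
    ("Overcast", "Hot",  "Normal", False, "Yes"),
    ("Rainy",    "Mild", "High",   True,  "No"),
]

query = ("Sunny", "Cool", "Normal", True)   # the example day: play or not?
```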
The weather data
• Weather data with mixed attributes (numeric Temp and Humidity):

 #   Outlook    Temp   Humidity   Windy   Play
 1   Sunny      85     85         False   No
 2   Sunny      80     90         True    No
 3   Overcast   83     86         False   Yes
 4   Rainy      70     96         False   Yes
 5   Rainy      68     80         False   Yes
 6   Rainy      65     70         True    No
 7   Overcast   64     65         True    Yes
 8   Sunny      72     95         False   No
 9   Sunny      69     70         False   Yes
10   Rainy      75     80         False   Yes
11   Sunny      75     70         True    Yes
12   Overcast   72     90         True    Yes
13   Overcast   81     75         False   Yes
14   Rainy      71     91         True    No
The contact lenses data

 Age              Spectacle prescription   Astigmatism   Tear production rate   Recommended lenses
 Young            Myope                    No            Reduced                None
 Young            Myope                    No            Normal                 Soft
 Young            Myope                    Yes           Reduced                None
 Young            Myope                    Yes           Normal                 Hard
 Young            Hypermetrope             No            Reduced                None
 Young            Hypermetrope             No            Normal                 Soft
 Young            Hypermetrope             Yes           Reduced                None
 Young            Hypermetrope             Yes           Normal                 Hard
 Pre-presbyopic   Myope                    No            Reduced                None
 Pre-presbyopic   Myope                    No            Normal                 Soft
 Pre-presbyopic   Myope                    Yes           Reduced                None
 Pre-presbyopic   Myope                    Yes           Normal                 Hard
 Pre-presbyopic   Hypermetrope             No            Reduced                None
 Pre-presbyopic   Hypermetrope             No            Normal                 Soft
 Pre-presbyopic   Hypermetrope             Yes           Reduced                None
 Pre-presbyopic   Hypermetrope             Yes           Normal                 None
 Presbyopic       Myope                    No            Reduced                None
 Presbyopic       Myope                    No            Normal                 None
 Presbyopic       Myope                    Yes           Reduced                None
 Presbyopic       Myope                    Yes           Normal                 Hard
 Presbyopic       Hypermetrope             No            Reduced                None
 Presbyopic       Hypermetrope             No            Normal                 Soft
 Presbyopic       Hypermetrope             Yes           Reduced                None
 Presbyopic       Hypermetrope             Yes           Normal                 None
Question
Can you imagine your own classification method?
The Classification Problem
Simplicity first
Simplicity first
• Simple algorithms often work very well!
• Many kinds of simple structures exist in data:
  • One attribute does all the work
  • All attributes contribute equally and independently
  • A weighted linear combination might do
  • Instance-based
  • …
• But the success of a method depends on the domain
• Data mining is an experimental science!
Simplicity first: OneR
• One Rule algorithm
  • Learn rudimentary rules that all test only one attribute
  • ⇒ equivalent to a 1-level decision tree
[Diagram: a single attribute node branching on each of its values, each branch leading to a class]
• How does it work?
  • Generate one rule for each attribute (predictor) in the data:
    • construct a frequency table for that predictor against the target (class),
    • then select the predictor whose rules give the smallest total error as the "one rule".
• Reference:
  • R.C. Holte (1993). “Very simple classification rules perform well on most commonly used datasets.” Machine Learning.
OneR pseudo code
For each attribute (predictor):
    For each value of that attribute, make a rule as follows:
        Count how often each class appears
        Find the most frequent class
        Make the rule assign that class to this attribute value
    Calculate the error rate of this attribute’s rules
Choose the predictor with the smallest total error.
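A possible Python implementation of this pseudo code (a sketch; the data layout, tuples whose last element is the class, is an assumption):

```python
from collections import Counter, defaultdict

def one_r(rows, n_attributes):
    """OneR: rows are tuples (a_1, ..., a_d, class); returns the best attribute index,
    its value -> class rules, and its total number of misclassified examples."""
    best_attr, best_rules, best_errors = None, None, None
    for a in range(n_attributes):
        # frequency table: for each value of attribute a, count how often each class appears
        counts = defaultdict(Counter)
        for row in rows:
            counts[row[a]][row[-1]] += 1
        # one rule per value: assign the most frequent class to that value
        rules = {value: cnt.most_common(1)[0][0] for value, cnt in counts.items()}
        # total error of this attribute's rules = examples not covered by the majority class
        errors = sum(sum(cnt.values()) - max(cnt.values()) for cnt in counts.values())
        # keep the predictor with the smallest total error
        if best_errors is None or errors < best_errors:
            best_attr, best_rules, best_errors = a, rules, errors
    return best_attr, best_rules, best_errors
```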
OneR: example
Weather Problem. Which attribute is the best predictor?
• For each attribute of the weather data (Outlook, Temp, Humidity, Windy), build a frequency table against the class Play: for every attribute value, record the Yes/No counts, the resulting rule, its errors, and the attribute's total errors (tables filled in below).
OneR: example
Weather Problem — frequency tables. Which attribute is the best predictor?

 Outlook    Yes   No   Rule             Errors   Total Errors
 Sunny       2     3   Sunny → No        2/5
 Overcast    4     0   Overcast → Yes    0/4      4/14
 Rainy       3     2   Rainy → Yes       2/5

 Temp       Yes   No   Rule             Errors   Total Errors
 Hot         2     2   Hot → Yes         2/4
 Mild        4     2   Mild → Yes        2/6      5/14
 Cool        3     1   Cool → Yes        1/4

 Humidity   Yes   No   Rule             Errors   Total Errors
 High        3     4   High → No         3/7
 Normal      6     1   Normal → Yes      1/7      4/14

 Windy      Yes   No   Rule             Errors   Total Errors
 False       6     2   False → Yes       2/8
 True        3     3   True → No         3/6      5/14
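Assuming the weather list and the one_r sketch defined earlier are loaded in the same session, they reproduce these tables' totals:

```python
best_attr, rules, errors = one_r(weather, n_attributes=4)

print(best_attr, errors)   # 0 (Outlook), 4  -> total errors 4/14
print(rules)               # {'Sunny': 'No', 'Overcast': 'Yes', 'Rainy': 'Yes'}
```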
OneR: example
Weather Problem. The best predictor is Outlook.
• Rule:
  • Outlook = sunny → Play = no
  • Outlook = overcast → Play = yes
  • Outlook = rainy → Play = yes
• With accuracy: 10/14 ≈ 0.71
• How to classify this day: (sunny, cool, normal, true)?
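With the objects from the earlier sketch still in scope, the new day is classified by the Outlook rule alone:

```python
# classify (sunny, cool, normal, true): Outlook = Sunny -> Play = No
print(rules[query[best_attr]])   # 'No'
```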
The Classification Problem
Weka
Weka Software
• Waikato Environment for Knowledge Analysis
• Developed in Java at the University of Waikato, New Zealand
• Open source software (GNU General Public License)
• A collection of machine learning algorithms for data mining tasks :
• data preparation,
• classification,
• regression,
• clustering,
• association rules mining,
• and visualization.
Weka Software