Classification

So far you've predicted numeric targets. This type of modeling is called regression, hence the "Regressor" part of RandomForestRegressor.
Another common problem you'll see is making a choice between mutually
exclusive outcomes. For example, spam detection is predicting whether an email is
"spam" or "not spam" based on the email's content. This type of modeling is
called classification.
There are two types of classification: binary (choosing between two classes) and
multiclass (choosing between more than two classes). In general there are different
approaches to the two types of classification, but most multiclass models will also
work for binary problems.
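For instance, here is a minimal sketch of a binary problem, using a synthetic dataset in place of real emails (make_classification and its parameters here are illustrative assumptions, not part of the phone-price example below):

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Generate a small synthetic two-class dataset standing in for "spam"/"not spam"
X, y = make_classification(n_samples=200, n_features=4, n_classes=2, random_state=0)

# The same classifier used for multiclass problems handles the binary case
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X, y)
print(clf.predict(X[:5]))  # each prediction is 0 or 1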
It's straightforward to build classification models using what you already know
about scikit-learn. Instead of RandomForestRegressor, you will
use RandomForestClassifier.
As an example of classification with RandomForestClassifier, I'll use a dataset of
phone features to predict a phone's price range. The targets in the data have values:
• 0 (low cost)
• 1 (medium cost)
• 2 (high cost)
• 3 (very high cost)
The features are things like:
• battery_power: Total energy a battery can store at one time, measured in mAh
• blue: Has Bluetooth or not
• clock_speed: Speed at which the microprocessor executes instructions
• dual_sim: Has dual SIM support or not
• fc: Front camera megapixels
• four_g: Has 4G or not
• ...
Here is a quick overview of the data:

In[1]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
import sklearn.metrics as metrics

data = pd.read_csv('../input/mobile-price-classification/train.csv')
data.head()

In[2]: data.columns
Out[2]:
Index(['battery_power', 'blue', 'clock_speed', 'dual_sim', 'fc', 'four_g',
'int_memory', 'm_dep', 'mobile_wt', 'n_cores', 'pc', 'px_height',
'px_width', 'ram', 'sc_h', 'sc_w', 'talk_time', 'three_g',
'touch_screen', 'wifi', 'price_range'],
dtype='object')
We create our features and targets, then split the data using train_test_split. This part looks like what you've already seen.

In[3]:
# Set variables for the targets and features
y = data['price_range']
X = data.drop('price_range', axis=1)

# Split the data into training and validation sets
train_X, val_X, train_y, val_y = train_test_split(X, y, random_state=7)

Creating and fitting the model is similar to what you've done before, except you'll
use RandomForestClassifier instead of RandomForestRegressor.

In[4]:
# Create the classifier and fit it to our training data
model = RandomForestClassifier(random_state=7, n_estimators=100)
model.fit(train_X, train_y)

Out[4]:
RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
max_depth=None, max_features='auto', max_leaf_nodes=None,
min_impurity_decrease=0.0, min_impurity_split=None,
min_samples_leaf=1, min_samples_split=2,
min_weight_fraction_leaf=0.0, n_estimators=100,
n_jobs=None, oob_score=False, random_state=7, verbose=0,
warm_start=False)
The simplest metric for classification models is accuracy, the fraction of predictions that are correct. Scikit-learn provides metrics.accuracy_score to calculate this.

In[5]:
# Predict classes given the validation features
pred_y = model.predict(val_X)

# Calculate the accuracy as our performance metric
accuracy = metrics.accuracy_score(val_y, pred_y)
print("Accuracy: ", accuracy)
Accuracy: 0.864

Confusion matrix
Our model did pretty well, correctly predicting around 86% of the price ranges in the validation data. It's often useful to look at where the model is failing with a confusion matrix, which shows how the model classified the inputs.

In[6]:
# Calculate the confusion matrix itself
confusion = metrics.confusion_matrix(val_y, pred_y)
print(f"Confusion matrix:\n{confusion}")

# Normalize by the true label counts to get rates
print("\nNormalized confusion matrix:")
for row in confusion:
    print(row / row.sum())

It's a little easier to understand as a figure:
[Figure: heatmap of the normalized confusion matrix]
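A minimal sketch that reproduces such a heatmap with matplotlib (assuming matplotlib is available; the colormap, tick labels, and cell annotations are illustrative choices, not from the original figure):

import matplotlib.pyplot as plt

# Row-normalize the confusion matrix computed above
normalized = confusion / confusion.sum(axis=1, keepdims=True)

fig, ax = plt.subplots()
im = ax.imshow(normalized, cmap='Blues')
ax.set_xlabel('Predicted class')
ax.set_ylabel('True class')
ax.set_xticks(range(4))
ax.set_yticks(range(4))
# Write each rate in its cell
for i in range(4):
    for j in range(4):
        ax.text(j, i, f"{normalized[i, j]:.2f}", ha='center', va='center')
fig.colorbar(im)
plt.show()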
The rows of the confusion matrix are the true class and the columns are the predicted class. The diagonal tells us how many of each class the model predicted correctly. The off-diagonals show where the model is making wrong predictions, where it is "confused." For example, looking at the second row and first column, we classified four phones that were actually medium cost as low cost. We see that for classes 0 and 3, the low cost and very high cost phones, our model works really well, above 90% accurate. However, our model is weaker for the medium and high cost phones. Note that incorrect predictions are only between adjacent classes; the model doesn't confuse low cost and very high cost phones.

Class probabilities
Classification models actually calculate a probability distribution over the classes. Calling .predict simply returns the class with the highest probability, which might not be ideal depending on how the decision affects your metrics or downstream measures. To get the probabilities themselves, use the .predict_proba method.
In[7]:
probs = model.predict_proba(val_X)
print(probs)

Each row shows the probabilities the model assigns to the classes for one phone. Often in business problems, the decisions you make lead to different monetary returns. The expected return for a decision based on your classifier is the probability times the monetary return of that decision.
Consider the probabilities [0.05, 0.17, 0.42, 0.36]. Assume the third option would result in $100 of profit while the fourth option would return $150 in profit. Then the expected monetary values are 0.42 × $100 = $42 and 0.36 × $150 = $54. Even though the third option has the highest probability, on average it would be better from a business perspective to choose the fourth option.
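Here is a sketch of that calculation in code. The $100 and $150 profits come from the example above; the zero profits for the first two classes are placeholder assumptions added for illustration:

import numpy as np

# Probabilities from the example above
probs_example = np.array([0.05, 0.17, 0.42, 0.36])
# Profit for choosing each class; zeros for classes 0 and 1 are assumptions
profits = np.array([0, 0, 100, 150])

expected_values = probs_example * profits
print(expected_values)                              # [ 0.  0. 42. 54.]
print("Best decision:", expected_values.argmax())   # class 3, not the most probable class 2

# The same idea applied to every validation row at once:
decisions = (model.predict_proba(val_X) * profits).argmax(axis=1)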
