Classification
Prof. Satanik Mitra
BITS Pilani
Decision Tree
• Decision trees are a classification methodology, wherein the classification process is modeled with the
use of a set of hierarchical decisions on the feature variables, arranged in a tree-like structure.
• The decision at a particular node of the tree, which is referred to as the split criterion, is typically a
condition on one or more feature variables in the training data.
• For example, consider the case where Age is an attribute, and the split criterion is Age ≤ 30. In this case,
the left branch of the decision tree contains all training examples with age at most 30, whereas the right
branch contains all examples with age greater than 30.
• The goal is to identify a split criterion so that the level of mixing of the class variable in each branch is reduced as much as possible (see the sketch after this list).
• The main difference from clustering is that the partitioning criterion in the decision tree is supervised
with the class label in the training instances.
• Some classical decision tree algorithms include C4.5, ID3, and CART.
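The following is a minimal sketch, not taken from the slides: the data, the attribute name, and the class labels are invented. It shows how a single univariate split such as Age ≤ 30 partitions the training examples, with each resulting branch labeled by its dominant class.

```python
from collections import Counter

# Hypothetical (Age, class label) training examples.
training = [(23, "buys"), (29, "buys"), (31, "ignores"), (45, "ignores"),
            (27, "buys"), (52, "ignores"), (30, "buys"), (41, "buys")]

# Apply the split criterion Age <= 30.
left = [label for age, label in training if age <= 30]    # left branch
right = [label for age, label in training if age > 30]    # right branch

# Each leaf node would be labeled with the dominant class of the examples it holds.
for name, branch in [("Age <= 30", left), ("Age > 30", right)]:
    counts = Counter(branch)
    dominant = counts.most_common(1)[0][0]
    print(f"{name}: {dict(counts)} -> leaf label '{dominant}'")
```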
Decision Tree
• Splits can be univariate (a condition on a single attribute) or multivariate (a condition on a combination of attributes).
• The tree built by a decision tree induction algorithm has two types of nodes, referred to as
internal nodes and leaf nodes. Each leaf node is labeled with the dominant class
at that node.
• Eventually, the decision tree algorithm stops the growth of the tree based on a
stopping criterion.
• To avoid the degradation in accuracy associated with overfitting, the classifier
uses a postpruning mechanism.
• The goal of the split criterion is to maximize the separation of the different
classes among the child nodes.
• The design of the split criterion depends on the nature of the underlying
attribute.
• Binary attribute: Only one type of split is possible, and the tree is always binary.
• Categorical attribute: If a categorical attribute has r different values, there is more than one way
to split it: an r-way split with one branch per value, or a binary split that groups the values into two sets.
• Numeric attribute: If the numeric attribute contains a small number r of ordered values (e.g.,
integers in a small range [1, r]), an r-way split can be created with one branch per distinct value; for
continuous numeric attributes, a binary condition of the form x ≤ a is typically used (a small sketch of these split types follows this list).
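As a rough illustration of the attribute types above, the sketch below lists candidate splits for each kind. The attribute names, the values, and the candidate_splits helper are all hypothetical and not part of any standard library.

```python
def candidate_splits(name, values, kind):
    """List candidate splits for one attribute (illustration only)."""
    distinct = sorted(set(values))
    if kind == "binary":
        # Only one split is possible, so the tree is binary on this attribute.
        return [[f"{name} = {v}" for v in distinct]]
    if kind == "categorical":
        # r-way split with one branch per distinct value; grouping the r values
        # into two sets to obtain a binary split is the other common option.
        return [[f"{name} = {v}" for v in distinct]]
    if kind == "numeric":
        # Binary splits of the form  attribute <= threshold.
        return [[f"{name} <= {t}", f"{name} > {t}"] for t in distinct[:-1]]
    raise ValueError(f"unknown attribute kind: {kind}")

print(candidate_splits("Smoker", ["Y", "N", "Y"], "binary"))
print(candidate_splits("Colour", ["red", "green", "blue"], "categorical"))
print(candidate_splits("Age", [23, 29, 31, 45], "numeric"))
```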
Decision Tree
• The split criterion needs to be evaluated, and the quality of a split needs to be quantified.
• Error rate: Let p be the fraction of the instances in a set of data points S belonging to the dominant class.
Then, the error rate is simply 1−p.
• Gini index: The Gini index G(S) for a set S of data points may be computed on the class distribution p1 . . . pk
of the training data points in S as G(S) = 1 − Σ_{i=1..k} p_i². Lower values of the Gini index are more desirable. The CART algorithm uses the Gini index as the split criterion.
• Entropy: The entropy measure is used in one of the earliest classification algorithms, referred to as ID3.
The entropy E(S) for a set S may be computed on the class distribution p1 . . . pk of the training data
points in the node as E(S) = − Σ_{i=1..k} p_i · log2(p_i). (A short sketch of these three measures follows this list.)
• Lower values of the entropy are more desirable.
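The three split-quality measures above can be written directly from their formulas. The sketch below is only an illustration (the function names are mine, not from any particular decision tree library); it is evaluated on a class distribution with 9 positive and 7 negative examples, the same proportions used in the worked example later on.

```python
from math import log2

def error_rate(p):
    """1 minus the fraction of the dominant class."""
    return 1.0 - max(p)

def gini_index(p):
    """G(S) = 1 - sum of p_i^2 (split criterion used by CART); lower is purer."""
    return 1.0 - sum(pi ** 2 for pi in p)

def entropy(p):
    """E(S) = -sum of p_i * log2(p_i) (used by ID3); lower is purer."""
    return -sum(pi * log2(pi) for pi in p if pi > 0)

dist = [9 / 16, 7 / 16]      # class distribution p1, p2
print(error_rate(dist))      # 0.4375
print(gini_index(dist))      # ≈ 0.4922
print(entropy(dist))         # ≈ 0.9887
```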
Information Gain and Entropy
• The entropy (very common in Information Theory) characterizes the (im)purity of an arbitrary collection
of examples.
• Information Gain is the expected reduction in entropy caused by partitioning the examples according to a
given attribute.
• If we have a set with k different values in it, we can calculate the entropy as follows:
Entropy(S) = − Σ_{i=1..k} P(value_i) · log2 P(value_i)
• where P(value_i) is the probability of getting the i-th value when randomly selecting one from the set.
• So, for the set R = {a, a, a, b, b, b, b, b}:
Entropy(R) = −(3/8)·log2(3/8) − (5/8)·log2(5/8) ≈ 0.9544 (a quick numeric check follows).
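A quick, self-contained numeric check of the R = {a, a, a, b, b, b, b, b} example, assuming base-2 logarithms as in the rest of the slides:

```python
from collections import Counter
from math import log2

R = list("aaabbbbb")
probs = [count / len(R) for count in Counter(R).values()]   # [3/8, 5/8]
print(-sum(p * log2(p) for p in probs))                     # ≈ 0.9544
```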
Dataset
• The training data consists of 16 examples (9 positive, 7 negative), each described by categorical attributes such as Size ∈ {Small, Large} together with a class label.
Information Gain and Entropy
• 16 instances: 9 positive, 7 negative.
• The entropy of the full set is I(S) = −(9/16)·log2(9/16) − (7/16)·log2(7/16) ≈ 0.9887.
• For the attribute “Size”, the left branch (“Small”) contains 8 examples (6 positive, 2 negative) and has entropy 0.8113.
• The right branch (“Large”) contains 8 examples (3 positive, 5 negative) and has entropy 0.9544.
• I(S_Size) = (8/16)·0.8113 + (8/16)·0.9544 = 0.8828.
• We want to calculate the information gain (or entropy reduction), i.e., the reduction in ‘uncertainty’ obtained by
choosing ‘Size’ as our first branch. We will represent information gain as “G”.
• G(Size) = I(S) − I(S_Size)
• G(Size) = 0.9887 − 0.8828 = 0.1059
• We have gained roughly 0.106 bits of information about the dataset by choosing ‘Size’ as the first branch of our
decision tree (the short sketch below reproduces this computation).
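The sketch below reproduces the whole worked example in code. Note one assumption: the per-branch class counts (6 positive / 2 negative for “Small”, 3 positive / 5 negative for “Large”) are inferred from the branch entropies quoted above rather than read from the original data table.

```python
from math import log2

def entropy(counts):
    """Entropy of a class-count vector, in bits."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)

whole = [9, 7]   # full dataset: 9 positive, 7 negative
small = [6, 2]   # "Size = Small" branch (assumed breakdown)
large = [3, 5]   # "Size = Large" branch (assumed breakdown)

I_S = entropy(whole)                                               # ≈ 0.9887
I_S_size = (8 / 16) * entropy(small) + (8 / 16) * entropy(large)   # ≈ 0.8828
print(f"G(Size) = {I_S - I_S_size:.4f}")                           # ≈ 0.1058
```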