
Classification

Prof. Satanik Mitra


BITS Pilani
Decision Tree
• Decision trees are a classification methodology, wherein the classification process is modeled with the
use of a set of hierarchical decisions on the feature variables, arranged in a tree-like structure.

• The decision at a particular node of the tree, which is referred to as the split criterion, is typically a
condition on one or more feature variables in the training data.

• For example, consider the case where Age is an attribute, and the split criterion is Age ≤ 30. In this case, the left branch of the decision tree contains all training examples with age at most 30, whereas the right branch contains all examples with age greater than 30 (see the sketch after this list).

• The goal is to identify a split criterion that reduces the mixing of class labels in each branch as much as possible.

• The main difference from clustering is that the partitioning criterion in the decision tree is supervised
with the class label in the training instances.

• Some classical decision tree algorithms include C4.5, ID3, and CART.
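
As a minimal sketch of the Age ≤ 30 split described in the example above (the records, labels, and values below are invented toy data, not taken from these notes), the condition simply partitions the training examples into two branches:

```python
# Minimal sketch of a univariate split criterion such as Age <= 30.
# The records and class labels are made-up toy data for illustration only.
training_data = [
    {"Age": 25, "label": "yes"},
    {"Age": 29, "label": "yes"},
    {"Age": 32, "label": "no"},
    {"Age": 47, "label": "no"},
]

# Left branch: Age at most 30; right branch: Age greater than 30.
left_branch = [rec for rec in training_data if rec["Age"] <= 30]
right_branch = [rec for rec in training_data if rec["Age"] > 30]

print([rec["label"] for rec in left_branch])   # ['yes', 'yes']
print([rec["label"] for rec in right_branch])  # ['no', 'no']
```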
Decision Tree
• Splits can be univariate (based on a single attribute) or multivariate (based on a combination of attributes).
• A decision tree induction algorithm has two types of nodes, referred to as the
internal nodes and leaf nodes. Each leaf node is labeled with the dominant class
at that node.
• Eventually, the decision tree algorithm stops the growth of the tree based on a
stopping criterion.
• To avoid the degradation in accuracy associated with overfitting, the classifier
uses a postpruning mechanism.
• The goal of the split criterion is to maximize the separation of the different
classes among the children nodes.
• The design of the split criterion depends on the nature of the underlying
attribute.
• Binary attribute: Only one type of split is possible, and the tree is always binary.
• Categorical attribute: If a categorical attribute has r different values, there are multiple ways to split it, for example an r-way split with one branch per value (see the sketch after this list).
• Numeric attribute: If the numeric attribute contains a small number r of ordered values (e.g.,
integers in a small range [1, r]), it is possible to create an r-way split for each distinct value.
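
For instance, an r-way split on a categorical attribute can be sketched as grouping the training records by attribute value, one child node per distinct value (the attribute name Color and the toy records below are hypothetical):

```python
from collections import defaultdict

# Toy records with a categorical attribute; the attribute "Color" and its
# values are invented purely for illustration.
records = [
    {"Color": "red",   "label": "+"},
    {"Color": "blue",  "label": "-"},
    {"Color": "red",   "label": "+"},
    {"Color": "green", "label": "-"},
]

# r-way split: one child node per distinct value of the categorical attribute.
children = defaultdict(list)
for rec in records:
    children[rec["Color"]].append(rec)

for value, subset in children.items():
    print(value, [rec["label"] for rec in subset])
# red ['+', '+']
# blue ['-']
# green ['-']
```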
Decision Tree
• The split criterion needs to be evaluated, and the quality of a split needs to be quantified.
• Error rate: Let p be the fraction of the instances in a set of data points S belonging to the dominant class.
Then, the error rate is simply 1−p.

• Gini index: The Gini index G(S) for a set S of data points may be computed from the class distribution p1 . . . pk of the training data points in S as G(S) = 1 − (p1² + . . . + pk²). The CART algorithm uses the Gini index as the split criterion.

• Entropy: The entropy measure is used in one of the earliest classification algorithms, referred to as ID3. The entropy E(S) for a set S may be computed on the class distribution p1 . . . pk of the training data points in the node as E(S) = −(p1 log2 p1 + . . . + pk log2 pk).

• Lower values of the entropy are more desirable.
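
The three measures above can be written in a few lines of Python. This is a generic sketch (the function names are our own); it assumes the class distribution p1 . . . pk is supplied as a list of fractions that sum to 1:

```python
import math

def error_rate(p):
    """Error rate: 1 minus the fraction of the dominant class."""
    return 1.0 - max(p)

def gini_index(p):
    """Gini index G(S) = 1 - (p1^2 + ... + pk^2), the criterion used by CART."""
    return 1.0 - sum(pj ** 2 for pj in p)

def entropy(p):
    """Entropy E(S) = -(p1 log2 p1 + ... + pk log2 pk), the criterion used by ID3."""
    return -sum(pj * math.log2(pj) for pj in p if pj > 0)

# A pure node scores 0 on all three measures; an evenly mixed two-class
# node scores the worst (0.5, 0.5, and 1.0 bit, respectively).
print(error_rate([1.0, 0.0]), gini_index([1.0, 0.0]), entropy([1.0, 0.0]))
print(error_rate([0.5, 0.5]), gini_index([0.5, 0.5]), entropy([0.5, 0.5]))
```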


Information Gain and Entropy
• The entropy (very common in Information Theory) characterizes the (im)purity of an arbitrary collection of examples.
• Information Gain is the expected reduction in entropy caused by partitioning the examples according to a
given attribute.

• If we have a set with k different values in it, we can calculate the entropy as follows:

• Entropy(S) = − Σi P(valuei) log2 P(valuei), summing over the k values,

• where P(valuei) is the probability of getting the i-th value when randomly selecting one element from the set. So, for the set R = {a,a,a,b,b,b,b,b}:

• Entropy(R) = −(3/8) log2(3/8) − (5/8) log2(5/8) ≈ 0.9544
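
As a quick numerical check of the entropy of R (a minimal sketch using the same log-base-2 definition):

```python
import math

# Entropy of R = {a,a,a,b,b,b,b,b}: P(a) = 3/8, P(b) = 5/8.
R = list("aaabbbbb")
probabilities = [R.count(v) / len(R) for v in sorted(set(R))]
entropy_R = -sum(p * math.log2(p) for p in probabilities)
print(round(entropy_R, 4))  # 0.9544
```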

• Dip. di Matematica Pura ed Applicata, F. Aiolli - Sistemi Informativi 2007/2008


Dataset

• [Training data table not shown here: 16 labeled instances, 9 positive and 7 negative, described by attributes including Size with values Small and Large.]


Information Gain and Entropy
• 16 instances: 9 positive, 7 negative.

• The entropy of this class distribution is I(S) = −(9/16) log2(9/16) − (7/16) log2(7/16) ≈ 0.9887.


• For the attribute “Size”, the left branch (value “Small”) contains 8 examples and has entropy 0.8113.

• The right branch (value “Large”) contains 8 examples and has entropy 0.9544.

• The expected entropy after splitting on Size is I(S_Size) = (8/16)*0.8113 + (8/16)*0.9544 ≈ 0.8828.
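
The per-branch class counts are not listed explicitly in these notes; the sketch below assumes counts that are consistent with the stated branch entropies (6 positive / 2 negative for “Small” and 3 positive / 5 negative for “Large”) and recomputes the weighted entropy of the split:

```python
import math

def entropy(counts):
    """Entropy of a node, computed from its per-class instance counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

# Assumed per-class counts, chosen to match the stated branch entropies.
small_branch = [6, 2]   # "Small": 8 examples
large_branch = [3, 5]   # "Large": 8 examples

weighted = (8 / 16) * entropy(small_branch) + (8 / 16) * entropy(large_branch)
print(f"{entropy(small_branch):.4f}")  # 0.8113
print(f"{entropy(large_branch):.4f}")  # 0.9544
print(f"{weighted:.3f}")               # 0.883
```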


Size

• We want to calculate the information gain (or entropy reduction). This is the reduction in ‘uncertainty’ obtained by choosing ‘Size’ as our first split. We will represent information gain as “G.”
• G(Size) = I(S) − I(S_Size)

• G(Size) = 0.9887 − 0.8828

• G(Size) ≈ 0.106
• We have gained about 0.106 bits of information about the dataset by choosing ‘Size’ as the first branch of our decision tree.
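
As a final arithmetic check, a minimal sketch of the gain computation using the rounded entropy values from the previous slides:

```python
# Information gain for the "Size" split, using the entropy values above.
root_entropy = 0.9887     # I(S) for the full set (9 positive, 7 negative)
split_entropy = 0.8828    # weighted entropy of the "Small"/"Large" branches
gain_size = root_entropy - split_entropy
print(round(gain_size, 3))  # 0.106
```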
