Data Mining
Classification:
Basic Concepts, Decision Trees (ID3 algorithm)
Classification: Definition
• Classification is a task in data mining that involves assigning a class label to each
instance in a dataset based on its features. The goal of classification is to build a
model that accurately predicts the class labels of new instances based on their
features.
• There are two main types of classification: binary classification and multi-class
classification. Binary classification involves classifying instances into two classes,
such as “spam” or “not spam”, while multi-class classification involves classifying
instances into more than two classes.
Classification Techniques
• Decision Tree based Methods
• Rule-based Methods
• Neural Networks
• Naïve Bayes
• Support Vector Machines
Decision Tree
• A decision tree is a supervised machine learning algorithm that is used for both
classification and regression tasks.
• Tree Structure
• Decision nodes
• Leaf nodes
• Splitting
• Information Gain
• Entropy
Information Gain and Entropy
• Information Gain measures how much information the answer to a specific
question (a split on a feature) provides, i.e. how much the split reduces entropy.
• Entropy measures the uncertainty/randomness in a set of class labels;
the more mixed the classes, the higher the entropy.
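As a rough illustration of the two measures above, the sketch below computes entropy as H(S) = -sum(p_i * log2(p_i)) over the class proportions p_i, and information gain as the entropy of the whole set minus the size-weighted entropy of the groups produced by a split. The function names, the dict-per-row data layout, and the toy "windy" feature are assumptions made for this example, not part of the original slides.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    # Sum -p * log2(p) over the proportion p of each class.
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())


def information_gain(rows, labels, feature):
    """Entropy reduction obtained by splitting on `feature`.

    `rows` is a list of dicts (feature name -> value); this layout is an
    assumption of the sketch.
    """
    total = len(labels)
    # Group the class labels by the value each row has for the feature.
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature], []).append(label)
    # Weighted average entropy of the groups after the split.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical toy data: "windy" separates the two classes perfectly.
rows = [{"windy": "yes"}, {"windy": "yes"}, {"windy": "no"}, {"windy": "no"}]
labels = ["stay", "stay", "play", "play"]
print(entropy(labels))
print(information_gain(rows, labels, "windy"))
```

With two equally frequent classes the entropy is at its maximum of 1 bit, and a feature that separates them perfectly recovers all of it as information gain.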
Decision Tree Induction
Many Algorithms:
• Hunt’s Algorithm (one of the earliest)
• CART
• ID3, C4.5
• SLIQ, SPRINT
Decision Tree Induction using ID3
• ID3 stands for Iterative Dichotomiser 3. It is so named because the
algorithm iteratively (repeatedly) dichotomizes (divides) features into two or
more groups at each step.
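The repeated splitting described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a full ID3 implementation: it handles categorical features only, picks the feature with the highest information gain at each step, and does no pruning. The helper names, the nested-dict tree representation, and the toy weather data are all hypothetical choices made for this example.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    return sum(-(c / total) * math.log2(c / total)
               for c in Counter(labels).values())


def information_gain(rows, labels, feature):
    """Entropy reduction obtained by splitting on `feature`."""
    total = len(labels)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature], []).append(label)
    return entropy(labels) - sum(len(g) / total * entropy(g)
                                 for g in groups.values())


def id3(rows, labels, features):
    """Build a tree as nested dicts: {feature: {value: subtree_or_label}}."""
    # Leaf node: every instance has the same class.
    if len(set(labels)) == 1:
        return labels[0]
    # Leaf node: no features left, so fall back to the majority class.
    if not features:
        return Counter(labels).most_common(1)[0][0]
    # Decision node: split on the feature with the highest information gain.
    best = max(features, key=lambda f: information_gain(rows, labels, f))
    remaining = [f for f in features if f != best]
    tree = {best: {}}
    for value in {row[best] for row in rows}:
        pairs = [(r, l) for r, l in zip(rows, labels) if r[best] == value]
        sub_rows = [r for r, _ in pairs]
        sub_labels = [l for _, l in pairs]
        tree[best][value] = id3(sub_rows, sub_labels, remaining)
    return tree


def predict(tree, row):
    """Follow decision nodes until a leaf (a class label) is reached."""
    while isinstance(tree, dict):
        feature = next(iter(tree))
        tree = tree[feature][row[feature]]
    return tree


# Hypothetical toy data: "outlook" alone determines the class.
rows = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "rainy", "windy": "no"},
    {"outlook": "rainy", "windy": "yes"},
]
labels = ["play", "play", "stay", "stay"]
tree = id3(rows, labels, ["outlook", "windy"])
print(tree)
print(predict(tree, {"outlook": "sunny", "windy": "yes"}))
```

Note that `predict` in this sketch raises a `KeyError` for a feature value that never appeared in the training data; a practical implementation would need a fallback such as the majority class at that node.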