Inductive Bias in Decision Tree Learning
Inductive bias refers to the set of assumptions a learning algorithm makes in order to generalize from training data to unseen data.
• Many different decision trees can perfectly fit the training data.
• Not all of them perform well on new, unseen data.
• Inductive bias helps the learner choose, from the many options, a tree that generalizes well.
Inductive Bias in the ID3 Decision Tree Algorithm
The ID3 algorithm applies inductive bias in two key ways:
1. Prefers Shorter Trees
Shorter trees are simpler, more general, and easier to interpret.
Follows Occam’s Razor: “Prefer the simplest hypothesis that fits the data.”
2. Prefers Attributes with High Information Gain
Attributes with high information gain are chosen closer to the root of the tree.
This allows the model to split data effectively and early.
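A minimal sketch of how this information-gain preference can be computed for categorical attributes, assuming Shannon entropy. The tiny house dataset below is hypothetical and only illustrates why an attribute like Price would be placed at the root.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(rows, labels, attribute):
    """Reduction in entropy after splitting the rows on one categorical attribute."""
    total = len(labels)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attribute], []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

# Hypothetical toy data: ID3 would place the highest-gain attribute at the root.
rows = [
    {"Price": "Low",  "Location": "Urban"},
    {"Price": "Low",  "Location": "Suburban"},
    {"Price": "High", "Location": "Urban"},
    {"Price": "High", "Location": "Suburban"},
]
labels = ["Buy", "Buy", "Don't Buy", "Don't Buy"]
for attr in ("Price", "Location"):
    print(attr, round(information_gain(rows, labels, attr), 3))  # Price: 1.0, Location: 0.0
```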
Example: Inductive Bias in Action – Buying a House
Goal: Decide whether to buy a house based on Price, Location, and Size.
1. Starts with Price:
The decision tree checks Price first (Low or High).
This gives the best initial split in terms of information gain.
➤ Inductive Bias: Chooses the attribute with the highest information gain.
2. Simple Decisions First:
If Price = Low and Location = Urban, the tree directly decides to Buy.
➤ Inductive Bias: Prefers shorter, simpler decision paths for better generalization.
3. Adds More Checks Only When Needed:
If Location = Suburban, it checks Size:
If Large → Buy
If Small → Don’t Buy
➤ Inductive Bias: Only adds complexity when necessary.
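Written out as code, the tree described above is just a short chain of checks. This is a hand-coded illustration rather than ID3 output, and the outcome for the High-price branch is an assumption, since the slides do not specify it.

```python
def buy_house(price, location, size):
    """Hand-coded version of the example tree; purely illustrative."""
    if price == "Low":
        if location == "Urban":
            return "Buy"                                      # short path, decided early
        if location == "Suburban":
            return "Buy" if size == "Large" else "Don't Buy"  # extra check only here
    return "Don't Buy"  # assumed default for the High-price branch (not stated in the slides)

print(buy_house("Low", "Urban", "Small"))     # Buy
print(buy_house("Low", "Suburban", "Small"))  # Don't Buy
```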
Issues in Decision Tree Learning
Decision trees are powerful but face challenges when working with real-world data.
Common issues include:
• Overfitting
• Handling numeric (continuous) data
• Choosing the right attribute
• Missing values
• Attribute cost differences
Issue 1 – Overfitting the Data
• The tree grows too many branches and fits noise and rare cases.
• Solution: pruning techniques (pre-pruning or post-pruning), sketched below.
• Example: a tree that fits a single noisy sample performs poorly on test data.
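A sketch of both pruning styles, assuming scikit-learn (not part of the original slides): pre-pruning limits the tree while it grows, while post-pruning grows the tree fully and then cuts it back with cost-complexity pruning.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data with label noise (flip_y) so an unpruned tree can overfit.
X, y = make_classification(n_samples=300, n_features=5, flip_y=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

full = DecisionTreeClassifier(random_state=0)                                   # no pruning
pre  = DecisionTreeClassifier(max_depth=3, min_samples_leaf=5, random_state=0)  # pre-pruning
post = DecisionTreeClassifier(ccp_alpha=0.01, random_state=0)                   # post-pruning

for name, model in [("unpruned", full), ("pre-pruned", pre), ("post-pruned", post)]:
    model.fit(X_train, y_train)
    print(name, "train:", model.score(X_train, y_train), "test:", model.score(X_test, y_test))
```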
Issue 2 – Continuous-Valued Attributes
• Numeric values (e.g., Age, Salary) can't be split like categories.
• Solution: Use threshold splits (e.g., Age ≤ 30).
• Example: “Is Age ≤ 30?” gives a binary split.
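A sketch of how such a threshold can be chosen: sort the values, try midpoints between adjacent distinct values, and keep the cut with the highest information gain. The Age data below is hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def best_threshold(values, labels):
    """Return the (threshold, gain) midpoint split with the highest information gain."""
    pairs = sorted(zip(values, labels))
    base, n = entropy(labels), len(pairs)
    best = (None, -1.0)
    for (v1, _), (v2, _) in zip(pairs, pairs[1:]):
        if v1 == v2:
            continue  # no valid cut between equal values
        t = (v1 + v2) / 2
        left  = [lab for v, lab in pairs if v <= t]
        right = [lab for v, lab in pairs if v > t]
        gain = base - (len(left) * entropy(left) + len(right) * entropy(right)) / n
        if gain > best[1]:
            best = (t, gain)
    return best

ages   = [22, 25, 28, 31, 35, 40]
bought = ["Yes", "Yes", "Yes", "No", "No", "No"]
print(best_threshold(ages, bought))  # (29.5, 1.0): "Age <= 29.5" separates the classes
```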
Issue 3 – Choosing the Right Attribute
• ID3 may prefer attributes with many unique values (e.g., Student ID).
• Solution: Use Gain Ratio (C4.5) or the Gini Index (CART); a gain-ratio sketch follows below.
• Example: Student ID splits the training data perfectly but is useless for prediction.
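A sketch of the gain-ratio idea from C4.5: divide information gain by split information so that attributes with many unique values (like an ID) are penalized. The small dataset below is hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def gain_ratio(rows, labels, attribute):
    """Information gain divided by split information (the C4.5 correction)."""
    total = len(labels)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attribute], []).append(label)
    gain = entropy(labels) - sum(len(g) / total * entropy(g) for g in groups.values())
    split_info = -sum((len(g) / total) * math.log2(len(g) / total) for g in groups.values())
    return gain / split_info if split_info else 0.0

# StudentID splits perfectly (gain 1.0) but has a large split information,
# so the genuinely predictive attribute wins on gain ratio.
rows = [{"StudentID": str(i), "Studied": ("No", "Yes")[i % 2]} for i in range(6)]
labels = ["Fail", "Pass", "Fail", "Pass", "Fail", "Pass"]
print(round(gain_ratio(rows, labels, "StudentID"), 3))  # ~0.387
print(round(gain_ratio(rows, labels, "Studied"), 3))    # 1.0
```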
Issue 4 – Missing Attribute Values
• Some attribute values may be blank or unknown.
Solution:
• Use the most common value
• Estimate using probabilities
• Ignore the instance (if rare)
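The simplest of these strategies, filling a missing value with the attribute's most common observed value, can be sketched in a few lines; the Outlook rows below are hypothetical.

```python
from collections import Counter

def fill_with_mode(rows, attribute):
    """Replace missing (None) values of one attribute with its most common observed value."""
    observed = [r[attribute] for r in rows if r[attribute] is not None]
    mode = Counter(observed).most_common(1)[0][0]
    return [{**r, attribute: mode if r[attribute] is None else r[attribute]} for r in rows]

rows = [{"Outlook": "Sunny"}, {"Outlook": "Sunny"}, {"Outlook": None}, {"Outlook": "Rain"}]
print(fill_with_mode(rows, "Outlook"))  # the unknown entry becomes "Sunny"
```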
Issue 5 – Attributes with Differing Costs
• Some features are expensive or hard to collect (e.g., a biopsy test).
• Solution:
• Use cost-sensitive gain (one variant is sketched below)
• Prefer low-cost attributes unless high accuracy is needed
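One published variant of cost-sensitive selection divides the squared information gain by the attribute's measurement cost; other formulas exist, and the gain and cost numbers below are hypothetical.

```python
def cost_sensitive_gain(gain, cost):
    """One common heuristic: squared information gain divided by measurement cost,
    so an expensive attribute needs a much higher gain to be selected."""
    return gain ** 2 / cost

# Hypothetical attributes: a cheap temperature reading vs. an expensive biopsy.
candidates = {"Temperature": (0.4, 1.0), "Biopsy": (0.6, 50.0)}  # (information gain, cost)
for name, (gain, cost) in candidates.items():
    print(name, round(cost_sensitive_gain(gain, cost), 4))
# Temperature wins despite its lower raw gain, because it is far cheaper to measure.
```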
THANK YOU