Decision Tree
Decision trees are a powerful machine learning algorithm used for
both classification and regression tasks. They are intuitive, easy to
interpret, and can handle a wide range of data types and problem
domains.
Team Members:
I. Abhijit Das (IT/22/028)
II. Pritam Garai (IT/22/029)
III. Kushal Singha (IT/22/038)
IV. Tiyasa Saha (IT/22/026)
V. Shrestha Chakraborty (IT/22/027)
Introduction to Decision Trees
1. Hierarchical Structure
Decision trees are built as a series of hierarchical decisions, with each node representing a feature and the branches representing the possible outcomes.

2. Recursive Partitioning
The algorithm recursively splits the data into smaller subsets based on the feature that provides the most information gain.

3. Interpretability
Decision trees are highly interpretable, as the decision-making process can be easily visualized and understood.
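Information gain, the splitting criterion mentioned above, is the reduction in Shannon entropy achieved by a split. A minimal sketch in plain Python (the function names here are our own, for illustration):

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in counts.values())

def information_gain(parent, children):
    """Entropy reduction when the `parent` labels are split into `children` subsets."""
    n = len(parent)
    weighted = sum(len(ch) / n * entropy(ch) for ch in children)
    return entropy(parent) - weighted

# A perfectly separating split recovers all of the parent's entropy (1 bit here).
gain = information_gain(["yes", "yes", "no", "no"], [["yes", "yes"], ["no", "no"]])
```

At each node, the algorithm evaluates this quantity for every candidate feature and splits on the one with the highest gain.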
Advantages of Decision Trees
Intuitive Interpretation
Decision trees are easy to understand and explain, making them suitable for both technical and non-technical stakeholders.

Handling of Missing Data
Decision trees can handle missing values in the input data, making them a robust choice for real-world applications.

Robustness to Outliers
Decision trees are less sensitive to outliers in the data compared to some other machine learning algorithms.

Flexibility
Decision trees can be used for both classification and regression tasks, and can handle a wide range of data types.
How Decision Trees Work
1. Feature Selection
The algorithm selects the feature that provides the most information gain to split the data at each node.

2. Recursive Splitting
The data is recursively split into smaller subsets based on the selected feature until a stopping criterion is met.

3. Leaf Node Prediction
The final leaf nodes of the tree represent the predicted classes or target values for new instances.
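The three steps above can be sketched as a small recursive builder in plain Python. This is an illustrative toy (categorical features only, majority-vote leaves, our own function names), not a production implementation:

```python
from collections import Counter
from math import log2

def entropy(labels):
    counts = Counter(labels)
    return -sum((c / len(labels)) * log2(c / len(labels)) for c in counts.values())

def best_feature(rows, labels, features):
    """Step 1: pick the feature with the highest information gain."""
    def gain(f):
        parts = {}
        for row, y in zip(rows, labels):
            parts.setdefault(row[f], []).append(y)
        weighted = sum(len(p) / len(labels) * entropy(p) for p in parts.values())
        return entropy(labels) - weighted
    return max(features, key=gain)

def build(rows, labels, features):
    """Step 2: split recursively until the labels are pure or features run out."""
    if len(set(labels)) == 1 or not features:
        # Step 3: a leaf node predicts the majority class of its subset.
        return Counter(labels).most_common(1)[0][0]
    f = best_feature(rows, labels, features)
    node = {"feature": f, "branches": {}}
    groups = {}
    for row, y in zip(rows, labels):
        groups.setdefault(row[f], []).append((row, y))
    rest = [g for g in features if g != f]
    for value, pairs in groups.items():
        node["branches"][value] = build([r for r, _ in pairs],
                                        [y for _, y in pairs], rest)
    return node

def predict(node, row):
    """Walk from the root to a leaf by following the branch for each feature value."""
    while isinstance(node, dict):
        node = node["branches"][row[node["feature"]]]
    return node

# Toy weather data: play outside unless the outlook is rainy.
rows = [{"outlook": "sunny"}, {"outlook": "overcast"}, {"outlook": "rain"}]
labels = ["yes", "yes", "no"]
tree = build(rows, labels, ["outlook"])
```

A real implementation would also handle numeric thresholds, unseen feature values at prediction time, and stopping criteria such as maximum depth.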
Constructing Decision Trees
Data Preparation
Clean and preprocess the data, handling missing values and
encoding categorical features.
Feature Selection
Choose the most informative features to use in the decision tree
construction.
Tree Building
Recursively split the data based on the selected features to build
the decision tree.
Evaluation
Assess the performance of the decision tree on a held-out test set.
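The full workflow, from data preparation through held-out evaluation, can be sketched with scikit-learn (assuming it is available; the iris dataset is used here because it needs no cleaning or encoding):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Data preparation: iris is already numeric with no missing values.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

# Tree building: fit on the training split only; max_depth is a stopping criterion.
clf = DecisionTreeClassifier(max_depth=3, random_state=42)
clf.fit(X_train, y_train)

# Evaluation: accuracy on the held-out test set.
acc = accuracy_score(y_test, clf.predict(X_test))
```

For messier real-world data, the preparation step would also impute missing values and encode categorical features before fitting.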
Pruning and Overfitting
Overfitting Pruning Cross-Validation
Decision trees can overfit the Pruning techniques can be used to Cross-validation is crucial for
training data, leading to poor simplify the decision tree and evaluating the performance of the
generalization on new, unseen prevent overfitting. decision tree and finding the optimal
instances. level of pruning.
Conclusion and Future Directions
Decision trees are a powerful tool for data analysis, capable of making decisions directly from input data. They can be used to predict outcomes and make intelligent decisions without relying on human intuition or experience. Decision trees come in two main varieties: classification trees and regression trees. A classification tree is used when the target variable is categorical, while a regression tree is used when the target variable is continuous. Both types have been widely utilised in artificial intelligence (AI) and machine learning (ML).
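The two varieties differ only in the target they fit; in scikit-learn (assuming it is available) they are separate estimators with the same interface, shown here on tiny hypothetical data:

```python
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

# Classification tree: the target is a categorical label.
clf = DecisionTreeClassifier(random_state=0)
clf.fit([[0], [1], [2], [3]], ["a", "a", "b", "b"])
pred_class = clf.predict([[3]])[0]

# Regression tree: the target is a continuous value; leaves predict means.
reg = DecisionTreeRegressor(random_state=0)
reg.fit([[0], [1], [2], [3]], [0.0, 0.1, 0.9, 1.0])
pred_value = reg.predict([[3]])[0]
```

Internally, the classifier splits to reduce impurity measures such as entropy or Gini, while the regressor splits to reduce variance in the target.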