Decision Tree
A graphical representation of different options for solving a problem, showing the related factors.
It has a hierarchical tree structure.
It starts with one main question at the top, called a node, which further branches out into different
possible outcomes,
where:
Root Node: the starting point, representing the entire dataset.
Branches: lines that connect nodes; they show the flow from one decision to another.
Internal Nodes: points where a decision is made based on the input features.
Leaf Nodes: terminal nodes at the ends of branches, representing the final outcomes or
predictions.
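The node types above can be sketched as a small data structure. A minimal sketch in Python; the "outlook" feature and the leaf outcomes are made up for illustration:

```python
class Node:
    """An internal node asks a yes/no question; a leaf holds a final prediction."""
    def __init__(self, question=None, yes=None, no=None, prediction=None):
        self.question = question      # (feature, test) pair for internal nodes
        self.yes = yes                # branch followed when the answer is "yes"
        self.no = no                  # branch followed when the answer is "no"
        self.prediction = prediction  # set only on leaf nodes

    def is_leaf(self):
        return self.prediction is not None

def predict(node, example):
    """Walk from the root node down the branches to a leaf node."""
    while not node.is_leaf():
        feature, test = node.question
        node = node.yes if test(example[feature]) else node.no
    return node.prediction

# Root node -> branches -> leaf nodes (feature and outcomes are hypothetical)
root = Node(
    question=("outlook", lambda v: v == "rainy"),
    yes=Node(prediction="stay inside"),
    no=Node(prediction="go outside"),
)

print(predict(root, {"outlook": "rainy"}))
print(predict(root, {"outlook": "sunny"}))
```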
Types
Mainly two types, based on the nature of the target variable:
1. Classification trees
2. Regression trees
1. Classification Trees
Designed to predict categorical outcomes.
They classify data into different classes.
E.g., a classification tree can determine whether an email is "spam" or "not spam" based on various features of the email.
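The spam example can be written out as a tiny hand-built classification tree. The features (contains_link, unknown_sender) are hypothetical, chosen only for illustration:

```python
# A hand-built classification tree for the spam example; in practice the
# questions would be learned from labeled training data.
def classify_email(email):
    # Root node: does the email contain a link?
    if email["contains_link"]:
        # Internal node: is the sender unknown?
        if email["unknown_sender"]:
            return "spam"        # leaf node
        return "not spam"        # leaf node
    return "not spam"            # leaf node

print(classify_email({"contains_link": True, "unknown_sender": True}))
```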
2. Regression Trees
used when target variable is continuous
predict numerical value rather categories
e.g., a regression tree can predict price of house based on its location, size and other
features.
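A regression tree works the same way but stores a number at each leaf. A hand-built sketch of the house-price example; the thresholds and prices are made-up illustrative values, not learned from data:

```python
# Each leaf of a regression tree holds a predicted numerical value
# (typically the mean of the training prices that reached that leaf).
def predict_price(house):
    if house["size_sqft"] > 2000:          # root node splits on size
        if house["location"] == "city":    # internal node splits on location
            return 450_000                 # leaf: predicted price
        return 300_000                     # leaf
    return 180_000                         # leaf

print(predict_price({"size_sqft": 2500, "location": "city"}))
```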
Working
The tree starts with a main question known as the root node.
This question is derived from the dataset and serves as the starting point for decision making.
From the root node, the tree asks a series of yes/no questions.
Each question is designed to split the data into subsets based on attributes.
E.g., for "Is it raining?", the answer determines which branch to follow.
Depending on the response to each question, you follow different branches:
if your answer is "yes" you proceed down one path; if "no", you take another.
This branching continues through a sequence of decisions.
As you follow each branch, you encounter further questions that break the data into smaller groups.
This step-by-step process continues until there are no more helpful questions to ask.
At the end of a branch you find the final outcome or decision.
It could be a classification ("spam" or "not spam") or a prediction (like an estimated price).
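The working steps above, repeatedly asking the question that best splits the data until no helpful question remains, can be sketched as a minimal recursive builder. Gini impurity is an assumed choice of splitting criterion (the notes do not name one), and the toy dataset is hypothetical:

```python
from collections import Counter

def gini(labels):
    """Gini impurity: how mixed a group of labels is (0 = perfectly pure)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels):
    """Find the yes/no question (feature, value) that most reduces impurity."""
    best, best_score = None, gini(labels)
    for feature in rows[0]:
        for value in {r[feature] for r in rows}:
            yes = [l for r, l in zip(rows, labels) if r[feature] == value]
            no = [l for r, l in zip(rows, labels) if r[feature] != value]
            if not yes or not no:
                continue
            score = (len(yes) * gini(yes) + len(no) * gini(no)) / len(labels)
            if score < best_score:
                best, best_score = (feature, value), score
    return best

def build(rows, labels):
    """Recursively split until no question helps, then return a leaf label."""
    q = best_split(rows, labels)
    if q is None:  # no helpful question left: leaf with the majority label
        return Counter(labels).most_common(1)[0][0]
    feature, value = q
    yes_i = [i for i, r in enumerate(rows) if r[feature] == value]
    no_i = [i for i, r in enumerate(rows) if r[feature] != value]
    return (q,
            build([rows[i] for i in yes_i], [labels[i] for i in yes_i]),
            build([rows[i] for i in no_i], [labels[i] for i in no_i]))

def predict(tree, row):
    """Follow branches from the root node until a leaf label is reached."""
    while isinstance(tree, tuple):
        (feature, value), yes, no = tree
        tree = yes if row[feature] == value else no
    return tree

# Toy spam dataset (hypothetical features and labels)
rows = [{"link": True, "caps": True}, {"link": True, "caps": False},
        {"link": False, "caps": True}, {"link": False, "caps": False}]
labels = ["spam", "spam", "not spam", "not spam"]
tree = build(rows, labels)
print(predict(tree, {"link": True, "caps": True}))
```

Here the "link" feature alone separates the classes, so the builder stops after one split: the subsets it produces are pure, and no further question improves them.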
Advantages:
1. Simplicity and interpretability: decision trees are straightforward and easy to understand. You can
visualize them like a flowchart, which makes it simple to see how decisions are made.
2. Versatility: they can be used for different types of tasks and work well for both classification and
regression.
3. No need for feature scaling: they don't require you to normalize or scale your data.
4. Handles non-linear relationships: capable of handling non-linear relationships between the
target variable and the features.
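Advantage 3 can be seen directly: a tree splits on thresholds, and any monotone rescaling of a feature leaves the resulting partition unchanged. A minimal sketch with made-up house sizes:

```python
# A threshold split partitions the data identically before and after
# monotone scaling, which is why trees don't need normalized features.
raw = [1200, 1500, 1800, 2100, 2400]  # e.g. house sizes in sqft
scaled = [(x - min(raw)) / (max(raw) - min(raw)) for x in raw]  # min-max scaled

split_raw = [x > 1800 for x in raw]
split_scaled = [x > 0.5 for x in scaled]  # 1800 maps to 0.5 after scaling

print(split_raw == split_scaled)
```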
Disadvantages:
1. Overfitting: overfitting occurs when a decision tree captures noise and details in the training
data, so it performs poorly on new data.
2. Instability: the model can be unreliable; slight variations in the input can lead to
significant differences in predictions.
3. Bias towards features with more levels: decision trees can be biased toward features with more
categories, focusing too much on them during decision making. This can cause the model to miss
other important features, leading to less accurate predictions.
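Disadvantage 3 can be demonstrated numerically: a feature with one unique value per row (like a record ID) always produces perfectly pure splits, so it looks maximally informative even though it cannot generalize. A sketch using Gini impurity (an assumed criterion; the notes do not name one):

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a group of labels (0 = pure)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def weighted_gini_after_split(values, labels):
    """Average impurity after splitting on every level of a feature."""
    total = 0.0
    for v in set(values):
        group = [l for x, l in zip(values, labels) if x == v]
        total += len(group) / len(labels) * gini(group)
    return total

labels = ["yes", "no", "yes", "no"]
useful = ["a", "b", "a", "a"]        # 2 levels, imperfect but generalizable
id_like = ["r1", "r2", "r3", "r4"]   # unique per row, useless on new data

print(weighted_gini_after_split(useful, labels))   # > 0: some impurity left
print(weighted_gini_after_split(id_like, labels))  # 0: looks "perfect"
```

The ID-like feature scores better (lower impurity) than the genuinely useful one, so a naive tree would split on it first.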
Application
Loan approval in banking: a bank needs to decide whether to approve a loan application
based on the customer's profile.
Input features include income, credit score, employment status, and loan history.
The decision tree predicts loan approval or rejection, helping the bank make quick and reliable
decisions.
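The loan-approval application can be sketched as a hand-built tree over the features named above. The thresholds (credit score 650, income 40,000) are hypothetical; a real bank would learn them from historical data:

```python
# A hand-built decision tree for loan approval; thresholds are illustrative.
def decide_loan(applicant):
    if applicant["credit_score"] >= 650:      # root node
        if applicant["income"] >= 40_000:     # internal node
            return "approve"                  # leaf
        return "review"                       # leaf
    return "reject"                           # leaf

print(decide_loan({"credit_score": 700, "income": 50_000}))
```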