Data Analytics & R Programming
Decision Tree Algorithm
S.Prabhakar, M.C.A., M.E (SEOR)
Assistant Professor, DFT Dept.,
NIFT, Chennai
Today’s Session
• What is a Decision Tree?
• Terminology associated with Decision Trees
• Visualizing a Decision Tree
• Writing a Decision Tree classifier from scratch
Types of Machine Learning Algorithms
• Supervised Learning (Task Driven)
• Unsupervised Learning (Data Driven)
• Reinforcement Learning (Learn from Mistakes)
Decision Tree
• A decision tree is a type of supervised learning algorithm
(one with a pre-defined target variable) that is mostly used
for classification problems.
• It is a graphical representation of all possible solutions to a decision.
• Decisions are based on conditions.
• The decisions made can be easily explained.
• It works for both categorical and continuous input and output
variables.
• It splits the population or sample into two or more
homogeneous sets (or sub-populations) based on the most
significant splitter / differentiator among the input variables.
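Choosing the "most significant splitter" can be sketched in a few lines of R. The example below assumes Gini impurity as the measure of homogeneity (the default criterion rpart uses for classification; the slides do not name one). The helper names gini and split_gini and the toy data are hypothetical and only for illustration.

```r
# Gini impurity of a vector of class labels: 0 means perfectly homogeneous.
gini <- function(labels) {
  p <- table(labels) / length(labels)
  1 - sum(p^2)
}

# Weighted Gini impurity of the two groups produced by a logical split.
split_gini <- function(labels, go_left) {
  n <- length(labels)
  left <- labels[go_left]
  right <- labels[!go_left]
  (length(left) / n) * gini(left) + (length(right) / n) * gini(right)
}

# Hypothetical toy data (not taken from the slides).
toy <- data.frame(
  weight = c(20, 22, 25, 30, 32, 35),
  height = c(110, 130, 112, 128, 111, 131),
  gender = c("GIRL", "GIRL", "GIRL", "BOY", "BOY", "BOY")
)

parent_gini <- gini(toy$gender)                          # impurity before splitting
by_weight   <- split_gini(toy$gender, toy$weight < 28)   # candidate split on weight
by_height   <- split_gini(toy$gender, toy$height < 120)  # candidate split on height

print(c(parent = parent_gini, weight = by_weight, height = by_height))
```

In this toy data the split on weight yields two perfectly homogeneous groups (impurity 0), so it would be preferred over the split on height.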
Decision Tree Terminology
Root Node: Represents the entire population or sample, which is then divided
into two or more homogeneous sets.
Splitting: The process of dividing a node into two or more sub-nodes.
Decision Node: A sub-node that splits into further sub-nodes.
Leaf / Terminal Node: A node that does not split.
Pruning: The process of removing the sub-nodes of a decision node;
it is the opposite of splitting.
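These terms can be seen on a fitted tree. The sketch below assumes the rpart package and the built-in iris data (neither is used at this point in the slides); count_leaves is a hypothetical helper. It grows a deliberately large tree, then prunes it and compares the number of leaf nodes.

```r
library(rpart)

# Grow a deliberately large tree: cp = 0 and minsplit = 2 allow many splits.
full_tree <- rpart(Species ~ ., data = iris, method = "class",
                   control = rpart.control(cp = 0, minsplit = 2))

# tree$frame has one row per node; rows with var == "<leaf>" are leaf nodes.
count_leaves <- function(tree) sum(tree$frame$var == "<leaf>")

# Pruning cuts back splits whose complexity parameter is below cp.
pruned_tree <- prune(full_tree, cp = 0.1)

cat("Leaves before pruning:", count_leaves(full_tree), "\n")
cat("Leaves after pruning: ", count_leaves(pruned_tree), "\n")
```

The first row of tree$frame (node number 1) is the root node; every non-leaf row is a decision node.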
Weekend Plan – Decision Tree
[Figure: weekend-plan decision tree]
ROOT NODE: Rain? (NO / YES)
INTERIOR NODES: Go Out (Rain = NO), Stay In (Rain = YES)
LEAF NODES: Shopping, Movie, Coffee Shop, TV Shows (each reached by a further NO / YES branch)
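A decision tree like the weekend plan is a set of nested conditions. The sketch below is one possible reading of the figure: it assumes the leaves attach left to right (Shopping and Movie under Go Out, Coffee Shop and TV Shows under Stay In). The figure does not name the second-level question, so the argument second_answer is a hypothetical stand-in for it.

```r
# Weekend-plan tree written as nested conditions.
# `raining` is the root-node test from the slide; `second_answer` is a
# hypothetical stand-in for the unnamed YES/NO question at each interior node.
weekend_plan <- function(raining, second_answer) {
  if (!raining) {                 # Rain? NO  -> interior node "Go Out"
    if (!second_answer) "Shopping" else "Movie"
  } else {                        # Rain? YES -> interior node "Stay In"
    if (!second_answer) "Coffee Shop" else "TV Shows"
  }
}

print(weekend_plan(raining = FALSE, second_answer = TRUE))
print(weekend_plan(raining = TRUE,  second_answer = FALSE))
```

Each call follows exactly one path from the root node to a leaf node, which is why a decision made by a tree is easy to explain.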
CART ALGORITHM
• Classification and Regression Trees (CART) is a term
introduced by Leo Breiman to refer to decision tree algorithms
that can be used for classification or regression predictive
modeling problems.
• Classically, these algorithms are referred to as “decision trees”, but on
some platforms such as R they are referred to by the more modern
term CART.
• The CART algorithm provides a foundation for important
algorithms such as bagged decision trees, random forests and boosted
decision trees.
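The two halves of CART can be illustrated with rpart. This is a minimal sketch, assuming the built-in iris and mtcars data sets rather than any data from the slides: method = "class" fits a classification tree, and method = "anova" fits a regression tree.

```r
library(rpart)

# Classification tree: the target (Species) is a factor.
class_tree <- rpart(Species ~ ., data = iris, method = "class")

# Regression tree: the target (mpg) is numeric.
reg_tree <- rpart(mpg ~ wt + hp, data = mtcars, method = "anova")

# A classification tree predicts a class label ...
class_pred <- predict(class_tree, iris[1, ], type = "class")
# ... while a regression tree predicts a number.
reg_pred <- predict(reg_tree, mtcars[1, ])

print(class_tree$method)
print(reg_tree$method)
print(class_pred)
print(reg_pred)
```

If method is omitted, rpart chooses it from the type of the target variable, so a factor target gives a classification tree and a numeric target gives a regression tree.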
Decision Tree – R Programming
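library(rpart)
library(rpart.plot)

# Read the data; stringsAsFactors = TRUE keeps gender as a factor (needed in R >= 4.0)
data <- read.csv("D:/S Prabhakar 14.6.18/1 Even Semester 2020 Jan/R Programming/Class/Decision Tree/data1.csv",
                 stringsAsFactors = TRUE)

# Regression tree: predict height (numeric) from gender and weight
tree <- rpart(height ~ gender + weight, data = data, method = "anova")
a <- data.frame(gender = "BOY", weight = 30)
result <- predict(tree, a)
print(result)
rpart.plot(tree)

# Classification tree: predict gender (categorical) from height and weight
tree <- rpart(gender ~ height + weight, data = data, method = "class")
a <- data.frame(height = 12, weight = 20)
# Returns class probabilities; use type = "class" to get a single label
result <- predict(tree, a)
print(result)
rpart.plot(tree)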