CART
What is a Decision Tree:
• Decision tree learning is a method commonly used in data mining. The goal is to create a model that predicts the value of a target variable based on several input variables. Each interior node corresponds to one of the input variables, with an edge to a child for each possible value of that input variable. Each leaf represents a value of the target variable given the values of the input variables along the path from the root to that leaf.
• Decision trees used in data mining are of two main types (a minimal code sketch follows this list):
• Classification tree analysis: the predicted outcome is the discrete class to which the data belongs.
• Regression tree analysis: the predicted outcome can be considered a real number (e.g. the price of a house, or a patient's length of stay in a hospital).
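The distinction between the two types can be made concrete with a small sketch. The example below assumes scikit-learn's DecisionTreeClassifier and DecisionTreeRegressor, which are not named on the slide; any decision tree implementation would illustrate the same point.

# Minimal sketch (library choice is an assumption): a classification tree
# predicts a class label, a regression tree predicts a real number.
import numpy as np
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

# Toy data: two input variables per record.
X = np.array([[25, 0], [32, 1], [47, 1], [51, 0], [62, 1]])

# Classification tree: target is a discrete class (e.g. buys = no/yes).
y_class = np.array([0, 0, 1, 1, 1])
clf = DecisionTreeClassifier(max_depth=2).fit(X, y_class)
print(clf.predict([[40, 1]]))   # predicted class label

# Regression tree: target is a real number (e.g. a house price).
y_price = np.array([120.0, 150.0, 210.0, 230.0, 260.0])
reg = DecisionTreeRegressor(max_depth=2).fit(X, y_price)
print(reg.predict([[40, 1]]))   # predicted numeric value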
• Tree Building:
• There are many specific decision-tree algorithms. Notable
ones include:
• ID3 (Iterative Dichotomiser 3)
• C4.5 (successor of ID3)
• CART (Classification And Regression Tree); a splitting sketch follows this list.
• CHAID (CHI-squared Automatic Interaction Detector).
Performs multi-level splits when computing classification
trees.
• MARS: extends decision trees to handle numerical data
better.
• Conditional Inference Trees.
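Since the section is titled CART, a simplified sketch of the CART-style splitting criterion may help: for classification, CART chooses splits that minimize the Gini impurity of the resulting children. The helper below is an illustrative toy on a single numeric feature, not the full algorithm (no pruning, no categorical handling).

# Simplified sketch of CART-style splitting on one numeric feature:
# pick the threshold that minimizes the weighted Gini impurity of the children.
from collections import Counter

def gini(labels):
    # Gini impurity: 1 - sum of squared class proportions.
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(values, labels):
    # Try a threshold halfway between each pair of adjacent sorted values.
    pairs = sorted(zip(values, labels))
    best = (None, float("inf"))        # (threshold, weighted child impurity)
    for i in range(1, len(pairs)):
        thr = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [lab for v, lab in pairs if v <= thr]
        right = [lab for v, lab in pairs if v > thr]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best[1]:
            best = (thr, score)
    return best

print(best_split([25, 32, 47, 51, 62], [0, 0, 1, 1, 1]))   # (39.5, 0.0)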
Advantages of Decision Tree:
• Simple to understand and interpret
• Requires little data preparation.
• Able to handle both numerical and categorical data.
• Uses a white box model (see the sketch after this list).
• Possible to validate a model using statistical tests
• Robust: performs well even if its assumptions are somewhat violated by the true model that generated the data.
• Performs well with large datasets.
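The white box advantage can be shown directly: the learned rules are readable as nested if/else conditions. The snippet below assumes scikit-learn and made-up feature names; it is only a sketch of this inspectability, not a required toolkit.

# Sketch of the white-box property: print the learned rules as text.
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

X = np.array([[25, 0], [32, 1], [47, 1], [51, 0], [62, 1]])
y = np.array([0, 0, 1, 1, 1])

tree = DecisionTreeClassifier(max_depth=2).fit(X, y)
print(export_text(tree, feature_names=["age", "owns_home"]))
# Output is a readable list of split conditions and leaf classes.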