Module 2: CART Algorithm

CART (Classification And Regression Tree)

CART (Classification and Regression Trees) is a variation of the decision tree algorithm that can handle both classification
and regression tasks. Scikit-Learn uses the CART algorithm to train Decision Trees (also called "growing" trees). CART was
first introduced by Leo Breiman, Jerome Friedman, Richard Olshen, and Charles Stone in 1984.
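As a quick sketch of this usage, here is a minimal scikit-learn example; the iris dataset and the hyperparameter values are arbitrary choices for illustration, not a recommended configuration:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Load a small labelled dataset and hold out a test split.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Scikit-Learn's DecisionTreeClassifier grows the tree with CART.
clf = DecisionTreeClassifier(criterion="gini", max_depth=3, random_state=42)
clf.fit(X_train, y_train)
print("Test accuracy:", clf.score(X_test, y_test))
```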

CART (Classification and Regression Tree) for Decision Trees

CART is a predictive algorithm used in machine learning that explains how a target variable's values can be predicted from
other variables. It is a decision tree in which each internal node splits on a predictor variable and each leaf node holds a
prediction for the target variable.

CART serves as an umbrella term for the following categories of decision trees:

• Classification Trees: The tree is used to determine which "class" a categorical target variable is most likely to fall
into.

• Regression Trees: These are used to predict the value of a continuous target variable, as sketched below.
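For the regression case, an analogous sketch with DecisionTreeRegressor; the synthetic sine-wave data here is made up purely for the example:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic 1-D regression problem: a noisy sine wave (illustrative data).
rng = np.random.RandomState(0)
X = np.sort(rng.uniform(0, 6, size=(200, 1)), axis=0)
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=200)

# A regression tree predicts a continuous value at each leaf.
reg = DecisionTreeRegressor(max_depth=4, random_state=0)
reg.fit(X, y)
print(reg.predict([[1.5], [4.0]]))  # continuous predictions
```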

CART Algorithm

Classification and Regression Trees (CART) is a decision tree algorithm used for both classification and regression
tasks. It is a supervised learning algorithm that learns from labelled data to make predictions on unseen data.

• Tree structure: CART builds a tree-like structure consisting of nodes and branches. The nodes represent different
decision points, and the branches represent the possible outcomes of those decisions. The leaf nodes in the tree contain
a predicted class label or value for the target variable.

• Splitting criteria: CART uses a greedy approach to split the data at each node. It evaluates all possible splits and
selects the one that most reduces the impurity of the resulting subsets. For classification tasks, CART uses Gini
impurity as the splitting criterion: the lower the Gini impurity, the purer the subset (see the sketch after this list).
For regression tasks, CART minimizes the variance (mean squared error) of the resulting subsets: the split that achieves
the largest reduction in residual error gives the best fit to the data.

• Pruning: To prevent overfitting, pruning removes nodes that contribute little to the model's accuracy. Cost-complexity
pruning and information-gain pruning are two popular techniques. Cost-complexity pruning assigns each subtree a cost that
combines its error with a penalty proportional to its size, and prunes subtrees whose removal increases the error least
relative to the complexity saved. Information-gain pruning removes nodes whose information gain is low.
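To make the splitting criterion concrete, here is a small from-scratch sketch of Gini impurity (1 minus the sum of squared class proportions) for a candidate binary split; this is illustrative code, not scikit-learn's internal implementation:

```python
import numpy as np

def gini_impurity(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def split_gini(left_labels, right_labels):
    """Weighted Gini impurity of a candidate binary split."""
    n = len(left_labels) + len(right_labels)
    return (len(left_labels) / n) * gini_impurity(left_labels) \
         + (len(right_labels) / n) * gini_impurity(right_labels)

# A pure subset scores 0; a 50/50 two-class mix scores 0.5.
print(gini_impurity([0, 0, 0, 0]))        # 0.0
print(gini_impurity([0, 0, 1, 1]))        # 0.5
print(split_gini([0, 0, 0], [1, 1, 0]))   # lower is better
```

For pruning, scikit-learn exposes cost-complexity pruning through the ccp_alpha parameter of DecisionTreeClassifier and DecisionTreeRegressor, and cost_complexity_pruning_path can be used to enumerate candidate alpha values.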

How does CART algorithm work?

The CART algorithm works via the following process:

1. The best split point for each input variable is obtained.

2. Among the best split points found in Step 1, the overall "best" split point is identified.

3. The chosen input is split at that "best" split point.

4. Splitting continues recursively on each subset until a stopping rule is satisfied or no further desirable split is available.
Credit: https://www.geeksforgeeks.org/machine-learning/cart-classification-and-regression-tree-in-machine-learning/
Popular CART-based Algorithms

• CART (Classification and Regression Trees): The original algorithm that uses binary splits to build decision trees.

• C4.5 and C5.0: Quinlan's successors to the ID3 algorithm, often discussed alongside CART; they allow multiway splits and handle categorical variables more effectively.

• Random Forests: Ensemble methods that use multiple decision trees (often CART) to improve predictive
performance and reduce overfitting.

• Gradient Boosting Machines (GBM): Boosting algorithms that also use decision trees (often CART) as base
learners, sequentially improving model performance.
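For reference, both ensemble styles are available in scikit-learn with CART-style trees as base learners; a minimal comparative sketch, with arbitrary hyperparameters chosen for illustration:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

# Bagging-style ensemble of randomized CART trees.
rf = RandomForestClassifier(n_estimators=100, random_state=0)
# Boosting: trees fitted sequentially to the previous ensemble's errors.
gbm = GradientBoostingClassifier(n_estimators=100, random_state=0)

for name, model in [("Random Forest", rf), ("GBM", gbm)]:
    scores = cross_val_score(model, X, y, cv=5)
    print(name, round(scores.mean(), 3))
```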

Advantages of CART

• Resulting trees are simple to interpret.

• Classification and regression trees are nonparametric and can capture nonlinear relationships.

• Classification and regression trees implicitly perform feature selection.

• Outliers in the input variables have little effect on CART splits.

• It requires minimal data preparation and produces easy-to-understand models.

Limitations of CART

• Overfitting: deep trees can fit noise in the training data.

• High variance: small changes in the training data can produce very different trees.

• Low bias, which combined with high variance makes single trees prone to overfitting.

• The tree structure may be unstable, since early splits determine all later ones.

Year   Contribution                          Contributors

1963   Recursive partitioning method         Morgan & Sonquist
1972   First classification tree (THAID)     Hunt, Messenger & Mandell
1980   CHAID algorithm                       Gordon V. Kass
1984   CART                                  Breiman, Friedman, Olshen & Stone
1986   ID3                                   Ross Quinlan
1993   C4.5                                  Ross Quinlan
