CART Regression Model: Detailed Explanation
1. Introduction
CART (Classification and Regression Trees) is a decision tree algorithm used for both
classification and regression tasks. For regression, CART builds a binary tree that
recursively splits the data into subsets, choosing each split to minimize prediction error
as measured by Mean Squared Error (MSE).
2. Important Terms
• Node: A point in the tree where the data is split.
• Root Node: The topmost node representing the entire dataset.
• Leaf Node: A terminal node that outputs a prediction: the mean of the target values of the samples that reach it.
• Split Point: The feature threshold used to divide the data at a node.
• MSE (Mean Squared Error): The average squared difference between actual and predicted values.
• MSR (Mean Squared Residual): Equivalent to MSE in the context of CART regression; the error incurred by using the node's mean as the prediction.
• Weighted MSE: The MSEs of the child nodes, averaged with weights proportional to the number of samples in each child.
3. Steps to Build a CART Regression Tree
1. Start with all data at the root.
2. For each feature, evaluate all candidate split points of the form feature ≤ threshold (typically the midpoints between consecutive sorted values).
3. For each split, calculate the MSE for left and right nodes.
4. Compute the weighted MSE of the split:
Weighted MSE = (n_left / n_total) * MSE_left + (n_right / n_total) * MSE_right
5. Choose the split with the lowest weighted MSE.
6. Repeat recursively for each child node until a stopping condition is met (e.g., a minimum
number of samples per leaf or a maximum depth).
7. The prediction at each leaf is the mean of the target values in that leaf. A code sketch of this procedure follows.
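The split search in steps 2-5 can be written in a few lines of Python. This is a minimal sketch for a single feature, not a full CART implementation; the helper names node_mse and best_split are illustrative, not from any library:

    import numpy as np

    def node_mse(y):
        # Error of a node when it predicts the mean of its targets (steps 3 and 7).
        return float(np.mean((y - y.mean()) ** 2)) if len(y) else 0.0

    def best_split(x, y):
        # Greedy search over candidate thresholds for one feature (steps 2-5).
        # Candidates are midpoints between consecutive distinct sorted x values.
        order = np.argsort(x)
        x, y = x[order], y[order]
        n = len(y)
        best_t, best_score = None, float("inf")
        for i in range(1, n):
            if x[i] == x[i - 1]:
                continue  # identical values cannot be separated by a threshold
            t = (x[i] + x[i - 1]) / 2
            left, right = y[:i], y[i:]
            score = (i / n) * node_mse(left) + ((n - i) / n) * node_mse(right)
            if score < best_score:
                best_t, best_score = t, score
        return best_t, best_score

    x = np.array([1, 2, 3, 4, 5], dtype=float)
    y = np.array([50, 55, 65, 70, 75], dtype=float)
    print(best_split(x, y))  # best threshold 2.5 with weighted MSE ≈ 12.5 (see Section 4)

Applying a full CART build would simply call best_split recursively on the left and right subsets until the stopping condition in step 6 is reached.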
4. Example
Dataset:
Hours Studied: [1, 2, 3, 4, 5]
Test Scores: [50, 55, 65, 70, 75]
Candidate Split at hours ≤ 2.5:
Left Node: [50, 55], Mean = 52.5, MSE = ((50 - 52.5)² + (55 - 52.5)²) / 2 = 6.25
Right Node: [65, 70, 75], Mean = 70, MSE = ((65 - 70)² + (70 - 70)² + (75 - 70)²) / 3 ≈ 16.67
Weighted MSE = (2/5) * 6.25 + (3/5) * 16.67 = 2.5 + 10 = 12.5
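These numbers can be double-checked with a few lines of NumPy (a quick sanity check, not part of the algorithm itself):

    import numpy as np

    left = np.array([50.0, 55.0])
    right = np.array([65.0, 70.0, 75.0])
    mse_left = np.mean((left - left.mean()) ** 2)     # 6.25
    mse_right = np.mean((right - right.mean()) ** 2)  # 50/3 ≈ 16.67
    weighted = (2 / 5) * mse_left + (3 / 5) * mse_right
    print(weighted)  # 12.5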
5. Prediction for Exact Match (e.g., 2.5 hours)
If the input value is exactly equal to the split point (e.g., 2.5), it goes to the left node
because of the ≤ condition.
So an input of 2.5 hours yields a prediction of 52.5, the left node's mean.
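scikit-learn's DecisionTreeRegressor follows the same feature ≤ threshold convention, so this behavior can be confirmed directly (assuming scikit-learn is installed):

    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    X = np.array([[1], [2], [3], [4], [5]], dtype=float)
    y = np.array([50, 55, 65, 70, 75], dtype=float)

    # A depth-1 tree keeps only the single best split, hours <= 2.5 here.
    tree = DecisionTreeRegressor(max_depth=1).fit(X, y)
    print(tree.tree_.threshold[0])  # 2.5
    print(tree.predict([[2.5]]))    # [52.5] -- 2.5 satisfies <= 2.5, so it goes left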
6. Drawbacks of CART Regression
• Greedy Algorithm: Chooses the locally optimal split at each node, so it may miss a better global tree structure.
• High Variance: Small changes in the data can produce very different trees.
• Overfitting: May grow complex trees that fit noise in the training data.
• Stepwise Prediction: Predictions are piecewise constant, so smooth relationships are modeled poorly.
• Bias Toward Features with Many Values: Features with more unique values offer more candidate splits and are therefore favored.
• Complexity: Large trees can be hard to interpret.