CART Regression Model

CART (Classification and Regression Trees) is a decision tree algorithm used for regression tasks, which builds a binary tree to minimize prediction error measured by Mean Squared Error (MSE). The process involves selecting split points to create nodes, calculating weighted MSE, and recursively building the tree until a stopping condition is met. However, CART regression has drawbacks such as overfitting, high variance, and complexity in interpretation.

Uploaded by Ram Jaybhaye

CART Regression Model: Detailed Explanation
1. Introduction
CART (Classification and Regression Trees) is a decision tree algorithm used for both
classification and regression tasks. In regression problems, CART builds a binary tree that
splits the data into subsets to minimize prediction error, measured by Mean Squared Error
(MSE).

2. Important Terms
• Node: A point in the tree where the data is split.

• Root Node: The topmost node representing the entire dataset.

• Leaf Node: Terminal node that gives a prediction (average of target values).

• Split Point: The value used to divide the dataset.

• MSE (Mean Squared Error): A measure of the average squared difference between actual
and predicted values.

• MSR (Mean Squared Residual): Equivalent to MSE in CART regression; the error incurred when the node mean is used as the prediction.

• Weighted MSE: The average of the child nodes' MSEs, weighted by the number of samples in each child.
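
These error measures are easy to compute directly. A minimal pure-Python sketch (the function names are my own, not from any particular library):

```python
def node_mse(values):
    """MSE of a node when the node mean is used as the prediction."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def weighted_mse(left, right):
    """Child MSEs weighted by the fraction of samples in each child."""
    n = len(left) + len(right)
    return len(left) / n * node_mse(left) + len(right) / n * node_mse(right)
```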

3. Steps to Build a CART Regression Tree


1. Start with all data at the root.

2. Try all possible split points (feature ≤ threshold).

3. For each split, calculate the MSE for left and right nodes.

4. Compute the weighted MSE of the split:

Weighted MSE = (n_left / n_total) * MSE_left + (n_right / n_total) * MSE_right

5. Choose the split with the lowest weighted MSE.

6. Repeat recursively for each child node until a stopping condition is met (e.g., minimum
samples per leaf).

7. The prediction at each leaf is the mean of the target values in that leaf.
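
The steps above can be sketched in Python (a simplified illustration, not a production implementation; the function and parameter names are my own, and candidate thresholds are taken at midpoints between sorted feature values):

```python
def mse(ys):
    """MSE of a node when its mean is the prediction."""
    m = sum(ys) / len(ys)
    return sum((y - m) ** 2 for y in ys) / len(ys)

def best_split(xs, ys):
    """Steps 2-5: try each candidate threshold and keep the one
    with the lowest weighted MSE."""
    levels = sorted(set(xs))
    best = None
    for t in ((a + b) / 2 for a, b in zip(levels, levels[1:])):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        w = len(left) / len(ys) * mse(left) + len(right) / len(ys) * mse(right)
        if best is None or w < best[0]:
            best = (w, t)
    return best  # None when the node cannot be split further

def build(xs, ys, min_samples_leaf=2):
    """Step 6: recurse until a stopping condition holds;
    step 7: a leaf predicts the mean of its targets."""
    split = best_split(xs, ys)
    if split is None or len(ys) < 2 * min_samples_leaf:
        return sum(ys) / len(ys)  # leaf value
    _, t = split
    L = [(x, y) for x, y in zip(xs, ys) if x <= t]
    R = [(x, y) for x, y in zip(xs, ys) if x > t]
    return {"threshold": t,
            "left": build(*zip(*L), min_samples_leaf=min_samples_leaf),
            "right": build(*zip(*R), min_samples_leaf=min_samples_leaf)}

def predict(tree, x):
    """Follow the <= rule down to a leaf."""
    while isinstance(tree, dict):
        tree = tree["left"] if x <= tree["threshold"] else tree["right"]
    return tree
```

On the dataset from the next section, `build` chooses the split at 2.5 and produces two leaves with means 52.5 and 70.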

4. Example
Dataset:

Hours Studied: [1, 2, 3, 4, 5]

Test Scores: [50, 55, 65, 70, 75]

Candidate Split at 2.5 (Hours Studied ≤ 2.5):

Left Node: [50, 55], Mean = 52.5, MSE = 6.25

Right Node: [65, 70, 75], Mean = 70, MSE ≈ 16.67

Weighted MSE = (2/5) * 6.25 + (3/5) * 16.67 = 12.5
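
The example's numbers can be checked with a few lines of Python (a plain re-calculation using the MSE definition from Section 2):

```python
# Reproduce the worked example: split at 2.5 hours.
left, right = [50, 55], [65, 70, 75]

def mse(ys):
    m = sum(ys) / len(ys)
    return sum((y - m) ** 2 for y in ys) / len(ys)

n = len(left) + len(right)
weighted = len(left) / n * mse(left) + len(right) / n * mse(right)
print(mse(left))   # 6.25
print(mse(right))  # 16.666... (≈ 16.67)
print(weighted)    # ≈ 12.5
```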

5. Prediction for Exact Match (e.g., 2.5 hours)


If the input value is exactly equal to the split point (e.g., 2.5), it goes to the left node due to
the ≤ condition.

So, 2.5 hours would lead to a prediction of 52.5.
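
A depth-one tree built from the example reduces to a single comparison. A tiny illustration (the threshold and leaf means are taken from the worked example above):

```python
# The <= comparison routes boundary values to the left child.
def predict_depth1(x, threshold=2.5, left_mean=52.5, right_mean=70.0):
    return left_mean if x <= threshold else right_mean

print(predict_depth1(2.5))  # 52.5 (exact match goes left)
print(predict_depth1(2.6))  # 70.0
```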

6. Drawbacks of CART Regression


• Greedy Algorithm: Finds locally optimal splits, may miss better global tree structure.

• High Variance: Small changes in data can produce very different trees.

• Overfitting: May create complex trees that fit noise.

• Stepwise Prediction: Predictions are piecewise constant, so smooth relationships are modeled poorly.

• Bias Toward Features with Many Values: Prefers features with more unique values.

• Complexity: Large trees can be hard to interpret.
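
The high-variance point can be made concrete: changing a single target value can move the chosen split. A small demonstration on the example data (helper names are my own):

```python
def mse(ys):
    m = sum(ys) / len(ys)
    return sum((y - m) ** 2 for y in ys) / len(ys)

def best_threshold(xs, ys):
    """Return the midpoint threshold with the lowest weighted MSE."""
    levels = sorted(set(xs))
    cands = [(a + b) / 2 for a, b in zip(levels, levels[1:])]
    def weighted(t):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        return len(left) / len(ys) * mse(left) + len(right) / len(ys) * mse(right)
    return min(cands, key=weighted)

xs = [1, 2, 3, 4, 5]
print(best_threshold(xs, [50, 55, 65, 70, 75]))  # 2.5
print(best_threshold(xs, [50, 55, 56, 70, 75]))  # 3.5 (one value changed: 65 -> 56)
```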
