Bias and Variance
Overview of Bias and Variance
• In supervised machine learning, an algorithm learns
a model from training data.
• The goal of any supervised machine learning
algorithm is to best estimate the mapping function
(f) for the output variable (Y) given the input data
(X).
• The mapping function is often called the target
function because it is the function that a given
supervised machine learning algorithm aims to
approximate.
Prediction error
• Anytime you have a difference between your
model and your measurements, you have an
error.
• The prediction error for any machine learning
algorithm can be broken down into three
parts:
• Bias Error
• Variance Error
• Irreducible Error
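For squared-error loss, the three parts can be written out explicitly. The following is a sketch under assumptions not stated in the slides: the output is modeled as $Y = f(X) + \varepsilon$ with zero-mean noise of variance $\sigma^2$, $\hat{f}$ denotes the model learned from a randomly drawn training set, and the expectation is taken over training sets and noise at a fixed input $x$.

```latex
\mathbb{E}\big[(Y - \hat{f}(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible error}}
```

Only the first two terms depend on the learning algorithm; $\sigma^2$ is a property of the data.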
Irreducible error
• The irreducible error cannot be reduced
regardless of what algorithm is used.
• It is the error introduced from the chosen
framing of the problem and may be caused by
factors like unknown variables that influence
the mapping of the input variables to the
output variable.
Bias Error
• Bias is the set of simplifying assumptions made by a model to
make the target function easier to learn.
• Parametric algorithms generally have high bias, which makes
them fast to train and easier to understand but less
flexible.
• In turn, they have lower predictive performance on complex
problems that fail to meet the simplifying assumptions behind
the algorithm's bias.
• Low Bias: Suggests fewer assumptions about the form of the
target function.
• High Bias: Suggests more assumptions about the form of the
target function.
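One way to see bias numerically is to fit a model whose assumptions do not match the target function and average its predictions over many training sets. The sketch below is illustrative only: the quadratic target, the noise level, and the helper names (target, sample_training_set, fit_line, estimate_bias) are assumptions made for the example, not part of the slides.

```python
import random

def target(x):
    # Hypothetical "true" target function f, chosen for illustration.
    return x * x

def sample_training_set(n, noise_sd, rng):
    # Draw n inputs uniformly from [-1, 1] and add Gaussian noise to f(x).
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    ys = [target(x) + rng.gauss(0.0, noise_sd) for x in xs]
    return xs, ys

def fit_line(xs, ys):
    # Ordinary least squares for the straight-line model y = a + b * x.
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

def estimate_bias(x0, n_sets=2000, n=30, noise_sd=0.1, seed=0):
    # Bias at x0: average prediction over many training sets minus f(x0).
    rng = random.Random(seed)
    preds = []
    for _ in range(n_sets):
        xs, ys = sample_training_set(n, noise_sd, rng)
        a, b = fit_line(xs, ys)
        preds.append(a + b * x0)
    return sum(preds) / len(preds) - target(x0)

bias_at_zero = estimate_bias(0.0)
bias_at_one = estimate_bias(1.0)
print("estimated bias at x = 0:", round(bias_at_zero, 3))
print("estimated bias at x = 1:", round(bias_at_one, 3))
```

Because a straight line cannot follow a curve, the averaged prediction sits above the target near x = 0 and below it near x = 1. More training data does not remove this error; it comes from the linear assumption itself.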
Examples of Bias
• Examples of low-bias machine learning
algorithms include: Decision Trees, k-Nearest
Neighbors and Support Vector Machines.
• Examples of high-bias machine learning
algorithms include: Linear Regression, Linear
Discriminant Analysis and Logistic Regression.
Variance Error
• Variance is the amount by which the estimate of the
target function would change if different training data
were used.
• The target function is estimated from the training
data by a machine learning algorithm, so we should
expect the algorithm to have some variance.
• Ideally, it should not change too much from one
training dataset to the next, meaning that the
algorithm is good at picking out the hidden
underlying mapping between the inputs and the
output variables.
• Machine learning algorithms that have a high
variance are strongly influenced by the
specifics of the training data.
• This means that the specifics of the training
data influence the number and types of
parameters used to characterize the mapping
function.
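Variance can be estimated the same way: train on many different training sets and measure how much the prediction at one fixed input moves around. The sketch below compares a straight-line fit with a 1-nearest-neighbor predictor. The sine target, sample sizes, and helper names (target, make_data, predict_line, predict_1nn, prediction_variance) are assumptions chosen for illustration.

```python
import math
import random
import statistics

def target(x):
    # Hypothetical true function, chosen for illustration.
    return math.sin(2.0 * math.pi * x)

def make_data(n, noise_sd, rng):
    # Inputs uniform on [0, 1); outputs are f(x) plus Gaussian noise.
    xs = [rng.random() for _ in range(n)]
    ys = [target(x) + rng.gauss(0.0, noise_sd) for x in xs]
    return xs, ys

def predict_line(xs, ys, x0):
    # Least-squares straight line, evaluated at x0.
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y + slope * (x0 - mean_x)

def predict_1nn(xs, ys, x0):
    # Return the output of the single closest training point.
    nearest = min(range(len(xs)), key=lambda i: abs(xs[i] - x0))
    return ys[nearest]

def prediction_variance(predict, x0=0.5, n_sets=500, n=30, noise_sd=0.3, seed=1):
    # Variance of the prediction at x0 across independently drawn training sets.
    rng = random.Random(seed)
    preds = []
    for _ in range(n_sets):
        xs, ys = make_data(n, noise_sd, rng)
        preds.append(predict(xs, ys, x0))
    return statistics.pvariance(preds)

var_line = prediction_variance(predict_line)
var_1nn = prediction_variance(predict_1nn)
print("variance of the line's prediction:", round(var_line, 4))
print("variance of the 1-NN prediction:  ", round(var_1nn, 4))
```

The 1-nearest-neighbor prediction copies a single noisy training point, so it changes much more from one training set to the next than the line does.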
Low Variance vs High Variance
• Low Variance: Suggests small changes to the estimate
of the target function with changes to the training
dataset.
• High Variance: Suggests large changes to the estimate
of the target function with changes to the training
dataset.
• Generally, nonparametric machine learning algorithms
that have a lot of flexibility have a high variance.
• For example, decision trees have a high variance, which
is even higher if the trees are not pruned before use.
Examples of Variance
• Examples of low-variance machine learning
algorithms include: Linear Regression, Linear
Discriminant Analysis and Logistic Regression.
• Examples of high-variance machine learning
algorithms include: Decision Trees, k-Nearest
Neighbors and Support Vector Machines.
Bias-Variance Trade-Off
• The goal of any supervised machine learning
algorithm is to achieve low bias and low variance.
• In turn the algorithm should achieve good
prediction performance.
• Parametric or linear machine learning
algorithms often have a high bias but a low
variance.
• Non-parametric or non-linear machine
learning algorithms often have a low bias but a high
variance.
Figure 8.8 The bias-variance trade-off illustrated with test error
and training error. The test error is the top curve, which has
a minimum in the middle of the plot, while the training error
keeps falling as model complexity grows. In order to create the
best forecasts, we should set our model complexity where the
test error is at a minimum.
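The shape of the two curves can be reproduced with a small experiment. The sketch below is not the data behind the figure: it uses a hypothetical sine target and a k-nearest-neighbors regressor, with smaller k standing in for higher model complexity. All names (target, make_data, knn_predict, mse) are assumptions for the example.

```python
import math
import random

def target(x):
    # Hypothetical true function, chosen for illustration.
    return math.sin(2.0 * math.pi * x)

def make_data(n, noise_sd, rng):
    xs = [rng.random() for _ in range(n)]
    ys = [target(x) + rng.gauss(0.0, noise_sd) for x in xs]
    return xs, ys

def knn_predict(train_x, train_y, x0, k):
    # Average the outputs of the k training points closest to x0.
    order = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x0))
    return sum(train_y[i] for i in order[:k]) / k

def mse(train_x, train_y, xs, ys, k):
    # Mean squared error of the k-NN model on the points (xs, ys).
    errors = [(knn_predict(train_x, train_y, x, k) - y) ** 2
              for x, y in zip(xs, ys)]
    return sum(errors) / len(errors)

rng = random.Random(0)
train_x, train_y = make_data(60, 0.3, rng)
test_x, test_y = make_data(500, 0.3, rng)

# Ordered from least flexible (k = 60, the global mean) to most flexible (k = 1).
ks = [60, 30, 15, 8, 4, 2, 1]
train_error = {k: mse(train_x, train_y, train_x, train_y, k) for k in ks}
test_error = {k: mse(train_x, train_y, test_x, test_y, k) for k in ks}
best_k = min(ks, key=lambda k: test_error[k])

for k in ks:
    print(f"k={k:2d}  train MSE={train_error[k]:.3f}  test MSE={test_error[k]:.3f}")
print("k with the lowest test error:", best_k)
```

At k = 1 the training error is zero because every training point is its own nearest neighbor, yet the test error is not. The lowest test error typically falls at an intermediate k, which is the complexity one would pick.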
Handling Bias
• The k-nearest neighbors algorithm has low bias and
high variance, but the trade-off can be changed by
increasing the value of k, which increases the number of
neighbors that contribute to the prediction and in turn
increases the bias of the model.
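• The support vector machine algorithm has low bias and
high variance, but the trade-off can be changed by
adjusting the C parameter, which controls how many
violations of the margin are allowed in the training data.
Allowing more violations increases the bias but decreases
the variance. Where C is a budget for violations, this
means increasing C; where C is a penalty on violations,
it means decreasing C.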
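The effect of k can be checked by estimating squared bias and variance directly. The sketch below uses a hand-written k-nearest-neighbors regressor on a hypothetical sine target; the evaluation point, sample sizes, and helper names (target, make_data, knn_predict, bias_and_variance) are assumptions made for illustration.

```python
import math
import random
import statistics

def target(x):
    # Hypothetical true function, chosen for illustration.
    return math.sin(2.0 * math.pi * x)

def make_data(n, noise_sd, rng):
    xs = [rng.random() for _ in range(n)]
    ys = [target(x) + rng.gauss(0.0, noise_sd) for x in xs]
    return xs, ys

def knn_predict(train_x, train_y, x0, k):
    # Average the outputs of the k training points closest to x0.
    order = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x0))
    return sum(train_y[i] for i in order[:k]) / k

def bias_and_variance(k, x0=0.25, n_sets=300, n=40, noise_sd=0.3, seed=0):
    # Estimate squared bias and variance of the k-NN prediction at x0
    # by retraining on n_sets independently drawn training sets.
    rng = random.Random(seed)
    preds = []
    for _ in range(n_sets):
        train_x, train_y = make_data(n, noise_sd, rng)
        preds.append(knn_predict(train_x, train_y, x0, k))
    mean_pred = sum(preds) / len(preds)
    squared_bias = (mean_pred - target(x0)) ** 2
    variance = statistics.pvariance(preds)
    return squared_bias, variance

results = {k: bias_and_variance(k) for k in (1, 5, 15)}
for k, (squared_bias, variance) in results.items():
    print(f"k={k:2d}  bias^2={squared_bias:.4f}  variance={variance:.4f}")
```

With k = 1 the prediction tracks one noisy point, so variance dominates. With k = 15 the average smooths over the peak of the curve at x = 0.25, so variance drops and squared bias grows.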
Bias vs Variance
The relationship between bias and variance in
machine learning.
• Increasing the bias will decrease the variance.
• Increasing the variance will decrease the bias.