Regularization Techniques in Machine Learning

Regularization: A Tool for Better Machine Learning

Regularization is a crucial technique in machine learning that helps to prevent overfitting. Overfitting occurs when a model
becomes too complex and learns the training data so well that it fails to generalize to new, unseen data. This can lead to poor
performance on real-world applications.

How does regularization work?


By introducing a penalty term to the loss function, regularization discourages models from becoming overly complex. This
penalty term is calculated based on the magnitude of the model's parameters.
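To make the idea concrete, here is a minimal illustrative sketch (not part of the original notebook) of an L2-penalized loss for a linear model; the names ridge_style_loss, w, b and alpha are ours:

import numpy as np

def ridge_style_loss(X, y, w, b, alpha=1.0):
    """Mean squared error plus an L2 penalty on the weights (illustrative sketch)."""
    y_pred = X @ w + b                    # linear-model predictions
    mse = np.mean((y - y_pred) ** 2)      # data-fit term
    penalty = alpha * np.sum(w ** 2)      # grows with the magnitude of the parameters
    return mse + penalty

A larger alpha puts more weight on the penalty, pushing the fit toward smaller coefficients.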

import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline

import warnings
warnings.filterwarnings("ignore")

df_train = pd.read_csv("[Link]")
df_test = pd.read_csv("[Link]")
print("Train Data: \n")
display(df_train.head(2))
print("Test Data: \n")
df_test.head(2)

Train Data:

OverallQual YearBuilt YearRemodAdd TotalBsmtSF 1stFlrSF GrLivArea FullBath TotRmsAbvGrd GarageCars GarageArea SalePrice ExterQu

0 6 1969 1969 663 663 1352 1 7 1 299 158000

1 6 1920 1950 1012 1012 1012 1 6 1 308 118400

Test Data:

OverallQual YearBuilt YearRemodAdd TotalBsmtSF 1stFlrSF GrLivArea FullBath TotRmsAbvGrd GarageCars GarageArea SalePrice ExterQu

0 4 1961 1961 1029 1029 1029 1 5 1 261 118500

1 5 1921 1950 731 820 1343 1 7 1 186 154900

df_test.shape

(329, 14)

Common Regularization Techniques:


L1 Regularization (Lasso): This technique encourages sparsity, meaning many model parameters are driven to zero. This can be
useful for feature selection, as it can help identify the most important features.
L2 Regularization (Ridge): L2 regularization prevents individual parameters from becoming too large, which can help to reduce the
variance of the model.
Elastic Net: This is a combination of L1 and L2 regularization, which can be useful when both feature selection and variance reduction are important (the three penalty terms are sketched below).
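As an illustrative sketch (not part of the original analysis), the three penalty terms can be written directly in NumPy; coef, alpha and l1_ratio are example names mirroring scikit-learn's parameters:

import numpy as np

coef = np.array([30302.6, 6529.6, -1017.9, 0.0])  # example coefficient vector

alpha = 1.0      # overall penalty strength
l1_ratio = 0.5   # L1/L2 mix used by Elastic Net

l1_penalty = alpha * np.sum(np.abs(coef))    # Lasso: encourages exact zeros
l2_penalty = alpha * np.sum(coef ** 2)       # Ridge: shrinks coefficients, rarely to exactly zero
enet_penalty = alpha * (l1_ratio * np.sum(np.abs(coef))
                        + 0.5 * (1 - l1_ratio) * np.sum(coef ** 2))  # Elastic Net mix (scikit-learn convention)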

When to Use Regularization:


Limited Training Data: When you have limited training data, regularization can help prevent overfitting by preventing the model
from memorizing the training set.

High-Dimensional Data: With many features, regularization can help to prevent overfitting by reducing the complexity of the model.

Preventing Overfitting: Regularization is a general technique for preventing overfitting in various machine learning models.

Key Benefits of Regularization:


Improved Generalization: Regularization helps models generalize better to unseen data.
Reduced Overfitting: It prevents models from becoming too complex and memorizing the training data.
Feature Selection: L1 regularization can be used for feature selection.
Enhanced Model Stability: Regularization can make models more stable and less sensitive to small changes in the data.

By understanding regularization and applying it appropriately, you can significantly improve the performance and reliability of your
machine learning models.

from sklearn.linear_model import LinearRegression, Ridge, Lasso, ElasticNet
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import StandardScaler

X_train = df_train[['OverallQual', 'YearBuilt', 'YearRemodAdd', 'TotalBsmtSF', '1stFlrSF',
                    'GrLivArea', 'FullBath', 'TotRmsAbvGrd', 'GarageCars', 'GarageArea',
                    'ExterQual_TA', 'Foundation_PConc', 'KitchenQual_TA']]
y_train = df_train['SalePrice']

X_test = df_test[['OverallQual', 'YearBuilt', 'YearRemodAdd', 'TotalBsmtSF', '1stFlrSF',
                  'GrLivArea', 'FullBath', 'TotRmsAbvGrd', 'GarageCars', 'GarageArea',
                  'ExterQual_TA', 'Foundation_PConc', 'KitchenQual_TA']]
y_test = df_test['SalePrice']

# Scale the features
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

models = {
    'Unregularized': LinearRegression(),
    'L1 (Lasso)': Lasso(alpha=1.0),
    'L2 (Ridge)': Ridge(alpha=1.0),
    'Elastic Net': ElasticNet(alpha=1.0, l1_ratio=0.5)
}

results = {}

for name, model in models.items():
    model.fit(X_train_scaled, y_train)
    y_pred = model.predict(X_test_scaled)
    mse = mean_squared_error(y_test, y_pred)
    r2 = model.score(X_test_scaled, y_test)
    results[name] = {'model': model, 'mse': mse, 'r2': r2}

print("Model Results:")
color_map = plt.cm.tab20
colors = color_map(np.arange(len(results)) % color_map.N)

for i, (name, result) in enumerate(results.items()):
    print(f"\033[1m{name}:\033[0m")
    print(f"  MSE: {result['mse']:.4f}")
    print(f"  R2 Score: {result['r2']:.4f}")
    print(f"  Coefficients:")
Model Results:
Unregularized:
MSE: 1101336094.4360
R2 Score: 0.7980
Coefficients:
L1 (Lasso):
MSE: 1101343139.1702
R2 Score: 0.7980
Coefficients:
L2 (Ridge):
MSE: 1101358746.2137
R2 Score: 0.7980
Coefficients:
Elastic Net:
MSE: 1200548075.3716
R2 Score: 0.7798
Coefficients:

Key Observations:
Similar R2 Scores: All four models exhibit very similar R2 scores (around 0.7980), indicating that they explain a significant portion of
the variance in the target variable.
Slightly Higher MSE for Elastic Net: The Elastic Net model has a slightly higher MSE compared to the other three models. This
suggests that it might be introducing some bias to the model, potentially due to its regularization penalty.
Minimal Impact of Regularization: The differences in MSE between the regularized models (L1, L2, and Elastic Net) and the
unregularized model are relatively small. This could be due to the nature of the data or the chosen regularization parameters.

Conclusions:
Model Choice: While the R2 scores are comparable, the slightly lower MSE of the unregularized model might make it a preferred
choice if overfitting is not a major concern. However, if there's a risk of overfitting, the regularized models (L1, L2, or Elastic Net)
could be considered.
Regularization Impact: In this case, the regularization techniques (L1, L2, and Elastic Net) did not significantly improve the model's
performance. This could be due to various factors, such as the data characteristics or the choice of regularization parameters.
Further Analysis: To gain a deeper understanding of the models' behavior, it would be helpful to examine the feature coefficients
and explore the impact of different regularization parameters.

Additional Considerations:
Data Quality: Ensure that the data is clean and free from outliers or missing values.
Feature Engineering: Experiment with different feature engineering techniques to see if they can improve model performance.
Hyperparameter Tuning: Fine-tune the regularization parameters (alpha and l1_ratio for Elastic Net) to potentially optimize the
models' performance.
Cross-Validation: Use cross-validation to assess the models' generalization performance more reliably (a short sketch follows the note below).

Note: By carefully considering these factors, we can make an informed decision about the best model for our specific problem.
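For the cross-validation point above, here is a minimal sketch (not from the original notebook) using scikit-learn's cross_val_score on the scaled training data defined earlier; the 5-fold split and R2 scoring are assumptions:

from sklearn.model_selection import cross_val_score
from sklearn.linear_model import Ridge

# 5-fold cross-validation of a Ridge model, scored with R2 on each held-out fold
cv_scores = cross_val_score(Ridge(alpha=1.0), X_train_scaled, y_train, cv=5, scoring='r2')
print("Fold R2 scores:", cv_scores)
print("Mean R2:", cv_scores.mean())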

plt.figure(figsize=(24, 10))

x = np.arange(len(X_train.columns))
width = 0.20  # Bar width

cmap = plt.cm.get_cmap('tab20')
colors = cmap(np.arange(len(results)) % cmap.N)

for i, (name, result) in enumerate(results.items()):
    coef = result['model'].coef_.ravel()

    bars = plt.bar(x + i * width, coef, width, label=name, color=colors[i])

    for bar in bars:
        height = bar.get_height()
        plt.text(bar.get_x() + bar.get_width()/2, 0,
                 f"{height:.4f}", ha='center', va='baseline',
                 rotation=90, fontsize=18, color='Black',
                 bbox=dict(facecolor='white', edgecolor='none', alpha=0.7))

plt.axhline(y=0, color='k', linestyle='-', linewidth=1)

plt.xlabel('Features', fontsize=20, color='Blue', fontweight='bold')
plt.ylabel('Coefficient Value', fontsize=20, color='Blue', fontweight='bold')
plt.title('Feature Importance Comparison', fontsize=24, color='Blue', fontweight='bold', pad=20)
plt.xticks(x + width, X_train.columns, fontsize=16, color='Blue', rotation=45, ha='right')
plt.legend(fontsize=16)
plt.tight_layout()
plt.show()
# Plot predictions vs actual for each model
fig, axes = plt.subplots(1, 4, figsize=(24, 8))
fig.suptitle('Predictions vs Actual', fontsize=26, color='#8B4513', fontweight='bold')

for ax, (name, result) in zip(axes, results.items()):
    model = result['model']
    y_pred = model.predict(X_test_scaled)
    ax.scatter(y_test, y_pred, alpha=0.5)
    ax.plot([y_test.min(), y_test.max()], [y_test.min(), y_test.max()], 'r--', lw=2)
    ax.set_xlabel('Actual', fontsize=16, color='Blue', fontweight='bold')
    ax.set_ylabel('Predicted', fontsize=16, color='Blue', fontweight='bold')
    ax.set_title(f'{name}\nMSE: {result["mse"]:.4f}, R2: {result["r2"]:.4f}', fontsize=18, color='Blue', fontweight='bold')
    ax.tick_params(axis='both', which='major', labelsize=12)

plt.tight_layout()
plt.show()

Predictions vs Actual Graph


The graph compares the predicted values from four regression models (Unregularized, L1 (Lasso), L2 (Ridge), and Elastic Net)
against the actual values. Each subplot represents a different model. The x-axis shows the actual values, while the y-axis
shows the predicted values.

Key Observations:

Strong Positive Correlation: All four models exhibit a strong positive correlation between predicted and actual values. This
indicates that the models are able to capture the underlying relationship in the data to a reasonable extent.
Similar Scatter Patterns: The scatter plots for all four models look quite similar, suggesting that the different regularization
techniques did not significantly alter the overall prediction patterns.
Diagonal Line: The dashed diagonal line represents perfect prediction, where the predicted values would exactly match the actual
values. The closer the points are to this line, the better the model's predictions.
MSE and R2 Scores: The provided metrics (MSE and R2) support the visual observations (both are computed by hand in the sketch below):

MSE (Mean Squared Error): Measures the average squared difference between predicted and actual values. Lower MSE indicates better prediction accuracy.
R2 Score: Measures the proportion of variance in the target variable explained by the model. Higher R2 indicates a better fit. The relatively high R2 scores (around 0.7980) for all models confirm their good predictive power.
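As a quick, hedged check of these definitions (not part of the original notebook), both metrics can be computed directly from a vector of predictions:

import numpy as np

def mse_and_r2(y_true, y_pred):
    """Compute MSE and the R2 score by hand (illustrative sketch)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred
    mse = np.mean(residuals ** 2)                   # average squared error
    ss_res = np.sum(residuals ** 2)                 # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    r2 = 1 - ss_res / ss_tot                        # proportion of variance explained
    return mse, r2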

Reasons for Similar Performance:


Data Characteristics: The underlying data might have a relatively linear relationship between the features and the target variable,
making it easier for all models to capture the pattern.
Regularization Strength: The chosen regularization parameters (alpha for L1, L2, and Elastic Net) might not be strong enough to
significantly differentiate the models' performance.
Model Complexity: The models might be relatively simple, and the regularization techniques might not be adding much complexity
or constraint.

To gain deeper insights, consider the following:

Feature Importance: Analyze the feature coefficients to understand which features are most influential in the models' predictions.
Hyperparameter Tuning: Experiment with different regularization parameters to see if they improve performance.
Cross-Validation: Use cross-validation to assess the models' generalization performance more reliably.
Model Complexity: Try more complex models (e.g., non-linear models, ensemble methods) if the data is highly nonlinear.

By carefully analyzing these factors, we can gain a better understanding of the models' strengths and weaknesses and make informed
decisions for our specific problem.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression, Ridge, Lasso, ElasticNet
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import GridSearchCV

# Assume df_train and df_test are your training and test DataFrames
X_train = df_train[['OverallQual', 'YearBuilt', 'YearRemodAdd', 'TotalBsmtSF', '1stFlrSF',
'GrLivArea', 'FullBath', 'TotRmsAbvGrd', 'GarageCars', 'GarageArea',
'ExterQual_TA', 'Foundation_PConc', 'KitchenQual_TA']]
y_train = df_train['SalePrice']
X_test = df_test[['OverallQual', 'YearBuilt', 'YearRemodAdd', 'TotalBsmtSF', '1stFlrSF',
'GrLivArea', 'FullBath', 'TotRmsAbvGrd', 'GarageCars', 'GarageArea',
'ExterQual_TA', 'Foundation_PConc', 'KitchenQual_TA']]
y_test = df_test['SalePrice']

# Scale the features
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Define models and their parameter grids
models = {
    'Unregularized': (LinearRegression(), {}),
    'L1 (Lasso)': (Lasso(), {'alpha': [0.1, 0.5, 1.0, 2.0, 5.0]}),
    'L2 (Ridge)': (Ridge(), {'alpha': [0.1, 0.5, 1.0, 2.0, 5.0]}),
    'Elastic Net': (ElasticNet(), {'alpha': [0.1, 0.5, 1.0, 2.0, 5.0], 'l1_ratio': [0.1, 0.3, 0.5, 0.7, 0.9]})
}

results = {}

for name, (model, param_grid) in models.items():
    if param_grid:  # If there are hyperparameters to tune
        grid_search = GridSearchCV(model, param_grid, cv=5, scoring='r2', n_jobs=-1)
        grid_search.fit(X_train_scaled, y_train)
        best_model = grid_search.best_estimator_
        best_params = grid_search.best_params_
    else:
        best_model = model
        best_model.fit(X_train_scaled, y_train)
        best_params = {}

    y_pred = best_model.predict(X_test_scaled)
    mse = mean_squared_error(y_test, y_pred)
    r2 = r2_score(y_test, y_pred)
    results[name] = {'model': best_model, 'mse': mse, 'r2': r2, 'best_params': best_params}

# Print results
print("Model Results:")
color_map = plt.cm.tab20
colors = color_map(np.arange(len(results)) % color_map.N)

for i, (name, result) in enumerate(results.items()):
    print(f"\033[1m{name}:\033[0m")
    print(f"  MSE: {result['mse']:.4f}")
    print(f"  R2 Score: {result['r2']:.4f}")
    if result['best_params']:
        print(f"  Best Parameters: {result['best_params']}")
    print("  Coefficients:")
    for feature, coef in zip(X_train.columns, result['model'].coef_):
        print(f"    {feature}: {coef:.4f}")
    print()
Model Results:
Unregularized:
MSE: 1101336094.4360
R2 Score: 0.7980
Coefficients:
OverallQual: 30302.6013
YearBuilt: 6529.5658
YearRemodAdd: 3560.5814
TotalBsmtSF: 5199.6669
1stFlrSF: 7978.9815
GrLivArea: 22277.7807
FullBath: -1017.9209
TotRmsAbvGrd: 4471.0704
GarageCars: 12747.2331
GarageArea: -160.4701
ExterQual_TA: -980.7238
Foundation_PConc: -782.6592
KitchenQual_TA: -3238.3793

L1 (Lasso):
MSE: 1101367531.2049
R2 Score: 0.7980
Best Parameters: {'alpha': 5.0}
Coefficients:
OverallQual: 30304.6742
YearBuilt: 6512.1724
YearRemodAdd: 3553.5329
TotalBsmtSF: 5201.1843
1stFlrSF: 7974.0482
GrLivArea: 22263.8330
FullBath: -994.3052
TotRmsAbvGrd: 4465.6362
GarageCars: 12714.1825
GarageArea: -120.0299
ExterQual_TA: -972.6826
Foundation_PConc: -758.6943
KitchenQual_TA: -3229.4819

L2 (Ridge):
MSE: 1101561963.3566
R2 Score: 0.7980
Best Parameters: {'alpha': 5.0}
Coefficients:
OverallQual: 29905.8406
YearBuilt: 6413.3202
YearRemodAdd: 3588.5604
TotalBsmtSF: 5339.9915
1stFlrSF: 7984.5463
GrLivArea: 21810.0169
FullBath: -807.5970
TotRmsAbvGrd: 4778.5632
GarageCars: 12477.6537
GarageArea: 133.2542
ExterQual_TA: -1160.4003
Foundation_PConc: -673.2106
KitchenQual_TA: -3239.9181

Elastic Net:
MSE: 1216604065.9478
R2 Score: 0.7769
Best Parameters: {'alpha': 2.0, 'l1_ratio': 0.7}
Coefficients:
OverallQual: 15844.9424
YearBuilt: 4453.7693
YearRemodAdd: 3994.2282
TotalBsmtSF: 7297.1307
1stFlrSF: 8140.8526
GrLivArea: 12637.7487
FullBath: 4465.5254
TotRmsAbvGrd: 8336.4735
GarageCars: 8000.2867
GarageArea: 5316.2396
ExterQual_TA: -4995.2670
Foundation_PConc: 2754.8928
KitchenQual_TA: -4081.0423

# Plot feature importance
plt.figure(figsize=(24, 10))
x = np.arange(len(X_train.columns))
width = 0.2

for i, (name, result) in enumerate(results.items()):
    coef = result['model'].coef_
    bars = plt.bar(x + i * width, coef, width, label=name, color=colors[i])

    for bar in bars:
        height = bar.get_height()
        plt.text(bar.get_x() + bar.get_width()/2, 0,
                 f"{height:.4f}", ha='center', va='bottom',
                 rotation=90, fontsize=10, color='black',
                 bbox=dict(facecolor='white', edgecolor='none', alpha=0.7))

plt.axhline(y=0, color='k', linestyle='-', linewidth=0.5)

plt.xlabel('Features', fontsize=20, color='Blue', fontweight='bold')
plt.ylabel('Coefficient Value', fontsize=20, color='Blue', fontweight='bold')
plt.title('Feature Importance Comparison', fontsize=24, color='Blue', fontweight='bold', pad=20)
plt.xticks(x + width * 1.5, X_train.columns, fontsize=14, color='Blue', rotation=45, ha='right')
plt.legend(fontsize=12)
plt.tight_layout()
plt.show()

# Plot predictions vs actual
fig, axes = plt.subplots(2, 2, figsize=(24, 20))
fig.suptitle('Predictions vs Actual', fontsize=24, color='Blue', fontweight='bold')

for ax, (name, result) in zip(axes.flatten(), results.items()):
    model = result['model']
    y_pred = model.predict(X_test_scaled)
    ax.scatter(y_test, y_pred, alpha=0.5)
    ax.plot([y_test.min(), y_test.max()], [y_test.min(), y_test.max()], 'r--', lw=2)
    ax.set_xlabel('Actual', fontsize=16, color='Blue', fontweight='bold')
    ax.set_ylabel('Predicted', fontsize=16, color='Blue', fontweight='bold')
    ax.set_title(f'{name}\nMSE: {result["mse"]:.4f}, R2: {result["r2"]:.4f}', fontsize=18, color='Blue', fontweight='bold')
    ax.tick_params(axis='both', which='major', labelsize=12)

plt.tight_layout()
plt.show()
Regularization

Sometimes when we train a model it starts to overfit. A way to avoid overfitting, especially for models (like linear regressions) that are heavily affected by outliers, is to use regularization. This leads to a more general model that is technically less accurate on the training data but generalizes to new data better.

Ridge (L2) Regression: used to reduce overfitting.

Cost function = Σ(h(x) − y)² + λ(slope)²

Training data: low bias. Testing data: low or high variance.

If the new data (test data) is near the best-fit line, the performance will be good (low variance). When the test data is far away from the best-fit line, the performance will be bad (high variance). Aim: to reduce overfitting.

Without regularization we would create multiple candidate best-fit lines to improve performance on the test data; the penalty gives a principled way to pick a flatter one.

Cost function: Σᵢ (h(xᵢ) − yᵢ)² + λ(slope)²

If there are multiple features, e.g. h(x) = θ0 + θ1x1 + θ2x2 + …, the penalty becomes λ(slope₁² + slope₂² + …), one term per feature. λ is a hyperparameter. The cost function is the same as in linear regression, plus this penalty.


Relationship between slope and cost function: as λ increases, the global minimum of the cost function shifts toward zero (to the left), i.e. toward a smaller slope.

Cost function = Σ(h(x) − y)² + λ(slope)², with λ ≥ 0.

Changing the λ value creates another best-fit line. Slope and λ are inversely related: as λ grows, the slope shrinks. λ makes sure that our line doesn't overfit.

The larger λ is, the greater the amount of shrinkage: the coefficients are shrunk towards zero. λ is the parameter that controls the amount of shrinkage, i.e. the model complexity. With Ridge, a coefficient's value never becomes exactly zero.

Example: h(x) = θ0 + θ1x1 + θ2x2 + θ3x3 shrinks to something like θ0 + 0.95x1 + 0.82x2 + 0.10x3; the coefficients get smaller, but no feature is deleted.

Ridge Regression is used to introduce a small amount of bias into the model in order to generalize to the data better (trading bias for lower variance). This is useful if you don't have much training data.

L1 Regularization (Lasso, L1 norm): it is used to reduce the number of features and helps in feature selection.

Cost function = Σ(h(x) − y)² + λ(|slope₁| + |slope₂| + …)

Example: h(x) = θ0 + θ1x1 + θ2x2 + θ3x3 becomes θ0 + 0.54x1 + 0.23x2 + 0·x3; the coefficient of the least correlated feature is driven to zero, so that feature is dropped. If the data has outliers, use Ridge Regression.

Lasso = Least Absolute Shrinkage and Selection Operator Regression. Lasso Regression tends to eliminate the weights of the least important features by setting their weights to zero.

Elastic Net: a combination of L1 and L2 regularization.

Cost function = Σ(h(x) − y)² + λ₁(slope)² + λ₂|slope|

The error term can be changed to MAE, RMSE, or MSE.

Notes taken from Josh Starmer's StatQuest YouTube videos.
Regularization: Ridge (L2) Regression

Least squares finds the line that results in the minimum sum of squared residuals. We end up with the equation of the line, e.g. Size = intercept + 0.75 × Weight. When we have a lot of measurements, we can be fairly confident that the least squares line accurately reflects the relationship between size and weight.
But what if we only have two measurements? We fit a new line; since the new line overlaps the two data points, the minimum sum of squared residuals is 0. New line: Size = 0.4 + 1.3 × Weight. The sum of the squared residuals for the testing data is large, which means the new line has high variance. In machine learning terms, the new line (blue) is overfit to the training data. The main idea behind Ridge Regression is to find a new line that doesn't fit the training data as well.

In other words, we introduce a small amount of bias into how the new line fits the data, but in return for that small amount of bias we get a significant drop in variance. Ridge Regression can therefore provide better long-term predictions.

When least squares determines values for the parameters in the equation

Size = y-axis intercept + slope × Weight

it minimizes the sum of the squared residuals. In contrast, when Ridge Regression determines values for the parameters in this equation, it minimizes

the sum of the squared residuals + λ × slope²

This extra term adds a penalty to the traditional least squares method, and lambda (λ) determines how severe that penalty is.

The sum of squared residuals for the least squares fit is 0 (because the line overlaps the data points) and its slope is 1.3, so its total is 0 + λ × (1.3)² = 1 × 1.69 = 1.69 (with λ = 1). For the blue Ridge Regression line, the residuals are small (around 0.3 and 0.1) and the slope is smaller, so altogether its total comes out below 1.69. Thus, if we want to minimize the sum of the squared residuals plus the Ridge Regression penalty, we would choose the Ridge Regression line over the least squares line. Without the small amount of bias that the penalty creates, the least squares fit has a large amount of variance. In contrast, the Ridge Regression line, which has a small amount of bias due to the penalty, has less variance.

If the line suggests that for every one-unit increase in weight there is a one-unit increase in predicted size, the slope is 1. If the slope of the line is steeper, then for every one-unit increase in weight the predicted size increases by more than two units. In other words, when the slope of the line is steep, the prediction for size is very sensitive to relatively small changes in weight.

When the slope is small, then for every one-unit increase in weight the prediction for size barely increases. In other words, when the slope of the line is small, predictions for size are much less sensitive to changes in weight.

Least squares line vs Ridge Regression line: the Ridge Regression penalty results in a line with a smaller slope, which means that predictions made with the Ridge Regression line are less sensitive to weight than the least squares line.

Ridge Regression (RR) minimizes: the sum of the squared residuals + λ × slope².

λ can be any value from 0 to positive infinity. When λ = 0, the Ridge penalty is 0 and the Ridge Regression line is the same as the least squares line. For λ > 0, the Ridge Regression line ends up with a smaller slope than the least squares line, and the larger we make λ, the closer the slope gets (asymptotically) to 0. So the larger λ gets, the less sensitive our predictions for size become to weight.

So how do we decide what value to give λ? We just try a bunch of values for λ and use cross-validation, typically 10-fold cross-validation, to determine which one results in the lowest variance. Up to now, RR was applied with a continuous predictor variable.
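A minimal scikit-learn sketch of this idea (not part of the original notes), reusing the scaled training data from the notebook above; scikit-learn calls the penalty strength alpha rather than λ, and the alpha grid here is an assumption:

import numpy as np
from sklearn.linear_model import RidgeCV

# Try a range of penalty strengths and let 10-fold cross-validation pick one
alphas = np.logspace(-3, 3, 13)
ridge_cv = RidgeCV(alphas=alphas, cv=10)
ridge_cv.fit(X_train_scaled, y_train)
print("Chosen alpha:", ridge_cv.alpha_)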

However, RR also works when we use a discrete variable, such as normal diet vs. high-fat diet, to predict size:

Size = 1.5 + 0.7 × High Fat Diet

The y-intercept (1.5) corresponds to the average size of the mice on the normal diet, and the sum 1.5 + 0.7 is the prediction for the size of the mice on the high-fat diet; 0.7 is the diet difference. The distances between the data and these two means are what get minimized. When RR determines values for the parameters in this equation, it minimizes:

the sum of the squared residuals + λ × diet difference²
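To illustrate this with a tiny made-up dataset (not from the notes): a 0/1 diet indicator plays the role of the discrete variable, the fitted coefficient is the diet difference, and increasing the penalty shrinks it toward 0.

import numpy as np
from sklearn.linear_model import Ridge

# Made-up example: 0 = normal diet, 1 = high-fat diet
diet = np.array([[0], [0], [0], [1], [1], [1]])
size = np.array([1.4, 1.5, 1.6, 2.1, 2.2, 2.3])

for alpha in [0.0, 1.0, 10.0]:
    model = Ridge(alpha=alpha).fit(diet, size)
    # coef_[0] is the "diet difference"; it shrinks toward 0 as alpha grows
    print(f"alpha={alpha}: diet difference = {model.coef_[0]:.3f}, intercept = {model.intercept_:.3f}")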

When λ = 0, the least squares and RR fits are the same. As λ gets larger, the only way to minimize the whole expression is to shrink the diet difference down. In other words, as λ gets larger, our prediction for the size of the mice on the high-fat diet becomes less sensitive to the difference between the normal diet and the high-fat diet.

The whole point of doing RR is that small sample sizes like these can lead to poor least squares estimates that result in terrible machine learning predictions. Ridge Regression can also be applied to Logistic Regression.

For Logistic Regression the objective becomes: the sum of the likelihoods + λ × slope². Note: when applied to Logistic Regression, Ridge Regression optimizes the sum of the likelihoods instead of the squared residuals, because Logistic Regression is solved using maximum likelihood.

Ridge Regression helps reduce variance by shrinking parameters and making our predictions less sensitive to them.
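In scikit-learn terms (an illustrative sketch, not from the notes), LogisticRegression applies an L2 penalty by default, with C acting as 1/λ, so a smaller C means a stronger penalty; the data here is made up:

import numpy as np
from sklearn.linear_model import LogisticRegression

# Made-up binary example: predict a 0/1 label from weight
weight = np.array([[1.0], [1.5], [2.0], [2.5], [3.0], [3.5]])
label = np.array([0, 0, 0, 1, 1, 1])

for C in [10.0, 1.0, 0.1]:  # C = 1/lambda: smaller C -> stronger L2 penalty
    clf = LogisticRegression(penalty='l2', C=C).fit(weight, label)
    print(f"C={C}: slope = {clf.coef_[0][0]:.3f}")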

In general, the RR penalty contains all of the parameters except for the y-intercept, e.g.

the sum of the squared residuals + λ × (weight slope² + diet difference²)

Least squares can't find a single optimal solution when there are more parameters than data points, since any line that goes through the data will minimize the sum of the squared residuals. But RR can find a solution, using cross-validation and the RR penalty that favors smaller parameter values.

Again, the objective is: the sum of the squared residuals + λ × slope².

Summary: when the sample sizes are relatively small, RR can improve predictions made from new data (i.e., reduce variance) by making the predictions less sensitive to the training data. The RR penalty itself is λ times the sum of all the squared parameters, except for the y-intercept, and λ is determined using cross-validation.
Lasso Regression (L1):

Ridge Regression penalty: λ × slope². Lasso Regression instead minimizes: the sum of all the squared residuals + λ × |slope|.

The Lasso Regression penalty contains all of the estimated parameters except for the y-intercept.

Ridge and Lasso Regression both shrink parameters, but they don't have to shrink them all equally. The big difference between Ridge and Lasso Regression is that Ridge Regression can only shrink the slope asymptotically close to 0, while Lasso Regression can shrink the slope all the way to 0. Lasso Regression can therefore exclude useless variables from the equation, and it is better than RR at reducing the variance in models that contain a lot of useless variables.
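A hedged sketch of this difference on made-up data with deliberately useless columns (not from the notes): Lasso drives some coefficients exactly to zero, while Ridge only shrinks them.

import numpy as np
from sklearn.linear_model import Ridge, Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
# Only the first two columns matter; the other three are useless variables
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=100)

ridge = Ridge(alpha=1.0).fit(X, y)
lasso = Lasso(alpha=0.1).fit(X, y)
print("Ridge coefficients:", np.round(ridge.coef_, 3))  # small but typically non-zero everywhere
print("Lasso coefficients:", np.round(lasso.coef_, 3))  # useless columns typically driven to exactly 0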
Elastic Net Regression:

Elastic-Net Regression starts with least squares, then combines the Lasso Regression penalty with the Ridge Regression penalty:

the sum of the squared residuals + λ₁ × (|variable₁| + … + |variableₙ|) + λ₂ × (variable₁² + … + variableₙ²)

Note: the Lasso and Ridge penalties get their own λs.

The hybrid Elastic-Net Regression is especially good at dealing with situations where there are correlations between parameters. This is because, on its own, Lasso Regression tends to pick just one of the correlated terms and eliminate the others, whereas RR tends to shrink all of the parameters of the correlated variables together. By combining the two penalties, Elastic-Net Regression groups and shrinks the parameters associated with the correlated variables and either leaves them all in the equation or removes them all at once.
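As a final hedged sketch on made-up data (not from the notes): with two nearly identical predictors, Lasso tends to keep one and drop the other, while Elastic Net tends to keep both with shared, shrunken weights.

import numpy as np
from sklearn.linear_model import Lasso, ElasticNet

rng = np.random.default_rng(1)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.01, size=200)  # almost a copy of x1 (highly correlated)
X = np.column_stack([x1, x2])
y = 2.0 * x1 + 2.0 * x2 + rng.normal(scale=0.1, size=200)

lasso = Lasso(alpha=0.5).fit(X, y)
enet = ElasticNet(alpha=0.5, l1_ratio=0.5).fit(X, y)
print("Lasso coefficients:      ", np.round(lasso.coef_, 2))  # often concentrates on one of the pair
print("Elastic Net coefficients:", np.round(enet.coef_, 2))   # tends to spread the weight across both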
