Regularization Techniques in Machine Learning

Regularization: A Tool for Better Machine Learning

Regularization is a crucial technique in machine learning that helps to prevent overfitting. Overfitting occurs when a model
becomes too complex and learns the training data so well that it fails to generalize to new, unseen data. This can lead to poor
performance on real-world applications.

How does regularization work?


By introducing a penalty term to the loss function, regularization discourages models from becoming overly complex. This
penalty term is calculated based on the magnitude of the model's parameters.
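To make the idea concrete, here is a minimal illustrative sketch (not part of the original notebook) of an L2-penalized loss for a linear model; the names ridge_style_loss, w, b and alpha are ours:

import numpy as np

def ridge_style_loss(X, y, w, b, alpha=1.0):
    """Mean squared error plus an L2 penalty on the weights (illustrative sketch)."""
    y_pred = X @ w + b                    # linear-model predictions
    mse = np.mean((y - y_pred) ** 2)      # data-fit term
    penalty = alpha * np.sum(w ** 2)      # grows with the magnitude of the parameters
    return mse + penalty

A larger alpha puts more weight on the penalty, pushing the fit toward smaller coefficients.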

import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline

import warnings
warnings.filterwarnings("ignore")

df_train = pd.read_csv("[Link]")
df_test = pd.read_csv("[Link]")
print("Train Data: \n")
display(df_train.head(2))
print("Test Data: \n")
df_test.head(2)

Train Data:

OverallQual YearBuilt YearRemodAdd TotalBsmtSF 1stFlrSF GrLivArea FullBath TotRmsAbvGrd GarageCars GarageArea SalePrice ExterQu

0 6 1969 1969 663 663 1352 1 7 1 299 158000

1 6 1920 1950 1012 1012 1012 1 6 1 308 118400

Test Data:

OverallQual YearBuilt YearRemodAdd TotalBsmtSF 1stFlrSF GrLivArea FullBath TotRmsAbvGrd GarageCars GarageArea SalePrice ExterQu

0 4 1961 1961 1029 1029 1029 1 5 1 261 118500

1 5 1921 1950 731 820 1343 1 7 1 186 154900

df_test.shape

(329, 14)

Common Regularization Techniques:


L1 Regularization (Lasso): This technique encourages sparsity, meaning many model parameters are driven to zero. This can be
useful for feature selection, as it can help identify the most important features.
L2 Regularization (Ridge): L2 regularization prevents individual parameters from becoming too large, which can help to reduce the
variance of the model.
Elastic Net: This is a combination of L1 and L2 regularization, which can be useful when both feature selection and variance reduction are important (the three penalty terms are sketched below).
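As an illustrative sketch (not part of the original analysis), the three penalty terms can be written directly in NumPy; coef, alpha and l1_ratio are example names mirroring scikit-learn's parameters:

import numpy as np

coef = np.array([30302.6, 6529.6, -1017.9, 0.0])  # example coefficient vector

alpha = 1.0      # overall penalty strength
l1_ratio = 0.5   # L1/L2 mix used by Elastic Net

l1_penalty = alpha * np.sum(np.abs(coef))    # Lasso: encourages exact zeros
l2_penalty = alpha * np.sum(coef ** 2)       # Ridge: shrinks coefficients, rarely to exactly zero
enet_penalty = alpha * (l1_ratio * np.sum(np.abs(coef))
                        + 0.5 * (1 - l1_ratio) * np.sum(coef ** 2))  # Elastic Net mix (scikit-learn convention)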

When to Use Regularization:


Limited Training Data: When you have limited training data, regularization can help prevent overfitting by preventing the model
from memorizing the training set.

High-Dimensional Data: With many features, regularization can help to prevent overfitting by reducing the complexity of the model.

Preventing Overfitting: Regularization is a general technique for preventing overfitting in various machine learning models.

Key Benefits of Regularization:


Improved Generalization: Regularization helps models generalize better to unseen data.
Reduced Overfitting: It prevents models from becoming too complex and memorizing the training data.
Feature Selection: L1 regularization can be used for feature selection.
Enhanced Model Stability: Regularization can make models more stable and less sensitive to small changes in the data.

By understanding regularization and applying it appropriately, you can significantly improve the performance and reliability of your
machine learning models.

from sklearn.linear_model import LinearRegression, Ridge, Lasso, ElasticNet
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import StandardScaler

X_train = df_train[['OverallQual', 'YearBuilt', 'YearRemodAdd', 'TotalBsmtSF', '1stFlrSF',
                    'GrLivArea', 'FullBath', 'TotRmsAbvGrd', 'GarageCars', 'GarageArea',
                    'ExterQual_TA', 'Foundation_PConc', 'KitchenQual_TA']]
y_train = df_train['SalePrice']

X_test = df_test[['OverallQual', 'YearBuilt', 'YearRemodAdd', 'TotalBsmtSF', '1stFlrSF',
                  'GrLivArea', 'FullBath', 'TotRmsAbvGrd', 'GarageCars', 'GarageArea',
                  'ExterQual_TA', 'Foundation_PConc', 'KitchenQual_TA']]
y_test = df_test['SalePrice']

# Scale the features
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

models = {
    'Unregularized': LinearRegression(),
    'L1 (Lasso)': Lasso(alpha=1.0),
    'L2 (Ridge)': Ridge(alpha=1.0),
    'Elastic Net': ElasticNet(alpha=1.0, l1_ratio=0.5)
}

results = {}

for name, model in models.items():
    model.fit(X_train_scaled, y_train)
    y_pred = model.predict(X_test_scaled)
    mse = mean_squared_error(y_test, y_pred)
    r2 = model.score(X_test_scaled, y_test)
    results[name] = {'model': model, 'mse': mse, 'r2': r2}

print("Model Results:")
color_map = plt.cm.tab20
colors = color_map(np.arange(len(results)) % color_map.N)

for i, (name, result) in enumerate(results.items()):
    print(f"\033[1m{name}:\033[0m")
    print(f"  MSE: {result['mse']:.4f}")
    print(f"  R2 Score: {result['r2']:.4f}")
    print(f"  Coefficients:")
Model Results:
Unregularized:
MSE: 1101336094.4360
R2 Score: 0.7980
Coefficients:
L1 (Lasso):
MSE: 1101343139.1702
R2 Score: 0.7980
Coefficients:
L2 (Ridge):
MSE: 1101358746.2137
R2 Score: 0.7980
Coefficients:
Elastic Net:
MSE: 1200548075.3716
R2 Score: 0.7798
Coefficients:

Key Observations:
Similar R2 Scores: All four models exhibit very similar R2 scores (around 0.7980), indicating that they explain a significant portion of
the variance in the target variable.
Slightly Higher MSE for Elastic Net: The Elastic Net model has a slightly higher MSE compared to the other three models. This
suggests that it might be introducing some bias to the model, potentially due to its regularization penalty.
Minimal Impact of Regularization: The differences in MSE between the regularized models (L1, L2, and Elastic Net) and the
unregularized model are relatively small. This could be due to the nature of the data or the chosen regularization parameters.

Conclusions:
Model Choice: While the R2 scores are comparable, the slightly lower MSE of the unregularized model might make it a preferred
choice if overfitting is not a major concern. However, if there's a risk of overfitting, the regularized models (L1, L2, or Elastic Net)
could be considered.
Regularization Impact: In this case, the regularization techniques (L1, L2, and Elastic Net) did not significantly improve the model's
performance. This could be due to various factors, such as the data characteristics or the choice of regularization parameters.
Further Analysis: To gain a deeper understanding of the models' behavior, it would be helpful to examine the feature coefficients
and explore the impact of different regularization parameters.

Additional Considerations:
Data Quality: Ensure that the data is clean and free from outliers or missing values.
Feature Engineering: Experiment with different feature engineering techniques to see if they can improve model performance.
Hyperparameter Tuning: Fine-tune the regularization parameters (alpha and l1_ratio for Elastic Net) to potentially optimize the
models' performance.
Cross-Validation: Use cross-validation to assess the models' generalization performance more reliably (a short sketch follows the note below).

Note: By carefully considering these factors, we can make an informed decision about the best model for our specific problem.
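For the cross-validation point above, here is a minimal sketch (not from the original notebook) using scikit-learn's cross_val_score on the scaled training data defined earlier; the 5-fold split and R2 scoring are assumptions:

from sklearn.model_selection import cross_val_score
from sklearn.linear_model import Ridge

# 5-fold cross-validation of a Ridge model, scored with R2 on each held-out fold
cv_scores = cross_val_score(Ridge(alpha=1.0), X_train_scaled, y_train, cv=5, scoring='r2')
print("Fold R2 scores:", cv_scores)
print("Mean R2:", cv_scores.mean())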

plt.figure(figsize=(24, 10))

x = np.arange(len(X_train.columns))
width = 0.20  # Bar width

cmap = plt.cm.get_cmap('tab20')
colors = cmap(np.arange(len(results)) % cmap.N)

for i, (name, result) in enumerate(results.items()):
    coef = result['model'].coef_.ravel()

    bars = plt.bar(x + i * width, coef, width, label=name, color=colors[i])

    for bar in bars:
        height = bar.get_height()
        plt.text(bar.get_x() + bar.get_width()/2, 0,
                 f"{height:.4f}", ha='center', va='baseline',
                 rotation=90, fontsize=18, color='Black',
                 bbox=dict(facecolor='white', edgecolor='none', alpha=0.7))

plt.axhline(y=0, color='k', linestyle='-', linewidth=1)

plt.xlabel('Features', fontsize=20, color='Blue', fontweight='bold')
plt.ylabel('Coefficient Value', fontsize=20, color='Blue', fontweight='bold')
plt.title('Feature Importance Comparison', fontsize=24, color='Blue', fontweight='bold', pad=20)
plt.xticks(x + width, X_train.columns, fontsize=16, color='Blue', rotation=45, ha='right')
plt.legend(fontsize=16)
plt.tight_layout()
plt.show()
# Plot predictions vs actual for each model
fig, axes = plt.subplots(1, 4, figsize=(24, 8))
fig.suptitle('Predictions vs Actual', fontsize=26, color='#8B4513', fontweight='bold')

for ax, (name, result) in zip(axes, results.items()):
    model = result['model']
    y_pred = model.predict(X_test_scaled)
    ax.scatter(y_test, y_pred, alpha=0.5)
    ax.plot([y_test.min(), y_test.max()], [y_test.min(), y_test.max()], 'r--', lw=2)
    ax.set_xlabel('Actual', fontsize=16, color='Blue', fontweight='bold')
    ax.set_ylabel('Predicted', fontsize=16, color='Blue', fontweight='bold')
    ax.set_title(f'{name}\nMSE: {result["mse"]:.4f}, R2: {result["r2"]:.4f}', fontsize=18, color='Blue', fontweight='bold')
    ax.tick_params(axis='both', which='major', labelsize=12)

plt.tight_layout()
plt.show()

Predictions vs Actual Graph


The graph compares the predicted values from four regression models (Unregularized, L1 (Lasso), L2 (Ridge), and Elastic Net)
against the actual values. Each subplot represents a different model. The x-axis shows the actual values, while the y-axis
shows the predicted values.

Key Observations:

Strong Positive Correlation: All four models exhibit a strong positive correlation between predicted and actual values. This
indicates that the models are able to capture the underlying relationship in the data to a reasonable extent.
Similar Scatter Patterns: The scatter plots for all four models look quite similar, suggesting that the different regularization
techniques did not significantly alter the overall prediction patterns.
Diagonal Line: The dashed diagonal line represents perfect prediction, where the predicted values would exactly match the actual
values. The closer the points are to this line, the better the model's predictions.
MSE and R2 Scores: The provided metrics (MSE and R2) support the visual observations (both are computed by hand in the sketch below):

MSE (Mean Squared Error): Measures the average squared difference between predicted and actual values. Lower MSE indicates better prediction accuracy.
R2 Score: Measures the proportion of variance in the target variable explained by the model. Higher R2 indicates a better fit. The relatively high R2 scores (around 0.7980) for all models confirm their good predictive power.
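As a quick, hedged check of these definitions (not part of the original notebook), both metrics can be computed directly from a vector of predictions:

import numpy as np

def mse_and_r2(y_true, y_pred):
    """Compute MSE and the R2 score by hand (illustrative sketch)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred
    mse = np.mean(residuals ** 2)                   # average squared error
    ss_res = np.sum(residuals ** 2)                 # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    r2 = 1 - ss_res / ss_tot                        # proportion of variance explained
    return mse, r2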

Reasons for Similar Performance:


Data Characteristics: The underlying data might have a relatively linear relationship between the features and the target variable,
making it easier for all models to capture the pattern.
Regularization Strength: The chosen regularization parameters (alpha for L1, L2, and Elastic Net) might not be strong enough to
significantly differentiate the models' performance.
Model Complexity: The models might be relatively simple, and the regularization techniques might not be adding much complexity
or constraint.

To gain deeper insights, consider the following:

Feature Importance: Analyze the feature coefficients to understand which features are most influential in the models' predictions.
Hyperparameter Tuning: Experiment with different regularization parameters to see if they improve performance.
Cross-Validation: Use cross-validation to assess the models' generalization performance more reliably.
Model Complexity: Try more complex models (e.g., non-linear models, ensemble methods) if the data is highly nonlinear.

By carefully analyzing these factors, we can gain a better understanding of the models' strengths and weaknesses and make informed
decisions for our specific problem.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression, Ridge, Lasso, ElasticNet
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import GridSearchCV

# Assume df_train and df_test are your training and test DataFrames
X_train = df_train[['OverallQual', 'YearBuilt', 'YearRemodAdd', 'TotalBsmtSF', '1stFlrSF',
'GrLivArea', 'FullBath', 'TotRmsAbvGrd', 'GarageCars', 'GarageArea',
'ExterQual_TA', 'Foundation_PConc', 'KitchenQual_TA']]
y_train = df_train['SalePrice']
X_test = df_test[['OverallQual', 'YearBuilt', 'YearRemodAdd', 'TotalBsmtSF', '1stFlrSF',
'GrLivArea', 'FullBath', 'TotRmsAbvGrd', 'GarageCars', 'GarageArea',
'ExterQual_TA', 'Foundation_PConc', 'KitchenQual_TA']]
y_test = df_test['SalePrice']

# Scale the features
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Define models and their parameter grids
models = {
    'Unregularized': (LinearRegression(), {}),
    'L1 (Lasso)': (Lasso(), {'alpha': [0.1, 0.5, 1.0, 2.0, 5.0]}),
    'L2 (Ridge)': (Ridge(), {'alpha': [0.1, 0.5, 1.0, 2.0, 5.0]}),
    'Elastic Net': (ElasticNet(), {'alpha': [0.1, 0.5, 1.0, 2.0, 5.0], 'l1_ratio': [0.1, 0.3, 0.5, 0.7, 0.9]})
}

results = {}

for name, (model, param_grid) in models.items():
    if param_grid:  # If there are hyperparameters to tune
        grid_search = GridSearchCV(model, param_grid, cv=5, scoring='r2', n_jobs=-1)
        grid_search.fit(X_train_scaled, y_train)
        best_model = grid_search.best_estimator_
        best_params = grid_search.best_params_
    else:
        best_model = model
        best_model.fit(X_train_scaled, y_train)
        best_params = {}

    y_pred = best_model.predict(X_test_scaled)
    mse = mean_squared_error(y_test, y_pred)
    r2 = r2_score(y_test, y_pred)
    results[name] = {'model': best_model, 'mse': mse, 'r2': r2, 'best_params': best_params}

# Print results
print("Model Results:")
color_map = plt.cm.tab20
colors = color_map(np.arange(len(results)) % color_map.N)

for i, (name, result) in enumerate(results.items()):
    print(f"\033[1m{name}:\033[0m")
    print(f"  MSE: {result['mse']:.4f}")
    print(f"  R2 Score: {result['r2']:.4f}")
    if result['best_params']:
        print(f"  Best Parameters: {result['best_params']}")
    print("  Coefficients:")
    for feature, coef in zip(X_train.columns, result['model'].coef_):
        print(f"    {feature}: {coef:.4f}")
    print()
Model Results:
Unregularized:
MSE: 1101336094.4360
R2 Score: 0.7980
Coefficients:
OverallQual: 30302.6013
YearBuilt: 6529.5658
YearRemodAdd: 3560.5814
TotalBsmtSF: 5199.6669
1stFlrSF: 7978.9815
GrLivArea: 22277.7807
FullBath: -1017.9209
TotRmsAbvGrd: 4471.0704
GarageCars: 12747.2331
GarageArea: -160.4701
ExterQual_TA: -980.7238
Foundation_PConc: -782.6592
KitchenQual_TA: -3238.3793

L1 (Lasso):
MSE: 1101367531.2049
R2 Score: 0.7980
Best Parameters: {'alpha': 5.0}
Coefficients:
OverallQual: 30304.6742
YearBuilt: 6512.1724
YearRemodAdd: 3553.5329
TotalBsmtSF: 5201.1843
1stFlrSF: 7974.0482
GrLivArea: 22263.8330
FullBath: -994.3052
TotRmsAbvGrd: 4465.6362
GarageCars: 12714.1825
GarageArea: -120.0299
ExterQual_TA: -972.6826
Foundation_PConc: -758.6943
KitchenQual_TA: -3229.4819

L2 (Ridge):
MSE: 1101561963.3566
R2 Score: 0.7980
Best Parameters: {'alpha': 5.0}
Coefficients:
OverallQual: 29905.8406
YearBuilt: 6413.3202
YearRemodAdd: 3588.5604
TotalBsmtSF: 5339.9915
1stFlrSF: 7984.5463
GrLivArea: 21810.0169
FullBath: -807.5970
TotRmsAbvGrd: 4778.5632
GarageCars: 12477.6537
GarageArea: 133.2542
ExterQual_TA: -1160.4003
Foundation_PConc: -673.2106
KitchenQual_TA: -3239.9181

Elastic Net:
MSE: 1216604065.9478
R2 Score: 0.7769
Best Parameters: {'alpha': 2.0, 'l1_ratio': 0.7}
Coefficients:
OverallQual: 15844.9424
YearBuilt: 4453.7693
YearRemodAdd: 3994.2282
TotalBsmtSF: 7297.1307
1stFlrSF: 8140.8526
GrLivArea: 12637.7487
FullBath: 4465.5254
TotRmsAbvGrd: 8336.4735
GarageCars: 8000.2867
GarageArea: 5316.2396
ExterQual_TA: -4995.2670
Foundation_PConc: 2754.8928
KitchenQual_TA: -4081.0423

# Plot feature importance
plt.figure(figsize=(24, 10))
x = np.arange(len(X_train.columns))
width = 0.2

for i, (name, result) in enumerate(results.items()):
    coef = result['model'].coef_
    bars = plt.bar(x + i * width, coef, width, label=name, color=colors[i])

    for bar in bars:
        height = bar.get_height()
        plt.text(bar.get_x() + bar.get_width()/2, 0,
                 f"{height:.4f}", ha='center', va='bottom',
                 rotation=90, fontsize=10, color='black',
                 bbox=dict(facecolor='white', edgecolor='none', alpha=0.7))

plt.axhline(y=0, color='k', linestyle='-', linewidth=0.5)

plt.xlabel('Features', fontsize=20, color='Blue', fontweight='bold')
plt.ylabel('Coefficient Value', fontsize=20, color='Blue', fontweight='bold')
plt.title('Feature Importance Comparison', fontsize=24, color='Blue', fontweight='bold', pad=20)
plt.xticks(x + width * 1.5, X_train.columns, fontsize=14, color='Blue', rotation=45, ha='right')
plt.legend(fontsize=12)
plt.tight_layout()
plt.show()

# Plot predictions vs actual
fig, axes = plt.subplots(2, 2, figsize=(24, 20))
fig.suptitle('Predictions vs Actual', fontsize=24, color='Blue', fontweight='bold')

for ax, (name, result) in zip(axes.flatten(), results.items()):
    model = result['model']
    y_pred = model.predict(X_test_scaled)
    ax.scatter(y_test, y_pred, alpha=0.5)
    ax.plot([y_test.min(), y_test.max()], [y_test.min(), y_test.max()], 'r--', lw=2)
    ax.set_xlabel('Actual', fontsize=16, color='Blue', fontweight='bold')
    ax.set_ylabel('Predicted', fontsize=16, color='Blue', fontweight='bold')
    ax.set_title(f'{name}\nMSE: {result["mse"]:.4f}, R2: {result["r2"]:.4f}', fontsize=18, color='Blue', fontweight='bold')
    ax.tick_params(axis='both', which='major', labelsize=12)

plt.tight_layout()
plt.show()
Regularization

Sometimes when we train a model it starts to overfit. A way to avoid overfitting, especially for models (like linear regressions) that are heavily affected by outliers, is to use regularization. This leads to a more general model that is technically less accurate on the training data but generalizes to new data better.

Ridge (L2) Regression: used to reduce overfitting.

Cost function = Σ(h(x) − y)² + λ(slope)²

Training data: low bias. Testing data: low or high variance.

If the new data (test data) is near the best-fit line, the performance will be good (low variance). When the test data is far away from the best-fit line, the performance will be bad (high variance). Aim: to reduce overfitting.

Without regularization we would create multiple candidate best-fit lines to improve performance on the test data; the penalty gives a principled way to pick a flatter one.

Cost function: Σᵢ (h(xᵢ) − yᵢ)² + λ(slope)²

If there are multiple features, e.g. h(x) = θ0 + θ1x1 + θ2x2 + …, the penalty becomes λ(slope₁² + slope₂² + …), one term per feature. λ is a hyperparameter. The cost function is the same as in linear regression, plus this penalty.


Relationship between slope and cost function: as λ increases, the global minimum of the cost function shifts toward zero (to the left), i.e. toward a smaller slope.

Cost function = Σ(h(x) − y)² + λ(slope)², with λ ≥ 0.

Changing the λ value creates another best-fit line. Slope and λ are inversely related: as λ grows, the slope shrinks. λ makes sure that our line doesn't overfit.

The larger λ is, the greater the amount of shrinkage: the coefficients are shrunk towards zero. λ is the parameter that controls the amount of shrinkage, i.e. the model complexity. With Ridge, a coefficient's value never becomes exactly zero.

Example: h(x) = θ0 + θ1x1 + θ2x2 + θ3x3 shrinks to something like θ0 + 0.95x1 + 0.82x2 + 0.10x3; the coefficients get smaller, but no feature is deleted.

Ridge Regression is used to introduce a small amount of bias into the model in order to generalize to the data better (trading bias for lower variance). This is useful if you don't have much training data.

L1 Regularization (Lasso, L1 norm): it is used to reduce the number of features and helps in feature selection.

Cost function = Σ(h(x) − y)² + λ(|slope₁| + |slope₂| + …)

Example: h(x) = θ0 + θ1x1 + θ2x2 + θ3x3 becomes θ0 + 0.54x1 + 0.23x2 + 0·x3; the coefficient of the least correlated feature is driven to zero, so that feature is dropped. If the data has outliers, use Ridge Regression.

Lasso = Least Absolute Shrinkage and Selection Operator Regression. Lasso Regression tends to eliminate the weights of the least important features by setting their weights to zero.

Elastic Net: a combination of L1 and L2 regularization.

Cost function = Σ(h(x) − y)² + λ₁(slope)² + λ₂|slope|

The error term can be changed to MAE, RMSE, or MSE.

Notes taken from Josh Starmer's StatQuest YouTube videos.
Regularization: Ridge (L2) Regression

Least squares finds the line that results in the minimum sum of squared residuals. We end up with the equation of the line, e.g. Size = intercept + 0.75 × Weight. When we have a lot of measurements, we can be fairly confident that the least squares line accurately reflects the relationship between size and weight.
But what if we only have two measurements? We fit a new line; since the new line overlaps the two data points, the minimum sum of squared residuals is 0. New line: Size = 0.4 + 1.3 × Weight. The sum of the squared residuals for the testing data is large, which means the new line has high variance. In machine learning terms, the new line (blue) is overfit to the training data. The main idea behind Ridge Regression is to find a new line that doesn't fit the training data as well.

In other words, we introduce a small amount of bias into how the new line fits the data, but in return for that small amount of bias we get a significant drop in variance. Ridge Regression can therefore provide better long-term predictions.

When least squares determines values for the parameters in the equation

Size = y-axis intercept + slope × Weight

it minimizes the sum of the squared residuals. In contrast, when Ridge Regression determines values for the parameters in this equation, it minimizes

the sum of the squared residuals + λ × slope²

This extra term adds a penalty to the traditional least squares method, and lambda (λ) determines how severe that penalty is.

The sum of squared residuals for the least squares fit is 0 (because the line overlaps the data points) and its slope is 1.3, so its total is 0 + λ × (1.3)² = 1 × 1.69 = 1.69 (with λ = 1). For the blue Ridge Regression line, the residuals are small (around 0.3 and 0.1) and the slope is smaller, so altogether its total comes out below 1.69. Thus, if we want to minimize the sum of the squared residuals plus the Ridge Regression penalty, we would choose the Ridge Regression line over the least squares line. Without the small amount of bias that the penalty creates, the least squares fit has a large amount of variance. In contrast, the Ridge Regression line, which has a small amount of bias due to the penalty, has less variance.

If the line suggests that for every one-unit increase in weight there is a one-unit increase in predicted size, the slope is 1. If the slope of the line is steeper, then for every one-unit increase in weight the predicted size increases by more than two units. In other words, when the slope of the line is steep, the prediction for size is very sensitive to relatively small changes in weight.

When the slope is small, then for every one-unit increase in weight the prediction for size barely increases. In other words, when the slope of the line is small, predictions for size are much less sensitive to changes in weight.

Least squares line vs Ridge Regression line: the Ridge Regression penalty results in a line with a smaller slope, which means that predictions made with the Ridge Regression line are less sensitive to weight than the least squares line.

Ridge Regression (RR) minimizes: the sum of the squared residuals + λ × slope².

λ can be any value from 0 to positive infinity. When λ = 0, the Ridge penalty is 0 and the Ridge Regression line is the same as the least squares line. For λ > 0, the Ridge Regression line ends up with a smaller slope than the least squares line, and the larger we make λ, the closer the slope gets (asymptotically) to 0. So the larger λ gets, the less sensitive our predictions for size become to weight.

So how do we decide what value to give λ? We just try a bunch of values for λ and use cross-validation, typically 10-fold cross-validation, to determine which one results in the lowest variance. Up to now, RR was applied with a continuous predictor variable.
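A minimal scikit-learn sketch of this idea (not part of the original notes), reusing the scaled training data from the notebook above; scikit-learn calls the penalty strength alpha rather than λ, and the alpha grid here is an assumption:

import numpy as np
from sklearn.linear_model import RidgeCV

# Try a range of penalty strengths and let 10-fold cross-validation pick one
alphas = np.logspace(-3, 3, 13)
ridge_cv = RidgeCV(alphas=alphas, cv=10)
ridge_cv.fit(X_train_scaled, y_train)
print("Chosen alpha:", ridge_cv.alpha_)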

However, RR also works when we use a discrete variable, such as normal diet vs. high-fat diet, to predict size:

Size = 1.5 + 0.7 × High Fat Diet

The y-intercept (1.5) corresponds to the average size of the mice on the normal diet, and the sum 1.5 + 0.7 is the prediction for the size of the mice on the high-fat diet; 0.7 is the diet difference. The distances between the data and these two means are what get minimized. When RR determines values for the parameters in this equation, it minimizes:

the sum of the squared residuals + λ × diet difference²
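To illustrate this with a tiny made-up dataset (not from the notes): a 0/1 diet indicator plays the role of the discrete variable, the fitted coefficient is the diet difference, and increasing the penalty shrinks it toward 0.

import numpy as np
from sklearn.linear_model import Ridge

# Made-up example: 0 = normal diet, 1 = high-fat diet
diet = np.array([[0], [0], [0], [1], [1], [1]])
size = np.array([1.4, 1.5, 1.6, 2.1, 2.2, 2.3])

for alpha in [0.0, 1.0, 10.0]:
    model = Ridge(alpha=alpha).fit(diet, size)
    # coef_[0] is the "diet difference"; it shrinks toward 0 as alpha grows
    print(f"alpha={alpha}: diet difference = {model.coef_[0]:.3f}, intercept = {model.intercept_:.3f}")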

When λ = 0, the least squares and RR fits are the same. As λ gets larger, the only way to minimize the whole expression is to shrink the diet difference down. In other words, as λ gets larger, our prediction for the size of the mice on the high-fat diet becomes less sensitive to the difference between the normal diet and the high-fat diet.

The whole point of doing RR is that small sample sizes like these can lead to poor least squares estimates that result in terrible machine learning predictions. Ridge Regression can also be applied to Logistic Regression.

For Logistic Regression the objective becomes: the sum of the likelihoods + λ × slope². Note: when applied to Logistic Regression, Ridge Regression optimizes the sum of the likelihoods instead of the squared residuals, because Logistic Regression is solved using maximum likelihood.

Ridge Regression helps reduce variance by shrinking parameters and making our predictions less sensitive to them.
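In scikit-learn terms (an illustrative sketch, not from the notes), LogisticRegression applies an L2 penalty by default, with C acting as 1/λ, so a smaller C means a stronger penalty; the data here is made up:

import numpy as np
from sklearn.linear_model import LogisticRegression

# Made-up binary example: predict a 0/1 label from weight
weight = np.array([[1.0], [1.5], [2.0], [2.5], [3.0], [3.5]])
label = np.array([0, 0, 0, 1, 1, 1])

for C in [10.0, 1.0, 0.1]:  # C = 1/lambda: smaller C -> stronger L2 penalty
    clf = LogisticRegression(penalty='l2', C=C).fit(weight, label)
    print(f"C={C}: slope = {clf.coef_[0][0]:.3f}")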

In general, the RR penalty contains all of the parameters except for the y-intercept, e.g.

the sum of the squared residuals + λ × (weight slope² + diet difference²)

Least squares can't find a single optimal solution when there are more parameters than data points, since any line that goes through the data will minimize the sum of the squared residuals. But RR can find a solution, using cross-validation and the RR penalty that favors smaller parameter values.

Again, the objective is: the sum of the squared residuals + λ × slope².

Summary: when the sample sizes are relatively small, RR can improve predictions made from new data (i.e., reduce variance) by making the predictions less sensitive to the training data. The RR penalty itself is λ times the sum of all the squared parameters, except for the y-intercept, and λ is determined using cross-validation.
Lasso Regression (L1):

Ridge Regression penalty: λ × slope². Lasso Regression instead minimizes: the sum of all the squared residuals + λ × |slope|.

The Lasso Regression penalty contains all of the estimated parameters except for the y-intercept.

Ridge and Lasso Regression both shrink parameters, but they don't have to shrink them all equally. The big difference between Ridge and Lasso Regression is that Ridge Regression can only shrink the slope asymptotically close to 0, while Lasso Regression can shrink the slope all the way to 0. Lasso Regression can therefore exclude useless variables from the equation, and it is better than RR at reducing the variance in models that contain a lot of useless variables.
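A hedged sketch of this difference on made-up data with deliberately useless columns (not from the notes): Lasso drives some coefficients exactly to zero, while Ridge only shrinks them.

import numpy as np
from sklearn.linear_model import Ridge, Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
# Only the first two columns matter; the other three are useless variables
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=100)

ridge = Ridge(alpha=1.0).fit(X, y)
lasso = Lasso(alpha=0.1).fit(X, y)
print("Ridge coefficients:", np.round(ridge.coef_, 3))  # small but typically non-zero everywhere
print("Lasso coefficients:", np.round(lasso.coef_, 3))  # useless columns typically driven to exactly 0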
Elastic Net Regression:

Elastic-Net Regression starts with least squares, then combines the Lasso Regression penalty with the Ridge Regression penalty:

the sum of the squared residuals + λ₁ × (|variable₁| + … + |variableₙ|) + λ₂ × (variable₁² + … + variableₙ²)

Note: the Lasso and Ridge penalties get their own λs.

The hybrid Elastic-Net Regression is especially good at dealing with situations where there are correlations between parameters. This is because, on its own, Lasso Regression tends to pick just one of the correlated terms and eliminate the others, whereas RR tends to shrink all of the parameters of the correlated variables together. By combining the two penalties, Elastic-Net Regression groups and shrinks the parameters associated with the correlated variables and either leaves them all in the equation or removes them all at once.
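As a final hedged sketch on made-up data (not from the notes): with two nearly identical predictors, Lasso tends to keep one and drop the other, while Elastic Net tends to keep both with shared, shrunken weights.

import numpy as np
from sklearn.linear_model import Lasso, ElasticNet

rng = np.random.default_rng(1)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.01, size=200)  # almost a copy of x1 (highly correlated)
X = np.column_stack([x1, x2])
y = 2.0 * x1 + 2.0 * x2 + rng.normal(scale=0.1, size=200)

lasso = Lasso(alpha=0.5).fit(X, y)
enet = ElasticNet(alpha=0.5, l1_ratio=0.5).fit(X, y)
print("Lasso coefficients:      ", np.round(lasso.coef_, 2))  # often concentrates on one of the pair
print("Elastic Net coefficients:", np.round(enet.coef_, 2))   # tends to spread the weight across both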
