Regularization Techniques in Machine Learning
Regularization is a crucial technique in machine learning that helps to prevent overfitting. Overfitting occurs when a model
becomes too complex and fits the training data so closely, noise included, that it fails to generalize to new, unseen data. This can
lead to poor performance on real-world applications.
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline
import warnings
warnings.filterwarnings("ignore")
df_train = pd.read_csv("[Link]")
df_test = pd.read_csv("[Link]")
print("Train Data: \n")
display(df_train.head(2))
print("Test Data: \n")
df_test.head(2)
Train Data:
(first 2 rows shown; columns: OverallQual, YearBuilt, YearRemodAdd, TotalBsmtSF, 1stFlrSF, GrLivArea, FullBath, TotRmsAbvGrd, GarageCars, GarageArea, SalePrice, ExterQual_TA, Foundation_PConc, KitchenQual_TA)
Test Data:
(first 2 rows shown; same columns as the training data)
df_test.shape
(329, 14)
High-Dimensional Data: With many features, regularization can help to prevent overfitting by reducing the complexity of the model (illustrated by the short sketch below).
Preventing Overfitting: Regularization is a general technique for preventing overfitting in various machine learning models.
By understanding regularization and applying it appropriately, you can significantly improve the performance and reliability of your
machine learning models.
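As a quick illustration of the high-dimensional point above, the sketch below fits an unregularized and an L2-regularized linear model to synthetic data with many noisy features; the data, the alpha value, and the random seed are illustrative choices, not part of the original notebook.

import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import train_test_split

# Synthetic data: 60 samples, 40 features, only the first 5 actually matter
rng = np.random.default_rng(42)
X_toy = rng.normal(size=(60, 40))
true_coef = np.zeros(40)
true_coef[:5] = [3, -2, 1.5, 1, -1]
y_toy = X_toy @ true_coef + rng.normal(scale=1.0, size=60)

X_tr, X_te, y_tr, y_te = train_test_split(X_toy, y_toy, test_size=0.5, random_state=0)

# With 30 training rows and 40 features the unregularized fit can interpolate the
# training data; the ridge penalty trades a little bias for better test R2
print("OLS   test R2:", round(LinearRegression().fit(X_tr, y_tr).score(X_te, y_te), 3))
print("Ridge test R2:", round(Ridge(alpha=10.0).fit(X_tr, y_tr).score(X_te, y_te), 3))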
y_train = df_train['SalePrice']
y_test = df_test['SalePrice']
from sklearn.linear_model import LinearRegression, Ridge, Lasso, ElasticNet

models = {
    'Unregularized': LinearRegression(),
    'L1 (Lasso)': Lasso(alpha=1.0),
    'L2 (Ridge)': Ridge(alpha=1.0),
    'Elastic Net': ElasticNet(alpha=1.0, l1_ratio=0.5)
}
results = {}
print("Model Results:")
color_map = plt.cm.tab20
colors = color_map(np.arange(len(results)) % color_map.N)
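The cell that actually fits these models and fills results does not appear in this export; the loop below is a minimal sketch of what it might have looked like, assuming the X_train and X_test feature frames defined further down in the notebook.

from sklearn.metrics import mean_squared_error, r2_score

for name, model in models.items():
    # Fit on the training features and score on the held-out test set
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    results[name] = {
        'model': model,
        'mse': mean_squared_error(y_test, y_pred),
        'r2': r2_score(y_test, y_pred),
    }
    print(f"\n{name}:")
    print(f"  MSE: {results[name]['mse']:.4f}")
    print(f"  R2 Score: {results[name]['r2']:.4f}")
    print("  Coefficients:")
    for feature, coef in zip(X_train.columns, model.coef_):
        print(f"    {feature}: {coef:.4f}")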
Model Results:
Unregularized:
MSE: 1101336094.4360
R2 Score: 0.7980
Coefficients:
L1 (Lasso):
MSE: 1101343139.1702
R2 Score: 0.7980
Coefficients:
L2 (Ridge):
MSE: 1101358746.2137
R2 Score: 0.7980
Coefficients:
Elastic Net:
MSE: 1200548075.3716
R2 Score: 0.7798
Coefficients:
Key Observations:
Similar R2 Scores: All four models exhibit very similar R2 scores (around 0.7980), indicating that they explain a significant portion of
the variance in the target variable.
Slightly Higher MSE for Elastic Net: The Elastic Net model has a slightly higher MSE than the other three models, suggesting
that its combined penalty introduces some extra bias by shrinking the coefficients more heavily.
Minimal Impact of Regularization: The differences in MSE between the regularized models (L1, L2, and Elastic Net) and the
unregularized model are relatively small. This could be due to the nature of the data or the chosen regularization parameters.
Conclusions:
Model Choice: While the R2 scores are comparable, the slightly lower MSE of the unregularized model might make it a preferred
choice if overfitting is not a major concern. However, if there's a risk of overfitting, the regularized models (L1, L2, or Elastic Net)
could be considered.
Regularization Impact: In this case, the regularization techniques (L1, L2, and Elastic Net) did not significantly improve the model's
performance. This could be due to various factors, such as the data characteristics or the choice of regularization parameters.
Further Analysis: To gain a deeper understanding of the models' behavior, it would be helpful to examine the feature coefficients
and explore the impact of different regularization parameters.
Additional Considerations:
Data Quality: Ensure that the data is clean and free from outliers or missing values.
Feature Engineering: Experiment with different feature engineering techniques to see if they can improve model performance.
Hyperparameter Tuning: Fine-tune the regularization parameters (alpha and l1_ratio for Elastic Net) to potentially optimize the
models' performance.
Cross-Validation: Use cross-validation to assess the models' generalization performance more reliably (a short sketch follows this list).
Note: By carefully considering these factors, we can make an informed decision about the best model for our specific problem.
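For the cross-validation point, here is a minimal sketch using scikit-learn's cross_val_score; the alpha value and fold count are illustrative, and X_train/y_train are the objects defined elsewhere in the notebook.

from sklearn.model_selection import cross_val_score
from sklearn.linear_model import Ridge

# 5-fold cross-validated R2 for one candidate model; a stable mean across folds
# is a better indicator of generalization than a single train/test split
cv_scores = cross_val_score(Ridge(alpha=1.0), X_train, y_train, cv=5, scoring='r2')
print("R2 per fold:", cv_scores.round(4))
print(f"Mean R2: {cv_scores.mean():.4f} (+/- {cv_scores.std():.4f})")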
plt.figure(figsize=(24, 10))
x = np.arange(len(X_train.columns))
width = 0.20  # Bar width
cmap = plt.cm.get_cmap('tab20')
colors = cmap(np.arange(len(results)) % cmap.N)
# The original drawing calls are missing; a grouped bar chart of the fitted coefficients is one completion consistent with the setup above
for i, (name, res) in enumerate(results.items()):
    plt.bar(x + i * width, res['model'].coef_, width, label=name, color=colors[i])
plt.xticks(x + 1.5 * width, X_train.columns, rotation=45, ha='right')
plt.ylabel('Coefficient value')
plt.legend()
plt.tight_layout()
plt.show()
Key Observations:
Strong Positive Correlation: All four models exhibit a strong positive correlation between predicted and actual values. This
indicates that the models are able to capture the underlying relationship in the data to a reasonable extent.
Similar Scatter Patterns: The scatter plots for all four models look quite similar, suggesting that the different regularization
techniques did not significantly alter the overall prediction patterns.
Diagonal Line: The dashed diagonal line represents perfect prediction, where the predicted values would exactly match the actual
values. The closer the points are to this line, the better the model's predictions (a sketch of this plot appears after these observations).
MSE and R2 Scores: The provided metrics (MSE and R2) support the visual observations:
MSE (Mean Squared Error): Measures the average squared difference between predicted and actual values. Lower MSE
indicates better prediction accuracy.
R2 Score: Measures the proportion of variance in the target variable explained by the model. Higher R2 indicates a better fit.
The relatively high R2 scores (around 0.7980) for all models confirm their good predictive power.
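The predicted-vs-actual scatter plots discussed above are not included in this export; the sketch below shows one way such a grid could be drawn, assuming the fitted models are stored in results as in the earlier loop.

fig, axes = plt.subplots(1, len(results), figsize=(20, 5), sharex=True, sharey=True)
for ax, (name, res) in zip(axes, results.items()):
    y_pred = res['model'].predict(X_test)
    ax.scatter(y_test, y_pred, alpha=0.5)
    # Dashed diagonal: points on this line would be perfect predictions
    lims = [y_test.min(), y_test.max()]
    ax.plot(lims, lims, 'k--')
    ax.set_title(name)
    ax.set_xlabel('Actual SalePrice')
axes[0].set_ylabel('Predicted SalePrice')
plt.tight_layout()
plt.show()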
Feature Importance: Analyze the feature coefficients to understand which features are most influential in the models' predictions.
Hyperparameter Tuning: Experiment with different regularization parameters to see if they can improve performance.
Cross-Validation: Use cross-validation to assess the models' generalization performance more reliably.
Model Complexity: Try more complex models (e.g., non-linear models, ensemble methods) if the data is highly nonlinear.
By carefully analyzing these factors, we can gain a better understanding of the models' strengths and weaknesses and make informed
decisions for our specific problem.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression, Ridge, Lasso, ElasticNet
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import GridSearchCV
# Assume df_train and df_test are your training and test DataFrames
X_train = df_train[['OverallQual', 'YearBuilt', 'YearRemodAdd', 'TotalBsmtSF', '1stFlrSF',
'GrLivArea', 'FullBath', 'TotRmsAbvGrd', 'GarageCars', 'GarageArea',
'ExterQual_TA', 'Foundation_PConc', 'KitchenQual_TA']]
y_train = df_train['SalePrice']
X_test = df_test[['OverallQual', 'YearBuilt', 'YearRemodAdd', 'TotalBsmtSF', '1stFlrSF',
'GrLivArea', 'FullBath', 'TotRmsAbvGrd', 'GarageCars', 'GarageArea',
'ExterQual_TA', 'Foundation_PConc', 'KitchenQual_TA']]
y_test = df_test['SalePrice']
# Scale the features so the regularization penalties treat all coefficients comparably
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Hyperparameter grids for the regularized models (representative values; the
# original grid is not shown, but these ranges include the best parameters reported below)
param_grids = {
    'L1 (Lasso)': (Lasso(max_iter=10000), {'alpha': [0.5, 1.0, 2.0, 5.0]}),
    'L2 (Ridge)': (Ridge(), {'alpha': [0.5, 1.0, 2.0, 5.0]}),
    'Elastic Net': (ElasticNet(max_iter=10000), {'alpha': [0.5, 1.0, 2.0], 'l1_ratio': [0.3, 0.5, 0.7]})
}

results = {}
for name, (model, grid) in param_grids.items():
    # Cross-validated search over the regularization strength
    search = GridSearchCV(model, grid, scoring='neg_mean_squared_error', cv=5)
    search.fit(X_train_scaled, y_train)
    best_model, best_params = search.best_estimator_, search.best_params_

    y_pred = best_model.predict(X_test_scaled)
    mse = mean_squared_error(y_test, y_pred)
    r2 = r2_score(y_test, y_pred)
    results[name] = {'model': best_model, 'mse': mse, 'r2': r2, 'best_params': best_params}
# Print results
print("Model Results:")
color_map = plt.cm.tab20
colors = color_map(np.arange(len(results)) % color_map.N)
L1 (Lasso):
MSE: 1101367531.2049
R2 Score: 0.7980
Best Parameters: {'alpha': 5.0}
Coefficients:
OverallQual: 30304.6742
YearBuilt: 6512.1724
YearRemodAdd: 3553.5329
TotalBsmtSF: 5201.1843
1stFlrSF: 7974.0482
GrLivArea: 22263.8330
FullBath: -994.3052
TotRmsAbvGrd: 4465.6362
GarageCars: 12714.1825
GarageArea: -120.0299
ExterQual_TA: -972.6826
Foundation_PConc: -758.6943
KitchenQual_TA: -3229.4819
L2 (Ridge):
MSE: 1101561963.3566
R2 Score: 0.7980
Best Parameters: {'alpha': 5.0}
Coefficients:
OverallQual: 29905.8406
YearBuilt: 6413.3202
YearRemodAdd: 3588.5604
TotalBsmtSF: 5339.9915
1stFlrSF: 7984.5463
GrLivArea: 21810.0169
FullBath: -807.5970
TotRmsAbvGrd: 4778.5632
GarageCars: 12477.6537
GarageArea: 133.2542
ExterQual_TA: -1160.4003
Foundation_PConc: -673.2106
KitchenQual_TA: -3239.9181
Elastic Net:
MSE: 1216604065.9478
R2 Score: 0.7769
Best Parameters: {'alpha': 2.0, 'l1_ratio': 0.7}
Coefficients:
OverallQual: 15844.9424
YearBuilt: 4453.7693
YearRemodAdd: 3994.2282
TotalBsmtSF: 7297.1307
1stFlrSF: 8140.8526
GrLivArea: 12637.7487
FullBath: 4465.5254
TotRmsAbvGrd: 8336.4735
GarageCars: 8000.2867
GarageArea: 5316.2396
ExterQual_TA: -4995.2670
Foundation_PConc: 2754.8928
KitchenQual_TA: -4081.0423
plt.tight_layout()
plt.show()
Regularization (Notes)
Regularization is a way to avoid overfitting the data, especially for models (like linear regressions) that are heavily affected by outliers. It leads to a more general model that is technically less accurate on the training data but generalizes to new data better.
Ridge (L2) Regression
Ridge regression is used to reduce overfitting. Its cost function adds a squared-slope penalty to the usual squared error:

Cost function = Σ (h_θ(x_i) - y_i)² + λ · (slope)²

and when multiple features are present the penalty becomes λ · Σ (slope_j)². Here λ is a hyperparameter: the penalty shifts the global minimum of the cost so that the fitted slope moves towards zero, the slope is inversely proportional to λ, and increasing λ makes sure that our line doesn't overfit. Without the penalty the fit has low training error but high variance on the testing data; ridge trades a little training accuracy for lower test variance.

(The handwritten sketch here plots the cost against the slope and shows the global minimum shifting towards a slope of zero as the penalty is added.)

The larger the value of λ, the greater the amount of shrinkage: the coefficients are shrunk towards zero. λ is the parameter that controls the amount of shrinkage and hence the model complexity.
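A small numeric sketch of this shrinkage effect (the data and alpha values are made up for illustration; scikit-learn's alpha parameter plays the role of λ here):

import numpy as np
from sklearn.linear_model import Ridge

# One noisy feature with a true slope of about 3
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=(30, 1))
y = 3.0 * x[:, 0] + rng.normal(scale=2.0, size=30)

# As alpha (lambda) grows, the fitted slope is shrunk towards zero
for alpha in [0.01, 1, 10, 100, 1000]:
    slope = Ridge(alpha=alpha).fit(x, y).coef_[0]
    print(f"alpha={alpha:>6}: slope={slope:.3f}")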
L1 Regularization (Lasso)
The L1 norm is used to reduce the number of features; it helps with feature selection. Its cost function uses an absolute-value penalty:

Cost function = Σ (h_θ(x_i) - y_i)² + λ · Σ |slope_j|

For a hypothesis such as h_θ(x) = θ0 + θ1·x1 + θ2·x2 + θ3·x3 (for example θ0 + 0.54·x1 + 0.23·x2 + 0.10·x3), the coefficients of the least-correlated features are shrunk all the way to zero, so those features drop out of the model. If the data has outliers, use Ridge Regression instead.
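A small sketch of the feature-selection behaviour described above: with a large enough penalty, Lasso drives the coefficients of uninformative features to exactly zero (the synthetic data and alpha are chosen purely for illustration):

import numpy as np
from sklearn.linear_model import Lasso

# Six standardized features, but only the first two actually influence y
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
y = 4.0 * X[:, 0] - 3.0 * X[:, 1] + rng.normal(scale=0.5, size=200)

lasso = Lasso(alpha=0.1).fit(X, y)
print(lasso.coef_.round(3))  # the uninformative coefficients typically come out exactly 0.0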
Elastic Net
Elastic Net is a combination of L1 and L2 regularization:

Cost function = Σ (h_θ(x_i) - y_i)² + λ1 · Σ |slope_j| + λ2 · Σ (slope_j)²
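A tiny numeric check of the combined penalty (the coefficient values and λ1, λ2 are made-up numbers just to show the arithmetic):

import numpy as np

coefs = np.array([0.5, -0.2, 0.1, 0.0])
lambda1, lambda2 = 1.0, 0.5

# Elastic Net penalty = lambda1 * sum(|coef|) + lambda2 * sum(coef^2)
penalty = lambda1 * np.sum(np.abs(coefs)) + lambda2 * np.sum(coefs ** 2)
print(penalty)  # 1.0 * 0.8 + 0.5 * 0.30 = 0.95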
A ridge example: start from a least-squares line such as Size = intercept + 0.75 × Weight, where 0.75 is the slope. In other words, we introduce a small amount of bias into how the new line fits the data, but in return for that small amount of bias we get a significant drop in variance.

The λ · (slope)² term adds a penalty to the traditional least-squares method, and lambda (λ) determines how severe that penalty is.
When the slope of the line is steep, the prediction for Size changes a lot for every one-unit increase in Weight; when the slope is small, the prediction is far less sensitive to Weight. Because of the penalty, the ridge regression line ends up with a smaller slope than the least-squares line, and the larger we make λ, the closer the slope gets, asymptotically, to 0.
With more terms, for example Size = intercept + slope × Weight + diet difference × High-Fat Diet (where the intercept corresponds to the average Size of the mice on the normal diet), the ridge penalty covers every parameter except the intercept: sum of squared residuals + λ × (slope² + diet difference²).

With very little data (for example a single point), least squares cannot find a single optimal solution, since any line that goes through the point will minimize the sum of the squared residuals; the ridge penalty is what picks out one particular line.
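A short sketch of that situation: with fewer samples than features, plain least squares has many zero-residual solutions, while the ridge penalty picks out a unique, shrunken one (synthetic data for illustration only):

import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

# 3 samples, 5 features: the training data can be fit exactly in many ways
rng = np.random.default_rng(1)
X_small = rng.normal(size=(3, 5))
y_small = X_small @ np.array([2.0, 0.0, 0.0, 0.0, 0.0]) + rng.normal(scale=0.1, size=3)

print(LinearRegression().fit(X_small, y_small).coef_.round(2))  # one zero-residual solution
print(Ridge(alpha=1.0).fit(X_small, y_small).coef_.round(2))    # unique, shrunk towards zero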
Summary
When the sample sizes are relatively small, ridge regression (RR) can improve predictions made from new data (i.e. reduce variance) by making the predictions less sensitive to the training data.

The Lasso penalty is λ × (|variable1| + … + |variablen|), while the Ridge penalty is λ × (variable1² + … + variablen²).

Note: the Lasso Regression (LR) and Ridge Regression (RR) penalties each get their own λ.