Python - Linear Regression Model Cheat Sheet
by DarioPittera (aggialavura) via cheatography.com/83764/cs/19917/
TO START

# IMPORT DATA LIBRARIES
import pandas as pd
import numpy as np

# IMPORT VIS LIBRARIES
import matplotlib.pyplot as plt
import seaborn as sns
# Jupyter/IPython only: show plots inside the notebook
%matplotlib inline

# IMPORT MODELLING LIBRARIES
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn import metrics

PRELIMINARY OPERATIONS

df = pd.read_csv('data.csv')   # read data
df.head()                      # check the first rows of df
df.info()                      # check column types and non-null counts
df.describe()                  # check summary statistics of df
df.columns                     # check column names

VISUALISE DATA

sns.pairplot(df)                  # pairplot
sns.histplot(df['Y'], kde=True)   # distribution plot (replaces sns.distplot, deprecated since seaborn 0.11)
sns.heatmap(df.corr(numeric_only=True), annot=True)   # correlation heatmap with values (numeric_only=True skips non-numeric columns, which pandas >= 2.0 would otherwise reject)

TRAIN MODEL

CREATE X and y ---------------
X = df[['col1', 'col2', ...]]   # create df of features (placeholder column names)
y = df['col']                   # create the variable to predict

SPLIT DATASET ---------------
# split df into train and test sets (30% of rows held out for testing)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)

FIT THE MODEL ---------------
lm = LinearRegression()    # instantiate the model
lm.fit(X_train, y_train)   # train/fit the model

SHOW RESULTS ---------------
lm.intercept_   # show intercept
lm.coef_        # show coefficients
coeff_df = pd.DataFrame(lm.coef_, X.columns, columns=['Coeff'])   # create coefficient df*

* pd.DataFrame(data, index, columns, dtype, copy): data = values, index = row labels, columns = column names. Here it pairs each coefficient with the name of its feature, which makes the regression coefficients easier to interpret.

MAKE PREDICTIONS

predictions = lm.predict(X_test)                        # create predictions
plt.scatter(y_test, predictions)                        # plot actual vs predicted values*
sns.histplot(y_test - predictions, bins=50, kde=True)   # distribution of residuals* (replaces the deprecated sns.distplot)

* scatter: plots the actual values (x axis) against the values predicted by the trained model (y axis). For a good model the points lie as close as possible to a diagonal line.
* residual distribution: shows the distribution of the residual errors, that is, the actual values minus the predicted values. It should look as close as possible to a normal distribution. If it does not, consider changing the model.

EVALUATION METRICS

print('MAE:', metrics.mean_absolute_error(y_test, predictions))
print('MSE:', metrics.mean_squared_error(y_test, predictions))
print('RMSE:', np.sqrt(metrics.mean_squared_error(y_test, predictions)))

MAE (mean absolute error) is the easiest to understand, because it is simply the average absolute error.
MSE (mean squared error) is more popular than MAE, because squaring "punishes" larger errors more heavily, which tends to be useful in the real world.
RMSE (root mean squared error) is even more popular than MSE, because RMSE is expressed in the same units as y and is therefore easier to interpret.
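Because 'data.csv' in the preliminary operations is only a placeholder, the same checks can be tried on an in-memory CSV. This is a minimal sketch: the CSV text and its column names (col1, col2, col) are made up for illustration, and io.StringIO stands in for a real file, since pd.read_csv accepts any file-like object.

```python
import io

import pandas as pd

# Hypothetical CSV contents standing in for 'data.csv' (col = col1 + 2*col2)
csv_text = """col1,col2,col
1.0,2.0,5.0
2.0,1.0,4.0
3.0,4.0,11.0
4.0,3.0,10.0
"""
df = pd.read_csv(io.StringIO(csv_text))  # read_csv accepts file-like objects

print(df.head())            # first rows
df.info()                   # dtypes and non-null counts (prints directly)
print(df.describe())        # count, mean, std, min, quartiles, max per column
print(df.columns.tolist())  # column names as a plain list
```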
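The steps on this sheet can be run end to end without a data file. The sketch below assumes a synthetic DataFrame generated with NumPy (the columns x1, x2 and target are hypothetical stand-ins for col1, col2 and col, with a known relationship target = 3 + 2*x1 - 1.5*x2 plus small noise). It then splits, fits, builds the coefficient DataFrame, predicts and prints the three metrics. The random_state=42 argument is an addition that only makes the split reproducible.

```python
import numpy as np
import pandas as pd
from sklearn import metrics
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Hypothetical synthetic data standing in for pd.read_csv('data.csv'):
# target = 3 + 2*x1 - 1.5*x2 + small Gaussian noise
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({'x1': rng.normal(size=n), 'x2': rng.normal(size=n)})
df['target'] = 3 + 2 * df['x1'] - 1.5 * df['x2'] + rng.normal(scale=0.1, size=n)

# CREATE X and y
X = df[['x1', 'x2']]
y = df['target']

# SPLIT DATASET (random_state only makes the split reproducible)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# FIT THE MODEL
lm = LinearRegression()
lm.fit(X_train, y_train)

# SHOW RESULTS: one row per feature, labelled by its column name
coeff_df = pd.DataFrame(lm.coef_, X.columns, columns=['Coeff'])
print('Intercept:', lm.intercept_)
print(coeff_df)

# MAKE PREDICTIONS and EVALUATE
predictions = lm.predict(X_test)
mae = metrics.mean_absolute_error(y_test, predictions)
mse = metrics.mean_squared_error(y_test, predictions)
rmse = np.sqrt(mse)
print('MAE:', mae)
print('MSE:', mse)
print('RMSE:', rmse)
```

With this little noise, the intercept and coefficients land close to the values used to generate the data (3, 2 and -1.5), and the RMSE is close to the noise level of 0.1.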
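The coefficient DataFrame is easier to read once it is clear how LinearRegression uses those numbers: each prediction is the intercept plus the sum of every feature multiplied by its coefficient, so a coefficient is the change in y for a one-unit increase in that feature, holding the other features fixed. The sketch below checks this on a tiny made-up dataset that follows y = 1 + 2*col1 + 3*col2 exactly (the data are hypothetical, chosen so the fit is exact).

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Made-up data that follows y = 1 + 2*col1 + 3*col2 exactly
X = pd.DataFrame({'col1': [0.0, 1.0, 2.0, 3.0, 4.0],
                  'col2': [1.0, 0.0, 2.0, 1.0, 3.0]})
y = 1 + 2 * X['col1'] + 3 * X['col2']

lm = LinearRegression().fit(X, y)
coeff_df = pd.DataFrame(lm.coef_, X.columns, columns=['Coeff'])
print('Intercept:', lm.intercept_)  # close to 1.0
print(coeff_df)                     # close to 2.0 for col1 and 3.0 for col2

# A prediction is intercept_ + X @ coef_
manual = lm.intercept_ + X.to_numpy() @ lm.coef_
print(np.allclose(manual, lm.predict(X)))
```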
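For reference, the three evaluation metrics use the standard definitions below, where y_i are the actual values, \hat{y}_i the predicted values and n the number of rows in the test set:

```latex
\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n} \lvert y_i - \hat{y}_i \rvert, \qquad
\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2, \qquad
\mathrm{RMSE} = \sqrt{\mathrm{MSE}}
```

Squaring in MSE is what makes large errors weigh more, and taking the square root in RMSE brings the result back to the units of y.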
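MAE, MSE and RMSE are simple averages over the residuals, so they can be computed directly with NumPy and compared with sklearn.metrics. The arrays below are made up for illustration. Note how the single error of 1.5 accounts for half of the total absolute error (1.5 out of 3.0) but nearly two thirds of the total squared error (2.25 out of 3.5), which is the "punishing" effect described above.

```python
import numpy as np
from sklearn import metrics

# Made-up actual and predicted values, for illustration only
y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.5, 5.0, 4.0, 8.0])

errors = y_true - y_pred         # residuals: actual minus predicted
mae = np.mean(np.abs(errors))    # average absolute error -> 0.75
mse = np.mean(errors ** 2)       # average squared error -> 0.875
rmse = np.sqrt(mse)              # back in the units of y

print('MAE:', mae, 'MSE:', mse, 'RMSE:', rmse)

# The hand-computed values agree with sklearn
assert np.isclose(mae, metrics.mean_absolute_error(y_true, y_pred))
assert np.isclose(mse, metrics.mean_squared_error(y_true, y_pred))
```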
By DarioPittera (aggialavura). Not published yet.
Last updated 24th June, 2019.
cheatography.com/aggialavura/
www.dariopittera.com